Article

A Realization Approach to Lossy Network Compression of a Tuple of Correlated Multivariate Gaussian RVs †

by Charalambos D. Charalambous 1,* and Jan H. van Schuppen 2
1 Department of Electrical and Computer Engineering, University of Cyprus, P.O. Box 20537, CY-1678 Nicosia, Cyprus
2 Van Schuppen Control Research, Gouden Leeuw 143, 1103 KB Amsterdam, The Netherlands
* Author to whom correspondence should be addressed.
† Preliminary results were presented in part at the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020.
Entropy 2022, 24(9), 1227; https://doi.org/10.3390/e24091227
Submission received: 21 June 2022 / Revised: 21 August 2022 / Accepted: 25 August 2022 / Published: 1 September 2022
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract:
Examined in this paper is the Gray and Wyner source coding for a simple network of correlated multivariate Gaussian random variables, $Y_1: \Omega \to \mathbb{R}^{p_1}$ and $Y_2: \Omega \to \mathbb{R}^{p_2}$. The network consists of an encoder that produces two private rates $R_1$ and $R_2$, and a common rate $R_0$, and two decoders, where decoder 1 receives rates $(R_1, R_0)$ and reproduces $Y_1$ by $\hat{Y}_1$, and decoder 2 receives rates $(R_2, R_0)$ and reproduces $Y_2$ by $\hat{Y}_2$, with mean-square error distortions $E||Y_i - \hat{Y}_i||_{\mathbb{R}^{p_i}}^2 \le \Delta_i \in [0, \infty]$, $i = 1, 2$. Use is made of the weak stochastic realization and the geometric approach of such random variables to derive test channel distributions, which characterize the rates that lie on the Gray and Wyner rate region. Specific new results include: (1) A proof that, among all continuous or finite-valued random variables $W: \Omega \to \mathbb{W}$, Wyner's common information, $C(Y_1, Y_2) = \inf_{P_{Y_1, Y_2, W}:\, P_{Y_1, Y_2 | W} = P_{Y_1 | W} P_{Y_2 | W}} I(Y_1, Y_2; W)$, is achieved by a Gaussian random variable $W: \Omega \to \mathbb{R}^n$ of minimum dimension $n$, which makes the two components of the tuple $(Y_1, Y_2)$ conditionally independent according to the weak stochastic realization of $(Y_1, Y_2)$, and the formula $C(Y_1, Y_2) = \frac{1}{2} \sum_{j=1}^{n} \ln \frac{1 + d_j}{1 - d_j}$, where $d_i \in (0, 1)$, $i = 1, \ldots, n$, are the canonical correlation coefficients of the correlated parts of $Y_1$ and $Y_2$, and a realization of $(Y_1, Y_2, W)$ which achieves this. (2) The parameterization of rates that lie on the Gray and Wyner rate region, and several of its subsets. The discussion is largely self-contained and proceeds from first principles, while connections to prior literature are discussed.

1. Introduction

In their seminal paper, Source Coding for a Simple Network [1], Gray and Wyner characterized the lossless rate region for a tuple of finite-valued random variables, and the lossy rate region for a tuple of arbitrarily distributed random variables. Many extensions and generalizations followed Gray and Wyner's fundamental work. Wyner [2] introduced an operational definition of the common information between a tuple of sources that generate symbols with values in finite spaces. Wyner's operational definition of common information is defined as the minimum achievable common message rate on the Gray and Wyner lossless rate region. Witsenhausen [3] investigated bounds for Wyner's common information, and sequences of pairs of random variables in this regard [4]. Gács and Körner [5] introduced another definition of common randomness between a tuple of jointly independent and identically distributed random variables. Benammar and Zaidi [6,7] characterized the Gray–Wyner rate region when there is side information at the decoders, under various scenarios that include the case where both receivers reproduce the source symbols without distortion. Insightful application examples for binary sources are considered in [7] (Section 4.2). In their earlier work, Benammar and Zaidi [8,9] characterized the rate distortion function of the Heegard and Berger [10] problem, with two sources and side information at the two decoders (under a degraded set-up). Connections between the Gray and Wyner lossy source coding network and the notions of empirical and strong coordination capacity for arbitrary networks were developed by Cuff, Permuter and Cover [11] and the references therein, where the authors elaborated on the usefulness of the common information between the different network nodes.
Viswanatha, Akyol and Rose [12], and Xu, Liu and Chen [13], explored the connection of Wyner’s common information and the Gray and Wyner lossy rate region, to generalize Wyner’s common information to its lossy counterpart, for random variables taking values in arbitrary spaces. They characterized Wyner’s lossy common information as the minimum common message rate on the Gray and Wyner lossy rate region, when the sum rate is arbitrarily close to the rate distortion function with joint decoding for the Gray and Wyner lossy network. Applications to encryption and secret key generation are discussed by Viswanatha, Akyol and Rose in [12] (and references therein).
The current paper is focused on the calculation of rates that lie in the Gray and Wyner rate region [1], for two sources that generate symbols according to the model of jointly independent and identically distributed multivariate correlated Gaussian random variables $Y_1: \Omega \to \mathbb{R}^{p_1}$, $Y_2: \Omega \to \mathbb{R}^{p_2}$, and square-error fidelity at the two decoders. The current literature on methods and algorithms to compute such rates is subject to a number of limitations which often prevent their practical usefulness:
(1) Rates that lie in the Gray and Wyner rate region are only known for the special case of a tuple of scalar-valued Gaussian random variables with square-error distortion, i.e., $p_1 = p_2 = 1$ [1,12,13].
(2) Wyner's lossy common information is only computed in closed form for the special case of a tuple of scalar-valued Gaussian random variables [12,13].
(3) Important generalizations to a tuple of sources that generate multivariate Gaussian symbols require new derivations, often of considerable difficulty.
(4) Realizations of the optimal test channel distributions of the various rate distortion functions (RDFs) involved in the Gray and Wyner characterization of the rate region, and their structural properties, are not developed.
(5) A proof that the Gray and Wyner rate region for jointly Gaussian sources is characterized by a Gaussian auxiliary random variable W is still missing from past literature.
It is known from [1] that the Gray and Wyner rate region can be parameterized by an auxiliary random variable $W: \Omega \to \mathbb{W}$, via several rate distortion functions. Moreover, subsets of the Gray and Wyner rate region are parameterized by $W$ which satisfies the conditional independence (1):
$P_{Y_1, Y_2 | W} = P_{Y_1 | W}\, P_{Y_2 | W}. \qquad (1)$
The current paper makes use of the canonical variable form and the weak stochastic realization of the tuple of random variables ( Y 1 , Y 2 ) , introduced in Section 2 to characterize subsets of the Gray and Wyner rate region, which are parameterized by jointly Gaussian random variables ( Y 1 , Y 2 , W ) with W : Ω R n , where n is a finite number, while in some cases, the minimum dimension of W is clarified. The weak stochastic realization is developed to deal with the fundamental issue that, for the Gray and Wyner network, one is given the joint distribution P Y 1 , Y 2 , while the characterization of the RDFs involves the specification of the test channel distributions, that achieve these RDFs, and the actual construction of realizations of all random variables involved, that induce the test channel distributions. Furthermore, Wyner’s common information between Y 1 and Y 2 involves the construction of a joint distribution P Y 1 , Y 2 , W where W is the auxiliary random variable that makes Y 1 and Y 2 conditionally independent, i.e., (1) holds.
The rest of the section serves mainly to review the Gray and Wyner characterization of the lossy rate region and the characterization of Wyner’s lossy common information.

1.1. Literature Review

(a) The Gray and Wyner source coding for a simple network [1].
Consider the Gray and Wyner source coding for a simple network, as shown in Figure 1, for a tuple of jointly independent and identically distributed multivariate Gaussian random variables $(Y_1^N, Y_2^N) = \{(Y_{1,i}, Y_{2,i}): i = 1, 2, \ldots, N\}$,
$Y_{1,i}: \Omega \to \mathbb{R}^{p_1} = \mathbb{Y}_1, \quad Y_{2,i}: \Omega \to \mathbb{R}^{p_2} = \mathbb{Y}_2, \quad i = 1, \ldots, N, \qquad (2)$
with square error distortion functions at the two decoders,
$D_{Y_1}(y_1^N, \hat{y}_1^N) = \frac{1}{N} \sum_{i=1}^{N} ||y_{1,i} - \hat{y}_{1,i}||_{\mathbb{R}^{p_1}}^2, \quad D_{Y_2}(y_2^N, \hat{y}_2^N) = \frac{1}{N} \sum_{i=1}^{N} ||y_{2,i} - \hat{y}_{2,i}||_{\mathbb{R}^{p_2}}^2 \qquad (3)$
where $||\cdot||_{\mathbb{R}^{p_i}}^2$ are Euclidean distances on $\mathbb{R}^{p_i}$, $i = 1, 2$.
The encoder takes as its input the data sequences ( Y 1 N , Y 2 N ) and produces at its output three messages, ( S 0 , S 1 , S 2 ) , with binary bit representations ( N R 0 , N R 1 , N R 2 ) , respectively. There are three channels, channel 0, channel 1, channel 2, with capacities ( C 0 , C 1 , C 2 ) (in bits per second), respectively, to transmit the messages to two decoders. Channel 0 is a common channel and channel 1 and channel 2 are the private channels which connect the encoder to each of the two decoders. Message S 0 is a common or public message that is transmitted through the common channel 0 with capacity C 0 to decoder 1 and decoder 2; S 1 is a private message which is transmitted through the private channel 1 with capacity C 1 to decoder 1; and S 2 is a private message, which is transmitted through the private channel 2 with capacity C 2 to decoder 2.
Decoder 1 aims to reproduce $Y_1^N$ by $\hat{Y}_1^N$ subject to an average distortion and decoder 2 aims to reproduce $Y_2^N$ by $\hat{Y}_2^N$, subject to an average distortion, where $(\hat{Y}_{1,i}, \hat{Y}_{2,i}) = (\hat{y}_{1,i}, \hat{y}_{2,i}) \in \hat{\mathbb{Y}}_1 \times \hat{\mathbb{Y}}_2 \subseteq \mathbb{Y}_1 \times \mathbb{Y}_2$, $i = 1, \ldots, N$, that is,
$E\big\{D_{Y_1}(Y_1^N, \hat{Y}_1^N)\big\} \le \Delta_1, \quad E\big\{D_{Y_2}(Y_2^N, \hat{Y}_2^N)\big\} \le \Delta_2, \quad (\Delta_1, \Delta_2) \in [0, \infty] \times [0, \infty].$
Gray and Wyner characterized the rate region, denoted by $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$, via a coding scheme that uses the auxiliary random variable $W$, and the family of probability distributions
$\mathcal{P} \triangleq \big\{ P_{Y_1, Y_2, W}(y_1, y_2, w),\ y_1 \in \mathbb{Y}_1,\ y_2 \in \mathbb{Y}_2,\ w \in \mathbb{W} \ \big|\ P_{Y_1, Y_2, W}(y_1, y_2, \mathbb{W}) = P_{Y_1, Y_2}(y_1, y_2) \big\}$
such that the joint probability distribution $P_{Y_1, Y_2, W}(y_1, y_2, w)$ on $\mathbb{Y}_1 \times \mathbb{Y}_2 \times \mathbb{W}$ has a $(Y_1, Y_2)$-marginal probability distribution $P_{Y_1, Y_2}(y_1, y_2)$ on $\mathbb{Y}_1 \times \mathbb{Y}_2$ that coincides with the probability distribution of $(Y_1, Y_2)$.
For the source and distortion functions specified in (2) and (3), we apply the weak stochastic realization to construct the family of distributions $\mathcal{P}$, which is parameterized by the auxiliary random variable $W$. The characterization of $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$ in terms of an auxiliary random variable is as follows.
Theorem 1
(Theorem 8 in [1]). Let $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$ denote the Gray and Wyner rate region of the simple network shown in Figure 1.
Suppose there exists $\hat{y}_i \in \hat{\mathbb{Y}}_i$ such that $E\{d_{Y_i}(Y_i, \hat{y}_i)\} < \infty$, for $i = 1, 2$.
For each $P_{Y_1, Y_2, W} \in \mathcal{P}$ and $\Delta_1 \ge 0$, $\Delta_2 \ge 0$, define the subset of Euclidean 3-dimensional space
$\mathcal{R}_{GW}^{P_{Y_1, Y_2, W}}(\Delta_1, \Delta_2) = \big\{ (R_0, R_1, R_2) \ \big|\ R_0 \ge I(Y_1, Y_2; W),\ R_1 \ge R_{Y_1 | W}(\Delta_1),\ R_2 \ge R_{Y_2 | W}(\Delta_2) \big\}$
where $R_{Y_i | W}(\Delta_i)$ is the conditional rate distortion function of $Y_i^N$, conditioned on $W^N$, at decoder $i$, for $i = 1, 2$, and $R_{Y_1, Y_2}(\Delta_1, \Delta_2)$ is the joint rate distortion function of the joint decoding of $(Y_1^N, Y_2^N)$ (all single letters). Let
$\mathcal{R}_{GW}^{*}(\Delta_1, \Delta_2) = \Big( \bigcup_{P_{Y_1, Y_2, W} \in \mathcal{P}} \mathcal{R}_{GW}^{P_{Y_1, Y_2, W}}(\Delta_1, \Delta_2) \Big)^{c}$
where $\{\cdot\}^{c}$ denotes the closure of the indicated set. The achievable Gray–Wyner lossy rate region is given by
$\mathcal{R}_{GW}(\Delta_1, \Delta_2) = \mathcal{R}_{GW}^{*}(\Delta_1, \Delta_2).$
Gray and Wyner [1] (Theorem 6) also showed that, if $(R_0, R_1, R_2) \in \mathcal{R}_{GW}(\Delta_1, \Delta_2)$, then
$R_0 + R_1 + R_2 \ge R_{Y_1, Y_2}(\Delta_1, \Delta_2), \qquad (9)$
$R_0 + R_1 \ge R_{Y_1}(\Delta_1), \qquad (10)$
$R_0 + R_2 \ge R_{Y_2}(\Delta_2) \qquad (11)$
where $R_{Y_i}(\Delta_i)$ is the rate distortion function of $Y_i^N$ at decoder $i$, for $i = 1, 2$, and $R_{Y_1, Y_2}(\Delta_1, \Delta_2)$ is the joint rate distortion function of $(Y_1^N, Y_2^N)$ at the two decoders. The inequality in (9) is called the Pangloss Bound of the Gray–Wyner lossy rate region $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$. The set of triples $(R_0, R_1, R_2) \in \mathcal{R}_{GW}(\Delta_1, \Delta_2)$ that satisfy the equality $R_0 + R_1 + R_2 = R_{Y_1, Y_2}(\Delta_1, \Delta_2)$ is called the Pangloss Plane of the Gray–Wyner lossy rate region $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$.
Gray and Wyner [1] ((4) of page 1703, Equation (42)) also proved that $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$ is determined from
$T(\alpha_1, \alpha_2) \triangleq \inf_{P_{Y_1, Y_2, W} \in \mathcal{P}} \Big\{ I(Y_1, Y_2; W) + \alpha_1 R_{Y_1 | W}(\Delta_1) + \alpha_2 R_{Y_2 | W}(\Delta_2) \Big\}$
where $0 \le \alpha_i \le 1$, $i = 1, 2$, $\alpha_1 + \alpha_2 \ge 1$, and where, for each $P_{Y_1, Y_2, W} \in \mathcal{P}$, the conditional distribution $P_{Y_1, Y_2 | W}$ is defined, from which follow the $Y_i$ marginals $P_{Y_i | W}$, $i = 1, 2$.
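To make the parameterization above concrete, the following numerical sketch (an illustration added here, not taken from [1]) evaluates a rate triple $(R_0, R_1, R_2)$ of the form appearing in Theorem 1 for the scalar case $p_1 = p_2 = 1$ with unit variances and correlation $\rho \in (0, 1)$, using the classical choice $Y_i = \sqrt{\rho}\, W + Z_i$ with $W, Z_1, Z_2$ independent Gaussian; it assumes the standard Gaussian (conditional) rate distortion formulas with square-error distortion, and the function name is ours.

```python
import numpy as np

def scalar_gray_wyner_triple(rho: float, delta1: float, delta2: float):
    """Rate triple (R0, R1, R2) in nats for unit-variance scalar Gaussian
    (Y1, Y2) with correlation rho, using W such that Y_i = sqrt(rho)*W + Z_i."""
    assert 0.0 < rho < 1.0
    # Common rate: I(Y1, Y2; W) = 0.5 * ln((1 + rho) / (1 - rho)).
    R0 = 0.5 * np.log((1.0 + rho) / (1.0 - rho))
    # For this choice of W, the conditional variance of Y_i given W is 1 - rho.
    var_cond = 1.0 - rho
    # Private rates: conditional Gaussian RDFs with square-error distortion.
    R1 = max(0.5 * np.log(var_cond / delta1), 0.0)
    R2 = max(0.5 * np.log(var_cond / delta2), 0.0)
    return R0, R1, R2

if __name__ == "__main__":
    print(scalar_gray_wyner_triple(rho=0.8, delta1=0.1, delta2=0.1))
```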
(b) Wyner’s common Information of finite-valued random variables.
Wyner [2] introduced an operational definition of the common information between a tuple of random variables ( Y 1 N , Y 2 N ) that takes values in finite spaces.
The first approach of Wyner’s operational definition of common information between sequences Y 1 N and Y 2 N is defined as the minimum achievable common message rate R 0 on the Gray–Wyner network of Figure 1.
Wyner’s single letter information theoretic characterization of the infimum of all achievable message rates R 0 , called Wyner’s common information, is defined by
$C(Y_1, Y_2) \triangleq \inf_{P_{Y_1, Y_2, W}:\ P_{Y_1, Y_2 | W} = P_{Y_1 | W} P_{Y_2 | W}} I(Y_1, Y_2; W). \qquad (13)$
Here, $P_{Y_1, Y_2, W}$ is any joint probability distribution on $\mathbb{Y}_1 \times \mathbb{Y}_2 \times \mathbb{W}$ with $(Y_1, Y_2)$-marginal $P_{Y_1, Y_2}$, such that $W$ makes $Y_1$ and $Y_2$ conditionally independent, that is, $P_{Y_1, Y_2, W} \in \mathcal{P}$.
(c) Minimum common message rate and Wyner’s lossy common information for arbitrary random variables.
Viswanatha, Akyol and Rose [12], and Xu, Liu and Chen [13] explored the connection of Wyner’s common information and the Gray–Wyner lossy rate region, to provide a new interpretation of Wyner’s common information to its lossy counterpart.
The following characterization was derived by Xu, Liu and Chen [13] (an equivalent characterization was also derived by Viswanatha, Akyol and Rose [12]).
Theorem 2
(Theorem 4 in [13]). Suppose there exists $\hat{y}_i \in \hat{\mathbb{Y}}_i$ such that $E\{d_{Y_i}(Y_i, \hat{y}_i)\} < \infty$, for $i = 1, 2$.
Let $C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2)$ denote the minimum common message rate $R_0$ on the Gray–Wyner lossy rate region $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$, with a sum rate not exceeding the joint rate distortion function, $\sum_{i=0}^{2} R_i \le R_{Y_1, Y_2}(\Delta_1, \Delta_2)$, while satisfying the average distortions.
Then, $C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2)$ is characterized by the optimization problem
$C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2) = \inf I(Y_1, Y_2; W)$
such that the following identity holds:
$R_{Y_1 | W}(\Delta_1) + R_{Y_2 | W}(\Delta_2) + I(Y_1, Y_2; W) = R_{Y_1, Y_2}(\Delta_1, \Delta_2)$
where the infimum is over all random variables W taking values in W , which parameterize the source distribution via P Y 1 , Y 2 , W , having a Y 1 × Y 2 marginal source distribution P Y 1 , Y 2 , and induce joint distributions P W , Y 1 , Y 2 , Y ^ 1 , Y ^ 2 which satisfy the constraints.
It is shown in [12,13] that there exists a distortion region such that $C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2) = C_W(Y_1, Y_2)$, i.e., it is independent of the distortions $(\Delta_1, \Delta_2)$, and $C_W(Y_1, Y_2) = C(Y_1, Y_2)$, i.e., it is equal to Wyner's information theoretic characterization of common information between $Y_1$ and $Y_2$, defined by (13). However, their proofs that $W$ is a finite-dimensional Gaussian random variable rely on the assumption that $W$ is continuous-valued.
The next theorem is derived by Xu, Liu and Chen [13].
Theorem 3
(Theorem 5 in [13]). Let ( Y 1 , Y 2 ) be a pair of random variables with distribution P Y 1 , Y 2 on the alphabet space Y 1 × Y 2 , where Y 1 and Y 2 are arbitrary measurable spaces that can be discrete or continuous.
Let W be any random variable achieving C ( Y 1 , Y 2 ) defined by (13).
Let the reproduction alphabets $\hat{\mathbb{Y}}_1 = \mathbb{Y}_1$, $\hat{\mathbb{Y}}_2 = \mathbb{Y}_2$ and two per-letter distortion measures $d_{Y_1}(y_1, \hat{y}_1)$, $d_{Y_2}(y_2, \hat{y}_2)$ satisfy
$d_{Y_i}(y_i, \hat{y}_i) > d_{Y_i}(y_i, y_i) = 0, \quad \forall y_i \ne \hat{y}_i, \quad i = 1, 2.$
If the following conditions are satisfied:
(1) For any $y_1 \in \mathbb{Y}_1$, $y_2 \in \mathbb{Y}_2$ and $w \in \mathbb{W}$, $P_{W | Y_1, Y_2} > 0$;
(2) There exists a $\hat{y}_i \in \hat{\mathbb{Y}}_i$ such that
$E\big\{ d_{Y_i}(Y_i, \hat{y}_i) \big\} < \infty, \quad i = 1, 2,$
then there exists a strictly positive vector $\gamma = (\gamma_1, \gamma_2) \in (0, \infty) \times (0, \infty)$ such that, for $0 \le (\Delta_1, \Delta_2) \le \gamma$,
$C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2) = C_W(Y_1, Y_2) = C(Y_1, Y_2).$
Moreover, $C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2)$ is constant on $\mathcal{D}_W \triangleq \big\{ (\Delta_1, \Delta_2) \in [0, \infty] \times [0, \infty] : 0 \le (\Delta_1, \Delta_2) \le \gamma \big\}$.
The analog of the above theorem is also derived by Viswanatha, Akyol and Rose in [12] (Lemma 1). A subset of the Pangloss plane is derived by Gray and Wyner [1] (Theorem 9).
For bivariate Gaussian random variables, i.e., $p_1 = p_2 = 1$, with square-error distortions, Viswanatha, Akyol and Rose [12], and Xu, Liu and Chen [13], computed $C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2)$ by using Xiao and Luo's closed-form expression of the joint rate distortion function $R_{Y_1, Y_2}(\Delta_1, \Delta_2)$ [14] (Theorem 6). In addition, for bivariate Gaussian random variables with symmetric square-error distortions, i.e., $\Delta_1 = \Delta_2 = \Delta$, Gray and Wyner [1] (Section 2.5, (B)) computed a rate-triple $(R_0, R_1, R_2) \in \mathcal{R}_{GW}(\Delta_1, \Delta_2)$ that lies on the Pangloss plane.

1.2. Main Theorems and Discussion

What follows is a brief summary of the main theorems derived in this paper, and relations to the literature.
Theorem 9 shows that, among all joint distributions P Y 1 , Y 2 , W induced by a tuple of multivariate correlated Gaussian random variables ( Y 1 , Y 2 ) , and an arbitrary random variable W : Ω W , continuous or discrete-valued, Wyner’s common information C ( Y 1 , Y 2 ) , defined by (13), is minimized by a triple ( Y 1 , Y 2 , W ) which induces a jointly Gaussian distribution P Y 1 , Y 2 , W , and W : Ω W = R n is a finite-dimensional Gaussian random variable. In particular, Theorem 9 gives the weak stochastic realization of ( Y 1 , Y 2 ) , and the construction of the random variable W, which induce a joint distribution P Y 1 , Y 2 , W that achieves the minimum of I ( Y 1 , Y 2 ; W ) such that W makes Y 1 and Y 2 conditionally independent.
Then, use is made of Theorem 9, Section 2.2, such as Definition 1 of the canonical variable form and the weak stochastic realization to derive Wyner’s common information C ( Y 1 , Y 2 ) defined by (13), and the optimal realization of the triple ( Y 1 , Y 2 , W ) = ( Y 1 , Y 2 , W ) that achieves C ( Y 1 , Y 2 ) , as stated in the next theorem.
Theorem 4.
Consider a tuple of Gaussian random variables $Y_i: \Omega \to \mathbb{R}^{p_i}$, with $Q_{Y_i} > 0$, for $i = 1, 2$, $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$, $Q_{(Y_1, Y_2)} \ge 0$, and apply Algorithm A1 (and the notation therein) to decompose and transform the random variables into a canonical variable form (with abuse of notation, the transformed random variables are again denoted by $(Y_1, Y_2) \in G(0, Q_{\mathrm{cvf}})$), using the material and notation of Section 2.2, i.e., Definition 1.
(a) Then,
C ( Y 1 , Y 2 ) = C ( Y 11 , Y 21 ) + C ( Y 12 , Y 22 ) + C ( Y 13 , Y 23 ) = 0 , i f p 13 > 0 , p 23 > 0 , p 11 = p 12 = p 21 = p 22 = 0 , 1 2 i = 1 n ln 1 + d i 1 d i , i f p 12 = p 22 > 0 , p 11 = p 21 = 0 , p 13 0 , p 23 0 , + , i f p 11 = p 21 > 0
where ( p 11 , p 12 , p 13 ) and ( p 21 , p 22 , p 23 ) are the dimensions of the canonical variable decomposition of the tuple ( Y 1 , Y 2 ) , and
C ( Y 11 , Y 21 ) = + , i f p 11 = p 21 > 0 ;
C ( Y 13 , Y 23 ) = 0 , i f p 13 > 0 a n d p 23 > 0 ;
C ( Y 12 , Y 22 ) = 1 2 i = 1 n ln 1 + d i 1 d i , i f n = p 12 = p 22 > 0 .
Thus, C ( Y 12 , Y 22 ) is the most interesting value if defined.
(b) The random variable $W$ defined below is such that $C(Y_1, Y_2)$ of part (a) is attained:
$W: \Omega \to \mathbb{R}^n, \quad n \in \mathbb{Z}_+, \quad n_1 = p_{11} = p_{21}, \quad n_2 = p_{12} = p_{22}, \quad n_1 + n_2 = n, \quad W = \begin{pmatrix} W_1 \\ W_2 \end{pmatrix}, \quad W_1: \Omega \to \mathbb{R}^{n_1}, \quad W_2: \Omega \to \mathbb{R}^{n_2}, \quad W_1 = Y_{11} = Y_{21},$
$W_2 = L_1 Y_{12} + L_2 Y_{22} + L_3 V, \quad \text{see Theorem 11.(b) for the formulas of } L_1, L_2, L_3,$
where the following properties hold:
then $(Y_1, Y_2, W) \in G(0, Q_s(I))$, see (81) for $Q_s(I)$,
$(\mathcal{F}^{Y_{11}, Y_{12}, Y_{13}}, \mathcal{F}^{Y_{21}, Y_{22}, Y_{23}} \,|\, \mathcal{F}^{W_1, W_2}) \in \mathrm{CI},$
F W 1 ( F Y 11 F Y 21 ) , F W 2 ( F Y 12 F Y 22 ) ,
$C(Y_1, Y_2) = I(Y_1, Y_2; W).$
(c) The following operations are defined, using (a):
$W = \begin{pmatrix} W_1 \\ W_2 \end{pmatrix},$
$W_1 = Y_{11} = Y_{21},$
$W_2 = L_1 Y_{12} + L_2 Y_{22} + L_3 V, \quad \text{see (103), (104) for the formulas of } L_1, L_2, L_3;$
$Z_{12} = Y_{12} - E[Y_{12} | \mathcal{F}^{W_2}] = Y_{12} - Q_{Y_{12}, W_2} Q_{W_2}^{-1} W_2,$
$Z_{22} = Y_{22} - E[Y_{22} | \mathcal{F}^{W_2}] = Y_{22} - Q_{Y_{22}, W_2} Q_{W_2}^{-1} W_2,$
$Z_{13} = Y_{13}, \quad Z_{23} = Y_{23}, \quad \text{(the components } Z_{11} \text{ and } Z_{21} \text{ do not exist)},$
$Z_1 = \begin{pmatrix} Z_{12} \\ Z_{13} \end{pmatrix}, \quad Z_2 = \begin{pmatrix} Z_{22} \\ Z_{23} \end{pmatrix},$
and these imply
$Y_{11} = W_1 = Y_{21},$
$Y_{12} = Q_{Y_{12}, W_2} Q_{W_2}^{-1} W_2 + Z_{12}, \quad Y_{22} = Q_{Y_{22}, W_2} Q_{W_2}^{-1} W_2 + Z_{22},$
$Y_{13} = Z_{13}, \quad Y_{23} = Z_{23};$
equivalently,
$\begin{pmatrix} Y_{11} \\ Y_{12} \\ Y_{13} \end{pmatrix} = \begin{pmatrix} I_{n_1} & 0 \\ 0 & Q_{Y_{12}, W_2} Q_{W_2}^{-1} \\ 0 & 0 \end{pmatrix} \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ I_{n_2} & 0 \\ 0 & I_{n - n_1 - n_2} \end{pmatrix} \begin{pmatrix} Z_{12} \\ Z_{13} \end{pmatrix}, \quad \begin{pmatrix} Y_{21} \\ Y_{22} \\ Y_{23} \end{pmatrix} = \begin{pmatrix} I_{n_1} & 0 \\ 0 & Q_{Y_{22}, W_2} Q_{W_2}^{-1} \\ 0 & 0 \end{pmatrix} \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ I_{n_2} & 0 \\ 0 & I_{n - n_1 - n_2} \end{pmatrix} \begin{pmatrix} Z_{22} \\ Z_{23} \end{pmatrix}.$
The derivation of Theorem 4 is presented in Section 3.2, after several of the tools are presented, such as, weak stochastic realizations and minimal realizations.
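As a numerical illustration of the formula in Theorem 4 (a), the following sketch (ours, with a hypothetical function name) evaluates $C(Y_1, Y_2) = \frac{1}{2} \sum_{j} \ln \frac{1 + d_j}{1 - d_j}$ once the canonical correlation coefficients of the correlated parts are available, returns $0$ when only private parts are present, and returns $+\infty$ when an identical part is present, matching the three cases of the theorem.

```python
import numpy as np

def wyner_common_information(d) -> float:
    """C(Y1, Y2) in nats from the canonical correlation coefficients d of the
    correlated parts (Theorem 4 (a)).  An empty d corresponds to only private
    parts (C = 0); any d_j = 1 corresponds to an identical part (C = +inf)."""
    d = np.asarray(d, dtype=float)
    if d.size == 0:                 # p12 = p22 = 0: only private parts
        return 0.0
    if np.any(d >= 1.0):            # p11 = p21 > 0: identical parts present
        return np.inf
    return 0.5 * np.sum(np.log((1.0 + d) / (1.0 - d)))

# Example: two correlated components with d = (0.9, 0.5)
print(wyner_common_information([0.9, 0.5]))   # approx. 1.472 + 0.549
```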
Remark 1.
Relation of Theorem 4 to the literature.
(a) Corollary 1 in [15] gives an expression analogous to the case (20), which is expressed in terms of the correlation coefficients $\rho_i \in (-1, 1)$ and not the canonical correlation coefficients $d_i \in (0, 1)$. Similarly, [16], under Lemma 1, reproduces Corollary 1 in [15], with the correlation coefficients $\rho_i$ replaced by their absolute values $|\rho_i|$.
(b) The derivation in [15,16] is based on the use of rate distortion functions of Gaussian random variables with square-error distortion functions, which presupposes that the auxiliary RV $W \in \mathbb{W}$ takes continuous values.
(c) Refs. [15,16] do not provide a realization of the triple $(Y_1, Y_2, W)$, as given in Theorem 4 (which is based on applying the parametrization of Theorem 8).
On the other hand, the derivation of Theorem 4 is based on Theorem 9, which shows that, among all joint distributions $P_{Y_1, Y_2, W}$ induced by a tuple of multivariate correlated Gaussian random variables $(Y_1, Y_2)$ and an arbitrary random variable $W: \Omega \to \mathbb{W}$, continuous or discrete-valued, Wyner's common information $C(Y_1, Y_2)$, defined by (13), is minimized by a triple $(Y_1, Y_2, W)$ which induces a jointly Gaussian distribution $P_{Y_1, Y_2, W}$, with $W: \Omega \to \mathbb{W} = \mathbb{R}^n$ a finite-dimensional Gaussian random variable.
(d) The derivation of Theorem 4 contains many intermediate results which are applicable to the problems considered in [15,16], such as Relaxed Wyner’s Common Information in [17]. These are discussed in Section 4.3.
Theorem 5 gives a parametric characterization of the Gray and Wyner rate region R G W ( Δ 1 , Δ 2 ) , with respect to the variance matrix of the triple of jointly Gaussian random variables ( Y 1 , Y 2 , W ) .
Theorem 5.
Consider a tuple of Gaussian random variables $Y_i: \Omega \to \mathbb{R}^{p_i}$, with $Q_{Y_i} > 0$, for $i = 1, 2$, $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$ (not necessarily in canonical variable form), with induced Gaussian measure $P_0 = G(0, Q_{(Y_1, Y_2)})$ on the space $(\mathbb{R}^{p_1} \times \mathbb{R}^{p_2}, \mathcal{B}(\mathbb{R}^{p_1}) \otimes \mathcal{B}(\mathbb{R}^{p_2}))$, and square-error distortion functions $D_{Y_1}(y_1, \hat{y}_1) = ||y_1 - \hat{y}_1||_{\mathbb{R}^{p_1}}^2$, $D_{Y_2}(y_2, \hat{y}_2) = ||y_2 - \hat{y}_2||_{\mathbb{R}^{p_2}}^2$.
The following hold.
(a) There exists a Gaussian measure $P_1 = G(0, Q_{(Y_1, Y_2, W)})$ defined on the space $(\mathbb{R}^{p_1} \times \mathbb{R}^{p_2} \times \mathbb{R}^n, \mathcal{B}(\mathbb{R}^{p_1}) \otimes \mathcal{B}(\mathbb{R}^{p_2}) \otimes \mathcal{B}(\mathbb{R}^n))$, $n \in \mathbb{Z}_+$, associated with the Gaussian random variables $(Y_1, Y_2)$, $W: \Omega \to \mathbb{R}^n$, $W \in G(0, Q_W)$, such that $P_1 |_{\mathbb{R}^{p_1} \times \mathbb{R}^{p_2}} = G(0, Q_{(Y_1, Y_2)})$. Moreover, a realization of the random variables $(Y_1, Y_2, W)$ with induced measure $P_1 = G(0, Q_{(Y_1, Y_2, W)})$ is
$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = Q_{(Y_1, Y_2), W} Q_W^{\dagger} W + \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix},$
$(Z_1, Z_2) \in G(0, Q_{(Z_1, Z_2)}), \quad (Z_1, Z_2) \ \text{independent of} \ W,$
$Q_{(Y_1, Y_2), W} = E\Big[ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} W^T \Big] = \begin{pmatrix} Q_{Y_1, W} \\ Q_{Y_2, W} \end{pmatrix}$
where $Q_W^{\dagger}$ is the pseudoinverse of $Q_W$.
(b) For the Gaussian auxiliary random variables given in part (a), the Gray–Wyner rate region $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$ is determined from
$T^{G}(\alpha_1, \alpha_2) \triangleq \inf_{(Y_1, Y_2, W) \in G(0, Q_{(Y_1, Y_2, W)}) \ \text{of (68)}} \Big\{ I(Y_1, Y_2; W) + \alpha_1 R_{Y_1 | W}(\Delta_1) + \alpha_2 R_{Y_2 | W}(\Delta_2) \Big\}$
$= \inf_{Q_{Y_1, Y_2 | W},\ Q_{Y_i | W},\ i = 1, 2} \Big\{ \frac{1}{2} \ln \Big[ \frac{\det(Q_{(Y_1, Y_2)})}{\det(Q_{Y_1 | Y_2, W}) \det(Q_{Y_2 | W})} \Big]^{+} + \alpha_1 R_{Y_1 | W}(\Delta_1) + \alpha_2 R_{Y_2 | W}(\Delta_2) \Big\}$
$\text{subject to} \quad Q_{(Y_1, Y_2)} \ge Q_{(Y_1, Y_2) | W}, \quad Q_{Y_1 | Y_2, W} = Q_{Y_1 | W} - Q_{Y_1, Y_2 | W} Q_{Y_2 | W}^{-1} Q_{Y_1, Y_2 | W}^{T}$
where $0 \le \alpha_i \le 1$, $i = 1, 2$, $\alpha_1 + \alpha_2 \ge 1$, $I(Y_1, Y_2; W) = H(Y_1, Y_2) - H(Y_1 | Y_2, W) - H(Y_2 | W)$, $R_{Y_i | W}(\Delta_i)$, $i = 1, 2$, are given in Theorem 13.(b), and $\{\cdot\}^{+} = \max\{1, \cdot\}$.
The derivation of Theorem 5 is presented in Section 4.4, after the structural properties of the RDFs $R_{Y_1, Y_2}(\Delta_1, \Delta_2)$, $R_{Y_i | W}(\Delta_i)$, $R_{Y_i}(\Delta_i)$, $i = 1, 2$, of Theorem 12, Theorem 13 and Theorem 14 are presented. From Theorem 5 follow simplified characterizations of subsets of the rate region $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$, such as rates that lie on the Pangloss Plane, and rates that correspond to a $W$ that makes $Y_1$ and $Y_2$ conditionally independent, i.e., $W$ is such that $P_{Y_1, Y_2 | W} = P_{Y_1 | W} P_{Y_2 | W}$.
Utilizing the structural properties of RDFs, R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , R Y i ( Δ i ) , i = 1 , 2 , of Theorem 12, Theorem 13, Theorem 14, and Theorem 4, the next theorem is obtained, which gives the formula of Wyner’s lossy common information C G W ( Y 1 , Y 2 ; Δ 1 , Δ 2 ) = C W ( Y 1 , Y 2 ) .
Theorem 6.
Consider a tuple $(Y_1, Y_2)$ of Gaussian random variables in the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79), and to the subset of the distortion region defined by
$\mathcal{D}_W \triangleq \big\{ (\Delta_1, \Delta_2) \in [0, \infty] \times [0, \infty] \ \big|\ 0 \le \Delta_1 \le n(1 - d_1),\ 0 \le \Delta_2 \le n(1 - d_1) \big\},$
$\forall j \in \mathbb{Z}_n, \ d_j \in (0, 1).$
Then, Wyner's lossy common information (the calculation of the expression in Theorem 3) is given by
$C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2) = C_W(Y_1, Y_2) = C(Y_1, Y_2) = \frac{1}{2} \sum_{j=1}^{n} \ln \frac{1 + d_j}{1 - d_j}, \quad \forall (\Delta_1, \Delta_2) \in \mathcal{D}_W.$
The derivation of Theorem 6 is presented in Section 4.2 and makes use of a degenerate version of the realization of the triple ( Y 1 , Y 2 , W ) given in Theorem 4, and the RDFs R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , i = 1 , 2 .
Remark 2.
By Theorem 5, a subset of the Gray–Wyner rate region is obtained by replacing ( Y 1 , Y 2 , W ) G ( 0 , Q ( Y 1 , Y 2 , W ) ) of (39) and (40) by W that makes Y 1 and Y 2 conditionally independent, i.e., ( Z 1 , Z 2 ) G ( 0 , Q ( Z 1 , Z 2 ) ) and ( Z 1 , Z 2 , W ) mutually independent (e.g., Q ( Z 1 , Z 2 ) is block-diagonal).

1.3. Structure of the Paper

Section 2 introduces the mathematical tools of the geometric approach to Gaussian random variables and the weak stochastic realization of conditional independence (Section 2.4).
Section 3 contains the problem statement, the solution procedure, the weak realization of a tuple of multivariate random variables $(Y_1, Y_2)$ such that another multivariate Gaussian random variable $W$ makes $Y_1$ and $Y_2$ conditionally independent (Section 2.5), and the calculation of $C_W(Y_1, Y_2) = C(Y_1, Y_2)$.
Section 4 is concerned with the characterization of the Gray–Wyner rate region R G W ( Δ 1 , Δ 2 ) , the characterization of rates that lie on the Pangloss Plane, and Wyner’s lossy common information. This section includes calculations of the rate distortion functions R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , R Y i ( Δ i ) , i = 1 , 2 , the weak stochastic realizations of the random variables ( Y 1 , Y 2 , Y ^ 1 , Y ^ 2 , W ) which achieve these rate distortion functions, for jointly multivariate Gaussian random variables with square-error distortion functions.
Section 5 includes remarks on possible extensions.
Appendix A.3 makes use of a matrix equality and a determinant inequality first obtained by Hua Loo-Keng in 1952, which are used to carry out the optimization problem of Wyner's lossy common information $C_W(Y_1, Y_2) = C(Y_1, Y_2)$.

2. Probabilistic Properties of Tuples of Random Variables

The reader finds in this section the basic properties associated with:
(1) the transformation of a tuple of Gaussian multivariate random variables $(Y_1, Y_2)$ into their canonical variable form, and
(2) the parameterization of all jointly Gaussian distributions $P_{Y_1, Y_2, W}(y_1, y_2, w)$ by a zero-mean Gaussian random variable $W: \Omega \to \mathbb{R}^{k_W}$ such that (a) $W$ makes the multivariate random variables $(Y_1, Y_2)$ conditionally independent, and (b) the marginal distribution $P_{Y_1, Y_2, W}(y_1, y_2, \mathbb{W}) = P_{Y_1, Y_2}(y_1, y_2)$ coincides with the joint distribution of the multivariate random variables $(Y_1, Y_2)$.

2.1. Notation of Elements of Probability Theory

The notation used in this paper is briefly specified. Denote by $\mathbb{Z}_+ = \{1, 2, \ldots\}$ the set of positive integers and by $\mathbb{N} = \{0, 1, 2, \ldots\}$ the set of natural integers. For $n \in \mathbb{Z}_+$, denote the following finite subsets of the above defined sets by $\mathbb{Z}_n = \{1, 2, \ldots, n\}$ and $\mathbb{N}_n = \{0, 1, 2, \ldots, n\}$.
Denote the real numbers by $\mathbb{R}$ and the sets of positive and of strictly positive real numbers, respectively, by $\mathbb{R}_+ = [0, \infty)$ and $\mathbb{R}_{++} = (0, \infty) \subset \mathbb{R}$. The vector space of $n$-tuples of real numbers is denoted by $\mathbb{R}^n$. Denote the Borel $\sigma$-algebra on this vector space by $\mathcal{B}(\mathbb{R}^n)$; hence, $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ is a measurable space.
The expression $\mathbb{R}^{n \times m}$ denotes the set of $n$ by $m$ matrices with elements in the real numbers, for $n, m \in \mathbb{Z}_+$. For a symmetric matrix $Q \in \mathbb{R}^{n \times n}$, the inequality $Q \ge 0$ denotes that for all vectors $u \in \mathbb{R}^n$ the inequality $u^T Q u \ge 0$ holds. Similarly, $Q > 0$ denotes that for all $u \in \mathbb{R}^n \setminus \{0\}$, $u^T Q u > 0$. The notation $Q_1 \le Q_2$ denotes that $Q_2 - Q_1 \ge 0$.
Consider a probability space denoted by ( Ω , F , P ) consisting of a set Ω , a σ -algebra F of subsets of Ω , and a probability measure P : F [ 0 , 1 ] .
A real-valued random variable is a function $X: \Omega \to \mathbb{R}$ such that the following set belongs to the indicated $\sigma$-algebra: $\{\omega \in \Omega \,|\, X(\omega) \in (-\infty, u]\} \in \mathcal{F}$ for all $u \in \mathbb{R}$. A random variable taking values in an arbitrary measurable space $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ is defined correspondingly by $X: \Omega \to \mathbb{X}$ and $X^{-1}(A) = \{\omega \in \Omega \,|\, X(\omega) \in A\} \in \mathcal{F}$, for all $A \in \mathcal{B}(\mathbb{X})$. The measure (or distribution if $\mathbb{X}$ is a Euclidean space) induced by the random variable on $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ is denoted by $P_X$ or $P(dx)$. The $\sigma$-algebra generated by a random variable $X: \Omega \to \mathbb{X}$ is defined as the smallest $\sigma$-algebra containing the subsets $X^{-1}(A) \in \mathcal{F}$ for all $A \in \mathcal{B}(\mathbb{X})$. It is denoted by $\mathcal{F}^X$. The real-valued random variable $X$ is called $\mathcal{G}$-measurable for a $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$ if the subset $\{\omega \in \Omega \,|\, X(\omega) \in (-\infty, u]\} \in \mathcal{G}$ for all $u \in \mathbb{R}$. Denote the set of positive random variables which are measurable on a sub-$\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$ by
$L_+(\mathcal{G}) = \{X: \Omega \to \mathbb{R}_+ = [0, \infty) \,|\, X \ \text{is} \ \mathcal{G}\text{-measurable}\}.$
The tuple of sub- σ -algebras F 1 , F 2 F is called independent if E [ X 1 X 2 ] = E [ X 1 ] E [ X 2 ] for all X 1 L + ( F 1 ) and all X 2 L + ( F 2 ) . The definition can be extended to any finite set of independent sub- σ -algebras.

2.2. Geometric Approach of Gaussian Random Variables and Canonical Variable Form

The purpose of this section is to introduce the geometric approach of a tuple of finite-dimensional Gaussian random variables using the canonical variable form of the tuple introduced by H. Hotelling, [18]. The use of the geometric approach of two Gaussian random variables with respect to the computation of mutual information is elaborated by Gelfand and Yaglom in [19], making reference to an insight due to Kolmogorov. However, the canonical variable form is not given in [19].
An $\mathbb{R}^n$-valued Gaussian random variable with as parameters the mean value $m_X \in \mathbb{R}^n$ and the variance $Q_X \in \mathbb{R}^{n \times n}$, $Q_X = Q_X^T \ge 0$, is a function $X: \Omega \to \mathbb{R}^n$ which is a random variable and such that the measure of this random variable equals a Gaussian measure described by its characteristic function,
$E[\exp(i u^T X)] = \exp\big(i u^T m_X - \tfrac{1}{2} u^T Q_X u\big), \quad \forall u \in \mathbb{R}^n.$
Note that this definition includes the case in which the random variable is almost surely equal to a constant in which case Q X = 0 . A Gaussian random variable with these parameters is denoted by X G ( m X , Q X ) .
The effective dimension of the random variable is denoted by dim ( X ) = rank ( Q X ) .
Any tuple of random variables $X_1, \ldots, X_k$ is called jointly Gaussian if the vector $(X_1^T, X_2^T, \ldots, X_k^T)^T$ is a Gaussian random variable. A tuple of Gaussian random variables $(Y_1, Y_2)$ will be denoted this way to save space, rather than by the column vector $\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$.
Then, the variance matrix of this tuple is denoted by
$(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)}), \quad Q_{(Y_1, Y_2)} = \begin{pmatrix} Q_{Y_1} & Q_{Y_1, Y_2} \\ Q_{Y_1, Y_2}^T & Q_{Y_2} \end{pmatrix} \in \mathbb{R}^{(p_1 + p_2) \times (p_1 + p_2)}.$
The reader should distinguish the variance matrices $Q_{(Y_1, Y_2)}$ and $Q_{Y_1, Y_2} \in \mathbb{R}^{p_1 \times p_2}$. Any such tuple of Gaussian random variables is independent if and only if $Q_{Y_1, Y_2} = 0$.
Definition 1.
The canonical variable form.
Consider a tuple of Gaussian random variables $Y_i: \Omega \to \mathbb{R}^{p_i}$, with $Q_{Y_i} > 0$, for $i = 1, 2$, $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$, $Q_{(Y_1, Y_2)} \ge 0$. The tuple is said to be in the canonical variable form if a basis has been chosen and a transformation of the random variables to this basis has been carried out such that, with respect to the new basis, one has the representation
$(Y_1, Y_2) \in G(0, Q_{\mathrm{cvf}}), \quad \text{where}$
$Q_{\mathrm{cvf}} = \begin{pmatrix} I_{p_{11}} & 0 & 0 & I_{p_{21}} & 0 & 0 \\ 0 & I_{p_{12}} & 0 & 0 & D & 0 \\ 0 & 0 & I_{p_{13}} & 0 & 0 & 0 \\ I_{p_{21}} & 0 & 0 & I_{p_{21}} & 0 & 0 \\ 0 & D & 0 & 0 & I_{p_{22}} & 0 \\ 0 & 0 & 0 & 0 & 0 & I_{p_{23}} \end{pmatrix} \in \mathbb{R}^{p \times p}, \quad p, p_1, p_2, p_{11}, p_{12}, p_{13}, p_{21}, p_{22}, p_{23} \in \mathbb{N}, \quad p = p_1 + p_2, \quad p_1 = p_{11} + p_{12} + p_{13}, \quad p_2 = p_{21} + p_{22} + p_{23}, \quad p_{11} = p_{21}, \quad p_{12} = p_{22},$
$D = \mathrm{Diag}(d_1, \ldots, d_{p_{12}}), \quad 1 > d_1 \ge d_2 \ge \cdots \ge d_{p_{12}} > 0,$
$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} Y_{11} \\ Y_{12} \\ Y_{13} \\ Y_{21} \\ Y_{22} \\ Y_{23} \end{pmatrix}, \quad Y_{ij}: \Omega \to \mathbb{R}^{p_{ij}}, \quad i = 1, 2, \quad j = 1, 2, 3.$
One then says that $(Y_{11}, Y_{12}, Y_{13})$, $(Y_{21}, Y_{22}, Y_{23})$ are the canonical variables and $(d_1, \ldots, d_{p_{12}})$ the canonical correlation coefficients.
If Q ( Y 1 , Y 2 ) > 0 then necessarily p 11 = p 21 = 0 .
Appendix A.1 gives Algorithm A1 to transform the variance matrix $Q_{(Y_1, Y_2)} \ge 0$ by two nonsingular transformations $S_i \in \mathbb{R}^{p_i \times p_i}$, $i = 1, 2$, to its canonical variable form $Q_{\mathrm{cvf}}$ of Definition 1, such that
$S_1 Y_1 = (V_1, Y_{13}) = ((Y_{11}, Y_{12}), Y_{13}), \quad S_2 Y_2 = (V_2, Y_{23}) = ((Y_{21}, Y_{22}), Y_{23}),$
$Y_{11} = Y_{21} \ \text{a.s.}, \quad E[Y_{12} Y_{22}^T] = D.$
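Algorithm A1 itself is in Appendix A.1 and is not reproduced here; as a minimal sketch under the assumptions $Q_{Y_1} > 0$, $Q_{Y_2} > 0$, the canonical correlation coefficients it produces can be obtained by the standard computation below (the singular values of $Q_{Y_1}^{-1/2} Q_{Y_1, Y_2} Q_{Y_2}^{-1/2}$); the function name is ours.

```python
import numpy as np

def canonical_correlations(Q11, Q12, Q22):
    """Canonical correlation coefficients of (Y1, Y2) with variance blocks
    Q11 = Var(Y1) > 0, Q22 = Var(Y2) > 0 and cross-covariance Q12 = Cov(Y1, Y2).
    Standard computation: singular values of Q11^{-1/2} Q12 Q22^{-1/2}."""
    def inv_sqrt(Q):
        # Inverse symmetric square root via eigendecomposition (Q > 0 assumed).
        w, U = np.linalg.eigh(Q)
        return U @ np.diag(1.0 / np.sqrt(w)) @ U.T
    M = inv_sqrt(Q11) @ Q12 @ inv_sqrt(Q22)
    # Sorted decreasing: d_j = 1 identical part, d_j in (0,1) correlated part,
    # d_j = 0 private part.
    return np.linalg.svd(M, compute_uv=False)

# Example: scalar blocks with Var(Y1) = 2, Var(Y2) = 1, Cov(Y1, Y2) = 0.7.
print(canonical_correlations(np.array([[2.0]]), np.array([[0.7]]), np.array([[1.0]])))
```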
Proposition 1.
Properties of components of the canonical variable form.
Consider a tuple ( Y 1 , Y 2 ) G ( 0 , Q cvf ) of Gaussian random variables in the canonical variable form.
(a) The three components Y 11 , Y 12 , Y 13 of Y 1 are independent random variables. Similarly, the three components Y 21 , Y 22 , Y 23 of Y 2 are independent random variables.
(b) The equality Y 11 = Y 21 of these random variables holds almost surely.
(c) The tuple of random variables $(Y_{12}, Y_{22})$ is correlated as shown by the formula
$E[Y_{12} Y_{22}^T] = D = \mathrm{Diag}(d_1, \ldots, d_{p_{12}}).$
Note that the different components of $Y_{12}$ and of $Y_{22}$ are independent random variables; thus, $Y_{12,i}$ and $Y_{12,j}$ are independent, $Y_{22,i}$ and $Y_{22,j}$ are independent, and $Y_{12,i}$ and $Y_{22,j}$ are independent, for all $i \ne j$; while $Y_{12,j}$ and $Y_{22,j}$, for $j = 1, \ldots, p_{12} = p_{22}$, are correlated.
(d) The random variable Y 13 is independent of Y 2 . Similarly, the random variable Y 23 is independent of Y 1
Proof. 
The results are immediately obvious from the fact that the random variables are all jointly Gaussian and from the variance formula (51) of the canonical variable form. □
Next, the interpretation of the various components of the canonical variable form is defined, as in [20].
Definition 2.
Interpretation of components of the canonical variable form.
Consider a tuple of jointly Gaussian random variables $(Y_1, Y_2) \in G(0, Q_{\mathrm{cvf}})$ in the canonical variable form of Definition 1. Call the various components as defined in the following list:
$Y_{11} = Y_{21}$ a.s.: identical information of $Y_1$ and $Y_2$;
$Y_{12}$: correlated information of $Y_1$ with respect to $Y_2$;
$Y_{13}$: private information of $Y_1$ with respect to $Y_2$;
$Y_{21} = Y_{11}$ a.s.: identical information of $Y_1$ and $Y_2$;
$Y_{22}$: correlated information of $Y_2$ with respect to $Y_1$;
$Y_{23}$: private information of $Y_2$ with respect to $Y_1$.
For $Y_{11} = Y_{21}$ a.s., the term identical information is used.
Theorem 7 is a formula of the mutual information $I(Y_1; Y_2)$ for a general tuple of finite-dimensional Gaussian random variables $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$. This formula is the subject of much discussion in Gelfand and Yaglom [19] (see Equation (2.8') and Chapter II).
Theorem 7.
Consider a tuple of finite-dimensional Gaussian random variables $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$, $Q_{Y_i} > 0$, $i = 1, 2$.
Compute the canonical variable form of the tuple of Gaussian random variables according to Algorithm A1. This yields the indices $p_{11} = p_{21}$, $p_{12} = p_{22}$, $p_{13}$, $p_{23}$, and $n = p_{11} + p_{12} = p_{21} + p_{22}$, and the diagonal matrix $D$ with canonical correlation coefficients or singular values $d_i \in (0, 1)$ for $i = 1, \ldots, n$.
Then, the mutual information $I(Y_1; Y_2)$ is computed according to the formula
$I(Y_1; Y_2) = \begin{cases} 0, & \text{if } 0 = p_{11} = p_{12} = p_{21} = p_{22},\ p_{13} > 0,\ p_{23} > 0, \\ \frac{1}{2} \sum_{i=1}^{n} \ln \frac{1}{1 - d_i^2}, & \text{if } 0 = p_{11} = p_{21},\ p_{12} = p_{22} > 0,\ p_{13} \ge 0,\ p_{23} \ge 0, \\ \infty, & \text{if } p_{11} = p_{21} > 0,\ p_{12} = p_{22} \ge 0,\ p_{13} \ge 0,\ p_{23} \ge 0, \end{cases}$
where $d_i$ are the canonical correlation coefficients, i.e.,
$d_i = d_i(Y_{12,i}, Y_{22,i}) = \frac{E[Y_{12,i} Y_{22,i}]}{\sqrt{E[Y_{12,i}^2]\, E[Y_{22,i}^2]}} = E[Y_{12,i} Y_{22,i}], \quad i = 1, \ldots, n.$
Proof. 
The derivation is given in Appendix A.3 of [21] (since it is not given in [19]). □
By the last entry of (57), it is appropriate to consider only tuples $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$ such that $p_{11} = p_{21} = 0$, i.e., to remove the identical components prior to the analysis of mutual information problems.
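For contrast with Wyner's common information in Theorem 4, the second case of Theorem 7 can be evaluated directly from the canonical correlation coefficients; the short sketch below (ours, hypothetical function name) does this in nats.

```python
import numpy as np

def mutual_information_gaussian(d) -> float:
    """I(Y1; Y2) in nats from the canonical correlation coefficients d
    (Theorem 7): 0 if there are no identical or correlated parts,
    +inf if an identical part is present, else 0.5 * sum(ln(1 / (1 - d_j^2)))."""
    d = np.asarray(d, dtype=float)
    if d.size == 0:
        return 0.0
    if np.any(d >= 1.0):
        return np.inf
    return 0.5 * np.sum(np.log(1.0 / (1.0 - d ** 2)))

print(mutual_information_gaussian([0.9, 0.5]))   # approx. 0.831 + 0.144
```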
Remark 3.
The material discussed in Section 1.2 makes use of the concepts of this section. The main point to be made is that, in lossy source coding problems, the source distribution is fixed, while the optimal reproduction distribution needs to be found and realized. Then, a pre-encoder can be used by invoking Algorithm A1.

2.3. Conditional Independence of a Triple of Gaussian Random Variables

The concept of conditional independence is basic to the entire paper. The definition is provided below. The characterization of a Gaussian measure on a triple of Gaussian random variables having the conditional independence property is stated.
Definition 3.
Conditional independence.
Consider a probability space ( Ω , F , P ) and three sub-σ-algebras F 1 , F 2 , G F . Call the sub-σ-algebras F 1 and F 2 conditionally independent given, or conditioned on, the sub-σ-algebra G if the following factorization property holds:
$E[Y_1 Y_2 | \mathcal{G}] = E[Y_1 | \mathcal{G}]\, E[Y_2 | \mathcal{G}], \quad \forall Y_1 \in L_+(\mathcal{F}_1), \ \forall Y_2 \in L_+(\mathcal{F}_2).$
Denote this property by $(\mathcal{F}_1, \mathcal{F}_2 | \mathcal{G}) \in \mathrm{CI}$.
For Gaussian random variables, the definition of minimality of a Gaussian random variable X that makes two Gaussian random variables ( Y 1 , Y 2 ) conditionally independent is needed. The definition is introduced below.
Definition 4.
Minimality of conditional independence of Gaussian random variables.
Consider three random variables, Y i : Ω R p i for i = 1 , 2 and X : Ω R n .
Call the random variables Y 1 and Y 2 Gaussian conditionally independent conditioned on or given F X if:
(1) ( F Y 1 , F Y 2 | F X ) CI ;
(2) ( Y 1 , Y 2 , X ) are jointly Gaussian random variables.
The notation ( Y 1 , Y 2 | X ) CIG is used to denote this property.
Call the random variables ( Y 1 , Y 2 | X ) minimally Gaussian conditionally independent if
(1) They are Gaussian conditionally independent;
(2) There does not exist another tuple ( Y 1 , Y 2 | X 1 ) with X 1 : Ω R n 1 such that ( Y 1 , Y 2 | X 1 ) CIG and n 1 < n .
This property is denoted by ( Y 1 , Y 2 | X 1 ) CIG m i n .
There exists a simple equivalent condition for the conditional independence of tuple of Gaussian random variables by a third Gaussian random variable. This condition is expressed in terms of parameterizing the variance matrix of the tuple as presented in the next proposition.
Proposition 2.
[22] (Proposition 3.4) Equivalent condition for the conditional independence of the tuple of Gaussian random variables.
Consider a triple of jointly Gaussian random variables denoted as $(Y_1, Y_2, X) \in G(0, Q)$ with $Q_X > 0$. This triple is Gaussian conditionally independent if and only if
$Q_{Y_1, Y_2} = Q_{Y_1, X} Q_X^{-1} Q_{X, Y_2}.$
It is minimally Gaussian conditionally independent if and only if, in addition, $n = \dim(X) = \mathrm{rank}(Q_{Y_1, Y_2})$.
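The condition of Proposition 2 is a pure second-moment test and can be checked numerically; the sketch below (ours, with hypothetical names and example matrices) verifies it for the simple realization $Y_1 = X + V_1$, $Y_2 = X + V_2$ with $X, V_1, V_2$ independent.

```python
import numpy as np

def is_gaussian_conditionally_independent(Q_y1y2, Q_y1x, Q_x, Q_xy2, tol=1e-9):
    """Proposition 2: (Y1, Y2 | X) are Gaussian conditionally independent
    iff Q_{Y1,Y2} = Q_{Y1,X} Q_X^{-1} Q_{X,Y2}."""
    return np.allclose(Q_y1y2, Q_y1x @ np.linalg.inv(Q_x) @ Q_xy2, atol=tol)

# Example: Y1 = X + V1, Y2 = X + V2 with X, V1, V2 independent, unit variances.
Q_x = np.array([[1.0]])
Q_y1x = Q_xy2 = np.array([[1.0]])   # Cov(Y1, X) = Cov(X, Y2) = Var(X)
Q_y1y2 = np.array([[1.0]])          # Cov(Y1, Y2) = Var(X)
print(is_gaussian_conditionally_independent(Q_y1y2, Q_y1x, Q_x, Q_xy2))  # True
```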
It will become apparent in Section 4.4 that the Gray and Wyner lossy rate region R G W ( Δ 1 , Δ 2 ) is parameterized by a triple of jointly Gaussian random variables ( Y 1 , Y 2 , W ) , but not necessarily such that W makes Y 1 and Y 2 conditionally independent. However, subsets of R G W ( Δ 1 , Δ 2 ) , are characterized by a triple ( Y 1 , Y 2 , W ) , such that W makes Y 1 and Y 2 conditionally independent.

2.4. Weak Realization of a Gaussian Probability Measure on a Tuple of Random Variables

This section is motivated by Theorem 9, which states that, among all joint distributions P Y 1 , Y 2 , W induced by a tuple of multivariate correlated Gaussian random variables ( Y 1 , Y 2 ) , and an arbitrary random variable W : Ω W , continuous or discrete-valued, Wyner’s common information C ( Y 1 , Y 2 ) , defined by (13), is minimized by a triple ( Y 1 , Y 2 , W ) which induces a jointly Gaussian distribution P Y 1 , Y 2 , W , and W : Ω W = R n is finite-dimensional Gaussian random variable.
To develop the above results, use is made of the solution of the problem of the weak Gaussian stochastic realization of a tuple of Gaussian random variables. Specifically, to determine a Gaussian probability measure on a triple of Gaussian random variables such that:
(1)
The measure restricted to the first two Gaussian random variables is equal to the considered probability measure;
(2)
The third Gaussian random variable makes the other two random variables conditionally independent. This problem does not have a unique solution; there is a set of Gaussian probability measures which meets these conditions. What is needed is a parameterization of this set of solutions.
Below, the problem is stated in more detail. Its solution is provided in the next section.
Problem 1.
Weak stochastic realization of a tuple of conditionally independent Gaussian random variables.
Weak stochastic realization problem of a Gaussian random variable. Consider a Gaussian measure $P_0 = G(0, Q_0)$ on the space $(\mathbb{R}^{p_1 + p_2}, \mathcal{B}(\mathbb{R}^{p_1 + p_2}))$. Determine the integer $n \in \mathbb{N}$ and construct all Gaussian measures on the space $(\mathbb{R}^{p_1 + p_2 + n}, \mathcal{B}(\mathbb{R}^{p_1 + p_2 + n}))$ such that, if $P_1 = G(0, Q_1)$ is such a measure with $(Y_1, Y_2, X) \in G(0, Q_1)$, then:
(1) $G(0, Q_1) |_{\mathbb{R}^{p_1 + p_2}} = G(0, Q_0)$;
(2) $(Y_1, Y_2 | X) \in \mathrm{CIG}_{min}$.
Here, the indicated random variables ( Y 1 , Y 2 , X ) are constructed having the measure G ( 0 , Q 1 ) with the dimensions p 1 , p 2 , n Z + , respectively.
The next definition and proposition are about the weak Gaussian stochastic realization of a tuple of jointly Gaussian multivariate random variables and its weak stochastic realization.
Definition 5.
Minimality of weak stochastic realization of a tuple of conditionally independent Gaussian random variables.
Consider a Gaussian measure $P_0 = G_0(0, Q_{(Y_1, Y_2)})$ with zero mean values for a tuple $(Y_1, Y_2)$ of random variables on the product space $(\mathbb{R}^{p_1} \times \mathbb{R}^{p_2}, \mathcal{B}(\mathbb{R}^{p_1}) \otimes \mathcal{B}(\mathbb{R}^{p_2}))$ for $p_1, p_2 \in \mathbb{Z}_+$ with
$Q_{(Y_1, Y_2)} = \begin{pmatrix} Q_{Y_1} & Q_{Y_1, Y_2} \\ Q_{Y_1, Y_2}^T & Q_{Y_2} \end{pmatrix}, \quad Q_{Y_1} > 0, \quad Q_{Y_2} > 0.$
(a) A weak Gaussian stochastic realization of the Gaussian measure $G_0(0, Q_{(Y_1, Y_2)})$ is defined to be a Gaussian measure $P_1 = G_1$ if there exists an integer $n \in \mathbb{Z}_+$ such that the Gaussian measure $G_1$ is defined on the space $(\mathbb{R}^{p_1} \times \mathbb{R}^{p_2} \times \mathbb{R}^n, \mathcal{B}(\mathbb{R}^{p_1}) \otimes \mathcal{B}(\mathbb{R}^{p_2}) \otimes \mathcal{B}(\mathbb{R}^n))$, associated with random variables in the three spaces denoted, respectively, by $Y_1$, $Y_2$, and $X$, and such that:
(1) G 1 | R p 1 × R p 2 = G 0 ( 0 , Q ( Y 1 , Y 2 ) ) ;
(2) Q X > 0 ;
(3) Conditional independence holds: P Y 1 , Y 2 | X = P Y 1 | X P Y 2 | X , where these are Gaussian measures, with means which are linear functions of the random variable X and deterministic variance matrices.
(b) The weak Gaussian stochastic realization is called minimal if the dimension n of the random variable X is the smallest possible over all weak Gaussian stochastic realizations as defined in (a).
(c) A Gaussian random variable representation of a weak Gaussian stochastic realization $G_1$ is defined as a triple of random variables satisfying the following relations:
$(Y_1, Y_2, X, V_1, V_2), \quad p_{V_1}, p_{V_2} \in \mathbb{Z}_+, \quad p_{V_1} \le p_1, \quad p_{V_2} \le p_2, \quad Y_1: \Omega \to \mathbb{R}^{p_1}, \quad Y_2: \Omega \to \mathbb{R}^{p_2}, \quad V_1: \Omega \to \mathbb{R}^{p_{V_1}}, \quad V_2: \Omega \to \mathbb{R}^{p_{V_2}}, \quad X: \Omega \to \mathbb{R}^n,$ $(V_1, V_2, X) \in G$, and these are zero-mean independent random variables, $Q_{V_1} > 0$, $Q_{V_2} > 0$, $Q_X > 0$;
$C_1 \in \mathbb{R}^{p_1 \times n}, \quad C_2 \in \mathbb{R}^{p_2 \times n}, \quad N_1 \in \mathbb{R}^{p_1 \times p_{V_1}}, \quad N_2 \in \mathbb{R}^{p_2 \times p_{V_2}},$
$Y_1 = C_1 X + N_1 V_1, \qquad (62)$
$Y_2 = C_2 X + N_2 V_2, \qquad (63)$
$Q_{Y_1} = C_1 Q_X C_1^T + N_1 Q_{V_1} N_1^T,$
$Q_{Y_2} = C_2 Q_X C_2^T + N_2 Q_{V_2} N_2^T,$
$Q_{Y_1, Y_2} = C_1 Q_X C_2^T, \quad G_0(0, Q_{(Y_1, Y_2)}) = G_1 |_{\mathbb{R}^{p_1} \times \mathbb{R}^{p_2}}.$
From the assumptions, it then follows that ( Y 1 , Y 2 ) are Gaussian random variables, hence the last equality makes sense.
(d) A minimal Gaussian random variable representation of a weak Gaussian stochastic realization is defined as a triple of random variables as in (c) except that, in addition, it is required that,
$\mathrm{rank}(C_1) = n = \mathrm{rank}(C_2).$
The case $Q_X \ge 0$ in (a).(2) is similar.
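A minimal sketch of the second-moment relations implied by the representation of Definition 5 (c) is given below (ours; the function name and the example matrices are hypothetical). It computes the covariance blocks from $(C_1, C_2, N_1, N_2, Q_X, Q_{V_1}, Q_{V_2})$, and in particular illustrates that the cross-covariance $Q_{Y_1, Y_2} = C_1 Q_X C_2^T$ has rank at most $n$.

```python
import numpy as np

def weak_realization_covariances(C1, C2, N1, N2, Q_x, Q_v1, Q_v2):
    """Covariance blocks implied by Y1 = C1 X + N1 V1, Y2 = C2 X + N2 V2,
    with (X, V1, V2) zero-mean and independent (Definition 5 (c))."""
    Q_y1 = C1 @ Q_x @ C1.T + N1 @ Q_v1 @ N1.T
    Q_y2 = C2 @ Q_x @ C2.T + N2 @ Q_v2 @ N2.T
    Q_y1y2 = C1 @ Q_x @ C2.T          # cross term: only X is shared
    return Q_y1, Q_y2, Q_y1y2

# A 2-dimensional example with a 1-dimensional X (rank(C1) = rank(C2) = 1).
C1 = np.array([[1.0], [0.5]]); C2 = np.array([[0.8], [0.2]])
N1 = N2 = np.eye(2)
Q = weak_realization_covariances(C1, C2, N1, N2, np.eye(1), 0.3 * np.eye(2), 0.3 * np.eye(2))
print(Q[2])   # Q_{Y1,Y2} = C1 Q_X C2^T, a rank-1 matrix
```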
The next proposition shows the equivalence of weak Gaussian stochastic realizations of Definition 5. (a), (b) to Definition 5. (c), (d), respectively.
Proposition 3.
Consider the setting of Definition 5 with ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) with the representation of (62) and (63).
(a) A weak Gaussian stochastic realization in terms of a measure P 1 = G 1 as defined in Definition 5. (a) is equivalent to a Gaussian random variable representation of Definition 5. (c).
(b) The minimal weak Gaussian stochastic realization of Definition 5. (b) is equivalent to a minimal weak Gaussian random variable representation of Definition 5. (d).
Proof. 
The derivation given in Appendix A.5 of [21]. □
Consider Figure 2. The two signals $Y_1, Y_2$ are to be reproduced at the two decoders by $\hat{Y}_1, \hat{Y}_2$ subject to the square-error distortion functions. According to Gray and Wyner, the characterization of the lossy rate region is described by a single coding scheme that uses the auxiliary random variable $W$, which is common to both $Y_1, Y_2$. A subset of the rate triples on the Gray and Wyner rate region is achieved by a triple that satisfies $(\mathcal{F}^{Y_1}, \mathcal{F}^{Y_2} | \mathcal{F}^W) \in \mathrm{CIG}$. Below, this conditional independence is further detailed in terms of the mathematical framework of weak stochastic realization such that $(\mathcal{F}^{Y_1}, \mathcal{F}^{Y_2} | \mathcal{F}^W) \in \mathrm{CIG}$.
Definition 6.
The model for a triple of Gaussian random variables.
Consider a tuple of Gaussian random variables specified by $Y = (Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$ with $Y_i: \Omega \to \mathbb{R}^{p_i}$ for $i = 1, 2$. Take a jointly Gaussian measure $G(0, Q_{(Y_1, Y_2, W)})$ for the triple $(Y_1, Y_2, W)$, $W: \Omega \to \mathbb{R}^n$, $W \in G(0, Q_W)$, such that the marginal measure on $(Y_1, Y_2)$ is equal to the considered measure, with
$Q_{(Y_1, Y_2, W)} = \begin{pmatrix} Q_{Y_1} & Q_{Y_1, Y_2} & Q_{Y_1, W} \\ Q_{Y_1, Y_2}^T & Q_{Y_2} & Q_{Y_2, W} \\ Q_{Y_1, W}^T & Q_{Y_2, W}^T & Q_W \end{pmatrix}. \qquad (68)$
Denote the parameterized joint measure with respect to W, by ( Y 1 , Y 2 , W ) G ( 0 , Q ( Y 1 , Y 2 , W ) ) . This parameterized joint measure ( Y 1 , Y 2 , W ) G ( 0 , Q ( Y 1 , Y 2 , W ) ) also includes the subset such that the conditional independence holds, ( F Y 1 , F Y 2 | F W ) CIG .
In the following subsections, it will be shown how such a random variable W can be constructed in a number of cases.
Algorithm 1 (a) gives the general case, while (b) gives the special case in which the joint measure $(Y_1, Y_2, W) \in G(0, Q_{(Y_1, Y_2, W)})$ is such that $(\mathcal{F}^{Y_1}, \mathcal{F}^{Y_2} | \mathcal{F}^W) \in \mathrm{CIG}$, via the weak stochastic realization.
Algorithm 1.
Consider the model of a tuple of Gaussian random variables of Definition 6.
(a) General case.
  • At the encoder, first compute the variables
    $\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} - E\Big[ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \Big| \mathcal{F}^W \Big] = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} - Q_{(Y_1, Y_2), W} Q_W^{\dagger} W,$
    $Q_{(Y_1, Y_2), W} = E\Big[ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} W^T \Big] = \begin{pmatrix} Q_{Y_1, W} \\ Q_{Y_2, W} \end{pmatrix};$
    then, the triple $(Z_1, Z_2, W)$ of jointly Gaussian random variables is such that $(Z_1, Z_2) \in G(0, Q_{(Z_1, Z_2)})$ and $(Z_1, Z_2)$ is independent of $W$.
  • The tuple of random variables $(Y_1, Y_2)$ is represented according to
    $\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = Q_{(Y_1, Y_2), W} Q_W^{\dagger} W + \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}.$
(b) Special case. Consider ( F Y 1 , F Y 2 | F W ) CIG , and assume Q W > 0 .
  • At the encoder, first compute the variables
    $Z_1 = Y_1 - E[Y_1 | \mathcal{F}^W] = Y_1 - Q_{Y_1, W} Q_W^{-1} W,$
    $Z_2 = Y_2 - E[Y_2 | \mathcal{F}^W] = Y_2 - Q_{Y_2, W} Q_W^{-1} W;$
    then the three random variables of the triple $(Z_1, Z_2, W)$ are jointly Gaussian and independent.
  • The tuple of random variables $(Y_1, Y_2)$ is represented according to
    $Y_1 = Q_{Y_1, W} Q_W^{-1} W + Z_1, \quad Y_2 = Q_{Y_2, W} Q_W^{-1} W + Z_2.$
We emphasize that $Y_1$ and $Y_2$ are conditionally independent conditioned on $W$ if and only if $Z_1$ and $Z_2$ are independent.
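As a sketch of the second-moment computations behind Algorithm 1 (ours; the function name is hypothetical and $Q_W > 0$ is assumed), the residual covariances below follow directly from $Z_i = Y_i - Q_{Y_i, W} Q_W^{-1} W$; in the Gaussian case, a vanishing cross-covariance $Q_{Z_1, Z_2}$ is equivalent to the conditional independence in part (b).

```python
import numpy as np

def algorithm1_residual_covariances(Q_y1, Q_y2, Q_y1y2, Q_y1w, Q_y2w, Q_w):
    """Covariances of the residuals Z_i = Y_i - Q_{Y_i,W} Q_W^{-1} W of
    Algorithm 1 (b), assuming Q_W > 0."""
    Qw_inv = np.linalg.inv(Q_w)
    Q_z1 = Q_y1 - Q_y1w @ Qw_inv @ Q_y1w.T
    Q_z2 = Q_y2 - Q_y2w @ Qw_inv @ Q_y2w.T
    Q_z1z2 = Q_y1y2 - Q_y1w @ Qw_inv @ Q_y2w.T   # zero iff (F^{Y1}, F^{Y2} | F^W) in CI
    Q_z1w = Q_y1w - Q_y1w @ Qw_inv @ Q_w          # always zero by construction
    return Q_z1, Q_z2, Q_z1z2, Q_z1w
```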
The validity of the statements of the algorithm follow from the next proposition.
Proposition 4.
Consider the model of a tuple of Gaussian random variables of Definition 6, for cases (a), (b).
(a)
At the encoder, the conditional expectations are correct and the definitions of Z 1 and of Z 2 are well defined.
(b)
The three random variables $(Z_1, Z_2, W)$ are independent. Consequently, the three sequences $(W^N, Z_1^N, Z_2^N)$, and the messages generated by the Gray–Wyner encoder, $f^{(E)}(Y_1^N, Y_2^N) = \bar{f}^{(E)}(W^N, Z_1^N, Z_2^N) = (S_0, S_1, S_2)$, are independent.
Proof. 
Case (a). This follows from realization theory (since no constraints are imposed). Case (b). This is a specific application of Proposition 3. □
For the definition of $C(Y_1, Y_2)$, use is made of the construction of the actual family of measures such that $(Y_1, Y_2 | W) \in \mathrm{CIG}$ holds, and of the weak stochastic realization. These are presented in Theorem 8 and Corollary 1.

2.5. Characterization of Minimal Conditional Independence of a Triple of Gaussian Random Variables

Introduce the notation of the parameterization of the family of Gaussian probability distributions
$\mathcal{P}^{CIG} = \big\{ P_{Y_1, Y_2, W}(y_1, y_2, w) \ \big|\ P_{Y_1, Y_2 | W}(y_1, y_2 | w) = P_{Y_1 | W}(y_1 | w)\, P_{Y_2 | W}(y_2 | w),\ P_{Y_1, Y_2, W}(y_1, y_2, \mathbb{W}) = P_{Y_1, Y_2}(y_1, y_2),\ (Y_1, Y_2, W) \ \text{is jointly Gaussian} \big\}.$
A subset of the set $\mathcal{P}^{CIG}$ is the set of distributions $\mathcal{P}^{CIG}_{min}$, with the additional constraint that the dimension of the random variable $W$ is minimal while all other conditions hold, defined by
$\mathcal{P}^{CIG}_{min} = \big\{ P_{Y_1, Y_2, W}(y_1, y_2, w) \in \mathcal{P}^{CIG} \ \big|\ (Y_1, Y_2 | W) \in \mathrm{CIG}_{min} \big\} \subseteq \mathcal{P}^{CIG}.$
The parameterization of the family of Gaussian probability distributions P C I G and P m i n C I G require the solution of the weak stochastic realization problem of Gaussian random variables defined by Problem 1. This problem is solved in [22] (Theorem 4.2). For the readers’ convenience, it is stated below.
Theorem 8.
Ref. [22] (Theorem 4.2). Consider a tuple $(Y_1, Y_2)$ of Gaussian random variables in the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables. Thus, the random variables $Y_1, Y_2$ have the same dimension $n = p_1 = p_2$, and their covariance matrix $D \in \mathbb{R}^{n \times n}$ is a nonsingular diagonal matrix with the diagonal ordered real numbers in the interval $(0, 1)$. Hence,
$(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)}) = P_0, \quad Y_1, Y_2: \Omega \to \mathbb{R}^n, \quad n \in \mathbb{Z}_+, \qquad (77)$
$Q_{(Y_1, Y_2)} = \begin{pmatrix} I & D \\ D & I \end{pmatrix}, \qquad (78)$
$D = \mathrm{Diag}(d_1, d_2, \ldots, d_n) \in \mathbb{R}^{n \times n}, \quad 1 > d_1 \ge d_2 \ge \cdots \ge d_n > 0. \qquad (79)$
That is, $p_{11} = p_{21} = 0$, $p_{13} = p_{23} = 0$.
(a) There exists a probability measure $P_1$, and a triple of Gaussian random variables $Y_1, Y_2, W: \Omega \to \mathbb{R}^n$ defined on it, such that (i) $P_1 |_{(Y_1, Y_2)} = P_0$ and (ii) $(\mathcal{F}^{Y_1}, \mathcal{F}^{Y_2} | \mathcal{F}^W) \in \mathrm{CIG}_{min}$;
(b) There exists a family of Gaussian measures, denoted by $\mathcal{P}_{ci} \subseteq \mathcal{P}^{CIG}_{min}$, that satisfy (i) and (ii) of (a); moreover, this family is parameterized by the following matrices and sets:
$G(0, Q_s(Q_W)), \quad Q_W \in \mathcal{Q}_W, \qquad (80)$
$Q_s = Q_s(Q_W) = \begin{pmatrix} I & D & D^{1/2} \\ D & I & D^{1/2} Q_W \\ D^{1/2} & Q_W D^{1/2} & Q_W \end{pmatrix}, \qquad (81)$
$\mathcal{Q}_W = \big\{ Q_W \in \mathbb{R}^{n \times n} \ \big|\ Q_W = Q_W^T, \ 0 < D \le Q_W \le D^{-1} \big\}, \qquad (82)$
$\mathcal{P}_{ci} = \big\{ G(0, Q_s(Q_W)) \ \text{on} \ (\mathbb{R}^{3n}, \mathcal{B}(\mathbb{R}^{3n})) \ \big|\ Q_W \in \mathcal{Q}_W \big\} \subseteq \mathcal{P}^{CIG}_{min}. \qquad (83)$
Furthermore, for any measure P 1 P m i n C I G , there exists a triple of state transformation of the form ( Y 1 , Y 2 , W ) ( S 1 Y 1 , S 2 Y 2 , S W W ) for nonsingular square matrices S 1 , S 2 , S W such that the corresponding measure of the three transformed variables belongs to P ci .
The application of Theorem 8 is discussed in the next remark, in the context of parameterizing any rate-triple on the Gray–Wyner lossy rate region ( R 0 , R 1 , R 2 ) R G W ( Δ 1 , Δ 2 ) that lies on the Pangloss plane.
Remark 4.
Applications of Theorem 8.
(a) Theorem 8 is a parameterization of the family of Gaussian measures P ci P m i n C I G by the entries of the covariance matrix Q W . Hence, it is at most an n ( n + 1 ) / 2 dimensional parameterization;
(b) It is shown in Section 4.4 that only a subset of the achievable rate region R G W ( Δ 1 , Δ 2 ) = R G W ( Δ 1 , Δ 2 ) is generated from distributions P ci P m i n C I G P .
The next corollary is useful for the calculation of $C(Y_1, Y_2)$, since, by Theorem 9, an achievable lower bound on $I(Y_1, Y_2; W)$ is attained by a Gaussian random variable $W$ such that the distribution $P_{Y_1, Y_2, W} \in \mathcal{P}_{ci} \subseteq \mathcal{P}^{CIG}_{min}$, corresponding to $W \in G(0, Q_W)$. By Theorem 9, and since $C(Y_1, Y_2)$ is invariant with respect to nonsingular transformations applied to $(Y_1, Y_2, W)$, the next corollary gives the realization of $(Y_1, Y_2)$ as defined in Theorem 8, by (77)–(79), expressed in terms of an arbitrary Gaussian random variable $W \in G(0, Q_W)$.
Corollary 1.
Consider a tuple ( Y 1 , Y 2 ) of Gaussian random variables in the canonical variable form of Definition 1. Restrict the attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79).
Then, a realization of the random variables $(Y_1, Y_2)$ which induces the family of measures $\mathcal{P}_{ci} \subseteq \mathcal{P}^{CIG}_{min}$, defined by (80)–(83), is
$Y_1 = Q_{Y_1, W} Q_W^{-1} W + Z_1,$
$Q_{Y_1, W} = D^{1/2}, \quad Z_1 \in G(0, I - D^{1/2} Q_W^{-1} D^{1/2}),$
$Y_2 = Q_{Y_2, W} Q_W^{-1} W + Z_2,$
$Q_{Y_2, W} = D^{1/2} Q_W, \quad Z_2 \in G(0, I - D^{1/2} Q_W D^{1/2}),$
$(Z_1, Z_2, W) \ \text{are independent}.$
Furthermore, the mutual information $I(Y_1, Y_2; W)$ is given by
$I(Y_1, Y_2; W) = H(Y_1, Y_2) - H(Y_1 | W) - H(Y_2 | W)$
$= \frac{1}{2} \sum_{i=1}^{n} \ln(1 - d_i^2) - \frac{1}{2} \ln\Big( \det\big( [I - D^{1/2} Q_W^{-1} D^{1/2}]\, [I - D^{1/2} Q_W D^{1/2}] \big) \Big)$
and it is parameterized by $Q_W \in \mathcal{Q}_W$, where $\mathcal{Q}_W$ is defined by the set of Equation (82).
Proof. 
The correctness of the realization is due to Proposition 2 and Theorem 8. The calculation of mutual information follows from the realization. □
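The mutual information of Corollary 1 is an explicit function of $Q_W$ and can be evaluated numerically; the sketch below (ours, hypothetical function name, $(Y_1, Y_2)$ in canonical variable form with $D = \mathrm{diag}(d)$ and $Q_W$ in the admissible set $D \le Q_W \le D^{-1}$) computes it and checks that the choice $Q_W = I$ recovers the value $\frac{1}{2} \sum_j \ln \frac{1 + d_j}{1 - d_j}$ of Theorem 4.

```python
import numpy as np

def mutual_info_Y1Y2_W(d, Q_w) -> float:
    """I(Y1, Y2; W) of Corollary 1, in nats, as a function of Q_W."""
    d = np.asarray(d, dtype=float)
    n = d.size
    D_half = np.diag(np.sqrt(d))
    Q_z1 = np.eye(n) - D_half @ np.linalg.inv(Q_w) @ D_half   # Var(Z1) = Var(Y1 | W)
    Q_z2 = np.eye(n) - D_half @ Q_w @ D_half                  # Var(Z2) = Var(Y2 | W)
    return 0.5 * np.sum(np.log(1.0 - d ** 2)) - 0.5 * np.log(np.linalg.det(Q_z1 @ Q_z2))

d = np.array([0.9, 0.5])
print(mutual_info_Y1Y2_W(d, np.eye(2)))          # Q_W = I recovers C(Y1, Y2)
print(0.5 * np.sum(np.log((1 + d) / (1 - d))))   # Wyner's common information, same value
```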

3. Wyner’s Common Information

This section is devoted to the calculation of Wyner’s common information C ( Y 1 , Y 2 ) , defined by (13), for P Y 1 , Y 2 = G ( 0 , Q ( Y 1 , Y 2 ) ) , and the construction of the weak stochastic realization of ( Y 1 , Y 2 , W ) that achieves this.

3.1. Reduction of the Calculation of Wyner’s Common Information

First, we show Theorem 9, which states: given a tuple of multivariate correlated Gaussian random variables ( Y 1 , Y 2 ) , and an arbitrary random variable W (i.e., taking continuous or discrete values), Wyner's common information C ( Y 1 , Y 2 ) , defined by (13), is minimized by a triple ( Y 1 , Y 2 , W ) which induces a jointly Gaussian distribution P Y 1 , Y 2 , W , where W : Ω W = R n is a finite-dimensional Gaussian random variable.
Theorem 9.
Consider a tuple of multivariate-correlated Gaussian random variables Y 1 : Ω R p 1 , Y 2 : Ω R p 2 , p i Z + , i = 1 , 2 with the variance matrix of this tuple denoted by
( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q ( Y 1 , Y 2 ) = Q Y 1 Q Y 1 , Y 2 Q Y 1 , Y 2 T Q Y 2 R ( p 1 + p 2 ) × ( p 1 + p 2 )
and, without loss of generality, assume that Q ( Y 1 , Y 2 ) is a positive definite matrix. Let W : Ω W be any auxiliary random variable, with W being an arbitrary measurable space, and P Y 1 , Y 2 , W any joint probability distribution of the triple ( Y 1 , Y 2 , W ) on the product space ( R p 1 × R p 2 × W ,   B ( R p 1 ) B ( R p 2 ) B ( W ) ) with ( Y 1 , Y 2 ) marginal P Y 1 , Y 2 the Gaussian distribution P Y 1 , Y 2 = G ( 0 , Q ( Y 1 , Y 2 ) ) .
The following hold.
(a) Define the random variables Z 1 , Z 2 by
Z i = Y i E [ Y i | W ] , Z i : Ω R p i , i = 1 , 2 .
The inequalities hold:
I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) H ( Y 1 , Y 2 | W )
= H ( Y 1 , Y 2 ) H ( Y 1 | Y 2 , W ) H ( Y 2 | W )
H ( Y 1 , Y 2 ) H ( Y 1 | W ) H ( Y 2 | W )
= H ( Y 1 , Y 2 ) H ( Y 1 E [ Y 1 | W ] | W ) H ( Y 2 E [ Y 2 | W ] | W )
H ( Y 1 , Y 2 ) H ( Y 1 E [ Y 1 | W ] ) H ( Y 2 E [ Y 2 | W ] )
= H ( Y 1 , Y 2 ) H ( Z 1 ) H ( Z 2 )
H ( Y 1 , Y 2 ) H ( Z 1 ) H ( Z 2 ) if Z 1 , Z 2 have finite variances and are Gaussian.
(b) If:
(i) W : Ω W = R n is an n dimensional, n Z + , Gaussian random variable;
(ii) ( Z 1 , Z 2 , W ) are mutually independent jointly Gaussian random variables, then all inequalities in (93)–(99) hold with equality, and ( Y 1 , Y 2 , W ) induces a family of joint probability distributions P Y 1 , Y 2 , W with ( Y 1 , Y 2 ) marginal P Y 1 , Y 2 , such that W makes Y 1 and Y 2 conditionally independent, that is P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W ;
(c) Among all joint distributions P Y 1 , Y 2 , W induced by the jointly Gaussian random variables ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) of (91) and an arbitrary random variable W : Ω W , such that the ( Y 1 , Y 2 ) marginal P Y 1 , Y 2 is the Gaussian distribution P Y 1 , Y 2 = G ( 0 , Q ( Y 1 , Y 2 ) ) and P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W , a jointly Gaussian distribution achieves the lower bounds of I ( Y 1 , Y 2 ; W ) in part (a), i.e., achieves Wyner's common information C ( Y 1 , Y 2 ) defined by (13). Such a distribution P Y 1 , Y 2 , W is induced by an n dimensional, n Z + , Gaussian random variable W : Ω W = R n and ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , with the triple ( Y 1 , Y 2 , W ) represented by
Y 1 = E [ Y 1 | W ] + Z 1 , Y 2 = E [ Y 2 | W ] + Z 2 ,
( W , Z 1 , Z 2 ) are mutually independent Gaussian random variables.
Proof. 
(a) (93) is due to an identity of mutual information; (94) is due to the chain rule of entropy; (95) holds because conditioning reduces entropy; (96) is due to a property of conditional entropy; (97) holds because conditioning reduces entropy; (98) is due to definition (92); and (99) is due to the maximum entropy principle. (b) Since Y i = E [ Y i | W ] + Z i , i = 1 , 2 and ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , if (i) and (ii) hold, then all inequalities hold with equality, and the statements are easily verified. (c) Follows from part (b). □
Remark 5.
Theorem 9 shows that, among all random variables W which induce a joint distribution P Y 1 , Y 2 , W with ( Y 1 , Y 2 ) marginal P Y 1 , Y 2 the Gaussian distribution P Y 1 , Y 2 = G ( 0 , Q ( Y 1 , Y 2 ) ) , for the Wyner's common information problem C ( Y 1 , Y 2 ) it suffices to consider a jointly Gaussian triple ( Y 1 , Y 2 , W ) such that W makes Y 1 and Y 2 conditionally independent.

3.2. Wyner’s Common Information of Correlated Random Variables

Assume that the tuple of multivariate correlated Gaussian random variables ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) of Theorem 9 is already transformed to the canonical variable representation, see Definition 1, using Algorithm A1, i.e., by the nonsingular transformation, S = Block diag ( S 1 , S 2 ) . Mutual information is invariant with respect to nonsingular transformations, and I ( Y 1 , Y 2 ; W ) = I ( S 1 Y 1 , S 2 Y 2 ; W ) .
By Theorem 9. (c), the joint probability distributions P Y 1 , Y 2 , W ( y 1 , y 2 , w ) are jointly Gaussian. This family of distributions is parameterized by the multidimensional random variable W, which is such that ( Y 1 , Y 2 ) are conditionally independent, conditioned on W, the marginal distribution P Y 1 , Y 2 , W ( y 1 , y 2 , ) = P Y 1 , Y 2 ( y 1 , y 2 ) coincides with the distribution of ( Y 1 , Y 2 ) , and W represents ( Y 1 , Y 2 ) .
Using the above construction, one obtains the next theorem.
Theorem 10.
Consider a tuple of Gaussian random variables ( Y 1 , Y 2 ) G ( 0 , Q cvf ) as described and decomposed according to Algorithm A1. Restrict attention to the correlated parts of these random variables, as described in Theorem 8, (77)–(79) (i.e., only components ( Y 12 , Y 22 ) are present).
(a) Theorem 8 holds, and in particular, the family of jointly Gaussian distributions P Y 1 , Y 2 , W induced by ( Y 1 , Y 2 ) G ( 0 , Q cvf ) and a Gaussian random variable W : Ω R n , with minimum dimension n, such that the ( Y 1 , Y 2 ) marginal is P Y 1 , Y 2 = G ( 0 , Q cvf ) , and P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W , is parameterized by the family of Theorem 8. (b), i.e., (80)–(83);
(b) Corollary 1, (84)–(88) characterizes the family of realizations of ( Y 1 , Y 2 , W ) , parameterized by W, which induce jointly Gaussian distributions, such that W : Ω R n is a Gaussian random variable with minimum dimension n, P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W , and ( Y 1 , Y 2 ) G ( 0 , Q cvf ) . Moreover, Wyner's common information C ( Y 1 , Y 2 ) is computed from the expression I ( Y 1 , Y 2 ; W ) of Corollary 1, (90), optimized over Q W Q W , where Q W is defined by the set of Equation (82).
Remark 6.
It is apparent that the proof of the formula for C ( Y 1 , Y 2 ) in [15,16] is based on rate distortion functions; that is, these works do not directly address Wyner's optimization problem (13), as in Theorem 9, which first shows that, among all continuous or discrete random variables, the optimal W is Gaussian. Moreover, no parameterization of the set of distributions P Y 1 , Y 2 , W achieving conditional independence P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W is given there, i.e., the optimization over the parameterized family of Gaussian measures of Theorem 8 is not given.
In the next theorem, the family of measures P ci P m i n C I G , defined by (80)–(83), which leads to realization of ( Y 1 , Y 2 ) , given in Corollary 1, is ordered for the determination of a single joint distribution P Y 1 , Y 2 , W P ci P m i n C I G , which achieves C ( Y 1 , Y 2 ) . This leads to the realization of ( Y 1 , Y 2 ) expressed in terms of W and vectors of independent Gaussian random variables ( Z 1 , Z 2 ) , one for each realization, each having independent components.
Theorem 11.
Consider a tuple ( Y 1 , Y 2 ) of Gaussian random variables in the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79).
The following hold.
(a) The information quantity C ( Y 1 , Y 2 ) is given by
C ( Y 1 , Y 2 ) = 1 2 i = 1 n ln 1 + d i 1 d i = 1 2 i = 1 n ln 1 + 2 d i 1 d i ( 0 , ) .
(b) The realizations of the random variables ( Y 1 , Y 2 , W ) that achieve C ( Y 1 , Y 2 ) are represented by
V : Ω R n , V G ( 0 , I ) , the vector V has independent components, F V , F Y 1 F Y 2 , are independent σ-algebras,
L 1 = L 2 = D 1 / 2 ( I + D ) 1 R n × n ,
L 3 = ( I D ) 1 / 2 ( I + D ) 1 / 2 R n × n , L 1 , L 2 , L 3 , are diagonal matrices,
W = L 1 Y 1 + L 2 Y 2 + L 3 V , W : Ω R n ,
Z 1 = Y 1 D 1 / 2 W , Z 1 : Ω R n ,
Z 2 = Y 2 D 1 / 2 W , Z 2 : Ω R n .
Then:
Z 1 G ( 0 , ( I D ) ) , Z 2 G ( 0 , ( I D ) ) , W G ( 0 , I ) ;
( Z 1 , Z 2 , W ) , are independent and
Y 1 = D 1 / 2 W + Z 1 , Y 2 = D 1 / 2 W + Z 2
hence, the variables ( Y 1 , Y 2 , W ) induce a distribution P Y 1 , Y 2 , W P ci P m i n C I G . Note that, in addition, each of the random variables Z 1 , Z 2 , and W have independent components.
(c) The variables ( Y 1 , Y 2 , W ) defined in (b) induce a distribution P Y 1 , Y 2 , W P ci P m i n C I G which achieves C ( Y 1 , Y 2 ) ,
C ( Y 1 , Y 2 ) = I ( Y 1 , Y 2 ; W ) .
Proof. 
By Theorem 9, the random variables ( Y 1 , Y 2 , W ) are restricted to jointly Gaussian random variables. Since mutual information I ( Y 1 , Y 2 ; W ) is invariant with respect to nonsingular transformations S 1 , S 2 , i.e., I ( Y 1 , Y 2 ; W ) = I ( S 1 Y 1 , S 2 Y 2 ; W ) , and ( F Y 1 , F Y 2 | F W ) CIG is equivalent to ( F S 1 Y 1 , F S 2 Y 2 | F W ) CIG , it suffices to consider the canonical variable form of Definition 1, and to construct a measure that carries a triple of jointly Gaussian random variables Y 1 , Y 2 , W : Ω R n such that ( F Y 1 , F Y 2 | F W ) CIG .
(a) (1) Take a probability measure P 1 such that there exists a triple of Gaussian random variables Y 1 , Y 2 , W : Ω R n with P 1 | ( Y 1 , Y 2 ) = P 0 and ( F Y 1 , F Y 2 | F W ) CIG . It will first be proven that attention can be restricted to those state random variables W whose dimension equals n = p 12 = p 22 .
Suppose that there exists a state random variable W : Ω R n 1 such that ( F Y 1 , F Y 2 | F W ) CIG and n 1 > n . Hence, W does not make ( Y 1 , Y 2 ) minimally conditionally independent. Construct a minimal vector which makes the tuple minimally conditionally independent according to the procedure of [22] (Proposition 3.5). Thus,
W 1 = E [ Y 1 | F W ] = L 11 W , L 11 R n × n 1 , W 2 = E [ Y 2 | F W 1 ] = L 12 W 1 , L 12 R n × n .
Then, ( F Y 1 , F Y 2 | F W 2 ) CIG min and the dimension of W 2 is n = p 12 = p 22 . Determine a linear transformation of W 2 by a matrix L 15 R n × n such that
W 3 = L 15 W 2 = L 15 L 12 L 11 W = L 13 W , L 13 = L 15 L 12 L 11 , W 3 G ( 0 , Q 3 ) , Q 3 = I n = L 13 Q W L 13 T .
It is then possible to construct a matrix L 14 R ( n 1 n ) × n 1 such that
W 4 = L 14 W , W 4 G ( 0 , Q 4 ) , Q 4 = I , L 14 Q W L 13 T = 0 ; W 3 W 4 G ( 0 , I n 1 ) , rank L 13 L 14 = n 1 ,
and, due to L 14 Q W L 13 T = 0 , W 3 , W 4 are independent random variables. See [23] (Theorem 4.9) for a theorem with which the existence of L 14 can be proven. Note further that F W = F W 3 , W 4 .
Hence, the random variables W 3 , W 4 are independent, ( F Y 1 , F Y 2 | F W 3 ) CIG min , and I ( Y 1 , Y 2 ; W ) = I ( Y 1 , Y 2 ; W 3 , W 4 ) .
By properties of mutual information, it now follows that
I ( Y 1 , Y 2 ; W 3 , W 4 ) I ( Y 1 , Y 2 ; W 3 ) = H ( Y 1 , Y 2 ) + H ( W 3 , W 4 ) H ( Y 1 , Y 2 , W 3 , W 4 ) H ( Y 1 , Y 2 ) H ( W 3 ) + H ( Y 1 , Y 2 , W 3 ) = H ( Y 1 , Y 2 , W 3 ) + H ( W 4 ) H ( Y 1 , Y 2 , W 3 , W 4 ) , b y   i n d e p e n d e n c e   o f W 3 a n d W 4 ; = I ( Y 1 , Y 2 , W 3 ; W 4 ) 0 .
Thus, for the computation of C ( Y 1 , Y 2 ) , attention can be restricted to those state variables W which are of minimal dimension.
(2) Take a probability measure P 1 such that there exists a triple of Gaussian random variables Y 1 , Y 2 , W : Ω R n with P 1 | ( Y 1 , Y 2 ) = P 0 and ( F Y 1 , F Y 2 | F W ) CIG min .
According to [22] (Theorem 4.2), there exist in general many such measures which are parameterized by the matrices and the sets, as stated in Theorem 8, (b), and defined by (80)–(83).
(3) Then, the mutual information of the triple of Gaussian random variables is calculated, using Theorem 8. (b) for any choice of Q W Q W , where Q W is given by (82). Then
I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) H ( Y 1 | W ) H ( Y 2 | W ) .
The following calculations are then obvious:
det ( Q ( Y 1 , Y 2 ) ) = det I D D I = det ( I D 2 ) = i = 1 n ( 1 d i 2 ) ; H ( Y 1 , Y 2 ) = 1 2 ln ( det ( Q ( Y 1 , Y 2 ) ) ) + 1 2 ( 2 n ) ln ( 2 π e ) = 1 2 i = 1 n ln ( 1 d i 2 ) + n ln ( 2 π e ) ; P Y 1 | W ( y 1 | w ) G ( E [ Y 1 | F W ] , Q Y 1 | W ) , E [ Y 1 | F W ] = Q Y 1 , W Q W 1 W = D 1 / 2 Q W 1 W ; by (81) Q Y 1 | W = I Q Y 1 , W Q W 1 Q W Q W 1 Q Y 1 , W T = I D 1 / 2 Q W 1 D 1 / 2 ; by (81) H ( Y 1 | W ) = 1 2 ln ( det ( I D 1 / 2 Q W 1 D 1 / 2 ) ) + 1 2 n ln ( 2 π e ) ; E [ Y 2 | F W ] = Q Y 2 , W Q W 1 W = D 1 / 2 Q W Q W 1 W = D 1 / 2 W ; Q Y 2 | W = I Q Y 2 , W Q W 1 Q W Q W 1 Q Y 2 , W T = I D 1 / 2 Q W D 1 / 2 ; H ( Y 2 | W ) = 1 2 ln ( det ( I D 1 / 2 Q W D 1 / 2 ) ) + 1 2 n ln ( 2 π e ) .
From the above calculations, it then follows
I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) H ( Y 1 | W ) H ( Y 2 | W )
= 1 2 i = 1 n ln ( 1 d i 2 ) + n ln ( 2 π e )
1 2 ln ( det ( I D 1 / 2 Q W 1 D 1 / 2 ) ) 1 2 n ln ( 2 π e )
1 2 ln ( det ( I D 1 / 2 Q W D 1 / 2 ) ) 1 2 n ln ( 2 π e )
= 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 ln ( det ( [ I D 1 / 2 Q W 1 D 1 / 2 ] [ I D 1 / 2 Q W D 1 / 2 ] ) ) .
The above calculations verify the statements of Corollary 1.
(4) The computation of C ( Y 1 , Y 2 ) requires the solution of an optimization problem.
C ( Y 1 , Y 2 ) = inf P 1 P ci I ( Y 1 , Y 2 ; W ) = inf Q W Q W 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 ln ( det ( [ I D 1 / 2 Q W 1 D 1 / 2 ] [ I D 1 / 2 Q W D 1 / 2 ] ) ) .
Since the first term in (115), 1 2 i = 1 n ln ( 1 d i 2 ) , does not depend on Q W and the natural logarithm is a strictly increasing function, then
C ( Y 1 , Y 2 ) is equivalent to: sup Q W Q W det ( I D 1 / 2 Q W 1 D 1 / 2 ) ( I D 1 / 2 Q W D 1 / 2 ) .
Define:
L 1 ( Q W ) = ( I D 1 / 2 Q W 1 D 1 / 2 ) ( I D 1 / 2 Q W D 1 / 2 ) ,
f 1 ( Q W ) = det ( L 1 ( Q W ) ) .
Note that the expression L 1 ( Q W ) R n × n is a non-symmetric square matrix in general.
It will be proven that
f 1 ( Q W ) = det ( L 1 ( Q W ) ) det ( [ I D ] 2 ) , Q W Q W ,
det ( L 1 ( Q W ) ) = det ( [ I D ] 2 ) if and only if Q W = I .
From these two relations, it follows that Q W = I R n × n is the unique solution of the supremization problem.
The inequality in (119) follows from Proposition A4. The equality of (120) is proven in two steps. If Q W = I , then the equality of (120) holds as follows from direct substitution in (117). The converse is proven by contradiction. Suppose that Q W I . Then, it again follows from Proposition A4 that strict inequality holds in (119). Hence, the equality is proven.
(5) Finally, the value of C ( Y 1 , Y 2 ) is computed for Q W = I .
C ( Y 1 , Y 2 ) = 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 ln ( det ( I D 1 / 2 ( Q W ) 1 D 1 / 2 ) ) 1 2 ln ( det ( I D 1 / 2 ( Q W ) D 1 / 2 ) ) = 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 2 ln ( det ( I D ) ) = 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 i = 1 n ln ( ( 1 d i ) 2 ) = 1 2 i = 1 n ln 1 d i 2 ( 1 d i ) 2 = 1 2 i = 1 n ln ( 1 + d i 1 d i ) = 1 2 i = 1 n ln 1 + 2 d i 1 d i .
(b) It follows from part (a) of the theorem that C ( Y 1 , Y 2 ) is attained as the mutual information I ( Y 1 , Y 2 ; W ) for a random variable W with Q W = Q = I . Consider now a triple of random variables ( Y 1 , Y 2 , W ) G ( 0 , Q s ( I ) ) as defined in (80)–(83), hence, Q W = I . Denote the random variable W from now on by W to indicate that it achieves the infimum of the definition of C ( Y 1 , Y 2 ) . Thus, Q W = I and
( Y 12 , Y 22 , W ) G ( 0 , Q s ( I ) ) , Q s ( I ) = I D D 1 / 2 D I D 1 / 2 D 1 / 2 D 1 / 2 I > 0 .
Let V : Ω R n 12 be a Gaussian random variable with V G ( 0 , I ) which is independent of ( Y 1 , Y 2 , W ) .
Define the new state variable W ¯ = L 1 Y 1 + L 2 Y 2 + L 3 V . Then, ( Y 1 , Y 2 , V , W ) are jointly Gaussian and it has to be shown that Q W ¯ = I , Q Y 1 , W ¯ = D 1 / 2 , and Q Y 2 , W ¯ = D 1 / 2 . These equalities follow from simple calculations using the expressions of L 1 , L 2 , and L 3 ; the calculations are omitted. It then follows from those calculations and the definition of the Gaussian measure G ( 0 , Q s ( I ) ) that, almost surely, W ¯ = W .
The signals are then represented by
Z 1 = Y 1 E [ Y 1 | F W ] = Y 1 Q Y 1 , W ( Q W ) 1 W = Y 1 D 1 / 2 W ,
Z 2 = Y 2 E [ Y 2 | F W ] = Y 2 Q Y 2 , W ( Q W ) 1 W = Y 2 D 1 / 2 W .
It is proven that the triple of random variables ( Z 1 , Z 2 , W ) are independent.
E [ Z 1 ( W ) T ] = E [ Y 1 ( W ) T ] D 1 / 2 E [ W ( W ) T ] = D 1 / 2 D 1 / 2 = 0 , E [ Z 2 ( W ) T ] = 0 , E [ Z 1 Z 2 T ] = E [ ( Y 1 D 1 / 2 W ) ( Y 2 D 1 / 2 W ) T ] = 0 .
Hence, the original signals are represented as shown by the formulas,
Y 1 = Z 1 + D 1 / 2 W , by (121), Q Y 1 , W Q W 1 = D 1 / 2 , and by definition of Z 1 ; Y 2 = Z 2 + D 1 / 2 W , similarly. □
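The construction in part (b) of Theorem 11 can be verified at the level of second moments. The sketch below (assumed values of d i only, chosen for illustration) forms W = L 1 Y 1 + L 2 Y 2 + L 3 V through covariance algebra and confirms Q W = I , Q Y 1 , W = D 1 / 2 and Q Z 1 = I D , together with the value of C ( Y 1 , Y 2 ) in (102).

```python
import numpy as np

d = np.array([0.8, 0.5, 0.3])                      # assumed canonical correlations
n = len(d)
I, D, Dh = np.eye(n), np.diag(d), np.diag(np.sqrt(d))

L1 = Dh @ np.linalg.inv(I + D)                     # L1 = L2 = D^{1/2}(I+D)^{-1}
L2 = L1
L3 = np.diag(np.sqrt((1 - d) / (1 + d)))           # (I-D)^{1/2}(I+D)^{-1/2}

# Covariance of the stacked vector (Y1, Y2, V); V ~ G(0, I) independent of (Y1, Y2).
Q = np.block([[I, D, np.zeros((n, n))],
              [D, I, np.zeros((n, n))],
              [np.zeros((n, n)), np.zeros((n, n)), I]])
T = np.hstack([L1, L2, L3])                        # W = L1 Y1 + L2 Y2 + L3 V

Q_W = T @ Q @ T.T
Q_Y1W = np.hstack([I, np.zeros((n, n)), np.zeros((n, n))]) @ Q @ T.T
Q_Z1 = I - Dh @ Q_Y1W.T - Q_Y1W @ Dh + Dh @ Q_W @ Dh   # cov of Z1 = Y1 - D^{1/2} W

print(np.allclose(Q_W, I), np.allclose(Q_Y1W, Dh), np.allclose(Q_Z1, I - D))

# Wyner's common information (102) for this assumed d.
print(0.5 * np.sum(np.log((1 + d) / (1 - d))))
```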

3.3. Wyner’s Common Information of Arbitrary Gaussian Random Variables

First, the two special cases of (1) a tuple of independent Gaussian random variables and (2) a tuple of identical Gaussian random variables are analyzed. From those results and that of the previous subsection, one can then prove Wyner’s common information for arbitrary Gaussian random variables.
The special case of the canonical variable form with only private parts is presented below.
Proposition 5.
Consider the case of a tuple of Gaussian vectors with only private parts. Hence, the Gaussian measure is
( Y 13 , Y 23 ) G ( 0 , Q ( Y 13 , Y 23 ) ) , Q ( Y 13 , Y 23 ) = I 0 0 I , Y 13 : Ω R p 13 , Y 23 : Ω R p 23 .
(a) 
The minimal σ-algebra F W which makes Y 13 , Y 23 conditionally independent is the trivial σ-algebra denoted by F 0 = { ∅ , Ω } . Thus, ( F Y 13 , F Y 23 | F 0 ) CI . The random variable W, in this case, is the constant W 3 = 0 R , hence F W 3 = F 0 .
(b) 
Then, W = W 3 and
C ( Y 1 , Y 2 ) = C ( Y 13 , Y 23 ) = I ( Y 13 , Y 23 ; W 3 ) = 0 .
(c) 
The weak stochastic realization that achieves C ( Y 13 , Y 23 ) = 0 is
Z 1 = Y 13 , Z 2 = Y 23 , W 3 = 0 .
The special case of canonical variable form with only identical parts is presented below.
Proposition 6.
Consider the case of a tuple of Gaussian vectors with only the identical part. Hence the Gaussian measure is,
Y 11 : Ω R p 11 , Y 21 : Ω R p 21 , p 11 = p 21 , ( Y 11 , Y 21 ) G ( 0 , Q ( Y 11 , Y 21 ) ) , Q ( Y 11 , Y 21 ) = I I I I , Y 11 = Y 21 a.s.
(a) 
The only minimal σ-algebra which makes Y 11 and Y 21 Gaussian conditional-independent is F Y 11 = F Y 21 . The state variable is thus, W 1 = Y 11 = Y 21 and F W = F Y 11 = F Y 21 .
(b) 
Then C ( Y 1 , Y 2 ) = C ( Y 11 , Y 21 ) = + ∞ .
(c) 
The weak stochastic realization is again simple: the variable W equals the identical component and there is no need to use the signals Z 1 and Z 2 . Thus, the representations are,
Z 1 = 0 R , Z 2 = 0 R , W = Y 11 = Y 21 .
Theorem 4 is now proven. Thus, the setting is that of a tuple of arbitrary Gaussian random variables, not necessarily restricted to the correlated parts of these random variables of Theorem 8, by (77)–(79). It is shown that C ( Y 1 , Y 2 ) is computed by a decomposition and by the use of the formulas previously obtained in Section 3.2.
Proof of Theorem 4.
(a)
C ( Y 1 , Y 2 ) = inf ( Y 1 , Y 2 , W ) CIG I ( Y 1 , Y 2 ; W ) = inf I ( Y 11 , Y 21 ; W 1 ) + I ( Y 12 , Y 22 ; W 2 ) + I ( Y 13 , Y 23 ; 0 ) , by Proposition A1, inf ( Y 1 , Y 2 , W ) CIG I ( Y 11 , Y 21 ; W 1 ) + inf ( Y 1 , Y 2 , W ) CIG I ( Y 12 , Y 22 ; W 2 ) + inf ( Y 1 , Y 2 , W ) CIG I ( Y 13 , Y 23 ; 0 ) = inf ( Y 11 , Y 21 , W 1 ) CIG I ( Y 11 , Y 21 ; W 1 ) + inf ( Y 12 , Y 22 , W 2 ) CIG I ( Y 12 , Y 22 ; W 2 ) + I ( Y 13 , Y 23 ; 0 ) = C ( Y 11 , Y 21 ; W 1 ) + C ( Y 12 , Y 22 ; W 2 ) + C ( Y 13 , Y 23 ; 0 ) = 0 , if p 13 > 0 , p 23 > 0 , p 11 = p 12 = p 21 = p 22 = 0 , 1 2 i = 1 n ln 1 + d i 1 d i , if p 12 = p 22 > 0 , p 11 = p 21 = 0 , p 13 0 , p 23 0 , + ∞ , else.
The latter equality follows from, respectively, Proposition 6, Theorem 11 and Proposition 5 (a and b). It will be shown that C ( Y 1 , Y 2 ) is less than or equal to the right-hand side of Equation (129). From the latter inequality and the above inequality, the expression of Equation (129) then follows.
To be specific, it will be proven that C ( Y 1 , Y 2 ) is less than or equal to the expression I ( Y 1 , Y 2 ; W ) where W is defined in statement (b) of the proposition. It then follows from the proof of Theorem 11 that ( F Y 12 , F Y 22 | F W 2 ) CIG min .
Then:
C ( Y 1 , Y 2 ) = inf ( Y 1 , Y 2 | W ) CIG I ( Y 1 , Y 2 ; W ) I ( Y 1 , Y 2 ; W ) = I ( Y 11 , Y 21 ; W 1 ) + I ( Y 12 , Y 22 ; W 2 ) + I ( Y 13 , Y 23 ; 0 ) = 0 , if p 13 > 0 , p 23 > 0 , p 11 = p 12 = p 21 = p 22 = 0 , 1 2 i = 1 n ln 1 + d i 1 d i , if p 12 = p 22 > 0 , p 11 = p 21 = 0 , p 13 0 , p 23 0 , + ∞ , else.
The latter equality is proven as follows. In the first case, when p 13 > 0 , p 23 > 0 , and p 11 = p 12 = p 21 = p 22 = 0 , then Y 1 = Y 13 and Y 2 = Y 23 are independent random variables. It then follows from Proposition 5 that I ( Y 1 , Y 2 ; 0 ) = I ( Y 13 , Y 23 ; 0 ) = 0 . In the second case, when p 12 = p 22 > 0 , p 13 0 , p 23 0 , and p 11 = p 21 = 0 , it follows from Proposition A1 and from Theorem 11 that
I ( Y 1 , Y 2 ; W ) = I ( Y 12 , Y 22 ; W 2 ) + I ( Y 13 , Y 23 ; 0 ) = 1 2 i = 1 n ln 1 + d i 1 d i .
In the third case, when p 11 = p 21 > 0 and the other p i j indices are arbitrary, then I ( Y 1 , Y 2 ; W ) = + ∞ . Hence, the inequality C ( Y 1 , Y 2 ) ≤ right-hand side is proven and hence equality holds.
(c) This directly follows from Proposition 4. See also Section 3.6 of [21]. □
A procedure for the numerical calculation of Wyner's common information is given in Section 3.7 of [21].
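A direct implementation of the case formula (129) is immediate once the canonical variable form is available. The following sketch (a hypothetical helper; it assumes the indices p 11 , p 12 , p 13 , p 21 , p 22 , p 23 and the coefficients d i have already been produced by Algorithm A1) evaluates the three cases.

```python
import numpy as np

def wyner_common_information(p11, p12, p13, p21, p22, p23, d):
    """Evaluate C(Y1, Y2) following the case formula (129).

    p11..p23 are the canonical-variable-form dimensions (identical,
    correlated and private parts); d holds the canonical correlation
    coefficients d_i in (0, 1) of the correlated parts (len(d) == p12 == p22).
    """
    if p11 == p21 > 0:
        return np.inf                       # identical parts present
    if p12 == p22 > 0:
        d = np.asarray(d, dtype=float)
        return 0.5 * np.sum(np.log((1 + d) / (1 - d)))
    return 0.0                              # only private parts

# Examples with assumed dimensions and coefficients.
print(wyner_common_information(0, 3, 2, 0, 3, 1, [0.8, 0.5, 0.3]))
print(wyner_common_information(1, 0, 0, 1, 0, 0, []))   # identical parts -> inf
```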

4. Parametrization of Gray and Wyner Rate Region and Wyner’s Lossy Common Information

This section is devoted to the characterizations of rates that lie in the Gray–Wyner rate region for a tuple of Gaussian random variables with square error distortion functions.
By Gray–Wyner [1] (Theorem 8), reproduced in Theorem 1 to characterize rate triples ( R 0 , R 1 , R 2 ) R G W ( Δ 1 , Δ 2 ) , it is necessary to:
(i) Characterize the rate distortion functions R Y i ( Δ i ) , R Y i | W ( Δ i ) , i = 1 , 2 and R Y 1 , Y 2 ( Δ 1 , Δ 2 ) ;
(ii) Construct the realizations that induce the test channels of R Y i ( Δ i ) , R Y i | W ( Δ i ) , i = 1 , 2 and R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , and understand the structural properties of the realizations.

4.1. Characterizations of Joint, Conditional and Marginal RDFs

Theorem 12 is the characterization of the joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) from [24].
Theorem 12.
Ref. [24] Consider a tuple of Gaussian random variables Y i : Ω R p i , with ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q ( Y 1 , Y 2 ) 0 (which implies Q Y i > 0 , for i = 1 , 2 ). Consider the joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) with square error distortion functions D Y 1 ( y 1 , y ^ 1 ) = | | y 1 y ^ 1 | | R p 1 2 , D Y 2 ( y 2 , y ^ 2 ) = | | y 2 y ^ 2 | | R p 2 2 . Then, the following hold:
(a) The mutual information I ( Y 1 , Y 2 ; Y ^ 1 , Y ^ 2 ) satisfies
I ( Y 1 , Y 2 ; Y ^ 1 , Y ^ 2 ) I ( Y 1 , Y 2 ; E Y 1 | F Y ^ 1 , Y ^ 2 , E Y 2 | F Y ^ 1 , Y ^ 2 )
and the mean square error satisfies
E | | Y i Y ^ i | | R p 1 2 E | | Y i E Y i | F Y ^ 1 , Y ^ 2 | | R p i 2 , i = 1 , 2 .
Moreover, inequalities in (130) and (131) hold with equality if there exists a jointly Gaussian realization of ( Y ^ 1 , Y ^ 2 ) or a Gaussian test channel distribution P Y ^ 1 , Y ^ 2 | Y 1 , Y 2 such that the joint distribution P Y ^ 1 , Y ^ 2 , Y 1 , Y 2 is jointly Gaussian, and such that the following identities both hold;
E Y 1 | F Y ^ 1 , Y ^ 2 = E Y 1 | F Y ^ 1 = Y ^ 1 , E Y 2 | F Y ^ 1 , Y ^ 2 = E Y 2 | F Y ^ 2 = Y ^ 2 .
(b) A realization that achieves the lower bounds of part (a), i.e., satisfies (132), is the Gaussian realization of ( Y 1 , Y 2 , Y ^ 1 , Y ^ 2 ) given by
Y ^ 1 Y ^ 2 = H Y 1 Y 2 + V 1 V 2
( V 1 , V 2 ) G ( 0 , Q ( V 1 , V 2 ) ) , ( V 1 , V 2 ) independent of ( Y 1 , Y 2 ) ,
H Q ( Y 1 , Y 2 ) = Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) 0 ,
Q ( V 1 , V 2 ) = Q ( E 1 , E 2 ) H T = H Q ( Y 1 , Y 2 ) H Q ( Y 1 , Y 2 ) H T 0 , if Q ( Y 1 , Y 2 ) > 0 then
H = I p 1 + p 2 Q ( E 1 , E 2 ) Q ( Y 1 , Y 2 ) 1 , Q ( V 1 , V 2 ) = Q ( E 1 , E 2 ) Q ( E 1 , E 2 ) Q ( Y 1 , Y 2 ) 1 Q ( E 1 , E 2 ) 0 ,
where ( E 1 , E 2 ) is the error tuple, that satisfies the structural property,
E i = Y i E Y i | F Y ^ 1 , Y ^ 2 = Y i E Y i | F Y ^ i = Y i Y ^ i , i = 1 , 2
and the variance matrix of this tuple is,
( E 1 , E 2 ) G ( 0 , Q ( E 1 , E 2 ) ) , Q ( E 1 , E 2 ) = Q E 1 Q E 1 , E 2 Q E 1 , E 2 T Q E 2 R ( p 1 + p 2 ) × ( p 1 + p 2 ) .
(c) The joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) is characterized by
R Y 1 , Y 2 ( Δ 1 , Δ 2 ) = inf E | | E i | | R p i 2 = trace ( Q E i ) Δ i , i = 1 , 2 1 2 ln det ( Q ( Y 1 , Y 2 ) ) det ( Q ( E 1 , E 2 ) ) + [ 0 , ] ,
such that Q ( Y ^ 1 , Y ^ 2 ) = Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) 0
where the test channel distribution P Y ^ 1 , Y ^ 2 | Y 1 , Y 2 or the joint distribution P Y ^ 1 , Y ^ 2 , Y 1 , Y 2 is induced by the realization of part (b).
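The structural properties of the realization in Theorem 12. (b) are easy to confirm numerically. The sketch below (assumed covariances Q ( Y 1 , Y 2 ) and Q ( E 1 , E 2 ) with p 1 = p 2 = 1 , chosen only for illustration) builds H and Q ( V 1 , V 2 ) from (137) and checks that Cov ( Y , Y ^ ) = Cov ( Y ^ ) = Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) , which for zero-mean jointly Gaussian variables is equivalent to the identities (132).

```python
import numpy as np

# Assumed source covariance Q_{(Y1,Y2)} > 0 (p1 = p2 = 1) and error covariance Q_{(E1,E2)}.
Q_Y = np.array([[1.0, 0.6],
                [0.6, 1.0]])
Q_E = np.array([[0.20, 0.05],
                [0.05, 0.30]])
assert np.all(np.linalg.eigvalsh(Q_Y - Q_E) > 0)   # Q_{(Yhat1,Yhat2)} > 0 required

# Realization Yhat = H Y + V with H and Q_V as in (137).
H = np.eye(2) - Q_E @ np.linalg.inv(Q_Y)
Q_V = Q_E - Q_E @ np.linalg.inv(Q_Y) @ Q_E

Q_Yhat = H @ Q_Y @ H.T + Q_V                       # Cov(Yhat)
Q_YYhat = Q_Y @ H.T                                # Cov(Y, Yhat)

print(np.allclose(Q_Yhat, Q_Y - Q_E))              # True
print(np.allclose(Q_YYhat, Q_Yhat))                # True  =>  E[Y | Yhat] = Yhat

# Value of the pay-off in (139) for this (Q_Y, Q_E).
print(max(0.5 * np.log(np.linalg.det(Q_Y) / np.linalg.det(Q_E)), 0.0))
```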
The conditional rate distortion function, derived in [25,26], is also required.
Theorem 13.
Ref. [25] (Theorem 1, Theorem 4), [26] Consider a triple of random variables Y i : Ω R p i , i = 1 , 2 , W : Ω W , where W is continuous or finite-valued, with joint distribution P Y 1 , Y 2 , W , and marginal distributions P Y 1 , Y 2 and P Y i , i = 1 , 2 , the jointly Gaussian distribution ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q ( Y 1 , Y 2 ) 0 and Y i G ( 0 , Q Y i ) , Q Y i > 0 , i = 1 , 2 , respectively. Consider the conditional RDFs R Y i | W ( Δ i ) , i = 1 , 2 with square error distortion functions D Y i ( y i , y ^ i ) = | | y i y ^ i | | R p i 2 , i = 1 , 2 . Then, the following hold.
(a) For an arbitrary random variable, W : Ω W , the mutual information I ( Y i ; Y ^ i | W ) satisfies
I ( Y i ; Y ^ i | W ) I ( Y i ; E Y i | F Y ^ i , W | W ) , i = 1 , 2
and the mean square error satisfies
E | | Y i Y ^ i | | R p i 2 E | | Y i E Y i | F Y ^ i , W | | R p i 2 , i = 1 , 2 .
Moreover, inequalities in (141) and (142) hold with equality, if there exists a realization of Y ^ i of the test channel distribution P Y ^ i | Y i , W , such that the joint distribution P Y ^ i , Y i , W satisfies the identity:
X ^ i cm E Y i | F Y ^ i , W = E Y i | F Y ^ i = Y ^ i , i = 1 , 2 .
(b) Suppose P Y 1 , Y 2 , W is a jointly Gaussian distribution and W : Ω R n is Gaussian. A realization that achieves the lower bounds of part (a), i.e., satisfies (143), is the Gaussian realization of ( Y i , W , Y ^ i ) given by
Y ^ i = H i Y i + I p i H i Q Y i , W Q W W + V i , i = 1 , 2 ,
V i G ( 0 , Q V i ) , V i independent of Y i ,
H i Q Y i | W = Q Y i | W Q E i 0 ,
Q V i = H i Q E i = H i Q Y i | W H i Q Y i | W H i T 0 ,
where † denotes the pseudoinverse of a matrix, and E i is the error that satisfies the structural property,
E i = Y i E Y i | F Y ^ i , W = Y i Y ^ i , E i G ( 0 , Q E i ) , i = 1 , 2 .
The RDF R Y i | W ( Δ i ) for jointly Gaussian ( Y 1 , Y 2 , W ) is characterized by
R Y i | W ( Δ i ) = inf E | | E i | | R p i 2 = trace ( Q E i ) Δ i 1 2 ln det ( Q Y i | W ) det ( Q E i ) + [ 0 , ] , i = 1 , 2 ,
such that Q Y ^ i = Q Y i | W Q E i 0 ,
where the test channel distribution P Y ^ i | Y i , W or the joint distribution P Y ^ i , Y i , W is induced by the above realization.
The following is stated as a conjecture, because it is not shown in this paper; it can be shown using Theorem 13.
Conjecture 1.
Conditional RDF of Gaussian sources with arbitrary conditioning RV
Consider a triple of random variables Y i : Ω R p i , i = 1 , 2 , W : Ω W , where W is continuous or finite-valued, with joint distribution P Y 1 , Y 2 , W , and marginal distributions P Y 1 , Y 2 and P Y i , i = 1 , 2 , the jointly Gaussian distribution ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q ( Y 1 , Y 2 ) 0 and Y i G ( 0 , Q Y i ) , Q Y i > 0 , i = 1 , 2 , respectively. Consider the conditional RDFs R Y i | W ( Δ i ) , i = 1 , 2 with square error distortion functions D Y i ( y i , y ^ i ) = | | y i y ^ i | | R p i 2 , i = 1 , 2 .
Then, the following hold.
(a) For an arbitrary random variable, W : Ω W , and X ^ i cm satisfying (143), the following lower bounds hold.
I ( X i ; X ^ i | W ) I ( X i ; X ^ i cm | W ) , i = 1 , 2
= W I ( X i ; X ^ i cm | W = w ) P W ( d w )
inf w W I ( X i ; X ^ i cm | W = w ) ,
E [ D X i ( X i , X ^ i ) ] = W R p 1 × R p 2 D X i ( x i , x ^ i ) P X i , X ^ i | W ( x i , x ^ i | w ) P W ( d w )
= W Δ i ( w ) P W ( d w ) ,
W Δ i cm ( w ) P W ( d w ) ,
inf w W Δ i cm ( w ) , i = 1 , 2 ,
where
Δ i ( w ) E [ D X i ( X i , X ^ i ) | W = w ] , Δ i cm ( w ) E [ D X i ( X i , X ^ i cm ) | W = w ] , i = 1 , 2 .
Moreover, the inequalities in (153), (157), are achieved if,
(i) (141) holds, and
(ii) the mutual information I ( X i ; X ^ i | W = w ) and Δ i ( w ) for i = 1 , 2 , are independent of w W .
(b) The rate distortion function R X i | W ( Δ i ) , for W : Ω W continuous or finite-valued, achieves a minimum value if,
(i) W : Ω R n , n Z + is Gaussian, and P Y i , W is jointly Gaussian,
(ii) ( X ^ i , X i , W ) is given by the realization of Theorem 13.(b) for i = 1 , 2 .
The characterization of the marginal RDFs R Y i ( Δ i ) , i = 1 , 2 —which are well-known, and can be found in many books—is also needed. The weak realization of the test channel, which follows from Theorem 13 (see also [25]) as a degenerate case, and is summarized in the next theorem, is important in this paper.
Theorem 14.
Ref. [25] (Theorem 1, Theorem 4) Consider a tuple of Gaussian random variables Y i : Ω R p i , i = 1 , 2 , with ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q ( Y 1 , Y 2 ) > 0 , Q Y i > 0 , for i = 1 , 2 .
For the marginal RDFs R Y i ( Δ i ) , i = 1 , 2 with square error distortion functions D Y i ( y i , y ^ i ) = | | y i y ^ i | | R p i 2 , the statements of Theorem 13 hold with W generating the trivial information, i.e., F W = { Ω , ∅ } . That is, the marginal RDFs R Y i ( Δ i ) are characterized by
R Y i ( Δ i ) = inf E | | E i | | R p i 2 = trace ( Q E i ) Δ i 1 2 ln det ( Q Y i ) det ( Q E i ) + [ 0 , ] , i = 1 , 2 ,
such that Q Y ^ i = Q Y i Q E i 0
where the test channel distribution P Y ^ i | Y i or the joint distribution P Y ^ i , Y i is induced by the realization
Y ^ i = H i Y i + V i , i = 1 , 2 ,
V i G ( 0 , Q V i ) , V i independent of Y i ,
H i Q Y i = Q Y i Q E i 0 ,
Q V i = H i Q E i = H i Q Y i H i Q Y i H i T 0 .
and where E i is the error that satisfies the structural property
E i = Y i E Y i | F Y ^ i = Y i Y ^ i , E i G ( 0 , Q E i ) , i = 1 , 2 .
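For a single Gaussian vector, the infimum in (159) reduces to the classical reverse water-filling over the eigenvalues of Q Y i . The sketch below (illustrative only; an assumed Q Y and a simple bisection on the water level, not taken from the paper) computes R Y ( Δ ) and the per-component distortions.

```python
import numpy as np

def marginal_rdf(Q_Y, Delta):
    """Reverse water-filling for R_Y(Delta) of a zero-mean Gaussian vector."""
    sigma2 = np.linalg.eigvalsh(Q_Y)               # eigenvalues of Q_Y
    lo, hi = 0.0, float(np.max(sigma2))
    for _ in range(200):                           # bisection on the water level
        lam = 0.5 * (lo + hi)
        if np.sum(np.minimum(lam, sigma2)) < Delta:
            lo = lam
        else:
            hi = lam
    d_j = np.minimum(lam, sigma2)                  # optimal per-component distortions
    return max(0.5 * np.sum(np.log(sigma2 / d_j)), 0.0), d_j

Q_Y = np.array([[1.0, 0.6],
                [0.6, 1.0]])                       # assumed source covariance
print(marginal_rdf(Q_Y, Delta=0.5))
```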
Next, we express the characterization of the joint RDF of Theorem 12, using the canonical variable form, and the canonical correlation coefficients. The special case when Q ( E 1 , E 2 ) is block-diagonal is given in [27].
Theorem 15.
Consider the statement of Theorem 12. Compute the canonical variable form of the tuple of Gaussian random variables ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q Y i > 0 , according to Algorithm A1. This yields the indices p 11 = p 21 , p 12 = p 22 , p 13 , p 23 , and n = p 11 + p 12 = p 21 + p 22 , the diagonal matrix D 4 with canonical correlation coefficients d 4 , i ( 0 , 1 ) for i = 1 , , p 12 , and decompositions (see Algorithm A1, 1–4)
Q Y 1 = U 1 D 1 U 1 T , Q Y 2 = U 2 D 2 U 2 T ,
with U i R p i × p i orthogonal ( U i U i T = I p i = U i T U i ), i = 1 , 2 , and the singular-value decomposition of
D 1 1 2 U 1 T Q Y 1 Y 2 U 2 D 2 1 2 = U 3 D 3 U 4 T ,
with U 3 R p 1 × p 1 , U 4 R p 2 × p 2 orthogonal,
D 3 = I p 11 0 0 0 D 4 0 0 0 0 R p 1 × p 2 ,
D 4 = Diag ( d 4 , 1 , . . . , d 4 , p 12 ) R p 12 × p 12 , 1 > d 4 , 1 d 4 , 2 d 4 , p 12 > 0 .
Define the new variance matrix of Q ( Y 1 , Y 2 ) according to
Q cvf = I p 1 D 3 D 3 T I p 2 .
Compute the canonical variable form of the tuple of Gaussian error random variables ( E 1 , E 2 ) G ( 0 , Q ( E 1 , E 2 ) ) of Theorem 12.(b), according to Algorithm A1. This yields the indices p ¯ 11 = p ¯ 21 , p ¯ 12 = p ¯ 22 , p ¯ 13 , p ¯ 23 , and n ¯ = p ¯ 11 + p ¯ 12 = p ¯ 21 + p ¯ 22 and the diagonal matrix D ¯ 4 with canonical correlation coefficients d ¯ 4 , i ( 0 , 1 ) for i = 1 , , p ¯ 12 , and decompositions (see Algorithm A1, 1–4),
Q E 1 = U ¯ 1 D ¯ 1 U ¯ 1 T , Q E 2 = U ¯ 2 D ¯ 2 U ¯ 2 T ,
D ¯ i = Diag ( d ¯ i , 1 , , d ¯ i , p i ) R p i × p i , d ¯ i , 1 d ¯ i , 2 d ¯ i , p i > 0 , i = 1 , 2 ,
with U ¯ i R p i × p i orthogonal ( U ¯ i U ¯ i T = I p i = U ¯ i T U ¯ i ), i = 1 , 2 , and the singular-value decomposition of
D ¯ 1 1 2 U ¯ 1 T Q E 1 E 2 U ¯ 2 D ¯ 2 1 2 = U ¯ 3 D ¯ 3 U ¯ 4 T ,
with U ¯ 3 R p 1 × p 1 , U ¯ 4 R p 2 × p 2 orthogonal,
D ¯ 3 = I p ¯ 11 0 0 0 D ¯ 4 0 0 0 0 R p 1 × p 2 ,
D ¯ 4 = Diag ( d ¯ 4 , 1 , . . . , d ¯ 4 , p ¯ 12 ) R p ¯ 12 × p ¯ 12 , 1 > d ¯ 4 , 1 d ¯ 4 , 2 d ¯ 4 , p ¯ 12 > 0 .
Define the new variance matrix of Q ( E 1 , E 2 ) according to,
Q ¯ cvf = I p 1 D ¯ 3 D ¯ 3 T I p 2 .
The joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) of Theorem 12. (c) is equivalently characterized by
R Y 1 , Y 2 ( Δ 1 , Δ 2 ) = inf Q ( E 1 , E 2 ) 0 : n ¯ Z + , i = 1 p 1 d ¯ 1 , i Δ 1 , i = 1 p 2 d ¯ 2 , i Δ 2 1 2 ln det ( D 1 ) det ( D 2 ) det ( Q cvf ) det ( D ¯ 1 ) det ( D ¯ 2 ) det ( Q ¯ cvf ) + [ 0 , ] ,
such that Q ( Y ^ 1 , Y ^ 2 ) = Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) 0 ,
where
det ( Q cvf ) = det ( I p 1 D 3 D 3 T )
= 1 , if p 13 > 0 , p 23 > 0 , p 11 = p 12 = p 21 = p 22 = 0 , i = 1 n 1 d 4 , i 2 , if p 11 = p 21 = 0 , p 12 = p 22 = n , p 13 0 , p 23 0 , 0 , if p 11 = p 21 > 0 , p 12 = p 22 0 , p 13 0 , p 23 0 ,
det ( Q ¯ cvf ) = det ( I p 1 D ¯ 3 D ¯ 3 T )
= 1 , if p ¯ 13 > 0 , p ¯ 23 > 0 , p ¯ 11 = p ¯ 12 = p ¯ 21 = p ¯ 22 = 0 , i = 1 n ¯ 1 d ¯ 4 , i 2 , if p ¯ 11 = p ¯ 21 = 0 , p ¯ 12 = p ¯ 22 = n ¯ , p ¯ 13 0 , p ¯ 23 0 , 0 , if p ¯ 11 = p ¯ 21 > 0 , p ¯ 12 = p ¯ 22 0 , p ¯ 13 0 , p ¯ 23 0 ,
Moreover, a necessary condition for R Y 1 , Y 2 ( Δ 1 , Δ 2 ) < + ∞ is p ¯ 11 = p ¯ 21 = 0 .
Proof. 
First, apply Algorithm A1 to the tuple of Gaussian random variables ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , and then to the Gaussian random variables ( E 1 , E 2 ) G ( 0 , Q ( E 1 , E 2 ) ) of Theorem 12. (b). This gives (166)–(176). Then, (177) follows from (139) using (166)–(176), and the standard properties of the determinant of a matrix. The remaining equations are obtained from (166)–(176). The last statement follows from the values of det ( Q ¯ cvf ) , det ( Q cvf ) . □
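The canonical correlation coefficients entering (177) can be obtained with standard linear algebra. The following sketch is a generic canonical-correlation computation under assumed covariances (it is not a reproduction of Algorithm A1): it forms D 1 1 / 2 U 1 T Q Y 1 Y 2 U 2 D 2 1 / 2 as in (167) and reads off its singular values; singular values equal to 1 correspond to identical parts, values in ( 0 , 1 ) to correlated parts, and zero values to private parts.

```python
import numpy as np

def canonical_correlations(Q_Y1, Q_Y2, Q_Y1Y2):
    """Singular values of D1^{-1/2} U1^T Q_{Y1Y2} U2 D2^{-1/2}, cf. (166)-(168)."""
    D1, U1 = np.linalg.eigh(Q_Y1)                  # Q_Y1 = U1 Diag(D1) U1^T
    D2, U2 = np.linalg.eigh(Q_Y2)                  # Q_Y2 = U2 Diag(D2) U2^T
    M = np.diag(D1 ** -0.5) @ U1.T @ Q_Y1Y2 @ U2 @ np.diag(D2 ** -0.5)
    return np.linalg.svd(M, compute_uv=False)      # d_i, ordered decreasingly

# Assumed joint covariance of (Y1, Y2) with p1 = p2 = 2 (illustration only).
Q_Y1 = np.array([[1.0, 0.2], [0.2, 1.0]])
Q_Y2 = np.array([[1.0, 0.0], [0.0, 1.0]])
Q_Y1Y2 = np.array([[0.5, 0.1], [0.0, 0.3]])
print(canonical_correlations(Q_Y1, Q_Y2, Q_Y1Y2))
```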
Remark 7.
By Theorem 15, since R Y 1 , Y 2 ( Δ 1 , Δ 2 ) [ 0 , ] , by (180), it suffices to consider Q ( Y 1 , Y 2 ) > 0 , which implies p 11 = p 21 = 0 , Q Y i > 0 , i = 1 , 2 . Furthermore, to ensure R Y 1 , Y 2 ( Δ 1 , Δ 2 ) [ 0 , ) , it suffices to also consider Q ( E 1 , E 2 ) > 0 , which implies that p ¯ 11 = p ¯ 21 = 0 , Q E i > 0 , i = 1 , 2 .
From Theorem 15, the next corollary directly follows which identifies the subset of the distortion region such that Gray’s lower bound [28], R Y 1 , Y 2 ( Δ 1 , Δ 2 ) R Y 2 | Y 1 ( Δ 2 ) + R Y 1 ( Δ 1 ) holds with equality.
Corollary 2.
Consider the statement of Theorem 15, and without loss of generality, assume ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , with Q ( Y 1 , Y 2 ) > 0 (and hence Q Y i > 0 , i = 1 , 2 ).
The joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) with D 1 , D 2 , D ¯ 1 , D ¯ 2 , Q cvf , Q ¯ cvf , defined in Theorem 15, and corresponding to p 11 = p 21 = p ¯ 11 = p ¯ 21 = 0 satisfies the lower bound ( R Y 2 | Y 1 ( Δ 2 ) is obtained from Theorem 13 by letting W = Y 1 .),
R Y 1 , Y 2 ( Δ 1 , Δ 2 ) R Y 2 | Y 1 ( Δ 2 ) + R Y 1 ( Δ 1 ) = inf E | | E 2 | | R p 2 2 = trace ( Q E 2 ) Δ 2 1 2 ln ( det ( Q Y 2 | Y 1 ) det ( Q E 2 ) )
+ inf E | | E 1 | | R p 1 2 = trace ( Q E 1 ) Δ 1 1 2 ln ( det ( Q Y 1 ) det ( Q E 1 ) )
= inf i = 1 p 2 d ¯ 2 , i Δ 2 1 2 ln ( det ( D 2 ) det ( Q cvf ) det ( D ¯ 2 ) ) + inf i = 1 p 1 d ¯ 1 , i Δ 1 1 2 ln ( det ( D 1 ) det ( D ¯ 1 ) )
that is, p ¯ 12 = p ¯ 22 = 0 .
Moreover, the inequalities (184) and (185) hold with the equalities, on the strictly positive surface D C ( Y 1 , Y 2 ) , defined by
D C ( Y 1 , Y 2 ) = ( Δ 1 , Δ 2 ) [ 0 , ] × [ 0 , ] | Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) > 0 .
Proof. 
The lower bound (183) is due to Gray [28]. The equality in (184) follows by using the values of the rate distortion functions in the right-hand side of (183). Equality (185) follows from the singular value decomposition of the matrices given in Theorem 15, using Q Y 2 | Y 1 = Q Y 2 Q Y 2 , Y 1 Q Y 1 1 Q Y 2 , Y 1 T . To establish the equalities, note that (177) with det ( Q ¯ cvf ) = 1 , equivalently p ¯ 12 = p ¯ 22 = 0 , is precisely (185). Moreover, it can be easily verified that p ¯ 12 = p ¯ 22 = 0 for the distortion region D C ( Y 1 , Y 2 ) . □

4.2. Wyner’s Lossy Common Information of Correlated Gaussian Vectors

Derived in this section are the characterizations of C G W ( Y 1 , Y 2 ; Δ 1 , Δ 2 ) via Theorem 2, for jointly Gaussian random variables with square-error distortion, as well as C W ( Y 1 , Y 2 ) via Theorem 3.
Definition 7.
Wyner’s lossy common information of a tuple of Gaussian multivariate random variables. Consider a tuple of jointly Gaussian random variables Y 1 : Ω R p 1 Y 1 , Y 2 : Ω R p 2 Y 2 , in terms of the notation ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q Y i > 0 , i = 1 , 2 , and square error distortion functions between ( y 1 , y 2 ) , and its reproduction ( y ^ 1 , y ^ 2 ) , given by
D Y 1 ( y 1 , y ^ 1 ) = | | y 1 y ^ 1 | | R p 1 2 , D Y 2 ( y 2 , y ^ 2 ) = | | y 2 y ^ 2 | | R p 2 2
where | | · | | R p i 2 denotes Euclidean distances on R p i , i = 1 , 2 .
(a) Wyner’s common information (information definition) of the tuple of Gaussian random variables ( Y 1 , Y 2 ) is defined by the expression
C ( Y 1 , Y 2 ) = inf W : Ω R n , ( F Y 1 , F Y 2 | F W ) CIG I ( Y 1 , Y 2 ; W ) [ 0 , ] .
Call any random variable W as defined above such that ( Y 1 , Y 2 , W ) G ( 0 , Q ( Y 1 , Y 2 , W ) ) and ( F Y 1 , F Y 2 | F W ) CIG a state of the tuple ( Y 1 , Y 2 ) .
If there exists a random variable W : Ω R n with n Z + = { 1 , 2 , … } which attains the infimum, i.e., if C ( Y 1 , Y 2 ) = I ( Y 1 , Y 2 ; W ) , then call that random variable a minimal information state of the tuple ( Y 1 , Y 2 ) .
(b) Wyner’s common information (operational definition) is defined for a tuple of strictly positive real numbers γ = ( γ 1 , γ 2 ) R + + × R + + = ( 0 , ) × ( 0 , ) such that, for all 0 ( Δ 1 , Δ 2 ) γ ,
C G W ( Y 1 , Y 2 ; Δ 1 , Δ 2 ) = C W ( Y 1 , Y 2 ) = C ( Y 1 , Y 2 ) , for ( Δ 1 , Δ 2 ) D W = ( Δ 1 , Δ 2 ) [ 0 , ] × [ 0 , ] | 0 ( Δ 1 , Δ 2 ) γ
provided identity (15) holds, i.e., R Y 1 | W ( Δ 1 ) + R Y 2 | W ( Δ 2 ) + I ( Y 1 , Y 2 ; W ) = R Y 1 , Y 2 ( Δ 1 , Δ 2 ) .
By the above definition, the problem of calculating Wyner’s lossy common information via (18) is decomposed into the characterization of C ( Y 1 , Y 2 ) such that identity (15) is satisfied. This follows from the fact that the only difference between C W ( Y 1 , Y 2 ) and C ( Y 1 , Y 2 ) is the specification of the region D W such that C G W ( Y 1 , Y 2 ; Δ 1 , Δ 2 ) = C W ( Y 1 , Y 2 ) = C ( Y 1 , Y 2 ) is constant for ( Δ 1 , Δ 2 ) D W .
In the next theorem, we make use of the characterizations of the various rate distortion functions, and the test channel realizations to identify subsets of the rate region that lie on the Pangloss plane, and are consistent with the characterization of Viswanatha, Akyol and Rose [12] (Theorem 1, Equations (19) and (20)).
Theorem 16.
Consider a tuple ( Y 1 , Y 2 ) of Gaussian random variables in the canonical variable form of Definition 1. Restrict the attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79). Furthermore, consider a realization of the random variables ( Y 1 , Y 2 ) which induces the family of measures P ci P m i n C I G , as defined in Corollary 1, by (84)–(88).
Then, the following hold.
(a) The joint rate distortion function R Y 1 , Y 2 ( Δ 1 , Δ 2 ) of ( Y 1 , Y 2 ) with square error distortion satisfies
R Y 1 , Y 2 ( Δ 1 , Δ 2 ) = inf j = 1 n Δ 1 , j Δ 1 , j = 1 n Δ 2 , j Δ 2 1 2 j = 1 n ln ( 1 d j 2 ) Δ 1 , j Δ 2 , j , ( Δ 1 , Δ 2 ) D C ( Y 1 , Y 2 ) ,
trace ( Q E 1 ) = E | | Y 1 Y ^ 1 | | R n 2 = j = 1 n Δ 1 , j , trace ( Q E 2 ) = E | | Y 2 Y ^ 2 | | R n 2 = j = 1 n Δ 2 , j
where D C ( Y 1 , Y 2 ) is a strictly positive surface, defined by
D C ( Y 1 , Y 2 ) = ( Δ 1 , Δ 2 ) [ 0 , ] × [ 0 , ] | Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) > 0
and where Q ( E 1 , E 2 ) is the variance of the errors E i = Y i Y ^ i , i = 1 , 2 , with parameters p ¯ 11 = p ¯ 21 = p ¯ 12 = p ¯ 22 = 0 , and p ¯ 13 = p ¯ 23 = n .
The conditional rate distortion functions R Y i | W ( Δ i ) of Y i conditioned on W with square error distortion, and mutual information I ( Y 1 , Y 2 ; W ) satisfy
R Y 1 | W ( Δ 1 ) = inf trace ( Q E 1 ) Δ 1 1 2 ln det ( I D 1 / 2 Q W 1 D 1 / 2 ) det ( Q E 1 ) + , Δ 1 [ 0 , )
R Y 2 | W ( Δ 2 ) = inf trace ( Q E 2 ) Δ 2 1 2 ln det ( I D 1 / 2 Q W D 1 / 2 ) det ( Q E 2 ) + , Δ 2 [ 0 , )
I ( Y 1 , Y 2 ; W ) = 1 2 ln det ( I D 2 ) det ( [ I D 1 / 2 D W 1 D 1 / 2 ] [ I D 1 / 2 D W D 1 / 2 ] ) + .
where trace ( Q E i ) , i = 1 , 2 are defined as in (191).
(b) The representations of reproductions (the reader may verify that the realization satisfies the conditions given in Viswanatha, Akyol and Rose [12], Theorem 1, Equations (19) and (20)), ( Y ^ 1 , Y ^ 2 ) of ( Y 1 , Y 2 ) at the output of decoder 1 and decoder 2, which achieve the joint rate distortion functions R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , i = 1 , 2 of part (a), are
Y 1 = D 1 / 2 Q W 1 W + Z 1 ,
Y 2 = D 1 / 2 W + Z 2 ,
Y ^ 1 = Y 1 Q E 1 ( I D 1 / 2 Q W 1 D 1 / 2 ) 1 Z 1 + V 1 ,
= D 1 / 2 Q W 1 W + A 1 Z 1 + V 1 ,
Y ^ 2 = Y 2 Q E 2 ( I D 1 / 2 Q W D 1 / 2 ) 1 Z 2 + V 2 ,
= D 1 / 2 W + A 2 Z 2 + V 2 ,
Z 1 G ( 0 , ( I D 1 / 2 Q W 1 D 1 / 2 ) ) , Z 2 G ( 0 , ( I D 1 / 2 Q W D 1 / 2 ) ) , W G ( 0 , Q W ) ,
Q E 1 = E { ( Y 1 Y ^ 1 ) ( Y 1 Y ^ 1 ) T } , Q E 2 = E { ( Y 2 Y ^ 2 ) ( Y 2 Y ^ 2 ) T } ,
V 1 G ( 0 , Q E 1 A 1 T ) , V 2 G ( 0 , Q E 2 A 2 T ) ,
A 1 = I Q E 1 ( I D 1 / 2 Q W 1 D 1 / 2 ) 1 , A 2 = I Q E 2 ( I D 1 / 2 Q W D 1 / 2 ) 1 ,
Q E i = U i Λ i U i T , Λ i = Diag ( Δ i , 1 , , Δ i , n ) R n × n , U i U i T = U i T U i = I , i = 1 , 2 ,
Q Y 1 | W = I D 1 / 2 Q W 1 D 1 / 2 = U 1 Λ Y 1 | W U 1 T , Λ Y 1 | W = Diag ( Λ Y 1 | W , 1 , , Λ Y 1 | W , n ) ,
Q Y 2 | W = I D 1 / 2 Q W D 1 / 2 = U 2 Λ Y 2 | W U 2 T , Λ Y 2 | W = Diag ( Λ Y 2 | W , 1 , , Λ Y 2 | W , n ) ,
( V 1 , V 2 , Z 1 , Z 2 , W ) , are independent
and are parameterized by Q W Q W , where Q W is defined by the set of Equation (82).
Moreover, the joint distribution P Y 1 , Y 2 , Y ^ 1 , Y ^ 2 , W satisfies (the reader may verify that conditions (210) are identical to Viswanatha, Akyol and Rose [12], Theorem 1, Equations (19) and (20), for rates that lie on the Pangloss plane)
P Y ^ 1 , Y ^ 2 | W = P Y ^ 1 | W P Y ^ 2 | W , P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 , W = P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 ,
P Y 1 , Y 2 , Y ^ 1 , Y ^ 2 , W = P Y ^ 1 | Y 1 , W P Y ^ 2 | Y 2 , W P Y 1 | W P Y 2 | W P W ,
P Y 1 , Y 2 , Y ^ 1 , Y ^ 2 , W = P Y 1 | Y ^ 1 P Y 2 | Y ^ 2 P Y ^ 1 | W P Y ^ 2 | W P W .
(c) Consider part (a) and the realization of part (b). Then, R Y 1 | W ( Δ 1 ) + R Y 2 | W ( Δ 2 ) + I ( Y 1 , Y 2 ; W ) = R Y 1 , Y 2 ( Δ 1 , Δ 2 ) on the subset D C ( Y 1 , Y 2 ) , defined by (192) such that (210) holds.
(d) Suppose Q W = Q W Q W is diagonal, i.e., Q W = Diag ( Q W 1 , , Q W n ) , d i Q W i d i 1 , i . Then, the conditional RDFs R Y i | W ( Δ i ) are given by
R Y 1 | W ( Δ 1 ) = inf j = 1 n Δ 1 , j = Δ 1 1 2 j = 1 n ln ( 1 d j / Q W j ) Δ 1 , j ,
R Y 2 | W ( Δ 2 ) = inf j = 1 n Δ 2 , j = Δ 2 1 2 j = 1 n ln ( 1 d j Q W j ) Δ 2 , j ,
and the optimal Δ 1 , j , Δ 2 , j are obtained from the water-filling equations,
Δ 1 , j = λ , λ < 1 d j / Q W j 1 d j , λ 1 d j / Q W j , Δ 1 [ 0 , ) ,
Δ 2 , j = λ , λ < 1 d j Q W j 1 d j , λ 1 d j Q W j , Δ 2 [ 0 , ) .
and the representations of part (b) hold, with Q E i , Q Y i | W diagonal matrices.
Proof. 
(a) Since the attention is restricted to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79), then the statements of joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) of part (a) are a special case of Theorem 12. (c), and obtained from Corollary 2. Similarly, expressions (193)–(195) follow from (13). However, as demonstrated shortly, these also follow, from the derivation of part (b). (b) Recall that the joint rate distortion function is achieved by a jointly Gaussian distribution P Y 1 , Y 2 , Y ^ 1 , Y ^ 2 such that the average square-error distortions are satisfied. Consider the realization of the random variables ( Y 1 , Y 2 ) which induce the family of measures P ci P m i n C I G , as defined in Corollary 1, by (84)–(88). By properties of mutual information, then
I ( Y 1 , Y 2 ; Y ^ 1 , Y ^ 2 ) = H ( Y 1 , Y 2 ) I ( Y 1 , Y 2 | Y ^ 1 , Y ^ 2 )
= H ( Y 1 , Y 2 ) H ( Y 1 | Y ^ 1 , Y ^ 2 , Y 2 ) H ( Y 2 | Y ^ 1 , Y ^ 2 )
= H ( Y 1 , Y 2 ) H ( Y 2 | Y ^ 1 , Y ^ 2 , Y 1 ) H ( Y 1 | Y ^ 1 , Y ^ 2 )
H ( Y 1 , Y 2 ) H ( Y 1 | Y ^ 1 ) H ( Y 2 | Y ^ 2 ) , cond. reduces entropy,
= 1 2 i = 1 n ln ( 1 d i 2 ) + n ln ( 2 π e ) H ( Y 1 | Y ^ 1 ) H ( Y 2 | Y ^ 2 ) 1 2 i = 1 n ln ( 1 d i 2 ) + n ln ( 2 π e ) 1 2 i = 1 n ln ( Δ 1 , i ) 1 2 n ln ( 2 π e )
1 2 i = 1 n ln ( Δ 2 , i ) 1 2 n ln ( 2 π e ) , maximum entropy of Gaus. dist.
= 1 2 i = 1 n ln ( 1 d i 2 ) Δ 1 , i Δ 2 , i
where i = 1 n Δ 1 , i = E [ | | Y 1 Y ^ 1 | | R n 2 ] Δ 1 and i = 1 n Δ 2 , i = E [ | | Y 2 Y ^ 2 | | R n 2 ] Δ 2 . The average distortion satisfies
Δ 1 E [ | | Y 1 Y ^ 1 | | R n 2 ] E [ | | Y 1 E [ Y 1 | F Y ^ 1 ] | | R n 2 ] ,
Δ 2 E [ | | Y 2 Y ^ 2 | | R n 2 ] E [ | | Y 2 E [ Y 2 | F Y ^ 2 ] | | R n 2 ] ,
Furthermore,
i f P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 = P Y 1 | Y ^ 1 P Y 2 | Y ^ 2 t h e n   i n e q u a l i t y   ( 220 )   h o l d s   w i t h   e q u a l i t y ,
i f E [ Y 1 | F Y ^ 1 ] = Y ^ 1 t h e n   i n e q u a l i t y   ( 224 )   h o l d s   w i t h   e q u a l i t y ,
i f E [ Y 2 | F Y ^ 2 ] = Y ^ 2 t h e n   i n e q u a l i t y   ( 225 )   h o l d s   w i t h   e q u a l i t y .
It can be verified that the representations (196)–(209) satisfy P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 = P Y 1 | Y ^ 1 P Y 2 | Y ^ 2 , E [ Y 1 | F Y ^ 1 ] = Y ^ 1 , E [ Y 2 | F Y ^ 2 ] = Y ^ 2 , and that all inequalities become equalities. The decomposition of the joint distribution according to (210) follows from the representations of ( Y ^ 1 , Y ^ 2 ) , and similarly for (211) and (212). The conditional RDFs R Y i | W ( Δ i ) , i = 1 , 2 are shown as above. (c) This is easily verified, because (210) holds and hence rates lie on the Pangloss plane, for the strictly positive surface, D C ( Y 1 , Y 2 ) . (d) This follows directly from parts (a)–(c). □
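Part (d) of Theorem 16 is a per-component reverse water-filling on the conditional variances 1 d j / Q W j and 1 d j Q W j . The sketch below (assumed d and diagonal Q W , with a simple bisection on the water level; an illustration only) evaluates the two conditional RDFs of part (d).

```python
import numpy as np

def conditional_rdf(var, Delta):
    """Reverse water-filling over given per-component conditional variances."""
    var = np.asarray(var, dtype=float)
    lo, hi = 0.0, float(np.max(var))
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if np.sum(np.minimum(lam, var)) < Delta:
            lo = lam
        else:
            hi = lam
    d_j = np.minimum(lam, var)
    return max(0.5 * np.sum(np.log(var / d_j)), 0.0)

d = np.array([0.8, 0.5, 0.3])                      # assumed canonical correlations
q = np.array([0.9, 1.0, 1.2])                      # assumed diagonal Q_W, d_i <= q_i <= 1/d_i

R1 = conditional_rdf(1.0 - d / q, Delta=0.3)       # R_{Y1|W}(Delta_1)
R2 = conditional_rdf(1.0 - d * q, Delta=0.3)       # R_{Y2|W}(Delta_2)
print(R1, R2)
```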
Remark 8.
We should emphasize that Theorem 16 does not fully characterize the Pangloss plane, i.e., the subset of distortion pairs such that R Y 1 | W ( Δ 1 ) + R Y 2 | W ( Δ 2 ) + I ( Y 1 , Y 2 ; W ) = R Y 1 , Y 2 ( Δ 1 , Δ 2 ) holds is larger than D C ( Y 1 , Y 2 ) . To determine the entire set that characterizes the Pangloss plane, we need to consider the rate distortion function (177) with (178), and the general realization (40). We do not pursue this further, because it requires the closed-form solution of R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , which is currently an open and challenging problem, and beyond the scope of this paper. We should mention that the analysis of the scalar-valued Gaussian example in [12,13], i.e., when p 1 = p 2 = 1 , made use of the closed-form expression of R Y 1 , Y 2 ( Δ 1 , Δ 2 ) due to [13].
Proof 
(Proof of Theorem 6). One way to prove the statement is to compute the characterizations of the rate distortion functions R Y i ( Δ i ) , R Y i | W ( Δ i ) , i = 1 , 2 and R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , using the realization of the random variables ( Y 1 , Y 2 ) which induce the family of measures P ci P m i n C I G , as defined in Corollary 1, by (84)–(88). In view of Definition 7. (b), it suffices to verify that identity (15) holds, i.e., R Y 1 | W ( Δ 1 ) + R Y 2 | W ( Δ 2 ) + I ( Y 1 , Y 2 ; W ) = R Y 1 , Y 2 ( Δ 1 , Δ 2 ) for ( Δ 1 , Δ 2 ) D W , for the choice W = W G ( 0 , I ) which achieves the minimum in (188) (i.e., due to Theorem 11. (b)).
Similar to Theorem 16, it can be shown that the conditional RDFs R Y i | W ( Δ i ) , i = 1 , 2 are given by
R Y 1 | W ( Δ 1 ) = inf j = 1 n Δ 1 , j Δ 1 1 2 j = 1 n ln ( 1 d j ) Δ 1 , j , W G ( 0 , I )
R Y 2 | W ( Δ 2 ) = inf j = 1 n Δ 2 , j Δ 2 1 2 j = 1 n ln ( 1 d j ) Δ 2 , j , W G ( 0 , I ) ,
E | | Y 1 Y ^ 1 | | R n 2 = j = 1 n Δ 1 , j , E | | Y 2 Y ^ 2 | | R n 2 = j = 1 n Δ 2 , j .
The pay-off of the joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) in (190) is related to the pay-offs of the conditional RDFs R Y 1 | W ( Δ 1 ) , R Y 2 | W ( Δ 2 ) , and C ( Y 1 , Y 2 ) = I ( Y 1 , Y 2 ; W ) in (102), via the identity
1 2 j = 1 n ln ( 1 d j 2 ) Δ 1 , j Δ 2 , j = 1 2 j = 1 n ln ( 1 d j ) Δ 1 , j + 1 2 j = 1 n ln ( 1 d j ) Δ 2 , j + 1 2 i = 1 n ln 1 + d i 1 d i .
For ( Δ 1 , Δ 2 ) D W defined by (47), it then follows from (232) that
R Y 1 , Y 2 ( Δ 1 , Δ 2 ) = inf j = 1 n Δ 1 , j Δ 1 , j = 1 n Δ 2 , j Δ 2 1 2 j = 1 n ln ( 1 d j 2 ) Δ 1 , j Δ 2 , j
= R Y 1 | W ( Δ 1 ) + R Y 2 | W ( Δ 2 ) + I ( Y 1 , Y 2 ; W ) , ( Δ 1 , Δ 2 ) D W defined by (47).
This completes the proof. □
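The identity (232), which drives the proof, can be checked numerically for any admissible per-component distortions (assumed values of d i , Δ 1 , j , Δ 2 , j below, for illustration only).

```python
import numpy as np

d = np.array([0.8, 0.5, 0.3])                          # assumed canonical correlations
D1 = np.array([0.10, 0.15, 0.20])                      # assumed per-component distortions
D2 = np.array([0.12, 0.10, 0.18])

lhs = 0.5 * np.sum(np.log((1 - d**2) / (D1 * D2)))
rhs = (0.5 * np.sum(np.log((1 - d) / D1))
       + 0.5 * np.sum(np.log((1 - d) / D2))
       + 0.5 * np.sum(np.log((1 + d) / (1 - d))))
print(np.isclose(lhs, rhs))                            # True
```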

4.3. Applications to Problems of the Literature [15,16,17]

The next two corollaries illustrate the application of the results developed in this paper to the optimization problems analyzed in [15,16,17].
Corollary 3.
Applications to problems in [15]
Consider the Gaussian secure source coding and Wyner’s common information [15], defined by the optimization problem [15] (see Equation (18), Section IV.B),
arg min P Y 1 , Y 2 , W : P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W λ I ( Y 1 ; W ) + I ( Y 1 , Y 2 ; W ) , λ [ 0 , )
where the tuple ( Y 1 , Y 2 ) is zero mean jointly Gaussian, and W : Ω W is a continuous or discrete-valued random variable (the derivation of the formula for (235) in [15] makes use of rate distortion functions, [15] (Equation (47))).
Then, the following hold.
For any jointly distributed random variables ( Y 1 , Y 2 , W ) that minimize the expression in (235), there exists a jointly Gaussian triple ( Y 1 , Y 2 , W ) such that W : Ω R n is a Gaussian random variable, which achieves the same minimum value.
Moreover, the following characterization of (235) holds.
arg min P Y 1 , Y 2 , W : P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W λ I ( Y 1 ; W ) + I ( Y 1 , Y 2 ; W ) = arg min Q W Q W : Q W defined by (82) { λ 2 ln 1 det ( I D 1 / 2 Q W D 1 / 2 ) + 1 2 i = 1 n ln ( 1 d i 2 )
1 2 ln ( det ( [ I D 1 / 2 D W 1 D 1 / 2 ] [ I D 1 / 2 D W D 1 / 2 ] ) ) } = arg min Q W = D W Q W : Q W defined by (82) { λ 2 i = 1 n ln ( [ 1 d i q i ] 1 ) + 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 ln ( [ 1 d i q i ] [ 1 q i d i ] ) }
where
Q W = D W = Diag ( q 1 , q 2 , … , q n ) Q W , q 1 q 2 q n > 0 , Q W defined by (82).
Proof. 
By the use of Theorem 9. (c), it suffices to restrict attention to jointly Gaussian random variables ( Y 1 , Y 2 , W ) . Transform the tuple ( Y 1 , Y 2 ) into the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79), and consider the realization of the transformed random variables of Corollary 1. Then, the value of λ I ( Y 1 ; W ) + I ( Y 1 , Y 2 ; W ) is identical to the value of the same expression, evaluated using the realization of Corollary 1. By simple evaluation, using the realization of Corollary 1,
λ I ( Y 1 ; W ) + I ( Y 1 , Y 2 ; W ) = λ 2 ln 1 det ( I D 1 / 2 Q W 1 D 1 / 2 ) + 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 ln ( det ( [ I D 1 / 2 Q W 1 D 1 / 2 ] [ I D 1 / 2 Q W D 1 / 2 ] ) )
and it is parameterized by Q W Q W , where Q W is defined by the set of Equation (82). By Hadamard’s determinant inequality, an achievable lower bound on the first right-hand side term of (239), holds if Q W Q W and ( I D 1 / 2 Q W 1 D 1 / 2 ) is a diagonal matrix, and this lower bound is achieved by a diagonal Q W Q W . Furthermore, by recalling the derivation of Theorem 11, an achievable lower bound on the second right-hand side term of (239) holds, i.e., of I ( Y 1 , Y 2 ; W ) , when Q W Q W is diagonal. Hence, both lower bounds are achieved simultaneously, by Q W Q W and Q W a diagonal matrix. Then, an achievable lower bound on (239) is obtained, if Q W is specified by (238). □
The remaining optimization problem in (237) is easily carried out, and hence omitted.
Corollary 4 illustrates the application of the results developed in this paper to the Gaussian relaxed Wyner’s common information [16,17] (Definition 2 and Section III).
Corollary 4.
Applications to problems in [16,17]
Consider the Gaussian relaxed Wyner’s common information considered in [16,17] (see Definition 2 and Section III of [17])
C γ ( Y 1 , Y 2 ) = min P W | Y 1 , Y 2 : I ( Y 1 ; Y 2 | W ) γ I ( Y 1 , Y 2 ; W )
where the tuple ( Y 1 , Y 2 ) is zero mean jointly Gaussian, and W : Ω W is a continuous or discrete-valued random variable (the value of (240) computed in [16,17], Theorem 4, is different from (241); moreover, the derivation in [16,17], Section III.A, is different from the derivation presented below). Then
C γ ( Y 1 , Y 2 ) = C ( Y 1 , Y 2 ) = ( 19 ) , γ ( 0 , ) .
Proof. 
By the use of Theorem 9. (c), it suffices to restrict the attention to jointly Gaussian random variables ( Y 1 , Y 2 , W ) . By Proposition 3 or Corollary 1, there exists a family of realizations of ( Y 1 , Y 2 ) parameterized by a Gaussian random variable W, which induces conditional independence P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W , and hence the lower bound I ( Y 1 , Y 2 ; W ) H ( Y 1 , Y 2 ) H ( Y 1 | W ) H ( Y 2 | W ) is achieved, i.e., the constraint in (240) is always satisfied, because the minimizer is such that I ( Y 1 ; Y 2 | W ) = 0 , i.e., the constraint is not active. Hence, the general solution of (240) is the one given in Theorem 4. □
Remark 9.
Corollary 4 implies that the definition of the relaxed Gaussian Wyner’s common information considered in [16,17] (see Definition 2 and Section III of [17]) should be replaced by min P W | Y 1 , Y 2 : I ( Y 1 ; Y 2 | W ) = γ I ( Y 1 , Y 2 ; W ) , i.e., the inequality is replaced by an equality, so that the constraint is active for all γ ( 0 , ) .

4.4. Characterization and Parameterization of the Gray and Wyner Rate Region by Jointly Gaussian RVs

Derived in this section, for jointly Gaussian random variables with square-error distortion, using [1] ((4) of page 1703, Equation (42)), i.e., (12), and the RDFs, R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , R Y i ( Δ i ) , i = 1 , 2 , of Theorems 12, 13, 14, and Theorem 9, are:
(1)
Theorem 5—the characterizations of the rate region R G W ( Δ 1 , Δ 2 ) , and
(2)
The characterization of rates that lie on Pangloss Plane.
Proof 
(Proof of Theorem 5). (a) This follows from the realization of random variables that induce Gaussian measures, by repeating Theorem 9, without requiring that W makes Y 1 and Y 2 conditionally independent, i.e., the stated realization, with random variables Z 1 and Z 2 correlated, achieves a lower bound on I ( Y 1 , Y 2 ; W ) , among all random variables W. (b) The stated characterization (42) follows from the discussion prior to the Theorem, i.e., by an application of [1] ((4) of page 1703, Equation (42)), i.e., T ( α 1 , α 2 ) , and Theorem 9, Theorem 13, which imply that the infimum in T ( α 1 , α 2 ) is over the parameterized set of jointly Gaussian random variables ( Y 1 , Y 2 , W ) G ( 0 , Q ( Y 1 , Y 2 , W ) ) with joint distribution (68). From part (a), (43) then follows. (44) is due to the identity I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) H ( Y 1 | Y 2 , W ) H ( Y 2 | W ) and the fact that the values R Y 1 | W ( Δ 1 ) and R Y 2 | W ( Δ 2 ) depend only on Q Y 1 | W and Q Y 2 | W , and the errors (see Theorem 13). □
Theorem 17 gives the parameterization of a subset of the Pangloss Plane, as a degenerate case of Theorem 5.
Theorem 17.
Consider the statement of Theorem 5.
(a) Rate triples ( R 0 , R 1 , R 2 ) that lie on the Pangloss Plane are determined by the subset of the rate region R G W ( Δ 1 , Δ 2 ) of Theorem 5. (b), such that the joint distribution P W , Y 1 , Y 2 , Y ^ 1 , Y ^ 2 satisfies the conditions,
P Y ^ 1 , Y ^ 2 | W = P Y ^ 1 | W P Y ^ 2 | W , P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 , W = P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 .
Specifically, the Pangloss Plane is characterized by
T G ( α 1 , α 2 ) = inf ( Y 1 , Y 2 , W ) ∈ G ( 0 , Q ( Y 1 , Y 2 , W ) ) of ( 30 ) , ( 40 ) { I ( Y 1 , Y 2 ; W ) + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
= inf Q Y 1 | Y 2 , W , Q Y 2 | W { 1 2 ln [ det ( Q ( Y 1 , Y 2 ) ) / ( det ( Q Y 1 | Y 2 , W ) det ( Q Y 2 | W ) ) ] + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
such that
P W , Y 1 , Y 2 , Y ^ 1 , Y ^ 2 satisfies (242), and its marginals induce the test channels of the RDFs, R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , R Y i ( Δ i ) , i = 1 , 2 , of Theorems 12, 13, 14.
(b) A subset of the rate triples ( R 0 , R 1 , R 2 ) that lie on the Pangloss Plane is determined by the restriction of part (a) to ( Y 1 , Y 2 | W ) ∈ CIG , i.e.,
T G C I ( α 1 , α 2 ) = inf ( Y 1 , Y 2 , W ) ∈ G ( 0 , Q ( Y 1 , Y 2 , W ) ) , ( Y 1 , Y 2 | W ) ∈ CIG { I ( Y 1 , Y 2 ; W ) + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
= inf Q Y 1 | W , Q Y 2 | W { 1 2 ln [ det ( Q ( Y 1 , Y 2 ) ) / ( det ( Q Y 1 | W ) det ( Q Y 2 | W ) ) ] + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
such that (245) holds, where I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) − H ( Y 1 | W ) − H ( Y 2 | W ) , and R Y i | W ( Δ i ) , i = 1 , 2 are given in Theorem 13. (c).
Proof. 
(a) Condition (242) characterizes the rates ( R 0 , R 1 , R 2 ) R G W ( Δ 1 , Δ 2 ) that lie on the Pangloss plane, and these are derived in [12] (Theorem 1, Equations (19) and (20)). Hence, the statement follows from Theorem 5.
(b) That (246) defines a subset of R G W ( Δ 1 , Δ 2 ) follows from the fact that the set of joint distributions with ( Y 1 , Y 2 , W ) ∈ G ( 0 , Q ( Y 1 , Y 2 , W ) ) and ( Y 1 , Y 2 | W ) ∈ CIG is a subset of the set of joint distributions with ( Y 1 , Y 2 , W ) ∈ G ( 0 , Q ( Y 1 , Y 2 , W ) ) . Moreover, by part (a), ( Y 1 , Y 2 | W ) ∈ CIG implies I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) − H ( Y 1 | Y 2 , W ) − H ( Y 2 | W ) = H ( Y 1 , Y 2 ) − H ( Y 1 | W ) − H ( Y 2 | W ) , and the values R Y 1 | W ( Δ 1 ) and R Y 2 | W ( Δ 2 ) depend only on Q Y 1 | W and Q Y 2 | W and the errors, as shown in Theorem 13. (c). Hence, the statement holds. □
From Theorem 16 and Theorem 17. (b), a simpler parameterization of the rates that lie on the Pangloss Plane of the Gray–Wyner rate region R G W ( Δ 1 , Δ 2 ) follows, when ( Y 1 , Y 2 ) is in the canonical variable form.
Corollary 5.
Consider the statement of Theorem 16 with square-error distortion functions, i.e., a tuple ( Y 1 , Y 2 ) in the canonical variable form.
A subset of the rate triples ( R 0 , R 1 , R 2 ) that lie on the Pangloss Plane, corresponding to the restriction ( Y 1 , Y 2 | W ) ∈ CIG , is determined from
T cvf C I G ( α 1 , α 2 ) = inf Q W { I ( Y 1 , Y 2 ; W ) + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
= inf Q W { 1 2 ∑ i = 1 n ln ( 1 − d i 2 ) − 1 2 ln ( det ( [ I − D 1 / 2 Q W − 1 D 1 / 2 ] [ I − D 1 / 2 Q W D 1 / 2 ] ) ) + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
such that (245) holds, where 0 ≤ α i ≤ 1 , i = 1 , 2 , α 1 + α 2 ≥ 1 , where R Y i | W ( Δ i ) , i = 1 , 2 are given in Theorem 16, and where the infimum is taken over Q W ∈ Q W , the set defined by Equation (82).
Proof. 
The stated characterization (248) is the application of [1] ((4) of page 1703, Equation (42)) and the results of Theorem 16 and Theorem 17. (b). □
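As a numerical illustration of the objective in Corollary 5 (a sketch under stated assumptions, not part of the original text): for a diagonal Q W and canonical correlations d i , the mutual information term of (248) can be swept over scalar multiples of the identity. The admissible interval d i < q i < 1 / d i used below is an assumption standing in for the set Q W of Equation (82), which is not reproduced here; the sweep confirms that the minimum is attained at Q W = I , where the value equals Wyner's common information (19).

```python
import numpy as np

def wyner_common_information(d):
    """Wyner's common information from canonical correlations d_i in (0,1); formula (19)."""
    d = np.asarray(d, dtype=float)
    return 0.5 * np.sum(np.log((1.0 + d) / (1.0 - d)))

def mutual_info_term(d, q):
    """I(Y1,Y2;W) of Corollary 5 for diagonal Q_W = Diag(q), with d_i < q_i < 1/d_i assumed."""
    d, q = np.asarray(d, float), np.asarray(q, float)
    return 0.5 * (np.sum(np.log(1.0 - d ** 2))
                  - np.sum(np.log((1.0 - d / q) * (1.0 - d * q))))

d = np.array([0.6, 0.3])                      # illustrative canonical correlations
ts = np.linspace(0.61, 1.0 / 0.61, 200)       # scalar sweep inside the admissible interval
vals = [mutual_info_term(d, np.full_like(d, t)) for t in ts]
print(min(vals), wyner_common_information(d)) # both approximately 1.0027, up to grid resolution
```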
In view of Theorem 1 (i.e., Theorem 8 in [1]), additional parameterizations of the Gray–Wyner rate region R G W ( Δ 1 , Δ 2 ) directly follow from the expressions already derived, i.e., the joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) of Theorem 12, the conditional RDFs R Y 1 | W ( Δ 1 ) , R Y 2 | W ( Δ 2 ) of Theorem 13, the marginal RDFs R Y 1 ( Δ 1 ) , R Y 2 ( Δ 2 ) of Theorem 14, and the values of I ( Y 1 , Y 2 ; W ) , based on the Gaussian realization of Theorem 5. (a).

5. Conclusions

This paper formulates the classical Gray and Wyner source coding for a simple network with a tuple of multivariate, correlated Gaussian random variables, with square-error fidelity at the two decoders, from the geometric approach to Gaussian random variables and the weak stochastic realization of correlated Gaussian random variables. This approach leads to a parameterization of the Gray–Wyner rate region with respect to the variance matrix of the jointly Gaussian triple ( Y 1 , Y 2 , W ) , where W is a Gaussian auxiliary random variable. However, much remains to be done for this problem from the computational point of view, and in exploiting the new approach in other multi-user problems of information theory.

Author Contributions

C.D.C. and J.H.v.S. contributed to the conceptualization, methodology, and writing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The work of C.D. Charalambous was co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation (Project: EXCELLENCE/1216/0296).

Data Availability Statement

Numerical evaluations of Wyner’s common information, based on the implementation of the canonical variable form, and the calculation of the canonical variable coefficients are found in [21], Section 3.7.

Acknowledgments

The second author is grateful to H.S. Witsenhausen (formerly affiliated with Bell Laboratories) for contacts about the problem of common information in the early 1980s. This paper is an answer to their questions about the problem of Wyner’s common information. The authors are very grateful to the University of Cyprus for the partial financial support which made their cooperation possible. The authors are also grateful to Guo Lei (Chinese Academy of Sciences, Institute for Mathematics) and to Xi Kaihua (Shandong University, Jinan, Shandong Province, China; formerly of Delft University of Technology) for help with obtaining copies of the papers of Hua LooKeng.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A

Appendix A.1. Algorithm to Generate the Canonical Variable Form

The algorithm that generates Q cvf is presented below.
Transformation of a variance matrix into its canonical variable form.
Data: p 1 , p 2 ∈ Z + , Q ∈ R ( p 1 + p 2 ) × ( p 1 + p 2 ) , satisfying Q = Q T ≥ 0 (it is noted that Q > 0 implies Q 11 > 0 , Q 22 > 0 ), with decomposition
Q = Q 11 Q 12 Q 12 T Q 22 , Q 11 R p 1 × p 1 , Q 22 R p 2 × p 2 , Q 12 R p 1 × p 2 , Q 11 > 0 , Q 22 > 0 .
1
Perform singular-value decompositions:
Q 11 = U 1 D 1 U 1 T , Q 22 = U 2 D 2 U 2 T ,
with U 1 R p 1 × p 1 orthogonal ( U 1 U 1 T = I = U 1 T U 1 ) and
D 1 = Diag ( d 1 , 1 , … , d 1 , p 1 ) ∈ R p 1 × p 1 , d 1 , 1 ≥ d 1 , 2 ≥ ⋯ ≥ d 1 , p 1 > 0 ,
and U 2 , D 2 satisfying corresponding conditions.
2
Perform a singular-value decomposition of
D 1 − 1 / 2 U 1 T Q 12 U 2 D 2 − 1 / 2 = U 3 D 3 U 4 T ,
with U 3 ∈ R p 1 × p 1 , U 4 ∈ R p 2 × p 2 orthogonal and
D 3 = I p 11 0 0 0 D 4 0 0 0 0 ∈ R p 1 × p 2 (a block matrix with diagonal blocks I p 11 , D 4 , and 0 ), D 4 = Diag ( d 4 , 1 , … , d 4 , p 12 ) ∈ R p 12 × p 12 , 1 > d 4 , 1 ≥ d 4 , 2 ≥ ⋯ ≥ d 4 , p 12 > 0 .
3
Compute the new variance matrix according to
Q cvf = I p 1 D 3 D 3 T I p 2 .
4
The transformation to the canonical variable representation
( Y 1 ↦ S 1 Y 1 , Y 2 ↦ S 2 Y 2 ) is then
S 1 = U 3 T D 1 − 1 / 2 U 1 T , S 2 = U 4 T D 2 − 1 / 2 U 2 T .
If Q > 0 then Q 11 > 0 and Q 22 > 0 , and D 3 does not contain the block I p 11 , i.e., the corresponding first block row and block column are removed (no canonical correlation coefficient equals one).
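A minimal computational sketch of the above algorithm is given below (Python/NumPy, added for illustration); the function name, the return convention, and the numerical example are our own choices, and symmetric eigenvalue decompositions are used for the factorizations of Step 1.

```python
import numpy as np

def canonical_variable_form(Q, p1):
    """Sketch of the Appendix A.1 algorithm: transform a variance matrix Q,
    with Q11 > 0 and Q22 > 0, into its canonical variable form.
    Returns Q_cvf, the transformations S1, S2, and the canonical correlations d."""
    p2 = Q.shape[0] - p1
    Q11, Q12, Q22 = Q[:p1, :p1], Q[:p1, p1:], Q[p1:, p1:]

    # Step 1: Q11 = U1 D1 U1^T, Q22 = U2 D2 U2^T, eigenvalues ordered decreasingly.
    d1, U1 = np.linalg.eigh(Q11)
    d2, U2 = np.linalg.eigh(Q22)
    d1, U1 = d1[::-1], U1[:, ::-1]
    d2, U2 = d2[::-1], U2[:, ::-1]

    # Step 2: SVD of D1^{-1/2} U1^T Q12 U2 D2^{-1/2} = U3 D3 U4^T.
    M = np.diag(d1 ** -0.5) @ U1.T @ Q12 @ U2 @ np.diag(d2 ** -0.5)
    U3, d, U4T = np.linalg.svd(M)

    # Step 3: the canonical variable form of the variance matrix.
    D3 = np.zeros((p1, p2))
    D3[:d.size, :d.size] = np.diag(d)
    Q_cvf = np.block([[np.eye(p1), D3], [D3.T, np.eye(p2)]])

    # Step 4: the transformations Y1 -> S1 Y1, Y2 -> S2 Y2.
    S1 = U3.T @ np.diag(d1 ** -0.5) @ U1.T
    S2 = U4T @ np.diag(d2 ** -0.5) @ U2.T
    return Q_cvf, S1, S2, d

# Example (illustrative data): the transformed variance matrix equals Q_cvf.
Q = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
Q_cvf, S1, S2, d = canonical_variable_form(Q, p1=2)
S = np.block([[S1, np.zeros((2, 1))], [np.zeros((1, 2)), S2]])
print(np.allclose(S @ Q @ S.T, Q_cvf), d)  # True, canonical correlations in [0, 1)
```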

Appendix A.2. Information Theory

In this appendix, the reader finds two formulas of information theory which are used in the body of the paper. These are obtained from [29,30,31].
The first equality is proven in [29] (p. 19, Th. 2.4.1) and [30] (p. 31, (2.4.26), (2.4.28)).
Proposition A1.
Consider random variables Y 1 , 1 , Y 1 , 2 , Y 2 , 1 , Y 2 , 2 , X 1 , X 2 such that the following two triples are independent random variables, ( Y 1 , 1 , Y 2 , 1 , X 1 ) and ( Y 1 , 2 , Y 2 , 2 , X 2 ) . Then, the mutual information expression additively decomposes,
I ( Y 1 , 1 , Y 1 , 2 , Y 2 , 1 , Y 2 , 2 ; X 1 , X 2 ) = I ( Y 1 , 1 , Y 2 , 1 ; X 1 ) + I ( Y 1 , 2 , Y 2 , 2 ; X 2 ) .
Proof. 
For completeness, the proof is given in [21] (Proposition A.2). □
Consider a tuple of jointly Gaussian random variables ( X , Y ) ∈ G ( 0 , Q ( X , Y ) ) with X : Ω → R n , Y : Ω → R p , Q ( X , Y ) > 0 , and Q ( X , Y ) = Q X Q X , Y Q X , Y T Q Y . Then,
H ( X ) = 1 2 ln det ( Q X ) + 1 2 n ln ( 2 π e ) ,
H ( Y | X ) = H ( X , Y ) − H ( X ) = 1 2 ln det ( Q Y − Q X , Y T Q X − 1 Q X , Y ) + 1 2 p ln ( 2 π e ) ,
I ( Y ; X ) = − 1 2 ln [ det ( Q ( X , Y ) ) / ( det ( Q Y ) det ( Q X ) ) ] .
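The following short sketch (Python/NumPy, added for illustration) checks the mutual consistency of the three formulas above on a randomly generated positive-definite variance matrix; the dimensions and the random seed are arbitrary choices.

```python
import numpy as np

def gaussian_entropy(Q):
    """Differential entropy H = 0.5*ln det(Q) + 0.5*dim*ln(2*pi*e), in nats."""
    return 0.5 * np.log(np.linalg.det(Q)) + 0.5 * Q.shape[0] * np.log(2 * np.pi * np.e)

def gaussian_mutual_information(Q, n):
    """I(Y;X) for (X,Y) ~ G(0,Q), X of dimension n, via -0.5*ln[det(Q)/(det(QY)det(QX))]."""
    QX, QY = Q[:n, :n], Q[n:, n:]
    return -0.5 * np.log(np.linalg.det(Q) / (np.linalg.det(QX) * np.linalg.det(QY)))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q = A @ A.T + np.eye(5)          # a generic positive-definite variance matrix
n = 2                            # X = first 2 components, Y = remaining 3

# Consistency: I(Y;X) = H(X) + H(Y) - H(X,Y) = H(Y) - H(Y|X).
HX, HY, HXY = gaussian_entropy(Q[:n, :n]), gaussian_entropy(Q[n:, n:]), gaussian_entropy(Q)
QY_given_X = Q[n:, n:] - Q[n:, :n] @ np.linalg.inv(Q[:n, :n]) @ Q[:n, n:]
print(np.isclose(gaussian_mutual_information(Q, n), HX + HY - HXY),
      np.isclose(gaussian_mutual_information(Q, n), HY - gaussian_entropy(QY_given_X)))
```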

Appendix A.3. An Inequality for Determinants

An inequality for matrices is derived in this appendix which is needed in the body of the paper.
Lemma A1.
Consider the real-valued matrices A , B R n × n . Assume that
0 ≤ I − A T A , 0 < I − B T B , and rank ( B ) = n .
Then,
det ( [ I − A T A ] [ I − B T B ] ) ≤ ( det ( I − A T B ) ) 2 .
A related result is mentioned in [32] (Theorem 9.E.6). In that book, the proof of the corresponding result refers to the paper [33]. That reference was received by the authors, but they could not read it because the paper is written in Chinese. However, they could read the formulas of the paper. Hua LooKeng developed these results to calculate an orthonormal basis for a function of one complex variable. A more recent reference for this inequality is [34] (Theorem 7.19).
The proof of Lemma A1 below is analogous to that of Hua LooKeng in [33]. The main differences are in the assumptions.
Lemma A2.
Ref. [33] (pp. 464, 470). Consider the matrices A , B ∈ R n × n . Assume that I − B T B is a nonsingular matrix and that rank ( B ) = n . Then
( I − A T A ) − ( I − A T B ) [ I − B T B ] − 1 ( I − A T B ) T = − ( A − B ) [ I − B T B ] − 1 ( A − B ) T .
Proof. 
For completeness, the proof is given in [21] (Lemma A.4). □
Proposition A2.
Ref. [33] (Equation (2)). Consider the symmetric positive-definite matrices Q 1 , Q 2 , Q R n × n such that Q 1 + Q 2 = Q . Then
det ( Q 1 ) + det ( Q 2 ) det ( Q ) .
Proof of Lemma A1. 
By the assumptions, 0 ≤ ( I − A T A ) , 0 < ( I − B T B ) , and rank ( B ) = n , it follows from Lemma A2 that
( A − B ) [ I − B T B ] − 1 ( A − B ) T + [ I − A T A ] = ( I − A T B ) [ I − B T B ] − 1 ( I − A T B ) T ;
0 ≤ det ( I − A T A ) , by the assumption on A ,
≤ det ( ( A − B ) [ I − B T B ] − 1 ( A − B ) T ) + det ( [ I − A T A ] ) , by the assumption on B ,
≤ det ( ( I − A T B ) [ I − B T B ] − 1 ( I − A T B ) T ) , by Lemma A2, Proposition A2, and the assumptions,
= det ( I − A T B ) 2 [ det ( [ I − B T B ] ) ] − 1 ;
det ( [ I − A T A ] [ I − B T B ] ) = det ( [ I − A T A ] ) det ( [ I − B T B ] ) ≤ det ( [ I − A T B ] ) 2 . □
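A Monte Carlo sanity check of the inequality of Lemma A1 is sketched below (Python/NumPy, illustrative only); strict contractions A and B are generated so that the assumptions 0 ≤ I − A T A , 0 < I − B T B and rank ( B ) = n hold.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_contraction(n, rng):
    """A random n x n matrix with spectral norm < 1, so that I - M^T M > 0."""
    M = rng.standard_normal((n, n))
    return 0.9 * M / np.linalg.norm(M, 2)

# Check det([I - A^T A][I - B^T B]) <= det(I - A^T B)^2 over random strict contractions.
n, I = 4, np.eye(4)
ok = True
for _ in range(1000):
    A, B = random_contraction(n, rng), random_contraction(n, rng)
    lhs = np.linalg.det((I - A.T @ A) @ (I - B.T @ B))
    rhs = np.linalg.det(I - A.T @ B) ** 2
    ok &= lhs <= rhs + 1e-12   # small tolerance for floating-point round-off
print(ok)  # expected: True
```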
Another preliminary result is needed.
Proposition A3.
Consider the matrix Q X = Q X T ∈ R n × n of Proposition 2 and the matrix D ∈ R n × n of Definition 1. Thus, both Q X > 0 and D > 0 . Then,
D ≤ Q X − 1 ≤ D − 1 ⟺ D ≤ Q X ≤ D − 1 ,
D < Q X − 1 < D − 1 ⟺ D < Q X < D − 1 .
Proof. 
For completeness, the proof is given in [21] (Proposition A.6). □
Proposition A4.
Consider the matrices defined in Definition 1. Thus, D ∈ R n × n is a diagonal matrix satisfying 0 < D , and the matrix Q X ∈ R n × n satisfies Q X = Q X T and 0 < D ≤ Q X ≤ D − 1 . Then,
det ( [ I − D 1 / 2 Q X − 1 D 1 / 2 ] [ I − D 1 / 2 Q X D 1 / 2 ] ) ≤ det ( [ I − D ] 2 ) , ∀ Q X ,
det ( [ I − D 1 / 2 Q X − 1 D 1 / 2 ] [ I − D 1 / 2 Q X D 1 / 2 ] ) < det ( [ I − D ] 2 ) , if Q X ≠ I .
Proof. 
(1) If Q X < D − 1 then det ( D − 1 − Q X ) > 0 . Consider the case in which Q X satisfies Q X ≤ D − 1 but not Q X < D − 1 . Then, det ( D − 1 − Q X ) = 0 . Hence,
det ( I − D 1 / 2 Q X D 1 / 2 ) = det ( D 1 / 2 [ D − 1 − Q X ] D 1 / 2 ) = det ( D 1 / 2 ) det ( D − 1 − Q X ) det ( D 1 / 2 ) = 0 .
Then, 0 < D < I implies that det ( [ I − D ] 2 ) > 0 , hence that the inequality (A11) holds.
(2) If D < Q X then det ( Q X − D ) > 0 . If D ≤ Q X but not D < Q X , then by Proposition A3, Q X − 1 ≤ D − 1 but not Q X − 1 < D − 1 . Then, det ( D − 1 − Q X − 1 ) = 0 . Hence,
det ( I − D 1 / 2 Q X − 1 D 1 / 2 ) = det ( D 1 / 2 [ D − 1 − Q X − 1 ] D 1 / 2 ) = det ( D 1 / 2 ) det ( D − 1 − Q X − 1 ) det ( D 1 / 2 ) = 0 .
In this case, the inequality (A11) also holds.
(3) Then consider the case in which D < Q X < D − 1 . Lemma A1 will be used to prove the result. Define, therefore,
A = Q X − 1 / 2 D 1 / 2 , B = Q X 1 / 2 D 1 / 2 .
First, it is proven that the assumptions of the lemma are satisfied. Note that 0 < Q X implies that rank ( Q X ) = n . This and the fact that rank ( D ) = n imply that rank ( B ) = rank ( Q X 1 / 2 D 1 / 2 ) = n . Further note that
I − A T A = I − D 1 / 2 Q X − 1 D 1 / 2 ;
0 < D ≤ Q X ≤ D − 1 , by assumption,
⟹ 0 < D ≤ Q X − 1 ≤ D − 1 , by Proposition A3,
⟹ 0 < D 2 ≤ D 1 / 2 Q X − 1 D 1 / 2 ≤ I ⟹ 0 ≤ I − D 1 / 2 Q X − 1 D 1 / 2 = I − A T A ;
0 < D < Q X < D − 1 , by the case considered,
⟹ 0 < D 2 < D 1 / 2 Q X D 1 / 2 < I ⟹ 0 < I − D 1 / 2 Q X D 1 / 2 = I − B T B ;
I − A T B = I − D 1 / 2 Q X − 1 / 2 Q X 1 / 2 D 1 / 2 = I − D .
From Lemma A1 it then follows that
det ( [ I − D 1 / 2 Q X − 1 D 1 / 2 ] [ I − D 1 / 2 Q X D 1 / 2 ] ) ≤ det ( [ I − D ] 2 ) .
(4) Then suppose that, in addition, Q X ≠ I . Then
A − B = Q X − 1 / 2 D 1 / 2 − Q X 1 / 2 D 1 / 2 = Q X − 1 / 2 [ I − Q X ] D 1 / 2 ≠ 0 , using that Q X ≠ I ;
0 < ( A − B ) [ I − B T B ] − 1 ( A − B ) T , because 0 < I − B T B and A − B ≠ 0 ;
det ( I − A T A ) < det ( ( A − B ) [ I − B T B ] − 1 ( A − B ) T ) + det ( I − A T A )
≤ det ( ( I − A T B ) [ I − B T B ] − 1 ( I − A T B ) T ) , by Lemma A2 and Proposition A2 ;
det ( ( I − A T A ) ( I − B T B ) ) = det ( I − A T A ) det ( I − B T B ) < [ det ( I − A T B ) ] 2 , by Lemma A1 and its proof ;
det ( [ I − D 1 / 2 Q X − 1 D 1 / 2 ] [ I − D 1 / 2 Q X D 1 / 2 ] ) < det ( [ I − D ] 2 ) , by substitution of A , B . □
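The strict inequality of Proposition A4 can likewise be checked numerically. In the sketch below (Python/NumPy, illustrative and not from the original text), a symmetric Q X with D ≤ Q X ≤ D − 1 is constructed by a small symmetric perturbation of the identity; this is a sufficient, but not necessary, way of staying inside the admissible set of Definition 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
d = rng.uniform(0.2, 0.8, size=n)          # diagonal entries of D, in (0, 1)
D = np.diag(d)
Dh = np.diag(np.sqrt(d))                   # D^{1/2}

# A symmetric Q_X with D <= Q_X <= D^{-1}: perturb the identity by a small symmetric
# matrix whose spectral norm is below min(lambda_min(I - D), lambda_min(D^{-1} - I)).
S = rng.standard_normal((n, n)); S = (S + S.T) / 2
eps = 0.9 * (1 - d.max()) / np.linalg.norm(S, 2)
QX = np.eye(n) + eps * S                   # QX != I, so the strict inequality applies

lhs = np.linalg.det((np.eye(n) - Dh @ np.linalg.inv(QX) @ Dh) @ (np.eye(n) - Dh @ QX @ Dh))
rhs = np.linalg.det((np.eye(n) - D) @ (np.eye(n) - D))
print(lhs < rhs)   # expected: True
```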

References

  1. Gray, R.M.; Wyner, A.D. Source coding for a simple network. Bell Syst. Tech. J. 1974, 53, 1681–1721.
  2. Wyner, A.D. The common information of two dependent random variables. IEEE Trans. Inf. Theory 1975, 21, 163–179.
  3. Witsenhausen, H.S. Values and bounds for common information of two discrete variables. SIAM J. Appl. Math. 1976, 31, 313–333.
  4. Witsenhausen, H.S. On sequences of pairs of dependent random variables. SIAM J. Appl. Math. 1975, 28, 100–113.
  5. Gács, P.; Körner, J. Common information is much less than mutual information. Probl. Control Inf. Theory 1973, 2, 149–162.
  6. Benammar, M.; Zaidi, A. Rate-distortion region of a Gray–Wyner model with side information. Entropy 2018, 20, 2.
  7. Benammar, M.; Zaidi, A. Rate-distortion region of a Gray–Wyner problem with side information. In Proceedings of the IEEE International Symposium on Information Theory (ISIT 2017), Aachen, Germany, 25–30 June 2017; pp. 106–110.
  8. Benammar, M.; Zaidi, A. Rate-distortion function for a Heegard–Berger problem with two sources and degraded reconstruction sets. IEEE Trans. Inf. Theory 2016, 62, 5080–5092.
  9. Benammar, M.; Zaidi, A. Rate-distortion function for a Heegard–Berger problem with common reconstruction constraint. In Proceedings of the IEEE Information Theory Workshop (ITW 2015), Jeju Island, Korea, 11–15 October 2015.
  10. Heegard, C.; Berger, T. Rate distortion when side information may be absent. IEEE Trans. Inf. Theory 1985, 31, 727–734.
  11. Cuff, P.W.; Permuter, H.H.; Cover, T.M. Coordination capacity. IEEE Trans. Inf. Theory 2010, 56, 4181–4206.
  12. Viswanatha, K.B.; Akyol, E.; Rose, K. The lossy common information of correlated sources. IEEE Trans. Inf. Theory 2014, 60, 3238–3253.
  13. Xu, G.; Liu, W.; Chen, B. A lossy source coding interpretation of Wyner's common information. IEEE Trans. Inf. Theory 2016, 62, 754–768.
  14. Xiao, J.-J.; Luo, Z.-Q. Compression of correlated Gaussian sources under individual distortion criteria. In Proceedings of the 43rd Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, USA, 28–30 September 2005; pp. 438–447.
  15. Satpathy, S.; Cuff, P. Gaussian secure source coding and Wyner's common information. In Proceedings of the IEEE International Symposium on Information Theory (ISIT 2015), Hong Kong, China, 14–19 July 2015; pp. 116–120.
  16. Veld, G.J.O.; Gastpar, M.C. Total correlation of Gaussian vector sources on the Gray–Wyner network. In Proceedings of the 54th Annual Allerton Conference on Communication, Control and Computing (Allerton), Monticello, IL, USA, 27–30 September 2016; pp. 385–392.
  17. Sula, E.; Gastpar, M. Relaxed Wyner's common information. arXiv 2019, arXiv:1912.07083.
  18. Hotelling, H. Relation between two sets of variates. Biometrika 1936, 28, 321–377.
  19. Gelfand, I.M.; Yaglom, A.M. Calculation of the amount of information about a random function contained in another such function. Am. Math. Soc. Transl. 1959, 2, 199–246.
  20. van Schuppen, J.H. Common, correlated, and private information in control of decentralized systems. In Coordination Control of Distributed Systems; van Schuppen, J.H., Villa, T., Eds.; Number 456 in Lecture Notes in Control and Information Sciences; Springer International Publishing: Cham, Switzerland, 2015; pp. 215–222.
  21. Charalambous, C.D.; van Schuppen, J.H. A new approach to lossy network compression of a tuple of correlated multivariate Gaussian RVs. arXiv 2019, arXiv:1905.12695.
  22. van Putten, C.; van Schuppen, J.H. The weak and strong Gaussian probabilistic realization problem. J. Multivar. Anal. 1983, 13, 118–137.
  23. Noble, B. Applied Linear Algebra; Prentice-Hall: Englewood Cliffs, NJ, USA, 1969.
  24. Stylianou, E.; Charalambous, C.D.; Charalambous, T. Joint rate distortion function of a tuple of correlated multivariate Gaussian sources with individual fidelity criteria. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT 2021), Melbourne, Australia, 12–20 July 2021; pp. 2167–2172.
  25. Gkagkos, M.; Charalambous, C.D. Structural properties of optimal test channels for distributed source coding with decoder side information for multivariate Gaussian sources with square-error fidelity. arXiv 2020, arXiv:2011.10941.
  26. Gkagkos, M.; Charalambous, C.D. Structural properties of test channels of the RDF for Gaussian multivariate distributed sources. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT 2021), Melbourne, Australia, 12–20 July 2021; pp. 2631–2636.
  27. Charalambous, C.D.; van Schuppen, J.H. Characterization of conditional independence and weak realizations of multivariate Gaussian random variables: Applications to networks. In Proceedings of the IEEE International Symposium on Information Theory (ISIT 2020), Los Angeles, CA, USA, 21–26 June 2020.
  28. Gray, R.M. A new class of lower bounds to information rates of stationary sources via conditional rate-distortion functions. IEEE Trans. Inf. Theory 1973, 19, 480–489.
  29. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 1991.
  30. Gallager, R.G. Information Theory and Reliable Communication; John Wiley & Sons: New York, NY, USA, 1968.
  31. Yaglom, A.M.; Yaglom, I.M. Probability and Information; D. Reidel Publishing Company: Dordrecht, The Netherlands, 1983.
  32. Marshall, A.W.; Olkin, I. Inequalities: Theory of Majorization and Its Applications; Academic Press: New York, NY, USA, 1979.
  33. Hua, L.K. Inequalities involving determinants. Acta Math. Sin. 1955, 5, 463–470. (In Chinese; English summary).
  34. Zhang, F. Positive semidefinite matrices. In Matrix Theory: Basic Results and Techniques; Springer Science+Business Media: New York, NY, USA, 2011; pp. 199–252.
Figure 1. The Gray and Wyner source coding for a simple network [1]; ( Y 1 , i , Y 2 , i ) ∼ P Y 1 , Y 2 , i = 1 , … , N .
Figure 2. Weak stochastic realization of ( Y 1 , i , Y 2 , i ) ∼ P Y 1 , Y 2 , i = 1 , … , N and ( Y ^ 1 , i , Y ^ 2 , i ) , i = 1 , … , N at the encoder and decoder with respect to the common and private random variables ( W N , Z 1 N , Z 2 N ) , ( W N , Z ^ 1 N , Z ^ 2 N ) .
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
