Article

Bounds on the Sum-Rate of MIMO Causal Source Coding Systems with Memory under Spatio-Temporal Distortion Constraints

by Photios A. Stavrou 1,*, Jan Østergaard 2 and Mikael Skoglund 1
1 Department of Intelligent Systems, Division of Information Science and Engineering, KTH Royal Institute of Technology, 11428 Stockholm, Sweden
2 Section on Signal and Information Processing, Department of Electronic Systems, Aalborg University, 9000 Aalborg, Denmark
* Author to whom correspondence should be addressed.
Entropy 2020, 22(8), 842; https://doi.org/10.3390/e22080842
Submission received: 18 June 2020 / Revised: 22 July 2020 / Accepted: 27 July 2020 / Published: 30 July 2020
(This article belongs to the Special Issue Multiuser Information Theory III)

Abstract:
In this paper, we derive lower and upper bounds on the optimal performance theoretically attainable (OPTA) of a two-user multi-input multi-output (MIMO) causal encoding and causal decoding problem. Each user's source model is described by a multidimensional Markov source driven by an additive i.i.d. noise process, subject to three classes of spatio-temporal distortion constraints. To characterize the lower bounds, we use state augmentation techniques and a data processing theorem, which recovers a variant of the rate distortion function (RDF) as an information measure, known in the literature as nonanticipatory ϵ-entropy, sequential RDF, or nonanticipative RDF. We derive lower bound characterizations for a system driven by an i.i.d. Gaussian noise process, which we solve using a semidefinite programming (SDP) algorithm for all three classes of distortion constraints. We obtain closed-form solutions when the system's noise is possibly non-Gaussian for both users, and when only one of the users is described by a source model driven by a Gaussian noise process. To obtain the upper bounds, we use the best linear forward test channel realization, which corresponds to the optimal test channel realization when the system is driven by a Gaussian noise process, and apply a sequential causal differential pulse code modulation (DPCM)-based scheme with a feedback loop followed by a scaled entropy-coded dithered quantization (ECDQ) scheme, which leads to upper bounds with certain performance guarantees. Then, we use the linear forward test channel as a benchmark to obtain upper bounds on the OPTA when the system is driven by an additive i.i.d. non-Gaussian noise process. We support our framework with various simulation studies.

1. Problem Statement

We consider the two-user causal encoding and causal decoding setup illustrated in Figure 1. In this setup, users 1 and 2 are modeled by the following discrete-time time-invariant multidimensional Markov processes:
$$x_{t+1}^1 = A^1 x_t^1 + w_t^1, \qquad x_{t+1}^2 = A^2 x_t^2 + w_t^2, \qquad t = 0, 1, \ldots, \tag{1}$$
where $x_t^1 \in \mathbb{R}^{p_1}$ and $x_t^2 \in \mathbb{R}^{p_2}$, with $p_1$ not necessarily equal to $p_2$, $(A^1, A^2)$ are known constant matrices of appropriate dimensions, and $(w_t^1, w_t^2)$ are additive i.i.d. possibly non-Gaussian noise processes with zero mean and covariance matrices $\Sigma_{w^i} \succeq 0$, $i = 1, 2$, independent of $x_0^i$, $i = 1, 2$, and of each other for all $t \ge 0$. The initial states are distributed as $x_0^i \sim (0; \Sigma_{x_0^i})$, $i = 1, 2$. Finally, we restrict the eigenvalues of $(A^1, A^2)$ to lie within the unit circle, which means that each user's system model in (1) is asymptotically stable (i.e., asymptotically stationary).
The goal is to characterize the performance of the setup in Figure 1 under various distortion metrics when the encoder compresses information causally, whereas the lossless compression between the encoder and the decoder is done in one shot, assuming their clocks are synchronized.
First, we apply state space augmentation [1] to the state-space models in (1) to transform them into a single augmented state-space model as follows:
$$x_{t+1} = A x_t + w_t, \tag{2}$$
where $x_{t+1} = [(x_{t+1}^1)^T, (x_{t+1}^2)^T]^T \in \mathbb{R}^{p_1+p_2}$, $A$ is a block diagonal matrix, and $w_t$ is an additive i.i.d. possibly non-Gaussian noise process such that $w_t \sim (0; \Sigma_w)$, where $(A, \Sigma_w)$ are of the form
$$A = \begin{bmatrix} A^1 & 0 \\ 0 & A^2 \end{bmatrix} \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)}, \qquad \Sigma_w = \begin{bmatrix} \Sigma_{w^1} & 0 \\ 0 & \Sigma_{w^2} \end{bmatrix} \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)}. \tag{3}$$
We note that the operation in (3) can be written as $A = A^1 \oplus A^2$ and, similarly, $\Sigma_w = \Sigma_{w^1} \oplus \Sigma_{w^2}$ (see the notation section for "$\oplus$").
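To make the augmentation concrete, the following sketch (in Python; the helper names are ours, not the paper's) builds the pair $(A, \Sigma_w)$ of (3) via the direct sum and simulates one trajectory of (2):

```python
import numpy as np
from scipy.linalg import block_diag

def augment(A1, A2, Sw1, Sw2):
    """Build the augmented pair (A, Sigma_w) of (3) as direct sums."""
    return block_diag(A1, A2), block_diag(Sw1, Sw2)

def simulate(A, Sw, n, seed=0):
    """Simulate x_{t+1} = A x_t + w_t of (2) with w_t ~ N(0, Sigma_w), x_0 = 0."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    x, traj = np.zeros(p), []
    for _ in range(n):
        x = A @ x + rng.multivariate_normal(np.zeros(p), Sw)
        traj.append(x)
    return np.array(traj)

# Hypothetical two-user example; asymptotic stability requires all |eig(A)| < 1.
A1 = np.array([[0.5, 0.2], [0.3, 0.6]]); Sw1 = np.eye(2)
A2 = np.array([[0.6]]);                  Sw2 = np.array([[2.0]])
A, Sw = augment(A1, A2, Sw1, Sw2)
assert np.all(np.abs(np.linalg.eigvals(A)) < 1)
```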
System Operation: At each time instant $t$, the encoder observes the augmented state $x_t$ and generates the data packet $m_t \in \{1, \ldots, 2^{R_t}\}$ of instantaneous rate $R_t$. At time $t$, the packet $m_t$ is sent over a noiseless channel with rate $R_t$. At each time $t$, the decoder receives $m_t$ and constructs an estimate $y_t$ of $x_t$. We assume that the clocks of the encoder and decoder are synchronized. Formally, the encoder ($\mathcal{E}$) and the decoder ($\mathcal{D}$) are specified by a sequence of measurable functions $\{(f_t, g_t) : t \in \mathbb{N}_0\}$ as follows:
$$\mathcal{E}: \ m_t = f_t(m^{t-1}, x^t), \quad m^{-1} = \emptyset, \qquad \mathcal{D}: \ y_t = g_t(m^t). \tag{4}$$
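As a minimal illustration of the information pattern in (4) (a Python sketch; `f` and `g` are placeholder callables, not the paper's coding policies):

```python
# A sketch of the causal information pattern in (4): the encoder f_t sees the
# message history m^{t-1} and the source history x^t; the decoder g_t sees m^t.
def run_causal_code(x_traj, f, g):
    messages, estimates = [], []
    for t in range(len(x_traj)):
        m_t = f(messages[:t], x_traj[: t + 1])   # m_t = f_t(m^{t-1}, x^t)
        messages.append(m_t)
        estimates.append(g(messages[: t + 1]))   # y_t = g_t(m^t)
    return estimates
```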

1.1. Generalizations

It should be noted that the setup in Figure 1 can be generalized to any finite number of users. The only change will appear in the number of state-space equations and in the dimensions of the vectors and matrices in the augmented state-space representation (2).
Next, we explain the setup of two users that are correlated (in their states). In such a scenario, users 1 and 2 are modeled by the following discrete-time time-invariant multidimensional Markov processes:
$$x_{t+1}^1 = A^{11} x_t^1 + A^{12} x_t^2 + w_t^1, \qquad x_{t+1}^2 = A^{22} x_t^2 + A^{21} x_t^1 + w_t^2, \qquad t = 0, 1, \ldots, \tag{5}$$
where $x_t^1 \in \mathbb{R}^{p_1}$ and $x_t^2 \in \mathbb{R}^{p_2}$, with $p_1$ not necessarily equal to $p_2$, $(A^{11}, A^{12}, A^{21}, A^{22})$ are known constant matrices of appropriate dimensions, whereas all other assumptions remain the same as in the user models described in (1). The single augmented state-space model is now obtained as follows:
$$x_{t+1} = \hat{A} x_t + w_t, \tag{6}$$
where $\hat{A}$ is a block matrix of the form
$$\hat{A} = \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{bmatrix} \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)}, \tag{7}$$
in which $A^{11}, A^{22}$ are square matrices, but $(A^{12}, A^{21})$ may be rectangular matrices (if $p_1 \neq p_2$). We do not consider this case in our paper, because it follows straightforwardly by replacing matrix $A$ with matrix $\hat{A}$ everywhere. Clearly, this case can also be generalized to any finite number of users with appropriate modifications to the state-space models.

1.2. Distortion Constraints

In this work we consider three types of distortion constraints. These are articulated as follows:
(i) a per-dimension (spatial) distortion constraint on the asymptotically averaged total (across time) MMSE covariance matrix;
(ii) an asymptotically averaged total (across time and space) distortion constraint;
(iii) a covariance matrix distortion constraint.
Next, we give the definition of each distortion constraint and explain its utility in multi-user systems.
A per-dimension (spatial) distortion constraint imposed on the distortion covariance matrix $\Sigma^\Delta \triangleq \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} E\{(x_t - y_t)(x_t - y_t)^T\}$, where $\Sigma^\Delta \succeq 0$, is defined as follows:
$$\Sigma^\Delta_{ii} \le D_{ii}, \qquad i = 1, \ldots, p, \tag{8}$$
where $D_{ii} \in [0, D_{ii}^{\max}]$ are given diagonal entries of the positive semidefinite matrix $\hat{D} \succeq 0$, with $\mathrm{trace}(\hat{D}) \triangleq D$, $D \in [0, D^{\max}]$. Note that under this distortion constraint, it trivially holds that $\limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} E\{||x_t - y_t||_2^2\} \le D$.
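To make the constraint concrete, a finite-horizon estimate of $\Sigma^\Delta$ can be checked against the targets $D_{ii}$ as in the following sketch (our notation; `x` and `y` are assumed to be `(n+1, p)` arrays holding a source and a reproduction trajectory):

```python
import numpy as np

def per_dimension_ok(x, y, D_diag):
    """Check the per-dimension constraint (8) on a finite-horizon estimate of
    Sigma_Delta = (1/(n+1)) sum_t (x_t - y_t)(x_t - y_t)^T."""
    e = x - y                              # error trajectory, shape (n+1, p)
    Sigma_Delta = e.T @ e / e.shape[0]     # time-averaged error covariance
    total_mse = np.trace(Sigma_Delta)      # trivially bounded by D = sum_i D_ii
    return bool(np.all(np.diag(Sigma_Delta) <= D_diag)), total_mse
```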
Utility: The choice of per-dimension distortion constraints is arguably more realistic in various networked systems. For instance, one use of such hard constraints can be found in multivariable feedback control systems, also called centralized multi-input multi-output (MIMO) systems [2] (see Figure 2). In such networks, it may be the case that one wishes to minimize the temporally total performance criterion or to satisfy some total fidelity constraint. In addition, however, it is always required that the resources allocated to the different nodes (or variables) never exceed certain performance thresholds when the demands of data transmission within the communication link allow only limited rate. Nonetheless, the problem is that the variables interact. Some variables may be considered more important for certain applications, according to the demands of the system or the quality of service, which is why they need hard constraints.
An asymptotically averaged total (across the time and space) distortion constraint is defined as follows:
$$\limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} E\{||x_t - y_t||_2^2\} \le D_T, \tag{9}$$
where $D_T \in [0, D_T^{\max}]$, with $D_T$ not necessarily equal to $D$.
Utility: The asymptotically averaged total (across time and space) distortion constraint allows the distortion to be shared or allocated arbitrarily among the transmit dimensions. The combination of the per-dimension distortion constraint with the averaged total distortion constraint ensures a total allocated distortion budget in the system that depends on the allowable (by design) distortion budget at each dimension (or user).
A covariance matrix distortion constraint is a generalization of the per-dimension distortion constraint, defined by
$$\Sigma^\Delta \preceq D^{\mathrm{cov}}, \tag{10}$$
where $D^{\mathrm{cov}} \succeq 0$.
Utility: In recent years, there has been a shift from conventional MSE distortion constraints (scalar-valued target distortions) to covariance matrix distortion constraints in the areas of multiterminal and distributed source coding [3,4,5,6,7] and signal processing [7,8,9]. The argument for considering covariance distortion constraints, despite their difficulty, is their generality and flexibility in formulating new problems. For instance, one practical example would be wireless ad hoc microphones that transmit to receiver(s) over a MIMO channel. In such setups, the receiver(s) may need to perform beamforming or some multi-channel Wiener filtering variant. In both cases, one needs to know the covariance matrix of, e.g., the error signal (the distortion covariance matrix) to perform the desired signal enhancement. For example, if the quality of one of the signals is too poor, this could harm the overall signal enhancement, and one therefore needs to trade off the bits correctly among the microphones. In an ad hoc microphone array, the different signals are naturally correlated, which adds an interesting interplay between them that goes beyond MSE distortion.

1.3. Operational Interpretations

In this subsection, we use the three types of distortion constraints introduced in Section 1.2 to define the corresponding operational definitions for which we study lower and upper bounds in this paper.
Definition 1 
(Causal RDF subject to (8)). The operational causal RDF under per-dimension distortion constraints is defined as follows:
$$R^{c}_{\mathrm{pd}}(D) \triangleq \inf_{\substack{(f_t, g_t):\, t \in \mathbb{N}_0 \\ \Sigma^\Delta_{ii} \le D_{ii},\; i = 1,\ldots,p}} \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} R_t, \tag{11}$$
where $D_{ii} \in [0, D_{ii,\max}]$ and $D \in [0, D_{\max}]$.
Definition 2 
(Causal RDF subject to (8) and (9)). The operational causal RDF under joint per-dimension and asymptotically averaged total distortion constraints is defined as follows:
$$R^{c}_{\mathrm{joint}}(D^*) \triangleq \inf_{\substack{(f_t, g_t):\, t \in \mathbb{N}_0 \\ \Sigma^\Delta_{ii} \le D_{ii},\; i = 1,\ldots,p \\ \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} E\{||x_t - y_t||_2^2\} \le D^*}} \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} R_t, \tag{12}$$
where $D^* = \min\{D_T, D\}$.
Interplay between Definitions 1 and 2. Clearly, Definition 1 is a lower bound to Definition 2, because its constraint set of feasible solutions is larger. Note that, in general, the asymptotically averaged total distortion constraint in (12) is active when $D_T \le D$; otherwise, it is a trivial constraint and (12) is equivalent to the optimization problem of (11). This observation will be demonstrated via a simulation study in the sequel of the paper.
Definition 3 
(Causal RDF subject to (10)). The operational causal RDF under covariance matrix distortion constraints is defined as follows:
$$R^{c}(D^{\mathrm{cov}}) \triangleq \inf_{\substack{(f_t, g_t):\, t \in \mathbb{N}_0 \\ \Sigma^\Delta \preceq D^{\mathrm{cov}}}} \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} R_t, \tag{13}$$
where $D^{\mathrm{cov}} \succeq 0$.
Literature Review. In information theory, causal coding and causal decoding, also termed zero-delay coding (see, e.g., [10,11,12,13,14]), do not rely on the traditional construction based on random codebooks, which in turn requires asymptotically large source vector dimensions [15] to establish achievability of a certain (non-causal) rate-distortion performance. Indeed, the optimal rate-distortion performance for causal source coding (with the clocks of the encoder and decoder synchronized) is hard to compute, and bounds are often derived in the literature. For example, lower and upper bounds on the operational causal RDF subject to solely the distortion constraint in (9) (or the more stringent per-instant distortion constraint $E\{||x_t - y_t||_2^2\} \le D_t$, $\forall t$) have already been studied extensively for various special cases of the setup of Figure 1; see, e.g., [11,14,16,17] and the references therein. In this work, we study new problems related to the causal RDF for the general multi-user source coding setup of Figure 1 under various classes of distortion constraints whose utility (partly explained in Section 1.2) has not been studied in the literature so far. These bounds are established using tools from information theory, convex optimization and causal MMSE estimation.

1.4. Contributions

In this paper we obtain the following results for the setup of Figure 1.
  • Characterization and computation of the true lower bounds on (11)–(13) when the users' augmented source model is driven by a Gaussian noise process (Lemma 3, Theorem 2).
  • Analytical lower bounds on (11)–(13) when the users' augmented source model is driven by an additive i.i.d. noise process (including both additive Gaussian and non-Gaussian noise) (Theorem 3). As a consequence, we also obtain analytical lower bounds when only one of the users' source models is driven by a Gaussian noise process (Corollary 2).
  • Characterization and computation of achievable bounds on (11)–(13) when the users' augmented source model is driven by a Gaussian noise process (Theorem 4).
  • Characterization of achievable bounds on (11)–(13) when the users' augmented source model is driven by an additive i.i.d. non-Gaussian noise process (Theorem 5).
Machinery and tools. The information theoretic rate distortion definitions used to obtain the lower bounds in this paper are derived using a data processing theorem (Theorem 1), which reveals the "suitable" information measure to use. The steady-state characterization of the lower bounds in Lemma 3 is derived using inequalities from matrix algebra and a convexity argument that allows the use of Jensen's inequality. To derive lower bounds beyond additive i.i.d. Gaussian noise processes, we use the fact that the characterizations of the lower bounds for the Gaussian case are in fact the characterizations obtained for the best linear coding policies (Lemma 4); hence, these can serve as a benchmark to derive lower bounds beyond Gaussian noise processes by leveraging certain trace/determinant inequalities, most importantly Minkowski's determinant inequality ([18], Exercise 12.13), and the entropy power inequality (EPI) [19]. The upper bounds on the OPTA when the system's noise is Gaussian are derived using a causal sequential DPCM-based scheme with a feedback loop, equivalent to the scheme first derived in [14], followed by an ECDQ scheme that uses vector quantization. The upper bounds on the OPTA when the system's noise is additive i.i.d. non-Gaussian are obtained using precisely the same trick used to obtain the lower bounds, i.e., we use the linear test channel realization that achieves similar upper bounds for the Gaussian case and then, using a Shannon lower bound (SLB) type of argument (Theorem 5), we obtain the desired results.
The paper is structured as follows. In Section 2 we characterize and compute lower bounds on the OPTA of (11)–(13). In Section 3 we characterize and compute an achievable coding scheme that upper bounds the OPTA of (11)–(13). We draw conclusions and discuss future directions in Section 4.

2. Lower Bounds

In this section, we first choose a suitable information measure that will be used to derive a lower bound on Definitions 1–3. This information measure is a variant of directed information subject to some conditional independence constraints. Then, we obtain lower bounds on Definitions 1–3 for jointly Gaussian Markov processes and for Markov processes driven by an additive i.i.d. possibly non-Gaussian noise process.
First, we write the joint distribution of the communication system of Figure 1, i.e., from the two users described by the augmented state $\{x_t : t \in \mathbb{N}_0^n\}$ to the augmented output of the MMSE decoder $\{y_t : t \in \mathbb{N}_0^n\}$. In particular, the joint distribution induced by the joint process $\{(x_t, m_t, y_t) : t \in \mathbb{N}_0^n\}$ admits the following decomposition:
$$\begin{aligned} P(dx^n, dm^n, dy^n) &= \prod_{t=0}^{n} P(dy_t, dm_t, dx_t | y^{t-1}, m^{t-1}, x^{t-1}) \\ &= \prod_{t=0}^{n} P(dy_t | y^{t-1}, x^t, m^t)\, P(dm_t | m^{t-1}, y^{t-1}, x^t)\, P(dx_t | x^{t-1}, y^{t-1}, m^{t-1}) \quad a.s. \\ &= \prod_{t=0}^{n} P(dy_t | y^{t-1}, m^t)\, P(dm_t | m^{t-1}, y^{t-1}, x^t)\, P(dx_t | x_{t-1}) \quad a.s., \end{aligned} \tag{14}$$
which means that the augmented "source" process $x_t$ and the decoder's output process $y_t$ satisfy the following conditional independence constraints:
$$P(dx_t | x^{t-1}, y^{t-1}, m^{t-1}) = P(dx_t | x_{t-1}) \quad a.s., \tag{15}$$
$$P(dy_t | y^{t-1}, m^t, x^t) = P(dy_t | y^{t-1}, m^t) \quad a.s. \tag{16}$$
For (14) we state the following technical remark.
Remark 1 
(Trivial initial information). In (14) we assume that the joint distribution $P(dx_{-1}, dm_{-1}, dy_{-1})$ generates trivial information. This means that $P(dx_0 | x_{-1}, y_{-1}, m_{-1}) = P(dx_0)$, $P(dy_0 | y_{-1}, x_0, m_0) = P(dy_0 | x_0, m_0)$ and $P(dm_0 | m_{-1}, y_{-1}, x_0) = P(dm_0 | x_0)$.
We next prove a data processing theorem.
Theorem 1 
(Data processing theorem). Provided the decomposition of the joint distribution in (14) holds, the augmented state-space representation of the system in Figure 1 admits the following data processing inequalities:
$$I(x^n; y^n) \overset{(ii)}{\le} I(x^n; m^n \| y^{n-1}) \overset{(i)}{\le} \sum_{t=0}^{n} R_t, \tag{17}$$
where
$$I(x^n; m^n \| y^{n-1}) \triangleq \sum_{t=0}^{n} I(x^t; m_t | m^{t-1}, y^{t-1}), \qquad I(x^n; y^n) \triangleq \sum_{t=0}^{n} I(x_t; y_t | y^{t-1}), \tag{18}$$
assuming $I(x^t; m_t | m^{t-1}, y^{t-1}) < \infty$, $\forall t$, and $I(x_t; y_t | y^{t-1}) < \infty$, $\forall t$.
Proof. 
We first prove (i).
$$\sum_{t=0}^{n} R_t \ge \sum_{t=0}^{n} H(m_t | m^{t-1}) \overset{(a)}{\ge} \sum_{t=0}^{n} H(m_t | m^{t-1}, y^{t-1}) \overset{(b)}{\ge} \sum_{t=0}^{n} \left[ H(m_t | m^{t-1}, y^{t-1}) - H(m_t | m^{t-1}, y^{t-1}, x^t) \right] \overset{(c)}{=} \sum_{t=0}^{n} I(x^t; m_t | m^{t-1}, y^{t-1}) \triangleq I(x^n; m^n \| y^{n-1}),$$
where (a) follows because conditioning reduces entropy [19]; (b) follows from the non-negativity of discrete entropy [19]; (c) follows by definition.
Next, we prove (ii). This can be shown as follows:
$$I(x^t; m_t | m^{t-1}, y^{t-1}) - I(x^t; y_t | y^{t-1}) \overset{(d)}{=} I(x^t; m_t, y_t | m^{t-1}, y^{t-1}) - I(x^t; y_t | y^{t-1}) \overset{(e)}{=} I(x^t; m^t | y^t) - I(x^t; m^{t-1} | y^{t-1}) \overset{(f)}{=} I(x^t; m^t | y^t) - I(x^{t-1}; m^{t-1} | y^{t-1}), \tag{19}$$
where (d) follows from an adaptation of ([20], Lemma 3.3) to processes, i.e., $I(x^t; m_t, y_t | m^{t-1}, y^{t-1}) = I(x^t; m_t | m^{t-1}, y^{t-1}) + I(x^t; y_t | m^t, y^{t-1})$, and the second term is zero because of the conditional independence constraint (16); (e) follows by the chain rule of conditional mutual information (again an adaptation of ([20], Lemma 3.3)), which decomposes the conditional mutual information in two different ways, i.e.,
$$I(x^t; m^t, y_t | y^{t-1}) = I(x^t; m^{t-1} | y^{t-1}) + I(x^t; m_t, y_t | m^{t-1}, y^{t-1}) = I(x^t; m^t | y^t) + I(x^t; y_t | y^{t-1});$$
(f) follows because an adaptation of ([20], Lemma 3.3) can be applied to $I(x^t; m^{t-1} | y^{t-1})$ as follows:
$$I(x^t; m^{t-1} | y^{t-1}) = I(x_t, x^{t-1}; m^{t-1} | y^{t-1}) = I(x_t; m^{t-1} | x^{t-1}, y^{t-1}) + I(x^{t-1}; m^{t-1} | y^{t-1}) \overset{(g)}{=} I(x^{t-1}; m^{t-1} | y^{t-1}), \quad \forall t,$$
where (g) follows because $I(x_t; m^{t-1} | x^{t-1}, y^{t-1}) = 0$. This can be shown as follows:
$$I(x_t; m^{t-1} | x^{t-1}, y^{t-1}) = h(x_t | x^{t-1}, y^{t-1}) - h(x_t | x^{t-1}, y^{t-1}, m^{t-1}) = h(x_t | x^{t-1}, y^{t-1}) - h(x_t | x_{t-1}) \le 0, \quad \forall t, \tag{20}$$
where each $h(\cdot)$ is assumed to be finite for any $t$, and (20) follows from the conditional independence constraint in (15). From the non-negativity of conditional mutual information [19], the result follows.
Finally, from (19) we have
$$\sum_{t=0}^{n} \left[ I(x^t; m^t | y^t) - I(x^{t-1}; m^{t-1} | y^{t-1}) \right] = I(x^0; m^0 | y^0) + \left[ I(x^1; m^1 | y^1) - I(x^0; m^0 | y^0) \right] + \ldots + \left[ I(x^n; m^n | y^n) - I(x^{n-1}; m^{n-1} | y^{n-1}) \right] \tag{21}$$
$$= I(x^n; m^n | y^n) \ge 0, \tag{22}$$
where (22) follows by applying the method of differences in (21). The result then follows because (22) is by definition non-negative and $I(x^t; y_t | y^{t-1}) \ge I(x_t; y_t | y^{t-1})$ for each $t$. We note that if $I(x^0; m^0 | y^0)$ had also appeared in the cancellations, then this would have been the telescopic sum of the series. This completes the derivation. □
We note that Theorem 1 differs from the data processing theorem derived in ([21], Lemma 1) in that we assume the conditional independence constraint (15) instead of the conditional independence constraint $P(dx_t | x^{t-1}, y^{t-1}, m^{t-1}) = P(dx_t | x^{t-1}, y^{t-1})$ a.s., i.e., the source process is not allowed to have access via feedback to the previous output symbols $y^{t-1}$. This technical difference results in the mutual information in (18) being subject to conditional independence constraints, instead of the well-known directed information [22].
Before we introduce the information theoretic definitions that correspond to lower bounds on (11)–(13), we formally show the construction of $I(x^n; y^n)$.
Source. The augmented source process $\{x_t : t \in \mathbb{N}_0\}$ induces the sequence of conditional distributions $\{P(dx_t | x_{t-1}) : t \in \mathbb{N}_0^n\}$. Since no initial information is assumed, the distribution at $t = 0$ is $P(dx_0)$. In addition, by Bayes' rule we obtain $P(dx^n) \triangleq \prod_{t=0}^{n} P(dx_t | x_{t-1})$.
Reproduction or "test channel". The reproduction process $\{y_t : t \in \mathbb{N}_0^n\}$, parameterized by $x^t \in \mathcal{X}^t$, induces the sequence of conditional distributions, known as test channels, $\{P(dy_t | y^{t-1}, x_t) : t \in \mathbb{N}_0^n\}$. At $t = 0$, no initial state information is assumed; hence $P(dy_0 | y_{-1}, x_0) = P(dy_0 | x_0)$. In addition, by Bayes' rule we obtain $Q(dy^n | x^n) \triangleq \prod_{t=0}^{n} P(dy_t | y^{t-1}, x_t)$.
From ([23], Remark 1), it can be shown that the sequences of conditional distributions $\{P(dx_t | x_{t-1}) : t \in \mathbb{N}_0^n\}$ and $\{P(dy_t | y^{t-1}, x_t) : t \in \mathbb{N}_0^n\}$ uniquely define the family of conditional distributions on $\mathcal{X}^n$ and $\mathcal{Y}^n$ parameterized by $x^n \in \mathcal{X}^n$, respectively, given by the joint distribution
$$P(dx^n, dy^n) = P(dx^n) \otimes Q(dy^n | x^n). \tag{23}$$
In addition, from (23), we can uniquely define the $\mathcal{Y}^n$ marginal distribution by
$$P(dy^n) \triangleq \int_{\mathcal{X}^n} P(dx^n) \otimes Q(dy^n | x^n),$$
and the conditional distributions $\{P(dy_t | y^{t-1}) : t \in \mathbb{N}_0^n\}$.
Given the above construction of distributions, we can formally introduce the information measure using relative entropy as follows:
$$I(x^n; y^n) \overset{(a)}{\triangleq} D\big(P(dx^n, dy^n) \,\|\, P(dx^n) \times P(dy^n)\big) \in [0, \infty] \overset{(b)}{=} \int_{\mathcal{X}^n \times \mathcal{Y}^n} \log\left( \frac{dQ(\cdot | x^n)}{dP(\cdot)}(y^n) \right) P(dx^n, dy^n) \overset{(c)}{=} \sum_{t=0}^{n} E\left\{ \log\left( \frac{dP(\cdot | y^{t-1}, x_t)}{dP(\cdot | y^{t-1})}(y_t) \right) \right\} \overset{(d)}{=} \sum_{t=0}^{n} I(x_t; y_t | y^{t-1}), \tag{24}$$
where (a) follows by the definition of relative entropy between $P(dx^n, dy^n)$ and the product distribution $P(dx^n) \times P(dy^n)$; (b) is due to the Radon–Nikodym derivative theorem ([23], Appendices A and C); (c) is due to the chain rule of relative entropy; (d) follows by definition.
We now state as a definition the lower bounds on (11)–(13).
Definition 4 
(Lower bounds). Using the previous construction of distributions and the information measure of (24), we can define the following lower bounds on (11)–(13).
(1)
The sum-rate subject to per-dimension distortion constraints is defined as follows:
$$R^{LB}_{\mathrm{pd}}(D) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t = 0, 1, \ldots \\ \Sigma^\Delta_{ii} \le D_{ii},\; i = 1,\ldots,p}} \limsup_{n\to\infty} \frac{1}{n+1} I(x^n; y^n), \tag{25}$$
with
$$R^{LB}_{\mathrm{pd},[0,n]}(D) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t \in \mathbb{N}_0^n \\ \Sigma^\Delta_{ii,t} \le D_{ii},\; i = 1,\ldots,p}} I(x^n; y^n), \tag{26}$$
where $\Sigma^\Delta_{t} \triangleq \frac{1}{n+1}\sum_{t=0}^{n} E\{(x_t - y_t)(x_t - y_t)^T\}$, $\Sigma^\Delta_{ii,t} \triangleq \frac{1}{n+1}\sum_{t=0}^{n} \left[ E\{(x_t - y_t)(x_t - y_t)^T\} \right]_{ii}$, and $D \ge 0$.
(2)
The sum-rate subject to joint distortion constraints is defined as follows:
$$R^{LB}_{\mathrm{joint}}(D^*) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t = 0, 1, \ldots \\ \Sigma^\Delta_{ii} \le D_{ii},\; i = 1,\ldots,p \\ \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} E\{||x_t - y_t||_2^2\} \le D_T}} \limsup_{n\to\infty} \frac{1}{n+1} I(x^n; y^n), \tag{27}$$
$$R^{LB}_{\mathrm{joint},[0,n]}(D^*) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t \in \mathbb{N}_0^n \\ \Sigma^\Delta_{ii,t} \le D_{ii},\; i = 1,\ldots,p \\ \frac{1}{n+1}\sum_{t=0}^{n} E\{||x_t - y_t||_2^2\} \le D_T}} I(x^n; y^n), \tag{28}$$
where $D^* = \min\{D, D_T\}$.
(3)
The sum-rate subject to covariance matrix distortion constraints is defined as follows:
$$R^{LB}(D^{\mathrm{cov}}) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t = 0, 1, \ldots \\ \Sigma^\Delta \preceq D^{\mathrm{cov}}}} \limsup_{n\to\infty} \frac{1}{n+1} I(x^n; y^n), \tag{29}$$
$$R^{LB}_{[0,n]}(D^{\mathrm{cov}}) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t \in \mathbb{N}_0^n \\ \Sigma^\Delta_{t} \preceq D^{\mathrm{cov}}}} I(x^n; y^n), \tag{30}$$
where $D^{\mathrm{cov}} \succeq 0$.
Next, we stress some technical remarks related to the new information theoretic measures in Definition 4, which can be obtained using known results in the literature, and to some known lower bounds that use the same objective function as (26)–(30).
Remark 2 
(Comments on Definition 4). It can be shown that the infimization problems (26), (28) and (30), in contrast to their operational counterparts (11)–(13), are convex with respect to their test channel. This can be shown following, for instance, the techniques of [23]. By the structural properties of the test channel derived in ([24], Section 4), if the source is first-order Markov, i.e., with distribution $P(dx_t | x_{t-1})$, $t \in \mathbb{N}_0^n$, the test channel distribution is of the form $P(dy_t | y^{t-1}, x_t)$, $t \in \mathbb{N}_0^n$. Finally, combining this structural result with ([25], Theorem 1.8.6), it can be shown that if $x^n$ is Gaussian, then a jointly Gaussian process $\{(x_t, y_t) : t \in \mathbb{N}_0\}$ achieves a smaller value of the information rates, and if $x^n$ is Gaussian and Markov, then the infimum in (26), (28) and (30) can be restricted to test channel distributions that are Gaussian, of the form $P(dy_t | y^{t-1}, x_t)$.
We recall that when the distortion constraint set contains only (9), its finite time horizon counterpart, or the per-instant distortion constraint $E\{||x_t - y_t||_2^2\} \le D_t$, $\forall t$, we end up with the well-known nonanticipatory ϵ-entropy [26], also found in the literature as the sequential or nonanticipative RDF [27,28]. The nonanticipatory ϵ-entropy has received significant interest during the last twenty years in an anthology of papers (see, e.g., [11,16,24,29,30,31]), due to its utility in control-related and delay-constrained applications. Moreover, the characterizations in (29) and (30) do not appear to be manageable using standard techniques, and no closed-form statements are available for the general RDF in the literature. For this reason, we seek only non-tight bounds.
In view of the above, in the sequel we characterize and compute lower bounds on Definitions 1–3 for Gauss–Markov processes and for Markov models driven by additive i.i.d. noise processes.

2.1. Characterization and Computation of Jointly Gaussian Processes

In this section, we assume that the augmented joint process $\{(x_t, y_t) : t \in \mathbb{N}_0\}$ is jointly Gaussian. We use this assumption first to characterize and then to optimally compute (26), (28) and (30).
We first state the following helpful lemma. We omit the proof because it is already derived in other papers; see, e.g., [14,24]. The only modification is the augmented joint process $\{(x_t, y_t) : t \in \mathbb{N}_0^n\}$.
Lemma 1 
(Realization of $\{P^*(dy_t | y^{t-1}, x_t) : t \in \mathbb{N}_0^n\}$). Consider the class of conditionally Gaussian test channels $\{P^*(dy_t | y^{t-1}, x_t) : t \in \mathbb{N}_0^n\}$. Then, the following statements hold.
(1)
Any candidate of $\{P^*(dy_t | y^{t-1}, x_t) : t \in \mathbb{N}_0^n\}$ can be realized by the recursion
$$y_t = H_t (x_t - \hat{x}_{t|t-1}) + \hat{x}_{t|t-1} + v_t, \quad \hat{x}_{0|-1} = \mathrm{given}, \quad t \in \mathbb{N}_0^n, \tag{31}$$
where $\hat{x}_{t|t-1} \triangleq E\{x_t | y^{t-1}\}$, $\{v_t \in \mathbb{R}^{p_1+p_2} \sim N(0; \Sigma_{v_t}) : t \in \mathbb{N}_0^n\}$ is an independent Gaussian process, independent of $\{w_t : t \in \mathbb{N}_0^{n-1}\}$ and $x_0$, and $\{H_t \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)} : t \in \mathbb{N}_0^n\}$ are time-varying deterministic matrices.
Moreover, the innovations process $\{\mathcal{I}_t \in \mathbb{R}^{p_1+p_2} : t \in \mathbb{N}_0^n\}$ of (31) is the orthogonal process defined by
$$\mathcal{I}_t \triangleq y_t - E\{y_t | y^{t-1}\} = H_t (x_t - \hat{x}_{t|t-1}) + v_t,$$
where $\mathcal{I}_t \sim N(0; \Sigma_{\mathcal{I}_t})$, $\Sigma_{\mathcal{I}_t} = H_t \Sigma_{t|t-1} H_t^T + \Sigma_{v_t}$ and $\Sigma_{t|t-1} \triangleq E\{(x_t - \hat{x}_{t|t-1})(x_t - \hat{x}_{t|t-1})^T | y^{t-1}\} = E\{(x_t - \hat{x}_{t|t-1})(x_t - \hat{x}_{t|t-1})^T\}$.
(2)
Let $\hat{x}_{t|t} \triangleq E\{x_t | y^t\}$ and $\Sigma_{t|t} \triangleq E\{(x_t - \hat{x}_{t|t})(x_t - \hat{x}_{t|t})^T | y^t\} = E\{(x_t - \hat{x}_{t|t})(x_t - \hat{x}_{t|t})^T\}$. Then, $\{\hat{x}_{t|t-1}, \Sigma_{t|t-1} : t \in \mathbb{N}_0^n\}$ satisfy the following vector-valued equations:
$$\begin{aligned} \hat{x}_{t|t-1} &= A \hat{x}_{t-1|t-1}, \\ \Sigma_{t|t-1} &= A \Sigma_{t-1|t-1} A^T + \Sigma_w, \\ \hat{x}_{t|t} &= \hat{x}_{t|t-1} + N_t \mathcal{I}_t, \\ N_t &= \Sigma_{t|t-1} H_t^T \Sigma_{\mathcal{I}_t}^{-1} \quad (\mathrm{Kalman\ gain}), \\ \Sigma_{t|t} &= \Sigma_{t|t-1} - \Sigma_{t|t-1} H_t^T \Sigma_{\mathcal{I}_t}^{-1} H_t \Sigma_{t|t-1}, \end{aligned} \tag{32}$$
where $\Sigma_{t|t} \succeq 0$ and $\Sigma_{t|t-1} \succeq 0$.
(3)
Using MMSE estimation via the vector-valued KF recursions of (32), the following finite-dimensional characterizations of $R^{LB,G}_{\mathrm{pd},[0,n]}(D)$, $R^{LB,G}_{\mathrm{joint},[0,n]}(D^*)$ and $R^{LB,G}_{[0,n]}(D^{\mathrm{cov}})$ can be obtained:
$$R^{LB,G}_{\mathrm{pd},[0,n]}(D) = \inf_{\substack{H_t \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)},\; \Sigma_{v_t} \succeq 0,\; t \in \mathbb{N}_0^n \\ 0 \le \tilde{\Sigma}_{ii,t} \le D_{ii},\; i = 1,\ldots,p}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Sigma_{t|t-1}|}{|\Sigma_{t|t}|} \right]^+, \tag{33}$$
$$R^{LB,G}_{\mathrm{joint},[0,n]}(D^*) = \inf_{\substack{H_t \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)},\; \Sigma_{v_t} \succeq 0,\; t \in \mathbb{N}_0^n \\ 0 \le \tilde{\Sigma}_{ii,t} \le D_{ii},\; i = 1,\ldots,p \\ \frac{1}{n+1}\sum_{t=0}^{n} \mathrm{trace}\left( (I_{p_1+p_2} - H_t)\Sigma_{t|t-1}(I_{p_1+p_2} - H_t)^T + \Sigma_{v_t} \right) \le D_T}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Sigma_{t|t-1}|}{|\Sigma_{t|t}|} \right]^+, \tag{34}$$
$$R^{LB,G}_{[0,n]}(D^{\mathrm{cov}}) = \inf_{\substack{H_t \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)},\; \Sigma_{v_t} \succeq 0,\; t \in \mathbb{N}_0^n \\ \frac{1}{n+1}\sum_{t=0}^{n} \left[ (I_{p_1+p_2} - H_t)\Sigma_{t|t-1}(I_{p_1+p_2} - H_t)^T + \Sigma_{v_t} \right] \preceq D^{\mathrm{cov}}}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Sigma_{t|t-1}|}{|\Sigma_{t|t}|} \right]^+, \tag{35}$$
where $\tilde{\Sigma}_{ii,t} \triangleq \frac{1}{n+1}\sum_{t=0}^{n} \left[ (I_{p_1+p_2} - H_t)\Sigma_{t|t-1}(I_{p_1+p_2} - H_t)^T + \Sigma_{v_t} \right]_{ii} \ge 0$, $D \in [0, D^{\max}]$ and $D^* \in [0, D^{*,\max}]$.
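As a direct transcription of the covariance recursions in (32) (a Python sketch in our notation; the design sequence $\{(H_t, \Sigma_{v_t})\}$ is assumed given):

```python
import numpy as np

def kf_covariances(A, Sw, H_list, Sv_list, Sigma_init):
    """Iterate the prediction/update covariance recursions of (32)."""
    Sig_filt = Sigma_init                              # Sigma_{-1|-1}
    pred, filt = [], []
    for H, Sv in zip(H_list, Sv_list):
        Sig_pred = A @ Sig_filt @ A.T + Sw             # Sigma_{t|t-1}
        S_innov = H @ Sig_pred @ H.T + Sv              # innovations covariance
        N = Sig_pred @ H.T @ np.linalg.inv(S_innov)    # Kalman gain N_t
        Sig_filt = Sig_pred - N @ H @ Sig_pred         # Sigma_{t|t}
        pred.append(Sig_pred)
        filt.append(Sig_filt)
    return pred, filt
```

The per-stage rate term in (33)–(35) is then $\frac{1}{2}[\log(|\Sigma_{t|t-1}|/|\Sigma_{t|t}|)]^+$, computable from the two returned sequences.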
We note that one could directly study the finite-dimensional characterizations of Lemma 1, (3), and attempt numerical solutions. However, it is much more insightful to instead use the identification of the design parameters $\{(H_t, \Sigma_{v_t}) : t \in \mathbb{N}_0^n\}$ of the test-channel realization in (31). This approach is already carried out in [14,24]; hence, we state it without proof. Note, however, that compared to [14,24], which assume distortion constraints like (9) (or the per-time-instant counterpart of (9), i.e., $E\{||x_t - y_t||_2^2\} \le D_t$, $\forall t$), here we assume augmented state-space models and various spatio-temporal distortion constraints, namely per-dimension, jointly per-dimension and averaged total, and covariance matrix distortion constraints.
Lemma 2 
(Alternative characterizations of (33)–(35) via system identification). Consider Lemma 1 and set $\Delta_t \triangleq \Sigma_{t|t}$ and $\Lambda_t \triangleq \Sigma_{t|t-1}$. Then, the following statements hold.
(1)
The test-channel distribution $P(dy_t | y^{t-1}, x_t)$ admits the following linear Markov additive noise realization:
$$y_t = H_t x_t + (I_{p_1+p_2} - H_t) A y_{t-1} + v_t, \quad y_{-1} = \mathrm{given}, \quad t \in \mathbb{N}_0^n, \tag{36}$$
where
$$H_t \triangleq I_{p_1+p_2} - \Delta_t \Lambda_t^{-1}, \qquad \Sigma_{v_t} \triangleq \Delta_t H_t^T \succeq 0. \tag{37}$$
(2)
The finite-dimensional characterizations of $R^{LB,G}_{\mathrm{pd},[0,n]}(D)$, $R^{LB,G}_{\mathrm{joint},[0,n]}(D^*)$ and $R^{LB,G}_{[0,n]}(D^{\mathrm{cov}})$ can be simplified to the following alternative characterizations:
$$R^{LB,G}_{\mathrm{pd},[0,n]}(D) = \inf_{\substack{0 \le \Delta_{ii,t} \le D_{ii},\; i = 1,\ldots,p,\; t \in \mathbb{N}_0^n}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Lambda_t|}{|\Delta_t|} \right]^+, \tag{38}$$
$$R^{LB,G}_{\mathrm{joint},[0,n]}(D^*) = \inf_{\substack{0 \le \Delta_{ii,t} \le D_{ii},\; i = 1,\ldots,p,\; t \in \mathbb{N}_0^n \\ \frac{1}{n+1}\sum_{t=0}^{n} \mathrm{trace}(\Delta_t) \le D_T}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Lambda_t|}{|\Delta_t|} \right]^+, \tag{39}$$
$$R^{LB,G}_{[0,n]}(D^{\mathrm{cov}}) = \inf_{\substack{\frac{1}{n+1}\sum_{t=0}^{n} \Delta_t \preceq D^{\mathrm{cov}}}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Lambda_t|}{|\Delta_t|} \right]^+, \tag{40}$$
where $\Delta_{ii,t}$ is defined precisely as $\tilde{\Sigma}_{ii,t}$.
Next, we give some technical remarks related to Lemma 2.
Remark 3 
(Special case and technical remarks).
(1)
Clearly, if in the forward test-channel realization with additive noise we assume that the block diagonal matrix $A = 0$ (the null matrix), then we recover the classical forward test-channel realization for a vector memoryless Gaussian source subject to an MSE distortion (see, e.g., ([32], Chapter 4.5), ([15], Chapter 9.7)), given by
$$y_t = H_t x_t + v_t, \quad t \in \mathbb{N}_0^n, \tag{41}$$
and the coefficients in (37) give
$$H_t \triangleq I_{p_1+p_2} - \Delta_t \Sigma_w^{-1} \succeq 0, \qquad \Sigma_{v_t} \triangleq \Delta_t H_t^T \succeq 0. \tag{42}$$
Moreover, the characterizations in (38)–(40) change in that $\Lambda_t = \Sigma_w$. Clearly, (42) can be seen as reverse-waterfilling design parameters.
(2)
Compared to (1), we note that $H_t$ in (37) should not be confused with a positive semidefinite matrix defined via the usual quadratic form [33]; instead, it may be a non-symmetric matrix, which however has only real non-negative eigenvalues. This observation is important because it means that, in general, the design variables $(\Delta_t, \Lambda_t)$ do not commute as in the classical reverse-waterfilling problems for memoryless multivariate Gaussian random variables or i.i.d. processes (see, e.g., [19,25]).
(3)
For jointly Gaussian processes, the linear forward realization in (36) is the optimal realization among all realizations for this problem, because the KF is the optimal causal MMSE estimator. Beyond Gaussian processes, and when the noise is zero-mean, uncorrelated and white (in our setup these properties hold), the optimal realization for Gaussian processes becomes the best linear realization (see, e.g., ([34], §3.2) or ([35], p. 130)), and similarly the corresponding characterizations in (38)–(40) become the best linear characterizations. By "best linear" realization and characterizations, respectively, we mean that there may be non-linear realizations, and hence non-linear-based characterizations, that outperform the best linear ones.
(4)
The characterization of (39) is different from the characterization obtained in ([16], Theorem 1, (25e)), which uses weighted distortion constraints. The former optimization problem imposes hard constraints, whereas the latter imposes soft constraints via weights. Nonetheless, an interesting open question is whether there exists a set of weights that gives the same per-dimension distortion when imposed as a weighted total distortion constraint.
(5)
It should also be stressed that per-dimension constraints on the diagonal entries of $\Delta_t$ are not the same as constraints on the eigenvalues of $\Delta_t$. This further means that even for this class of distortion constraints, it is still possible to have rate-distortion resource allocation (i.e., a type of reverse-waterfilling optimization).
Remark 4 
(Convexity). The optimization problems in (38) and (39) are convex in their design variables, with affine and positive semidefinite constraints. Thus, the problem can be solved numerically using convex programming software (see, e.g., [36]) or via the more challenging KKT conditions, which are first-order necessary conditions for global optimality ([37], Chapter 5.3). The latter give certain non-linear matrix Riccati equations that need to be solved in order to construct a reverse-waterfilling algorithm.
Remark 5 
(Existence of Solution). A sufficient condition for the existence of a solution with a finite value in (38)–(40) is to consider the strict LMI constraint $0 \prec \Delta_t \preceq \Lambda_t$, which ensures that the objective function is bounded. The strict LMI ensures that $\Delta_t \succ 0$, which further means that $D > 0$, $D_T > 0$ and $D^{\mathrm{cov}} \succ 0$.
In what follows, we derive lower bounds on (11)–(13).
Lemma 3 
(Steady-state lower bounds on (11)–(13)). Suppose that the conditions of Remark 5 hold. Moreover, let $\Delta \triangleq \frac{1}{n+1}\sum_{t=0}^{n} \Delta_t$ for some finite $n$. Then, the following statements hold.
(1)
$$R^{c}_{\mathrm{pd}}(D) \ge \min_{\substack{0 \prec \Delta \preceq \Lambda \\ \Delta_{ii} \le D_{ii},\; i = 1,\ldots,p}} \frac{1}{2}\log\frac{|\Lambda|}{|\Delta|}, \tag{43}$$
where $\Lambda = A \Delta A^T + \Sigma_w$.
(2)
$$R^{c}_{\mathrm{joint}}(D^*) \ge \min_{\substack{0 \prec \Delta \preceq \Lambda \\ \Delta_{ii} \le D_{ii},\; i = 1,\ldots,p \\ \mathrm{trace}(\Delta) \le D_T}} \frac{1}{2}\log\frac{|\Lambda|}{|\Delta|}, \tag{44}$$
for some $D^* = \min\{D_T, D\}$.
(3)
$$R^{c}(D^{\mathrm{cov}}) \ge \min_{\substack{0 \prec \Delta \preceq \Lambda \\ 0 \prec \Delta \preceq D^{\mathrm{cov}}}} \frac{1}{2}\log\frac{|\Lambda|}{|\Delta|}. \tag{45}$$
Proof. 
See Appendix A. □
It should be remarked that, instead of the derivation based on a convexity argument in Lemma 3, one can assume that the optimal minimizer $P(dy_t | y^{t-1}, x_t)$ achieving (43)–(45) is time-invariant and that the output distribution $P(dy_t | y^{t-1})$ is also time-invariant with a unique invariant distribution; see, e.g., ([14], Theorem 3). Moreover, the optimal linear forward test channel that achieves the lower bounds in (43)–(45) corresponds to the time-invariant version of realization (36), given by
$$y_t = H x_t + (I_{p_1+p_2} - H) A y_{t-1} + v_t, \tag{46}$$
whereas the corresponding time-invariant scaling coefficients in (37) are as follows:
$$H \triangleq I_{p_1+p_2} - \Delta \Lambda^{-1}, \qquad \Sigma_v \triangleq \Delta H^T \succeq 0. \tag{47}$$
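Given a feasible steady-state $\Delta$, the realization parameters in (46)–(47) and the corresponding rate follow in a few lines (a sketch in our notation):

```python
import numpy as np

def steady_state_channel(A, Sw, Delta):
    """Compute (H, Sigma_v) of (47) and the rate (1/2) log |Lambda|/|Delta|."""
    p = A.shape[0]
    Lam = A @ Delta @ A.T + Sw                    # Lambda = A Delta A^T + Sigma_w
    H = np.eye(p) - Delta @ np.linalg.inv(Lam)    # H = I - Delta Lambda^{-1}
    Sv = Delta @ H.T                              # Sigma_v = Delta H^T (PSD when 0 < Delta <= Lambda)
    rate_bits = 0.5 * np.log2(np.linalg.det(Lam) / np.linalg.det(Delta))
    return H, Sv, rate_bits
```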
From Lemma 3, the following corollary can be immediately obtained.
Corollary 1 
(Fixed design variable $\Delta$). If in Lemma 2 we assume that $\Delta_t = \Delta$, $\forall t$, then we obtain (43)–(45).
Proof. 
This is immediate from the derivation of Lemma 3. □
In what follows, we show that the lower bounds in Lemma 3 are semidefinite representable and can thus be readily computed.
Theorem 2 
(Computation of the lower bounds in Lemma 3). Consider the variable $Q_1 \triangleq (\Delta^{-1} + A^T \Sigma_w^{-1} A)^{-1}$, where $\Delta \succ 0$. Then, the following semidefinite programming representations hold.
(1)
For some $D \triangleq \mathrm{trace}(\hat{D}) > 0$, the lower bound in (43), denoted hereinafter by $R^{LB}_{\mathrm{pd}}(D)$, is semidefinite representable as follows:
$$R^{LB}_{\mathrm{pd}}(D) = \min_{Q_1 \succ 0} \; -\frac{1}{2}\log|Q_1| + \frac{1}{2}\log|\Sigma_w| \quad \mathrm{s.t.} \quad 0 \preceq \Delta \preceq \Lambda, \quad \Delta_{ii} \le D_{ii},\; i = 1,\ldots,p, \tag{48}$$
$$\begin{bmatrix} \Delta - Q_1 & \Delta A^T \\ A \Delta & \Lambda \end{bmatrix} \succeq 0. \tag{49}$$
(2)
For some $D^* = \min\{D_T, D\} > 0$, the lower bound in (44), denoted hereinafter by $R^{LB}_{\mathrm{joint}}(D^*)$, is semidefinite representable as follows:
$$R^{LB}_{\mathrm{joint}}(D^*) = \min_{Q_1 \succ 0} \; -\frac{1}{2}\log|Q_1| + \frac{1}{2}\log|\Sigma_w| \quad \mathrm{s.t.} \quad 0 \preceq \Delta \preceq \Lambda, \quad \Delta_{ii} \le D_{ii},\; i = 1,\ldots,p, \quad \mathrm{trace}(\Delta) \le D_T, \quad \begin{bmatrix} \Delta - Q_1 & \Delta A^T \\ A \Delta & \Lambda \end{bmatrix} \succeq 0. \tag{50}$$
(3)
For some $D^{\mathrm{cov}} \succ 0$, the lower bound in (45), denoted hereinafter by $R^{LB}(D^{\mathrm{cov}})$, is semidefinite representable as follows:
$$R^{LB}(D^{\mathrm{cov}}) = \min_{Q_1 \succ 0} \; -\frac{1}{2}\log|Q_1| + \frac{1}{2}\log|\Sigma_w| \quad \mathrm{s.t.} \quad 0 \preceq \Delta \preceq \Lambda, \quad \Delta \preceq D^{\mathrm{cov}}, \quad \begin{bmatrix} \Delta - Q_1 & \Delta A^T \\ A \Delta & \Lambda \end{bmatrix} \succeq 0. \tag{51}$$
Proof. 
See Appendix B. □
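The paper's numerics use the CVX platform [36]; purely as an illustration of the representation (48)–(49), the following is a minimal CVXPY sketch (our translation, not the authors' implementation) of $R^{LB}_{\mathrm{pd}}(D)$:

```python
import cvxpy as cp
import numpy as np

def R_pd_LB(A, Sw, D_diag):
    """SDP (48)-(49): minimize -(1/2) log|Q1| + (1/2) log|Sigma_w|
    s.t. 0 << Delta << Lambda, Delta_ii <= D_ii, and the Schur-complement LMI."""
    p = A.shape[0]
    Q1 = cp.Variable((p, p), symmetric=True)
    Delta = cp.Variable((p, p), symmetric=True)
    Lam = A @ Delta @ A.T + Sw                 # Lambda = A Delta A^T + Sigma_w (affine)
    lmi = cp.bmat([[Delta - Q1, Delta @ A.T],
                   [A @ Delta, Lam]])          # encodes Q1 <= (Delta^{-1} + A^T Sw^{-1} A)^{-1}
    cons = [Delta >> 0, Lam - Delta >> 0, lmi >> 0, cp.diag(Delta) <= D_diag]
    prob = cp.Problem(cp.Minimize(-0.5 * cp.log_det(Q1)
                                  + 0.5 * np.log(np.linalg.det(Sw))), cons)
    prob.solve()
    return prob.value / np.log(2)              # nats to bits
```

For (50), one would additionally impose `cp.trace(Delta) <= D_T`; for (51), the per-dimension constraint is replaced by `Delta << D_cov`.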
Next, we stress some comments on the semidefinite representation of the lower bounds in Theorem 2.
Remark 6 
(Comments on Theorem 2).
(1) 
We note that a characterization similar to those derived in Theorem 2 (subject to the distortion constraint (9), or the per-instant distortion constraint $E\{||x_t - y_t||_2^2\} \le D_t$, $\forall t$, for a special case of the setup in Figure 1) was recently derived in ([16], Equation (27)). The log-determinant convex optimization problems in Theorem 2 are widely used in systems and control theory because they can deal efficiently with LMIs [38].
(2) 
Recently, the efficiency of the SDP algorithm in solving linear and non-linear optimization problems has attracted experts from the field of information theory, who have noticed its usefulness in solving distributed source coding problems (see, e.g., [3,39]). Such log-determinant problems, when solved using the semidefinite programming method, are known to have polynomial worst-case complexity (see, e.g., [40]). In addition, for an interior point method such as the SDP approach, the most computationally expensive step is the Cholesky factorization involved in the Newton steps.
(3) 
On the other hand, due to its complexity, the SDP approach is often time consuming for high-dimensional systems, whereas for very large scale systems it is occasionally impossible to obtain numerical solutions. Hence, one could preferably consider alternative methods, sacrificing for instance the optimality of the SDP algorithm but gaining in scalability and reduced complexity. The most computationally efficient way to solve such problems and, additionally, to gain some insight from the solution is via the well-known reverse-waterfilling algorithm ([19], Theorem 10.3.3), which is however very hard to construct and compute, because one needs to employ and solve complicated KKT conditions [37]. Such an effort was recently made for multivariate Gauss–Markov processes under per-instant, averaged total and asymptotically averaged total distortion constraints in [24,41].
Next, we perform some numerical illustrations using the semidefinite representations of Theorem 2. We also compare (48) and (50) to the known expression obtained only for the asymptotically averaged total MSE distortion constraint in ([16], Equation (27)). We note that the SDP algorithm for (48)–(51) is implemented using the CVX platform [36].
Example 1 
(Comparison of $R^{LB}_{\mathrm{joint}}(D^*)$, $R^{LB}_{\mathrm{pd}}(D)$ and ([16], Equation (27))). For the system in (1), we assume that user 1 is described by an $\mathbb{R}^2$-valued time-invariant Markov source driven by an i.i.d. Gaussian noise process with parameters $(A^1, \Sigma_{w^1})$:
$$(A^1, \Sigma_{w^1}) = \left( \begin{bmatrix} 0.5 & 0.2 \\ 0.3 & 0.6 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right), \tag{52}$$
whereas user 2 is described by an $\mathbb{R}^3$-valued time-invariant Markov source driven by an i.i.d. Gaussian noise process with parameters $(A^2, \Sigma_{w^2})$:
$$(A^2, \Sigma_{w^2}) = \left( \begin{bmatrix} 0.5 & 0.2 & 0.1 \\ 0.3 & 0.6 & 0.1 \\ 0.7 & 0.3 & 0.4 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.2 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} \right). \tag{53}$$
Clearly, the augmented state-space model (2) gives $A = A^1 \oplus A^2$ and $\Sigma_w = \Sigma_{w^1} \oplus \Sigma_{w^2}$. For this example, we assume that $D_T = 1.5$ and $D_{11} = 0.1$, $D_{22} = 0.01$, $D_{33} = 0.6$, $D_{44} = 0.15$, $D_{55} = 0.1$, which implies that $D = 0.96$. This means that $D^* = \min\{D_T, D\} = 0.96$.
In Figure 3, we compare the numerical solutions of $R^{LB}_{\mathrm{joint}}(D^*)$ and $R^{LB}_{\mathrm{pd}}(D)$ with ([16], Equation (27)), denoted hereinafter as $R^{LB}(D_T)$.
Based on this numerical study, we observe that for distortion levels in $(0, D^* = D]$, $R^{LB}_{\mathrm{joint}}(D^*) \ge R^{LB}_{\mathrm{pd}}(D)$, whereas for values of $D_T$ greater than $D^*$ we observe that $R^{LB}_{\mathrm{joint}}(D^*) = R^{LB}_{\mathrm{pd}}(D)$, because the asymptotically averaged total MSE distortion constraint is inactive. This observation verifies our comment in Section 1.3 regarding the connection between (11) and (12). Clearly, at high rates (or high resolution) we observe that $R^{LB}_{\mathrm{joint}}(D^*) \ge R^{LB}(D_T)$.
Another interesting observation (illustrated in Figure 4) is that if in the same example we allocate the total budget of per-dimension distortion equally, i.e., $D_{ii} = D_{jj}$, $i \neq j$, then for distortion levels in $(0, D^* = D]$ we observe that $R^{LB}_{\mathrm{joint}}(D^*) = R^{LB}(D_T) \ge R^{LB}_{\mathrm{pd}}(D)$.
Example 2 
(Covariance matrix distortion constraint). For the system in (1), we assume that user 1 is described by an $\mathbb{R}^2$-valued time-invariant Markov source driven by an i.i.d. Gaussian noise process with parameters $(A^1, \Sigma_{w^1})$:
$$(A^1, \Sigma_{w^1}) = \left( \begin{bmatrix} 0.5 & 0.2 \\ 0.3 & 0.6 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right), \tag{54}$$
whereas user 2 is described by an $\mathbb{R}$-valued time-invariant Markov source driven by an i.i.d. Gaussian noise process with parameters $(A^2, \Sigma_{w^2})$:
$$(A^2, \Sigma_{w^2}) = (0.6, 2). \tag{55}$$
The augmented state-space model (2) is generated by $A = A^1 \oplus A^2$ and, similarly, $\Sigma_w = \Sigma_{w^1} \oplus \Sigma_{w^2}$. For this example, we assume a covariance matrix distortion constraint given by
$$D^{\mathrm{cov}} = \begin{bmatrix} 1.5 & \gamma & \gamma \\ \gamma & 1 & \gamma \\ \gamma & \gamma & 0.5 \end{bmatrix}, \tag{56}$$
where $\gamma > 0$ is the positive correlation coefficient between the distortion matrix components (i.e., the diagonal entries) and is chosen such that $D^{\mathrm{cov}} \succeq 0$.
In Figure 5 we demonstrate a comparison of $R^{LB}(D^{\mathrm{cov}})$ evaluated for several different values of $\gamma$. One interesting observation is that higher distortion correlation in (56) leads to fewer bits, up to a $\gamma^{\max} \approx 0.53$, beyond which the value of $R^{LB}(D^{\mathrm{cov}})$ remains unchanged. Another interesting observation is that for negative correlation $\gamma$, the approximation via SDP does not return a value. However, this is not the case in general (see, e.g., ([42], Example 1)).
Using the same simulation study, we can arrive at an interesting connection between the approximations in (51) and (48). In particular, if in (56) we restrict the matrix distortion constraint to only the main diagonal elements (i.e., exactly like the per-dimension constraints), then we obtain the plot of Figure 6, which clearly demonstrates that $R^{LB}(D^{\mathrm{cov}}) = R^{LB}_{\mathrm{pd}}(D)$. In fact, restricting the covariance matrix distortion constraint of (56) to the per-dimension distortion constraint is as if we optimize over a solution space in which $\gamma$ is allowed to take any value in $\mathbb{R}$. As a result, the feasible set of solutions is larger when the constraint set is subject to per-dimension distortion constraints rather than the covariance matrix distortion constraint.
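The admissible range of $\gamma$ can be verified numerically; a small sketch (our notation) that scans $\gamma$ for positive semidefiniteness of (56):

```python
import numpy as np

def dcov(gamma):
    """The covariance matrix distortion constraint of (56)."""
    D = np.full((3, 3), gamma)
    np.fill_diagonal(D, [1.5, 1.0, 0.5])
    return D

# Report the largest gamma on a grid keeping D_cov positive semidefinite.
grid = np.linspace(0.0, 1.0, 1001)
psd = [g for g in grid if np.linalg.eigvalsh(dcov(g)).min() >= -1e-9]
print(f"largest PSD gamma on the grid: {max(psd):.3f}")
```

Note that this checks feasibility of $D^{\mathrm{cov}} \succeq 0$ only; the saturation value $\gamma^{\max} \approx 0.53$ observed in Figure 5 concerns the behavior of the rate itself.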

2.2. Analytical Lower Bounds for Markov Sources Driven by Additive i.i.d. Noise Processes

In this subsection, we derive analytical lower bounds on (11)–(13) when the source model describing the behavior of user 1 or user 2 is driven by a possibly non-Gaussian i.i.d. noise process.
We first give a lemma that facilitates the derivation of our lower bounds. We only consider the case of RDFs subject to per-dimension distortion constraints, because the other classes of distortion constraints follow similarly.
Lemma 4 
(Rate-distortion bounds). For the augmented source model describing the behavior of users 1, 2 in (3), the following inequalities hold, assuming distortion constraints in the class of (8):
$$R^{LB}_{\mathrm{pd}}(D) \overset{(a)}{\le} R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D) \le R^{c}_{\mathrm{pd}}(D), \tag{57}$$
where
$$R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D) \triangleq \inf_{\substack{P^{\mathrm{linear}}(dy_t | y^{t-1}, x_t):\, t \in \mathbb{N}_0 \\ \Sigma^\Delta_{ii} \le D_{ii},\; i = 1,\ldots,p}} \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} I(x_t; y_t | y^{t-1}), \tag{58}$$
and (a) holds with equality if the augmented state-space model described in (2) is jointly Gaussian and the optimal minimizer $P^*(dy_t | y^{t-1}, x_t)$ of $R^{LB}_{\mathrm{pd}}(D)$ is conditionally Gaussian. Equality in (a) holds trivially at $D^{\max}$.
Proof. 
The RHS inequality follows from Theorem 1 and (43), whereas the LHS inequality follows from the fact that the constraint set of $R^{LB}_{\mathrm{pd}}(D)$ is larger than the constraint set of $R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D)$, which is restricted to linear coding policies. Now, under the specific augmented source model in (3), and using Lemma 2, (1), we obtain $R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D)$ defined as in (58), because these are the best linear coding policies, since the KF algorithm is the best linear causal MSE estimator beyond additive Gaussian noise processes (see the discussion in Remark 3, (3)). Clearly, if the augmented source in (3) is jointly Gaussian and the optimal realization of $R^{LB}_{\mathrm{pd}}(D)$ is conditionally Gaussian, then the system model is jointly Gaussian and the optimal policies are linear, given by the forward linear test channel realization obtained in (36); hence, the LHS inequality holds with equality. □
Remark 7 
(Comments on Lemma 4). We note that Lemma 4 also holds if we assume RDFs with distortion constraints in the class of (9) or (10).
The following theorem is a major result of this paper.
Theorem 3 
(Analytical lower bounds on (11)–(13)). Consider the source models of users 1, 2 in (1). Then, the following analytical lower bounds on (11)–(13) hold.
(1)
For $D_{ii} > 0$, $\forall i$, we obtain
$$R^{c}_{\mathrm{pd}}(D) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\, N(w)}{D} \right), \tag{59}$$
where $D \in \left( 0, \frac{(p_1+p_2)\, N(w)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right]$, with $N(w) = \frac{1}{2\pi e}\, 2^{\frac{2}{p_1+p_2} h(w)}$ and $h(w) > -\infty$.
(2)
For $D_T > 0$ and $D_{ii} > 0$, $\forall i$, we obtain
$$R^{c}_{\mathrm{joint}}(D^*) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\, N(w)}{D^*} \right), \tag{60}$$
where $D^* \in \left( 0, \frac{(p_1+p_2)\, N(w)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right]$, with $N(w)$ defined as in (1) and $h(w) > -\infty$.
(3)
For $D^{\mathrm{cov}} \succ 0$, we obtain
$$R^{c}(D^{\mathrm{cov}}) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{N(w)}{|D^{\mathrm{cov}}|^{\frac{1}{p_1+p_2}}} \right), \tag{61}$$
where $|D^{\mathrm{cov}}| \in \left( 0, \left( \frac{N(w)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right)^{p_1+p_2} \right]$, with $N(w)$ defined as in (1) and $h(w) > -\infty$.
Proof. 
See Appendix C. □
The following technical remarks can be made regarding Theorem 3.
Remark 8. 
(1)
Note that if in Theorem 3 we allow $h(w) = -\infty$, then the analytical lower bound expressions may take a negative finite value or $-\infty$, which cannot be the case (the RDF is, by definition, non-negative). A way to include the case where $h(w)$ is allowed to be $-\infty$ in our lower bound expressions is to set the objective functions in (59)–(61) to $[\log(\cdot)]^+$. This means that whenever $h(w) = -\infty$, the analytical lower bound expression will be zero.
(2)
The analytical lower bounds in (59)–(61) do not correspond to the best linear forward test channel realization of Lemma 3 (see (46)), which is also the optimal policy under the assumption of an MMSE decoder when the system's noise is purely Gaussian (see Remark 3, (3)). Moreover, it is not clear what realization achieves them in the same way the bounds in Lemma 3 are achieved for Gaussian processes.
(3)
If in Theorem 3 we assume that users 1, 2 have source models described by Markov processes driven by additive Gaussian noise processes, then from the EPI (see, e.g., ([43], Equation (7))), $N(w) = |\Sigma_w|^{\frac{1}{p_1+p_2}}$, and (59)–(61) change accordingly.
(4)
One could choose to further bound (61) using the inequality $|D^{\mathrm{cov}}|^{\frac{1}{p_1+p_2}} \le \frac{\mathrm{trace}(D^{\mathrm{cov}})}{p_1+p_2}$, obtaining a further lower bound that coincides with the lower bound in (59) (see also the discussion in Example 2). Such a lower bound would mean that we extend the set of feasible solutions corresponding to the initial problem statement (13) to be similar to that of the initial problem statement (11), which cannot be the case in general. Our bound in (61) encapsulates the off-diagonal elements of the distortion covariance matrix $D^{\mathrm{cov}}$; hence, it is an appropriate lower bound for this specific problem.
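For a quick numerical read of Theorem 3, the bound (59) and its Gaussian specialization from (3) above can be evaluated directly (a sketch in our notation):

```python
import numpy as np

def pd_lower_bound_bits(A, D, Nw):
    """Evaluate (59): ((p1+p2)/2) log2( |A^T A|^{1/(p1+p2)} + (p1+p2) N(w)/D )."""
    p = A.shape[0]
    g = np.linalg.det(A.T @ A) ** (1.0 / p)      # |A^T A|^{1/(p1+p2)}
    return 0.5 * p * np.log2(g + p * Nw / D)

def entropy_power_gaussian(Sw):
    """N(w) for Gaussian w (Remark 8, (3)): |Sigma_w|^{1/(p1+p2)}."""
    return np.linalg.det(Sw) ** (1.0 / Sw.shape[0])
```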
In what follows, we give a numerical simulation where we compare the solution of $R^{LB}_{\mathrm{joint}}(D^*)$ (which corresponds to the lower bound achieved by the optimal coding policies when the system is driven by additive i.i.d. Gaussian noise processes), computed via the SDP representation of (50), with the lower bound obtained in (60) when the system's noise is also Gaussian.
Example 3 
(Comparison of (50) with (60) for jointly Gaussian processes). We consider the same input data assumed in Example 1 for users 1, 2. Then, we proceed to compute the true lower bound of (50) and the lower bound obtained in (60).
Our simulation study in Figure 7 shows that at high rates the performance of the two bounds is almost identical, whereas at moderate and low rates we observe a gap that remains constant when $D_T \ge D$, i.e., when the asymptotically averaged total distortion constraint is inactive. The same behavior is expected for systems of larger dimension (larger-scale optimization problems), with a possibly increased gap at moderate and low rates, depending on the structure of the block diagonal matrices $A$ and $\Sigma_w$.
Next, we state a corollary of Theorem 3.
Corollary 2 
(Analytical bounds when users 1, 2 are not specified by the same additive noise process). Consider the source models of users 1, 2 in (1). Moreover, assume that $w_t^1 \sim N(0; \Sigma_{w^1})$ and $w_t^2 \sim (0; \Sigma_{w^2})$, with $\Sigma_{w^1} \preceq I_{p_1}$, $\Sigma_{w^2} \preceq I_{p_2}$ and $h(w^2) > -\infty$. Then, the following analytical lower bounds on (11)–(13) hold.
(1)
For $D_{ii} > 0$, $\forall i$, we obtain
$$R^{c}_{\mathrm{pd}}(D) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\, |\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{D} \right), \tag{62}$$
where $D \in \left( 0, \frac{(p_1+p_2)\, |\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right]$, with $N(w^2) = \frac{1}{2\pi e}\, 2^{\frac{2}{p_2} h(w^2)}$.
(2)
For $D_T > 0$ and $D_{ii} > 0$, $\forall i$, we obtain
$$R^{c}_{\mathrm{joint}}(D^*) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\, |\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{D^*} \right), \tag{63}$$
where $D^* \in \left( 0, \frac{(p_1+p_2)\, |\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right]$.
(3)
For $D^{\mathrm{cov}} \succ 0$, we obtain
$$R^{c}(D^{\mathrm{cov}}) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{|\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{|D^{\mathrm{cov}}|^{\frac{1}{p_1+p_2}}} \right), \tag{64}$$
where $|D^{\mathrm{cov}}| \in \left( 0, \left( \frac{|\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right)^{p_1+p_2} \right]$.
Proof. 
All cases (1)–(3) follow almost identical steps to the derivation of Theorem 3. The only different but crucial step lies in (A6), where we use the fact that
$$|\Sigma_w|^{\frac{1}{p_1+p_2}} \overset{(a)}{=} |\Sigma_{w^1}|^{\frac{1}{p_1+p_2}}\, |\Sigma_{w^2}|^{\frac{1}{p_1+p_2}} \overset{(b)}{\ge} |\Sigma_{w^1}|^{\frac{1}{p_1}}\, |\Sigma_{w^2}|^{\frac{1}{p_2}} \overset{(c)}{\ge} |\Sigma_{w^1}|^{\frac{1}{p_1}}\, N(w^2), \tag{65}$$
where (a) follows from properties of block diagonal matrices ([33], Section 0.9.2); (b) follows from the conditions of the corollary on the noise covariance matrices; (c) follows from the EPI ([43], Equation (7)). □
One can deduce the following for Corollary 2.
Remark 9. 
Corollary 2 gives similar analytical lower bounds (with appropriate modifications) if, instead of user 1, we assume that the source model of user 2 is driven by a Gaussian noise process. The additional assumption on the covariance matrices of the noise processes of both users is imposed because otherwise we cannot guarantee that the key series of inequalities in (65) is satisfied.

3. Upper Bounds

In this section, we explain how to encode the augmented vector-valued Markov source modeled by (3) using a sequential causal DPCM scheme with a feedback loop followed by an ECDQ. The scheme relies on the linear forward test channel realization of the bounds in Lemma 2. The precursor of the DPCM-based scheme with a feedback loop is [14], whereas ECDQ is a classical source coding approach with standard performance guarantees in information theory (see, e.g., [44]). The ECDQ scheme is utilized to bound the rate performance of the DPCM scheme. This approach furnishes an achievable (upper) bound on the operational causal RDFs in (11)–(13).

3.1. DPCM with Feedback Loop

First, we briefly describe the sequential causal DPCM scheme with a feedback loop introduced in ([14], Figure 2) (see also [45]). Observe that, because the augmented source is modeled as a first-order multidimensional Markov process, sequential causal coding is precisely equivalent to a predictive coding paradigm (see, e.g., [14,46]).
At each time instant $t$, the encoder (or innovations encoder) performs the linear operation
$$\hat{x}_t = x_t - A y_{t-1}, \tag{66}$$
where at $t = 0$ we assume initial data $\hat{x}_0 = x_0$, and $y_{t-1} \triangleq E\{x_{t-1} | m^{t-1}\}$, i.e., an estimate of $x_{t-1}$ given the previous quantized symbols $m^{t-1}$. (Note that the process $\hat{x}_t$ has a temporal correlation, since it subtracts the error of $x_t$ given all previous quantized symbols $m^{t-1}$ and not the infinite past of the source. Hence, $\hat{x}_t$ is only an estimate of the true process, and this causes part of the sub-optimality of this scheme.) Then, by means of an $\mathbb{R}^{p_1+p_2}$-valued MMSE quantizer operating at rate $R_t$, we generate the quantized reconstruction $\hat{y}_t$ of the residual source $\hat{x}_t$, given by $\hat{y}_t = y_t - A y_{t-1}$. Then, we send $m_t$ (the data packet corresponding to $\hat{y}_t$) over the channel. At the decoder, we receive $m_t$ and recover the quantized symbol $\hat{y}_t$ of $\hat{x}_t$.
Then, we generate the estimate y t using the linear operation
$$y_t = \hat{y}_t + A y_{t-1}. \tag{67}$$
Combining (66) and (67), we obtain
$$x_t - y_t = \hat{x}_t - \hat{y}_t. \tag{68}$$
From (68), we immediately deduce that the error between $x_t$ and $y_t$ equals the quantization error between $\hat{x}_t$ and $\hat{y}_t$, which means that the MSE distortion at each time instant satisfies
$$E\{||x_t - y_t||_2^2\} = E\{||\hat{x}_t - \hat{y}_t||_2^2\}. \tag{69}$$
In addition, the covariance matrix $\Sigma^\Delta$ yields
$$E\{(x_t - y_t)(x_t - y_t)^T\} = E\{(\hat{x}_t - \hat{y}_t)(\hat{x}_t - \hat{y}_t)^T\}. \tag{70}$$
A pictorial view of the DPCM scheme with feedback loop is given in Figure 8.
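One pass of the DPCM loop (66)–(67) can be summarized as follows (a Python sketch in our notation; `quantize` stands in for the $\mathbb{R}^{p_1+p_2}$-valued MMSE quantizer):

```python
import numpy as np

def dpcm_step(x_t, y_prev, A, quantize):
    """One DPCM step: form the residual (66), quantize it, reconstruct via (67)."""
    x_hat = x_t - A @ y_prev          # residual (66)
    y_hat = quantize(x_hat)           # quantized residual (sent as packet m_t)
    y_t = y_hat + A @ y_prev          # reconstruction (67)
    # Property (68): the overall error equals the quantization error.
    assert np.allclose(x_t - y_t, x_hat - y_hat)
    return y_t
```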

3.2. Bounding (11)–(13) via a DPCM-Based ECDQ for Gaussian Noise Processes

In this subsection, we bound the rate performance of the DPCM scheme described in Section 3.1 in the infinite time horizon, using a scheme that utilizes the steady-state linear forward test-channel realization achieving the lower bounds of Lemma 3. Essentially, in this scheme we replace the quantization noise with an additive Gaussian noise with the same second moments (see, e.g., [47] or ([44], Chapter 5) and the references therein).
Recall that the steady-state linear forward test-channel realization of the lower bounds in Lemma 3 is written as follows:
$$y_t = H x_t + (I_{p_1+p_2} - H) A y_{t-1} + v_t, \tag{71}$$
whereas the steady-state reverse-waterfilling parameters $(H, \Sigma_v)$ are given by
$$H \triangleq I_{p_1+p_2} - \Delta \Lambda^{-1}, \qquad \Sigma_v = H \Delta \succeq 0. \tag{72}$$
The forward test-channel realization of (71) is illustrated in Figure 9.
Before we proceed, we point out the following important technical remarks on the realization of (71) and the coefficients (72).
Remark 10 
(Observations (71) and (72)). The linear forward test channel realization with additive noise in (71) is equivalent to the steady-state realization in (46) because for both it can be shown that the MSE distortion constraint is achieved (i.e., v t N ( 0 ; Σ v ) , Σ v = H Δ = Δ H T 0 ). Moreover, this realization is equivalent but simpler to build compared to the forward test channel realization introduced in [14] in which non-singular matrices and diagonalization by congruence is assumed (see ([14], Theorem 4)).
In the test channel realization of Figure 9, a reverse-waterfilling in the spatial dimension is possible when we assume asymptotically averaged total MSE distortion constraints similar to ([14], Theorem 4). This reverse-waterfilling is dictated by the rank of the matrix H: if H is full rank, then all spatial dimensions in the system are active, whereas if H is rank deficient, then some dimensions are inactive (these dimensions form the null space of H, whose dimension is the nullity of H); for these dimensions the rate is zero, hence they can be excluded from the realization of Figure 9. In the sequel, we present simulations that study the reverse-waterfilling in the spatial domain under a certain distortion constraint studied in this paper.
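As a quick numerical illustration of (71) and (72), the following sketch builds $(H, \Sigma_v)$ from a given pair $(\Delta, \Lambda)$, checks the symmetry and positive semidefiniteness of $\Sigma_v$ noted in Remark 10, and reports $\mathrm{rank}(H)$, which counts the active spatial dimensions. The matrix values are illustrative assumptions, not data from the examples of the paper.

```python
import numpy as np

p = 3
A = 0.5 * np.eye(p)                          # stable augmented dynamics (assumption)
Sigma_w = np.eye(p)                          # driving-noise covariance (assumption)
Delta = 0.4 * np.eye(p)                      # steady-state error covariance (assumption)
Lam = A @ Delta @ A.T + Sigma_w              # Lambda = A Delta A^T + Sigma_w

H = np.eye(p) - Delta @ np.linalg.inv(Lam)   # (72): H = I - Delta Lambda^{-1}
Sigma_v = H @ Delta                          # (72): Sigma_v = H Delta

# Sigma_v should coincide with Delta H^T and be positive semidefinite (cf. Remark 10)
print(np.allclose(Sigma_v, Delta @ H.T))                              # True
print(np.all(np.linalg.eigvalsh((Sigma_v + Sigma_v.T) / 2) >= -1e-12))  # True
print(np.linalg.matrix_rank(H))              # number of active spatial dimensions
```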
Pre/post filtered ECDQ with multiplicative factors for augmented multivariate Gauss–Markov sources and spatial reverse-waterfilling. First, we consider a $\mathrm{rank}(H)$-dimensional lattice quantizer $Q^{\mathrm{rank}(H)}(\cdot)$ [48] such that
$$\mathbf{E}\{z_t z_t^T\} = \Sigma_{v^c}, \quad \Sigma_{v^c} \succ 0, \quad (73)$$
where $z_t \in \mathbb{R}^{\mathrm{rank}(H)}$ is a random dither vector generated both at the encoder and the decoder, independent of the input signals $\hat{x}_t$ and of the previous realizations of the dither, and uniformly distributed over the basic Voronoi cell of the $\mathrm{rank}(H)$-dimensional lattice quantizer $Q^{\mathrm{rank}(H)}(\cdot)$, such that $v_t^c \sim \mathrm{Unif}(0; \Sigma_{v_t^c})$. At the encoder, the lattice quantizer quantizes $H\hat{x}_t + z_t$, that is, it forms $Q^{\mathrm{rank}(H)}(H\hat{x}_t + z_t)$, where $\hat{x}_t$ is given by (66). Then, the encoder applies conditional entropy coding to the output of the quantizer and transmits the output of the entropy coder. At the decoder, the coded bits are received and the output of the quantizer is reconstructed, i.e., $Q^{\mathrm{rank}(H)}(H\hat{x}_t + z_t)$. The decoder then generates an estimate by subtracting $z_t$ from the quantizer's output and multiplying the result by $I_{\mathrm{rank}(H)}$ ($I_{\mathrm{rank}(H)}$ denotes the identity matrix with dimensions according to the rank of H; this identity matrix could be omitted, but we include it here for completeness) as follows:
$$y_t = I_{\mathrm{rank}(H)}\left(Q^{\mathrm{rank}(H)}(H\hat{x}_t + z_t) - z_t\right).$$
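The following sketch illustrates the dither mechanism with a scalar (cubic) lattice applied element-wise, a simple stand-in for the $\mathrm{rank}(H)$-dimensional lattice quantizer above; it checks the standard ECDQ property that the effective coding noise $y_t - H\hat{x}_t$ is uniform over the basic Voronoi cell with the expected second moments. The matrix $H$ and the cell size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 2                 # rank(H) (illustrative)
step = 1.0            # cubic-lattice cell edge (Z^m lattice as a simple stand-in)

def lattice_quantize(u):
    """Nearest-point quantizer of the scaled integer lattice, applied element-wise."""
    return np.round(u / step) * step

T = 100_000
x_hat = rng.normal(size=(T, m))                    # residual source (assumption)
H = np.array([[1.0, 0.2], [0.0, 0.5]])             # illustrative channel matrix
z = rng.uniform(-step / 2, step / 2, size=(T, m))  # dither, uniform on the Voronoi cell

y = lattice_quantize(x_hat @ H.T + z) - z          # ECDQ encode/decode step
noise = y - x_hat @ H.T                            # effective coding noise v^c

# v^c is uniform over the cell: zero mean, covariance (step^2 / 12) * I
print(noise.mean(axis=0))                          # ~ [0, 0]
print(np.cov(noise.T))                             # ~ diag(1/12, 1/12)
```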
Performance. The coding rate at each time instant, given by the conditional entropy of the MMSE quantizer, satisfies
$$\begin{aligned}
H\left(Q^{\mathrm{rank}(H)} \,\middle|\, z_t\right) &= I(H\hat{x}_t; H\hat{x}_t + v_t^c) \\
&\stackrel{(a)}{=} I(H\hat{x}_t; H\hat{x}_t + v_t) + D(v_t^c \,\|\, v_t) - D(H\hat{x}_t + v_t^c \,\|\, H\hat{x}_t + v_t) \\
&\stackrel{(b)}{\leq} I(H\hat{x}_t; H\hat{x}_t + v_t) + D(v_t^c \,\|\, v_t) \\
&\stackrel{(c)}{\leq} I(H\hat{x}_t; H\hat{x}_t + v_t) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right) \\
&\stackrel{(d)}{\leq} I(x_t; y_t \,|\, y^{t-1}) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right), \quad (74)
\end{aligned}$$
where $v_t^c \in \mathbb{R}^{\mathrm{rank}(H)}$ is the (uniform) coding noise in the ECDQ scheme and $v_t$ is the corresponding Gaussian counterpart; $(a)$ follows because the two random vectors $v_t^c, v_t$ have the same second moments, hence we can use the identity $D(x' \| x) = h(x) - h(x')$ (for $x$ Gaussian with the same second moments as $x'$); $(b)$ follows because $D(H\hat{x}_t + v_t^c \,\|\, H\hat{x}_t + v_t) \geq 0$; $(c)$ follows because the divergence of the coding noise from Gaussianity is less than or equal to $\frac{\mathrm{rank}(H)}{2}\log(2\pi e\, G_{\mathrm{rank}(H)})$ [47], where $G_{\mathrm{rank}(H)}$ is the dimensionless normalized second moment of the lattice ([44], Definition 3.2.2); $(d)$ follows from data processing properties, namely, $I(x_t; y_t | y^{t-1}) \stackrel{(*)}{=} I(x_t; y_t | y_{t-1}) \stackrel{(**)}{=} I(\hat{x}_t; \hat{y}_t) \stackrel{(***)}{\geq} I(H\hat{x}_t; H\hat{x}_t + v_t)$, where $(*)$ follows from the realization of (71), $(**)$ follows from the fact that $\hat{x}_t$ and $\hat{y}_t$ (obtained by (67)) are independent of $y^{t-1}$, and $(***)$ is a consequence of the data processing inequality since $(H\hat{x}_t + v_t) \leftrightarrow \hat{x}_t \leftrightarrow H\hat{x}_t$ forms a Markov chain. Under the assumption that the clocks of the entropy encoder and entropy decoder in the ECDQ scheme are synchronized, the total coding rate is obtained as follows:
$$\sum_{t=0}^{n} R_t \leq \sum_{t=0}^{n} H\left(Q^{\mathrm{rank}(H)} \,\middle|\, z_t\right) \stackrel{(e)}{\leq} \sum_{t=0}^{n} I(x_t; y_t \,|\, y^{t-1}) + (n+1)\frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right) \stackrel{(f)}{=} \frac{1}{2}\sum_{t=0}^{n}\log_2\frac{|\Lambda_t|}{|\Delta_t|} + (n+1)\frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right), \quad (75)$$
where ( e ) follows from (74); ( f ) follows from the derivation of Lemma 2.
The previous analysis yields the following theorem.
Theorem 4 
(Achievability bound on (11)–(13)). Suppose that $\Delta_t = \Delta$, $\forall t$, and assume that the source models (1) of users 1, 2 are driven by Gaussian noise processes. Then, the augmented state space source model in (3) ensures the following achievability bounds on (11)–(13):
$$R^{c}_{\mathrm{pd}}(D) \leq R^{LB}_{\mathrm{pd}}(D) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right), \quad (76)$$
$$R^{c}_{\mathrm{joint}}(D^*) \leq R^{LB}_{\mathrm{joint}}(D^*) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right), \quad (77)$$
$$R^{c}(D_{\mathrm{cov}}) \leq R^{LB}(D_{\mathrm{cov}}) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right). \quad (78)$$
Proof. 
Under the conditions of the theorem and the ECDQ scheme that leads to (75), the additional RHS terms in (76)–(78) are all constants. Then, taking the limit on both sides of (76)–(78) and applying the appropriate infimization (minimization) over the constraint sets, the result follows. □
We wish to point out the following for Theorem 4.
Remark 11 
(Comments on Theorem 4).
(1) 
The ECDQ that leads to (75) is not the same as the standard symmetric ECDQ scheme for scalar-valued processes, in which the coefficient H is split into pre- and post-scalings of the additive noise channel that tune the MSE distortion (see, e.g., [44,47]). In our pre/post scaled ECDQ scheme, we take asymmetric coefficients based on the realization of Figure 9. This leads to a coarser lattice than the one used for the unscaled ECDQ (for details see, for instance, [44]).
(2) 
Since the upper bounds essentially rely on the corresponding lower bounds for all of (76)–(78), similar observations can be made. For instance, if $D < D_T$, then (77) recovers (76), i.e., the asymptotically averaged total distortion constraint is inactive. Moreover, we cannot claim tightness of the achievability bound in (78), because the corresponding lower bound is already non-tight.
Next, we give an example in which we compare, for various distortion levels, the RL gap between the achievability bound obtained in Theorem 4 and the lower bound obtained in Theorem 2 for the operational causal RDF with joint distortion constraints.
Example 4 
(RL gap of achievability and lower bounds). In this example, we plot lower and upper bounds on the operational causal RDF subject to joint distortion constraints using the bounds derived in (50) and (77), respectively. We first consider the same input data assumed in Example 1 for users 1, 2, and compute the lower bound via (50) and the achievability bound via (77). After the first numerical study, we consider another one in which we only change the Gaussian noise covariances for users 1, 2, as follows:
$$(\Sigma_{w^1}, \Sigma_{w^2}) = \left( \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}, \begin{bmatrix} 1.4039 & 0.6034 & 0.5165 \\ 0.6034 & 0.9563 & 0.7682 \\ 0.5165 & 0.7682 & 0.6620 \end{bmatrix} \right). \quad (79)$$
Using the data of Example 1, and the same $D_T$ and $D_{ii}$, $i = 1, 2, 3, 4, 5$, we obtain the plots of Figure 10. For this study, we have used a Schläfli lattice $\tilde{D}_5$ (for details on this lattice see, e.g., [48]) with a dimensionless normalized second moment $G_5 \approx 0.0756$. In this example, H is always full rank and the RL gap is constant at 0.9218 bits/augmented vector.
For the second study, we obtain the plots of Figure 11. Here, for the full rank case we have used the dimensionless normalized second moment of a Schläfli lattice $\tilde{D}_5$, and for the rank deficient cases a Schläfli lattice $D_4$ with a dimensionless normalized second moment $G_4 \approx 0.0766$. Similar to the first study, when H is full rank the RL gap is 0.9218 bits/augmented vector, whereas when H is rank deficient the RL gap is 0.7754 bits/augmented vector.
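The constant rate-loss terms quoted in Example 4 follow directly from the normalized second moments of the lattices; a two-line Python check:

```python
import numpy as np

def rate_loss(rank_H, G):
    """rank(H)/2 * log2(2*pi*e*G), in bits per augmented vector."""
    return rank_H / 2 * np.log2(2 * np.pi * np.e * G)

print(rate_loss(5, 0.0756))   # ~0.9218 bits, full-rank case (Schlaefli lattice D5~)
print(rate_loss(4, 0.0766))   # ~0.7754 bits, rank-deficient case (lattice D4)
```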

3.3. Bounding (11)–(13) via a DPCM-based ECDQ for Non-Gaussian Noise Processes

Similar to Lemma 4, where linear policies are the benchmark used to derive lower bounds on (11)–(13), in this subsection we derive upper bounds on (11)–(13) using the linear test channel realization of Figure 9 and the DPCM-based ECDQ scheme of Section 3.1 and Section 3.2.
Next, we state the following theorem.
Theorem 5 
(Achievability bound on (11)–(13) for additive non-Gaussian noise processes). Suppose that $\Delta_t = \Delta$, $\forall t$, and assume that the source models (1) of users 1, 2 are driven by non-Gaussian noise processes. Then, the augmented state space source model in (3) ensures the following achievability bounds on (11)–(13):
$$R^{c}_{\mathrm{pd}}(D) \leq R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right) + D(\hat{x} \,\|\, \hat{x}^G), \quad (80)$$
$$R^{c}_{\mathrm{joint}}(D^*) \leq R^{LB,\mathrm{linear}}_{\mathrm{joint}}(D^*) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right) + D(\hat{x} \,\|\, \hat{x}^G), \quad (81)$$
$$R^{c}(D_{\mathrm{cov}}) \leq R^{LB,\mathrm{linear}}(D_{\mathrm{cov}}) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right) + D(\hat{x} \,\|\, \hat{x}^G), \quad (82)$$
where $D(\hat{x} \,\|\, \hat{x}^G)$ is the KL divergence between the residual source $\hat{x}_t$ under linear policies and the Gaussian residual source $\hat{x}_t^G \sim \mathcal{N}(0; \Lambda)$ in Figure 9.
Proof. 
We only prove (80), because (81) and (82) follow similarly; in places we only sketch the derivation, as it is clear from the previous results. From Lemmas 4 and 3, we can easily obtain the following lower bound (similar to the SLB [32]):
$$R^{c}_{\mathrm{pd}}(D) \geq R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D) - D(\hat{x} \,\|\, \hat{x}^G), \quad (83)$$
where $D(\hat{x} \,\|\, \hat{x}^G) \geq 0$ is the discrepancy between the residual source $\hat{x}_t$ under linear policies and the optimal Gaussian residual source $\hat{x}_t^G \sim \mathcal{N}(0; \Lambda)$. From (83), we obtain
$$R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D) \leq R^{c}_{\mathrm{pd}}(D) + D(\hat{x} \,\|\, \hat{x}^G). \quad (84)$$
Then, applying the DPCM-based ECDQ scheme based on the linear forward test channel realization of Figure 9, as discussed in Section 3.1 and Section 3.2, we obtain (76), with $R^{LB}_{\mathrm{pd}}(D)$ replaced by $R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D)$ because the coding scheme is built under the assumption of linear policies. This completes the derivation. □
Remark 12 
(Comments on Theorem 5). Clearly, Theorem 5 is a generalization of Theorem 4 under the assumption of the linear realization of Figure 9, with systems driven by additive i.i.d. non-Gaussian noise processes. If in (80)–(82) we assume that the system is driven by an additive i.i.d. Gaussian noise process, then clearly $D(\hat{x} \,\|\, \hat{x}^G) = 0$ and Theorem 5 recovers Theorem 4.
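To give a feel for the size of the extra term $D(\hat{x} \,\|\, \hat{x}^G)$ in Theorem 5, the sketch below evaluates it in closed form for a scalar residual with a Laplace distribution whose second moment matches that of its Gaussian counterpart, using the matched-moments identity $D(\hat{x} \,\|\, \hat{x}^G) = h(\hat{x}^G) - h(\hat{x})$ also used in step (a) of (74); the Laplace choice is purely illustrative and not from the paper.

```python
import numpy as np

# Scalar residual with variance sigma2; the Gaussian counterpart has the same variance.
sigma2 = 1.0
h_gauss = 0.5 * np.log2(2 * np.pi * np.e * sigma2)   # Gaussian differential entropy, bits
b = np.sqrt(sigma2 / 2)                              # Laplace scale giving variance sigma2
h_laplace = np.log2(2 * np.e * b)                    # Laplace differential entropy, bits

kl_bits = h_gauss - h_laplace                        # D(x_hat || x_hat^G), matched moments
print(kl_bits)                                       # ~0.104 bits
```

In this illustrative case, the non-Gaussianity penalty is roughly a tenth of a bit per scalar residual sample.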

4. Conclusions and Future Research

In this paper, we derived bounds on the OPTA of a two-user MIMO causal encoding and causal decoding problem (assuming the clocks of the encoder and the decoder are synchronized). In our setup, each user is described by a multivariate Markov source driven by an additive i.i.d. (possibly non-Gaussian) noise process, subject to three classes of spatio-temporal distortion constraints.
Although not directly pursued in this paper, all the results can be easily generalized to any finite number of users in Figure 1. Moreover, as future research we aim to study the case of separate encoding for each user, which would be a generalized version of a multi-user (distributed) source coding setup. Finally, because our bounds are computed via general-purpose SDP solvers and therefore offer limited structural insight, it makes sense to consider more specific setups and try to solve them using KKT conditions, and then identify structural properties of the matrices $(A, \Sigma_w)$ for which the KKT conditions can be solved optimally.

Author Contributions

Conceptualization, P.A.S., J.Ø. and M.S.; methodology, P.A.S., J.Ø. and M.S.; software, P.A.S.; validation, P.A.S., J.Ø. and M.S.; formal analysis, P.A.S.; investigation, P.A.S.; resources, M.S.; data curation, P.A.S.; writing—original draft preparation, P.A.S.; writing—review and editing, P.A.S., J.Ø. and M.S.; visualization, P.A.S.; supervision, M.S.; project administration, M.S.; funding acquisition, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the KAW Foundation and the Swedish Foundation for Strategic Research.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Notation

The following notation is used in this manuscript:
$\mathbb{R}$: the set of real numbers
$\mathbb{Z}$: the set of integers
$\mathbb{N}_0$: the set of natural numbers including zero
$\mathbb{N}_0^n$: the set $\{0, \ldots, n\}$, where $n \in \mathbb{N}_0$
$x$: random variable (RV)
$\mathcal{X}$: alphabet of the random variable $x$
$\mathbf{x}_r^t$: the sequence of random variables $(x_r, x_{r+1}, \ldots, x_t)$, $(r,t) \in \mathbb{Z} \times \mathbb{Z}$, $r \leq t$
$x_r^t$: sequence of realizations of the random variables, where $x_r^t \in \mathcal{X}_r^t$
$\mathcal{X}_r^t$: $\times_{k=r}^{t} \mathcal{X}_k$ with $\mathcal{X}_t = \mathcal{X}$
$\mathbf{P}(dx)$: the probability distribution of the RV $x$ on $\mathcal{X}$
$\mathbf{P}(dy|x)$: the conditional distribution of the RV $y$ given $x = x$
$\otimes$: compound product
$\oplus$: direct sum
$x \in \mathbb{R}^{p \times 1}$: column vector
$x^T \in \mathbb{R}^{1 \times p}$: row vector
$K \in \mathbb{R}^{p \times p}$: square real matrix
$K^T \in \mathbb{R}^{p \times p}$: transpose of a square real matrix
$K_{ii}$: diagonal elements of matrix $K$
$|K|$: determinant of $K$
$\mathrm{rank}(K)$: rank of $K$
$\mathrm{trace}(K)$: trace of $K$
$\mu_{K,i}$: the $i$th eigenvalue of matrix $K$
$\Sigma_x$: the covariance of a random vector $x$
$\Sigma_x \succ 0$: positive definite covariance matrix $\Sigma_x$
$\Sigma_x \succeq 0$: positive semidefinite covariance matrix $\Sigma_x$
$\Sigma_x \succeq \Sigma_{x'}$: $\Sigma_x - \Sigma_{x'}$ is positive semidefinite
$\Sigma_x \succ \Sigma_{x'}$: $\Sigma_x - \Sigma_{x'}$ is positive definite
$0$: null matrix
$I_p$: identity matrix of dimension $p$
$H(\cdot)$: discrete entropy
$h(\cdot)$: differential entropy
$D(x \| x')$: KL divergence of probability distribution $\mathbf{P}(x)$ with respect to probability distribution $\mathbf{P}(x')$
$x \sim \mathcal{N}(0; \Sigma)$: Gaussian random vector $x$ with zero mean and covariance $\Sigma$
$x \sim \mathrm{Unif}(0; \Sigma)$: uniformly distributed random vector $x$ with zero mean and covariance $\Sigma$
$h_G(\cdot)$: Gaussian differential entropy
$R_G(\cdot)$: Gaussian information RDF
$N(x)$: the entropy power of a random vector $x$
$\|\cdot\|_2$: Euclidean norm
$\mathbf{E}\{\cdot\}$: expectation operator
$[\cdot]^+$: $\max\{0, \cdot\}$
$A \leftrightarrow B \leftrightarrow C$: $A, B, C$ form a Markov chain

Abbreviations

The following abbreviations are used in this manuscript:
OPTA: Optimal Performance Theoretically Attainable
RDF: Rate distortion function
DPCM: Differential pulse coded modulation
ECDQ: Entropy coded dithered quantization
MSE: Mean-squared error
MMSE: Minimum MSE
RHS: Right-hand side
LHS: Left-hand side
i.i.d.: Independent identically distributed
a.s.: Almost surely
KKT: Karush–Kuhn–Tucker
LMI: Linear matrix inequality
SDP: Semidefinite programming
KF: Kalman filter
EP: Entropy power
EPI: Entropy power inequalities
RL: Rate loss
SLB: Shannon lower bound

Appendix A

Proof of Lemma 3.
We only prove (1), as both (2) and (3) follow similarly. First, by the assumptions of the theorem, $\Delta \triangleq \frac{1}{n+1}\sum_{t=0}^{n}\Delta_t$ for some finite $n$ with $\Delta \succ 0$ (a sufficient condition for existence of a finite solution), and $B \triangleq A^T \Sigma_w^{-1} A \succeq 0$. Moreover,
$$\begin{aligned}
\frac{1}{n+1}\sum_{t=0}^{n} R_t &\stackrel{(a)}{\geq} \frac{1}{n+1}\sum_{t=0}^{n} I(x_t; y_t \,|\, y^{t-1}) \\
&\stackrel{(b)}{=} \frac{1}{n+1}\,\frac{1}{2}\sum_{t=0}^{n} \log\frac{|\Lambda_t|}{|\Delta_t|} \\
&\stackrel{(c)}{=} \frac{1}{n+1}\left(\frac{1}{2}\log|\Lambda_0| - \frac{1}{2}\log|\Lambda_{n+1}| + \frac{1}{2}\sum_{t=0}^{n}\log\frac{|\Lambda_{t+1}|}{|\Delta_t|}\right) \\
&\stackrel{(d)}{=} \frac{1}{n+1}\left(\frac{1}{2}\log|\Lambda_0| - \frac{1}{2}\log|\Lambda_{n+1}| + \frac{1}{2}\sum_{t=0}^{n}\left(\log|\Sigma_w| + \log|\Delta_t^{-1} + B|\right)\right) \\
&\stackrel{(e)}{\geq} \frac{1}{n+1}\left(\frac{1}{2}\log|\Lambda_0| - \frac{1}{2}\log\left(\frac{\mathrm{trace}(A^TA)\,\mathrm{trace}(\hat{D}) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p}\right) + \frac{1}{2(n+1)}\sum_{t=0}^{n}\left(\log|\Sigma_w| + \log|\Delta_t^{-1} + B|\right) \\
&\stackrel{(f)}{\geq} \frac{1}{n+1}\left(\frac{1}{2}\log|\Lambda_0| - \frac{1}{2}\log\left(\frac{\mathrm{trace}(A^TA)\,\mathrm{trace}(\hat{D}) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p}\right) + \frac{1}{2}\log|\Sigma_w| + \frac{1}{2}\log|\Delta^{-1} + B| \\
&\stackrel{(g)}{=} \frac{1}{n+1}\left(\frac{1}{2}\log|\Lambda_0| - \frac{1}{2}\log\left(\frac{\mathrm{trace}(A^TA)\,\mathrm{trace}(\hat{D}) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p}\right) + \frac{1}{2}\log\frac{|\Lambda|}{|\Delta|},
\end{aligned} \quad \mathrm{(A1)}$$
where $p \triangleq p_1 + p_2$,
and where $(a)$ follows from Theorem 1 (see also Remark 2); $(b)$ follows from Lemma 1, (1), because $h_G(x_t | y^{t-1}) = \frac{1}{2}\log\left((2\pi e)^{p_1+p_2}|\Lambda_t|\right)$ and $h_G(x_t | y^t) = \frac{1}{2}\log\left((2\pi e)^{p_1+p_2}|\Delta_t|\right)$; $(c)$ follows by reformulating the additive objective; $(d)$ follows because $|\Lambda_{t+1}||\Delta_t^{-1}| = |A\Delta_t A^T + \Sigma_w||\Delta_t^{-1}| = |\Sigma_w(\Sigma_w^{-1}A\Delta_t A^T + I_{p_1+p_2})||\Delta_t^{-1}| \stackrel{(d_1)}{=} |\Sigma_w(A^T\Sigma_w^{-1}A\Delta_t + I_{p_1+p_2})||\Delta_t^{-1}| \stackrel{(d_2)}{=} |\Sigma_w||B + \Delta_t^{-1}|$, where $(d_1)$ follows from the Weinstein–Aronszajn identity ([49], Corollary 18.1.2) and $(d_2)$ from standard determinant properties of square matrices of the same size; $(e)$ follows from the inequalities
$$|\Lambda_{n+1}| \stackrel{(e_1)}{\leq} \left(\frac{\mathrm{trace}(A\Delta_n A^T + \Sigma_w)}{p}\right)^{p} \stackrel{(e_2)}{=} \left(\frac{\mathrm{trace}(A\Delta_n A^T) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p} \stackrel{(e_3)}{\leq} \left(\frac{\mathrm{trace}(A^TA)\,\mathrm{trace}(\Delta_n) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p} \stackrel{(e_4)}{\leq} \left(\frac{\mathrm{trace}(A^TA)\,\mathrm{trace}(\hat{D}) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p},$$
where $(e_1)$ follows because $|K| \leq \left(\frac{\mathrm{trace}(K)}{p}\right)^p$ for $K \succeq 0$; $(e_2)$ follows from ([18], Ex. 12.14); $(e_3)$ follows from the cyclic property of the trace and ([18], Ex. 12.14); $(e_4)$ follows because $\mathrm{trace}(\Delta_n) \leq \mathrm{trace}(\hat{D})$ (by definition); $(f)$ follows because the term $\log|\Delta_t^{-1} + B|$ is convex with respect to $\Delta_t$ for $B \succeq 0$ (see, e.g., [50]), hence we can apply Jensen's inequality ([19], Theorem 2.6.2); $(g)$ follows because $\frac{1}{2}\log|\Sigma_w| + \frac{1}{2}\log|\Delta^{-1} + B|$ can be rearranged to $\frac{1}{2}\log\frac{|\Lambda|}{|\Delta|}$, where $\Lambda = A\Delta A^T + \Sigma_w$. Taking the limit on both sides of (A1), we observe that the first RHS term vanishes asymptotically. Finally, applying the appropriate infimization constraints on both sides of the limiting objective functions in (A1), and since we assume sufficient conditions for existence of a solution (so that the infimum of the RHS term is in fact a minimum), the result follows. □
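The determinant manipulation in step (d), based on the Weinstein–Aronszajn identity, is easy to verify numerically; a minimal Python sketch with arbitrary matrix values (not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
A = 0.3 * rng.standard_normal((p, p))             # arbitrary dynamics matrix
M = rng.standard_normal((p, p)); Sigma_w = M @ M.T + np.eye(p)   # PD noise covariance
N = rng.standard_normal((p, p)); Delta_t = N @ N.T + np.eye(p)   # PD error covariance

B = A.T @ np.linalg.inv(Sigma_w) @ A
lhs = np.linalg.det(A @ Delta_t @ A.T + Sigma_w) * np.linalg.det(np.linalg.inv(Delta_t))
rhs = np.linalg.det(Sigma_w) * np.linalg.det(B + np.linalg.inv(Delta_t))
print(np.isclose(lhs, rhs))   # True: |Lambda_{t+1}| |Delta_t^{-1}| = |Sigma_w| |B + Delta_t^{-1}|
```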

Appendix B

Proof of Theorem 2.
Similar to the proof of Lemma 3, we only prove one case, i.e., (1), as the remaining cases can be shown in exactly the same way.
By invoking the matrix determinant lemma ([49], Theorem 18.1.1) in $R^{LB}_{\mathrm{pd}}(D)$, we obtain
$$R^{na}_{\mathrm{pd}}(D) = \min_{\substack{0 \prec \Delta \preceq \Lambda \\ \Delta_{ii} \leq D_{ii},\; i=1,\ldots,p}} \;\frac{1}{2}\log|\Sigma_w| - \frac{1}{2}\log\left|\left(\Delta^{-1} + A^T\Sigma_w^{-1}A\right)^{-1}\right|. \quad \mathrm{(A2)}$$
Next, we introduce a decision variable $Q$ via $Q^{-1} \triangleq \Delta^{-1} + A^T\Sigma_w^{-1}A$. Using the monotonicity of the determinant, we can rewrite (A2) as
$$R^{na}_{\mathrm{pd}}(D) = \min_{\substack{0 \prec \Delta \preceq \Lambda \\ \Delta_{ii} \leq D_{ii},\; i=1,\ldots,p \\ 0 \prec Q \preceq (\Delta^{-1} + A^T\Sigma_w^{-1}A)^{-1}}} \;\frac{1}{2}\log|\Sigma_w| - \frac{1}{2}\log|Q|. \quad \mathrm{(A3)}$$
Applying the Woodbury matrix identity ([49], Theorem 18.2.8) to the inequality constraint $0 \prec Q \preceq (\Delta^{-1} + A^T\Sigma_w^{-1}A)^{-1}$, we obtain
$$0 \prec Q \preceq \Delta - \Delta A^T\left(\Sigma_w + A\Delta A^T\right)^{-1} A\Delta. \quad \mathrm{(A4)}$$
From Theorem 2 we have $\Lambda = A\Delta A^T + \Sigma_w$; hence (A4) is equivalent to the LMI condition of (49).
The constraint set of the decision variables is convex, and an optimal solution exists because $\Delta \succ 0$. This completes the proof. □
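The program (A2)–(A4) can be prototyped directly. Below is a minimal sketch, assuming the cvxpy package, that uses a standard Schur-complement rewriting of (A4) (which should coincide with the LMI of (49) in the main text); the matrices $A$, $\Sigma_w$ and the levels $D_{ii}$ are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

p = 2
A = np.array([[0.5, 0.1], [0.0, 0.4]])        # stable dynamics (assumption)
Sigma_w = np.eye(p)                           # driving-noise covariance (assumption)
D_ii = np.array([0.5, 0.7])                   # per-dimension distortion levels (assumption)

Delta = cp.Variable((p, p), symmetric=True)   # error covariance
Q = cp.Variable((p, p), symmetric=True)       # auxiliary variable of Appendix B
Lam = A @ Delta @ A.T + Sigma_w               # Lambda = A Delta A^T + Sigma_w, affine in Delta

constraints = [
    Delta >> 1e-9 * np.eye(p),                # Delta strictly positive definite
    cp.diag(Delta) <= D_ii,                   # Delta_ii <= D_ii
    Lam - Delta >> 0,                         # Delta <= Lambda in the PSD order
    # Schur-complement form of (A4): Q <= Delta - Delta A^T Lambda^{-1} A Delta
    cp.bmat([[Delta - Q, Delta @ A.T],
             [A @ Delta, Lam]]) >> 0,
]
prob = cp.Problem(cp.Minimize(-0.5 * cp.log_det(Q)), constraints)
prob.solve()

rate_bits = (0.5 * np.log(np.linalg.det(Sigma_w)) + prob.value) / np.log(2)
print(rate_bits)                              # value of the lower bound in bits
```

The max-det structure (minimizing the negative log-determinant of Q under LMIs) is exactly what general SDP solvers exploit, as in [36,40].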

Appendix C

Proof of Theorem 3.
We only prove (1) in detail and sketch the proofs of (2) and (3), because several steps are identical to (1). First, note that from Lemma 3 we have the bound in (43). Next, we simply show that the objective function can be lower bounded by a constant (independent of the constraint set).
$$\begin{aligned}
\frac{1}{2}\log\frac{|\Lambda|}{|\Delta|} &= \frac{1}{2}\log|A\Delta A^T + \Sigma_w| - \frac{1}{2}\log|\Delta| \\
&= \frac{(p_1+p_2)}{2}\log|A\Delta A^T + \Sigma_w|^{\frac{1}{p_1+p_2}} - \frac{(p_1+p_2)}{2}\log|\Delta|^{\frac{1}{p_1+p_2}} \\
&\stackrel{(a)}{\geq} \frac{(p_1+p_2)}{2}\log\left(|A\Delta A^T|^{\frac{1}{p_1+p_2}} + |\Sigma_w|^{\frac{1}{p_1+p_2}}\right) - \frac{(p_1+p_2)}{2}\log|\Delta|^{\frac{1}{p_1+p_2}} \\
&\stackrel{(b)}{=} \frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}}|\Delta|^{\frac{1}{p_1+p_2}} + |\Sigma_w|^{\frac{1}{p_1+p_2}}\right) - \frac{(p_1+p_2)}{2}\log|\Delta|^{\frac{1}{p_1+p_2}} \\
&= \frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}} + \frac{|\Sigma_w|^{\frac{1}{p_1+p_2}}}{|\Delta|^{\frac{1}{p_1+p_2}}}\right) \\
&\stackrel{(c)}{\geq} \frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\,|\Sigma_w|^{\frac{1}{p_1+p_2}}}{\mathrm{trace}(\Delta)}\right) \quad \mathrm{(A5)} \\
&\stackrel{(d)}{\geq} \frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\,|\Sigma_w|^{\frac{1}{p_1+p_2}}}{D}\right) \quad \mathrm{(A6)} \\
&\stackrel{(e)}{\geq} \frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\,N(w)}{D}\right), \quad \mathrm{(A7)}
\end{aligned}$$
where $(a)$ follows from Minkowski's determinant inequality ([18], Exercise 12.13); $(b)$ follows from standard properties of determinants of square matrices of the same size; $(c)$ follows from the reverse application of the EPI (see, e.g., ([43], Equation (7))); $(d)$ follows because $\mathrm{trace}(\Delta) \leq \mathrm{trace}(\hat{D}) \leq D$; $(e)$ follows from ([43], Equation (7)). The constant value in (A7) is well defined if $D \in \left(0, \frac{(p_1+p_2)N(w)}{1 - |A^TA|^{\frac{1}{p_1+p_2}}}\right)$, where $N(w) = \frac{1}{2\pi e}\, 2^{\frac{2}{p_1+p_2} h(w)}$ and $h(w) > -\infty$.
For (2), we follow steps similar to (1), but in inequality $(d)$ we instead use $\mathrm{trace}(\Delta) \leq \min\{D_T, \mathrm{trace}(\hat{D})\} \triangleq D^*$.
For (3), we follow similar steps up to (A5). Afterwards, we leverage the fact that $|\Delta| \leq |D_{\mathrm{cov}}|$ to obtain, instead of (A6), the following:
$$\frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}} + \frac{|\Sigma_w|^{\frac{1}{p_1+p_2}}}{|D_{\mathrm{cov}}|^{\frac{1}{p_1+p_2}}}\right). \quad \mathrm{(A8)}$$
This completes the derivation. □
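As a numerical sanity check of the chain $(a)$–$(e)$, the sketch below compares $\frac{1}{2}\log(|\Lambda|/|\Delta|)$ against the constant in (A7) for an arbitrary feasible $\Delta$ and a Gaussian driving noise, for which $N(w) = |\Sigma_w|^{\frac{1}{p_1+p_2}}$ with equality in step $(e)$; all matrix values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3                                            # p = p1 + p2 (illustrative)
A = 0.4 * np.eye(p)                              # stable dynamics (assumption)
Sigma_w = np.eye(p)                              # Gaussian driving noise (assumption)
M = rng.standard_normal((p, p))
Delta = 0.3 * (M @ M.T) / p                      # an arbitrary feasible Delta > 0
D = np.trace(Delta)                              # total distortion it induces

Lam = A @ Delta @ A.T + Sigma_w
objective = 0.5 * np.log(np.linalg.det(Lam) / np.linalg.det(Delta))

# For Gaussian w, N(w) = |Sigma_w|^{1/p}, so the constant in (A7) reads:
Nw = np.linalg.det(Sigma_w) ** (1 / p)
bound = p / 2 * np.log(np.linalg.det(A.T @ A) ** (1 / p) + p * Nw / D)
print(objective >= bound)                        # True for any feasible Delta
```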

References

  1. Caines, P.E. Linear Stochastic Systems; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1988. [Google Scholar]
  2. Skogestad, S.; Postlethwaite, I. Multivariable Feedback Control: Analysis and Design, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2005. [Google Scholar]
  3. Wang, J.; Chen, J. Vector Gaussian Multiterminal Source Coding. IEEE Trans. Inf. Theory 2014, 60, 5533–5552. [Google Scholar] [CrossRef] [Green Version]
  4. Ekrem, E.; Ulukus, S. An Outer Bound for the Vector Gaussian CEO Problem. IEEE Trans. Inf. Theory 2014, 60, 6870–6887. [Google Scholar] [CrossRef] [Green Version]
  5. Oohama, Y. Indirect and Direct Gaussian Distributed Source Coding Problems. IEEE Trans. Inf. Theory 2014, 60, 7506–7539. [Google Scholar] [CrossRef]
  6. Rahman, M.S.; Wagner, A.B. Rate Region of the Vector Gaussian One-Helper Source-Coding Problem. IEEE Trans. Inf. Theory 2015, 61, 2708–2728. [Google Scholar] [CrossRef] [Green Version]
  7. Zahedi, A.; Østergaard, J.; Jensen, S.H.; Naylor, P.A.; Bech, S. Source Coding in Networks With Covariance Distortion Constraints. IEEE Trans. Signal Proc. 2016, 64, 5943–5958. [Google Scholar] [CrossRef] [Green Version]
  8. Vaseghi, S.V. Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications; John Wiley & Sons: Chichester, UK, 2007. [Google Scholar]
  9. Zahedi, A.; Østergaard, J.; Jensen, S.H.; Bech, S.; Naylor, P. Audio coding in wireless acoustic sensor networks. Signal Process. 2015, 107, 141–152. [Google Scholar] [CrossRef]
  10. Linder, T.; Lugosi, G. A zero-delay sequential scheme for lossy coding of individual sequences. IEEE Trans. Inf. Theory 2001, 47, 2533–2538. [Google Scholar] [CrossRef]
  11. Derpich, M.S.; Østergaard, J. Improved Upper Bounds to the Causal Quadratic Rate-Distortion Function for Gaussian Stationary Sources. IEEE Trans. Inf. Theory 2012, 58, 3131–3152. [Google Scholar] [CrossRef] [Green Version]
  12. Kaspi, Y.; Merhav, N. Structure Theorems for Real-Time Variable Rate Coding With and Without Side Information. IEEE Trans. Inf. Theory 2012, 58, 7135–7153. [Google Scholar] [CrossRef] [Green Version]
  13. Linder, T.; Yüksel, S. On Optimal Zero-Delay Coding of Vector Markov Sources. IEEE Trans. Inf. Theory 2014, 60, 5975–5991. [Google Scholar] [CrossRef] [Green Version]
  14. Stavrou, P.A.; Østergaard, J.; Charalambous, C.D. Zero-Delay Rate Distortion via Filtering for Vector-Valued Gaussian Sources. IEEE J. Sel. Top. Signal Process. 2018, 12, 841–856. [Google Scholar] [CrossRef]
  15. Gallager, R.G. Information Theory and Reliable Communication; Wiley: New York, NY, USA, 1968. [Google Scholar]
  16. Tanaka, T.; Kim, K.K.K.; Parrilo, P.A.; Mitter, S.K. Semidefinite Programming Approach to Gaussian Sequential Rate-Distortion Trade-Offs. IEEE Trans. Autom. Control 2017, 62, 1896–1910. [Google Scholar] [CrossRef]
  17. Khina, A.; Kostina, V.; Khisti, A.; Hassibi, B. Tracking and Control of Gauss-Markov Processes over Packet-Drop Channels with Acknowledgments. IEEE Trans. Control Netw. Syst. 2019, 6, 549–560. [Google Scholar] [CrossRef] [Green Version]
  18. Abadir, K.M.; Magnus, J.R. Matrix Algebra; Cambridge University Press: New York, NY, USA, 2005. [Google Scholar]
  19. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2006. [Google Scholar]
  20. Wyner, A. A definition of conditional mutual information for arbitrary ensembles. Inf. Control 1978, 38, 51–59. [Google Scholar] [CrossRef] [Green Version]
  21. Tanaka, T.; Esfahani, P.M.; Mitter, S.K. LQG Control With Minimum Directed Information: Semidefinite Programming Approach. IEEE Trans. Autom. Control 2018, 63, 37–52. [Google Scholar] [CrossRef] [Green Version]
  22. Massey, J.L. Causality, Feedback and Directed information. In Proceedings of the International Symposium on Information Theory and its Applications (ISITA ’90), Waikiki, HI, USA, 27–30 November 1990; pp. 303–305. [Google Scholar]
  23. Charalambous, C.D.; Stavrou, P.A. Directed Information on Abstract Spaces: Properties and Variational Equalities. IEEE Trans. Inf. Theory 2016, 62, 6019–6052. [Google Scholar] [CrossRef]
  24. Stavrou, P.; Charalambous, T.; Charalambous, C.; Loyka, S. Optimal Estimation via Nonanticipative Rate Distortion Function and Applications to Time-Varying Gauss–Markov Processes. SIAM J. Control Optim. 2018, 56, 3731–3765. [Google Scholar] [CrossRef] [Green Version]
  25. Ihara, S. Information Theory—For Continuous Systems; World Scientific: Singapore, 1993. [Google Scholar]
  26. Gorbunov, A.K.; Pinsker, M.S. Nonanticipatory and Prognostic Epsilon Entropies and Message Generation Rates. Problems Inf. Transmiss. 1973, 9, 184–191. [Google Scholar]
  27. Tatikonda, S.C. Control Under Communication Constraints. Ph.D. Thesis, Mass. Inst. of Tech. (M.I.T.), Cambridge, MA, USA, 2000. [Google Scholar]
  28. Charalambous, C.D.; Stavrou, P.A.; Ahmed, N.U. Nonanticipative Rate Distortion Function and Relations to Filtering Theory. IEEE Trans. Autom. Control 2014, 59, 937–952. [Google Scholar] [CrossRef]
  29. Tatikonda, S.; Sahai, A.; Mitter, S. Stochastic linear control over a communication channel. IEEE Trans. Autom. Control 2004, 49, 1549–1561. [Google Scholar] [CrossRef]
  30. Silva, E.I.; Derpich, M.S.; Østergaard, J. A Framework for Control System Design Subject to Average Data-Rate Constraints. IEEE Trans. Autom. Control 2011, 56, 1886–1899. [Google Scholar] [CrossRef] [Green Version]
  31. Stavrou, P.A.; Charalambous, T.; Charalambous, C.D. Finite-Time Nonanticipative Rate Distortion Function for Time-Varying Scalar-Valued Gauss-Markov Sources. IEEE Control Syst. Lett. 2018, 2, 175–180. [Google Scholar] [CrossRef]
  32. Berger, T. Rate Distortion Theory: A Mathematical Basis for Data Compression; Prentice-Hall: Englewood Cliffs, NJ, USA, 1971. [Google Scholar]
  33. Horn, R.A.; Johnson, C.R. (Eds.) Matrix Analysis, 2nd ed.; Cambridge University Press: New York, NY, USA, 2013. [Google Scholar]
  34. Anderson, B.; Moore, J. Optimal Filtering; Prentice-Hall: Englewood Cliffs, NJ, USA, 1979. [Google Scholar]
  35. Simon, D. Optimal State Estimation: Kalman, H, and Nonlinear Approaches; Wiley-Interscience: Hoboken, NJ, USA, 2006. [Google Scholar]
  36. Grant, M.; Boyd, S. CVX: Matlab Software for Disciplined Convex Programming, Version 2.1. Available online: http://cvxr.com/cvx (accessed on 1 March 2014).
  37. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: New York, NY, USA, 2004. [Google Scholar]
  38. Boyd, S.; El Ghaoui, L.; Feron, E.; Balakrishnan, V. Linear Matrix Inequalities in System and Control Theory; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1994. [Google Scholar]
  39. Wang, J.; Chen, J.; Wu, X. On the Sum Rate of Gaussian Multiterminal Source Coding: New Proofs and Results. IEEE Trans. Inf. Theory 2010, 56, 3946–3960. [Google Scholar] [CrossRef]
  40. Vandenberghe, L.; Boyd, S.; Wu, S.P. Determinant maximization with linear matrix inequality constraints. SIAM J. Matrix Anal. Appl. 1998, 19, 499–533. [Google Scholar] [CrossRef] [Green Version]
  41. Stavrou, P.A.; Charalambous, T.; Charalambous, C.D.; Loyka, S.; Skoglund, M. Asymptotic Reverse-Waterfilling Characterization of Nonanticipative Rate Distortion Function of Vector-Valued Gauss-Markov Processes with MSE Distortion. In Proceedings of the 2018 IEEE Conference on Decision and Control (CDC), Miami Beach, FL, USA, 17–19 December 2018; pp. 14–20. [Google Scholar]
  42. Stavrou, P.A.; Østergaard, J.; Skoglund, M. On Zero-delay Source Coding of LTI Gauss-Markov Systems with Covariance Matrix Distortion Constraints. In Proceedings of the European Control Conference (ECC), Limassol, Cyprus, 12–15 June 2018; pp. 3083–3088. [Google Scholar]
  43. Rioul, O. Information Theoretic Proofs of Entropy Power Inequalities. IEEE Trans. Inf. Theory 2011, 57, 33–55. [Google Scholar] [CrossRef] [Green Version]
  44. Zamir, R. Lattice Coding for Signals and Networks; Cambridge University Press: Cambridge, UK, 2014. [Google Scholar]
  45. Fuglsig, A.J.; Østergaard, J. Zero-delay Multiple descriptions of stationary scalar Gauss-Markov sources. Entropy 2019, 21, 1185. [Google Scholar] [CrossRef] [Green Version]
  46. Tanaka, T.; Johansson, K.H.; Oechtering, T.; Sandberg, H.; Skoglund, M. Rate of prefix-free codes in LQG control systems. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 2399–2403. [Google Scholar]
  47. Zamir, R.; Feder, M. Information rates of pre/post-filtered dithered quantizers. IEEE Trans. Inf. Theory 1996, 42, 1340–1353. [Google Scholar] [CrossRef]
  48. Conway, J.H.; Sloane, N.J.A. Sphere-packings, Lattices, and Groups, 3rd ed.; Springer-Verlag New York, Inc.: New York, NY, USA, 1999. [Google Scholar]
  49. Harville, D.A. Matrix Algebra From a Statistician’s Perspective; Springer: New York, NY, USA, 1997. [Google Scholar]
  50. Kim, K.K.K. Optimization and Convexity of log det(I + KX^{-1}). Int. J. Control Autom. Syst. 2019, 17, 1067–1070. [Google Scholar] [CrossRef]
Figure 1. System model. The encoder receives information from two users that do not interact from a dynamical system perspective, but they are allowed to allocate bits between them and across the dimensions. The compression is done causally whereas the clocks of the encoder and the decoder are assumed to be synchronized.
Figure 2. Centralized multivariable multi-input multi-output (MIMO) control system.
Figure 3. Comparison of $R^{na}_{\mathrm{joint}}(D^*)$ and $R^{na}_{\mathrm{pd}}(D)$ when $D_{ii} \neq D_{jj}$ for $i \neq j$, and comparison with ([16], Equation (27)).
Figure 4. Comparison of $R^{LB}_{\mathrm{joint}}(D^*)$, $R^{LB}_{\mathrm{pd}}(D)$ and $R^{LB}(D_T)$ when $D_{ii} = D_{jj}$, $i \neq j$.
Figure 5. $R^{LB}(D_{\mathrm{cov}})$ as a function of $\gamma \geq 0$.
Figure 6. Comparison of $R^{LB}(D_{\mathrm{cov}})$ (restricted to its values on the main diagonal) and $R^{LB}_{\mathrm{pd}}(D)$ for certain values of $\gamma$.
Figure 7. Comparison of the lower bound $R^{LB}_{\mathrm{joint}}(D^*)$ with the analytical expression of (60).
Figure 8. DPCM scheme with feedback loop for the augmented multidimensional Markov model of (3).
Figure 9. Forward test channel realization of (71).
Figure 10. Comparison of lower and upper bounds on $R^{c}_{\mathrm{joint}}(D)$ when H is full rank.
Figure 11. Comparison of the lower and upper bounds on $R^{c}_{\mathrm{joint}}(D)$ when H is rank deficient.
