Article

Bounds on the Sum-Rate of MIMO Causal Source Coding Systems with Memory under Spatio-Temporal Distortion Constraints

by Photios A. Stavrou 1,*, Jan Østergaard 2 and Mikael Skoglund 1
1 Department of Intelligent Systems, Division of Information Science and Engineering, KTH Royal Institute of Technology, 11428 Stockholm, Sweden
2 Section on Signal and Information Processing, Department of Electronic Systems, Aalborg University, 9000 Aalborg, Denmark
* Author to whom correspondence should be addressed.
Entropy 2020, 22(8), 842; https://doi.org/10.3390/e22080842
Submission received: 18 June 2020 / Revised: 22 July 2020 / Accepted: 27 July 2020 / Published: 30 July 2020
(This article belongs to the Special Issue Multiuser Information Theory III)

Abstract:
In this paper, we derive lower and upper bounds on the optimal performance theoretically attainable (OPTA) of a two-user multi-input multi-output (MIMO) causal encoding and causal decoding problem. Each user's source model is described by a multidimensional Markov source driven by an additive i.i.d. noise process, subject to three classes of spatio-temporal distortion constraints. To characterize the lower bounds, we use state augmentation techniques and a data processing theorem, which recovers a variant of the rate distortion function (RDF) as an information measure, known in the literature as nonanticipatory ϵ-entropy, sequential RDF, or nonanticipative RDF. We derive lower bound characterizations for a system driven by an i.i.d. Gaussian noise process, which we solve using a semidefinite programming (SDP) algorithm for all three classes of distortion constraints. We obtain closed-form solutions when the system's noise is possibly non-Gaussian for both users, and when only one of the users is described by a source model driven by a Gaussian noise process. To obtain the upper bounds, we use the best linear forward test channel realization, which corresponds to the optimal test channel realization when the system is driven by a Gaussian noise process, and apply a sequential causal differential pulse code modulation (DPCM)-based scheme with a feedback loop followed by a scaled entropy-coded dithered quantization (ECDQ) scheme, which leads to upper bounds with certain performance guarantees. Then, we use the linear forward test channel as a benchmark to obtain upper bounds on the OPTA when the system is driven by an additive i.i.d. non-Gaussian noise process. We support our framework with various simulation studies.

1. Problem Statement

We consider the two-user causal encoding and causal decoding setup illustrated in Figure 1. In this setup, users 1 and 2 are modeled by the following discrete-time time-invariant multidimensional Markov processes:
$$x_{t+1}^1 = A^1 x_t^1 + w_t^1, \qquad x_{t+1}^2 = A^2 x_t^2 + w_t^2, \qquad t = 0, 1, \ldots, \tag{1}$$
where $x_t^1 \in \mathbb{R}^{p_1}$ and $x_t^2 \in \mathbb{R}^{p_2}$, with $p_1$ not necessarily equal to $p_2$, $(A^1, A^2)$ are known constant matrices of appropriate dimensions, and $(w_t^1, w_t^2)$ are additive i.i.d. possibly non-Gaussian noise processes with zero mean and covariance matrices $\Sigma_{w^i} \succeq 0$, $i = 1, 2$, independent of $x_0^i$, $i = 1, 2$, and of each other for all $t \ge 0$. The initial states are distributed as $x_0^i \sim (0; \Sigma_{x_0^i})$, $i = 1, 2$. Finally, we restrict the eigenvalues of $(A^1, A^2)$ to lie within the unit circle, which means that each user's system model in (1) is asymptotically stable (i.e., asymptotically stationary).
The goal is to characterize the performance of the setup in Figure 1 under various distortion metrics when the encoder compresses information causally, whereas the lossless compression between the encoder and the decoder is done in one shot, assuming their clocks are synchronized.
First, we apply state space augmentation [1] to the state-space models in (1) to transform them into a single augmented state-space model as follows:
$$x_{t+1} = A x_t + w_t, \tag{2}$$
where $x_{t+1} = [(x_{t+1}^1)^T, (x_{t+1}^2)^T]^T \in \mathbb{R}^{p_1+p_2}$, $A$ is a block diagonal matrix, and $w_t$ is an additive i.i.d. possibly non-Gaussian noise process such that $w_t \sim (0; \Sigma_w)$, where $(A, \Sigma_w)$ are of the form
$$A = \begin{bmatrix} A^1 & 0 \\ 0 & A^2 \end{bmatrix} \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)}, \qquad \Sigma_w = \begin{bmatrix} \Sigma_{w^1} & 0 \\ 0 & \Sigma_{w^2} \end{bmatrix} \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)}. \tag{3}$$
We note that the operation in (3) can be written as $A = A^1 \oplus A^2$ and, similarly, $\Sigma_w = \Sigma_{w^1} \oplus \Sigma_{w^2}$ (see the notation section for "$\oplus$").
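To make the augmentation concrete, the following sketch (in Python; the helper names are ours, not the paper's) builds the pair $(A, \Sigma_w)$ of (3) via the direct sum and simulates one trajectory of (2):

```python
import numpy as np
from scipy.linalg import block_diag

def augment(A1, A2, Sw1, Sw2):
    """Build the augmented pair (A, Sigma_w) of (3) as direct sums."""
    return block_diag(A1, A2), block_diag(Sw1, Sw2)

def simulate(A, Sw, n, seed=0):
    """Simulate x_{t+1} = A x_t + w_t of (2) with w_t ~ N(0, Sigma_w), x_0 = 0."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    x, traj = np.zeros(p), []
    for _ in range(n):
        x = A @ x + rng.multivariate_normal(np.zeros(p), Sw)
        traj.append(x)
    return np.array(traj)

# Hypothetical two-user example; asymptotic stability requires all |eig(A)| < 1.
A1 = np.array([[0.5, 0.2], [0.3, 0.6]]); Sw1 = np.eye(2)
A2 = np.array([[0.6]]);                  Sw2 = np.array([[2.0]])
A, Sw = augment(A1, A2, Sw1, Sw2)
assert np.all(np.abs(np.linalg.eigvals(A)) < 1)
```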
System Operation: At each time instant $t$, the encoder observes the augmented state $x_t$ and generates the data packet $m_t \in \{1, \ldots, 2^{R_t}\}$ of instantaneous rate $R_t$. At time $t$, the packet $m_t$ is sent over a noiseless channel with rate $R_t$. At each time $t$, the decoder receives $m_t$ and constructs an estimate $y_t$ of $x_t$. We assume that the clocks of the encoder and decoder are synchronized. Formally, the encoder ($\mathcal{E}$) and the decoder ($\mathcal{D}$) are specified by a sequence of measurable functions $\{(f_t, g_t) : t \in \mathbb{N}_0\}$ as follows:
$$\mathcal{E}: \ m_t = f_t(m^{t-1}, x^t), \quad m^{-1} = \emptyset, \qquad \mathcal{D}: \ y_t = g_t(m^t). \tag{4}$$
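As a minimal illustration of the information pattern in (4) (a Python sketch; `f` and `g` are placeholder callables, not the paper's coding policies):

```python
# A sketch of the causal information pattern in (4): the encoder f_t sees the
# message history m^{t-1} and the source history x^t; the decoder g_t sees m^t.
def run_causal_code(x_traj, f, g):
    messages, estimates = [], []
    for t in range(len(x_traj)):
        m_t = f(messages[:t], x_traj[: t + 1])   # m_t = f_t(m^{t-1}, x^t)
        messages.append(m_t)
        estimates.append(g(messages[: t + 1]))   # y_t = g_t(m^t)
    return estimates
```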

1.1. Generalizations

It should be noted that the setup in Figure 1 can be generalized to any finite number of users. The only change will appear in the number of state-space equations and in the dimensions of the vectors and matrices in the augmented state-space representation (2).
Next, we explain the setup of two users that are correlated (in their states). In such a scenario, users 1 and 2 are modeled by the following discrete-time time-invariant multidimensional Markov processes:
$$x_{t+1}^1 = A^{11} x_t^1 + A^{12} x_t^2 + w_t^1, \qquad x_{t+1}^2 = A^{22} x_t^2 + A^{21} x_t^1 + w_t^2, \qquad t = 0, 1, \ldots, \tag{5}$$
where $x_t^1 \in \mathbb{R}^{p_1}$ and $x_t^2 \in \mathbb{R}^{p_2}$, with $p_1$ not necessarily equal to $p_2$, $(A^{11}, A^{12}, A^{21}, A^{22})$ are known constant matrices of appropriate dimensions, whereas all other assumptions remain the same as in the user models described in (1). The single augmented state-space model is now obtained as follows:
$$x_{t+1} = \hat{A} x_t + w_t, \tag{6}$$
where $\hat{A}$ is a block matrix of the form
$$\hat{A} = \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{bmatrix} \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)}, \tag{7}$$
in which $A^{11}, A^{22}$ are square matrices, but $(A^{12}, A^{21})$ may be rectangular matrices (if $p_1 \neq p_2$). We do not consider this case in our paper, because it follows straightforwardly by replacing matrix $A$ with matrix $\hat{A}$ everywhere. Clearly, this case can also be generalized to any finite number of users with appropriate modifications to the state-space models.

1.2. Distortion Constraints

In this work we consider three types of distortion constraints. These are articulated as follows:
(i) a per-dimension (spatial) distortion constraint on the asymptotically averaged total (across time) MMSE covariance matrix;
(ii) an asymptotically averaged total (across time and space) distortion constraint;
(iii) a covariance matrix distortion constraint.
Next, we give the definition of each distortion constraint and explain its utility in multi-user systems.
A per-dimension (spatial) distortion constraint imposed on the distortion covariance matrix $\Sigma^\Delta \triangleq \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} E\{(x_t - y_t)(x_t - y_t)^T\}$, where $\Sigma^\Delta \succeq 0$, is defined as follows:
$$\Sigma^\Delta_{ii} \le D_{ii}, \qquad i = 1, \ldots, p, \tag{8}$$
where $D_{ii} \in [0, D_{ii}^{\max}]$ are given diagonal entries of the positive semidefinite matrix $\hat{D} \succeq 0$, with $\mathrm{trace}(\hat{D}) \triangleq D$, $D \in [0, D^{\max}]$. Note that under this distortion constraint, it trivially holds that $\limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} E\{||x_t - y_t||_2^2\} \le D$.
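To make the constraint concrete, a finite-horizon estimate of $\Sigma^\Delta$ can be checked against the targets $D_{ii}$ as in the following sketch (our notation; `x` and `y` are assumed to be `(n+1, p)` arrays holding a source and a reproduction trajectory):

```python
import numpy as np

def per_dimension_ok(x, y, D_diag):
    """Check the per-dimension constraint (8) on a finite-horizon estimate of
    Sigma_Delta = (1/(n+1)) sum_t (x_t - y_t)(x_t - y_t)^T."""
    e = x - y                              # error trajectory, shape (n+1, p)
    Sigma_Delta = e.T @ e / e.shape[0]     # time-averaged error covariance
    total_mse = np.trace(Sigma_Delta)      # trivially bounded by D = sum_i D_ii
    return bool(np.all(np.diag(Sigma_Delta) <= D_diag)), total_mse
```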
Utility: The choice of per-dimension distortion constraints is arguably more realistic in various networked systems. For instance, one use of such hard constraints can be found in multivariable feedback control systems, also called centralized multi-input multi-output (MIMO) systems [2] (see Figure 2). In such networks, it may be the case that one wishes to minimize the temporally total performance criterion or to satisfy some total fidelity constraint. In addition, however, it is always required that the resources allocated to the different nodes (or variables) never exceed certain performance thresholds when the demands of data transmission within the communication link allow only limited rate. Nonetheless, the problem is that the variables interact. Some variables may be considered more important for certain applications, according to the demands of the system or the quality of service, which is why they need hard constraints.
An asymptotically averaged total (across the time and space) distortion constraint is defined as follows:
$$\limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} E\{||x_t - y_t||_2^2\} \le D_T, \tag{9}$$
where $D_T \in [0, D_T^{\max}]$, with $D_T$ not necessarily equal to $D$.
Utility: The asymptotically averaged total (across time and space) distortion constraint allows the distortion to be shared or allocated arbitrarily among the transmit dimensions. The combination of the per-dimension distortion constraint with the averaged total distortion constraint ensures a total allocated distortion budget in the system that depends on the allowable (by design) distortion budget at each dimension (or user).
A covariance matrix distortion constraint is a generalization of the per-dimension distortion constraint, defined by
$$\Sigma^\Delta \preceq D^{\mathrm{cov}}, \tag{10}$$
where $D^{\mathrm{cov}} \succeq 0$.
Utility: In recent years, there has been a shift from conventional MSE distortion constraints (scalar-valued target distortions) to covariance matrix distortion constraints in the areas of multiterminal and distributed source coding [3,4,5,6,7] and signal processing [7,8,9]. The argument for considering covariance distortion constraints, despite their difficulty, is their generality and flexibility in formulating new problems. For instance, one practical example would be wireless ad hoc microphones that transmit to receiver(s) over a MIMO channel. In such setups, the receiver(s) may need to perform beamforming or some multi-channel Wiener filtering variant. In both cases, one needs to know the covariance matrix of, e.g., the error signal (the distortion covariance matrix) to perform the desired signal enhancement. For example, if the quality of one of the signals is too poor, this could harm the overall signal enhancement, and one therefore needs to trade off the bits correctly among the microphones. In an ad hoc microphone array, the different signals are naturally correlated, which adds an interesting interplay between them that goes beyond MSE distortion.

1.3. Operational Interpretations

In this subsection, we use the three types of distortion constraints introduced in Section 1.2 to define the corresponding operational definitions for which we study lower and upper bounds in this paper.
Definition 1 
(Causal RDF subject to (8)). The operational causal RDF under per-dimension distortion constraints is defined as follows:
$$R^{c}_{\mathrm{pd}}(D) \triangleq \inf_{\substack{(f_t, g_t):\, t \in \mathbb{N}_0 \\ \Sigma^\Delta_{ii} \le D_{ii},\; i = 1,\ldots,p}} \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} R_t, \tag{11}$$
where $D_{ii} \in [0, D_{ii,\max}]$ and $D \in [0, D_{\max}]$.
Definition 2 
(Causal RDF subject to (8) and (9)). The operational causal RDF under joint per-dimension and asymptotically averaged total distortion constraints is defined as follows:
$$R^{c}_{\mathrm{joint}}(D^*) \triangleq \inf_{\substack{(f_t, g_t):\, t \in \mathbb{N}_0 \\ \Sigma^\Delta_{ii} \le D_{ii},\; i = 1,\ldots,p \\ \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} E\{||x_t - y_t||_2^2\} \le D^*}} \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} R_t, \tag{12}$$
where $D^* = \min\{D_T, D\}$.
Interplay between Definitions 1 and 2. Clearly, Definition 1 is a lower bound to Definition 2, because its constraint set of feasible solutions is larger. Note that, in general, the asymptotically averaged total distortion constraint in (12) is active when $D_T \le D$; otherwise, it is a trivial constraint and (12) is equivalent to the optimization problem of (11). This observation will be demonstrated via a simulation study in the sequel of the paper.
Definition 3 
(Causal RDF subject to (10)). The operational causal RDF under covariance matrix distortion constraints is defined as follows:
$$R^{c}(D^{\mathrm{cov}}) \triangleq \inf_{\substack{(f_t, g_t):\, t \in \mathbb{N}_0 \\ \Sigma^\Delta \preceq D^{\mathrm{cov}}}} \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} R_t, \tag{13}$$
where $D^{\mathrm{cov}} \succeq 0$.
Literature Review. In information theory, causal coding and causal decoding, also termed zero-delay coding (see, e.g., [10,11,12,13,14]), do not rely on the traditional construction based on random codebooks, which in turn requires asymptotically large source vector dimensions [15] to establish achievability of a certain (non-causal) rate-distortion performance. Indeed, the optimal rate-distortion performance for causal source coding (with the clocks of the encoder and decoder synchronized) is hard to compute, and bounds are often derived in the literature. For example, lower and upper bounds on the operational causal RDF subject to solely the distortion constraint in (9) (or the more stringent per-instant distortion constraint $E\{||x_t - y_t||_2^2\} \le D_t$, $\forall t$) have already been studied extensively for various special cases of the setup of Figure 1; see, e.g., [11,14,16,17] and the references therein. In this work, we study new problems related to the causal RDF for the general multi-user source coding setup of Figure 1 under various classes of distortion constraints whose utility (partly explained in Section 1.2) has not been studied in the literature so far. These bounds are established using tools from information theory, convex optimization and causal MMSE estimation.

1.4. Contributions

In this paper we obtain the following results for the setup of Figure 1.
  • Characterization and computation of the true lower bounds on (11)–(13) when the users' augmented source model is driven by a Gaussian noise process (Lemma 3, Theorem 2).
  • Analytical lower bounds on (11)–(13) when the users' augmented source model is driven by an additive i.i.d. noise process (including both additive Gaussian and non-Gaussian noise) (Theorem 3). As a consequence, we also obtain analytical lower bounds when only one of the users' source models is driven by a Gaussian noise process (Corollary 2).
  • Characterization and computation of achievable bounds on (11)–(13) when the users' augmented source model is driven by a Gaussian noise process (Theorem 4).
  • Characterization of achievable bounds on (11)–(13) when the users' augmented source model is driven by an additive i.i.d. non-Gaussian noise process (Theorem 5).
Machinery and tools. The information theoretic rate distortion definitions used to obtain the lower bounds in this paper are derived using a data processing theorem (Theorem 1), which reveals the "suitable" information measure to use. The steady-state characterization of the lower bounds in Lemma 3 is derived using inequalities from matrix algebra and a convexity argument that allows the use of Jensen's inequality. To derive lower bounds beyond additive i.i.d. Gaussian noise processes, we use the fact that the characterizations of the lower bounds for the Gaussian case are in fact the characterizations obtained for the best linear coding policies (Lemma 4); hence, these can serve as a benchmark to derive lower bounds beyond Gaussian noise processes by leveraging certain trace/determinant inequalities, most importantly Minkowski's determinant inequality ([18], Exercise 12.13), and the entropy power inequality (EPI) [19]. The upper bounds on the OPTA when the system's noise is Gaussian are derived using a causal sequential DPCM-based scheme with a feedback loop, equivalent to the scheme first derived in [14], followed by an ECDQ scheme that uses vector quantization. The upper bounds on the OPTA when the system's noise is additive i.i.d. non-Gaussian are obtained using precisely the same trick used to obtain the lower bounds, i.e., we use the linear test channel realization that achieves similar upper bounds for the Gaussian case and then, using a Shannon lower bound (SLB) type of argument (Theorem 5), we obtain the desired results.
The paper is structured as follows. In Section 2 we characterize and compute lower bounds on the OPTA of (11)–(13). In Section 3 we characterize and compute an achievable coding scheme that upper bounds the OPTA of (11)–(13). We draw conclusions and discuss future directions in Section 4.

2. Lower Bounds

In this section, we first choose a suitable information measure that will be used to derive a lower bound on Definitions 1–3. This information measure is a variant of directed information subject to some conditional independence constraints. Then, we obtain lower bounds on Definitions 1–3 for jointly Gaussian Markov processes and for Markov processes driven by an additive i.i.d. possibly non-Gaussian noise process.
First, we write the joint distribution of the communication system of Figure 1, i.e., from the two users described by the augmented state $\{x_t : t \in \mathbb{N}_0^n\}$ to the augmented output of the MMSE decoder $\{y_t : t \in \mathbb{N}_0^n\}$. In particular, the joint distribution induced by the joint process $\{(x_t, m_t, y_t) : t \in \mathbb{N}_0^n\}$ admits the following decomposition:
$$\begin{aligned} P(dx^n, dm^n, dy^n) &= \prod_{t=0}^{n} P(dy_t, dm_t, dx_t | y^{t-1}, m^{t-1}, x^{t-1}) \\ &= \prod_{t=0}^{n} P(dy_t | y^{t-1}, x^t, m^t)\, P(dm_t | m^{t-1}, y^{t-1}, x^t)\, P(dx_t | x^{t-1}, y^{t-1}, m^{t-1}) \quad a.s. \\ &= \prod_{t=0}^{n} P(dy_t | y^{t-1}, m^t)\, P(dm_t | m^{t-1}, y^{t-1}, x^t)\, P(dx_t | x_{t-1}) \quad a.s., \end{aligned} \tag{14}$$
which means that the augmented "source" process $x_t$ and the decoder's output process $y_t$ satisfy the following conditional independence constraints:
$$P(dx_t | x^{t-1}, y^{t-1}, m^{t-1}) = P(dx_t | x_{t-1}) \quad a.s., \tag{15}$$
$$P(dy_t | y^{t-1}, m^t, x^t) = P(dy_t | y^{t-1}, m^t) \quad a.s. \tag{16}$$
For (14) we state the following technical remark.
Remark 1 
(Trivial initial information). In (14) we assume that the joint distribution $P(dx_{-1}, dm_{-1}, dy_{-1})$ generates trivial information. This means that $P(dx_0 | x_{-1}, y_{-1}, m_{-1}) = P(dx_0)$, $P(dy_0 | y_{-1}, x_0, m_0) = P(dy_0 | x_0, m_0)$ and $P(dm_0 | m_{-1}, y_{-1}, x_0) = P(dm_0 | x_0)$.
We next prove a data processing theorem.
Theorem 1 
(Data processing theorem). Provided the decomposition of the joint distribution in (14) holds, the augmented state-space representation of the system in Figure 1 admits the following data processing inequalities:
$$I(x^n; y^n) \overset{(ii)}{\le} I(x^n; m^n \| y^{n-1}) \overset{(i)}{\le} \sum_{t=0}^{n} R_t, \tag{17}$$
where
$$I(x^n; m^n \| y^{n-1}) \triangleq \sum_{t=0}^{n} I(x^t; m_t | m^{t-1}, y^{t-1}), \qquad I(x^n; y^n) \triangleq \sum_{t=0}^{n} I(x_t; y_t | y^{t-1}), \tag{18}$$
assuming $I(x^t; m_t | m^{t-1}, y^{t-1}) < \infty$, $\forall t$, and $I(x_t; y_t | y^{t-1}) < \infty$, $\forall t$.
Proof. 
We first prove (i).
$$\sum_{t=0}^{n} R_t \ge \sum_{t=0}^{n} H(m_t | m^{t-1}) \overset{(a)}{\ge} \sum_{t=0}^{n} H(m_t | m^{t-1}, y^{t-1}) \overset{(b)}{\ge} \sum_{t=0}^{n} \left[ H(m_t | m^{t-1}, y^{t-1}) - H(m_t | m^{t-1}, y^{t-1}, x^t) \right] \overset{(c)}{=} \sum_{t=0}^{n} I(x^t; m_t | m^{t-1}, y^{t-1}) \triangleq I(x^n; m^n \| y^{n-1}),$$
where (a) follows because conditioning reduces entropy [19]; (b) follows from the non-negativity of discrete entropy [19]; (c) follows by definition.
Next, we prove (ii). This can be shown as follows:
$$I(x^t; m_t | m^{t-1}, y^{t-1}) - I(x^t; y_t | y^{t-1}) \overset{(d)}{=} I(x^t; m_t, y_t | m^{t-1}, y^{t-1}) - I(x^t; y_t | y^{t-1}) \overset{(e)}{=} I(x^t; m^t | y^t) - I(x^t; m^{t-1} | y^{t-1}) \overset{(f)}{=} I(x^t; m^t | y^t) - I(x^{t-1}; m^{t-1} | y^{t-1}), \tag{19}$$
where (d) follows from an adaptation of ([20], Lemma 3.3) to processes, i.e., $I(x^t; m_t, y_t | m^{t-1}, y^{t-1}) = I(x^t; m_t | m^{t-1}, y^{t-1}) + I(x^t; y_t | m^t, y^{t-1})$, and the second term is zero because of the conditional independence constraint (16); (e) follows by the chain rule of conditional mutual information (again an adaptation of ([20], Lemma 3.3)), which decomposes the conditional mutual information in two different ways, i.e.,
$$I(x^t; m^t, y_t | y^{t-1}) = I(x^t; m^{t-1} | y^{t-1}) + I(x^t; m_t, y_t | m^{t-1}, y^{t-1}) = I(x^t; m^t | y^t) + I(x^t; y_t | y^{t-1});$$
(f) follows because an adaptation of ([20], Lemma 3.3) can be applied to $I(x^t; m^{t-1} | y^{t-1})$ as follows:
$$I(x^t; m^{t-1} | y^{t-1}) = I(x_t, x^{t-1}; m^{t-1} | y^{t-1}) = I(x_t; m^{t-1} | x^{t-1}, y^{t-1}) + I(x^{t-1}; m^{t-1} | y^{t-1}) \overset{(g)}{=} I(x^{t-1}; m^{t-1} | y^{t-1}), \quad \forall t,$$
where (g) follows because $I(x_t; m^{t-1} | x^{t-1}, y^{t-1}) = 0$. This can be shown as follows:
$$I(x_t; m^{t-1} | x^{t-1}, y^{t-1}) = h(x_t | x^{t-1}, y^{t-1}) - h(x_t | x^{t-1}, y^{t-1}, m^{t-1}) = h(x_t | x^{t-1}, y^{t-1}) - h(x_t | x_{t-1}) \le 0, \quad \forall t, \tag{20}$$
where each $h(\cdot)$ is assumed to be finite for any $t$, and (20) follows from the conditional independence constraint in (15). From the non-negativity of conditional mutual information [19], the result follows.
Finally, from (19) we have
$$\sum_{t=0}^{n} \left[ I(x^t; m^t | y^t) - I(x^{t-1}; m^{t-1} | y^{t-1}) \right] = I(x^0; m^0 | y^0) + \left[ I(x^1; m^1 | y^1) - I(x^0; m^0 | y^0) \right] + \ldots + \left[ I(x^n; m^n | y^n) - I(x^{n-1}; m^{n-1} | y^{n-1}) \right] \tag{21}$$
$$= I(x^n; m^n | y^n) \ge 0, \tag{22}$$
where (22) follows by applying the method of differences in (21). The result then follows because (22) is by definition non-negative and $I(x^t; y_t | y^{t-1}) \ge I(x_t; y_t | y^{t-1})$ for each $t$. We note that if $I(x^0; m^0 | y^0)$ had also appeared in the cancellations, then this would have been the telescopic sum of the series. This completes the derivation. □
We note that Theorem 1 differs from the data processing theorem derived in ([21], Lemma 1) in that we assume the conditional independence constraint (15) instead of the conditional independence constraint $P(dx_t | x^{t-1}, y^{t-1}, m^{t-1}) = P(dx_t | x^{t-1}, y^{t-1})$ a.s., i.e., the source process is not allowed to have access via feedback to the previous output symbols $y^{t-1}$. This technical difference results in the mutual information in (18) being subject to conditional independence constraints, instead of the well-known directed information [22].
Before we introduce the information theoretic definitions that correspond to lower bounds on (11)–(13), we formally show the construction of $I(x^n; y^n)$.
Source. The augmented source process $\{x_t : t \in \mathbb{N}_0\}$ induces the sequence of conditional distributions $\{P(dx_t | x_{t-1}) : t \in \mathbb{N}_0^n\}$. Since no initial information is assumed, the distribution at $t = 0$ is $P(dx_0)$. In addition, by Bayes' rule we obtain $P(dx^n) \triangleq \prod_{t=0}^{n} P(dx_t | x_{t-1})$.
Reproduction or "test channel". The reproduction process $\{y_t : t \in \mathbb{N}_0^n\}$, parameterized by $x^t \in \mathcal{X}^t$, induces the sequence of conditional distributions, known as test channels, $\{P(dy_t | y^{t-1}, x_t) : t \in \mathbb{N}_0^n\}$. At $t = 0$, no initial state information is assumed; hence $P(dy_0 | y_{-1}, x_0) = P(dy_0 | x_0)$. In addition, by Bayes' rule we obtain $Q(dy^n | x^n) \triangleq \prod_{t=0}^{n} P(dy_t | y^{t-1}, x_t)$.
From ([23], Remark 1), it can be shown that the sequences of conditional distributions $\{P(dx_t | x_{t-1}) : t \in \mathbb{N}_0^n\}$ and $\{P(dy_t | y^{t-1}, x_t) : t \in \mathbb{N}_0^n\}$ uniquely define the family of conditional distributions on $\mathcal{X}^n$ and $\mathcal{Y}^n$ parameterized by $x^n \in \mathcal{X}^n$, respectively, given by the joint distribution
$$P(dx^n, dy^n) = P(dx^n) \otimes Q(dy^n | x^n). \tag{23}$$
In addition, from (23), we can uniquely define the $\mathcal{Y}^n$ marginal distribution by
$$P(dy^n) \triangleq \int_{\mathcal{X}^n} P(dx^n) \otimes Q(dy^n | x^n),$$
and the conditional distributions $\{P(dy_t | y^{t-1}) : t \in \mathbb{N}_0^n\}$.
Given the above construction of distributions, we can formally introduce the information measure using relative entropy as follows:
$$I(x^n; y^n) \overset{(a)}{\triangleq} D\big(P(dx^n, dy^n) \,\|\, P(dx^n) \times P(dy^n)\big) \in [0, \infty] \overset{(b)}{=} \int_{\mathcal{X}^n \times \mathcal{Y}^n} \log\left( \frac{dQ(\cdot | x^n)}{dP(\cdot)}(y^n) \right) P(dx^n, dy^n) \overset{(c)}{=} \sum_{t=0}^{n} E\left\{ \log\left( \frac{dP(\cdot | y^{t-1}, x_t)}{dP(\cdot | y^{t-1})}(y_t) \right) \right\} \overset{(d)}{=} \sum_{t=0}^{n} I(x_t; y_t | y^{t-1}), \tag{24}$$
where (a) follows by the definition of relative entropy between $P(dx^n, dy^n)$ and the product distribution $P(dx^n) \times P(dy^n)$; (b) is due to the Radon–Nikodym derivative theorem ([23], Appendices A and C); (c) is due to the chain rule of relative entropy; (d) follows by definition.
We now state as a definition the lower bounds on (11)–(13).
Definition 4 
(Lower bounds). Using the previous construction of distributions and the information measure of (24), we can define the following lower bounds on (11)–(13).
(1)
The sum-rate subject to per-dimension distortion constraints is defined as follows:
$$R^{LB}_{\mathrm{pd}}(D) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t = 0, 1, \ldots \\ \Sigma^\Delta_{ii} \le D_{ii},\; i = 1,\ldots,p}} \limsup_{n\to\infty} \frac{1}{n+1} I(x^n; y^n), \tag{25}$$
with
$$R^{LB}_{\mathrm{pd},[0,n]}(D) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t \in \mathbb{N}_0^n \\ \Sigma^\Delta_{ii,t} \le D_{ii},\; i = 1,\ldots,p}} I(x^n; y^n), \tag{26}$$
where $\Sigma^\Delta_{t} \triangleq \frac{1}{n+1}\sum_{t=0}^{n} E\{(x_t - y_t)(x_t - y_t)^T\}$, $\Sigma^\Delta_{ii,t} \triangleq \frac{1}{n+1}\sum_{t=0}^{n} \left[ E\{(x_t - y_t)(x_t - y_t)^T\} \right]_{ii}$, and $D \ge 0$.
(2)
The sum-rate subject to joint distortion constraints is defined as follows:
$$R^{LB}_{\mathrm{joint}}(D^*) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t = 0, 1, \ldots \\ \Sigma^\Delta_{ii} \le D_{ii},\; i = 1,\ldots,p \\ \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} E\{||x_t - y_t||_2^2\} \le D_T}} \limsup_{n\to\infty} \frac{1}{n+1} I(x^n; y^n), \tag{27}$$
$$R^{LB}_{\mathrm{joint},[0,n]}(D^*) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t \in \mathbb{N}_0^n \\ \Sigma^\Delta_{ii,t} \le D_{ii},\; i = 1,\ldots,p \\ \frac{1}{n+1}\sum_{t=0}^{n} E\{||x_t - y_t||_2^2\} \le D_T}} I(x^n; y^n), \tag{28}$$
where $D^* = \min\{D, D_T\}$.
(3)
The sum-rate subject to covariance matrix distortion constraints is defined as follows:
$$R^{LB}(D^{\mathrm{cov}}) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t = 0, 1, \ldots \\ \Sigma^\Delta \preceq D^{\mathrm{cov}}}} \limsup_{n\to\infty} \frac{1}{n+1} I(x^n; y^n), \tag{29}$$
$$R^{LB}_{[0,n]}(D^{\mathrm{cov}}) \triangleq \inf_{\substack{P(dy_t | y^{t-1}, x_t):\, t \in \mathbb{N}_0^n \\ \Sigma^\Delta_{t} \preceq D^{\mathrm{cov}}}} I(x^n; y^n), \tag{30}$$
where $D^{\mathrm{cov}} \succeq 0$.
Next, we stress some technical remarks related to the new information theoretic measures in Definition 4, which can be obtained using known results in the literature, and to some known lower bounds that use the same objective function as (26)–(30).
Remark 2 
(Comments on Definition 4). It can be shown that the infimization problems (26), (28) and (30), in contrast to their operational counterparts (11)–(13), are convex with respect to their test channel. This can be shown following, for instance, the techniques of [23]. By the structural properties of the test channel derived in ([24], Section 4), if the source is first-order Markov, i.e., with distribution $P(dx_t | x_{t-1})$, $t \in \mathbb{N}_0^n$, the test channel distribution is of the form $P(dy_t | y^{t-1}, x_t)$, $t \in \mathbb{N}_0^n$. Finally, combining this structural result with ([25], Theorem 1.8.6), it can be shown that if $x^n$ is Gaussian, then a jointly Gaussian process $\{(x_t, y_t) : t \in \mathbb{N}_0\}$ achieves a smaller value of the information rates, and if $x^n$ is Gaussian and Markov, then the infimum in (26), (28) and (30) can be restricted to test channel distributions that are Gaussian, of the form $P(dy_t | y^{t-1}, x_t)$.
We recall that when the distortion constraint set contains only (9), its finite time horizon counterpart, or the per-instant distortion constraint $E\{||x_t - y_t||_2^2\} \le D_t$, $\forall t$, we end up with the well-known nonanticipatory ϵ-entropy [26], also found in the literature as the sequential or nonanticipative RDF [27,28]. The nonanticipatory ϵ-entropy has received significant interest during the last twenty years in an anthology of papers (see, e.g., [11,16,24,29,30,31]), due to its utility in control-related and delay-constrained applications. Moreover, the characterizations in (29) and (30) do not appear to be manageable using standard techniques, and no closed-form statements are available for the general RDF in the literature. For this reason, we seek only non-tight bounds.
In view of the above, in the sequel we characterize and compute lower bounds on Definitions 1–3 for Gauss–Markov processes and for Markov models driven by additive i.i.d. noise processes.

2.1. Characterization and Computation of Jointly Gaussian Processes

In this section, we assume that the augmented joint process $\{(x_t, y_t) : t \in \mathbb{N}_0\}$ is jointly Gaussian. We use this assumption first to characterize and then to optimally compute (26), (28) and (30).
We first state the following helpful lemma. We omit the proof because it is already derived in other papers; see, e.g., [14,24]. The only modification is the augmented joint process $\{(x_t, y_t) : t \in \mathbb{N}_0^n\}$.
Lemma 1 
(Realization of $\{P^*(dy_t | y^{t-1}, x_t) : t \in \mathbb{N}_0^n\}$). Consider the class of conditionally Gaussian test channels $\{P^*(dy_t | y^{t-1}, x_t) : t \in \mathbb{N}_0^n\}$. Then, the following statements hold.
(1)
Any candidate of $\{P^*(dy_t | y^{t-1}, x_t) : t \in \mathbb{N}_0^n\}$ can be realized by the recursion
$$y_t = H_t (x_t - \hat{x}_{t|t-1}) + \hat{x}_{t|t-1} + v_t, \quad \hat{x}_{0|-1} = \mathrm{given}, \quad t \in \mathbb{N}_0^n, \tag{31}$$
where $\hat{x}_{t|t-1} \triangleq E\{x_t | y^{t-1}\}$, $\{v_t \in \mathbb{R}^{p_1+p_2} \sim N(0; \Sigma_{v_t}) : t \in \mathbb{N}_0^n\}$ is an independent Gaussian process, independent of $\{w_t : t \in \mathbb{N}_0^{n-1}\}$ and $x_0$, and $\{H_t \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)} : t \in \mathbb{N}_0^n\}$ are time-varying deterministic matrices.
Moreover, the innovations process $\{\mathcal{I}_t \in \mathbb{R}^{p_1+p_2} : t \in \mathbb{N}_0^n\}$ of (31) is the orthogonal process defined by
$$\mathcal{I}_t \triangleq y_t - E\{y_t | y^{t-1}\} = H_t (x_t - \hat{x}_{t|t-1}) + v_t,$$
where $\mathcal{I}_t \sim N(0; \Sigma_{\mathcal{I}_t})$, $\Sigma_{\mathcal{I}_t} = H_t \Sigma_{t|t-1} H_t^T + \Sigma_{v_t}$ and $\Sigma_{t|t-1} \triangleq E\{(x_t - \hat{x}_{t|t-1})(x_t - \hat{x}_{t|t-1})^T | y^{t-1}\} = E\{(x_t - \hat{x}_{t|t-1})(x_t - \hat{x}_{t|t-1})^T\}$.
(2)
Let $\hat{x}_{t|t} \triangleq E\{x_t | y^t\}$ and $\Sigma_{t|t} \triangleq E\{(x_t - \hat{x}_{t|t})(x_t - \hat{x}_{t|t})^T | y^t\} = E\{(x_t - \hat{x}_{t|t})(x_t - \hat{x}_{t|t})^T\}$. Then, $\{\hat{x}_{t|t-1}, \Sigma_{t|t-1} : t \in \mathbb{N}_0^n\}$ satisfy the following vector-valued equations:
$$\begin{aligned} \hat{x}_{t|t-1} &= A \hat{x}_{t-1|t-1}, \\ \Sigma_{t|t-1} &= A \Sigma_{t-1|t-1} A^T + \Sigma_w, \\ \hat{x}_{t|t} &= \hat{x}_{t|t-1} + N_t \mathcal{I}_t, \\ N_t &= \Sigma_{t|t-1} H_t^T \Sigma_{\mathcal{I}_t}^{-1} \quad (\mathrm{Kalman\ gain}), \\ \Sigma_{t|t} &= \Sigma_{t|t-1} - \Sigma_{t|t-1} H_t^T \Sigma_{\mathcal{I}_t}^{-1} H_t \Sigma_{t|t-1}, \end{aligned} \tag{32}$$
where $\Sigma_{t|t} \succeq 0$ and $\Sigma_{t|t-1} \succeq 0$.
(3)
Using MMSE estimation via the vector-valued KF recursions of (32), the following finite-dimensional characterizations of $R^{LB,G}_{\mathrm{pd},[0,n]}(D)$, $R^{LB,G}_{\mathrm{joint},[0,n]}(D^*)$ and $R^{LB,G}_{[0,n]}(D^{\mathrm{cov}})$ can be obtained:
$$R^{LB,G}_{\mathrm{pd},[0,n]}(D) = \inf_{\substack{H_t \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)},\; \Sigma_{v_t} \succeq 0,\; t \in \mathbb{N}_0^n \\ 0 \le \tilde{\Sigma}_{ii,t} \le D_{ii},\; i = 1,\ldots,p}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Sigma_{t|t-1}|}{|\Sigma_{t|t}|} \right]^+, \tag{33}$$
$$R^{LB,G}_{\mathrm{joint},[0,n]}(D^*) = \inf_{\substack{H_t \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)},\; \Sigma_{v_t} \succeq 0,\; t \in \mathbb{N}_0^n \\ 0 \le \tilde{\Sigma}_{ii,t} \le D_{ii},\; i = 1,\ldots,p \\ \frac{1}{n+1}\sum_{t=0}^{n} \mathrm{trace}\left( (I_{p_1+p_2} - H_t)\Sigma_{t|t-1}(I_{p_1+p_2} - H_t)^T + \Sigma_{v_t} \right) \le D_T}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Sigma_{t|t-1}|}{|\Sigma_{t|t}|} \right]^+, \tag{34}$$
$$R^{LB,G}_{[0,n]}(D^{\mathrm{cov}}) = \inf_{\substack{H_t \in \mathbb{R}^{(p_1+p_2)\times(p_1+p_2)},\; \Sigma_{v_t} \succeq 0,\; t \in \mathbb{N}_0^n \\ \frac{1}{n+1}\sum_{t=0}^{n} \left[ (I_{p_1+p_2} - H_t)\Sigma_{t|t-1}(I_{p_1+p_2} - H_t)^T + \Sigma_{v_t} \right] \preceq D^{\mathrm{cov}}}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Sigma_{t|t-1}|}{|\Sigma_{t|t}|} \right]^+, \tag{35}$$
where $\tilde{\Sigma}_{ii,t} \triangleq \frac{1}{n+1}\sum_{t=0}^{n} \left[ (I_{p_1+p_2} - H_t)\Sigma_{t|t-1}(I_{p_1+p_2} - H_t)^T + \Sigma_{v_t} \right]_{ii} \ge 0$, $D \in [0, D^{\max}]$ and $D^* \in [0, D^{*,\max}]$.
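As a direct transcription of the covariance recursions in (32) (a Python sketch in our notation; the design sequence $\{(H_t, \Sigma_{v_t})\}$ is assumed given):

```python
import numpy as np

def kf_covariances(A, Sw, H_list, Sv_list, Sigma_init):
    """Iterate the prediction/update covariance recursions of (32)."""
    Sig_filt = Sigma_init                              # Sigma_{-1|-1}
    pred, filt = [], []
    for H, Sv in zip(H_list, Sv_list):
        Sig_pred = A @ Sig_filt @ A.T + Sw             # Sigma_{t|t-1}
        S_innov = H @ Sig_pred @ H.T + Sv              # innovations covariance
        N = Sig_pred @ H.T @ np.linalg.inv(S_innov)    # Kalman gain N_t
        Sig_filt = Sig_pred - N @ H @ Sig_pred         # Sigma_{t|t}
        pred.append(Sig_pred)
        filt.append(Sig_filt)
    return pred, filt
```

The per-stage rate term in (33)–(35) is then $\frac{1}{2}[\log(|\Sigma_{t|t-1}|/|\Sigma_{t|t}|)]^+$, computable from the two returned sequences.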
We note that one could directly study the finite-dimensional characterizations of Lemma 1, (3), and attempt numerical solutions. However, it is much more insightful to instead use the identification of the design parameters $\{(H_t, \Sigma_{v_t}) : t \in \mathbb{N}_0^n\}$ of the test-channel realization in (31). This approach is already carried out in [14,24]; hence, we state it without proof. Note, however, that compared to [14,24], which assume distortion constraints like (9) (or the per-time-instant counterpart of (9), i.e., $E\{||x_t - y_t||_2^2\} \le D_t$, $\forall t$), here we assume augmented state-space models and various spatio-temporal distortion constraints, namely per-dimension, jointly per-dimension and averaged total, and covariance matrix distortion constraints.
Lemma 2 
(Alternative characterizations of (33)–(35) via system identification). Consider Lemma 1 and set $\Delta_t \triangleq \Sigma_{t|t}$ and $\Lambda_t \triangleq \Sigma_{t|t-1}$. Then, the following statements hold.
(1)
The test-channel distribution $P(dy_t | y^{t-1}, x_t)$ admits the following linear Markov additive noise realization:
$$y_t = H_t x_t + (I_{p_1+p_2} - H_t) A y_{t-1} + v_t, \quad y_{-1} = \mathrm{given}, \quad t \in \mathbb{N}_0^n, \tag{36}$$
where
$$H_t \triangleq I_{p_1+p_2} - \Delta_t \Lambda_t^{-1}, \qquad \Sigma_{v_t} \triangleq \Delta_t H_t^T \succeq 0. \tag{37}$$
(2)
The finite-dimensional characterizations of $R^{LB,G}_{\mathrm{pd},[0,n]}(D)$, $R^{LB,G}_{\mathrm{joint},[0,n]}(D^*)$ and $R^{LB,G}_{[0,n]}(D^{\mathrm{cov}})$ can be simplified to the following alternative characterizations:
$$R^{LB,G}_{\mathrm{pd},[0,n]}(D) = \inf_{\substack{0 \le \Delta_{ii,t} \le D_{ii},\; i = 1,\ldots,p,\; t \in \mathbb{N}_0^n}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Lambda_t|}{|\Delta_t|} \right]^+, \tag{38}$$
$$R^{LB,G}_{\mathrm{joint},[0,n]}(D^*) = \inf_{\substack{0 \le \Delta_{ii,t} \le D_{ii},\; i = 1,\ldots,p,\; t \in \mathbb{N}_0^n \\ \frac{1}{n+1}\sum_{t=0}^{n} \mathrm{trace}(\Delta_t) \le D_T}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Lambda_t|}{|\Delta_t|} \right]^+, \tag{39}$$
$$R^{LB,G}_{[0,n]}(D^{\mathrm{cov}}) = \inf_{\substack{\frac{1}{n+1}\sum_{t=0}^{n} \Delta_t \preceq D^{\mathrm{cov}}}} \frac{1}{2}\sum_{t=0}^{n} \left[ \log\frac{|\Lambda_t|}{|\Delta_t|} \right]^+, \tag{40}$$
where $\Delta_{ii,t}$ is defined precisely as $\tilde{\Sigma}_{ii,t}$.
Next, we give some technical remarks related to Lemma 2.
Remark 3 
(Special case and technical remarks).
(1)
Clearly, if in the forward test-channel realization with additive noise we assume that the block diagonal matrix $A = 0$ (the null matrix), then we recover the classical forward test-channel realization for a vector memoryless Gaussian source subject to an MSE distortion (see, e.g., ([32], Chapter 4.5), ([15], Chapter 9.7)), given by
$$y_t = H_t x_t + v_t, \quad t \in \mathbb{N}_0^n, \tag{41}$$
and the coefficients in (37) give
$$H_t \triangleq I_{p_1+p_2} - \Delta_t \Sigma_w^{-1} \succeq 0, \qquad \Sigma_{v_t} \triangleq \Delta_t H_t^T \succeq 0. \tag{42}$$
Moreover, the characterizations in (38)–(40) change in that $\Lambda_t = \Sigma_w$. Clearly, (42) can be seen as reverse-waterfilling design parameters.
(2)
Compared to (1), we note that $H_t$ in (37) should not be confused with a positive semidefinite matrix defined via the usual quadratic form [33]; instead, it may be a non-symmetric matrix, which however has only real non-negative eigenvalues. This observation is important because it means that, in general, the design variables $(\Delta_t, \Lambda_t)$ do not commute as in the classical reverse-waterfilling problems for memoryless multivariate Gaussian random variables or i.i.d. processes (see, e.g., [19,25]).
(3)
For jointly Gaussian processes, the linear forward realization in (36) is the optimal realization among all realizations for this problem, because the KF is the optimal causal MMSE estimator. Beyond Gaussian processes, and when the noise is zero-mean, uncorrelated and white (in our setup these properties hold), the optimal realization for Gaussian processes becomes the best linear realization (see, e.g., ([34], §3.2) or ([35], p. 130)), and similarly the corresponding characterizations in (38)–(40) become the best linear characterizations. By "best linear" realization and characterizations, respectively, we mean that there may be non-linear realizations, and hence non-linear-based characterizations, that outperform the best linear ones.
(4)
The characterization of (39) is different from the characterization obtained in ([16], Theorem 1, (25e)), which uses weighted distortion constraints. The former optimization problem imposes hard constraints, whereas the latter imposes soft constraints via weights. Nonetheless, an interesting open question is whether there exists a set of weights that gives the same per-dimension distortion when imposed as a weighted total distortion constraint.
(5)
It should also be stressed that per-dimension constraints on the diagonal entries of $\Delta_t$ are not the same as constraints on the eigenvalues of $\Delta_t$. This further means that even for this class of distortion constraints, it is still possible to have rate-distortion resource allocation (i.e., a type of reverse-waterfilling optimization).
Remark 4 
(Convexity). The optimization problems in (38) and (39) are convex in their design variables, with affine and positive semidefinite constraints. Thus, the problem can be solved numerically using convex programming software (see, e.g., [36]) or via the more challenging KKT conditions, which are first-order necessary conditions for global optimality ([37], Chapter 5.3). The latter give certain non-linear matrix Riccati equations that need to be solved in order to construct a reverse-waterfilling algorithm.
Remark 5 
(Existence of Solution). A sufficient condition for the existence of a solution with a finite value in (38)–(40) is to consider the strict LMI constraint $0 \prec \Delta_t \preceq \Lambda_t$, which ensures that the objective function is bounded. The strict LMI ensures that $\Delta_t \succ 0$, which further means that $D > 0$, $D_T > 0$ and $D^{\mathrm{cov}} \succ 0$.
In what follows, we derive lower bounds on (11)–(13).
Lemma 3 
(Steady-state lower bounds on (11)–(13)). Suppose that the conditions of Remark 5 hold. Moreover, let $\Delta \triangleq \frac{1}{n+1}\sum_{t=0}^{n} \Delta_t$ for some finite $n$. Then, the following statements hold.
(1)
$$R^{c}_{\mathrm{pd}}(D) \ge \min_{\substack{0 \prec \Delta \preceq \Lambda \\ \Delta_{ii} \le D_{ii},\; i = 1,\ldots,p}} \frac{1}{2}\log\frac{|\Lambda|}{|\Delta|}, \tag{43}$$
where $\Lambda = A \Delta A^T + \Sigma_w$.
(2)
$$R^{c}_{\mathrm{joint}}(D^*) \ge \min_{\substack{0 \prec \Delta \preceq \Lambda \\ \Delta_{ii} \le D_{ii},\; i = 1,\ldots,p \\ \mathrm{trace}(\Delta) \le D_T}} \frac{1}{2}\log\frac{|\Lambda|}{|\Delta|}, \tag{44}$$
for some $D^* = \min\{D_T, D\}$.
(3)
$$R^{c}(D^{\mathrm{cov}}) \ge \min_{\substack{0 \prec \Delta \preceq \Lambda \\ 0 \prec \Delta \preceq D^{\mathrm{cov}}}} \frac{1}{2}\log\frac{|\Lambda|}{|\Delta|}. \tag{45}$$
Proof. 
See Appendix A. □
It should be remarked that, instead of the derivation based on a convexity argument in Lemma 3, one can assume that the optimal minimizer $P(dy_t | y^{t-1}, x_t)$ achieving (43)–(45) is time-invariant and that the output distribution $P(dy_t | y^{t-1})$ is also time-invariant with a unique invariant distribution; see, e.g., ([14], Theorem 3). Moreover, the optimal linear forward test channel that achieves the lower bounds in (43)–(45) corresponds to the time-invariant version of realization (36), given by
$$y_t = H x_t + (I_{p_1+p_2} - H) A y_{t-1} + v_t, \tag{46}$$
whereas the corresponding time-invariant scaling coefficients in (37) are as follows:
$$H \triangleq I_{p_1+p_2} - \Delta \Lambda^{-1}, \qquad \Sigma_v \triangleq \Delta H^T \succeq 0. \tag{47}$$
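Given a feasible steady-state $\Delta$, the realization parameters in (46)–(47) and the corresponding rate follow in a few lines (a sketch in our notation):

```python
import numpy as np

def steady_state_channel(A, Sw, Delta):
    """Compute (H, Sigma_v) of (47) and the rate (1/2) log |Lambda|/|Delta|."""
    p = A.shape[0]
    Lam = A @ Delta @ A.T + Sw                    # Lambda = A Delta A^T + Sigma_w
    H = np.eye(p) - Delta @ np.linalg.inv(Lam)    # H = I - Delta Lambda^{-1}
    Sv = Delta @ H.T                              # Sigma_v = Delta H^T (PSD when 0 < Delta <= Lambda)
    rate_bits = 0.5 * np.log2(np.linalg.det(Lam) / np.linalg.det(Delta))
    return H, Sv, rate_bits
```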
From Lemma 3, the following corollary can be immediately obtained.
Corollary 1 
(Fixed design variable $\Delta$). If in Lemma 2 we assume that $\Delta_t = \Delta$, $\forall t$, then we obtain (43)–(45).
Proof. 
This is immediate from the derivation of Lemma 3. □
In what follows, we show that the lower bounds in Lemma 3 are semidefinite representable and can thus be readily computed.
Theorem 2 
(Computation of the lower bounds in Lemma 3). Consider the variable $Q_1 \triangleq (\Delta^{-1} + A^T \Sigma_w^{-1} A)^{-1}$, where $\Delta \succ 0$. Then, the following semidefinite programming representations hold.
(1)
For some $D \triangleq \mathrm{trace}(\hat{D}) > 0$, the lower bound in (43), denoted hereinafter by $R^{LB}_{\mathrm{pd}}(D)$, is semidefinite representable as follows:
$$R^{LB}_{\mathrm{pd}}(D) = \min_{Q_1 \succ 0} \; -\frac{1}{2}\log|Q_1| + \frac{1}{2}\log|\Sigma_w| \quad \mathrm{s.t.} \quad 0 \preceq \Delta \preceq \Lambda, \quad \Delta_{ii} \le D_{ii},\; i = 1,\ldots,p, \tag{48}$$
$$\begin{bmatrix} \Delta - Q_1 & \Delta A^T \\ A \Delta & \Lambda \end{bmatrix} \succeq 0. \tag{49}$$
(2)
For some $D^* = \min\{D_T, D\} > 0$, the lower bound in (44), denoted hereinafter by $R^{LB}_{\mathrm{joint}}(D^*)$, is semidefinite representable as follows:
$$R^{LB}_{\mathrm{joint}}(D^*) = \min_{Q_1 \succ 0} \; -\frac{1}{2}\log|Q_1| + \frac{1}{2}\log|\Sigma_w| \quad \mathrm{s.t.} \quad 0 \preceq \Delta \preceq \Lambda, \quad \Delta_{ii} \le D_{ii},\; i = 1,\ldots,p, \quad \mathrm{trace}(\Delta) \le D_T, \quad \begin{bmatrix} \Delta - Q_1 & \Delta A^T \\ A \Delta & \Lambda \end{bmatrix} \succeq 0. \tag{50}$$
(3)
For some $D^{\mathrm{cov}} \succ 0$, the lower bound in (45), denoted hereinafter by $R^{LB}(D^{\mathrm{cov}})$, is semidefinite representable as follows:
$$R^{LB}(D^{\mathrm{cov}}) = \min_{Q_1 \succ 0} \; -\frac{1}{2}\log|Q_1| + \frac{1}{2}\log|\Sigma_w| \quad \mathrm{s.t.} \quad 0 \preceq \Delta \preceq \Lambda, \quad \Delta \preceq D^{\mathrm{cov}}, \quad \begin{bmatrix} \Delta - Q_1 & \Delta A^T \\ A \Delta & \Lambda \end{bmatrix} \succeq 0. \tag{51}$$
Proof. 
See Appendix B. □
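The paper's numerics use the CVX platform [36]; purely as an illustration of the representation (48)–(49), the following is a minimal CVXPY sketch (our translation, not the authors' implementation) of $R^{LB}_{\mathrm{pd}}(D)$:

```python
import cvxpy as cp
import numpy as np

def R_pd_LB(A, Sw, D_diag):
    """SDP (48)-(49): minimize -(1/2) log|Q1| + (1/2) log|Sigma_w|
    s.t. 0 << Delta << Lambda, Delta_ii <= D_ii, and the Schur-complement LMI."""
    p = A.shape[0]
    Q1 = cp.Variable((p, p), symmetric=True)
    Delta = cp.Variable((p, p), symmetric=True)
    Lam = A @ Delta @ A.T + Sw                 # Lambda = A Delta A^T + Sigma_w (affine)
    lmi = cp.bmat([[Delta - Q1, Delta @ A.T],
                   [A @ Delta, Lam]])          # encodes Q1 <= (Delta^{-1} + A^T Sw^{-1} A)^{-1}
    cons = [Delta >> 0, Lam - Delta >> 0, lmi >> 0, cp.diag(Delta) <= D_diag]
    prob = cp.Problem(cp.Minimize(-0.5 * cp.log_det(Q1)
                                  + 0.5 * np.log(np.linalg.det(Sw))), cons)
    prob.solve()
    return prob.value / np.log(2)              # nats to bits
```

For (50), one would additionally impose `cp.trace(Delta) <= D_T`; for (51), the per-dimension constraint is replaced by `Delta << D_cov`.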
Next, we stress some comments on the semidefinite representation of the lower bounds in Theorem 2.
Remark 6 
(Comments on Theorem 2).
(1) 
We note that a characterization similar to those derived in Theorem 2 (subject to the distortion constraint (9), or the per-instant distortion constraint $E\{||x_t - y_t||_2^2\} \le D_t$, $\forall t$, for a special case of the setup in Figure 1) was recently derived in ([16], Equation (27)). The log-determinant convex optimization problems in Theorem 2 are widely used in systems and control theory because they can deal efficiently with LMIs [38].
(2) 
Recently, the efficiency of the SDP algorithm in solving linear and non-linear optimization problems has attracted experts from the field of information theory, who have noticed its usefulness in solving distributed source coding problems (see, e.g., [3,39]). Such log-determinant problems, when solved using the semidefinite programming method, are known to have polynomial worst-case complexity (see, e.g., [40]). In addition, for an interior point method such as the SDP approach, the most computationally expensive step is the Cholesky factorization involved in the Newton steps.
(3) 
On the other hand, due to its complexity, the SDP approach is often time consuming for high-dimensional systems, whereas for very large scale systems it is occasionally impossible to obtain numerical solutions. Hence, one could preferably consider alternative methods, sacrificing for instance the optimality of the SDP algorithm but gaining in scalability and reduced complexity. The most computationally efficient way to solve such problems and, additionally, to gain some insight from the solution is via the well-known reverse-waterfilling algorithm ([19], Theorem 10.3.3), which is however very hard to construct and compute, because one needs to employ and solve complicated KKT conditions [37]. Such an effort was recently made for multivariate Gauss–Markov processes under per-instant, averaged total and asymptotically averaged total distortion constraints in [24,41].
Next, we perform some numerical illustrations using the semidefinite representations of Theorem 2. We also compare (48) and (50) to the known expression obtained only for the asymptotically averaged total MSE distortion constraint in ([16], Equation (27)). We note that the SDP algorithm for (48)–(51) is implemented using the CVX platform [36].
Example 1 
(Comparison of $R^{LB}_{\mathrm{joint}}(D^*)$, $R^{LB}_{\mathrm{pd}}(D)$ and ([16], Equation (27))). For the system in (1), we assume that user 1 is described by an $\mathbb{R}^2$-valued time-invariant Markov source driven by an i.i.d. Gaussian noise process with parameters $(A^1, \Sigma_{w^1})$:
$$(A^1, \Sigma_{w^1}) = \left( \begin{bmatrix} 0.5 & 0.2 \\ 0.3 & 0.6 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right), \tag{52}$$
whereas user 2 is described by an $\mathbb{R}^3$-valued time-invariant Markov source driven by an i.i.d. Gaussian noise process with parameters $(A^2, \Sigma_{w^2})$:
$$(A^2, \Sigma_{w^2}) = \left( \begin{bmatrix} 0.5 & 0.2 & 0.1 \\ 0.3 & 0.6 & 0.1 \\ 0.7 & 0.3 & 0.4 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.2 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} \right). \tag{53}$$
Clearly, the augmented state-space model (2) gives $A = A^1 \oplus A^2$ and $\Sigma_w = \Sigma_{w^1} \oplus \Sigma_{w^2}$. For this example, we assume that $D_T = 1.5$ and $D_{11} = 0.1$, $D_{22} = 0.01$, $D_{33} = 0.6$, $D_{44} = 0.15$, $D_{55} = 0.1$, which implies that $D = 0.96$. This means that $D^* = \min\{D_T, D\} = 0.96$.
In Figure 3, we compare the numerical solutions of $R^{LB}_{\mathrm{joint}}(D^*)$ and $R^{LB}_{\mathrm{pd}}(D)$ with ([16], Equation (27)), denoted hereinafter as $R^{LB}(D_T)$.
Based on this numerical study, we observe that for distortion levels in $(0, D^* = D]$, $R^{LB}_{\mathrm{joint}}(D^*) \ge R^{LB}_{\mathrm{pd}}(D)$, whereas for values of $D_T$ greater than $D^*$ we observe that $R^{LB}_{\mathrm{joint}}(D^*) = R^{LB}_{\mathrm{pd}}(D)$, because the asymptotically averaged total MSE distortion constraint is inactive. This observation verifies our comment in Section 1.3 regarding the connection between (11) and (12). Clearly, at high rates (or high resolution) we observe that $R^{LB}_{\mathrm{joint}}(D^*) \ge R^{LB}(D_T)$.
Another interesting observation (illustrated in Figure 4) is that if in the same example we allocate the total budget of per-dimension distortion equally, i.e., $D_{ii} = D_{jj}$, $i \neq j$, then for distortion levels in $(0, D^* = D]$ we observe that $R^{LB}_{\mathrm{joint}}(D^*) = R^{LB}(D_T) \ge R^{LB}_{\mathrm{pd}}(D)$.
Example 2 
(Covariance matrix distortion constraint). For the system in (1), we assume that user 1 is described by an $\mathbb{R}^2$-valued time-invariant Markov source driven by an i.i.d. Gaussian noise process with parameters $(A^1, \Sigma_{w^1})$:
$$(A^1, \Sigma_{w^1}) = \left( \begin{bmatrix} 0.5 & 0.2 \\ 0.3 & 0.6 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right), \tag{54}$$
whereas user 2 is described by an $\mathbb{R}$-valued time-invariant Markov source driven by an i.i.d. Gaussian noise process with parameters $(A^2, \Sigma_{w^2})$:
$$(A^2, \Sigma_{w^2}) = (0.6, 2). \tag{55}$$
The augmented state-space model (2) is generated by $A = A^1 \oplus A^2$ and, similarly, $\Sigma_w = \Sigma_{w^1} \oplus \Sigma_{w^2}$. For this example, we assume a covariance matrix distortion constraint given by
$$D^{\mathrm{cov}} = \begin{bmatrix} 1.5 & \gamma & \gamma \\ \gamma & 1 & \gamma \\ \gamma & \gamma & 0.5 \end{bmatrix}, \tag{56}$$
where $\gamma > 0$ is the positive correlation coefficient between the distortion matrix components (i.e., the diagonal entries) and is chosen such that $D^{\mathrm{cov}} \succeq 0$.
In Figure 5 we demonstrate a comparison of $R^{LB}(D^{\mathrm{cov}})$ evaluated for several different values of $\gamma$. One interesting observation is that higher distortion correlation in (56) leads to fewer bits, up to a $\gamma^{\max} \approx 0.53$, beyond which the value of $R^{LB}(D^{\mathrm{cov}})$ remains unchanged. Another interesting observation is that for negative correlation $\gamma$, the approximation via SDP does not return a value. However, this is not the case in general (see, e.g., ([42], Example 1)).
Using the same simulation study, we can arrive at an interesting connection between the approximations in (51) and (48). In particular, if in (56) we restrict the matrix distortion constraint to only the main diagonal elements (i.e., exactly like the per-dimension constraints), then we obtain the plot of Figure 6, which clearly demonstrates that $R^{LB}(D^{\mathrm{cov}}) = R^{LB}_{\mathrm{pd}}(D)$. In fact, restricting the covariance matrix distortion constraint of (56) to the per-dimension distortion constraint is as if we optimize over a solution space in which $\gamma$ is allowed to take any value in $\mathbb{R}$. As a result, the feasible set of solutions is larger when the constraint set is subject to per-dimension distortion constraints rather than the covariance matrix distortion constraint.
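The admissible range of $\gamma$ can be verified numerically; a small sketch (our notation) that scans $\gamma$ for positive semidefiniteness of (56):

```python
import numpy as np

def dcov(gamma):
    """The covariance matrix distortion constraint of (56)."""
    D = np.full((3, 3), gamma)
    np.fill_diagonal(D, [1.5, 1.0, 0.5])
    return D

# Report the largest gamma on a grid keeping D_cov positive semidefinite.
grid = np.linspace(0.0, 1.0, 1001)
psd = [g for g in grid if np.linalg.eigvalsh(dcov(g)).min() >= -1e-9]
print(f"largest PSD gamma on the grid: {max(psd):.3f}")
```

Note that this checks feasibility of $D^{\mathrm{cov}} \succeq 0$ only; the saturation value $\gamma^{\max} \approx 0.53$ observed in Figure 5 concerns the behavior of the rate itself.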

2.2. Analytical Lower Bounds for Markov Sources Driven by Additive i.i.d. Noise Processes

In this subsection, we derive analytical lower bounds on (11)–(13) when the source model describing the behavior of user 1 or user 2 is driven by a possibly non-Gaussian i.i.d. noise process.
We first give a lemma that facilitates the derivation of our lower bounds. We only consider the case of RDFs subject to per-dimension distortion constraints, because the other classes of distortion constraints follow similarly.
Lemma 4 
(Rate-distortion bounds). For the augmented source model describing the behavior of users 1, 2 in (3), the following inequalities hold, assuming distortion constraints in the class of (8):
$$R^{LB}_{\mathrm{pd}}(D) \overset{(a)}{\le} R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D) \le R^{c}_{\mathrm{pd}}(D), \tag{57}$$
where
$$R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D) \triangleq \inf_{\substack{P^{\mathrm{linear}}(dy_t | y^{t-1}, x_t):\, t \in \mathbb{N}_0 \\ \Sigma^\Delta_{ii} \le D_{ii},\; i = 1,\ldots,p}} \limsup_{n\to\infty} \frac{1}{n+1}\sum_{t=0}^{n} I(x_t; y_t | y^{t-1}), \tag{58}$$
and (a) holds with equality if the augmented state-space model described in (2) is jointly Gaussian and the optimal minimizer $P^*(dy_t | y^{t-1}, x_t)$ of $R^{LB}_{\mathrm{pd}}(D)$ is conditionally Gaussian. Equality in (a) holds trivially at $D^{\max}$.
Proof. 
The RHS inequality follows from Theorem 1 and (43), whereas the LHS inequality follows from the fact that the constraint set of $R^{LB}_{\mathrm{pd}}(D)$ is larger than the constraint set of $R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D)$, which is restricted to linear coding policies. Now, under the specific augmented source model in (3), and using Lemma 2, (1), we obtain $R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D)$ defined as in (58), because these are the best linear coding policies, since the KF algorithm is the best linear causal MSE estimator beyond additive Gaussian noise processes (see the discussion in Remark 3, (3)). Clearly, if the augmented source in (3) is jointly Gaussian and the optimal realization of $R^{LB}_{\mathrm{pd}}(D)$ is conditionally Gaussian, then the system model is jointly Gaussian and the optimal policies are linear, given by the forward linear test channel realization obtained in (36); hence, the LHS inequality holds with equality. □
Remark 7 
(Comments on Lemma 4). We note that Lemma 4 also holds if we assume RDFs with distortion constraints in the class of (9) or (10).
The following theorem is a major result of this paper.
Theorem 3 
(Analytical lower bounds on (11)–(13)). Consider the source models of users 1, 2 in (1). Then, the following analytical lower bounds on (11)–(13) hold.
(1)
For $D_{ii} > 0$, $\forall i$, we obtain
$$R^{c}_{\mathrm{pd}}(D) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\, N(w)}{D} \right), \tag{59}$$
where $D \in \left( 0, \frac{(p_1+p_2)\, N(w)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right]$, with $N(w) = \frac{1}{2\pi e}\, 2^{\frac{2}{p_1+p_2} h(w)}$ and $h(w) > -\infty$.
(2)
For $D_T > 0$ and $D_{ii} > 0$, $\forall i$, we obtain
$$R^{c}_{\mathrm{joint}}(D^*) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\, N(w)}{D^*} \right), \tag{60}$$
where $D^* \in \left( 0, \frac{(p_1+p_2)\, N(w)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right]$, with $N(w)$ defined as in (1) and $h(w) > -\infty$.
(3)
For $D^{\mathrm{cov}} \succ 0$, we obtain
$$R^{c}(D^{\mathrm{cov}}) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{N(w)}{|D^{\mathrm{cov}}|^{\frac{1}{p_1+p_2}}} \right), \tag{61}$$
where $|D^{\mathrm{cov}}| \in \left( 0, \left( \frac{N(w)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right)^{p_1+p_2} \right]$, with $N(w)$ defined as in (1) and $h(w) > -\infty$.
Proof. 
See Appendix C. □
The following technical remarks can be made regarding Theorem 3.
Remark 8. 
(1)
Note that if in Theorem 3 we allow $h(w) = -\infty$, then the analytical lower bound expressions may take a negative finite value or $-\infty$, which cannot be the case (the RDF is, by definition, non-negative). A way to include the case where $h(w)$ is allowed to be $-\infty$ in our lower bound expressions is to set the objective functions in (59)–(61) to $[\log(\cdot)]^+$. This means that whenever $h(w) = -\infty$, the analytical lower bound expression will be zero.
(2)
The analytical lower bounds in (59)–(61) do not correspond to the best linear forward test channel realization of Lemma 3 (see (46)), which is also the optimal policy under the assumption of an MMSE decoder when the system's noise is purely Gaussian (see Remark 3, (3)). Moreover, it is not clear what realization achieves them in the same way the bounds in Lemma 3 are achieved for Gaussian processes.
(3)
If in Theorem 3 we assume that users 1, 2 have source models described by Markov processes driven by additive Gaussian noise processes, then from the EPI (see, e.g., ([43], Equation (7))), $N(w) = |\Sigma_w|^{\frac{1}{p_1+p_2}}$, and (59)–(61) change accordingly.
(4)
One could choose to further bound (61) using the inequality $|D^{\mathrm{cov}}|^{\frac{1}{p_1+p_2}} \le \frac{\mathrm{trace}(D^{\mathrm{cov}})}{p_1+p_2}$, obtaining a further lower bound that coincides with the lower bound in (59) (see also the discussion in Example 2). Such a lower bound would mean that we extend the set of feasible solutions corresponding to the initial problem statement (13) to be similar to that of the initial problem statement (11), which cannot be the case in general. Our bound in (61) encapsulates the off-diagonal elements of the distortion covariance matrix $D^{\mathrm{cov}}$; hence, it is an appropriate lower bound for this specific problem.
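For a quick numerical read of Theorem 3, the bound (59) and its Gaussian specialization from (3) above can be evaluated directly (a sketch in our notation):

```python
import numpy as np

def pd_lower_bound_bits(A, D, Nw):
    """Evaluate (59): ((p1+p2)/2) log2( |A^T A|^{1/(p1+p2)} + (p1+p2) N(w)/D )."""
    p = A.shape[0]
    g = np.linalg.det(A.T @ A) ** (1.0 / p)      # |A^T A|^{1/(p1+p2)}
    return 0.5 * p * np.log2(g + p * Nw / D)

def entropy_power_gaussian(Sw):
    """N(w) for Gaussian w (Remark 8, (3)): |Sigma_w|^{1/(p1+p2)}."""
    return np.linalg.det(Sw) ** (1.0 / Sw.shape[0])
```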
In what follows, we give a numerical simulation where we compare the solution of $R^{LB}_{\mathrm{joint}}(D^*)$ (which corresponds to the lower bound achieved by the optimal coding policies when the system is driven by additive i.i.d. Gaussian noise processes), computed via the SDP representation of (50), with the lower bound obtained in (60) when the system's noise is also Gaussian.
Example 3 
(Comparison of (50) with (60) for jointly Gaussian processes). We consider the same input data assumed in Example 1 for users 1, 2. Then, we proceed to compute the true lower bound of (50) and the lower bound obtained in (60).
Our simulation study in Figure 7 shows that at high rates the performance of the two bounds is almost identical, whereas at moderate and low rates we observe a gap that remains constant when $D_T \ge D$, i.e., when the asymptotically averaged total distortion constraint is inactive. The same behavior is expected for systems of larger dimension (larger-scale optimization problems), with a possibly increased gap at moderate and low rates, depending on the structure of the block diagonal matrices $A$ and $\Sigma_w$.
Next, we state a corollary of Theorem 3.
Corollary 2 
(Analytical bounds when users 1, 2 are not specified by the same additive noise process). Consider the source models of users 1, 2 in (1). Moreover, assume that $w_t^1 \sim N(0; \Sigma_{w^1})$ and $w_t^2 \sim (0; \Sigma_{w^2})$, with $\Sigma_{w^1} \preceq I_{p_1}$, $\Sigma_{w^2} \preceq I_{p_2}$ and $h(w^2) > -\infty$. Then, the following analytical lower bounds on (11)–(13) hold.
(1)
For $D_{ii} > 0$, $\forall i$, we obtain
$$R^{c}_{\mathrm{pd}}(D) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\, |\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{D} \right), \tag{62}$$
where $D \in \left( 0, \frac{(p_1+p_2)\, |\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right]$, with $N(w^2) = \frac{1}{2\pi e}\, 2^{\frac{2}{p_2} h(w^2)}$.
(2)
For $D_T > 0$ and $D_{ii} > 0$, $\forall i$, we obtain
$$R^{c}_{\mathrm{joint}}(D^*) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\, |\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{D^*} \right), \tag{63}$$
where $D^* \in \left( 0, \frac{(p_1+p_2)\, |\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right]$.
(3)
For $D^{\mathrm{cov}} \succ 0$, we obtain
$$R^{c}(D^{\mathrm{cov}}) \ge \frac{p_1+p_2}{2}\log\left( |A^T A|^{\frac{1}{p_1+p_2}} + \frac{|\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{|D^{\mathrm{cov}}|^{\frac{1}{p_1+p_2}}} \right), \tag{64}$$
where $|D^{\mathrm{cov}}| \in \left( 0, \left( \frac{|\Sigma_{w^1}|^{\frac{1}{p_1}} N(w^2)}{1 - |A^T A|^{\frac{1}{p_1+p_2}}} \right)^{p_1+p_2} \right]$.
Proof. 
All cases (1)–(3) follow almost identical steps to the derivation of Theorem 3. The only different but crucial step lies in (A6), where we use the fact that
$$|\Sigma_w|^{\frac{1}{p_1+p_2}} \overset{(a)}{=} |\Sigma_{w^1}|^{\frac{1}{p_1+p_2}}\, |\Sigma_{w^2}|^{\frac{1}{p_1+p_2}} \overset{(b)}{\ge} |\Sigma_{w^1}|^{\frac{1}{p_1}}\, |\Sigma_{w^2}|^{\frac{1}{p_2}} \overset{(c)}{\ge} |\Sigma_{w^1}|^{\frac{1}{p_1}}\, N(w^2), \tag{65}$$
where (a) follows from properties of block diagonal matrices ([33], Section 0.9.2); (b) follows from the conditions of the corollary on the noise covariance matrices; (c) follows from the EPI ([43], Equation (7)). □
One can deduce the following for Corollary 2.
Remark 9. 
Corollary 2 gives similar analytical lower bounds (with appropriate modifications) if, instead of user 1, we assume that the source model of user 2 is driven by a Gaussian noise process. The additional assumption on the covariance matrices of the noise processes of both users is imposed because otherwise we cannot guarantee that the key series of inequalities in (65) is satisfied.

3. Upper Bounds

In this section, we explain how to encode the augmented vector-valued Markov source modeled by (3) using a sequential causal DPCM scheme with a feedback loop followed by an ECDQ. The scheme relies on the linear forward test channel realization of the bounds in Lemma 2. The precursor of the DPCM-based scheme with a feedback loop is [14], whereas ECDQ is a classical source coding approach with standard performance guarantees in information theory (see, e.g., [44]). The ECDQ scheme is utilized to bound the rate performance of the DPCM scheme. This approach furnishes an achievable (upper) bound on the operational causal RDFs in (11)–(13).

3.1. DPCM with Feedback Loop

First, we briefly describe the sequential causal DPCM scheme with a feedback loop introduced in ([14], Figure 2) (see also [45]). Observe that, because the augmented source is modeled as a first-order multidimensional Markov process, sequential causal coding is precisely equivalent to a predictive coding paradigm (see, e.g., [14,46]).
At each time instant $t$, the encoder (or innovations encoder) performs the linear operation
$$\hat{x}_t = x_t - A y_{t-1}, \tag{66}$$
where at $t = 0$ we assume initial data $\hat{x}_0 = x_0$, and $y_{t-1} \triangleq E\{x_{t-1} | m^{t-1}\}$, i.e., an estimate of $x_{t-1}$ given the previous quantized symbols $m^{t-1}$. (Note that the process $\hat{x}_t$ has a temporal correlation, since it subtracts the error of $x_t$ given all previous quantized symbols $m^{t-1}$ and not the infinite past of the source. Hence, $\hat{x}_t$ is only an estimate of the true process, and this causes part of the sub-optimality of this scheme.) Then, by means of an $\mathbb{R}^{p_1+p_2}$-valued MMSE quantizer operating at rate $R_t$, we generate the quantized reconstruction $\hat{y}_t$ of the residual source $\hat{x}_t$, given by $\hat{y}_t = y_t - A y_{t-1}$. Then, we send $m_t$ (the data packet corresponding to $\hat{y}_t$) over the channel. At the decoder, we receive $m_t$ and recover the quantized symbol $\hat{y}_t$ of $\hat{x}_t$.
Then, we generate the estimate y t using the linear operation
$$y_t = \hat{y}_t + A y_{t-1}. \tag{67}$$
Combining (66) and (67), we obtain
$$x_t - y_t = \hat{x}_t - \hat{y}_t. \tag{68}$$
From (68), we immediately deduce that the error between $x_t$ and $y_t$ equals the quantization error between $\hat{x}_t$ and $\hat{y}_t$, which means that the MSE distortion at each time instant satisfies
$$E\{||x_t - y_t||_2^2\} = E\{||\hat{x}_t - \hat{y}_t||_2^2\}. \tag{69}$$
In addition, the covariance matrix $\Sigma^\Delta$ yields
$$E\{(x_t - y_t)(x_t - y_t)^T\} = E\{(\hat{x}_t - \hat{y}_t)(\hat{x}_t - \hat{y}_t)^T\}. \tag{70}$$
A pictorial view of the DPCM scheme with feedback loop is given in Figure 8.
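One pass of the DPCM loop (66)–(67) can be summarized as follows (a Python sketch in our notation; `quantize` stands in for the $\mathbb{R}^{p_1+p_2}$-valued MMSE quantizer):

```python
import numpy as np

def dpcm_step(x_t, y_prev, A, quantize):
    """One DPCM step: form the residual (66), quantize it, reconstruct via (67)."""
    x_hat = x_t - A @ y_prev          # residual (66)
    y_hat = quantize(x_hat)           # quantized residual (sent as packet m_t)
    y_t = y_hat + A @ y_prev          # reconstruction (67)
    # Property (68): the overall error equals the quantization error.
    assert np.allclose(x_t - y_t, x_hat - y_hat)
    return y_t
```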

3.2. Bounding (11)–(13) via a DPCM-Based ECDQ for Gaussian Noise Processes

In this subsection, we bound the rate performance of the DPCM scheme described in Section 3.1 in the infinite time horizon, using a scheme that utilizes the steady-state linear forward test-channel realization achieving the lower bounds of Lemma 3. Essentially, in this scheme we replace the quantization noise with an additive Gaussian noise with the same second moments (see, e.g., [47] or ([44], Chapter 5) and the references therein).
Recall that the steady-state linear forward test-channel realization of the lower bounds in Lemma 3 is written as follows:
$$y_t = H x_t + (I_{p_1+p_2} - H) A y_{t-1} + v_t, \tag{71}$$
whereas the steady-state reverse-waterfilling parameters $(H, \Sigma_v)$ are given by
$$H \triangleq I_{p_1+p_2} - \Delta \Lambda^{-1}, \qquad \Sigma_v = H \Delta \succeq 0. \tag{72}$$
The forward test-channel realization of (71) is illustrated in Figure 9.
Before we proceed, we point out the following important technical remarks on the realization of (71) and the coefficients (72).
Remark 10 
(Observations (71) and (72)). The linear forward test channel realization with additive noise in (71) is equivalent to the steady-state realization in (46) because for both it can be shown that the MSE distortion constraint is achieved (i.e., v t N ( 0 ; Σ v ) , Σ v = H Δ = Δ H T 0 ). Moreover, this realization is equivalent but simpler to build compared to the forward test channel realization introduced in [14] in which non-singular matrices and diagonalization by congruence is assumed (see ([14], Theorem 4)).
In the test channel realization of Figure 9, a reverse-waterfilling in the spatial dimension is possible when we assume asymptotically averaged total MSE distortion constraints similar to ([14], Theorem 4). This reverse-waterfilling is dictated by the rank of the matrix H: if H is full rank, then all spatial dimensions in the system are active, whereas if H is rank deficient, then some dimensions are inactive (these dimensions form the null space of H, whose dimension is the nullity of H); for these dimensions the rate is zero, hence they can be excluded from the realization of Figure 9. In the sequel, we present simulations that study the reverse-waterfilling in the spatial domain under a certain distortion constraint studied in this paper.
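As a quick numerical illustration of (71) and (72), the following sketch builds $(H, \Sigma_v)$ from a given pair $(\Delta, \Lambda)$, checks the symmetry and positive semidefiniteness of $\Sigma_v$ noted in Remark 10, and reports $\mathrm{rank}(H)$, which counts the active spatial dimensions. The matrix values are illustrative assumptions, not data from the examples of the paper.

```python
import numpy as np

p = 3
A = 0.5 * np.eye(p)                          # stable augmented dynamics (assumption)
Sigma_w = np.eye(p)                          # driving-noise covariance (assumption)
Delta = 0.4 * np.eye(p)                      # steady-state error covariance (assumption)
Lam = A @ Delta @ A.T + Sigma_w              # Lambda = A Delta A^T + Sigma_w

H = np.eye(p) - Delta @ np.linalg.inv(Lam)   # (72): H = I - Delta Lambda^{-1}
Sigma_v = H @ Delta                          # (72): Sigma_v = H Delta

# Sigma_v should coincide with Delta H^T and be positive semidefinite (cf. Remark 10)
print(np.allclose(Sigma_v, Delta @ H.T))                              # True
print(np.all(np.linalg.eigvalsh((Sigma_v + Sigma_v.T) / 2) >= -1e-12))  # True
print(np.linalg.matrix_rank(H))              # number of active spatial dimensions
```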
Pre/post filtered ECDQ with multiplicative factors for augmented multivariate Gauss–Markov sources and spatial reverse-waterfilling. First, we consider a $\mathrm{rank}(H)$-dimensional lattice quantizer $Q^{\mathrm{rank}(H)}(\cdot)$ [48] such that
$$\mathbf{E}\{z_t z_t^T\} = \Sigma_{v^c}, \quad \Sigma_{v^c} \succ 0, \quad (73)$$
where $z_t \in \mathbb{R}^{\mathrm{rank}(H)}$ is a random dither vector generated both at the encoder and the decoder, independent of the input signals $\hat{x}_t$ and of the previous realizations of the dither, and uniformly distributed over the basic Voronoi cell of the $\mathrm{rank}(H)$-dimensional lattice quantizer $Q^{\mathrm{rank}(H)}(\cdot)$, such that $v_t^c \sim \mathrm{Unif}(0; \Sigma_{v_t^c})$. At the encoder, the lattice quantizer quantizes $H\hat{x}_t + z_t$, that is, it forms $Q^{\mathrm{rank}(H)}(H\hat{x}_t + z_t)$, where $\hat{x}_t$ is given by (66). Then, the encoder applies conditional entropy coding to the output of the quantizer and transmits the output of the entropy coder. At the decoder, the coded bits are received and the output of the quantizer is reconstructed, i.e., $Q^{\mathrm{rank}(H)}(H\hat{x}_t + z_t)$. The decoder then generates an estimate by subtracting $z_t$ from the quantizer's output and multiplying the result by $I_{\mathrm{rank}(H)}$ ($I_{\mathrm{rank}(H)}$ denotes the identity matrix with dimensions according to the rank of H; this identity matrix could be omitted, but we include it here for completeness) as follows:
$$y_t = I_{\mathrm{rank}(H)}\left(Q^{\mathrm{rank}(H)}(H\hat{x}_t + z_t) - z_t\right).$$
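The following sketch illustrates the dither mechanism with a scalar (cubic) lattice applied element-wise, a simple stand-in for the $\mathrm{rank}(H)$-dimensional lattice quantizer above; it checks the standard ECDQ property that the effective coding noise $y_t - H\hat{x}_t$ is uniform over the basic Voronoi cell with the expected second moments. The matrix $H$ and the cell size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 2                 # rank(H) (illustrative)
step = 1.0            # cubic-lattice cell edge (Z^m lattice as a simple stand-in)

def lattice_quantize(u):
    """Nearest-point quantizer of the scaled integer lattice, applied element-wise."""
    return np.round(u / step) * step

T = 100_000
x_hat = rng.normal(size=(T, m))                    # residual source (assumption)
H = np.array([[1.0, 0.2], [0.0, 0.5]])             # illustrative channel matrix
z = rng.uniform(-step / 2, step / 2, size=(T, m))  # dither, uniform on the Voronoi cell

y = lattice_quantize(x_hat @ H.T + z) - z          # ECDQ encode/decode step
noise = y - x_hat @ H.T                            # effective coding noise v^c

# v^c is uniform over the cell: zero mean, covariance (step^2 / 12) * I
print(noise.mean(axis=0))                          # ~ [0, 0]
print(np.cov(noise.T))                             # ~ diag(1/12, 1/12)
```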
Performance. The coding rate at each time instant, given by the conditional entropy of the MMSE quantizer, satisfies
$$\begin{aligned}
H\left(Q^{\mathrm{rank}(H)} \,\middle|\, z_t\right) &= I(H\hat{x}_t; H\hat{x}_t + v_t^c) \\
&\stackrel{(a)}{=} I(H\hat{x}_t; H\hat{x}_t + v_t) + D(v_t^c \,\|\, v_t) - D(H\hat{x}_t + v_t^c \,\|\, H\hat{x}_t + v_t) \\
&\stackrel{(b)}{\leq} I(H\hat{x}_t; H\hat{x}_t + v_t) + D(v_t^c \,\|\, v_t) \\
&\stackrel{(c)}{\leq} I(H\hat{x}_t; H\hat{x}_t + v_t) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right) \\
&\stackrel{(d)}{\leq} I(x_t; y_t \,|\, y^{t-1}) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right), \quad (74)
\end{aligned}$$
where $v_t^c \in \mathbb{R}^{\mathrm{rank}(H)}$ is the (uniform) coding noise in the ECDQ scheme and $v_t$ is the corresponding Gaussian counterpart; $(a)$ follows because the two random vectors $v_t^c, v_t$ have the same second moments, hence we can use the identity $D(x' \| x) = h(x) - h(x')$ (for $x$ Gaussian with the same second moments as $x'$); $(b)$ follows because $D(H\hat{x}_t + v_t^c \,\|\, H\hat{x}_t + v_t) \geq 0$; $(c)$ follows because the divergence of the coding noise from Gaussianity is less than or equal to $\frac{\mathrm{rank}(H)}{2}\log(2\pi e\, G_{\mathrm{rank}(H)})$ [47], where $G_{\mathrm{rank}(H)}$ is the dimensionless normalized second moment of the lattice ([44], Definition 3.2.2); $(d)$ follows from data processing properties, namely, $I(x_t; y_t | y^{t-1}) \stackrel{(*)}{=} I(x_t; y_t | y_{t-1}) \stackrel{(**)}{=} I(\hat{x}_t; \hat{y}_t) \stackrel{(***)}{\geq} I(H\hat{x}_t; H\hat{x}_t + v_t)$, where $(*)$ follows from the realization of (71), $(**)$ follows from the fact that $\hat{x}_t$ and $\hat{y}_t$ (obtained by (67)) are independent of $y^{t-1}$, and $(***)$ is a consequence of the data processing inequality since $(H\hat{x}_t + v_t) \leftrightarrow \hat{x}_t \leftrightarrow H\hat{x}_t$ forms a Markov chain. Under the assumption that the clocks of the entropy encoder and entropy decoder in the ECDQ scheme are synchronized, the total coding rate is obtained as follows:
$$\sum_{t=0}^{n} R_t \leq \sum_{t=0}^{n} H\left(Q^{\mathrm{rank}(H)} \,\middle|\, z_t\right) \stackrel{(e)}{\leq} \sum_{t=0}^{n} I(x_t; y_t \,|\, y^{t-1}) + (n+1)\frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right) \stackrel{(f)}{=} \frac{1}{2}\sum_{t=0}^{n}\log_2\frac{|\Lambda_t|}{|\Delta_t|} + (n+1)\frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right), \quad (75)$$
where ( e ) follows from (74); ( f ) follows from the derivation of Lemma 2.
The previous analysis yields the following theorem.
Theorem 4 
(Achievability bound on (11)–(13)). Suppose that $\Delta_t = \Delta$, $\forall t$, and assume that the source models (1) of users 1, 2 are driven by Gaussian noise processes. Then, the augmented state space source model in (3) ensures the following achievability bounds on (11)–(13):
$$R^{c}_{\mathrm{pd}}(D) \leq R^{LB}_{\mathrm{pd}}(D) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right), \quad (76)$$
$$R^{c}_{\mathrm{joint}}(D^*) \leq R^{LB}_{\mathrm{joint}}(D^*) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right), \quad (77)$$
$$R^{c}(D_{\mathrm{cov}}) \leq R^{LB}(D_{\mathrm{cov}}) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right). \quad (78)$$
Proof. 
Under the conditions of the theorem and the ECDQ scheme that leads to (75), the additional RHS terms in (76)–(78) are all constants. Then, taking the limit on both sides of (76)–(78) and applying the appropriate infimization (minimization) over the constraint sets, the result follows. □
We wish to point out the following for Theorem 4.
Remark 11 
(Comments on Theorem 4).
(1) 
The ECDQ that leads to (75) is not the same as the standard symmetric ECDQ scheme for scalar-valued processes, in which the coefficient H is split into pre- and post-scalings of the additive noise channel that tune the MSE distortion (see, e.g., [44,47]). In our pre/post scaled ECDQ scheme, we take asymmetric coefficients based on the realization of Figure 9. This leads to a coarser lattice than the one used for the unscaled ECDQ (for details see, for instance, [44]).
(2) 
Since the upper bounds essentially rely on the corresponding lower bounds for all of (76)–(78), similar observations can be made. For instance, if $D < D_T$, then (77) recovers (76), i.e., the asymptotically averaged total distortion constraint is inactive. Moreover, we cannot claim tightness of the achievability bound in (78), because the corresponding lower bound is already non-tight.
Next, we give an example in which we compare, for various distortion levels, the RL gap between the achievability bound obtained in Theorem 4 and the lower bound obtained in Theorem 2 for the operational causal RDF with joint distortion constraints.
Example 4 
(RL gap of achievability and lower bounds). In this example, we plot lower and upper bounds on the operational causal RDF subject to joint distortion constraints using the bounds derived in (50) and (77), respectively. We first consider the same input data assumed in Example 1 for users 1, 2, and compute the lower bound via (50) and the achievability bound via (77). After the first numerical study, we consider another one in which we only change the Gaussian noise covariances for users 1, 2, as follows:
$$(\Sigma_{w^1}, \Sigma_{w^2}) = \left( \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}, \begin{bmatrix} 1.4039 & 0.6034 & 0.5165 \\ 0.6034 & 0.9563 & 0.7682 \\ 0.5165 & 0.7682 & 0.6620 \end{bmatrix} \right). \quad (79)$$
Using the data of Example 1, and the same $D_T$ and $D_{ii}$, $i = 1, 2, 3, 4, 5$, we obtain the plots of Figure 10. For this study, we have used a Schläfli lattice $\tilde{D}_5$ (for details on this lattice see, e.g., [48]) with a dimensionless normalized second moment $G_5 \approx 0.0756$. In this example, H is always full rank and the RL gap is constant at 0.9218 bits/augmented vector.
For the second study, we obtain the plots of Figure 11. Here, for the full rank case we have used the dimensionless normalized second moment of a Schläfli lattice $\tilde{D}_5$, and for the rank deficient cases a Schläfli lattice $D_4$ with a dimensionless normalized second moment $G_4 \approx 0.0766$. Similar to the first study, when H is full rank the RL gap is 0.9218 bits/augmented vector, whereas when H is rank deficient the RL gap is 0.7754 bits/augmented vector.
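The constant rate-loss terms quoted in Example 4 follow directly from the normalized second moments of the lattices; a two-line Python check:

```python
import numpy as np

def rate_loss(rank_H, G):
    """rank(H)/2 * log2(2*pi*e*G), in bits per augmented vector."""
    return rank_H / 2 * np.log2(2 * np.pi * np.e * G)

print(rate_loss(5, 0.0756))   # ~0.9218 bits, full-rank case (Schlaefli lattice D5~)
print(rate_loss(4, 0.0766))   # ~0.7754 bits, rank-deficient case (lattice D4)
```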

3.3. Bounding (11)–(13) via a DPCM-based ECDQ for Non-Gaussian Noise Processes

Similar to Lemma 4, where linear policies are the benchmark used to derive lower bounds on (11)–(13), in this subsection we derive upper bounds on (11)–(13) using the linear test channel realization of Figure 9 and the DPCM-based ECDQ scheme of Section 3.1 and Section 3.2.
Next, we state the following theorem.
Theorem 5 
(Achievability bound on (11)–(13) for additive non-Gaussian noise processes). Suppose that $\Delta_t = \Delta$, $\forall t$, and assume that the source models (1) of users 1, 2 are driven by non-Gaussian noise processes. Then, the augmented state space source model in (3) ensures the following achievability bounds on (11)–(13):
$$R^{c}_{\mathrm{pd}}(D) \leq R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right) + D(\hat{x} \,\|\, \hat{x}^G), \quad (80)$$
$$R^{c}_{\mathrm{joint}}(D^*) \leq R^{LB,\mathrm{linear}}_{\mathrm{joint}}(D^*) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right) + D(\hat{x} \,\|\, \hat{x}^G), \quad (81)$$
$$R^{c}(D_{\mathrm{cov}}) \leq R^{LB,\mathrm{linear}}(D_{\mathrm{cov}}) + \frac{\mathrm{rank}(H)}{2}\log\left(2\pi e\, G_{\mathrm{rank}(H)}\right) + D(\hat{x} \,\|\, \hat{x}^G), \quad (82)$$
where $D(\hat{x} \,\|\, \hat{x}^G)$ is the KL divergence between the residual source $\hat{x}_t$ under linear policies and the Gaussian residual source $\hat{x}_t^G \sim \mathcal{N}(0; \Lambda)$ in Figure 9.
Proof. 
We only prove (80), because (81) and (82) follow similarly; in places we only sketch the derivation, as it is clear from the previous results. From Lemmas 4 and 3, we can easily obtain the following lower bound (similar to the SLB [32]):
$$R^{c}_{\mathrm{pd}}(D) \geq R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D) - D(\hat{x} \,\|\, \hat{x}^G), \quad (83)$$
where $D(\hat{x} \,\|\, \hat{x}^G) \geq 0$ is the discrepancy between the residual source $\hat{x}_t$ under linear policies and the optimal Gaussian residual source $\hat{x}_t^G \sim \mathcal{N}(0; \Lambda)$. From (83), we obtain
$$R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D) \leq R^{c}_{\mathrm{pd}}(D) + D(\hat{x} \,\|\, \hat{x}^G). \quad (84)$$
Then, applying the DPCM-based ECDQ scheme based on the linear forward test channel realization of Figure 9, as discussed in Section 3.1 and Section 3.2, we obtain (76), with $R^{LB}_{\mathrm{pd}}(D)$ replaced by $R^{LB,\mathrm{linear}}_{\mathrm{pd}}(D)$ because the coding scheme is built under the assumption of linear policies. This completes the derivation. □
Remark 12 
(Comments on Theorem 5). Clearly, Theorem 5 is a generalization of Theorem 4 under the assumption of the linear realization of Figure 9, with systems driven by additive i.i.d. non-Gaussian noise processes. If in (80)–(82) we assume that the system is driven by an additive i.i.d. Gaussian noise process, then clearly $D(\hat{x} \,\|\, \hat{x}^G) = 0$ and Theorem 5 recovers Theorem 4.
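To give a feel for the size of the extra term $D(\hat{x} \,\|\, \hat{x}^G)$ in Theorem 5, the sketch below evaluates it in closed form for a scalar residual with a Laplace distribution whose second moment matches that of its Gaussian counterpart, using the matched-moments identity $D(\hat{x} \,\|\, \hat{x}^G) = h(\hat{x}^G) - h(\hat{x})$ also used in step (a) of (74); the Laplace choice is purely illustrative and not from the paper.

```python
import numpy as np

# Scalar residual with variance sigma2; the Gaussian counterpart has the same variance.
sigma2 = 1.0
h_gauss = 0.5 * np.log2(2 * np.pi * np.e * sigma2)   # Gaussian differential entropy, bits
b = np.sqrt(sigma2 / 2)                              # Laplace scale giving variance sigma2
h_laplace = np.log2(2 * np.e * b)                    # Laplace differential entropy, bits

kl_bits = h_gauss - h_laplace                        # D(x_hat || x_hat^G), matched moments
print(kl_bits)                                       # ~0.104 bits
```

In this illustrative case, the non-Gaussianity penalty is roughly a tenth of a bit per scalar residual sample.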

4. Conclusions and Future Research

In this paper, we derived bounds on the OPTA of a two-user MIMO causal encoding and causal decoding problem (assuming the clocks of the encoder and the decoder are synchronized). In our setup, each user is described by a multivariate Markov source driven by an additive i.i.d. (possibly non-Gaussian) noise process, subject to three classes of spatio-temporal distortion constraints.
Although not directly pursued in this paper, all the results can be easily generalized to any finite number of users in Figure 1. Moreover, as future research we aim to study the case of separate encoding for each user, which would be a generalized version of a multi-user (distributed) source coding setup. Finally, because our bounds are computed via general-purpose SDP solvers and therefore offer limited structural insight, it makes sense to consider more specific setups and try to solve them using KKT conditions, and then identify structural properties of the matrices $(A, \Sigma_w)$ for which the KKT conditions can be solved optimally.

Author Contributions

Conceptualization, P.A.S., J.Ø. and M.S.; methodology, P.A.S., J.Ø. and M.S.; software, P.A.S.; validation, P.A.S., J.Ø. and M.S.; formal analysis, P.A.S.; investigation, P.A.S.; resources, M.S.; data curation, P.A.S.; writing—original draft preparation, P.A.S.; writing—review and editing, P.A.S., J.Ø. and M.S.; visualization, P.A.S.; supervision, M.S.; project administration, M.S.; funding acquisition, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the KAW Foundation and the Swedish Foundation for Strategic Research.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Notation

The following notation is used in this manuscript:
$\mathbb{R}$: the set of real numbers
$\mathbb{Z}$: the set of integers
$\mathbb{N}_0$: the set of natural numbers including zero
$\mathbb{N}_0^n$: the set $\{0, \ldots, n\}$, where $n \in \mathbb{N}_0$
$x$: random variable (RV)
$\mathcal{X}$: alphabet of the random variable $x$
$\mathbf{x}_r^t$: the sequence of random variables $(x_r, x_{r+1}, \ldots, x_t)$, $(r,t) \in \mathbb{Z} \times \mathbb{Z}$, $r \leq t$
$x_r^t$: sequence of realizations of the random variables, where $x_r^t \in \mathcal{X}_r^t$
$\mathcal{X}_r^t$: $\times_{k=r}^{t} \mathcal{X}_k$ with $\mathcal{X}_t = \mathcal{X}$
$\mathbf{P}(dx)$: the probability distribution of the RV $x$ on $\mathcal{X}$
$\mathbf{P}(dy|x)$: the conditional distribution of the RV $y$ given $x = x$
$\otimes$: compound product
$\oplus$: direct sum
$x \in \mathbb{R}^{p \times 1}$: column vector
$x^T \in \mathbb{R}^{1 \times p}$: row vector
$K \in \mathbb{R}^{p \times p}$: square real matrix
$K^T \in \mathbb{R}^{p \times p}$: transpose of a square real matrix
$K_{ii}$: diagonal elements of matrix $K$
$|K|$: determinant of $K$
$\mathrm{rank}(K)$: rank of $K$
$\mathrm{trace}(K)$: trace of $K$
$\mu_{K,i}$: the $i$th eigenvalue of matrix $K$
$\Sigma_x$: the covariance of a random vector $x$
$\Sigma_x \succ 0$: positive definite covariance matrix $\Sigma_x$
$\Sigma_x \succeq 0$: positive semidefinite covariance matrix $\Sigma_x$
$\Sigma_x \succeq \Sigma_{x'}$: $\Sigma_x - \Sigma_{x'}$ is positive semidefinite
$\Sigma_x \succ \Sigma_{x'}$: $\Sigma_x - \Sigma_{x'}$ is positive definite
$0$: null matrix
$I_p$: identity matrix of dimension $p$
$H(\cdot)$: discrete entropy
$h(\cdot)$: differential entropy
$D(x \| x')$: KL divergence of probability distribution $\mathbf{P}(x)$ with respect to probability distribution $\mathbf{P}(x')$
$x \sim \mathcal{N}(0; \Sigma)$: Gaussian random vector $x$ with zero mean and covariance $\Sigma$
$x \sim \mathrm{Unif}(0; \Sigma)$: uniformly distributed random vector $x$ with zero mean and covariance $\Sigma$
$h_G(\cdot)$: Gaussian differential entropy
$R_G(\cdot)$: Gaussian information RDF
$N(x)$: the entropy power of a random vector $x$
$\|\cdot\|_2$: Euclidean norm
$\mathbf{E}\{\cdot\}$: expectation operator
$[\cdot]^+$: $\max\{0, \cdot\}$
$A \leftrightarrow B \leftrightarrow C$: $A, B, C$ form a Markov chain

Abbreviations

The following abbreviations are used in this manuscript:
OPTA: Optimal Performance Theoretically Attainable
RDF: Rate distortion function
DPCM: Differential pulse coded modulation
ECDQ: Entropy coded dithered quantization
MSE: Mean-squared error
MMSE: Minimum MSE
RHS: Right-hand side
LHS: Left-hand side
i.i.d.: Independent identically distributed
a.s.: Almost surely
KKT: Karush–Kuhn–Tucker
LMI: Linear matrix inequality
SDP: Semidefinite programming
KF: Kalman filter
EP: Entropy power
EPI: Entropy power inequalities
RL: Rate loss
SLB: Shannon lower bound

Appendix A

Proof of Lemma 3.
We only prove (1), as both (2) and (3) follow similarly. First, by the assumptions of the theorem, $\Delta \triangleq \frac{1}{n+1}\sum_{t=0}^{n}\Delta_t$ for some finite $n$ with $\Delta \succ 0$ (a sufficient condition for existence of a finite solution), and $B \triangleq A^T \Sigma_w^{-1} A \succeq 0$. Moreover,
$$\begin{aligned}
\frac{1}{n+1}\sum_{t=0}^{n} R_t &\stackrel{(a)}{\geq} \frac{1}{n+1}\sum_{t=0}^{n} I(x_t; y_t \,|\, y^{t-1}) \\
&\stackrel{(b)}{=} \frac{1}{n+1}\,\frac{1}{2}\sum_{t=0}^{n} \log\frac{|\Lambda_t|}{|\Delta_t|} \\
&\stackrel{(c)}{=} \frac{1}{n+1}\left(\frac{1}{2}\log|\Lambda_0| - \frac{1}{2}\log|\Lambda_{n+1}| + \frac{1}{2}\sum_{t=0}^{n}\log\frac{|\Lambda_{t+1}|}{|\Delta_t|}\right) \\
&\stackrel{(d)}{=} \frac{1}{n+1}\left(\frac{1}{2}\log|\Lambda_0| - \frac{1}{2}\log|\Lambda_{n+1}| + \frac{1}{2}\sum_{t=0}^{n}\left(\log|\Sigma_w| + \log|\Delta_t^{-1} + B|\right)\right) \\
&\stackrel{(e)}{\geq} \frac{1}{n+1}\left(\frac{1}{2}\log|\Lambda_0| - \frac{1}{2}\log\left(\frac{\mathrm{trace}(A^TA)\,\mathrm{trace}(\hat{D}) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p}\right) + \frac{1}{2(n+1)}\sum_{t=0}^{n}\left(\log|\Sigma_w| + \log|\Delta_t^{-1} + B|\right) \\
&\stackrel{(f)}{\geq} \frac{1}{n+1}\left(\frac{1}{2}\log|\Lambda_0| - \frac{1}{2}\log\left(\frac{\mathrm{trace}(A^TA)\,\mathrm{trace}(\hat{D}) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p}\right) + \frac{1}{2}\log|\Sigma_w| + \frac{1}{2}\log|\Delta^{-1} + B| \\
&\stackrel{(g)}{=} \frac{1}{n+1}\left(\frac{1}{2}\log|\Lambda_0| - \frac{1}{2}\log\left(\frac{\mathrm{trace}(A^TA)\,\mathrm{trace}(\hat{D}) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p}\right) + \frac{1}{2}\log\frac{|\Lambda|}{|\Delta|},
\end{aligned} \quad \mathrm{(A1)}$$
where $p \triangleq p_1 + p_2$,
and where $(a)$ follows from Theorem 1 (see also Remark 2); $(b)$ follows from Lemma 1, (1), because $h_G(x_t | y^{t-1}) = \frac{1}{2}\log\left((2\pi e)^{p_1+p_2}|\Lambda_t|\right)$ and $h_G(x_t | y^t) = \frac{1}{2}\log\left((2\pi e)^{p_1+p_2}|\Delta_t|\right)$; $(c)$ follows by reformulating the additive objective; $(d)$ follows because $|\Lambda_{t+1}||\Delta_t^{-1}| = |A\Delta_t A^T + \Sigma_w||\Delta_t^{-1}| = |\Sigma_w(\Sigma_w^{-1}A\Delta_t A^T + I_{p_1+p_2})||\Delta_t^{-1}| \stackrel{(d_1)}{=} |\Sigma_w(A^T\Sigma_w^{-1}A\Delta_t + I_{p_1+p_2})||\Delta_t^{-1}| \stackrel{(d_2)}{=} |\Sigma_w||B + \Delta_t^{-1}|$, where $(d_1)$ follows from the Weinstein–Aronszajn identity ([49], Corollary 18.1.2) and $(d_2)$ from standard determinant properties of square matrices of the same size; $(e)$ follows from the inequalities
$$|\Lambda_{n+1}| \stackrel{(e_1)}{\leq} \left(\frac{\mathrm{trace}(A\Delta_n A^T + \Sigma_w)}{p}\right)^{p} \stackrel{(e_2)}{=} \left(\frac{\mathrm{trace}(A\Delta_n A^T) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p} \stackrel{(e_3)}{\leq} \left(\frac{\mathrm{trace}(A^TA)\,\mathrm{trace}(\Delta_n) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p} \stackrel{(e_4)}{\leq} \left(\frac{\mathrm{trace}(A^TA)\,\mathrm{trace}(\hat{D}) + \mathrm{trace}(\Sigma_w)}{p}\right)^{p},$$
where $(e_1)$ follows because $|K| \leq \left(\frac{\mathrm{trace}(K)}{p}\right)^p$ for $K \succeq 0$; $(e_2)$ follows from ([18], Ex. 12.14); $(e_3)$ follows from the cyclic property of the trace and ([18], Ex. 12.14); $(e_4)$ follows because $\mathrm{trace}(\Delta_n) \leq \mathrm{trace}(\hat{D})$ (by definition); $(f)$ follows because the term $\log|\Delta_t^{-1} + B|$ is convex with respect to $\Delta_t$ for $B \succeq 0$ (see, e.g., [50]), hence we can apply Jensen's inequality ([19], Theorem 2.6.2); $(g)$ follows because $\frac{1}{2}\log|\Sigma_w| + \frac{1}{2}\log|\Delta^{-1} + B|$ can be rearranged to $\frac{1}{2}\log\frac{|\Lambda|}{|\Delta|}$, where $\Lambda = A\Delta A^T + \Sigma_w$. Taking the limit on both sides of (A1), we observe that the first RHS term vanishes asymptotically. Finally, applying the appropriate infimization constraints on both sides of the limiting objective functions in (A1), and since we assume sufficient conditions for existence of a solution (so that the infimum of the RHS term is in fact a minimum), the result follows. □
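The determinant manipulation in step (d), based on the Weinstein–Aronszajn identity, is easy to verify numerically; a minimal Python sketch with arbitrary matrix values (not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
A = 0.3 * rng.standard_normal((p, p))             # arbitrary dynamics matrix
M = rng.standard_normal((p, p)); Sigma_w = M @ M.T + np.eye(p)   # PD noise covariance
N = rng.standard_normal((p, p)); Delta_t = N @ N.T + np.eye(p)   # PD error covariance

B = A.T @ np.linalg.inv(Sigma_w) @ A
lhs = np.linalg.det(A @ Delta_t @ A.T + Sigma_w) * np.linalg.det(np.linalg.inv(Delta_t))
rhs = np.linalg.det(Sigma_w) * np.linalg.det(B + np.linalg.inv(Delta_t))
print(np.isclose(lhs, rhs))   # True: |Lambda_{t+1}| |Delta_t^{-1}| = |Sigma_w| |B + Delta_t^{-1}|
```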

Appendix B

Proof of Theorem 2.
Similar to the proof of Lemma 3, we only prove one case, i.e., (1), as the remaining cases can be shown in exactly the same way.
By invoking the matrix determinant lemma ([49], Theorem 18.1.1) in $R^{LB}_{\mathrm{pd}}(D)$, we obtain
$$R^{na}_{\mathrm{pd}}(D) = \min_{\substack{0 \prec \Delta \preceq \Lambda \\ \Delta_{ii} \leq D_{ii},\; i=1,\ldots,p}} \;\frac{1}{2}\log|\Sigma_w| - \frac{1}{2}\log\left|\left(\Delta^{-1} + A^T\Sigma_w^{-1}A\right)^{-1}\right|. \quad \mathrm{(A2)}$$
Next, we introduce a decision variable $Q$ via $Q^{-1} \triangleq \Delta^{-1} + A^T\Sigma_w^{-1}A$. Using the monotonicity of the determinant, we can rewrite (A2) as
$$R^{na}_{\mathrm{pd}}(D) = \min_{\substack{0 \prec \Delta \preceq \Lambda \\ \Delta_{ii} \leq D_{ii},\; i=1,\ldots,p \\ 0 \prec Q \preceq (\Delta^{-1} + A^T\Sigma_w^{-1}A)^{-1}}} \;\frac{1}{2}\log|\Sigma_w| - \frac{1}{2}\log|Q|. \quad \mathrm{(A3)}$$
Applying the Woodbury matrix identity ([49], Theorem 18.2.8) to the inequality constraint $0 \prec Q \preceq (\Delta^{-1} + A^T\Sigma_w^{-1}A)^{-1}$, we obtain
$$0 \prec Q \preceq \Delta - \Delta A^T\left(\Sigma_w + A\Delta A^T\right)^{-1} A\Delta. \quad \mathrm{(A4)}$$
From Theorem 2 we have $\Lambda = A\Delta A^T + \Sigma_w$; hence (A4) is equivalent to the LMI condition of (49).
The constraint set of the decision variables is convex, and an optimal solution exists because $\Delta \succ 0$. This completes the proof. □
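The program (A2)–(A4) can be prototyped directly. Below is a minimal sketch, assuming the cvxpy package, that uses a standard Schur-complement rewriting of (A4) (which should coincide with the LMI of (49) in the main text); the matrices $A$, $\Sigma_w$ and the levels $D_{ii}$ are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

p = 2
A = np.array([[0.5, 0.1], [0.0, 0.4]])        # stable dynamics (assumption)
Sigma_w = np.eye(p)                           # driving-noise covariance (assumption)
D_ii = np.array([0.5, 0.7])                   # per-dimension distortion levels (assumption)

Delta = cp.Variable((p, p), symmetric=True)   # error covariance
Q = cp.Variable((p, p), symmetric=True)       # auxiliary variable of Appendix B
Lam = A @ Delta @ A.T + Sigma_w               # Lambda = A Delta A^T + Sigma_w, affine in Delta

constraints = [
    Delta >> 1e-9 * np.eye(p),                # Delta strictly positive definite
    cp.diag(Delta) <= D_ii,                   # Delta_ii <= D_ii
    Lam - Delta >> 0,                         # Delta <= Lambda in the PSD order
    # Schur-complement form of (A4): Q <= Delta - Delta A^T Lambda^{-1} A Delta
    cp.bmat([[Delta - Q, Delta @ A.T],
             [A @ Delta, Lam]]) >> 0,
]
prob = cp.Problem(cp.Minimize(-0.5 * cp.log_det(Q)), constraints)
prob.solve()

rate_bits = (0.5 * np.log(np.linalg.det(Sigma_w)) + prob.value) / np.log(2)
print(rate_bits)                              # value of the lower bound in bits
```

The max-det structure (minimizing the negative log-determinant of Q under LMIs) is exactly what general SDP solvers exploit, as in [36,40].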

Appendix C

Proof of Theorem 3.
We only prove (1) in detail and sketch the proofs of (2) and (3), because several steps are identical to (1). First, note that from Lemma 3 we have the bound in (43). Next, we simply show that the objective function can be lower bounded by a constant (independent of the constraint set).
$$\begin{aligned}
\frac{1}{2}\log\frac{|\Lambda|}{|\Delta|} &= \frac{1}{2}\log|A\Delta A^T + \Sigma_w| - \frac{1}{2}\log|\Delta| \\
&= \frac{(p_1+p_2)}{2}\log|A\Delta A^T + \Sigma_w|^{\frac{1}{p_1+p_2}} - \frac{(p_1+p_2)}{2}\log|\Delta|^{\frac{1}{p_1+p_2}} \\
&\stackrel{(a)}{\geq} \frac{(p_1+p_2)}{2}\log\left(|A\Delta A^T|^{\frac{1}{p_1+p_2}} + |\Sigma_w|^{\frac{1}{p_1+p_2}}\right) - \frac{(p_1+p_2)}{2}\log|\Delta|^{\frac{1}{p_1+p_2}} \\
&\stackrel{(b)}{=} \frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}}|\Delta|^{\frac{1}{p_1+p_2}} + |\Sigma_w|^{\frac{1}{p_1+p_2}}\right) - \frac{(p_1+p_2)}{2}\log|\Delta|^{\frac{1}{p_1+p_2}} \\
&= \frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}} + \frac{|\Sigma_w|^{\frac{1}{p_1+p_2}}}{|\Delta|^{\frac{1}{p_1+p_2}}}\right) \\
&\stackrel{(c)}{\geq} \frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\,|\Sigma_w|^{\frac{1}{p_1+p_2}}}{\mathrm{trace}(\Delta)}\right) \quad \mathrm{(A5)} \\
&\stackrel{(d)}{\geq} \frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\,|\Sigma_w|^{\frac{1}{p_1+p_2}}}{D}\right) \quad \mathrm{(A6)} \\
&\stackrel{(e)}{\geq} \frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}} + \frac{(p_1+p_2)\,N(w)}{D}\right), \quad \mathrm{(A7)}
\end{aligned}$$
where $(a)$ follows from Minkowski's determinant inequality ([18], Exercise 12.13); $(b)$ follows from standard properties of determinants of square matrices of the same size; $(c)$ follows from the reverse application of the EPI (see, e.g., ([43], Equation (7))); $(d)$ follows because $\mathrm{trace}(\Delta) \leq \mathrm{trace}(\hat{D}) \leq D$; $(e)$ follows from ([43], Equation (7)). The constant value in (A7) is well defined if $D \in \left(0, \frac{(p_1+p_2)N(w)}{1 - |A^TA|^{\frac{1}{p_1+p_2}}}\right)$, where $N(w) = \frac{1}{2\pi e}\, 2^{\frac{2}{p_1+p_2} h(w)}$ and $h(w) > -\infty$.
For (2), we follow steps similar to (1), but in inequality $(d)$ we instead use $\mathrm{trace}(\Delta) \leq \min\{D_T, \mathrm{trace}(\hat{D})\} \triangleq D^*$.
For (3), we follow similar steps up to (A5). Afterwards, we leverage the fact that $|\Delta| \leq |D_{\mathrm{cov}}|$ to obtain, instead of (A6), the following:
$$\frac{(p_1+p_2)}{2}\log\left(|A^TA|^{\frac{1}{p_1+p_2}} + \frac{|\Sigma_w|^{\frac{1}{p_1+p_2}}}{|D_{\mathrm{cov}}|^{\frac{1}{p_1+p_2}}}\right). \quad \mathrm{(A8)}$$
This completes the derivation. □
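As a numerical sanity check of the chain $(a)$–$(e)$, the sketch below compares $\frac{1}{2}\log(|\Lambda|/|\Delta|)$ against the constant in (A7) for an arbitrary feasible $\Delta$ and a Gaussian driving noise, for which $N(w) = |\Sigma_w|^{\frac{1}{p_1+p_2}}$ with equality in step $(e)$; all matrix values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3                                            # p = p1 + p2 (illustrative)
A = 0.4 * np.eye(p)                              # stable dynamics (assumption)
Sigma_w = np.eye(p)                              # Gaussian driving noise (assumption)
M = rng.standard_normal((p, p))
Delta = 0.3 * (M @ M.T) / p                      # an arbitrary feasible Delta > 0
D = np.trace(Delta)                              # total distortion it induces

Lam = A @ Delta @ A.T + Sigma_w
objective = 0.5 * np.log(np.linalg.det(Lam) / np.linalg.det(Delta))

# For Gaussian w, N(w) = |Sigma_w|^{1/p}, so the constant in (A7) reads:
Nw = np.linalg.det(Sigma_w) ** (1 / p)
bound = p / 2 * np.log(np.linalg.det(A.T @ A) ** (1 / p) + p * Nw / D)
print(objective >= bound)                        # True for any feasible Delta
```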

References

  1. Caines, P.E. Linear Stochastic Systems; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1988. [Google Scholar]
  2. Skogestad, S.; Postlethwaite, I. Multivariable Feedback Control: Analysis and Design, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2005. [Google Scholar]
  3. Wang, J.; Chen, J. Vector Gaussian Multiterminal Source Coding. IEEE Trans. Inf. Theory 2014, 60, 5533–5552. [Google Scholar] [CrossRef] [Green Version]
  4. Ekrem, E.; Ulukus, S. An Outer Bound for the Vector Gaussian CEO Problem. IEEE Trans. Inf. Theory 2014, 60, 6870–6887. [Google Scholar] [CrossRef] [Green Version]
  5. Oohama, Y. Indirect and Direct Gaussian Distributed Source Coding Problems. IEEE Trans. Inf. Theory 2014, 60, 7506–7539. [Google Scholar] [CrossRef]
  6. Rahman, M.S.; Wagner, A.B. Rate Region of the Vector Gaussian One-Helper Source-Coding Problem. IEEE Trans. Inf. Theory 2015, 61, 2708–2728. [Google Scholar] [CrossRef] [Green Version]
  7. Zahedi, A.; Østergaard, J.; Jensen, S.H.; Naylor, P.A.; Bech, S. Source Coding in Networks With Covariance Distortion Constraints. IEEE Trans. Signal Proc. 2016, 64, 5943–5958. [Google Scholar] [CrossRef] [Green Version]
  8. Vaseghi, S.V. Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications; John Wiley & Sons: Chichester, UK, 2007. [Google Scholar]
  9. Zahedi, A.; Østergaard, J.; Jensen, S.H.; Bech, S.; Naylor, P. Audio coding in wireless acoustic sensor networks. Signal Process. 2015, 107, 141–152. [Google Scholar] [CrossRef]
  10. Linder, T.; Lugosi, G. A zero-delay sequential scheme for lossy coding of individual sequences. IEEE Trans. Inf. Theory 2001, 47, 2533–2538. [Google Scholar] [CrossRef]
  11. Derpich, M.S.; Østergaard, J. Improved Upper Bounds to the Causal Quadratic Rate-Distortion Function for Gaussian Stationary Sources. IEEE Trans. Inf. Theory 2012, 58, 3131–3152. [Google Scholar] [CrossRef] [Green Version]
  12. Kaspi, Y.; Merhav, N. Structure Theorems for Real-Time Variable Rate Coding With and Without Side Information. IEEE Trans. Inf. Theory 2012, 58, 7135–7153. [Google Scholar] [CrossRef] [Green Version]
  13. Linder, T.; Yüksel, S. On Optimal Zero-Delay Coding of Vector Markov Sources. IEEE Trans. Inf. Theory 2014, 60, 5975–5991. [Google Scholar] [CrossRef] [Green Version]
  14. Stavrou, P.A.; Østergaard, J.; Charalambous, C.D. Zero-Delay Rate Distortion via Filtering for Vector-Valued Gaussian Sources. IEEE J. Sel. Top. Signal Process. 2018, 12, 841–856. [Google Scholar] [CrossRef]
  15. Gallager, R.G. Information Theory and Reliable Communication; Wiley: New York, NY, USA, 1968. [Google Scholar]
  16. Tanaka, T.; Kim, K.K.K.; Parrilo, P.A.; Mitter, S.K. Semidefinite Programming Approach to Gaussian Sequential Rate-Distortion Trade-Offs. IEEE Trans. Autom. Control 2017, 62, 1896–1910. [Google Scholar] [CrossRef]
  17. Khina, A.; Kostina, V.; Khisti, A.; Hassibi, B. Tracking and Control of Gauss-Markov Processes over Packet-Drop Channels with Acknowledgments. IEEE Trans. Control Netw. Syst. 2019, 6, 549–560. [Google Scholar] [CrossRef] [Green Version]
  18. Abadir, K.M.; Magnus, J.R. Matrix Algebra; Cambridge University Press: New York, NY, USA, 2005. [Google Scholar]
  19. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2006. [Google Scholar]
  20. Wyner, A. A definition of conditional mutual information for arbitrary ensembles. Inf. Control 1978, 38, 51–59. [Google Scholar] [CrossRef] [Green Version]
  21. Tanaka, T.; Esfahani, P.M.; Mitter, S.K. LQG Control With Minimum Directed Information: Semidefinite Programming Approach. IEEE Trans. Autom. Control 2018, 63, 37–52. [Google Scholar] [CrossRef] [Green Version]
  22. Massey, J.L. Causality, Feedback and Directed information. In Proceedings of the International Symposium on Information Theory and its Applications (ISITA ’90), Waikiki, HI, USA, 27–30 November 1990; pp. 303–305. [Google Scholar]
  23. Charalambous, C.D.; Stavrou, P.A. Directed Information on Abstract Spaces: Properties and Variational Equalities. IEEE Trans. Inf. Theory 2016, 62, 6019–6052. [Google Scholar] [CrossRef]
  24. Stavrou, P.; Charalambous, T.; Charalambous, C.; Loyka, S. Optimal Estimation via Nonanticipative Rate Distortion Function and Applications to Time-Varying Gauss–Markov Processes. SIAM J. Control Optim. 2018, 56, 3731–3765. [Google Scholar] [CrossRef] [Green Version]
  25. Ihara, S. Information Theory—For Continuous Systems; World Scientific: Singapore, 1993. [Google Scholar]
  26. Gorbunov, A.K.; Pinsker, M.S. Nonanticipatory and Prognostic Epsilon Entropies and Message Generation Rates. Problems Inf. Transmiss. 1973, 9, 184–191. [Google Scholar]
  27. Tatikonda, S.C. Control Under Communication Constraints. Ph.D. Thesis, Mass. Inst. of Tech. (M.I.T.), Cambridge, MA, USA, 2000. [Google Scholar]
  28. Charalambous, C.D.; Stavrou, P.A.; Ahmed, N.U. Nonanticipative Rate Distortion Function and Relations to Filtering Theory. IEEE Trans. Autom. Control 2014, 59, 937–952. [Google Scholar] [CrossRef]
  29. Tatikonda, S.; Sahai, A.; Mitter, S. Stochastic linear control over a communication channel. IEEE Trans. Autom. Control 2004, 49, 1549–1561. [Google Scholar] [CrossRef]
  30. Silva, E.I.; Derpich, M.S.; Østergaard, J. A Framework for Control System Design Subject to Average Data-Rate Constraints. IEEE Trans. Autom. Control 2011, 56, 1886–1899. [Google Scholar] [CrossRef] [Green Version]
  31. Stavrou, P.A.; Charalambous, T.; Charalambous, C.D. Finite-Time Nonanticipative Rate Distortion Function for Time-Varying Scalar-Valued Gauss-Markov Sources. IEEE Control Syst. Lett. 2018, 2, 175–180. [Google Scholar] [CrossRef]
  32. Berger, T. Rate Distortion Theory: A Mathematical Basis for Data Compression; Prentice-Hall: Englewood Cliffs, NJ, USA, 1971. [Google Scholar]
  33. Horn, R.A.; Johnson, C.R. (Eds.) Matrix Analysis, 2nd ed.; Cambridge University Press: New York, NY, USA, 2013. [Google Scholar]
  34. Anderson, B.; Moore, J. Optimal Filtering; Prentice-Hall: Englewood Cliffs, NJ, USA, 1979. [Google Scholar]
  35. Simon, D. Optimal State Estimation: Kalman, H, and Nonlinear Approaches; Wiley-Interscience: Hoboken, NJ, USA, 2006. [Google Scholar]
  36. Grant, M.; Boyd, S. CVX: Matlab Software for Disciplined Convex Programming, Version 2.1. Available online: http://cvxr.com/cvx (accessed on 1 March 2014).
  37. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: New York, NY, USA, 2004. [Google Scholar]
  38. Boyd, S.; El Ghaoui, L.; Feron, E.; Balakrishnan, V. Linear Matrix Inequalities in System and Control Theory; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1994. [Google Scholar]
  39. Wang, J.; Chen, J.; Wu, X. On the Sum Rate of Gaussian Multiterminal Source Coding: New Proofs and Results. IEEE Trans. Inf. Theory 2010, 56, 3946–3960. [Google Scholar] [CrossRef]
  40. Vandenberghe, L.; Boyd, S.; Wu, S.P. Determinant maximization with linear matrix inequality constraints. SIAM J. Matrix Anal. Appl. 1998, 19, 499–533. [Google Scholar] [CrossRef] [Green Version]
  41. Stavrou, P.A.; Charalambous, T.; Charalambous, C.D.; Loyka, S.; Skoglund, M. Asymptotic Reverse-Waterfilling Characterization of Nonanticipative Rate Distortion Function of Vector-Valued Gauss-Markov Processes with MSE Distortion. In Proceedings of the 2018 IEEE Conference on Decision and Control (CDC), Miami Beach, FL, USA, 17–19 December 2018; pp. 14–20. [Google Scholar]
  42. Stavrou, P.A.; Østergaard, J.; Skoglund, M. On Zero-delay Source Coding of LTI Gauss-Markov Systems with Covariance Matrix Distortion Constraints. In Proceedings of the European Control Conference (ECC), Limassol, Cyprus, 12–15 June 2018; pp. 3083–3088. [Google Scholar]
  43. Rioul, O. Information Theoretic Proofs of Entropy Power Inequalities. IEEE Trans. Inf. Theory 2011, 57, 33–55. [Google Scholar] [CrossRef] [Green Version]
  44. Zamir, R. Lattice Coding for Signals and Networks; Cambridge University Press: Cambridge, UK, 2014. [Google Scholar]
  45. Fuglsig, A.J.; Østergaard, J. Zero-delay Multiple descriptions of stationary scalar Gauss-Markov sources. Entropy 2019, 21, 1185. [Google Scholar] [CrossRef] [Green Version]
  46. Tanaka, T.; Johansson, K.H.; Oechtering, T.; Sandberg, H.; Skoglund, M. Rate of prefix-free codes in LQG control systems. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 2399–2403. [Google Scholar]
  47. Zamir, R.; Feder, M. Information rates of pre/post-filtered dithered quantizers. IEEE Trans. Inf. Theory 1996, 42, 1340–1353. [Google Scholar] [CrossRef]
  48. Conway, J.H.; Sloane, N.J.A. Sphere-packings, Lattices, and Groups, 3rd ed.; Springer-Verlag New York, Inc.: New York, NY, USA, 1999. [Google Scholar]
  49. Harville, D.A. Matrix Algebra From a Statistician’s Perspective; Springer: New York, NY, USA, 1997. [Google Scholar]
  50. Kim, K.K.K. Optimization and Convexity of log det(I + KX^{-1}). Int. J. Control Autom. Syst. 2019, 17, 1067–1070. [Google Scholar] [CrossRef]
Figure 1. System model. The encoder receives information from two users that do not interact from a dynamical system perspective, but they are allowed to allocate bits between them and across the dimensions. The compression is done causally whereas the clocks of the encoder and the decoder are assumed to be synchronized.
Figure 2. Centralized multivariable multi-input multi-output (MIMO) control system.
Figure 3. Comparison of $R^{na}_{\mathrm{joint}}(D^*)$ and $R^{na}_{\mathrm{pd}}(D)$ when $D_{ii} \neq D_{jj}$ for $i \neq j$, and comparison with ([16], Equation (27)).
Figure 4. Comparison of $R^{LB}_{\mathrm{joint}}(D^*)$, $R^{LB}_{\mathrm{pd}}(D)$ and $R^{LB}(D_T)$ when $D_{ii} = D_{jj}$, $i \neq j$.
Figure 5. $R^{LB}(D_{\mathrm{cov}})$ as a function of $\gamma \geq 0$.
Figure 6. Comparison of $R^{LB}(D_{\mathrm{cov}})$ (restricted to its values on the main diagonal) and $R^{LB}_{\mathrm{pd}}(D)$ for certain values of $\gamma$.
Figure 7. Comparison of the lower bound $R^{LB}_{\mathrm{joint}}(D^*)$ with the analytical expression of (60).
Figure 8. DPCM scheme with feedback loop for the augmented multidimensional Markov model of (3).
Figure 9. Forward test channel realization of (71).
Figure 10. Comparison of lower and upper bounds on $R^{c}_{\mathrm{joint}}(D)$ when H is full rank.
Figure 11. Comparison of the lower and upper bounds on $R^{c}_{\mathrm{joint}}(D)$ when H is rank deficient.
