Article

A Monotone Path Proof of an Extremal Result for Long Markov Chains

1. Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2. Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada
* Authors to whom correspondence should be addressed.
Entropy 2019, 21(3), 276; https://doi.org/10.3390/e21030276
Submission received: 31 January 2019 / Revised: 5 March 2019 / Accepted: 11 March 2019 / Published: 13 March 2019
(This article belongs to the Special Issue Multiuser Information Theory II)

Abstract

We prove an extremal result for long Markov chains based on the monotone path argument, generalizing an earlier work by Courtade and Jiao.

1. Introduction

Shannon’s entropy power inequality (EPI) and its variants assert the optimality of the Gaussian solution to certain extremal problems. They play important roles in characterizing the information-theoretic limits of network information theory problems that involve Gaussian sources and/or channels (see, e.g., [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]). Many different approaches have been developed for proving such extremal results. Two notable ones are the doubling trick [16,17] and the monotone path argument [18,19,20,21]. Roughly speaking, the former reaches the desired conclusion by establishing the subadditivity of the relevant functional while the latter accomplishes its goal by constructing a monotone path with one end associated with an arbitrary given point in the feasible region and the other associated with the optimal solution to the Gaussian version of the problem. Though the doubling trick typically yields simpler proofs, the monotone path argument tends to be more informative. Indeed, it shows not only the existence of the Gaussian optimizer, but also the fact that every Karush–Kuhn–Tucker (KKT) point (also known as the stationary point) of the Gaussian version of the problem is in fact globally optimal. Such information is highly useful for numerical optimization.
Several years ago, inspired by the Gaussian two-terminal source coding problem, Courtade [22] conjectured the following EPI-type extremal result for long Markov chains.
Conjecture 1.
Let $X$ and $Z$ be two independent $n$-dimensional zero-mean Gaussian random (column) vectors with covariance matrices $\Sigma_X \succ 0$ and $\Sigma_Z \succ 0$, respectively, and define $Y = X + Z$. Then, for any $\mu \geq 0$,
$$\inf_{U,V:\, U \leftrightarrow X \leftrightarrow Y \leftrightarrow V} I(X;U) - \mu I(Y;U) + I(Y;V|U) - \mu I(X;V|U) \tag{1}$$
has a Gaussian minimizer (i.e., the infimum is achieved by some $(U,V)$ jointly Gaussian with $(X,Y)$). Here, $U \leftrightarrow X \leftrightarrow Y \leftrightarrow V$ means $U$, $X$, $Y$, and $V$ form a Markov chain in that order.
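As a quick sanity check on the Markov structure (not part of the original paper, and with illustrative parameter values chosen here), the following scalar-Gaussian computation confirms that $I(X;U) \geq I(Y;U)$ for a description of the form $U = X + N_1$, as the data-processing inequality requires along $U \leftrightarrow X \leftrightarrow Y$:

```python
import math

# Scalar Gaussian example (illustrative values): X ~ N(0, px), Z ~ N(0, pz),
# Y = X + Z, and a Gaussian "description" U = X + N1 with N1 ~ N(0, s1).
px, pz, s1 = 1.0, 2.0, 0.5

def mi(var_a, var_b, cov_ab):
    """Mutual information (in nats) between jointly Gaussian scalars."""
    rho2 = cov_ab**2 / (var_a * var_b)
    return -0.5 * math.log(1.0 - rho2)

i_xu = mi(px, px + s1, px)        # cov(X, U) = var(X)
i_yu = mi(px + pz, px + s1, px)   # cov(Y, U) = var(X), since Z is independent

# Data processing along U - X - Y: U says at least as much about X as about Y.
assert i_xu >= i_yu
print(i_xu, i_yu)
```

Here both first terms of the objective in (1), $I(X;U) - \mu I(Y;U)$ and $I(Y;V|U) - \mu I(X;V|U)$, are nonnegative for $\mu \leq 1$ by the same data-processing argument, which is why the case $\mu \geq 1$ is the substantive one.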
Later, Courtade and Jiao [23] proved this conjecture using the doubling trick. It is natural to ask whether this conjecture can also be proved via the monotone path argument. We shall show in this paper that it is indeed possible. Our work also sheds some light on the connection between these two proof strategies.
In fact, we shall prove a strengthened version of Conjecture 1 with some additional constraints imposed on $(U,V)$. For any random (column) vector $S$ and random object $\omega$, let $D_{S|\omega}$ denote the distortion covariance matrix incurred by the minimum mean squared error (MMSE) estimator of $S$ from $\omega$ (i.e., $\mathbb{E}[(S - \mathbb{E}[S|\omega])(S - \mathbb{E}[S|\omega])^t]$), where $(\cdot)^t$ is the transpose operator. The main result of this paper is as follows.
Theorem 1.
For any $\mu \geq 0$,
$$\inf_{\substack{U,V:\, U \leftrightarrow X \leftrightarrow Y \leftrightarrow V \\ (D_{X|Y,U},\, D_{Y|X,V}) \in \mathcal{D}}} -\mu h(X|U) + \mu h(Y|U) + (\mu-1) h(X|U,V) - h(Y|V) + h(X|V) \tag{2}$$
has a Gaussian minimizer, where $\mathcal{D} = \{(D_a, D_b) : 0 \prec D_a \preceq D_1,\ 0 \prec D_b \preceq D_2\}$ with $D_1$ and $D_2$ satisfying $0 \prec D_1 \preceq D_{X|Y}$ and $0 \prec D_2 \preceq D_{Y|X}$.
Remark 1.
The objective functions in (1) and (2) are equivalent. Indeed,
$$\begin{aligned}
&I(X;U) - \mu I(Y;U) + I(Y;V|U) - \mu I(X;V|U)\\
&\approx -h(X|U) + \mu h(Y|U) + h(Y|U) - h(Y|U,V) - \mu h(X|U) + \mu h(X|U,V)\\
&= -(\mu+1) h(X|U) + (\mu+1) h(Y|U) - h(Y|U,V) + \mu h(X|U,V) + h(Y|X,U,V) - h(Y|X,V)\\
&= -(\mu+1) h(X|U) + (\mu+1) h(Y|U) - I(Y;X|U,V) + \mu h(X|U,V) - h(Y|X,V)\\
&= -(\mu+1) h(X|U) + (\mu+1) h(Y|U) + (\mu-1) h(X|U,V) + h(X|Y,U) - h(Y|X,V)\\
&\approx -(\mu+1) h(X|U) + (\mu+1) h(Y|U) + (\mu-1) h(X|U,V) + h(X|U) - h(Y|U) - h(Y|X,V)\\
&\approx -\mu h(X|U) + \mu h(Y|U) + (\mu-1) h(X|U,V) - h(Y|V) + h(X|V),
\end{aligned}$$
where “≈” means that the two sides are equal up to an additive constant.
Remark 2.
Conjecture 1 corresponds to the special case where $D_1 = D_{X|Y}$ and $D_2 = D_{Y|X}$.
The rest of the paper is organized as follows. Section 2 is devoted to the analysis of the Gaussian version of the optimization problem in (2). The key construction underlying our monotone path argument is introduced in Section 3. The proof of Theorem 1 is presented in Section 4. We conclude the paper in Section 5.

2. The Gaussian Version

In this section, we consider the Gaussian version of the optimization problem in (2) defined by imposing the restriction that $(U,V)$ and $(X,Y)$ are jointly Gaussian. Specifically, let $U_g = X + N_1$ and $V_g = Y + N_2$, where $N_1$ and $N_2$ are two independent $n$-dimensional zero-mean Gaussian random (column) vectors with covariance matrices $\Sigma_1 \succ 0$ and $\Sigma_2 \succ 0$, respectively. It is assumed that $(N_1, N_2)$ is independent of $(X,Y)$; as a consequence, the Markov chain constraint $U_g \leftrightarrow X \leftrightarrow Y \leftrightarrow V_g$ is satisfied. Clearly,
$$\begin{aligned}
&-\mu h(X|U_g) + \mu h(Y|U_g) + (\mu-1) h(X|U_g,V_g) - h(Y|V_g) + h(X|V_g)\\
&\approx -\mu h(X|Y,U_g) + (\mu-1) h(X|U_g,V_g) - h(Y|X,V_g)\\
&\approx -\frac{\mu}{2}\log|D_{X|Y,U_g}| + \frac{\mu-1}{2}\log|D_{X|U_g,V_g}| - \frac{1}{2}\log|D_{Y|X,V_g}|.
\end{aligned}$$
Moreover,
$$D_{X|Y,U_g} = \left( \Sigma_X^{-1} + \Sigma_Z^{-1} + \Sigma_1^{-1} \right)^{-1}, \tag{3}$$
$$D_{Y|X,V_g} = \left( \Sigma_Z^{-1} + \Sigma_2^{-1} \right)^{-1}, \tag{4}$$
$$D_{X|U_g,V_g} = \left( \Sigma_X^{-1} + \Sigma_1^{-1} + (\Sigma_Z + \Sigma_2)^{-1} \right)^{-1}. \tag{5}$$
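Formulas (3)–(5) are standard Gaussian MMSE identities: for independent additive Gaussian observations, posterior precisions add. As a sanity check (not in the original paper), here is a scalar numerical verification of (3), with illustrative values, against direct conditioning on the joint covariance:

```python
# Scalar check of D_{X|Y,Ug} = (Sx^{-1} + Sz^{-1} + S1^{-1})^{-1}:
# condition X on (Y, U) directly via the joint covariance matrix.
px, pz, s1 = 1.0, 2.0, 3.0   # var(X), var(Z), var(N1); illustrative values

# Joint covariances: Y = X + Z, U = X + N1, all noises independent.
c_yy, c_uu, c_yu = px + pz, px + s1, px
c_xy = c_xu = px

# var(X | Y, U) = var(X) - c^T * inv([[c_yy, c_yu], [c_yu, c_uu]]) * c
det = c_yy * c_uu - c_yu**2
quad = (c_xy * (c_uu * c_xy - c_yu * c_xu)
        + c_xu * (-c_yu * c_xy + c_yy * c_xu)) / det
direct = px - quad

precision_form = 1.0 / (1.0/px + 1.0/pz + 1.0/s1)
assert abs(direct - precision_form) < 1e-12
print(direct)
```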
Now, it is straightforward to write down the Gaussian version of the optimization problem in (2) with Σ 1 and Σ 2 as variables. However, as shown in the sequel, through a judicious change of variables, one can obtain an equivalent version that is more amenable to analysis.
Given $\lambda \in (0,1)$, we introduce two random (column) vectors $M_X$ and $M_Y$, independent of $(X, Y, U_g, V_g)$, such that the joint distribution of $(M_X, M_Y)$ is the same as that of
$$\left( \sqrt{\tfrac{1-\lambda}{\lambda}}\,\big( X - \mathbb{E}[X|U_g,V_g] \big),\ -\sqrt{\tfrac{\lambda}{1-\lambda}}\,\big( Y - \mathbb{E}[Y|U_g,V_g] \big) \right).$$
Denote the covariance matrix of $(M_X^t, M_Y^t)^t$ by $\Sigma_{(M_X^t, M_Y^t)^t}$. We have
$$\begin{aligned}
\Sigma_{(M_X^t, M_Y^t)^t}^{-1} &= \begin{pmatrix} \sqrt{\tfrac{\lambda}{1-\lambda}} I & 0\\ 0 & -\sqrt{\tfrac{1-\lambda}{\lambda}} I \end{pmatrix} \left( \Sigma_{(X^t,Y^t)^t}^{-1} + \Sigma_{(N_1^t,N_2^t)^t}^{-1} \right) \begin{pmatrix} \sqrt{\tfrac{\lambda}{1-\lambda}} I & 0\\ 0 & -\sqrt{\tfrac{1-\lambda}{\lambda}} I \end{pmatrix}\\
&= \begin{pmatrix} \tfrac{\lambda}{1-\lambda}\left( \Sigma_X^{-1} + \Sigma_Z^{-1} + \Sigma_1^{-1} \right) & \Sigma_Z^{-1}\\ \Sigma_Z^{-1} & \tfrac{1-\lambda}{\lambda}\left( \Sigma_Z^{-1} + \Sigma_2^{-1} \right) \end{pmatrix}.
\end{aligned}$$
Define $W_X = X + M_X$ and $W_Y = Y + M_Y$. Since
$$D_{(X^t,Y^t)^t|W_X,W_Y} = \left( \Sigma_{(X^t,Y^t)^t}^{-1} + \Sigma_{(M_X^t,M_Y^t)^t}^{-1} \right)^{-1} = \begin{pmatrix} \Sigma_X^{-1} + \Sigma_Z^{-1} + \tfrac{\lambda}{1-\lambda}\left( \Sigma_X^{-1} + \Sigma_Z^{-1} + \Sigma_1^{-1} \right) & 0\\ 0 & \Sigma_Z^{-1} + \tfrac{1-\lambda}{\lambda}\left( \Sigma_Z^{-1} + \Sigma_2^{-1} \right) \end{pmatrix}^{-1}, \tag{6}$$
$X$ and $Y$ must be conditionally independent given $(W_X, W_Y)$. It can be verified that
$$D_X^g := D_{X|W_X,W_Y,U_g} = \left( D_{X|W_X,W_Y}^{-1} + \Sigma_1^{-1} \right)^{-1} = (1-\lambda)\left( \Sigma_X^{-1} + \Sigma_Z^{-1} + \Sigma_1^{-1} \right)^{-1},$$
$$D_Y^g := D_{Y|W_X,W_Y,V_g} = \left( D_{Y|W_X,W_Y}^{-1} + \Sigma_2^{-1} \right)^{-1} = \lambda\left( \Sigma_Z^{-1} + \Sigma_2^{-1} \right)^{-1},$$
which implies
$$\Sigma_1 = \left( (1-\lambda)(D_X^g)^{-1} - \Sigma_X^{-1} - \Sigma_Z^{-1} \right)^{-1}, \tag{7}$$
$$\Sigma_2 = \left( \lambda (D_Y^g)^{-1} - \Sigma_Z^{-1} \right)^{-1}. \tag{8}$$
Substituting (7) and (8) into (3)–(5) gives
$$D_{X|Y,U_g} = \frac{1}{1-\lambda} D_X^g, \tag{9}$$
$$D_{Y|X,V_g} = \frac{1}{\lambda} D_Y^g, \tag{10}$$
$$D_{X|U_g,V_g} = \frac{1}{1-\lambda} \left( (D_X^g)^{-1} - \frac{1}{\lambda(1-\lambda)} \Sigma_Z^{-1} D_Y^g \Sigma_Z^{-1} \right)^{-1}. \tag{11}$$
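The change of variables can be checked numerically in the scalar case (an illustrative check, not part of the paper). Below, $D_X^g$ and $D_Y^g$ are computed from $\Sigma_1, \Sigma_2$ as above, and (11) is verified against a direct evaluation of $D_{X|U_g,V_g}$ from (5):

```python
px, pz, s1, s2, lam = 1.0, 1.0, 1.0, 1.0, 0.3   # illustrative scalar values

dxg = (1 - lam) / (1.0/px + 1.0/pz + 1.0/s1)    # D_X^g
dyg = lam / (1.0/pz + 1.0/s2)                   # D_Y^g

# Direct evaluation of (5): D_{X|Ug,Vg} = (Sx^{-1} + S1^{-1} + (Sz+S2)^{-1})^{-1}
direct = 1.0 / (1.0/px + 1.0/s1 + 1.0/(pz + s2))

# Right-hand side of (11) in the scalar case
rhs = (1.0 / (1 - lam)) / (1.0/dxg - dyg / (lam * (1 - lam) * pz**2))

assert abs(direct - rhs) < 1e-12
print(direct)
```

The underlying identity is the scalar Woodbury step $(\sigma_Z + \sigma_2)^{-1} = \sigma_Z^{-1} - \sigma_Z^{-2}(\sigma_Z^{-1} + \sigma_2^{-1})^{-1}$, which is what turns (5) into the form (11).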
Therefore, the Gaussian version of the optimization problem in (2) can be written as
$$\inf_{(D_X^g, D_Y^g) \in \mathcal{D}^g} -\frac{\mu}{2}\log|D_X^g| - \frac{\mu-1}{2}\log\left| (D_X^g)^{-1} - \frac{1}{\lambda(1-\lambda)} \Sigma_Z^{-1} D_Y^g \Sigma_Z^{-1} \right| - \frac{1}{2}\log|D_Y^g|, \tag{12}$$
where $\mathcal{D}^g = \{(D_a, D_b) : 0 \prec D_a \preceq (1-\lambda) D_1,\ 0 \prec D_b \preceq \lambda D_2\}$. It is clear that the infimum in (12) is achieved by some $(D_X^{g*}, D_Y^{g*}) \in \mathcal{D}^g$. Moreover, such $(D_X^{g*}, D_Y^{g*})$ must satisfy the following KKT conditions:
$$-\mu (D_X^{g*})^{-1} + (\mu-1)(D_X^{g*})^{-1} \left( (D_X^{g*})^{-1} - \frac{1}{\lambda(1-\lambda)} \Sigma_Z^{-1} D_Y^{g*} \Sigma_Z^{-1} \right)^{-1} (D_X^{g*})^{-1} + \Pi_1^* = 0, \tag{13}$$
$$\frac{\mu-1}{\lambda(1-\lambda)} \Sigma_Z^{-1} \left( (D_X^{g*})^{-1} - \frac{1}{\lambda(1-\lambda)} \Sigma_Z^{-1} D_Y^{g*} \Sigma_Z^{-1} \right)^{-1} \Sigma_Z^{-1} - (D_Y^{g*})^{-1} + \Pi_2^* = 0, \tag{14}$$
$$\Pi_1^* \left( D_X^{g*} - (1-\lambda) D_1 \right) = 0, \tag{15}$$
$$\Pi_2^* \left( D_Y^{g*} - \lambda D_2 \right) = 0, \tag{16}$$
where $\Pi_1^*, \Pi_2^* \succeq 0$.

3. The Key Construction

Let $(X', Y')$ be an identically distributed copy of $(X, Y)$. Moreover, let $N_1'$ and $N_2'$ be two $n$-dimensional zero-mean Gaussian random (column) vectors with covariance matrices $\Sigma_1'$ and $\Sigma_2'$, respectively. It is assumed that $(X', Y')$, $N_1'$, and $N_2'$ are mutually independent. Define $U' = X' + N_1'$ and $V' = Y' + N_2'$. We choose $\Sigma_1'$ and $\Sigma_2'$ such that (cf. (9)–(11))
$$D_{X'|Y',U'} = \frac{1}{1-\lambda} D_X^{g*}, \tag{17}$$
$$D_{Y'|X',V'} = \frac{1}{\lambda} D_Y^{g*}, \tag{18}$$
$$D_{X'|U',V'} = \frac{1}{1-\lambda} \left( (D_X^{g*})^{-1} - \frac{1}{\lambda(1-\lambda)} \Sigma_Z^{-1} D_Y^{g*} \Sigma_Z^{-1} \right)^{-1} \tag{19}$$
for some $(D_X^{g*}, D_Y^{g*}) \in \mathcal{D}^g$ satisfying the KKT conditions (13)–(16).
Let $U$ and $V$ be two arbitrary random objects jointly distributed with $(X, Y)$ such that $U \leftrightarrow X \leftrightarrow Y \leftrightarrow V$ and $(D_{X|Y,U}, D_{Y|X,V}) \in \mathcal{D}$. It is assumed that $(X, Y, U, V)$ and $(X', Y', U', V')$ are mutually independent. We aim to show that the objective function in (2) does not increase when $(X, Y, U, V)$ is replaced with $(X', Y', U', V')$, from which the desired result follows immediately. To this end, we develop a monotone path argument based on the following construction.
For $\lambda \in [0,1]$, define
$$\begin{pmatrix} X_\lambda \\ \bar{X}_\lambda \end{pmatrix} = \begin{pmatrix} \sqrt{\lambda}\, I & \sqrt{1-\lambda}\, I\\ \sqrt{1-\lambda}\, I & -\sqrt{\lambda}\, I \end{pmatrix} \begin{pmatrix} X \\ X' \end{pmatrix}, \qquad \begin{pmatrix} Y_\lambda \\ \bar{Y}_\lambda \end{pmatrix} = \begin{pmatrix} \sqrt{\lambda}\, I & \sqrt{1-\lambda}\, I\\ \sqrt{1-\lambda}\, I & -\sqrt{\lambda}\, I \end{pmatrix} \begin{pmatrix} Y \\ Y' \end{pmatrix}.$$
It is easy to verify the following Markov structures (see Figure 1a):
$$(U, U') \leftrightarrow (X, X') \leftrightarrow (X_\lambda, \bar{Y}_\lambda) \leftrightarrow (Y, Y') \leftrightarrow (V, V'), \qquad (U, U') \leftrightarrow (X, X') \leftrightarrow (\bar{X}_\lambda, Y_\lambda) \leftrightarrow (Y, Y') \leftrightarrow (V, V'), \qquad \lambda \in [0,1]. \tag{20}$$
Note that, as $\lambda$ changes from 0 to 1, $X_\lambda$ ($Y_\lambda$) moves from $X'$ ($Y'$) to $X$ ($Y$) while $\bar{X}_\lambda$ ($\bar{Y}_\lambda$) moves the other way around. This construction generalizes its counterpart in the doubling trick, which corresponds to the special case $\lambda = \frac{1}{2}$.
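The transformation above, written here for scalar components (an illustrative check, not part of the paper), is orthogonal for every $\lambda$; this is what makes quantities such as $h(X_\lambda, \bar{X}_\lambda \mid U, U')$ constant in $\lambda$ later in the proof:

```python
import math

def path(lam, x, xp):
    """Rotation-type map (x, x') -> (x_lam, xbar_lam) for scalar inputs."""
    a, b = math.sqrt(lam), math.sqrt(1 - lam)
    return a * x + b * xp, b * x - a * xp

# Orthogonality: the map preserves the Euclidean norm for every lambda.
for lam in (0.0, 0.25, 0.5, 1.0):
    u, v = path(lam, 3.0, 4.0)
    assert abs(u * u + v * v - 25.0) < 1e-9

# Endpoints: lambda = 0 gives (x', x); lambda = 1 gives (x, -x').
assert path(0.0, 3.0, 4.0) == (4.0, 3.0)
assert path(1.0, 3.0, 4.0) == (3.0, -4.0)
# lambda = 1/2 recovers the doubling-trick pair ((x+x')/sqrt(2), (x-x')/sqrt(2)).
```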
For $\lambda \in (0,1)$, define $W_X = X + M_X$ and $W_Y = Y + M_Y$, where
$$M_X = \sqrt{\tfrac{1-\lambda}{\lambda}}\,\big( X' - \mathbb{E}[X'|U',V'] \big), \qquad M_Y = -\sqrt{\tfrac{\lambda}{1-\lambda}}\,\big( Y' - \mathbb{E}[Y'|U',V'] \big).$$
In view of (6), we have the following Markov structure (see Figure 1b):
$$U \leftrightarrow X \leftrightarrow (W_X, W_Y) \leftrightarrow Y \leftrightarrow V. \tag{21}$$
Define $D_X = D_{X|X_\lambda, \bar{Y}_\lambda, U, U'}$ and $D_Y = D_{Y|X_\lambda, \bar{Y}_\lambda, V, V'}$. It can be verified that for $\lambda \in (0,1)$,
$$D_X = D_{X|X_\lambda, \bar{Y}_\lambda, U, U', V, V'} \tag{22}$$
$$= D_{X|W_X, W_Y, U, U', V, V'} \tag{23}$$
$$= D_{X|W_X, W_Y, U, V} \tag{24}$$
$$= D_{X|W_X, W_Y, U}, \tag{25}$$
where (22) is due to (20), (23) is due to the existence of a bijection between $(X_\lambda, \bar{Y}_\lambda, U', V')$ and $(W_X, W_Y, U', V')$, (24) is due to the fact that $(U', V')$ is independent of $(X, W_X, W_Y, U, V)$, and (25) is due to (21). Similarly, we have $D_Y = D_{Y|W_X, W_Y, V}$ for $\lambda \in (0,1)$.

4. Proof of Theorem 1

The following technical lemmas are needed for proving Theorem 1. Their proofs are relegated to Appendices A–D, respectively.
Lemma 1.
For $\lambda \in (0,1)$,
$$\frac{d}{d\lambda} h(X_\lambda|U,U',V,V') = -\frac{n}{2(1-\lambda)} + \frac{1}{2(1-\lambda)^2} \mathrm{tr}\left( D_{X'|U',V'}^{-1} D_{X|X_\lambda,U,U',V,V'} \right), \tag{26}$$
$$\frac{d}{d\lambda} h(X_\lambda, \bar{Y}_\lambda|U,U') = -\frac{n}{2(1-\lambda)} + \frac{1}{2(1-\lambda)^2} \mathrm{tr}\left( D_{X'|Y',U'}^{-1} D_X \right), \tag{27}$$
$$\frac{d}{d\lambda} h(X_\lambda, \bar{Y}_\lambda|V,V') = \frac{n}{2\lambda} - \frac{1}{2\lambda^2} \mathrm{tr}\left( D_{Y'|X',V'}^{-1} D_Y \right). \tag{28}$$
Lemma 2.
For $\lambda \in (0,1)$,
$$D_{X|X_\lambda, U, U', V, V'} \succeq \Delta \succ 0,$$
where
$$\Delta = \left( D_X^{-1} - \frac{1}{(1-\lambda)^2} \Sigma_Z^{-1} D_Y^{g*} \left( \tfrac{1}{1-\lambda} D_Y^{g*} - D_Y \right)^{-1} D_Y^{g*} \Sigma_Z^{-1} \right)^{-1}.$$
Lemma 3.
$(B^{-1} - A^{-1})^{-1}$ is matrix convex in $(A, B)$ for $A \succ B \succ 0$.
Lemma 4.
Let $X$ be a Gaussian random vector and $U$ be an arbitrary random object. Moreover, let $N_1$ and $N_2$ be two zero-mean Gaussian random vectors, independent of $(X, U)$, with covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively. If $\Sigma_2 \succeq \Sigma_1 \succ 0$, then
$$D_{X|X+N_2,U} \succeq \left( D_{X|X+N_1,U}^{-1} + \Sigma_2^{-1} - \Sigma_1^{-1} \right)^{-1}.$$
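For intuition (an illustrative aside, not in the paper): Lemma 4 holds with equality when $U$ is absent and $X$ is scalar Gaussian, since then $D_{X|X+N_i} = (\Sigma_X^{-1} + \Sigma_i^{-1})^{-1}$. A quick numerical check with illustrative values:

```python
px, s1, s2 = 1.0, 0.5, 2.0   # var(X), var(N1), var(N2); requires s2 >= s1

d1 = 1.0 / (1.0/px + 1.0/s1)   # D_{X | X+N1}
d2 = 1.0 / (1.0/px + 1.0/s2)   # D_{X | X+N2}

# Lemma 4 bound, met with equality in the purely Gaussian scalar case:
bound = 1.0 / (1.0/d1 + 1.0/s2 - 1.0/s1)
assert abs(d2 - bound) < 1e-12
print(d2, bound)
```

The lemma thus quantifies a "Gaussian-extremal" price of extra observation noise: no conditioning structure $U$ can make the distortion grow slower than in this matched Gaussian baseline.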
Now, we are in a position to prove Theorem 1. Define
$$h_\lambda = -\mu h(\bar{X}_\lambda|X_\lambda,U,U') + \mu h(\bar{Y}_\lambda|X_\lambda,U,U') + (\mu-1) h(\bar{X}_\lambda|X_\lambda,U,U',V,V') - h(\bar{Y}_\lambda|X_\lambda,V,V') + h(\bar{X}_\lambda|X_\lambda,V,V').$$
Clearly,
$$h_\lambda\big|_{\lambda=0} = -\mu h(X|U) + \mu h(Y|U) + (\mu-1) h(X|U,V) - h(Y|V) + h(X|V),$$
$$h_\lambda\big|_{\lambda=1} = -\mu h(X'|U') + \mu h(Y'|U') + (\mu-1) h(X'|U',V') - h(Y'|V') + h(X'|V').$$
Therefore, it suffices to show $\frac{d h_\lambda}{d\lambda} \leq 0$ for $\lambda \in (0,1)$. Note that
$$h_\lambda = -\mu h(X_\lambda, \bar{X}_\lambda|U,U') + \mu h(X_\lambda, \bar{Y}_\lambda|U,U') + (\mu-1) h(X_\lambda, \bar{X}_\lambda|U,U',V,V') - (\mu-1) h(X_\lambda|U,U',V,V') - h(X_\lambda, \bar{Y}_\lambda|V,V') + h(X_\lambda, \bar{X}_\lambda|V,V').$$
Since $h(X_\lambda, \bar{X}_\lambda|U,U') = h(X, X'|U,U')$, $h(X_\lambda, \bar{X}_\lambda|U,U',V,V') = h(X, X'|U,U',V,V')$, and $h(X_\lambda, \bar{X}_\lambda|V,V') = h(X, X'|V,V')$, we have
$$\frac{d}{d\lambda} h(X_\lambda, \bar{X}_\lambda|U,U') = 0, \qquad \frac{d}{d\lambda} h(X_\lambda, \bar{X}_\lambda|U,U',V,V') = 0, \qquad \frac{d}{d\lambda} h(X_\lambda, \bar{X}_\lambda|V,V') = 0,$$
which, together with Lemma 1, implies
$$\frac{d h_\lambda}{d\lambda} = -\frac{n}{2\lambda(1-\lambda)} + \frac{\mu}{2(1-\lambda)^2} \mathrm{tr}\left( D_{X'|Y',U'}^{-1} D_X \right) + \frac{1}{2\lambda^2} \mathrm{tr}\left( D_{Y'|X',V'}^{-1} D_Y \right) - \frac{\mu-1}{2(1-\lambda)^2} \mathrm{tr}\left( D_{X'|U',V'}^{-1} D_{X|X_\lambda,U,U',V,V'} \right). \tag{29}$$
Combining Lemma 2 and (29) shows
$$\frac{d h_\lambda}{d\lambda} \leq -\frac{n}{2\lambda(1-\lambda)} - \frac{1}{2} f(D_X, D_Y), \tag{30}$$
where
$$f(D_X, D_Y) = -\frac{\mu}{(1-\lambda)^2} \mathrm{tr}\left( D_{X'|Y',U'}^{-1} D_X \right) - \frac{1}{\lambda^2} \mathrm{tr}\left( D_{Y'|X',V'}^{-1} D_Y \right) + \frac{\mu-1}{(1-\lambda)^2} \mathrm{tr}\left( D_{X'|U',V'}^{-1} \Delta \right).$$
We shall derive a lower bound for $f(D_X, D_Y)$. To this end, we first identify certain constraints on $D_X$ and $D_Y$. Let $W = X + \tilde{Z}$, where $\tilde{Z}$ is a zero-mean Gaussian random vector, independent of $(X, U)$, with covariance matrix $\Sigma_{\tilde{Z}}$. We shall choose $\Sigma_{\tilde{Z}}$ such that $D_{X|W} = D_{X|W_X,W_Y}$, which implies
$$\Sigma_{\tilde{Z}} \preceq \Sigma_Z, \tag{31}$$
$$D_{X|W,U} = D_{X|W_X,W_Y,U}. \tag{32}$$
It can be verified (cf. (6)) that
$$\Sigma_{\tilde{Z}} = \left( \Sigma_Z^{-1} + \frac{\lambda}{1-\lambda}\left( \Sigma_X^{-1} + \Sigma_Z^{-1} + (\Sigma_1')^{-1} \right) \right)^{-1} = \left( \Sigma_Z^{-1} + \lambda (D_X^{g*})^{-1} \right)^{-1}. \tag{33}$$
In view of (25), (31), (32), and Lemma 4,
$$D_{X|Y,U} \succeq \left( D_X^{-1} + \Sigma_Z^{-1} - \Sigma_{\tilde{Z}}^{-1} \right)^{-1}, \tag{34}$$
which, together with (33) and the constraint $0 \prec D_{X|Y,U} \preceq D_1$, implies
$$D_X \preceq \left( D_1^{-1} + \lambda (D_X^{g*})^{-1} \right)^{-1}.$$
Similarly, we have
$$D_Y \preceq \left( D_2^{-1} + (1-\lambda)(D_Y^{g*})^{-1} \right)^{-1}.$$
Define $\bar{D}_1 = \left( D_1^{-1} + \lambda (D_X^{g*})^{-1} \right)^{-1}$, $\bar{D}_2 = \left( D_2^{-1} + (1-\lambda)(D_Y^{g*})^{-1} \right)^{-1}$, and $\bar{\mathcal{D}} = \{(D_a, D_b) : 0 \prec D_a \preceq \bar{D}_1,\ 0 \prec D_b \preceq \bar{D}_2\}$. Consider
$$\min_{(D_X, D_Y) \in \bar{\mathcal{D}}} f(D_X, D_Y),$$
which is a convex semidefinite programming problem according to Lemma 3 (in view of (34) and Lemma 2, we have $D_Y \prec \frac{1}{1-\lambda} D_Y^{g*}$ and $\Delta \succ 0$). Note that $(D_X^\circ, D_Y^\circ) \in \bar{\mathcal{D}}$ is an optimal solution to this convex programming problem if and only if it satisfies the following KKT conditions:
$$-\frac{\mu}{(1-\lambda)^2} D_{X'|Y',U'}^{-1} + \frac{\mu-1}{(1-\lambda)^2} (D_X^\circ)^{-1} \Delta^\circ D_{X'|U',V'}^{-1} \Delta^\circ (D_X^\circ)^{-1} + \Pi_1^\circ = 0, \tag{35}$$
$$-\frac{1}{\lambda^2} D_{Y'|X',V'}^{-1} + \frac{\mu-1}{(1-\lambda)^2} \left( D_Y^{g*} - (1-\lambda) D_Y^\circ \right)^{-1} D_Y^{g*} \Sigma_Z^{-1} \Delta^\circ D_{X'|U',V'}^{-1} \Delta^\circ \Sigma_Z^{-1} D_Y^{g*} \left( D_Y^{g*} - (1-\lambda) D_Y^\circ \right)^{-1} + \Pi_2^\circ = 0, \tag{36}$$
$$\Pi_1^\circ \left( D_X^\circ - \bar{D}_1 \right) = 0, \tag{37}$$
$$\Pi_2^\circ \left( D_Y^\circ - \bar{D}_2 \right) = 0, \tag{38}$$
where $\Pi_1^\circ, \Pi_2^\circ \succeq 0$ and $\Delta^\circ$ denotes $\Delta$ evaluated at $(D_X^\circ, D_Y^\circ)$. Let $D_X^\circ = D_X^{g*}$, $D_Y^\circ = D_Y^{g*}$, $\Pi_1^\circ = \frac{1}{1-\lambda} \Pi_1^*$, and $\Pi_2^\circ = \frac{1}{\lambda} \Pi_2^*$. One can readily verify by leveraging (17)–(19) that (35)–(38) with this particular choice of $(D_X^\circ, D_Y^\circ, \Pi_1^\circ, \Pi_2^\circ)$ can be deduced from (13)–(16); it is also easy to see that $(D_X^{g*}, D_Y^{g*}) \in \bar{\mathcal{D}}$. Therefore,
$$\min_{(D_X, D_Y) \in \bar{\mathcal{D}}} f(D_X, D_Y) = f(D_X^{g*}, D_Y^{g*}). \tag{39}$$
Combining (30), (39), and the fact that $f(D_X^{g*}, D_Y^{g*}) = -\frac{n}{\lambda(1-\lambda)}$ shows $\frac{d h_\lambda}{d\lambda} \leq 0$, which completes the proof.
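The final cancellation can be checked numerically in the scalar case ($n = 1$; illustrative values, not from the paper). Evaluating $f$ at $(D_X^{g*}, D_Y^{g*})$, with (17)–(19) supplying the primed conditional variances, gives exactly $-1/(\lambda(1-\lambda))$:

```python
px, pz, s1, s2, lam = 1.0, 1.0, 1.0, 1.0, 0.3   # illustrative scalar values

# A Gaussian point computed from (Sigma_1, Sigma_2) as in Section 2:
dxg = (1 - lam) / (1.0/px + 1.0/pz + 1.0/s1)          # D_X^{g*}
dyg = lam / (1.0/pz + 1.0/s2)                         # D_Y^{g*}

# Primed conditional variances from (17)-(19):
d_x_yu = dxg / (1 - lam)                              # D_{X'|Y',U'}
d_y_xv = dyg / lam                                    # D_{Y'|X',V'}
d_x_uv = (1.0/(1 - lam)) / (1.0/dxg - dyg/(lam*(1 - lam)*pz**2))  # D_{X'|U',V'}

# Delta evaluated at (D_X, D_Y) = (D_X^{g*}, D_Y^{g*}):
delta = 1.0 / (1.0/dxg
               - dyg**2 / ((1 - lam)**2 * pz**2 * (dyg/(1 - lam) - dyg)))

mu = 2.0   # any mu >= 1; the mu-dependence cancels below
f = (-mu/(1 - lam)**2 * dxg/d_x_yu
     - 1.0/lam**2 * dyg/d_y_xv
     + (mu - 1)/(1 - lam)**2 * delta/d_x_uv)
assert abs(f + 1.0/(lam*(1 - lam))) < 1e-9
```

The cancellation happens because each trace term evaluates to a pure function of $\lambda$ at the Gaussian point: $\mathrm{tr}(D_{X'|Y',U'}^{-1} D_X^{g*}) = (1-\lambda)n$, $\mathrm{tr}(D_{Y'|X',V'}^{-1} D_Y^{g*}) = \lambda n$, and $\mathrm{tr}(D_{X'|U',V'}^{-1} \Delta) = (1-\lambda)n$.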

5. Conclusions

We have generalized an extremal result by Courtade and Jiao. It is worth mentioning that recently Courtade [24] found a different generalization using the doubling trick. So far, we have not been able to prove this new result via the monotone path argument. A deeper understanding of the connection between these two methods is needed. It is conceivable that the convex-like property revealed by the monotone path argument and the subadditive property revealed by the doubling trick are manifestations of a common underlying mathematical structure yet to be uncovered.

Author Contributions

Methodology, J.W., J.C.; writing—original draft preparation, J.W.; writing—review and editing, J.C.

Funding

This research was funded by the National Science Foundation of China grant number 61771305.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 1

Note that
$$\begin{aligned}
\frac{d}{d\lambda} h(X_\lambda|U,U',V,V') &= \frac{d}{d\lambda}\, \frac{n}{2}\log\lambda + \frac{d}{d\lambda} h(X + M_X|U,V)\\
&= \frac{n}{2\lambda} + \frac{d}{d\lambda} h(X + M_X|U,V)\\
&= \frac{n}{2\lambda} + \mathrm{tr}\left( \nabla_{\Sigma_{M_X}} h(X + M_X|U,V)\, \frac{\partial \Sigma_{M_X}}{\partial \lambda} \right)\\
&= \frac{n}{2\lambda} - \frac{1}{\lambda^2} \mathrm{tr}\left( D_{X'|U',V'}\, \nabla_{\Sigma_{M_X}} h(X + M_X|U,V) \right),
\end{aligned} \tag{A1}$$
where $\Sigma_{M_X} = \frac{1-\lambda}{\lambda} D_{X'|U',V'}$. We have
$$\nabla_{\Sigma_{M_X}} h(X + M_X|U,V) = \frac{1}{2} J(X + M_X|U,V) \tag{A2}$$
$$= \frac{\lambda}{2} J(X_\lambda|U,U',V,V'), \tag{A3}$$
where $J(\cdot)$ denotes the Fisher information matrix, and (A2) is due to ([25], Theorem 4). Substituting (A3) into (A1) gives
$$\frac{d}{d\lambda} h(X_\lambda|U,U',V,V') = \frac{n}{2\lambda} - \frac{1}{2\lambda} \mathrm{tr}\left( D_{X'|U',V'}\, J(X_\lambda|U,U',V,V') \right), \tag{A4}$$
which, together with the fact ([26], Lemma 2) that
$$J(X_\lambda|U,U',V,V') = \frac{1}{1-\lambda} D_{X'|U',V'}^{-1} - \frac{\lambda}{(1-\lambda)^2} D_{X'|U',V'}^{-1} D_{X|X_\lambda,U,U',V,V'} D_{X'|U',V'}^{-1}, \tag{A5}$$
proves (26).
It remains to prove (27) since (28) follows by symmetry. Note that
$$\begin{pmatrix} X_\lambda \\ \bar{Y}_\lambda \end{pmatrix} = \begin{pmatrix} \sqrt{\lambda}\, I & 0\\ 0 & \sqrt{1-\lambda}\, I \end{pmatrix} \left( \begin{pmatrix} X \\ Y \end{pmatrix} + \begin{pmatrix} N_X \\ N_Y \end{pmatrix} \right),$$
where $N_X = \sqrt{\tfrac{1-\lambda}{\lambda}}\, X'$ and $N_Y = -\sqrt{\tfrac{\lambda}{1-\lambda}}\, Y'$. Define
$$\bar{N}_X = N_X - \mathbb{E}[N_X|U'], \qquad \bar{N}_Y = N_Y - \mathbb{E}[N_Y|U'].$$
Denote the covariance matrix of $(\bar{N}_X^t, \bar{N}_Y^t)^t$ by $\Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}$. It can be verified (cf. (A4) and (A5)) that
$$\frac{d}{d\lambda} h(X_\lambda, \bar{Y}_\lambda|U,U') = \frac{(1-2\lambda)\, n}{2\lambda(1-\lambda)} + \frac{1}{2} \mathrm{tr}\left( \frac{\partial \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}}{\partial \lambda}\, J\left( (X^t, Y^t)^t + (\bar{N}_X^t, \bar{N}_Y^t)^t \,\middle|\, U \right) \right), \tag{A6}$$
$$J\left( (X^t, Y^t)^t + (\bar{N}_X^t, \bar{N}_Y^t)^t \,\middle|\, U \right) = \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}^{-1} - \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}^{-1} D_{(X^t,Y^t)^t|X_\lambda,\bar{Y}_\lambda,U,U'}\, \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}^{-1}. \tag{A7}$$
Since
$$\Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t} = \begin{pmatrix} \tfrac{1-\lambda}{\lambda} D_{X'|U'} & -D_{X'|U'}\\ -D_{X'|U'} & \tfrac{\lambda}{1-\lambda}\left( D_{X'|U'} + \Sigma_Z \right) \end{pmatrix},$$
we have
$$\Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}^{-1} = \begin{pmatrix} \tfrac{\lambda}{1-\lambda}\left( D_{X'|U'}^{-1} + \Sigma_Z^{-1} \right) & \Sigma_Z^{-1}\\ \Sigma_Z^{-1} & \tfrac{1-\lambda}{\lambda} \Sigma_Z^{-1} \end{pmatrix}, \qquad \frac{\partial \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}}{\partial \lambda} = \begin{pmatrix} -\tfrac{1}{\lambda^2} D_{X'|U'} & 0\\ 0 & \tfrac{1}{(1-\lambda)^2}\left( D_{X'|U'} + \Sigma_Z \right) \end{pmatrix},$$
which further implies
$$\frac{1}{2} \mathrm{tr}\left( \frac{\partial \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}}{\partial \lambda}\, \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}^{-1} \right) = 0. \tag{A8}$$
Combining (A6)–(A8) gives
$$\frac{d}{d\lambda} h(X_\lambda, \bar{Y}_\lambda|U,U') = \frac{(1-2\lambda)\, n}{2\lambda(1-\lambda)} - \frac{1}{2} \mathrm{tr}\left( \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}^{-1}\, \frac{\partial \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}}{\partial \lambda}\, \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}^{-1}\, D_{(X^t,Y^t)^t|X_\lambda,\bar{Y}_\lambda,U,U'} \right). \tag{A9}$$
In view of (20),
$$D_{(X^t,Y^t)^t|X_\lambda,\bar{Y}_\lambda,U,U'} = \begin{pmatrix} D_X & 0\\ 0 & D_{Y|X_\lambda,\bar{Y}_\lambda} \end{pmatrix}. \tag{A10}$$
Moreover,
$$\Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}^{-1}\, \frac{\partial \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}}{\partial \lambda}\, \Sigma_{(\bar{N}_X^t, \bar{N}_Y^t)^t}^{-1} = \begin{pmatrix} -\tfrac{1}{(1-\lambda)^2}\left( D_{X'|U'}^{-1} + \Sigma_Z^{-1} \right) & 0\\ 0 & \tfrac{1}{\lambda^2} \Sigma_Z^{-1} \end{pmatrix} = \begin{pmatrix} -\tfrac{1}{(1-\lambda)^2} D_{X'|Y',U'}^{-1} & 0\\ 0 & \tfrac{1}{\lambda^2} \Sigma_Z^{-1} \end{pmatrix}. \tag{A11}$$
Substituting (A10) and (A11) into (A9) yields
$$\frac{d}{d\lambda} h(X_\lambda, \bar{Y}_\lambda|U,U') = \frac{(1-2\lambda)\, n}{2\lambda(1-\lambda)} + \frac{1}{2(1-\lambda)^2} \mathrm{tr}\left( D_{X'|Y',U'}^{-1} D_X \right) - \frac{1}{2\lambda^2} \mathrm{tr}\left( \Sigma_Z^{-1} D_{Y|X_\lambda,\bar{Y}_\lambda} \right). \tag{A12}$$
It can be verified that
$$D_{(X^t,Y^t)^t|X_\lambda,\bar{Y}_\lambda} = D_{(X^t,Y^t)^t|X+N_X,\, Y+N_Y} = \left( \Sigma_{(X^t,Y^t)^t}^{-1} + \Sigma_{(N_X^t,N_Y^t)^t}^{-1} \right)^{-1} = \begin{pmatrix} (1-\lambda)\left( \Sigma_X^{-1} + \Sigma_Z^{-1} \right)^{-1} & 0\\ 0 & \lambda \Sigma_Z \end{pmatrix},$$
which implies
$$D_{Y|X_\lambda,\bar{Y}_\lambda} = \lambda \Sigma_Z. \tag{A13}$$
Substituting (A13) into (A12) proves (27).
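(A13) can be confirmed numerically in the scalar case (an illustrative check, not part of the paper), by conditioning $Y$ on $(X_\lambda, \bar{Y}_\lambda)$ directly with $(X', Y')$ an independent copy of $(X, Y)$:

```python
import math

px, pz, lam = 1.0, 2.0, 0.3   # var(X), var(Z), path parameter; illustrative
a, b = math.sqrt(lam), math.sqrt(1 - lam)

# X_lam = a*X + b*X', Ybar_lam = b*Y - a*Y', with Y = X + Z.
py = px + pz
v_xlam, v_ybar = px, py                  # a^2*px + b^2*px = px, similarly for Y
c_y_xlam, c_y_ybar = a * px, b * py
c_xlam_ybar = a * b * px - b * a * px    # = 0: the two observations decouple

# var(Y | X_lam, Ybar_lam) with uncorrelated observations:
cond = py - c_y_xlam**2 / v_xlam - c_y_ybar**2 / v_ybar
assert abs(cond - lam * pz) < 1e-12      # matches (A13): lambda * var(Z)
assert abs(c_xlam_ybar) < 1e-15
```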

Appendix B. Proof of Lemma 2

It can be verified that $\mathbb{E}[W_Y|X,Y,W_X] = AX + Y - AW_X$, where $A = \frac{1}{1-\lambda} D_Y^{g*} \Sigma_Z^{-1}$; moreover, $Q := W_Y - \mathbb{E}[W_Y|X,Y,W_X]$ is a zero-mean Gaussian random vector with covariance matrix $\Sigma_Q = \frac{1}{1-\lambda} D_Y^{g*}$, and is independent of $(X, Y, W_X, U, V)$. Define $\tilde{Y} = AX + Y$. We have
$$D_{(X^t,\tilde{Y}^t)^t|W_X,W_Y,U,V} = \begin{pmatrix} I & 0\\ A & I \end{pmatrix} D_{(X^t,Y^t)^t|W_X,W_Y,U,V} \begin{pmatrix} I & A^t\\ 0 & I \end{pmatrix} = \begin{pmatrix} I & 0\\ A & I \end{pmatrix} \begin{pmatrix} D_X & 0\\ 0 & D_Y \end{pmatrix} \begin{pmatrix} I & A^t\\ 0 & I \end{pmatrix}. \tag{A14}$$
On the other hand,
$$D_{(X^t,\tilde{Y}^t)^t|W_X,W_Y,U,V} = D_{(X^t,\tilde{Y}^t)^t|W_X,U,V,\tilde{Y}+Q} \preceq \left( D_{(X^t,\tilde{Y}^t)^t|W_X,U,V}^{-1} + \begin{pmatrix} 0 & 0\\ 0 & \Sigma_Q^{-1} \end{pmatrix} \right)^{-1}, \tag{A15}$$
where (A15) is due to the fact that the distortion covariance matrix incurred by the MMSE estimator of $(X^t,\tilde{Y}^t)^t$ from $(W_X, U, V, \tilde{Y}+Q)$ is no greater than (in the semidefinite sense) that incurred by the linear MMSE estimator of $(X^t,\tilde{Y}^t)^t$ from $(\mathbb{E}[(X^t,\tilde{Y}^t)^t|W_X,U,V], \tilde{Y}+Q)$. Combining (A14) and (A15) yields
$$D_{(X^t,\tilde{Y}^t)^t|W_X,U,V} \succeq \begin{pmatrix} D_X^{-1} + A^t D_Y^{-1} A & -A^t D_Y^{-1}\\ -D_Y^{-1} A & D_Y^{-1} - \Sigma_Q^{-1} \end{pmatrix}^{-1}.$$
Comparing the upper-left submatrices on the two sides of the above inequality gives $D_{X|W_X,U,V} \succeq \Delta$, which, together with the fact that $D_{X|X_\lambda,U,U',V,V'} = D_{X|W_X,U,V}$, proves $D_{X|X_\lambda,U,U',V,V'} \succeq \Delta$. Moreover, we have
$$0 \prec D_{(X^t,\tilde{Y}^t)^t|W_X,U,V}^{-1} \preceq \begin{pmatrix} D_X^{-1} + A^t D_Y^{-1} A & -A^t D_Y^{-1}\\ -D_Y^{-1} A & D_Y^{-1} - \Sigma_Q^{-1} \end{pmatrix},$$
where the second inequality is due to (A14) and (A15). Therefore, $\Delta$, which is equal to the upper-left submatrix of
$$\begin{pmatrix} D_X^{-1} + A^t D_Y^{-1} A & -A^t D_Y^{-1}\\ -D_Y^{-1} A & D_Y^{-1} - \Sigma_Q^{-1} \end{pmatrix}^{-1},$$
must be positive definite.

Appendix C. Proof of Lemma 3

It is known [27] that $B S^{-1} B$ is matrix convex in $(S, B)$ for symmetric matrix $B$ and positive definite matrix $S$. The desired conclusion follows from the fact that $(B^{-1} - A^{-1})^{-1} = B + B(A - B)^{-1} B$ and that affine transformations preserve convexity.
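In the scalar case both the identity and the convexity claim can be checked directly (illustrative values, not from the paper):

```python
def g(a, b):
    """g(a, b) = (b^{-1} - a^{-1})^{-1} for scalars with a > b > 0."""
    return 1.0 / (1.0/b - 1.0/a)

# Identity (B^{-1} - A^{-1})^{-1} = B + B (A - B)^{-1} B:
a, b = 5.0, 2.0
assert abs(g(a, b) - (b + b * b / (a - b))) < 1e-12

# Midpoint convexity along a segment with A > B > 0 at both endpoints:
a1, b1, a2, b2 = 3.0, 1.0, 5.0, 2.0
mid = g((a1 + a2) / 2, (b1 + b2) / 2)
assert mid <= (g(a1, b1) + g(a2, b2)) / 2
print(mid)
```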

Appendix D. Proof of Lemma 4

Without loss of generality, we assume that $N_2 = N_1 + \tilde{N}$, where $\tilde{N}$ is a zero-mean Gaussian random vector with covariance matrix $\tilde{\Sigma} = \Sigma_2 - \Sigma_1$ and is independent of $(X, N_1, U)$. As a consequence, $\mathbb{E}[N_1|N_2] = H N_2$, where $H = (\Sigma_1^{-1} + \tilde{\Sigma}^{-1})^{-1} \tilde{\Sigma}^{-1}$; moreover, $M := N_1 - \mathbb{E}[N_1|N_2]$ is a zero-mean Gaussian random vector with covariance matrix $\Sigma_M = (\Sigma_1^{-1} + \tilde{\Sigma}^{-1})^{-1}$, and is independent of $(X, N_2, U)$. Note that
$$J(X + N_1|X + N_2, U) = J\left( (I - H) X + M \,\middle|\, X + N_2, U \right) = \Sigma_1^{-1} + \tilde{\Sigma}^{-1} - \Sigma_1^{-1} D_{X|X+N_1,U} \Sigma_1^{-1}, \tag{A16}$$
where the second equality follows from ([26], Lemma 2). On the other hand,
$$J(X + N_1|X + N_2, U) \succeq D_{X+N_1|X+N_2,U}^{-1} \tag{A17}$$
$$= D_{(I-H)X+M|X+N_2,U}^{-1} = \left( (I - H) D_{X|X+N_2,U} (I - H)^t + \Sigma_M \right)^{-1}, \tag{A18}$$
where (A17) is due to the conditional version of the Cramér–Rao inequality. Combining (A16) and (A18) yields the desired result.

References

  1. Bergmans, P. A simple converse for broadcast channels with additive white Gaussian noise (corresp.). IEEE Trans. Inf. Theory 1974, 20, 279–280. [Google Scholar] [CrossRef]
  2. Weingarten, H.; Steinberg, Y.; Shamai, S. The capacity region of the Gaussian multiple-input multiple-output broadcast channel. IEEE Trans. Inf. Theory 2006, 52, 3936–3964. [Google Scholar] [CrossRef]
  3. Prabhakaran, V.; Tse, D.; Ramchandran, K. Rate region of the quadratic Gaussian CEO problem. In Proceedings of the IEEE International Symposium on Information Theory, (ISIT), Chicago, IL, USA, 27 June–2 July 2004; p. 117. [Google Scholar]
  4. Oohama, Y. Rate-distortion theory for Gaussian multiterminal source coding systems with several side informations at the decoder. IEEE Trans. Inf. Theory 2005, 51, 2577–2593. [Google Scholar] [CrossRef]
  5. Wang, J.; Chen, J. Vector Gaussian two-terminal source coding. IEEE Trans. Inf. Theory 2013, 59, 3693–3708. [Google Scholar] [CrossRef]
  6. Wang, J.; Chen, J. Vector Gaussian multiterminal source coding. IEEE Trans. Inf. Theory 2014, 60, 5533–5552. [Google Scholar] [CrossRef]
  7. Liu, T.; Shamai, S. A note on the secrecy capacity of the multiple-antenna wiretap channel. IEEE Trans. Inf. Theory 2009, 55, 2547–2553. [Google Scholar] [CrossRef]
  8. Chen, J. Rate region of Gaussian multiple description coding with individual and central distortion constraints. IEEE Trans. Inf. Theory 2009, 55, 3991–4005. [Google Scholar] [CrossRef]
  9. Motahari, A.S.; Khandani, A.K. Capacity bounds for the Gaussian interference channel. IEEE Trans. Inf. Theory 2009, 55, 620–643. [Google Scholar] [CrossRef]
  10. Shang, X.; Kramer, G.; Chen, B. A new outer bound and the noisy-interference sum-rate capacity for Gaussian interference channels. IEEE Trans. Inf. Theory 2009, 55, 689–699. [Google Scholar] [CrossRef]
  11. Annapureddy, V.S.; Veeravalli, V.V. Gaussian interference networks: Sum capacity in the low interference regime and new outer bounds on the capacity region. IEEE Trans. Inf. Theory 2009, 55, 3032–3050. [Google Scholar] [CrossRef]
  12. Song, L.; Chen, J.; Wang, J.; Liu, T. Gaussian robust sequential and predictive coding. IEEE Trans. Inf. Theory 2013, 59, 3635–3652. [Google Scholar] [CrossRef]
  13. Song, L.; Chen, J.; Tian, C. Broadcasting correlated vector Gaussians. IEEE Trans. Inf. Theory 2015, 61, 2465–2477. [Google Scholar] [CrossRef]
  14. Khezeli, K.; Chen, J. A source-channel separation theorem with application to the source broadcast problem. IEEE Trans. Inf. Theory 2016, 62, 1764–1781. [Google Scholar] [CrossRef]
  15. Tian, C.; Chen, J.; Diggavi, S.N.; Shamai, S. Matched multiuser Gaussian source channel communications via uncoded schemes. IEEE Trans. Inf. Theory 2017, 63, 4155–4171. [Google Scholar] [CrossRef]
  16. Geng, Y.; Nair, C. The capacity region of the two-receiver Gaussian vector broadcast channel with private and common messages. IEEE Trans. Inf. Theory 2014, 60, 2087–2104. [Google Scholar] [CrossRef]
  17. Geng, Y.; Nair, C. Reflection and summary of the paper: The capacity region of the two-receiver Gaussian vector broadcast channel with private and common messages. IEEE IT Soc. Newlett. 2017, 67, 5–7. [Google Scholar]
  18. Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control 1959, 2, 101–112. [Google Scholar] [CrossRef] [Green Version]
  19. Blachman, N.M. The convolution inequality for entropy powers. IEEE Trans. Inf. Theory 1965, 11, 267–271. [Google Scholar] [CrossRef]
  20. Dembo, A.; Cover, T.M.; Thomas, J.A. Information theoretic inequalities. IEEE Trans. Inf. Theory 1991, 37, 1501–1518. [Google Scholar] [CrossRef]
  21. Liu, T.; Viswanath, P. An extremal inequality motivated by multiterminal information-theoretic problems. IEEE Trans. Inf. Theory 2007, 53, 1839–1851. [Google Scholar] [CrossRef]
  22. Courtade, T.A. An Extremal Conjecture: Experimenting With Online Collaboration. Information Theory b-Log. 5 March 2013. Available online: http://blogs.princeton.edu/blogit/2013/03/05/ (accessed on 15 March 2013).
  23. Courtade, T.A.; Jiao, J. An extremal inequality for long Markov chains. arXiv, 2014; arXiv:1404.6984v1. [Google Scholar]
  24. Courtade, T.A. A strong entropy power inequality. IEEE Trans. Inf. Theory 2018, 64, 2173–2192. [Google Scholar] [CrossRef]
  25. Palomar, D.P.; Verdú, S. Gradient of mutual information in linear vector Gaussian channels. IEEE Trans. Inf. Theory 2006, 52, 141–154. [Google Scholar] [CrossRef] [Green Version]
  26. Zhou, Y.; Xu, Y.; Yu, W.; Chen, J. On the optimal fronthaul compression and decoding strategies for uplink cloud radio access networks. IEEE Trans. Inf. Theory 2016, 62, 7402–7418. [Google Scholar] [CrossRef]
  27. Marshall, A.W.; Olkin, I. Inequalities: Theory of Majorization and Its Applications; Academic Press: New York, NY, USA, 1979. [Google Scholar]
Figure 1. Illustrations of the Markov structures in (20) and (21).

Wang, J.; Chen, J. A Monotone Path Proof of an Extremal Result for Long Markov Chains. Entropy 2019, 21, 276. https://doi.org/10.3390/e21030276