A Monotone Path Proof of an Extremal Result for Long Markov Chains

We prove an extremal result for long Markov chains via the monotone path argument, generalizing earlier work by Courtade and Jiao.


Introduction
Shannon's entropy power inequality (EPI) and its variants assert the optimality of the Gaussian solution to certain extremal problems. They play important roles in characterizing the information-theoretic limits of network information theory problems that involve Gaussian sources and/or channels (see, e.g., [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]). Many different approaches have been developed for proving such extremal results. Two notable ones are the doubling trick [16,17] and the monotone path argument [18][19][20][21]. Roughly speaking, the former reaches the desired conclusion by establishing the subadditivity of the relevant functional, while the latter accomplishes its goal by constructing a monotone path with one end associated with an arbitrary given point in the feasible region and the other associated with the optimal solution to the Gaussian version of the problem. Though the doubling trick typically yields simpler proofs, the monotone path argument tends to be more informative. Indeed, it shows not only the existence of a Gaussian optimizer, but also that every Karush-Kuhn-Tucker (KKT) point (also known as a stationary point) of the Gaussian version of the problem is in fact globally optimal. Such information is highly useful for numerical optimization.
Several years ago, inspired by the Gaussian two-terminal source coding problem, Courtade [22] conjectured the following EPI-type extremal result for long Markov chains.

Conjecture 1. Let X and Z be two independent n-dimensional zero-mean Gaussian random (column) vectors with covariance matrices Σ_X ≻ 0 and Σ_Z ≻ 0, respectively, and define Y = X + Z. Then, for any μ ≥ 0, the infimum in (1), taken over all (U, V) such that U ↔ X ↔ Y ↔ V, has a Gaussian minimizer (i.e., the infimum is achieved by some (U, V) jointly Gaussian with (X, Y)). Here, U ↔ X ↔ Y ↔ V means that U, X, Y, and V form a Markov chain in that order.
Later, Courtade and Jiao [23] proved this conjecture using the doubling trick. It is natural to ask whether this conjecture can also be proved via the monotone path argument. We shall show in this paper that it is indeed possible. Our work also sheds some light on the connection between these two proof strategies.
In fact, we shall prove a strengthened version of Conjecture 1 with some additional constraints imposed on (U, V). For any random (column) vector S and random object ω, let D_{S|ω} denote the distortion covariance matrix incurred by the minimum mean squared error (MMSE) estimator of S from ω (i.e., D_{S|ω} := E[(S − E[S|ω])(S − E[S|ω])^t]), where (·)^t is the transpose operator. The main result of this paper is as follows.

Remark 1. The objective functions in (1) and (2) are equivalent; indeed, they agree up to an additive constant (below, "≈" means that the two sides are equal up to an additive constant).

The rest of the paper is organized as follows. Section 2 is devoted to the analysis of the Gaussian version of the optimization problem in (2). The key construction underlying our monotone path argument is introduced in Section 3. The proof of Theorem 1 is presented in Section 4. We conclude the paper in Section 5.
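Since the distortion covariance matrix D_{S|ω} plays a central role throughout, a quick numerical sanity check may help. For jointly Gaussian (S, ω), the MMSE estimator is linear and D_{S|ω} reduces to the Schur complement Σ_S − Σ_{Sω} Σ_ω^{-1} Σ_{Sω}^t. The sketch below (all covariance values are illustrative, not taken from the paper) compares this closed form against a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative joint covariance of (S, w): S and w are each 2-dimensional.
Sigma_S = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma_w = np.array([[1.5, 0.2], [0.2, 1.0]])
Sigma_Sw = np.array([[0.8, 0.1], [0.0, 0.3]])

Sigma = np.block([[Sigma_S, Sigma_Sw], [Sigma_Sw.T, Sigma_w]])
assert np.all(np.linalg.eigvalsh(Sigma) > 0)  # valid joint covariance

# Closed-form MMSE distortion covariance: Schur complement of Sigma_w.
D = Sigma_S - Sigma_Sw @ np.linalg.solve(Sigma_w, Sigma_Sw.T)

# Monte Carlo check: sample (S, w), form the linear MMSE estimate of S
# from w, and compare the empirical error covariance with D.
Z = rng.multivariate_normal(np.zeros(4), Sigma, size=200_000)
S, w = Z[:, :2], Z[:, 2:]
S_hat = (Sigma_Sw @ np.linalg.solve(Sigma_w, w.T)).T
err = S - S_hat
D_emp = err.T @ err / len(err)
assert np.allclose(D_emp, D, atol=0.05)  # agreement up to sampling error
```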

The Gaussian Version
In this section, we consider the Gaussian version of the optimization problem in (2), defined by imposing the restriction that (U, V) and (X, Y) are jointly Gaussian. Specifically, let U_g = X + N_1 and V_g = Y + N_2, where N_1 and N_2 are two independent n-dimensional zero-mean Gaussian random (column) vectors with covariance matrices Σ_1 ≻ 0 and Σ_2 ≻ 0, respectively. It is assumed that (N_1, N_2) is independent of (X, Y); as a consequence, the Markov chain constraint U_g ↔ X ↔ Y ↔ V_g is satisfied. Now, it is straightforward to write down the Gaussian version of the optimization problem in (2) with Σ_1 and Σ_2 as variables. However, as shown in the sequel, through a judicious change of variables, one can obtain an equivalent version that is more amenable to analysis. Given λ ∈ (0, 1), we introduce two random (column) vectors M_X and M_Y, independent of (X, Y, U_g, V_g), with a suitably specified joint distribution, and denote the covariance matrix of (M_X^t, M_Y^t)^t accordingly. Note that X and Y must be conditionally independent given (W_X, W_Y). It can be verified that (7) and (8) hold; substituting (7) and (8) into (3)–(5) then gives the desired expressions. Therefore, the Gaussian version of the optimization problem in (2) can be written as a convex semidefinite program with associated multipliers Π_1, Π_2 ⪰ 0.
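For the Gaussian test channel U_g = X + N_1, the distortion covariance matrix admits the standard closed form D_{X|U_g} = (Σ_X^{-1} + Σ_1^{-1})^{-1}. The following sketch (with illustrative covariance values, not the paper's) checks that this "information" form agrees with the Schur-complement form of the MMSE error covariance.

```python
import numpy as np

# Illustrative covariances of X and of the independent noise N1.
Sigma_X = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma_1 = np.array([[1.0, 0.0], [0.0, 0.5]])

# For U_g = X + N1, the MMSE error covariance of X given U_g is the
# Schur complement: Sigma_X - Sigma_X (Sigma_X + Sigma_1)^{-1} Sigma_X.
D_schur = Sigma_X - Sigma_X @ np.linalg.solve(Sigma_X + Sigma_1, Sigma_X)

# Equivalent "parallel sum" form: (Sigma_X^{-1} + Sigma_1^{-1})^{-1}.
D_info = np.linalg.inv(np.linalg.inv(Sigma_X) + np.linalg.inv(Sigma_1))

assert np.allclose(D_schur, D_info)  # the two forms coincide
```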

The Key Construction
Let (X^*, Y^*) be an identically distributed copy of (X, Y). Moreover, let N_1^* and N_2^* be two n-dimensional zero-mean Gaussian random (column) vectors with covariance matrices Σ_1^* and Σ_2^*, respectively. It is assumed that (X^*, Y^*), N_1^*, and N_2^* are mutually independent. Define U^* = X^* + N_1^* and V^* = Y^* + N_2^*. We choose Σ_1^* and Σ_2^* such that (cf. (9)–(11)) the induced distortion pair (D_X^*, D_Y^*) ∈ D satisfies the KKT conditions (13)–(16). Let U and V be two arbitrary random objects jointly distributed with (X, Y) such that U ↔ X ↔ Y ↔ V and (D_{X|Y,U}, D_{Y|X,V}) ∈ D. It is assumed that (X, Y, U, V) and (X^*, Y^*, U^*, V^*) are mutually independent. We aim to show that the objective function in (2) does not increase when (X, Y, U, V) is replaced with (X^*, Y^*, U^*, V^*), from which the desired result follows immediately. To this end, we develop a monotone path argument based on the following construction.
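The choice of Σ_1^* from a target distortion D_X^* can be made explicit in the standard Gaussian setting: inverting D = (Σ_X^{-1} + Σ_1^{-1})^{-1} gives Σ_1^* = ((D_X^*)^{-1} − Σ_X^{-1})^{-1} whenever 0 ≺ D_X^* ≺ Σ_X. This is a generic Gaussian MMSE fact offered as a sketch (with illustrative values), not the paper's exact construction:

```python
import numpy as np

Sigma_X = np.array([[2.0, 0.3], [0.3, 1.0]])
# Illustrative target distortion satisfying 0 < D_star < Sigma_X.
D_star = np.array([[0.5, 0.1], [0.1, 0.4]])

# Invert the MMSE relation D = (Sigma_X^{-1} + Sigma_1^{-1})^{-1} to
# recover the noise covariance achieving D_star exactly.
Sigma_1_star = np.linalg.inv(np.linalg.inv(D_star) - np.linalg.inv(Sigma_X))
assert np.all(np.linalg.eigvalsh(Sigma_1_star) > 0)  # requires D_star < Sigma_X

# Forward check: with U* = X* + N_1*, the achieved distortion is D_star.
D_achieved = np.linalg.inv(np.linalg.inv(Sigma_X) + np.linalg.inv(Sigma_1_star))
assert np.allclose(D_achieved, D_star)
```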
For λ ∈ [0, 1], define (X_λ, Y_λ) and (X̄_λ, Ȳ_λ) as follows. It is easy to verify the following Markov structures (see Figure 1a). Note that, as λ changes from 0 to 1, X_λ (resp. Y_λ) moves from X^* (resp. Y^*) to X (resp. Y), while X̄_λ (resp. Ȳ_λ) moves the other way around. This construction generalizes its counterpart in the doubling trick, which corresponds to the special case λ = 1/2.
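The displayed definitions of (X_λ, Y_λ) and (X̄_λ, Ȳ_λ) are not reproduced above. One rotation-type interpolation consistent with the stated endpoint behavior and with the λ = 1/2 doubling-trick case is X_λ = √λ X + √(1−λ) X^* and X̄_λ = √(1−λ) X − √λ X^* (an assumption on our part, offered only as an illustration). The sketch below checks numerically that X_λ then has the same law as X and is independent of X̄_λ.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.3
Sigma_X = np.array([[2.0, 0.4], [0.4, 1.0]])

# Independent copies X and X* with the same Gaussian law.
n = 200_000
X = rng.multivariate_normal(np.zeros(2), Sigma_X, size=n)
Xs = rng.multivariate_normal(np.zeros(2), Sigma_X, size=n)

# Rotation-type interpolation (assumed form, for illustration only).
X_lam = np.sqrt(lam) * X + np.sqrt(1 - lam) * Xs
Xbar_lam = np.sqrt(1 - lam) * X - np.sqrt(lam) * Xs

# X_lam keeps the covariance of X; X_lam and Xbar_lam are uncorrelated,
# hence independent since they are jointly Gaussian.
cov_X_lam = X_lam.T @ X_lam / n
cross = X_lam.T @ Xbar_lam / n
assert np.allclose(cov_X_lam, Sigma_X, atol=0.05)
assert np.max(np.abs(cross)) < 0.05
```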

Proof of Theorem 1
The following technical lemmas are needed for proving Theorem 1. Their proofs are relegated to Appendices A–D.
Lemma 2. For λ ∈ (0, 1), the following holds.

Lemma 4. Let X be a Gaussian random vector and U be an arbitrary random object. Moreover, let N_1 and N_2 be two zero-mean Gaussian random vectors, independent of (X, U), with covariance matrices Σ_1 and Σ_2, respectively. If Σ_2 ⪰ Σ_1 ≻ 0, then the following holds.

Now, we are in a position to prove Theorem 1. Define h_λ accordingly. Clearly, it suffices to show dh_λ/dλ ≤ 0 for λ ∈ (0, 1). Note that the derivative admits an expression which, together with Lemma 1, implies (29). Combining Lemma 2 and (29) shows (30), where f is defined accordingly. We shall derive a lower bound for f(D_X, D_Y). To this end, we first identify certain constraints on D_X and D_Y. Let W = X + Z̃, where Z̃ is a zero-mean Gaussian random vector, independent of (X, U), with covariance matrix Σ_Z̃. We shall choose Σ_Z̃ such that D_{X|W} = D_{X|W^*}. It can be verified (cf. (6)) that the corresponding identities hold. In view of (25), (31), (32), and Lemma 4, we obtain an inequality which, together with (33) and the constraint 0 ≺ D_{X|Y,U} ⪯ D_1, implies the desired bound on D_X. Similarly, we have the corresponding bound on D_Y. The resulting minimization is a convex semidefinite programming problem according to Lemma 3 (in view of (34) and Lemma 2, we have D_Y ≺ (1/(1−λ)) D_Y^* and ∆ ≻ 0). Note that (D_X, D_Y) ∈ D̃ is an optimal solution to this convex programming problem if and only if it satisfies the KKT conditions (35)–(38), where Π̃_1, Π̃_2 ⪰ 0. Let D_X = D_X^*, D_Y = D_Y^*, Π̃_1 = (1/(1−λ)) Π_1, and Π̃_2 = (1/λ) Π_2. One can readily verify by leveraging (17)–(19) that (35)–(38) with this particular choice of (D_X, D_Y, Π̃_1, Π̃_2) can be deduced from (13)–(16); it is also easy to see that (D_X^*, D_Y^*) ∈ D̃. Therefore, (39) holds. Combining (30), (39), and the fact that f(D_X^*, D_Y^*) = −n/(λ(1−λ)) shows dh_λ/dλ ≤ 0, which completes the proof.
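Lemma 4's displayed conclusion is not reproduced above. A closely related, easily verified Gaussian fact in the same spirit is that a noisier observation cannot decrease the MMSE distortion, in the positive semidefinite order; the sketch below (with illustrative covariance values) checks this.

```python
import numpy as np

Sigma_X = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma_1 = np.array([[0.8, 0.0], [0.0, 0.5]])
Sigma_2 = Sigma_1 + np.array([[0.5, 0.1], [0.1, 0.3]])  # Sigma_2 >= Sigma_1

def mmse_cov(Sigma_X, Sigma_N):
    """MMSE error covariance of X from X + N for independent Gaussians."""
    return np.linalg.inv(np.linalg.inv(Sigma_X) + np.linalg.inv(Sigma_N))

D1 = mmse_cov(Sigma_X, Sigma_1)
D2 = mmse_cov(Sigma_X, Sigma_2)

# Noisier observation cannot decrease the distortion: D2 - D1 is PSD.
assert np.min(np.linalg.eigvalsh(D2 - D1)) > -1e-12
```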

Conclusions
We have generalized an extremal result by Courtade and Jiao. It is worth mentioning that recently Courtade [24] found a different generalization using the doubling trick. So far, we have not been able to prove this new result via the monotone path argument. A deeper understanding of the connection between these two methods is needed. It is conceivable that the convex-like property revealed by the monotone path argument and the subadditive property revealed by the doubling trick are manifestations of a common underlying mathematical structure yet to be uncovered.
It remains to prove (27), since (28) follows by symmetry. Denote the covariance matrix of (N_1^t, N_2^t)^t as indicated. It can be verified (cf. (A4) and (A5)) that the claimed identity holds, which further implies the next display. In view of (20), the subsequent step follows. Moreover, it can be verified that (A13) holds. Substituting (A13) into (A12) proves (27).
where the second inequality is due to (A14) and (A15). Therefore, ∆, which equals the inverse of the upper-left submatrix of the matrix above, must be positive definite.

Appendix C. Proof of Lemma 3
It is known [27] that BS^{-1}B is jointly matrix convex in (S, B) for symmetric B and positive definite S. The desired conclusion follows from the identity (B^{-1} − A^{-1})^{-1} = B + B(A − B)^{-1}B and the fact that affine transformations preserve convexity.
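Both ingredients of this proof can be verified numerically: the joint matrix convexity of (S, B) ↦ BS^{-1}B at a midpoint, and the inversion identity (B^{-1} − A^{-1})^{-1} = B + B(A − B)^{-1}B for A ≻ B ≻ 0. The sketch below uses randomly generated matrices.

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_spd(k):
    """Random symmetric positive definite matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

k = 3
B = rand_spd(k)
A = B + rand_spd(k)  # ensures A - B > 0, hence B^{-1} - A^{-1} > 0

# Identity: (B^{-1} - A^{-1})^{-1} = B + B (A - B)^{-1} B.
lhs = np.linalg.inv(np.linalg.inv(B) - np.linalg.inv(A))
rhs = B + B @ np.linalg.solve(A - B, B)
assert np.allclose(lhs, rhs)

# Joint matrix convexity of (S, B) -> B S^{-1} B at the midpoint:
# the midpoint value is <= the average value in the PSD order.
S1, S2 = rand_spd(k), rand_spd(k)
B1 = rng.standard_normal((k, k)); B1 = (B1 + B1.T) / 2
B2 = rng.standard_normal((k, k)); B2 = (B2 + B2.T) / 2
Sm, Bm = (S1 + S2) / 2, (B1 + B2) / 2
gap = (B1 @ np.linalg.solve(S1, B1) + B2 @ np.linalg.solve(S2, B2)) / 2 \
      - Bm @ np.linalg.solve(Sm, Bm)
assert np.min(np.linalg.eigvalsh(gap)) > -1e-9  # gap is PSD
```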