Abstract
We prove an extremal result for long Markov chains based on the monotone path argument, generalizing an earlier work by Courtade and Jiao.
1. Introduction
Shannon’s entropy power inequality (EPI) and its variants assert the optimality of the Gaussian solution to certain extremal problems. They play important roles in characterizing the information-theoretic limits of network information theory problems that involve Gaussian sources and/or channels (see, e.g., [1–15]). Many different approaches have been developed for proving such extremal results. Two notable ones are the doubling trick [16,17] and the monotone path argument [18,19,20,21]. Roughly speaking, the former reaches the desired conclusion by establishing the subadditivity of the relevant functional, while the latter accomplishes its goal by constructing a monotone path with one end associated with an arbitrary given point in the feasible region and the other associated with the optimal solution to the Gaussian version of the problem. Though the doubling trick typically yields simpler proofs, the monotone path argument tends to be more informative. Indeed, it shows not only the existence of a Gaussian optimizer, but also that every Karush–Kuhn–Tucker (KKT) point (also known as a stationary point) of the Gaussian version of the problem is in fact globally optimal. Such information is highly useful for numerical optimization.
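To make the monotone path idea concrete, consider the standard interpolation $X_t = \sqrt{1-t}\,X + \sqrt{t}\,X_G$ between an arbitrary random vector $X$ and an independent Gaussian surrogate $X_G$ of the same covariance: the path starts at the given point ($t = 0$) and ends at a Gaussian one ($t = 1$) while preserving the covariance throughout. The following numpy sketch (ours, purely illustrative; this interpolation form is an assumption and is not the construction used later in this paper) verifies the covariance-preservation property by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 200_000

# An arbitrary (non-Gaussian) starting point: a correlated Laplace vector.
X = rng.laplace(size=(m, n)) @ np.array([[1.0, 0.5], [0.0, 1.0]])
Sigma = np.cov(X, rowvar=False)

# Independent Gaussian surrogate with (approximately) the same covariance.
XG = rng.multivariate_normal(np.zeros(n), Sigma, size=m)

for t in (0.0, 0.5, 1.0):
    Xt = np.sqrt(1 - t) * X + np.sqrt(t) * XG  # X_t = sqrt(1-t) X + sqrt(t) X_G
    # Covariance is preserved along the whole path (X and XG are independent).
    print(t, np.round(np.cov(Xt, rowvar=False), 2))
```

Monotonicity of the relevant functional along such a path is the substantive part of the argument; for the construction actually used in this paper, it is established in Sections 3 and 4.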
Several years ago, inspired by the Gaussian two-terminal source coding problem, Courtade [22] conjectured the following EPI-type extremal result for long Markov chains.
Conjecture 1.
Let $X$ and $Z$ be two independent $n$-dimensional zero-mean Gaussian random (column) vectors with covariance matrices $\Sigma_X$ and $\Sigma_Z$, respectively, and define $Y = X + Z$. Then, for any ,
has a Gaussian minimizer (i.e., the infimum is achieved by some jointly Gaussian $(U, V)$ with $U \to X \to Y \to V$). Here, $U \to X \to Y \to V$ means that $U$, $X$, $Y$, and $V$ form a Markov chain in that order.
Later, Courtade and Jiao [23] proved this conjecture using the doubling trick. It is natural to ask whether this conjecture can also be proved via the monotone path argument. We shall show in this paper that it is indeed possible. Our work also sheds some light on the connection between these two proof strategies.
In fact, we shall prove a strengthened version of Conjecture 1 with some additional constraints imposed on the relevant distortion covariance matrices. For any random (column) vector $W$ and random object $Q$, the distortion covariance matrix incurred by the minimum mean squared error (MMSE) estimator of $W$ from $Q$ is $\mathbb{E}[(W - \mathbb{E}[W \mid Q])(W - \mathbb{E}[W \mid Q])^{T}]$, where $(\cdot)^{T}$ is the transpose operator.
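When the pair is jointly Gaussian, the MMSE estimator is linear and the distortion covariance matrix admits the closed form $\Sigma_W - \Sigma_{WQ}\Sigma_Q^{-1}\Sigma_{QW}$. The sketch below (ours; the symbols $W$, $Q$ and the additive-noise model are illustrative assumptions, not the paper's notation) compares this formula against a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2, 500_000

Sigma_W = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma_N = np.eye(n)

W = rng.multivariate_normal(np.zeros(n), Sigma_W, size=m)
Q = W + rng.multivariate_normal(np.zeros(n), Sigma_N, size=m)  # Q = W + N

# Closed form: D = Sigma_W - Sigma_W (Sigma_W + Sigma_N)^{-1} Sigma_W.
D = Sigma_W - Sigma_W @ np.linalg.inv(Sigma_W + Sigma_N) @ Sigma_W

# Monte Carlo: apply the (linear) MMSE estimator and measure the error covariance.
G = Sigma_W @ np.linalg.inv(Sigma_W + Sigma_N)  # E[W|Q] = G Q here (zero means)
E = W - Q @ G.T
print(np.round(D, 3))
print(np.round(E.T @ E / m, 3))  # should match D up to sampling noise
```

The main result of this paper is as follows.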
Theorem 1.
For any ,
has a Gaussian minimizer, where with and satisfying and .
Remark 1.
Remark 2.
Conjecture 1 corresponds to the special case where and .
The rest of the paper is organized as follows. Section 2 is devoted to the analysis of the Gaussian version of the optimization problem in (2). The key construction underlying our monotone path argument is introduced in Section 3. The proof of Theorem 1 is presented in Section 4. We conclude the paper in Section 5.
2. The Gaussian Version
In this section, we consider the Gaussian version of the optimization problem in (2) defined by imposing the restriction that and are jointly Gaussian. Specifically, let and , where and are two independent n-dimensional zero-mean Gaussian random (column) vectors with covariance matrices and , respectively. It is assumed that is independent of ; as a consequence, the Markov chain constraint is satisfied. Clearly,
Moreover,
Now, it is straightforward to write down the Gaussian version of the optimization problem in (2) with and as variables. However, as shown in the sequel, through a judicious change of variables, one can obtain an equivalent version that is more amenable to analysis.
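Although the displayed expressions are omitted above, in the Gaussian version every mutual information term reduces to a log-determinant; for instance, for independent Gaussian $X$ and $Z$ with $Y = X + Z$, one has $I(X;Y) = \tfrac{1}{2}\log\det(\Sigma_X + \Sigma_Z) - \tfrac{1}{2}\log\det(\Sigma_Z)$. A minimal numerical sketch of this standard identity (ours, for illustration only):

```python
import numpy as np

def gaussian_mi(Sigma_X, Sigma_Z):
    """I(X; X+Z) in nats for independent zero-mean Gaussian X and Z."""
    _, logdet_Y = np.linalg.slogdet(Sigma_X + Sigma_Z)
    _, logdet_Z = np.linalg.slogdet(Sigma_Z)
    return 0.5 * (logdet_Y - logdet_Z)

Sigma_X = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma_Z = np.eye(2)
print(gaussian_mi(Sigma_X, Sigma_Z))  # I(X; Y) for Y = X + Z
```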
Given , we introduce two random (column) vectors and , independent of , such that the joint distribution of is the same as that of
Denote the covariance matrix of by . We have
Define and . Since
and must be conditionally independent given . It can be verified that
which implies
Substituting (7) and (8) into (3)–(5) gives
Therefore, the Gaussian version of the optimization problem in (2) can be written as
where . It is clear that the infimum in (12) is achieved by some . Moreover, any such minimizer must satisfy the following KKT conditions:
where .
3. The Key Construction
Let be an identically distributed copy of . Moreover, let and be two n-dimensional zero-mean Gaussian random (column) vectors with covariance matrices and , respectively. It is assumed that , , and are mutually independent. Define and . We choose and such that (cf. (9)–(11))
for some satisfying the KKT conditions (13)–(16).
Let U and V be two arbitrary random objects jointly distributed with such that and . It is assumed that and are mutually independent. We aim to show that the objective function in (2) does not increase when is replaced with , from which the desired result follows immediately. To this end, we develop a monotone path argument based on the following construction.
For , define
It is easy to verify the following Markov structures (see Figure 1a):
Note that, as changes from 0 to 1, () moves from () to () while () moves in the opposite direction. This construction generalizes its counterpart in the doubling trick, which corresponds to the special case .
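The omitted displays above presumably define a rotation between the original variables and their independent copy. Purely for illustration, we adopt the standard form $\sqrt{1-t}\,X + \sqrt{t}\,\tilde{X}$ together with the orthogonal combination $\sqrt{1-t}\,\tilde{X} - \sqrt{t}\,X$ (an assumption on our part, not necessarily the paper's exact construction): for Gaussian copies, such a rotation keeps the two components independent with unchanged covariance, as the following numpy check suggests.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 2, 400_000
Sigma = np.array([[1.5, 0.4], [0.4, 1.0]])

X  = rng.multivariate_normal(np.zeros(n), Sigma, size=m)
Xc = rng.multivariate_normal(np.zeros(n), Sigma, size=m)  # independent copy

t = 0.3
A = np.sqrt(1 - t) * X + np.sqrt(t) * Xc   # moves from X (t=0) to the copy (t=1)
B = np.sqrt(1 - t) * Xc - np.sqrt(t) * X   # moves the other way

# The rotation preserves the covariance of each component ...
print(np.round(np.cov(A, rowvar=False), 2))
print(np.round(np.cov(B, rowvar=False), 2))
# ... and, for Gaussian copies, leaves the two components uncorrelated
# (hence independent).
print(np.round(A.T @ B / m, 2))
```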
For , define and , where
In view of (6), we have the following Markov structure (see Figure 1b):
Define and . It can be verified that for ,
where (22) is due to (20), (23) is due to the existence of a bijection between and , (24) is due to the fact that is independent of , and (25) is due to (21). Similarly, we have for .
4. Proof of Theorem 1
The following technical lemmas are needed for proving Theorem 1. Their proofs are relegated to Appendices A–D.
Lemma 1.
For ,
Lemma 2.
For ,
where
Lemma 3.
is matrix convex in for .
Lemma 4.
Let be a Gaussian random vector and U be an arbitrary random object. Moreover, let and be two zero-mean Gaussian random vectors, independent of , with covariance matrices and respectively. If , then
Now, we are in a position to prove Theorem 1. Define
Clearly,
Therefore, it suffices to show for . Note that
Since , , and , we have
which, together with Lemma 1, implies
Combining Lemma 2 and (29) shows
where
We shall derive a lower bound for . To this end, we first identify certain constraints on and . Let , where is a zero-mean Gaussian random vector, independent of , with covariance matrix . We shall choose such that , which implies
It can be verified (cf. (6)) that
In view of (25), (31), (32), and Lemma 4,
which, together with (33) and the constraint , implies
Similarly, we have
Define , , and . Consider
which is a convex semidefinite programming problem according to Lemma 3 (in view of (34) and Lemma 2, we have and ). Note that is an optimal solution to this convex programming problem if and only if it satisfies the following KKT conditions:
where . Let , , , and . One can readily verify by leveraging (17)–(19) that (35)–(38) with this particular choice of can be deduced from (13)–(16); it is also easy to see that . Therefore,
Combining (30), (39), and the fact that shows , which completes the proof.
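The crux of the final step is that, for a convex semidefinite program, satisfying the KKT conditions certifies global optimality. Since the program in question is only partially visible here, the following cvxpy sketch solves a generic log-det problem with semidefinite ordering constraints of a similar shape; the objective, the matrices, and the constraint $0 \preceq D \preceq \Sigma$ are hypothetical stand-ins chosen by us, not the paper's actual program.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n = 3
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)  # hypothetical covariance upper bound
Lam = np.eye(n)                  # hypothetical multiplier/weight matrix

D = cp.Variable((n, n), PSD=True)  # distortion-like matrix variable, D >= 0
prob = cp.Problem(
    cp.Minimize(-cp.log_det(D) + cp.trace(Lam @ D)),
    [Sigma - D >> 0],              # D <= Sigma in the semidefinite order
)
prob.solve(solver=cp.SCS)
print(prob.value)
print(np.round(D.value, 3))
```

Because the problem is convex, any point satisfying its KKT conditions attains this optimal value, which is exactly the logic used above.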
5. Conclusions
We have generalized an extremal result of Courtade and Jiao. It is worth mentioning that Courtade [24] recently found a different generalization using the doubling trick. So far, we have not been able to prove this new result via the monotone path argument. A deeper understanding of the connection between these two methods is needed. It is conceivable that the convex-like property revealed by the monotone path argument and the subadditive property revealed by the doubling trick are manifestations of a common underlying mathematical structure yet to be uncovered.
Author Contributions
Methodology, J.W., J.C.; writing—original draft preparation, J.W.; writing—review and editing, J.C.
Funding
This research was funded by the National Natural Science Foundation of China, grant number 61771305.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proof of Lemma 1
Note that
where . We have
where denotes the Fisher information matrix, and (A2) is due to ([25], Theorem 4). Substituting (A3) into (A1) gives
which, together with the fact ([26], Lemma 2) that
proves (26).
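As a numerical sanity check on the Fisher-information quantities appearing above (ours, not part of the proof): for a zero-mean Gaussian vector with covariance $\Sigma$, the score function is $-\Sigma^{-1}x$, so the Fisher information matrix equals $\Sigma^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 2, 500_000
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
P = np.linalg.inv(Sigma)

X = rng.multivariate_normal(np.zeros(n), Sigma, size=m)
S = -X @ P            # score of the Gaussian density at each sample (P symmetric)
J = S.T @ S / m       # Fisher information matrix J(X) = E[S S^T]
print(np.round(J, 2))
print(np.round(P, 2))  # the two should agree up to sampling noise
```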
Appendix B. Proof of Lemma 2
It can be verified that , where ; moreover, is a zero-mean Gaussian random vector with covariance matrix , and is independent of . Define . We have
On the other hand,
where (A15) is due to the fact that the distortion covariance matrix incurred by the MMSE estimator of from is no greater than (in the semidefinite sense) that incurred by the linear MMSE estimator of from . Combining (A14) and (A15) yields
Comparing the upper-left submatrices on the two sides of the above inequality gives , which, together with the fact that , proves . Moreover, we have
where the second inequality is due to (A14) and (A15). Therefore, , which is equal to the inverse of the upper-left submatrix of
must be positive definite.
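Step (A15) rests on the fact that the conditional-mean (MMSE) estimator performs at least as well as the best linear estimator. A scalar Monte Carlo sketch of this ordering on a deliberately non-Gaussian model (ours, for illustration; the cubic observation model is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
m = 200_000

X = rng.standard_normal(m)
Q = X**3 + 0.1 * rng.standard_normal(m)   # non-Gaussian observation of X

# Linear MMSE: a*Q with a = Cov(X, Q) / Var(Q) (both means are zero).
a = np.cov(X, Q)[0, 1] / np.var(Q)
lmmse = np.mean((X - a * Q) ** 2)

# Approximate the conditional mean E[X|Q] by quantile-binned averages.
nbins = 100
bins = np.quantile(Q, np.linspace(0, 1, nbins + 1))
idx = np.clip(np.digitize(Q, bins) - 1, 0, nbins - 1)
cond_mean = np.array([X[idx == k].mean() for k in range(nbins)])
mmse = np.mean((X - cond_mean[idx]) ** 2)

print(lmmse, mmse)   # mmse <= lmmse (up to estimation error)
```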
Appendix C. Proof of Lemma 3
It is known [27] that is matrix convex in for symmetric matrix and positive definite matrix . The desired conclusion follows from the fact that and that affine transformations preserve convexity.
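Although the display is lost, the cited fact is of the matrix-fractional type; for example, the map $(A, B) \mapsto A^{T} B^{-1} A$ with $B \succ 0$ is jointly matrix convex, so the midpoint gap $\tfrac{1}{2}f(A_1,B_1) + \tfrac{1}{2}f(A_2,B_2) - f(\bar{A},\bar{B})$ is positive semidefinite. A randomized numerical spot-check (ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3

def f(A, B):
    return A.T @ np.linalg.inv(B) @ A

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)   # positive definite by construction

for _ in range(200):
    A1, A2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    B1, B2 = rand_pd(), rand_pd()
    gap = 0.5 * (f(A1, B1) + f(A2, B2)) - f(0.5 * (A1 + A2), 0.5 * (B1 + B2))
    # Joint matrix convexity: the midpoint gap must be PSD.
    assert np.linalg.eigvalsh(gap).min() > -1e-8
print("all midpoint gaps PSD")
```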
Appendix D. Proof of Lemma 4
Without loss of generality, we assume that , where is a zero-mean Gaussian random vector with covariance matrix and is independent of . As a consequence, , where ; moreover, is a zero-mean Gaussian random vector with covariance matrix , and is independent of . Note that
where the second equality follows by ([26], Lemma 2). On the other hand,
where (A17) is due to the conditional version of the Cramér–Rao inequality. Combining (A16) and (A18) yields the desired result.
References
1. Bergmans, P. A simple converse for broadcast channels with additive white Gaussian noise (Corresp.). IEEE Trans. Inf. Theory 1974, 20, 279–280.
2. Weingarten, H.; Steinberg, Y.; Shamai, S. The capacity region of the Gaussian multiple-input multiple-output broadcast channel. IEEE Trans. Inf. Theory 2006, 52, 3936–3964.
3. Prabhakaran, V.; Tse, D.; Ramchandran, K. Rate region of the quadratic Gaussian CEO problem. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Chicago, IL, USA, 27 June–2 July 2004; p. 117.
4. Oohama, Y. Rate-distortion theory for Gaussian multiterminal source coding systems with several side informations at the decoder. IEEE Trans. Inf. Theory 2005, 51, 2577–2593.
5. Wang, J.; Chen, J. Vector Gaussian two-terminal source coding. IEEE Trans. Inf. Theory 2013, 59, 3693–3708.
6. Wang, J.; Chen, J. Vector Gaussian multiterminal source coding. IEEE Trans. Inf. Theory 2014, 60, 5533–5552.
7. Liu, T.; Shamai, S. A note on the secrecy capacity of the multiple-antenna wiretap channel. IEEE Trans. Inf. Theory 2009, 55, 2547–2553.
8. Chen, J. Rate region of Gaussian multiple description coding with individual and central distortion constraints. IEEE Trans. Inf. Theory 2009, 55, 3991–4005.
9. Motahari, A.S.; Khandani, A.K. Capacity bounds for the Gaussian interference channel. IEEE Trans. Inf. Theory 2009, 55, 620–643.
10. Shang, X.; Kramer, G.; Chen, B. A new outer bound and the noisy-interference sum-rate capacity for Gaussian interference channels. IEEE Trans. Inf. Theory 2009, 55, 689–699.
11. Annapureddy, V.S.; Veeravalli, V.V. Gaussian interference networks: Sum capacity in the low interference regime and new outer bounds on the capacity region. IEEE Trans. Inf. Theory 2009, 55, 3032–3050.
12. Song, L.; Chen, J.; Wang, J.; Liu, T. Gaussian robust sequential and predictive coding. IEEE Trans. Inf. Theory 2013, 59, 3635–3652.
13. Song, L.; Chen, J.; Tian, C. Broadcasting correlated vector Gaussians. IEEE Trans. Inf. Theory 2015, 61, 2465–2477.
14. Khezeli, K.; Chen, J. A source-channel separation theorem with application to the source broadcast problem. IEEE Trans. Inf. Theory 2016, 62, 1764–1781.
15. Tian, C.; Chen, J.; Diggavi, S.N.; Shamai, S. Matched multiuser Gaussian source channel communications via uncoded schemes. IEEE Trans. Inf. Theory 2017, 63, 4155–4171.
16. Geng, Y.; Nair, C. The capacity region of the two-receiver Gaussian vector broadcast channel with private and common messages. IEEE Trans. Inf. Theory 2014, 60, 2087–2104.
17. Geng, Y.; Nair, C. Reflection and summary of the paper: The capacity region of the two-receiver Gaussian vector broadcast channel with private and common messages. IEEE Inf. Theory Soc. Newsl. 2017, 67, 5–7.
18. Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control 1959, 2, 101–112.
19. Blachman, N.M. The convolution inequality for entropy powers. IEEE Trans. Inf. Theory 1965, 11, 267–271.
20. Dembo, A.; Cover, T.M.; Thomas, J.A. Information theoretic inequalities. IEEE Trans. Inf. Theory 1991, 37, 1501–1518.
21. Liu, T.; Viswanath, P. An extremal inequality motivated by multiterminal information-theoretic problems. IEEE Trans. Inf. Theory 2007, 53, 1839–1851.
22. Courtade, T.A. An Extremal Conjecture: Experimenting with Online Collaboration. Information Theory b-Log, 5 March 2013. Available online: http://blogs.princeton.edu/blogit/2013/03/05/ (accessed on 15 March 2013).
23. Courtade, T.A.; Jiao, J. An extremal inequality for long Markov chains. arXiv 2014, arXiv:1404.6984v1.
24. Courtade, T.A. A strong entropy power inequality. IEEE Trans. Inf. Theory 2018, 64, 2173–2192.
25. Palomar, D.P.; Verdú, S. Gradient of mutual information in linear vector Gaussian channels. IEEE Trans. Inf. Theory 2006, 52, 141–154.
26. Zhou, Y.; Xu, Y.; Yu, W.; Chen, J. On the optimal fronthaul compression and decoding strategies for uplink cloud radio access networks. IEEE Trans. Inf. Theory 2016, 62, 7402–7418.
27. Marshall, A.W.; Olkin, I. Inequalities: Theory of Majorization and Its Applications; Academic Press: New York, NY, USA, 1979.