Abstract
Recently, Ledoux, Nair, and Wang proved that the Fisher information along the heat flow is log-convex in dimension one, that is, $\frac{d^2}{dt^2}\log I(X_t) \ge 0$ for $n=1$, where $X_t$ is a random variable whose density function satisfies the heat equation. In this paper, we consider the higher-dimensional case and prove that the Fisher information is square root convex in dimension two, that is, $\frac{d^2}{dt^2}\sqrt{I(X_t)} \ge 0$ for $n=2$. The proof is based on the semidefinite programming approach.
1. Introduction
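Let X be a random variable defined on $\mathbb{R}^n$ with density function $p(x)$, which is assumed to be differentiable. The differential entropy and the Fisher information of X are, respectively, defined to be
$$H(X) = -\int_{\mathbb{R}^n} p(x)\log p(x)\,dx, \qquad I(X) = \int_{\mathbb{R}^n} \frac{\|\nabla p(x)\|^2}{p(x)}\,dx.$$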
In 1948, Shannon [1] proposed the entropy power inequality (EPI) $N(X+Y) \ge N(X) + N(Y)$, where X and Y are independent random variables defined on $\mathbb{R}^n$ and $N(X) = \frac{1}{2\pi e}e^{\frac{2}{n}H(X)}$ is the entropy power of X. As one of the most important inequalities in information theory, Shannon’s EPI has many proofs and applications [2,3,4,5,6].
In 1985, Costa [7] proved a generalization of Shannon’s EPI, that is, the entropy power $N(X_t)$ of $X_t = X + \sqrt{t}Z_n$ is concave in t, where X is a random variable and $Z_n \sim \mathcal{N}(0, I_n)$ is an n-dimensional standard normal random vector, independent of X. This inequality also has many proofs and applications [8,9,10,11].
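Costa also proved that $\frac{d}{dt}H(X_t) \ge 0$ and $\frac{d^2}{dt^2}H(X_t) \le 0$ [7] (Corollary 1). Along this line, Cheng and Geng [12] proposed the completely monotone conjecture (CMC)
$$(-1)^{m+1}\frac{d^m}{dt^m}H(X_t) \ge 0, \quad m \ge 1,$$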
and proved the conjecture for and . Guo, Yuan, and Gao [13] proved the conjecture in the cases and the case , using semidefinite programming (SDP) software programs. Other related results were also obtained based on the SDP approach [14,15].
The CMC was implicitly considered by McKean [16] in his study of the entropy of solutions of the heat equation. The density function of $X_t$ is a solution of the heat equation [2]. Interestingly, the converse is also true; that is, if the density function of a random variable is a solution of the heat equation, then the random variable has the form $X + \sqrt{t}Z_n$ [11]. Thus, studying properties of $H(X_t)$ and $I(X_t)$ is equivalent to studying those of a probability measure satisfying the heat equation.
Cheng and Geng [12] also proposed the log-convexity conjecture: the Fisher information along the heat flow is log-convex, which can be deduced from the CMC. In 2021, Ledoux, Nair, and Wang [17] proved the log-convexity conjecture for $n=1$.
In this paper, we consider the two-dimensional case as suggested in [17]. We prove the square root convexity (abbr. sqrt-convexity) of Fisher information along heat flow in dimension two. Precisely, we prove the following result.
Theorem 1.
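Let X be a random variable defined on $\mathbb{R}^2$, $Z$ a Gaussian variable independent of X, and $X_t = X + \sqrt{t}Z$. Then we have
$$\frac{d^2}{dt^2}\sqrt{I(X_t)} \ge 0, \quad t > 0. \tag{1}$$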
The main idea of the proof is that inequality (1) can be reduced to the question of whether a quadratic polynomial is a sum of squares (SOS) [18] of linear forms, which can be solved with SDP [19]. The SOS is given explicitly, which provides a rigorous proof of the theorem. The SDP problem related to Theorem 1 has 71 variables, which makes it difficult to solve by manual calculation.
We also show that the log-convexity of the Fisher information along heat flow in dimension two cannot be proven with the SDP approach. More precisely, the SDP software program terminates, but fails to give a solution that proves the log-convexity. This does not imply that the log-convexity in dimension two is false, because the SOS problem to be solved with the SDP program is only a sufficient condition, not a necessary one, for the log-convexity. Theorem 1 is proven as a weaker form of the log-convexity conjecture for $n=2$. We also show that Theorem 1 implies the CMC for the third-order derivative in dimension two without assuming the log-concavity of p(x). Refer to Corollary 1 for details.
In Theorem 1, we do not assume that X is a log-concave variable. If the log-concavity condition is added, then by Toscani [20], $1/I(X_t)$ is concave in t, which implies inequality (1); the proof can be found in Lemma 2.
A drawback of the SDP-based approach is that the proof is difficult for people to check. Although the SOS gives an explicit proof of the theorem, it is too large to be verified manually. To alleviate this problem, we make the programs and data available on GitHub, so that interested readers may check the proof using software systems. Refer to Remark 2 for details on how to do this. We also illustrate the method by proving Theorem 1 for the case n = 1 in Section 3.1. On the other hand, in the proof of information inequalities, it often happens that the computation is too large to be performed manually, and using computer programs has become one of the major approaches to proving information inequalities [14,21,22,23,24,25]. To show our result more intuitively, we give the figures of and in Figure 1, where in Equation (2) is . In this case, both and are convex in t.
Figure 1.
Figures for and which are convex in t.
2. Preliminaries
2.1. Notations and Preliminary Results
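Let X be a random variable defined on $\mathbb{R}^n$ with density function $p(x)$, which is assumed to be differentiable, and let $Z_n \sim \mathcal{N}(0, I_n)$ be an n-dimensional standard normal random vector, independent of X. Then $X_t = X + \sqrt{t}Z_n$ is also a random variable defined on $\mathbb{R}^n$ with density function
$$p_t(x_t) = \frac{1}{(2\pi t)^{n/2}}\int_{\mathbb{R}^n} p(x)\exp\left(-\frac{\|x_t - x\|^2}{2t}\right)dx,$$
which is differentiable since $p(x)$ is. It is known that $p_t$ satisfies the heat Equation (2):
$$\frac{\partial p_t}{\partial t} = \frac{1}{2}\nabla^2 p_t. \tag{2}$$
The differential entropy and Fisher information of $X_t$ are, respectively, defined as
$$H(X_t) = -\int_{\mathbb{R}^n} p_t(x_t)\log p_t(x_t)\,dx_t, \qquad I(X_t) = \int_{\mathbb{R}^n} \frac{\|\nabla p_t(x_t)\|^2}{p_t(x_t)}\,dx_t.$$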
For convenience, we use and to denote and in the rest of the paper.
We can easily obtain the following relation between $H(X_t)$ and $I(X_t)$ by de Bruijn’s identity [2]:
$$\frac{d}{dt}H(X_t) = \frac{1}{2}I(X_t). \tag{3}$$
By the definition of $I(X_t)$, the Fisher information is always positive, so we can take the square root of it. By Equation (3) and the fact $\frac{d^2}{dt^2}H(X_t) \le 0$ [7], the first derivative of the Fisher information is always nonpositive:
$$\frac{d}{dt}I(X_t) \le 0. \tag{4}$$
A nonnegative function of t is called sqrt-convex in t if its square root is convex in t. The following lemma gives an equivalent form of sqrt-convexity, which will be used in the proof of Lemma 10.
Lemma 1.
Theorem 1 is valid, that is, $I(X_t)$ is sqrt-convex in t, if and only if
$$2I(X_t)\frac{d^2}{dt^2}I(X_t) - \left(\frac{d}{dt}I(X_t)\right)^2 \ge 0. \tag{5}$$
Proof.
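The convexity of $\sqrt{I(X_t)}$ is equivalent to the fact that the second-order derivative of $\sqrt{I(X_t)}$ is nonnegative. By direct differentiation, we have
$$\frac{d^2}{dt^2}\sqrt{I(X_t)} = \frac{2I(X_t)\frac{d^2}{dt^2}I(X_t) - \left(\frac{d}{dt}I(X_t)\right)^2}{4\,I(X_t)^{3/2}}.$$
Since $I(X_t) > 0$, the lemma is proven. □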
Corollary 1.
If $I(X_t)$ is sqrt-convex in t for $n=2$, then the CMC for the third-order derivative in dimension two is correct.
Proof.
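Since $\frac{d^3}{dt^3}H(X_t) = \frac{1}{2}\frac{d^2}{dt^2}I(X_t)$ by Equation (3), it suffices to prove $\frac{d^2}{dt^2}I(X_t) \ge 0$. Using Lemma 1, if $I(X_t)$ is sqrt-convex in t for $n=2$, then we have $2I(X_t)\frac{d^2}{dt^2}I(X_t) \ge \left(\frac{d}{dt}I(X_t)\right)^2 \ge 0$. Because $I(X_t) > 0$, then $\frac{d^2}{dt^2}I(X_t) \ge 0$. □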
Lemma 2 gives the relationship among sqrt-convexity, log-convexity, and concavity of .
Lemma 2.
If $1/I(X_t)$ is concave in t, then $\log I(X_t)$ is convex in t. If $\log I(X_t)$ is convex in t, then $I(X_t)$ is sqrt-convex in t.
Proof.
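Write $I = I(X_t)$, and let $I'$ and $I''$ denote its first and second derivatives in t. Since $\frac{d^2}{dt^2}\frac{1}{I} = \frac{2(I')^2 - I I''}{I^3} \le 0$, we have $I I'' \ge 2(I')^2 \ge (I')^2$. Then, $\frac{d^2}{dt^2}\log I = \frac{I I'' - (I')^2}{I^2} \ge 0$, which means that $\log I(X_t)$ is convex. Similarly, convexity of $\log I(X_t)$ means that $I I'' \ge (I')^2$. Then we can obtain $2 I I'' - (I')^2 \ge (I')^2 \ge 0$. By Lemma 1, $I(X_t)$ is sqrt-convex in t. □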
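As a sanity check of the chain of implications in Lemma 2, the following minimal sketch evaluates the three second derivatives numerically for a Gaussian input. It assumes $X \sim \mathcal{N}(0, \sigma^2 I_n)$, for which $X_t \sim \mathcal{N}(0, (\sigma^2 + t) I_n)$ and $I(X_t) = n/(\sigma^2 + t)$; the function names and the finite-difference step are illustrative choices and are not part of the paper's code.

```python
import math

def fisher_gaussian(t, n=2, sigma2=1.0):
    """Fisher information of X_t = X + sqrt(t) Z when X ~ N(0, sigma2 * I_n)."""
    return n / (sigma2 + t)

def second_derivative(g, t, h=1e-3):
    """Central finite-difference approximation of g''(t)."""
    return (g(t + h) - 2.0 * g(t) + g(t - h)) / (h * h)

def check(t):
    """Return the second derivatives of 1/I, log I and sqrt(I) at time t."""
    I = fisher_gaussian
    d2_recip = second_derivative(lambda s: 1.0 / I(s), t)        # 1/I is linear in t
    d2_log = second_derivative(lambda s: math.log(I(s)), t)      # log-convexity
    d2_sqrt = second_derivative(lambda s: math.sqrt(I(s)), t)    # sqrt-convexity
    return d2_recip, d2_log, d2_sqrt

for t in (0.5, 1.0, 2.0):
    print(t, check(t))
```

In this special case $1/I(X_t)$ is linear in t, so its second derivative vanishes up to rounding, while the second derivatives of $\log I(X_t)$ and $\sqrt{I(X_t)}$ are strictly positive. This only illustrates the lemma on one example and says nothing about general densities.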
We consider the two-dimensional case and suppose that the two variables are . For convenience, we use f instead of and instead of . Then we can rewrite the Fisher information as
and the heat equation as
By Equation (7), it is easy to see that for each , we have
In the following, we formally define the concept of differential forms, which are used to reduce the size of the SDP problems to be solved. Refer to Remark 1 for details.
A differential monomial is of the form , where , , and . We define the order of to be , the total order of M to be . The total degree of M is . A differential polynomial is a finite linear combination of differential monomials over . A differential polynomial P is called a k-th order differentially homogeneous polynomial, or simply a k-th order differential form, if each of its differential monomials is of total degree k and total order k.
In Lemma 3, we compute the expression of .
Lemma 3.
We have
where each is a -th order differential form for .
Proof.
By Equation (6),
is a second-order differential form, so the lemma is correct for . For ,
Then,
where
are third-order differential forms.
Thus, is a fourth-order differential form. Similarly, we can show that is a sixth-order differential form:
The lemma is proven. □
Inspired by the Cauchy–Schwarz inequality, we obtain the following inequality, which is used in the proof of Lemma 9.
Lemma 4.
For functions in , we have
Proof.
Using the Cauchy–Schwarz inequality, we have . Using the Cauchy–Schwarz inequality of integral form, we have
Combining the above two inequalities, we prove the lemma. □
2.2. Constraints
The density function f and its derivatives satisfy certain integral equations, from which the constraints of the SDP problems to be solved are obtained. For this reason, these integral equations are called constraints. Precisely, a -th order differential form R is called a -th order constraint, if
It is easy to see that the equations in (9) are still valid if is replaced by , when is a -th order constraint. Guo, Yuan, and Gao [13] proposed a method to compute the constraints, which will be used here to compute the constraints in dimension two. In the following, we show how to compute the -order constraints.
Lemma 5
This lemma guarantees that when using the integration by parts, the integral term of lower dimensions vanishes. The following lemma shows how to generate constraints. We repeat the proof here, because the proof procedure will be used in the proof of Lemma 7.
Lemma 6.
Let M be a differential monomial with total order . Then we can use integration by parts to obtain a -th order constraint from M.
Proof.
Let be one of the variables , and be another variable. Then we have
Then using integration by parts, we have
Thus, is a -th order constraint and the lemma is proven. □
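The mechanism behind Lemma 6 can be checked numerically in a simplified setting. The sketch below is a one-dimensional analogue, not one of the paper's two-dimensional constraints: for a smooth, rapidly decaying density f, the boundary term of $f'^3/f^2$ vanishes, so the integral of $\frac{d}{dx}\left(f'^3/f^2\right) = 3f'^2f''/f^2 - 2f'^4/f^3$ over the real line should be zero. The Gaussian mixture, the integration range, and all names are hypothetical choices made for illustration.

```python
import math

# Hypothetical test density: a two-component Gaussian mixture on the real line.
COMPONENTS = [(0.4, -1.5, 1.0), (0.6, 1.0, 0.7)]  # (weight, mean, std)

def density_and_derivatives(x):
    """Return f(x), f'(x), f''(x) for the mixture above."""
    f = f1 = f2 = 0.0
    for w, mu, s in COMPONENTS:
        z = (x - mu) / s
        phi = w * math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
        f += phi
        f1 += -z / s * phi
        f2 += (z * z - 1.0) / (s * s) * phi
    return f, f1, f2

def constraint_integrand(x):
    """d/dx (f'^3 / f^2), written through the ratios f'/f and f''/f."""
    f, f1, f2 = density_and_derivatives(x)
    score = f1 / f
    curv = f2 / f
    return f * (3.0 * score ** 2 * curv - 2.0 * score ** 4)

def trapezoid(g, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for k in range(1, steps):
        total += g(a + k * h)
    return total * h

value = trapezoid(constraint_integrand, -15.0, 15.0, 6000)
print(value)  # expected to be zero up to quadrature and rounding error
```

Writing the integrand through the ratios $f'/f$ and $f''/f$ avoids dividing two very small numbers in the tails of the density.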
3. Proof of Theorem 1
The proof of Theorem 1 mainly consists of two steps. The first step, summarized in Lemma 10, is used to reduce the proof of Theorem 1 to the proof of the non-negativeness for a quadratic form with undetermined coefficients. This step is given in Section 3.2, Section 3.3 and Section 3.4. The reduction has three main ingredients: (1) Constraints given in Lemma 8 are used to form the SOS and Lemmas 5 and 6 show how to compute the constraints. (2) Lemma 7 is used to reduce all involved quantities into quadratic forms in certain variables. (3) By introducing in Lemma 9 and using the Cauchy–Schwarz inequality in Lemma 4, the quantity is relaxed to a simple form.
The second step, given in Section 3.5, is to compute the undetermined coefficients of the quadratic form using SDP, which is summarized as Problem 1. This step has two sub-steps: (1) In Problem 2, the undetermined coefficients and are computed by omitting the second degree terms. (2) In Problem 5, the undetermined coefficients are computed using the values of and obtained in the first sub-step. In these two sub-steps, the quadratic forms are linear in the undetermined coefficients which can be computed with SDP and the computation procedure is given in Problems 3 and 4.
3.1. An Illustrative Example
In this subsection, we prove Theorem 1 for n = 1 and use this as an illustration of our proof method.
By Lemma 1, it suffices to prove (5). For convenience, we write as f and as . Using Lemma 6, we can obtain the constraints :
By Lemma 3, we have
where . By Lemma 4,
where .
By Lemma 1, it suffices to find an such that is true under the constraints , which is a consequence of the following SOS:
By (16) and (17),
Theorem 1 for the case n = 1 is proven.
Equation (17) can be obtained in two steps. In the first step, we compute . Instead of , we consider under the constraints, which can be solved by SDP since is linear in the expression. Suppose that the solution for is .
In the second step, we check whether is valid under the constraints using SDP, and the SOS in (17) can be found. Details of the proof procedure are given in the rest of this section.
3.2. Compute Constraints
In this section, we compute the fourth-order and sixth-order constraints using Lemma 6. For instance, from the differential monomial with total order 3, we obtain two fourth-order constraints:
By considering all differential monomials with total order 3 and total degree 3, we obtain 20 constraints. Some of the constraints cannot be divided by or , which are not needed in the proof due to the form of in Equation (11). Finally, we obtain eight fourth-order constraints and , where
Similarly, we obtain 136 sixth-order constraints . In summary, we obtain constraints , and , which satisfy
3.3. Reduce to Quadratic Form
In order to obtain an SDP problem with a smaller size, we will reduce all differential polynomials in the proof into quadratic forms in a set of new variables which are all the differential monomials with total order 3 and total degree 3:
The following lemma shows that any sixth-order constraint can be reduced to another sixth-order constraint which can be written as a quadratic form in .
Lemma 7.
For any differential monomial M with total order 6 and total degree 6, we can compute a sixth-order differential form P such that
and P is a quadratic form in in Equation (20).
Proof.
Since M is a differential monomial with total degree 6 and total order 6, let with satisfying , , and for . We call the order type and the leading order of M.
If , similar to the proof of Lemma 6, we can use integration by parts to obtain a new polynomial with leading order .
where we assume , without loss of generality. Let . It is easy to see that is a sixth-order differential form. Since , we have for , and hence the leading orders of all monomials of are equal to or less than . If the leading order of a monomial of is still equal to or more than 4, we can repeat procedure (22) for until the leading orders of all monomials of are equal to or less than 3.
After the above procedure, we obtain a sixth-order differential form such that the leading orders of all monomials of are equal to or less than 3. If the order type of a monomial of is , then we use procedure (22) to change to a differential polynomial . It is clear that the leading orders of all monomials of are equal to or less than 3 and the order types of all monomials of are not . Using the above procedure, we may eliminate all monomials with order type . For instance, for the monomial with order type , we can obtain a sixth-order differential form .
After the above two reduction procedures, we obtain a differential polynomial P such that the leading orders of all monomials of P are equal to or less than 3 and the order types of all monomials of P are not . Then the order types of the monomials of P are
All monomials with the above order types can be written as for certain in Equation (20). For instance, the monomial has order type , which can be written as . Thus, P is a quadratic form in variables . The lemma is proven. □
Applying Lemma 7 to all monomials of , we obtain which are quadratic forms in . Applying Gaussian elimination to to eliminate the linearly dependent ones, we obtain 48 constraints which are given in Appendix B.
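The elimination of linearly dependent constraints can be sketched as follows. This is a minimal illustration under the assumption that each constraint is stored as a coefficient vector with respect to a fixed list of monomials; the function name and the toy data are hypothetical and are not taken from the paper's repository. Exact rational arithmetic is used so that dependence is decided without rounding error.

```python
from fractions import Fraction

def independent_rows(rows):
    """Return indices of a maximal linearly independent subset of rows.

    Each row is the coefficient vector of one constraint with respect to a
    fixed list of monomials.
    """
    basis = []  # pairs (pivot column, reduced row) kept so far
    kept = []
    for idx, row in enumerate(rows):
        v = [Fraction(c) for c in row]
        # Cancel the entry of v in every pivot column found so far.
        for pivot_col, b in basis:
            if v[pivot_col] != 0:
                factor = v[pivot_col] / b[pivot_col]
                v = [vi - factor * bi for vi, bi in zip(v, b)]
        # A nonzero remainder means the row is independent of the kept ones.
        pivot = next((j for j, c in enumerate(v) if c != 0), None)
        if pivot is not None:
            basis.append((pivot, v))
            kept.append(idx)
    return kept

# Toy example: row 1 is twice row 0, and row 3 is row 0 plus row 2.
rows = [[1, 2, 0], [2, 4, 0], [0, 1, 1], [1, 3, 1]]
print(independent_rows(rows))
```

For the toy data only rows 0 and 2 are kept, since the other two are linear combinations of them.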
The variables in satisfy certain relations, such as , which are called intrinsic constraints. We have 15 intrinsic constraints . In total, we have 63 sixth-order constraints which are quadratic forms in :
where are given in Appendix B.
The following lemma summarizes all the constraints needed in the proof.
Lemma 8.
Proof.
We need only to consider the equalities for . is obtained from by applying Lemma 7 to each monomial of . Then by Equation (19) and Lemma 7, we have . are obtained from by doing Gaussian elimination, so the are linear combinations of over . Thus . The lemma is proven. □
3.4. Reduction to Positive Semidefiniteness of a Quadratic Form
In this section, we give an , which is a quadratic form in , such that Theorem 1 is true if , that is, is a positive semidefinite polynomial when are treated as independent variables.
In the following key lemma, we introduce in order to generate a common factor in the proof of Lemma 10.
Lemma 9.
Proof.
In Lemma 10, the proof of Theorem 1 is finally reduced to the proof of an inequality for a quadratic form with undetermined coefficients.
Lemma 10.
Proof.
is clearly a quadratic form in , since and are. By Lemma 3, we have
Since and , by Lemma 1, Theorem 1 is true if . □
3.5. Prove Theorem 1 by Solving an SDP Problem
In this section, we will give an in Equation (29) satisfying and hence proving Theorem 1. By Lemma 10, in order to prove Theorem 1, it suffices to solve the following problem.
Problem 1.
It is impossible to compute in Problem 1 with SDP directly, since is not linear in . We use the following strategy to solve Problem 1:
- S1. Expanding the squares and and deleting the terms and , we obtain Problem 2 which is weaker than Problem 1.
- S2. Since in Problem 2 is linear in , we can use SDP to solve Problem 2 and let be the solutions.
- S3. Let be obtained from by substituting with . Then, is linear in and we can use SDP to compute such that is true. Under this condition, Problem 1 becomes Problem 5, and it suffices to solve Problem 5 in order to prove Theorem 1.
Problem 2.
Since is a quadratic form in , it is well known that is equivalent to the fact that the symmetric matrix of is positive semidefinite, that is, [19]. In other words, Problem 2 is equivalent to the following SDP problem [19].
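The correspondence between a quadratic form and its symmetric matrix can be illustrated on a small example. The sketch below is a toy instance with two variables and is not the paper's quadratic form; the coefficient dictionary, the function names, and the trace/determinant test (which is valid only for symmetric 2x2 matrices) are assumptions made for illustration.

```python
def gram_matrix(coeffs, n):
    """Symmetric matrix M with z^T M z = sum over i <= j of coeffs[(i, j)] * z_i * z_j."""
    M = [[0.0] * n for _ in range(n)]
    for (i, j), c in coeffs.items():
        if i == j:
            M[i][i] += c
        else:
            # A cross term c * z_i * z_j is split evenly between M[i][j] and M[j][i].
            M[i][j] += c / 2.0
            M[j][i] += c / 2.0
    return M

def quad_form(M, z):
    """Evaluate z^T M z."""
    n = len(z)
    return sum(M[i][j] * z[i] * z[j] for i in range(n) for j in range(n))

def is_psd_2x2(M):
    """A symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return trace >= 0 and det >= 0

# Q(z) = 2 z0^2 - 2 z0 z1 + z1^2 = z0^2 + (z0 - z1)^2, so Q is nonnegative.
Q = {(0, 0): 2.0, (0, 1): -2.0, (1, 1): 1.0}
M = gram_matrix(Q, 2)
print(M, is_psd_2x2(M))

# R(z) = z0^2 - 4 z0 z1 + z1^2 takes the value -2 at z = (1, 1).
R = {(0, 0): 1.0, (0, 1): -4.0, (1, 1): 1.0}
print(is_psd_2x2(gram_matrix(R, 2)))
```

For larger matrices an SDP solver or an eigenvalue computation would replace the 2x2 test; the point here is only how the off-diagonal entries are obtained from the cross-term coefficients.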
Problem 3.
where is the corresponding symmetric matrix for any quadratic form Q in and .
We set the objective function to be 1, which means that it suffices to satisfy the constraints.
We actually solve the following dual problem [19] of Problem 3:
Problem 4.
where , and .
Remark 1.
If differential forms are not used to reduce the polynomials into quadratic forms in , then we need to consider all differential monomials with total degree 3 and total order as the bases for the SDP Problem 4. In such a case, instead of , and we need to solve a much larger SDP problem for .
We use the CVX package in Matlab [26] to solve Problem 4. The program is given in Appendix A. Our complete code and data are available (accessed on 30 November 2022) at https://github.com/liujunliang19/sqrt-convex.
With CVX, we obtain a set of solutions for , which are given in Appendix C. From the above discussions, we see that these values are also solutions to Problem 2.
Finally, according to step S3 just above Problem 2, we put the solutions for back into in Problem 1 and obtain the following problem.
Problem 5.
Similar to Problems 3 and 4, we obtain a set of solutions for , which are given in Appendix D. Now is a positive semidefinite quadratic form and it is well known that can be written as an SOS. The value of as well as its SOS representation are given in Appendix E. Hence, we solve Problem 1 and therefore prove Theorem 1.
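The fact that a positive semidefinite quadratic form can be written as an SOS of linear forms can be made concrete with an exact elimination. The following sketch is not the procedure used to obtain the SOS in Appendix E, which comes from the SDP solution; it applies an LDL^T-style elimination with rational arithmetic to a small hypothetical matrix, and all names are illustrative.

```python
from fractions import Fraction

def sos_from_psd(M):
    """Exact LDL^T-style elimination of a symmetric rational matrix.

    Returns a list of (weight, linear_form) pairs with weight > 0 such that
    z^T M z = sum(weight * (linear_form . z)^2), or None if M is not PSD.
    """
    n = len(M)
    A = [[Fraction(x) for x in row] for row in M]
    terms = []
    for k in range(n):
        d = A[k][k]
        if d < 0:
            return None
        if d == 0:
            # A zero pivot is only compatible with PSD if its whole row is zero.
            if any(A[k][j] != 0 for j in range(n)):
                return None
            continue
        form = [A[k][j] / d for j in range(n)]
        terms.append((d, form))
        # Schur complement step: remove the rank-one part d * form * form^T.
        A = [[A[i][j] - d * form[i] * form[j] for j in range(n)] for i in range(n)]
    return terms

def evaluate_sos(terms, z):
    """Evaluate the weighted sum of squared linear forms at the point z."""
    return sum(w * sum(c * zi for c, zi in zip(form, z)) ** 2 for w, form in terms)

# Q(z) = 2 z0^2 - 2 z0 z1 + z1^2 with symmetric matrix [[2, -1], [-1, 1]].
terms = sos_from_psd([[2, -1], [-1, 1]])
print(terms)
print(evaluate_sos(terms, [3, 5]))
```

For this matrix the elimination yields $Q = 2\left(z_0 - \tfrac{1}{2}z_1\right)^2 + \tfrac{1}{2}z_1^2$. Because the arithmetic is exact, expanding the returned squares and comparing coefficients with the original form gives a certificate that can be checked independently of any solver.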
Remark 2.
Note that the SOS given in Appendix E provides an explicit and direct proof of Theorem 1, so the solution procedure for the SDP is not needed to verify it, similar to Equation (17) for the case of $n=1$. Of course, the SOS in Appendix E is quite large and difficult to check manually. In order for interested readers to check the proof with a mathematical software system, we also give the complete code and data at https://github.com/liujunliang19/sqrt-convex (accessed on 30 November 2022). The SOS expression for is at the bottom of our Maple code named sqrt-convex2.mw, which can be run directly.
Remark 3.
We also tried to use the above approach to prove the log-convexity of the Fisher information along heat flow for $n=2$. The CVX program fails to return a solution. Thus, we cannot prove the log-convexity with the above approach. We also cannot say that the log-convexity is false, since the log-convexity is not equivalent to Problem 3.
Remark 4.
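Theorem 1 is stronger than the CMC for the third-order derivative in dimension two. In other words, given Theorem 1, we can obtain $\frac{d^3}{dt^3}H(X_t) \ge 0$. Using Lemma 1, we obtain $2I(X_t)\frac{d^2}{dt^2}I(X_t) \ge \left(\frac{d}{dt}I(X_t)\right)^2 \ge 0$. Since $I(X_t) > 0$, we have $\frac{d^2}{dt^2}I(X_t) \ge 0$. Using Equation (3), we have $\frac{d^3}{dt^3}H(X_t) = \frac{1}{2}\frac{d^2}{dt^2}I(X_t) \ge 0$.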
4. Conclusions
In this paper, we prove the sqrt-convexity of the Fisher information along heat flow in dimension two. It is easy to see that this conclusion is weaker than the log-convexity conjecture. However, it is stronger than the CMC for the third-order derivative in dimension two.
The proof is based on the SDP method. In order to reduce the size of the SDP problem, we prove that any sixth-order differential form can be reduced to an “equivalent” differential polynomial which is a quadratic form in certain new variables. Based on this fact, we reduce the sixth-order differential forms into quadratic forms in a set of new variables, which reduces the size of the SDP problem significantly.
As a possible future research direction, it would be interesting to prove the sqrt-convexity in higher dimensions using the method given in this paper. In this case, the main difficulty is to establish inequality (27) in higher dimensions. Another question is whether the log-convexity can be proven by introducing more constraints, or by new methods that solve Problem 1 without the relaxation used in Problem 2. The methods introduced in this paper may also be used to prove other entropy power inequalities related to the heat equation.
Author Contributions
Conceptualization, J.L.; methodology, J.L. and X.G.; software, J.L.; validation, J.L. and X.G.; formal analysis, J.L. and X.G.; investigation, J.L.;data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, X.G.; supervision, X.G.; project administration, X.G. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by NSFC grant number 11688101 and NKRDP 2018YFA0704705.
Institutional Review Board Statement
Not applicable.
Data Availability Statement
The code for the SDP solver and data are available (accessed on 30 November 2022) at https://github.com/liujunliang19/sqrt-convex.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Matlab Codes for SDP
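The following Matlab code is used to solve Problem 4. It assumes that n, m, the n-by-n matrix C, and a cell array A holding the n-by-n constraint matrices A{1}, ..., A{m} are already defined in the workspace.
    cvx_begin
        variable Z(n,n) symmetric      % matrix variable of Problem 4
        dual variable y                % multipliers of the equality constraints
        expression lhs(m,1)
        for i = 1:m
            lhs(i) = trace(A{i}*Z);    % i-th linear constraint on Z
        end
        maximize(trace(C*Z))
        subject to
            lhs == zeros(m,1) : y;     % attach y to the equality constraints
            Z == semidefinite(n);      % Z must be positive semidefinite
    cvx_end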
Appendix B
We give in Equation (25).
Appendix C. Solutions to Problem 2
We give the solutions , , to Problem 4, which are also solutions to Problems 2 and 3.
Appendix D. Solutions to Problem 5
We give the solutions to Problem 5.
Appendix E. SOS Expression of Θ1
The value of is:
Next, we give the SOS expression of . The parameters not mentioned above are 0.
References
1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
2. Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control 1959, 2, 101–112.
3. Blachman, N.M. The convolution inequality for entropy powers. IEEE Trans. Inf. Theory 1965, 11, 267–271.
4. Lieb, E.H. Proof of an entropy conjecture of Wehrl. Commun. Math. Phys. 1978, 62, 35–41.
5. Verdú, S.; Guo, D. A simple proof of the entropy-power inequality. IEEE Trans. Inf. Theory 2006, 52, 2165–2166.
6. Rioul, O. Information theoretic proofs of entropy power inequalities. IEEE Trans. Inf. Theory 2011, 57, 33–55.
7. Costa, M.H.M. A new entropy power inequality. IEEE Trans. Inf. Theory 1985, 31, 751–760.
8. Costa, M.H.M. On the Gaussian interference channel. IEEE Trans. Inf. Theory 1985, 31, 607–615.
9. Bergmans, P.P. A simple converse for broadcast channels with additive white Gaussian noise. IEEE Trans. Inf. Theory 1974, 20, 279–280.
10. Dembo, A. Simple proof of the concavity of the entropy power with respect to added Gaussian noise. IEEE Trans. Inf. Theory 1989, 35, 887–888.
11. Villani, C. A short proof of the ‘concavity of entropy power’. IEEE Trans. Inf. Theory 2000, 46, 1695–1696.
12. Cheng, F.; Geng, Y. Higher order derivatives in Costa’s entropy power inequality. IEEE Trans. Inf. Theory 2015, 61, 5892–5905.
13. Guo, L.; Yuan, C.M.; Gao, X.S. Lower bounds on multivariate higher order derivatives of differential entropy. Entropy 2022, 24, 1155.
14. Zhang, X.; Anantharam, V.; Geng, Y. Gaussian optimality for derivatives of differential entropy using linear matrix inequalities. Entropy 2018, 20, 182.
15. Guo, L.; Yuan, C.M.; Gao, X.S. A generalization of the concavity of Rényi entropy power. Entropy 2021, 23, 1593.
16. McKean, H.P., Jr. Speed of approach to equilibrium for Kac’s caricature of a Maxwellian gas. Arch. Rational Mech. Anal. 1966, 21, 343–367.
17. Ledoux, M.; Nair, C.; Wang, Y.N. Log-Convexity of Fisher Information along Heat Flow; University of Toulouse: Toulouse, France, 2021.
18. Powers, V. Hilbert’s 17th problem and the champagne problem. Am. Math. Mon. 1996, 103, 879–887.
19. Vandenberghe, L.; Boyd, S. Semidefinite programming. SIAM Rev. 1996, 38, 49–95.
20. Toscani, G. A concavity property for the reciprocal of Fisher information and its consequences on Costa’s EPI. Phys. A Stat. Mech. Its Appl. 2015, 432, 15.
21. Yeung, R.W.; Li, C.T. Machine-Proving of Entropy Inequalities. IEEE BITS Inf. Theory Mag. 2021, 1, 12–22.
22. Yeung, R.W.; Yan, Y.-O. Information Theoretic Inequality Prover (ITIP), MATLAB Program Software Package. 1996. Available online: http://home.ie.cuhk.edu.hk/~ITIP (accessed on 22 March 2023).
23. Pulikkoonattu, R.; Diggavi, S. Xitip, ITIP-Based C Program Software Package. 2006. Available online: http://xitip.epfl.ch (accessed on 22 March 2023).
24. Csirmaz, L. A Minimal Information Theoretic Inequality Prover (Minitip). 2016. Available online: https://github.com/lcsirmaz/minitip (accessed on 22 March 2023).
25. Li, C.T. Python Symbolic Information Theoretic Inequality Prover (psitip). 2020. Available online: https://github.com/cheuktingli/ (accessed on 22 March 2023).
26. Grant, M.; Boyd, S.; Ye, Y. CVX: Matlab Software for Disciplined Convex Programming, Version 2.0 Beta. Available online: http://cvxr.com/cvx (accessed on 22 March 2023).
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).