Article

Gaussian Optimality for Derivatives of Differential Entropy Using Linear Matrix Inequalities †

1 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
2 School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
4 Department of Electrical Engineering and Computer Sciences (EECS), University of California, Berkeley, CA 94720-1234, USA
5 Shanghai Institute of Fog Computing Technology, ShanghaiTech University, Shanghai 201210, China
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper submitted to 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018.
Entropy 2018, 20(3), 182; https://doi.org/10.3390/e20030182
Submission received: 23 January 2018 / Revised: 22 February 2018 / Accepted: 5 March 2018 / Published: 9 March 2018
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract:
Let Z be a standard Gaussian random variable, X be independent of Z, and t be a strictly positive scalar. For the derivatives in t of the differential entropy of $X+\sqrt{t}Z$, McKean noticed that Gaussian X achieves the extremum for the first and second derivatives, among distributions with a fixed variance, and he conjectured that this holds for derivatives of all orders. This conjecture implies that the signs of the derivatives alternate. Recently, Cheng and Geng proved that this alternation holds for the first four orders. In this work, we employ the technique of linear matrix inequalities to show that: firstly, Cheng and Geng's method may not generalize to higher orders; secondly, when the probability density function of $X+\sqrt{t}Z$ is log-concave, McKean's conjecture holds for orders up to at least five. As a corollary, we also recover Toscani's result on the sign of the third derivative of the entropy power of $X+\sqrt{t}Z$, using a much simpler argument.

1. Introduction

For a general continuous random variable X with probability density function f(x), its differential entropy is defined as
$$h(X) = -\int_{-\infty}^{+\infty} f(x)\ln f(x)\,dx,$$
given that the above integral exists. In [1], Shannon considered the entropy power $N(X) = \frac{1}{2\pi e}e^{2h(X)}$ and introduced the celebrated Entropy Power Inequality (EPI):
$$e^{2h(X+Y)} \ge e^{2h(X)} + e^{2h(Y)},$$
where X and Y are independent, and equality holds if and only if X and Y are Gaussian. This inequality is nontrivial, and it was rigorously proved later by Stam [2].
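As a quick numerical sanity check of the EPI (our illustration, not part of the paper), the inequality is tight for independent Gaussians, whose variances add under convolution, and strict for two independent uniform variables, whose sum is triangular. The specific distributions and tolerances below are our own choices:

```python
import numpy as np

def h_gauss(var):
    # differential entropy of a Gaussian with variance `var`
    return 0.5 * np.log(2 * np.pi * np.e * var)

# Gaussian case: X+Y is Gaussian with the variances added, so the EPI is tight.
vx, vy = 1.5, 2.5
gap_gauss = np.exp(2 * h_gauss(vx + vy)) - (np.exp(2 * h_gauss(vx)) + np.exp(2 * h_gauss(vy)))

# Uniform case on [0, a]: h = ln a, and X+Y is triangular on [0, 2a];
# its entropy is computed by quadrature (analytically it is ln a + 1/2).
a = 2.0
x = np.linspace(1e-9, 2 * a, 400001)
dx = x[1] - x[0]
f_tri = np.where(x <= a, x / a**2, (2 * a - x) / a**2)
h_tri = float(np.sum(-f_tri * np.log(np.maximum(f_tri, 1e-300))) * dx)

gap_unif = np.exp(2 * h_tri) - 2 * np.exp(2 * np.log(a))  # strictly positive
```

Here `gap_gauss` vanishes (equality case), while `gap_unif` is strictly positive, matching the equality condition stated above.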
Remark 1.
This is the full version of a conference paper submitted to ISIT 2018 [3].
There have been numerous generalizations of the EPI. In [4], Costa considered the case where X is perturbed by an independent standard Gaussian Z, and showed that $N(X+\sqrt{t}Z)$ is concave in t for t > 0:
$$\frac{d^2}{dt^2}N(X+\sqrt{t}Z) \le 0, \quad t > 0.$$
Toscani [5] further showed that $\frac{d^3}{dt^3}N(X+\sqrt{t}Z) \ge 0$, under the condition that the probability density function of $X+\sqrt{t}Z$ is log-concave.
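Costa's concavity can be observed numerically (our illustration, not from the paper). For a Gaussian-mixture X, the density of $X+\sqrt{t}Z$ is itself a Gaussian mixture in closed form, so the entropy power can be evaluated by quadrature; the mixture parameters and grids below are our own choices:

```python
import numpy as np

def h_mixture(t, y):
    """Differential entropy of X + sqrt(t) Z for X ~ 0.5 N(-3, 0.5) + 0.5 N(3, 0.5)."""
    var = 0.5 + t   # each mixture component picks up variance t from sqrt(t) Z
    f = 0.5 * (np.exp(-(y + 3.0)**2 / (2 * var)) + np.exp(-(y - 3.0)**2 / (2 * var))) \
        / np.sqrt(2 * np.pi * var)
    g = -f * np.log(np.maximum(f, 1e-300))
    return float(np.sum(g) * (y[1] - y[0]))

y = np.linspace(-30, 30, 120001)
ts = np.arange(0.5, 3.01, 0.25)
N = np.array([np.exp(2 * h_mixture(t, y)) / (2 * np.pi * np.e) for t in ts])

# Discrete second differences of N(t); concavity means these are all <= 0.
second_diff = N[:-2] - 2 * N[1:-1] + N[2:]
```

Since second differences of a concave function are nonpositive for any step size, the sign check is robust to the coarse time grid.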
Later, Villani [6] simplified Costa's proof by directly studying the second derivative in t of the differential entropy instead of the second derivative of the entropy power. In the proof, it was noticed [6,7,8] that the signs of the first two derivatives of $h(X+\sqrt{t}Z)$ alternate. Along this line, Cheng and Geng [9] showed that this alternation holds for the first four derivatives, and they conjectured that it is true for derivatives of all orders.
Conjecture 1
([9]). The derivatives of the differential entropy $h(X+\sqrt{t}Z)$ satisfy $(-1)^{n-1}\times\frac{d^n}{dt^n}h(X+\sqrt{t}Z) \ge 0$ for t > 0 and n ≥ 1.
According to Equation (3) in Lemma 2 and the comments there, $2\frac{d}{dt}h(X+\sqrt{t}Z)$ is the Fisher information $J(X+\sqrt{t}Z)$. The above conjecture is equivalent to hypothesizing that the Fisher information of $X+\sqrt{t}Z$ is completely monotone in t, thus admitting a very simple characterization via the Laplace transform [10]: there exists a finite Borel measure μ(·) such that
$$J(X+\sqrt{t}Z) = \int_{0}^{+\infty} e^{-\lambda t}\,\mu(d\lambda).$$
Back in 1966, McKean [7] also studied the derivatives in t of $h(X+\sqrt{t}Z)$, and noticed that Gaussian X achieves the minimum of $(-1)^{n-1}\times\frac{d^n}{dt^n}h(X+\sqrt{t}Z)$ for n = 1, 2, subject to Var(X) = σ². McKean then implicitly made the following conjecture that this Gaussian optimality holds generally:
Conjecture 2
([7]). Subject to Var(X) = σ², Gaussian X with variance σ² achieves the minimum of $(-1)^{n-1}\times\frac{d^n}{dt^n}h(X+\sqrt{t}Z)$ for t > 0 and n ≥ 1.
Notice that, if $X_G$ is Gaussian with variance σ², then by routine calculation,
$$2h(X_G+\sqrt{t}Z) = \ln 2\pi e(\sigma^2+t), \qquad (-1)^{n-1}\times 2\frac{d^n}{dt^n}h(X_G+\sqrt{t}Z) = (n-1)!\times(\sigma^2+t)^{-n} > 0. \tag{2}$$
Hence, McKean’s conjecture implies the one by Cheng and Geng.
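The Gaussian closed form above is easy to verify numerically (our check, not from the paper): with $h(t) = \frac{1}{2}\ln 2\pi e(\sigma^2+t)$, finite differences should reproduce $2h' = (\sigma^2+t)^{-1}$ and $-2h'' = (\sigma^2+t)^{-2}$, i.e., the n = 1, 2 cases of the formula:

```python
import math

s, t, d = 2.0, 1.0, 1e-4   # variance of X, evaluation point, step size (our choices)

def h(tt):
    # differential entropy of Gaussian X + sqrt(tt) Z, variance s + tt
    return 0.5 * math.log(2 * math.pi * math.e * (s + tt))

h1 = (h(t + d) - h(t - d)) / (2 * d)           # central first difference
h2 = (h(t + d) - 2 * h(t) + h(t - d)) / d**2   # central second difference

err1 = abs(2 * h1 - 1.0 / (s + t))             # n = 1: 0! * (s+t)^(-1)
err2 = abs(-2 * h2 - 1.0 / (s + t)**2)         # n = 2: 1! * (s+t)^(-2)
```

Both errors are at the level of the finite-difference truncation, far below the quantities themselves.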
Compared with the progress made by Cheng and Geng [9] on Conjecture 1, there has been little progress on Conjecture 2. Most existing results concern the second derivative of the differential entropy (or of the mutual information), or generalize the EPI to other settings. For example, Guo et al. [11] represent the derivative in the signal-to-noise ratio of the mutual information in terms of the minimum mean-square error, building on de Bruijn's identity [2]; Wibisono and Jog [12] study the mutual information along the density flow defined by the heat equation and show that it is a convex function of time if the initial distribution is log-concave; Wang and Madiman [13] recover the proof of the EPI via rearrangements; Courtade [14] generalizes Costa's EPI to non-Gaussian additive perturbations; and König and Smith [15] propose a quantum version of the EPI.
In this paper, we work on Conjecture 2. Our main contributions are to show that Conjecture 2 holds for orders up to at least five under a log-concavity condition, and to introduce the technique of linear matrix inequalities to this family of problems.
The paper is organized as follows. In Section 2, we obtain the formulae for the derivatives of the differential entropy $h(X+\sqrt{t}Z)$ (Theorem 1) and show that McKean's conjecture holds for orders up to at least five under the log-concavity condition (Corollary 1). As a corollary, we recover Toscani's result [5] on the third derivative of the entropy power via the Cauchy–Schwarz inequality, with a much simpler argument. In Section 3, we introduce the linear matrix inequality approach and transform the above two conjectures into feasibility checks of semidefinite programming problems. With this approach, we can easily obtain the coefficients in Theorem 1. We then show that a direct generalization of the method of Cheng and Geng might not work for orders higher than four when proving Conjecture 1. In Section 4, we prove the main theorem of Section 2.

2. Main Results

We first introduce the notation used throughout this paper. For a single-variate function, we write $\frac{d}{dy}$ for its derivative; in the multi-variate case, we write $\frac{\partial}{\partial y}$ for the partial derivative. To simplify the notation, for the derivatives of a general single-variate function g(y), we also use g′(y), g″(y) and g‴(y) to represent the first, second and third derivatives, respectively, and $g^{(n)}(y)$ denotes the n-th derivative for n ≥ 1.
In the rest of the paper, let Z be a standard Gaussian random variable, and let X be independent of Z. Denote
$$Y_t := X + \sqrt{t}\,Z, \quad t > 0.$$
According to [4,16], $Y_t$ has nice properties: the probability density function f(y,t) of $Y_t$ exists, is strictly positive and is infinitely differentiable; the differential entropy $h(Y_t)$ exists. Denote
$$f_n := \frac{\partial^n}{\partial y^n}f(y,t), \qquad T_n := \frac{\partial^n}{\partial y^n}\ln f(y,t), \qquad n = 0, 1, 2, \ldots,$$
where it is understood that f n and T n are functions of ( y , t ) . We also present some properties of f ( y , t ) in the following lemma. The proof can be found in, say, [2,16] and Propositions 1 and 2 in [9].
Lemma 1.
For t > 0 , the probability density function f ( y , t ) satisfies the following properties:
(1) 
The heat equation holds: $\frac{\partial}{\partial t}f = \frac{1}{2}\frac{\partial^2}{\partial y^2}f$.
(2) 
$\lim_{|y|\to\infty} f_n = 0$, for all $n \ge 0$ and $t > 0$.
(3) 
The expectation of any product of the $T_i$, $E[\prod_i T_i]$, exists, and $\lim_{|y|\to\infty} f\prod_i T_i = 0$ for $t > 0$.
In Lemma 1, part (3), in writing $E[\prod_i T_i]$, we think of each $T_i$ as a function of $(Y_t, t)$.
Notice that, given X and Z, the differential entropy $h(X+\sqrt{t}Z)$ is a function of t. The formulae for its first and second derivatives are presented in the following lemma. According to Stam [2], the first equality is due to de Bruijn, and its right-hand side is exactly the Fisher information (page 671 of [17]); the second is due to McKean [7], Toscani [8] and Villani [6]; the Gaussian optimality is due to McKean [7].
Lemma 2.
For the first and second derivatives of the differential entropy $h(X+\sqrt{t}Z)$, the following expressions hold for t > 0:
$$2h'(X+\sqrt{t}Z) = E\left[\frac{f_1^2}{f^2}\right], \tag{3}$$
$$2h''(X+\sqrt{t}Z) = -E\left[\left(\frac{f_2}{f} - \frac{f_1^2}{f^2}\right)^2\right]. \tag{4}$$
Subject to Var(X) = σ², Gaussian X with variance σ² minimizes $h'(X+\sqrt{t}Z)$ and $-h''(X+\sqrt{t}Z)$.
By standard manipulations, one has
$$T_1 = \frac{f_1}{f}, \qquad T_2 = \frac{f_2}{f} - \frac{f_1^2}{f^2}. \tag{5}$$
Thus, it is straightforward to rewrite the derivatives as
$$2h'(X+\sqrt{t}Z) = E[T_1^2], \tag{6}$$
$$2h''(X+\sqrt{t}Z) = -E[T_2^2]. \tag{7}$$
For the third and fourth derivatives, one can refer to Theorems 1 and 2 in [9], where they were represented by the f i . Notice that these representations are not unique, and the ones in [9] are sufficient for identifying the signs. Instead, in Theorem 1, we use the T i , and this will facilitate our proof of the Gaussian optimality in Corollary 1.
Theorem 1.
For t > 0, the derivatives of the differential entropy $h(X+\sqrt{t}Z)$ can be expressed as:
$$2h^{(3)}(X+\sqrt{t}Z) = E[T_3^2 - 2T_2^3], \tag{8}$$
$$2h^{(4)}(X+\sqrt{t}Z) = -E[T_4^2 + 6T_2^4 - 12T_3^2T_2], \tag{9}$$
$$2h^{(5)}(X+\sqrt{t}Z) = E[T_5^2 - 24T_2^5 - 8T_4^2T_2 - 6T_3^2T_2T_1^2 + 12T_5T_3T_2 + 114T_3^2T_2^2]. \tag{10}$$
The proof of this theorem is deferred to Section 4. The existence of such expressions, and how to obtain the coefficients, is the subject of Section 3, where the method of linear matrix inequalities is introduced.

Log-Concave Case

Lemma 2 already ensures the optimality of Gaussians for the first and second derivatives, subject to Var(X) = σ². For the higher orders, we do not know whether the optimality can be shown from the expressions in Theorem 1 alone. Here, we impose the constraint of log-concavity on f(y,t) and summarize the results in Corollaries 1–3.
A nonnegative function f(·) is logarithmically concave (or log-concave for short) if its domain is convex and it satisfies the inequality
$$f(\theta x + (1-\theta)y) \ge f(x)^{\theta}f(y)^{1-\theta}$$
for all x, y in the domain and 0 < θ < 1. If f is strictly positive, this is equivalent to saying that the logarithm of the function is concave (Section 2.5 of [18]). In our case, assuming that f(y,t) is log-concave in y is equivalent to $T_2 \le 0$.
Examples of log-concave distributions include the Gaussian, the exponential, the Laplace, and the Gamma distribution with shape parameter larger than one. Notice that, if the probability density function of X is log-concave, then so is that of $X+\sqrt{t}Z$ (Section 3.5.2 of [18]).
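The defining inequality can be spot-checked numerically for two of the examples above, the Gaussian and the Laplace densities (our illustration; the random points and tolerance are our choices):

```python
import math
import random

def gauss(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def laplace(x):
    # standard Laplace density
    return 0.5 * math.exp(-abs(x))

random.seed(0)
ok = True
for f in (gauss, laplace):
    for _ in range(1000):
        x, y = random.uniform(-5, 5), random.uniform(-5, 5)
        th = random.uniform(0.001, 0.999)
        lhs = f(th * x + (1 - th) * y)              # f(theta*x + (1-theta)*y)
        rhs = f(x) ** th * f(y) ** (1 - th)         # f(x)^theta * f(y)^(1-theta)
        ok = ok and (lhs >= rhs - 1e-12)
```

Every sampled triple satisfies the inequality, as log-concavity requires.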
Corollary 1.
If the probability density function of $X+\sqrt{t}Z$ is log-concave, then, subject to Var(X) = σ², Gaussian X with variance σ² achieves the minimum of $(-1)^{n-1}h^{(n)}(X+\sqrt{t}Z)$ for t > 0 and 3 ≤ n ≤ 5.
Proof. 
Let $X_G$ be Gaussian with mean μ and variance σ². The probability density function of $Y_G := X_G + \sqrt{t}Z$ is
$$f(y_G, t) = \frac{1}{\sqrt{2\pi(\sigma^2+t)}}\exp\left\{-\frac{(y_G-\mu)^2}{2(\sigma^2+t)}\right\}.$$
The key observation is that the second derivative of the logarithm in the Gaussian case is constant:
$$T_{2,G} := \frac{\partial^2}{\partial y_G^2}\ln f(y_G,t) = -(\sigma^2+t)^{-1}.$$
Hence, from Equation (2), the derivatives of the differential entropy in the Gaussian case are
$$(-1)^{n-1}\times 2h^{(n)}(X_G+\sqrt{t}Z) = (n-1)!\times(\sigma^2+t)^{-n} = (n-1)!\times E[(-T_{2,G})^n].$$
Now, if one can show the following chain of inequalities:
$$(-1)^{n-1}\times 2h^{(n)}(X+\sqrt{t}Z) \overset{(a)}{\ge} (n-1)!\times E[(-T_2)^n] \overset{(b)}{\ge} (n-1)!\times \left(E[-T_2]\right)^n \overset{(c)}{\ge} (n-1)!\times E[(-T_{2,G})^n], \tag{11}$$
then one is done.
For inequality (b), the log-concavity condition, namely $T_2 \le 0$, suffices: $-T_2$ is nonnegative, so Jensen's inequality applied to the convex function $x \mapsto x^n$ on $[0,\infty)$ gives $E[(-T_2)^n] \ge (E[-T_2])^n$.
For inequality (c), it suffices to show that $E[-T_2] \ge E[-T_{2,G}] > 0$. This can be proved using Lemma 2: notice that
$$E\left[\frac{f_2}{f}\right] = \int_{-\infty}^{+\infty} f_2(y,t)\,dy = \int_{-\infty}^{+\infty} df_1(y,t) = f_1(y,t)\Big|_{-\infty}^{+\infty} = 0,$$
where the last equality is due to Lemma 1.
Now, from Equation (5),
$$E[T_2] = E\left[\frac{f_2}{f} - \frac{f_1^2}{f^2}\right] = -E\left[\frac{f_1^2}{f^2}\right]. \tag{12}$$
Combining this with Lemma 2, one has
$$E[T_2] = -2h'(X+\sqrt{t}Z) \le -2h'(X_G+\sqrt{t}Z) = E[T_{2,G}].$$
This part is finished by noticing that $-E[T_{2,G}] = (\sigma^2+t)^{-1} > 0$ from Equation (2).
For inequality (a), we show each case of n using Theorem 1 and the condition $T_2 \le 0$. For n = 3,
$$2h^{(3)}(X+\sqrt{t}Z) = E[T_3^2 - 2T_2^3] \ge E[-2T_2^3] = (n-1)!\times E[(-T_2)^n], \quad n = 3.$$
For n = 4,
$$-2h^{(4)}(X+\sqrt{t}Z) = E[T_4^2 + 6T_2^4 - 12T_3^2T_2] \ge E[6T_2^4] = (n-1)!\times E[(-T_2)^n], \quad n = 4,$$
where the inequality is due to $T_2 \le 0$, thus $E[-12T_3^2T_2] \ge 0$. For n = 5,
$$\begin{aligned} 2h^{(5)}(X+\sqrt{t}Z) &= E[T_5^2 - 24T_2^5 - 8T_4^2T_2 - 6T_3^2T_2T_1^2 + 12T_5T_3T_2 + 114T_3^2T_2^2] \\ &= E[(T_5 + 6T_3T_2)^2 - 24T_2^5 - 8T_4^2T_2 - 6T_3^2T_2T_1^2 + 78T_3^2T_2^2] \\ &\ge E[-24T_2^5] = (n-1)!\times E[(-T_2)^n], \quad n = 5. \end{aligned}$$
Now, the proof is finished. ☐
The following corollary settles the fifth-order case of the conjecture in [9] under the log-concavity assumption. The proof follows directly from Corollary 1 and Equation (2).
Corollary 2.
If the probability density function of $X+\sqrt{t}Z$ is log-concave, then the fifth derivative of the differential entropy is strictly positive: $h^{(5)}(X+\sqrt{t}Z) > 0$ for t > 0.
Regarding the entropy power, it is already known that $N'(X+\sqrt{t}Z) \ge 0$, from the connection with the Fisher information, and that $N''(X+\sqrt{t}Z) \le 0$, according to [4]. For the third derivative, Toscani [5] showed that $N^{(3)}(X+\sqrt{t}Z) \ge 0$ under the log-concavity assumption. Here, we simplify Toscani's proof, using a Cauchy–Schwarz argument.
Corollary 3.
If the probability density function of $X+\sqrt{t}Z$ is log-concave, then the third derivative of the entropy power is nonnegative: $N^{(3)}(X+\sqrt{t}Z) \ge 0$ for t > 0.
Proof. 
For brevity, let $h' := h'(X+\sqrt{t}Z)$, and similarly we omit the arguments for the higher orders. Routine manipulations yield
$$N^{(3)}(X+\sqrt{t}Z) = \frac{d^3}{dt^3}\,\frac{1}{2\pi e}e^{2h(X+\sqrt{t}Z)} = \frac{1}{2\pi e}e^{2h(X+\sqrt{t}Z)}\left((2h')^3 + 3\times 2h'\times 2h'' + 2h'''\right).$$
Thus, it suffices to show $(2h')^3 + 3\times 2h'\times 2h'' + 2h''' \ge 0$. First, we express h′, h″ and h‴ in terms of the $T_i$: according to Lemma 2 and Equation (12), $2h' = -E[T_2]$; from Equation (7), $2h'' = -E[T_2^2]$; from Equation (8), $2h''' = E[T_3^2 - 2T_2^3]$.
Also notice that, from Lemma 2, $2h'(X+\sqrt{t}Z) \ge 2h'(X_G+\sqrt{t}Z) = (\sigma_X^2+t)^{-1} > 0$. Hence, $E[-T_2] > 0$ (it cannot be zero). Now, under the log-concavity condition, namely $T_2 \le 0$, the Cauchy–Schwarz inequality for random variables gives
$$E[-T_2]\,E[(-T_2)^3] = E\left[\left(\sqrt{-T_2}\right)^2\right]E\left[\left((-T_2)^{3/2}\right)^2\right] \ge \left(E\left[\sqrt{-T_2}\,(-T_2)^{3/2}\right]\right)^2 = \left(E[T_2^2]\right)^2.$$
Thus, we have
$$\begin{aligned} (2h')^3 + 3\times 2h'\times 2h'' + 2h''' &= (E[-T_2])^3 - 3\,E[-T_2]\,E[T_2^2] + E[T_3^2 - 2T_2^3] \\ &\ge (E[-T_2])^3 - 3\,E[-T_2]\,E[T_2^2] + 2E[(-T_2)^3] \\ &= (E[-T_2])^{-1}\left((E[-T_2])^4 - 3(E[-T_2])^2E[T_2^2] + 2\,E[-T_2]\,E[(-T_2)^3]\right) \\ &\ge (E[-T_2])^{-1}\left((E[-T_2])^4 - 3(E[-T_2])^2E[T_2^2] + 2(E[T_2^2])^2\right) \\ &= (E[-T_2])^{-1}\left(E[T_2^2] - (E[T_2])^2\right)\left(2E[T_2^2] - (E[T_2])^2\right). \end{aligned}$$
The proof is finished by noticing that $E[T_2^2] - (E[T_2])^2 \ge 0$ (it is the variance of $T_2$), and a fortiori $2E[T_2^2] - (E[T_2])^2 \ge 0$, which implies that the right-hand side is nonnegative. ☐
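The Cauchy–Schwarz step in this proof, $E[W]\,E[W^3] \ge (E[W^2])^2$ for a nonnegative random variable $W = -T_2$, in fact holds exactly for any nonnegative sample, since the empirical measure is itself a probability measure. A quick numerical check (ours; the exponential sample is an arbitrary choice):

```python
import random

random.seed(1)
w = [random.expovariate(1.0) for _ in range(10000)]   # any nonnegative data works

m1 = sum(w) / len(w)                  # E[W]
m2 = sum(v * v for v in w) / len(w)   # E[W^2]
m3 = sum(v ** 3 for v in w) / len(w)  # E[W^3]

# >= 0 by Cauchy-Schwarz applied to sqrt(W) and W^(3/2)
cs_gap = m1 * m3 - m2 ** 2
```

For the exponential distribution the population gap is $1\cdot 6 - 2^2 = 2$, so the sample gap is comfortably positive.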

3. Linear Matrix Inequalities

In this section, we introduce the method of linear matrix inequalities (LMIs) and transform the proofs of Conjectures 1 and 2 into LMI feasibility problems. This transformation also enables us to find the correct coefficients in Theorem 1.
Recall that, in [9], the authors first obtained the fourth derivative in the following form (Equation (27) in [9]):
$$2h^{(4)}(X+\sqrt{t}Z) = E\left[-\frac{f_4^2}{f^2} - \frac{4f_2f_3^2}{f^3} + \frac{4f_1^2f_3^2}{f^4} - \frac{3f_2^4}{f^4} + \frac{24f_1^2f_2^3}{f^5} - \frac{36f_1^4f_2^2}{f^6} + \frac{90f_1^8}{7f^8}\right]. \tag{13}$$
Then, with some equalities (from integration by parts), they showed this derivative can be expressed as the negative of a sum of squares (Theorem 2 in [9]):
2 h ( 4 ) ( X + t Z ) = E f 4 f 6 5 f 1 f 3 f 2 7 10 f 2 2 f 2 + 8 5 f 1 2 f 2 f 3 1 2 f 1 4 f 4 2 + 2 5 f 1 f 3 f 2 1 3 f 1 2 f 2 f 3 + 9 100 f 1 4 f 4 2 + 4 100 f 1 2 f 2 f 3 + 4 100 f 1 4 f 4 2 + 1 300 f 2 4 f 4 + 56 90,000 f 1 4 f 2 2 f 6 + 13 70,000 f 1 8 f 8 .
Hence, the fourth derivative is nonpositive. A sum of squares has a natural connection with positive semidefinite matrices: the sum of squares inside the expectation in Equation (14) can be written as $E[u^TFu]$, where u is the column vector $u = (f_4/f,\; f_1f_3/f^2,\; f_2^2/f^2,\; f_1^2f_2/f^3,\; f_1^4/f^4)$ and F is a positive semidefinite matrix. Thus, the method in [9] amounts to verifying the existence of a suitable positive semidefinite matrix F. This can be cast as the feasibility of a linear matrix inequality.
A linear matrix inequality (Chapter 2 of [18]) has the form
$$F(x,y) := F_0 + \sum_{i=1}^{I} x_iF_i + \sum_{j=1}^{J} y_jG_j \succeq 0, \tag{15}$$
where the m × m symmetric matrices $F_0, F_i, G_j$, i = 1, …, I, j = 1, …, J, are given, the variables $x_i$ are real and the $y_j$ are nonnegative, and the notation $F(x,y) \succeq 0$ means that F(x,y) is positive semidefinite. The feasibility problem asks whether there exists a set of $x_i$ and $y_j$ such that F(x,y) is positive semidefinite.
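For intuition, feasibility of a tiny LMI can be checked by brute force (our toy illustration, not the solver-based approach used later): scan the scalar variable and inspect the smallest eigenvalue of the matrix pencil.

```python
import numpy as np

# Toy LMI: F(x) = F0 + x*F1 = diag(1, x - 1), feasible exactly for x >= 1.
F0 = np.array([[1.0, 0.0], [0.0, -1.0]])
F1 = np.array([[0.0, 0.0], [0.0, 1.0]])

def min_eig(x):
    # smallest eigenvalue of F0 + x*F1; nonnegative <=> positive semidefinite
    return np.linalg.eigvalsh(F0 + x * F1).min()

xs = np.linspace(-3, 3, 601)
feasible_xs = [x for x in xs if min_eig(x) >= -1e-9]
```

Real LMI feasibility problems are solved with semidefinite programming rather than grid search; this sketch only illustrates what the question "does a PSD-making choice of variables exist?" means.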
To reformulate the method used by Cheng and Geng [9] as an LMI feasibility problem, using the fourth derivative as an illustrative example, the main idea is as follows. First, transform the original expression of the derivative into the form
$$-2h^{(4)}(X+\sqrt{t}Z) = E[u^TF_0u].$$
Then, transform the equalities resulting from integration by parts into the form
$$0 = E[u^TF_iu], \quad i = 1, 2, \ldots, I.$$
Finally, try to find a set of variables $x_i$ such that $F_0 + \sum_i x_iF_i \succeq 0$, which is sufficient to show that
$$-2h^{(4)}(X+\sqrt{t}Z) = E[u^TF_0u] = E\left[u^T\left(F_0 + \sum_i x_iF_i\right)u\right] \ge 0.$$
One can notice that there are no matrices $G_j$ in the above formulation. This is mainly because only equalities were available in [9]. When one imposes inequality constraints, for example $T_2 \le 0$ as in this paper, one is able to construct matrices $G_j$ as well.
Before we proceed to the details of constructing those matrices, the following observations are clear regarding $u = (f_4/f,\; f_1f_3/f^2,\; f_2^2/f^2,\; f_1^2f_2/f^3,\; f_1^4/f^4)$ and the fourth derivative $2h^{(4)}(X+\sqrt{t}Z)$ (see Equation (13)): (a) the sum-order of derivatives in each entry of u is four; for example, the sum-order of $f_1^2f_2/f^3$ is $1\times 2 + 2 = 4$; (b) the highest order of a single factor appearing in the entries of u is four, namely in $f_4/f$; (c) the sum-order of each term in the fourth derivative is eight, which is twice that of u.
In the following, we take the fourth derivative as an example and show how to construct the matrices $F_0$ (Section 3.3), $F_i$ (Section 3.1 and Section 3.2), and $G_j$ (Section 3.4). We decide to use the $T_k$, instead of the $f_k$, as the entries of u; the motivation is clear from the proof of Corollary 1 and the desire to exploit the assumption $T_2 \le 0$. Based on the above observations and the expressions in Equation (5), our vector u is
$$u = (T_4,\; T_3T_1,\; T_2^2,\; T_2T_1^2,\; T_1^4).$$
Thus, $F_0$, $F_i$ and $G_j$ are 5 × 5 symmetric matrices. Here, we mention that the expressions appearing as coordinates of u correspond to the integer partitions of four.
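The correspondence with integer partitions can be made explicit with a small generator (our illustration; the rendering of a partition as a monomial in the $T_k$ is our own convention):

```python
from collections import Counter

def partitions(n, max_part=None):
    """All partitions of n into parts <= max_part, largest part first."""
    if max_part is None:
        max_part = n
    if n == 0:
        return [[]]
    out = []
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            out.append([k] + rest)
    return out

def to_term(p):
    """Render a partition like [2, 1, 1] as the monomial 'T2*T1^2'."""
    c = Counter(p)
    return "*".join(f"T{k}" if c[k] == 1 else f"T{k}^{c[k]}"
                    for k in sorted(c, reverse=True))

u_terms = [to_term(p) for p in partitions(4)]
# ['T4', 'T3*T1', 'T2^2', 'T2*T1^2', 'T1^4'] -- the five entries of u
```

The same generator applied to n = 5 yields the seven coordinates used for the fifth derivative in Section 3.6.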
The organization of this section is as follows: Section 3.1, Section 3.2 and Section 3.3 deal with the sign of the fourth derivative with only equality constraints (see Conjecture 1); Section 3.4 further incorporates the inequality constraints, namely T 2 0 ; Section 3.5 shows the manipulation for the optimality of Gaussian inputs (see Conjecture 2). In Section 3.6, we consider the sign and the Gaussian optimality for the fifth derivative.

3.1. Matrices F i from Multiple Representations

The matrices $F_i$ are such that $E[u^TF_iu] = 0$. A trivial case comes from noticing that different products of the form u(i)u(j) may map to the same term; for example,
$$T_2^2T_1^4 = (T_2^2)(T_1^4) = (T_2T_1^2)(T_2T_1^2), \qquad u(3)u(5) = u(4)u(4).$$
That is, $T_2^2T_1^4$ admits multiple representations as u(i)u(j). It is easy to construct the corresponding matrix $F_1$ such that $u^TF_1u = 0$:
$$F_1 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&1\\ 0&0&0&-2&0\\ 0&0&1&0&0 \end{pmatrix}.$$
For the fourth derivative, only one term has multiple representations. There are none for the third derivative, and three for the fifth ($F_1$, $F_2$ and $F_3$ in Section 3.6).

3.2. Matrices F i from Integration by Parts

The equalities of the type E [ u T F i u ] = 0 used in [9] are from integration by parts. Here, we list them one by one.
Notice that all the possible terms with sum-order eight and highest order four are the following (the numbers in the left column are indices):
$$\begin{aligned} 1\text{--}5:&\quad T_4^2,\; T_4T_3T_1,\; T_4T_2^2,\; T_4T_2T_1^2,\; T_4T_1^4,\\ 6\text{--}9:&\quad T_3^2T_1^2,\; T_3T_2^2T_1,\; T_3T_2T_1^3,\; T_3T_1^5,\\ 10\text{--}13:&\quad T_2^4,\; T_2^3T_1^2,\; T_2^2T_1^4,\; T_2T_1^6,\\ 14:&\quad T_1^8,\\ 15:&\quad T_3^2T_2. \end{aligned}$$
Denote this vector as w.
These terms are arranged in an order such that the first fourteen terms can be expressed as u(i)u(j) for some i and j, while the last one cannot. We call the first fourteen terms the quadratic part $w_{qua}$, and the last term the non-quadratic part $w_{non}$. Thus, $w = (w_{qua}, w_{non})$.
It is not difficult to conclude that, to avoid repetition, one only needs to perform integration by parts on the entries whose highest-order factor appears with power one. These entries are (eight in total):
$$T_4T_3T_1,\; T_4T_2^2,\; T_4T_2T_1^2,\; T_4T_1^4,\; T_3T_2^2T_1,\; T_3T_2T_1^3,\; T_3T_1^5,\; T_2T_1^6.$$
Taking $T_4T_3T_1$ as an example, one can show that (this is Equation (18); see the end of this subsection)
$$E[2T_4T_3T_1 + T_3^2T_1^2 + T_3^2T_2] = 0.$$
In addition, this can be written as $E[c_1^Tw] = 0$, where
$$c_1 \in \mathbb{R}^{15}, \qquad c_1([2, 6, 15]) = [2, 1, 1].$$
There are eight equalities in total, and hence there are vectors $c_1, \ldots, c_8$. We put each $c_i$ as the i-th row of $C \in \mathbb{R}^{8\times 15}$ and write those equalities as
$$E[Cw] = 0,$$
where (columns indexed by $w(1), \ldots, w(15)$, rows by Equations (18)–(25))
$$C = \begin{pmatrix} 0&2&0&0&0&1&0&0&0&0&0&0&0&0&1\\ 0&0&1&0&0&0&1&0&0&0&0&0&0&0&2\\ 0&0&0&1&0&1&2&1&0&0&0&0&0&0&0\\ 0&0&0&0&1&0&0&4&1&0&0&0&0&0&0\\ 0&0&0&0&0&0&3&0&0&1&1&0&0&0&0\\ 0&0&0&0&0&0&0&2&0&0&3&1&0&0&0\\ 0&0&0&0&0&0&0&0&1&0&0&5&1&0&0\\ 0&0&0&0&0&0&0&0&0&0&0&0&7&1&0 \end{pmatrix}. \tag{17}$$
The entries can be read off from Equations (18)–(25).
We need to extract from these eight equalities $E[Cw] = 0$ matrices F such that $E[u^TFu] = 0$. The main problem is that $c_k^Tw$ may contain entries that are not expressible as u(i)u(j); in particular, for the fourth derivative, this happens when $c_k(15) \ne 0$. One needs to do some work to cancel these entries. The general method, which can also be used in higher-order cases, is stated below:
  • Firstly, since $w = (w_{qua}, w_{non})$, we separate the blocks of C accordingly:
    $$C = \begin{pmatrix} C_{11} & C_{12}\\ C_{21} & 0 \end{pmatrix}, \qquad E\left[\begin{pmatrix} C_{11} & C_{12}\\ C_{21} & 0 \end{pmatrix}\begin{pmatrix} w_{qua}\\ w_{non} \end{pmatrix}\right] = 0.$$
    In the above, $C_{11} \in \mathbb{R}^{2\times 14}$, $C_{12} \in \mathbb{R}^{2\times 1}$, $C_{21} \in \mathbb{R}^{6\times 14}$.
  • Secondly, each row of $C_{21}$ corresponds to a symmetric matrix $F_i$ such that $E[u^TF_iu] = 0$. In particular, for the first row of $C_{21}$, the matrix is
    $$F_2 = \begin{pmatrix} 0&0&0&1&0\\ 0&2&2&1&0\\ 0&2&0&0&0\\ 1&1&0&0&0\\ 0&0&0&0&0 \end{pmatrix},$$
    such that $\frac{1}{2}u^TF_2u = T_4T_2T_1^2 + T_3^2T_1^2 + 2T_3T_2^2T_1 + T_3T_2T_1^3$. Notice that a scaling factor of two is introduced here just for conciseness; this does not affect the feasibility of (15). Similarly, the other five matrices, corresponding to the remaining rows of $C_{21}$, are
    $$F_3 = \begin{pmatrix} 0&0&0&0&1\\ 0&0&0&4&1\\ 0&0&0&0&0\\ 0&4&0&0&0\\ 1&1&0&0&0 \end{pmatrix}, \quad F_4 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&3&0&0\\ 0&3&2&1&0\\ 0&0&1&0&0\\ 0&0&0&0&0 \end{pmatrix}, \quad F_5 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&2&0\\ 0&0&0&3&1\\ 0&2&3&0&0\\ 0&0&1&0&0 \end{pmatrix},$$
    $$F_6 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&1\\ 0&0&0&0&5\\ 0&0&0&0&1\\ 0&1&5&1&0 \end{pmatrix}, \quad F_7 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&7\\ 0&0&0&7&2 \end{pmatrix}.$$
  • Thirdly, for $C_{11}$ and $C_{12}$, the equalities read $E[C_{11}w_{qua} + C_{12}w_{non}] = 0$. Notice that $w_{non}$ cannot be expressed in a quadratic form. Supposing that we can find a column vector z such that $z^TC_{12} = 0$, then $E[z^TC_{11}w_{qua}] = E[z^T(C_{11}w_{qua} + C_{12}w_{non})] = 0$. The vector z lies in the null space of $C_{12}^T$, and it suffices to find a basis of this null space. One way is to perform the QR decomposition
    $$C_{12} = Q\begin{pmatrix} U\\ 0 \end{pmatrix},$$
    where U is upper triangular. The null space of $C_{12}^T$ has dimension equal to the number of rows of the zero block above, and a basis is given by the last several columns of Q. In particular, for the fourth derivative,
    $$C_{12} = \begin{pmatrix} 1\\ 2 \end{pmatrix} = QR = \begin{pmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}}\\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{pmatrix}\begin{pmatrix} \sqrt{5}\\ 0 \end{pmatrix}.$$
    Hence, one takes z as the second column of Q, which is (after scaling for conciseness) $z^T = (-2, 1)$. Then, one calculates $z^TC_{11}w_{qua} = -4T_4T_3T_1 + T_4T_2^2 - 2T_3^2T_1^2 + T_3T_2^2T_1$, and the corresponding matrix $F_8$ (scaled by a factor of two) is
    $$F_8 = \begin{pmatrix} 0&-4&1&0&0\\ -4&-4&1&0&0\\ 1&1&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{pmatrix}.$$
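The null-space step above is easy to reproduce with a library QR routine (our illustration): the complete Q factor of $C_{12}$ contains, in its trailing columns, an orthonormal basis of the null space of $C_{12}^T$.

```python
import numpy as np

C12 = np.array([[1.0], [2.0]])

# mode='complete' returns the full 2x2 orthogonal Q, not just the thin factor.
Q, R = np.linalg.qr(C12, mode='complete')
z = Q[:, 1]                            # basis vector of the null space of C12^T

residual = float(abs(z @ C12[:, 0]))   # z^T C12, should be ~0
```

Up to sign and scale, `z` is proportional to (−2, 1), matching the hand computation.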
The rest of this subsection is devoted to the equalities obtained from integration by parts. They are similar to those in [9], except that they are expressed in terms of the $T_i$. To begin, we need the following lemma.
Lemma 3.
Let A be a linear combination of products of the $T_i$. Then, for n ≥ 2,
$$E\left[T_nA + T_{n-1}\frac{\partial}{\partial y}A + T_{n-1}T_1A\right] = 0.$$
Proof. 
From calculus,
$$E[T_nA] = \int fT_nA\,dy = \int fA\,dT_{n-1} \overset{(a)}{=} 0 - \int T_{n-1}\,d(fA) = -\int T_{n-1}\left(f_1A + f\frac{\partial}{\partial y}A\right)dy \overset{(b)}{=} -\int T_{n-1}f\left(T_1A + \frac{\partial}{\partial y}A\right)dy = -E\left[T_{n-1}\frac{\partial}{\partial y}A + T_{n-1}T_1A\right],$$
where (a) is due to Lemma 1 (the boundary term $fAT_{n-1}$ vanishes at infinity), and (b) is due to Equation (5). ☐
Now, using Lemma 3, one obtains the following equalities:
$$T_n = T_4,\; A = T_3T_1: \quad E[2T_4T_3T_1 + T_3^2T_2 + T_3^2T_1^2] = 0, \tag{18}$$
$$T_n = T_4,\; A = T_2^2: \quad E[T_4T_2^2 + 2T_3^2T_2 + T_3T_2^2T_1] = 0, \tag{19}$$
$$T_n = T_4,\; A = T_2T_1^2: \quad E[T_4T_2T_1^2 + T_3^2T_1^2 + 2T_3T_2^2T_1 + T_3T_2T_1^3] = 0, \tag{20}$$
$$T_n = T_4,\; A = T_1^4: \quad E[T_4T_1^4 + 4T_3T_2T_1^3 + T_3T_1^5] = 0, \tag{21}$$
$$T_n = T_3,\; A = T_2^2T_1: \quad E[3T_3T_2^2T_1 + T_2^4 + T_2^3T_1^2] = 0, \tag{22}$$
$$T_n = T_3,\; A = T_2T_1^3: \quad E[2T_3T_2T_1^3 + 3T_2^3T_1^2 + T_2^2T_1^4] = 0, \tag{23}$$
$$T_n = T_3,\; A = T_1^5: \quad E[T_3T_1^5 + 5T_2^2T_1^4 + T_2T_1^6] = 0, \tag{24}$$
$$T_n = T_2,\; A = T_1^6: \quad E[7T_2T_1^6 + T_1^8] = 0. \tag{25}$$
With these equalities, the matrix C in Equation (17) can be constructed.
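These identities can be sanity-checked numerically in the Gaussian case (our check, not from the paper). For the standard normal, $T_1 = -y$, $T_2 = -1$ and $T_n = 0$ for n ≥ 3, so, e.g., Equation (25) reduces to $-7E[Y^6] + E[Y^8] = -7\cdot 15 + 105 = 0$. Gauss–Hermite quadrature evaluates such polynomial expectations exactly:

```python
import numpy as np

# Gauss-Hermite nodes/weights for weight exp(-x^2); rescale to N(0,1).
nodes, weights = np.polynomial.hermite.hermgauss(20)
z = np.sqrt(2.0) * nodes
wts = weights / np.sqrt(np.pi)

T1 = -z      # T1 = d/dy ln f = -y for the standard normal
T2 = -1.0    # T2 is constant for a Gaussian

# Equation (25): E[7 T2 T1^6 + T1^8] should vanish.
identity_25 = float(np.sum(wts * (7 * T2 * T1**6 + T1**8)))
```

With 20 nodes the quadrature is exact for polynomials up to degree 39, so `identity_25` is zero up to floating-point rounding.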

3.3. Matrix F 0 from the Derivative

Suppose that we have already obtained the fourth derivative in the form (see Equation (30) later)
$$-2h^{(4)}(X+\sqrt{t}Z) = E[d^Tw] = E[d_1^Tw_{qua} + d_2^Tw_{non}],$$
where $d_1 \in \mathbb{R}^{14}$, $d_2 \in \mathbb{R}$. Then, similarly to $F_8$, we can find the matrix $F_0$ such that $-2h^{(4)}(X+\sqrt{t}Z) = E[u^TF_0u]$.
To cancel the non-quadratic term $d_2^Tw_{non}$, we solve $z_2^TC_{12} = d_2^T$ (the solution $z_2$ should exist; otherwise, it is not possible to find a quadratic form and the LMI approach fails). Then, since $E[C_{11}w_{qua} + C_{12}w_{non}] = 0$, we have
$$-2h^{(4)}(X+\sqrt{t}Z) = E[d_1^Tw_{qua} + d_2^Tw_{non}] = E[d_1^Tw_{qua} + d_2^Tw_{non}] - E[z_2^T(C_{11}w_{qua} + C_{12}w_{non})] = E[(d_1^T - z_2^TC_{11})w_{qua}].$$
Now, $F_0$ can be constructed from $d_1^T - z_2^TC_{11}$.
The details are as follows. First, we need to express the derivative using the entries of w. This can be done recursively using the following lemma.
Lemma 4.
Let A be a linear combination of products of the $T_i$. The following equalities hold:
$$2\frac{\partial}{\partial t}T_n = \frac{\partial^n}{\partial y^n}(T_2 + T_1^2) = T_{n+2} + \sum_{k=0}^{n}\binom{n}{k}T_{k+1}T_{n-k+1}, \quad n \ge 0, \tag{26}$$
$$\frac{d}{dt}E[A] = E\left[\frac{1}{2}(T_2 + T_1^2)A + \frac{\partial}{\partial t}A\right], \tag{27}$$
$$\frac{d}{dt}E[T_n^2] = E\left[-T_{n+1}^2 + T_n\sum_{k=1}^{n-1}\binom{n}{k}T_{k+1}T_{n-k+1}\right], \quad n \ge 1. \tag{28}$$
The proof is left to Appendix A. Now, with Equation (7),
$$2h''(X+\sqrt{t}Z) = -E[T_2^2],$$
and Equation (28), one can easily obtain
$$2h^{(3)}(X+\sqrt{t}Z) = -\frac{d}{dt}E[T_2^2] = E[T_3^2 - 2T_2^3].$$
For the fourth derivative,
$$\begin{aligned} 2h^{(4)}(X+\sqrt{t}Z) &= \frac{d}{dt}E[T_3^2 - 2T_2^3] \\ &\overset{(a)}{=} E[-T_4^2 + T_3(3T_2T_3 + 3T_3T_2)] - \frac{d}{dt}E[2T_2^3] \\ &\overset{(b)}{=} E[-T_4^2 + T_3(3T_2T_3 + 3T_3T_2)] - E\left[(T_2 + T_1^2)T_2^3 + 3T_2^2\cdot 2\frac{\partial}{\partial t}T_2\right] \\ &\overset{(c)}{=} E[-T_4^2 + 6T_3^2T_2 - T_2^4 - T_2^3T_1^2 - 3T_2^2(T_4 + 2T_3T_1 + 2T_2^2)] \\ &= -E[T_4^2 + 3T_4T_2^2 - 6T_3^2T_2 + 6T_3T_2^2T_1 + 7T_2^4 + T_2^3T_1^2], \end{aligned} \tag{30}$$
where (a), (b) and (c) are due to Equations (28), (27) and (26), respectively.
Hence, writing $-2h^{(4)}(X+\sqrt{t}Z) = E[d^Tw]$, we have the vector $d = (d_1, d_2) \in \mathbb{R}^{15}$ with blocks $d_1 \in \mathbb{R}^{14}$, $d_2 \in \mathbb{R}$:
$$d([1, 3, 7, 10, 11]) = [1, 3, 6, 7, 1], \qquad d(15) = -6.$$
One solves $z_2^TC_{12} = d_2^T$ for $z_2$ and obtains
$$z_2^T = (0, -3).$$
Now, $d_1^T - z_2^TC_{11}$ has nonzero entries at locations [1, 3, 7, 10, 11], with values [1, 6, 9, 7, 1], respectively. Furthermore, $F_0$ (scaled by a factor of two) is found as
$$F_0 = \begin{pmatrix} 2&0&6&0&0\\ 0&0&9&0&0\\ 6&9&14&1&0\\ 0&0&1&0&0\\ 0&0&0&0&0 \end{pmatrix}.$$
By the end of this subsection, it is easy to see that Cheng and Geng's method can be reformulated as identifying whether there exist $x_1, \ldots, x_8 \in \mathbb{R}$ such that
$$F_0 + \sum_{i=1}^{8}x_iF_i \succeq 0.$$
We use the convex optimization package [19] to check the feasibility of the above LMI problem, and it turns out to be feasible, as it should be according to Equation (14).

3.4. Matrices G j from Log-Concavity

Recall that, in [9], there are no matrices $G_j$, since there are no inequality constraints. In this paper, we consider the log-concave case $T_2 \le 0$, which introduces inequality constraints.
For the fourth order, $T_2 \le 0$ actually implies that the following entries of w are nonpositive:
$$T_2^3T_1^2, \qquad T_2T_1^6, \qquad T_2T_3^2.$$
It is clear that in each of these terms the power of $T_2$ is odd and the other powers are even.
To transform these nonpositive terms into matrices $G_j$: the first two terms, $T_2^3T_1^2$ and $T_2T_1^6$, are trivial, since they can be expressed by u(i)u(j) directly:
$$0 \ge 2E[T_2^3T_1^2] = E[u^TG_1u], \qquad 0 \ge 2E[T_2T_1^6] = E[u^TG_2u],$$
where
$$G_1 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&1&0\\ 0&0&1&0&0\\ 0&0&0&0&0 \end{pmatrix}, \qquad G_2 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&1\\ 0&0&0&1&0 \end{pmatrix}.$$
For the term $T_2T_3^2$, the idea is similar to the third part of Section 3.2. One first finds $z_3 \in \mathbb{R}^2$ such that $z_3^TC_{12}w_{non} = T_2T_3^2$, namely $z_3^TC_{12} = 1$. One solution is $z_3^T = (0, 1/2)$. Then,
$$E[T_2T_3^2] = E[T_2T_3^2 - z_3^T(C_{11}w_{qua} + C_{12}w_{non})] = -E[z_3^TC_{11}w_{qua}] = -E\left[\frac{1}{2}T_4T_2^2 + \frac{1}{2}T_3T_2^2T_1\right].$$
Now, it is routine to obtain
$$0 \ge 4E[T_2T_3^2] = E[u^TG_3u],$$
where
$$G_3 = \begin{pmatrix} 0&0&-1&0&0\\ 0&0&-1&0&0\\ -1&-1&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{pmatrix}.$$
At this point, we are done with the procedure for calculating all of the matrices $F_0$, $F_i$ and $G_j$. To show the nonpositivity of the fourth derivative, it suffices to find variables $x_i \in \mathbb{R}$ and $y_j \ge 0$ such that
$$F_0 + \sum_{i=1}^{8}x_iF_i + \sum_{j=1}^{3}y_jG_j \succeq 0.$$
Remark 2.
The matrix $G_2$ is actually redundant, since we know from Equation (25) that $E[T_2T_1^6] = -\frac{1}{7}E[T_1^8] \le 0$, a constraint already encoded in the matrices $F_i$ (in particular, in the matrix $F_7$ of Section 3.2). Including $G_2$ does not affect the feasibility check.

3.5. Matrix F ˜ 0 for Gaussian Optimality

However, to show the optimality of the Gaussian, the above formulation is not enough. According to inequality (a) in Equation (11), it would suffice to show that
$$(-1)^{n-1}\times 2h^{(n)}(X+\sqrt{t}Z) - (n-1)!\times E[(-T_2)^n] \ge 0$$
instead of $(-1)^{n-1}\times 2h^{(n)}(X+\sqrt{t}Z) \ge 0$. Thus, one needs to calculate a matrix $\tilde F_0$ such that
$$(-1)^{n-1}\times 2h^{(n)}(X+\sqrt{t}Z) - (n-1)!\times E[(-T_2)^n] = E[u^T\tilde F_0u].$$
The procedure is the same as that in Section 3.3.
In particular, for the fourth derivative, since n = 4 is even, we directly have the quadratic form $E[(-T_2)^n] = E[T_2^4]$ with $T_2^4 = u(3)u(3)$. It is straightforward to construct the matrix $\tilde F_0$ (scaled by a factor of two) here:
$$\tilde F_0 = F_0 - \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&2\times 3!&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{pmatrix} = \begin{pmatrix} 2&0&6&0&0\\ 0&0&9&0&0\\ 6&9&2&1&0\\ 0&0&1&0&0\\ 0&0&0&0&0 \end{pmatrix},$$
such that $E[u^T\tilde F_0u] = -4h^{(4)}(X+\sqrt{t}Z) - 12E[T_2^4]$.
Now, the LMI is updated to $\tilde F_0 + \sum_{i=1}^{8}x_iF_i + \sum_{j=1}^{3}y_jG_j \succeq 0$. Again, we use the convex optimization package [19] to check the feasibility. It turns out to be feasible, and the solution helps us identify the coefficients in Equation (9).

3.6. Fifth Derivative

For the fifth derivative, we omit the details of the manipulations, since they are routine, and just provide the matrices here. For brevity, we only list the nonzero entries of the upper-triangular part of each symmetric matrix. These matrices (with scaling) are
F_0: F[(1,1),(1,3),(1,5),(2,3),(2,5),(3,3),(3,4),(3,5),(3,6),(5,5),(5,6)] = [2, 20, 29, 214/3, 49/2, 178/3, 37/3, 58, 6, 45, 1/2],
F_1: F[(3,6),(4,5)] = [1, 1],
F_2: F[(3,7),(4,6)] = [1, 1],
F_3: F[(5,7),(6,6)] = [1, 2],
F_4: F[(1,4),(2,2),(2,3),(2,4)] = [1, 2, 2, 1],
F_5: F[(1,6),(2,4),(2,5),(2,6)] = [1, 1, 3, 1],
F_6: F[(1,7),(2,6),(2,7)] = [1, 5, 1],
F_7: F[(2,4),(3,4),(4,4)] = [2, 3, 2],
F_8: F[(2,5),(3,4),(3,5),(3,6)] = [1, 2, 2, 1],
F_9: F[(2,6),(3,6),(3,7),(4,4)] = [1, 4, 1, 2],
F_10: F[(2,7),(3,7),(4,7)] = [1, 6, 1],
F_11: F[(3,6),(5,5),(5,6)] = [3, 6, 1],
F_12: F[(3,7),(5,6),(5,7)] = [2, 5, 1],
F_13: F[(4,7),(5,7),(6,7)] = [1, 7, 1],
F_14: F[(6,7),(7,7)] = [9, 2],
F_15: F[(1,2),(1,3),(2,2),(2,3),(3,3),(3,4)] = [6, 3, 6, 5, 2, 1],
F_16: F[(1,5),(2,3),(2,5),(3,3),(3,5)] = [1, 2, 1, 6, 1].
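For readers reproducing the computation, the sparse F[(i, j)] = value listings above expand into full symmetric matrices. The helper below is our illustration (indices 1-based, as in the text):

```python
def dense_sym(entries, n=7):
    """Expand an upper-triangular {(i, j): value} listing (1-indexed)
    into a full symmetric n x n matrix."""
    M = [[0.0] * n for _ in range(n)]
    for (i, j), v in entries.items():
        M[i - 1][j - 1] = v   # upper-triangular entry
        M[j - 1][i - 1] = v   # mirror to keep the matrix symmetric
    return M

# Example: F_1 above has nonzero entries (3,6) and (4,5), both equal to 1.
F1 = dense_sym({(3, 6): 1, (4, 5): 1})
```

Each listed matrix feeds into the 7 × 7 LMI below in exactly this dense form.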
For the sign of the fifth derivative, we used the convex optimization package [19] to solve the following LMI problem,
F_0 + ∑_{i=1}^{16} x_i F_i ⪰ 0,
but could not find a feasible solution x_1, …, x_16 ∈ R. This suggests that a direct generalization of Cheng and Geng's method may not work for the fifth derivative.
Instead, if we consider the log-concavity constraint T_2 ≤ 0 and check the optimality of Gaussian inputs, then we have a new matrix F̃_0 (similar to Section 3.5) and several matrices G_j, as follows:
F̃_0: F[(1,1),(1,3),(1,5),(2,3),(2,5),(3,3),(3,4),(3,5),(3,6),(5,5),(5,6)] = [2, 20, 29, 214/3, 49/2, 178/3, 37/3, 38, 6, 3, 1/2],
G_1: G[(3,4)] = [1],
G_2: G[(5,6)] = [1],
G_3: G[(6,7)] = [1],
G_4: G[(1,3),(2,3),(3,3),(3,4)] = [3, 5, 2, 1],
G_5: G[(3,5),(5,5)] = [2, 1].
Now, one would like to find x_1, …, x_16 ∈ R and y_1, …, y_5 ∈ R_+ such that
F̃_0 + ∑_{i=1}^{16} x_i F_i + ∑_{j=1}^{5} y_j G_j ⪰ 0.
This can be solved by the convex optimization package [19]. Again, the solution helps us to arrive at Equation (10).

4. Proof of Theorem 1

Proof. 
For the third derivative, according to Equation (29), we have
2h^{(3)}(X + √t Z) = E[T_3^2 − 2T_2^3].
For the fourth derivative, according to Equation (30):
−2h^{(4)}(X + √t Z) = E[T_4^2 + 3T_4T_2^2 − 6T_3^2T_2 + 6T_3T_2^2T_1 + 7T_2^4 + T_2^3T_1^2].
Adding the multiples 3 × (19) − 1 × (22) of the left-hand sides of those equations (each of which equals zero), we obtain
−2h^{(4)}(X + √t Z) = E[T_4^2 + 3T_4T_2^2 − 6T_3^2T_2 + 6T_3T_2^2T_1 + 7T_2^4 + T_2^3T_1^2]
 =(a) E[T_4^2 + (−6T_3^2T_2 − 3T_3T_2^2T_1) − 6T_3^2T_2 + 6T_3T_2^2T_1 + 7T_2^4 + T_2^3T_1^2]
 = E[T_4^2 − 12T_3^2T_2 + 3T_3T_2^2T_1 + 7T_2^4 + T_2^3T_1^2]
 =(b) E[T_4^2 − 12T_3^2T_2 + (−T_2^4 − T_2^3T_1^2) + 7T_2^4 + T_2^3T_1^2]
 = E[T_4^2 − 12T_3^2T_2 + 6T_2^4],
where (a) is due to Equation (19), and (b) is due to Equation (22).
For the fifth derivative,
2h^{(5)}(X + √t Z) = (d/dt) E[−T_4^2 − 6T_2^4 + 12T_3^2T_2] = (d/dt)E[−T_4^2] + (d/dt)E[−6T_2^4] + (d/dt)E[12T_3^2T_2].
For each term above on the right-hand side: according to Equation (28),
(d/dt)E[−T_4^2] = E[T_5^2 − 8T_4^2T_2 − 6T_4T_3^2].
For the second term,
(d/dt)E[−6T_2^4] =(c) E[−3(T_2 + T_1^2)T_2^4 − 12T_2^3 · (2∂_t T_2)] =(d) E[−3T_2^5 − 3T_2^4T_1^2 − 12T_2^3(T_4 + 2T_3T_1 + 2T_2^2)] = E[−3T_2^4T_1^2 − 12T_4T_2^3 − 24T_3T_2^3T_1 − 27T_2^5],
where (c) is due to Equation (27), and (d) is due to Equation (26). For the third term, according to Equation (27),
(d/dt)E[12T_3^2T_2] = E[6(T_2 + T_1^2)T_3^2T_2 + ∂_t(12T_3^2T_2)],
where the last term is
∂_t(12T_3^2T_2) = 12T_3^2 ∂_t T_2 + 24T_3T_2 ∂_t T_3 =(e) 6T_3^2(T_4 + 2T_3T_1 + 2T_2^2) + 12T_3T_2(T_5 + 2T_4T_1 + 6T_3T_2) = 12T_5T_3T_2 + 6T_4T_3^2 + 24T_4T_3T_2T_1 + 12T_3^3T_1 + 84T_3^2T_2^2,
where (e) is due to Equation (26). Hence,
(d/dt)E[12T_3^2T_2] = E[12T_5T_3T_2 + 6T_4T_3^2 + 24T_4T_3T_2T_1 + 6T_3^2T_2T_1^2 + 12T_3^3T_1 + 90T_3^2T_2^2].
Finally, combining Equations (31)–(33) (the terms ±6T_4T_3^2 cancel), we get
2h^{(5)}(X + √t Z) = (d/dt)E[−T_4^2] + (d/dt)E[−6T_2^4] + (d/dt)E[12T_3^2T_2]
 = E[T_5^2 − 8T_4^2T_2 − 3T_2^4T_1^2 − 12T_4T_2^3 − 24T_3T_2^3T_1 − 27T_2^5 + 12T_5T_3T_2 + 24T_4T_3T_2T_1 + 6T_3^2T_2T_1^2 + 12T_3^3T_1 + 90T_3^2T_2^2].
To simplify Equation (34), using Lemma 3, one first obtains the following equalities:
T_n = T_4, A = T_3T_2T_1:  E[2T_4T_3T_2T_1 + T_3^3T_1 + T_3^2T_2^2 + T_3^2T_2T_1^2] = 0,
T_n = T_3, A = T_2^3T_1:  E[4T_3T_2^3T_1 + T_2^5 + T_2^4T_1^2] = 0,
T_n = T_4, A = T_2^3:  E[T_4T_2^3 + 3T_3^2T_2^2 + T_3T_2^3T_1] = 0.
Then, adding multiples of the left-hand sides of Equations (35)–(37), we have
2h^{(5)}(X + √t Z) = 2h^{(5)}(X + √t Z) − 12 × (35) + 3 × (36) + 12 × (37)
 = E[T_5^2 − 24T_2^5 − 8T_4^2T_2 − 6T_3^2T_2T_1^2 + 12T_5T_3T_2 + 114T_3^2T_2^2].
 ☐
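The final expression admits a quick sanity check on Gaussian input, where everything is in closed form: for X ~ N(0, s0^2), the sum Y = X + √t Z has T_2 = −1/(s0^2 + t) and T_n = 0 for n ≥ 3, so only the T_2^5 term of the expectation survives, while h(X + √t Z) = (1/2) ln(2πe(s0^2 + t)) can be differentiated in t directly. The check below is ours, not part of the original proof:

```python
import math

s0sq, t = 1.3, 0.4
var = s0sq + t            # variance of X + sqrt(t) Z for Gaussian X
T2 = -1.0 / var           # T_2 for the Gaussian; T_3 = T_4 = T_5 = 0
rhs = -24 * T2 ** 5       # the only surviving term of the expectation
lhs = 24 * var ** -5      # 2 h^(5), from h(t) = (1/2) ln(2*pi*e*(s0sq + t))
assert math.isclose(lhs, rhs)
```

Both sides equal 24(s0^2 + t)^{−5}, consistent with equality holding for Gaussian inputs.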

5. Discussion

5.1. On the Derivatives

We are not able to say anything conclusive about the sign of the fifth derivative of the differential entropy h(X + √t Z). If we impose the log-concavity condition, namely T_2 ≤ 0, then the fifth derivative is at least 4! × E[(−T_2)^5]. This motivates us to consider the following problem: without additional constraints, for which values c_5 > 0 does
2h^{(5)}(X + √t Z) ≥ c_5 × E[(−T_2)^5]
hold? If one finds such a value c_5, then as long as E[(−T_2)^5] ≥ 0, the sign of the fifth derivative is determined. This condition is much weaker than T_2 ≤ 0.
For the computational part, one only needs to construct the matrix F̃_0 such that 2h^{(5)}(X + √t Z) − c_5 × E[(−T_2)^5] = E[u^T F̃_0 u], and then solve the problem (see Section 3.6 for the matrices F_i)
F̃_0 + ∑_{i=1}^{16} x_i F_i ⪰ 0.
It turns out that c_5 = 0.13 works, while c_5 = 0.125 fails. The authors conjecture that every c_5 ∈ [0.13, 24] works but, at the moment, can only partly confirm this with limited simulation.
Notice that the third derivative of the entropy power N(X + √t Z) was shown to be nonnegative under the log-concavity condition [5], and we recover this in Corollary 3. We also considered the fourth derivative, but failed to obtain its sign because we were unable to apply the Cauchy–Schwarz inequality as we did for the third derivative.

5.2. Possible Proofs

To prove Conjecture 1, besides the method proposed in [9], we are also considering the following approaches. The first one is constructive and inspired by Equation (1): given a random variable X, if we can construct a proper measure μ(·) such that Equation (1) holds, then Conjecture 1 is proved. However, this is difficult even when X is binary symmetric, one of the simplest random variables.
The second one is recursive. Suppose one can find a formula for the n-th derivative such that
(−1)^{n−1} × h^{(n)}(X + √t Z) = E[∑_{i=1}^{k_n} A_i^2],  (d/dt)E[A_i^2] = −E[B_i^2];
then it is clear that
(−1)^n × h^{(n+1)}(X + √t Z) = E[∑_{i=1}^{k_n} B_i^2] ≥ 0.
However, this fails for n = 2 (see Equation (7) and Theorem 1). Instead, one may expect that
(d/dt)E[A_i^2] = −E[B_i^2 − C_i^2 + B_{i+1}^2],
and then
(−1)^n × h^{(n+1)}(X + √t Z) = E[B_1^2 + B_{k_n+1}^2 − ∑_{i=1}^{k_n} C_i^2].
If further one can show that E[B_1^2 + B_{k_n+1}^2] = E[C_{k_n+1}^2] for some C_{k_n+1}, then one finishes the proof. Notice that a clever observation is needed for this approach to work.

5.3. Applications

The topic of Gaussian optimality has wide applications, for example in [20,21]. In this work, besides the Gaussian optimality, we also have some new observations. In [11], the derivatives in the signal-to-noise ratio (snr) of I(X; √snr X + Z) are studied. In particular, the first four derivatives are obtained in the language of the minimum mean-square error (Equations (69)–(72) in Corollary 1 of [11]). However, it is not clear whether these derivatives have definite signs.
With some standard manipulations, it is not difficult to show that
I(X; √snr X + Z) = h(√snr X + Z) − h(Z) = h(X + (1/√snr) Z) + log √snr − (1/2) log(2πe).
By letting t = 1/snr, one can easily connect the minimum mean-square error formulae in [11] with the signs of the derivatives of h(X + √t Z) in t. The verification of Conjectures 1 and 2 would imply bounding and extremal properties of Equations (69)–(72) in [11], and thus deepen our understanding of minimum mean-square error estimation in the additive Gaussian setting.
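For a concrete instance of this decomposition, take X Gaussian with variance s2, where both sides have closed forms: I(X; √snr X + Z) = (1/2) ln(1 + snr·s2), and the entropy of a Gaussian with variance v is (1/2) ln(2πe·v). The numerical check below is our illustration (natural logarithms throughout):

```python
import math

s2, snr = 2.0, 3.5
# Left side: mutual information for Gaussian X.
lhs = 0.5 * math.log(1 + snr * s2)
# Right side: h(X + Z/sqrt(snr)) + log sqrt(snr) - (1/2) log(2*pi*e).
rhs = (0.5 * math.log(2 * math.pi * math.e * (s2 + 1 / snr))
       + 0.5 * math.log(snr)
       - 0.5 * math.log(2 * math.pi * math.e))
assert math.isclose(lhs, rhs)
```

The 2πe terms cancel, leaving (1/2) ln(snr·s2 + 1) on both sides, as the identity predicts.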
In addition, notice that the probability density function f(y, t) of Y = X + √t Z is the solution of the heat equation ∂f/∂t = (1/2) ∂²f/∂y² with the initial condition f(y, 0) = f_X(y). Hence, Conjectures 1 and 2, if true, reveal properties of the differential entropy of functions that satisfy the heat equation. For more results related to diffusion equations, one may refer to [22].
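This heat-equation property is easy to verify numerically. Below is a small finite-difference check on the Gaussian case, where the density of X + √t Z is available in closed form (our illustration; the point (y, t) and step size h are arbitrary choices):

```python
import math

def f(y, t, s0sq=1.0):
    """Density of Y = X + sqrt(t) Z for Gaussian X ~ N(0, s0sq)."""
    v = s0sq + t
    return math.exp(-y * y / (2 * v)) / math.sqrt(2 * math.pi * v)

# Central finite differences for f_t and (1/2) f_yy at a sample point.
y, t, h = 0.3, 0.5, 1e-4
ft = (f(y, t + h) - f(y, t - h)) / (2 * h)
fyy = (f(y + h, t) - 2 * f(y, t) + f(y - h, t)) / (h * h)
assert abs(ft - 0.5 * fyy) < 1e-6
```

The residual is at the level of the discretization error, confirming ∂f/∂t = (1/2) ∂²f/∂y² along the flow.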

6. Conclusions

In this paper, we studied two conjectures on the derivatives of the differential entropy of a general random variable with added Gaussian noise. Regarding the conjecture on the signs of the derivatives made by Cheng and Geng, we introduced the linear matrix inequality approach to provide evidence that their original method might not generalize to orders higher than four. Instead, we considered imposing an additional constraint, namely the log-concavity assumption, and showed the optimality of Gaussian random variables for orders three, four and five. Thus, we made progress on McKean’s conjecture, under a mild condition.

Acknowledgments

The authors would like to thank Professor Chandra Nair for his valuable suggestions. The work of Venkat Anantharam was supported by the National Science Foundation (NSF) grants ECCS-1343398, CNS-1527846, CCF-1618145, the NSF Science and Technology Center grant CCF-0939370 (Science of Information), and the William and Flora Hewlett Foundation supported Center for Long Term Cybersecurity at Berkeley. The work of Yanlin Geng was supported in part by the National Natural Science Foundation of China under Grant 61601288, and the Science and Technology Commission of Shanghai Municipality under Grant 15YF1407900.

Author Contributions

Venkat Anantharam proposed the linear matrix inequality approach. Xiaobing Zhang performed the experiments to find the coefficients in the theorem and wrote the paper. Yanlin Geng proved the main results. Venkat Anantharam and Yanlin Geng reviewed and edited the manuscript. All authors read and approved the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 4

Proof. 
For Equation (26), according to Lemma 1, f(y, t) satisfies the following (heat) equation:
f_t := ∂f/∂t = (1/2) f_2,
where f_n denotes ∂^n f/∂y^n. In addition, according to Equation (5),
T_1 = f_1/f,  T_2 = f_2/f − f_1^2/f^2.
Hence,
2∂_t T_0 = 2∂_t ln f(y, t) = 2 f_t/f = f_2/f = T_2 + T_1^2.
Now, it follows that, for n ≥ 0,
2∂_t T_n = 2∂_t (∂^n/∂y^n) T_0 = (∂^n/∂y^n)(2∂_t T_0) = (∂^n/∂y^n)(T_2 + T_1^2) = T_{n+2} + ∑_{k=0}^{n} C_n^k T_{k+1} T_{n−k+1}.
For Equation (27),
(d/dt)E[A] = (d/dt) ∫ f A dy = ∫ (f_t A + f ∂_t A) dy = ∫ (f · (1/2)(f_2/f) A + f ∂_t A) dy = E[(1/2)(T_2 + T_1^2) A + ∂_t A].
For Equation (28), the derivative is
(d/dt)E[T_n^2] =(a) ∫ [f · (1/2)(T_2 + T_1^2) T_n^2 + f ∂_t(T_n^2)] dy = ∫ [f · (1/2)(T_2 + T_1^2) T_n^2 + f T_n × 2∂_t T_n] dy,
where (a) is due to Equation (27).
For the first term of the right-hand side, from Lemma 1 and integration by parts,
∫ f T_{n+1} T_n T_1 dy = ∫ f T_n T_1 dT_n = 0 − ∫ T_n (f_1 T_n T_1 + f T_{n+1} T_1 + f T_n T_2) dy,
hence
∫ f T_{n+1} T_n T_1 dy = −(1/2) ∫ f T_n^2 (T_2 + T_1^2) dy.
For the second term, we have
∫ f T_n × 2∂_t T_n dy =(b) ∫ f T_n (T_{n+2} + ∑_{k=0}^{n} C_n^k T_{k+1} T_{n−k+1}) dy
 = ∫ f T_n (2T_{n+1}T_1 + ∑_{k=1}^{n−1} C_n^k T_{k+1} T_{n−k+1}) dy + ∫ f T_n dT_{n+1}
 = ∫ f T_n (2T_{n+1}T_1 + ∑_{k=1}^{n−1} C_n^k T_{k+1} T_{n−k+1}) dy − ∫ T_{n+1} (f T_1 T_n + f T_{n+1}) dy
 = ∫ f (−T_{n+1}^2 + T_{n+1} T_n T_1 + T_n ∑_{k=1}^{n−1} C_n^k T_{k+1} T_{n−k+1}) dy,
where (b) is due to Equation (26).
Combining these two terms together, the third equality is proved. ☐
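Like the heat equation itself, the recursion in Equation (26) can be spot-checked numerically. For n = 1 it reads 2∂_t T_1 = T_3 + 2T_1T_2, and on the Gaussian heat solution (X ~ N(0, 1)) one has T_1 = −y/(1 + t), T_2 = −1/(1 + t), and T_3 = 0. The finite-difference check below is our illustration, with an arbitrary sample point:

```python
# Finite-difference check of 2 d/dt T_1 = T_3 + 2 T_1 T_2 on the
# Gaussian heat solution, where T_1 = -y/(1+t), T_2 = -1/(1+t), T_3 = 0.
def T1(y, t):
    return -y / (1.0 + t)

y, t, h = 0.7, 0.2, 1e-5
lhs = 2 * (T1(y, t + h) - T1(y, t - h)) / (2 * h)   # 2 d/dt T_1, central difference
rhs = 0.0 + 2 * T1(y, t) * (-1.0 / (1.0 + t))       # T_3 + 2 T_1 T_2
assert abs(lhs - rhs) < 1e-6
```

Both sides equal 2y/(1 + t)^2, in agreement with the general formula.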

References

  1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  2. Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control 1959, 2, 101–112. [Google Scholar] [CrossRef]
  3. Zhang, X.; Anantharam, V.; Geng, Y. Gaussian Extremality for Derivatives of Differential Entropy under the Additive Gaussian Noise Flow. IEEE Int. Symp. Inf. Theory 2018. submitted. [Google Scholar]
  4. Costa, M. A new entropy power inequality. IEEE Trans. Inf. Theory 1985, 31, 751–760. [Google Scholar] [CrossRef]
  5. Toscani, G. A concavity property for the reciprocal of Fisher information and its consequences on Costa’s EPI. Phys. A Stat. Mech. Appl. 2015, 432, 35–42. [Google Scholar] [CrossRef]
  6. Villani, C. A short proof of the “concavity of entropy power”. IEEE Trans. Inf. Theory 2000, 46, 1695–1696. [Google Scholar] [CrossRef]
  7. McKean, H.P. Speed of approach to equilibrium for Kac’s caricature of a Maxwellian gas. Arch. Ration. Mech. Anal. 1966, 21, 343–367. [Google Scholar] [CrossRef]
  8. Toscani, G. Entropy production and the rate of convergence to equilibrium for the Fokker-Planck equation. Q. Appl. Math. 1999, 57, 521–541. [Google Scholar] [CrossRef]
  9. Cheng, F.; Geng, Y. Higher order derivatives in Costa’s entropy power inequality. IEEE Trans. Inf. Theory 2015, 61, 5892–5905. [Google Scholar] [CrossRef]
  10. Bernstein, S. Sur les fonctions absolument monotones. Acta Math. 1929, 52, 1–66. [Google Scholar] [CrossRef]
  11. Guo, D.; Wu, Y.; Shitz, S.S.; Verdú, S. Estimation in Gaussian noise: Properties of the minimum mean-square error. IEEE Trans. Inf. Theory 2011, 57, 2371–2385. [Google Scholar]
  12. Wibisono, A.; Jog, V. Convexity of mutual information along the heat flow. arXiv, 2018; arXiv:1801.06968. [Google Scholar]
  13. Wang, L.; Madiman, M. Beyond the entropy power inequality, via rearrangements. IEEE Trans. Inf. Theory 2014, 60, 5116–5137. [Google Scholar] [CrossRef]
  14. Courtade, T.A. Concavity of entropy power: Equivalent formulations and generalizations. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 56–60. [Google Scholar]
  15. König, R.; Smith, G. The entropy power inequality for quantum systems. IEEE Trans. Inf. Theory 2014, 60, 1536–1548. [Google Scholar] [CrossRef]
  16. Rioul, O. Information theoretic proofs of entropy power inequalities. IEEE Trans. Inf. Theory 2011, 57, 33–55. [Google Scholar] [CrossRef]
  17. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 2006. [Google Scholar]
  18. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  19. Grant, M.; Boyd, S.; Ye, Y. CVX: Matlab Software for Disciplined Convex Programming. 2008. Available online: http://cvxr.com/cvx/ (accessed on 5 March 2018).
  20. Weingarten, H.; Steinberg, Y.; Shamai, S.S. The capacity region of the Gaussian multiple-input multiple-output broadcast channel. IEEE Trans. Inf. Theory 2006, 52, 3936–3964. [Google Scholar] [CrossRef]
  21. Geng, Y.; Nair, C. The capacity region of the two-receiver Gaussian vector broadcast channel with private and common messages. IEEE Trans. Inf. Theory 2014, 60, 2087–2104. [Google Scholar] [CrossRef]
  22. Toscani, G. Diffusion Equations and Entropy Inequalities. preprint 2016. Available online: http://mate.unipv.it/toscani/publi/Note-Ravello-2016.pdf (accessed on 5 March 2018).
