Article

Gaussian Optimality for Derivatives of Differential Entropy Using Linear Matrix Inequalities †

1 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
2 School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
4 Department of Electrical Engineering and Computer Sciences (EECS), University of California, Berkeley, CA 94720-1234, USA
5 Shanghai Institute of Fog Computing Technology, ShanghaiTech University, Shanghai 201210, China
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper submitted to 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018.
Entropy 2018, 20(3), 182; https://doi.org/10.3390/e20030182
Submission received: 23 January 2018 / Revised: 22 February 2018 / Accepted: 5 March 2018 / Published: 9 March 2018
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract:
Let Z be a standard Gaussian random variable, X be independent of Z, and t be a strictly positive scalar. For the derivatives in t of the differential entropy of $X+\sqrt{t}Z$, McKean noticed that Gaussian X achieves the extremum for the first and second derivatives, among distributions with a fixed variance, and he conjectured that this holds for derivatives of all orders. This conjecture implies that the signs of the derivatives alternate. Recently, Cheng and Geng proved that this alternation holds for the first four orders. In this work, we employ the technique of linear matrix inequalities to show that: firstly, Cheng and Geng's method may not generalize to higher orders; secondly, when the probability density function of $X+\sqrt{t}Z$ is log-concave, McKean's conjecture holds for orders up to at least five. As a corollary, we also recover Toscani's result on the sign of the third derivative of the entropy power of $X+\sqrt{t}Z$, using a much simpler argument.

1. Introduction

For a general continuous random variable X with probability density function f(x), its differential entropy is defined as
$$h(X) = -\int_{-\infty}^{+\infty} f(x)\ln f(x)\,dx,$$
given that the above integral exists. In [1], Shannon considered the entropy power $N(X) = \frac{1}{2\pi e}e^{2h(X)}$ and introduced the celebrated Entropy Power Inequality (EPI):
$$e^{2h(X+Y)} \ge e^{2h(X)} + e^{2h(Y)},$$
where X and Y are independent, and equality holds if and only if X and Y are Gaussian. This inequality is nontrivial, and it was rigorously proved later by Stam [2].
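As a quick numerical sanity check of the EPI (our illustration, not part of the paper), the inequality is tight for independent Gaussians, whose variances add under convolution, and strict for two independent uniform variables, whose sum is triangular. The specific distributions and tolerances below are our own choices:

```python
import numpy as np

def h_gauss(var):
    # differential entropy of a Gaussian with variance `var`
    return 0.5 * np.log(2 * np.pi * np.e * var)

# Gaussian case: X+Y is Gaussian with the variances added, so the EPI is tight.
vx, vy = 1.5, 2.5
gap_gauss = np.exp(2 * h_gauss(vx + vy)) - (np.exp(2 * h_gauss(vx)) + np.exp(2 * h_gauss(vy)))

# Uniform case on [0, a]: h = ln a, and X+Y is triangular on [0, 2a];
# its entropy is computed by quadrature (analytically it is ln a + 1/2).
a = 2.0
x = np.linspace(1e-9, 2 * a, 400001)
dx = x[1] - x[0]
f_tri = np.where(x <= a, x / a**2, (2 * a - x) / a**2)
h_tri = float(np.sum(-f_tri * np.log(np.maximum(f_tri, 1e-300))) * dx)

gap_unif = np.exp(2 * h_tri) - 2 * np.exp(2 * np.log(a))  # strictly positive
```

Here `gap_gauss` vanishes (equality case), while `gap_unif` is strictly positive, matching the equality condition stated above.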
Remark 1.
This is the full version of a conference paper submitted to ISIT 2018 [3].
There have been numerous generalizations of the EPI. In [4], Costa considered the case where X is perturbed by an independent standard Gaussian Z, and showed that $N(X+\sqrt{t}Z)$ is concave in t for t > 0:
$$\frac{d^2}{dt^2}N(X+\sqrt{t}Z) \le 0, \quad t > 0.$$
Toscani [5] further showed that $\frac{d^3}{dt^3}N(X+\sqrt{t}Z) \ge 0$, under the condition that the probability density function of $X+\sqrt{t}Z$ is log-concave.
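Costa's concavity can be observed numerically (our illustration, not from the paper). For a Gaussian-mixture X, the density of $X+\sqrt{t}Z$ is itself a Gaussian mixture in closed form, so the entropy power can be evaluated by quadrature; the mixture parameters and grids below are our own choices:

```python
import numpy as np

def h_mixture(t, y):
    """Differential entropy of X + sqrt(t) Z for X ~ 0.5 N(-3, 0.5) + 0.5 N(3, 0.5)."""
    var = 0.5 + t   # each mixture component picks up variance t from sqrt(t) Z
    f = 0.5 * (np.exp(-(y + 3.0)**2 / (2 * var)) + np.exp(-(y - 3.0)**2 / (2 * var))) \
        / np.sqrt(2 * np.pi * var)
    g = -f * np.log(np.maximum(f, 1e-300))
    return float(np.sum(g) * (y[1] - y[0]))

y = np.linspace(-30, 30, 120001)
ts = np.arange(0.5, 3.01, 0.25)
N = np.array([np.exp(2 * h_mixture(t, y)) / (2 * np.pi * np.e) for t in ts])

# Discrete second differences of N(t); concavity means these are all <= 0.
second_diff = N[:-2] - 2 * N[1:-1] + N[2:]
```

Since second differences of a concave function are nonpositive for any step size, the sign check is robust to the coarse time grid.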
Later, Villani [6] simplified Costa's proof by directly studying the second derivative in t of the differential entropy instead of the second derivative of the entropy power. In the proof, it was noticed [6,7,8] that the signs of the first two derivatives of $h(X+\sqrt{t}Z)$ alternate. Along this line, Cheng and Geng [9] showed that this alternation holds for the first four derivatives, and they conjectured that it is true for derivatives of all orders.
Conjecture 1
([9]). The derivatives of the differential entropy $h(X+\sqrt{t}Z)$ satisfy $(-1)^{n-1}\times\frac{d^n}{dt^n}h(X+\sqrt{t}Z) \ge 0$ for t > 0 and n ≥ 1.
According to Equation (3) in Lemma 2 and the comments there, $2\frac{d}{dt}h(X+\sqrt{t}Z)$ is the Fisher information $J(X+\sqrt{t}Z)$. The above conjecture is equivalent to hypothesizing that the Fisher information of $X+\sqrt{t}Z$ is completely monotone in t, thus admitting a very simple characterization via the Laplace transform [10]: there exists a finite Borel measure μ(·) such that
$$J(X+\sqrt{t}Z) = \int_{0}^{+\infty} e^{-\lambda t}\,\mu(d\lambda).$$
Back in 1966, McKean [7] also studied the derivatives in t of $h(X+\sqrt{t}Z)$, and noticed that Gaussian X achieves the minimum of $(-1)^{n-1}\times\frac{d^n}{dt^n}h(X+\sqrt{t}Z)$ for n = 1, 2, subject to Var(X) = σ². McKean then implicitly made the following conjecture that this Gaussian optimality holds generally:
Conjecture 2
([7]). Subject to Var(X) = σ², Gaussian X with variance σ² achieves the minimum of $(-1)^{n-1}\times\frac{d^n}{dt^n}h(X+\sqrt{t}Z)$ for t > 0 and n ≥ 1.
Notice that, if $X_G$ is Gaussian with variance σ², then by routine calculation,
$$2h(X_G+\sqrt{t}Z) = \ln 2\pi e(\sigma^2+t), \qquad (-1)^{n-1}\times 2\frac{d^n}{dt^n}h(X_G+\sqrt{t}Z) = (n-1)!\times(\sigma^2+t)^{-n} > 0. \tag{2}$$
Hence, McKean’s conjecture implies the one by Cheng and Geng.
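The Gaussian closed form above is easy to verify numerically (our check, not from the paper): with $h(t) = \frac{1}{2}\ln 2\pi e(\sigma^2+t)$, finite differences should reproduce $2h' = (\sigma^2+t)^{-1}$ and $-2h'' = (\sigma^2+t)^{-2}$, i.e., the n = 1, 2 cases of the formula:

```python
import math

s, t, d = 2.0, 1.0, 1e-4   # variance of X, evaluation point, step size (our choices)

def h(tt):
    # differential entropy of Gaussian X + sqrt(tt) Z, variance s + tt
    return 0.5 * math.log(2 * math.pi * math.e * (s + tt))

h1 = (h(t + d) - h(t - d)) / (2 * d)           # central first difference
h2 = (h(t + d) - 2 * h(t) + h(t - d)) / d**2   # central second difference

err1 = abs(2 * h1 - 1.0 / (s + t))             # n = 1: 0! * (s+t)^(-1)
err2 = abs(-2 * h2 - 1.0 / (s + t)**2)         # n = 2: 1! * (s+t)^(-2)
```

Both errors are at the level of the finite-difference truncation, far below the quantities themselves.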
Compared with the progress made by Cheng and Geng [9] on Conjecture 1, there has been little progress on Conjecture 2. Most existing results concern the second derivative of the differential entropy (or of the mutual information), or generalize the EPI to other settings. For example, Guo et al. [11] represent the derivative in the signal-to-noise ratio of the mutual information in terms of the minimum mean-square error, building on de Bruijn's identity [2]; Wibisono and Jog [12] study the mutual information along the density flow defined by the heat equation and show that it is a convex function of time if the initial distribution is log-concave; Wang and Madiman [13] recover the proof of the EPI via rearrangements; Courtade [14] generalizes Costa's EPI to non-Gaussian additive perturbations; and König and Smith [15] propose a quantum version of the EPI.
In this paper, we work on Conjecture 2. Our main contributions are to show that Conjecture 2 holds for orders up to at least five under a log-concavity condition, and to introduce the technique of linear matrix inequalities to this family of problems.
The paper is organized as follows. In Section 2, we obtain the formulae for the derivatives of the differential entropy $h(X+\sqrt{t}Z)$ (Theorem 1) and show that McKean's conjecture holds for orders up to at least five under the log-concavity condition (Corollary 1). As a corollary, we recover Toscani's result [5] on the third derivative of the entropy power via the Cauchy–Schwarz inequality, with a much simpler argument. In Section 3, we introduce the linear matrix inequality approach and transform the above two conjectures into feasibility checks of semidefinite programming problems. With this approach, we can easily obtain the coefficients in Theorem 1. We then show that a direct generalization of the method of Cheng and Geng might not work for orders higher than four when proving Conjecture 1. In Section 4, we prove the main theorem of Section 2.

2. Main Results

We first introduce the notation used throughout this paper. For a single-variate function, we write $\frac{d}{dy}$ for its derivative; in the multi-variate case, we write $\frac{\partial}{\partial y}$ for the partial derivative. To simplify the notation, for the derivatives of a general single-variate function g(y), we also use g′(y), g″(y) and g‴(y) to represent the first, second and third derivatives, respectively, and $g^{(n)}(y)$ denotes the n-th derivative for n ≥ 1.
In the rest of the paper, let Z be a standard Gaussian random variable, and let X be independent of Z. Denote
$$Y_t := X + \sqrt{t}\,Z, \quad t > 0.$$
According to [4,16], $Y_t$ has nice properties: the probability density function f(y,t) of $Y_t$ exists, is strictly positive and is infinitely differentiable; the differential entropy $h(Y_t)$ exists. Denote
$$f_n := \frac{\partial^n}{\partial y^n}f(y,t), \qquad T_n := \frac{\partial^n}{\partial y^n}\ln f(y,t), \qquad n = 0, 1, 2, \ldots,$$
where it is understood that f n and T n are functions of ( y , t ) . We also present some properties of f ( y , t ) in the following lemma. The proof can be found in, say, [2,16] and Propositions 1 and 2 in [9].
Lemma 1.
For t > 0 , the probability density function f ( y , t ) satisfies the following properties:
(1) 
The heat equation holds: $\frac{\partial}{\partial t}f = \frac{1}{2}\frac{\partial^2}{\partial y^2}f$.
(2) 
$\lim_{|y|\to\infty} f_n = 0$, for all $n \ge 0$ and $t > 0$.
(3) 
The expectation of any product of the $T_i$, $E[\prod_i T_i]$, exists, and $\lim_{|y|\to\infty} f\prod_i T_i = 0$ for $t > 0$.
In Lemma 1, part (3), in writing $E[\prod_i T_i]$, we think of each $T_i$ as a function of $(Y_t, t)$.
Notice that, given X and Z, the differential entropy $h(X+\sqrt{t}Z)$ is a function of t. The formulae for its first and second derivatives are presented in the following lemma. According to Stam [2], the first equality is due to de Bruijn, and its right-hand side is exactly the Fisher information (page 671 of [17]); the second is due to McKean [7], Toscani [8] and Villani [6]; the Gaussian optimality is due to McKean [7].
Lemma 2.
For the first and second derivatives of the differential entropy $h(X+\sqrt{t}Z)$, the following expressions hold for t > 0:
$$2h'(X+\sqrt{t}Z) = E\left[\frac{f_1^2}{f^2}\right], \tag{3}$$
$$2h''(X+\sqrt{t}Z) = -E\left[\left(\frac{f_2}{f} - \frac{f_1^2}{f^2}\right)^2\right]. \tag{4}$$
Subject to Var(X) = σ², Gaussian X with variance σ² minimizes $h'(X+\sqrt{t}Z)$ and $-h''(X+\sqrt{t}Z)$.
By standard manipulations, one has
$$T_1 = \frac{f_1}{f}, \qquad T_2 = \frac{f_2}{f} - \frac{f_1^2}{f^2}. \tag{5}$$
Thus, it is straightforward to rewrite the derivatives as
$$2h'(X+\sqrt{t}Z) = E[T_1^2], \tag{6}$$
$$2h''(X+\sqrt{t}Z) = -E[T_2^2]. \tag{7}$$
For the third and fourth derivatives, one can refer to Theorems 1 and 2 in [9], where they were represented by the f i . Notice that these representations are not unique, and the ones in [9] are sufficient for identifying the signs. Instead, in Theorem 1, we use the T i , and this will facilitate our proof of the Gaussian optimality in Corollary 1.
Theorem 1.
For t > 0, the derivatives of the differential entropy $h(X+\sqrt{t}Z)$ can be expressed as:
$$2h^{(3)}(X+\sqrt{t}Z) = E[T_3^2 - 2T_2^3], \tag{8}$$
$$2h^{(4)}(X+\sqrt{t}Z) = -E[T_4^2 + 6T_2^4 - 12T_3^2T_2], \tag{9}$$
$$2h^{(5)}(X+\sqrt{t}Z) = E[T_5^2 - 24T_2^5 - 8T_4^2T_2 - 6T_3^2T_2T_1^2 + 12T_5T_3T_2 + 114T_3^2T_2^2]. \tag{10}$$
The proof of this theorem is deferred to Section 4. The existence of such expressions, and how to obtain the coefficients, is the subject of Section 3, where the method of linear matrix inequalities is introduced.

Log-Concave Case

Lemma 2 already ensures the optimality of Gaussians for the first and second derivatives, subject to Var(X) = σ². For the higher orders, we do not know whether the optimality can be shown from the expressions in Theorem 1 alone. Here, we impose the constraint of log-concavity on f(y,t) and summarize the results in Corollaries 1–3.
A nonnegative function f(·) is logarithmically concave (or log-concave for short) if its domain is convex and it satisfies the inequality
$$f(\theta x + (1-\theta)y) \ge f(x)^{\theta}f(y)^{1-\theta}$$
for all x, y in the domain and 0 < θ < 1. If f is strictly positive, this is equivalent to saying that the logarithm of the function is concave (Section 2.5 of [18]). In our case, assuming that f(y,t) is log-concave in y is equivalent to $T_2 \le 0$.
Examples of log-concave distributions include the Gaussian, the exponential, the Laplace, and the Gamma distribution with shape parameter larger than one. Notice that, if the probability density function of X is log-concave, then so is that of $X+\sqrt{t}Z$ (Section 3.5.2 of [18]).
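The defining inequality can be spot-checked numerically for two of the examples above, the Gaussian and the Laplace densities (our illustration; the random points and tolerance are our choices):

```python
import math
import random

def gauss(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def laplace(x):
    # standard Laplace density
    return 0.5 * math.exp(-abs(x))

random.seed(0)
ok = True
for f in (gauss, laplace):
    for _ in range(1000):
        x, y = random.uniform(-5, 5), random.uniform(-5, 5)
        th = random.uniform(0.001, 0.999)
        lhs = f(th * x + (1 - th) * y)              # f(theta*x + (1-theta)*y)
        rhs = f(x) ** th * f(y) ** (1 - th)         # f(x)^theta * f(y)^(1-theta)
        ok = ok and (lhs >= rhs - 1e-12)
```

Every sampled triple satisfies the inequality, as log-concavity requires.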
Corollary 1.
If the probability density function of $X+\sqrt{t}Z$ is log-concave, then, subject to Var(X) = σ², Gaussian X with variance σ² achieves the minimum of $(-1)^{n-1}h^{(n)}(X+\sqrt{t}Z)$ for t > 0 and 3 ≤ n ≤ 5.
Proof. 
Let $X_G$ be Gaussian with mean μ and variance σ². The probability density function of $Y_G := X_G + \sqrt{t}Z$ is
$$f(y_G, t) = \frac{1}{\sqrt{2\pi(\sigma^2+t)}}\exp\left\{-\frac{(y_G-\mu)^2}{2(\sigma^2+t)}\right\}.$$
The key observation is that the second derivative of the logarithm in the Gaussian case is constant:
$$T_{2,G} := \frac{\partial^2}{\partial y_G^2}\ln f(y_G,t) = -(\sigma^2+t)^{-1}.$$
Hence, from Equation (2), the derivatives of the differential entropy in the Gaussian case are
$$(-1)^{n-1}\times 2h^{(n)}(X_G+\sqrt{t}Z) = (n-1)!\times(\sigma^2+t)^{-n} = (n-1)!\times E[(-T_{2,G})^n].$$
Now, if one can show the following chain of inequalities:
$$(-1)^{n-1}\times 2h^{(n)}(X+\sqrt{t}Z) \overset{(a)}{\ge} (n-1)!\times E[(-T_2)^n] \overset{(b)}{\ge} (n-1)!\times \left(E[-T_2]\right)^n \overset{(c)}{\ge} (n-1)!\times E[(-T_{2,G})^n], \tag{11}$$
then one is done.
For inequality (b), the log-concavity condition, namely $T_2 \le 0$, suffices: $-T_2$ is nonnegative, so Jensen's inequality applied to the convex function $x \mapsto x^n$ on $[0,\infty)$ gives $E[(-T_2)^n] \ge (E[-T_2])^n$.
For inequality (c), it suffices to show that $E[-T_2] \ge E[-T_{2,G}] > 0$. This can be proved using Lemma 2: notice that
$$E\left[\frac{f_2}{f}\right] = \int_{-\infty}^{+\infty} f_2(y,t)\,dy = \int_{-\infty}^{+\infty} df_1(y,t) = f_1(y,t)\Big|_{-\infty}^{+\infty} = 0,$$
where the last equality is due to Lemma 1.
Now, from Equation (5),
$$E[T_2] = E\left[\frac{f_2}{f} - \frac{f_1^2}{f^2}\right] = -E\left[\frac{f_1^2}{f^2}\right]. \tag{12}$$
Combining this with Lemma 2, one has
$$E[T_2] = -2h'(X+\sqrt{t}Z) \le -2h'(X_G+\sqrt{t}Z) = E[T_{2,G}].$$
This part is finished by noticing that $-E[T_{2,G}] = (\sigma^2+t)^{-1} > 0$ from Equation (2).
For inequality (a), we show each case of n using Theorem 1 and the condition $T_2 \le 0$. For n = 3,
$$2h^{(3)}(X+\sqrt{t}Z) = E[T_3^2 - 2T_2^3] \ge E[-2T_2^3] = (n-1)!\times E[(-T_2)^n], \quad n = 3.$$
For n = 4,
$$-2h^{(4)}(X+\sqrt{t}Z) = E[T_4^2 + 6T_2^4 - 12T_3^2T_2] \ge E[6T_2^4] = (n-1)!\times E[(-T_2)^n], \quad n = 4,$$
where the inequality is due to $T_2 \le 0$, thus $E[-12T_3^2T_2] \ge 0$. For n = 5,
$$\begin{aligned} 2h^{(5)}(X+\sqrt{t}Z) &= E[T_5^2 - 24T_2^5 - 8T_4^2T_2 - 6T_3^2T_2T_1^2 + 12T_5T_3T_2 + 114T_3^2T_2^2] \\ &= E[(T_5 + 6T_3T_2)^2 - 24T_2^5 - 8T_4^2T_2 - 6T_3^2T_2T_1^2 + 78T_3^2T_2^2] \\ &\ge E[-24T_2^5] = (n-1)!\times E[(-T_2)^n], \quad n = 5. \end{aligned}$$
Now, the proof is finished. ☐
The following corollary settles the fifth-order case of the conjecture in [9] under the log-concavity assumption. The proof follows directly from Corollary 1 and Equation (2).
Corollary 2.
If the probability density function of $X+\sqrt{t}Z$ is log-concave, then the fifth derivative of the differential entropy is strictly positive: $h^{(5)}(X+\sqrt{t}Z) > 0$ for t > 0.
Regarding the entropy power, it is already known that $N'(X+\sqrt{t}Z) \ge 0$, from the connection with the Fisher information, and that $N''(X+\sqrt{t}Z) \le 0$, according to [4]. For the third derivative, Toscani [5] showed that $N^{(3)}(X+\sqrt{t}Z) \ge 0$ under the log-concavity assumption. Here, we simplify Toscani's proof, using a Cauchy–Schwarz argument.
Corollary 3.
If the probability density function of $X+\sqrt{t}Z$ is log-concave, then the third derivative of the entropy power is nonnegative: $N^{(3)}(X+\sqrt{t}Z) \ge 0$ for t > 0.
Proof. 
For brevity, let $h' := h'(X+\sqrt{t}Z)$, and similarly we omit the arguments for the higher orders. Routine manipulations yield
$$N^{(3)}(X+\sqrt{t}Z) = \frac{d^3}{dt^3}\,\frac{1}{2\pi e}e^{2h(X+\sqrt{t}Z)} = \frac{1}{2\pi e}e^{2h(X+\sqrt{t}Z)}\left((2h')^3 + 3\times 2h'\times 2h'' + 2h'''\right).$$
Thus, it suffices to show $(2h')^3 + 3\times 2h'\times 2h'' + 2h''' \ge 0$. First, we express h′, h″ and h‴ in terms of the $T_i$: according to Lemma 2 and Equation (12), $2h' = -E[T_2]$; from Equation (7), $2h'' = -E[T_2^2]$; from Equation (8), $2h''' = E[T_3^2 - 2T_2^3]$.
Also notice that, from Lemma 2, $2h'(X+\sqrt{t}Z) \ge 2h'(X_G+\sqrt{t}Z) = (\sigma_X^2+t)^{-1} > 0$. Hence, $E[-T_2] > 0$ (it cannot be zero). Now, under the log-concavity condition, namely $T_2 \le 0$, the Cauchy–Schwarz inequality for random variables gives
$$E[-T_2]\,E[(-T_2)^3] = E\left[\left(\sqrt{-T_2}\right)^2\right]E\left[\left((-T_2)^{3/2}\right)^2\right] \ge \left(E\left[\sqrt{-T_2}\,(-T_2)^{3/2}\right]\right)^2 = \left(E[T_2^2]\right)^2.$$
Thus, we have
$$\begin{aligned} (2h')^3 + 3\times 2h'\times 2h'' + 2h''' &= (E[-T_2])^3 - 3\,E[-T_2]\,E[T_2^2] + E[T_3^2 - 2T_2^3] \\ &\ge (E[-T_2])^3 - 3\,E[-T_2]\,E[T_2^2] + 2E[(-T_2)^3] \\ &= (E[-T_2])^{-1}\left((E[-T_2])^4 - 3(E[-T_2])^2E[T_2^2] + 2\,E[-T_2]\,E[(-T_2)^3]\right) \\ &\ge (E[-T_2])^{-1}\left((E[-T_2])^4 - 3(E[-T_2])^2E[T_2^2] + 2(E[T_2^2])^2\right) \\ &= (E[-T_2])^{-1}\left(E[T_2^2] - (E[T_2])^2\right)\left(2E[T_2^2] - (E[T_2])^2\right). \end{aligned}$$
The proof is finished by noticing that $E[T_2^2] - (E[T_2])^2 \ge 0$ (it is the variance of $T_2$), and a fortiori $2E[T_2^2] - (E[T_2])^2 \ge 0$, which implies that the right-hand side is nonnegative. ☐
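The Cauchy–Schwarz step in this proof, $E[W]\,E[W^3] \ge (E[W^2])^2$ for a nonnegative random variable $W = -T_2$, in fact holds exactly for any nonnegative sample, since the empirical measure is itself a probability measure. A quick numerical check (ours; the exponential sample is an arbitrary choice):

```python
import random

random.seed(1)
w = [random.expovariate(1.0) for _ in range(10000)]   # any nonnegative data works

m1 = sum(w) / len(w)                  # E[W]
m2 = sum(v * v for v in w) / len(w)   # E[W^2]
m3 = sum(v ** 3 for v in w) / len(w)  # E[W^3]

# >= 0 by Cauchy-Schwarz applied to sqrt(W) and W^(3/2)
cs_gap = m1 * m3 - m2 ** 2
```

For the exponential distribution the population gap is $1\cdot 6 - 2^2 = 2$, so the sample gap is comfortably positive.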

3. Linear Matrix Inequalities

In this section, we introduce the method of linear matrix inequalities (LMIs) and transform the proofs of Conjectures 1 and 2 into LMI feasibility problems. This transformation also enables us to find the correct coefficients in Theorem 1.
Recall that, in [9], the authors first obtained the fourth derivative in the following form (Equation (27) in [9]):
$$2h^{(4)}(X+\sqrt{t}Z) = E\left[-\frac{f_4^2}{f^2} - \frac{4f_2f_3^2}{f^3} + \frac{4f_1^2f_3^2}{f^4} - \frac{3f_2^4}{f^4} + \frac{24f_1^2f_2^3}{f^5} - \frac{36f_1^4f_2^2}{f^6} + \frac{90f_1^8}{7f^8}\right]. \tag{13}$$
Then, with some equalities (from integration by parts), they showed this derivative can be expressed as the negative of a sum of squares (Theorem 2 in [9]):
2 h ( 4 ) ( X + t Z ) = E f 4 f 6 5 f 1 f 3 f 2 7 10 f 2 2 f 2 + 8 5 f 1 2 f 2 f 3 1 2 f 1 4 f 4 2 + 2 5 f 1 f 3 f 2 1 3 f 1 2 f 2 f 3 + 9 100 f 1 4 f 4 2 + 4 100 f 1 2 f 2 f 3 + 4 100 f 1 4 f 4 2 + 1 300 f 2 4 f 4 + 56 90,000 f 1 4 f 2 2 f 6 + 13 70,000 f 1 8 f 8 .
Hence, the fourth derivative is nonpositive. A sum of squares has a natural connection with positive semidefinite matrices: the sum of squares inside the expectation in Equation (14) can be written as $E[u^TFu]$, where u is the column vector $u = (f_4/f,\; f_1f_3/f^2,\; f_2^2/f^2,\; f_1^2f_2/f^3,\; f_1^4/f^4)$ and F is a positive semidefinite matrix. Thus, the method in [9] amounts to verifying the existence of a suitable positive semidefinite matrix F. This can be cast as the feasibility of a linear matrix inequality.
A linear matrix inequality (Chapter 2 of [18]) has the form
$$F(x,y) := F_0 + \sum_{i=1}^{I} x_iF_i + \sum_{j=1}^{J} y_jG_j \succeq 0, \tag{15}$$
where the m × m symmetric matrices $F_0, F_i, G_j$, i = 1, …, I, j = 1, …, J, are given, the variables $x_i$ are real and the $y_j$ are nonnegative, and the notation $F(x,y) \succeq 0$ means that F(x,y) is positive semidefinite. The feasibility problem asks whether there exists a set of $x_i$ and $y_j$ such that F(x,y) is positive semidefinite.
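For intuition, feasibility of a tiny LMI can be checked by brute force (our toy illustration, not the solver-based approach used later): scan the scalar variable and inspect the smallest eigenvalue of the matrix pencil.

```python
import numpy as np

# Toy LMI: F(x) = F0 + x*F1 = diag(1, x - 1), feasible exactly for x >= 1.
F0 = np.array([[1.0, 0.0], [0.0, -1.0]])
F1 = np.array([[0.0, 0.0], [0.0, 1.0]])

def min_eig(x):
    # smallest eigenvalue of F0 + x*F1; nonnegative <=> positive semidefinite
    return np.linalg.eigvalsh(F0 + x * F1).min()

xs = np.linspace(-3, 3, 601)
feasible_xs = [x for x in xs if min_eig(x) >= -1e-9]
```

Real LMI feasibility problems are solved with semidefinite programming rather than grid search; this sketch only illustrates what the question "does a PSD-making choice of variables exist?" means.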
To reformulate the method used by Cheng and Geng [9] as an LMI feasibility problem, using the fourth derivative as an illustrative example, the main idea is as follows. First, transform the original expression of the derivative into the form
$$-2h^{(4)}(X+\sqrt{t}Z) = E[u^TF_0u].$$
Then, transform the equalities resulting from integration by parts into the form
$$0 = E[u^TF_iu], \quad i = 1, 2, \ldots, I.$$
Finally, try to find a set of variables $x_i$ such that $F_0 + \sum_i x_iF_i \succeq 0$, which is sufficient to show that
$$-2h^{(4)}(X+\sqrt{t}Z) = E[u^TF_0u] = E\left[u^T\left(F_0 + \sum_i x_iF_i\right)u\right] \ge 0.$$
One can notice that there are no matrices $G_j$ in the above formulation. This is mainly because only equalities were available in [9]. When one imposes inequality constraints, for example $T_2 \le 0$ as in this paper, one is able to construct matrices $G_j$ as well.
Before we proceed to the details of constructing those matrices, the following observations are clear regarding $u = (f_4/f,\; f_1f_3/f^2,\; f_2^2/f^2,\; f_1^2f_2/f^3,\; f_1^4/f^4)$ and the fourth derivative $2h^{(4)}(X+\sqrt{t}Z)$ (see Equation (13)): (a) the sum-order of derivatives in each entry of u is four; for example, the sum-order of $f_1^2f_2/f^3$ is $1\times 2 + 2 = 4$; (b) the highest order of a single factor appearing in the entries of u is four, namely in $f_4/f$; (c) the sum-order of each term in the fourth derivative is eight, which is twice that of u.
In the following, we take the fourth derivative as an example and show how to construct the matrices $F_0$ (Section 3.3), $F_i$ (Section 3.1 and Section 3.2), and $G_j$ (Section 3.4). We decide to use the $T_k$, instead of the $f_k$, as the entries of u; the motivation is clear from the proof of Corollary 1 and the desire to exploit the assumption $T_2 \le 0$. Based on the above observations and the expressions in Equation (5), our vector u is
$$u = (T_4,\; T_3T_1,\; T_2^2,\; T_2T_1^2,\; T_1^4).$$
Thus, $F_0$, $F_i$ and $G_j$ are 5 × 5 symmetric matrices. Here, we mention that the expressions appearing as coordinates of u correspond to the integer partitions of four.
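The correspondence with integer partitions can be made explicit with a small generator (our illustration; the rendering of a partition as a monomial in the $T_k$ is our own convention):

```python
from collections import Counter

def partitions(n, max_part=None):
    """All partitions of n into parts <= max_part, largest part first."""
    if max_part is None:
        max_part = n
    if n == 0:
        return [[]]
    out = []
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            out.append([k] + rest)
    return out

def to_term(p):
    """Render a partition like [2, 1, 1] as the monomial 'T2*T1^2'."""
    c = Counter(p)
    return "*".join(f"T{k}" if c[k] == 1 else f"T{k}^{c[k]}"
                    for k in sorted(c, reverse=True))

u_terms = [to_term(p) for p in partitions(4)]
# ['T4', 'T3*T1', 'T2^2', 'T2*T1^2', 'T1^4'] -- the five entries of u
```

The same generator applied to n = 5 yields the seven coordinates used for the fifth derivative in Section 3.6.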
The organization of this section is as follows: Section 3.1, Section 3.2 and Section 3.3 deal with the sign of the fourth derivative with only equality constraints (see Conjecture 1); Section 3.4 further incorporates the inequality constraints, namely T 2 0 ; Section 3.5 shows the manipulation for the optimality of Gaussian inputs (see Conjecture 2). In Section 3.6, we consider the sign and the Gaussian optimality for the fifth derivative.

3.1. Matrices F i from Multiple Representations

The matrices $F_i$ are such that $E[u^TF_iu] = 0$. A trivial case comes from noticing that different products of the form u(i)u(j) may map to the same term; for example,
$$T_2^2T_1^4 = (T_2^2)(T_1^4) = (T_2T_1^2)(T_2T_1^2), \qquad u(3)u(5) = u(4)u(4).$$
That is, $T_2^2T_1^4$ admits multiple representations as u(i)u(j). It is easy to construct the corresponding matrix $F_1$ such that $u^TF_1u = 0$:
$$F_1 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&1\\ 0&0&0&-2&0\\ 0&0&1&0&0 \end{pmatrix}.$$
For the fourth derivative, only one term has multiple representations. There are none for the third derivative, and three for the fifth ($F_1$, $F_2$ and $F_3$ in Section 3.6).

3.2. Matrices F i from Integration by Parts

The equalities of the type E [ u T F i u ] = 0 used in [9] are from integration by parts. Here, we list them one by one.
Notice that all the possible terms with sum-order eight and highest order four are the following (the numbers in the left column are indices):
$$\begin{aligned} 1\text{--}5:&\quad T_4^2,\; T_4T_3T_1,\; T_4T_2^2,\; T_4T_2T_1^2,\; T_4T_1^4,\\ 6\text{--}9:&\quad T_3^2T_1^2,\; T_3T_2^2T_1,\; T_3T_2T_1^3,\; T_3T_1^5,\\ 10\text{--}13:&\quad T_2^4,\; T_2^3T_1^2,\; T_2^2T_1^4,\; T_2T_1^6,\\ 14:&\quad T_1^8,\\ 15:&\quad T_3^2T_2. \end{aligned}$$
Denote this vector as w.
These terms are arranged in an order such that the first fourteen terms can be expressed as u(i)u(j) for some i and j, while the last one cannot. We call the first fourteen terms the quadratic part $w_{qua}$, and the last term the non-quadratic part $w_{non}$. Thus, $w = (w_{qua}, w_{non})$.
It is not difficult to conclude that, to avoid repetition, one only needs to perform integration by parts on the entries whose highest-order factor appears with power one. These entries are (eight in total):
$$T_4T_3T_1,\; T_4T_2^2,\; T_4T_2T_1^2,\; T_4T_1^4,\; T_3T_2^2T_1,\; T_3T_2T_1^3,\; T_3T_1^5,\; T_2T_1^6.$$
Taking $T_4T_3T_1$ as an example, one can show that (this is Equation (18); see the end of this subsection)
$$E[2T_4T_3T_1 + T_3^2T_1^2 + T_3^2T_2] = 0.$$
In addition, this can be written as $E[c_1^Tw] = 0$, where
$$c_1 \in \mathbb{R}^{15}, \qquad c_1([2, 6, 15]) = [2, 1, 1].$$
There are eight equalities in total, and hence there are vectors $c_1, \ldots, c_8$. We put each $c_i$ as the i-th row of $C \in \mathbb{R}^{8\times 15}$ and write those equalities as
$$E[Cw] = 0,$$
where (columns indexed by $w(1), \ldots, w(15)$, rows by Equations (18)–(25))
$$C = \begin{pmatrix} 0&2&0&0&0&1&0&0&0&0&0&0&0&0&1\\ 0&0&1&0&0&0&1&0&0&0&0&0&0&0&2\\ 0&0&0&1&0&1&2&1&0&0&0&0&0&0&0\\ 0&0&0&0&1&0&0&4&1&0&0&0&0&0&0\\ 0&0&0&0&0&0&3&0&0&1&1&0&0&0&0\\ 0&0&0&0&0&0&0&2&0&0&3&1&0&0&0\\ 0&0&0&0&0&0&0&0&1&0&0&5&1&0&0\\ 0&0&0&0&0&0&0&0&0&0&0&0&7&1&0 \end{pmatrix}. \tag{17}$$
The entries can be read off from Equations (18)–(25).
We need to extract from these eight equalities $E[Cw] = 0$ matrices F such that $E[u^TFu] = 0$. The main problem is that $c_k^Tw$ may contain entries that are not expressible as u(i)u(j); in particular, for the fourth derivative, this happens when $c_k(15) \ne 0$. One needs to do some work to cancel these entries. The general method, which can also be used in higher-order cases, is stated below:
  • Firstly, since $w = (w_{qua}, w_{non})$, we separate the blocks of C accordingly:
    $$C = \begin{pmatrix} C_{11} & C_{12}\\ C_{21} & 0 \end{pmatrix}, \qquad E\left[\begin{pmatrix} C_{11} & C_{12}\\ C_{21} & 0 \end{pmatrix}\begin{pmatrix} w_{qua}\\ w_{non} \end{pmatrix}\right] = 0.$$
    In the above, $C_{11} \in \mathbb{R}^{2\times 14}$, $C_{12} \in \mathbb{R}^{2\times 1}$, $C_{21} \in \mathbb{R}^{6\times 14}$.
  • Secondly, each row of $C_{21}$ corresponds to a symmetric matrix $F_i$ such that $E[u^TF_iu] = 0$. In particular, for the first row of $C_{21}$, the matrix is
    $$F_2 = \begin{pmatrix} 0&0&0&1&0\\ 0&2&2&1&0\\ 0&2&0&0&0\\ 1&1&0&0&0\\ 0&0&0&0&0 \end{pmatrix},$$
    such that $\frac{1}{2}u^TF_2u = T_4T_2T_1^2 + T_3^2T_1^2 + 2T_3T_2^2T_1 + T_3T_2T_1^3$. Notice that a scaling factor of two is introduced here just for conciseness; this does not affect the feasibility of (15). Similarly, the other five matrices, corresponding to the remaining rows of $C_{21}$, are
    $$F_3 = \begin{pmatrix} 0&0&0&0&1\\ 0&0&0&4&1\\ 0&0&0&0&0\\ 0&4&0&0&0\\ 1&1&0&0&0 \end{pmatrix}, \quad F_4 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&3&0&0\\ 0&3&2&1&0\\ 0&0&1&0&0\\ 0&0&0&0&0 \end{pmatrix}, \quad F_5 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&2&0\\ 0&0&0&3&1\\ 0&2&3&0&0\\ 0&0&1&0&0 \end{pmatrix},$$
    $$F_6 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&1\\ 0&0&0&0&5\\ 0&0&0&0&1\\ 0&1&5&1&0 \end{pmatrix}, \quad F_7 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&7\\ 0&0&0&7&2 \end{pmatrix}.$$
  • Thirdly, for $C_{11}$ and $C_{12}$, the equalities read $E[C_{11}w_{qua} + C_{12}w_{non}] = 0$. Notice that $w_{non}$ cannot be expressed in a quadratic form. Supposing that we can find a column vector z such that $z^TC_{12} = 0$, then $E[z^TC_{11}w_{qua}] = E[z^T(C_{11}w_{qua} + C_{12}w_{non})] = 0$. The vector z lies in the null space of $C_{12}^T$, and it suffices to find a basis of this null space. One way is to perform the QR decomposition
    $$C_{12} = Q\begin{pmatrix} U\\ 0 \end{pmatrix},$$
    where U is upper triangular. The null space of $C_{12}^T$ has dimension equal to the number of rows of the zero block above, and a basis is given by the last several columns of Q. In particular, for the fourth derivative,
    $$C_{12} = \begin{pmatrix} 1\\ 2 \end{pmatrix} = QR = \begin{pmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}}\\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{pmatrix}\begin{pmatrix} \sqrt{5}\\ 0 \end{pmatrix}.$$
    Hence, one takes z as the second column of Q, which is (after scaling for conciseness) $z^T = (-2, 1)$. Then, one calculates $z^TC_{11}w_{qua} = -4T_4T_3T_1 + T_4T_2^2 - 2T_3^2T_1^2 + T_3T_2^2T_1$, and the corresponding matrix $F_8$ (scaled by a factor of two) is
    $$F_8 = \begin{pmatrix} 0&-4&1&0&0\\ -4&-4&1&0&0\\ 1&1&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{pmatrix}.$$
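The null-space step above is easy to reproduce with a library QR routine (our illustration): the complete Q factor of $C_{12}$ contains, in its trailing columns, an orthonormal basis of the null space of $C_{12}^T$.

```python
import numpy as np

C12 = np.array([[1.0], [2.0]])

# mode='complete' returns the full 2x2 orthogonal Q, not just the thin factor.
Q, R = np.linalg.qr(C12, mode='complete')
z = Q[:, 1]                            # basis vector of the null space of C12^T

residual = float(abs(z @ C12[:, 0]))   # z^T C12, should be ~0
```

Up to sign and scale, `z` is proportional to (−2, 1), matching the hand computation.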
The rest of this subsection is devoted to the equalities obtained from integration by parts. They are similar to those in [9], except that they are expressed in terms of the $T_i$. To begin, we need the following lemma.
Lemma 3.
Let A be a linear combination of products of the $T_i$. Then, for n ≥ 2,
$$E\left[T_nA + T_{n-1}\frac{\partial}{\partial y}A + T_{n-1}T_1A\right] = 0.$$
Proof. 
From calculus,
$$E[T_nA] = \int fT_nA\,dy = \int fA\,dT_{n-1} \overset{(a)}{=} 0 - \int T_{n-1}\,d(fA) = -\int T_{n-1}\left(f_1A + f\frac{\partial}{\partial y}A\right)dy \overset{(b)}{=} -\int T_{n-1}f\left(T_1A + \frac{\partial}{\partial y}A\right)dy = -E\left[T_{n-1}\frac{\partial}{\partial y}A + T_{n-1}T_1A\right],$$
where (a) is due to Lemma 1 (the boundary term $fAT_{n-1}$ vanishes at infinity), and (b) is due to Equation (5). ☐
Now, using Lemma 3, one obtains the following equalities:
$$T_n = T_4,\; A = T_3T_1: \quad E[2T_4T_3T_1 + T_3^2T_2 + T_3^2T_1^2] = 0, \tag{18}$$
$$T_n = T_4,\; A = T_2^2: \quad E[T_4T_2^2 + 2T_3^2T_2 + T_3T_2^2T_1] = 0, \tag{19}$$
$$T_n = T_4,\; A = T_2T_1^2: \quad E[T_4T_2T_1^2 + T_3^2T_1^2 + 2T_3T_2^2T_1 + T_3T_2T_1^3] = 0, \tag{20}$$
$$T_n = T_4,\; A = T_1^4: \quad E[T_4T_1^4 + 4T_3T_2T_1^3 + T_3T_1^5] = 0, \tag{21}$$
$$T_n = T_3,\; A = T_2^2T_1: \quad E[3T_3T_2^2T_1 + T_2^4 + T_2^3T_1^2] = 0, \tag{22}$$
$$T_n = T_3,\; A = T_2T_1^3: \quad E[2T_3T_2T_1^3 + 3T_2^3T_1^2 + T_2^2T_1^4] = 0, \tag{23}$$
$$T_n = T_3,\; A = T_1^5: \quad E[T_3T_1^5 + 5T_2^2T_1^4 + T_2T_1^6] = 0, \tag{24}$$
$$T_n = T_2,\; A = T_1^6: \quad E[7T_2T_1^6 + T_1^8] = 0. \tag{25}$$
With these equalities, the matrix C in Equation (17) can be constructed.
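These identities can be sanity-checked numerically in the Gaussian case (our check, not from the paper). For the standard normal, $T_1 = -y$, $T_2 = -1$ and $T_n = 0$ for n ≥ 3, so, e.g., Equation (25) reduces to $-7E[Y^6] + E[Y^8] = -7\cdot 15 + 105 = 0$. Gauss–Hermite quadrature evaluates such polynomial expectations exactly:

```python
import numpy as np

# Gauss-Hermite nodes/weights for weight exp(-x^2); rescale to N(0,1).
nodes, weights = np.polynomial.hermite.hermgauss(20)
z = np.sqrt(2.0) * nodes
wts = weights / np.sqrt(np.pi)

T1 = -z      # T1 = d/dy ln f = -y for the standard normal
T2 = -1.0    # T2 is constant for a Gaussian

# Equation (25): E[7 T2 T1^6 + T1^8] should vanish.
identity_25 = float(np.sum(wts * (7 * T2 * T1**6 + T1**8)))
```

With 20 nodes the quadrature is exact for polynomials up to degree 39, so `identity_25` is zero up to floating-point rounding.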

3.3. Matrix F 0 from the Derivative

Suppose that we have already obtained the fourth derivative in the form (see Equation (30) later)
$$-2h^{(4)}(X+\sqrt{t}Z) = E[d^Tw] = E[d_1^Tw_{qua} + d_2^Tw_{non}],$$
where $d_1 \in \mathbb{R}^{14}$, $d_2 \in \mathbb{R}$. Then, similarly to $F_8$, we can find the matrix $F_0$ such that $-2h^{(4)}(X+\sqrt{t}Z) = E[u^TF_0u]$.
To cancel the non-quadratic term $d_2^Tw_{non}$, we solve $z_2^TC_{12} = d_2^T$ (the solution $z_2$ should exist; otherwise, it is not possible to find a quadratic form and the LMI approach fails). Then, since $E[C_{11}w_{qua} + C_{12}w_{non}] = 0$, we have
$$-2h^{(4)}(X+\sqrt{t}Z) = E[d_1^Tw_{qua} + d_2^Tw_{non}] = E[d_1^Tw_{qua} + d_2^Tw_{non}] - E[z_2^T(C_{11}w_{qua} + C_{12}w_{non})] = E[(d_1^T - z_2^TC_{11})w_{qua}].$$
Now, $F_0$ can be constructed from $d_1^T - z_2^TC_{11}$.
The details are as follows. First, we need to express the derivative using the entries of w. This can be done recursively using the following lemma.
Lemma 4.
Let A be a linear combination of products of the $T_i$. The following equalities hold:
$$2\frac{\partial}{\partial t}T_n = \frac{\partial^n}{\partial y^n}(T_2 + T_1^2) = T_{n+2} + \sum_{k=0}^{n}\binom{n}{k}T_{k+1}T_{n-k+1}, \quad n \ge 0, \tag{26}$$
$$\frac{d}{dt}E[A] = E\left[\frac{1}{2}(T_2 + T_1^2)A + \frac{\partial}{\partial t}A\right], \tag{27}$$
$$\frac{d}{dt}E[T_n^2] = E\left[-T_{n+1}^2 + T_n\sum_{k=1}^{n-1}\binom{n}{k}T_{k+1}T_{n-k+1}\right], \quad n \ge 1. \tag{28}$$
The proof is left to Appendix A. Now, with Equation (7),
$$2h''(X+\sqrt{t}Z) = -E[T_2^2],$$
and Equation (28), one can easily obtain
$$2h^{(3)}(X+\sqrt{t}Z) = -\frac{d}{dt}E[T_2^2] = E[T_3^2 - 2T_2^3].$$
For the fourth derivative,
$$\begin{aligned} 2h^{(4)}(X+\sqrt{t}Z) &= \frac{d}{dt}E[T_3^2 - 2T_2^3] \\ &\overset{(a)}{=} E[-T_4^2 + T_3(3T_2T_3 + 3T_3T_2)] - \frac{d}{dt}E[2T_2^3] \\ &\overset{(b)}{=} E[-T_4^2 + T_3(3T_2T_3 + 3T_3T_2)] - E\left[(T_2 + T_1^2)T_2^3 + 3T_2^2\cdot 2\frac{\partial}{\partial t}T_2\right] \\ &\overset{(c)}{=} E[-T_4^2 + 6T_3^2T_2 - T_2^4 - T_2^3T_1^2 - 3T_2^2(T_4 + 2T_3T_1 + 2T_2^2)] \\ &= -E[T_4^2 + 3T_4T_2^2 - 6T_3^2T_2 + 6T_3T_2^2T_1 + 7T_2^4 + T_2^3T_1^2], \end{aligned} \tag{30}$$
where (a), (b) and (c) are due to Equations (28), (27) and (26), respectively.
Hence, writing $-2h^{(4)}(X+\sqrt{t}Z) = E[d^Tw]$, we have the vector $d = (d_1, d_2) \in \mathbb{R}^{15}$ with blocks $d_1 \in \mathbb{R}^{14}$, $d_2 \in \mathbb{R}$:
$$d([1, 3, 7, 10, 11]) = [1, 3, 6, 7, 1], \qquad d(15) = -6.$$
One solves $z_2^TC_{12} = d_2^T$ for $z_2$ and obtains
$$z_2^T = (0, -3).$$
Now, $d_1^T - z_2^TC_{11}$ has nonzero entries at locations [1, 3, 7, 10, 11], with values [1, 6, 9, 7, 1], respectively. Furthermore, $F_0$ (scaled by a factor of two) is found as
$$F_0 = \begin{pmatrix} 2&0&6&0&0\\ 0&0&9&0&0\\ 6&9&14&1&0\\ 0&0&1&0&0\\ 0&0&0&0&0 \end{pmatrix}.$$
By the end of this subsection, it is easy to see that Cheng and Geng's method can be reformulated as identifying whether there exist $x_1, \ldots, x_8 \in \mathbb{R}$ such that
$$F_0 + \sum_{i=1}^{8}x_iF_i \succeq 0.$$
We use the convex optimization package [19] to check the feasibility of the above LMI problem, and it turns out to be feasible, as it should be according to Equation (14).

3.4. Matrices G j from Log-Concavity

Recall that, in [9], there are no matrices $G_j$, since there are no inequality constraints. In this paper, we consider the log-concave case $T_2 \le 0$, which introduces inequality constraints.
For the fourth order, $T_2 \le 0$ actually implies that the following entries of w are nonpositive:
$$T_2^3T_1^2, \qquad T_2T_1^6, \qquad T_2T_3^2.$$
It is clear that in each of these terms the power of $T_2$ is odd and the other powers are even.
To transform these nonpositive terms into matrices $G_j$: the first two terms, $T_2^3T_1^2$ and $T_2T_1^6$, are trivial, since they can be expressed by u(i)u(j) directly:
$$0 \ge 2E[T_2^3T_1^2] = E[u^TG_1u], \qquad 0 \ge 2E[T_2T_1^6] = E[u^TG_2u],$$
where
$$G_1 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&1&0\\ 0&0&1&0&0\\ 0&0&0&0&0 \end{pmatrix}, \qquad G_2 = \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&1\\ 0&0&0&1&0 \end{pmatrix}.$$
For the term $T_2T_3^2$, the idea is similar to the third part of Section 3.2. One first finds $z_3 \in \mathbb{R}^2$ such that $z_3^TC_{12}w_{non} = T_2T_3^2$, namely $z_3^TC_{12} = 1$. One solution is $z_3^T = (0, 1/2)$. Then,
$$E[T_2T_3^2] = E[T_2T_3^2 - z_3^T(C_{11}w_{qua} + C_{12}w_{non})] = -E[z_3^TC_{11}w_{qua}] = -E\left[\frac{1}{2}T_4T_2^2 + \frac{1}{2}T_3T_2^2T_1\right].$$
Now, it is routine to obtain
$$0 \ge 4E[T_2T_3^2] = E[u^TG_3u],$$
where
$$G_3 = \begin{pmatrix} 0&0&-1&0&0\\ 0&0&-1&0&0\\ -1&-1&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{pmatrix}.$$
At this point, we are done with the procedure for calculating all of the matrices $F_0$, $F_i$ and $G_j$. To show the nonpositivity of the fourth derivative, it suffices to find variables $x_i \in \mathbb{R}$ and $y_j \ge 0$ such that
$$F_0 + \sum_{i=1}^{8}x_iF_i + \sum_{j=1}^{3}y_jG_j \succeq 0.$$
Remark 2.
The matrix $G_2$ is actually redundant, since we know from Equation (25) that $E[T_2T_1^6] = -\frac{1}{7}E[T_1^8] \le 0$, a constraint already encoded in the matrices $F_i$ (in particular, in the matrix $F_7$ of Section 3.2). Including $G_2$ does not affect the feasibility check.

3.5. Matrix F ˜ 0 for Gaussian Optimality

However, to show the optimality of the Gaussian, the above formulation is not enough. According to inequality (a) in Equation (11), it would suffice to show that
$$(-1)^{n-1}\times 2h^{(n)}(X+\sqrt{t}Z) - (n-1)!\times E[(-T_2)^n] \ge 0$$
instead of $(-1)^{n-1}\times 2h^{(n)}(X+\sqrt{t}Z) \ge 0$. Thus, one needs to calculate a matrix $\tilde F_0$ such that
$$(-1)^{n-1}\times 2h^{(n)}(X+\sqrt{t}Z) - (n-1)!\times E[(-T_2)^n] = E[u^T\tilde F_0u].$$
The procedure is the same as that in Section 3.3.
In particular, for the fourth derivative, since n = 4 is even, we directly have the quadratic form $E[(-T_2)^n] = E[T_2^4]$ with $T_2^4 = u(3)u(3)$. It is straightforward to construct the matrix $\tilde F_0$ (scaled by a factor of two) here:
$$\tilde F_0 = F_0 - \begin{pmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&2\times 3!&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{pmatrix} = \begin{pmatrix} 2&0&6&0&0\\ 0&0&9&0&0\\ 6&9&2&1&0\\ 0&0&1&0&0\\ 0&0&0&0&0 \end{pmatrix},$$
such that $E[u^T\tilde F_0u] = -4h^{(4)}(X+\sqrt{t}Z) - 12E[T_2^4]$.
Now, the LMI is updated to $\tilde F_0 + \sum_{i=1}^{8}x_iF_i + \sum_{j=1}^{3}y_jG_j \succeq 0$. Again, we use the convex optimization package [19] to check the feasibility. It turns out to be feasible, and the solution helps us identify the coefficients in Equation (9).

3.6. Fifth Derivative

For the fifth derivative, we omit the details of the manipulations, since they are routine, and just provide the matrices here. For brevity, we only list the nonzero entries of the upper-triangular part of each symmetric matrix. These matrices (with scaling) are
F_0: F[(1,1),(1,3),(1,5),(2,3),(2,5),(3,3),(3,4),(3,5),(3,6),(5,5),(5,6)] = [2, 20, 29, 214/3, 49/2, 178/3, 37/3, 58, 6, 45, 1/2],
F_1: F[(3,6),(4,5)] = [1, 1],
F_2: F[(3,7),(4,6)] = [1, 1],
F_3: F[(5,7),(6,6)] = [1, 2],
F_4: F[(1,4),(2,2),(2,3),(2,4)] = [1, 2, 2, 1],
F_5: F[(1,6),(2,4),(2,5),(2,6)] = [1, 1, 3, 1],
F_6: F[(1,7),(2,6),(2,7)] = [1, 5, 1],
F_7: F[(2,4),(3,4),(4,4)] = [2, 3, 2],
F_8: F[(2,5),(3,4),(3,5),(3,6)] = [1, 2, 2, 1],
F_9: F[(2,6),(3,6),(3,7),(4,4)] = [1, 4, 1, 2],
F_10: F[(2,7),(3,7),(4,7)] = [1, 6, 1],
F_11: F[(3,6),(5,5),(5,6)] = [3, 6, 1],
F_12: F[(3,7),(5,6),(5,7)] = [2, 5, 1],
F_13: F[(4,7),(5,7),(6,7)] = [1, 7, 1],
F_14: F[(6,7),(7,7)] = [9, 2],
F_15: F[(1,2),(1,3),(2,2),(2,3),(3,3),(3,4)] = [6, 3, 6, 5, 2, 1],
F_16: F[(1,5),(2,3),(2,5),(3,3),(3,5)] = [1, 2, 1, 6, 1].
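For readers reproducing the computation, the sparse F[(i, j)] = value listings above expand into full symmetric matrices. The helper below is our illustration (indices 1-based, as in the text):

```python
def dense_sym(entries, n=7):
    """Expand an upper-triangular {(i, j): value} listing (1-indexed)
    into a full symmetric n x n matrix."""
    M = [[0.0] * n for _ in range(n)]
    for (i, j), v in entries.items():
        M[i - 1][j - 1] = v   # upper-triangular entry
        M[j - 1][i - 1] = v   # mirror to keep the matrix symmetric
    return M

# Example: F_1 above has nonzero entries (3,6) and (4,5), both equal to 1.
F1 = dense_sym({(3, 6): 1, (4, 5): 1})
```

Each listed matrix feeds into the 7 × 7 LMI below in exactly this dense form.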
For the sign of the fifth derivative, we used the convex optimization package [19] to solve the following LMI problem,
F_0 + ∑_{i=1}^{16} x_i F_i ⪰ 0,
but could not find a feasible solution x_1, …, x_16 ∈ R. This suggests that a direct generalization of Cheng and Geng's method may not work for the fifth derivative.
Instead, if we consider the log-concavity constraint T_2 ≤ 0 and check the optimality of Gaussian inputs, then we have a new matrix F̃_0 (similar to Section 3.5) and several matrices G_j, as follows:
F̃_0: F[(1,1),(1,3),(1,5),(2,3),(2,5),(3,3),(3,4),(3,5),(3,6),(5,5),(5,6)] = [2, 20, 29, 214/3, 49/2, 178/3, 37/3, 38, 6, 3, 1/2],
G_1: G[(3,4)] = [1],
G_2: G[(5,6)] = [1],
G_3: G[(6,7)] = [1],
G_4: G[(1,3),(2,3),(3,3),(3,4)] = [3, 5, 2, 1],
G_5: G[(3,5),(5,5)] = [2, 1].
Now, one would like to find x_1, …, x_16 ∈ R and y_1, …, y_5 ∈ R_+ such that
F̃_0 + ∑_{i=1}^{16} x_i F_i + ∑_{j=1}^{5} y_j G_j ⪰ 0.
This can be solved by the convex optimization package [19]. Again, the solution helps us to arrive at Equation (10).

4. Proof of Theorem 1

Proof. 
For the third derivative, according to Equation (29), we have
2h^{(3)}(X + √t Z) = E[T_3^2 − 2T_2^3].
For the fourth derivative, according to Equation (30):
−2h^{(4)}(X + √t Z) = E[T_4^2 + 3T_4T_2^2 − 6T_3^2T_2 + 6T_3T_2^2T_1 + 7T_2^4 + T_2^3T_1^2].
Adding the multiples 3 × (19) − 1 × (22) of the left-hand sides of those equations (each of which equals zero), we obtain
−2h^{(4)}(X + √t Z) = E[T_4^2 + 3T_4T_2^2 − 6T_3^2T_2 + 6T_3T_2^2T_1 + 7T_2^4 + T_2^3T_1^2]
 =(a) E[T_4^2 + (−6T_3^2T_2 − 3T_3T_2^2T_1) − 6T_3^2T_2 + 6T_3T_2^2T_1 + 7T_2^4 + T_2^3T_1^2]
 = E[T_4^2 − 12T_3^2T_2 + 3T_3T_2^2T_1 + 7T_2^4 + T_2^3T_1^2]
 =(b) E[T_4^2 − 12T_3^2T_2 + (−T_2^4 − T_2^3T_1^2) + 7T_2^4 + T_2^3T_1^2]
 = E[T_4^2 − 12T_3^2T_2 + 6T_2^4],
where (a) is due to Equation (19), and (b) is due to Equation (22).
For the fifth derivative,
2h^{(5)}(X + √t Z) = (d/dt) E[−T_4^2 − 6T_2^4 + 12T_3^2T_2] = (d/dt)E[−T_4^2] + (d/dt)E[−6T_2^4] + (d/dt)E[12T_3^2T_2].
For each term above on the right-hand side: according to Equation (28),
(d/dt)E[−T_4^2] = E[T_5^2 − 8T_4^2T_2 − 6T_4T_3^2].
For the second term,
(d/dt)E[−6T_2^4] =(c) E[−3(T_2 + T_1^2)T_2^4 − 12T_2^3 · (2∂_t T_2)] =(d) E[−3T_2^5 − 3T_2^4T_1^2 − 12T_2^3(T_4 + 2T_3T_1 + 2T_2^2)] = E[−3T_2^4T_1^2 − 12T_4T_2^3 − 24T_3T_2^3T_1 − 27T_2^5],
where (c) is due to Equation (27), and (d) is due to Equation (26). For the third term, according to Equation (27),
(d/dt)E[12T_3^2T_2] = E[6(T_2 + T_1^2)T_3^2T_2 + ∂_t(12T_3^2T_2)],
where the last term is
∂_t(12T_3^2T_2) = 12T_3^2 ∂_t T_2 + 24T_3T_2 ∂_t T_3 =(e) 6T_3^2(T_4 + 2T_3T_1 + 2T_2^2) + 12T_3T_2(T_5 + 2T_4T_1 + 6T_3T_2) = 12T_5T_3T_2 + 6T_4T_3^2 + 24T_4T_3T_2T_1 + 12T_3^3T_1 + 84T_3^2T_2^2,
where (e) is due to Equation (26). Hence,
(d/dt)E[12T_3^2T_2] = E[12T_5T_3T_2 + 6T_4T_3^2 + 24T_4T_3T_2T_1 + 6T_3^2T_2T_1^2 + 12T_3^3T_1 + 90T_3^2T_2^2].
Finally, combining Equations (31)–(33) (the terms ±6T_4T_3^2 cancel), we get
2h^{(5)}(X + √t Z) = (d/dt)E[−T_4^2] + (d/dt)E[−6T_2^4] + (d/dt)E[12T_3^2T_2]
 = E[T_5^2 − 8T_4^2T_2 − 3T_2^4T_1^2 − 12T_4T_2^3 − 24T_3T_2^3T_1 − 27T_2^5 + 12T_5T_3T_2 + 24T_4T_3T_2T_1 + 6T_3^2T_2T_1^2 + 12T_3^3T_1 + 90T_3^2T_2^2].
To simplify Equation (34), using Lemma 3, one first obtains the following equalities:
T_n = T_4, A = T_3T_2T_1:  E[2T_4T_3T_2T_1 + T_3^3T_1 + T_3^2T_2^2 + T_3^2T_2T_1^2] = 0,
T_n = T_3, A = T_2^3T_1:  E[4T_3T_2^3T_1 + T_2^5 + T_2^4T_1^2] = 0,
T_n = T_4, A = T_2^3:  E[T_4T_2^3 + 3T_3^2T_2^2 + T_3T_2^3T_1] = 0.
Then, adding multiples of the left-hand sides of Equations (35)–(37), we have
2h^{(5)}(X + √t Z) = 2h^{(5)}(X + √t Z) − 12 × (35) + 3 × (36) + 12 × (37)
 = E[T_5^2 − 24T_2^5 − 8T_4^2T_2 − 6T_3^2T_2T_1^2 + 12T_5T_3T_2 + 114T_3^2T_2^2].
 ☐
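The final expression admits a quick sanity check on Gaussian input, where everything is in closed form: for X ~ N(0, s0^2), the sum Y = X + √t Z has T_2 = −1/(s0^2 + t) and T_n = 0 for n ≥ 3, so only the T_2^5 term of the expectation survives, while h(X + √t Z) = (1/2) ln(2πe(s0^2 + t)) can be differentiated in t directly. The check below is ours, not part of the original proof:

```python
import math

s0sq, t = 1.3, 0.4
var = s0sq + t            # variance of X + sqrt(t) Z for Gaussian X
T2 = -1.0 / var           # T_2 for the Gaussian; T_3 = T_4 = T_5 = 0
rhs = -24 * T2 ** 5       # the only surviving term of the expectation
lhs = 24 * var ** -5      # 2 h^(5), from h(t) = (1/2) ln(2*pi*e*(s0sq + t))
assert math.isclose(lhs, rhs)
```

Both sides equal 24(s0^2 + t)^{−5}, consistent with equality holding for Gaussian inputs.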

5. Discussion

5.1. On the Derivatives

We are not able to say anything conclusive about the sign of the fifth derivative of the differential entropy h(X + √t Z). If we impose the log-concavity condition, namely T_2 ≤ 0, then the fifth derivative is at least 4! × E[(−T_2)^5]. This motivates us to consider the following problem: without additional constraints, for which values c_5 > 0 does
2h^{(5)}(X + √t Z) ≥ c_5 × E[(−T_2)^5]
hold? If one finds such a value c_5, then as long as E[(−T_2)^5] ≥ 0, the sign of the fifth derivative is determined. This condition is much weaker than T_2 ≤ 0.
For the computational part, one only needs to construct the matrix F̃_0 such that 2h^{(5)}(X + √t Z) − c_5 × E[(−T_2)^5] = E[u^T F̃_0 u], and then solve the problem (see Section 3.6 for the matrices F_i)
F̃_0 + ∑_{i=1}^{16} x_i F_i ⪰ 0.
It turns out that c_5 = 0.13 works, while c_5 = 0.125 fails. The authors conjecture that every c_5 ∈ [0.13, 24] works but, at the moment, can only partly confirm this with limited simulation.
Notice that the third derivative of the entropy power N(X + √t Z) was shown to be nonnegative under the log-concavity condition [5], and we recover this in Corollary 3. We also considered the fourth derivative, but failed to obtain its sign because we were unable to apply the Cauchy–Schwarz inequality as we did for the third derivative.

5.2. Possible Proofs

To prove Conjecture 1, besides the method proposed in [9], we are also considering the following approaches. The first one is constructive and inspired by Equation (1): given a random variable X, if we can construct a proper measure μ(·) such that Equation (1) holds, then Conjecture 1 is proved. However, this is difficult even when X is binary symmetric, one of the simplest random variables.
The second one is recursive. Suppose one can find a formula for the n-th derivative such that
(−1)^{n−1} × h^{(n)}(X + √t Z) = E[∑_{i=1}^{k_n} A_i^2],  (d/dt)E[A_i^2] = −E[B_i^2];
then it is clear that
(−1)^n × h^{(n+1)}(X + √t Z) = E[∑_{i=1}^{k_n} B_i^2] ≥ 0.
However, this fails for n = 2 (see Equation (7) and Theorem 1). Instead, one may expect that
(d/dt)E[A_i^2] = −E[B_i^2 − C_i^2 + B_{i+1}^2],
and then
(−1)^n × h^{(n+1)}(X + √t Z) = E[B_1^2 + B_{k_n+1}^2 − ∑_{i=1}^{k_n} C_i^2].
If further one can show that E[B_1^2 + B_{k_n+1}^2] = E[C_{k_n+1}^2] for some C_{k_n+1}, then one finishes the proof. Notice that a clever observation is needed for this approach to work.

5.3. Applications

The topic of Gaussian optimality has wide applications, for example in [20,21]. In this work, besides the Gaussian optimality, we also have some new observations. In [11], the derivatives in the signal-to-noise ratio (snr) of I(X; √snr X + Z) are studied. In particular, the first four derivatives are obtained in the language of the minimum mean-square error (Equations (69)–(72) in Corollary 1 of [11]). However, it is not clear whether these derivatives have definite signs.
With some standard manipulations, it is not difficult to show that
I(X; √snr X + Z) = h(√snr X + Z) − h(Z) = h(X + (1/√snr) Z) + log √snr − (1/2) log(2πe).
By letting t = 1/snr, one can easily connect the minimum mean-square error formulae in [11] with the signs of the derivatives of h(X + √t Z) in t. The verification of Conjectures 1 and 2 would imply bounding and extremal properties of Equations (69)–(72) in [11], and thus deepen our understanding of minimum mean-square error estimation in the additive Gaussian setting.
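For a concrete instance of this decomposition, take X Gaussian with variance s2, where both sides have closed forms: I(X; √snr X + Z) = (1/2) ln(1 + snr·s2), and the entropy of a Gaussian with variance v is (1/2) ln(2πe·v). The numerical check below is our illustration (natural logarithms throughout):

```python
import math

s2, snr = 2.0, 3.5
# Left side: mutual information for Gaussian X.
lhs = 0.5 * math.log(1 + snr * s2)
# Right side: h(X + Z/sqrt(snr)) + log sqrt(snr) - (1/2) log(2*pi*e).
rhs = (0.5 * math.log(2 * math.pi * math.e * (s2 + 1 / snr))
       + 0.5 * math.log(snr)
       - 0.5 * math.log(2 * math.pi * math.e))
assert math.isclose(lhs, rhs)
```

The 2πe terms cancel, leaving (1/2) ln(snr·s2 + 1) on both sides, as the identity predicts.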
In addition, notice that the probability density function f(y, t) of Y = X + √t Z is the solution of the heat equation ∂f/∂t = (1/2) ∂²f/∂y² with the initial condition f(y, 0) = f_X(y). Hence, Conjectures 1 and 2, if true, reveal properties of the differential entropy of functions that satisfy the heat equation. For more results related to diffusion equations, one may refer to [22].
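This heat-equation property is easy to verify numerically. Below is a small finite-difference check on the Gaussian case, where the density of X + √t Z is available in closed form (our illustration; the point (y, t) and step size h are arbitrary choices):

```python
import math

def f(y, t, s0sq=1.0):
    """Density of Y = X + sqrt(t) Z for Gaussian X ~ N(0, s0sq)."""
    v = s0sq + t
    return math.exp(-y * y / (2 * v)) / math.sqrt(2 * math.pi * v)

# Central finite differences for f_t and (1/2) f_yy at a sample point.
y, t, h = 0.3, 0.5, 1e-4
ft = (f(y, t + h) - f(y, t - h)) / (2 * h)
fyy = (f(y + h, t) - 2 * f(y, t) + f(y - h, t)) / (h * h)
assert abs(ft - 0.5 * fyy) < 1e-6
```

The residual is at the level of the discretization error, confirming ∂f/∂t = (1/2) ∂²f/∂y² along the flow.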

6. Conclusions

In this paper, we studied two conjectures on the derivatives of the differential entropy of a general random variable with added Gaussian noise. Regarding the conjecture on the signs of the derivatives made by Cheng and Geng, we introduced the linear matrix inequality approach to provide evidence that their original method might not generalize to orders higher than four. Instead, we considered imposing an additional constraint, namely the log-concavity assumption, and showed the optimality of Gaussian random variables for orders three, four and five. Thus, we made progress on McKean’s conjecture, under a mild condition.

Acknowledgments

The authors would like to thank Professor Chandra Nair for his valuable suggestions. The work of Venkat Anantharam was supported by the National Science Foundation (NSF) grants ECCS-1343398, CNS-1527846, CCF-1618145, the NSF Science and Technology Center grant CCF-0939370 (Science of Information), and the William and Flora Hewlett Foundation supported Center for Long Term Cybersecurity at Berkeley. The work of Yanlin Geng was supported in part by the National Natural Science Foundation of China under Grant 61601288, and the Science and Technology Commission of Shanghai Municipality under Grant 15YF1407900.

Author Contributions

Venkat Anantharam proposed the linear matrix inequality approach. Xiaobing Zhang performed the experiments to find the coefficients in the theorem and wrote the paper. Yanlin Geng proved the main results. Venkat Anantharam and Yanlin Geng reviewed and edited the manuscript. All authors read and approved the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 4

Proof. 
For Equation (26), according to Lemma 1, f(y, t) satisfies the following (heat) equation:
f_t := ∂f/∂t = (1/2) f_2,
where f_n denotes ∂^n f/∂y^n. In addition, according to Equation (5),
T_1 = f_1/f,  T_2 = f_2/f − f_1^2/f^2.
Hence,
2∂_t T_0 = 2∂_t ln f(y, t) = 2 f_t/f = f_2/f = T_2 + T_1^2.
Now, it follows that, for n ≥ 0,
2∂_t T_n = 2∂_t (∂^n/∂y^n) T_0 = (∂^n/∂y^n)(2∂_t T_0) = (∂^n/∂y^n)(T_2 + T_1^2) = T_{n+2} + ∑_{k=0}^{n} C_n^k T_{k+1} T_{n−k+1}.
For Equation (27),
(d/dt)E[A] = (d/dt) ∫ f A dy = ∫ (f_t A + f ∂_t A) dy = ∫ (f · (1/2)(f_2/f) A + f ∂_t A) dy = E[(1/2)(T_2 + T_1^2) A + ∂_t A].
For Equation (28), the derivative is
(d/dt)E[T_n^2] =(a) ∫ [f · (1/2)(T_2 + T_1^2) T_n^2 + f ∂_t(T_n^2)] dy = ∫ [f · (1/2)(T_2 + T_1^2) T_n^2 + f T_n × 2∂_t T_n] dy,
where (a) is due to Equation (27).
For the first term of the right-hand side, from Lemma 1 and integration by parts,
∫ f T_{n+1} T_n T_1 dy = ∫ f T_n T_1 dT_n = 0 − ∫ T_n (f_1 T_n T_1 + f T_{n+1} T_1 + f T_n T_2) dy,
hence
∫ f T_{n+1} T_n T_1 dy = −(1/2) ∫ f T_n^2 (T_2 + T_1^2) dy.
For the second term, we have
∫ f T_n × 2∂_t T_n dy =(b) ∫ f T_n (T_{n+2} + ∑_{k=0}^{n} C_n^k T_{k+1} T_{n−k+1}) dy
 = ∫ f T_n (2T_{n+1}T_1 + ∑_{k=1}^{n−1} C_n^k T_{k+1} T_{n−k+1}) dy + ∫ f T_n dT_{n+1}
 = ∫ f T_n (2T_{n+1}T_1 + ∑_{k=1}^{n−1} C_n^k T_{k+1} T_{n−k+1}) dy − ∫ T_{n+1} (f T_1 T_n + f T_{n+1}) dy
 = ∫ f (−T_{n+1}^2 + T_{n+1} T_n T_1 + T_n ∑_{k=1}^{n−1} C_n^k T_{k+1} T_{n−k+1}) dy,
where (b) is due to Equation (26).
Combining these two terms together, the third equality is proved. ☐
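Like the heat equation itself, the recursion in Equation (26) can be spot-checked numerically. For n = 1 it reads 2∂_t T_1 = T_3 + 2T_1T_2, and on the Gaussian heat solution (X ~ N(0, 1)) one has T_1 = −y/(1 + t), T_2 = −1/(1 + t), and T_3 = 0. The finite-difference check below is our illustration, with an arbitrary sample point:

```python
# Finite-difference check of 2 d/dt T_1 = T_3 + 2 T_1 T_2 on the
# Gaussian heat solution, where T_1 = -y/(1+t), T_2 = -1/(1+t), T_3 = 0.
def T1(y, t):
    return -y / (1.0 + t)

y, t, h = 0.7, 0.2, 1e-5
lhs = 2 * (T1(y, t + h) - T1(y, t - h)) / (2 * h)   # 2 d/dt T_1, central difference
rhs = 0.0 + 2 * T1(y, t) * (-1.0 / (1.0 + t))       # T_3 + 2 T_1 T_2
assert abs(lhs - rhs) < 1e-6
```

Both sides equal 2y/(1 + t)^2, in agreement with the general formula.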

References

  1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  2. Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control 1959, 2, 101–112. [Google Scholar] [CrossRef]
  3. Zhang, X.; Anantharam, V.; Geng, Y. Gaussian Extremality for Derivatives of Differential Entropy under the Additive Gaussian Noise Flow. IEEE Int. Symp. Inf. Theory 2018. submitted. [Google Scholar]
  4. Costa, M. A new entropy power inequality. IEEE Trans. Inf. Theory 1985, 31, 751–760. [Google Scholar] [CrossRef]
  5. Toscani, G. A concavity property for the reciprocal of Fisher information and its consequences on Costa’s EPI. Phys. A Stat. Mech. Appl. 2015, 432, 35–42. [Google Scholar] [CrossRef]
  6. Villani, C. A short proof of the “concavity of entropy power”. IEEE Trans. Inf. Theory 2000, 46, 1695–1696. [Google Scholar] [CrossRef]
  7. McKean, H.P. Speed of approach to equilibrium for Kac’s caricature of a Maxwellian gas. Arch. Ration. Mech. Anal. 1966, 21, 343–367. [Google Scholar] [CrossRef]
  8. Toscani, G. Entropy production and the rate of convergence to equilibrium for the Fokker-Planck equation. Q. Appl. Math. 1999, 57, 521–541. [Google Scholar] [CrossRef]
  9. Cheng, F.; Geng, Y. Higher order derivatives in Costa’s entropy power inequality. IEEE Trans. Inf. Theory 2015, 61, 5892–5905. [Google Scholar] [CrossRef]
  10. Bernstein, S. Sur les fonctions absolument monotones. Acta Math. 1929, 52, 1–66. [Google Scholar] [CrossRef]
  11. Guo, D.; Wu, Y.; Shitz, S.S.; Verdú, S. Estimation in Gaussian noise: Properties of the minimum mean-square error. IEEE Trans. Inf. Theory 2011, 57, 2371–2385. [Google Scholar]
  12. Wibisono, A.; Jog, V. Convexity of mutual information along the heat flow. arXiv, 2018; arXiv:1801.06968. [Google Scholar]
  13. Wang, L.; Madiman, M. Beyond the entropy power inequality, via rearrangements. IEEE Trans. Inf. Theory 2014, 60, 5116–5137. [Google Scholar] [CrossRef]
  14. Courtade, T.A. Concavity of entropy power: Equivalent formulations and generalizations. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 56–60. [Google Scholar]
  15. König, R.; Smith, G. The entropy power inequality for quantum systems. IEEE Trans. Inf. Theory 2014, 60, 1536–1548. [Google Scholar] [CrossRef]
  16. Rioul, O. Information theoretic proofs of entropy power inequalities. IEEE Trans. Inf. Theory 2011, 57, 33–55. [Google Scholar] [CrossRef]
  17. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 2006. [Google Scholar]
  18. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  19. Grant, M.; Boyd, S.; Ye, Y. CVX: Matlab Software for Disciplined Convex Programming. 2008. Available online: http://cvxr.com/cvx/ (accessed on 5 March 2018).
  20. Weingarten, H.; Steinberg, Y.; Shamai, S.S. The capacity region of the Gaussian multiple-input multiple-output broadcast channel. IEEE Trans. Inf. Theory 2006, 52, 3936–3964. [Google Scholar] [CrossRef]
  21. Geng, Y.; Nair, C. The capacity region of the two-receiver Gaussian vector broadcast channel with private and common messages. IEEE Trans. Inf. Theory 2014, 60, 2087–2104. [Google Scholar] [CrossRef]
  22. Toscani, G. Diffusion Equations and Entropy Inequalities. preprint 2016. Available online: http://mate.unipv.it/toscani/publi/Note-Ravello-2016.pdf (accessed on 5 March 2018).
