# A New Tight Upper Bound on the Entropy of Sums


Department of Electrical and Computer Engineering, American University of Beirut, Beirut 1107 2020, Lebanon

Author to whom correspondence should be addressed.

Academic Editor: Raúl Alcaraz Martínez

Received: 18 August 2015 / Revised: 8 December 2015 / Accepted: 14 December 2015 / Published: 19 December 2015

(This article belongs to the Section Information Theory, Probability and Statistics)

We consider the independent sum of a given random variable with a Gaussian variable and an infinitely divisible one. We find a novel tight upper bound on the entropy of the sum which holds even when the variable has an infinite second moment. The proven bound has several implications for both information theoretic problems and the transmission rates of infinitely divisible noise channels.

Information inequalities have been investigated since the foundation of information theory. Two such important ones are due to Shannon [1]:

- The first one is an upper bound on the entropy of Random Variables (RVs) having a finite second moment, by virtue of the fact that Gaussian distributions maximize entropy under a second moment constraint. (The (differential) entropy $h\left(Y\right)$ of a random variable Y having a probability density function $p\left(y\right)$ is defined as $$h\left(Y\right)=-{\int}_{-\infty}^{+\infty}p\left(y\right)\ln p\left(y\right)\,dy.$$) For independent RVs X and Z with variances ${\sigma}_{X}^{2}$ and ${\sigma}_{Z}^{2}$, the bound reads $$h(X+Z)\le \frac{1}{2}\ln 2\pi e\left({\sigma}_{X}^{2}+{\sigma}_{Z}^{2}\right).\qquad \mathrm{(1)}$$
- The second one is a lower bound on the entropy of independent sums of RVs, commonly known as the Entropy Power Inequality (EPI). The EPI states that given two real independent RVs X, Z such that $h\left(X\right)$, $h\left(Z\right)$ and $h(X+Z)$ exist, then (Corollary 3, [2]) $$N(X+Z)\ge N\left(X\right)+N\left(Z\right),\qquad \mathrm{(2)}$$ where the entropy power of X is defined as $$N\left(X\right)=\frac{1}{2\pi e}\,{e}^{2h\left(X\right)}.$$
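As a quick numerical illustration of these two classical facts (a minimal sketch, not from the paper; the chosen variances are arbitrary), the following checks that for independent Gaussians the entropy power of the sum is exactly additive, and that the entropy power of a Gaussian equals its variance:

```python
import math

def gaussian_entropy(var):
    # differential entropy of N(mu, var), in nats
    return 0.5 * math.log(2 * math.pi * math.e * var)

def entropy_power(h):
    # N(X) = e^{2 h(X)} / (2 pi e)
    return math.exp(2 * h) / (2 * math.pi * math.e)

var_x, var_z = 2.0, 3.0
# X + Z ~ N(., var_x + var_z) for independent Gaussians
h_sum = gaussian_entropy(var_x + var_z)

n_sum = entropy_power(h_sum)
n_x = entropy_power(gaussian_entropy(var_x))
n_z = entropy_power(gaussian_entropy(var_z))

# the EPI holds with equality for independent Gaussians:
assert abs(n_sum - (n_x + n_z)) < 1e-12
# the entropy power of a Gaussian equals its variance:
assert abs(n_x - var_x) < 1e-12
```

All entropies above are in nats, matching the natural logarithm used throughout the paper.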

While Shannon proposed Equation (2) and proved it locally around the normal distribution, Stam [3] was the first to prove the result in full generality, followed by Blachman [4] with what is considered a simplified proof. The proof relies on two information identities:

1. The Fisher Information Inequality (FII): let X and Z be two independent RVs such that the respective Fisher informations $J\left(X\right)$ and $J\left(Z\right)$ exist. (The Fisher information $J\left(Y\right)$ of a random variable Y having a probability density function $p\left(y\right)$ is defined as $$J\left(Y\right)={\int}_{-\infty}^{+\infty}\frac{{p}^{\prime 2}\left(y\right)}{p\left(y\right)}\,dy.$$) Then $$\frac{1}{J(X+Z)}\ge \frac{1}{J\left(X\right)}+\frac{1}{J\left(Z\right)}.\qquad \mathrm{(3)}$$
2. The de Bruijn identity: for a Gaussian RV Z with variance ${\sigma}^{2}$ independent of X, and any $\epsilon >0$, $$\frac{d}{d\epsilon}\,h(X+\sqrt{\epsilon}Z)=\frac{{\sigma}^{2}}{2}J(X+\sqrt{\epsilon}Z).\qquad \mathrm{(4)}$$ Rioul proved that de Bruijn's identity holds at $\epsilon ={0}^{+}$ for any finite-variance RV Z (Proposition 7, p. 39, [5]).
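Both identities can be checked in closed form when X is itself Gaussian, since every quantity is then explicit: $h(X+\sqrt{\epsilon}Z)=\frac{1}{2}\ln 2\pi e({\sigma}_{X}^{2}+\epsilon {\sigma}^{2})$ and $J(X+\sqrt{\epsilon}Z)=1/({\sigma}_{X}^{2}+\epsilon {\sigma}^{2})$. A minimal sketch (the variances are arbitrary choices) compares a finite-difference derivative with the right-hand side of de Bruijn's identity:

```python
import math

sigma_x2, sigma2 = 1.5, 0.7   # Var(X), Var(Z), both Gaussian

def h_sum(eps):
    # h(X + sqrt(eps) Z) for independent zero-mean Gaussians
    return 0.5 * math.log(2 * math.pi * math.e * (sigma_x2 + eps * sigma2))

def fisher_sum(eps):
    # the Fisher information of a Gaussian with variance v is 1/v
    return 1.0 / (sigma_x2 + eps * sigma2)

eps, d = 0.3, 1e-6
lhs = (h_sum(eps + d) - h_sum(eps - d)) / (2 * d)   # numerical d/d(eps)
rhs = 0.5 * sigma2 * fisher_sum(eps)                # de Bruijn's identity
assert abs(lhs - rhs) < 1e-6
```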

The remarkable similarity between Equations (2) and (3) was pointed out in Stam's paper [3], which in addition related the entropy power and the Fisher information by an "uncertainty principle-type" relation: $$N\left(X\right)J\left(X\right)\ge 1,\qquad \mathrm{(5)}$$ which is commonly known as the Isoperimetric Inequality for Entropies (IIE) (Theorem 16, [6]). Interestingly, equality holds in Equation (5) whenever X is Gaussian distributed, and in Equations (1)–(3) whenever X and Z are independent Gaussian.

When it comes to upper bounds, a bound on the discrete entropy of the sum exists [7]:

$$H(X+Z)\le H\left(X\right)+H\left(Z\right).\qquad \mathrm{(6)}$$

In addition, several identities involving the discrete entropy of sums were shown in [8,9] using the Plünnecke-Ruzsa sumset theory and its analogy to Shannon entropy. Except for Equation (1), which holds for finite-variance RVs, the differential entropy inequalities above provide in some sense only lower bounds on the entropy of sums of independent RVs. Equation (6) does not always hold for differential entropies, and unless the variance is finite, if we start with two RVs X and Z having finite differential entropies $h\left(X\right)$ and $h\left(Z\right)$, one has no clear control on how much $h(X+Z)$ can grow. The authors in [10] attributed this to the fact that discrete entropy has a functional submodularity property which differential entropy lacks. Nevertheless, they were able to derive various useful inequalities. Madiman [11] used basic information theoretic relations to prove the submodularity of the entropy of independent sums, and accordingly found upper bounds on the discrete and differential entropy of sums. Though, in its general form, the problem of upper bounding the differential entropy of independent sums is not always solvable (Proposition 4, [2]), several results are known in particular settings. Cover et al. [12] solved the problem of maximizing the differential entropy of the sum of dependent RVs having the same marginal log-concave densities. In [13], Ordentlich found the maximizing probability distribution for the differential entropy of the independent sum of n finitely supported symmetric RVs. For "sufficiently convex" probability distributions, an interesting reverse EPI was proven in (Theorem 1.1, p. 63, [14]).

In this study, we find a tight upper bound on the (differential) entropy of the independent sum of a RV X, not necessarily having a finite variance, with an infinitely divisible variable having a Gaussian component. The proof is based on the infinite divisibility property together with the FII, de Bruijn's identity, a proven concavity result for the differential entropy, and a novel "de Bruijn type" identity. We use convolutions along small perturbations to upper bound the relevant information theoretic quantities, as done in [15], where, however, moment constraints were imposed on X, which is not the case here. The novel bound presented in this paper is, for example, useful when studying Gaussian channels or when the additive noise is modeled as a combination of Gaussian and Poisson variables [16]. It has several implications, which are listed in Section 2, and can possibly be used for variables with infinite second moments. Even when the second moment of X is finite, in some cases our bound can be tighter than Equation (1).

We consider the following scalar RV:

$$Y=X+Z,\qquad \mathrm{(7)}$$

where $Z={Z}_{1}+{Z}_{2}$ is a composite infinitely divisible RV that is independent of X and where:

- ${Z}_{1}\sim \mathcal{N}({\mu}_{1},{\sigma}_{1}^{2})$ is a Gaussian RV with mean ${\mu}_{1}$ and positive variance ${\sigma}_{1}^{2}$.
- ${Z}_{2}$ is an infinitely divisible RV with mean ${\mu}_{2}$ and finite (possibly zero) variance ${\sigma}_{2}^{2}$ that is independent of ${Z}_{1}$.

We note that since ${Z}_{1}$ is absolutely continuous with a bounded Probability Density Function (PDF), so are $Z={Z}_{1}+{Z}_{2}$ and $Y=X+Z$ for any RV X [17]. In addition, we define the set $\mathcal{L}$ of distribution functions $F\left(x\right)$ that have a finite logarithmic moment:

$$\mathcal{L}=\left\{F:\int \ln \left(1+\left|x\right|\right)\,dF\left(x\right)<+\infty \right\}.$$

Let $\mathrm{E}\left[\cdot \right]$ denote the expectation operator. Using the identity $\ln \left(1+|x+n|\right)\le \ln \left(1+\left|x\right|\right)+\ln \left(1+\left|n\right|\right)$,

$$\mathrm{E}\left[\ln \left(1+\left|Y\right|\right)\right]=\mathrm{E}\left[\ln \left(1+|X+Z|\right)\right]\le \mathrm{E}\left[\ln \left(1+\left|X\right|\right)\right]+\mathrm{E}\left[\ln \left(1+\left|Z\right|\right)\right].\qquad \mathrm{(8)}$$

Since Z has a bounded PDF and a finite variance, it necessarily has a finite logarithmic moment, and by virtue of Equation (8), if $X\in \mathcal{L}$ then $Y\in \mathcal{L}$.
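The elementary inequality used above follows from $1+|x+n|\le 1+|x|+|n|\le (1+|x|)(1+|n|)$; it can be spot-checked numerically (a trivial sketch over random pairs):

```python
import math
import random

random.seed(0)
for _ in range(10_000):
    x = random.uniform(-100, 100)
    n = random.uniform(-100, 100)
    # ln(1+|x+n|) <= ln(1+|x|) + ln(1+|n|), since 1+|x+n| <= (1+|x|)(1+|n|)
    assert math.log1p(abs(x + n)) <= math.log1p(abs(x)) + math.log1p(abs(n)) + 1e-12
```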

Under this finite logarithmic constraint, the differential entropy of X is well defined and is such that $-\infty \le h\left(X\right)<+\infty $ (Proposition 1, [5]) and that of Y exists and is finite (Lemma 1, [5]). Also, when $X\in \mathcal{L}$, the identity $I(X+Z;Z)=h(X+Z)-h\left(X\right)$ always holds (Lemma 1, [5]).

The main result of this work stated in Theorem 1 is a novel upper bound on $h\left(Y\right)$ whenever $X\in \mathcal{L}$ with finite differential entropy $h\left(X\right)$ and finite Fisher information $J\left(X\right)$.

Theorem 1. Let $X\in \mathcal{L}$ have finite $h\left(X\right)$ and $J\left(X\right)$. The differential entropy of $X+Z$ is upper bounded by:

$$h(X+Z)\le h\left(X\right)+\frac{1}{2}\ln \left(1+{\sigma}_{1}^{2}J\left(X\right)\right)+{\sigma}_{2}^{2}\,\min \left\{\underset{x}{\sup}\frac{D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right)}{{x}^{2}};\frac{1}{2{\sigma}_{1}^{2}}\right\},\qquad \mathrm{(9)}$$

where the Kullback-Leibler divergence between probability distributions p and q is denoted $D(p\parallel q)$. In the case where $Z\sim \mathcal{N}({\mu}_{1},{\sigma}_{1}^{2})$, i.e., ${\sigma}_{2}^{2}=0$, we have

$$h(X+Z)\le h\left(X\right)+\frac{1}{2}\ln \left(1+{\sigma}_{1}^{2}J\left(X\right)\right),\qquad \mathrm{(10)}$$

and equality holds if and only if both X and Z are Gaussian distributed.
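As a sanity check of the equality case (a minimal numerical sketch; the chosen variances are arbitrary): for Gaussian X one has $J\left(X\right)=1/{\sigma}_{X}^{2}$, and Equation (10) holds with equality, since $\frac{1}{2}\ln 2\pi e{\sigma}_{X}^{2}+\frac{1}{2}\ln (1+{\sigma}_{1}^{2}/{\sigma}_{X}^{2})=\frac{1}{2}\ln 2\pi e({\sigma}_{X}^{2}+{\sigma}_{1}^{2})$:

```python
import math

def h_gauss(var):
    # differential entropy of a Gaussian with the given variance, in nats
    return 0.5 * math.log(2 * math.pi * math.e * var)

var_x, var_1 = 2.0, 0.5
lhs = h_gauss(var_x + var_1)   # h(X + Z) for independent Gaussians
# right-hand side of Equation (10), with J(X) = 1/var_x for Gaussian X
bound = h_gauss(var_x) + 0.5 * math.log(1 + var_1 / var_x)
assert abs(lhs - bound) < 1e-12   # equality when both X and Z are Gaussian
```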

We defer the proof of Theorem 1 to Section 3. The rest of this section is dedicated to the implications of inequality Equation (10), which are fivefold:

1. While the usefulness of this upper bound is clear for RVs X having an infinite second moment, for which Equation (1) fails, it can in some cases provide a tighter upper bound than Shannon's for finite second moment variables X. This is the case, for example, when $Z\sim \mathcal{N}({\mu}_{1},{\sigma}_{1}^{2})$ and X is a RV having the PDF $${p}_{X}\left(x\right)=\left\{\begin{array}{ll}f(x+a)&\quad -1-a\le x\le 1-a\\ f(x-a)&\quad -1+a\le x\le 1+a,\end{array}\right.$$ where $a\ge 1$ (so that the two pieces have disjoint supports) and $$f\left(x\right)=\left\{\begin{array}{ll}\frac{3}{4}{(1+x)}^{2}&\quad -1\le x\le 0\\ \frac{3}{4}{(1-x)}^{2}&\quad 0<x\le 1\\ 0&\quad \mathrm{otherwise}.\end{array}\right.$$ Direct computation gives $h\left(X\right)=\ln \frac{4}{3}+\frac{2}{3}$, $J\left(X\right)=12$ and ${\sigma}_{X}^{2}={a}^{2}+\frac{1}{10}$, so that Equation (10) yields $$h(X+Z)\le h\left(X\right)+\frac{1}{2}\ln \left(1+{\sigma}_{1}^{2}J\left(X\right)\right)=\ln \frac{4}{3}+\frac{2}{3}+\frac{1}{2}\ln (1+12\,{\sigma}_{1}^{2}),$$ while Shannon's bound Equation (1) gives $$h(X+Z)\le \frac{1}{2}\ln 2\pi e\left({\sigma}_{X}^{2}+{\sigma}_{1}^{2}\right)=\frac{1}{2}\ln 2\pi e+\frac{1}{2}\ln \left({a}^{2}+\frac{1}{10}+{\sigma}_{1}^{2}\right).$$ Since the latter grows with the separation parameter a while the former does not depend on a, the new bound is the tighter of the two whenever a is large enough.
2. Theorem 1 gives an analytical bound on the change in the transmission rate of the linear Gaussian channel under an input scaling operation. In fact, let X be a RV satisfying the conditions of Theorem 1 and $Z\sim \mathcal{N}({\mu}_{1},{\sigma}_{1}^{2})$. Then $aX$ satisfies similar conditions for any positive scalar a. Hence $$h(aX+Z)\le h\left(aX\right)+\frac{1}{2}\ln \left(1+{\sigma}_{1}^{2}J\left(aX\right)\right)=h\left(X\right)+\ln a+\frac{1}{2}\ln \left(1+\frac{{\sigma}_{1}^{2}}{{a}^{2}}J\left(X\right)\right),$$ which, together with $h(X+Z)\ge h\left(X\right)$, implies $$I(aX+Z;X)-I(X+Z;X)\le \frac{1}{2}\ln \left({a}^{2}+{\sigma}_{1}^{2}J\left(X\right)\right).$$
3. If the EPI is regarded as a lower bound on the entropy of sums, Equation (10) can be considered as its upper bound counterpart whenever one of the variables is Gaussian. In fact, using both of these inequalities gives: $$N\left(X\right)+N\left(Z\right)\le N\left(Y\right)\le N\left(X\right)+N\left(Z\right)\left[N\left(X\right)J\left(X\right)\right].$$
4. The result of Theorem 1 is stronger than the IIE in Equation (5). Indeed, using the fact that $h\left(Z\right)\le h(X+Z)$, inequality Equation (10) gives the looser inequality $$h\left(Z\right)\le h\left(X\right)+\frac{1}{2}\ln \left(1+{\sigma}^{2}J\left(X\right)\right),$$ where ${\sigma}^{2}$ is the variance of the Gaussian Z, which is equivalent to $$N\left(X\right)J\left(X\right)\ge \frac{{\sigma}^{2}J\left(X\right)}{1+{\sigma}^{2}J\left(X\right)}.$$ Applying this to $aX$ for $a>0$, and using the scaling properties $N\left(aX\right)={a}^{2}N\left(X\right)$ and $J\left(aX\right)=J\left(X\right)/{a}^{2}$, $$N\left(aX\right)J\left(aX\right)=N\left(X\right)J\left(X\right)\ge \frac{{\sigma}^{2}J\left(X\right)}{{a}^{2}+{\sigma}^{2}J\left(X\right)},$$ and letting $a\to {0}^{+}$ recovers the IIE $N\left(X\right)J\left(X\right)\ge 1$.
5. Finally, in the context of communicating over a channel, it is well known that, under a second moment constraint, the best way to "fight" Gaussian noise is to use Gaussian inputs; this follows from the fact that Gaussian variables maximize entropy under a second moment constraint. Conversely, when using a Gaussian input, the worst noise in terms of minimizing the transmission rate is also Gaussian: a direct result of the EPI, and again due to the fact that Gaussian distributions have the highest entropy and are therefore the worst noise to deal with. A similar statement can be made where, instead of the second moment, the Fisher information is constrained: if the input X is subject to a Fisher information constraint $J\left(X\right)\le A$ for some $A>0$, then the input minimizing the mutual information of the additive white Gaussian noise channel is Gaussian distributed. This is a result of the EPI in Equation (2) and the IIE in Equation (5), which in this setting reduce to $$\mathrm{arg}\underset{X:J\left(X\right)\le A}{\min}h(X+Z)\sim \mathcal{N}\left(0;\frac{1}{A}\right).$$ The dual question is to characterize the worst noise Z under a Fisher information constraint, i.e., $$\mathrm{arg}\underset{Z:J\left(Z\right)\le A}{\max}h(X+Z).$$ For a Gaussian input X with power p, applying Theorem 1 with the roles of input and noise interchanged yields $$I(Y;X)\le \frac{1}{2}\ln \left(1+pJ\left(Z\right)\right),$$ with equality when the noise Z is Gaussian.
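Returning to the first implication, the claimed comparison with Shannon's bound can be reproduced numerically. The values $h\left(X\right)=\ln \frac{4}{3}+\frac{2}{3}$, $J\left(X\right)=12$ and ${\sigma}_{X}^{2}={a}^{2}+\frac{1}{10}$ are those of the two-bump density of implication 1; the noise variance below is an arbitrary choice:

```python
import math

sigma1_sq = 0.5                  # Gaussian noise variance (arbitrary choice)
h_x = math.log(4 / 3) + 2 / 3    # differential entropy of the two-bump density
j_x = 12.0                       # its Fisher information
new_bound = h_x + 0.5 * math.log(1 + sigma1_sq * j_x)   # Equation (10)

def shannon_bound(a):
    # Equation (1) with sigma_X^2 = a^2 + 1/10 for bump separation 2a
    return 0.5 * math.log(2 * math.pi * math.e * (a * a + 0.1 + sigma1_sq))

# Shannon's bound grows like ln(a) while the new bound does not depend on a,
# so the new bound wins once the two bumps are far apart.
assert shannon_bound(1.0) < new_bound    # small separation: Shannon tighter
assert shannon_bound(10.0) > new_bound   # large separation: new bound tighter
```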

Let U be an infinitely divisible RV with characteristic function ${\varphi}_{U}\left(\omega \right)$. (The characteristic function $\varphi \left(\omega \right)$ of a probability distribution function ${F}_{U}\left(u\right)$ is defined by $${\varphi}_{U}\left(\omega \right)={\int}_{\mathbb{R}}{e}^{i\omega u}\,d{F}_{U}\left(u\right),\phantom{\rule{2.em}{0ex}}\omega \in \mathbb{R},$$ which is the Fourier transform of ${F}_{U}\left(u\right)$ at $-\omega $.) For each real $t\ge 0$, denote by ${F}_{t}(\cdot)$ the unique probability distribution (Theorem 2.3.9, p. 65, [18]) with characteristic function $${\varphi}_{t}\left(\omega \right)={e}^{t\ln {\varphi}_{U}\left(\omega \right)},\qquad \mathrm{(15)}$$ where $\ln (\cdot)$ is the principal branch of the logarithm. For the rest of this paper, we denote by ${U}_{t}$ a RV with characteristic function ${\varphi}_{t}\left(\omega \right)$ as defined in Equation (15). Note that ${U}_{0}$ is deterministically equal to 0 (i.e., distributed according to the Dirac delta distribution) and ${U}_{1}$ is distributed according to U. The family of probability distributions ${\left\{{F}_{t}(\cdot)\right\}}_{t\ge 0}$ forms a continuous convolution semi-group in the space of probability measures on $\mathbb{R}$ (see Definition 2.3.8 and Theorem 2.3.9, [18]), and hence one can write $${U}_{s}+{U}_{t}={U}_{s+t}\phantom{\rule{2.em}{0ex}}\forall s,t\ge 0,$$ where ${U}_{s}$ and ${U}_{t}$ are independent.
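The convolution semi-group property can be illustrated on a concrete infinitely divisible law. The sketch below takes U Poisson with an arbitrary rate, for which $\ln {\varphi}_{U}\left(\omega \right)=\lambda ({e}^{i\omega}-1)$, and checks ${\varphi}_{s}{\varphi}_{t}={\varphi}_{s+t}$ at a few frequencies:

```python
import cmath

lam = 2.0  # rate of a Poisson U, an infinitely divisible law (arbitrary choice)

def phi_t(t, w):
    # phi_t(w) = exp(t * ln phi_U(w)), with ln phi_U(w) = lam * (e^{iw} - 1)
    return cmath.exp(t * lam * (cmath.exp(1j * w) - 1))

s, t = 0.4, 1.1
for w in (-2.0, 0.3, 1.7):
    # semi-group property U_s + U_t ~ U_{s+t}, in characteristic-function form
    assert abs(phi_t(s, w) * phi_t(t, w) - phi_t(s + t, w)) < 1e-12
```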

Lemma 1. Let U be an infinitely divisible RV and ${\left\{{U}_{t}\right\}}_{t\ge 0}$ an associated family of RVs distributed according to Equation (15) and independent of X. The differential entropy $h(X+{U}_{t})$ is a concave function of $t\ge 0$.

In the case of a Gaussian-distributed U, the family ${\left\{{U}_{t}\right\}}_{t\ge 0}$ has the same distribution as ${\left\{\sqrt{t}U\right\}}_{t}$, and it is already known that the entropy (and actually even the entropy power) of Y is concave in t ((Section VII, p. 51, [5]) and [19]).

Proof. We start by noting that $h(X+{U}_{t})$ is non-decreasing in t: for $0\le s<t$,
$$h(X+{U}_{t})=h(X+{U}_{s}+{U}_{t-s})\ge h(X+{U}_{s}),$$
where ${U}_{t}$, ${U}_{s}$ and ${U}_{t-s}$ are three independent instances of RVs in the family ${\left\{{U}_{t}\right\}}_{t\ge 0}$. Next we show that $h(X+{U}_{t})$ is midpoint concave. Let ${U}_{t}$, ${U}_{s}$, ${U}_{(t+s)/2}$ and ${U}_{(t-s)/2}$ be independent RVs in the family ${\left\{{U}_{t}\right\}}_{t\ge 0}$. For $0\le s<t$,
$$\begin{aligned}h(X+{U}_{t})-h(X+{U}_{(t+s)/2})&=h(X+{U}_{(t+s)/2}+{U}_{(t-s)/2})-h(X+{U}_{(t+s)/2})\\ &=I(X+{U}_{(t+s)/2}+{U}_{(t-s)/2};{U}_{(t-s)/2})\qquad \mathrm{(16)}\\ &\le I(X+{U}_{s}+{U}_{(t-s)/2};{U}_{(t-s)/2})\qquad \mathrm{(17)}\\ &=h(X+{U}_{(t+s)/2})-h(X+{U}_{s}),\end{aligned}$$
where Equation (16) is the definition of the mutual information and Equation (17) is the application of the data processing inequality to the Markov chain ${U}_{(t-s)/2}-(X+{U}_{s}+{U}_{(t-s)/2})-(X+{U}_{(t+s)/2}+{U}_{(t-s)/2})$. Therefore,
$$h(X+{U}_{(t+s)/2})\ge \frac{1}{2}\left[h(X+{U}_{t})+h(X+{U}_{s})\right],$$
and the function is midpoint concave for $t\ge 0$. Since the function is non-decreasing, it is Lebesgue measurable, and midpoint concavity together with measurability guarantees concavity. ☐

An interesting implication of Lemma 1 is that $h(X+{U}_{t})$, as a function of t, lies below any of its tangents. In particular,

$$h(X+{U}_{t})\le h\left(X\right)+t\,\frac{dh(X+{U}_{t})}{dt}{\Big|}_{t=0}.$$
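In the Gaussian case this tangent-line bound is explicit: $h(X+{U}_{t})=\frac{1}{2}\ln 2\pi e({\sigma}_{X}^{2}+t{\sigma}_{U}^{2})$, and the slope at $t=0$ is $\frac{{\sigma}_{U}^{2}}{2}J\left(X\right)$ by de Bruijn's identity. A minimal numerical sketch (the variances are arbitrary choices):

```python
import math

sigma_x2, sigma_u2 = 1.0, 2.0   # Var(X), Var(U), both Gaussian

def h(t):
    # Gaussian case: h(X + U_t) = 0.5 * ln(2 pi e (Var(X) + t Var(U)))
    return 0.5 * math.log(2 * math.pi * math.e * (sigma_x2 + t * sigma_u2))

slope0 = sigma_u2 / (2 * sigma_x2)   # de Bruijn: (Var(U)/2) J(X), J(X) = 1/Var(X)
for t in (0.1, 0.5, 1.0, 5.0):
    assert h(t) <= h(0) + t * slope0  # concave curve lies below its tangent
```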

Define the non-negative quantity

$${C}_{X}=\underset{x\ne 0}{\sup}\frac{D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right)}{{x}^{2}},\qquad \mathrm{(19)}$$

which is possibly infinite if the supremum is not finite. We first note that ${C}_{X}$ has the following interesting properties:

- It was found by Verdu [20] to be equal to the channel capacity per unit cost of the linear average power constrained additive noise channel where the noise is independent of the input and is distributed according to X.
- Using the above interpretation, one can infer that for independent RVs X and W, $${C}_{X+W}\le {C}_{X}.\qquad \mathrm{(20)}$$ This also follows directly from the data processing inequality for the divergence, which gives $$D\left({p}_{X+W}(u-x)\parallel {p}_{X+W}\left(u\right)\right)\le D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right).$$
- Using Kullback’s well-known result on the divergence (Section 2.6, [21]),$${C}_{X}\ge \underset{x\to {0}^{+}}{lim}\frac{D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right)}{{x}^{2}}=\frac{1}{2}J\left(X\right).$$
- Whenever the supremum in Equation (19) is achieved in the limit $x\to 0$, $${C}_{X}=\frac{1}{2}J\left(X\right).$$
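For Gaussian X the ratio $D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right)/{x}^{2}$ equals $\frac{1}{2{\sigma}^{2}}$ for every x, so ${C}_{X}=\frac{1}{2}J\left(X\right)$ and the lower bound above is tight. The following sketch (arbitrary variance, crude Riemann-sum integration of the divergence) verifies this numerically:

```python
import math

sigma2 = 1.3   # variance of the Gaussian X (arbitrary choice)

def pdf(u, mean=0.0):
    return math.exp(-(u - mean) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def kl_shift(x, lo=-20.0, hi=20.0, n=80_000):
    # midpoint-rule integration of D(p_X(u - x) || p_X(u))
    du = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * du
        p, q = pdf(u, x), pdf(u)
        total += p * math.log(p / q) * du
    return total

j_x = 1.0 / sigma2   # Fisher information of N(0, sigma2)
for x in (0.5, 2.0):
    # D / x^2 is constant in x and equals J(X)/2, so C_X = J(X)/2
    assert abs(kl_shift(x) / (x * x) - 0.5 * j_x) < 1e-3
```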

Next, we prove the following lemma.

Lemma 2. Let U be an infinitely divisible real RV with finite variance ${\sigma}_{U}^{2}$ and ${\left\{{U}_{t}\right\}}_{t\ge 0}$ the associated family of RVs distributed according to Equation (15). Let X be an independent real variable satisfying the following:

- X has a positive PDF ${p}_{X}\left(x\right)$.
- The integrals ${\int}_{\mathbb{R}}{\left|\omega \right|}^{k}\left|{\varphi}_{X}\left(\omega \right)\right|\,d\omega$ are finite for all $k\in \mathbb{N}\backslash \left\{0\right\}$.
- ${C}_{X}=\underset{x\ne 0}{\sup}\frac{D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right)}{{x}^{2}}$ is finite.

Then, for every $\tau \ge 0$,

$$\frac{{d}^{+}}{dt}h(X+{U}_{t}){\Big|}_{t=\tau}=\underset{t\to {0}^{+}}{\lim}\frac{h(X+{U}_{\tau +t})-h(X+{U}_{\tau})}{t}\le {\sigma}_{U}^{2}\,{C}_{X}.\qquad \mathrm{(21)}$$

Before we proceed to the proof, we note that by Lemma 1, $h(X+{U}_{\tau})$ is concave in the real-valued τ and hence it is everywhere right and left differentiable.

Proof. Using Equation (15), the characteristic function of the independent sum $X+{U}_{\tau +t}$ for $\tau \ge 0$ and small $t>0$ is:
$$\begin{aligned}{\varphi}_{X+{U}_{\tau +t}}\left(\omega \right)&={\varphi}_{X+{U}_{\tau}}\left(\omega \right){\varphi}_{{U}_{t}}\left(\omega \right)={\varphi}_{X+{U}_{\tau}}\left(\omega \right)\exp \left[t\ln {\varphi}_{U}\left(\omega \right)\right]\\ &={\varphi}_{X+{U}_{\tau}}\left(\omega \right)+\left[{\varphi}_{X+{U}_{\tau}}\left(\omega \right)\,\ln {\varphi}_{U}\left(\omega \right)\right]t+o\left(t\right).\end{aligned}$$
Taking the inverse Fourier transform yields
$${p}_{X+{U}_{\tau +t}}\left(y\right)={p}_{X+{U}_{\tau}}\left(y\right)+t\,{\mathcal{F}}^{-1}\left\{{\varphi}_{X+{U}_{\tau}}\left(\omega \right)\,\ln {\varphi}_{U}\left(\omega \right)\right\}(-y)+o\left(t\right),\qquad \mathrm{(22)}$$
where ${\mathcal{F}}^{-1}$ denotes the inverse distributional Fourier transform operator. Equation (22) holds true when ${\mathcal{F}}^{-1}\left\{{\varphi}_{X+{U}_{\tau}}\left(\omega \right)\,{\ln}^{k}{\varphi}_{U}\left(\omega \right)\right\}\left(y\right)$ exists for all $k\in \mathbb{N}\backslash \left\{0\right\}$, which is the case. Indeed, using the Kolmogorov representation of the characteristic function of an infinitely divisible RV (Theorem 7.7, [22]), we know that there exists a unique distribution function ${H}_{U}\left(x\right)$ associated with U such that
$$\ln \left[{\varphi}_{U}\left(\omega \right)\right]={\sigma}_{U}^{2}{\int}_{\mathbb{R}}\left({e}^{i\omega x}-1-i\omega x\right)\frac{1}{{x}^{2}}\,d{H}_{U}\left(x\right).\qquad \mathrm{(23)}$$

Furthermore, since $\left|{e}^{i\omega x}-1-i\omega x\right|\le \frac{{\omega}^{2}\,{x}^{2}}{2}$ (p. 179, [22]),
$$\left|\ln \left[{\varphi}_{U}\left(\omega \right)\right]\right|\le {\sigma}_{U}^{2}{\int}_{\mathbb{R}}\left|{e}^{i\omega x}-1-i\omega x\right|\frac{1}{{x}^{2}}\,d{H}_{U}\left(x\right)\le \frac{{\sigma}_{U}^{2}}{2}{\omega}^{2},$$
which implies that
$$\begin{aligned}{\int}_{\mathbb{R}}\left|{\varphi}_{X+{U}_{\tau}}\left(\omega \right)\,{\ln}^{k}{\varphi}_{U}\left(\omega \right)\right|\,d\omega &\le {\int}_{\mathbb{R}}\left|{\varphi}_{X}\left(\omega \right)\right|\left|{\varphi}_{{U}_{\tau}}\left(\omega \right)\right|{\left|\ln {\varphi}_{U}\left(\omega \right)\right|}^{k}\,d\omega \\ &\le {\left(\frac{{\sigma}_{U}^{2}}{2}\right)}^{k}{\int}_{\mathbb{R}}\left|{\varphi}_{X}\left(\omega \right)\right|{\omega}^{2k}\,d\omega ,\end{aligned}$$
which is finite under the conditions of the lemma; hence ${\mathcal{F}}^{-1}\left\{{\varphi}_{X+{U}_{\tau}}\left(\omega \right)\,{\ln}^{k}{\varphi}_{U}\left(\omega \right)\right\}\left(y\right)$ exists. Using the definition of the derivative, Equation (22) implies that:
$$\frac{d\,{p}_{X+{U}_{t}}\left(y\right)}{dt}{\Big|}_{t=\tau}={\mathcal{F}}^{-1}\left\{{\varphi}_{X+{U}_{\tau}}\left(\omega \right)\,\ln {\varphi}_{U}\left(\omega \right)\right\}(-y).$$

Using the Mean Value Theorem, for some $0<h\left(t\right)<t$,
$$\begin{aligned}\frac{h(X+{U}_{\tau +t})-h(X+{U}_{\tau})}{t}&=-{\int}_{\mathbb{R}}\frac{{p}_{X+{U}_{\tau +t}}\left(y\right)\,\ln {p}_{X+{U}_{\tau +t}}\left(y\right)-{p}_{X+{U}_{\tau}}\left(y\right)\,\ln {p}_{X+{U}_{\tau}}\left(y\right)}{t}\,dy\\ &=-{\int}_{\mathbb{R}}\left(1+\ln {p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)\right)\frac{d\,{p}_{X+{U}_{t}}\left(y\right)}{dt}{\Big|}_{\tau +h\left(t\right)}\,dy\\ &=-{\int}_{\mathbb{R}}\left(1+\ln {p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)\right){\mathcal{F}}^{-1}\left\{{\varphi}_{X+{U}_{\tau +h\left(t\right)}}\left(\omega \right)\,\ln {\varphi}_{U}\left(\omega \right)\right\}(-y)\,dy\\ &=-{\int}_{\mathbb{R}}\ln {p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)\,{\mathcal{F}}^{-1}\left\{{\varphi}_{X+{U}_{\tau +h\left(t\right)}}\left(\omega \right)\,\ln {\varphi}_{U}\left(\omega \right)\right\}(-y)\,dy,\qquad \mathrm{(24)}\end{aligned}$$
where Equation (24) is justified by the fact that ${\varphi}_{X}\left(0\right)={\varphi}_{U}\left(0\right)=1$. We proceed next to evaluate the inverse Fourier transform in the integrand of Equation (24):
$$\begin{aligned}{\mathcal{F}}^{-1}&\left\{{\varphi}_{X+{U}_{\tau +h\left(t\right)}}\left(\omega \right)\,\ln {\varphi}_{U}\left(\omega \right)\right\}(-y)=\frac{1}{2\pi}{\int}_{\mathbb{R}}{\varphi}_{X+{U}_{\tau +h\left(t\right)}}\left(\omega \right)\,\ln {\varphi}_{U}\left(\omega \right){e}^{-i\omega y}\,d\omega \\ &=\frac{{\sigma}_{U}^{2}}{2\pi}{\int}_{\mathbb{R}}{\varphi}_{X+{U}_{\tau +h\left(t\right)}}\left(\omega \right)\,{\int}_{\mathbb{R}}\left({e}^{i\omega x}-1-i\omega x\right)\,\frac{1}{{x}^{2}}\,d{H}_{U}\left(x\right)\,{e}^{-i\omega y}\,d\omega \qquad \mathrm{(25)}\\ &=\frac{{\sigma}_{U}^{2}}{2\pi}{\int}_{\mathbb{R}}{\int}_{\mathbb{R}}{\varphi}_{X+{U}_{\tau +h\left(t\right)}}\left(\omega \right)\left({e}^{i\omega x}-1-i\omega x\right)\,{e}^{-i\omega y}\,d\omega \,\frac{1}{{x}^{2}}\,d{H}_{U}\left(x\right)\qquad \mathrm{(26)}\\ &={\sigma}_{U}^{2}{\int}_{\mathbb{R}}\left({p}_{X+{U}_{\tau +h\left(t\right)}}(y-x)-{p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)-x\,{p}_{X+{U}_{\tau +h\left(t\right)}}^{\prime}\left(y\right)\right)\,\frac{1}{{x}^{2}}\,d{H}_{U}\left(x\right),\end{aligned}$$
where the last equation is due to standard properties of the Fourier transform and Equation (25) is due to Equation (23). The interchange in Equation (26) is justified by Fubini: indeed, $|{e}^{i\omega x}-1-i\omega x|\le \frac{{\omega}^{2}\,{x}^{2}}{2}$ and
$$\begin{aligned}{\int}_{\mathbb{R}}{\int}_{\mathbb{R}}&\left|{\varphi}_{X+{U}_{\tau +h\left(t\right)}}\left(\omega \right)\left({e}^{i\omega x}-1-i\omega x\right)\,{e}^{-i\omega y}\,\frac{1}{{x}^{2}}\right|\,d\omega \,d{H}_{U}\left(x\right)\\ &\le {\int}_{\mathbb{R}}{\int}_{\mathbb{R}}\left|{\varphi}_{X}\left(\omega \right)\right|\,\frac{{\omega}^{2}}{2}\,d\omega \,d{H}_{U}\left(x\right)\le {\int}_{\mathbb{R}}\left|{\varphi}_{X}\left(\omega \right)\right|\,\frac{{\omega}^{2}}{2}\,d\omega ,\end{aligned}$$
which is finite by assumption. Back to Equation (24),
$$\begin{aligned}&\frac{h(X+{U}_{\tau +t})-h(X+{U}_{\tau})}{t}\\ &=-{\int}_{\mathbb{R}}\ln {p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)\,{\mathcal{F}}^{-1}\left\{{\varphi}_{X+{U}_{\tau +h\left(t\right)}}\left(\omega \right)\,\ln {\varphi}_{U}\left(\omega \right)\right\}(-y)\,dy\\ &=-{\sigma}_{U}^{2}{\int}_{\mathbb{R}}{\int}_{\mathbb{R}}\ln {p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)\left({p}_{X+{U}_{\tau +h\left(t\right)}}(y-x)-{p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)-x\,{p}_{X+{U}_{\tau +h\left(t\right)}}^{\prime}\left(y\right)\right)\,\frac{1}{{x}^{2}}\,d{H}_{U}\left(x\right)\,dy\\ &=-{\sigma}_{U}^{2}{\int}_{\mathbb{R}}{\int}_{\mathbb{R}}\ln {p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)\left({p}_{X+{U}_{\tau +h\left(t\right)}}(y-x)-{p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)-x\,{p}_{X+{U}_{\tau +h\left(t\right)}}^{\prime}\left(y\right)\right)\,dy\,\frac{1}{{x}^{2}}\,d{H}_{U}\left(x\right),\qquad \mathrm{(27)}\end{aligned}$$
where the interchange in the order of integration in Equation (27) will be validated next by Fubini. Considering the inner integral,
$$\begin{aligned}-{\int}_{\mathbb{R}}&\ln {p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)\left({p}_{X+{U}_{\tau +h\left(t\right)}}(y-x)-{p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right)-x\,{p}_{X+{U}_{\tau +h\left(t\right)}}^{\prime}\left(y\right)\right)\,dy\\ &=D({p}_{X+{U}_{\tau +h\left(t\right)}}(y-x)\parallel {p}_{X+{U}_{\tau +h\left(t\right)}}\left(y\right))+x{\int}_{\mathbb{R}}{p}_{X+{U}_{\tau +h\left(t\right)}}^{\prime}\left(y\right)\,dy\\ &=D({p}_{X+{U}_{\tau +h\left(t\right)}}(u-x)\parallel {p}_{X+{U}_{\tau +h\left(t\right)}}\left(u\right)).\end{aligned}$$

Finally, Equation (27) gives
$$\begin{aligned}\frac{h(X+{U}_{\tau +t})-h(X+{U}_{\tau})}{t}&={\sigma}_{U}^{2}{\int}_{\mathbb{R}}D({p}_{X+{U}_{\tau +h\left(t\right)}}(u-x)\parallel {p}_{X+{U}_{\tau +h\left(t\right)}}\left(u\right))\frac{1}{{x}^{2}}\,d{H}_{U}\left(x\right)\qquad \mathrm{(28)}\\ &\le {\sigma}_{U}^{2}\,{C}_{X+{U}_{\tau +h\left(t\right)}}\qquad \mathrm{(29)}\\ &\le {\sigma}_{U}^{2}\,{C}_{X},\qquad \mathrm{(30)}\end{aligned}$$
which is finite. Equation (29) is due to the definition Equation (19) and Equation (30) is due to Equation (20). The finiteness of the end result justifies the interchange of the order of integration in Equation (27) by Fubini. ☐

We point out that the result of this lemma is sufficient for the purpose of the main result of this paper. Nevertheless, it is worth noting that Equation (21) could be strengthened to:
$$\frac{d}{dt}h(X+{U}_{t}){\Big|}_{t=\tau}={\sigma}_{U}^{2}{\int}_{\mathbb{R}}D({p}_{X+{U}_{\tau}}(u-x)\parallel {p}_{X+{U}_{\tau}}\left(u\right))\frac{1}{{x}^{2}}\,d{H}_{U}\left(x\right).\qquad \mathrm{(31)}$$
In fact, the set of points where the left and right derivatives of the concave function $h(X+{U}_{t})$ differ has zero measure. One can therefore state that the derivative exists almost everywhere and is upper bounded almost everywhere by Equation (30). Furthermore, considering Equation (28), one can see that taking the limit as $t\to {0}^{+}$ yields
$$\frac{{d}^{+}}{dt}h(X+{U}_{t}){\Big|}_{t=\tau}={\sigma}_{U}^{2}{\int}_{\mathbb{R}}\underset{t\to 0}{\lim}D({p}_{X+{U}_{\tau +h\left(t\right)}}(u-x)\parallel {p}_{X+{U}_{\tau +h\left(t\right)}}\left(u\right))\frac{1}{{x}^{2}}\,d{H}_{U}\left(x\right)$$
by the Monotone Convergence Theorem. The continuity of the relative entropy may be established using techniques similar to those in [23] when appropriate conditions on ${p}_{X}$ hold.

Finally, when U is purely Gaussian, ${U}_{t}\sim \sqrt{t}U$, ${H}_{U}\left(x\right)$ is the unit step function, and Equation (31) boils down to de Bruijn's identity for Gaussian perturbations (Equation (4)).

Proof of Theorem 1. We assume without loss of generality that X and Z have zero mean. Otherwise, define ${Y}^{\prime}=Y-({\mu}_{X}+{\mu}_{1}+{\mu}_{2})$ and ${X}^{\prime}=X-{\mu}_{X}$, for which $h\left({Y}^{\prime}\right)=h\left(Y\right)$, $h\left({X}^{\prime}\right)=h\left(X\right)$ and $J\left({X}^{\prime}\right)=J\left(X\right)$, since differential entropy and Fisher information are translation invariant. We divide the proof into two steps, and we start by proving the theorem when Z is purely Gaussian.

Z is purely Gaussian:

We decompose Z as follows: Let $n\in {\mathbb{N}}^{*}\backslash \left\{1\right\}$ and $\u03f5=\frac{1}{n}$, then

$$Z=\sqrt{\u03f5}\sum _{i=1}^{n}{Z}^{i},$$

where the ${\left\{{Z}^{i}\right\}}_{1\le i\le n}$ are IID with the same law as Z. We write Equation (7) in an incremental formulation as follows:

$$\begin{array}{cc}\hfill {Y}^{0}\phantom{\rule{0.166667em}{0ex}}& =X\hfill \\ \hfill {Y}^{l}\phantom{\rule{0.166667em}{0ex}}& ={Y}^{l-1}+\sqrt{\u03f5}{Z}^{l}=X+\sqrt{\u03f5}\sum _{i=1}^{l}{Z}^{i}\hfill \end{array}$$

for $l\in \{1,\cdots ,n\}$. Note that ${Y}^{n}$ has the same statistics as Y. Using de Bruijn’s identity (Equation (4)), we write

$$h\left({Y}^{l}\right)=h\left({Y}^{l-1}\right)+\frac{\u03f5\phantom{\rule{0.166667em}{0ex}}{\sigma}_{1}^{2}}{2}J\left({Y}^{l-1}\right)+\mathrm{o}\left(\u03f5\right),$$

and

$$\begin{array}{cc}\hfill h\left({Y}^{l}\right)\phantom{\rule{0.166667em}{0ex}}& \le h\left({Y}^{l-1}\right)+\frac{\u03f5\phantom{\rule{0.166667em}{0ex}}{\sigma}_{1}^{2}}{2}\phantom{\rule{0.166667em}{0ex}}J\left({Y}^{l-1}\right),\phantom{\rule{2.em}{0ex}}l\in \{1,\cdots ,n\},\hfill \end{array}$$

by virtue of the fact that $h\left({Y}^{l}\right)$ is concave in $\u03f5\ge 0$ (Section VII, p. 51, [5]). Using the FII Equation (3) on ${Y}^{l}$, we obtain

$$\begin{array}{cc}\hfill \frac{1}{J\left({Y}^{l}\right)}& \phantom{\rule{0.166667em}{0ex}}\ge \frac{1}{J\left({Y}^{l-1}\right)}+\frac{1}{J\left(\sqrt{\u03f5}{Z}^{l}\right)}=\frac{1}{J\left({Y}^{l-1}\right)}+\frac{\u03f5}{J\left({Z}^{l}\right)}\hfill \\ & \phantom{\rule{0.166667em}{0ex}}\ge \frac{1}{J\left(X\right)}+l\frac{\u03f5}{J\left({Z}^{1}\right)}\phantom{\rule{2.em}{0ex}}l\in \{1,\cdots ,n\}.\hfill \end{array}$$
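The FII step above can be illustrated numerically. In the Gaussian case the inequality $\frac{1}{J(X+Z)}\ge \frac{1}{J\left(X\right)}+\frac{1}{J\left(Z\right)}$ holds with equality, since $J(\mathcal{N}(0,v))=1/v$ and variances add for independent sums. A minimal sketch (our own illustration, not the paper’s code):

```python
# Hedged sketch: the Fisher information inequality (FII)
# 1/J(X+Z) >= 1/J(X) + 1/J(Z) holds with equality for independent
# Gaussians, where J(N(0, v)) = 1/v and X + Z ~ N(0, vx + vz).

def J_gauss(var):
    """Fisher information of a Gaussian with variance `var`."""
    return 1.0 / var

vx, vz = 1.5, 2.5
lhs = 1.0 / J_gauss(vx + vz)                  # 1 / J(X + Z) = vx + vz
rhs = 1.0 / J_gauss(vx) + 1.0 / J_gauss(vz)   # also vx + vz
assert abs(lhs - rhs) < 1e-12
```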

Examining now $h\left(Y\right)=h\left({Y}^{n}\right)$,

$$\begin{array}{}& & h\left({Y}^{n}\right)\le h\left({Y}^{n-1}\right)+\frac{\u03f5\phantom{\rule{0.166667em}{0ex}}{\sigma}_{1}^{2}}{2}\phantom{\rule{0.166667em}{0ex}}J\left({Y}^{n-1}\right)\hfill & & \le h\left(X\right)+\frac{\u03f5\phantom{\rule{0.166667em}{0ex}}{\sigma}_{1}^{2}}{2}\phantom{\rule{0.166667em}{0ex}}\sum _{l=0}^{n-1}J\left({Y}^{l}\right)\hfill \mathrm{(34)}& & \le h\left(X\right)+\frac{\u03f5\phantom{\rule{0.166667em}{0ex}}{\sigma}_{1}^{2}}{2}\left[J\left(X\right)+\sum _{l=1}^{n-1}\frac{J\left(X\right)J\left({Z}^{1}\right)}{J\left({Z}^{1}\right)+l\phantom{\rule{0.166667em}{0ex}}\u03f5\phantom{\rule{0.166667em}{0ex}}J\left(X\right)}\right]\hfill \mathrm{(35)}& & \le h\left(X\right)+\frac{\u03f5\phantom{\rule{0.166667em}{0ex}}{\sigma}_{1}^{2}}{2}J\left(X\right)\left[1+{\int}_{0}^{n-1}\phantom{\rule{-11.38092pt}{0ex}}\frac{J\left({Z}^{1}\right)}{J\left({Z}^{1}\right)+u\phantom{\rule{0.166667em}{0ex}}\u03f5\phantom{\rule{0.166667em}{0ex}}J\left(X\right)}du\right]\hfill & & =h\left(X\right)+\frac{\u03f5\phantom{\rule{0.166667em}{0ex}}{\sigma}_{1}^{2}}{2}J\left(X\right)+\frac{{\sigma}_{1}^{2}}{2}J\left({Z}^{1}\right)ln\left[1+(n-1)\u03f5\phantom{\rule{0.166667em}{0ex}}\frac{J\left(X\right)}{J\left({Z}^{1}\right)}\right]\hfill & & =h\left(X\right)+\frac{\u03f5\phantom{\rule{0.166667em}{0ex}}{\sigma}_{1}^{2}}{2}J\left(X\right)+\frac{1}{2}ln\left[1+(1-\u03f5)\phantom{\rule{0.166667em}{0ex}}{\sigma}_{1}^{2}J\left(X\right)\right],\hfill \end{array}$$

where, in order to write the last equality, we used the fact that $J\left({Z}^{1}\right)=\frac{1}{{\sigma}_{1}^{2}}$ since ${Z}^{1}\sim \mathcal{N}(0,{\sigma}_{1}^{2})$. Equation (34) is due to the bounds in Equation (33), and Equation (35) is justified since the function $\frac{J\left({Z}^{1}\right)}{J\left({Z}^{1}\right)+u\phantom{\rule{0.166667em}{0ex}}\u03f5J\left(X\right)}$ is decreasing in u. Since the upper bound holds for any small-enough ϵ, necessarily

$$h\left(Y\right)\le h\left(X\right)+\frac{1}{2}ln\left(1+{\sigma}_{1}^{2}J\left(X\right)\right),$$

which completes the proof of the second part of the theorem. When X and Z are both Gaussian, evaluating the quantities shows that equality holds. Moreover, equality holds only when the FII is satisfied with equality, that is, only when X and Z are Gaussian.
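The Gaussian-equality claim for this bound can be checked in closed form: with $X\sim \mathcal{N}(0,{v}_{x})$ and $Z\sim \mathcal{N}(0,{v}_{1})$ we have $h\left(X\right)=\frac{1}{2}ln2\pi e{v}_{x}$ and $J\left(X\right)=1/{v}_{x}$. A hedged numerical sketch (our own, not the authors’ code):

```python
import math

# Hedged numerical sketch: for Gaussian X ~ N(0, vx) and Gaussian Z ~ N(0, v1),
# the bound h(Y) <= h(X) + (1/2) ln(1 + v1 * J(X)) holds with equality, since
# h(X) = (1/2) ln(2 pi e vx), J(X) = 1/vx, and Y = X + Z ~ N(0, vx + v1).

def h_gauss(var):
    """Differential entropy (nats) of a Gaussian with variance `var`."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

vx, v1 = 2.0, 0.5
h_Y = h_gauss(vx + v1)                              # h(X + Z)
bound = h_gauss(vx) + 0.5 * math.log(1 + v1 / vx)   # h(X) + (1/2) ln(1 + v1 J(X))
assert abs(h_Y - bound) < 1e-12
```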

Z is the sum of a Gaussian variable and a non-Gaussian infinitely divisible one:

Let ${\left\{{U}_{t}\right\}}_{t\ge 0}$ be a family of RVs associated with the non-Gaussian infinitely divisible RV ${Z}_{2}$ and distributed according to Equation (15). Using concavity Equation (18) in t,

$$\begin{array}{}& \hfill h\left(Y\right)& =h(X+{Z}_{1}+{Z}_{2})=h(X+{Z}_{1}+{U}_{1})\hfill & & \le h(X+{Z}_{1})+\frac{dh(X+{Z}_{1}+{U}_{t})}{dt}{|}_{t={0}^{+}}\hfill \mathrm{(36)}& & \le h(X+{Z}_{1})+\phantom{\rule{0.166667em}{0ex}}{\sigma}_{2}^{2}\phantom{\rule{0.166667em}{0ex}}{C}_{X+{Z}_{1}}\hfill \mathrm{(37)}& & \le h\left(X\right)+\frac{1}{2}ln\left(1+{\sigma}_{1}^{2}J\left(X\right)\right)+\phantom{\rule{0.166667em}{0ex}}{\sigma}_{2}^{2}\phantom{\rule{0.166667em}{0ex}}{C}_{X+{Z}_{1}}\hfill \mathrm{(38)}& & \le h\left(X\right)+\frac{1}{2}ln\left(1+{\sigma}_{1}^{2}J\left(X\right)\right)+\phantom{\rule{0.166667em}{0ex}}{\sigma}_{2}^{2}\phantom{\rule{0.166667em}{0ex}}min\left\{{C}_{X};{C}_{Z}\right\}\hfill & & =h\left(X\right)+\frac{1}{2}ln\left(1+{\sigma}_{1}^{2}J\left(X\right)\right)+\phantom{\rule{0.166667em}{0ex}}{\sigma}_{2}^{2}\phantom{\rule{0.166667em}{0ex}}min\left\{\underset{x}{sup}\frac{D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right)}{{x}^{2}};\frac{1}{2{\sigma}_{1}^{2}}\right\}.\hfill \end{array}$$

Equation (36) is an application of Lemma 2 since $(X+{Z}_{1})$ has a positive PDF and satisfies all the required technical conditions. The upper bound proven in the previous paragraph gives Equation (37), and Equation (38) is due to Equation (20). ☐

On a final note, using Lemmas 1 and 2, one could have applied the Plünnecke-Ruzsa inequality (Theorem 3.11, [10]), which yields

$$h\left(Y\right)\le h\left(X\right)+\frac{{\sigma}_{1}^{2}}{2}\phantom{\rule{0.166667em}{0ex}}J\left(X\right)+{\sigma}_{2}^{2}\phantom{\rule{0.166667em}{0ex}}{C}_{X+{Z}_{1}},$$

which is looser than Equation (37).
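The constant ${C}_{X}=\underset{x}{sup}\phantom{\rule{0.166667em}{0ex}}D\left({p}_{X}(\cdot -x)\parallel {p}_{X}\right)/{x}^{2}$ appearing in these bounds is explicit in the Gaussian case: $D(\mathcal{N}(x,v)\parallel \mathcal{N}(0,v))={x}^{2}/(2v)$, so ${C}_{X}=1/(2v)$, matching the $\frac{1}{2{\sigma}_{1}^{2}}$ term in the minimum above. A hedged sketch of this computation (our own illustration):

```python
# Hedged sketch: for Gaussian p_X with variance v, the relative entropy of a
# shifted copy is D(N(x, v) || N(0, v)) = x^2 / (2v), so the constant
# C_X = sup_x D(p_X(. - x) || p_X) / x^2 equals 1/(2v) -- the ratio is the
# same for every shift x, so the supremum is attained everywhere.

def kl_shifted_gauss(x, v):
    """D(N(x, v) || N(0, v)) in nats (mean shift only, same variance)."""
    return x * x / (2.0 * v)

v = 1.7
ratios = [kl_shifted_gauss(x, v) / (x * x) for x in (0.1, 1.0, 5.0, 40.0)]
assert all(abs(r - 1.0 / (2.0 * v)) < 1e-12 for r in ratios)
```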

The bound in Equation (10) may be extended to the n-dimensional vector Gaussian case, $\mathbf{Y}=\mathbf{X}+\mathbf{Z}$, where $\mathbf{Z}$ is an n-dimensional Gaussian vector. In this case, if $\mathsf{J}\left(\mathbf{X}\right)$ denotes the Fisher information matrix, the Fisher information and the entropy power are defined as

$$J\left(\mathbf{X}\right)=\mathrm{Tr}\left(\mathsf{J}\left(\mathbf{X}\right)\right)\phantom{\rule{2.em}{0ex}}\phantom{\rule{2.em}{0ex}}N\left(\mathbf{X}\right)=\frac{1}{2\pi e}{e}^{\frac{2}{n}h\left(\mathbf{X}\right)}.$$

- When $\mathbf{Z}$ has n IID Gaussian components, i.e., with covariance matrix ${\Lambda}_{Z}={\sigma}^{2}\mathsf{I}$, following similar steps leads to:$$N(\mathbf{X}+\mathbf{Z})\le N\left(\mathbf{X}\right)+N\left(\mathbf{Z}\right)\frac{N\left(\mathbf{X}\right)J\left(\mathbf{X}\right)}{n},$$
- In general, for any positive-definite matrix ${\Lambda}_{Z}$ with a singular value decomposition $\mathsf{U}\mathsf{D}{\mathsf{U}}^{T}$, if we denote by $\mathsf{B}=\mathsf{U}{\mathsf{D}}^{-\frac{1}{2}}{\mathsf{U}}^{T}$ then$$\mathsf{B}\mathbf{Y}=\mathsf{B}(\mathbf{X}+\mathbf{Z})=\mathsf{B}\mathbf{X}+\mathsf{B}\mathbf{Z}=\mathsf{B}\mathbf{X}+{\mathbf{Z}}^{\prime}$$$$\begin{array}{cc}\hfill N\left(\mathsf{B}\right(\mathbf{X}+\mathbf{Z}\left)\right)& \phantom{\rule{0.166667em}{0ex}}\le N\left(\mathsf{B}\mathbf{X}\right)+N\left({\mathbf{Z}}^{\prime}\right)\frac{N\left(\mathsf{B}\mathbf{X}\right)J\left(\mathsf{B}\mathbf{X}\right)}{n}\hfill \\ \hfill \u27faN(\mathbf{X}+\mathbf{Z})& \phantom{\rule{0.166667em}{0ex}}\le N\left(\mathbf{X}\right)+\frac{N\left(\mathbf{X}\right)J\left(\mathsf{B}\mathbf{X}\right)}{n}\hfill \\ \hfill \u27faN(\mathbf{X}+\mathbf{Z})& \phantom{\rule{0.166667em}{0ex}}\le N\left(\mathbf{X}\right)+\frac{N\left(\mathbf{X}\right)\mathrm{Tr}\left(\mathsf{J}\left(\mathbf{X}\right){\Lambda}_{Z}\right)}{n},\hfill \end{array}$$
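The vector bound $N(\mathbf{X}+\mathbf{Z})\le N\left(\mathbf{X}\right)+N\left(\mathbf{X}\right)\mathrm{Tr}\left(\mathsf{J}\left(\mathbf{X}\right){\Lambda}_{Z}\right)/n$ can be checked in closed form when $\mathbf{X}$ is also Gaussian with diagonal covariance, since then $N\left(\mathbf{X}\right)=det{\left({\Lambda}_{X}\right)}^{1/n}$ and $\mathsf{J}\left(\mathbf{X}\right)={\Lambda}_{X}^{-1}$; by the AM-GM inequality the bound holds, with equality when ${\Lambda}_{Z}$ is proportional to ${\Lambda}_{X}$. A hedged numerical sketch (our own, with arbitrary illustrative variances):

```python
import math

# Hedged numerical sketch of the vector bound
#   N(X+Z) <= N(X) + N(X) Tr(J(X) Lambda_Z) / n,
# checked for independent Gaussian vectors with diagonal covariances, where
# N(X) = det(Lambda_X)^(1/n) and the Fisher information matrix is Lambda_X^{-1}.

def entropy_power_diag(variances):
    """N of a Gaussian vector with diagonal covariance: (prod var_i)^(1/n)."""
    n = len(variances)
    return math.prod(variances) ** (1.0 / n)

lam_x = [1.0, 2.0, 4.0]   # diagonal of Lambda_X (illustrative values)
lam_z = [0.5, 1.0, 0.25]  # diagonal of Lambda_Z (illustrative values)
n = len(lam_x)

lhs = entropy_power_diag([vx + vz for vx, vz in zip(lam_x, lam_z)])
trace_term = sum(vz / vx for vx, vz in zip(lam_x, lam_z))  # Tr(J(X) Lambda_Z)
rhs = entropy_power_diag(lam_x) * (1 + trace_term / n)
assert lhs <= rhs + 1e-12
```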

We have derived a novel tight upper bound on the entropy of the sum of two independent random variables, one of which is infinitely divisible with a Gaussian component. The bound is shown to be tighter than previously known ones and holds for variables with possibly infinite second moment. With the isoperimetric inequality in mind, the “symmetry” that this bound provides with the known lower bounds is remarkable and hints at possible generalizations to scenarios where no Gaussian component is present.

The authors would like to thank the Assistant Editor for his patience and the anonymous reviewers for their helpful comments. This work was supported by AUB’s University Research Board and the Lebanese National Council for Scientific Research (CNRS-L).

Both authors contributed equally at all stages of this work: they formulated and solved the problem and prepared the manuscript. Both authors have read and approved the final manuscript.

The authors declare no conflict of interest.

- Shannon, C.E. A mathematical theory of communication, part I. Bell Syst. Tech. J. **1948**, 27, 379–423.
- Bobkov, S.G.; Chistyakov, G.P. Entropy power inequality for the Renyi entropy. IEEE Trans. Inf. Theory **2015**, 61, 708–714.
- Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control **1959**, 2, 101–112.
- Blachman, N.M. The convolution inequality for entropy powers. IEEE Trans. Inf. Theory **1965**, 11, 267–271.
- Rioul, O. Information theoretic proofs of entropy power inequality. IEEE Trans. Inf. Theory **2011**, 57, 33–55.
- Dembo, A.; Cover, T.M.; Thomas, J.A. Information theoretic inequalities. IEEE Trans. Inf. Theory **1991**, 37, 1501–1518.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: New York, NY, USA, 2006.
- Ruzsa, I.Z. Sumsets and entropy. Random Struct. Algorithms **2009**, 34, 1–10.
- Tao, T. Sumset and inverse sumset theory for Shannon entropy. Comb. Probab. Comput. **2010**, 19, 603–639.
- Kontoyiannis, I.; Madiman, M. Sumset and inverse sumset inequalities for differential entropy and mutual information. IEEE Trans. Inf. Theory **2014**, 60, 4503–4514.
- Madiman, M. On the entropy of sums. In Proceedings of the 2008 IEEE Information Theory Workshop, Oporto, Portugal, 5–9 May 2008.
- Cover, T.M.; Zhang, Z. On the maximum entropy of the sum of two dependent random variables. IEEE Trans. Inf. Theory **1994**, 40, 1244–1246.
- Ordentlich, E. Maximizing the entropy of a sum of independent bounded random variables. IEEE Trans. Inf. Theory **2006**, 52, 2176–2181.
- Bobkov, S.; Madiman, M. On the problem of reversibility of the entropy power inequality. In Limit Theorems in Probability, Statistics and Number Theory; Springer-Verlag: Berlin/Heidelberg, Germany, 2013; pp. 61–74.
- Miclo, L. Notes on the speed of entropic convergence in the central limit theorem. Progr. Probab. **2003**, 56, 129–156.
- Luisier, F.; Blu, T.; Unser, M. Image denoising in mixed Poisson-Gaussian noise. IEEE Trans. Image Process. **2011**, 20, 696–708.
- Fahs, J.; Abou-Faycal, I. Using Hermite bases in studying capacity-achieving distributions over AWGN channels. IEEE Trans. Inf. Theory **2012**, 58, 5302–5322.
- Heyer, H. Structural Aspects in the Theory of Probability: A Primer in Probabilities on Algebraic-Topological Structures; World Scientific: Singapore, 2004; Volume 7.
- Costa, M.H.M. A new entropy power inequality. IEEE Trans. Inf. Theory **1985**, 31, 751–760.
- Verdú, S. On channel capacity per unit cost. IEEE Trans. Inf. Theory **1990**, 36, 1019–1030.
- Kullback, S. Information Theory and Statistics; Dover Publications: Mineola, NY, USA, 1968.
- Steutel, F.W.; Harn, K.V. Infinite Divisibility of Probability Distributions on the Real Line; Marcel Dekker: New York, NY, USA, 2006.
- Fahs, J.; Abou-Faycal, I. On the finiteness of the capacity of continuous channels. IEEE Trans. Commun. **2015**.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).