Self-similar solutions of R\'enyi's entropy and the concavity of its entropy power

We study the class of self-similar probability density functions with finite mean and variance which maximize R\'{e}nyi's entropy. The investigation is restricted to the Schwartz space $S(\mathbb{R}^d)$ and to the space of $l$-differentiable compactly supported functions $C_c^l(\mathbb{R}^d)$. Interestingly, the solutions of this optimization problem do not coincide with the solutions of the usual porous medium equation with a Dirac point source, as occurs in the optimization of Shannon's entropy. We also study the concavity of the entropy power in $\mathbb{R}^d$ with respect to time using two different methods. The first takes advantage of the solutions determined earlier, while the second is based on a setting that could also be used for Riemannian manifolds.


Introduction
The last two decades have witnessed an enormously growing interest in the use of information concepts in diverse fields of science. Although the Rényi entropy was introduced as early as 1961, only recently has a wide range of applications emerged, for instance in the analysis of quantum entanglement [8], quantum correlations [4], computer vision [5], clustering [9], quantum cryptography [3] and pattern recognition [16].
In the present work we solve three problems. The first examines the possibility of extremizing Rényi's entropy using self-similar probability density functions (p.d.f.'s) with zero expectation value and finite second moment in the Schwartz space and in the space of compactly supported continuous functions on $\mathbb{R}^d$. The second tackles the same problem but with the additional feature of a non-zero mean. Finally, the third is devoted to determining conditions under which the concavity of the entropy power holds.
Our contribution is threefold: First, we theoretically establish the solutions of the first two problems by applying the method of calculus of variations, which was lacking from the literature. Second, we compare the specified solutions with the already known ones derived from the fast and porous medium equations. Third, we propose two different methods to answer the third problem.
In particular, for the first problem, the functional which contains Rényi's entropy and the constraints, incorporated through Lagrange multipliers, is constructed, and by applying the calculus of variations its critical points are determined. The perturbed p.d.f.'s have the form $g_\epsilon = f + \epsilon h$, where $|\epsilon| < \epsilon_0 < 1$ and the functions $h$ are chosen in such a way that $g_\epsilon$ is a p.d.f. with the same variance as $f$. The vanishing of the first variation of the Lagrange functional provides the equation which determines the critical points; it turns out that the solutions are unique. The nonnegativity of the second variation leads to an integral inequality which is preserved by the admissible perturbations we consider, therefore the critical point is a local maximum of the functional. To prove its global nature we use the concept of the relative Rényi entropy and examine its positivity at the critical point. This procedure can be generalised to $\mathbb{R}^d$, and by imposing a finite covariance constraint the well-known solutions of [21] are recovered. As a check, one can prove that in the $\alpha \to 1$ limit the solutions converge to the p.d.f. of the normal distribution $N(0, \mu^2)$. The second problem is proved to be equivalent to the first one by performing a suitable transformation of the random variable (composition of a displacement with a rescaling); therefore its solution maps to the one of the first problem.
The knowledge of the solutions enables us to construct the nonlinear diffusion equation they satisfy and to compare them with those of Zel'dovich, Kompaneets and Barenblatt (ZKB) [2], [22]. The difference lies in the diffusion coefficient, which depends not only on the shape and the size of a molecule but also on the order $\alpha$ of the Rényi entropy and the dimension of the space. We plot both our solutions and Barenblatt's solutions and observe that their maximal values obey an inequality whose direction changes for values of $\alpha$ greater than a threshold $\alpha_{th.}(d)$.
Finally, the problem of the concavity of the entropy power in $\mathbb{R}^d$ is confronted by utilising two different methods. In the first method, the solutions of the first problem guarantee concavity on the condition that the second time derivative of Rényi's entropy fulfils inequality (69). The second method is closer to the spirit of [23], where the $\alpha = 1$ case was studied, and concavity holds provided that (84) is satisfied.
The paper is organized into six sections. Section 2 reviews and proves some properties of the entropy. Section 3 determines the solutions of the two maximization problems using the method of calculus of variations and examines their global validity using the concept of relative Rényi entropy. Section 4 provides the nonlinear diffusion equation the solutions satisfy and compares it with the usual one. Section 5 proves the concavity of the Rényi entropy power with respect to time following two different methods. Section 6 concludes the work and comments on more general constraints one could have considered.

Preliminaries
In this section we briefly review some properties of the Rényi entropy and, for the sake of completeness, we present the corresponding proofs.

Definition 2.1 Let $(\Omega, \mathcal{A}, \mu)$ be a probability space and let the $\mathcal{A}$-measurable function $f : \Omega \to \mathbb{R}_+$ be a probability density function (p.d.f.). The differential Rényi entropy of order $\alpha$, $\alpha \in \mathbb{R}_+$, is the nonlinear functional defined by
$$H_\alpha[f] = \frac{1}{1-\alpha} \ln\!\left( \int_\Omega f^{\alpha-1} \, dF \right),$$
where $F$ is the probability measure induced by $f$, namely $F(B) = \int_B f \, d\mu$, $B \in \mathcal{A}$. Other equivalent ways of defining the Rényi entropy are
$$H_\alpha[f] = \frac{1}{1-\alpha} \ln\!\left( \int_\Omega f^{\alpha} \, d\mu \right) = \frac{\alpha}{1-\alpha} \ln \|f\|_{L^\alpha(\Omega,\mu)}.$$
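To make the definition concrete, the following small numerical sketch (not from the text; the Gaussian density, the values of $\sigma$ and $\alpha$, and the quadrature parameters are illustrative choices) compares a direct quadrature of the definition with the closed form $H_\alpha[N(0,\sigma^2)] = \tfrac{1}{2}\ln(2\pi\sigma^2) + \ln\alpha/(2(\alpha-1))$:

```python
import math

def renyi_entropy_numeric(f, alpha, lo, hi, n=200001):
    # trapezoidal rule for H_alpha = ln( ∫ f^alpha dx ) / (1 - alpha)
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * f(x) ** alpha
    return math.log(total * h) / (1.0 - alpha)

sigma = 1.3
gauss = lambda x: math.exp(-x * x / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

alpha = 0.7
H_num = renyi_entropy_numeric(gauss, alpha, -12 * sigma, 12 * sigma)
# closed form for a centred Gaussian: (1/2) ln(2πσ²) + ln α / (2(α − 1))
H_exact = 0.5 * math.log(2 * math.pi * sigma ** 2) + math.log(alpha) / (2 * (alpha - 1))
assert abs(H_num - H_exact) < 1e-6
```

Note that the closed form recovers the Shannon value $\tfrac{1}{2}\ln(2\pi e\sigma^2)$ in the $\alpha \to 1$ limit, since $\ln\alpha/(2(\alpha-1)) \to \tfrac{1}{2}$.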

Properties
(1α) $H_\alpha[f]$ is a continuous ($\alpha \neq 1$) and strictly decreasing function of $\alpha$, unless $f$ is the uniform density, in which case it is constant.
(1β) A consequence of property (1α) is an inequality in which the Kullback–Leibler (KL) relative entropy appears. If $\alpha > 1$ then the direction of the inequality is reversed. Therefore the Rényi entropy as a function of $\alpha$, for fixed $f$, is bounded by the difference between the Shannon entropy of $g$ and the KL relative entropy of $f$ and $g$.

Proof (1α) By Hölder's inequality there is a family of relations
$$\int_\Omega f^{\theta q + (1-\theta)p} \, d\mu \le \left( \int_\Omega f^{q} \, d\mu \right)^{\theta} \left( \int_\Omega f^{p} \, d\mu \right)^{1-\theta},$$
holding whenever $p, q \ge 0$ and $\theta \in [0,1]$. Taking $p = 1$ and assuming $f$ to be a p.d.f. we have $\int_\Omega f \, d\mu = 1$. Let $0 < \alpha < \beta < 1$ and $q = \alpha$. Then for $\theta = (1-\beta)/(1-\alpha) < 1$, for which $\theta\alpha + (1-\theta) = \beta$, the previous inequality becomes
$$\int_\Omega f^{\beta} \, d\mu \le \left( \int_\Omega f^{\alpha} \, d\mu \right)^{\theta},$$
which, using that the $\ln$-function is an increasing function, implies that $H_\beta[f] \le H_\alpha[f]$. The same proof holds for $\alpha, \beta > 1$.
(1β) Differentiating $H_\alpha$ with respect to $\alpha$ we obtain
$$\frac{dH_\alpha}{d\alpha} = -\frac{1}{(1-\alpha)^2}\, D_{KL}(g_\alpha \,\|\, f), \qquad g_\alpha \equiv \frac{f^\alpha}{\int_\Omega f^\alpha \, d\mu},$$
from which the inequality follows, since the KL divergence is non-negative.
(2) $H_\alpha[f]$ as a function of $\alpha$ converges to the following limits:
$$\lim_{\alpha \to 1} H_\alpha[f] = H_1[f], \qquad \lim_{\alpha \to 0^+} H_\alpha[f] = \ln \mu(\operatorname{supp} f), \qquad \lim_{\alpha \to \infty} H_\alpha[f] = -\ln \|f\|_{L^\infty},$$
where $H_1[f]$ is the Shannon entropy of $f$.
(3) Let $f$ be a non-negative and integrable function w.r.t. the measure $\mu$ on $\Omega$, with $\mu(\Omega) < \infty$. Then for $\alpha < 1$ the following inequality holds
$$H_\alpha[f] \le \ln \mu(\Omega) + \frac{\alpha}{1-\alpha} \ln \int_\Omega f \, d\mu,$$
while for $\alpha > 1$ the inequality is reversed.

Proof This is a direct consequence of Jensen's inequality for concave functions $\varphi$. In our case $(\varphi \circ f)(x) = f(x)^\alpha$ is concave for $\alpha < 1$ and, applying it with respect to the normalized measure $\mu/\mu(\Omega)$, we have
$$\frac{1}{\mu(\Omega)} \int_\Omega f^\alpha \, d\mu \le \left( \frac{1}{\mu(\Omega)} \int_\Omega f \, d\mu \right)^{\alpha},$$
from which the result is deduced straightforwardly. In particular, for a p.d.f. it reduces to $H_\alpha[f] \le \ln \mu(\Omega)$.
(4) If the $L^1(\Omega, \mathbb{R}_+)$ norm is invariant under the homogeneous dilations $f_\lambda(x) = \lambda^d f(\lambda x)$, then $H_\alpha$, for $\alpha < 1$, scales as
$$H_\alpha[f_\lambda] = H_\alpha[f] - d \ln \lambda.$$
Proof The invariance of the $L^1$ norm implies the condition $\int_\Omega f_\lambda \, d\mu = \int_\Omega f \, d\mu$, which combined with the definition of Rényi's entropy produces the desired result.
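The properties above can be checked numerically on simple closed-form examples (a minimal sketch; the density $f(x) = 2x$ on $[0,1]$, the Gaussian, and all parameter values below are illustrative choices, not from the text):

```python
import math

def H_linear(alpha):
    # f(x) = 2x on [0,1] is a p.d.f. with ∫ f^α dx = 2^α / (α + 1),
    # hence H_α[f] = (α ln 2 − ln(α + 1)) / (1 − α)
    return (alpha * math.log(2) - math.log(alpha + 1)) / (1 - alpha)

# property (1α): H_α is strictly decreasing in α for a non-uniform density
alphas = [0.2, 0.4, 0.6, 0.8, 1.2, 1.5, 2.0, 3.0]
vals = [H_linear(a) for a in alphas]
assert all(x > y for x, y in zip(vals, vals[1:]))

# property (3) for a p.d.f.: H_α[f] ≤ ln μ(Ω) = ln 1 = 0 on Ω = [0, 1]
assert all(v <= 0 for v in vals)

def H_gauss(sigma, alpha):
    # closed form for a centred Gaussian with standard deviation σ
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + math.log(alpha) / (2 * (alpha - 1))

# property (4), d = 1: the L¹-preserving dilation f_λ(x) = λ f(λx) turns
# N(0, σ²) into N(0, (σ/λ)²) and shifts H_α by −ln λ
sigma, alpha, lam = 1.0, 0.6, 2.5
assert abs(H_gauss(sigma / lam, alpha) - (H_gauss(sigma, alpha) - math.log(lam))) < 1e-12
```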

Formulation of the first problem and its solutions
In what follows we restrict to the probability space $(\mathbb{R}, \mathcal{B}, dx)$, where $\mathcal{B} = \sigma(\mathcal{O})$ is the $\sigma$-algebra generated by the open sets and $dx$ is the Lebesgue measure on $\mathbb{R}$. The domain of the Rényi functional, $D(H_\alpha)$, is defined in terms of positive and integrable real-valued functions $f$, where $f_+(x) = f(x)\chi_B(x)$, with $\chi$ the indicator function of the set $B$ and $[\cdot]$ the integer part of a number.
The first entropy maximization problem, with vanishing mean and finite variance, is formulated as: maximize $H_\alpha[f]$ over $f \in D(H_\alpha)$ subject to the constraints
$$\int_{\mathbb{R}} f(x)\,dx = 1, \qquad \int_{\mathbb{R}} x^2 f(x)\,dx = \mu^2.$$
Using the method of Lagrange multipliers we construct the functional and impose appropriate conditions on the perturbations in order to calculate its first and second variations.
A perturbation $h$ is called admissible if it satisfies the following conditions:
$$\int_{\mathbb{R}} h(x)\,dx = 0, \qquad \int_{\mathbb{R}} x^2 h(x)\,dx = 0.$$
If we introduce the usual inner product in $S(\mathbb{R})$, the previous integral conditions imply that we search for a class of functions which are orthogonal to unity and to $x^2$. Odd functions in the Schwartz space, such as $Q(x)e^{-b|x|^\mu}$, $b, \mu > 0$, with $Q(x)$ a polynomial of odd powers of $x$, satisfy these criteria. Expanding the Lagrange functional in a Taylor series up to second order in $\epsilon$ we obtain the first and second variations. The first-order necessary condition for optimality requires [12]
$$\delta F_{\hat f}(h) = \lim_{\epsilon \to 0} \frac{F[\hat f + \epsilon h] - F[\hat f]}{\epsilon} = 0.$$
The function $\hat f(x)$ is determined by using the following lemma.

Proof Suppose that $f(x) \not\equiv 0$. Then there exists $\xi \in \mathbb{R}$ such that $f(\xi) = c > 0$ (assuming that the constant is positive). Since $f \in S(\mathbb{R})$ there exists a neighbourhood $(a, b)$ of $\xi$ in which $f(x) > 0$, $\forall x \in (a, b)$. Define the function $h$ supported on $(a, b)$. Then $\int_a^b f(x) h(x)\,dx > 0$, since the integrand is positive (except at $a$ and $b$). This contradiction proves the lemma.

Therefore $\hat f(x)$ is given by
$$\hat f(x) = \|\hat f\|_{L^\infty} \left( 1 + \frac{\hat\lambda_2}{\hat\lambda_0}\, x^2 \right)^{\frac{1}{\alpha-1}},$$
where the $L^\infty$ norm is with respect to $x$ and $\hat\lambda_k > 0$ since $\hat f \in D_{\alpha<1}$. In order for $\hat f$ to be a local maximum, the following second-order necessary condition for optimality should also hold: the second variation of $F$ at $\hat f$ should be positive semidefinite on the space of admissible perturbations $h$. This requirement translates into an integral inequality, and it is easily checked that the solution $\hat f$, together with the admissible perturbations, satisfies the strict inequality; therefore $\hat f$ is a strict one-parameter family of local maxima with increasing Rényi entropy, as property (1α) of section 2 guarantees.

Remark If $0 < \int_{\mathbb{R}} f^\alpha(x)\,dx \le 2$ then, Taylor expanding the $\ln$-function around unity, the Rényi entropy reduces to the Havrda–Charvát entropy [7], also called the Tsallis entropy [20],
$$S_\alpha[f] = \frac{1}{1-\alpha} \left( \int_{\mathbb{R}} f^\alpha(x)\,dx - 1 \right).$$
The solution in this case is the previous one but with the substitution $\hat\lambda_k = \lambda_k$. Depending on the space of functions, we distinguish the following two types of solutions.
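The admissibility conditions can be verified numerically for a concrete odd Schwartz-class perturbation (a minimal sketch; $h(x) = x^3 e^{-x^2}$, an instance of $Q(x)e^{-b|x|^\mu}$, and the quadrature window are illustrative choices):

```python
import math

def trapz(g, lo, hi, n=100001):
    # simple trapezoidal quadrature on [lo, hi]
    h = (hi - lo) / (n - 1)
    s = 0.5 * (g(lo) + g(hi)) + sum(g(lo + i * h) for i in range(1, n - 1))
    return s * h

h_pert = lambda x: x ** 3 * math.exp(-x * x)   # odd Schwartz function

I0 = trapz(h_pert, -10, 10)                          # ⟨h, 1⟩, should vanish
I2 = trapz(lambda x: x * x * h_pert(x), -10, 10)     # ⟨h, x²⟩, should vanish
assert abs(I0) < 1e-8 and abs(I2) < 1e-8
```

Both integrals vanish by odd symmetry, so the perturbation preserves the normalization and the second moment, as required.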
(3α) The $S(\mathbb{R})$ solution ($\alpha < 1$). The p.d.f. constraint fixes the normalization, while the second moment constraint fixes the spread. We adopt the abbreviation $B_{s/2} \equiv B(s/2, 1/(1-\alpha) - s/2)$ from now on. From these two relations we conclude the value of $\hat\lambda_2$; using this relation, the exact solution can be expressed in closed form. This one-parameter family of local maxima of $H_\alpha$ is unique, and it remains to prove that it is actually also a one-parameter family of global maxima in $D(H_{\alpha<1})$. For this we use the notion of the relative $\alpha$-Rényi entropy of two densities $\hat f$ and $g$, defined in [13], where $g$ satisfies the same second moment constraint as $\hat f$. The first term on the right-hand side of (38) equals $H_\alpha[\hat f]$, as one may check by applying Hölder's inequality to the functions $(\hat f^{\alpha-1} g)^\alpha$ and $\hat f^{\alpha(\alpha-1)}$. The same result holds in the $\alpha > 1$ case.
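For the Student-t-type profile $\hat f \propto (1 + c\,x^2)^{1/(\alpha-1)}$ of the $S(\mathbb{R})$ branch, the normalization integral is a Beta function of exactly the kind abbreviated by $B_{s/2}$ (with $s = 1$ in one dimension). A numerical cross-check (the values of $\alpha$ and of the ratio $c = \hat\lambda_2/\hat\lambda_0$ are illustrative assumptions):

```python
import math

alpha = 0.6
p = 1.0 / (1.0 - alpha)      # f̂(x) ∝ (1 + c x²)^(−p) for α < 1
c = 1.0                      # illustrative value of the ratio λ̂₂/λ̂₀

# closed-form normalisation: for p > 1/2,
#   ∫ (1 + c x²)^(−p) dx = sqrt(π/c) Γ(p − 1/2)/Γ(p) = c^(−1/2) B(1/2, p − 1/2)
Z_exact = math.sqrt(math.pi / c) * math.gamma(p - 0.5) / math.gamma(p)

# trapezoidal quadrature over a wide window (the tail decays like |x|^(−2p))
n, L = 400001, 200.0
h = 2 * L / (n - 1)
Z_num = sum((1 + c * (-L + i * h) ** 2) ** (-p) * (0.5 if i in (0, n - 1) else 1)
            for i in range(n)) * h
assert abs(Z_num - Z_exact) < 1e-5
```

Finiteness of the variance additionally requires $2p > 3$, i.e. $\alpha > 1/3$, consistent with the combination $3\alpha - 1$ appearing in the proof below.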

Proposition 3.3
The optimization problem for $\alpha < 1$ has a one-parameter family of global maxima $\hat f$ which, in the $\alpha \to 1^-$ limit, converges to the global maximum of the Shannon entropy, namely the normal distribution $N(0, \mu^2)$.

Proof
The first two terms of (37) in the $\alpha \to 1^-$ limit give the Gaussian normalization, while the third term, by performing the changes of variables $s = (1-\alpha)/(3\alpha-1)$ and $\rho = 1/(2s)$ successively, converges to the Gaussian exponential factor.

(3β) The $l$-differentiable compactly supported solution ($\alpha > 1$). In this case, following steps similar to the previous case, the solution turns out to be
$$\hat f(x) = \left( \hat\lambda_0 - \hat\lambda_2 x^2 \right)^{\frac{1}{\alpha-1}} \theta\!\left( \hat\lambda_0 - \hat\lambda_2 x^2 \right),$$
where $\theta(x)$ is the unit-step function and the arbitrary constant $\hat\lambda_0$ is specified as usual by imposing the requirement that $\hat f$ be a p.d.f.
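The normalization of the compactly supported Barenblatt-type profile is again a Beta function, and can be cross-checked numerically (a minimal sketch; the values $\alpha = 2$ and $\hat\lambda_0 = \hat\lambda_2 = 1$ are illustrative assumptions):

```python
import math

alpha = 2.0
q = 1.0 / (alpha - 1.0)    # exponent of the compactly supported profile, q > 0
a, b = 1.0, 1.0            # illustrative values of λ̂₀ and λ̂₂
R = math.sqrt(a / b)       # support edge: f̂ vanishes for |x| > R

def beta_fn(x, y):
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

# closed form: ∫_{−R}^{R} (a − b x²)^q dx = a^(q + 1/2) b^(−1/2) B(1/2, q + 1)
Z_exact = a ** (q + 0.5) / math.sqrt(b) * beta_fn(0.5, q + 1.0)

n = 200001
h = 2 * R / (n - 1)
Z_num = sum(max(a - b * (-R + i * h) ** 2, 0.0) ** q * (0.5 if i in (0, n - 1) else 1)
            for i in range(n)) * h
assert abs(Z_num - Z_exact) < 1e-6
```

For $\alpha = 2$ this reduces to $\int_{-1}^{1}(1 - x^2)\,dx = B(1/2, 2) = 4/3$.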

Remarks
• The solution (3α) can also be derived by integrating equation (26), in which case we obtain a relation between the Lagrange multipliers. Eliminating $\lambda_0$ from the solution and applying the constraints we arrive at (37).
• The previous set up can also be applied to the more general case in which a covariance matrix C-constraint is present. The new solution can be derived from the old one by replacing x 2 by x T C −1 x, see [21].

Formulation of the second problem and its solutions
The second entropy maximization problem, with non-vanishing mean $\mu_1$ and finite variance, is formulated as the first one but with the additional constraint $E(X) = \mu_1$. This problem can equivalently be restated in terms of a new random variable $Y$, subject to the new constraints
$$E(Y) = \int_{\mathbb{R}} y f_Y(y)\,dy = 0,$$
together with the corresponding normalization and second moment conditions. The random variables $X, Y$ are related through the transformation $X = \mu_2 Y + \mu_1$ and their corresponding p.d.f.'s through $f_X(x) = \mu_2^{-1} f_Y\big((x - \mu_1)/\mu_2\big)$. As a consequence the entropies are related by $H_\alpha[f_X] = H_\alpha[f_Y] + \ln \mu_2$. The solutions of the second problem (45) are therefore given by the solutions of the first problem (21) with the substitution $(x - \mu_1)/\mu_2 \to x$. One may also try to solve the second problem directly, starting from the Lagrange functional. The first-order necessary optimization condition dictates the solution, which can also be proved to be a global maximum. We distinguish the following two classes of solutions.
(4α) The $S(\mathbb{R})$ solution ($\alpha < 1$). The positivity of the solution $\hat f^{\,\alpha-1}(x)$, $\forall x \in \mathbb{R}$, requires $\hat\lambda_2 > 0$ and $\hat\lambda_2 \hat\lambda_0 - \hat\lambda_1^2 > 0$. The p.d.f. and mean value constraints lead to a condition relating the multipliers, while the variance constraint fixes the remaining one. Finally, the solution can be written in closed form and is identical to the solution derived from the equivalent problem.
(4β) The $l$-differentiable compactly supported solution ($\alpha > 1$). In this case the polynomial $\sum_k \hat\lambda_k x^k$ should be positive between its real roots. This occurs provided that $\hat\lambda_2 < 0$ and $\hat\lambda_1^2 - \hat\lambda_2 \hat\lambda_0 > 0$. Using the indicator function $\chi_{(x_1, x_2)}$, with $x_1, x_2$ the roots of the polynomial, we find the previous solution with a relative minus sign between the terms inside the parentheses, while the exponent is now positive.
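The equivalence of the two problems rests on the affine change of variables $X = \mu_2 Y + \mu_1$: the translation by $\mu_1$ leaves $H_\alpha$ invariant, while the rescaling by $\mu_2$ shifts it by $\ln\mu_2$. A minimal numerical sketch (Gaussian densities and the values of $\mu_1, \mu_2, \alpha$ are illustrative choices):

```python
import math

def H_alpha_gauss(sigma, alpha):
    # closed-form Rényi entropy of a Gaussian with standard deviation σ
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + math.log(alpha) / (2 * (alpha - 1))

mu1, mu2, alpha = 3.0, 1.7, 0.8
# X = μ₂ Y + μ₁ with Y ~ N(0, 1): then f_X is the density of N(μ₁, μ₂²),
# and the shift μ₁ does not affect H_α at all
H_Y = H_alpha_gauss(1.0, alpha)
H_X = H_alpha_gauss(mu2, alpha)
assert abs(H_X - (H_Y + math.log(mu2))) < 1e-12
```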

Comparison with the FME and PME solutions
The p.d.f. $\hat f$ which maximizes the Shannon entropy under finiteness of the second moment turns out to be identical to the fundamental solution of the diffusion equation with a Dirac point source. It is worth noting that $\hat f$ is actually a global maximum of $H_1$. This coincidence is accidental, as one may justify from the study of the corresponding optimization problem for the Rényi entropy. In particular, the nonlinear initial value problem
$$\partial_t u = \Delta u^{\alpha}, \qquad u(x, 0) = \delta(x),$$
has the well-known self-similar FME ($\alpha < 1$) and PME ($\alpha > 1$) solutions [21]. The $d$-dimensional time-dependent solutions derived from the optimization of Rényi's entropy can be shown to satisfy a similar initial value problem, provided that $\gamma \equiv \gamma(\alpha, d) = 1/(2 + d(\alpha - 1))$ and the diffusion coefficient is chosen appropriately, where $\mu_2(t) = t^\gamma$. The presence of the ratio $x/t^\gamma$ is implied by the self-similar property of the solution, which requires the function inside the parentheses to remain invariant under the rescalings $x \to \bar{x} = \lambda^\gamma x$ and $t \to \bar{t} = \lambda t$. Therefore, in general, the p.d.f. maximizing the Rényi entropy is a solution of an appropriately constructed diffusion equation problem. We plot the FME and PME solutions of (54) versus the solutions of (65). In the fast diffusion case $u_\infty < f_\infty$ since $C_{\alpha<1} < A_{\alpha<1}$, $\forall\, \frac{d}{d+2} < \alpha < 1$, as one may prove using (8.1).

Figure 2: A snapshot at $t = 1$ of the PME solutions of the (54) (red line) and (65) (blue line) initial value problems, with parameters $d = 1$, $\alpha = 2.2$.
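The self-similar structure and the PDE itself can be verified numerically for the standard $d = 1$ ZKB profile (a sketch under the assumption that the profile constant is $k = \gamma(\alpha - 1)/(2\alpha)$, the classical ZKB value; $\alpha$, $C$ and the sample point are illustrative):

```python
import math

alpha, d = 2.0, 1
gamma = 1.0 / (2.0 + d * (alpha - 1.0))       # γ(α, d) as in the text
k = gamma * (alpha - 1.0) / (2.0 * alpha)     # ZKB profile constant for d = 1
C = 0.5                                       # free (mass) parameter

def u(x, t):
    # ZKB/Barenblatt self-similar solution of u_t = (u^α)_xx in one dimension
    xi = x / t ** gamma
    return t ** (-gamma) * max(C - k * xi * xi, 0.0) ** (1.0 / (alpha - 1.0))

# invariance under the rescalings x → λ^γ x, t → λ t, u → λ^(−γ) u
x, t, lam = 0.3, 1.7, 5.0
assert abs(lam ** gamma * u(lam ** gamma * x, lam * t) - u(x, t)) < 1e-12

# check the PDE u_t = (u^α)_xx at an interior point by central differences
h = 1e-4
ut = (u(x, t + h) - u(x, t - h)) / (2 * h)
uxx = (u(x + h, t) ** alpha - 2 * u(x, t) ** alpha + u(x - h, t) ** alpha) / h ** 2
assert abs(ut - uxx) < 1e-5
```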
In the porous medium regime $u_\infty < f_\infty$ up to the threshold value $\alpha_{th.}(d=1) = 1.8268$, for which $C_{\alpha>1} = A_{\alpha>1}$, while $u_\infty > f_\infty$, $\forall \alpha > 1.8268$. The threshold value $\alpha_{th.}$, which depends on the dimension $d$, is determined numerically.

The concavity of Rényi's entropy power

Definition 6.1 The $\alpha$-weighted Fisher information of $f$ is defined as
$$I_\alpha[f] = \int_{\mathbb{R}^d} f\, |\nabla v|^2 \, dx, \qquad v = \frac{\alpha}{\alpha - 1}\, f^{\alpha - 1},$$
while the entropy power of $f$ associated to the Rényi entropy $H_\alpha$ is defined as
$$N_\alpha[f] = \exp\!\left( \Big( \frac{2}{d} + \alpha - 1 \Big) H_\alpha[f] \right).$$

Proposition 6.2 Let $\Omega = E^d$ be the $d$-dimensional Euclidean space. The entropy power $N_\alpha$ is a concave function of $t$, $\forall t \in (0, \infty)$, provided that inequality (69) is satisfied, where the right-hand side of the inequality represents the contributions from the global maximum of $H_\alpha$.

Proof
Using the nonlinear diffusion equation, a straightforward calculation reveals the connection between $dH_\alpha/dt$ and $I_\alpha$. The entropy power is a concave function of time iff $d^2 N_\alpha / dt^2 \le 0$ or, equivalently, when
$$\frac{d^2 H_\alpha}{dt^2} + \Big( \frac{2}{d} + \alpha - 1 \Big) \left( \frac{dH_\alpha}{dt} \right)^2 \le 0.$$
Next, we establish the identity (75); to do so we rewrite the integral in a convenient form. The first time derivative of the Rényi entropy then satisfies upper bounds which follow from the contribution of the term $\Delta(\ln \hat f)$ to the integral. Substituting (75) into (72) we recover the expected result.
✷ Note that in the $\alpha \to 1^-$ limit we reproduce the well-known result valid for Shannon's entropy.
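For the self-similar solutions the concavity statement can be seen very concretely: along the flow, $H_\alpha$ grows like $d\gamma \ln t$ plus a constant, so under the normalization $N_\alpha = \exp\big((2/d + \alpha - 1)H_\alpha\big)$ used above (the Savaré–Toscani convention, assumed here) the entropy power is exactly linear, hence (weakly) concave, in $t$. A numerical sketch with the $d = 1$ ZKB solution ($\alpha$ and $C$ illustrative):

```python
import math

alpha, d = 2.0, 1
gamma = 1.0 / (2.0 + d * (alpha - 1.0))
k = gamma * (alpha - 1.0) / (2.0 * alpha)   # ZKB profile constant for d = 1
C = 0.5
sigma = 2.0 / d + alpha - 1.0               # exponent in N_α = exp(σ H_α)

def H_alpha(t, n=200001):
    # Rényi entropy of the ZKB solution at time t by direct quadrature
    R = t ** gamma * math.sqrt(C / k)       # support edge at time t
    h = 2 * R / (n - 1)
    Z = sum(
        (t ** (-gamma) * max(C - k * ((-R + i * h) / t ** gamma) ** 2, 0.0)
         ** (1 / (alpha - 1))) ** alpha * (0.5 if i in (0, n - 1) else 1)
        for i in range(n)
    ) * h
    return math.log(Z) / (1.0 - alpha)

N = [math.exp(sigma * H_alpha(t)) for t in (1.0, 2.0, 3.0)]
# N_α(t) is affine in t along the self-similar flow: second difference ≈ 0
assert abs(N[2] - 2 * N[1] + N[0]) < 1e-4 * N[1]
```

This is consistent with $\sigma \, d\gamma = 1$, i.e. $N_\alpha(t) \propto t$, for every $\alpha$ and $d$.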
Next we prove the concavity of Rényi's entropy power in a different setting. This problem was also studied in [18], but our approach leads to a condition not predicted before. We will need the following lemma.

Proof
Using the porous medium equations for $f, v$, as well as the relation $\nabla f^\alpha = f \nabla v$, we obtain a chain of equalities in which integrations by parts have been performed in the fourth and fifth steps; in the last step Bochner's formula in Euclidean space,
$$\tfrac{1}{2} \Delta |\nabla v|^2 = \nabla v \cdot \nabla (\Delta v) + \|\mathrm{Hess}\, v\|^2,$$
has been applied. Relation (81) is proved using (80). ✷

Theorem 6.4 The Rényi entropy power, for self-similar solutions, is concave in $t$ provided that inequality (84) is satisfied, where the equality holds for $\alpha = 1$.

Proof
The Rényi entropy power is concave in $t$ iff condition (72) holds, which can be written equivalently as (84). ✷

Conclusions

Since the p.d.f.'s maximizing the Rényi entropy do not satisfy the same nonlinear diffusion equation initial value problem (54), they appear to behave in a particular way, as seen in Figures (1) and (2). If one considers finite even moments of the random variable $X$ as constraints and tries to solve the corresponding maximization problem, then the $S(\mathbb{R})$ solution exists whenever $\hat f^{\,\alpha-1}$ is a complete polynomial of even degree or, equivalently, when all the coefficients $\hat\lambda_{2k+1}$ vanish. In those cases there is the possibility of having no intersection points with the $x$-axis, since the roots come in complex conjugate pairs. The compactly supported solution, under certain conditions, can always be determined.
The concavity of the entropy power holds whenever the second time derivative of the entropy varies according to (69) or the function $f^\alpha - C f^{2\alpha - 1}$ belongs to $L^1(\mathbb{R}^d)$. It would be appealing to have a deeper understanding of the origin of the latter constraint, which at this stage seems to be a requirement for consistency.

Acknowledgments
The author would like to thank J. K. Pachos for fruitful discussions related to this project, and the Department of Physics and Astronomy of the University of Leeds for its hospitality during his visit.

3. The constraint between $d$ and $\lambda$ becomes

4. The last formula is

Proof
The large-$\rho$ asymptotic expansion of the ratio of gamma functions [19] is given by
$$\frac{\Gamma(\rho + a)}{\Gamma(\rho + b)} = \rho^{\,a - b} \left( 1 + O(\rho^{-1}) \right), \qquad \rho \to \infty.$$
Substituting the values $a = -s/2$, $b = 0$ in the previous expression and taking the limit $\rho \to \infty$ we recover the desired result. ✷
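The asymptotic ratio can be checked numerically with log-gamma to avoid overflow (a minimal sketch; the value $a = -0.75$, corresponding to $s = 1.5$, is an illustrative choice):

```python
import math

a, b = -0.75, 0.0   # the appendix substitutes a = −s/2, b = 0
errs = []
for rho in (1e2, 1e4, 1e6):
    # Γ(ρ+a)/Γ(ρ+b) computed stably via lgamma
    ratio = math.exp(math.lgamma(rho + a) - math.lgamma(rho + b))
    # deviation from the leading asymptotic term ρ^(a−b)
    errs.append(abs(ratio / rho ** (a - b) - 1.0))

# the deviation decays like 1/ρ, so it shrinks as ρ grows
assert errs[0] > errs[1] > errs[2]
assert errs[2] < 1e-5
```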