Article

Onicescu’s Informational Energy and Correlation Coefficient in Exponential Families

Sony Computer Science Laboratories Inc., Tokyo 141-0022, Japan
Foundations 2022, 2(2), 362-376; https://doi.org/10.3390/foundations2020025
Submission received: 4 February 2022 / Revised: 14 April 2022 / Accepted: 16 April 2022 / Published: 19 April 2022

Abstract

The informational energy of Onicescu is a positive quantity that measures the amount of uncertainty of a random variable. However, contrary to Shannon’s entropy, the informational energy is strictly convex and increases when randomness decreases. We report a closed-form formula for Onicescu’s informational energy and its associated correlation coefficient when the probability distributions belong to an exponential family. We show how to instantiate the generic formula for several common exponential families. Finally, we discuss the characterization of valid thermodynamic process trajectories on a statistical manifold by enforcing that the entropy and the informational energy shall vary in opposite directions.

1. Introduction

1.1. Onicescu’s Informational Energy

Let (X, F, μ) be a probability space [1] with σ-algebra F on the sample space X, and μ a base measure often chosen as the Lebesgue measure or as the counting measure. Let M denote the set of Radon-Nikodym densities of probability measures dominated by μ. Two probability densities p and q are said to be equal (i.e., p = q) if and only if p(x) = q(x) μ-almost everywhere, and different (i.e., p ≠ q) when μ({x ∈ X : p(x) ≠ q(x)}) > 0.
Octav Onicescu [2,3] (1892–1983) was a renowned Romanian mathematician who founded the school of probability theory and statistics [4] in Romania. Onicescu introduced the informational energy [5] (also termed information energy [6] in the literature) of a probability measure P ≪ μ with Radon-Nikodym density p = dP/dμ as
I(p) := ∫ p²(x) dμ(x) > 0.
The Rényi entropy [7,8] of order 2 can be written using the informational energy:
R₂(p) := −log ∫ p²(x) dμ(x) = −log I(p),
as well as Vajda's quadratic entropy [9]:
V₂(p) := 1 − ∫ p²(x) dμ(x) = 1 − I(p).
Notice that I(p) is well defined and finite when p belongs to L²(μ), the Lebesgue space of square integrable functions [1]. The informational energy of a continuous distribution p is I(p) = ∫_X p²(x) dx, and the informational energy of a discrete distribution q is I(q) = ∑_{x ∈ X} q²(x). Notice that the informational energy in the continuous case is not a limit of the informational energy in the discrete case [6,10].
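As a quick illustration, the following minimal sketch (not part of the original article; it assumes the Python packages NumPy and SciPy are available) evaluates the informational energy of a discrete distribution by direct summation and of a continuous density by numerical integration:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Discrete case: I(q) = sum_x q(x)^2 for a biased coin q = (1/4, 3/4).
q = np.array([0.25, 0.75])
I_discrete = np.sum(q**2)                                # 0.625

# Continuous case: I(p) = int p(x)^2 dx for the standard normal density.
I_continuous, _ = quad(lambda x: norm.pdf(x)**2, -np.inf, np.inf)
print(I_discrete, I_continuous, 1/(2*np.sqrt(np.pi)))    # last two values agree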
The informational energy is an important concept in statistics which fruitfully interplays with Shannon’s entropy [11,12] H ( p ) :
H(p) := ∫ p(x) log(1/p(x)) dμ(x) = −∫ p(x) log p(x) dμ(x).
Notice that the informational energy is always positive, but Shannon's entropy may be negative for continuous distributions (e.g., the differential entropy of a normal distribution with small standard deviation). For the Dirac distribution δ_e (with δ_e(x) = 1 when x = e and 0 otherwise), the informational energy is I(δ_e) = +∞ while Shannon's entropy is H(δ_e) = −∞.
Like Shannon's entropy, the informational energy measures the amount of uncertainty of a random variable, but it increases when randomness decreases.
Onicescu's informational energy also finds applications in various other fields since Onicescu's informational energy corresponds to the Herfindahl–Hirschman index [13] HHI(p) = I(p) in economics, to the index of coincidence [14] IC(p) = I(p) in information theory (originally developed for cryptanalysis [15]), and is related to Simpson's diversity index [16,17] S(p) = I(p) in ecology (also called the Gini-Simpson index of diversity [18]). Section 4 will consider the joint variations of the Shannon entropy and the informational energy to characterize valid thermodynamic process paths on a statistical manifold.
Another key difference with Shannon’s entropy is that Shannon’s entropy is always strictly concave [12] but the informational energy is always strictly convex:
Property 1. 
Onicescu’s informational energy I ( · ) is a strictly convex functional.
Proof. 
A functional F is strictly convex if and only if for any α ∈ (0, 1) and any two densities p ≠ q of a convex domain M, we have F((1 − α)p + αq) < (1 − α)F(p) + αF(q). Let us check this strict inequality for the informational energy I:
I((1 − α)p + αq) = (1 − α)² I(p) + α² I(q) + 2α(1 − α) ∫ p(x) q(x) dμ(x)
= (1 − α) I(p) + α I(q) + 2α(1 − α) ∫ p(x) q(x) dμ(x) − α(1 − α) (I(p) + I(q))
= (1 − α) I(p) + α I(q) − α(1 − α) ∫ (p(x) − q(x))² dμ(x)
< (1 − α) I(p) + α I(q),
since ∫ (p(x) − q(x))² dμ(x) > 0 when p ≠ q and α(1 − α) > 0 for α ∈ (0, 1). □
Table 1 summarizes the comparison between Shannon’s entropy and Onicescu’s informational energy.
Since I ( p ) is strictly convex, we can define the informational energy divergence as the following Jensen divergence [19] measuring the convexity gap:
J_I(p, q) := (I(p) + I(q))/2 − I((p + q)/2)
= (1/4) ∫_X (p(x) − q(x))² dμ(x).
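The identity above is easy to check numerically. The following sketch (hypothetical example parameters; SciPy assumed) computes J_I(p, q) both from its Jensen-divergence definition and from the squared-difference integral for two normal densities:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p = norm(0.0, 1.0).pdf
q = norm(1.5, 0.7).pdf
I = lambda f: quad(lambda x: f(x)**2, -np.inf, np.inf)[0]

lhs = 0.5*(I(p) + I(q)) - I(lambda x: 0.5*(p(x) + q(x)))          # Jensen gap
rhs = 0.25*quad(lambda x: (p(x) - q(x))**2, -np.inf, np.inf)[0]   # (1/4) int (p-q)^2
print(lhs, rhs)   # both values coincide up to quadrature error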
For the uniform discrete distribution p on an alphabet X of d letters, we have I(p) = 1/d, and for any probability mass function p on X, we have I(p) ≥ 1/d. Recall that H(p) ≤ log d, with equality when p is the discrete uniform distribution. More generally, for a continuous density on an interval X = [a, b], we have I(p) ≥ 1/(b − a).
For discrete or continuous distributions, we have the following inequality [20] (Proposition 5.8.5):
H(p) + (1/2) I(p) ≥ 1 − log 2 ≃ 0.69897,
and Shannon's cross-entropy
H×(p : q) = −∫ p(x) log q(x) dμ(x)
can be lower bounded using the informational energy as follows (Problem 5.8 in [20]): For any x > 0, we have log x ≤ x − 1. Thus we get log q(x) ≤ q(x) − 1 and −p(x) log q(x) ≥ p(x) − p(x) q(x). Therefore we have
H×(p : q) ≥ 1 − ∫ p(x) q(x) dμ(x).
Using the Cauchy–Schwarz inequality
∫ p(x) q(x) dμ(x) ≤ √(∫ p(x)² dμ(x)) √(∫ q(x)² dμ(x)),
we get:
H×(p : q) ≥ 1 − √(I(p) I(q)).
In particular, setting q = p, we get a lower bound on Shannon's entropy:
H(p) = H×(p : p) ≥ 1 − I(p).
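The cross-entropy lower bound can likewise be probed numerically; the sketch below (arbitrary illustrative parameters, not taken from the article; SciPy assumed) compares H×(p : q) with 1 − √(I(p) I(q)) for two normal densities:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p, q = norm(0.0, 1.0), norm(2.0, 0.5)
I = lambda d: quad(lambda x: d.pdf(x)**2, -np.inf, np.inf)[0]
H_cross = quad(lambda x: -p.pdf(x)*q.logpdf(x), -np.inf, np.inf)[0]
print(H_cross, 1 - np.sqrt(I(p)*I(q)))   # the first value dominates the second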
For an in-depth treatment of Onicescu’s informational energy, we refer to the paper [6] (77 pages, with main properties listed in pp. 167–169 as statistical applications). Onicescu’s informational energy has been used in physics [21], information theory in electronic structure theory of atomic and molecular systems [22,23,24], machine learning [17,25], and complex systems [26], among others.

1.2. Onicescu’s Correlation Coefficient

Onicescu also defined a correlation coefficient (see [20], Chapter 5):
ρ(p, q) := I(p, q) / √(I(p) I(q)),
where I(p, q) denotes the cross-informational energy:
I(p, q) := ∫ p(x) q(x) dμ(x),
with I(p) = I(p, p). Notice that it follows from the Cauchy-Schwarz inequality that I(p, q) ≤ √(I(p) I(q)), and therefore we have:
0 < ρ(p, q) ≤ 1,
assuming both densities p and q belong to the Lebesgue space L 2 ( μ ) .
Notice that the informational energy of a statistical mixture m(x) = ∑_{i=1}^k w_i p_i(x) with k weighted components p_1(x), …, p_k(x) (with w ∈ Δ_k, the (k − 1)-dimensional standard simplex) can be expressed as follows:
I(m) = ∫ ( ∑_{i=1}^k w_i p_i(x) )² dμ(x) = ∑_{i=1}^k ∑_{j=1}^k w_i w_j I(p_i, p_j).
The Cauchy-Schwarz divergence [27,28] is defined by
D_CS(p, q) := −log [ ∫_X p(x) q(x) dμ(x) / ( √(∫_X p(x)² dμ(x)) √(∫_X q(x)² dμ(x)) ) ] ≥ 0.
Thus, the Cauchy-Schwarz divergence is a projective divergence (that is, we have D_CS(p, q) = D_CS(λp, λ′q) for any λ > 0 and λ′ > 0) which can be rewritten using Onicescu's correlation coefficient as:
D_CS(p, q) = −log ρ(p, q).

1.3. Exponential Families

Consider a natural exponential family [29,30] (NEF)
E = { p_θ(x) = exp( θ⊤ t(x) − F(θ) + k(x) ) : θ ∈ Θ },
where t(x) denotes the minimal sufficient statistics, k(x) an auxiliary measure carrier term, and
F(θ) := log ∫_X exp( θ⊤ t(x) ) dμ(x),
the cumulant function which is commonly called the log-normalizer (or log-partition function in statistical physics). Parameter θ is called the natural parameter and is defined on the open convex natural parameter space Θ .
Many familiar families of distributions { p_λ(x) : λ ∈ Λ } are exponential families in disguise after reparameterization: p_λ(x) = p_{θ(λ)}(x) (e.g., the normal family or the Poisson family). Those families are called exponential families (omitting the leading adjective ‘natural’), and their densities are canonically factorized as follows:
p_λ(x) = exp( θ(λ)⊤ t(x) − F(θ(λ)) + k(x) ).
We call parameter λ ∈ Λ the source parameter, and parameter θ(λ) ∈ Θ is called the corresponding natural parameter. All densities of an exponential family share the same support X.
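To make the canonical factorization concrete, here is a small sketch (SciPy assumed; the Poisson decomposition t(x) = x, θ(λ) = log λ, F(θ) = exp(θ), k(x) = −log x! is the one used later in Section 3.2) that rebuilds the Poisson probability mass function from its exponential-family form:

import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln

lam = 3.2
theta = np.log(lam)                  # natural parameter
F = np.exp(theta)                    # cumulant function F(theta) = exp(theta)
x = np.arange(10)
canonical = np.exp(theta*x - F - gammaln(x + 1))     # k(x) = -log(x!)
print(np.allclose(canonical, poisson(lam).pmf(x)))   # True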

2. Onicescu’s Informational Energy and Correlation Coefficient in Exponential Families

In this section, we first report a closed-form formula for the informational energy and its associated correlation coefficient in Section 2.1. We then describe some statistical divergences related to Onicescu’s correlation coefficient in Section 2.2.

2.1. Closed-Form Formula

We report closed-form formulas for Onicescu's informational energy and correlation coefficient when densities belong to a prescribed exponential family, and then illustrate those formulas on common families of probability distributions.
Theorem 1 
(Onicescu's informational energy and correlation coefficient in exponential families). In an exponential family E = { p_θ }_{θ ∈ Θ}, Onicescu's informational energy of a probability density p_θ is given by:
I(p_θ) = exp( F(2θ) − 2F(θ) ) E_{p_{2θ}}[exp(k(x))],
provided that 2θ ∈ Θ so that p_{2θ} ∈ E. When the auxiliary carrier term k(x) vanishes, we have E_{p_{2θ}}[exp(k(x))] = 1.
Onicescu's correlation coefficient ρ(p_{θ_1}, p_{θ_2}) between densities p_1 = p_{θ_1} and p_2 = p_{θ_2} is
ρ(p_{θ_1}, p_{θ_2}) = exp( −J_F(2θ_1 : 2θ_2) ) × E_{p_{θ_1+θ_2}}[exp(k(x))] / √( E_{p_{2θ_1}}[exp(k(x))] E_{p_{2θ_2}}[exp(k(x))] ),
provided that θ_1 + θ_2 ∈ Θ, 2θ_1 ∈ Θ, 2θ_2 ∈ Θ, where
J_F(θ_1, θ_2) := ( F(θ_1) + F(θ_2) )/2 − F( (θ_1 + θ_2)/2 ) ≥ 0,
is a Jensen divergence [19] induced by the cumulant function of the exponential family.
Proof. 
The proof follows the same line of arguments as in [31]. Consider the term I(p_{θ_1}, p_{θ_2}):
I(p_{θ_1}, p_{θ_2}) := ∫ exp( θ_1⊤ t(x) − F(θ_1) + k(x) ) exp( θ_2⊤ t(x) − F(θ_2) + k(x) ) dμ(x)
= ∫ exp( (θ_1 + θ_2)⊤ t(x) − F(θ_1 + θ_2) + k(x) + F(θ_1 + θ_2) − F(θ_1) − F(θ_2) + k(x) ) dμ(x)
= exp( F(θ_1 + θ_2) − F(θ_1) − F(θ_2) ) ∫ p_{θ_1+θ_2}(x) exp(k(x)) dμ(x)
= exp( F(θ_1 + θ_2) − F(θ_1) − F(θ_2) ) E_{p_{θ_1+θ_2}}[exp(k(x))],
provided that θ_1 + θ_2 ∈ Θ. This condition is always satisfied when the natural parameter space is either a cone [31] (e.g., Gaussian family, Wishart family, etc.) or an affine space [32] (e.g., Poisson family, isotropic Gaussian family, etc.). Since I(p) = I(p, p) and ρ(p, q) = I(p, q)/√(I(p) I(q)), we deduce the formulas of Equations (21) and (22). □
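As a sanity check of this computation, the closed-form cross-informational energy exp(F(θ_1 + θ_2) − F(θ_1) − F(θ_2)) can be compared against direct numerical integration for two exponential densities, where θ = −λ and F(θ) = −log(−θ) (a numerical sketch with arbitrary rates, SciPy assumed):

import numpy as np
from scipy.integrate import quad

lam1, lam2 = 1.3, 2.4
F = lambda th: -np.log(-th)
th1, th2 = -lam1, -lam2

closed_form = np.exp(F(th1 + th2) - F(th1) - F(th2))   # equals lam1*lam2/(lam1 + lam2)
numerical = quad(lambda x: lam1*np.exp(-lam1*x) * lam2*np.exp(-lam2*x), 0, np.inf)[0]
print(closed_form, numerical)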

2.2. Divergences Related to Onicescu’s Correlation Coefficient

Since D CS ( p , q ) = log ρ ( p , q ) , we get the following closed-form for the Cauchy-Schwarz divergence:
D_CS(p_{θ_1}, p_{θ_2}) = J_F(2θ_1 : 2θ_2) + log [ √( E_{p_{2θ_1}}[exp(k(x))] E_{p_{2θ_2}}[exp(k(x))] ) / E_{p_{θ_1+θ_2}}[exp(k(x))] ].
We check that when θ_1 = θ_2, we have D_CS(p_{θ_1}, p_{θ_2}) = 0. Closed-form formulas were also reported for the Cauchy-Schwarz divergence between densities of an exponential family in [33]. Table 2 reports the formulas of Onicescu's informational energy and of Shannon's entropy [34] for densities belonging to some common exponential families. These formulas can be recovered easily from the generic formula using the canonical decompositions of exponential families reported in [29]. Shannon's entropy [34] of a density p_θ of E is
H(p_θ) = F(θ) − θ⊤ ∇F(θ) − E_{p_θ}[k(x)],
H(p_θ) = −F*(η) − E_{p_θ}[k(x)],
where F* denotes the Legendre-Fenchel convex conjugate and η = ∇F(θ) = E_{p_θ}[t(x)] the moment parameter.
Notice that when k(x) = 0 (no auxiliary carrier term, e.g., the Gaussian family), we have E_p[e^{k(x)}] = E_p[1] = ∫ p(x) dμ(x) = 1 for any density p ∈ M. In that case, the above formulas simplify as follows:
I(p_θ) = exp( F(2θ) − 2F(θ) ),
ρ(p_{θ_1}, p_{θ_2}) = exp( −J_F(2θ_1 : 2θ_2) ),
D_CS(p_{θ_1}, p_{θ_2}) = J_F(2θ_1 : 2θ_2).
The Cauchy-Schwarz divergence between mixtures of Gaussians has been reported in [35], and extended to mixtures of exponential families with conic natural parameter spaces in [33].
Furthermore, since the Jensen divergence J_F is defined for a strictly convex generator F modulo an affine term, we may choose the representative F(θ) = −log p_θ(ω) = −l_θ(ω) for the equivalence class [F] of strictly convex functions, where ω is any point belonging to the support X of the exponential family and l_θ(·) denotes the (concave) log-likelihood function; see [36] for details.
It follows that we can rewrite Onicescu's informational energy, the correlation coefficient, and the Cauchy-Schwarz divergence when k(x) = 0 as follows:
I(p_θ) = p_θ²(ω) / p_{2θ}(ω),   ∀ω ∈ X,
ρ(p_{θ_1}, p_{θ_2}) = √( p_{2θ_1}(ω) p_{2θ_2}(ω) ) / p_{θ_1+θ_2}(ω),   ∀ω ∈ X,
D_CS(p_{θ_1}, p_{θ_2}) = −log [ √( p_{2θ_1}(ω) p_{2θ_2}(ω) ) / p_{θ_1+θ_2}(ω) ]
= l_{θ_1+θ_2}(ω) − ( l_{2θ_1}(ω) + l_{2θ_2}(ω) )/2,   ∀ω ∈ X.
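The ω-trick is easy to exercise numerically. In the sketch below (illustrative values, SciPy assumed), doubling the natural parameter of N(μ, σ²) yields N(μ, σ²/2), and the ratio p_θ²(ω)/p_{2θ}(ω) returns 1/(2σ√π) regardless of the chosen ω:

import numpy as np
from scipy.stats import norm

mu, sigma = 0.7, 1.8
p_theta = norm(mu, sigma)
p_2theta = norm(mu, sigma/np.sqrt(2))      # doubling the natural parameter halves the variance
for omega in [-3.0, 0.0, 5.0]:
    print(p_theta.pdf(omega)**2 / p_2theta.pdf(omega))   # always 1/(2 sigma sqrt(pi))
print(1/(2*sigma*np.sqrt(np.pi)))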
Moreover, the Cauchy-Schwarz divergence can be generalized to the broader class of Hölder divergences [28] for conjugate exponents 1/α + 1/β = 1 with α > 1, β > 1, and γ > 0 as follows:
D_Hölder^{α,γ}(p_{θ_1}, p_{θ_2}) := −log [ ∫_X p(x)^{γ/α} q(x)^{γ/β} dμ(x) / ( (∫_X p(x)^γ dμ(x))^{1/α} (∫_X q(x)^γ dμ(x))^{1/β} ) ]
= log [ p_{(γ/α)θ_1 + (γ/β)θ_2}(ω) / ( p_{γθ_1}(ω)^{1/α} p_{γθ_2}(ω)^{1/β} ) ],   ∀ω ∈ X,
D_Hölder^{α,γ}(p_{λ_1}, p_{λ_2}) = log [ p_{(γ/α)θ(λ_1) + (γ/β)θ(λ_2)}(ω) / ( p_{γθ(λ_1)}(ω)^{1/α} p_{γθ(λ_2)}(ω)^{1/β} ) ],   ∀ω ∈ X.
The latter two formulas hold when k(x) = 0 (no auxiliary carrier term, as for the Gaussian family). When α = β = γ = 2, we recover the Cauchy-Schwarz divergence: D_Hölder^{2,2}(p_{θ_1}, p_{θ_2}) = D_CS(p_{θ_1}, p_{θ_2}).

3. Some Illustrating Examples

Let us illustrate how to instantiate the generic formulas on some examples of exponential families.

3.1. Exponential Family of Exponential Distributions

Consider the family of exponential distributions with rate parameter λ > 0. The densities of this exponential family write as p_λ(x) = λ exp(−λx) with support X = [0, ∞). We use the canonical decomposition of the exponential family to get t(x) = x, θ = −λ, F(θ) = −log(−θ), and k(x) = 0. It follows that
I(p_θ) = exp( F(2θ) − 2F(θ) )
= exp( −log(−2θ) + 2 log(−θ) )
= exp( −log 2 + log(−θ) )
= −θ/2.
Thus I(p_λ) = λ/2. Similarly, we find that
I(p_{λ_1}, p_{λ_2}) = exp( F(θ_1 + θ_2) − F(θ_1) − F(θ_2) ) E_{p_{θ_1+θ_2}}[exp(k(x))]
= λ_1 λ_2 / (λ_1 + λ_2).
Thus ρ(p_{λ_1}, p_{λ_2}) = 2√(λ_1 λ_2)/(λ_1 + λ_2), and D_CS(p_{λ_1}, p_{λ_2}) = log((λ_1 + λ_2)/2) − (1/2) log(λ_1 λ_2). We check that D_CS(p_{λ_1}, p_{λ_2}) ≥ 0 since the arithmetic mean A(λ_1, λ_2) = (λ_1 + λ_2)/2 is greater than or equal to the geometric mean G(λ_1, λ_2) = √(λ_1 λ_2), and D_CS(p_{λ_1}, p_{λ_2}) = log( A(λ_1, λ_2)/G(λ_1, λ_2) ).
Choose ω = 0 so that p_λ(ω) = λ and l_λ(ω) = log λ. Then
ρ(p_{θ_1}, p_{θ_2}) = √( p_{2θ_1}(ω) p_{2θ_2}(ω) ) / p_{θ_1+θ_2}(ω)
= 2√(λ_1 λ_2)/(λ_1 + λ_2),
D_CS(p_{θ_1}, p_{θ_2}) = l_{θ_1+θ_2}(ω) − ( l_{2θ_1}(ω) + l_{2θ_2}(ω) )/2
= log(λ_1 + λ_2) − (1/2)( log(2λ_1) + log(2λ_2) ) = log( (λ_1 + λ_2)/(2√(λ_1 λ_2)) ).
To illustrate the fact that these ω-based formulas are independent of the choice of ω, let us consider ω = 1 so that p_λ(ω) = λ exp(−λ) with log-likelihood l_λ(ω) = log(λ) − λ. The correlation coefficient is then calculated as
ρ(p_{θ_1}, p_{θ_2}) = √( p_{2θ_1}(ω) p_{2θ_2}(ω) ) / p_{θ_1+θ_2}(ω)
= √( 2λ_1 exp(−2λ_1) · 2λ_2 exp(−2λ_2) ) / ( (λ_1 + λ_2) exp(−(λ_1 + λ_2)) )
= ( 2√(λ_1 λ_2)/(λ_1 + λ_2) ) × exp(−(λ_1 + λ_2)) / exp(−(λ_1 + λ_2))
= 2√(λ_1 λ_2)/(λ_1 + λ_2).
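A short numerical sketch (arbitrary rates, SciPy assumed) confirms this closed form of the correlation coefficient between two exponential densities:

import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

lam1, lam2 = 0.8, 3.1
p, q = expon(scale=1/lam1).pdf, expon(scale=1/lam2).pdf
cross = lambda f, g: quad(lambda x: f(x)*g(x), 0, np.inf)[0]
rho_num = cross(p, q) / np.sqrt(cross(p, p)*cross(q, q))
print(rho_num, 2*np.sqrt(lam1*lam2)/(lam1 + lam2))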

3.2. Exponential Family of Poisson Distributions

The Poisson family of probability mass functions (PMFs) p_λ(x) = λ^x exp(−λ)/x!, where λ > 0 denotes the intensity parameter and x ∈ X = {0, 1, 2, …}, is a discrete exponential family with sufficient statistic t(x) = x, natural parameter θ(λ) = log λ (affine natural parameter space), cumulant function F(θ) = exp(θ), and auxiliary carrier term k(x) = −log x!. The informational energy is
I(p_λ) = ∑_{x=0}^∞ p_λ²(x)
= I(p_θ) = exp( F(2θ) − 2F(θ) ) E_{p_{2θ}}[exp(k(x))]
= exp( e^{2θ} − 2e^θ ) E_{p_{2θ}}[1/x!]
= e^{λ² − 2λ} E_{p_{λ²}}[1/x!].
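Both expressions of the Poisson informational energy can be compared on a truncated support (a sketch with an arbitrary intensity; the truncation at 60 terms is an assumption that suffices for small λ):

import numpy as np
from scipy.stats import poisson
from scipy.special import gamma

lam = 2.5
x = np.arange(60)                                              # truncated support
direct = np.sum(poisson(lam).pmf(x)**2)                        # sum_x p_lambda(x)^2
expectation = np.sum(poisson(lam**2).pmf(x)/gamma(x + 1.0))    # E_{Poisson(lam^2)}[1/x!]
print(direct, np.exp(lam**2 - 2*lam)*expectation)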

3.3. Exponential Family of Univariate Normal Distributions

Consider the set of univariate normal probability density functions:
N := { p_λ(x) = (1/(σ√(2π))) exp( −(1/2)((x − μ)/σ)² ) : λ = (μ, σ²) ∈ R × R₊₊ },
where R₊₊ = { x ∈ R : x > 0 } denotes the set of positive reals. Family N is interpreted as an exponential family indexed by the source parameter λ = (μ, σ²) ∈ Λ with Λ = R × R₊₊. The corresponding natural parameter is θ(λ) = ( μ/σ², −1/(2σ²) ) with the sufficient statistic t(x) = (x, x²) on the support X = (−∞, ∞) (and no additional carrier term, i.e., k(x) = 0). The cumulant function of the normal family is F(θ) = −θ_1²/(4θ_2) + (1/2) log( π/(−θ_2) ).
We have
I(p_θ) = exp( F(2θ) − 2F(θ) )
= exp( −(2θ_1)²/(4(2θ_2)) + (1/2) log( π/(−2θ_2) ) + θ_1²/(2θ_2) − log( π/(−θ_2) ) )
= exp( (1/2) log( π/(−2θ_2) ) − log( π/(−θ_2) ) )
= exp( −(1/2) log(2π) − (1/2) log(2σ²) )
= 1/(2σ√π).
Similar calculations for two normal densities with source parameters λ_1 = (μ_1, σ_1²) and λ_2 = (μ_2, σ_2²) yield
I(p_{μ_1,σ_1}, p_{μ_2,σ_2}) = ( 1/√(2π(σ_1² + σ_2²)) ) exp( −(μ_1 − μ_2)²/(2σ_1² + 2σ_2²) ).
We check that I ( p θ ) = I ( p θ , p θ ) .
It follows that Onicescu’s correlation coefficient between two normal densities is:
ρ(p_{μ_1,σ_1}, p_{μ_2,σ_2}) = √( 2σ_1σ_2/(σ_1² + σ_2²) ) exp( −(μ_1 − μ_2)²/(2σ_1² + 2σ_2²) ),
and the Cauchy-Schwarz divergence between two univariate Gaussians is:
D_CS(p_{μ_1,σ_1}, p_{μ_2,σ_2}) = −log ρ(p_{μ_1,σ_1}, p_{μ_2,σ_2}) = (μ_1 − μ_2)²/(2σ_1² + 2σ_2²) + (1/2) log( (1/2)( σ_1/σ_2 + σ_2/σ_1 ) ).
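The univariate closed form can be checked against the integral definition of the Cauchy-Schwarz divergence (a sketch with arbitrary parameters, SciPy assumed):

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu1, s1, mu2, s2 = 0.0, 1.0, 2.0, 0.5
p, q = norm(mu1, s1).pdf, norm(mu2, s2).pdf
num = quad(lambda x: p(x)*q(x), -np.inf, np.inf)[0]
den = np.sqrt(quad(lambda x: p(x)**2, -np.inf, np.inf)[0] * quad(lambda x: q(x)**2, -np.inf, np.inf)[0])
dcs_numerical = -np.log(num/den)
dcs_closed = (mu1 - mu2)**2/(2*s1**2 + 2*s2**2) + 0.5*np.log(0.5*(s1/s2 + s2/s1))
print(dcs_numerical, dcs_closed)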

3.4. Exponential Family of Multivariate Normal Distributions

Consider the example of the multivariate normal (MVN) family: The parameter λ = (λ_v, λ_M) of an MVN consists of a vector part λ_v = μ and a d × d positive-definite matrix part λ_M = Σ ≻ 0. The density is given by
p_λ(x) = ( 1/( (2π)^{d/2} √|λ_M| ) ) exp( −(1/2) (x − λ_v)⊤ λ_M^{−1} (x − λ_v) ),
where |·| denotes the matrix determinant. Choose the sufficient statistic t(x) = (x, −(1/2) x x⊤) so that θ = (θ_v = Σ^{−1}μ, θ_M = Σ^{−1}). Since k(x) = 0, let ω = 0,
p_λ(0) = ( 1/( (2π)^{d/2} √|Σ| ) ) exp( −(1/2) μ⊤ Σ^{−1} μ ),
and apply the formula of Equation (33) with 2θ_M = 2Σ^{−1} = ((1/2)Σ)^{−1}:
I(p_θ) = p_θ²(0) / p_{2θ}(0)
= ( (2π)^{d/2} √|(1/2)Σ| ) / ( (2π)^d |Σ| ) = 1/( 2^d π^{d/2} |Σ|^{1/2} ).
Let us calculate the formula of the Cauchy-Schwarz divergence between two multivariate Gaussian distributions. We have θ(λ) = (θ_v, θ_M) = (Σ^{−1}μ, Σ^{−1}) for λ = (μ, Σ). Conversely, we have λ(θ) = (θ_M^{−1} θ_v, θ_M^{−1}). It follows that
λ(θ_1 + θ_2) = ( (Σ_1^{−1} + Σ_2^{−1})^{−1} (Σ_1^{−1}μ_1 + Σ_2^{−1}μ_2), (Σ_1^{−1} + Σ_2^{−1})^{−1} ).
In particular, we have λ(2θ) = (μ, (1/2)Σ). Let ω = 0 so that
p_λ(0) = ( 1/( (2π)^{d/2} √|Σ| ) ) exp( −(1/2) μ⊤ Σ^{−1} μ ),
l_λ(0) = −(d/2) log(2π) − (1/2) log|Σ| − (1/2) μ⊤ Σ^{−1} μ.
Thus we get
D_CS(p_{λ_1}, p_{λ_2}) = l_{λ(θ_1+θ_2)}(ω) − ( l_{λ(2θ_1)}(ω) + l_{λ(2θ_2)}(ω) )/2,
D_CS(p_{μ_1,Σ_1}, p_{μ_2,Σ_2}) = (1/2) log( (1/2^d) √(|Σ_1| |Σ_2|) / |(Σ_1^{−1} + Σ_2^{−1})^{−1}| ) + (1/2) μ_1⊤ Σ_1^{−1} μ_1 + (1/2) μ_2⊤ Σ_2^{−1} μ_2
− (1/2) (Σ_1^{−1}μ_1 + Σ_2^{−1}μ_2)⊤ (Σ_1^{−1} + Σ_2^{−1})^{−1} (Σ_1^{−1}μ_1 + Σ_2^{−1}μ_2).
The formula coincides with the formula of Equation (66) when Σ 1 = σ 1 2 and Σ 2 = σ 2 2 .
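In the multivariate case, the ω = 0 log-density trick and the explicit matrix formula above can be cross-checked with a few lines of linear algebra (a sketch with arbitrary 2 × 2 covariance matrices, NumPy assumed):

import numpy as np

def log_pdf0(mu, Sigma):
    # log density of N(mu, Sigma) evaluated at omega = 0
    d = len(mu)
    return -0.5*d*np.log(2*np.pi) - 0.5*np.linalg.slogdet(Sigma)[1] \
           - 0.5*mu @ np.linalg.solve(Sigma, mu)

mu1, Sigma1 = np.array([0.0, 1.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
mu2, Sigma2 = np.array([1.5, -0.5]), np.array([[1.0, -0.2], [-0.2, 0.5]])
P1, P2 = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)    # precision matrices
S = np.linalg.inv(P1 + P2)                               # covariance part of lambda(theta1 + theta2)
m = S @ (P1 @ mu1 + P2 @ mu2)                            # mean part of lambda(theta1 + theta2)

dcs_trick = log_pdf0(m, S) - 0.5*(log_pdf0(mu1, 0.5*Sigma1) + log_pdf0(mu2, 0.5*Sigma2))
d = 2
dcs_formula = 0.5*np.log(np.sqrt(np.linalg.det(Sigma1)*np.linalg.det(Sigma2))/(2**d*np.linalg.det(S))) \
              + 0.5*mu1 @ P1 @ mu1 + 0.5*mu2 @ P2 @ mu2 \
              - 0.5*(P1 @ mu1 + P2 @ mu2) @ S @ (P1 @ mu1 + P2 @ mu2)
print(dcs_trick, dcs_formula)   # both values coincide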

3.5. Exponential Family of Pareto Distributions

Consider the family of Pareto densities defined by a shape parameter a > 0 and a prescribed scale parameter k > 0 as follows:
p_a(x) = a k^a / x^{a+1},   x ∈ [k, ∞).
Writing the density as p_a(x) = exp( a log k + log a − (a + 1) log x ) = p_{θ(a)}(x), we deduce that the Pareto densities form an exponential family with natural parameter θ(a) = a + 1, sufficient statistic t(x) = −log x, and k(x) = 0. Let us choose ω = k, and apply the generic formula for the informational energy with θ(a) = a + 1 and θ^{−1}(b) = b − 1 (so that 2θ(a) = 2a + 2 corresponds to the shape parameter θ^{−1}(2a + 2) = 2a + 1):
I(p_θ) = p_θ²(ω) / p_{2θ}(ω),
I(p_a) = p_{θ(a)}²(k) / p_{2θ(a)}(k) = p_a²(k) / p_{2a+1}(k)
= ( a k^a / k^{a+1} )² × k^{2a+2} / ( (2a + 1) k^{2a+1} )
= a² / ( k (2a + 1) ).
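A one-line quadrature confirms this Pareto closed form (a sketch with arbitrary shape and scale, SciPy assumed):

import numpy as np
from scipy.integrate import quad

a, k = 2.5, 1.7
pareto_pdf = lambda x: a*k**a/x**(a + 1)
I_numerical = quad(lambda x: pareto_pdf(x)**2, k, np.inf)[0]
print(I_numerical, a**2/(k*(2*a + 1)))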

3.6. Instantiating Formulas with a Computer Algebra System

In general, we can automate the calculation of closed-form formulas for conic exponential families using a computer algebra system (CAS) by defining the source-to-natural parameter conversion function θ(λ), and then applying the formula
D_CS(p_{λ_1}, p_{λ_2}) = log [ p_{θ(λ_1)+θ(λ_2)}(ω) / √( p_{2θ(λ_1)}(ω) p_{2θ(λ_2)}(ω) ) ],   ∀ω ∈ X.
For example, using the CAS Maxima (http://maxima.sourceforge.net/ accessed on 14 April 2022), we can calculate the formula of the informational energy of Pareto densities as follows:
/* Pareto densities form an exponential family */
assume(k>0);
assume(a>0);
Pareto(x,a):=a*(k**a)/(x**(a+1));
/* check that it is a density (=1) */
integrate(Pareto(x,a),x,k,inf);
/* calculate Onicescu's informational energy */
integrate(Pareto(x,a)**2,x,k,inf);
/* method bypassing the integral calculation */
omega:k;
(Pareto(omega,a)**2)/Pareto(omega,2*a+1);

4. Informational Energy and the Laws of Thermodynamics

The informational energy was originally motivated by an analogy to kinetic energy in physics, and it proves useful when investigating the laws of thermodynamics on a statistical manifold [20], where thermodynamic processes [37] can be viewed as oriented trajectories on the manifold, as depicted in Figure 1. Indeed, the third law of thermodynamics states that the entropy and the kinetic energy (i.e., informational energy) of an isolated thermodynamical system should vary in opposite directions. Thus, when viewing a parametric family M = { p_θ : θ ∈ Θ } of distributions as a statistical manifold [20,38], a valid oriented thermodynamic path { p_{θ(t)} : t ∈ I ⊂ R } (with time t increasing) should satisfy the following condition:
( H(p_{θ(t+dt)}) − H(p_{θ(t)}) ) × ( I(p_{θ(t+dt)}) − I(p_{θ(t)}) ) < 0.
This condition can be written equivalently as the following variational inequality:
(d/dt) H(p_{θ(t)}) × (d/dt) I(p_{θ(t)}) < 0.
We consider the thermodynamic process paths on exponential family manifolds in Section 4.1 and on location-scale manifolds in Section 4.2.

4.1. Exponential Family Manifolds

Consider statistical manifolds induced by exponential families [20], and let us report the variations of entropy and informational energy on those manifolds.
Lemma 1. 
The variation of the entropy of a density p_{θ(t)} of a D-dimensional exponential family with log-normalizer F(θ) for θ = (θ_1, …, θ_D) and zero auxiliary carrier term is
(d/dt) H(p_{θ(t)}) = −θ̇(t)⊤ ∇²F(θ(t)) θ(t),
where θ̇(t) = ( (d/dt) θ_1(t), …, (d/dt) θ_D(t) ).
Proof. 
The entropy H(p_θ) of a density p_θ of a (potentially reparametrized) natural exponential family with log-normalizer F(θ) is given by [34]:
H(p_θ) = −F*(η) = F(θ) − ⟨θ, ∇F(θ)⟩,
where ⟨·, ·⟩ denotes the scalar product (Euclidean inner product). It follows that we have
(d/dt) H(p_{θ(t)}) = (d/dt) [ F(θ(t)) − ⟨θ(t), ∇F(θ(t))⟩ ]
= ⟨θ̇(t), ∇F(θ(t))⟩ − ⟨θ̇(t), ∇F(θ(t))⟩ − ⟨θ(t), ∇²F(θ(t)) θ̇(t)⟩
= −θ̇(t)⊤ ∇²F(θ(t)) θ(t).
Notice that the Hessian matrix ∇²F(θ) is positive-definite since the log-normalizer F(θ) is strictly convex for a minimal regular exponential family [20]. □
Next, we report the variation of the informational energy on any thermodynamic process trajectory:
Lemma 2. 
The variation of the informational energy of a density p_{θ(t)} of an exponential family of order D with log-normalizer F(θ) and zero auxiliary carrier term k(x) is
(d/dt) I(p_{θ(t)}) = 2 ⟨θ̇(t), ∇F(2θ(t)) − ∇F(θ(t))⟩ I(p_{θ(t)}),
provided that 2θ ∈ Θ.
Proof. 
Assume no extra auxiliary carrier term (i.e., k(x) = 0) so that by Theorem 1 we have I(p_θ) = exp(F(2θ) − 2F(θ)). Therefore, we get
(d/dt) I(p_{θ(t)}) = 2 ⟨θ̇(t), ∇F(2θ(t)) − ∇F(θ(t))⟩ I(p_{θ(t)}). □
Thus we can check whether the statistical normal manifold [20] satisfies the third thermodynamic law everywhere as follows: We have H(p_{μ(t),σ(t)}) = log( √(2πe) σ(t) ) and I(p_{μ(t),σ(t)}) = 1/(2√π σ(t)). Therefore we get:
(d/dt) H(p_{θ(t)}) × (d/dt) I(p_{θ(t)}) = ( σ̇(t)/σ(t) ) × ( −σ̇(t)/(2√π σ²(t)) )
= −(1/(2√π)) σ̇(t)²/σ³(t) ≤ 0,
since σ(t) > 0 for all t, with strict inequality whenever σ̇(t) ≠ 0. Any smooth curve on a statistical normal manifold can thus serve as a thermodynamic process path satisfying the third law of thermodynamics.
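This behaviour can be visualized numerically: along any smooth curve (μ(t), σ(t)) on the normal manifold, the finite-difference estimates of dH/dt and dI/dt never have the same sign (a sketch with an arbitrary scale curve, NumPy assumed):

import numpy as np

t = np.linspace(0.1, 3.0, 200)
sigma = 1.0 + np.sin(t)**2                       # any smooth positive scale curve sigma(t)
H = np.log(sigma*np.sqrt(2*np.pi*np.e))          # entropy of N(mu(t), sigma(t)^2)
I = 1.0/(2*np.sqrt(np.pi)*sigma)                 # informational energy
dH, dI = np.gradient(H, t), np.gradient(I, t)
print(np.all(dH*dI <= 0))                        # True: the product is never positive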

4.2. Location-Scale Manifolds

Consider the setting where M is a statistical manifold modeling a location-scale family:
M = { p_{l,s}(x) = (1/s) p_std( (x − l)/s ) : (l, s) ∈ R × R₊₊ },
where p_std(x) denotes the standard probability density function of the family (i.e., p_std(x) = p_{0,1}(x)). The family of normal distributions and the family of Cauchy distributions are two examples of location-scale families, with standard densities p_std(x) = (1/√(2π)) exp(−x²/2) and p_std(x) = 1/(π(1 + x²)), respectively.
By a change of variable y = (x − l)/s in the integral definitions of the entropy (Equation (4)) and the informational energy (Equation (1)) of a location-scale probability density function p_{l,s}(x), we find that
H(p_{l,s}) = H(p_std) + log s,
I(p_{l,s}) = (1/s) I(p_std).
The informational energies of the standard normal and standard Cauchy distributions are 1/(2√π) (since ∫ e^{−x²} dx = √π) and 1/(2π) (since ∫ 1/(1 + x²)² dx = π/2), respectively. Thus the informational energies of the normal distributions N(μ, σ) and of the Cauchy distributions C(l, s) are 1/(2σ√π) (see also Table 2) and 1/(2πs), respectively.
Using a first-order Taylor expansion on the scale parameter
s(t + dt) ≈ s(t) + ṡ(t) dt,
we get
(d/dt) H(p_{l(t),s(t)}) = ṡ(t)/s(t),
and
(d/dt) I(p_{l(t),s(t)}) = −( ṡ(t)/s²(t) ) I(p_std).
Therefore, we have
(d/dt) H(p_{θ(t)}) × (d/dt) I(p_{θ(t)}) = −( ṡ(t)²/s³(t) ) I(p_std) ≤ 0,
since s(t) > 0 and I(p_std) > 0 for any (standard) probability density function, with strict inequality whenever ṡ(t) ≠ 0. Thus all smooth paths on a location-scale manifold are compatible with the third law of thermodynamics.
Note that the second law of thermodynamics states that the entropy of an isolated system increases over time. That is, for a thermodynamic process { p_{θ(t)} : t ∈ I ⊂ R }, we shall have H(p_{θ(t)}) < H(p_{θ(t+dt)}). On a statistical location-scale manifold with differential entropy H(p_{l(t),s(t)}) = H(p_std) + log s(t), we have
(d/dt) H(p_{l(t),s(t)}) = ṡ(t)/s(t).
Thus the entropy of a thermodynamic process increases whenever the scale s(t) increases with time (i.e., ṡ(t) > 0), since s(t) > 0. Therefore not all oriented thermodynamic process paths on a location-scale manifold satisfy the second law of thermodynamics, but only the paths with increasing scale.

5. Summary and Discussion

Shannon's entropy [11] and Onicescu's informational energy [5] are two complementary measures of the uncertainty of a random variable: When randomness increases, Shannon's entropy increases but Onicescu's informational energy decreases. Table 1 compares the properties of these randomness measures: Interestingly, Onicescu's informational energy is strictly convex and always positive, while Shannon's entropy is strictly concave and can be negative for continuous random variables (e.g., the differential entropy of a normal distribution with small variance). The fundamental quantity of the informational energy of Equation (1) has been studied in various other fields under different names: Simpson's diversity index [16] or Gini-Simpson index [18] in ecology, the Herfindahl–Hirschman index [13] in economics, or the index of coincidence [14] in cryptanalysis. In this work, we showed how to compute the informational energy of a distribution belonging to an exponential family in Theorem 1, and reported closed-form formulas for its associated correlation coefficient and related statistical divergences. We then showed how to characterize thermodynamic processes satisfying the third law of thermodynamics by interpreting them as smooth oriented trajectories on a statistical manifold and requiring that the entropy and the informational energy (interpreted as the kinetic energy) vary in opposite directions.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The author would like to thank the reviewers for their comments and suggestions, which led to this revised manuscript.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Billingsley, P. Probability and Measure, 3rd ed.; Wiley: Hoboken, NJ, USA, 1995.
  2. Iosifescu, M. Obituary notice: Octav Onicescu, 1892–1983. Int. Stat. Rev. 1986, 54, 97–108.
  3. Crepel, P.; Fienberg, S.; Gani, J. Statisticians of the Centuries; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
  4. Onicescu, O.; Stefanescu, V. Elements of Informational Statistics with Applications/Elemente de Statistica Informationala cu Aplicatii (in Romanian); Editura Tehnica: Bucharest, Romania, 1979.
  5. Onicescu, O. Théorie de l’information énergie informationelle. Comptes Rendus De L’Academie Des Sci. Ser. AB 1966, 263, 841–842.
  6. Pardo, L.; Taneja, I. Information Energy and Its Applications. In Advances in Electronics and Electron Physics; Elsevier: Amsterdam, The Netherlands, 1991; Volume 80, pp. 165–241.
  7. Pardo, J.A.; Vicente, M. Asymptotic distribution of the useful informational energy. Kybernetika 1994, 30, 87–99.
  8. Nielsen, F.; Nock, R. On Rényi and Tsallis entropies and divergences for exponential families. arXiv 2011, arXiv:1105.3259.
  9. Vajda, I.; Zvárová, J. On generalized entropies, Bayesian decisions and statistical diversity. Kybernetika 2007, 43, 675–696.
  10. Ho, S.W.; Yeung, R.W. On the discontinuity of the Shannon information measures. IEEE Trans. Inf. Theory 2009, 55, 5362–5374.
  11. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
  12. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2012.
  13. Matsumoto, A.; Merlone, U.; Szidarovszky, F. Some notes on applying the Herfindahl–Hirschman Index. Appl. Econ. Lett. 2012, 19, 181–184.
  14. Harremoës, P.; Topsoe, F. Inequalities between entropy and index of coincidence derived from information diagrams. IEEE Trans. Inf. Theory 2001, 47, 2944–2960.
  15. Friedman, W.F. The Index of Coincidence and Its Applications in Cryptography; Department of Ciphers Publ 22; Riverbank Laboratories: Geneva, IL, USA, 1922.
  16. Simpson, E.H. Measurement of diversity. Nature 1949, 163, 688.
  17. Nunes, A.P.; Silva, A.C.; Paiva, A.C.D. Detection of masses in mammographic images using geometry, Simpson’s Diversity Index and SVM. Int. J. Signal Imaging Syst. Eng. 2010, 3, 40–51.
  18. Rao, C.R. Gini-Simpson index of diversity: A characterization, generalization and applications. Util. Math. 1982, 21, 273–282.
  19. Nielsen, F.; Boltz, S. The Burbea-Rao and Bhattacharyya centroids. IEEE Trans. Inf. Theory 2011, 57, 5455–5466.
  20. Calin, O.; Udrişte, C. Geometric Modeling in Probability and Statistics; Springer: Berlin/Heidelberg, Germany, 2014.
  21. Agop, M.; Gavriluţ, A.; Rezuş, E. Implications of Onicescu’s informational energy in some fundamental physical models. Int. J. Mod. Phys. B 2015, 29, 1550045.
  22. Chatzisavvas, K.C.; Moustakidis, C.C.; Panos, C. Information entropy, information distances, and complexity in atoms. J. Chem. Phys. 2005, 123, 174111.
  23. Alipour, M.; Mohajeri, A. Onicescu information energy in terms of Shannon entropy and Fisher information densities. Mol. Phys. 2012, 110, 403–405.
  24. Ou, J.H.; Ho, Y.K. Shannon, Rényi, Tsallis Entropies and Onicescu Information Energy for Low-Lying Singly Excited States of Helium. Atoms 2019, 7, 70.
  25. Andonie, R.; Cataron, A. An information energy LVQ approach for feature ranking. In Proceedings of the European Symposium on Artificial Neural Networks, Bruges, Belgium, 28–30 April 2004.
  26. Rizescu, D.; Avram, V. Using Onicescu’s informational energy to approximate social entropy. Procedia Soc. Behav. Sci. 2014, 114, 377–381.
  27. Jenssen, R.; Principe, J.C.; Erdogmus, D.; Eltoft, T. The Cauchy–Schwarz divergence and Parzen windowing: Connections to graph theory and Mercer kernels. J. Frankl. Inst. 2006, 343, 614–629.
  28. Nielsen, F.; Sun, K.; Marchand-Maillet, S. On Hölder projective divergences. Entropy 2017, 19, 122.
  29. Nielsen, F.; Garcia, V. Statistical exponential families: A digest with flash cards. arXiv 2009, arXiv:0911.4863.
  30. Barndorff-Nielsen, O. Information and Exponential Families; John Wiley & Sons: Hoboken, NJ, USA, 2014.
  31. Nielsen, F.; Nock, R. A closed-form expression for the Sharma–Mittal entropy of exponential families. J. Phys. A Math. Theor. 2011, 45, 032003.
  32. Nielsen, F.; Nock, R. On the chi square and higher-order chi distances for approximating f-divergences. IEEE Signal Process. Lett. 2013, 21, 10–13.
  33. Nielsen, F. Closed-form information-theoretic divergences for statistical mixtures. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan, 11–15 November 2012; pp. 1723–1726.
  34. Nielsen, F.; Nock, R. Entropies and cross-entropies of exponential families. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 3621–3624.
  35. Kampa, K.; Hasanbelliu, E.; Principe, J.C. Closed-form Cauchy-Schwarz PDF divergence for mixture of Gaussians. In Proceedings of the International Joint Conference on Neural Networks, San Jose, CA, USA, 31 July–5 August 2011; pp. 2578–2585.
  36. Nielsen, F.; Nock, R. Cumulant-free closed-form formulas for some common (dis)similarities between densities of an exponential family. arXiv 2020, arXiv:2003.02469.
  37. Ito, S.; Dechant, A. Stochastic time evolution, information geometry, and the Cramér-Rao bound. Phys. Rev. X 2020, 10, 021056.
  38. Nielsen, F. The Many Faces of Information Geometry. Not. Am. Math. Soc. 2022, 69, 36–45.
Figure 1. Visualizing a thermodynamic process as an oriented trajectory on a statistical manifold.
Table 1. Comparison between Shannon’s entropy and Onicescu’s informational energy.
Entropy H(p) vs. Informational Energy I(p):
Definition: H(p) = −∫ p(x) log p(x) dμ(x) | I(p) = ∫ p²(x) dμ(x)
Convexity: strictly concave | strictly convex
Range: can be negative | always positive
Uncertainty measure: increases with disorder | decreases with disorder
Uniform discrete distribution u (alphabet size |X| = d): H(u) = log d | I(u) = 1/d
Bound: H(p) ≥ 1 − I(p) | I(p) ≥ 1 − H(p)
Inequality (both columns): H(p) + (1/2) I(p) ≥ 1 − log 2
Table 2. Comparisons between Shannon’s entropy and Onicescu’s informational energy for common distributions of exponential families. |·| denotes the matrix determinant, Γ(·) the gamma function, ψ(·) the digamma function, and B(α, β) = Γ(α)Γ(β)/Γ(α + β).
Family | Entropy | Informational Energy
Generic E | F(θ) − θ⊤∇F(θ) − E_{p_θ}[k(x)] | e^{F(2θ) − 2F(θ)} E_{p_{2θ}}[exp(k(x))]
Univar. normal N(μ, σ) | (1/2) log(2πeσ²) | 1/(2σ√π)
Multivar. normal N(μ, Σ) | (1/2) log|2πeΣ| | 1/(2^d π^{d/2} |Σ|^{1/2})
LogNormal(μ, σ) | log( σ e^{μ + 1/2} √(2π) ) | (1/(2σ√π)) exp( σ²/4 − μ )
Exponential(λ) | 1 − log λ | λ/2
Pareto_k(a) | 1 + 1/a + log(k/a) | a²/( k(2a + 1) )
Gamma(α, β) (scale β) | α + log(β Γ(α)) + (1 − α) ψ(α) | 1/( β (2α − 1) B(α, 1/2) )
Beta(α, β) | log B(α, β) − (α − 1) ψ(α) − (β − 1) ψ(β) + (α + β − 2) ψ(α + β) | Γ(2α − 1) Γ(2β − 1) / ( B²(α, β) Γ(2α + 2β − 2) )
Poisson(λ) | λ(1 − log λ) + e^{−λ} ∑_{i=0}^∞ λ^i log(i!)/i! | exp(−2λ) ∑_{i=0}^∞ λ^{2i}/(i!)²
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
