Weighted Relative Group Entropies and Associated Fisher Metrics

A large family of new α-weighted group entropy functionals is defined and associated Fisher-like metrics are considered. All these notions are well-suited semi-Riemannian tools for the geometrization of entropy-related statistical models, where they may act as sensitive controlling invariants. The main result of the paper establishes a link between such a metric and a canonical one. A sufficient condition is found, in order that the two metrics be conformal (or homothetic). In particular, we recover a recent result, established for α=1 and for non-weighted relative group entropies. Our conformality condition is “universal”, in the sense that it does not depend on the group exponential.


History
The inhabitants of the Universe of Uncertainty are probability distribution functions (PDFs). One can try to understand them through their entropy, a property which provides a measure of disorder. Since its discovery, in the second part of the 19th century, entropy has been investigated by functional, algebraic or analytical techniques. The first geometric tools arose in the 1920s, through the work of Fisher ([1]), whose information matrix is the germ of what today is known as the Fisher metric. The notion was generalized by Rao ([2,3]) in the 1940s, who put it in the appropriate context of Riemannian geometry. After a 30-year gap, Efron ([4]) and Amari ([5,6]) reopened the interest in differential geometric invariants associated to statistical models. Since the 1980s, the geometrization of the parameter space of the PDFs has evolved into a rapidly growing field, including (among others) the new topics of statistical manifolds and of dual connections ([7][8][9][10][11]; see [12] for a recent review).
Our paper is a piece of Riemannian geometry intended for entropy study. We construct a new family of Fisher-like metrics, canonically associated to some entropy functionals, and we establish a sufficient condition in order for two of these metrics to be conformal. For describing the recipe, we must first sketch the story of its three main ingredients: the weighting procedure of an entropy functional, the relative group entropy and the group Fisher metric associated to it.
The notion of relative group entropy is more recent ([24]), but, as a nice coincidence, its algebraic roots are contemporary with the papers of Rao about the Fisher metric and distance, previously quoted. These algebraic foundations are the (Lie) formal groups, defined by Bochner [25] in 1946, as a unifying tool between Lie groups and Lie algebras (see also Hazewinkel [26]). Tempesta used the formal group laws in order to define universality classes of entropies ([27,28], see also [29,30]). Starting with a formal group logarithm he defined ([27]) group entropy functionals of the form where p is a PDF on X. As an application, the relative entropy (also known as the Kullback-Leibler divergence) was generalized ([27]) to the relative group entropy. Gomez et al. used ([24]) the relative group entropies for defining group Fisher metrics, as "universal" extensions of the classical Fisher metrics arising from the classical Boltzmann-Gibbs entropy. Their main result states that the previous pairs of metrics are homothetic, and the constant of homothety depends on G'(0) and G''(0).

Motivation
Our view of entropy will be naive and formal, without entering into deep interpretations of its physical or informational meanings. Our goal is to filter out the details and to see the bare invariants of geometric nature, which might enlighten us about the general behavior of the various (and apparently too many) types of entropy avatars. For more details and hints about statistical interpretations, the reader is encouraged to look into the inspirational papers [24,27,31,32].

Contents of the Paper
In Section 2, we try to systematize and to unify different definitions for the entropy functions associated to PDFs. Two procedures of refinement are recalled: through "weighting" and through "powering" the PDFs. We shall need both in Sections 4 and 5.
In Section 3, we review, in a creative manner, various methods to obtain semi-Riemannian metrics associated to families of entropy functions or functionals, as part of what we call "the geometrization problem for the entropy". In particular, we have the Fisher metrics, the Hessian metrics and the group Fisher metrics. Generalizations include "mean" Hessian metrics, with the ("partial") Hessian appearing under the integral sign.
Section 4 is the main part of the paper and is devoted to a generalization of the result of Gomez et al. [24]. Theorem 1 provides a formula linking a canonical weighted and powered Fisher-like metric to another one, which is, moreover, associated to a group exponential. We give a sufficient condition for the two metrics to be conformal or even homothetic. This conformality condition is, in some sense, "universal", in that it does not depend on the group exponential.
Section 5 contains examples which illustrate the application of the general method from Section 4. We consider a normal PDF on the real line, with two parameters, with particular weighting functions and particular powers. As the compatibility relation is satisfied, it follows that, for any formal group exponential, we can construct a weighted Fisher metric, homothetic with the standard one. In addition to the construction of homothetic metrics in [24], our examples provide an additional control tool, through the weighting function. Moreover, our examples are well suited to be easily adapted and extended in more general frameworks.

Conventions
All the integrals are supposed to be correctly defined. Partial derivatives are supposed to commute with the integral. All the geometric objects are supposed to be differentiable, even if, in some cases, a weaker assumption would suffice (for example continuity or integrability).

Preliminaries
We review in a creative way some entropy-related notions, following mainly those in [14,27,28].
Consider X ⊂ R^m the domain of a real valued random variable x. Let p = p(x) be a (differentiable) probability density function (PDF), with p(x) ≥ 0 and ∫_X p(x) dx = 1. Let f : R × X → R be a differentiable controlling function.
If the controlling function f depends on p and does not depend on x, we write f : R → R and we say the entropy is classical.
Using the convention in Section 1, we may write (1) and (2) in the equivalent normalized form where f(p(x), x) = p(x) h(p(x), x) and f(p(x)) = p(x) h(p(x)), respectively. Normalizing does not change the property of an entropy to be generalized or classical, respectively. A slightly more general type of normalization is to consider some positive real number α and the entropy functions of the form This "powering" of the PDF acts as a control tool over the set of entropy functions and sometimes enables a useful refinement of the models. However, we must be aware of the fact that, with an appropriate notation, the previous entropy functions may be rewritten in the form (3).

Example 1.
Almost all the known entropies used in statistical mechanics and information geometry are classical. We list but a few, from simple to more sophisticated ones.
(ii) For any fixed a ∈ R\{1}, f(p) := p (p^{a−1} − 1)/(a − 1) provides a Tsallis entropy; when a → 1 we recover the BGS entropy.
(iii) Considering one more PDF q, the functional f(p) := p log(q/p) leads to the relative entropy (also known as the Kullback-Leibler divergence [24]) between them, which is the normalized classical entropy We accept (formally) that 0 · log(0/q) = 0 and p · log(p/0) = 0. More generally, we may start with k ≥ 1 more PDFs p_1, …, p_k and a differentiable function F : R^{k+1} → R. Then, the functional f(p) := F(p, p_1, …, p_k) produces a relative entropy of p w.r.t. p_1, …, p_k, namely, the normalized classical entropy Moreover, this one may be viewed as a generalized entropy associated to the functional p_i → F(p, p_1, …, p_k), for every i = 1, …, k.
(iv) Let G = G(t) be a formal group logarithm, which is a differentiable real valued function with some special algebraic properties, inspired from the formal series linking Lie groups to Lie algebras. (We refer to [24,27,28] for details about these functions). The group entropy functional associated to it is defined by ( [24,27]) which is a normalized classical entropy of the form (3). If q is another PDF, then the relative group entropy functional of p and q is defined ( [24]) by which is another normalized classical entropy of the form (3). Generalization for k PDFs leading to a relative group entropy of the form (5) is also possible, but we leave this step to the reader.
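The limit a → 1 mentioned in Example 1, (ii) can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes p is the standard normal density on the real line, and it only compares the integrals of the two controlling functions, leaving aside the sign convention of the entropy itself. The helper names (`tsallis_f`, `bgs_f`, `integrate`) are hypothetical.

```python
import math

def normal_pdf(x: float) -> float:
    """Standard normal density (an illustrative choice of PDF)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def tsallis_f(p: float, a: float) -> float:
    """Controlling function f(p) = p * (p**(a-1) - 1) / (a - 1), for a != 1."""
    return p * (p ** (a - 1.0) - 1.0) / (a - 1.0)

def bgs_f(p: float) -> float:
    """Formal limit of tsallis_f when a -> 1: f(p) = p * log(p)."""
    return p * math.log(p)

def integrate(func, lo: float = -10.0, hi: float = 10.0, n: int = 20000) -> float:
    """Midpoint rule on [lo, hi]; the tails beyond are negligible here."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

bgs_integral = integrate(lambda x: bgs_f(normal_pdf(x)))
for a in (1.5, 1.1, 1.01, 1.001):
    t = integrate(lambda x: tsallis_f(normal_pdf(x), a))
    print(f"a = {a}: Tsallis integral = {t:.6f}, gap to BGS = {abs(t - bgs_integral):.6f}")
```

The printed gap shrinks as a approaches 1, in agreement with the statement that the BGS case is recovered in the limit.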

Remark 1.
Consider, moreover, an arbitrary differentiable function w : X → R. The weighted generalized entropy corresponding to the triple (p, f , w), associated to the generalized entropy (1), is another generalized entropy, given by Similar weighted entropies may be defined starting from (2) and (3). For example, the weighted Tsallis entropy writes Weighting does not change the property of an entropy to be generalized or normalized, respectively. Instead, weighting may transform a classical entropy into a generalized one.

Fisher Metrics for Generalized Entropy Functions
We review in a creative way the main notions related to the Fisher metric derived from a family of generalized entropies, associated to some parameterized PDFs, following mainly the work in [24].
Consider the case when the PDF p in Section 2 depends, moreover, on n real parameters θ_1, …, θ_n, that is, p : X × R^n → R, p = p(x, θ), with θ := (θ_1, …, θ_n). Let f : R × X × R^n → R be a differentiable controlling function, f = f(p, x, θ). The dependence on the parameter θ provides, via the relation (1), a generalized entropy function (loosely denoted also by) H : R^n → R, given by Analogously, classical entropy functions and normalized ones arise naturally, from parameterizing with θ the relations (2) and (3), respectively.
The geometrization problem for the entropy functions. Associate a relevant geometric structure to the function H (given by (8) or by any other avatar), whose invariants might provide information about the PDF p. Use f as a control tool in this process.

Remark 2.
The first idea that comes to mind is to consider the (classical) Hessian tensor field associated to H, with coefficients If Hess H is non-degenerate, it provides a semi-Riemannian metric (of constant signature) on R^n, called Hessian metric. Its geometry has been the subject of many papers (see, for example, [33,34] and the references therein) and is useful in understanding the extremum points of H, as a semi-Riemannian optimization topic. A simple example is the Euclidean metric g_ij = δ_ij, arising as Hessian metric from Formula (9),

Remark 3.
On the other hand, consider a controlling function h, and its associated normalized classical entropy function Define If the matrix (g_ij)_{i,j=1,…,n} is nowhere vanishing, we obtain the (semi-Riemannian) Fisher metric associated to H (or h) [24]. An example is included in Section 5, for a PDF of exponential type.
Denote h̃(x, θ) := h(p(x, θ), θ). Then, This formula does not define (in general) a Hessian metric. In fact, this might be called a "mean" Hessian metric, as we do not differentiate the whole integral, but only a factor of the integrand. (Here we consider the Hessian of h̃ as a function depending on θ only.) However, in some particular cases, the Fisher metric may take on the appearance of a Hessian metric, but w.r.t. another appropriate function, as in the following case [34]. Consider the PDF of exponential type p : where C = C(x), F_1 = F_1(x), …, F_n = F_n(x) and ν = ν(θ) are smooth functions. The associated Fisher metric on R^n is g = Hess ν, a Hessian metric w.r.t. ν, which is not derived from an entropy function like in Formula (9).
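The identity g = Hess ν for PDFs of exponential type can be illustrated on a one-parameter case. The sketch below is not taken from the paper and rests on two assumptions: the exponential-type PDF is read in the standard form p(x, θ) = exp(C(x) + Σ θ_i F_i(x) − ν(θ)), and the Fisher metric is taken in its classical form, the mean of the product of the partial derivatives of log p. The chosen family is the exponential distribution on (0, ∞), with C = 0, F_1(x) = x, θ < 0 and ν(θ) = −log(−θ).

```python
import math

def nu(theta: float) -> float:
    """Normalizing function for p(x, theta) = exp(theta * x - nu(theta)), x > 0, theta < 0."""
    return -math.log(-theta)

def second_derivative(func, t: float, h: float = 1e-4) -> float:
    """Central finite-difference approximation of func''(t)."""
    return (func(t + h) - 2.0 * func(t) + func(t - h)) / (h * h)

def fisher_information(theta: float, n: int = 20000) -> float:
    """Midpoint-rule value of the integral of p * (d log p / d theta)**2 over (0, upper)."""
    h_fd = 1e-4
    dnu = (nu(theta + h_fd) - nu(theta - h_fd)) / (2.0 * h_fd)  # nu'(theta)
    upper = 60.0 / abs(theta)          # the tail beyond is of order exp(-60)
    step = upper / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * step
        p = math.exp(theta * x - nu(theta))
        score = x - dnu                # d/dtheta of log p = x - nu'(theta)
        total += p * score * score
    return total * step

theta = -2.0
g = fisher_information(theta)
hess_nu = second_derivative(nu, theta)
print(f"Fisher metric: {g:.6f}   Hess nu: {hess_nu:.6f}")
```

Both numbers agree with the closed-form value 1/θ^2 for this family, up to discretization error.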

Example 2.
In the case of the BGS entropy, h = log ∘ p, h̃(x, θ) = log(p(x, θ)) and we get the well-known classical Fisher metric Its scalar curvature is interpreted as the average statistical uncertainty of a density matrix ([24]). This claim is quite natural, as the scalar curvature is obtained from the curvature tensor by taking the trace (a "mean") two times successively, while the curvature tensor measures a "force" associated to the "matter" in the "Universe" R^n driven by g.
Denote the entropy function This formula expresses the Fisher metric g in terms of the Hessian of H and of the "mean" Hessian of p, "weighted" by h̃.
and an associated weighted classical entropy function H_w : R^n → R, given by Their Hessians have the components In case of simultaneous non-degeneracy, the corresponding (semi-Riemannian) Hessian metrics ĝ and ĝ_w provide a (geo)metric tool for studying the impact of the weighting function w upon the entropy of the system.
(ii) Suppose the entropy function and the weighted entropy function are given in the normalized form p(x, θ)). Then, we may calculate the components of the respective Hessians ĝ and ĝ_w, as in (i). Eventually, in case of non-degeneracy, we get Hessian metrics and follow the strategy from (i).
For H and H_w, we may associate also the "mean" Hessian metrics g and g_w, as in (12). Therefore, we have the Fisher metrics on R^n, given by
(iii) Similar Hessians or "mean" Hessians can be associated to relative entropy functions, as in Example 1, (iii), or to entropy group functionals, as in Example 1, (iv). If non-degenerate, they become semi-Riemannian Hessian metrics or Fisher metrics ([24,27,28]).
(iv) Analogous entropy functions and metrics arise when weighting by functions w = w(x, θ), instead of w = w(x). In fact, this is a more flexible weighting procedure, as it does not impose universal weighting constraints upon the whole family of PDFs.

A Step Further: Passing From the Group Fisher Metric to a Weighted One
Consider a PDF p depending on n real parameters θ_1, …, θ_n; that is, p : X × R^n → R, p = p(x, θ), with θ := (θ_1, …, θ_n). Let g be the Fisher (semi-)Riemannian metric on R^n, whose coordinate functions are given by Let G = G(t) be a formal group logarithm, as in Example 1, (iv). Consider H : R^n × R^n → R, the double-parameterized group entropy functional associated to it ([24]), by The Hessian tensor field g_G, with components is called the group Fisher metric in [24]. The main result in [24] states that the metrics g and g_G are homothetic, via the formula From it follows the well-known correspondence between the main invariants of g and g_G, such as the Christoffel coefficients, the Riemann curvature coefficients and the scalar curvatures.
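The homothety between g and g_G can be explored numerically. The sketch below is only an illustration under explicit assumptions, as the displayed definitions are not reproduced here: the relative group entropy is taken as H(θ, θ̃) = ∫ p(x, θ) G(log(p(x, θ)/p(x, θ̃))) dx, the Hessian is taken w.r.t. θ̃ at θ̃ = θ, G is the Tsallis-type logarithm G(t) = (e^{(1−q)t} − 1)/(1−q), and p is a normal PDF with parameters (mean, standard deviation). Under these assumptions a second-order expansion gives the factor G'(0) + G''(0); the normalization used in [24] may differ.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Logarithm of the normal density with mean mu and standard deviation sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def tsallis_group_log(t, q=0.5):
    """Assumed form of the group logarithm: G(0) = 0, G'(0) = 1, G''(0) = 1 - q."""
    return (math.exp((1.0 - q) * t) - 1.0) / (1.0 - q)

def relative_group_entropy(theta, theta_tilde, G, n=4000, width=12.0):
    """Midpoint-rule value of the assumed functional H(theta, theta_tilde)."""
    mu, sigma = theta
    lo = mu - width * sigma
    h = 2.0 * width * sigma / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        lp = log_normal_pdf(x, *theta)
        lq = log_normal_pdf(x, *theta_tilde)
        total += math.exp(lp) * G(lp - lq)
    return total * h

def group_fisher_metric(theta, G, step=1e-3):
    """Finite-difference Hessian of H w.r.t. theta_tilde, evaluated at theta_tilde = theta."""
    dim = len(theta)
    hess = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(dim):
            def shifted(si, sj):
                t = list(theta)
                t[i] += si * step
                t[j] += sj * step
                return relative_group_entropy(theta, tuple(t), G)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * step * step)
    return hess

theta = (0.3, 1.5)
g_G = group_fisher_metric(theta, tsallis_group_log)
sigma = theta[1]
g_classical = [[1.0 / sigma**2, 0.0], [0.0, 2.0 / sigma**2]]  # classical Fisher metric
for i in range(2):
    print([round(g_G[i][j], 5) for j in range(2)], g_classical[i])
```

With q = 0.5 the diagonal entries of g_G come out as 1.5 times those of the classical Fisher metric, which matches G'(0) + G''(0) = 1 + (1 − q) for this choice of G.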
We prove now that this result remains true, under the more general assumption of α-weighted group entropy functionals.
Theorem 1. Let α be a positive real number. Let w : X × R^n → R be a weighting function for (13), providing the double-parameterized weighted α-normalized group entropy functional Consider the α-weighted group Fisher metric g^{w,α}_G, with components )} |_{θ̃=θ} dx, and the α-weighted Fisher metric g^{w,α}, with components Then, the metrics g^{w,α} and g^{w,α}_G are related, via the formula
Proof. We begin by deriving We assign θ̃ := θ and use the properties: G(0) = 0 and ∫_X p(x, θ) dx = 1. We get ∂θ_i ∂θ_j dx, which concludes the proof.

Corollary 1.
In the hypothesis and with the notations from Theorem 1, suppose there exists a function ϕ : R^n → R, such that, for every i, j = 1, …, n, Then the metrics g^{w,α} and g^{w,α}_G are conformal, via the formula If, moreover, ϕ is constant, then the metrics g^{w,α} and g^{w,α}_G are homothetic.

Remark 5.
(i) In [24], the notation θ_0 is used, instead of our notation θ̃. We consider ours more appropriate, as it avoids creating the (possibly wrong) impression of "constancy" of the variable.
(ii) We used the terminology from [24], when calling g^{w,α}_G a group Fisher entropy metric, but, in our opinion, it is but a Hessian-like metric, not Fisher-like (i.e., of a "mean" Hessian type, see Section 3).
(iii) Taking w(x) = 1 and α = 1 in Corollary 1, we get ϕ = 0 and we recover the result from [24], outlined in the preamble of the section.
(iv) The relation (17) writes also A strong sufficient condition for (19) to hold true is that the Hessian matrix of p (w.r.t. θ) be proportional to the matrix (∂p(x,θ)/∂θ_i · ∂p(x,θ)/∂θ_j)_{i,j}. The relation (19) is satisfied iff the respective proportionality holds as a mean, through the intermediate of the integral, and weighted by w p^{α−2}.
(v) We may also understand Corollary 1 in the following way: let p be a parameterized PDF with an entropy function (13); if there exists a weighting function w, a positive real number α, and a function ϕ, such that (19) is satisfied, then the metrics g^{w,α}_G and g^{w,α} are conformal, via (18). Thus, the search for pairs of conformal metrics is controlled by a triple "tool box". A nontrivial example will be given in the next section.
Moreover, we may conjecture that given a PDF p, there exist w, ϕ and α such that (19) holds. Preliminary calculations provide hints that this might be true, at least for PDFs "of exponential type".
(vi) In (18), the conformal factor depends on α, only through the intermediary of the "speed" of G around 0, not (also) through the "acceleration" of G around 0.
(vii) The condition (19) does not depend on G. In this sense, it is a "universal" condition, and we suspect it may hide some (remarkable?) yet undiscovered family of PDFs.
Scenario: suppose p, w, α are given, such that the conformality condition (19) holds. Calculate the scalar curvature ρ^{w,α} (together with all the invariants associated to the metric g^{w,α}). Then, the previous formula allows a conformal variation of ρ^{w,α}, controlled by the set of formal group logarithms G, which is more flexible than the homothetic variation in [24]. Homothetic transformations of the metrics are quite rigid, as they preserve the qualitative behavior of the main semi-Riemannian invariants (distance, geodesics, curvature). By contrast, conformal non-homothetic transformations produce significant geometric changes, such as passing from a plane to a sphere.
Let α be a positive real number and consider the particular weighting function w(x, θ) := (x − θ_1)^2. We calculate the coefficients of the α-weighted Fisher metric g^{w,α}: We prove now that, in the particular case of α = (3 + √57)/2, the relation (19) is identically satisfied. We evaluate the left side integral in (19), in a case-by-case calculation.
as p^α is an even function and all the binomial integrands are odd functions, w.r.t. (x − θ_1).
(We see that here we did not use the particular form of ϕ, so the result is valid for any positive α and any function ϕ = ϕ(θ).)
(ii) The case i = 1, j = 1: (We see that here we did not use the particular form of α, so the result is valid for any positive α and for the function ϕ = 1 − α/3.)
(iii) The case i = 2, j = 2: where k is a nowhere vanishing function, depending on α and θ_2. (We see that here we used both the particular form of α and of the function ϕ.)
Let G be any formal group logarithm. As the relation (19) is verified for a constant function ϕ, it follows that the α-weighted group Fisher metric g^{w,α}_G is homothetic to g^{w,α}. Using Formula (18), we obtain
Remark 6. Similar examples may be obtained: for w(x, θ) := (x − θ_1)^a, with even a; for a polynomial function w with variable (x − θ_1) (the odd powers are irrelevant); for w(x) := x^a (or a polynomial function in x), with a significant increase of the complexity of the calculations.
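The case-by-case verification above can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' computation: the PDF (20) is taken to be the normal density with θ_1 the mean and θ_2 the standard deviation, and the condition (19) is read, following Remark 5, (iv), as ∫ w p^{α−1} ∂²p/∂θ_i∂θ_j dx = ϕ ∫ w p^{α−2} (∂p/∂θ_i)(∂p/∂θ_j) dx. The function name `weighted_integrals` is hypothetical.

```python
import math

def weighted_integrals(i, j, alpha, mu=0.0, sigma=1.0, n=4000, width=12.0):
    """Return (lhs, rhs) of the assumed reading of (19), without the factor phi.

    lhs = integral of w * p**(alpha-1) * d2p/dtheta_i dtheta_j
    rhs = integral of w * p**(alpha-2) * dp/dtheta_i * dp/dtheta_j
    with w = (x - mu)**2 and p the normal density (theta_1 = mu, theta_2 = sigma).
    """
    lo = mu - width * sigma
    h = 2.0 * width * sigma / n
    lhs = rhs = 0.0
    for k in range(n):
        u = lo + (k + 0.5) * h - mu
        p = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        # First derivatives of p divided by p (the score functions).
        first = {1: u / sigma**2, 2: u * u / sigma**3 - 1.0 / sigma}
        # Second derivatives of p divided by p.
        second = {
            (1, 1): u * u / sigma**4 - 1.0 / sigma**2,
            (1, 2): u**3 / sigma**5 - 3.0 * u / sigma**3,
            (2, 2): u**4 / sigma**6 - 5.0 * u * u / sigma**4 + 2.0 / sigma**2,
        }
        base = u * u * p**alpha        # w * p**alpha
        lhs += base * second[tuple(sorted((i, j)))]
        rhs += base * first[i] * first[j]
    return lhs * h, rhs * h

alpha = (3.0 + math.sqrt(57.0)) / 2.0
phi = 1.0 - alpha / 3.0
for (i, j) in ((1, 2), (1, 1), (2, 2)):
    lhs, rhs = weighted_integrals(i, j, alpha, mu=0.4, sigma=1.3)
    print(f"(i, j) = ({i}, {j}): lhs = {lhs:.3e}, phi * rhs = {phi * rhs:.3e}")
```

For the mixed case both integrals vanish by parity; for the two diagonal cases the two printed numbers coincide at this value of α, while for another value (say α = 2) only the case i = j = 1 still balances with ϕ = 1 − α/3.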

Remark 7.
Consider the PDF (20), a positive real number α and a weighting function w(x, θ) = l( , where l : R → R is an arbitrary continuous function (possibly satisfying some additional specific hypotheses, in order that the following integrals exist). The cases i = 1, j = 1 and i = 1, j = 2 of the relation (19) are identically satisfied, for the constant function First, let us remark that this family of examples cannot lead to conformal non-homothetic transformations of the metric, via the relation (18). Second, we must impose that condition (19) holds for i = 2, j = 2 also, and this provides a strong additional constraint on ϕ and α.
Unfortunately, the existence of a suitable value of the power α is not guaranteed. For example, in the particular case of l(u) := 1 + e −u , we get Then, from the case for i = 2, j = 2 in (19), it follows that α is non-real or non-positive.

Example 3.
We consider now a particular case of the previous example. Let p = p(x, θ) be the two-variate PDF from (20); the weighting function w(
(i) Let G be the Tsallis group logarithm, given [24] by When q → 1, we get G(t) → t, i.e. the BGS group logarithm. By abuse, from now on, we shall suppose q ∈ R. Denote The α-weighted Fisher metric writes Its Christoffel coefficients are We calculate successively the Riemann-Christoffel coefficients and the scalar curvature ρ = ρ(θ_2), We have G'(0) = 1 and G''(0) = 1 − q. From the relation (21) we calculate the scalar curvature of the associated α-weighted group Fisher metric g^{w,α}_G: We restrict the study to q < (21 + 5√57)/6.
As already pointed out in Example 2, the scalar curvature measures the average statistical uncertainty. In this case, both scalar curvature functions ρ and ρ_G take strictly negative values, so the uncertainty is proportional to their absolute value. The variation of ρ_G is depicted in Figure 1.
(ii) If we replace the Tsallis group logarithm with the Kaniadakis group logarithm [24] the scalar curvatures ρ and ρ_G differ by a constant only. The reason is that G'(0) = 1 and G''(0) = 0, and ρ_G gets no new variables from (21).
(iii) Instead, replacing the Tsallis group logarithm with the Abe group logarithm [24] leads to a behavior of ρ_G versus ρ similar to (i), because G'(0) = 1 and G''(0) = a + b. In this case, ρ_G depends on a, b and θ_2.
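The values of G'(0) and G''(0) quoted in (i), (ii) and (iii) can be reproduced with finite differences. This is a small sketch under an assumption: since the displayed formulas are not available here, the three group logarithms are taken in their commonly used closed forms, which are consistent with the derivative values stated above. The parameter name `kappa` and the helper `derivatives_at_zero` are hypothetical.

```python
import math

def tsallis(t, q):
    """Assumed Tsallis group logarithm: (exp((1 - q) t) - 1) / (1 - q)."""
    return (math.exp((1.0 - q) * t) - 1.0) / (1.0 - q)

def kaniadakis(t, kappa):
    """Assumed Kaniadakis group logarithm: sinh(kappa t) / kappa."""
    return math.sinh(kappa * t) / kappa

def abe(t, a, b):
    """Assumed Abe group logarithm: (exp(a t) - exp(b t)) / (a - b)."""
    return (math.exp(a * t) - math.exp(b * t)) / (a - b)

def derivatives_at_zero(G, h=1e-4):
    """Central finite differences for the 'speed' G'(0) and the 'acceleration' G''(0)."""
    d1 = (G(h) - G(-h)) / (2.0 * h)
    d2 = (G(h) - 2.0 * G(0.0) + G(-h)) / (h * h)
    return d1, d2

q, kappa, a, b = 0.3, 0.7, 0.8, -0.2
results = {
    "Tsallis": derivatives_at_zero(lambda t: tsallis(t, q)),
    "Kaniadakis": derivatives_at_zero(lambda t: kaniadakis(t, kappa)),
    "Abe": derivatives_at_zero(lambda t: abe(t, a, b)),
}
for name, (d1, d2) in results.items():
    print(f"{name}: G'(0) = {d1:.6f}, G''(0) = {d2:.6f}")
```

Only G''(0) distinguishes the three families at second order, which is why the Kaniadakis case adds no new variable, while the Tsallis and Abe cases bring in q and a + b, respectively.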
Remark 8. Let p = p(x, θ) be the two-variate PDF from (20); the weighting function w(x, θ) = (x − θ_1)^2; α an arbitrary positive real number; G an arbitrary group logarithm. In general, the hypothesis of Corollary 1 is not fulfilled anymore. This implies that the metrics g^{w,α} and g^{w,α}_G might not be conformal, and all we deduced previously for their scalar curvatures does not remain true, in general. However, in this case we can study what can be derived directly from Theorem 1, via the relation (16). A tedious calculation produces the following formulas: The scalar curvature writes Until now, G was arbitrary. Let us particularize it: consider G(t) = t, i.e., the BGS group logarithm. The scalar curvature of the corresponding α-weighted relative group metric is Its "macroscopic" variation with respect to θ_2 and α can be seen in Figure 2. We remark the special case α = 1, when the metrics g^{w,1} and g^{w,1}_G become locally Euclidean and all the related curvature invariants (including the scalar curvature) vanish. For α > 1, both scalar curvatures are negative and this behavior is clearly visible in Figure 1. For α ∈ (0, 1), both scalar curvatures are positive; to see it, we must take a magnified detail, as in Figure 3. Similar calculations can be made for other group logarithms also, following the same line. For the Tsallis group logarithm, the scalar curvature will depend also on the third variable q; for the Abe group logarithm, the scalar curvature will depend also on two more variables, namely, a and b.
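The special role of α = 1 can be observed numerically. The sketch below is a hedged illustration, not the paper's calculation: it assumes that the α-weighted Fisher metric has the components ∫ w p^{α−2} (∂p/∂θ_i)(∂p/∂θ_j) dx, that the PDF (20) is the normal density with θ_1 the mean and θ_2 the standard deviation, and that w = (x − θ_1)^2. The function name `weighted_fisher_metric` is hypothetical.

```python
import math

def weighted_fisher_metric(alpha, mu, sigma, n=4000, width=14.0):
    """Assumed components g_ij = integral of w * p**(alpha-2) * dp/dtheta_i * dp/dtheta_j."""
    lo = mu - width * sigma
    h = 2.0 * width * sigma / n
    g11 = g12 = g22 = 0.0
    for k in range(n):
        u = lo + (k + 0.5) * h - mu
        p = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        s1 = u / sigma**2                     # (dp/dtheta_1) / p
        s2 = u * u / sigma**3 - 1.0 / sigma   # (dp/dtheta_2) / p
        base = u * u * p**alpha               # w * p**(alpha-2) * p**2
        g11 += base * s1 * s1
        g12 += base * s1 * s2
        g22 += base * s2 * s2
    return [[g11 * h, g12 * h], [g12 * h, g22 * h]]

for sigma in (0.7, 2.0):
    g = weighted_fisher_metric(1.0, mu=0.0, sigma=sigma)
    print(f"alpha = 1, theta_2 = {sigma}: g_11 = {g[0][0]:.5f}, g_22 = {g[1][1]:.5f}")

ratio = (weighted_fisher_metric(2.0, 0.0, 2.0)[0][0]
         / weighted_fisher_metric(2.0, 0.0, 1.0)[0][0])
print(f"alpha = 2: g_11(theta_2 = 2) / g_11(theta_2 = 1) = {ratio:.5f}")
```

Under these assumptions the coefficients at α = 1 do not depend on θ, so the metric has constant coefficients and is flat, consistent with the remark above; for α ≠ 1 the coefficients scale like a power of θ_2, which leaves room for nonzero curvature.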

Discussion
The main motivation of our research was the discovery of new metrics, able to geometrize properties of statistical phenomena. We achieved a twofold generalization of the construction in [24], extending the "homothetization" of the group Fisher metric to a "conformalization" of the α-weighted ones. These are semi-Riemannian metrics associated to entropy functions depending on arbitrary weighting functions and to powers of the PDFs. However, the generalization has a price: the existence of the conformal transformation requires a compatibility relation between the weighting functions, the powers of the PDFs and the PDFs (the relation (19)). We conjectured that, for a given PDF, there exists a weighting function and a power of the PDF, such that the relation (19) holds. We speculate that the elegance of (19) suggests a deeper property, whose hidden meaning remains to be revealed.
From now on, several directions of study open up, ordered by the complexity/difficulty of the calculations involved:
(i) the construction of new examples, by appropriate α-weighting of the PDF from Section 5, as suggested in Remark 6.
(ii) the construction of new examples, similar to those in Section 5, for other remarkable PDFs on the real line, depending on two parameters (such as the lognormal, the Gamma and the Beta distributions, following the classical approach from [10]).
(iii) passing from m = 1 and n = 2 (as in (i) and (ii)) to arbitrary integers m and n, and appropriate α-weighting of classical examples of PDFs.
(iv) deepening the differential geometric study of the previous new metrics, and finding new geometric invariants for the modeling and the control of statistical phenomena. For example, the study of the distance related topics (see [35]), of the geodesics (see [36][37][38]) and of the various types of curvature tensor fields (see [10], [39][40][41][42] for samples of the needed geometric tools).
(These examples may also provide hints and support for our previously described conjecture.) Beyond the dreaded complications of the formalism, await the rewards of the much richer conformal geometry.