Article

Generalised Exponential Families and Associated Entropy Functions

Jan Naudts
Department of Physics, University of Antwerpen, Groenenborgerlaan 171, 2020 Antwerpen, Belgium
Entropy 2008, 10(3), 131-149; https://doi.org/10.3390/entropy-e10030131
Submission received: 26 February 2008 / Revised: 1 July 2008 / Accepted: 14 July 2008 / Published: 16 July 2008

Abstract

A generalised notion of exponential families is introduced. It is based on the variational principle, borrowed from statistical physics. It is shown that inequivalent generalised entropy functions lead to distinct generalised exponential families. The well-known result that the inequality of Cramér and Rao becomes an equality in the case of an exponential family can be generalised. However, this requires the introduction of escort probabilities.

1. Introduction

Generalised entropy functions have been studied intensively in the second half of the past century. They have been called quasi-entropies in [1]. Every entropy function is in fact minus a relative entropy, also called a divergence. It is relative to some reference measure c. Consider the f-divergence [2,3]
$I_f(p\|c) = \sum_{a\in A} c_a\, f\!\left(\frac{p_a}{c_a}\right)$   (1)
with f (u) a convex function defined for u > 0 and strictly convex at u = 1. It is minus the entropy of p, relative to c. Taking ca = 1 for all a and f (u) = u ln u one obtains the Boltzmann-Gibbs-Shannon entropy
$I(p) = -\sum_{a\in A} p_a \ln p_a$   (2)
Note that throughout the paper discrete probabilities are considered, with events a belonging to a finite or countable alphabet A.
Recent interest in these generalised entropies within statistical physics goes back to the introduction by Tsallis [4] of the q-entropy
$I_q(p) = \frac{1}{q-1}\left(1 - \sum_{a\in A} p_a^q\right)$   (3)
with q > 0. In the limit q = 1 it converges to (2). It has been studied before in the mathematics literature by Havrda and Charvát [5], and by Daróczy [6]. Investigations within the physics community have led to some interesting developments. One of them is the introduction of deformed logarithmic and exponential functions [7,8] — see Section 13. They have been very useful to generalise common concepts, like that of an exponential family or of a Gaussian distribution. They also helped to clarify the pitfalls of the generalisation process. One of the surprises is the necessity to introduce escort probability functions [9] — see Section 11. In a series of papers, including [10,11], the present author has elaborated a formalism based on deformed logarithms. In the present work, it is shown that slightly more general results are obtained when these deformed logarithms are abandoned.
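For readers who want a quick numerical feel for these definitions, the following short Python sketch (purely illustrative; the probability vector p is an arbitrary example) evaluates the q-entropy (3) for several values of q and checks that it approaches the Boltzmann-Gibbs-Shannon entropy (2) as q tends to 1.

```python
# Illustrative sketch: the q-entropy (3) tends to the Boltzmann-Gibbs-Shannon
# entropy (2) as q -> 1.  The distribution p is an arbitrary example.
import numpy as np

def bgs_entropy(p):
    """I(p) = -sum_a p_a ln p_a, with the convention 0 ln 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def q_entropy(p, q):
    """I_q(p) = (1 - sum_a p_a^q) / (q - 1), defined for q != 1."""
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

p = np.array([0.5, 0.3, 0.15, 0.05])
for q in (2.0, 1.5, 1.1, 1.01, 1.001):
    print(q, q_entropy(p, q))
print("q -> 1 limit (BGS):", bgs_entropy(p))
```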
Independent of the developments in statistical physics was the progress made in the context of game theory. A link, known to exist between maximising entropy and minimising losses [12], was generalised to arbitrary entropies by Grünwald and Dawid [13]. This led to the introduction of the notion of generalised exponential families, a notion that is also essential in [11] and that extends Lafferty's notion of additive models [14].
In Section 2, Section 3, Section 4, Section 5 and Section 6 the maximum entropy principle and the variational principle are discussed in the context of generalised entropies. In particular, a characterisation of the maximising probability distributions is given. This is used in Section 7 to define a generalised exponential family. In Section 8 it is shown that the intersection of distinct generalised exponential families is empty and that there exists a one-to-one relation with generalised entropy functions. Section 9, Section 10, Section 11 and Section 12 discuss geometric aspects, starting with concepts from thermodynamics and introducing escort families and a generalised Fisher information matrix. Section 13 and Section 14 discuss non-extensive thermostatistics and the percolation problem as examples of the generalised formalism. The paper ends with a short discussion in Section 15.

2. Generalised entropies

Let us fix some further notations. The space of probability distributions is denoted $\mathbb{M}^1_+$. Expectation values are denoted $\langle p, H\rangle = \sum_{a\in A} p_a H(a)$. Here we follow the physics tradition of putting the elements of the dual space on the l.h.s.
It is rather common to define a generalised entropy as any function I(p) of the form
$I(p) = \sum_{a\in A} h(p_a)$   (4)
where h(u) is a continuous strictly concave function, defined on [0, 1], which vanishes when u = 0 or u = 1. This is a special case of minus the f-divergence (1), with weights ca = 1. The entropy function I(p) is defined for any $p \in \mathbb{M}^1_+$ and takes values in [0, +∞]. In the present paper the function h(u) is allowed to be stochastic; this means that it may also depend on a in A. For convenience of notation, this dependence will not be made explicit.
Throughout the paper it is assumed that the derivative
$f(u) = -\frac{\mathrm{d}h}{\mathrm{d}u}(u)$   (5)
exists on the interval (0, 1) and defines a continuous function on the half-open interval (0, 1]. Because h(u) is strictly concave, f(u) is strictly increasing. Note that it is allowed to diverge to −∞ at u = 0. This is indeed the case when h(u) = −u ln u and f(u) = 1 + ln u.
The function f(u) can be used to rewrite the entropy I(p) as
$I(p) = -\sum_{a\in A} \int_0^{p_a} f(u)\,\mathrm{d}u$   (6)
Note that the latter expression implies that
Entropy 10 00131 i010
The standard definition of the Bregman divergence [15] reads (see for instance Section 3 of [14])
$D(p\|q) = I(q) - I(p) - \sum_{a\in A} (p_a - q_a)\,f(q_a)$   (8)
In the case that f (u) diverges at u = 0 it is only well defined when qa = 0 implies pa = 0. It is a convex function of the first argument. Note that one can write
$D(p\|q) = \sum_{a\in A} \int_{q_a}^{p_a} \bigl[f(u) - f(q_a)\bigr]\,\mathrm{d}u$   (9)
From the latter expression it is immediately clear that D(p||q) ≥ 0, with equality if and only if p = q.
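The following Python sketch illustrates these formulas in the Boltzmann-Gibbs-Shannon case h(u) = −u ln u, f(u) = 1 + ln u, using the Bregman form as reconstructed in (8); in this special case D(p||q) reduces to the Kullback-Leibler divergence. The distributions p and q are arbitrary examples.

```python
# Illustrative sketch, assuming the reconstructed Bregman form (8):
# D(p||q) = I(q) - I(p) - sum_a (p_a - q_a) f(q_a), with f(u) = -dh/du.
# For h(u) = -u ln u one has f(u) = 1 + ln u, and D becomes the
# Kullback-Leibler divergence sum_a p_a ln(p_a / q_a).
import numpy as np

h = lambda u: -u * np.log(u)
f = lambda u: 1.0 + np.log(u)

def entropy(p):
    return np.sum(h(p))

def bregman(p, q):
    return entropy(q) - entropy(p) - np.sum((p - q) * f(q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(bregman(p, q))                 # non-negative
print(np.sum(p * np.log(p / q)))     # the same value (Kullback-Leibler)
print(bregman(p, p))                 # zero if and only if p = q
```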

3. Maximum entropy principle

Let there be given a finite number of real functions H1(a), H2(a), ···, Hn(a). Assume they are bounded from below. In a physical context these functions may be called Hamiltonians. The maximum entropy problem deals with finding the probability distribution p that maximises I(p) under the constraint that the expectation values of the Hamiltonians Hj attain given values Uj, called energies. Introduce the notation
$\mathcal{P}_U = \bigl\{p \in \mathbb{M}^1_+ : \langle p, H_j\rangle = U_j \text{ for } j = 1, 2, \cdots, n\bigr\}$   (10)
Then one looks for the probability distribution p ∈ 𝒫U which maximises I(p).
Definition 1 
A probability distribution p* ∈ 𝒫U is said to satisfy the maximum entropy principle if it satisfies
I(p) ≤ I(p*) < +∞   for all p ∈ 𝒫U.
In what follows a stronger condition is needed. It was introduced some 40 years ago [16] — see Theorem 7.4.1 of [17] — and is in fact a stability criterion.
Definition 2 
A probability distribution p* is said to satisfy the variational principle if there exist parameters θ1, θ2, ···, θn such that
$I(p) - \sum_{j=1}^n \theta_j\,\langle p, H_j\rangle \;\le\; I(p^*) - \sum_{j=1}^n \theta_j\,\langle p^*, H_j\rangle \;<\; +\infty \qquad \text{for all } p \in \mathbb{M}^1_+$   (12)
In statistical physics, a probability distribution satisfying the variational principle is called an equilibrium state. The quantity in (12) is minus the free energy. The well-known interpretation of (12) is then that the free energy is minimal at thermodynamic equilibrium.
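In the familiar Boltzmann-Gibbs-Shannon case the distribution satisfying the variational principle is the Gibbs distribution. The following sketch (a toy example; the Hamiltonian H and the value of θ are arbitrary choices) compares the value of I(p) − θ⟨p, H⟩ at the Gibbs distribution with its value at many randomly drawn distributions.

```python
# Illustrative sketch (standard Boltzmann-Gibbs-Shannon case): the Gibbs
# distribution p_a ∝ exp(-theta * H(a)) maximises I(p) - theta * <p, H>.
# H and theta are arbitrary toy choices.
import numpy as np

rng = np.random.default_rng(0)
H = np.array([0.0, 1.0, 2.0, 5.0])      # toy Hamiltonian, bounded from below
theta = 1.3

def objective(p):
    return -np.sum(p * np.log(p)) - theta * np.dot(p, H)

gibbs = np.exp(-theta * H)
gibbs /= gibbs.sum()

best_random = max(objective(rng.dirichlet(np.ones(len(H)))) for _ in range(10000))
print(objective(gibbs), ">=", best_random)
```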

4. Lagrange multipliers

A popular way to solve the maximum entropy problem is by the introduction of Lagrange parameters. However, a difficulty arises, known as the cutoff problem. It is indeed possible that some of the probabilities pa of the optimising probability distribution vanish. Let us see how this problem arises. The Lagrangean reads
$L(p) = I(p) - \alpha\Bigl(\sum_{a\in A} p_a - 1\Bigr) - \sum_{j=1}^n \theta_j\bigl(\langle p, H_j\rangle - U_j\bigr)$   (13)
Here, α is the parameter introduced to fix the normalisation condition Σa∈A pa = 1, and the θj are introduced to cope with the constraints (10). Variation of L w.r.t. the pa yields
$f(p_a) = -\alpha - \sum_{j=1}^n \theta_j\,H_j(a)$   (14)
The existence of parameters α and θj such that (14) holds is known as one of the Karush-Kuhn-Tucker conditions; for the present concave maximisation problem with affine constraints these conditions are sufficient for a global maximum. The problem that can arise is that the r.h.s. of this expression may not belong to the range of the function f(u). This situation is particularly likely to occur when f(u) does not tend to −∞ when u tends to 0. If the r.h.s. is in the range of f(u) then pa is determined uniquely by (14) because of the assumption that f(u) is a strictly increasing function.
The above problem is well known in optimisation theory. Because the constraints defining 𝒫U are affine, the set 𝒫U forms a simplex. Its faces are obtained by putting some of the probabilities pa equal to zero. Because the entropy function I(p) is concave, it attains its maximum within one of these faces. This observation leads to the ansatz that the probability distribution p*, which maximises I(p) over p in 𝒫U, if it exists, is determined by a subset A0 = {a ∈ A : $p^*_a$ = 0} and by the values of the parameters α and θj, which determine the remaining probabilities via (14). Let us now try to prove this statement.
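Before turning to the proofs, the following minimal sketch solves condition (14), in the form reconstructed above, for a one-parameter example based on the q-entropy of Section 13, for which f(u) = (q u^(q−1) − 1)/(q − 1) has the finite limit f(0) = 1/(1 − q) when q > 1. The Hamiltonian and the value of θ are arbitrary toy choices; α is fixed by normalisation through bisection. Events for which −α − θH(a) falls below f(0) receive probability zero, which is exactly the cutoff discussed above.

```python
# Illustrative sketch, assuming the reconstructed form of (14):
# f(p_a) = -alpha - theta * H(a), with the q-entropy choice (q > 1)
# f(u) = (q u^(q-1) - 1)/(q - 1), so that f(0) = 1/(1 - q) is finite.
import numpy as np

q, theta = 1.5, 2.0
H = np.array([0.0, 0.5, 1.0, 2.0, 10.0])   # toy Hamiltonian, bounded below

def f_inverse(y):
    """Inverse of f on its range; arguments below f(0) are mapped to 0."""
    t = np.clip((1.0 + (q - 1.0) * y) / q, 0.0, None)
    return t ** (1.0 / (q - 1.0))

def total(alpha):
    return np.sum(f_inverse(-alpha - theta * H))

# total(alpha) decreases monotonically in alpha; bisect for normalisation.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if total(mid) > 1.0 else (lo, mid)

p = f_inverse(-mid - theta * H)
print(p, p.sum())      # the high-energy events end up with p_a = 0 (cutoff)
```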

5. Characterisation

Theorems 1 and 2 below give a characterisation of the probability distributions satisfying the variational principle. This is done separately for the cases that f(0) is finite or infinite. Of course, both theorems could have been combined into a single theorem. But it is instructive to emphasise the complications which arise in the case of finite f(0). For the same reason the proofs do not rely on the results of [13], but are worked out independently. In the present section it is assumed that f(0) = −∞.
Lemma 1 
Assume f(0) = −∞. Let $p^* \in \mathbb{M}^1_+$ satisfy the variational principle. Then $p^*_a > 0$ holds for all a ∈ A.
Proof 
 
The contrapositive statement is proved.
Because of the normalisation, there exists at least one a ∈ A for which $p^*_a > 0$. Assume b ∈ A is such that $p^*_b = 0$. Let us show that this implies that p* does not satisfy the variational principle.
Fix 0 < ε ≪ 1. Introduce a new probability distribution p which coincides with p* except that
Entropy 10 00131 i019
Let
Entropy 10 00131 i020
Then one has
Entropy 10 00131 i021
From the assumption f(0) = −∞ then follows that
Entropy 10 00131 i022
This proves that p* does not satisfy the variational principle because, for ε sufficiently small, M(ε) is strictly larger than M(0).
                              ☐
Theorem 1 
Assume f(0) = −∞. A probability distribution p* satisfies the variational principle if and only if there exist α and θ1, θ2, ···, θn such that (14) holds for all a ∈ A.
Proof 
 
First assume that p* satisfies (14). This implies that $p^*_a > 0$ for all a ∈ A because f(0) is not defined. Hence, the divergence D(p||p*) is well defined for all p. Next one calculates
Entropy 10 00131 i023
Because D(p||p*) ≥ 0, with equality if and only if p = p*, it follows that p* satisfies the variational principle.
Next assume that p* satisfies the variational principle (12). From the lemma it then follows that $p^*_a > 0$ for all a ∈ A. Hence, the divergence D(p||p*) is well-defined for all $p \in \mathbb{M}^1_+$. It follows from the variational principle that
Entropy 10 00131 i024
Now, the function p ↦ D(p||p*) is convex with continuous derivatives. The r.h.s. of the above expression is affine. Both l.h.s. and r.h.s. vanish for p = p*. One concludes that the r.h.s. is tangent to the convex function and must be identically zero. It follows that, for all p,
Entropy 10 00131 i025
This implies that f($p^*_a$) is of the form (14) — take pa = δa,b for some fixed b to see this.
                              ☐

6. The case with cutoff

Assume now that f(0) = limu↓0 f(u) converges. Then the divergence D(p||q) is well defined for any pair of probability distributions p, q.
Theorem 2 
Assume that f(0) = limu↓0 f(u) converges. The following statements are equivalent.
1.
p* satisfies the variational principle;
2.
there exist parameters α and θ1, θ2, ···, θn, and a subset A0 of A such that
  • (14) is satisfied for all aA\A0;
  • $p^*_a = 0$ for all a ∈ A0;
  • $-\alpha - \sum_{j=1}^n \theta_j\,H_j(a) \le f(0)$ for all a ∈ A0.
Note that this last condition expresses that the r.h.s. of (14) is out of the range of f (u) because it takes a value less than f(0).
Proof 
 
1) implies 2). As in the proof of the previous Theorem, one shows that (20) holds for all p. But now one cannot conclude (21) because some of the $p^*_a$ may vanish, so that p* lies in one of the faces of the simplex $\mathbb{M}^1_+$. But one can still derive (14) for all a for which $p^*_a \ne 0$.
Assume now that $p^*_a = 0$ for some given a ∈ A. Let
Entropy 10 00131 i027
Then the l.h.s. of (20) becomes
Entropy 10 00131 i028
On the other hand, the r.h.s. of (20) becomes
Entropy 10 00131 i029
From the inequality (20) then follows
Entropy 10 00131 i030
This implies the desired inequality because
Entropy 10 00131 i031
2) implies 1). One calculates
Entropy 10 00131 i032
The variational principle now follows using the third assumption of the Theorem.
                              ☐
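As a concrete illustration of the case with cutoff, the following toy sketch takes q = 2 (so that h(u) = u − u², f(u) = 2u − 1 and f(0) = −1), a two-state Hamiltonian and θ = 1, and verifies the conditions of Theorem 2 in the form reconstructed above, as well as the variational principle itself.

```python
# Illustrative toy check of Theorem 2 (conditions as reconstructed above)
# for the q = 2 entropy: h(u) = u - u^2, f(u) = 2u - 1, f(0) = -1.
import numpy as np

H = np.array([0.0, 3.0])                 # two states; toy choice
theta = 1.0
f = lambda u: 2.0 * u - 1.0
I = lambda p: np.sum(p - p ** 2)         # sum_a h(p_a)

p_star = np.array([1.0, 0.0])            # candidate optimiser, A0 = {1}
alpha = -f(p_star[0]) - theta * H[0]     # (14) on A \ A0 gives alpha = -1

print(np.isclose(f(p_star[0]), -alpha - theta * H[0]))  # (14) holds off A0
print(-alpha - theta * H[1] <= f(0.0))                  # r.h.s. below f(0) on A0

# Variational principle: no distribution scores higher.
objective = lambda p: I(p) - theta * np.dot(p, H)
xs = np.linspace(0.0, 1.0, 101)
print(all(objective(np.array([1.0 - x, x])) <= objective(p_star) + 1e-12
          for x in xs))
```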

7. Statistical models

In the definition of the variational principle a set of Hamiltonians H1(a), H2(a), ···, Hn(a) is given, that is, real functions over the alphabet A, bounded from below. The equilibrium distribution p* is then characterised by a normalisation constant α, by parameters θ1, θ2, ···, θn, and by a subset A0 of the alphabet A — see (14). The emphasis now shifts towards these parameters.
Theorem 3 
Let there be given Hamiltonians H1(a), H2(a), ···, Hn(a). For each θ in $\mathbb{R}^n$ there exists at most one probability distribution p* satisfying the variational principle (12) with these parameters θ.
Proof 
 
If p and q both satisfy the variational principle (12) with the same parameters θ, then the convex combination $r = \tfrac{1}{2}p + \tfrac{1}{2}q$ has the same property because the entropy function is concave. But then one can conclude from the inequalities (12) that $I(r) = \tfrac{1}{2}I(p) + \tfrac{1}{2}I(q)$. Because the entropy function is strictly concave, it follows that p = q.
The set of θ for which a p* exists satisfying the variational principle (12) is denoted 𝒟. The probability distribution is denoted pθ instead of p*. The constant α appearing in (14) is replaced by α(θ).
A statistical model is a parametrised set of probability distributions. The above Theorem implies that the set (pθ)θ𝒟, of probability distributions satisfying the variational principle, is a statistical model. One can say that such a model belongs to the generalised exponential family.
Definition 3 
Let there be given a generalised entropy function I(p) of the form (4). A statistical model (pθ)θ∈𝒟 belongs to the generalised exponential family if there exist real functions H1(a), H2(a), ···, Hn(a), bounded from below, such that each member pθ of the model satisfies the variational principle (12) with these Hamiltonians and with this set of parameters.
This definition corresponds with the notion of natural generalised exponential family as introduced by Grünwald and Dawid [13]. It extends slightly the notion of phi-exponential family found in [11].
Clearly, entropy functions which differ only by a scalar factor determine the same generalised exponential family.

8. Uniqueness theorem

Let us now turn to the question whether a given model (pθ)θ∈𝒟 can belong to two different generalised exponential families.
Theorem 4 
Let there be given a model (pθ)θ∈𝒟. Assume that there exists an open subset 𝒟0 of 𝒟 with the property that the set of values of pθ,a covers the open interval (0, 1):
(0, 1) ⊂ {pθ,a : θ ∈ 𝒟0, a ∈ A}.
If the model belongs to two different generalised exponential families, one with entropy function I1(p), the other with entropy function I2(p), then there exists a constant λ such that I2(p) = λI1(p) for all p.
Proof 
 
Take any point u in (0, 1) and a corresponding θ ∈ 𝒟0 and a such that pθ,a = u. From the previous theorems it follows that there exist functions αi(θ) and Hamiltonians Hi1(a), Hi2(a), ···, Hin(a), with i = 1, 2, such that
Entropy 10 00131 i033
Let Entropy 10 00131 i034. Note that this is a strictly increasing continuous function. Then one has
Entropy 10 00131 i035
This relation also holds in a neighbourhood of θ in 𝒟0. It therefore implies the existence of λa and Ki,j such that
H2,j(a) − K2,j = λa(H1,j(a) − K1,j), j = 1, 2, ··· , n.
Then one can rewrite (30) as
Entropy 10 00131 i036
with
Entropy 10 00131 i037
valid for some neighbourhood of the given θ. Using the definition of Fa(v) one obtains
f2,a(u) = γa(θ) + λaf1,a(u),
valid on some neighbourhood of the given u ∈ (0, 1). Because u is arbitrary and the functions fi,a are continuous, the same expression must hold on all of (0, 1]. From 0 = hi,a(0) = Entropy 10 00131 i038 it now follows that γa(θ) = 0. Therefore (33) becomes
Entropy 10 00131 i039
In particular, λa does not depend on a ∈ A. One concludes therefore that there exists λ such that f2,a(u) = λ f1,a(u). This implies I2(p) = λI1(p).
                              ☐

9. Thermodynamics

Throughout this Section, let there be given a statistical model (pθ)θ∈𝒟 belonging to the generalised exponential family.
Note that if pθ and pη both belong to the same set 𝒫U then they satisfy I(pθ) = I(pη). Hence, a function S(U) can be defined by
S(U) = I(pθ)   whenever 〈pθ, Hj〉 = Uj for j = 1, 2, ···, n.   (36)
In the physics literature, this function is called the thermodynamic entropy (it was called specific entropy in [13]; but note that specific entropy has a different meaning in thermodynamics). The concept of thermodynamic entropy was first introduced by Clausius around 1850. The Legendre transform of the thermodynamic entropy is given by
$\Phi(\theta) = \sup_U\Bigl\{S(U) - \sum_{j=1}^n \theta_j\,U_j\Bigr\}$   (37)
This function was introduced by Massieu in 1869. The supremum is taken over all U for which S(U) is defined by (36). The function is convex — this is a well-known property of Legendre transforms.
Proposition 1 
One has
$\Phi(\theta) = I(p_\theta) - \sum_{j=1}^n \theta_j\,\langle p_\theta, H_j\rangle$   (38)
Proof 
 
Given θ ∈ 𝒟 there exists pθ for which the variational principle holds. Then one has, with Uj = 〈pθ, Hj〉,
Entropy 10 00131 i042
This proves the inequality in one direction. Next, fix ε > 0 and let U be such that
Entropy 10 00131 i043
with U such that S(U) is defined by (36). It then follows from the definition of S(U) that there exists η ∈ 𝒟 such that S(U) = I(pη) with 〈pη, Hj〉 = Uj, j = 1, 2, ···, n. The variational principle now implies that
Entropy 10 00131 i044
Because ε > 0 is arbitrary, the inequality in the other direction now follows.
                              ☐
The inverse Legendre transformation reads
$\hat S(U) = \inf_\theta\Bigl\{\Phi(\theta) + \sum_{j=1}^n \theta_j\,U_j\Bigr\}$   (42)
It is a concave function.
Proposition 2 
One has S(U) = $\hat S$(U) for all U for which S(U) is defined by (36).
Proof 
 
From the definition of the Massieu function Φ(θ) it follows that
Entropy 10 00131 i046
This implies that S(U) ≤ $\hat S$(U). On the other hand, from the definition (36) of S(U) it follows that
Entropy 10 00131 i047
where θ is such that pθ ∈ 𝒫U. This implies S(U) ≥ $\hat S$(U). The two inequalities together establish the desired equality.
                              ☐
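In the Boltzmann-Gibbs-Shannon case the Massieu function is the logarithm of the partition function. The following toy sketch (arbitrary Hamiltonian and θ) checks the Legendre-transform relations of this Section numerically: S(U) = Φ(θ) + θU at U = ⟨pθ, H⟩, and the convexity of Φ.

```python
# Illustrative sketch (Boltzmann-Gibbs-Shannon case): the Massieu function
# Phi(theta) = sup_U [S(U) - theta U] equals ln sum_a exp(-theta H(a)),
# and S(U) = Phi(theta) + theta U at U = <p_theta, H>.
import numpy as np

H = np.array([0.0, 1.0, 2.0, 4.0])       # toy Hamiltonian

def gibbs(theta):
    w = np.exp(-theta * H)
    return w / w.sum()

def massieu(theta):
    return np.log(np.sum(np.exp(-theta * H)))

theta = 0.7
p = gibbs(theta)
S = -np.sum(p * np.log(p))
U = np.dot(p, H)
print(np.isclose(S, massieu(theta) + theta * U))   # Legendre duality

# Convexity of Phi: second finite differences are positive.
ts = np.linspace(0.1, 3.0, 30)
phi = np.array([massieu(t) for t in ts])
print(np.all(np.diff(phi, 2) > 0))
```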

10. Thermodynamic relations

As in the previous Section, there is given a statistical model (pθ)θ∈𝒟 belonging to the generalised exponential family. In addition, let 𝒟0 be an open subset of 𝒟 on which the map θ ↦ 〈pθ, Hj〉 is continuous.
The following results are typical properties of Legendre transforms. For completeness, proofs are given.
Proposition 3 
The first derivative of the Massieu function Φ(θ) exists for θ in 𝒟0. It satisfies
$\frac{\partial\Phi}{\partial\theta_j}(\theta) = -\langle p_\theta, H_j\rangle$   (45)
Proof 
 
From the definitions one has, for θ and θ + η in 𝒟0,
Entropy 10 00131 i049
and
Entropy 10 00131 i050a Entropy 10 00131 i050b
Expression (45) now follows using the continuity of the map θ ↦ 〈pθ, Hj〉.
                              ☐
Introduce the metric tensor
$g_{i,j}(\theta) = \frac{\partial^2\Phi}{\partial\theta_i\,\partial\theta_j}(\theta)$   (48)
Because the Massieu function Φ(θ) is convex the matrix g(θ) is positive definite, whenever it exists. By the previous Proposition one has
$g_{i,j}(\theta) = -\frac{\partial}{\partial\theta_j}\,\langle p_\theta, H_i\rangle$   (49)
for those θ in 𝒟0 for which the derivative exists.
In thermodynamics, the derivative of S(U) equals the inverse of the absolute temperature T . Here, the analogous property becomes
Proposition 4 
Let θ ∈ 𝒟0 and define U by Uj = 〈pθ, Hj〉. Then one has
$\frac{\partial S}{\partial U_j}(U) = \theta_j$   (50)
Proof 
 
In a neighbourhood of the given U one has $S(U) = \Phi(\theta) + \sum_{j=1}^n \theta_j\,U_j$. Hence, one can write
Entropy 10 00131 i055
But the first term in the r.h.s. vanishes because the previous Proposition holds. Hence, the desired result follows.
                              ☐
The two relations (45) and (50) are dual in the sense of Amari [18]. In thermodynamics, the entropy S(U) and Massieu's function Φ(θ) are state functions, the energies Uj are extensive thermodynamic variables, and the parameters θj are the intensive thermodynamic variables.
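The following finite-difference sketch (again the Boltzmann-Gibbs-Shannon toy case, with an arbitrary Hamiltonian) checks the two dual relations of Propositions 3 and 4 in the form ∂Φ/∂θ = −⟨pθ, H⟩ and ∂S/∂U = θ for a single Hamiltonian.

```python
# Illustrative finite-difference check (Boltzmann-Gibbs-Shannon toy case) of
# the dual relations dPhi/dtheta = -<p_theta, H> and dS/dU = theta.
import numpy as np

H = np.array([0.0, 1.0, 3.0])

def gibbs(theta):
    w = np.exp(-theta * H)
    return w / w.sum()

massieu = lambda t: np.log(np.sum(np.exp(-t * H)))
energy  = lambda t: np.dot(gibbs(t), H)
entropy = lambda t: -np.sum(gibbs(t) * np.log(gibbs(t)))

theta, eps = 0.8, 1e-6
dPhi = (massieu(theta + eps) - massieu(theta - eps)) / (2 * eps)
print(np.isclose(dPhi, -energy(theta)))              # Proposition 3

dS = entropy(theta + eps) - entropy(theta - eps)
dU = energy(theta + eps) - energy(theta - eps)
print(np.isclose(dS / dU, theta, atol=1e-4))         # Proposition 4
```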

11. Escort probabilities

Let us now make the additional assumption that the function f(u), which enters the definition (6) of the generalised entropy, has a derivative f'(u). Because f(u) was supposed to be strictly increasing, one can write
$\frac{\mathrm{d}f}{\mathrm{d}u}(u) = \frac{1}{\phi(u)}$   (52)
where ϕ(v) = 1/(df/dv) is a strictly positive function.
As before, there is given a statistical model (pθ)θ∈𝒟 belonging to the generalised exponential family, and 𝒟0 is an open subset of 𝒟 on which the map θ ↦ 〈pθ, Hj〉 is continuous. The set A0(θ) is the set of a ∈ A for which pθ,a = 0. From Theorems 1 and 2 it now follows that
$\frac{\partial p_{\theta,a}}{\partial\theta_j} = -\phi(p_{\theta,a})\left[\frac{\partial\alpha}{\partial\theta_j}(\theta) + H_j(a)\right], \qquad a \in A\setminus A_0(\theta)$   (53)
This expression was used in [11] as a condition under which a generalisation of the well-known bound of Cramér and Rao is optimal. An immediate consequence of (53) is
Proposition 5 
Assume the regularity condition
$\frac{\partial}{\partial\theta_j}\sum_{a\in A\setminus A_0(\theta)} p_{\theta,a} \;=\; \sum_{a\in A\setminus A_0(\theta)} \frac{\partial p_{\theta,a}}{\partial\theta_j}$   (54)
Assume in addition that
$z(\theta) \equiv {\sum_a}'\,\phi(p_{\theta,a}) < +\infty$   (55)
where Σ' denotes the sum over all aA\A0(θ). Then one has
$\frac{\partial\alpha}{\partial\theta_j}(\theta) = -\frac{1}{z(\theta)}\,{\sum_a}'\,\phi(p_{\theta,a})\,H_j(a)$   (56)
Proof 
 
In a neighbourhood of the given θ one has (53). Hence, by summing (53) over a ∈ A\A0(θ) one obtains, using (54),
Entropy 10 00131 i061
                              ☐
The probability distribution
$P_{\theta,a} = \frac{1}{z(\theta)}\,\phi(p_{\theta,a})$   (58)
when it exists, is called the escort of the model (pθ)θ∈𝒟. With this notation, one can write the result of the Proposition as
$\frac{\partial\alpha}{\partial\theta_j}(\theta) = -\langle P_\theta, H_j\rangle$   (59)
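The sketch below builds a one-parameter q-entropy family numerically (toy Hamiltonian, q = 1.5, with (14) and φ(u) = u^(2−q)/q taken in the form reconstructed in this paper), forms the escort distribution (58), and checks the conclusion of Proposition 5, ∂α/∂θ = −⟨Pθ, H⟩, by a finite difference.

```python
# Illustrative sketch, assuming the reconstructed relations of this Section:
# build p_theta from (14) for the q-entropy, form the escort
# P_theta,a = phi(p_theta,a) / z(theta) with phi(u) = u^(2-q)/q, and check
# d(alpha)/d(theta) = -<P_theta, H> by finite differences.
import numpy as np

q = 1.5
H = np.array([0.0, 0.5, 1.0, 2.0])       # toy Hamiltonian
phi = lambda u: u ** (2.0 - q) / q

def f_inverse(y):                        # inverse of f(u) = (q u^(q-1) - 1)/(q-1)
    t = np.clip((1.0 + (q - 1.0) * y) / q, 0.0, None)
    return t ** (1.0 / (q - 1.0))

def solve(theta):
    """Return (alpha, p_theta) with the probabilities summing to one."""
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.sum(f_inverse(-mid - theta * H)) > 1.0 else (lo, mid)
    return mid, f_inverse(-mid - theta * H)

theta, eps = 1.0, 1e-5
alpha, p = solve(theta)
P = phi(p) / np.sum(phi(p))              # escort distribution

dalpha = (solve(theta + eps)[0] - solve(theta - eps)[0]) / (2 * eps)
print(np.isclose(dalpha, -np.dot(P, H), atol=1e-4))
```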

12. Generalised Fisher information

Let there be given a model (pθ)θ∈𝒟 for which z(θ), as given by (55), converges. The escort probabilities Pθ,a are defined by (58). Then one can define a generalised Fisher information matrix by
Ii,j(θ) = 〈Pθ, Xi(θ) Xj(θ)〉,   (60)
where the score variables are defined by
$X_j(\theta)_a = \frac{1}{P_{\theta,a}}\,\frac{\partial p_{\theta,a}}{\partial\theta_j}$   (61)
Note that in the standard case of h(u) = −u ln u one has ϕ(u) = u so that the escort probabilities Pθ coincide with the pθ. Then (60) reduces to the conventional definition.
Fix now a set of Hamiltonians H1(a), H2(a), ···, Hn(a). Then one can define a covariance matrix σ(θ) by
σi,j(θ) = 〈Pθ, Hi Hj〉 − 〈Pθ, Hi〉〈Pθ, Hj〉.   (62)
Proposition 6 
Assume a finite alphabet A. Then one has
Ii,j(θ) = z(θ) gi,j(θ) = z²(θ) σi,j(θ).   (63)
Proof 
 
 
From (53) follows
Entropy 10 00131 i065
for all θ ∈ 𝒟0 and a ∈ A\A0(θ). Hence, the Fisher information matrix becomes
Entropy 10 00131 i066
Using (59) it follows that Ii,j(θ) = z²(θ)σi,j(θ).
On the other hand, from (49) and (53) there follows
Entropy 10 00131 i067
Using (56) it follows that gi,j(θ) = z(θ)σi,j(θ).
The assumption of a finite alphabet is made to ensure that the conditions of Proposition 5 are fulfilled and that the sum and derivative may be interchanged in (66).
The generalised inequality of Cramér and Rao, in the present notations, reads [11]
Entropy 10 00131 i068
with u and v arbitrary real vectors. The previous Proposition then implies that the inequality becomes an equality when u = v, when P is related to p via (58), and when pθ belongs to a generalised exponential family.
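The relation g(θ) = z(θ)σ(θ), one part of Proposition 6, can be checked numerically in the same one-parameter q-entropy toy family, with g obtained as a finite difference of −⟨pθ, H⟩ and σ as the covariance of H under the escort distribution. All formulas are used in the reconstructed form given above.

```python
# Illustrative numerical check of g(theta) = z(theta) * sigma(theta) in a
# one-parameter q-entropy family (reconstructed formulas; toy Hamiltonian).
import numpy as np

q = 1.5
H = np.array([0.0, 0.5, 1.0, 2.0])
phi = lambda u: u ** (2.0 - q) / q

def f_inverse(y):
    t = np.clip((1.0 + (q - 1.0) * y) / q, 0.0, None)
    return t ** (1.0 / (q - 1.0))

def p_theta(theta):
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.sum(f_inverse(-mid - theta * H)) > 1.0 else (lo, mid)
    return f_inverse(-mid - theta * H)

theta, eps = 1.0, 1e-5
p = p_theta(theta)
z = np.sum(phi(p))
P = phi(p) / z
sigma = np.dot(P, H ** 2) - np.dot(P, H) ** 2

g = -(np.dot(p_theta(theta + eps), H) - np.dot(p_theta(theta - eps), H)) / (2 * eps)
print(g, z * sigma, np.isclose(g, z * sigma, atol=1e-4))
```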

13. Non-extensive thermostatistics

Define the q-deformed logarithm by [7,19]
$\ln_q(u) = \frac{1}{1-q}\left(u^{1-q} - 1\right)$   (68)
It is a strictly increasing function, defined for u > 0. Indeed, its derivative equals
$\frac{\mathrm{d}}{\mathrm{d}u}\,\ln_q(u) = u^{-q}$   (69)
In the limit q = 1 the q-deformed logarithm converges to the natural logarithm ln u.
The deformed logarithm can be used in more than one way to define an entropy function. The q-entropy (3) can be written as
$I_q(p) = \sum_{a\in A} p_a\,\ln_q\!\left(\frac{1}{p_a}\right)$   (70)
Comparison with (4) gives
$h(u) = u\,\ln_q\!\left(\frac{1}{u}\right) = \frac{u - u^q}{q-1}$   (71)
One has h(0) = h(1) = 0. Taking the derivative gives
$f(u) = -\frac{\mathrm{d}h}{\mathrm{d}u}(u) = \frac{q\,u^{q-1} - 1}{q-1}$   (72)
It is a strictly increasing function on (0, 1] when q > 0. The function ϕ(u) is given by
$\phi(u) = \frac{1}{f'(u)} = \frac{1}{q}\,u^{2-q}$   (73)
The probability distributions belonging to the generalised exponential family, corresponding with (70), are
$p_{\theta,a} = \left[\frac{1}{q}\Bigl(1 - (q-1)\bigl(\alpha + {\textstyle\sum_{j=1}^n}\,\theta_j H_j(a)\bigr)\Bigr)\right]_+^{1/(q-1)}$   (74)
with [u]+ = max{0,u}. This is indeed the kind of probability distribution discussed in the original paper of Tsallis [4]. However, more often used is the alternative of [9]. In the latter paper the concept of escort probability distributions was introduced into the literature. They were defined by
$P_a = \frac{p_a^q}{\sum_{b\in A} p_b^q}$   (75)
which in the present notations corresponds with ϕ(u) proportional to $u^q$. This can be obtained by replacing the constant q by 2 − q in (70). The entropy function then reads
$I(p) = \sum_{a\in A} p_a\,\ln_{2-q}\!\left(\frac{1}{p_a}\right) = \frac{1}{q-1}\left(\sum_{a\in A} p_a^{2-q} - 1\right)$   (76)
which is not the expression one would write down based on the information-theoretical argument that ln(1/pa) is the amount of information (counted in units of ln 2) gained from an event occurring with probability pa. Note that with this definition of the entropy function the condition q < 2 is needed in order to satisfy the requirement that the corresponding function $f(u) = \frac{1}{q-1}\bigl(1 - (2-q)\,u^{1-q}\bigr)$ is an increasing function.
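The sketch below checks three of the statements of this Section numerically, using the expressions in the form reconstructed here and an arbitrary probability vector: the q-deformed logarithm tends to ln u as q tends to 1, the choice h(u) = u ln_q(1/u) reproduces the q-entropy (3), and after the substitution q → 2 − q the function φ becomes proportional to u^q, so that the escort (58) coincides with the escort distribution of [9].

```python
# Illustrative sketch using the reconstructed expressions of this Section.
import numpy as np

def ln_q(u, q):
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

u = 0.37
print(ln_q(u, 1.001), np.log(u))                 # close for q near 1

p = np.array([0.5, 0.3, 0.2])
q = 1.7
print(np.sum(p * ln_q(1.0 / p, q)),              # sum_a p_a ln_q(1/p_a)
      (1.0 - np.sum(p ** q)) / (q - 1.0))        # the q-entropy (3): same value

escort = p ** q / np.sum(p ** q)                 # escort of reference [9]
phi_sub = lambda v: v ** q / (2.0 - q)           # phi(u) = u^(2-q)/q with q -> 2-q
print(escort, phi_sub(p) / np.sum(phi_sub(p)))   # identical distributions
```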

14. The percolation problem

This example has been treated in [20]. It is a genuine example of an important model of statistical physics which does not belong to the exponential family. In addition, it is an example which fits into the present generalised context provided that one allows that the function h(u) appearing in the definition (4) of the generalised entropy function is stochastic.
In the site percolation problem [21], the points of a lattice are occupied with probability q, independently of each other. The point at the origin is either unoccupied, with probability pØ, or it belongs to a cluster of shape i, with probability pi. This cluster is finite with probability 1, provided that 0 ≤ q ≤ qc, where qc is the percolation threshold. The probability that the origin belongs to an infinite cluster is strictly positive for q > qc. However, for the sake of simplicity of the presentation, 0 < q < qc will be assumed — see [20] for the general case.
These probabilities are given by
$p_i = c_i\,q^{s(i)}\,(1-q)^{t(i)}$   (77)
where ci is the number of different clusters of shape i, s(i) is the number of occupied sites in the cluster, and t(i) is the number of perimeter sites, that is, of unoccupied neighbouring sites. Note that (77) also holds when the origin is not occupied, provided one adopts the convention that c(Ø) = 1, s(Ø) = 0 and t(Ø) = 1.
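For the one-dimensional chain, a case simple enough to treat by hand (the value of q below is arbitrary; the threshold is qc = 1 in one dimension), the probabilities (77) can be summed explicitly: a cluster of s consecutive occupied sites containing the origin can be placed in s ways and has two empty perimeter sites. The following sketch checks that these probabilities add up to one.

```python
# Illustrative sketch: normalisation of the probabilities (77) for
# one-dimensional site percolation (q_c = 1 in one dimension).
import numpy as np

q = 0.6
p_empty = 1.0 - q                          # origin unoccupied: s = 0, t = 1, c = 1
s = np.arange(1, 2000)                     # cluster sizes (truncated sum)
p_clusters = s * q ** s * (1.0 - q) ** 2   # c = s placements, t = 2 perimeter sites
print(p_empty + np.sum(p_clusters))        # approximately 1
```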
Choose the Hamiltonian
Entropy 10 00131 i079
and introduce the parameter θ by
Entropy 10 00131 i080
Then one can write
Entropy 10 00131 i081
with
Entropy 10 00131 i082
This looks like an exponential family, except for the extra factor [s(i) + t(i)] in the r.h.s. Introduce the stochastic function
Entropy 10 00131 i083
Then the above expression is of the form (14). By integrating fi(u) one obtains
Entropy 10 00131 i084
It is now straightforward to verify that the percolation problem belongs to a generalised exponential family. The relevant entropy function for the percolation model in the non-percolating region 0 < q < qc is therefore
Entropy 10 00131 i085

15. Discussion

Sections 3 to 6 of the present paper discuss the variational principle, which is stronger than the maximum entropy principle. It is shown that the method of Lagrange multipliers leads to the correct result, even in the context of generalised entropy functions. The difficulty that arises is known as the cutoff problem: the optimising probability distribution may assign vanishing probabilities to some of the events. To cope with this situation the two cases have been considered separately. Theorem 1 treats the standard case, Theorem 2 copes with the vanishing probabilities.
In Section 7, a generalised definition of an exponential family is given. It identifies the members of the generalised exponential family with the solutions of the variational principle, given a generalised entropy function of the usual form (4). The definition of the standard exponential family corresponds of course with the Boltzmann-Gibbs-Shannon entropy. Entropy functions I(p) and λI(p), with λ > 0, determine the same exponential family. Assuming some technical condition, the intersection of different generalised exponential families is empty — see Theorem 4. As a consequence, a one-to-one relation has been established between generalised exponential families and classes of equivalent entropy functions.
In [11], the notion of phi-exponential family was introduced. The ’phi’ in this name refers to the function ϕ(v), introduced in (52). It is one divided by the derivative of the function f (v) appearing in the expression (6) for the entropy function I(p). The assumption that the derivative of f (v) exists for all v > 0 has been eliminated in the present paper. More important is that the definition of a generalised exponential family is now given directly in terms of the entropy function I(p), via the variational principle, without relying on the notion of deformed exponential functions.
Section 9, Section 10, Section 11 and Section 12 discuss the geometric properties of a generalised exponential family, using terminology borrowed from 150-year-old thermodynamics. The main result is (63), proving the equality of three quantities: the generalised Fisher information, the metric tensor multiplied by the partition sum z(θ), and the covariance matrix multiplied by z²(θ). The covariance matrix is calculated using the escort family of probability distributions.
Many applications of generalised exponential families are found in the literature, in the context of nonextensive thermostatistics. The latter has been discussed in Section 13. A completely different kind of example is found in percolation theory — see Section 14. It illustrates the possibility that the function f (u), which determines the entropy function I(p) via (6), is of a stochastic nature. One can expect that many other applications will be found in the near future.
Finally, note that the present work has a quantum analogue. Let there be given a strictly increasing function f(u), continuous on (0, 1]. The expression (6) can be generalised to
Entropy 10 00131 i086
where ρ is any density operator in a Hilbert space. The Bregman divergence (8) generalises to
Entropy 10 00131 i087
The basic inequality D(ρ‖σ) ≥ 0 is proved using Klein's inequality — see Section 2.5.2 of [17].

Acknowledgements

This work has benefitted from a series of discussions with Flemming Topsøe. I am grateful to the anonymous referees for their constructive remarks and for pointing out the references [2,13,14].

References

  1. Petz, D. Quasi-entropies for finite quantum systems. Rep. Math. Phys. 1986, 23, 57–65.
  2. Csiszár, I. Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 1967, 2, 299–318.
  3. Csiszár, I. A class of measures of informativity of observation channels. Per. Math. Hung. 1972, 2, 191–213.
  4. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
  5. Havrda, J.; Charvát, F. Quantification method of classification processes, the concept of structural entropy. Kybernetika 1967, 3, 30–35.
  6. Daróczy, Z. Generalized information functions. Information and Control 1970, 16, 36–51.
  7. Tsallis, C. What are the numbers that experiments provide? Química Nova 1994, 17, 468.
  8. Naudts, J. Deformed exponentials and logarithms in generalized thermostatistics. Physica A 2002, 316, 323–334.
  9. Tsallis, C.; Mendes, R.; Plastino, A. The role of constraints within generalized nonextensive statistics. Physica A 1998, 261, 543–554.
  10. Naudts, J. Continuity of a class of entropies and relative entropies. Rev. Math. Phys. 2004, 16, 809–822.
  11. Naudts, J. Estimators, escort probabilities, and phi-exponential families in statistical physics. J. Ineq. Pure Appl. Math. 2004, 5, 102.
  12. Topsøe, F. Information-theoretical optimization techniques. Kybernetika 1979, 15, 8–27.
  13. Grünwald, P.; Dawid, A. Game theory, maximum entropy, minimum discrepancy, and robust Bayesian decision theory. Ann. Statist. 2004, 32, 1367–1433.
  14. Lafferty, J. Additive models, boosting, and inference for generalized divergences. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory; 1999; pp. 125–133.
  15. Bregman, L. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comp. Math. Math. Phys. 1967, 7, 200–217.
  16. Ruelle, D. A variational formulation of equilibrium statistical mechanics and the Gibbs phase rule. Commun. Math. Phys. 1967, 5, 324–329.
  17. Ruelle, D. Statistical Mechanics; W.A. Benjamin: New York, 1969.
  18. Amari, S. Differential-Geometrical Methods in Statistics; Springer: New York, 1985; Vol. 28.
  19. Tsallis, C. Nonextensive statistical mechanics: construction and physical interpretation. In Nonextensive Entropy — Interdisciplinary Applications; Oxford University Press, 2004; pp. 1–53.
  20. Naudts, J. Parameter estimation in nonextensive thermostatistics. Physica A 2006, 365, 42–49.
  21. Stauffer, D. Introduction to Percolation Theory; Plenum Press: New York, 1985.
