Geometry of q-Exponential Family of Probability Distributions

The Gibbs distribution of statistical physics is an exponential family of probability distributions, which has a mathematical basis of duality in the form of the Legendre transformation. Recent studies of complex systems have found many distributions obeying the power law rather than the standard Gibbs-type distributions. The Tsallis q-entropy is a typical example capturing such phenomena. We treat the q-Gibbs distribution or the q-exponential family by generalizing the exponential function to the q-family of power functions, which is useful for studying various complex or non-standard physical phenomena. We give a new mathematical structure to the q-exponential family, different from those previously given. It has a dually flat geometrical structure derived from the Legendre transformation, and conformal geometry is useful for understanding it. The q-version of the maximum entropy theorem is naturally induced from the q-Pythagorean theorem. We also show that the maximizer of the q-escort distribution is a Bayesian MAP (Maximum A posteriori Probability) estimator.


Introduction
Statistical physics is founded on the Gibbs distribution for microstates, which forms an exponential family of probability distributions, as known in statistics. Important macro-quantities such as energy, entropy and free energy are connected with it. However, recent studies show that there are non-standard complex systems which are subject to the power law instead of the exponential law of Gibbs-type distributions. See [1,2] as well as the extensive literature cited therein.
Tsallis [3] defined the q-entropy to elucidate various physical phenomena of this type, and much related research on this subject followed (see [1]). The concept of the q-Gibbs distribution or q-exponential family of probability distributions is naturally induced from this framework (see also [4]). However, its mathematical structure has not yet been fully explored [2,5,6], whereas the Gibbs-type distribution has been studied well as the exponential family of distributions [7]. We need a mathematical (geometrical) foundation to study the properties of the q-exponential family. This paper presents a geometrical foundation for the q-exponential family based on information geometry [8], giving geometrical definitions of the q-potential function, q-entropy and q-divergence in a unified way.
We define the q-geometrical structure consisting of a Riemannian metric and a pair of dual affine connections. Using this framework, we prove that a family of q-exponential distributions is dually flat, so that the q-Pythagorean theorem holds. This naturally induces the corresponding q-maximum entropy theorem, similarly to the case of the Tsallis q-entropy [1,9,10]. The q-structure is ubiquitous, since the family $S_n$ of all discrete probability distributions can always be endowed with the structure of the q-exponential family for arbitrary $q$. It is possible to generalize the q-structure to any family of probability distributions. Further, it is closely related to the α-geometry [8], an information-geometric structure of constant curvature. This new dually flat structure, which differs from the standard one arising from the invariance criterion in information geometry, can also be obtained by conformal flattening of the α-geometry [11,12], using a technique from conformal and projective geometry [13][14][15].
The present framework provides mathematical tools for analyzing physical phenomena subject to the power law. The Legendre transformation again plays a fundamental role in deriving the geometrical dual structure. There are many applications of q-geometry to information theory ([16] and others) and statistics, including Bayesian q-statistics.
It is possible to generalize our framework to a more general non-linear family of distributions by using a positive convex function instead of the q-exponential function (see [2,17]). A good example is the κ-exponential family [18][19][20], but we do not treat it here.

q-Logarithm and q-Exponential Function
The first step is to generalize the logarithm and exponential functions to a family of power functions that includes them as a limiting case [1,5,21]. This was also used for defining the α-family of distributions in information geometry [8]. We define the q-logarithm by
$$\log_q(u) = \frac{1}{1-q}\left(u^{1-q} - 1\right), \qquad u > 0,$$
and its inverse function, the q-exponential, by
$$\exp_q(u) = \left\{1 + (1-q)\,u\right\}^{\frac{1}{1-q}}, \qquad 1 + (1-q)\,u > 0,$$
for a positive $q$ with $q \neq 1$. The limiting case $q \to 1$ reduces to
$$\log_1(u) = \log u, \qquad \exp_1(u) = \exp u,$$
so that $\log_q$ and $\exp_q$ are defined for all $q > 0$.
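A minimal Python sketch of the two definitions above. The names `log_q` and `exp_q` are illustrative, and the cut-off $\exp_q(u) = 0$ for $1 + (1-q)u \le 0$ when $0 < q < 1$ is an assumed convention, not something stated in this section.

```python
import math


def log_q(u: float, q: float) -> float:
    """q-logarithm (u**(1-q) - 1) / (1-q); reduces to log(u) at q = 1."""
    if u <= 0.0:
        raise ValueError("log_q is defined only for u > 0")
    if q == 1.0:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(u: float, q: float) -> float:
    """q-exponential {1 + (1-q) u}**(1/(1-q)); reduces to exp(u) at q = 1.

    Assumed convention: for 0 < q < 1 the value is cut off to 0 when
    1 + (1-q) u <= 0; for q > 1 that region is left undefined.
    """
    if q == 1.0:
        return math.exp(u)
    base = 1.0 + (1.0 - q) * u
    if base <= 0.0:
        if q < 1.0:
            return 0.0
        raise ValueError("exp_q is undefined here for q > 1")
    return base ** (1.0 / (1.0 - q))


if __name__ == "__main__":
    for q in (0.5, 0.9, 0.999999):
        # exp_q inverts log_q, and both approach log/exp as q -> 1
        print(q, exp_q(log_q(2.5, q), q), log_q(2.0, q), math.log(2.0))
```

Setting $q$ close to 1 shows numerically how both functions approach $\log$ and $\exp$.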

q-Exponential Family
The standard form of an exponential family of distributions is written as
$$p(x, \theta) = \exp\left\{\theta \cdot x - \psi(\theta)\right\}$$
with respect to an adequate measure $\mu(x)$, where $x = (x_1, \cdots, x_n)$ is a set of random variables and $\theta = (\theta^1, \cdots, \theta^n)$ are the canonical parameters describing the underlying system. The Gibbs distribution is of this type. Here, $\psi(\theta)$ is called the free energy, which is the cumulant generating function. The power version of the Gibbs distribution is written as
$$p(x, \theta) = \exp_q\left\{\theta \cdot x - \psi_q(\theta)\right\}, \quad \text{or equivalently} \quad \log_q p(x, \theta) = \theta \cdot x - \psi_q(\theta),$$
where $\theta \cdot x = \sum_i \theta^i x_i$. This is the q-Gibbs distribution or q-exponential family [4], which we denote by $S$, where the domain of $x$ is restricted such that $p(x, \theta) > 0$ holds. The function $\psi_q(\theta)$, called the q-free energy or q-potential function, is determined from the normalization condition
$$\int \exp_q\left\{\theta \cdot x - \psi_q(\theta)\right\} dx = 1,$$
where we have replaced $d\mu(x)$ by $dx$ for brevity. The function $\psi_q$ depends on $q$, but we hereafter omit the suffix $q$ in most cases. Research on the q-exponential family can be found, for example, in [2,4,19]. The q-Gaussian distribution is given by
$$p(x; \mu, \sigma) = \exp_q\left\{-\frac{(x-\mu)^2}{2\sigma^2} - \psi_q(\mu, \sigma)\right\}$$
and is studied in detail in [22][23][24][25]. Here, we need to introduce a vector random variable $\boldsymbol{x} = (x, x^2)$ and a new parameter $\theta$, which is a vector-valued function of $\mu$ and $\sigma$, to represent it in the standard form (7). It is an interesting observation that the domain of $x$ in the q-Gaussian case depends on $q$ if $0 < q < 1$, because $\exp_q(u)$ vanishes for $u \le -1/(1-q)$.
Hence, the q- and q′-Gaussian distributions are in general not mutually absolutely continuous when $q \neq q'$.
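To make the q-dependence of the support concrete, the following Python sketch assumes the form $p(x) = \exp_q\{-x^2/(2\sigma^2) - \psi_q\}$ with $\mu = 0$, $0 < q < 1$, and the cut-off convention for $\exp_q$. Writing $u = 1 - (1-q)\psi_q$ and $a = (1-q)/(2\sigma^2)$, the density is $[u - a x^2]_+^{k}$ with $k = 1/(1-q)$, and the Beta integral $\int_{-1}^{1}(1-s^2)^k ds = B(1/2, k+1)$ gives $\psi_q$ in closed form. The function names are hypothetical; a midpoint-rule integral checks the normalization.

```python
import math


def q_gaussian_params(q: float, sigma: float = 1.0):
    """Return (psi_q, half_width) of exp_q{-x^2/(2 sigma^2) - psi_q}, 0 < q < 1."""
    k = 1.0 / (1.0 - q)                    # exponent of exp_q
    a = (1.0 - q) / (2.0 * sigma ** 2)     # density is [u - a x^2]_+^k
    beta = math.gamma(0.5) * math.gamma(k + 1.0) / math.gamma(k + 1.5)  # B(1/2, k+1)
    # normalization: u^(k + 1/2) * a^(-1/2) * B(1/2, k+1) = 1
    u = (math.sqrt(a) / beta) ** (1.0 / (k + 0.5))
    psi = (1.0 - u) / (1.0 - q)            # because u = 1 - (1-q) psi_q
    return psi, math.sqrt(u / a)           # density > 0 iff |x| < sqrt(u / a)


def total_mass(q: float, n: int = 20000) -> float:
    """Midpoint-rule integral (sigma = 1) of the density over its support."""
    psi, w = q_gaussian_params(q)
    step = 2.0 * w / n
    total = 0.0
    for j in range(n):
        x = -w + (j + 0.5) * step
        base = 1.0 + (1.0 - q) * (-x * x / 2.0 - psi)
        total += base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0  # cut-off convention
    return total * step


if __name__ == "__main__":
    for q in (0.3, 0.5, 0.7):
        psi, w = q_gaussian_params(q)
        print(f"q={q}: psi_q={psi:.4f}, support=(-{w:.4f}, {w:.4f}), mass={total_mass(q):.6f}")
```

The printed supports differ for different $q$, which is the reason two q-Gaussians with different $q$ need not be mutually absolutely continuous.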
It should be remarked that the q-exponential family itself is the same as the α-family of distributions in information geometry [8]. Here, we introduce a different geometrical structure, generalizing the result of [24].
We mainly use the family $S_n$ of discrete distributions over $n + 1$ elements $X = \{x_0, x_1, \cdots, x_n\}$, although the results can easily be extended to the case of continuous random variables. Here, the random variable $x$ takes values in $X$. We also treat the case $0 < q < 1$; the limiting cases $q = 0$ and $q = 1$ give the well-known ones.
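Let us put $p_i = \mathrm{Prob}\{x = x_i\}$ and denote the probability distribution by the vector $p = (p_0, p_1, \cdots, p_n)$, where
$$p_i > 0, \qquad \sum_{i=0}^{n} p_i = 1.$$
The probability of $x$ is also written as
$$p(x) = \sum_{i=0}^{n} p_i\, \delta_i(x),$$
where
$$\delta_i(x) = \begin{cases} 1, & x = x_i, \\ 0, & \text{otherwise}. \end{cases}$$

Theorem 1 The family $S_n$ of discrete probability distributions has the structure of a q-exponential family for any $q$.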
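Proof We take $\log_q$ of distribution $p(x)$ of (11). For any function $f(u)$, we have
$$f\{p(x)\} = \sum_{i=0}^{n} f(p_i)\, \delta_i(x).$$
By taking
$$\delta_0(x) = 1 - \sum_{i=1}^{n} \delta_i(x)$$
into account, discrete distribution (11) can be rewritten in the form (8) as
$$\log_q p(x) = \sum_{i=1}^{n} \left(\log_q p_i - \log_q p_0\right) \delta_i(x) + \log_q p_0,$$
where
$$p_0 = 1 - \sum_{i=1}^{n} p_i$$
is treated as a function of $(p_1, \cdots, p_n)$. Hence, $S_n$ is a q-exponential family (6) for any $q$, with the following q-canonical parameters, random variables and q-potential function:
$$\theta^i = \log_q p_i - \log_q p_0 = \frac{1}{1-q}\left(p_i^{1-q} - p_0^{1-q}\right),$$
$$x_i = \delta_i(x),$$
$$\psi(\theta) = -\log_q p_0.$$
This completes the proof.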
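The coordinates of Theorem 1 can be checked numerically. The Python sketch below maps a distribution $p \in S_2$ to $(\theta, \psi)$ for several values of $q$ and back through $\exp_q$; the helper names, the sample distribution and the cut-off convention for $\exp_q$ are assumptions of the sketch.

```python
import math


def log_q(u, q):
    return math.log(u) if q == 1.0 else (u ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(u, q):
    if q == 1.0:
        return math.exp(u)
    base = 1.0 + (1.0 - q) * u
    if base <= 0.0:
        if q < 1.0:
            return 0.0      # assumed cut-off convention for 0 < q < 1
        raise ValueError("exp_q undefined for this argument when q > 1")
    return base ** (1.0 / (1.0 - q))


def to_q_coordinates(p, q):
    """theta^i = log_q p_i - log_q p_0 (i = 1..n) and psi = -log_q p_0."""
    theta = [log_q(pi, q) - log_q(p[0], q) for pi in p[1:]]
    return theta, -log_q(p[0], q)


def from_q_coordinates(theta, psi, q):
    """Evaluate log_q p(x) = sum_i theta^i delta_i(x) - psi at x_0, ..., x_n."""
    return [exp_q(-psi, q)] + [exp_q(t - psi, q) for t in theta]


if __name__ == "__main__":
    p = [0.2, 0.5, 0.3]
    for q in (0.3, 0.7, 1.0, 1.5):
        theta, psi = to_q_coordinates(p, q)
        print(q, [round(t, 4) for t in theta], round(psi, 4), from_q_coordinates(theta, psi, q))
```

The round trip recovers $p$ for every $q$, while the printed $\theta$ and $\psi$ change with $q$, as remarked after the proof.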
Note that the q-potential $\psi(\theta)$ and the canonical parameter $\theta$ depend on $q$, as is seen in (17) and (19). It should also be remarked that Theorem 1 does not contradict Theorem 1 in [19], which states that a parametrized family of probability distributions can belong to at most one q-exponential family. That author considers an $m$-dimensional parametrized submanifold in $S_n$ with $m < n$, where the canonical parameter depending on $q$ is given via the variational principle. Therefore, denoting the q-canonical parameter by $\theta_q \in \mathbb{R}^m$, we can restate his theorem in geometric terms: a linear submanifold parametrized by $\theta_q \in \mathbb{R}^m$ is not a linear submanifold parametrized by $\theta_{q'} \in \mathbb{R}^m$ when $q' \neq q$. On the other hand, the present theorem states that there exists a q-canonical parameter $\theta_q \in \mathbb{R}^n$ on the whole of $S_n$ for any $q$, and the manifold has a linear structure with respect to any $\theta_q$. This is a surprising new finding.

q-Potential Function
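We study the q-geometrical structure of $S$. The q-log-likelihood is a linear form defined by
$$l_q(x, \theta) = \log_q p(x, \theta) = \theta \cdot x - \psi(\theta).$$
By differentiating it with respect to $\theta^i$ and using $d\log_q u / du = u^{-q}$, we have
$$\partial_i l_q(x, \theta) = x_i - \partial_i \psi(\theta) = p(x, \theta)^{-q}\, \partial_i p(x, \theta),$$
with the abbreviated notation $\partial_i = \partial/\partial\theta^i$. From this we have the following important theorem.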

Theorem 2
The q-free energy or q-potential $\psi_q(\theta)$ is a convex function of $\theta$.
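Proof We omit the suffix $q$ for simplicity's sake. We have
$$\partial_i p = p^q \left(x_i - \partial_i \psi\right),$$
$$\partial_i \partial_j p = q\, p^{2q-1} \left(x_i - \partial_i \psi\right)\left(x_j - \partial_j \psi\right) - p^q\, \partial_i \partial_j \psi.$$
Since $\int p(x, \theta)\, dx = 1$, the following identities hold:
$$\int \partial_i p(x, \theta)\, dx = 0, \qquad \int \partial_i \partial_j p(x, \theta)\, dx = 0.$$
Here, we define an important functional
$$h_q(p) = \int p(x)^q\, dx,$$
in particular, for discrete $S_n$,
$$h_q(p) = \sum_{i=0}^{n} p_i^q,$$
for $0 < q < 1$. This function plays a key role in the following. Integrating the two derivatives above and using the identities, we have
$$\partial_i \psi(\theta) = \frac{1}{h_q(p)} \int p(x, \theta)^q\, x_i\, dx,$$
$$\partial_i \partial_j \psi(\theta) = \frac{q}{h_q(p)} \int p(x, \theta)^{2q-1} \left(x_i - \partial_i \psi\right)\left(x_j - \partial_j \psi\right) dx.$$
The latter shows that $\partial_i \partial_j \psi(\theta)$ is positive-definite, and hence $\psi$ is convex.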
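The two derivative formulas in the proof can be checked numerically on $S_2$ with $0 < q < 1$. In this Python sketch, $\psi(\theta)$ is obtained by bisection from the normalization $\sum_{i=0}^{n}\exp_q(\theta^i - \psi) = 1$ (with $\theta^0 = 0$ and the assumed cut-off for $\exp_q$), and central finite differences of $\psi$ are compared with the escort-weighted expressions. The bisection bracket, step sizes and function names are choices of this sketch.

```python
def exp_q(u, q):
    base = 1.0 + (1.0 - q) * u
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0  # cut-off assumed, 0 < q < 1


def psi(theta, q, iters=200):
    """Solve sum_{i=0}^n exp_q(theta^i - psi) = 1 for psi by bisection (theta^0 = 0)."""
    th = [0.0] + list(theta)
    lo = max(th) - 1.0               # largest term is exp_q(1) > 1, so the sum exceeds 1
    hi = max(th) + 1.0 / (1.0 - q)   # every term is cut off to 0, so the sum is 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(exp_q(t - mid, q) for t in th) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def formula_derivatives(theta, q):
    """Gradient p_i^q / h_q and Hessian (q/h_q) sum_k p_k^(2q-1)(x_i - eta_i)(x_j - eta_j)."""
    s = psi(theta, q)
    p = [exp_q(t - s, q) for t in [0.0] + list(theta)]
    h = sum(v ** q for v in p)
    eta = [v ** q / h for v in p[1:]]
    n = len(theta)

    def x(i, k):  # random variable x_i = delta_i evaluated at outcome x_k
        return 1.0 if k == i + 1 else 0.0

    hess = [[q / h * sum(p[k] ** (2 * q - 1) * (x(i, k) - eta[i]) * (x(j, k) - eta[j])
                         for k in range(n + 1)) for j in range(n)] for i in range(n)]
    return eta, hess


def finite_difference_derivatives(theta, q, eps=1e-4):
    def psi_at(shifts):
        t = list(theta)
        for idx, step in shifts:
            t[idx] += step
        return psi(t, q)

    n = len(theta)
    grad = [(psi_at([(i, eps)]) - psi_at([(i, -eps)])) / (2 * eps) for i in range(n)]
    hess = [[(psi_at([(i, eps), (j, eps)]) - psi_at([(i, eps), (j, -eps)])
              - psi_at([(i, -eps), (j, eps)]) + psi_at([(i, -eps), (j, -eps)])) / (4 * eps * eps)
             for j in range(n)] for i in range(n)]
    return grad, hess


if __name__ == "__main__":
    theta0, q0 = [0.4, -0.3], 0.6
    print(formula_derivatives(theta0, q0))
    print(finite_difference_derivatives(theta0, q0))
```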

q-Divergence
A convex function $\psi(\theta)$ makes it possible to define a divergence of the Bregman type between two probability distributions $p(x, \theta_1)$ and $p(x, \theta_2)$ [8,26,27]. It is given by
$$D_q\left[p(x, \theta_1) : p(x, \theta_2)\right] = \psi(\theta_2) - \psi(\theta_1) - \nabla\psi(\theta_1) \cdot (\theta_2 - \theta_1),$$
using the gradient $\nabla = \partial/\partial\theta$, and it satisfies the non-negativity condition
$$D_q\left[p(x, \theta_1) : p(x, \theta_2)\right] \ge 0,$$
with equality if and only if $\theta_1 = \theta_2$. This gives a q-divergence in $S_n$ different from the invariant divergence of $S_n$ [28]. The divergence is canonical in the sense that it is uniquely determined by the dually flat structure of the q-exponential family in Sections 3 and 4. The canonical divergence is different from the α-divergence or the conventional Tsallis relative entropy used in information geometry (see the discussion at the end of this subsection). Note that it is used in [16].
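Theorem 3 For two discrete distributions $p(x) = p$ and $r(x) = r$, the q-divergence is given by
$$D_q[p : r] = \frac{1}{(1-q)\, h_q(p)} \left(1 - \sum_{i=0}^{n} p_i^q\, r_i^{1-q}\right).$$
Proof The potentials are, from (19),
$$\psi(\theta_p) = -\log_q p_0, \qquad \psi(\theta_r) = -\log_q r_0$$
for $p$ and $r$. We need to calculate $\nabla\psi(\theta)$ given in (29). In our case, $x_i = \delta_i(x)$ and hence
$$\partial_i \psi(\theta_p) = \frac{p_i^q}{h_q(p)}.$$
By using this and (17), we obtain (33).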
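As a sanity check of Theorem 3, the following Python sketch evaluates both the closed form and the Bregman definition of $D_q$, using $\theta$ and $\psi$ from Theorem 1 and $\partial_i\psi = p_i^q/h_q(p)$, on a hypothetical pair of distributions in $S_2$. The two values should agree to rounding error; the function names are illustrative.

```python
def log_q(u, q):
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)   # q != 1 assumed


def h_q(p, q):
    return sum(v ** q for v in p)


def q_divergence(p, r, q):
    """Closed form of Theorem 3."""
    s = sum(a ** q * b ** (1.0 - q) for a, b in zip(p, r))
    return (1.0 - s) / ((1.0 - q) * h_q(p, q))


def q_divergence_bregman(p, r, q):
    """psi(theta_r) - psi(theta_p) - grad psi(theta_p) . (theta_r - theta_p)."""
    def theta(d):
        return [log_q(v, q) - log_q(d[0], q) for v in d[1:]]

    def psi(d):
        return -log_q(d[0], q)

    grad = [v ** q / h_q(p, q) for v in p[1:]]  # grad psi = escort probabilities
    step = [b - a for a, b in zip(theta(p), theta(r))]
    return psi(r) - psi(p) - sum(g * s for g, s in zip(grad, step))


if __name__ == "__main__":
    p, r = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
    for q in (0.3, 0.6, 0.9):
        print(q, q_divergence(p, r, q), q_divergence_bregman(p, r, q))
```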
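It is useful to consider a related probability distribution,
$$\hat{p}(x) = \frac{1}{h_q(p)}\, p(x)^q,$$
for defining the q-expectation. This is called the q-escort probability distribution [1,4,29]. Introducing the q-expectation of a random variable $f(x)$ by
$$E_{q,p}[f(x)] = \int \hat{p}(x)\, f(x)\, dx,$$
we can rewrite the q-divergence (31) for $p(x), r(x) \in S$ as
$$D_q[p(x) : r(x)] = E_{q,p}\left[\log_q p(x) - \log_q r(x)\right],$$
because of the relations (20) and (29). The expression (38) is also valid outside $S \times S$ when it is integrable. This is different from the definition of the Tsallis relative entropy [30,31],
$$D_{\text{Tsallis}}[p(x) : r(x)] = \frac{1}{1-q}\left(1 - \int p(x)^q\, r(x)^{1-q}\, dx\right),$$
which is equal to the well-known α-divergence
$$D_\alpha[p(x) : r(x)] = \frac{4}{1-\alpha^2}\left(1 - \int p(x)^{\frac{1-\alpha}{2}}\, r(x)^{\frac{1+\alpha}{2}}\, dx\right)$$
up to a constant factor, $D_{\text{Tsallis}} = q\, D_\alpha$, where $\alpha = 1 - 2q$ (see [8,28]), satisfying the invariance criterion. We have
$$D_q[p(x) : r(x)] = \frac{1}{h_q(p)}\, D_{\text{Tsallis}}[p(x) : r(x)].$$
This is a conformal transformation of the divergence, as we see in the following. See also the derivation based on affine differential geometry [12].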
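The relations in this paragraph are easy to verify numerically. This Python sketch (helper names hypothetical) computes the escort distribution, evaluates $D_q$ as a q-expectation, and checks $D_q = D_{\text{Tsallis}}/h_q(p)$ and $D_{\text{Tsallis}} = q\,D_\alpha$ with $\alpha = 1 - 2q$ on a made-up pair of distributions.

```python
def log_q(u, q):
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)   # q != 1 assumed


def escort(p, q):
    """q-escort distribution p_hat = p^q / h_q(p)."""
    h = sum(v ** q for v in p)
    return [v ** q / h for v in p]


def q_expectation(f, p, q):
    """E_{q,p}[f] = sum_x p_hat(x) f(x)."""
    return sum(w * fx for w, fx in zip(escort(p, q), f))


def tsallis_relative_entropy(p, r, q):
    return (1.0 - sum(a ** q * b ** (1.0 - q) for a, b in zip(p, r))) / (1.0 - q)


def alpha_divergence(p, r, alpha):
    s = sum(a ** ((1.0 - alpha) / 2.0) * b ** ((1.0 + alpha) / 2.0) for a, b in zip(p, r))
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - s)


if __name__ == "__main__":
    p, r, q = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2], 0.6
    d_q = q_expectation([log_q(a, q) - log_q(b, q) for a, b in zip(p, r)], p, q)
    h = sum(v ** q for v in p)
    print(d_q, tsallis_relative_entropy(p, r, q) / h, q * alpha_divergence(p, r, 1 - 2 * q) / h)
```

All three printed numbers coincide, illustrating that $D_q$ is the Tsallis relative entropy (or a scaled α-divergence) divided by the conformal factor $h_q(p)$.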

q-Riemannian Metric
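When $\theta_2$ is infinitesimally close to $\theta_1$, by putting $\theta_1 = \theta$, $\theta_2 = \theta + d\theta$ and using the Taylor expansion, we have
$$D_q\left[p(x, \theta) : p(x, \theta + d\theta)\right] = \frac{1}{2} \sum_{i,j} g^{(q)}_{ij}(\theta)\, d\theta^i d\theta^j,$$
where
$$g^{(q)}_{ij}(\theta) = \partial_i \partial_j \psi(\theta)$$
is a positive-definite matrix. We call $g^{(q)}_{ij}(\theta)$ the q-Fisher information matrix. When $q = 1$, this reduces to the ordinary Fisher information matrix given by
$$g^F_{ij}(\theta) = E\left[\partial_i \log p(x, \theta)\; \partial_j \log p(x, \theta)\right].$$
The positive-definite matrix $g^{(q)}_{ij}(\theta)$ defines a Riemannian metric on $S_n$, giving it the q-Riemannian structure.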

Theorem 4 The q-Fisher information metric is given by a conformal transformation of the Fisher information metric $g^F_{ij}$ as
$$g^{(q)}_{ij}(p) = \frac{q}{h_q(p)}\, g^F_{ij}(p).$$
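Proof The q-metric is derived from the Taylor expansion of $D_q[p : p + dp]$. Expanding $(p_i + dp_i)^{1-q}$ to second order and using $\sum_i dp_i = 0$, we have
$$D_q[p : p + dp] = \frac{1}{(1-q)\, h_q(p)}\left\{1 - \sum_{i=0}^{n} p_i^q \left(p_i + dp_i\right)^{1-q}\right\} = \frac{q}{2\, h_q(p)} \sum_{i=0}^{n} \frac{(dp_i)^2}{p_i}.$$
When $q = 1$, this is the Fisher information given by (43). Hence, the q-Fisher information is given by (45).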
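Theorem 4 can be checked by comparing the exact $D_q[p : p + dp]$ with the conformally scaled Fisher quadratic form $\frac{q}{2h_q(p)}\sum_i (dp_i)^2/p_i$ for a small displacement with $\sum_i dp_i = 0$. This Python sketch uses a hypothetical $p$ and displacement direction; the ratio of the two quantities should tend to 1 as the step shrinks.

```python
def q_divergence(p, r, q):
    s = sum(a ** q * b ** (1.0 - q) for a, b in zip(p, r))
    return (1.0 - s) / ((1.0 - q) * sum(a ** q for a in p))


def conformal_fisher_form(p, dp, q):
    """(q / (2 h_q(p))) * sum_i dp_i^2 / p_i, i.e. one half of g^(q)(dp, dp)."""
    h = sum(a ** q for a in p)
    return q / (2.0 * h) * sum(d * d / a for a, d in zip(p, dp))


def ratio(p, direction, q, t):
    dp = [t * d for d in direction]          # components of direction sum to zero
    r = [a + d for a, d in zip(p, dp)]
    return q_divergence(p, r, q) / conformal_fisher_form(p, dp, q)


if __name__ == "__main__":
    p, direction = [0.2, 0.5, 0.3], [1.0, -2.0, 1.0]
    for t in (1e-2, 1e-3, 1e-4):
        print(t, ratio(p, direction, 0.6, t))
```

The deviation of the ratio from 1 shrinks roughly linearly in the step size, as expected from the neglected third-order term.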
A Riemannian metric defines the length of a tangent vector $X = (X^1, \cdots, X^n)$ at $\theta$ by
$$\|X\|^2 = \sum_{i,j} g_{ij}(\theta)\, X^i X^j.$$
Similarly, for two tangent vectors $X$ and $Y$, their inner product is defined by
$$\langle X, Y \rangle = \sum_{i,j} g_{ij}(\theta)\, X^i Y^j.$$
When $\langle X, Y \rangle$ vanishes, $X$ and $Y$ are said to be orthogonal. The orthogonality, or more generally the angle, of two vectors $X$ and $Y$ does not change under a conformal transformation, although their magnitudes change.

Legendre Transformation and q-Entropy
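Given a convex function $\psi(\theta)$, the Legendre transformation is defined by
$$\eta = \nabla \psi(\theta),$$
where $\nabla = (\partial/\partial\theta^i)$ is the gradient. Since the correspondence between $\theta$ and $\eta$ is one-to-one, we may consider $\eta$ as another coordinate system of $S$. The dual potential function is defined by
$$\phi(\eta) = \max_{\theta}\left\{\theta \cdot \eta - \psi(\theta)\right\},$$
which is convex with respect to $\eta$. The original coordinates are recovered from the inverse transformation given by
$$\theta = \nabla^* \phi(\eta),$$
where $\nabla^* = (\partial/\partial\eta_i)$, so that $\theta$ and $\eta$ are in dual correspondence.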
The following theorem gives explicit relations among these quantities.

Theorem 5
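The dual coordinates $\eta$ are given by
$$\eta_i = E_{q,p}[x_i] = \int \hat{p}(x, \theta)\, x_i\, dx,$$
and the dual potential is given by
$$\phi(\eta) = \frac{1}{1-q}\left(\frac{1}{h_q(p)} - 1\right).$$
Proof The relation (52) is immediate from (29). From the Legendre duality, the dual potential satisfies
$$\psi(\theta) + \phi(\eta) - \theta \cdot \eta = 0$$
when $\theta$ and $\eta$ correspond to each other by $\eta = \nabla\psi(\theta)$. Therefore,
$$\phi(\eta) = E_{q,p}\left[\theta \cdot x - \psi(\theta)\right] = E_{q,p}\left[\log_q p(x, \theta)\right] = \frac{1}{1-q}\left(\frac{1}{h_q(p)} - 1\right).$$
This is a convex function of $\eta$.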
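A Python sketch of Theorem 5 on a hypothetical $p \in S_2$: it computes $\eta$ as escort probabilities, evaluates the closed-form $\phi$, checks the Legendre identity $\phi = \theta \cdot \eta - \psi$, and checks $\theta = \nabla^*\phi(\eta)$ by finite differences. To differentiate with respect to $\eta$, $\phi$ is recomputed from $\eta$ alone by inverting the escort map ($p \propto \hat{p}^{1/q}$); that inversion and all names are choices of this sketch.

```python
def log_q(u, q):
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)   # q != 1 assumed


def coordinates(p, q):
    """theta (Theorem 1), psi, and eta (escort probabilities of x_1..x_n)."""
    h = sum(v ** q for v in p)
    theta = [log_q(v, q) - log_q(p[0], q) for v in p[1:]]
    eta = [v ** q / h for v in p[1:]]
    return theta, -log_q(p[0], q), eta


def phi_closed_form(p, q):
    return (1.0 / sum(v ** q for v in p) - 1.0) / (1.0 - q)


def phi_of_eta(eta, q):
    """phi as a function of eta alone: invert the escort map, p proportional to p_hat^(1/q)."""
    p_hat = [1.0 - sum(eta)] + list(eta)
    w = [v ** (1.0 / q) for v in p_hat]
    z = sum(w)
    return phi_closed_form([v / z for v in w], q)


def grad_phi(eta, q, eps=1e-6):
    g = []
    for i in range(len(eta)):
        up, dn = list(eta), list(eta)
        up[i] += eps
        dn[i] -= eps
        g.append((phi_of_eta(up, q) - phi_of_eta(dn, q)) / (2 * eps))
    return g


if __name__ == "__main__":
    p, q = [0.2, 0.5, 0.3], 0.6
    theta, psi, eta = coordinates(p, q)
    print(phi_closed_form(p, q), sum(t * e for t, e in zip(theta, eta)) - psi)
    print(theta, grad_phi(eta, q))
```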
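We call the q-dual potential $\phi(\eta)$ the negative q-entropy, because it is the Legendre dual of the q-free energy $\psi(\theta)$. There are various definitions of q-entropy. The Tsallis q-entropy [3] is originally defined by
$$H_{\text{Tsallis}}(p) = \frac{1}{1-q}\left(\int p(x)^q\, dx - 1\right) = \frac{1}{1-q}\left(h_q(p) - 1\right),$$
while the Rényi q-entropy [33] is
$$H_{\text{Rényi}}(p) = \frac{1}{1-q} \log \int p(x)^q\, dx = \frac{1}{1-q} \log h_q(p).$$
They are mutually related by monotone functions, since both are functions of $h_q(p)$. When $q \to 1$, all of them reduce to the Shannon entropy.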
Our definition
$$H_q(p) = -\phi(\eta) = \frac{1}{1-q}\left(1 - \frac{1}{h_q(p)}\right)$$
is also monotonically connected with the previous ones, but is more natural from the point of view of q-geometry. The entropy $H_q$ has been known as the normalized q-entropy, which was studied in [16][34][35][36][37].
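Since all three entropies are functions of $h_q(p)$, they are simple to compare. The Python sketch below (names hypothetical) evaluates them on a made-up distribution, checks $H_q = H_{\text{Tsallis}}/h_q(p)$, and checks numerically that all three approach the Shannon entropy as $q \to 1$.

```python
import math


def h_q(p, q):
    return sum(v ** q for v in p)


def tsallis_entropy(p, q):
    return (h_q(p, q) - 1.0) / (1.0 - q)


def renyi_entropy(p, q):
    return math.log(h_q(p, q)) / (1.0 - q)


def normalized_q_entropy(p, q):
    """H_q = -phi = (1 - 1/h_q(p)) / (1 - q)."""
    return (1.0 - 1.0 / h_q(p, q)) / (1.0 - q)


def shannon_entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0.0)


if __name__ == "__main__":
    p = [0.2, 0.5, 0.3]
    for q in (0.5, 0.9, 0.999999):
        print(q, tsallis_entropy(p, q), renyi_entropy(p, q), normalized_q_entropy(p, q))
    print("Shannon:", shannon_entropy(p))
```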

q-Dually Flat Structure
There are two dually coupled coordinate systems $\theta$ and $\eta$ in the q-exponential family $S$, with two potential functions $\psi(\theta)$ and $\phi(\eta)$ for each $q$. Two affine structures are introduced by the two convex functions $\psi$ and $\phi$; see [8] for the information geometry of dually flat spaces. Although $S$ is a Riemannian manifold with the q-Fisher information matrix (45) as its metric, we may nevertheless regard $S$ as an affine manifold in which $\theta$ is an affine coordinate system. These coordinates represent intensive quantities of a physical system. Dually, we introduce a dual affine structure on $S$, in which $\eta$ is another affine coordinate system; these coordinates represent extensive quantities. We can thus define two types of straight lines, or geodesics, in $S$ due to the q-affine structures.
For two distributions $p(x, \theta_1)$ and $p(x, \theta_2)$ in $S$, a curve $p(x, \theta(t))$ is said to be a q-geodesic connecting them when
$$\theta(t) = (1-t)\,\theta_1 + t\,\theta_2,$$
where $t$ is the parameter of the curve. Dually, in terms of the dual coordinates $\eta$, when
$$\eta(t) = (1-t)\,\eta_1 + t\,\eta_2$$
holds, the curve is said to be a dual q-geodesic. More generally, the q-geodesic connecting two distributions $p_1(x)$ and $p_2(x)$ is given by
$$\log_q p(x, t) = (1-t) \log_q p_1(x) + t \log_q p_2(x) - c(t),$$
where $c(t)$ is a normalizing term. This is rewritten as
$$p(x, t) = \exp_q\left\{(1-t) \log_q p_1(x) + t \log_q p_2(x) - c(t)\right\}.$$
Dually, the dual q-geodesic connecting $p_1(x)$ and $p_2(x)$ is given by using the escort distributions as
$$\hat{p}(x, t) = (1-t)\,\hat{p}_1(x) + t\,\hat{p}_2(x).$$
Since the manifold $S$ has a q-Riemannian structure, the orthogonality of two tangent vectors is defined by the Riemannian metric. We rewrite the orthogonality of two geodesics in terms of the affine coordinates. Let us consider two small deviations $d_1 p(x)$ and $d_2 p(x)$ of $p(x)$, that is, from $p(x)$ to $p(x) + d_1 p(x)$ and $p(x) + d_2 p(x)$, which are regarded as two (infinitesimal) tangent vectors of $S$ at $p(x)$.
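Both kinds of geodesics can be traced numerically on $S_2$. In the Python sketch below (helper names hypothetical, $0 < q < 1$ and the cut-off for $\exp_q$ assumed), the q-geodesic is built from $\log_q$ mixtures with $c(t)$ fixed by bisection, and the dual q-geodesic from escort mixtures inverted via $p \propto \hat{p}^{1/q}$. The accompanying checks confirm that $\theta(t)$, respectively $\eta(t)$, is affine in $t$.

```python
def log_q(u, q):
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(u, q):
    base = 1.0 + (1.0 - q) * u
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0  # cut-off, 0 < q < 1


def escort(p, q):
    h = sum(v ** q for v in p)
    return [v ** q / h for v in p]


def q_geodesic(p1, p2, t, q, iters=200):
    """exp_q{(1-t) log_q p1 + t log_q p2 - c(t)}, with c(t) chosen by bisection."""
    m = [(1 - t) * log_q(a, q) + t * log_q(b, q) for a, b in zip(p1, p2)]
    lo, hi = max(m) - 1.0, max(m) + 1.0 / (1.0 - q)
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if sum(exp_q(v - c, q) for v in m) > 1.0:
            lo = c
        else:
            hi = c
    p = [exp_q(v - lo, q) for v in m]
    total = sum(p)
    return [v / total for v in p]


def dual_q_geodesic(p1, p2, t, q):
    """Mix the escort distributions linearly, then invert the escort map."""
    mix = [(1 - t) * a + t * b for a, b in zip(escort(p1, q), escort(p2, q))]
    w = [v ** (1.0 / q) for v in mix]
    total = sum(w)
    return [v / total for v in w]


def theta(p, q):
    return [log_q(v, q) - log_q(p[0], q) for v in p[1:]]


if __name__ == "__main__":
    p1, p2, q = [0.2, 0.5, 0.3], [0.6, 0.1, 0.3], 0.6
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, q_geodesic(p1, p2, t, q), dual_q_geodesic(p1, p2, t, q))
```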

Lemma 1
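The inner product of two deviations $d_1 p$ and $d_2 p$ is given by
$$\langle d_1 p, d_2 p \rangle = \int d_1 \log_q p(x)\; d_2 \hat{p}(x)\, dx.$$
Proof By simple calculations, we have
$$\int d_1 \log_q p(x)\; d_2 \hat{p}(x)\, dx = \int \left(d\theta_1 \cdot x - d_1\psi\right) d_2 \hat{p}(x)\, dx = d\theta_1 \cdot d\eta_2 = \sum_{i,j} g_{ij}\, d\theta_1^i\, d\theta_2^j,$$
of which the right-hand side is the Riemannian inner product in the form of (46).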
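In particular, a q-geodesic $\theta_1(t)$ and a dual q-geodesic $\eta_2(t)$ passing through $p(x)$ are orthogonal there when
$$\dot{\theta}_1(t) \cdot \dot{\eta}_2(t) = \sum_i \dot{\theta}_1^i(t)\, \dot{\eta}_{2i}(t) = 0.$$
Here, $\dot{\theta}_1(t)$ and $\dot{\eta}_2(t)$ denote the derivatives of $\theta_1(t)$ and $\eta_2(t)$ with respect to $t$. The two geodesics and the orthogonality play a fundamental role in $S$, as will be seen in the following.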
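Lemma 1 and Theorem 4 can be cross-checked with exact first-order differentials on $S_2$. This Python sketch (hypothetical names and inputs) evaluates $\sum_x d_1\log_q p\; d_2\hat{p}$ with $d\log_q p = p^{-q}dp$ and the differential of the escort map, and compares it with the pairing $d\theta_1 \cdot d\eta_2$ and with the conformal Fisher form $\frac{q}{h_q(p)}\sum_i d_1p_i\,d_2p_i/p_i$.

```python
def d_escort(p, dp, q):
    """First-order change of p_hat_i = p_i^q / h_q(p) along dp."""
    h = sum(v ** q for v in p)
    dh = sum(q * v ** (q - 1.0) * d for v, d in zip(p, dp))
    return [(q * v ** (q - 1.0) * d * h - v ** q * dh) / (h * h) for v, d in zip(p, dp)]


def lemma_inner_product(p, d1, d2, q):
    """sum_x d1 log_q p(x) * d2 p_hat(x), using d log_q p = p^(-q) dp."""
    return sum(v ** (-q) * a * b for v, a, b in zip(p, d1, d_escort(p, d2, q)))


def theta_eta_pairing(p, d1, d2, q):
    """d theta_1 . d eta_2 in the coordinates of Theorems 1 and 5."""
    dtheta = [v ** (-q) * a - p[0] ** (-q) * d1[0] for v, a in zip(p[1:], d1[1:])]
    deta = d_escort(p, d2, q)[1:]
    return sum(x * y for x, y in zip(dtheta, deta))


def conformal_fisher(p, d1, d2, q):
    h = sum(v ** q for v in p)
    return q / h * sum(a * b / v for v, a, b in zip(p, d1, d2))


if __name__ == "__main__":
    p, q = [0.2, 0.5, 0.3], 0.6
    d1, d2 = [0.01, -0.03, 0.02], [-0.02, 0.01, 0.01]   # tangent vectors: components sum to 0
    print(lemma_inner_product(p, d1, d2, q), theta_eta_pairing(p, d1, d2, q),
          conformal_fisher(p, d1, d2, q))
```

The three values agree, and swapping $d_1$ and $d_2$ leaves the first unchanged, as a Riemannian inner product must.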

q-Pythagorean and q-Max-Ent Theorems
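A dually flat Riemannian manifold admits the generalized Pythagorean theorem and the related projection theorem [8]. We state them in our case.

q-Pythagorean Theorem. For three distributions $p_1(x)$, $p_2(x)$ and $p_3(x)$ in $S$, it holds that
$$D_q[p_1(x) : p_3(x)] = D_q[p_1(x) : p_2(x)] + D_q[p_2(x) : p_3(x)]$$
when the dual geodesic connecting $p_1(x)$ and $p_2(x)$ is orthogonal at $p_2(x)$ to the geodesic connecting $p_2(x)$ and $p_3(x)$ (see Figure 1).

q-Projection Theorem. Let $M^*$ be a submanifold of $S$ that is flat with respect to the dual affine connection, and let $p(x) \in S$. The point $\tilde{p}(x) \in M^*$ that minimizes $D_q[r(x) : p(x)]$ over $r(x) \in M^*$ is unique, and it is the q-projection of $p(x)$ to $M^*$; that is, the q-geodesic connecting $p(x)$ and $\tilde{p}(x)$ is orthogonal to $M^*$.

We show that the well-known q-max-ent theorem in the case of Tsallis q-entropy [1,4,9,11] is a direct consequence of the above q-Pythagorean and q-projection theorems.

q-Max-Ent Theorem. Probability distributions maximizing the q-entropies $H_{\text{Tsallis}}$, $H_{\text{Rényi}}$ and $H_q$ under the q-linear constraints
$$E_{q,p}[c_k(x)] = \int \hat{p}(x)\, c_k(x)\, dx = a_k, \qquad k = 1, \cdots, m,$$
for $m$ random variables $c_k(x)$ and various values of $a_k$ form a q-exponential family
$$p(x, \theta) = \exp_q\left\{\sum_{k=1}^{m} \theta^k c_k(x) - \psi(\theta)\right\}.$$
The proof is easily obtained by the standard analytical method. Here, we give a geometrical proof. Let us consider the subspace $M^* \subset S$ whose members $p(x)$ satisfy the $m$ constraints $E_{q,p}[c_k(x)] = a_k$, $k = 1, \cdots, m$. Since the constraints are linear in the dual affine coordinates $\eta$ or $\hat{p}(x)$, $M^*$ is a linear subspace of $S$ with respect to the dual affine connection. Let $p_0(x, \theta_0)$ be the uniform distribution defined by $\theta_0 = 0$, which implies $p_0(x, \theta_0) = \text{const}$ from (6). Let $\tilde{p}(x) \in M^*$ be the q-projection of $p_0(x)$ to $M^*$ (Figure 3).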
Let $\eta_p$ be the dual coordinates of $p(x)$. Since the divergence is written as
$$D_q[p : p_0] = \psi(\theta_0) + \phi(\eta_p) - \theta_0 \cdot \eta_p = \phi(\eta_p) + \psi(0),$$
the minimizer of $D_q[p : p_0]$ among $p(x) \in M^*$ is just $\tilde{p}(x)$, which is also the maximizer of the entropy $-\phi(\eta_p)$. The trajectories of $\tilde{p}(x)$ for various values of $a_k$ form a flat subspace orthogonal to $M^*$, implying that they form a q-exponential family of the form (6) (see Figure 3). The tangent directions $d\hat{p}(x)$ of $M^*$ satisfy
$$\int d\hat{p}(x)\, c_k(x)\, dx = 0, \qquad k = 1, \cdots, m.$$
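Hence, by Lemma 1, a q-exponential family of the form
$$p(x, \xi) = \exp_q\left\{\sum_{i=1}^{m} \xi^i d_i(x) - \psi(\xi)\right\}$$
is orthogonal to $M^*$ when
$$\int d\hat{p}(x)\; d\log_q p(x, \xi)\, dx = 0$$
for all tangent directions $d\hat{p}(x)$ of $M^*$. This implies that we may take $d_i(x) = c_i(x)$. Hence, we have the q-exponential family (72) that maximizes the q-entropies.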
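The two theorems of this section can be illustrated on $S_2$ with a Python sketch; all names, the distributions, the constraint function $c = (0, 1, 2)$ and the target $a = 0.8$ are hypothetical, and $0 < q < 1$ with the cut-off for $\exp_q$ is assumed. Part (a) builds an orthogonal configuration by moving from $p_2$ in a dual direction whose pairing with $\theta(p_3) - \theta(p_2)$ vanishes and checks the Pythagorean relation. Part (b) finds the member of the family $\exp_q\{t\,c(x) - \psi(t)\}$ that satisfies the escort constraint and compares its $H_q$ with other distributions satisfying the same constraint.

```python
def log_q(u, q):
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(u, q):
    base = 1.0 + (1.0 - q) * u
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0  # cut-off, 0 < q < 1


def escort(p, q):
    h = sum(v ** q for v in p)
    return [v ** q / h for v in p]


def from_escort(p_hat, q):
    w = [v ** (1.0 / q) for v in p_hat]
    total = sum(w)
    return [v / total for v in w]


def theta(p, q):
    return [log_q(v, q) - log_q(p[0], q) for v in p[1:]]


def d_q(p, r, q):
    s = sum(a ** q * b ** (1.0 - q) for a, b in zip(p, r))
    return (1.0 - s) / ((1.0 - q) * sum(a ** q for a in p))


def h_q_entropy(p, q):
    return (1.0 - 1.0 / sum(v ** q for v in p)) / (1.0 - q)


q = 0.6

# (a) q-Pythagorean theorem in an orthogonal configuration.
p2, p3 = [0.3, 0.3, 0.4], [0.5, 0.2, 0.3]
d = [b - a for a, b in zip(theta(p2, q), theta(p3, q))]
eta2 = escort(p2, q)[1:]
eta1 = [eta2[0] - 0.05 * d[1], eta2[1] + 0.05 * d[0]]   # (eta2 - eta1) . d = 0
p1 = from_escort([1.0 - sum(eta1)] + eta1, q)
lhs, rhs = d_q(p1, p3, q), d_q(p1, p2, q) + d_q(p2, p3, q)

# (b) q-max-ent: maximize H_q subject to E_{q,p}[c] = a.
c, a = [0.0, 1.0, 2.0], 0.8


def family(t):
    """Member exp_q{t c(x) - psi(t)} of the q-exponential family, psi by bisection."""
    lo, hi = max(t * v for v in c) - 1.0, max(t * v for v in c) + 1.0 / (1.0 - q)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(exp_q(t * v - mid, q) for v in c) > 1.0:
            lo = mid
        else:
            hi = mid
    p = [exp_q(t * v - lo, q) for v in c]
    total = sum(p)
    return [v / total for v in p]


def escort_mean(p):
    return sum(w * v for w, v in zip(escort(p, q), c))


t_lo, t_hi = -5.0, 5.0          # escort mean increases with t because psi is convex
for _ in range(100):
    t_mid = 0.5 * (t_lo + t_hi)
    if escort_mean(family(t_mid)) < a:
        t_lo = t_mid
    else:
        t_hi = t_mid
p_star = family(0.5 * (t_lo + t_hi))

# Competitors: distributions whose escort satisfies the same linear constraint.
others = [from_escort([1.0 - a + s, a - 2.0 * s, s], q) for s in (k / 100 for k in range(1, 40))]

if __name__ == "__main__":
    print("Pythagorean:", lhs, rhs)
    print("max-ent:", p_star, h_q_entropy(p_star, q), max(h_q_entropy(p, q) for p in others))
```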

q-Bayesian MAP Estimator
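Given $N$ iid observations $x_1, \cdots, x_N$ from $p(x, \xi)$, the q-log-likelihood is
$$\log_q p(x_1, \cdots, x_N, \xi), \qquad p(x_1, \cdots, x_N, \xi) = \prod_{t=1}^{N} p(x_t, \xi).$$
Since $\log_q u$ is a monotonically increasing function, the maximizer of the q-likelihood is the same as the ordinary maximum likelihood estimator (mle). However, the maximizer of the q-escort distribution of the observations,
$$\hat{p}(x_1, \cdots, x_N, \xi) = \frac{p(x_1, \cdots, x_N, \xi)^q}{h_q(\xi)^N}, \qquad h_q(\xi) = \int p(x, \xi)^q\, dx,$$
which maximizes the q-escort log-likelihood $\log_q \hat{p}(x_1, \cdots, x_N, \xi)$, is different from this. We show that the q-escort mle is a Bayesian MAP (maximum a posteriori probability) estimator, which clarifies the meaning of the q-escort mle. The q-escort mle is the maximizer of the q-escort distribution,
$$\hat{\xi}_q = \arg\max_{\xi}\, \hat{p}(x_1, \cdots, x_N, \xi).$$

Theorem 6 The q-escort mle $\hat{\xi}_q$ is the Bayesian MAP estimator with the prior distribution
$$\pi(\xi) = h_q(\xi)^{-N/q}.$$

Proof The Bayesian MAP is the maximizer of the posterior distribution
$$p(\xi \mid x_1, \cdots, x_N) = \frac{\pi(\xi)\, p(x_1, \cdots, x_N, \xi)}{p(x_1, \cdots, x_N)}$$
with prior $\pi(\xi)$, which also maximizes
$$\left\{\pi(\xi)\, p(x_1, \cdots, x_N, \xi)\right\}^q, \qquad q > 0. \tag{85}$$
On the other hand, the q-escort mle is the maximizer of
$$\hat{p}(x_1, \cdots, x_N, \xi) = \frac{p(x_1, \cdots, x_N, \xi)^q}{h_q(\xi)^N}. \tag{86}$$
Hence, when
$$\pi(\xi) = h_q(\xi)^{-N/q}, \tag{87}$$
the two estimators are identical.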
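Since the prior $\pi(\xi) = h_q(\xi)^{-N/q}$ decreases with $h_q$, while $H_q = (1 - 1/h_q)/(1-q)$ is a monotone function of $h_q$, the theorem shows that the Bayesian prior has a peak at the maximizer of our q-entropy $H_q$ when $q > 1$; for $0 < q < 1$, where $H_q$ increases with $h_q$, the peak is instead at the minimizer of $H_q$.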
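A Python sketch of Theorem 6 for a hypothetical Bernoulli model $p(x, \xi) = \xi^x (1-\xi)^{1-x}$, $x \in \{0, 1\}$, for which $h_q(\xi) = \xi^q + (1-\xi)^q$. With made-up data ($k$ ones in $N$ trials), a grid search compares the ordinary mle, the q-escort mle and the MAP estimator under the prior $\pi(\xi) = h_q(\xi)^{-N/q}$. The model, data and grid are assumptions of the sketch.

```python
import math

q, N, k = 0.6, 10, 8            # hypothetical data: 8 ones in 10 Bernoulli trials


def log_likelihood(xi):
    return k * math.log(xi) + (N - k) * math.log(1.0 - xi)


def h_q(xi):
    return xi ** q + (1.0 - xi) ** q


def log_escort_joint(xi):
    """log of p(x_1..x_N, xi)^q / h_q(xi)^N (escort of the iid joint distribution)."""
    return q * log_likelihood(xi) - N * math.log(h_q(xi))


def log_posterior(xi):
    """Unnormalized log posterior with prior pi(xi) = h_q(xi)^(-N/q)."""
    return -(N / q) * math.log(h_q(xi)) + log_likelihood(xi)


grid = [j / 10000 for j in range(1, 10000)]
xi_mle = max(grid, key=log_likelihood)
xi_escort = max(grid, key=log_escort_joint)
xi_map = max(grid, key=log_posterior)

if __name__ == "__main__":
    print(f"mle={xi_mle:.4f}  q-escort mle={xi_escort:.4f}  MAP={xi_map:.4f}")
```

Because $\log \hat{p} = q \log\{\pi(\xi)\, p\}$ here, the escort and MAP estimates coincide on the grid. With $0 < q < 1$, $h_q(\xi)$ is largest at $\xi = 1/2$, so this prior pulls the estimate away from $1/2$, and the q-escort estimate lies above the ordinary mle $k/N = 0.8$.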

Conclusions
Much attention has recently been paid to probability distributions subject to the power law instead of the exponential law, since Tsallis proposed the q-entropy and related theories. The power law is also found in various communication networks, and it is now an active topic of research.
However, while a geometrical foundation for the ordinary family of probability distributions is given by information geometry [8], no such foundation has been available for the q-family. The present paper has tried to give a geometrical foundation to the q-family of probability distributions. We introduced a new notion of q-geometry. The q-structure is ubiquitous in the sense that the family of all discrete probability distributions (and the family of all continuous probability distributions, if we neglect delicate problems involved in the infinite dimensionality) belongs to the q-exponential family of distributions for any $q$. That is, we can introduce the q-geometrical structure to an arbitrary family of probability distributions, because any parametrized family of probability distributions forms a submanifold embedded in the entire manifold.
The q-structure consists of a Riemannian metric together with a pair of dually coupled affine connections, which fits in the framework of standard information geometry. However, the q-structure is essentially different from the standard one derived from the invariance criterion on the manifold of probability distributions. We have taken a new look at the theory related to the q-entropy from the viewpoint of conformal transformation. This leads us to unified definitions of various quantities such as the q-entropy, q-divergence, q-potential function and their duals, as well as to new interpretations of known quantities. This is a geometrical foundation, and we expect that the paper will contribute to further developments in this field.