Abstract
Based on the maximum entropy (MaxEnt) principle for a generalized entropy functional and the conjugate representations introduced by Zhang, we have reformulated the method of information geometry. For a set of conjugate representations, the associated escort expectation is naturally introduced and characterized by the generalized score function which has zero-escort expectation. Furthermore, we show that the escort expectation induces a conformal divergence.
1. Introduction
Information geometry (IG) [,] is a differential-geometric method based on a Riemannian metric on a statistical manifold, which is constructed from a given parameterized probability distribution function (pdf) $p_\theta(x)$. It provides a useful tool to study, for example, the dually flat structures of a statistical manifold. Recently, much effort has been devoted to deformed exponential families of pdfs, in which the standard exponential function and its inverse are replaced with a deformed exponential function and its inverse, which is called a deformed logarithmic function. Among the different deformed exponential functions, relatively well-known ones include Tsallis’ $q$-deformed exponential [] and Kaniadakis’ $\kappa$-deformed exponential functions []. Naudts [] introduced the so-called $\phi$-logarithmic function in terms of a positive increasing function $\phi$, and studied the generalized thermostatistics. It has been shown that a $q$-deformed relative entropy is proportional to Amari’s $\alpha$-divergence and is related to the $\alpha$-geometry on the statistical manifold with a constant curvature []. In order to construct a suitable statistical manifold in IG, the $\alpha$-representation (rep.), or $\alpha$-immersion, of a pdf is usually used. It is well known that the $\alpha$-rep. works well for an exponential family but does not necessarily work well for a non-exponential pdf, e.g., a -deformed exponential family []. A generalization of the $\alpha$-rep. is given by the conjugate representations, or $(\rho, \tau)$-reps., of Zhang []. He also introduced the $(\rho, \tau)$-divergences from the point of view of “representation duality”. By finding a suitable conjugate rep. for the $\kappa$-deformed exponential pdf, the IG of the $\kappa$-generalized thermostatistics [] was studied. We further studied the IG structures among the thermodynamic potentials in the $\kappa$-thermostatistics [], in which the escort pdfs and escort expectations play an important role. Zhang [] further showed that his conjugate reps. also include Naudts’ $\phi$-logarithm [] as a special case.
In this way, Zhang’s conjugate reps. are very useful as a generalization of the $\alpha$-reps. in IG. Amari [] showed that $(\rho, \tau)$-divergences generate a dually flat structure in the manifold of positive measures and in that of positive-definite matrices. In this contribution, we reformulate the IG structures based on Zhang’s conjugate reps. and the maximum entropy (MaxEnt) principle for a generalized entropy functional. Our approach differs from the previous works [,] in that we relate the $\rho$-rep. of a pdf to the Lagrange multipliers in the MaxEnt problem. This enables us to introduce a generalized score function and to characterize the escort expectations.
The rest of the paper is organized as follows. The next section reviews the basics of IG for the exponential families of pdfs. In Section 3, after a brief review of the conjugate reps. introduced by Zhang [], we reformulate the IG structures based on the conjugate reps. and discuss the MaxEnt principle for a generalized entropy. For a set of conjugate reps., the associated generalized score function is introduced. The escort rep. and escort expectation are then naturally induced. Section 4 relates the conformal divergence to the difference of the entropies in terms of the escort expectations. The final section is devoted to our concluding remarks. Throughout the paper, we use the abbreviations $\partial_i$ for $\partial/\partial\theta^i$, and $\partial^i$ for $\partial/\partial\eta_i$.
2. Preliminaries
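Information geometry [,] provides us a useful tool for studying a family
$$\mathcal{S} = \Big\{ p_\theta(x) \ \Big|\ p_\theta(x) > 0,\ \int dx\, p_\theta(x) = 1 \Big\} \tag{1}$$
of probability distribution functions (pdfs) $p_\theta(x)$ characterized by a set of real parameters $\theta = (\theta^1, \dots, \theta^M)$. $\mathcal{S}$ is called an ($M$-dimensional) statistical model, and each pdf of $\mathcal{S}$ can be regarded as a point in a differentiable manifold $\mathcal{M}$ with local coordinates $\theta$. $\mathcal{M}$ is called a statistical manifold, and a Riemannian metric on $\mathcal{M}$ is provided by the Fisher information matrix [,]
$$g_{ij}(\theta) := E_p\left[\partial_i \ln p_\theta(x)\, \partial_j \ln p_\theta(x)\right], \quad i, j = 1, \dots, M, \tag{2}$$
where $\partial_i := \partial/\partial\theta^i$. In this contribution, we assume that $g_{ij}$ is positive definite, and $E_p[\cdot]$ stands for the linear expectation with respect to the pdf $p_\theta(x)$.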
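A manifold is said to be e-flat (exponential-flat) if a coordinate system $\theta$ satisfies
$$E_p\left[\partial_i \partial_j \ln p_\theta(x)\, \partial_k \ln p_\theta(x)\right] = 0 \tag{3}$$
identically. Any set of coordinates $\theta$ satisfying (3) is called e-affine coordinates. A well-known example of e-flat manifolds is the exponential family
$$p_\theta(x) = \exp\Big[\sum_{i=1}^{M} \theta^i F_i(x) - \Psi(\theta)\Big], \tag{4}$$
where each $F_i(x)$ is a given function of a random variable $x$ and $\Psi(\theta)$ fixes the normalization of the pdf $p_\theta(x)$. From the normalization of the pdf $p_\theta(x)$, we see that
$$E_p\left[\partial_i \ln p_\theta(x)\right] = \int dx\, \partial_i p_\theta(x) = 0, \tag{5}$$
and, for the exponential family, we have
$$\partial_i \partial_j \ln p_\theta(x) = -\partial_i \partial_j \Psi(\theta), \tag{6}$$
which does not depend on $x$. Hence, by Equation (5), the condition (3) is satisfied and one confirms that the exponential family is e-flat. In addition, for the exponential family, we have
$$\partial_i \ln p_\theta(x) = F_i(x) - \partial_i \Psi(\theta). \tag{7}$$
Taking the expectation of both sides and using Equation (5), we see that the m-affine coordinates ($\eta$-coordinates) of the exponential family are given by
$$\eta_i := E_p\left[F_i(x)\right] = \partial_i \Psi(\theta). \tag{8}$$
Accounting for Equations (7) and (8), from definition (2), we obtain
$$g_{ij}(\theta) = E_p\left[\big(F_i(x) - \eta_i\big)\big(F_j(x) - \eta_j\big)\right], \tag{9}$$
which is the covariance matrix for the statistical model $\mathcal{S}$.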
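The identification of the Fisher metric with a covariance matrix can be checked numerically. The following is a minimal sketch, assuming a hypothetical one-parameter exponential family on the finite sample space {0, 1, 2} with $F(x) = x$; the model, the function names, and the finite-difference step are illustrative choices and do not come from the text.

```python
import math

# Hypothetical toy model (not from the text): sample space {0, 1, 2},
# a single parameter theta, and sufficient statistic F(x) = x.
XS = (0, 1, 2)

def F(x):
    return float(x)

def psi(theta):
    """Log-normalizer: Psi(theta) = ln sum_x exp(theta * F(x))."""
    return math.log(sum(math.exp(theta * F(x)) for x in XS))

def pmf(theta):
    """p_theta(x) = exp(theta * F(x) - Psi(theta)) for each x in XS."""
    z = psi(theta)
    return [math.exp(theta * F(x) - z) for x in XS]

def fisher_from_scores(theta, h=1e-5):
    """E_p[(d/dtheta ln p)^2], with the score taken by central differences."""
    p = pmf(theta)
    p_plus, p_minus = pmf(theta + h), pmf(theta - h)
    scores = [(math.log(a) - math.log(b)) / (2.0 * h)
              for a, b in zip(p_plus, p_minus)]
    return sum(pi * s * s for pi, s in zip(p, scores))

def variance_of_F(theta):
    """Variance of F(x) under p_theta, i.e. the 1x1 covariance matrix."""
    p = pmf(theta)
    mean = sum(pi * F(x) for pi, x in zip(p, XS))
    return sum(pi * (F(x) - mean) ** 2 for pi, x in zip(p, XS))

theta0 = 0.7
g_score = fisher_from_scores(theta0)
g_cov = variance_of_F(theta0)
print(g_score, g_cov)  # the two numbers should agree up to discretization error
```

For a single parameter the metric is a scalar, so the comparison reduces to the variance of $F$; a multi-parameter family would compare full matrices in the same way.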
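A manifold is said to be m-flat (mixture-flat) if a coordinate system $\eta$ satisfies
$$\int dx\, \partial^i \partial^j p_\eta(x)\, \partial^k \ln p_\eta(x) = 0, \qquad \partial^i := \partial/\partial\eta_i,$$
identically. In this case, the set of coordinates $\eta$ is called m-affine coordinates.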
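In a dually flat structure, the $\theta$- and $\eta$-coordinates are related by the Legendre transformation
$$\eta_i = \partial_i \Psi(\theta), \qquad \theta^i = \partial^i \Phi(\eta), \qquad \Psi(\theta) + \Phi(\eta) - \sum_{i=1}^{M} \theta^i \eta_i = 0,$$
where $\Psi(\theta)$ and $\Phi(\eta)$ are Legendre–Fenchel dual to each other and are called the $\theta$- and $\eta$-potential functions, respectively. The canonical divergence function [] for a set of two pdfs $p$ and $r$ can be defined by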
which is a Bregman divergence with the convex function .
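As an illustration of a Bregman divergence built from a convex potential, the sketch below evaluates the Bregman divergence of the log-normalizer $\Psi$ for a hypothetical one-parameter exponential family on {0, 1, 2} with $F(x) = x$, and compares it with a Kullback–Leibler divergence. The toy model, the names, and the argument order used in `bregman_psi` are assumptions made for this example; the argument order of the canonical divergence in the text may differ.

```python
import math

XS = (0, 1, 2)  # hypothetical finite sample space with F(x) = x

def psi(theta):
    """Convex potential: Psi(theta) = ln sum_x exp(theta * x)."""
    return math.log(sum(math.exp(theta * x) for x in XS))

def pmf(theta):
    z = psi(theta)
    return [math.exp(theta * x - z) for x in XS]

def eta(theta):
    """Expectation coordinate eta = E_p[F], which equals dPsi/dtheta."""
    return sum(p * x for p, x in zip(pmf(theta), XS))

def bregman_psi(theta_a, theta_b):
    """Psi(a) - Psi(b) - (a - b) * Psi'(b), using Psi'(b) = eta(b)."""
    return psi(theta_a) - psi(theta_b) - (theta_a - theta_b) * eta(theta_b)

def kl(p, r):
    """Kullback-Leibler divergence sum_x p(x) ln(p(x) / r(x))."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))

theta_a, theta_b = 0.4, -1.1
d_bregman = bregman_psi(theta_a, theta_b)
d_kl = kl(pmf(theta_b), pmf(theta_a))  # note the swapped argument order
print(d_bregman, d_kl)
```

With this convention the Bregman divergence of $\Psi$ from $\theta_b$ to $\theta_a$ coincides with the Kullback–Leibler divergence from $p_{\theta_b}$ to $p_{\theta_a}$, which is why the arguments appear swapped in the last call.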
For a dually flat manifold $\mathcal{M}$, the Pythagorean relation is generalized in terms of the divergence. Let $p$, $r$, and $s$ be three probability distributions in $\mathcal{M}$. When the e-geodesic connecting $p$ and $r$ is orthogonal at $r$ to the m-geodesic connecting $r$ and $s$, the following generalized Pythagorean relation [] holds.
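As is well known, maximizing the Boltzmann–Gibbs–Shannon (BGS) entropy
$$S^{\rm BGS}[p] := -\int dx\, p(x) \ln p(x)$$
under the $M$ constraints
$$E_p\left[F_i(x)\right] = \eta_i, \quad i = 1, \dots, M,$$
for a given set of $\{F_i(x)\}$ and the normalization $\int dx\, p(x) = 1$, leads to the optimized pdf belonging to the exponential family $p_\theta(x)$. The Lagrange multipliers $\theta^i$ are the control parameters for the above $M$ constraints. From the normalization of an exponential pdf, we readily obtain the $\theta$-potential function as
$$\Psi(\theta) = \ln \int dx\, \exp\Big[\sum_{i=1}^{M} \theta^i F_i(x)\Big].$$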
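We note that, in addition to Equation (2), the Fisher metric can be written equivalently in other forms, for example
$$g_{ij}(\theta) = \int dx\, \partial_i \ln p_\theta(x)\, \partial_j p_\theta(x), \tag{19}$$
$$g_{ij}(\theta) = -E_p\left[\partial_i \partial_j \ln p_\theta(x)\right]. \tag{20}$$
In particular, combining Equation (6) with (20), we readily confirm the important relation
$$g_{ij}(\theta) = \partial_i \partial_j \Psi(\theta),$$
that is, the Fisher metric coincides with the Hessian matrix of the $\theta$-potential function $\Psi(\theta)$. It is known that an exponential family naturally has the dualistic Hessian structures and that their canonical divergences coincide with the Kullback–Leibler divergences. Furthermore, using Equation (8), the Fisher matrix can also be rewritten as
$$g_{ij}(\theta) = \partial_i \eta_j,$$
which holds for the exponential family $p_\theta(x)$.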
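The Hessian character of the Fisher metric for an exponential family can be probed with finite differences. The sketch below again assumes a hypothetical one-parameter family on {0, 1, 2} with $F(x) = x$ (an illustrative model, not one taken from the text) and compares the second derivative of the log-normalizer, the derivative of the expectation coordinate, and the variance of $F$.

```python
import math

XS = (0, 1, 2)  # hypothetical finite sample space with F(x) = x

def psi(theta):
    """theta-potential (log-normalizer) of the toy exponential family."""
    return math.log(sum(math.exp(theta * x) for x in XS))

def eta(theta):
    """Expectation coordinate eta = E_p[F]."""
    z = psi(theta)
    return sum(x * math.exp(theta * x - z) for x in XS)

def variance(theta):
    """Variance of F under p_theta, i.e. the scalar Fisher metric."""
    z = psi(theta)
    m = eta(theta)
    return sum((x - m) ** 2 * math.exp(theta * x - z) for x in XS)

def second_difference(func, t, h=1e-4):
    """Central finite-difference estimate of the second derivative."""
    return (func(t + h) - 2.0 * func(t) + func(t - h)) / (h * h)

def central_difference(func, t, h=1e-5):
    """Central finite-difference estimate of the first derivative."""
    return (func(t + h) - func(t - h)) / (2.0 * h)

theta0 = -0.3
hessian_psi = second_difference(psi, theta0)  # d^2 Psi / d theta^2
d_eta = central_difference(eta, theta0)       # d eta / d theta
g = variance(theta0)
print(hessian_psi, d_eta, g)  # three estimates of the same quantity
```

The step sizes are a compromise between truncation and round-off error, so agreement is expected only to several digits.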
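In general, the dual affine connections are induced from the metric. By applying $\partial_k$ to Equation (19) for $g_{ij}$, we see that the following relation holds
$$\partial_k g_{ij} = \Gamma^{(e)}_{ki,j} + \Gamma^{(m)}_{kj,i},$$
where the Christoffel symbol of the first kind for the e-affine connection and that for the m-affine connection are defined by
$$\Gamma^{(e)}_{ij,k} := E_p\left[\partial_i \partial_j \ln p_\theta(x)\, \partial_k \ln p_\theta(x)\right], \qquad \Gamma^{(m)}_{ij,k} := \int dx\, \partial_i \partial_j p_\theta(x)\, \partial_k \ln p_\theta(x),$$
respectively. In addition, we can introduce a cubic form
$$C_{ijk} := E_p\left[\partial_i \ln p_\theta(x)\, \partial_j \ln p_\theta(x)\, \partial_k \ln p_\theta(x)\right] = \Gamma^{(m)}_{ij,k} - \Gamma^{(e)}_{ij,k},$$
which characterizes the difference between the affine connection $\Gamma^{(e)}$ (or $\Gamma^{(m)}$) and the Levi–Civita connection $\Gamma^{(0)}$ through the relations
$$\Gamma^{(e)}_{ij,k} = \Gamma^{(0)}_{ij,k} - \frac{1}{2} C_{ijk}, \qquad \Gamma^{(m)}_{ij,k} = \Gamma^{(0)}_{ij,k} + \frac{1}{2} C_{ijk}.$$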
3. Conjugate Representations
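Here, we briefly review Zhang’s conjugate representations []. For a parameterized probability density function (pdf) $p_\theta(x)$ with a set of real parameters $\theta$, information geometry was founded by Amari [], based on his $\alpha$-representations (reps.) defined by
$$\ell^{(\alpha)}(p) := \begin{cases} \dfrac{2}{1-\alpha}\, p^{\frac{1-\alpha}{2}}, & \alpha \neq 1, \\ \ln p, & \alpha = 1, \end{cases}$$
for a real parameter $\alpha$, and on the $\alpha$-divergence
$$D^{(\alpha)}[p \,\|\, r] := \frac{4}{1-\alpha^2} \Big\{ 1 - \int dx\, p(x)^{\frac{1-\alpha}{2}}\, r(x)^{\frac{1+\alpha}{2}} \Big\}, \quad \alpha \neq \pm 1. \tag{31}$$
As a generalization of the $\alpha$-reps., Zhang [] introduced the conjugate representations as follows.
Definition 1.
A ρ-representation of a real positive number ξ is a mapping $\xi \mapsto \rho(\xi)$, where $\rho$ is a strictly monotone function. For a smooth and strictly convex function $f$, a τ-representation $\xi \mapsto \tau(\xi)$ is said to be conjugate to the ρ-representation with respect to $f$ if the following relations are satisfied,
$$\tau(\xi) = f'\big(\rho(\xi)\big) \quad \Longleftrightarrow \quad \rho(\xi) = (f^*)'\big(\tau(\xi)\big), \tag{32}$$
where the convex functions $f$ and $f^*$ are Legendre dual to each other:
$$f^*(t) = t\,(f')^{-1}(t) - f\big((f')^{-1}(t)\big).$$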
By utilizing the conjugate reps., the associated Bregman divergence can be defined as
The $\alpha$-rep. is, of course, an example of the conjugate reps., and they are related as follows.
The $\alpha$-divergence (31) is expressed as .
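The statement that the α-rep. is an instance of the conjugate reps. can be illustrated numerically. The sketch below is written under explicit assumptions that are not taken from the text: Amari's standard forms $\ell^{(\pm\alpha)}(\xi) = \frac{2}{1 \mp \alpha}\,\xi^{(1 \mp \alpha)/2}$ are used as the ρ- and τ-functions, the convex function is taken as $f(u) = \frac{2}{1+\alpha}\big(\frac{1-\alpha}{2}u\big)^{2/(1-\alpha)}$ with $-1 < \alpha < 1$, and the Bregman divergence is taken in the usual form $\sum_x \{f(\rho(p)) - f(\rho(r)) - (\rho(p) - \rho(r))\,\tau(r)\}$. The two pmfs are arbitrary test data.

```python
import math

ALPHA = 0.5  # illustrative value; the sketch assumes -1 < ALPHA < 1

def rho(xi, a=ALPHA):
    """alpha-representation: 2/(1-a) * xi^((1-a)/2)."""
    return 2.0 / (1.0 - a) * xi ** ((1.0 - a) / 2.0)

def tau(xi, a=ALPHA):
    """(-alpha)-representation: 2/(1+a) * xi^((1+a)/2)."""
    return 2.0 / (1.0 + a) * xi ** ((1.0 + a) / 2.0)

def f(u, a=ALPHA):
    """Convex function assumed for this example (u > 0)."""
    return 2.0 / (1.0 + a) * ((1.0 - a) / 2.0 * u) ** (2.0 / (1.0 - a))

def f_prime(u, h=1e-6):
    """Central finite-difference derivative of f."""
    return (f(u + h) - f(u - h)) / (2.0 * h)

def bregman(p, r):
    """Bregman divergence of f evaluated on the rho-representations."""
    return sum(f(rho(pi)) - f(rho(ri)) - (rho(pi) - rho(ri)) * tau(ri)
               for pi, ri in zip(p, r))

def alpha_divergence(p, r, a=ALPHA):
    """Amari's alpha-divergence in its standard form (a != +1, -1)."""
    overlap = sum(pi ** ((1.0 - a) / 2.0) * ri ** ((1.0 + a) / 2.0)
                  for pi, ri in zip(p, r))
    return 4.0 / (1.0 - a * a) * (1.0 - overlap)

# Conjugacy check: tau(xi) should equal f'(rho(xi)).
conjugacy_gap = max(abs(f_prime(rho(xi)) - tau(xi)) for xi in (0.2, 0.5, 0.9))

p = [0.2, 0.3, 0.5]      # arbitrary normalized test pmfs
r = [0.5, 0.25, 0.25]
d_bregman = bregman(p, r)
d_alpha = alpha_divergence(p, r)
print(conjugacy_gap, d_bregman, d_alpha)
```

Under these assumptions $f(\rho(\xi)) = \frac{2}{1+\alpha}\,\xi$, so for normalized pmfs the first two terms of the Bregman sum cancel and the remainder reduces to the α-divergence.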
Remark 1.
We assume that the ρ- and τ-functions satisfy suitable regularity conditions throughout this paper. It is important to describe the domains and the tangents of the relevant ρ- and τ-functions; however, this is a very difficult matter in general. For example, consider a statistical manifold consisting of q-Gaussian distributions, and use the α-rep. (36). We see that it is an α-affine manifold with [,]. In this case, if the domain Ω (the total sample space) is , then α must satisfy . If , then α must satisfy . The lower bound comes from the regularity conditions of the statistical manifold (Amari and Nagaoka [], Chapter 2), and the upper bounds come from the integrability conditions of the probability densities. In this way, the regularity conditions for a set of ρ- and τ-functions are not determined by these functions alone, but also depend on the total sample space and the given statistical model. Some arguments have been given in our previous paper [].
3.1. MaxEnt
For a set of conjugate reps., let us introduce a generalized entropy functional S defined by
and consider the following MaxEnt problem.
where is a given function of x, and and are the Lagrange multipliers. Using the relation defined in Equation (32), this MaxEnt problem leads to
where stands for . We assume because if then the -rep. is a constant mapping, which fails to work as a rep., or immersion, of a pdf . We thus obtain
Remark 2.
Note that unless , the constraints of this generalized MaxEnt problem are neither the standard expectations nor the normalization of the pdf . However, the solution of this MaxEnt problem is expressed in terms of the inverse function of as
Definition 2.
For any given ρ-rep., the generalized score function is defined by
Remark 3.
In the above MaxEnt setting, substituting Equation (42) into the generalized score function, we obtain that
Theorem 1.
For any set of conjugate reps., and the associated generalized score function ,
holds.
Proof.
From the definition (32) of the conjugate reps., we see . It follows that
and integrating both sides by x, we obtain the result. ☐
Definition 3 (Escort rep.).
For a given ρ-rep. which satisfies , we can introduce a new -rep., which is called the escort rep. of a pdf and is defined by
where c is an appropriate constant and is the inverse function of .
Remark 4.
For the α-reps., we have
We thus see that
which states that is a self-escort rep. with the constant .
One of the merits of introducing the escort rep. is the next theorem.
Theorem 2.
A -rep. satisfies
Proof.
For this -rep., we can introduce the associated convex functions and that satisfy
respectively.
Note that combining Theorem 1 with Theorem 2 leads to
We then obtain the following corollary
Corollary 1.
For the conjugate reps. ρ and , the associated function satisfies that
which is the constant c defined in Equation (47) for any normalized pdf .
Remark 5.
For the α-rep., we see that
Definition 4 (Escort pdf and escort expectation).
Define the escort pdf with regards to a pdf by utilizing the escort rep. as follows.
and define the escort expectation with regards to of a given function as
Theorem 3.
In the MaxEnt setting of Equation (39), the score function has zero-escort expectation, i.e., , and it follows that
Remark 6.
In our formalism, the escort expectation is characterized by the generalized score function which is unbiased, i.e., has zero-escort expectation.
We see that the Lagrange multiplier is the θ-potential associated with the escort expectation. The dual affine coordinate is
and the associated Riemannian metric and cubic form are
respectively. Since $g_{ij}$ is a Hessian metric, the statistical manifold described by the θ- and η-coordinates is dually flat.
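The role of the potential as a generator of escort expectations can be illustrated in the familiar q-exponential setting. The sketch below is not the general construction of this section: it assumes the Tsallis q-exponential $\exp_q(u) = [1 + (1-q)u]^{1/(1-q)}$, the ordinary normalization $\sum_x p(x) = 1$, the standard q-escort weights $p^q / \sum p^q$, and a hypothetical sample space {0, 1, 2} with $F(x) = x$. All names and parameter values are illustrative.

```python
import math

Q = 1.5          # deformation index (illustrative choice, q > 1)
XS = (0, 1, 2)   # hypothetical finite sample space with F(x) = x
THETA = 0.3      # illustrative parameter value

def exp_q(u, q=Q):
    """q-exponential [1 + (1-q) u]^(1/(1-q)); requires a positive base."""
    base = 1.0 + (1.0 - q) * u
    if base <= 0.0:
        raise ValueError("argument outside the domain of exp_q")
    return base ** (1.0 / (1.0 - q))

def psi(theta, lo=0.0, hi=50.0, iters=200):
    """Solve sum_x exp_q(theta * x - Psi) = 1 for Psi by bisection."""
    def excess(z):
        return sum(exp_q(theta * x - z) for x in XS) - 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:   # total mass too large: increase Psi
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pmf(theta):
    """p_theta(x) = exp_q(theta * x - Psi(theta))."""
    z = psi(theta)
    return [exp_q(theta * x - z) for x in XS]

def escort_expectation(values, p, q=Q):
    """Expectation with respect to the q-escort weights p^q / sum p^q."""
    weights = [pi ** q for pi in p]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

h = 1e-5
dpsi = (psi(THETA + h) - psi(THETA - h)) / (2.0 * h)   # dPsi/dtheta
escort_mean = escort_expectation(XS, pmf(THETA))       # escort mean of F
print(dpsi, escort_mean)
```

In this setting the q-logarithm of $p_\theta$ has derivative $F(x) - d\Psi/d\theta$ with respect to $\theta$, so the agreement of the two printed numbers means that this score-like function has zero escort expectation.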
4. Conformal Divergence
Let us consider the Bregman divergence () of the escort reps., i.e.,
The next theorem is a main result of this contribution.
Theorem 4.
The relative escort expectation of ρ-reps. is the conformal (or scaled) divergence of with the scaling factor , i.e.,
Proof.
From Corollary 1, we see that
where c is an appropriate constant for any normalized pdf , and it follows that
Substituting this relation into Equation (64) leads to
Dividing both sides by and using the escort expectation, we obtain the result. ☐
Remark 7.
As an example of Theorem 4, let us consider the α-rep. case. Since as shown in Remark 4, it follows that . The corresponding escort pdf becomes
and Equation (65) becomes
When we set this relation becomes
which was first shown by Matsuzoe and Ohara [].
5. Concluding Remarks
We have discussed and reformulated the method of information geometry in terms of the conjugate reps. introduced by Zhang []. For an appropriate set of conjugate reps., the MaxEnt principle for a generalized entropy relates the associated Lagrange multipliers to the corresponding rep. (41) of the optimal pdf (42). For a generalized score function (Definition 2), the escort rep. and escort expectation are then naturally induced. The conformal divergence is related to the difference of the entropies in terms of the escort expectations, as shown in Theorem 4.
In previous work [], we studied, for the $\kappa$-deformed exponential family, the dualistic Hessian geometries among the thermodynamic potentials in the $\kappa$-deformed thermostatistics, and found that there exist two different kinds of dual affine coordinates: one is associated with the standard expectation, and the other with the escort expectation. There, the double escort distributions, i.e., the escorts of the escort distributions, play an important role. For the $q$-deformed exponential family, one of the authors (H.M.) further studied a sequence (or hierarchy) of escort distributions []. We think that these results are not specific to the $q$- or $\kappa$-deformed exponential pdfs, and we believe that these results [,] can be studied systematically by applying the reformulated method developed in this work. Further studies are needed and will be carried out in future work.
Acknowledgments
We acknowledge an anonymous reviewer for providing useful comments to improve our manuscript. The first named author is partially supported by Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research (KAKENHI) Grant Number JP17K05341. The second named author is partially supported by the JSPS Grants-in-Aid for Scientific Research (KAKENHI) Grant Number JP26108003 and JP15K04842.
Author Contributions
Tatsuaki Wada designed the main subject of this research and mainly wrote the manuscript. Hiroshi Matsuzoe commented on the manuscript at all stages. All authors equally promoted the research and discussed the results. All authors have read and approved the final manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Amari, S.-I.; Nagaoka, H. Method of Information Geometry; Oxford University Press: Oxford, UK, 2000.
- Amari, S.-I. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2016.
- Tsallis, C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World; Springer: New York, NY, USA, 2009.
- Kaniadakis, G. Theoretical foundations and mathematical formalism of the power-law tailed statistical distributions. Entropy 2013, 15, 3983–4010.
- Naudts, J. Generalized Thermostatistics; Springer: Berlin, Germany, 2011.
- Amari, S.-I.; Ohara, A. A Geometry of q-Exponential Family of Probability Distributions. Entropy 2011, 13, 1170–1185.
- Zhang, J. Divergence function, duality and convex analysis. Neural Comput. 2004, 16, 159–195.
- Wada, T.; Scarfone, A.M. Information geometry on the κ-thermostatistics. Entropy 2015, 17, 1204–1217.
- Wada, T.; Matsuzoe, H.; Scarfone, A.M. Dualistic Hessian structures among the thermodynamic potentials in the κ-thermostatistics. Entropy 2015, 17, 7213–7229.
- Zhang, J. On monotone embedding in information geometry. Entropy 2015, 17, 4485–4499.
- Amari, S.-I. Information Geometry of Positive Measures and Positive-Definite Matrices: Decomposable Dually Flat Structure. Entropy 2014, 16, 2131–2145.
- Matsuzoe, H.; Wada, T. Deformed Algebras and Generalizations of Independence on Deformed Exponential Families. Entropy 2015, 17, 5729–5751.
- Matsuzoe, H.; Ohara, A. Geometry for q-exponential families. In Recent Progress in Differential Geometry and Its Related Fields, Proceedings of the 2nd International Colloquium on Differential Geometry and Its Related Fields, Veliko Tarnovo, Bulgaria, 6–10 September 2010; Adachi, T., Hashimoto, H., Hristov, M.J., Eds.; World Scientific: Hackensack, NJ, USA, 2011; pp. 55–71.
- Matsuzoe, H. A sequence of escort distributions and generalizations of expectations on q-exponential family. Entropy 2017, 19, 7.
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).