Abstract
This study considers dualistic structures of the probability simplex from the information geometry perspective. We investigate a foliation by deformed probability simplexes for the transition of α-parameters, not for a fixed α-parameter. We also describe the properties of extended divergences on the foliation when different α-parameters are defined on each of the various leaves.
1. Introduction
Since the Cauchy distribution and the Student’s t-distribution are q-Gaussians, the set of q-normal distributions is regarded as a typical q-exponential family and has been related to nonextensive statistical mechanics [1,2]. Sets of q-normal distributions and q-exponential families have been investigated from the information geometry perspective, sometimes for nonextensive statistical mechanics and sometimes independently of it [3,4,5,6,7,8,9,10]. Deformed q-exponential families are defined using the deformed logarithm and reciprocal-deformed exponential functions. For instance, deformed q-exponential families have been used for studying escort distributions [11]. Their Hessian and conformal structures have also been investigated [12,13,14].
The current study considers a foliation by deformed probability simplexes representing sets of escort distributions, which are typical q-exponential families, for the continuous transition of α-parameters in information geometry. Previous studies on escort distributions are for a fixed α-parameter or among several α-parameters. However, foliations and divergence decomposition in dually flat spaces using mixed parameterizations are crucial. Therefore, we investigate extended divergences on the foliation, setting different α-parameters on each leaf.
First, we explain the dualistic structures, α-divergences, and the Tsallis relative entropy on the probability simplex. Next, we describe the divergences generated by affine immersions as level surfaces on the deformed probability simplexes corresponding to sets of escort distributions. We then define an extended divergence on a foliation by deformed probability simplexes. Finally, we propose a new decomposition of an extended divergence on the foliation.
2. Dualistic Structures and Divergences on the Probability Simplex
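Let $S_n$ be the n-dimensional probability simplex, i.e.,
$$S_n = \left\{ p = (p_1, \dots, p_{n+1}) \;\middle|\; p_i > 0, \ \sum_{i=1}^{n+1} p_i = 1 \right\},$$
where $p_1, \dots, p_{n+1}$ are the probabilities of $n+1$ states. Let $\{\theta^1, \dots, \theta^n\}$ be an affine coordinate system on $S_n$, where $\theta^i = p_i$ for $i = 1, \dots, n$, and
$$\{\partial_1, \dots, \partial_n\}, \qquad \partial_i = \frac{\partial}{\partial \theta^i},$$
be a frame of tangent vector fields on $S_n$. The Fisher metric $g = (g_{ij})$ on $S_n$ is defined by
$$g_{ij}(p) = g(\partial_i, \partial_j) = \frac{\delta_{ij}}{p_i} + \frac{1}{p_{n+1}}, \qquad i, j = 1, \dots, n,$$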
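where $\delta_{ij}$ is Kronecker’s delta. We define an α-connection $\nabla^{(\alpha)}$ on $S_n$ by
$$\nabla^{(\alpha)}_{\partial_i} \partial_j = \sum_{k=1}^{n} \Gamma^{(\alpha)k}_{ij} \partial_k, \qquad \Gamma^{(\alpha)k}_{ij} = \frac{1+\alpha}{2} \left( -\frac{1}{p_k} \delta^{k}_{ij} + p_k g_{ij} \right),$$
where $\delta^{k}_{ij} = 1$ if $i = j = k$, and $\delta^{k}_{ij} = 0$ otherwise. Then, the Levi-Civita connection ∇ of g coincides with $\nabla^{(0)}$. For $\alpha \in \mathbb{R}$, we have
$$X g(Y, Z) = g(\nabla^{(\alpha)}_X Y, Z) + g(Y, \nabla^{(-\alpha)}_X Z), \qquad X, Y, Z \in \mathcal{X}(S_n),$$
where $\mathcal{X}(S_n)$ is the set of all smooth tangent vector fields on $S_n$. Then, $\nabla^{(-\alpha)}$ is called the dual connection of $\nabla^{(\alpha)}$. For each α, $\nabla^{(\alpha)}$ is torsion-free and $\nabla^{(\alpha)} g$ is symmetric. Therefore, the triple $(S_n, \nabla^{(\alpha)}, g)$ is a statistical manifold, and $(S_n, \nabla^{(-\alpha)}, g)$ is its dual statistical manifold [9,12,13].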
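The Fisher metric on the simplex can be checked numerically. The following is a minimal sketch, assuming the coordinates are the first n probabilities and that the metric has the standard closed form $\delta_{ij}/p_i + 1/p_{n+1}$; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def fisher_metric(p):
    """Closed form g_ij = delta_ij / p_i + 1 / p_{n+1} (assumed standard form)
    in the coordinates theta^i = p_i, i = 1..n."""
    p = np.asarray(p, dtype=float)
    head, last = p[:-1], p[-1]
    return np.diag(1.0 / head) + 1.0 / last

def fisher_metric_from_scores(p):
    """Same matrix from the definition sum_k p_k (d_i log p_k)(d_j log p_k)."""
    p = np.asarray(p, dtype=float)
    n = p.size - 1
    # Jacobian d p_k / d theta^i: identity for k <= n, and -1 for k = n + 1
    # because p_{n+1} = 1 - sum_i theta^i.
    jac = np.vstack([np.eye(n), -np.ones((1, n))])
    scores = jac / p[:, None]              # entries d_i log p_k
    return scores.T @ (p[:, None] * scores)

def inverse_fisher_metric(p):
    """Inverse metric p_i delta_ij - p_i p_j on the first n coordinates."""
    head = np.asarray(p, dtype=float)[:-1]
    return np.diag(head) - np.outer(head, head)

p = np.array([0.2, 0.3, 0.5])
g = fisher_metric(p)
print(np.allclose(g, fisher_metric_from_scores(p)))
print(np.allclose(g @ inverse_fisher_metric(p), np.eye(2)))
```

Both comparisons are exact identities, so they hold for any interior point of the simplex, not only for the sample point used here.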
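For $\alpha \neq \pm 1$, an α-divergence $D^{(\alpha)}$ on $S_n$ is defined by
$$D^{(\alpha)}(p, r) = \frac{4}{1-\alpha^2} \left( 1 - \sum_{i=1}^{n+1} p_i^{\frac{1-\alpha}{2}} r_i^{\frac{1+\alpha}{2}} \right), \qquad p, r \in S_n.$$
For $q = (1-\alpha)/2$, it is known that
$$D^{(\alpha)}(p, r) = \frac{1}{q} K_q(p, r)$$
for the Tsallis relative entropy $K_q$, defined by
$$K_q(p, r) = -\sum_{i=1}^{n+1} p_i \ln_q \frac{r_i}{p_i} = \frac{1 - \sum_{i=1}^{n+1} p_i^{q} r_i^{1-q}}{1-q},$$
where $\ln_q$ is the q-logarithmic function defined by
$$\ln_q x = \frac{x^{1-q} - 1}{1-q}, \qquad x > 0, \ q \neq 1$$
[1,2]. The Tsallis relative entropy converges to the Kullback–Leibler divergence as $q \to 1$, because $\lim_{q \to 1} \ln_q x = \log x$. From the information-geometric viewpoint, the α-divergence $D^{(\alpha)}$ converges to the Kullback–Leibler divergence as $\alpha \to -1$.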
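The convergence to the Kullback–Leibler divergence and the link between the α-divergence and the Tsallis relative entropy can be illustrated numerically. The sketch below assumes the common conventions $\ln_q x = (x^{1-q}-1)/(1-q)$, $K_q(p,r) = -\sum_i p_i \ln_q (r_i/p_i)$, an α-divergence normalized by $4/(1-\alpha^2)$, and the parameter relation $\alpha = 1 - 2q$. Other sources use different normalizations or argument orders, so these choices are assumptions rather than the paper's definitive forms.

```python
import numpy as np

def ln_q(x, q):
    """q-logarithm (assumed convention); reduces to log at q = 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_relative_entropy(p, r, q):
    """K_q(p, r) = -sum_i p_i ln_q(r_i / p_i) (assumed convention)."""
    p, r = np.asarray(p, dtype=float), np.asarray(r, dtype=float)
    return float(-np.sum(p * ln_q(r / p, q)))

def kl_divergence(p, r):
    """Kullback-Leibler divergence sum_i p_i log(p_i / r_i)."""
    p, r = np.asarray(p, dtype=float), np.asarray(r, dtype=float)
    return float(np.sum(p * np.log(p / r)))

def alpha_divergence(p, r, alpha):
    """alpha-divergence with the 4 / (1 - alpha^2) normalization, alpha != +-1
    (assumed convention)."""
    p, r = np.asarray(p, dtype=float), np.asarray(r, dtype=float)
    mix = np.sum(p ** ((1.0 - alpha) / 2.0) * r ** ((1.0 + alpha) / 2.0))
    return float(4.0 / (1.0 - alpha ** 2) * (1.0 - mix))

p = np.array([0.2, 0.3, 0.5])
r = np.array([0.4, 0.4, 0.2])

# K_q approaches the Kullback-Leibler divergence as q tends to 1.
for q in (0.5, 0.9, 0.99, 0.999):
    print(q, tsallis_relative_entropy(p, r, q), kl_divergence(p, r))

# With alpha = 1 - 2q, the two divergences differ only by the factor 1/q.
q = 0.3
print(alpha_divergence(p, r, 1.0 - 2.0 * q), tsallis_relative_entropy(p, r, q) / q)
```

Under these conventions the factor $1/q$ follows from $1 - \alpha^2 = 4q(1-q)$.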
3. Deformed Probability Simplexes and Escort Distributions
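For a distribution $p = (p_1, \dots, p_{n+1})$ on $S_n$ and a parameter $q$, if each probability $P_i$ satisfies
$$P_i = \frac{p_i^{\,q}}{\sum_{j=1}^{n+1} p_j^{\,q}}, \qquad i = 1, \dots, n+1,$$
the probability distribution $P = (P_1, \dots, P_{n+1})$ is called the escort distribution [1,2], where $p_i^{\,q}$ denotes $p_i$ raised to the power $q$.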
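As a concrete illustration of the escort distribution, the following minimal sketch raises each probability to the power q and renormalizes. The function name `escort` and the sample values are illustrative assumptions.

```python
import numpy as np

def escort(p, q):
    """Escort distribution: P_i = p_i**q / sum_j p_j**q."""
    p = np.asarray(p, dtype=float)
    weights = p ** q
    return weights / weights.sum()

p = np.array([0.2, 0.3, 0.5])

# q = 1 returns the original distribution; q = 0 gives the uniform one.
print(escort(p, 1.0))
print(escort(p, 0.0))

# For q != 0, taking the escort with parameter 1/q recovers p.
q = 0.4
print(np.allclose(escort(escort(p, q), 1.0 / q), p))
```

The last check shows that, for nonzero q, the escort map is a bijection of the open simplex onto itself, which is what allows a set of escort distributions to be treated as a deformed copy of the simplex.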
It realizes the dualistic structure of a set of escort distributions via the affine immersion into [12,13]. Let be the affine immersion of into defined by
where is the canonical coordinate system on . Then, the escort distribution is represented as follows:
For a function on defined by
the image is a level surface of satisfying . For , the Hessian matrix of the function is positive definite on . Then, induces the Hessian structure , where D is the canonical flat affine connection [9,15]. By definition
the tetrad is the dual flat structure.
The submanifold structure of induced by coincides with the dualistic structure induced by the equiaffine immersion , where
for the gradient vector field of on defined by
(cf. Theorem 2) [13,16,17]. In Equation (19), the induced affine connection is the restricted D. The affine fundamental form is the restricted h. The operator is called the shape operator. If the transversal connection form satisfies , , then it is called the equiaffine immersion [18].
For the restricted D and h on , we use the same notations. The pullback of to is -conformally equivalent to defined by Equations (3)–(5). In addition, has a constant curvature [13].
4. Divergences Generated by Affine Immersions as Level Surfaces
Let be the canonical flat affine connection on an $(n+1)$-dimensional real affine space . The following theorem is known on the level surfaces of a Hessian function.
Theorem 1 ([16]).
Let M be a simply connected n-dimensional level surface of φ on an $(n+1)$-dimensional Hessian domain with a Riemannian metric and suppose that . If we consider a flat statistical manifold, is a 1-conformally flat statistical submanifold of , where D and g denote the connection and the Riemannian metric on M induced by and , respectively.
For $\alpha \in \mathbb{R}$, statistical manifolds and are α-conformally equivalent if there exists a function on N such that
If is 1-conformally equivalent to a flat statistical manifold , is called a 1-conformally flat statistical manifold. A statistical manifold is 1-conformally flat if and only if the dual statistical manifold is $(-1)$-conformally flat [19].
In terms of affine geometry, and are $(-1)$-conformally equivalent if and only if and are projectively equivalent [17,20].
The conormal immersion w for an affine immersion satisfies that , where b is a point on the surface, and a pairing of and . The next definition is given in relation to affine immersions and divergences.
Definition 1 ([19]).
Let be a 1-conformally flat statistical manifold realized by a non-degenerate affine immersion into , and w the conormal immersion for v. Then, the divergence of is defined by
The definition is independent of the choice of a realization of .
This divergence is referred to as Kurose geometric divergence in affine geometry and as Fenchel–Young divergence in the machine learning community [21]. The canonical divergence of a flat statistical manifold is defined by
where , , and are the primal coordinate, the dual coordinate, and the Legendre transform of , respectively [9]. The gradient mapping is defined by . For a 1-conformally flat statistical submanifold of a Hessian domain , we denote by the restriction of the divergence . Then, the next theorem holds.
Theorem 2 ([17]).
For a 1-conformally flat statistical submanifold of , two divergences and coincide.
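The canonical divergence of a flat statistical manifold, mentioned before Theorem 2, can be made concrete on the probability simplex. The sketch below is an illustration under stated assumptions, not the construction used in this paper: it takes the exponential coordinates $\theta^i = \log(p_i/p_{n+1})$ as the primal coordinates, the log-partition function as the potential, the expectation coordinates $\eta_i = p_i$ as the dual coordinates, and the usual form potential plus Legendre transform minus pairing. With these choices the canonical divergence reduces to a Kullback–Leibler divergence and agrees with the Bregman form.

```python
import numpy as np

def psi(theta):
    """Potential: log-partition function log(1 + sum_i exp(theta^i))."""
    return float(np.log1p(np.sum(np.exp(theta))))

def grad_psi(theta):
    """Gradient mapping theta -> eta (dual coordinates)."""
    e = np.exp(theta)
    return e / (1.0 + e.sum())

def phi(eta):
    """Legendre transform of psi: negative entropy in the coordinates eta."""
    last = 1.0 - eta.sum()
    return float(np.sum(eta * np.log(eta)) + last * np.log(last))

def to_theta(p):
    """Primal coordinates of p (assumed: exponential coordinates)."""
    return np.log(p[:-1] / p[-1])

def to_eta(p):
    """Dual coordinates of p (assumed: first n probabilities)."""
    return p[:-1]

def canonical_divergence(p, r):
    """psi(theta(p)) + phi(eta(r)) - <theta(p), eta(r)>."""
    return psi(to_theta(p)) + phi(to_eta(r)) - float(to_theta(p) @ to_eta(r))

def bregman_divergence(p, r):
    """Bregman form psi(theta_p) - psi(theta_r) - <grad psi(theta_r), theta_p - theta_r>."""
    tp, tr = to_theta(p), to_theta(r)
    return psi(tp) - psi(tr) - float(grad_psi(tr) @ (tp - tr))

p = np.array([0.2, 0.3, 0.5])
r = np.array([0.4, 0.4, 0.2])
kl_r_p = float(np.sum(r * np.log(r / p)))

print(canonical_divergence(p, r), bregman_divergence(p, r), kl_r_p)
```

The agreement of the first two values is the Fenchel–Young identity $\varphi(\eta) = \langle \theta, \eta \rangle - \psi(\theta)$ at $\eta = \nabla\psi(\theta)$. The argument order of the divergence is a convention and may be reversed in other references.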
5. Extended Divergence on a Foliation by Deformed Probability Simplexes
Previous sections described the dualistic structure, the affine immersion, and the divergence for each fixed q. This section defines a divergence on a foliation by deformed probability simplexes for all , and shows the divergence decomposition property. We apply the discussion on and in Section 4 to the one on and .
Let be the divergence on defined by the affine immersion by Equations (17) and (18).
Let , which corresponds to a foliation . We define a function on as follows:
where
We identify the dual space with . The i-th component of the conormal immersion of is . Then, the dual coordinate of b, denoted by , satisfies that [12,17]. The next proposition holds.
Proposition 1.
The function satisfies that:
(i) If , , where .
(ii) In the case of , for , and if and only if .
Proof.
The Legendre transform of is defined by + . By Equation (21), (i) holds. The definition of induces that
If , it holds that . In addition, and are convex centro-affine hypersurfaces, and is more on the origin side than . Then, . Thus, (ii) holds. □
Definition 2.
We refer to defined by Equations (22) and (23) as an extended divergence on the foliation .
We define the extended dual divergence of as follows:
where is the Legendre transform of for . Then, the following holds.
Proposition 2.
The functions and satisfy the following:
Proof.
By the definition of the Legendre transform, we have
□
The extended divergence using Equations (22) and (23) is related to the duo Bregman (pseudo-)divergence where the parameters also define the convex functions [22]. Their relationship will be studied in future works.
6. Decomposition of an Extended Divergence
To state a decomposition theorem for the extended divergence, we first introduce flows that are orthogonal to each leaf of .
For the foliation , we consider the flow on defined using the following equation:
where a function on takes the i-th component of the dual coordinate on as Equation (23) for each . An integral curve of Equation (27) is orthogonal to for each q with respect to the pairing . The set of the integral curves becomes the orthogonal foliation of . We denote it by .
Translating into the primal coordinate system, we have the next equation on :
The right-hand side of Equation (28) is calculated using Equations (17) and (18) for . A leaf of is an integral curve of the vector field that takes the value on for each q.
The following theorem is about the decomposition of the extended divergence.
Theorem 3.
Let be the probability simplex, and the 1-conformally flat statistical manifold generated by the affine immersion , where is defined as
, , and . Let , , and . If there exists an orthogonal leaf , which includes b and c, we have
where is the dual coordinate of for each q.
Proof.
From , it holds that , where . By the definition in Equations (22) and (23), we have
□
A decomposition similar to Equation (30) on a foliation of Hessian level surfaces is also available [17]. Theorem 3 generalizes the previous decomposition.
Finally, we describe the gradient flow on a leaf using the extended divergence.
Theorem 4.
For a submanifold of , we denote by an affine coordinate system on such that , , and set , . The gradient flow on defined by
converges to the unique point , where is a variable point parametrized by .
Proof.
By Theorem 3, there exists such that and for any . Equation (31) is described by the dual coordinate system on as follows:
On , from Proposition 1 (i), coincides with the geometric divergence , generated by the affine immersion . The geometric divergence generates the dual coordinate such that , , to be derived by [19]. Then, it holds that
and that
where is an initial point of Equation (31). Then, the gradient flow of Equation (31) converges to following a geodesic for the dual coordinate system. □
A gradient flow similar to Equation (31) has been provided on a flat statistical submanifold [23]. A similar one on a Hessian level surface, i.e., a 1-conformally flat statistical submanifold, has been given in [17].
7. Conclusions
This study considered a foliation by deformed probability simplexes, which are typical q-exponential families, for the continuous transition of α-parameters in information geometry. Details of the extended divergence and a natural definition of the foliation of q-exponentials remain to be provided.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
I am grateful to the referees for their constructive comments.
Conflicts of Interest
The author declares no conflict of interest.
References
1. Tsallis, C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World; Springer: New York, NY, USA, 2009.
2. Naudts, J. Generalised Thermostatistics; Springer: London, UK, 2011.
3. Ohara, A.; Wada, T. Information geometry of q-Gaussian densities and behaviors of solutions to related diffusion equations. J. Phys. A Math. Theor. 2010, 43, 035002.
4. Matsuzoe, H.; Ohara, A. Geometry for q-exponential families. In Recent Progress in Differential Geometry and Its Related Fields; Adachi, T., Hashimoto, H., Hristov, M.J., Eds.; World Scientific Publishing: Hackensack, NJ, USA, 2011; pp. 55–71.
5. Amari, S.; Ohara, A.; Matsuzoe, H. Geometry of deformed exponential families: Invariant, dually-flat and conformal geometry. Physica A 2012, 391, 4308–4319.
6. Matsuzoe, H.; Henmi, M. Hessian structures and divergence functions on deformed exponential families. In Geometric Theory of Information, Signals and Communication Technology; Nielsen, F., Ed.; Springer: Basel, Switzerland, 2014; pp. 57–80.
7. Matsuzoe, H.; Wada, T. Deformed algebras and generalizations of independence on deformed exponential families. Entropy 2015, 17, 5729–5751.
8. Wada, T.; Matsuzoe, H.; Scarfone, A.M. Dualistic Hessian structures among the thermodynamic potentials in the κ-thermostatistics. Entropy 2015, 17, 7213–7229.
9. Amari, S. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2016.
10. Scarfone, A.M.; Matsuzoe, H.; Wada, T. Information geometry of κ-exponential families: Dually-flat, Hessian and Legendre structures. Entropy 2018, 20, 436.
11. Naudts, J. Estimators, escort probabilities, and ϕ-exponential families in statistical physics. J. Inequal. Pure Appl. Math. 2004, 5, 102.
12. Ohara, A. Geometry of distributions associated with Tsallis statistics and properties of relative entropy minimization. Phys. Lett. A 2007, 370, 184–193.
13. Ohara, A. Geometric study for the Legendre duality of generalized entropies and its application to the porous medium equation. Eur. Phys. J. B 2009, 70, 15–28.
14. Matsuzoe, H. A sequence of escort distributions and generalizations of expectations on q-exponential family. Entropy 2017, 19, 7.
15. Shima, H. The Geometry of Hessian Structures; World Scientific: Singapore, 2007.
16. Uohashi, K.; Ohara, A.; Fujii, T. 1-conformally flat statistical submanifolds. Osaka J. Math. 2000, 37, 501–507.
17. Uohashi, K.; Ohara, A.; Fujii, T. Foliations and divergences of flat statistical manifolds. Hiroshima Math. J. 2000, 30, 403–414.
18. Nomizu, K.; Sasaki, T. Affine Differential Geometry: Geometry of Affine Immersions; Cambridge University Press: Cambridge, UK, 1994.
19. Kurose, T. On the divergences of 1-conformally flat statistical manifolds. Tohoku Math. J. 1994, 46, 427–433.
20. Nomizu, K.; Pinkal, U. On the geometry of affine immersions. Math. Z. 1987, 195, 165–178.
21. Azoury, K.S.; Warmuth, M.K. Relative loss bounds for on-line density estimation with the exponential family of distributions. Mach. Learn. 2001, 43, 211–246.
22. Nielsen, F. Statistical divergences between densities of truncated exponential families with nested supports: Duo Bregman and duo Jensen divergences. Entropy 2022, 24, 421.
23. Fujiwara, A.; Amari, S. Gradient systems in view of information geometry. Physica D 1995, 80, 317–327.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).