Article

Conformal Flattening for Deformed Information Geometries on the Probability Simplex  †

Department of Electrical and Electronics, University of Fukui, Bunkyo, Fukui 910-8507, Japan
† This paper is an extended version of our paper published in SigmaPhi 2014, 2017 and Geometric Science of Information (GSI 2017).
Entropy 2018, 20(3), 186; https://doi.org/10.3390/e20030186
Submission received: 20 February 2018 / Revised: 8 March 2018 / Accepted: 8 March 2018 / Published: 10 March 2018
(This article belongs to the Special Issue New Trends in Statistical Physics of Complex Systems)

Abstract

Recent progress in the theory and applications of statistical models with generalized exponential functions has motivated efforts to deform the standard structure of information geometry. Various representing functions play central roles in these deformations. In this paper, we consider two important notions in information geometry, i.e., invariance and dual flatness, from the viewpoint of representing functions. We first characterize a pair of representing functions that realizes the invariant geometry by solving a system of ordinary differential equations. Next, by proposing a new transformation technique, i.e., conformal flattening, we construct dually flat geometries from a certain class of non-flat geometries. Finally, we apply the results to demonstrate several properties of gradient flows on the probability simplex.

1. Introduction

The theory of information geometry has elucidated abundant geometric properties of spaces equipped with a Riemannian metric and mutually dual affine connections. When it is applied to the study of statistical models described by the exponential family, the logarithmic function plays a significant role in giving the standard information geometric structure to the models [1,2].
Inspired by the recent progress of several areas in statistical physics and mathematical statistics [3,4,5,6,7,8,9,10] which have exploited theoretical interests and possible applications for generalized exponential families, one research direction in information geometry is pointing to constructions of deformed geometries based on the standard one, keeping its basic properties. A typical and classical example of such a deformation would be the alpha-geometry [1,2], a statistical definition of which can be regarded as a replacement of the logarithmic function by suitable power functions. Hence, for the purpose of the generalization and flexible applicability, much attention is paid to various uses of such replacements by representing functions as important tools [3,4,11,12].
Two major characteristics of the standard structure are dual flatness and invariance [2]. Dual flatness (or Hessian structure [13]) produces fruitful properties such as the existence of canonical coordinate systems, a pair of conjugate potential functions and the canonical divergence (relative entropy). In addition, they are connected with the Legendre duality relation, which is also fundamental in the generalization of statistical physics. On the other hand, the invariance of geometric structure is crucially valuable in developing mathematical statistics. It has been proved [14] that invariance holds for only the structure with a special triple of a Riemannian metric and a pair of mutually dual affine connections, which are respectively called the Fisher information and the alpha-connections (see Section 3 for their definitions). The study of these two characteristics from a viewpoint of representing functions would contribute to our geometrical understanding.
In this paper, we first characterize a pair of representing functions that realizes the invariant information geometric structure. Next, we propose a new transformation to obtain dually flat geometries from a certain class of non-flat information geometries, using concepts from affine differential geometry [15,16]. We call the transformation conformal flattening, which is a generalization of the way to realize the corresponding dually flat geometry from the alpha-geometry developed in [17,18]. As applications and easy consequences of the results, we finally show several properties of gradient flows associated with realized dually flat geometries. Focusing on geometric characteristics conserved by the transformation, we discuss the properties such as a relation between geodesics and flows, the first integral of the flows and so on. These properties are new and general. Hence, they refine the arguments of the flows in [18], where only the alpha-geometry is treated.
The paper is organized as follows. In Section 2, we introduce preliminary results, explaining several existing methods to construct information geometric structures, including the dually flat structure and the alpha-structure. We also give a short summary of the concepts from affine differential geometry that are used in this paper. Section 3 provides a characterization of representing functions that realize invariant geometry, i.e., the one equipped with the Fisher information and a pair of alpha-connections. The characterization is obtained by solving a simple system of ordinary differential equations. In Section 4, we first obtain a certain class of information geometric structures by regarding representing functions as immersions into an ambient affine space. Then, we demonstrate the conformal flattening that realizes the corresponding dually flat structure, and discuss its properties and relations with generalized entropies and escort probabilities [19]. Section 5 exhibits the geometric properties of gradient flows with respect to a conformally realized Riemannian metric. These flows reduce to the well-known replicator flow [20] (Chapter 16) when we consider the standard information geometry. By suitably choosing its pay-off functions, we see that the flow follows a geodesic curve or conserves a divergence from an equilibrium. In the final section, some concluding remarks are made.
Throughout the paper, we use a probability simplex as a statistical model for the sake of simplicity.

2. Preliminaries

2.1. Information Geometry of $\mathcal{S}^n$ and $\mathbb{R}_+^{n+1}$

Let us represent an element $p \in \mathbb{R}^{n+1}$ with its components $p_i$, $i = 1, \ldots, n+1$, as $p = (p_i) \in \mathbb{R}^{n+1}$. Denote, respectively, the positive orthant by
$$\mathbb{R}_+^{n+1} := \{ p = (p_i) \in \mathbb{R}^{n+1} \mid p_i > 0,\ i = 1, \ldots, n+1 \},$$
and the relative interior of the probability simplex by
$$\mathcal{S}^n := \Big\{ p \in \mathbb{R}_+^{n+1} \ \Big|\ \sum_{i=1}^{n+1} p_i = 1 \Big\}.$$
Let $p(X)$ be a probability distribution of a random variable $X$ taking a value in the finite sample space $\Omega = \{1, 2, \ldots, n, n+1\}$. We consider the set of distributions $p(X)$ with positive probabilities, i.e., $p(i) = p_i > 0$, $i = 1, \ldots, n+1$, defined by
$$p(X) = \sum_{i=1}^{n+1} p_i \delta_i(X), \qquad \delta_i(j) = \delta_i^j \ \text{(the Kronecker delta)},$$
which is identified with $\mathcal{S}^n$. A statistical model in $\mathcal{S}^n$ is represented with parameters $\zeta = (\zeta^j)$, $j = 1, \ldots, d \le n$, by
$$p_\zeta(X) = \sum_{i=1}^{n+1} p_i(\zeta) \delta_i(X),$$
where each $p_i$ is smoothly parametrized by $\zeta$. For such a statistical model, the $\zeta^j$ can also be regarded as coordinates of the corresponding submanifold in $\mathcal{S}^n$. For simplicity, we shall consider the full model, i.e., $d = n$ and the parameter set is in bijection with $\mathcal{S}^n$ via the $p_i(\zeta)$.
The information geometric structure [2] on $\mathcal{S}^n$, denoted by $(g, \nabla, \nabla^*)$, is composed of a pair of torsion-free affine connections $\nabla$ and $\nabla^*$ that are mutually dual with respect to a Riemannian metric $g$. If we write $\partial_i := \partial / \partial \zeta^i$, $i = 1, \ldots, n$, the mutual duality requires the components of $(g, \nabla, \nabla^*)$ to satisfy
$$\partial_i g_{jk} = \Gamma_{ij,k} + \Gamma^*_{ik,j}. \tag{1}$$
Let $L$ and $M$ be a pair of strictly monotone (i.e., one-to-one) smooth functions on the interval $(0, 1)$. One way of constructing such a structure $(g, \nabla, \nabla^*)$ is to define the components as follows [2,11]:
$$g_{ij}(p) = \sum_{X \in \Omega} \partial_i L(p_\zeta(X))\, \partial_j M(p_\zeta(X)), \quad i, j = 1, \ldots, n, \tag{2}$$
$$\Gamma_{ij,k}(p) = \sum_{X \in \Omega} \partial_i \partial_j L(p_\zeta(X))\, \partial_k M(p_\zeta(X)), \quad i, j, k = 1, \ldots, n, \tag{3}$$
$$\Gamma^*_{ij,k}(p) = \sum_{X \in \Omega} \partial_k L(p_\zeta(X))\, \partial_i \partial_j M(p_\zeta(X)), \quad i, j, k = 1, \ldots, n. \tag{4}$$
In this paper, we call $L$ and $M$ representing functions. It is easy to verify the mutual duality (1). (Positive definiteness of $g$ needs additional conditions.)
When the curvature tensors of both $\nabla$ and $\nabla^*$ vanish, $(g, \nabla, \nabla^*)$ is called dually flat [2]. It is known that $(g, \nabla, \nabla^*)$ is dually flat if and only if there exist two special coordinate systems, denoted by $\theta^i(p)$ and $\eta_i(p)$, $i = 1, \ldots, n$, respectively, where $(\theta^i)$ is $\nabla$-affine, $(\eta_i)$ is $\nabla^*$-affine and they are biorthogonal, i.e.,
$$g\!\left( \frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \eta_j} \right) = \delta_i^j.$$
We give examples. For a real number $\alpha \neq 1$, define $L^{(\alpha)}(u) := 2 u^{(1-\alpha)/2} / (1-\alpha)$, and for $\alpha = 1$ define $L^{(1)}(u) := \ln u$. If we set $L(u) = L^{(\alpha)}(u)$ and $M(u) = L^{(-\alpha)}(u)$, then they derive the alpha-structure [2] $(g^F, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$, where $g^F$ is the Fisher information and $\nabla^{(\pm\alpha)}$ are the alpha-connections (see Section 3). In particular, if we choose $\alpha = 1$, it defines the standard dually flat structure $(g^F, \nabla^{(e)} := \nabla^{(1)}, \nabla^{(m)} := \nabla^{(-1)})$, where $\nabla^{(e)}$ and $\nabla^{(m)}$ are called the e- and m-connection, respectively [2]. Similarly, the $\phi$-log geometry [3] can also be introduced in the same way by taking $L(u) = \log_\phi(u)$ and $M(u) = u$.
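As an illustration of the construction (2), the following minimal sketch evaluates $g_{ij}$ for the full model in the coordinates $\zeta^i = p_i$, $i = 1, \ldots, n$ (so that $p_{n+1} = 1 - \sum_i \zeta^i$), and compares it with the Fisher information for the alpha pair $L = L^{(\alpha)}$, $M = L^{(-\alpha)}$. The coordinate choice, the function names and the sample point are assumptions made only for this example.

```python
def metric_from_representing_functions(p, dL, dM):
    """Eq. (2) in coordinates zeta^i = p_i (i = 1..n), with p_{n+1} = 1 - sum(zeta).

    Here d_i p_k = delta_ik for k <= n and d_i p_{n+1} = -1, so
    g_ij = L'(p_i) M'(p_i) delta_ij + L'(p_{n+1}) M'(p_{n+1}).
    """
    n = len(p) - 1
    w = [dL(u) * dM(u) for u in p]  # weights L'(p_k) M'(p_k)
    return [[(w[i] if i == j else 0.0) + w[n] for j in range(n)] for i in range(n)]


def fisher_metric(p):
    """Fisher information in the same coordinates: delta_ij / p_i + 1 / p_{n+1}."""
    n = len(p) - 1
    return [[(1.0 / p[i] if i == j else 0.0) + 1.0 / p[n] for j in range(n)]
            for i in range(n)]


alpha = 0.3                                # illustrative value, alpha != +-1
dL = lambda u: u ** (-(1 + alpha) / 2)     # derivative of L^(alpha)
dM = lambda u: u ** (-(1 - alpha) / 2)     # derivative of L^(-alpha)
p = [0.5, 0.3, 0.2]                        # a point of S^2

g = metric_from_representing_functions(p, dL, dM)
gF = fisher_metric(p)
max_gap = max(abs(g[i][j] - gF[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The two matrices agree up to rounding because $L'(u) M'(u) = 1/u$ for this pair, which is the condition derived in Section 3.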
One traditional way to construct a general information geometric structure $(g, \nabla, \nabla^*)$, without using representing functions, is by means of contrast functions (or divergences) [2,21]. In our case, let $\rho$ be a function on $\mathcal{S}^n \times \mathcal{S}^n$ satisfying $\rho(p, r) \ge 0$ for all $p, r \in \mathcal{S}^n$, with equality if and only if $p = r$. For a vector field $\partial_i$, let $(\partial_i)_p$ denote its tangent vector at $p$. When we define
$$g_{ij}(p) = -(\partial_i)_p (\partial_j)_r \rho(p, r) \big|_{p=r}, \quad i, j = 1, \ldots, n, \tag{5}$$
$$\Gamma_{ij,k}(p) = -(\partial_i)_p (\partial_j)_p (\partial_k)_r \rho(p, r) \big|_{p=r}, \quad i, j, k = 1, \ldots, n, \tag{6}$$
$$\Gamma^*_{ij,k}(p) = -(\partial_k)_p (\partial_i)_r (\partial_j)_r \rho(p, r) \big|_{p=r}, \quad i, j, k = 1, \ldots, n, \tag{7}$$
we can confirm that (1) holds by differentiating (5) along the diagonal $p = r$. If $g$ is positive definite, we say that $\rho$ is a contrast function or a divergence that induces the structure $(g, \nabla, \nabla^*)$.
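The definition (5) can be checked numerically. The sketch below approximates the mixed derivative by central differences, again in the coordinates $\zeta^i = p_i$, and uses $\rho(p, r) = D^{(KL)}(r \| p)$, which should reproduce the Fisher information. The step size, the helper names and the sample point are assumptions for illustration only.

```python
import math


def kl(r, p):
    """KL divergence D(r || p) between two points of the open simplex."""
    return sum(ri * (math.log(ri) - math.log(pi)) for ri, pi in zip(r, p))


def to_simplex(zeta):
    """Map coordinates zeta = (p_1, ..., p_n) to the full vector (p_1, ..., p_{n+1})."""
    return list(zeta) + [1.0 - sum(zeta)]


def metric_from_contrast(rho, zeta, h=1e-4):
    """g_ij = -(d_i)_p (d_j)_r rho(p, r) at p = r, by a central mixed difference."""
    n = len(zeta)

    def shifted(i, s):
        z = list(zeta)
        z[i] += s
        return to_simplex(z)

    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            mixed = (rho(shifted(i, h), shifted(j, h))
                     - rho(shifted(i, h), shifted(j, -h))
                     - rho(shifted(i, -h), shifted(j, h))
                     + rho(shifted(i, -h), shifted(j, -h))) / (4.0 * h * h)
            g[i][j] = -mixed
    return g


# rho takes (p, r); the contrast function here is rho(p, r) = KL(r || p).
rho = lambda p, r: kl(r, p)
zeta = [0.5, 0.3]                      # p = (0.5, 0.3, 0.2)
g = metric_from_contrast(rho, zeta)
print(g)
```

Up to discretization error, the entries agree with $\delta_{ij}/p_i + 1/p_{n+1}$, the Fisher information in these coordinates.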
A contrast function $\rho$ of the form
$$\rho(p, r) = \psi(\theta(p)) + \varphi(\eta(r)) - \sum_{i=1}^{n} \theta^i(p) \eta_i(r) \tag{8}$$
always induces the corresponding dually flat structure. Conversely, it is known [2] that if $(g, \nabla, \nabla^*)$ is dually flat, then there exists a unique contrast function of the form (8) that induces the structure. Hence, it is called the canonical divergence of $(g, \nabla, \nabla^*)$, and we say that the functions $\psi$ and $\varphi$ are potentials. By setting $p = r$, we see that a dually flat structure naturally gives the Legendre duality relations at each $p$, i.e., the function $\varphi$ is the Legendre conjugate of $\psi$ satisfying
$$\eta_i = \frac{\partial \psi}{\partial \theta^i}, \qquad \theta^i = \frac{\partial \varphi}{\partial \eta_i}.$$
Applying the idea of affine hypersurface theory [15] is another way to construct the information geometric structure. Let $D$ be the canonical flat affine connection on $\mathbb{R}^{n+1}$. Consider an immersion $f$ from $\mathcal{S}^n$ into $\mathbb{R}^{n+1}$ and a vector field $\xi$ on $\mathcal{S}^n$ that is transversal to the hypersurface $f(\mathcal{S}^n)$ in $\mathbb{R}^{n+1}$. Such a pair $(f, \xi)$, called an affine immersion, defines a torsion-free connection $\nabla$ and the affine fundamental form $g$ on $\mathcal{S}^n$ via the Gauss formula as
$$D_X f_*(Y) = f_*(\nabla_X Y) + g(X, Y)\, \xi, \quad X, Y \in \mathcal{X}(\mathcal{S}^n),$$
where $\mathcal{X}(\mathcal{S}^n)$ is the set of tangent vector fields on $\mathcal{S}^n$ and $f_*$ denotes the differential of $f$. By regarding $g$ as a (pseudo-) Riemannian metric, one can discuss the realized structure $(g, \nabla)$ on $\mathcal{S}^n$.
We say that $(f, \xi)$ is non-degenerate and equiaffine if $g$ is non-degenerate and $D_X \xi$ is tangent to $\mathcal{S}^n$ for any $X \in \mathcal{X}(\mathcal{S}^n)$, respectively. The latter ensures that the volume element $\theta$ on $\mathcal{S}^n$ defined by
$$\theta(X_1, \ldots, X_n) = \det\big( f_*(X_1), \ldots, f_*(X_n), \xi \big), \quad X_i \in \mathcal{X}(\mathcal{S}^n),$$
is parallel with respect to $\nabla$ [15] (p.31). It is known [15,16] that there exists a torsion-free dual affine connection $\nabla^*$ satisfying (1) if and only if $(f, \xi)$ is non-degenerate and equiaffine. In this case, the obtained structure $(g, \nabla, \nabla^*)$ on $\mathcal{S}^n$ is not dually flat in general. However, there always exist a positive function $\sigma$ and a dually flat structure $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ on $\mathcal{S}^n$ that satisfy the following relations [16]:
$$\tilde{g} = \sigma g, \tag{10}$$
$$g(\tilde{\nabla}_X Y, Z) = g(\nabla_X Y, Z) - d(\ln \sigma)(Z)\, g(X, Y), \tag{11}$$
$$g(\tilde{\nabla}^*_X Y, Z) = g(\nabla^*_X Y, Z) + d(\ln \sigma)(X)\, g(Y, Z) + d(\ln \sigma)(Y)\, g(X, Z). \tag{12}$$
Furthermore, there exists a specific contrast function $\rho(p, r)$ for $(g, \nabla, \nabla^*)$ called the geometric divergence. Then, a contrast function $\tilde{\rho}(p, r)$ that induces $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ is given by the conformal divergence $\tilde{\rho}(p, r) = \sigma(r) \rho(p, r)$. These properties of the structure $(g, \nabla)$ realized by a non-degenerate and equiaffine immersion are called 1-conformal flatness [16].

3. Characterization of Invariant Geometry by Representing Functions

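Suppose that a pair of representing functions $(L, M)$ defines an information geometric structure $(g, \nabla, \nabla^*)$ by (2), (3) and (4). In this section, we consider the condition on $(L, M)$ under which $(g, \nabla, \nabla^*)$ is invariant. Invariance is equivalent [2,14] to requiring that $g$ is the Fisher information $g^F$ defined by
$$g^F_{ij}(p) = \sum_{X \in \Omega} p_\zeta\, (\partial_i \ln p_\zeta)(\partial_j \ln p_\zeta) \tag{13}$$
and that the pair of dual connections satisfies $\nabla = \nabla^{(\alpha)}$ and $\nabla^* = \nabla^{(-\alpha)}$ for a certain $\alpha \in \mathbb{R}$, where $\nabla^{(\alpha)}$ is the $\alpha$-connection defined by
$$\Gamma^{(\alpha)}_{ij,k} = \sum_{X \in \Omega} p_\zeta \left( \partial_i \partial_j \ln p_\zeta + \frac{1-\alpha}{2} (\partial_i \ln p_\zeta)(\partial_j \ln p_\zeta) \right) (\partial_k \ln p_\zeta). \tag{14}$$
Comparing (2) with (13) term by term, $g_{ij}$ expressed in (2) by the functions $L(u)$ and $M(u)$ coincides with the Fisher information if and only if the following equation holds:
$$\frac{dL}{du} \frac{dM}{du} = \frac{1}{u}. \tag{15}$$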
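Similarly, we derive a condition for $\Gamma_{ij,k}$ expressed in (3) to be the $\alpha$-connection. First, note that the following relations hold:
$$\partial_i \ln p_\zeta = (\partial_i p_\zeta) \frac{1}{p_\zeta}, \qquad \partial_i \partial_j \ln p_\zeta = (\partial_i \partial_j p_\zeta) \frac{1}{p_\zeta} - (\partial_i p_\zeta)(\partial_j p_\zeta) \frac{1}{p_\zeta^2}. \tag{16}$$
On the other hand, we have
$$\partial_i \partial_j L(p_\zeta) = (\partial_i \partial_j p_\zeta) \frac{dL}{du}(p_\zeta) + (\partial_i p_\zeta)(\partial_j p_\zeta) \frac{d^2 L}{du^2}(p_\zeta), \tag{17}$$
$$\partial_k M(p_\zeta) = (\partial_k p_\zeta) \frac{dM}{du}(p_\zeta). \tag{18}$$
Substituting (16), (17) and (18) into (3) and (14), and comparing them, we obtain (15) again and
$$\frac{d^2 L}{du^2} \frac{dM}{du} = -\frac{1+\alpha}{2 u^2}. \tag{19}$$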
Writing $L' := dL/du$ and $L'' := d^2 L/du^2$, we have the following ODE from (15) and (19):
$$\frac{L''}{L'} = -\frac{1+\alpha}{2u}.$$
By integration, we get
$$\ln L' = -\frac{1+\alpha}{2} \ln u + c$$
and
$$L(u) = c_1 u^{(1-\alpha)/2} + c_2, \qquad M(u) = c_3 u^{(1+\alpha)/2} + c_4,$$
where $c$ and $c_i$, $i = 1, \ldots, 4$, are constants with the constraint $c_1 c_3 = 4/(1-\alpha^2)$. Thus, $(L, M)$ is essentially a pair of representing functions that derives the alpha-geometry, and the only freedom left for the invariance of geometry is that of adjusting the constants. If we require solely (15), which means that only the Riemannian metric $g$ is required to be the Fisher information $g^F$, there still remains much freedom for $(L, M)$.
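The solution family can be sanity-checked against conditions (15) and (19). The sketch below builds $(L, M)$ from the constants subject to $c_1 c_3 = 4/(1-\alpha^2)$ and evaluates the residuals of both conditions with finite differences. The helper names, step sizes and constants are assumptions chosen for illustration, and $\alpha \neq \pm 1$ is assumed.

```python
def alpha_pair(alpha, c1=1.0, c2=0.0, c4=0.0):
    """Return (L, M) with c3 fixed by the constraint c1 * c3 = 4 / (1 - alpha^2)."""
    c3 = 4.0 / ((1.0 - alpha ** 2) * c1)
    L = lambda u: c1 * u ** ((1.0 - alpha) / 2.0) + c2
    M = lambda u: c3 * u ** ((1.0 + alpha) / 2.0) + c4
    return L, M


def d1(F, u, h=1e-5):
    """Central first difference."""
    return (F(u + h) - F(u - h)) / (2.0 * h)


def d2(F, u, h=1e-4):
    """Central second difference."""
    return (F(u + h) - 2.0 * F(u) + F(u - h)) / (h * h)


def residuals(alpha, u, **consts):
    """Residuals of (15): L'M' - 1/u and of (19): L''M' + (1 + alpha) / (2 u^2)."""
    L, M = alpha_pair(alpha, **consts)
    r15 = d1(L, u) * d1(M, u) - 1.0 / u
    r19 = d2(L, u) * d1(M, u) + (1.0 + alpha) / (2.0 * u * u)
    return r15, r19


for u in (0.2, 0.5, 0.9):
    print(u, residuals(0.3, u, c1=2.0, c2=-1.0, c4=0.5))
```

Both residuals are small for every choice of the additive constants $c_2$, $c_4$ and of the scale $c_1$, reflecting that these constants are the only remaining freedom.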

4. Affine Immersion of the Probability Simplex

Now we consider the affine immersion with the following assumptions.
Assumptions:
  • The affine immersion $(f, \xi)$ is non-degenerate and equiaffine,
  • The immersion $f$ is given componentwise by a common representing function $L$, i.e.,
    $$f : \mathcal{S}^n \ni p = (p_i) \mapsto x = (x_i) \in \mathbb{R}^{n+1}, \quad x_i = L(p_i), \quad i = 1, \ldots, n+1,$$
  • The representing function $L : (0, 1) \to \mathbb{R}$ is sign-definite, concave with $L'' < 0$ and strictly increasing, i.e., $L' > 0$. Hence, the inverse of $L$, denoted by $E$, exists, i.e., $E \circ L = \mathrm{id}$.
  • Each component of $\xi$ satisfies $\xi_i < 0$, $i = 1, \ldots, n+1$, on $\mathcal{S}^n$.
Remark 1.
From Assumption 3, it follows that $L' E' = 1$, $E' > 0$ and $E'' > 0$. Regarding the sign-definiteness of $L$, note that we can adjust $L(u)$ to $L(u) + c$ by a suitable constant $c$ without loss of generality, since the resultant geometric structure is unchanged by the adjustment (see Theorem 1). For a fixed $L$ satisfying Assumption 3, we can choose $\xi$ that meets Assumptions 1 and 4. For example, if we take $\xi_i = -|L(p_i)|$, then $(f, \xi)$ is called centro-affine, which is known to be equiaffine [15] (p.37). Assumptions 3 and 4 also ensure positive definiteness of $g$ (the details are described in the proof of Theorem 1). Hence, $(f, \xi)$ is non-degenerate and we can regard $g$ as a Riemannian metric on $\mathcal{S}^n$.

4.1. Conormal Vector and the Geometric Divergence

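Define a function $\Psi$ on $\mathbb{R}^{n+1}$ by
$$\Psi(x) := \sum_{i=1}^{n+1} E(x_i),$$
then $f(\mathcal{S}^n)$ immersed in $\mathbb{R}^{n+1}$ is expressed as the level surface $\Psi(x) = 1$. Denote by $\mathbb{R}_{n+1}$ the dual space of $\mathbb{R}^{n+1}$ and by $\langle \nu, x \rangle$ the pairing of $x \in \mathbb{R}^{n+1}$ and $\nu \in \mathbb{R}_{n+1}$. The conormal vector [15] (p.57) $\nu : \mathcal{S}^n \to \mathbb{R}_{n+1}$ for the affine immersion $(f, \xi)$ is defined by
$$\langle \nu(p), f_*(X) \rangle = 0, \quad \forall X \in T_p \mathcal{S}^n, \qquad \langle \nu(p), \xi(p) \rangle = 1, \tag{23}$$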
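for $p \in \mathcal{S}^n$. Using the assumptions and noting the relations
$$\frac{\partial \Psi}{\partial x_i} = E'(x_i) = \frac{1}{L'(p_i)} > 0, \quad i = 1, \ldots, n+1,$$
we have
$$\nu_i(p) := \frac{1}{\Lambda} \frac{\partial \Psi}{\partial x_i} = \frac{1}{\Lambda(p)} E'(x_i) = \frac{1}{\Lambda(p)} \frac{1}{L'(p_i)}, \quad i = 1, \ldots, n+1, \tag{24}$$
where $\Lambda$ is a normalizing factor defined by
$$\Lambda(p) := \sum_{i=1}^{n+1} \frac{\partial \Psi}{\partial x_i} \xi_i = \sum_{i=1}^{n+1} \frac{1}{L'(p_i)} \xi_i(p). \tag{25}$$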
Then, we can confirm (23) using the relation $\sum_{i=1}^{n+1} X_i = 0$ for $X = (X_i) \in \mathcal{X}(\mathcal{S}^n)$. Note that $v : \mathcal{S}^n \to \mathbb{R}_{n+1}$ defined by
$$v_i(p) = \Lambda(p) \nu_i(p) = \frac{1}{L'(p_i)}, \quad i = 1, \ldots, n+1,$$
also satisfies
$$\langle v(p), f_*(X) \rangle = 0, \quad \forall X \in T_p \mathcal{S}^n.$$
Furthermore, it follows from (24), (25) and Assumption 4 that
$$\Lambda(p) < 0, \qquad \nu_i(p) < 0, \quad i = 1, \ldots, n+1,$$
for all $p \in \mathcal{S}^n$.
It is known [15] (p.57) that the affine fundamental form $g$ can be represented by
$$g(X, Y) = -\langle \nu_*(X), f_*(Y) \rangle, \quad X, Y \in T_p \mathcal{S}^n. \tag{26}$$
In our case, it is calculated via (26) as
$$g(X, Y) = -\Lambda^{-1} \langle v_*(X), f_*(Y) \rangle - X(\Lambda^{-1}) \langle v, f_*(Y) \rangle = -\frac{1}{\Lambda} \sum_{i=1}^{n+1} \left( \frac{1}{L'(p_i)} \right)' L'(p_i) X_i Y_i = \frac{1}{\Lambda} \sum_{i=1}^{n+1} \frac{L''(p_i)}{L'(p_i)} X_i Y_i, \quad X, Y \in T_p \mathcal{S}^n. \tag{27}$$
Since $\Lambda < 0$, $L'' < 0$ and $L' > 0$, $g$ is positive definite from Assumptions 3 and 4, and we can regard it as a Riemannian metric.
Utilizing these notions from affine differential geometry, we can introduce a geometric divergence [16] as follows:
$$\rho(p, r) = \langle \nu(r), f(p) - f(r) \rangle = \sum_{i=1}^{n+1} \nu_i(r) \big( L(p_i) - L(r_i) \big) = \frac{1}{\Lambda(r)} \sum_{i=1}^{n+1} \frac{L(p_i) - L(r_i)}{L'(r_i)}, \quad p, r \in \mathcal{S}^n. \tag{28}$$
It is easily checked, using (5), (6) and (7), that $\rho$ is actually a contrast function of the 1-conformally flat structure $(g, \nabla, \nabla^*)$.
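To make (24), (25) and (28) concrete, here is a minimal numerical sketch for a general representing function. It is illustrated with $L = \ln$ and the transversal field $\xi_i = -|L(p_i)|$ mentioned in Remark 1; the function names and sample points are assumptions of this example.

```python
import math


def conormal(p, dL, xi):
    """Return (Lambda, nu): Lambda = sum_i xi_i / L'(p_i), nu_i = 1 / (Lambda L'(p_i))."""
    lam = sum(x / dL(u) for x, u in zip(xi, p))
    nu = [1.0 / (lam * dL(u)) for u in p]
    return lam, nu


def geometric_divergence(p, r, L, dL, xi_of):
    """rho(p, r) = sum_i nu_i(r) (L(p_i) - L(r_i)), cf. (28)."""
    _, nu_r = conormal(r, dL, xi_of(r))
    return sum(n * (L(a) - L(b)) for n, a, b in zip(nu_r, p, r))


# Illustrative choice: L = ln, xi_i = -|L(p_i)|.
L = math.log
dL = lambda u: 1.0 / u
xi_of = lambda p: [-abs(L(u)) for u in p]

p = [0.5, 0.3, 0.2]
r = [0.2, 0.2, 0.6]

lam_r, nu_r = conormal(r, dL, xi_of(r))
pairing = sum(n * x for n, x in zip(nu_r, xi_of(r)))   # <nu, xi>, normalized to 1 by (23)
rho_pr = geometric_divergence(p, r, L, dL, xi_of)
rho_rr = geometric_divergence(r, r, L, dL, xi_of)
print(lam_r, pairing, rho_pr, rho_rr)
```

Concavity of $L$ gives $L(p_i) - L(r_i) \le L'(r_i)(p_i - r_i)$, and $\Lambda(r) < 0$ then makes $\rho(p, r)$ non-negative, which the sketch reproduces for the sample points.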

4.2. Conformal Flattening Transformation

As described in the preliminary section, by 1-conformal flatness there exists a positive function, i.e., a conformal factor $\sigma$, that relates $(g, \nabla, \nabla^*)$ with a dually flat structure $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ via the conformal transformation (10), (11) and (12). A contrast function $\tilde{\rho}$ that induces $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ is given as the conformal divergence
$$\tilde{\rho}(p, r) = \sigma(r) \rho(p, r), \quad p, r \in \mathcal{S}^n, \tag{29}$$
obtained from the geometric divergence $\rho$ in (28).
For an arbitrary function $L$ within our setting given by the four assumptions, we prove that we can construct a dually flat structure $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ by choosing the conformal factor $\sigma$ carefully. Hereafter, we call this transformation conformal flattening.
Define
$$Z(p) := \sum_{i=1}^{n+1} \nu_i(p) = \frac{1}{\Lambda(p)} \sum_{i=1}^{n+1} \frac{1}{L'(p_i)},$$
then it is negative because each $\nu_i(p)$ is negative. The conformal divergence of $\rho$ with respect to the conformal factor $\sigma(r) := -1/Z(r)$ is
$$\tilde{\rho}(p, r) = -\frac{1}{Z(r)} \rho(p, r).$$
Theorem 1.
If the conformal factor is $\sigma = -1/Z$, then the information geometric structure $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ on $\mathcal{S}^n$ that is transformed from the 1-conformally flat structure $(g, \nabla, \nabla^*)$ via (10), (11) and (12) is dually flat. Furthermore, the conformal divergence $\tilde{\rho}$ that induces $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ on $\mathcal{S}^n$ is canonical, where the Legendre conjugate potential functions and coordinate systems are explicitly given by
$$\theta^i(p) = x_i(p) - x_{n+1}(p) = L(p_i) - L(p_{n+1}), \quad i = 1, \ldots, n, \tag{30}$$
$$\eta_i(p) = P_i(p) := \frac{\nu_i(p)}{Z(p)} = \frac{1/L'(p_i)}{\sum_{k=1}^{n+1} 1/L'(p_k)}, \quad i = 1, \ldots, n, \tag{31}$$
$$\psi(p) = -x_{n+1}(p) = -L(p_{n+1}), \tag{32}$$
$$\varphi(p) = \frac{1}{Z(p)} \sum_{i=1}^{n+1} \nu_i(p) x_i(p) = \sum_{i=1}^{n+1} P_i(p) L(p_i). \tag{33}$$
Proof.
Using the given relations, we first show that the conformal divergence $\tilde{\rho}$ is the canonical divergence for $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$:
$$\tilde{\rho}(p, r) = -\frac{1}{Z(r)} \langle \nu(r), f(p) - f(r) \rangle = \langle P(r), f(r) - f(p) \rangle = \sum_{i=1}^{n+1} P_i(r) \big( x_i(r) - x_i(p) \big) = \sum_{i=1}^{n+1} P_i(r) x_i(r) - \sum_{i=1}^{n} P_i(r) \big( x_i(p) - x_{n+1}(p) \big) - \left( \sum_{i=1}^{n+1} P_i(r) \right) x_{n+1}(p) = \varphi(r) - \sum_{i=1}^{n} \eta_i(r) \theta^i(p) + \psi(p). \tag{34}$$
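Next, let us confirm that $\partial \psi / \partial \theta^i = \eta_i$.
Since $\theta^i(p) = L(p_i) + \psi(p)$, $i = 1, \ldots, n$, we have
$$p_i = E(\theta^i - \psi), \quad i = 1, \ldots, n+1,$$
by setting $\theta^{n+1} := 0$. Hence, we have
$$1 = \sum_{i=1}^{n+1} E(\theta^i - \psi).$$
Differentiating by $\theta^j$, we obtain
$$0 = \frac{\partial}{\partial \theta^j} \sum_{i=1}^{n+1} E(\theta^i - \psi) = \sum_{i=1}^{n+1} E'(\theta^i - \psi) \left( \delta^i_j - \frac{\partial \psi}{\partial \theta^j} \right) = E'(x_j) - \left( \sum_{i=1}^{n+1} E'(x_i) \right) \frac{\partial \psi}{\partial \theta^j}.$$
This implies that
$$\frac{\partial \psi}{\partial \theta^j} = \frac{E'(x_j)}{\sum_{i=1}^{n+1} E'(x_i)} = \eta_j.$$
Together with (34) and this relation, $\varphi$ is confirmed to be the Legendre conjugate of $\psi$.
The dual relation $\partial \varphi / \partial \eta_i = \theta^i$ follows automatically from the property of the Legendre transform. □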
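The formulas (30)-(33) and the identity (34) can be checked numerically for a given $L$. The sketch below is illustrative only: it assumes $L(t) = t^{1-q}/(1-q)$ with $q = 0.5$ and two arbitrary points, and the function names are hypothetical. Note that no transversal field $\xi$ enters the computation.

```python
def flatten(p, L, dL):
    """Return (theta, eta, psi, phi) of Theorem 1 at p; only L and L' are needed."""
    inv = [1.0 / dL(u) for u in p]
    total = sum(inv)
    P = [v / total for v in inv]                 # P_i = nu_i / Z, cf. (31)
    theta = [L(u) - L(p[-1]) for u in p[:-1]]    # (30)
    eta = P[:-1]                                 # (31), first n components
    psi = -L(p[-1])                              # (32)
    phi = sum(Pi * L(u) for Pi, u in zip(P, p))  # (33)
    return theta, eta, psi, phi


def conformal_divergence(p, r, L, dL):
    """rho~(p, r) = sum_i P_i(r) (L(r_i) - L(p_i)), the second form in (34)."""
    inv = [1.0 / dL(u) for u in r]
    total = sum(inv)
    return sum((v / total) * (L(b) - L(a)) for v, a, b in zip(inv, p, r))


q = 0.5                                          # illustrative value
L = lambda t: t ** (1 - q) / (1 - q)
dL = lambda t: t ** (-q)

p, r = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]
theta_p, _, psi_p, _ = flatten(p, L, dL)
_, eta_r, _, phi_r = flatten(r, L, dL)

# Canonical form: phi(r) - sum_i eta_i(r) theta^i(p) + psi(p)
canonical = phi_r - sum(e * t for e, t in zip(eta_r, theta_p)) + psi_p
direct = conformal_divergence(p, r, L, dL)
print(canonical, direct)
```

The two values coincide up to rounding, and both are positive for $p \neq r$.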
The following corollary is straightforward because all the quantities in the theorem depend only on $L$:
Corollary 1.
Under the assumptions, the dually flat structure $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ on $\mathcal{S}^n$, obtained by the above conformal flattening, does not depend on the choice of the transversal vector $\xi$.
Remark 2.
Note that the conformal metric is given by $\tilde{g} = -g/Z$ and is positive definite. Furthermore, the relation (12) means that the dual affine connections $\nabla^*$ and $\tilde{\nabla}^*$ are projectively (or $-1$-conformally) equivalent [15,16]. Hence, $\nabla^*$ is projectively flat. Furthermore, the above corollary implies that the realized affine connection $\nabla$ is also projectively equivalent to the flat connection $\tilde{\nabla}$ if we use the centro-affine immersion, i.e., $\xi_i = -L(p_i)$ [15,16]. See Proposition 3 for an application of projective equivalence of affine connections.
Remark 3.
In our setting, conformal flattening is geometrically regarded as normalization of the conormal vector $\nu$. Hence, the dual coordinates $\eta_i(p) = P_i(p)$ can be interpreted as a generalization of the escort probability [10,19] (see the following example). Similarly, $\psi$ and $-\varphi$ might be seen as the associated Massieu function and entropy, respectively.
Remark 4.
While the immersion $f$ is composed of a representing function $L$ under Assumption 2, the corresponding $M$ of a single variable does not generally exist for $(g, \nabla, \nabla^*)$ or for $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$. From the expressions of the Riemannian metrics $g$ in (27) and $\tilde{g} = -g/Z$, we see that the counterparts of the representing functions $M(p_i)$ would be, respectively, $-\nu_i(p)$ and $P_i(p)$, but note that they are multi-variable functions of $p = (p_i)$.

4.3. Examples

If we take $L$ to be the logarithmic function $L(t) = \ln(t)$, then the conformally flattened geometry immediately defines the standard dually flat structure $(g^F, \nabla^{(1)}, \nabla^{(-1)})$ on the simplex $\mathcal{S}^n$. We see that $-\varphi(p)$ is the entropy, i.e., $\varphi(p) = \sum_{i=1}^{n+1} p_i \ln p_i$, and the conformal divergence is the KL divergence (relative entropy), i.e., $\tilde{\rho}(p, r) = D^{(KL)}(r \| p) = \sum_{i=1}^{n+1} r_i (\ln r_i - \ln p_i)$.
Next, let the affine immersion $(f, \xi)$ be defined by the following $L$ and $\xi$:
$$L(t) := \frac{1}{1-q} t^{1-q}, \qquad x_i(p) = \frac{1}{1-q} (p_i)^{1-q},$$
and
$$\xi_i(p) = -q(1-q)\, x_i(p),$$
with $0 < q$ and $q \neq 1$. We see that the immersion is centro-affine, scaled by the constant factor $q(1-q)$. Then, the immersion realizes the alpha-structure $(g^F, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$ on $\mathcal{S}^n$ with $q = (1+\alpha)/2$. The geometric divergence is the alpha-divergence, i.e.,
$$\rho(p, r) = \frac{4}{1-\alpha^2} \left\{ 1 - \sum_{i=1}^{n+1} (p_i)^{(1-\alpha)/2} (r_i)^{(1+\alpha)/2} \right\}.$$
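Following the procedure of conformal flattening described above, we have [17]
$$\Psi(x) = \sum_{i=1}^{n+1} \big( (1-q) x_i \big)^{1/(1-q)}, \qquad \Lambda(p) = -q \ \text{(constant)},$$
$$\nu_i(p) = -\frac{1}{q} (p_i)^q, \qquad \sigma(p) = -\frac{1}{Z(p)} = \frac{q}{\sum_{k=1}^{n+1} (p_k)^q},$$
and obtain a dually flat structure $(\tilde{g}^F, \tilde{\nabla}, \tilde{\nabla}^*)$ via the formulas in Theorem 1:
$$\eta_i = P_i = \frac{(p_i)^q}{\sum_{k=1}^{n+1} (p_k)^q}, \qquad \theta^i = \frac{1}{1-q} (p_i)^{1-q} - \frac{1}{1-q} (p_{n+1})^{1-q} = \ln_q(p_i) + \psi(p),$$
$$\psi(p) = -\ln_q(p_{n+1}), \qquad \varphi(p) = \ln_q\!\left( \frac{1}{\exp_q(S_q(p))} \right), \qquad \tilde{g}^F = -\frac{1}{Z(p)} g^F.$$
Here, $\ln_q$ and $S_q(p)$ are the $q$-logarithmic function and the Tsallis entropy [10], respectively, defined by
$$\ln_q(t) = \frac{t^{1-q} - 1}{1-q}, \qquad S_q(p) = \frac{\sum_{i=1}^{n+1} (p_i)^q - 1}{1-q},$$
and $\exp_q$ denotes the inverse function of $\ln_q$. The potentials are written here with $\ln_q$, which differs from $L$ only by the additive constant $-1/(1-q)$; by Remark 1, this does not change the geometry. Note that the escort probability appears as the dual coordinate $\eta_i$.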
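The closed forms of this example can be cross-checked numerically. The sketch below assumes $q = 0.7$ (so $\alpha = 2q - 1 = 0.4$) and two arbitrary points; the function names are hypothetical. It compares $\varphi(p) = \sum_i P_i(p) \ln_q(p_i)$ with $\ln_q(1/\exp_q(S_q(p)))$, and the conformal divergence with $\sigma(r)$ times the alpha-divergence.

```python
def ln_q(t, q):
    return (t ** (1 - q) - 1) / (1 - q)


def exp_q(s, q):
    """Inverse of ln_q, valid when 1 + (1 - q) s > 0."""
    return (1 + (1 - q) * s) ** (1 / (1 - q))


def tsallis_entropy(p, q):
    return (sum(u ** q for u in p) - 1) / (1 - q)


def escort(p, q):
    w = [u ** q for u in p]
    s = sum(w)
    return [x / s for x in w]


def alpha_divergence(p, r, alpha):
    s = sum(a ** ((1 - alpha) / 2) * b ** ((1 + alpha) / 2) for a, b in zip(p, r))
    return 4 / (1 - alpha ** 2) * (1 - s)


q = 0.7
alpha = 2 * q - 1
p, r = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]

# Potential phi: sum_i P_i ln_q(p_i) versus the closed form with the Tsallis entropy.
phi_sum = sum(Pi * ln_q(u, q) for Pi, u in zip(escort(p, q), p))
phi_closed = ln_q(1 / exp_q(tsallis_entropy(p, q), q), q)

# Conformal divergence versus sigma(r) * alpha-divergence, sigma(r) = q / sum_k r_k^q.
rho_tilde = sum(Pi * (ln_q(b, q) - ln_q(a, q)) for Pi, a, b in zip(escort(r, q), p, r))
sigma_r = q / sum(u ** q for u in r)
scaled_alpha_div = sigma_r * alpha_divergence(p, r, alpha)
print(phi_sum, phi_closed, rho_tilde, scaled_alpha_div)
```

Both pairs of values agree up to rounding for the chosen points.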

5. An Application to Gradient Flows on $\mathcal{S}^n$

Recall the replicator flow on the simplex $\mathcal{S}^n$ for given functions $f_i(p)$, defined by
$$\dot{p}_i = p_i \big( f_i(p) - \bar{f}(p) \big), \quad i = 1, \ldots, n+1, \qquad \bar{f}(p) := \sum_{i=1}^{n+1} p_i f_i(p), \tag{35}$$
which is extensively studied in evolutionary game theory. It is known [20] (Chapter 16) that
(i) the solution to (35) is the gradient flow that maximizes a function $V(p)$ satisfying
$$f_i = \frac{\partial V}{\partial p_i}, \quad i = 1, \ldots, n+1, \tag{36}$$
with respect to the Shahshahani metric $g^S$ (see below),
(ii) the KL divergence is a local Lyapunov function for an equilibrium called the evolutionarily stable state (ESS) for the case of $f_i(p) = \sum_{j=1}^{n+1} a_{ij} p_j$ with $(a_{ij}) \in \mathbb{R}^{(n+1) \times (n+1)}$.
The Shahshahani metric $g^S$ is defined on the positive orthant $\mathbb{R}_+^{n+1}$ by
$$g^S_{ij}(p) = \frac{\sum_{k=1}^{n+1} p_k}{p_i} \delta_{ij}, \quad i, j = 1, \ldots, n+1.$$
Note that the Shahshahani metric induces the Fisher metric $g^F$ on $\mathcal{S}^n$. Further, the KL divergence is the canonical divergence [2] of $(g^F, \nabla^{(1)}, \nabla^{(-1)})$. Thus, the replicator dynamics (35) are closely related to the standard dually flat structure $(g^F, \nabla^{(1)}, \nabla^{(-1)})$, which is associated with exponential and mixture families of probability distributions. In addition, investigation of the flow is also important from the viewpoint of statistical physics governed by the Boltzmann–Gibbs distributions when we choose $V(p)$ as various physical quantities, e.g., free energy or entropy.
Similarly, when we consider various Legendre relations deformed by $L$, it would be of interest to investigate gradient flows on $\mathcal{S}^n$ for a dually flat structure $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ or a 1-conformally flat structure $(g, \nabla, \nabla^*)$. Since $g$ and $\tilde{g}$ can be naturally extended to $\mathbb{R}_+^{n+1}$ in diagonal form (we use the same notation for brevity),
$$g_{ij}(p) = \frac{1}{\Lambda(p)} \frac{L''(p_i)}{L'(p_i)} \delta_{ij}, \qquad \tilde{g}_{ij}(p) = -\frac{1}{Z(p)} g_{ij}(p), \quad i, j = 1, \ldots, n+1, \tag{37}$$
from (27), we can define two gradient flows for $V(p)$ on $\mathcal{S}^n$. One is the gradient flow for $g$, which is
$$\dot{p}_i = g_{ii}^{-1} \big( f_i - \bar{f}^H \big), \qquad \bar{f}^H(p) := \sum_{k=1}^{n+1} H_k(p) f_k(p), \qquad H_i(p) := \frac{g_{ii}^{-1}(p)}{\sum_{k=1}^{n+1} g_{kk}^{-1}(p)}, \tag{38}$$
for $i = 1, \ldots, n+1$. It is verified that $\dot{p}$ is tangent to $\mathcal{S}^n$, i.e., $\dot{p} \in T_p \mathcal{S}^n$, and that it is the gradient of $V$, i.e.,
$$g(X, \dot{p}) = \sum_{i=1}^{n+1} f_i X_i - \bar{f}^H \sum_{i=1}^{n+1} X_i = \sum_{i=1}^{n+1} \frac{\partial V}{\partial p_i} X_i, \quad \forall X = (X_i) \in \mathcal{X}(\mathcal{S}^n).$$
In the same way, the other one for $\tilde{g}$ is defined by
$$\dot{p}_i = \tilde{g}_{ii}^{-1} \big( f_i - \bar{f}^H \big), \qquad \bar{f}^H(p) := \sum_{k=1}^{n+1} H_k(p) f_k(p), \quad i = 1, \ldots, n+1. \tag{39}$$
Note that both flows reduce to (35) when $L = \ln$.
From (37), the following consequence is immediate:
Proposition 1.
The trajectories of the gradient flows (38) and (39) starting from the same initial point coincide, while the velocities of their time evolutions differ by the factor $-Z(p)$.
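The two velocity fields (38) and (39) can be compared directly. In the following sketch, the pay-off values `f`, the sample point and the function names are hypothetical; $L = \ln$ and $\xi_i = -|L(p_i)|$ are chosen only for illustration. It checks that the $\tilde{g}$-flow reduces to the replicator velocity for $L = \ln$ and that the two velocities differ by the scalar $-Z(p)$.

```python
import math


def g_inv(p, dL, d2L, xi):
    """Diagonal of g^{-1} from (37): Lambda * L'(p_i) / L''(p_i)."""
    lam = sum(x / dL(u) for x, u in zip(xi, p))
    return [lam * dL(u) / d2L(u) for u in p]


def g_tilde_inv(p, dL, d2L):
    """Diagonal of g~^{-1}: -Z Lambda L'/L'' with Z Lambda = sum_k 1/L'(p_k)."""
    s = sum(1.0 / dL(u) for u in p)
    return [-s * dL(u) / d2L(u) for u in p]


def flow_velocity(f, ginv):
    """p_dot_i = ginv_i (f_i - fbar_H), with H_i = ginv_i / sum_k ginv_k."""
    total = sum(ginv)
    fbar = sum(gi / total * fi for gi, fi in zip(ginv, f))
    return [gi * (fi - fbar) for gi, fi in zip(ginv, f)]


def replicator_velocity(p, f):
    fbar = sum(pi * fi for pi, fi in zip(p, f))
    return [pi * (fi - fbar) for pi, fi in zip(p, f)]


p = [0.5, 0.3, 0.2]
f = [1.0, -0.5, 0.25]                      # hypothetical pay-off values f_i(p)

dL = lambda u: 1.0 / u                     # L = ln
d2L = lambda u: -1.0 / u ** 2
xi = [-abs(math.log(u)) for u in p]

v_g = flow_velocity(f, g_inv(p, dL, d2L, xi))          # flow (38)
v_gt = flow_velocity(f, g_tilde_inv(p, dL, d2L))       # flow (39)
v_rep = replicator_velocity(p, f)                      # flow (35)

lam = sum(x / dL(u) for x, u in zip(xi, p))
minus_Z = -sum(1.0 / dL(u) for u in p) / lam           # the factor -Z(p) > 0
print(v_g, v_gt, v_rep, minus_Z)
```

Because the weights $H_i$ are invariant under rescaling of the metric, $\bar{f}^H$ is the same in both flows, which is why the velocities are parallel.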
Taking into account the example of the alpha-geometry and its conformally flattened counterpart given in Section 4.3, the following result shown in [18] can be regarded as a corollary of the above proposition:
Corollary 2.
The trajectories of the gradient flow (39) with respect to the conformal metric $\tilde{g}$ for $L(t) = t^{1-q}/(1-q)$ coincide with those of the replicator flow (35), while the velocities of their time evolutions differ by the factor $-Z(p)$.
Next, we consider in particular the case where $V(p)$ is a potential function or a divergence. For a gradient flow on a manifold equipped with a dually flat structure $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$, the following result is known:
Proposition 2.
[22] Consider the potential function $\psi(p)$ and the canonical divergence $\tilde{\rho}(p, r)$ of $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ for an arbitrary prefixed point $r$. The gradient flows for $V(p) = \pm \psi(p)$ and $V(p) = \pm \rho(p, r)$ follow $\tilde{\nabla}^*$-geodesic curves.
As described in Remark 2, $\nabla^*$ and $\tilde{\nabla}^*$ are projectively equivalent. One geometrically interesting property of projective equivalence is that $\nabla^*$- and $\tilde{\nabla}^*$-geodesic curves coincide up to their parametrizations (i.e., a curve is $\nabla^*$-pregeodesic if and only if it is $\tilde{\nabla}^*$-pregeodesic) [15] (p.17). Combining this fact with Propositions 1 and 2, we see that the following result holds:
Proposition 3.
Let $r \in \mathcal{S}^n$ be an arbitrary prefixed point. The gradient flows (38) for $V(p) = \pm \rho(p, r) = \pm \tilde{\rho}(p, r) / \sigma(r)$, $V(p) = \pm \tilde{\rho}(p, r)$ and $V(p) = \pm \psi(p)$ follow $\tilde{\nabla}^*$-geodesic curves.
Finally, we demonstrate another aspect of the flow (39). Let us consider in particular the following functions $f_i$:
$$f_i(p) := \frac{L''(p_i)}{(L'(p_i))^2} \sum_{j=1}^{n+1} a_{ij} P_j(p), \qquad a_{ij} = -a_{ji} \in \mathbb{R}, \quad i, j = 1, \ldots, n+1. \tag{40}$$
Note that the $f_i$ are not integrable, i.e., a non-trivial $V$ satisfying (36) does not exist, because of the anti-symmetry of $a_{ij}$. Hence, for this case, (39) is no longer a gradient flow. However, we can prove the following result:
Theorem 2.
Consider the flow (39) with the functions $f_i$ defined in (40) and assume that there exists an equilibrium $r \in \mathcal{S}^n$ for the flow. Then, $\rho(p, r)$ and $\tilde{\rho}(p, r)$ are first integrals (conserved quantities) of the flow.
Proof.
By substituting (40) into $\bar{f}^H(p)$ in (39) and using the expression of $\tilde{g}_{ii}$ in (37), we have
$$\bar{f}^H(p) = \frac{1}{\sum_{k=1}^{n+1} L'(p_k)/L''(p_k)} \sum_{i=1}^{n+1} \frac{1}{L'(p_i)} \sum_{j=1}^{n+1} a_{ij} P_j(p).$$
By the relation $E'(x_i) = 1/L'(p_i)$ and (31), it holds that
$$\sum_{i=1}^{n+1} \frac{1}{L'(p_i)} \sum_{j=1}^{n+1} a_{ij} P_j(p) = \frac{1}{\sum_{k=1}^{n+1} E'(x_k)} \sum_{i=1}^{n+1} \sum_{j=1}^{n+1} a_{ij} E'(x_i) E'(x_j) = 0,$$
where the last equality follows from the anti-symmetry of $a_{ij}$. Hence, we see that $\bar{f}^H = 0$ and the flow (39) reduces to
$$\dot{p}_i = \tilde{g}_{ii}^{-1}(p) f_i(p), \quad i = 1, \ldots, n+1. \tag{41}$$
Since $r$ is an equilibrium point, we see from (40) that
$$\sum_{j=1}^{n+1} a_{ij} P_j(r) = 0, \quad i = 1, \ldots, n+1. \tag{42}$$
Then, using (34), (41) and (42), we have
$$\frac{d \tilde{\rho}(p, r)}{dt} = -\sum_{i=1}^{n+1} P_i(r) L'(p_i) \dot{p}_i = Z(p) \Lambda(p) \sum_{i=1}^{n+1} P_i(r) \frac{(L'(p_i))^2}{L''(p_i)} f_i(p) = Z(p) \Lambda(p) \sum_{i=1}^{n+1} P_i(r) \sum_{j=1}^{n+1} a_{ij} P_j(p) = -Z(p) \Lambda(p) \sum_{i=1}^{n+1} \sum_{j=1}^{n+1} \big( P_i(p) - P_i(r) \big) a_{ij} P_j(p) = -Z(p) \Lambda(p) \sum_{i=1}^{n+1} \sum_{j=1}^{n+1} \big( P_i(p) - P_i(r) \big) a_{ij} \big( P_j(p) - P_j(r) \big) = 0.$$
Thus, $\tilde{\rho}(p, r)$ is a first integral of the flow. It follows from the definition of the conformal divergence (29) that $\rho(p, r)$ is also a first integral of the flow, since $\sigma(r)$ does not depend on $p$. □
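The conservation law can be observed numerically. The sketch below integrates the reduced flow (41) with a classical fourth-order Runge-Kutta scheme for the illustrative choices $L(t) = t^{1-q}/(1-q)$ with $q = 0.5$, a rock-paper-scissors-type anti-symmetric matrix, and the uniform distribution as equilibrium. All names, the step size and the initial point are assumptions of this example, and the integrator is a generic numerical tool, not part of the paper's argument.

```python
def escort_like(p, dL):
    """P_i(p) = (1/L'(p_i)) / sum_k (1/L'(p_k)), cf. (31)."""
    inv = [1.0 / dL(u) for u in p]
    total = sum(inv)
    return [v / total for v in inv]


def velocity(p, A, dL, d2L):
    """Right-hand side of (41): p_dot_i = g~_ii^{-1}(p) f_i(p), with f_i from (40)."""
    P = escort_like(p, dL)
    s = sum(1.0 / dL(u) for u in p)            # equals Z(p) * Lambda(p)
    v = []
    for i, u in enumerate(p):
        AP_i = sum(a * Pj for a, Pj in zip(A[i], P))
        f_i = d2L(u) / dL(u) ** 2 * AP_i       # (40)
        ginv_i = -s * dL(u) / d2L(u)           # inverse of g~_ii in (37)
        v.append(ginv_i * f_i)
    return v


def rk4_step(p, dt, A, dL, d2L):
    k1 = velocity(p, A, dL, d2L)
    k2 = velocity([x + 0.5 * dt * k for x, k in zip(p, k1)], A, dL, d2L)
    k3 = velocity([x + 0.5 * dt * k for x, k in zip(p, k2)], A, dL, d2L)
    k4 = velocity([x + dt * k for x, k in zip(p, k3)], A, dL, d2L)
    return [x + dt * (a + 2 * b + 2 * c + d) / 6
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]


def rho_tilde(p, r, L, dL):
    """Conformal divergence sum_i P_i(r) (L(r_i) - L(p_i)), cf. (34)."""
    return sum(w * (L(b) - L(a)) for w, a, b in zip(escort_like(r, dL), p, r))


q = 0.5
L = lambda t: t ** (1 - q) / (1 - q)
dL = lambda t: t ** (-q)
d2L = lambda t: -q * t ** (-q - 1)

A = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]       # anti-symmetric; A P(r) = 0 at uniform r
r = [1 / 3, 1 / 3, 1 / 3]
p = [0.5, 0.3, 0.2]

rho0 = rho_tilde(p, r, L, dL)
drift = 0.0
for _ in range(2000):
    p = rk4_step(p, 1e-3, A, dL, d2L)
    drift = max(drift, abs(rho_tilde(p, r, L, dL) - rho0))
print(p, drift)
```

The state moves along the flow while the recorded drift of $\tilde{\rho}(p(t), r)$ stays at the level of integration and rounding error.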
Remark 5.
From Proposition 1, the same statement holds for the flow (38). Theorem 2 implies the known fact [20] that the KL divergence is a first integral of the replicator flow (35) with the function $f_i(p)$ in (40) defined by $L(t) = \ln t$ and $P_j(p) = p_j$.

6. Conclusions

We have considered two important aspects of information geometric structure, i.e., invariance and dual flatness, from the viewpoint of representing functions. As for the invariance of geometry, we have proved that a pair of representing functions that derives the alpha-structure is essentially unique. On the other hand, we have shown the explicit formula of conformal flattening that transforms 1-conformally flat structures on the simplex $\mathcal{S}^n$ realized by affine immersions to the corresponding dually flat structures. Finally, we have discussed several geometric properties of gradient flows associated with the two structures.
At present, our analysis is restricted to the probability simplex, i.e., the space of discrete probability distributions. For the continuous case, similar or related results have been obtained in [23,24] without using affine immersions. Extending the results of this paper to continuous probability spaces and clarifying their relations to the literature are left for future work.
Conformal flattening can also be applied to the computationally efficient construction of Voronoi diagrams with respect to the geometric divergences [18]. Exploring other possible applications would be of interest.

Acknowledgments

Part of the results is adapted and reprinted with permission from Springer Customer Service Centre GmbH (licence No: 4294160782766): Springer Nature, Geometric Science of Information LNCS 10589 Nielsen, F., Barbaresco, F., Eds., (Article Name:) On affine immersions of the probability simplex and their conformal flattening, (Author:) A. Ohara, (Copyright:) Springer International Publishing AG 2017 [25]. The author is partially supported by JSPS Grant-in-Aid (C) 15K04997.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Amari, S.I. Differential-Geometrical Methods in Statistics; Lecture Notes in Statistics Series 28; Springer: New York, NY, USA, 1985.
  2. Amari, S.I.; Nagaoka, H. Methods of Information Geometry; Translations of Mathematical Monographs Series 191; AMS & Oxford University Press: Oxford, UK, 2000.
  3. Naudts, J. Continuity of a class of entropies and relative entropies. Rev. Math. Phys. 2004, 16, 809.
  4. Eguchi, S. Information geometry and statistical pattern recognition. Sugaku Expos. 2006, 19, 197.
  5. Grünwald, P.D.; Dawid, A.P. Game theory, maximum entropy, minimum discrepancy, and robust Bayesian decision theory. Ann. Statist. 2004, 32, 1367.
  6. Fujisawa, H.; Eguchi, S. Robust parameter estimation with a small bias against heavy contamination. J. Multivar. Anal. 2008, 99, 2053.
  7. Naudts, J. The q-exponential family in statistical physics. Cent. Eur. J. Phys. 2009, 7, 405.
  8. Naudts, J. Generalized Thermostatics; Springer: Berlin, Germany, 2010.
  9. Ollila, E.; Tyler, D.; Koivunen, V.; Poor, V. Complex elliptically symmetric distributions: Survey, new results and applications. IEEE Trans. Sig. Proc. 2012, 60, 5597.
  10. Tsallis, C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World; Springer: Berlin, Germany, 2009.
  11. Zhang, J. Divergence Function, Duality, and Convex Analysis. Neural Comput. 2004, 16, 159.
  12. Wada, T.; Matsuzoe, H. Conjugate representations and characterizing escort expectations in information geometry. Entropy 2017, 19, 309.
  13. Shima, H. The Geometry of Hessian Structures; World Scientific: Singapore, 2007.
  14. Chentsov, N.N. Statistical Decision Rules and Optimal Inference; AMS: Providence, RI, USA, 1982.
  15. Nomizu, K.; Sasaki, T. Affine Differential Geometry; Cambridge University Press: Cambridge, UK, 1993.
  16. Kurose, T. On the divergences of 1-conformally flat statistical manifolds. Tohoku Math. J. 1994, 46, 427.
  17. Ohara, A.; Matsuzoe, H.; Amari, S.I. A dually flat structure on the space of escort distributions. J. Phys. Conf. Ser. 2010, 201, 012012.
  18. Ohara, A.; Matsuzoe, H.; Amari, S.I. Conformal geometry of escort probability and its applications. Mod. Phys. Lett. B 2012, 26, 1250063.
  19. Tsallis, C.; Mendes, M.S.; Plastino, A.R. The role of constraints within generalized nonextensive statistics. Physica A 1998, 261, 534.
  20. Hofbauer, J.; Sigmund, K. The Theory of Evolution and Dynamical Systems: Mathematical Aspects of Selection; Cambridge University Press: Cambridge, UK, 1988.
  21. Eguchi, S. Geometry of minimum contrast. Hiroshima Math. J. 1992, 22, 631.
  22. Fujiwara, A.; Amari, S.I. Gradient systems in view of information geometry. Physica D 1995, 80, 317.
  23. Amari, S.I.; Ohara, A.; Matsuzoe, H. Geometry of deformed exponential families: Invariant, dually-flat and conformal geometries. Physica A 2012, 391, 4308.
  24. Matsuzoe, H. Hessian structures on deformed exponential families and their conformal structures. Diff. Geo. Appl. 2014, 35, 323.
  25. Ohara, A. On affine immersions of the probability simplex and their conformal flattening. In Geometric Science of Information; Nielsen, F., Barbaresco, F., Eds.; Springer: Berlin, Germany, 2017.
