A Foliation by Deformed Probability Simplexes for Transition of α-Parameters

Abstract: This study considers dualistic structures of the probability simplex from the information geometry perspective. We investigate a foliation by deformed probability simplexes for the transition of α-parameters, not for a fixed α-parameter. We also describe the properties of extended divergences on the foliation when different α-parameters are defined on each of the various leaves.


Introduction
Because the Cauchy distribution and Student's t-distribution are q-Gaussians, the set of q-normal distributions is regarded as a typical q-exponential family and has been related to nonextensive statistical mechanics [1,2]. Sets of q-normal distributions and q-exponential families have been investigated from the information geometry perspective, sometimes in connection with nonextensive statistical mechanics and sometimes independently of it [3–10]. Deformed q-exponential families are defined using the deformed logarithm and reciprocal-deformed exponential functions. For instance, deformed q-exponential families have been used for studying escort distributions [11]. Their Hessian and conformal structures have also been investigated [12–14].
The current study considers a foliation by deformed probability simplexes representing sets of escort distributions, which are typical q-exponential families, for the continuous transition of α-parameters in information geometry. Previous studies on escort distributions treat a fixed α-parameter or compare several α-parameters. However, foliations and divergence decompositions in dually flat spaces using mixed parameterizations are crucial. Therefore, we investigate extended divergences on the foliation, setting a different α-parameter on each leaf.
First, we explain the dualistic structures, α-divergences, and the Tsallis relative entropy on the probability simplex. Next, we describe the divergences generated by affine immersions as level surfaces on the deformed probability simplexes corresponding to sets of escort distributions. We then define an extended divergence on a foliation by deformed probability simplexes. Finally, we propose a new decomposition of an extended divergence on the foliation.

Dualistic Structures and Divergences on the Probability Simplex
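Let $S^n$ be the $n$-dimensional probability simplex, i.e.,
$$S^n \equiv \left\{ p = (p_1, \dots, p_{n+1}) \;\middle|\; p_i > 0,\ \sum_{i=1}^{n+1} p_i = 1 \right\}, \qquad (1)$$
where $p_1, \dots, p_{n+1}$ are the probabilities of $n+1$ states. Let $\{\tilde p_1, \dots, \tilde p_n\}$ be an affine coordinate system on $S^n$, where $\tilde p_i \equiv p_i - p_{n+1}$ for $i = 1, \dots, n$, and
$$\partial_i \equiv \frac{\partial}{\partial \tilde p_i}, \quad i = 1, \dots, n, \qquad (2)$$
be a frame of a tangent vector field on $S^n$. The Fisher metric $g = (g_{ij})$ on $S^n$ is defined by where $\delta_{ij}$ is Kronecker's delta. We define an α-connection $\nabla^{(\alpha)}$ on $S^n$ by where $\delta^k_{ij} = 1$ if $i = j = k$, and $\delta^k_{ij} = 0$ otherwise. Then, the Levi-Civita connection $\nabla$ of $g$ coincides with $\nabla^{(0)}$. For $\alpha \in \mathbf{R}$, we have
$$X g(Y, Z) = g(\nabla^{(\alpha)}_X Y, Z) + g(Y, \nabla^{(-\alpha)}_X Z) \quad \text{for } X, Y, Z \in \mathcal{X}(S^n),$$
where $\mathcal{X}(S^n)$ is the set of all smooth tangent vector fields on $S^n$. Then, $\nabla^{(-\alpha)}$ is called the dual connection of $\nabla^{(\alpha)}$. For each $\alpha$, $\nabla^{(\alpha)}$ is torsion-free and $\nabla^{(\alpha)} g$ is symmetric. Therefore, the triple $(S^n, \nabla^{(\alpha)}, g)$ is a statistical manifold, and $(S^n, \nabla^{(-\alpha)}, g)$ is its dual statistical manifold [9,12,13]. For $\alpha \neq \pm 1$, an α-divergence $D^{(\alpha)}$ is defined by
$$D^{(\alpha)}(p, r) = \frac{4}{1-\alpha^2} \left( 1 - \sum_{i=1}^{n+1} p_i^{\frac{1-\alpha}{2}} r_i^{\frac{1+\alpha}{2}} \right), \quad p, r \in S^n.$$
For $q = (1-\alpha)/2$, it is known that
$$D^{(\alpha)}(p, r) = \frac{1}{q} K_q(p, r)$$
for the Tsallis relative entropy $K_q$ defined by
$$K_q(p, r) = -\sum_{i=1}^{n+1} p_i \ln_q \frac{r_i}{p_i},$$
where $\ln_q$ is the $q$-logarithmic function defined by $\ln_q x = (x^{1-q} - 1)/(1-q)$ [1,2]. The Tsallis relative entropy $K_q$ converges to the Kullback-Leibler divergence as $q \to 1$, because $\lim_{q \to 1} \ln_q x = \log x$. From the information-geometric viewpoint, the α-divergence $D^{(\alpha)}$ converges to the Kullback-Leibler divergence as $\alpha \to -1$.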
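The relation between the α-divergence and the Tsallis relative entropy can be checked numerically. The following Python sketch assumes the standard forms $D^{(\alpha)}(p,r) = \frac{4}{1-\alpha^2}\big(1 - \sum_i p_i^{(1-\alpha)/2} r_i^{(1+\alpha)/2}\big)$, $K_q(p,r) = -\sum_i p_i \ln_q(r_i/p_i)$, and $\ln_q x = (x^{1-q}-1)/(1-q)$. The function names and the sample points `p` and `r` are illustrative choices and do not come from the original text. Under these assumptions, $D^{(\alpha)} = K_q/q$ for $q = (1-\alpha)/2$, and $K_q$ approaches the Kullback-Leibler divergence as $q \to 1$.

```python
import numpy as np

def alpha_divergence(p, r, alpha):
    """alpha-divergence in the assumed standard form, valid for alpha != +-1."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    s = np.sum(p ** ((1 - alpha) / 2) * r ** ((1 + alpha) / 2))
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - s)

def ln_q(x, q):
    """Tsallis q-logarithm, valid for q != 1."""
    return (np.asarray(x, float) ** (1 - q) - 1.0) / (1 - q)

def tsallis_relative_entropy(p, r, q):
    """Tsallis relative entropy K_q(p, r) = -sum_i p_i ln_q(r_i / p_i)."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    return -np.sum(p * ln_q(r / p, q))

def kl_divergence(p, r):
    """Kullback-Leibler divergence sum_i p_i log(p_i / r_i)."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    return np.sum(p * np.log(p / r))

# Two illustrative points of the probability simplex S^2.
p = np.array([0.2, 0.3, 0.5])
r = np.array([0.4, 0.4, 0.2])

alpha = 0.2
q = (1 - alpha) / 2

lhs = alpha_divergence(p, r, alpha)          # D^(alpha)(p, r)
rhs = tsallis_relative_entropy(p, r, q) / q  # K_q(p, r) / q

# Gap between K_q and the Kullback-Leibler divergence for q close to 1.
kl_gap = abs(tsallis_relative_entropy(p, r, 1 - 1e-6) - kl_divergence(p, r))
```

The quantities `lhs` and `rhs` should agree up to floating-point error, and `kl_gap` should shrink as the value of `q` passed to `tsallis_relative_entropy` approaches 1.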

Deformed Probability Simplexes and Escort Distributions
For $n+1$ states $p_1, \dots, p_{n+1}$ on $S^n$ and $0 < q < 1$, if each probability $P(p_i)$ satisfies
$$P(p_i) = \frac{(p_i)^q}{\sum_{j=1}^{n+1} (p_j)^q}, \quad i = 1, \dots, n+1,$$
the probability distribution $P$ is called the escort distribution [1,2], where $(p_i)^q$ is $p_i$ raised to the power $q$. The dualistic structure of a set of escort distributions is realized via an affine immersion into $\mathbf{R}^{n+1}_+$ [12,13]. Let $f_q$ be the affine immersion of $S^n$ into $\mathbf{R}^{n+1}_+$ defined by where $\{\theta^1, \dots, \theta^{n+1}\}$ is the canonical coordinate system on $\mathbf{R}^{n+1}$. Then, the escort distribution $P$ is represented as follows: For a function $\psi_q$ on $\mathbf{R}^{n+1}_+$ defined by the Hessian matrix of the function $\psi_q$ is positive definite on $\mathbf{R}^{n+1}_+$. Then, $\psi_q$ induces the Hessian structure $(\mathbf{R}^{n+1}_+, D, h \equiv Dd\psi_q)$, where $D$ is the canonical flat affine connection [9,15]. By definition for the gradient vector field $\tilde E$ of $\psi_q$ on $\mathbf{R}^{n+1}_+$ defined by (cf. Theorem 2) [13,16,17]. In Equation (19), the induced affine connection $D^E$ is the restriction of $D$. The affine fundamental form $h^E$ is the restriction of $h$. The operator $S^E$ is called the shape operator. If the transversal connection form satisfies $\tau^E \equiv 0$, then $(f_q, E)$ is called an equiaffine immersion [18].
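As a small numerical illustration of the escort distribution, the following Python sketch computes $P(p_i) = (p_i)^q / \sum_j (p_j)^q$ for a sample point of the simplex. The function name `escort` and the sample values are hypothetical and serve only as an example.

```python
import numpy as np

def escort(p, q):
    """Escort distribution P(p_i) = p_i**q / sum_j p_j**q for 0 < q < 1."""
    w = np.asarray(p, float) ** q
    return w / w.sum()

# Illustrative point of the probability simplex S^2.
p = np.array([0.7, 0.2, 0.1])

P_half = escort(p, 0.5)        # escort distribution for q = 1/2
P_near_one = escort(p, 0.999)  # close to p itself, since q is close to 1
```

For $0 < q < 1$ the escort distribution is flatter than the original distribution, and it returns to the original distribution as $q \to 1$.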

Divergences Generated by Affine Immersions as Level Surfaces
Let $\tilde D$ be the canonical flat affine connection on an $(n+1)$-dimensional real affine space $A^{n+1}$. The following theorem is known on the level surfaces of a Hessian function.
Theorem 1 ([16]). Let $M$ be a simply connected $n$-dimensional level surface of $\phi$ on an $(n+1)$-dimensional Hessian domain $(\Omega, \tilde D, \tilde g = \tilde D d\phi)$ with a Riemannian metric $\tilde g$, and suppose that $n \geq 2$. If we consider $(\Omega, \tilde D, \tilde g)$ a flat statistical manifold, $(M, D, g)$ is a 1-conformally flat statistical submanifold of $(\Omega, \tilde D, \tilde g)$, where $D$ and $g$ denote the connection and the Riemannian metric on $M$ induced by $\tilde D$ and $\tilde g$, respectively.
The conormal immersion $w$ for an affine immersion $(v, \xi)$ satisfies $\langle w(b), \xi_b \rangle = 1$, where $b$ is a point on the surface and $\langle a, b \rangle$ is the pairing of $a \in A^*_{n+1}$ and $b \in A^{n+1}$. The next definition is given in relation to affine immersions and divergences.
Definition 1 ([19]). Let $(N, \nabla, h)$ be a 1-conformally flat statistical manifold realized by a nondegenerate affine immersion $(v, \xi)$ into $A^{n+1}$, and $w$ the conormal immersion for $v$. Then, the divergence $\rho_{conf}$ of $(N, \nabla, h)$ is defined by The definition of $\rho_{conf}$ is independent of the choice of a realization of $(N, \nabla, h)$.
This divergence $\rho_{conf}$ is referred to as the Kurose geometric divergence in affine geometry and as the Fenchel-Young divergence in the machine learning community [21]. The canonical divergence $\rho$ of a flat statistical manifold $(\Omega, \tilde D, \tilde g = \tilde D d\phi)$ is defined by where $(\tilde v^i)$, $(\tilde v_i = -\partial\phi/\partial\tilde v^i)$, and $\phi^*$ are the primal coordinate, the dual coordinate, and the Legendre transform of $\phi$, respectively [9]. The gradient mapping $\tilde w$ is defined by $\tilde w = -\partial\phi/\partial\tilde v \in A^*_{n+1}$. For a 1-conformally flat statistical submanifold $(M, D, g)$ of a Hessian domain $(\Omega, \tilde D, \tilde g)$, we denote by $\rho_{sub}$ the restriction of the divergence $\rho$. Then, the next theorem holds.
On the level surface $(f_q(S^n), D, h)$ in Section 3, the restriction of the canonical divergence of $(\mathbf{R}^{n+1}_+, D, h)$ coincides with the geometric divergence by Equation (20) for the affine immersion $(f_q, E)$. In addition, the pullback divergence to $S^n$ coincides with $D^{(\alpha)}$ and the Tsallis relative entropy $K_q$ [12].
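The link between a canonical (Bregman-type) divergence and its Fenchel-Young form can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: it uses the usual positive-gradient convention $\nabla\phi$ for the dual coordinates, which differs in sign from the convention $\tilde v_i = -\partial\phi/\partial\tilde v^i$ used in this section, and it takes the illustrative potential $\phi(x) = \sum_i (x_i \log x_i - x_i)$ on the positive orthant, which is not the function $\psi_q$ of Section 3. All function names are hypothetical.

```python
import numpy as np

def phi(x):
    """Illustrative convex potential on the positive orthant."""
    return np.sum(x * np.log(x) - x)

def grad_phi(x):
    """Dual coordinates in the positive-gradient convention (an assumption)."""
    return np.log(x)

def phi_star(y):
    """Legendre transform of phi: phi*(y) = sum_i exp(y_i)."""
    return np.sum(np.exp(y))

def bregman(a, b):
    """Bregman form: phi(a) - phi(b) - <grad phi(b), a - b>."""
    return phi(a) - phi(b) - np.dot(grad_phi(b), a - b)

def fenchel_young(a, b):
    """Fenchel-Young form: phi(a) + phi*(y) - <a, y> with y = grad phi(b)."""
    y = grad_phi(b)
    return phi(a) + phi_star(y) - np.dot(a, y)

# Two illustrative points of the probability simplex S^2.
a = np.array([0.2, 0.3, 0.5])
b = np.array([0.4, 0.4, 0.2])

d_bregman = bregman(a, b)
d_fy = fenchel_young(a, b)
kl = np.sum(a * np.log(a / b))  # Kullback-Leibler divergence of a from b
```

For this potential the two forms agree, and on the probability simplex both reduce to the Kullback-Leibler divergence, which corresponds to the limit $q \to 1$ discussed in Section 2.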

Extended Divergence on a Foliation by Deformed Probability Simplexes
Previous sections described the dualistic structure, the affine immersion, and the divergence for each fixed $q$. This section defines a divergence on a foliation by deformed probability simplexes $(f_q(S^n), D, h)$ for all $0 < q < 1$, and shows the divergence decomposition property. We apply the discussion on $A^*_{n+1}$ and $A^{n+1}$ in Section 4 to the one on $\mathbf{R}^{*}_{n+1\,+}$ and $\mathbf{R}^{n+1}_+$. Let $\rho_q$ be the divergence on $f_q(S^n)$ defined by the affine immersion $(f_q, E_q)$ by Equations (17) and (18).
Let $S_{fol} = \bigcup_{0<q<1} f_q(S^n) \subset \mathbf{R}^{n+1}_+$, which corresponds to a foliation $\mathcal{F} = \{ f_q(S^n) \mid 0 < q < 1 \}$. We define a function $\rho_{fol}$ on $S_{fol} \times S_{fol}$ as follows: We identify the dual space [12,17]. The next proposition holds.

Proposition 1.
The function $\rho_{fol}$ satisfies the following:

Proof. The Legendre transform of
. By Equation (21), (i) holds. The definition of $S^n$ induces that In addition, $f_{q(a)}(S^n)$ and $f_{q(b)}(S^n)$ are convex centro-affine hypersurfaces, and $f_{q(a)}(S^n)$ is more on the origin side than

Definition 2.
We refer to $\rho_{fol}$ defined by Equations (22) and (23) as an extended divergence on the foliation $S_{fol}$.
We define the extended dual divergence $\rho^*_{fol}$ of $\rho_{fol}$ as follows: for where $\psi^*_q$ is the Legendre transform of $\psi_q$ for $0 < q < 1$. Then, the following holds.

Proposition 2.
The functions $\rho_{fol}$ and $\rho^*_{fol}$ satisfy the following:
Proof. By the definition of the Legendre transform, we have
The extended divergence defined by Equations (22) and (23) is related to the duo Bregman (pseudo-)divergence, in which the parameters also define the convex functions [22]. Their relationship will be studied in future work.

Decomposition of an Extended Divergence
To state a decomposition theorem for the extended divergence, we first introduce flows that are orthogonal to each leaf of $\mathcal{F}$.
For the foliation $\mathcal{F} = \{ f_q(S^n) \mid 0 < q < 1 \}$, we consider the flow on $S_{fol}$ defined using the following equation: where a function $\eta_i$ on $S_{fol}$ takes the $i$-th component of the dual coordinate on $f_q(S^n)$ as in Equation (23) for each $0 < q < 1$. An integral curve of Equation (27) is orthogonal to $f_q(S^n)$ for each $q$ with respect to the pairing $\langle\,,\rangle$. The set of the integral curves forms the orthogonal foliation of $\mathcal{F}$, which we denote by $\mathcal{F}^\perp$. Translating into the primal coordinate system, we have the next equation on $S_{fol}$: The right-hand side of Equation (28) is calculated using Equations (17) and (18) for $\psi_q$. A leaf of $\mathcal{F}^\perp$ is an integral curve of the vector field $\tilde E$ that takes the value $\tilde E_q$ on $f_q(S^n)$ for each $q$.
The following theorem is about the decomposition of the extended divergence.
Theorem 3. Let $S^n$ be the probability simplex, and $(f_q(S^n), D, h_q = Dd\psi_q)$ the 1-conformally flat statistical manifold generated by the affine immersion $(f_q, E_q)$, where $f_q$ is defined as Let $a, b \in f_{q(a)}(S^n)$, $0 < q(a) < 1$, and $c \in S_{fol} \equiv \bigcup_{0<q<1} f_q(S^n)$. If there exists an orthogonal leaf $L^\perp \in \mathcal{F}^\perp$ that includes $b$ and $c$, we have where $\eta(\cdot)$ is the dual coordinate of $f_q(S^n)$ for each $q$.
Proof. From $a, b \in f_{q(a)}(S^n)$, it holds that $\psi_{q(a)}(a) = \psi_{q(b)}(b)$, where $q(b) = q(a)$. By the definition in Equations (22) and (23), we have
A decomposition similar to Equation (30) on a foliation of Hessian level surfaces is also available [17]. Theorem 3 generalizes the previous decomposition.
Finally, we describe the gradient flow on a leaf f q (S n ) using the extended divergence.

Theorem 4.
For a submanifold $(f_q(S^n), D, h_q)$ of $S_{fol}$, we denote by $(\theta^1, \dots, \theta^n)$ an affine coordinate system on $f_q(S^n)$ such that $Dd\theta^i = 0$, $i = 1, \dots, n$, and set $(h_q)_{ij} = h_q(\partial/\partial\theta^i, \partial/\partial\theta^j)$, The gradient flow on $f_q(S^n)$ defined by converges to the unique point $b \in L^\perp \cap f_q(S^n)$, where $a_\theta$ is a variable point parametrized by $(\theta^i)$.
and that where $a|_{t=0}$ is an initial point of Equation (31). Then, the gradient flow of Equation (31) converges to $b \in L^\perp \cap f_q(S^n)$ following a geodesic for the dual coordinate system.
A gradient flow similar to Equation (31) has been provided on a flat statistical submanifold [23]. A similar one on a Hessian level surface, i.e., a 1-conformally flat statistical submanifold, has been given in [17].

Conclusions
This study considers a foliation by deformed probability simplexes, which are typical q-exponential families, for the continuous transition of α-parameters in information geometry. We still need to provide details on the extended divergence and a natural definition of the foliation of q-exponentials.