1. Introduction
For instance, since the Cauchy distribution and the Student’s t-distribution are q-Gaussians, a set of q-normal distributions is considered a typical q-exponential family and has been related to nonextensive statistical mechanics [1,2]. Sets of q-normal distributions and q-exponential families have been investigated from the information geometry perspective, sometimes for nonextensive statistical mechanics and sometimes independently of it [3,4,5,6,7,8,9,10]. Deformed q-exponential families are defined using the deformed logarithm and reciprocal-deformed exponential functions. For instance, deformed q-exponential families have been used for studying escort distributions [11]. Their Hessian and conformal structures have also been investigated [12,13,14].
The current study considers a foliation by deformed probability simplexes representing sets of escort distributions, which are typical q-exponential families, for the continuous transition of α-parameters in information geometry. Previous studies on escort distributions treat a fixed α-parameter or compare several α-parameters. However, foliations and divergence decompositions in dually flat spaces using mixed parameterizations are crucial. Therefore, we investigate extended divergences on the foliation, setting a different α-parameter on each leaf.
First, we explain the dualistic structures, α-divergences, and the Tsallis relative entropy on the probability simplex. Next, we describe the divergences generated by affine immersions as level surfaces on the deformed probability simplexes corresponding to sets of escort distributions. We then define an extended divergence on a foliation by deformed probability simplexes. Finally, we propose a new decomposition of an extended divergence on the foliation.
2. Dualistic Structures and Divergences on the Probability Simplex
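Let $S_n$ be the $n$-dimensional probability simplex, i.e.,
$$S_n = \Big\{ p = (p_1, \dots, p_{n+1}) \;\Big|\; p_i > 0, \ \sum_{i=1}^{n+1} p_i = 1 \Big\},$$
where $p_1, \dots, p_{n+1}$ are the probabilities of $n+1$ states. Let $(\theta^1, \dots, \theta^n)$ be an affine coordinate system on $S_n$, where $\theta^i = p_i$ for $i = 1, \dots, n$, and $\{\partial_1, \dots, \partial_n\}$ be a frame of a tangent vector field on $S_n$. The Fisher metric $g = (g_{ij})$ on $S_n$ is defined by
$$g_{ij} = g(\partial_i, \partial_j) = \frac{1}{p_i}\,\delta_{ij} + \frac{1}{p_{n+1}}, \qquad i, j = 1, \dots, n,$$
where $\delta_{ij}$ is Kronecker’s delta. We define an $\alpha$-connection $\nabla^{(\alpha)}$ on $S_n$ by
$$\nabla^{(\alpha)}_{\partial_i} \partial_j = \sum_{k=1}^{n} \Gamma^{(\alpha)k}_{ij}\, \partial_k, \qquad \Gamma^{(\alpha)k}_{ij} = \frac{1+\alpha}{2} \Big( -\frac{1}{p_k}\,\delta^{k}_{ij} + p_k\, g_{ij} \Big),$$
where $\delta^{k}_{ij} = 1$ if $i = j = k$, and $\delta^{k}_{ij} = 0$ otherwise. Then, the Levi-Civita connection ∇ of $g$ coincides with $\nabla^{(0)}$. For $\alpha \in \mathbb{R}$, we have
$$X g(Y, Z) = g(\nabla^{(\alpha)}_X Y, Z) + g(Y, \nabla^{(-\alpha)}_X Z), \qquad X, Y, Z \in \mathcal{X}(S_n),$$
where $\mathcal{X}(S_n)$ is the set of all smooth tangent vector fields on $S_n$. Then, $\nabla^{(-\alpha)}$ is called the dual connection of $\nabla^{(\alpha)}$. For each $\alpha$, $\nabla^{(\alpha)}$ is torsion-free and $\nabla^{(\alpha)} g$ is symmetric. Therefore, the triple $(S_n, \nabla^{(\alpha)}, g)$ is a statistical manifold, and $(S_n, \nabla^{(-\alpha)}, g)$ is its dual statistical manifold [9,12,13].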
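The Fisher metric and the α-connection coefficients on the simplex can be checked numerically. The following is a minimal sketch, assuming the standard expressions $g_{ij} = \delta_{ij}/p_i + 1/p_{n+1}$ and $\Gamma^{(\alpha)k}_{ij} = \frac{1+\alpha}{2}(-\delta^{k}_{ij}/p_k + p_k g_{ij})$ in the coordinates $\theta^i = p_i$; the function names are hypothetical and chosen only for this illustration.

```python
from typing import List

Matrix = List[List[float]]


def fisher_metric(p: List[float]) -> Matrix:
    """Assumed Fisher metric g_ij = delta_ij / p_i + 1 / p_{n+1} in theta^i = p_i."""
    n = len(p) - 1
    return [[(1.0 / p[i] if i == j else 0.0) + 1.0 / p[n] for j in range(n)]
            for i in range(n)]


def fisher_inverse(p: List[float]) -> Matrix:
    """Closed-form inverse g^{ij} = p_i delta_ij - p_i p_j (Sherman-Morrison)."""
    n = len(p) - 1
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]


def alpha_christoffel(p: List[float], alpha: float, i: int, j: int, k: int) -> float:
    """Assumed coefficient (1+alpha)/2 * (-delta^k_ij / p_k + p_k * g_ij)."""
    g = fisher_metric(p)
    delta = 1.0 if i == j == k else 0.0
    return 0.5 * (1.0 + alpha) * (-delta / p[k] + p[k] * g[i][j])


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square-matrix product, enough for this small check."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


p = [0.2, 0.3, 0.5]  # a point of S_2 (three states)
product = matmul(fisher_metric(p), fisher_inverse(p))  # should be the identity
```

With these expressions, the coefficients vanish for α = −1, which reflects that the coordinates $\theta^i = p_i$ are affine for that connection.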
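For $\alpha \neq \pm 1$, an $\alpha$-divergence $D^{(\alpha)}$ is defined by
$$D^{(\alpha)}(p, r) = \frac{4}{1-\alpha^2} \Big\{ 1 - \sum_{i=1}^{n+1} p_i^{\frac{1-\alpha}{2}}\, r_i^{\frac{1+\alpha}{2}} \Big\}, \qquad p, r \in S_n.$$
For $q = \frac{1-\alpha}{2}$, it is known that
$$D^{(\alpha)}(p, r) = \frac{1}{q}\, K_q(p, r)$$
for the Tsallis relative entropy $K_q$, defined by
$$K_q(p, r) = -\sum_{i=1}^{n+1} p_i \ln_q \frac{r_i}{p_i},$$
where $\ln_q$ is the $q$-logarithmic function defined by
$$\ln_q x = \frac{x^{1-q} - 1}{1-q}$$
[1,2]. The Tsallis relative entropy $K_q$ converges to the Kullback–Leibler divergence as $q \to 1$, because $\lim_{q \to 1} \ln_q x = \ln x$. From the information-geometric viewpoint, the $\alpha$-divergence $D^{(\alpha)}(p, r)$ converges to the Kullback–Leibler divergence as $\alpha \to -1$.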
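The relation between the α-divergence and the Tsallis relative entropy can be verified numerically. The sketch below assumes the conventions $D^{(\alpha)}(p,r) = \frac{4}{1-\alpha^2}\{1 - \sum_i p_i^{(1-\alpha)/2} r_i^{(1+\alpha)/2}\}$, $K_q(p,r) = -\sum_i p_i \ln_q(r_i/p_i)$, and $q = (1-\alpha)/2$; the function names and the sample distributions are illustrative only.

```python
import math
from typing import Sequence


def q_log(x: float, q: float) -> float:
    """q-logarithm ln_q(x) = (x^(1-q) - 1) / (1 - q); ordinary log at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_relative_entropy(p: Sequence[float], r: Sequence[float], q: float) -> float:
    """K_q(p, r) = -sum_i p_i ln_q(r_i / p_i)."""
    return -sum(pi * q_log(ri / pi, q) for pi, ri in zip(p, r))


def alpha_divergence(p: Sequence[float], r: Sequence[float], alpha: float) -> float:
    """alpha-divergence for alpha != +-1 under the convention stated above."""
    s = sum(pi ** ((1.0 - alpha) / 2.0) * ri ** ((1.0 + alpha) / 2.0)
            for pi, ri in zip(p, r))
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - s)


def kl_divergence(p: Sequence[float], r: Sequence[float]) -> float:
    """Kullback-Leibler divergence sum_i p_i log(p_i / r_i)."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))


p = [0.2, 0.3, 0.5]
r = [0.4, 0.4, 0.2]
alpha = 0.2
q = (1.0 - alpha) / 2.0  # q = 0.4

lhs = alpha_divergence(p, r, alpha)
rhs = tsallis_relative_entropy(p, r, q) / q
near_kl = tsallis_relative_entropy(p, r, 1.0 - 1e-6)
```

Here `lhs` and `rhs` agree up to rounding, and `near_kl` approximates `kl_divergence(p, r)`, matching the limit $q \to 1$ (equivalently $\alpha \to -1$).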
3. Deformed Probability Simplexes and Escort Distributions
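For $n+1$ states with probabilities $p = (p_1, \dots, p_{n+1})$ on $S_n$ and a real parameter $q$, if each probability $P_i$ satisfies
$$P_i = \frac{(p_i)^q}{\sum_{j=1}^{n+1} (p_j)^q}, \qquad i = 1, \dots, n+1,$$
the probability distribution $P = (P_1, \dots, P_{n+1})$ is called the escort distribution [1,2], where $(p_i)^q$ is $p_i$ raised to the power $q$.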
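As a small illustration of the escort distribution, the following sketch assumes the usual form $P_i = (p_i)^q / \sum_j (p_j)^q$; the function name `escort_distribution` is hypothetical.

```python
from typing import List, Sequence


def escort_distribution(p: Sequence[float], q: float) -> List[float]:
    """Escort distribution P_i = p_i^q / sum_j p_j^q of a positive distribution p."""
    weights = [pi ** q for pi in p]
    total = sum(weights)
    return [w / total for w in weights]


p = [0.2, 0.3, 0.5]
escort_half = escort_distribution(p, 0.5)   # flattens p toward uniform
escort_one = escort_distribution(p, 1.0)    # reproduces p itself
escort_zero = escort_distribution(p, 0.0)   # uniform distribution
```

For $0 < q < 1$ the escort distribution is flatter than $p$, for $q = 1$ it coincides with $p$, and for $q > 1$ it sharpens the largest probabilities.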
The dualistic structure of a set of escort distributions is realized via the affine immersion into
[
12,
13]. Let
be the affine immersion of
into
defined by
where
is the canonical coordinate system on
. Then, the escort distribution
is represented as follows:
For a function
on
defined by
the image
is a level surface of
satisfying
. For
, the Hessian matrix of the function
is positive definite on
. Then,
induces the Hessian structure
, where
D is the canonical flat affine connection [
9,
15]. By definition
the tetrad
is a dually flat structure.
The submanifold structure of
induced by
coincides with the dualistic structure induced by the equiaffine immersion
, where
for the gradient vector field
of
on
defined by
(cf. Theorem 2) [
13,
16,
17]. In Equation (
19), the induced affine connection
is the restricted
D. The affine fundamental form
is the restricted
h. The operator
is called the shape operator. If the transversal connection form satisfies
,
, then the immersion is called an equiaffine immersion [18].
For the restricted
D and
h on
, we use the same notations. The pullback of
to
is
-conformally equivalent to
defined by Equations (3)–(5). In addition,
has a constant curvature
[
13].
4. Divergences Generated by Affine Immersions as Level Surfaces
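Let $\tilde{D}$ be the canonical flat affine connection on an $(n+1)$-dimensional real affine space $A^{n+1}$. The following theorem is known on the level surfaces of a Hessian function.
Theorem 1 ([16]). Let $M$ be a simply connected $n$-dimensional level surface of $\varphi$ on an $(n+1)$-dimensional Hessian domain $(\Omega, \tilde{D}, \tilde{g} = \tilde{D} d\varphi)$ with a Riemannian metric $\tilde{g}$, and suppose that $n \geq 2$. If we consider $(\Omega, \tilde{D}, \tilde{g})$ a flat statistical manifold, $(M, D, g)$ is a 1-conformally flat statistical submanifold of $(\Omega, \tilde{D}, \tilde{g})$, where $D$ and $g$ denote the connection and the Riemannian metric on $M$ induced by $\tilde{D}$ and $\tilde{g}$, respectively.
For $\alpha \in \mathbb{R}$, statistical manifolds $(N, \nabla, h)$ and $(N, \bar{\nabla}, \bar{h})$ are $\alpha$-conformally equivalent if there exists a function $\phi$ on $N$ such that
$$\bar{h}(X, Y) = e^{\phi}\, h(X, Y),$$
$$h(\bar{\nabla}_X Y, Z) = h(\nabla_X Y, Z) - \frac{1+\alpha}{2}\, d\phi(Z)\, h(X, Y) + \frac{1-\alpha}{2} \big\{ d\phi(X)\, h(Y, Z) + d\phi(Y)\, h(X, Z) \big\}$$
for $X, Y, Z \in \mathcal{X}(N)$. If $(N, \nabla, h)$ is 1-conformally equivalent to a flat statistical manifold $(N, \bar{\nabla}, \bar{h})$, $(N, \nabla, h)$ is called a 1-conformally flat statistical manifold. A statistical manifold $(N, \nabla, h)$ is 1-conformally flat iff the dual statistical manifold $(N, \nabla^{*}, h)$ is $(-1)$-conformally flat [19].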
In terms of affine geometry,
and
are
-conformally equivalent if and only if
and
are projectively equivalent [
17,
20].
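The link between conformal and projective equivalence can be made explicit by a short computation. The following derivation is a sketch under the assumption that α-conformal equivalence is taken in Kurose's standard form; the symbols $\bar{h}$, $\bar{\nabla}$, and $\phi$ are notation chosen for this illustration.

```latex
% Assumed definition of alpha-conformal equivalence (Kurose's form):
\begin{align*}
\bar{h}(X,Y) &= e^{\phi}\, h(X,Y), \\
h(\bar{\nabla}_X Y, Z) &= h(\nabla_X Y, Z)
  - \frac{1+\alpha}{2}\, d\phi(Z)\, h(X,Y)
  + \frac{1-\alpha}{2}\,\bigl\{ d\phi(X)\, h(Y,Z) + d\phi(Y)\, h(X,Z) \bigr\}.
\end{align*}
% Setting alpha = -1 removes the middle term:
\begin{align*}
h(\bar{\nabla}_X Y, Z)
  &= h\bigl(\nabla_X Y + d\phi(X)\, Y + d\phi(Y)\, X,\; Z\bigr)
  \quad \text{for all } Z, \\
\bar{\nabla}_X Y &= \nabla_X Y + d\phi(X)\, Y + d\phi(Y)\, X .
\end{align*}
% Setting alpha = +1 instead leaves only the middle term:
\begin{align*}
\bar{\nabla}_X Y &= \nabla_X Y - h(X,Y)\, \operatorname{grad}_h \phi .
\end{align*}
```

The case $\alpha = -1$ is the classical projective change of a connection by the closed 1-form $d\phi$. Under the same assumption, $\alpha$-conformal equivalence of two statistical manifolds corresponds to $(-\alpha)$-conformal equivalence of their duals, so the case $\alpha = 1$ amounts to a projective change of the dual connections.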
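The conormal immersion $w$ for an affine immersion $(v, \xi)$ satisfies that $\langle w(b), \xi_b \rangle = 1$ and $\langle w(b), v_{*} Y_b \rangle = 0$ for every tangent vector $Y_b$, where $b$ is a point on the surface, and $\langle \cdot, \cdot \rangle$ a pairing of the dual space $A^{*}_{n+1}$ and $A^{n+1}$. The next definition is given in relation to affine immersions and divergences.
Definition 1 ([19]). Let $(N, \nabla, h)$ be a 1-conformally flat statistical manifold realized by a non-degenerate affine immersion $(v, \xi)$ into $A^{n+1}$, and $w$ the conormal immersion for $v$. Then, the divergence $\rho_{\mathrm{conf}}$ of $(N, \nabla, h)$ is defined by
$$\rho_{\mathrm{conf}}(a, b) = \langle w(b), v(a) - v(b) \rangle, \qquad a, b \in N.$$
The definition is independent of the choice of a realization of $(N, \nabla, h)$.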
This divergence
is referred to as Kurose geometric divergence in affine geometry and as Fenchel–Young divergence in the machine learning community [
21]. The canonical divergence
of a flat statistical manifold
is defined by
where
,
, and
are the primal coordinate, the dual coordinate, and the Legendre transform of
, respectively [
9]. The gradient mapping
is defined by
. For a 1-conformally flat statistical submanifold
of a Hessian domain
, we denote by
the restriction of the divergence
. Then, the next theorem holds.
Theorem 2 ([
17])
. For a 1-conformally flat statistical submanifold of , two divergences and coincide. On the level surface
in
Section 3, the restricted divergence from the canonical divergence of
coincides with the geometric divergence by Equation (
20) for the affine immersion
. In addition, the pullback divergence to
coincides with
and the Tsallis relative entropy
[
12].
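Since the geometric divergence is also known as the Fenchel–Young divergence, a concrete instance may help. The sketch below evaluates the general form $F(\theta) + F^{*}(\eta) - \langle \theta, \eta \rangle$ for one convex function chosen purely for illustration (not taken from this paper): $F$ is the log-sum-exp function, whose conjugate on the probability simplex is the negative Shannon entropy. In that case the divergence equals the Kullback–Leibler divergence from $\eta$ to the softmax of $\theta$.

```python
import math
from typing import List, Sequence


def log_sum_exp(theta: Sequence[float]) -> float:
    """Convex potential F(theta) = log sum_i exp(theta_i), computed stably."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))


def neg_entropy(eta: Sequence[float]) -> float:
    """Conjugate F*(eta) = sum_i eta_i log eta_i on the probability simplex."""
    return sum(e * math.log(e) for e in eta if e > 0.0)


def fenchel_young(theta: Sequence[float], eta: Sequence[float]) -> float:
    """Fenchel-Young divergence F(theta) + F*(eta) - <theta, eta>."""
    pairing = sum(t * e for t, e in zip(theta, eta))
    return log_sum_exp(theta) + neg_entropy(eta) - pairing


def softmax(theta: Sequence[float]) -> List[float]:
    """Gradient of log-sum-exp, i.e. the dual point of theta."""
    z = log_sum_exp(theta)
    return [math.exp(t - z) for t in theta]


def kl_divergence(p: Sequence[float], r: Sequence[float]) -> float:
    """Kullback-Leibler divergence sum_i p_i log(p_i / r_i)."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))


theta = [0.5, -1.0, 2.0]
eta = [0.2, 0.3, 0.5]
fy = fenchel_young(theta, eta)
kl = kl_divergence(eta, softmax(theta))
```

The divergence vanishes exactly when $\eta$ is the gradient of $F$ at $\theta$, which parallels the role of the gradient mapping in the canonical divergence.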
5. Extended Divergence on a Foliation by Deformed Probability Simplexes
Previous sections described the dualistic structure, the affine immersion, and the divergence for each fixed
q. This section defines a divergence on a foliation by deformed probability simplexes
for all
, and shows the divergence decomposition property. We apply the discussion on
and
in
Section 4 to the one on
and
.
Let be the divergence on defined by the affine immersion by Equations (17) and (18).
Let
, which corresponds to a foliation
. We define a function
on
as follows:
where
We identify the dual space
with
. The
i-th component of the conormal immersion of
is
. Then, the dual coordinate of
b, denoted by
, satisfies that
[
12,
17]. The next proposition holds.
Proposition 1. The function satisfies that:
(i) If , , where .
(ii) In the case of , for , and if and only if .
Proof. The Legendre transform of
is defined by
+
. By Equation (
21), (i) holds. The definition of
induces that
If , it holds that . In addition, and are convex centro-affine hypersurfaces, and is more on the origin side than . Then, . Thus, (ii) holds. □
Definition 2. We refer to defined by Equations (22) and (23) as an extended divergence on the foliation .
We define the
extended dual divergence of
as follows:
where
is the Legendre transform of
for
. Then, the following holds.
Proposition 2. The functions and satisfy the following: Proof. By the definition of the Legendre transform, we have
□
The extended divergence using Equations (22) and (23) is related to the duo Bregman (pseudo-)divergence, in which the parameters also define the convex functions [22]. Their relationship will be studied in future work.
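For orientation, the duo Bregman pseudo-divergence can be sketched in one dimension. This sketch assumes the definition in [22] takes the form $B_{F_1, F_2}(x : y) = F_1(x) - F_2(y) - (x - y) F_2'(y)$ with $F_1 \geq F_2$; the two convex functions below are arbitrary examples, not the functions used in this paper.

```python
from typing import Callable

Func = Callable[[float], float]


def duo_bregman(f1: Func, f2: Func, grad_f2: Func, x: float, y: float) -> float:
    """Assumed form F1(x) - F2(y) - (x - y) * F2'(y), nonnegative when F1 >= F2."""
    return f1(x) - f2(y) - (x - y) * grad_f2(y)


def bregman(f: Func, grad_f: Func, x: float, y: float) -> float:
    """Ordinary Bregman divergence, recovered when both functions coincide."""
    return duo_bregman(f, f, grad_f, x, y)


def f_lower(x: float) -> float:
    """Example convex function F2(x) = x^2."""
    return x * x


def grad_f_lower(x: float) -> float:
    """Derivative of F2."""
    return 2.0 * x


def f_upper(x: float) -> float:
    """Example convex function F1(x) = x^2 + 1, which dominates F2."""
    return x * x + 1.0


x, y = 1.5, 0.5
duo_value = duo_bregman(f_upper, f_lower, grad_f_lower, x, y)
plain_value = bregman(f_lower, grad_f_lower, x, y)
self_value = duo_bregman(f_upper, f_lower, grad_f_lower, x, x)
```

In this example the duo value exceeds the ordinary Bregman divergence by the gap $F_1 - F_2 = 1$, and it does not vanish on the diagonal, which is why the term pseudo-divergence is used.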
6. Decomposition of an Extended Divergence
To state a decomposition theorem for an extended divergence, we first introduce flows that are orthogonal to each leaf of the foliation.
For the foliation
, we consider the flow on
defined using the following equation:
where a function
on
takes the
i-th component of the dual coordinate on
as Equation (
23) for each
. An integral curve of Equation (
27) is orthogonal to
for each
q with respect to the pairing
. The set of the integral curves becomes the orthogonal foliation of
. We denote it by
.
Translating into the primal coordinate system, we have the next equation on
:
The right-hand side of Equation (
28) is calculated using Equations (17) and (18) for
. A leaf of
is an integral curve of the vector field
that takes the value
on
for each
q.
The following theorem is about the decomposition of the extended divergence.
Theorem 3. Let be the probability simplex, and the 1-conformally flat statistical manifold generated by the affine immersion , where is defined as , , and . Let , , and . If there exists an orthogonal leaf , which includes b and c, we havewhere is the dual coordinate of for each q. Proof. From
, it holds that
, where
. By the definition in Equations (22) and (23), we have
□
A decomposition similar to Equation (30) on a foliation of Hessian level surfaces is also available [17]. Theorem 3 generalizes the previous decomposition.
Finally, we describe the gradient flow on a leaf using the extended divergence.
Theorem 4. For a submanifold of , we denote by an affine coordinate system on such that , , and set , . The gradient flow on defined byconverges to the unique point , where is a variable point parametrized by . Proof. By Theorem 3, there exists
such that
and
for any
. Equation (
31) is described by the dual coordinate system
on
as follows:
On
, from Proposition 1 (i),
coincides with the geometric divergence
, generated by the affine immersion
. The geometric divergence generates the dual coordinate
such that
,
, to be derived by
[
19]. Then, it holds that
and that
where
is an initial point of Equation (
31). Then, the gradient flow of Equation (
31) converges to
following a geodesic for the dual coordinate system. □
A gradient flow similar to Equation (31) has been provided on a flat statistical submanifold [23]. A similar flow on a Hessian level surface, i.e., a 1-conformally flat statistical submanifold, has been given in [17].
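The convergence along a geodesic of the dual coordinate system can be illustrated numerically. The sketch below rests on an assumption not confirmed by the text above: that, as in the flat case, the flow written in dual coordinates reduces to the linear system $d\eta_i/dt = -(\eta_i - \eta_i(c))$. The names `eta0` (initial point) and `eta_c` (limit point) are hypothetical.

```python
import math
from typing import List, Sequence


def flow_closed_form(eta0: Sequence[float], eta_c: Sequence[float], t: float) -> List[float]:
    """Solution eta(t) = eta_c + exp(-t) * (eta0 - eta_c) of the assumed linear flow."""
    return [c + math.exp(-t) * (e - c) for e, c in zip(eta0, eta_c)]


def flow_euler(eta0: Sequence[float], eta_c: Sequence[float], t: float,
               steps: int = 10000) -> List[float]:
    """Explicit Euler integration of d(eta)/dt = -(eta - eta_c) up to time t."""
    dt = t / steps
    eta = list(eta0)
    for _ in range(steps):
        eta = [e - dt * (e - c) for e, c in zip(eta, eta_c)]
    return eta


def collinearity_defect(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """2D cross product of (b - a) and (c - a); zero when the points are collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


eta0 = [1.0, -2.0]
eta_c = [0.25, 0.5]
exact = flow_closed_form(eta0, eta_c, 2.0)
approx = flow_euler(eta0, eta_c, 2.0)
late = flow_closed_form(eta0, eta_c, 30.0)
defect = collinearity_defect(eta0, exact, eta_c)
```

Under this assumption the trajectory is a straight segment in the dual affine coordinates, i.e., a geodesic for the dual connection, and it approaches `eta_c` exponentially fast.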
7. Conclusions
This study considers a foliation of probability simplexes, which are typical q-exponential families, for the continuous transition of α-parameters in information geometry. Further details on the extended divergence and a natural definition of the foliation of q-exponential families remain to be provided.