1. Introduction
The Cauchy and Student’s t-distributions are prototypical examples of q-Gaussian distributions. Thus, the set of q-normal distributions is regarded as a standard q-exponential family, underlining its strong connection to nonextensive statistical mechanics [1,2]. Both q-normal distributions and q-exponential families have been extensively studied from an information-geometric perspective, with applications ranging from nonextensive statistical mechanics to various other fields [3,4,5,6,7,8,9,10]. Furthermore, deformed q-exponential families, defined by deformed logarithm and exponential functions, have been applied to escort distributions [11], and their Hessian and conformal structures have been actively investigated [12,13,14].
Our previous work explored the foliation formed by deformed probability simplexes representing sets of escort distributions (a typical discrete case of
q-exponential families) in relation to the continuous variation of
-parameters within the information geometry framework [
15,
16]. For example, we defined an extended divergence and its decomposition on this foliation. We then extended this analysis to the foliation formed by escort distributions associated with continuous probability distributions [17]. In this study, we augment the mathematical proofs provided in our previous work and elucidate the foliation structure formed by escort distributions associated with normal distributions.
The paper is organized as follows. First, we introduce an α-divergence on a subset of an affine space identified with a foliation of multiplied exponential families and discuss the relationship with the Tsallis relative entropy. Next, we present the component expressions of the Riemannian metric, α-affine connections, and α-coordinates introduced from the α-divergence. We compare α-divergences for continuous and discrete probability distributions. We also describe the dualistic structures of affine immersions as level surfaces on the q-escort distributions of exponential families. Then, we define the extended divergence on a foliation formed by q-escort distributions of an exponential distribution. Finally, we propose the decomposition of the resulting extended divergence on the foliation.
2. α-Divergences on a Foliation of Multiplied Exponential Families
Following the characterization of discrete probability distributions in [12,13], we explain the α-divergences on a subset of an affine space by the foliation of probability distributions.
Let
be an exponential family defined by
where
is an element of the parameter space
,
is a function on
, and
are functions on the sample space
. Let
be a smooth submanifold of
, identified with a smooth submanifold of an
n-dimensional affine space
. Let
be the parameter space, which is the
u times of
, i.e.,
We consider an extended parameter space
, which is naturally identified with a smooth submanifold of
(
Figure 1). Defining the cone-like
using the parameter u is for the purpose of defining projections and immersions of an exponential family in later sections. In the following, all integrations are performed over the sample space
.
Definition 1
([
17])
. Let be the parameter space of an exponential family, and the u times of . For , an α-divergence on an extended parameter space is defined bywhere . , and . However, an
α-divergence
is defined only when the integral in Equation (3) exists. From this definition, the following properties hold.
Proposition 1
([
17])
. An α-divergence on an extended parameter space Θ
satisfies the following:- (i)
In the case of , i.e., , - (ii)
In the case of ,
The proof of Proposition 1 follows directly from the normalization condition:
Regarding the decomposition of the α-divergence, we obtain the following theorem.
Theorem 1.
For , an α-divergence on an extended parameter space satisfies thatif , , and . Proof. Each term is defined as follows:
Then, we have that
Thus, Theorem 1 holds. □
By Theorem 1, the next proposition holds.
Proposition 2.
For , let be an α-divergence on an extended parameter space . For , it holds thatFurthermore, if and only if , Proof. Let
and
. For
, it follows from Theorem 1 that
From Proposition 1 (i), we have
The
α-divergence
on the right-hand side of Equation (
9) is the divergence on the exponential family
and satisfies
Therefore,
, where equality holds if and only if
. Similarly, from Proposition 1 (ii), it holds that
The right-hand side of Equation (
11) is an
α-divergence on the half-line
. Thus,
, where equality holds if and only if
. Consequently, Proposition 2 holds. □
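The half-line step in the proof of Proposition 2 can be checked numerically. The following is a minimal sketch, assuming the standard α-divergence between two positive numbers u and v with α ≠ ±1; the function name `alpha_div_half_line` is hypothetical, and the normalization used in Equation (11) of the paper may differ. Under this assumption, nonnegativity follows from the weighted arithmetic-geometric mean inequality, with equality exactly when u = v.

```python
import math


def alpha_div_half_line(u: float, v: float, alpha: float) -> float:
    """Standard alpha-divergence between positive numbers u and v.

    Assumed form (not taken verbatim from the paper):
        4 / (1 - alpha^2) * (a*u + b*v - u**a * v**b),
    with a = (1 - alpha) / 2 and b = (1 + alpha) / 2, so that a + b = 1.
    """
    if u <= 0.0 or v <= 0.0:
        raise ValueError("u and v must be positive")
    if abs(alpha) == 1.0:
        raise ValueError("alpha = +-1 is the Kullback-Leibler limit, not handled here")
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha**2) * (a * u + b * v - u**a * v**b)


# Scan a small grid: the divergence is never negative and vanishes only on u == v.
grid = [0.25, 0.5, 1.0, 2.0, 4.0]
for alpha in (-0.5, 0.0, 0.5):
    smallest = min(alpha_div_half_line(u, v, alpha) for u in grid for v in grid)
    diagonal = max(abs(alpha_div_half_line(u, u, alpha)) for u in grid)
    print(f"alpha={alpha}: min over grid={smallest:.3e}, max on diagonal={diagonal:.3e}")
```

The grid scan is only an illustration; it does not replace the proof.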
At the end of this section, we describe the relationship between the
α-divergence and entropy parameterized by
q. For
, the divergence
restricted to the exponential family
satisfies
where
is the continuous Tsallis relative entropy defined by
with
, and
is the
q-logarithm defined by
The Tsallis relative entropy
converges to the Kullback–Leibler divergence as
because
. Similarly, the
α-divergence
converges to the Kullback–Leibler divergence as
on the exponential family
[
1,
2,
12,
13].
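The convergence to the Kullback–Leibler divergence can be illustrated numerically. The sketch below assumes the usual definitions, namely the q-logarithm ln_q(x) = (x^(1-q) - 1)/(1-q) and the Tsallis relative entropy K_q(p‖r) = -∫ p ln_q(r/p) dx; the paper's own equations may use a different normalization. The two normal densities, the integration range, and all function names are illustrative choices, not taken from the paper.

```python
import math


def q_log(x: float, q: float) -> float:
    """q-logarithm; reduces to the natural logarithm at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def trapezoid(f, lo: float, hi: float, n: int = 6000) -> float:
    """Composite trapezoidal rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h


def tsallis_relative_entropy(p, r, q: float, lo: float = -15.0, hi: float = 15.0) -> float:
    """K_q(p||r) = -int p ln_q(r/p) dx.

    Because p integrates to 1, this equals (1 - int p^q r^(1-q) dx) / (1 - q),
    which avoids dividing by tiny tail values of p.
    """
    overlap = trapezoid(lambda x: p(x) ** q * r(x) ** (1.0 - q), lo, hi)
    return (1.0 - overlap) / (1.0 - q)


def kl_normal(mu1: float, s1: float, mu2: float, s2: float) -> float:
    """Closed-form Kullback-Leibler divergence KL(N(mu1, s1^2) || N(mu2, s2^2))."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2.0 * s2**2) - 0.5


p = lambda x: normal_pdf(x, 0.0, 1.0)
r = lambda x: normal_pdf(x, 1.0, 1.5)
for q in (0.5, 0.9, 0.99, 0.999):
    print(f"q={q}: K_q={tsallis_relative_entropy(p, r, q):.6f}")
print(f"KL  : {kl_normal(0.0, 1.0, 1.0, 1.5):.6f}")
```

As q approaches 1, the printed values of K_q approach the closed-form Kullback–Leibler value, mirroring the pointwise convergence of ln_q to the natural logarithm.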
3. The Dualistic Structure Induced by an α-Divergence on an Extended Parameter Space
The α-divergence on the extended parameter space induces a dualistic structure.
Let
be the canonical affine coordinate system on
such that its restriction to
coincides with
. We define a vector field transversal to each
by
Note that
by Equation (
14). However, the coordinate
is determined such that
for all
(
Figure 2).
Then, the Riemannian metric
g and
α-connection
are computed as follows:
and
where
is the
α-divergence defined by Equation (
3), and
. Since
p belongs to an exponential family, the following equation is used for the calculations above.
The triplet
forms a statistical manifold, and
is its dual.
The
α-coordinate system
on
is given as follows:
4. Comparison with α-Divergences on Discrete Probability Distributions
In this section, for
, let
be an
α-divergence on the positive orthant
(which forms a convex cone) defined by
The
α-divergence of Proposition 1 (ii) coincides with Equation (
17) in the one-dimensional case.
Let
be the
n-dimensional probability simplex, i.e.,
where
represent the probabilities of
states. Then,
can be identified with the extended parameter space based on
as
Equation (
18) describes the foliation of unnormalized probability simplexes. Consequently, the following corollary holds in a similar manner to Theorem 1.
Corollary 1.
For , the α-divergence on satisfiesif and , where For
, the divergence
restricted to the probability simplex
satisfies
where
is the Tsallis relative entropy defined by
As in the case of continuous distributions, the Tsallis relative entropy
converges to the Kullback–Leibler divergence as
. From an information-geometric perspective, the
α-divergence
also converges to the Kullback–Leibler divergence as
. On the probability simplex
, the dualistic structure is induced by the
α-divergence (or equivalently, the Tsallis relative entropy) [
1,
2,
9,
12,
13].
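The discrete statements in this section can be illustrated with a short numerical sketch. It assumes the standard α-divergence on the positive orthant and the correspondence q = (1 - α)/2; the paper's Equation (17) and Corollary 1 may be normalized differently, so the two identities checked below hold for this assumed form only. All names and the sample vectors are hypothetical.

```python
def alpha_divergence(p, r, alpha: float) -> float:
    """Assumed standard alpha-divergence between positive vectors p and r (alpha != +-1)."""
    if abs(alpha) == 1.0:
        raise ValueError("alpha = +-1 is the Kullback-Leibler limit, not handled here")
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    total = sum(a * pi + b * ri - pi**a * ri**b for pi, ri in zip(p, r))
    return 4.0 / (1.0 - alpha**2) * total


def tsallis_relative_entropy(p, r, q: float) -> float:
    """Discrete Tsallis relative entropy K_q(p||r) = (1 - sum p^q r^(1-q)) / (1 - q)."""
    return (1.0 - sum(pi**q * ri ** (1.0 - q) for pi, ri in zip(p, r))) / (1.0 - q)


p = [0.2, 0.3, 0.5]          # points of the probability simplex
r = [0.4, 0.4, 0.2]
alpha = 0.5
q = (1.0 - alpha) / 2.0      # assumed correspondence between alpha and q
u, v = 2.0, 3.0              # scale factors selecting two leaves of the cone

# Identity 1: on the simplex, the alpha-divergence is a multiple of K_q.
on_simplex = alpha_divergence(p, r, alpha)
scaled_tsallis = tsallis_relative_entropy(p, r, q) / q

# Identity 2: on the cone, the divergence splits into a half-line part
# and a rescaled simplex part.
lhs = alpha_divergence([u * x for x in p], [v * x for x in r], alpha)
rhs = alpha_divergence([u], [v], alpha) + u**q * v ** (1.0 - q) * on_simplex

print(f"D(p||r)={on_simplex:.12f}  K_q/q={scaled_tsallis:.12f}")
print(f"D(up||vr)={lhs:.12f}  decomposition={rhs:.12f}")
```

Both identities follow by expanding the sum and using that p and r each sum to 1; the code only confirms the algebra for one sample.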
5. Escort Distributions via Affine Immersions Generated by Exponential Families
In this section, we discuss
q-escort distributions formed by an exponential family. Although our explanation is based on
Section 2 and
Section 3, there are cases in which
q-escort distributions are mapped into the extended parameter space
instead of ordinary exponential distributions.
For
and
, the
q-escort distribution
is defined by
where
[
1,
2]. Let
be a function on the extended parameter space
defined by
Then, the image
is a level surface of
satisfying
, where the affine immersion
of
into
is defined by
and
is the
u times of the
α-coordinate of
,
.
In fact, for
, it holds that
As mentioned above, the unnormalized
q-escort distribution family generated by an exponential family can be identified with
. Similarly, the unnormalized
-escort distribution family is associated with
. The image
is a level surface of
satisfying
, where
is defined by Equation (
20) with
q replaced by
. Since
, the pullback of
can be regarded as the
-coordinate system of the exponential family
.
For
, if the Hessian matrix of the function
is non-degenerate,
induces the Hessian structure
, where
D is the canonical flat affine connection, i.e.,
[
9,
18],
, and
. By definition,
the tetrad
forms a dually flat structure. The connection
coincides with the Levi–Civita connection of the Riemannian metric
h.
The dual coordinate system of
is induced by partial differentials of the following functional:
A function
can be replaced with
on a level surface
. Then, the right-hand side of Equation (
21) becomes
Therefore, on a level surface
, the dual coordinate system of the Hessian structure
coincides with the
-coordinate system defined by an affine immersion
.
Generally, the submanifold structure of
induced by
coincides with the dualistic structure induced by the equiaffine immersion
, where
for the gradient vector field
of
on
, and
for
[
19,
20,
21,
22,
23].
Furthermore,
has a constant curvature
[
13].
6. Extended Divergences on Foliations by Escort Distributions
In this section, we describe a foliation of escort distributions on an extended parameter space and define an extended divergence on this foliation.
Let , where each leaf is considered as a q-escort distribution family generated by an exponential family.
Definition 2.
Let be a function on the extended parameter space Θ
defined byand the image of a level surface of satisfying , where the affine immersion of into is defined byand is the u times of the α-coordinate of , . Then, an extended divergence on the foliation is defined as a function on given by On the level surface
, the restricted divergence from the canonical divergence of
coincides with the geometric divergence for the affine immersion
[
19,
20]. The pullback divergence to
coincides with
.
Let
be the divergence on
defined by the affine immersion
in
Section 5. We identify the dual space
with
. For each
q, the dual coordinates are defined by
By direct calculation, we obtain the following results.
Proposition 3.
For , the dual coordinates of a level surface are given by Proof. Using the techniques described in
Section 3, Equation (
23) is derived. By Equation (
14), it holds that
Substituting Equation (
23) into this relation yields Equation (
24). □
By changing from the integral notation to the coordinate component notation, the following properties hold.
Proposition 4.
The extended divergence on satisfies the following properties:
- (i)
If , then where is the divergence on defined by an affine immersion, is an α-divergence defined by Equation (3), and . - (ii)
where equality holds if and only if .
Proof. If
, then
. By Definition 2 and the properties of affine immersions, it follows that
which proves (i).
Next, if
, then we have
from
by the definition of
. Geometrically,
and
are convex surfaces centered at the origin of
, and the surface
lies closer to the origin than
. Therefore, we obtain
, which proves (ii). □
The extended dual divergence
of
is defined in the same manner as that for discrete escort distributions [
15,
16].
The proposed extended divergence is closely related to the duo Bregman (pseudo)divergence, where the parameters also define the underlying convex functions [
24,
25].
7. Decomposition of an Extended Divergence
In this section, we propose a decomposition theorem for an extended divergence. The following discussion also characterizes the extended divergence associated with q-escort distribution families under Definition 2.
We consider the flow on
defined by
where each function
on
represents the
i-th component of the dual coordinate on
for each
. On a level surface, because the dual coordinate is parallel to the gradient of
,
is orthogonal to
. Then, an integral curve of Equation (
26) is orthogonal to
for each
q with respect to the pairing
on
and
. The set of integral curves becomes the orthogonal foliation
of
.
Translating into the primal coordinate system yields the following equations on
.
where
is the inverse matrix of
. A leaf of
is an integral curve of the vector field
that represents the value
on
for each
q.
The following theorem describes the decomposition of the extended divergence.
Theorem 2.
Let be an exponential family, and let be the 1-conformally flat statistical manifold generated by the affine immersion , where is defined by Equation (20). Let , , where is the restriction of to . For with , and . If there exists an orthogonal leaf containing both b and c, then we havewhere denotes the dual coordinate of for each q. Proof. Because
, it follows that
with
. By Definition 2, we obtain
□
In a manner similar to discrete escort distributions [
15,
16,
26], we obtain the gradient flow on a leaf
using the extended divergence.
Finally, an example of a normal distribution is shown. A normal distribution is defined by
The affine coordinate system
is defined as
where
When
, the values of
and level surfaces for
are shown in
Figure 3 (
) and
Figure 4 (
). The level surfaces overlaid for
q = 0.75, 0.5, and 0.25 are shown in
Figure 5.
Figure 6 shows the level surface for
, taking
into consideration. A gradient flow can be created only in the directions where the Hessian of
does not degenerate. In directions where it degenerates, it is necessary to define a separate dual geometric structure.
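For the normal example, the effect of taking a q-escort distribution can be checked numerically. The sketch below assumes the usual definition of the q-escort density, p(x)^q divided by the integral of p^q, and one common choice of natural parameters for the normal distribution (θ¹ = μ/σ², θ² = -1/(2σ²)); the affine coordinates used in the paper may follow a different sign or ordering convention. Under these assumptions, the q-escort of N(μ, σ²) is N(μ, σ²/q), which corresponds to multiplying the natural parameters by q. All function names are hypothetical.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def trapezoid(f, lo: float, hi: float, n: int = 8000) -> float:
    """Composite trapezoidal rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h


def escort_pdf(p, q: float, lo: float = -40.0, hi: float = 40.0):
    """Return the function x -> p(x)**q / Z_q, where Z_q is the integral of p**q."""
    z_q = trapezoid(lambda x: p(x) ** q, lo, hi)
    return lambda x: p(x) ** q / z_q


def natural_params(mu: float, sigma: float):
    """Assumed convention: p(x) = exp(theta1 * x + theta2 * x**2 - psi)."""
    return mu / sigma**2, -1.0 / (2.0 * sigma**2)


def from_natural(theta1: float, theta2: float):
    """Invert natural_params: return (mu, sigma)."""
    sigma2 = -1.0 / (2.0 * theta2)
    return theta1 * sigma2, math.sqrt(sigma2)


mu, sigma = 0.5, 1.0
theta1, theta2 = natural_params(mu, sigma)
for q in (0.75, 0.5, 0.25):
    escort = escort_pdf(lambda x: normal_pdf(x, mu, sigma), q)
    mu_q, sigma_q = from_natural(q * theta1, q * theta2)  # scale theta by q
    gap = max(abs(escort(x) - normal_pdf(x, mu_q, sigma_q)) for x in (-3.0, 0.0, 0.5, 2.0))
    print(f"q={q}: mu_q={mu_q:.4f}, sigma_q={sigma_q:.4f}, max gap={gap:.2e}")
```

The values q = 0.75, 0.5, and 0.25 match those used for the overlaid level surfaces; smaller q gives a wider escort distribution with the same mean.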
8. Discussion
This study investigated a foliation of deformed exponential families corresponding to sets of escort distributions with q-parameters, accommodating the continuous transition of -parameters within the framework of information geometry. By establishing a natural definition for this foliation, we provide an account of an extended divergence that quantifies the proximity between q-escort distributions characterized by different q-parameters.
In the future, the cases of , which is important in nonequilibrium statistical mechanics, and must be considered. Accurately comparing the q-escort and q-exponential families also remains a challenge.
This theoretical framework may offer a mathematical foundation for machine learning and deep learning methods, particularly in systems where heterogeneous
q-parameters coexist. In the context of deep learning, our results could inform the design of robust loss functions and
q-generalized activation functions (such as the
q-softmax), which are used to handle heavy-tailed noise and non-Gaussian data distributions. Furthermore, the proposed decomposition theorem provides an information-geometric basis for adaptive optimization algorithms, potentially allowing the dynamic tuning of
q-parameters during training to improve model generalization and convergence. While the decomposition theorem suggests a structural path toward determining optimal
q-parameters, specific algorithmic implementations for hyperparameter optimization warrant further investigation. Exploring the connection between our framework and the recently proposed
-duality in nonextensive statistical mechanics [
27,
28] remains a promising direction for future research in both statistical physics and neural network theory.