Article

Extended Divergence on a Foliation by Continuous-Type Escort Distributions

Faculty of Engineering, Tohoku Gakuin University, Sendai 984-8588, Japan
Entropy 2026, 28(6), 629; https://doi.org/10.3390/e28060629
Submission received: 1 April 2026 / Revised: 25 May 2026 / Accepted: 29 May 2026 / Published: 2 June 2026

Abstract

From an information-geometric perspective, this study considers a natural foliation of dualistic structures associated with escort distributions of exponential families. We propose an extended divergence on this foliation by continuous-type escort distributions. Specifically, we investigate the foliation formed by escort distributions to analyze the transition of q-parameters, rather than relying on a fixed parameter. Within this foliation, distinct q-parameters and their corresponding dualistic α-parameters are defined on each leaf. Finally, we present a decomposition of the extended divergence on this foliation, providing an analog to the method previously established for discrete escort distributions.

1. Introduction

The Cauchy and Student’s t-distributions are prototypical examples of q-Gaussian distributions. Thus, the set of q-normal distributions is regarded as a standard q-exponential family, underlining its strong connection to nonextensive statistical mechanics [1,2]. Both q-normal distributions and q-exponential families have been extensively studied from an information-geometric perspective, with applications ranging from nonextensive statistical mechanics to various other fields [3,4,5,6,7,8,9,10]. Furthermore, deformed q-exponential families, defined by deformed logarithm and exponential functions, have been applied to escort distributions [11], and their Hessian and conformal structures have been actively investigated [12,13,14].
Our previous work explored the foliation formed by deformed probability simplexes representing sets of escort distributions (a typical discrete case of q-exponential families) in relation to the continuous variation of α-parameters within the information geometry framework [15,16]. For example, we defined an extended divergence and its decomposition on this foliation. We then extended this analysis to the foliation formed by escort distributions associated with continuous probability distributions [17]. In this study, we augment the mathematical proofs provided in our previous work and elucidate the foliation structure formed by escort distributions associated with normal distributions.
The paper is organized as follows. First, we introduce an α-divergence on a subset of an affine space identified with a foliation of multiplied exponential families and discuss the relationship with the Tsallis relative entropy. Next, we present the component expressions of the Riemannian metric, α-affine connections, and α-coordinates introduced from the α-divergence. We compare α-divergences for continuous and discrete probability distributions. We also describe the dualistic structures of affine immersions as level surfaces on the q-escort distributions of exponential families. Then, we define the extended divergence on a foliation formed by q-escort distributions of an exponential family. Finally, we propose the decomposition of the resulting extended divergence on the foliation.

2. α-Divergences on a Foliation of Multiplied Exponential Families

Following the characterization of discrete probability distributions in [12,13], we explain the α-divergences on a subset of an affine space by the foliation of probability distributions.
Let $S_1$ be an exponential family defined by
$$S_1 = \left\{ p(x;\theta_{(1)}) \;\middle|\; p(x;\theta_{(1)}) = \exp\left\{ \sum_{i=1}^{n} \theta_{(1)}^{i} c_i(x) - \Phi(\theta_{(1)}) \right\},\ \theta_{(1)} \in \Theta_{(1)} \subset \mathbf{R}^n \right\}, \tag{1}$$
where $\theta_{(1)} = (\theta_{(1)}^1, \dots, \theta_{(1)}^n)$ is an element of the parameter space $\Theta_{(1)}$, $\Phi$ is a function on $\Theta_{(1)}$, and $c_1, \dots, c_n$ are functions on the sample space $\mathcal{X} \subset \mathbf{R}$. Let $\Theta_{(1)}$ be a smooth submanifold of $\mathbf{R}^n$, identified with a smooth submanifold of an n-dimensional affine space $\mathbf{A}^n$. Let $\Theta_{(u)}$ be the parameter space obtained by multiplying $\Theta_{(1)}$ by $u$, i.e.,
$$\Theta_{(u)} = \left\{ \theta_{(u)} = (\theta_{(u)}^1, \dots, \theta_{(u)}^n) \;\middle|\; \theta_{(u)} = u\,\theta_{(1)} = (u\theta_{(1)}^1, \dots, u\theta_{(1)}^n),\ \theta_{(1)} \in \Theta_{(1)} \right\}, \quad u \in \mathbf{A}_+. \tag{2}$$
We consider an extended parameter space $\Theta := \bigcup_{u>0} (\Theta_{(u)} \times \{u\})$, which is naturally identified with a smooth submanifold of $\mathbf{A}^n \times \mathbf{A}_+$ (Figure 1). The cone-like space $\Theta$ is parameterized by $u$ so that projections and immersions of an exponential family can be defined in later sections. In the following, all integrations are performed over the sample space $\mathcal{X}$.
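Definition 1 ([17]). Let $\Theta_{(1)}$ be the parameter space of an exponential family, and let $\Theta_{(u)}$ be $\Theta_{(1)}$ multiplied by $u$. For $-1 < \alpha < 1$, an α-divergence $D^{(\alpha)}$ on an extended parameter space $\Theta := \bigcup_{u>0} (\Theta_{(u)} \times \{u\}) \subset \mathbf{A}^n \times \mathbf{A}_+$ is defined by
$$D^{(\alpha)}(a,b) = \frac{4}{1-\alpha^2} \left\{ \frac{1-\alpha}{2} u_a + \frac{1+\alpha}{2} u_b - \int \left( u_a\, p\left(x; \tfrac{1}{u_a}\theta_a\right) \right)^{\frac{1-\alpha}{2}} \left( u_b\, p\left(x; \tfrac{1}{u_b}\theta_b\right) \right)^{\frac{1+\alpha}{2}} dx \right\}, \tag{3}$$
where $a = (\theta_a, u_a)$, $b = (\theta_b, u_b) \in \Theta$, $\theta_a \in \Theta_{(u_a)}$, $\theta_b \in \Theta_{(u_b)}$, and $u_a, u_b \in \mathbf{A}_+$.
The α-divergence $D^{(\alpha)}$ is defined only when the integral in Equation (3) exists. From this definition, the following properties hold.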
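Proposition 1 ([17]). An α-divergence $D^{(\alpha)}$ on an extended parameter space $\Theta$ satisfies the following:
(i) In the case of $\theta_a, \theta_b \in \Theta_{(u_a)}$, i.e., $u_a = u_b \in \mathbf{A}_+$,
$$D^{(\alpha)}(a,b) = u_a\, D^{(\alpha)}\left( \left(\tfrac{1}{u_a}\theta_a, 1\right), \left(\tfrac{1}{u_a}\theta_b, 1\right) \right).$$
(ii) In the case of $(1/u_b)\theta_b = (1/u_c)\theta_c \in \Theta_{(1)}$,
$$D^{(\alpha)}(b,c) = \frac{4}{1-\alpha^2} \left\{ \frac{1-\alpha}{2} u_b + \frac{1+\alpha}{2} u_c - u_b^{\frac{1-\alpha}{2}} u_c^{\frac{1+\alpha}{2}} \right\}.$$
The proof of Proposition 1 follows directly from the normalization condition:
$$\int p\left(x; \tfrac{1}{u_a}\theta_a\right) dx = \int p\left(x; \tfrac{1}{u_b}\theta_b\right) dx = 1, \qquad \tfrac{1}{u_a}\theta_a,\ \tfrac{1}{u_b}\theta_b \in \Theta_{(1)}.$$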
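Proposition 1 can be checked numerically. The following minimal sketch assumes the normal family as a concrete instance of $S_1$; it represents a point $a = (\theta_a, u_a)$ by the hypothetical triple `(mean, var, u)`, i.e., by the density $p(x; \theta_a/u_a)$ and its scale $u_a$, and it replaces the integral over the sample space by a trapezoidal rule on a truncated interval. These choices and all function names are illustrative assumptions, not part of the original construction.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var), used here as a member of the family S_1."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo=-30.0, hi=30.0, n=6000):
    """Composite trapezoidal rule on [lo, hi] (truncated sample space)."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def alpha_divergence(a, b, alpha):
    """Alpha-divergence of Definition 1 for points given as (mean, var, u)."""
    (mean_a, var_a, u_a), (mean_b, var_b, u_b) = a, b
    e_minus, e_plus = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    cross = integrate(
        lambda x: (u_a * normal_pdf(x, mean_a, var_a)) ** e_minus
        * (u_b * normal_pdf(x, mean_b, var_b)) ** e_plus
    )
    return 4.0 / (1.0 - alpha ** 2) * (e_minus * u_a + e_plus * u_b - cross)

def ray_divergence(u_b, u_c, alpha):
    """Closed form of Proposition 1 (ii): same density, different scales."""
    e_minus, e_plus = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha ** 2) * (
        e_minus * u_b + e_plus * u_c - u_b ** e_minus * u_c ** e_plus
    )

alpha = 0.3
# (i) equal scales: D(a, b) equals u_a times the divergence on the normalized family
lhs_i = alpha_divergence((0.0, 1.0, 2.5), (1.0, 2.0, 2.5), alpha)
rhs_i = 2.5 * alpha_divergence((0.0, 1.0, 1.0), (1.0, 2.0, 1.0), alpha)
# (ii) equal densities: only the scales u_b and u_c enter
lhs_ii = alpha_divergence((1.0, 2.0, 2.5), (1.0, 2.0, 4.0), alpha)
rhs_ii = ray_divergence(2.5, 4.0, alpha)
print(abs(lhs_i - rhs_i), abs(lhs_ii - rhs_ii))
```

Both printed differences should be at the level of floating-point rounding, since the only analytic input is the normalization of the density.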
Regarding the decomposition of the α-divergence $D^{(\alpha)}$, we obtain the following theorem.
Theorem 1. For $-1 < \alpha < 1$, an α-divergence $D^{(\alpha)}$ on an extended parameter space $\Theta := \bigcup_{u>0} (\Theta_{(u)} \times \{u\}) \subset \mathbf{A}^n \times \mathbf{A}_+$ satisfies
$$D^{(\alpha)}(a,c) = \mu\, D^{(\alpha)}(a,b) + D^{(\alpha)}(b,c), \qquad \mu = \left( \frac{u_c}{u_b} \right)^{\frac{1+\alpha}{2}},$$
if $a = (\theta_a, u_a)$, $b = (\theta_b, u_b)$, $c = (\theta_c, u_c) \in \Theta$, $u_a = u_b$, and $(1/u_b)\theta_b = (1/u_c)\theta_c$.
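Proof. Each term is computed as follows, using $u_a = u_b$, $(1/u_b)\theta_b = (1/u_c)\theta_c$, and $u_c^{\frac{1+\alpha}{2}} = \mu\, u_b^{\frac{1+\alpha}{2}}$:
$$\begin{aligned}
\frac{1-\alpha^2}{4} D^{(\alpha)}(a,c)
&= \frac{1-\alpha}{2} u_a + \frac{1+\alpha}{2} u_c - \int \left( u_a\, p\left(x; \tfrac{1}{u_a}\theta_a\right) \right)^{\frac{1-\alpha}{2}} \left( u_c\, p\left(x; \tfrac{1}{u_c}\theta_c\right) \right)^{\frac{1+\alpha}{2}} dx \\
&= \frac{1-\alpha}{2} u_b + \frac{1+\alpha}{2} u_c - u_b^{\frac{1-\alpha}{2}} \cdot \mu\, u_b^{\frac{1+\alpha}{2}} \int p\left(x; \tfrac{1}{u_a}\theta_a\right)^{\frac{1-\alpha}{2}} p\left(x; \tfrac{1}{u_c}\theta_c\right)^{\frac{1+\alpha}{2}} dx \\
&= \frac{1-\alpha}{2} u_b + \frac{1+\alpha}{2} u_c - \mu\, u_b \int p\left(x; \tfrac{1}{u_a}\theta_a\right)^{\frac{1-\alpha}{2}} p\left(x; \tfrac{1}{u_b}\theta_b\right)^{\frac{1+\alpha}{2}} dx,
\end{aligned}$$
$$\begin{aligned}
\frac{1-\alpha^2}{4} D^{(\alpha)}(a,b)
&= \frac{1-\alpha}{2} u_a + \frac{1+\alpha}{2} u_b - \int \left( u_a\, p\left(x; \tfrac{1}{u_a}\theta_a\right) \right)^{\frac{1-\alpha}{2}} \left( u_b\, p\left(x; \tfrac{1}{u_b}\theta_b\right) \right)^{\frac{1+\alpha}{2}} dx \\
&= \frac{1-\alpha}{2} u_b + \frac{1+\alpha}{2} u_b - u_b^{\frac{1-\alpha}{2}} u_b^{\frac{1+\alpha}{2}} \int p\left(x; \tfrac{1}{u_a}\theta_a\right)^{\frac{1-\alpha}{2}} p\left(x; \tfrac{1}{u_b}\theta_b\right)^{\frac{1+\alpha}{2}} dx \\
&= u_b - u_b \int p\left(x; \tfrac{1}{u_a}\theta_a\right)^{\frac{1-\alpha}{2}} p\left(x; \tfrac{1}{u_b}\theta_b\right)^{\frac{1+\alpha}{2}} dx,
\end{aligned}$$
$$\begin{aligned}
\frac{1-\alpha^2}{4} D^{(\alpha)}(b,c)
&= \frac{1-\alpha}{2} u_b + \frac{1+\alpha}{2} u_c - \int \left( u_b\, p\left(x; \tfrac{1}{u_b}\theta_b\right) \right)^{\frac{1-\alpha}{2}} \left( u_c\, p\left(x; \tfrac{1}{u_c}\theta_c\right) \right)^{\frac{1+\alpha}{2}} dx \\
&= \frac{1-\alpha}{2} u_b + \frac{1+\alpha}{2} u_c - \mu\, u_b \int p\left(x; \tfrac{1}{u_b}\theta_b\right) dx \\
&= \frac{1-\alpha}{2} u_b + \frac{1+\alpha}{2} u_c - \mu\, u_b.
\end{aligned}$$
Then, we have that
$$\frac{1-\alpha^2}{4} \left\{ D^{(\alpha)}(a,c) - \left( \mu\, D^{(\alpha)}(a,b) + D^{(\alpha)}(b,c) \right) \right\} = -\mu\, u_b + \mu\, u_b = 0.$$
Thus, Theorem 1 holds. □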
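The decomposition in Theorem 1 can also be verified numerically. The sketch below again assumes the normal family as an instance of $S_1$ and the hypothetical encoding `(mean, var, u)` of a point $(\theta, u)$; the points `a` and `b` share the scale $u_a = u_b$, while `b` and `c` share the same underlying density, which is exactly the hypothesis of the theorem. The quadrature helper is an illustrative assumption.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo=-30.0, hi=30.0, n=6000):
    """Composite trapezoidal rule on a truncated sample space."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def alpha_divergence(a, b, alpha):
    """Alpha-divergence on the extended parameter space, points as (mean, var, u)."""
    (mean_a, var_a, u_a), (mean_b, var_b, u_b) = a, b
    e_minus, e_plus = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    cross = integrate(
        lambda x: (u_a * normal_pdf(x, mean_a, var_a)) ** e_minus
        * (u_b * normal_pdf(x, mean_b, var_b)) ** e_plus
    )
    return 4.0 / (1.0 - alpha ** 2) * (e_minus * u_a + e_plus * u_b - cross)

alpha = -0.4
a = (0.0, 1.0, 2.0)   # scale u_a = 2
b = (1.0, 2.0, 2.0)   # same scale as a
c = (1.0, 2.0, 5.0)   # same density as b, different scale u_c = 5
mu = (c[2] / b[2]) ** ((1.0 + alpha) / 2.0)

lhs = alpha_divergence(a, c, alpha)
rhs = mu * alpha_divergence(a, b, alpha) + alpha_divergence(b, c, alpha)
print(lhs, rhs)
```

The two printed values should agree up to rounding; replacing `c` by a point with a different density breaks the hypothesis and, in general, the identity.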
By Theorem 1, the next proposition holds.
Proposition 2. For $-1 < \alpha < 1$, let $D^{(\alpha)}$ be an α-divergence on an extended parameter space $\Theta := \bigcup_{u>0} (\Theta_{(u)} \times \{u\}) \subset \mathbf{A}^n \times \mathbf{A}_+$. For $a, b \in \Theta$, it holds that
$$D^{(\alpha)}(a,b) \geq 0.$$
Furthermore, $D^{(\alpha)}(a,b) = 0$ if and only if $a = b$.
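Proof. Let $a = (\theta_a, u_a)$, $b = (\theta_b, u_b) \in \Theta$ and $\mu = (u_b/u_a)^{\frac{1+\alpha}{2}}$. For $\tilde{b} = ((u_a/u_b)\theta_b, u_a) \in \Theta$, it follows from Theorem 1 that
$$D^{(\alpha)}(a,b) = \mu\, D^{(\alpha)}(a,\tilde{b}) + D^{(\alpha)}(\tilde{b},b).$$
From Proposition 1 (i), we have
$$D^{(\alpha)}(a,\tilde{b}) = u_a\, D^{(\alpha)}\left( \left(\tfrac{1}{u_a}\theta_a, 1\right), \left(\tfrac{1}{u_a}\theta_{\tilde{b}}, 1\right) \right). \tag{9}$$
The α-divergence $D^{(\alpha)}$ on the right-hand side of Equation (9) is the divergence on the exponential family $S_1$ and satisfies
$$D^{(\alpha)}\left( \left(\tfrac{1}{u_a}\theta_a, 1\right), \left(\tfrac{1}{u_a}\theta_{\tilde{b}}, 1\right) \right) = \frac{4}{1-\alpha^2} \left\{ 1 - \int p\left(x; \tfrac{1}{u_a}\theta_a\right)^{\frac{1-\alpha}{2}} p\left(x; \tfrac{1}{u_a}\theta_{\tilde{b}}\right)^{\frac{1+\alpha}{2}} dx \right\}.$$
Therefore, $D^{(\alpha)}(a,\tilde{b}) \geq 0$, where equality holds if and only if $a = \tilde{b}$. Similarly, from Proposition 1 (ii), it holds that
$$D^{(\alpha)}(\tilde{b},b) = \frac{4}{1-\alpha^2} \left\{ \frac{1-\alpha}{2} u_a + \frac{1+\alpha}{2} u_b - u_a^{\frac{1-\alpha}{2}} u_b^{\frac{1+\alpha}{2}} \right\}. \tag{11}$$
The right-hand side of Equation (11) is an α-divergence on the half-line $\{u \mid u > 0\}$. Thus, $D^{(\alpha)}(\tilde{b},b) \geq 0$, where equality holds if and only if $\tilde{b} = b$. Consequently, Proposition 2 holds. □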
At the end of this section, we describe the relationship between the α-divergence and entropy parameterized by q. For $q = (1-\alpha)/2$, the divergence $D^{(\alpha)}$ restricted to the exponential family $S_1$ satisfies
$$D^{(\alpha)}(a,b) = \frac{1}{q} K_q\left( p(x;\theta_a), p(x;\theta_b) \right), \qquad a = (\theta_a, 1),\ b = (\theta_b, 1) \in \Theta_{(1)} \times \{1\},$$
where $K_q$ is the continuous Tsallis relative entropy defined by
$$K_q\left( p(x;\theta_a), p(x;\theta_b) \right) := -\int p(x;\theta_a) \ln_q \frac{p(x;\theta_b)}{p(x;\theta_a)}\, dx = \frac{1}{1-q} \left\{ 1 - \int p(x;\theta_a)^{q}\, p(x;\theta_b)^{1-q}\, dx \right\},$$
with $p(x;\theta_a), p(x;\theta_b) \in S_1$, and $\ln_q$ is the q-logarithm defined by
$$\ln_q x := \frac{x^{1-q} - 1}{1-q}, \qquad q \neq 1,\ x > 0.$$
The Tsallis relative entropy $K_q$ converges to the Kullback–Leibler divergence as $q \to 1$ because $\lim_{q \to 1} \ln_q x = \log x$. Similarly, the α-divergence $D^{(\alpha)}$ converges to the Kullback–Leibler divergence as $\alpha \to -1$ on the exponential family $S_1$ [1,2,12,13].
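The relation between the α-divergence and the Tsallis relative entropy, and the Kullback–Leibler limit, can be illustrated as follows. This is a minimal sketch under the assumption that $S_1$ is the normal family and that integrals are approximated by a trapezoidal rule; the closed-form Kullback–Leibler value used for comparison is the standard expression for two univariate normal distributions, and all function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo=-30.0, hi=30.0, n=6000):
    """Composite trapezoidal rule on a truncated sample space."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def ln_q(x, q):
    """q-logarithm, defined for q != 1 and x > 0."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_relative_entropy(pa, pb, q):
    """K_q(pa, pb) = -int pa * ln_q(pb / pa) dx."""
    return -integrate(lambda x: pa(x) * ln_q(pb(x) / pa(x), q))

def alpha_divergence_on_s1(pa, pb, alpha):
    """Alpha-divergence restricted to normalized densities (u_a = u_b = 1)."""
    e_minus, e_plus = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    cross = integrate(lambda x: pa(x) ** e_minus * pb(x) ** e_plus)
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - cross)

pa = lambda x: normal_pdf(x, 0.0, 1.0)
pb = lambda x: normal_pdf(x, 1.0, 2.0)

q = 0.4
alpha = 1.0 - 2.0 * q            # equivalent to q = (1 - alpha) / 2
d_alpha = alpha_divergence_on_s1(pa, pb, alpha)
k_q = tsallis_relative_entropy(pa, pb, q)

# Kullback-Leibler divergence KL(N(0,1) || N(1,2)) in closed form
kl_exact = 0.5 * (math.log(2.0 / 1.0) + (1.0 + (0.0 - 1.0) ** 2) / 2.0 - 1.0)
k_near_one = tsallis_relative_entropy(pa, pb, 1.0 - 1e-6)
print(d_alpha, k_q / q, k_near_one, kl_exact)
```

The first two printed values should coincide, and the last two should agree to several digits, reflecting the limit $q \to 1$.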

3. The Dualistic Structure Induced by an α-Divergence on an Extended Parameter Space

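The α-divergence on the extended parameter space $\Theta := \bigcup_{u>0} (\Theta_{(u)} \times \{u\})$ induces a dualistic structure.
Let $\{\theta^1, \dots, \theta^n, u\}$ be the canonical affine coordinate system on $\Theta \subset \mathbf{A}^n \times \mathbf{A}_+$ such that its restriction to $\Theta_{(u)}$ coincides with $\{\theta_{(u)}^1, \dots, \theta_{(u)}^n, u\}$. We define a vector field transversal to each $\Theta_{(u)} \times \{u\}$ by
$$\left( \frac{\partial}{\partial \bar{u}} \right)_a = \frac{1}{u_a} \sum_{i=1}^{n} \theta_a^i \frac{\partial}{\partial \theta^i} + \frac{\partial}{\partial u}. \tag{14}$$
Note that $\partial/\partial \bar{u} \neq \partial/\partial u$ by Equation (14). However, the coordinate $\bar{u}$ is determined such that $\bar{u}(a) = u(a)$ for all $a \in \Theta$ (Figure 2).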
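Then, the Riemannian metric $g$ and the α-connection $\nabla^{(\alpha)}$ are computed as follows. For brevity, we write $p_a := p(x; \tfrac{1}{u_a}\theta_a)$, $\ell_i := c_i(x) - \left. \partial \Phi(\theta_{(1)}) / \partial \theta_{(1)}^i \right|_{\theta_{(1)} = \frac{1}{u_a}\theta_a}$, and $\Phi_{ij} := \left. \partial^2 \Phi(\theta_{(1)}) / \partial \theta_{(1)}^i \partial \theta_{(1)}^j \right|_{\theta_{(1)} = \frac{1}{u_a}\theta_a}$:
$$g\left( \frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \theta^j} \right)_a = -\left( \frac{\partial}{\partial \theta^i} \right)_a \left( \frac{\partial}{\partial \theta^j} \right)_b D^{(\alpha)}(a,b) \Big|_{a=b} = \frac{1}{u_a} \int p_a\, \ell_i\, \ell_j\, dx,$$
$$g\left( \frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \bar{u}} \right)_a = g\left( \frac{\partial}{\partial \bar{u}}, \frac{\partial}{\partial \theta^i} \right)_a = -\left( \frac{\partial}{\partial \theta^i} \right)_a \left( \frac{\partial}{\partial \bar{u}} \right)_b D^{(\alpha)}(a,b) \Big|_{a=b} = \frac{1}{u_a} \int p_a\, \ell_i\, dx,$$
$$g\left( \frac{\partial}{\partial \bar{u}}, \frac{\partial}{\partial \bar{u}} \right)_a = -\left( \frac{\partial}{\partial \bar{u}} \right)_a \left( \frac{\partial}{\partial \bar{u}} \right)_b D^{(\alpha)}(a,b) \Big|_{a=b} = \frac{1}{u_a},$$
$$g\left( \nabla^{(\alpha)}_{\frac{\partial}{\partial \theta^i}} \frac{\partial}{\partial \theta^j}, \frac{\partial}{\partial \theta^k} \right)_a = -\left( \frac{\partial^2}{\partial \theta^i \partial \theta^j} \right)_a \left( \frac{\partial}{\partial \theta^k} \right)_b D^{(\alpha)}(a,b) \Big|_{a=b} = \frac{1}{u_a^2} \int p_a \left\{ \frac{1-\alpha}{2} \ell_i\, \ell_j - \Phi_{ij} \right\} \ell_k\, dx,$$
$$g\left( \nabla^{(\alpha)}_{\frac{\partial}{\partial \theta^i}} \frac{\partial}{\partial \theta^j}, \frac{\partial}{\partial \bar{u}} \right)_a = -\left( \frac{\partial^2}{\partial \theta^i \partial \theta^j} \right)_a \left( \frac{\partial}{\partial \bar{u}} \right)_b D^{(\alpha)}(a,b) \Big|_{a=b} = \frac{1}{u_a^2} \int p_a \left\{ \frac{1-\alpha}{2} \ell_i\, \ell_j - \Phi_{ij} \right\} dx,$$
$$g\left( \nabla^{(\alpha)}_{\frac{\partial}{\partial \bar{u}}} \frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \theta^j} \right)_a = g\left( \nabla^{(\alpha)}_{\frac{\partial}{\partial \theta^i}} \frac{\partial}{\partial \bar{u}}, \frac{\partial}{\partial \theta^j} \right)_a = -\left( \frac{\partial^2}{\partial \bar{u}\, \partial \theta^i} \right)_a \left( \frac{\partial}{\partial \theta^j} \right)_b D^{(\alpha)}(a,b) \Big|_{a=b} = -\frac{1+\alpha}{2} \frac{1}{u_a^2} \int p_a\, \ell_i\, \ell_j\, dx,$$
$$g\left( \nabla^{(\alpha)}_{\frac{\partial}{\partial \bar{u}}} \frac{\partial}{\partial \bar{u}}, \frac{\partial}{\partial \theta^i} \right)_a = -\left( \frac{\partial^2}{\partial \bar{u}^2} \right)_a \left( \frac{\partial}{\partial \theta^i} \right)_b D^{(\alpha)}(a,b) \Big|_{a=b} = -\frac{1+\alpha}{2} \frac{1}{u_a^2} \int p_a\, \ell_i\, dx,$$
$$g\left( \nabla^{(\alpha)}_{\frac{\partial}{\partial \bar{u}}} \frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \bar{u}} \right)_a = g\left( \nabla^{(\alpha)}_{\frac{\partial}{\partial \theta^i}} \frac{\partial}{\partial \bar{u}}, \frac{\partial}{\partial \bar{u}} \right)_a = -\left( \frac{\partial^2}{\partial \bar{u}\, \partial \theta^i} \right)_a \left( \frac{\partial}{\partial \bar{u}} \right)_b D^{(\alpha)}(a,b) \Big|_{a=b} = -\frac{1+\alpha}{2} \frac{1}{u_a^2} \int p_a\, \ell_i\, dx,$$
and
$$g\left( \nabla^{(\alpha)}_{\frac{\partial}{\partial \bar{u}}} \frac{\partial}{\partial \bar{u}}, \frac{\partial}{\partial \bar{u}} \right)_a = -\left( \frac{\partial^2}{\partial \bar{u}^2} \right)_a \left( \frac{\partial}{\partial \bar{u}} \right)_b D^{(\alpha)}(a,b) \Big|_{a=b} = -\frac{1+\alpha}{2} \frac{1}{u_a^2},$$
where $D^{(\alpha)}$ is the α-divergence defined by Equation (3), and $a, b \in \Theta$. Since $p$ belongs to an exponential family, the following equation is used for the calculations above:
$$\frac{\partial}{\partial \theta^i} p\left(x; \tfrac{1}{u}\theta\right) \Big|_a = p_a \left. \frac{\partial}{\partial \theta_{(1)}^i} \left\{ \sum_{j=1}^{n} \theta_{(1)}^j c_j(x) - \Phi(\theta_{(1)}) \right\} \right|_{\theta_{(1)} = \frac{1}{u_a}\theta_a} \left. \frac{\partial}{\partial \theta^i} \left( \tfrac{1}{u}\theta^i \right) \right|_a = \frac{1}{u_a}\, p_a\, \ell_i.$$
The triplet $(\Theta, \nabla^{(\alpha)}, g)$ forms a statistical manifold, and $(\Theta, \nabla^{(-\alpha)}, g)$ is its dual.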
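The components in the transversal direction can be checked by finite differences. Along $\partial/\partial \bar{u}$ the normalized density stays fixed, so the divergence between two points on the same ray reduces to the closed form of Proposition 1 (ii). The sketch below differentiates that closed form numerically and compares the results with $1/u$ for the metric component and $-(1+\alpha)/(2u^2)$ for the connection component; the step sizes and function names are illustrative assumptions.

```python
def ray_divergence(u_a, u_b, alpha):
    """Alpha-divergence between two points on the same ray (same density)."""
    e_minus, e_plus = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha ** 2) * (
        e_minus * u_a + e_plus * u_b - u_a ** e_minus * u_b ** e_plus
    )

def metric_uu(u, alpha, h=1e-3):
    """-(d/du_a)(d/du_b) D at u_a = u_b = u, by central differences."""
    d = ray_divergence
    mixed = (
        d(u + h, u + h, alpha) - d(u + h, u - h, alpha)
        - d(u - h, u + h, alpha) + d(u - h, u - h, alpha)
    ) / (4.0 * h * h)
    return -mixed

def connection_uuu(u, alpha, h=1e-2):
    """-(d/du_a)^2 (d/du_b) D at u_a = u_b = u, by central differences."""
    d = ray_divergence

    def second_in_a(u_b):
        # second central difference in the first argument
        return (d(u + h, u_b, alpha) - 2.0 * d(u, u_b, alpha) + d(u - h, u_b, alpha)) / (h * h)

    return -(second_in_a(u + h) - second_in_a(u - h)) / (2.0 * h)

u, alpha = 2.0, 0.3
print(metric_uu(u, alpha), 1.0 / u)
print(connection_uuu(u, alpha), -(1.0 + alpha) / (2.0 * u ** 2))
```

Each printed pair should agree up to the discretization error of the difference quotients.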
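The α-coordinate system $\{\theta_{(\alpha)}^1, \dots, \theta_{(\alpha)}^n, \bar{u}_{(\alpha)}\}$ on $\Theta$ is given as follows:
$$\begin{aligned}
\theta_{(\alpha)}^i(a) &= \frac{\partial}{\partial \theta^i} \int \ln_{1-q}\left( u\, p\left(x; \tfrac{1}{u}\theta\right) \right) dx \Big|_a = \frac{2}{1-\alpha} \frac{\partial}{\partial \theta^i} \int \left\{ \left( u\, p\left(x; \tfrac{1}{u}\theta\right) \right)^{\frac{1-\alpha}{2}} - 1 \right\} dx \Big|_a \\
&= \frac{1}{u_a} \int \left( u_a\, p\left(x; \tfrac{1}{u_a}\theta_a\right) \right)^{\frac{1-\alpha}{2}} \left( c_i(x) - \left. \frac{\partial \Phi(\theta_{(1)})}{\partial \theta_{(1)}^i} \right|_{\theta_{(1)} = \frac{1}{u_a}\theta_a} \right) dx, \qquad i = 1, \dots, n,
\end{aligned}$$
$$\begin{aligned}
\bar{u}_{(\alpha)}(a) &= \frac{\partial}{\partial \bar{u}} \int \ln_{1-q}\left( u\, p\left(x; \tfrac{1}{u}\theta\right) \right) dx \Big|_a = \frac{2}{1-\alpha} \frac{\partial}{\partial \bar{u}} \int \left\{ \left( u\, p\left(x; \tfrac{1}{u}\theta\right) \right)^{\frac{1-\alpha}{2}} - 1 \right\} dx \Big|_a \\
&= \frac{1}{u_a} \int \left( u_a\, p\left(x; \tfrac{1}{u_a}\theta_a\right) \right)^{\frac{1-\alpha}{2}} dx, \qquad a \in \Theta,\ q = \frac{1-\alpha}{2}.
\end{aligned}$$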

4. Comparison with α-Divergences on Discrete Probability Distributions

In this section, for $\alpha \neq \pm 1$, let $D^{(\alpha)}$ be an α-divergence on the positive orthant $\mathbf{A}_+^{n+1}$ (which forms a convex cone) defined by
$$D^{(\alpha)}(a,b) = \frac{4}{1-\alpha^2} \left\{ \frac{1-\alpha}{2} \sum_{i=1}^{n+1} a_i + \frac{1+\alpha}{2} \sum_{i=1}^{n+1} b_i - \sum_{i=1}^{n+1} a_i^{\frac{1-\alpha}{2}} b_i^{\frac{1+\alpha}{2}} \right\}, \qquad a, b \in \mathbf{A}_+^{n+1}. \tag{17}$$
The α-divergence of Proposition 1 (ii) coincides with Equation (17) in the one-dimensional case.
Let $\mathcal{S}$ be the n-dimensional probability simplex, i.e.,
$$\mathcal{S} = \left\{ a = (a_1, \dots, a_{n+1}) \;\middle|\; a_i > 0,\ a_i \in \mathbf{A},\ i = 1, \dots, n+1,\ \sum_{i=1}^{n+1} a_i = 1 \right\},$$
where $a_1, \dots, a_{n+1}$ represent the probabilities of $n+1$ states. Then, $\mathbf{A}_+^{n+1}$ can be identified with the extended parameter space based on $\mathcal{S}$ as
$$\mathbf{A}_+^{n+1} = \bigcup_{u>0} \left\{ a = (a_1, \dots, a_{n+1}) \;\middle|\; a_i > 0,\ a_i \in \mathbf{A},\ i = 1, \dots, n+1,\ \sum_{i=1}^{n+1} a_i = u \right\}. \tag{18}$$
Equation (18) describes the foliation of unnormalized probability simplexes. Consequently, the following corollary holds in a similar manner to Theorem 1.
Corollary 1. For $\alpha \neq \pm 1$, the α-divergence $D^{(\alpha)}$ on $\mathbf{A}_+^{n+1}$ satisfies
$$D^{(\alpha)}(a,c) = \mu\, D^{(\alpha)}(a,b) + D^{(\alpha)}(b,c), \qquad \mu = \left( \frac{u_c}{u_b} \right)^{\frac{1+\alpha}{2}}, \qquad a, b, c \in \mathbf{A}_+^{n+1},$$
if $u_a = u_b$ and $(1/u_b)\, b = (1/u_c)\, c$, where
$$\sum_{i=1}^{n+1} a_i = u_a, \qquad \sum_{i=1}^{n+1} b_i = u_b, \qquad \sum_{i=1}^{n+1} c_i = u_c.$$
For $q = (1-\alpha)/2$, the divergence $D^{(\alpha)}$ restricted to the probability simplex $\mathcal{S}$ satisfies
$$D^{(\alpha)}(a,b) = \frac{1}{q} K_q(a,b), \qquad a, b \in \mathcal{S},$$
where $K_q$ is the Tsallis relative entropy defined by
$$K_q(a,b) := -\sum_{i=1}^{n+1} a_i \ln_q \frac{b_i}{a_i} = \frac{1}{1-q} \left\{ 1 - \sum_{i=1}^{n+1} a_i^{q}\, b_i^{1-q} \right\}, \qquad a, b \in \mathcal{S}.$$
As in the case of continuous distributions, the Tsallis relative entropy $K_q$ converges to the Kullback–Leibler divergence as $q \to 1$. From an information-geometric perspective, the α-divergence $D^{(\alpha)}$ also converges to the Kullback–Leibler divergence as $\alpha \to -1$. On the probability simplex $\mathcal{S}$, the dualistic structure is induced by the α-divergence (or equivalently, the Tsallis relative entropy) [1,2,9,12,13].
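The discrete statements of this section are easy to reproduce. The following minimal sketch implements the α-divergence on the positive orthant and the discrete Tsallis relative entropy with plain Python lists, then checks Corollary 1 and the relation $D^{(\alpha)} = K_q / q$ on the simplex; the sample vectors are arbitrary illustrative choices.

```python
def alpha_divergence(a, b, alpha):
    """Alpha-divergence between positive vectors a, b (not necessarily normalized)."""
    e_minus, e_plus = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    cross = sum(x ** e_minus * y ** e_plus for x, y in zip(a, b))
    return 4.0 / (1.0 - alpha ** 2) * (e_minus * sum(a) + e_plus * sum(b) - cross)

def tsallis_relative_entropy(a, b, q):
    """Discrete Tsallis relative entropy K_q(a, b) for probability vectors."""
    return (1.0 - sum(x ** q * y ** (1.0 - q) for x, y in zip(a, b))) / (1.0 - q)

alpha = 0.2

# Corollary 1: a and b lie on the same leaf (equal sums), c is a multiple of b
a = [0.4, 1.0, 0.6]            # u_a = 2
b = [1.0, 0.5, 0.5]            # u_b = 2
c = [1.5 * x for x in b]       # u_c = 3, same direction as b
mu = (sum(c) / sum(b)) ** ((1.0 + alpha) / 2.0)
lhs = alpha_divergence(a, c, alpha)
rhs = mu * alpha_divergence(a, b, alpha) + alpha_divergence(b, c, alpha)

# Relation with the Tsallis relative entropy on the probability simplex
q = (1.0 - alpha) / 2.0
p = [0.2, 0.5, 0.3]
r = [0.5, 0.25, 0.25]
d_simplex = alpha_divergence(p, r, alpha)
k_q = tsallis_relative_entropy(p, r, q)
print(lhs, rhs, d_simplex, k_q / q)
```

The first two and the last two printed values should coincide up to rounding.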

5. Escort Distributions via Affine Immersions Generated by Exponential Families

In this section, we discuss q-escort distributions formed by an exponential family. Although our explanation is based on Section 2 and Section 3, in some cases q-escort distributions, rather than ordinary members of the exponential family, are mapped into the extended parameter space $\Theta$.
For $p(x;\theta_{(1)}) \in S_1$ and $0 < q < 1$, the q-escort distribution $P_q(x)$ is defined by
$$P_q(x) := \frac{p(x;\theta_{(1)})^{q}}{\int p(x;\theta_{(1)})^{q}\, dx} = \frac{\left( \frac{1}{r}\, p(x;\theta_{(1)}) \right)^{q}}{\int \left( \frac{1}{r}\, p(x;\theta_{(1)}) \right)^{q} dx}, \qquad x \in \mathcal{X},$$
where $r > 0$ [1,2]. Let $\psi_q$ be a function on the extended parameter space $\Theta$ defined by
$$\psi_q(a) = \frac{1}{1-q} \int \left( q\, u_a\, p\left(x; \tfrac{1}{u_a}\theta_{(u_a)}\right) \right)^{\frac{1}{q}} dx, \qquad a = (\theta_a, u_a) \in \Theta,\ \theta_a \in \Theta_{(u_a)},\ u_a \in \mathbf{A}_+.$$
Then, the image $f_q(S_1)$ is a level surface of $\psi_q$ satisfying $\psi_q(a) = 1/(1-q)$, where the affine immersion $f_q$ of $S_1$ into $\mathbf{A}^n \times \mathbf{A}_+$ is defined by
$$f_q : p(x;\theta_{(1)}) \mapsto (\theta_{(u)}, u) \in \Theta_{(u)} \times \{u\} \subset \Theta \subset \mathbf{A}^n \times \mathbf{A}_+, \qquad u = \frac{1}{q} \int p(x;\theta_{(1)})^{q}\, dx, \tag{20}$$
and $(\theta_{(u)}, u)$ is $u$ times the α-coordinate of $(\tilde{\theta}_{(1)}, 1) \in S_1$, with $q = (1-\alpha)/2$.
In fact, for $(\theta_{(u)}, u) \in f_q(S_1)$, it holds that
$$\theta_{(u)}^i = \int \frac{1}{q}\, p(x;\theta_{(1)})^{q} \left( c_i(x) - \frac{\partial \Phi(\theta_{(1)})}{\partial \theta_{(1)}^i} \right) dx = u \int p(x;\tilde{\theta}_{(1)})^{\frac{1-\alpha}{2}} \left( c_i(x) - \frac{\partial \Phi(\theta_{(1)})}{\partial \theta_{(1)}^i} \right) dx, \qquad i = 1, \dots, n.$$
As mentioned above, the unnormalized q-escort distribution family generated by an exponential family can be identified with $f_q(S_1)$. Similarly, the unnormalized $(1-q)$-escort distribution family is associated with $f_{1-q}(S_1)$. The image $f_{1-q}(S_1)$ is a level surface of $\psi_{1-q}$ satisfying $\psi_{1-q}(a) = 1/q$, where $f_{1-q}$ is defined by Equation (20) with $q$ replaced by $1-q$. Since $1-q = (1+\alpha)/2$, the pullback of $f_{1-q}(S_1)$ can be regarded as the $(-\alpha)$-coordinate system of the exponential family $S_1$.
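For the normal family, the escort construction can be made concrete. The sketch below is written under explicit assumptions: $S_1$ is taken to be the normal family, the scale is read as $u = (1/q) \int p^q\, dx$, and the point on the leaf is represented by the density $N(\mu, \sigma^2/q)$ (the normalized escort of $N(\mu, \sigma^2)$) together with $u$. Under this reading, $u \cdot N(\mu, \sigma^2/q)$ equals $(1/q)\, p^q$ and the level-set condition $\psi_q = 1/(1-q)$ holds. All function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo=-30.0, hi=30.0, n=6000):
    """Composite trapezoidal rule on a truncated sample space."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def escort_density(mean, var, q, r=1.0):
    """Return x -> P_q(x) for p = N(mean, var); r > 0 is an arbitrary rescaling."""
    def weight(x):
        return (normal_pdf(x, mean, var) / r) ** q
    z = integrate(weight)
    return lambda x: weight(x) / z

def escort_scale(mean, var, q):
    """Assumed scale u = (1/q) * int p^q dx attached to the immersion f_q."""
    return integrate(lambda x: normal_pdf(x, mean, var) ** q) / q

def psi_q(mean, var, q):
    """psi_q at the image point, with the density N(mean, var / q) and scale u."""
    u = escort_scale(mean, var, q)
    integrand = lambda x: (q * u * normal_pdf(x, mean, var / q)) ** (1.0 / q)
    return integrate(integrand) / (1.0 - q)

mean, var, q = 0.5, 1.5, 0.25
P = escort_density(mean, var, q)
P_rescaled = escort_density(mean, var, q, r=7.0)
u = escort_scale(mean, var, q)
u_closed = (2.0 * math.pi * var) ** ((1.0 - q) / 2.0) * q ** -1.5
print(P(1.0), normal_pdf(1.0, mean, var / q))   # escort of a normal is normal
print(psi_q(mean, var, q), 1.0 / (1.0 - q))     # level-set value
```

The rescaling by `r` cancels in the normalization, which is why the escort distribution does not depend on it.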
For $0 < q < 1$, if the Hessian matrix of the function $\psi_q$ is non-degenerate, $\psi_q$ induces the Hessian structure $(\Theta, D, h = (\partial^2 \psi_q / \partial \theta^i \partial \theta^j))$, where $D$ is the canonical flat affine connection, i.e., $D\, d\theta^i = 0$ [9,18], $(\theta^1, \dots, \theta^{n+1}) \in \Theta$, and $\theta^{n+1} = u \in \mathbf{A}_+$. By the definitions
$$h\left( \frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \theta^j} \right) = \frac{\partial^2 \psi_q}{\partial \theta^i \partial \theta^j}, \qquad h\left( D^{(\alpha)}_{\frac{\partial}{\partial \theta^i}} \frac{\partial}{\partial \theta^j}, \frac{\partial}{\partial \theta^k} \right) = \frac{\partial^3 \psi_q}{\partial \theta^i \partial \theta^j \partial \theta^k}, \qquad i, j, k = 1, \dots, n+1,$$
the tetrad $(\Theta, D, D^{(-1)}, h)$ forms a dually flat structure. The connection $D^{(0)}$ coincides with the Levi–Civita connection of the Riemannian metric $h$.
The dual coordinate system of $(\theta^1, \dots, \theta^{n+1})$ is induced by partial differentials of the following functional:
$$\frac{\partial \psi_q}{\partial z} := \frac{\partial}{\partial z}\, \frac{1}{1-q} \int (q z)^{\frac{1}{q}} dx \Big|_{z = u\, p(x; \frac{1}{u}\theta_{(u)})} = \frac{1}{1-q} \int \left( q\, u\, p\left(x; \tfrac{1}{u}\theta_{(u)}\right) \right)^{\frac{1-q}{q}} dx. \tag{21}$$
A function $z = u\, p(x; (1/u)\theta_{(u)})$ can be replaced with $(1/q)\, p(x;\theta_{(1)})^{q}$ on a level surface $(f_q(S_1), D, h)$. Then, the right-hand side of Equation (21) becomes
$$\frac{1}{1-q} \int p(x;\theta_{(1)})^{1-q}\, dx, \qquad a = (\theta_a, u_a) \in f_q(S_1).$$
Therefore, on a level surface $(f_q(S_1), D, h)$, the dual coordinate system of the Hessian structure $(\Theta, D, h = (\partial^2 \psi_q / \partial \theta^i \partial \theta^j))$ coincides with the $(-\alpha)$-coordinate system defined by an affine immersion $f_{1-q}$.
Generally, the submanifold structure of $f_q(S_1)$ induced by $(\Theta, D, D^{(-1)}, h)$ coincides with the dualistic structure induced by the equiaffine immersion $(f_q, E_q)$, where $E_q := -d\psi_q(\tilde{E})^{-1} \tilde{E}$ for the gradient vector field $\tilde{E}$ of $\psi_q$ on $\Theta$, and $h(\tilde{X}, \tilde{E}) = d\psi_q(\tilde{X})$ for $\tilde{X} \in \mathfrak{X}(\Theta)$ [19,20,21,22,23].
Furthermore, $(f_q(S_1), D, h)$ has a constant curvature $\kappa = q(1-q) = (1-\alpha^2)/4$ [13].

6. Extended Divergences on Foliations by Escort Distributions

In this section, we describe a foliation of escort distributions on an extended parameter space and define an extended divergence on this foliation.
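Let $\mathcal{F} = \bigcup_{0<q<1} f_q(S_1) \subset \mathbf{A}^n \times \mathbf{A}_+$, where each leaf $f_q(S_1)$ is considered as a q-escort distribution family generated by an exponential family.
Definition 2. Let $\psi_q$ be a function on the extended parameter space $\Theta$ defined by
$$\psi_q(a) = \frac{1}{1-q} \int \left( q\, u_a\, p\left(x; \tfrac{1}{u_a}\theta_{(u_a)}\right) \right)^{\frac{1}{q}} dx, \qquad a = (\theta_a, u_a) \in \Theta,\ \theta_a \in \Theta_{(u_a)},\ u_a \in \mathbf{A}_+,$$
and let $f_q(S_1)$ be the image, which is a level surface of $\psi_q$ satisfying $\psi_q(a) = 1/(1-q)$, where the affine immersion of $S_1$ into $\mathbf{A}^n \times \mathbf{A}_+$ is defined by
$$f_q : p(x;\theta_{(1)}) \mapsto (\theta_{(u)}, u) \in \Theta_{(u)} \times \{u\} \subset \Theta \subset \mathbf{A}^n \times \mathbf{A}_+, \qquad u = \frac{1}{q} \int p(x;\theta_{(1)})^{q}\, dx,$$
and $(\theta_{(u)}, u)$ is $u$ times the α-coordinate of $(\tilde{\theta}_{(1)}, 1) \in S_1$, with $q = (1-\alpha)/2$. Then, an extended divergence $\rho_{fol}$ on the foliation $\mathcal{F}$ is defined as a function on $\mathcal{F} \times \mathcal{F}$ given by
$$\rho_{fol}(a,b) := \psi_{q_a}(a) - \psi_{q_b}(b) - \int \frac{1}{1-q_b} \left( u_b\, p\left(x; \tfrac{1}{u_b}\theta_b\right) \right)^{1-q_b} \left\{ \frac{1}{q_a} \left( u_a\, p\left(x; \tfrac{1}{u_a}\theta_a\right) \right)^{q_a} - \frac{1}{q_b} \left( u_b\, p\left(x; \tfrac{1}{u_b}\theta_b\right) \right)^{q_b} \right\} dx$$
for $a \in f_{q_a}(S_1)$, $b \in f_{q_b}(S_1)$, $0 < q_a < 1$, $0 < q_b < 1$.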
On the level surface $(f_q(S_1), D, h)$, the restricted divergence from the canonical divergence of $(\mathbf{A}_+^{n+1}, D, h)$ coincides with the geometric divergence for the affine immersion $(f_q, E_q)$ [19,20]. The pullback divergence to $S_1$ coincides with $D^{(\alpha)}$.
Let $\rho_q$ be the divergence on $f_q(S_1)$ defined by the affine immersion $(f_q, E_q)$ in Section 5. We identify the dual space $\mathbf{A}^{n+1\,*}$ with $\mathbf{A}^{n+1}$. For each $q$, the dual coordinates are defined by
$$\eta_i(b) := \frac{\partial \psi_{q_b}(b)}{\partial \theta^i}, \qquad i = 1, \dots, n+1,\ (\theta^1, \dots, \theta^{n+1}) \in \Theta,\ \theta^{n+1} = u \in \mathbf{A}_+.$$
By direct calculation, we obtain the following results.
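Proposition 3. For $0 < q < 1$, the dual coordinates of a level surface $(f_q(S_1), D, h)$ are given by
$$\eta_i = \frac{1}{(1-q)q} \cdot \frac{1}{u} \int \left( q\, u\, p\left(x; \tfrac{1}{u}\theta\right) \right)^{\frac{1}{q}} \left( c_i(x) - \left. \frac{\partial \Phi(\theta_{(1)})}{\partial \theta_{(1)}^i} \right|_{\theta_{(1)} = \frac{1}{u}\theta} \right) dx, \qquad i = 1, \dots, n, \tag{23}$$
$$\eta_{n+1} = \frac{1}{(1-q)q} \cdot \frac{1}{u} \int \left( q\, u\, p\left(x; \tfrac{1}{u}\theta\right) \right)^{\frac{1}{q}} \left\{ 1 - \frac{1}{u} \sum_{i=1}^{n} \theta^i \left( c_i(x) - \left. \frac{\partial \Phi(\theta_{(1)})}{\partial \theta_{(1)}^i} \right|_{\theta_{(1)} = \frac{1}{u}\theta} \right) \right\} dx. \tag{24}$$
Proof. Using the techniques described in Section 3, Equation (23) is derived. By Equation (14), it holds that
$$\eta_{n+1} = \frac{\partial \psi_{q_b}(b)}{\partial \bar{u}} - \frac{1}{u} \sum_{i=1}^{n} \theta^i \frac{\partial \psi_{q_b}(b)}{\partial \theta^i} = \frac{1}{(1-q)q} \cdot \frac{1}{u} \int \left( q\, u\, p\left(x; \tfrac{1}{u}\theta\right) \right)^{\frac{1}{q}} dx - \frac{1}{u} \sum_{i=1}^{n} \theta^i \eta_i.$$
Substituting Equation (23) into this relation yields Equation (24). □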
By changing from the integral notation to the coordinate component notation, the following properties hold.
Proposition 4. The extended divergence $\rho_{fol}$ on $\mathcal{F}$ satisfies the following properties:
(i) If $a, b \in f_{q_a}(S_1)$, then
$$\rho_{fol}(a,b) = \rho_{q_a}(a,b) = D^{(\alpha_a)}\left( f_{q_a}^{-1}(a), f_{q_a}^{-1}(b) \right),$$
where $\rho_{q_a}$ is the divergence on $f_{q_a}(S_1)$ defined by an affine immersion, $D^{(\alpha_a)}$ is an $\alpha_a$-divergence defined by Equation (3), and $\alpha_a = 1 - 2q_a$.
(ii) If $q_a \geq q_b$, then
$$\rho_{fol}(a,b) \geq 0 \quad \text{for } (a,b) \in \mathcal{F} \times \mathcal{F},$$
where equality holds if and only if $a = b$.
Proof. If $a, b \in f_{q_a}(S_1)$, then $\psi_{q_a}(a) = \psi_{q_b}(a) = \psi_{q_b}(b)$. By Definition 2 and the properties of affine immersions, it follows that
$$\rho_{fol}(a,b) = -\sum_{i=1}^{n+1} \eta_i(b) \left( \theta^i(a) - \theta^i(b) \right) = \rho_{q_a}(a,b),$$
which proves (i).
Next, if $1 > q_a \geq q_b > 0$, then we have $\psi_{q_a}(a) \geq \psi_{q_b}(b)$ from
$$\psi_{q_a}(a) = \frac{1}{1-q_a}, \qquad \psi_{q_b}(b) = \frac{1}{1-q_b}$$
by the definition of $f_q(S_1)$. Geometrically, $f_{q_a}(S_1)$ and $f_{q_b}(S_1)$ are convex surfaces centered at the origin of $\mathbf{A}_+^{n+1}$, and the surface $f_{q_a}(S_1)$ lies closer to the origin than $f_{q_b}(S_1)$. Therefore, we obtain $\sum_{i=1}^{n+1} \eta_i(b) (\theta^i(a) - \theta^i(b)) \leq 0$, which proves (ii). □
The extended dual divergence $\rho_{fol}^{*}$ of $\rho_{fol}$ is defined in the same manner as that for discrete escort distributions [15,16].
The proposed extended divergence is closely related to the duo Bregman (pseudo)divergence, where the parameters also define the underlying convex functions [24,25].

7. Decomposition of an Extended Divergence

In this section, we propose a decomposition theorem of an extended divergence. The following discussion also characterizes the extended divergence associated with q-escort distribution families under Definition 2.
We consider the flow on $\mathcal{F} = \bigcup_{0<q<1} f_q(S_1)$ defined by
$$\frac{d\eta_i}{dt} = \eta_i, \qquad i = 1, \dots, n+1, \tag{26}$$
where each function $\eta_i$ on $\mathcal{F}$ represents the i-th component of the dual coordinate on $f_q(S_1)$ for each $0 < q < 1$. On a level surface, because the dual coordinate is parallel to the gradient of $f_q$, $(\eta_1, \dots, \eta_{n+1})$ is orthogonal to $f_q(S_1)$. Then, an integral curve of Equation (26) is orthogonal to $f_q(S_1)$ for each $q$ with respect to the pairing $\langle \cdot, \cdot \rangle$ on $\mathbf{A}^{n+1}$ and $\mathbf{A}^{n+1\,*}$. The set of integral curves becomes the orthogonal foliation $\mathcal{F}^{\perp}$ of $\mathcal{F}$.
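The flow in Equation (26) is linear in the dual coordinates, so its integral curves can be written explicitly. The short derivation below takes the equation in the form displayed above and introduces $t_b$ and $t_c$ as hypothetical parameter values at which one integral curve passes through two points $b$ and $c$; it shows why two points on the same orthogonal leaf have proportional dual coordinates with a positive factor.

```latex
\frac{d\eta_i}{dt} = \eta_i \quad (i = 1, \dots, n+1)
\quad\Longrightarrow\quad
\eta_i(t) = e^{\,t - t_b}\, \eta_i(t_b),
\qquad\text{hence}\qquad
\eta(c) = \mu\, \eta(b), \quad \mu = e^{\,t_c - t_b} > 0 .
```

The factor $\mu$ is positive for either orientation of the flow, since reversing the sign of the right-hand side only replaces $\mu$ by $1/\mu$.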
Translating into the primal coordinate system yields the following equations on $\mathcal{F}$:
$$\frac{d\theta^i}{dt} = \tilde{E}^i, \qquad i = 1, \dots, n+1,$$
$$\tilde{E}^i := \tilde{E}_q^i = \sum_{j=1}^{n+1} h_q^{ij} \frac{\partial \psi_q}{\partial \theta^j} \quad \text{if } (\theta^i) \in f_q(S_1),$$
where $(h_q^{ij})$ is the inverse matrix of $(h_{q\,ij})$. A leaf of $\mathcal{F}^{\perp}$ is an integral curve of the vector field $\tilde{E}$ that represents the value $\tilde{E}_q$ on $f_q(S_1)$ for each $q$.
The following theorem describes the decomposition of the extended divergence.
Theorem 2. Let $S_1$ be an exponential family, and let $(f_q(S_1), D, h_q = D\, d\psi_q)$ be the 1-conformally flat statistical manifold generated by the affine immersion $(f_q, E_q)$, where $f_q$ is defined by Equation (20). Let $E_q := -d\psi_q(\tilde{E}_q)^{-1} \tilde{E}_q$ and $\tilde{E}_q^i := \sum_{j=1}^{n+1} g_q^{ij}\, \partial \psi_q / \partial \theta^j$, where $g_q$ is the restriction of $(g_{q\,ij}) = D\, d\psi_q$ to $f_q(S_1)$. Let $a, b \in f_{q_a}(S_1)$ with $0 < q_a < 1$, and let $c \in \mathcal{F} := \bigcup_{0<q<1} f_q(S_1)$. If there exists an orthogonal leaf $L \in \mathcal{F}^{\perp}$ containing both $b$ and $c$, then we have
$$\rho_{fol}(a,c) = \mu\, \rho_{fol}(a,b) + \rho_{fol}(b,c), \qquad \eta(c) = \mu\, \eta(b), \quad \mu > 0,$$
where $\eta(\cdot)$ denotes the dual coordinate of $f_q(S_1)$ for each $q$.
Proof. Because $a, b \in f_{q_a}(S_1)$, it follows that $\psi_{q_a}(a) = \psi_{q_b}(b)$ with $q_b = q_a$. By Definition 2, we obtain
$$\begin{aligned}
\rho_{fol}(a,c) &= \psi_{q_a}(a) - \psi_{q_c}(c) - \sum_{i=1}^{n+1} \eta_i(c) \left( \theta^i(a) - \theta^i(c) \right) \\
&= \psi_{q_b}(b) - \psi_{q_c}(c) - \sum_{i=1}^{n+1} \left\{ \eta_i(c) \left( \theta^i(a) - \theta^i(b) \right) + \eta_i(c) \left( \theta^i(b) - \theta^i(c) \right) \right\} \\
&= -\mu \sum_{i=1}^{n+1} \eta_i(b) \left( \theta^i(a) - \theta^i(b) \right) + \left\{ \psi_{q_b}(b) - \psi_{q_c}(c) - \sum_{i=1}^{n+1} \eta_i(c) \left( \theta^i(b) - \theta^i(c) \right) \right\} \\
&= \mu\, \rho_{fol}(a,b) + \rho_{fol}(b,c). \qquad \square
\end{aligned}$$
In a manner similar to discrete escort distributions [15,16,26], we obtain the gradient flow on a leaf $f_q(S_1)$ using the extended divergence.
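Finally, an example of a normal distribution is shown. A normal distribution is defined by
$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}.$$
The affine coordinate system $(\theta^1, \theta^2)$ is defined as
$$p(x; \theta^1, \theta^2) = \exp\left\{ \theta^1 x + \theta^2 x^2 + \frac{(\theta^1)^2}{4\theta^2} - \log\left( \sqrt{2\pi}\, \sigma \right) \right\},$$
where
$$\theta^1 = \frac{\mu}{\sigma^2}, \qquad \theta^2 = -\frac{1}{2\sigma^2}.$$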
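The change of coordinates for the normal distribution can be checked directly. The sketch below converts $(\mu, \sigma^2)$ to the natural parameters, evaluates the density in exponential-family form with the standard log-partition function $\Phi(\theta) = -(\theta^1)^2/(4\theta^2) + \tfrac{1}{2}\log(-\pi/\theta^2)$, and also illustrates that multiplying the natural parameters by $q$ gives the normal density with variance $\sigma^2/q$, which is the normalized q-escort of the original one. Function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) in the usual (mean, variance) form."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def to_natural(mean, var):
    """Natural parameters (theta^1, theta^2) of N(mean, var)."""
    return mean / var, -1.0 / (2.0 * var)

def log_partition(t1, t2):
    """Phi(theta) for the normal family; requires t2 < 0."""
    return -t1 ** 2 / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)

def pdf_natural(x, t1, t2):
    """Density exp(theta^1 x + theta^2 x^2 - Phi(theta))."""
    return math.exp(t1 * x + t2 * x * x - log_partition(t1, t2))

mean, var, q = 0.8, 1.7, 0.5
t1, t2 = to_natural(mean, var)
x = 1.3
same_density = abs(pdf_natural(x, t1, t2) - normal_pdf(x, mean, var))
# Scaling theta by q keeps the mean and divides the precision by 1/q
escort_gap = abs(pdf_natural(x, q * t1, q * t2) - normal_pdf(x, mean, var / q))
print(same_density, escort_gap)
```

Both printed gaps should be at the level of floating-point rounding.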
When $\theta^1 = 0$, the values of $\psi_q$ and level surfaces for $\psi_q = 1/(1-q)$ are shown in Figure 3 ($q = 0.5$) and Figure 4 ($q = 0.25$). The level surfaces overlaid for $q = 0.75$, $0.5$, and $0.25$ are shown in Figure 5. Figure 6 shows the level surface for $q = 0.5$, taking $\theta^1$ into consideration. A gradient flow can be constructed only in directions where the Hessian of $\psi_q$ does not degenerate. In directions where it degenerates, it is necessary to define a separate dual geometric structure.

8. Discussion

This study investigated a foliation of deformed exponential families corresponding to sets of escort distributions with q-parameters, accommodating the continuous transition of α-parameters within the framework of information geometry. By establishing a natural definition for this foliation, we provide an account of an extended divergence that quantifies the proximity between q-escort distributions characterized by different q-parameters.
In the future, the cases of $1 \leq q < 3$, which are important in nonequilibrium statistical mechanics, and $-\infty < q \leq 0$ must be considered. Accurately comparing the q-escort and q-exponential families also remains a challenge.
This theoretical framework offers a mathematical foundation for advancing machine learning and deep learning methodologies, particularly in systems where heterogeneous q-parameters coexist. In the context of deep learning, our results are applicable to the design of robust loss functions and q-generalized activation functions (such as the q-softmax), which are used for handling heavy-tailed noise and non-Gaussian data distributions. Furthermore, the proposed decomposition theorem provides an information-geometric basis for adaptive optimization algorithms, potentially allowing the dynamic tuning of q-parameters during the training process to enhance model generalization and convergence. While the decomposition theorem suggests a structural path toward determining optimal q-parameters, the specific algorithmic implementations for hyperparameter optimization warrant further investigation. Exploring the connection between our framework and the recently proposed λ-duality in nonextensive statistical mechanics [27,28] remains a promising direction for future research in both statistical physics and neural network theory.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

I am grateful to the referees for their constructive comments.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Tsallis, C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World; Springer: New York, NY, USA, 2009.
2. Naudts, J. Generalised Thermostatistics; Springer: London, UK, 2011.
3. Ohara, A.; Wada, T. Information geometry of q-Gaussian densities and behaviors of solutions to related diffusion equations. J. Phys. A Math. Theor. 2010, 43, 035002.
4. Matsuzoe, H.; Ohara, A. Geometry for q-exponential families. In Recent Progress in Differential Geometry and Its Related Fields; Adachi, T., Hashimoto, H., Hristov, M.J., Eds.; World Scientific Publishing: Hackensack, NJ, USA, 2011; pp. 55–71.
5. Amari, S.; Ohara, A.; Matsuzoe, H. Geometry of deformed exponential families: Invariant, dually-flat and conformal geometry. Physica A 2012, 391, 4308–4319.
6. Matsuzoe, H.; Henmi, M. Hessian structures and divergence functions on deformed exponential families. In Geometric Theory of Information, Signals and Communication Technology; Nielsen, F., Ed.; Springer: Basel, Switzerland, 2014; pp. 57–80.
7. Matsuzoe, H.; Wada, T. Deformed algebras and generalizations of independence on deformed exponential families. Entropy 2015, 17, 5729–5751.
8. Wada, T.; Matsuzoe, H.; Scarfone, A.M. Dualistic Hessian structures among the thermodynamic potentials in the κ-thermostatistics. Entropy 2015, 17, 7213–7229.
9. Amari, S. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2016.
10. Scarfone, A.M.; Matsuzoe, H.; Wada, T. Information geometry of κ-exponential families: Dually-flat, Hessian and Legendre structures. Entropy 2018, 20, 436.
11. Naudts, J. Estimators, escort probabilities, and ϕ-exponential families in statistical physics. J. Ineq. Pure Appl. Math. 2004, 5, 102.
12. Ohara, A. Geometry of distributions associated with Tsallis statistics and properties of relative entropy minimization. Phys. Lett. A 2007, 370, 184–193.
13. Ohara, A. Geometric study for the Legendre duality of generalized entropies and its application to the porous medium equation. Eur. Phys. J. B 2009, 70, 15–28.
14. Matsuzoe, H. A sequence of escort distributions and generalizations of expectations on q-exponential family. Entropy 2017, 19, 7.
15. Uohashi, K. Extended Divergence on a Foliation by Deformed Probability Simplexes. Entropy 2022, 24, 1736.
16. Uohashi, K. A Foliation by Deformed Probability Simplexes for Transition of α-Parameters. Phys. Sci. Forum 2022, 5, 53.
17. Uohashi, K. A Foliation by Escort Distributions of Exponential Families and Extended Divergence. In Geometric Science of Information, 7th International Conference, GSI 2025; Nielsen, F., Barbaresco, F., Eds.; LNCS 16033; Springer: Cham, Switzerland, 2025; pp. 83–91.
18. Shima, H. The Geometry of Hessian Structures; World Scientific: Singapore, 2007.
19. Uohashi, K.; Ohara, A.; Fujii, T. 1-conformally flat statistical submanifolds. Osaka J. Math. 2000, 37, 501–507.
20. Uohashi, K.; Ohara, A.; Fujii, T. Foliations and divergences of flat statistical manifolds. Hiroshima Math. J. 2000, 30, 403–414.
21. Nomizu, K.; Sasaki, T. Affine Differential Geometry: Geometry of Affine Immersions; Cambridge University Press: Cambridge, UK, 1994.
22. Kurose, T. On the divergences of 1-conformally flat statistical manifolds. Tohoku Math. J. 1994, 46, 427–433.
23. Nomizu, K.; Pinkal, U. On Geometry and Affine Immersions. Math. Z. 1987, 195, 165–178.
24. Azoury, K.S.; Warmuth, M.K. Relative loss bounds for online density estimation with the exponential family of distributions. Mach. Learn. 2001, 43, 211–246.
25. Nielsen, F. Statistical divergences between densities of truncated exponential families with nested supports: Duo–Bregman and Duo–Jensen divergences. Entropy 2022, 24, 421.
26. Fujiwara, A.; Amari, S. Gradient Systems in View of Information Geometry. Physica D 1995, 80, 317–327.
27. Zhang, J.; Wong, T.K.L. λ-Deformation: A canonical framework for statistical manifolds of constant curvature. Entropy 2022, 24, 193.
28. Wong, T.K.L.; Zhang, J. Tsallis and Rényi deformations linked via a new λ-duality. IEEE Trans. Inf. Theory 2022, 68, 5353–5373.
Figure 1. An extended parameter space Θ (displaying cases for u = 1, 2, and 3).
Figure 2. Transversal vector fields.
Figure 3. Contour lines of a potential function ψ q (q = 0.5). The thick line is for 1/(1 − q) = 2 (horizontal axis: θ 2, vertical axis: u).
Figure 4. Contour lines of a potential function ψ q (q = 0.25). The thick line is for 1/(1 − q) = 1.333 (horizontal axis: θ 2, vertical axis: u).
Figure 5. Level surfaces that realize the q-escort distribution family (in order from the bottom: q = 0.75, 0.5, and 0.25) and a gradient flow from c to b. The arrows point in the gradient direction of the level surfaces and represent the dual coordinates at each point. The case u = 1 corresponds to the exponential family (q = 1). Horizontal axis: θ 2; vertical axis: u.
Figure 6. A level surface that realizes the q-escort distribution family for q = 0.5 (horizontal axes: θ 1, θ 2; vertical axis: u).
Uohashi, K. Extended Divergence on a Foliation by Continuous-Type Escort Distributions. Entropy 2026, 28, 629. https://doi.org/10.3390/e28060629