Article

Information-Geometric Approach for a One-Sided Truncated Exponential Family

1
Graduate School of Engineering Science, Osaka University, Osaka 560-8531, Japan
2
Center for Education in Liberal Arts and Sciences, Osaka University, Osaka 560-0043, Japan
*
Author to whom correspondence should be addressed.
Entropy 2023, 25(5), 769; https://doi.org/10.3390/e25050769
Submission received: 10 April 2023 / Revised: 1 May 2023 / Accepted: 5 May 2023 / Published: 8 May 2023
(This article belongs to the Special Issue Information Geometry for Data Analysis)

Abstract

In information geometry, there has been extensive research on the deep connections between differential geometric structures, such as the Fisher metric and the α-connection, and the statistical theory for statistical models satisfying regularity conditions. However, the study of information geometry for non-regular statistical models is insufficient, and a one-sided truncated exponential family (oTEF) is one example of these models. In this paper, based on the asymptotic properties of maximum likelihood estimators, we provide a Riemannian metric for the oTEF. Furthermore, we demonstrate that the oTEF has an α = 1 parallel prior distribution and that the scalar curvature of a certain submodel, including the Pareto family, is a negative constant.

1. Introduction

Information geometry is the study of the structure of statistical models using differential geometry. From the standpoint of geometry, a statistical model consisting of a collection of parameterized probability distributions can be regarded as a manifold. When the statistical model satisfies certain regularity conditions, Chentsov’s theorem leads to a natural differential geometric structure [1]. This structure consists of a Riemannian metric defined from the Fisher information matrix, called the Fisher metric, and a one-parameter family of affine connections, called the α-connections.
Information geometry of regular models has been studied for a long time, and deep relationships between the geometric structures and statistical models have been revealed (Amari [2] and Amari and Nagaoka [3]).
However, the geometric properties of statistical models that do not satisfy the regularity conditions have not been sufficiently investigated. One reason is that Chentsov’s theorem cannot be applied to non-regular models; thus, a natural geometric structure is not established. For example, in regular models, the Fisher information matrix can be defined in two equivalent forms, but in statistical models where the support of the probability density function depends on the parameters, the two forms do not coincide. In previous work, Amari [4] discussed the relationship between Finsler geometry and non-regular models, especially for the translation family.
In the present study, we discuss information geometry for a one-sided truncated exponential family (oTEF) [5], a typical non-regular model. An oTEF is a statistical model with two parameters: the natural parameter θ and the truncation parameter γ. The support of the probability density function depends on the truncation parameter γ, which is what makes such a model non-regular. However, similar to the exponential family, the derivatives of the log-likelihood function, moments, and KL divergence can be explicitly described. Additionally, an oTEF model is a model containing many practical examples, such as the Pareto distribution family, truncated normal distribution family, and truncated exponential distribution family. The statistical properties of the oTEF have been studied for a long time, particularly in the area of parameter estimation theory (Bar-Lev [5], Akahira [6], and Akahira [7]). Recently, Nielsen [8] studied the relationship between the Kullback–Leibler divergence and the Bhattacharyya distance in the oTEF. Shemyakin [9] explored the Hellinger information matrix and its application to noninformative priors for multiparameter non-regular cases, which includes the oTEF, within the framework of Bayesian statistics.
The family of Pareto distributions, a type of oTEF, is a topic of study in information geometry as well (Rylov [10], Peng et al. [11], Li et al. [12], and Sun et al. [13]). As a result, it is known that the Pareto model has constant curvature with respect to the Levi-Civita connection. However, it is impossible to introduce a natural geometric structure, so the geometric structure of the Pareto distribution family has been formally defined based on regular models. Consequently, the consistency with statistical theory remains to be determined. Moreover, multiple geometric structures exist depending on the regular model expressions used as references. For example, Peng et al. [11], Rylov [10], and Sun et al. [13] all define the geometric structure of the Pareto distribution family based on the Fisher metric and α-connection, but these structures are different. Therefore, the question of which geometric structure should be used remains unresolved.
The results of the present study consist of two theorems: one regarding the α-parallel prior distribution and another concerning curvature. First of all, by comparing the relationship between the Fisher metric and the asymptotic behavior of the maximum likelihood estimators in regular models with those in the oTEF, we define the Riemannian metric for the oTEF. Additionally, we use a formal definition for the α-connection, similar to Sun et al. [13]. Under this geometric structure, we demonstrate two geometric properties. The first property is that the oTEF always has an α-parallel prior distribution with α = 1. The α-parallel prior distribution is a type of noninformative prior distribution used in Bayesian statistics and is an information-geometric extension of Jeffreys prior distribution. Although the α-parallel prior distribution with α = 1 always exists, this is not true for other α values. Such a phenomenon does not occur in regular models, highlighting the non-regularity. The second research result is related to the curvature of a submodel of the oTEF. We consider a submodel of the oTEF that includes common scale Pareto distribution families [14] and common location exponential distribution families [15] and demonstrate that its scalar curvature is constant. This result, stating that the scalar curvature of this submodel is constant, corresponds to a higher-dimensional version of the fact that the Pareto model has constant curvature [10].
The remainder of this paper is organized as follows. In Section 2, we review the geometric structure of regular statistical models and one-sided truncated exponential families, providing an overview of the relevant background. In Section 3, we provide a Riemannian metric and affine connections in the oTEF, giving the family a differential geometric structure. Under this geometric structure, we study the α-parallel priors of the oTEF in Section 4. Finally, Section 5 gives our derivation of the curvature for a submodel of the oTEF.

2. Preliminaries

2.1. Statistical Manifold

In this section, we review basic definitions and notation of information geometry for regular statistical models. See work by Amari [2] for more details.
Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a family of distributions $P_\theta$ on a sample space $\chi$ parametrized by $\theta = (\theta_1, \dots, \theta_n)$, where $\Theta$ is an open subset of $\mathbb{R}^n$. We treat the statistical model $\mathcal{P}$ as an n-dimensional manifold, so that the parameter $\theta$ plays the role of coordinates on $\mathcal{P}$. We assume that the parameter $\theta$ and the distribution $P_\theta$ are in one-to-one correspondence and that all distributions in $\mathcal{P}$ are absolutely continuous with respect to a dominating measure $\mu$. We denote the probability density function of $P_\theta$ with respect to $\mu$ as $p(x, \theta)$.
In information geometry, we assume certain conditions on $\mathcal{P}$.
Definition 1 (Regular statistical models).
A statistical model $\mathcal{P}$ is said to be regular if it satisfies the following conditions.
1. The support of the density function
$$\operatorname{supp} p = \overline{\{x \in \chi : p(x, \theta) > 0\}}$$
is independent of $\theta \in \Theta$.
2. For all $x \in \chi$, the function $\theta \mapsto p(x, \theta)$ is $C^\infty$ on $\Theta$.
3. Let $l(x, \theta) = \log p(x, \theta)$ and $\partial_i = \partial / \partial \theta_i$. The elements of $\{\partial_1 l, \dots, \partial_n l\}$ are linearly independent as functions on $\chi$.
4. For any functions in the following, partial differentiation $\partial_i$ and integration with respect to the measure $\mu$ are interchangeable as
$$\int_\chi \partial_i f(x, \theta)\, d\mu(x) = \partial_i \int_\chi f(x, \theta)\, d\mu(x).$$
The above four conditions are referred to as regularity conditions.
We will treat non-regular statistical models in later sections. In preparation, this subsection discusses the geometric structure of regular statistical models.
Next, we introduce the Fisher metric and the α-connection. Let $\mathcal{P}$ be a regular statistical model and $X$ be a random variable distributed according to $P_\theta$. The Riemannian metric and affine connections on $\mathcal{P}$ are usually defined as follows.
Definition 2 (Fisher metric).
The Fisher metric is the Riemannian metric on $\mathcal{P}$ defined by
$$g^F_{ij} = E[\partial_i l(X, \theta)\, \partial_j l(X, \theta)]$$
for $i, j = 1, \dots, n$. Here, the symbol $E[\cdot]$ denotes the expectation with respect to the observation $X$.
Definition 3 (α-connection).
The α-connection is an affine connection defined by the coefficients
$$\Gamma^{(\alpha)}_{ij,k} = E[\partial_i \partial_j l(X, \theta)\, \partial_k l(X, \theta)] + \frac{1 - \alpha}{2} E[\partial_i l(X, \theta)\, \partial_j l(X, \theta)\, \partial_k l(X, \theta)] \tag{4}$$
for $i, j, k = 1, \dots, n$ and $\alpha \in \mathbb{R}$ on the coordinate system $\theta$.
If $\mathcal{P}$ satisfies the regularity conditions, the Fisher metric and the α-connection have several properties. For instance, there is a formula [16] convenient for calculating the Fisher metric:
$$g^F_{ij} = -E[\partial_i \partial_j l(x, \theta)].$$
Furthermore, the α-connection and the −α-connection are dual. In other words,
$$A \langle B, C \rangle = \langle \nabla^{(\alpha)}_A B, C \rangle + \langle B, \nabla^{(-\alpha)}_A C \rangle$$
holds for all tangent vectors $A$, $B$, and $C$, where $\langle \cdot, \cdot \rangle$ denotes the inner product given by the Fisher metric. In particular, the 0-connection is self-dual. Furthermore, the 0-connection corresponds to the Levi-Civita connection, defined by the coefficients
$$\Gamma^{g}_{ij,k} = \frac{1}{2} \left( \partial_i g^F_{jk} + \partial_j g^F_{ki} - \partial_k g^F_{ij} \right).$$

2.2. Prior Distributions and Volume Elements

Next, we introduce volume elements on a regular statistical model and its relation to Bayesian statistics [17].
In Bayesian statistics, for a given statistical model $\mathcal{P}$, we need a probability distribution over the model parameter space, which is called a prior distribution or simply a prior. We often denote a prior density as $\pi$, where $\pi(\theta) \geq 0$ and $\int_\Theta \pi(\theta)\, d\theta = 1$.
A volume element on an n-dimensional oriented model manifold corresponds to a prior density function over the parameter space ($\theta \in \Theta \subset \mathbb{R}^n$) in a one-to-one manner. For a prior $\pi(\theta)$, its corresponding volume element $\omega$ is an n-form (differential form of degree n) and is written as
$$\omega = \pi(\theta)\, d\theta_1 \wedge \cdots \wedge d\theta_n$$
in the local coordinates.
For example, in the two-dimensional Euclidean space ($n = 2$), the volume element is given by $\omega = dx \wedge dy$ in the Cartesian coordinates $(x, y)$. In the polar coordinates $(r, \psi)$, it is written as $\omega = r\, dr \wedge d\psi$.
Now, let us explain noninformative priors (see, e.g., work by Robert [18] for details). If we have specific information on the parameter in advance, then the prior should reflect it, in which case it is often called a subjective prior. If not, we adopt a certain criterion and use a prior obtained through the criterion. Such priors are called noninformative priors or objective priors. In particular, the Jeffreys prior, whose density is proportional to $\sqrt{\det(g)}$, is the standard noninformative prior [19].
In information geometry, the Jeffreys prior is regarded as a 0-parallel volume element and was extended to an α-parallel volume element by Takeuchi and Amari [17]. The extensions are called α-parallel priors. They showed that the geometrical properties of regular models are deeply related to the existence of α-parallel priors.
When a suitable geometric structure is defined on a non-regular statistical model, it is interesting to see the relationship between the geometrical properties and the existence of α-parallel priors. Later sections will discuss this topic. In preparation, in this subsection, we briefly summarize some definitions and facts of α-parallel priors for regular models.
To define α-parallel priors, we introduce a geometric property of affine connections.
Definition 4 (Equiaffine).
Let $\mathcal{P}$ be an n-dimensional manifold with an affine connection $\nabla$ induced by a covariant derivative.
An affine connection $\nabla$ is equiaffine if there exists a volume element $\omega$ such that
$$\nabla \omega = 0$$
holds everywhere in $\mathcal{P}$. Furthermore, such a volume element $\omega$ is said to be a parallel volume element with respect to $\nabla$.
The necessary and sufficient condition for an affine connection $\nabla$ to be equiaffine is described by its curvature. The following proposition holds for a manifold with an affine connection $\nabla$. Let $R_{ijk}{}^{l}$ be the components of the Riemannian curvature tensor [3] of $\nabla$, defined as
$$R_{ijk}{}^{l} = \partial_i \Gamma_{jk}^{l} - \partial_j \Gamma_{ik}^{l} + \Gamma_{im}^{l} \Gamma_{jk}^{m} - \Gamma_{jm}^{l} \Gamma_{ik}^{m} \quad (i, j, k, l = 1, \dots, n),$$
where $\Gamma_{ij}^{k}$ denotes the connection coefficients of $\nabla$.
Proposition 1 
(Nomizu and Sasaki [20]). The following conditions are equivalent:
  • $\nabla$ is equiaffine,
  • $R_{ijk}{}^{k} = 0$,
where $R_{ijk}{}^{k} = \sum_{k=1}^{n} R_{ijk}{}^{k}$.
The Levi-Civita connection is always equiaffine (Nomizu and Sasaki [20]).
Returning to the statistical model, we define the α-parallel prior distribution. Let $\mathcal{P}$ be an n-dimensional regular statistical manifold. It has been shown that $\nabla^{(\alpha)}$ is equiaffine for some $\alpha \in \mathbb{R} \setminus \{0\}$ if and only if it is equiaffine for all $\alpha \in \mathbb{R}$. Such a statistical manifold $\mathcal{P}$ is said to be statistically equiaffine. In a statistically equiaffine manifold, we may represent the α-parallel volume element $\omega^{(\alpha)}$ as
$$\omega^{(\alpha)} = \pi^{(\alpha)}(\theta)\, d\theta_1 \wedge \cdots \wedge d\theta_n$$
for the coordinates $\theta$, where $\pi^{(\alpha)} \in C^\infty(\mathcal{P})$. We take $\pi^{(\alpha)}(\theta)$ as a prior distribution on the parameter space $\Theta$.
Definition 5 (α-parallel prior).
In a statistically equiaffine manifold, for fixed $\alpha \in \mathbb{R}$, we call the above form of $\pi^{(\alpha)}$ an α-parallel prior.
Since the 0-connection (Levi-Civita connection) is equiaffine, there always exists a 0-parallel prior, known as the Jeffreys prior.
Takeuchi and Amari [17] give a necessary and sufficient condition for α-parallel priors to exist.
Proposition 2 
(Takeuchi and Amari [17]). For a model manifold $\mathcal{P}$, if
$$\partial_i T_{jk}{}^{k} - \partial_j T_{ik}{}^{k} = 0 \quad (i, j = 1, \dots, n),$$
then the α-parallel prior exists for any $\alpha \in \mathbb{R}$. Otherwise, only the 0-parallel prior exists.

2.3. One-Sided Truncated Exponential Family

In this section, we introduce a one-sided truncated exponential family, which is a typical non-regular model.
Definition 6 
(One-sided Truncated Exponential Family (Bar-Lev [5])). A one-sided truncated exponential family of distributions (oTEF) is a family $\mathcal{P} = \{P_{\theta,\gamma} : \theta \in \Theta, \gamma \in (I_1, I_2)\}$ of distributions $P_{\theta,\gamma}$ with the density functions
$$p(x, \theta, \gamma) = \exp\Big\{ \sum_{i=1}^{n} \theta_i F_i(x) + C(x) - \psi(\theta, \gamma) \Big\} \cdot \mathbb{1}_{[\gamma, I_2)}(x), \quad x \in (I_1, I_2) \tag{13}$$
with respect to the Lebesgue measure. Here, $I_1 < I_2$ are known parameters, $C$ is a continuous function, and $F_1, \dots, F_n$ are absolutely continuous with $dF_1(x)/dx, \dots, dF_n(x)/dx \not\equiv 0$ over the interval $(I_1, I_2)$.
We say $\theta = (\theta_1, \dots, \theta_n)$ is the natural parameter and $\gamma$ is the truncation parameter. We also denote the interval $(I_1, I_2)$ as $I$.
In the oTEF $\mathcal{P}$, the support of the density function depends on the truncation parameter $\gamma$. Hence, the oTEF does not satisfy the regularity conditions. This is the point at which an oTEF differs from an exponential family, whose density has a support that is independent of the parameters.
Furthermore, the truncation parameter $\gamma$ does not allow for the interchange of partial differentiation $\partial_\gamma$ and integration with respect to the Lebesgue measure. For instance, this means
$$E[\partial_\gamma l(X, \theta, \gamma)] = -\partial_\gamma \psi(\theta, \gamma)$$
instead of $E[\partial_\gamma l(X, \theta, \gamma)] = 0$.
Here, we introduce two properties of the function ψ .
First, the quantity $-\partial_\gamma \psi(\theta, \gamma)$ coincides with $p(x, \theta, \gamma)|_{x=\gamma}$ and is always positive. This fact can be verified as follows. Since $p$ is a probability density function,
$$\exp\{\psi(\theta, \gamma)\} = \int_\gamma^{I_2} \exp\Big\{ \sum_{i=1}^{n} \theta_i F_i(x) + C(x) \Big\}\, dx.$$
Therefore, by differentiating both sides with respect to $\gamma$, we obtain
$$\partial_\gamma \psi(\theta, \gamma) \exp\{\psi(\theta, \gamma)\} = -\exp\Big\{ \sum_{i=1}^{n} \theta_i F_i(\gamma) + C(\gamma) \Big\}$$
and
$$-\partial_\gamma \psi(\theta, \gamma) = p(x, \theta, \gamma)\big|_{x=\gamma} > 0.$$
Second, the following lemma holds.
Lemma 1.
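For the function $\psi(\theta, \gamma)$ in (13), there always exist $\theta$ and $\gamma$ such that $\partial_i \partial_\gamma \psi(\theta, \gamma) \neq 0$.
Proof. 
Suppose that $\partial_\gamma \partial_i \psi(\theta, \gamma)$ is always 0. Under this assumption, we will show that $\partial_i p(x, \theta, \gamma) \equiv 0$, leading to the contradiction that $\theta$ is not a parameter.
If $\partial_\gamma \partial_i \psi(\theta, \gamma) \equiv 0$, the function $\psi(\theta, \gamma)$ can be expressed as
$$\psi(\theta, \gamma) = \psi_1(\theta) + \psi_2(\gamma),$$
where $\psi_1 \in C^\infty(\Theta)$, $\psi_2 \in C^\infty(I)$.
Then,
$$\exp\{\psi_1(\theta) + \psi_2(\gamma)\} = \int_\gamma^{I_2} \exp\Big\{ C(x) + \sum_{i} \theta_i F_i(x) \Big\}\, dx.$$
Differentiating both sides with respect to $\gamma$, we obtain
$$-\psi_2'(\gamma)\, e^{\psi_1(\theta) + \psi_2(\gamma)} = \exp\Big\{ C(\gamma) + \sum_{i} \theta_i F_i(\gamma) \Big\},$$
$$\log\{ -\psi_2'(\gamma)\, e^{\psi_2(\gamma) - C(\gamma)} \} + \psi_1(\theta) = \sum_{i} \theta_i F_i(\gamma).$$
Further differentiating both sides with respect to $\theta_i$, we have
$$\partial_i \psi_1(\theta) = F_i(\gamma).$$
Since the left-hand side does not depend on $\gamma$, all $F_i$ are constants.
Therefore,
$$\begin{aligned}
\partial_i p(x, \theta, \gamma) &= p(x, \theta, \gamma)\, \partial_i \Big\{ C(x) + \sum_{j} \theta_j F_j - \psi_1(\theta) - \psi_2(\gamma) \Big\} \\
&= p(x, \theta, \gamma) \{ F_i - \partial_i \psi_1(\theta) \} \\
&= 0.
\end{aligned}$$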
The family of Pareto distributions is an example of the oTEF. Pareto distributions have the following density functions,
$$\begin{aligned}
p(x, \theta, \gamma) &= \frac{\theta \gamma^{\theta}}{x^{\theta + 1}} \cdot \mathbb{1}_{[\gamma, \infty)}(x) \\
&= \exp\big( -(1 + \theta) \log x + \log \theta + \theta \log \gamma \big) \cdot \mathbb{1}_{[\gamma, \infty)}(x)
\end{aligned}$$
with the natural parameter $\theta \in (0, \infty)$ and the truncation parameter $\gamma \in (0, \infty)$. This family is used to describe various natural and social phenomena [21].
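As a check on the notation, the Pareto density can be matched term by term against the oTEF form (13). The identification below is a worked sketch: the choice of $F$ and $C$ is one way of reading off the exponent and is not spelled out in the text. It illustrates both properties of $\psi$ stated above, namely that $-\partial_\gamma \psi$ equals the density at $x = \gamma$ and is positive, and that the mixed derivative does not vanish.

```latex
\begin{aligned}
F(x) &= -\log x, \qquad C(x) = -\log x, \qquad
\psi(\theta, \gamma) = -\log\theta - \theta \log\gamma, \\
\partial_\theta \psi &= -\frac{1}{\theta} - \log\gamma, \qquad
\partial_\theta^2 \psi = \frac{1}{\theta^2}, \\
-\partial_\gamma \psi &= \frac{\theta}{\gamma} = p(x, \theta, \gamma)\big|_{x=\gamma} > 0, \qquad
\partial_\theta \partial_\gamma \psi = -\frac{1}{\gamma} \neq 0 .
\end{aligned}
```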
To discuss geometric structures in subsequent sections, we focus on the asymptotic behavior of maximum likelihood estimators in the oTEF. Our discussion of these estimators follows previous works. Consider random variables $X_1, \dots, X_N$ independent and identically distributed according to $P_{\theta,\gamma}$, and let $X_{(1)}, \dots, X_{(N)}$ be the order statistics of the sample. Let $\hat\theta$ and $\hat\gamma$ denote the maximum likelihood estimators for $\theta$ and $\gamma$, respectively. Bar-Lev [5] showed the existence and uniqueness of $\hat\theta$ and $\hat\gamma$. Here, $\hat\gamma = X_{(1)}$ and $\hat\theta$ is a root of the maximum likelihood equation $\partial_i l(X, \hat\theta, \hat\gamma) = 0$ for $i = 1, \dots, n$.
The first-order asymptotic variances of $\hat\theta$ and $\hat\gamma$ are given by
$$V[\hat\theta] = \frac{1}{N} \big( \partial_i \partial_j \psi(\theta, \gamma) \big)^{-1} + O\Big( \frac{1}{N^2} \Big),$$
$$V[\hat\gamma] = \frac{1}{N^2 \{ \partial_\gamma \psi(\theta, \gamma) \}^2} + O\Big( \frac{1}{N^3} \Big),$$
$$\operatorname{Cov}[\hat\theta, \hat\gamma] = O\Big( \frac{1}{N^3} \Big).$$
These are essential to our argument in the next section. Additionally, Akahira [6] and Akahira and Ohyauchi [22] obtained the second-order asymptotic loss.
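The two convergence rates above ($1/N$ for $\hat\theta$, $1/N^2$ for $\hat\gamma$) can be illustrated by simulation. The following is a minimal Monte Carlo sketch for the Pareto family only, assuming $\psi(\theta, \gamma) = -\log\theta - \theta\log\gamma$, so that the leading terms are $\theta^2/N$ and $\gamma^2/(N^2\theta^2)$. The function names `pareto_mle` and `simulate` are illustrative and do not come from the paper.

```python
import math
import random
import statistics


def pareto_mle(sample):
    """MLE for the Pareto family: gamma_hat is the sample minimum X_(1),
    theta_hat solves the likelihood equation with gamma fixed at gamma_hat."""
    gamma_hat = min(sample)
    log_sum = sum(math.log(x / gamma_hat) for x in sample)
    theta_hat = len(sample) / log_sum
    return theta_hat, gamma_hat


def simulate(theta, gamma, n_obs, n_rep, seed=0):
    """Return the empirical variances of theta_hat and gamma_hat
    over n_rep replications of samples of size n_obs."""
    rng = random.Random(seed)
    theta_hats, gamma_hats = [], []
    for _ in range(n_rep):
        # Inverse-transform sampling: X = gamma * U**(-1/theta), U in (0, 1].
        sample = [gamma * (1.0 - rng.random()) ** (-1.0 / theta)
                  for _ in range(n_obs)]
        t_hat, g_hat = pareto_mle(sample)
        theta_hats.append(t_hat)
        gamma_hats.append(g_hat)
    return statistics.variance(theta_hats), statistics.variance(gamma_hats)


if __name__ == "__main__":
    theta, gamma, n_obs = 2.0, 1.5, 200
    var_theta, var_gamma = simulate(theta, gamma, n_obs, n_rep=3000)
    # Both ratios should be close to 1 if the leading terms are correct.
    print("N * V[theta_hat] / theta^2          :", n_obs * var_theta / theta**2)
    print("N^2 * V[gamma_hat] * (theta/gamma)^2:", n_obs**2 * var_gamma * (theta / gamma) ** 2)
```

Both printed ratios should be near 1 for moderately large `n_obs`, up to Monte Carlo noise and the higher-order terms in the expansions.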

3. Geometric Structure on One-Sided Exponential Families

This section defines a geometric structure on the oTEF. We take $\mathcal{P}$ as an $(n + 1)$-dimensional manifold with coordinates $(\theta_1, \dots, \theta_n, \gamma)$. Since $\mathcal{P}$ does not satisfy the regularity conditions, its geometric structure is not determined in a natural way.

3.1. Riemannian Metric in the oTEF

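In this subsection, we define a Riemannian metric on the oTEF $\mathcal{P}$ as follows.
Definition 7 (Riemannian metric in the oTEF).
Let $\mathcal{P} = \{P_{\theta,\gamma} : \theta \in \Theta, \gamma \in I\}$ belong to the oTEF. The Riemannian metric of the oTEF is defined by
$$g_{ij} = E[\partial_i l(x, \theta, \gamma)\, \partial_j l(x, \theta, \gamma)],$$
$$g_{i\gamma} = 0,$$
$$g_{\gamma\gamma} = \{ \partial_\gamma \psi(\theta, \gamma) \}^2$$
for $i, j = 1, \dots, n$. This metric is also represented as
$$(g) = \begin{pmatrix}
g_{11} & \cdots & g_{1n} & 0 \\
\vdots & \ddots & \vdots & \vdots \\
g_{n1} & \cdots & g_{nn} & 0 \\
0 & \cdots & 0 & \{ \partial_\gamma \psi(\theta, \gamma) \}^2
\end{pmatrix}.$$
Note that $g_{ij}$ can be expressed in terms of the function $\psi$ as follows:
$$g_{ij} = -E[\partial_i \partial_j l(x, \theta, \gamma)] = \partial_i \partial_j \psi(\theta, \gamma).$$
This is similar to the case of exponential family distributions.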
We will now explain how we came to the above definition.
Consider an exponential family $\mathcal{E}_\gamma = \{P_{\theta,\gamma} : \theta \in \Theta\}$ for $\gamma \in I$ and a statistical model $\mathcal{F}_\theta = \{P_{\theta,\gamma} : \gamma \in I\}$ for $\theta \in \Theta$. $\mathcal{E}_\gamma$ is an n-dimensional submanifold of $\mathcal{P}$ obtained by fixing the truncation parameter $\gamma$, and it satisfies the regularity conditions. Additionally, $\mathcal{F}_\theta$ is a one-dimensional submanifold of $\mathcal{P}$ obtained by fixing the natural parameter $\theta$. Since $\mathcal{E}_\gamma$ is regular, the Riemannian metric on $\mathcal{E}_\gamma$ should be the Fisher metric of Definition 2, which fixes the components $g_{ij}$ to be those of the Fisher metric. The remaining task is to define $g_{i\gamma}$, the inner product of $\partial_i$ and $\partial_\gamma$, and $g_{\gamma\gamma}$, the metric on $\mathcal{F}_\theta$.
In this context, we review the statistical interpretation of the Fisher metric in regular models. Let $\mathcal{P}_0$ be a regular model with parameters $\theta \in \mathbb{R}^n$. As mentioned in Section 2.1, the Fisher metric is a Riemannian metric defined by the Fisher information matrix. Expanding the variance of the maximum likelihood estimator $\hat\theta$, we have
$$V[\hat\theta] = \frac{1}{N} (g^F_{ij})^{-1} + O\Big( \frac{1}{N^2} \Big),$$
where the first-order coefficient is $(g^F_{ij})^{-1}$.
In the oTEF, on the other hand, we determine the Riemannian metric from the leading coefficients of the variances of the maximum likelihood estimators. As shown in Section 2.3, the variances of $\hat\theta$ and $\hat\gamma$ are expressed as
$$V[\hat\theta] = \frac{1}{N} (g_{ij})^{-1} + O\Big( \frac{1}{N^2} \Big),$$
$$V[\hat\gamma] = \frac{1}{N^2 \{ \partial_\gamma \psi(\theta, \gamma) \}^2} + O\Big( \frac{1}{N^3} \Big),$$
$$\operatorname{Cov}[\hat\theta, \hat\gamma] = O\Big( \frac{1}{N^3} \Big),$$
where $(g_{ij})$ is the matrix $(g_{ij})_{i,j=1}^{n}$. Then, similarly to the Fisher information matrix, $\{ \partial_\gamma \psi(\theta, \gamma) \}^2$ appears as the reciprocal of the leading coefficient. From this, the Riemannian metric of $\mathcal{F}_\theta$ is defined as
$$g_{\gamma\gamma} = \{ \partial_\gamma \psi(\theta, \gamma) \}^2.$$
Furthermore, it should be noted that $\operatorname{Cov}[\hat\theta, \hat\gamma]$ is negligible up to $O(1/N^2)$. Moreover, whether $\theta$ is known does not affect the first-order term of $V[\hat\gamma]$ [22]. The same is true for the estimation of $\theta$ [23]. Based on these facts, we assume that $\partial_i$ and $\partial_\gamma$ are orthogonal, and define
$$g_{i\gamma} = 0 \quad (i = 1, \dots, n).$$
As a result, the Riemannian metric in Definition 7 is obtained.
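For the Pareto family, Definition 7 can be evaluated in closed form. The worked example below is a sketch assuming $\psi(\theta, \gamma) = -\log\theta - \theta\log\gamma$, which is read off from the Pareto density in Section 2.3. The second line shows that the inverse metric components reproduce the leading terms of the estimator variances.

```latex
\begin{aligned}
g_{\theta\theta} &= \partial_\theta^2 \psi = \frac{1}{\theta^2}, \qquad
g_{\theta\gamma} = 0, \qquad
g_{\gamma\gamma} = \{\partial_\gamma \psi\}^2 = \frac{\theta^2}{\gamma^2}, \\
V[\hat\theta] &\approx \frac{1}{N}\, g_{\theta\theta}^{-1} = \frac{\theta^2}{N}, \qquad
V[\hat\gamma] \approx \frac{1}{N^2}\, g_{\gamma\gamma}^{-1} = \frac{\gamma^2}{N^2 \theta^2} .
\end{aligned}
```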
This Riemannian metric is equal to the formally defined Fisher metric on the oTEF. In other words, the equality
$$g_{ab} = E[\partial_a l(X, \theta, \gamma)\, \partial_b l(X, \theta, \gamma)] \tag{42}$$
holds for $a, b = 1, \dots, n, \gamma$. The right-hand side is the same as the definition of the Fisher metric on regular statistical models.
However, the Riemannian metric does not satisfy the equation
$$g_{a\gamma} = -E[\partial_a \partial_\gamma l(x, \theta, \gamma)] \quad (a = 1, \dots, n, \gamma).$$
For $i = 1, \dots, n$, we have
$$g_{i\gamma} = 0,$$
$$-E[\partial_i \partial_\gamma l(x, \theta, \gamma)] = \partial_i \partial_\gamma \psi(\theta, \gamma),$$
but $\partial_i \partial_\gamma \psi(\theta, \gamma) \neq 0$ for some $\theta$ and $\gamma$, by Lemma 1. This discrepancy arises because the oTEF does not satisfy the regularity conditions.

3.2. Affine Connections in the oTEF

Next, we define an affine connection in the oTEF. In this study, we adopt two types of connections: the Levi-Civita connection and the α-connection.
The first affine connection is the Levi-Civita connection
$$\Gamma^{g}_{ab,c} = \frac{1}{2} \left( \partial_a g_{bc} + \partial_b g_{ca} - \partial_c g_{ab} \right) \tag{46}$$
induced by the Riemannian metric. This Levi-Civita connection is metric and self-dual, the same as in the regular case. Rylov [10] and Li et al. [12] previously adopted this same affine connection.
Second, we define an affine connection in the oTEF by analogy with the α-connection in the regular statistical model given in Definition 3.
Definition 8.
For a given $\alpha \in \mathbb{R}$, the α-connection in the oTEF is defined by the connection coefficients
$$\Gamma^{(\alpha)}_{ab,c}(\theta, \gamma) = E[\partial_a \partial_b l\, \partial_c l] + \frac{1 - \alpha}{2} E[\partial_a l\, \partial_b l\, \partial_c l] \quad (a, b, c = 1, \dots, n, \gamma),$$
where $l = l(X, \theta, \gamma)$ is the log-likelihood function.
The above definition is obtained by substituting the oTEF probability density function into Equation (4) for the α-connection in the regular model. In particular, for the Pareto distribution family, it coincides with the α-connection given by Sun et al. [13] (see the equation in their paper). Note that the log-likelihood function is not differentiable with respect to $\gamma$ at $\gamma = x$; we therefore calculate the expectation over the interval $(\gamma, I_2)$ instead of $[\gamma, I_2)$.
This α-connection is torsion-free,
$$\Gamma^{(\alpha)}_{ab,c} = \Gamma^{(\alpha)}_{ba,c}.$$
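The connection coefficients for the α-connection are given by
$$\Gamma^{(\alpha)}_{ij,k} = \frac{1 - \alpha}{2}\, \partial_i \partial_j \partial_k \psi,$$
$$\Gamma^{(\alpha)}_{ij,\gamma} = \frac{1 + \alpha}{2}\, \partial_i \partial_j \psi\, \partial_\gamma \psi,$$
$$\Gamma^{(\alpha)}_{i\gamma,j} = -\frac{1 - \alpha}{2}\, \partial_i \partial_j \psi\, \partial_\gamma \psi,$$
$$\Gamma^{(\alpha)}_{i\gamma,\gamma} = \partial_i \partial_\gamma \psi\, \partial_\gamma \psi,$$
$$\Gamma^{(\alpha)}_{\gamma\gamma,i} = 0,$$
$$\Gamma^{(\alpha)}_{\gamma\gamma,\gamma} = \partial_\gamma \partial_\gamma \psi\, \partial_\gamma \psi - \frac{1 - \alpha}{2} \{ \partial_\gamma \psi \}^3,$$
where $\psi = \psi(\theta, \gamma)$. These follow from $\partial_i l = F_i - \partial_i \psi$ and $\partial_\gamma l = -\partial_\gamma \psi$ on $(\gamma, I_2)$, together with $E[\partial_i l] = 0$.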
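The coefficient formulas can be cross-checked numerically against Definition 8. The sketch below does this for the Pareto family only, assuming $\psi(\theta, \gamma) = -\log\theta - \theta\log\gamma$ and using the fact that $U = \log(X/\gamma)$ is exponentially distributed with rate $\theta$. The helper names (`expect`, `alpha_connection`, `closed_form`) and the midpoint quadrature are illustrative choices, not part of the paper.

```python
import math


def expect(h, theta, steps=20_000, upper_mult=50.0):
    """Approximate E[h(U)] for U = log(X/gamma) ~ Exponential(rate=theta)
    with the midpoint rule on [0, upper_mult / theta]."""
    du = upper_mult / theta / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * du
        total += h(u) * theta * math.exp(-theta * u)
    return total * du


def alpha_connection(a, b, c, theta, gamma, alpha):
    """Gamma^(alpha)_{ab,c} of Definition 8 for the Pareto family.
    Indices are "t" (theta) or "g" (gamma)."""
    # First derivatives of l = log(theta) + theta*log(gamma) - (theta+1)*log(x)
    # for x > gamma, written in terms of u = log(x/gamma).
    first = {"t": lambda u: 1.0 / theta - u, "g": lambda u: theta / gamma}
    # Second derivatives of l do not depend on x.
    second = {("t", "t"): -1.0 / theta**2, ("t", "g"): 1.0 / gamma,
              ("g", "t"): 1.0 / gamma, ("g", "g"): -theta / gamma**2}
    term1 = expect(lambda u: second[(a, b)] * first[c](u), theta)
    term2 = expect(lambda u: first[a](u) * first[b](u) * first[c](u), theta)
    return term1 + 0.5 * (1.0 - alpha) * term2


def closed_form(theta, gamma, alpha):
    """The six coefficient formulas, evaluated with the Pareto psi."""
    p_tt, p_ttt = 1.0 / theta**2, -2.0 / theta**3     # theta-derivatives of psi
    p_g, p_tg, p_gg = -theta / gamma, -1.0 / gamma, theta / gamma**2
    return {
        ("t", "t", "t"): 0.5 * (1 - alpha) * p_ttt,
        ("t", "t", "g"): 0.5 * (1 + alpha) * p_tt * p_g,
        ("t", "g", "t"): -0.5 * (1 - alpha) * p_tt * p_g,
        ("t", "g", "g"): p_tg * p_g,
        ("g", "g", "t"): 0.0,
        ("g", "g", "g"): p_gg * p_g - 0.5 * (1 - alpha) * p_g**3,
    }


if __name__ == "__main__":
    theta, gamma, alpha = 2.0, 1.5, 0.3
    for key, value in closed_form(theta, gamma, alpha).items():
        print(key, alpha_connection(*key, theta, gamma, alpha), value)
```

Each printed pair should agree up to quadrature error. The check covers only one family and one choice of parameters, so it is a sanity test rather than a proof.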
The α-connection in the oTEF differs from the one in the regular model in several respects.
First, the 0-connection does not correspond to the Levi-Civita connection. This is because the α-connection coefficients cannot be expressed in the following form:
$$\Gamma^{(\alpha)}_{ab,c}(\theta, \gamma) = \Gamma^{g}_{ab,c}(\theta, \gamma) - \frac{\alpha}{2} E[\partial_a l\, \partial_b l\, \partial_c l]. \tag{55}$$
Deriving this form involves an interchange of the order of differentiation and integration, which is one of the regularity conditions. Therefore, in the non-regular oTEF, the two sides do not match.
Additionally, the dual connection of the α-connection in the oTEF is not the −α-connection. This can be verified as follows.
The partial derivative $\partial_\gamma g_{\gamma i}$ and the connection coefficients $\Gamma^{(\alpha)}_{\gamma i,\gamma}$ and $\Gamma^{(-\alpha)}_{\gamma\gamma,i}$ are given by
$$\partial_\gamma g_{\gamma i} = 0,$$
$$\Gamma^{(\alpha)}_{\gamma i,\gamma} = \partial_i \partial_\gamma \psi(\theta, \gamma)\, \partial_\gamma \psi(\theta, \gamma),$$
$$\Gamma^{(-\alpha)}_{\gamma\gamma,i} = 0.$$
Thus,
$$\Gamma^{(\alpha)}_{\gamma i,\gamma} + \Gamma^{(-\alpha)}_{\gamma\gamma,i} = \partial_\gamma \partial_i \psi(\theta, \gamma)\, \partial_\gamma \psi(\theta, \gamma).$$
Since $\partial_\gamma \partial_i \psi(\theta, \gamma) \not\equiv 0$ and $\partial_\gamma \psi(\theta, \gamma) < 0$, the right-hand side differs from $\partial_\gamma g_{\gamma i} = 0$, and the duality does not hold.

4. Existence of an α-Parallel Prior in the oTEF

In this section, as part of investigating the properties of the α-connection in the oTEF, we deal with the α-parallel priors. We show that there exists an α-parallel prior distribution for α = 1.
The existence of α-parallel priors depends on the geometric properties of statistical models, and they are not guaranteed to exist in general. Therefore, it is necessary to investigate the existence of α-parallel priors in the case of the oTEF. Note that the Levi-Civita connection is always equiaffine, and it is known to have the Jeffreys prior as a parallel prior distribution. Therefore, we do not deal with it in this section.
In the oTEF, care is needed regarding the conditions for the existence of α-parallel priors. For regular models, Proposition 2 provides a necessary and sufficient condition for their existence. Sun et al. [13] showed that the Pareto distribution family does not satisfy this condition and claimed that it does not have α-parallel priors. However, this deduction is incorrect, since Proposition 2 does not hold for the oTEF. In the proof of Proposition 2 [17], the connection coefficients of the α-connection are written in the form (55), using the Levi-Civita connection and the cubic tensor $T_{ijk} = E[\partial_i l\, \partial_j l\, \partial_k l]$. However, this form cannot represent the α-connection in Definition 8. Hence, in the oTEF, Proposition 2 does not establish that the equation
$$\partial_a T_{bc}{}^{c} - \partial_b T_{ac}{}^{c} = 0 \quad (a, b = 1, \dots, n, \gamma)$$
is a necessary and sufficient condition for the existence of α-parallel priors.
Therefore, in this study, we use the necessary and sufficient conditions for general affine connections to investigate the existence of the α-parallel prior.
The following theorem reveals the existence of α-parallel priors in the oTEF.
Theorem 1.
Consider $\mathcal{P}$ belonging to the oTEF, with densities of the form (13) in Definition 6, with the natural parameter $\theta \in \Theta$ and the truncation parameter $\gamma$. If $\alpha = 1$, then the connection $\nabla^{(\alpha)}$ is equiaffine and there exists a one-parallel prior. Moreover, this one-parallel prior $\pi^{(1)}$ can be represented as
$$\pi^{(1)}(\theta, \gamma) \propto -\partial_\gamma \psi(\theta, \gamma)$$
for $\theta \in \Theta$, $\gamma \in I$.
Proof. 
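First, we prove that $\nabla^{(1)}$ is equiaffine. By Proposition 1, the necessary and sufficient condition for $\nabla^{(\alpha)}$ to be equiaffine is $R^{(\alpha)}{}_{abc}{}^{c} = 0$ for $a, b = 1, \dots, n, \gamma$ everywhere in $\mathcal{P}$, where $R^{(\alpha)}$ denotes the α-Riemannian curvature tensor. This condition can be represented as $\partial_a \Gamma^{(\alpha)}{}^{c}_{bc} = \partial_b \Gamma^{(\alpha)}{}^{c}_{ac}$, since
$$\begin{aligned}
R^{(\alpha)}{}_{abc}{}^{c} &= \sum_{c = d} \big\{ \partial_a \Gamma^{(\alpha)}{}^{d}_{bc} - \partial_b \Gamma^{(\alpha)}{}^{d}_{ac} + \Gamma^{(\alpha)}{}^{d}_{ae} \Gamma^{(\alpha)}{}^{e}_{bc} - \Gamma^{(\alpha)}{}^{d}_{be} \Gamma^{(\alpha)}{}^{e}_{ac} \big\} \\
&= \partial_a \Gamma^{(\alpha)}{}^{c}_{bc} - \partial_b \Gamma^{(\alpha)}{}^{c}_{ac} + \Gamma^{(\alpha)}{}^{c}_{ae} \Gamma^{(\alpha)}{}^{e}_{bc} - \Gamma^{(\alpha)}{}^{c}_{be} \Gamma^{(\alpha)}{}^{e}_{ac} \\
&= \partial_a \Gamma^{(\alpha)}{}^{c}_{bc} - \partial_b \Gamma^{(\alpha)}{}^{c}_{ac}.
\end{aligned}$$
By Definition 7, we have
$$\Gamma^{(\alpha)}{}^{a}_{ja} = \frac{1 - \alpha}{2}\, \partial_j \log \det(g_{kl}) + \partial_j \log(-\partial_\gamma \psi),$$
$$\Gamma^{(\alpha)}{}^{a}_{\gamma a} = -\frac{(n + 1)(1 - \alpha)}{2}\, \partial_\gamma \psi + \partial_\gamma \log(-\partial_\gamma \psi).$$
Thus,
$$\partial_i \Gamma^{(\alpha)}{}^{a}_{ja} - \partial_j \Gamma^{(\alpha)}{}^{a}_{ia} = 0,$$
$$\partial_\gamma \Gamma^{(\alpha)}{}^{a}_{ia} - \partial_i \Gamma^{(\alpha)}{}^{a}_{\gamma a} = \frac{1 - \alpha}{2} \big\{ \partial_i \partial_\gamma \log \det(g_{kl}) + (n + 1)\, \partial_i \partial_\gamma \psi \big\}.$$
Therefore, the formula $\partial_a \Gamma^{(\alpha)}{}^{c}_{bc} = \partial_b \Gamma^{(\alpha)}{}^{c}_{ac}$ holds when $\alpha = 1$, and $\nabla^{(1)}$ is equiaffine.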
Second, we find the one-parallel prior. Let $\pi^{(1)}$ be the density of the one-parallel volume element. According to Proposition 1 in the paper by Takeuchi and Amari [17], we have
$$\partial_a \log \pi^{(1)} = \Gamma^{(1)}{}^{b}_{ab}.$$
In the case of the oTEF, its representation is
$$\partial_i \log \pi^{(1)} = \partial_i \log(-\partial_\gamma \psi), \qquad \partial_\gamma \log \pi^{(1)} = \partial_\gamma \log(-\partial_\gamma \psi).$$
Therefore, the one-parallel prior for $\mathcal{P}$ is given as
$$\pi^{(1)}(\theta, \gamma) \propto -\partial_\gamma \psi.$$
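As an illustration, the equiaffine condition can be evaluated explicitly for the Pareto family. This is a worked sketch rather than part of the proof, assuming $n = 1$ and $\psi(\theta, \gamma) = -\log\theta - \theta\log\gamma$, so that $\det(g_{kl}) = 1/\theta^2$ and $-\partial_\gamma \psi = \theta/\gamma$. The mixed condition fails unless $\alpha = 1$, in which case the contracted coefficients are exactly the logarithmic derivatives of $\theta/\gamma$.

```latex
\begin{aligned}
\Gamma^{(\alpha)}{}^{a}_{\theta a} &= \frac{1-\alpha}{2}\,\partial_\theta \log\frac{1}{\theta^2}
  + \partial_\theta \log\frac{\theta}{\gamma} = \frac{\alpha}{\theta}, \\
\Gamma^{(\alpha)}{}^{a}_{\gamma a} &= (1-\alpha)\,\frac{\theta}{\gamma}
  + \partial_\gamma \log\frac{\theta}{\gamma} = \frac{(1-\alpha)\,\theta - 1}{\gamma}, \\
\partial_\gamma \Gamma^{(\alpha)}{}^{a}_{\theta a} - \partial_\theta \Gamma^{(\alpha)}{}^{a}_{\gamma a}
  &= -\frac{1-\alpha}{\gamma} \;=\; 0 \iff \alpha = 1, \\
\alpha = 1: \quad \partial_\theta \log \pi^{(1)} &= \frac{1}{\theta}, \quad
\partial_\gamma \log \pi^{(1)} = -\frac{1}{\gamma}
\;\Longrightarrow\; \pi^{(1)}(\theta,\gamma) \propto \frac{\theta}{\gamma}.
\end{aligned}
```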
Moreover, the above one-parallel prior coincides with a certain reference prior. A reference prior, proposed by Bernardo [24], is a noninformative prior distribution derived from an information-theoretic perspective. Specifically, it is defined as a prior distribution that maximizes the expectation of the KL divergence between the posterior and prior distributions, and in the case of no nuisance parameters, it coincides with the Jeffreys prior [24]. However, this is not necessarily the case when nuisance parameters are present. Ghosh and Mukerjee [25] provided a new formulation for reference priors with nuisance parameters by considering the maximization of a functional with an appropriate penalty term. Furthermore, Ghosal [26] extended this reference prior to non-regular models where the support of the density depends on the parameters. When applied to an oTEF model with $\gamma$ as the parameter of interest and $\theta$ as the nuisance parameter, the reference prior of Ghosal [26] is given by
$$\pi_{\mathrm{Ghosal}}(\theta, \gamma) \propto -\partial_\gamma \psi.$$
Thus, it coincides with the one-parallel prior in Theorem 1.

5. Scalar Curvature on a Submodel of the oTEF

This section finds a submodel of the oTEF with constant scalar curvature for the Levi-Civita connection. We do not use the α-connection in Definition 8.
Previous works about the information geometry of non-regular cases have mainly studied the geometric structure of Pareto distributions for the Levi-Civita connection. They adopted the formally defined Fisher metric as in (42), which is consistent with our Riemannian metric. Rylov [10] found that the family of Pareto distributions has a constant curvature with respect to the Levi-Civita connection. Li et al. [12] showed that its geometrical structure is isometric to the Poincaré upper half-plane and applied this geometrical structure to Bayesian inference by considering the Jeffreys prior.
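The constant curvature of the Pareto model can be checked numerically. The sketch below assumes the Pareto metric obtained from Definition 7 with $\psi(\theta, \gamma) = -\log\theta - \theta\log\gamma$, that is, $g_{\theta\theta} = 1/\theta^2$ and $g_{\gamma\gamma} = \theta^2/\gamma^2$, and applies the classical Gaussian curvature formula for an orthogonal two-dimensional metric with central finite differences. The function names and the step size `h` are illustrative. In two dimensions the scalar curvature is twice the Gaussian curvature.

```python
import math


def metric(theta, gamma):
    """Diagonal components (g_tt, g_gg) of the Pareto metric under Definition 7."""
    g_tt = 1.0 / theta**2           # second theta-derivative of psi
    g_gg = (theta / gamma) ** 2     # squared gamma-derivative of psi
    return g_tt, g_gg


def gaussian_curvature(theta, gamma, h=1e-3):
    """Gaussian curvature of ds^2 = g_tt dtheta^2 + g_gg dgamma^2, using
    K = -[d_theta(d_theta g_gg / w) + d_gamma(d_gamma g_tt / w)] / (2 w),
    with w = sqrt(g_tt * g_gg) and central finite differences."""
    def w(t, g):
        g_tt, g_gg = metric(t, g)
        return math.sqrt(g_tt * g_gg)

    def a(t, g):  # (d g_gg / d theta) / w
        d = (metric(t + h, g)[1] - metric(t - h, g)[1]) / (2 * h)
        return d / w(t, g)

    def b(t, g):  # (d g_tt / d gamma) / w
        d = (metric(t, g + h)[0] - metric(t, g - h)[0]) / (2 * h)
        return d / w(t, g)

    da = (a(theta + h, gamma) - a(theta - h, gamma)) / (2 * h)
    db = (b(theta, gamma + h) - b(theta, gamma - h)) / (2 * h)
    return -(da + db) / (2 * w(theta, gamma))


if __name__ == "__main__":
    for theta, gamma in [(0.5, 1.0), (2.0, 1.5), (3.0, 0.2)]:
        k = gaussian_curvature(theta, gamma)
        print(f"theta={theta}, gamma={gamma}: K={k:.6f}, scalar curvature={2 * k:.6f}")
```

The value of `K` should be the same at every point, in agreement with the isometry to the Poincaré upper half-plane mentioned above.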
We extend these previous works on Pareto distributions to n dimensions. For i = 1 , , n , let F i be a smooth function on the interval I and let X i be a random variable following a two-parameter truncated exponential distribution, which has density function
p E ( x , θ i , γ ) = θ i exp { θ i ( x γ ) } · 𝟙 [ γ , ) ( x ) ( x R )
with respect to the Lebesgue measure, where θ i ( 0 , ) , γ R . An oTEF includes this distribution.
Consider Q θ , γ , a joint distribution of independent random variables F 1 ( X 1 ) , , F n ( X n ) with the common redefined truncation parameter γ and the natural parameter θ = ( θ 1 , , θ n ) . Q θ , γ has the density function
q ( x , θ , γ ) = exp { i = 1 n θ i { F i ( x i ) F i ( γ ) } + i = 1 n log θ i } · 𝟙 I n ( γ ) ( x )
x = x 1 , , x n I n
with respect to the Lebesgue measure on R n . Here, I n is a rectangle i = 1 n F i ( R ) and I ( γ ) is I n [ γ , ) n . Q is a family of distributions Q θ , γ with parameters θ and γ .
$Q$ includes practical examples such as a family of several Pareto distributions with a common scale parameter (Rohatgi and Saleh [14]) and a family of truncated exponential distributions with a common location parameter (Ghosh and Razmpour [15]). Truncated exponential distributions with a common $\gamma$ are sometimes applied to reliability and life testing. Consider the case in which the first failure of $n$ products can occur only after a common minimum time $\gamma$ has elapsed, and these products have unknown and possibly unequal failure rates $\theta_1, \dots, \theta_n$. The truncation parameter $\gamma$ then plays the role of a “guarantee time”, so estimating $\gamma$ is vital for determining the warranty period.
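As an illustration of the guarantee-time setting, the following Python sketch simulates lifetimes with a common $\gamma$ and computes maximum likelihood estimates. It assumes the simplest case $F_i(x) = x$ (shifted exponential lifetimes) and at least two observations per product; the function names, sample sizes, and parameter values are hypothetical and are not taken from the paper. Under this assumption the likelihood is increasing in $\gamma$ up to the smallest observation, so the estimate of $\gamma$ is the overall minimum.

```python
import random


def sample_common_guarantee(thetas, gamma, m, rng):
    """Draw m lifetimes per product: X = gamma + Exp(rate = theta_i).

    Assumption for this sketch: F_i(x) = x for every product.
    """
    return [[gamma + rng.expovariate(th) for _ in range(m)] for th in thetas]


def mle_common_guarantee(samples):
    """Maximum likelihood estimates under the common-location exponential model.

    gamma_hat is the smallest observation over all products, and
    theta_hat_i = m_i / sum_j (x_ij - gamma_hat).
    Each product needs at least two observations so that the sum is positive.
    """
    gamma_hat = min(min(xs) for xs in samples)
    theta_hat = [len(xs) / sum(x - gamma_hat for x in xs) for xs in samples]
    return gamma_hat, theta_hat


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    data = sample_common_guarantee([0.5, 2.0], gamma=3.0, m=2000, rng=rng)
    gamma_hat, theta_hat = mle_common_guarantee(data)
    print("estimated guarantee time:", gamma_hat)
    print("estimated failure rates:", theta_hat)
```

The estimate of $\gamma$ never falls below the true value, because every simulated lifetime is at least $\gamma$, and it approaches $\gamma$ at a faster rate than the estimates of $\theta_i$ approach the failure rates. This is the kind of non-regular behaviour that motivates treating the $\gamma$ direction separately in the metric.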
Note that the geometric structure of the above common truncation parameter model is equivalent to that of the family of Pareto distributions when $n = 1$.
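For $n = 1$ this equivalence can be made explicit. The following is a minimal sketch, assuming that $F_1$ is strictly increasing and using the metric components given in the proof of Theorem 2, namely $g_{11} = 1/\theta_1^2$, $g_{1\gamma} = 0$, and $g_{\gamma\gamma} = (\theta_1 F_1'(\gamma))^2$. The coordinates $t$ and $u$ are introduced here only for illustration.

$$ds^2 = \frac{d\theta_1^2}{\theta_1^2} + \theta_1^2 F_1'(\gamma)^2 \, d\gamma^2 .$$

With $t = 1/\theta_1$ and $u = F_1(\gamma)$, we have $dt = -d\theta_1/\theta_1^2$ and $du = F_1'(\gamma)\, d\gamma$, so that

$$ds^2 = \frac{dt^2 + du^2}{t^2}, \qquad t > 0 .$$

This is the metric of the Poincaré upper half-plane, whose Gaussian curvature is $-1$ and whose scalar curvature is therefore $-2$, in agreement with the isometry reported by Li et al. [12].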
Theorem 2.
The Riemannian manifold $Q$ has a constant scalar curvature of $-2$.
Proof. 
Let $G(\theta, \gamma)$ denote $\sum_i \theta_i F_i'(\gamma)$, where $F_i'(\gamma) = dF_i(\gamma)/d\gamma$.
The Riemannian metric for the common truncation parameter model is given by
$$g_{ij} = \frac{1}{\theta_i^2}\,\delta_{ij}, \qquad g_{i\gamma} = 0, \qquad g_{\gamma\gamma} = G(\theta, \gamma)^2$$
for $i, j = 1, \dots, n$. The tangent vectors $\partial_1, \dots, \partial_n, \partial_\gamma$ are mutually orthogonal. By (46), we have
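$$
\begin{aligned}
\Gamma_{ij,k} &= -\frac{1}{\theta_i^3}\,\delta_{ijk}, \\
\Gamma_{ij,\gamma} &= 0, \\
\Gamma_{i\gamma,j} &= 0, \\
\Gamma_{i\gamma,\gamma} &= F_i'(\gamma)\, G(\theta, \gamma), \\
\Gamma_{\gamma\gamma,i} &= -F_i'(\gamma)\, G(\theta, \gamma), \\
\Gamma_{\gamma\gamma,\gamma} &= G(\theta, \gamma)\, \partial_\gamma G(\theta, \gamma),
\end{aligned}
$$
and
$$
\begin{aligned}
\Gamma_{ij}^{k} &= -\frac{1}{\theta_i}\,\delta_{ijk}, \\
\Gamma_{ij}^{\gamma} &= 0, \\
\Gamma_{i\gamma}^{j} &= 0, \\
\Gamma_{i\gamma}^{\gamma} &= \frac{F_i'(\gamma)}{G(\theta, \gamma)} = \partial_i \log G(\theta, \gamma), \\
\Gamma_{\gamma\gamma}^{i} &= g^{ii}\,\Gamma_{\gamma\gamma,i} = -\theta_i^2 F_i'(\gamma)\, G(\theta, \gamma) = -\theta_i^2\, G(\theta, \gamma)^2\, \partial_i \log G(\theta, \gamma), \\
\Gamma_{\gamma\gamma}^{\gamma} &= \frac{\partial_\gamma G(\theta, \gamma)}{G(\theta, \gamma)} = \partial_\gamma \log G(\theta, \gamma)
\end{aligned}
$$
for $i, j, k = 1, \dots, n$, where
$$\delta_{ijk} = \begin{cases} 1 & \text{for } i = j = k, \\ 0 & \text{otherwise}. \end{cases}$$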
Hence, we obtain
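$$
\begin{aligned}
R_{i\gamma i}{}^{\gamma} &= \partial_i \Gamma_{\gamma i}^{\gamma} - \partial_\gamma \Gamma_{ii}^{\gamma} + \Gamma_{ia}^{\gamma}\,\Gamma_{\gamma i}^{a} - \Gamma_{\gamma a}^{\gamma}\,\Gamma_{ii}^{a} \\
&= \partial_i \partial_i \log G(\theta, \gamma) + \bigl(\partial_i \log G(\theta, \gamma)\bigr)^2 + \frac{1}{\theta_i}\,\partial_i \log G(\theta, \gamma) \\
&= -\left(\frac{F_i'(\gamma)}{G(\theta, \gamma)}\right)^{2} + \left(\frac{F_i'(\gamma)}{G(\theta, \gamma)}\right)^{2} + \frac{F_i'(\gamma)}{\theta_i\, G(\theta, \gamma)} \\
&= \frac{F_i'(\gamma)}{\theta_i\, G(\theta, \gamma)}
\end{aligned}
$$
for $i = 1, \dots, n$.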
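Therefore, since the metric is diagonal and the components $R_{ijk}{}^{l}$ with all indices in $\{1, \dots, n\}$ vanish, the scalar curvature is given by
$$R = R_{abcd}\, g^{ad} g^{bc} = -2 \sum_{i=1}^{n} g^{ii}\, R_{i\gamma i}{}^{\gamma} = -\frac{2}{G(\theta, \gamma)} \sum_{i=1}^{n} \theta_i F_i'(\gamma) = -2 .$$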
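Theorem 2 can also be checked numerically. The following sketch is an illustration rather than part of the proof: it takes $n = 2$ with the hypothetical choices $F_1(\gamma) = \gamma$ and $F_2(\gamma) = \log \gamma$, and approximates the Christoffel symbols of the Levi-Civita connection and their derivatives by central finite differences in plain Python. The step size and the function names are arbitrary assumptions made for this example.

```python
def metric_diag(x):
    """Diagonal metric entries (g_11, g_22, g_gamma_gamma) at x = (theta1, theta2, gamma)."""
    t1, t2, gam = x
    # Assumed example: F_1(gamma) = gamma and F_2(gamma) = log(gamma),
    # so F_1' = 1, F_2' = 1/gamma and G = theta1 * F_1' + theta2 * F_2'.
    G = t1 + t2 / gam
    return [1.0 / t1**2, 1.0 / t2**2, G**2]


def shifted(x, k, h):
    """Return a copy of x with coordinate k moved by h."""
    y = list(x)
    y[k] += h
    return y


def christoffel(x, h=1e-4):
    """Gam[c][a][b] approximates Gamma^c_{ab} for the diagonal metric above."""
    n = len(x)
    g = metric_diag(x)
    # dg[k][a] = d g_aa / d x_k, by central differences.
    dg = []
    for k in range(n):
        gp = metric_diag(shifted(x, k, h))
        gm = metric_diag(shifted(x, k, -h))
        dg.append([(p - m) / (2 * h) for p, m in zip(gp, gm)])
    Gam = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for c in range(n):
        for a in range(n):
            for b in range(n):
                # (1/2) g^{cc} (d_a g_{cb} + d_b g_{ca} - d_c g_{ab}), diagonal case.
                s = 0.0
                if c == b:
                    s += dg[a][c]
                if c == a:
                    s += dg[b][c]
                if a == b:
                    s -= dg[c][a]
                Gam[c][a][b] = 0.5 * s / g[c]
    return Gam


def scalar_curvature(x, h=1e-4):
    """Approximate scalar curvature R = g^{bc} Ric_{bc} at the point x."""
    n = len(x)
    g = metric_diag(x)
    Gam = christoffel(x, h)
    # dGam[k][c][a][b] approximates d Gamma^c_{ab} / d x_k.
    dGam = []
    for k in range(n):
        Gp = christoffel(shifted(x, k, h), h)
        Gm = christoffel(shifted(x, k, -h), h)
        dGam.append([[[(Gp[c][a][b] - Gm[c][a][b]) / (2 * h)
                       for b in range(n)] for a in range(n)] for c in range(n)])
    R = 0.0
    for b in range(n):  # only Ric_bb is needed because the metric is diagonal
        ric_bb = 0.0
        for a in range(n):
            ric_bb += dGam[a][a][b][b] - dGam[b][a][a][b]
            for e in range(n):
                ric_bb += Gam[a][a][e] * Gam[e][b][b] - Gam[a][b][e] * Gam[e][a][b]
        R += ric_bb / g[b]
    return R


if __name__ == "__main__":
    for point in ([1.3, 0.7, 2.0], [0.5, 2.5, 1.2]):
        print(point, scalar_curvature(point))
```

At points with $\theta_1, \theta_2, \gamma > 0$, the returned values should be close to $-2$ up to finite-difference error, independently of the point chosen.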

6. Concluding Remarks

This paper considered the geometric structure of a one-sided truncated exponential family (oTEF) with parameters θ and γ. We constructed a Riemannian metric based on the asymptotic properties of the maximum likelihood estimators. Under this metric, we showed that the formally defined α-connection admits α-parallel priors when α = 1. Our result gives geometric meaning to a specific reference prior. Furthermore, we proved that the scalar curvature of some submodels of the oTEF obtained by making γ common across multiple distributions is constant.
It is essential to discuss suitable affine connections for the oTEF. First, we need to reveal the statistical meaning of the α-connection in the oTEF. The α-connection coefficients are expected to appear in higher-order terms of the variance of maximum likelihood estimators. Additionally, instead of the α-connection, we can construct a family of equiaffine connections by connecting the α-connection and the Levi-Civita connection. It is also interesting to consider affine connections induced by the third derivatives of divergences.

Author Contributions

Conceptualization, M.Y. and F.T.; methodology, M.Y. and F.T.; validation, M.Y. and F.T.; investigation, M.Y. and F.T.; resources, M.Y. and F.T.; writing—original draft preparation, M.Y.; writing—review and editing, F.T.; visualization, M.Y.; supervision, F.T.; project administration, M.Y. and F.T.; funding acquisition, M.Y. and F.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by JSPS KAKENHI grant numbers 19K11860 and 23K11006, in addition to JST SPRING grant number JPMJSP2138.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work was supported by the Research Institute for Mathematical Sciences, an International Joint Usage/Research Center located in Kyoto University.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Chentsov, N.N. Statistical Decision Rules and Optimal Inference. In Translations of Mathematical Monographs; American Mathematical Society: Providence, RI, USA, 1982; Volume 53.
  2. Amari, S. Differential-Geometrical Methods in Statistics. In Lecture Notes in Statistics; Springer: Berlin, Germany, 1985; Volume 28.
  3. Amari, S.; Nagaoka, H. Methods of Information Geometry. In Translations of Mathematical Monographs; American Mathematical Society: Providence, RI, USA; Oxford University Press: Oxford, UK, 2000.
  4. Amari, S. Finsler geometry of non-regular statistical models. RIMS Kokyuroku 1984, 538, 81–95.
  5. Bar-Lev, S.K. Large Sample Properties of the MLE and MCLE for the Natural Parameter of a Truncated Exponential Family. Ann. Inst. Stat. Math. 1984, 36, 217–222.
  6. Akahira, M. Second-Order Asymptotic Comparison of the MLE and MCLE of a Natural Parameter for a Truncated Exponential Family of Distributions. Ann. Inst. Stat. Math. 2016, 68, 469–490.
  7. Akahira, M. Statistical Estimation for Truncated Exponential Families; Springer Briefs in Statistics; Springer: Singapore, 2017.
  8. Nielsen, F. Statistical Divergences between Densities of Truncated Exponential Families with Nested Supports: Duo Bregman and Duo Jensen Divergences. Entropy 2022, 24, 421.
  9. Shemyakin, A. Hellinger Information Matrix and Hellinger Priors. Entropy 2023, 25, 344.
  10. Rylov, A. Constant Curvature Connections On Statistical Models. In Information Geometry and Its Applications; Ay, N., Gibilisco, P., Matúš, F., Eds.; Springer International Publishing: Cham, Switzerland, 2018; Volume 252, pp. 349–361.
  11. Peng, L.; Sun, H.; Jiu, L. The Geometric Structure of the Pareto Distribution. Geom. Struct. Pareto Distrib. 2007, 14, 5–13.
  12. Li, M.; Sun, H.; Peng, L. Fisher–Rao Geometry and Jeffreys Prior for Pareto Distribution. Commun. Stat. Theory Methods 2022, 51, 1895–1910.
  13. Sun, F.; Cao, Y.; Zhang, S.; Sun, H. The Bayesian Inference of Pareto Models Based on Information Geometry. Entropy 2021, 23, 45.
  14. Rohatgi, V.K.; Saleh, A.K.M.E. Estimation of the Common Scale Parameter of Two Pareto Distributions in Censored Samples. Nav. Res. Logist. 1987, 34, 235–238.
  15. Ghosh, M.; Razmpour, A. Estimation of the Common Location Parameter of Several Exponentials. Sankhyā Indian J. Stat. Ser. A (1961–2002) 1984, 46, 383–394. Available online: http://xxx.lanl.gov/abs/25050498 (accessed on 4 May 2023).
  16. Bartlett, M.S. Approximate Confidence Intervals. Biometrika 1953, 40, 12–19.
  17. Takeuchi, J.; Amari, S. Alpha-Parallel Prior and Its Properties. IEEE Trans. Inf. Theory 2005, 51, 1011–1023.
  18. Robert, C.P. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, 2nd ed.; Springer Texts in Statistics; Springer: New York, NY, USA, 2007.
  19. Jeffreys, H. Theory of Probability, 3rd ed.; Clarendon Press: Oxford, UK, 1961.
  20. Nomizu, K.; Sasaki, T. Affine Differential Geometry. In Cambridge Tracts in Mathematics; Cambridge University Press: Cambridge, UK, 1994; Volume 111.
  21. Arnold, B.C. Pareto Distributions. In Statistical Distributions in Scientific Work; International Co-operative Publishing House: Burtonsville, MD, USA, 1983; Volume 5.
  22. Akahira, M.; Ohyauchi, N. Second-Order Asymptotic Loss of the MLE of a Truncation Parameter for a Truncated Exponential Family of Distributions. Commun. Stat. Theory Methods 2017, 46, 6085–6097.
  23. Akahira, M. Second Order Asymptotic Variance of the Bayes Estimator of a Truncation Parameter for a One-Sided Truncated Exponential Family of Distributions. J. Jpn. Stat. Soc. (Nihon Tôkei Gakkai Kaihô) 2016, 46, 81–98.
  24. Bernardo, J.M. Reference Posterior Distributions for Bayesian Inference. J. R. Stat. Soc. Ser. B (Methodol.) 1979, 41, 113–128.
  25. Ghosh, J.K.; Mukerjee, R. Non-Informative Priors. In Bayesian Statistics, 4 (Peñíscola, 1991); Oxford University Press: New York, NY, USA, 1992; pp. 195–210.
  26. Ghosal, S. Reference Priors in Multiparameter Nonregular Cases. Test 1997, 6, 159–186.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yoshioka, M.; Tanaka, F. Information-Geometric Approach for a One-Sided Truncated Exponential Family. Entropy 2023, 25, 769. https://doi.org/10.3390/e25050769
