1. Introduction
In Bayesian inference, a parameter θ is regarded as a random variable. A density p(θ) of θ is called, by abuse of terminology, a prior distribution. After collecting some data x, one obtains a conditional density p(x | θ), referred to as the likelihood function. Bayes' theorem then computes the posterior distribution p(θ | x) from p(θ) and p(x | θ). This is interpreted as an update of the information about the unknown parameter θ. One choice of prior distribution is the Jeffreys prior, which is the correct notion of a uniform distribution. Here, the word "uniform" means uninformative, that is, not favoring any particular value of the parameter.
Information geometry, in its narrowest sense, is the use of differential geometry to study statistical inference. It has found applications in statistical inference, signal processing, and machine learning [1]. References [2,3] are two elementary introductions. In information geometry, geometric structures, for example metric tensors g and affine connections ∇, can be put on the set of prior distributions. These geometric structures help to single out particular prior distributions, for example the Jeffreys prior, which, by the fundamental theorem of Riemannian geometry, is the unique volume form parallel with respect to the Levi–Civita connection. Since the Jeffreys prior is provided by geometry, it is automatically invariant under reparametrization. This reflects the view that, at best, no information is lost during a transformation of parameters, which is encoded in the notion of sufficient statistics. Similarly, if one can find a unique prior distribution satisfying some specified geometric conditions, then that prior distribution is said to be canonically chosen. Matsuzoe, Takeuchi, and Amari used information geometry to define the α-parallel prior, which reduces to the Jeffreys prior when α = 0 [4].
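The invariance of the Jeffreys prior under reparametrization can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes a Bernoulli model (a hypothetical choice), approximates the score by a finite difference, and compares the Jeffreys density in the log-odds coordinate with the Jeffreys density in the success-probability coordinate multiplied by the Jacobian. All function names are illustrative.

```python
import math

def fisher_information(log_pmf, param, outcomes, h=1e-5):
    """Fisher information E[(d/dparam log p)^2] of a one-parameter discrete model.

    The score is approximated by a central finite difference of width h.
    """
    total = 0.0
    for x in outcomes:
        score = (log_pmf(x, param + h) - log_pmf(x, param - h)) / (2.0 * h)
        total += math.exp(log_pmf(x, param)) * score ** 2
    return total

def log_pmf_theta(x, theta):
    # Bernoulli model in the success-probability coordinate theta
    return math.log(theta) if x == 1 else math.log(1.0 - theta)

def sigmoid(phi):
    return 1.0 / (1.0 + math.exp(-phi))

def log_pmf_phi(x, phi):
    # Same model in the log-odds coordinate phi = log(theta / (1 - theta))
    return log_pmf_theta(x, sigmoid(phi))

theta = 0.3  # hypothetical parameter value
phi = math.log(theta / (1.0 - theta))

jeffreys_theta = math.sqrt(fisher_information(log_pmf_theta, theta, (0, 1)))
jeffreys_phi = math.sqrt(fisher_information(log_pmf_phi, phi, (0, 1)))
dtheta_dphi = theta * (1.0 - theta)  # derivative of the sigmoid at phi

# A density picks up the Jacobian under a change of coordinates;
# the square root of the Fisher information does so automatically.
print(jeffreys_phi, jeffreys_theta * dtheta_dphi)
```

The two printed numbers agree up to the finite-difference error, which is the one-dimensional content of the statement that the Jeffreys prior is a volume form rather than a coordinate-dependent function.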
Historically, Weyl proposed a generalization of general relativity to unify gravity and electromagnetism. Einstein soon pointed out that Weyl's theory predicted a substantial broadening of the characteristic length of atoms, which contradicts the sharp atomic spectral lines that are observed. Even though Weyl geometry failed to unify gravity and electromagnetism, which is still an open problem, it has found applications in possible generalizations of general relativity [5] and in the differential-geometric study of defects in continuum mechanics [6]. Weyl geometry has also survived as a subject in mathematics. The relation between affine differential geometry, Weyl geometry, and Riemannian geometry is as follows. Let E be a fibre bundle with base B whose fibres are each a Lie group G, called the structure group of the fibre bundle. For different G, we obtain different geometries as follows:
G = GL(n), the general linear group: affine differential geometry;
G = CO(n), the conformal group: Weyl geometry;
G = O(n), the orthogonal group: Riemannian geometry.
The structure groups reduce as GL(n) ⊃ CO(n) ⊃ O(n), and the smaller the structure group, the more geometric properties are expected. In our case, the reduction of the structure group gives rise to a canonical choice for the parameter α of the α-parallel prior. For more about bundle-theoretic differential geometry, see [7].
In this paper, we use Weyl geometry to define a prior distribution for Bayesian inference, which we call the Weyl prior. We elucidate the relation between the dimension of a statistical manifold and the parameter α in Takeuchi and Amari's α-parallel prior.
The paper is organized as follows. In Section 2, we review information geometry and the α-parallel prior. We discuss Weyl geometry in Section 3. In Section 4, we define the Weyl prior and elucidate its relation to the α-parallel prior. As examples, we calculate the Weyl prior for the univariate Gaussian distribution in Section 5 and for the multivariate Gaussian distribution in Section 6. All functions in the paper are real-valued and smooth, all connections are torsion-free, and the Einstein summation convention is used.
2. Information Geometry and α-Priors
In this section, we review some basics of information geometry. For more details, see [1].
A statistical model, that is, a set of parametric densities p(x | θ), can be geometrized as follows. First, we introduce the Fisher metric tensor, which is a second-order tensor.
Definition 1. The Fisher metric tensor is defined by
\[ g_{ij}(\theta) = \int \partial_i l \, \partial_j l \; p(x \mid \theta)\, dx, \]
where p(x | θ) is the transition kernel, l = log p(x | θ) is the log-likelihood function, and ∂_i is the partial derivative with respect to the coordinate θ^i.
Then, we introduce the Amari–Chentsov tensor, which is a third-order tensor.
Definition 2 (Amari–Chentsov Tensor). The Amari–Chentsov tensor C is defined by
\[ C_{ijk}(\theta) = \int \partial_i l \, \partial_j l \, \partial_k l \; p(x \mid \theta)\, dx. \]
Remark 1. The Amari–Chentsov tensor C defined above satisfies C = ∇g, where ∇ is the affine connection of the statistical manifold. In other words, C is the covariant derivative of the metric tensor g. In Riemannian geometry, C vanishes everywhere, which is required if the length of a tangent vector is to be preserved under parallel transport. In information geometry, this requirement is dropped and thus a duality theory arises.
Let ∇ be an arbitrary torsion-free affine connection on a Riemannian manifold (M, g). The dual connection of ∇ plays an important role in information geometry.
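Definition 3 (Dual Connection). The dual connection ∇* on a Riemannian manifold (M, g) with affine connection ∇ is defined as the unique affine connection satisfying the following equation:
\[ X\, g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \nabla^{*}_X Z), \]
where X, Y, and Z are vector fields on M.
Remark 2. The dual connection ∇*, together with ∇, preserves the metric tensor g:
\[ g_q(\Pi X, \Pi^{*} Y) = g_p(X, Y), \]
where X, Y ∈ T_pM, and Π and Π* are the parallel transports induced by ∇ and ∇*, respectively, along some curve from p to q. In general, g_q(ΠX, ΠY) ≠ g_p(X, Y) and g_q(Π*X, Π*Y) ≠ g_p(X, Y), unless ∇ = ∇*. See [1].
Now, we introduce α-connections.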
Definition 4. The α-connections ∇^(α) are defined in terms of Christoffel symbols by
\[ \Gamma^{(\alpha)}_{ijk} = \Gamma^{LC}_{ijk} - \frac{\alpha}{2}\, C_{ijk}, \]
where α ∈ ℝ and LC stands for Levi–Civita.
Remark 3. The dual connection of ∇^(α) is then given by ∇^(−α).
Remark 4. The α-parallel prior is the volume form parallel with respect to ∇^(α). Unlike the Jeffreys prior, which always exists, the α-parallel prior does not necessarily exist. An α-parallel prior exists if and only if the Ricci curvature tensor is symmetric [4]. However, if an α-parallel prior exists for one α ≠ 0, then it exists for all α [8].
The following characterization will be used in Section 4 to obtain the relation between the α-parallel prior and the Weyl prior defined therein.
Proposition 1 ([4]). Let (M, g, ∇) be a statistical manifold. If there exists an exact 1-form for some function Ω determined by ∇ and g, then the α-parallel prior is
Remark 5. This 1-form is known as the Chebyshev 1-form. A differential form ϕ is called closed if its exterior derivative vanishes, i.e., dϕ = 0, and is called exact if there exists a differential form φ such that ϕ = dφ. By definition, every exact form is closed. By Poincaré's lemma, every closed form is locally exact. Because statistical manifolds are simply connected, closedness implies exactness.
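The two tensors of this section can be evaluated numerically from their definitions as expectations of products of scores. The sketch below is an illustration under stated assumptions, not the paper's computation: it uses the one-parameter exponential model p(x | θ) = θ exp(−θx) (a hypothetical example not treated in the paper), the standard expressions g = E[(∂l)²] and C = E[(∂l)³], and a simple midpoint rule. The helper names are illustrative.

```python
import math

def expectation(func, density, lower, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of func(x) * density(x)."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        total += func(x) * density(x)
    return total * width

rate = 2.0  # hypothetical value of the parameter theta

def density(x):
    # p(x | theta) = theta * exp(-theta * x) on x >= 0
    return rate * math.exp(-rate * x)

def score(x):
    # derivative of log p(x | theta) = log(theta) - theta * x with respect to theta
    return 1.0 / rate - x

upper = 40.0 / rate  # the probability mass beyond this point is negligible

fisher = expectation(lambda x: score(x) ** 2, density, 0.0, upper)
amari_chentsov = expectation(lambda x: score(x) ** 3, density, 0.0, upper)

print(fisher, 1.0 / rate ** 2)           # closed form: 1 / theta^2
print(amari_chentsov, -2.0 / rate ** 3)  # closed form: -2 / theta^3
```

For a one-dimensional model both tensors have a single component, so the example only checks the definitions; the α-connections and the existence question for the α-parallel prior become non-trivial in higher dimensions.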
3. Weyl Geometry
In this section, we review some concepts of Weyl geometry that are needed in the next section. For more details, see [9].
Two Riemannian metrics g and g̃ on a manifold M are said to be conformally equivalent if g̃ = e^λ g for some smooth function λ on M.
A conformal structure on M is an equivalence class of conformally equivalent Riemannian metrics, i.e., [g] = {e^λ g : λ is a smooth function on M}.
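A Weyl structure is a map F from the conformal structure [g] to the set of 1-forms on M satisfying
\[ F(e^{\lambda} g) = F(g) - d\lambda. \]
The image of g under F is called the Weyl 1-form.
A Weyl structure enables us to translate a scalar product ⟨·,·⟩_p at p to ⟨·,·⟩_q at q along a curve γ: [0, 1] → M from p to q:
\[ \langle\cdot,\cdot\rangle_q = \exp\!\Bigl(\int_0^1 \gamma^{*}F(g)\Bigr)\, g_q, \]
where g is a metric in [g] with g_p = ⟨·,·⟩_p and γ*F(g) is the pullback of the Weyl 1-form along the curve γ.
A Weyl manifold is a manifold with a Weyl structure.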
Remark 6. The meaning of this equation is as follows. If we start with a scalar product ⟨·,·⟩_p at a point p arising from the conformal class [g], then there exists a metric tensor g ∈ [g] extending ⟨·,·⟩_p, i.e., g_p = ⟨·,·⟩_p. The value of this particular choice of g at another point q is g_q. However, a different choice of g gives rise to a different g_q. The scalar product ⟨·,·⟩_q determined by Weyl translation is proven to be independent of g [9]. Hence, by Weyl translation, we can compare lengths of vectors at different points on a Weyl manifold, whereas, with only the conformal structure [g], we can only compare ratios of lengths.
An affine connection ∇ is said to be a Weyl connection if the parallel transport of a scalar product under ∇ coincides with the Weyl translation.
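The independence of Weyl translation from the chosen representative g can be checked numerically along a single curve. The sketch below is an illustration under explicit assumptions: it takes the transformation rule F(e^λ g) = F(g) − dλ and the translated scalar product exp(∫γ*F(g)) g_q as the conventions (these conventions are assumed here, and other sign conventions exist), and all functions along the curve are hypothetical choices.

```python
import math

def integrate(func, steps=10_000):
    """Midpoint rule on the parameter interval [0, 1] of the curve."""
    width = 1.0 / steps
    return sum(func((k + 0.5) * width) for k in range(steps)) * width

# Everything is pulled back to a curve gamma: [0, 1] -> M from p to q.
def metric_scale(t):
    # hypothetical representative g along the curve, recorded as a scale factor
    return 1.0 + t ** 2

def weyl_form(t):
    # hypothetical pullback of F(g) along the curve, written as a(t) dt
    return math.cos(3.0 * t)

def conformal_factor(t):
    # lambda(t) with lambda(0) = 0, so g and e^lambda g agree at p
    return math.sin(2.0 * t)

def conformal_factor_rate(t):
    # derivative of lambda along the curve
    return 2.0 * math.cos(2.0 * t)

# Translation computed with the representative g.
translated_g = math.exp(integrate(weyl_form)) * metric_scale(1.0)

# Translation computed with e^lambda g, whose Weyl 1-form is F(g) - d(lambda) (assumed rule).
rescaled_form = lambda t: weyl_form(t) - conformal_factor_rate(t)
translated_rescaled = (
    math.exp(integrate(rescaled_form))
    * math.exp(conformal_factor(1.0))
    * metric_scale(1.0)
)

print(translated_g, translated_rescaled)
```

The change in the Weyl 1-form exactly compensates the change in the metric at q, which is why only the scalar product at p, and not the extension g, matters.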
The Weyl connection is characterized by the following propositions.
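Proposition 2 ([9]). An affine connection ∇ is a Weyl connection if and only if
\[ \nabla g + F(g)\otimes g = 0 \]
for all g ∈ [g].
Proposition 3 (Fundamental Theorem of Weyl Geometry [9]). There exists a unique torsion-free Weyl connection ∇^W on a Weyl manifold. The Christoffel symbols of ∇^W are given by
\[ \Gamma^{k}_{ij} = {}^{LC}\Gamma^{k}_{ij} + \frac{1}{2}\bigl(\delta^{k}_{i}\,\omega_j + \delta^{k}_{j}\,\omega_i - g_{ij}\, g^{kl}\omega_l\bigr), \tag{7} \]
where ω_i are the components of the Weyl 1-form F(g), {}^{LC}Γ^k_{ij} are the Christoffel symbols of the Levi–Civita connection of g, and δ^k_i is the Kronecker delta.
4. Weyl Prior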
In this section, we define the Weyl prior and show its relation to the α-parallel prior.
First, we define the Weyl prior as follows.
Definition 5 (Weyl Prior). Let (M, g) be an n-dimensional Riemannian manifold with the conformal structure [g] and the Weyl structure F. Let ∇^W be the Weyl connection. The Weyl prior is defined as the unique volume form parallel with respect to ∇^W.
Remark 7. The uniqueness of the Weyl prior is the result of the uniqueness of the Weyl connection.
Now, we prove the main result of this paper.
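Theorem 1. Let (M, g) be a Riemannian manifold. Let ∇^W and ∇^(−n) be the Weyl connection and the (−n)-connection, i.e., the α-connection with α = −n, where n is the dimension of M. Suppose that the (−n)-parallel prior exists. Then the Weyl prior coincides with the (−n)-parallel prior.
Proof. Consider an arbitrary volume form f √(det g) dθ^1 ∧ ⋯ ∧ dθ^n, where f is a positive function on M. For this volume form to be parallel with respect to ∇^W, it is necessary and sufficient that
\[ \nabla^{W}\bigl(f\sqrt{\det g}\; d\theta^1\wedge\cdots\wedge d\theta^n\bigr) = 0. \tag{8} \]
Componentwise, Equation (8) becomes
\[ (\partial_j f)\sqrt{\det g} + f\,\nabla^{W}_j\sqrt{\det g} = 0, \tag{9} \]
since the covariant derivative coincides with the partial derivative for functions.
Since √(det g) is a scalar density of weight 1, its covariant derivative is given by
\[ \nabla^{W}_j\sqrt{\det g} = \partial_j\sqrt{\det g} - \Gamma^{k}_{kj}\sqrt{\det g}, \tag{10} \]
where Γ^k_{kj} = {}^{LC}Γ^k_{kj} + (n/2) ω_j is obtained by the contraction of Equation (7) over i and k, and ω_j are the components of the Weyl 1-form ω = F(g).
Substituting Equation (10) into Equation (9), we obtain
\[ (\partial_j f)\sqrt{\det g} + f\bigl(\partial_j\sqrt{\det g} - \Gamma^{k}_{kj}\sqrt{\det g}\bigr) = 0. \tag{11} \]
The Levi–Civita part of Γ^k_{kj} cancels against ∂_j √(det g), because ∂_j √(det g) = {}^{LC}Γ^k_{kj} √(det g). Since the covariant derivative coincides with the exterior derivative for functions, collecting indices in Equation (11) gives
\[ d\log f = \frac{n}{2}\,\omega. \tag{12} \]
Assume for now that the Weyl 1-form ω is exact, that is, ω = dh for some function h on M. Then, from Equation (12), the Weyl prior is given, up to a constant factor, by
\[ e^{nh/2}\sqrt{\det g}\; d\theta^1\wedge\cdots\wedge d\theta^n. \tag{13} \]
By comparison of Equation (13) with Proposition 1, the theorem is proved under the assumption of the exactness of the Weyl 1-form.
Since we proved that the Weyl prior is the α-parallel prior with α = −n, and we required the existence of the (−n)-parallel prior, our assumption of the exactness of the Weyl 1-form is indeed true by Remark 4. □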
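The contraction step in the proof can be checked at a single point with a few lines of arithmetic. The sketch below is an illustration under an explicit assumption: the difference between the Weyl and Levi–Civita Christoffel symbols is taken to be ½(δ^k_i ω_j + δ^k_j ω_i − g_ij ω^k), a common convention that may differ in sign or normalization from the one used in [9]. The metric and 1-form components are hypothetical numbers.

```python
n = 2
g = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical metric components at a point
omega = [0.7, -1.3]           # hypothetical Weyl 1-form components at that point

# Inverse of the 2 x 2 metric, used to raise the index of omega.
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
g_inv = [[g[1][1] / det, -g[0][1] / det],
         [-g[1][0] / det, g[0][0] / det]]
omega_up = [sum(g_inv[k][l] * omega[l] for l in range(n)) for k in range(n)]

def delta(a, b):
    """Kronecker delta."""
    return 1.0 if a == b else 0.0

def correction(k, i, j):
    """Assumed difference between the Weyl and Levi-Civita Christoffel symbols."""
    return 0.5 * (delta(k, i) * omega[j] + delta(k, j) * omega[i] - g[i][j] * omega_up[k])

# Contract the upper index with the first lower index.
contracted = [sum(correction(k, k, j) for k in range(n)) for j in range(n)]
expected = [0.5 * n * omega[j] for j in range(n)]

print(contracted, expected)
```

Under this assumption the contracted correction is n/2 times the Weyl 1-form, so the dimension n enters the parallel volume form only through this trace.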
Remark 8. The minus sign in α = −n is a result of the definition of the α-connection. By Remark 3, the dual connection of ∇^(α) is ∇^(−α); we would have α = n here had we defined the α-connection to be its dual connection in Definition 4. It would therefore seem more natural to consider the dual prior of the Weyl prior.
5. Weyl Prior for Gaussian Family
In this section, we calculate the Weyl prior of the Gaussian family as an example.
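Example 1 (Gaussian Family). Consider the Gaussian family
\[ p(x\mid\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\Bigl(-\frac{(x-\mu)^2}{2\sigma^2}\Bigr). \]
Choosing (μ, σ) as a coordinate system, we have
\[ \partial_\mu l = \frac{x-\mu}{\sigma^2}, \qquad \partial_\sigma l = \frac{(x-\mu)^2}{\sigma^3} - \frac{1}{\sigma}, \]
where l is the log-likelihood function.
The first element of the Fisher metric tensor g in the (μ, σ)-coordinates is given by
\[ g_{11} = E\bigl[(\partial_\mu l)^2\bigr] = \frac{E\bigl[(X-\mu)^2\bigr]}{\sigma^4} = \frac{1}{\sigma^2}, \]
where E is the conditional expectation of X given (μ, σ). The other elements of the Fisher metric tensor are
\[ g_{12} = g_{21} = E\bigl[\partial_\mu l\,\partial_\sigma l\bigr] = 0 \]
and
\[ g_{22} = E\bigl[(\partial_\sigma l)^2\bigr] = \frac{2}{\sigma^2}. \]
Hence,
\[ g = \frac{1}{\sigma^2}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. \]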
To calculate the Weyl 1-form, we first calculate the Amari–Chentsov tensor
Similarly,
and
Hence, the Weyl 1-form is given by
Now, it is easy to check that the Weyl 1-form is an exact form. Hence, for the Gaussian family, a Weyl prior exists and is given by the uniform prior.
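The Fisher metric and the Amari–Chentsov tensor of the univariate Gaussian family can be cross-checked by numerical quadrature. The sketch below is a sanity check under stated assumptions rather than the paper's own computation: it assumes the coordinates (μ, σ), the standard definitions g_ij = E[∂_i l ∂_j l] and C_ijk = E[∂_i l ∂_j l ∂_k l], and hypothetical parameter values; the helper names are illustrative.

```python
import math

mu, sigma = 0.5, 1.5  # hypothetical parameter values; (mu, sigma) coordinates are an assumption

def density(x):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def scores(x):
    """Partial derivatives of the log-likelihood with respect to mu and sigma."""
    return ((x - mu) / sigma ** 2, (x - mu) ** 2 / sigma ** 3 - 1.0 / sigma)

def expectation(func, steps=20_000, half_width=12.0):
    """Midpoint-rule approximation of E[func(X)] over mu +/- half_width * sigma."""
    lower = mu - half_width * sigma
    width = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        total += func(x) * density(x)
    return total * width

def moment(*indices):
    """Expectation of the product of scores with the given indices (0 = mu, 1 = sigma)."""
    return expectation(lambda x: math.prod(scores(x)[i] for i in indices))

fisher = [[moment(i, j) for j in range(2)] for i in range(2)]
amari_chentsov = {idx: moment(*idx)
                  for idx in [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]}

print(fisher)          # compare with [[1/sigma^2, 0], [0, 2/sigma^2]]
print(amari_chentsov)  # compare with 0, 2/sigma^3, 0, 8/sigma^3
```

Only the independent components of the fully symmetric tensor C are listed; the remaining ones follow by permuting indices.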
Remark 9. Based on our calculation, the Weyl prior for the univariate Gaussian distribution with unknown mean and unknown variance is just the uniform prior. This shows that the uniform prior is in fact an uninformative prior. This counter-intuitive result is related to the fact that every two-dimensional manifold is conformally flat, which can be proved using the existence of isothermal coordinates in two dimensions [10].
6. Multivariate Gaussian
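The above example can be extended to the multivariate case. Consider the multivariate Gaussian distribution
\[ p(x\mid\mu,\Sigma) = \det(2\pi\Sigma)^{-1/2}\exp\!\Bigl(-\tfrac{1}{2}(x-\mu)^{\mathsf T}\Sigma^{-1}(x-\mu)\Bigr), \]
where μ is the mean vector and Σ is the covariance matrix.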
Using matrix calculus, we have
and
We can now compute the Fisher information matrix.
where the second-to-last line follows from the action of the matrix tensor product and the last line follows from the definition of the covariance matrix.
Similarly,
and
The Amari–Chentsov tensor can be computed in the same way:
The detailed computation of Equation (26) is as follows:
The above expression can be evaluated using the fourth and sixth moments of the multivariate Gaussian distribution.
The Weyl prior is then given by:
The Weyl prior is thus given by:
where, in the first line,
is the dimension of the statistical manifold for the multivariate Gaussian distribution.
Remark 10. Our calculation shows that the Weyl prior of the multivariate Gaussian distribution is generally not a uniform prior. However, Equation (29) shows that, in the univariate case, the Weyl prior is indeed the uniform prior. This is in accordance with our direct calculation for the univariate case.
7. Discussion and Conclusions
We discussed Weyl geometry and the Weyl prior in this paper. We also calculated the Weyl prior for the Gaussian family as an example.
The underlying principle of the Jeffreys prior, the α-parallel prior, and the Weyl prior is the concept of invariance in statistics. The Jeffreys prior is invariant under a change of coordinates of the parameters. The Weyl prior and the α-parallel prior, as generalizations of the Jeffreys prior, automatically satisfy this invariance. Moreover, the Weyl prior, as a volume form defined on a Weyl manifold, is also invariant under a gauge transformation [11]. The generalized conjugate connection is likewise invariant under the gauge transformation [11].
One possible use of the Weyl prior is to use the uniform prior for distributions with two parameters, because any two-dimensional manifold is conformally flat.