Proceeding Paper

Symplectic/Contact Geometry Related to Bayesian Statistics †

Atsuhide Mori

Department of Mathematics, Osaka Dental University, Osaka 573-1121, Japan
† Presented at the 5th International Electronic Conference on Entropy and Its Applications, 18–30 November 2019; Available online: https://ecea-5.sciforum.net/.
Proceedings 2020, 46(1), 13; https://doi.org/10.3390/ecea-5-06665
Published: 17 November 2019

Abstract

In previous work, the author gave the following symplectic/contact geometric description of the Bayesian inference of normal means. The space H of normal distributions is an upper half-plane which admits two operations, namely, the convolution product and the normalized pointwise product of two probability density functions. There is a diffeomorphism F of H that interchanges these operations and sends any e-geodesic to an e-geodesic. The product of two copies of H carries positive and negative symplectic structures and a bi-contact hypersurface N. The graph of F is Lagrangian with respect to the negative symplectic structure and is contained in the bi-contact hypersurface N. Further, it is preserved under a bi-contact Hamiltonian flow with respect to a single function, and the restriction of the flow to the graph of F represents the inference of means. The author showed that this also works for the Student t-inference of smoothly moving means and enables us to consider the smoothness of data smoothing. In this presentation, the space of multivariate normal distributions is foliated by means of the Cholesky decomposition of the covariance matrix. This provides a pair of regular Poisson structures and generalizes the above symplectic/contact description to the multivariate case. Most of the ideas presented here are described at length in a later article by the author.

1. Introduction

We work in the $C^\infty$-smooth category. A manifold $U$ embedded in the space of probability distributions inherits a separating premetric $D: U\times U\to\mathbb{R}_{\ge 0}$ from the relative entropy, which is called the Kullback–Leibler divergence. The geometry of $(U,D)$ is studied in information theory. Information geometry [1] concerns the infinitesimal behavior of $D$. In the case where $U$ is the space of univariate normal distributions, we regard $U$ as the half-plane $H=\mathbb{R}\times\mathbb{R}_{>0}\ni(m,s)$, where $m$ denotes the mean and $s$ the standard deviation. Since the convolution of two normal densities is a normal density, it induces a product $*$ on $H$, which we call the convolution product. On the other hand, since the pointwise product of two normal densities is proportional to a normal density, it induces another product $\cdot$ on $H$, which we call the Bayesian product. The first half of this presentation is devoted to the geometric description of Bayesian statistics including this product.
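As a minimal sketch of the two products on the half-plane, the following Python snippet encodes a point of $H$ as a pair `(m, s)`. The closed forms are the $n=1$ case of the formulas for $*$ and $\cdot$ stated in Section 2.2; the function names and the sample points are illustrative choices, not taken from the source.

```python
import math

def convolution_product(p, q):
    """(m, s) * (m', s'): parameters of the convolution of two normal densities."""
    (m1, s1), (m2, s2) = p, q
    return (m1 + m2, math.hypot(s1, s2))

def bayesian_product(p, q):
    """(m, s) . (m', s'): parameters of the normalized pointwise product."""
    (m1, s1), (m2, s2) = p, q
    v = s1 * s1 + s2 * s2
    return ((m1 * s2 * s2 + m2 * s1 * s1) / v, s1 * s2 / math.sqrt(v))

def normal_pdf(x, m, s):
    """Density of the normal distribution with mean m and standard deviation s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

p, q = (1.0, 2.0), (-0.5, 1.5)
m, s = bayesian_product(p, q)
# If the pointwise product is proportional to the density at (m, s),
# this ratio takes the same value at every sample point x.
ratios = [normal_pdf(x, *p) * normal_pdf(x, *q) / normal_pdf(x, m, s)
          for x in (-1.0, 0.0, 2.0)]
```

Under $*$ the variances add, whereas under $\cdot$ the precisions $1/s^2$ add, which is the sense in which the two operations are dual to each other.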
On the other hand, modern statistics rests not only on probability theory but also on information theory. The author [2] found a symplectic description of the statistics of univariate normal distributions which is based simultaneously on both theories. Precisely, on the product $H\times H$ with coordinates $(m,s,M,S)$, we take the positive and negative symplectic forms $d\lambda_\pm$ with the fixed primitives $\lambda_\pm=\frac{dm}{s}\pm\frac{dM}{S}$. Then the Lagrangian surfaces
$$F_\varepsilon=\left\{(m,s,M,S)\in H\times H\ \middle|\ \frac{m}{s}+\frac{M-\varepsilon}{S}=0,\ sS=1\right\}\quad(\varepsilon\in\mathbb{R})$$
with respect to $d\lambda_-$ foliate the hypersurface $N=\{sS=1\}$. For each $\varepsilon\in\mathbb{R}$, the leaf $F_\varepsilon$ is the graph of a diffeomorphism of $H$ which sends any geodesic to a geodesic with respect to the e-connection. Further, in the case where $\varepsilon=0$, the diffeomorphism interchanges the products $*$ and $\cdot$, namely,
$$(m,s,M,S)\in F_0,\ (m',s',M',S')\in F_0\ \Longrightarrow\ \big((m,s)*(m',s'),\,(M,S)\cdot(M',S')\big)\in F_0,\ \ \big((m,s)\cdot(m',s'),\,(M,S)*(M',S')\big)\in F_0.$$
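The interchange property can be checked numerically. The sketch below assumes that the leaf $F_0$ is the graph of the map $(m,s)\mapsto(M,S)=(-m/s^2,\,1/s)$, which is what the two defining equations $m/s+M/S=0$ and $sS=1$ give when solved for $(M,S)$. The helper names `F0_hat`, `conv`, `bayes` and `close` are ours.

```python
import math

def conv(p, q):
    """Convolution product (m, s) * (m', s')."""
    (m1, s1), (m2, s2) = p, q
    return (m1 + m2, math.hypot(s1, s2))

def bayes(p, q):
    """Bayesian product (m, s) . (m', s')."""
    (m1, s1), (m2, s2) = p, q
    v = s1 * s1 + s2 * s2
    return ((m1 * s2 * s2 + m2 * s1 * s1) / v, s1 * s2 / math.sqrt(v))

def F0_hat(p):
    """Send (m, s) to the unique (M, S) with m/s + M/S = 0 and s*S = 1."""
    m, s = p
    return (-m / (s * s), 1.0 / s)

def close(p, q, tol=1e-12):
    """Componentwise comparison with a relative tolerance."""
    return all(abs(a - b) <= tol * max(1.0, abs(a), abs(b)) for a, b in zip(p, q))

p, q = (0.7, 1.3), (-2.0, 0.4)
# Convolution in the first factor corresponds to the Bayesian product in the second ...
conv_to_bayes = close(F0_hat(conv(p, q)), bayes(F0_hat(p), F0_hat(q)))
# ... and the Bayesian product in the first factor corresponds to convolution in the second.
bayes_to_conv = close(F0_hat(bayes(p, q)), conv(F0_hat(p), F0_hat(q)))
```

This only tests one pair of sample points; it is a sanity check of the statement, not a proof.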
Thus, the iteration of $*$ in the first factor of $H\times H$ corresponds to that of $\cdot$ in the second factor. The primitives $\lambda_\pm$, the hypersurface $N$, the foliation $\{F_\varepsilon\}_{\varepsilon\in\mathbb{R}}$ and the leaf $F_0$ are preserved under the diffeomorphism $\varphi_\zeta:(m,s,M,S)\mapsto(\zeta m,\zeta s,\zeta^{-1}M,\zeta^{-1}S)$ for any $\zeta\in\mathbb{R}_{>0}$. This map appears in the construction of Hilbert modular cusps by Hirzebruch [3]. Further, the function $f:H\times H\to\mathbb{R}_{\ge 0}$ defined by $f(m,s,M,S)=D((m,s),(m',s'))$ for $(m',s',M,S)\in F_0$ is also preserved under $\varphi_\zeta$.
On the other hand, the hypersurface $N$ inherits the mutually transverse pair of contact structures $\ker(\lambda_\pm|_N)$, which we call the bi-contact structure. Let $X$ be the contact Hamiltonian vector field of $\lambda_+|_N$ with respect to the function $\frac{m}{s}$, i.e., the unique contact vector field satisfying $\lambda_+|_N(X)=\frac{m}{s}$. Then $X$ is also the contact Hamiltonian vector field of $\lambda_-|_N$ with respect to the same function $\frac{m}{s}$. We call such a vector field a bi-contact Hamiltonian vector field. There is a non-trivial bi-contact Hamiltonian vector field on $(N,\lambda_\pm)$ which is tangent to the leaf $F_0$; up to a constant multiple, it is the one for the above function $\frac{m}{s}$. Since $F_0$ is the graph of a diffeomorphism, we may regard the restriction of this vector field to $F_0$ as a vector field on the second factor of $H\times H$. Surprisingly, this vector field is tangent to a foliation by e-geodesics, and each leaf is closed under the Bayesian product. Further, the author [4] showed that a similar vector field on the squared space of Student's t-distributions can provide an indication of "geometric smoothness" in actual data smoothing.
In the second half of this presentation, we generalize the above description to the multivariate case. It is straightforward except that we use the Cholesky decomposition to foliate the squared space of n-variate normal distributions. Here the leaves are $4n$-dimensional submanifolds carrying two symplectic structures. They form a pair of Poisson structures on the squared space.
This short paper provides the calculation results and sketches the mathematical ideas. For the precise descriptions and the proofs, see the article [5] by the author.

2. Results

2.1. Bayesian Information Geometry

In this subsection, we generalize the setting of information geometry. Take a smooth family of volume forms with finite total volumes on $\mathbb{R}^n$. We regard each of the volume forms as a point of a manifold $\mathcal{M}$, namely, a point $y\in\mathcal{M}$ represents a volume form $\rho_y\,d\mathrm{Vol}$ depending smoothly on $y$. Let $\mathcal{V}$ be the space of volume forms with finite total volumes on $\mathcal{M}$. We take a volume form $V$ in $\mathcal{V}$. Given a point $z$ of $\mathbb{R}^n$, we regard the value $\rho_y(z)$ of the density as a function $\rho(z):y\mapsto\rho_y(z)$, and multiply the volume form $V$ by the function $\rho(z)$. This defines the updating map
$$\varphi:\mathbb{R}^n\times\mathcal{V}\ni(z,V)\mapsto\rho(z)V\in\mathcal{V}.\qquad(1)$$
We notice that a volume form with finite total volume is proportional to a probability measure. Thus the function $\rho(z)$ is proportional to the likelihood, and System (1) represents Bayes' rule.
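To make the updating map concrete, here is a small grid-based sketch under assumptions of our own: the parameter $y$ is the unknown mean of a univariate normal model with known standard deviation `s_obs`, and the volume form $V$ is stored as a list of weights on an equally spaced grid. The update multiplies each weight by the likelihood $\rho_y(z)$, and no normalization is needed until moments are read off.

```python
import math

def normal_pdf(x, m, s):
    """Density of the normal distribution with mean m and standard deviation s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def update(z, weights, grid, s_obs):
    """Updating map (z, V) -> rho(z) V: multiply each weight by the likelihood rho_y(z)."""
    return [w * normal_pdf(z, y, s_obs) for w, y in zip(weights, grid)]

def mean_and_sd(weights, grid):
    """Mean and standard deviation of the probability measure proportional to the weights."""
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, grid)) / total
    var = sum(w * (y - mean) ** 2 for w, y in zip(weights, grid)) / total
    return mean, math.sqrt(var)

grid = [-10.0 + 0.01 * k for k in range(2001)]        # parameter values y
prior = [normal_pdf(y, 0.0, 2.0) for y in grid]       # a normal prior; its total volume is irrelevant
posterior = update(1.5, prior, grid, 1.0)             # observe z = 1.5 with s_obs = 1
post_mean, post_sd = mean_and_sd(posterior, grid)
```

In this conjugate example the posterior should agree, up to discretization error, with the Bayesian product of the prior parameters $(0, 2)$ and the observation parameters $(1.5, 1)$.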
A proper subset $\tilde U\subset\mathcal{V}$ is called a (generalized) conjugate prior if it satisfies
$$\varphi(\mathbb{R}^n\times\tilde U)\subset\tilde U.$$
Suppose that we have a conjugate prior $\tilde U$ which is a smooth manifold, and further that, by using the hypersurface $U=\{V\in\tilde U\mid\int_{\mathbb{R}^n}V=1\}$, it can be written as $\tilde U=\{kV\mid V\in U,\ k>0\}$. We define on $\tilde U$ the following "distance" $\tilde D$, which satisfies none of the axioms of distance:
$$\tilde D(V_1,V_2)=-\int_{\mathbb{R}^n}V_1\ln\frac{V_2}{V_1}\quad(\text{the relative entropy}).$$
Note that the restriction $\tilde D|_{U\times U}=D$ satisfies the separation axiom, and is called the Kullback–Leibler divergence. We have to fix the coordinate in the $k$-direction, which represents time. Then we write the quadratic term of the Taylor expansion of $\tilde D(P,P+dP)+\tilde D(P+dP,P)$ as $\sum_{i,j}\tilde g_{ij}\,dP_i\,dP_j$, where $\tilde g_{ij}=\tilde g_{ji}$. Suppose that $\tilde g=[\tilde g_{ij}]$ is a metric on $\tilde U$. Let $\tilde\nabla^0$ be the Levi-Civita connection with respect to $\tilde g$. We write the cubic term of the expansion of $3\tilde D(P,P+dP)-3\tilde D(P+dP,P)$ symmetrically as $\sum_{i,j,k}\tilde T_{ijk}\,dP_i\,dP_j\,dP_k$. This defines the line of (generalized) $\alpha$-connections $\tilde\nabla^\alpha=\tilde\nabla^0-\alpha\,\tilde g^{*}\tilde T$ with affine parameter $\alpha\in\mathbb{R}$, where $\tilde g^{*}\tilde T$ denotes the contraction $\sum_l\tilde g^{kl}\tilde T_{ijl}$ by the contravariant metric $\tilde g^{-1}=[\tilde g^{ij}]$. Note that $\tilde\nabla^\alpha$ has no torsion. Restricting all of the above notions with tilde to the hypersurface $U\subset\tilde U$, we obtain the notions without tilde in the usual information geometry [1]. Here $U$ can be identified with a space of probability distributions.
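The way the metric arises from the symmetrized divergence can be illustrated in the simplest case. The sketch below uses the standard closed form of the relative entropy between univariate normal distributions, with coordinates $P=(\mu,\sigma)$, and compares $D(P,P+dP)+D(P+dP,P)$ for a small displacement with the quadratic form $d\mu^2/\sigma^2+2\,d\sigma^2/\sigma^2$. The sample point and step are arbitrary choices of ours.

```python
import math

def kl_normal(p, q):
    """Relative entropy D(P, Q) between univariate normals P = (mu, sigma), Q = (mu', sigma')."""
    (mu, sig), (mu2, sig2) = p, q
    return ((mu2 - mu) ** 2 / (2.0 * sig2 ** 2)
            + 0.5 * ((sig / sig2) ** 2 - 1.0)
            - math.log(sig / sig2))

def symmetrised(p, d):
    """D(P, P + dP) + D(P + dP, P) for a displacement d = (d_mu, d_sigma)."""
    q = (p[0] + d[0], p[1] + d[1])
    return kl_normal(p, q) + kl_normal(q, p)

def fisher_quadratic(p, d):
    """Quadratic form d_mu^2 / sigma^2 + 2 d_sigma^2 / sigma^2 evaluated on d."""
    (_, sig), (dmu, dsig) = p, d
    return (dmu / sig) ** 2 + 2.0 * (dsig / sig) ** 2

P = (0.3, 1.7)
d = (2e-4, -1e-4)
# The ratio tends to 1 as the displacement shrinks; higher-order terms account for the gap.
ratio = symmetrised(P, d) / fisher_quadratic(P, d)
```

The residual `ratio - 1` is of the order of the relative step size, as expected from the cubic term of the expansion.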

2.2. The Geometry of Normal Distributions

In this subsection we consider the space $U$ of multivariate normal distributions. The pair of a vector $\mu=(\mu_i)_{1\le i\le n}\in\mathbb{R}^n$ and an upper triangular matrix $C=[c_{ij}]_{1\le i,j\le n}\in\mathrm{Mat}(n,\mathbb{R})$ with positive diagonal entries determines an n-variate normal distribution by declaring that $\mu$ represents the mean and $C^TC$ the Cholesky decomposition of the covariance matrix. We put
$$\sigma_i=c_{ii}\quad\text{and}\quad r_{ij}=\frac{c_{ij}}{c_{ii}}\quad(i,j\in\{1,\dots,n\}),\quad\text{i.e.,}\quad C=\mathrm{diag}(\sigma)\,[r_{ij}].$$
Note that $[r_{ij}]$ is unitriangular, i.e., it is a triangular matrix whose diagonal entries are all 1. Considering $\sigma\in\mathbb{R}^n$ and $r=(r_{ij})_{1\le i<j\le n}\in\mathbb{R}^{n(n-1)/2}$ as parameters, we can write the probability density of the n-variate normal distribution at $P=(\mu,\sigma,r)\in U=\mathbb{R}^n\times(\mathbb{R}_{>0})^n\times\mathbb{R}^{n(n-1)/2}$ as
$$p(x)=\frac{1}{\sqrt{(2\pi)^n}\,|\sigma|}\exp\left(-\frac12\left\|C(\sigma,r)^{-T}(x-\mu)\right\|^2\right)\quad(x\in\mathbb{R}^n).$$
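A small sketch of this parametrization in plain Python follows. It assumes that $|\sigma|$ stands for the product $\sigma_1\cdots\sigma_n$ (the square root of the determinant of $C^TC$), uses 0-based indices, and stores $r$ as a dictionary keyed by pairs $(i,j)$ with $i<j$. These conventions and the function names are ours.

```python
import math

def cholesky_factor(sigma, r):
    """Upper triangular C = diag(sigma) [r_ij]; r maps (i, j) with i < j to r_ij (0-based)."""
    n = len(sigma)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        C[i][i] = sigma[i]
        for j in range(i + 1, n):
            C[i][j] = sigma[i] * r[(i, j)]
    return C

def solve_transposed(C, y):
    """Solve C^T u = y by forward substitution (C^T is lower triangular)."""
    n = len(y)
    u = [0.0] * n
    for i in range(n):
        acc = y[i] - sum(C[k][i] * u[k] for k in range(i))
        u[i] = acc / C[i][i]
    return u

def density(x, mu, sigma, r):
    """p(x) = exp(-|C^{-T}(x - mu)|^2 / 2) / (sqrt((2 pi)^n) * sigma_1 ... sigma_n)."""
    C = cholesky_factor(sigma, r)
    u = solve_transposed(C, [xi - mi for xi, mi in zip(x, mu)])
    norm = math.sqrt((2.0 * math.pi) ** len(mu)) * math.prod(sigma)
    return math.exp(-0.5 * sum(ui * ui for ui in u)) / norm

# Bivariate example with mean (1, -1), sigma = (2, 0.5) and r_12 = 0.3.
value = density((0.2, 0.4), (1.0, -1.0), (2.0, 0.5), {(0, 1): 0.3})
```

Working with $C^{-T}$ through a triangular solve avoids forming or inverting the covariance matrix $C^TC$ explicitly.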
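Then the relative entropy defines the premetric
$$D(P,Q=(\mu',\sigma',r'))=\frac{\left\|C(\sigma',r')^{-T}(\mu'-\mu)\right\|^2}{2}+\frac{\left\|C(\sigma,r)\,C(\sigma',r')^{-1}\right\|^2-n}{2}-\sum_{i=1}^n\ln\frac{\sigma_i}{\sigma_i'},$$
where $\|\cdot\|^2$ denotes the sum of squares (i.e., $\|\cdot\|$ is the Frobenius norm). Thus,
$$D(P+\Delta P,P)=\frac{\left\|C^{-T}\Delta\mu\right\|^2}{2}+\frac{\left\|\Delta C\,C^{-1}\right\|^2}{2}+\mathrm{tr}\left(\Delta C\,C^{-1}\right)-\ln\det\left(1_n+\Delta C\,C^{-1}\right),$$
where $1_n$ is the unit matrix, and $\Delta C$ the difference $C(\sigma+\Delta\sigma,r+\Delta r)-C(\sigma,r)$. Let $r^{ij}$ be the entries of the inverse matrix of $[r_{ij}]$. Then we have
$$\left(\text{the }ij\text{ entry of }\Delta C\,C^{-1}\right)=\begin{cases}\dfrac{\Delta\sigma_i}{\sigma_i}&(i=j),\\[2mm]\dfrac{\sigma_i+\Delta\sigma_i}{\sigma_j}\displaystyle\sum_{k=i+1}^{j}r^{kj}\,\Delta r_{ik}&(i<j),\\[2mm]0&(i>j).\end{cases}$$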
The Fisher information $g$ appears in $D(P+dP,P)$ as the quadratic form
$$g=\sum_{k=1}^{n}\left(\frac{1}{\sigma_k}\sum_{i=1}^{k}r^{ik}\,d\mu_i\right)^2+2\sum_{i=1}^{n}\left(\frac{d\sigma_i}{\sigma_i}\right)^2+\sum_{l=1}^{n-1}\sum_{k=l+1}^{n}\left(\frac{\sigma_l}{\sigma_k}\sum_{i=l+1}^{k}r^{ik}\,dr_{li}\right)^2,$$
which is presented by the block diagonal matrix $\mathrm{diag}(g_{\mu\mu},g_{\sigma\sigma},g_{rr,1},\dots,g_{rr,n-1})$, where
$$g_{\mu\mu}=\left[g_{\mu_i,\mu_j}\right]=\left[\sum_{k\ge i,j}\frac{r^{ik}r^{jk}}{\sigma_k^2}\right]=C^{-1}C^{-T},\qquad g_{\sigma\sigma}=\mathrm{diag}\left(\frac{2}{\sigma_i^2}\right),$$
and $g_{rr,l}=\left[g_{r_{li}r_{lj}}\right]_{i,j>l}=\sigma_l^2\left[g_{\mu_i,\mu_j}\right]_{i,j>l}$ ($l=1,\dots,n-1$). Lowering the upper indices of the $\alpha$-connection by $\sum_L g_{KL}\,\Gamma^{\alpha\,L}_{IJ}=\Gamma^{\alpha}_{\{I,J\},K}$, we have
$$\begin{aligned}
\Gamma^{0}_{\{\mu_i,\mu_j\},\sigma_k}&=-\Gamma^{0}_{\{\mu_i,\sigma_k\},\mu_j}=\frac{r^{ik}r^{jk}}{\sigma_k^{3}},\\
\Gamma^{0}_{\{\sigma_i,\sigma_i\},\sigma_i}&=-\frac{2}{\sigma_i^{3}},\\
\Gamma^{0}_{\{\mu_i,\mu_j\},r_{ab}}&=-\Gamma^{0}_{\{\mu_i,r_{ab}\},\mu_j}=\sum_{k=b}^{n}\frac{r^{bk}\left(r^{ia}r^{jk}+r^{ik}r^{ja}\right)}{2\sigma_k^{2}},\\
\Gamma^{0}_{\{r_{li},r_{lj}\},\sigma_l}&=-\Gamma^{0}_{\{r_{li},\sigma_l\},r_{lj}}=-\sum_{k\ge i,j}\frac{\sigma_l\,r^{ik}r^{jk}}{\sigma_k^{2}},\\
\Gamma^{0}_{\{r_{li},r_{lj}\},\sigma_k}&=-\Gamma^{0}_{\{r_{li},\sigma_k\},r_{lj}}=\frac{\sigma_l^{2}\,r^{ik}r^{jk}}{\sigma_k^{3}}\quad(k\ge i,j),\\
\Gamma^{0}_{\{r_{li},r_{lj}\},r_{ab}}&=-\Gamma^{0}_{\{r_{li},r_{ab}\},r_{lj}}=\sigma_l^{2}\,\Gamma^{0}_{\{\mu_i,\mu_j\},r_{ab}}\quad(a>l),\\
\Gamma^{0}_{\{I,J\},K}&=0\quad(\text{for the other choices of }\{I,J\}\text{ and }K),
\end{aligned}$$
and
$$\begin{aligned}
\Gamma^{1}_{\{\mu_i,\sigma_k\},\mu_j}&=2\,\Gamma^{0}_{\{\mu_i,\sigma_k\},\mu_j},&
\Gamma^{1}_{\{\sigma_i,\sigma_i\},\sigma_i}&=3\,\Gamma^{0}_{\{\sigma_i,\sigma_i\},\sigma_i},\\
\Gamma^{1}_{\{\mu_i,r_{ab}\},\mu_j}&=2\,\Gamma^{0}_{\{\mu_i,r_{ab}\},\mu_j},&
\Gamma^{1}_{\{r_{li},r_{lj}\},\sigma_l}&=2\,\Gamma^{0}_{\{r_{li},r_{lj}\},\sigma_l},\\
\Gamma^{1}_{\{r_{li},\sigma_k\},r_{lj}}&=2\,\Gamma^{0}_{\{r_{li},\sigma_k\},r_{lj}}\quad(k\ge i,j),&
\Gamma^{1}_{\{r_{li},r_{ab}\},r_{lj}}&=2\,\Gamma^{0}_{\{r_{li},r_{ab}\},r_{lj}}\quad(a>l),\\
\Gamma^{1}_{\{I,J\},K}&=0\quad(\text{for the other choices of }\{I,J\}\text{ and }K),
\end{aligned}$$
and thus we also have
$$\begin{aligned}
\Gamma^{(-1)}_{\{\mu_i,\mu_j\},\sigma_k}&=2\,\Gamma^{0}_{\{\mu_i,\mu_j\},\sigma_k},&
\Gamma^{(-1)}_{\{\sigma_i,\sigma_i\},\sigma_i}&=-\Gamma^{0}_{\{\sigma_i,\sigma_i\},\sigma_i},\\
\Gamma^{(-1)}_{\{\mu_i,\mu_j\},r_{ab}}&=2\,\Gamma^{0}_{\{\mu_i,\mu_j\},r_{ab}},&
\Gamma^{(-1)}_{\{r_{li},\sigma_l\},r_{lj}}&=2\,\Gamma^{0}_{\{r_{li},\sigma_l\},r_{lj}},\\
\Gamma^{(-1)}_{\{r_{li},r_{lj}\},\sigma_k}&=2\,\Gamma^{0}_{\{r_{li},r_{lj}\},\sigma_k}\quad(k\ge i,j),&
\Gamma^{(-1)}_{\{r_{li},r_{lj}\},r_{ab}}&=2\,\Gamma^{0}_{\{r_{li},r_{lj}\},r_{ab}}\quad(a>l),\\
\Gamma^{(-1)}_{\{I,J\},K}&=0\quad(\text{for the other choices of }\{I,J\}\text{ and }K).
\end{aligned}$$
The coefficients for the e-connection all vanish with respect to the natural parameter $\theta=(C^{-1}C^{-T}\mu,\xi)$, where $\xi=(\xi_{ab})_{1\le a\le b\le n}$ is the upper half of $C^{-1}C^{-T}$. Dually, the coefficients for the m-connection all vanish with respect to the expectation parameter $\eta=(\mu,\nu)$, where $\nu=(\nu_{ab})_{1\le a\le b\le n}$ is the upper half of $C^TC+\mu\mu^T$. Now we fix the third component $r$ of $(\mu,\sigma,r)$, and change the others. We take the natural projection $\pi:U=H^n\times\mathbb{R}^{n(n-1)/2}\to\mathbb{R}^{n(n-1)/2}$ and modify the coordinates $(\mu,\sigma)$ on the fiber $L(r)=\pi^{-1}(r)$ into $(m,s)$ in the next proposition.
Proposition 1.
The fiber $L(r)=\pi^{-1}(r)$ is an affine subspace of $U$ with respect to the e-connection $\nabla^{1}$. It can be parametrized by the affine parameters $\frac{m_i}{s_i^2}$ and $\frac{1}{s_i^2}$, where $m=[r^{ij}]^{T}\mu$ and $s=\sqrt{2}\,\sigma$.
The fiber $L(r)$ satisfies the following two properties.
Proposition 2.
L ( r ) is closed under the convolution * and the normalized pointwise product · between the probability densities.
Proposition 3.
The fiber L ( r ) with the induced metric from g admits a Kähler complex structure.
We write the restriction $D|_{L(r)}$ of the premetric $D$ using the coordinates $(m,s)$ as
$$D|_L((m,s),(m',s'))=\frac12\sum_{i=1}^{n}\left[\left(\frac{m_i}{s_i'}-\frac{m_i'}{s_i'}\right)^2+\frac{s_i^2}{s_i'^2}-1-\ln\frac{s_i^2}{s_i'^2}\right].$$
We take the product $U_1\times U_2$ of two copies of the space $U$. Then the products $L_1(r)\times L_2(R)$ of the fibers foliate $U_1\times U_2$. We call this the primary foliation of $U_1\times U_2$. For each $(r,R)\in\mathbb{R}^{n(n-1)}$, we have the coordinate system $(m,s,M,S)$ on the leaf $L_1(r)\times L_2(R)$. From the Kähler forms
$$\omega_1=2\sum_{i=1}^{n}\frac{dm_i\wedge ds_i}{s_i^2}\quad\text{and}\quad\omega_2=2\sum_{i=1}^{n}\frac{dM_i\wedge dS_i}{S_i^2}$$
on $L_1(r)$ and $L_2(R)$, respectively, we define the symplectic forms $\omega_1\pm\omega_2$ on $L_1(r)\times L_2(R)$. We fix their primitive 1-forms
$$\lambda_\pm=2\sum_{i=1}^{n}\left(\frac{dm_i}{s_i}\pm\frac{dM_i}{S_i}\right).$$
The symplectic structures on the leaves of the primary foliation define a pair of regular Poisson structures.
Now we take the $2n$-dimensional submanifolds
$$F_{\varepsilon,\delta}=\left\{\frac{m_i}{s_i}+\frac{M_i-\varepsilon_i}{S_i}=0,\ s_iS_i=\delta_i\ (i=1,\dots,n)\right\}$$
of the leaf $L_1(r)\times L_2(R)$ for $\varepsilon\in\mathbb{R}^n$ and $\delta\in(\mathbb{R}_{>0})^n$. The secondary foliation of $U_1\times U_2$ foliates any leaf $L_1(r)\times L_2(R)$ by the $3n$-dimensional submanifolds $F_\varepsilon=\bigcup_{\delta\in(\mathbb{R}_{>0})^n}F_{\varepsilon,\delta}$ for $\varepsilon\in\mathbb{R}^n$. The tertiary foliation of $U_1\times U_2$ foliates all leaves $F_\varepsilon$ of the secondary foliation by the $2n$-dimensional submanifolds $F_{\varepsilon,\delta}$ for $\delta\in(\mathbb{R}_{>0})^n$. We take the hypersurface
$$N=\left\{(m,s,M,S)\in L_1\times L_2\ \middle|\ \prod_{i=1}^{n}(s_iS_i)=1\right\},$$
which inherits the contact forms $\alpha_\pm=\lambda_\pm|_N$. We can prove the following propositions.
Proposition 4.
With respect to the Kähler form $d\lambda_-$, the tertiary leaves $F_{\varepsilon,\delta}$ are Lagrangian correspondences.
Proposition 5.
For any $\varepsilon$ and $\delta$ with $\prod_{i=1}^{n}\delta_i=1$, $F_{\varepsilon,\delta}\subset N$ is a disjoint union of n-dimensional submanifolds $\{s=\mathrm{const}\}\subset F_{\varepsilon,\delta}$ which are integral submanifolds of the contact hyperplane distribution $\ker\alpha_+$ on $N$.
For each point $(\varepsilon,\delta)\in H^n$, we have the diffeomorphism $\hat F_{\varepsilon,\delta}:H^n\to H^n$ sending $(m,s)\in H^n$ to $(M,S)\in H^n$ with $(m,s,M,S)\in F_{\varepsilon,\delta}$. We put
$$f_{\varepsilon,\delta}(m,s,M,S)=\frac12\sum_{i=1}^{n}\left[\left(\frac{M_i-\varepsilon_i}{S_i}+e^{-h_i}\frac{m_i}{s_i}\right)^2+e^{-2h_i}-1+2h_i\right],$$
where $h_i=-\ln\dfrac{s_iS_i}{\delta_i}$. Then we have
$$D|_L((m,s),(m',s'))=f_{\varepsilon,\delta}\big((m,s),\hat F_{\varepsilon,\delta}(m',s')\big).$$
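The identity between $D|_L$ and $f_{\varepsilon,\delta}$ can be sanity-checked numerically. The sketch below makes two reading assumptions explicit: the map $\hat F_{\varepsilon,\delta}$ is taken to be $(m,s)\mapsto(\varepsilon_i-m_i\delta_i/s_i^2,\ \delta_i/s_i)$, obtained by solving the defining equations of $F_{\varepsilon,\delta}$, and $h_i$ is read as $-\ln(s_iS_i/\delta_i)$. Points of $H^n$ are stored as pairs of lists; all names and sample values are ours.

```python
import math

def D_fiber(p, q):
    """D|_L((m, s), (m', s')) with p = (m, s) and q = (m', s') given as pairs of lists."""
    (m, s), (m2, s2) = p, q
    return 0.5 * sum(((mi - m2i) / s2i) ** 2 + (si / s2i) ** 2 - 1.0
                     - math.log((si / s2i) ** 2)
                     for mi, si, m2i, s2i in zip(m, s, m2, s2))

def F_hat(p, eps, delta):
    """(m, s) -> (M, S) with m_i/s_i + (M_i - eps_i)/S_i = 0 and s_i S_i = delta_i."""
    m, s = p
    M = [e - mi * d / (si * si) for mi, si, e, d in zip(m, s, eps, delta)]
    S = [d / si for si, d in zip(s, delta)]
    return (M, S)

def f(p, Q, eps, delta):
    """f_{eps,delta}(m, s, M, S), with h_i = -ln(s_i S_i / delta_i) (our reading)."""
    (m, s), (M, S) = p, Q
    total = 0.0
    for mi, si, Mi, Si, e, d in zip(m, s, M, S, eps, delta):
        h = -math.log(si * Si / d)
        total += (((Mi - e) / Si + math.exp(-h) * mi / si) ** 2
                  + math.exp(-2.0 * h) - 1.0 + 2.0 * h)
    return 0.5 * total

p = ([1.0, -2.0], [0.5, 2.0])       # (m, s)
q = ([0.3, 1.0], [1.5, 0.7])        # (m', s')
eps, delta = [0.1, -0.4], [1.0, 2.0]
lhs = D_fiber(p, q)
rhs = f(p, F_hat(q, eps, delta), eps, delta)
```

The two numbers agree to rounding error for these sample values, and the parameters $\varepsilon$ and $\delta$ drop out of the result, as the identity requires.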
For any $\zeta\in(\mathbb{R}_{>0})^{2n}$, we define the diffeomorphism
$$\varphi_{\varepsilon,\zeta}:(m,s,M,S)\mapsto\big((\zeta_{2i-1}m_i),\,(\zeta_{2i-1}s_i),\,(\varepsilon_i+\zeta_{2i}(M_i-\varepsilon_i)),\,(\zeta_{2i}S_i)\big),$$
which preserves the 1-forms $\lambda_\pm$. It is easy to prove the following.
Proposition 6.
In the case where $\zeta_{2i-1}\zeta_{2i}=1$ for $i=1,\dots,n$, the diffeomorphism $\varphi_{\varepsilon,\zeta}$ preserves $f_{\varepsilon,\delta}$.
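Proposition 6 lends itself to a quick numerical check. The sketch below evaluates $f_{\varepsilon,\delta}$ before and after applying $\varphi_{\varepsilon,\zeta}$, once with a $\zeta$ satisfying $\zeta_{2i-1}\zeta_{2i}=1$ and once with a $\zeta$ violating it. As an assumption of this sketch, $h_i$ is read as $-\ln(s_iS_i/\delta_i)$, and the 1-based indices $\zeta_{2i-1},\zeta_{2i}$ of the text are stored at the 0-based positions `2*i` and `2*i+1`.

```python
import math

def f(point, eps, delta):
    """f_{eps,delta}(m, s, M, S) with h_i = -ln(s_i S_i / delta_i) (our reading)."""
    m, s, M, S = point
    total = 0.0
    for mi, si, Mi, Si, e, d in zip(m, s, M, S, eps, delta):
        h = -math.log(si * Si / d)
        total += (((Mi - e) / Si + math.exp(-h) * mi / si) ** 2
                  + math.exp(-2.0 * h) - 1.0 + 2.0 * h)
    return 0.5 * total

def phi(point, eps, zeta):
    """The diffeomorphism phi_{eps,zeta}; zeta has length 2n."""
    m, s, M, S = point
    n = len(m)
    odd = [zeta[2 * i] for i in range(n)]        # zeta_{2i-1} in the text's indexing
    even = [zeta[2 * i + 1] for i in range(n)]   # zeta_{2i} in the text's indexing
    return ([z * mi for z, mi in zip(odd, m)],
            [z * si for z, si in zip(odd, s)],
            [e + z * (Mi - e) for z, Mi, e in zip(even, M, eps)],
            [z * Si for z, Si in zip(even, S)])

point = ([1.0, -2.0], [0.5, 2.0], [0.3, 1.0], [1.5, 0.7])
eps, delta = [0.1, -0.4], [1.0, 2.0]
balanced = [2.0, 0.5, 0.25, 4.0]      # zeta_{2i-1} * zeta_{2i} = 1 for each i
unbalanced = [2.0, 1.0, 1.0, 1.0]     # violates the condition for the first index
change_balanced = abs(f(phi(point, eps, balanced), eps, delta) - f(point, eps, delta))
change_unbalanced = abs(f(phi(point, eps, unbalanced), eps, delta) - f(point, eps, delta))
```

The invariance holds because $\varphi_{\varepsilon,\zeta}$ leaves the ratios $m_i/s_i$ and $(M_i-\varepsilon_i)/S_i$ unchanged and rescales $s_iS_i$ by $\zeta_{2i-1}\zeta_{2i}$.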
For each $\varepsilon\in\mathbb{R}^n$, we take the set $f_\varepsilon=\{(f_{\varepsilon,\delta},F_{\varepsilon,\delta})\}_{\delta\in(\mathbb{R}_{>0})^n}$, and consider it as a structure of the secondary leaf $F_\varepsilon$. Then we can prove the following.
Proposition 7.
For any $\zeta\in(\mathbb{R}_{>0})^{2n}$, the diffeomorphism $\varphi_{\varepsilon,\zeta}$ preserves the set $f_\varepsilon$ for any $\varepsilon\in\mathbb{R}^n$. In the case where $\zeta$ satisfies $\prod_{i=1}^{n}(\zeta_{2i-1}\zeta_{2i})=1$, the diffeomorphism $\varphi_{\varepsilon,\zeta}$ also preserves the hypersurface $N$.
Hereafter we fix $\varepsilon=0$. For any $\delta\in(\mathbb{R}_{>0})^n$, the diffeomorphism $\hat F_{0,\delta}$ interchanges the operation
$$(m,s)*(m',s')=\left(m+m',\ \left(\sqrt{s_i^2+s_i'^2}\right)\right)$$
with the operation
$$(m,s)\cdot(m',s')=\left(\left(\frac{m_is_i'^2+m_i's_i^2}{s_i^2+s_i'^2}\right),\ \left(\frac{s_is_i'}{\sqrt{s_i^2+s_i'^2}}\right)\right).$$
Namely,
Proposition 8.
If $(m,s,M,S),(m',s',M',S')\in F_{0,\delta}$, then
$$\big((m,s)\cdot(m',s'),\,(M,S)*(M',S')\big)\in F_{0,\delta}\quad\text{and}\quad\big((m,s)*(m',s'),\,(M,S)\cdot(M',S')\big)\in F_{0,\delta}.$$
A curve $(m(t),s(t))\in H^n$ is a geodesic with respect to the e-connection $\nabla^{1}$ if and only if $\frac{m_i}{s_i^2}$ and $\frac{1}{s_i^2}$ are affine functions of $t$ for $i=1,\dots,n$.
Definition 1.
We say that an e-geodesic $(m(t),s(t))\in H^n$ is intensive if it admits an affine parametrization such that $\frac{1}{s_i^2}$ are linear for $i=1,\dots,n$.
Note that any e-geodesic is intensive in the case where $n=1$. We show
Proposition 9.
Given an intensive e-geodesic $(m(t),s(t))\in H^n$, we can parametrize its image
$$\big(M(t),S(t)\big)=\left(\left(\varepsilon_i-\frac{m_i(t)\,\delta_i}{s_i(t)^2}\right),\ \left(\frac{\delta_i}{s_i(t)}\right)\right)$$
under the diffeomorphism $\hat F_{\varepsilon,\delta}$ to obtain an intensive e-geodesic.
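A numerical illustration of Proposition 9 follows. We take an intensive e-geodesic with $1/s_i^2=a_it$ and $m_i/s_i^2=b_i+c_it$, push it through $\hat F_{\varepsilon,\delta}$, and test whether $M_i/S_i^2$ is affine and $1/S_i^2$ is linear in a new parameter. The reparametrization $\tau=1/t$ used here is our own choice for this sketch and is not stated in the source; the constants are arbitrary.

```python
import math

def intensive_geodesic(t, a, b, c):
    """Point (m, s) with 1/s_i^2 = a_i t and m_i/s_i^2 = b_i + c_i t, for t > 0."""
    s = [1.0 / math.sqrt(ai * t) for ai in a]
    m = [(bi + ci * t) / (ai * t) for ai, bi, ci in zip(a, b, c)]
    return (m, s)

def F_hat(p, eps, delta):
    """(m, s) -> (M, S) = (eps_i - m_i delta_i / s_i^2, delta_i / s_i)."""
    m, s = p
    M = [e - mi * d / (si * si) for mi, si, e, d in zip(m, s, eps, delta)]
    S = [d / si for si, d in zip(s, delta)]
    return (M, S)

def affine_coordinates(p):
    """The e-affine coordinates (M_i / S_i^2) and (1 / S_i^2) of a point (M, S)."""
    M, S = p
    return ([Mi / (Si * Si) for Mi, Si in zip(M, S)], [1.0 / (Si * Si) for Si in S])

a, b, c = [1.0, 0.5], [0.2, -1.0], [0.3, 0.7]
eps, delta = [0.1, -0.4], [1.0, 2.0]
taus = [0.5, 1.0, 1.5]   # equally spaced values of the new parameter tau = 1/t
images = [affine_coordinates(F_hat(intensive_geodesic(1.0 / tau, a, b, c), eps, delta))
          for tau in taus]
# Second differences vanish for functions that are affine in tau.
second_diffs = [images[0][k][i] - 2.0 * images[1][k][i] + images[2][k][i]
                for k in (0, 1) for i in range(2)]
# For a function that is linear in tau, the quotient by tau is constant.
slopes = [[img[1][i] / tau for img, tau in zip(images, taus)] for i in range(2)]
```

Both checks pass to rounding error for these values, which is consistent with the image curve being an intensive e-geodesic in the parameter $\tau$.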
We have the hypersurface $N=\left\{\prod_{i=1}^{n}s_iS_i=1\right\}\subset H^{2n}$ carrying the contact forms $\alpha_\pm=2\sum_{i=1}^{n}\left.\left(\frac{dm_i}{s_i}\pm\frac{dM_i}{S_i}\right)\right|_N$. We state the main result.
Theorem 1.
The contact Hamiltonian vector field $X$ of the restriction of the function $\sum_{i=1}^{n}\frac{m_i}{s_i}$ to the hypersurface $N$ on any leaf $L_1(r)\times L_2(R)\cong H^{2n}$ of the primary foliation of $U_1\times U_2$ with respect to the contact form $\alpha_+$ on $N$ coincides with that for the other contact form $\alpha_-$. The vector field $X$ is tangent to the tertiary leaves $F_{\varepsilon,\delta}$ and defines flows on them. Here each flow line represents a correspondence between intensive e-geodesics as described in Proposition 9. In particular, for $\varepsilon=0$ and any $\delta\in(\mathbb{R}_{>0})^n$, the flow on the leaf $F_{0,\delta}$ represents the iteration of the operation $*$ on the first factor of $U\times U$ and that of the operation $\cdot$ on the second factor.
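The last statement of the theorem can be illustrated by a discrete iteration, which is only a stand-in for the flow and does not integrate the vector field $X$. In the sketch below we repeatedly convolve a point with a fixed $p$ in the first factor, repeatedly take the Bayesian product with $\hat F_{0,\delta}(p)$ in the second factor, and check that the pair stays on the leaf $F_{0,\delta}$. The componentwise formulas for the two operations are those given before Proposition 8; the names and sample values are ours.

```python
import math

def conv(p, q):
    """Componentwise convolution product on H^n; points are pairs of lists (m, s)."""
    (m1, s1), (m2, s2) = p, q
    return ([x + y for x, y in zip(m1, m2)],
            [math.hypot(x, y) for x, y in zip(s1, s2)])

def bayes(p, q):
    """Componentwise Bayesian product on H^n."""
    (m1, s1), (m2, s2) = p, q
    m = [(x * d * d + y * b * b) / (b * b + d * d)
         for x, b, y, d in zip(m1, s1, m2, s2)]
    s = [b * d / math.hypot(b, d) for b, d in zip(s1, s2)]
    return (m, s)

def F_hat0(p, delta):
    """(m, s) -> (M, S) on the leaf F_{0,delta}: M_i = -m_i delta_i / s_i^2, S_i = delta_i / s_i."""
    m, s = p
    return ([-mi * d / (si * si) for mi, si, d in zip(m, s, delta)],
            [d / si for si, d in zip(s, delta)])

p = ([1.0, -0.5], [2.0, 0.8])
delta = [1.0, 3.0]
x, Y = p, F_hat0(p, delta)
M_initial = list(Y[0])
max_gap = 0.0
for _ in range(5):
    x = conv(x, p)                   # iterate * in the first factor
    Y = bayes(Y, F_hat0(p, delta))   # iterate . in the second factor
    Mx, Sx = F_hat0(x, delta)
    gap = max(abs(u - v) for u, v in zip(Mx + Sx, Y[0] + Y[1]))
    max_gap = max(max_gap, gap)
```

Along this iteration the component $M$ stays fixed while $S$ shrinks, so the second factor moves along a curve on which $M_i/S_i^2$ and $1/S_i^2$ grow proportionally.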
Finally, we consider the transverse unitriangular group. We have the orthonormal frame
$$e_{ij}=\frac{\sigma_i}{\sigma_j}\sum_{k=j}^{n}r_{jk}\frac{\partial}{\partial r_{ik}}\quad(1\le i<j\le n)$$
with the relations $[e_{ij},e_{kl}]=\delta_{il}e_{kj}-\delta_{kj}e_{il}$ of the unitriangular algebra. Using the dual coframe $e^{ij}$, the relations can be expressed as $de^{ij}=\sum_{k=i+1}^{j-1}e^{ik}\wedge e^{kj}$. The transverse section of the primary foliation of $U_1\times U_2$ is the product of two copies of the unitriangular Lie group, which we would like to call the bi-unitriangular group. We fix the frame (resp. the coframe) of the transverse section consisting of the above $e_{ij}$ (resp. $e^{ij}$) in the first factor $U_1$ and their copies $E_{ij}$ (resp. $E^{ij}$) in the second factor $U_2$. The quotient manifold carries the $(n-2)$-plectic structure
$$\Omega=\sum_{i=1}^{n}e^{i,i+1}\wedge\cdots\wedge e^{i,n}\wedge E^{n-i+1,n-i+2}\wedge\cdots\wedge E^{n-i+1,n},$$
which satisfies $d\Omega=0$ and $\Omega^n>0$. We notice that, in the symplectic case where $n=3$, the quotient manifold admits no Kähler structure (see [6]).

3. Discussion

It is remarkable that the transverse symplectic 6-manifold is naturally ignored in Bayesian inference with a 3-dimensional normal prior. The author conjectures that a similar geometry of a (3+1)-dimensional relativistic prior has some relation to M-theory. See [7] for a relation between Poisson geometry and matrix-theoretical and non-commutative geometrical physics.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Amari, S. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2016.
  2. Mori, A. Information geometry in a global setting. Hiroshima Math. J. 2018, 48, 291–306.
  3. Hirzebruch, F. Hilbert modular surfaces. Enseign. Math. 1973, 19, 183–281.
  4. Mori, A. A concurrence theorem for alpha-connections on the space of t-distributions and its application. Hokkaido Math. J. 2020, in press.
  5. Mori, A. Global geometry of Bayesian statistics. Entropy 2020, 22, 240.
  6. Cordero, L.; Fernández, F.; Gray, A. Symplectic manifolds with no Kähler structure. Topology 1986, 25, 375–380.
  7. Kuntner, N.; Steinacker, H. On Poisson geometries related to noncommutative emergent gravity. J. Geom. Phys. 2012, 62, 1760–1777.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
