Symplectic/Contact Geometry Related to Bayesian Statistics †

Abstract: In previous work, the author gave the following symplectic/contact geometric description of the Bayesian inference of normal means: The space H of normal distributions is an upper half-plane which admits two operations, namely, the convolution product and the normalized pointwise product of two probability density functions. There is a diffeomorphism F of H that interchanges these operations and sends any e-geodesic to an e-geodesic. The product of two copies of H carries positive and negative symplectic structures and a bi-contact hypersurface N. The graph of F is Lagrangian with respect to the negative symplectic structure. It is contained in the bi-contact hypersurface N. Further, it is preserved under a bi-contact Hamiltonian flow with respect to a single function. Then the restriction of the flow to the graph of F presents the inference of means. The author showed that this also works for the Student t-inference of smoothly moving means and enables us to consider the smoothness of data smoothing. In this presentation, the space of multivariate normal distributions is foliated by means of the Cholesky decomposition of the covariance matrix. This provides a pair of regular Poisson structures, and generalizes the above symplectic/contact description to the multivariate case. Most of the ideas presented here have been described at length in a later article of the author.


Introduction
We work in the $C^\infty$-smooth category. A manifold $U$ embedded in the space of probability distributions inherits a separating premetric $D : U \times U \to \mathbb{R}_{\ge 0}$ from the relative entropy, which is called the Kullback-Leibler divergence. The geometry of $(U, D)$ is studied in information theory. Information geometry [1] concerns the infinitesimal behavior of $D$. In the case where $U$ is the space of univariate normal distributions, we regard $U$ as the half-plane $H = \mathbb{R} \times \mathbb{R}_{>0} \ni (m, s)$, where $m$ denotes the mean and $s$ the standard deviation. Since the convolution of two normal densities is a normal density, it induces a product $*$ on $H$, which we call the convolution product. On the other hand, since the pointwise product of two normal densities is proportional to a normal density, it induces another product $\cdot$ on $H$, which we call the Bayesian product. The first half of this presentation is devoted to the geometric description of Bayesian statistics including this product.
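The two products on $H$ can be made concrete with a few lines of Python. The following is a minimal sketch, not taken from the author's articles: the function names `convolve`, `bayes`, `normal_pdf` and `product_is_proportional` are hypothetical, and the formulas are the standard ones (variances add under convolution, precisions add under the normalized pointwise product).

```python
import math

def convolve(p, q):
    """Convolution product on H: means add and variances add."""
    (m1, s1), (m2, s2) = p, q
    return (m1 + m2, math.hypot(s1, s2))

def bayes(p, q):
    """Bayesian product on H: precisions add, and the new mean is the
    precision-weighted average of the two means."""
    (m1, s1), (m2, s2) = p, q
    w1, w2 = 1.0 / s1**2, 1.0 / s2**2
    return ((w1 * m1 + w2 * m2) / (w1 + w2), 1.0 / math.sqrt(w1 + w2))

def normal_pdf(x, m, s):
    """Density of the normal distribution with mean m and standard deviation s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def product_is_proportional(p, q, xs, rel_tol=1e-9):
    """Check at the sample points xs that pdf_p * pdf_q is a constant
    multiple of the density of bayes(p, q)."""
    m, s = bayes(p, q)
    ratios = [normal_pdf(x, *p) * normal_pdf(x, *q) / normal_pdf(x, m, s) for x in xs]
    return max(ratios) - min(ratios) <= rel_tol * max(ratios)
```

For instance, `convolve((0.0, 3.0), (1.0, 4.0))` gives the point with mean 1 and standard deviation 5, while `bayes` applied to two points with equal standard deviation averages the means and divides the standard deviation by $\sqrt{2}$.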
On the other hand, current statistics rests not only on probability theory but also on information theory. The author [2] found a symplectic description of the statistics of univariate normal distributions which is simultaneously based on these theories. Precisely, on the product $H \times H$ with coordinates $(m, s, M, S)$, we take the positive and negative symplectic forms $d\lambda_\pm$ with the fixed primitives $\lambda_\pm = \frac{dm}{s} \pm \frac{dM}{S}$. Then the Lagrangian surfaces with respect to $d\lambda_-$ foliate the hypersurface $N = \{sS = 1\}$. For each $\varepsilon \in \mathbb{R}$, the leaf $F_\varepsilon$ is the graph of a diffeomorphism of $H$ which sends any geodesic to a geodesic with respect to the e-connection. Further, in the case where $\varepsilon = 0$, the diffeomorphism interchanges the products $*$ and $\cdot$, namely, it sends the convolution product of any two points of $H$ to the Bayesian product of their images. Thus, the iteration of $*$ in the first factor of $H \times H$ corresponds to that of $\cdot$ in the second factor. The primitives $\lambda_\pm$, the hypersurface $N$, the foliation $\{F_\varepsilon\}_{\varepsilon \in \mathbb{R}}$ and the leaf $F_0$ are preserved under the diffeomorphism $\varphi_\zeta : (m, s, M, S) \mapsto (\zeta m, \zeta s, \zeta^{-1} M, \zeta^{-1} S)$ for any $\zeta \in \mathbb{R}_{>0}$. This map appears in the construction of Hilbert modular cusps by Hirzebruch [3]. Further the function $f$ :
On the other hand, the hypersurface $N$ inherits the mutually transverse pair of the contact structures $\ker(\lambda_\pm|_N)$, which we call the bi-contact structure. Let $X$ be the contact Hamiltonian vector field of $\lambda_+|_N$ with respect to the function $\frac{m}{s}$, i.e., the unique contact vector field satisfying $\lambda_+|_N(X) = \frac{m}{s}$.
Then $X$ is also the contact Hamiltonian vector field of $\lambda_-|_N$ with respect to the same function $\frac{m}{s}$.
We call such a vector field a bi-contact Hamiltonian vector field. There is a non-trivial bi-contact Hamiltonian vector field on $(N, \lambda_\pm)$ which is tangent to the leaf $F_0$. It is the one for the above function $\frac{m}{s}$ up to a constant multiple. We may regard its restriction to $F_0$ as a vector field on the second factor of $H \times H$ since $F_0$ is the graph of a diffeomorphism. Surprisingly, this vector field is tangent to a foliation by e-geodesics, and each leaf is closed under the Bayesian product. Further, the author [4] showed that a similar vector field on the squared space of Student's t-distributions can provide an indication of "geometric smoothness" in actual data smoothing.
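The interchange of the two products can be checked numerically. The explicit formula for the diffeomorphism whose graph is $F_0$ is not reproduced in this text, so the map $(m, s) \mapsto (m/s^2, 1/s)$ used below is an assumption: it is chosen only because its graph lies in $\{sS = 1\}$ and it carries $*$ to $\cdot$, and the author's map may differ from it, for instance by the sign of the first component. The names `swap_map` and `close` are hypothetical.

```python
import math

def convolve(p, q):
    """Convolution product on H: means add and variances add."""
    (m1, s1), (m2, s2) = p, q
    return (m1 + m2, math.hypot(s1, s2))

def bayes(p, q):
    """Bayesian product on H: precisions add, means are precision-weighted."""
    (m1, s1), (m2, s2) = p, q
    w1, w2 = 1.0 / s1**2, 1.0 / s2**2
    return ((w1 * m1 + w2 * m2) / (w1 + w2), 1.0 / math.sqrt(w1 + w2))

def swap_map(p):
    """ASSUMED candidate map (m, s) -> (M, S) = (m / s^2, 1 / s).
    Its graph satisfies s * S = 1."""
    m, s = p
    return (m / s**2, 1.0 / s)

def close(p, q, tol=1e-12):
    """Componentwise comparison of two points with a mixed tolerance."""
    return all(abs(a - b) <= tol * (1.0 + abs(b)) for a, b in zip(p, q))

def intertwines(p, q):
    """True if swap_map(p * q) equals swap_map(p) . swap_map(q)."""
    return close(swap_map(convolve(p, q)), bayes(swap_map(p), swap_map(q)))
```

Under this assumed map, repeated convolution in the first factor matches repeated Bayesian updating in the second factor, which is the correspondence described above.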
In the second half of this presentation, we generalize the above description to the multivariate case. It is straightforward except that we use the Cholesky decomposition to foliate the squared space of n-variate normal distributions. Here the leaves are 4n-dimensional submanifolds carrying two symplectic structures. They form a pair of Poisson structures on the squared space.
This short paper provides the calculation results and sketches the mathematical ideas. For the precise descriptions and the proofs, see the article [5] by the author.

Bayesian Information Geometry
In this subsection, we generalize the setting of information geometry. Take a smooth family of volume forms with finite total volumes on $\mathbb{R}^n$. We regard each of the volume forms as a point of a manifold $M$, namely, a point $y \in M$ presents a volume form $\rho_y \, d\mathrm{Vol}$ smoothly depending on $y$. Let $\mathcal{V}$ be the space of volume forms with finite total volumes on $M$. We take a volume form $V$ in $\mathcal{V}$. Given a point $z$ in $\mathbb{R}^n$, we regard the value $\rho_y(z)$ of the density as a function $\rho(z) : y \mapsto \rho_y(z)$, and multiply the volume form $V$ by the function $\rho(z)$. This defines the updating map
$$\mathcal{V} \times \mathbb{R}^n \to \mathcal{V}, \qquad (V, z) \mapsto \rho(z)\, V. \tag{1}$$
We notice that a volume form with finite total volume is proportional to a probability measure. Thus the function $\rho(z)$ is proportional to the likelihood, and System (1) presents Bayes' rule.
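The updating by multiplication can be illustrated on a discretized parameter space. The sketch below is only an illustration under stated assumptions: the manifold $M$ is replaced by a finite grid of means, the density $\rho_y$ is taken to be normal with mean $y$ and a fixed standard deviation, and the names `density`, `update` and `normalize` are hypothetical.

```python
import math

def density(y, z, s=1.0):
    """rho_y(z): normal density with mean y and fixed standard deviation s
    (the fixed s is an assumption made for this illustration)."""
    return math.exp(-0.5 * ((z - y) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def update(weights, grid, z):
    """Multiply a discretized, possibly unnormalized, volume form by the
    function rho(z) : y -> rho_y(z)."""
    return [w * density(y, z) for w, y in zip(weights, grid)]

def normalize(weights):
    """Rescale a finite-volume weight vector to total mass 1."""
    total = sum(weights)
    return [w / total for w in weights]

def mean(weights, grid):
    """Mean of the grid variable under normalized weights."""
    return sum(w * y for w, y in zip(weights, grid))
```

Because the update is a pointwise multiplication, the order of the observations does not matter, and normalizing once at the end gives the same posterior as normalizing after every step.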
Suppose that we have a conjugate prior $\widetilde{U}$ which is a smooth manifold, and further that, by using the hypersurface $U = \{V \in \widetilde{U} \mid \int_{\mathbb{R}^n} V = 1\}$, it can be written as $\widetilde{U} = \{kV \mid V \in U,\ k > 0\}$. We define on $\widetilde{U}$ the following "distance" $\widetilde{D}$, which satisfies none of the axioms of distance.
Note that the restriction $\widetilde{D}|_{U \times U} = D$ satisfies the separation axiom, and is called the Kullback-Leibler divergence. We have to fix the coordinate in the $k$-direction, which presents the time. Then we write the quadratic term of the Taylor expansion of $\widetilde{D}(P, P + dP) + \widetilde{D}(P + dP, P)$ as $\sum_{i,j} \widetilde{g}_{ij}\, dP^i dP^j$, where $\widetilde{g}_{ij} = \widetilde{g}_{ji}$. Suppose that $\widetilde{g} = [\widetilde{g}_{ij}]$ is a metric on $\widetilde{U}$. Let $\widetilde{\nabla}^0$ be the Levi-Civita connection with respect to $\widetilde{g}$. We write the cubic term of the expansion of $3\widetilde{D}(P, P + dP) - 3\widetilde{D}(P + dP, P)$ symmetrically as $\sum_{i,j,k} \widetilde{T}_{ijk}\, dP^i dP^j dP^k$. This defines the line of (generalized) α-connections $\widetilde{\nabla}^\alpha = \widetilde{\nabla}^0 - \alpha\, \widetilde{g}^* \widetilde{T}$ with affine parameter $\alpha \in \mathbb{R}$, where $\widetilde{g}^* \widetilde{T}$ denotes the contraction $\sum_l \widetilde{g}^{kl} \widetilde{T}_{ijl}$ by the contravariant metric $\widetilde{g}^{-1} = [\widetilde{g}^{ij}]$. Note that $\widetilde{\nabla}^\alpha$ has no torsion. Restricting all of the above notions with tilde to the hypersurface $U \subset \widetilde{U}$, we obtain the notions without tilde in the usual information geometry [1]. Here $U$ can be identified with a space of probability distributions.
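In the usual (restricted) setting, the quadratic term can be checked numerically on the half-plane of univariate normal distributions. The following sketch uses the standard closed form of the Kullback-Leibler divergence between two normal distributions and compares the symmetrized divergence with the Fisher quadratic form $(dm^2 + 2\,ds^2)/s^2$. The function names are hypothetical, and the example covers only the case without tilde.

```python
import math

def kl_normal(p, q):
    """Relative entropy D(p || q) between univariate normal distributions
    p = (m1, s1) and q = (m2, s2)."""
    (m1, s1), (m2, s2) = p, q
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2.0 * s2**2) - 0.5

def symmetrized(p, dp):
    """D(P, P + dP) + D(P + dP, P) for a small displacement dP."""
    q = (p[0] + dp[0], p[1] + dp[1])
    return kl_normal(p, q) + kl_normal(q, p)

def fisher_quadratic(p, dp):
    """Fisher quadratic form (dm^2 + 2 ds^2) / s^2 at the point p = (m, s)."""
    _, s = p
    return (dp[0] ** 2 + 2.0 * dp[1] ** 2) / s**2
```

For a displacement of size about $10^{-3}$, the two quantities agree up to a relative error of the same order, which reflects the cubic remainder of the Taylor expansion.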

The Geometry of Normal Distributions
In this subsection we consider the space $U$ of multivariate normal distributions. The pair of a vector $\mu = (\mu_i)_{1 \le i \le n} \in \mathbb{R}^n$ and an upper triangular matrix $C = [c_{ij}]_{1 \le i,j \le n} \in \mathrm{Mat}(n, \mathbb{R})$ with positive diagonal entries determines an $n$-variate normal distribution by declaring that $\mu$ presents the mean and $C^T C$ the Cholesky decomposition of the covariance matrix. We put
Note that $[r_{ij}]$ is unitriangular, i.e., it is a triangular matrix whose diagonal entries are all 1. Considering $\sigma \in \mathbb{R}^n$ and $r = (r_{ij})_{1 \le i < j \le n} \in \mathbb{R}^{n(n-1)/2}$ as parameters, we can write the probability density of the $n$-variate normal distribution at $P = (\mu, \sigma, r) \in U = \mathbb{R}^n \times (\mathbb{R}_{>0})^n \times \mathbb{R}^{n(n-1)/2}$ as
Then the relative entropy defines the premetric
where $\|\cdot\|^2$ denotes the sum of squares (i.e., $\|\cdot\|$ the Frobenius norm). Thus,
where $1_n$ is the unit, and $\Delta C$ the difference $C(\sigma + \Delta\sigma, r + \Delta r) - C(\sigma, r)$. Let $r^{ij}$ be the entries of the inverse matrix of $[r_{ij}]$. Then we have
The Fisher information $g$ appears in $D(P + dP, P)$ as the quadratic form which is presented by a block diagonal matrix $\mathrm{diag}(g_{\mu\mu}, g_{\sigma\sigma}, g_{rr,2}, \dots, g_{rr,n})$, where
and $g_{rr,l} = [g_{r_{li} r_{lj}}]_{i,j>l} = \sigma_l^2\, [g_{\mu_i, \mu_j}]_{i,j>l}$ ($l = 1, \dots, n-1$). Lowering the upper indices of the
and thus we also have {I,J},K = 0 (for the other choices of $\{I, J\}$ and $K$).
The coefficients for the e-connection all vanish with respect to the natural parameter $\theta = (C^{-1} C^{-T} \mu, \xi)$, where $\xi = (\xi_{ab})_{1 \le a \le b \le n}$ is the upper half of $C^{-1} C^{-T}$. Dually, the coefficients for the m-connection all vanish with respect to the expectation parameter $\eta = (\mu, \nu)$, where $\nu = (\nu_{ab})_{1 \le a \le b \le n}$ is the upper half of $C^T C + \mu\mu^T$. Now we fix the third component $r$ of $(\mu, \sigma, r)$, and vary the others. We take the natural projection $\pi : U = H^n \times \mathbb{R}^{n(n-1)/2} \to \mathbb{R}^{n(n-1)/2}$ and modify the coordinates $(\mu, \sigma)$ on the fiber $L(r) = \pi^{-1}(r)$ into $(m, s)$ in the next proposition. The fiber $L(r)$ satisfies the following two properties.
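For $n = 1$ the two parameters reduce to $\theta = (\mu/\sigma^2, 1/\sigma^2)$ and $\eta = (\mu, \sigma^2 + \mu^2)$, and the role of the natural parameter in Bayesian updating becomes visible: multiplying two normal densities adds the exponents, so the natural parameters add. The following sketch illustrates this for the univariate case only; the function names are hypothetical.

```python
import math

def to_natural(mu, sigma):
    """n = 1 case of theta = (C^{-1} C^{-T} mu, xi) with C = [sigma]."""
    precision = 1.0 / sigma**2
    return (precision * mu, precision)

def from_natural(theta):
    """Inverse of to_natural: recover (mu, sigma) from theta."""
    t1, precision = theta
    return (t1 / precision, 1.0 / math.sqrt(precision))

def to_expectation(mu, sigma):
    """n = 1 case of eta = (mu, nu) with nu = C^T C + mu mu^T."""
    return (mu, sigma**2 + mu**2)

def bayes_product(p, q):
    """Normalized pointwise product of two normal densities, computed by
    adding the natural parameters of p = (mu1, sigma1) and q = (mu2, sigma2)."""
    a, b = to_natural(*p), to_natural(*q)
    return from_natural((a[0] + b[0], a[1] + b[1]))
```

In these coordinates the iteration of the Bayesian product moves along a straight line in $\theta$, which is consistent with the vanishing of the e-connection coefficients stated above.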

Proposition 2. $L(r)$ is closed under the convolution $*$ and the normalized pointwise product $\cdot$ between the probability densities.
Proposition 3. The fiber L(r) with the induced metric from g admits a Kähler complex structure.
We write the restriction $D|_{L(r)}$ of the premetric $D$ using the coordinates $(m, s)$ as
We take the product $U_1 \times U_2$ of two copies of the space $U$. Then the products $L_1(r) \times L_2(R)$ of the fibers foliate $U_1 \times U_2$. We call this the primary foliation of $U_1 \times U_2$. For each $(r, R) \in \mathbb{R}^{n(n-1)}$, we have the coordinate system $(m, s, M, S)$ on the leaf $L_1(r) \times L_2(R)$. From the Kähler forms
respectively, on $L_1(r)$ and $L_2(R)$, we define the symplectic forms $\omega_1 \pm \omega_2$ on $L_1(r) \times L_2(R)$. We fix their primitive 1-forms
The symplectic structures on the primary foliation define a pair of regular Poisson structures. Now we take the $2n$-dimensional submanifolds
of the leaf $L_1(r) \times L_2(R)$ for $\varepsilon \in \mathbb{R}^n$ and $\delta \in (\mathbb{R}_{>0})^n$. The secondary foliation of $U_1 \times U_2$ foliates any leaf $L_1(r) \times L_2(R)$ by the $3n$-dimensional submanifolds $F_\varepsilon = \bigcup_{\delta \in (\mathbb{R}_{>0})^n} F_{\varepsilon,\delta}$ for $\varepsilon \in \mathbb{R}^n$. The tertiary foliation of $U_1 \times U_2$ foliates all leaves $F_\varepsilon$ of the secondary foliation by the $2n$-dimensional submanifolds $F_{\varepsilon,\delta}$ for $\delta \in (\mathbb{R}_{>0})^n$. We take the hypersurface
which inherits the contact forms $\alpha_\pm = \lambda_\pm|_N$. We can prove the following propositions.

Proposition 4. With respect to the symplectic form $d\lambda_-$, the tertiary leaves $F_{\varepsilon,\delta}$ are Lagrangian correspondences.

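Proposition 5. For any $\varepsilon$ and $\delta$ with $\prod_{i=1}^{n} \delta_i = 1$, $F_{\varepsilon,\delta} \subset N$ is a disjoint union of $n$-dimensional submanifolds $\{s = \mathrm{const}\} \subset F_{\varepsilon,\delta}$ which are integral submanifolds of the contact hyperplane distribution $\ker \alpha_+$ on $N$.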
Hereafter we fix $\varepsilon = 0$. For any $\delta \in (\mathbb{R}_{>0})^n$, the diffeomorphism $F_{0,\delta}$ interchanges the operation
Note that any e-geodesic is intensive in the case where $n = 1$. We show

Proposition 9. Given an intensive e-geodesic $(m(t), s(t)) \in H^n$, we can parametrize its image under the diffeomorphism $F_{\varepsilon,\delta}$ to obtain an intensive e-geodesic.
We have the hypersurface $N =$
We state the main result.
$N$ on any leaf $L_1(r) \times L_2(R) \approx H^{2n}$ of the primary foliation of $U_1 \times U_2$ with respect to the contact form $\alpha_+$ on $N$ coincides with that for the other contact form $\alpha_-$. The vector field $X$ is tangent to the tertiary leaves $F_{\varepsilon,\delta}$ and defines flows on them. Here each flow line presents a correspondence between intensive e-geodesics as is described in Proposition 9. In particular, for $\varepsilon = 0$ and any $\delta \in (\mathbb{R}_{>0})^n$, the flow on the leaf $F_{0,\delta}$ presents the iteration of the operation $*$ on the first factor of $U \times U$ and that of the operation $\cdot$ on the second factor.
Finally, we consider the transverse unitriangular group. We have the orthonormal frame
with the relations $[e_{ij}, e_{kl}] = \delta_{il}\, e_{kj} - \delta_{kj}\, e_{il}$ of the unitriangular algebra. Using the dual coframe $e^{ij}$, the relations can be expressed as $de^{ij} = \sum_{k=i+1}^{j-1} e^{ik} \wedge e^{kj}$. The transverse section of the primary foliation of $U_1 \times U_2$ is the product of two copies of the unitriangular Lie group, which we would like to call the bi-unitriangular group. We fix the frame (resp. the coframe) of the transverse section consisting of the above $e_{ij}$ (resp. $e^{ij}$) in the first factor $U_1$ and their copies $E_{ij}$ (resp. $E^{ij}$) in the second factor $U_2$. The quotient manifold carries the $(n-2)$-plectic structure $\Omega = \sum_{i=1}^{n} e^{i,i+1} \wedge \cdots \wedge e^{i,n} \wedge E^{n-i+1,n-i+2} \wedge \cdots \wedge E^{n-i+1,n}$, which satisfies $d\Omega = 0$ and $\Omega^n > 0$. We notice that, in the symplectic case where $n = 3$, the quotient manifold admits no Kähler structure (see [6]).
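The bracket relations can be checked on matrices. The definition of the frame is not reproduced in this text, so the realization $e_{ij} = -E_{ij}$ ($i < j$) by negated matrix units used below is an assumption, made only to confirm that the stated relations are those of the strictly upper triangular matrices. All function names are hypothetical.

```python
def zero(n):
    """n x n zero matrix as a list of rows."""
    return [[0] * n for _ in range(n)]

def e(n, i, j):
    """ASSUMED realization e_ij := -E_ij (1-based indices, i < j), where
    E_ij is the matrix unit with a single 1 in row i, column j."""
    m = zero(n)
    m[i - 1][j - 1] = -1
    return m

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

def rhs(n, i, j, k, l):
    """delta_il e_kj - delta_kj e_il as a matrix."""
    m = zero(n)
    if i == l:  # here k < l = i < j, so e_kj is defined
        m = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m, e(n, k, j))]
    if k == j:  # here i < j = k < l, so e_il is defined
        m = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(m, e(n, i, l))]
    return m

def relations_hold(n):
    """Check [e_ij, e_kl] = delta_il e_kj - delta_kj e_il for all i<j, k<l."""
    pairs = [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]
    return all(bracket(e(n, i, j), e(n, k, l)) == rhs(n, i, j, k, l)
               for (i, j) in pairs for (k, l) in pairs)
```

With this sign convention the only non-trivial brackets are $[e_{ik}, e_{kj}] = -e_{ij}$ for $i < k < j$, which matches the expression of $de^{ij}$ as a sum over the intermediate index $k$.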

Discussion
It is remarkable that the transverse symplectic 6-manifold is naturally ignored in the Bayesian inference with a 3-dimensional normal prior. The author conjectures that a similar geometry of a (3 + 1)-dimensional relativistic prior has some relation to M-theory. See [7] for a relation between Poisson geometry and matrix-theoretical and non-commutative geometrical physics.

Conflicts of Interest:
The author declares no conflict of interest.