Mean Estimation on the Diagonal of Product Manifolds

Jensen, Mathias Højgaard; Sommer, Stefan

doi:10.3390/a15030092

Open AccessArticle

Mean Estimation on the Diagonal of Product Manifolds

by

Mathias Højgaard Jensen

^† and

Stefan Sommer

^*,†

Department of Computer Science, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen, Denmark

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Algorithms 2022, 15(3), 92; https://doi.org/10.3390/a15030092

Submission received: 5 February 2022 / Revised: 2 March 2022 / Accepted: 3 March 2022 / Published: 10 March 2022

(This article belongs to the Special Issue Stochastic Algorithms and Their Applications)

Download

Browse Figures

Versions Notes

Abstract

Computing sample means on Riemannian manifolds is typically computationally costly, as exemplified by computation of the Fréchet mean, which often requires finding minimizing geodesics to each data point for each step of an iterative optimization scheme. When closed-form expressions for geodesics are not available, this leads to a nested optimization problem that is costly to solve. The implied computational cost impacts applications in both geometric statistics and in geometric deep learning. The weighted diffusion mean offers an alternative to the weighted Fréchet mean. We show how the diffusion mean and the weighted diffusion mean can be estimated with a stochastic simulation scheme that does not require nested optimization. We achieve this by conditioning a Brownian motion in a product manifold to hit the diagonal at a predetermined time. We develop the theoretical foundation for the sampling-based mean estimation, we develop two simulation schemes, and we demonstrate the applicability of the method with examples of sampled means on two manifolds.

Keywords:

diffusion mean; Fréchet mean; bridge simulation; geometric statistics; geometric deep learning

1. Introduction

The Euclidean expected value can be generalized to geometric spaces in several ways. Fréchet [1] generalized the notion of mean values to arbitrary metric spaces as minimizers of the sum of squared distances. Fréchet’s notion of mean values therefore naturally includes means on Riemannian manifolds. On manifolds without metric, for example, affine connection spaces, a notion of the mean can be defined by exponential barycenters, see, e.g., [2,3]. Recently, Hansen et al. [4,5] introduced a probabilistic notion of a mean, the diffusion mean. The diffusion mean is defined as the most likely starting point of a Brownian motion given the observed data. The variance of the data is here modelled in the evaluation time

T > 0

of the Brownian motion, and Varadhan’s asymptotic formula relating the heat kernel with the Riemannian distance relates the diffusion mean and the Fréchet mean in the

T \to 0

limit.

Computing sample estimators of geometric means is often difficult in practice. For example, estimating the Fréchet mean often requires evaluating the distance to each sample point at each step of an iterative optimization to find the optimal value. When closed-form solutions of geodesics are not available, the distances are themselves evaluated by minimizing over curves ending at the data points, thus leading to a nested optimization problem. This is generally a challenge in geometric statistics, the statistical analysis of geometric data. However, it can pose an even greater challenge in geometric deep learning, where a weighted version of the Fréchet mean is used to define a generalization of the Euclidean convolution taking values in a manifold [6]. As the mean appears in each layer of the network, closed-form geodesics is in practice required for its evaluation to be sufficiently efficient.

As an alternative to the weighted Fréchet mean, Ref. [7] introduced a corresponding weighted version of the diffusion mean. Estimating the diffusion mean usually requires ability to evaluate the heat kernel making it often similarly computational difficult to estimate. However, Ref. [7] also sketched a simulation-based approach for estimating the (weighted) diffusion mean that avoids numerical optimization and estimation of the heat kernel. Here, a mean candidate is generated by simulating a single forward pass of a Brownian motion on a product manifold conditioned to hit the diagonal of the product space. The idea is sketched for samples in

R^{2}

in Figure 1.

Contribution

In this paper, we present a comprehensive investigation of the simulation-based mean sampling approach. We provide the necessary theoretical background and results for the construction, we present two separate simulation schemes, and we demonstrate how the schemes can be used to compute means on high-dimensional manifolds.

2. Background

We here outline the necessary concepts from Riemannian geometry, geometric statistics, stochastic analysis, and bridge sampling necessary for the sampling schemes presented later in the paper.

2.1. Riemannian Geometry

A Riemannian metric g on a d-dimensional differentiable manifold M is a family of inner products

{(g_{p})}_{p \in M}

on each tangent space

T_{p} M

varying smoothly in p. The Riemannian metric allows for geometric definitions of, e.g., length of curves, angles of intersections, and volumes on manifolds. A differentiable curve on M is a map

γ : [0, 1] \to M

for which the time derivative

γ^{'} (t)

belongs to

T_{γ_{t}} M

, for each

t \in (0, 1)

. The length of the differentiable curve can then be determined from the Riemannian metric by

L (γ) : = \int_{0}^{1} \sqrt{g_{γ_{t}} (γ^{'} (t), γ^{'} (t))} d t = \int_{0}^{1} {∥ γ^{'} (t) ∥}_{γ_{t}} d t

. Let

p, q \in M

and let

Γ

be the set of differentiable curves joining p and q, i.e.,

Γ = {γ : [0, 1] \to M | γ (0) = p

and

γ (1) = q}

. The (Riemannian) distance between p and q is defined as

d (p, q) = {min}_{γ \in Γ} L (γ)

. Minimizing curves are called geodesics.

A manifold can be parameterized using coordinate charts. The charts consist of open subsets of M providing a global cover of M such that each subset is diffeomorphic to an open subset of

R^{d}

, or, equivalently,

R^{d}

itself. The exponential normal chart is often a convenient choice to parameterize a manifold for computational purposes. The exponential chart is related to the exponential map

{exp}_{p} : T_{p} M \to M

that for each

p \in M

is given by

{exp}_{p} (v) = γ_{v} (1)

, where

γ_{v}

is the unique geodesic satisfying

γ_{v} (0) = p

and

γ_{v}^{'} (0) = v

. For each

p \in M

, the exponential map is a diffeomorphism from a star-shaped subset V centered at the origin of

T_{p} M

to its image

{exp}_{p} (V) \subseteq M

, covering all M except for a subset of (Riemannian) volume measure zero,

Cut (p)

, the cut-locus of p. The inverse map

{log}_{p} : M \ Cut (p) \to T_{p} M

provides a local parameterization of M due to the isomorphism between

T_{p} M

and

R^{d}

. For submanifolds

N \subseteq M

, the cut-locus

Cut (N)

is defined in a fashion similar to

Cut (p)

, see e.g., [8].

Stochastic differential equations on manifolds are often conveniently expressed using the frame bundle

F M

, the fiber bundle which for each point

p \in M

assigns a frame or basis for the tangent space

T_{p} M

, i.e.,

F M

consists of a collection of pairs

(p, u)

, where

u : R^{d} \to T_{p} M

is a linear isomorphism. We let

π

denote the projection

π : F M \to M

. There exist a subbundle of

F M

consisting of orthonormal frames called the orthonormal frame bundle

O M

. In this case, the map

u : R^{d} \to T_{p} M

is a linear isometry.

2.2. Weighted Fréchet Mean

The Euclidean mean has three defining properties: The algebraic property states the uniqueness of the arithmetic mean as the mean with residuals summing to zero, the geometric property defines the mean as the point that minimizes the variance, and the probabilistic property adheres to a maximum likelihood principle given an i.i.d. assumption on the observations (see also [9] Chapter 2). Direct generalization of the arithmetic mean to non-linear spaces is not possible due to the lack of vector space structure. However, the properties above allow giving candidate definitions of mean values in non-linear spaces.

The Fréchet mean [1] uses the geometric property by generalizing the mean-squared distance minimization property to general metric spaces. Given a random variable X on a metric space

(E, d)

, the Fréchet mean is defined by

μ = \underset{p \in E}{arg min} E [d {(p, X)}^{2}] .

(1)

In the present context, the metric space is a Riemannian manifold M with Riemannian distance function d. Given realizations

x_{1}, \dots, x_{n} \in M

from a distribution on M, the estimator of the weighted Fréchet mean is defined as

\hat{μ} = \underset{p \in M}{arg min} \sum_{i = 1}^{n} w_{i} d {(p, x_{i})}^{2},

(2)

where

w_{1}, \dots, w_{n}

are the corresponding weights satisfying

w_{i} > 0

and

\sum_{i} w_{i} = 1

. When the weights are identical, (2) is an estimator of the Fréchet mean. Throughout, we shall make no distinction between the estimator and the Fréchet mean and will refer to both as the Fréchet mean.

In [6,10], the weighted Fréchet mean was used to define a generalization of the Euclidean convolution to manifold-valued inputs. When closed-form solutions of geodesics are available, the weighted Fréchet mean can be estimated efficiently with a recursive algorithm, also denoted an inductive estimator [6].

2.3. Weighted Diffusion Mean

The diffusion mean [4,5] was introduced as a geometric mean satisfying the probabilistic property of the Euclidean expected value, specifically as the starting point of a Brownian motion that is most likely given observed data. This results in the diffusion t-mean definition

μ_{t} = \underset{p \in M}{arg min} E [- log p_{t} (p, X)],

(3)

where

p_{t} (\cdot, \cdot)

denotes the transition density of a Brownian motion on M. Equivalently,

p_{t}

denotes the solution to the heat equation

\partial u / \partial t = \frac{1}{2} Δ u

, where

Δ

denotes the Laplace-Beltrami operator associated with the Riemannian metric. The definition allows for an interpretation of the mean as an extension of the Fréchet mean due to Varadhan’s result stating that

{lim}_{t \to 0} - 2 t log p_{t} (x, y) = d {(x, y)}^{2}

uniformly on compact sets disjoint from the cut-locus of either x or y [11].

Just as the Fréchet mean, the diffusion mean has a weighted version, and the corresponding estimator of the weighted diffusion t-mean is given as

{\hat{μ}}_{t} = \underset{p \in M}{arg min} \sum_{i = 1}^{n} - log p_{t / w_{i}} (p, x_{i}) .

(4)

Please note that the evaluation time is here scaled by the weights. This is equivalent to scaling the variance of the steps of the Brownian motion [12].

As closed-form expressions for the heat kernel are only available on specific manifolds, evaluating the diffusion t-mean often rely on numerical methods. One example of this is using bridge sampling to numerically estimate the transition density [9,13]. If a global coordinate chart is available, the transition density can be written in the form (see [14,15])

p_{T} (z, v) = \sqrt{\frac{det g (v)}{{(2 π T)}^{2}}} e^{- \frac{{∥ a (z) (z - v) ∥}^{2}}{2 T}} E [φ],

(5)

where g is the metric matrix, a a square root of g, and

φ

denotes the correction factor between the law of the true diffusion bridge and the law of the simulation scheme. The expectation over the correction factor can be numerically approximated using Monte Carlo sampling. The correction factor will appear again when we discuss guided bridge proposals below.

2.4. Diffusion Bridges

The proposed sampling scheme for the (weighted) diffusion mean builds on simulation methods for conditioned diffusion processes, diffusion bridges. Here, we outline ways to simulate conditioned diffusion processes numerically in both the Euclidean and manifold context.

Euclidean Diffusion Bridges

Let

(Ω, F, F_{t}, P)

be a filtered probability space, and X a d-dimensional Euclidean diffusion

[0, T]

satisfying the stochastic differential equation (SDE)

d X_{t} = b_{t} (X_{t}) d t + σ_{t} (X_{t}) d W_{t}, X_{0} = x,

(6)

where W is a d-dimensional Brownian motion. Let

v \in R^{d}

be a fixed point. Conditioning X on reaching v at a fixed time

T > 0

gives the bridge process

X | X_{T} = v

. Denoting this process Y, Doob’s h-transform shows that Y is a solution of the SDE (see e.g., [16])

\begin{matrix} d Y_{t} & = {\tilde{b}}_{t} (Y_{t}) d t + σ_{t} (Y_{t}) d {\tilde{W}}_{t}, Y_{0} = x \\ {\tilde{b}}_{t} (y) & = b_{t} (y) + a_{t} (y) \nabla_{y} log p_{T - t} (y, v), \end{matrix}

(7)

where

p_{t} (\cdot, \cdot)

denotes the transition density of the diffusion X,

a = σ σ^{T}

, and where

\tilde{W}

is a d-dimensional Brownian motion under a changed probability law. From a numerical viewpoint, in most cases, the transition density is intractable and therefore direct simulation of (7) is not possible.

If we instead consider a Girsanov transformation of measures to obtain the system (see, e.g., [17] Theorem 1)

\begin{matrix} d Y_{t} & = {\tilde{b}}_{t} (Y_{t}) d t + σ_{t} (Y_{t}) d {\tilde{W}}_{t}, Y_{0} = x \\ {\tilde{b}}_{t} (y) & = b_{t} (y) + σ_{t} (y) h (t, y), \end{matrix}

(8)

the corresponding change of measure is given by

\frac{d P^{h}}{d P} |_{F_{t}} = e^{\int_{0}^{t} h {(s, X_{s})}^{T} d W_{s} - \frac{1}{2} \int_{0}^{t} {∥ h (s, X_{s}) ∥}^{2} d s} .

(9)

From (7), it is evident that

h (t, x) = σ^{T} \nabla_{x} log p_{T - t} (x, v)

gives the diffusion bridge. However, different choices of the function h can yield processes which are absolutely continuous regarding the actual bridges, but which can be simulated directly.

Delyon and Hu [17] suggested to use

h (t, x) = σ_{t}^{- 1} (x) \nabla_{x} log q_{T - t} (x, v)

, where q denotes the transition density of a standard Brownian motion with mean v, i.e.,

q_{t} (x, v) = {(2 π t)}^{- d / 2} exp (- {∥ x - v ∥}^{2} / 2 t)

. They furthermore proposed a method that would disregard the drift term b, i.e.,

h (t, x)) = σ_{t}^{- 1} (x) \nabla_{x} log q_{T - t} (x, v) - σ_{t}^{- 1} (x) b_{t} (x)

. Under certain regularity assumptions on b and

σ

, the resulting processes converge to the target in the sense that

{lim}_{t \to T} Y_{t} = v

a.s. In addition, for bounded continuous functions f, the conditional expectation is given by

E [f (X) | X_{T} = v] = C E [f (Y) φ (y)],

(10)

where

φ

is a functional of the whole path Y on

[0, T]

that can be computed directly. From the construction of the h-function, it can be seen that the missing drift term is accounted for in the correction factor

φ

.

The simulation approach of [17] can be improved by the simulation scheme introduced by Schauer et al. [18]. Here, an h-function defined by

h (t, x) = \nabla_{x} log {\hat{p}}_{T - t} (x, v)

is suggested, where

\hat{p}

denotes the transition density of an auxiliary diffusion process with known transition densities. The auxiliary process can for example be linear because closed-form solutions of transition densities for linear processes are available. Under the appropriate measure

P^{h}

, the guided proposal process is a solution to

d Y_{t} = b_{t} (Y_{t}) d t + a_{t} (Y_{t}) \nabla_{x} log {\hat{p}}_{T - t} (x, v) |_{x = Y_{t}} d t + σ_{t} (Y_{t}) d W_{t} .

(11)

Note the factor

a (t, y)

in the drift in (7) which is also present in (11) but not with the scheme proposed by [17]. Moreover, the choice of a linear process grants freedom to model. For other choices of an h-functions see e.g., [19,20].

Marchand [19] extended the ideas of Delyon and Hu by conditioning a diffusion process on partial observations at a finite collection of deterministic times. Where Delyon and Hu considered the guided diffusion processes satisfying the SDE

d Y_{t} = b_{t} (Y_{t}) d t - \frac{Y_{t} - v}{T - t} d t + σ_{t} (Y_{t}) d w_{t},

(12)

for

v \in R^{d}

over the time interval

[0, T]

, Marchand proposed the guided diffusion process conditioned on partial observations

v_{1}, \dots, v_{N}

solving the SDE

d Y_{t} = b_{t} (Y_{t}) d t - \sum_{k = 1}^{n} P_{t}^{k} (Y_{t}) \frac{Y_{t} - u_{k}}{T_{k} - t} 1_{(T_{k} - ε_{k}, T_{k})} d t + σ_{t} (Y_{t}) d w_{t},

(13)

where

u_{k}

is be any vector satisfying

L_{k} (x) u_{k} = v_{k}

,

L_{k}

a deterministic matrix in

M_{m_{k}, n} (R)

whose

m_{k}

rows form a orthonormal family,

P_{t}^{k}

are projections to the range of

L_{k}

, and

T_{k} - ε_{k} < T_{k}

. The

ε_{k}

only allow the application of the guiding term on a part of the time intervals

[T_{k - 1}, T_{k}]

. We will only consider the case

k = 1

. The scheme allows the sampling of bridges conditioned on

L Y_{T} = v

.

2.5. Manifold Diffusion Processes

To work with diffusion bridges and guided proposals on manifolds, we will first need to consider the Eells–Elworthy–Malliavin construction of Brownian motion and the connected characterization of semimartingales [21]. Endowing the frame bundle

F M

with a connection allows splitting the tangent bundle

T F M

into a horizontal and vertical part. If the connection on

F M

is a lift of a connection on M, e.g., the Levi–Civita connection of a metric on M, the horizontal part of the frame bundle is in one-to-one correspondence with M. In addition, there exist fundamental horizontal vector fields

H_{i} : F M \to H F M

such that for any continuous

R^{d}

-valued semimartingale Z the process U defined by

d U_{t} = H_{i} (U_{t}) \circ d Z_{t}^{i},

(14)

is a horizontal frame bundle semimartingale, where ∘ denotes integration in the Stratonovich sense. The process

X_{t} : = π (U_{t})

is then a semimartingale on M. Any semimartingale

X_{t}

on M has this relation to a Euclidean semimartingale

Z_{t}

.

X_{t}

is denoted the development of

Z_{t}

, and

Z_{t}

the antidevelopment of

X_{t}

. We will use this relation when working with bridges on manifolds below.

When

Z_{t}

is a Euclidean Brownian motion, the development

X_{t}

is a Brownian motion. We can in this case also consider coordinate representations of the process. With an atlas

{(D_{α}, ϕ_{α})}_{α}

of M, there exists an increasing sequence of predictable stopping times

0 \leq T_{k} \leq T_{k + 1}

such that on each stochastic interval

〚 T_{k}, T_{k + 1} 〛 = {(ω, t) \in Ω \times R_{+} | T_{k} (ω) \leq t \leq T_{k + 1} (ω)}

the process

x_{t} \in D_{α}

, for some

α

(see [22] Lemma 3.5). Thus, the Brownian motion x on M can be described locally in a chart

D_{α} \subset M

as the solution to the system of SDEs, for

(ω, t) \in 〚 T_{k}, T_{k + 1} 〛 \cap {T_{k} < T_{k + 1}}

d x_{t}^{i} (ω) = b^{i} (x_{t} (ω)) d t + σ_{j}^{i} (x_{t} (ω)) d W_{t}^{j} (ω),

(15)

where

σ

denotes the matrix square root of the inverse of the Riemannian metric tensor

(g^{i j})

and

b^{k} (x) = - \frac{1}{2} g^{i j} (x) Γ_{i j}^{k} (x)

is the contraction over the Christoffel symbols (see, e.g., [11] Chapter 3). Strictly speaking, the solution of Equation (15) is defined by

x_{t}^{i} = ϕ_{α} {(x_{t})}^{i}

.

We thus have two concrete SDEs for the Brownian motion in play: The

F M

SDE (14) and the coordinate SDE (15).

Throughout the paper, we assume that M is stochastically complete, i.e., the Brownian motions does not explode in finite time and, consequently,

\int_{M} p_{t} (x, y) d {Vol}_{M} (y) = 1

, for all

t > 0

and all

x \in M

.

2.6. Manifold Bridges

The Brownian bridge process Y on M conditioned at

Y_{T} = v

is a Markov process with generator

\frac{1}{2} Δ + \nabla log p_{T - t} (\cdot, v)

. Closed-form expressions of the transition density of a Brownian motion are available on selected manifolds including Euclidean spaces, hyperbolic spaces, and hyperspheres. Direct simulation of Brownian bridges is therefore possible in these cases. However, generally, transition densities are intractable and auxiliary processes are needed to sample from the desired conditional distributions.

To this extent, various types of bridge processes on Riemannian manifolds have been described in the literature. In the case of manifolds with a pole, i.e., the existence of a point

p \in M

such that the exponential map

{exp}_{p} : T_{p} M \to M

is a diffeomorphism, the semi-classical (Riemannian Brownian) bridge was introduced by Elworthy and Truman [23] as the process with generator

\frac{1}{2} Δ + \nabla log k_{T - t} (\cdot, v)

, where

k_{t} (x, v) = {(2 π t)}^{- n / 2} e^{- \frac{d {(x, v)}^{2}}{2 t}} J^{- 1 / 2} (x),

and

J (x) = | det D_{exp {(v)}^{- 1}} {exp}_{v} |

denotes the Jacobian determinant of the exponential map at v. Elworthy and Truman used the semi-classical bridge to obtain heat-kernel estimates, and the semi-classical bridge has been studied by various others [24,25].

By Varadhan’s result (see [11] Theorem 5.2.1), as

t \to T

, we have the asymptotic relation

((T - t) log p_{T - t} (x, y) \sim - \frac{1}{2} d {(x, y)}^{2}

. In particular, the following asymptotic relation was shown to hold by Malliavin, Stroock, and Turetsky [26,27]:

(T - t) \nabla log p_{T - t} (x, y) \sim - \frac{1}{2} \nabla d {(x, y)}^{2}

. From these results, the generators of the Brownian bridge and the semi-classical bridge differ in the limit by a factor of

- \frac{1}{2} \nabla log J (x)

. However, under a certain boundedness condition, the two processes can be shown to be identical under a changed probability measure [8] Theorem 4.3.1.

To generalize the heat-kernel estimates of Elworthy and Truman, Thompson [8,28] considered the Fermi bridge process conditioned to arrive in a submanifold

N \subseteq M

at time

T > 0

. The Fermi bridge is defined as the diffusion process with generator

\frac{1}{2} Δ + \nabla log q_{T - t} (\cdot, N)

, where

q_{t} (x, N) = {(2 π t)}^{- n / 2} e^{- \frac{d {(x, N)}^{2}}{2 t}} .

For both bridge processes, when

M = R^{d}

and N is a point, both the semi-classical bridge and the Fermi bridge agree with the Euclidean Brownian bridge.

Ref. [15] introduce a numerical simulation scheme for conditioned diffusions on Riemannian manifolds, which generalize the method by Delyon and Hu [17]. The guiding term used is identical to the guiding term of the Fermi bridge when the submanifold is a single point v.

3. Diffusion Mean Estimation

The standard setup for diffusion mean estimation described in the literature (e.g., [13]) is as follows: Given a set of observations

x_{1}, \dots, x_{n} \in M

, for each observation

x_{i}

, sample a guided bridge process approximating the bridge

X_{i, t} | X_{i, T} = x_{i}

with starting point

x_{0}

. The expectation over the correction factors can be computed from the samples, and the transition density can be evaluated using (5). An iterative maximum likelihood approach using gradient descent to update

x_{0}

yielding an approximation of the diffusion mean in the final value of

x_{0}

. The computation of the diffusion mean, in the sense just described, is, similarly to the Fréchet mean, computationally expensive.

We here explore the idea first put forth in [7]: We turn the situation around to simulate n independent Brownian motions starting at each of

x_{1}, \dots, x_{n}

, and we condition the n processes to coincide at time T. We will show that the value

x_{1, T} = \dots = x_{n, T}

is an estimator of the diffusion mean. By introducing weights in the conditioning, we can similarly estimate the weighted diffusion mean. The construction can concisely be described as a single Brownian motion on the n-times product manifold

M^{n}

conditioned to hit the diagonal

diag (M^{n}) = {(x, \dots, x) | x \in M} \subset M^{n}

. To shorten notation, we denote the diagonal submanifold N below. We start with examples with M Euclidean to motivate the construction.

Example 1.

Consider the two-dimensional Euclidean multivariate normal distribution

(\begin{matrix} X \\ Y \end{matrix}) \sim N ((\begin{matrix} μ_{1} \\ μ_{2} \end{matrix}), (\begin{matrix} σ_{11} σ_{12} \\ σ_{21} σ_{22} \end{matrix})) .

The conditional distribution of X given

Y = y

follows a univariate normal distribution

X | Y = y \sim N (μ_{1} + σ_{12} σ_{22}^{- 1} (y - μ_{2}), σ_{11} - σ_{12} σ_{22}^{- 1} σ_{21}) .

This can be seen from the fact that if

X \sim N (μ, Σ)

then for any linear transformation

A X + b \sim N (b + A μ, A Σ A^{T})

. Defining the random variable

Z = X - σ_{12} σ_{22}^{- 1} Y

, the result applied to

(Z, X)

gives

Z \sim N (μ_{1} - σ_{12} σ_{22}^{- 1} μ_{2}, σ_{11} - σ_{12} σ_{22}^{- 1} σ_{21})

. The conclusion then follows from

X = Z + σ_{12} σ_{22}^{- 1} Y

. Please note that X and Y are independent if and only if

σ_{12} = σ_{21} = 0

and the conditioned random variable is in this case identical in law to X.

Let now

x_{1}, \dots, x_{n} \in M

be observations and let

x = (x_{1}, \dots, x_{n}) \in M^{n}

be an element of the n-product manifold

M \times \dots \times M

with the product Riemannian metric. We again first consider the case

M = R^{d}

:

Example 2.

Let

Y_{i} \sim N (x_{i}, \frac{T}{w_{i}} I_{d})

be independent random variables. The conditional distribution

Y_{1} | Y_{1} = \dots = Y_{n}

is normal

N (\frac{\sum_{i} w_{i} x_{i}}{\sum_{i} w_{i}}, \frac{T}{\sum_{i} w_{i}})

. This can be seen inductively: The conditioned random variable

Y_{1} | Y_{1} = Y_{2}

is identical to

Y_{1} | Y_{1} - Y_{2} = 0

. Now let

X = Y_{1}

and

Y = Y_{1} - Y_{2}

and refer to Example 1. To conclude, assume

Z_{n} : = Y_{1} | Y_{1} = \dots, = Y_{n - 1}

follows the desired normal distribution. Then

Z_{n} | Z_{n} = Y_{n}

is normally distributed with the desired parameters and

Z_{n} | Z_{n} = Y_{n}

is identical to

Y_{1} | Y_{1} = \dots = Y_{n}

.

The following example establishes the weighted average as a projection onto the diagonal.

Example 3.

Let x be a point in

{(R^{d})}^{n}

and let P be the orthogonal projection to the diagonal of

{(R^{d})}^{n}

such that

P x = {(\frac{1}{n d} \sum_{i = 1}^{n d} x_{i} \dots \frac{1}{n d} \sum_{i = 1}^{n d} x_{i})}^{T}

. We see that the projection yields n copies of the arithmetic mean of the coordinates. This is illustrated in Figure 2.

The idea of conditioning diffusion bridge processes on the diagonal of a product manifold originates from the facts established in Examples 1–3. We sample the mean by sampling from the conditional distribution

Y_{1} | Y_{1} = \dots = Y_{n}

from Example 2 using a guided proposal scheme on the product manifolds

M^{n}

and on each step of the sampling projecting to the diagonal as in Example 3.

Turning now to the manifold situation, we replace the normal distributions with mean

x_{i} \in R^{d}

and variance

T / w_{i}

with Brownian motions started at

x_{i} \in M

and evaluated at time

T / w_{i}

. Please note that the Brownian motion density, the heat kernel, is symmetric in its coordinates:

p_{t} (x, y) = p_{t} (y, x)

. We will work with multiple process and indicate with superscript the density with respect to a particular process, e.g.,

p_{T}^{X}

. Note also that change of the evaluation time T is equal to scaling the variance, i.e.,

p_{α T}^{X} (x, y) = p_{T}^{X^{α}} (x, y)

where

X^{α}

is a Brownian motion with variance of the increments scaled by

α > 0

. This gives the following theorem, first stated in [7] with sketch proof:

Theorem 1.

Let

X_{t} = (X_{1, t}^{w_{1}^{- 1}}, \dots, X_{n, t}^{w_{n}^{- 1}})

consist of n independent Brownian motions on M with variance

w_{i}^{- 1}

and

X_{i, 0} = x_{i}

, and let

P^{*}

the law of the conditioned process

Y_{t} = X_{t} | X_{T} \in N

,

N = diag (M^{n})

. Let v be the random variable

Y_{1, T}

. Then v has density

p_{v}^{Y} (y) \propto \prod_{i = 1}^{n} p_{T / w_{i}} (x_{i}; y)

and

v = Y_{i, T}

for all i a.s. (almost surely).

Proof.

p_{T}^{X} ((x_{1}, \dots, x_{n}), (y, \dots, y)) = \prod_{i = 1}^{n} p_{T}^{X^{w_{i}^{- 1}}} (x_{i}, y)

because the processes

X_{i, t}

are independent. By symmetry of the Brownian motion and the time rescaling property,

p_{T}^{X_{i}^{w_{i}^{- 1}}} (x_{i}, y)

= p_{T / w^{i}} (y, x_{i})

. For elements

(y, \dots, y) \in diag (M^{n})

and

x \in M^{n}

,

p_{v} (y) = p_{T}^{Y} (x, y) \propto p_{T}^{X} (x, y)

. As a result of the conditioning,

v = Y_{1, T} = \dots = Y_{n, T}

. In combination, this establishes the result. □

Consequently, the set of modes of

p_{v}

equal the set of the maximizers for the likelihood

L (y; x_{1}, \dots, x_{n}) = \prod_{i = 1}^{n} p_{T / w_{i}} (x_{i}; y)

and thus the weighted diffusion mean. This result is the basis for the sampling scheme. Intuitively, if the distribution of v is relatively well behaved (e.g., close to normal), a sample from v will be close to a weighted diffusion mean with high probability.

In practice, however, we cannot sample

Y_{t}

directly. Instead, we will below use guided proposal schemes resulting in processes

{\tilde{Y}}_{t}

with law

\tilde{P}

that we can actually sample and that under certain assumptions, will be absolutely continuous with respect to

Y_{t}

with explicitly computable likelihood ratio so that

P^{*} = \frac{φ ({\tilde{Y}}_{T})}{E^{\tilde{P}} [φ ({\tilde{Y}}_{T})]} \tilde{P}

.

Corollary 1.

Let

\tilde{P}

be the law of

{\tilde{Y}}_{t}

and φ be the corresponding correction factor of the guiding scheme. Let

\tilde{v}

be the random variable

{\tilde{Y}}_{1, T}

with law

\frac{φ ({\tilde{Y}}_{T})}{E^{\tilde{P}} [φ ({\tilde{Y}}_{T})]} \tilde{P}

. Then

\tilde{v}

has density

p_{\tilde{v}} (y) \propto \prod_{i = 1}^{n} p_{T / w_{i}} (x_{i}; y)

.

We now proceed to actually construct the guided sampling schemes.

3.1. Fermi Bridges to the Diagonal

Consider a Brownian motion

X_{t} = (X_{1, t}, \dots X_{n, t})

in the product manifold

M^{n}

conditioned on

X_{1, T} = \dots = X_{n, T}

or, equivalently,

X_{T} \in N

,

N = diag (M^{n})

. Since N is a submanifold of

M^{n}

, the conditioned diffusion defined above is absolutely continuous with respect to the Fermi bridge on

[0, T)

[8,28]. Define the

F M

-valued horizontal guided process

d U_{t} = H_{i} (U_{t}) \circ (d W_{t}^{i} - \frac{H_{i} {\tilde{r}}_{N}^{2} (U_{t})}{2 (T - t)} d t),

(16)

where

\tilde{r}

denotes the lift of the radial distance to N defined by

{\tilde{r}}_{N} (u) : = r_{N} (π (u)) = d (π (u), N)

. The Fermi bridge

Y^{F}

is the projection of U to M, i.e.,

Y_{t}^{F} : = π (U_{t})

. Let

P^{F}

denotes its law.

Theorem 2.

For all continuous bounded functions f on

M^{n}

, we have

E [f (X) | X_{1, T} = \dots = X_{n, T}] = lim_{t ↑ T} C E^{P^{F}} [f (Y) φ (Y)],

(17)

with a constant

C > 0

, where

d log φ (Y_{s}^{F}) = \frac{r_{N} (Y_{s}^{F})}{T - s} (d η_{s} + d L_{s}) with d η_{s} = \frac{\partial}{\partial r_{N}} log Θ_{N}^{- \frac{1}{2}} d s,

d L_{s} : = d L_{s} (Y^{F})

with

L

being the geometric local time at

Cut (N)

, and

Θ_{N}

the determinant of the derivative of the exponential map normal to N with support on

M^{n} \ Cut (N)

[8].

Proof.

From [15] Theorem 8 and [28],

E [f (X) | X_{T} \in N] = lim_{t ↑ T} C E^{P^{F}} [f (Y^{F}) φ (Y_{t}^{F})] .

□

Since N is a totally geodesic submanifold of dimension d, the results of [8] can be used to give sufficient conditions to extend the equivalence in (17) to the entire interval

[0, T]

. A set A is said to be polar for a process

X_{t}

if the first hitting time of A by X is infinity a.s.

Corollary 2.

If either of the following conditions are satisfied

(i): the sectional curvature of planes containing the radial direction is non-negative or the Ricci curvature in the radial direction is non-negative;
(ii): $Cut (N)$ is polar for the Fermi bridge $Y^{F}$ and either the sectional curvature of planes containing the radial direction is non-positive or the Ricci curvature in the radial direction is non-positive;

then

E [f (X) | X_{1, T} = \dots = X_{n, T}] = C E^{P^{F}} [f (Y^{F}) φ (Y_{T}^{F})] .

In particular,

\frac{φ (Y_{T}^{F})}{E^{P^{F}} [φ (Y_{T}^{F})]} d P^{F} \propto d P^{*}

.

Proof.

See [8] (Appendix C.2). □

For numerical purposes, the equivalence (17) in Theorem 2 is sufficient as the interval

[0, T]

is finitely discretized. To obtain the result on the full interval, the conditions in Corollary 2 may at first seem quite restrictive. A sufficient condition for a subset of a manifold to be polar for a Brownian motion is its Hausdorff dimension being two less than the dimension of the manifold. Thus,

Cut (N)

is polar if

dim (Cut (N)) \leq n d - 2

. Verifying whether this is true requires specific investigation of the geometry of

M^{n}

.

The SDE (16) together with (17) and the correction

φ

gives a concrete simulation scheme that can be implemented numerically. Implementation of the geometric constructs is discussed in Section 4. The main complication of using Fermi bridges for simulation is that it involves evaluation of the radial distance

r_{N}

at each time-step of the integration. Since the radial distance finds the closest point on N to

x_{1}, \dots, x_{n}

, it is essentially a computation of the Fréchet mean and thus hardly more computationally efficient than computing the Fréchet mean itself. For this reason, we present a coordinate-based simulation scheme below.

3.2. Simulation in Coordinates

We here develop a more efficient simulation scheme focusing on manifolds that can be covered by a single chart. The scheme follows the partial observation scheme developed [19]. Representing the product process in coordinates and using a transformation L, whose kernel is the diagonal

diag (M^{n})

, gives a guided bridge process converging to the diagonal. An explicit expression for the likelihood is given.

In the following, we assume that M can be covered by a chart in which the square root of the cometric tensor, denoted by

σ

, is

C^{2}

. Furthermore,

σ

and its derivatives are bounded;

σ

is invertible with bounded inverse. The drift b is locally Lipschitz and locally bounded.

Let

x_{1}, \dots, x_{n} \in M

be observations and let

X_{1, t}, \dots, X_{n, t}

be independent Brownian motions with

X_{1, 0} = x_{1}, \dots, X_{n, 0} = x_{n}

. Using the coordinate SDE (15) for each

X_{i, t}

, we can write the entire system on

M^{n}

as

d (\begin{matrix} X_{1, t}^{1} \\ ⋮ \\ X_{1, t}^{d} \\ ⋮ \\ X_{n, t}^{1} \\ ⋮ \\ X_{n, t}^{d} \end{matrix}) = (\begin{matrix} b^{1} (X_{1, t}) \\ ⋮ \\ b^{d} (X_{1, t}) \\ ⋮ \\ b^{1} (X_{n, t}) \\ ⋮ \\ b^{d} (X_{n, t}) \end{matrix}) d t + [\begin{matrix} Σ^{1} (X_{1, t}, \dots, X_{n, t}) \\ ⋮ \\ Σ^{d} (X_{1, t}, \dots, X_{n, t}) \end{matrix}] d W_{t} .

(18)

In the product chart,

Σ

and b satisfy the same assumptions as the metric and cometric tensor and drift listed above.

The conditioning

X_{T} \in N

is equivalent to the requiring

X_{T} \in diag ({(R^{d})}^{n})

in coordinates.

diag ({(R^{d})}^{n})

is a linear subspace of

{(R^{d})}^{n}

, we let

L \in M^{d \times n d}

be a matrix with orthonormal rows and

ker L = diag ({(R^{d})}^{n})

so that the desired conditioning reads

L X_{T} = 0

. Define the following oblique projection, similar to [19],

P_{t} (x) = a (x) L^{T} A (x) L

(19)

where

a (x) = Σ (x) Σ {(x)}^{T} and A_{t} (x) = {(L a (x) L^{T})}^{- 1} .

Set

β (x) = Σ {(x)}^{T} L^{T} A (x)

. The guiding scheme (13) then becomes

d Y_{t} = b (Y_{t}) d t + Σ (Y_{t}) d W_{t} - Σ (Y_{t}) β (Y_{t}) \frac{L Y_{t}}{T - t} 1_{(T - ε, T)} (t) d t, Y_{0} = u .

(20)

We have the following result.

Lemma 1.

Equation (20) admits a unique solution on

[0, T)

. Moreover,

{∥ L Y_{t} ∥}^{2} \leq C (ω) (T - t) log log [{(T - t)}^{- 1} + e]

a.s., where C is a positive random variable.

Proof.

Since

L P = L

, the proof is similar to the proof of [19] Lemma 6. □

With the same assumptions, we also obtain the following result similar to [19] Theorem 3.

Theorem 3.

Let

Y_{t}

be a solution of (20), and assume the drift b is bounded. For any bounded function f,

E [f (X) | X_{T} \in N] = C E [f (y) φ (Y)],

(21)

where C is a positive constant and

\begin{matrix} φ (Y_{t}) & = \sqrt{det (A (Y_{T}))} exp {- \frac{{∥ β_{T - ε} (Y_{T - ε}) L Y_{T - ε} ∥}^{2}}{2 ε} \\ - & \int_{T - ε}^{T} \frac{2 {(L Y_{s})}^{T} L b (Y_{s}) d s - {(L Y_{s})}^{T} d (A (Y_{s})) L Y_{s} + d [A {(Y_{s})}^{i j}, {(L Y_{s})}_{i} {(L Y_{s})}_{j}]}{2 (T - s)}} \end{matrix}

Proof.

A direct consequence of [19] Theorem 3, for

k = 1

, and Lemma 1. □

The theorem can also be applied for unbounded drift by replacing b with a bounded approximation and performing a Girsanov change of measure.

3.3. Accounting for $φ$

The sampling schemes (16) and (20) above provides samples on the diagonal and thus candidates for the diffusion mean estimates. However, the schemes sample from a distribution which is only equivalent to the bridge process distribution: We still need to handle the correction factor in the sampling to sample from the correct distribution, i.e., the rescaling

\frac{φ}{E [φ]}

of the guided law in Theorem 1. A simple way to achieve this is to do sampling importance resampling (SIR) as described in Algorithm 1. This yields an approximation of the weighted diffusion mean. For each sample

y^{i}

of the guided bridge process, we compute the corresponding correction factor

φ (y^{i})

. The resampling step then consists of picking

y_{T}^{j}

with a probability determined by the correction terms, i.e., with J the number of samples we pick sample j with probability

P_{j} = \frac{φ (y_{T}^{j})}{\sum_{i = 1}^{J} φ (y_{T}^{i})}

.

Algorithm 1:weighted Diffusion Mean

It depends on the practical application if the resampling is necessary, or if direct samples from the guided process (corresponding to

J = 1

) are sufficient.

4. Experiments

We here exemplify the mean sampling scheme on the two-sphere

S^{2}

and on finite sets of landmark configurations endowed with the LDDMM metric [29,30]. With the experiment on

S^{2}

, we aim to give a visual intuition of the sampling scheme and the variation in the diffusion mean estimates caused by the sampling approach. In the higher-dimensional landmark example where closed-form solutions of geodesics are not available, we compare to the Fréchet mean and include rough running times of the algorithms to give a sense of the reduced time complexity. Note, however, that the actual running times are very dependent on the details of the numerical implementation, stopping criteria for the optimization algorithm for the Fréchet mean, etc.

The code used for the experiments is available in the software package Jax Geometry http://bitbucket.org/stefansommer/jaxgeometry (accessed on 4 February 2022). The implementation uses automatic differentiation libraries extensively for the geometry computations as is further described [31].

4.1. Mean Estimation on $S^{2}$

To illustrate the diagonal sampling scheme, Figure 3 displays a sample from a diagonally conditioned Brownian motion on

{(S^{2})}^{n}

,

n = 3

. The figure shows both the diagonal sample (red point) and the product process starting at the three data points and ending at the diagonal. In Figure 4, we increase the number of samples to

n = 256

and sample 32 mean samples

(T = 0.2)

. The population mean is the north pole, and the samples can be seen to cluster closely around the population mean with little variation in the mean samples.

4.2. LDDMM Landmarks

We here use the same setup as in [13], where the diffusion mean is estimated by iterative optimization, to exemplify the mean estimation on a high-dimensional manifold. The data consists of annotations of left ventricles cardiac MR images [32] with 17 tlandmarks selected from the annotation set from a total of 14 images. Each configuration of 17 landmarks in

R^{2}

gives a point in a 34-dimensional shape manifold. We equip this manifold with the LDDMM Riemannian metric [29,30]. Please note that the configurations can be represented as points in

R^{34}

, and the entire shape manifold is the subset of

R^{34}

where no two landmarks coincide. This provides a convenient Euclidean representation of the landmarks. The cometric tensor is not bounded in this representation, and we therefore cannot directly apply the results of the previous sections. We can nevertheless explore the mean simulation scheme experimentally.

Figure 5 shows one landmark configuration overlayed the MR image from which the configuration was annotated, and all 14 landmark configurations plotted together. Figure 6 displays samples from the diagonal process for two values of the Brownian motion end time T. Please note that each landmark configuration is one point on the 34-dimensional shape manifold, and each of the paths displayed is therefore a visualization of a Brownian path on this manifold. This figure and Figure 3 both show diagonal processes, but on two different manifolds.

In Figure 7, an estimated diffusion mean and Fréchet mean for the landmark configurations are plotted together. On a standard laptop, generation of one sample diffusion mean takes approximately 1 s. For comparison, estimation of the Fréchet mean with the standard nested optimization approach using the Riemannian logarithm map as implemented in Jax Geometry takes approximately 4 min. The diffusion mean estimation performed in [13] using direct optimization of the likelihood approximation with bridge sampling from the mean candidate to each data point is comparable in complexity to the Fréchet mean computation.

5. Conclusions

In [7], the idea of sampling means by conditioning on the diagonal of product manifolds was first described and the bridge sampling construction sketched. In the present paper, we have provided a comprehensive account of the background for the idea, including the relation between the (weighted) Fréchet and diffusion means, and the foundations in both geometry and stochastic analysis. We have constructed two simulation schemes and demonstrated the method on both low and a high-dimensional manifolds, the sphere

S^{2}

and the LDDMM landmark manifold, respectively. The experiments show the feasibility of the method and indicate the potential high reduction in computation time compared to computing means with iterative optimization.

Author Contributions

Conceptualization, S.S.; methodology, M.H.J. and S.S.; software, S.S.; validation, M.H.J. and S.S.; formal analysis, M.H.J. and S.S.; investigation, M.H.J. and S.S.; resources, S.S.; data curation, S.S.; writing—original draft preparation, M.H.J. and S.S.; writing—review and editing, M.H.J. and S.S.; visualization, S.S.; supervision, S.S.; project administration, M.H.J. and S.S.; funding acquisition, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Villum Fonden grant number 22924 and grant number 40582, and Novo Nordisk Fonden grant number NNF18OC0052000.

Acknowledgments

The work presented is supported by the CSGB Centre for Stochastic Geometry and Advanced Bioimaging funded by a grant from the Villum foundation, the Villum Foundation grants 22924 and 40582, and the Novo Nordisk Foundation grant NNF18OC0052000.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

Fréchet, M. Les éléments aléatoires de nature quelconque dans un espace distancié. Ann. L’Institut Henri Poincaré 1948, 10, 215–310. [Google Scholar]
Arnaudon, M.; Li, X.M. Barycenters of measures transported by stochastic flows. Ann. Probab. 2005, 33, 1509–1543. [Google Scholar] [CrossRef][Green Version]
Pennec, X. Barycentric Subspace Analysis on Manifolds. Ann. Stat. 2018, 46, 2711–2746. [Google Scholar] [CrossRef]
Hansen, P.; Eltzner, B.; Sommer, S. Diffusion Means and Heat Kernel on Manifolds. In Geometric Science of Information; Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2021; pp. 111–118. [Google Scholar] [CrossRef]
Hansen, P.; Eltzner, B.; Huckemann, S.F.; Sommer, S. Diffusion Means in Geometric Spaces. arXiv 2021, arXiv:2105.12061. [Google Scholar]
Chakraborty, R.; Bouza, J.; Manton, J.H.; Vemuri, B.C. ManifoldNet: A Deep Neural Network for Manifold-Valued Data With Applications. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 799–810. [Google Scholar] [CrossRef]
Sommer, S.; Bronstein, A. Horizontal Flows and Manifold Stochastics in Geometric Deep Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 811–822. [Google Scholar] [CrossRef]
Thompson, J. Submanifold Bridge Processes. Ph.D. Thesis, University of Warwick, Coventry, UK, 2015. [Google Scholar]
Pennec, X.; Sommer, S.; Fletcher, T. Riemannian Geometric Statistics in Medical Image Analysis; Elsevier: Amsterdam, The Netherlands, 2020. [Google Scholar]
Pennec, X.; Fillard, P.; Ayache, N. A Riemannian Framework for Tensor Computing. Int. J. Comput. Vis. 2006, 66, 41–66. [Google Scholar] [CrossRef]
Hsu, E.P. Stochastic Analysis on Manifolds; American Mathematical Society: Providence, RI, USA, 2002; Volume 38. [Google Scholar]
Grong, E.; Sommer, S. Most Probable Paths for Anisotropic Brownian Motions on Manifolds. arXiv 2021, arXiv:2110.15634. [Google Scholar]
Sommer, S.; Arnaudon, A.; Kuhnel, L.; Joshi, S. Bridge Simulation and Metric Estimation on Landmark Manifolds. In Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2017; pp. 79–91. [Google Scholar]
Papaspiliopoulos, O.; Roberts, G. Importance sampling techniques for estimation of diffusion models. Stat. Methods Stoch. Differ. Equ. 2012, 124, 311–340. [Google Scholar]
Jensen, M.H.; Sommer, S. Simulation of Conditioned Semimartingales on Riemannian Manifolds. arXiv 2021, arXiv:2105.13190. [Google Scholar]
Lyons, T.J.; Zheng, W.A. On conditional diffusion processes. Proc. R. Soc. Edinb. Sect. Math. 1990, 115, 243–255. [Google Scholar] [CrossRef]
Delyon, B.; Hu, Y. Simulation of Conditioned Diffusion and Application to Parameter Estimation. Stoch. Process. Appl. 2006, 116, 1660–1675. [Google Scholar] [CrossRef]
Schauer, M.; Van Der Meulen, F.; Van Zanten, H. Guided proposals for simulating multi-dimensional diffusion bridges. Bernoulli 2017, 23, 2917–2950. [Google Scholar] [CrossRef]
Marchand, J.L. Conditioning diffusions with respect to partial observations. arXiv 2011, arXiv:1105.1608. [Google Scholar]
van der Meulen, F.; Schauer, M. Bayesian estimation of discretely observed multi-dimensional diffusion processes using guided proposals. Electron. J. Stat. 2017, 11, 2358–2396. [Google Scholar] [CrossRef][Green Version]
Elworthy, D. Geometric aspects of diffusions on manifolds. In École d’Été de Probabilités de Saint-Flour XV–XVII, 1985–87; Springer: Berlin/Heidelberg, Germany, 1988; pp. 277–425. [Google Scholar]
Emery, M. Stochastic Calculus in Manifolds; Springer: Berlin/Heidelberg, Germany, 1989. [Google Scholar]
Elworthy, K.; Truman, A. The diffusion equation and classical mechanics: An elementary formula. In Stochastic Processes in Quantum Theory and Statistical Physics; Springer: Berlin/Heidelberg, Germany, 1982; pp. 136–146. [Google Scholar]
Li, X.M. On the semi-classical Brownian bridge measure. Electron. Commun. Probab. 2017, 22, 1–15. [Google Scholar] [CrossRef]
Ndumu, M.N. Brownian Motion and the Heat Kernel on Riemannian Manifolds. Ph.D. Thesis, University of Warwick, Coventry, UK, 1991. [Google Scholar]
Malliavin, P.; Stroock, D.W. Short time behavior of the heat kernel and its logarithmic derivatives. J. Differ. Geom. 1996, 44, 550–570. [Google Scholar] [CrossRef]
Stroock, D.W.; Turetsky, J. Short time behavior of logarithmic derivatives of the heat kernel. Asian J. Math. 1997, 1, 17–33. [Google Scholar] [CrossRef]
Thompson, J. Brownian bridges to submanifolds. Potential Anal. 2018, 49, 555–581. [Google Scholar] [CrossRef]
Joshi, S.; Miller, M. Landmark Matching via Large Deformation Diffeomorphisms. IEEE Trans. Image Process. 2000, 9, 1357–1370. [Google Scholar] [CrossRef]
Younes, L. Shapes and Diffeomorphisms; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
Kühnel, L.; Sommer, S.; Arnaudon, A. Differential Geometry and Stochastic Dynamics with Deep Learning Numerics. Appl. Math. Comput. 2019, 356, 411–437. [Google Scholar] [CrossRef]
Stegmann, M.B.; Fisker, R.; Ersbøll, B.K. Extending and Applying Active Appearance Models for Automated, High Precision Segmentation in Different Image Modalities. Available online: http://www2.imm.dtu.dk/pubdb/edoc/imm118.pdf (accessed on 4 February 2022).

Figure 1. (left) The mean estimator viewed as a projection onto the diagonal of a product manifold. Given a set

x_{1}, \dots, x_{n} \in M

, the tuple

(x_{1}, \dots, x_{n})

(blue dot) belongs to the product manifold

M \times \dots \times M

. The mean estimator

\hat{μ}

can be identified with the projection of

(x_{1}, \dots, x_{n})

onto the diagonal N (red dot). (right) Diffusion mean estimator in

R^{2}

using Brownian bridges conditioned on the diagonal. Here a Brownian bridge

X_{t} = (X_{1, t}, \dots, X_{4, t})

in

R^{8}

is conditioned on hitting the diagonal

N \subseteq R^{8}

at time

T > 0

. The components

X_{j}

each being two-dimensional processes are shown in the plot.

Figure 1. (left) The mean estimator viewed as a projection onto the diagonal of a product manifold. Given a set

x_{1}, \dots, x_{n} \in M

, the tuple

(x_{1}, \dots, x_{n})

(blue dot) belongs to the product manifold

M \times \dots \times M

. The mean estimator

\hat{μ}

can be identified with the projection of

(x_{1}, \dots, x_{n})

onto the diagonal N (red dot). (right) Diffusion mean estimator in

R^{2}

using Brownian bridges conditioned on the diagonal. Here a Brownian bridge

X_{t} = (X_{1, t}, \dots, X_{4, t})

in

R^{8}

is conditioned on hitting the diagonal

N \subseteq R^{8}

at time

T > 0

. The components

X_{j}

each being two-dimensional processes are shown in the plot.

Figure 2. The mean estimator viewed as a projection onto the diagonal of a product manifold. Conditioning on the closest point in the diagonal yields a density on the diagonal depending on the time to arrival

T > 0

. As T tends to zero the density convergence to the Dirac-delta distribution (grey), whereas as T increases the variance of the distribution increases (rouge).

Figure 2. The mean estimator viewed as a projection onto the diagonal of a product manifold. Conditioning on the closest point in the diagonal yields a density on the diagonal depending on the time to arrival

T > 0

. As T tends to zero the density convergence to the Dirac-delta distribution (grey), whereas as T increases the variance of the distribution increases (rouge).

Figure 3. 3 points on

S^{2}

together with a sample mean (red) and the diagonal process in

{(S^{2})}^{n}

,

n = 3

with

T = 0.2

conditioned on the diagonal.

Figure 3. 3 points on

S^{2}

together with a sample mean (red) and the diagonal process in

{(S^{2})}^{n}

,

n = 3

with

T = 0.2

conditioned on the diagonal.

Figure 4. (left) 256 sampled data points on

S^{2}

(north pole being population mean). (right) 32 samples of the diffusion mean conditioned on the diagonal of

{(S^{2})}^{n}

,

n = 256

,

T = 0.2

. As can be seen, the variation in the mean samples is limited.

Figure 4. (left) 256 sampled data points on

S^{2}

(north pole being population mean). (right) 32 samples of the diffusion mean conditioned on the diagonal of

{(S^{2})}^{n}

,

n = 256

,

T = 0.2

. As can be seen, the variation in the mean samples is limited.

Figure 5. (left) One configuration of 17 landmarks overlayed the MR image from which the configuration was annotated. (right) All 14 landmark configurations plotted together (one color for each configuration of 17 landmarks).

Figure 6. Samples from the diagonal process with

T = 0.2

(left) and

T = 1

(right). The effect of varying the Brownian motion end time T is clearly visible.

Figure 6. Samples from the diagonal process with

T = 0.2

(left) and

T = 1

(right). The effect of varying the Brownian motion end time T is clearly visible.

Figure 7. One sampled diffusion mean with the sampling scheme (blue configuration) together with estimated Fréchet mean (green configuration). The forward sampling scheme is significantly faster than the iterative optimization needed for the Fréchet mean on the landmark manifold where closed-form solutions of the geodesic equations are not available.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jensen, M.H.; Sommer, S. Mean Estimation on the Diagonal of Product Manifolds. Algorithms 2022, 15, 92. https://doi.org/10.3390/a15030092

AMA Style

Jensen MH, Sommer S. Mean Estimation on the Diagonal of Product Manifolds. Algorithms. 2022; 15(3):92. https://doi.org/10.3390/a15030092

Chicago/Turabian Style

Jensen, Mathias Højgaard, and Stefan Sommer. 2022. "Mean Estimation on the Diagonal of Product Manifolds" Algorithms 15, no. 3: 92. https://doi.org/10.3390/a15030092

APA Style

Jensen, M. H., & Sommer, S. (2022). Mean Estimation on the Diagonal of Product Manifolds. Algorithms, 15(3), 92. https://doi.org/10.3390/a15030092

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Mean Estimation on the Diagonal of Product Manifolds

Abstract

1. Introduction

Contribution

2. Background

2.1. Riemannian Geometry

2.2. Weighted Fréchet Mean

2.3. Weighted Diffusion Mean

2.4. Diffusion Bridges

Euclidean Diffusion Bridges

2.5. Manifold Diffusion Processes

2.6. Manifold Bridges

3. Diffusion Mean Estimation

3.1. Fermi Bridges to the Diagonal

3.2. Simulation in Coordinates

3.3. Accounting for $φ$

4. Experiments

4.1. Mean Estimation on $S^{2}$

4.2. LDDMM Landmarks

5. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Mean Estimation on the Diagonal of Product Manifolds

Abstract

1. Introduction

Contribution

2. Background

2.1. Riemannian Geometry

2.2. Weighted Fréchet Mean

2.3. Weighted Diffusion Mean

2.4. Diffusion Bridges

Euclidean Diffusion Bridges

2.5. Manifold Diffusion Processes

2.6. Manifold Bridges

3. Diffusion Mean Estimation

3.1. Fermi Bridges to the Diagonal

3.2. Simulation in Coordinates

3.3. Accounting for φ

4. Experiments

4.1. Mean Estimation on S 2

4.2. LDDMM Landmarks

5. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.3. Accounting for $φ$

4.1. Mean Estimation on $S^{2}$