
Geometric Characteristics of the Wasserstein Metric on SPD(n) and Its Applications on Data Processing

1 School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
2 Department of Computing, Imperial College London, London SW7 2AZ, UK
3 Department of Mathematics, Imperial College London, London SW7 2AZ, UK
* Author to whom correspondence should be addressed.
Academic Editor: Carlos M. Travieso-González
Entropy 2021, 23(9), 1214; https://doi.org/10.3390/e23091214
Received: 5 August 2021 / Revised: 9 September 2021 / Accepted: 10 September 2021 / Published: 14 September 2021

Abstract

The Wasserstein distance, especially between symmetric positive-definite matrices, has broad and deep influences on the development of artificial intelligence (AI) and other branches of computer science. In this paper, by involving the Wasserstein metric on SPD(n), we obtain computationally feasible expressions for some geometric quantities, including geodesics, exponential maps, the Riemannian connection, Jacobi fields and curvatures, particularly the scalar curvature. Furthermore, we discuss the behavior of geodesics and prove that the manifold is globally geodesically convex. Finally, we design algorithms for point cloud denoising and edge detection in polluted images based on the Wasserstein curvature on SPD(n). The experimental results show the efficiency and robustness of our curvature-based methods.
Keywords: symmetric positive-definite matrix; Wasserstein metric; curvature; point cloud denoising; image edge detecting

1. Introduction

Symmetric positive-definite matrices are widely used in many fields of information science, such as stability analysis in signal processing, linear stationary systems, optimal control strategies and imaging analysis [1,2,3]; their importance can hardly be overstated [4,5]. Instead of considering a single matrix, contemporary scientists tend to study the global structure of the set consisting of all n × n symmetric positive-definite matrices, known as SPD(n). SPD(n) can be endowed with various structures. The most traditional one, the Euclidean metric, is induced as a submanifold metric from the Euclidean inner product on the space of matrices. X. Pennec, P. Fillard et al. [6] defined the affine-invariant Riemannian metric. V. Arsigny, P. Fillard et al. [7] exhibited a Lie group [8] structure on SPD(n), which admits a bi-invariant metric called the Log-Euclidean metric.
Recently, by constructing a principal bundle, Y. Li, M. Wong et al. [9] and S. Zhang et al. [10] defined a new Riemannian metric on SPD(n) whose geodesic distance coincides with the Wasserstein-2 [11,12] distance: the so-called Wasserstein metric. The distance, rather than the metric itself, has been widely used in artificial intelligence [13]. On the geometric side, encouragingly, T. Asuka [14] and E. Massart, P.-A. Absil [15] gave a series of theoretical expressions for geometric quantities. However, these expressions are too general or complicated to be applied directly. In this paper, we derive more computationally feasible expressions in a concrete case. Moreover, we give the Jacobi field and the scalar curvature.
With the blooming of data science, point cloud processing, especially denoising, plays an increasingly important role in data-related research and engineering. There is an immense literature on point cloud denoising, and widely used algorithms are packaged as built-in functions of software libraries such as PCL [16]. These methods share a common drawback when high-density noise is added to point clouds. Utilizing the geometric structure of SPD(n), we design a novel algorithm to overcome this drawback. Compared to traditional methods, our algorithm is more accurate and less dependent on manually chosen parameters.
In addition, we apply our theory to image edge detection, a classical problem in image processing, and design a new detection algorithm. Different from traditional gradient-based filters, such as Sobel, Prewitt and Laplacian, we present the connection between the Wasserstein sectional curvature and edges. Experiments show the feasibility of our algorithm.
The paper is organized as follows. In Section 2, we introduce some basic knowledge of the Riemannian manifold (SPD(n), g_W) and consider its symmetry as well. In Section 3, we describe the Wasserstein geometry of SPD(n), including the geodesics, exponential map, connection and curvature. In particular, we prove geodesic convexity and the nonexistence of cut loci and conjugate pairs. In Section 4, we design an adaptive algorithm to denoise point clouds. In Section 5, we develop a curvature-based method to detect image edges. Proofs and detailed numerical results are presented in the Appendices.

2. Preliminary

2.1. Notation

In this paper, we adopt conventional notation from algebra and geometry. Riemannian manifolds are denoted as pairs (manifold, metric); for example, our main object of interest is (SPD(n), g_W), meaning SPD(n) endowed with the Wasserstein metric. R^n is the n-dimensional Euclidean space. M(n) represents the set of n × n matrices, Sym(n) the set of n × n symmetric matrices, and O(n) the set of n × n orthogonal matrices. T_A M is, as usual, the tangent space of M at a point A.
Λ always represents a diagonal n × n matrix. For an n × n matrix A, λ(A) or λ_i(A) means an eigenvalue or the i-th eigenvalue of A, respectively. A matrix A with entries A_ij will be written as [A_ij]. The identity matrix is denoted by I. In this paper, we conventionally write points on manifolds as A, B and vector fields as X, Y.
The Sylvester equation is one of the most classical matrix equations. The following special case of the Sylvester equation plays a key role in understanding the geometry of (SPD(n), g_W):
A K + K A = X, A ∈ SPD(n), X ∈ Sym(n).
We denote the solution K of (1) by Γ_A[X]. Then, the matrix Γ_A[X] ∈ M(n) satisfies
A Γ_A[X] + Γ_A[X] A = X.
From geometric aspects, we can ensure the existence and uniqueness of the solution in the case involved in this paper. Some properties of Γ A [ X ] can be found in Appendix A. More details about the Sylvester equation are presented in [17].
We recall an algorithm for solving this kind of Sylvester equation, which offers an explicit expression for the solution depending only on the eigenvalue decomposition. More details can be found in [10].
Algorithm 1 will be used frequently in what follows, and it helps us to comprehend the geometry of SPD(n). Note that this algorithm can also be used for general X ∈ M(n). In particular, when X is symmetric (skew-symmetric), Γ_A[X] is also symmetric (skew-symmetric). Moreover, the algorithm simplifies when A is diagonal.
Algorithm 1 Solution of the Sylvester Equation.
Input: A ∈ SPD(n), X ∈ Sym(n)
Output: Γ_A[X]
1: Eigenvalue decomposition A = Q Λ Q^T, where Q ∈ O(n) and Λ := diag(λ_1, …, λ_n) contains the eigenvalues of A;
2: C_X := [c_ij] = Q^T X Q;
3: E_X := [e_ij], e_ij = c_ij / (λ_i + λ_j);
4: Γ_A[X] = Q E_X Q^T.
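For concreteness, Algorithm 1 can be sketched in a few lines of NumPy; the function name `solve_sylvester_spd` is ours, not the paper's.

```python
import numpy as np

def solve_sylvester_spd(A, X):
    """Algorithm 1: solve A K + K A = X for K = Gamma_A[X], with A in SPD(n).

    Works for general X in M(n); when X is symmetric (skew-symmetric),
    the returned K is symmetric (skew-symmetric) as well.
    """
    lam, Q = np.linalg.eigh(A)                 # step 1: A = Q diag(lam) Q^T
    C = Q.T @ X @ Q                            # step 2: C_X = Q^T X Q
    E = C / (lam[:, None] + lam[None, :])      # step 3: e_ij = c_ij / (lam_i + lam_j)
    return Q @ E @ Q.T                         # step 4: Gamma_A[X] = Q E_X Q^T
```

Since A is positive-definite, λ_i + λ_j > 0, so step 3 never divides by zero; this is the existence and uniqueness mentioned above.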

2.2. Wasserstein Metric

In this part, we introduce the Wasserstein metric on S P D ( n ) .
Definition 1.
For any A ∈ SPD(n) and X, Y ∈ T_A SPD(n), define
g_W|_A(X, Y) = tr(Γ_A[X] A Γ_A[Y]) = (1/2) tr(X Γ_A[Y]).
In the second equality, we use the facts that Γ_A[X], Γ_A[Y], A are all symmetric and that A Γ_A[X] + Γ_A[X] A = X. One can check that g_W is a symmetric, non-degenerate bilinear tensor field on SPD(n) [18]. We call g_W the Wasserstein metric.
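The two expressions in Definition 1 can be compared numerically; the helper `gamma` below re-implements Algorithm 1 (all names are ours).

```python
import numpy as np

def gamma(A, X):
    """Gamma_A[X]: solution of A K + K A = X (Algorithm 1)."""
    lam, Q = np.linalg.eigh(A)
    C = Q.T @ X @ Q
    return Q @ (C / (lam[:, None] + lam[None, :])) @ Q.T

def g_W(A, X, Y):
    """Wasserstein metric g_W|_A(X, Y) = (1/2) tr(X Gamma_A[Y])."""
    return 0.5 * np.trace(X @ gamma(A, Y))
```

A quick check confirms that the two forms agree and that g_W is symmetric and positive on nonzero tangent vectors.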
Denote by g_E(X̃, Ỹ) := tr(X̃^T Ỹ), for Ã ∈ GL(n) and X̃, Ỹ ∈ T_Ã GL(n), the Euclidean metric on GL(n). Then, we have the following conclusions.
Proposition 1.
The projection
σ : (GL(n), g_E) → (SPD(n), g_W), Ã ↦ Ã^T Ã
is a Riemannian submersion [19], which means that dσ is surjective and, for horizontal X̃, Ỹ,
g_E(X̃, Ỹ) = g_W(dσ(X̃), dσ(Ỹ)) = g_W(X, Y).
Remark 1.
The general linear group with the Euclidean metric, (GL(n), g_E), together with the projection σ, forms a trivial principal bundle over SPD(n) with the orthogonal group O(n) as the structure group. This bundle structure establishes two facts [10]: SPD(n) ≅ GL(n)/O(n), and g_E remains invariant under the group action of O(n).
Before giving more conclusions, we review some concepts. For any Ã ∈ GL(n), we say that a tangent vector Ṽ ∈ T_Ã GL(n) is vertical if dσ(Ṽ) = 0, and that W̃ ∈ T_Ã GL(n) is horizontal if g_E|_Ã(Ṽ, W̃) = 0 for all vertical vectors Ṽ. In addition, if dσ(X̃) = X ∈ T_A SPD(n), we call X̃ a lift of X, where A = σ(Ã). Using the notation Γ_A[X], we can find the horizontal lift of X ∈ T_A SPD(n).
Proposition 2.
For any Ã ∈ (GL(n), g_E) with A = σ(Ã), and any X ∈ T_A SPD(n), there is a unique horizontal lift X̃ of X in T_Ã GL(n); that is,
X̃ = Ã Γ_A[X].
Using Proposition 2, we can obtain the representations of horizontal and vertical vectors.
Proposition 3.
For any Ã ∈ (GL(n), g_E), T_Ã GL(n) has the following orthogonal decomposition
T_Ã GL(n) = H(Ã) ⊕ V(Ã),
where H(Ã) consists of all horizontal vectors, V(Ã) consists of all vertical vectors, and
H(Ã) = {Ã K | K^T = K}, V(Ã) = {S Ã | S^T = −S}.
Proofs of Propositions 2 and 3 can be found in [10].

2.3. Symmetry

Now we study the symmetry of (SPD(n), g_W). Consider the orthogonal group action Ψ : O(n) × SPD(n) → SPD(n) defined by
Ψ_O(A) = O A O^T, O ∈ O(n), A ∈ SPD(n).
One can check that Ψ is a group action of O(n) and that every Ψ_O is an isometry, which means that O(n) is isomorphic to a subgroup of the isometry group ISO(SPD(n), g_W) of SPD(n). Precisely, we have
{Ψ_O}_{O ∈ O(n)} ≤ ISO(SPD(n), g_W).
According to (10), when we study local geometric characteristics, we only need to consider sorted diagonal matrices as representative elements under the orthogonal action, rather than all general points of SPD(n). Therefore, some pointwise quantities, such as the scalar curvature and the bounds of the sectional curvatures, depend only on the eigenvalues.
At the end of this part, we give the symmetry degree of (SPD(n), g_W), defined as the dimension of ISO(SPD(n), g_W).
Proposition 4.
(SPD(n), g_W) has its symmetry degree controlled by
(1/2)(n − 1)n ≤ dim(ISO(SPD(n), g_W)) ≤ (1/8)(n − 1)n(n + 1)(n + 2) + 1.
Proof. 
The famous interval theorem [20] about isometry groups shows that, for any m-dimensional Riemannian manifold with m ≠ 4, there is no isometry group of dimension strictly between m(m − 1)/2 + 1 and m(m + 1)/2.
On the one hand, for an m-dimensional Riemannian manifold, the dimension of the isometry group achieves its maximum if and only if the manifold has constant sectional curvature. However, we will show later that (SPD(n), g_W) does not have constant sectional curvature, which means its symmetry degree is less than the maximum.
On the other hand, Equation (10) shows that the dimension of the Wasserstein isometry group is at least the dimension of O(n). Therefore, by dim(SPD(n)) = (n² + n)/2 and dim(O(n)) = (n² − n)/2, we obtain the desired result.    □

3. Wasserstein Geometry of S P D ( n )

3.1. Geodesic

In this part, we give the expression of the geodesics on (SPD(n), g_W), including the geodesic joining two points and the geodesic with given initial values. Then, we will show that the whole Riemannian manifold (SPD(n), g_W) is geodesically convex, which means that we can always find a minimal geodesic joining any two points. To some extent, geodesic convexity may make up for the incompleteness of (SPD(n), g_W).
To prove the geodesic convexity of ( S P D ( n ) , g W ) , we need the following theorem.
Theorem 1.
For any A₁, A₂ ∈ SPD(n), let Ã₁ = A₁^{1/2} be the fixed lift of A₁; then there exists a lift of A₂,
Ã₂ = A₁^{−1/2} (A₁ A₂)^{1/2},
such that the line segment γ̃(t) = t Ã₂ + (1 − t) Ã₁, t ∈ [0, 1], is horizontal and non-degenerate.
Proof of Theorem 1 can be found in Appendix B. This theorem brings some geometrical and physical facts.
Corollary 1.
(geodesic convexity) (SPD(n), g_W) is a geodesically convex Riemannian manifold: between any two points A₁, A₂ ∈ SPD(n), there exists a minimal Wasserstein geodesic
γ(t) = (1 − t)² A₁ + t(1 − t)[(A₁ A₂)^{1/2} + (A₂ A₁)^{1/2}] + t² A₂,
which lies strictly in SPD(n).
A similar expression for the geodesic can also be found in several papers [14,15].
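The geodesic of Corollary 1 is easy to evaluate with NumPy alone: the identity (A₁A₂)^{1/2} = A₁^{1/2}(A₁^{1/2} A₂ A₁^{1/2})^{1/2} A₁^{−1/2} avoids a general matrix square root, and (A₂A₁)^{1/2} is its transpose. A sketch (names are ours):

```python
import numpy as np

def sqrtm_spd(S):
    """Principal square root of a symmetric positive (semi-)definite matrix."""
    lam, Q = np.linalg.eigh(S)
    return (Q * np.sqrt(np.clip(lam, 0, None))) @ Q.T

def geodesic(A1, A2, t):
    """Minimal Wasserstein geodesic joining A1, A2 in SPD(n) (Corollary 1)."""
    R = sqrtm_spd(A1)
    # (A1 A2)^{1/2} = R (R A2 R)^{1/2} R^{-1}; (A2 A1)^{1/2} is its transpose
    M = R @ sqrtm_spd(R @ A2 @ R) @ np.linalg.inv(R)
    return (1 - t) ** 2 * A1 + t * (1 - t) * (M + M.T) + t ** 2 * A2
```

Since the curve is a constant-speed minimal geodesic, the Bures–Wasserstein distance from A₁ to γ(t) grows linearly in t, which gives a convenient numerical check.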
Theorem 2.
For any two points in (SPD(n), g_W), there exists a unique minimal geodesic joining them. From the geometric aspect, there is no cut locus on any geodesic.
Proof. 
We proved the existence of a minimal geodesic joining any two points in Corollary 1. Assume that there exist two minimal geodesics joining A, B ∈ SPD(n). Fix Ã = A^{1/2} as the horizontal lift of A and lift the two geodesics horizontally; this yields two horizontal lifts of B. Denote these two lifts by B̃₁ = B^{1/2} Q₁ and B̃₂ = B^{1/2} Q₂ with Q₁, Q₂ ∈ O(n). Then, Q₁ and Q₂ are both solutions of the following optimization problem:
arg min_{Q ∈ O(n)} d_F(A^{1/2}, B^{1/2} Q).
By the compactness of O(n), this problem has a unique solution; hence Q₁ = Q₂, which proves the uniqueness of the minimal geodesic.    □
Remark 2.
Due to the nonexistence of cut locus, there exists no conjugate pair on ( S P D ( n ) , g W ) .

3.2. Exponential Map

Following Lemma A1, we can directly write down the Wasserstein logarithm on SPD(n): for any A₁ ∈ SPD(n), the map log_{A₁} : SPD(n) → T_{A₁} SPD(n) is given by
log_{A₁} A₂ = dσ|_{A₁^{1/2}}(γ̃′(0)) = (A₁ A₂)^{1/2} + (A₂ A₁)^{1/2} − 2 A₁.
By solving the inverse problem of the above equation, we obtain the expression of the Wasserstein exponential.
Theorem 3.
In a small open ball B(0, ε), ε > 0, in T_A SPD(n) ≅ R^{n(n+1)/2}, the Wasserstein exponential at A, exp_A : B(0, ε) → SPD(n), is explicitly expressed by
exp_A X = A + X + Γ_A[X] A Γ_A[X].
Proof. 
By choosing normal coordinates [21] at A, there always exists a neighborhood on which exp_A is well-defined. By (15), when exp_A X is well-defined, it satisfies
(A exp_A X)^{1/2} + (exp_A X A)^{1/2} = X + 2A.
This equation can be converted into a Sylvester equation:
A [A^{−1}(A exp_A X)^{1/2}] + [(exp_A X A)^{1/2} A^{−1}] A = X + 2A,
A [A^{−1}(A exp_A X)^{1/2}] + [A^{−1}(A exp_A X)^{1/2}] A = X + 2A,
A Γ_A[X + 2A] = (A exp_A X)^{1/2},
where the second line uses (exp_A X A)^{1/2} A^{−1} = A^{−1}(A exp_A X)^{1/2}. Therefore, we have
exp_A X = Γ_A[X + 2A] A Γ_A[X + 2A] = A + X + Γ_A[X] A Γ_A[X],
where the last equality uses Γ_A[X + 2A] = Γ_A[X] + I. This finishes the proof.    □
Remark 3.
We call the first two terms of the right-hand side, A + X, the Euclidean exponential, and the last term, Γ_A[X] A Γ_A[X], the Wasserstein correction for this curved manifold.
Corollary 2.
The geodesic equation with initial conditions γ(0), γ̇(0) on (SPD(n), g_W) has the following explicit solution:
γ(t) = γ(0) + t γ̇(0) + t² Γ_{γ(0)}[γ̇(0)] γ(0) Γ_{γ(0)}[γ̇(0)], t ∈ (−ε, ε).
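The exponential of Theorem 3 and the logarithm above are a few lines each, and they should invert each other for small X. A sketch, with helper names of our choosing:

```python
import numpy as np

def gamma(A, X):
    """Gamma_A[X] via Algorithm 1."""
    lam, Q = np.linalg.eigh(A)
    return Q @ ((Q.T @ X @ Q) / (lam[:, None] + lam[None, :])) @ Q.T

def sqrtm_spd(S):
    lam, Q = np.linalg.eigh(S)
    return (Q * np.sqrt(np.clip(lam, 0, None))) @ Q.T

def w_exp(A, X):
    """exp_A X = A + X + Gamma_A[X] A Gamma_A[X]  (Theorem 3)."""
    K = gamma(A, X)
    return A + X + K @ A @ K

def w_log(A1, A2):
    """log_{A1} A2 = (A1 A2)^{1/2} + (A2 A1)^{1/2} - 2 A1."""
    R = sqrtm_spd(A1)
    M = R @ sqrtm_spd(R @ A2 @ R) @ np.linalg.inv(R)   # (A1 A2)^{1/2}
    return M + M.T - 2.0 * A1
```

For X small enough that Γ_A[X] + I remains positive-definite, w_log(A, w_exp(A, X)) recovers X exactly in exact arithmetic.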
Using the exponential map, one can directly construct Jacobi fields via geodesic variations.
Theorem 4.
Along a geodesic γ(t) with γ(0) = A ∈ SPD(n) and γ̇(0) = X ∈ T_A SPD(n), there exists a unique normal Jacobi field J(t) with initial conditions J(0) = 0, ∇_{γ̇(0)} J(0) = Y ∈ T_A SPD(n), where g_W|_A(X, Y) = 0. We have
J(t) = t Y + t² (Γ_A[X] A Γ_A[Y] + Γ_A[Y] A Γ_A[X]).
As in [18], J(t) is constructed by
J(t) := ∂/∂s |_{s=0} exp_A(t(X + sY));
substituting (16) into (22), Theorem 4 follows by direct computation.
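The Jacobi field formula of Theorem 4 can be checked against a finite-difference version of the variation (22). Since exp_A is quadratic in its argument, the central difference is exact up to roundoff. The helpers below are ours:

```python
import numpy as np

def gamma(A, X):
    lam, Q = np.linalg.eigh(A)
    return Q @ ((Q.T @ X @ Q) / (lam[:, None] + lam[None, :])) @ Q.T

def w_exp(A, X):
    K = gamma(A, X)
    return A + X + K @ A @ K

def jacobi(A, X, Y, t):
    """J(t) = t Y + t^2 (Gamma_A[X] A Gamma_A[Y] + Gamma_A[Y] A Gamma_A[X])."""
    KX, KY = gamma(A, X), gamma(A, Y)
    return t * Y + t * t * (KX @ A @ KY + KY @ A @ KX)
```

The differentiation identity holds for any pair X, Y; the orthogonality g_W(X, Y) = 0 in Theorem 4 only singles out the normal Jacobi field.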
A natural subsequent question is the maximal length to which a geodesic can be extended; equivalently, what is the largest domain of the exponential map. We still focus on diagonal matrices.
Theorem 5.
For any A ∈ SPD(n) and X ∈ T_A SPD(n), exp_A(tX) : [0, ε) → SPD(n) is well-defined if and only if ε ≤ ε_max, where
ε_max = −1/λ_min if λ_min < 0, and ε_max = +∞ if λ_min ≥ 0,
with λ_min the minimal eigenvalue of Γ_A[X].
Proof. 
Evidently, ε_max = min{s > 0 | det(exp_A(sX)) = 0}. By (19), we have
det(exp_A(sX)) = det(A) det²(I + s Γ_A[X]) = 0 ⟺ s = −1/λ(Γ_A[X]),
where λ(Γ_A[X]) denotes an eigenvalue of Γ_A[X]. Thus, ε_max = min{−1/λ(Γ_A[X]) > 0}.    □
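Theorem 5 is easy to verify numerically: along γ(t) = exp_A(tX) the determinant equals det(A) det²(I + tΓ_A[X]), which vanishes exactly at ε_max. A sketch (names ours):

```python
import numpy as np

def gamma(A, X):
    lam, Q = np.linalg.eigh(A)
    return Q @ ((Q.T @ X @ Q) / (lam[:, None] + lam[None, :])) @ Q.T

def eps_max(A, X):
    """Maximal parameter for which exp_A(tX) stays inside SPD(n) (Theorem 5)."""
    lam_min = np.linalg.eigvalsh(gamma(A, X)).min()
    return np.inf if lam_min >= 0 else -1.0 / lam_min
```

For example, X = −A gives Γ_A[X] = −I/2, hence ε_max = 2, and indeed γ(2) = A − 2A + A = 0 is degenerate.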
Corollary 3.
The Wasserstein metric g_W on SPD(n) is incomplete.
Corollary 3 follows directly from the Hopf–Rinow theorem [22]. Theorem 5 and the next theorem help us to comprehend the size of (SPD(n), g_W) pointwise.
Figure 1 shows geodesics starting from different origins on S P D ( 2 ) . From this group of pictures, we can observe the outline of the manifold and some behaviors of geodesics.
Using ε max , we can obtain the injectivity radius r ( A ) , A S P D ( n ) . Geometrically speaking, r ( A ) is the maximal radius of the ball in which exp A is well-defined.
Theorem 6.
The Wasserstein radius r : SPD(n) → (0, +∞) is given by
r(A) = √(λ_min(A)),
and the function r(A) is continuous.
Proof of Theorem 6 can be found in Appendix C. Due to the geodesic convexity, the radius actually gives the Wasserstein distance from a point of SPD(n) to the 'boundary' of the manifold. It also measures the degree of degeneracy of a symmetric positive-definite matrix through λ_min.
Figure 2 shows three maximal geodesic balls with different centers on SPD(2). From the viewpoint of R³, the three balls have different sizes in the sense of the Euclidean distance, but on (SPD(2), g_W), all of them have the same radius.

3.3. Connection

In this section, we study the Riemannian connection of (SPD(n), g_W), called the Wasserstein connection. The flatness of (GL(n), g_E) and the structure of the Riemannian submersion will bring considerable convenience to our work. During computation, we denote the tensor actions of both g_W on SPD(n) and g_E on GL(n) by ⟨·,·⟩. We denote the Euclidean connection by D and the Wasserstein connection by ∇.
The main idea to express the Wasserstein connection is to compute the horizontal term of the Euclidean covariant derivative of lifted vector fields. We shall prove:
Lemma 1.
The Euclidean connection is a lift of the Wasserstein connection: for any smooth vector fields X and Y on SPD(n) with horizontal lifts X̃ and Ỹ, respectively, the following equation holds
dσ|_Ã(D_X̃ Ỹ) = ∇_X Y, Ã ∈ GL(n).
Proof of Lemma 1 can be found in Appendix D. This lemma holds for general Riemannian submersions. The reason we reprove it for our case is that we will need some intermediate results of the proof later. Using Lemma 1, we can derive a direct corollary, which is one of the essential results of this paper.
Corollary 4.
The Wasserstein connection has an explicit expression:
∇_X Y = dY(X) − Γ_A[X] A Γ_A[Y] − Γ_A[Y] A Γ_A[X],
where dY(X) is the Euclidean directional derivative.
Proof. 
From Lemma 1 and (A14) (in Appendix D), we have
∇_X Y = dσ|_Ã(D_X̃ Ỹ) = (D_X̃ Ỹ)^T Ã + Ã^T (D_X̃ Ỹ)
= dY(X) − (X Γ_A[Y] + Γ_A[Y] X) + A Γ_A[X] Γ_A[Y] + Γ_A[Y] Γ_A[X] A
= dY(X) − Γ_A[X] A Γ_A[Y] − Γ_A[Y] A Γ_A[X].
The linearity, Leibniz rule and symmetry of the Wasserstein connection are easily checked from this expression.    □
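The expression in Corollary 4 can be tested for metric compatibility: for constant vector fields Y, Z (so dY(X) = dZ(X) = 0), the directional derivative of A ↦ g_W|_A(Y, Z) along X must equal ⟨∇_X Y, Z⟩ + ⟨Y, ∇_X Z⟩. A finite-difference sketch (names ours):

```python
import numpy as np

def gamma(A, X):
    lam, Q = np.linalg.eigh(A)
    return Q @ ((Q.T @ X @ Q) / (lam[:, None] + lam[None, :])) @ Q.T

def g_W(A, X, Y):
    return 0.5 * np.trace(X @ gamma(A, Y))

def nabla(A, X, Y, dYX=None):
    """Wasserstein connection of Corollary 4; dYX = dY(X), zero for a constant field."""
    if dYX is None:
        dYX = np.zeros_like(A)
    KX, KY = gamma(A, X), gamma(A, Y)
    return dYX - KX @ A @ KY - KY @ A @ KX
```

The check below differentiates g_W along the straight line A + tX, whose tangent at t = 0 is X.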
The vertical component of the lifted covariant derivative of Y along X is a vector field on GL(n) whose value at Ã is defined by
T_Ã(X, Y) := D_X̃ Ỹ − (∇_X Y)~,
where (∇_X Y)~ denotes the horizontal lift of ∇_X Y. We say T_Ã is an A-tensor. The whole vector field is denoted by T(X, Y). By the definition and the previous results, we can obtain an expression for T_Ã(X, Y).
Proposition 5.
T_Ã(·,·) is an antisymmetric bilinear map T_A SPD(n) × T_A SPD(n) → T_Ã GL(n), and it satisfies
T_Ã(X, Y) = Ã Γ_A[Π[X, Y]] A,
where Π[X, Y] := Γ_A[X] Γ_A[Y] − Γ_A[Y] Γ_A[X].
Proof. 
Using (A12) (in Appendix D), (6) and (27), we have
T_Ã(X, Y) = Ã(Γ_A[dY(X)] − Γ_A[X Γ_A[Y] + Γ_A[Y] X] + Γ_A[X] Γ_A[Y]) − Ã Γ_A[dY(X) − Γ_A[X] A Γ_A[Y] − Γ_A[Y] A Γ_A[X]]
= Ã(Γ_A[X] Γ_A[Y] − Γ_A[A Γ_A[X] Γ_A[Y]] − Γ_A[Γ_A[Y] Γ_A[X] A])
= Ã(Γ_A[X] Γ_A[Y] − A Γ_A[Γ_A[X] Γ_A[Y]] − Γ_A[Γ_A[Y] Γ_A[X]] A)
= Ã Γ_A[Γ_A[X] Γ_A[Y] − Γ_A[Y] Γ_A[X]] A
= Ã Γ_A[Π[X, Y]] A,
where (31) shows that T_Ã(X, Y) depends only on Ã and the vectors in T_A SPD(n). The multi-linearity and T_Ã(X, Y) = −T_Ã(Y, X) are easily checked.    □
Recalling (A14) in Appendix D, we also find that
[ X ˜ , Y ˜ ] = [ X , Y ] ˜ + 2 T ( X , Y ) .
In the following parts, we will show that the tensor T(X, Y) plays a significant role in computing the curvature.

3.4. Curvature

In this part, we aim to understand the curvature of (SPD(n), g_W). Although there exist relevant results giving abstract expressions for general cases, we obtain simpler expressions and derive the scalar curvature via a special basis.

3.4.1. Riemannian Curvature Tensor

First, we derive the Riemannian curvature of (SPD(n), g_W). We denote the Euclidean curvature on the bundle (which vanishes identically) by R̃, and the Wasserstein (Riemannian) curvature on (SPD(n), g_W) by R.
Theorem 7.
For any A ∈ SPD(n) and smooth vector fields X, Y on SPD(n), the Wasserstein curvature tensor R(X, Y, X, Y) := ⟨R_{XY} X, Y⟩|_A at A has the explicit expression
R(X, Y, X, Y) = 3 tr(Γ_A[X] A Γ_A[Γ_A[X] Γ_A[Y] − Γ_A[Y] Γ_A[X]] A Γ_A[Y]).
Proof of Theorem 7 can be found in Appendix E. The expression
R(X, Y, X, Y) = 3 ‖T(X, Y)‖²
has been derived before by other research groups in a similar way. However, here we calculate the curvature tensor in another way and find a more explicit expression, which is easier to use than expanding ‖T(X, Y)‖² directly. In addition, from ‖T(X, Y)‖² ≥ 0 and (A23) (in Appendix E), we obtain the following corollary.
Corollary 5.
(SPD(n), g_W) has non-negative curvature; namely,
R(X, Y, X, Y) ≥ 0.
By solving the Sylvester equation with Algorithm 1, we can simplify this expression. For the section span{X(A), Y(A)}, the sectional curvature K is
K|_A(X, Y) = R(X, Y, X, Y) / (⟨X, X⟩⟨Y, Y⟩ − ⟨X, Y⟩²) = 12 tr(E_X Λ Γ_Λ[E_X E_Y − E_Y E_X] Λ E_Y) / (tr(E_X C_X) tr(E_Y C_Y) − tr²(E_X C_Y)),
where we use the same notations as in Algorithm 1. In particular, in diagonal cases, we observe that the sectional curvature obeys the inverse ratio law
K|_{kΛ}(X, Y) = (1/k) K|_Λ(X, Y), k ∈ R∖{0}.
These results conform with our visual impression of (SPD(n), g_W) presented in Figure 1, where the manifold tends to be flat as k increases.

3.4.2. Sectional Curvature

Now, we derive more explicit expressions for the sectional curvature and the scalar curvature. Conventionally, we only need to consider diagonal cases. Before that, we introduce a basis of Sym(n), the tangent space of SPD(n). Define {S^{p,q}} by
S^{p,q} = [S^{p,q}_{ij}], S^{p,q}_{ij} = δ_{ip} δ_{jq} + δ_{iq} δ_{jp},
where the superscripts p, q mark the nonzero entries of S^{p,q} and δ is the Kronecker delta. Apparently, {S^{p,q} | 1 ≤ p ≤ q ≤ n} forms a basis of Sym(n). For simplicity, we sometimes denote S^{p,q}, S^{r,t} by S₁, S₂, respectively. In this way, we can express the curvature under this basis.
By direct calculation, we have
(S₁ S₂)_{ij} = Σ_{k=1}^n S^{p,q}_{ik} S^{r,t}_{kj} = δ_{ip} δ_{qt} δ_{jr} + δ_{ip} δ_{qr} δ_{jt} + δ_{iq} δ_{pt} δ_{jr} + δ_{iq} δ_{pr} δ_{jt}.
By Algorithm 1, we know that E_S = [S_{ij}/(λ_i + λ_j)] = Γ_Λ[S] for S ∈ T_Λ SPD(n) (Q = I in the decomposition of Λ). Note that the entries of Λ, S₁, S₂ are all non-negative; therefore, we have
E_{S₁} E_{S₂} ≠ 0 ⟺ S₁ S₂ ≠ 0 ⟺ {p, q} ∩ {r, t} ≠ ∅.
A non-vanishing curvature thus requires {p, q} ∩ {r, t} ≠ ∅, while, by the anti-symmetry of the curvature tensor, it also requires {p, q} ≠ {r, t}. Moreover, by definition, S^{p,q} = S^{q,p} and S^{r,t} = S^{t,r}. Without loss of generality, we only need to consider the following particular case:
p = r, q ≠ t.
Theorem 8.
For any diagonal matrix Λ = diag(λ₁, …, λₙ) ∈ SPD(n), where λ₁ ≤ λ₂ ≤ … ≤ λₙ, the Wasserstein sectional curvature satisfies
K|_Λ(S₁, S₂) = 3 (1 + δ_{pq})(1 + δ_{pt}) λ_q λ_t / [(λ_p + λ_q)(λ_p + λ_t)(λ_q + λ_t)],
where S₁ = S^{p,q}, S₂ = S^{r,t}, p = r, q ≠ t.
Proof of Theorem 8 can be found in Appendix F. With the above expression for the sectional curvatures, we can easily see that the sectional curvature is controlled by the second smallest eigenvalue, which implies that the curvature will seldom blow up, even on an almost degenerate domain. Only when a matrix degenerates in two or more dimensions will the curvatures become very large. This phenomenon ensures that the curvature information remains meaningful in most applications. Some examples of this phenomenon will be observed later.
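The closed form of Theorem 8 can be cross-checked against the general expression of Theorem 7 normalized by the metric, evaluated at a diagonal base point. A sketch with names of our choosing (indices are 0-based in the code):

```python
import numpy as np

def S_basis(n, p, q):
    """S^{p,q} with entries delta_ip delta_jq + delta_iq delta_jp."""
    S = np.zeros((n, n))
    S[p, q] += 1.0
    S[q, p] += 1.0
    return S

def K_closed(lam, p, q, t):
    """Sectional curvature K|_Lambda(S^{p,q}, S^{p,t}) from Theorem 8 (q != t)."""
    dpq, dpt = float(p == q), float(p == t)
    num = 3.0 * (1 + dpq) * (1 + dpt) * lam[q] * lam[t]
    den = (lam[p] + lam[q]) * (lam[p] + lam[t]) * (lam[q] + lam[t])
    return num / den

def K_general(lam, S1, S2):
    """K from Theorem 7: R / (<X,X><Y,Y> - <X,Y>^2) at Lambda = diag(lam)."""
    Lam = np.diag(lam)
    gd = lambda X: X / (lam[:, None] + lam[None, :])   # Gamma_Lambda, diagonal case
    K1, K2 = gd(S1), gd(S2)
    Pi = K1 @ K2 - K2 @ K1
    R = 3.0 * np.trace(K1 @ Lam @ gd(Pi) @ Lam @ K2)
    g11 = 0.5 * np.trace(S1 @ K1)
    g22 = 0.5 * np.trace(S2 @ K2)
    g12 = 0.5 * np.trace(S1 @ K2)
    return R / (g11 * g22 - g12 ** 2)
```

The two computations agree for random eigenvalues, and the closed form makes the inverse ratio law of Section 3.4.1 evident.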

3.4.3. Scalar Curvature

In the last part of this section, we calculate the scalar curvature directly.
Theorem 9.
For any A ∈ SPD(n), its scalar curvature ρ(A) is
ρ(A) = 3 tr(U Λ (U + U^T) + (U + U^T) Λ U + (U + U^T) Λ U Λ (U + U^T)),
where the diagonal matrix Λ = diag(λ₁, …, λₙ) is orthogonally similar to A, and U is the strictly upper-triangular matrix with entries U_{ij} = 1/(λ_i + λ_j) for i < j.
Proof of Theorem 9 can be found in Appendix G. Figure 3 presents some examples of scalar curvatures on (SPD(2), g_W), which illustrate the argument at the end of Section 3.4.2.
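Reading U as the strictly upper-triangular matrix described above, Theorem 9 is a one-liner; ρ depends only on the eigenvalues and obeys the same 1/k scaling law as the sectional curvature. A sketch (names ours):

```python
import numpy as np

def scalar_curvature(A):
    """rho(A) per Theorem 9; depends only on the eigenvalues of A."""
    lam = np.linalg.eigvalsh(A)
    n = len(lam)
    U = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            U[i, j] = 1.0 / (lam[i] + lam[j])   # strictly upper-triangular entries
    V = U + U.T
    L = np.diag(lam)
    return 3.0 * np.trace(U @ L @ V + V @ L @ U + V @ L @ U @ L @ V)
```

For n = 2, expanding the formula by hand gives ρ(diag(a, b)) = 3/(a + b), which serves as a sanity check.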

4. Point Cloud Denoising

Denoising, or outlier removal, is a fundamental step of point cloud preprocessing, since real-world data are often polluted by noise. There is an immense literature on point cloud denoising, and widely used algorithms are packaged as built-in functions of software libraries. For example, PCL [16] is a popular platform for point cloud processing, which collects four denoising schemes. However, these methods fail to give satisfactory performance when point clouds are polluted by high-density noise. To solve this problem, we consider both the statistical and geometrical structure of the data and design a new algorithm.
The idea is that, by embedding the original point cloud from Euclidean space into SPD(n), the Wasserstein scalar curvature gives essential information separating noise from true data. Therefore, our new algorithm mainly contains two steps: first, we obtain the desired embedding by fitting a Gaussian distribution locally at each point; then, we identify noise by examining the histogram of the Wasserstein scalar curvature. Due to the flatness of the noise region, it is reasonable to classify points with small curvature as noise. The threshold is set to be the first local minimum of the histogram. We call this new scheme adaptive Wasserstein curvature denoising (AWCD).
In the following, we first introduce two traditional denoising methods, radius outlier removal (ROR) and statistical outlier removal (SOR). Then, we explain the details of AWCD. Additionally, we carry out experiments on different datasets, with a comparison to the two classical methods. From the experimental results, AWCD presents better performance regardless of the data size and the density of noise. We also give a time complexity analysis of each denoising algorithm. The results show that AWCD is as efficient as the other classical methods; thus, it is applicable to many practical tasks.

4.1. Radius Outlier Removal

In radius outlier removal (ROR, see Algorithm 2), points are clustered into two categories according to their local density: points with low density tend to be recognized as noise, whereas points with high density are recognized as true data. ROR requires two parameters: a preset radius d for the local neighborhoods and α, the minimum number of points in each neighborhood.
Algorithm 2 Radius Outlier Removal.
Input: initial point cloud D₀, parameters d, α
Output: cleaned point cloud D₁
1: search the d-radius neighborhood N_i of each point P_i, where N_i = {N_ij ∈ D₀ | ‖N_ij − P_i‖ ≤ d};
2: if the number of neighbors |N_i| ≥ α, then put P_i into D₁;
3: return D₁.
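A brute-force NumPy sketch of Algorithm 2 (function name ours); the full pairwise distance matrix mirrors the worst-case O(n²) complexity discussed in this section.

```python
import numpy as np

def radius_outlier_removal(points, d, alpha):
    """Algorithm 2: keep P_i iff its d-radius neighborhood holds >= alpha points.

    points: (n, 3) array. The neighborhood N_i as defined above contains
    P_i itself, since ||P_i - P_i|| = 0 <= d.
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    counts = (dist <= d).sum(axis=1)   # number of points within radius d of P_i
    return points[counts >= alpha]
```

In practice one would replace the dense distance matrix by a spatial index (e.g., a KD-tree), but the logic of the filter is unchanged.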
As an illustration, we add uniform noise to the Stanford Bunny with 10,000 points (see Figure 4).
Then, we apply ROR to denoise the polluted point cloud. The result is shown in Figure 5. From visual observation, ROR preserves almost all true points but fails to recognize a small portion of the noise in some areas.
In fact, from a series of repeated experiments, we find that ROR is sensitive to the choice of manual parameters. A small radius makes ROR inefficient, while a large radius wrongly recognizes true points as noise. One of the disadvantages of ROR is that there exists no universal method to determine the best parameters. Further, since ROR uses a fixed-radius search whose number of neighbors is not known in advance, the time complexity can reach O(n²), where n is the number of points. Thus, in practice, it is often difficult to make a trade-off between the efficiency and the effect of ROR.

4.2. Statistical Outlier Removal

Compared to ROR, statistical outlier removal (SOR) considers more detailed local structure than density alone. SOR, shown in Algorithm 3, is one of the most popular methods for preprocessing point clouds due to its efficiency when dealing with low-density noise. However, SOR gives worse performance than ROR when the noise has high density. The main idea of SOR comes from the one-sigma law of classical statistics [23]. An outlier is believed to be far from the center of its k-nearest neighborhood; conversely, a true point should lie in a confidence area of its neighborhood. Let Φ be a d-variate Gaussian distribution with expectation μ and covariance Σ, and let P be a fixed point in R^d. Then, Φ induces a Gaussian distribution on the line {μ + t v_P | t ∈ R}, where v_P = (P − μ)/‖P − μ‖. In fact, we write the eigendecomposition of Σ as Σ = E diag(σ₁², …, σ_d²) E^T, where E = [e₁, …, e_d] is an orthogonal matrix. If we write
v_P = Σ_{i=1}^d λ_i e_i,
the projected Gaussian distribution in the direction v_P has null expectation and variance Σ_{i=1}^d λ_i² σ_i². According to the one-sigma law, we say P is in the confidence area of Φ if
‖P − μ‖² ≤ Σ_{i=1}^d λ_i² σ_i²,
which is equivalent to
(P − μ)^T Σ (P − μ) ≥ ‖P − μ‖⁴.
This inequality is a generalization of the one-sigma law to high dimensions.
Algorithm 3 Statistical Outlier Removal.
Input: initial point cloud D₀, parameter k
Output: cleaned point cloud D₁
1: search the kNN neighborhood N_i of each point P_i;
2: compute the local mean and local covariance
μ_i = (1/k) Σ_{j=1}^k N_ij, Σ_i = (1/(k − 1)) Σ_{j=1}^k (N_ij − μ_i)^T (N_ij − μ_i);
3: if (P_i − μ_i)^T Σ_i (P_i − μ_i) ≥ ‖P_i − μ_i‖⁴, then put P_i into D₁;
4: return D₁.
Thus, SOR consists of three steps: first, we search the k-nearest neighbors (kNN) of every point; then, we compute the empirical mean and covariance, under a Gaussian assumption, for each neighborhood; finally, true points are identified using (46). SOR requires a single parameter k for the kNN search. Again, as an illustration, we use the data in Figure 4; the result of SOR is shown in Figure 6.
We use a KD-tree for the kNN search; thus, the time complexity is O(k n log n), where k is the number of neighbors and n is the number of points. The remaining steps are finished in O(n) time. Therefore, the total time complexity is O(k n log n).
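A direct implementation of Algorithm 3 (with brute-force kNN instead of a KD-tree, for brevity; names are ours). Points are stored as rows, and the keep-test is exactly inequality (46).

```python
import numpy as np

def statistical_outlier_removal(points, k):
    """Algorithm 3: keep P_i iff (P_i-mu_i)^T Sigma_i (P_i-mu_i) >= ||P_i-mu_i||^4."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    keep = []
    for i in range(n):
        nn = np.argsort(dist[i])[1:k + 1]      # k nearest neighbors, self excluded
        nb = points[nn]
        mu = nb.mean(axis=0)
        Sigma = np.cov(nb, rowvar=False)       # unbiased: divides by k - 1
        v = points[i] - mu
        if v @ Sigma @ v >= np.dot(v, v) ** 2:
            keep.append(i)
    return points[keep]
```

Note that a point coinciding with its neighborhood mean (v = 0) trivially passes the test, while a far outlier fails because ‖v‖⁴ dominates v^T Σ v.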

4.3. Adaptive Wasserstein Curvature Denoising

Note that the key step in SOR is computing the local covariance, which is a positive-definite matrix. Motivated by this, we extract the covariance matrix at each point, which amounts to embedding the original point cloud into $SPD(n)$. Intuitively, since the true data present a particular pattern, their covariance matrices should have large Wasserstein curvature, whereas the covariance matrices of noise form a flat region. Hence, AWCD rests on the principal hypothesis that the Wasserstein curvature of true data is larger than that of noise.
Under this hypothesis, we need a threshold that picks out the points with small Wasserstein curvature. To do so, we gather all the information in a histogram counting the number of points at each curvature level. By the continuity of the curvature function, true data and noise form two distinct "hills". Figure 7 shows an example of such a histogram.
The phase change happens at the borderline between the two hills, i.e., we seek the second minimal value of the histogram, the valley between them. In Figure 7, this critical value is annotated as the "marked curvature". In this way, the threshold need not be set manually; instead, it is selected adaptively. Algorithm 4 shows the processing of this adaptive denoising via Wasserstein curvature.
Algorithm 4 Adaptive Wasserstein Curvature Denoising.
Input: initial point cloud $D_0$, parameter k
Output: cleaned point cloud $D_1$
1: search the kNN $N_i$ for each point $P_i$;
2: compute the local mean and local covariance as before;
3: compute the Wasserstein curvature $\rho(\Sigma_i)$ as in (43);
4: construct the curvature histogram and determine the marked curvature $\rho_0$;
5: if $\rho(\Sigma_i) \ge \rho_0$ then put $P_i$ into $D_1$;
6: return $D_1$.
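Steps 4 and 5 — building the histogram and locating the marked curvature at the valley between the two hills — can be sketched as follows. This is our own illustrative code, not the authors' implementation; any per-point curvature estimate, e.g., (43), can be plugged in, and we assume the two hills are separated by at least one bin:

```python
import numpy as np

def marked_curvature(curvatures, bins=20):
    """Locate the valley between the two 'hills' of the curvature
    histogram and return its left edge as the threshold rho_0."""
    counts, edges = np.histogram(curvatures, bins=bins)
    peaks = np.argsort(counts)[-2:]                  # the two highest bins
    lo, hi = peaks.min(), peaks.max()
    valley = lo + 1 + np.argmin(counts[lo + 1:hi])   # lowest bin between them
    return edges[valley]

def awcd_filter(curvatures, rho0):
    """Keep the indices whose curvature reaches the marked curvature."""
    return [i for i, rho in enumerate(curvatures) if rho >= rho0]
```

Points whose curvature falls below the automatically chosen `rho0` are treated as noise and removed, so no manual threshold is needed.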
We use the same example as in Figure 4. The performance of AWCD is shown in Figure 8.
In this example, AWCD removes almost all noise far from the Stanford Bunny and retains almost all true data. The only problem is that a small portion of noise lying on the model causes false positives, and some true data located on flat regions are wrongly removed.
Since the main step of AWCD is also the kNN search, its time complexity is the same as SOR, namely O(kn log n); therefore, AWCD is applicable in practice. Notably, AWCD is effective for data with dense noise and robust to its single parameter k.

4.4. Experiments

We use ROR, SOR and AWCD to denoise data sets polluted with noise of different densities. The point clouds come from the Stanford 3D Scanning Repository, including Stanford Bunny, Duke Dragon, Armadillo, Lucy and Happy Buddha. For each data set, we add noise and record the signal-to-noise ratio (SNR). To show the influence of data size, we downsample the original data sets at different scales.
We adopt three criteria to measure the performance of the algorithms: the true positive rate (TPR), the false positive rate (FPR) and the signal-to-noise ratio growth (SNRG). TPR describes the accuracy of preserving true points of the unpolluted data set; FPR describes the fraction of noisy points that survive the denoising, so a lower FPR means more noise removed; SNRG describes the improvement of the SNR after processing. For a polluted point cloud $D_0 = D \cup N$, where D is the set of true data points and N is the set of noise points, let $D_1$ be the cleaned point cloud returned by a denoising algorithm. These measures are computed as
$$\mathrm{TPR} = \frac{|D_1\cap D|}{|D|},\quad \mathrm{FPR} = \frac{|D_1\cap N|}{|N|},\quad \mathrm{SNRG} = \frac{|D_1\cap D|}{|D_1\cap N|}\cdot\frac{|N|}{|D|} - 1,$$
where $|\cdot|$ denotes the cardinality of a finite set. Intuitively, higher TPR and SNRG and lower FPR mean better performance of an algorithm. The experimental results are shown in Table A1 in Appendix I. In each experiment, we highlight the lowest FPR, the highest TPR and each SNRG over 99%. Table A1 shows the superiority of AWCD over ROR and SOR: in general, AWCD removes almost all noise while preserving the true data, except for Armadillo.
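The three measures above can be computed directly from index sets; the following is a direct transcription of the formulas (our own helper, assuming at least one noise point survives so the SNRG ratio is defined):

```python
def denoising_metrics(D1, D, N):
    """TPR, FPR and SNRG for a cleaned cloud D1, true set D, noise set N.
    All arguments are sets of point indices."""
    tpr = len(D1 & D) / len(D)
    fpr = len(D1 & N) / len(N)   # fraction of noise that survives denoising
    snrg = (len(D1 & D) / len(D1 & N)) * (len(N) / len(D)) - 1
    return tpr, fpr, snrg
```

For instance, a perfect TPR of 100% with FPR 18.1% yields SNRG = 1/0.181 − 1 ≈ 452%, matching the first ROR row of Table A1.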

5. Edge Detection

In this part, we apply the Wasserstein curvature to detect the edges of noisy images. The idea is that edge regions carry more local information, while relatively flat regions look locally like white noise; the Wasserstein curvature is therefore naturally suited to capturing this local information. This leads to the Wasserstein sectional curvature edge detection (WSCED) of Algorithm 5.
Algorithm 5 Wasserstein sectional curvature edge detection.
Input: initial grayscale image $F_0$ with $n \times m$ pixels, parameter k
Output: edge figure $F_e$
1: search the kNN $N_{ij}$ for each pixel $P_{ij}$;
2: compute every local covariance $\Sigma_{ij}$ to obtain the covariance image $CI$, an $(n-2k) \times (m-2k)$ array of matrices $\Sigma_{ij}$;
3: determine the section $\sigma_{ij} := X_{ij} \wedge Y_{ij}$ at every point $\Sigma_{ij}$ of $CI$ by computing the tangent vectors $X_{ij} = \Sigma_{(i+1)j} - \Sigma_{(i-1)j}$, $Y_{ij} = \Sigma_{i(j+1)} - \Sigma_{i(j-1)}$;
4: compute the Wasserstein sectional curvature $K_w|_{\Sigma_{ij}}(\sigma_{ij})$ with (36) at every point to obtain the curvature image $F_e$, an $(n-2k) \times (m-2k)$ real matrix;
5: return $F_e$.
Similar to what we did in the last section, the first step of WSCED is to compute the local mean and covariance after the kNN search, which can be regarded as a two-dimensional embedding of an image into $SPD(n)$: every pixel coordinate $(i,j)$ determines a local covariance matrix $\Sigma_{ij}$. In the second step, we compute the sectional curvature at every $\Sigma_{ij}$. The chosen section $X_{ij} \wedge Y_{ij}$ is spanned by two difference vectors along the x-axis and the y-axis,
$$X_{ij} := \Sigma_{(i+1)j} - \Sigma_{(i-1)j}, \quad Y_{ij} := \Sigma_{i(j+1)} - \Sigma_{i(j-1)}.$$
According to (36), we obtain the curvature $KI_{ij} = K_w|_{\Sigma_{ij}}(X_{ij}, Y_{ij})$ at $\Sigma_{ij}$, and hence a curvature image $KI$. Finally, with an appropriate image transformation, we can detect edges on $KI$.
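The per-section curvature evaluation at the heart of WSCED can be sketched as follows. Here `gamma` solves the Sylvester equation $A\Gamma + \Gamma A = X$ in the eigenbasis of A, and the sectional curvature combines the curvature expression of Theorem 7 with the $g_W$ inner product. This is our own minimal sketch under our reading of (36), not the authors' code:

```python
import numpy as np

def gamma(A, X):
    """Solve the Sylvester equation A*G + G*A = X for symmetric A."""
    lam, Q = np.linalg.eigh(A)
    Xp = Q.T @ X @ Q
    G = Xp / (lam[:, None] + lam[None, :])
    return Q @ G @ Q.T

def wasserstein_sectional_curvature(A, X, Y):
    """K_w at A for the section spanned by symmetric tangent vectors X, Y."""
    GX, GY = gamma(A, X), gamma(A, Y)
    Pi = GX @ GY - GY @ GX                              # Pi_A[X, Y]
    R = 3 * np.trace(GX @ A @ gamma(A, Pi) @ A @ GY)    # R(X, Y, X, Y)
    ip = lambda U, V: 0.5 * np.trace(gamma(A, U) @ V)   # g_W inner product
    area2 = ip(X, X) * ip(Y, Y) - ip(X, Y) ** 2         # squared section area
    return R / area2
```

As a sanity check, for $\Lambda = \mathrm{diag}(1,2,3)$ and the basis sections $S_{0,1}$, $S_{0,2}$, this reproduces the closed form of Theorem 8: $3\cdot2\cdot3/(3\cdot4\cdot5) = 0.3$.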
In simulations, we compare WSCED with traditional edge-detection filters, including the Sobel, Prewitt and Laplacian filters [24], focusing on images with high-density noise. From Figure 9, we find that WSCED approaches the same outcome as the Sobel and Prewitt filters, which implies a potential connection between Wasserstein curvature and edges. This result also shows the robustness of WSCED to noise. We present more digital experiments in Figure A1 in Appendix H.

6. Conclusions and Future Work

In this paper, we studied the geometric characteristics of $(SPD(n), g_W)$, including geodesics, the connection, Jacobi fields and curvatures. Compared with existing results, ours are simpler in form and more suitable for computation. Based on these results, we designed novel algorithms for point cloud denoising and image edge detection, and numerical experiments showed that these geometry-based methods are valid in applications. From both a theoretical and a practical perspective, we gained a more comprehensive understanding of the Wasserstein geometry on $SPD(n)$, which shows that the Wasserstein metric has both deep application potential and mathematical elegance.
In future work, on the one hand, we aim to study the Wasserstein geometry of other matrix manifolds, such as the Stiefel manifold [25], the Grassmann manifold [26] and some complex matrix manifolds [27]. On the other hand, we would like to generalize geometry-based methods to further problems in image and signal processing [28] and data science.

Author Contributions

Conceptualization, Y.L.; methodology, Y.L.; software, Y.L., Y.C.; validation, S.Z.; investigation, H.S.; resources, H.S.; writing—original draft preparation, Y.L.; writing—review and editing, S.Z., Y.C.; visualization, Y.C.; supervision, H.S.; funding acquisition, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key Research and Development Plan of China (No. 2020YFC2006201).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All point cloud data were obtained from the Large Geometric Models Archive of Georgia Tech at https://search.crossref.org/funding (accessed on 5 April 2021). The images were obtained from open resources on Google.

Acknowledgments

The authors would like to express their sincere thanks to Chao Qian for his valuable suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Properties of $\Gamma_A[X]$

Here, we list some basic properties, which will be used frequently in the discussions of this paper.
Proposition A1.
For any $A, B \in SPD(n)$, $X, Y \in M(n)$, $Q \in O(n)$ and $k \in \mathbb{R}$, the following equations hold:
1. $\Gamma_A[X + kY] = \Gamma_A[X] + k\Gamma_A[Y]$;
2. $\Gamma_{kA}[X] = \frac{1}{k}\Gamma_A[X]$;
3. $\Gamma_{A+B}[X] = \Gamma_A[X] - \Gamma_A[B\Gamma_{A+B}[X] + \Gamma_{A+B}[X]B]$, provided $A + B \in SPD(n)$;
4. $\Gamma_A[AX] = A\Gamma_A[X]$, $\Gamma_A[XA] = \Gamma_A[X]A$;
5. $\Gamma_{A^{-1}}[X] = \Gamma_A[AXA] = A\Gamma_A[X]A$;
6. $\Gamma_{QAQ^{-1}}[QXQ^{-1}] = Q\Gamma_A[X]Q^{-1}$.
Proposition A1 determines the geometry of $(SPD(n), g_W)$ and is used repeatedly throughout the calculations below. We omit the proof, since these properties are easily checked.
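Since $\Gamma_A[X]$ is the solution of the Sylvester equation $A\Gamma_A[X] + \Gamma_A[X]A = X$, the properties above are easy to check numerically. The following is a minimal sketch of the solver (our own code), working in the eigenbasis of A:

```python
import numpy as np

def gamma(A, X):
    """Solve A*G + G*A = X for symmetric positive-definite A:
    diagonalize A, divide entrywise by (lam_i + lam_j), transform back."""
    lam, Q = np.linalg.eigh(A)
    Xp = Q.T @ X @ Q
    return Q @ (Xp / (lam[:, None] + lam[None, :])) @ Q.T
```

With it, Properties 4 and 5 reduce to `np.allclose` checks against the defining equation.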

Appendix B. Proof of Theorem 1

To prove Theorem 1, we shall give two lemmas first.
Lemma A1.
For any $\tilde{A}_1, \tilde{A}_2 \in GL(n)$, let $l(t) = t\tilde{A}_2 + (1-t)\tilde{A}_1$, $t \in [0,1]$. If $\dot{l}(0)$ is horizontal at $T_{\tilde{A}_1}GL(n)$, then $\dot{l}(t)$ is horizontal whenever $\det(l(t)) > 0$.
Proof. 
$\dot{l}(0) = \tilde{A}_2 - \tilde{A}_1$ is horizontal at $T_{\tilde{A}_1}GL(n)$ $\Leftrightarrow$ $\tilde{A}_1^{-1}(\tilde{A}_2 - \tilde{A}_1) \in Sym(n)$ $\Leftrightarrow$ $\tilde{A}_1^{-1}\tilde{A}_2 \in Sym(n)$ $\Leftrightarrow$ $\tilde{A}_2^{-1}\tilde{A}_1 \in Sym(n)$ $\Leftrightarrow$ $\tilde{A}_2^{-1}(\tilde{A}_2 - \tilde{A}_1) \in Sym(n)$ $\Leftrightarrow$ $\dot{l}(1)$ is also horizontal at $T_{\tilde{A}_2}GL(n)$.
For any $s \in (0,1)$ satisfying $\det(l(s)) > 0$, we restrict the segment to $l_s(t) := l(st)$; the lemma then follows from the horizontality of $\dot{l}_s(1)$. □
Lemma A1 ensures that a line segment stays horizontal as long as its initial vector is horizontal, which brings considerable convenience to the following proofs.
Lemma A2.
For any $A_1, A_2 \in SPD(n)$, the matrix $P = A_1^{-\frac{1}{2}}(A_1A_2)^{\frac{1}{2}}A_2^{-\frac{1}{2}} \in O(n)$, and we set $\tilde{\gamma}(t) = tPA_2^{\frac{1}{2}} + (1-t)A_1^{\frac{1}{2}}$. Whenever $\det(\tilde{\gamma}(t)) > 0$, $\dot{\tilde{\gamma}}(t)$ remains horizontal at $T_{\tilde{\gamma}(t)}GL(n)$.
Proof. 
First, we have
$$P^TP = A_2^{-\frac{1}{2}}(A_2A_1)^{\frac{1}{2}}A_1^{-1}(A_1A_2)^{\frac{1}{2}}A_2^{-\frac{1}{2}} = A_2^{-\frac{1}{2}}A_1^{-1}(A_1A_2)^{\frac{1}{2}}(A_1A_2)^{\frac{1}{2}}A_2^{-\frac{1}{2}} = I.$$
Then, we only need to check that $A_1^{-\frac{1}{2}}\dot{\tilde{\gamma}}(0) \in Sym(n)$. In fact, we have
$$A_1^{-\frac{1}{2}}\dot{\tilde{\gamma}}(0) = A_1^{-1}(A_1A_2)^{\frac{1}{2}} - I = A_1^{-\frac{1}{2}}\Big(A_1^{-\frac{1}{2}}(A_1A_2)^{\frac{1}{2}}A_1^{\frac{1}{2}}\Big)A_1^{-\frac{1}{2}} - I = A_1^{-\frac{1}{2}}\big(A_1^{\frac{1}{2}}A_2A_1^{\frac{1}{2}}\big)^{\frac{1}{2}}A_1^{-\frac{1}{2}} - I \in Sym(n),$$
which proves the lemma. □
Remark A1.
Finding the orthogonal matrix P is equivalent to solving a Riccati equation. One can see [29] for details.
To prove Theorem 1, the last step is to establish non-degeneracy.
Proof.
(Theorem 1). Due to the symmetry from (10), it suffices to prove the diagonal case $A_1 = \Lambda$. We shall prove that $\det\tilde{\gamma}(t) > 0$ for $t \in (0,1)$. We have
$$\det[\tilde{\gamma}(t)] = \det\Big[t\Lambda^{-\frac{1}{2}}(\Lambda A_2)^{\frac{1}{2}} + (1-t)\Lambda^{\frac{1}{2}}\Big] = (1-t)^n\det\Big[I + \frac{t}{1-t}\Lambda^{-\frac{1}{2}}(\Lambda A_2)^{\frac{1}{2}}\Lambda^{-\frac{1}{2}}\Big]\det\Big[\Lambda^{\frac{1}{2}}\Big].$$
Since $\Lambda^{-\frac{1}{2}}(\Lambda A_2)^{\frac{1}{2}}\Lambda^{-\frac{1}{2}}$ shares its eigenvalues with the symmetric positive-definite matrix $\Lambda^{-\frac{1}{2}}(\Lambda^{\frac{1}{2}}A_2\Lambda^{\frac{1}{2}})^{\frac{1}{2}}\Lambda^{-\frac{1}{2}}$, they are all positive. Thus, $\det[\tilde{\gamma}(t)] > 0$ holds for all $t \in (0,1)$. Combining this with Lemmas A1 and A2, the proof is done. □
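Numerically, the horizontal segment of Lemma A2 projects through the quotient map to the Wasserstein geodesic between $A_1$ and $A_2$. The sketch below (our own code) assumes the quotient map $\sigma(\tilde{A}) = \tilde{A}^T\tilde{A}$ and takes matrix square roots in the eigenbasis; it lets one verify the endpoint conditions and that intermediate points stay in $SPD(n)$:

```python
import numpy as np

def sqrtm_spd(A):
    """Principal square root of a symmetric positive-definite matrix."""
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.sqrt(lam)) @ Q.T

def wasserstein_geodesic(A1, A2):
    """Return P and gamma(t) = (t*P*A2^(1/2) + (1-t)*A1^(1/2))^T (same),
    with P = A1^(-1/2) (A1 A2)^(1/2) A2^(-1/2) as in Lemma A2."""
    r1, r2 = sqrtm_spd(A1), sqrtm_spd(A2)
    r1i = np.linalg.inv(r1)
    # (A1 A2)^(1/2) = A1^(1/2) (A1^(1/2) A2 A1^(1/2))^(1/2) A1^(-1/2)
    sq = r1 @ sqrtm_spd(r1 @ A2 @ r1) @ r1i
    P = r1i @ sq @ np.linalg.inv(r2)
    def geo(t):
        g = t * P @ r2 + (1 - t) * r1
        return g.T @ g
    return P, geo
```

The orthogonality of P and the endpoint identities $\gamma(0) = A_1$, $\gamma(1) = A_2$ follow exactly the computation in the proof of Lemma A2.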

Appendix C. Proof of Theorem 6

Proof. 
Again, with (10), we have $r(A) = r(\Lambda)$, where $A = Q\Lambda Q^T$; thus, we may focus on $r(\Lambda)$. Following the definition and the discussion around (23),
$$r(\Lambda) = \inf\{\epsilon(V) \mid V \in T_\Lambda SPD(n),\ \|V\|_{g_W} = 1\} = \inf\{\|V\|_{g_W} \mid V \in T_\Lambda SPD(n),\ \det(I + \Gamma_\Lambda[V]) = 0\},$$
in which $\|\cdot\|_{g_W}$ is the norm induced by $g_W$. Then, we have
$$2r^2(\Lambda) = \inf\{\mathrm{tr}(V\Gamma_\Lambda[V]) \mid \Gamma_\Lambda[V]\ \text{has an eigenvalue}\ \eta_k = -1\}.$$
Denote by $\{\lambda_i > 0\}$ the eigenvalues of $\Lambda$ and by $\eta_j$ the eigenvalues of $\Gamma_\Lambda[V]$, and let $x_j$ be eigenvectors corresponding to $\eta_j$ such that $\{x_j\}$ forms an orthonormal basis of $\mathbb{R}^n$. By direct calculation, we find
$$\mathrm{tr}(V\Gamma_\Lambda[V]) = \sum_{i=1}^n x_i^TV\Gamma_\Lambda[V]x_i = \sum_{i=1}^n \eta_ix_i^TVx_i = \sum_{i=1}^n \eta_ix_i^T\big(\Lambda\Gamma_\Lambda[V] + \Gamma_\Lambda[V]\Lambda\big)x_i = \sum_{i=1}^n\Big(\eta_i(\Gamma_\Lambda[V]x_i)^T\Lambda x_i + \eta_ix_i^T\Lambda\Gamma_\Lambda[V]x_i\Big) = 2\sum_{i=1}^n\eta_i^2x_i^T\Lambda x_i \ge 2x_k^T\Lambda x_k,$$
where we used the facts that $\Lambda$ is positive-definite and $\eta_k^2 = 1$. Equality $\mathrm{tr}(V\Gamma_\Lambda[V]) = 2x_k^T\Lambda x_k$ holds if and only if $\eta_j = 0$ for all $j \neq k$. In that case, by Algorithm 1, we have
$$\frac{V_{ij}}{\lambda_i+\lambda_j} = (\Gamma_\Lambda[V])_{ij} = -\delta_{ik}\delta_{jk},$$
and then we obtain
$$V_{ij} = -2\delta_{ik}\delta_{jk}\lambda_k, \quad (x_k)_i = \delta_{ki}.$$
Thus,
$$\mathrm{tr}(V\Gamma_\Lambda[V]) = 2x_k^T\Lambda x_k = 2\lambda_k.$$
In particular, the minimum of $\mathrm{tr}(V\Gamma_\Lambda[V])$ is $2\lambda_{\min}$, attained when $\lambda_k = \lambda_{\min}$ and $V_{ij} = -2\delta_{ik}\delta_{jk}\lambda_{\min}$; this tangent vector V is called the speed-degenerate direction. The function $\lambda_{\min}(A)$ is certainly continuous; hence, $r(A)$ is also continuous. □

Appendix D. Proof of Lemma 1

Before proving Lemma 1, we shall prove a key lemma, which relates the Lie brackets on the total space and on the base space.
Lemma A3.
The horizontal lift of vector fields commutes with Lie brackets under the projection:
$$d\sigma|_{\tilde{A}}[\tilde{X}, \tilde{Y}] = [X, Y], \quad \forall \tilde{A} \in GL(n).$$
Proof.
(Lemma A3). On the flat manifold $(GL(n), g_E)$, the connection equals the usual directional derivative in Euclidean space. Therefore, for vector fields X, Y on $SPD(n)$, we have
$$D_{\tilde{X}}\tilde{Y} = \lim_{t\to0}\frac{1}{t}\Big(\tilde{Y}|_{\tilde{A}+\tilde{X}t} - \tilde{Y}|_{\tilde{A}}\Big) = \lim_{t\to0}\frac{1}{t}\Big((\tilde{A}+\tilde{X}t)\Gamma_{A+Xt}[Y|_{A+Xt}] - \tilde{A}\Gamma_A[Y|_A]\Big)$$
$$= \lim_{t\to0}\frac{\tilde{A}}{t}\Big(\Gamma_{A+Xt}[Y|_{A+Xt}] - \Gamma_A[Y|_A]\Big) + \tilde{X}\Gamma_A[Y] = \lim_{t\to0}\frac{\tilde{A}}{t}\Big(\Gamma_{A+Xt}[Y|_{A+Xt}] - \Gamma_A[Y|_A]\Big) + \tilde{A}\Gamma_A[X]\Gamma_A[Y]$$
$$= \tilde{A}\Big(\Gamma_A[dY(X)] - \Gamma_A\big[X\Gamma_A[Y] + \Gamma_A[Y]X\big] + \Gamma_A[X]\Gamma_A[Y]\Big).$$
Substituting $\Pi_A[X,Y] := \Gamma_A[X]\Gamma_A[Y] - \Gamma_A[Y]\Gamma_A[X]$ into (2), we obtain
$$\Gamma_A\big[X\Gamma_A[Y] + \Gamma_A[Y]X - Y\Gamma_A[X] - \Gamma_A[X]Y\big] = \Pi_A[X,Y] - 2\Gamma_A[\Pi_A[X,Y]]A.$$
Then, we compute the Lie bracket
$$[\tilde{X},\tilde{Y}] := D_{\tilde{X}}\tilde{Y} - D_{\tilde{Y}}\tilde{X} = \tilde{A}\Gamma_A[dY(X) - dX(Y)] + \tilde{A}\big(\Gamma_A[X]\Gamma_A[Y] - \Gamma_A[Y]\Gamma_A[X]\big) - \tilde{A}\Gamma_A\big[X\Gamma_A[Y] + \Gamma_A[Y]X - Y\Gamma_A[X] - \Gamma_A[X]Y\big] = \widetilde{[X,Y]} + 2\tilde{A}\Gamma_A[\Pi_A[X,Y]]A,$$
where the last equality in (A14) comes from (A13).
Finally, we show that the second term in (A14) is vertical. In fact, we have
$$2\tilde{A}\Gamma_A[\Pi_A[X,Y]]A = \Big(2\tilde{A}\Gamma_A[\Pi_A[X,Y]]\tilde{A}^T\Big)\tilde{A},$$
where $2\tilde{A}\Gamma_A[\Pi_A[X,Y]]\tilde{A}^T$ is anti-symmetric. Thus, the proof of Lemma A3 is complete. □
With Lemma A3 in hand, Lemma 1 can be proved.
Proof.
(Lemma 1). For any smooth vector field Z on $SPD(n)$ with horizontal lift $\tilde{Z}$, we have
$$\big\langle\widetilde{\nabla_XY}, \tilde{Z}\big\rangle = \langle\nabla_XY, Z\rangle = \frac{1}{2}\Big(X\langle Y,Z\rangle + Y\langle Z,X\rangle - Z\langle X,Y\rangle + \langle Z,[X,Y]\rangle + \langle Y,[Z,X]\rangle - \langle X,[Y,Z]\rangle\Big)$$
$$= \frac{1}{2}\Big(\tilde{X}\langle\tilde{Y},\tilde{Z}\rangle + \tilde{Y}\langle\tilde{Z},\tilde{X}\rangle - \tilde{Z}\langle\tilde{X},\tilde{Y}\rangle + \big\langle\tilde{Z},\widetilde{[X,Y]}\big\rangle + \big\langle\tilde{Y},\widetilde{[Z,X]}\big\rangle - \big\langle\tilde{X},\widetilde{[Y,Z]}\big\rangle\Big)$$
$$= \frac{1}{2}\Big(\tilde{X}\langle\tilde{Y},\tilde{Z}\rangle + \tilde{Y}\langle\tilde{Z},\tilde{X}\rangle - \tilde{Z}\langle\tilde{X},\tilde{Y}\rangle + \langle\tilde{Z},[\tilde{X},\tilde{Y}]\rangle + \langle\tilde{Y},[\tilde{Z},\tilde{X}]\rangle - \langle\tilde{X},[\tilde{Y},\tilde{Z}]\rangle\Big) = \langle D_{\tilde{X}}\tilde{Y}, \tilde{Z}\rangle.$$
By the arbitrariness of Z, Lemma 1 is proved. □
Remark A2.
The proof of Lemma 1 implies that $d\sigma|_{\tilde{A}}(D_{\tilde{X}}\tilde{Y})$ is independent of the choice of $\tilde{A}$ within a fixed fiber.

Appendix E. Proof of Theorem 7

Proof. 
We calculate $\tilde{R}|_{\tilde{A}}(\tilde{X},\tilde{Y},\tilde{X},\tilde{Y})$ as follows:
$$\tilde{R}|_{\tilde{A}}(\tilde{X},\tilde{Y},\tilde{X},\tilde{Y}) = \langle\tilde{R}_{\tilde{X}\tilde{Y}}\tilde{X},\tilde{Y}\rangle = \langle D_{[\tilde{X},\tilde{Y}]}\tilde{X},\tilde{Y}\rangle - \langle D_{\tilde{X}}D_{\tilde{Y}}\tilde{X},\tilde{Y}\rangle + \langle D_{\tilde{Y}}D_{\tilde{X}}\tilde{X},\tilde{Y}\rangle$$
$$= \big\langle D_{\widetilde{[X,Y]}}\tilde{X},\tilde{Y}\big\rangle + \langle D_{2T(X,Y)}\tilde{X},\tilde{Y}\rangle - \big\langle D_{\tilde{X}}\widetilde{\nabla_YX},\tilde{Y}\big\rangle - \langle D_{\tilde{X}}T(Y,X),\tilde{Y}\rangle + \big\langle D_{\tilde{Y}}\widetilde{\nabla_XX},\tilde{Y}\big\rangle + \langle D_{\tilde{Y}}T(X,X),\tilde{Y}\rangle$$
$$= \langle\nabla_{[X,Y]}X,Y\rangle + 2\langle D_{T(X,Y)}\tilde{X},\tilde{Y}\rangle - \langle\nabla_X\nabla_YX,Y\rangle - \langle T(X,Y),T(X,Y)\rangle + \langle\nabla_Y\nabla_XX,Y\rangle$$
$$= R|_A(X,Y,X,Y) + 2\langle D_{T(X,Y)}\tilde{X},\tilde{Y}\rangle - \langle T(X,Y),T(X,Y)\rangle,$$
where we used (29), (32), $T_{\tilde{A}}(X,Y) \in V(\tilde{A})$, $T_{\tilde{A}}(X,Y) = -T_{\tilde{A}}(Y,X)$ and the following equation
$$\langle D_{\tilde{X}}T(Y,X),\tilde{Y}\rangle = \tilde{X}\langle T(Y,X),\tilde{Y}\rangle - \langle T(Y,X),D_{\tilde{X}}\tilde{Y}\rangle = \big\langle T(X,Y),T(X,Y) + \widetilde{\nabla_XY}\big\rangle = \langle T(X,Y),T(X,Y)\rangle.$$
Since T(X,Y) is vertical and the bracket of a vertical field with a horizontal lift remains vertical, we have
$$\langle[T(X,Y),\tilde{X}],\tilde{Y}\rangle = 0.$$
Then, using $D_{T(X,Y)}\tilde{X} - D_{\tilde{X}}T(X,Y) = [T(X,Y),\tilde{X}]$, we can obtain
$$\langle D_{T(X,Y)}\tilde{X},\tilde{Y}\rangle = \langle D_{\tilde{X}}T(X,Y),\tilde{Y}\rangle = -\langle T(X,Y),T(X,Y)\rangle.$$
By the definition of the directional derivative and $\sigma(\tilde{A} + T(X,Y)t) = A + o(t)$, we have
$$D_{T(X,Y)}\tilde{X} = \lim_{t\to0}\frac{1}{t}\Big((\tilde{A} + T(X,Y)t)\Gamma_A[X] - \tilde{A}\Gamma_A[X]\Big) = T(X,Y)\Gamma_A[X].$$
Combining (A20) and (A21), we obtain the following equation
$$\langle T(X,Y),T(X,Y)\rangle = -\langle T(X,Y)\Gamma_A[X],\tilde{Y}\rangle,$$
and
$$\tilde{R}(\tilde{X},\tilde{Y},\tilde{X},\tilde{Y}) = R(X,Y,X,Y) + 3\langle T(X,Y)\Gamma_A[X],\tilde{A}\Gamma_A[Y]\rangle = R(X,Y,X,Y) - 3\langle T(X,Y),T(X,Y)\rangle.$$
Due to $\tilde{R} \equiv 0$, we obtain the explicit expression for the Wasserstein curvature
$$R(X,Y,X,Y) = -3\langle T(X,Y)\Gamma_A[X],\tilde{A}\Gamma_A[Y]\rangle = -3\,\mathrm{tr}\big(\Gamma_A[X]T(X,Y)^T\tilde{A}\Gamma_A[Y]\big) = 3\,\mathrm{tr}\big(\Gamma_A[X]A\Gamma_A[\Pi_A[X,Y]]A\Gamma_A[Y]\big). \;\square$$

Appendix F. Proof of Theorem 8

Proof. 
Using $p = r$, $q \neq t$, we have
$$(S_1S_2)_{ij} = (1+\delta_{pq})(1+\delta_{pt})\delta_{iq}\delta_{jt} = \begin{cases} 2\delta_{iq}\delta_{jt}, & q = p \ \text{or}\ t = p, \\ \delta_{iq}\delta_{jt}, & q \neq p \ \text{and}\ t \neq p, \end{cases}$$
and
$$(E_{S_1}E_{S_2})_{ij} = \frac{(1+\delta_{pq})(1+\delta_{pt})}{(\lambda_p+\lambda_q)(\lambda_p+\lambda_t)}\delta_{iq}\delta_{jt}.$$
Then, we obtain
$$\big(\Gamma_\Lambda[[E_{S_1},E_{S_2}]]\big)_{ij} = \frac{[E_{S_1},E_{S_2}]_{ij}}{\lambda_i+\lambda_j} = \frac{(1+\delta_{pq})(1+\delta_{pt})(\delta_{iq}\delta_{jt}-\delta_{jq}\delta_{it})}{(\lambda_p+\lambda_q)(\lambda_p+\lambda_t)(\lambda_q+\lambda_t)},$$
and
$$\sum_{k=1}^n\big(\Gamma_\Lambda[[E_{S_1},E_{S_2}]]\big)_{ik}\big(\Lambda E_{S_2}E_{S_1}\Lambda\big)_{kj} = \sum_{k=1}^n\frac{(1+\delta_{pq})(1+\delta_{pt})(\delta_{iq}\delta_{kt}-\delta_{kq}\delta_{it})}{(\lambda_p+\lambda_q)(\lambda_p+\lambda_t)(\lambda_q+\lambda_t)}\cdot\frac{(1+\delta_{pq})(1+\delta_{pt})\lambda_q\lambda_t}{(\lambda_p+\lambda_q)(\lambda_p+\lambda_t)}\delta_{kt}\delta_{jq} = \frac{(1+\delta_{pq})^2(1+\delta_{pt})^2\lambda_q\lambda_t}{(\lambda_p+\lambda_q)^2(\lambda_p+\lambda_t)^2(\lambda_q+\lambda_t)}\delta_{iq}\delta_{jq}.$$
Thus, we obtain the expression for the curvature as follows
$$R|_\Lambda(S_1,S_2,S_1,S_2) = 3\,\mathrm{tr}\big(E_{S_1}\Lambda\Gamma_\Lambda[[E_{S_1},E_{S_2}]]\Lambda E_{S_2}\big) = \frac{3(1+\delta_{pq})^2(1+\delta_{pt})^2\lambda_q\lambda_t}{(\lambda_p+\lambda_q)^2(\lambda_p+\lambda_t)^2(\lambda_q+\lambda_t)}.$$
On the other hand, by (A26), $S_1$ and $S_2$ are orthogonal:
$$\langle S_1,S_2\rangle = \mathrm{tr}(E_{S_2}\Lambda E_{S_1}) = \mathrm{tr}(\Lambda E_{S_1}E_{S_2}) = 0,$$
and
$$\langle S_1,S_1\rangle = \frac{1}{2}\mathrm{tr}(S_1E_{S_1}) = \frac{1+\delta_{pq}}{\lambda_p+\lambda_q}, \quad \langle S_2,S_2\rangle = \frac{1}{2}\mathrm{tr}(S_2E_{S_2}) = \frac{1+\delta_{pt}}{\lambda_p+\lambda_t}.$$
Then, the squared area of the sectional parallelogram is
$$\langle S_1,S_1\rangle\langle S_2,S_2\rangle - \langle S_1,S_2\rangle^2 = \frac{(1+\delta_{pq})(1+\delta_{pt})}{(\lambda_p+\lambda_q)(\lambda_p+\lambda_t)}.$$
Dividing (A29) by (A32), the proof is done. □

Appendix G. Proof of Theorem 9

Before proving this theorem, we need some preparation. For any $A \in SPD(n)$ with eigenvalues $\lambda_i$, $i = 1,\dots,n$, construct the diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_n)$ and the strictly upper triangular matrix U with $U_{ij} = \frac{1}{\lambda_i+\lambda_j}$ for $i < j$. In the case $p = r$, $q < t$, by Theorem 8, we have
$$K|_\Lambda(S_{p,q},S_{p,t}) = \frac{3(1+\delta_{pq})(1+\delta_{pt})\lambda_q\lambda_t}{(\lambda_p+\lambda_q)(\lambda_p+\lambda_t)(\lambda_q+\lambda_t)} = 3U_{qt}\big((U+U^T)_{pq}\Lambda_{qq}+\delta_{pq}\big)\big((U+U^T)_{pt}\Lambda_{tt}+\delta_{pt}\big)$$
$$= 3U_{qt}\Lambda_{tt}(U+U^T)_{tp}\delta_{pq} + 3(U+U^T)_{pq}\Lambda_{qq}U_{qt}\delta_{pt} + 3(U+U^T)_{pq}\Lambda_{qq}U_{qt}\Lambda_{tt}(U+U^T)_{tp},$$
where we used
$$\frac{1-\delta_{pq}}{\lambda_p+\lambda_q} = (U+U^T)_{pq},$$
and
$$\frac{(1+\delta_{pq})\lambda_q}{\lambda_p+\lambda_q} = \frac{(1-\delta_{pq})\lambda_q}{\lambda_p+\lambda_q} + \frac{2\delta_{pq}\lambda_q}{\lambda_p+\lambda_q} = (U+U^T)_{pq}\lambda_q + \delta_{pq}.$$
Then, we can obtain the scalar curvature by summing (A33).
Proof. 
Noting that $U_{qt} = 0$ for $q \ge t$, we have
$$\rho(\Lambda) = \sum_{p,\,q<t} K|_\Lambda(S_{p,q},S_{p,t}) = \sum_{p,\,q<t}\Big(3U_{qt}\Lambda_{tt}(U+U^T)_{tp}\delta_{pq} + 3(U+U^T)_{pq}\Lambda_{qq}U_{qt}\delta_{pt}\Big) + \sum_{p,\,q<t}3(U+U^T)_{pq}\Lambda_{qq}U_{qt}\Lambda_{tt}(U+U^T)_{tp}$$
$$= \sum_{p,q,t}\Big(3U_{qt}\Lambda_{tt}(U+U^T)_{tp}\delta_{pq} + 3(U+U^T)_{pq}\Lambda_{qq}U_{qt}\delta_{pt}\Big) + \sum_{p,q,t}3(U+U^T)_{pq}\Lambda_{qq}U_{qt}\Lambda_{tt}(U+U^T)_{tp}$$
$$= 3\,\mathrm{tr}\Big(U\Lambda(U+U^T) + (U+U^T)\Lambda U + (U+U^T)\Lambda U\Lambda(U+U^T)\Big). \;\square$$
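The proof suggests a direct numerical check of this trace formula: summing the sectional curvatures of Theorem 8 over the basis sections must agree with it. A sketch (our own code; eigenvalues are passed as a list):

```python
import numpy as np

def rho_sum(lams):
    """Scalar curvature as the sum of Theorem 8 sectional curvatures
    over the sections (S_{p,q}, S_{p,t}) with q < t."""
    n, total = len(lams), 0.0
    for p in range(n):
        for q in range(n):
            for t in range(q + 1, n):
                dpq, dpt = float(p == q), float(p == t)
                total += (3 * (1 + dpq) * (1 + dpt) * lams[q] * lams[t]
                          / ((lams[p] + lams[q]) * (lams[p] + lams[t])
                             * (lams[q] + lams[t])))
    return total

def rho_trace(lams):
    """Scalar curvature via the trace formula of Theorem 9."""
    lams = np.asarray(lams, dtype=float)
    L = np.diag(lams)
    U = np.triu(1.0 / (lams[:, None] + lams[None, :]), k=1)  # strictly upper
    V = U + U.T
    return 3 * np.trace(U @ L @ V + V @ L @ U + V @ L @ U @ L @ V)
```

For n = 2 both expressions reduce to $3/(\lambda_1+\lambda_2)$, which gives a quick hand-checkable case.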

Appendix H. Effects of WSCED

Figure A1. WSCED on different polluted images. The first row shows the original RGB images. The second row shows images with uniform noise. The third row shows edge images after WSCED.

Appendix I

Table A1. Comparison of three denoising algorithms.

| Datasource (Size) | Original SNR | ROR TPR | ROR FPR | ROR SNRG | SOR TPR | SOR FPR | SOR SNRG | AWCD TPR | AWCD FPR | AWCD SNRG |
| Stanford Bunny (10,000) | 10 | 100.00% | 18.10% | 452.5% | 87.78% | 49.00% | 79.1% | 99.28% | 4.80% | 1968.3% |
| | 0.1 | 100.00% | 100.00% | 0.0% | 84.44% | 71.61% | 17.9% | 99.93% | 90.53% | 10.4% |
| Duke Dragon (100,000) | 100 | 100.00% | 9.60% | 941.7% | 82.29% | 51.90% | 58.6% | 100.00% | 2.90% | 3348.3% |
| | 1 | 100.00% | 100.00% | 0.0% | 82.12% | 70.16% | 17.1% | 99.88% | 2.40% | 4054.8% |
| Armadillo (50,000) | 10 | 62.26% | 0.54% | 114.38 | 88.06% | 63.46% | 38.8% | 34.84% | 0.01% | 347.38 |
| | 1 | 62.66% | 0.41% | 153.34 | 87.99% | 69.62% | 26.4% | 34.84% | 0.09% | 404.14 |
| Lucy (50,000) | 100 | 100.00% | 6.00% | 1566.6% | 80.71% | 51.80% | 55.8% | 99.99% | 0.80% | 123.99 |
| | 10 | 100.00% | 35.50% | 181.7% | 80.65% | 66.44% | 21.4% | 99.99% | 2.86% | 3396.1% |
| Happy Buddha (50,000) | 100 | 100.00% | 6.80% | 1370.6% | 77.82% | 56.60% | 37.5% | 99.92% | 2.40% | 4063.3% |
| | 10 | 100.00% | 13.86% | 621.5% | 78.24% | 64.76% | 20.8% | 99.93% | 1.88% | 5215.6% |

References

  1. Hosseini, R.; Sra, S. Matrix Manifold Optimization for Gaussian Mixtures. Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015. [Google Scholar]
  2. Manton, J.H. A Primer on Stochastic Differential Geometry for Signal Processing. IEEE J. Sel. Top. Signal Process. 2013, 4, 681–699. [Google Scholar] [CrossRef]
  3. Rubner, Y.; Tomasi, C.; Guibas, L.J. A Metric for Distributions with Applications to Image Databases. In Proceedings of the International Conference on Computer Vision, Bombay, India, 7 January 1998. [Google Scholar]
  4. Bhatia, R. Matrix Analysis. Graduate Texts in Mathematics; Springer: New York, NY, USA, 1997. [Google Scholar]
  5. Bhatia, R. Positive Definite Matrices; Princeton University Press: Princeton, NJ, USA, 2009. [Google Scholar]
  6. Pennec, X.; Fillard, P.; Ayache, N. A Riemannian Framework for Tensor Computing. Int. J. Comput. Vis. 2006, 1, 41–66. [Google Scholar] [CrossRef]
  7. Arsigny, V.; Fillard, P.; Pennec, X.; Ayache, N. Log-Euclidean Metrics for Fast and Simple Calculus on Diffusion Tensors. Magn. Reson. Med. 2010, 2, 411–421. [Google Scholar] [CrossRef] [PubMed]
  8. Hall, B.C. Lie Groups, Lie Algebras, and Representations; Springer: Berlin, Germany, 2015. [Google Scholar]
  9. Li, Y.; Wong, K. Riemannian Distances for Signal Classification by Power Spectral Density. IEEE J. Sel. Top. Signal Process. 2013, 7, 655–669. [Google Scholar] [CrossRef]
  10. Zhang, S.; Cao, Y.; Li, W.; Yan, F.; Luo, Y.; Sun, H. A New Riemannian Structure in SPD(n). In Proceedings of the IEEE International Conference on Signal, Information and Data Processing (ICSIDP), Chongqing, China, 11–13 December 2019. [Google Scholar]
  11. Gelbrich, M. On a Formula for the L2 Wasserstein Metric between Measures on Euclidean and Hilbert Spaces. Math. Nachrichten 2015, 1, 185–203. [Google Scholar] [CrossRef]
  12. Givens, C.R.; Shortt, R.M. A Class of Wasserstein Metrics for Probability Distributions. Mich. Math. J. 1984, 2, 231–240. [Google Scholar] [CrossRef]
  13. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875. [Google Scholar]
  14. Asuka, T. On Wasserstein Geometry of Gaussian Measures. In Probability Approach to Geometry; Mathematical Society of Japan: Tokyo, Japan, 2010; pp. 463–472. [Google Scholar]
  15. Massart, E.; Absil, P.-A. Quotient Geometry with Simple Geodesics for the Manifold of Fixed-Rank Positive-Semidefinite Matrices. Siam J. Matrix Anal. Appl. 2020, 1, 171–198. [Google Scholar] [CrossRef]
  16. Rusu, R.B.; Cousins, S. 3D is Here: Point Cloud Library (PCL). In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011. [Google Scholar]
  17. Ward, A.J.B. A General Analysis of Sylvester’s Matrix Equation. Int. J. Math. Educ. Sci. Technol. 1991, 22, 615–620. [Google Scholar] [CrossRef]
  18. Do Carmo, M.P. Riemannian Geometry; Springer: Boston, MA, USA, 1992. [Google Scholar]
  19. Lee, J.M. Introduction to Smooth Manifolds; Springer: Berlin, Germany, 2003. [Google Scholar]
  20. Kobayashi, S. Transformation Groups in Differential Geometry. In Classics in Mathematics; Springer: Berlin, Germany, 1972. [Google Scholar]
  21. Lee, J.M. Riemannian Manifolds An Introduction to Curvature; Springer: New York, NY, USA, 1997. [Google Scholar]
  22. Boothby, W.M. An Introduction to Differentiable Manifolds and Riemannian Geometry, 2nd ed.; Academic Press: Cambridge, MA, USA, 1986. [Google Scholar]
  23. Masry, E. Multivariate Probability Density Estimation for Associated Processes: Strong Consistency and Rates. Stat. Probab. Lett. 2002, 2, 205–219. [Google Scholar] [CrossRef]
  24. Nadernejad, E.; Sharifzadeh, S.; Hassanpour, H. Edge Detection Techniques: Evaluations and Comparisons. Appl. Math. Sci. 2008, 2, 1507–1520. [Google Scholar]
  25. Nishimori, Y.; Akaho, S. Learning Algorithms Utilizing Quasi-geodesic Flows on the Stiefel Manifold. Neurocomputing 2005, 67, 106–135. [Google Scholar] [CrossRef]
  26. Fiori, S. A Theory for Learning by Weight Flow on Stiefel-Grassman Manifold. Neural Comput. 2001, 13, 1625–1647. [Google Scholar] [CrossRef]
  27. Hua, X.; Ono, Y.; Peng, L.; Cheng, Y.; Wang, H. Target Detection Within Nonhomogeneous Clutter Via Total Bregman Divergence-Based Matrix Information Geometry Detectors. IEEE Trans. Signal Process. 2021, 69, 4326–4340. [Google Scholar] [CrossRef]
  28. Hua, X.; Peng, L. MIG Median Detectors with Manifold Filter. Signal Process. 2021, 188, 108176. [Google Scholar] [CrossRef]
  29. Malagò, L.; Montrucchio, L.; Pistone, G. Wasserstein Riemannian Geometry of Gaussian Densities. In Information Geometry; Springer: Berlin/Heidelberg, Germany, 2018; pp. 137–179. [Google Scholar]
Figure 1. Geodesics of $g_W$ on $SPD(2)$.
Figure 2. Three geodesic balls with the same radius.
Figure 3. Scalar curvatures on SPD(2).
Figure 4. Stanford Bunny with uniform noise.
Figure 5. Cleaned point cloud by ROR.
Figure 6. Cleaned point cloud by SOR.
Figure 7. Histogram of the Wasserstein scalar curvature; x-axis: scalar curvature; y-axis: point number.
Figure 8. Cleaned point cloud by AWCD.
Figure 9. An example to show different edge detection algorithms on the flower image.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.