1. Introduction
Symmetric positive-definite matrices are widely used in many fields of information science, such as stability analysis in signal processing, linear stationary systems, optimal control and imaging analysis [1,2,3]; their importance can hardly be overstated [4,5]. Instead of considering a single matrix, contemporary researchers tend to study the global structure of the set consisting of all $n\times n$ symmetric positive-definite matrices. This set is known as $SPD(n)$.
$SPD(n)$ can be endowed with various structures. The most traditional, the Euclidean metric, is induced as a submanifold metric from the Euclidean inner product on the space of matrices. X. Pennec, P. Fillard et al. [6] defined the affine-invariant Riemannian metric. V. Arsigny, P. Fillard et al. [7] exhibited the Lie group [8] structure on $SPD(n)$, which admits a bi-invariant metric called the Log-Euclidean metric.
Recently, by constructing a principal bundle, Y. Li, M. Wong et al. [9] and S. Zhang et al. [10] defined a new Riemannian metric on $SPD(n)$ whose geodesic distance coincides with the Wasserstein-2 distance [11,12]; it is the so-called Wasserstein metric. This distance, rather than the metric tensor itself, has been widely used in artificial intelligence [13]. On the geometric side, encouragingly, T. Asuka [14] and E. Massart, P.-A. Absil [15] gave a series of theoretical expressions for geometric quantities. However, these expressions are too general or too complicated to be applied directly. In this paper, we derive more computationally feasible expressions in a concrete setting. Moreover, we give the Jacobi field and the scalar curvature.
With the blooming of data science, point cloud processing, especially denoising, plays an increasingly important role in data-related research and engineering. There is an immense literature on point cloud denoising, and widely used algorithms are packaged as built-in functions of software such as PCL [16]. These methods share a common drawback when high-density noise is added to point clouds. Utilizing the geometric structure of $SPD(n)$, we design a novel algorithm to overcome this drawback. Compared to traditional methods, our algorithm is more accurate and less dependent on manually chosen parameters.
In addition, we apply our theory to image edge detection, a classical problem in image processing, and design a new detection algorithm. Different from traditional gradient-based filters, such as Sobel, Prewitt and Laplacian, we exhibit the connection between the Wasserstein sectional curvature and edges. Experiments show the feasibility of our algorithm.
The paper is organized as follows. In Section 2, we introduce some basic facts about the Riemannian manifold $(SPD(n),{g}_{W})$ and consider its symmetry. In Section 3, we describe the Wasserstein geometry of $SPD(n)$, including the geodesics, exponential map, connection and curvature. In particular, we prove geodesic convexity and the nonexistence of cut loci and conjugate points. In Section 4, we design an adaptive algorithm to denoise point clouds. In Section 5, we develop a curvature-based method to detect image edges. Proofs and detailed numerical results are presented in the Appendices.
2. Preliminary
2.1. Notation
In this paper, we adopt conventional notations from algebra and geometry. Riemannian manifolds are denoted as pairs $(\mathrm{manifold},\mathrm{metric})$. For example, our main object of interest is $(SPD(n),{g}_{W})$, meaning $SPD(n)$ endowed with the Wasserstein metric. ${\mathbb{R}}^{n}$ is the n-dimensional Euclidean space. $M(n)$ represents the set of $n\times n$ matrices, $Sym(n)$ the set of $n\times n$ symmetric matrices, and $O(n)$ the set of $n\times n$ orthogonal matrices. ${T}_{A}M$ is, as usual, the tangent space of M at a point A.
$\mathsf{\Lambda}$ always represents a diagonal $n\times n$ matrix. For an $n\times n$ matrix A, $\lambda (A)$ or ${\lambda}_{i}(A)$ denotes an eigenvalue or the ith eigenvalue of A, respectively. The matrix with entries ${A}_{ij}$ will always be denoted $[{A}_{ij}]$. The identity matrix is denoted by I. In this paper, we conventionally express points on manifolds as $A,B$ and vector fields as $X,Y$.
The Sylvester equation is one of the most classical matrix equations. The following special case of the Sylvester equation plays a key role in understanding the geometry of $(SPD(n),{g}_{W})$:
$$AK+KA=X. \qquad (1)$$
We denote the solution of (1) with respect to K as ${\Gamma}_{A}[X]$. Then, the matrix ${\Gamma}_{A}[X]\in M(n)$ satisfies
$$A\,{\Gamma}_{A}[X]+{\Gamma}_{A}[X]\,A=X.$$
From the geometric viewpoint, we can ensure the existence and uniqueness of the solution in the cases involved in this paper. Some properties of ${\Gamma}_{A}[X]$ can be found in Appendix A. More details about the Sylvester equation are presented in [17].
We recall an algorithm for solving this kind of Sylvester equation, which offers an explicit expression of the solution. This expression only depends on the eigenvalue decomposition. More details can be found in [10].
Algorithm 1 will be used frequently in what follows, and it helps us to comprehend the geometry of $SPD(n)$. Note that this algorithm can also be used for general $X\in M(n)$. In particular, when X is symmetric (skew-symmetric), ${\Gamma}_{A}[X]$ is also symmetric (skew-symmetric). Moreover, this algorithm simplifies when A is diagonal.
Algorithm 1 Solution of the Sylvester Equation.
Input: $A\in SPD(n),X\in Sym(n)$. Output: ${\Gamma}_{A}[X]$.
1: Eigenvalue decomposition $A=Q\mathsf{\Lambda}{Q}^{T}$, where $Q\in O(n)$ and $\mathsf{\Lambda}:=\mathrm{diag}({\lambda}_{1},\cdots ,{\lambda}_{n})$ collects the eigenvalues of A;
2: ${C}_{X}:=[{c}_{ij}]={Q}^{T}XQ$;
3: ${E}_{X}:=[{e}_{ij}]=\left(\frac{{c}_{ij}}{{\lambda}_{i}+{\lambda}_{j}}\right)$;
4: ${\Gamma}_{A}[X]=Q{E}_{X}{Q}^{T}$.
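For readers who want to experiment, Algorithm 1 translates into a few lines of NumPy. This is an illustrative sketch (the function name `gamma` is ours, not from the paper):

```python
import numpy as np

def gamma(A, X):
    # Algorithm 1: solve the Sylvester equation A K + K A = X
    # for symmetric positive-definite A via eigendecomposition.
    lam, Q = np.linalg.eigh(A)             # step 1: A = Q diag(lam) Q^T
    C = Q.T @ X @ Q                        # step 2: C_X = Q^T X Q
    E = C / (lam[:, None] + lam[None, :])  # step 3: e_ij = c_ij / (lam_i + lam_j)
    return Q @ E @ Q.T                     # step 4: Gamma_A[X] = Q E_X Q^T
```

The result can be checked directly against the defining equation: `A @ K + K @ A` should reproduce `X`, and `K` inherits the (skew-)symmetry of `X`.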

2.2. Wasserstein Metric
In this part, we introduce the Wasserstein metric on $SPD(n)$.
Definition 1. For any $A\in SPD(n)$, $X,Y\in {T}_{A}SPD(n)$, define
$${g}_{W}{}_{A}(X,Y):=\mathrm{tr}\left({\Gamma}_{A}[X]\,A\,{\Gamma}_{A}[Y]\right)=\frac{1}{2}\mathrm{tr}\left({\Gamma}_{A}[Y]\,X\right).$$
In the second equality, we use the facts that ${\Gamma}_{A}[X],{\Gamma}_{A}[Y],A$ are all symmetric and that $A{\Gamma}_{A}[X]+{\Gamma}_{A}[X]A=X$. One can check that ${g}_{W}$ is a symmetric and nondegenerate bilinear tensor field on $SPD(n)$ [18]. We call ${g}_{W}$ the Wasserstein metric.
Denote ${g}_{E}(\tilde{X},\tilde{Y}):=\mathrm{tr}({\tilde{X}}^{T}\tilde{Y}),\forall \tilde{A}\in GL(n),\tilde{X},\tilde{Y}\in {T}_{\tilde{A}}GL(n)$ as Euclidean metric on $GL(n)$. Then, we have the following conclusions.
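Numerically, the two forms of the metric in Definition 1 can be cross-checked with the Sylvester solver of Algorithm 1. A sketch, assuming the form ${g}_{W}(X,Y)=\mathrm{tr}({\Gamma}_{A}[X]A{\Gamma}_{A}[Y])$ (the helper names `gamma` and `g_w` are ours):

```python
import numpy as np

def gamma(A, X):
    # Algorithm 1: solve A K + K A = X.
    lam, Q = np.linalg.eigh(A)
    C = Q.T @ X @ Q
    return Q @ (C / (lam[:, None] + lam[None, :])) @ Q.T

def g_w(A, X, Y):
    # Wasserstein metric, assuming g_W(X, Y) = tr(Gamma_A[X] A Gamma_A[Y]);
    # by the symmetry facts of Definition 1 this equals (1/2) tr(Gamma_A[Y] X).
    return np.trace(gamma(A, X) @ A @ gamma(A, Y))
```

On random symmetric tangent vectors, `g_w` agrees with the half-trace form and is symmetric and positive on nonzero vectors.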
Proposition 1. The projection
$$\sigma :(GL(n),{g}_{E})\to (SPD(n),{g}_{W}),\qquad \sigma (\tilde{A})=\tilde{A}{\tilde{A}}^{T},$$
is a Riemannian submersion [19], which means that $\mathrm{d}\sigma $ is surjective and preserves the inner products of horizontal vectors. Remark 1. The general linear group with the Euclidean metric $(GL(n),{g}_{E})$, together with the projection σ, is a trivial principal bundle over $SPD(n)$ with the orthogonal group $O(n)$ as structure group. This bundle structure establishes two facts [10]: $SPD(n)\cong GL(n)/O(n)$, and ${g}_{E}$ remains invariant under the group action of $O(n)$. Before giving more conclusions, we review some concepts. For any $\tilde{A}\in GL(n)$, we say that a tangent vector $\tilde{V}\in {T}_{\tilde{A}}GL(n)$ is vertical if $\mathrm{d}\sigma (\tilde{V})=0$, and that $\tilde{W}\in {T}_{\tilde{A}}GL(n)$ is horizontal if ${g}_{E}{}_{\tilde{A}}(\tilde{V},\tilde{W})=0$ for all vertical vectors $\tilde{V}$. In addition, if $\mathrm{d}\sigma (\tilde{X})=X\in {T}_{A}SPD(n)$, we call $\tilde{X}$ a lift of X, where $A=\sigma (\tilde{A})$. Using the notation ${\Gamma}_{A}[X]$, we can find the horizontal lift of $X\in {T}_{A}SPD(n)$.
Proposition 2. For any $\tilde{A}\in (GL(n),{g}_{E})$, $A=\sigma (\tilde{A})$ and any $X\in {T}_{A}SPD(n)$, there is a unique horizontal lift $\tilde{X}\in {T}_{\tilde{A}}GL(n)$ of X, namely
$$\tilde{X}={\Gamma}_{A}[X]\tilde{A}.$$
Using Proposition 2, we can obtain the representations of horizontal and vertical vectors.
Proposition 3. For any $\tilde{A}\in (GL(n),{g}_{E})$, ${T}_{\tilde{A}}GL(n)$ has the orthogonal decomposition
$${T}_{\tilde{A}}GL(n)=H(\tilde{A})\oplus V(\tilde{A}),$$
where $H(\tilde{A})=\{S\tilde{A}\mid S\in Sym(n)\}$ consists of all horizontal vectors and $V(\tilde{A})=\{\tilde{A}\mathsf{\Omega}\mid \mathsf{\Omega}\in M(n),{\mathsf{\Omega}}^{T}=-\mathsf{\Omega}\}$ consists of all vertical vectors. Proofs of Propositions 2 and 3 can be found in [10].
2.3. Symmetry
Now we study the symmetry of $(SPD(n),{g}_{W})$. Consider the orthogonal group action $\mathsf{\Psi}:O(n)\times SPD(n)\to SPD(n)$ defined by
$$\mathsf{\Psi}(O,A)=OA{O}^{T}.$$
One can check that $\mathsf{\Psi}$ is a group action of $O(n)$ and that $\mathrm{d}{\mathsf{\Psi}}_{O}$ is an isometry for every $O\in O(n)$, which means that $O(n)$ is isomorphic to a subgroup of the isometry group $ISO(SPD(n),{g}_{W})$ of $SPD(n)$. Precisely, we have
$$\mathsf{\Psi}(O(n))\le ISO(SPD(n),{g}_{W}),\qquad \mathsf{\Psi}(O(n))\cong O(n). \qquad (10)$$
According to (10), when we study local geometric characteristics, we only need to consider sorted diagonal matrices as representative elements under the orthogonal action, rather than all general points of $SPD(n)$. Therefore, some pointwise quantities, such as the scalar curvature and the bounds of the sectional curvatures, depend only on the eigenvalues.
At the end of this part, we give the symmetry degree of $(SPD(n),{g}_{W})$, defined as the dimension of $ISO(SPD(n),{g}_{W})$.
Proposition 4. $(SPD(n),{g}_{W})$ has its symmetry degree controlled by
$$\frac{{n}^{2}-n}{2}\le \mathrm{dim}\,ISO(SPD(n),{g}_{W})\le \frac{m(m-1)}{2}+1,\qquad m:=\frac{{n}^{2}+n}{2}.$$
Proof. The famous gap theorem [20] on isometry groups shows the nonexistence of isometry groups with dimension strictly between $\frac{m(m-1)}{2}+1$ and $\frac{m(m+1)}{2}$ for any m-dimensional Riemannian manifold with $m\ne 4$.
On the one hand, for an m-dimensional Riemannian manifold, the dimension of the isometry group achieves its maximum $\frac{m(m+1)}{2}$ if and only if the manifold has constant sectional curvature. However, we will show later that $(SPD(n),{g}_{W})$ does not have constant sectional curvature, which means its symmetry degree is less than the maximum.
On the other hand, Equation (10) shows that the dimension of the Wasserstein isometry group is at least the dimension of $O(n)$. Therefore, by $\mathrm{dim}(SPD(n))=\frac{{n}^{2}+n}{2}\ne 4$ and $\mathrm{dim}(O(n))=\frac{{n}^{2}-n}{2}$, we obtain the desired result. □
3. Wasserstein Geometry of $SPD(n)$
3.1. Geodesic
In this part, we give the expression of geodesics on $(SPD(n),{g}_{W})$, including the geodesic joining two points and the geodesic with given initial values. Then, we will show that the whole Riemannian manifold $(SPD(n),{g}_{W})$ is geodesically convex, which means that we can always find a minimal geodesic joining any two points. To some extent, geodesic convexity may make up for the incompleteness of $(SPD(n),{g}_{W})$.
To prove the geodesic convexity of $(SPD(n),{g}_{W})$, we need the following theorem.
Theorem 1. For any ${A}_{1},{A}_{2}\in SPD(n)$, let ${\tilde{A}}_{1}={A}_{1}^{\frac{1}{2}}$ be the fixed lift of ${A}_{1}$. Then there exists a lift of ${A}_{2}$,
$${\tilde{A}}_{2}={A}_{1}^{-\frac{1}{2}}{\left({A}_{1}^{\frac{1}{2}}{A}_{2}{A}_{1}^{\frac{1}{2}}\right)}^{\frac{1}{2}},$$
such that the line segment $\tilde{\gamma}(t)=t{\tilde{A}}_{2}+(1-t){\tilde{A}}_{1},t\in [0,1]$, is horizontal and nondegenerate. Proof of Theorem 1 can be found in Appendix B. This theorem brings some geometric and physical facts.
Corollary 1. (geodesic convexity) $(SPD(n),{g}_{W})$ is a geodesically convex Riemannian manifold. Between any two points ${A}_{1},{A}_{2}\in SPD(n)$, there exists a minimal Wasserstein geodesic
$$\gamma (t)=\sigma (\tilde{\gamma}(t))=\left(t{\tilde{A}}_{2}+(1-t){\tilde{A}}_{1}\right){\left(t{\tilde{A}}_{2}+(1-t){\tilde{A}}_{1}\right)}^{T},$$
where $\gamma (t)$ lies strictly in $SPD(n)$. Thus, $(SPD(n),{g}_{W})$ is geodesically convex. Similar expressions for the geodesic can also be found in several papers [14,15].
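The construction of the minimal geodesic, via the standard optimal lift ${\tilde{A}}_{2}={A}_{1}^{-1/2}({A}_{1}^{1/2}{A}_{2}{A}_{1}^{1/2})^{1/2}$, can be sketched numerically as follows (function names are ours; `spd_sqrt` is an eigenvalue-based symmetric square root):

```python
import numpy as np

def spd_sqrt(M):
    # Symmetric square root of a symmetric positive (semi)definite matrix.
    lam, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(np.maximum(lam, 0.0))) @ Q.T

def bw_geodesic(A1, A2, t):
    # Minimal Wasserstein geodesic: project the horizontal line segment
    # between the lifts A1^(1/2) and A2~ = A1^(-1/2)(A1^(1/2) A2 A1^(1/2))^(1/2).
    R1 = spd_sqrt(A1)
    A2_lift = np.linalg.inv(R1) @ spd_sqrt(R1 @ A2 @ R1)
    G = (1 - t) * R1 + t * A2_lift
    return G @ G.T
```

At $t=0$ and $t=1$ the curve recovers the endpoints exactly, and intermediate points stay symmetric positive-definite, as Corollary 1 asserts.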
Theorem 2. For any two points in $(SPD(n),{g}_{W})$, there exists a unique minimal geodesic joining them. From the geometric viewpoint, there is no cut locus along any geodesic.
Proof. We have proved the existence of a minimal geodesic joining any two points in Corollary 1. Assume that there exist two minimal geodesics joining $A,B\in SPD(n)$. Fix $\tilde{A}={A}^{\frac{1}{2}}$ as the horizontal lift of A and lift these two geodesics horizontally; we then find two horizontal lifts of B. Denote these two lifts as ${\tilde{B}}_{1}={B}^{\frac{1}{2}}{Q}_{1},{\tilde{B}}_{2}={B}^{\frac{1}{2}}{Q}_{2}$, with ${Q}_{1},{Q}_{2}\in O(n)$. Then, ${Q}_{1}$ and ${Q}_{2}$ are both solutions of the following optimization problem:
$$\underset{Q\in O(n)}{min}{\parallel {A}^{\frac{1}{2}}-{B}^{\frac{1}{2}}Q\parallel}_{F}.$$
By the compactness of $O(n)$, this problem has a unique solution. Thus, we have proved the uniqueness of the minimal geodesic. □
Remark 2. Due to the nonexistence of cut loci, there exist no conjugate points on $(SPD(n),{g}_{W})$.
3.2. Exponential Map
Following Lemma A1, we can directly write down the Wasserstein logarithm on $SPD(n)$: for any ${A}_{1}\in SPD(n)$, ${log}_{{A}_{1}}:SPD(n)\to {T}_{{A}_{1}}SPD(n)$ is given by
$${log}_{{A}_{1}}({A}_{2})={({A}_{1}{A}_{2})}^{\frac{1}{2}}+{({A}_{2}{A}_{1})}^{\frac{1}{2}}-2{A}_{1}.$$
By solving the inverse problem of the above equation, we obtain the expression of the Wasserstein exponential.
Theorem 3. In a small open ball $B(0,\epsilon ),\epsilon >0$, in ${T}_{A}SPD(n)\cong {\mathbb{R}}^{\frac{1}{2}n(n+1)}$, the Wasserstein exponential at A, ${exp}_{A}:B(0,\epsilon )\to SPD(n)$, is explicitly expressed by
$${exp}_{A}X=A+X+{\Gamma}_{A}[X]\,A\,{\Gamma}_{A}[X].$$
Proof. By choosing normal coordinates [21] at A, there always exist neighborhoods on which ${exp}_{A}$ is well-defined. From (15), given that ${exp}_{A}X$ is well-defined, it satisfies
This equation can be converted into the Sylvester equation, and we can express its solution as
Therefore, we have
which finishes the proof. □
Remark 3. We call the first two terms $A+X$ on the right-hand side the Euclidean exponential, and the last term ${\Gamma}_{A}[X]A{\Gamma}_{A}[X]$ the Wasserstein correction for this bent manifold.
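Theorem 3, combined with the Sylvester solver of Algorithm 1, gives a one-line exponential. A sketch (function names are ours):

```python
import numpy as np

def gamma(A, X):
    # Algorithm 1: solve A K + K A = X.
    lam, Q = np.linalg.eigh(A)
    C = Q.T @ X @ Q
    return Q @ (C / (lam[:, None] + lam[None, :])) @ Q.T

def bw_exp(A, X):
    # exp_A(X) = A + X + Gamma_A[X] A Gamma_A[X]   (Theorem 3)
    G = gamma(A, X)
    return A + X + G @ A @ G
```

Since $A{\Gamma}_{A}[X]+{\Gamma}_{A}[X]A=X$, this factors as ${exp}_{A}X=(I+{\Gamma}_{A}[X])A(I+{\Gamma}_{A}[X])$, which makes the degeneration analysis of Theorem 5 transparent.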
Corollary 2. The geodesic with initial conditions $\gamma (0)=A,\dot{\gamma}(0)=X$ on $(SPD(n),{g}_{W})$ has the explicit expression
$$\gamma (t)={exp}_{A}(tX)=A+tX+{t}^{2}{\Gamma}_{A}[X]\,A\,{\Gamma}_{A}[X].$$
Using the exponential map, one can directly construct Jacobi fields with a geodesic variation.
Theorem 4. Along a geodesic $\gamma (t)$ with $\gamma (0)=A\in SPD(n),\dot{\gamma}(0)=X\in {T}_{A}SPD(n)$, there exists a unique normal Jacobi field $J(t)$ with initial conditions $J(0)=0,{\nabla}_{\dot{\gamma}(0)}J=Y\in {T}_{A}SPD(n)$, where ${g}_{W}{}_{A}(X,Y)=0$. As in [18], $J(t)$ is constructed by substituting (16) into (22), and Theorem 4 follows by direct computation.
Subsequently, the next natural question is: how far can a geodesic be extended? This question is equivalent to determining the largest domain of the exponential map. We still focus on diagonal matrices.
Theorem 5. For any $A\in SPD(n)$ and $X\in {T}_{A}SPD(n)$, ${exp}_{A}(tX):[0,\epsilon )\to SPD(n)$ is well-defined if and only if
$$\epsilon \le {\epsilon}_{max}:=-\frac{1}{{\lambda}_{min}}\quad (\text{with}\ {\epsilon}_{max}=+\infty \ \text{if}\ {\lambda}_{min}\ge 0),$$
where ${\lambda}_{min}$ is the minimal eigenvalue of ${\Gamma}_{A}[X]$. Proof. Evidently, ${\epsilon}_{max}=min\{s>0\mid \mathrm{det}({exp}_{A}(sX))=0\}$. By (19), we have
$$\mathrm{det}({exp}_{A}(sX))=\mathrm{det}(A)\prod_{i}{\left(1+s\,{\lambda}_{i}({\Gamma}_{A}[X])\right)}^{2},$$
where $\lambda ({\Gamma}_{A}[X])$ denotes an eigenvalue of ${\Gamma}_{A}[X]$. Thus, ${\epsilon}_{max}=min\left\{-\frac{1}{\lambda ({\Gamma}_{A}[X])}\ \Big|\ \lambda ({\Gamma}_{A}[X])<0\right\}$. □
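Theorem 5 is easy to evaluate in code: the geodesic $(I+t{\Gamma}_{A}[X])A(I+t{\Gamma}_{A}[X])$ first degenerates at $t=-1/\lambda $ for the most negative eigenvalue λ of ${\Gamma}_{A}[X]$. A sketch (function names are ours):

```python
import numpy as np

def gamma(A, X):
    # Algorithm 1: solve A K + K A = X.
    lam, Q = np.linalg.eigh(A)
    C = Q.T @ X @ Q
    return Q @ (C / (lam[:, None] + lam[None, :])) @ Q.T

def eps_max(A, X):
    # Maximal extension time of exp_A(tX): det((I + t*Gamma)A(I + t*Gamma))
    # first hits zero at t = -1/lambda over negative eigenvalues of Gamma.
    lam = np.linalg.eigvalsh(gamma(A, X))
    neg = lam[lam < 0]
    return np.inf if neg.size == 0 else float((-1.0 / neg).min())
```

For example, at $A=I$ with $X=-2I$ one gets ${\Gamma}_{A}[X]=-I$, so the geodesic $(1-t)^{2}I$ degenerates exactly at $t=1$.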
Corollary 3. The Wasserstein metric ${g}_{W}$ on $SPD(n)$ is incomplete.
Corollary 3 follows directly from the Hopf–Rinow theorem [22]. Theorem 5 and the next theorem help us to comprehend the size of $(SPD(n),{g}_{W})$ as seen from each point.
Figure 1 shows geodesics starting from different origins on $SPD(2)$. From this group of pictures, we can observe the outline of the manifold and some behaviors of geodesics.
Using ${\epsilon}_{max}$, we can obtain the injectivity radius $r(A)$ for every $A\in SPD(n)$. Geometrically speaking, $r(A)$ is the maximal radius of a ball on which ${exp}_{A}$ is well-defined.
Theorem 6. The Wasserstein radius $r:SPD(n)\to (0,+\infty )$ is given by
$$r(A)=\sqrt{{\lambda}_{min}(A)},$$
and the function $r(A)$ is continuous. Proof of Theorem 6 can be found in Appendix C. Due to geodesic convexity, the radius actually equals the Wasserstein distance from a point of $SPD(n)$ to the 'boundary' of the manifold. It also measures the degree of degeneracy of a positive-definite symmetric matrix by $\sqrt{{\lambda}_{min}}$.
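The interpretation of the radius as a distance to the boundary can be checked against the closed-form Bures–Wasserstein distance $d{(A,B)}^{2}=\mathrm{tr}\,A+\mathrm{tr}\,B-2\,\mathrm{tr}{({A}^{1/2}B{A}^{1/2})}^{1/2}$. A sketch (function names are ours); zeroing out the smallest eigenvalue of a diagonal matrix reaches a degenerate matrix at distance $\sqrt{{\lambda}_{min}}$:

```python
import numpy as np

def spd_sqrt(M):
    # Symmetric square root of a symmetric positive (semi)definite matrix.
    lam, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(np.maximum(lam, 0.0))) @ Q.T

def bw_dist(A, B):
    # Bures-Wasserstein distance between positive (semi)definite matrices.
    s = spd_sqrt(A)
    val = np.trace(A) + np.trace(B) - 2.0 * np.trace(spd_sqrt(s @ B @ s))
    return np.sqrt(max(float(val), 0.0))
```

For instance, $A=\mathrm{diag}(4,1)$ lies at distance $1=\sqrt{{\lambda}_{min}(A)}$ from $\mathrm{diag}(4,0)$.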
Figure 2 shows three maximal geodesic balls with different centers on $SPD(2)$. From the viewpoint of ${\mathbb{R}}^{3}$, the three balls have different sizes in the sense of the Euclidean distance, but on $(SPD(2),{g}_{W})$, all of them have the same radius.
3.3. Connection
In this section, we study the Riemannian connection of $(SPD(n),{g}_{W})$, called the Wasserstein connection. The flatness of $(GL(n),{g}_{E})$ and the structure of the Riemannian submersion will bring a series of conveniences to our work. During computation, we denote both tensor actions, of ${g}_{W}$ on $SPD(n)$ and of ${g}_{E}$ on $GL(n)$, by $\langle \cdot ,\cdot \rangle $. We denote the Euclidean connection by D and the Wasserstein connection by ∇.
The main idea for expressing the Wasserstein connection is to compute the horizontal component of the Euclidean covariant derivative of lifted vector fields. We shall prove:
Lemma 1. The Euclidean connection is a lift of the Wasserstein connection. For any smooth vector fields X and Y on $SPD(n)$ with horizontal lifts $\tilde{X}$ and $\tilde{Y}$, respectively, the following equation holds Proof of Lemma 1 can be found in Appendix D. This lemma holds for general Riemannian submersions. The reason we reprove it for our case is that we will need some intermediate results of the proof later. Using Lemma 1, we obtain a direct corollary, which is one of the essential results of this paper.
Corollary 4. The Wasserstein connection has an explicit expression:
where $\mathrm{d}Y(X)$ is the Euclidean directional derivative. Proof. From Lemma 1 and (A14) (in Appendix D), we have
The linearity, the Leibniz rule and the symmetry of the Wasserstein connection are easily checked from the expression. □
The vertical component of the lifted covariant derivative of Y along X is a vector field on $GL(n)$ whose value at $\tilde{A}$ is defined by
We say that ${\mathcal{T}}_{\tilde{A}}$ is an $\mathcal{A}$-tensor. The whole vector field is denoted by $\mathcal{T}(X,Y)$. From the definition and the previous results, we can obtain the expression of ${\mathcal{T}}_{\tilde{A}}(X,Y)$.
Proposition 5. ${\mathcal{T}}_{\tilde{A}}(\cdot ,\cdot )$ is an antisymmetric bilinear map ${T}_{A}SPD(n)\otimes {T}_{A}SPD(n)\to {T}_{\tilde{A}}GL(n)$, and it satisfies
Proof. Using (A12) (in Appendix D), (6) and (27), we have
where (31) shows that ${\mathcal{T}}_{\tilde{A}}(X,Y)$ depends only on $\tilde{A}$ and the vectors in ${T}_{A}SPD(n)$. The multilinearity and the antisymmetry ${\mathcal{T}}_{\tilde{A}}(X,Y)=-{\mathcal{T}}_{\tilde{A}}(Y,X)$ are easily checked. □
In the following parts, we will show that the tensor $\mathcal{T}(X,Y)$ plays a significant role in computing curvature.
3.4. Curvature
In this part, we aim to understand the curvature of $(SPD(n),{g}_{W})$. Although there exist some relevant results giving abstract expressions for general cases, we obtain simpler expressions and derive the scalar curvature via a special basis.
3.4.1. Riemannian Curvature Tensor
First, we derive the Riemannian curvature of $(SPD(n),{g}_{W})$. We denote the Euclidean curvature on the bundle (which vanishes identically) by $\tilde{R}$, and the Wasserstein (Riemannian) curvature on $(SPD(n),{g}_{W})$ by R.
Theorem 7. For any $A\in SPD(n)$ and smooth vector fields $X,Y$ on $SPD(n)$, the Wasserstein curvature tensor $R(X,Y,X,Y):={\langle {R}_{XY}X,Y\rangle}_{A}$ at A has an explicit expression
Proof of Theorem 7 can be found in Appendix E. The expression
$$R(X,Y,X,Y)=3{\parallel \mathcal{T}(X,Y)\parallel}^{2}$$
has been derived before by other research groups in a similar way. However, here we use another way to calculate the curvature tensor and find a more explicit expression, which is easier than expanding ${\parallel \mathcal{T}(X,Y)\parallel}^{2}$ directly. In addition, from ${\parallel \mathcal{T}(X,Y)\parallel}^{2}\ge 0$ and (A23) (in Appendix E), we can obtain the following corollary.
Corollary 5. $(SPD(n),{g}_{W})$ has nonnegative curvatures, namely By solving the Sylvester equation with Algorithm 1, we can simplify the expression. We give the sectional curvature K of the section $\mathrm{span}\{X(A),Y(A)\}$,
where we use the same notations as in Algorithm 1. In particular, in the diagonal case, we observe that the sectional curvature conforms to an inverse-ratio law:
These results agree with our visual impression of $(SPD(n),{g}_{W})$, as presented in Figure 1, where the manifold tends to be flat when k increases.
3.4.2. Sectional Curvature
Now, we derive more explicit expressions for the sectional curvature and the scalar curvature. Conventionally, we only need to consider the diagonal case. Before that, we introduce a basis of $Sym(n)$, the tangent space of $SPD(n)$. Define $\left\{{S}^{p,q}\right\}$ as
$${S}^{p,q}:=[{\delta}_{ip}{\delta}_{jq}+{\delta}_{iq}{\delta}_{jp}],$$
where the superscripts $p,q$ mark the nonzero elements of ${S}^{p,q}$ and $\delta $ is the Kronecker delta. Apparently, $\{{S}^{p,q}\mid 1\le p\le q\le n\}$ forms a basis of $Sym(n)$. For simplicity, we sometimes denote ${S}^{p,q},{S}^{r,t}$ by ${S}_{1},{S}_{2}$, respectively. In this way, we can express the curvature under this basis.
By direct calculation, we have
By Algorithm 1, we know that ${E}_{S}=\left(\frac{{S}_{ij}}{{\lambda}_{i}+{\lambda}_{j}}\right)$ ($Q=I$ in the decomposition of $\mathsf{\Lambda}$). Note that the entries of $\mathsf{\Lambda}$ are positive and those of ${S}_{1},{S}_{2}$ are nonnegative; therefore, we have
According to the antisymmetry of the curvature tensor, a nonvanishing curvature requires $\{p,q\}\ne \{r,t\}$. Moreover, by definition we know ${S}^{p,q}={S}^{q,p},{S}^{r,t}={S}^{t,r}$. Without loss of generality, we only need to consider the following particular case:
Theorem 8. For any diagonal matrix $\mathsf{\Lambda}=\mathrm{diag}({\lambda}_{1},\cdots ,{\lambda}_{n})\in SPD(n)$, where ${\lambda}_{1}\le {\lambda}_{2}\le \cdots \le {\lambda}_{n}$, the Wasserstein sectional curvature satisfies
where ${S}_{1}={S}^{p,q},{S}_{2}={S}^{r,t},p=r,q\ne t$. Proof of Theorem 8 can be found in Appendix F. With the above bound on the sectional curvatures, we can easily see that the sectional curvature is controlled by the second smallest eigenvalue, which implies that the curvature will seldom explode, even on an almost degenerate domain. Only when the matrices degenerate in two or more dimensions do the curvatures become very large. This phenomenon ensures that the curvature information remains meaningful in most applications. Some examples of this phenomenon will be observed later.
3.4.3. Scalar Curvature
In the last part of this section, we calculate the scalar curvature directly.
Theorem 9. For any $A\in SPD(n)$, its scalar curvature $\rho (A)$ is
where the diagonal matrix $\mathsf{\Lambda}=\mathrm{diag}({\lambda}_{1},\cdots ,{\lambda}_{n})$ is orthogonally similar to A, and $U={\left(\frac{1}{{\lambda}_{i}+{\lambda}_{j}}\right)}_{ij}$. Proof of Theorem 9 can be found in Appendix G.
Figure 3 presents some examples of scalar curvatures on $(SPD(2),{g}_{W})$, which illustrates our argument at the end of Section 3.4.2.
4. Point Cloud Denoising
Denoising, or outlier removal, is a fundamental step of point cloud preprocessing, since real-world data are often polluted by noise. There is an immense literature on point cloud denoising, and widely used algorithms are packaged as built-in functions of software libraries. For example, PCL [16] is a popular platform for point cloud processing, which collects four denoising schemes. However, these methods fail to give satisfactory performance when point clouds are polluted by high-density noise. To solve this problem, we consider both the statistical and the geometric structure of the data and design a new algorithm.
The idea is that, by embedding the original point cloud from Euclidean space into $SPD(n)$, the Wasserstein scalar curvature gives essential information separating noise from true data. Therefore, our new algorithm mainly contains two steps: first, we obtain the desired embedding by fitting a Gaussian distribution locally at each point. Then, we identify noise by looking at the histogram of the Wasserstein scalar curvature. Due to the flatness of the region occupied by noise, it is reasonable to classify points with small curvature as noise. The threshold is set to the first local minimum of the histogram. We call this new scheme adaptive Wasserstein curvature denoising (AWCD).
In the following, we introduce two traditional denoising methods, called radius outlier removal (ROR) and statistical outlier removal (SOR). Then, we explain the details of AWCD. Additionally, we carry out experiments on different datasets, with a comparison to the two classical methods. From the experimental results, AWCD presents better performance regardless of the data size and the density of noise. We also give a time complexity analysis of each denoising algorithm. The results show that AWCD is as efficient as the other classical methods. Thus, it is applicable in many practical tasks.
4.1. Radius Outlier Removal
In Radius Outlier Removal (ROR, see Algorithm 2), points are clustered into two categories according to their local density: points with low density tend to be recognized as noise, whereas points with high density are recognized as true data. ROR requires two parameters: a preset radius d for the local neighborhoods and a threshold $\alpha $ for the least number of points in each neighborhood.
Algorithm 2 Radius Outlier Removal.
Input: initial point cloud ${D}_{0}$, parameters d, $\alpha $. Output: cleaned point cloud ${D}_{1}$.
1: search the d-radius neighborhood ${N}_{i}=\{{P}_{j}\in {D}_{0}\mid \parallel {P}_{j}-{P}_{i}\parallel \le d\}$ for each point ${P}_{i}$;
2: if the number of neighbors $|{N}_{i}|\ge \alpha $, put ${P}_{i}$ into ${D}_{1}$;
3: return ${D}_{1}$.

As an illustration, we add uniform noise to the Stanford Bunny with 10,000 points (see Figure 4).
Then, we apply ROR to denoise the polluted point cloud. The result is shown in Figure 5. From visual observation, ROR preserves almost all true points but fails to recognize a small portion of the noise.
In fact, from a series of repeated experiments, we find that ROR is sensitive to the choice of its manual parameters. A small radius makes ROR ineffective, while a large radius wrongly recognizes true points as noise. One disadvantage of ROR is that there exists no universal method to determine the best parameters. Further, since ROR performs a fixed-radius search with an undetermined number of neighbors per point, the time complexity can reach $O({n}^{2})$, where n is the number of points. Thus, in practice, it is often difficult to make a trade-off between the efficiency and the effectiveness of ROR.
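A minimal sketch of ROR using a k-d tree (function names are ours; `scipy.spatial.cKDTree` performs the fixed-radius search):

```python
import numpy as np
from scipy.spatial import cKDTree

def ror(points, d, alpha):
    # Radius outlier removal (Algorithm 2): keep points having at least
    # `alpha` neighbors (besides themselves) within radius d.
    tree = cKDTree(points)
    counts = np.array([len(tree.query_ball_point(p, d)) - 1 for p in points])
    return points[counts >= alpha]
```

As the text notes, the output is quite sensitive to the choice of `d` and `alpha`.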
4.2. Statistical Outlier Removal
Compared to ROR, Statistical Outlier Removal (SOR) considers more detailed local structure than density alone. SOR, shown in Algorithm 3, is one of the most popular methods for preprocessing point clouds, due to its efficiency when dealing with low-density noise. However, SOR performs worse than ROR when the noise has high density. The main idea of SOR comes from the one-sigma law of classical statistics [23]. An outlier is believed to be far from the center of its k-nearest neighborhood. Conversely, a true point should lie in a confidence area of its neighborhood. Let $\mathsf{\Phi}$ be a d-variate Gaussian distribution with expectation $\mu $ and covariance $\mathsf{\Sigma}$, and let P be a fixed point in ${\mathbb{R}}^{d}$. Then, $\mathsf{\Phi}$ induces a Gaussian distribution on the line $\{\mu +t{v}_{P}\mid t\in \mathbb{R}\}$, where ${v}_{P}=\frac{P-\mu}{\parallel P-\mu \parallel}$. In fact, writing the eigendecomposition of $\mathsf{\Sigma}$ as $\mathsf{\Sigma}=E\,\mathrm{diag}({\sigma}_{1}^{2},\cdots ,{\sigma}_{d}^{2}){E}^{T}$, where $E=[{e}_{1},\cdots ,{e}_{d}]$ is an orthogonal matrix, and expanding ${v}_{P}={\sum}_{i=1}^{d}{\lambda}_{i}{e}_{i}$, the projected Gaussian distribution in direction ${v}_{P}$ has zero expectation and variance ${\sum}_{i=1}^{d}{\lambda}_{i}^{2}{\sigma}_{i}^{2}$. According to the one-sigma law, we say that P is in the confidence area of $\mathsf{\Phi}$ if
$${\parallel P-\mu \parallel}^{2}\le {\sum}_{i=1}^{d}{\lambda}_{i}^{2}{\sigma}_{i}^{2}={v}_{P}^{T}\mathsf{\Sigma}{v}_{P},$$
which is equivalent to
$${(P-\mu )}^{T}\mathsf{\Sigma}(P-\mu )\ge {\parallel P-\mu \parallel}^{4}.$$
This inequality is a generalization of the one-sigma law to high dimensions.
Algorithm 3 Statistical Outlier Removal.
Input: initial point cloud ${D}_{0}$, parameter k. Output: cleaned point cloud ${D}_{1}$.
1: search the kNN ${N}_{i}$ of each point ${P}_{i}$;
2: compute the local mean ${\mu}_{i}$ and the local covariance ${\mathsf{\Sigma}}_{i}$ over ${N}_{i}$;
3: if ${({P}_{i}-{\mu}_{i})}^{T}{\mathsf{\Sigma}}_{i}({P}_{i}-{\mu}_{i})\ge {\parallel {P}_{i}-{\mu}_{i}\parallel}^{4}$, put ${P}_{i}$ into ${D}_{1}$;
4: return ${D}_{1}$.

Thus, SOR consists of three steps: first, we search the k-nearest neighbors (kNN) of every point. Then, we compute the empirical mean and covariance, under a Gaussian assumption, for each neighborhood. Finally, true points are identified using (46). SOR requires a single parameter k for the kNN search. Again, as an illustration, we use the data of Figure 4. After SOR, the result is shown in Figure 6.
We use a KD-tree for the kNN search. Thus, the time complexity is $O(kn\,\mathrm{log}\,n)$, where k is the number of neighbors and n is the number of points. The remaining steps finish in $O(n)$ time. Therefore, the total time complexity is $O(kn\,\mathrm{log}\,n)$.
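A compact sketch of Algorithm 3 (function names are ours), applying the confidence-area criterion (46) to each k-nearest neighborhood:

```python
import numpy as np
from scipy.spatial import cKDTree

def sor(points, k):
    # Statistical outlier removal (Algorithm 3): keep points lying in the
    # generalized one-sigma confidence area of their k-nearest neighborhood.
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)          # +1: the query point itself
    keep = []
    for i, neigh in enumerate(idx):
        nbrs = points[neigh[1:]]                  # drop the point itself
        mu = nbrs.mean(axis=0)
        cov = np.cov(nbrs.T)                      # local covariance
        r = points[i] - mu
        keep.append(r @ cov @ r >= (r @ r) ** 2)  # (P-mu)^T S (P-mu) >= ||P-mu||^4
    return points[np.array(keep)]
```

A single far-away outlier violates the criterion by orders of magnitude and is removed.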
4.3. Adaptive Wasserstein Curvature Denoising
Note that the key step in SOR is computing the local covariance, which is a positive-definite matrix. Motivated by the idea of SOR, we extract the covariance matrix at each point, which is equivalent to embedding the original point cloud into $SPD(n)$. From an intuitive perspective, since the true data present a particular pattern, their covariance matrices should have a large Wasserstein curvature. Conversely, the covariance matrices of noise form a flat region. Hence, AWCD is based on the principal hypothesis that the Wasserstein curvature of true data is larger than that of noise.
Under this hypothesis, what we need to do is set a threshold to pick out points with a small Wasserstein curvature. To do so, we gather all the information in a histogram counting the number of points at each curvature level. By the continuity of the curvature function, true data and noise will form two different 'hills'.
Figure 7 shows an example of such a histogram.
The phase change happens at the borderline between the two hills; i.e., we seek the local minimum of the histogram that separates them. In Figure 7, this critical value is annotated as the 'marked curvature'. In this way, we do not need to set the threshold manually and instead achieve an adaptive selection process. Algorithm 4 shows the processing of this adaptive denoising via Wasserstein curvature.
Algorithm 4 Adaptive Wasserstein Curvature Denoising.
Input: initial point cloud ${D}_{0}$, parameter k. Output: cleaned point cloud ${D}_{1}$.
1: search the kNN ${N}_{i}$ of each point ${P}_{i}$;
2: compute the local mean and local covariance as before;
3: compute the Wasserstein scalar curvature $\rho ({\mathsf{\Sigma}}_{i})$ as in (43);
4: construct the curvature histogram and determine the marked curvature ${\rho}_{0}$;
5: if $\rho ({\mathsf{\Sigma}}_{i})\ge {\rho}_{0}$, put ${P}_{i}$ into ${D}_{1}$;
6: return ${D}_{1}$.

We use the same example as in Figure 4. The performance of AWCD is shown in Figure 8.
In this example, AWCD removes almost all the noise far from the Stanford Bunny and retains almost all the true data. The only problem is that a small portion of noise lying on the surface causes false positives, and some true data located on flat parts are wrongly removed.
Since the main step of AWCD is also the kNN search, its time complexity is the same as that of SOR, namely $O(kn\,\mathrm{log}\,n)$. Therefore, AWCD is applicable in practice. It is remarkable that AWCD is effective for data with dense noise and robust to its unique parameter k.
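The adaptive thresholding step can be sketched as follows (function names are ours; the per-point curvature values themselves would come from Theorem 9 applied to the local covariances):

```python
import numpy as np

def marked_curvature(curvatures, bins=50):
    # Build a histogram of the per-point curvatures and return the first
    # local minimum separating the 'noise' hill from the 'data' hill.
    hist, edges = np.histogram(curvatures, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    for i in range(1, bins - 1):
        if hist[i] <= hist[i - 1] and hist[i] < hist[i + 1]:
            return centers[i]          # first local minimum
    return centers[bins // 2]          # fallback: no clear valley

def awcd_filter(points, curvatures):
    # Keep points whose curvature is at least the marked curvature.
    return points[curvatures >= marked_curvature(curvatures)]
```

On a clearly bimodal curvature distribution, the returned threshold lands in the valley between the two hills.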
4.4. Experiments
We use ROR, SOR and AWCD to denoise polluted data sets with noise of different density levels. The point clouds come from the Stanford 3D scanning repository, including the Stanford Bunny, Duke Dragon, Armadillo, Lucy and Happy Buddha. For each data set, we add noise and record its signal-noise ratio (SNR). To show the influence of the data size, we downsample the original data sets to different scales.
We adopt three criteria to measure the performance of the algorithms: the true positive rate (TPR), the false positive rate (FPR) and the signal-noise ratio growth (SNRG). TPR describes the accuracy in preserving true points from the unpolluted data set. FPR describes the fraction of noisy points that survive the denoising. SNRG describes the improvement of the SNR after processing. For any polluted point cloud ${D}_{0}=D\cup N$, where $D$ is the set of true data points and $N$ is the set of noise, let ${D}_{1}$ denote the cleaned point cloud produced by a denoising algorithm. These measurements are computed as
$$\mathrm{TPR}=\frac{|{D}_{1}\cap D|}{|D|},\qquad \mathrm{FPR}=\frac{|{D}_{1}\cap N|}{|N|},\qquad \mathrm{SNRG}=1-\frac{\mathrm{SNR}({D}_{0})}{\mathrm{SNR}({D}_{1})},$$
where $|\cdot|$ denotes the cardinality of a finite set. Intuitively, a higher TPR and SNRG together with a lower FPR indicate better performance of an algorithm. The experimental results are shown in
Table A1 in
Appendix I. In each experiment, we highlight the lowest FPR, the highest TPR and SNRG values over 99%.
Table A1 shows the superiority of AWCD over ROR and SOR. In general, AWCD removes almost all noise while preserving the true data, except for Armadillo.
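The three criteria above can be computed directly from set cardinalities. A minimal sketch follows, with points represented as hashable tuples; the SNRG formula used here, $1-\mathrm{SNR}({D}_{0})/\mathrm{SNR}({D}_{1})$, is one plausible reading of "SNR growth" and may differ from the paper's exact definition.

```python
def denoise_metrics(true_pts, noise_pts, cleaned):
    """TPR/FPR/SNRG for one denoising run (sketch).

    true_pts, noise_pts, cleaned: iterables of hashable points
    (e.g. coordinate tuples).  SNR is taken as the ratio of true
    points to noise points in a cloud.
    """
    D, N, D1 = set(true_pts), set(noise_pts), set(cleaned)
    tpr = len(D1 & D) / len(D)          # true points preserved
    fpr = len(D1 & N) / len(N)          # noise points surviving
    snr_before = len(D) / len(N)
    kept_noise = len(D1 & N)
    # All noise removed => SNR after is infinite => SNRG saturates at 1.
    snr_after = len(D1 & D) / kept_noise if kept_noise else float("inf")
    snrg = 1.0 - snr_before / snr_after if snr_after else 0.0
    return tpr, fpr, snrg
```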
5. Edge Detection
In this part, we apply the Wasserstein curvatures to detect edges in noisy images. This application follows the idea that edge regions contain more local information, while relatively flat regions tend to look locally like white noise. Hence, the Wasserstein curvatures have a natural advantage in depicting local information. This leads to the Wasserstein sectional curvature edge detection (WSCED) of Algorithm 5.
Algorithm 5 Wasserstein sectional curvature edge detection.
Input: initial grayscale image ${F}_{0}$ of $n\times m$ pixels, parameter $k$
Output: edge figure ${F}_{e}$
1: search $k$NN ${N}_{ij}$ for each point ${P}_{ij}$;
2: compute every local covariance ${\mathsf{\Sigma}}_{ij}$ to obtain the covariance image $CI$, which is an $(n-2k)\times (m-2k)$ array of matrices ${\mathsf{\Sigma}}_{ij}$;
3: determine the section ${\sigma}_{ij}:={X}_{ij}\wedge {Y}_{ij}$ at every point ${\mathsf{\Sigma}}_{ij}$ of $CI$ by computing the tangent vectors ${X}_{ij}={\mathsf{\Sigma}}_{(i+1)j}-{\mathsf{\Sigma}}_{(i-1)j}$ and ${Y}_{ij}={\mathsf{\Sigma}}_{i(j+1)}-{\mathsf{\Sigma}}_{i(j-1)}$;
4: compute the Wasserstein sectional curvature ${K_w}_{{\mathsf{\Sigma}}_{ij}}({\sigma}_{ij})$ with (36) to obtain the curvature image ${F}_{e}$, which is an $(n-2k)\times (m-2k)$ real matrix;
5: return ${F}_{e}$.
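Algorithm 5 can be sketched as below. Two points are assumptions of this sketch rather than the paper's method: each pixel is treated as a 3-D sample $(i, j, F_0(i,j))$ with its $k$NN taken as the $(2k+1)\times(2k+1)$ window around it, and the sectional curvature formula (36) is supplied by the caller since it is defined elsewhere. The one-pixel border of $CI$ is dropped so that central differences are always defined.

```python
import numpy as np

def wsced(image, k, sectional_curvature):
    """Wasserstein sectional-curvature edge detection (sketch).

    `sectional_curvature(sigma, x, y)` should evaluate the Wasserstein
    sectional curvature at Sigma on the plane spanned by the tangent
    vectors X and Y, per Equation (36) of the paper.
    """
    n, m = image.shape
    # Steps 1-2: local covariance of (row, col, intensity) samples in a
    # (2k+1) x (2k+1) window around each interior pixel.
    cov = np.empty((n - 2 * k, m - 2 * k, 3, 3))
    for i in range(k, n - k):
        for j in range(k, m - k):
            ii, jj = np.mgrid[i - k:i + k + 1, j - k:j + k + 1]
            samples = np.stack(
                [ii.ravel(), jj.ravel(), image[ii, jj].ravel()], axis=1)
            cov[i - k, j - k] = np.cov(samples.astype(float), rowvar=False)
    # Steps 3-4: central differences along the x- and y-axes give the
    # tangent vectors spanning the section at each covariance matrix.
    h, w = cov.shape[:2]
    edge = np.zeros((h - 2, w - 2))
    for a in range(1, h - 1):
        for b in range(1, w - 1):
            X = cov[a + 1, b] - cov[a - 1, b]
            Y = cov[a, b + 1] - cov[a, b - 1]
            edge[a - 1, b - 1] = sectional_curvature(cov[a, b], X, Y)
    return edge
```

For a quick smoke test one can pass any placeholder in place of (36), e.g. `lambda s, x, y: np.linalg.norm(x) + np.linalg.norm(y)`, which vanishes on perfectly flat gradients.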

Similar to what we have done in the last section, the first step of WSCED is computing the local mean and covariance after $k$NN, which can be regarded as a two-dimensional embedding from an image into $SPD(n)$. Every pixel coordinate $(i,j)$ determines a local covariance matrix ${\mathsf{\Sigma}}_{ij}$. In the second step, we compute the sectional curvature at every ${\mathsf{\Sigma}}_{ij}$. The chosen section ${X}_{ij}\wedge {Y}_{ij}$ is determined by two difference vectors along the $x$-axis and $y$-axis,
$${X}_{ij}={\mathsf{\Sigma}}_{(i+1)j}-{\mathsf{\Sigma}}_{(i-1)j},\qquad {Y}_{ij}={\mathsf{\Sigma}}_{i(j+1)}-{\mathsf{\Sigma}}_{i(j-1)}.$$
According to (36), we obtain the chosen curvature ${KI}_{ij}={K_w}_{{\mathsf{\Sigma}}_{ij}}\left({X}_{ij}\wedge {Y}_{ij}\right)$ at ${\mathsf{\Sigma}}_{ij}$. Then, we obtain a curvature image $KI$. Finally, with an appropriate image transformation, we can detect edges on $KI$.
In simulations, we compare WSCED to traditional edge detection filters, including the Sobel, Prewitt and Laplacian filters [
24]. We aim to detect edges in images with high-density noise. From
Figure 9, we find that WSCED achieves nearly the same outcome as the Sobel and Prewitt filters, which suggests a potential connection between Wasserstein curvature and edges. This result also shows the robustness of WSCED to noise. We present more digital experiments in
Figure A1 in
Appendix H.
6. Conclusions and Future Work
In this paper, we studied the geometric characteristics of $(SPD(n),{g}_{W})$, including geodesics, the connection, Jacobi fields and curvatures. Compared with existing results, ours are simpler in form and more suitable for computation. Based on these results, we designed novel algorithms for point cloud denoising and image edge detection. Numerical experiments showed that these geometry-based methods are valid in applications. From both a theoretical and practical perspective, we gained a more comprehensive understanding of the Wasserstein geometry on $SPD(n)$, which shows that the Wasserstein metric has both deep application potential and mathematical elegance.
In our future work, on the one hand, we aim to study Wasserstein geometry on other matrix manifolds, such as the Stiefel manifold [
25], the Grassmann manifold [
26] and some complex matrix manifolds [
27]. On the other hand, we would like to generalize geometry-based methods to solve more problems in image and signal processing [
28] and data science.