Geometric Characteristics of the Wasserstein Metric on SPD(n) and Its Applications to Data Processing

The Wasserstein distance, especially between symmetric positive-definite matrices, has broad and deep influence on the development of artificial intelligence (AI) and other branches of computer science. In this paper, working with the Wasserstein metric on SPD(n), we obtain computationally feasible expressions for several geometric quantities, including geodesics, exponential maps, the Riemannian connection, Jacobi fields and curvatures, particularly the scalar curvature. Furthermore, we discuss the behavior of geodesics and prove that the manifold is globally geodesically convex. Finally, based on the Wasserstein curvature on SPD(n), we design algorithms for point cloud denoising and for edge detection in noisy images. The experimental results show the efficiency and robustness of our curvature-based methods.


Introduction
Symmetric positive-definite matrices are widely used in many fields of information science, such as stability analysis in signal processing, linear stationary systems, optimal control strategies and imaging analysis [1][2][3]; their importance can hardly be overstated [4,5]. Instead of considering a single matrix, contemporary scientists tend to study the global structure of the set of all n × n symmetric positive-definite matrices, known as SPD(n). SPD(n) can be endowed with various structures. The most traditional, the Euclidean metric, is induced as a submanifold metric from the Euclidean inner product on the space of matrices. X. Pennec, P. Fillard et al. [6] defined the affine-invariant Riemannian metric. V. Arsigny, P. Fillard et al. [7] exhibited a Lie group [8] structure on SPD(n), which admits a bi-invariant metric called the Log-Euclidean metric.
Recently, by constructing a principal bundle, Y. Li, M. Wong et al. [9] and S. Zhang et al. [10] defined a new Riemannian metric on SPD(n) whose geodesic distance coincides with the Wasserstein-2 distance [11,12]; it is therefore called the Wasserstein metric. This distance, rather than the metric itself, has been widely used in artificial intelligence [13]. On the geometric side, encouragingly, T. Asuka [14] and E. Massart, P.-A. Absil [15] derived a series of theoretical expressions for geometric quantities. However, these expressions are too general or too complicated to be applied directly. In this paper, we derive more computationally feasible expressions in a concrete case. Moreover, we give the Jacobi fields and the scalar curvature.
With the blooming of data science, point cloud processing, especially denoising, plays an increasingly important role in data-related research and engineering. There is an immense literature on point cloud denoising, and widely used algorithms are packaged as built-in functions of software libraries such as PCL [16]. These methods share a common drawback when high-density noise is added to point clouds. Utilizing the geometric structure of SPD(n), we design a novel algorithm to overcome this drawback. Compared to traditional methods, our algorithm is more accurate and less dependent on manually chosen parameters.
In addition, we apply our theory to image edge detection, a classical problem in image processing, and design a new detection algorithm. Different from traditional gradient-based filters, such as Sobel, Prewitt and Laplacian, we exploit the connection between the Wasserstein sectional curvature and edges. Experiments show the feasibility of our algorithm.
The paper is organized as follows. In Section 2, we introduce some basic knowledge of the Riemannian manifold (SPD(n), g_W) and consider its symmetry as well. In Section 3, we describe the Wasserstein geometry of SPD(n), including the geodesics, exponential map, connection and curvature. In particular, we prove the geodesic convexity and the nonexistence of cut loci and conjugate points. In Section 4, we design an adaptive algorithm to denoise point clouds. In Section 5, we develop a curvature-based method to detect image edges. Proofs and detailed numerical results are presented in the Appendices.

Notation
In this paper, we adopt conventional notations from algebra and geometry. Riemannian manifolds are denoted as pairs (manifold, metric). For example, our main object of interest is (SPD(n), g_W), meaning SPD(n) endowed with the Wasserstein metric. R^n is the n-dimensional Euclidean space. M(n) represents the set of n × n matrices, Sym(n) the set of n × n symmetric matrices, and O(n) the set of n × n orthogonal matrices. T_A M is conventionally the tangent space of M at a point A.
Λ always represents a diagonal n × n matrix. For an n × n matrix A, λ(A) or λ_i(A) denotes an eigenvalue or the i-th eigenvalue of A, respectively. A matrix A with entries A_ij will be written as [A_ij]. The identity matrix is denoted by I. In this paper, we conventionally write points on manifolds as A, B and vector fields as X, Y.
The Sylvester equation is one of the most classical matrix equations. The following special case plays a key role in understanding the geometry of (SPD(n), g_W):

AK + KA = X, A ∈ SPD(n), X ∈ Sym(n). (1)

We denote the solution K of (1) by Γ_A[X]; thus the matrix Γ_A[X] ∈ M(n) satisfies A Γ_A[X] + Γ_A[X] A = X. From a geometric viewpoint, the existence and uniqueness of the solution are guaranteed in the cases considered in this paper. Some properties of Γ_A[X] can be found in Appendix A, and more details about the Sylvester equation are presented in [17]. We recall an algorithm for this kind of Sylvester equation, which offers an explicit expression of the solution depending only on the eigenvalue decomposition; more details can be found in [10]. Algorithm 1 will be used frequently in what follows and helps us comprehend the geometry of SPD(n). Note that this algorithm also works for general X ∈ M(n); in particular, when X is symmetric (skew-symmetric), Γ_A[X] is also symmetric (skew-symmetric). Moreover, the algorithm simplifies when A is diagonal.
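For concreteness, Algorithm 1 can be sketched in a few lines of NumPy (a minimal illustration in the spirit of the algorithm, not the paper's code; the function name `gamma` is ours): diagonalize A = QΛQ^T, divide the transformed right-hand side entrywise by λ_i + λ_j, and transform back.

```python
import numpy as np

def gamma(A, X):
    """Solve the Sylvester equation A K + K A = X for K = Gamma_A[X].

    Eigendecomposition approach: with A = Q diag(lam) Q^T, transform X
    into the eigenbasis, divide entrywise by lam_i + lam_j, transform back.
    """
    lam, Q = np.linalg.eigh(A)               # lam > 0 since A is SPD
    Xt = Q.T @ X @ Q                         # X in the eigenbasis of A
    Kt = Xt / (lam[:, None] + lam[None, :])  # entrywise division
    return Q @ Kt @ Q.T
```

As the text remarks, a symmetric X yields a symmetric Γ_A[X], and the division step becomes trivial when A is already diagonal.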

Wasserstein Metric
In this part, we introduce the Wasserstein metric on SPD(n).

Definition 1. For any A ∈ SPD(n) and X, Y ∈ T_A SPD(n), define

g_W(X, Y)|_A := (1/2) tr(Γ_A[X] Y) = tr(Γ_A[X] A Γ_A[Y]).

In the second equality, we use the facts that Γ_A[X], Γ_A[Y], A are all symmetric and that A Γ_A[X] + Γ_A[X] A = X. One can check that g_W is a symmetric, non-degenerate bilinear tensor field on SPD(n) [18]. We call g_W the Wasserstein metric.
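In the form standard in the Bures–Wasserstein literature, the metric evaluates as g_W(X, Y)|_A = (1/2) tr(Γ_A[X] Y) = tr(Γ_A[X] A Γ_A[Y]); the following sketch (function names ours) checks both expressions numerically.

```python
import numpy as np

def gamma(A, X):
    # Solve A K + K A = X via eigendecomposition of A (Algorithm 1 style).
    lam, Q = np.linalg.eigh(A)
    Xt = Q.T @ X @ Q
    return Q @ (Xt / (lam[:, None] + lam[None, :])) @ Q.T

def g_W(A, X, Y):
    """Wasserstein metric at A, evaluated as (1/2) tr(Gamma_A[X] Y)."""
    return 0.5 * np.trace(gamma(A, X) @ Y)
```

The equality of the two expressions follows from the Sylvester identity A Γ_A[Y] + Γ_A[Y] A = Y together with the symmetry of the three factors.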
Denote by g_E(X̃, Ỹ) := tr(X̃^T Ỹ), for all Ã ∈ GL(n) and X̃, Ỹ ∈ T_Ã GL(n), the Euclidean metric on GL(n). Then, we have the following conclusions.
Proposition 1. The map σ : (GL(n), g_E) → (SPD(n), g_W), σ(Ã) = ÃÃ^T, is a Riemannian submersion [19], which means that dσ is surjective and preserves the lengths of horizontal vectors.

Remark 1. The general linear group with the Euclidean metric (GL(n), g_E) together with the projection σ forms a trivial principal bundle over SPD(n) with the orthogonal group O(n) as the structure group. This bundle structure establishes two facts [10]: SPD(n) ≅ GL(n)/O(n), and g_E remains invariant under the group action of O(n).
Before giving more conclusions, we review some concepts. For any Ã ∈ GL(n), we call a tangent vector in T_Ã GL(n) vertical if it is tangent to the fiber σ^{-1}(σ(Ã)), and horizontal if it is g_E-orthogonal to all vertical vectors. In addition, if dσ(X̃) = X ∈ T_A SPD(n), we call X̃ a lift of X, where A = σ(Ã). Using the notation Γ_A[X], we can find the horizontal lift of X ∈ T_A SPD(n).

Proposition 2.
For any Ã ∈ (GL(n), g_E) with A = σ(Ã) and any X ∈ T_A SPD(n), there is a unique horizontal lift X̃ ∈ T_Ã GL(n) of X, namely X̃ = Γ_A[X]Ã. Using Proposition 2, we can obtain the representations of horizontal and vertical vectors.

Proposition 3.
For any Ã ∈ (GL(n), g_E), T_Ã GL(n) has the orthogonal decomposition T_Ã GL(n) = H(Ã) ⊕ V(Ã), where H(Ã) = {SÃ | S ∈ Sym(n)} consists of all horizontal vectors and V(Ã) = {ÃΩ | Ω^T = −Ω} consists of all vertical vectors. Proofs of Propositions 2 and 3 can be found in [10].

Symmetry
Now we study the symmetry of (SPD(n), g_W). Consider the orthogonal group action Ψ : O(n) × SPD(n) → SPD(n), Ψ_O(A) := OAO^T. One can check that Ψ is a group action of O(n) and that every dΨ_O is an isometry, which means that O(n) is isomorphic to a subgroup of the isometry group ISO(SPD(n), g_W) on SPD(n); precisely, we have (10). According to (10), when we study local geometric characteristics, we only need to consider sorted diagonal matrices as representative elements under the orthogonal action, rather than all general points of SPD(n). Therefore, some pointwise quantities, such as the scalar curvature and the bounds of the sectional curvatures, depend only on the eigenvalues.
At the end of this part, we give the symmetry degree of (SPD(n), g_W), defined as the dimension of ISO(SPD(n), g_W).
Proof. The famous interval theorem [20] on isometry groups shows that an m-dimensional Riemannian manifold admits no isometry group of dimension strictly between m(m−1)/2 + 1 and m(m+1)/2.
On the one hand, for an m-dimensional Riemannian manifold, the dimension of the isometry group achieves the maximum m(m+1)/2 if and only if the manifold has constant sectional curvature. However, we will show later that (SPD(n), g_W) does not have constant sectional curvature, which means its symmetry degree is less than the maximum.
On the other hand, Equation (10) shows that the dimension of the Wasserstein isometry group is at least the dimension of O(n). Therefore, from dim(SPD(n)) = (n² + n)/2 and dim(O(n)) = (n² − n)/2, we obtain the desired result.

Wasserstein Geometry of SPD(n) 3.1. Geodesic
In this part, we give the expression of geodesics on (SPD(n), g_W), including the geodesic joining two points and the geodesic with given initial values. Then, we show that the whole Riemannian manifold (SPD(n), g_W) is geodesically convex, which means that we can always find a minimal geodesic joining any two points. To some extent, geodesic convexity makes up for the incompleteness of (SPD(n), g_W).
To prove the geodesic convexity of (SPD(n), g W ), we need the following theorem.
Proof of Theorem 1 can be found in Appendix B. This theorem brings some geometrical and physical facts.

Corollary 1 (geodesic convexity). (SPD(n), g_W) is a geodesically convex Riemannian manifold: between any two points A_1, A_2 ∈ SPD(n), there exists a minimal Wasserstein geodesic γ(t) that lies strictly inside SPD(n).
A similar expression for the geodesic can also be found in several papers [14,15].
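For readers who want to experiment, the minimal geodesic admits the well-known optimal-transport form γ(t) = ((1−t)I + tT) A_1 ((1−t)I + tT) with the symmetric transport map T satisfying T A_1 T = A_2, as in [14,15]; a small NumPy sketch (function names ours):

```python
import numpy as np

def sqrtm_spd(A):
    # Principal square root of an SPD matrix via eigendecomposition.
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.sqrt(lam)) @ Q.T

def bw_geodesic(A1, A2, t):
    """Point gamma(t) on the minimal Wasserstein geodesic joining A1, A2.

    T is the symmetric optimal-transport map with T A1 T = A2; for all
    t in [0, 1] the matrix gamma(t) stays inside SPD(n).
    """
    R = sqrtm_spd(A1)
    Rinv = np.linalg.inv(R)
    T = Rinv @ sqrtm_spd(R @ A2 @ R) @ Rinv
    M = (1 - t) * np.eye(A1.shape[0]) + t * T
    return M @ A1 @ M
```

One checks directly that γ(0) = A_1 and γ(1) = T A_1 T = A_2, and that γ(t) remains symmetric positive-definite along the way.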

Theorem 2.
For any two points of (SPD(n), g_W), there exists a unique minimal geodesic joining them. In geometric terms, there is no cut locus on any geodesic.

Proof.
We proved the existence of a minimal geodesic joining any two points in Corollary 1. Assume that there exist two minimal geodesics joining A, B ∈ SPD(n). Fix Ã = A^{1/2} as the horizontal lift of A and lift the two geodesics horizontally; we then obtain two horizontal lifts of B, which we denote B^{1/2}Q_1 and B^{1/2}Q_2 with Q_1, Q_2 ∈ O(n). Then Q_1 and Q_2 are both solutions of the optimization problem arg min over Q ∈ O(n) of the Euclidean distance from A^{1/2} to B^{1/2}Q. By the compactness of O(n), this problem has a unique solution. Thus, the minimal geodesic is unique.

Remark 2.
Due to the nonexistence of cut loci, there exist no conjugate points on (SPD(n), g_W).

Exponential Map
Following Lemma A1, we can directly write down the Wasserstein logarithm on SPD(n): for any A_1 ∈ SPD(n), the map log_{A_1} : SPD(n) → T_{A_1} SPD(n) is given by

log_{A_1}(A_2) = (A_1 A_2)^{1/2} + (A_2 A_1)^{1/2} − 2A_1.

By solving the inverse problem of the above equation, we obtain the expression of the Wasserstein exponential.

Theorem 3.
In a small open ball B(0, ε), ε > 0, in T_A SPD(n) ≅ R^{n(n+1)/2}, the Wasserstein exponential at A, exp_A : B(0, ε) → SPD(n), is explicitly expressed by

exp_A(X) = A + X + Γ_A[X] A Γ_A[X].

Proof. By choosing normal coordinates [21] at A, there always exists a neighborhood on which exp_A is well-defined. From (15), given that exp_A X is well-defined, it satisfies log_A(exp_A X) = X. This equation converts to a Sylvester equation whose solution can be expressed through Γ_A[X]. Therefore, we have exp_A(X) = A + X + Γ_A[X] A Γ_A[X], which finishes the proof.

Remark 3.
We call the first two terms on the right-hand side, A + X, the Euclidean exponential, and the last term, Γ_A[X]AΓ_A[X], the Wasserstein correction for this curved manifold.
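Theorem 3 and the Wasserstein logarithm can be checked against each other numerically. The sketch below (function names ours) implements exp_A via Γ_A[·] and log_A via the transport map T with T A T = B, using the identity log_A(B) = TA + AT − 2A, equivalent to (AB)^{1/2} + (BA)^{1/2} − 2A:

```python
import numpy as np

def sqrtm_spd(A):
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.sqrt(lam)) @ Q.T

def gamma(A, X):
    # Gamma_A[X]: solution of A K + K A = X (Algorithm 1 style).
    lam, Q = np.linalg.eigh(A)
    Xt = Q.T @ X @ Q
    return Q @ (Xt / (lam[:, None] + lam[None, :])) @ Q.T

def bw_exp(A, X):
    """Theorem 3: exp_A(X) = A + X + Gamma_A[X] A Gamma_A[X]."""
    G = gamma(A, X)
    return A + X + G @ A @ G

def bw_log(A, B):
    """log_A(B) = T A + A T - 2 A, where T A T = B."""
    R = sqrtm_spd(A)
    Rinv = np.linalg.inv(R)
    T = Rinv @ sqrtm_spd(R @ B @ R) @ Rinv
    return T @ A + A @ T - 2 * A
```

Since Γ_A[log_A(B)] = T − I, a direct expansion gives exp_A(log_A(B)) = T A T = B, so the two maps are mutually inverse.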

Corollary 2.
The geodesic with initial conditions γ(0), γ̇(0) on (SPD(n), g_W) has the explicit solution

γ(t) = γ(0) + t γ̇(0) + t² Γ_{γ(0)}[γ̇(0)] γ(0) Γ_{γ(0)}[γ̇(0)].

Using the exponential map, one can directly construct Jacobi fields with the geodesic variation.
As in [18], J(t) is constructed by substituting (16) into (22), and Theorem 4 follows by direct computation. The next natural question is the maximal length to which a geodesic can be extended; equivalently, what is the largest domain of the exponential map. We still focus on diagonal matrices.

Theorem 5. For any Λ ∈ SPD(n), the exponential exp_Λ is well-defined on a ball whose radius ε_max is controlled by λ_min, the minimal eigenvalue of Λ.

Corollary 3. The Wasserstein metric g W on SPD(n) is incomplete.
Corollary 3 follows directly from the Hopf–Rinow theorem [22]. Theorem 5 and the next theorem help us comprehend the size of (SPD(n), g_W) as seen from each point. Figure 1 shows geodesics starting from different origins on SPD(2). From this group of pictures, we can observe the outline of the manifold and some behaviors of geodesics. Using ε_max, we can obtain the injectivity radius r(A) for every A ∈ SPD(n). Geometrically speaking, r(A) is the maximal radius of a ball on which exp_A is well-defined.

Theorem 6. The Wasserstein radius r(A) : SPD(n) → (0, +∞) is given by r(A) = √(λ_min(A)), and the function r(A) is continuous.

Proof of Theorem 6 can be found in Appendix C. Due to the geodesic convexity, the radius actually equals the Wasserstein distance from a point of SPD(n) to the 'boundary' of the manifold. It also measures the degree of degeneracy of a positive-definite symmetric matrix through √λ_min. Figure 2 shows three maximal geodesic balls with different centers on SPD(2). From the viewpoint of R³, the three balls have different sizes in the Euclidean sense, but on (SPD(2), g_W) all of them have the same radius.

Connection
In this section, we study the Riemannian connection of (SPD(n), g_W), called the Wasserstein connection. The flatness of (GL(n), g_E) and the structure of the Riemannian submersion bring considerable convenience to this work. During the computation, we denote the tensor actions of both g_W on SPD(n) and g_E on GL(n) by ⟨·, ·⟩. We denote the Euclidean connection by D and the Wasserstein connection by ∇.
The main idea for expressing the Wasserstein connection is to compute the horizontal part of the Euclidean covariant derivative of lifted vector fields. We shall prove:

Lemma 1. The Euclidean connection is a lift of the Wasserstein connection: for any smooth vector fields X and Y on SPD(n) with horizontal lifts X̃ and Ỹ, we have dσ(D_X̃ Ỹ) = ∇_X Y.

Proof of Lemma 1 can be found in Appendix D. This lemma holds for general Riemannian submersions; we reprove it for our case because we will need some intermediate results of the proof later. Using Lemma 1, we can derive a direct corollary, which is one of the essential results of this paper.
where dY(X) denotes the Euclidean directional derivative of Y along X.

Proof. From Lemma 1 and (A14) (in Appendix D), we have
The linearity, Leibniz's law and symmetry of the Wasserstein connection are easily checked from this expression.
The vertical component of the lifted covariant derivative of Y along X is a vector field on GL(n) whose value at Ã is denoted T_Ã(X, Y); we say T is an A-tensor. The whole vector field is denoted T(X, Y). From the definition and previous results, we can obtain the expression of T_Ã(X, Y).
Proof. Using (A12) (in Appendix D), (6) and (27), we obtain (31), which shows that T_Ã(X, Y) depends only on Ã and the vectors in T_A SPD(n); the multi-linearity follows directly. Recalling (A14) in Appendix D, we also obtain a corresponding identity for T(X, Y). In the following parts, we will show that the tensor T(X, Y) plays a significant role in computing curvature.

Curvature
In this part, we aim to understand the curvature of (SPD(n), g_W). Although there exist some relevant results giving abstract expressions for general cases, we obtain simpler expressions and derive the scalar curvature via a special basis.

Riemannian Curvature Tensor
First, we derive the Riemannian curvature of (SPD(n), g_W). We denote the Euclidean curvature of the total space (which vanishes identically) by R̃, and the Wasserstein (Riemannian) curvature of (SPD(n), g_W) by R.

Theorem 7.
For any A ∈ SPD(n) and smooth vector fields X, Y on SPD(n), the Wasserstein curvature tensor R(X, Y, X, Y) := ⟨R_{XY}X, Y⟩_A at A has an explicit expression. Proof of Theorem 7 can be found in Appendix E. The expression has been derived before by other research groups in similar ways. However, here we calculate the curvature tensor differently and find a more explicit expression, which is easier than expanding ‖T(X, Y)‖² directly. In addition, from ‖T(X, Y)‖² ≥ 0 and (A23) (in Appendix E), we obtain the following corollary.
By solving the Sylvester equation with Algorithm 1, we can simplify the expression. We give the sectional curvature K of the section span{X(A), Y(A)}, where we use the same notations as in Algorithm 1. In particular, in the diagonal case, we observe that the sectional curvature conforms to the inverse ratio law K|_{kΛ} = K|_Λ / k. These results conform with our visual impression of (SPD(n), g_W) in Figure 1, where the manifold tends to be flat as k increases.

Sectional Curvature
Now, we derive more explicit expressions for the sectional curvature and the scalar curvature. Conventionally, we only need to consider the diagonal case. Before that, we introduce a basis of Sym(n), the tangent space of SPD(n). Define {S^{p,q}} by (S^{p,q})_{ij} = δ_{ip}δ_{jq} + δ_{iq}δ_{jp}, where the superscripts p, q mark the nonzero entries of S^{p,q} and δ is the Kronecker delta. Apparently, {S^{p,q} | 1 ≤ p ≤ q ≤ n} forms a basis of Sym(n). For simplicity, we sometimes write S^{p,q}, S^{r,t} as S_1, S_2, respectively. In this way, we can express the curvature in this basis by direct calculation. By Algorithm 1, we have Γ_Λ[S] = [S_ij/(λ_i + λ_j)] for every S ∈ T_Λ SPD(n) (Q = I in the decomposition of Λ). Note that the relevant entries of Λ, S_1, S_2 are all positive. According to the anti-symmetry of the curvature tensor, a non-vanishing curvature requires {p, q} ≠ {r, t}. Moreover, by definition, S^{p,q} = S^{q,p} and S^{r,t} = S^{t,r}. Without loss of generality, we only need to consider the following particular case.

Theorem 8. For any diagonal matrix Λ = diag(λ_1, ..., λ_n) ∈ SPD(n), where λ_1 ≤ λ_2 ≤ ... ≤ λ_n, the Wasserstein sectional curvature of the section spanned by S_1 = S^{p,q}, S_2 = S^{r,t}, p ≠ r, q ≠ t, satisfies a bound controlled by the eigenvalues.
Proof of Theorem 8 can be found in Appendix F. From the above bounds, we easily see that the sectional curvature is controlled by the second smallest eigenvalue, which implies that the curvature will seldom blow up even on an almost degenerate domain. Only when the matrices degenerate in two or more dimensions do the curvatures become very large. This phenomenon ensures that the curvature information remains meaningful in most applications. Some examples of this phenomenon will be observed later.

Scalar Curvature
In the last part of this section, we calculate the scalar curvature directly.

Theorem 9. For any A ∈ SPD(n), its scalar curvature ρ(A) is an explicit function of the eigenvalues, where the diagonal matrix Λ = diag(λ_1, ..., λ_n) is orthogonally similar to A and U = [1/(λ_i + λ_j)]_{i<j}. Proof of Theorem 9 can be found in Appendix G. Figure 3 presents some examples of scalar curvatures on (SPD(2), g_W), which illustrates our argument at the end of Section 3.4.2.

Point Cloud Denoising
Denoising, or outlier removal, is a fundamental step of point cloud preprocessing, since real-world data are often polluted by noise. There is an immense literature on point cloud denoising, and widely used algorithms are packaged as built-in functions of software libraries. For example, PCL [16] is a popular platform for point cloud processing, which collects four denoising schemes. However, these methods fail to give satisfactory performance when point clouds are polluted by high-density noise. To solve this problem, we consider both the statistical and the geometrical structure of the data and design a new algorithm.
The idea is that, after embedding the original point cloud from Euclidean space into SPD(n), the Wasserstein scalar curvature gives essential information for separating noise from true data. Therefore, our new algorithm mainly contains two steps: first, we obtain the desired embedding by fitting a Gaussian distribution locally at each point; then, we identify noise by inspecting the histogram of the Wasserstein scalar curvature. Due to the flatness of the region occupied by noise, it is reasonable to classify points with small curvature as noise. The threshold is set to the first local minimum of the histogram. We call this new scheme adaptive Wasserstein curvature denoising (AWCD).
In the following, we first review two traditional denoising methods, radius outlier removal (ROR) and statistical outlier removal (SOR). Then, we explain the details of AWCD. Additionally, we carry out experiments on different datasets, with comparisons to the two classical methods. The experimental results show that AWCD gives better performance regardless of the data size and the noise density. We also give a time complexity analysis of each denoising algorithm; the results show that AWCD is as efficient as the other classical methods and thus applicable in many practical tasks.

Radius Outlier Removal
In radius outlier removal (ROR; see Algorithm 2), points are clustered into two categories according to their local density: points with low density tend to be recognized as noise, whereas points with high density are recognized as true data. ROR requires two parameters: a preset radius d for local neighborhoods and a least number α of points in each neighborhood.
Algorithm 2 Radius outlier removal.
Input: initial point cloud D_0, parameters d, α
Output: cleaned point cloud D_1
1: search the radius-d neighborhood N_i of each point P_i;
2: if the number of neighbors |N_i| ≥ α, then put P_i into D_1;
3: return D_1.
As an illustration, we add uniform noise to the Stanford Bunny with 10,000 points (see Figure 4). Then, we apply ROR to denoise the polluted point cloud. The result is shown in Figure 5. From visual observation, ROR preserves almost all true points but fails to recognize a small portion of the noise throughout the volume.
In fact, a series of repeated experiments shows that ROR is sensitive to the choice of manual parameters: a small radius makes ROR ineffective, while a large radius wrongly recognizes true points as noise. One disadvantage of ROR is that there exists no universal method to determine the best parameters. Further, since ROR performs a fixed-radius neighbor search for every point, the time complexity can reach O(n²), where n is the number of points. Thus, in practice, it is often difficult to make a trade-off between the efficiency and the effect of ROR.
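A compact ROR sketch in Python (using a SciPy k-d tree rather than PCL; parameter names follow Algorithm 2):

```python
import numpy as np
from scipy.spatial import cKDTree

def ror(points, d, alpha):
    """Radius outlier removal (Algorithm 2): keep P_i iff its radius-d
    ball contains at least alpha other points."""
    tree = cKDTree(points)
    counts = np.array([len(tree.query_ball_point(p, d)) - 1  # exclude P_i
                       for p in points])
    return points[counts >= alpha]
```

The k-d tree accelerates the fixed-radius queries in typical cases, but the worst-case cost of gathering all neighbors still grows quadratically for dense clouds.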

Statistical Outlier Removal
Compared to ROR, statistical outlier removal (SOR) considers more detailed local structure than density alone. SOR, shown in Algorithm 3, is one of the most popular methods for preprocessing point clouds due to its efficiency on low-density noise. However, SOR performs worse than ROR when the noise has high density. The main idea of SOR comes from the one-sigma law of classical statistics [23]: an outlier is believed to be far from the center of its k-nearest neighborhood, whereas a true point should lie in a confidence region of its neighborhood. Let Φ be a d-variate Gaussian distribution with expectation µ and covariance Σ, and let P be a fixed point in R^d. Then Φ induces a Gaussian distribution on the line {µ + t v_P | t ∈ R}, where v_P = (P − µ)/‖P − µ‖. In fact, write the eigendecomposition of Σ as Σ = E diag(σ_1², ..., σ_d²) E^T, where E = [e_1, ..., e_d] is an orthogonal matrix. If we write v_P = ∑_{i=1}^d λ_i e_i, the projected Gaussian distribution in direction v_P has null expectation and variance ∑_{i=1}^d λ_i² σ_i². According to the one-sigma law, we say P is in the confidence region of Φ if ‖P − µ‖² ≤ ∑_{i=1}^d λ_i² σ_i², which is equivalent to (46). This inequality is a generalization of the one-sigma law to high dimensions.

Algorithm 3 Statistical Outlier Removal.
Input: initial point cloud D_0, parameter k
Output: cleaned point cloud D_1
1: search the kNN N_i of each point P_i;
2: compute the local mean and local covariance of N_i;
3: put P_i into D_1 if it satisfies (46);
4: return D_1.

Thus, SOR consists of three steps: first, we search the k-nearest neighbors (kNN) of every point; then, we compute the empirical mean and covariance under a Gaussian assumption for each neighborhood; finally, true points are identified using (46). SOR requires a single parameter k for the kNN search. Again, as an illustration, we use the data of Figure 4; the result after SOR is shown in Figure 6. We use a KD-tree for the kNN search, whose time complexity is O(kn log n), where k is the number of neighbors and n is the number of points. The remaining steps finish in O(n) time, so the total time complexity is O(kn log n).
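For comparison, here is a Python sketch of the distance-based SOR variant popularized by PCL; the covariance test (46) of Algorithm 3 would replace the per-point statistic, but the thresholding logic is the same (function names ours):

```python
import numpy as np
from scipy.spatial import cKDTree

def sor(points, k, n_sigma=1.0):
    """Distance-based SOR: keep a point iff its mean distance to its k
    nearest neighbors is within n_sigma standard deviations of the
    global mean of that statistic (one-sigma law for n_sigma = 1)."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)   # column 0 is the point itself
    mean_d = dists[:, 1:].mean(axis=1)
    mu, sigma = mean_d.mean(), mean_d.std()
    return points[mean_d <= mu + n_sigma * sigma]
```

As the text notes, the kNN search dominates the cost, giving O(kn log n) with a k-d tree.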

Adaptive Wasserstein Curvature Denoising
Note that the key step of SOR is to compute the local covariance, which is a positive-definite matrix. Motivated by SOR, we extract the covariance matrix at each point, which is equivalent to embedding the original point cloud into SPD(n). Intuitively, since the true data present a particular pattern, their covariance matrices should have large Wasserstein curvature; conversely, the covariance matrices of noise form a flat region. Hence, AWCD is based on the principal hypothesis that the Wasserstein curvature of true data is larger than that of noise.
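The embedding step can be sketched as follows: each point is mapped to the covariance matrix of its kNN patch (the small ridge term is our addition; it keeps the matrices strictly positive-definite even on perfectly flat patches):

```python
import numpy as np
from scipy.spatial import cKDTree

def embed_spd(points, k):
    """Map each 3D point to the covariance matrix of its kNN patch,
    embedding the cloud into SPD(3)."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)     # neighbor indices incl. the point
    covs = np.empty((len(points), 3, 3))
    for i, nb in enumerate(idx):
        covs[i] = np.cov(points[nb].T) + 1e-9 * np.eye(3)
    return covs
```

The resulting SPD(3)-valued field is then fed into the Wasserstein scalar curvature of Theorem 9.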
Under this hypothesis, what we need is a threshold to pick out points with small Wasserstein curvature. To do so, we gather all the information in a histogram counting the number of points at each curvature level. By the continuity of the curvature function, true data and noise form two different 'hills'; Figure 7 shows an example of the histogram. The phase change happens at the borderline of the two hills, i.e., we seek the first local minimum of the histogram. In Figure 7, this critical value is annotated as the 'marked curvature'. In this way, we do not need to set the threshold manually; instead, we achieve an adaptive selection process. Algorithm 4 shows the processing of this adaptive denoising via Wasserstein curvature.
We use the same example as in Figure 4; the performance of AWCD is shown in Figure 8. In this example, AWCD removes almost all the noise far from the Stanford Bunny and retains almost all the true data. The only problem is that a small portion of noise lying on the model causes false positives, and some true data located on flat parts are wrongly removed.
Since the main step of AWCD is also a kNN search, its time complexity is the same as SOR's, namely O(kn log n); therefore, AWCD is applicable in practice. Remarkably, AWCD is effective on data with dense noise and robust to its unique parameter k.

Experiments
We use ROR, SOR and AWCD to denoise polluted data sets with noise of different density levels. The point clouds are from the Stanford 3D Scanning Repository, including the Stanford Bunny, Duke Dragon, Armadillo, Lucy and Happy Buddha. For each data set, we add noise and record the signal-noise ratio (SNR). To show the influence of data size, we downsample the original data sets at different scales.
We adopt three criteria to measure the performance of the algorithms: true positive rate (TPR), false positive rate (FPR) and signal-noise ratio growth (SNRG). TPR describes the accuracy in preserving true points from the unpolluted data set; FPR describes the fraction of noisy points that survive denoising; SNRG describes the improvement of the SNR after processing. For a polluted point cloud D_0 = D ∪ N, where D is the set of true data and N is the set of noise, let D_1 be the cleaned point cloud after denoising. Then these measurements are computed as TPR = |D_1 ∩ D|/|D| and FPR = |D_1 ∩ N|/|N|, where |·| denotes the cardinality of a finite set. Intuitively, a higher TPR and SNRG and a lower FPR mean better performance. The experimental results are shown in Table A1 in Appendix I. In each experiment, we highlight the lowest FPR, the highest TPR and SNRG over 99%. Table A1 shows the superiority of AWCD over ROR and SOR.
In general, AWCD removes almost all the noise while preserving the true data, except for the Armadillo.
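The index-set computations behind TPR and FPR are elementary; a sketch matching the standard definitions above (SNRG is omitted, since it depends on the SNR convention used in the experiments):

```python
def denoise_metrics(true_points, noise_points, cleaned):
    """TPR = fraction of true points preserved in the cleaned cloud;
    FPR = fraction of noise points that survive denoising."""
    D, N, D1 = set(true_points), set(noise_points), set(cleaned)
    tpr = len(D1 & D) / len(D)
    fpr = len(D1 & N) / len(N)
    return tpr, fpr
```

In practice the sets hold point indices rather than coordinates, so exact set intersection is well-defined.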

Edge Detection
In this part, we apply Wasserstein curvatures to detect the edges of noisy images. This application follows the idea that edge parts contain more local information, while relatively flat parts locally resemble white noise; hence, Wasserstein curvatures have natural advantages in depicting local information. This leads to the Wasserstein sectional curvature edge detection (WSCED) of Algorithm 5.
Algorithm 5 Wasserstein sectional curvature edge detection.
Input: initial grayscale image F_0 with n × m pixels, parameter k
Output: edge figure F_e
1: search the kNN N_ij of each pixel P_ij;
2: compute every local covariance Σ_ij to obtain the covariance image CI, an (n − 2k) × (m − 2k) array of matrices Σ_ij;
3: determine the section σ_ij := X_ij ∧ Y_ij at every point Σ_ij of CI by computing difference tangent vectors along the two axes;
4: compute the Wasserstein sectional curvature K_W|_{Σ_ij}(σ_ij) at every point with (36) to obtain the curvature image F_e, an (n − 2k) × (m − 2k) real matrix;
5: return F_e.
Similar to the last section, the first step of WSCED computes the local mean and covariance after a kNN search, which can be regarded as a two-dimensional embedding of an image into SPD(n); every pixel coordinate (i, j) determines a local covariance matrix Σ_ij. In the second step, we compute the sectional curvature at every Σ_ij. The chosen section X_ij ∧ Y_ij is determined by two difference vectors along the x-axis and the y-axis. According to (36), we obtain the curvature KI_ij = K_W|_{Σ_ij}(X_ij, Y_ij) at Σ_ij and hence a curvature image KI. Finally, with some appropriate image transformations, we can detect edges on KI.
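Steps 1–3 of Algorithm 5 can be sketched as follows; the window features (x, y, F(x, y)), the ridge term and the forward-difference discretization of X_ij, Y_ij are our assumptions, since the paper's exact formulas are not reproduced here:

```python
import numpy as np

def covariance_image(F, k):
    """Steps 1-2 of WSCED: at each interior pixel, take the covariance
    of the features (x, y, F(x, y)) over the (2k+1) x (2k+1) window,
    yielding an (n-2k) x (m-2k) grid of 3x3 SPD matrices (the
    covariance image CI)."""
    n, m = F.shape
    CI = np.empty((n - 2 * k, m - 2 * k, 3, 3))
    for i in range(k, n - k):
        for j in range(k, m - k):
            ys, xs = np.mgrid[i - k:i + k + 1, j - k:j + k + 1]
            feats = np.stack([xs.ravel().astype(float),
                              ys.ravel().astype(float),
                              F[i - k:i + k + 1, j - k:j + k + 1].ravel()])
            CI[i - k, j - k] = np.cov(feats) + 1e-9 * np.eye(3)
    return CI

def sections(CI):
    """Step 3: tangent pair (X_ij, Y_ij) as forward differences of
    neighboring covariance matrices along the two image axes."""
    X = CI[1:, :-1] - CI[:-1, :-1]
    Y = CI[:-1, 1:] - CI[:-1, :-1]
    return X, Y
```

Step 4 would then feed each pair (X_ij, Y_ij) into the sectional curvature formula (36) to produce the curvature image F_e.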
In simulations, we compare WSCED with traditional edge detection filters, including Sobel, Prewitt and Laplacian [24], aiming to detect edges in images with high-density noise. From Figure 9, we find that WSCED achieves much the same outcome as the Sobel and Prewitt filters, which suggests a genuine connection between Wasserstein curvature and edges; the result also shows the robustness of WSCED to noise. We present more experimental results in Figure A1 in Appendix H.

Conclusions and Future Work
In this paper, we studied the geometric characteristics of (SPD(n), g_W), including geodesics, the connection, Jacobi fields and curvatures. Compared with existing results, ours are simpler in form and more suitable for computation. Based on these results, we designed novel algorithms for point cloud denoising and image edge detection, and numerical experiments showed that these geometry-based methods are valid in applications. From both a theoretical and a practical perspective, we gained a more comprehensive understanding of the Wasserstein geometry on SPD(n), which shows that the Wasserstein metric has both deep application potential and mathematical elegance.
In future work, on the one hand, we aim to study Wasserstein geometry on other matrix manifolds, such as the Stiefel manifold [25], the Grassmann manifold [26] and some complex matrix manifolds [27]. On the other hand, we would like to generalize geometry-based methods to more problems in image and signal processing [28] and data science.
For any s ∈ (0, 1) that satisfies det(l(s)) = 0, we restrict the line segment by l_s(t) := l(st), and the lemma then follows from a horizontal lift of l_s(1).
Lemma A1 ensures that a line segment is horizontal if its initial vector is horizontal, which brings considerable convenience to the following proofs.

Proof. First, we have
Then, we only need to check that A_1^{−1/2} γ̇(0) ∈ Sym(n). In fact, this follows by direct computation, which proves the lemma.
Remark A1. Finding the orthogonal matrix P is equivalent to solving a Riccati equation. One can see [29] for details.
To prove Theorem 1, the last step is to verify the non-degeneracy.

Appendix C. Proof of Theorem 6
Proof. Again, with (10), we have r(A) = r(Λ), where A = QΛQ^T; thus, we still focus on r(Λ), following the definition and the discussion around (23), in which ‖·‖_{g_W} denotes the norm induced by g_W. Denote by {λ_i > 0} the eigenvalues of Λ and by η_j the eigenvalues of Γ_Λ[V]. Let x_j be eigenvectors corresponding to η_j such that {x_j} forms an orthogonal basis of R^n. By direct calculation, we find the required bound, where we used the facts that Γ is positive-definite and η_k = −1. The equality tr(VΓ_Λ[V]) = x_k^T Λ x_k holds if and only if η_j = 0 for all j ≠ k; in this case, Algorithm 1 gives the minimizer explicitly. In particular, min{tr(VΓ_Λ[V])} = 2λ_min is attained when Λ_kk = λ_min and V_ij = −2δ_ik δ_jk λ_min; the tangent vector V is called the speed-degenerated direction. The function λ_min(A) is clearly continuous; hence r(A) is also continuous.

Appendix D. Proof of Lemma 1
Before proving Lemma 1, we shall prove a key lemma, which points the relation between the Lie-brackets on the total space and base space.
Lemma A3. The horizontal lift of vector fields commutes with Lie-brackets, Proof. (Lemma A3) On the flat manifold (GL(n), g E ), the connection equals to the usual directional derivative in the Euclidean space. Therefore, for vector fields X, Y on SPD(n), we have (2), we obtain Then, we compute the Lie-bracket where the last equality in (A14) comes from (A13). Finally, we show the second term in (A14) is vertical. In fact, we have A T is anti-symmetric. Thus, the proof for Lemma A3 has been done.
By Lemma A3, the proof for Lemma 1 is clarified.
Proof (Lemma 1). For any smooth vector field Z on SPD(n) with horizontal lift Z̃, a direct computation gives the required equality. By the arbitrariness of Z, Lemma 1 is proved.
Remark A2. The proof of Lemma 1 implies that dσ|_Ã(D_X̃Ỹ) is independent of the choice of Ã within a fixed fiber.

Appendix H. Effects of WSCED

Figure A1. WSCED on different polluted images. The first row shows the original RGB images. The second row shows the images with uniform noise. The third row shows the edge images after WSCED.
Appendix I