Subspace Detours Meet Gromov-Wasserstein

In the context of optimal transport methods, the subspace detour approach was recently presented by Muzellec and Cuturi (2019). It consists in building a nearly optimal transport plan in the original measure space from an optimal transport plan computed on a wisely chosen subspace, onto which the original measures are projected. The contribution of this paper is to extend this category of methods to the Gromov-Wasserstein problem, a particular type of OT distance involving the inner geometry of the compared distributions. After deriving the associated formalism and properties, we discuss a specific cost for which we can show connections with the Knothe-Rosenblatt rearrangement. We finally give an experimental illustration on a shape matching problem.


Introduction
Classical optimal transport (OT) has received a lot of attention recently, in particular in machine learning for tasks such as generative networks (Arjovsky et al., 2017) or domain adaptation (Courty et al., 2016), to name a few. It generally relies on the Wasserstein distance, which builds an optimal coupling between distributions given their geometry. Yet, this metric lacks potentially important properties, such as translation or rotation invariance, which can be useful when comparing shapes for instance (Mémoli, 2011; Chowdhury et al., 2021), and it cannot be used directly whenever the distributions lie in different metric spaces. To alleviate these problems, custom solutions have been proposed, such as (Alvarez-Melis et al., 2019; Cai and Lim, 2020).
Apart from these works, another meaningful OT distance able to tackle these problems is the Gromov-Wasserstein (GW) distance, originally proposed in Mémoli (2007, 2011). It is a distance between metric spaces and has several appealing properties, such as geodesics and invariances (Sturm, 2012). Yet, the price to pay lies in its computational complexity, which requires solving a quadratic optimization problem with linear constraints. A recent line of work computes approximations or relaxations of the original problem, in order to spread its use in more data-intensive machine learning applications. For example, Peyré et al. (2016) use an entropic regularization in order to iterate several Sinkhorn projections (Cuturi, 2013). A related recent method imposes low-rank constraints on the coupling (Scetbon et al., 2021). Vayer et al. proposed a sliced approach to approximate Gromov-Wasserstein, and Fatras et al. studied an estimator based on mini-batches. In Chowdhury et al. (2021), the authors propose to partition the space and to solve the optimal transport problem between a subset of points, before finding a coupling between all the points.
In this work, we study the subspace detour approach for Gromov-Wasserstein. This class of methods was first proposed for the Wasserstein setting in Muzellec and Cuturi (2019), and consists in choosing the optimal transport plan between measures projected on a subspace, before finding a coupling on the whole space between the original measures using disintegration. Our main contribution is to derive subspace detours between different subspaces and to apply them to GW costs. We derive useful properties as well as closed-form solutions between Gaussians. Interestingly, we also propose a separable quadratic cost for the GW problem that can be related to a triangular coupling, hence bridging the gap with Knothe-Rosenblatt (KR) rearrangements. Illustrations of the method are given on a shape matching problem.

Background
In this section, we introduce the necessary material to describe the subspace detour approach, from classical optimal transport and its connection to the Knothe-Rosenblatt rearrangement, to subspace optimal couplings defined via the gluing lemma and measure disintegration. We then introduce the Gromov-Wasserstein problem, for which we derive the subspace detour in the next sections.

Classical optimal transport
Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ be two probability measures. The set of couplings between $\mu$ and $\nu$ is defined as
$$\Pi(\mu,\nu) = \{\gamma \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d) \mid \pi^1_\#\gamma = \mu,\ \pi^2_\#\gamma = \nu\},$$
where $\pi^1$ and $\pi^2$ are the projections on the first and second coordinate (i.e. $\pi^1(x,y) = x$), and $\#$ is the push-forward operator, defined such that for any measurable set $A$, $T_\#\mu(A) = \mu(T^{-1}(A))$.

Kantorovich problem There exist several types of couplings between probability measures; a non-exhaustive list can be found in (Villani, 2008)[Chapter 1]. Among them, the so-called optimal coupling is the minimizer of the following Kantorovich problem:
$$\inf_{\gamma \in \Pi(\mu,\nu)} \int c(x,y)\, \mathrm{d}\gamma(x,y), \quad (1)$$
with $c$ some cost function. When $c(x,y) = \|x-y\|_2^2$, it defines the squared Wasserstein distance
$$W_2^2(\mu,\nu) = \inf_{\gamma \in \Pi(\mu,\nu)} \int \|x-y\|_2^2\, \mathrm{d}\gamma(x,y). \quad (2)$$
The Kantorovich problem (1) is known to admit a solution when $c$ is nonnegative and lower semi-continuous (Santambrogio, 2015)[Theorem 1.7]. When the optimal coupling is of the form $\gamma = (\mathrm{Id}, T)_\#\mu$, with $T$ a deterministic map such that $T_\#\mu = \nu$, $T$ is called the Monge map.
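For discrete measures, the Kantorovich problem (1) is a finite linear program. Below is a minimal sketch with NumPy and SciPy on toy data of our own (in practice a dedicated solver such as POT's `ot.emd` would be preferred):

```python
import numpy as np
from scipy.optimize import linprog

# Two discrete measures with uniform weights on n = m = 5 points.
rng = np.random.default_rng(0)
n = m = 5
x = rng.normal(size=n)          # support of mu
y = rng.normal(size=m)          # support of nu
a = np.full(n, 1 / n)           # weights of mu
b = np.full(m, 1 / m)           # weights of nu

# Squared-distance cost c(x, y) = (x - y)^2, flattened for the LP.
C = (x[:, None] - y[None, :]) ** 2

# Marginal constraints: row sums of gamma equal a, column sums equal b.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0
for j in range(m):
    A_eq[n + j, j::m] = 1.0

res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
              bounds=(0, None), method="highs")
gamma = res.x.reshape(n, m)     # an optimal coupling
```

In one dimension with the squared cost and uniform weights, the optimal value coincides with the cost of matching the sorted samples, which gives a convenient sanity check.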
In one dimension, with $\mu$ atomless, the solution to (2) is a deterministic coupling of the form (Santambrogio, 2015)[Theorem 2.5]
$$T = F_\nu^{-1} \circ F_\mu, \quad (3)$$
where $F_\mu$ is the cumulative distribution function of $\mu$ and $F_\nu^{-1}$ the quantile function of $\nu$. This map is also known as the increasing rearrangement.
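The increasing rearrangement is cheap to estimate from samples. A small sketch (`monge_map_1d` is our illustrative helper, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=1000))        # sorted samples from mu
y = np.sort(rng.exponential(size=1000))   # sorted samples from nu

def monge_map_1d(x_sorted, y_sorted, t):
    """Empirical increasing rearrangement T = F_nu^{-1} o F_mu,
    evaluated at the points t."""
    F_mu = np.searchsorted(x_sorted, t, side="right") / len(x_sorted)
    # Empirical quantile function of nu, with indices clipped to range.
    idx = np.ceil(F_mu * len(y_sorted)).astype(int) - 1
    idx = np.clip(idx, 0, len(y_sorted) - 1)
    return y_sorted[idx]

vals = monge_map_1d(x, y, np.array([-1.0, 0.0, 1.0]))  # nondecreasing values
```

With equal sample sizes and uniform weights, this reduces to sorting both samples and matching them in order.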
Knothe-Rosenblatt rearrangement Another interesting coupling is the Knothe-Rosenblatt rearrangement, which takes advantage of the increasing rearrangement in one dimension by iterating over the dimensions and disintegrating. Concatenating all the increasing rearrangements between the conditional probabilities, we obtain the KR rearrangement, which turns out to be a nondecreasing triangular map (i.e. $T: \mathbb{R}^d \to \mathbb{R}^d$ with $T(x) = (T_1(x_1), \dots, T_j(x_1, \dots, x_j), \dots, T_d(x))$ for all $x \in \mathbb{R}^d$, where each $T_j$ is nondecreasing with respect to $x_j$), and a deterministic coupling (i.e. $T_\#\mu = \nu$) (Villani, 2008; Santambrogio, 2015; Jaini et al., 2019). Carlier et al. (2010) made a connection between this coupling and optimal transport by showing that it can be obtained as the limit of optimal transport plans for the degenerate cost
$$c_t(x,y) = \sum_{i=1}^d \lambda_i(t)(x_i - y_i)^2,$$
where for all $i \in \{1, \dots, d\}$ and $t > 0$, $\lambda_i(t) > 0$, and for all $i \ge 2$, $\lambda_i(t)/\lambda_{i-1}(t) \to 0$ as $t \to 0$. This cost can be recast as in (Bonnotte, 2013). This formalizes into the following theorem:

Theorem 1 (Carlier et al. (2010); Santambrogio (2015)). Let $\mu$ and $\nu$ be two absolutely continuous measures on $\mathbb{R}^d$, with compact supports. Let $\gamma_t$ be an optimal transport plan for the cost $c_t$, let $T_K$ be the Knothe-Rosenblatt map between $\mu$ and $\nu$, and $\gamma_K = (\mathrm{Id} \times T_K)_\#\mu$ the associated transport plan.
Then, we have $\gamma_t \rightharpoonup \gamma_K$ weakly as $t \to 0$. Moreover, if the $\gamma_t$ are induced by transport maps $T_t$, then $T_t$ converges in $L^2(\mu)$ to the Knothe-Rosenblatt rearrangement as $t$ tends to zero.

Subspace detours and disintegration
Muzellec and Cuturi (2019) proposed another OT problem by optimizing over the couplings which share a measure on a subspace. More precisely, they defined subspace optimal plans, for which the shared measure is the OT plan between the projected measures.

Definition 1 (Subspace-Optimal Plans (Muzellec and Cuturi, 2019)). Let $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$ and let $E$ be a subspace of $\mathbb{R}^d$. Let $\gamma_E^*$ be an optimal transport plan between $\mu_E = \pi^E_\#\mu$ and $\nu_E = \pi^E_\#\nu$ (with $\pi^E$ the orthogonal projection on $E$). Then the set of $E$-optimal plans between $\mu$ and $\nu$ is defined as
$$\Pi_E(\mu,\nu) = \{\gamma \in \Pi(\mu,\nu) \mid (\pi^E, \pi^E)_\#\gamma = \gamma_E^*\}.$$
By the gluing lemma (Villani, 2008), it is possible to construct a coupling $\gamma \in \Pi(\mu,\nu)$ satisfying this constraint. A way to do so is to rely on disintegration.
Disintegration Let $(Y, \mathcal{Y})$ and $(Z, \mathcal{Z})$ be measurable spaces, and let $(X, \mathcal{X}) = (Y \times Z, \mathcal{Y} \otimes \mathcal{Z})$ be the product measurable space. For $\mu \in \mathcal{P}(X)$, we denote $\mu_Y = \pi^Y_\#\mu$ and $\mu_Z = \pi^Z_\#\mu$ the marginals, where $\pi^Y$ (respectively $\pi^Z$) is the projection on $Y$ (respectively $Z$). A family $(K(y, \cdot))_{y \in Y}$ is a disintegration of $\mu$ if for all $y \in Y$, $K(y, \cdot)$ is a measure on $Z$, for all $A \in \mathcal{Z}$, $K(\cdot, A)$ is measurable, and
$$\forall \phi \in C(X), \quad \int_{Y \times Z} \phi(y,z)\, \mathrm{d}\mu(y,z) = \int_Y \int_Z \phi(y,z)\, K(y, \mathrm{d}z)\, \mathrm{d}\mu_Y(y),$$
where $C(X)$ is the set of continuous functions on $X$. We write $\mu = \mu_Y \otimes K$. $K$ is a probability kernel if for all $y \in Y$, $K(y, Z) = 1$. The disintegration of a measure corresponds to conditional laws in the context of probabilities. This concept will allow us to obtain measures on the whole space from marginals on subspaces.
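In the discrete case, disintegration is just conditioning: dividing a joint table by its marginal. A minimal sketch (the table values are our toy data):

```python
import numpy as np

# A joint probability mu on Y x Z given as a table mu[y, z].
mu = np.array([[0.10, 0.20],
               [0.30, 0.15],
               [0.05, 0.20]])
mu_Y = mu.sum(axis=1)        # marginal on Y

# Disintegration: K(y, .) = mu(y, .) / mu_Y(y) is the conditional law on Z,
# and mu factorizes as mu = mu_Y ⊗ K.
K = mu / mu_Y[:, None]
```

Each row of `K` is a probability measure on `Z`, and multiplying back by the marginal recovers the joint table.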
In the case where X = R d , which is the main case of interest in the remainder of the paper, we have existence and uniqueness of the disintegration (see Box 2.2 of Santambrogio (2015) or Chapter 5 of Ambrosio et al. (2008) for the more general case).

Coupling on the whole set
Then, to get a transport plan between the two original measures on the whole space, we can look for couplings between the disintegrated measures $\mu_{E^\perp|E}$ and $\nu_{E^\perp|E}$. In particular, two such couplings are proposed in Muzellec and Cuturi (2019): the Monge-Independent (MI) plan
$$\pi_{\mathrm{MI}} = \gamma_E^* \otimes (\mu_{E^\perp|E} \otimes \nu_{E^\perp|E}),$$
where we take the independent coupling between $\mu_{E^\perp|E}(x_E, \cdot)$ and $\nu_{E^\perp|E}(y_E, \cdot)$ for $\gamma_E^*$-almost every $(x_E, y_E)$, and the Monge-Knothe (MK) plan, where we instead take an optimal coupling between $\mu_{E^\perp|E}(x_E, \cdot)$ and $\nu_{E^\perp|E}(y_E, \cdot)$ for $\gamma_E^*$-almost every $(x_E, y_E)$. Muzellec and Cuturi observed that MI is better suited to noisy environments since it only computes the OT plan on the subspace, while MK is better suited for applications where we want to prioritize some subspace but where all the directions still contain relevant information.

Gromov-Wasserstein
Formally, the Gromov-Wasserstein distance compares metric measure spaces (mm-spaces), i.e. triplets $(X, d_X, \mu_X)$ and $(Y, d_Y, \mu_Y)$ where $(X, d_X)$ and $(Y, d_Y)$ are complete separable metric spaces and $\mu_X, \mu_Y$ are Borel probability measures on $X$ and $Y$ (Sturm, 2012), by computing
$$GW(X, Y) = \inf_{\gamma \in \Pi(\mu_X, \mu_Y)} \iint L\big(d_X(x, x'), d_Y(y, y')\big)\, \mathrm{d}\gamma(x, y)\, \mathrm{d}\gamma(x', y'),$$
where $L$ is some loss on $\mathbb{R}$. It has been extended to other spaces by replacing the distances with cost functions $c_X$ and $c_Y$, as e.g. in (Chowdhury and Mémoli, 2019). Furthermore, it has many appealing properties, such as invariances (which depend on the chosen costs).
Vayer (2020) notably studied this problem in the setting where $X$ and $Y$ are Euclidean spaces, with $L(a, b) = (a - b)^2$ and $c(x, x') = \langle x, x' \rangle$ or $c(x, x') = \|x - x'\|_2^2$. In particular, for $\mu \in \mathcal{P}(\mathbb{R}^p)$ and $\nu \in \mathcal{P}(\mathbb{R}^q)$, the inner-GW problem is defined as
$$\inf_{\gamma \in \Pi(\mu,\nu)} \iint \big(\langle x, x' \rangle_p - \langle y, y' \rangle_q\big)^2\, \mathrm{d}\gamma(x, y)\, \mathrm{d}\gamma(x', y'). \quad (4)$$
For this problem, a closed form is available in one dimension:

Theorem 2 (Vayer (2020), Theorem 4.2.4). Let $\mu, \nu \in \mathcal{P}(\mathbb{R})$, with $\mu$ absolutely continuous with respect to the Lebesgue measure. Let $F_\mu(x) = \mu(]-\infty, x])$ be the cumulative distribution function, and define $T_{\mathrm{asc}} = F_\nu^{-1} \circ F_\mu$ and $T_{\mathrm{desc}} = F_\nu^{-1} \circ (1 - F_\mu)$. Then, an optimal solution of (4) is achieved either by $\gamma = (\mathrm{Id} \times T_{\mathrm{asc}})_\#\mu$ or by $\gamma = (\mathrm{Id} \times T_{\mathrm{desc}})_\#\mu$.
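Theorem 2 makes 1D inner-GW an $O(n \log n)$ computation for empirical measures: evaluate the increasing and decreasing candidates and keep the cheaper one. A sketch under the assumptions of uniform weights and equal sample sizes (`inner_gw_1d` is our illustrative helper):

```python
import numpy as np

def inner_gw_1d(x, y):
    """1D inner-GW between empirical measures with uniform weights and
    equal sample sizes. Following the 1D closed form, an optimal plan is
    induced by the increasing or the decreasing rearrangement; evaluate
    both candidates and keep the cheaper one."""
    xs = np.sort(x)
    y_asc = np.sort(y)
    y_desc = y_asc[::-1]

    def cost(ys):
        # (1/n^2) * sum_{i,k} (x_i x_k - y_i y_k)^2 for matched samples
        return np.mean((np.outer(xs, xs) - np.outer(ys, ys)) ** 2)

    c_asc, c_desc = cost(y_asc), cost(y_desc)
    return min(c_asc, c_desc), ("asc" if c_asc <= c_desc else "desc")
```

Note that negating one of the samples leaves the optimal cost unchanged, reflecting the sign invariance of the inner-product cost.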

Subspace detours for GW
In this section, we propose to extend the subspace detours of Muzellec and Cuturi (2019) to Gromov-Wasserstein costs. We show that we can even take subspaces of different dimensions on each side, and still obtain a coupling on the whole space using the Monge-Independent or the Monge-Knothe coupling. We then derive properties analogous to those of Muzellec and Cuturi (2019), as well as closed-form solutions between Gaussians.

Motivations
First, we adapt the definition of subspace optimal plans to different subspaces. Indeed, since the Gromov-Wasserstein distance can compare data lying in spaces of different dimensions, we can argue that the main information will generally not lie in the same subspace for both datasets. For example, rotating a dataset changes the subspace of interest, and projecting both measures on a common subspace would most likely lose information, as we can see in Figure 1. In this illustration, we use as a source one moon of the two-moons dataset, and obtain a target by rotating it by an angle of $\pi/2$. As GW with $c(x, x') = \|x - x'\|_2^2$ is invariant with respect to isometries, we recover the exact correspondence between the points. However, when projecting both the source and the target on a single common subspace, we completely lose the optimal coupling between them. Nonetheless, by choosing one subspace per dataset more wisely (here the first component of a principal component analysis (PCA) of each), we recover the right coupling. This underlines that the choice of both subspaces is important. A way of choosing the subspaces could be to project each dataset on the subspace containing the most information, e.g. by applying PCA independently to each distribution. Muzellec and Cuturi (2019) proposed to optimize the optimal transport cost with respect to an orthonormal matrix by projected gradient descent, which could be extended to an optimization over two orthonormal matrices in our context. By allowing different subspaces, we obtain the following definition of subspace optimal plans.
Definition 2. Let $\mu \in \mathcal{P}_2(\mathbb{R}^p)$, $\nu \in \mathcal{P}_2(\mathbb{R}^q)$, let $E$ be a $k$-dimensional subspace of $\mathbb{R}^p$ and $F$ a $k'$-dimensional subspace of $\mathbb{R}^q$. Let $\gamma^*_{E,F}$ be an optimal transport plan between $\mu_E = \pi^E_\#\mu$ and $\nu_F = \pi^F_\#\nu$. Then the set of $(E, F)$-optimal plans between $\mu$ and $\nu$ is defined as
$$\Pi_{E,F}(\mu,\nu) = \{\gamma \in \Pi(\mu,\nu) \mid (\pi^E, \pi^F)_\#\gamma = \gamma^*_{E,F}\}.$$
Analogously to Muzellec and Cuturi (2019) (Section 2.2), we can obtain from $\gamma^*_{E,F}$ a coupling on the whole space by defining either the Monge-Independent plan or the Monge-Knothe plan, where OT plans are taken with respect to some OT cost such as e.g. GW.
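As a concrete sketch of choosing one subspace per dataset via PCA, consider a rotated point cloud (the data, `first_pca_projection`, and the sorting-based 1D matching are our illustrative choices, not from the paper):

```python
import numpy as np

def first_pca_projection(X):
    """Project a point cloud on its own first principal component
    (the sign of the component is arbitrary)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.diag([3.0, 0.3])   # elongated source cloud
theta = np.pi / 2                                     # target = rotated source
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ R.T

# One subspace per dataset: each cloud's own first PCA direction.
xE, yF = first_pca_projection(X), first_pca_projection(Y)
# 1D OT between the projections: sort and match in order.
plan = np.stack([np.argsort(xE), np.argsort(yF)], axis=1)
```

Because the target is a rotation of the source, the two 1D projections coincide up to a global sign flip, so the induced matching recovers the point-to-point correspondence (up to that flip), whereas projecting both clouds on one fixed axis would not.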

Properties
Following Muzellec and Cuturi (2019), the Monge-Knothe coupling is optimal among the subspace optimal plans for the corresponding cost. We show this for the Gromov-Wasserstein problem with loss $L$, as a direct transposition of Proposition 1 in Muzellec and Cuturi (2019).

Proposition 1. Let $\mu \in \mathcal{P}(\mathbb{R}^p)$, $\nu \in \mathcal{P}(\mathbb{R}^q)$, and let $\pi_{\mathrm{MK}}$ be the Monge-Knothe plan, where the couplings between conditionals are optimal for the Gromov-Wasserstein problem with loss $L$. Then:
$$\pi_{\mathrm{MK}} \in \operatorname*{argmin}_{\gamma \in \Pi_{E,F}(\mu,\nu)} \iint L(x, x', y, y')\, \mathrm{d}\gamma(x, y)\, \mathrm{d}\gamma(x', y').$$
Key properties of GW that we would like to keep are its invariances. We show in two particular cases that they are conserved on the orthogonal spaces (since the measure on $E \times F$ is fixed).

Proposition 2. Let $\mu \in \mathcal{P}(\mathbb{R}^p)$, $\nu \in \mathcal{P}(\mathbb{R}^q)$, and denote
$$GW_{E,F}(\mu,\nu) = \inf_{\gamma \in \Pi_{E,F}(\mu,\nu)} \iint L(x, x', y, y')\, \mathrm{d}\gamma(x, y)\, \mathrm{d}\gamma(x', y').$$
For $L(x, x', y, y') = \big(\|x - x'\|_2^2 - \|y - y'\|_2^2\big)^2$, $GW_{E,F}$ is invariant with respect to translations and isometries on $E^\perp$ and $F^\perp$. For $L(x, x', y, y') = \big(\langle x, x' \rangle_p - \langle y, y' \rangle_q\big)^2$, $GW_{E,F}$ is invariant with respect to isometries on $E^\perp$ and $F^\perp$.
We refer to Appendix A.1 for the proofs of the two previous propositions.

Closed-form between Gaussians
We can also derive explicit formulas between Gaussians in particular cases. Let $q \le p$, $\mu = \mathcal{N}(m_\mu, \Sigma) \in \mathcal{P}(\mathbb{R}^p)$ and $\nu = \mathcal{N}(m_\nu, \Lambda) \in \mathcal{P}(\mathbb{R}^q)$ be two Gaussian measures with $\Sigma = P_\mu D_\mu P_\mu^T$ and $\Lambda = P_\nu D_\nu P_\nu^T$. As previously, let $E \subset \mathbb{R}^p$ and $F \subset \mathbb{R}^q$ be subspaces of dimension $k$ and $k'$ respectively. Following Muzellec and Cuturi (2019), we represent $\Sigma$ in an orthonormal basis of $E \oplus E^\perp$, and denote by $\Sigma/\Sigma_E$ the Schur complement of $\Sigma$ with respect to $\Sigma_E$. We know that the conditionals of Gaussians are Gaussians, with covariance given by the Schur complement (see e.g. Rasmussen (2003); Von Mises (1964)).
For $L(x, x', y, y') = \big(\|x - x'\|_2^2 - \|y - y'\|_2^2\big)^2$, we have for now no certainty that the optimal transport plan is Gaussian. By restricting the minimization problem to Gaussian couplings, Salmona et al. (2021) showed that there is a solution $\gamma$ induced by a linear map (5). By combining the results of Muzellec and Cuturi (2019) and Salmona et al. (2021), we obtain the following closed form for Monge-Knothe couplings.

Proposition 3. Suppose $p \ge q$ and $k = k'$. For the Gaussian-restricted GW problem, the Monge-Knothe transport map is a block-triangular linear map built from $T_{E,F}$, an optimal transport map between $\mathcal{N}(0_E, \Sigma_E)$ and $\mathcal{N}(0_F, \Lambda_F)$, and an optimal map between the Schur complements.

Suppose now that $k \ge k'$, $m_\mu = 0$, $m_\nu = 0$, and let $T_{E,F}$ be an optimal transport map between $\mu_E$ and $\nu_F$ (of the form (5)). We can then derive a formula for the Monge-Independent coupling, both for the inner-GW problem and for the Gaussian-restricted GW problem, where $T_{E,F}$ is an optimal transport map for either problem.

Limit of optimal transport plans?
Another interesting property of the Monge-Knothe coupling, derived in Muzellec and Cuturi (2019), is that it can be obtained as the limit of classical optimal transport plans, similarly to Theorem 1, using a separable cost of the form $c_t(x, y) = (x - y)^T P_t (x - y)$, where $P_t$ is built from an orthonormal basis of $\mathbb{R}^p$ adapted to $E \oplus E^\perp$. However, this property does not carry over to the classical Gromov-Wasserstein losses (e.g. $L(x, x', y, y') = (d_X(x, x')^2 - d_Y(y, y')^2)^2$ or $L(x, x', y, y') = (\langle x, x' \rangle_p - \langle y, y' \rangle_q)^2$), as these are not separable. Motivated by this question, we ask in the following whether we can derive a quadratic optimal transport cost for which this property holds.

Construction and properties of the Hadamard-Wasserstein problem
The main idea of the proof of Theorem 1 in Carlier et al. (2010) is to decompose the objective function before taking the limit $t \to 0$, which makes the right-hand term vanish and allows one to conclude on the limit of the first marginal of the optimal plan. Reasoning by induction on the dimension, Carlier et al. deal with one term at a time, and finally show that the limit of the optimal plan is the Knothe-Rosenblatt transport (2.1). Another key ingredient is to have access to a unique transport map between measures on $\mathbb{R}$, as is the case for the Wasserstein distance with cost $c(x, y) = \frac{1}{2}(x - y)^2$, the Monge map being the increasing rearrangement (3) (this actually extends to smooth strictly convex costs, see Santambrogio (2015)[Theorem 2.9]).
For now, the only cost for which we have an optimal transport map in 1D is the inner product (Vayer, 2020). Hence, we need a cost which reduces to inner-GW (4) in 1D. A natural choice is therefore
$$L(x, x', y, y') = \|x \odot x' - y \odot y'\|_2^2, \quad (6)$$
where $\odot$ is the Hadamard (elementwise) product. We define the following "Hadamard-Wasserstein" problem:
$$HW^2(\mu, \nu) = \inf_{\gamma \in \Pi(\mu,\nu)} \iint \|x \odot x' - y \odot y'\|_2^2\, \mathrm{d}\gamma(x, y)\, \mathrm{d}\gamma(x', y'). \quad (7)$$

Properties The loss (6) satisfies the separability condition and reduces to the inner-GW loss in 1D. We can therefore define a degenerate version of it,
$$\forall x, x', y, y', \quad L_t(x, x', y, y') = \sum_{i=1}^d \lambda_t^{(i)} (x_i x_i' - y_i y_i')^2, \quad (8)$$
with $\lambda_t^{(1)} = 1$ and, for all $t > 0$ and all $i \in \{1, \dots, d-1\}$, $\lambda_t^{(i+1)}/\lambda_t^{(i)} \to 0$ as $t \to 0$. We denote by $HW_t$ the problem (7) with the degenerate cost (8). We now derive some useful properties which are usual for the regular Gromov-Wasserstein cost.

Proposition 5. Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$. Then:
1. The problem (7) always admits a minimizer.
2. $HW$ defines a distance on the quotient space induced by axial symmetries.
3. $HW$ is invariant to reflections with respect to the axes.
$HW$ loses some properties compared to GW. Indeed, it is only invariant with respect to axial reflections, and it can only compare measures lying in the same Euclidean space, in order for the distance to be well defined. Nonetheless, we show in the following that we can derive links with triangular couplings, in the same way as the Wasserstein distance relates to the KR rearrangement.
We first define a triangular coupling that differs from the Knothe-Rosenblatt rearrangement in that each map need not be nondecreasing. Indeed, following Theorem 2, the solution of each 1D problem is either the increasing or the decreasing rearrangement. Hence, at each step $k \ge 1$, if we disintegrate the joint law of the $k$ first variables as $\mu_{1:k} = \mu_{1:k-1} \otimes \mu_{k|1:k-1}$, the optimal transport map $T(\cdot|x_1, \dots, x_{k-1})$ will be the solution of the 1D inner-GW problem between the conditional measures. We now state the main theorem, showing that the limit of the OT plans obtained with the degenerate cost is precisely this triangular coupling.
Theorem 3. Let $\mu$ and $\nu$ be two absolutely continuous measures on $\mathbb{R}^d$, with compact supports and such that $\int \|x\|_2^4\, \mu(\mathrm{d}x) < +\infty$ and $\int \|y\|_2^4\, \nu(\mathrm{d}y) < +\infty$. Let $\gamma_t$ be an optimal transport plan for $HW_t$, let $T_K$ be the alternate Knothe-Rosenblatt map between $\mu$ and $\nu$ as defined in the last paragraph, and let $\gamma_K = (\mathrm{Id} \times T_K)_\#\mu$ be the associated transport plan. Then, $\gamma_t \rightharpoonup \gamma_K$ weakly as $t \to 0$. We report in Appendix C how to compute HW (7) in the discrete setting.
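The discrete objective of (7) can be evaluated naively as a quartic sum; a sketch (the helper name is ours, and no attempt is made at the tensor factorization of Peyré et al. (2016)):

```python
import numpy as np

def hadamard_wasserstein_cost(X, Y, gamma):
    """Objective of the discrete Hadamard-Wasserstein problem (7):
    sum_{i,j,k,l} ||x_i ⊙ x_k - y_j ⊙ y_l||_2^2 gamma_ij gamma_kl.
    Direct O(n^2 m^2 d) evaluation, fine for small point clouds."""
    n, m = gamma.shape
    total = 0.0
    for i in range(n):
        for k in range(n):
            for j in range(m):
                for l in range(m):
                    diff = X[i] * X[k] - Y[j] * Y[l]
                    total += float(diff @ diff) * gamma[i, j] * gamma[k, l]
    return total
```

Flipping the sign of one coordinate of all target points leaves the cost unchanged, which illustrates the axial-reflection invariance: the Hadamard product $y \odot y'$ squares away a shared sign flip.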

Illustrations
We use the Python Optimal Transport (POT) library (Flamary et al., 2021) to compute the different optimal transport problems involved in this illustration. We are interested here in solving a 3D mesh registration problem, a natural application of Gromov-Wasserstein (Mémoli, 2011) since it enjoys invariances with respect to isometries such as permutations, and can also naturally exploit the topology of the meshes. For this purpose, we selected two base meshes from the FAUST dataset (Bogo et al., 2014), which provides ground-truth correspondences between shapes. The information available from those meshes is geometrical (6890 vertex positions) and topological (mesh connectivity). These two meshes are represented, along with the visual results of the registration, in Figure 2. In order to visually depict the quality of the assignment induced by the transport map, we propagate through it a color code of the source vertices toward their associated counterpart vertices in the target mesh. Both the original color-coded source and the associated target ground truth are shown in the first row of the figure. For our method, we simply use as a natural one-dimensional subspace for both meshes the algebraic connectivity of the mesh topology, also known as the Fiedler vector (the eigenvector associated with the second smallest eigenvalue of the unnormalized Laplacian matrix). Reduced to a 1D optimal transport problem following Eq. (4), the computation time is very low (around 5 seconds on a standard laptop), and the associated matching is very good, with more than 98% of correct assignments. We qualitatively compare this result to Gromov-Wasserstein mappings induced by different cost functions, in the second row of Figure 2: adjacency (Xu et al., 2019), weighted adjacency (weights given by distances between vertices), heat kernel (derived from the unnormalized Laplacian) (Chowdhury and Needham, 2021), and finally geodesic distances over the meshes. On average, computing the Gromov-Wasserstein mapping using POT took around 10 minutes. Both methods based on adjacency fail to recover a meaningful mapping. The heat kernel maps continuous areas of the source mesh, but fails to recover the global structure. Finally, the geodesic distance gives a much more coherent mapping, but inverts the left and right of the human figure. Notably, a significant extra computation time was induced by the computation of the geodesic distances (around 1 hour per mesh using the NetworkX (Hagberg et al., 2008) shortest-path procedure). As a conclusion, and despite the simplification of the original problem, our method performs best, with a speed-up of two orders of magnitude.
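The 1D subspace used for each mesh above can be sketched as follows (a dense toy version of our own; `fiedler_vector` is an illustrative helper, not part of POT, and real meshes would call for sparse eigensolvers):

```python
import numpy as np

def fiedler_vector(adjacency):
    """Eigenvector of the unnormalized graph Laplacian L = D - A
    associated with its second-smallest eigenvalue (the algebraic
    connectivity)."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvecs[:, 1]

# Path graph on 5 vertices: the Fiedler vector is monotone along the path,
# so sorting vertices by it recovers their natural 1D ordering.
A = np.diag(np.ones(4), 1)
A = A + A.T
f = fiedler_vector(A)
```

Sorting vertices by their Fiedler value gives the 1D embedding on which the cheap sorted matching of Theorem 2 is then applied.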

Discussion
We proposed in this work to extend the subspace detour approach to different subspaces, and to other optimal transport costs such as Gromov-Wasserstein. Being able to project on different subspaces can be useful when the data are not aligned and do not share the same axes of interest, as well as when we work between different metric spaces, as is the case for example with graphs. A natural question, however, is how to choose these subspaces. Since the method is mostly interesting with one-dimensional subspaces, we proposed to use PCA and to project on the first directions for data embedded in Euclidean spaces. For more complicated data such as graphs, we projected onto the Fiedler vector and obtained good results efficiently on a 3D mesh registration problem. More generally, Muzellec and Cuturi (2019) proposed to perform a gradient descent on the loss with respect to orthonormal matrices. This approach is non-convex and only guaranteed to converge to a local minimum. Designing such an algorithm, which would alternate between two transformations in the Stiefel manifold, is left for future work.
The subspace detour approach for transport problems is meaningful whenever one can identify subspaces that gather most of the information of the original distributions, while making the estimate more robust and improving the sample complexity insofar as the dimensions are lower. On the computational side, when we only have access to discrete data, the subspace detour approach brings a better computational complexity solely when the subspaces are chosen one-dimensional; otherwise, the complexity is the same as solving the OT problem directly (since the complexity only depends on the number of samples). In this 1D case, the projection often gives distinct values for all the samples (for continuous-valued data), and hence the Monge-Knothe coupling is exactly the 1D coupling. As such, information is lost on the orthogonal spaces. It can be artificially recovered by quantizing the 1D values (as experimented in practice in Muzellec and Cuturi (2019)), but the added value is not clear and deserves broader study. For distributions that are absolutely continuous w.r.t. the Lebesgue measure, this limitation does not exist, but one then needs to be able to compute the projected measure onto the subspace efficiently, which may require discretizing the space and is therefore not practical in high dimension.
We also proposed a new quadratic cost $HW$, which we call Hadamard-Wasserstein, and which allows defining a degenerate cost for which the optimal transport plan converges to a triangular coupling. However, this cost loses many of the properties that motivate the use of $W_2$ or GW. Indeed, while $HW$ is a quadratic cost, it uses a Euclidean norm between Hadamard products of vectors and requires the two spaces to be the same (in order for the distance to be well defined). A workaround in the case $X = \mathbb{R}^p$ and $Y = \mathbb{R}^q$ with $p \le q$ would be to "lift" the vectors of $\mathbb{R}^p$ into $\mathbb{R}^q$ by padding, as proposed in Vayer et al. (2019b), or to project the vectors of $\mathbb{R}^q$ onto $\mathbb{R}^p$ as in Cai and Lim (2020). Yet, for applications where only the distance/similarity matrices are available, a different strategy still needs to be found. Another concern is the limited invariance properties (only with respect to axial symmetry in our case). Nevertheless, we expect such a cost to be of interest in cases where invariance to symmetry is a desired property, such as in (Nagar and Raman, 2019).

A Subspace detours
However, for $\gamma^*_{E,F}$-a.e. $(x_E, y_F)$, $(x'_E, y'_F)$, the inner term is minimized by definition of the Monge-Knothe coupling. Hence $\pi_{\mathrm{MK}}$ is indeed optimal among subspace optimal plans.
Proof of Proposition 2. Let $f: \mathbb{R}^p \to \mathbb{R}^p$ be an invariance of the cost on $E^\perp$; then $f_{E^\perp}$ is either an isometry or a translation.
From Lemma 6 of Paty and Cuturi (2019), we can rewrite the objective. For $\gamma^*_{E,F}$-almost every $(x_E, y_F)$, $(x'_E, y'_F)$, the inner term is unchanged, and by integrating with respect to $\gamma^*_{E,F}$ we obtain the claim on the subspace part. Next, we show that $\gamma = (f, \mathrm{Id})_\#\gamma'$ remains a subspace optimal plan, which allows rewriting (9). Taking the infimum with respect to $\gamma \in \Pi_{E,F}(\mu, \nu)$ concludes. For the inner-product case, the same proof applies to isometries.

A.2 Closed-form between Gaussians
Let $q \le p$, $\mu = \mathcal{N}(m_\mu, \Sigma) \in \mathcal{P}(\mathbb{R}^p)$ and $\nu = \mathcal{N}(m_\nu, \Lambda) \in \mathcal{P}(\mathbb{R}^q)$ be two Gaussian measures with $\Sigma = P_\mu D_\mu P_\mu^T$ and $\Lambda = P_\nu D_\nu P_\nu^T$. Let $E \subset \mathbb{R}^p$ be a subspace of dimension $k$ and $F \subset \mathbb{R}^q$ a subspace of dimension $k'$.
We represent $\Sigma$ in an orthonormal basis of $E \oplus E^\perp$, and $\Lambda$ in an orthonormal basis of $F \oplus F^\perp$, and denote by $\Sigma/\Sigma_E$ the Schur complement of $\Sigma$ with respect to $\Sigma_E$. We know that the conditionals of Gaussians are Gaussians, with covariance given by the Schur complement (see e.g. Rasmussen (2003) or Von Mises (1964)).

A.2.1 Quadratic GW problem
For GW with $c(x, x') = \|x - x'\|_2^2$, we have for now no guarantee that there exists an optimal coupling which is a transport map. Salmona et al. (2021) proposed to restrict the problem to the set of Gaussian couplings $\Pi(\mu, \nu) \cap \mathcal{N}_{p+q}$, where $\mathcal{N}_{p+q}$ denotes the set of Gaussian measures on $\mathbb{R}^{p+q}$. In that case, they showed that an optimal solution is induced by a linear map of the form (5), involving a matrix $\tilde{I}_q$ of the form $\mathrm{diag}\big((\pm 1)_{i \le q}\big)$.
Since the problem is translation invariant, we can always solve the problem between the centered measures.
In the following, we suppose that $k = k'$. Let us denote by $T_{E,F}$ the optimal transport map for (10) between $\mathcal{N}(0, \Sigma_E)$ and $\mathcal{N}(0, \Lambda_F)$. According to Theorem 4.1 in Salmona et al. (2021), such a solution exists and is of the form (5). We also denote by $T_{E^\perp, F^\perp}$ the optimal transport map between $\mathcal{N}(0, \Sigma/\Sigma_E)$ and $\mathcal{N}(0, \Lambda/\Lambda_F)$ (which is well defined since we assumed $p \ge q$, and hence $p - k \ge q - k'$ as $k = k'$).
We know that the Monge-Knothe transport map is a linear map $T_{\mathrm{MK}}(x) = Bx$, with $B$ a block-triangular matrix with blocks $T_{E,F}$ and $T_{E^\perp, F^\perp}$ on the diagonal and an off-diagonal block $C \in \mathbb{R}^{(q-k') \times k}$, and such that $B \Sigma B^T = \Lambda$ (so that it is indeed a transport map between $\mu$ and $\nu$). First, $T_{E,F} \Sigma_E T_{E,F}^T = \Lambda_F$ holds since $T_{E,F}$ is a transport map between $\mu_E$ and $\nu_F$; the diagonal blocks have positive values on the diagonals, and the remaining blocks of the equation $B \Sigma B^T = \Lambda$ then determine $C$.

A.2.2 Closed-form between Gaussians for Monge-Independent
Suppose $k \ge k'$ in order to be able to define the OT map between $\mu_E$ and $\nu_F$.
For the Monge-Independent plan, we know that $\pi_{\mathrm{MI}}$ is a degenerate Gaussian whose covariance has a block structure determined by $\mathrm{Cov}(X) = \Sigma$, $\mathrm{Cov}(Y) = \Lambda$, and a cross-covariance matrix $C$ of a known form. Assuming $m_\mu = m_\nu = 0$, each block can be computed explicitly, which yields the result.
By taking orthonormal bases $(V_E, V_{E^\perp})$ and $(V_F, V_{F^\perp})$, we can write this more compactly, as in Proposition 4 of Muzellec and Cuturi (2019); to check it, it suffices to expand the terms.

B Knothe-Rosenblatt
B.1 Properties of (7)

Proof of Proposition 5. Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$.

1. $(x, x') \mapsto x \odot x'$ is a continuous map, therefore $L$ is lower semi-continuous. Hence, by applying Lemma 2.2.1 of Vayer (2020), the functional $\gamma \mapsto \iint L(x, x', y, y')\, \mathrm{d}\gamma(x, y)\, \mathrm{d}\gamma(x', y')$ is lower semi-continuous for the weak convergence of measures. Now, as $\Pi(\mu, \nu)$ is a compact set (see the proof of Theorem 1.7 in Santambrogio (2015) for the Polish-space case, and of Theorem 1.4 for the compact metric space case), and the functional is lower semi-continuous for the weak convergence, we can apply the Weierstrass theorem (Memo 2.2.1 in Vayer (2020)), which states that (7) always admits a minimizer.
3. For the invariances, we first look at the properties that must be satisfied by a map $T$ in order to leave the cost unchanged. Writing the condition in the canonical basis, if we take for $T$ a reflection with respect to an axis, then it indeed satisfies $f(x, x') = f(T(x), T(x'))$. Moreover, this defines an equivalence relation, and therefore we obtain a distance on the quotient space.

Proposition 6. In a slightly more general setting, let $X_0 = X_1 = \mathbb{R}^d$, let $f_0, f_1$ be functions from $\mathbb{R}^d \times \mathbb{R}^d$ to $\mathbb{R}^d$, and let $\mu_0 \in \mathcal{P}(X_0)$, $\mu_1 \in \mathcal{P}(X_1)$. Then the family $X_t = (X_0 \times X_1, f_t, \gamma^*)$ defines a geodesic between $X_0$ and $X_1$, where $\gamma^*$ is the optimal coupling of HW between $\mu_0$ and $\mu_1$, and $f_t((x_0, x'_0), (x_1, x'_1)) = (1 - t) f_0(x_0, x'_0) + t f_1(x_1, x'_1)$.
Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ be absolutely continuous, with finite fourth moments and compact supports. We recall the problem $HW_t$. First, let us denote by $\gamma_t$ the optimal coupling of $HW_t$ for all $t > 0$. We want to show that $\gamma_t \rightharpoonup \gamma_K$. The underlying spaces are Polish, therefore, by (Villani, 2008)[Lemma 4.4], $\Pi(\mu, \nu)$ is a tight set, and we can apply the Prokhorov theorem (Santambrogio, 2015)[Box 1.4] to $(\gamma_t)_t$ and extract a weakly converging subsequence.
However, $\gamma^1_K$ was constructed to be the unique optimal plan for this cost (either $T_{\mathrm{asc}}$ or $T_{\mathrm{desc}}$, according to (Vayer, 2020)[Theorem 4.2.4]). Thus, we can deduce that the limit coincides with it. Part 2: we know that for any $t > 0$, $\gamma_t$ and $\gamma_K$ share the same marginals. Thus, as previously, $\pi^1_\#\gamma_t$ has a cost no better than that of $\pi^1_\#\gamma_K$, which translates into an inequality from which we can subtract the first term and factor out $\lambda_t^{(1)}$.
For the right-hand side, we use the product structure of $\gamma^t_K$. We can then disintegrate with respect to the first $\ell - 1$ coordinates as before. We only need to prove that the marginals coincide, which is done by taking test functions of the form $\xi(x_1, \dots, x_{\ell-1}, y_1, \dots, y_{\ell-1})\phi(x_\ell)$ and $\xi(x_1, \dots, x_{\ell-1}, y_1, \dots, y_{\ell-1})\psi(y_\ell)$, and using the fact that the measures are concentrated on $y_k = T_K(x_k)$.

In the discrete setting, the objective reads $E(\gamma) = \sum_{i,j,k,\ell} \|x_i \odot x_k - y_j \odot y_\ell\|_2^2\, \gamma_{i,j}\gamma_{k,\ell}$. As noted in Peyré et al. (2016), if we define the tensor $L_{i,j,k,\ell} = \|x_i \odot x_k - y_j \odot y_\ell\|_2^2$, then the objective can be written with the tensor-matrix multiplication $\otimes$ defined therein. In Figure 3, we generated 30 points from each of two Gaussian distributions, with uniform weights, and computed the optimal coupling of $HW_t$ for several values of $t$. On the first row, we projected the points on the first coordinate. Note that for discrete points, the Knothe-Rosenblatt coupling amounts to sorting the points with respect to the first coordinate when there is no ambiguity (i.e. when the first coordinates are pairwise distinct), as it comes back to performing optimal transport in one dimension (Peyré et al., 2019)[Remark 2.28]. For our cost, the optimal coupling in 1D can be either the increasing or the decreasing rearrangement. We indeed observe on the first row of Figure 3 that the optimal coupling for $t$ close to 0 corresponds to the "anti-cdf" matching.

Figure 1 :
Figure 1: From left to right: data (moons); OT plan obtained with GW for $c(x, x') = \|x - x'\|_2^2$; data projected on the first axis; OT plan obtained between the projected measures; data projected on their first PCA component; OT plan obtained between the projected measures.

Figure 2 :
Figure 2: 3D mesh registration. (First row) Source and target meshes, color code of the source, ground-truth color code on the target, and result of the subspace detour using Fiedler vectors as subspaces. (Second row) After recalling the expected ground truth for ease of comparison, we present results of different Gromov-Wasserstein mappings obtained with metrics based on adjacency, heat kernel and geodesic distances.