The Gromov–Wasserstein Distance: A Brief Overview

Mémoli, Facundo

doi:10.3390/axioms3030335

Open AccessArticle

The Gromov–Wasserstein Distance: A Brief Overview

by

Facundo Mémoli

Department of Mathematics, The Ohio State University, Columbus, OH, USA

Axioms 2014, 3(3), 335-341; https://doi.org/10.3390/axioms3030335

Submission received: 1 May 2014 / Revised: 12 August 2014 / Accepted: 22 August 2014 / Published: 2 September 2014

(This article belongs to the Special Issue Discrete Differential Geometry and Its Applications to Imaging and Graphics)

Download Versions Notes

Abstract

:

We recall the construction of the Gromov–Wasserstein distance and concentrate on quantitative aspects of the definition.

Keywords:

metric geometry; graph theory; shape recognition; optimal transportation

1. Introduction

Modeling datasets as metric spaces seems to be natural for some applications and concepts revolving around the Gromov–Hausdorff distance—a notion of distance between compact metric spaces—provide a useful language for expressing properties of data and shape analysis methods. In many situations, however, this is not enough, and one must incorporate other sources of information into the model, with “weights” attached to each point being one of them. This gives rise to the idea of representing data as metric measure spaces, which are metric spaces endowed with a probability measure. In terms of a distance, the Gromov–Hausdorff metric is replaced with the Gromov–Wasserstein metric.

1.1. Notation and Background Concepts

The book by Burago, et al. [1] is a valuable source for many concepts in metric geometry. We refer the reader to that book for any concepts not explicitly defined in these notes.

We let

M

denote the collection of all compact metric spaces and by

M^{iso}

the collection of all isometry classes of

M

. Recall that for a given metric space

(X, d_{X}) \in M

, its diameter is defined as

diam (X) : = {max}_{x, x^{'} \in X} d_{X} (x, x^{'}) .

Similarly, the radius of X is defined as

rad (X) : = {min}_{x \in X} {max}_{x^{'} \in X} d_{X} (x, x^{'}) .

For a fixed metric space

(Z, d_{Z})

, we let

d_{H}^{Z}

denote the Hausdorff distance between (closed) subsets of Z.

We will often refer to a metric space

(X, d_{X})

by only X, but the notation for the underlying metric will be implicitly understood to be

d_{X}

. Recall that a map

φ : X \to Y

between metric spaces

(X, d_{X})

and

(Y, d_{Y})

is an isometric embedding if

d_{Y} (φ (x), φ (x^{'})) = d_{X} (x, x^{'})

for all

x, x^{'} \in X

. The map φ is an isometry if it is a surjective isometric embedding.

Recall that given measurable spaces

(X, Σ_{X})

and

(Y, Σ_{Y})

, a measure μ on

(X, Σ_{X})

and a measurable map

f : X \to Y

, the push-forward measure

f_{#} μ

on

(Y, Σ_{Y})

acts according to

f_{#} μ (B) = μ (f^{- 1} (B))

for any

B \in Σ_{Y}

.

A metric measure space (mm-space for short) is a triple

(X, d_{X}, μ_{X})

where

(X, d_{X})

is a compact metric space and

μ_{X}

is a Borel probability measure with full support:

supp (μ_{X}) = X

. We denote by

M^{w}

the collection of all mm-spaces. An isomorphism between

X, Y \in M^{w}

is any isometry

Ψ : X \to Y

, such that

Ψ_{#} μ_{X} = μ_{Y} .

2. The Gromov–Hausdorff Distance

One says that a subset

R \subset X \times Y

is a correspondence between sets X and Y whenever

π_{1} (R) = X

and

π_{2} (R) = Y,

where

π_{1} : X \times Y \to X

and

π_{2} : X \times Y \to Y

are the canonical projections. Let

R (X, Y)

denote the set of all correspondences between X and Y.

The Gromov–Hausdorff (GH) distance between compact metric spaces

(X, d_{X})

and

(Y, d_{Y})

is defined as:

d_{G H} (X, Y) : = \frac{1}{2} inf_{R} sup_{(x, y), (x^{'}, y^{'}) \in R} | d_{X} (x, x^{'}) - d_{Y} (y, y^{'}) |

(1)

where R ranges over

R (X, Y) .

Example 1.

The GH distance between any compact metric space X and the space with exactly one point is equal to

\frac{1}{2} diam (X) .

It turns out that

(M, d_{G H})

is a nice space in that is has many compact subclasses.

Theorem 1.

([1]) Let

N : [0, + \infty) \to N

be a bounded function and

D > 0

. Let

F (N, D) \subset M

be any family of compact metric spaces, such that

diam (X) \leq D

for all

X \in F (N, D)

, such that for any

ε > 0

, any

X \in F (N, ε)

admits an ε-net with at most

N (ε)

elements. Then,

F (N, D)

is pre-compact in the Gromov–Hausdorff topology.

Example 2.

An important example of families, such as the above, is given by those closed n-dimensional Riemannian manifolds

(X, g_{X}) \in M (n, κ, D)

with the diameter bounded by

D > 0

and the Ricci curvature bounded below by κ.

Theorem 2

([2]). The space

(M^{iso}, d_{G H})

is complete.

It then follows from the two theorems above that classes

F (N, D)

, such as above, are totally bounded for the Gromov–Hausdorff distance. This means that such classes are easy to organize in the sense of clustering or databases.

In many practical applications, one would like to take into account “weights” attached to points in a dataset. For example, the two metric spaces with the weights below are isometric, but not isomorphic in the sense that no isometry respects the weights:

The idea is that weights represent how much we trust a given “measurement” in practical applications. This leads to considering a more general collection of datasets and, in turn, an adapted notion of equality and a compatible metric over them. This naturally leads to regarding datasets as mm-spaces and then finding a notion of distance on

M^{w}

compatible with isomorphism of mm-spaces.

3. A Metric on $M^{w}$

Let

(X, d_{X}, μ_{X})

and

(Y, d_{Y}, μ_{Y})

be two given mm-spaces. In our path to defining a distance between mm-spaces, we emulate the construction of the Gromov–Hausdorff distance and start by identifying a notion of correspondence between mm-spaces.

A probability measure μ over

X \times Y

is called a coupling between

μ_{X}

and

μ_{Y}

if

{(π_{1})}_{#} μ = μ_{X}

and

{(π_{2})}_{#} μ = μ_{Y} .

We denote by

U (μ_{X}, μ_{Y})

the collection of all couplings between

μ_{X}

and

μ_{Y}

.

Example 3.

When

Y = {p}

,

μ_{Y} = δ_{p}

, and thus, there is a unique coupling between X and Y:

U (μ_{X}, μ_{Y}) = {μ_{X} \otimes δ_{p}} .

Example 4.

Consider for example the spaces with two points each that we depicted above. In that case,

μ_{X}

can be identified with the vector

(\frac{1}{2}, \frac{1}{2})

and

μ_{Y}

with the vector

(\frac{3}{4}, \frac{1}{4}) .

In this case, one sees that the matrix:

[\begin{matrix} \frac{1}{4} & \frac{1}{2} \\ \frac{1}{4} & 0 \end{matrix}]

induces a valid coupling.

Now, given

p \geq 1

, consider the function

(x, y, x^{'}, y^{'}) \mapsto | d_{X} (x, x^{'}) - d_{Y} (y, y^{'}) |^{p}

and pick any

μ \in U (μ_{X}, μ_{Y}) .

One then integrates this function against the measure

μ \otimes μ

and infimizes over the choice of

μ \in U (μ_{X}, μ_{Y})

to define the Gromov–Wasserstein distance of order p [3]:

d_{G W, p} (X, Y) : = \frac{1}{2} inf_{μ} {(\int \int | d_{X} (x, x^{'}) - d_{Y} (y, y^{'}) |^{p} μ (d x \times d y) μ (d x^{'} \times d y^{'}))}^{1 / p}

Remark 1.

This is an

L^{p}

analogue of Equation (1).

Theorem 3

([3]). The Gromov–Wasserstein distance of order

p \geq 1

defines a proper distance on the collection of isomorphism classes of mm-spaces.

By standard compactness arguments, one can prove that the infimum above is always attained [3]. Let

U_{p}^{opt} (X, Y)

denote the set of all the couplings in

U (μ_{X}, μ_{Y})

that achieve the minimum. The structure of the former set depends not only on

μ_{X}

and

μ_{Y}

, but also on

d_{X}

,

d_{Y}

and p.

Example 5.

Consider the mm-space with exactly one point:

({*}, (0), δ_{*})

. Then,

d_{G W, p} (X, {*}) = \frac{1}{2} {(\int \int {(d_{X} (x, x^{'}))}^{p} μ_{X} (d x) μ_{X} (d x^{'}))}^{1 / p}

and we define

{diam}_{p} (X)

—the p-statistical diameter of X—as twice the right-hand side. Notice that

{lim}_{p \to \infty} {diam}_{p} (X)

is equal to the usual diameter of X (as a metric space).

Question 1.

To what extent are we able to replicate the nice properties of

(M, d_{G H})

in the context of

(M^{w}, d_{G W, p})

? In particular, it is of interest to investigate whether this new space of datasets is complete and whether one can easily identify rich pre-compact classes.

3.1. Pre-Compactness

Theorem 4

([3]). For a function non-decreasing

ρ : [0, \infty) \to [0, 1]

, such that

ρ (ε) > 0

for

ε > 0

and

D > 0

, let

F^{w} (ρ, D) \subset M^{w}

denote the set of all mm-spaces X, such that

diam (X) \leq D

and

{inf}_{x} μ_{X} (B_{ε} (x)) \geq ρ (ε)

for all

ε > 0

. Then,

F^{w} (ρ, D)

is pre-compact for the Gromov–Wasserstein topology, for any

p \geq 1

.

Remark 2.

Recall Example 2, where closed n-dimensional Riemannian manifolds were regarded as metric spaces. One can, all the same, embed closed Riemannian manifolds into

M^{w}

via

(X, g_{X}) \mapsto (X, d_{X}, μ_{X})

, where

d_{X}

is the geodesic distance induced by the metric tensor

g_{X}

and

μ_{X}

stands for the normalized volume measure on X. It is well known [4] that for

ε > 0

small,

μ_{X} (B_{ε} (x)) = \frac{c_{n}}{vol (X)} ε^{n} (1 - \frac{s_{X} (x)}{6 (n + 1)} ε^{2} + O (ε^{4}))

, where

s_{X} (x)

is the scalar curvature of X at x, and

vol (X)

is the total volume of X. Thus, a lower bound on

μ_{X} (B_{ε} (x))

plays the role of a proxy for an upper bound on curvature.

3.2. Completeness

The Space

M^{w}

with any p-Gromov–Wasserstein distance is not complete. Indeed, consider the following family of mm-spaces:

Δ_{n} \in M^{w}

, where

Δ_{n}

consists of

n \in N

points at distance one from each other, and all with weights

1 / n

.

Claim 1.

For all

n, m \geq 1

,

d_{G W, p} (Δ_{n}, Δ_{m}) \leq \frac{1}{2} (n^{- 1 / p} + m^{- 1 / p})

.

The claim will follow from the following claim and triangle inequality for

d_{G W, p}

:

Claim 2.

For all

n, m \geq 1

,

d_{G W, p} (Δ_{n}, Δ_{n \cdot m}) \leq \frac{1}{2} n^{- 1 / p} .

In order to verify the claim, we denote by

{x_{1}, x_{2}, \dots, x_{n}}

the points of

Δ_{n}

and label the points in

Δ_{n \cdot m}

by

{y_{11}, \dots, y_{1 m}, y_{21}, \dots, y_{2 m}, \dots \dots, y_{n 1}, \dots, y_{n m}} .

Consider the following coupling between

μ_{n}

and

μ_{n \cdot m}

, the reference measures on

Δ_{n}

and

Δ_{n \cdot m}

:

μ (x_{i}, y_{k j}) : = \frac{1}{n \cdot m} δ_{i k}, for all i, k \in {1, \dots, n} and j \in {1, \dots, m}

It is clear that this defines a valid coupling between

μ_{n}

and

μ_{n \cdot m} .

Now, note that

\begin{matrix} J (μ) : = & \sum_{i, i^{'}} \sum_{(k, j), (k^{'}, j^{'})} {| d_{Δ_{n}} (x_{i}, x_{i^{'}}) - d_{Δ_{n \cdot m}} (y_{k j}, y_{k^{'} j^{'}}) |}^{p} μ (x_{i}, y_{k j}) μ (x_{i^{'}}, y_{k^{'} j^{'}}) \\ = & \frac{1}{{(n \cdot m)}^{2}} \sum_{i, i^{'}} \sum_{j, j^{'}} {| d_{Δ_{n}} (x_{i}, x_{i^{'}}) - d_{Δ_{n \cdot m}} (y_{i j}, y_{i^{'} j^{'}}) |}^{p} \\ = & \frac{1}{{(n \cdot m)}^{2}} \sum_{i} \sum_{j, j^{'}} {| 1 - δ_{j j^{'}} |}^{p} \\ = & \frac{m - 1}{n \cdot m} \\ \leq & n^{- 1} \end{matrix}

Now, by definition,

d_{G W, p} (Δ_{m}, Δ_{n \cdot m}) \leq \frac{1}{2} {(J (μ))}^{1 / p}

, so the claim follows.

Claim 1 indicates that

{Δ_{n}}_{n \in N}

constitutes a Cauchy sequence in

M^{w}

. However, a potential limit object for this sequence will have countably infinitely many points at distance one from each other. This space is not compact, thus

d_{G W, p}

is not a complete metric.

3.3. Other Properties: Geodesics and Alexandrov Curvature

Very recently, Sturm [5] pointed out that

M^{w}

is a geodesic space when endowed with any

d_{G W, p}

,

p \geq 1

. This means that given any two spaces

X_{0}, X_{1}

in

M^{w}

, one can find a curve

[0, 1] ∋ t \mapsto X_{t} \in M^{w}

, such that

d_{G W, p} (X_{t}, X_{s}) = | t - s | d_{G W, p} (X_{0}, X_{1}), s, t \in [0, 1] .

Proposition 1

([5]). For each

p \geq 1

, the space

(M^{w}, d_{G W, p})

is geodesic. Furthermore, for

p > 1

, the following curves on

M^{w}

define geodesics between

(X_{0}, d_{0}, μ_{0})

and

(X_{1}, d_{1}, μ_{1})

in

M^{w}

:

[0, 1] ∋ t \mapsto (X_{0} \times X_{1}, d_{t}, μ)

where

d_{t} ((x_{0}, x_{1}), (x_{0}^{'}, x_{1}^{'})) : = (1 - t) d_{0} (x_{0}, x_{0}^{'}) + t d_{1} (x_{1}, x_{1}^{'})

for

(x_{0}, x_{1}), (x_{0}^{'}, x_{1}^{'}) \in X_{0} \times X_{1}

and

μ \in U_{p}^{opt} (X, Y) .

Furthermore, for

p > 1

, all geodesics are of this form.

Sturm further proved that the completion

\bar{M^{w}}

of the space

M^{w}

with metric

d_{G W, 2}

satisfies:

Theorem 5

([5]). The metric space

(\bar{M^{w}}, d_{G W, 2})

is an Alexandrov space of curvature

\geq 0

.

Amongst the consequences of this property is the fact that one can conceive of gradient flows on the space of all mm-spaces [5].

3.4. The Metric $d_{G W, p}$ in Applications

Applications of the notion of Gromov-Wasserstein distance arise in shape and data analysis. In shape analysis, the main application is shape matching under invariances. Many easily computable lower bounds for the GW distance have been discussed in [3,6]. All of them lead to solving linear programming optimization problems (for which there are polynomial time algorithms) or can be computed via elicit formulas. As an example, consider the following invariant of an mm-space

(X, d_{X}, μ_{X})

:

H_{X} : [0, \infty) \to [0, 1], t \mapsto μ_{X} \otimes μ_{X} ({(x, x^{'}) | d_{X} (x, x^{'}) \leq t})

This invariant simply encodes the distribution of pairwise distances on the dataset X, and it is defined by analogy with the so-called shape distributions that are well known in computer graphics [7]. Then, one has:

Proposition 2

([3,6]). Let

X, Y \in M^{w}

be any two mm-spaces and

p \geq 1

. Then,

d_{G W, p} (X, Y) \geq \frac{1}{2} \int_{0}^{\infty} | H_{X} (t) - H_{Y} (t) | d t

Remark 3.

This invariant is also related to the work of Boutin and Kemper [8] and Brinkman and Olver [9].

Other lower bounds which can be computed in time polynomial in the number of points in the underlying mm-spaces have been reported in [3]. As a primary example, one has that the local shape distributions of shapes provide a lower bound which is strictly stronger than the ones in the Proposition above. In more detail, consider for a given mm-space

(X, d_{X}, μ_{X})

the invariant:

h_{X} : X \times [0, \infty) \to [0, 1], (x, t) \mapsto μ_{X} (\bar{B_{t} (x)}) .

Then, for mm-spaces X and Y consider the cost function

c_{X, Y} : X \times Y \to R^{+}

given by:

c_{X, Y} (x, y) : = \int_{0}^{\infty} | h_{X} (x, t) - h_{Y} (y, t) | d t .

One then has:

Proposition 3

([3,6]). Let

X, Y \in M^{w}

be any two mm-spaces and

p \geq 1

. Then,

d_{G W, p} (X, Y) \geq \frac{1}{2} inf_{μ} \int \int c_{X, Y} (x, y) μ (d x \times d y),

where μ ranges in

U (μ_{X}, μ_{Y}) .

Remark 4.

Solving for the infimum above leads to a mass transportation problem for which there exists efficient linear programming techniques.

Remark 5.

It is possible to define a notion of spectral Gromov-Wasserstein distance which operates at the level of compact Riemannian manifolds without boundaries, and is based con the comparison of heat-kernels. This notion permits inter-relating many pre-existing shape matching methods and suggests some others [12].

4. Discussion and Outlook

The Gromov–Hausdorff distance offers a useful language for expressing different tasks in shape and data analysis. Its origins are in the work of Gromov on synthetic geometry. For finite metric spaces, the Gromov–Hausdorff distance leads to solving NP-hard combinatorial optimization problems. Related to construction is Gromov–Wasserstein distances that operate on metric measure spaces [3,10]. In contrast to the Gromov–Hausdorff distance, the computation of Gromov–Wasserstein distances leads to solving quadratic optimization problems on continuous variables. The space of all metric measures spaces endowed with a certain variant of the Gromov–Wasserstein distance [3] enjoys nice theoretical properties [5]. It seems of interest to develop provably correct approximations to these distances when restricted to some suitable subclasses of finite metric spaces. Other aspects of the Gromov–Wasserstein distance are discussed in [3,5,10,11,12].

Conflicts of Interest

The author declares no conflict of interest.

References

Burago, D.; Burago, Y.; Ivanov, S. A Course in Metric Geometry. In AMS Graduate Studies in Math; American Mathematical Society: Providence, RI, USA, 2001; Volume 33. [Google Scholar]
Petersen, P. Gromov-Hausdorff convergence of metric spaces. In Differential Geometry: Riemannian Geometry, Proceedings of the Symposium in Pure Mathematics, Los Angeles, CA, USA, 8–18 July 1990; 1990. [Google Scholar]
Mémoli, F. Gromov-Wasserstein distances and the metric approach to object matching. Found. Comput. Math. 2011, 11, 417–487. [Google Scholar] [CrossRef]
Sakai, T. Riemannian geometry. In Translations of Mathematical Monographs; American Mathematical Society: Providence, RI, USA, 1996; Volume 149. [Google Scholar]
Sturm, K.-T. The space of spaces: Curvature bounds and gradient flows on the space of metric measure spaces. Mathematics 2012. [Google Scholar]
Mémoli, F. On the use of Gromov-Hausdorff distances for shape comparison. In Proceedings of the Point Based Graphics 2007, Prague, Czech Republic, 2–3 September 2007.
Osada, R.; Funkhouser, T.; Chazelle, B.; Dobkin, D. Shape distributions. ACM Trans. Graph. 2002, 21, 807–832. [Google Scholar] [CrossRef]
Boutin, M.; Kemper, G. On reconstructing n-point configurations from the distribution of distances or areas. Adv. in Appl. Math. 2004, 32, 709–735. [Google Scholar] [CrossRef]
Brinkman, D.; Olver, P.J. Invariant histograms. Am. Math. Mon. 2012, 119, 4–24. [Google Scholar]
Sturm, K.-T. On the geometry of metric measure spaces. I. Acta Math. 2006, 196, 65–131. [Google Scholar] [CrossRef]
Mémoli, F. Gromov-Hausdorff distances in Euclidean spaces. In Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2008, Anchorage, AK, USA, 23–28 June 2008.
Mémoli, F. A spectral notion of Gromov-Wasserstein distances and related methods. Appl. Comput. Harmon. Anal. 2011, 30, 363–401. [Google Scholar] [CrossRef]

© 2014 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Share and Cite

MDPI and ACS Style

Mémoli, F. The Gromov–Wasserstein Distance: A Brief Overview. Axioms 2014, 3, 335-341. https://doi.org/10.3390/axioms3030335

AMA Style

Mémoli F. The Gromov–Wasserstein Distance: A Brief Overview. Axioms. 2014; 3(3):335-341. https://doi.org/10.3390/axioms3030335

Chicago/Turabian Style

Mémoli, Facundo. 2014. "The Gromov–Wasserstein Distance: A Brief Overview" Axioms 3, no. 3: 335-341. https://doi.org/10.3390/axioms3030335

APA Style

Mémoli, F. (2014). The Gromov–Wasserstein Distance: A Brief Overview. Axioms, 3(3), 335-341. https://doi.org/10.3390/axioms3030335

Article Menu

The Gromov–Wasserstein Distance: A Brief Overview

Abstract

1. Introduction

1.1. Notation and Background Concepts

2. The Gromov–Hausdorff Distance

3. A Metric on $M^{w}$

3.1. Pre-Compactness

3.2. Completeness

3.3. Other Properties: Geodesics and Alexandrov Curvature

3.4. The Metric $d_{G W, p}$ in Applications

4. Discussion and Outlook

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

The Gromov–Wasserstein Distance: A Brief Overview

Abstract

1. Introduction

1.1. Notation and Background Concepts

2. The Gromov–Hausdorff Distance

3. A Metric on M w

3.1. Pre-Compactness

3.2. Completeness

3.3. Other Properties: Geodesics and Alexandrov Curvature

3.4. The Metric d G W , p in Applications

4. Discussion and Outlook

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3. A Metric on $M^{w}$

3.4. The Metric $d_{G W, p}$ in Applications