Peer-Review Record

Fused Gromov-Wasserstein Distance for Structured Objects

by Titouan Vayer 1,*, Laetitia Chapel 1, Rémi Flamary 2, Romain Tavenard 3 and Nicolas Courty 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Algorithms 2020, 13(9), 212; https://doi.org/10.3390/a13090212
Submission received: 8 July 2020 / Revised: 20 August 2020 / Accepted: 26 August 2020 / Published: 31 August 2020
(This article belongs to the Special Issue Efficient Graph Algorithms in Machine Learning)

Round 1

Reviewer 1 Report

In this paper, the authors extend and analyze the so-called Fused Gromov-Wasserstein (FGW) metric defined in previous work by the same team. While the Wasserstein metric acts on probability distributions over the same space and the Gromov-Wasserstein metric compares measured metric spaces, FGW interpolates between the two to compare objects whose components carry both features and structural relations between them (e.g., a graph with node features). The idea was proposed in a previous conference paper by the same authors; here it is extended to (potentially) continuous metric spaces, and some of its properties are analyzed, in particular sample complexity and geodesic properties. A significant number of experiments are then performed to demonstrate the use of FGW in various contexts.
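
For readers who want to try the metric, a minimal sketch with the POT library (co-developed by some of the authors) is given below; the sizes, random features, and the choice alpha=0.5 are illustrative placeholders, not values from the paper.

```python
# Minimal sketch: FGW distance between two toy structured objects using
# POT (https://pythonot.github.io), assuming a recent version of the library.
import numpy as np
import ot

rng = np.random.default_rng(0)
n1, n2, d = 5, 6, 3

# Structure matrices (pairwise distances between node embeddings) and
# node features for the two objects.
C1 = ot.dist(rng.random((n1, 2)))
C2 = ot.dist(rng.random((n2, 2)))
F1, F2 = rng.random((n1, d)), rng.random((n2, d))

# Feature cost matrix (the Wasserstein part) and uniform node weights.
M = ot.dist(F1, F2)            # squared Euclidean by default
p, q = ot.unif(n1), ot.unif(n2)

# alpha=0 recovers the Wasserstein term, alpha=1 the Gromov-Wasserstein term.
T = ot.gromov.fused_gromov_wasserstein(
    M, C1, C2, p, q, loss_fun='square_loss', alpha=0.5)
fgw_val = ot.gromov.fused_gromov_wasserstein2(
    M, C1, C2, p, q, loss_fun='square_loss', alpha=0.5)
print(T.shape, fgw_val)
```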

This paper is clear and well-written. It includes illustrative examples along the way, which facilitate the comprehension of the sometimes quite complex or cumbersome definitions. While the theoretical analysis remains at times a bit preliminary, it opens perspectives (e.g., on the properties of the geodesics), which is appropriate for a paper of this nature. Moreover, the large number of well-designed experiments clearly demonstrates the use of the FGW metric in various contexts of signal processing and machine learning on graphs. I have some comments below that can be considered a bit "major", hence my asking for "major" revisions, but I believe that they are fixable and do not affect the overall quality of the paper.

My main comment is about the definition of a structured object itself, which, unless I am mistaken, is flawed. Strictly speaking, the Cartesian product X \times A contains *all* the couples (x,a) for all elements x in X and a in A. I believe this is not what the authors mean: for a labelled graph, for instance, one has a collection of couples (x_i, a_i), but this is by no means the Cartesian product of the collection of x_i with the collection of a_i. So X \times A should really be replaced by a *subset* of the product, often a strict one (alternatively, the support of \mu could be restricted to the considered couples, but this is perhaps less intuitive). I believe the analysis in the paper remains valid everywhere with modified notations (the individual X and A are respectively replaced by the projections of the considered subset, etc.); however, the geodesic part may change quite significantly, since Cartesian products can no longer be used as liberally as before. This may, however, also simplify the definitions for the finite-sample bound and maybe other parts.
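
A hedged sketch of the correction the reviewer has in mind follows; the symbol \Omega and the notation below are illustrative, not necessarily the authors' final choice.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Let $(X,d_X)$ be a structure space and $(A,d_A)$ a feature space.
Instead of the full product $X \times A$, take
\[
  \Omega \subseteq X \times A, \qquad \mu \in \mathcal{P}(\Omega).
\]
For a labelled graph with nodes $x_1,\dots,x_n$ and labels $a_1,\dots,a_n$,
\[
  \Omega = \{(x_i,a_i)\}_{i=1}^{n}, \qquad
  \mu = \frac{1}{n}\sum_{i=1}^{n}\delta_{(x_i,a_i)},
\]
which is in general a strict subset of
$\{x_1,\dots,x_n\}\times\{a_1,\dots,a_n\}$.
\end{document}
```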

Other comments / typos:
- l 69: what are "machine learning objects"?
- l 69: "objects, This"
- l 76 and more generally in the introduction: maybe clarify that [1] is "previous work" (implicitly, by the same team)
- l 93: the map T is not defined
- l 109: the notion *of* structured objects
- Definition 6 is a bit verbose; could it be written: there exists an isometry I such that (I, Id) is surjective and measure-preserving?
- is it truly useful to consider the case where q is not equal to 1? Everything seems cleaner when q = 1. If it is unavoidable to obtain good performance in the experiments, it might be worth commenting why
- in general, try to avoid overly detailed references to the theorems *before* they appear, as in lines 289-290 ("consider f as in theorem...")
- it could be good to give an idea of how difficult it is to prove each theorem, or even to include the proof in the main text when it is short (e.g., Theorem 2, which is just a consequence of known results and the Wasserstein formulation of FGW, in contrast to Theorem 3, which seems unexpectedly difficult to show)
- eq (16): the notation for the spaces seems in the wrong order
- l 427: the hypothesis that the histogram associated with the barycenter is known seems strong. In the paper on GW barycenters by Peyré et al., they mention that it could be included in the optimization process as well; it might be good to make the same remark for readers unaware of the reference (see the sketch after this list)
- l 473: maybe recall what MDS is?
- in the experiments, datasets that have no node features are treated with GW and not FGW: are they really useful? In any case, it should be stated clearly at line 490 that features will not be artificially derived with a node embedding method, as could have been expected for FGW for instance
- the implementation of PSCN by the authors seems strangely weak on the classical ENZYMES dataset; is there an explanation?
- how are node features for SBM generated?
- maybe include some potential outlooks in Section 6?
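
Regarding the barycenter comment above, here is a minimal sketch of the point using POT's fgw_barycenters, assuming a recent version of the library; the sizes and random inputs are illustrative placeholders. The barycenter histogram p is *fixed* (uniform here), which is exactly the strong hypothesis the comment points at; jointly optimizing it, as Peyré et al. suggest for GW barycenters, would need an extra outer loop that is not shown.

```python
# Hedged sketch: FGW barycenter of toy attributed graphs with a fixed
# (uniform) barycenter histogram, using POT.
import numpy as np
import ot

rng = np.random.default_rng(0)
sizes, d, N = [4, 5, 6], 2, 5      # input graph sizes, feature dim, barycenter size

Cs = [ot.dist(rng.random((n, 2))) for n in sizes]   # structure matrices
Ys = [rng.random((n, d)) for n in sizes]            # node features
ps = [ot.unif(n) for n in sizes]                    # input histograms
lambdas = [1.0 / len(sizes)] * len(sizes)           # barycenter mixture weights

# p=ot.unif(N): the barycenter histogram is assumed known, not optimized.
X, C = ot.gromov.fgw_barycenters(N, Ys, Cs, ps, lambdas, alpha=0.5,
                                 p=ot.unif(N))
print(X.shape, C.shape)   # barycenter features and structure matrix
```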

Author Response

We thank the reviewer for their very insightful comments. Please find our response attached.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper analyzes the Fused Gromov-Wasserstein (FGW) distance [1] (slightly generalizing it with an exponent p), providing a rigorous mathematical framework for it and proving several of its properties: metric properties, interpolation properties, geodesic properties, and a concentration result for the convergence of finite samples. It gives illustrative examples demonstrating its advantage over the Wasserstein and GW distances, and provides experiments showing its application to graph classification, graph barycenters, and graph clustering.

The paper is very well written, the mathematical framework is rigorous, and the mathematical results are strong and lay a solid foundation for the FGW distance. I have not checked the proofs in Section 7, but I believe they are correct. Although the experiments mostly follow [1], they make the paper more complete. I recommend publication.

Typos:

Line 135: "mesurable"

Lines 288 and 307: ]0,1[


Reference:

[1] Vayer, T.; Courty, N.; Tavenard, R.; Chapel, L.; Flamary, R. Optimal Transport for Structured Data with Application on Graphs. In Proceedings of the 36th International Conference on Machine Learning; Chaudhuri, K., Salakhutdinov, R., Eds.; PMLR: Long Beach, CA, USA, 2019; Vol. 97, Proceedings of Machine Learning Research, pp. 6275–6284.


Author Response

We thank the reviewer for the appreciation of our work. Please find our response attached.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I thank the authors for taking my comments into account. The solution provided in answer to my main remark about the definition of structured spaces is satisfactory and, as far as I can tell, mathematically sound. I recommend the paper for publication.

Small typo: a stray "A_n" is left at line 321; I don't know if this is correct/the best notation.
