# Hierarchical and Unsupervised Graph Representation Learning with Loukas’s Coarsening


## Abstract


## 1. Introduction

## 2. Related Work

## 3. Definitions and Background

**Definition 1.**

**Definition 2.**

#### 3.1. Weisfeiler–Lehman Procedure (WL)

**WL**-Optimal Assignment kernel [27] as state-of-the-art. The procedure to generate the labels is the following:
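In outline, each node starts from its initial label and, at every iteration, receives a new label obtained by hashing its current label together with the sorted multiset of its neighbors' labels. A minimal sketch of this relabeling loop (function and variable names are ours, not from the paper):

```python
from collections import Counter

def wl_labels(adj, labels, iterations=3):
    """Weisfeiler-Lehman relabeling: at each iteration, a node's new label
    is a hash of its own label and the sorted multiset of its neighbors' labels.
    Returns the label histogram at every iteration."""
    histograms = [Counter(labels)]
    for _ in range(iterations):
        labels = [
            hash((labels[u], tuple(sorted(labels[v] for v in adj[u]))))
            for u in range(len(adj))
        ]
        histograms.append(Counter(labels))
    return histograms

# Adjacency lists of a triangle with one pendant node attached to node 2.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
hists = wl_labels(adj, [0, 0, 0, 0])
```

After one iteration, the uniform initial labeling splits into three classes (the two symmetric triangle nodes, the junction node, and the pendant node), which is exactly the subtree-pattern refinement the kernels above exploit.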

#### 3.2. Negative Sampling and Mutual Information

#### 3.3. Graph2Vec

**WL** from this graph. Minimizing the cross entropy with respect to $\theta$ leads to the following expression for the loss:

**WL** label embedding ${\theta}_{x}\in {\mathbb{R}}^{d}$, which are vectors of parameters randomly initialized and optimized with SGD. There is one such vector for each graph $g$ and each label $x$ produced by **WL**. The resulting graph embedding is $\mathcal{E}(g)={\theta}_{g}$, while ${\theta}_{x}$ can be discarded. Optimizing this objective maximizes the mutual information between the **WL** labels and the graph embeddings, which is a way to compress information about the distribution of **WL** labels into the embedding.
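A minimal sketch of this optimization with skip-gram-style negative sampling (the dimensions, learning rate, and sampling scheme are illustrative choices of ours, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
n_graphs, n_labels, d = 5, 20, 8
theta_g = rng.normal(scale=0.1, size=(n_graphs, d))  # one vector per graph g
theta_x = rng.normal(scale=0.1, size=(n_labels, d))  # one vector per WL label x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(g, x, negatives, lr=0.1):
    """One negative-sampling step: pull theta_g[g] towards the embedding of a
    label x observed in graph g, push it away from random negative labels."""
    s = sigmoid(theta_g[g] @ theta_x[x])
    grad_g = (1.0 - s) * theta_x[x]             # maximize log sigmoid(<g, x>)
    theta_x[x] += lr * (1.0 - s) * theta_g[g]
    for x_neg in negatives:
        if x_neg == x:                          # skip accidental positive hits
            continue
        s_neg = sigmoid(theta_g[g] @ theta_x[x_neg])
        grad_g -= s_neg * theta_x[x_neg]        # maximize log sigmoid(-<g, x'>)
        theta_x[x_neg] -= lr * s_neg * theta_g[g]
    theta_g[g] += lr * grad_g

# Repeatedly present label 3 as a positive example for graph 0.
for _ in range(200):
    sgd_step(g=0, x=3, negatives=rng.integers(0, n_labels, size=5))
```

After training, the inner product between $\theta_g$ and the embeddings of its observed labels is large, which is the mutual-information maximization described above.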

## 4. Contribution: Hierarchical Graph2Vec (HG2V)

- We first show that **WL** fails to capture global-scale information, which is harmful for many tasks;
- We then show that this flaw can be corrected by graph coarsening; in particular, Loukas's coarsening exhibits good properties in this regard;
- We finally show that the advantage of GNNs over **WL** is that they are continuous functions of the node features, and are therefore robust to small perturbations.

- The training is unsupervised: no labels are required, and the representation can be used for different tasks.
- The model is inductive, trained once and for all with the graphs of the dataset in linear time. The training dataset is used as a prior to embed new graphs, whatever their underlying distribution.
- It handles continuous node attributes by replacing the hash function of the **WL** procedure with a Graph Convolutional NN. It can be combined with other learning layers, serving as a pre-processing step for feature extraction.
- The model is end-to-end differentiable. Its input and its output can be connected to other deep neural networks, so it can be used as a building block in a full pipeline. The loss signal can be back-propagated through the model to train feature extractors, or to retrain the model in transfer learning. For example, if the node features are rich and complex (images, audio), a CNN can be connected to the input to improve the quality of the representation.
- The structures of the graph at all scales are summarized using Loukas's coarsening. The embedding combines a **local view and global view** of the graph.


Algorithm 1: High-level version of the Hierarchical Graph2Vec (HG2V) algorithm.

#### 4.1. Loukas’s Coarsening

**WL** procedure, and the benefit of graph coarsening to overcome this issue. For simplicity, we will put aside the node attributes for a moment and focus only on the graph structure. Even in this simplified setting, **WL** appears to be sensitive to structural noise.

#### 4.1.1. Weisfeiler–Lehman Sensitivity to Structural Noise

The capacity of **WL** to discriminate all graph patterns comes with the incapacity to recognize as similar a graph and its noisy counterpart. Each edge added or removed can strongly perturb the histogram of labels produced by **WL**. Said otherwise, **WL** is not a good solution to the inexact graph matching problem.
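This fragility is easy to observe numerically. The sketch below (ours, in the spirit of Figure 2) removes a single edge from a large cycle and compares the resulting **WL** label histograms with an intersection-over-union similarity:

```python
from collections import Counter

def wl_histogram(adj, iterations):
    """Histogram of WL labels after a number of relabeling iterations."""
    labels = [0] * len(adj)
    for _ in range(iterations):
        labels = [hash((labels[u], tuple(sorted(labels[v] for v in adj[u]))))
                  for u in range(len(adj))]
    return Counter(labels)

def similarity(h1, h2):
    """Histogram intersection over union of the label multisets."""
    return sum((h1 & h2).values()) / sum((h1 | h2).values())

n = 100
cycle = [[(u - 1) % n, (u + 1) % n] for u in range(n)]
# Remove the single edge (0, 1): the cycle becomes a path.
path = [list(nb) for nb in cycle]
path[0].remove(1)
path[1].remove(0)

for k in (1, 3, 5):
    print(k, similarity(wl_histogram(cycle, k), wl_histogram(path, k)))
```

A single removed edge leaves the graph globally almost unchanged, yet the similarity strictly decreases with the iteration depth, since every node within $k-1$ hops of the deleted edge changes its label.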


#### 4.1.2. Robustness to Structural Noise with Loukas’s Coarsening

The **WL** procedure characterizes a graph as a sequence of rooted subtrees of increasing width. While this description is suitable for small patterns and inexact graph matching at local scale, it is very sensitive to structural noise: adding or removing a few edges (without hurting the global shape) completely changes the labels of the subtrees of higher width. To characterize global features (e.g., communities, bridges, sparsest cuts…), we rely on a sequence of coarsened graphs based on Loukas's procedure, which replaces the successive neighborhood iterations of **WL**.
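Loukas's algorithm selects contraction sets so as to preserve the spectrum of the Laplacian; the mechanics of pooling by edge contraction can nevertheless be illustrated with a much simpler heavy-edge matching (this sketch is ours and is *not* Loukas's selection criterion):

```python
import numpy as np

def coarsen_once(W):
    """One coarsening level: greedily match each node with its heaviest
    unmatched neighbor, then contract matched pairs into super-nodes.
    Returns the coarse weight matrix and the node -> super-node map."""
    n = W.shape[0]
    parent = -np.ones(n, dtype=int)
    n_coarse = 0
    for u in np.argsort(-W.sum(axis=1), kind="stable"):  # heavy nodes first
        if parent[u] >= 0:
            continue
        nbrs = np.where((W[u] > 0) & (parent < 0))[0]
        nbrs = nbrs[nbrs != u]
        parent[u] = n_coarse
        if len(nbrs) > 0:
            parent[nbrs[np.argmax(W[u, nbrs])]] = n_coarse  # heaviest neighbor
        n_coarse += 1
    # Accumulate edge weights between distinct super-nodes.
    Wc = np.zeros((n_coarse, n_coarse))
    for u in range(n):
        for v in range(n):
            if u != v and parent[u] != parent[v]:
                Wc[parent[u], parent[v]] += W[u, v]
    return Wc, parent

# A 6-cycle: one coarsening level halves the number of nodes.
W = np.zeros((6, 6))
for u in range(6):
    W[u, (u + 1) % 6] = W[(u + 1) % 6, u] = 1.0
Wc, parent = coarsen_once(W)
```

The `parent` array plays the role of the pooling function $\mathcal{P}$ of Definition 3: it maps every node of $g^{l}$ to a node of $g^{l+1}$.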

**Definition 3** (pooling function):


**Definition 4.**

#### 4.2. Hierarchy of Neighborhoods

#### 4.3. Handling Continuous Node Attributes with Truncated Krylov

The **WL** algorithm uses a discrete hash function in Equation (2), with the issue that nodes sharing similar but not quite identical neighborhoods are considered different. If the differences are caused by noise in computations or measurements, they should not result in large differences between the labels. For that reason, we relax the injectivity property of **WL** by replacing the hash with a function enjoying a continuity property.
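A truncated Krylov convolution concatenates several propagation orders of the normalized adjacency applied to the node features before the nonlinearity, acting as a continuous surrogate for one **WL** hashing step. A hedged sketch (the normalization, depth, and dimensions are illustrative choices of ours):

```python
import numpy as np

def truncated_krylov_layer(A, X, W, k=4):
    """Concatenate [X, SX, S^2 X, ..., S^{k-1} X], with S the symmetrically
    normalized adjacency (self-loops added), then apply a shared linear map
    followed by tanh. Small input perturbations yield small output changes."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                 # add self-loops
    d = A_hat.sum(axis=1)
    S = A_hat / np.sqrt(np.outer(d, d))   # D^{-1/2} (A + I) D^{-1/2}
    blocks, H = [], X
    for _ in range(k):
        blocks.append(H)
        H = S @ H
    Z = np.concatenate(blocks, axis=1)    # Krylov features, shape (n, k * f)
    return np.tanh(Z @ W)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
X = rng.normal(size=(3, 2))
W = rng.normal(size=(4 * 2, 5))
out = truncated_krylov_layer(A, X, W)
```

Unlike the discrete hash, this map is Lipschitz in the node attributes, which is exactly the continuity property invoked above.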

**WL** [32,34,35]. We require the opposite, and we emphasize the importance of not having too strong a discriminator. We use the extension of the Gromov–Wasserstein distance to attributed graphs, which requires the mapping to preserve both edge weights and node attributes. The resulting distance is a special case of the Fused Gromov–Wasserstein distance [36].

**Definition 5.**

**Lemma 1.**


#### 4.4. Hierarchical Negative Sampling

#### Complexity

## 5. Evaluation

#### 5.1. Datasets

#### 5.2. Supervised Classification

#### 5.2.1. Training Procedure

#### 5.2.2. Model Selection

#### 5.2.3. Baselines

**WL**-Optimal Assignment [27] and the Wasserstein Weisfeiler–Lehman [4] graph kernels. They almost always outperform inductive methods based on neural networks. However, like many kernel-based methods, they have quadratic time complexity in the number of graphs, which is prohibitive for large datasets.

**supervised**. DiffPool also relies on graph coarsening, but its pooling function is learned, while Loukas's coarsening is task-agnostic.

#### 5.2.4. Results

#### 5.2.5. Computation Time

#### 5.3. Inductive Learning

#### Results

#### 5.4. Ablative Studies

**WL** iterations with forward passes through a GNN. All the available attributes are used.

**WL** iteration would do). The sequence of (unconnected) graphs is fed into Graph2Vec. Continuous attributes are ignored, because **WL** cannot handle them.

#### Results

#### 5.5. Latent Space

## 6. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. Proofs

**Proof.**

## Appendix B. Additional Visualizations of the Embeddings

**Figure A1.** Six nearest neighbors of graphs in the latent space from MNIST, for four graphs. Column 0 corresponds to a randomly chosen graph; the six nearest neighbors are drawn in increasing distance order from left to right (1 to 6).

**Figure A2.** Six nearest neighbors of graphs in the latent space from IMDB, for ten graphs. Column 0 corresponds to a randomly chosen graph; the six nearest neighbors are drawn in increasing distance order from left to right (1 to 6).

**Figure A3.** Six nearest neighbors of graphs in the latent space from PTC, for six graphs. Column 0 corresponds to a randomly chosen graph; the six nearest neighbors are drawn in increasing distance order from left to right (1 to 6).

## Appendix C. Details about the Datasets

#### Appendix C.1. DLA

#### Appendix C.2. MNIST and USPS

## References

- Hamilton, W.L.; Ying, R.; Leskovec, J. Representation learning on graphs: Methods and applications. arXiv **2017**, arXiv:1709.05584.
- Narayanan, A.; Chandramohan, M.; Venkatesan, R.; Chen, L.; Liu, Y.; Jaiswal, S. graph2vec: Learning distributed representations of graphs. arXiv **2017**, arXiv:1707.05005.
- Loukas, A. Graph reduction with spectral and cut guarantees. J. Mach. Learn. Res. **2019**, 20, 1–42.
- Togninalli, M.; Ghisu, E.; Llinares-López, F.; Rieck, B.; Borgwardt, K. Wasserstein Weisfeiler–Lehman graph kernels. In Proceedings of the Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 6439–6449.
- Vishwanathan, S.V.N.; Schraudolph, N.N.; Kondor, R.; Borgwardt, K.M. Graph kernels. J. Mach. Learn. Res. **2010**, 11, 1201–1242.
- Shervashidze, N.; Borgwardt, K.M. Fast subtree kernels on graphs. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems 2009, Vancouver, BC, Canada, 7–10 December 2009; pp. 1660–1668.
- Shervashidze, N.; Schweitzer, P.; Leeuwen, E.J.V.; Mehlhorn, K.; Borgwardt, K.M. Weisfeiler–Lehman graph kernels. J. Mach. Learn. Res. **2011**, 12, 2539–2561.
- Feragen, A.; Kasenburg, N.; Petersen, J.; De Bruijne, M.; Borgwardt, K. Scalable kernels for graphs with continuous attributes. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 216–224.
- Kriege, N.; Mutzel, P. Subgraph matching kernels for attributed graphs. arXiv **2012**, arXiv:1206.6483.
- Morris, C.; Kriege, N.M.; Kersting, K.; Mutzel, P. Faster kernels for graphs with continuous attributes via hashing. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016.
- Veličković, P.; Fedus, W.; Hamilton, W.L.; Liò, P.; Bengio, Y.; Hjelm, R.D. Deep graph infomax. In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May 2019.
- Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; pp. 1024–1034.
- Sun, F.Y.; Hoffman, J.; Verma, V.; Tang, J. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In Proceedings of the ICLR, Addis Ababa, Ethiopia, 26–30 April 2020.
- Bianchi, F.M.; Grattarola, D.; Livi, L.; Alippi, C. Hierarchical representation learning in graph neural networks with node decimation pooling. arXiv **2019**, arXiv:1910.11436.
- Dorfler, F.; Bullo, F. Kron reduction of graphs with applications to electrical networks. IEEE Trans. Circuits Syst. I Regul. Pap. **2013**, 60, 150–163.
- Bravo Hermsdorff, G.; Gunderson, L. A unifying framework for spectrum-preserving graph sparsification and coarsening. In Proceedings of the Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 7736–7747.
- Ying, Z.; You, J.; Morris, C.; Ren, X.; Hamilton, W.; Leskovec, J. Hierarchical graph representation learning with differentiable pooling. In Proceedings of the Annual Conference on Neural Information Processing Systems 2018, Montreal, QC, Canada, 3–8 December 2018; pp. 4800–4810.
- Bianchi, F.M.; Grattarola, D.; Alippi, C. Spectral clustering with graph neural networks for graph pooling. In Proceedings of the 37th International Conference on Machine Learning, Online Event, 12–18 July 2020.
- Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. **2020**, 1–21.
- Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016; pp. 3844–3852.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the ICLR, Toulon, France, 24–26 April 2017.
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. In Proceedings of the ICLR, Vancouver, BC, Canada, 30 April–3 May 2018.
- Luan, S.; Zhao, M.; Chang, X.W.; Precup, D. Break the ceiling: Stronger multi-scale deep graph convolutional networks. In Proceedings of the Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 10945–10955.
- Loukas, A. What graph neural networks cannot learn: Depth vs. width. In Proceedings of the ICLR, Addis Ababa, Ethiopia, 26–30 April 2020.
- Gama, F.; Marques, A.G.; Leus, G.; Ribeiro, A. Convolutional neural network architectures for signals supported on graphs. IEEE Trans. Signal Process. **2019**, 67, 1034–1049.
- Weisfeiler, B.; Lehman, A.A. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Tech. Informatsia **1968**, 2, 12–16.
- Kriege, N.M.; Giscard, P.L.; Wilson, R. On valid optimal assignment kernels and applications to graph classification. In Proceedings of the Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016; pp. 1623–1631.
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 3111–3119.
- Hjelm, R.D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; Bengio, Y. Learning deep representations by mutual information estimation and maximization. In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May 2019.
- Nowozin, S.; Cseke, B.; Tomioka, R. f-GAN: Training generative neural samplers using variational divergence minimization. In Proceedings of the Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016; pp. 271–279.
- Melamud, O.; Goldberger, J. Information-theory interpretation of the skip-gram negative-sampling objective function. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Short Papers), Vancouver, BC, Canada, 30 July–4 August 2017; Volume 2, pp. 167–171.
- Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May 2019.
- Hagberg, A.; Swart, P.; Schult, D. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008), Pasadena, CA, USA, 19–24 August 2008; Varoquaux, G., Vaught, T., Millman, J., Eds.; pp. 11–15.
- Maron, H.; Ben-Hamu, H.; Serviansky, H.; Lipman, Y. Provably powerful graph networks. In Proceedings of the Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 2156–2167.
- Morris, C.; Ritzert, M.; Fey, M.; Hamilton, W.L.; Lenssen, J.E.; Rattan, G.; Grohe, M. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4602–4609.
- Vayer, T.; Chapel, L.; Flamary, R.; Tavenard, R.; Courty, N. Optimal transport for structured data with application on graphs. In Proceedings of the ICML 2019—36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
- Kersting, K.; Kriege, N.M.; Morris, C.; Mutzel, P.; Neumann, M. Benchmark Data Sets for Graph Kernels. 2016. Available online: https://chrsmrrs.github.io/datasets/docs/datasets/ (accessed on 20 August 2020).
- Orsini, F.; Frasconi, P.; DeRaedt, L. Graph invariant kernels. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 3756–3762.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv **2014**, arXiv:1412.6980.
- Nikolentzos, G.; Siglidis, G.; Vazirgiannis, M. Graph kernels: A survey. arXiv **2019**, arXiv:1904.12218.
- Errica, F.; Podda, M.; Bacciu, D.; Micheli, A. A fair comparison of graph neural networks for graph classification. arXiv **2019**, arXiv:1912.09893.
- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv **2017**, arXiv:1708.07747.
- Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. arXiv **2017**, arXiv:1704.01212.
- Witten, T.A., Jr.; Sander, L.M. Diffusion-limited aggregation, a kinetic critical phenomenon. Phys. Rev. Lett. **1981**, 47, 1400.
- Witten, T.A.; Sander, L.M. Diffusion-limited aggregation. Phys. Rev. B **1983**, 27, 5686.

**Figure 2.** Similarity score as a function of the number of edges removed, for different numbers of **WL** iterations. The similarity score reaches 100% for identical sets of labels, and 0% for disjoint sets of labels. (**a**) Cycle: the 2-regular graph with one giant connected component. (**b**) Tree: three children per node, except those at the last level. (**c**) Wheel: like the cycle graph, with an additional node connected to all the others. (**d**) Ladder: two paths of 250 nodes each, where each pair of nodes is joined by an edge.

**Figure 3.** Coarsening of four graphs built from MNIST digits using Loukas's algorithm. The bigger the node, the wider the pooled neighborhood. Similar digits share similar shapes.

**Figure 4.** Wasserstein distance between the spectra of graphs ${g}^{0}$ and ${h}^{0}$ sampled from different datasets (see Section 5 for their description), compared to the same distance between their coarsened graphs ${g}^{1}$ and ${h}^{1}$, and between their twice-coarsened graphs ${g}^{2}$ and ${h}^{2}$. In blue (resp. red) are given the correlation coefficients between the distance of ${g}^{0}$ & ${h}^{0}$ and that of ${g}^{1}$ & ${h}^{1}$ (resp. ${g}^{2}$ & ${h}^{2}$). These coefficients were computed by averaging 10 runs, each sampling 1000 graph couples (${g}^{0}$, ${h}^{0}$) per dataset. As expected, the more a graph is coarsened, the more the correlation coefficient decreases, because coarsening always loses some structural information.
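One way to realize the distance of Figure 4 is the 1-Wasserstein distance between sorted Laplacian spectra, which for 1D point sets reduces to the mean absolute difference of sorted values (the zero-padding convention for graphs of different sizes is our assumption):

```python
import numpy as np

def laplacian_spectrum(W):
    """Eigenvalues of the combinatorial Laplacian L = D - W, ascending."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.eigvalsh(L)

def spectra_wasserstein(W1, W2):
    """1-Wasserstein distance between the two sorted spectra, padding the
    shorter spectrum with zeros so both have equal length."""
    s1, s2 = laplacian_spectrum(W1), laplacian_spectrum(W2)
    m = max(len(s1), len(s2))
    s1 = np.pad(s1, (m - len(s1), 0))   # prepend zeros (spectra start at 0)
    s2 = np.pad(s2, (m - len(s2), 0))
    return float(np.abs(s1 - s2).mean())

tri = np.ones((3, 3)) - np.eye(3)       # triangle graph: spectrum (0, 3, 3)
d_self = spectra_wasserstein(tri, tri)  # identical graphs -> distance 0
```

Because Loukas's coarsening approximately preserves the Laplacian spectrum, this distance between coarsened graphs stays correlated with the distance between the original graphs, which is what the correlation coefficients in Figure 4 quantify.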

**Figure 5.** Single level of the pyramid. Local information ${x}^{l}(u)$ centered around node u is extracted from graph ${g}^{l}$. The graph is coarsened to form a new graph ${g}^{l+1}$, where ${g}^{l+1}(\mathcal{P}(u))$ captures information at a larger scale, centered on node $\mathcal{P}(u)$. The pair $({x}^{l}(u),{g}^{l+1}(\mathcal{P}(u)))$ is used as a positive example in the negative sampling algorithm, which helps to maximize the mutual information between the global and local views.

**Figure 6.** Six nearest neighbors of graphs in the learned latent space, for four graphs from IMDB-b and MNIST. Column 0 corresponds to a randomly chosen graph; the six nearest neighbors are drawn in increasing distance order from left to right (1 to 6).

**Table 1.** Key properties of methods (related or proposed) for graph embedding. N is the number of graphs. The symbol **✗** for complexity (inference) means the method is transductive (not inductive), so inference takes the same time as training. The symbol **✓** for supervised means labels are required to learn a representation (by back-propagating a classification loss).

Method | Continuous Attributes | Complexity (Training) | Complexity (Inference) | End-to-End Differentiable | Supervised
---|---|---|---|---|---
Kernel methods, e.g., WL-OA [27], WWL [4] | ✓ | $\mathcal{O}({N}^{2})$ * | $\mathcal{O}(N)$ * | ✗ | ✗
Graph2Vec [2] | ✗ | $\mathcal{O}(N)$ | ✗ | ✗ | ✗
GIN [32], DiffPool [17], MinCutPool [18] | ✓ | $\mathcal{O}(N)$ | $\mathcal{O}(1)$ | ✓ | ✓
HG2V (Section 4), Infograph [13] | ✓ | $\mathcal{O}(N)$ | $\mathcal{O}(1)$ | ✓ | ✗

**Table 2.** Accuracy on classification tasks. HG2V is trained over both the TrainVal and Test splits, without using labels due to its unsupervised nature. Model selection of the C-SVM and of the hyper-parameters of HG2V was done with 5-fold cross-validation over the TrainVal split. We report the accuracy over the Test split, averaged over 10 runs, with standard deviation. Unavailable results are marked **✗**.

DATASET | #graphs | #nodes | HG2V (Ours) | Graph2Vec | Infograph | DiffPool (Supervised) | GIN (Supervised) | MinCutPool (Supervised) | WL-OA (Kernel) | WWL (Kernel)
---|---|---|---|---|---|---|---|---|---|---
IMDB-m | 1500 | 13 | $47.9\pm 1.0$ | $\mathbf{50.4}\pm \mathbf{0.9}$ | $49.6\pm 0.5$ | $45.6\pm 3.4$ | $48.5\pm 3.3$ | ✗ | ✗ | ✗
PTC_FR | 351 | 15 | $\mathbf{67.5}\pm \mathbf{0.5}$ | $60.2\pm 6.9$ | ✗ | ✗ | ✗ | ✗ | $63.6\pm 1.5$ | ✗
FRANK. | 4337 | 17 | $\mathbf{65.3}\pm \mathbf{0.7}$ | $60.4\pm 1.3$ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
MUTAG | 188 | 18 | $81.8\pm 1.8$ | $83.1\pm 9.2$ | $\mathbf{89.0}\pm \mathbf{1.1}$ | ✗ | ✗ | ✗ | $84.5\pm 1.7$ | $87.3\pm 1.5$
IMDB-b | 1000 | 20 | $71.3\pm 0.8$ | $63.1\pm 0.1$ | $73.0\pm 0.9$ | $68.4\pm 3.3$ | $71.2\pm 3.9$ | ✗ | ✗ | $\mathbf{74.4}\pm \mathbf{0.8}$
NCI1 | 4110 | 30 | $76.3\pm 0.8$ | $73.2\pm 1.8$ | ✗ | $76.9\pm 1.9$ | $80.0\pm 1.4$ | ✗ | $\mathbf{86.1}\pm \mathbf{0.2}$ | $85.8\pm 0.2$
NCI109 | 4127 | 30 | $75.6\pm 0.7$ | $74.3\pm 1.5$ | ✗ | ✗ | ✗ | ✗ | $\mathbf{86.3}\pm \mathbf{0.2}$ | ✗
ENZYMES | 600 | 33 | $66.0\pm 2.5$ | $51.8\pm 1.8$ | ✗ | $59.5\pm 5.6$ | $59.6\pm 4.5$ | ✗ | $59.9\pm 1.1$ | $\mathbf{73.3}\pm \mathbf{0.9}$
PROTEINS | 1113 | 39 | $75.7\pm 0.7$ | $73.3\pm 2.0$ | ✗ | $73.7\pm 3.5$ | $73.3\pm 4.0$ | $76.5\pm 2.6$ | $76.4\pm 0.4$ | $\mathbf{77.9}\pm \mathbf{0.8}$
MNIST | 10000 | 151 | $\mathbf{96.1}\pm \mathbf{0.2}$ | $56.3\pm 0.7$ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
D&D | 1178 | 284 | $79.2\pm 0.8$ | $58.6\pm 0.1$ | ✗ | $75.0\pm 3.5$ | $75.3\pm 2.9$ | $\mathbf{80.8}\pm \mathbf{2.3}$ | $79.2\pm 0.4$ | $79.7\pm 0.5$
REDDIT-b | 2000 | 430 | $91.2\pm 0.6$ | $75.7\pm 1.0$ | $82.5\pm 1.4$ | $87.8\pm 2.5$ | $89.9\pm 1.9$ | $\mathbf{91.4}\pm \mathbf{1.5}$ | $89.3$ | ✗
DLA | 1000 | 501 | $\mathbf{99.9}\pm \mathbf{0.1}$ | $77.2\pm 2.5$ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
REDDIT-5K | 4999 | 509 | $55.5\pm 0.7$ | $47.9\pm 0.3$ | $53.5\pm 1.0$ | $53.8\pm 1.4$ | $\mathbf{56.1}\pm \mathbf{1.7}$ | ✗ | ✗ | ✗

**Table 3.** Accuracy on classification tasks when training on one input distribution and performing inference on another. The selected hyper-parameters are identical to those of Table 2.

Training Set | Inference Set | Accuracy (Inference) | Delta with Baseline (see Table 2)
---|---|---|---
MNIST | USPS | 94.86 | ✗
USPS | MNIST | 93.68 | −2.40
REDDIT-b | REDDIT-5K | 55.00 | −0.48
REDDIT-5K | REDDIT-b | 91.00 | −0.15
REDDIT-b | IMDB-b | 69.00 | −2.25
REDDIT-5K | IMDB-b | 69.50 | −1.75
MNIST | FASHION MNIST | 83.35 | ✗

**Table 4.** Ablative studies. The accuracy on the test set is reported in the columns "Accuracy". The columns "Delta" give the difference in average accuracy with respect to Graph2Vec. **OOM**: Out-of-Memory error.

DATASET | #nodes | HG2V Accuracy | HG2V Delta | Graph2Vec + GNN Accuracy | Graph2Vec + GNN Delta | Graph2Vec + Loukas Accuracy | Graph2Vec + Loukas Delta | Graph2Vec Accuracy
---|---|---|---|---|---|---|---|---
IMDB-b | 20 | 70.85 | +7.75 | 70.70 | +7.60 | 57.50 | −5.60 | 63.10
NCI1 | 30 | 77.97 | +4.75 | 75.40 | +2.18 | 65.45 | −7.77 | 73.22
MNIST | 151 | 95.83 | +39.56 | 91.05 | +34.78 | 72.50 | +16.23 | 56.27
D&D | 284 | 78.01 | +19.37 | 79.26 | +13.16 | 66.10 | +7.45 | 58.64
REDDIT-B | 430 | 91.95 | +16.23 | OOM | ✗ | 82.50 | +6.78 | 75.72

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Béthune, L.; Kaloga, Y.; Borgnat, P.; Garivier, A.; Habrard, A.
Hierarchical and Unsupervised Graph Representation Learning with Loukas’s Coarsening. *Algorithms* **2020**, *13*, 206.
https://doi.org/10.3390/a13090206
