# Structural Hierarchy-Enhanced Network Representation Learning

## Abstract

## 1. Introduction

**Related Work.** The most relevant studies are HARP [7] and Marc [8], both of which are hierarchical NRL methods. HARP collapses nodes according to edge and star connections so that a hierarchy can be constructed for NRL. Marc iteratively treats 3-cliques as super nodes to construct the hierarchy. However, neither HARP nor Marc considers community knowledge in networks. Besides, node embeddings at different levels are learned independently in HARP and Marc; consequently, higher-level NRL cannot utilize node embeddings derived from lower-level NRL. We compare the proposed SHE with HARP and Marc in the experiments. As for NRL methods using various kinds of hierarchical information, NetHiex [9] assumes each node is associated with a category, and the categories form a hierarchical taxonomy that is used for NRL. HRE [10] uses the relational hierarchy that comes from edge attributes for heterogeneous NRL. MINES [11] models multi-dimensional relations between different node types, along with their hierarchical connections, into the embeddings of users and items for recommender systems. Poincaré [12] specializes NRL for graphs whose nodes naturally form a hierarchical structure. DiffPool [13] classifies graphs by learning their embeddings based on differentiable pooling applied to hierarchical groups of nodes. While these studies presume that a variety of additional hierarchical information, i.e., category taxonomies, edge attributes, edge relations, hierarchical graphs, and node groups, is accessible, our work does not rely on any of them.

## 2. Problem Statement

**Structural Hierarchy-Enhanced NRL (SHE-NRL).** Given a graph $G=(V,E)$, in which each node’s embedding vector ${\mathbf{x}}_{v}\in {\mathbb{R}}^{1\times n}$ ($v\in V$) is initialized as a unit vector, along with its structural hierarchy $\mathcal{H}$, SHE-NRL learns a mapping function $f:V\to {\mathbb{R}}^{k}$ from nodes to low-dimensional embedding vectors so that nodes sharing similar connections in the graph, i.e., having a larger overlap between their neighbor sets, are projected as closely as possible in the embedding space. Here, $k$ is the embedding dimension, and $f$ can be represented as a matrix of size $n\times k$, where $n$ is the number of nodes and $k\ll \left|V\right|$.
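To make the objective concrete, the following minimal Python sketch (illustrative only, not part of the paper's method) computes the neighbor-set overlap that SHE-NRL aims to preserve, measured here with the Jaccard coefficient:

```python
# Illustrative sketch: the structural similarity SHE-NRL seeks to preserve is
# neighbor-set overlap, measured here as the Jaccard coefficient of two
# nodes' neighbor sets. Function and variable names are ours, not the paper's.

def jaccard_neighbor_overlap(adj, u, v):
    """Jaccard overlap of the neighbor sets of nodes u and v.

    adj: dict mapping each node to its set of neighbors.
    """
    nu, nv = adj[u], adj[v]
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

# Toy graph: nodes 0 and 1 share neighbors {2, 3}, so they should embed closely.
adj = {0: {2, 3}, 1: {2, 3, 4}, 2: {0, 1}, 3: {0, 1}, 4: {1}}
print(jaccard_neighbor_overlap(adj, 0, 1))  # 2 shared of 3 total neighbors
```

Pairs with high overlap (such as nodes 2 and 3 above, whose overlap is 1.0) should end up with nearly identical embedding vectors.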

## 3. The Proposed SHE-NRL Model

## 4. Experiments

#### 4.1. Experimental Setup

**Data and Settings.** We use three benchmark network datasets for the experiments: Cora, Citeseer, and PubMed (https://linqs.soe.ucsc.edu/data). Cora contains 2708 nodes, 5429 edges, and 7 labels; Citeseer has 3312 nodes, 4715 edges, and 6 labels; and PubMed contains 19,717 nodes, 44,338 edges, and 3 labels. We evaluate SHE-NRL on three well-known NRL models: DeepWalk (DW) [1], node2vec (n2v) [3], and LINE [2]. Two competing methods are employed: the state-of-the-art hierarchical NRL methods HARP [7] and Marc [8], both of which construct hierarchical structures for NRL. The embedding dimension of all methods is set to $k=128$. Note that since a method has $\tau$ levels, the embedding dimension for each level graph’s NRL is $\frac{k}{\tau}$.
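The dimension split above can be sketched as follows. This is a hedged illustration with random placeholder embeddings; the per-level concatenation into a final $k$-dimensional vector is our assumption about how the budget is used, and all names are ours:

```python
import numpy as np

# Sketch of the setup's dimension budget: with total dimension k and tau
# hierarchy levels, each level graph's NRL uses k / tau dimensions. We assume
# (for illustration) that per-level embeddings are concatenated into the
# final k-dimensional vectors; the embeddings here are random stand-ins.

k, tau, n_nodes = 128, 4, 10
dim_per_level = k // tau  # 32 dimensions per level graph

rng = np.random.default_rng(0)
level_embeddings = [rng.normal(size=(n_nodes, dim_per_level))
                    for _ in range(tau)]

final = np.concatenate(level_embeddings, axis=1)
print(final.shape)  # (10, 128)
```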

**Evaluation Tasks.** We evaluate the effectiveness of node embeddings on two downstream tasks: node classification (NC) and link prediction (LP). Given a certain fraction of nodes and all their labels, the goal of NC is to classify the labels of the remaining nodes. Node embeddings are treated as features, and we use a one-vs-rest logistic regression classifier with L2 regularization. The default training/test split is 80%:20%; in the main experiment, we vary this ratio to see how different methods perform. LP, on the other hand, is to predict the existence of links given the existing network structure, which requires feature vectors for links and non-links. Following node2vec [3], we employ the Hadamard operator, i.e., the element-wise product, to generate the feature vector of a node pair from its two node embeddings. To obtain links, we randomly remove 50% of the edges from the network while ensuring that the residual network remains connected. To obtain non-links, we randomly sample an equal number of node pairs without edges connecting them.
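The LP feature construction described above can be sketched in a few lines. This is an illustration with random placeholder embeddings (the helper name `hadamard_features` is ours), not the paper's implementation:

```python
import numpy as np

# Sketch of link-prediction feature construction (following node2vec): the
# feature of a node pair (u, v) is the Hadamard (element-wise) product of the
# two node embeddings. Negatives are node pairs with no connecting edge.

rng = np.random.default_rng(42)
n_nodes, k = 100, 128
X = rng.normal(size=(n_nodes, k))   # placeholder node embeddings
edges = [(0, 1), (2, 3), (4, 5)]    # "link" pairs (positives)

def hadamard_features(X, pairs):
    """Element-wise product of embeddings for each (u, v) pair."""
    return np.array([X[u] * X[v] for u, v in pairs])

pos = hadamard_features(X, edges)

# Negatives: sample an equal number of node pairs without edges between them.
edge_set = set(edges)
neg_pairs = []
while len(neg_pairs) < len(edges):
    u, v = rng.integers(n_nodes, size=2)
    if u != v and (u, v) not in edge_set and (v, u) not in edge_set:
        neg_pairs.append((int(u), int(v)))
neg = hadamard_features(X, neg_pairs)

print(pos.shape, neg.shape)  # (3, 128) (3, 128)
```

The stacked positive and negative feature matrices then feed the binary classifier used for LP.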

**Evaluation Metrics.** For node classification, we adopt Macro-F1 (MAF) and Micro-F1 (MIF) as the evaluation metrics. For link prediction, we use Area Under Curve (AUC) scores. Higher values indicate better performance for all metrics. Note that, due to the page limit, we report only MAF for node classification; MIF exhibits very similar results.
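The difference between the two classification metrics can be made explicit with a short sketch: Micro-F1 aggregates true/false positives over all classes before computing F1, while Macro-F1 averages per-class F1 scores. The code below is a generic illustration, not taken from the paper:

```python
# Macro-F1 averages per-class F1 scores (each class weighted equally);
# Micro-F1 pools TP/FP/FN across classes first (dominated by large classes).

def f1_scores(y_true, y_pred, labels):
    per_class = []
    tp_all = fp_all = fn_all = 0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    macro = sum(per_class) / len(labels)
    micro = 2 * tp_all / (2 * tp_all + fp_all + fn_all)
    return macro, micro

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro, micro = f1_scores(y_true, y_pred, labels=[0, 1, 2])
```

For single-label classification as here, Micro-F1 coincides with accuracy (4/6 in this toy example), whereas Macro-F1 is sensitive to performance on small classes.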

#### 4.2. Experimental Results

**Main Results.** The main results are shown in Figure 3, Figure 4 and Figure 5, from which we make several observations. First, DeepWalk, node2vec, and LINE enhanced by the proposed SHE obtain significant performance improvements (i.e., red vs. blue curves) in both NC and LP across datasets. The improvement margins are around 60% for NC and 30% for LP; these percentages are obtained by averaging the differences in MAF and AUC scores between the red and blue curves over all training percentages on the Citeseer and Cora datasets. Second, SHE further outperforms the state-of-the-art methods HARP and Marc, even though both already lead to apparent improvements over the original DeepWalk and node2vec. We attribute this to the fact that SHE not only leverages community knowledge but also brings both fine-grained and coarse-grained information into the learning of NRLs at different levels. Besides, the superiority of SHE is more obvious in node classification than in link prediction. Third, as the training percentage increases, SHE consistently outperforms the other hierarchical NRL enhancement methods. Such results demonstrate the effectiveness of SHE.

**Level Analysis.** We aim to understand how the number of hierarchy levels affects the performance improvement of the proposed SHE. We vary the utilized levels as 0, 0–1, and 0–2, which indicate no hierarchy (NRL on the original network), a hierarchy with only one additional level, and a three-level hierarchy, respectively. The results are exhibited in Figure 6. We find that a hierarchy with only one additional level is enough to bring significant performance improvement; moreover, the performance with two additional levels is nearly the same as that with one. From these results we can draw an insight: the community knowledge obtained from the original network alone is sufficient for SHE to boost the effectiveness of NRL.

## 5. Conclusions and Future Work


## References

1. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), New York, NY, USA, 24–27 August 2014; ACM: New York, NY, USA, 2014; pp. 701–710.
2. Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q. LINE: Large-Scale Information Network Embedding. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15), Florence, Italy, 18–22 May 2015; ACM: New York, NY, USA, 2015; pp. 1067–1077.
3. Grover, A.; Leskovec, J. Node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 855–864.
4. Dong, Y.; Chawla, N.V.; Swami, A. Metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’17), Halifax, NS, Canada, 13–17 August 2017; ACM: New York, NY, USA, 2017; pp. 135–144.
5. Gao, H.; Huang, H. Deep Attributed Network Embedding. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; pp. 3364–3370.
6. Kipf, T.N.; Welling, M. Semi-supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations (ICLR ’17), Toulon, France, 24–26 April 2017.
7. Chen, H.; Perozzi, B.; Hu, Y.; Skiena, S. HARP: Hierarchical Representation Learning for Networks. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA, 2–7 February 2018; pp. 2127–2134.
8. Xin, Z.; Chen, J.; Chen, G.; Zhao, S. Marc: Multi-Granular Representation Learning for Networks Based on the 3-Clique. IEEE Access **2019**, 7, 141715–141727.
9. Ma, J.; Cui, P.; Wang, X.; Zhu, W. Hierarchical Taxonomy Aware Network Embedding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’18), London, UK, 19–23 August 2018; ACM: New York, NY, USA, 2018; pp. 1920–1929.
10. Chen, M.; Quirk, C. Embedding Edge-Attributed Relational Hierarchies. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’19), Paris, France, 21–25 July 2019; ACM: New York, NY, USA, 2019; pp. 873–876.
11. Ma, Y.; Ren, Z.; Jiang, Z.; Tang, J.; Yin, D. Multi-Dimensional Network Embedding with Hierarchical Structure. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM ’18), Los Angeles, CA, USA, 5–9 February 2018; ACM: New York, NY, USA, 2018; pp. 387–395.
12. Nickel, M.; Kiela, D. Poincaré Embeddings for Learning Hierarchical Representations. In Advances in Neural Information Processing Systems 30; 2017; pp. 6338–6347. Available online: https://papers.nips.cc/book/advances-in-neural-information-processing-systems-30-2017 (accessed on 10 October 2020).
13. Ying, R.; You, J.; Morris, C.; Ren, X.; Hamilton, W.L.; Leskovec, J. Hierarchical Graph Representation Learning with Differentiable Pooling. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS ’18), Montreal, QC, Canada, 3–8 December 2018; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 4805–4815.
14. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. **2008**, 2008, P10008.
15. Cai, H.; Zheng, V.W.; Chang, K.C. A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications. IEEE Trans. Knowl. Data Eng. **2018**, 30, 1616–1637.

**Figure 1.** Given a collaboration network on the left, we illustrate and compare the community-aware embedding space with the general node embedding space, and expect that the community-aware version better encodes the structure of nodes in the embedding space.

**Figure 3.** Results on DeepWalk (DW) by varying the training percentage. The left and right columns are on node classification and link prediction, respectively. The top and bottom rows use the Citeseer and Cora datasets, respectively.

**Figure 4.** Results on node2vec (n2v) by varying the training percentage. The left and right columns are on node classification and link prediction, respectively. The top and bottom rows use the Citeseer and Cora datasets, respectively.

**Figure 5.** Performance comparison on the PubMed data. Left: node classification by MAF scores. Right: link prediction by AUC scores. Compared methods include the combinations of the node embedding methods DeepWalk (DW), node2vec (n2v), and LINE with the hierarchical NRL methods: original (the original NRL method), HARP, Marc, and our proposed SHE.

**Figure 6.** Results by changing the number of utilized levels in SHE based on DW and n2v, respectively.

| Notation | Description |
|---|---|
| $G=(V,E)$ | the original graph with node set $V$ and edge set $E$ |
| $k$ | the embedding dimension |
| $n$ | the number of nodes in $G$ |
| $\mathcal{H}$ | the structural hierarchy |
| ${H}^{h}$ | the level-$h$ graph in $\mathcal{H}$ |
| $\tau$ | the number of levels in $\mathcal{H}$ |
| ${C}_{i}^{h}$ | the $i$-th community node in ${H}^{h}$ |
| ${e}_{ij}^{h}$ | the edge that connects node ${C}_{i}^{h}$ with node ${C}_{j}^{h}$ in ${H}^{h}$ |
| ${D}^{h}$ | the set of edges that connect community nodes in ${H}^{h}$ |
| $\rho$ | the hyperparameter controlling the height of $\mathcal{H}$ |
| ${n}_{h}$ | the number of nodes in ${H}^{h}$ |
| ${\mathbf{X}}^{h}$ | the embeddings of nodes in ${H}^{h}$ |
| ${v}_{j}^{h}$ | the $j$-th node in ${H}^{h}$ |
| ${\mathbf{x}}_{v}^{h}$ | the embedding vector of node $v$ in ${H}^{h}$ |
| ${{\mathbf{x}}^{\prime}}_{v}^{h}$ | the newly initialized embedding vector of node $v$ in ${H}^{h}$ |
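The hierarchy notation above can be illustrated with a short sketch of one coarsening step: given a community assignment (e.g., from the Louvain method [14]), each community becomes a super node $C_i^h$, and an edge $e_{ij}^h$ connects super nodes whose member communities are linked in the level below. This is our illustration of the notation, not the paper's exact construction procedure:

```python
# Illustrative coarsening step for one hierarchy level: collapse each
# community into a super node, keeping an edge between two super nodes iff
# some edge in the lower-level graph crosses between their communities.

def coarsen(edges, community_of):
    """Collapse a graph into its community-level graph."""
    super_edges = set()
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu != cv:
            super_edges.add((min(cu, cv), max(cu, cv)))
    return sorted(super_edges)

# Level-0 graph with two obvious communities, {0, 1, 2} and {3, 4, 5},
# connected by the single crossing edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(coarsen(edges, community_of))  # [(0, 1)]
```

Repeating this step until the graph stops shrinking (or a height bound controlled by $\rho$ is reached) yields the $\tau$ level graphs $H^0, \dots, H^{\tau-1}$.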


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, C.-T.; Lin, H.-Y.
Structural Hierarchy-Enhanced Network Representation Learning. *Appl. Sci.* **2020**, *10*, 7214.
https://doi.org/10.3390/app10207214
