Structural Hierarchy-Enhanced Network Representation Learning

Network representation learning (NRL) is crucial for generating effective node features for downstream tasks, such as node classification (NC) and link prediction (LP). However, existing NRL methods neither properly identify which nodes should be pushed together and which should be pushed apart in the embedding space, nor model the coarse-grained community knowledge hidden behind the network topology. In this paper, we propose a novel NRL framework, Structural Hierarchy Enhancement (SHE), to address these two issues. The main idea is to construct a structural hierarchy from the network based on community detection, and to utilize this hierarchy to perform level-wise NRL. In addition, lower-level node embeddings are passed to higher-level ones so that NRL can be aware of community knowledge. Experiments conducted on benchmark network datasets show that SHE can significantly boost the performance of NRL in both NC and LP, compared to other hierarchical NRL methods.


Introduction
Network representation learning (NRL) is a crucial task in social and information network analysis. The idea of NRL is to learn a mapping function that maps each node to a vector in a low-dimensional embedding space while preserving the structural proximity between nodes in the given network. The derived node embedding vectors can be utilized for downstream tasks, including node classification, link prediction, and community detection. Typical NRL methods include DeepWalk [1], LINE [2], and node2vec [3], which depict every node by its structural neighborhood. Metapath2vec [4] extends the skip-gram based NRL to heterogeneous information networks that contain multiple types of nodes and links. DANE [5] incorporates the similarity between node attributes into NRL. GCN [6] further learns node embeddings for semi-supervised node classification through layer-wise propagation with graph convolution.
Regarding the typical NRL approaches [1][2][3] that preserve structural proximity in node embeddings, we think there are two major insufficiencies. First, every node is only aware of its few-hop neighbors obtained by random walk sampling, which pushes them close together in the embedding space. Nevertheless, as illustrated in the general node embedding space in Figure 1, a non-neighbor node ($v_5$) that is pushed away by negative sampling could still have some connection with the target node ($v_1$), e.g., belonging to the same network community (C1). In addition, a neighbor node ($v_8$) that is sampled by random walks and pushed closer could have only a weak connection to the target node ($v_1$), e.g., belonging to a different community (C1 versus C3). In other words, community-level information is not incorporated into the learning of node embeddings. We think network communities can help nodes better recognize which nodes should be pushed together and which should be pushed apart, as shown in the community-aware embedding space in Figure 1. Second, the given graph depicts only the fine-grained interactions between nodes. As shown in Figure 1, the given graph is a collaboration network, and thus the links depict co-authorships. The coarse-grained semantics, including how authors are involved in different research areas and belong to different institutes, together with their interactions, are not captured by typical NRL approaches. We think encoding coarse-grained semantics can improve the effectiveness of node embeddings.

Figure 1. Given a collaboration network (with communities C1 to C5) on the left, we illustrate and compare the community-aware embedding space with the general node embedding space, and expect that the community-aware version can better encode the structure of nodes in the embedding space.
In this paper, we propose a novel framework, Structural Hierarchy Enhancement (SHE), to enhance the effectiveness of network representation learning. The main idea is two-fold. First, we construct a structural hierarchy that depicts both fine-grained and coarse-grained semantics of nodes and their interactions. We consider that node semantics are depicted by communities (i.e., clusters of nodes), and thus utilize community detection techniques to produce the structural hierarchy. By learning the embeddings of nodes at different levels of the hierarchy, our model is capable of encoding multiple different views of each node. Hence, the semantics of nodes can be enriched, and nodes can be better distinguished from one another in the embedding space. Second, we utilize this hierarchy to enhance NRL. Our SHE can be seamlessly applied to enhance any of the typical NRL models mentioned above. Besides, any existing hierarchical community detection algorithm can be applied to generate the hierarchy. In other words, our SHE model is flexible enough to be compatible with different combinations of NRL methods and community detection techniques.
Related Work. The most relevant studies are HARP [7] and Marc [8], which are hierarchical NRL methods. HARP collapses nodes according to edge and star connections so that the hierarchy can be constructed for NRL. Marc iteratively considers 3-cliques as super nodes to construct the hierarchy. However, community knowledge in networks is considered in neither HARP nor Marc. Besides, the node embeddings at different levels are learned independently in HARP and Marc. That is, higher-level NRL cannot utilize node embeddings derived from lower-level NRL. We will compare the proposed SHE with HARP and Marc in the experiments. As for NRL methods using various kinds of hierarchical information, NetHiex [9] assumes each node is associated with a category, and categories form a hierarchical taxonomy, which is used for NRL. HRE [10] uses the relational hierarchy that comes from edge attributes for heterogeneous NRL. MINES [11] models multi-dimensional relations between different node types, along with their hierarchical connections, into the embeddings of users and items for recommender systems. Poincare [12] specializes NRL for graphs whose nodes naturally form a hierarchical structure. DiffPool [13] classifies graphs by learning their embeddings based on differentiable pooling applied to hierarchical groups of nodes. While these studies presume that a variety of additional hierarchical information, i.e., category taxonomy, edge attributes, edge relations, hierarchical graph, and node groups, is accessible, our work does not rely on any of them.

Problem Statement
We first describe the notation for our problem. Let $G = (V, E)$ denote a network, in which $V$ is the node set ($n = |V|$ is the number of nodes) and $E$ is the edge set. We construct a structural hierarchy $H$ from a network $G$: $H = \{H^0, H^1, \ldots, H^{\tau-1}\}$, where $\tau$ is the number of levels in hierarchy $H$, and $H^h$ is the level-$h$ graph. The level-0 graph is the original network, i.e., $H^0 = G$. The level-$(h+1)$ graph $H^{h+1}$ is constructed from the level-$h$ graph $H^h$. We present the list of all notations used in this paper in Table 1.

Table 1. Notations and their descriptions.
Structural Hierarchy-Enhanced NRL (SHE-NRL). Given a graph $G = (V, E)$, in which each node's embedding vector $x_v \in \mathbb{R}^{1 \times n}$ ($v \in V$) is initialized by a unit vector, along with its structural hierarchy $H$, SHE-NRL is to learn a mapping function $f : V \to \mathbb{R}^k$ from nodes to low-dimensional embedding vectors so that nodes sharing similar connections in the graph, i.e., having a larger overlap between their neighbor sets, are projected as closely as possible in the embedding space. Here $k$ is the embedding dimension, and $f$ is a matrix of size $n \times k$, where $k \ll |V|$.

The Proposed SHE-NRL Model
The proposed SHE-NRL consists of four phases: (1) construction of the structural hierarchy, (2) intra-level NRL, (3) level-wise pooling mechanism, and (4) generating final node embeddings. We first outline these four phases based on Figure 2. First, we construct the structural hierarchy by performing a network community detection algorithm from lower- to higher-level graphs. Second, an existing NRL method is performed on the level-$h$ graph (the original graph is the level-0 graph) to generate level-$h$ node embeddings. Third, a level-wise pooling mechanism aggregates the level-$h$ node embeddings, both to initialize the level-$(h+1)$ node embeddings for level-$(h+1)$ NRL in a bottom-up manner and to initialize the level-$(h-1)$ node embeddings for level-$(h-1)$ NRL in a top-down manner. The second and third phases are performed iteratively until the highest level of the hierarchy is reached. Lastly, the final node embeddings are produced by concatenating node embeddings at different levels based on their community memberships.

Figure 2. Example structural hierarchy with three levels (Collaboration Graph, Research-Area Graph, Institute Graph), annotated with the bottom-up and top-down passes.

Phase 1: Construction of Structural Hierarchy. The structural hierarchy $H$ is constructed from a network $G$. By applying a community detection algorithm to the level-$h$ graph $H^h$, we obtain a set of communities (i.e., node sets) $\{C^h_1, C^h_2, \ldots, C^h_{n^h}\}$, where $n^h$ is the number of communities in $H^h$ and $C^h_i$ is the $i$-th community. These communities are treated as the nodes of the level-$(h+1)$ graph $H^{h+1}$, given by $H^{h+1} = (\{C^h_1, \ldots, C^h_{n^h}\}, D^h)$, where $D^h$ is the set of edges that connect communities. For every pair of communities $C^h_i$ and $C^h_j$, we create an edge $e^{h+1}_{ij}$ to connect them in $H^{h+1}$ if there exists at least one edge between nodes in $C^h_i$ and nodes in $C^h_j$ in $H^h$. To adaptively determine the number of communities $n^h$ for every graph $H^h$, we utilize the Louvain algorithm [14] for community detection. We do not pre-define the number of levels $\tau$ in the hierarchy, but continue to produce $H^{h+1}$ from $H^h$ until $n^h < \rho$, where $\rho$ is a hyperparameter; we set $\rho = 5$ by default. In other words, we do not generate the level-$(h+1)$ graph $H^{h+1}$ if the number of communities $n^h$ found in $H^h$ is lower than $\rho$. The hyperparameter $\rho$ is therefore the minimum number of communities at the last (highest) level of the hierarchy, and it controls the hierarchy height only indirectly. The hierarchy height $\tau$, used in Phase 2, is determined automatically by this stopping rule.
Phase 2: Intra-level NRL. Given the level-$h$ graph $H^h$, we perform intra-level NRL. Our SHE-NRL framework allows any NRL method that preserves structural proximity between nodes. That is, typical NRL methods, such as DeepWalk [1], LINE [2], and node2vec [3], can be used for intra-level NRL. Let $X^h$ be the generated embedding matrix for all nodes in $H^h$. The intra-level NRL is performed iteratively in a one-round circular manner: the bottom-up pass is executed first, followed by the top-down pass. Specifically, the intra-level NRL is performed level by level from $H^0$, $H^1$ to $H^\tau$ (the bottom-up pass), and then again from $H^\tau$, $H^{\tau-1}$ to $H^0$ (the top-down pass). The bottom-up pass brings the fine-grained interactions between nodes into the intra-level NRLs at higher levels of the hierarchy. The top-down pass makes the intra-level NRLs at lower levels aware of coarse-grained community knowledge. The intra-level NRL at every level-$h$ graph $H^h$ ($0 \le h \le \tau - 1$) is executed twice, while the level-$\tau$ NRL is executed only once. In the next phase, we discuss how $X^h$ is used to initialize the node embeddings of graphs $H^{h+1}$ and $H^{h-1}$. We randomly initialize the embeddings of nodes for NRL in the original graph $H^0 = G$.
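As an illustration of Phase 1, the sketch below builds the hierarchy by repeatedly detecting communities and collapsing them into the nodes of the next level, stopping once fewer than rho communities are found. It is a minimal sketch under stated assumptions, not the authors' implementation: `build_hierarchy`, `coarsen`, and the `detect` callback are hypothetical names, the additional stop when no nodes are merged is an added safeguard, and `pair_up` is a toy stand-in for the Louvain algorithm [14], used only to keep the example self-contained.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Node = Hashable
Graph = Tuple[Set[Node], Set[Tuple[Node, Node]]]   # (nodes, undirected edges)
Detector = Callable[[Graph], List[Set[Node]]]      # returns disjoint communities


def coarsen(graph: Graph, communities: List[Set[Node]]) -> Graph:
    """Collapse each community of H^h into one node of H^{h+1}."""
    _, edges = graph
    owner = {v: i for i, comm in enumerate(communities) for v in comm}
    super_edges = set()
    for u, v in edges:
        cu, cv = owner[u], owner[v]
        if cu != cv:  # one edge between two communities is enough to link them
            super_edges.add((min(cu, cv), max(cu, cv)))
    return set(range(len(communities))), super_edges


def build_hierarchy(graph: Graph, detect: Detector, rho: int = 5):
    """Return the level graphs [H^0, H^1, ...] and the per-level memberships."""
    levels: List[Graph] = [graph]
    # memberships[h][v] = id of v's community, i.e. its node id in H^{h+1}
    memberships: List[Dict[Node, int]] = []
    while True:
        communities = detect(levels[-1])
        n_h = len(communities)
        # Stop when fewer than rho communities are found (n^h < rho).
        # Added safeguard (assumption): also stop if nothing was merged.
        if n_h < rho or n_h == len(levels[-1][0]):
            break
        memberships.append(
            {v: i for i, comm in enumerate(communities) for v in comm}
        )
        levels.append(coarsen(levels[-1], communities))
    return levels, memberships


def pair_up(graph: Graph) -> List[Set[Node]]:
    """Toy stand-in for Louvain: groups consecutive integer node ids in pairs."""
    nodes = sorted(graph[0])
    return [set(nodes[i:i + 2]) for i in range(0, len(nodes), 2)]


# A 16-node path graph 0-1-2-...-15.
path: Graph = (set(range(16)), {(i, i + 1) for i in range(15)})
levels, memberships = build_hierarchy(path, pair_up, rho=3)
print([len(nodes) for nodes, _ in levels])
```

With the toy detector and rho = 3, the path graph yields levels of 16, 8, and 4 nodes; the next detection would return only 2 communities, so no further level is generated. In practice `detect` would wrap a real community detection routine.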
Phase 3: Level-wise Pooling Mechanism. To bring fine-grained information about node interactions into higher-level NRLs, and to make lower-level NRLs aware of coarse-grained community knowledge at higher levels of the hierarchy, we propose the level-wise pooling mechanism, which consists of bottom-up pooling and top-down pooling. The bottom-up pooling utilizes the node embeddings $X^h$ of $H^h$ to initialize the node embeddings for the NRL on $H^{h+1}$, and is implemented by max pooling. Max pooling initializes the embedding of node $v^{h+1}_j$ at $H^{h+1}$ from its corresponding community $C^h_i$ at $H^h$: we exploit the most significant learned node embedding in community $C^h_i$ at $H^h$ as the initial embedding of node $v^{h+1}_j$ at $H^{h+1}$. On the other hand, the top-down pooling is implemented by average pooling, which initializes the embeddings of all nodes in the $i$-th community $C^h_i$ at $H^h$ from the learned embedding of the corresponding node at $H^{h+1}$. That is, given the newly learned embedding of node $u^{h+1}_j$ at $H^{h+1}$, denoted by $x^{h+1}_u$, and the previously generated embedding of the corresponding lower-level node $v^h \in C^h_i$ at $H^h$, denoted by $x^h_v$, the new initial embedding of node $v^h$ is obtained by average pooling, as given in Equation (2). Utilizing average pooling in the top-down pass distributes coarse-grained community knowledge back to the NRLs at lower-level graphs. Note that we use two different letters, $v$ and $u$, to better distinguish nodes at different levels: $v$ refers to a node at level $h$ and $u$ refers to a node at level $h+1$.
Phase 2 and Phase 3 are applied alternately, first in the bottom-up pass and then in the top-down pass. In other words, in the bottom-up pass, when node embeddings are generated by NRL in $H^h$ (Phase 2), they are immediately used to initialize the embeddings of nodes in $H^{h+1}$ via max pooling (Phase 3). In the top-down pass, when node embeddings are produced by NRL in $H^{h+1}$ (Phase 2), they are immediately used to initialize the embeddings of nodes in $H^h$ via average pooling (Phase 3).
We use Figure 2 to further elaborate the interleaved process between Phase 2 and Phase 3, describing the bottom-up process first, followed by the top-down process. Given the structural hierarchy consisting of three graphs at different levels, i.e., $H^0$, $H^1$, and $H^2$, in the bottom-up process we first perform a typical NRL method on $H^0$ and obtain the node embedding $x^0_v$ for every $v \in H^0$. Then we apply max pooling over the nodes that belong to the same level-1 community to initialize the embedding of each level-1 community node in $H^1$. We again apply the same NRL method to $H^1$ to obtain the node embedding of every level-1 community node, and use max pooling to initialize each level-2 community node in $H^2$. The NRL method is then applied to $H^2$. Next, in the top-down process, we utilize the average pooling in Equation (2) to initialize the level-1 community node embeddings from the level-2 community node embeddings. The NRL method is then applied to $H^1$, followed by average pooling to initialize the node embeddings in $H^0$. The last NRL is executed on $H^0$ to generate the embeddings of nodes in the original graph.
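A minimal sketch of the two pooling operators follows, under two assumptions that the text does not spell out. First, "the most significant learned node embedding" is read as an element-wise maximum over the community members, $\max_{v \in C^h_i} x^h_v$. Second, average pooling is read as the unweighted mean of the two vectors involved, $(x^h_v + x^{h+1}_u)/2$. The function names and the dictionary-based data layout are hypothetical.

```python
from typing import Dict, Hashable, List

Vector = List[float]
Embeddings = Dict[Hashable, Vector]


def max_pool_up(x_h: Embeddings, membership: Dict[Hashable, int]) -> Embeddings:
    """Bottom-up: initialise each level-(h+1) node from its community at level h.

    Assumption: element-wise max over the member embeddings.
    """
    pooled: Embeddings = {}
    for v, comm in membership.items():
        prev = pooled.get(comm)
        if prev is None:
            pooled[comm] = list(x_h[v])
        else:
            pooled[comm] = [max(a, b) for a, b in zip(prev, x_h[v])]
    return pooled


def avg_pool_down(x_h: Embeddings, x_up: Embeddings,
                  membership: Dict[Hashable, int]) -> Embeddings:
    """Top-down: re-initialise each level-h node from its community's node above.

    Assumption: unweighted mean of the node's previous embedding and the
    newly learned embedding of its level-(h+1) node.
    """
    return {
        v: [(a + b) / 2.0 for a, b in zip(vec, x_up[membership[v]])]
        for v, vec in x_h.items()
    }


# Three level-h nodes in two communities (0 and 1).
x0 = {"a": [1.0, -2.0], "b": [0.5, 3.0], "c": [-1.0, 0.0]}
member = {"a": 0, "b": 0, "c": 1}

init_up = max_pool_up(x0, member)
# Pretend NRL at level h+1 produced these vectors (made-up values).
x1_learned = {0: [3.0, 1.0], 1: [1.0, 2.0]}
init_down = avg_pool_down(x0, x1_learned, member)
```

Here `init_up` is `{0: [1.0, 3.0], 1: [-1.0, 0.0]}`, and `init_down["a"]` is `[2.0, -0.5]`, the midpoint of `[1.0, -2.0]` and `[3.0, 1.0]`.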
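The interleaving of Phase 2 and Phase 3 can be sketched as a single round over the hierarchy. This is an illustrative sketch only: `she_round` is a hypothetical name, the pooling operators assume element-wise max and an unweighted two-vector mean, and `nrl(graph, init)` stands for any embedding routine that accepts initial vectors. The demo uses an identity stand-in that merely logs the graph size and returns its initial vectors, so that the order of calls is visible; it is not an NRL method.

```python
def max_pool_up(x_h, membership):
    """Element-wise max over each community's members (assumed reading)."""
    pooled = {}
    for v, comm in membership.items():
        prev = pooled.get(comm)
        pooled[comm] = (list(x_h[v]) if prev is None
                        else [max(a, b) for a, b in zip(prev, x_h[v])])
    return pooled


def avg_pool_down(x_h, x_up, membership):
    """Mean of a node's old vector and its community's new vector (assumed)."""
    return {v: [(a + b) / 2.0 for a, b in zip(vec, x_up[membership[v]])]
            for v, vec in x_h.items()}


def she_round(levels, memberships, nrl, init0):
    """Run NRL bottom-up over all levels, then top-down back to level 0."""
    top = len(levels) - 1
    x = [None] * len(levels)
    init = init0
    # Bottom-up pass: learn at level h, then max-pool to initialise level h+1.
    for h in range(top + 1):
        x[h] = nrl(levels[h], init)
        if h < top:
            init = max_pool_up(x[h], memberships[h])
    # Top-down pass: average-pool from level h+1, then re-learn level h.
    for h in range(top - 1, -1, -1):
        init = avg_pool_down(x[h], x[h + 1], memberships[h])
        x[h] = nrl(levels[h], init)
    return x


calls = []


def identity_nrl(graph, init):
    """Placeholder for DeepWalk/LINE/node2vec: logs the call, learns nothing."""
    calls.append(len(graph[0]))
    return {v: list(vec) for v, vec in init.items()}


# A hand-made three-level hierarchy: 4 nodes -> 2 communities -> 1 community.
levels = [
    ({"a", "b", "c", "d"}, {("a", "b"), ("b", "c"), ("c", "d")}),
    ({0, 1}, {(0, 1)}),
    ({0}, set()),
]
memberships = [{"a": 0, "b": 0, "c": 1, "d": 1}, {0: 0, 1: 0}]
init0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 2.0], "d": [-1.0, 4.0]}

x = she_round(levels, memberships, identity_nrl, init0)
print(calls)
```

The call log shows graph sizes 4, 2, 1, 2, 4: every level below the top is visited twice and the top level once, matching the schedule described above.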
Phase 4: Generating Final Node Embeddings. Equipped with the derived node embedding matrix $X^h$ at every level-$h$ graph, we can generate the final embeddings for all nodes in the original network $G$ (i.e., $H^0$). To let the final node embeddings contain both fine-grained and coarse-grained information, the concatenation operation is adopted. We concatenate the embedding vector of node $v$ in $H^0$ with all of its corresponding higher-level embedding vectors in $H^h$, where $h = 1, 2, \ldots, \tau$. Suppose the dimension of each level's node embedding is equal and denoted by $b$. The dimension of the final node embeddings will then be $(\tau + 1)b$.
Note that alternative operators, such as average, Hadamard, and weighted-L2 [3], could also be used to fuse node embeddings at different levels of the hierarchy. We leave the design of better embedding aggregation for future investigation.
Remark. Two important ideas in the proposed SHE-NRL model are the bottom-up and top-down corrections of embedding vectors. If we first generated node embeddings for every level from $H^0$ to $H^\tau$ and only then applied the top-down corrections, our model would encounter a critical issue: missing the interactions and connections between levels. The bottom-up and top-down corrections bring the fine-grained semantics encoded by lower-level node embeddings to higher-level ones, and deliver the coarse-grained semantics to lower-level nodes, respectively. Such a design follows a realistic intuition: in a university, for example, a graduate student is depicted by her advisor, her college, and her school, in that order. We need to make the advisor recognize that student, make the college see the advisor, and so on. By doing so, through the immediate correction of the vectors at level $H^h$ based on the $H^{h+1}$ vectors, the node embeddings can better encode the semantics and become more robust.
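The concatenation in Phase 4 can be sketched as a walk up the hierarchy from a level-0 node through its successive communities. The function name and the data layout (one embedding dictionary per level, one membership dictionary per pair of adjacent levels) are assumptions made for illustration, and the vectors are made-up values.

```python
from typing import Dict, Hashable, List

Vector = List[float]


def final_embedding(v: Hashable,
                    x: List[Dict[Hashable, Vector]],
                    memberships: List[Dict[Hashable, int]]) -> Vector:
    """Concatenate v's level-0 vector with the vectors of its ancestors."""
    parts = list(x[0][v])
    node = v
    for h, membership in enumerate(memberships):
        node = membership[node]        # v's community = its node id at level h+1
        parts.extend(x[h + 1][node])
    return parts


# Two levels above the original graph (tau = 2), each with dimension b = 2.
x = [
    {"a": [1.0, 2.0], "b": [3.0, 4.0]},   # level 0
    {0: [5.0, 6.0], 1: [7.0, 8.0]},       # level 1
    {0: [9.0, 10.0]},                     # level 2
]
memberships = [{"a": 0, "b": 1}, {0: 0, 1: 0}]

z_b = final_embedding("b", x, memberships)
```

For node `b` the result is `[3.0, 4.0, 7.0, 8.0, 9.0, 10.0]`, whose length is $(\tau + 1)b = 3 \times 2 = 6$.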

Experiments
We conduct experiments to answer three questions. (a) Can SHE-NRL improve the effectiveness of NRL for different NRL methods? (b) Is SHE-NRL able to outperform the state-of-the-art competing method? (c) Does a structural hierarchy with more levels lead to better performance?

Experimental Setup
Data and Settings. We use three benchmark network datasets for the experiments: Cora, Citeseer, and PubMed (https://linqs.soe.ucsc.edu/data). Cora contains 2,708 nodes, 5,429 edges, and 7 labels; Citeseer has 3,312 nodes, 4,715 edges, and 6 labels; and PubMed contains 19,717 nodes, 44,338 edges, and 3 labels. We evaluate SHE-NRL on three well-known NRL models: DeepWalk (DW) [1], node2vec (n2v) [3], and LINE [2]. Two competing methods are employed. They are the state-of-the-art hierarchical NRL methods, HARP [7] and Marc [8]. Both construct hierarchical structures for NRL. The embedding dimension of all methods is set to $k = 128$. Note that since there are $\tau$ levels in a method, the embedding dimension of each level graph's NRL is $k/\tau$.
Evaluation Tasks. We evaluate the effectiveness of node embeddings on two downstream tasks, node classification (NC) and link prediction (LP). Given a certain fraction of nodes and all their labels, the goal of NC is to predict the labels of the remaining nodes. Node embeddings are treated as features. We utilize a one-vs-rest logistic regression classifier with L2 regularization. The default ratio of training to test data is 80:20. In our main experiment, we vary the ratio of training and testing to see how different methods perform. On the other hand, LP is to predict the existence of links, given the existing network structure. We need the feature vectors of links and non-links. We follow node2vec [3] and employ the Hadamard operator, i.e., the element-wise product, to generate the feature vectors from the embeddings of node pairs. To obtain links, we remove 50% of the edges, chosen randomly from the network, while ensuring that the residual network is connected. To obtain non-links, we randomly sample an equal number of node pairs that have no edge connecting them.
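The construction of link prediction features can be sketched as below: the Hadamard operator turns a pair of node embeddings into one feature vector, and non-links are drawn uniformly from node pairs without an edge. This is a hedged sketch, not the experimental code: the function names, the fixed seed, and the assumption of a simple undirected graph (no self-loops or duplicate edges) are ours, and the step that removes 50% of the edges while keeping the residual network connected is omitted.

```python
import random
from typing import Dict, Hashable, Iterable, List, Tuple

Vector = List[float]
Pair = Tuple[Hashable, Hashable]


def hadamard(x_u: Vector, x_v: Vector) -> Vector:
    """Element-wise product of two node embeddings."""
    return [a * b for a, b in zip(x_u, x_v)]


def sample_non_links(nodes: Iterable[Hashable], edges: Iterable[Pair],
                     k: int, seed: int = 0) -> List[Pair]:
    """Sample k distinct node pairs that are not connected by an edge."""
    rng = random.Random(seed)
    node_list = sorted(nodes)
    present = {frozenset(e) for e in edges}
    n = len(node_list)
    if k > n * (n - 1) // 2 - len(present):
        raise ValueError("not enough non-adjacent pairs to sample from")
    chosen = set()
    while len(chosen) < k:
        u, v = rng.sample(node_list, 2)   # two distinct nodes
        pair = frozenset((u, v))
        if pair not in present:
            chosen.add(pair)
    return [tuple(sorted(p)) for p in chosen]


def pair_features(pairs: Iterable[Pair],
                  emb: Dict[Hashable, Vector]) -> List[Vector]:
    """One Hadamard feature vector per node pair."""
    return [hadamard(emb[u], emb[v]) for u, v in pairs]


# Toy path graph and made-up 2-dimensional embeddings.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
emb = {i: [float(i), 1.0] for i in range(6)}

positives = edges                                  # held-out links in practice
negatives = sample_non_links(range(6), edges, k=len(positives), seed=1)
features = pair_features(positives, emb) + pair_features(negatives, emb)
labels = [1] * len(positives) + [0] * len(negatives)
```

The resulting `features` and `labels` would then be passed to a binary classifier, and the AUC computed on its scores.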
Evaluation Metrics. For node classification, we consider Macro-F1 (MAF) and Micro-F1 (MIF) as the evaluation metrics. For link prediction, we utilize the Area Under Curve (AUC) score. Higher values indicate better performance for all metrics. Note that, due to the page limit, we report only MAF for node classification; MIF exhibits very similar results.
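To make the difference between the two F1 variants concrete, the following sketch computes both for single-label multi-class predictions. It assumes each node has exactly one label, and the function name and toy labels are illustrative. Macro-F1 averages the per-class F1 scores, so rare classes weigh as much as frequent ones, whereas Micro-F1 pools the counts over all classes.

```python
from typing import Hashable, Sequence, Tuple


def f1_scores(y_true: Sequence[Hashable],
              y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (Macro-F1, Micro-F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p wrongly
            fn[t] += 1   # missed the true class t

    def f1(tp_c: int, fp_c: int, fn_c: int) -> float:
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Made-up labels: class 0 is frequent, classes 1 and 2 are rare.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0, 0]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
```

On this toy example the per-class F1 scores are 0.8, 2/3, and 0, giving a Macro-F1 of about 0.489, while the Micro-F1 is 5/7 (about 0.714) and equals the accuracy, as it always does in the single-label setting.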

Experimental Results
Main Results. The main results are shown in Figures 3-5, from which we make several observations. First, DeepWalk, node2vec, and LINE enhanced by the proposed SHE obtain significant performance improvements (i.e., red vs. blue curves) in both NC and LP across datasets. The improvement margin is around 60% for NC and 30% for LP; these two percentages are obtained by averaging the differences in MAF and AUC scores between the red and blue curves over all training percentages on the Citeseer and Cora datasets, for node classification and link prediction respectively. Second, SHE further outperforms the state-of-the-art methods HARP and Marc, even though both already yield apparent improvements over the original DeepWalk and node2vec. We think the reason is that SHE not only leverages community knowledge, but also brings both fine-grained and coarse-grained information into the learning of the NRLs at different levels. Besides, the superiority of SHE is more obvious in node classification than in link prediction. Third, as the training percentage increases, SHE consistently outperforms the other hierarchical NRL enhancement methods. Such results demonstrate the effectiveness of SHE.
The reason that the proposed SHE outperforms the state-of-the-art methods is two-fold. On one hand, regarding the construction of the hierarchy, HARP collapses nodes based on edge and star connections, and Marc considers 3-cliques as super nodes. However, neither the connection collapsing in HARP nor the clique structure in Marc can depict the semantic correlation between nodes, i.e., the community knowledge in networks. In our SHE, the obtained communities are used to form the hierarchy so that the fine-grained community information with respect to every node can be encoded into the learning of its embedding. On the other hand, the node embeddings at different levels are learned independently in HARP and Marc. That is, higher-level NRL cannot utilize node embeddings derived from lower-level NRL, and lower-level NRL cannot be enhanced by higher-level coarse-grained semantics. Therefore, less informed node embeddings lead to worse performance, compared to our SHE method that jointly learns and exploits node embeddings at different levels in the hierarchy.
Level Analysis. We aim to understand how the number of hierarchy levels affects the performance improvement of the proposed SHE. We vary the levels used as 0, 0-1, and 0-2, which indicate no hierarchy (NRL on the original network), one additional level in the hierarchy, and a three-level hierarchy, respectively. The results of the level analysis, for both node classification and link prediction, are exhibited in Figure 6. We find that the hierarchy with only one additional level is enough to bring significant performance improvement. In addition, the performance of the hierarchy with two additional levels is nearly the same as that with one additional level. We can draw an insight from these results: using only the community knowledge obtained from the original network is sufficient for SHE to boost the effectiveness of NRL.

Conclusions and Future Work
This paper proposes a structural hierarchy-enhanced network representation learning (SHE-NRL) framework to improve the effectiveness of the learned node embeddings. Our SHE can be incorporated into existing NRL methods, such as DeepWalk, node2vec, and LINE, so that their performance in downstream tasks can be boosted. Experiments conducted on real datasets for node classification and link prediction demonstrate the effectiveness of SHE-NRL. An extensive empirical study also shows that even a hierarchy with one additional level can significantly boost the effectiveness of NRL methods.
The promising results of SHE encourage us to pursue a three-fold extension. First, since graph neural networks (GNN) [15] have been widely shown to be effective for graph-based applications, we are developing a hierarchical message passing mechanism based on SHE so that both fine-grained and coarse-grained nodes can receive each other's information to improve the performance of semi-supervised node classification. Second, we will define the relational hierarchy in a heterogeneous network so that the learning of heterogeneous node embeddings, such as metapath2vec [4], can be aware of multi-typed community knowledge. Last, the current SHE treats the construction of the structural hierarchy and the learning of node embeddings as two independent modules. An ongoing extension is to jointly optimize the hierarchy and node embeddings in an end-to-end manner.