Article

Learning Heterogeneous Graph Embedding with Metapath-Based Aggregation for Link Prediction

School of Computer Science and Technology, Shandong University of Technology, Zibo 255091, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2023, 11(3), 578; https://doi.org/10.3390/math11030578
Submission received: 29 December 2022 / Revised: 17 January 2023 / Accepted: 18 January 2023 / Published: 21 January 2023

Abstract

Along with the growth of graph neural networks (GNNs), many researchers have adopted metapath-based GNNs to handle complex heterogeneous graph embedding. The conventional definition of a metapath only distinguishes whether there is a connection between nodes in the network schema, where the type of edge is ignored. This leads to inaccurate node representation and subsequently results in suboptimal prediction performance. In heterogeneous graphs, a node can be connected by multiple types of edges. In fact, each type of edge represents one kind of scene. The intuition is that if the embeddings of nodes are trained under different scenes, the complete representation of nodes can be obtained by organically combining them. In this paper, we propose a novel definition of a metapath whereby the edge type, i.e., the relation between nodes, is integrated into it. A heterogeneous graph can be considered as the compound of multiple relation subgraphs from the view of the novel metapath. In different subgraphs, the embeddings of a node are separately trained by encoding and aggregating the neighbors of the intrapaths, which are the instance level of the novel metapaths. Then, the final embedding of the node is obtained by an attention mechanism that aggregates over the interpaths, which are the semantic level of the novel metapaths. Link prediction is a downstream task by which to evaluate the effectiveness of the learned embeddings. We conduct extensive experiments on four real-world heterogeneous graph datasets for link prediction. The empirical results show that the proposed model outperforms the state-of-the-art baselines; in particular, when comparing it to the best baseline, the F1 metric is increased by 10.35% on the Alibaba dataset.

1. Introduction

With the rapid development of science and technology, various network applications have made people’s lives more convenient. However, the diversity of items leads to the problem of information overload. A promising solution is to learn the embeddings of users and items, which can be applied to various downstream tasks, such as link prediction [1,2] and recommendation systems [3]. As deep learning has developed, graph embedding, which maps the nodes of a graph to a low-dimensional continuous space, has been proposed and widely used because many real-world datasets are naturally represented as graphs. DeepWalk [4], LINE [5], and node2vec [6] are pioneering works on learning node embeddings. DeepWalk and node2vec feed node sequences, generated by random walks, to a skip-gram model to learn the embeddings of nodes. LINE studies the problem of embedding very large information networks into low-dimensional vector spaces by exploiting the first-order and second-order proximity between nodes.
Recently, graph neural networks (GNNs), the new generation of graph-embedding models, have attracted a great deal of attention. GNNs such as the graph convolutional network (GCN) [7], GraphSAGE [8], the graph attention network (GAT) [9], and many other variants [10,11,12,13] have proven to be strongly expressive and demonstrate state-of-the-art performance on various graph-based downstream tasks, such as link prediction [1,2], community detection [14,15], node classification [16,17], and recommendation systems [3]. All these GNN-based approaches can effectively preserve network structures and inherent properties through their specific message-propagation schemes.
Nevertheless, most of these methods were designed only for homogeneous graphs, i.e., graphs with only one type of node and edge. In real life, graph data usually consist of different types of nodes and edges. For example, an e-commerce dataset includes at least two types of nodes, namely users and items. Furthermore, there are various types of edges; for example, clicking, add to favorite, add to cart, and purchase are classic behaviors in the e-commerce scenario, and each kind of behavior denotes one type of edge. We refer to this kind of graph as a heterogeneous information network or heterogeneous graph. In heterogeneous graphs, traditional GNNs cannot encode the rich structural and semantic information into a low-dimensional vector space because they treat all nodes equally [18].
Many researchers have proposed graph-embedding methods based on the idea of metapaths [19] to deal with heterogeneous graphs [20,21,22]. In particular, a metapath is an ordered and composite relation sequence connecting different or identical node types. For example, an e-commerce dataset includes two node types, i.e., user and item, and various edge types, such as clicking, add to favorite, add to cart, and purchase. $User \xrightarrow{\text{Add to cart}} Item \xrightarrow{\text{Clicked by}} User$ is a metapath which describes a composite relation between users. With the aid of various metapath instances, each node can be directly connected with its high-order neighbors, and the high-order proximity between two nodes that is hidden in heterogeneous graphs can be captured [18].
Although metapath-based approaches have had some success in heterogeneous graph embedding, they still have the following limitations. (1) The conventional definition of a metapath [23] only distinguishes whether there is a connection between two nodes in the network schema; the type of edge is ignored. This leads to inaccurate node-representation learning and, subsequently, to suboptimal prediction performance. In heterogeneous graphs, a node may be connected by multiple types of edges. In fact, each type of edge represents one kind of scene. The intuition is that if the embeddings of nodes are trained under different scenes, the complete representation of nodes can be obtained by organically combining them. (2) With respect to node aggregation, classical network models are used, such as GCN [7] and GraphSAGE [8]. GCN uses the adjacency matrix and degree matrix for its aggregation operation; since these matrices are defined over the whole graph, they are suitable for whole-graph operations but not for aggregation along a metapath. As the metapath length increases, traditional aggregation methods (e.g., mean, maximum) aggregate all the node information with the same weight while ignoring important differences among nodes, which leads to deviation in the node representations.
To solve the problems of multiple types of edge relations and of aggregation over metapath instances, we first propose a novel definition of the metapath in which the edge type is considered. A heterogeneous graph can be considered as the compound of multiple relation subgraphs from the view of the novel metapath. In different subgraphs, the embeddings of a node are separately trained by encoding and aggregating the neighbors of the intrapaths, which are the instance level of the novel metapaths. Then, the final embedding of the node is obtained by an attention mechanism that aggregates over the interpaths, which are the semantic level of the novel metapaths. Finally, we conduct extensive experiments on four real-world heterogeneous graph datasets for link prediction. The empirical results demonstrate that our model is superior to the state-of-the-art baselines.
Our contributions can be summarized in the following three points.
Novel definition: We propose a novel definition of a metapath, i.e., the relation-constrained metapath. Each relation-constrained metapath instance has a specific meaning and can clearly describe the relations between nodes.
Multiview modeling: We design a two-level, metapath-guided aggregation method, which aggregates from the intrapath at the instance level and from the interpath at the semantic level, respectively. This method is more conducive to node aggregation and improves the accuracy of node representations.
Multifaceted experiments: We perform a suite of experiments on four datasets. The results demonstrate the effectiveness of the proposed model compared to the baseline models. In addition, we validate that our model can alleviate the cold-start problem to a certain extent.
The remainder of this paper is organized as follows. In Section 2, we review the related work. In Section 3, we introduce definitions related to heterogeneous graphs and summarize the notations used throughout the paper. In Section 4, we present our proposed model. In Section 5, a number of experiments on the proposed model are conducted and detailed analyses are presented. The model and future research directions are summarized in Section 6.

2. Related Work

Graph embedding. The goal of graph embedding is to project nodes into a low-dimensional vector space in which the embeddings can be used for many downstream tasks, e.g., node classification, graph classification, community detection, and link prediction. DeepWalk [4] and node2vec [6] generate a corpus on the graph by random walks and then feed the corpus to a skip-gram model to learn the embeddings of nodes. LINE [5] studies the problem of embedding large-scale networks into a low-dimensional vector space while preserving the first-order and second-order proximity between nodes. Node2vec [6] devises a biased random walk procedure to effectively exploit diverse neighbourhoods. Along with the rapid development of deep learning, many researchers have proposed GNNs to explore graph embedding because each node is naturally determined by its own attributes and its neighbours. GraphSAGE [8] proposes a layer-by-layer aggregation of the attributes of neighbours while implicitly exploring the topological structure. Inspired by the Transformer [24], GAT [9] introduces the graph-attention mechanism to measure the relative contributions of the neighbors in aggregation from the perspective of target nodes. DH-HGCN [3] models the dual homogeneity from social relations and item connections by hypergraph convolution networks to obtain high-order correlations among users and items. These methods have made considerable advancements in a variety of tasks with homogeneous graphs or specific graph structures. However, when applied to heterogeneous graphs, they are unable to encode heterogeneity into the representation.
Heterogeneous graph embedding. A heterogeneous graph involves multiple types of nodes and multiple types of edges, and it can depict the real world more faithfully. Heterogeneous graphs can describe much more complex relationships and structures than homogeneous graphs, so they have been widely used in graph data mining. Typical heterogeneous graphs include social networks, e-commerce systems, knowledge graphs, and citation networks. A variety of heterogeneous graph-representation learning methods have been proposed. They can be roughly divided into two categories. One involves graph-based approaches, such as HetGNN [25] and HAN [26]. HetGNN [25] jointly considers node-heterogeneous content encoding, type-based neighbour aggregation, and heterogeneous-type combinations. HAN [26] proposes a heterogeneous graph neural network based on hierarchical attention. The other category involves metapath-based methodologies [18,19]. Metapath2vec [23] formalizes metapath-based random walks to construct the heterogeneous neighbourhood of a node and then leverages a heterogeneous skip-gram model to perform node embedding. GATNE [27] formalizes the problem of embedding attributed multiplex heterogeneous networks and proposes to solve it under both transductive and inductive settings. MAGNN [18] applies three building-block components that address three characteristic limitations of existing heterogeneous graph-embedding methods and achieves state-of-the-art performance on three real-world datasets with regard to the tasks of node clustering, node classification, and link prediction. Ref. [22] extracts multifaceted meaningful semantics reflected by metapaths on the heterogeneous graph as multiple views for both users and items, and effectively enhances user/item relationships in different aspects. HeCo [21] proposes two views of a heterogeneous graph (the network schema view and the metapath view) to learn node embeddings so as to capture both local and higher-order structures simultaneously. MG-PFCM [20] defines user-oriented and item-oriented metapaths and performs metapath-guided heterogeneous graph learning to improve the user and item embeddings. Although metapaths can deliver semantic information in heterogeneous graphs, the conventional definition is too coarse to capture the subtle semantics of certain applications. More specifically, the conventional definition of a metapath is limited to whether there is a connection between nodes in the network schema. However, there are usually multiple types of edges between two nodes in heterogeneous graphs, and the conventional definition of a metapath cannot capture the resulting subtle semantics. In this paper, a type of edge is called a scene. A node has a different representation in each scene. The intuition is that if the embeddings of a node are trained under different scenes, the complete representation of the node can be obtained by organically combining them. Aiming at this problem, we propose a novel relation-constrained metapath.
Link prediction in heterogeneous graphs. Inferring missing links between nodes in a graph based on observed interactions is known as link prediction [28,29,30], which is useful in numerous application domains. Examples include bioinformatics [31], social media [32], e-commerce [33], and collaboration networks [34]. Many studies have been developed for link prediction in heterogeneous graphs. Ref. [35] formulates link prediction in heterogeneous networks as a multitask, metric learning problem. Ref. [30] considers the multinetwork scenario to encode diverse network structures of anchor users. Ref. [36] iteratively calculates link likelihoods, taking longer paths between nodes into account. Ref. [37] first calculates node and edge relevance based on the summarized graph and then combines both factors to perform link prediction on unconnected pairs of nodes. Ref. [38] provides a systematic and comprehensive survey on hyperlink prediction. Link prediction is a downstream task to evaluate the effectiveness of the learned embeddings. We adopt it to evaluate the effectiveness of the proposed method.

3. Preliminaries

3.1. Definitions

Definition 1.
Homogeneous and heterogeneous graph. Given a graph $G = (V, E)$, V and E represent the set of nodes and the set of edges, respectively. The associated node-type and edge-type mapping functions are defined as $\phi: V \rightarrow T$ and $\psi: E \rightarrow R$, respectively. Each node $v \in V$ belongs to a particular node type $\phi(v) \in T$, and each edge $e \in E$ belongs to a particular edge type $\psi(e) \in R$. When $|T| + |R| > 2$, the graph is called a heterogeneous graph; otherwise, it is a homogeneous graph.
In a heterogeneous graph, there are multiple node types and edge types. For example, e-commerce datasets can be depicted by a heterogeneous graph in which two node types, i.e., users and items, and four edge types, i.e., clicking, add to favorite, add to cart, and purchase, are usually included.
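To make Definition 1 concrete, the following minimal Python sketch (the node names, edge triples, and dictionary-based storage are illustrative assumptions, not the paper's implementation) stores the mapping functions $\phi$ and $\psi$ and splits a toy graph into relation subgraphs, one per edge type:

```python
from collections import defaultdict

# Toy e-commerce graph (hypothetical values):
# node types T = {user, item}; edge types R = {purchase, add_to_cart}.
node_type = {"u1": "user", "u3": "user", "i2": "item", "i4": "item"}   # phi: V -> T
edges = [("u1", "i2", "purchase"),                                     # psi is the third field
         ("u3", "i2", "purchase"),
         ("u1", "i4", "add_to_cart")]

T = set(node_type.values())
R = {r for _, _, r in edges}
print(len(T) + len(R) > 2)        # True, so the graph is heterogeneous

# One relation subgraph ("scene") per edge type, as used throughout the paper.
subgraphs = defaultdict(list)
for u, v, r in edges:
    subgraphs[r].append((u, v))
```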
Definition 2.
Metapath. In a heterogeneous graph, a metapath is an ordered and composite relation sequence connecting different or identical node types. A metapath P is denoted in the form of
$$T_1 \xrightarrow{R_1} T_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} T_{l+1},$$
where l is the length of P.
For example, in the e-commerce datasets, the metapath
$$P_1: User \xrightarrow{\text{Purchase}} Item \xrightarrow{\text{Purchased by}} User$$
can describe the copurchase relation. Another metapath,
$$P_2: User \xrightarrow{\text{Add to cart}} Item \xrightarrow{\text{Clicked by}} User \xrightarrow{\text{Purchase}} Item,$$
describes a composition operator on relations between users and items.
A path $p: v_1 \rightarrow v_2 \rightarrow \cdots \rightarrow v_{l+1}$ between nodes $v_1$ and $v_{l+1}$ is called a metapath instance of P if, for every i, the node $v_i$ and the edge $e_i = (v_i, v_{i+1})$ satisfy $\phi(v_i) = T_i$ and $\psi(e_i) = R_i$. For example, in the e-commerce datasets, the paths
$$u_1 \xrightarrow{\text{Purchase}} i_2 \xrightarrow{\text{Purchased by}} u_3$$
and
$$u_1 \xrightarrow{\text{Add to cart}} i_4 \xrightarrow{\text{Clicked by}} u_5 \xrightarrow{\text{Purchase}} i_8$$
are metapath instances of $P_1$ and $P_2$, respectively.
The metapaths and their instances indicate that there are potential relationships between nodes. However, in a heterogeneous graph, the various edge types make the composite relations between nodes quite complicated, and it is difficult to learn node representations by use of the conventional metapath. Consequently, we propose a novel definition of a metapath into which the edge type is integrated. The novel metapath is called a relation-constrained metapath, which can be formalized as shown below.
Definition 3.
Relation-constrained metapath. A relation-constrained metapath is a metapath with a certain constraint, denoted as $P^C$, where P is a metapath and C represents the constraint on the relations in the metapath.
Note that C can be only one constraint condition on a metapath. For example, the metapath $P_1$ is equivalent to the relation-constrained metapath $P_1^{C=purchase}: User \rightarrow Item \rightarrow User$, abbreviated as $P_1^{C=purchase}: UIU$ or simply $UIU$. Each relation-constrained metapath instance has a specific meaning and can clearly describe the relations between nodes.
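A lightweight way to encode Definition 3 in code is sketched below; the class name and tuple-based representation are our own illustrative choices, not part of the proposed framework:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationConstrainedMetapath:
    """A metapath P (a sequence of node types) plus a single relation constraint C."""
    node_types: tuple   # e.g. ("user", "item", "user")
    constraint: str     # e.g. "purchase" -- every edge on the path must carry this relation

# P1 under C = purchase, i.e. the U-I-U schema restricted to the "purchase" scene
P1_purchase = RelationConstrainedMetapath(("user", "item", "user"), "purchase")
# The same schema under a different constraint, C = add to cart
P1_add_to_cart = RelationConstrainedMetapath(("user", "item", "user"), "add_to_cart")
```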

3.2. Problem Definition

Given a heterogeneous graph $G = (V, E)$ and a set of relation-constrained metapaths S, the problem of heterogeneous graph embedding is to map each node v to a low-dimensional representation, i.e., the goal is to find a function $fun: v \rightarrow \mathbb{R}^d$ for every node v, where $d \ll |V|$. We adopt link prediction, which infers missing links between nodes, as the downstream task to evaluate the effectiveness of the learned embeddings.

3.3. Notations

The notations we will use throughout the paper are summarized in Table 1.

4. The Proposed Method

In this section, we take the process of learning the vector representation of $u_1$ as an example to present the proposed model in detail. As shown in Figure 1, the model starts with the instantiation of metapaths, followed by the two-level aggregation. Then, we provide the optimization method and the pseudocode of the proposed method. Finally, the classifier for link prediction is introduced.

4.1. Instantiating Paths and Generating Neighbors

4.1.1. Random Walk-Based Metapath Instance and Neighbour Generation

In a heterogeneous graph, each edge type denotes a scene. The intuition is that the optimal vector representation of a node can be obtained by organically combining its vector representations from the various scenes. First, the heterogeneous graph is decomposed into multiple relation subgraphs according to the user's behaviors. Then, random walks are performed on these relation subgraphs, and the resulting sequences of nodes are obtained. Finally, the top-k neighbors are generated according to the frequency with which nodes appear in the sequences.
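A minimal sketch of this procedure is given below; the adjacency-list format and the default walk parameters (60 walks of length 10) are illustrative assumptions rather than the authors' settings:

```python
import random
from collections import Counter, defaultdict

def random_walk(adj, start, length):
    """Uniform random walk of a given length inside one relation subgraph."""
    walk = [start]
    while len(walk) < length and adj[walk[-1]]:
        walk.append(random.choice(adj[walk[-1]]))
    return walk

def topk_neighbors(adj, node, k, num_walks=60, walk_len=10):
    """Top-k neighbors of `node`, ranked by how often they appear in its walks."""
    counts = Counter()
    for _ in range(num_walks):
        counts.update(random_walk(adj, node, walk_len)[1:])   # exclude the start node itself
    return [v for v, _ in counts.most_common(k)]

# Adjacency list of one relation subgraph (e.g. the "purchase" scene); toy values.
adj = defaultdict(list, {"u1": ["i2", "i4"], "i2": ["u1", "u3"],
                         "i4": ["u1"], "u3": ["i2"]})
print(topk_neighbors(adj, "u1", k=3))
```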

4.1.2. Relation-Constrained Metapath Instance and Neighbor Generation

As shown in Figure 1, there are two relation-constrained metapaths, $P_1^{C=purchase}: UIU$ and $P_1^{C=add\ to\ cart}: UIU$, which correspond to different user behaviors. The instantiation process of the relation-constrained metapath is carried out by Equation (6). We have
$$p(v_{i+1} \mid v_i) = \frac{1}{|N(v_i)|}, \quad (v_i, v_{i+1}) \in R_C, \tag{6}$$
where $R_C$ denotes the relation between nodes, and it is consistent with the corresponding relation-constrained metapath, i.e., the flow of the walk is constrained by the predefined metapath $P^C$, and $N(v_i)$ represents the neighbors of the node $v_i$.
After the relation-constrained metapath instances are constructed, we continue to filter the sequences according to the first node so that high-order neighbors with the same node type can be directly connected with the node. Finally, we obtain homogeneous subgraphs, such as user–user or item–item subgraphs, which can capture the high-order proximity between nodes of the same type under different specific user behaviors. The detailed procedure is presented in Figure 2. Each homogeneous subgraph describes the potential relationships among nodes of the same type in one scene. In different scenes, the neighbors of the same node are usually different.
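The following sketch illustrates one way to instantiate a relation-constrained metapath according to Equation (6) and to derive the same-type (e.g., user–user) neighbor pairs described above; the data structures, function names, and toy values are hypothetical:

```python
import random

def metapath_instance(adj_by_rel, node_type, metapath, relation, start):
    """One instance of a relation-constrained metapath P^C, following Eq. (6):
    the next node is drawn uniformly from the neighbors reachable through
    `relation` whose type matches the next node type required by the metapath."""
    if node_type[start] != metapath[0]:
        return None
    walk = [start]
    for expected in metapath[1:]:
        candidates = [v for v in adj_by_rel[relation].get(walk[-1], [])
                      if node_type[v] == expected]
        if not candidates:
            return None
        walk.append(random.choice(candidates))      # p = 1 / |N(v_i)| within this scene
    return walk

def same_type_neighbors(instances, node_type):
    """Filter instances by their first node and keep the other nodes sharing its type,
    yielding the edges of a homogeneous (e.g. user-user) subgraph."""
    pairs = set()
    for inst in filter(None, instances):
        start = inst[0]
        pairs.update((start, v) for v in inst[1:]
                     if v != start and node_type[v] == node_type[start])
    return pairs

# Toy data: adj_by_rel[relation][node] lists the neighbors of `node` under that relation.
node_type = {"u1": "user", "u3": "user", "i2": "item"}
adj_by_rel = {"purchase": {"u1": ["i2"], "i2": ["u1", "u3"], "u3": ["i2"]}}
insts = [metapath_instance(adj_by_rel, node_type, ("user", "item", "user"), "purchase", "u1")
         for _ in range(10)]
print(same_type_neighbors(insts, node_type))        # likely {('u1', 'u3')}
```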

4.2. Node Aggregation

In order to obtain a more accurate feature representation of a node, we first aggregate the information of its neighbors from both the intrapath and interpath views. To be specific, the intrapath view can be divided into a random walk-based intrapath and a metapath-guided intrapath, and the interpath view includes a random walk-based interpath and a metapath-guided interpath. Then, the final representation of a node can be trained by organically combining the different views.

4.2.1. Intrapath Aggregation

Random Walk-Based Intrapath Aggregation

The initial feature representation of the node is obtained by aggregating the neighbours in a random walk-based intrapath. Here, we use simple cumulative aggregation as shown in Equation (7). We have
$$h_v^t = h_v^{t-1} + \sum_{i \in N(v)} h_i^{t-1}, \tag{7}$$
where $h_v$ represents the vector representation of the central node, and $N(v)$ represents the neighbours of the central node.
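A direct transcription of Equation (7) in NumPy might look as follows (the embedding sizes are toy values):

```python
import numpy as np

def intrapath_aggregate(h_center, h_neighbors):
    """Eq. (7): h_v^t = h_v^{t-1} + sum over the neighbors' embeddings from step t-1."""
    return h_center + np.sum(h_neighbors, axis=0)

# Toy check with a 10-dimensional center node and three walk-based neighbors.
h_v = np.zeros(10)
h_N = np.ones((3, 10))
print(intrapath_aggregate(h_v, h_N))   # a vector of 3s
```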

Relation-Constrained Metapath-Guided Intrapath Aggregation

The subgraphs of the same node type are generated according to the high-order similar nodes in the relation-constrained metapath instances. Then, the feature vectors of the neighbours are stacked to form a matrix A, which is linearly projected to three matrices Q, K, and V following the Transformer [24]. To be specific, the feature vectors of the neighbors of each node in the subgraphs are merged into a feature matrix $A = [h_1, \ldots, h_i, \ldots, h_{l'}]^T$, where $h_i \in \mathbb{R}^k$, k represents the feature dimension of nodes, and $l'$ represents the number of nodes in the sequence. After linear transformation, we obtain $A' = [h_1', \ldots, h_i', \ldots, h_{l'}']^T$, where $h_i' \in \mathbb{R}^k$ is the vector representation of node $v_i$ after linear transformation. The converted vector representations are aggregated by Equation (8),
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V, \tag{8}$$
where $\frac{1}{\sqrt{d_k}}$ is the scaling factor. Then, we use Equation (7) to aggregate the central nodes.
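The sketch below applies Equation (8) to a stacked neighbor matrix A; the projection matrices and sizes are random placeholders, since the actual trainable parameters are learned during model training:

```python
import numpy as np

def scaled_dot_product_attention(A, Wq, Wk, Wv):
    """Eq. (8) applied to the stacked neighbor features A (shape l' x k):
    project A to Q, K, V with placeholder weight matrices,
    then compute softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = A @ Wq, A @ Wk, A @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    return weights @ V

# l' = 4 neighbors, k = 10 input features, d_k = 8 after projection (toy sizes).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 10))
Wq, Wk, Wv = (rng.standard_normal((10, 8)) for _ in range(3))
print(scaled_dot_product_attention(A, Wq, Wk, Wv).shape)    # (4, 8)
```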

4.2.2. Interpath Aggregation

After aggregation from the intrapath view, the vector representations of each node under different behaviors are obtained. All of the representations are semantically related and able to capture specific semantic information. In the next step, the different vector representations of the same node are aggregated by an attention mechanism to generate the final vector representation. Note that this is a semantic-level attention which can automatically learn the importance of different intrapaths and fuse them. Interpath aggregation includes random walk-based interpath aggregation and relation-constrained, metapath-guided interpath aggregation. In fact, the principle of the two kinds of aggregation is the same, so we present them together without distinction.
Firstly, we use two matrices, $W_1$ and $W_2$, which are trainable parameters acting on the same node in different scenes. Then, the attention coefficient is calculated for the same node in each scene, as shown in Equation (9),
$$e_{ij} = \tanh\left(W_1 F W_2\right), \tag{9}$$
where $e_{ij}$ denotes the importance of the feature vector of the same node in scene j to the feature vector in scene i, and $F = [f_1, \ldots, f_i, \ldots, f_n]$ is the output of the previous intrapath aggregation. Next, $e_{ij}$ is normalized by the softmax function to obtain the weight $\alpha_{ij}$ for each scene, as shown in Equation (10),
$$\alpha_{ij} = \mathrm{softmax}(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in R} \exp(e_{ik})}. \tag{10}$$
The final representation of the interpath aggregation is computed as a weighted sum, as shown in Equation (11),
$$h_i = \sigma\!\left(\sum_{j \in R} \alpha_{ij} W f_j\right), \tag{11}$$
where $\sigma$ and W represent the nonlinearity and the trainable transformation matrix, respectively.
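Because the exact shapes of $W_1$, $W_2$, and W are not spelled out in the text, the sketch below shows one plausible reading of Equations (9)–(11), with random placeholder matrices standing in for the trainable parameters:

```python
import numpy as np

def interpath_attention(F, W1, W2, W):
    """Semantic-level fusion of one node's per-scene embeddings F = [f_1, ..., f_n] (n x d).
    One plausible reading of Eqs. (9)-(11); W1, W2, W are placeholders for trainable matrices."""
    E = np.tanh(F @ W1 @ W2 @ F.T)                              # e_ij, Eq. (9): scene-to-scene importance
    alpha = np.exp(E) / np.exp(E).sum(axis=1, keepdims=True)    # Eq. (10): softmax over scenes j
    fused = alpha @ (F @ W.T)                                   # sum_j alpha_ij * (W f_j)
    return 1.0 / (1.0 + np.exp(-fused))                         # Eq. (11): sigma taken as sigmoid

n, d = 4, 10                               # 4 scenes (edge types), 10-dim per-scene embeddings
rng = np.random.default_rng(0)
F = rng.standard_normal((n, d))
W1, W2, W = (rng.standard_normal((d, d)) for _ in range(3))
print(interpath_attention(F, W1, W2, W).shape)   # (4, 10): one fused view per scene index i
```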

4.3. Model Optimization

There are two kinds of walk in the proposed model. The first kind is the random walk, which constrains only the edge type, and is used to generate node sequences. All the node sequences constitute the training corpus. Then, the corpus is fed into the skip-gram model to learn embeddings. To be specific, given a constrained relation $R_C$, i.e., an edge type, we can obtain random walk-based node sequences $S_1 = \{v_{s_1}, \ldots, v_{s_{l'}}\}$, where $l'$ is the length of the sequences and $(v_{s_t}, v_{s_{t+1}}) \in R_C$ $(t = 1, \ldots, l'-1)$. The context of $v_{s_t}$ can be denoted as
$$C = \{v_{s_k} \mid v_{s_k} \in S_1,\ |k - t| \leq c,\ k \neq t\}, \tag{12}$$
where c is the radius of the window size.
The second kind of walk proceeds along the proposed relation-constrained metapath, which constrains both the node and edge types. Specifically, given a heterogeneous graph $G = (V, E)$ and a relation-constrained metapath $P^C$, a homogeneous graph can be constructed according to Figure 2. Then, the central node and its direct neighbours can be reorganized as a node sequence, where the neighbours are considered the context of the central node. We have $S_2 = \{v_{s_{-l/2}}, \ldots, v_{s_0}, \ldots, v_{s_{l/2}}\}$, where $s_0$ denotes the central node, $(v_{s_t}, v_{s_{t+1}}) \in R_C$, and $\phi(v_{s_t}) = \phi(v_{s_{t+1}})$. The context of $v_{s_0}$ can be denoted as
$$C = \{v_{s_k} \mid v_{s_k} \in S_2,\ |k - t| \leq c/2\}, \tag{13}$$
where c is the radius of the window size.
Given a node $v_i$ and its context C, the objective is to minimize the negative logarithmic likelihood [5,27], as shown in Equation (14),
$$-\log P_\theta\left(\{v_j \mid v_j \in C\} \mid v_i\right) = -\sum_{v_j \in C} \log P_\theta(v_j \mid v_i), \tag{14}$$
where $\theta$ denotes all parameters. To be specific, the probability $P_\theta(v_j \mid v_i)$ is defined in Equation (15),
$$P_\theta(v_j \mid v_i) = \frac{\exp(c_j^T \cdot \beta_i)}{\sum_{v_k \in C} \exp(c_k^T \cdot \beta_i)}, \tag{15}$$
where $c_k$ is the context embedding of node $v_k$ and $\beta_i$ is the embedding of node $v_i$. Finally, heterogeneous negative sampling is adopted to approximate the objective function, i.e., Equation (14), for each node pair $(v_i, v_j)$, as shown in Equation (16),
$$E = -\log \sigma(c_j^T \cdot \beta_i) - \sum_{n=1}^{L} \mathbb{E}_{v_k \sim P_C(v)}\left[\log \sigma(-c_k^T \cdot \beta_i)\right], \tag{16}$$
where $\sigma$ is the sigmoid function, L is the number of negative samples, and $v_k$ is randomly drawn from the noise distribution $P_C(v)$ defined on node $v_j$'s corresponding node set.
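For a single (center, context) node pair with L negative samples, Equation (16) can be evaluated as in the following sketch (toy dimensions, randomly drawn embeddings):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(beta_i, c_pos, c_negs):
    """Eq. (16) for one (center, context) pair: -log sigma(c_j^T beta_i)
    minus the log-sigmoid terms of the L negative context embeddings."""
    loss = -np.log(sigmoid(c_pos @ beta_i))
    loss -= np.sum(np.log(sigmoid(-(c_negs @ beta_i))))
    return loss

d, L = 10, 5
rng = np.random.default_rng(0)
beta_i = rng.standard_normal(d)       # embedding of the center node v_i
c_pos = rng.standard_normal(d)        # context embedding of the observed context node v_j
c_negs = rng.standard_normal((L, d))  # negatives drawn from the noise distribution P_C(v)
print(negative_sampling_loss(beta_i, c_pos, c_negs))
```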
The pseudocode of the proposed method is shown in Algorithm 1.
The time complexity of Algorithm 1 is determined by the number of nodes $|V|$, the number of edge types $|R|$, the vector dimension of nodes d, the length of the metapath l and of the random walk $l'$, and the number of negative samples per training sample L; it can be expressed as $O(|V||R|(l + l' + Ld))$. The space complexity, determined by the number of nodes, the vector dimension of nodes, and the number of edge types, is $O(|V||R|d)$.
Algorithm 1: The proposed model.
Input: 
A heterogeneous graph $G = (V, E)$, a set of relation-constrained metapaths S, embedding dimension d, learning rate $\eta$, the length of the random walk $l'$, the number of negative samples L, window size c.
Output: Embeddings of all nodes.
Initialize all the model parameters θ
Generate random walk based path instances and neighbors according to Section 4.1.1;
Generate relation-constrained metapath instances and neighbors according to Section 4.1.2;
[Algorithm 1 body: presented as an image in the original article.]

4.4. Classifier for Link Prediction

Link prediction is a downstream task to evaluate the effectiveness of the learned embeddings. We consider a probability-based method for the evaluation. To be specific, the sigmoid function serves as the final classifier when computing the performance metrics after the node embeddings are obtained. The link score between nodes $v_i$ and $v_j$ ($i \neq j$) can be computed as
$$S_{v_i, v_j} = \mathrm{sigmoid}\!\left(\frac{h_i^T h_j}{\|h_i\|\,\|h_j\|}\right).$$
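This score, i.e., the sigmoid of the cosine similarity between two node embeddings, can be computed as in the following sketch (the 200-dimensional toy vectors match the overall embedding dimension used in the experiments):

```python
import numpy as np

def link_score(h_i, h_j):
    """Sigmoid of the cosine similarity between two node embeddings."""
    cos = h_i @ h_j / (np.linalg.norm(h_i) * np.linalg.norm(h_j))
    return 1.0 / (1.0 + np.exp(-cos))

# A score above a chosen threshold (e.g. 0.5) is predicted as a link between v_i and v_j.
rng = np.random.default_rng(0)
h_i, h_j = rng.standard_normal(200), rng.standard_normal(200)
print(link_score(h_i, h_j))
```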

5. Simulation Experiment and Results Analysis

In this section, we aim at answering the following research questions:
  • RQ1: How do different hyperparameters affect the final performance?
  • RQ2: How does the proposed method perform compared with other state-of-the-art baselines on the task of link prediction?
  • RQ3: How does the proposed method perform on cold-start scenarios compared to baselines?

5.1. Datasets

We use four real-world datasets to perform our experiments: an Amazon dataset, a YouTube dataset, an Ali dataset, and a Twitter dataset. A brief description of each is given below.
Amazon dataset. The Amazon dataset [39,40] is composed of Amazon product reviews and metadata. In the experiment, only the metadata of electronic products are used. It includes two kinds of user behaviors: purchase and browsing.
YouTube dataset. The YouTube dataset [41] contains five user relationships: contacts, shared subscribers, shared subscriptions, shared friends, and shared favorite videos between users.
Ali dataset. The Ali dataset, published in the Tianchi challenge (https://tianchi.aliyun.com/dataset/, accessed on 10 June 2022), contains user identities, commodity identities, and four types of user behavior, i.e., clicking, add to favorite, add to cart, and purchase. Because the dataset is too large, we selected all the relevant data of the commodity category whose ID is 3424 for the experiment.
Twitter dataset. The Higgs Twitter dataset (https://snap.stanford.edu/data/higgs-twitter.html, accessed on 5 October 2022) has been built after monitoring the spreading processes on Twitter before, during, and after the announcement of the discovery of a new particle with the features of the elusive Higgs boson on 4th July 2012. It consists of four types of relations: retweeting, replying, mentioning, and social relationships among users.
The detailed statistics of the datasets are summarized in Table 2. Each record of the Amazon dataset has the structure {User Behavior, itemID1, itemID2}, which indicates that item 1 and item 2 were purchased or browsed by the same user at the same time. The structures of the YouTube and Twitter datasets are similar to that of the Amazon dataset. Each record of the Ali dataset has the structure {User Behavior, userID, itemID}, which represents the relationship between a user and an item. The numbers of users and items used for training are 4099 and 17,073, respectively.
Each dataset is divided into three subsets, namely a training set, a testing set, and a validation set, according to the ratio 8:1:1.
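A simple way to realize such an 8:1:1 split is sketched below; the record format mirrors the {User behavior, userID, itemID} structure of the Ali dataset, and the shuffling seed is an arbitrary choice:

```python
import random

def split_8_1_1(records, seed=0):
    """Shuffle the interaction records and split them 8:1:1 into train/test/validation."""
    records = list(records)
    random.Random(seed).shuffle(records)
    n = len(records)
    n_train, n_test = int(0.8 * n), int(0.1 * n)
    return (records[:n_train],
            records[n_train:n_train + n_test],
            records[n_train + n_test:])

# Toy usage with {User behavior, userID, itemID}-style triples.
records = [("purchase", "u1", "i2"), ("click", "u1", "i4"), ("add_to_cart", "u3", "i2")] * 10
train, test, val = split_8_1_1(records)
print(len(train), len(test), len(val))   # 24 3 3
```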

5.2. Parameters Experiments (RQ1)

We introduce the parameter settings of the experiment. First, we set some fixed parameters in the experiment. The vector dimension of node in each scene is 10, and the overall dimension of the node is 200. The length of random walk is set to 10, and window size is set to 5. In order to evaluate the performance of the proposed model, ROC-AUC, PR-AUC, and F1 are used as evaluation metrics. Then, we evaluate the influence of different values of two parameters: the length of the relation-constrained metapath instance, and the number of instances per node at the start of the sequence. The experimental results of the Amazon dataset are shown in Figure 3. In Figure 3a, we observe the influence of the length of relation-constrained metapath instance by fixing the number of instances per node at the start of the sequence. To be specific, the number of instances per node is set to 60. From the figure, we have the following observation. As the length of relation-constrained metapath instance increases, the ROC-AUC, PR-AUC, and F1 values first increase, and then decrease. One possible reason for the early increase of the values of metrics is that increasing the length makes the nodes in a path instance aggregate more information from their neighborhood. Later on, the values of metrics decrease as the length of metapath instance increases, because increasing the length of metapath instance introduces some noise and reduces the semantic influence. All metrics achieve their best performance when the length of metapath instance is set to 5 on the Amazon dataset.
To verify the influence of the number of instances per node at the start of the sequence, the length of the relation-constrained metapath instance is set to 5, and the experiments are carried out by varying the number of instances. As shown in Figure 3b, the values of the metrics first increase and then decrease. All the metrics reach their optimal values when the number of instances per node at the start of the sequence is 60.
We continue to conduct experiments with the YouTube, Ali and Twitter datasets. The experimental results are shown in Figure 4, Figure 5 and Figure 6, respectively.
Finally, we draw the conclusion that our model achieves the best performances when the two variable parameters are set to (5,60), (7,50), (5,60), and (7,40) with regard to the Amazon, YouTube, Ali, and Twitter datasets, respectively.

5.3. Baselines

We consider representative baseline approaches that fall into the following two categories: (1) traditional models, including DeepWalk [4] and node2vec [6]; and (2) heterogeneous graph-embedding models, including Metapath2vec [23], GATNE [27], HetGNN [25], MAGNN [18], mSHINE [42], and GraphMSE [43].
Deepwalk [4] and node2vec [6] are two classical graph-embedding models for homogeneous graphs. Deepwalk generates a corpus on the graph by random walks, and the corpus is then fed into the skip-gram model to train the feature representations of nodes. Node2vec designs a biased random walk procedure that effectively exploits diverse neighbourhoods. Metapath2vec [23] walks over the heterogeneous graph according to the designed metapath. GATNE [27] formalizes the problem of embedding attributed multiplex heterogeneous networks and supports both transductive and inductive embedding learning for such networks. HetGNN [25] jointly considers node-heterogeneous content encoding, type-based neighbour aggregation, and heterogeneous-type combination. MAGNN [18] proposes a graph neural network based on metapath aggregation to overcome three limitations of existing heterogeneous graph-embedding approaches. mSHINE [42] is designed to learn multiple node representations for different metapaths simultaneously.
The hyperparameters of all baseline methods are adjusted according to the optimal parameters introduced in the papers, and the dimension of node embedding is fixed to 200.

5.4. Contrastive Experiment Results and Discussion (RQ2)

The performance of each model in link prediction task is shown in Table 3.
Compared to the GATNE model, which is generally the best baseline, we can observe that the ROC-AUC, PR-AUC, and F1 metrics are relatively increased by 1.49%, 1.21%, and 1.23% over the Amazon dataset; 3.57%, 2.49%, and 3.34% over the YouTube dataset; 6.35%, 5.19%, and 10.35% over the Ali dataset; and 6.17%, 4.39%, and 3.47% over the Twitter dataset, respectively.
Evidently, our model outperforms all the state-of-the-art baselines. The possible reasons are as follows. (1) Deepwalk and node2vec are classical models designed for homogeneous graphs. In the training process, both node attributes and edge attributes are ignored, which leads to a loss of information. (2) Metapath2vec mixes multiple edge types in the instantiation of heterogeneous graphs. Aggregating inconsistent edge-type information into a single embedding may introduce noise, which degrades the performance of link prediction. (3) The GATNE model does not pay attention to the similarity between nodes or to the importance of the high-order neighbours along the metapath when aggregating neighbour nodes. (4) Both the HetGNN and MAGNN models mainly focus on the importance of node content features, but the experimental datasets do not include node content. (5) The mSHINE model mainly addresses the incompatibility between different metapaths; however, the metapaths we selected do not suffer from this problem.
We further compare the computational time of each method. The YouTube dataset is used as an example; the results are shown in Figure 7.

5.5. Cold-Start Experiment (RQ3)

The proposed model is an inductive framework because it collects node feature information to generate node embeddings for previously unseen data, i.e., cold-start data. For example, there is little interaction data for a newly registered user, who is called a cold-start user. To verify the performance of our model in a cold-start setting, we choose the GATNE model as the baseline. All the interaction data with item category ID equal to 3424 are selected from the Ali dataset as experimental data, which include 107,730 interaction records, 4888 users, and 25,682 items. The numbers of nodes for training, testing, and validation are 21,172, 16,890, and 16,988, and the amounts of interaction data for training, testing, and validation are 26,561, 17,706, and 17,709, respectively. The batch size is set to 64, the window size is set to 5, the number of instantiations per node is 30, and the length of the path is 13. The optimal experimental results are achieved with the above settings. There are a great number of cold-start users and items; the sparsity rate reaches 99.96%. The experimental results are shown in Table 4. Compared to GATNE, our model achieves better performance. To be specific, ROC-AUC, PR-AUC, and F1 are increased by 3.62%, 3.15%, and 4.40%, respectively. This indicates that our model can relieve the cold-start problem to a certain extent.

6. Conclusions and Future Work

In this paper, we present a novel framework for heterogeneous graph-embedding learning, which consists of a novel relation-constrained metapath and a representation learning module. The proposed relation-constrained metapath can express the characteristics between nodes more efficiently in different scenarios. The node aggregation from both the intrapath and interpath views organically combines the local representations of nodes. The experimental results show that the proposed model is superior to other models. In addition, the cold-start experiment shows that our model can also relieve the cold-start problem to some extent.
In the future, several tasks remain to be done. (1) In many real-world situations, the size of the graph is too large to be stored in memory, so we plan to design a parallelized version for large-scale graphs. (2) The problem of metapath incompatibility deserves further study and exploration. (3) A heterogeneous contrastive learning mechanism can be integrated into our model. (4) A hyperlink in a hypergraph can connect any number of nodes [38]; an interesting direction for future research is to extend the idea of this paper to hyperlink prediction.

Author Contributions

Conceptualization, C.Z.; Methodology, C.Z.; Software, F.S.; Validation, K.L.; Formal analysis, L.W.; Investigation, K.L.; Resources, F.S.; Writing—original draft, C.Z.; Writing—review & editing, K.L.; Supervision, S.W.; Project administration, B.Z.; Funding acquisition, S.W., B.Z. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Shandong Provincial Natural Science Foundation, China (ZR2020MF147, ZR2021MF017, ZR2021MF031).

Data Availability Statement

No new data were created.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, B.; Zhou, M.; Zhang, S.; Yang, M.; Lian, D.; Huang, Z. BSAL: A Framework of Bi-component Structure and Attribute Learning for Link Prediction. In Proceedings of the SIGIR’22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; Amigó, E., Castells, P., Gonzalo, J., Carterette, B., Culpepper, J.S., Kazai, G., Eds.; ACM: New York, NY, USA, 2022; pp. 2053–2058. [Google Scholar] [CrossRef]
  2. Yadati, N.; Nitin, V.; Nimishakavi, M.; Yadav, P.; Louis, A.; Talukdar, P. NHP: Neural Hypergraph Link Prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM’20, Online, 19–23 October 2020; pp. 1705–1714. [Google Scholar]
  3. Han, J.; Tao, Q.; Tang, Y.; Xia, Y. DH-HGCN: Dual Homogeneity Hypergraph Convolutional Network for Multiple Social Recommendations. In Proceedings of the SIGIR’22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; Amigó, E., Castells, P., Gonzalo, J., Carterette, B., Culpepper, J.S., Kazai, G., Eds.; ACM: New York, NY, USA, 2022; pp. 2190–2194. [Google Scholar] [CrossRef]
  4. Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710. [Google Scholar]
  5. Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q. LINE: Large-Scale Information Network Embedding. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 1067–1077. [Google Scholar] [CrossRef] [Green Version]
  6. Grover, A.; Leskovec, J. Node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864. [Google Scholar]
  7. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  8. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  9. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  10. Huang, T.; Dong, Y.; Ding, M.; Yang, Z.; Feng, W.; Wang, X.; Tang, J. Mixgcf: An improved training method for graph neural network-based recommender systems. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Long Beach, CA, USA, 6–10 August 2021; pp. 665–674. [Google Scholar]
  11. Yu, J.; Yin, H.; Li, J.; Wang, Q.; Hung, N.Q.V.; Zhang, X. Self-supervised multi-channel hypergraph convolutional network for social recommendation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 413–424. [Google Scholar]
  12. Chen, W.; Feng, F.; Wang, Q.; He, X.; Song, C.; Ling, G.; Zhang, Y. CatGCN: Graph Convolutional Networks with Categorical Node Features. IEEE Trans. Knowl. Data Eng. 2021. [Google Scholar] [CrossRef]
  13. Kang, J.; Zhu, Y.; Xia, Y.; Luo, J.; Tong, H. RawlsGCN: Towards Rawlsian Difference Principle on Graph Convolutional Network. In Proceedings of the WWW’22: The ACM Web Conference 2022, Lyon, France, 25–29 April 2022; Laforest, F., Troncy, R., Simperl, E., Agarwal, D., Gionis, A., Herman, I., Médini, L., Eds.; ACM: New York, NY, USA, 2022; pp. 1214–1225. [Google Scholar] [CrossRef]
  14. Luo, D.; Bian, Y.; Yan, Y.; Liu, X.; Huan, J.; Zhang, X. Local Community Detection in Multiple Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 266–274. [Google Scholar] [CrossRef]
  15. Luo, X.; Wu, J.; Beheshti, A.; Yang, J.; Zhang, X.; Wang, Y.; Xue, S. ComGA: Community-Aware Attributed Graph Anomaly Detection. In Proceedings of the WSDM’22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event/Tempe, AZ, USA, 21–25 February 2022; Candan, K.S., Liu, H., Akoglu, L., Dong, X.L., Tang, J., Eds.; ACM: New York, NY, USA, 2022; pp. 657–665. [Google Scholar] [CrossRef]
  16. Gao, H.; Wang, Z.; Ji, S. Large-Scale Learnable Graph Convolutional Networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, 19–23 August 2018; Guo, Y., Farooq, F., Eds.; ACM: New York, NY, USA, 2018; pp. 1416–1424. [Google Scholar] [CrossRef] [Green Version]
  17. You, J.; Gomes-Selman, J.M.; Ying, R.; Leskovec, J. Identity-aware graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 10737–10745. [Google Scholar]
  18. Fu, X.; Zhang, J.; Meng, Z.; King, I. MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding. In Proceedings of the WWW’20: The Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; Huang, Y., King, I., Liu, T., van Steen, M., Eds.; ACM: New York, NY, USA, 2020; pp. 2331–2341. [Google Scholar] [CrossRef]
  19. Sun, Y.; Han, J.; Yan, X.; Yu, P.S.; Wu, T. PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks. Proc. VLDB Endow. 2011, 4, 992–1003. [Google Scholar] [CrossRef]
  20. Guan, W.; Jiao, F.; Song, X.; Wen, H.; Yeh, C.; Chang, X. Personalized Fashion Compatibility Modeling via Metapath-guided Heterogeneous Graph Learning. In Proceedings of the SIGIR’22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; Amigó, E., Castells, P., Gonzalo, J., Carterette, B., Culpepper, J.S., Kazai, G., Eds.; ACM: New York, NY, USA, 2022; pp. 482–491. [Google Scholar] [CrossRef]
  21. Wang, X.; Liu, N.; Han, H.; Shi, C. Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning. In Proceedings of the KDD’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, 14–18 August 2021; Zhu, F., Ooi, B.C., Miao, C., Eds.; ACM: New York, NY, USA, 2021; pp. 1726–1736. [Google Scholar] [CrossRef]
  22. Zheng, J.; Ma, Q.; Gu, H.; Zheng, Z. Multi-view Denoising Graph Auto-Encoders on Heterogeneous Information Networks for Cold-start Recommendation. In Proceedings of the KDD’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, 14–18 August 2021; Zhu, F., Ooi, B.C., Miao, C., Eds.; ACM: New York, NY, USA, 2021; pp. 2338–2348. [Google Scholar] [CrossRef]
  23. Dong, Y.; Chawla, N.V.; Swami, A. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, USA, 13–17 August 2017; pp. 135–144. [Google Scholar]
  24. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  25. Zhang, C.; Song, D.; Huang, C.; Swami, A.; Chawla, N.V. Heterogeneous Graph Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’19, Anchorage, AK, USA, 4–8 August 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 793–803. [Google Scholar] [CrossRef]
  26. Wang, X.; Ji, H.; Shi, C.; Wang, B.; Ye, Y.; Cui, P.; Yu, P.S. Heterogeneous Graph Attention Network. In Proceedings of the World Wide Web Conference, WWW’19, San Francisco, CA, USA, 13–17 May 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 2022–2032. [Google Scholar] [CrossRef] [Green Version]
  27. Cen, Y.; Zou, X.; Zhang, J.; Yang, H.; Zhou, J.; Tang, J. Representation learning for attributed multiplex heterogeneous network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1358–1368. [Google Scholar]
  28. Martínez, V.; Berzal, F.; Cubero, J.C. A survey of link prediction in complex networks. ACM Comput. Surv. (CSUR) 2016, 49, 1–33. [Google Scholar] [CrossRef]
  29. Lü, L.; Zhou, T. Link prediction in complex networks: A survey. Phys. A Stat. Mech. Its Appl. 2011, 390, 1150–1170. [Google Scholar] [CrossRef] [Green Version]
  30. Amara, A.; Taieb, M.A.H.; Aouicha, M.B. Cross-network representation learning for anchor users on multiplex heterogeneous social network. Appl. Soft Comput. 2022, 118, 108461. [Google Scholar] [CrossRef]
  31. Zitnik, M.; Leskovec, J. Predicting multicellular function through multi-layer tissue networks. Bioinformatics 2017, 33, i190–i198. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  32. Daud, N.N.; Ab Hamid, S.H.; Saadoon, M.; Sahran, F.; Anuar, N.B. Applications of link prediction in social networks: A review. J. Netw. Comput. Appl. 2020, 166, 102716. [Google Scholar] [CrossRef]
  33. Chiluka, N.; Andrade, N.; Pouwelse, J. A link prediction approach to recommendations in large-scale user-generated content systems. In Proceedings of the European Conference on Information Retrieval, Dublin, Ireland, 18–21 April 2011; pp. 189–200. [Google Scholar]
  34. Kumar, A.; Singh, S.S.; Singh, K.; Biswas, B. Link prediction techniques, applications, and performance: A survey. Phys. A Stat. Mech. Its Appl. 2020, 553, 124289. [Google Scholar] [CrossRef]
  35. Negi, S.; Chaudhury, S. Link prediction in heterogeneous social networks. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indianapolis, IN, USA, 24–28 October 2016; pp. 609–617. [Google Scholar]
  36. Mishra, S.; Singh, S.S.; Kumar, A.; Biswas, B. HOPLP- MUL: Link prediction in multiplex networks based on higher order paths and layer fusion. Appl. Intell. 2022, 53, 3415–3443. [Google Scholar] [CrossRef]
  37. Mishra, S.; Singh, S.S.; Kumar, A.; Biswas, B. MNERLP-MUL: Merged node and edge relevance based link prediction in multiplex networks. J. Comput. Sci. 2022, 60, 101606. [Google Scholar] [CrossRef]
  38. Chen, C.; Liu, Y.Y. A survey on hyperlink prediction. arXiv 2022, arXiv:2207.02911. [Google Scholar]
  39. He, R.; McAuley, J. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, USA, 11–15 April 2016; pp. 507–517. [Google Scholar]
  40. McAuley, J.; Targett, C.; Shi, Q.; Van Den Hengel, A. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015; pp. 43–52. [Google Scholar]
  41. Tang, L.; Liu, H. Uncovering cross-dimension group structures in multi-dimensional networks. In Proceedings of the SDM Workshop on Analysis of Dynamic Networks; 2009; pp. 568–575. Available online: https://www.public.asu.edu/huanliu/papers/sdm-adn09.pdf (accessed on 17 January 2023).
  42. Zhang, X.; Chen, L. mSHINE: A Multiple-meta-paths simultaneous learning framework for heterogeneous information network embedding. IEEE Trans. Knowl. Data Eng. 2020. [Google Scholar] [CrossRef]
  43. Li, Y.; Jin, Y.; Song, G.; Zhu, Z.; Shi, C.; Wang, Y. GraphMSE: Efficient Meta-path Selection in Semantically Aligned Feature Space for Graph Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; pp. 4206–4214. [Google Scholar]
Figure 1. The framework of the proposed model. In the heterogeneous graph, the data processing is divided into two flow directions. Lastly, all the local embeddings are organically combined by two-level aggregation. (1) As shown by the orange arrow, in the relation subgraph, i.e., the edge type is constrained by purchase behavior. First, the sequences started from node u 1 are generated by the random walk method. Then, the most related neighbors of u 1 are obtained. Next, a particular embedding of u 1 can be gained by a sum over the embedding of the most related neighbors in the particular relation subgraph, i.e., the random walk-based intrapath view. Finally, the embeddings of u 1 in different relation subgraphs are obtained by random walk-based interpath aggregation. (2) As shown by the blue arrow, first, all the relation-constrained metapath instances started from node u 1 are generated. Then, the high-order neighbors of u 1 are obtained. Next, the initial embeddings of u 1 and its neighbors are fed to Transformer. Finally, another particular embedding of u 1 can be obtained by the sum over the output of the transformer. This flow is called metapath-guided intrapath aggregation. Furthermore, the results of metapath-guided interpath aggregation are assembled by each intrapath aggregation (best viewed in color).
Figure 2. A relation-constrained metapath instance and neighbor generation. In the heterogeneous graph, all the instances are generated according to the relation-constrained metapaths. After filtering metapath instances according to the type of first node in sequence, the neighbors in the same scene can be obtained. Finally, we obtain subgraphs with the same node type, such as, user–user or item–item subgraphs, which can capture high-order proximity between nodes under the specific user behaviour.
Figure 3. Parameters experiments of the Amazon dataset.
Figure 4. Parameters experiments of the YouTube dataset.
Figure 5. Parameters experiments of the Ali dataset.
Figure 6. Parameters experiments of the Twitter dataset.
Figure 7. Computational times of each method over the YouTube dataset.
Table 1. Notations used in the paper.
Notation | Definition
$U$, $I$ | User type, item type
$u$, $i$ | A user $u \in U$, an item $i \in I$
$V$, $E$ | Node set, edge set
$S$ | A set of relation-constrained metapaths
$v$ | A node $v \in V$
$d$ | The dimension of the feature representation of nodes
$h_i$ | The feature representation of node $v_i$
$T$, $R$ | Node type set, edge type set
$P$, $l$ | A metapath, the length of the metapath
$P^C$ | A relation-constrained metapath
$A$, $A'$ | The matrix of all node features in the sequence, the matrix after linear transformation
$Q$, $K$, $V$ | The matrix representations after linear transformation
$l'$ | The length of the sequence
$L$ | The number of negative samples per training sample
Table 2. Statistics of datasets used in experiments.
Dataset | #Nodes | #Edges | #Types of Nodes | #Types of Edges
Amazon | 10,166 | 148,865 | 1 | 2
YouTube | 2000 | 1,310,617 | 1 | 5
Ali | 33,969 | 132,500 | 2 | 4
Twitter | 10,000 | 331,899 | 1 | 4
Table 3. Performance comparison of different models on four datasets. The optimal performance and the suboptimal performance are denoted in bold and underlined fonts respectively.
Dataset | Model | ROC-AUC | PR-AUC | F1
Amazon | Deepwalk | 94.20 | 94.03 | 87.38
Amazon | node2vec | 94.47 | 94.30 | 87.88
Amazon | metapath2vec | 94.15 | 94.01 | 87.48
Amazon | HetGNN | 95.85 | 94.71 | 88.06
Amazon | GATNE | 96.28 | 96.31 | 92.12
Amazon | MAGNN | 87.86 | 87.23 | 85.67
Amazon | mSHINE | 79.10 | 80.67 | 73.57
Amazon | Ours | 97.72 | 97.48 | 93.26
YouTube | Deepwalk | 71.11 | 70.04 | 65.52
YouTube | node2vec | 71.21 | 70.32 | 65.36
YouTube | metapath2vec | 70.98 | 70.02 | 65.34
YouTube | HetGNN | 80.05 | 79.18 | 73.32
YouTube | GATNE | 82.29 | 81.81 | 74.63
YouTube | MAGNN | 75.16 | 74.69 | 70.13
YouTube | mSHINE | 66.53 | 63.11 | 62.55
YouTube | Ours | 85.23 | 83.85 | 77.13
Ali | Deepwalk | 69.56 | 37.07 | 36.25
Ali | node2vec | 62.84 | 46.93 | 47.01
Ali | metapath2vec | 79.09 | 78.59 | 71.76
Ali | HetGNN | 79.95 | 78.24 | 75.37
Ali | GATNE | 82.87 | 85.26 | 76.13
Ali | MAGNN | 63.86 | 65.15 | 62.36
Ali | mSHINE | 41.07 | 53.04 | 52.43
Ali | Ours | 88.14 | 89.69 | 84.01
Twitter | Deepwalk | 69.42 | 72.58 | 62.68
Twitter | node2vec | 69.9 | 73.04 | 63.12
Twitter | metapath2vec | 69.35 | 72.61 | 62.7
Twitter | HetGNN | 72.36 | 75.28 | 69.71
Twitter | GATNE | 72.4 | 74.4 | 65.89
Twitter | MAGNN | 69.85 | 74.24 | 64.38
Twitter | mSHINE | 67.17 | 70.85 | 63.98
Twitter | Ours | 76.87 | 77.67 | 68.18
Table 4. Effect comparison of cold-start experiment.
Metric | GATNE | Ours
ROC-AUC | 85.16 | 88.25
PR-AUC | 86.90 | 89.64
F1 | 78.27 | 81.72
