Predicting Dynamic User–Item Interaction with Meta-Path Guided Recursive RNN

Liu, Yi; Yin, Chengyu; Li, Jingwei; Wang, Fang; Wang, Senzhang

doi:10.3390/a15030080


Predicting Dynamic User–Item Interaction with Meta-Path Guided Recursive RNN^†

¹ College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

² Shenzhen Research Institute, Nanjing University of Aeronautics and Astronautics, Shenzhen 518000, China

³ Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY 14200, USA

⁴ School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China

⁵ School of Computer Science and Engineering, Central South University, Changsha 410083, China

* Author to whom correspondence should be addressed.

† This paper is an extended version of our paper "Recursive RNN based Shift Representation Learning for Dynamic User-Item Interaction Prediction", published at ADMA 2020, Foshan, China, 12–14 November 2020.

Algorithms 2022, 15(3), 80; https://doi.org/10.3390/a15030080

Submission received: 24 January 2022 / Revised: 21 February 2022 / Accepted: 25 February 2022 / Published: 28 February 2022

(This article belongs to the Special Issue Graph Embedding Applications)


Abstract

Accurately predicting user–item interactions is critically important in many real applications, including recommender systems and user behavior analysis in social networks. One major drawback of existing studies is that they generally analyze the sparse user–item interaction data directly, without considering the semantic correlations and the structural information hidden in the data. Another limitation is that existing approaches usually embed users and items into different embedding spaces in a static way and ignore the dynamic characteristics of both users and items. In this paper, we propose to learn dynamic embedding vector trajectories, rather than static embedding vectors, for users and items simultaneously. A Metapath-guided Recursive RNN based Shift embedding method named MRRNN-S is proposed to learn the continuously evolving embeddings of users and items for more accurately predicting their future interactions. The proposed MRRNN-S extends our previous model RRNN-S. Compared with RRNN-S, we add a word2vec module and a skip-gram-based meta-path module to better capture the rich auxiliary information in the user–item interaction data. Specifically, we first regard the interaction data of each user with items as sentence data to model their semantic and sequential information, and construct the user–item interaction graph. Then we sample instances of meta-paths to capture the heterogeneity and structural information from the user–item interaction graph. A recursive RNN is proposed to iteratively and mutually learn the dynamic user and item embeddings in the same latent space based on their historical interactions. Next, a shift embedding module is proposed to predict the future user embeddings. To predict which item a user will interact with, we output the item embedding instead of the pairwise interaction probability between users and items, which is much more efficient. Through extensive experiments on three real-world datasets, we demonstrate that MRRNN-S achieves superior performance compared with state-of-the-art baseline models.

Keywords:

recursive RNN; user–item interaction; shift embedding; meta-path

1. Introduction

In the era of big data, the large volume of online information generated in real time makes it very difficult for people to quickly find the valuable information they need. Recommendation algorithms effectively address this information overload problem by recommending useful information to users while filtering out less relevant information [2,3]. For example, on online multimedia websites, user experience can be greatly improved by recommending movies or songs that users may be interested in [4]. On many e-commerce platforms, recommending products that users would like to buy can help save their time and money [5]. Thus, accurately predicting the interactions between users and items is critically important to recommendation [6,7,8]. Figure 1a shows an example of the sequential interactions between two users (Bob and Alice) and the items. One can see that Bob first buys a cell phone, and shortly afterwards he buys a phone case and a phone film. Thus, we can infer that he is more likely to buy a mobile earphone than a suit. Figure 1b shows an example of user–item interaction on an e-commerce platform. Each arrow represents an interaction from a user to an item, e.g., a user clicking, buying, or browsing a commodity on the e-commerce platform Taobao, and each interaction is associated with a timestamp t and a feature vector f (such as the interaction type and the user and commodity features).

Increasing research effort has been devoted to recommender systems, and great progress has been made recently [9,10]. Inspired by successful applications in natural language processing (NLP), Hidasi et al. proposed to use the GRU module to process the session-based behavior sequence of users for predicting the next item [11]. The interaction sequence is input into an LSTM module to adapt to the dynamics of users and items. Wang et al. [12] applied the graph neural network architecture to knowledge graphs to acquire rich auxiliary information for the recommendation task. However, there are three major issues when applying such methods directly to our studied problem. First, as shown in Figure 1a, the successive online behaviors of users are usually highly correlated, but this correlation is largely ignored by existing collaborative filtering-based recommender systems [13,14]. Intuitively, if a user first buys a suit and then buys a pair of shoes, the next item that the user buys in the near future is more likely to be an apparel product than an electronic product such as a mobile phone. Second, some existing works only explore the interaction relationships between users and items, while ignoring the relationships between users and users as well as between items and items. User–user and item–item relationships are important auxiliary information for alleviating the data sparsity issue. Third, users’ preferences and item popularity evolve over time [15,16]. Existing works mostly learn a static representation vector for each user and item but fail to capture the dynamic representations of users and items that evolve over time.

To address the above issues, in this paper we propose a Meta-path guided Recursive RNN based Shift embedding method named MRRNN-S to more effectively learn the dynamic representations of users and items, based on which future user–item interactions can be more accurately predicted. Building on the previous work [1], we argue that the previously proposed RRNN-S model can be further improved in how it processes the original data. RRNN-S only utilized the second-order graph structural information and ignored the fact that the sequence data and the heterogeneity hidden in the graph can also provide auxiliary information. Therefore, we propose to add two new modules to the RRNN-S model, namely the word2vec module and the skip-gram-based meta-path module. The word2vec module treats the sequential interaction data as sentences and extracts their semantic information using the word2vec method. The skip-gram-based meta-path module models the user–item interaction data as a heterogeneous graph to more effectively capture the heterogeneous information and higher-order proximity, and then learns the node embeddings on the graph. Next, a recursive RNN module is designed to capture the sequential dependence of user–item interactions by mapping users and items into the same latent representation space. The embeddings of users and items are mutually and iteratively updated by the proposed recursive RNN. Then, a shift embedding module is designed to predict the future embedding of a user from the time elapsed since the last interaction. Finally, we predict the embedding of the item and identify the item whose embedding vector is closest to it in the embedding space.

To summarize, our main contributions are as follows:

To acquire the rich auxiliary information from the user–item interaction data, we model the original user–item interaction data as sequences and graphs, respectively. A word2vec module is applied to the interaction sequence of each user, which aims to learn an initial embedding that preserves the sequential pattern and semantic information of the sequences. Then a skip-gram-based meta-path module is applied to the user–item interaction graph to capture the heterogeneous information and the higher-order user–item relationships.
We propose to apply the GCN module to learn the node features of the user–item interaction graph, so that similar users or items are closer in the feature space.
Comprehensive experiments are conducted over three user–item interaction graph datasets. The results demonstrate the effectiveness of our method against several competitive baselines.

The proposed MRRNN-S is an extended version of the RRNN-S model, which was proposed in our earlier paper published in ADMA2020. We briefly describe the differences between this paper and our previous conference paper. We extract the static embedding with a newly designed module to better capture the auxiliary information. The previous work [1] only considered the structure information from the subgraphs extracted from the user–item interaction graph when exploring the static user and item embeddings. This work improves the model’s capability of feature extraction by transforming the original data into the interaction sequence of each user and the user–item interaction graph, so that the embeddings preserve the sequential information as well as the heterogeneous graph information. We also re-conduct most of the experiments and add several new experiments to demonstrate the superiority of MRRNN-S.

The remainder of this paper is organized as follows. We will first discuss related works in Section 2. Then Section 3 will give some notations and a formal definition of the studied problem. Section 4 will introduce the proposed model MRRNN-S and the objective function. In Section 5, we will evaluate our approach and report the results. Finally, the conclusion will be given in Section 6.

2. Related Work

In this section, we review related works from the aspects of collaborative filtering recommendation, deep learning-based recommendation, graph-based recommendation, and temporal network embedding.

2.1. Collaborative Filtering Recommendation

The collaborative filtering (CF) algorithm is one of the most classic models in recommender systems. Its main idea is to obtain the collective wisdom from a large number of user behavior data for recommendations. CF can be roughly divided into user-based collaborative filtering [17], item-based collaborative filtering [18], and model-based collaborative filtering [19]. Linden et al. proposed an item to item collaborative filtering algorithm [20], which matched the items interacted by a user to the similar items, and then put all similar items into a recommendation list. High-quality real-time recommendations can be generated because the number of online users in the e-commerce platform is irrelevant to the number of items in the item catalog. A model named neural collaborative filtering (NCF) is proposed to improve the capability of feature interaction learning by replacing the inner product operation in the matrix factorization model with a neural network [21]. However, most of the existing collaborative filtering models ignore the latent sequential patterns when dealing with the dynamic user–item interaction data.

2.2. Deep Learning-Based Recommendation

Due to the powerful feature extraction capabilities of deep learning techniques, many recent works have combined deep learning with recommender systems and achieved promising performance [22]. Recent deep learning-based recommendation models can be roughly categorized into RS with neural building blocks (e.g., MLP, AE, RNN, CNN, etc.) and RS with deep hybrid models (e.g., RNN + CNN, AE + CNN, etc.). Ref. [23] jointly trained wide linear models and deep neural networks to combine the benefits of memorization and generalization for recommender systems. Ref. [24] proposed a model that integrated CNN into probabilistic matrix factorization, which was able to capture the contextual information of documents and improved the prediction accuracy. Ref. [25] designed a flexible encoder-decoder architecture consisting of CNN and RNN. The model is capable of incorporating author metadata to learn a robust representation of the citation context. In order to acquire rich auxiliary information, ref. [26] combined a denoising auto-encoder model with a convolutional auto-encoder model to extract the textual and visual features of items, respectively. However, these works only focus on learning static user and item embeddings, which is not suitable for a temporal user–item interaction network. The sequential pattern hidden in temporal user–item interactions reflects the dynamics of user preference for an item over time. Our work aims to capture the sequence of dynamic embeddings of users and items.

2.3. Graph-Based Recommendation

Recently, research on graph neural networks has made great progress in various domains [27,28,29]. Many works have applied graph neural networks to recommender systems because the user–item interaction data can be modeled as a graph. Sun et al. used the Bayesian graph convolutional network BGCN to model the uncertainty in the user–item interaction graph to solve the problem of unreal connections between some nodes in the graph [30]. It is verified that the feature transformation and nonlinear activation operations of traditional GCN are invalid for collaborative filtering, and thus a lightweight GCN model (LightGCN) is proposed for recommender systems [31]. Fan et al. preserved the sequential pattern in the interaction graph by assigning timestamp attributes to the edges in the user–item interaction graph. Inspired by the transformer, the TCT model was proposed to combine the sequential pattern in the data with the collaborative signal [32]. In order to obtain richer auxiliary information from a graph, more and more researchers are paying attention to heterogeneous graphs. Jiang et al. proposed a novel contrastive GNN pre-training strategy on heterogeneous graphs, which was able to capture the semantic and structural information in a self-supervised way [33]. Since there are few studies on dynamic embedding methods for heterogeneous graphs, Zhang et al. proposed the MDHNE model, which converted the heterogeneous graph into multiple views and retained the evolution mode of the relationship between multiple views over time [34].

2.4. Temporal Network Embedding

Considering that the preference of a user and the popularity of an item can both change over time, increasing research interests have been devoted to temporal network embedding [35,36]. For example, Li et al. proposed the DANE model, which combined network topology and node features to achieve rapid dynamic update and learn dynamic network embedding [37]. Zhu et al. designed the DHPE model based on the generalized SVD and matrix perturbation theory [38], which was able to preserve the high-order proximity while dynamically updating the node representation of the network. However, these algorithms learn embeddings from a sequence of graph snapshots, which is not suitable to our setting of the successive user–item interaction data. With the development of the NLP techniques [39,40], some NLP methods are also applied to recommender systems. For example, inspired by the skip-gram model, Nguyen et al. proposed a model which was based on the embedding method with temporal random walk [41]. It aims to learn the more meaningful time-respecting embeddings from continuous-time dynamic networks. The drawback of the model is that it only generates the final static embedding of nodes. Recently, Xu et al. designed a temporal graph attention model (TGAT) to aggregate the temporal–topological neighborhood features and learn the time-feature interactions [42]. TASER proposed by Ye et al. is able to model the absolute time pattern and relative time pattern. The former highlights the users’ time-sensitive behavior, and the latter shows the effect of the time interval on the relationship between two actions [43].

3. Preliminary and Problem Definition

3.1. Preliminary

In a typical user–item interaction scenario, we use $u_t \in \mathbb{R}^n$, $\forall u \in \mathcal{U}$, to denote the user embedding and $i_t \in \mathbb{R}^n$, $\forall i \in \mathcal{I}$, to denote the item embedding, where $\mathcal{U}$ and $\mathcal{I}$ are the sets of users and items, respectively. The interactions between users and items form an ordered sequence denoted as $\mathcal{S}$. One historical interaction record is denoted as $S = (u, i, t, f) \in \mathcal{S}$, where u and i denote a user in $\mathcal{U}$ and an item in $\mathcal{I}$, respectively, and t is the timestamp of interaction S. Each interaction has an associated feature vector f (e.g., the embedding of user, item or interaction information). Table 1 lists the symbols and their descriptions used in this paper.

3.2. Problem Definition

According to the notations above, we can formulate our problem as follows. Given a set of historical user–item interactions $\mathcal{S}$, our aim is to learn the future embeddings $u_{t+\Delta}$ and $i_{t+\Delta}$ for users and items, respectively, and to predict with which item the user u will interact in a given future time slot $t + \Delta$.

4. Methodology

In this section, we will introduce our model which contains three parts as shown in Figure 2. The first part is the auxiliary information extraction from the historical user–item interactions. Next, the second part is the generation of the dynamical user and item embeddings. Finally, the objective function of the model will be introduced.

4.1. Auxiliary Information Extraction

For the user–item interaction data, auxiliary information can be acquired from the sequences and the graphs, which will be introduced in detail as follows.

Feature extraction from the interaction sequence. We can regard the interaction sequence of one user as a sentence and then learn the sequential information from it. As shown in the white box of Figure 2, each item the user interacts with in the sequence can be regarded as a word in a sentence. Thus, we can project the representations of items and users into a common latent space by word2vec. For example, Figure 3 illustrates an interaction sequence of user Jack with items. He first buys a basketball, and shortly afterwards he buys a basketball jersey, sneakers, and knee pads. These four items share some common features because they all belong to the category of sporting goods. Thus it is meaningful to make them close to each other in the projected latent feature space.

Given an interaction sequence of $user_j$, denoted as $x_j = [z_1, z_2, \ldots, z_T]$, where $z_t \in \mathbb{R}^{d_0 \times 1}$ denotes the t-th item interacted with by $user_j$ and $d_0$ is the initial dimension of an item, our task for each position t = 1, …, T is to predict the context words within a window of size w, given the center word $z_t$.

The likelihood function can be formulated as follows:

$$L(\theta) = \prod_{t=1}^{T} \prod_{\substack{-w \leq c \leq w \\ c \neq 0}} P(z_{t+c} \mid z_t; \theta). \qquad (1)$$

The objective function J(θ) is the average negative log likelihood:

$$J(\theta) = -\frac{1}{T} \log L(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-w \leq c \leq w \\ c \neq 0}} \log P(z_{t+c} \mid z_t; \theta). \qquad (2)$$

Our goal is to find the parameters $\theta$ that minimize the objective function J(θ), which yields the initial embeddings of users and items containing the sequence information.
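The objective in Equations (1) and (2) can be illustrated with a small pure-Python sketch. The softmax parameterisation of $P(z_{t+c} \mid z_t; \theta)$, the separate input and output vectors, and all function names are assumptions made for illustration; the text does not specify how the probability is parameterised or which word2vec implementation is used.

```python
import math

def context_pairs(sequence, w):
    """Collect (center, context) pairs for offsets -w <= c <= w, c != 0."""
    pairs = []
    for t, center in enumerate(sequence):
        for c in range(-w, w + 1):
            if c != 0 and 0 <= t + c < len(sequence):
                pairs.append((center, sequence[t + c]))
    return pairs

def softmax_prob(center_vec, context_id, out_vecs):
    """P(context | center) as a softmax over dot products (assumed form)."""
    scores = {k: sum(a * b for a, b in zip(center_vec, v))
              for k, v in out_vecs.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[context_id] - m) / z

def objective(sequence, w, in_vecs, out_vecs):
    """Average negative log-likelihood J(theta), as in Eq. (2)."""
    total = 0.0
    for center, context in context_pairs(sequence, w):
        total += math.log(softmax_prob(in_vecs[center], context, out_vecs))
    return -total / len(sequence)

# Toy interaction sequence in the spirit of the Jack example.
sequence = ["basketball", "jersey", "sneakers", "knee_pads"]
in_vecs = {item: [0.0, 0.0] for item in sequence}   # untrained embeddings
out_vecs = {item: [0.0, 0.0] for item in sequence}
pairs = context_pairs(sequence, w=1)
J = objective(sequence, 1, in_vecs, out_vecs)
```

With untrained (all-zero) vectors every context item is equally likely, so J is simply the log of the vocabulary size scaled by the average number of context pairs per position; training would lower this value by moving co-occurring items closer together.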

Feature extraction from graphs. The user–item interaction data can also be treated as a graph. As shown in the blue box in Figure 2, the user–item interaction graph is built from the users’ historical interaction data. There are two types of nodes, users and items. A solid line connects a user node and an item node if the user interacts with the item. As depicted in Figure 4, user1 interacts with item1, item2, and item4, while user2 interacts with item2, item3, and item4 in this user–item interaction graph. It is reasonable to infer that item2 and item4 are more similar because both of them are interacted with by the same users. When user3 interacts with item2, it is appropriate to recommend item4 to user3. Thus, the user–item interaction graph is able to preserve rich auxiliary information. In addition, in order to make use of the heterogeneity, inspired by [44], we define the metapath, which is able to reflect the semantic information between nodes in the graph. For example, given the metapath U-I-U, (U-I) means a user interacts with an item, and (I-U) means the item is interacted with by another user. We can sample some instances of the predefined metapaths in the graph by random walk, and embed each node with a skip-gram model to capture the heterogeneity and structure information from the graph.

Given the user–item interaction graph G(V, E, T), each node v and each link e are associated with the mapping functions $\phi(v): V \to T_V$ and $\varphi(e): E \to T_E$, respectively. For a given node v, we aim to maximize the probability of correctly predicting its neighborhood nodes $N_t(v)$, $t \in T_V$. We use the following skip-gram model to learn node representations that preserve the heterogeneous semantic information:

$$\arg\max_{\theta} \sum_{v \in V} \sum_{t \in T_V} \sum_{c_t \in N_t(v)} \log p(c_t \mid v; \theta) \qquad (3)$$

where $N_t(v)$ denotes the neighborhood of node v of the t-th type. The node type can be a user or an item. After the two auxiliary feature extraction methods described above, we obtain the static embedding of each user node and item node with rich semantic information.
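As a rough illustration of how instances of the U-I-U metapath might be sampled, the sketch below performs a type-constrained random walk on the small graph of Figure 4. The function name `metapath_walk`, the uniform neighbor choice, and the walk length are hypothetical; the text only states that instances are sampled by random walk and then embedded with a skip-gram model.

```python
import random

def metapath_walk(graph, node_type, start, metapath, walk_length, rng):
    """Sample one walk whose node types follow a cyclic metapath such as "UIU"."""
    assert node_type[start] == metapath[0]
    pattern = metapath[:-1]  # "UIU" repeats as U, I, U, I, ...
    walk = [start]
    while len(walk) < walk_length:
        next_type = pattern[len(walk) % len(pattern)]
        candidates = [n for n in graph[walk[-1]] if node_type[n] == next_type]
        if not candidates:  # dead end: stop the walk early
            break
        walk.append(rng.choice(candidates))  # uniform choice is an assumption
    return walk

# Edges of the toy user-item graph described for Figure 4.
edges = [("user1", "item1"), ("user1", "item2"), ("user1", "item4"),
         ("user2", "item2"), ("user2", "item3"), ("user2", "item4")]
graph, node_type = {}, {}
for u, i in edges:
    graph.setdefault(u, []).append(i)
    graph.setdefault(i, []).append(u)
    node_type[u], node_type[i] = "U", "I"

rng = random.Random(0)
walk = metapath_walk(graph, node_type, "user1", "UIU", 5, rng)
```

Each sampled walk can then be treated like a sentence, so that nodes appearing close together on metapath instances serve as each other's context in the skip-gram objective of Equation (3).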

Then, a GCN module is applied to the user–item graph for learning structure information. Different from the general GCN which aggregates the features of the first order neighbor nodes, we design a GCN that only aggregates the second-order neighbor nodes’ features by considering the fact that the user–item graph is a bipartite graph. In this way, each node is able to obtain the features of its nearest neighbor nodes of the same type, i.e., the user nodes only receive the information from the nearest other user nodes through common item nodes, and so do the item nodes. To be specific, the information aggregation can be expressed as follows

$$H^{l+1} = \sigma\left(D^{-\frac{1}{2}} A^{2} D^{-\frac{1}{2}} H^{l} W^{l}\right), \quad l = 0, 1, \dots, L - 1 \qquad (4)$$

where $H^{l}$ is the matrix of hidden representations of users and items in layer l, and $H^{0} = [E_u; E_i]$. A is the adjacency matrix and D is the diagonal degree matrix of A. $W^{l}$ is the trainable weight matrix, $\sigma$ is a non-linear activation function, and L is the number of layers. The final output $H^{L} = [\bar{u}; \bar{i}]$ contains the static user embeddings and the static item embeddings.
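The second-order aggregation of Equation (4) can be sketched in pure Python on a tiny bipartite graph. ReLU is used as the activation here only as an assumption (the text says "a non-linear activation function"), and the helper names are illustrative.

```python
import math

def matmul(X, Y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def second_order_gcn_layer(A, H, W):
    """One layer of Eq. (4): sigma(D^{-1/2} A^2 D^{-1/2} H W), sigma = ReLU (assumed)."""
    n = len(A)
    deg = [sum(row) for row in A]  # D is the degree matrix of A
    d_inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    A2 = matmul(A, A)  # two-hop connectivity
    norm = [[d_inv_sqrt[r] * A2[r][c] * d_inv_sqrt[c] for c in range(n)]
            for r in range(n)]
    out = matmul(matmul(norm, H), W)
    return [[max(0.0, v) for v in row] for row in out]

# Nodes 0 and 1 are users, node 2 is an item; both users interact with the item.
A = [[0, 0, 1],
     [0, 0, 1],
     [1, 1, 0]]
H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot features
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity weights
out = second_order_gcn_layer(A, H, W)
```

Because the graph is bipartite, every nonzero entry of $A^2$ links two nodes of the same type, so in this toy example the two users exchange features through their common item while no user row receives item features.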

Finally, the static user embedding $\bar{u}$ and the static item embedding $\bar{i}$ are each concatenated with the user–item interaction feature vector f to generate the auxiliary information $o_u$ and $o_i$, respectively, for learning the dynamic user and item embeddings.

4.2. Dynamical Embedding Generation

In this section, we introduce how to generate the dynamic embeddings of users and items based on the static embeddings learned above.

Recursive RNN. As illustrated in the green box of Figure 2, there are two recursive RNN models. One is the UserRNN and the other is ItemRNN. The two components are designed to generate user and item dynamic embeddings according to their historical interaction, and they are shared by users and items to mutually learn dynamic embeddings of users and items. A user/item RNN is composed of RNN layers and the hidden states of the user/item RNN are used to represent user/item embeddings.

In the recursive RNN module, the user and item embeddings are updated by the user RNN module and the item RNN module, respectively, when an interaction between a user and an item occurs. To be specific, the user embedding $u_t$ is updated based on the user embedding $u_{t-1}$, the item embedding $i_{t-1}$ and the auxiliary feature vector $o_u$ at the previous timestamp $t-1$. The item embedding is updated by the item RNN in a similar way. In this way, the user and item embeddings are able to absorb the information hidden in the interaction data. The user and item embeddings are also encoded into the same latent space. Note that the embeddings of users and items evolve with the dynamic user–item interactions. There are two advantages in the designed recursive RNN model. First, users and items are embedded into the same latent space, and thus the similarity between users and items can be easily obtained by measuring their distance in this embedding space. For example, if users A and B interacted with items C and D, respectively, and item C is similar to item D, we consider users A and B to be similar as well. Based on this idea, the embeddings of users and items can be updated iteratively by the following formula.

$$\begin{aligned} u_t &= \sigma\left(W_1^u u_{t-1} + W_2^u i_{t-1} + W_3^u o_u\right) \\ i_t &= \sigma\left(W_1^i u_{t-1} + W_2^i i_{t-1} + W_3^i o_i\right) \end{aligned} \qquad (5)$$

In Equation (5), $u_t$ and $i_t$ are the dynamic embeddings of user u and item i at time t, $o_u$ and $o_i$ are the auxiliary features, and $\sigma$ is a sigmoid function. $W_1^u, \ldots, W_3^u$ are the parameter matrices of the user RNN, and $W_1^i, \ldots, W_3^i$ are the parameter matrices of the item RNN.
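The mutual update of Equation (5) can be written as a short pure-Python sketch. The function names, the tiny dimensions, and the zero-initialised weights are placeholders chosen for illustration, not values from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(W1, W2, W3, u_prev, i_prev, o):
    """sigma(W1 u_{t-1} + W2 i_{t-1} + W3 o), one line of Eq. (5)."""
    a, b, c = matvec(W1, u_prev), matvec(W2, i_prev), matvec(W3, o)
    return [sigmoid(x + y + z) for x, y, z in zip(a, b, c)]

def mutual_update(user_params, item_params, u_prev, i_prev, o_u, o_i):
    """Update both embeddings from the same previous (user, item) pair."""
    u_t = rnn_step(*user_params, u_prev, i_prev, o_u)
    i_t = rnn_step(*item_params, u_prev, i_prev, o_i)
    return u_t, i_t

# Embedding size 2, auxiliary feature size 3 (illustrative sizes only).
zeros_2x2 = [[0.0, 0.0], [0.0, 0.0]]
zeros_2x3 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
user_params = (zeros_2x2, zeros_2x2, zeros_2x3)
item_params = (zeros_2x2, zeros_2x2, zeros_2x3)
u_t, i_t = mutual_update(user_params, item_params,
                         [0.3, -0.2], [0.1, 0.4],
                         [1.0, 0.0, 0.5], [0.0, 1.0, 0.5])
```

Both updates read the previous user and item embeddings, which is what couples the two trajectories; with zero weights the sigmoid returns 0.5 in every coordinate, a quick sanity check on the wiring.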

Shift embedding. In order to predict the embedding of the user in the future, we design a shift embedding module that works as an embedding projection operation. The predicted embedding can then be used for downstream tasks, such as link prediction or recommendation. Existing works ignore the importance of temporal information and are not capable of continuously updating the embeddings over time; the embeddings are only updated discretely when new interactions occur. However, we argue that users and items evolve continuously over time, for example through changes in the interests of users or the attributes of items. We therefore assume that the user and item embeddings change smoothly and continuously over time even when there is no new interaction between a user and an item.

The part in the box of Figure 2 shows the shift embedding module, which captures the temporal dynamics of the user embedding by considering the elapsed time. If the time gap between two successive interactions of one user is large, the user embedding learned from the previous interaction is not appropriate for predicting the current one, so the user embedding should be continuously adjusted over the time interval between two successive interactions with items. In other words, recent interactions have a greater impact on the user embedding, while interactions from long ago have a weaker impact. Inspired by previous work, we note that the feedback loop of an RNN keeps the previous information of hidden states as internal memory. First, we use a linear layer to obtain the internal memory that needs to be adjusted, $u_t^S$. Then it is adjusted by the elapsed time to give $\tilde{u}_t^S$. Finally, to compose the shifted embedding, the adjusted internal memory is combined with the original user embedding ($\hat{u}_{t+\Delta} = u_t + \tilde{u}_t^S$). Details of the shift embedding module are given below:

$$\begin{aligned} u_t^S &= \sigma\left(W_s u_t + b\right) \\ \tilde{u}_t^S &= u_t^S * g(\Delta) \\ \hat{u}_{t+\Delta} &= u_t + \tilde{u}_t^S \end{aligned} \qquad (6)$$

where $\Delta$ denotes the time interval since the previous user–item interaction, $W_s$ is the parameter matrix of the linear layer and b is the bias. The function $g(\Delta) = W_p \cdot \log(e + \Delta)$ is used to convert $\Delta$ to a time-context vector, and $W_p$ is a trainable parameter. $\hat{u}_{t+\Delta}$ is the predicted user embedding at time $t + \Delta$.
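The three steps of Equation (6) can be sketched as follows. Two points are assumptions made for illustration: $W_p$ is treated as a vector with one entry per embedding dimension, and the `*` in Equation (6) is read as an element-wise product. The function and argument names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def shift_embedding(u_t, W_s, b, W_p, delta):
    """Project the user embedding u_t forward by an elapsed time delta (Eq. 6)."""
    # Internal memory to be adjusted: u^S = sigma(W_s u_t + b)
    u_s = [sigmoid(sum(w * x for w, x in zip(row, u_t)) + bias)
           for row, bias in zip(W_s, b)]
    # Time-context vector: g(delta) = W_p * log(e + delta)
    g = [w * math.log(math.e + delta) for w in W_p]
    # Element-wise adjustment by elapsed time (assumed reading of '*')
    u_tilde = [s * gk for s, gk in zip(u_s, g)]
    # Shifted embedding: u_hat = u_t + u_tilde
    return [u + s for u, s in zip(u_t, u_tilde)]

u_t = [0.2, 0.8]
W_s = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
no_shift = shift_embedding(u_t, W_s, b, W_p=[0.0, 0.0], delta=10.0)
unit_shift = shift_embedding(u_t, W_s, b, W_p=[1.0, 1.0], delta=0.0)
```

Since $\log(e + \Delta)$ equals 1 at $\Delta = 0$ and grows slowly with the gap, the size of the shift in this sketch is governed by $W_p$ and increases only logarithmically with elapsed time.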

4.3. Overall Objective Function

In the final prediction step, we aim to predict the embedding $\tilde{j}_t$ of the item that a user will interact with. Most existing models search for the highest interaction probability among all user–item pairs and therefore need to run the neural-network forward pass once for each item, which is very time-consuming. Instead of searching for the highest interaction probability over all user–item pairs, we directly output the item embedding vector. Thus, our model only needs to run the forward pass once to predict the item embedding from the output of the shift embedding module, and then the item whose embedding vector is closest to the predicted one in the embedding space is selected. This makes our model much more efficient than existing models. The item embedding prediction function is given as follows.

$$\tilde{j}_{t+\Delta} = W_1 \hat{u}_{t+\Delta} + W_2 \bar{u} + W_3 i_{t+\Delta-1} + W_4 \bar{i} + b \qquad (7)$$

where $W_1, \dots, W_4$ are trainable parameters and b is a bias vector, and $\bar{u}$ and $\bar{i}$ represent the static embeddings of users and items, respectively. $\hat{u}_{t+\Delta}$ is the output of the shift embedding module. Note that $i_{t+\Delta-1}$ is the dynamic item embedding before $t + \Delta$.
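A minimal sketch of Equation (7) followed by the nearest-neighbor lookup is given below. The candidate table `item_embeddings`, the use of Euclidean distance for "closest", and all names are assumptions for illustration; the text does not specify which item representation is searched or how the search is implemented.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def predict_item_embedding(W1, W2, W3, W4, b, u_hat, u_static, i_dyn, i_static):
    """Eq. (7): linear combination of the four inputs plus a bias."""
    parts = [matvec(W1, u_hat), matvec(W2, u_static),
             matvec(W3, i_dyn), matvec(W4, i_static)]
    return [sum(vals) + bk for vals, bk in zip(zip(*parts), b)]

def nearest_item(pred, item_embeddings):
    """Return the item whose embedding has the smallest L2 distance to pred."""
    return min(item_embeddings,
               key=lambda name: math.dist(pred, item_embeddings[name]))

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
pred = predict_item_embedding(identity, zeros, zeros, zeros, [0.0, 0.0],
                              u_hat=[1.0, 0.0], u_static=[0.3, 0.3],
                              i_dyn=[0.5, 0.5], i_static=[0.1, 0.9])
item_embeddings = {"earphone": [0.9, 0.1], "suit": [-1.0, 0.5]}
chosen = nearest_item(pred, item_embeddings)
```

The forward computation is done once per prediction; only the distance lookup touches every candidate item, which is the efficiency argument made in the text.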

To train the parameters of the model, we minimize the $L_2$ difference between the predicted item embedding $\tilde{j}_t$ and the ground-truth item embedding $j_t$ at every interaction. We aim to minimize the following loss.

$$\text{minimize} \sum_{(u, j, t, f) \in \mathcal{S}} \left\| \tilde{j}_t - j_t \right\|_2 + \lambda_U \left\| u_t - u_{t-1} \right\|_2 + \lambda_I \left\| i_t - i_{t-1} \right\|_2 \qquad (8)$$

The first loss term is the error of the predicted embedding vector. The last two terms are embedding smoothness regularizers that prevent the embeddings of users and items from changing sharply, and $\lambda_U$ and $\lambda_I$ are scaling parameters.
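The per-interaction terms of Equation (8) can be computed as in the sketch below, which reads $\|\cdot\|_2$ literally as the (unsquared) Euclidean norm. The record layout and function names are illustrative assumptions.

```python
import math

def l2(a, b):
    """Euclidean norm of the difference between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def interaction_loss(j_pred, j_true, u_t, u_prev, i_t, i_prev, lam_u, lam_i):
    """Prediction error plus the two smoothness regularizers of Eq. (8)."""
    return (l2(j_pred, j_true)
            + lam_u * l2(u_t, u_prev)
            + lam_i * l2(i_t, i_prev))

def total_loss(records, lam_u, lam_i):
    """Sum the per-interaction loss over all records."""
    return sum(interaction_loss(*r, lam_u, lam_i) for r in records)

record = ([3.0, 4.0], [0.0, 0.0],   # predicted and true item embedding
          [1.0, 0.0], [0.0, 0.0],   # user embedding at t and t-1
          [0.0, 0.0], [0.0, 0.0])   # item embedding at t and t-1
loss = total_loss([record], lam_u=0.5, lam_i=1.0)
```

In this toy record the prediction error contributes 5.0 and the user smoothness term contributes 0.5, so larger values of $\lambda_U$ and $\lambda_I$ penalize abrupt embedding changes more strongly.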

5. Experiment

In this section, we evaluate our model on three real-world datasets: Wikipedia edits, Reddit posts, and JingDong online business. We will first introduce the datasets, the baselines, and the experiment setup, and then discuss the experiment results.

5.1. Dataset

The Wikipedia dataset, the Reddit dataset, and the JingDong dataset are used for evaluation, and Table 2 presents the details of the three datasets.

Wikipedia editing dataset: This dataset contains one month of edits made on Wikipedia pages. We select the 1000 most-edited pages as items and the editors who made at least 5 edits as users, which yields 8227 users. In total, there are 157,474 interactions between the selected users and pages, and the edited text is used as features.
Reddit post dataset: This dataset includes one month of posts made by users on subreddits. We select the 1000 most active subreddits as items and the 10,000 most active users as users. There are 672,447 interactions, and the text of each post is converted to a feature vector.
JingDong dataset: This dataset is extracted from JD.com and contains records of users’ online behaviors on the JingDong website. It covers 1,198,735 interactions between 10,692 users and 303,150 items from March 2020 to April 2020.
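The selection rule described for the Wikipedia dataset (most-interacted items first, then users with a minimum number of interactions) can be sketched as below. The function `filter_interactions`, the tuple layout `(user, item, timestamp)`, and the order of the two filtering steps are assumptions for illustration; the source does not specify the exact preprocessing code.

```python
from collections import Counter

def filter_interactions(interactions, top_items=1000, min_user_events=5):
    """Keep interactions on the most frequent items by sufficiently active users.

    `interactions` is a list of (user, item, timestamp) tuples.
    """
    item_counts = Counter(item for _, item, _ in interactions)
    kept_items = {item for item, _ in item_counts.most_common(top_items)}
    on_kept = [x for x in interactions if x[1] in kept_items]
    # Count user activity only on the retained items.
    user_counts = Counter(user for user, _, _ in on_kept)
    return [x for x in on_kept if user_counts[x[0]] >= min_user_events]

toy = [("a", "p", 1), ("a", "p", 2), ("a", "p", 3), ("b", "q", 4), ("b", "p", 5)]
print(filter_interactions(toy, top_items=1, min_user_events=2))
```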

5.2. Baselines and Evaluation Metrics

We compare the proposed method with the following baseline models.

LSTM [45] is a widely used RNN architecture. Here we feed it only the sequence of items, dropping the time information.
Time-LSTM [45] is an LSTM variant that equips LSTM with time gates to model the time intervals.
Jodie [36] is a coupled recurrent neural network model that learns the dynamic embeddings of users and items. Here we omit the one-hot item embedding in Jodie because it does not scale to a large number of items.
NGCF [28] is a recommendation framework based on a graph neural network, which explicitly encodes the collaborative signal in user–item bipartite graph by performing embedding propagation.
LightGCN [31] is a state-of-the-art collaborative filtering based method. It simplifies the design of GCN to make it more concise and appropriate for recommendation.
RRNN-S [1] is a recent state-of-the-art recursive RNN based shift embedding model for predicting dynamic user–item interaction.

We use 80% of the data for training, 10% for validation, and the remaining 10% for testing. We adopt mean reciprocal rank (MRR) and Recall@K as the evaluation metrics. MRR is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by the probability of correctness. It is the average of the reciprocal ranks of the results for a sample of queries $Q$:

$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i},$$

where $\mathrm{rank}_i$ refers to the rank position of the first relevant document for the $i$-th query. Recall@K measures the fraction of the total amount of relevant instances that are actually retrieved within the top K results.
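Both metrics can be computed from the rank of the ground-truth item in each test interaction. The sketch below assumes that candidate items are ranked by L2 distance to the predicted embedding and that each query has exactly one relevant item; the helper names are hypothetical.

```python
import numpy as np

def rank_of_true_item(pred_embedding, item_embeddings, true_item):
    """1-based rank of the true item when items are sorted by L2 distance."""
    dists = np.linalg.norm(item_embeddings - pred_embedding, axis=1)
    # Only items strictly closer than the true item are ranked ahead of it;
    # items tied with it are not counted against it.
    return int(np.sum(dists < dists[true_item])) + 1

def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all queries."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks))

def recall_at_k(ranks, k=10):
    """With one relevant item per query, Recall@K is the share of queries
    whose true item appears within the top K."""
    return float(np.mean(np.asarray(ranks) <= k))

ranks = [1, 2, 4, 20]
print(mean_reciprocal_rank(ranks))  # (1 + 1/2 + 1/4 + 1/20) / 4, about 0.45
print(recall_at_k(ranks, k=10))     # 3 of 4 queries hit the top 10: 0.75
```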

The embedding dimension is set to 128, the learning rate is 0.001, and the model is trained with the Adam optimizer with a weight decay of 0.00001. The loss curves of the training process for the two datasets are shown in Figure 5. The model converges quickly: the training loss drops sharply and becomes stable within around 10 epochs.

5.3. Results

5.3.1. Comparison with Baselines

Table 3 shows the results of our model and the baseline models. We observe that MRRNN-S outperforms the baselines in most cases on the three datasets. It is also worth noting that our model performs significantly better than the other models on the JingDong dataset, which is the largest of the three datasets. The best results are highlighted with bold font, and the best results achieved by baselines are underlined.

Among the baselines, LSTM uses only the order of interactions in the item sequence and does not take the time interval between two successive interactions into consideration. As a variant of LSTM, Time-LSTM incorporates the time interval information into sequence learning. Thus, Time-LSTM outperforms LSTM by 12.7% on the Wikipedia dataset and 12.2% on the JingDong dataset. In contrast to LSTM and Time-LSTM, NGCF and LightGCN are collaborative filtering-based methods, which are not able to capture the time information in the temporal user–item interaction network. The two methods are therefore not well suited to dynamic user–item interaction prediction, and their performance is the worst among all the methods.

Jodie considers the dynamic and sequential dependence between user–item interactions, which means richer auxiliary information can be absorbed into the model for a more accurate prediction. RRNN-S is able to capture the user–user and item–item relationships, which helps it learn the similarity between nodes from high-order proximity. It improves the performance by 4.2%, 2.1%, and 9.2% on the three datasets compared with Jodie, respectively. MRRNN-S not only obtains sequence patterns from sequence data but also explores semantic and structural information from heterogeneous graph data. It can thus acquire more detailed auxiliary information, which results in MRRNN-S outperforming RRNN-S. As the size of the dataset (the number of interactions) increases, the performance improvement of MRRNN-S becomes more pronounced, especially on the JingDong dataset.
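As a worked check of how such relative improvements can be read off Table 3, the Time-LSTM versus LSTM figures are recomputed here. We assume that the quoted percentages refer to Recall@10, since those are the table entries that match them; the text does not state the metric explicitly.

```latex
\[
\frac{0.452 - 0.401}{0.401} \approx 12.7\% \ \text{(Wikipedia)}, \qquad
\frac{0.064 - 0.057}{0.057} \approx 12.3\% \ \text{(JingDong)}
\]
```

The JingDong value agrees with the quoted 12.2% up to rounding.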

5.3.2. Ablation Study

To investigate whether the components in our proposed model are all useful, we further compare MRRNN-S with the following four variants.

MRRNN-1 drops the meta-path module, which acquires the heterogeneity and semantic information from the user–item interaction graph. Only the embedding learned by the word2vec module is fed to the prediction model.
MRRNN-2 drops the word2vec module, which captures the sequential information from the interaction sequence. Only the feature vectors extracted by the meta-path module are fed to the remaining part of MRRNN-S.
MRRNN-3 drops the GCN module, which is designed to capture the structural information from the user–item interaction graph. Only the feature vectors processed by the meta-path module and the word2vec module are input into the model.
MRRNN-4 drops the meta-path, word2vec, and GCN modules at the same time, and randomly initializes the embeddings from a normal distribution as the model input.

As shown in Table 4, the word2vec module, the meta-path module, and the GCN module are all useful, because the performance decreases when any one of them is removed.

To show the influence of each module on MRRNN-S more intuitively, Figure 6 presents a histogram of each variant’s performance as a percentage of the full model’s performance on the two datasets. From Figure 6, one can observe that the sequential information hidden in the interaction sequence appears to be more important on both datasets, because ignoring it results in a remarkable performance decline in terms of MRR and Recall@10. One possible reason is that the datasets are mainly composed of sequence data, while the graph structure is relatively simple; in particular, there are only two types of nodes that can be utilized. In addition, the performance of MRRNN-2 differs considerably between the two datasets, with the performance on the Wikipedia dataset significantly better than that on the JingDong dataset. This is probably because of the mismatch between the large amount of data and the small number of node types in the JingDong dataset, which makes it more difficult for the model to acquire useful auxiliary information from the sparse heterogeneity. Overall, MRRNN-S achieves the best performance on both the Wikipedia and JingDong datasets when the three modules are combined, which verifies that the three proposed modules are all helpful for improving the model performance.
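The percentages plotted in Figure 6 can be recomputed from Table 4 by dividing each variant's score by the score of the full model. The short sketch below does this; the scores are copied from Table 4, while the dictionary layout and the function name `relative_to_full` are our own choices for illustration.

```python
# Scores copied from Table 4.
table4 = {
    ("Wikipedia", "MRR"): {
        "MRRNN-1": 0.753, "MRRNN-2": 0.483, "MRRNN-3": 0.581, "MRRNN-4": 0.449, "MRRNN-S": 0.779},
    ("Wikipedia", "Recall@10"): {
        "MRRNN-1": 0.782, "MRRNN-2": 0.610, "MRRNN-3": 0.692, "MRRNN-4": 0.583, "MRRNN-S": 0.801},
    ("JingDong", "MRR"): {
        "MRRNN-1": 0.19, "MRRNN-2": 0.009, "MRRNN-3": 0.121, "MRRNN-4": 0.003, "MRRNN-S": 0.264},
    ("JingDong", "Recall@10"): {
        "MRRNN-1": 0.261, "MRRNN-2": 0.012, "MRRNN-3": 0.220, "MRRNN-4": 0.005, "MRRNN-S": 0.343},
}

def relative_to_full(scores, full="MRRNN-S"):
    """Each variant's score as a percentage of the full model's score."""
    return {name: 100.0 * value / scores[full]
            for name, value in scores.items() if name != full}

for (dataset, metric), scores in table4.items():
    percentages = {name: round(p, 1) for name, p in relative_to_full(scores).items()}
    print(dataset, metric, percentages)
```

On these numbers, removing the word2vec module (MRRNN-2) keeps well over half of the full model's score on Wikipedia but only a few percent of it on JingDong, which is the gap discussed above.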

5.3.3. Parameters Sensitivity Analysis

In this section, to analyze the impact of the parameters on model performance, we study the sensitivity of our model to the embedding dimension and the parameters $\lambda_I$ and $\lambda_U$.

Figure 7 depicts the MRR under different embedding dimensions on the JingDong dataset. The best performance is achieved when the dimension is set to 64. From the trend of the performance curve, one can see that the performance of the model first improves as the embedding dimension increases, which may be because a higher-dimensional embedding is able to contain more information. However, when the dimension continues to increase beyond this point, the performance decreases, which may be because too high a dimension leads to overfitting.

Figure 8 shows the influence of the parameters $\lambda_I$ and $\lambda_U$ on Recall@K; the experiment is conducted on the Wikipedia dataset. In Figure 8a, $\lambda_I$ varies from 0 to 1 and we fix $\lambda_U = 1$. One can see that our proposed model MRRNN-S achieves the best performance across all Recall@K when $\lambda_I = 0.2$. Next, we further study the effect of the parameter $\lambda_U$ on model performance when $\lambda_I = 0.2$. The result is presented in Figure 8b. It shows that MRRNN-S achieves the best performance when $\lambda_U = 1$.
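The procedure above varies one regularization weight at a time while the other is held fixed. A minimal sketch of such a one-parameter-at-a-time sweep follows. The function `toy_evaluate` is a hypothetical stand-in for "train the model and return validation Recall@K", constructed so that its optimum matches the reported values, and the grid step of 0.2 is an assumption, since the source only gives the range 0 to 1.

```python
def sweep(evaluate, name, values, fixed):
    """Score every candidate value of one parameter with the others fixed."""
    scores = {v: evaluate(**{**fixed, name: v}) for v in values}
    best = max(scores, key=scores.get)
    return best, scores

def toy_evaluate(lam_i, lam_u):
    # Hypothetical surrogate score, NOT a trained model.
    return 1.0 - (lam_i - 0.2) ** 2 - 0.1 * (lam_u - 1.0) ** 2

grid = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
# Step 1: vary lambda_I with lambda_U fixed to 1.
best_i, _ = sweep(toy_evaluate, "lam_i", grid, fixed={"lam_u": 1.0})
# Step 2: vary lambda_U with lambda_I fixed to the best value from step 1.
best_u, _ = sweep(toy_evaluate, "lam_u", grid, fixed={"lam_i": best_i})
print(best_i, best_u)
```

This kind of sweep is cheaper than a full grid over both parameters, at the cost of possibly missing interactions between them.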

Furthermore, to study the robustness of our proposed model MRRNN-S, we vary the percentage of training data in the prediction task on the Wikipedia dataset. Specifically, with the other parameters fixed, the proportion of training data is varied from 10% to 80%. In each case, the 10% of interactions following the training data are used for validation, and the next 10% are used for testing, so that performance is compared under the same scale of testing data. Figure 9 shows the change in MRR and Recall@10 on the Wikipedia dataset as the proportion of training data increases. One can see that the performance curves fluctuate around 0.8 throughout, which means the performance of our model is stable even when the proportion of training data changes.
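The robustness experiment relies on splitting the interactions into consecutive blocks. The sketch below assumes the split is chronological, with validation and test blocks of fixed size placed directly after the training block; the function name and the tuple layout `(user, item, timestamp)` are hypothetical.

```python
def chronological_split(interactions, train_frac=0.8, val_frac=0.1, test_frac=0.1):
    """Split time-ordered interactions into consecutive train/val/test blocks."""
    if train_frac + val_frac + test_frac > 1.0 + 1e-9:
        raise ValueError("fractions must sum to at most 1")
    ordered = sorted(interactions, key=lambda x: x[2])  # sort by timestamp
    n = len(ordered)
    n_train = int(round(n * train_frac))
    n_val = int(round(n * val_frac))
    n_test = int(round(n * test_frac))
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# Toy data: 100 interactions with increasing timestamps.
data = [(k % 7, k % 5, k) for k in range(100)]
for frac in (0.1, 0.4, 0.8):
    train, val, test = chronological_split(data, train_frac=frac)
    print(frac, len(train), len(val), len(test))
```

Keeping the validation and test blocks at 10% each, regardless of `train_frac`, is what makes the results at different training proportions comparable.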

6. Conclusions

In this paper, we proposed a novel Metapath-guided Recursive RNN based Shift embedding method named MRRNN-S for predicting which item a user will interact with in the future. To capture the sequential information from the user–item interaction sequence data, we project each user and item into a latent space with a word2vec module. The heterogeneity hidden in the user–item interaction graph is also extracted by a meta-path module to provide richer auxiliary information. A recursive RNN is utilized to learn the user and item embeddings by considering both dynamic and static features. Additionally, we designed a shift embedding module that incorporates the time interval information for predicting future user embeddings. Experimental results on three real-world datasets demonstrated the effectiveness of our model. Compared with the results from our previous paper [1], MRRNN-S achieves superior performance on a large dataset, which verifies the effectiveness of adding the two new modules to obtain more latent information from the sequence data and graph data, respectively. Thus, better feature representations of users and items are obtained by MRRNN-S. In the future, it would be interesting to further study whether the current framework for data modeling can be extended to other kinds of applications.

Author Contributions

Conceptualization, Y.L. and C.Y.; data curation, C.Y.; formal analysis, Y.L.; methodology, Y.L. and C.Y.; supervision, F.W., J.L. and S.W.; validation, Y.L.; writing—original draft, Y.L.; writing—review and editing, F.W., J.L. and S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangdong Basic and Applied Basic Research Foundation (No. 2021A1515012239), the General Project of Science and Technology Plan of Beijing Municipal Education Commission (No. KM202010017011), the Program of Beijing Excellent Talents Training for Young Scholar (No. 2018000020124G089), and the National Natural Science Foundation of China (No. 42104175).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: http://snap.stanford.edu/jodie/, accessed on 22 January 2022. The JingDong dataset presented in this study is available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yin, C.; Wang, S.; Du, J.; Zhang, M. Recursive RNN Based Shift Representation Learning for Dynamic User-Item Interaction Prediction. In Proceedings of the International Conference on Advanced Data Mining and Applications, Foshan, China, 12–14 November 2020; pp. 379–394.
2. Hosseini, M.; Maida, A.S.; Hosseini, M.; Raju, G. Inception lstm for next-frame video prediction (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 13809–13810.
3. Barman, P.P.; Boruah, A. A RNN based Approach for next word prediction in Assamese Phonetic Transcription. Procedia Comput. Sci. 2018, 143, 117–123.
4. Wang, H.; Zhang, F.; Zhang, M.; Leskovec, J.; Zhao, M.; Li, W.; Wang, Z. Knowledge-aware graph neural networks with label smoothness regularization for recommender systems. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 968–977.
5. Huang, C.; Wu, X.; Zhang, X.; Zhang, C.; Zhao, J.; Yin, D.; Chawla, N.V. Online purchase prediction via multi-scale modeling of behavior dynamics. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2613–2622.
6. Cai, X.; Han, J.; Yang, L. Generative adversarial network based heterogeneous bibliographic network representation for personalized citation recommendation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 5747–5754.
7. Lei, C.; Liu, D.; Li, W.; Zha, Z.-J.; Li, H. Comparative deep learning of hybrid representations for image recommendations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2545–2553.
8. Soh, H.; Sanner, S.; White, M.; Jamieson, G. Deep sequential recommendation for personalized adaptive user interfaces. In Proceedings of the 22nd International Conference on Intelligent User Interfaces, Limassol, Cyprus, 13–16 March 2017; pp. 589–593.
9. Zhang, Q.; Wang, J.; Huang, H.; Huang, X.; Gong, Y. Hashtag Recommendation for Multimodal Microblog Using Co-Attention Network. In Proceedings of the IJCAI, Melbourne, Australia, 19–25 August 2017; pp. 3420–3426.
10. Lv, F.; Jin, T.; Yu, C.; Sun, F.; Lin, Q.; Yang, K.; Ng, W. SDM: Sequential deep matching model for online large-scale recommender system. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2635–2643.
11. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based recommendations with recurrent neural networks. arXiv, 2015; arXiv:1511.06939.
12. Wang, H.; Zhang, F.; Xie, X.; Guo, M. DKN: Deep knowledge-aware network for news recommendation. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1835–1844.
13. Sedhain, S.; Menon, A.K.; Sanner, S.; Xie, L. Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 111–112.
14. Dong, X.; Yu, L.; Wu, Z.; Sun, Y.; Yuan, L.; Zhang, F. A hybrid collaborative filtering model with deep structure for recommender systems. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 1309–1315.
15. Yin, C.; Wang, S.; Miao, H. Recursive LSTM with shift embedding for online user-item interaction prediction. In Proceedings of the 2020 IEEE 13th International Conference on Cloud Computing (CLOUD), Beijing, China, 19–23 October 2020; pp. 10–12.
16. Wang, S.; Hu, L.; Wang, Y.; Cao, L.; Sheng, Q.Z.; Orgun, M. Sequential recommender systems: Challenges, progress and prospects. arXiv, 2019; arXiv:2001.04830.
17. Bellogín, A.; Parapar, J. Using graph partitioning techniques for neighbour selection in user-based collaborative filtering. In Proceedings of the Sixth ACM Conference on Recommender Systems, Dublin, Ireland, 9–13 September 2012; pp. 213–216.
18. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 285–295.
19. Loepp, B.; Ziegler, J. Towards interactive recommending in model-based collaborative filtering systems. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; pp. 546–547.
20. Linden, G.; Smith, B.; York, J. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Comput. 2003, 7, 76–80.
21. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.-S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182.
22. Zhang, S.; Yao, L.; Sun, A.; Tay, Y. Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv. CSUR 2019, 52, 1–38.
23. Cheng, H.-T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 7–10.
24. Kim, D.; Park, C.; Oh, J.; Lee, S.; Yu, H. Convolutional matrix factorization for document context-aware recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 233–240.
25. Ebesu, T.; Fang, Y. Neural citation network for context-aware citation recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, 7–11 August 2017; pp. 1093–1096.
26. Zhang, F.; Yuan, N.J.; Lian, D.; Xie, X.; Ma, W.-Y. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 353–362.
27. Xiang, L.; Yuan, Q.; Zhao, S.; Chen, L.; Zhang, X.; Yang, Q.; Sun, J. Temporal recommendation on graphs via long-and short-term preference fusion. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; pp. 723–732.
28. Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.-S. Neural graph collaborative filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 165–174.
29. Zhou, H.; Tan, Q.; Huang, X.; Zhou, K.; Wang, X. Temporal Augmented Graph Neural Networks for Session-Based Recommendations. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 11–15 July 2021; pp. 1798–1802.
30. Sun, J.; Guo, W.; Zhang, D.; Zhang, Y.; Regol, F.; Hu, Y.; Guo, H.; Tang, R.; Yuan, H.; He, X. A framework for recommending accurate and diverse items using bayesian graph convolutional neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 6–10 July 2020; pp. 2030–2039.
31. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 25–30 July 2020; pp. 639–648.
32. Fan, Z.; Liu, Z.; Zhang, J.; Xiong, Y.; Zheng, L.; Yu, P.S. Continuous-time sequential recommendation with temporal graph collaborative transformer. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, Australia, 1–5 November 2021; pp. 433–442.
33. Jiang, X.; Lu, Y.; Fang, Y.; Shi, C. Contrastive Pre-Training of GNNs on Heterogeneous Graphs. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, Australia, 1–5 November 2021; pp. 803–812.
34. Zheng, J.; Ma, Q.; Gu, H.; Zheng, Z. Multi-view Denoising Graph Auto-Encoders on Heterogeneous Information Networks for Cold-start Recommendation. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual Event, Singapore, 14–18 August 2021; pp. 2338–2348.
35. Zhou, L.; Yang, Y.; Ren, X.; Wu, F.; Zhuang, Y. Dynamic network embedding by modeling triadic closure process. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 571–578.
36. Kumar, S.; Zhang, X.; Leskovec, J. Predicting dynamic embedding trajectory in temporal interaction networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1269–1278.
37. Li, J.; Dani, H.; Hu, X.; Tang, J.; Chang, Y.; Liu, H. Attributed network embedding for learning in a dynamic environment. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 387–396.
38. Zhu, D.; Cui, P.; Zhang, Z.; Pei, J.; Zhu, W. High-order proximity preserved embedding for dynamic networks. IEEE Trans. Knowl. Data Eng. 2018, 30, 2134–2144.
39. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
40. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv, 2018; arXiv:1810.04805.
41. Nguyen, G.H.; Lee, J.B.; Rossi, R.A.; Ahmed, N.K.; Koh, E.; Kim, S. Continuous-time dynamic network embeddings. In Proceedings of the Companion Proceedings of the The Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 969–976.
42. Xu, D.; Ruan, C.; Korpeoglu, E.; Kumar, S.; Achan, K. Inductive representation learning on temporal graphs. arXiv, 2020; arXiv:2002.07962.
43. Ye, W.; Wang, S.; Chen, X.; Wang, X.; Qin, Z.; Yin, D. Time matters: Sequential recommendation with complex temporal information. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 25–30 July 2020; pp. 1459–1468.
44. Dong, Y.; Chawla, N.V.; Swami, A. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 135–144.
45. Zhu, Y.; Li, H.; Liao, Y.; Wang, B.; Guan, Z.; Liu, H.; Cai, D. What to Do Next: Modeling User Behaviors by Time-LSTM. In Proceedings of the IJCAI, Melbourne, Australia, 19–25 August 2017; pp. 3602–3608.

Figure 1. (a) Illustration of two user interaction sequences: Bob buys a cell phone, a phone case, a phone film, and a mobile earphone; Alice buys a suit, a dress, a shoe, and a hat, successively. (b) A toy example of an interaction network containing three users and four items. Each arrow represents an interaction from a user to an item. Each interaction is associated with a timestamp t and a feature vector f (such as the feature of the commodity).

Figure 2. The framework of the proposed MRRNN-S.

Figure 3. An illustration of the interaction sequence of a user with items.

Figure 4. (a) An example of user–item interaction graph. (b) An example of metapath.

Figure 5. The training loss curves of MRRNN-S over the three datasets.

Figure 6. The percentage of each variant’s performance to the performance of the full model on the two datasets.

Figure 7. The impact of different embedding dimensions on the JingDong dataset.

Figure 8. (a) Recall@K with different $\lambda_I$ when $\lambda_U = 1$. (b) Recall@K with different $\lambda_U$ when $\lambda_I = 0.2$.

Figure 9. Percentage of training data.

Table 1. Symbols and descriptions.

Symbols	Description
$S$	the set of user–item interactions
$U$ and $I$	the sets of users and items
$u_t$ and $i_t$	the dynamic embeddings of user u and item i at timestamp t
$\bar{u}$ and $\bar{i}$	the static embeddings of user u and item i
$E_u$ and $E_i$	the static embedding matrices of users and items
$\hat{u}_{t+\Delta}$	the predicted embedding of user u at time t + Δ
$\hat{i}_{t+\Delta}$	the predicted embedding of item i at time t + Δ

Table 2. Statistics of the three datasets.

Data	#Users	#Items	#Interactions
Wikipedia	8227	1000	157,474
Reddit	10,000	1000	672,447
JingDong	10,692	303,150	1,198,735

Table 3. Overall performance w.r.t. Recall@10 and MRR. “*” indicates the improvement of the MRRNN-S over the baseline is significant at the level of 0.05.

Datasets	Metric	LSTM	Time-LSTM	Jodie	NGCF	LightGCN	RRNN-S	MRRNN-S
Wikipedia	MRR	0.332 *	0.351 *	0.741 *	-	-	0.756 *	0.779 *
	Recall@10	0.401 *	0.452 *	0.803 *	0.198 *	0.248 *	0.837 *	0.801 *
Reddit	MRR	0.343 *	0.355 *	0.718 *	-	-	0.733 *	0.766 *
	Recall@10	0.530 *	0.575 *	0.832 *	0.254 *	0.287 *	0.842 *	0.856 *
JingDong	MRR	0.039 *	0.047 *	0.080	-	-	0.089 *	0.264 *
	Recall@10	0.057 *	0.064 *	0.131 *	0.010 *	0.013 *	0.142 *	0.343 *

Table 4. Experiment result of the ablation study.

Datasets	Metric	MRRNN-1	MRRNN-2	MRRNN-3	MRRNN-4	MRRNN-S
Wikipedia	MRR	0.753	0.483	0.581	0.449	0.779
	Recall@10	0.782	0.610	0.692	0.583	0.801
JingDong	MRR	0.19	0.009	0.121	0.003	0.264
	Recall@10	0.261	0.012	0.220	0.005	0.343

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
