1. Introduction
As e-commerce continues to grow in scale and the number and variety of products increases rapidly, it takes customers considerable time to find the products they want. Browsing through large amounts of irrelevant information steadily drives away consumers who are overwhelmed by the information overload problem. To address these problems, recommendation systems have been proposed and widely studied. In a recommendation system, we predict a user's preference from the interactions between the user and products in a session, such as clicks or purchases. For example, on the website Yelp (http://www.yelp.com, accessed on 21 June 2022), users can express their preference for a product by rating it on a scale of 1–5, representing strong dislike, dislike, neutral, like, and strong like, respectively.
Current approaches mainly use users' rating information to predict their preferences and, based on this, predict the product for the user's next interaction. However, such approaches struggle with the sparsity problem and the cold-start problem in recommendation systems. In fact, exploiting the varied semantic information contained in different kinds of products and different user interactions can largely avoid these two issues. Therefore, exploiting the heterogeneity of the data in recommender systems plays a crucial role in improving their effectiveness.
A heterogeneous information network (HIN) [1] is a network used to capture heterogeneous semantic information in data, which can be used to address the sparsity and cold-start problems in recommender systems. Compared with other approaches, a heterogeneous information network can better represent the relationships between users and products. At the same time, recommendation approaches based on heterogeneous information networks can represent different types of nodes and the heterogeneous relationships between them better than homogeneous information networks. Metagraphs are a major method used in heterogeneous information networks to capture the semantic information in network data [2].
Currently, metagraph-based heterogeneous information network methods have been applied in many information network data mining fields, such as social network data analysis [3] and relational graph data mining [4]. In recommendation systems based on heterogeneous information networks, metagraphs are often used as a powerful representation tool to capture the relationships between nodes [5]. In a heterogeneous information network, two nodes can be connected by different metagraphs, and these metagraphs may carry different semantic information. Although metagraphs and metapaths can effectively capture semantic information in the network, current approaches specify the structure of metagraphs and metapaths manually, which largely limits the recommendation performance on heterogeneous information networks. Therefore, in this paper, we propose a reinforcement learning-based approach to automatically find metapaths in heterogeneous information networks, which uses the recommendation performance of the model as a reward to guide the search toward better metagraphs and metapaths.
In a more complex heterogeneous information network, the numbers of node and edge types are often very large, so such a network contains many types of metagraphs. For different recommendation tasks on heterogeneous information networks, different metagraphs play different roles [6]. Therefore, when performing recommendation tasks on heterogeneous information networks, finding the appropriate metagraphs is crucial for performance.
For example, when embedding nodes in a heterogeneous information network, the number of layers in the embedding network, the number of neurons in each layer, and the dimensionality of the node embeddings are all hyperparameters that must be set manually, as are the type of graph neural network used to obtain the node embeddings and the choice of activation function. In experiments, tuning these hyperparameters consumes a great deal of time and computational resources.
Recently, Neural Architecture Search (NAS) [7,8,9,10,11,12] has been gaining great attention in machine learning, and its main goal is to find the hyperparameters and neural network architectures that are optimal for task performance. DARTS (Differentiable Architecture Search)-based [13] neural architecture search methods addressed the scalability challenge of architecture search by formulating the task in a differentiable manner. We therefore extend the micro (cell-based) neural architecture search method to recommendation tasks on heterogeneous information networks. We use the neural architecture search method to obtain better neural network architecture settings, and we use a memory-based method to search the metagraph structure in order to find metagraphs more suitable for the task, thus improving the recommendation results.
In this paper, we propose a neural network architecture search algorithm for recommendation tasks on the metagraph of heterogeneous information networks. The main contributions of our paper are the following.
We propose a novel neural network architecture search algorithm for recommendation tasks on heterogeneous information networks, which can automatically search for the number of neural network layers, the number of neurons in each layer, the dimensionality of the node embeddings, and the type of graph neural networks used in the recommendation process. It can significantly reduce time and computational resources compared with manual search.
We propose a metagraph search method for heterogeneous information networks based on a micro-neural network architecture search, which can automatically search for metagraphs which are more suitable for different heterogeneous information networks and recommendation tasks.
We conducted experiments on the Amazon and Yelp datasets, compared the architecture settings obtained by the automatic search with manually set recommendation structures, and verified the recommendation effectiveness of the algorithm.
3. Methods
In this section, we present the detailed description of the proposed auto neural architecture search for metagraph of heterogeneous information network (ANAS-HIN for short) algorithm on heterogeneous information networks. The main notations are listed in
Table 1.
3.1. ANAS-HIN Algorithm
3.1.1. Neural Network Architecture Search Problem Formalization
The heterogeneous information network recommendation model framework proposed in
Figure 1 contains multiple hyperparameters and multiple manually specified metapaths. We search these hyperparameters and metapaths in the heterogeneous information network using a neural architecture search method. Given the recommendation model framework and a validation set
D, the goal is to find the architecture that achieves the optimal recommendation results on the validation set
D, as in Equation (4).
A reinforcement learning framework is used to optimize the above equation and obtain the optimal neural network architecture. The negative root mean square error on the validation set D is used as the reward (R) for the reinforcement learning selection.
We use a recurrent neural network (RNN for short) to select the different hyperparameters and neural network architectures. First, the recurrent neural network generates a corresponding network framework description
m. We then parse the framework description
m to build a specific network, which is trained on the heterogeneous information network
G. After training, the negative root mean square error on the validation set
D is used as the reward for reinforcement learning, which is used to update the controller. The specific framework is shown in
Figure 2.
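As a concrete illustration of this loop, the following sketch stands in for the RNN controller with random sampling over hypothetical search spaces; the dictionary keys and the toy evaluator are illustrative, not the paper's exact configuration. A real implementation would replace `sample_architecture` with the controller and `evaluate` with training the generated model and returning the negative RMSE on the validation set D.

```python
import random

# Hypothetical search spaces (illustrative names, not the paper's exact ones).
SEARCH_SPACE = {
    "num_layers": [1, 2, 3],
    "gnn_type": ["gcn", "sage", "gat"],
    "hidden_dim": [32, 64, 128],
    "embed_dim": [32, 64, 128],
    "activation": ["sigmoid", "tanh", "relu", "elu"],
}

def sample_architecture(rng):
    """Stand-in for the RNN controller: pick one value per hyperparameter."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def search(evaluate, steps=20, seed=0):
    """Keep the architecture with the highest reward (negative RMSE on D)."""
    rng = random.Random(seed)
    best_arch, best_reward = None, float("-inf")
    for _ in range(steps):
        arch = sample_architecture(rng)
        reward = evaluate(arch)          # train child model, return -RMSE
        if reward > best_reward:
            best_arch, best_reward = arch, reward
    return best_arch, best_reward

# Toy evaluator standing in for child-model training, just to exercise the loop.
toy = lambda a: -1.0 / (a["num_layers"] * a["hidden_dim"])
arch, reward = search(toy)
print(arch["num_layers"], reward < 0)
```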
3.1.2. Neural Network Architecture Search Space
As shown in
Figure 2, we utilize a controller to generate the framework of the neural network and use a recurrent neural network to search for the optimal neural network framework in the hyperparameter space. We list the searched hyperparameters and their search spaces in
Table 2.
Next, we introduce the graph neural network types and the methods for aggregating multiple metagraph embeddings. Given a heterogeneous information network
G and a metagraph
M, we can obtain its corresponding connection matrix. To derive the corresponding user and product embeddings from the metagraph
M, we use a two-layer graph neural network [39].
For each node in G, let its corresponding feature vector have dimension d. For featureless nodes, the one-hot vector corresponding to the node ID can be used as the input feature. Here, we use a graph convolutional layer, a graph sage layer, and a graph attention layer to obtain the user and item embeddings.
Graph convolutional layer: The graph convolutional layer is mainly used to aggregate the adjacent node features in the network by convolution to obtain the node embedding in the next layer.
where the connection matrix under the metagraph M is normalized using the degree matrix D, a diagonal matrix whose diagonal elements are the node degrees and whose off-diagonal elements are 0, and W is the weight matrix.
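A minimal NumPy sketch of one such layer, assuming the standard symmetrically normalized propagation rule with self-loops and a ReLU activation (the paper's exact normalization may differ):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolutional layer: ReLU(D^{-1/2} (A + I) D^{-1/2} H W),
    where A is the metagraph connection (adjacency) matrix."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)                      # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D^{-1/2}, diagonal degree matrix
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)     # ReLU activation

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)                                  # one-hot features for featureless nodes
W = np.random.default_rng(0).normal(size=(3, 2))
out = gcn_layer(A, H, W)
print(out.shape)
```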
Graph attention layer: The graph attention layer is a variant of the graph convolutional layer. When the graph convolutional layer aggregates nodes, it does not consider the relationships between node features; in a network, however, the influence between nodes with different features often differs. Therefore, the graph attention layer employs an attention mechanism for node aggregation. Given the set of neighbor nodes of a node, the graph attention layer uses the attention mechanism to calculate the weight of each neighbor node on the node as follows.
where
W is the learnable weight matrix and the attention mechanism is, in our experiments, a feedforward neural network. The graph attention layer then regularizes the weights of all neighboring nodes using softmax as follows.
Finally, the graph attention layer uses a weighted sum to obtain the output of the next layer.
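The attention computation above can be sketched as follows, assuming the common formulation in which a single-layer attention function with a LeakyReLU activation scores each concatenated pair of transformed embeddings (the paper's exact attention network may differ):

```python
import numpy as np

def gat_node_update(h_i, neighbors, W, a):
    """Attention weight of neighbor j on node i:
    e_ij = LeakyReLU(a . [W h_i || W h_j]); alpha = softmax_j(e_ij);
    output = sum_j alpha_j * W h_j."""
    def leaky_relu(x):
        return np.where(x > 0, x, 0.2 * x)
    Wh_i = W @ h_i
    Wh_js = [W @ h for h in neighbors]
    e = np.array([leaky_relu(a @ np.concatenate([Wh_i, Wh_j])) for Wh_j in Wh_js])
    exp = np.exp(e - e.max())
    alpha = exp / exp.sum()                    # softmax regularization over neighbors
    out = sum(al * Wh for al, Wh in zip(alpha, Wh_js))
    return out, alpha

rng = np.random.default_rng(1)
h_i = rng.normal(size=4)
neighbors = [rng.normal(size=4) for _ in range(3)]
W = rng.normal(size=(2, 4))
a = rng.normal(size=4)     # scores the concatenation of two 2-d transformed vectors
out, alpha = gat_node_update(h_i, neighbors, W, a)
print(out.shape, round(alpha.sum(), 6))
```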
Graph sage layer: The graph sage layer is another variant of the graph convolutional layer. To ensure symmetry in node aggregation, the graph sage layer uses max pooling, applied element-wise, to aggregate the neighboring nodes of a node.
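A sketch of this aggregation, under the usual GraphSAGE max-pooling formulation in which each neighbor is transformed, the results are max-pooled element-wise, and the pooled vector is combined with the node's own embedding (the matrix shapes here are illustrative):

```python
import numpy as np

def sage_maxpool_layer(h_i, neighbors, W_pool, W):
    """GraphSAGE max-pooling: transform each neighbor embedding, take the
    element-wise max, then combine with the node's own embedding."""
    transformed = [np.maximum(W_pool @ h, 0.0) for h in neighbors]
    pooled = np.max(transformed, axis=0)            # element-wise max pooling
    return np.maximum(W @ np.concatenate([h_i, pooled]), 0.0)

rng = np.random.default_rng(0)
h_i = rng.normal(size=4)
neighbors = [rng.normal(size=4) for _ in range(3)]
W_pool = rng.normal(size=(8, 4))                    # neighbor transform
W = rng.normal(size=(2, 4 + 8))                     # combine self + pooled
out = sage_maxpool_layer(h_i, neighbors, W_pool, W)
print(out.shape)
```

Because max pooling is order-invariant, the output does not depend on how the neighbors are enumerated, which is the symmetry property mentioned above.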
Through the metagraph-based graph neural network, we can obtain the embeddings of users and products under different metagraphs. We then aggregate the user and product embeddings from the different metagraphs to obtain more expressive user and product embeddings and thus more accurate recommendation results. We introduce three methods for node embedding aggregation.
Concatenation: For the user and product embeddings obtained under different metagraphs, we directly concatenate them to obtain the final user and product embeddings. However, concatenation increases the dimensionality of the embedding and therefore requires more computation.
Mean: To avoid the increase in embedding dimensionality, the user and product embeddings under different metagraphs are averaged. Averaging keeps the dimensionality of the aggregated vectors unchanged, but it does not take into account the differences between metagraphs, so direct averaging causes the aggregated vectors to lose some information.
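The trade-off between the two methods is easy to see in a minimal sketch: concatenation grows the dimensionality with the number of metagraphs, while the mean keeps it fixed but discards the per-metagraph differences.

```python
import numpy as np

def aggregate_concat(embeddings):
    """Concatenation: output dimension = sum of input dimensions."""
    return np.concatenate(embeddings)

def aggregate_mean(embeddings):
    """Mean: output dimension unchanged; metagraph differences averaged out."""
    return np.mean(embeddings, axis=0)

embs = [np.ones(4), 3 * np.ones(4)]   # embeddings from two metagraphs
cat = aggregate_concat(embs)
avg = aggregate_mean(embs)
print(cat.shape, avg)                 # (8,) [2. 2. 2. 2.]
```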
Attention: The attention mechanism [40] can effectively avoid the disadvantages of these two methods. It weights the user and product embeddings from the different metagraphs, and the weights adapt to the impact that each metagraph's embeddings have on the recommendation performance. The specific computation process is as follows.
For each metagraph
M, given the user embedding and the product embedding obtained from it, we use a two-layer perceptron for users and items, respectively, to obtain the attention scores of the corresponding users and items.
where
W denotes the weight matrix,
b is the bias term, and a nonlinear activation function is applied. After obtaining the attention scores, we normalize the attention scores of the different metagraphs using the softmax function as follows.
where
L denotes the set of all metagraphs. Finally, we obtain the final user and item embeddings by weighted summation, as in Equations (10) and (11).
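The attention aggregation can be sketched as follows, assuming a two-layer perceptron scorer with a tanh hidden layer followed by a softmax over metagraphs (the activation and dimensions are illustrative):

```python
import numpy as np

def attention_aggregate(embs, W1, b1, W2, b2):
    """Score each metagraph embedding with a two-layer perceptron,
    softmax the scores across metagraphs, then take the weighted sum."""
    scores = np.array([W2 @ np.tanh(W1 @ e + b1) + b2 for e in embs]).ravel()
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()                       # softmax over metagraphs
    return sum(w * e for w, e in zip(weights, embs)), weights

rng = np.random.default_rng(2)
embs = [rng.normal(size=4) for _ in range(3)]       # three metagraph embeddings
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
agg, w = attention_aggregate(embs, W1, b1, W2, b2)
print(agg.shape, round(w.sum(), 6))
```

Unlike concatenation, the output dimension stays fixed; unlike the mean, each metagraph's contribution is weighted by its learned score.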
Finally, we use the embeddings of users and products to predict the user ratings of the products. For each user-product pair, we first concatenate the embeddings of the user and the item, and then use a multilayer feedforward neural network to predict the user's rating of the item, as shown in Equation (12).
where
W denotes the weight matrix,
b is the bias term, and the prediction function is the composition of the layers of the multilayer feedforward nonlinear neural network.
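A minimal sketch of this rating predictor, assuming a single ReLU hidden layer (the paper's network may have more layers):

```python
import numpy as np

def predict_rating(u, v, W1, b1, w2, b2):
    """Concatenate the user and item embeddings, then apply a small
    feedforward network to predict the rating (a sketch of Equation (12))."""
    x = np.concatenate([u, v])          # user || item
    h = np.maximum(W1 @ x + b1, 0.0)    # hidden layer with ReLU
    return float(w2 @ h + b2)           # scalar rating

rng = np.random.default_rng(5)
u, v = rng.normal(size=4), rng.normal(size=4)
W1, b1 = rng.normal(size=(6, 8)), np.zeros(6)
w2, b2 = rng.normal(size=6), 0.0
r = predict_rating(u, v, W1, b1, w2, b2)
print(isinstance(r, float))
```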
3.2. Metapaths Auto-Search
We propose to use the controller to search for new metagraphs. In the search process, we fix a maximum metagraph length, and the search space at each position consists of the node types present in the current graph plus an empty type. Once the controller selects the empty type, the construction of the current metagraph is finished. For the node types the controller has selected, we use all the edge types among them to connect each node type to the previous one, obtaining the searched metagraph structure. We use both the manually constructed metagraphs and the automatically searched metagraphs together for recommendation on heterogeneous information networks. For the heterogeneous information network recommendation model framework, we use a list to represent the model framework obtained by the search. Equation (13) gives an example of such a list.
This indicates that stage 1 uses the graph attention mechanism as the graph convolutional network, with tanh as the activation function of the graph convolutional layers, an output dimension of 128 for each layer of the graph convolutional network, and a node embedding model under the metagraph composed of two graph convolutional layers, with a node embedding dimension of 128.
In stage 2, the node embeddings under multiple metagraphs are aggregated using the averaging method, and a 2-layer feedforward neural network is used to obtain the rating. Meanwhile, the search yields a metagraph. The task of the controller is thus to generate the optimal sequence for the above framework. Let the length of the sequence be T; each element of the sequence is obtained by searching the corresponding hyperparameter search space. As mentioned before, we use a memory-based recurrent neural network to model this process.
3.3. Recurrent Neural Network with Memory Mechanism
Using a recurrent neural network as the controller to model the hyperparameter space linearly assumes that the hyperparameters are all linearly related to one another; in practice, however, the relationships between hyperparameters are not necessarily linear, and the modeling is strongly affected by the order of the hyperparameters. To address this problem, we incorporate a memory mechanism into the recurrent neural network to model the connections between hyperparameters. Unlike common neural architecture search algorithms, the ANAS-HIN algorithm searches the hyperparameters as follows.
The controller predicts the number of layers of the graph neural network from {1, 2, 3}, placed at position 0 of the predicted model framework list, calculated as:
The controller predicts the nonlinear activation function of the graph neural network from {sigmoid, tanh, relu, ⋯, elu}, placed at position 1 of the predicted model framework list, calculated as:
where the learnable parameter matrices belong to the recurrent neural network and
M is the learnable parameter matrix of the memory mechanism. The controller then sequentially predicts the graph neural network type, the number of attention heads, the output dimension of the graph neural network, the node embedding dimension under the metagraph, the multi-metagraph embedding aggregation method, and the number of layers of the scoring multilayer feedforward neural network. We verify the effect of the memory mechanism on the model experimentally below.
To obtain the optimal framework sequence, we use a policy gradient algorithm to update the parameters of the recurrent neural network. After the recurrent neural network generates the corresponding model framework sequence
m, we construct a recommendation model based on
m, train it on the training set, and then evaluate it on the test set
D to obtain the test result, which in our experiments is the negative root mean square error. We use this result as the reward to train the recurrent neural network. Because the reward is not differentiable, we use a reinforcement learning approach to update the parameters:
where
b represents the exponential moving average of the rewards of previous frameworks during training. For the currently generated model framework
m, the training of the model and the training of the controller are independent of each other. In our experiments, we used the cross-entropy loss to train the controller. To reduce the error caused by the randomness of the training and testing process for a model
m, we repeat the training process
N times and select the top
K models as the candidate models for the final comparison.
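The policy gradient update with the moving-average baseline b can be sketched for a single categorical choice; a toy reward stands in for training a child model, and all names here are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_step(theta, action, reward, baseline, lr=0.5):
    """One REINFORCE update: grad of log pi(action) for a softmax policy is
    onehot(action) - softmax(theta), scaled by the advantage (reward - b)."""
    grad = -softmax(theta)
    grad[action] += 1.0
    return theta + lr * (reward - baseline) * grad

rng = np.random.default_rng(3)
theta, baseline = np.zeros(3), 0.0
# Toy reward: pretend choice 2 yields the best (least negative) -RMSE.
for _ in range(200):
    a = rng.choice(3, p=softmax(theta))
    r = 1.0 if a == 2 else 0.0
    theta = reinforce_step(theta, a, r, baseline)
    baseline = 0.9 * baseline + 0.1 * r   # exponential moving average of rewards
print(softmax(theta).argmax())
```

Choices with above-average reward are reinforced; the baseline reduces the variance of the updates without biasing them.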
3.4. Optimization
Because most heterogeneous information networks contain many node types and some metapaths are long, optimizing them with reinforcement learning methods takes a long time in experiments. Thus, we use the Gumbel-Max trick to speed up the training of automatic metapath selection. For a metapath of length
N, a number of node types can be selected at each step. At the
ith selection step, the controller draws a node type from a discrete distribution:
where the weights are the corresponding weights in the controller, and the discrete distribution is defined by a softmax over the controller's prediction weights for each category at the
ith selection step. From Formula (15), it can be seen that the controller must sample at each selection step, so the weights in the controller are not differentiable and cannot be trained using gradient descent. To optimize the ANAS-HIN algorithm with gradient descent, we use the Gumbel-Max trick to accelerate the optimization of the model and replace Formula (15) with the following equation:
where the noise terms are obtained by independent sampling from the Gumbel(0,1) distribution. Next, we use the softmax function to approximate the argmax function, making the whole process differentiable, where a temperature parameter controls the approximation: as the temperature approaches 0, the softmax approaches the argmax.
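The relaxation can be sketched as follows; Gumbel(0,1) noise is generated via the standard −log(−log U) transform, and the temperature controls how closely the softmax output approximates a one-hot argmax sample.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Add Gumbel(0,1) noise to the logits, then apply a temperature-tau
    softmax; as tau -> 0, the output approaches a one-hot sample."""
    u = rng.uniform(size=logits.shape)
    g = -np.log(-np.log(u))            # Gumbel(0,1) noise via -log(-log U)
    y = (logits + g) / tau
    e = np.exp(y - y.max())            # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(4)
logits = np.array([1.0, 2.0, 0.5])     # controller prediction weights (illustrative)
soft = gumbel_softmax(logits, tau=1.0, rng=rng)    # smooth, fully differentiable
hard = gumbel_softmax(logits, tau=0.001, rng=rng)  # near one-hot
print(round(soft.sum(), 6))            # 1.0
```

Because every operation is differentiable in the logits, gradients can flow through the sampled choice back to the controller weights.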
4. Experiment
In this section, we conduct experiments on two real datasets. First, we apply the ANAS-HIN algorithm to the two heterogeneous information network datasets. We then compare the ANAS-HIN algorithm with several recommendation algorithms to verify the real effects of the models. Finally, we perform an ablation analysis on the model to verify the effects of its different parts.
4.1. Dataset
Two datasets, Yelp (http://www.yelp.com/dataset/, accessed on 21 June 2022) and Amazon (http://jmcauley.ucsd.edu/data/amazon/, accessed on 21 June 2022), are used in our experiments. The Yelp dataset is a business recommendation dataset; we extracted a subset containing 18,465 users, 536 businesses, and 20,000 ratings, with a minimum rating of 1 and a maximum rating of 5, where a higher rating indicates that the user prefers the business more. The Amazon dataset contains 16,970 users, 336 products, and 20,000 ratings, likewise on a 1–5 scale, where a higher rating indicates that the user prefers the product more [41]. The specific statistics of the two datasets are shown in
Table 3.
For the Review-Aspect data, we used the Gensim tool to perform topic modeling on the reviews, setting the number of topics to 10, so that each review corresponds to a vector of length 10 in which each entry is the probability that the review belongs to the corresponding topic. The metagraphs we chose for the two datasets are shown in
Figure 3 and
Figure 4.
4.2. Evaluation Indicators
To evaluate the recommendation effectiveness of different models, we use the root mean square error (RMSE), defined in Equation (16). The smaller the RMSE, the better the recommendation effect of the model.
where the sum runs over the user-item pairs in the test set: for each pair, the model's predicted rating is compared with the user's true rating, and the result is normalized by the number of user-item pairs in the test set.
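The metric itself is straightforward to compute; a minimal sketch over a handful of illustrative rating pairs:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over the test-set user-item pairs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Three pairs: true ratings vs. model predictions (toy values).
print(round(rmse([5, 3, 1], [4, 3, 2]), 4))   # 0.8165
```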
4.3. Baseline Algorithms
To verify the recommendation effectiveness of the ANAS-HIN algorithm, we compare it with the following baseline methods commonly used in recommendation systems.
NeuACF [1]: The NeuACF model makes recommendations from two aspects. On the one hand, it uses human-defined metapaths to compute the similarity between users and items; on the other hand, it uses matrix factorization to obtain the embeddings of users and items and the inner product of these embeddings to obtain their similarity. Finally, it combines the similarities from both aspects to predict users' ratings of items.
MGAR [6]: Similar to the FMG model, the MGAR model is a two-stage model. In the first stage, it performs matrix factorization on the connection matrices of the metagraphs to obtain the embeddings of users and products; in the second stage, it weights the different metagraphs through an attention model to obtain the users' ratings of products.
SemRec [14]: The SemRec model is designed for recommendation on weighted heterogeneous information networks. It uses human-defined weighted metapaths to calculate the similarity between users and products and uses this similarity to predict users' ratings of products.
FMG [24]: The FMG model is similar to the recommendation model framework we introduced. In the first stage, it uses metagraph- and metapath-based matrix factorization to obtain the embeddings of users and items; it then uses these embeddings as features and applies a factorization machine model to predict the users' ratings of items.
FM [25]: The factorization machine (FM) mainly uses linear combinations of user and item features to predict users' ratings of items. Unlike the PMF model, the factorization machine considers not only the first-order similarity between users and items but also the second-order similarity, and it combines the first-order and second-order similarities to predict users' ratings of items.
PMF [42]: The Probabilistic Matrix Factorization (PMF) model represents the interactions between users and items as an interaction matrix, uses matrix factorization to obtain the embeddings of users and items, and predicts users' ratings of items with the inner product of these embeddings.
4.4. Experimental Results
The experimental results are shown in
Figure 5. In the experiments, we randomly selected 80% of the data in the dataset as the training set and the remaining 20% as the test set. The first row for each method in
Figure 5 corresponds to the RMSE value achieved by that method.
The ANAS-HIN performed better than the six baseline methods on the Yelp dataset. Compared to the PMF method, the ANAS-HIN improved the recommendation performance by 70.1%. The ANAS-HIN improved by 56.6% over the FM method. The SemRec method performed 56.0% worse than the ANAS-HIN on the Yelp dataset. Moreover, the ANAS-HIN significantly outperformed the FMG method, with a 50.1% improvement in RMSE. Relative to the NeuACF and MGAR methods, the ANAS-HIN achieved improvements of 30.1% and 3.4%, respectively. On the Amazon dataset, the ANAS-HIN showed advantages similar to those on the Yelp dataset.
Compared with the PMF method, the ANAS-HIN achieved a 66.0% advantage. Compared with the FM method, the ANAS-HIN improved the recommendation performance by 42.7%. Relative to the SemRec method, the ANAS-HIN improved the RMSE by 44.9%. The FMG method performed 41.4% worse than the ANAS-HIN. Relative to the NeuACF and MGAR methods, the ANAS-HIN achieved improvements of 29.8% and 5.2%, respectively. Comparing the ANAS-HIN algorithm with the FMG and MGAR algorithms shows that the ANAS-HIN algorithm can find a more powerful neural network architecture for a recommendation task through neural architecture search, which effectively improves the effectiveness of the recommendation algorithm on the heterogeneous information network.
4.5. Ablation Study
In this subsection, we investigate the effect of the memory mechanism in the ANAS-HIN algorithm on its effectiveness. We remove the memory mechanism from the ANAS-HIN algorithm and use linear modeling for the search in hyperparameter space; we refer to this variant as the ANAS-HIN-M algorithm. We list the results of the ANAS-HIN-M algorithm on the Yelp and Amazon datasets in
Table 4. From
Table 4, we can see that after removing the memory mechanism, the performance of the ANAS-HIN algorithm decreases by 2.60% and 1.67% on the Yelp and Amazon datasets, respectively, which shows that the algorithm has difficulty capturing the associations between the hyperparameters without the memory mechanism.
4.6. Impact of Different Metagraphs on the Model
In this subsection, we investigate the effect on the model of using different metagraphs, including the metagraph obtained from the ANAS-HIN search (M-Auto), in a heterogeneous information network. We calculate the recommendation performance of the ANAS-HIN when using each metagraph alone on the two datasets, compared with the performance when using all metagraphs (M-A-all). The specific results are shown in
Table 5 and
Table 6. The metagraphs are shown in
Figure 3 and
Figure 4.
From the above results in
Table 5 and
Table 6, it can be seen that the recommendation effect is not good when only one metagraph is used. This indicates that each metagraph contains only part of the information between the user and the product.
From
Table 5, we can see that on the Yelp dataset, among the manually determined metagraphs, metagraph M6 contributes the most to the recommendation improvement, which suggests that the location of a business has the greatest effect on users in Yelp's recommendations. On the Amazon dataset in
Table 6, metagraph M5 improves the recommendation performance the most. Compared with the other manually determined metagraphs, metagraph M5 contains the most semantic information, which explains its largest improvement and shows that medium- and high-order semantic information can improve the recommendation performance.
In both datasets, the metagraph M-Auto, which is automatically searched by the ANAS-HIN algorithm, improves the recommendation performance more than any manually determined metagraph, which indicates that the ANAS-HIN algorithm can find metagraphs with richer semantic information through search. The results also show that combining multiple metagraphs improves the recommendation performance considerably compared with using only one metagraph. The semantic information contained in a single metagraph is limited, and different metagraphs carry different semantic information; combining the semantic information of multiple metagraphs therefore effectively improves the recommendation performance of the model. Moreover, different metagraphs affect the recommendation performance of the model to different degrees.
5. Summary
In this paper, we propose a neural network architecture search algorithm, the ANAS-HIN model, for recommendation algorithms on heterogeneous information networks. We first decompose common recommendation algorithms on heterogeneous information networks, summarize their more important hyperparameters, and use a list to represent the model architecture of the recommendation algorithm. We use a recurrent neural network to model the selection of each hyperparameter and use the recommendation performance of the algorithm on the validation set as the reward for the corresponding architecture. Finally, we train the ANAS-HIN model with reinforcement learning to obtain the optimal recommendation algorithm architecture. In addition, we use the NAS framework to automatically search the metagraph structure used in the recommendation task to find a metagraph structure better suited to the current dataset and task. To verify the effectiveness of the ANAS-HIN algorithm, we applied it to two heterogeneous information network recommendation algorithms, FMG and MGAR, and performed a neural architecture search on both to obtain the optimal architecture settings. We conducted experiments on two real heterogeneous information network recommendation datasets, Yelp and Amazon, and compared the recommendation effectiveness with six mainstream recommendation algorithm models. The experimental results verified the effectiveness of the models on the recommendation task. Our model also has some limitations: it is designed mainly for recommendation systems based on heterogeneous information networks and is not applicable to many other recommendation scenarios, such as session-based recommender systems. This provides directions for our future research.