A Knowledge Graph-Enhanced Attention Aggregation Network for Making Recommendations

Abstract: In recent years, many researchers have designed algorithms that introduce external information from knowledge graphs to address the problems of data sparsity and the cold start, and thus improve the performance of recommendation systems. Inspired by these studies, we propose KANR, a knowledge graph-enhanced attention aggregation network for making recommendations. KANR is an end-to-end deep learning model that uses knowledge graph embedding to enhance the attention aggregation network. It consists of three main parts. The first is the attention aggregation network, which collects the user's interaction history and captures the user's preference for each item. The second is the knowledge graph embedding model, which maps the semantic information of the nodes and edges in the graph to a low-dimensional vector space. The final part is the information interaction unit, which fuses the features of two vectors. Experiments showed that our model achieved a stable improvement over the baseline models in making recommendations for movies, books, and music.


Introduction
In many online services such as e-commerce, Internet advertising, and social media, people access online content (usually in the form of purchases or clicks), thus generating a large number of interactive records. To reduce the impact of information overload, researchers have proposed recommendation systems to satisfy the personalized needs of users. Traditional recommendation methods, such as collaborative filtering (CF) and matrix factorization (MF) [1], predict whether users are interested in an item based on their historical interactions. However, these methods usually suffer from the problems of data sparsity and the cold start. Researchers have introduced external information to solve these problems; for example, social networks [2,3], item attributes [4], knowledge graphs [5], and other heterogeneous networks. Knowledge graphs are widely used in recommendation tasks because of their high-order connectivity and sufficient prior knowledge. In the recommendation task based on the knowledge graph, users and items correspond to nodes in the graph, and the relationships between items correspond to edges in the graph. At present, many research institutions open-source their academic knowledge graphs, such as DBpedia [6] and Google Knowledge Graph [7].
Inspired by these methods, we consider that the semantic relationship between users and items can improve the quality of recommendations. Different users interact with the same item in different ways, and these interactions reflect their preferences. For example, one user's interaction behavior in relation to an iPad Pro is "purchase", whereas another's is "browse". Compared with browsing, purchasing indicates a stronger preference for the iPad Pro. Therefore, we take users and items as entities, and the interaction behavior as the relationship between users and items. This forms a user-item interaction knowledge graph, as shown in Figure 1.
Most researchers treat the recommendation system (RS) [8] and knowledge graph embedding (KGE) [9] as two independent tasks and train them in two different vector spaces. This is convenient for training the different vectors, but the error introduced by the separate spaces reduces the overall performance of the model. We consider that recommendation systems and knowledge graph embedding are not completely independent tasks [10]. Inspired by multitask learning [11], we propose KANR, an end-to-end deep learning model using knowledge graph embedding to enhance the attention aggregation network for making recommendations. The main contributions of this work are as follows:
• We designed an information interaction unit to combine the recommendation system and knowledge graph embedding tasks. KANR can learn the semantic information in the knowledge graph in a unified vector space.
• We proposed an attention aggregation network, which is used to collect users' interaction history and mine users' preferences to improve personalized recommendations.
• We conducted a series of experiments on three open-source datasets to verify the effectiveness of KANR. The results showed that KANR learned user preferences more effectively and performed well in click-through rate (CTR) prediction and Top-K recommendation tasks.


Recommendation Systems
At present, recommendation systems based on knowledge graphs can be divided into three types: embedding-based methods, path-based methods, and graph-based methods.
The embedding-based method uses KGE to map the items and relationships in the knowledge graph to a low-dimensional vector space, and designs a model for learning the features of users and items. The Deep Knowledge-aware Network (DKN) [12] first pre-trains the entities in the knowledge graph and then uses a VGG network [13] to extract the features of the joint vector of the entity and the words in news headlines. This method uses a simple framework to integrate the entities of the knowledge graph with news recommendations. However, it only uses the words in the title to make recommendations, so the corresponding context information is missing. Collaborative knowledge base embedding for recommender systems (CKE) [14] uses three embedding methods to fit the knowledge graph, text, and visual information in the knowledge base. It then combines the three vectors with the implicit feedback of users in the recommendation system.
Multiple paths connect the nodes in the knowledge graph. These paths can provide additional information and interpretability for the recommendation system. The path-based method uses the path information in the knowledge graph to enhance the recommendation system and provide explanations for the recommendation results. Explainable Reasoning over Knowledge Graphs for Recommendation (KPRN) [15] uses a long short-term memory (LSTM) network [16] to capture the dependencies and inference paths of items to infer user preferences and generate reasonable explanations. Although models based on path traversal provide interpretability and a good performance improvement, they consume a large amount of computing resources when learning the optimal path. RuleGuider [17] proposed a rule guider to learn the probability distribution of the reasoning path. RuleGuider uses a symbol-based method to mine high-quality rules, and introduces a reinforcement learning agent [18] to learn the walking path, using the high-quality rules to provide reward supervision for the agent.
Graph-based methods focus on the association between nodes and their neighbors in the knowledge graph. A common approach is to treat a specific entity node as a center for aggregation to capture the characteristics of the node and its neighbors. Knowledge Graph Convolutional Networks for Recommender Systems (KGCN) [19] uses a graph convolutional network to aggregate neighbor nodes to obtain the k-hop structure information of the central node. Simultaneously, it weights neighbors according to the connection relationship and specific user scores to represent the semantic information of the knowledge graph. Neighborhood Aggregation Collaborative Filtering Based on Knowledge Graph (NACF) [20] applies attention within a graph convolutional network (GCN) [21]: it calculates a weight for each historical interactive item and then uses the GCN to aggregate the items to capture the user's preferences.

Knowledge Graph Embedding
The function of knowledge graph embedding is to embed the entities and relationships in the knowledge graph into a continuous low-dimensional vector space to facilitate the sharing of the features in the knowledge graph.
At present, the commonly used knowledge graph embedding models are divided into two types. The first is the translation model, for example, TransR [22] and TransD [23], which can represent the structural information of the knowledge graph. TransR maps entities into the relational space through a matrix $M_r$, and learns embeddings through $h_r + r \approx t_r$ in the relation-specific spaces. TransD sets up two matrices to map the head entity and tail entity into the relational space, respectively, so that the head entity and tail entity can have different embeddings under the same relationship. The second is the semantic matching model [24,25], for example, the Deep Structured Semantic Model (DSSM) [26] and DistMult [27], which can represent the semantic information in the knowledge graph. DSSM maps $h$ and $t$ into a semantic space of common dimension, and trains the implicit semantic model by maximizing the cosine similarity between the semantic vectors of $h$ and $t$. DistMult is a simplified version of a latent factor model (LFM) that restricts $M_r$ to a diagonal matrix and effectively learns the semantic relationship between entities.
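The two model families above can be contrasted with a short sketch. It is a minimal NumPy illustration under stated assumptions, not code from this paper: the function names `transr_score` and `distmult_score`, the column-vector convention `M_r @ h`, and the toy dimensions are all chosen here for illustration.

```python
import numpy as np

def transr_score(h, r, t, M_r):
    """TransR-style distance: project h and t into the relation space,
    then measure how far h_r + r is from t_r (lower = more plausible)."""
    h_r = M_r @ h
    t_r = M_r @ t
    return float(np.sum((h_r + r - t_r) ** 2))

def distmult_score(h, r, t):
    """DistMult-style similarity h^T diag(r) t (higher = more plausible)."""
    return float(np.sum(h * r * t))

rng = np.random.default_rng(0)
d_ent, d_rel = 4, 3                       # illustrative dimensions
h = rng.normal(size=d_ent)
M_r = rng.normal(size=(d_rel, d_ent))     # relation-specific projection
r = rng.normal(size=d_rel)
# Build a tail entity for which M_r t = M_r h + r holds (up to rounding)
t = h + np.linalg.pinv(M_r) @ r
print(transr_score(h, r, t, M_r))         # close to zero for this constructed triple
```

Note that the DistMult score is symmetric in `h` and `t`, which is a direct consequence of restricting the relation matrix to a diagonal.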

Formulation
Inspired by multi-task learning, KANR uses the information interaction unit to combine the two tasks of RS and KGE. In the recommendation task, we define a set of $N$ users $U = \{u_1, u_2, \ldots, u_N\}$ and a set of $M$ items $I = \{i_1, i_2, \ldots, i_M\}$. The set of historical interactive items of user $u_i$ is defined as $V_i = (v_1, v_2, \ldots, v_K)$, where $V_i \subseteq I$. In the pre-training of the knowledge graph, we define a proprietary knowledge graph for each dataset. The knowledge graph is expressed as triples $(h, r, t)$, where $h$ and $t$ represent the head entity and the tail entity, respectively, and $r$ represents the relationship between entities. In research on recommendation systems based on knowledge graphs, the entities are regarded as items, and the relationships between the entities are equivalent to the attributes between items. Through the path links in the knowledge graph, the recommendation system can obtain more side information.
Appl. Sci. 2021, 11, 10432

Our Model
The framework of KANR is shown in Figure 2a. First, a user vector $u$ and the set of the user's historical interactive items $V$ are input. $u$ and $V$ are aggregated into the user-item interaction vector $e$ through the attention aggregation network. The attention model calculates a weight for each item, so $e$ contains the user's preference for the historical interactive items. Then, the model calculates the recommendation score from the candidate item $i$ and $e$. The right half of the framework is the knowledge graph embedding. The middle part is the information interaction model, which fuses the features between $e$ and $t$ and the features between $i$ and $h$.

Information Interaction Unit
The information interaction unit is a model that fuses the features of two vectors. Inspired by the Deep and Cross Network (DCN) [28], we use feature crossing and feature compression to complete the information interaction. Its framework is shown in Figure 3. Given the two input vectors $ke \in \mathbb{R}^d$ and $re \in \mathbb{R}^d$, we first multiply $ke$ and $re$ to construct a feature cross matrix $C$:
$$C = ke \, re^{\top},$$
where $C \in \mathbb{R}^{d \times d}$ is the cross matrix. Because $C$ contains every possible combination $ke_i \, re_j$, $\forall (i, j) \in \{1, 2, \ldots, d\}^2$, of the elements of the two vectors, it completes the feature cross between them. Then, we compress the crossed features into the input vectors of the recommendation module and the knowledge graph embedding module, $ke'$ and $re'$, using the trainable weights and biases $W_{ke}$, $W_{re}$, $b_{ke}$, $b_{re}$. $W_{ke}$ and $W_{re}$ map the interaction matrix $C$ into a $d$-dimensional vector space, which generates the new mixed feature vectors $ke'$ and $re'$. After the vectors are mixed, they share each other's features. The information interaction model is denoted by $M_C$.
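The cross-and-compress step can be sketched as follows. The cross matrix follows the definition in the text; the compression step is an assumption made here for illustration (each output is an affine map of $C$ or $C^{\top}$ with a weight vector of length $d$), since the exact compression formula is not reproduced in this text. The function name `interaction_unit` is hypothetical.

```python
import numpy as np

def interaction_unit(ke_vec, re_vec, w_ke, w_re, b_ke, b_re):
    """Sketch of M_C: cross two d-dimensional vectors, then compress.

    The compression below (C @ w + b and C.T @ w + b) is an assumed form,
    chosen only because it maps the d x d cross matrix back to d dimensions.
    """
    C = np.outer(ke_vec, re_vec)        # C[i, j] = ke_i * re_j
    ke_new = C @ w_ke + b_ke            # mixed vector for the recommendation side
    re_new = C.T @ w_re + b_re          # mixed vector for the KGE side
    return ke_new, re_new

d = 3
rng = np.random.default_rng(1)
ke_vec, re_vec = rng.normal(size=d), rng.normal(size=d)
w_ke, w_re = rng.normal(size=d), rng.normal(size=d)
b_ke, b_re = np.zeros(d), np.zeros(d)
ke_new, re_new = interaction_unit(ke_vec, re_vec, w_ke, w_re, b_ke, b_re)
print(ke_new.shape, re_new.shape)
```

Under this assumed form, each output vector is a rescaled copy of one input plus a bias, where the scale depends on the other input, which is one simple way for the two vectors to share features.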

Attention Aggregation Network
The attention aggregation network calculates the aggregation vector $e$ of the user and the historical interactive items, which we call the user-item interaction vector.
We use the user vector $u$ and the historical interactive item vectors $v$ to calculate the user-item aggregation vector. To mine user preferences, we propose a novel attention model to assign item weights (as shown in Figure 2b). It first lets the user vector $u_i$ and the historical interactive item vector $v_{ij}$ share features with each other, and then concatenates the two resulting vectors to calculate a score. The score is the weight of the item, and also indicates the user's preference. The calculation proceeds as follows.
We first use the information interaction model to share features between the initial user vector and the item vector, which yields the mixed feature vectors $u'_i$ and $v'_{ij}$ of the user and the item. The attention model then assigns each historical interactive item a weight $W_{ij}$. We then calculate the weighted average of the historical interactive item vectors and the initial user vector to obtain the user-item aggregation vector $e$.
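The aggregation steps above can be illustrated with a small sketch. Several details are assumptions made here, because the formulas are not reproduced in this text: the score is taken as a linear function of the concatenated mixed vectors, the weights are normalized with a softmax, and $e$ is formed as the mean of the user vector and the attention-weighted sum of item vectors. The names `attention_aggregate`, `w_att`, and `mix` are hypothetical.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))           # shift for numerical stability
    return z / z.sum()

def attention_aggregate(u, V, w_att, mix=lambda a, b: (a, b)):
    """u: (d,) user vector; V: (K, d) historical item vectors;
    w_att: (2d,) assumed scoring weights; mix: feature-sharing function
    standing in for the information interaction model (identity by default)."""
    scores = []
    for v in V:
        u_mixed, v_mixed = mix(u, v)                     # share features
        scores.append(np.concatenate([u_mixed, v_mixed]) @ w_att)
    weights = softmax(np.array(scores))                  # one weight per item
    e = 0.5 * (u + weights @ V)                          # assumed combination
    return e, weights

u = np.array([1.0, 1.0])
V = np.array([[1.0, 0.0], [0.0, 1.0]])
e, weights = attention_aggregate(u, V, w_att=np.zeros(4))
print(e, weights)
```

With all-zero scoring weights every item receives the same weight, so the sketch reduces to plain average aggregation, which corresponds to the behavior of an aggregation without attention.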

Prediction
First, we use the information interaction unit to process the head entity $h$ and the user-item aggregation vector $e$ to generate $h'$ and $e'$:
$$e', h' = M_C(e, h).$$
The tail entity $t$ and the new item $i$ are processed in the same way. Next, the head entity is combined with the tail entity to calculate the relation vector $r_{ht}$: the two vectors are concatenated and passed through a nonlinear transformation $\delta$, where $W_k \in \mathbb{R}^{2d}$ are trainable weights and $\|$ represents concatenation. Then, we calculate $\mathrm{Score}(h, r, t)$ between them using the distance function. Finally, $e'$ and $i'$ are multiplied to calculate $y_{ui}$.
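A minimal sketch of the prediction step is given below. It rests on assumptions that go beyond the text: the product of $e'$ and $i'$ is taken as an inner product followed by a sigmoid, $\delta$ is taken to be tanh, $W_k$ is given shape $(d, 2d)$ so that $r_{ht}$ is a $d$-dimensional vector, and the distance function is taken to be a negative squared Euclidean distance. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_click(e_mixed, i_mixed):
    """Assumed form of y_ui: sigmoid of the inner product of e' and i'."""
    return float(sigmoid(e_mixed @ i_mixed))

def relation_vector(h_mixed, t_mixed, W_k):
    """Assumed form of r_ht: tanh(W_k [h' || t']), with W_k of shape (d, 2d)."""
    return np.tanh(W_k @ np.concatenate([h_mixed, t_mixed]))

def kge_score(r_pred, r_true):
    """Assumed distance-based score: negative squared Euclidean distance."""
    return -float(np.sum((r_pred - r_true) ** 2))

d = 2
rng = np.random.default_rng(2)
h_mixed, t_mixed = rng.normal(size=d), rng.normal(size=d)
W_k = rng.normal(size=(d, 2 * d))
r_ht = relation_vector(h_mixed, t_mixed, W_k)
print(predict_click(np.zeros(d), t_mixed), kge_score(r_ht, r_ht))
```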

Learning
We initialize all users and items in the same vector space, obtain their low-dimensional vector representations, and then use the attention aggregation network to aggregate users and historical interactive items to obtain the user-item aggregation vector. We then use the information interaction model to share features with the nodes of the knowledge graph, and finally calculate the recommendation score and the KGE score. Over N training epochs, KANR gradually converges to the optimized parameters.
To better learn the parameters of KANR, we designed a complete loss function formed by adding three parts, $L_{RS}$, $L_{KGE}$, and $L_2$, which respectively express the loss function of the recommendation model, the loss function of the knowledge graph embedding model, and $L_2$ regularization. In $L_{RS}$, we use the cross-entropy function $J$ as the loss function of the recommendation model. In $L_{KGE}$, we calculate the score difference between the positive samples and the negative samples of the triples as the confidence of the embedding result, so as to improve the effect of the model. Finally, we add $L_2$ regularization to the complete loss function to prevent the model from overfitting; $\lambda_1$ and $\lambda_2$ are hyperparameters.
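The three-part loss can be sketched as below. This is an illustration under assumptions, not the paper's exact formula: it assumes that $\lambda_1$ weights the KGE term and $\lambda_2$ weights the regularization term, that a higher triple score means a more plausible triple, and that $L_{KGE}$ is the negated mean gap between positive and negative scores. The function name `combined_loss` is hypothetical.

```python
import numpy as np

def combined_loss(y_true, y_prob, pos_scores, neg_scores, params, lam1, lam2):
    """Sketch of L = L_RS + lam1 * L_KGE + lam2 * L_2 (assumed weighting)."""
    eps = 1e-12                                         # avoid log(0)
    l_rs = -np.mean(y_true * np.log(y_prob + eps)
                    + (1 - y_true) * np.log(1 - y_prob + eps))  # cross-entropy
    l_kge = -np.mean(pos_scores - neg_scores)           # widen the positive/negative gap
    l_2 = sum(np.sum(p ** 2) for p in params)           # squared L2 norm of parameters
    return float(l_rs + lam1 * l_kge + lam2 * l_2)

loss = combined_loss(
    y_true=np.array([1.0, 0.0]),
    y_prob=np.array([0.5, 0.5]),
    pos_scores=np.array([1.0]),
    neg_scores=np.array([0.0]),
    params=[np.array([1.0, 2.0])],
    lam1=0.01,
    lam2=0.001,
)
print(loss)
```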

Data and Experimental Environment
In this study, we used three datasets: MovieLens-1M, Last.FM, and Book-Crossing. MovieLens-1M comprises about 1 million explicit ratings (from 1 to 5) from the MovieLens website.
Last.FM contains the rating data of 2000 users of an online music system (ratings range from 1 to 352,698). The corresponding KG contains 9366 entities, 15,518 edges, and 60 relationship types.
Book-Crossing is a book rating dataset. It contains 1.1 million ratings for 270,000 books from 90,000 users. The score ranges from 1 to 10, including explicit and implicit scores. The Book-Crossing dataset is one of the sparsest datasets, and is also the sparsest dataset with explicit scores.
The user-item interaction graph is built from the samples in the click matrix of each dataset. In data preprocessing, we treat users and historical interaction items as entities, and the interaction behavior as relations. Because MovieLens-1M and Last.FM comprise scoring feedback, we use manually set rules to convert scoring feedback into click feedback. In the movie dataset, we selected records with 4 points and above (the full score is 5) as positive samples. In the music dataset, because the music scores are too sparse (from 1 to 352,698), they cannot be used as a criterion for evaluating user interests. Therefore, we set all the music clicked by the user as positive samples. Similarly, all interactive items (including 0 to 10 points) in Book-Crossing were regarded as positive samples.
To avoid too large a gap between the number of positive samples and the number of negative samples, we adopted negative sampling during training. This method randomly selected negative samples from the items in the dataset with which the user had not interacted until the numbers of positive samples and negative samples were equal. In addition, we treated different ratings as different types of interaction. In MovieLens-1M and Book-Crossing, we set up five and ten interaction types in the user-item interaction graph, respectively.
Table 1 shows the basic statistics of the three datasets and the hyperparameters of KANR. $\lambda_1$ and $\lambda_2$ were set to 0.01 and 0.001, respectively. (D represents the embedding vector dimension of the entity, and N represents the number of historical interactive items of the user.) KANR was implemented under the Windows 10 operating system, using Python 3.7, Tensorflow-gpu 1.12.0, cudnn 7.1.4, and NumPy 1.15.4. The experimental hardware environment was AMD R5 2600, GTX2070TI, and 16 GB of memory. In this experiment, the training set, evaluation set, and test set were split in a ratio of 6:2:2, and the hyperparameters of the model were adjusted according to the AUC index.
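The conversion of ratings to click feedback and the balanced negative sampling described above can be sketched as follows. The data layout (a dictionary from user to rated items), the per-user balancing, and the function name `to_click_samples` are assumptions for illustration; the threshold of 4 matches the rule stated for the movie dataset.

```python
import random

def to_click_samples(ratings, all_items, threshold=4, seed=0):
    """ratings: dict user -> {item: rating}. Returns (user, item, label) triples.

    Items rated at or above `threshold` become positives (label 1). For each
    user, an equal number of negatives (label 0) is drawn at random from the
    items that user never interacted with.
    """
    rng = random.Random(seed)
    samples = []
    for user, rated in ratings.items():
        positives = [item for item, score in rated.items() if score >= threshold]
        candidates = sorted(set(all_items) - set(rated))   # never interacted with
        n_neg = min(len(positives), len(candidates))
        negatives = rng.sample(candidates, n_neg)
        samples += [(user, item, 1) for item in positives]
        samples += [(user, item, 0) for item in negatives]
    return samples

toy_ratings = {"u1": {1: 5, 2: 3, 3: 4}}
samples = to_click_samples(toy_ratings, all_items=range(1, 11))
print(samples)
```

For the music and book datasets, where every interaction is treated as positive, the same sketch applies with a threshold at or below the minimum rating.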

Baseline
We selected seven baseline models to compare with KANR on the three datasets, including recommendation algorithms that combine knowledge graphs and classic recommendation algorithms. The baseline models were as follows:
Wide&Deep [29] proposed a recommendation algorithm that combines a deep model and a shallow wide model for fusion training. The embedding vector dimensions of users and items in Wide&Deep are unified to 64, and a double-layer deep channel with dimensions of 100 and 50 and a wide channel are set at the same time.
CKE [14] proposed a unified recommendation framework to embed multimodal data such as knowledge graphs, text information, and picture information into the recommendation task. This paper sets the embedding vector dimensions of CKE users and items in the three datasets to 64, 128, and 32, and the entity's vector embedding dimension is uniformly set to 32.
LibFM [30] is a feature-based factorization model widely used for CTR prediction. In this paper, TransR is used to train the initial user and item vectors.
RippleNet [5] is a hybrid method that uses the knowledge graph structure to assist with making recommendations. It completes personalized recommendation tasks by exploring the user's potential interest characteristics on the knowledge graph. This paper uses the TransE [14] algorithm to learn the embedding vectors of users and items.
KGCN [19] is a recommendation algorithm that uses knowledge graphs to convolve item features. The graph convolution operation greatly increases the utilization efficiency of the knowledge graph's network structure information. This paper sets the entity embedding vector dimensions of KGCN in the three datasets to 32, 64, and 16, and the number of neighbor nodes to 8.
MKR [10] proposed a multi-task feature learning method combining knowledge embedding and recommendation tasks. By combining the knowledge graph embedding algorithm and the recommendation system module, the latent information of the recommendation scenario and the knowledge graph can be exchanged.
GMCF [31] proposed a collaborative filtering model based on neural graph matching. By modeling and aggregating the attribute interactions in the graph matching structure, two types of attribute interaction are effectively captured.

Results
To comprehensively test the performance of KANR in the recommendation scenario, we conducted experiments on the two tasks of Top-K recommendation and CTR prediction, and compared the results of KANR with the above-mentioned baseline models. All experiments were performed four times, and the average value of each index was calculated.

Metrics
In CTR prediction, we used Area Under Curve (AUC) and Accuracy (ACC) to evaluate the performance of all models. AUC still gives a reasonable evaluation of the classifier when the samples are unbalanced. ACC describes the proportion of all predictions, positive and negative, that were correct. In the Top-K recommendation, we used Recall@K and Precision@K as metrics. Recall@K is the ratio of the number of relevant items among the Top-K results to the number of all relevant items. Precision@K is the proportion of the Top-K results that were relevant.
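The metric definitions above can be made concrete with a short sketch. The function names and the 0.5 decision threshold for ACC are assumptions chosen for illustration; the formulas themselves follow the standard definitions of accuracy, Precision@K, and Recall@K.

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """ranked_items: items sorted by predicted score, best first.
    relevant_items: set of items the user actually interacted with."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions (positive or negative) that match the labels."""
    correct = sum(int((p >= threshold) == bool(y)) for y, p in zip(y_true, y_prob))
    return correct / len(y_true)

p_at_2, r_at_2 = precision_recall_at_k([3, 1, 7, 5], {1, 5, 9}, k=2)
acc = accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
print(p_at_2, r_at_2, acc)
```

In the toy call, one of the top two items is relevant, so Precision@2 is 1/2 and Recall@2 is 1/3; two of the four thresholded predictions match their labels, so ACC is 1/2.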

The Performance in CTR Prediction
In CTR prediction, we used AUC and ACC to evaluate the performance of all models. In addition, we evaluated the enhancement effect of the knowledge graph embedding model and the attention model on KANR, as shown in Table 2.
Table 2. The performance of each model in CTR prediction. The best results are in bold.
KANR-KGE means that the model only trained the recommendation model and did not interact with the knowledge graph embedding model. KANR-ATT indicates that the model only used the average aggregation algorithm in the aggregation process of the user vector and the historical interaction item vectors, and did not add the item weights calculated by the attention model.
In CTR prediction, KANR slightly leads the baseline models in terms of AUC and ACC. On the movie dataset, KANR has a modest performance advantage, whereas its AUC on the music and book datasets is significantly improved. However, it lags slightly behind in ACC on the book dataset. In addition, when the knowledge graph embedding module or the attention module is excluded, the performance on the three datasets declines to different degrees, which indicates that both the semantic information in the knowledge graph and the attention model improve performance on the recommendation task. This also suggests that the information interaction unit of KANR can effectively share vector features.

The Performance in Top-K Recommendation
In the Top-K recommendation task, we recommend the K items with the highest matching degree for each user in the dataset after the model training is completed. Recall@K and Precision@K (K = 1, 2, 5, 10, 50, 100) are used as evaluation indicators to evaluate the performance of each model.
As shown in Figure 4, KANR achieved the best performance in the Top-K recommendations on the three datasets. In Recall@10, the performance of KANR on the movie, music, and book datasets was 6.04%, 11.38%, and 14.03% higher than the best baseline, respectively. In Precision@10, it also achieved performance gains of 9.09%, 7.6%, and 16.91%, respectively, which shows that KANR can effectively use the semantic information in the knowledge graph to enhance the recommendation model.
In general, the performance of the models based on the knowledge graph is superior to that of the traditional models. In addition, the performance of KANR, which uses the multi-task learning framework and the information interaction model, is superior to that of the other types of model, especially on the book dataset. The reason for this is that, with the information interaction model, the vector features can still be shared even when the knowledge triples in the dataset are sparse, so the recommendation system obtains effective semantic information from the knowledge graph and is not affected by knowledge sparseness.

The Performance in the Cold Start Environment
This section verifies whether KANR can effectively alleviate the cold start problem. We simulated the cold start environment by adjusting the proportion r of the training set. When r = 20%, the AUC of the five baselines decreased by 7.12%, 6.23%, 4.91%, 2.01%, and 2.42%, respectively, whereas the performance of KANR dropped by only 1.5%. The results of the models under the AUC evaluation index are shown in Figure 5. This shows that KANR can achieve better results than the baselines in cold start scenarios. Similarly, in this experiment, the recommendation strategies combined with the knowledge graph performed better than traditional recommendation methods such as Wide&Deep. This indicates that the side information of the knowledge graph can alleviate the cold start problem of the collaborative filtering model.
The effect of the entity embedding dimension d is reported in Table 3. With other parameters unchanged, the AUC of KANR on the movie dataset gradually increased with the dimension d and reached its optimal performance at 64 dimensions. On the music and book datasets, the model achieved the best performance at 16 dimensions, and the performance gradually decreased as the dimension increased further. This shows that choosing an entity embedding dimension suited to each dataset yields the most effective data characteristics.

Discussion
The key factor in the recommendation task is the embedding of users and items. CF-based methods have the disadvantage of insufficient features due to sparse data. We found that the semantic information in the knowledge graph can effectively improve the quality of recommendations. Therefore, we proposed KANR, an end-to-end recommendation method that uses the semantic information in the user-item interaction knowledge graph to enhance the attention aggregation network. It contains three parts, namely, a recommendation model, a knowledge graph embedding model, and an information interaction unit. The recommendation model uses the attention aggregation network to calculate the weights of the historical interactive items and aggregate the item vectors to capture the user's interest preferences. The knowledge graph embedding model takes the item to be recommended as the head entity and the user as the tail entity, and uses the Semantic Matching Energy Model (SME) to obtain their semantic vectors. The recommendation model and the knowledge graph embedding model share vector features through the information interaction unit: the vector of the item to be recommended and the head entity vector share features, and the user-item aggregation vector and the tail entity vector share features. It is worth mentioning that KANR changes the traditional attention mechanism and uses the information interaction model to share features between the user vector and the historical interaction item vectors. The advantage of this is that it introduces knowledge learned during information interaction into the attention mechanism. The experiments indicate that the semantic information in the knowledge graph can effectively enhance the recommendation model through the information interaction model.

Conclusions
The interaction between a user and an item represents the user's preference. We found that the semantic information of interactions can be used to improve the quality of the recommendation system. We proposed KANR and demonstrated its effectiveness through experiments. Based on this research, we believe that the semantic relationships of the knowledge graph have great value in recommendation systems.
In the future, we aim to design a more effective information interaction unit to combine the recommendation model and the knowledge graph embedding model, and to test the recommendation performance of different knowledge graph embedding models. In addition, we will explore combining the knowledge graph with sequential recommendation to realize a recommendation method that uses knowledge graph reasoning.