3.1. Knowledge Graph Introduction
Our work is based on the knowledge graph for the extraction of users' interests and the recommendation of items. We define the KG for a particular recommendation scenario as $G = \{(h, r, t) \mid h, t \in E,\ r \in R\}$, where $E$ and $R$ are the sets of entities and relations in the knowledge graph, respectively, and $h$, $r$ and $t$ represent the head, relation and tail of a knowledge triple. In the recommendation scenario, the KG consists of a set of items and their related entities (such as the items' attributes, external knowledge, etc.). For example, for the movie *Titanic* (Titanic, film.film), the KG has a knowledge triple (Titanic, film.film.star, Leonardo DiCaprio) indicating that Leonardo DiCaprio is the star of *Titanic*.
It is worth mentioning that our model is based on user-centered extraction through the knowledge graph, but the original knowledge graph does not include user entities. In this work, we embed each user as a new entity type in the knowledge graph and define a new relation type to connect users to the item entities they have interacted with. For example, Mike has seen *The Dark Knight*, so we add a new triple (Mike, film.film.watch, The Dark Knight) to the movie KG. The following
Table 1 shows the key symbols and their meanings in this paper.
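As a concrete illustration of how user interactions become new knowledge triples, the following minimal Python sketch builds a toy triple store; the container layout and the identifiers `triples`, `interactions` and `graph` are our own illustrative choices, not part of the paper.

```python
# Toy KG as (head, relation, tail) triples, following the paper's examples.
from collections import defaultdict

triples = [
    ("Titanic", "film.film.star", "Leonardo DiCaprio"),
    ("The Dark Knight", "film.film.star", "Christian Bale"),
]

# Embed users as new entities: one new triple per user-item interaction.
interactions = [("Mike", "The Dark Knight")]
for user, item in interactions:
    triples.append((user, "film.film.watch", item))

# Index the graph for neighbor lookups; treating edges as undirected is our
# assumption for the neighborhood search, not a claim of the paper.
graph = defaultdict(list)
for h, r, t in triples:
    graph[h].append((r, t))
    graph[t].append((r, h))

print(graph["Mike"])  # [('film.film.watch', 'The Dark Knight')]
```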
  3.2. Model Framework Introduction
The NACF model takes four inputs: the user $u$, the item $i$, the user–item interaction matrix $M$ and the corresponding knowledge graph $G$. Under these conditions, NACF discovers the potential interests of users in the KG and extracts the hidden features between the various entities.
Specifically, given the user ID $u$, the item ID $i$ and the neighbor set $\mathcal{N}(e)$ of each entity $e$ in the KG, NACF predicts whether $u$ has potential interest in an item $i$ that $u$ has not interacted with. The user's neighbors in the KG are the item entities that appear in the user–item interaction records. The whole process learns the following prediction function:

$$\hat{y}_{ui} = F(u, i \mid w, G),$$

where $\hat{y}_{ui}$ represents the probability that user $u$ will participate in item $i$, $F$ represents the recommendation function and $w$ represents the trainable parameters of $F$.
For each particular user–item pair, the neighbor features of the user are iteratively aggregated onto the user entity $e_u$. This includes three steps: neighborhood construction, neighborhood aggregation and prediction.
- (1) Neighborhood Construction
We take into account the entities within the $n$-hop range of the KG when extracting the users' features. As shown in the left sub-picture of Figure 2, the neighbors in the 1-hop range of $u$, denoted $\mathcal{N}^1(u)$, are composed of directly connected entities (items with an interaction record); the neighbors in the 2-hop range, $\mathcal{N}^2(u)$, are composed of the neighbors of the directly connected entities in the 1-hop range, and so on. Assuming the $n$-hop neighbor set of $u$ is $\mathcal{N}^n(u)$, the neighborhood construction of $u$ is as follows:

$$\mathcal{N}^n(u) = \big\{ t \mid (h, r, t) \in G,\ h \in \mathcal{N}^{n-1}(u) \big\}, \qquad \mathcal{N}^0(u) = \{u\}.$$
Such a construction method has an obvious advantage: the model expands the entity neighborhood through the KG and can explore the potential interests of the user more broadly and deeply. The range of the neighborhood construction has a great impact on the final results of feature extraction. The goal of this step is to find the right neighborhood range to fully exploit the potential knowledge without introducing too much noise; a minimal sketch of the construction is given below.
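The sketch below follows the set definition above, assuming the `graph` adjacency dict from the earlier snippet; duplicates across hops are not filtered, since the definition does not exclude them.

```python
def n_hop_neighbors(graph, u, n):
    """Return [N^1(u), ..., N^n(u)], with N^k(u) built from N^(k-1)(u)."""
    layers = []
    frontier = {u}  # N^0(u) = {u}
    for _ in range(n):
        # N^k(u): every tail reachable by one triple from an entity in N^(k-1)(u).
        frontier = {t for h in frontier for _, t in graph.get(h, [])}
        layers.append(frontier)
    return layers

hops = n_hop_neighbors(graph, "Mike", 2)
# hops[0] = {'The Dark Knight'}; hops[1] contains 'Christian Bale' and 'Mike'
```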
- (2) Neighborhood Aggregation
Before neighborhood aggregation, NACF initializes all entities and relations as trainable, randomly initialized $d$-dimensional vectors. If the neighborhood of an entity spans $n$ hops ($n$ layers), the aggregation for that entity is iterated $n$ times in total. In the $h$-th iteration, all entities in layer $n-h$ aggregate their neighbors' information and update their embedding vectors. After one aggregation operation is completed, the updated representation of an entity comes from the fusion of itself and its neighboring entities. We call such an aggregation operation a sub-aggregation, denoted $Agg$. Once the model has iterated $n$ times (the knowledge graph neighborhood converges to the user entity $e_u$), the neighborhood aggregation of the user is complete.
In an $n$-layer neighborhood, a 1-hop entity $e$ will update a total of $n-1$ times and the final representation of $e$ is $e^{(n-1)}$. For example, in the 2-hop aggregation scenario in Figure 2, the first iteration updates $e^{(0)}$ to $e^{(1)}$ for each 1-hop entity, and $e_u^{(0)}$ is updated to $e_u^{(1)}$, which we call the first-order representation of the entity. In the second iteration, $e_u^{(1)}$ is updated to $e_u^{(2)}$, which we call the second-order representation of the entity. The specific sub-aggregation $Agg$ will be described in Section 3.3.
- (3) Prediction
When the neighborhood aggregation of $e_u$ is completed, the user's final embedding $e_u^{(n)}$ and the item's entity embedding $e_i$ are combined by a dot product to generate a prediction score. Finally, this score is normalized by the sigmoid function into the predicted click rate $\hat{y}_{ui} = \sigma\big(e_u^{(n)\top} e_i\big)$.
The NACF framework is shown in Algorithm 1:

[Algorithm 1: NACF algorithm.]
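Since Algorithm 1 survives only as a figure, the following Python sketch is our hedged reconstruction of the framework from the prose of Section 3.2: build the $n$-hop neighborhood, run $n$ rounds of sub-aggregation from the outer layers inward, then score with a dot product and a sigmoid. The function names and the `agg` callback are stand-ins, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nacf_predict(u, i, emb, hop_layers, agg):
    """u, i: user/item IDs; emb: dict of initial d-dim embeddings;
    hop_layers: [N^1(u), ..., N^n(u)]; agg(e, state) -> updated vector."""
    n = len(hop_layers)
    state = dict(emb)                                 # e^(0): initial embeddings
    for h in range(1, n + 1):
        # In the h-th iteration, entities up to layer n-h aggregate neighbors.
        active = {u}.union(*hop_layers[: n - h])
        updates = {e: agg(e, state) for e in active}  # synchronous sub-aggregation
        state.update(updates)
    return sigmoid(state[u] @ emb[i])                 # dot product -> sigmoid -> CTR
```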
  3.3. Aggregation Process
The aggregation of a user–item pair $(u, i)$ includes the sub-aggregation of all entities in the current neighborhood. Sub-aggregation is a single step of aggregating information from directly connected neighbors into an entity. As shown in Algorithm 1, given the user ID $u$ and the item ID $i$, we define the $h$-th sub-aggregation process $Agg$ of an entity $e$ as follows:

$$e^{(h)} = Agg\big(e^{(h-1)}, \mathcal{N}(e)\big),$$

where $h \in \{1, \dots, n\}$ and $e^{(0)}$ is the initial embedding of $e$.
As shown in Figure 3, we use the neighboring sub-aggregation process of an entity $e$ as an example to illustrate $Agg$ in NACF. We use $\mathcal{N}(e)$ to represent the collection of entities that are directly connected to $e$ and assume that $\mathcal{N}(e) = \{e_1, e_2, e_3\}$. $r_{e,e_j}$ represents the relation between $e$ and $e_j$. The corresponding link relations are $r_{e,e_1}$, $r_{e,e_2}$, $r_{e,e_3}$.
First, we design an attention module to assign different weights to the entities in the neighbor collection, since we believe that aggregation without distinction introduces too much noise and is unreasonable. In NACF, we consider both the degree of the user's interest in different relations and the similarity between each neighboring entity and the recommended entity. In this way, the weights of the different entities in the aggregation are determined.
The attention module for calculating the weight is shown in the right sub-picture of Figure 3. For the weight calculation of the entity $e_j$, two parts of the weight ($\alpha_{u,r_{e,e_j}}$ and $\beta_{i,e_j}$) are calculated separately in the attention module. $\alpha_{u,r_{e,e_j}}$ represents the user's attention to the relation $r_{e,e_j}$. For example, one user may be more inclined to choose a particular singer when listening to music; another may care more about the style of the music. $\beta_{i,e_j}$ is the similarity between the current neighbor $e_j$ and the recommended item $i$. We believe that entities with a high similarity to the recommended entity have a greater impact on the user's choice. Finally, the weight $w_{e_j}$ of a neighboring entity is the result of multiplying the two parts ($\alpha_{u,r_{e,e_j}}$ and $\beta_{i,e_j}$). The process is as follows:

$$\alpha_{u,r_{e,e_j}} = \sigma\big(W_1^\top [e_u \,\|\, r_{e,e_j}] + b_1\big),$$

$$\beta_{i,e_j} = \sigma\big(W_2^\top [e_i \,\|\, e_j] + b_2\big),$$

$$w_{e_j} = \alpha_{u,r_{e,e_j}} \cdot \beta_{i,e_j},$$

where $\|$ denotes vector concatenation, $W_1$ and $W_2$ are trainable vectors with $2d$ dimensions, and $b_1$ and $b_2$ are one-dimensional trainable parameters.
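A minimal sketch of this two-part weight, assuming the concatenation-based form written above; since the original equations were not preserved, the exact functional form is our reading of the text rather than the paper's definitive formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neighbor_weight(e_u, r_j, e_i, e_j, W1, b1, W2, b2):
    """w_{e_j} = alpha(user, relation) * beta(item, neighbor).

    W1, W2: trainable vectors of shape (2d,); b1, b2: scalar biases."""
    alpha = sigmoid(W1 @ np.concatenate([e_u, r_j]) + b1)  # user's attention to r_j
    beta = sigmoid(W2 @ np.concatenate([e_i, e_j]) + b2)   # neighbor/item similarity
    return alpha * beta
```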
In order to integrate the neighborhood information of $e$, we perform a cumulative operation on the weighted neighbors to generate an aggregated vector $e_{\mathcal{N}}$. The final step of the sub-aggregation is to calculate the mean of the original entity representation $e^{(h-1)}$ and the neighborhood representation $e_{\mathcal{N}}$, and update the entity representation of $e$ to $e^{(h)}$. The update process of $e$ is as follows:

$$e_{\mathcal{N}} = \sum_{e_j \in \mathcal{N}(e)} w_{e_j}\, e_j, \qquad e^{(h)} = \frac{1}{2}\big(e^{(h-1)} + e_{\mathcal{N}}\big).$$
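A short sketch of one sub-aggregation step under these equations: a weighted sum over the neighbors, then the mean with the entity's previous representation.

```python
import numpy as np

def sub_aggregate(e_prev, neighbor_vecs, weights):
    """e_prev: e^(h-1), shape (d,); neighbor_vecs: shape (|N(e)|, d);
    weights: w_{e_j} per neighbor, shape (|N(e)|,). Returns e^(h)."""
    e_nbr = np.sum(weights[:, None] * neighbor_vecs, axis=0)  # e_N: weighted sum
    return 0.5 * (e_prev + e_nbr)                             # mean of e^(h-1) and e_N
```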
In the actual knowledge graph, there may be significant differences in the number of neighbors for each entity. To facilitate more efficient batch operation of the model, we extract a fixed-size neighbor set as a sample for each entity instead of its complete neighbor set. Specifically, the real neighborhood of an entity $e$ in the knowledge graph is $\mathcal{N}(e)$. We set its calculation neighbor set to $S(e)$, where $S(e)$ samples $K$ neighbors from $\mathcal{N}(e)$ and $K$ is a hyper-parameter.
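A possible implementation of the fixed-size sampling $S(e)$; sampling with replacement when an entity has fewer than $K$ neighbors is our assumption, made so that every entity yields exactly $K$ samples for batching.

```python
import random

def sample_neighbors(neighbors, K):
    """Draw exactly K neighbors from N(e) to form S(e)."""
    neighbors = list(neighbors)
    if len(neighbors) >= K:
        return random.sample(neighbors, K)               # without replacement
    return [random.choice(neighbors) for _ in range(K)]  # pad by resampling
```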
  3.4. Complete Loss Function
In order to better learn the NACF parameters and the knowledge graph embedding representation, we designed the following complete loss function:

$$\mathcal{L} = \sum_{(u,i)} \mathcal{J}\big(\hat{y}_{ui}, y_{ui}\big) + \lambda_1 \|w\|_2^2 + \lambda_2 \big(\|E\|_2^2 + \|R\|_2^2\big).$$

The loss function is divided into three parts. The first part represents the model prediction loss, and $\mathcal{J}$ is the cross-entropy loss function. The second part is the $L_2$ regularization of the trainable parameters $w$ of the model. The third part is the $L_2$ regularization of the knowledge graph embedding, where $E$ and $R$ are the embedding vectors of all entities and relations in the KG, respectively. $\lambda_1$ and $\lambda_2$ in $\mathcal{L}$ are configurable hyper-parameters. Because the above optimization problem is complicated, we use Adam [29] to iteratively optimize the loss function. We will discuss the choice of hyper-parameters in the experimental section.
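A hedged PyTorch sketch of this three-part objective; the parameter grouping and the default $\lambda$ values are placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def nacf_loss(y_pred, y_true, model_params, entity_emb, relation_emb,
              lambda1=1e-6, lambda2=1e-6):
    """Prediction loss J plus L2 penalties on w and on the KG embeddings E, R."""
    pred_loss = F.binary_cross_entropy(y_pred, y_true)            # cross entropy J
    reg_w = sum(p.pow(2).sum() for p in model_params)             # ||w||_2^2
    reg_kg = entity_emb.pow(2).sum() + relation_emb.pow(2).sum()  # ||E||^2 + ||R||^2
    return pred_loss + lambda1 * reg_w + lambda2 * reg_kg

# The combined objective is then minimized iteratively with Adam, e.g.:
# optimizer = torch.optim.Adam(params, lr=1e-3)
```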