1. Introduction
Knowledge graphs (KGs) are directed graphs composed of nodes (entities) and edges (relations). Two nodes connected by an edge form a triple containing a head entity, a relation, and a tail entity [1]. However, knowledge graphs, whether constructed automatically or manually, leave a large number of implicit triples unstated. For example, only 2% of the person entities in the Wikidata dataset used in our experiments have father information. The main purpose of knowledge graph completion (KGC) is to extract implicit triples from graph-structured data, which helps to improve the performance of downstream KG tasks [2,3,4]. The KGC task has therefore attracted increasing attention from researchers and has become a major research topic.
Most existing KGC work requires a large number of entity pairs for relation inference. However, in many large-scale knowledge graphs, most relations are associated with only a small number of entity pairs, so the relations that need to be completed lack sufficient features to learn from, which makes embedding representation more difficult [5,6]. Handling relations with a limited number of entity pairs is therefore both important and challenging. Researchers have proposed the few-shot knowledge graph completion (FKGC) task to address representation learning for long-tailed data in KGs [7,8,9]. FKGC models based on meta-learning measure the distance between the query set and the support set, and then determine whether the query entity pair satisfies the relation represented by the support set.
In Table 1, we observe that the main semantic association among the triples of the support set, query set, and candidate set is the relation. Therefore, extracting accurate relation semantic features helps the model to discover connections between triples. In FKGC, the head and tail entities in the triple and the graph structure information around the triple can enhance the perception of the relation, which helps to understand the relation semantics [10,11]. We therefore propose a model that enhances relation representation using the external structure and internal semantics of triples for FKGC.
Unlike existing FKGC models, which produce insufficient relation features because they model only the distance between nodes, our proposed model aims to deepen the understanding of relation semantics through the multifaceted features of triples. An encoding strategy that incorporates relation features and an attention mechanism into the graph convolutional network (GCN) is designed to encode triple structure features. Afterward, we design shared merged variables to map the internal and external semantic features into the relation embedding space. The relation semantic enhancement produced by this feature interaction strengthens the semantic association among triples with the same relation. Finally, an attention convolution layer is designed to extract the prototype from the obtained relation semantics to construct the prototype network. The contributions of this paper are summarized as follows:
1. A method of introducing structural features around triples into relation feature representation is proposed. The deep relation semantic features of triples are learned by employing graph structural features. An encoder based on a GCN with attention and relation features is designed to encode reference embeddings, so the obtained graph structure information can be used to enrich the relation features of references in few-shot scenarios.
2. This paper proposes a hybrid feature mapping method, through which the external structure features and the internal features can be introduced effectively. The shared merged variables are used to map the entity and graph structure features into the relation embedding space. This strengthens the ability to perceive the relation semantics, so that the associations among triples with the same relation are found more easily.
3. An encoding framework for enhancing relation semantic information is constructed. We design the prototype network based on attention convolution to emphasize important features. Meanwhile, a relation prototype extraction method is proposed to improve the training effect of the prototype network. It can make good use of the relations in sentences to complete the triples.
  2. Related Works
The GCN is widely adopted in the KGC task because it handles graph-structured data. The main idea of GCN-based KGC models [12,13,14] is to encode nodes by mining graph structure features, and then find the correlation among triples for link prediction. Schlichtkrull et al. [15] argue that much information is missing from the neighbor structure of a node. Based on this idea, they propose the R-GCN encoder to encode the entities and then perform link prediction. Vashishth et al. [16] proposed the CompGCN model, which jointly embeds relations and nodes to alleviate the problem of too many parameters in representation learning. Dai et al. [17] constructed the RAGAT model, which introduces the graph attention network (GAT) to calculate the importance of neighbors to the central node. In this paper, we use the GCN to bring the background structure features of a small number of samples into feature learning, which improves the performance of the model.
In recent years, a growing number of researchers have become interested in few-shot learning, which has promoted the development of tasks in few-shot scenarios. The methods commonly used in few-shot learning can be divided into three categories. The model-based method [18] uses a specially designed model to update the weights from a few samples; for example, Santoro et al. [19] used a memory-augmented model to classify a small number of samples. The metric-based method [20,21] uses the nearest neighbor idea to calculate the distance from the query set to the support set; for example, prototype networks [22] convert the classification task into a nearest neighbor task by finding the prototype in the embedding space. The optimization-based method [23] fits few-shot learning tasks by adjusting the gradient of the model; for example, Finn et al. [24] proposed a model that achieves good generalization performance with a small number of iterative steps. The metric-based method, which models the distance distribution between samples, matches the link prediction requirements of the FKGC task.
FKGC is an emerging task that has received wide attention. Xiong et al. [25] proposed the GMatching model, which uses a GCN to encode the first-order neighbor features of nodes and adopts an LSTM to calculate the similarity between the support set and the query set. Chen et al. [26] established a meta-learning framework called MetaR, which transfers relation and gradient information for link prediction. Sheng et al. [27] designed the FAAN model, whose encoder learns dynamic attribute representations of nodes and handles the multi-relation semantics of triples. Li et al. [28] proposed a relation-specific context learning framework, which uses the context of triples to capture the semantic information of relations and entities. However, these methods do not express the important semantic features of relations in FKGC comprehensively enough, which leads to a weak correlation among triples with the same relation. Therefore, enhancing the model’s perception of relations is the key to this task.
  3. Background
A knowledge graph $\mathcal{G}$ is composed of the entity set and the relation set in the form of triples $(h, r, t)$. Because many entity pairs still have implicit relations, KGs are incomplete, and the link prediction task is proposed to find the third element of a triple given the other two [29,30]. In this work, we focus on predicting tail entities from known head entities and relations. Specifically, negative triples are built by replacing the true tail entities with other entities, and the model parameters are then trained through positive and negative sampling.
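The tail-replacement step can be illustrated with a minimal Python sketch. The function name `corrupt_tail`, the toy entities, and the filter that rejects replacements which recreate a known true triple are all assumptions made for illustration; the text above only states that true tails are replaced with other entities.

```python
import random


def corrupt_tail(triple, entities, known_triples, rng):
    """Build a negative triple by swapping the tail for another entity."""
    head, relation, tail = triple
    # Assumption: skip replacements that would recreate a known true triple.
    candidates = [
        e for e in entities
        if e != tail and (head, relation, e) not in known_triples
    ]
    if not candidates:
        raise ValueError("no entity available to corrupt the tail")
    return (head, relation, rng.choice(candidates))


# Toy data with hypothetical entity and relation names.
entities = ["Paris", "Berlin", "France", "Germany"]
known_triples = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
}
positive = ("Paris", "capital_of", "France")
negative = corrupt_tail(positive, entities, known_triples, random.Random(0))
print(positive, "->", negative)
```

The head and relation are kept fixed, so each negative triple differs from its positive counterpart only in the tail, which matches the tail-prediction setting described above.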
The FKGC task trains the model with a small amount of support set and query set data and then judges the facts in the candidate set [31,32,33]. We refer to the triples with the same relation in the support set as references. In the meta-learning-based FKGC process, we sample $K$ triples with the same relation to form a support set $S_r = \{(h_k, r, t_k)\}_{k=1}^{K}$, $S_r \subset T_r$, where $T_r$ refers to the set of triples with the relation $r$. The query set consists of $Q$ triples picked from $T_r$ that are disjoint from $S_r$, denoted as $Q_r$. We replace the tail entities in the query set with other random entities from $\mathcal{G}$, and define a set of negatively sampled triples for model training, denoted as $Q_r^{-}$. Furthermore, the distance between $S_r$ and $Q_r$ is modeled to predict the authenticity of the triples, so as to find new implicit triples and then complete the knowledge graph. The model builds the meta-training set $T_{train}$ through the above sampling method to fully train the model parameters, and then feeds the meta-testing set $T_{test}$ into the trained model. The training data with relation $r$ is $D_r = \{S_r, Q_r\}$, and contains a set of positive samples and negative samples in $Q_r$. The model has access to a subset $\mathcal{G}'$ of the knowledge graph $\mathcal{G}$, which excludes all relations in $T_{train}$ and $T_{test}$.
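As a rough illustration of the episode construction described above, the following Python sketch splits the triples of one relation into a K-shot support set and a disjoint query set. The function name `sample_episode` and the toy triples are hypothetical, and drawing both sets in a single random sample is just one simple way to guarantee disjointness.

```python
import random


def sample_episode(relation_triples, k_shot, n_query, rng):
    """Split the triples of one relation into disjoint support and query sets."""
    if len(relation_triples) < k_shot + n_query:
        raise ValueError("not enough triples for this episode")
    # Sampling without replacement keeps the two sets disjoint.
    picked = rng.sample(relation_triples, k_shot + n_query)
    return picked[:k_shot], picked[k_shot:]


# Ten toy triples that all share the relation "r".
triples_r = [(f"h{i}", "r", f"t{i}") for i in range(10)]
support, query = sample_episode(triples_r, k_shot=5, n_query=3, rng=random.Random(0))
print(len(support), len(query))
```

Negative query triples would then be produced from `query` by tail replacement, and a meta-training set is a collection of such episodes over the training relations.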
  4. Methods
  4.1. Model Framework
An overall framework of a prototype network-based enhanced relation representation model (ERRM) is designed for the FKGC. As shown in Figure 1, the framework includes three parts:
(a) The local subgraph structure encoding of triples is proposed. In this part, entity and relation attribute features are fused into the context structure of triples, and then are fed into the GCN by the Transformer to obtain triple embeddings with structure semantics, so as to enrich the semantic information of relations in the triples.
(b) In order to obtain an enhanced triple relation embedding representation, we design the mapping matrix to map the semantic information of the external structure and the internal head and tail entities of the triple to the target relation embedding space, so that the relation embedding can obtain the surrounding information, to achieve the purpose of enhancing the relation semantics.
(c) The obtained triple representation is fed into meta-learning in the form of the support set and query set. We design a new meta-learning method based on the prototype network, which obtains the prototype representation by attention convolution to accurately extract relation features. Finally, the distances between the support prototype and the query triples are calculated to construct the loss function and train the model.
  4.2. Triple Local Structure Semantic Encoding
We establish a local subgraph for each few-shot reference on $\mathcal{G}'$ by breadth-first search, which includes not only the direct neighbor entities and relations of the references, but also their multi-hop neighbors. The TransE model is used to initialize the entity and relation embeddings in $\mathcal{G}$ and $\mathcal{G}'$, including the embedding set in the subgraph corresponding to the reference, denoted as $\{(e_i, r_i)\}_{i=1}^{n}$, where $e_i$ is a direct or multi-order neighbor of $h$ or $t$, $r_i$ is the corresponding relation, and the embedding dimension is $d$. The Laplacian matrix of the subgraph is $\tilde{L} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$, where $\tilde{A} = A + I$, $A$ is the adjacency matrix, $\tilde{D}$ is the degree matrix, and $I$ is the identity matrix.
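The two ingredients of this step, breadth-first neighbor collection and a normalized graph matrix, can be sketched in plain Python as follows. The helper names, the adjacency-list layout, and the toy graph are hypothetical. The normalization shown is the common symmetric form with self-loops from the GCN literature, which is an assumption about the exact form intended here.

```python
import math
from collections import deque


def bfs_neighbors(adj, seeds, max_nodes):
    """Collect up to max_nodes neighbours of the seed entities in BFS order.

    adj maps an entity to a list of (relation, neighbour) pairs.
    Returns {neighbour: hop distance from the nearest seed}.
    """
    hops = {s: 0 for s in seeds}
    queue = deque(seeds)
    found = {}
    while queue and len(found) < max_nodes:
        node = queue.popleft()
        for _relation, nxt in adj.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                found[nxt] = hops[nxt]
                queue.append(nxt)
                if len(found) == max_nodes:
                    break
    return found


def normalized_adjacency(a):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix."""
    n = len(a)
    a_tilde = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


# Toy background graph around a reference with head "h" and tail "t".
adj = {"h": [("r1", "a")], "t": [("r2", "b")], "a": [("r3", "c")]}
neighbors = bfs_neighbors(adj, ["h", "t"], max_nodes=20)
laplacian = normalized_adjacency([[0.0, 1.0], [1.0, 0.0]])
print(neighbors, laplacian)
```

The hop distances returned by `bfs_neighbors` are the kind of information a hop-aware position encoding can consume later.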
  4.2.1. Fusion of Relation Feature
In a traditional GCN, the relation between nodes does not participate in the structure feature fusion [34,35]. The neighbor relation helps to understand the semantics of triples, and the purpose of encoding triples is to enhance the relation semantics of the reference, so the structure encoding depends on the semantic features of the neighbor relation. How to effectively introduce the neighbor relation features required by triples into the model is a challenge.
Based on the traditional GCN, a new method of fusing relation and entity features is designed. The in-degree and out-degree directions of the relation have different semantics for the node. For example, the relation direction is represented by the difference between the head and the tail entity in TransE. To express the relation direction in GCN, we introduce the relation direction parameter:
          where 
, 
 and 
 are in-degree and out-degree relation sets, respectively. 
r are relation embeddings in the reference, and 
 are self-loop, in-degree, and out-degree relation direction parameters, respectively, then 
. The updated function of triple embeddings is:
          where 
 is the layer number of GCN, 
 is the learnable weight parameter of the 
k-th layer GCN, and 
 is the updated function of GCN. It should be noted that we combine the head and tail entities of the triples in the subgraph, expressed as 
, and its self-ring relation is 
r. 
 is the Transformer module, the specific operation is shown in 
Section 4.2.2. 
 is the next layer triple embeddings obtained after the weight update of the 
k-th layer GCN.
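To make the idea of direction-aware relation fusion more concrete, here is a deliberately simplified Python sketch. It assumes scalar direction parameters, additive merging of entity and relation embeddings, and mean aggregation with no learnable layer weight or Transformer step; all of these are simplifying assumptions and the names `merge` and `aggregate` are hypothetical, so this is not the update function of the model itself.

```python
DIRECTIONS = ("self", "in", "out")


def merge(entity_vec, relation_vec, direction, dir_params):
    """Add the direction-scaled relation embedding to the entity embedding."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    scale = dir_params[direction]  # assumption: one scalar per direction
    return [e + scale * r for e, r in zip(entity_vec, relation_vec)]


def aggregate(messages):
    """Average the merged messages dimension by dimension."""
    n = len(messages)
    return [sum(col) / n for col in zip(*messages)]


# Hypothetical direction parameters and 2-dimensional toy embeddings.
dir_params = {"self": 1.0, "in": 0.5, "out": -0.5}
messages = [
    merge([1.0, 0.0], [0.2, 0.2], "self", dir_params),  # self-loop message
    merge([0.0, 1.0], [0.4, 0.0], "in", dir_params),    # in-degree neighbour
    merge([1.0, 1.0], [0.0, 0.4], "out", dir_params),   # out-degree neighbour
]
updated = aggregate(messages)
print(updated)
```

Using different parameters for in-degree and out-degree relations lets the same relation embedding contribute differently depending on which side of the edge the central node sits.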
  4.2.2. Triplet Structure Feature Encoding Based on Graph-Transformer
A GCN based on the spectral domain aggregates the neighbor embedding features of each node through the Laplacian matrix [36]. However, the contribution of neighbor features is inconsistent, so a graph-Transformer module is designed to emphasize the role of different neighbors in the process of aggregating information. The sequence position embeddings are obtained by the position encoding method in the Transformer as follows:
$$p_i^{seq} = \mathrm{PE}(pos_i)$$
          where $pos_i$ is the position of the $i$-th neighbor (entity and corresponding relation) in sequence $X$. PE is the position embeddings acquisition operation and $p_i^{seq}$ are the sequence position embeddings of the $i$-th neighbor. In addition, different from the previous model [37], we design a hop-position encoding strategy for the graph structure as follows:
$$p_i^{hop} = \mathrm{PE}(hop_i)$$
          where $hop_i$ is the hop of the $i$-th neighbor in the subgraph, and $p_i^{hop}$ are the hop position embeddings of the $i$-th neighbor.
          
          where 
 is a sequence set. The neighbor fusing embedding set with attention and position information obtained by 
 is fed into the GCN. Finally, the triple embeddings with graph structure features are gained.
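The position encoding can be sketched with the standard sinusoidal scheme of the Transformer, applied once to the sequence index and once to the hop count of a neighbor. Summing the two encodings is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def sinusoidal_encoding(position, dim):
    """Standard Transformer encoding: sine on even indices, cosine on odd ones."""
    vec = []
    for j in range(dim):
        angle = position / (10000 ** (2 * (j // 2) / dim))
        vec.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
    return vec


def neighbor_position_features(seq_index, hop, dim):
    """Combine the sequence-position and hop-position encodings of a neighbour."""
    seq_pe = sinusoidal_encoding(seq_index, dim)
    hop_pe = sinusoidal_encoding(hop, dim)
    # Assumption: the two encodings are summed element-wise.
    return [s + h for s, h in zip(seq_pe, hop_pe)]


# Third neighbour in the sequence, two hops away, with 8-dimensional encodings.
features = neighbor_position_features(seq_index=3, hop=2, dim=8)
print(features)
```

Two neighbors at the same hop distance share the hop component but differ in the sequence component, so the attention module can tell apart both their order and their distance from the reference.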
  4.3. Embedding Mapping for Enhancing Relation Semantics
This paper proposes a strategy to enhance relation semantics by relying on the surrounding information of relations. The local graph structure feature embeddings and the head–tail entity fusing embeddings are used as information to improve the understanding of relation semantics. Furthermore, a mapping method is designed to put the context of the relation and relation features in the same embedding space.
We define the shared merged variables for a specific matrix combination operator to map the learned surrounding features to the target relation embedding space. The important semantic information is transmitted to the relation embeddings by this mapping method to enhance the relation semantics. In addition, the sparsity of specific variables is constrained to avoid excessive interaction between relations and other embedding features in different dimensions, thus preventing overfitting. The obtained structure embeddings 
 and the combined embeddings 
 of the head and tail entities are mapped to the embedding space unified with the relation embeddings as follows: 
        where 
 are the specific matrix variables of relation, structure, and entity weight, and 
 are the bias terms of the structure and entity combination, respectively. The enhanced relation semantic embeddings are obtained through two groups of mappings, 
 and 
, 
. Furthermore, we adopt the merging method to feed the obtained two enhanced relation embeddings into meta-learning, as shown in the formula:
        where 
 is the fusing method selected to concatenate relation embeddings in the channel dimension. The merged relation embeddings are represented as 
, 
. The purpose of concatenation is to retain as much semantic information as possible, which more accurately represents relations through convolution and attention in the prototype network.
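The mapping and merging steps can be pictured with the following minimal Python sketch. It assumes that each feature vector is passed through an affine map and then added to the relation embedding, and that the two enhanced embeddings are stacked as two channels. The affine-plus-addition form stands in for the combination operator, whose exact form is not reproduced here, and all names are hypothetical.

```python
def affine(weight, bias, vec):
    """Return weight @ vec + bias for plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def enhance_relation(rel, struct, ent, w_struct, b_struct, w_ent, b_ent):
    """Return two relation channels: structure-enhanced and entity-enhanced."""
    mapped_struct = affine(w_struct, b_struct, struct)
    mapped_ent = affine(w_ent, b_ent, ent)
    # Assumption: mapped features are added to the relation embedding.
    rel_struct = [r + m for r, m in zip(rel, mapped_struct)]
    rel_ent = [r + m for r, m in zip(rel, mapped_ent)]
    # Stack along a channel axis so that no information is averaged away.
    return [rel_struct, rel_ent]


# Toy 2-dimensional example with identity maps and zero biases.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
channels = enhance_relation(
    rel=[1.0, 2.0], struct=[0.5, 0.5], ent=[1.0, 0.0],
    w_struct=identity, b_struct=zero_bias, w_ent=identity, b_ent=zero_bias,
)
print(channels)
```

Keeping the two enhanced embeddings as separate channels leaves it to the later convolution and attention layers to decide how much each source should contribute.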
The local structure features and entity features of the triples in the subgraph are mapped to the relation embedding space as shown in Algorithm 1.
        
| Algorithm 1 Embeddings mapping for enhancing relation semantics |
| Require: Knowledge graph; background knowledge graph |
| Ensure: Enhanced relation embedding set |
| 1: while unprocessed references remain do |
| 2:    Extract the local subgraph of the reference from the background knowledge graph; |
| 3:    Initialize the entity and relation embedding sets of the subgraph; |
| 4:    Introduce the in-degree and out-degree direction parameters for relations; |
| 5:    Get the sequence position embeddings and the hop position embeddings; |
| 6:    Feed the entities, relations, and corresponding position embeddings into the Transformer to obtain the merged entity and relation set; |
| 7:    Feed the merged set into the GCN to obtain the triple structure embeddings; |
| 8:    Map the subgraph structure embeddings into the relation embedding space to obtain new relation embeddings enhanced by structural features; |
| 9:    Map the combined entity embeddings into the relation embedding space to obtain new relation embeddings enhanced by entity features; |
| 10:    Concatenate the two enhanced relation embeddings to obtain the fused relation embeddings; |
| 11: end while |
| 12: Represent the K-shot support set as the set of fused relation embeddings; |
  4.4. Prototype Extraction Method and Model Training Based on Attention Convolution
In meta-learning-based FKGC, the candidate triples are assigned to the corresponding relational classes by measuring the distance between the query set and the support set. In the prototype network, the support prototype and the query instance in one way are placed in a unified embedding space to realize the triple classification task. We propose a prototype extraction method based on attention convolution to enhance the relation semantics representation in one way. In the support set, the 
K-shot 
 is represented by a set 
. We convert 
 into 
, as shown in 
Figure 2. Then, we perform two-layer attention convolutions on 
, where the middle layer embedding dimension 
, the Dropout layer, and the Normalize layer. Additionally, a self-attention mechanism is introduced as follows:
        where 
. 
 is obtained by assigning attention to 
, which aims to emphasize important relation semantic information in 
n way and prevent loss of attention information during convolution, to obtain prototype representation and 
. The common prototype extraction method is to perform an average operation on 
K-shot embeddings in 
. The Euclidean distance is used to calculate the distance 
d from the support prototype to the query set instance.
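For reference, the common averaging baseline mentioned above and the Euclidean distance to a query instance can be written in a few lines of Python. This sketch shows only the plain mean prototype, not the attention convolution proposed in this section, and the toy embeddings are made up for illustration.

```python
import math


def mean_prototype(embeddings):
    """Average the K support embeddings dimension by dimension."""
    k = len(embeddings)
    return [sum(col) / k for col in zip(*embeddings)]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Three toy support embeddings (K = 3) and one query embedding.
support_embeddings = [[1.0, 0.0], [3.0, 0.0], [2.0, 3.0]]
prototype = mean_prototype(support_embeddings)
distance = euclidean(prototype, [2.0, 5.0])
print(prototype, distance)
```

A smaller distance means the query instance is more likely to hold the relation that the prototype represents.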
In Section 3, there is a positive set and a negative set ($Q_r^{-}$) in $Q_r$. Furthermore, positive and negative triples are encoded to calculate the semantic similarity to the prototype. The distance set between the instance and the prototype is $\{d_i^{+}\}_{i=1}^{b}$ in the positive set and $\{d_i^{-}\}_{i=1}^{b}$ in the negative set, where $b$ is the batch size. The standard hinge loss function used to train the model is as follows:
$$\mathcal{L} = \sum_{i=1}^{b} \max\left(0,\ \gamma + d_i^{+} - d_i^{-}\right)$$
        where $\gamma$ refers to a margin separating positive and negative. The prototype network training process based on attention convolution is shown in Algorithm 2.
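A margin-based hinge loss over paired positive and negative distances can be sketched as follows. Summing over the batch rather than averaging is an assumption, and the function name and toy distances are hypothetical.

```python
def hinge_loss(pos_dists, neg_dists, margin):
    """Sum of max(0, margin + d_pos - d_neg) over paired distances."""
    if len(pos_dists) != len(neg_dists):
        raise ValueError("positive and negative distances must be paired")
    return sum(max(0.0, margin + dp - dn) for dp, dn in zip(pos_dists, neg_dists))


# First pair: the negative is already far enough away, so it adds no loss.
# Second pair: the negative is too close to the prototype, so it is penalized.
loss = hinge_loss(pos_dists=[1.0, 2.0], neg_dists=[7.0, 3.0], margin=5.0)
print(loss)
```

The loss becomes zero once every negative instance lies at least one margin farther from the prototype than its positive counterpart.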
        
| Algorithm 2 Detailed training process of the prototype network |
| Require: References in positive and negative samples |
| Ensure: Loss of the model |
| 1: Obtain the triple embeddings with structure features by the GCN; |
| 2: Obtain the triple embeddings with enhanced relation semantics by the mapping method; |
| 3: while not done do |
| 4:    Feed the enhanced relation embedding set into the two-layer attention convolution; |
| 5:    Represent the support prototype in one way as p; |
| 6:    Encode the n-way query instances extracted from the query set; |
| 7:    Calculate the distances from the positive query instance and the negative query instance to the support prototype p, respectively; |
| 8:    Calculate the loss function, and use the loss to update the parameters; |
| 9: end while |
  5. Experiments
  5.1. Datasets
We conduct experiments on two datasets. The first one is based on the NELL dataset, which continuously collects knowledge from the Web. The second one is based on the Wiki dataset extracted from Wikipedia. The statistics of these two datasets are shown in Table 2. Relations with more than 50 and fewer than 500 triples are selected as few-shot tasks, and the other relations serve as background knowledge of the graph. Thus, the 67 tasks in NELL are divided into 51 training, 5 validation, and 11 testing tasks. The 183 tasks in Wiki are divided into 133 training, 16 validation, and 34 testing tasks.
  5.2. Implementation Details
Our model performs the FKGC task in the five-shot setting. In each subgraph, the number of neighbors (nodes and corresponding edges) outside the triple extracted by breadth-first traversal is fixed to 20. The batch size is set to 128, the embedding dimension of entities and relations is set to 100, and the number of GCN layers is set to 2. The embedding dimension of the hidden layer in the attention convolutional layer of the prototype network is set to 50, and the Dropout rate is set to 0.2. The margin separating positive and negative samples is set to 5. We minimize the loss with the Adam optimizer, and the learning rate is selected from 0.0001 and 0.001. Our model training is evaluated on the evaluation set 10,000 times per iteration, and the optimal hyperparameters are selected by grid search on the validation set. The maximum number of iterations is set to 20,000.
This paper uses Hits@N and MRR as metrics to evaluate the model. Hits@N is the proportion of test queries for which the correct tail entity is ranked within the top N candidates, where N is set to 10, 3, and 1. MRR (mean reciprocal rank) is a general metric for evaluating ranking and retrieval algorithms: for each test query, the reciprocal of the rank of the correct tail entity is computed, and these values are averaged over all queries.
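Both metrics depend only on the rank that the model assigns to the correct tail entity in each test query. A minimal Python sketch, with made-up ranks for illustration:

```python
def hits_at_n(ranks, n):
    """Fraction of queries whose correct tail entity is ranked in the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all queries (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct tail entity in four test queries.
ranks = [1, 2, 4, 10]
print(hits_at_n(ranks, 1), hits_at_n(ranks, 3), hits_at_n(ranks, 10))
print(mean_reciprocal_rank(ranks))
```

Hits@N ignores how far outside the top N a miss falls, whereas MRR rewards every improvement in rank, which is why the two are usually reported together.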
  5.3. Results Comparison
Traditional KGC models can be used in the FKGC task, so we choose three conventional KGC models as FKGC baselines. TransE, which treats the relation as the difference between the head and tail entity embeddings, is a classical translation model. The DistMult model, which learns relation embeddings to mine the potential logical rules of triples, is based on tensor decomposition. The ComplEx model is a variation of DistMult that scores predicted triples by the Hermitian dot product. These knowledge graph embedding (KGE) models inject graph structure features into the entity-relation semantic information in embedding form to address the knowledge representation problem [38]. All entity pairs of background relations and training relations, as well as the few-shot training entity pairs of validation and test relations, are used to train these models.
In addition to the baselines mentioned above, we compare our results with recent FKGC models. These models combine local graph structure encoders and matching networks to learn entity embeddings and predict new facts. The GMatching model is the first few-shot link prediction model for KGs; it is a meta-learning model that learns a matching metric from entity embeddings and local graph structure. The MetaR model adopts a meta-learning framework to realize few-shot relation prediction on knowledge graphs. The FSRL model effectively captures knowledge from heterogeneous graph structures and discovers new relations from a small number of samples. The FAAN model uses the background knowledge graph to represent the split triples. The GANA model [39] proposes a novel gated and attentive neighbor aggregator that helps to filter noise.
Table 3 reports the accuracies of our model and the baseline models on NELL and Wiki. We find that the few-shot models perform better on the few-shot datasets than the traditional KGC models, especially on the Wiki dataset, which shows that few-shot models cope better with long-tail training data. Our model performs well on both the NELL and Wiki datasets, which indicates that it contributes to the FKGC task. However, it can be observed from Table 3 that two metrics of our model on the NELL dataset are not as good as those of the GANA model, whereas all the metrics on the Wiki dataset are much higher than those of the GANA model. We analyze the reasons for this result as follows. On the one hand, the attentive neighbor aggregator introduced by the GANA model is better suited to simple graph-structured datasets than our model. On the other hand, the advantage of our model’s whole-triple encoding over split-triple encoding is that it prevents multiple relations from affecting the triple semantic representation, so our model can adapt to the complex Wiki dataset.
   5.4. Analysis
  5.4.1. Ablation Study of the Structure Encoder
We adopt an ablation study to demonstrate the influence of each innovation in the triple structure encoder on FKGC. It mainly includes the following four ablations:
As1: In the encoder, we adopt an encoding structure feature strategy that treats triples as a whole. In order to prove its effectiveness, an entity-level encoding method is designed. This method first encodes the structure features around the head and tail entities respectively, and then merges them into a triple embedding in the form of vector addition.
As2: Based on the coding of the entire triple, the variant uses a GCN with relation features and Transformer modules as the encoding structure feature method, and then proves the effect of the encoder that ablates the position information.
As3: This variant ablates the Transformer module. Specifically, relation embeddings with direction features and entity embeddings are merged by adding, and then the merged embeddings are fed into the GCN.
As4: Relations without position information are introduced into the Transformer module to obtain sequence attention and input it into the GCN.
In Table 4, we list the results of the four ablation variants and our model on the two datasets. A comparison of our model with As1, As2, As3, and As4 shows that our model leads the variants on most metrics. As3 performs the worst among all variants, which indicates that the Transformer module on the GCN can effectively emphasize important information and thus help the model achieve better results in FKGC. The comparison between As1 and our model shows that encoding the graph structure of whole triples is significantly more effective than split-triple encoding, which indicates that our model retains the triple integrity feature. Comparing the results on the two datasets, the improvement on Wiki is more obvious. This is because the Wiki dataset has more complex relation types, and the graph structure encoding of whole triples reduces the impact of complex relation features on the prediction results.
  5.4.2. Mapping Matrix Sparse Constraint
We leverage matrix variables to map structure embeddings and triple embeddings into the same embedding space and fuse them. However, excessive interaction between feature embeddings is not conducive to the training of the model, so we constrain the matrix sparsity. The optimal matrix sparsity is obtained through experiments. As shown in Figure 3, the line plots show the influence of matrix sparsity on the performance of the model on the NELL and Wiki datasets, respectively.
We observe from Figure 3 that the metrics on the NELL dataset are worst when the sparsity is 0.1, 0.2, and 0.3, fluctuate slightly between 0.4 and 0.6, and show a downward trend after 0.6. On the Wiki dataset, the results at 0.1 and 0.2 are poor, are stable between 0.3 and 0.4, and show a downward trend after 0.4. Finally, we select the sparsity that produces the best results, which is 0.6 on the NELL dataset and 0.3 on the Wiki dataset. From this, we conclude that the choice of matrix sparsity is related to the size and complexity of the dataset: the more complex the dataset, the smaller the selected sparsity. Because a complex dataset contains more features, it is necessary to reduce the sparsity of the mapping matrix to avoid the overfitting caused by excessive interaction between embeddings in the mapping process.
  5.4.3. Influence of Attention Convolution on Node Dispersion in Prototype Network
In the prototype network, the structure feature embeddings obtained from the encoder and the initial entity embeddings in the triple are concatenated after the mapping operation, and the prototype representation is obtained by the attention convolution. In this part, we design a group of comparative experiments to observe the improvement that attention convolution brings to the model. We randomly select 10 relation classes and draw the scatter distribution with and without attention convolution, as shown in Figure 4.
In Figure 4, we can directly observe that some relation classes are confused with one another in (a), whereas the points with the same relation in (b) are more compact. Based on these results, we conclude that the attention convolution strategy can increase the semantic distance between different prototypes. The resulting prototype representation enables the model to better distinguish confusing data.