1. Introduction
Large-scale knowledge bases such as Freebase [1], DBpedia [2], and NELL [3] are now publicly available, and they contain massive volumes of facts involving diverse entities and relations. Owing to their huge size, they serve as an essential resource in many language-related tasks such as information retrieval, question answering, and text mining. However, no matter how large these knowledge bases are, they are not yet complete, since they are constructed manually. For instance, Freebase has three million entities for the ‘Person’ concept, but only 25% of them have nationality information [4]. Likewise, only 6% of the ‘Person’ entities have parent information, even though every person in the real world naturally has a nationality and parent(s). Such missing information exists for nearly every relation and accumulates over time. As a result, these cumulative missing facts limit the effective use of knowledge bases. Therefore, it is important to fill in the missing facts of a knowledge base automatically.
There have been many previous studies that attempted to fill in missing facts. One promising approach to this task is knowledge graph embedding [5,6,7,8,9], which represents all entities and relations of a knowledge base as vectors in a low-dimensional space. The candidates for missing facts are also represented as vectors in this space, and their plausibility is measured by vector similarity to the existing facts. The candidates with high plausibility are then proposed as possible facts for the knowledge base. According to the experimental results of previous studies [10], this approach achieves over 70% in Hits@10 on knowledge base completion (Hits@k is the rate of correct entities appearing among the top-k ranked entities; that is, Hits@10 is the proportion of correct entities ranked in the top 10). However, its Hits@1 is lower than 30%, which implies that it is still difficult to find correct missing facts using knowledge graph embedding alone.
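The Hits@k metric described above is straightforward to compute. The following is a minimal sketch with hypothetical toy rankings (the entity names and rankings are illustrative, not taken from any benchmark):

```python
def hits_at_k(ranked_lists, correct_entities, k):
    """Fraction of queries whose correct entity appears in the top-k ranking."""
    hits = sum(
        1 for ranking, correct in zip(ranked_lists, correct_entities)
        if correct in ranking[:k]
    )
    return hits / len(ranked_lists)

# Toy example: 3 queries with the correct entity ranked 1st, 4th, and absent.
rankings = [
    ["a", "b", "c", "d"],
    ["b", "c", "d", "a"],
    ["c", "d", "b", "a"],
]
correct = ["a", "a", "x"]   # "x" never appears in the rankings
print(hits_at_k(rankings, correct, 1))   # 1/3: only the first query hits at k=1
print(hits_at_k(rankings, correct, 4))   # 2/3: the second query also hits at k=4
```

The gap between the two printed values mirrors the gap between Hits@10 and Hits@1 discussed above: many correct entities appear in the top ranks without being ranked first.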
A couple of methods have been proposed to improve Hits@1 performance in discovering missing facts. Wang et al. conducted knowledge base completion using both knowledge graph embedding and rules derived from the knowledge base schema [11]. Their experimental results show that adopting the rules helps improve Hits@1 performance. However, their method still does not cover n-to-n relations, since it is based on a knowledge graph embedding that exploits the characteristics of 1-to-1 relations. On the other hand, Choi et al. proposed a re-ranking model that uses both internal and external information of a knowledge graph for more accurate knowledge base completion [12]. Their model first extracts top-k candidates according to the plausibility computed by a knowledge graph embedding. The candidates are then re-ranked by considering two kinds of additional information: knowledge base schema information and web search results. This is a good approach in that it is the first attempt to exploit such additional information, but it still has some problems. The main problem is that the model depends heavily on the first top-k results: if a correct fact is excluded in the first step, there is no chance to recover it in the second step. Note that, in this model, the top-k candidates are determined by a single knowledge graph embedding, while no knowledge graph embedding is yet perfect. As a result, many correct facts are missed by this model.
This paper proposes a single-step ranking model that adopts a committee of knowledge graph embeddings for accurate knowledge base completion. Unlike previous work that formulates knowledge base completion as a re-ranking task [12], we formulate it as a ranking task. Given a knowledge base, the proposed model generates candidate facts, and the plausibility of each candidate is determined by a committee of knowledge graph embeddings rather than by a single embedding. The candidates are then sorted according to their plausibility. Since the proposed model evaluates all candidates, the probability of missing correct facts is reduced. Another advantage of the proposed model is that, as a kind of committee machine, it can reflect various perspectives of a knowledge graph when measuring candidate plausibility. That is, the diversity of the committee members allows the model to have less variance error. According to our experiments on two standard data sets, the proposed committee-based model outperforms every single knowledge graph embedding. In addition, its Hits@k advantage grows as k decreases, which implies that it predicts missing facts more accurately.
The rest of this paper is organized as follows. Section 2 describes related work on knowledge graph embedding for knowledge base completion. Section 3 presents the overall idea of the proposed model, and Section 4 explains how the proposed model works as a committee machine. Section 5 describes how to measure the plausibility of each candidate using the proposed model, and Section 6 shows the experimental results. Finally, Section 7 draws our conclusions.
2. Related Work
There have been a number of previous studies on knowledge base completion. One promising approach among them is knowledge graph embedding [13,14,15,16], which represents all entities and relations of knowledge facts as low-dimensional vectors. These vectors of entities and relations are trained to preserve the inherent structure of a knowledge base. Thus, the plausibility of a knowledge fact can be measured using the vectors, and knowledge base completion is then performed by filling in the facts with a high plausibility.
Knowledge graph embeddings fall into two kinds of approaches [17,18]: translation-based models and latent semantics-based models. In a translation-based model, entities are translated according to their relation, and the plausibility of facts is measured by a distance-based plausibility function. The best-known translation-based model is TransE [6]. TransE represents entities and relations as vectors in the same space, assuming that the sum of a head entity vector and a relation vector is equal to the tail entity vector. Therefore, the plausibility function of TransE considers the distance between the sum vector and the tail entity vector. It is simple and effective, especially for one-to-one relations.
There have been many extensions of TransE since its first appearance [19,20,21,22]. TransH [23], TransR [24], and TransD [25] are such extensions, and they all adopt relation-specific entity embeddings. Since TransH projects entities onto relation-specific hyperplanes, it can represent an entity as a different vector according to its relation. Similarly, TransR adopts relation-specific spaces rather than hyperplanes; it embeds entities into relation-specific spaces with a projection matrix. TransD simplifies TransR by decomposing the projection matrix into a product of two vectors, where the two vectors are the mapping matrices of a head and a tail. Since entities are projected according to their role, these models can represent one-to-n, n-to-n, and m-to-n relations.
Latent semantics-based models capture the latent semantics of entities and relations using a similarity-based plausibility function [5,26,27]. In RESCAL [7], entities are represented as vectors, and relations are matrices derived from pairwise interactions between entities. It works well for all relation types from one-to-one to n-to-m, but it suffers from high complexity [28]. DistMult solves this problem by simplifying the relation matrices of RESCAL [29]: it represents each relation as a diagonal matrix instead of a full matrix. HolE combines RESCAL and DistMult effectively [30]. In HolE, entities and relations are all represented as vectors, and the plausibility function adopts a circular correlation operation [31] to compress the pairwise interactions. With this operation, HolE is able to model asymmetric relations that DistMult cannot. ComplEx is another method that extends DistMult to model asymmetric relations [32]. Since it embeds entities and relations into a complex vector space, its plausibility function is based on a Hermitian dot product. As a result, it is scalable to large data sets.
Table 1 summarizes the knowledge graph embeddings explained above. For a given triple (h, r, t), all methods but ComplEx embed h, r, and t into a real space R^d, where d is the space dimension; ComplEx embeds them into a complex space C^d. Then, h and t are represented as vectors h and t, and r is represented as a vector r or as projection matrices. In order to train these embedding vectors, each embedding has its own plausibility function f(h, r, t). In Table 1, diag(r) denotes the diagonal matrix of r, ⊛ is the circular correlation operation [31], and Re(·) is the real part of a complex value.
There have been a few studies in which the advantages of various embeddings are combined. Krompaß et al. proposed an ensemble of knowledge graph embeddings for knowledge base completion [33]. Their method combines three knowledge graph embeddings: TransE, RESCAL, and the embedding proposed by Xin et al. [34]. They applied a logistic regression to each embedding to normalize the plausibility produced by that embedding. As a result, as the number of knowledge graph embeddings increases, the number of logistic regressions to train also increases. Another problem of their method is that it ignores the relative importance of each embedding.
3. Knowledge Base Completion
In general, a knowledge base stores a number of entities and relations, but it is usually incomplete in that it has many missing facts that exist in the real world. The applications of knowledge bases are limited due to this incompleteness. Therefore, it is important to solve knowledge base incompleteness.
Knowledge base completion is the task of finding a set of facts missing from a knowledge base. Assume that a knowledge base S with a great number of facts is given. Each fact in S is represented as a triple (h, r, t), where h is a head entity, t is a tail entity, and r is their relation. The head entity h and the tail entity t belong to E, the set of entities, while the relation r is a member of R, the set of relations. Then, knowledge base completion completes S by finding a set of missing triples T.
One difficult problem in finding T is that not all possible candidates for the missing triples generated from S belong to T. For instance, the triple (Donald Trump, nationality, China) should not be a member of T, even if a knowledge base S has ‘Donald Trump’ and ‘China’ as its entities and nationality as a relation. Therefore, it is critical to measure the plausibility of every candidate missing triple. If a candidate triple is plausible enough with respect to the knowledge base S, it should be a member of T; otherwise, it should be discarded.
4. Committee-Based Knowledge Base Completion
This paper proposes committee-based knowledge base completion, which adopts a committee of knowledge graph embeddings to measure the plausibility of candidate missing triples. Figure 1 shows how the set of missing triples, T, is found systematically by the proposed method. The proposed method first generates C, a set of candidate missing triples, from S as in the work of Bordes et al. [35]. Let c_h be the concept of h, c_t be that of t, E_{c_h} be the set of all entities belonging to c_h, and E_{c_t} be the set of all entities belonging to c_t. Then, from every triple (h, r, t) ∈ S, candidate triples are generated by replacing h with one of the elements in E_{c_h} other than h, or by replacing t with one of the elements in E_{c_t} other than t. Thus, (|E_{c_h}| − 1) + (|E_{c_t}| − 1) candidates are prepared from each triple in S.
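The candidate generation step above can be sketched as follows. This is a minimal illustration, assuming a hypothetical concept-to-entity mapping; the entity and concept names are invented for the example:

```python
def generate_candidates(triple, concept_of, entities_of):
    """Generate candidate triples from (h, r, t) by replacing the head with
    every other entity of h's concept, and the tail with every other entity
    of t's concept."""
    h, r, t = triple
    candidates = []
    for h2 in entities_of[concept_of[h]]:
        if h2 != h:
            candidates.append((h2, r, t))
    for t2 in entities_of[concept_of[t]]:
        if t2 != t:
            candidates.append((h, r, t2))
    return candidates

# Toy knowledge base: two concepts with three entities each.
concept_of = {"Seoul": "City", "Busan": "City", "Tokyo": "City",
              "Korea": "Country", "Japan": "Country", "China": "Country"}
entities_of = {"City": ["Seoul", "Busan", "Tokyo"],
               "Country": ["Korea", "Japan", "China"]}
cands = generate_candidates(("Seoul", "capital_of", "Korea"),
                            concept_of, entities_of)
print(len(cands))   # (3 - 1) + (3 - 1) = 4 candidates
```

Restricting replacements to entities of the same concept keeps the candidate set far smaller than replacing with every entity in E.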
Once C is prepared, all its members are sorted according to their plausibility. There could be a number of ways to compute the plausibility of candidate triples, but knowledge graph embeddings are adopted for this purpose in this paper. A knowledge graph embedding represents all entities and relations in S as low-dimensional vectors, thereby preserving the inherent structure of S. Since the embedding is trained to preserve the structure of its knowledge graph, it can be used to measure how plausible the embedded vectors are in the space spanned by the embedding.
Although any knowledge graph embedding can be used to compute the plausibility of triples, every embedding has its own idiosyncrasy. Table 2 demonstrates this. The table shows the top-10 candidate tail entities suggested by four well-known knowledge graph embeddings when a head entity yogurt and a relation hypernym are given. The knowledge graph embeddings used in this table are TransE, TransR, DistMult, and ComplEx, and the entities are synsets in the WordNet knowledge base. The bold entities are correct tail entities for yogurt and hypernym (multiple entities can be correct since the relation hypernym is transitive), and the shaded entities are those shared by at least two embeddings. As shown in this table, different knowledge graph embeddings suggest different entities. TransE shares three of its ten entities (dairy product, solid food, and foodstuff) with TransR, and two entities (dairy product and solid food) with DistMult and ComplEx. Moreover, even the shared entities are ranked differently. For instance, dairy product is ranked first by TransE, fifth by TransR, ninth by DistMult, and second by ComplEx, while solid food is ranked second, third, sixth, and first, respectively.
The ranking differences among embeddings can be shown numerically by Spearman’s rank correlation coefficient, a widely-used metric for evaluating rank correlation. Table 3 shows the Spearman’s rank correlation coefficients between different knowledge graph embeddings. These coefficients are obtained using 500 triples sampled from the development set of the WN18 dataset [6]. The average coefficient among the embeddings is 0.5158. The coefficient between TransE and TransR is the highest at 0.5904, and that between TransR and DistMult is the lowest at 0.4916. These values show that the embeddings are positively, but not strongly, correlated. DistMult, in particular, shows lower coefficients against the other embeddings, which implies that it ranks entities differently from the others.
In order to reflect the idiosyncrasies of knowledge graph embeddings in knowledge base completion while maximizing their effectiveness, the proposed method computes the plausibility of triples with a committee of knowledge graph embeddings rather than with a single embedding. That is, the plausibility of each candidate triple in C is determined by the committee of knowledge graph embeddings. Then, all members of C are sorted by their plausibility. Finally, the top-k candidate triples from the sorted set are selected as new facts to form T, the final set of missing triples.
5. Measuring Plausibility by Embedding Committee
The product of experts (PoE) proposed by García-Durán et al. [36] is adopted as the committee machine to combine knowledge graph embeddings. When all experts are probabilistic models, PoE models their overall probability distribution by combining their outputs. While each expert considers a particular aspect of a target task, PoE manages the task comprehensively. As a result, it produces a better distribution than the individual experts.
In this paper, the plausibility of a triple x = (h, r, t) is represented as its probability P(x | θ_1, …, θ_M), which is estimated by

  P(x | θ_1, …, θ_M) = ∏_{m=1}^{M} p_m(x; θ_m) / Σ_{c∈C} ∏_{m=1}^{M} p_m(c; θ_m).   (1)

Here, θ_m is the parameter of the m-th individual embedding, p_m(x; θ_m) is the probabilistic score of x by the m-th embedding, and C is the set of all possible candidate triples. P(x | θ_1, …, θ_M) is basically a normalized product of the outputs of all individual embeddings. Thus, the more plausible x is, the higher P(x | θ_1, …, θ_M) it has.
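The PoE combination can be sketched in a few lines: multiply the experts’ probabilities for each triple and normalize over the candidate set. This is a minimal illustration with hypothetical expert scores, not the trained committee:

```python
import math

def poe_probability(x, candidates, experts):
    """Product-of-experts plausibility: the product of each expert's
    probability for triple x, normalized over all candidate triples."""
    def product(triple):
        p = 1.0
        for expert in experts:
            p *= expert(triple)
        return p
    z = sum(product(c) for c in candidates)   # normalizing constant
    return product(x) / z

# Two toy 'experts' mapping triples to probabilities via a sigmoid of a score.
sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))
scores_a = {"x1": 2.0, "x2": -1.0}
scores_b = {"x1": 1.5, "x2": 0.5}
experts = [lambda t: sigmoid(scores_a[t]), lambda t: sigmoid(scores_b[t])]
cands = ["x1", "x2"]
p1 = poe_probability("x1", cands, experts)
p2 = poe_probability("x2", cands, experts)
print(round(p1 + p2, 6))   # probabilities over the candidate set sum to 1.0
print(p1 > p2)             # x1 scores higher under both experts -> True
```

Because the product rewards triples that all experts find plausible, a candidate must satisfy every committee member to receive a high probability.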
Four knowledge graph embeddings, TransE [6], TransR [24], DistMult [29], and ComplEx [32], are adopted as the members of the committee to determine the final plausibility of candidate triples. As shown in Table 3, these four embeddings have diverse tendencies in ranking triples. These embeddings output a real value as the score of a triple, but each expert in PoE should be a probabilistic model. Thus, the sigmoid function σ(s) = 1 / (1 + e^{−s}) is used to convert a score into a probability. That is, the probabilistic scores of the embeddings are

  p_TransE(x) = σ(−‖h + r − t‖),   (2)
  p_TransR(x) = σ(−‖M_r h + r − M_r t‖),   (3)
  p_DistMult(x) = σ(h^T diag(r) t),   (4)
  p_ComplEx(x) = σ(Re(h^T diag(r) t̄)).   (5)

TransE and TransR have a distance-based score function, which implies that the smaller the score of a triple is, the more plausible the triple is. Thus, in Equations (2) and (3), the negative score is used inside the sigmoid.
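These sigmoid-converted score functions can be written out explicitly for small vectors. The sketch below implements the TransE, DistMult, and ComplEx cases (TransR is omitted, as it additionally needs the relation-specific projection matrix M_r); the vectors are arbitrary toy values:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def transe_prob(h, r, t):
    # TransE: distance-based, so the *negative* score feeds the sigmoid.
    return sigmoid(-norm([hi + ri - ti for hi, ri, ti in zip(h, r, t)]))

def distmult_prob(h, r, t):
    # DistMult: similarity-based; h^T diag(r) t reduces to an elementwise product.
    return sigmoid(sum(hi * ri * ti for hi, ri, ti in zip(h, r, t)))

def complex_prob(h, r, t):
    # ComplEx: real part of the Hermitian product h^T diag(r) conj(t).
    return sigmoid(sum((hi * ri * ti.conjugate()).real
                       for hi, ri, ti in zip(h, r, t)))

h, r, t = [0.4, 0.1], [0.2, -0.3], [0.6, -0.2]
print(transe_prob(h, r, t))   # ≈ 0.5, since h + r equals t (distance ≈ 0)
hc = [0.4 + 0.1j, 0.1 - 0.2j]
rc = [0.2 + 0.0j, -0.3 + 0.1j]
tc = [0.6 - 0.1j, -0.2 + 0.3j]
print(0.0 < complex_prob(hc, rc, tc) < 1.0)   # always a valid probability -> True
```

Note that a perfectly translated TransE triple reaches σ(0) = 0.5, not 1.0; what matters for PoE is only the relative ordering the probabilities induce.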
Note that the parameter θ_m of an embedding in Equation (1) is determined to maximize the performance of that embedding alone, without considering the other committee members. Therefore, in order to rank the candidate triples optimally as a committee, all θ_m’s should be fine-tuned by considering the neighboring embeddings. To fine-tune the parameters, the negative log-likelihood loss is used, defined as

  L(θ) = − Σ_{x∈S} log P(x | θ_1, …, θ_M).   (6)

The derivative of L(θ) with respect to θ_m is

  ∂L(θ)/∂θ_m = − Σ_{x∈S} [ ∂ log p_m(x; θ_m)/∂θ_m − Σ_{c∈C} P(c | θ_1, …, θ_M) ∂ log p_m(c; θ_m)/∂θ_m ].   (7)

Since the second term of Equation (7) sums over all candidate triples in C, it is intractable, and it is approximated through negative sampling [37]. In this paper, negative samples are generated by swapping the head or tail entity of training triples. When a training triple x = (h, r, t) is given, the negative triples (h, r, t′) are made by randomly sampling t′ from E and replacing t in x with it. The other negative triples (h′, r, t) are generated in the same way. Then, Equation (7) becomes

  ∂L(θ)/∂θ_m ≈ − Σ_{x∈S} [ ∂ log p_m(x; θ_m)/∂θ_m − Σ_{c∈N(x)} P(c | θ_1, …, θ_M) ∂ log p_m(c; θ_m)/∂θ_m ],   (8)

where N(x) is the set of negative samples for x.
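The negative sampling step described above, which corrupts either the head or the tail of a training triple, can be sketched as follows; the entity names are hypothetical:

```python
import random

def negative_samples(triple, entities, n, rng=random):
    """Corrupt (h, r, t) into n negative triples by swapping the head or the
    tail with a randomly sampled entity, never reproducing the true triple."""
    h, r, t = triple
    negatives = []
    while len(negatives) < n:
        if rng.random() < 0.5:
            cand = (rng.choice(entities), r, t)   # corrupt the head
        else:
            cand = (h, r, rng.choice(entities))   # corrupt the tail
        if cand != triple:   # the true triple must not serve as a negative
            negatives.append(cand)
    return negatives

entities = ["Seoul", "Korea", "Japan", "Tokyo"]
negs = negative_samples(("Seoul", "capital_of", "Korea"), entities, 4)
print(len(negs))   # 4
print(all(n != ("Seoul", "capital_of", "Korea") for n in negs))   # True
```

In training, these samples stand in for the full candidate set C in the second term of Equation (7), making the gradient computation tractable.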
7. Conclusions
This paper has proposed a committee model of knowledge graph embeddings for knowledge base completion, where knowledge base completion is the problem of discovering missing triples in a knowledge base. Previous research on this task has used a single knowledge graph embedding; since every embedding has its own idiosyncrasy, no single knowledge graph embedding is fully effective in solving the task. Thus, we address the knowledge base completion task by organizing a committee that reflects the synergistic advantages of various knowledge graph embeddings. When a knowledge base is given, candidate triples are first generated from the knowledge base, and then their plausibilities are computed by the committee. The candidates with a high plausibility are selected as missing facts of the knowledge base. This paper incorporates TransE, TransR, DistMult, and ComplEx into the embedding committee. According to our experimental results, the proposed committee-based model shows higher performance than any single knowledge graph embedding. In addition, it is robust even when the model accepts only a small fraction of the candidate triples. This is because the proposed model combines the knowledge graph embeddings effectively. For future work, we plan to incorporate other knowledge graph embeddings into the committee. In addition, since the performance of knowledge base completion depends greatly on the negative sampling method, we plan to study better negative sampling methods to improve the committee machine.