Article

Modeling Noncommutative Composition of Relations for Knowledge Graph Embedding

1 State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
2 Fabu Inc., Hangzhou 310030, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(6), 1348; https://doi.org/10.3390/electronics12061348
Submission received: 7 February 2023 / Revised: 6 March 2023 / Accepted: 7 March 2023 / Published: 12 March 2023
(This article belongs to the Special Issue Applications of Computational Intelligence, Volume 2)

Abstract: Knowledge Graph Embedding (KGE) is a powerful way to express Knowledge Graphs (KGs), which can help machines learn patterns hidden in the KGs. Relation patterns are useful hidden patterns, and they usually assist machines in predicting unseen facts. Many existing KGE approaches can model common relation patterns such as symmetry/antisymmetry, inversion, and commutative composition patterns. However, most of them are weak in modeling noncommutative composition patterns, which means these approaches cannot distinguish many composite relations such as “father’s mother” and “mother’s father”. In this work, we propose a new KGE method called QuatRotatScalE (QRSE) to overcome this weakness, since it utilizes rotation and scaling transformations of quaternions to design the relation embedding. Specifically, we embed the relations and entities into a quaternion vector space under the difference-norm KGE framework. Since the multiplication of quaternions does not satisfy the commutative law, QRSE can model noncommutative composition patterns naturally. The experimental results on a synthetic dataset also support that QRSE has this ability. In addition, the experimental results on real-world datasets show that QRSE reaches the state-of-the-art on the link prediction problem.

1. Introduction

A Knowledge Graph (KG) is composed of structured, objective facts. The facts are usually expressed as triples (h, r, t), where h, r, and t denote the head entity, the relation, and the tail entity, respectively, for example, (China, located_in, Asia). Knowledge graphs have successfully supported many applications in various fields, such as recommender systems [1], question answering [2], information retrieval [3], and natural language processing [4]. KGs have also attracted increasing attention from both industry and academia. However, real-world knowledge graphs, such as DBpedia [5], Freebase [6], Yago [7], and WordNet [8], are usually incomplete, which restricts their applications. Thus, knowledge graph completion has become a widely studied subject. This subject is usually formulated as a link prediction problem, i.e., predicting the missing links that should be in the knowledge graph. Generally speaking, it asks us to design an agent that takes a query as input and outputs some entities. The query may contain a head entity and a relation, or a tail entity and a relation. Every output entity should be able to form a plausible triple together with the query.
So far, Knowledge Graph Embedding (KGE) has been the fundamental way to deal with the link prediction problem in both industry and academia [9,10,11,12,13]. In this approach, the agent learns a low-dimensional vector representation, also called an embedding, for each entity and relation. We also have to design a scorer that grades any triple, given in embedding form, for its plausibility. When predicting the unknown entity, the agent only needs to grade all possible triples (composed of the query and each candidate entity) and then take the candidate entities of the highest-scoring triples as the predicted result.
There are two reasons why knowledge graph embedding methods can tackle the link prediction problem effectively. On the one hand, there are many exploitable relation patterns in real-world KGs, such as symmetry/antisymmetry, inversion, and composition. These relation patterns generally appear as natural redundancies in KGs; for example, the triples (China, located_in, Asia) and (Asia, includes, China) may exist in a KG simultaneously, and they describe the same fact. Here, located_in and includes are inverse relations of each other. On the other hand, existing KGE models are able to model most relation patterns, i.e., to evaluate the plausibility of triples by utilizing the relation patterns. For example, TransE [10] can model inversion patterns. When there are many natural redundancies relevant to located_in and includes in the training KG, even if it has only seen the triple (China, located_in, Asia) but not the triple (Asia, includes, China), the TransE model can still assign a high plausibility score to (Asia, includes, China).
However, as far as we know, almost none of the existing KGE models can perfectly model all of the aforementioned relation patterns. For example, RotatE [13] claims that it can model all of these relation patterns, but it still has a fatal defect in modeling composition patterns: it can only model commutative composition patterns and cannot model noncommutative composition patterns. This defect also exists in some other KGE models that claim to model composition patterns, such as TransE. Briefly speaking, a composition pattern is a relation pattern of the shape $r_1 \oplus r_2 = r_3$, where $\oplus$ denotes the ordered composition of $r_1$ and $r_2$. If the composition pattern $r_1 \oplus r_2 = r_3$ exists in some KG, it means the KG has frequent natural redundancies of the form $[(e_1, r_1, e_2), (e_2, r_2, e_3), (e_1, r_3, e_3)]$, where $e_i$ ($i = 1, 2, 3$) can be any entity. If $r_1 \oplus r_2 = r_3$ and $r_2 \oplus r_1 = r_3$ both hold in a KG, we call $r_1 \oplus r_2 = r_3$ a commutative composition pattern, such as is_on_the_east_of $\oplus$ is_on_the_north_of = is_on_the_northeast_of and is_on_the_north_of $\oplus$ is_on_the_east_of = is_on_the_northeast_of. Otherwise, if only $r_1 \oplus r_2 = r_3$ holds in the KG, it is a noncommutative composition pattern, such as is_the_husband_of $\oplus$ is_the_mother_of = is_the_father_of.
Both RotatE and TransE mistakenly model noncommutative composition patterns as commutative ones. This mistake can lead to severely ridiculous inferences. For example, they will infer (i.e., assign a high score to) the triple (Mary, is_the_father_of, Barbara) based on the existing triples (Mary, is_the_mother_of, James) and (James, is_the_husband_of, Barbara). The primary cause of this mistake is that the design inspiration of these models was not examined carefully enough. They both expect to express a fact triple (h, r, t) through an equation relevant to the embeddings of h, r, and t (noted as boldface letters $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$): $\mathbf{h} \odot \mathbf{r} = \mathbf{t}$, where $\odot$ is some binary operation. Thus, they design the score function in the form $-\|\mathbf{h} \odot \mathbf{r} - \mathbf{t}\|$, so the closer the equation is to holding, the higher the plausibility score is. TransE embeds entities and relations into the real vector space and takes the addition in that space as $\odot$, while RotatE replaces the real numbers with complex numbers and takes the element-wise multiplication as $\odot$. Since these two operations both satisfy the commutative law, the corresponding two models can only model commutative composition patterns.
Inspired by QuatE [14], which will be discussed in Section 2.1.2, we propose a new KGE model called QuatRotatScalE (or QRSE, for short) in this paper. The main difference from TransE and RotatE is that QRSE embeds entities and relations into a quaternion [15] vector space and takes the element-wise multiplication in that space as $\odot$. Because quaternion multiplication generally does not satisfy the commutative law (although there are special cases where the law holds), QRSE can model both kinds of composition patterns naturally. Furthermore, we can prove that QRSE can also model the remaining relation patterns, which makes it one of the KGE models that can model the most relation patterns to date. We evaluated QRSE and compared it with many baselines on two well-established and widely used real-world datasets, FB15k-237 [16] and WN18RR [17]. The results indicate that our method reaches the state-of-the-art on the link prediction problem.

2. Related Work

At present, there are two classes of methods to solve the knowledge graph completion or link prediction problem: KGE methods and path-finding methods. Both are introduced below.

2.1. KGE Method

Embedding methods are widely used in many fields of machine learning, since the embeddings of sentences, graphs, and many other data types can be easily transferred to various downstream tasks with only a little task-specific fine-tuning. For example, studies [18,19] first learn the embeddings of sentences and then use these embeddings to perform sentiment classification. Study [20] first learns an embedding for each graph and then uses these embeddings to predict the missing labels of graphs. In addition, some other studies learn an embedding vector for each object (i.e., node in a graph) of a given Heterogeneous Information Network (HIN) in a (semi-)supervised [21] or self-supervised [22] manner. Taking advantage of the learned embeddings of objects, they can fulfill many tasks, e.g., object classification, clustering, and visualization.
In the knowledge graph completion or link prediction problem, Knowledge Graph Embedding methods are also the most studied methods. Let us use $\mathcal{E}$ to represent the set of all entities and $\mathcal{R}$ to represent the set of all relations in a KG. KGE methods need to assign a vector representation to every entity $e \in \mathcal{E}$ and relation $r \in \mathcal{R}$, noted in boldface letters $\mathbf{e}$ and $\mathbf{r}$, respectively. $\mathbf{e}$ or $\mathbf{r}$ is also called the embedding of $e$ or $r$. In addition, KGE methods need to design a score function $f_r(h, t)$ to mark the plausibility of the triple (h, r, t). The objective of optimization is to assign high scores to true triples and low scores to false triples. Based on the type of score function, we can further divide KGE methods into two sorts: KGE based on difference norm and KGE based on semantic matching.

2.1.1. KGE Based on Difference Norm

The common motivation of this sort of method is to use a triple approximate equation $f_1(\mathbf{h}, \mathbf{r}) \approx f_2(\mathbf{t}, \mathbf{r})$ to describe any triple (h, r, t), where the strict equality should hold for fact triples. As for unknown triples, the proximity of the two sides reflects the plausibility of the triple. Thus, the score functions of these methods are always in the form $f_r(h, t) = -\|f_1(\mathbf{h}, \mathbf{r}) - f_2(\mathbf{t}, \mathbf{r})\|$.
Among them, there is a widely studied kind of method called translational methods. We call them "translational" because the origin of this kind of method, TransE, uses the translation transformation to design the triple approximate equation. Precisely, it chooses the real vector space $\mathbb{R}^k$ as the embedding space and regards the relation embedding $\mathbf{r}$ as a translation from the head entity embedding $\mathbf{h}$ to the tail entity embedding $\mathbf{t}$, so it designs the triple approximate equation as $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$. Following TransE, many improvements have emerged. TransH [23] claims it is better to assign a hyperplane in the embedding space to every relation (the hyperplane's normal vector is noted as $\mathbf{r}_p$), and only regards $\mathbf{r}$ as a translation from the projection of $\mathbf{h}$ to the projection of $\mathbf{t}$ on that hyperplane. Hence the triple approximate equation of TransH is $(I - \mathbf{r}_p \mathbf{r}_p^\top)\mathbf{h} + \mathbf{r} \approx (I - \mathbf{r}_p \mathbf{r}_p^\top)\mathbf{t}$, where $I$ is the identity matrix. TransR [24] generalizes TransH: it assigns a linear map to every relation $r$, noted as a transfer matrix $W_r$, which maps $\mathbf{h}$ and $\mathbf{t}$ into the relation space. TransR then utilizes the images of $\mathbf{h}$ and $\mathbf{t}$ in the relation space together with $\mathbf{r}$ to design the triple approximate equation in TransE's style: $W_r \mathbf{h} + \mathbf{r} \approx W_r \mathbf{t}$. Further, STransE [25] assigns each relation $r$ two different transfer matrices $W_{r,1}$ and $W_{r,2}$; similarly, its triple approximate equation is $W_{r,1}\mathbf{h} + \mathbf{r} \approx W_{r,2}\mathbf{t}$. These derivative methods of TransE are collectively known as TransX. Their score functions can be written in the form $f_r(h, t) = -\|g_{r,1}(\mathbf{h}) + \mathbf{r} - g_{r,2}(\mathbf{t})\|$, where $g_{r,i}(\cdot)$ denotes a matrix multiplication concerning relation $r$.
Because of the large number of derivative methods of TransE, some literature uses the term translational methods to refer to KGE based on difference norm in general, but this is not accurate enough. Some other methods do not turn to the translation transformation to design their triple approximate equations, such as TorusE [26] and RotatE. TorusE chooses a compact Lie group as its embedding space and can be regarded as a special case of RotatE in which the embedding moduli are fixed [13]. RotatE embeds entities and relations into the complex vector space $\mathbb{C}^k$ and replaces the translation in $\mathbb{R}^k$ with a rotation in $\mathbb{C}^k$. Specifically, each element $r_i$ ($1 \le i \le k$) of $\mathbf{r}$ is fixed to be a unitary complex number (i.e., $|r_i| = 1$). Hence the complex multiplication between the $i$-th element of $\mathbf{h}$ (i.e., $h_i$) and $r_i$ means that $h_i$ rotates in its complex plane by the angle $\mathrm{Arg}(r_i)$ (i.e., the argument of the complex number $r_i$). Using $\circ$ to denote the Hadamard (element-wise) product between two complex vectors, the triple approximate equation of RotatE is $\mathbf{h} \circ \mathbf{r} \approx \mathbf{t}$.
There are some KGE methods whose score functions belong to a special case of difference norm, namely the form $-\|\mathbf{h} \odot \mathbf{r} - \mathbf{t}\|$, where $\odot$ is some binary operation. When the ideal optimization is achieved, the triple approximate equations of these methods hold: $\mathbf{h} \odot \mathbf{r} = \mathbf{t}$. This property is useful for explaining some abilities to model relation patterns. For example, TransE and RotatE are two of these methods, and because their binary operations are both associative and commutative, they can only model commutative composition patterns. For more details, please see Section 5.
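To make this special case concrete, here is a minimal sketch of the TransE and RotatE score functions (plain Python with NumPy; the function names are ours, not from the original implementations), where $\odot$ is real vector addition and the element-wise complex product, respectively:

```python
import numpy as np

def transe_score(h, r, t, p=1):
    """TransE: the binary operation is real vector addition."""
    return -np.linalg.norm(h + r - t, ord=p)

def rotate_score(h, r_phase, t):
    """RotatE: the binary operation is the element-wise complex product;
    each relation element has unit modulus, so r is parameterized by its phases."""
    r = np.exp(1j * r_phase)              # unit-modulus complex vector
    return -np.sum(np.abs(h * r - t))     # L1 norm of the complex residuals
```

Both operations commute, which is exactly why neither score function can separate $r_1 \oplus r_2$ from $r_2 \oplus r_1$.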

2.1.2. KGE Based on Semantic Matching

The intuition of this sort of method is to measure the plausibility of a triple by inspecting the matching degree of the latent semantics of the two entities and the relation.
There is a family of methods, called bilinear models, that design score functions as bilinear maps of the head and tail entities. RESCAL [9] may be the first bilinear model. It selects the real vector space $\mathbb{R}^k$ as the embedding space of entities and assigns a $k \times k$ real matrix $W_r$ to each relation $r$. Then it directly applies $W_r$ to define a bilinear map as the score function. To reduce the complexity of $W_r$, DistMult [11] restricts $W_r$ to be a diagonal matrix, so DistMult can express $W_r$ as a vector $\mathbf{r}$ in $\mathbb{R}^k$ and rewrite the score function as the multi-linear dot product of $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$. To overcome DistMult's weakness in modeling the antisymmetry relation pattern, ComplEx [12] extends the embedding space into the complex vector space $\mathbb{C}^k$ and modifies the score function. QuatE [14] further develops ComplEx by extending the embedding space into the quaternion vector space to obtain better expressive ability. DualE [27] uses dual quaternion vectors to design the embeddings of entities and relations and chooses the dual quaternion inner product as the score function. DihEdral [28] designs entity embeddings with real vectors and relation embeddings with dihedral-group vectors, where each dihedral group element is expressed as a second-order discrete real matrix. Although its score function is a bilinear form, which belongs to the semantic matching type, it is theoretically proven that this score function is equivalent to a difference-norm function of the form $-\|\mathbf{h} \odot \mathbf{r} - \mathbf{t}\|$ for optimizing relation embeddings. So DihEdral has the ability to model composition patterns like TransE and RotatE. Furthermore, since the multiplication of dihedral group elements generally does not satisfy the commutative law, DihEdral can model noncommutative composition patterns. However, because its relation embeddings take discrete values, DihEdral has to use special treatments of the relation embeddings during the training process, and its actual performance is easily affected by these treatments. As for QuatE and DualE, their relation embeddings have the potential to model noncommutative composition patterns because (dual) quaternion multiplication generally does not satisfy the commutative law. Nevertheless, because their score functions belong to the semantic matching type and currently lack a theoretical equivalence to a difference-norm function of the form $-\|\mathbf{h} \odot \mathbf{r} - \mathbf{t}\|$ like DihEdral, their abilities to model composition patterns have no strict theoretical guarantee. More precisely, their triple approximate equations, if any, do not necessarily hold when the ideal optimization is achieved, which is a crucial but easily overlooked step for a rigorous proof.
Apart from bilinear models, some models based on neural networks have emerged recently; for example, ConvE [17] and ConvKB [29] use convolutional neural networks to construct the score functions.
Some mentioned KGE methods are listed in Table 1 with their score functions. Their abilities to model the relation patterns are shown in Table 2. We can see that our QRSE can model all relation patterns, which is a rare ability.
Additionally, “supervised relation composition” [30] is a method that can model composition patterns under supervision. But it is not a KGE method. Its goal is to design and train a function model that can take the embeddings of two relations as input and output the embedding of the composite relation of these two relations. The relation embeddings used are provided by an existing KGE model and are fixed once obtained. The supervisory information used for training is mined from the original KGs by another method. This method and the KGE models mentioned before belong to different research directions. The direction of KGE models studies how to directly model relation patterns (including composition patterns) by training entity and relation embeddings from the original KGs.

2.2. Path-Finding Method

This class of methods does not need a score function to predict unknown entities, e.g., MINERVA [31], MultiHopKG [32], and DeepPath [33]. Instead, they start from the query entity node and follow the direction implied by the query relation to search the KG for the unknown entity. Compared with KGE methods, their results are explainable to some extent since they can provide the inference paths as evidence, but a lack of precision is their weakness at present.

3. Preliminaries

Before introducing our proposed method, let us briefly explain the related concepts and geometric meaning of quaternions.

3.1. A Brief Introduction of Quaternion

As a number system extended from the complex numbers $\mathbb{C}$, the quaternions $\mathbb{H}$ [15] introduce three fundamental quaternion units $i$, $j$, and $k$, which do not exist in the real numbers. Each quaternion $q$ can be expressed as $q = a + bi + cj + dk$, where $a$, $b$, $c$, and $d$ are all real numbers. The addition of quaternions is defined as $(a_1 + b_1 i + c_1 j + d_1 k) + (a_2 + b_2 i + c_2 j + d_2 k) \triangleq (a_1 + a_2) + (b_1 + b_2)i + (c_1 + c_2)j + (d_1 + d_2)k$. The multiplications between the fundamental quaternion units are defined as $i^2 = j^2 = k^2 = -1$ and $ij = -ji = k$, $jk = -kj = i$, $ki = -ik = j$. Obviously, this multiplication is associative but not commutative. For completeness, we also stipulate that the multiplication between any one of $\{i, j, k\}$ and a real number is commutative and associative. To obey the distributive law, we consequently obtain the multiplication between two arbitrary quaternions as:
$(a_1 + b_1 i + c_1 j + d_1 k)(a_2 + b_2 i + c_2 j + d_2 k) \triangleq (a_1 a_2 - b_1 b_2 - c_1 c_2 - d_1 d_2) + (a_1 b_2 + b_1 a_2 + c_1 d_2 - d_1 c_2)i + (a_1 c_2 + c_1 a_2 + d_1 b_2 - b_1 d_2)j + (a_1 d_2 + d_1 a_2 + b_1 c_2 - c_1 b_2)k.$
We can conclude that the multiplication of quaternions (also known as the Hamilton product) holds the associative and distributive law, but does not hold the commutative law in general. Nevertheless, there are some special cases where the commutative law holds.
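As a minimal illustration, the Hamilton product can be implemented directly from the component formula above. The following sketch (plain Python; the function name is ours) also makes the noncommutativity visible on a concrete pair of quaternions:

```python
# A quaternion a + b*i + c*j + d*k is stored as a tuple (a, b, c, d).
def hamilton(q1, q2):
    """Hamilton product of two quaternions, following the component formula above."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 + c1*a2 + d1*b2 - b1*d2,
            a1*d2 + d1*a2 + b1*c2 - c1*b2)

p = (1.0, 2.0, 3.0, 4.0)
q = (0.5, -1.0, 0.0, 2.0)
print(hamilton(p, q))  # (-5.5, 6.0, -6.5, 7.0)
print(hamilton(q, p))  # (-5.5, -6.0, 9.5, 1.0): a different result, so pq != qp in general
```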
Some useful concepts of quaternions are listed as follows (let q = a + b i + c j + d k ):
Modulus: The modulus of $q$ is written as $|q|$ and is defined as $|q| \triangleq \sqrt{a^2 + b^2 + c^2 + d^2}$. Since the set of quaternions $\mathbb{H}$ is a linear space isomorphic to $\mathbb{R}^4$ with basis $(1, i, j, k)$, the modulus intuitively means the length of $q$. In addition, if $|q| = 1$, $q$ is called a unit quaternion.
Real and imaginary part: Similar to complex numbers, the real number $a$ is the real part of $q$, and the real vector $\mathbf{v} \triangleq (b, c, d)$ is the imaginary part of $q$. Sometimes we express $q$ in the form $[a, \mathbf{v}]$ for convenience. Then, the multiplication of quaternions can be written as $[a_1, \mathbf{v}_1][a_2, \mathbf{v}_2] = [a_1 a_2 - \mathbf{v}_1 \cdot \mathbf{v}_2,\ a_1\mathbf{v}_2 + a_2\mathbf{v}_1 + \mathbf{v}_1 \times \mathbf{v}_2]$, where $\cdot$ is the dot product and $\times$ is the cross product.
Conjugate: The conjugate of $q$ is the quaternion $\bar{q} \triangleq a - bi - cj - dk$. It has these properties: (1) $\overline{q_1 q_2} = \bar{q}_2 \bar{q}_1$; (2) $q\bar{q} = \bar{q}q = |q|^2$; and from (1) and (2) we get (3) $|q_1||q_2| = |q_1 q_2|$. As a corollary, the product of two unit quaternions is also a unit quaternion.
Reciprocal: If $q \neq 0$, the reciprocal of $q$ is the quaternion $q^{-1}$ such that $q q^{-1} = q^{-1} q = 1$, which is equivalent to defining $q^{-1} \triangleq \bar{q} / |q|^2$.

3.2. The Geometric Meaning of the Multiplication of Quaternions

To see the geometric meaning of the multiplication of quaternions, we have to view $\mathbb{H}$ as a linear space isomorphic to $\mathbb{R}^4$ with the orthonormal basis $(1, i, j, k)$. Any $q$ in $\mathbb{H}$ can be expressed in the form $q = \rho\,[\cos\theta, \sin\theta\,\mathbf{n}]$, where $\rho \ge 0$ and $\|\mathbf{n}\| = 1$. This is because if $q \neq 0$ we can set $\rho = |q|$, $\theta = \arccos(a / |q|)$, and $\mathbf{n} = \mathbf{v} / \|\mathbf{v}\|$ (if $\mathbf{v} = \mathbf{0}$, then $\sin\theta = 0$ and $\mathbf{n}$ can be chosen arbitrarily), whereas if $q = 0$ we can set $\rho = 0$ and choose $\theta$ and $\mathbf{n}$ arbitrarily. Note that $[\cos\theta, \sin\theta\,\mathbf{n}]$ is a unit quaternion and gives the direction of $q$, while $\rho$ gives the length of $q$.
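A short sketch of this polar decomposition (NumPy; the helper name quat_polar is ours) may make the construction of $\rho$, $\theta$, and $\mathbf{n}$ concrete:

```python
import numpy as np

def quat_polar(q):
    """Decompose q = (a, b, c, d) into (rho, theta, n) with q = rho * [cos(theta), sin(theta)*n]."""
    a, v = q[0], np.asarray(q[1:], dtype=float)
    rho = np.sqrt(a * a + v @ v)                  # modulus |q|
    if rho == 0.0:                                # q = 0: direction is arbitrary
        return 0.0, 0.0, np.array([1.0, 0.0, 0.0])
    theta = np.arccos(a / rho)
    v_norm = np.linalg.norm(v)
    n = v / v_norm if v_norm > 0 else np.array([1.0, 0.0, 0.0])  # arbitrary n if v = 0
    return rho, theta, n

# q = 1 + i  ->  rho = sqrt(2), theta = pi/4, n = (1, 0, 0)
print(quat_polar((1.0, 1.0, 0.0, 0.0)))
```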
Take another quaternion $p = [s, \mathbf{u}]$ from $\mathbb{H}$; then the product $pq = \rho\,(p\,[\cos\theta, \sin\theta\,\mathbf{n}])$ is a new quaternion reached from $p$ via two steps: (1) changing the direction of $p$ according to $[\cos\theta, \sin\theta\,\mathbf{n}]$, and (2) stretching the length by a factor of $\rho$. So we only have to see what change is implied by $p\,[\cos\theta, \sin\theta\,\mathbf{n}]$.
Without loss of generality, suppose $\mathbf{u} \neq \mathbf{0}$ and $\mathbf{u}$ is not parallel to $\mathbf{n}$. Then we can find another orthonormal basis of $\mathbb{H}$: $([1, \mathbf{0}], [0, \mathbf{n}], [0, \mathbf{n}_{\perp}], [0, \mathbf{n}_{\times}])$. Here, $\mathbf{n}_{\perp} \triangleq (\mathbf{u} - (\mathbf{u}\cdot\mathbf{n})\mathbf{n}) / \|\mathbf{u} - (\mathbf{u}\cdot\mathbf{n})\mathbf{n}\|$ and $\mathbf{n}_{\times} \triangleq \mathbf{n}_{\perp} \times \mathbf{n}$. Besides, the coordinates of $p$ under this basis are $(s, l, l_{\perp}, 0)$, where $l = \mathbf{u}\cdot\mathbf{n}$ and $l_{\perp} = \|\mathbf{u} - (\mathbf{u}\cdot\mathbf{n})\mathbf{n}\|$. Thus we can split $p$ into two parts: $p = p_1 + p_2$, where $p_1 = [s, l\,\mathbf{n}]$ and $p_2 = [0, l_{\perp}\,\mathbf{n}_{\perp}]$. So we only have to see what $p_1\,[\cos\theta, \sin\theta\,\mathbf{n}]$ and $p_2\,[\cos\theta, \sin\theta\,\mathbf{n}]$ mean.
Since $p_1[\cos\theta, \sin\theta\,\mathbf{n}] = [s\cos\theta - l\sin\theta, (s\sin\theta + l\cos\theta)\,\mathbf{n}]$, this product and $p_1$ are both in the plane with basis $([1, \mathbf{0}], [0, \mathbf{n}])$, and we can show the transformation from $p_1$ to the product by their coordinates under this basis as:
$p_1 \mapsto p_1[\cos\theta, \sin\theta\,\mathbf{n}]: \quad \begin{pmatrix} s \\ l \end{pmatrix} \mapsto \begin{pmatrix} s\cos\theta - l\sin\theta \\ s\sin\theta + l\cos\theta \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} s \\ l \end{pmatrix}.$
So $p_1[\cos\theta, \sin\theta\,\mathbf{n}]$ means that $p_1$ rotates by angle $\theta$ counterclockwise in the plane $\mathrm{span}([1, \mathbf{0}], [0, \mathbf{n}])$. In the same way, since $p_2[\cos\theta, \sin\theta\,\mathbf{n}] = [0, l_{\perp}\cos\theta\,\mathbf{n}_{\perp} + l_{\perp}\sin\theta\,\mathbf{n}_{\times}]$, this product and $p_2$ are both in the plane with basis $([0, \mathbf{n}_{\perp}], [0, \mathbf{n}_{\times}])$, and we can show the transformation from $p_2$ to the product by their coordinates under this basis as:
$p_2 \mapsto p_2[\cos\theta, \sin\theta\,\mathbf{n}]: \quad \begin{pmatrix} l_{\perp} \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} l_{\perp}\cos\theta \\ l_{\perp}\sin\theta \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} l_{\perp} \\ 0 \end{pmatrix}.$
So $p_2[\cos\theta, \sin\theta\,\mathbf{n}]$ means that $p_2$ rotates by angle $\theta$ counterclockwise in the plane $\mathrm{span}([0, \mathbf{n}_{\perp}], [0, \mathbf{n}_{\times}])$.
In a word, the change implied by $p\,[\cos\theta, \sin\theta\,\mathbf{n}]$ is: (1) split $p$ into two components $p_1$ and $p_2$, where $p_1$ lies in the plane $\mathrm{span}([1, \mathbf{0}], [0, \mathbf{n}])$ and $p_2$ lies in the plane $\mathrm{span}([0, \mathbf{n}_{\perp}], [0, \mathbf{n}_{\times}])$; (2) rotate $p_1$ and $p_2$ by angle $\theta$ counterclockwise in their respective planes simultaneously, as shown in Figure 1; (3) add the two new components together.
As a special case, when $\mathbf{u} = \mathbf{0}$ or $\mathbf{u}$ is parallel to $\mathbf{n}$, we have $p = p_1$, so $p\,[\cos\theta, \sin\theta\,\mathbf{n}]$ only means the rotation in the plane $\mathrm{span}([1, \mathbf{0}], [0, \mathbf{n}])$. Moreover, the geometric meaning of $qp$ is almost the same as that of $pq$, except that the rotation of $p_2$ is clockwise.

4. Proposed Method

Now we start to introduce our proposed KGE model. The embedding spaces of entities $\mathcal{E}$ and relations $\mathcal{R}$ are both the quaternion vector space $\mathbb{H}^k$. For any $e \in \mathcal{E}$ and $r \in \mathcal{R}$, their embeddings are noted as $\mathbf{e}$ and $\mathbf{r}$ in lowercase bold letters, respectively. The $i$-th elements of $\mathbf{e}$ and $\mathbf{r}$ are written as $e_i$ ($e_i \in \mathbb{H}$) and $r_i$ ($r_i \in \mathbb{H}$) for every integer $i$ from 1 to $k$. Our model is based on difference norm, so we first define its triple approximate equation as $\mathbf{h} \otimes \mathbf{r} \approx \mathbf{t}$ for any triple (h, r, t). Here, $\otimes$ denotes the Hadamard (element-wise) product between two quaternion vectors, so this triple approximate equation is equivalent to asking $h_i r_i \approx t_i$ for all $i$ ($1 \le i \le k$). In consequence, we get our score function:
$f_r(h, t) \triangleq -\|\mathbf{h} \otimes \mathbf{r} - \mathbf{t}\|.$
Here, $\|\mathbf{q}\|$ is the abbreviation of $\|\mathbf{q}\|_{p,1} \triangleq \big(\sum_{i=1}^{k} (|a_i|^p + |b_i|^p + |c_i|^p + |d_i|^p)\big)^{1/p}$ for any quaternion vector $\mathbf{q}$ (with $q_i = a_i + b_i i + c_i j + d_i k$, $1 \le i \le k$), and $p$ ($p \ge 1$) is a hyperparameter.
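A minimal sketch of this score function follows (NumPy; the storage convention of a quaternion vector as a (k, 4) real array and the function names are our own choices, not part of the original implementation):

```python
import numpy as np

def hamilton_batch(q1, q2):
    """Element-wise Hamilton product of two quaternion vectors stored as (k, 4) arrays."""
    a1, b1, c1, d1 = q1.T
    a2, b2, c2, d2 = q2.T
    return np.stack([a1*a2 - b1*b2 - c1*c2 - d1*d2,
                     a1*b2 + b1*a2 + c1*d2 - d1*c2,
                     a1*c2 + c1*a2 + d1*b2 - b1*d2,
                     a1*d2 + d1*a2 + b1*c2 - c1*b2], axis=-1)

def qrse_score(h, r, t, p=2):
    """f_r(h, t) = -|| h (x) r - t ||_{p,1}; a higher score means a more plausible triple."""
    diff = hamilton_batch(h, r) - t          # (k, 4) residual over all real components
    return -float(np.sum(np.abs(diff) ** p) ** (1.0 / p))
```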
According to the geometric meaning of quaternion multiplication, we can explain the purpose of this triple approximate equation intuitively: we treat each element $r_i$ of the relation embedding (written in the form $\rho_i[\cos\theta_i, \sin\theta_i\,\mathbf{n}_i]$) as a two-step transformation from $h_i$ to $t_i$: (1) rotate $h_i$ in two planes ($\mathrm{span}([1, \mathbf{0}], [0, \mathbf{n}_i])$ and $\mathrm{span}([0, \mathbf{n}_{i,\perp}], [0, \mathbf{n}_{i,\times}])$) counterclockwise by angle $\theta_i$; (2) stretch $h_i$ by the scaling factor $\rho_i$. Thus we refer to our model as QuatRotatScalE (or QRSE, for short), because we use Quaternions with Rotation and Scaling transformations to design the Embedding model.

Optimization

The general objective of KGE models is to return high scores for true triples and low scores for false triples. Like most other KGE methods, we adopt negative sampling as our training style to avoid the efficiency loss brought by the huge number of entities. The training KG usually only contains true triples (positive samples, noted as $\Omega$) without false triples (negative samples). Thus we apply a common way of obtaining negative samples, i.e., corrupting the positive samples. Suppose (h, r, t) is a positive sample; we can get two sets of negative samples by replacing the head or tail entity with other entities: $N_h(r, t) \triangleq \{(h', r, t) \mid h' \text{ is uniformly sampled from } \mathcal{E} \text{ s.t. } (h', r, t) \notin \Omega\}$ and $N_t(h, r) \triangleq \{(h, r, t') \mid t' \text{ is uniformly sampled from } \mathcal{E} \text{ s.t. } (h, r, t') \notin \Omega\}$. The sizes of the negative sample sets $N_h(r, t)$ and $N_t(h, r)$ are fixed and much smaller than $|\mathcal{E}|$.
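A sketch of this corruption step, assuming the triples are stored as a Python set of (head, relation, tail) id tuples (the helper name is ours):

```python
import random

def corrupt(triple, entities, positives, n_neg, corrupt_head=True):
    """Uniformly sample n_neg negatives by replacing the head (or tail) entity,
    skipping candidates that are already known positive triples."""
    h, r, t = triple
    negatives = []
    while len(negatives) < n_neg:
        e = random.choice(entities)
        cand = (e, r, t) if corrupt_head else (h, r, e)
        if cand not in positives:            # keep only triples outside Omega
            negatives.append(cand)
    return negatives
```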
Following RotatE [13], we use the loss function on each triple (h, r, t) in the training KG as
$L = -\log\sigma(\gamma + f_r(h, t)) - \sum_{(h', r, t') \in N} p(h', r, t')\,\log\sigma(-\gamma - f_r(h', t'))$
where $\sigma$ is the sigmoid function, $\gamma$ is a fixed margin, and $N$ is $N_h(r, t)$ or $N_t(h, r)$. In practice, $N$ is regenerated in the same way ($N_h(r, t)$ or $N_t(h, r)$) for every positive sample in one training batch. When it turns to the next training batch, $N$ switches to the other regenerating way. $p(h', r, t')$ is the distribution of self-adversarial negative sampling proposed by RotatE [13] and is defined as
$p(h', r, t') = \dfrac{\exp(\alpha f_r(h', t'))}{\sum_{(h'', r, t'') \in N} \exp(\alpha f_r(h'', t''))}$
where $\alpha$ is the temperature of sampling. Self-adversarial negative sampling moderates the low efficiency of uniform negative sampling. We take Adam as our optimizer. Moreover, $p(h', r, t')$ plays the role of an importance sampling ratio in $L$, so gradients need not be backpropagated through it.
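For concreteness, here is a sketch of this per-triple loss in NumPy (forward computation only; in an actual training loop the self-adversarial weights would be treated as constants so that no gradient flows through them, as noted above; the function names are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_adversarial_loss(pos_score, neg_scores, gamma, alpha):
    """Negative-sampling loss for one positive triple.
    pos_score: f_r(h, t) of the positive sample (scalar).
    neg_scores: array of f_r(h', t') over the negative set N."""
    w = np.exp(alpha * neg_scores)
    w = w / w.sum()                                   # p(h', r, t'), the self-adversarial weights
    pos_term = -np.log(sigmoid(gamma + pos_score))
    neg_term = -np.sum(w * np.log(sigmoid(-gamma - neg_scores)))
    return pos_term + neg_term
```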

5. Theoretic Analysis

5.1. Relation Patterns

As mentioned in Section 1, modeling (i.e., identifying and utilizing) the relation patterns in KGs is fundamental for KGE models to solve the link prediction problem. There are three types of relation patterns that are very powerful and widely exist in various KGs [12,13,16,34,35]:
Symmetry/antisymmetry: A relation $r$ is symmetric (antisymmetric) if $\forall e_1, e_2 \in \mathcal{E}$, $(e_1, r, e_2) \Rightarrow (e_2, r, e_1)$ (respectively, $(e_1, r, e_2) \Rightarrow \neg(e_2, r, e_1)$).
Inversion: Relation $r_1$ is inverse to relation $r_2$ if $\forall e_1, e_2 \in \mathcal{E}$, $(e_1, r_2, e_2) \Rightarrow (e_2, r_1, e_1)$.
Composition: Relation $r_1$ is composed of relation $r_2$ and relation $r_3$ if $\forall e_1, e_2, e_3 \in \mathcal{E}$, $(e_1, r_2, e_2) \wedge (e_2, r_3, e_3) \Rightarrow (e_1, r_1, e_3)$. We adopt the form $r_2 \oplus r_3 = r_1$ to describe this composition pattern for simplicity. Moreover, if both $r_2 \oplus r_3 = r_1$ and $r_3 \oplus r_2 = r_1$ hold, $r_2 \oplus r_3 = r_1$ is a commutative composition pattern. Otherwise, if only $r_2 \oplus r_3 = r_1$ holds, it is a noncommutative composition pattern.

5.2. Abilities to Model Relation Patterns

In this subsection, we prove that QRSE can model symmetry/antisymmetry, inversion, and composition patterns (including noncommutative ones), and that TransE and RotatE are unable to model noncommutative composition patterns. In the following, if the triple (h, r, t) is in the knowledge graph, we write it in the embedding space as $\mathbf{h} \otimes \mathbf{r} = \mathbf{t}$ for QRSE, because its score function is a special case of the difference norm $-\|\mathbf{h} \otimes \mathbf{r} - \mathbf{t}\|$, and when the ideal optimization is achieved we directly get $\mathbf{h} \otimes \mathbf{r} = \mathbf{t}$ (we can replace $\otimes$ with $+$ or $\circ$ for TransE or RotatE for the same reason).
  • QRSE can model symmetry/antisymmetry patterns:
    Suppose $\mathbf{e}_2 \otimes \mathbf{r} = \mathbf{e}_1$ and $\mathbf{e}_1 \otimes \mathbf{r} = \mathbf{e}_2$. We get $\mathbf{e}_1 \otimes \mathbf{r} \otimes \mathbf{r} = \mathbf{e}_1$, which means that for any $i$ ($1 \le i \le k$), $e_{1,i}\, r_i\, r_i = e_{1,i}$. If $e_{1,i} = 0$, $r_i$ can be any quaternion; but if $e_{1,i} \neq 0$, $r_i$ must satisfy:
    $e_{1,i}\, r_i\, r_i = e_{1,i} \;\Rightarrow\; e_{1,i}^{-1}\, e_{1,i}\, r_i\, r_i = e_{1,i}^{-1}\, e_{1,i} \;\Rightarrow\; r_i\, r_i = 1 \;\Rightarrow\; r_i = r_i^{-1}.$
    Since $\forall q_1, q_2 \in \mathbb{H}$, $|q_1||q_2| = |q_1 q_2|$, we have $|r_i| = 1$; and since $\forall q \in \mathbb{H}$, $q^{-1} = \bar{q} / |q|^2$, we have $r_i = \bar{r}_i$. Thus $r_i$ is $1$ or $-1$. In a word, if $\mathbf{r}$ satisfies $r_i \in \{1, -1\}$ ($1 \le i \le k$), $\mathbf{r}$ models a symmetry pattern; otherwise, it models an antisymmetry pattern.
  • QRSE can model inversion patterns:
    Suppose $\mathbf{e}_2 \otimes \mathbf{r}_1 = \mathbf{e}_1$ and $\mathbf{e}_1 \otimes \mathbf{r}_2 = \mathbf{e}_2$. We get $\mathbf{e}_1 \otimes \mathbf{r}_2 \otimes \mathbf{r}_1 = \mathbf{e}_1$, which means that for any $i$ ($1 \le i \le k$), $e_{1,i}\, r_{2,i}\, r_{1,i} = e_{1,i}$. If $e_{1,i} = 0$, $r_{2,i}$ and $r_{1,i}$ can be any quaternions; but if $e_{1,i} \neq 0$, $r_{2,i}$ and $r_{1,i}$ must satisfy:
    $e_{1,i}\, r_{2,i}\, r_{1,i} = e_{1,i} \;\Rightarrow\; e_{1,i}^{-1}\, e_{1,i}\, r_{2,i}\, r_{1,i} = e_{1,i}^{-1}\, e_{1,i} \;\Rightarrow\; r_{2,i}\, r_{1,i} = 1 \;\Rightarrow\; r_{1,i} = r_{2,i}^{-1}.$
    Define $\mathbf{q}^{-1} \triangleq (q_1^{-1}, q_2^{-1}, \ldots, q_k^{-1})$ for any $\mathbf{q} \in \mathbb{H}^k$ with $q_i \neq 0$ ($1 \le i \le k$). We can conclude that if $\mathbf{r}_1 = \mathbf{r}_2^{-1}$, then $\mathbf{r}_1$ and $\mathbf{r}_2$ model an inversion pattern.
  • QRSE can model composition patterns:
    Suppose $\mathbf{e}_1 \otimes \mathbf{r}_2 = \mathbf{e}_2$, $\mathbf{e}_2 \otimes \mathbf{r}_3 = \mathbf{e}_3$, and $\mathbf{e}_1 \otimes \mathbf{r}_1 = \mathbf{e}_3$. We get $\mathbf{e}_1 \otimes \mathbf{r}_2 \otimes \mathbf{r}_3 = \mathbf{e}_1 \otimes \mathbf{r}_1$, which means that for any $i$ ($1 \le i \le k$), $e_{1,i}\, r_{2,i}\, r_{3,i} = e_{1,i}\, r_{1,i}$. If $e_{1,i} = 0$, then $r_{2,i}$, $r_{3,i}$, and $r_{1,i}$ can be any quaternions; but if $e_{1,i} \neq 0$, they must satisfy:
    $e_{1,i}\, r_{2,i}\, r_{3,i} = e_{1,i}\, r_{1,i} \;\Rightarrow\; e_{1,i}^{-1}\, e_{1,i}\, r_{2,i}\, r_{3,i} = e_{1,i}^{-1}\, e_{1,i}\, r_{1,i} \;\Rightarrow\; r_{2,i}\, r_{3,i} = r_{1,i}.$
    Moreover, if we additionally suppose $\mathbf{e}_4 \otimes \mathbf{r}_3 = \mathbf{e}_5$, $\mathbf{e}_5 \otimes \mathbf{r}_2 = \mathbf{e}_6$, and $\mathbf{e}_4 \otimes \mathbf{r}_1 = \mathbf{e}_6$, then if $e_{4,i} \neq 0$ for all $i$ ($1 \le i \le k$), $r_{3,i}$, $r_{2,i}$, and $r_{1,i}$ must satisfy $r_{3,i}\, r_{2,i} = r_{1,i}$, which means $r_{2,i}\, r_{3,i} = r_{3,i}\, r_{2,i}$. If we write $r_{2,i} = [a_{2,i}, \mathbf{v}_{2,i}]$ and $r_{3,i} = [a_{3,i}, \mathbf{v}_{3,i}]$, then we get:
    $[a_{2,i} a_{3,i} - \mathbf{v}_{2,i} \cdot \mathbf{v}_{3,i},\ a_{2,i}\mathbf{v}_{3,i} + a_{3,i}\mathbf{v}_{2,i} + \mathbf{v}_{2,i} \times \mathbf{v}_{3,i}] = [a_{3,i} a_{2,i} - \mathbf{v}_{3,i} \cdot \mathbf{v}_{2,i},\ a_{3,i}\mathbf{v}_{2,i} + a_{2,i}\mathbf{v}_{3,i} + \mathbf{v}_{3,i} \times \mathbf{v}_{2,i}] \;\Rightarrow\; \mathbf{v}_{2,i} \times \mathbf{v}_{3,i} = \mathbf{v}_{3,i} \times \mathbf{v}_{2,i} \;\Rightarrow\; \mathbf{v}_{2,i} = \mathbf{0} \text{ or } \mathbf{v}_{3,i} = \lambda \mathbf{v}_{2,i}\ (\lambda \in \mathbb{R}).$
    We can conclude that if $\mathbf{r}_2 \otimes \mathbf{r}_3 = \mathbf{r}_1$, then $\mathbf{r}_2$, $\mathbf{r}_3$, and $\mathbf{r}_1$ model a composition pattern. Moreover, if $\mathbf{v}_{2,i}$ is parallel to $\mathbf{v}_{3,i}$ for all $i$ ($1 \le i \le k$), it is a commutative composition pattern; otherwise, it is a noncommutative composition pattern (a numerical check of this condition is sketched after this list).
  • TransE and RotatE can not model noncommutative composition patterns, and they can only model commutative composition patterns:
    For TransE, suppose $\mathbf{e}_1 + \mathbf{r}_2 = \mathbf{e}_2$, $\mathbf{e}_2 + \mathbf{r}_3 = \mathbf{e}_3$, $\mathbf{e}_1 + \mathbf{r}_1 = \mathbf{e}_3$, $\mathbf{e}_4 + \mathbf{r}_3 = \mathbf{e}_5$, $\mathbf{e}_5 + \mathbf{r}_2 = \mathbf{e}_6$, but $\mathbf{e}_4 + \mathbf{r}_1 \neq \mathbf{e}_6$, which means the composition of relations $r_2$ and $r_3$ is noncommutative. From the first three equations we get $\mathbf{r}_2 + \mathbf{r}_3 = \mathbf{r}_1$, and from the fourth and fifth equations we get $\mathbf{e}_4 + \mathbf{r}_3 + \mathbf{r}_2 = \mathbf{e}_6$. Because $\mathbf{r}_2 + \mathbf{r}_3 = \mathbf{r}_3 + \mathbf{r}_2$, we get $\mathbf{e}_4 + \mathbf{r}_1 = \mathbf{e}_6$, which contradicts the condition. Therefore, TransE cannot model noncommutative composition patterns. If we replace the condition $\mathbf{e}_4 + \mathbf{r}_1 \neq \mathbf{e}_6$ with $\mathbf{e}_4 + \mathbf{r}_1 = \mathbf{e}_6$, the composition of $r_2$ and $r_3$ becomes a commutative composition. In this case the previous contradiction disappears, which means TransE can model commutative composition patterns.
    As for RotatE, suppose $\mathbf{e}_1 \circ \mathbf{r}_2 = \mathbf{e}_2$, $\mathbf{e}_2 \circ \mathbf{r}_3 = \mathbf{e}_3$, $\mathbf{e}_1 \circ \mathbf{r}_1 = \mathbf{e}_3$, $\mathbf{e}_4 \circ \mathbf{r}_3 = \mathbf{e}_5$, $\mathbf{e}_5 \circ \mathbf{r}_2 = \mathbf{e}_6$, but $\mathbf{e}_4 \circ \mathbf{r}_1 \neq \mathbf{e}_6$, which means the composition of relations $r_2$ and $r_3$ is noncommutative. Since $\mathbf{r}_2 \circ \mathbf{r}_3 = \mathbf{r}_3 \circ \mathbf{r}_2$ (the multiplication of complex numbers satisfies the commutative law), we can get $\mathbf{e}_4 \circ \mathbf{r}_1 = \mathbf{e}_6$ in the same way as for TransE, which contradicts the condition. So RotatE cannot model noncommutative composition patterns. If we replace the condition $\mathbf{e}_4 \circ \mathbf{r}_1 \neq \mathbf{e}_6$ with $\mathbf{e}_4 \circ \mathbf{r}_1 = \mathbf{e}_6$, the composition of $r_2$ and $r_3$ becomes a commutative composition. In this case the previous contradiction disappears, which means RotatE can model commutative composition patterns.
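The parallel-imaginary-part condition above, and the contrast with complex numbers, are easy to check numerically. A small sketch (plain Python; the hamilton helper repeats the component formula from Section 3.1):

```python
def hamilton(q1, q2):
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 + c1*a2 + d1*b2 - b1*d2,
            a1*d2 + d1*a2 + b1*c2 - c1*b2)

r2, r3 = (1.0, 2.0, 0.0, 0.0), (3.0, 0.0, 1.0, 0.0)   # imaginary parts not parallel
print(hamilton(r2, r3) == hamilton(r3, r2))            # False: noncommutative composition is possible

r2, r3 = (1.0, 2.0, 0.0, 0.0), (3.0, 4.0, 0.0, 0.0)   # imaginary parts parallel
print(hamilton(r2, r3) == hamilton(r3, r2))            # True: this pair commutes

print((1 + 2j) * (3 + 1j) == (3 + 1j) * (1 + 2j))      # True: complex products (RotatE) always commute
```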

6. Experiments

In this section, we first evaluate QRSE with RotatE on a small knowledge graph made up of two families. This experiment will verify the superiority of QRSE in modeling noncommutative composition relation patterns. Then we evaluate QRSE and compare it with many baselines in two well-established and widely used real-world datasets.

6.1. Experiment on a KG about Two Families

There are 10 entities and 4 relations in the training KG. Each entity is a member of one of the two families, and each relation is a type of kinship. For example, the triple (Am1, son, Am2) means Am1 has a son called Am2. All of the triples in the training KG are shown in Figure 2, where each directed edge represents a triple whose direction is from the head entity to the tail entity. Furthermore, the test set contains two triples: (Bm1, daughter_of_son, Bw3) and (Bm1, son_of_daughter, Bm3). We let the models predict the head or tail entity for each test triple, so there are 4 queries during the test process.
Since we need 2 and 4 real numbers to determine a complex number and a quaternion, respectively, we take $\mathbb{C}^{10}$ (i.e., embedding dimension $k = 10$) and $\mathbb{H}^{5}$ (i.e., $k = 5$) as the entity embedding spaces for RotatE and QRSE. Thus, in practice, we can express the entity embeddings of both RotatE and QRSE as 20-D real vectors. Except for the embedding dimension $k$, we keep the other hyperparameters the same for the two models: batch size $b = 10$, self-adversarial sampling temperature $\alpha = 0$, fixed margin $\gamma = 0$, learning rate $\eta = 0.001$, negative sampling size $|N| = 2$, and the order of the norm in the score function $p = 2$.
We use Hit@1 to measure the performance of the models, i.e., the proportion of correctly answered queries (those for which the true answer's score is ranked first) among all test queries. The test performances of RotatE and QRSE are shown in Figure 3. We can see that QRSE reaches the best Hit@1 value of 1.00 quickly and keeps this value for the rest of the training process. RotatE also reaches the best Hit@1 value quickly; however, after that, its Hit@1 value keeps fluctuating between 0.5 and 1.00 randomly. To explain this phenomenon, we inspected the detailed scores and embeddings at step 16,000, which is large enough to ensure the convergence of the two models.
The top 3 scores for all test queries are shown in Table 3. For the two queries to predict the head entity Bm1, the scores of Bm1 are much higher than the second candidate entities for both RotatE and QRSE. However for the two queries to predict the tail entities Bm3 and Bw3, only QRSE keeps the large gap between the first and the second score, whereas RotatE gives very close scores for the top 2 candidate entities on both of the two queries. This result reveals that, for RotatE, the score ranks for the top 2 candidates are unstable and easily affected by the random noise on the two queries to predict the tail. That is why the Hit@1 of RotatE fluctuates during training. Moreover, for RotatE, the top 2 candidate entities are Bm3 and Bw3 for both of the two tail queries. Thus we guess the embeddings of these two entities are also very close.
Figure 4 shows the embeddings of Bm3 and Bw3 in RotatE and QRSE. As we guessed, the two embeddings are very close in RotatE but different in QRSE. This result verifies that RotatE is unable to model noncommutative composition patterns, whereas QRSE can. Let us use bold type to indicate the embeddings as before. For RotatE, along with the training process, $\mathbf{Bw3}$ will approach $\mathbf{Bm2} \circ \mathbf{daughter}$, and $\mathbf{Bm2}$ will approach $\mathbf{Bm1} \circ \mathbf{son}$. Hence $\mathbf{Bw3}$ will approach $\mathbf{Bm1} \circ \mathbf{son} \circ \mathbf{daughter}$. Similarly, $\mathbf{Bm3}$ will approach $\mathbf{Bm1} \circ \mathbf{daughter} \circ \mathbf{son}$. Because $\mathbf{daughter} \circ \mathbf{son} = \mathbf{son} \circ \mathbf{daughter}$, $\mathbf{Bw3}$ will be close to $\mathbf{Bm3}$. For QRSE, $\mathbf{daughter} \otimes \mathbf{son} \neq \mathbf{son} \otimes \mathbf{daughter}$ in general, so $\mathbf{Bw3}$ will not be close to $\mathbf{Bm3}$.
We can also show this fact by directly inspecting the relation embeddings of the two models in Figure 5. Note that daughter $\oplus$ son and son $\oplus$ daughter are not relations in the KG but combinations of relations in the KG; their "embeddings" are calculated from the embeddings of the constituent relations (e.g., the "embedding" of daughter $\oplus$ son is $\mathbf{daughter} \otimes \mathbf{son}$ in QRSE). Obviously, the embeddings of son_of_daughter and daughter_of_son are almost the same in RotatE, since both approach $\mathbf{daughter} \circ \mathbf{son}$ during training. However, they are different in QRSE, since the embedding of son_of_daughter approaches $\mathbf{daughter} \otimes \mathbf{son}$ while the other approaches $\mathbf{son} \otimes \mathbf{daughter}$ during training.

6.2. Experiment on Real-World Datasets

6.2.1. Experimental Setting

We still evaluated our method on two well-established and widely used real-world knowledge graphs, FB15k-237 [16] and WN18RR [17], with several strong baselines.
FB15k-237 is selected from FB15k [10], which is a subset of Freebase and mainly records facts about movies, actors, and sports. FB15k suffers from test leakage through inverse relations: there are too many inversion patterns in the KG, which are too easy to model, so even a simple rule-based model can perform well [17]. To make the results more reliable, FB15k-237 removes these inverse relations. The statistics of FB15k-237 are 14,541 entities, 237 relations, 272,115 training triples, 17,535 validation triples, and 20,466 test triples.
WN18RR is selected from WN18 [10], which is a subset of WordNet and records lexical relations between words. WN18 also suffers from test leakage through inverse relations, so WN18RR removed its inverse patterns too. The statistics of WN18RR are 40,943 entities, 11 relations, 86,835 training triples, 3034 validation triples, and 3134 test triples.
The ranges of the hyperparameters for the grid search follow RotatE: embedding dimension $k \in \{125, 250, 500, 1000\}$, batch size $b \in \{512, 1024, 2048\}$, and fixed margin $\gamma \in \{3, 6, 9, 12, 18, 24, 30\}$. Moreover, we searched the self-adversarial sampling temperature $\alpha$ in $\{0.5, 1.0, 1.5\}$, the learning rate $\eta$ in $\{0.00005, 0.0001, 0.0002\}$, the negative sampling size $|N|$ in $\{16, 32, 64, 128\}$, and the order $p$ of the norm in the score function in $\{2, 3, 4, 5, 6, 7\}$. The embeddings are also uniformly initialized.
From each test triple (h, r, t), we generate two queries: (?, r, t) and (h, r, ?). Given each query, we can make a candidate triple by placing a candidate entity on the place of the entity to predict. The score of each candidate entity is just the score of its corresponding candidate triple. While ranking all the scores, we omit the scores of those candidate triples that already exist in training, validation, and test set, except the true answer for the query. This process is called “filtered” in some literature and is widely adopted in existing methods to avoid possibly flawed evaluation.

6.2.2. Results

We adopt the following standard evaluation measures for both datasets: the mean reciprocal rank of the true answers (MRR) and the proportion of queries whose true answers are ranked in the top k (Hit@k).
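As a reference, here is a minimal sketch of how these measures are computed in the filtered setting described above (NumPy; the function names are ours):

```python
import numpy as np

def filtered_rank(scores, true_idx, known_idx):
    """Rank of the true entity for one query, after masking other known positives."""
    scores = scores.astype(float).copy()
    mask = [i for i in known_idx if i != true_idx]
    scores[mask] = -np.inf                        # drop other true triples from the ranking
    return int(1 + np.sum(scores > scores[true_idx]))

def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """MRR and Hit@k over the filtered ranks of all test queries."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"MRR": float(np.mean(1.0 / ranks))}
    for k in ks:
        metrics[f"Hit@{k}"] = float(np.mean(ranks <= k))
    return metrics
```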
The link prediction results on the real-world datasets are shown in Table 4. The result of TransE is taken from [29]. The results of DistMult, ComplEx, and ConvE are taken from [17]. The results of RotatE and DualE are taken from [13,27], respectively. The results of DihEdral(STE) and DihEdral(Gumbel) are taken from [28], where STE and Gumbel are two special treatments of the discrete relation embeddings. The results of QuatE and QuatE(TC) are taken from [14], where TC indicates the corresponding model uses type constraints [36]. From this table, we can see that QRSE outperforms RotatE by a large margin on all datasets and evaluation measures. This result supports our analysis of the ability to model composition patterns. Compared with DihEdral(STE) and DihEdral(Gumbel), QRSE outperforms both of them on the two real-world datasets, whereas DihEdral(STE) is better than DihEdral(Gumbel) on FB15k-237 and the opposite holds on WN18RR. This means the performance of DihEdral is easily affected by the special treatments, and DihEdral cannot perform well on the two real-world datasets simultaneously. Compared with DualE and QuatE, QRSE outperforms both of them too. This means that, among all methods using (dual) quaternions so far, QRSE has explored the greatest potential of the (dual) quaternion space for knowledge graph embedding. Because type constraints [36] can integrate prior knowledge into various KGE models and can significantly improve their performance on link prediction tasks, QRSE and most baselines report results without them for fairness, except QuatE(TC). Surprisingly, QRSE is even slightly superior to QuatE with type constraints overall. Its success in this unfavorable comparison further demonstrates the strength of QRSE. Overall, our QRSE reaches the state-of-the-art on the link prediction problem on real-world datasets.

7. Conclusions and Future Work

We proposed a novel knowledge graph embedding model, QRSE, based on quaternions. QRSE is a KGE model that can model noncommutative composition patterns. Besides, it can also model many other relation patterns, such as symmetry/antisymmetry, inversion, and commutative composition patterns. We verified these properties by theoretical proofs and experiments. From the definition of the triple approximate equation of QRSE, we can easily see that QRSE is a generalization of RotatE; conversely, in some special cases, QRSE degenerates to RotatE, for example, when the coefficients of $j$ and $k$ are fixed to 0 for all quaternions in all embeddings and the moduli of all quaternions in the relation embeddings are fixed to 1. Before QRSE, QuatE had already generalized ComplEx by replacing the complex numbers with quaternions. However, QuatE only takes advantage of the fact that quaternions are more expressive than complex numbers, whereas our method not only leverages this expressive advantage but also exploits the noncommutative property of quaternion multiplication to model noncommutative composition patterns. The results of experiments on real-world datasets show that QRSE reaches the state-of-the-art on the link prediction problem. For future work, we plan to combine QRSE with deep models for natural language processing. With its help, we expect deep models to achieve higher accuracy on question answering tasks and to make the models' answers more interpretable.

Author Contributions

Conceptualization, C.X., C.F., D.C. and X.H.; methodology, C.X.; software, C.X.; validation, C.X. and C.F.; formal analysis, C.X.; investigation, C.X.; resources, D.C. and X.H.; data curation, C.X.; writing—original draft preparation, C.X.; writing—review and editing, C.X., C.F., D.C. and X.H.; visualization, C.X.; supervision, D.C. and X.H.; project administration, D.C. and X.H.; funding acquisition, D.C. and X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62273302, 62036009, U1909203, 61936006) and in part by the Innovation Capability Support Program of Shaanxi (Program No. 2021TD-05).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study and the code are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Zhang, F.; Yuan, N.J.; Lian, D.; Xie, X.; Ma, W. Collaborative Knowledge Base Embedding for Recommender Systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 353–362. [Google Scholar] [CrossRef]
  2. Hao, Y.; Zhang, Y.; Liu, K.; He, S.; Liu, Z.; Wu, H.; Zhao, J. An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1, pp. 221–231. [Google Scholar] [CrossRef] [Green Version]
  3. Xiong, C.; Power, R.; Callan, J. Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, 3–7 April 2017; pp. 1271–1279. [Google Scholar] [CrossRef] [Green Version]
  4. Yang, B.; Mitchell, T.M. Leveraging Knowledge Bases in LSTMs for Improving Machine Reading. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1, pp. 1436–1446. [Google Scholar] [CrossRef] [Green Version]
  5. Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z.G. DBpedia: A Nucleus for a Web of Open Data. In Proceedings of the The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Republic of Korea, 11–15 November 2007; pp. 722–735. [Google Scholar] [CrossRef] [Green Version]
  6. Bollacker, K.D.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, 10–12 June 2008; pp. 1247–1250. [Google Scholar] [CrossRef]
  7. Suchanek, F.M.; Kasneci, G.; Weikum, G. Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, AB, Canada, 8–12 May 2007; pp. 697–706. [Google Scholar] [CrossRef] [Green Version]
  8. Miller, G.A. WordNet: A Lexical Database for English. Commun. ACM 1995, 38, 39–41. [Google Scholar] [CrossRef]
  9. Nickel, M.; Tresp, V.; Kriegel, H. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, DC, USA, 28 June–2 July 2011; pp. 809–816. [Google Scholar]
  10. Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 2787–2795. [Google Scholar]
  11. Yang, B.; Yih, W.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  12. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex Embeddings for Simple Link Prediction. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York, NY, USA, 19–24 June 2016; pp. 2071–2080. [Google Scholar]
  13. Sun, Z.; Deng, Z.; Nie, J.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  14. Zhang, S.; Tay, Y.; Yao, L.; Liu, Q. Quaternion Knowledge Graph Embeddings. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 2731–2741. [Google Scholar]
  15. Hamilton, W.R. LXXVIII. On quaternions; or on a new system of imaginaries in Algebra: To the editors of the Philosophical Magazine and Journal. Philos. Mag. J. Sci. 1844, 25, 489–495. [Google Scholar] [CrossRef] [Green Version]
  16. Toutanova, K.; Chen, D. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, Beijing, China, 31 July 2015; pp. 57–66. [Google Scholar]
  17. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D Knowledge Graph Embeddings. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, LA, USA, 2–7 February 2018; pp. 1811–1818. [Google Scholar]
  18. Chen, L.; Wang, F.; Yang, R.; Xie, F.; Wang, W.; Xu, C.; Zhao, W.; Guan, Z. Representation learning from noisy user-tagged data for sentiment classification. Int. J. Mach. Learn. Cybern. 2022, 13, 3727–3742. [Google Scholar] [CrossRef]
  19. Zhao, W.; Guan, Z.; Chen, L.; He, X.; Cai, D.; Wang, B.; Wang, Q. Weakly-Supervised Deep Embedding for Product Review Sentiment Analysis. IEEE Trans. Knowl. Data Eng. 2018, 30, 185–197. [Google Scholar] [CrossRef]
  20. Yang, Y.; Guan, Z.; Zhao, W.; Weigang, L.; Zong, B. Graph Substructure Assembling Network with Soft Sequence and Context Attention. IEEE Trans. Knowl. Data Eng. 2022, 1. [Google Scholar] [CrossRef]
  21. Yang, Y.; Guan, Z.; Li, J.; Zhao, W.; Cui, J.; Wang, Q. Interpretable and Efficient Heterogeneous Graph Convolutional Network. IEEE Trans. Knowl. Data Eng. 2023, 35, 1637–1650. [Google Scholar] [CrossRef]
  22. Yang, Y.; Guan, Z.; Wang, Z.; Zhao, W.; Xu, C.; Lu, W.; Huang, J. Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 16–19 May 2022; Oh, A.H., Agarwal, A., Belgrave, D., Cho, K., Eds.; 2022. [Google Scholar]
  23. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 1112–1119. [Google Scholar]
  24. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2181–2187. [Google Scholar]
  25. Nguyen, D.Q.; Sirts, K.; Qu, L.; Johnson, M. STransE: A novel embedding model of entities and relationships in knowledge bases. In Proceedings of the NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 460–466. [Google Scholar]
  26. Ebisu, T.; Ichise, R. TorusE: Knowledge Graph Embedding on a Lie Group. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, LA, USA, 2–7 February 2018; pp. 1819–1826. [Google Scholar]
  27. Cao, Z.; Xu, Q.; Yang, Z.; Cao, X.; Huang, Q. Dual quaternion knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021; Volume 35, pp. 6894–6902. [Google Scholar]
  28. Xu, C.; Li, R. Relation Embedding with Dihedral Group in Knowledge Graph. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 263–272. [Google Scholar]
  29. Nguyen, D.Q.; Nguyen, T.D.; Nguyen, D.Q.; Phung, D.Q. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orleans, LA, USA, 1–6 June 2018; Volume 2, pp. 327–333. [Google Scholar]
  30. Chen, W.; Hakami, H.; Bollegala, D. Learning to compose relational embeddings in knowledge graphs. In Proceedings of the Computational Linguistics: 16th International Conference of the Pacific Association for Computational Linguistics, PACLING 2019, Hanoi, Vietnam, 11–13 October 2019; Revised Selected Papers 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 56–66. [Google Scholar]
  31. Das, R.; Dhuliawala, S.; Zaheer, M.; Vilnis, L.; Durugkar, I.; Krishnamurthy, A.; Smola, A.; McCallum, A. Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  32. Lin, X.V.; Socher, R.; Xiong, C. Multi-Hop Knowledge Graph Reasoning with Reward Shaping. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 3243–3253. [Google Scholar]
  33. Xiong, W.; Hoang, T.; Wang, W.Y. DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, 9–11 September 2017; pp. 564–573. [Google Scholar]
  34. Guu, K.; Miller, J.; Liang, P. Traversing Knowledge Graphs in Vector Space. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, 17–21 September 2015; pp. 318–327. [Google Scholar]
  35. Lin, Y.; Liu, Z.; Luan, H.; Sun, M.; Rao, S.; Liu, S. Modeling Relation Paths for Representation Learning of Knowledge Bases. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, 17–21 September 2015; pp. 705–714. [Google Scholar]
  36. Krompaß, D.; Baier, S.; Tresp, V. Type-constrained representation learning in knowledge graphs. In Proceedings of the The Semantic Web-ISWC 2015: 14th International Semantic Web Conference, Bethlehem, PA, USA, 11–15 October 2015; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2015; pp. 640–655. [Google Scholar]
Figure 1. How $p_1$ and $p_2$ rotate when they are multiplied by $[\cos\theta, \sin\theta\,\mathbf{n}]$ on the right.
Figure 2. The structure of the training KG, where each directed edge represents a triple.
Figure 3. The Hit@1 performance of RotatE and QRSE on the test set along with the training process.
Figure 4. The entity embeddings of RotatE and QRSE on training step 16,000. The 10-D complex or 5-D quaternion vectors are expressed in the corresponding 20-D real vectors.
Figure 5. The relation embeddings of RotatE and QRSE on training step 16,000. A relation embedding of RotatE has 10 complex numbers with modulus 1, which are determined by their 10 arguments; thus we express it by its 10 arguments in angle degrees. For QRSE, we continue to use the corresponding 20-D real vectors for each relation embedding.
Table 1. Score functions and embedding spaces of several KGE models. $\langle \mathbf{a}, \mathbf{b}, \mathbf{c} \rangle \triangleq \sum_{i=1}^{k} a_i b_i c_i$ means the multi-linear dot product of vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$; $\bar{\cdot}$ denotes the conjugate of a complex or quaternion vector; $\mathrm{Re}(\cdot)$ denotes the real part of a complex number or quaternion; $\otimes$ indicates the Hadamard (element-wise) product between two quaternion vectors. Note that we report an equivalent formulation for QuatE to show the inheritance relationship with ComplEx.

Model | Score Function | Embedding Space
TransE | $-\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$ | $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^k$
TransX | $-\|g_{r,1}(\mathbf{h}) + \mathbf{r} - g_{r,2}(\mathbf{t})\|$ | $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^k$
RotatE | $-\|\mathbf{h} \circ \mathbf{r} - \mathbf{t}\|$ | $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^k$, $|r_i| = 1$
RESCAL | $\mathbf{h}^\top W_r \mathbf{t}$ | $\mathbf{h}, \mathbf{t} \in \mathbb{R}^k$, $W_r \in \mathbb{R}^{k \times k}$
DistMult | $\langle \mathbf{h}, \mathbf{r}, \mathbf{t} \rangle$ | $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^k$
ComplEx | $\mathrm{Re}(\langle \mathbf{h}, \mathbf{r}, \bar{\mathbf{t}} \rangle)$ | $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^k$
QuatE | $\mathrm{Re}(\langle \mathbf{h}, \mathbf{r}, \bar{\mathbf{t}} \rangle)$ | $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{H}^k$, $|r_i| = 1$
QRSE | $-\|\mathbf{h} \otimes \mathbf{r} - \mathbf{t}\|$ | $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{H}^k$
Table 2. The modeling ability comparison for various relation patterns among different models (partial reference from [13]).

Model | Symmetry | Antisymmetry | Inversion | Commutative Composition | Noncommutative Composition
TransE | × | ✓ | ✓ | ✓ | ×
TransX | ✓ | ✓ | × | × | ×
RotatE | ✓ | ✓ | ✓ | ✓ | ×
RESCAL | ✓ | ✓ | ✓ | × | ×
DistMult | ✓ | × | × | × | ×
ComplEx | ✓ | ✓ | ✓ | × | ×
QuatE | ✓ | ✓ | ✓ | × | ×
QRSE | ✓ | ✓ | ✓ | ✓ | ✓
Table 3. The detailed test results of RotatE and QRSE at training step 16,000.

Model | Test Triple | Entity to Predict | Ranked Scores for Top 3 Candidate Entities
RotatE | (Bm1, son_of_daughter, Bm3) | Bm1 | Bm1: −0.0224, Am2: −5.4026, Bm3: −5.6849
RotatE | (Bm1, son_of_daughter, Bm3) | Bm3 | Bm3: −0.0224, Bw3: −0.0227, Bm1: −5.6840
RotatE | (Bm1, daughter_of_son, Bw3) | Bm1 | Bm1: −0.0178, Am2: −5.3975, Bm3: −5.6668
RotatE | (Bm1, daughter_of_son, Bw3) | Bw3 | Bw3: −0.0178, Bm3: −0.0179, Bm1: −5.6654
QRSE | (Bm1, son_of_daughter, Bm3) | Bm1 | Bm1: −0.00087, Aw2: −4.8535, Aw3: −4.9947
QRSE | (Bm1, son_of_daughter, Bm3) | Bm3 | Bm3: −0.00087, Aw3: −5.2716, Bw3: −5.3903
QRSE | (Bm1, daughter_of_son, Bw3) | Bm1 | Bm1: −0.00094, Aw2: −4.8532, Aw3: −4.9948
QRSE | (Bm1, daughter_of_son, Bw3) | Bw3 | Bw3: −0.00094, Bm3: −5.3905, Am2: −5.8095
Table 4. Link prediction results on the FB15k-237 and WN18RR datasets. Numbers in boldface are the best, and underlined numbers are the second best.

Model | FB15k-237: MRR | Hit@1 | Hit@3 | Hit@10 | WN18RR: MRR | Hit@1 | Hit@3 | Hit@10
TransE | 0.294 | - | - | 0.465 | 0.226 | - | - | 0.501
DistMult | 0.241 | 0.155 | 0.263 | 0.419 | 0.43 | 0.39 | 0.44 | 0.49
ComplEx | 0.247 | 0.158 | 0.275 | 0.428 | 0.44 | 0.41 | 0.46 | 0.51
ConvE | 0.325 | 0.237 | 0.356 | 0.501 | 0.43 | 0.40 | 0.44 | 0.52
RotatE | 0.338 | 0.241 | 0.375 | 0.533 | 0.476 | 0.428 | 0.492 | 0.571
DualE | 0.330 | 0.237 | 0.363 | 0.518 | 0.482 | 0.440 | 0.500 | 0.561
DihEdral(STE) | 0.320 | 0.230 | 0.353 | 0.502 | 0.480 | 0.452 | 0.491 | 0.536
DihEdral(Gumbel) | 0.300 | 0.204 | 0.332 | 0.496 | 0.486 | 0.442 | 0.505 | 0.557
QuatE | 0.311 | 0.221 | 0.342 | 0.495 | 0.481 | 0.436 | 0.500 | 0.564
QuatE(TC) | 0.348 | 0.248 | 0.382 | 0.550 | 0.488 | 0.438 | 0.508 | 0.582
QRSE | 0.350 | 0.252 | 0.390 | 0.548 | 0.491 | 0.443 | 0.508 | 0.581