1. Introduction
Knowledge graphs, which represent knowledge from real-world applications, contain abundant facts. In knowledge graphs, each fact is represented by a triple
$(h,r,t)$, which indicates that the relation
r holds between the head entity
h and the tail entity
t. Knowledge graphs have been applied to various tasks such as explainable recommendation systems [
1], question answering [
2] and prediction of future research collaborations [
3].
Predicting missing facts (i.e., link prediction) is a fundamental task in knowledge graph research. Various models that embed entities and relations into low-dimensional spaces have been proposed. For example, TransE [
4] learned the embeddings of entities and relations by transforming head entity to tail entity according to the relation; RotatE [
5] and QuatE [
6] learned the embeddings of entities and relations by considering relations as rotations from head entities to tail entities. However, existing transformation-based models fail to capture multiple relations between head and tail entities. For example, as shown in
Figure 1,
David Lynch is the director, the creator and an actor in the film
Mulholland Drive, i.e., there are three relations: directed, created and actedIn between
David Lynch and
Mulholland Drive. These relations between head entity
David Lynch and tail entity
Mulholland Drive have no semantic connections with each other and thus should be represented by spatially dispersed embeddings. Most existing transformation-based models, however, assume that there is only one relation between each pair of head and tail entities. For instance, for each triple
$(h,r,t)$, the corresponding embeddings are assumed to satisfy
$h+r\approx t$ in TransE, which implies that, for
$(h,{r}_{1},t)$,
$(h,{r}_{2},t)$, and
$(h,{r}_{3},t)$, the embeddings of
${r}_{1}$,
${r}_{2}$,
${r}_{3}$ are similar, as shown in
Figure 2c (i.e.,
${\mathbf{r}}_{1}\approx {\mathbf{r}}_{2}\approx {\mathbf{r}}_{3}$). To overcome this challenge, we propose a novel approach that considers multiple relations between head and tail entities in knowledge graphs.
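To make the collapse concrete, the following tiny numeric sketch (with made-up three-dimensional embeddings, not values from the paper) shows that TransE's assumption leaves only one possible vector for every relation between a fixed entity pair:

```python
import numpy as np

# Hypothetical TransE embeddings for a fixed (head, tail) pair.
h = np.array([0.2, 0.5, -0.1])   # head entity, e.g., "David Lynch"
t = np.array([0.7, -0.3, 0.4])   # tail entity, e.g., "Mulholland Drive"

# Under h + r ≈ t, the only vector satisfying the constraint exactly:
r_forced = t - h
# directed, created and actedIn would all collapse onto this single vector,
# even though the three relations share no semantics.
print(r_forced)
```

Any relation embedding trained between this pair is pulled toward `r_forced`, which is exactly the degenerate case illustrated in Figure 2c.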
In this paper, we propose a model called
DualQuatE, which utilizes various combinations of distinct rotations and translations to represent multiple relations between head and tail entities. A natural first idea is to combine RotatE and TransE in complex and real space, respectively; however, it is hard to find a uniform mathematical expression for such a combination. Therefore, we propose
DualQuatE, which embeds entities and relations into dual quaternion space to unify rotation and translation. A dual quaternion consists of a real part and a dual part. More concretely, we embed entities as pure quaternion vectors in three-dimensional space. To distinguish various relations between head entity
h and tail entity
t, we design a score function that uses the dual quaternion Hamilton product to model relations as interactions of rotation and translation. We utilize distinct interactions of rotations and translations to represent various relations between head and tail entities. Compared with RotatE and TransE in two-dimensional space, the dual quaternion space is eight-dimensional with six real degrees of freedom, three for translation and three for rotation; we can thus explore the interaction of rotation and translation with more degrees of freedom in higher dimensions. As summarized in
Table 1, our model can richly express relations (i.e., relation patterns and multiple relations).
To conclude, the contributions of our proposed model are listed as follows:
We introduce dual quaternions to knowledge graph embeddings.
We propose a novel transformation-based model DualQuatE to overcome the challenge of multiple relations between two entities.
Our experiments show that DualQuatE is effective compared to existing state-of-the-art models.
The rest of the paper is organized as follows. In
Section 2, we introduce the related work.
Section 3 presents prerequisite knowledge about dual quaternions. In
Section 4, we describe our model. We present the results of experiments and make analysis and discussions in
Section 5. In
Section 6, we conclude the paper and discuss future work.
2. Related Work
To obtain high-quality knowledge graphs, approaches that utilize knowledge graph embedding to predict missing facts have been proposed recently. These methods fall into two broad categories [
10]: transformation-based models and semantic matching models. Specifically, transformation-based models transform the head entity to the tail entity via relations, while semantic matching models match entity and relation semantics in latent spaces. Compared to transformation-based models, semantic matching models suffer from poor interpretability.
Transformation-based models usually embed entities and relations into a vector space and model each relation as a transformation from head entity embeddings to tail entity embeddings. One of the most representative is TransE, which mapped entities and relations to the same space
${\mathbb{R}}^{k}$. For each triple
$(h,r,t)$, entity embeddings
$\mathbf{h},\mathbf{t}$ and relation embedding
$\mathbf{r}$ hold
$\mathbf{h}+\mathbf{r}\approx \mathbf{t}$. A series of extensions of TransE have since been presented to improve accuracy and interpretability. For instance, TransR [
11] introduced relation-specific spaces. TransR modeled relations and entities in different spaces, following the observation that TransE can only express 1-to-1 relations. RotatE mapped embeddings into complex space, focusing on expressing relation patterns. HAKE [
7] utilized the polar coordinate system to capture semantic hierarchies in the knowledge graphs.
Semantic matching models, which match the latent semantics of entities and relations, can be divided into two categories: bilinear models and neural network-based models. Bilinear models include DistMult [
8], HolE [
12], SimplE [
9], ComplEx [
13] and QuatE and DihEdral [
14]. DistMult represented each entity as a vector and each relation as a diagonal matrix. HolE matched the latent semantics of entities by a circular correlation operation, and the resulting compositional vector then interacted with relations. ComplEx, mapping knowledge graph embeddings into complex space, leveraged the Hermitian product to capture the latent semantics of entities and relations, which could express the antisymmetry relation pattern. QuatE, extending knowledge graph embeddings from complex space to quaternion space, modeled each relation as a rotation in four-dimensional space with more degrees of freedom. Compared with ComplEx, QuatE could express the main relation patterns except composition. SimplE proposed two embeddings for each entity, each of which learned latent semantics dependently. DihEdral mapped relations into the dihedral group to capture composition relations. Neural network-based models including ConvE [
15], RGCNs [
16] and InteractE [
17] have been proposed recently. ConvE and RGCNs introduced convolutional networks and graph convolutional networks to knowledge graph embedding, respectively. Compared with ConvE, InteractE introduced feature permutation, “checkered” feature reshaping and circular convolution to increase interaction.
Recently, some models have introduced hyperbolic space to knowledge graph embeddings. MuRP [
18] represented knowledge graphs in the Poincaré ball of hyperbolic space. Chami et al. [
19] attempted to capture hierarchical and logical patterns in hyperbolic space. Compared to hyperbolic-space-based models, which focus on semantic hierarchies in knowledge graphs,
DualQuatE tries to overcome the challenge of multiple relations between two entities while also expressing relation patterns.
Both DualQuatE and QuatE use quaternions to embed knowledge graphs. However, they are quite different models. The main differences between DualQuatE and QuatE are as follows:
DualQuatE, a transformation-based model, measures the score of a triple by the distance between the two entities. QuatE, a semantic matching model, measured the latent matching semantics of entities and relations.
The purposes of the models are different. DualQuatE aims to address the challenge of having multiple relations between two entities. QuatE aims to utilize the quaternion Hamilton product to encourage a more compact interaction between entities and relations.
The geometric meanings are different. QuatE embeds entities and relations with quaternions to model relations as rotations. Our model is the first attempt to represent entities with pure quaternions and to model relations as interactions of translation and rotation.
4. Our DualQuatE Model
In this section, we introduce our model DualQuatE, which maps entities and relations to dual quaternion space, along with two variations of DualQuatE, namely DualQuatE1 and DualQuatE2.
We denote a knowledge graph by $\mathcal{G}$, a set of entities by $\mathcal{E}$ and a set of relations by $\mathcal{R}$. A knowledge graph $\mathcal{G}$ is composed of a set of facts, each of which can be represented by $(h,r,t)$, where $h\in \mathcal{E}$ is a head entity, $t\in \mathcal{E}$ is a tail entity, and $r\in \mathcal{R}$ is a relation between h and t. We denote a set of facts that are true by ${\mathsf{\Omega}}^{+}$, and a set of facts that are false by ${\mathsf{\Omega}}^{-}$. Given a knowledge graph $\mathcal{G}$, we aim to predict missing facts (i.e., link prediction) in $\mathcal{G}$.
4.1. Multiple Relations between the Entities
To address the challenge of having multiple relations between head and tail entities, we embed the knowledge graph into dual quaternion space.
$\mathbf{h},\mathbf{r},\mathbf{t}$ denote the vectors of entity embeddings and relation embeddings; each element of the entity embeddings
${h}_{i}$ or
${t}_{i}$ is a pure quaternion and every dimension of the relation embeddings
${r}_{i}$ is a unit dual quaternion. We expect to model the relation embeddings
$\mathbf{r}$ as
interaction of
rotation and
translation from head entity embeddings
$\mathbf{h}$ to tail entity embeddings
$\mathbf{t}$ as shown in
Figure 2a. Specifically, each true triple
$(h,r,t)$ satisfies:
where each dimension of
$\mathbf{r}$ is a unit dual quaternion satisfying Formula (
4). We define a quaternion
$\mathbf{m}=\mathrm{cos}\frac{\theta}{2}+\mathbf{u}\mathrm{sin}\frac{\theta}{2}$ to represent a rotation about pure unit quaternion
$\mathbf{u}$ through
$\theta $ and a pure quaternion
$\mathbf{n}={n}_{1}\mathbf{i}+{n}_{2}\mathbf{j}+{n}_{3}\mathbf{k}$. Furthermore, we define a unit dual quaternion by:
With Formula (
10), we can deduce the transformation of
DualQuatE in Formula (
9):
where the geometric meaning of
$\mathbf{m}\mathbf{h}{\mathbf{m}}^{*}$ is shown in Formula (
2). As shown above,
DualQuatE transforms head entity
h to tail entity
t by relation
r which combines
rotation (i.e.,
$\mathbf{m}$) and
translation (i.e.,
$\mathbf{n}$). Unlike previous models learned similar representations of relations
${r}_{1},{r}_{2},{r}_{3}$ shown in
Figure 2c,d, which learned similar representations of the relations, our model learns combinations of different translations and rotations to represent various relations between head and tail entities.
We define the score function by:
where
$\Vert \cdot \Vert $ represents the
${L}_{2}$ norm of a vector. With this score function, we want the head entity to be as close as possible to the tail entity after the transformation by the relation.
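As an illustration, the following minimal sketch (the helper names are ours, not from the paper) implements one embedding dimension of the DualQuatE transformation and its distance-based score: an entity is a pure quaternion, and a relation is a unit rotation quaternion $\mathbf{m}$ plus a pure-quaternion translation $\mathbf{n}$:

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions given as [w, x, y, z] arrays."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def conj(q):
    """Quaternion conjugate q* = [w, -x, -y, -z]."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def transform(h, m, n):
    """Apply a relation to a head entity: rotate by m, then translate by n."""
    return hamilton(hamilton(m, h), conj(m)) + n

def score(h, m, n, t):
    """Distance-based score ||m h m* + n - t||; lower means more plausible."""
    return np.linalg.norm(transform(h, m, n) - t)

# Rotation by theta = pi/2 about the z axis (u = k) plus a translation along x:
theta = np.pi / 2
m = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
n = np.array([0.0, 1.0, 0.0, 0.0])
h = np.array([0.0, 1.0, 0.0, 0.0])  # pure quaternion for the head entity
t = transform(h, m, n)              # x axis rotates onto y, then shift by +x
```

A triple whose tail embedding equals `transform(h, m, n)` scores zero, i.e., it is maximally plausible under this score.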
4.2. Loss Function
We employ the self-adversarial negative sampling [
5] method to generate corrupted samples. We define the probability distribution of negative samples by:
where
$\alpha $ is the sampling temperature. Combining this with self-adversarial negative sampling, we define the loss function by:
where
$\gamma $ is a fixed margin. The training procedure is shown in Algorithm 1.
Algorithm 1 DualQuatE.
Input: Entity embeddings $\mathcal{E}$ and relation embeddings $\mathcal{R}$; hyperparameters including margin $\gamma $, embedding dimension k, negative sample size n.
1: $\mathbf{h},\mathbf{t}\leftarrow $ uniform$(-\frac{\gamma +2.0}{k},\frac{\gamma +2.0}{k})$ for each $\mathbf{h},\mathbf{t}\in \mathcal{E}$; $\mathbf{r}\leftarrow $ uniform$(-\frac{\gamma +2.0}{k},\frac{\gamma +2.0}{k})$ for each $\mathbf{r}\in \mathcal{R}$
2: repeat
3: $\hspace{1em}{T}_{pos}\leftarrow $ uniform random sampling of $(h,r,t)$
4: $\hspace{1em}({h}^{\prime},r,{t}^{\prime})\leftarrow $ generate n negative samples for $(h,r,t)$
5: $\hspace{1em}T={T}_{pos}\cup \left\{({h}^{\prime},r,{t}^{\prime})\right\}$
6: $\hspace{1em}$compute the weight of each $({h}^{\prime},r,{t}^{\prime})$: $p({h}_{j}^{\prime},r,{t}_{j}^{\prime}\mid \left\{({h}_{i}^{\prime},{r}_{i},{t}_{i}^{\prime})\right\})$
7: $\hspace{1em}$update relation embeddings $\mathbf{r}$ and entity embeddings $\mathbf{h},\mathbf{t}$: $\mathbf{h},\mathbf{r},\mathbf{t}=\mathbf{h},\mathbf{r},\mathbf{t}-{\nabla}_{{\theta}_{r}}[-\mathrm{log}\,\sigma (\gamma -{f}_{r}(h,t))-{\displaystyle \sum _{i=1}^{n}}p({h}_{i}^{\prime},r,{t}_{i}^{\prime})\,\mathrm{log}\,\sigma ({f}_{r}({h}_{i}^{\prime},{t}_{i}^{\prime})-\gamma )]$
8: until convergence
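The self-adversarial weighting and loss can be sketched as follows (a simplified NumPy version under our reading of the loss; in practice the weights are treated as constants with respect to gradients, and `f_pos`/`f_neg` are distance-based scores, lower meaning more plausible):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_adversarial_loss(f_pos, f_neg, gamma=6.0, alpha=1.0):
    """Margin-based loss with self-adversarial negative weighting.

    f_pos: score (distance) of the positive triple.
    f_neg: array of scores of the n corrupted triples.
    """
    # Harder negatives (smaller distance) receive larger softmax weights.
    w = np.exp(-alpha * f_neg)
    w = w / w.sum()
    pos_term = -np.log(sigmoid(gamma - f_pos))
    neg_term = -(w * np.log(sigmoid(f_neg - gamma))).sum()
    return pos_term + neg_term

loss = self_adversarial_loss(f_pos=2.0, f_neg=np.array([5.0, 8.0, 3.0]))
```

Note that a negative triple whose distance is far above the margin contributes almost nothing, so training effort concentrates on the hardest corruptions.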

4.3. Properties of DualQuatE
In this part, we describe relation patterns and introduce how
DualQuatE expresses those patterns. Recently, learning relation patterns, including
symmetry/antisymmetry,
inversion and
composition, has been recognized as key to the link prediction task. Our model
DualQuatE can easily explain the relation patterns of the learned relation embeddings; proofs for the relation patterns can be found in
Appendix A.
Inversion: If a relation ${r}^{\prime}\in \mathcal{R}$ is the inverse of a relation $r\in \mathcal{R}$, then we can infer $(h,r,t)\in {\mathsf{\Omega}}^{+}\iff (t,{r}^{\prime},h)\in {\mathsf{\Omega}}^{+}$. For example, the relation $has\_part$ is inverse to the relation $part\_of$. For r and ${r}^{\prime}$, we infer that $\left({\mathbf{m}}^{\prime}\mathbf{m}\right)\mathbf{h}{\left({\mathbf{m}}^{\prime}\mathbf{m}\right)}^{*}+{\mathbf{m}}^{\prime}\mathbf{n}{\mathbf{m}}^{\prime *}+{\mathbf{n}}^{\prime}=\mathbf{h}$, which denotes that the composition of the components $\mathbf{m}$ and ${\mathbf{m}}^{\prime}$ has no rotation (i.e., $\left({\mathbf{m}}^{\prime}\mathbf{m}\right)\mathbf{h}{\left({\mathbf{m}}^{\prime}\mathbf{m}\right)}^{*}=\mathbf{h}$) and that the translation $\mathbf{n}$ rotated by ${\mathbf{m}}^{\prime}$ is the negative of the translation ${\mathbf{n}}^{\prime}$ (i.e., ${\mathbf{m}}^{\prime}\mathbf{n}{\mathbf{m}}^{\prime *}+{\mathbf{n}}^{\prime}=\mathbf{0}$).
Symmetry: A relation $r\in \mathcal{R}$ is symmetric if $(h,r,t)\in {\mathsf{\Omega}}^{+}\iff (t,r,h)\in {\mathsf{\Omega}}^{+}$ holds. For instance, the relations $similar\_to$ and $verb\_group$ from the dataset WN18 are symmetric. If a relation is symmetric, we reason that $\left(\mathbf{m}\mathbf{m}\right)\mathbf{h}{\left(\mathbf{m}\mathbf{m}\right)}^{*}+\mathbf{m}\mathbf{n}{\mathbf{m}}^{*}+\mathbf{n}=\mathbf{h}$, which means that the self-composition of the component $\mathbf{m}$ has no rotation (i.e., $\left(\mathbf{m}\mathbf{m}\right)\mathbf{h}{\left(\mathbf{m}\mathbf{m}\right)}^{*}=\mathbf{h}$) and that the translations cancel (i.e., $\mathbf{m}\mathbf{n}{\mathbf{m}}^{*}+\mathbf{n}=\mathbf{0}$).
Antisymmetry: A relation $r\in \mathcal{R}$ is antisymmetric if $(h,r,t)\in {\mathsf{\Omega}}^{+}\Rightarrow (t,r,h)\in {\mathsf{\Omega}}^{-}$, which requires $\left(\mathbf{m}\mathbf{m}\right)\mathbf{h}{\left(\mathbf{m}\mathbf{m}\right)}^{*}+\mathbf{m}\mathbf{n}{\mathbf{m}}^{*}+\mathbf{n}\ne \mathbf{h}$. An example is the relation $part\_of$.
Composition: A relation ${r}_{3}$ is composed of the relations ${r}_{1}$ and ${r}_{2}$, denoted by ${r}_{3}={r}_{1}\oplus {r}_{2}$, if $(h,{r}_{1},s)\in {\mathsf{\Omega}}^{+}\wedge (s,{r}_{2},t)\in {\mathsf{\Omega}}^{+}\Rightarrow (h,{r}_{3},t)\in {\mathsf{\Omega}}^{+}$. For example, the relation uncle_of can be composed of brother_of and father_of: if $(Alva,brother\_of,Aaron)$ and $(Aaron,father\_of,Abel)$ are true triples, we can reason that $(Alva,uncle\_of,Abel)$ is a true fact in the real world. If relation ${r}_{3}$ is composed of relations ${r}_{1}$ and ${r}_{2}$, they can be represented by $\left({\mathbf{m}}_{2}{\mathbf{m}}_{1}\right)\mathbf{h}{\left({\mathbf{m}}_{2}{\mathbf{m}}_{1}\right)}^{*}+{\mathbf{m}}_{2}{\mathbf{n}}_{1}{\mathbf{m}}_{2}^{*}+{\mathbf{n}}_{2}={\mathbf{m}}_{3}\mathbf{h}{\mathbf{m}}_{3}^{*}+{\mathbf{n}}_{3}$, from which we deduce that ${\mathbf{n}}_{3}$ is equal to the sum of the translation ${\mathbf{n}}_{2}$ and the translation ${\mathbf{n}}_{1}$ rotated by ${\mathbf{m}}_{2}$ (i.e., ${\mathbf{m}}_{2}{\mathbf{n}}_{1}{\mathbf{m}}_{2}^{*}+{\mathbf{n}}_{2}={\mathbf{n}}_{3}$).
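The composition rule can be checked numerically. The sketch below (our own helper code, not from the paper) verifies that chaining two rotation-plus-translation relations equals a single relation with $\mathbf{m}_3=\mathbf{m}_2\mathbf{m}_1$ and $\mathbf{n}_3=\mathbf{m}_2\mathbf{n}_1\mathbf{m}_2^{*}+\mathbf{n}_2$:

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions given as [w, x, y, z] arrays."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def transform(h, m, n):
    """Rotate h by the unit quaternion m, then translate by n."""
    return hamilton(hamilton(m, h), conj(m)) + n

# Two relations r1, r2 as (rotation, translation) pairs:
m1 = np.array([np.cos(0.3), 0.0, 0.0, np.sin(0.3)])  # rotation about z
n1 = np.array([0.0, 0.5, 0.0, 0.0])
m2 = np.array([np.cos(0.2), np.sin(0.2), 0.0, 0.0])  # rotation about x
n2 = np.array([0.0, 0.0, -0.3, 0.0])

# Composed relation r3 = r1 ⊕ r2:
m3 = hamilton(m2, m1)
n3 = transform(n1, m2, n2)  # m2 n1 m2* + n2

h = np.array([0.0, 0.4, -0.2, 0.7])
lhs = transform(transform(h, m1, n1), m2, n2)  # apply r1, then r2
rhs = transform(h, m3, n3)                     # apply r3 directly
print(np.allclose(lhs, rhs))
```

The equality holds for any entity embedding $\mathbf{h}$, since the quaternion conjugate reverses products: $(\mathbf{m}_2\mathbf{m}_1)^{*}=\mathbf{m}_1^{*}\mathbf{m}_2^{*}$.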
4.4. Variations
We introduce two extensions of DualQuatE. DualQuatE is a transformation-based model that combines rotation and translation. To examine the effect of the interaction of rotation and translation, we compare DualQuatE with DualQuatE1, which models relations as pure rotations in three-dimensional space. Furthermore, we propose DualQuatE2 to explore the role of scaling alongside rotation.
DualQuatE1: We devise DualQuatE1, which embeds entities and relations into quaternion space. Specifically, we represent entity embeddings $\mathbf{h},\mathbf{t}$ with pure quaternions and relation embeddings $\mathbf{r}$ with quaternions. We design the score function ${f}_{r}(h,t)=\Vert \mathbf{r}\mathbf{h}{\mathbf{r}}^{*}-\mathbf{t}\Vert $ to model each relation as a rotation in three-dimensional space. Namely, each true fact satisfies $\mathbf{r}\mathbf{h}{\mathbf{r}}^{*}=\mathbf{t}$.
DualQuatE2: To explore the effect of scaling in knowledge graph embeddings, we present DualQuatE2. DualQuatE2 maps knowledge graph embeddings to four-dimensional space. In particular, we represent entities and relations with quaternions, where the relation embeddings are not unit quaternions. We define the score function ${f}_{r}(h,t)=\Vert \mathbf{h}\mathbf{r}-\mathbf{t}\Vert $, meaning the relation transforms the head entity to the tail entity by combining rotation and scaling.
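The scaling effect in DualQuatE2 follows from the norm-multiplicativity of the Hamilton product, $\Vert \mathbf{h}\mathbf{r}\Vert =\Vert \mathbf{h}\Vert \,\Vert \mathbf{r}\Vert $; a short sketch (our own illustration, with made-up values):

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions given as [w, x, y, z] arrays."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

# A non-unit relation quaternion both rotates and scales the entity:
h = np.array([0.0, 0.3, -0.5, 0.2])                       # pure quaternion entity
r = 1.7 * np.array([np.cos(0.4), 0.0, np.sin(0.4), 0.0])  # quaternion of norm 1.7
hr = hamilton(h, r)
print(np.linalg.norm(hr), 1.7 * np.linalg.norm(h))        # equal magnitudes
```

So the norm of the relation embedding acts as a learnable scale factor on top of the rotation.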
4.5. Connection to TransE and RotatE
Compared with RotatE: RotatE embedded the entity embeddings
$\mathbf{h}$,
$\mathbf{t}$ and relation embeddings
$\mathbf{r}$ into the complex space. RotatE utilized the score function
$\Vert \mathbf{h}\circ \mathbf{r}-\mathbf{t}\Vert $ to calculate the plausibility of each triple, where
${r}_{i}$ is a unit complex number
$\mathrm{cos}\theta +\mathbf{i}\mathrm{sin}\theta $.
DualQuatE can be reduced to RotatE by fixing the rotation plane and removing the translation variables. For instance, we can construct relation embeddings by Formula (
10) in
$xoy$ plane, where
$\mathbf{u}=\mathbf{k}$ and
$\mathbf{n}=\mathbf{0}$ (i.e.,
$\mathbf{r}=\mathrm{cos}\frac{\theta}{2}+\mathrm{sin}\frac{\theta}{2}\mathbf{k}$) and embed entities in the corresponding form:
$\mathbf{h}$ or
$\mathbf{t}=a\mathbf{i}+b\mathbf{j}$.
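This reduction can be verified numerically. The sketch below (our own helper code, with made-up values) restricts a DualQuatE relation to the $xoy$ plane ($\mathbf{u}=\mathbf{k}$, $\mathbf{n}=\mathbf{0}$) and checks that it matches RotatE's complex rotation:

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions given as [w, x, y, z] arrays."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

theta = 0.7
a, b = 0.6, -0.4

# DualQuatE relation restricted to the xoy plane: u = k, n = 0.
m = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
h = np.array([0.0, a, b, 0.0])               # entity of the form a*i + b*j
rotated = hamilton(hamilton(m, h), conj(m))  # m h m*

# RotatE in the complex plane: multiply by a unit complex number.
rotated_c = (a + 1j * b) * np.exp(1j * theta)
print(np.allclose(rotated[1:3], [rotated_c.real, rotated_c.imag]))
```

The rotated entity stays in the $xoy$ plane (its $w$ and $z$ components remain zero), so the two models coincide on this restricted family of embeddings.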
Compared with TransE: TransE modeled each relation as a translation, embedding the entity embeddings $\mathbf{h}$, $\mathbf{t}$ and relation embeddings $\mathbf{r}$ in a vector space. To express TransE, we can set $\theta =0$ (i.e., $\mathbf{m}=1$) in the relation embeddings to remove the rotation. In other words, the relation embeddings in DualQuatE can be expressed as $\mathbf{r}=1+\frac{\u03f5}{2}\mathbf{n}$.