Article

Translation-Based Embeddings with Octonion for Knowledge Graph Completion

1 College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
2 Tianjin Key Laboratory of Advanced Networking (TANKLab), Tianjin University, Tianjin 300350, China
3 Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin 300350, China
4 Foreign Language, Literature and Culture Studies Center, Tianjin Foreign Studies University, Tianjin 300204, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(8), 3935; https://doi.org/10.3390/app12083935
Submission received: 22 February 2022 / Revised: 7 April 2022 / Accepted: 11 April 2022 / Published: 13 April 2022
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract: Knowledge representation learning achieves the automatic completion of knowledge graphs (KGs) by embedding entities into continuous low-dimensional vector space. In knowledge graph completion (KGC) tasks, the inter-dependencies and hierarchical information in KGs have gained attention. Existing methods do not well capture the latent dependencies between all components of entities and relations. To address this, we introduce the mathematical theories of octonion, a more expressive generalized form of complex number and quaternion, and propose a translation-based KGC model with octonion (TransO). TransO models entities as octonion coordinate vectors, relations as the combination of octonion component matrices and coordinate vectors, and uses specific grouping calculation rules to interact between entities and relations. In addition, since hyperbolic Poincaré space in non-Euclidean mathematics can represent hierarchical data more accurately and effectively than traditional Euclidean space, we propose a Poincaré-extended TransO model (PTransO). PTransO transforms octonion coordinate vectors into hyperbolic embeddings by exponential mapping, and integrates the Euclidean-based calculations into hyperbolic space by operations such as Möbius addition and hyperbolic distance. The experimental results of link prediction indicate that TransO outperforms other translation-based models on the WN18 benchmark, and PTransO further achieves state-of-the-art performance in low-dimensional space on the well-established WN18RR and FB15k-237 benchmarks.

1. Introduction

Knowledge graphs (KGs) are semantic networks composed of entities and relations, which provide an accurate description of objects in the real world [1]. Since their concept was first proposed by Google [2] in 2012, KGs have become an important part of search engines, personalized recommendation systems, and conversational robots [3,4,5].
KGs use a triple $(h, r, t)$ to describe a fact, where $h$ denotes the head entity, $t$ denotes the tail entity, and $r$ denotes the relation between them [6]. Knowledge graph embeddings embed the entities and relations of triples into continuous vector space [7]. By defining the scoring function $f_r(h, t)$ and calculating the score of any embedded triple, the embedding method can judge whether the triple is true or not. Since the embedding process obtains the vector representations of the entities and relations in the knowledge graph through machine learning, it is also called knowledge representation learning [8]. The combination of knowledge representation learning and link prediction allows the missing facts in KGs to be predicted, thus realizing knowledge graph completion (KGC).
Recently, in the research field of KGC, the inter-dependencies and hierarchical information in KGs have gained attention.
The dependency information in KGs can be divided into external dependencies and internal dependencies. On the one hand, external dependencies refer to the dependencies between multi-dimensional embedded vectors from the KGs. For example, in Figure 1a, there is a triple (Edvard Munch, painted, The Scream), and the entities Edvard Munch and The Scream are connected through the relation painted, i.e., there is an external dependency between the embedded vectors of Edvard Munch and The Scream. To some extent, the embedded vector representation of Edvard Munch depends on The Scream, and vice versa. On the other hand, there are internal dependencies between the components or elements within a multi-dimensional embedded vector. For example, the value of a one-dimensional component in the embedded vector of Edvard Munch depends on the other dimensional components, and the dimensions are not independent of each other.
Current KGC methods tend to focus more on the interaction between individual entities or relations, i.e., external dependencies, while ignoring the latent internal dependencies between elements or components. For example, the translation-based TransE [9] method regards the relation in a triple as the translation vector from the head entity to the tail entity, and captures the distance feature between the relation-translated head entity and the tail entity; the semantics-based RESCAL [10] method represents each entity as a low-dimensional vector and each relation as a matrix that captures the interaction between the head and tail entities. By contrast, neural network-based methods can learn complex input–output mappings and usually capture internal dependencies better, such as ConvE [11] and ConvKB [12]. However, they require a large number of parameters and complex calculations, and are not suitable for low-resource application scenarios.
Octonion can be regarded as a hypercomplex number with one real and seven imaginary components, whose structure takes the correlations between components into account. Therefore, it is widely used in the feature extraction of color images and multi-dimensional speech signals. For example, reference [13] proposes an octonion neural network to replace the real-valued neural network, and reference [14] proposes an octonion BP neural network to extract fine palmprint lines in color palmprint images. These studies show that, by taking advantage of the specific rules of octonion multiplication, octonions can effectively encode the internal dependencies within input features and the external dependencies between features with relatively few parameters, and are suitable for low-resource applications. Therefore, octonion has the potential to capture the neglected internal dependencies in KGs and improve the effectiveness of modeling.
We propose a translation-based KGC model with octonion (TransO), which effectively takes advantage of the lightweight and easy-to-train characteristics of the translation-based approach, and improves the model's capability to capture latent internal dependencies. Since the form and operation rules of octonion do not match the matrix operations commonly used in machine learning, it is difficult to introduce octonion into KGC methods directly, so we refer to mathematical research [15,16,17] and use coordinate vectors and component matrices to perform octonion operations in an equivalent form. Specifically, TransO regards the entities in KGs as octonion coordinate vectors, and the relations as the combination of octonion component matrices and coordinate vectors, both of which are based on the real number field. TransO divides the interaction between entities and relations into two steps: multiplication interaction and translation interaction. The former captures internal dependencies by the grouped product of octonion vectors and matrices, forming a compact interaction pattern; the latter captures external dependencies using vector addition and the $\ell_2$-norm, consistent with traditional Trans-series methods.
In addition, hierarchical information widely exists in KGs such as social networks and geographic information, and is usually represented as discrete tree-like structures. For example, in Figure 1b, the entity Silicon Valley is taken as the root node of the hierarchy and extends downward layer by layer; the second layer includes two corporate entities, Apple Inc. and Google Inc.; the third layer includes four personal-name entities such as Mary, finally forming a complete tree. Since there are multiple hierarchical structures in KGs, an entity at the top of one structure may be at the bottom of another. How to reasonably model hierarchical information to improve the embedding effect of KGs is also an important research direction.
However, most current methods are based on Euclidean geometric space and are not suitable for modeling hierarchical information. On the one hand, due to the discrete tree-like structure of hierarchical data, Euclidean space suffers high information distortion and cannot preserve the latent hierarchical features in low-dimensional embeddings. On the other hand, since the number of nodes at each layer of a hierarchical tree-like structure grows exponentially, modeling hierarchical information with Euclidean geometry requires high memory and computing resources.
Hyperbolic geometry in non-Euclidean mathematics has inherent advantages for modeling hierarchical information. Firstly, compared with Euclidean geometry, hyperbolic geometry more easily captures and restores the features of hierarchical relational data, forming a natural tree-like hierarchical structure. Moreover, hyperbolic geometry has a higher information-carrying capacity than Euclidean geometry and can model with relatively lower dimensions and fewer parameters, i.e., hyperbolic geometry suffers less information distortion in low-dimensional embeddings [18]. In recent years, KGC tasks in hyperbolic space have been studied. MuRP [19] introduces the theories of hyperbolic geometry and embeds KGs into a hyperbolic Poincaré ball. AttH [20] uses the reflection and rotation operations of hyperbolic isometries to replace the linear transformation in MuRP, and combines reflection and rotation through an attention mechanism. Although these methods are still inadequate in feature interaction, and their interaction level is relatively shallow, they show the great advantages of hyperbolic geometry in low-dimensional embedded space and for hierarchical information, which further inspired our idea of introducing hyperbolic geometry.
We choose Poincaré space, which is widely researched in hyperbolic geometry, as the target space, and propose a Poincaré-extended TransO model (PTransO), which transfers TransO from Euclidean space to hyperbolic Poincaré space and integrates the Euclidean-based translation thoughts into hyperbolic space. On the basis of TransO, PTransO further enhances the capability to model hierarchical data in KGs while maintaining the processing superiority of octonion on latent dependencies.
The main contributions of this paper are as follows:
  • Using the mathematical theories of octonion in the field of linear algebra, we introduce octonion into the translation-based KGC method for the first time, which maintains the lightweight and highly efficient character of the translation-based KGC framework. We realize the compact interaction between entities and relations through the grouped multiplication between octonion matrices and vectors, which fully exploits the internal correlations and dependencies of the octonion structure and enhances the capability to model latent dependencies in KGs.
  • We attempt to transfer our model from Euclidean space to hyperbolic Poincaré space, transforming the octonion coordinate vectors into hyperbolic embeddings and integrating the Euclidean-based translation thoughts into hyperbolic space. On the premise of maintaining the processing superiority of octonion on latent dependencies, we further enhance the modeling capability to hierarchical data in KGs.
  • We analyze and confirm that our models are superior to the previous approaches on the standard benchmark datasets WN18, WN18RR, and FB15k-237. By ablation experiments, we analyze the performance of our models on (1) different types of relations and (2) different dimensions, demonstrating (1) the advantages of octonion and hyperbolic geometry in KGC tasks and (2) the respective suitable application scenarios.

2. Related Work

With the large-scale application of KG technology, KGC tasks have attracted widespread attention. KGC is accomplished through steps such as knowledge graph embedding and link prediction, of which the embedding process is the core. Over the last several years, various embedding methods have been proposed. In early research, modeling KGs in high-dimensional vector space could increase the features and expressiveness to a certain extent, but this led to large memory consumption and additional model training costs.
In 2013, TransE [9] proposed a translation-based method for embedding entities and relations of KGs in low-dimensional continuous space, and made substantial improvements. Since then, several categories of KGC methods based on translation [9,21,22,23,24,25,26], semantics [10,27,28,29,30,31,32], and neural networks [11,12,33,34,35] have been developed successively. These methods aim to improve the feature interaction capabilities among triple elements by different forms of innovation, each with its own advantages and disadvantages.

2.1. Translation-Based Methods

In translation-based methods, entities are usually regarded as points in vector space, and relations are represented as translation vectors between two entities. The $\ell_1$- or $\ell_2$-norm is used as the scoring function to judge the plausibility of triples by measuring the distance from the translated subject entity to the object entity. The smaller the distance, the more reliable the triple.
TransE [9] is an epoch-making translation-based method. For a given triple $(h, r, t)$, TransE assumes that a true triple satisfies $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$, where $\mathbf{h}$, $\mathbf{r}$, $\mathbf{t}$ are the vector representations of $h$, $r$, and $t$, respectively. However, TransE has some defects, such as poor handling of 1-to-N, N-to-1, and N-to-N relations, and the incapability to model symmetric relations. Subsequently, a number of TransE-derived methods were proposed to address these problems. TransH [21] regards each relation as a hyperplane, and the entity embedded vectors are projected onto the corresponding relational hyperplane. TransR [22] uses a relation-specific projection matrix $\mathbf{M}_r$ instead of TransH's hyperplane to project the entity embedded vectors into relation space, but this brings high resource consumption. TransD [23] models the relational mapping matrix $\mathbf{M}_{re}$ in a more flexible way. STransE [24] defines that each relation no longer has only one projection matrix, but transforms the head and tail entities individually. Therefore, besides the translation vector $\mathbf{r}$, each relation in STransE has two projection matrices $\mathbf{M}_{r,h}$ and $\mathbf{M}_{r,t}$.
Due to their relatively simple structure, translation-based methods tend to have limited representation capability, especially in modeling complex relations in KGs. Nevertheless, thanks to their efficiency and low computational cost, they remain valuable for further research. In this paper, we continue to focus on translation-based methods, and enhance the translational KGC framework by introducing the mathematical theories of octonion and hyperbolic geometry. This enables our methods to maintain the advantages of being lightweight and training-friendly while enhancing their representation capability, which is, to some extent, more promising than the semantics-based and neural network-based methods discussed below.

2.2. Semantics-Based Methods

Semantics-based methods measure plausibility by matching the latent semantics of entities and relations, and most of them ultimately calculate the likelihood of triples through matrix–vector products.
RESCAL [10] is the earliest semantics-based KGC method, which uses full-rank relational matrices $\mathbf{M}_r$ to model latent pairwise interactions between head and tail entities, and optimizes a scoring function that computes the bilinear product $\mathbf{h}^{\top} \mathbf{M}_r \mathbf{t}$ of the head and tail entity embedded vectors and the relational matrix. Due to its large number of parameters, RESCAL has an overfitting problem. DistMult [27] restricts the full-rank relational matrices $\mathbf{M}_r$ of RESCAL to diagonal matrices, which alleviates overfitting to a certain extent. Subsequently, SimplE [28] and TuckER [29] were proposed based on two types of matrix decomposition: CP and Tucker. TuckER achieves satisfactory results in KGC tasks and proves that the above semantics-based methods are special cases of TuckER; semantics-based methods thus achieve theoretical unity. ComplEx [30] expands DistMult from real-valued space to complex-valued space and takes the real part of the complex multiplication result as the final score. Since this operation is asymmetric, ComplEx solves the problem that DistMult cannot model asymmetric relations. RotatE [31] models each relation as a rotation from the head entity to the tail entity in complex-valued space, and has the capability to infer multiple relational patterns. QuatE [32] extends from complex space to quaternion space with higher degrees of freedom.
Compared with translation-based methods, semantics-based methods better reflect the semantic information of triples in the design of their scoring functions, while translation-based methods usually capture only shallow features. However, semantics-based methods have more redundant computations, making them prone to overfitting. When modeling KGs with a large number of entities and relations, semantics-based methods have to raise the embedded space to a very high dimension in order to completely embed and separate all entities and relations, which leads to high memory and computational costs. Moreover, semantics-based methods generally have poor interpretability.

2.3. Neural Network-Based Methods

In order to explore deeper-level information in KGs, researchers introduce neural network methods. ConvE [11] introduces a simple multi-layer convolutional architecture, which uses the convolutional layer to obtain features from the matrix composed of head entity vectors and relation vectors. By two-dimensional convolution, vector flattening, and a fully connected layer, ConvE matches all candidate tail entity embedded vectors. ConvKB [12] improves the form of input data for ConvE to capture deeper features. CapsE [33] is based on a capsule network. Both CapsE and ConvKB use the three-column matrix of the embedded vectors. InteractE [34] improves the convolutional steps of ConvE, and enhances the capability of feature interaction through the ideas of feature permutation, checkered reshaping, and circular convolution. R-GCN [35] proposes a KGC framework based on a graph convolution network (GCN).
Neural network-based methods are usually combined with deep learning, which has the disadvantages of complex network structure and a large amount of computation, but these methods are adept at dealing with KGs that have a large number of multi-relational data features.

3. Methodology

3.1. Preliminaries

Before describing our approach, we need to explain some preliminaries about octonion and hyperbolic geometry, as shown below.

3.1.1. Background and Calculation Rules of Octonion

Octonion is a non-associative generalization of quaternion, usually denoted as $\mathbb{O}$. An octonion can be regarded as a tuple of eight elements. Each octonion is a linear combination of the unit octonions $\{1, i, j, k, l, il, jl, kl\}$, i.e., the octonion $x = x_0 + x_1 i + x_2 j + x_3 k + x_4 l + x_5 il + x_6 jl + x_7 kl$, where $x_i \in \mathbb{R}$, $i = 0, 1, \dots, 7$.
As a hypercomplex number in abstract algebra, octonion has the following good mathematical properties:
  • As the generalized form of complex number and quaternion, octonion is the only normed division algebra besides the real numbers, complex numbers, and quaternions [36]; a norm, a multiplicative inverse, and an inner product are all well defined for it, giving it rich algebraic structure.
  • In geometric space, octonions can represent the combination of rotation and translation [37], which gives them good geometric significance, while quaternions, also hypercomplex numbers, can only represent rotation. Consequently, compared with complex numbers and quaternions, octonions have higher degrees of freedom and more flexible expressive capability.
  • The representation of octonion has internal correlations and dependencies among its constituent elements [38]. Without increasing the number of parameters, octonion can add additional information capacity and interactive features to graph data. When performing mathematical operations, octonion encourages compact interaction between the objects being calculated.
It can be seen that octonion has good interpretability, because it is well defined algebraically and has both algebraic and geometric interpretations.
Moreover, octonion adds additional interactive features and encourages compact interaction patterns mainly through its specific product operation rules. As shown in Figure 2, we compare the inner product, tensor product, Hadamard product, and octonion product, where $\mathbf{a}$ and $\mathbf{b}$ denote the multiplier vectors, represented in 8-dimensional form for comparison with octonions, and $c$, $\mathbf{C}$, or $\mathbf{c}$ denotes the resulting scalar, matrix, or vector.
As can be seen from the comparison in the figure, the octonion product $\mathbf{c}$ is still an octonion vector, and each one-dimensional component of $\mathbf{c}$ is related to every one-dimensional component of the multiplier octonion vectors $\mathbf{a}$ and $\mathbf{b}$. Assuming that $\mathbf{a}$ and $\mathbf{b}$ denote two interacting input features, the octonion vector $\mathbf{c}$ denotes the output feature after interaction. Each component of $\mathbf{c}$ contains information from every component of the input features, thus forming a compact interaction pattern that can fully mine the deep features hidden in the input vectors. Moreover, after the multiplicative interaction, the internal components of the octonion product $\mathbf{c}$ are no longer relatively independent, but internally correlated, i.e., the value of a one-dimensional component depends on the other components, which is a significant advantage in modeling internal dependencies. These characteristics are not found in the inner product, tensor product, or Hadamard product.
Octonion is used in many scientific fields, such as string theory, special relativity, and quantum logic. In the field of signal processing, the representation of signals in the octonion domain can provide global stability; thus, octonion has a rich application background. From the perspective of signal processing, if one multi-dimensional input feature is regarded as a signal, in this way, some components of this feature can represent a channel of the signal, and there are tight correlations between different channels of the signal. These correlations are exactly what the representation of octonion is good at capturing.
The embedded representations in KGs are similar to signals. In form, the embeddings of KGs and signals are both multi-dimensional features, and the components of an embedding have the meaning of projections in space, similar to the meaning of channels in a signal; since signals can be represented by octonions, the embeddings in KGs can plausibly be represented by octonions as well. In content, there are also a large number of complex correlations in KGs, such as external and internal dependencies; since octonions are good at capturing the correlations between different channels of a signal, they have the potential to capture those between different components of embedded vectors in KGs.
At present, most of the technologies and frameworks used in KGs are based on real-valued representations and operations, and the research of octonion has great potential and broad prospects.
However, the calculation rules of octonion restrict its application in computer science to a certain extent, as shown below.
The addition operation of octonion adds the corresponding coefficients, similar to complex number or quaternion. However, the multiplication operation of octonion is determined by the specific multiplication rules of the unit octonion. There are also specific multiplication rules in quaternion, but the rules of octonion are obviously more complex, as they are the extension of quaternion.
From Figure 3a, we can preliminarily understand the multiplication rules of quaternion. When two multipliers are selected along the arrow direction in the figure, the multiplication result is the next element adjacent to the two multipliers, but when two multipliers are selected against the arrow direction, the multiplication result is the opposite of the next adjacent element; for example, $ij = k$, $ik = -j$. Similar to quaternion, the multiplication rules of octonion can be understood from the Fano plane in Figure 3b; for example, $(kl) \cdot j = il$, $(kl) \cdot i = -jl$.
Moreover, from Figure 4, the multiplication rules of octonion can also be understood by looking at the table. Select multiplier 1 from the first column and multiplier 2 from the first row, and then the multiplication result is the element at the intersection of the row and column; for example, if we select i l from the first column as multiplier 1 and k l from the first row as multiplier 2, the operation result is j, which is consistent with that obtained in Figure 3b.
According to the above rules, the multiplication of octonions is neither commutative nor associative, e.g., $ij = -ji \neq ji$, and $(ij)l = -i(jl) \neq i(jl)$. Furthermore, these rules of octonion are rather complex, while most current machine learning methods operate on data with matrices, so their forms do not match. If octonions were introduced into a machine learning model directly, they might not bring additional benefits; the loss of some algebraic properties and the complex operation rules might even have side effects on the model.
In recent years, there have been studies [15,16,17] on transforming the operations between octonions into those between vectors and matrices in the real number field. These innovative operations can be performed using the knowledge of linear algebra and are much more convenient than the direct octonion multiplication shown in Figure 3b and Figure 4. With this, the calculation difficulty between octonions can be reduced to a great extent.
For any octonion $a = \sum_{w=0}^{7} a_w e_w$ on the octonion algebra $\mathbb{O}$, where $a_w \in \mathbb{R}$ and $e_0 = 1$, $e_1 = i$, $e_2 = j$, $e_3 = k$, $e_4 = l$, $e_5 = il$, $e_6 = jl$, $e_7 = kl$, reference [17] proposes the following definitions and theorems:
$$s(a) = (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7)^{\top} \in \mathbb{R}^8$$

$$m(a) = \begin{bmatrix} a_0 & -a_1 & -a_2 & -a_3 & -a_4 & -a_5 & -a_6 & -a_7 \\ a_1 & a_0 & -a_3 & a_2 & -a_5 & a_4 & a_7 & -a_6 \\ a_2 & a_3 & a_0 & -a_1 & -a_6 & -a_7 & a_4 & a_5 \\ a_3 & -a_2 & a_1 & a_0 & -a_7 & a_6 & -a_5 & a_4 \\ a_4 & a_5 & a_6 & a_7 & a_0 & -a_1 & -a_2 & -a_3 \\ a_5 & -a_4 & a_7 & -a_6 & a_1 & a_0 & a_3 & -a_2 \\ a_6 & -a_7 & -a_4 & a_5 & a_2 & -a_3 & a_0 & a_1 \\ a_7 & a_6 & -a_5 & -a_4 & a_3 & a_2 & -a_1 & a_0 \end{bmatrix} \in \mathbb{R}^{8 \times 8}$$
where $s(a)$ is the coordinate vector of $a$, and $m(a)$ is the component matrix. According to reference [17], the coordinate vector and component matrix of an octonion satisfy the following Theorems 1–3:
Theorem 1.
For any $u, v \in \mathbb{R}$ and $a, b \in \mathbb{O}$:

$$s(ua + vb) = u\,s(a) + v\,s(b)$$

$$m(ua + vb) = u\,m(a) + v\,m(b)$$
Theorem 2.
For any $a \in \mathbb{O}$:

$$a = 0 \Leftrightarrow s(a) = 0 \Leftrightarrow m(a) = 0$$

$$a \ \text{is invertible} \Leftrightarrow m(a) \ \text{is invertible}$$
Theorem 3.
For any $a, b \in \mathbb{O}$:

$$s(ab) = m(a)\,s(b)$$

$$m(ab) = m(a)\,m(b)$$
According to the above theorems, the multiplication of two octonions can be replaced by multiplication between a component matrix and a coordinate vector. For two octonions $a, b \in \mathbb{O}$ with coordinate vectors $s(a)$ and $s(b)$, their product can be represented by the coordinate vector $s(ab)$. Since $s(ab) = m(a)\,s(b)$, only one of the coordinate vectors needs to be rewritten in component matrix form, i.e., $s(a) \rightarrow m(a)$; then, through the multiplication of the matrix $m(a)$ and the vector $s(b)$, the coordinate vector of the product of $a$ and $b$ can easily be obtained, which can be converted back into the traditional octonion form with imaginary parts at any time.
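To make the construction concrete, the following minimal NumPy sketch (our illustration, not the authors' released code) builds the component matrix $m(a)$ from a coordinate vector and multiplies two octonions via Theorem 3. The sign pattern follows the standard left matrix representation of octonions, which is consistent with the multiplication examples discussed above.

```python
import numpy as np

def component_matrix(a: np.ndarray) -> np.ndarray:
    """Build the 8x8 component matrix m(a) from the coordinate vector s(a)."""
    a0, a1, a2, a3, a4, a5, a6, a7 = a
    return np.array([
        [a0, -a1, -a2, -a3, -a4, -a5, -a6, -a7],
        [a1,  a0, -a3,  a2, -a5,  a4,  a7, -a6],
        [a2,  a3,  a0, -a1, -a6, -a7,  a4,  a5],
        [a3, -a2,  a1,  a0, -a7,  a6, -a5,  a4],
        [a4,  a5,  a6,  a7,  a0, -a1, -a2, -a3],
        [a5, -a4,  a7, -a6,  a1,  a0,  a3, -a2],
        [a6, -a7, -a4,  a5,  a2, -a3,  a0,  a1],
        [a7,  a6, -a5, -a4,  a3,  a2, -a1,  a0],
    ])

def octonion_product(s_a: np.ndarray, s_b: np.ndarray) -> np.ndarray:
    """Coordinate vector of the product ab, computed as m(a) @ s(b) (Theorem 3)."""
    return component_matrix(s_a) @ s_b

# Example from the multiplication table in Figure 4: (il)(kl) = j.
il, kl = np.eye(8)[5], np.eye(8)[7]   # one-hot coefficients on e5 = il, e7 = kl
print(octonion_product(il, kl))       # coefficient 1 appears on e2 = j
```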

3.1.2. Hyperbolic Geometry of Poincaré Model

In addition to latent dependencies, we also attempt to improve the modeling of hierarchical information in KGs. Since hierarchical information is closely related to geometric space, we focus on the mathematical theories about geometric space.
Euclidean space, as we know it, is a geometric space with constant curvature 0, while the curvature of non-Euclidean spaces is nonzero. Non-Euclidean spaces can be divided into two types, elliptic space and hyperbolic space, in which elliptic space has positive curvature and hyperbolic space has negative curvature. This paper mainly focuses on hyperbolic space because, compared with elliptic space, it has the capability to model hierarchical information; we explain the reason later.
Three typical equivalent models of hyperbolic space are the Lorentz model, the Klein model, and the Poincaré model. Among them, the Poincaré model is the most widely used equivalent model and is also the research object of this paper. It is commonly known as the Poincaré disk or Poincaré ball; the former refers to the construction of the Poincaré model in two-dimensional space, the latter in three-dimensional or higher-dimensional space.
A $d$-dimensional Poincaré ball with negative curvature $-c$ ($c > 0$) is defined as $\mathbb{B}^{d,c} = \{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\|^2 < 1/c\}$, where $\|\cdot\|$ denotes the $\ell_2$-norm. $T_{\mathbf{x}}^c$ denotes the tangent space of $\mathbb{B}^{d,c}$ at the point $\mathbf{x}$, which can be regarded as Euclidean space.
The visualization of the Poincaré model in the form of a two-dimensional disk is shown in Figure 5, in which all arcs connecting two points are regarded as equal-length lines by the Poincaré model itself.
The above figure reflects the two inherent advantages of hyperbolic space for modeling hierarchical information:
1.
Hyperbolic space has fine capabilities of information representation, modeling, and restoration for tree-like structures. A point in the Poincaré disk shown in the figure forms a natural tree-like hierarchical structure in the process of connecting other points step by step to one side of the disk boundary. When the data being modeled also have a tree structure, hyperbolic space can more easily capture and restore the hierarchical features of the data than Euclidean space. In KGs, there are abundant tree-like hierarchical features. Thus, hyperbolic geometric space has natural consistency with the hierarchical information in KGs.
2.
Hyperbolic space is an infinite metric space with higher information capacity. The Poincaré disk shown in the figure contains many hyperbolic triangles of the same shape but apparently different sizes; to the Poincaré disk itself, the sizes of these triangles are the same, and the visualization merely gives the impression that the triangles shrink as they approach the boundary of the model. In fact, the closer to the boundary of the Poincaré model, the more the information capacity of the space increases, and it does so exponentially. In KGs, since hierarchical information has a tree-like structure, the number of nodes in each layer also grows exponentially relative to the previous layer. Traditional Euclidean space struggles to model KGs with a large amount of hierarchical information, and the dimension of the embedded vectors has to be increased further to meet the exponential information growth. However, since hyperbolic space has less information distortion than Euclidean space at lower dimensions and with fewer parameters, it is theoretically more suitable for modeling KGs with hierarchical information.
The operations involved in the Poincaré model mainly include: Möbius addition, exponential and logarithmic mappings, and hyperbolic distance calculation. Figure 6 visualizes the mapping operations, and the detailed definitions of all operations are as follows:
Möbius addition: There is no well-defined native addition operation in the Poincaré model. If two points in the Poincaré model are added directly, the result can easily fall outside the space. Möbius addition provides an addition operation close to the native representation in the Poincaré model and ensures that the result remains within it:
$$\mathbf{x} \oplus^c \mathbf{y} = \frac{\left(1 + 2c\langle \mathbf{x}, \mathbf{y} \rangle + c\|\mathbf{y}\|^2\right)\mathbf{x} + \left(1 - c\|\mathbf{x}\|^2\right)\mathbf{y}}{1 + 2c\langle \mathbf{x}, \mathbf{y} \rangle + c^2\|\mathbf{x}\|^2\|\mathbf{y}\|^2}$$

where $\mathbf{x}, \mathbf{y} \in \mathbb{B}^{d,c}$, and $c > 0$ denotes the absolute value of the negative curvature.
Exponential mapping: $\exp_{\mathbf{x}}^c(\mathbf{v})$ constructs the tangent space $T_{\mathbf{x}}^c$ of the Poincaré ball $\mathbb{B}^{d,c}$ at the point $\mathbf{x}$, and then maps the vector $\mathbf{v}$ in the tangent space back into the Poincaré ball:

$$\exp_{\mathbf{x}}^c(\mathbf{v}) = \mathbf{x} \oplus^c \left( \tanh\!\left( \sqrt{c}\, \frac{\lambda_{\mathbf{x}}^c \|\mathbf{v}\|}{2} \right) \frac{\mathbf{v}}{\sqrt{c}\,\|\mathbf{v}\|} \right)$$

where $\mathbf{x} \in \mathbb{B}^{d,c}$, $\mathbf{v} \in T_{\mathbf{x}}^c$, $c > 0$ denotes the absolute value of the negative curvature, and $\lambda_{\mathbf{x}}^c = 2 / \left(1 - c\|\mathbf{x}\|^2\right)$ denotes the conformal factor.
Logarithmic mapping: The processes of exponential mapping and logarithmic mapping are opposite. The logarithmic mapping $\log_{\mathbf{x}}^c(\mathbf{y})$ maps a point $\mathbf{y}$ from $\mathbb{B}^{d,c}$ to $T_{\mathbf{x}}^c$:

$$\log_{\mathbf{x}}^c(\mathbf{y}) = \frac{2}{\sqrt{c}\,\lambda_{\mathbf{x}}^c} \operatorname{arctanh}\!\left( \sqrt{c}\, \| -\mathbf{x} \oplus^c \mathbf{y} \| \right) \frac{-\mathbf{x} \oplus^c \mathbf{y}}{\| -\mathbf{x} \oplus^c \mathbf{y} \|}$$

where $\mathbf{x}, \mathbf{y} \in \mathbb{B}^{d,c}$, $c > 0$ denotes the absolute value of the negative curvature, and $\lambda_{\mathbf{x}}^c = 2 / \left(1 - c\|\mathbf{x}\|^2\right)$ denotes the conformal factor.
Simplified representation of the mappings: Since the exponential and logarithmic mappings usually construct the tangent space $T_{\mathbf{0}}^c$ at the origin $\mathbf{0}$, these two mappings can be simplified to the forms $\exp_{\mathbf{0}}^c(\mathbf{v})$ and $\log_{\mathbf{0}}^c(\mathbf{y})$:

$$\exp_{\mathbf{0}}^c(\mathbf{v}) = \tanh\!\left( \sqrt{c}\,\|\mathbf{v}\| \right) \frac{\mathbf{v}}{\sqrt{c}\,\|\mathbf{v}\|}$$

$$\log_{\mathbf{0}}^c(\mathbf{y}) = \operatorname{arctanh}\!\left( \sqrt{c}\,\|\mathbf{y}\| \right) \frac{\mathbf{y}}{\sqrt{c}\,\|\mathbf{y}\|}$$
Hyperbolic distance calculation: Using Möbius addition, the hyperbolic distance between two points $\mathbf{x}$ and $\mathbf{y}$ in the Poincaré ball $\mathbb{B}^{d,c}$ can be defined as:

$$d_{\mathbb{B}}^c(\mathbf{x}, \mathbf{y}) = \frac{2}{\sqrt{c}} \operatorname{arctanh}\!\left( \sqrt{c}\, \| -\mathbf{x} \oplus^c \mathbf{y} \| \right)$$
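For reference, the operations above can be written compactly in code. The following NumPy sketch is our illustration (the function names and NumPy formulation are our own); it implements Möbius addition, the exponential and logarithmic maps at the origin, and the hyperbolic distance exactly as defined above, with $c > 0$ the absolute value of the negative curvature.

```python
import numpy as np

def mobius_add(x, y, c):
    """Moebius addition x (+)_c y; keeps the result inside the Poincare ball."""
    xy = np.dot(x, y)
    x2, y2 = np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den

def expmap0(v, c):
    """Map a tangent vector v at the origin onto the Poincare ball."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def logmap0(y, c):
    """Inverse of expmap0: map a ball point y back to the tangent space."""
    norm = np.linalg.norm(y)
    if norm == 0:
        return y
    return np.arctanh(np.sqrt(c) * norm) * y / (np.sqrt(c) * norm)

def hyp_distance(x, y, c):
    """Hyperbolic distance between two points of the Poincare ball."""
    return (2 / np.sqrt(c)) * np.arctanh(
        np.sqrt(c) * np.linalg.norm(mobius_add(-x, y, c)))
```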

3.2. TransO: A Translation-Based KGC Model with Octonion

As mentioned above, translation-based methods generally have the advantages of a small number of parameters, easy training, and high deployment efficiency. However, limited by their relatively simple operation rules, their feature interaction stays at a shallow level, and it is difficult for them to capture latent dependencies in KGs. As a kind of hypercomplex number, octonion has internal correlations and dependencies in its form, which is conducive to capturing deep features and internal dependencies in KGs. We therefore combine the translation method with octonion in order to exploit the advantages of both.
For a given triple $(head, rel, tail)$, in which $head, tail \in \mathcal{E}$ denote the head and tail entities in the entity set $\mathcal{E}$, and $rel \in \mathcal{R}$ denotes the relation in the relation set $\mathcal{R}$, we regard both the head and tail entities as octonions, represented by $h$ and $t$, i.e., $h, t \in \mathbb{O}$, and we regard the relation as the combination of two octonions, represented by $r_1$ and $r_2$, i.e., $r_1, r_2 \in \mathbb{O}$. The two octonions $r_1$ and $r_2$ are related only to the relation and are used for the multiplication and translation interactions with the head entity, respectively: the multiplication interaction uses the characteristics of octonion multiplication to mine the internal dependencies and deep features between all the components of the entities and relations, while the translation interaction uses the translation thoughts of the TransE method to mine the external dependencies and shallow features between the entities and relations. Subsequently, we judge the authenticity of the triple by calculating the distance between the interacted head entity $r_1 h + r_2$ and the tail entity $t$.
In addition, referring to the design in [19], we add hypersphere decision boundary radii $b_{head}, b_{tail} \in \mathbb{R}$ for the head and tail entities, which are offset parameters related only to the entities themselves; each entity has one boundary radius. By introducing these two parameters, our model treats an entity as a hypersphere rather than a point in space, and the boundary radius reflects the entity's sphere of influence. When $b_{head}$ and $b_{tail}$ are large, offset corrections can be made for triples with lower scores but higher relatedness between entities, so as to better reflect entity-specific distance characteristics.
However, as mentioned in Section 3.1.1, due to the relatively special operation rules of octonion, directly introducing octonions in hypercomplex form into the translation-based method does not match the vector and matrix data forms widely used in KGC methods. Therefore, we use the octonion coordinate vectors and component matrices from linear algebra to represent $h, r_1, r_2, t \in \mathbb{O}$, i.e., we use $s(h), s(r_2), s(t) \in \mathbb{R}^8$ and $m(r_1) \in \mathbb{R}^{8 \times 8}$, and convert the problem of calculating $r_1 h + r_2 - t \in \mathbb{O}$ into calculating $m(r_1) \cdot s(h) + s(r_2) - s(t) \in \mathbb{R}^8$ in linear algebraic form. According to Theorems 1–3, octonions can be integrated into the translation-based KGC method in the form of matrices and vectors.
Based on the above design, for a triple ( h e a d , r e l , t a i l ) , we define the scoring function of the TransO model as Equation (9):
$$f_{rel}(head, tail) = -\left\| m(r_1) \cdot s(h) + s(r_2) - s(t) \right\|_{\ell_2}^2 + b_{head} + b_{tail} = -\left\| \mathbf{R} \cdot \mathbf{h} + \mathbf{r} - \mathbf{t} \right\|_{\ell_2}^2 + b_{head} + b_{tail}$$

where we regard the entities $h$, $t$ and the relation components $r_1$, $r_2$ as octonions, i.e., $h, r_1, r_2, t \in \mathbb{O}$; $s(h), s(r_2), s(t)$ denote the coordinate vectors of the octonions $h$, $r_2$, $t$, i.e., $s(h) = \mathbf{h}$, $s(r_2) = \mathbf{r}$, $s(t) = \mathbf{t}$, with $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^8$; $m(r_1)$ denotes the component matrix of the octonion $r_1$, i.e., $m(r_1) = \mathbf{R} \in \mathbb{R}^{8 \times 8}$; and $b_{head}, b_{tail} \in \mathbb{R}$ denote the decision boundary radii of the head and tail entities, respectively.

Finally, the score of the triple $(head, rel, tail)$ is measured by the negative squared $\ell_2$-norm of the coordinate vector of the interaction result, offset by the bias parameters $b_{head}$ and $b_{tail}$.
For the scoring function in Equation (9), since each coordinate vector or component matrix corresponds to only one octonion with eight elements, their dimensions are strictly limited to $8$ or $8 \times 8$. Due to the low dimension, the model has limited representation capability and cannot achieve satisfactory results in application.
To solve this problem, we propose the generalized TransO model in higher-dimensional space, which represents each entity or relation as multiple groups of octonions and adopts grouping interaction rules to make the interaction occur between octonions in the same group. In this way, the dimension of the embedded space is no longer strictly limited to 8 and 8 × 8 , but an integer multiple of 8, such as 32, 200, and 496. The scoring function of the generalized TransO model is shown in Equation (10):
$$f_{rel}(head, tail) = -\left\| \mathbf{R} \cdot \mathbf{h} + \mathbf{r} - \mathbf{t} \right\|_{\ell_2}^2 + b_{head} + b_{tail} = -\left\| \begin{bmatrix} m(r_{11}) & & \\ & \ddots & \\ & & m(r_{1k}) \end{bmatrix} \begin{bmatrix} s(h_1) \\ \vdots \\ s(h_k) \end{bmatrix} + \begin{bmatrix} s(r_{21}) \\ \vdots \\ s(r_{2k}) \end{bmatrix} - \begin{bmatrix} s(t_1) \\ \vdots \\ s(t_k) \end{bmatrix} \right\|_{\ell_2}^2 + b_{head} + b_{tail}$$

where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^{8k}$ $(k = 1, 2, \dots, n)$ respectively denote the embedded vectors of the head entity, the relation in the translation interaction, and the tail entity, composed of the $k$ octonion coordinate vectors $s(h_1), \dots, s(h_k)$; $s(r_{21}), \dots, s(r_{2k})$; and $s(t_1), \dots, s(t_k)$; $\mathbf{R} \in \mathbb{R}^{8k \times 8k}$ denotes the block-diagonal matrix of the relation in the multiplication interaction, composed of the $k$ octonion component matrices $m(r_{11}), \dots, m(r_{1k})$; and $b_{head}, b_{tail} \in \mathbb{R}$ still denote the bias parameters of the head and tail entities, respectively.
As can be seen from Equation (10), the relational matrix used for the multiplication interaction in the generalized TransO model is block-diagonal: each block on the principal diagonal is a standard $8 \times 8$ octonion component matrix, and the standard eight-dimensional octonion coordinate vectors of the entities and of the relation used for the translation interaction are likewise concatenated into complete vectors. In this way, entities and relations can interact in a higher-dimensional embedded space, and only between octonions in the same group, ensuring that the time complexity of the model does not reach the quadratic level of TransR [22] or RESCAL [10].
The growth of parameters resulting from the above group interaction process is still linear, and the model could achieve a balance between representation capability and complexity.
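The following sketch (ours, not the authors' code) computes the generalized TransO score of Equation (10), reusing the component_matrix helper from the earlier octonion sketch; the block-diagonal multiplication is realized group by group rather than by materializing the full $8k \times 8k$ matrix.

```python
import numpy as np

def transo_score(h, r1, r2, t, b_head, b_tail):
    """Generalized TransO score. h, r2, t: (k, 8) stacked octonion coordinate
    vectors; r1: (k, 8) octonions whose component matrices form the
    block-diagonal multiplication part; b_head, b_tail: entity biases."""
    k = h.shape[0]
    # Multiplication interaction, performed group by group: m(r1_i) @ s(h_i).
    mh = np.stack([component_matrix(r1[i]) @ h[i] for i in range(k)])
    # Translation interaction, then negative squared l2 distance plus biases.
    residual = mh + r2 - t
    return -np.sum(residual ** 2) + b_head + b_tail
```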

3.3. PTransO: Poincaré-Extended TransO Model

Since hyperbolic space has inherent advantages in modeling hierarchical information, we propose the PTransO model, which is used to build TransO in hyperbolic Poincaré space.
However, linear algebraic operations on matrices and vectors are not directly defined in hyperbolic space, so the multiplication of octonion component matrices and coordinate vectors cannot be performed there directly. By means of the tangent space, which is equivalent to Euclidean space, PTransO allocates the multiplication interaction to the tangent space, and transfers the interaction result into hyperbolic space through exponential mapping. Meanwhile, since Möbius vector addition is defined in hyperbolic space, PTransO allocates the translation interaction to hyperbolic Poincaré space, replacing vector addition in traditional Euclidean space, so as to integrate the Euclidean-based translation thoughts into hyperbolic space. Finally, the distance between the interacted head entity and the tail entity is evaluated by the hyperbolic distance calculation.
In short, PTransO makes use of the fact that the tangent space $T_{\mathbf{x}}^c$ of hyperbolic space can be regarded as Euclidean space, making the tangent space a bridge between hyperbolic and Euclidean geometry. In the modeling process, PTransO uses both the tangent space and hyperbolic Poincaré space, capturing the octonion multiplication interaction of entities and relations in the tangent space, and the translation interaction in hyperbolic Poincaré space.
For the triple $(head, rel, tail)$, the interaction between entities and relations in PTransO is illustrated in Figure 7, and the scoring function of the PTransO model is shown in Equation (11):
$$f_{rel}(head, tail) = -d_{\mathbb{B}}^{c_{rel}}\!\left( p_{head}^{rel},\ p_{tail}^{rel} \right)^2 + b_{head} + b_{tail} = -d_{\mathbb{B}}^{c_{rel}}\!\left( \exp_{\mathbf{0}}^{c_{rel}}(\mathbf{R} \cdot \mathbf{h}),\ \exp_{\mathbf{0}}^{c_{rel}}(\mathbf{t}) \oplus^{c_{rel}} \exp_{\mathbf{0}}^{c_{rel}}(\mathbf{r}) \right)^2 + b_{head} + b_{tail}$$

where $\mathbf{h}, \mathbf{t}, \mathbf{r} \in \mathbb{R}^8$ denote the octonion coordinate vectors of the entities and of the relation in the translation interaction; $\mathbf{R} \in \mathbb{R}^{8 \times 8}$ denotes the octonion component matrix of the relation in the multiplication interaction; $\mathbf{h}, \mathbf{t}, \mathbf{r}$ and $\mathbf{R}$ are all constructed in the tangent space $T_{\mathbf{0}}^c$, which is equivalent to Euclidean space. $p_{head}^{rel}, p_{tail}^{rel} \in \mathbb{B}^{8, c_{rel}}$ denote the vectors of the head entity $head$ and the tail entity $tail$ in hyperbolic Poincaré space after interacting with the relation $rel$, respectively. $b_{head}, b_{tail} \in \mathbb{R}$ are consistent with those defined in the TransO model, i.e., the bias parameters of the head and tail entities, respectively.
The generating steps of $p_{head}^{rel}$ are as follows:
1.
Multiply the relation component matrix $\mathbf{R}$ of the multiplication interaction with the head entity coordinate vector $\mathbf{h}$ in the tangent space;
2.
Transfer the interaction result into hyperbolic Poincaré space by the exponential mapping $\exp_{\mathbf{0}}^c(\mathbf{v})$.
Moreover, the generating steps of $p_{tail}^{rel}$ are:
1.
Transfer the tail entity coordinate vector $\mathbf{t}$ and the relation coordinate vector $\mathbf{r}$ of the translation interaction into hyperbolic Poincaré space by the exponential mapping $\exp_{\mathbf{0}}^c(\mathbf{v})$;
2.
Add these two transferred vectors by Möbius addition $\oplus^c$.
After obtaining $p_{head}^{rel}$ and $p_{tail}^{rel}$, the hyperbolic distance $d_{\mathbb{B}}^{c_{rel}}$ is calculated for them instead of the $\ell_1$- or $\ell_2$-norm (Euclidean distance), where $c_{rel}$ denotes the absolute value of the curvature, which is a non-fixed value related to the relation.
As with generalized TransO, the PTransO model also has a generalized definition to satisfy the situation where the embedding dimension is an integer multiple of 8, as shown in Equation (12):
$$f_{rel}(head, tail) = -d_{\mathbb{B}}^{c_{rel}}\!\left( \exp_{\mathbf{0}}^{c_{rel}}(\mathbf{R} \cdot \mathbf{h}),\ \exp_{\mathbf{0}}^{c_{rel}}(\mathbf{t}) \oplus^{c_{rel}} \exp_{\mathbf{0}}^{c_{rel}}(\mathbf{r}) \right)^2 + b_{head} + b_{tail}$$

in which

$$\mathbf{R} = \begin{bmatrix} m(r_{11}) & & \\ & \ddots & \\ & & m(r_{1k}) \end{bmatrix}, \quad \mathbf{h} = \begin{bmatrix} s(h_1) \\ \vdots \\ s(h_k) \end{bmatrix}, \quad \mathbf{r} = \begin{bmatrix} s(r_{21}) \\ \vdots \\ s(r_{2k}) \end{bmatrix}, \quad \mathbf{t} = \begin{bmatrix} s(t_1) \\ \vdots \\ s(t_k) \end{bmatrix}$$
where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^{8k}$ and $\mathbf{R} \in \mathbb{R}^{8k \times 8k}$ denote the embedded vectors and matrix composed of octonion coordinate vectors and component matrices, respectively; $k = 1, 2, \dots, n$ denotes the number of groups; the meaning of the group interaction in PTransO is consistent with that in TransO.
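The following sketch (ours; it reuses component_matrix, expmap0, mobius_add, and hyp_distance from the earlier sketches) assembles the PTransO score of Equations (11) and (12), keeping the multiplication interaction in the tangent space and the translation interaction in the Poincaré ball.

```python
import numpy as np

def ptranso_score(h, r1, r2, t, b_head, b_tail, c_rel):
    """Generalized PTransO score. h, r2, t: (k, 8) coordinate vectors;
    r1: (k, 8) octonions for the block-diagonal multiplication;
    c_rel: relation-specific absolute value of the curvature."""
    k = h.shape[0]
    # Head side: grouped octonion multiplication in the tangent space T_0,
    # then exponential mapping into the Poincare ball.
    mh = np.concatenate([component_matrix(r1[i]) @ h[i] for i in range(k)])
    p_head = expmap0(mh, c_rel)
    # Tail side: map t and r into the ball, then translate by Moebius addition.
    p_tail = mobius_add(expmap0(t.ravel(), c_rel),
                        expmap0(r2.ravel(), c_rel), c_rel)
    # Hyperbolic distance replaces the Euclidean l2 norm used by TransO.
    return -hyp_distance(p_head, p_tail, c_rel) ** 2 + b_head + b_tail
```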

3.4. Training and Optimization

Referring to [11,29,30], we use a data enhancement method that adds the inverse relation $r^{-1}$ for each triple $(h, r, t)$ in the datasets, and randomly initialize the octonion matrix and vector of the inverse relation. In the training process, we regard the inverse triples $(t, r^{-1}, h)$ as true and add them to the positive sample set. Through this inverse-relation data enhancement, our models train the triples from both the forward and backward directions, which captures the triple features more fully and enriches the dataset with twice the sample size.
We use the uniform negative sampling method to randomly select $k$ elements from the entity set $\mathcal{E}$ and replace the tail entity $t$ in the true triples, so as to generate $k$ negative samples for each true triple. In the negative sampling, we replace only the tail entity, because we have added inverse triples in the above data enhancement method. Since the tail entity of an inverse triple is the head entity of the corresponding positive triple, only one of the head or tail entities of the enhanced data needs to be replaced to cover all the entities.
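A minimal sketch of this tail-replacement sampling (our illustration; integer entity ids and the uniform draw are assumptions consistent with the description above):

```python
import numpy as np

def corrupt_tails(triple, num_entities, k, rng=None):
    """Generate k negative triples for (h, r, t) by uniformly resampling the tail."""
    rng = rng or np.random.default_rng()
    h, r, _ = triple
    return [(h, r, int(t_neg)) for t_neg in rng.integers(0, num_entities, size=k)]
```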
Since our models add $b_{head}, b_{tail} \in \mathbb{R}$ as entity-specific relatedness bias parameters to the scoring function on the basis of the Euclidean or hyperbolic distance, the scoring function as a whole reflects a relatedness measure that combines distance with bias parameters. Therefore, we choose the Bernoulli negative log-likelihood as the loss function, and train our models by minimizing it as shown below:
$$\mathcal{L}(y, p) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)$$

where $N$ denotes the number of training samples, $y_i$ denotes the true or false label of the $i$-th triple, and $p_i$ denotes the predicted probability that the triple is true.
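In code, the loss can be sketched as follows (ours; passing the raw score through a sigmoid to obtain $p$ is our assumption, consistent with treating the score as the logit that the triple holds):

```python
import numpy as np

def bernoulli_nll(scores, labels, eps=1e-12):
    """Bernoulli negative log-likelihood over raw triple scores.
    scores: model scores f_rel(head, tail); labels: 1 for true, 0 for false."""
    p = 1.0 / (1.0 + np.exp(-scores))   # predicted probability the triple is true
    p = np.clip(p, eps, 1.0 - eps)      # guard against log(0)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
```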
We use the AdamW [39] optimizer to train our models on the different benchmark datasets. Since the training efficiency of the AdamW optimizer is higher than that of the SGD [40] optimizer applied in traditional methods, the training convergence time of the models is shortened. Compared with the Adam optimizer, the decoupled weight decay mechanism of AdamW can better control the regularization strength and reduce overfitting.

4. Experiments and Discussion

4.1. Link Prediction Experiment

Link prediction is an essential mechanism in network evolution [41]; on KGs, it is an experiment that, given a relation and either the head or the tail entity, predicts the other, missing entity. This experiment is one of the most important methods and standards for testing the representation capability of KGC models.
The operation steps of the link prediction experiment are as follows:
1.
Make a list of all the entities in the dataset, and place this list in the position of the empty entity in the triple;
2.
Traverse each entity from the list in the position of empty entity, and form complete triples with the entity and relation in the non-empty positions;
3.
Use the experimental model to evaluate the scores of these triples, and sort the entities in the list according to the order of scores from high to low.
The link prediction experiment focuses on the rank of the correct entity of the original triple, not only on whether the correct entity can be found. Therefore, the widely used evaluation indicators in this experiment are the mean rank (MR) and mean reciprocal rank (MRR) of correct entities, and the proportion of correct entities appearing in the top $k$ items sorted by score from high to low (Hits@k). These evaluation indicators usually come in two evaluation settings: raw and filtered. The filtered setting removes the triples that also exist in the KG dataset from the candidate list when calculating ranks, whereas the raw setting does not. The experimental data of both our proposed models and the reference baseline models adopt the filtered setting by default.
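The following sketch (our illustration of the standard protocol, not the authors' evaluation script) computes a filtered rank for one test triple and aggregates MR, MRR, and Hits@k over a set of ranks:

```python
import numpy as np

def filtered_rank(scores, gold, known_true):
    """scores: (num_entities,) candidate scores; gold: id of the correct
    entity; known_true: ids of other entities that also form true triples."""
    scores = scores.copy()
    scores[list(known_true)] = -np.inf          # the 'filtered' setting
    return int(np.sum(scores > scores[gold]) + 1)

def metrics(ranks, ks=(1, 3, 10)):
    """Aggregate MR, MRR, and Hits@k from a list of ranks."""
    ranks = np.asarray(ranks, dtype=float)
    out = {"MR": ranks.mean(), "MRR": (1.0 / ranks).mean()}
    out.update({f"Hits@{k}": float((ranks <= k).mean()) for k in ks})
    return out
```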

4.2. Experimental Setup

We use the WN18 [9] benchmark dataset to test the link prediction effect of TransO compared with the traditional translation-based KGC models, and use the WN18RR [11] and FB15k-237 [42] benchmark datasets to test both TransO and PTransO compared with the existing advanced KGC models.
The WN18 dataset is derived from the cognitive-based English vocabulary knowledge network WordNet [43], and the WN18RR dataset is a subset of WN18 by removing the inverse relations in WN18. The FB15k-237 dataset is a subset of the FB15k [9] dataset, which is derived from the large knowledge base Freebase [44]. Similar to WN18RR, FB15k-237 removes the inverse relations in the original FB15k dataset. The main relational patterns in WN18 are “symmetric”, “asymmetric”, and “inversion”, whereas those in the WN18RR and FB15k-237 datasets are “symmetric”, “asymmetric”, and “combinatorial”. Significantly, WN18RR is hierarchical, but FB15k-237 is not.
Because WN18RR and FB15k-237 delete the inverse relations that may lead to data leakage from the original WN18 and FB15k datasets, respectively, these two datasets can more accurately reflect the effectiveness of the models, and link prediction results on them are generally worse than on the original datasets. However, since the translation-based KGC models appeared earlier and were mostly evaluated on the WN18 dataset, in order to effectively compare TransO with translation models of the same type, we retain WN18 as one of the experimental datasets to verify the effectiveness of the TransO model. The dataset statistics are shown in Table 1.
For the WN18 dataset, we conduct the link prediction experiment only in high-dimensional ($d \in \{200, 400, 496\}$) embedded space in order to compare the best performance with traditional translation-based models, while we use the WN18RR and FB15k-237 datasets in both high-dimensional ($d \in \{200, 400, 496\}$) and low-dimensional ($d = 32$) embedded spaces in order to analyze the influence of the embedding dimension on the experimental results.
The learning rate of our models is set to $7.5 \times 10^{-5}$; the weight decay is set to $1.5 \times 10^{-2}$ in the high-dimensional embedded space and $7.5 \times 10^{-3}$ in the low-dimensional embedded space; the batch size for training is set to 128; the number of negative samples is set to 50.

4.3. Experimental Results

Table 2 shows the link prediction results of TransO on the WN18 dataset; Table 3 and Table 4 show the link prediction results of TransO and PTransO on the WN18RR and FB15k-237 datasets, respectively, in high-dimensional and low-dimensional embedded spaces. In Table 2, Table 3 and Table 4, the best results are marked in bold, and the second-best results are underlined.
According to the experimental results in Table 2, TransO performs well on the WN18 dataset and achieves state-of-the-art performance compared with the baseline models. The Hits@10 of TransO reached 0.962, an increase of 1.26% over the previous best TransAT model; Hits@3 reached 0.942, an increase of 2.06% over the previous best TransD model. For the Hits@1 indicator, TransO achieved a striking improvement over the translation models, nearly doubling the previous best TransR result. We attribute this to the introduction of octonion feature interaction, which gives the translation-based TransO the capability to mine deep interaction features in a manner similar to the semantic models, thereby improving the predictive effect on Hits@1. To validate this view, we added a comparison with the semantics-based DistMult model: DistMult is likewise greatly improved over the translation baselines on the Hits@1 indicator, which is consistent with the behavior of TransO. Compared with DistMult, TransO performed better on all indicators, which reflects that a translation model combined with octonion feature interaction can match or exceed the expressive capability of the semantic models. Moreover, the results on the MR and MRR indicators also indicate that TransO improves the ranking of correct entities in the candidate list during link prediction, finding the correct results from the candidate list more easily than the other baseline models. These experimental results preliminarily verify the feasibility of introducing the mathematical methods of octonion into the translation-based KGC model.
It can be seen from the experimental results in Table 3 that the PTransO and TransO models perform well on the WN18RR dataset in high-dimensional embedded space: most indicators (except MR) of PTransO achieve state-of-the-art performance, while TransO performs second only to PTransO on this dataset. The Hits@10 and MRR of PTransO reached 0.602 and 0.504, increases of 3.44% and 2.86%, respectively, over the previous best QuatE and AttE baselines. Meanwhile, the Hits@10 and MRR of TransO reached 0.599 and 0.499, increases of 2.92% and 1.84%, respectively, over the previous best baselines. Although TransO and PTransO did not achieve the best MR indicator, their MR values were still at a relatively advanced level compared with the baseline models.
These results on the WN18RR dataset reflect that: (1) it is effective to introduce the mathematical theories of octonion into the translation-based KGC method, since the comparison of TransO with advanced baseline models (including semantics-based and neural network-based models) shows that TransO is generally at a satisfactory level; (2) in the environment of a hierarchical dataset, it is effective to introduce hyperbolic Poincaré geometry into the translation-based KGC model, because, compared with TransO, PTransO shows an improvement on most indicators (except MR) and achieves state-of-the-art performance; most notably, Hits@1 increased by 2.25%. PTransO is lower than TransO only on the MR indicator. For this case, we consider that there are more false-positive samples in the test results of PTransO, i.e., negative triples wrongly judged as positive. Since the MR indicator is more vulnerable to false-positive samples [46], PTransO performs worse on MR than TransO. Overall, however, PTransO achieves a substantial improvement over TransO.
In addition, comparing the performance of the TransO model on the WN18 and WN18RR datasets in Table 2 and Table 3, respectively, all indicators of the model decrease significantly on WN18RR. This is mainly because WN18RR removed the inverse and redundant triples of the WN18 dataset, resolving the data leakage problem in WN18, where test triples can be predicted directly through the inverse triples in the training and validation sets. Therefore, there is no longer an "inversion" relation pattern in the WN18RR dataset, and models perform significantly worse on it than on the original WN18, but evaluation on WN18RR is more conducive to reflecting true modeling capability.
However, on the FB15k-237 dataset, there is still a certain gap between our methods and the state-of-the-art models in high-dimensional embedded space. The Hits@10 and MRR of TransO reached 0.534 and 0.347, increases of 14.83% and 18.02% over the TransE model, but still 3.00% and 3.17% lower, respectively, than the QuatE and TuckER models, which performed best among the baselines. The performance of PTransO on FB15k-237 is close to that of TransO, and slightly lower on most indicators. Moreover, the MR indicators of our models on this dataset are at a moderate level, similar to those of the geometry-based RotatE, MuRP, and MuRE models; although significantly better than the translation-based TransE model, they still show a large gap compared with the semantics-based QuatE model.
The TransO and PTransO models underperform on the FB15k-237 dataset in high-dimensional embedding space. We believe the main reason is that translation-based models have insufficient modeling capability for datasets with complex relational types. The FB15k-237 dataset is derived from Freebase and contains many types of common information, including human, social, and geographical information; it has 237 relations, compared with only 11 in WN18RR. Translation-based models ultimately measure only the ℓ1- or ℓ2-distance between the relation-transformed head entities and the tail entities, and are generally weaker at distinguishing similar relations than semantic or neural network models. Although our models add the bias parameters b_head and b_tail, these are entity-specific parameters and are independent of the relations.
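The limitation can be seen in the general shape of a translation-style score. The sketch below is a simplified abstraction, not the exact TransO scoring function: the octonion interaction is collapsed into a plain vector translation, and the additive use of the entity biases is assumed in the MuRE-style convention.

```python
import numpy as np

def translational_score(head, translation, tail, b_head, b_tail, p=2):
    """Generic translation-style score: the negative l1- or l2-distance
    between the relation-transformed head and the tail, plus the two
    entity biases. Because b_head and b_tail depend only on the entities,
    they contribute the same amount to every relation linking the same
    pair and so cannot help separate similar relations."""
    dist = np.linalg.norm(head + translation - tail, ord=p)
    return -dist + b_head + b_tail
```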
TransO performs slightly better than PTransO on the FB15k-237 dataset in high-dimensional embedding space. We believe the main reason is that FB15k-237 is not hierarchical. As discussed in Section 3.1.2, hyperbolic geometry has inherent modeling advantages over Euclidean geometry for hierarchical data or in low-dimensional vector space, but when neither condition holds it offers no clear advantage over Euclidean geometry. Moreover, introducing hyperbolic Poincaré geometry into the TransO model requires indirectly constructing a Euclidean-like space via the tangent space T_0^c. Frequent use of the mapping between the tangent space T_0^c and the Poincaré ball B^{d,c} may introduce numerical errors that affect the accuracy of the model's predictions.
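The tangent-space round trip mentioned above relies on the standard Poincaré-ball operations sketched below (a sketch with a fixed curvature c > 0; the formulas are the usual ones from the hyperbolic-embedding literature, not code from the paper). Repeatedly mapping between the tangent space and the ball, especially near the ball boundary where tanh and artanh saturate, is where the numerical errors mentioned above can accumulate.

```python
import numpy as np

def expmap0(v, c):
    """Exponential map at the origin: tangent vector v in T_0^c mapped
    to a point in the Poincare ball B^{d,c} of curvature -c."""
    sqrt_c, norm = np.sqrt(c), np.linalg.norm(v)
    return v if norm == 0 else np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def mobius_add(x, y, c):
    """Mobius addition, the hyperbolic analogue of vector addition."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2*c*xy + c*y2) * x + (1 - c*x2) * y
    return num / (1 + 2*c*xy + c**2 * x2 * y2)

def hyp_distance(x, y, c):
    """Geodesic distance in the Poincare ball; artanh diverges as points
    approach the boundary, one source of numerical error."""
    sqrt_c = np.sqrt(c)
    return 2.0 / sqrt_c * np.arctanh(
        sqrt_c * np.linalg.norm(mobius_add(-x, y, c)))
```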
Given these deficiencies observed in Table 3, performance could be improved further if we could either strengthen the modeling capability of the translation model for complex datasets with many relations, or find a way to apply octonions directly in the hyperbolic Poincaré space rather than in the tangent space.
As can be seen from Table 4, the PTransO model achieves state-of-the-art performance on the WN18RR dataset: its Hits@10 and MRR reach 0.570 and 0.479, increases of 3.07% and 1.48%, respectively, over the previously best RotH baseline.
Moreover, PTransO achieves the best results on FB15k-237 on all indicators except Hits@1; its Hits@10 and MRR reach 0.503 and 0.325, increases of 0.40% and 0.31%, respectively, over the previously best AttH baseline.
Compared with PTransO, the TransO model performs slightly worse across the board in low-dimensional embedding space: its Hits@10 and Hits@3 on WN18RR, as well as all indicators on FB15k-237, rank second, while its remaining indicators are close to those of the baseline models.
These results further confirm that hyperbolic Poincaré geometry is better suited than Euclidean geometry to representing KGs in low-dimensional embedding space.

4.4. Results in Modeling Different Types of Relations

To test the representation capability of our models in modeling different types of relations, we design an ablation experiment. We compute the MRR for each relation on the WN18RR dataset; the results are shown in Table 5. The comparison models are RotatE [31] and QuatE [32], typical KGC models that apply the mathematical theories of complex numbers and quaternions, respectively.
According to Table 5, among the 11 relations, the MRR of TransO exceeds or equals that of RotatE on nine (81.82%) and that of QuatE on seven (63.64%). These results indicate, first, that octonions can surpass complex numbers and quaternions in representing different types of relations; and second, that combining octonions with the translation-based approach compensates, to a certain extent, for the shallow features captured by translation-based methods, and generally yields a performance gain over the semantics-based methods on the WN18RR dataset. In addition, the MRR of PTransO is greater than or equal to that of TransO on 7 of the 11 relations (63.64%), confirming the effectiveness of introducing hyperbolic geometry to improve the representation of hierarchical data.
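The per-relation breakdown in Table 5 amounts to bucketing the filtered ranks by relation before averaging reciprocal ranks; a minimal sketch (our illustration, taking hypothetical rank lists as input) is:

```python
from collections import defaultdict

def per_relation_mrr(relations, ranks):
    """One MRR per relation, as in Table 5: bucket the filtered rank of
    each test triple by its relation, then average reciprocal ranks."""
    buckets = defaultdict(list)
    for rel, rank in zip(relations, ranks):
        buckets[rel].append(1.0 / rank)
    return {rel: sum(rr) / len(rr) for rel, rr in buckets.items()}
```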

4.5. Euclidean or Poincaré?

To further compare the effects of hyperbolic space and Euclidean space in link prediction, we design another ablation experiment. We evaluate the MRR and Hits@10 of the TransO and PTransO models on the WN18RR dataset at dimensions d ∈ {32, 200, 400, 496}, and calculate the relative difference between TransO and PTransO.
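The paper does not spell out the exact formula for the relative difference value; a natural reading, assumed in the sketch below, is the PTransO-over-TransO improvement as a fraction of the TransO value.

```python
def relative_difference(ptranso_value, transo_value):
    """Improvement of PTransO over TransO on one indicator (MRR or
    Hits@10) at a given dimension, as a fraction of the TransO value."""
    return (ptranso_value - transo_value) / transo_value
```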
As can be seen from Figure 8, the MRR and Hits@10 of both the TransO and PTransO models trend upward with dimension, and PTransO consistently outperforms TransO. However, as the embedding dimension increases, the relative difference between the two models shrinks.
These results show that hyperbolic Poincaré space models hierarchical data better than Euclidean space under low-dimensional embeddings. As the embedding dimension grows, the advantage of hyperbolic space fades, because Euclidean space gradually acquires the capability to model complex data as well.
We can infer that if the embedding dimension were increased further, TransO would match or even surpass PTransO. Hyperbolic Poincaré space is therefore most suitable for low-dimensional embeddings, where it not only saves computing time and storage but also retains the rich hierarchical information in the data; under low-dimensional embeddings, hyperbolic space holds a clear advantage over Euclidean space.

5. Conclusions

This paper introduces TransO, a translation-based octonion KGC model, and PTransO, an extended model constructed in hyperbolic Poincaré space. Through mathematical operations between the component matrices and coordinate vectors of octonions, TransO enhances the interaction between entities and relations in the triples of KGs, so as to capture the latent inter-dependencies in datasets. Furthermore, by moving the embedding space from Euclidean to hyperbolic Poincaré space, the extended PTransO model significantly enhances the representation of hierarchical information in low-dimensional space.
The link prediction results indicate that PTransO achieves state-of-the-art performance on the hierarchical WN18RR dataset in high-dimensional embedding space, and on both the hierarchical WN18RR and the non-hierarchical FB15k-237 datasets in low-dimensional embedding space. TransO performs slightly worse than PTransO in most experiments, but still achieves state-of-the-art performance on the WN18 dataset compared with the translation-based baselines and an advanced level on the WN18RR dataset. These results reflect the effectiveness of introducing the mathematical theories of octonions and hyperbolic Poincaré geometry into translation-based KGC methods, especially in low-dimensional embedding space.
The experimental results also indicate that our methods yield only a limited improvement on the FB15k-237 dataset in high-dimensional embedding space. This is mainly because our models are translation-based in essence, and such KGC models are inherently less capable of modeling datasets with a large number of complex relations than semantics-based models; the limited representation capability of hyperbolic Poincaré geometry in high-dimensional embedding space also contributes to this. In future work, we will focus on applying octonions to other natural language processing (NLP) tasks, so as to further exploit the mathematical theories of octonions in engineering.

Author Contributions

Conceptualization, M.Y.; methodology, M.Y. and C.B.; software, J.Y.; data curation, X.L.; writing—original draft preparation, M.Z. and T.X.; writing—review and editing, H.L.; visualization, C.B.; supervision, X.L.; project administration, R.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant Nos. 61877043 and 61877044).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yang, X.; Huan, Z.; Zhai, Y.; Lin, T. Research of Personalized Recommendation Technology Based on Knowledge Graphs. Appl. Sci. 2021, 11, 7104.
2. Introducing the Knowledge Graph: Things, Not Strings. Available online: https://www.blog.google/products/search/introducing-knowledge-graph-things-not/ (accessed on 16 May 2012).
3. Xiong, C.; Power, R.; Callan, J. Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 1271–1279.
4. Wang, X.; He, X.; Cao, Y.; Liu, M.; Chua, T.-S. KGAT: Knowledge Graph Attention Network for Recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 950–958.
5. Saha, A.; Pahuja, V.; Khapra, M.M.; Sankaranarayanan, K.; Chandar, S. Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 705–713.
6. Song, H.-J.; Kim, A.-Y.; Park, S.-B. Learning Translation-Based Knowledge Graph Embeddings by N-Pair Translation Loss. Appl. Sci. 2020, 10, 3964.
7. Choi, S.J.; Song, H.-J.; Park, S.-B. An Approach to Knowledge Base Completion by a Committee-Based Knowledge Graph Embedding. Appl. Sci. 2020, 10, 2651.
8. Ji, S.; Pan, S.; Cambria, E.; Marttinen, P.; Yu, P.S. A Survey on Knowledge Graphs: Representation, Acquisition, and Applications. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 494–514.
9. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems 26 (NIPS 2013); Curran Associates, Inc.: Red Hook, NY, USA, 2013; pp. 2787–2795.
10. Nickel, M.; Tresp, V.; Kriegel, H.-P. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; pp. 809–816.
11. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D Knowledge Graph Embeddings. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 1811–1818.
12. Nguyen, D.Q.; Nguyen, T.D.; Nguyen, D.Q.; Phung, D. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; pp. 327–333.
13. Popa, C.A. Octonion-Valued Neural Networks. In Proceedings of the 25th International Conference on Artificial Neural Networks, Barcelona, Spain, 6–9 September 2016; pp. 435–443.
14. Huang, G.; Li, X. Color Palmprint Feature Extraction and Recognition Algorithm Based on Octonion. Comput. Eng. 2012, 38, 28–33.
15. Daboul, J.; Delbourgo, R. Matrix representation of octonions and generalizations. J. Math. Phys. 1999, 40, 4134–4150.
16. Cariow, A.; Cariowa, G. Algorithm for multiplying two octonions. Radioelectron. Commun. Syst. 2012, 55, 464–473.
17. Yang, Y.; Yang, C. The Real Representation of Octonion Vector and Matrix. J. Xianyang Norm. Univ. 2013, 4, 9–12.
18. Ungar, A.A. Hyperbolic Trigonometry and its Application in the Poincaré Ball Model of Hyperbolic Geometry. Comput. Math. Appl. 2001, 41, 135–147.
19. Balažević, I.; Allen, C.; Hospedales, T. Multi-relational Poincaré Graph Embeddings. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019); Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 4463–4473.
20. Chami, I.; Wolf, A.; Juan, D.-C.; Sala, F.; Ravi, S.; Ré, C. Low-Dimensional Hyperbolic Knowledge Graph Embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 6901–6914.
21. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 1112–1119.
22. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2181–2187.
23. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge Graph Embedding via Dynamic Mapping Matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; pp. 687–696.
24. Nguyen, D.Q.; Sirts, K.; Qu, L.; Johnson, M. STransE: A novel embedding model of entities and relationships in knowledge bases. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 460–466.
25. Ji, G.; Liu, K.; He, S.; Zhao, J. Knowledge Graph Completion with Adaptive Sparse Transfer Matrix. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 985–991.
26. Qian, W.; Fu, C.; Zhu, Y.; Cai, D.; He, X. Translating Embeddings for Knowledge Graph Completion with Relation Attention Mechanism. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 4286–4292.
27. Yang, B.; Yih, W.-T.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the 2nd International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014.
28. Kazemi, S.M.; Poole, D. SimplE Embedding for Link Prediction in Knowledge Graphs. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018); Curran Associates, Inc.: Red Hook, NY, USA, 2018; pp. 4284–4295.
29. Balažević, I.; Allen, C.; Hospedales, T.M. TuckER: Tensor Factorization for Knowledge Graph Completion. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 5185–5194.
30. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex Embeddings for Simple Link Prediction. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 2071–2080.
31. Sun, Z.; Deng, Z.; Nie, J.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
32. Zhang, S.; Tay, Y.; Yao, L.; Liu, Q. Quaternion Knowledge Graph Embeddings. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019); Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 2735–2745.
33. Nguyen, D.Q.; Vu, T.; Nguyen, T.D.; Nguyen, D.Q.; Phung, D. A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 2180–2189.
34. Vashishth, S.; Sanyal, S.; Nitin, V.; Agrawal, N.; Talukdar, P. InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 3009–3016.
35. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional Networks. In Proceedings of the 15th European Semantic Web Conference, Heraklion, Greece, 3–7 June 2018; pp. 593–607.
36. Knarr, N.; Stroppel, M.J. Subforms of norm forms of octonion fields. Arch. Math. 2018, 110, 213–224.
37. Conway, J.H.; Smith, D.A.; Dixon, G. On quaternions and octonions: Their geometry, arithmetic, and symmetry. Math. Intell. 2004, 26, 75–77.
38. Kaplan, A. Quaternions and octonions in Mechanics. Rev. Union Mat. Argent. 2008, 49, 45–53.
39. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
40. Bottou, L. Large-Scale Machine Learning with Stochastic Gradient Descent. In Proceedings of the 19th International Conference on Computational Statistics, Paris, France, 22–27 August 2010; pp. 177–186.
41. Li, K.; Gu, S.; Yan, D. A Link Prediction Method Based on Neural Networks. Appl. Sci. 2021, 11, 5186.
42. Toutanova, K.; Chen, D. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, Beijing, China, 31 July 2015; pp. 57–66.
43. Miller, G.A.; Beckwith, R.; Fellbaum, C.; Gross, D.; Miller, K. Introduction to WordNet: An On-line Lexical Database. Int. J. Lexicogr. 1990, 3, 235–244.
44. Bollacker, K.D.; Evans, C.; Paritosh, P.K.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 10–12 June 2008; pp. 1247–1249.
45. Lacroix, T.; Usunier, N.; Obozinski, G. Canonical Tensor Decomposition for Knowledge Base Completion. In Proceedings of the 35th International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; pp. 2863–2872.
46. Zhang, Y.; Yao, Q.; Shao, Y.; Chen, L. NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding. In Proceedings of the 35th IEEE International Conference on Data Engineering, Macao, China, 8–11 April 2019; pp. 614–625.
Figure 1. Examples of (a) inter-dependencies and (b) hierarchical information in KGs.
Figure 2. Illustrations of (a) inner product, (b) tensor product, (c) Hadamard product, and (d) octonion product.
Figure 3. Illustrations of (a) quaternion multiplication and (b) octonion multiplication (Fano plane).
Figure 4. Multiplication rules of the unit octonion in tabular form.
Figure 5. Illustration of the two-dimensional Poincaré model.
Figure 6. An illustration of exponential mapping and logarithmic mapping on a hyperbolic Riemannian manifold. M is a hyperbolic manifold and T_xM is the tangent space of M at point x. Point y can be transformed into vector v on the tangent space T_xM by the logarithmic mapping log_x(y), and vector v can be transformed into point y on the manifold M by the exponential mapping exp_x(v). Exponential mapping and logarithmic mapping are a pair of inverse operations.
Figure 7. A simple illustration of the interaction process of the PTransO model. For ease of understanding, we illustrate the process on the manifold, while the actual interaction process takes place in a specific Poincaré ball.
Figure 8. MRR and Hits@10 of the link prediction experiment of the TransO and PTransO models on the WN18RR dataset.
Table 1. Dataset statistics.

Dataset | Entities | Relations | Triples | #Training | #Validation | #Test
WN18 | 40,943 | 18 | 151k | 141,442 | 5000 | 5000
WN18RR | 40,943 | 11 | 93k | 86,835 | 3034 | 3134
FB15k-237 | 14,951 | 237 | 310k | 272,115 | 17,535 | 20,466
Table 2. Link prediction results of TransO on the WN18 dataset.

Model | MR | MRR | Hits@10 | Hits@3 | Hits@1
TransE (Bordes et al., 2013) [9] | 251 | 0.454 | 0.934 | 0.823 | 0.089
TransH (Wang et al., 2014) [21] | 388 | 0.485 | 0.936 | 0.916 | 0.060
TransR (Lin et al., 2015) [22] | 225 | 0.605 | 0.940 | 0.876 | 0.335
TransD (Ji et al., 2015) [23] | 212 | 0.580 | 0.942 | 0.923 | 0.241
STransE (Nguyen et al., 2016) [24] | 206 | 0.657 | 0.934 | - | -
TranSparse (Ji et al., 2016) [25] | 211 | - | 0.932 | - | -
TransAT (Qian et al., 2018) [26] | 157 | - | 0.950 | - | -
DistMult (Yang et al., 2014) [27] | 902 | 0.822 | 0.934 | 0.914 | 0.728
TransO | 98 | 0.918 | 0.962 | 0.942 | 0.889
Table 3. Link prediction results in high-dimensional space on the WN18RR and FB15k-237 datasets.

Model | WN18RR (MR, MRR, Hits@10, Hits@3, Hits@1) | FB15k-237 (MR, MRR, Hits@10, Hits@3, Hits@1)
TransE (Bordes et al., 2013) [9] | 3384, 0.226, 0.501, -, - | 357, 0.294, 0.465, -, -
DistMult (Yang et al., 2014) [27] | 5110, 0.430, 0.490, 0.440, 0.390 | 254, 0.241, 0.419, 0.263, 0.155
TuckER (Balažević et al., 2019) [29] | -, 0.470, 0.526, 0.482, 0.443 | -, 0.358, 0.544, 0.394, 0.266
ComplEx-N3 (Lacroix et al., 2018) [45] | -, 0.480, 0.572, 0.495, 0.435 | -, 0.357, 0.547, 0.392, 0.264
RotatE (Sun et al., 2019) [31] | 3340, 0.476, 0.571, 0.492, 0.428 | 177, 0.338, 0.533, 0.375, 0.241
QuatE (Zhang et al., 2019) [32] | 2314, 0.488, 0.582, 0.508, 0.438 | 87, 0.348, 0.550, 0.382, 0.248
ConvE (Dettmers et al., 2018) [11] | 4187, 0.430, 0.520, 0.440, 0.400 | 244, 0.325, 0.501, 0.356, 0.237
MuRE (Balažević et al., 2019) [19] | 2108, 0.475, 0.554, 0.487, 0.436 | 171, 0.336, 0.521, 0.370, 0.245
AttE (Chami et al., 2020) [20] | -, 0.490, 0.581, 0.508, 0.443 | -, 0.351, 0.543, 0.386, 0.255
MuRP (Balažević et al., 2019) [19] | 2306, 0.481, 0.566, 0.495, 0.440 | 172, 0.335, 0.518, 0.367, 0.243
AttH (Chami et al., 2020) [20] | -, 0.486, 0.573, 0.499, 0.443 | -, 0.348, 0.540, 0.384, 0.252
TransO | 2269, 0.499, 0.599, 0.519, 0.445 | 169, 0.347, 0.534, 0.383, 0.253
PTransO | 2416, 0.504, 0.602, 0.524, 0.455 | 169, 0.345, 0.531, 0.381, 0.251
Table 4. Link prediction results in low-dimensional space on the WN18RR and FB15k-237 datasets.

Model | WN18RR (MRR, Hits@10, Hits@3, Hits@1) | FB15k-237 (MRR, Hits@10, Hits@3, Hits@1)
TransE (Bordes et al., 2013) [9] | 0.182, 0.419, 0.266, 0.053 | 0.147, 0.259, 0.158, 0.089
DistMult (Yang et al., 2014) [27] | 0.327, 0.379, 0.351, 0.293 | 0.178, 0.332, 0.197, 0.100
ComplEx-N3 (Lacroix et al., 2018) [45] | 0.420, 0.460, 0.420, 0.390 | 0.294, 0.463, 0.322, 0.211
RotatE (Sun et al., 2019) [31] | 0.387, 0.491, 0.417, 0.330 | 0.290, 0.458, 0.316, 0.208
ConvE (Dettmers et al., 2018) [11] | 0.395, 0.476, 0.420, 0.350 | 0.307, 0.476, 0.335, 0.222
MuRE (Balažević et al., 2019) [19] | 0.458, 0.525, 0.471, 0.421 | 0.313, 0.489, 0.340, 0.226
RotE (Chami et al., 2020) [20] | 0.463, 0.529, 0.477, 0.426 | 0.307, 0.482, 0.337, 0.220
AttE (Chami et al., 2020) [20] | 0.456, 0.526, 0.471, 0.419 | 0.311, 0.488, 0.339, 0.223
MuRP (Balažević et al., 2019) [19] | 0.465, 0.544, 0.484, 0.420 | 0.323, 0.501, 0.353, 0.235
RotH (Chami et al., 2020) [20] | 0.472, 0.553, 0.490, 0.428 | 0.314, 0.497, 0.346, 0.223
AttH (Chami et al., 2020) [20] | 0.466, 0.551, 0.484, 0.419 | 0.324, 0.501, 0.354, 0.236
TransO | 0.471, 0.558, 0.493, 0.421 | 0.324, 0.501, 0.356, 0.235
PTransO | 0.479, 0.570, 0.500, 0.430 | 0.325, 0.503, 0.357, 0.235
Table 5. MRRs on each relation of the WN18RR dataset.

Relation Name | RotatE | QuatE | TransO | PTransO
hypernym | 0.148 | 0.173 | 0.206 | 0.207
derivationally_related_form | 0.947 | 0.953 | 0.941 | 0.950
instance_hypernym | 0.318 | 0.364 | 0.397 | 0.405
also_see | 0.585 | 0.629 | 0.656 | 0.636
member_meronym | 0.232 | 0.232 | 0.258 | 0.261
synset_domain_topic_of | 0.341 | 0.468 | 0.427 | 0.420
has_part | 0.184 | 0.233 | 0.239 | 0.246
member_of_domain_usage | 0.318 | 0.441 | 0.400 | 0.373
member_of_domain_region | 0.200 | 0.193 | 0.397 | 0.368
verb_group | 0.943 | 0.924 | 0.874 | 0.888
similar_to | 1.000 | 1.000 | 1.000 | 1.000
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
