Article

Cosine-Based Embedding for Completing Lightweight Schematic Knowledge in DL-Lite_core †

1 School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2 Key Laboratory of Computer Network and Information Integration, Southeast University, Ministry of Education, Nanjing 211189, China
3 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
4 School of Computer Science, The University of Auckland, Auckland 1010, New Zealand
5 School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
6 Intel Joint Research Institute on Intelligent Edge Computing, Nanjing 211135, China
* Author to whom correspondence should be addressed.
† This paper is a substantially extended version of a paper published in the 8th CCF International Conference on Natural Language Processing and Chinese Computing, Dunhuang, China, 9–14 October 2019.
Appl. Sci. 2022, 12(20), 10690; https://doi.org/10.3390/app122010690
Submission received: 6 September 2022 / Revised: 28 September 2022 / Accepted: 19 October 2022 / Published: 21 October 2022
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)

Abstract

Schematic knowledge, an important component of knowledge graphs (KGs), defines a rich set of logical axioms over concepts and relations to support knowledge integration, reasoning, and heterogeneity elimination in KGs. Although several KGs contain large amounts of factual knowledge, their schematic knowledge (e.g., subclassOf axioms and disjointWith axioms) is far from complete. Existing KG embedding methods for completing schematic knowledge still suffer from two limitations. Firstly, embedding methods designed to encode factual knowledge pay little attention to the completion of schematic knowledge (e.g., axioms). Secondly, several methods try to preserve the logical properties of relations for completing schematic knowledge, but they cannot simultaneously preserve the transitivity (e.g., subclassOf) and symmetry (e.g., disjointWith) of axioms well. To solve these issues, we propose a cosine-based embedding method named CosE tailored for completing lightweight schematic knowledge in DL-Lite_core. Precisely, the concepts in axioms are encoded into two semantic spaces defined in CosE. One is the angle-based semantic space, which is employed to preserve the transitivity or symmetry of relations in axioms. The other is the translation-based semantic space, which is used to measure the confidence of each axiom. We design a score function for each of these two semantic spaces so as to sufficiently learn the vector representations of concepts. Moreover, we propose a novel negative sampling strategy based on the mutual exclusion between subclassOf and disjointWith, by which concepts can obtain better vector representations for schematic knowledge completion. We implement our method and verify it on four standard datasets generated from real ontologies. Experiments show that CosE obtains better results than existing models and preserves the transitivity and symmetry of relations simultaneously.

1. Introduction

In recent years, knowledge graphs (KGs) have attracted considerable attention since they can effectively organize and represent knowledge from rich resource data and, through knowledge reasoning techniques, provide users with smarter services. There are two types of knowledge in a KG. One is schematic knowledge, which is made up of assertions about concepts and relations, called axioms. The other is factual knowledge, which is composed of statements about instances, called triples [1].
Schematic knowledge, a critical component of KGs, formulates a rich set of logical axioms over concepts to support heterogeneity elimination, integration, and reasoning over KGs. Nevertheless, existing knowledge graphs (e.g., WordNet [2], DBpedia [3], and YAGO [4]) mostly consist of large amounts of factual knowledge and little schematic knowledge. The well-known knowledge graph DBpedia (https://www.dbpedia.org/ accessed on 5 September 2022) contains more than 1.89 billion triples and over 3.5 million entities, but only 768 concepts and 20 disjointWith axioms asserted among them. The sparsity of schematic knowledge limits the applications and services of KGs such as query answering [5], recommendation systems [6], and knowledge integration [7]. Hence, it is essential to improve the completeness of schematic knowledge. Nevertheless, it is hard for traditional reasoning-based methods to automatically infer all the remaining axioms. Consider the example shown in Figure 1, where one piece of schematic knowledge defines three axioms: (Farm_Boy, subclassOf, Boy), (Boy, subclassOf, Male_Person), and (Male_Person, subclassOf, Person). If the subclassOf relation from Boy to Male_Person, marked in red, is missing, then it is hard to obtain the subclassOf relation from Boy to Person, marked in blue, using traditional reasoning-based methods.
Knowledge graph embedding, which aims to encode the entities and relations of a KG into a low-dimensional and continuous vector space, has been widely studied and has proven to be of great help in KG completion via link prediction [8] and other downstream tasks; see [9,10,11,12]. The mainstream models are designed for factual knowledge embedding, including TransE [13], TransH [14], TransR [15], and so on, which regard the relation as a "translation" from the head entity to the tail entity. Another kind of model for factual knowledge embedding, such as RESCAL [16], DistMult [17], HolE [18], and ComplEx [19], designs various operators to encode rich interactions among embedding vectors. Recently, several works have tried to encode the properties of relations for completing schematic knowledge, such as EmbedS [20], TransC [21], HAKE [22], RotatE [23], EL Embeddings [24], and OWL2Vec* [25]. Most of them attempt to encode concepts and instances into a spherical semantic space so that the transitivity and other logical properties of relations can be preserved. Benefiting from this idea, more potential axioms and triples can be predicted.
Although KG embedding methods have achieved great success in KG completion, most of them suffer from two limitations. On the one hand, the mainstream models of KG embedding mainly consider the triples derived from factual knowledge, and few of them pay attention to modeling the logical properties of relations. Hence, these methods can hardly be applied to the related tasks of schematic knowledge (e.g., completion, reasoning, and repairing). For example, given one axiom (C_i, r, C_j) with two concepts C_i, C_j and a symmetric relation r, if the two concepts and the relation are projected into the semantic space defined by the translation-based KG embedding method TransE [13], the axiom's score ||C_i + r − C_j||_2 is not equal to ||C_j + r − C_i||_2. Therefore, the symmetry of relation r is lost in the semantic space defined by TransE. On the other hand, existing embedding models for schematic knowledge mainly focus on modeling properties of relations such as transitivity, (anti-)symmetry, inversion, and composition; it remains a challenge for them to simultaneously preserve the transitivity (e.g., subclassOf) and symmetry (e.g., disjointWith) of axioms well. As the schematic knowledge of a KG usually has its logical foundations in ontology languages such as the Resource Description Framework (Schema) (RDF(S)) (https://www.w3.org/TR/rdf-schema/ accessed on 5 September 2022) and the Web Ontology Language (OWL) (https://www.w3.org/OWL/ accessed on 5 September 2022), it is important to improve the completion of subclassOf axioms and disjointWith axioms: both are basic axioms asserted in schematic knowledge or ontologies, and completing them can ensure the quality of KGs and support the inference of more implicit knowledge.
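To make the asymmetry concrete, the following minimal sketch (illustrative code, not tied to any released implementation) checks numerically that the translation-based score is order-sensitive, while a cosine-based score is symmetric by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
c_i, c_j, r = (rng.normal(size=50) for _ in range(3))

def transe_score(h, rel, t):
    # TransE: ||h + r - t||_2; a lower score means a more plausible triple.
    return np.linalg.norm(h + rel - t, ord=2)

def cosine_score(a, b):
    # 1 - cos(a, b) is symmetric in its two arguments by construction.
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(transe_score(c_i, r, c_j), transe_score(c_j, r, c_i))  # differ in general
print(cosine_score(c_i, c_j), cosine_score(c_j, c_i))        # always identical
```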
To solve these problems, we propose a cosine-based embedding method, namely CosE (Cosine-based Embedding), for learning the vector representations of concepts in lightweight schematic knowledge corresponding to ontologies expressed in DL-Lite_core. DL-Lite_core is a lightweight language of Description Logics (DL) that can capture basic ontology languages while maintaining a low reasoning complexity [26]. A previous study [27] demonstrated that all the axioms asserted in DL-Lite_core can be reduced to ones with subclassOf and disjointWith relations. Therefore, our proposed model is mainly designed to learn the representations of axioms defined by these two relations. In order to better preserve the properties of relations and to measure the confidence of axioms in schematic knowledge, CosE projects concepts into an angle-based semantic space and a translation-based semantic space according to the type of relation. In the angle-based semantic space, each concept is encoded with one vector and a valid length, which are utilized to preserve the properties of these two kinds of relations. In the translation-based semantic space, the vector representations of concepts are employed to measure the confidence of related axioms. Furthermore, we design a negative sampling strategy based on the mutual exclusion between subclassOf and disjointWith for the training process of CosE, which helps learn better vector representations of concepts for completing schematic knowledge.
The main contributions of this work can be summarized as follows.
  • We propose a cosine-based embedding model for completing lightweight schematic knowledge expressed in DL-Lite_core, in which two score functions are defined over an angle-based semantic space and a translation-based semantic space so that the transitivity of subclassOf and the symmetry of disjointWith can be preserved simultaneously.
  • We design a negative sampling strategy based on the mutual exclusion between subclassOf axioms and disjointWith ones so that CosE can obtain better vector representations of concepts for schematic knowledge completion.
  • We implement and evaluate our method on four standard datasets constructed from real ontologies. Experiments on link prediction indicate that CosE can simultaneously preserve the logical properties (i.e., transitivity and symmetry) of relations and obtain better results than state-of-the-art models in most cases.
The rest of the paper is organized as follows. Section 2 reviews related work on knowledge graph embedding. Section 3 introduces the preliminaries, including DL-Lite_core and its logical properties. Our proposed model for completing schematic knowledge is described in Section 4. Section 5 presents the experiments and evaluation results, followed by discussions in Section 6. Section 7 concludes and outlines directions for future research.

2. Related Work

In this section, we briefly give an overview of the existing research efforts on KG embedding and divide them into two categories.

2.1. Factual Knowledge Embedding

Factual knowledge embedding mainly comprises two mainstream families of models: translational distance models and semantic matching models [9]. The former use distance-based scoring functions to measure the plausibility of a triple, and the latter use similarity-based functions to match the latent semantics of entities and relations in the vector space.
TransE [13] was one of the most representative translational distance models. It tried to encode both the entities and relations of triples as vectors in the same semantic space. For each triple (h, r, t), the head entity and the tail entity were denoted by vectors h and t, which should be connected by the relation vector r with low error, such that h + r ≈ t. Afterward, several methods were proposed to improve this idea. TransH [14] projected all the entities onto relation-specific hyperplanes, so that one entity could play different roles under different relations. TransR [15] and TransD [28] followed the strategy of TransH: they projected entities into relation-specific spaces using projection matrices so that more complex relations (i.e., 1-to-N, N-to-1, and N-to-N) could be encoded. To seek the most reliable relation between two entities, TransA [29] introduced an adaptive Mahalanobis distance into the score function so that it could handle complex relations more flexibly. Nevertheless, the translation-based embedding strategy only considers the local information of triples and cannot make full use of the global information in KGs.
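For illustration, a minimal sketch of the relation-specific hyperplane projection used by TransH (under its usual formulation; the variable names are ours) might look as follows:

```python
import numpy as np

def transh_score(h, t, d_r, w_r):
    # Project h and t onto the relation-specific hyperplane with normal w_r,
    # then measure the translation error with the relation vector d_r.
    w = w_r / np.linalg.norm(w_r)   # the hyperplane normal must be unit-length
    h_perp = h - (w @ h) * w
    t_perp = t - (w @ t) * w
    return np.linalg.norm(h_perp + d_r - t_perp, ord=2)

rng = np.random.default_rng(1)
h, t, d_r, w_r = (rng.normal(size=50) for _ in range(4))
print(transh_score(h, t, d_r, w_r))
```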
Another type of KG embedding model based on semantic matching adopted neural network architectures and obtained encouraging results for KG completion, including MLP [30], NAM [31], and R-GCN [32]. In addition, ProjE [33] and ConvE [34] introduced richer interaction features and further optimized the score functions of the underlying models, so both of them obtained better performance than several models without such features.
The methods for factual knowledge embedding mainly encode the triples of KGs to obtain vector representations of entities and relations for factual knowledge completion. However, few of them pay attention to modeling the logical properties (e.g., transitivity, symmetry) of relations. Hence, these methods can hardly be applied to the related tasks of schematic knowledge or ontologies (e.g., completion, reasoning, and repairing).

2.2. Schematic Knowledge Embedding

Studies of schematic knowledge embedding primarily comprise logical rule embedding, logical property embedding, and ontology embedding [35].
As RDF(S) and schematic knowledge can be transformed into logical rules, several studies have tried to encode such rules into embedding models to enhance knowledge completion [36]. Guo et al. [37] designed a joint model named KALE, which simultaneously encodes the triples of factual knowledge and their related logical rules. The authors further proposed an improved model called RUGE [38] that integrates labeled triples, soft rules, and unlabeled triples into an iterative framework for learning their vector representations in the semantic space. Similarly, Zhang et al. [39] proposed a model named IterE. Different from the above methods, which mainly learn rules, IterE learns the embeddings of entities and rules at the same time, making full use of their complementary advantages during model learning.
To further maintain the logical properties of relations, some works target schematic knowledge and lightweight ontologies expressed in RDF Schema. On2Vec [40] was a translation-based method for embedding ontology populations, in which matrices were introduced to encode the transitivity of several relations. In order to encode concepts and instances into the same semantic space, EmbedS [20] and TransC [21] encode concepts as spheres and instances as vectors so that the transitivity of the is-A relation can be preserved. To model the semantic hierarchies of KGs, Zhang et al. [22] proposed a method called HAKE, inspired by the fact that concentric circles naturally reflect hierarchy in the polar coordinate system. To further model composition among relations, RotatE [23] encodes entities and relations into a complex vector space, in which each relation is treated as a rotation from the head entity to the tail entity; it can thereby preserve the (anti-)symmetry and inversion of relations at the same time.
Recently, embedding models for ontologies have received attention. EL Embedding [24] and Quantum Embedding [41] are two representative algorithms based on the end-to-end paradigm, in which loss functions and score functions are tailored to logical axioms expressed in EL++ and ALC, respectively. These two embedding models encode the semantics of the logical constructors by transforming the relations into geometric relations, so they can complete certain kinds of axioms in ontologies very well. Chen et al. [25] proposed an ontology embedding model called OWL2Vec*, which combines word embedding with a random walk algorithm and takes in the lexical information, logical constructors, and graph structures of ontologies so as to preserve their semantics well.
Although the above models are able to encode the logical properties of relations in their designed semantic spaces, it is still a challenge for them to preserve the transitivity (e.g., subclassOf) and symmetry (e.g., disjointWith) of axioms at the same time. To the best of our knowledge, our model is the first work for completing lightweight schematic knowledge expressed in DL-Lite_core in which the transitivity and symmetry of relations in axioms are simultaneously preserved well.

3. Preliminary

This section first gives the basic syntax and definition of DL-Lite_core. Then, we introduce the definition of schematic knowledge embedding and formulate its properties for preserving the transitivity and symmetry of relations in DL-Lite_core.

3.1. DL-Lite_core

DL-Lite_core is the core language of the DL-Lite family [26]. It is a lightweight language of Description Logics that represents the domain of interest via concepts, denoting sets of instances, and binary relations between instances. The syntax of DL-Lite_core defines concepts and relations as follows:
(1) B ::= A | ∃Q,   (2) Q ::= P | P⁻,   (3) C ::= B | ¬B,   (4) R ::= Q | ¬Q,
where A and B denote an atomic concept and a basic concept, P and Q represent an atomic relation and a basic one (P⁻ denotes the inverse of P, and ∃Q the unqualified existential restriction over Q), and C and R denote a general concept and a general role, respectively.
Axioms in DL-Lite_core can take the following forms: (1) a concept inclusion axiom, denoted by B ⊑ C; (2) a concept membership axiom, denoted by A(a), where a is an individual; and (3) a relation membership axiom, denoted by P(a, b), where a and b are two individuals.
Definition 1 ([42] (Ontology)). Let L be a logical language of Description Logics. An ontology denoted by O = ⟨T, A⟩ consists of a TBox T and an ABox A, where T is a set of concept inclusion axioms, also called schematic knowledge, and A is a set of membership axioms about concepts and roles. The forms of all the axioms in O are constrained by the syntax of L.
For the lightweight schematic knowledge of a KG, logical axioms are mainly asserted with subclassOf and disjointWith relations. A previous study [27] demonstrated that all the axioms of a TBox in DL-Lite_core can be completely reduced to axioms with subclassOf and disjointWith relations using transformation rules based on a directed graph. Hence, our proposed model can encode all the axioms of a TBox expressed in DL-Lite_core.
Unless specified otherwise, we assume that all the axioms of lightweight schematic knowledge in the subsequent sections are expressed in DL-Lite_core, and we do not differentiate between the TBox of a DL-Lite_core ontology and lightweight schematic knowledge. For convenience, we divide the axioms in a TBox of DL-Lite_core into two sets, denoted by T = {(C_i, subclassOf, C_j)} ∪ {(C_i, disjointWith, C_j)}, where C_i and C_j are two general concepts. Notice that axioms are also a special kind of triple. For schematic knowledge embedding, the head and tail objects encoded by a model are concepts defined by the syntax of DL-Lite_core, rather than entities without semantics.
Next, we give the definition of schematic knowledge embedding and formulate its properties for preserving the transitivity and symmetry of relations in DL-Lite_core.

3.2. Schematic Knowledge Embedding for DL-Lite_core

Definition 2 (Schematic Knowledge Embedding). Given a set of axioms T = {(C_i, r, C_j)}, where C_i and C_j are two concepts and r is a relation between them, an embedding model is a function f(C_i, r, C_j) → ℝ that encodes all the concepts and relations in T into a semantic space such that the logical property of each relation r is preserved, w.r.t. both the asserted axioms {(C_i, r, C_j)} and the inferred ones, at the numerical level.
According to Definition 2, we formulate the properties for preserving the transitivity of subclassOf and the symmetry of disjointWith during this process.
Definition 3 (Schematic Knowledge Embedding for DL-Lite_core). Given a TBox T = {(C_i, r, C_j)} expressed in DL-Lite_core, its embedding model, denoted by f(C_i, r, C_j), should satisfy the following properties to preserve the transitivity of subclassOf and the symmetry of disjointWith:
1. If r = subclassOf and (C_1, r, C_2), (C_2, r, C_3) are two axioms asserted in T, then f(C_1, r, C_3) should be consistent with f(C_1, r, C_2) and f(C_2, r, C_3); that is, the inferred axiom (C_1, r, C_3) should also be scored as valid.
2. If r = disjointWith and (C_1, r, C_2) is an axiom asserted in T, then f(C_1, r, C_2) ≈ f(C_2, r, C_1).
Notice that the properties in Definition 3 are preconditions for a model of lightweight schematic knowledge embedding. The objective function still needs to be competent for the task of completing lightweight schematic knowledge in DL-Lite_core. Our proposed model is designed to achieve these goals.

4. Method

In this section, we first show the framework of CosE for embedding schematic knowledge, and then we describe the score functions of CosE in detail. Finally, the strategy of negative sampling is introduced for training CosE.

4.1. The Framework of CosE

For each axiom (C_i, r, C_j) with a transitive or symmetric relation r expressed in DL-Lite_core, existing KG embedding models prefer to treat the relation r as a single symbol and usually ignore its logical properties. Therefore, the transitivity and symmetry of relations cannot be preserved in their semantic spaces, which makes them hard to apply to schematic knowledge completion. To better complete lightweight schematic knowledge, we propose a cosine-based embedding model called CosE, which can simultaneously preserve the transitivity of subclassOf and the symmetry of disjointWith well.
Figure 2 shows the framework of CosE for schematic knowledge embedding with a concrete example, where the subclassOf and disjointWith relations are denoted by solid lines and dotted lines, respectively. Given a set of axioms expressed in DL-Lite_core, shown in Figure 2a, CosE divides them into two disjoint sets S ∪ D, shown in Figure 2b, where S and D contain all the subclassOf axioms and all the disjointWith axioms, respectively. Then, all the concepts in S and D are projected, according to the type of relation, into the angle-based semantic space and the translation-based semantic space shown in Figure 2c. The angle-based semantic space is employed to preserve the transitivity or symmetry of relations in axioms, and the translation-based semantic space is used to measure the confidence of each axiom. Finally, the embeddings of concepts are obtained when the training of CosE is finished.
Notice that subclassOf and disjointWith are 1-to-N and N-to-N relations; for example, one concept could be a subclass of (or disjoint with) several concepts. To measure the confidences of axioms more accurately, CosE introduces a mapping matrix M_{C_iC_j} to encode concepts as vectors in the translation-based semantic space, where C_i and C_j are the concepts in a given axiom. For the axiom (C_1, subclassOf, C_2) shown in Figure 2c, the concepts C_1 and C_2 are projected by M_{C_1C_2}. This indicates that each axiom is projected into a translation-based semantic space tailored to itself. More specifically, as shown in Figure 2c, assume that C_1^{12} and C_2^{12} are the projected vectors of C_1 and C_2 under M_{C_1C_2}, and C_2^{23} and C_3^{23} are the projected vectors of C_2 and C_3 under M_{C_2C_3}. It is easy to observe that their translation-based semantic spaces are different, which helps each axiom obtain a suitable confidence in its projected space. Given an axiom (C_i, r, C_j), we define its mapping matrix M_{C_iC_j} as follows:
M_{C_iC_j} = C_ip C_jp^T + I^{n×n},   (1)
where C_ip, C_jp ∈ ℝ^n are the projection vectors of the head concept C_i and the tail concept C_j in the axiom (C_i, r, C_j), and I^{n×n} is an identity matrix. According to the mapping matrix M_{C_iC_j}, the projected vectors of the concepts C_i and C_j in the translation-based semantic space are calculated as follows:
C_i⊥ = M_{C_iC_j} C_i,   C_j⊥ = M_{C_iC_j} C_j.   (2)
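For illustration, the mapping matrix of Formula (1) and the projections of Formula (2) could be computed as in the following minimal sketch (illustrative code; the variable names are assumptions, not taken from the released implementation):

```python
import numpy as np

def mapping_matrix(c_ip, c_jp):
    # Formula (1): M = c_ip c_jp^T + I, an axiom-specific mapping matrix.
    n = c_ip.shape[0]
    return np.outer(c_ip, c_jp) + np.eye(n)

def project(M, c):
    # Formula (2): project a concept vector into the axiom's own
    # translation-based semantic space.
    return M @ c

rng = np.random.default_rng(2)
c_i, c_j = rng.normal(size=8), rng.normal(size=8)     # concept vectors
c_ip, c_jp = rng.normal(size=8), rng.normal(size=8)   # projection vectors
M = mapping_matrix(c_ip, c_jp)
c_i_proj, c_j_proj = project(M, c_i), project(M, c_j)
```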
As shown in Figure 2c, the translation-based semantic space introduced in CosE only measures the confidence of axioms, while the logical properties of relations are encoded in the angle-based semantic space. To deal with the transitivity of the axioms (C_1, subclassOf, C_2) and (C_2, subclassOf, C_3) in S, we expect the angles among the vectors of C_1, C_2, and C_3 to be close to 0°. To describe the direction of transmission of subclassOf axioms, CosE employs the vector length of each concept as a restriction in the angle-based semantic space, in which the lengths of sub-concepts should be less than those of their parents. For the axioms (C_1, subclassOf, C_2) and (C_2, subclassOf, C_3) in S, the length of C_3 is larger than that of C_2, which in turn is larger than that of C_1. To preserve symmetry, the length restrictions are removed because the cosine function itself is symmetric. For (C_3, disjointWith, C_4), the vector representations of the concepts C_3 and C_4 are similar in the angle-based semantic space. With the help of these two types of semantic spaces, the transitivity and symmetry of relations can be simultaneously preserved in CosE.

4.2. The Score Functions of CosE

As CosE projects the concepts of all axioms into two types of semantic spaces, we define corresponding score functions to evaluate axioms in each space. Given an axiom (C_i, r, C_j), its overall score function is defined as:
f(C_i, r, C_j) = f_a(C_i, r, C_j) + f_t(C_i, r, C_j),   (3)
where f_a(C_i, r, C_j) is the score function defined in the angle-based semantic space, and f_t(C_i, r, C_j) is the one designed for the translation-based semantic space.
For the score function f_a(C_i, r, C_j) in the angle-based semantic space, we assume that relations with different properties should be measured by different score functions. For an axiom (C_i, r_s, C_j) with a subclassOf relation denoted by r_s, CosE encodes the concepts C_i and C_j as (C_i, m) and (C_j, n), where C_i and C_j are the vectors of the two concepts, and m and n are two external vectors introduced to obtain the valid lengths of C_i and C_j, preserving the direction of transmission between the projected concepts. The score function f_a(C_i, r_s, C_j) designed for the subclassOf relation is defined as follows:
f_a(C_i, r_s, C_j) = 1 − cos(C_i, C_j) + ||m||_2 − ||n||_2,   (4)
where C_i, C_j ∈ ℝ^n are the two vectors corresponding to C_i and C_j, and ||m||_2 and ||n||_2 are the valid lengths of C_i and C_j. Note that these vectors are all parameters to be learned during model training.
For an axiom (C_i, r_d, C_j) with a disjointWith relation denoted by r_d, the length constraints on the vectors are removed so as to preserve the symmetry of the disjointWith relation. The score function corresponding to (C_i, r_d, C_j) is defined as follows:
f_a(C_i, r_d, C_j) = 1 − cos(C_i, C_j),   (5)
where C_i, C_j ∈ ℝ^n are the vectors of C_i and C_j in the angle-based semantic space.
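Since the cosine is symmetric in its two arguments, the score of Formula (5) directly satisfies the symmetry property of Definition 3:

f_a(C_i, r_d, C_j) = 1 − cos(C_i, C_j) = 1 − cos(C_j, C_i) = f_a(C_j, r_d, C_i).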
Although the above score functions designed for the angle-based semantic space can preserve the properties of subclassOf and disjointWith, they still cannot measure the confidences of axioms with these two relations well, because subclassOf and disjointWith are typical multivariate relations. To address this problem, we introduce a further score function for each axiom in the translation-based semantic space:
f_t(C_i, r, C_j) = ||C_i⊥ + r − C_j⊥||_2,   (6)
where r is the vector representation of the relation r, and C_i⊥, C_j⊥ ∈ ℝ^n are the two projected vectors generated by Formulas (1) and (2) in the translation-based semantic space. During the training of CosE, we enforce the constraints ||C_i||_2 ≤ 1, ||C_j||_2 ≤ 1, ||C_i⊥||_2 ≤ 1, and ||C_j⊥||_2 ≤ 1.
Notice that a relation's own transitivity or symmetry is modeled via the score function of the angle-based semantic space, while normal relations without logical properties can be modeled via the score function of the translation-based semantic space.
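Putting Formulas (3)–(6) together, the two score functions can be sketched as follows (a minimal illustrative version; the released implementation may differ in details such as batching):

```python
import numpy as np

def cos_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def f_angle(c_i, c_j, m=None, n=None):
    # Angle-based score. With valid-length vectors m and n it implements
    # Formula (4) for subclassOf; without them, Formula (5) for disjointWith.
    score = 1.0 - cos_sim(c_i, c_j)
    if m is not None and n is not None:
        score += np.linalg.norm(m) - np.linalg.norm(n)
    return score

def f_trans(c_i_proj, c_j_proj, r):
    # Translation-based score of Formula (6) over the projected vectors.
    return np.linalg.norm(c_i_proj + r - c_j_proj, ord=2)

def f_total(c_i, c_j, c_i_proj, c_j_proj, r, m=None, n=None):
    # Overall score of Formula (3): the sum of the two space-specific scores.
    return f_angle(c_i, c_j, m, n) + f_trans(c_i_proj, c_j_proj, r)
```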

4.3. Negative Sampling Based on Schematic Knowledge for the Training Model

To train our proposed model, every axiom in the training set needs to be labeled as positive or negative. However, only positive axioms are asserted in existing DL-Lite_core ontologies; thus, we corrupt the positive axioms to generate a set of negative ones. Precisely, for each axiom (C_i, r, C_j) asserted in a DL-Lite_core ontology, we adopt a negative sampling strategy that replaces C_i or C_j with a concept C_i′ or C_j′ drawn according to a uniform probability distribution, generating a negative axiom (C_i′, r, C_j) or (C_i, r, C_j′).
For all the axioms, we utilize a margin-based ranking loss to train the vector representations of concepts in CosE, where T and T′ denote the sets of positive and negative axioms w.r.t. the type of relation, and ξ and ξ′ represent a positive axiom and a negative one selected from T and T′, respectively. For the axioms with subclassOf relations, the loss function is defined as:
L_sub = Σ_{ξ ∈ T_sub} Σ_{ξ′ ∈ T′_sub} [γ_sub + f(ξ) − f(ξ′)]_+,   (7)
where f(·) is the score function defined in Formula (3), γ_sub is a margin separating the positive axiom and the negative one, and [x]_+ = max(x, 0). Similarly, the loss function of axioms with disjointWith relations is defined as:
L_dis = Σ_{ξ ∈ T_dis} Σ_{ξ′ ∈ T′_dis} [γ_dis + f(ξ) − f(ξ′)]_+.   (8)
Finally, the complete loss function of CosE is linearly composed of the above two loss functions, defined as follows.
L = L_sub + L_dis.   (9)
The goal of training CosE is to minimize its loss function by iteratively updating the embeddings of concepts. Algorithm 1 presents the concrete training procedure. Taking a set of axioms T expressed in DL-Lite_core as input, we first divide T into two disjoint sets, S and D (Lines 1–2). Line 3 initializes all the parameters of the model, denoted by M. Lines 4–15 present the concrete realization of CosE for schematic knowledge embedding. For each axiom (C_i, subclassOf, C_j) (or (C_i, disjointWith, C_j)), we employ the corresponding score function to learn the vector representations of concepts in the two semantic spaces and calculate the loss (Lines 5–14). Line 15 computes the sum of the two losses L_sub and L_dis. The training process terminates when the loss L of the model M converges.
Algorithm 1: The algorithm of training CosE model
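A rough sketch of this procedure is given below (a skeleton assuming the margin-ranking setup of Formulas (7)–(9); score_fn, corrupt_fn, and params are placeholders for the model's score function, negative sampler, and parameters, and the gradient update is abstracted away):

```python
def train_cose(axioms, score_fn, corrupt_fn, params,
               gamma_sub=3.0, gamma_dis=3.0, max_epochs=1000, tol=1e-4):
    # Skeleton of Algorithm 1: split the TBox into S and D (Lines 1-2),
    # then minimize L = L_sub + L_dis until the loss converges.
    S = [a for a in axioms if a[1] == "subclassOf"]
    D = [a for a in axioms if a[1] == "disjointWith"]
    prev_loss = float("inf")
    for _ in range(max_epochs):                    # Lines 4-15
        loss = 0.0
        for group, gamma in ((S, gamma_sub), (D, gamma_dis)):
            for pos in group:
                neg = corrupt_fn(pos)              # negative sampling
                margin = gamma + score_fn(pos, params) - score_fn(neg, params)
                if margin > 0:                     # [x]_+ = max(x, 0)
                    loss += margin
                    # an SGD step on params w.r.t. this margin goes here
        if abs(prev_loss - loss) < tol:            # convergence of L
            break
        prev_loss = loss
    return params
```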
Furthermore, we design a novel negative sampling strategy based on the mutual exclusion between the subclassOf and disjointWith relations. Unlike uniform negative sampling, which randomly draws a replacement from all concepts, we restrict the sampling scope to a group of candidates that provide more meaningful information during training. Precisely, for each axiom (C_i, r_s, C_j) with a subclassOf relation r_s, if subclassOf relations such as (C_i′, subclassOf, C_i) or (C_j, subclassOf, C_j′) are asserted or inferred in the ontology, we exclude these replacement cases because of the transitivity of subclassOf. Conversely, if disjointWith relations such as (C_i′, disjointWith, C_i) or (C_j′, disjointWith, C_j) exist in the ontology, we give these candidates the highest priority as replacements. Similarly, for each axiom (C_i, r_d, C_j) with a disjointWith relation r_d, we exclude the replacement cases (C_i′, subclassOf, C_i) or (C_j′, subclassOf, C_j) that are asserted or inferred in the ontology. With these semantic constraints on negative sampling, we can obtain better vector representations of concepts for completing schematic knowledge.
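A sketch of this constrained corruption might look as follows (illustrative code; is_subclass and is_disjoint stand for assumed oracles over the asserted and inferred axioms of the ontology):

```python
import random

def corrupt(axiom, concepts, is_subclass, is_disjoint):
    # Corrupt one positive axiom under the mutual-exclusion constraints.
    c_i, r, c_j = axiom
    replace_head = random.random() < 0.5
    old = c_i if replace_head else c_j
    candidates, preferred = [], []
    for c in concepts:
        if c == old:
            continue
        # Exclude replacements whose corrupted axiom would still hold.
        if r == "subclassOf" and (is_subclass(c, old) if replace_head
                                  else is_subclass(old, c)):
            continue
        if r == "disjointWith" and is_subclass(c, old):
            continue
        # For subclassOf positives, disjoint candidates get top priority.
        if r == "subclassOf" and is_disjoint(c, old):
            preferred.append(c)
        candidates.append(c)
    new = random.choice(preferred or candidates)
    return (new, r, c_j) if replace_head else (c_i, r, new)
```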

5. Experiments and Results

To evaluate our method, we compare CosE with several well-known and state-of-the-art KG embedding models on link prediction, a typical task for knowledge graph completion. In addition, we extend the link prediction tasks in order to verify the models' ability to preserve the transitivity and symmetry of relations in schematic knowledge.

5.1. Datasets

Although several benchmark datasets (e.g., FB15K, WN18) were used in previous works [13,14,15], they are not suitable for evaluating models that complete schematic knowledge, because most of these datasets mainly consist of factual knowledge and few contain enough concepts and related axioms. Therefore, we collect four lightweight schematic knowledge datasets, named YAGO-On, FMA [43], FoodOn [44], and HeLiS [45], and build two variants based on YAGO-On (i.e., YAGO-On-t and YAGO-On-s), listed as follows.
  • YAGO-On: It is built from the well-known knowledge graph YAGO [4], which contains many concepts from WordNet [2].
  • FMA: It is an evolving ontology that has been maintained by the University of Washington since 1994. It conceptualizes the phenotypic structure of the human body in a machine-readable form; its biomedical schematic knowledge is open source in OAEI (http://oaei.ontologymatching.org/ accessed on 5 September 2022).
  • FoodOn: It is a comprehensive ontology resource that spans various domains related to food, which can precisely describe foods commonly known in cultures from around the world.
  • HeLiS: It is an ontology for promoting healthy lifestyles, which tries to conceptualize the domains of food and physical activity so that some unhealthy behaviors can be monitored.
  • YAGO-On-t: It is built from the axioms in YAGO-On according to the transitivity of subclassOf. If (C_i, subclassOf, C_j) and (C_j, subclassOf, C_m) exist in YAGO-On, we add the axiom (C_i, subclassOf, C_m) to YAGO-On-t.
  • YAGO-On-s: It is built from the axioms in YAGO-On according to the symmetry of disjointWith. If an axiom (C_i, disjointWith, C_j) exists in YAGO-On, we add the axiom (C_j, disjointWith, C_i) to YAGO-On-s.
As several datasets only contain axioms with subclassOf relations, we supplement them with disjointWith axioms. To achieve this, we utilize the heuristic rules proposed in [46] to generate axioms with disjointWith relations and inject them into the original datasets. Table 1 lists the statistics of the above datasets.
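For illustration, the inferred axioms used to build the two YAGO-On variants could be generated as follows (a minimal sketch over axioms stored as (head, relation, tail) tuples; per Section 5.4, the inferred axioms serve as test data):

```python
def transitive_extension(axioms):
    # One-step transitive consequences of subclassOf (for YAGO-On-t).
    sub = {(h, t) for h, r, t in axioms if r == "subclassOf"}
    step = {(a, c) for (a, b) in sub for (b2, c) in sub if b == b2}
    return {(a, "subclassOf", c) for (a, c) in step - sub}

def symmetric_extension(axioms):
    # Symmetric counterparts of disjointWith axioms (for YAGO-On-s).
    dis = {(h, t) for h, r, t in axioms if r == "disjointWith"}
    return {(t, "disjointWith", h) for (h, t) in dis if (t, h) not in dis}
```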

5.2. Implementation Details

To verify the effectiveness of CosE, we employ several KG embedding models as baselines, including TransE [13], TransH [14], TransR [15], TransD [28], RESCAL [16], HolE [18], ComplEx [19], and Analogy [47], all implemented on the OpenKE platform [48]. In addition, we compare CosE with the state-of-the-art KG embedding methods TransC [21], RotatE [23], and EL Embedding [24]. (For fairness of comparison, we do not compare CosE with OWL2Vec* [25] because it makes full use of the labels, comments, and extra resources of ontologies, and these components cannot be separated in its source code.)
We implement CosE in Python on top of the PyTorch platform. Its source code and datasets can be downloaded (https://github.com/zhengxianda/CosE accessed on 5 September 2022). We train CosE by stochastic gradient descent (SGD) with a mini-batch strategy and fine-tune the hyper-parameters on the validation datasets. The hyper-parameter ranges are as follows: the dimension d for embedding concepts is selected from {100, 125, 200, 250, 500, 1000}, the mini-batch size b from {64, 128, 200, 512, 1024, 2048}, and the margin γ of the loss functions from {1, 2, 3, 6, 9, 12, 15}. For some special models (e.g., ComplEx, RotatE), we adopt uniform initialization for the real and imaginary vectors of concepts and relations. Notice that we do not employ regularization to constrain CosE because we observe that the fixed margin γ effectively prevents CosE from over-fitting. The best configuration of hyper-parameters is determined on the validation set in terms of mean rank; the optimized hyper-parameters of CosE are d = 200, b = 200, and γ = 3. To distinguish the effect of our proposed negative sampling strategy from the traditional one, in the subsequent tables the symbol "CosE" denotes the model equipped with our negative sampling strategy by default, and "CosE−" denotes the variant that adopts the traditional negative sampling strategy.
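For reference, the reported search space and the selected configuration can be summarized as follows (an illustrative snippet, not taken from the released code):

```python
search_space = {
    "dim":    [100, 125, 200, 250, 500, 1000],  # embedding dimension d
    "batch":  [64, 128, 200, 512, 1024, 2048],  # mini-batch size b
    "margin": [1, 2, 3, 6, 9, 12, 15],          # margin gamma
}
best_config = {"dim": 200, "batch": 200, "margin": 3}  # chosen by validation mean rank
```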

5.3. The Results of Link Prediction

Link prediction is a typical task of completing an axiom when one of its concepts or its relation is missing. We employ MRR and Hits@N as evaluation metrics, as proposed for TransE [13]. For each axiom (C_i, r, C_j) in the test datasets, we replace the concept C_i or C_j with each concept C_n in the concept set C to generate corrupted axioms, and measure the confidences of these axioms with the score function. The rank of the correct concept is then obtained by sorting the confidences of the axioms in descending order. MRR calculates the mean reciprocal rank of all correct concepts, and Hits@N counts the ratio of correct concepts ranked in the top N. Notice that a corrupted axiom ranking above a test axiom may itself be valid and should not be treated as a wrong axiom; therefore, corrupted axioms already asserted in the schematic knowledge are filtered out before ranking. For convenience, we label the filtered result as "Filter" and the unfiltered one as "Raw". In our link prediction experiments, all the models are required to infer the missing concept C_i or C_j of each axiom (C_i, r, C_j) in the test datasets. For both MRR and Hits@N, a higher value indicates better performance.
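The filtered evaluation protocol can be sketched as follows (simplified illustrative code for tail prediction; score is an assumed model scoring function where a higher value means higher confidence):

```python
def evaluate(test_axioms, concepts, known_axioms, score):
    # Compute filtered MRR and Hits@10 for tail prediction.
    rr, hits10 = [], 0
    for c_i, r, c_j in test_axioms:
        # Score every candidate tail; drop corrupted axioms that are
        # themselves asserted ("Filter" setting), except the target.
        ranked = sorted(
            (c for c in concepts
             if c == c_j or (c_i, r, c) not in known_axioms),
            key=lambda c: score(c_i, r, c), reverse=True)
        rank = ranked.index(c_j) + 1
        rr.append(1.0 / rank)
        hits10 += rank <= 10
    n = len(test_axioms)
    return sum(rr) / n, hits10 / n
```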
Table 2 and Table 3 list the results of link prediction on YAGO-On, FMA, FoodOn, and HeLiS. Overall, CosE and CosE− clearly surpass the other models in terms of MRR and Hits@N. This indicates that both of them preserve the logical properties (i.e., transitivity and symmetry) of relations via the two designed semantic spaces, which helps our model learn better vector representations of concepts for completing schematic knowledge. Compared with models employing projection matrices (e.g., TransH, TransR, and TransD), CosE measures the confidences of related axioms more precisely. The likely reasons are that CosE projects axioms with different relations into different translation-based semantic spaces and that schematic knowledge contains relatively few relation types; hence, the projection strategy of CosE is more suitable than those of the other models. Furthermore, benefiting from our proposed negative sampling strategy, CosE performs slightly better than CosE− in terms of MRR and Hits@N. We believe the mutual exclusion between subclassOf and disjointWith is useful for distinguishing otherwise similar concept embeddings in the semantic space.
Table 4 and Table 5 list the link prediction results on the subclassOf axioms of the four datasets. Overall, CosE and CosE− outperform most models in terms of MRR and Hits@N, which shows that the transitivity of subclassOf is preserved well in our semantic spaces. In contrast, some schematic knowledge embedding methods do not perform well; we conjecture that most of their score functions focus on modeling subclassOf relations via spheres, and the additional disjointWith axioms may hinder the convergence of their score functions.
Table 6 and Table 7 list the results on the axioms with disjointWith relations. In terms of MRR and Hits@N, CosE and CosE− outperform all the models in most cases. For link prediction on YAGO-On and FMA, CosE is slightly worse than TransR and TransE in Raw MRR, but it obtains better results in Filter MRR. Further analysis reveals that CosE tends to give higher scores to corrupted axioms that are actually correct, so its Raw MRR value is much smaller than its Filter MRR value. Overall, Hits@1 of CosE is improved from 15% to 30% on YAGO-On and FMA, which shows that CosE precisely preserves the symmetry of relations in the angle-based semantic space. Nevertheless, the performances of all the KG embedding models on FoodOn and HeLiS are unsatisfactory. We observe that this may be related to the generation method [46] of the disjointWith axioms. More importantly, it indicates that current models still lack scalability for various large-scale ontologies to some extent; we leave these issues for future work.

5.4. The Results of Transitivity and Symmetry

As discussed above, we further verify whether the concept embeddings of CosE implicitly encode transitivity and symmetry. To this end, we design two link prediction experiments on the constructed datasets YAGO-On-t and YAGO-On-s. In YAGO-On-t, if two axioms (C_i, subclassOf, C_j) and (C_j, subclassOf, C_m) in the training set satisfy the transitivity rule, the testing set contains the inferred axiom (C_i, subclassOf, C_m). We then train the embedding models on the training set of YAGO-On-t and evaluate their performance on transitivity via link prediction on the testing set. Analogously, we evaluate the symmetry of the models using YAGO-On-s: if the training set contains (C_i, disjointWith, C_j), the axiom (C_j, disjointWith, C_i) is added to the testing set.
As listed in Table 8, CosE and CosE− achieve better results than the other models on the constructed datasets in most cases. On YAGO-On-t, CosE exceeds all the other models in terms of MRR and Hits@N. On YAGO-On-s, CosE is slightly worse than HolE and TransE in terms of MRR and Hits@1. Overall, CosE shows better potential than the other models for schematic knowledge completion in terms of transitivity and symmetry.

5.5. Case Study

The above experiments show that CosE performs well in link prediction and preserves the logical properties of transitivity and symmetry. Next, we present several concrete examples of completion results of CosE compared with TransE, listed in Table 9, where the bold words are the correct answers.
We observe that CosE improves the prediction of tail concepts compared with TransE. The first example shows the result of predicting the tail concept of a subclassOf axiom: TransE ranks the correct answer 35th, whereas CosE significantly improves it to fifth. This indicates that CosE measures the confidences of axioms more precisely.
Table 9 also shows the ability of CosE to preserve transitivity and symmetry. Compared with TransE, the second example shows that CosE improves the rank of the correct answer for a subclassOf axiom from 24th to 4th. Similarly, the correct answer of the disjointness axiom in the third example is improved to second place. These examples indicate that, for given axioms, CosE infers the missing tail or head concept more precisely than existing KG embedding models.

6. Discussion and Limitations

Compared to our preliminary work [49], we model schematic knowledge embedding in view of the lightweight ontology language DL-Lite_core. Therefore, the inferred properties of DL-Lite_core can be formulated and employed to optimize our method. Notice that the mutual exclusion relationship between subclassOf axioms and disjointWith ones is derived from the minimal incoherence-preserving sub-TBoxes [50], which motivates our negative sampling strategy. Hence, our work can have a positive impact on optimizing schematic knowledge embedding models using the inferred properties of an ontology language.
Secondly, we find that the performances of all models for schematic knowledge completion on FoodOn and HeLiS are unsatisfactory. On the one hand, existing models for completing schematic knowledge still lack scalability for various large-scale ontologies to some extent. On the other hand, the generation strategy for disjointWith axioms may not be suitable for all ontologies, which could influence the performance of the models. Hence, our experiments demonstrate that there is still much room for improvement in existing models for schematic knowledge completion.
Nevertheless, there are issues worth discussing that have not yet been addressed in our current work. We assume that all the concepts and axioms in schematic knowledge or ontologies are static and correct. However, axioms are usually updated in real scenarios, so an incremental method for schematic knowledge embedding should be explored to avoid retraining from scratch whenever axioms are updated. On the other hand, if some wrong axioms are asserted in ontologies without labels, detecting these wrong axioms while encoding the correct ones at the same time becomes challenging. Therefore, an embedding method that is robust to wrong axioms still needs to be considered.

7. Conclusions

In this paper, we presented a cosine-based embedding model called CosE for completing lightweight schematic knowledge in DL-Lite_core, by which the transitivity and symmetry of relations in axioms can be preserved simultaneously. To sufficiently learn the vector representations of concepts in axioms, we introduced two semantic spaces and designed two types of score functions tailored to axioms expressed in DL-Lite_core. Furthermore, we proposed a negative sampling strategy derived from the mutual exclusion between the subclassOf and disjointWith relations, by which CosE learns better vector representations of concepts for completing schematic knowledge. We implemented and evaluated our model on four standard datasets generated from real ontologies. The experimental results show that CosE simultaneously keeps the logical properties (i.e., transitivity and symmetry) of relations and outperforms state-of-the-art models in most cases.
For future work, we will study three further directions. (1) CosE is an embedding model for completing the axioms of lightweight schematic knowledge, but it is limited to DL-Lite_core; it is worth extending CosE to more expressive ontology languages such as DL-Lite_A [51]. (2) Deep learning networks with the transformer architecture (e.g., BERT [52], GPT-3 [53], and their variants [54]) can be utilized to further optimize our model. They can take full advantage of large external knowledge sources, and incorporating such models into our method could yield better performance for schematic knowledge completion. (3) The embeddings of concepts could be helpful for related tasks over ontologies and KGs. We will try to extend CosE so that it can be applied to such tasks and improve their performance, including ontology-based data access [55], ontology matching [56], and knowledge graph refinement [57].

Author Contributions

Funding acquisition, W.L.; methodology, W.L. and H.G.; conceptualization, W.L. and H.G.; software, X.Z.; validation, X.Z.; writing, review and editing, Q.J. and G.Q.; supervision, G.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of China (grants 62006125, U21A20488, and 61602259), the Foundation of the Jiangsu Provincial Double-Innovation Doctor Program (grant JSSCBS20210532), and NUPTSF (grant NY220171).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, W.; Qi, G.; Ji, Q. Hybrid reasoning in knowledge graphs: Combing symbolic reasoning and statistical reasoning. Semant. Web 2020, 11, 53–62. [Google Scholar] [CrossRef]
  2. Fellbaum, C. WordNet; Springer: Dordrecht, The Netherlands, 2010; pp. 231–243. [Google Scholar]
  3. Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; Van Kleef, P.; Auer, S.; et al. DBpedia—A large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 2015, 6, 167–195. [Google Scholar] [CrossRef] [Green Version]
  4. Suchanek, F.M.; Kasneci, G.; Weikum, G. YAGO: A Large Ontology from Wikipedia and WordNet. J. Web Semant. 2008, 6, 203–217. [Google Scholar] [CrossRef] [Green Version]
  5. Wang, M.; Wang, R.; Liu, J.; Chen, Y.; Zhang, L.; Qi, G. Towards Empty Answers in SPARQL: Approximating Querying with RDF Embedding. In Proceedings of the 17th International Semantic Web Conference, Monterey, CA, USA, 8–12 October 2018; pp. 513–529. [Google Scholar]
  6. Ristoski, P.; Rosati, J.; Di Noia, T.; De Leone, R.; Paulheim, H. RDF2Vec: RDF graph embeddings and their applications. Semant. Web 2019, 10, 721–752. [Google Scholar] [CrossRef] [Green Version]
  7. Sun, Z.; Zhang, Q.; Hu, W.; Wang, C.; Chen, M.; Akrami, F.; Li, C. A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs. In Proceedings of the 46th International Conference on Very Large Data Bases, Tokyo, Japan, 31 August–4 September 2020; pp. 2326–2340. [Google Scholar]
  8. Weston, J.; Bordes, A.; Yakhnenko, O.; Usunier, N. Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Washington, DC, USA, 18–21 October 2013; pp. 1366–1371. [Google Scholar]
  9. Wang, Q.; Mao, Z.; Wang, B.; Guo, L. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Trans. Knowl. Data Eng. 2017, 29, 2724–2743. [Google Scholar] [CrossRef]
  10. Dai, Y.; Wang, S.; Xiong, N.N.; Guo, W. A Survey on Knowledge Graph Embedding: Approaches, Applications and Benchmarks. Electronics 2020, 9, 750. [Google Scholar] [CrossRef]
  11. Chen, X.; Jia, S.; Xiang, Y. A review: Knowledge reasoning over knowledge graph. Expert Syst. Appl. 2020, 141, 112948. [Google Scholar] [CrossRef]
  12. Rossi, A.; Barbosa, D.; Firmani, D.; Matinata, A.; Merialdo, P. Knowledge Graph Embedding for Link Prediction: A Comparative Analysis. IEEE Trans. Knowl. Data Eng. 2021, 15, 1–49. [Google Scholar] [CrossRef]
  13. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 2787–2795. [Google Scholar]
  14. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 1112–1119. [Google Scholar]
  15. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2181–2187. [Google Scholar]
  16. Nickel, M.; Tresp, V.; Kriegel, H.P. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; pp. 809–816. [Google Scholar]
  17. Yang, B.; Yih, S.W.T.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  18. Nickel, M.; Rosasco, L.; Poggio, T. Holographic Embeddings of Knowledge Graphs. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 1955–1961. [Google Scholar]
  19. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex Embeddings for Simple Link Prediction. In Proceedings of the 33nd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2071–2080. [Google Scholar]
  20. Diaz, G.I.; Fokoue, A.; Sadoghi, M. EmbedS: Scalable, Ontology-aware Graph Embeddings. In Proceedings of the 21st International Conference on Extending Database Technology, Vienna, Austria, 26–29 March 2018; pp. 433–436. [Google Scholar]
  21. Lv, X.; Hou, L.; Li, J.; Liu, Z. Differentiating Concepts and Instances for Knowledge Graph Embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 1971–1979. [Google Scholar]
  22. Zhang, Z.; Cai, J.; Zhang, Y.; Wang, J. Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 3065–3072. [Google Scholar]
  23. Sun, Z.; Deng, Z.H.; Nie, J.Y.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the 7th International Conference on Learning Representations, Los Angeles, CA, USA, 6–9 May 2019. [Google Scholar]
24. Kulmanov, M.; Liu-Wei, W.; Yan, Y.; Hoehndorf, R. EL Embeddings: Geometric Construction of Models for the Description Logic EL++. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 6103–6109.
25. Chen, J.; Hu, P.; Jiménez-Ruiz, E.; Holter, O.M.; Antonyrajah, D.; Horrocks, I. OWL2Vec*: Embedding of OWL ontologies. Mach. Learn. 2021, 110, 1813–1845.
26. Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.; Rosati, R. DL-Lite: Tractable Description Logics for Ontologies. In Proceedings of the 20th National Conference on Artificial Intelligence, Pittsburgh, PA, USA, 9–13 July 2005; pp. 602–607.
27. Fu, X.; Qi, G.; Zhang, Y.; Zhou, Z. Graph-based approaches to debugging and revision of terminologies in DL-Lite. Knowl.-Based Syst. 2016, 100, 1–12.
28. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge Graph Embedding via Dynamic Mapping Matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, Beijing, China, 26–31 July 2015; pp. 687–696.
29. Xiao, H.; Huang, M.; Hao, Y.; Zhu, X. TransA: An Adaptive Approach for Knowledge Graph Embedding. arXiv 2015, arXiv:1509.05490.
30. Dong, X.; Gabrilovich, E.; Heitz, G.; Horn, W.; Lao, N.; Murphy, K.; Strohmann, T.; Sun, S.; Zhang, W. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 601–610.
31. Liu, Q.; Jiang, H.; Evdokimov, A.; Ling, Z.H.; Zhu, X.; Wei, S.; Hu, Y. Probabilistic Reasoning via Deep Learning: Neural Association Models. arXiv 2016, arXiv:1603.07704.
32. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Berg, R.V.D.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional Networks. In Proceedings of the 15th Extended Semantic Web Conference, Heraklion, Crete, Greece, 3–7 June 2018; pp. 593–607.
33. Shi, B.; Weninger, T. ProjE: Embedding Projection for Knowledge Graph Completion. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 1236–1242.
34. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D Knowledge Graph Embeddings. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 1811–1818.
35. Zhang, W.; Chen, J.; Li, J.; Xu, Z.; Pan, J.Z.; Chen, H. Knowledge Graph Reasoning with Logics and Embeddings: Survey and Perspective. arXiv 2022, arXiv:2202.07412.
36. Gutiérrez-Basulto, V.; Schockaert, S. From Knowledge Graph Embedding to Ontology Embedding? An Analysis of the Compatibility between Vector Space Representations and Rules. In Proceedings of the 16th International Conference on Principles of Knowledge Representation and Reasoning, Tempe, AZ, USA, 30 October–2 November 2018; pp. 379–388.
37. Guo, S.; Wang, Q.; Wang, L.; Wang, B.; Guo, L. Jointly Embedding Knowledge Graphs and Logical Rules. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–4 November 2016; pp. 192–202.
38. Guo, S.; Wang, Q.; Wang, L.; Wang, B.; Guo, L. Knowledge Graph Embedding With Iterative Guidance From Soft Rules. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 4816–4823.
39. Zhang, W.; Paudel, B.; Wang, L.; Chen, J.; Zhu, H.; Zhang, W.; Bernstein, A.; Chen, H. Iteratively Learning Embeddings and Rules for Knowledge Graph Reasoning. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 2366–2377.
40. Chen, M.; Tian, Y.; Chen, X.; Xue, Z.; Zaniolo, C. On2Vec: Embedding-based Relation Prediction for Ontology Population. In Proceedings of the 2018 SIAM International Conference on Data Mining, San Diego, CA, USA, 3–5 May 2018; pp. 315–323.
41. Garg, D.; Ikbal, S.; Srivastava, S.K.; Vishwakarma, H.; Karanam, H.; Subramaniam, L.V. Quantum Embedding of Knowledge for Reasoning. In Proceedings of the Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 5595–5605.
42. Staab, S.; Studer, R. Handbook on Ontologies, 2nd ed.; Springer: Dordrecht, The Netherlands, 2009.
43. Noy, N.F.; Musen, M.A.; Mejino, J.L., Jr.; Rosse, C. Pushing the envelope: Challenges in a frame-based representation of human anatomy. Data Knowl. Eng. 2004, 48, 335–359.
44. Dooley, D.M.; Griffiths, E.J.; Gosal, G.S.; Buttigieg, P.L.; Hoehndorf, R.; Lange, M.C.; Schriml, L.M.; Brinkman, F.S.; Hsiao, W.W. FoodOn: A harmonized food ontology to increase global food traceability, quality control and data integration. npj Sci. Food 2018, 2, 23.
45. Dragoni, M.; Bailoni, T.; Maimone, R.; Eccher, C. HeLiS: An Ontology for Supporting Healthy Lifestyles. In Proceedings of the 17th International Semantic Web Conference, Monterey, CA, USA, 8–12 October 2018; pp. 53–69.
46. Gao, H.; Qi, G.; Ji, Q. Schema induction from incomplete semantic data. Intell. Data Anal. 2018, 22, 1337–1353.
47. Liu, H.; Wu, Y.; Yang, Y. Analogical Inference for Multi-relational Embeddings. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2168–2178.
48. Han, X.; Cao, S.; Lv, X.; Lin, Y.; Liu, Z.; Sun, M.; Li, J. OpenKE: An Open Toolkit for Knowledge Embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 139–144.
49. Gao, H.; Zheng, X.; Li, W.; Qi, G.; Wang, M. Cosine-Based Embedding for Completing Schematic Knowledge. In Proceedings of the 8th CCF International Conference on Natural Language Processing and Chinese Computing, Dunhuang, China, 9–14 October 2019; pp. 249–261.
50. Schlobach, S.; Cornet, R. Non-Standard Reasoning Services for the Debugging of Description Logic Terminologies. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, 9–15 August 2003; pp. 355–362.
51. Poggi, A.; Lembo, D.; Calvanese, D.; Giacomo, G.D.; Lenzerini, M.; Rosati, R. Linking Data to Ontologies. J. Data Semant. 2008, 10, 133–173.
52. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
53. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Sastry, G.; Askell, A.; Amodei, D. Language Models are Few-Shot Learners. In Proceedings of the Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020.
54. Tay, Y.; Dehghani, M.; Bahri, D.; Metzler, D. Efficient Transformers: A Survey. ACM Comput. Surv. 2022; in press.
55. Xiao, G.; Calvanese, D.; Kontchakov, R.; Lembo, D.; Poggi, A.; Rosati, R.; Zakharyaschev, M. Ontology-Based Data Access: A Survey. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 5511–5519.
56. Otero-Cerdeira, L.; Rodríguez-Martínez, F.J.; Gómez-Rodríguez, A. Ontology matching: A literature review. Expert Syst. Appl. 2015, 42, 949–971.
57. Paulheim, H. Knowledge graph refinement: A survey of approaches and evaluation methods. Semant. Web 2017, 8, 489–508.
Figure 1. An example of a missing subclassOf axiom for schematic knowledge completion.
Figure 2. The framework of CosE for schematic knowledge embedding.
Table 1. The statistics of generated datasets for evaluation.

| | YAGO-On [4] | FMA [43] | FoodOn [44] | HeLiS [45] | YAGO-On-t | YAGO-On-s |
|---|---|---|---|---|---|---|
| ♯ Concept | 46,109 | 78,988 | 28,182 | 17,550 | 46,109 | 46,109 |
| Train: subclassOf | 29,181 | 29,181 | 20,844 | 14,222 | 11,898 | 0 |
| Train: disjointWith | 32,673 | 32,673 | 17,398 | 13,782 | 0 | 10,000 |
| Valid: subclassOf | 1000 | 2000 | 1488 | 1015 | 1000 | 1000 |
| Valid: disjointWith | 1000 | 2000 | 2714 | 1722 | 1000 | 1000 |
| Test: subclassOf | 1000 | 2000 | 2978 | 2032 | 5949 | 0 |
| Test: disjointWith | 1000 | 1000 | 2174 | 1722 | 0 | 10,000 |

♯: indicates the number of concepts.
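The split sizes above can be recomputed directly from the axiom files of each dataset. Below is a minimal sketch, assuming one tab-separated axiom (head concept, relation, tail concept) per line and files named train.txt/valid.txt/test.txt; the file names and layout are illustrative assumptions, not the datasets' documented format.

```python
# Illustrative sketch (assumed layout): recompute Table 1-style statistics.
# The file names and the "head<TAB>relation<TAB>tail" format are assumptions.
from collections import Counter
from pathlib import Path

def axiom_stats(dataset_dir: str) -> dict:
    """Count subclassOf/disjointWith axioms per split and distinct concepts."""
    stats: dict = {}
    concepts: set = set()
    for split in ("train", "valid", "test"):
        counts = Counter()
        for line in Path(dataset_dir, f"{split}.txt").read_text().splitlines():
            head, relation, tail = line.split("\t")
            counts[relation] += 1          # e.g., subclassOf or disjointWith
            concepts.update((head, tail))  # concepts occur as heads or tails
        stats[split] = dict(counts)
    stats["#Concept"] = len(concepts)
    return stats

# Example (hypothetical paths): axiom_stats("data/YAGO-On") might return
# {"train": {"subclassOf": 29181, "disjointWith": 32673}, ..., "#Concept": 46109}
```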
Table 2. The results of YAGO-On and FMA on link prediction. Each block reports MRR under the Raw and Filter settings and Hits@N (%) for N ∈ {10, 3, 1} (left block: YAGO-On; right block: FMA).

| Method | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE [13] † | 0.241 | 0.501 | 0.784 | 0.582 | 0.343 | 0.066 | 0.325 | 0.474 | 0.371 | 0.247 |
| TransH [14] † | 0.195 | 0.196 | 0.472 | 0.252 | 0.091 | 0.008 | 0.009 | 0.018 | 0.005 | 0.003 |
| TransR [15] † | 0.090 | 0.428 | 0.588 | 0.433 | 0.355 | 0.060 | 0.411 | 0.490 | 0.440 | 0.370 |
| TransD [28] † | 0.038 | 0.176 | 0.462 | 0.305 | 0.000 | 0.034 | 0.149 | 0.430 | 0.250 | 0.000 |
| RESCAL [16] † | 0.080 | 0.339 | 0.525 | 0.392 | 0.244 | 0.047 | 0.317 | 0.469 | 0.377 | 0.236 |
| HolE [18] | 0.155 | 0.231 | 0.523 | 0.254 | 0.099 | 0.039 | 0.112 | 0.311 | 0.120 | 0.033 |
| ComplEx [19] † | 0.034 | 0.237 | 0.491 | 0.403 | 0.058 | 0.033 | 0.201 | 0.484 | 0.372 | 0.011 |
| Analogy [47] † | 0.037 | 0.301 | 0.496 | 0.429 | 0.160 | 0.037 | 0.277 | 0.487 | 0.415 | 0.130 |
| TransC [21] ⋇ | 0.112 | 0.420 | 0.698 | 0.502 | 0.298 | – | – | – | – | – |
| RotatE [23] | 0.002 | 0.002 | 0.001 | 0.000 | 0.000 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 |
| EL Embedding [24] | 0.008 | 0.008 | 0.005 | 0.000 | 0.000 | 0.014 | 0.014 | 0.019 | 0.001 | 0.001 |
| CosE † | 0.229 | 0.558 | 0.859 | 0.648 | 0.495 | 0.093 | 0.386 | 0.628 | 0.391 | 0.271 |
| CosE | 0.247 | 0.657 | 0.861 | 0.714 | 0.550 | 0.117 | 0.507 | 0.640 | 0.545 | 0.423 |

† Indicates that the results are taken from our published work [49]; other results were obtained by running the authors' source code. ⋇ As our experimental results for TransC were much worse than those reported in the paper [21], we use its originally published results for evaluation.
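As a reading aid for Tables 2–8: the Raw and Filter settings differ only in whether corrupted candidates that are themselves valid axioms are skipped before ranking, and Hits@N is the fraction of test axioms whose true concept is ranked within the top N. A minimal sketch of this standard protocol follows; the `score` callable stands in for any model-specific scoring function (higher means more plausible), and all identifiers are illustrative rather than taken from our implementation.

```python
# Illustrative sketch of raw/filtered MRR and Hits@N for link prediction.
# `score` is a hypothetical stand-in for a trained model's scoring function.
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head concept, relation, tail concept)

def rank_tail(score: Callable[[Triple], float], triple: Triple,
              concepts: List[str], known: Set[Triple], filtered: bool) -> int:
    """Rank of the true tail among all candidate tails (1 = best)."""
    h, r, t = triple
    true_score = score((h, r, t))
    rank = 1
    for c in concepts:
        if c == t:
            continue
        # Filter setting: skip corruptions that are themselves valid axioms.
        if filtered and (h, r, c) in known:
            continue
        if score((h, r, c)) > true_score:
            rank += 1
    return rank

def evaluate(score: Callable[[Triple], float], test: List[Triple],
             concepts: List[str], known: Set[Triple]) -> Dict[str, float]:
    """Aggregate MRR and Hits@N over a test set, in both settings."""
    results: Dict[str, float] = {}
    for setting, filtered in (("Raw", False), ("Filter", True)):
        ranks = [rank_tail(score, tr, concepts, known, filtered) for tr in test]
        results[f"MRR-{setting}"] = sum(1.0 / r for r in ranks) / len(ranks)
        for n in (10, 3, 1):
            results[f"Hits@{n}-{setting}"] = sum(r <= n for r in ranks) / len(ranks)
    return results
```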
Table 3. The results of FoodOn and HeLiS on link prediction (left block: FoodOn; right block: HeLiS).

| Method | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE [13] | 0.011 | 0.012 | 0.020 | 0.011 | 0.006 | 0.037 | 0.037 | 0.078 | 0.028 | 0.010 |
| TransH [14] | 0.010 | 0.012 | 0.020 | 0.013 | 0.006 | 0.026 | 0.026 | 0.050 | 0.020 | 0.006 |
| TransR [15] | 0.008 | 0.008 | 0.013 | 0.008 | 0.004 | 0.025 | 0.026 | 0.056 | 0.016 | 0.004 |
| TransD [28] | 0.003 | 0.003 | 0.007 | 0.004 | 0.000 | 0.008 | 0.008 | 0.018 | 0.005 | 0.000 |
| RESCAL [16] | 0.001 | 0.001 | 0.004 | 0.000 | 0.000 | 0.003 | 0.003 | 0.004 | 0.003 | 0.001 |
| HolE [18] | 0.002 | 0.002 | 0.007 | 0.001 | 0.000 | 0.035 | 0.035 | 0.078 | 0.024 | 0.008 |
| ComplEx [19] | 0.001 | 0.001 | 0.003 | 0.000 | 0.000 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 |
| RotatE [23] | 0.009 | 0.009 | 0.017 | 0.008 | 0.004 | 0.023 | 0.023 | 0.033 | 0.026 | 0.012 |
| EL Embedding [24] | 0.001 | 0.001 | 0.002 | 0.001 | 0.000 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 |
| CosE † | 0.032 | 0.037 | 0.080 | 0.058 | 0.009 | 0.077 | 0.077 | 0.144 | 0.079 | 0.034 |
| CosE | 0.034 | 0.038 | 0.083 | 0.057 | 0.011 | 0.080 | 0.080 | 0.152 | 0.081 | 0.035 |
Table 4. The results of YAGO-On and FMA on link prediction about subclassOf axioms (left block: YAGO-On; right block: FMA).

| Method | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE [13] † | 0.375 | 0.375 | 0.722 | 0.472 | 0.179 | 0.113 | 0.113 | 0.260 | 0.110 | 0.035 |
| TransH [14] † | 0.377 | 0.377 | 0.494 | 0.179 | 0.179 | 0.110 | 0.110 | 0.295 | 0.080 | 0.040 |
| TransR [15] † | 0.063 | 0.063 | 0.216 | 0.020 | 0.000 | 0.010 | 0.010 | 0.050 | 0.050 | 0.050 |
| TransD [28] † | 0.011 | 0.011 | 0.018 | 0.008 | 0.000 | 0.050 | 0.050 | 0.050 | 0.000 | 0.000 |
| RESCAL [16] † | 0.069 | 0.069 | 0.143 | 0.073 | 0.035 | 0.009 | 0.009 | 0.010 | 0.005 | 0.005 |
| HolE [18] | 0.225 | 0.225 | 0.434 | 0.229 | 0.126 | 0.002 | 0.002 | 0.000 | 0.000 | 0.000 |
| ComplEx [19] † | 0.001 | 0.003 | 0.002 | 0.001 | 0.001 | 0.003 | 0.003 | 0.010 | 0.000 | 0.000 |
| Analogy [47] † | 0.003 | 0.003 | 0.035 | 0.003 | 0.003 | 0.050 | 0.050 | 0.050 | 0.050 | 0.050 |
| RotatE [23] | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.002 | 0.002 | 0.002 | 0.000 | 0.000 |
| EL Embedding [24] | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 |
| CosE † | 0.393 | 0.393 | 0.724 | 0.471 | 0.226 | 0.128 | 0.128 | 0.295 | 0.165 | 0.030 |
| CosE | 0.397 | 0.397 | 0.726 | 0.458 | 0.240 | 0.145 | 0.145 | 0.290 | 0.140 | 0.065 |

† Indicates that the results are taken from our previous work [49]; other results were generated by running the authors' source code.
Table 5. The results of FoodOn and HeLiS on link prediction about subclassOf axioms (left block: FoodOn; right block: HeLiS).

| Method | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE [13] | 0.015 | 0.015 | 0.024 | 0.014 | 0.009 | 0.072 | 0.072 | 0.154 | 0.052 | 0.019 |
| TransH [14] | 0.011 | 0.014 | 0.025 | 0.015 | 0.006 | 0.050 | 0.050 | 0.097 | 0.037 | 0.012 |
| TransR [15] | 0.011 | 0.011 | 0.017 | 0.010 | 0.006 | 0.050 | 0.050 | 0.111 | 0.032 | 0.008 |
| TransD [28] | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.016 | 0.016 | 0.035 | 0.011 | 0.000 |
| RESCAL [16] | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.005 | 0.005 | 0.007 | 0.005 | 0.003 |
| HolE [18] | 0.004 | 0.004 | 0.013 | 0.002 | 0.000 | 0.070 | 0.070 | 0.155 | 0.048 | 0.016 |
| ComplEx [19] | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 |
| RotatE [23] | 0.017 | 0.017 | 0.033 | 0.016 | 0.017 | 0.046 | 0.046 | 0.065 | 0.051 | 0.024 |
| EL Embedding [24] | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 |
| CosE † | 0.038 | 0.038 | 0.074 | 0.041 | 0.017 | 0.152 | 0.152 | 0.286 | 0.155 | 0.068 |
| CosE | 0.040 | 0.040 | 0.079 | 0.036 | 0.021 | 0.158 | 0.158 | 0.300 | 0.159 | 0.071 |
Table 6. The results of YAGO-On and FMA on link prediction about disjointWith axioms (left block: YAGO-On; right block: FMA).

| Method | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE [13] † | 0.120 | 0.627 | 0.846 | 0.693 | 0.507 | 0.122 | 0.639 | 0.927 | 0.741 | 0.491 |
| TransH [14] † | 0.010 | 0.014 | 0.220 | 0.010 | 0.003 | 0.005 | 0.006 | 0.002 | 0.001 | 0.001 |
| TransR [15] † | 0.132 | 0.792 | 0.974 | 0.848 | 0.710 | 0.010 | 0.010 | 0.050 | 0.050 | 0.050 |
| TransD [28] † | 0.066 | 0.774 | 0.906 | 0.621 | 0.000 | 0.066 | 0.292 | 0.873 | 0.488 | 0.000 |
| RESCAL [16] † | 0.100 | 0.640 | 0.920 | 0.720 | 0.500 | 0.094 | 0.640 | 0.940 | 0.750 | 0.480 |
| HolE [18] | 0.084 | 0.237 | 0.611 | 0.278 | 0.072 | 0.078 | 0.224 | 0.622 | 0.240 | 0.065 |
| ComplEx [19] † | 0.066 | 0.470 | 0.970 | 0.820 | 0.110 | 0.003 | 0.003 | 0.010 | 0.000 | 0.000 |
| Analogy [47] † | 0.074 | 0.598 | 0.988 | 0.854 | 0.317 | 0.069 | 0.557 | 0.979 | 0.823 | 0.264 |
| RotatE [23] | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 |
| EL Embedding [24] | 0.016 | 0.016 | 0.010 | 0.001 | 0.000 | 0.028 | 0.028 | 0.037 | 0.002 | 0.001 |
| CosE † | 0.066 | 0.723 | 0.994 | 0.824 | 0.764 | 0.058 | 0.644 | 0.962 | 0.617 | 0.512 |
| CosE | 0.097 | 0.917 | 0.996 | 0.970 | 0.860 | 0.090 | 0.870 | 0.990 | 0.950 | 0.780 |

† Indicates that the results are taken from our previous work [49]; other results were obtained by running the authors' source code.
Table 7. The results of FoodOn and HeLiS on link prediction about disjointWith axioms (left block: FoodOn; right block: HeLiS).

| Method | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE [13] | 0.007 | 0.007 | 0.015 | 0.007 | 0.004 | 0.001 | 0.002 | 0.003 | 0.002 | 0.000 |
| TransH [14] | 0.010 | 0.010 | 0.015 | 0.011 | 0.006 | 0.002 | 0.002 | 0.003 | 0.002 | 0.000 |
| TransR [15] | 0.005 | 0.005 | 0.010 | 0.005 | 0.002 | 0.001 | 0.001 | 0.002 | 0.001 | 0.000 |
| TransD [28] | 0.005 | 0.006 | 0.014 | 0.009 | 0.001 | 0.001 | 0.001 | 0.002 | 0.000 | 0.000 |
| RESCAL [16] | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 |
| HolE [18] | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 | 0.001 | 0.001 | 0.002 | 0.001 | 0.000 |
| ComplEx [19] | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 |
| RotatE [23] | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 |
| EL Embedding [24] | 0.001 | 0.001 | 0.003 | 0.001 | 0.000 | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 |
| CosE † | 0.023 | 0.035 | 0.085 | 0.075 | 0.000 | 0.001 | 0.002 | 0.003 | 0.003 | 0.000 |
| CosE | 0.027 | 0.036 | 0.087 | 0.078 | 0.000 | 0.001 | 0.002 | 0.003 | 0.003 | 0.000 |
Table 8. The evaluated results on link prediction of transitivity and symmetry (left block: YAGO-On-t; right block: YAGO-On-s).

| Method | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 | MRR Raw | MRR Filter | Hits@10 | Hits@3 | Hits@1 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE [13] † | 0.064 | 0.077 | 0.142 | 0.070 | 0.001 | 0.043 | 0.369 | 0.971 | 0.514 | 0.080 |
| TransH [14] † | 0.200 | 0.238 | 0.309 | 0.214 | 0.149 | 0.001 | 0.002 | 0.001 | 0.000 | 0.000 |
| TransR [15] † | 0.012 | 0.013 | 0.003 | 0.002 | 0.001 | 0.010 | 0.010 | 0.001 | 0.000 | 0.000 |
| TransD [28] † | 0.008 | 0.009 | 0.020 | 0.001 | 0.000 | 0.001 | 0.181 | 0.512 | 0.302 | 0.000 |
| RESCAL [16] † | 0.016 | 0.020 | 0.055 | 0.015 | 0.004 | 0.032 | 0.166 | 0.449 | 0.226 | 0.039 |
| HolE [18] | 0.040 | 0.045 | 0.082 | 0.008 | 0.002 | 0.070 | 0.342 | 0.716 | 0.425 | 0.128 |
| ComplEx [19] † | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.036 | 0.253 | 0.743 | 0.439 | 0.000 |
| Analogy [47] † | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 | 0.043 | 0.315 | 0.932 | 0.538 | 0.000 |
| RotatE [23] | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.002 | 0.002 | 0.001 | 0.000 | 0.000 |
| EL Embedding [24] | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.003 | 0.003 | 0.075 | 0.017 | 0.003 |
| CosE † | 0.203 | 0.260 | 0.403 | 0.246 | 0.177 | 0.038 | 0.322 | 0.983 | 0.554 | 0.000 |
| CosE | 0.207 | 0.284 | 0.408 | 0.261 | 0.218 | 0.038 | 0.323 | 0.992 | 0.557 | 0.000 |

† Indicates that the results are taken from our previous work [49]; other results were obtained by running the authors' source code.
Table 9. The examples of predicted results on CosE compared with TransE (top-5 candidate concepts for subclassOf queries, top-3 for the disjointWith query).

| Head Concept | Relation | CosE | TransE [13] |
|---|---|---|---|
| Taksim_SK_footballers | subclassOf | person, player, site, club, football_player | person, airport, model, peninsula, singer |
| Soccer_clubs_in_the_Greater_Los_Angeles_Area | subclassOf | site, person, player, club, football_player | person, airport, model, singer, writer |

| Tail Concept | Relation | CosE | TransE |
|---|---|---|---|
| Irish_male_models | disjointWith | Filipino_female_models, African_American_models, LGBT_models | LGBT_models, South_African_female_models, American_male_models |
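The candidate lists in Table 9 are simply each model's top-ranked concepts for a query. The sketch below shows one generic way to generate such examples from any trained embedding model; the `score` function is a hypothetical placeholder, not the actual CosE or TransE scorer.

```python
# Illustrative sketch: produce Table 9-style top-k predictions from any model.
# `score(head, relation, tail)` is a hypothetical placeholder scorer in which
# a higher value means a more plausible axiom.
import heapq
from typing import Callable, List

def top_k_tails(score: Callable[[str, str, str], float], head: str,
                relation: str, candidates: List[str], k: int = 5) -> List[str]:
    """Return the k candidate tail concepts with the highest scores."""
    scored = ((score(head, relation, c), c) for c in candidates)
    return [concept for _, concept in heapq.nlargest(k, scored)]

# Example query mirroring Table 9 (assuming a trained `my_score` function):
# top_k_tails(my_score, "Taksim_SK_footballers", "subclassOf", all_concepts)
```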
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Li, W.; Zheng, X.; Gao, H.; Ji, Q.; Qi, G. Cosine-Based Embedding for Completing Lightweight Schematic Knowledge in DL-Litecore. Appl. Sci. 2022, 12, 10690. https://doi.org/10.3390/app122010690

AMA Style

Li W, Zheng X, Gao H, Ji Q, Qi G. Cosine-Based Embedding for Completing Lightweight Schematic Knowledge in DL-Litecore. Applied Sciences. 2022; 12(20):10690. https://doi.org/10.3390/app122010690

Chicago/Turabian Style

Li, Weizhuo, Xianda Zheng, Huan Gao, Qiu Ji, and Guilin Qi. 2022. "Cosine-Based Embedding for Completing Lightweight Schematic Knowledge in DL-Litecore" Applied Sciences 12, no. 20: 10690. https://doi.org/10.3390/app122010690

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop