Ontology Completion with Graph-Based Machine Learning: A Comprehensive Evaluation

Increasing quantities of semantic resources offer a wealth of human knowledge, but their growth also increases the probability of wrong knowledge base entries. The development of approaches that identify potentially spurious parts of a given knowledge base is therefore highly relevant. We propose an approach for ontology completion that transforms an ontology into a graph and recommends missing edges using structure-only link analysis methods. By systematically evaluating thirteen methods (some for knowledge graphs) on eight different semantic resources, including Gene Ontology, Food Ontology, Marine Ontology


Introduction
Researchers and companies often address tasks whose solution requires domain knowledge. Domain knowledge often contains complex relations between entities and human-defined terms that can be modeled using ontologies [1,2]. Ontologies range from small ones that describe people, their activities, and their relations to other people, such as the FOAF ontology [3], to large ones such as the Gene Ontology, which describes, e.g., protein functions and cellular processes [4].
Ontologies have long been used to represent and reason over domain knowledge, but they have recently shown further potential in conjunction with machine learning methods. Much like graphs, they have been used for relation prediction tasks [5,6], and they have also been used to enrich features with background knowledge in other machine learning tasks [7]. Commonly applied methods range from more traditional semantic similarity approaches [5,8] to recently successful entity-embedding algorithms, whether graph-based [9,10,11], syntactic [12,13], hybrid [14], or rule-based [15]. Alternatively, ontologies have been used to constrain the output of machine learning and optimization models to conform to certain rules [16].
While small ontologies can be annotated easily due to the limited number of possible relations, even highly skilled domain experts can make mistakes on larger ontologies, missing some links or adding nonexistent ones. These mistakes can affect our understanding of the domain and lead to models and solutions that perform poorly. This issue is addressed by ontology completion, the task of finding missing relations that are plausible but not logically deducible from the given ontology [17]. In this work, we apply machine learning methods that perform well on the link prediction task to ontology completion, relying exclusively on the ontology structure. We do this by representing ontologies as graphs and applying link prediction methods, providing experts with a scalable ontology completion tool that can help improve the error-prone human annotation process. The contributions of this work include:

• A methodology for ontology completion using link analysis methods;
• A methodology for recommending missing edges using scores obtained through link prediction;
• A demonstration of the utility of the considered link analysis methods on multiple ontologies with different properties;
• Simple-to-use software for ontology completion and for evaluating the proposed methodology on a given ontology (available online: https://github.com/smeznar/ontology-completion-with-graph-learners, accessed on 1 November 2022).
The related work is presented in Section 2. The proposed methodology is outlined in Section 3; it consists of an ontology-to-graph transformation (Section 3.1), ontology completion using link prediction (Section 3.2), recommendation of missing edges (Section 3.3), and an approach for explaining recommendations (Section 3.4). The experimental setting is described in Section 4, and the results of the evaluation are presented in Section 5. The paper concludes with a discussion in Section 6.

Related Work
In this section, we first introduce ontologies and their use in machine learning, followed by the relevant related work from the field of machine learning on graphs and link prediction.

Ontologies
Ontologies refer to machine-readable representations of knowledge in a given application domain, usually defined in a declarative knowledge modeling language, such as OWL (Web Ontology Language) [18], which is based on description logic (DL). Ontologies operate with individuals, classes (sets of individuals), and properties (relations between individuals), for which they define semantics through a set of logical statements (axioms).
These statements fall into two categories. The terminological box (T-box, also called the vocabulary or schema) contains statements defining classes, their characteristics, and their hierarchy. In contrast, the assertional box (A-box) consists of assertions about individuals (concrete facts), which use the vocabulary of the T-box. Given a complete ontology, reasoners can infer additional implicit facts from the explicitly defined ones based on the rules defined in the T-box. For example, the A-box fact "Mary is a mother" implies "Mary is a parent", since the T-box defines that "mother is a subclass of parent". T-box statements in OWL include class subsumption axioms (e.g., "mother SubclassOf: woman"), property restrictions (e.g., "parent EquivalentTo: HasChild Some person", meaning that everyone who has at least one child is a parent), and set operators (e.g., "parent EquivalentTo: mother Or father"). A-box statements include membership axioms (e.g., "Mary Types: mother") and property assertions (e.g., "John Facts: HasWife Mary").
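The mother/parent inference above can be made concrete with a toy forward-chaining sketch in Python. It handles only named SubclassOf axioms and class-membership assertions (no property restrictions or set operators), so it is a simplified illustration rather than an OWL reasoner; the dictionaries `TBOX` and `ABOX` and the function `infer_types` are hypothetical.

```python
from collections import deque

# Hypothetical toy T-box: each class maps to its direct superclasses (SubclassOf axioms).
TBOX = {
    "mother": {"parent", "woman"},
    "father": {"parent", "man"},
    "parent": {"person"},
    "woman": {"person"},
    "man": {"person"},
}

# Hypothetical toy A-box: explicit class-membership assertions (Types: ...).
ABOX = {"Mary": {"mother"}, "John": {"man"}}


def infer_types(individual, abox=ABOX, tbox=TBOX):
    """Return every class the individual belongs to by following SubclassOf chains."""
    inferred = set(abox.get(individual, set()))
    queue = deque(inferred)
    while queue:
        cls = queue.popleft()
        for superclass in tbox.get(cls, set()):
            if superclass not in inferred:
                inferred.add(superclass)
                queue.append(superclass)
    return inferred


print(sorted(infer_types("Mary")))  # ['mother', 'parent', 'person', 'woman']
```

The fact "Mary is a parent" is never stated in the A-box; it is derived from the T-box, which is exactly the kind of implicit knowledge a reasoner adds.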
Ontologies are used in various fields and vary significantly in their purpose, content, and implementation. At one end of the spectrum, there are small, ungrounded ontologies that lack an A-box and serve as a semantic schema of high-level terms (classes) in a particular domain. They are often used in the scope of the Semantic Web and are intended to be referenced by various Web sources and thereby populated with "external" facts. Examples include FOAF and the Marine Ontology. At the other end, there are larger, more grounded ontologies that attempt to comprehensively capture knowledge in a domain as a complex hierarchy with many concrete facts. The Gene Ontology is an illustrative example.
Notably, grounded ontologies, or at least their graph representations, could be considered knowledge graphs (KGs) by some definitions that define the latter as a schema (T-box) accompanied by a large number of (A-box) facts [19]. However, KGs do not have a well-established definition yet, with other work giving alternate proposals, such as a collection of facts without the schema [20], or a system encompassing an ontology and a reasoning engine [21].
In practice, different grounded ontologies also take different approaches to capturing A-box (ground-level) facts in OWL. Some, such as HeLiS [22], use individuals and OWL's object property assertion axioms to define that two "ground-level entities" are related by a property. Others, such as the Gene Ontology and Food Ontology [23], do not use individuals at all, representing even very "individual-like" entities as classes and thus blurring the line between T-box and A-box. Because of this, it can be difficult to create a general approach for embedding ontologies.

Ontology Embeddings
Much like knowledge graphs, ontologies can be seen as stand-alone resources for encoding domain knowledge. Recently, however, they have also shown potential in combination with machine learning methods, either by providing additional background information or by constraining the learning process. To exploit ontology information with approaches such as linear regression or neural networks, one usually has to embed the ontology into a vector space first. In our work, we propose embedding ontologies with graph-based methods to identify potentially novel relations for ontology completion.
Several works present ontology-specific embedding algorithms. Onto2Vec [12] constructs sentences from OWL axioms and trains a language model; On2Vec [11] is based on translational graph embeddings; and OWL2Vec* [14] combines the language model approach with random walks on the ontology graph. These methods have typically been evaluated against knowledge graph embeddings on a small number of large ontologies for the relation prediction task (predicting the predicate between two entities).
Other examples are domain-specific applications, most often in the biomedical domain, where ontologies are mined for tasks such as protein-protein interaction (PPI) prediction or gene-disease prediction. Here, several ontologies are usually combined into a single data set and used with semantic similarity approaches [5,24], often heavily tailored to the task. Recently, similar works have adopted ontology-specific embeddings [6,25] that are more general.
None of the above can be considered a systematic study. To our knowledge, the most valuable resource on this topic, and the one closest to our work, is a survey on the state of machine learning with ontologies [26] that covers both traditional semantic similarity methods and recent embedding-based methods. It examined simple graph embeddings, knowledge graph embeddings, and ontology-specific embeddings, categorizing them into graph-based, syntactic, and semantic approaches. The survey included an experimental evaluation of a subset of methods on a protein-protein interaction task, but it considered only two subsets of GO as data sets, since its focus was on a theoretical categorization of the field and on the biomedical domain in particular. Whereas the survey [26] focused on an overview of the field, our study focuses on evaluating a large number of graph-based methods, comparing graph embeddings to KG-specific embeddings. However, we limit ourselves to structure-only methods due to the nature of the ontology-to-graph conversion. We also test our methodology on ontologies that differ substantially both in the domain they cover and in their size, ranging from small schemas to large knowledge bases of ground-level facts.
A more extensive overview of related work is summarized in Table 1.

Graph-Based Machine Learning
Graph-based machine learning has seen a rise in popularity in recent years due to its potential to work with complex data structures such as relational databases and structures commonly found in biology and chemistry [28]. This branch of machine learning mainly focuses on node and graph classification [29], node clustering, and link prediction tasks [30].
Machine learning tasks on graphs are usually solved in one of three ways. Traditionally, they are solved using label propagation [31], PageRank [32], and proximity-based measures such as Adamic/Adar [33] or the Jaccard coefficient [34]. Another group of approaches embeds graphs into a vector space, and the embeddings are then used with traditional machine learning methods, such as logistic regression, to generate predictions. These approaches include well-established methods such as node2vec [9] and DeepWalk [35], as well as newer ones such as SNoRe [36]. More recently, with advances in deep learning, neural network models such as graph convolutional networks (GCN) [37] and graph attention networks (GAT) [38] have emerged as end-to-end learners.

Link Prediction
Link prediction is one of the most widely addressed tasks on graph-based data. Predicting whether an edge exists between two nodes without any additional information is hard, but given some information about the graph and its nodes, together with some assumptions, various approaches can predict edge existence well [30,39]. The most common assumption in link prediction is that two nodes are connected if they are similar, for example because they share similar node features or have similar neighborhoods. Another common assumption is that nodes are likely to connect to nodes with many neighbors. Both assumptions are reasonable and often hold in real-life networks as well as in ontologies.
Link prediction is traditionally solved using proximity-based methods that model networks using the assumptions mentioned above. These methods commonly predict the existence of links based on the first and second neighbors of the nodes, e.g., the number of common neighbors; examples include the Adamic/Adar index [33], preferential attachment, and others. Later, embedding methods such as node2vec [9] and SNoRe [36] were developed. These methods use random walks to explore and approximate a node's neighborhood. The embeddings they produce are either low-dimensional or sparse and usually perform well even on networks where the structural assumptions do not necessarily hold. Similar to embedding methods, graph neural networks such as GCN [37] and GAT [38] have recently been used for various machine learning tasks on networks. These approaches jointly exploit the adjacency matrix of a network and node features. For knowledge graphs, i.e., graphs whose nodes and edges usually carry additional information, specialized approaches such as metapath2vec [40] are used. Other knowledge graph approaches, such as TransE [10] and RotatE [41], embed nodes and relations so that combining the embeddings of a triplet (subject, predicate, object) yields a vector with a small norm if the triplet is in the graph and a larger norm if it is not.
In our work, we mainly focus on link prediction using embedding and proximity-based methods, as they do not require a specific representation or additional knowledge about the graph. Alternatively, by exploiting the semantic information in a knowledge graph, one can find missing links using rule-based approaches [15].

Methodology
The following section presents the ontology-to-graph transformation, link prediction, recommendation of missing links, and explanation of the recommendations.

Ontology-to-Graph Transformation
As outlined above, machine learning has approached the use of ontologies in various ways. One common approach is to represent ontologies as graphs in which nodes represent classes or individuals and links encode semantic relationships defined by the ontology. This enables the use of many powerful graph-based machine learning methods that are being developed for other problem domains and are rapidly evolving.
Since an ontology can be understood as a set of logical expressions and is usually modeled as such, there are multiple possible conversions of a given ontology into a graph. Certain expressions, such as property assertions, map directly to nodes and edges. Others, such as property restrictions, domain-range axioms, and set operators, do not have an obvious representation. Since we aim to learn about ontologies using graph-based methods, the conversion needs to ensure that the semantics expressed with OWL axioms are sufficiently reflected in the topology of the resulting graph.
A number of different approaches for converting OWL ontologies to graphs have been developed; however, there is not yet an established standard or agreement on the most appropriate representation. In our work, we transform an ontology into a (knowledge) graph using projection rules [14,42], as they have previously been used in a similar machine learning context, where they outperformed methods such as OWL2RDF [14]. However, the conversion algorithm could be substituted by another without significant changes to the rest of our methodology. Projection rules transform class subsumptions and property assertions between individuals directly into predefined triplets without loss of information, while more complex logical expressions, such as property restrictions, are approximated with simple triplets that do not preserve the exact logical relationships. The result is a graph that presumably captures all relationships, at least approximately, but does not contain noisy syntactic structures. This approach produces a directed heterogeneous multigraph, i.e., a set T of triplets (edges) of the form $\langle s, p, o \rangle \in T$, where s and o are nodes (classes or individuals) and p is a label representing the relation between them. We use this graph directly as the input to the baselines that operate on knowledge graphs. For the other methods, we further convert it into an undirected homogeneous graph G(N, E) such that $\{o, s\} \in E \Leftrightarrow \exists p : \langle o, p, s \rangle \in T$, meaning that two nodes are connected by at most a single undirected, unlabeled edge. In both cases, we discard any additional textual information, such as labels and descriptions of entities. We use undirected graphs instead of directed ones because benchmarking link prediction on directed graphs can be problematic: it may artificially inflate scores, since the two directions of a connection between the same pair of nodes (e.g., u → v and v → u) can end up in the training and the test set, respectively.
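The projection from labeled triplets to a simple undirected graph can be sketched in a few lines of Python. This is a minimal illustration, assuming the triplets have already been produced (e.g., by projection rules) as `(s, p, o)` tuples; the function name `to_undirected_graph` and the example triplets are hypothetical, and dropping self-loops here is an assumption of the sketch.

```python
def to_undirected_graph(triplets):
    """Project directed, labeled triplets (s, p, o) onto a simple undirected graph.

    Two nodes become adjacent if at least one triplet links them, in either
    direction and with any relation label p.
    """
    nodes = set()
    edges = set()
    for s, _p, o in triplets:
        nodes.update((s, o))
        if s != o:  # assumption: self-loops are dropped
            edges.add(frozenset((s, o)))  # unordered pair, so duplicates collapse
    return nodes, edges


# Hypothetical triplets, e.g., as produced by projection rules.
T = [
    ("mother", "subClassOf", "parent"),
    ("mother", "subClassOf", "woman"),
    ("parent", "hasChild", "person"),
    ("person", "hasParent", "parent"),  # same pair, other direction and label
]
nodes, edges = to_undirected_graph(T)
print(len(nodes), len(edges))  # 4 3
```

Using `frozenset` pairs makes the edge unordered, so the two triplets between `parent` and `person` collapse into a single anonymous edge, as required by the definition of E.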

Ontology Completion Using Link Prediction
Link prediction and ontology completion are closely related: the former predicts which edges are in a graph, and the latter which relations are missing from an ontology. It is therefore crucial to determine how well a model performs on the link prediction task. High accuracy on this task suggests that the model can reconstruct the graph well and is thus likely to identify which edges (connections) are missing.
In our work, we use the following methodology to test how well our methods perform on the link prediction task. First, we transform an ontology into a graph as described in Section 3.1. Then, we create positive (existent) and negative (nonexistent) examples, shuffle them, and split them into five folds. For each fold, we train the baseline models using the adjacency matrix built from the edges of the other four folds, together with the corresponding positive and negative edges if a model requires them as input. We then use these models to predict the existence of the positive and negative edges in the held-out fold. Proximity-based methods and graph neural networks predict scores directly, while the other methods use logistic regression. We evaluate the performance using the ROC-AUC. An overview of the link prediction process is shown in Figure 1. For the knowledge graph baselines, which predict the existence of a specific relation, we score all relations and take the highest score.
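A minimal sketch of this per-fold evaluation in Python follows. It assumes the folds are already given as lists of node pairs and uses preferential attachment as a stand-in scorer, because it needs no training; the function names are hypothetical. The pairwise ROC-AUC loop is quadratic and is meant only for illustration; for large folds a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.

```python
from collections import defaultdict


def roc_auc(pos_scores, neg_scores):
    """ROC-AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def preferential_attachment(adj, u, v):
    """Example proximity scorer: product of node degrees in the training graph."""
    return len(adj.get(u, ())) * len(adj.get(v, ()))


def evaluate_fold(pos_folds, neg_folds, k, score_fn=preferential_attachment):
    """Build the training graph from all folds except k and score the pairs in fold k."""
    adj = defaultdict(set)
    for i, fold in enumerate(pos_folds):
        if i == k:
            continue  # held-out positive edges must not appear in the training graph
        for u, v in fold:
            adj[u].add(v)
            adj[v].add(u)
    pos = [score_fn(adj, u, v) for u, v in pos_folds[k]]
    neg = [score_fn(adj, u, v) for u, v in neg_folds[k]]
    return roc_auc(pos, neg)


def cross_validated_auc(pos_folds, neg_folds, score_fn=preferential_attachment):
    """Mean ROC-AUC over all folds."""
    scores = [evaluate_fold(pos_folds, neg_folds, k, score_fn) for k in range(len(pos_folds))]
    return sum(scores) / len(scores)
```

Trainable baselines would replace `score_fn` with a model fitted on the training adjacency; the fold loop and the averaging stay the same.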

Recommending Missing and Redundant Edges
Data annotations are not perfect: an annotator might miss some relations in the ontology, or sufficient experiments might not yet have been conducted to determine whether some relation exists. Methods for recommending missing links can therefore help improve the ontology. This section presents an approach for creating such recommendations.
First, we transform the ontology as presented in Section 3.1. Then, we embed the nodes of the generated network into a matrix R using an embedding (non-proximity-based) approach, e.g., one of the embedding baselines in Section 4. A row of R represents a node, while a column represents either a symbolic feature (in the case of SNoRe) or a latent feature. Using R, we then create the link prediction matrix $L = R R^T$, which is used to find candidates for missing and redundant connections. For proximity-based approaches, the matrix L can be obtained by individually generating scores for each pair of nodes. We split the link prediction matrix into two matrices: one with the scores of existing edges and one with the scores of nonexisting edges. The matrix P with the scores of existing edges is obtained by using the adjacency matrix A as a mask, $P = L[A]$, where for matrices A and B of the same size we define $A[B] = C$ with $C_{i,j} = A_{i,j}$ if $B_{i,j} = 1$ and $C_{i,j} = 0$ otherwise. The matrix M with the scores of nonexisting edges is obtained by subtracting the scores of existing edges from the link prediction matrix, $M = L - P$. Recommendations for missing edges are obtained by selecting the elements of M with the highest scores. Similarly, recommendations for redundant edges can be obtained by selecting the elements with the lowest (nonzero) scores in P. An overview of this methodology is shown in Figure 2.
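The matrix operations above can be sketched with NumPy. This is a minimal illustration, assuming a dense embedding `R` of shape (n, d) and a symmetric 0/1 adjacency matrix `A`; the function name `recommend_edges` and the parameter `k` (number of recommendations) are hypothetical, and considering only pairs i < j (no self-pairs) is an assumption for undirected graphs.

```python
import numpy as np


def recommend_edges(R, A, k=5):
    """Rank candidate missing and redundant edges from a node embedding.

    R: (n, d) node embedding; A: (n, n) symmetric 0/1 adjacency matrix.
    """
    L = R @ R.T                        # link prediction matrix L = R R^T
    mask = A.astype(bool)
    P = np.where(mask, L, 0.0)         # P = L[A]: scores of existing edges
    M = L - P                          # M = L - P: scores of nonexisting edges
    # Assumption: the graph is undirected, so only pairs i < j are considered.
    iu, ju = np.triu_indices(A.shape[0], k=1)
    is_edge = mask[iu, ju]
    non_edges = np.flatnonzero(~is_edge)
    edges = np.flatnonzero(is_edge)
    # Missing-edge candidates: highest scores in M among non-edges.
    best = non_edges[np.argsort(-M[iu[non_edges], ju[non_edges]])[:k]]
    # Redundant-edge candidates: lowest scores in P among existing edges.
    worst = edges[np.argsort(P[iu[edges], ju[edges]])[:k]]
    missing = [(int(iu[i]), int(ju[i])) for i in best]
    redundant = [(int(iu[i]), int(ju[i])) for i in worst]
    return missing, redundant
```

This sketch selects existing edges through the adjacency mask rather than through the nonzero entries of P, so an existing edge whose score is exactly zero is still a redundancy candidate; that is a small deliberate deviation from the "(nonzero)" criterion.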
Since the link prediction matrix requires quadratic space, this approach might not be feasible for ontologies with many entities. One way to avoid this is to create predictions only for a subset of nodes $S \subset N$. This can be done by creating the prediction matrix $L\{S\} = R\{S\} R^T$, where $A\{B\}$ denotes the rows of matrix A that correspond to the nodes in set B. Recommendations are then obtained with the same technique as before, the only difference being that only the rows of the mask that correspond to the selected nodes are used. For approaches that do not generate an embedding, this is done by generating scores only for the selected pairs of nodes.
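A row-restricted variant can be sketched as follows, again assuming a dense embedding `R` and a 0/1 adjacency matrix `A` (in practice the mask rows could come from a sparse matrix). The function name `recommend_for_subset` is hypothetical. Unlike the full-matrix formulation M = L − P, this sketch skips masked entries directly, so existing edges can never outrank non-edges that have negative scores; treating self-pairs as masked is also an assumption.

```python
import numpy as np


def recommend_for_subset(R, A, subset, k=5):
    """Missing-edge candidates for selected nodes only, using O(|S| * n) memory.

    R: (n, d) node embedding; A: (n, n) 0/1 adjacency; subset: indices of the nodes in S.
    """
    S = np.asarray(subset)
    L_S = R[S] @ R.T                     # L{S} = R{S} R^T, shape (|S|, n)
    mask = A[S].astype(bool)             # rows of the adjacency mask for S
    mask[np.arange(len(S)), S] = True    # assumption: never recommend self-loops
    n = R.shape[0]
    ranked = np.argsort(-L_S, axis=None, kind="stable")  # flattened, highest score first
    out = []
    for flat in ranked:
        r, c = divmod(int(flat), n)
        if not mask[r, c]:               # skip existing edges and self-pairs
            out.append((int(S[r]), c))
            if len(out) == k:
                break
    return out
```

Calling the function repeatedly on batches of k nodes covers the whole graph while only ever holding a k × n block of scores in memory.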

Explaining Recommendations
An ontology completion tool can be much more useful to annotators if it provides some explanation for its recommendations. By using a method that produces a symbolic embedding, such as SNoRe, we can explain why the model recommended specific edges. The following paragraph presents how to create global explanations, which identify the most important features (in our case, nodes). Similarly, one can create explanations for individual predictions using interpretability methods such as SHAP [43].
Global explanations are useful for finding nodes that contribute the most to the presence of an edge. Since features represent the similarity to some node's neighborhood, this information can be exploited to prioritize nodes with a higher chance of being connected. Specifically, annotators can start with nodes in the neighborhood of the most important feature (a node). To create such explanations, we first train a logistic regression model as in the link prediction benchmark. The importance of a feature can be estimated by the absolute value of its t-statistic [44], which is the weight of the feature divided by its standard error. For the jth feature with weight $\beta_j$, the t-statistic is $t_j = \beta_j / \mathrm{SE}(\beta_j)$, where $\mathrm{SE}(\beta_j) = \sqrt{[(X^T W X)^{-1}]_{jj}}$ for $j = 1, \dots, p$. Here, X represents the matrix of training data used as input to the logistic regression, W is a diagonal matrix with $W_{ii} = \hat{\pi}_i (1 - \hat{\pi}_i)$, where $\hat{\pi}_i$ is the predicted probability for the ith training example, and p is the number of features in the embedding.
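The t-statistics can be computed with NumPy alone, as in the following sketch. It fits an unregularized logistic regression by Newton's method and then evaluates $\beta_j / \sqrt{[(X^T W X)^{-1}]_{jj}}$. Adding an intercept column is an assumption of the sketch, and how edge features are derived from a SNoRe embedding is not specified here; the function name `logistic_t_statistics` is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_t_statistics(X, y, n_iter=25):
    """Fit an unregularized logistic regression by Newton's method; return t_j = beta_j / SE(beta_j).

    X: (m, p) feature matrix (hypothetically, edge features built from a symbolic embedding);
    y: 0/1 labels. An intercept column is added internally (assumption) and its statistic dropped.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        pi = sigmoid(Xb @ beta)
        w = pi * (1.0 - pi)                        # diagonal of W
        hessian = Xb.T @ (Xb * w[:, None])         # X^T W X
        beta += np.linalg.solve(hessian, Xb.T @ (y - pi))
    pi = sigmoid(Xb @ beta)
    w = pi * (1.0 - pi)
    cov = np.linalg.inv(Xb.T @ (Xb * w[:, None]))  # (X^T W X)^{-1}
    se = np.sqrt(np.diag(cov))
    return (beta / se)[1:]
```

Ranking features by `np.abs(t)` then yields the global explanation; with a symbolic embedding such as SNoRe, each feature index corresponds to a node.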

Experimental Setting
In this section, we first present the considered data sets (ontologies), followed by the description of the baselines and the evaluation procedures.

Data Sets
In our work, we used the following ontologies:
Marine TLO [45]: A small top-level ontology of concepts related to biodiversity data in the marine domain. It is intended to help integrate new information about marine species (linked data) by providing a hierarchy of generic classes such as legislative zone or ecosystem.
Anatomy Entity Ontology (AEO) [46]: A high-level vocabulary of anatomical structures common across species. It aims to enable interoperability between different anatomy ontologies (such as EHDAA2) and describes anatomical entities such as artery, bone, or mucous membrane.
SCTO [47]: An ontology that captures upper-level terms from the Systematized Nomenclature of Medicine (SNOMED CT), a comprehensive medical terminology used to manage electronic health data. It is a pure taxonomy (it contains only subclass-of relations), with diverse classes such as symptom, laboratory test, or anatomical structure.
Emotion Ontology (MFOEM) [48]: It aims to describe affective phenomena (emotions and moods), their different building blocks, and their effects on human behavior (expressions). Like the Anatomy Entity Ontology, it contains more numerous and more specific terms than FOAF or SCTO, but it is not as grounded as, for example, the Gene Ontology. It models classes such as anxiety, negative valence, or blushing, and properties such as has occurrent part.
Human Developmental Anatomy v2 (EHDAA2) [49]: An ontology that is primarily structured around the parts of organ systems and their development in the first 49 days (Carnegie stages (CS) 1-20). It includes more than 2000 anatomical entities (AEs) and aims to include as much information about human developmental anatomy as is practical and as is available in the literature.
Food Ontology (FOODON) [23]: It aims to name all parts of animals, plants, and fungi that can bear a food role for humans, as well as their derived food products and the processes used to make them. It is a large, fairly grounded ontology with upper-level entities such as part of organism and leaf classes as specific as Pinot noir wine or chickpea. Some of its properties include has ingredient, derives from, and has quality.
LKN [50]: A biological knowledge graph constructed from multiple different sources of information, including temporal expression data, small RNA-based interactions, and protein-protein interactions. This resource was obtained through semiautomatic curation.

Gene Ontology (GO) [4]: A comprehensive source of information on cellular processes. It describes three types of entities (molecular functions, cellular components, and biological processes) and their relations in a complex class hierarchy, linked mostly by the is a (subsumption), part of (meronymy), and regulates properties. Among the used ontologies, GO is the largest and most grounded, with entities (classes) ranging from molecular function to, for example, DNA alpha-glucosyl transferase activity.
We selected these ontologies because they have different properties, come from different domains, range from small to large, and include both grounded and ungrounded ontologies. Because of this, we believe that the conclusions from our benchmark are likely to generalize to many existing ontologies. Basic statistics of these ontologies are shown in Table 2. Of the three bigger ontologies, the Food Ontology has a very treelike structure, while LKN and the Gene Ontology are more connected.

Baselines
We selected baselines from different groups. Proximity-based methods such as Adamic/Adar [33], the Jaccard coefficient [34], and preferential attachment [51] are fast, space-efficient, and require no additional training, but rely on strong structural assumptions that do not always hold. Graph neural networks such as GAE [52], GAT [38], GCN [37], and GIN [53] usually work well when a lot of data, computational resources, and handcrafted node features are available. The embedding methods SNoRe [36], node2vec [9], and metapath2vec [40] first embed graph nodes into a matrix and then solve a given task with a learner; for these methods, we classified the existence of a link using logistic regression, as it offers good performance and interpretability. TransE [10] and RotatE [41] are knowledge graph embedding methods that embed nodes as well as the relations between them. Lastly, we also used a spectral clustering [54] baseline. A more detailed description of these baselines follows.
Adamic/Adar [33]: An edge between nodes u and v is scored with $\sum_{x \in N(u) \cap N(v)} \frac{1}{\log |N(x)|}$, where N(x) is the neighborhood of node x. These scores are normalized and thresholded to obtain a link prediction.
Jaccard coefficient [34]: An edge between nodes u and v is scored with $\frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$, where N(u) is the neighborhood of node u. These scores are normalized and thresholded to obtain the link prediction.
Preferential attachment [51]: An edge between nodes u and v is scored with $|N(u)| \cdot |N(v)|$, where N(u) is the neighborhood of node u. These scores are normalized and thresholded to obtain the link prediction.
GAE [52]: Generates a node representation with a variational graph autoencoder that uses latent variables to learn an interpretable model.
GAT [38]: Uses an attention mechanism that helps learn the importance of neighboring nodes. In our tests, we adapted the implementation from PyTorch Geometric [55].
GCN [37]: A method that introduced convolution to graph neural networks and revolutionized the field. This approach aggregates feature information from the node's neighborhood. In our tests, we adapted the implementation from PyTorch Geometric [55].
GIN [53]: Learns a representation that can provably achieve the maximum discriminative power among message-passing graph neural networks. In our tests, we adapted the implementation from PyTorch Geometric [55].
SNoRe [36]: A node embedding algorithm that produces an interpretable embedding by calculating the similarity between vectors generated by hashing random walks.
node2vec [9]: A node-embedding algorithm that learns a low-dimensional representation of nodes that maximizes the likelihood of neighborhood preservation using random walks.

metapath2vec [40]: A node-embedding algorithm that learns a low-dimensional representation of nodes. The algorithm works similarly to node2vec but samples random walks based on predetermined metapaths.
TransE [10]: Creates a knowledge graph embedding in such a way that the distance between the embedding of the second node and the embedding of the first node translated by the embedding of the relation is small.
RotatE [41]: Creates a knowledge graph embedding. This approach is similar to TransE, but instead of translating the embedding of the first node by the embedding of the relation, it rotates the embedding in a complex vector space.
Spectral clustering [54]: Generates a node embedding by using a nonlinear dimensionality reduction method based on a spectral decomposition of the graph's Laplacian matrix.
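The three proximity scores described for the baselines can be sketched in plain Python over an adjacency dictionary of sets. The sketch returns raw scores only (the normalization and thresholding step is omitted), and the function names and toy graph are hypothetical. For two distinct nodes, every common neighbor has degree at least two, so the logarithm in Adamic/Adar is positive; the guard only matters when u equals v.

```python
import math


def adamic_adar(adj, u, v):
    """Sum of 1 / log|N(x)| over the common neighbors x of u and v."""
    # Guard: a neighbor of degree 1 would give log(1) = 0 (only possible when u == v).
    return sum(1.0 / math.log(len(adj[x])) for x in adj[u] & adj[v] if len(adj[x]) > 1)


def jaccard(adj, u, v):
    """|N(u) ∩ N(v)| / |N(u) ∪ N(v)|, defined as 0 for two isolated nodes."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def preferential_attachment(adj, u, v):
    """|N(u)| * |N(v)|."""
    return len(adj[u]) * len(adj[v])


# Hypothetical toy graph with edges a-b, a-c, b-c, c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(round(adamic_adar(adj, "a", "d"), 3), jaccard(adj, "a", "d"), preferential_attachment(adj, "a", "d"))
# 0.91 0.5 2
```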

Evaluation
We evaluated the link prediction capabilities on transformed ontologies using five-fold cross-validation. The folds were created as follows. We started with a directed (multi)graph that could contain multiple edges between a pair of nodes. We transformed this graph into a simple undirected graph and removed the elements on the diagonal of the adjacency matrix (self-loops). Afterwards, we took the upper triangle of the adjacency matrix, collected its nonzero elements into an array, and shuffled them to create positive examples. Selecting only elements from the upper triangle ensured a fair evaluation, since each edge was chosen exactly once and was thus contained in exactly one fold. For negative examples, we randomly sampled pairs of nodes, kept only pairs that were not connected by an edge, and ensured that no pair was repeated. We used the same number of positive and negative examples and split each set into five equally sized parts.
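The fold construction can be sketched in plain Python as follows. This is a minimal illustration, assuming integer node ids `0..n_nodes-1`; the function name `make_folds` is hypothetical, and the strided split (sizes differ by at most one when the count is not divisible by five) is one possible way to obtain equally sized parts.

```python
import random


def make_folds(edges, n_nodes, n_folds=5, seed=0):
    """Create balanced positive/negative folds from an undirected graph with nodes 0..n_nodes-1."""
    rng = random.Random(seed)  # fixed seed, so every baseline sees the same splits
    # Upper triangle only: each undirected edge appears once as (u, v) with u < v; self-loops dropped.
    positives = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    rng.shuffle(positives)
    existing = set(positives)
    n_non_edges = n_nodes * (n_nodes - 1) // 2 - len(existing)
    if n_non_edges < len(positives):
        raise ValueError("graph too dense to sample as many negatives as positives")
    # Sample distinct node pairs that are not edges until the classes are balanced.
    negatives = set()
    while len(negatives) < len(positives):
        u, v = rng.sample(range(n_nodes), 2)
        pair = (min(u, v), max(u, v))
        if pair not in existing:
            negatives.add(pair)
    negatives = sorted(negatives)
    rng.shuffle(negatives)
    # Strided split: fold sizes differ by at most one element.
    pos_folds = [positives[i::n_folds] for i in range(n_folds)]
    neg_folds = [negatives[i::n_folds] for i in range(n_folds)]
    return pos_folds, neg_folds
```

Sorting before shuffling makes the result depend only on the seed and the input edges, which mirrors the reproducibility goal of reusing one predetermined seed for every baseline.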
We obtained the score of a baseline on a given data set as the mean of its scores over the five folds. Each fold was scored by training the model on data from the other folds and computing the ROC-AUC of the predicted scores for the edges in that fold. While the AUC aggregates over all decision thresholds and can therefore give a score that is too general [56], we believe it sufficiently shows the relative performance of the baselines in our benchmark. To make our experiments reproducible, we initialized the random number generators of the data-splitting algorithm and of all baselines with the same predetermined seed, so the data splits were identical for every baseline.
The TransE and RotatE baselines require two nodes as well as the type of the edge between them to score an edge. Since the evaluation edges have no type, we generated a prediction for each relation and output the most probable one. The rationale is that if there is no edge between two nodes, all relations should receive a low score, whereas if there is an edge, at least one relation (the one we select) should receive a high score. During the training of these baselines, we kept the information about edge types.
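The max-over-relations scoring can be sketched with NumPy, assuming trained entity and relation embeddings are already available (in practice they would come from the baselines' implementations). Using the negative L1 distance as the plausibility score is an assumption of this sketch, and the function names are hypothetical; RotatE relations are represented by phase vectors `theta`, so that each relation is a unit-modulus complex rotation.

```python
import numpy as np


def transe_score(h, r, t):
    """TransE-style plausibility: negative L1 distance between h + r and t (higher is better)."""
    return -np.sum(np.abs(h + r - t))


def rotate_score(h, theta, t):
    """RotatE-style plausibility: rotate complex h by r = exp(i * theta), then compare with t."""
    r = np.exp(1j * theta)
    return -np.sum(np.abs(h * r - t))


def best_relation(h, t, relations, score_fn):
    """Score the node pair under every relation and keep the most plausible one."""
    scores = [score_fn(h, r, t) for r in relations]
    best = int(np.argmax(scores))
    return best, scores[best]
```

The returned score serves as the edge score for the untyped node pair, which is what allows these baselines to be evaluated with the same ROC-AUC procedure as the others.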

Results
In this section we present the results of the link prediction and show an example of the global explanations.

Link Prediction Results
The results of ontology completion using the methodology described in Section 3.2 are presented in Table 3. Two trends generally held for all baselines: the variance of the results decreased as the number of edges increased, and the baselines performed significantly worse on ontologies where the ratio between edges and nodes was close to one. On smaller ontologies, embedding methods that do not rely on random walks, such as spectral embedding, TransE, and RotatE, worked best, while on bigger ones, SNoRe and TransE generally outperformed the others. Grouping baselines by kind, proximity-based approaches usually gave mediocre performance, and graph neural networks worked well on most data sets but usually fell just below the best-performing approaches. Node-embedding algorithms based on random walks generally performed well on all data sets, and approaches designed for knowledge graphs performed similarly to other embedding methods. Spectral embedding generally performed better on smaller ontologies, with the exception of the Marine Ontology. Overall, the best-performing baselines (based on average rank) were SNoRe and TransE.
When comparing training and inference speed, our experiments showed that proximity-based methods were the fastest, while the knowledge graph embedding methods were usually the slowest. The running time on the smaller ontologies was less than a second for almost every baseline. On the three larger ontologies, some proximity-based methods achieved strong results while being much faster, e.g., preferential attachment on LKN. Overall, the slowest method was RotatE, which needed almost twice as much time on the Gene Ontology as TransE, the second slowest. Graph neural network methods ran at a similar speed to or slightly slower than SNoRe, while achieving lower scores overall.
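Much of the speed advantage of proximity-based methods comes from their scores being closed-form functions of the two nodes' neighborhoods, so no model has to be trained. A minimal sketch of preferential attachment (the product of the two node degrees) and, for comparison, common neighbors, assuming the ontology has already been converted into an undirected edge list:

```python
from collections import defaultdict


def build_adjacency(edges):
    # Undirected simple graph: store each edge in both directions.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def preferential_attachment(adj, u, v):
    # Product of degrees: two set-size lookups per pair, no training needed.
    return len(adj[u]) * len(adj[v])


def common_neighbors(adj, u, v):
    # Number of shared neighbors: one set intersection per pair.
    return len(adj[u] & adj[v])
```

On the toy graph with edges `(0, 1), (0, 2), (0, 3), (2, 3)`, the pair `(0, 3)` gets a preferential attachment score of 3 · 2 = 6.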

Interpretation Examples
To make our predictions more useful, we interpreted them using the methodology presented in Section 3.4. Figure 3 shows an example of the global explanation for the 2019 Gene Ontology. The term apoptotic process had the highest value, meaning that when the neighborhoods of two nodes were similar to the neighborhood of this node (a high value in the embedding for this feature), an edge between them was more likely. In a real-world application, the annotator should therefore start with nodes in the neighborhood of the apoptotic process node.
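The way an annotator would act on such an explanation can be illustrated with a small sketch. It assumes, hypothetically, that the global explanation assigns one importance value per embedding feature, that each feature is named after the node whose neighborhood it measures similarity to, and that each node's embedding is a mapping from feature names to values. The names `importance`, `embedding`, `term_a`, and `term_b` are illustrative placeholders, not the output format of the method in Section 3.4.

```python
def top_feature(importance):
    # Feature (named after a node) with the highest global importance.
    return max(importance, key=importance.get)


def starting_nodes(embedding, feature, n=3):
    # Nodes whose neighborhoods are most similar to that feature's node,
    # i.e., with the highest embedding value for the feature (missing counts as 0).
    return sorted(embedding, key=lambda node: embedding[node].get(feature, 0.0), reverse=True)[:n]
```

With importances `{"term_a": 0.7, "term_b": 0.2}`, the annotator would start from the nodes with the largest `term_a` values.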

Discussion
The main goal of our approach was to simplify and accelerate the process of ontology completion for annotators. We did this by transforming the ontology into a graph and using link prediction approaches to score the edges. We also presented a methodology for recommending the most probable missing (and possibly redundant) links. We empirically evaluated link prediction on graphs obtained by transforming ontologies. Tables 2 and 3 show that link prediction worked well on graphs with a high average number of edges per node but poorly on graphs whose structure resembled a tree (i.e., the average number of edges per node is close to one). Figure 4a,b show visualizations of the marine and anatomy ontologies. The marine ontology was well connected through most of the graph but had many leaves, while the anatomy ontology was largely treelike, especially at the bottom left of the figure.

The proposed methodology has a few disadvantages. While it works well on ontologies whose nodes are well connected, it performs poorly on those whose structure resembles a tree. Such ontologies probably do not contain enough information to recommend new connections solely based on their structure. A possible solution could be to transform the ontology into a knowledge graph instead of a simple undirected graph. Another disadvantage is that the approach needs space quadratic in the number of nodes to store a score for every possible connection. This could prove problematic for large ontologies, where such an approach is needed even more because of the number of connections that can easily be missed. In practice, this is not necessarily a problem, since embeddings are usually small enough to fit in memory and can be used to calculate scores for only a subset of k nodes at a time. This lowers the space complexity to |N| · k, which easily fits in memory and yields the same scores for the selected nodes. Since we work with simple graphs, our methodology can only capture simple links and assumes that two nodes are connected if they are similar. This assumption does not necessarily hold in ontologies and knowledge graphs, where links can have different semantic meanings (for example, a link might indicate that two nodes are opposites). Lastly, link prediction assumes that the distribution of links in the test set corresponds to the distribution of real missing links. For ontologies, this assumption might not hold; consequently, the results in this paper might not reflect the true performance.
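The memory argument can be sketched as follows, assuming, for illustration only, that an edge score is the dot product of two node embeddings stored as rows of a NumPy matrix; other scoring functions work the same way as long as they can be evaluated row by row. Only a k × |N| block of scores is held in memory at a time, and each block matches the corresponding rows of the full |N| × |N| score matrix.

```python
import numpy as np


def subset_scores(embeddings, subset):
    # Scores between the k selected nodes and all |N| nodes: shape (k, |N|).
    # Dot-product scoring is an illustrative assumption.
    return embeddings[subset] @ embeddings.T


def iter_score_blocks(embeddings, k):
    # Walk over all nodes in blocks of k rows, so peak memory for scores
    # is O(|N| * k) rather than O(|N|^2).
    n = embeddings.shape[0]
    for start in range(0, n, k):
        rows = np.arange(start, min(start + k, n))
        yield rows, subset_scores(embeddings, rows)
```

Stacking all blocks reproduces the full score matrix, so restricting attention to a few selected nodes loses nothing for those nodes.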
Note that this work primarily focused on outlining a scalable end-to-end methodology for graph-based link prediction on ontologies. As such, many of the approaches adopted as part of the pipeline could be improved. For example, we demonstrated how KG-specific methods compared to methods that operated on simple graphs. However, there exist even more specialized methods, such as ontology-specific embeddings [12][13][14]. These approaches are tailored to capture the higher expressivity that ontologies offer compared to knowledge graphs and usually utilize lexical information (metadata) about nodes and relations, which we ignored. Leveraging this additional information would likely produce better results. Other choices could also be made regarding ontology preprocessing and the ontology-to-graph conversion step. These include extending given ontologies with related ones, ontology pruning, entailment reasoning before conversion, and the choice of the ontology-to-graph conversion protocol itself.
Finally, the presented methodology could help annotate larger ontologies, where connections can easily be missed. In practice, a web application could be set up in which the annotator selects some nodes and receives recommendations for the most probable missing edges.
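The core step of such an application can be sketched as a function that, for each node the annotator selects, returns the highest-scoring pairs that are not yet edges. The pairwise `score` callable, the adjacency mapping, and the cutoff `m` are hypothetical placeholders; any trained link prediction model could supply the scores.

```python
import heapq


def recommend_missing(selected, nodes, adj, score, m=5):
    # For each selected node u, rank every non-neighbor v (excluding u itself)
    # by score(u, v) and keep the m most probable missing edges.
    recommendations = {}
    for u in selected:
        neighbors = adj.get(u, set())
        candidates = [v for v in nodes if v != u and v not in neighbors]
        recommendations[u] = heapq.nlargest(m, candidates, key=lambda v: score(u, v))
    return recommendations
```

Using `heapq.nlargest` avoids sorting all candidates when only the top m recommendations are shown to the annotator.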

Conclusions
In this work, we proposed a graph-based machine learning approach for ontology completion that is complementary to other domain-knowledge-based approaches. With our approach, an annotator can quickly generate recommendations for the most probable missing relations without having to fine-tune the approach to the specific ontology. We showed that the methodology yielded good results on larger ontologies whose nodes had a high average degree. The proposed approach could prove useful for annotators of large ontologies or domain experts (e.g., biologists) seeking the connections that are most likely to belong in the ontology. In further work, we plan to collaborate with domain experts to further analyze the performance of our methodology in a real-life setting. We also intend to study different approaches for preprocessing ontologies and representing them as graphs, since this is one potential area where incorporating more of the available semantic information into the model could help improve results. Finally, we wish to shift our focus to ontology-specific embeddings and other methods that utilize metadata about entities. We suspect that taking full advantage of these additional features could significantly improve the results, and we plan to further explore the limits of the presented methodology.

Figure 1. Overview of the link prediction methodology.

Figure 2. Overview of our methodology for finding missing and redundant edges.

Figure 3. Example of a global explanation for the Gene Ontology.

Table 1. Overview of some of the related approaches, their aim, and a short description. The horizontal lines separate different types of research articles.
Combines data from several biomedical ontologies. Presents a novel domain-specific embedding model and evaluates it against existing ontology embeddings on a gene-disease prediction task.

Table 2. Basic statistics of the tested ontologies. Columns |N|, |E|, and Components describe the converted graphs; the remaining columns describe the raw OWL ontologies.

Table 3. Link prediction results based on the ROC-AUC metric (multiplied by 100 to improve readability) and average rank.