1. Introduction
The proliferation of semantic web ontologies and their applications has made an abundance of information and knowledge available for reuse by a wide range of applications. In the era of big data, terabytes of data are generated every moment via different types of media on the web. However, the generated data and information often overlap and may be inconsistent. Therefore, there is a need to investigate and analyze such data to discover redundancy, capture new knowledge and build unified, consistent knowledge bases that can be reused by the respective applications. Thus, there is a pressing need for techniques that merge knowledge from similar domain ontologies to produce up-to-date, integrated ontologies. This process is referred to as ontology merging [1,2]. Ontology merging aims to combine existing knowledge from heterogeneous sources to constitute a new ontology. However, despite the efforts devoted to ontology merging [3,4,5,6,7,8,9], the incorporation of axioms, individuals and annotations in the resulting ontologies remains challenging. Furthermore, certain studies [10] relied only on lexical analysis of ontology concepts to perform the merging, which does not cover the semantic analysis of the features of the candidate ontologies. Consequently, existing ontology-merging solutions produce new ontologies that do not include all the related and relevant semantic features from the candidate ontologies. Furthermore, the output ontologies should meet the required quality standards to ensure their usefulness [11,12,13,14]. Quality ontology merging requires the knowledge in the output ontology to be complete, to have minimal knowledge redundancy, to exhibit a high level of connectivity and acclivity, to be concise and consistent, and to enable inference with constraints [12,13,14].

To address the limitations of existing ontology-merging solutions, this paper proposes a novel algorithm for multi-criteria ontology merging that automatically builds a new ontology from candidate ontologies by iteratively updating its RDF graph in memory. The algorithm begins by extracting the concepts, logical axioms and individuals from both the base and candidate ontologies. Thereafter, the concepts and annotations are aligned and merged to build an RDF graph, which serves as the input for the subsequent stage. Next, the logical axioms extracted from the candidate ontologies are matched against all the logical axioms within the base ontology. The decision to include or exclude these axioms in the output ontology is guided by a predefined similarity score threshold. The updated RDF graph serves as the input for the final stage, where individuals are matched and merged to construct the final RDF graph of the resulting output ontology. The proposed algorithm uses a similarity-based framework to assess concept similarities and thereby guide the subsequent merging processes. It leverages state-of-the-art Natural Language Processing tools, such as fuzzy string-matching algorithms, Word2Vec, BERT and WordNet, as well as a Machine Learning-based framework, namely SM-DTR, to assess the similarities and merge the various criteria. The key contribution of the proposed algorithm lies in its ability to merge relevant features from candidate ontologies, such as logical and direct/declarative axioms, annotations, individuals and the hierarchical structure, to build a more accurate, integrated and cohesive output ontology.

The proposed algorithm was tested with five ontologies of different computing domains, downloaded from various repositories on the web, and evaluated in terms of its asymptotic behavior, quality and computational performance. The analysis of the experimental results indicated that the proposed algorithm produced output ontologies that met the integrity, accuracy and cohesion quality criteria better than related studies, demonstrating its effectiveness and superior capabilities for ontology merging. Furthermore, the proposed algorithm enables iterative in-memory updating and building of the RDF graph of the resulting output ontology, which enhances the processing speed and improves the computational efficiency, making it an ideal solution for big data applications.
The main contributions of this paper can be summarized as follows:
The proposed algorithm takes into account multiple criteria in the ontology-merging process, including logical and direct/declarative axioms, hierarchical structure, individuals, and annotations. This results in high-quality output ontologies that are integrated, accurate and cohesive.
We introduce a multi-level measure of similarity between the matched components and vocabulary in the ontologies. This measure guides the decision-making process concerning how to seamlessly incorporate relevant vocabulary and knowledge from the candidate ontologies into the output ontology.
The proposed algorithm leverages an in-memory RDF graph mechanism, enabling efficient processing of periodic and continuous updates and smooth ontology merging; a minimal sketch of this mechanism is shown after this list. This computational advantage holds value in various settings, especially for large-scale applications dealing with frequent data generation, including big data.
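To make this mechanism concrete, the following minimal Python sketch (our own illustration, using the rdflib library; the file names and namespace are hypothetical) shows the style of iterative in-memory RDF graph updating that the algorithm relies on:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# Hypothetical namespace for the merged ontology (not from the paper).
EX = Namespace("http://example.org/merged#")

# Parse the base ontology into an in-memory graph once.
graph = Graph()
graph.parse("base_ontology.owl", format="xml")  # illustrative path

# Each merging stage then adds triples directly to the in-memory graph,
# avoiding intermediate serialization to disk.
graph.add((EX.CloudService, RDF.type, OWL.Class))
graph.add((EX.CloudService, RDFS.label, Literal("Cloud Service")))

# Only the final merged ontology is written out.
graph.serialize(destination="merged_ontology.owl", format="xml")
```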
The remainder of this paper is organized as follows. Section 2 reviews the existing literature and related work, providing context for our research. Section 3 details the materials and methods used in the proposed algorithm. Section 4 presents the proposed algorithm. Section 5 presents our experimental results and discusses their implications. Section 6 compares and evaluates our results against existing works. Finally, Section 7 concludes the paper and offers insights into potential directions for future research.
2. Literature Review
Previous endeavors have been devoted to providing guidelines for quality ontology merging. The authors of [12] proposed a comprehensive framework for merging candidate ontologies into output ontologies that meet the integrity and cohesion quality criteria. The integrity criterion advocates comprehensive knowledge coverage and minimal redundancy, emphasizing that the output ontology encapsulates as much pertinent knowledge as possible while trimming excess repetition, whereas the cohesion criterion indicates how related the properties within the ontology are. Another study [13] defined four quality criteria for ontologies, namely accuracy, understandability, cohesion and conciseness, along with mathematical formulations for each. In [15], ontology quality metrics were classified into two distinct groups: schema metrics and knowledge base metrics. The schema metrics assess the design of the ontology and its knowledge representation capability, while the knowledge base metrics examine the incorporation of instance data into the ontology and how effectively it leverages the knowledge outlined in the schema.
In [16], the authors proposed a novel method for merging domain ontologies utilizing granular computing. The method consists of four major operations: association, isolation, purification and reduction. It works by comparing the concepts of the ontologies, aiming to reduce the level of granularity during the merging process. This study considered the labels of classes and the taxonomy, while other criteria, including properties, logical axioms and individuals, were not considered. In another study [17], the authors developed an algorithm called ATOM for taxonomy merging. They harnessed the is-a and equivalent-class relationships to match different concepts. Their focus was on taxonomy and individuals, while other criteria that can enrich the output ontology, such as direct and logical axioms as well as object properties, were not considered. In [18], the researchers proposed a semi-automatic method for merging large ontologies using GraphDB for big data processing. Their method is well suited to modularization-type problems, such as importing a specific module or part of one ontology into another. However, a main shortcoming of their algorithm is that it imports the entire sub-class taxonomies of the matched entities into the resulting output ontology, assuming the relevance of the whole candidate ontologies’ sub-class taxonomies. This does not comply with the reduction and cohesion guidelines prescribed in [12,13,14]. Additionally, the merging process involves the manual intervention of human operators, which may slow down the process. These shortcomings are addressed in the fully automated algorithm proposed in this study. In another work [11], the authors proposed a semi-automatic framework for merging smart factory ontologies. Their methodology includes three major tasks, namely preprocessing, matching and post-processing, within which several operations are performed, including spell-checking of the concepts’ labels, tokenization and translation, structure analysis and user confirmation. The inclusion of similar concepts is based on two threshold values that determine relevance so as to minimize the rejection rate. However, their method does not process annotations, which are relevant features of ontologies; this shortcoming is overcome by the algorithm proposed in this study. In [3], a method called the Hybrid Semantic Similarity Measure (HSSM) was proposed for merging ontologies. The aim was to remove redundancy and improve storage efficiency, which are important aspects of quality ontology merging. The method was developed using Formal Concept Analysis (FCA) and semantic similarity measures. Although effective, the method does not cover other relevant ontology elements, such as logical axioms and annotations. Many studies [15,19,20,21,22,23] utilized the lexical database WordNet in their ontology-merging methods due to its effectiveness in the semantic analysis of terminologies. Our proposed algorithm also utilizes WordNet, for synonym extraction purposes.
To the best of our knowledge, the majority of previous studies [8,9,10,24] did not process the logical axioms and annotations in their ontology-merging methods. This shortcoming is addressed in our proposed algorithm through the processing of various criteria, including logical axioms, individuals and annotations, with the aim of producing quality output ontologies that include the relevant features from the candidate ontologies and meet the prescribed ontology-merging quality criteria [12,13,14]. A comparative and exhaustive analysis of 19 studies was undertaken to review all the criteria used within their respective methodologies for ontology merging. The findings of this comparative analysis are presented in Table 1. To represent the extent to which a criterion has been fulfilled in each study, we use the following encodings in Table 1: (1) ✔ indicates that the given criterion is met comprehensively; (2) ✘ signifies that the criterion is not fulfilled at all; and (3) ≈ denotes an intermediate degree of satisfaction, that is, a partial fulfillment of the criterion.

Table 1 offers an insightful glimpse into the spectrum of criteria governing ontology merging, along with the extent to which the corresponding studies have fulfilled them. A substantial portion of the studies investigated did not consider logical axioms in the merging process. Additionally, none of the previous studies considered annotations, while few integrated individuals within their methodologies. Another noticeable finding is that no study has endeavored to tackle all of these criteria collectively. As a result, the output ontologies produced by existing ontology-merging solutions do not satisfy the prescribed quality criteria of integration, cohesion and completeness.
Ontology merging is a complex and demanding task due to the heterogeneity in the structures and semantics of the ontologies being merged. Despite the advancements witnessed in the recent literature, where various solutions for merging ontologies have been proposed [2,7,11,15], many previous studies do not cover the holistic dimension of ontology merging. In other words, the majority of previous studies do not merge important features of the ontology, including logical axioms, instances and annotations, into the output ontologies [9,15,16]. Based on the analysis of the shortcomings above, the algorithm proposed in this work covers the full spectrum of the criteria shown in Table 1 in the merging process. This ensures that the majority of the prescribed quality criteria [12,13,14] are met by the output ontologies resulting from the proposed algorithm.
4. Proposed RDF-MOM Algorithm
The proposed algorithm works in three main stages, namely merging concepts and annotations, merging logical axioms and merging individuals. The algorithm begins by extracting the ontologies’ concepts, logical axioms and individuals and saving them into text files. This process is performed for both the candidate ontologies, denoted On, where n is the index of the candidate ontology, and the base ontology, denoted Obase. Next, the concepts and annotations are aligned and subsequently merged. The outcome is an RDF graph, which serves as the input for the subsequent stage. In the second stage, the logical axioms extracted from the candidate ontologies are matched against all the logical axioms within the base ontology. The decision to incorporate or exclude these axioms hinges on a predefined similarity score threshold. In the final stage, individuals are matched and merged to build the final RDF graph of the output ontology. The proposed algorithm is presented in detail next.
The proposed algorithm for merging ontologies through iterative updating of the RDF graph is presented in Algorithm 1. Algorithm 1 initiates the process of aligning the concepts and annotations in line 3, where Algorithm 2 is invoked. The output of Algorithm 2 is an RDF graph, which serves as the input for the subsequent task of merging the logical axioms. The candidate ontology is converted into an RDF graph in line 4. Line 5 initializes an array that contains the list of RDF and OWL keywords to be used at a later stage. In lines 6 and 7, the logical axioms are extracted using Algorithm 3. This is followed by the creation of word embeddings for the axioms in the base ontology in line 8. Because the number of logical expressions is needed later to calculate the average similarity score, the sum and count variables are declared in line 9. Starting from line 11, the algorithm iterates through the logical axioms in the candidate ontology, in both their URI and label formats, to align them with their counterparts in the base ontology. In lines 12–13, each logical axiom expression is tokenized to acquire the most relevant vector for each token; tokens are the words that compose a logical axiom expression. In lines 19–20, each token from the candidate ontology is tested against the other vectors using the boW2VModel created in line 8. It is essential to note that Word2Vec models cannot recognize vocabulary that was unseen when they were initially constructed. Hence, the algorithm keeps track of the tokens recognized by the model and those that were not in lines 14–15, 24 and 26. Once all the tokens have been processed through the boW2VModel, the algorithm checks whether they are all represented (line 27). If so, the average similarity score is calculated by dividing the sum by the counter in line 28. The next step involves testing this average value: if it is greater than 0.50 and less than the predefined logAxi_thr threshold, the axiom is added to the RDF graph in lines 29–30; otherwise, the algorithm proceeds to iterate over the next axioms in the candidate ontology. If not all the tokens are represented in boW2VModel, the algorithm proceeds to line 31. In lines 32–42, the tokens that were not represented are matched with all the logical axioms in the base ontology, read from the bLab and bUri entries of the extracted text files, using the Jaro–Winkler fuzzy string-matching algorithm (line 37). Subsequently, the average similarity score for all the tokens is calculated in line 40 and tested in line 41: if it lies between 0.50 and the logAxi_thr threshold, the axiom is integrated into the RDF graph (lines 41–42); otherwise, it is discarded.
Algorithm 1: RDF-MOM Algorithm |
1 | Inputs: boUrl, coUrl, conAx_thr, logAxi_thr, concWght, annWght |
2 | Outputs: myNewGraph |
3 | myNewGraph ← conceptsAndAnnotationsMerging(boUrl, coUrl, conAx_thr, concWght, annWght) |
4 | coGraph ← toRDFGraph(coUrl) |
5 | owlLogAx_keywords ← [subClassOf, ObjectUnionOf, …] |
6 | boLogAxLab, boLogAxUris ← logicalAxiomExtraction (boUrl) |
7 | coLogAxLab, coLogAxUris ← logicalAxiomExtraction (coUrl) |
8 | boW2VModel ← W2Vec.trainModel(boLogAxLab) |
9 | sum, count ← 0 |
10 | |
11 | for conLab, conUri in coLogAxLab, coLogAxUris do: |
12 | temp_list1 ← sentenceToList(coLogAxLab) |
13 | temp_list1_uris ← sentenceToList(coLogAxUris) |
14 | unseen_words[] ← null |
15 | seen_words[] ← null |
16 | |
17 | counter, avg_sum, max_sim ← 0 |
18 | max_lab ← null |
19 | for word in conLab do: |
20 | max_sim, max_lab ← GetMaxSimToken(boW2VModel.predictSimilarity(word)) |
21 | if similar token from conLab was found, then: |
22 | counter ← counter+1 |
23 | sum ← sum + max_sim |
24 | seen_words.append([word, conLab, conUri, max_sim]) |
25 | else: |
26 | unseen_words.append([conLab, conUri]) |
27 | if length of unseen_words == 0, then: |
28 | avg ← sum/counter |
29 | if avg < logAxi_thr and avg > 0.50, then: |
30 | myNewGraph ← addAxiomToRDFGraph(myNewGraph, conLab, conUri) |
31 | else: |
32 | for unseenWord in unseen_words do: |
33 | max_val ← 0 |
34 | max_lab, max_uri ← null |
35 | for bLab, bUri in boLogAxLab, boLogAxUris do: |
36 | for bLabWord, bUriToken in bLab, bUri do: |
37 | max_val, max_lab, max_uri ← GetMaxSimToken(JaroWinklerSim(unseenWord, bLabWord)) |
38 | sum ← sum + max_val |
39 | count ← count + 1 |
40 | avg ← sum/count |
41 | if avg < logAxi_thr and avg > 0.50, then: |
42 | myNewGraph ← addAxiomToRDFGraph(myNewGraph, conLab, conUri) |
43 | myNewGraph ← mergeIndividuals(myNewGraph, coGraph, threshold) |
44 | return myNewGraph |
Finally, the algorithm proceeds with the merging of individuals in line 43 by invoking Algorithm 6.
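As an illustration of this matching logic, the following Python sketch combines Word2Vec similarity for in-vocabulary tokens with a Jaro–Winkler fallback for unseen tokens, mirroring the two branches of Algorithm 1. It uses the gensim and jellyfish libraries; the helper name, the toy axiom data and the threshold value are our own assumptions, not the paper’s implementation.

```python
from gensim.models import Word2Vec
import jellyfish

def axiom_similarity(cand_tokens, w2v_model, base_tokens):
    # Average per-token best similarity of a candidate axiom against the
    # base ontology's axiom vocabulary (a sketch of Algorithm 1's scoring).
    sims = []
    for token in cand_tokens:
        if token in w2v_model.wv:
            # Token seen during training: use embedding similarity.
            best = max((w2v_model.wv.similarity(token, b)
                        for b in base_tokens if b in w2v_model.wv), default=0.0)
        else:
            # Unseen token: fall back to Jaro-Winkler fuzzy string matching.
            best = max(jellyfish.jaro_winkler_similarity(token, b)
                       for b in base_tokens)
        sims.append(best)
    return sum(sims) / len(sims) if sims else 0.0

# Train Word2Vec on the base ontology's tokenized axiom labels (toy data).
base_axioms = [["subClassOf", "cloud", "service"],
               ["disjointWith", "cloud", "storage"]]
model = Word2Vec(sentences=base_axioms, vector_size=50, min_count=1)
base_vocab = sorted({t for s in base_axioms for t in s})

avg = axiom_similarity(["subClassOf", "cloud", "computing"], model, base_vocab)
logAxi_thr = 0.95  # illustrative value for the predefined threshold
if 0.50 < avg < logAxi_thr:
    print("axiom would be added to the merged RDF graph")
```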
Algorithm 2 is used in line 3 of the proposed algorithm (Algorithm 1) to merge the concepts and annotations into the output ontology. The algorithm takes as inputs, in line 1, the URLs of the base and candidate ontologies, boUrl and coUrl, respectively, the concept similarity score threshold conAx_thr, the concept weight concWght, and the annotation weight annWght. Thereafter, the graph structures of the base and candidate ontologies are extracted in lines 3–4. In line 5, the algorithm iterates through the concepts and their respective URIs within the candidate ontology to explore the relationships between them and their counterparts in the base ontology. Within the loop in line 5, an inner loop in line 12 traverses all the concepts within the base ontology as well as their associated annotations. The labels of these concepts are compared with the label of the current concept in the candidate ontology using the SM-DTR method in lines 12–17. Additionally, the annotations in the candidate ontology are matched to those of the base ontology using BERT in lines 23–28. From these iterative processes, the algorithm identifies the most similar concepts and annotations, weighted by concWght and annWght, respectively. These results are aggregated in line 29, and the cumulative similarity score is evaluated against the predefined conAx_thr similarity threshold in line 30. If this condition is met, signifying that the concepts share substantial commonality, their synonyms are extracted from WordNet and added to the in-memory RDF graph in lines 30–36. If the condition in line 30 is not met, a second check is performed in line 37 to determine whether the similarity score is higher than 0.50, that is, whether there is some degree of similarity. In such a case, the concept from the candidate ontology is added to the graph in line 38. If neither of the conditions in lines 30 and 37 holds, the concept is discarded and the outer loop in line 5 proceeds to iterate over the remaining concepts and annotations within the candidate ontology.
Algorithm 2: conceptsAndAnnotationsMerging () |
1 | Inputs: boUrl, coUrl, conAx_thr, concWght, annWght |
2 | Outputs: newRdfGraph |
3 | newRdfGraph ← toRDFGraph(boUrl) |
4 | cand_graph ← toRDFGraph(coUrl) |
5 | for cand_conLab, cand_iri in cand_graph do: |
6 | if cand_conLab is empty, then: |
7 | continue |
8 | else: |
9 | candAnn ← getDecAnnotations(cand_iri) |
10 | maxSimCon ← “” |
11 | maxSimVal ← 0 |
12 | for base_conLab, base_iri in newRdfGraph do: |
13 | if base_conLab is NOT empty, then: |
14 | s1 ← SM_DTR_SIMILARITY(cand_conLab, base_conLab) |
15 | if s1 > maxSimVal then: |
16 | maxSimVal ← s1 |
17 | maxSimCon ← base_iri |
18 | sim1 ← maxSimVal |
19 | sim2 ← 0 |
20 | baseAnn ← getDecAnnotations(maxSimCon) |
21 | candAnn ← getDecAnnotations(cand_iri) |
22 | common_Ann ← candAnn ∩ baseAnn |
23 | if common_Ann is NOT empty, then: |
24 | for ann in common_Ann do: |
25 | baseAnnVal ← getAnnVal(maxSimCon, ann) |
26 | candAnnVal ← getAnnVal(cand_iri, ann) |
27 | annSimilarity ← BERT(baseAnnVal, candAnnVal) |
28 | sim2 ← annSimilarity |
29 | overall_sim ← (sim1 * concWght) + (sim2 * annWght) |
30 | if overall_sim >= conAx_thr, then: |
31 | synSet[]←WordNet.synSet(cand_conLab) ∩ WordNet.synSet(base_conLab) |
32 | if cand_conLab is NOT exact match with base_conLab then: |
33 | synSet.addElement(cand_conLab) |
34 | if synSet is NOT empty, then: |
35 | newRdfGraph.addTriple((base_iri, nameSpace.synonyms, synSet)) |
36 | newRdfGraph.addTriple((base_iri, nameSpace.common_Ann, candAnn)) |
37 | else if overall_sim > 0.50, then: |
38 | newRdfGraph.addNewClass((cand_iri, CLASS)) |
39 | else: |
40 | continue |
41 | return newRdfGraph |
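For concreteness, the sketch below shows how the weighted concept/annotation scoring of Algorithm 2 (line 29) and the WordNet synonym intersection (line 31) could be realized in Python. Since the paper’s SM-DTR framework is not reproducible here, a BERT-based sentence similarity (via the sentence-transformers library) stands in for it; the model name and the weight values are illustrative assumptions.

```python
from nltk.corpus import wordnet  # requires nltk.download('wordnet')
from sentence_transformers import SentenceTransformer, util

bert = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative BERT model

def label_similarity(cand_label, base_label):
    # Stand-in for the paper's SM-DTR label similarity (lines 12-17).
    return util.cos_sim(bert.encode(cand_label), bert.encode(base_label)).item()

def annotation_similarity(cand_ann, base_ann):
    # BERT similarity between annotation values (lines 23-28).
    return util.cos_sim(bert.encode(cand_ann), bert.encode(base_ann)).item()

def overall_similarity(cand_label, base_label, cand_ann, base_ann,
                       concWght=0.7, annWght=0.3):
    # Weighted aggregation as in line 29 of Algorithm 2 (weights illustrative).
    sim1 = label_similarity(cand_label, base_label)
    sim2 = annotation_similarity(cand_ann, base_ann) if cand_ann and base_ann else 0.0
    return sim1 * concWght + sim2 * annWght

def shared_synonyms(cand_label, base_label):
    # Intersection of WordNet synonym sets, as in line 31 of Algorithm 2.
    def syns(word):
        return {lemma.name() for s in wordnet.synsets(word) for lemma in s.lemmas()}
    return syns(cand_label) & syns(base_label)
```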
Algorithm 3 is used in lines 6–7 of the proposed algorithm (Algorithm 1) to extract all the logical axioms from a given input ontology. The algorithm extracts the axioms of the input ontology in line 3. In lines 4–5, the algorithm creates two text files to store the URIs and labels of the axioms. Subsequently, in lines 6–8, the URIs extracted in line 3 are stored in the logAxioms_URIs text file, while the labels are stored in the logAxioms_Labels text file in lines 9–12.
Algorithm 3: logicalAxiomExtraction () |
1 | Inputs: ontURL |
2 | Outputs: logAxioms_URIs, logAxioms_Labels |
3 | Axioms_URIs ← Extract_OWLLogAxioms(ontURL) |
4 | logAxioms_URIs ← CreateFile(‘ontology_name_URIs’) |
5 | logAxioms_Labels ← CreateFile(‘ontology_name_Labels’) |
6 | for axiom in Axioms_URIs do: |
7 | logAxioms_URIs.writeLine(axiom) |
8 | logAxioms_URIs.close() |
9 | for line in logAxioms_URIs do: |
10 | axiomWithLabels ← Tokenize line and get embedded labels from URIs using SPARQL |
11 | logAxioms_Labels.writeLine(axiomWithLabels) |
12 | logAxioms_Labels.close() |
13 | return logAxioms_URIs, logAxioms_Labels |
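The following Python sketch illustrates the extraction pattern of Algorithm 3 with rdflib, restricted to rdfs:subClassOf axioms for brevity (the paper extracts the full set of OWL logical axioms); the file-naming scheme is our own assumption.

```python
from rdflib import Graph
from rdflib.namespace import RDFS

def extract_logical_axioms(ont_url, ontology_name):
    # Sketch of Algorithm 3, covering only rdfs:subClassOf axioms.
    g = Graph()
    g.parse(ont_url)

    def label_of(node):
        # Prefer the rdfs:label; fall back to the full URI.
        return str(g.value(node, RDFS.label) or node)

    with open(f"{ontology_name}_URIs.txt", "w") as uris_file, \
         open(f"{ontology_name}_Labels.txt", "w") as labels_file:
        for sub, sup in g.subject_objects(RDFS.subClassOf):
            uris_file.write(f"subClassOf {sub} {sup}\n")
            labels_file.write(f"subClassOf {label_of(sub)} {label_of(sup)}\n")
```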
Algorithm 4 is used in lines 30 and 42 of the proposed algorithm (Algorithm 1) to recursively add axioms to the in-memory RDF graph of the output ontology. The algorithm addresses two scenarios. The base case, in lines 5–7, occurs when the axiom comprises precisely three tokens of keywords, URIs of vocabulary or values; these tokens are added to the graph in lines 6–7. When the axiom contains more than three tokens, the recursive case in lines 8–10 is executed. Upon completion of the recursion, the algorithm returns the updated graph newGraph.
Algorithm 4: addAxiom() |
1 | Inputs: inputGraph, axiomLabExp[], axiomUriExp[] |
2 | Outputs: newGraph |
3 | newGraph ← inputGraph |
4 | x ← isAxiomKeyword(axiomLabExp[0]) |
5 | if length(axiomLabExp) == 3, then: |
6 | newGraph.addTriple((axiomUriExp[1], x[0], axiomUriExp[2])) |
7 | newGraph.addTriple((axiomUriExp[1], RDFS.comment, “new merged axiom!”)) |
8 | else: |
9 | if x[1] != “concept_label”, then: |
10 | newGraph ← addAxiom(newGraph, axiomLabExp[1:endOfList], axiomUriExp[1:endOfList]) |
11 | return newGraph |
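A minimal Python rendering of this recursion, using rdflib and assuming the token lists and keyword set of Algorithm 1 (a sketch, not the paper’s implementation), could look as follows:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDFS

def add_axiom(graph, axiom_lab_exp, axiom_uri_exp, keywords):
    # Recursive sketch of Algorithm 4: peel off leading keyword tokens until
    # a three-token (keyword, subject, object) pattern remains.
    if len(axiom_lab_exp) == 3:
        # Base case: the first token is the predicate keyword, the remaining
        # two are the subject and object URIs.
        subject, obj = URIRef(axiom_uri_exp[1]), URIRef(axiom_uri_exp[2])
        graph.add((subject, URIRef(axiom_uri_exp[0]), obj))
        graph.add((subject, RDFS.comment, Literal("new merged axiom!")))
    elif axiom_lab_exp and axiom_lab_exp[0] in keywords:
        # Recursive case: drop the leading token and process the remainder.
        add_axiom(graph, axiom_lab_exp[1:], axiom_uri_exp[1:], keywords)
    return graph
```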
Algorithm 5 is used in line 4 of Algorithm 4. It accepts a term as input and tests whether the term is a reserved RDF or OWL 2 keyword. If the term is in the owlLogAx_keywords list created in line 5 of Algorithm 1, the matching keyword is returned (lines 5–6); otherwise, the term is returned together with a “concept_label” tag (lines 7–8).
Algorithm 5: isAxiomKeyword() |
1 | Inputs: term |
2 | Outputs: result[] |
3 | result[] ← null |
4 | index ← owlLogAx_keywords.indexOf(term) |
5 | if index >= 0, then: |
6 | result ← [owlLogAx_keywords[index]] |
7 | else: |
8 | result ← [term, “concept_label”] |
9 | return result |
Algorithm 6 is used in line 43 of the proposed algorithm (Algorithm 1) to perform the merging of individuals. It accepts as inputs the graphs of the base and candidate ontologies, along with a predefined threshold for the individual similarity scores (line 1). A SPARQL query defined in lines 3–7 is executed in lines 8 and 9 to retrieve all the individuals in the base and candidate ontologies. A loop in line 11 processes all the individuals in the candidate ontology, and within it a nested loop iterates over all the individuals in the base ontology. The inner loop employs the Jaccard algorithm to assess the similarity score between the individuals in lines 15–20. Next, the maximum similarity score is tested against the threshold in line 21. If the maximum similarity score reaches the threshold, the two individuals are considered similar and no change is made to the graph. Otherwise, if the similarity score is greater than 0.80, the individual is added to the graph in line 22; if not, the individual is discarded. The updated newGraph is then returned in line 23. The newGraph is the final graph of the merging process and represents the graph of the output ontology in line 43 of the main algorithm (Algorithm 1).
Algorithm 6: mergeIndividuals() |
1 | Inputs: baseGraph, candGraph, threshold |
2 | Outputs: newGraph |
3 | query ← “SELECT ?individual ?label |
4 | WHERE { |
5 | ?individual rdf:type ?class . |
6 | ?individual rdfs:label ?label . |
7 | }” |
8 | baseResults ← baseGraph.executeQuery(query) |
9 | candResults ← candGraph.executeQuery(query) |
10 | newGraph ← baseGraph |
11 | for row1 in candResults do: |
12 | max_val ← 0 |
13 | max_label ← null |
14 | max_uri ← null |
15 | for row2 in baseResults do: |
16 | jacard_sim ← jaccardSimilarity(row1.label, row2.label) |
17 | if jaccard_sim > max_val, then: |
18 | max_val ← jaccard_sim |
19 | max_label ← row2.label |
20 | max_uri ← row2.individual |
21 | if max_val < threshold and max_val > 0.80 then: |
22 | newGraph.addIndividual(row1) |
23 | return newGraph |
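The individual-merging step translates naturally into rdflib as well. The sketch below is our own illustration; the 0.80 lower bound and the acceptance band mirror line 21 of Algorithm 6, while the threshold default and Jaccard tokenization are assumptions. It retrieves labeled individuals with the SPARQL query, scores them with a token-level Jaccard similarity, and copies qualifying individuals into the merged graph:

```python
from rdflib import Graph

QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?individual ?label WHERE {
    ?individual rdf:type ?class .
    ?individual rdfs:label ?label .
}
"""

def jaccard(a, b):
    # Token-level Jaccard similarity between two labels.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def merge_individuals(base_graph, cand_graph, threshold=0.95):
    # Sketch of Algorithm 6: a candidate individual is merged only when its
    # best match in the base ontology lies between 0.80 and the threshold.
    new_graph = base_graph
    base_rows = list(base_graph.query(QUERY))
    for cand in cand_graph.query(QUERY):
        max_val = max((jaccard(str(cand.label), str(row.label))
                       for row in base_rows), default=0.0)
        if 0.80 < max_val < threshold:
            # Copy all triples describing this individual into the merged graph.
            for p, o in cand_graph.predicate_objects(cand.individual):
                new_graph.add((cand.individual, p, o))
    return new_graph
```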
6. Comparison with Related Works
In this section, we conduct a comparative analysis of our results in relation to other existing studies, shedding light on both the commonalities and differences observed. In Table 12, we provide an overview of how prior research has assessed the proposed methods for ontology merging, a summary of their findings and possible limitations. Additionally, we draw comparisons between our results and those presented in the literature. Table 12 uses the same encodings as Table 1: (1) ✔ indicates that the given criterion is met comprehensively; (2) ✘ signifies that the criterion is not fulfilled at all; and (3) ≈ denotes an intermediate degree of satisfaction, that is, a partial fulfillment of the criterion.
Table 12 reveals a prevalent trend among researchers, where a significant portion have not taken the complete spectrum of quality criteria into consideration. For instance, in [20], the only quality criterion examined was accuracy, as the authors sought to validate the reliability of WordNet for semantic concept detection. In both [1,11], the evaluation criteria encompassed cohesion and integrity, with no consideration of the accuracy and execution time performance. Notably, the authors of [1] emphasized that comparing the results achieved by their proposed method, namely OnotMerger, to those of other works for an overall performance assessment proved challenging. This was attributed to OnotMerger’s specific requirements and inputs, which were not aligned with those of other methods.
In contrast, [16] offered a comprehensive evaluation addressing a wide array of quality criteria, providing compelling evidence that their method outperformed the HSSM approach. However, it is worth noting that their comparative analysis was limited to a single method. Meanwhile, in the case of the ATOM method [18], the authors leveraged graph metrics, particularly the number of leaf paths in the knowledge graph, to evaluate their approach. The results highlighted a substantial reduction in the number of leaf paths and total classes, aligning well with the integrity criterion. Furthermore, the study delved into the execution time performance to offer a further perspective on the performance of ATOM.
Compared to the abovementioned related works, our proposed algorithm addressed all four quality criteria outlined in Table 12. Our findings indicated the effectiveness of the proposed algorithm in merging the candidate ontologies with a base ontology. However, the proposed algorithm displayed a higher execution time when aligning large-scale ontologies. Furthermore, as demonstrated in Section 5.7.1, the proposed algorithm may have a worst-case time complexity on the order of a fifth-degree polynomial. These aspects warrant further attention in future research, as we seek to optimize the algorithm for applications involving large datasets, such as big data applications.
7. Conclusions and Future Work
In this study, we aimed to address a broad range of criteria for ontology merging, encompassing concepts (both lexical and semantic), properties, individuals, taxonomy, logical axioms, and annotations. We proposed a new algorithm that performs ontology merging in three stages, namely, concepts and annotations, logical axioms, and individuals. This algorithm utilizes RDF graphs to iteratively merge the candidate ontology with the base ontology while preserving the structural integrity of the base ontology. In the first stage, the merging of concepts and annotations is achieved through the utilization of a Machine Learning-based framework called SM-DTR, WordNet, and BERT. Subsequently, the updated RDF graph serves as the input for the second stage, where logical axioms from the base and candidate ontologies are aligned and merged using the Word2Vec model and the Jaro–Winkler algorithm. The resulting RDF graph then proceeds to the final stage, where the individuals are merged using the Jaccard similarity algorithm. To assess the algorithm’s performance, we conducted asymptotic analysis and computational evaluations, along with the extraction of the base and graph metrics from the resulting output ontologies to evaluate their quality. While the findings revealed that the proposed algorithm is time-consuming according to the time complexity analysis and running time results, the quality of the resultant ontologies improved significantly. The resulting output ontologies met the established ontology-merging quality criteria, including integrity, cohesion, and accuracy. Furthermore, we compared the results achieved by the proposed algorithm with previous endeavors in the literature and found that our approach comprehensively addresses the quality criteria, unlike many existing studies.

Future efforts will focus on enhancing the proposed algorithm by offering configurable settings that allow users to select alternative algorithms and techniques for each merging stage, tailoring the algorithm to specific user needs. Additionally, we will explore improvements in computational performance to support big data applications.