1. Introduction
Ontology provides a standard and shared representation of domain knowledge, and is regarded as a solution to the data heterogeneity problem [1]. With the development of the semantic web (SW) [2,3], more and more ontology-based intelligent systems have been developed for E-learning [4], personalized search and browsing [5], and collaborative molecular biology [6], and these systems need to collaborate with each other to enhance their intelligent behaviors. To this end, it is necessary to find the entity mappings between their ontologies, a task known as ontology matching [7]. Essentially, matching two ontologies aims to find a mapping matrix (MM) that describes an alignment: its rows and columns correspond to the entities of the two ontologies, respectively, and an element equal to 1 denotes that the two corresponding entities are mapped, while 0 denotes that they are not.
Table 1 shows an example of an MM between two ontologies $O_1$ and $O_2$, involving the concept “Article” in $O_1$ and the concept “Paper” in $O_2$. Since the number of entities can be very large, most MM elements are zero (i.e., the MM is a sparse matrix), and determining an optimal MM is essentially a sparse optimization problem [8].
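To make the MM encoding concrete, the following toy sketch builds a small 0/1 matrix and reads the mapped entity pairs back out of it. Apart from “Article” and “Paper”, the entity names and the matrix values are invented for illustration; they are not the content of Table 1. The `density` helper simply shows how few elements are 1, which is the sparsity the text refers to.

```python
# Toy mapping matrix (MM) between two hypothetical ontologies O1 and O2.
# Entity names other than "Article" and "Paper" are invented for illustration.
o1_entities = ["Article", "Author", "Review"]
o2_entities = ["Paper", "Writer", "Reviewer", "Venue"]

# mm[i][j] == 1 means o1_entities[i] is mapped to o2_entities[j].
mm = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

def correspondences(mm, rows, cols):
    """Return the (O1 entity, O2 entity) pairs whose MM element is 1."""
    return [(rows[i], cols[j])
            for i, row in enumerate(mm)
            for j, bit in enumerate(row) if bit == 1]

def density(mm):
    """Fraction of MM elements equal to 1; small values indicate sparsity."""
    total = sum(len(row) for row in mm)
    ones = sum(sum(row) for row in mm)
    return ones / total

print(correspondences(mm, o1_entities, o2_entities))
# [('Article', 'Paper'), ('Author', 'Writer')]
print(density(mm))  # 2 of the 12 elements are 1
```

Even in this tiny example most elements are 0; with thousands of entities per ontology the fraction of 1s becomes far smaller.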
Recently, evolutionary algorithms (EAs) have become a popular method for addressing the ontology matching problem [9], but the existing single-objective EAs use the f-measure [10], i.e., the harmonic mean of recall and precision, to evaluate MM quality, which can bias the improvement of the solutions: a solution might sacrifice one of the two metrics to improve the other. To ensure unanimous improvements on both metrics, this work defines the ontology matching problem as a sparse multi-objective optimization problem (SMOOP) and proposes a multi-objective evolutionary algorithm with relevance matrix (MOEA-RM) to address it. The performance of traditional MOEAs usually degrades on SMOOPs, since the search space grows exponentially with the number of decision variables [11,12]. To face this challenge, various strategies have been proposed in the past decades, such as decision variable grouping [13], decision variable analysis [14], and special initialization and evolutionary operators [15,16,17]. However, it is still difficult to maintain the sparsity of the population, and such algorithms often converge slowly. To overcome these drawbacks, MOEA-RM introduces a relevance matrix (RM) that adaptively measures each gene’s (i.e., each correspondence’s) relevance to the objectives. The RM is used to initialize the population so as to ensure its sparsity, and to guide the crossover and mutation operators so as to improve both the algorithm’s convergence speed and the sparsity of the generated individuals. In particular, this work’s contributions are as follows:
The multi-objective ontology matching problem is formally defined;
An MOEA-RM is presented to address the ontology matching problem, which uses RM-based initialization, crossover and mutation to adaptively maintain the population’s diversity and improve the algorithm’s convergence speed;
The proposed MOEA-RM is tested on 39 different ontology matching tasks, and the experimental results show its effectiveness.
The rest of this paper is organized as follows:
Section 2 overviews the existing MOEAs for addressing the ontology matching problem;
Section 3 provides the preliminary background knowledge and defines the problem investigated;
Section 4 presents the RM-based MOEA, and
Section 5 shows the experimental results; finally,
Section 6 concludes this work and points out future work.
2. Related Work
Compared with other popular artificial intelligence techniques, such as neural networks [18,19,20,21] and data mining [22,23], evolutionary algorithms are better able to refine an alignment’s quality. In recent years, various MOEA-based ontology matching techniques have been proposed. To trade off an alignment’s completeness and correctness, Xue et al. use NSGA-II [24] and MOEA/D [25] to tune the parameters of matching systems. These MOEA-based matching techniques are able to provide different Pareto solutions for decision makers. Acampora et al. also use NSGA-II to optimize the alignment’s quality [26], and further compare the search performance of different MOEA-based matching techniques [27]. To face the challenge of large-scale ontology matching, Xue et al. present a general framework for MOEA-based large-scale matching, which first divides the two ontologies into several similar segments and then uses an MOEA to match them separately; to ensure the population’s diversity, they further use an adaptive strategy to guide the algorithm’s search direction [28]. They also reduce the MOEA’s computational complexity by using a compact encoding mechanism, which is able to address the ternary compound ontology matching problem [29], where an entity correspondence might consist of more than two entities. To enhance the convergence speed, Lv et al. [30] involve an expert in the MOEA’s evolving process and use the expert’s knowledge to improve the alignment’s quality. An interactive MOEA has also been used to match sensor ontologies on the Internet of Things [31].
The existing MOEA-based ontology matching techniques model the ontology matching process as a continuous optimization problem. However, they need to construct several similarity matrices to maintain the similarity values of the candidate entity mappings, which incurs a high computational cost. To overcome this drawback, this work defines ontology matching as a 0–1 integer optimization problem and, considering its sparsity, proposes an RM to adaptively maintain the population’s sparsity and guide the algorithm’s search direction.
3. Ontology Matching Problem
An ontology consists of classes that define the domain concepts, datatype properties that describe the classes’ features, and object properties that represent the relationships between classes [32,33]. Different ontologies might define the same entity in different ways, which yields the heterogeneity problem. To address this issue, it is necessary to find the correspondences between heterogeneous entities in an automatic or semi-automatic way. The set of correspondences found between the entities is called an ontology alignment, where each correspondence mainly consists of two entities, the relationship between them (typically equivalence, ≡) and their similarity value. The similarity value of two entities is an important metric of whether they are similar, and is typically calculated through a similarity measure [34].
To measure the quality of an alignment, the classic metrics are recall, precision and f-measure [10]. However, these metrics require a reference alignment, which is not always available in practical matching tasks. Therefore, this work uses two approximate metrics, MatchCoverage and Frequency [35], which respectively estimate an alignment’s recall and precision. To be specific, given two ontologies $O_1$ and $O_2$, the MatchCoverage and Frequency of their alignment $A$ are respectively defined as follows:
where $|O_1'|$ and $|O_2'|$ are respectively the cardinalities of the matched entity sets in $O_1$ and $O_2$, $|A|$ is the number of correspondences in $A$, and $\delta_i$ is the $i$-th correspondence’s similarity value.
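The exact MatchCoverage and Frequency formulas are given in [35] and are not reproduced here. As a hedged sketch, the helper below only decodes an MM together with an entity similarity matrix into the ingredients listed above: the sizes of the matched entity sets in each ontology, the number of correspondences, and each correspondence’s similarity value. The function name and the dictionary keys are hypothetical.

```python
def alignment_stats(mm, sim):
    """Collect the quantities that MatchCoverage and Frequency are built from.

    mm  -- 0/1 mapping matrix (rows: entities of O1, columns: entities of O2)
    sim -- entity similarity matrix of the same shape (values assumed in [0, 1])
    """
    # Every element equal to 1 is one correspondence of the alignment A.
    pairs = [(i, j) for i, row in enumerate(mm)
             for j, bit in enumerate(row) if bit == 1]
    matched_o1 = {i for i, _ in pairs}       # matched entity set in O1
    matched_o2 = {j for _, j in pairs}       # matched entity set in O2
    deltas = [sim[i][j] for i, j in pairs]   # similarity value of each correspondence
    return {
        "matched_o1": len(matched_o1),       # |O1'|
        "matched_o2": len(matched_o2),       # |O2'|
        "num_correspondences": len(pairs),   # |A|
        "similarities": deltas,              # delta_1 ... delta_|A|
    }

stats = alignment_stats(
    [[1, 0], [1, 0], [0, 0]],
    [[0.9, 0.1], [0.6, 0.2], [0.3, 0.4]],
)
print(stats)
```

In this example two entities of the first ontology are mapped to the same entity of the second, so the matched entity sets have sizes 2 and 1 while the alignment contains two correspondences.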
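Given an alignment $A$, the ontology matching problem is defined in Equation (3):
$$\max_{X}\ \big(\mathrm{MatchCoverage}(X),\ \mathrm{Frequency}(X)\big) \quad \text{s.t. } X = (x_{ij})_{|O_1| \times |O_2|},\ x_{ij} \in \{0, 1\} \tag{3}$$
where $|O_1|$ and $|O_2|$ are the numbers of concepts in ontologies $O_1$ and $O_2$, the decision variable is the MM $X$, and the two objectives are to maximize the MatchCoverage and Frequency of the alignment corresponding to the MM.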
4. Multi-Objective Evolutionary Algorithm with Relevance Matrix
Before the matching process, entities such as classes, datatype properties and object properties are extracted from the two ontologies. Their names are pre-processed through tokenization and stemming, and then used to construct the entity similarity matrix for evaluating the individuals’ fitness values. In particular, the similarity matrix’s rows and columns are respectively the two ontologies’ entities, and its elements are the similarity values of the corresponding entity pairs. After that, the ontology alignment is optimized through MOEA-RM, whose framework is presented in Algorithm 1.
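The section does not specify the tokenizer, the stemmer or the similarity measure, so the following is only an illustrative sketch under explicit assumptions: names are split on camelCase boundaries, underscores, hyphens and spaces; a crude suffix stripper stands in for a real stemmer such as Porter’s; and Jaccard similarity on the stemmed token sets is used as the (hypothetical) similarity measure. All function names are invented for illustration.

```python
import re

def tokenize(name):
    """Split an entity name such as 'hasFirstName' or 'first_name' into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)  # camelCase -> 'camel Case'
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]

def stem(token):
    """Crude suffix stripper (an assumption; a real system would use e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similarity_matrix(entities1, entities2):
    """Rows: entities of the first ontology; columns: entities of the second."""
    stems1 = [{stem(t) for t in tokenize(e)} for e in entities1]
    stems2 = [{stem(t) for t in tokenize(e)} for e in entities2]
    return [[jaccard(s1, s2) for s2 in stems2] for s1 in stems1]

sim = similarity_matrix(["hasAuthors", "Paper"], ["has_author", "Article"])
print(sim)  # [[1.0, 0.0], [0.0, 0.0]]
```

A purely lexical measure like this one scores “Paper” and “Article” as unrelated, which illustrates why the choice of similarity measure matters for the fitness evaluation.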
Algorithm 1 The framework of multi-objective evolutionary algorithm with relevance matrix
Initialization(P);
Non-dominated_Sort(P);
while terminating condition is not met do
    Update(RM);
    Generate Q via RM-based operators;
    Non-dominated_Sort(P ∪ Q);
    Select_Next_Generation(P ∪ Q);
    Non-dominated_Sort(P);
end while
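Since the framework follows NSGA-II, the non-dominated sorting step can be illustrated with the standard fast non-dominated sorting procedure of NSGA-II (Deb et al.), here written for maximizing both objectives, e.g., (MatchCoverage, Frequency). This is a generic sketch, not the authors’ code; crowding distance and the actual next-generation selection are omitted.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b when all objectives are maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_sort(objs):
    """Fast non-dominated sorting; returns fronts as lists of solution indices."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # dominated[i]: solutions that i dominates
    counts = [0] * n                    # counts[i]: number of solutions dominating i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)         # first front: dominated by nobody
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:      # all dominators already placed in a front
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]                  # drop the trailing empty front

# Hypothetical (MatchCoverage, Frequency) values of four individuals.
objs = [(0.8, 0.5), (0.6, 0.7), (0.5, 0.4), (0.9, 0.9)]
fronts = non_dominated_sort(objs)
print(fronts)  # [[3], [0, 1], [2]]
```

The first front returned here plays the role of the PF from which the RM is computed in each generation.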
The MOEA-RM framework is similar to that of NSGA-II, while its novelties lie in the RM maintenance in each generation and the RM-based evolutionary operators. This work uses an MM, as described in Table 1, to encode an individual. An RM is a statistical matrix with exactly the same numbers of rows and columns as the MM, which reflects the gene distribution of the current generation’s Pareto front (PF). In each generation, we sum the MMs of all PF solutions to obtain the RM, which is defined as follows:
$$RM_{ij} = \sum_{k=1}^{|PF|} x_{ij}^{k}$$
where $x_{ij}^{k}$ is the value of the element in the $i$-th row and $j$-th column of the $k$-th MM, and $|PF|$ is the number of PF solutions. A higher RM element value means that more PF solutions in the current population have selected the corresponding entity pair, which can be used to guide the algorithm either to speed up its convergence or to enhance the population’s diversity. In the following, we describe the RM-based initialization and the RM-based evolutionary operators, respectively.
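The RM definition is an element-wise sum, which the sketch below computes over a list of PF solutions’ MMs. Plain nested lists are used so that the example needs no third-party library; the function name and the three example MMs are hypothetical.

```python
def relevance_matrix(pf_mms):
    """Element-wise sum of the MMs of the current Pareto-front solutions.

    pf_mms -- non-empty list of 0/1 matrices with identical shape
    """
    rows, cols = len(pf_mms[0]), len(pf_mms[0][0])
    rm = [[0] * cols for _ in range(rows)]
    for mm in pf_mms:
        for i in range(rows):
            for j in range(cols):
                rm[i][j] += mm[i][j]  # count PF solutions that select pair (i, j)
    return rm

pf = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
]
print(relevance_matrix(pf))  # [[3, 1], [0, 1]]
```

Here every PF solution selects the pair in row 0, column 0, so that element reaches the maximum value 3, while the pair in row 1, column 0 is selected by none.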
4.1. Initialization
Given two ontologies $O_1$ and $O_2$, a population $P$ of size $N$, and an RM, each individual is initialized according to Algorithm 2.
Here, we first randomly initialize the population with uniform probability, and then sum all MMs to obtain the RM used for initialization. After that, for each element of an individual’s MM, we randomly pick another element with the same value in this MM and compare their RM values. If the former’s RM value is larger, i.e., more individuals have chosen this gene bit as 1, the MM value is changed from 1 to 0; otherwise, it is changed from 0 to 1.
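Algorithm 2 Initialization
for k = 1; k ≤ N; k++ do
    for i = 1; i ≤ |O_1|; i++ do
        for j = 1; j ≤ |O_2|; j++ do
            x_ij^k = a value randomly chosen from {0, 1} with uniform probability;
        end for
    end for
end for
update RM with P;
for k = 1; k ≤ N; k++ do
    for i = 1; i ≤ |O_1|; i++ do
        for j = 1; j ≤ |O_2|; j++ do
            if x_ij^k = 1 then
                Randomly select x_i'j'^k with value 1;
                if RM_ij > RM_i'j' then
                    x_ij^k = 0;
                end if
            end if
            if x_ij^k = 0 then
                Randomly select x_i'j'^k with value 0;
                if RM_ij ≤ RM_i'j' then
                    x_ij^k = 1;
                end if
            end if
        end for
    end for
end for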
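The initialization rule can be sketched as follows, but several details are not fixed by the text, so this is one possible reading rather than the authors’ implementation. Assumptions: each bit is processed once (if/elif instead of two consecutive ifs); the comparison element is drawn from the same individual among the bits that had the same value before this pass; and “otherwise” is read as “the bit’s RM value is not larger than that of the selected bit”. The function name and parameters are hypothetical.

```python
import random

def rm_initialize(n_pop, rows, cols, rng=None):
    """Sketch of RM-based initialization under the assumptions stated above.

    n_pop: population size N; rows/cols: numbers of entities in O1 and O2.
    Returns the adjusted population (list of 0/1 matrices) and the RM.
    """
    rng = rng or random.Random()
    # Step 1: random 0/1 population with uniform probability.
    pop = [[[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]
           for _ in range(n_pop)]
    # Step 2: RM = element-wise sum of all initial MMs.
    rm = [[sum(ind[i][j] for ind in pop) for j in range(cols)]
          for i in range(rows)]
    # Step 3: compare each bit's RM value with that of another bit of the
    # same value in the same individual (positions taken before this pass).
    for ind in pop:
        cells = [(i, j) for i in range(rows) for j in range(cols)]
        ones = [c for c in cells if ind[c[0]][c[1]] == 1]
        zeros = [c for c in cells if ind[c[0]][c[1]] == 0]
        for i, j in cells:
            if ind[i][j] == 1 and len(ones) > 1:
                a, b = rng.choice([c for c in ones if c != (i, j)])
                if rm[i][j] > rm[a][b]:
                    ind[i][j] = 0  # 1 -> 0 when this bit's RM value is larger
            elif ind[i][j] == 0 and len(zeros) > 1:
                a, b = rng.choice([c for c in zeros if c != (i, j)])
                if rm[i][j] <= rm[a][b]:
                    ind[i][j] = 1  # assumed reading of "otherwise": 0 -> 1
    return pop, rm

pop, rm = rm_initialize(5, 3, 4, random.Random(0))
```

Passing a seeded `random.Random` instance makes the sketch reproducible; the resulting sparsity depends entirely on how the ambiguous comparison rules are read.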
4.2. Relevance Matrix Based Evolutionary Operator
We use the RM in crossover and mutation to adaptively trade off the algorithm’s convergence and diversity; the pseudocode is shown in Algorithm 3.
During crossover, the offspring individual $z$ is initialized as a copy of one of its parents, say $p$. Then, for each gene that differs between $p$ and $q$, the parameter $\alpha$ first decides whether the current evolutionary strategy focuses on convergence or on diversity. In the former case, we compare the RM element $RM_{ij}$ corresponding to the gene with a randomly selected element $RM_{i'j'}$: if $RM_{ij}$ is larger, $z_{ij}$ is set to 1; otherwise, it is set to 0. In the latter case, if $RM_{ij}$ is larger, $z_{ij}$ is set to 0; otherwise, it is set to 1. For mutation, we first decide whether a gene bit should be mutated according to the mutation rate $p_m$. If so, the parameter $\alpha$ again decides whether the current strategy prefers convergence or diversity, and the remaining operations are the same as those in crossover.
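The crossover and mutation rules above share one RM-guided decision, which the sketch below factors into a helper. The names `alpha` (convergence preference) and `p_m` (mutation rate) are labels chosen here for the parameters described in the text; the comparison element $(i', j')$ is assumed to be drawn uniformly from the whole matrix. The update formula for `alpha` is not reproduced; the linear schedule in the last comment is a hypothetical stand-in that only matches the stated “small early, large late” behaviour.

```python
import random

def _rm_decide(rm, i, j, alpha, rng):
    """RM-guided value for gene (i, j); (i', j') is drawn uniformly (assumption)."""
    a, b = rng.randrange(len(rm)), rng.randrange(len(rm[0]))
    larger = rm[i][j] > rm[a][b]
    if rng.random() < alpha:        # enhance convergence: follow the RM
        return 1 if larger else 0
    return 0 if larger else 1       # enhance diversity: go against the RM

def rm_crossover(p, q, rm, alpha, rng):
    """Offspring starts as a copy of parent p; only genes where p and q differ change."""
    z = [row[:] for row in p]
    for i in range(len(p)):
        for j in range(len(p[0])):
            if p[i][j] != q[i][j]:
                z[i][j] = _rm_decide(rm, i, j, alpha, rng)
    return z

def rm_mutation(x, rm, alpha, p_m, rng):
    """Each gene is mutated with probability p_m using the same RM-guided rule."""
    y = [row[:] for row in x]
    for i in range(len(x)):
        for j in range(len(x[0])):
            if rng.random() < p_m:
                y[i][j] = _rm_decide(rm, i, j, alpha, rng)
    return y

rng = random.Random(1)
rm = [[0, 3], [2, 1]]
p, q = [[1, 0], [0, 1]], [[0, 0], [0, 1]]
child = rm_crossover(p, q, rm, 1.0, rng)
# Hypothetical schedule: alpha = t / t_max for generation t of t_max.
```

With `alpha = 1.0` the only differing gene has the smallest RM value, so it can never be strictly larger than the randomly selected element and the child receives 0 there; with `alpha = 0.0` the diversity branch sets it to 1 instead.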
In particular, the parameter $\alpha$ controls the algorithm’s preference for convergence; it is updated in each generation to adaptively trade off the algorithm’s exploration and exploitation. In the early stage, $\alpha$ should be small to ensure the population’s diversity, while in the late stage, it should be large to speed up the algorithm’s convergence. On this basis, given the current generation $t$ and the maximum generation $T$, $\alpha$ is updated as follows:
Algorithm 3 Relevance matrix-based crossover and mutation
********** Crossover **********
[p, q] = randomly select two parents from the population;
z = p; //Initialize the offspring individual z
for i = 1; i ≤ |O_1|; i++ do
    for j = 1; j ≤ |O_2|; j++ do
        if p_ij ≠ q_ij then
            Randomly select an element (i', j');
            if rand(0,1) < α then //Enhance the convergence
                if RM_ij > RM_i'j' then z_ij = 1; else z_ij = 0; end if
            else //Enhance the diversity
                if RM_ij > RM_i'j' then z_ij = 0; else z_ij = 1; end if
            end if
        end if
    end for
end for
********** Mutation **********
for k = 1; k ≤ N; k++ do
    for i = 1; i ≤ |O_1|; i++ do
        for j = 1; j ≤ |O_2|; j++ do
            if rand(0,1) < p_m then
                Randomly select an element (i', j');
                if rand(0,1) < α then //Enhance the convergence
                    if RM_ij > RM_i'j' then x_ij^k = 1; else x_ij^k = 0; end if
                else //Enhance the diversity
                    if RM_ij > RM_i'j' then x_ij^k = 0; else x_ij^k = 1; end if
                end if
            end if
        end for
    end for
end for
6. Conclusions and Future Work
Matching ontologies is critical to SW development. To determine high-quality ontology alignments, this work models the ontology matching problem as an SMOOP and proposes an MOEA-RM to address it effectively. To maintain the population’s diversity and overcome premature convergence, MOEA-RM uses an RM to adaptively measure each gene’s (i.e., each correspondence’s) relevance to the objectives; the RM is then used to initialize the population so as to ensure its sparsity, and to guide the crossover and mutation operators. The experiment uses the OAEI benchmark and conference tracks to test MOEA-RM’s performance, and the results show that our approach is able to effectively match ontologies with various heterogeneous characteristics.
In the future, we are interested in using MOEA-RM to determine m:n correspondences, which are more challenging because of their complicated semantics and high computational cost. Additionally, the similarity measure should be improved to distinguish complex correspondences, and semantic reasoning techniques could help to achieve higher precision. Last but not least, the evolutionary operators could also be improved to enhance the algorithm’s efficiency.