Uncertainty in Automated Ontology Matching: Lessons Learned from an Empirical Experimentation

Data integration is considered a classic research field and a pressing need within the information science community. Ontologies play a critical role in such a process by providing well-consolidated support to link and semantically integrate datasets via interoperability. This paper approaches data integration from an application perspective, looking at techniques based on ontology matching. An ontology-based process may only be considered adequate by assuming manual matching of different sources of information. However, since the approach becomes unrealistic once the system scales up, automation of the matching process becomes a compelling need. Therefore, we have conducted experiments on actual data with the support of existing tools for automatic ontology matching from the scientific community. Even considering a relatively simple case study (i.e., the spatio-temporal alignment of global indicators), outcomes clearly show significant uncertainty resulting from errors and inaccuracies along the automated matching process. More concretely, this paper aims to test on real-world data a bottom-up knowledge-building approach, discuss the lessons learned from the experimental results of the case study, and draw conclusions about uncertainty and uncertainty management in an automated ontology matching process. While the most common evaluation metrics clearly demonstrate the unreliability of fully automated matching solutions, properly designed semi-supervised approaches seem to be mature for a more generalized application.


Introduction
Data integration, defined as "the problem of combining data residing at different sources and providing the user with a unified view of these data" [1], can be considered a well-covered research field, as witnessed by the myriad of contributions in the literature. Its relevance is determined by its practical implications in different application domains, and it is well recognized within the information science community. Most modern systems work at a semantic level [2], where data integration may be understood at different levels (e.g., concept [3] or multi-media [4]). Semantic technology has been largely adopted in data integration [1]. It is central to a more holistic approach, where data integration is considered part of a more complex knowledge-building process. The proper adoption of semantic technology is extremely effective in supporting data integration and reuse via interoperability [5]. Overall, associating formal semantics with data is a key step in the fields of artificial intelligence and database management. In addition, the analysis of semantic data can underpin sophisticated data mining techniques [6,7].
In the context of this work, "knowledge building" is seen as the process of combining "raw data" in order to create "rich data spaces" in which "semantics" are defined in a formal way [1]. While data integration aims at establishing a [...]
An ontology can be interpreted as a tuple $o = \langle C, I, P, V, Ax \rangle$, such that $C$ is the set of concepts (or classes of individuals, or classes); $I$ is the set of individuals (or instances/objects); $P$ is the set of properties (or relations), which is divided into two sets: $OP$ is the set of object properties (or relationships/associations), and $DP$ is the set of datatype properties (or attributes); $V$ is the set of datatype values (or data values, or data literals) specified by data types; and $Ax$ is the set of axioms, such as axioms of subsumption (a.k.a. inclusion, "is-a", child-parent, sub-entity-super-entity, hyponymy-hypernymy, specialization-generalization) between two concepts or two properties; axioms of instantiation (or typing) between concepts and individuals, properties and property instances, data types and data values; axioms of disjointness (or exclusion) between two concepts or two properties; axioms of equivalence (or assignment) between two concepts or two properties; as well as other logical axioms such as restrictions on properties and complex relations.
Instantiation axioms between classes and their individuals are also called class assertions. For example, the individual "Italy" is an instance of the class "Country" (<Italy → type → Country>), and the individual "Rome" is an instance of the class "City" (<Rome → type → City>). Instantiation axioms between properties and their property instances are also called property assertions. An object property assertion means that an object property links an individual of a given class (called the domain) to an individual of a given class (called the range). For example, "Rome" is the capital of "Italy" (<Rome → capitalOf → Italy>). A datatype property assertion, in contrast, means that a datatype property links an individual of a given class (called the domain) to a data value of a given data type (called the range), e.g., integer, string, boolean, or real. For example, "Rome" has a population of "4.400.000", which is a value of the data type Integer (<Rome → has_population → "4.400.000">). The subject of a property is called its domain, and the object of a property is called its range (<domain → property → range>).
Thus, an ontology is a set of triples <subject → predicate → object> (or <entity1 → relation → entity2>). It can also be viewed as a directed labeled graph, in which entities are nodes and relations are edges.
The classes, object properties, datatype properties, individuals, and data values are collectively called entities (or resources, with the exception of data values, which have no identity, i.e., no unique identifier).
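The triple and graph views described above can be illustrated with a minimal Python sketch. The triples reuse the examples from the text; the helper name `as_graph` and the adjacency-map representation are assumptions made for illustration only.

```python
from collections import defaultdict

# Toy ontology as (subject, predicate, object) triples.
# Entity names come from the examples in the text; the set itself is illustrative.
triples = {
    ("Italy", "type", "Country"),              # class assertion
    ("Rome", "type", "City"),                  # class assertion
    ("Rome", "capitalOf", "Italy"),            # object property assertion
    ("Rome", "has_population", "4.400.000"),   # datatype property assertion
}


def as_graph(triples):
    """Directed labeled graph: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = as_graph(triples)
outgoing_from_rome = sorted(graph["Rome"])
```

Here `outgoing_from_rome` lists the three labeled edges that leave the node "Rome", which is the graph reading of the three triples having "Rome" as subject.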

Ontology Matching in the Knowledge-Building Process
The knowledge-building process, as understood in this paper, is not limited to data integration; it also includes semantic enrichment and annotations. Different ontology-based solutions have been proposed to integrate data within a range of scientific and business contexts [13-15], as well as to support the integration among systems [16]. The process can be centralized, meaning that a global schema can be adopted to provide integrated access to information [13].

Knowledge-Building Process
Knowledge building through data integration is understood as the process of semantically integrating raw data [1]. Ontologies can be used to define rich data spaces (knowledge) in which semantics are formally specified. In this work, the knowledge-building process takes as input heterogeneous raw datasets (assumed to be a relational database) and returns as output an integrated semantic data space represented by a knowledge graph. As shown in Figure 1, this process is composed of the following three main steps: (i) conversion of datasets into ontologies, (ii) ontology matching, and (iii) ontology integration (or ontology merging).
In other words, the system takes datasets (one or more tables according to the classical relational model) as input. The considered datasets provide different information about a domain, with some overlapping. First, if datasets are not available in a semantic format (i.e., RDF or OWL), they are converted into ontologies. Second, semantic matching between each pair of ontologies is performed to obtain pairwise alignments containing correspondences between the equivalent entities. Third, ontologies are aggregated and then merged by adding equivalence links (i.e., equivalence bridging axioms) reflecting the alignments' correspondences. Finally, an integrated ontology (or a knowledge graph) composed of the input ontologies and the equivalence bridging axioms is generated.
A semi-supervised or even fully automated knowledge-building process can be established by using external tools that implement dataset conversion, ontology matching, and ontology integration.

Dataset Conversion
By adopting semantic web technology, physical integration is implicitly supported, since data available in a semantic format can be systematically and automatically added to the data space. This assumes that datasets and alignments are already imported into the semantic space. However, the conversion of data into a semantic format is not always an obvious step, especially for non-technical users. To properly support this conversion process, the virtual table model can be adopted. The latter is a simple and intuitive approach to data integration that assumes the target dataset is described in relational tables and automatically translated into a semantic format. From a user perspective, an external dataset may be mapped into a virtual table and automatically converted to OWL. A relational table is converted to an ontology as follows:
• The ID (or the primary key) in the relational model is converted into a class/concept in the ontology model. In this case, the primary key should not be a composite field (i.e., composed of multiple fields).
• Associations (or foreign keys) in the relational model are converted into object properties in the ontology model.
• Attributes (or data fields) in the relational model are converted into datatype properties in the ontology model.
• Key values (Data) in the relational model are converted into individuals/instances in the ontology model.
• Attribute values (Data) in the relational model are converted into data values (or literals) in the ontology model.
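The conversion rules listed above can be sketched as follows. This is a minimal illustration rather than the tool used in this work: the function `table_to_triples`, the `Literal` wrapper, the column names, and the choice to pass the class name explicitly are all assumptions.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Data value (no identity of its own), as opposed to an individual."""
    value: object


def table_to_triples(class_name, primary_key, foreign_keys, rows):
    """Convert the rows of one relational table into ontology triples.

    class_name   : name of the class derived from the (single-field) primary key
    primary_key  : column whose values become individuals
    foreign_keys : columns converted into object properties
    Every other column is treated as a datatype property.
    """
    triples = []
    for row in rows:
        individual = row[primary_key]                    # key value -> individual
        triples.append((individual, "type", class_name))
        for column, value in row.items():
            if column == primary_key or value is None:
                continue
            if column in foreign_keys:                   # association -> object property
                triples.append((individual, column, value))
            else:                                        # attribute -> datatype property
                triples.append((individual, column, Literal(value)))
    return triples


# Hypothetical one-row table reusing the running example.
rows = [{"city_id": "Rome", "capitalOf": "Italy", "has_population": 4400000}]
triples = table_to_triples("City", "city_id", {"capitalOf"}, rows)
```

Wrapping attribute values in `Literal` keeps data values distinguishable from individuals, mirroring the distinction between object properties and datatype properties.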

Ontology Matching
Ontology matching. Also known as ontology alignment, ontology matching is the process of finding semantic correspondences (mainly similarities) among entities from different ontologies (commonly two ontologies). Each type of entity (classes, object properties, datatype properties, and instances) is matched in isolation, such that no class-to-property, class-to-individual, or property-to-individual correspondences are found. Entity pairs that have the same meaning should be matched, whether or not they have the same name. Ontology matching is an essential preceding step for ontology integration.
[...] or overlap (≬); and $\theta$ is a confidence value (or a confidence score) that assigns a degree of correctness to the identified relation and typically ranges in the interval $[0, 1]$. The higher the confidence value, the more likely the correspondence holds [17]. The confidence value of a given correspondence reflects the matcher's "belief" in the correctness (or reliability) of that correspondence [18]. In the case of an equivalence relation, $c$ is denoted by $\langle e_{O_1} \equiv e_{O_2} \rangle$, and $\theta$ reflects the similarity value (a.k.a. the similarity measure, the similarity score, or the similarity degree).
Ontology alignment. Denoted as $A = \{c_1, c_2, \ldots, c_n\}$, an ontology alignment is a set of semantic correspondences relating entities from two ontologies. It is normally understood as the result of an ontology-matching process.
Alignments are usually expressed in the RDF Alignment format [19], which is the standard format for representing ontology alignments. In general, an alignment is considered a partial many-to-many alignment. Indeed, in a partial alignment, there may be many entities in $O_1$ or $O_2$ that have no counterpart (equivalent) entities in the other ontology, whereas in a many-to-many alignment, an entity from one ontology ($O_1$ or $O_2$) can be matched to one or many entities from the other ontology. A many-to-many alignment is both a one-to-many alignment and a many-to-one alignment.
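The notions of correspondence, partial alignment, and many-to-many alignment can be made concrete with a small Python sketch. The `Correspondence` record, the helper names, and the entity names are hypothetical; the sketch also assumes that the alignment contains no duplicate correspondences.

```python
from collections import Counter
from typing import NamedTuple


class Correspondence(NamedTuple):
    source: str        # entity from the first ontology
    target: str        # entity from the second ontology
    relation: str      # e.g. "≡" for equivalence
    confidence: float  # value in [0, 1]


def unmatched(entities, alignment, side):
    """Entities of one ontology ('source' or 'target' side) without a counterpart."""
    matched = {getattr(c, side) for c in alignment}
    return set(entities) - matched


def multiplicity(alignment):
    """Return the pair of flags (one_to_many, many_to_one)."""
    per_source = Counter(c.source for c in alignment)
    per_target = Counter(c.target for c in alignment)
    one_to_many = any(n > 1 for n in per_source.values())
    many_to_one = any(n > 1 for n in per_target.values())
    return one_to_many, many_to_one


# Hypothetical entities and correspondences.
o1_entities = {"City", "Nation", "State", "River"}
alignment = [
    Correspondence("City", "Town", "≡", 0.8),
    Correspondence("City", "Municipality", "≡", 0.7),
    Correspondence("Nation", "Country", "≡", 0.9),
    Correspondence("State", "Country", "≡", 0.6),
]
without_counterpart = unmatched(o1_entities, alignment, "source")
flags = multiplicity(alignment)
```

In this toy case the alignment is partial ("River" has no counterpart) and many-to-many (both flags are true).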

Ontology Integration
The simplest way to integrate ontologies is the simple union [20] or simple merge [21,22] approach. It consists of aggregating the input ontologies or importing them into a new ontology and adding bridging axioms that translate the alignment(s) between them. The semantic correspondences of each alignment (resulting from the matching step) are transformed into bridging axioms in order to link the overlapping parts of the input ontologies [23,24].
The integration of two ontologies can be formalized by the following merge function: $\mathit{Merge}(O_1, O_2, A) = O_3$ [17], such that $O_1$ and $O_2$ are two ontologies to be integrated, and $A$ is the alignment between them.
OWL (Web Ontology Language) [25,26] is the most widely used language for representing and defining structured knowledge in the Semantic Web [9,10]. It uses the rich formal semantics of Description Logics (DL) [27] to express ontologies and reason on them. The OWL language provides direct mechanisms to link equivalent entities. Indeed, equivalence correspondences between entities can be expressed by OWL built-in statements or axioms, as follows:
• Equivalence between classes is expressed by the <owl:equivalentClass> axiom;
• Equivalence between object or datatype properties is expressed by the <owl:equivalentProperty> axiom;
• Equivalence between individuals (rather called "identity") is expressed by the <owl:sameAs> axiom.
Therefore, when we integrate two ontologies, the integrated ontology $O_3$ can be considered as the aggregation or the union of $O_1$, $O_2$, and $A$ [28]. The correspondences of the alignment $A$ can be viewed as an ontology $O_A$ (a.k.a. an intermediate ontology [29], a bridge ontology [30], or an articulation ontology [31]).
The resulting ontology $O_3$ can be called an integrated ontology or a merged ontology. It can also be called a knowledge graph (KG) because a KG is composed of entities from many independent sources and can cover different domains at the same time [32].
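A simple merge of this kind can be sketched in Python as the union of two triple sets plus one bridging axiom per correspondence. The function `simple_merge`, the `o1:`/`o2:` prefixes, and the entity names are assumptions for illustration; only the three OWL constructs are taken from the text.

```python
# OWL constructs named in the text, keyed by the kind of entity being linked.
BRIDGE_PREDICATE = {
    "class": "owl:equivalentClass",
    "property": "owl:equivalentProperty",
    "individual": "owl:sameAs",
}


def simple_merge(o1, o2, alignment):
    """Union of both ontologies plus one bridging axiom per correspondence.

    o1, o2    : iterables of (subject, predicate, object) triples
    alignment : iterable of (entity_from_o1, entity_from_o2, kind) triples,
                where kind is "class", "property", or "individual"
    """
    bridge = {(e1, BRIDGE_PREDICATE[kind], e2) for e1, e2, kind in alignment}
    return set(o1) | set(o2) | bridge


# Hypothetical input ontologies; prefixes keep entity names from colliding.
o1 = {("o1:Rome", "rdf:type", "o1:City")}
o2 = {("o2:Roma", "rdf:type", "o2:Town")}
alignment = [
    ("o1:City", "o2:Town", "class"),
    ("o1:Rome", "o2:Roma", "individual"),
]
o3 = simple_merge(o1, o2, alignment)
```

The input ontologies are left untouched; all the integration knowledge lives in the bridging axioms, which is what allows the alignment to be read as a separate bridge ontology.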

Ontology Matching: Related Work
Ontology matching is a crucial step in the knowledge-building process. In this subsection, we briefly review different ontology matching techniques used in the literature.
Real-world ontologies often introduce interoperability issues, as they are designed by different developers and communities with distinct requirements and tools and for different purposes and applications. Independently developed ontologies that describe similar domains or the same domain often have different conceptual models, different domain perspectives (points of view), different levels of expression detail (granularity), and different naming conventions. In other words, ontologies describing the same piece of knowledge (i.e., the same entities) will often be heterogeneous. To integrate their knowledge, ontologies need to reconcile their heterogeneity by using an ontology matching process. Because of the compelling need to minimize human intervention and speed up the matching process [33], automatic matching is becoming an essential requirement in many contexts and applications. Due to its unquestionable relevance in data integration within modern systems, ontology matching has been extensively discussed in the literature in terms of approaches, issues, solutions, challenges, and assessment (e.g., [17,29,34-40]).
The most common classification for ontology matching solutions distinguishes between content-based techniques and context-based techniques [17,41]. While the former class adopts only internal knowledge contained in the ontologies to be matched, the latter relies on external (or contextual) resources different from the target ontologies (e.g., linguistic resources and external vocabularies). Another possible classification distinguishes between element-level techniques (where entities are considered in isolation) and structure-level techniques (where entities are considered along with their relationships to other entities in the ontology).
Content-Based Ontology Matching. Different types of solutions adopt a content-based approach as follows:
• Terminological techniques consider entities as strings or words.
External resources are used in order to find similarities between words based on linguistic relations between them, e.g., synonymy, hypernymy, and hyponymy relations. These techniques are usually applied before running the string-based techniques.
• Structural techniques rely on the (tree-like hierarchy) structure of the ontology and are based on the principle of locality [44]. The latter assumes that entities adjacent to the entities of a correct correspondence are likely to be matched to each other. The hierarchy neighbors of a given entity are its (direct) parents, (direct) children, siblings, and leaves. For example, if entities $e_1$ from $O_1$ and $e_2$ from $O_2$ are correctly matched, then the neighbors of $e_1$ are likely to be matched to the neighbors of $e_2$.
• Semantic-based techniques compare the semantic interpretation of the entities to be matched (e.g., using a DL ontology reasoner (by inference or deduction) or using external resources). They are based on the intuition that if two entities share the same interpretation, then they are the same. It should be noted that language-based techniques using external resources (such as WordNet [42]) can also be considered semantic techniques because linguistic relations (synonyms, hypernyms, hyponyms, etc.) are also semantic relations.
• Instance-based techniques compute the correspondences by comparing the sets of individuals. They are based on the intuition that if the instances are alike, then the classes to which they belong are also alike.
• Constraint-based techniques are based on the similarity of the internal constraints/structure of the entities (e.g., the domains and ranges of properties, the data types of datatype properties, etc.). They are less popular than the previous ones and are commonly used in combination with others.
Context-Based Ontology Matching. Context-based techniques are normally classified depending on the external resources they adopt:
• Formal resource-based techniques use formally represented resources to do the matching. These resources are usually external ontologies (e.g., upper-level ontologies or domain-specific ontologies), alignments from previously matched ontologies, linked data, or instances that are not part of the ontology (e.g., knowledge graphs), etc.
It should be noted that language-based techniques using external resources (such as WordNet [42]) can also be considered formal resource-based techniques because linguistic or semantic relations (synonyms, hypernyms, hyponyms, etc.) can be found in an external ontology or in any external formal resource.
• Informal resource-based techniques use more informal resources to perform the matching process, such as linguistic resources (e.g., dictionaries) or annotated resources (e.g., encyclopedia pages and pictures). Similarities among entities are based on how these entities are related to such resources. For example, if two classes annotate the same set of pictures, these classes can be considered equivalent. In this case, to deduce reliable correspondences, informal resources are typically large corpora of related entities.
Within an automatic process of ontology matching, it has been shown that a combination of complementary matchers (string-based, lexical, structural, semantic, and instance-based matchers) improves the alignment quality [45]. Nowadays, many ontology matching tools use the technique of matcher ensembles (a.k.a. matcher combination or matcher aggregation) to produce optimal results.

Ontology Matching: Uncertainty
The ontology matching process is the main cause of uncertainty in the knowledge-building process. Once systems scale up, the knowledge-building process has to rely on existing consolidated automatic tools, and this automation generates uncertainty. Uncertain information is generally understood as information whose veracity cannot be determined or assessed [19,46]. Uncertainty is an intrinsic and unavoidable circumstance in automatic data integration and a critical issue for the reliability of systems and underlying processes. In the context of this work, the focus is on uncertainty in schema matching (and in particular in ontology matching).
We notice two types of uncertainty in ontology matching: (i) uncertainty caused by the domain of discourse, and (ii) uncertainty caused by the matcher combination strategy.

Uncertainty Caused by the Ontology Domain
While fully automatic ontology matching tools intrinsically lead to situations of uncertainty (as human experts do not verify them), even semi-supervised or manual approaches may lead to errors. Indeed, users may not sufficiently understand the ontology domain (such as bioinformatics or biomedical domains) and therefore may provide imprecise and incorrect correspondences. Additionally, some domains can be intrinsically error-prone, as the intended correspondences may not be clear [47]. For example, concepts can have ambiguous semantics when they are closely related but neither completely synonymous (i.e., equivalent) nor completely dissimilar (i.e., disjoint) [45,48,49]. In this case, matching systems become uncertain, as it cannot be stated accurately whether two entities are equivalent or not [50].
Besides, in an ontology, entities are not supposed to be semantically disjoint by default. Entities follow the "Open World Assumption", where they are supposed to overlap and share a certain amount of common information (even if it is not yet specified) [50]. Ontologies belonging to the same domain or to similar domains often introduce linguistic, structural, and semantic ambiguities, resulting from their different domain representations [51]. These ambiguities will lead to the creation of an ambiguous alignment after performing the ontology matching process.
Let us introduce the notion of "alignment ambiguity". A many-to-many alignment is actually an ambiguous alignment since it contains ambiguous correspondences [17]. Ambiguous correspondences are also called competing correspondences [52], or higher-multiplicity correspondences (or correspondences of higher multiplicity) [53]. The latter are sets of correspondences that match one entity from the first ontology to multiple entities from the second ontology, or vice versa [17]. These correspondences have in common the same source entity (coming from $O_1$) or the same target entity (coming from $O_2$). In other words, an ambiguous correspondence contains at least one entity (coming from $O_1$ or $O_2$) that appears in other correspondences. Here is an example of a set of three ambiguous equivalence correspondences written in the DL syntax: [...]
Klir and Yuan [54] distinguish two main types of uncertainty: Fuzziness (lack of sharp or definite distinctions) and Ambiguity (co-existence of many-to-one or one-to-many possible correspondences, leading to a choice disagreement). Ontologies carry this ambiguity in the process of ontology matching as well as the process of ontology integration [51].
In addition, such an error rate is likely to increase with the scale of the target system. An uncertain matching affects the performance of the whole system [50].
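The definition of ambiguous correspondences given above translates directly into a small check: a correspondence is ambiguous when its source or its target entity also appears in another correspondence. The following Python sketch uses hypothetical entity names and confidence values; they are not taken from the case study.

```python
from collections import defaultdict


def ambiguous_correspondences(alignment):
    """Correspondences sharing their source or target with another correspondence.

    alignment: list of (source_entity, target_entity, confidence) tuples.
    """
    by_source = defaultdict(list)
    by_target = defaultdict(list)
    for c in alignment:
        by_source[c[0]].append(c)
        by_target[c[1]].append(c)
    return [
        c for c in alignment
        if len(by_source[c[0]]) > 1 or len(by_target[c[1]]) > 1
    ]


# Hypothetical alignment: "o1:Paper" competes for three target entities.
alignment = [
    ("o1:Paper", "o2:Article", 0.9),
    ("o1:Paper", "o2:Publication", 0.7),
    ("o1:Paper", "o2:Document", 0.6),
    ("o1:Author", "o2:Writer", 0.8),
]
ambiguous = ambiguous_correspondences(alignment)
```

In this toy alignment, the three correspondences involving "o1:Paper" form one set of ambiguous correspondences, while the "o1:Author" correspondence is unambiguous.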

Uncertainty Caused by the Matcher Combination
Lexical matchers assume that similar entities are likely to have similar names, while structural matchers assume that similar entities are likely to have similar hierarchy neighbors (i.e., sub-entities, super-entities, etc.). Semantic matchers assume that similar entities are likely to have the same meaning, whereas instance-based matchers assume that similar concepts are likely to have similar individuals [33]. To aggregate these assumptions by which different matchers assess the similarity between entities, a "combined matcher" automatically combines the alignments produced by all matchers and returns a single alignment. The latter contains correspondences with overall confidence values resulting from the combination of multiple confidence values generated by individual matchers [55]. Confidence values can be combined by using one of the following strategies: min, max, average, or weighted sum. Combination algorithms often assign a weight to each matcher. The confidence values of the combined correspondences strongly depend on the choice of the matcher weights [50].
Nevertheless, when the number of individual matchers increases, it becomes trickier to combine their respective alignments. The actual challenge in any system that manages uncertainty is to check the reliability of the generated similarity scores (or confidence values). Indeed, unreliable values reduce the quality of the alignment, as erroneous correspondences could falsely have high similarity scores. Getting reliable and meaningful confidence values is always sought after, and it is one of the most challenging issues [47]. The performance of automated matching systems heavily depends on the validity of the combination strategy, which is hard to design. The average combination is often considered to be the most effective strategy. Overall, neglecting the importance of correspondence similarity values does generate uncertainty in ontology alignments and deteriorates the quality of the ontology integration process [49].
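The four combination strategies mentioned above (min, max, average, and weighted sum) can be sketched as follows for a single correspondence. The function name `combine`, the example scores, and the weights are hypothetical; the sketch additionally assumes that the weights sum to 1, so that the weighted sum stays in $[0, 1]$.

```python
def combine(scores, strategy="average", weights=None):
    """Combine the confidence values that several matchers give one correspondence."""
    if not scores:
        raise ValueError("at least one confidence value is required")
    if strategy == "min":
        return min(scores)
    if strategy == "max":
        return max(scores)
    if strategy == "average":
        return sum(scores) / len(scores)
    if strategy == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("one weight per matcher is required")
        # Assumption: weights are non-negative and sum to 1.
        return sum(w * s for w, s in zip(weights, scores))
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical scores from a lexical, a structural, and an instance-based matcher.
scores = [0.9, 0.4, 0.7]
combined = {
    "min": combine(scores, "min"),
    "max": combine(scores, "max"),
    "average": combine(scores, "average"),
    "weighted": combine(scores, "weighted", weights=[0.5, 0.3, 0.2]),
}
```

For the same three matcher scores, the combined confidence ranges from 0.4 (min) to 0.9 (max), which illustrates how strongly the final value depends on the chosen strategy and weights.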

Ontology Matching: Uncertainty Management
Uncertainty management plays a crucial role in real-world applications. It has been recognized as the next issue in data integration [18]. Managing uncertainty in schema matching is the problem of dealing with imprecise or inaccurate correspondences [19,33]. Non-supervised matching is often imperfect [48], as no automatic matching tool can be expected to produce the exact expected alignment (by finding all the correct correspondences between entities) [45]. This automatic process always brings with it a degree of uncertainty (imperfection or imprecision) [55]. Therefore, it has become evident that we should manage partially incorrect alignments in order to improve their quality [19].
Related work on data integration manipulates uncertainty until a choice is made to keep only exact information. There are two options for reducing (or removing) uncertainty in ontology alignments: (i) either manually, through user feedback, or (ii) automatically, using alignment filtering techniques.
In the (user-driven) manual approach, correct and incorrect correspondences can be manually selected from the alignment. For example, the user can manually validate all the ambiguous correspondences or all the low-confidence correspondences under a certain confidence threshold (assuming that low-confidence correspondences are less reliable than high-confidence ones) by keeping the correct correspondences and deleting the false ones. In addition, the user can alter correspondences by changing their relation type (e.g., from equivalence to subsumption) or by changing their confidence value (e.g., from a low value to a high value if they were falsely assigned very low confidence values). The user can also add missing correspondences if possible. Alternatively, users can be incorporated into a semi-supervised matching process whenever the system asks for help [55].
In the (alignment filtering) automatic approach, two different levels of uncertainty management can be highlighted: (i) the alignment trimming, and (ii) the alignment disambiguation.

Alignment Trimming
The alignment trimming [52] approach is also called thresholding [39] or confidence cut [56]. It aims to trim the alignment with a threshold in order to ensure that only the best correspondences are maintained in the alignment. The trimming process removes from the alignment all correspondences whose confidence values are below a given threshold. Formally, this approach discards all correspondences with a confidence value $\theta < \alpha$ by applying an $\alpha$-cut to the alignment, such that $\alpha$ is the confidence threshold and $\alpha \in [0, 1]$. This technique increases the alignment precision at the expense of the alignment recall (see Subsection 5.2).
The threshold can be chosen either through semi-supervised user feedback cycles or by automatic learning-based approaches. In the former way (which is the most common), many trials should be performed by adjusting or varying the threshold value until the best or optimal threshold is found [45].
Assigning a given confidence threshold depends on the application in question. For instance, we can assign relatively low thresholds to a recommendation application since incorrect correspondences are tolerated; however, we should assign high thresholds to a scientific application since incorrect correspondences are not acceptable [50]. Reducing the alignment uncertainty by alignment trimming generates a loss of information [45], as some of the deleted correspondences can be correct. Indeed, it is impossible to separate correct correspondences from incorrect correspondences using a trimming threshold because correspondences under any threshold can include correct and incorrect correspondences. As a result, any selection of a threshold will still generate false correspondences and/or missing correspondences. Consequently, choosing a threshold and performing a trimming process alone will not yield the ideal alignment [18].
As a rule of thumb, the uncertainty resulting from the ontology matching process is either lost or transformed into exact information [19]. Hence, to avoid losing relevant information from the alignment, we can preserve these uncertain correspondences and manage uncertainty manually [46].
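The trimming step and its information loss can be illustrated with a minimal Python sketch. The function `trim`, the toy alignment, the reference pairs, and the threshold value are all hypothetical and chosen only to show that a single threshold cannot separate correct from incorrect correspondences.

```python
def trim(alignment, alpha):
    """Alpha-cut: keep only correspondences whose confidence is at least alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [c for c in alignment if c[2] >= alpha]


# Hypothetical alignment as (source, target, confidence) tuples.
alignment = [("a", "x", 0.90), ("b", "y", 0.55), ("c", "z", 0.60)]
# Hypothetical reference: the pairs that are actually correct.
reference = {("a", "x"), ("b", "y")}

kept = trim(alignment, 0.58)
# Incorrect correspondences that survive the cut.
wrongly_kept = [c for c in kept if (c[0], c[1]) not in reference]
# Correct correspondences that the cut removes.
wrongly_dropped = [
    c for c in alignment if c not in kept and (c[0], c[1]) in reference
]
```

With the threshold 0.58, the incorrect correspondence ("c", "z") is kept while the correct correspondence ("b", "y") is removed; lowering the threshold to keep the latter would not remove the former.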

Alignment Disambiguation
Ambiguous correspondences are often a source of uncertainty in ontology alignments because of their ambiguous interpretation (see Subsection 4.1). There are two possible approaches for disambiguating a set of ambiguous equivalence correspondences:
(i) In the first approach, an alignment disambiguation process converts a many-to-many alignment into a one-to-one alignment. It is considered a bipartite matching problem (or a bipartite filtering problem). This approach consists of selecting the most similar pair of entities (i.e., the correspondence that has the highest confidence value) and removing the remaining correspondences that involve one of these entities. It assumes that only one correspondence reflects a correct semantic equivalence, while the other ones (i.e., those with lower confidence values) are incorrect correspondences that do not reflect a strict synonym but rather an overlap relation [53]. Therefore, only one equivalence correspondence is considered among the set of ambiguous correspondences. To do so, we can apply the Stable Matching algorithm (or the Stable Marriage algorithm) [57]. The latter assigns only one object $o_1$ from a set $S_1$ to only one object $o_2$ from a set $S_2$, such that there is no other correspondence involving one of the two objects and having a higher similarity. This algorithm favors stronger (high-valued) individual correspondences in terms of similarity values. After applying the Stable Matching algorithm, there is no object (entity) involved in multiple correspondences.
There are other algorithms, such as Maximum Weight Matching (or Maximum Weight Bipartite Filtering) [58,59]. It aims to get a maximum weighted sub-alignment that maximizes the sum of the confidence values of all correspondences constituting the alignment. That is, it tries to maximize the total similarity values of the selected correspondences. However, we believe that the Stable Marriage works better than the Max Weight Matching in this case. Indeed, Stable Marriage always chooses the best correspondence for each entity in isolation (in a local manner) and thus guarantees that all high-confidence correspondences are selected. In contrast, the Max Weight Matching chooses the best correspondence for each entity in a global manner, which does not guarantee that all high-confidence correspondences are selected. Assuming the confidence values provided by the matching tool are reliable and truly reflect the probability of correspondences being correct, it is always better to have one high-confidence correspondence than two medium-confidence correspondences in an alignment. Since the goal of alignment filtering is to minimize the number of incorrect correspondences and maximize the number of correct ones, we believe that the Stable Marriage is the best choice.
There is another disambiguation idea that is based on the principle of locality (see Subsection 3.2). It assumes that low confidence values in the neighborhood of a correspondence $c$ can reveal the incorrectness of $c$. For instance, given a correspondence $c$ composed of the entity $e_1$ from $O_1$ and the entity $e_2$ from $O_2$ ($c = \langle e_1 \equiv e_2 \rangle$), if the correspondences relating the neighbors of $e_1$ and $e_2$ have low confidence values, then the correspondence $c$ is likely to be incorrect. Recall that the neighbors of an entity are the more general entities (ancestors) and/or the more specific entities (descendants) of that entity. For each ambiguous correspondence, this algorithm [60] counts the confidence proportion of correspondences reachable by the neighbors of the correspondence in question. Then, it selects the correspondence with the highest confidence ratio. This algorithm is not always appropriate, especially when there are no correspondences in the neighborhood of a correspondence and when the hierarchy of ontologies is not deep enough.
(ii) As for the second approach, since the terms of one ontology are more general (or more detailed) than the terms of the other ontology (i.e., one ontology is more general or more granular than the other), [53], all equivalence relations in the ambiguous correspondences are considered subsumption relations.In this case, entities of the first ontology are decomposed into several more specific entities in the second ontology, or vice versa [18].Thus, the sets of ambiguous equivalence correspondences are actually incorrect.This alignment disambiguation approach transforms every ambiguous equivalence correspondence into a subsumption correspondence by altering its relation type from the equivalence relation ("≡") to the inclusion relation ("⊑" or "⊒") [53,61,62].
As a result, we deduce that a generic alignment disambiguation approach is difficult to define.

Ontology Alignment Evaluation Metrics
Several evaluation metrics have been defined and are commonly adopted within the research community to quantitatively assess the accuracy of alignments resulting from an automatic ontology matching system.
To do so, a reference alignment (understood as the expected or intended alignment, i.e., the "gold standard") should be available [63]. In practice, an alignment is a set of correspondences between entities from different ontologies. Therefore, an alignment A, returned by a given ontology matching tool, can be compared to the reference alignment R by checking for the overlap of the two sets of correspondences [64]. In general terms, the most common ontology alignment evaluation metrics are adaptations of classical metrics from the Information Retrieval (IR) field.

Basic Evaluation Metrics
The basic evaluation metrics assume the following definitions [17]:
• True positives are correspondences that have been correctly found by an automatic ontology matching tool: True Positives (TP) = A ∩ R.
• False positives are correspondences that have been falsely found by an automatic ontology matching tool: False Positives (FP) = A − R.
• False negatives are correct correspondences that have not been found by an automatic ontology matching tool: False Negatives (FN) = R − A.
• True negatives (TN) are false correspondences that have been correctly discarded by an automatic matching tool.
Based on such definitions, the set of automatically identified correspondences comprises true positives and false positives (A = TP + FP), and the set of expected correspondences is composed of true positives and false negatives (R = TP + FN). Evidently, false positives and false negatives reduce the matching accuracy. Therefore, an effective ontology matching system aims to minimize both of them.
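As a minimal sketch, the set-based definitions above map directly onto Python set operations. The correspondences below are hypothetical (source entity, target entity) pairs chosen only for illustration.

```python
# Hypothetical alignment A (found by a tool) and reference alignment R.
A = {("Sudan", "Sudan"), ("Chad", "Chad"), ("Niger", "Nigeria")}
R = {("Chad", "Chad"), ("Niger", "Niger"), ("Mali", "Mali")}


def basic_sets(found, reference):
    """Return the TP, FP, and FN sets of a found alignment against a reference."""
    true_positives = found & reference    # A ∩ R
    false_positives = found - reference   # A − R
    false_negatives = reference - found   # R − A
    return true_positives, false_positives, false_negatives


tp, fp, fn = basic_sets(A, R)
print(len(tp), len(fp), len(fn))
```

Since TP and FP partition A, and TP and FN partition R, the sizes add up to |A| and |R| respectively.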

Advanced Evaluation Metrics
The advanced evaluation metrics assume the following definitions [17,40,64,65]:
• Precision measures the "correctness" of an alignment. It reflects the share of correct correspondences among all the ones found. It is defined as the ratio of the number of correctly found correspondences (TP) over the total number of found correspondences (TP + FP). Given a reference alignment R, the Precision of an alignment A is a function P : Λ × Λ → [0, 1], such that:
$$P(A, R) = \frac{|A \cap R|}{|A|} = \frac{TP}{TP + FP}$$
• Recall measures the "completeness" of an alignment. It reflects the share of correct correspondences among all the expected correspondences. It is the ratio of the number of correctly found correspondences (TP) over the total number of expected correspondences to be found (TP + FN). Given a reference alignment R, the Recall of an alignment A is a function R : Λ × Λ → [0, 1], such that:
$$R(A, R) = \frac{|A \cap R|}{|R|} = \frac{TP}{TP + FN}$$
In the ideal case, Precision and Recall reach their highest value of 1.0.
• Noise & Silence are the complement measures of Precision and Recall. Given a reference alignment R, the Noise and the Silence of an alignment A are functions N and S : Λ × Λ → [0, 1], such that:
$$N(A, R) = 1 - P(A, R) = \frac{FP}{TP + FP}, \qquad S(A, R) = 1 - R(A, R) = \frac{FN}{TP + FN}$$
• F-measure combines precision and recall, as neither can accurately assess the matching quality alone. Indeed, an ontology matching tool can have a high precision and a low recall, and vice versa. F-measure is a combined metric that attaches different importance to precision and recall. Given a reference alignment R and a number α between 0 and 1 (0 ⩽ α ⩽ 1), the F-measure of an alignment A is a function F_α : Λ × Λ → [0, 1], such that:
$$F_\alpha(A, R) = \frac{P(A, R) \cdot R(A, R)}{(1 - \alpha) \cdot P(A, R) + \alpha \cdot R(A, R)}$$
The α parameter defines the relative balance between precision and recall, as higher α values give more importance to precision with respect to recall. When α = 1, F-measure is equal to Precision; and when α = 0, F-measure is equal to Recall. A value of α = 0.5 gives equal importance to both core metrics. Therefore, when α = 0.5, F-measure becomes the harmonic mean of precision and recall, as follows:
$$F_{0.5}(A, R) = \frac{2 \cdot P(A, R) \cdot R(A, R)}{P(A, R) + R(A, R)}$$
F-measure(0.5), also called F1, is the most commonly used variant of F-measure(α) in IR since it balances the importance of precision and recall so that neither is compensated by the other.
Matching tools may need parameter tuning. In this case, F-measure is adopted as a driving factor to perform parameter tuning because values that maximize F-measure are considered the optimal ones. Hence, this metric is not only helpful in evaluating the quality of alignments but also in selecting input parameters for matching systems, such as the confidence threshold that returns the best F-measure value (see Subsection 4.2).
• Overall (or matching accuracy [66]) is explicitly developed for schema matching purposes, unlike the previously mentioned metrics. It measures the manual error correction effort. That is, it reflects the post-matching effort needed to fix the alignment by adding missing correspondences (FN) and removing false correspondences (FP). It is based on the ratio of the number of errors (FP + FN) over the total number of expected correspondences (TP + FN). Given a reference alignment R, the Overall of an alignment A is a function O, such that:
$$O(A, R) = R(A, R) \cdot \left(2 - \frac{1}{P(A, R)}\right) = 1 - \frac{FP + FN}{TP + FN} = \frac{TP - FP}{TP + FN}$$
where negative values are associated with "bad" matching performances. If an alignment A has a number of false positives higher than the number of true positives (Precision < 0.5), its Overall will have a negative value, which means that the alignment A is not worth the repair effort. Indeed, if more than half of the correspondences of A are false, the user would make less effort to manually match the ontologies from scratch than to manually modify the alignment A.
In the ideal case, when Precision = Recall = 1, F-measure and Overall reach their highest value of 1.0. It should be noted that Overall values are always lower than F-measure values [17].
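The metrics above can be computed from the TP, FP, and FN counts, as in the following minimal Python sketch. The function names are illustrative, recall and Overall are taken over the size of the reference alignment (TP + FN), and Overall is written in its Recall × (2 − 1/Precision) form reduced to counts.

```python
def precision(tp, fp):
    """Share of found correspondences that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of expected correspondences that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p, r, alpha=0.5):
    """Weighted harmonic mean; alpha = 1 yields precision, alpha = 0 yields recall."""
    denominator = (1 - alpha) * p + alpha * r
    return p * r / denominator if denominator else 0.0


def overall(tp, fp, fn):
    """Recall * (2 - 1 / Precision), simplified to (TP - FP) / (TP + FN)."""
    return (tp - fp) / (tp + fn) if tp + fn else 0.0


# Hypothetical counts: 6 correct, 2 wrong, 4 missed correspondences.
p, r = precision(6, 2), recall(6, 4)
print(p, r, f_measure(p, r), overall(6, 2, 4))
```

With more false positives than true positives (for instance 2 correct and 5 wrong), `overall` becomes negative, which matches the interpretation that repairing the alignment costs more than matching from scratch.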

Ambiguity Evaluation Metric
All the metrics mentioned above reflect the correctness and the completeness of the generated alignment. However, the following metric reflects the ambiguity of the generated alignment.
• Ambiguity Degree [17] measures the "ambiguity" of an alignment. It is the proportion of ambiguous correspondences (i.e., correspondences whose entities are matched to several entities). In other words, it concerns the entities from O1 that are matched to at least two entities from O2, and vice versa. The number of ambiguous correspondences in an alignment (#Amb) is an absolute metric that varies according to the size of the alignment. Therefore, it is preferable to use a relative metric (%) reflecting the percentage of ambiguous correspondences in an alignment (regardless of the size of the alignment). The Ambiguity Degree of an alignment A is a function Ambiguity : Λ → [0%, 100%], such that:
$$Ambiguity(A) = \frac{\#Amb}{|A|} \times 100\%$$
If the alignment has an Ambiguity Degree of 0%, it is not an ambiguous alignment. Otherwise, it is an ambiguous alignment.
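A minimal Python sketch of this metric is given below. It assumes that a correspondence counts as ambiguous when its source entity or its target entity also appears in at least one other correspondence of the same alignment; the entity names are hypothetical.

```python
from collections import Counter


def ambiguity_degree(alignment):
    """Percentage of correspondences sharing a source or target entity with another one."""
    if not alignment:
        return 0.0
    source_counts = Counter(src for src, _ in alignment)
    target_counts = Counter(tgt for _, tgt in alignment)
    ambiguous = [
        (src, tgt)
        for src, tgt in alignment
        if source_counts[src] > 1 or target_counts[tgt] > 1
    ]
    return 100.0 * len(ambiguous) / len(alignment)


# Hypothetical alignment: a1 is matched to both b1 and b2.
example = [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4")]
print(ambiguity_degree(example))
```

Here two of the four correspondences share the source entity a1, so half of the alignment is ambiguous.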

Case Study: Spatio-Temporal Alignment of Global Indicators
We propose a classic case study that integrates several independent global indicators into a unique semantic data space. Successful integration is expected to provide a consistent view of the different indicators along the time and spatial dimensions, and, in general, all concepts should be matched.

Experiments Description
The case study, as proposed in this paper, has been inspired by the famous portal Our World in Data [67], which publishes and discusses heterogeneous indicators for countries from all over the world. In this portal, the different datasets are published in independent csv files. Still, they are considered integrated as the terminology used and the meaning of the fields in the different files are uniform. Our experiments consider raw data as originally provided by the respective sources, namely the datasets downloaded from their original links, as provided on the Our World in Data website. As such, the datasets in our experiments are actually heterogeneous. We aim for an automatic integration that emulates the integrated version of data published in the Our World in Data portal.

In Table 0, we report the input datasets for the carried-out experiments.

| Experiment | First input dataset | Second input dataset |
|---|---|---|
| Exp. 4 | Food Security Indicators (I_2.4) [72] | Prevalence of Undernourishment [71] |
| Exp. 5 | Macroeconomic Data (GDP) [73] | GDP (current US $) [74] |
| Exp. 6 | WDI Country [75] | WUP2018 Largest Cities [76] |
| Exp. 7 | Total Life Expectancy-Historical [77] | Life Expectancy at Birth [78] |
| Exp. 8 | Historical Gas Emissions [79] | List of Electoral Democracies [80] |
| Exp. 9 | List of Electoral Democracies [80] | Sexual Violence in Childhood [81] |
| Exp. 10 | Historical Gas Emissions [79] | Sexual Violence in Childhood [81] |
Each experiment involves a pair of input datasets, since the matching tools adopted do not directly support the matching of more than two ontologies at a time. Each input dataset represents a relational table (provided in the csv format). As per previous explanations, the systematic integration process is composed of two seamless phases: (i) First, we convert original raw data (csv files) into ontologies (owl files) by using the dataset conversion tool described in Subsection 6.2; (ii) Then, we automatically match each pair of the resulting (converted) ontologies by using the most sophisticated available ontology matching tools (described in Subsection 6.2), to finally obtain an alignment (i.e., an rdf file) for each matched ontology pair.
The considered pairs can involve datasets belonging to the same domain or datasets from different domains. The latter case is prevalent in the context of overlapping, complementary, or interdisciplinary domains. Overall, the test bed proposed cannot be considered critical, neither in terms of size nor complexity. It is instead characterized by its small scale and low complexity. Input datasets have between 2 and 63 columns and between 46 and 540 rows.
After converting and matching the input dataset pairs, we obtain the output alignments. First, we evaluate the quality of the output alignments. Then, we trim and/or disambiguate these alignments and evaluate them again. We aim to see how the alignment trimming and disambiguation processes affect the quality and uncertainty of the output alignments. In the alignment trimming process, we proceed as described in Section 4.2. In the alignment disambiguation process, we apply a personalized, simplified version of the Stable Marriage approach.
For the alignment trimming, we use the Alignment API [56,82]. The latter is a Java programming interface that facilitates the manipulation and evaluation of ontology alignments written in the RDF Alignment format. Given an alignment and a threshold value as input, the Alignment API automatically trims the input alignment using the predefined method cut() and returns a new trimmed alignment. The trimmed output alignment only contains correspondences that have a confidence measure greater than or equal to the chosen threshold value. For the alignment disambiguation, we apply an approach that is composed of two steps (see Figure 2): First, we go through all the correspondences in the alignment, and we disambiguate each set of ambiguous correspondences having a source entity in common (coming from O1), as shown in Figure 2a; Second, we go through all the correspondences again, and we disambiguate each set of ambiguous correspondences having a target entity in common (coming from O2), as shown in Figure 2b. In each step, disambiguation consists of keeping the strongest correspondence (i.e., the one with the highest similarity score) from the set of ambiguous correspondences and deleting the rest. If, by chance, two correspondences (from the set of ambiguous correspondences) have exactly the same similarity score, we keep both of them because we cannot randomly choose one over the other. The same applies if more than two correspondences happen to have the same similarity score, which is very rare. This alignment disambiguation approach produces the same results as the Stable Marriage approach. The disambiguated output alignment does not contain any ambiguous correspondences (i.e., it is composed of only 1-to-1 correspondences).
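The trimming and the two-step disambiguation described above can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the Alignment API itself: correspondences are modelled as hypothetical (source, target, confidence) tuples, and `trim`, `keep_strongest`, and `disambiguate` are names introduced here for the example.

```python
def trim(alignment, threshold):
    """Keep correspondences whose confidence is at least the threshold."""
    return [c for c in alignment if c[2] >= threshold]


def keep_strongest(alignment, key_index):
    """For each entity at key_index, keep only its highest-confidence correspondences.

    Exact ties are all kept, since no correspondence can be preferred over another.
    """
    best = {}
    for c in alignment:
        key = c[key_index]
        best[key] = max(best.get(key, float("-inf")), c[2])
    return [c for c in alignment if c[2] == best[c[key_index]]]


def disambiguate(alignment):
    """Step 1: resolve shared source entities; step 2: resolve shared target entities."""
    after_sources = keep_strongest(alignment, 0)
    return keep_strongest(after_sources, 1)


# Hypothetical alignment with one low-confidence and several ambiguous correspondences.
raw = [
    ("a1", "b1", 0.9),
    ("a1", "b2", 0.7),
    ("a2", "b2", 0.8),
    ("a3", "b2", 0.8),
    ("a4", "b3", 0.3),
]
result = disambiguate(trim(raw, 0.5))
print(result)
```

In this toy input, the correspondences of a2 and a3 with b2 have the same confidence, so both survive the second step, which mirrors the tie rule stated above.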
Finally, a reference alignment is manually provided by the authors for each target experiment (as in Table 0) for assessment purposes. Given an alignment (to be evaluated) and a reference alignment as input, the Alignment API [56] automatically returns the scores of the basic evaluation metrics (TP, TN, FP, FN) as well as the scores of the advanced evaluation metrics (Precision, Recall, F-measure, Overall, Noise, and Silence), which reflect the quality of the input alignment. As for the ambiguity evaluation metric, we implemented a simple algorithm in Java that takes an alignment as input and returns its Ambiguity Degree score.
Most (if not all) applications that require the combination of multiple indicators are unlikely to be error-tolerant, meaning that the resulting alignment is expected to be entirely correct. Despite its relative simplicity, the case study proposed is very relevant in different contexts and application domains. More concretely, the target case study is characterized by the need to compose indicators dynamically, and the resulting integration is expected to be precise and accurate. Practical examples may be identified, among others, in the areas of urban planning (e.g., [83]) and global sustainable development (e.g., [84]). Further requirements may characterize specific systems, such as real-time environments (in disaster management [85]).

External Tools Used in the Case Study

Tool for Dataset Conversion
The dataset conversion tool [12] supports the conversion of a given relational table (i.e., a csv or excel file) into an ontology (i.e., an owl file). It applies the virtual table approach to facilitate such a process, assuming a supervised environment. The user interface allows users to straightforwardly import a relational table through the copy and paste functions. Automatic retrieval from a relational database is also possible. Users are asked to characterize each column of the table in the tool interface (i.e., ID column, association column, or attribute column). The tool requires relatively simple user inputs, as raw data can simply be copied and pasted from external sources through the current GUI.

Tools for Automatic Ontology Matching
Ontology matching tools take as input a pair of ontologies (in the owl format) and return as output an ontology alignment (in the rdf format). The Ontology Alignment Evaluation Initiative (OAEI) is the best-known international campaign for the enhancement and evaluation of ontology matching tools. Outstanding results from the OAEI community are presented yearly in the Ontology Matching Workshop (OM), co-located with the prestigious International Semantic Web Conference (ISWC). We briefly describe below the two most popular tools within the OAEI community. We adopted both tools in our experiments to highlight the possible impact of the tool adopted on the final outcome.
• LogMap [44] is a highly scalable ontology matching tool. It performs an iterative process that starts with an initial set of equivalence correspondences obtained from a lexical matching process, then computes new correspondences by applying a structural matching process to the hierarchy neighbors of (the entities composing) the initial correspondences, based on the principle of locality. To achieve scalability, LogMap relies on lexical and structural indexes of the input ontologies. LogMap has taken part in the annual OAEI competition several times and has always been among the top-ranked solutions. It is considered by many to be the best ontology matching tool currently available.
• AML [52] is a scalable ontology matching system characterized by a comprehensive user interface. The lexical and structural information of both input ontologies is stored in internal structures (i.e., HashMaps and MultiMaps). AML combines different individual matchers (i.e., the lexical matcher, the word-based string matcher, and alternatively the mediating and parametric string matchers) and keeps the highest confidence values in case of repeated correspondences. It should be noted that the mediating matcher uses a third external ontology as background knowledge. To achieve scalability, AML adopts efficient indexation by making HashMap cross-search in the matching process. AML has taken part in the annual OAEI competition on several occasions and has achieved good results. It is considered one of the best ontology matching tools currently available.
We have used the online Web Interface version of LogMap. As for AML, we have used version 3.1, which includes a Graphical User Interface (GUI). It should be noted that we have not used the latest AML version (version 3.2) because it does not include instance matching (i.e., it only performs schema matching).

Experimental Results
Initial results are summarized in Table 1, which shows the quality of the ontology alignments resulting from each experiment using LogMap and AML. It reports the scores of the evaluation metrics, as previously described in Section 5. By comparing the number of correspondences in the reference alignment R with the number in the resulting alignment A, we can approximate the quality of the resulting alignments at first glance. To do so, we propose a further intuitive metric called ∆, which expresses the difference between the number of expected correspondences and the number of detected correspondences (∆ = |R| − |A|). By definition, ∆ = 0 is not synonymous with a correct matching because of potential false positives. Yet, such a metric is a valuable indicator for a simple preliminary assessment, as it identifies potential under-matching (∆ > 0) and over-matching (∆ < 0).
In Table 1, advanced metrics (precision, recall, F-measure, and overall) show similar results for the two adopted tools in most experiments. More concretely, for the precision metric, the two tools provide significantly different results in Experiments 1, 4, 6, and 7, and minor differences in the remaining experiments. The recall metric presents remarkable differences in Experiments 1, 4, and 7. F-measure values are quite different in Experiments 1, 4, and 7, while a minor difference also shows up in Experiment 6. Finally, overall returns strongly different values in Experiments 1 and 7 and a more limited divergence in Experiments 4 and 6. In terms of performance, the ontology matching process results on average in high values for precision with some evident exceptions (Experiment 4 especially), while recall, F-measure, and overall values associated with the different experiments present important differences.
It is also worth mentioning that the last three experiments (in cross-domain matching) display a higher performance in terms of both precision and recall. Intuitively, integrating ontologies from different domains is relatively easier than integrating ontologies within the same domain because the number of potentially ambiguous correspondences is expected to be lower on average (as shown in Table 1). Additionally, ontologies belonging to the same domain or contiguous domains are often characterized by fine-grained terminology and heterogeneity. Therefore, ontology matching tools generate more false correspondences when matching ontologies from related or close domains.
All results are summarized in Tables 1, 2, 3, and 4. In Table 1, we evaluate the initially resulting alignments (i.e., without applying any alignment disambiguation or trimming processes to these alignments). In Table 2, we evaluate the resulting disambiguated alignments (i.e., we disambiguate the original resulting alignments by transforming them from N-to-N alignments to 1-to-1 alignments, then we evaluate them). In Table 3, we evaluate the resulting trimmed alignments (i.e., we trim the original resulting alignments by choosing the optimal threshold for each case and performing a confidence cut, then we evaluate them). In Table 4, we evaluate the resulting trimmed and disambiguated alignments (i.e., we perform a confidence cut on the original resulting alignments, after choosing the optimal threshold for each alignment, then we transform them into 1-to-1 alignments, and we finally evaluate them). In Tables 2 and 4, the output alignments no longer contain any sets of ambiguous correspondences. Therefore, the scores of the ambiguity degree metric are null (see the columns in gray). In Tables 3 and 4, we report the trimming threshold values considered in the different experiments. Recall that alignment trimming applies a confidence α-cut to the produced alignment, where α ∈ [0, 1] is the confidence threshold value. Threshold tuning is performed in order to choose the optimal threshold value for each case. Indeed, in some experiments in Table 1, AML alignments contain many false correspondences with very low confidence values. These low-confidence correspondences decrease the precision of AML alignments and therefore worsen the F-measure. To improve AML performance in these experiments, we have used a trimming threshold that generates the best F-measure results: we made many manual trials to find the optimal threshold value that maximizes the F-measure of these AML alignments. We did not introduce any thresholds to LogMap experiments since the LogMap results are optimal by default.
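The manual threshold search described above could be automated along the lines of the following minimal Python sketch. It assumes a reference alignment is available, models correspondences as hypothetical (source, target, confidence) tuples, and simply sweeps a list of candidate thresholds, keeping the first one that yields the highest F1.

```python
def f1(found, reference):
    """Harmonic mean of precision and recall for two sets of (source, target) pairs."""
    tp = len(found & reference)
    if tp == 0:
        return 0.0
    p = tp / len(found)
    r = tp / len(reference)
    return 2 * p * r / (p + r)


def best_threshold(scored, reference, candidates):
    """Return the (threshold, F1) pair with the highest F1 among the candidates."""
    best_t, best_f = None, -1.0
    for threshold in candidates:
        kept = {(s, t) for s, t, conf in scored if conf >= threshold}
        score = f1(kept, reference)
        if score > best_f:  # strict comparison keeps the lowest threshold on ties
            best_t, best_f = threshold, score
    return best_t, best_f


# Hypothetical tool output and reference alignment.
scored = [("a1", "b1", 0.95), ("a2", "b2", 0.80), ("a3", "b9", 0.30), ("a4", "b8", 0.20)]
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
print(best_threshold(scored, reference, [0.0, 0.25, 0.5, 0.75, 0.9]))
```

In this toy case the two false correspondences have low confidence, so cutting at 0.5 removes them and raises precision without affecting recall.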
Tables 5, 6, 7, 8, and 9 compare the results of Tables 1, 2, 3, and 4 by taking one evaluation metric at a time. They separately show the four tables' precision, recall, F-measure, overall, and ambiguity degree scores, respectively. In all these tables, the highest score is highlighted in bold for each experiment (and for each tool). The colored (up or down) arrows show an increase or a decrease compared with the scores of the initial results (from Table 1). It should be noted that sometimes the trimming process or the disambiguation process has no effect on the results of Table 4. In other words, some results after alignment disambiguation (shown in Table 2) do not change even if we add a trimming process (as shown in Table 4); and some results after alignment trimming (shown in Table 3) do not change even if we add a disambiguation process (as shown in Table 4). That is why the arrows in the last column of Tables 5, 6, 7, 8, and 9 can have different colors according to the scores of Table 4.
When we compare Tables 1 and 2, we notice a negligible decrease in recall scores in Experiments 1, 2, 3, and 4 for AML results (see Table 6) because of the alignment disambiguation process. However, there is a slight but noticeable improvement in precision, F-measure, and overall scores in all experiments (see Tables 5, 7, and 8). There is also a slight improvement in precision, F-measure, and overall scores in Experiment 1 for LogMap (see Tables 5, 7, and 8). We deduce that the sets of ambiguous correspondences do convey some false correspondences (due to their obvious uncertainty). So, after disambiguating the output alignments, the alignment evaluation scores improve (apart from the negligible loss in recall), and thus the global uncertainty of alignments decreases.
As shown in Tables 1 and 3, AML generates a higher number of ambiguous correspondences (Amb) than LogMap does, especially in Experiments 1, 4, 6, and 7 (see Table 9). Therefore, AML alignments contain more uncertainty cases than LogMap alignments. In Subsection 7.2, we will show some ambiguity examples. When we compare Table 1 and Table 3, we notice a negligible decrease in recall scores in Experiments 1 and 7 for AML results (see Table 6) because of the alignment trimming process. However, there is a noticeable improvement in precision, F-measure, overall, and ambiguity degree scores in Experiments 1, 4, 6, and 7 (see Tables 5, 7, 8, and 9). We deduce that the removed (trimmed) correspondences do convey some false correspondences (due to their low confidence values). So, after trimming the output alignments, the alignment evaluation scores improve (apart from the negligible loss in recall), and thus the global uncertainty of alignments decreases.
When we compare Table 1 and Table 4, we notice a negligible decrease in recall scores in Experiments 1, 2, 3, 4, and 7 for AML results (see Table 6) because of the alignment trimming and disambiguation processes. However, there is a noticeable improvement in precision, F-measure, and overall scores in all experiments (see Tables 5, 7, and 8). We deduce that the correspondences deleted by trimming and disambiguation do convey false correspondences (due to their ambiguity and low confidence). Therefore, after trimming and disambiguating the output alignments, the evaluation scores improve (apart from the negligible loss in recall), and thus the global uncertainty of these alignments is reduced.
Among the four result tables (Tables 1, 2, 3, and 4), Table 4 represents the best results. Therefore, Table 4 can be considered as the final results of our case study. By trimming and disambiguating the output alignments in Table 4, we exclude as many untrustworthy correspondences as possible, and we minimize the number of false positives in the alignments. Despite that, experiments in Table 4 still point out a significant number of missing correspondences (false negatives) and a more contained number of false positives. Hence, the resulting ontology alignments are still uncertain and thus not entirely reliable for the task of ontology integration.
By and large, based on the set of experiments performed, LogMap outperforms its competitor, AML. However, the real-world case study has amply demonstrated the uncertainty that the high number of returned false positives and false negatives quantitatively reflects. In the next section, we will explain the reasons for the weak performance of LogMap and AML in some experiments, in particular Experiments 5 and 7 (see Subsection 7.1.1).

Uncertainty Causes in Ontology Matching
In this subsection, we explain the most important uncertainty causes that we have encountered in our experiments.

Uncertainty Caused by Different Ontology Granularity
Current ontology matching tools face many difficulties when one ontology is more detailed or more general than the other. For example, an ontology O1 contains a concept that has the name of a specific country, while an ontology O2 contains concepts that have the names of sub-countries constituting the country of O1.
Table 1 shows some examples from Experiment 1. In the first example of Table 1, LogMap and AML identified that "Sudan" in O1 is equivalent to "Sudan" in O2, which is false in this case. Sudan in O2 is actually North Sudan, and Sudan in O1 is the former Sudan that is composed of both the northern and southern parts. The entities "Sudan" and "South Sudan" from O2 should rather be sub-entities of "Sudan" from O1 (using a subsumption relation, not an equivalence relation). Ontology matching tools cannot identify such subsumption correspondences between entities.
Notice that O1 and O2 have a different granularity in each of these examples (in Table 1). That is, entities of O1 are not always more detailed than entities of O2; and entities of O1 are not always more general than entities of O2. Thus, we cannot say that O1 is more detailed (or more general) than O2. As a consequence, we cannot automatically replace every ambiguous equivalence correspondence in the alignment with a subsumption correspondence, as the second approach of alignment disambiguation does (see Subsection 4.2). We can add another example from Experiment 3, where O1 has datatype properties (or attributes) in the form of "year-year" intervals (e.g., "97-1999"), and O2 has attributes as separate years (e.g., "1997", "1998", "1999").
There are many more complicated cases, such as complex correspondences. To identify complex correspondences, we need to perform a complex matching process. In complex matching, a complex correspondence can be composed of an entity from ontology O1 and a union of entities from ontology O2, or vice versa. Complex correspondences are extremely hard to identify. Table 2 shows some examples of complex correspondences extracted from Experiment 1.
The entity "Near_East" from O1 should be matched to the entity "WesternAsia" from O2, while the entity "Asia_(EX._Near_East)" should be matched to the union of the entities "CentralAsia", "EasternAsia", "SouthernAsia", and "South-easternAsia". This is a complex compound correspondence. Alternatively, this complex correspondence can be split into four subsumption correspondences. None of these correspondences were detected by LogMap or AML.
The entities "Northern_Africa" and "NorthernAfrica" were correctly matched by both LogMap and AML. However, it was impossible for them to detect that "Sub-Saharan_Africa" is the union of "MiddleAfrica", "EasternAfrica", "WesternAfrica", and "SouthernAfrica". Complex matching is very difficult to perform with automated ontology matching systems. Only a human being can figure out such correspondences.
Similarly, "Northern_America" and "NorthernAmerica" were successfully matched by both LogMap and AML. However, it is not evident to match "Latin_Amer._and_Carib" with the union of "Central_America", "South_America", and "Caribbean" (in fact, Latin America is composed of Central America + South America).
In the next example, we encounter a more complicated matching case where the same entities are simultaneously involved in two types of correspondences. Tables 3 and 4 show an example from Experiment 3. In this experiment, LogMap and AML have successfully matched "China,_Hong_Kong_SAR" from O1 to "Hong_Kong_SAR,_China" from O2, and "China,_Macao_SAR" from O1 to "Macao_SAR,_China" from O2. However, they both falsely matched "China" from O1 to "China" from O2. The correct correspondence should rather match "China,_mainland" from O1 to "China" from O2. Indeed, the entity "China" in O1 actually means the whole China territory (including the mainland of China, Hong Kong, and Macao). Therefore, "China" from O1 should be matched to the union of "China", "Hong_Kong_SAR,_China", and "Macao_SAR,_China" from O2 (or split into three subsumption correspondences). As a result, the entities "China", "Hong_Kong_SAR,_China", and "Macao_SAR,_China" from O2 are simultaneously involved in equivalence correspondences (see Table 3) and subsumption correspondences (see Table 4). We conclude that ontologies belonging to the same domain and having different granularities certainly introduce many cases of uncertainty.

Uncertainty Caused by Different Entity Expressiveness
Ontology matching tools are also very sensitive to the expressiveness of entities. Consider the following example extracted from Experiment 4: ontology O1 contains a class that is named "Regions.Subregions.Countries". However, the instances of that class are only regions and sub-regions; it has no countries as instances. This class should instead be named "Regions.Subregions". On the contrary, ontology O2 contains a class that is named "Country_Name". However, instances of that class are countries as well as regions and sub-regions. This class should rather be named "Regions.Subregions.Countries". We notice that there is inherent uncertainty in the original datasets. The names and terms of the classes are originally not expressive enough and do not perfectly represent the intended meaning. This feature makes the ontology matching task even more difficult.
Here is another example from Experiment 6: ontology O1 has three datatype properties (or attributes): "Short_Name", "Table_Name", and "Long_Name". For example, for the country "VIR", the short name is "Virgin_Islands", the table name is "Virgin_Islands_(U.S.)", and the long name is "Virgin_Islands_of_the_United_States". These attributes are not significant enough; they should rather be named "Short_Country_Name", "Table_Country_Name", and "Long_Country_Name". On the other hand, ontology O2 has a datatype property (attribute) called "Country_or_Area"; however, there are only instances of countries and no instances of areas. This attribute should rather be named "Country" or "Country_Name" (not "Country_or_Area"). Obviously, LogMap and AML could not match the properties "Short_Name", "Table_Name", and "Long_Name" with the property "Country_or_Area" since there is no word in common. There is an inherent uncertainty in these properties because their terms do not reflect their true and full meaning. We deduce that the choice of the entity names (in the original dataset) should be made carefully by domain experts. However, in real-world datasets, this is not always the case.
Let us study one last example from Experiment 7: AML performs badly because some instances of O1 are composed of both letters and numbers at the same time. Numbers disturb the AML processing. For example, instances such as "cshapes/1474", "cshapes/1475", "cshapes/1577", and "cshapes/1578" from O1 are matched to many random country instances from O2, which lowers the precision of the AML alignment. These instances should not be matched to any instance from O2 because they are instance IDs that do not have any meaning; they are just contiguous unique codes identifying the table rows.
In ontologies, every entity has a meaning. Entity names can be composed of words or alphanumeric codes. Still, when an entity name is an alphanumeric code, it is highly recommended that the entity also have one or more labels giving its real full name or synonym(s). Ontology matching tools often use both entity names and entity labels to find correspondences between entities. However, real-world datasets often do not contain such label information, so errors may occur in the matching process.
Overall, entity terms should be written as clearly and straightforwardly as possible so that existing ontology matching tools can produce good results and avoid additional uncertainty.
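As an illustration of the Experiment 7 issue, the following minimal sketch separates opaque row identifiers from meaningful names before matching. The regular expression and the function name `split_id_like` are assumptions made for this example; they are not part of LogMap, AML, or the experimental pipeline, and real datasets may need a different rule.

```python
import re

# Assumed pattern: an optional alphabetic prefix followed by "/" and digits,
# or a bare number. This is a heuristic for illustration only.
ID_LIKE = re.compile(r"^(?:[A-Za-z_]+/)?\d+$")


def split_id_like(names):
    """Separate meaningful entity names from opaque row identifiers."""
    meaningful, opaque = [], []
    for name in names:
        (opaque if ID_LIKE.match(name) else meaningful).append(name)
    return meaningful, opaque


meaningful, opaque = split_id_like(
    ["cshapes/1474", "cshapes/1475", "Virgin_Islands", "Congo"]
)
print(meaningful)  # names worth passing to a matcher
print(opaque)      # identifiers that carry no meaning on their own
```

Instances flagged as opaque could then be excluded from matching or matched through their labels, when labels exist.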

Uncertainty Caused by Different Conceptualization Choices
Recall that there are four main types of ontological entities: concepts (or classes), object properties (or relations), datatype properties (or attributes), and individuals (or instances). Current ontology matching tools do not match different types of entities (e.g., classes to instances, classes to relations, or relations to instances). They are only capable of matching classes to classes, object properties to object properties, datatype properties to datatype properties, and instances to instances. However, there is often a difference in the structuring (or organization) of entities in two similar ontologies. Here is an example from Experiment 5:

In ontology O1, countries are placed in columns (as datatype properties or attributes) and years in rows (as instances). In contrast, in ontology O2, countries are placed in rows (as instances) and years in columns (as datatype properties). The developers of these two datasets made two different conceptual choices. The results of LogMap and AML are very poor in Experiment 5 because there was nothing to match. The only ontology language that allows this kind of matching is OWL Full, which is not yet supported by the ontology community. Only OWL Lite and OWL DL are currently supported by existing ontology building and matching tools. This is still a significant limitation that hinders interoperability and increases uncertainty in ontology integration tasks.
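The structural mismatch described above can be made concrete with a small sketch. It assumes, purely for illustration, that both source tables are available as lists of rows, and it uses made-up values. The helper `wide_to_long` is hypothetical and was not part of the experiments; it only shows that the two layouts carry the same facts once both are rewritten as (country, year, value) triples.

```python
def wide_to_long(header, rows):
    """Turn a table with one column per key into (row_key, column_key, value) triples."""
    triples = []
    for row in rows:
        row_key, values = row[0], row[1:]
        for column_key, value in zip(header[1:], values):
            triples.append((row_key, column_key, value))
    return triples


# O1-style layout: years in rows, countries in columns (values are made up).
o1 = wide_to_long(["Year", "China", "Congo"], [[2000, 1.0, 2.0], [2001, 1.5, 2.5]])
# O2-style layout: countries in rows, years in columns.
o2 = wide_to_long(["Country", 2000, 2001], [["China", 1.0, 1.5], ["Congo", 2.0, 2.5]])

# After normalization, both layouts describe the same (country, year, value) facts.
facts_o1 = {(country, year, value) for year, country, value in o1}
facts_o2 = {(country, year, value) for country, year, value in o2}
print(facts_o1 == facts_o2)
```

In such a normalized form, countries would be instances on both sides, which is the kind of entity pairing that current tools can handle.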

Uncertainty Caused by the Domain Complexity
The inherent complexity of the domain can also be a source of uncertainty. In this case, even ordinary users cannot be sure about ontology correspondences; a domain expert is needed to identify the correct ones. In the following examples, an expert in geography (or geopolitics) is needed to decide on the correspondences. Table 5 shows an example from Experiment 3. In this example, we are not sure whether the entity "Small_island_developing_states" from O1 should be matched to the entity "Small_states", the entity "Pacific_island_small_states", or the entity "Other_small_states" from O2, or to all of them at the same time. Nor do we know whether "Caribbean_small_states" should also be included. Tables 6 and 7 show an example from Experiment 5. In this case, we are not sure whether the entity "China" from O1 should be matched only to the entity "China" or to the entities "China", "Hong_Kong_SAR,_China", and "Macao_SAR,_China" from O2. An expert should check all these possible correspondences and choose the correct ones.

Examples of Uncertainty from the Resulting Alignments
The ontology matching tools used can be easily misled, as the input ontologies in these experiments contain different appellations for the same countries.In the following, we will show some examples of uncertainty cases (including false and missing correspondences) extracted from our resulting output alignments.

Example 1 (Ambiguity Case).
As noted in Subsection 4.2, ambiguous equivalence correspondences are a source of uncertainty in ontology alignments. Each set of ambiguous correspondences often brings with it some false correspondences. Let us take an example from Experiment 1, in which LogMap identified two ambiguous correspondences. The first correspondence, which matches "Repub._of_the_Congo" with "Democratic_Republic_of_the_Congo", has a confidence value of 0.76. The second correspondence, which matches "Repub._of_the_Congo" with "Congo", has a confidence value of 0.8. The first correspondence is incorrect; the second one is correct. In this case, LogMap succeeded in assigning the higher confidence value to the correct correspondence and the lower confidence value to the incorrect one, but this is not always the case. This was the only false positive generated by LogMap in Experiment 1.

Example 2 (Ambiguity Case).
Let us take a similar ambiguity example from Experiment 8, in which AML identified two ambiguous correspondences. The first correspondence, which matches "Republic_of_Congo" with "Congo_(Brazzaville)", has a confidence value of 0.8. The second correspondence, which matches "Republic_of_Congo" with "Congo_(Kinshasa)", also has a confidence value of 0.8. Only one of the two is correct: the Republic of the Congo is Congo (Brazzaville), whereas Congo (Kinshasa) is the Democratic Republic of the Congo. In this case, both correspondences have the same confidence value, which makes it even trickier to choose automatically.

Example 3 (Ambiguity Case).
Here is another ambiguity example, from Experiment 2, in which AML identified three correspondences, two of which are ambiguous. The first and third correspondences are ambiguous because they have an entity in common (O2: Sudan). They have the same confidence value, so we cannot automatically choose between them. The first and second correspondences are correct; however, the third one is incorrect. Sudan and South Sudan previously formed a single country called Sudan. Therefore, "Sudan_(former)" from O1 should be a super-entity of both "Sudan" and "South_Sudan" from O2 (i.e., both "Sudan" and "South_Sudan" from O2 should be sub-entities of "Sudan_(former)" from O1). The right matching would therefore identify two subsumption correspondences, one from each of these O2 entities to "Sudan_(former)". Current ontology matching tools can only identify equivalence correspondences; only a human being can understand and identify subsumption correspondences. In many cases, ambiguous equivalence correspondences convey subsumption. That is why it is preferable that a human expert check these ambiguous correspondences: not all ambiguity cases can be resolved automatically.
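The ambiguity cases of Examples 1 and 2 can be sketched as a simple triage. The function below is an illustrative assumption, not the disambiguation procedure used in the experiments: it treats two correspondences as ambiguous when they share an entity, keeps a correspondence only if its confidence is strictly higher than that of all its rivals, and sends ties to human review. The name `triage` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def triage(alignment):
    """Split (source, target, confidence) tuples into accepted, review, rejected."""
    by_source, by_target = defaultdict(list), defaultdict(list)
    for c in alignment:
        by_source[c[0]].append(c)
        by_target[c[1]].append(c)

    accepted, review, rejected = [], [], []
    for c in alignment:
        # Rivals are the other correspondences sharing the source or the target entity.
        rivals = [r for r in by_source[c[0]] + by_target[c[1]] if r != c]
        best_rival = max((r[2] for r in rivals), default=None)
        if best_rival is None or c[2] > best_rival:
            accepted.append(c)   # unambiguous, or strictly the best candidate
        elif c[2] == best_rival:
            review.append(c)     # tie: cannot be decided automatically
        else:
            rejected.append(c)   # a rival has higher confidence
    return accepted, review, rejected


experiment_1 = [
    ("Repub._of_the_Congo", "Democratic_Republic_of_the_Congo", 0.76),
    ("Repub._of_the_Congo", "Congo", 0.8),
]
experiment_8 = [
    ("Republic_of_Congo", "Congo_(Brazzaville)", 0.8),
    ("Republic_of_Congo", "Congo_(Kinshasa)", 0.8),
]
print(triage(experiment_1))
print(triage(experiment_8))
```

Note that rejecting the lower-confidence rival is only a heuristic: as the Sudan example shows, an ambiguous pair may actually convey subsumption, so rejected correspondences may still deserve an expert's attention.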

Example 4 (Semantic Matching Case).
As stated before, neither LogMap nor AML could detect the correspondence between the entity "Near_East" from O1 and the entity "WesternAsia" from O2 in Experiment 1. Similarly, in Experiment 7, neither LogMap nor AML could detect the correspondence between the entity "Swaziland" from O1 and the entity "Eswatini" from O2. The same goes for the entity "Palestine" from O1 and the entity "West_Bank_and_Gaza" from O2 in Experiment 3.
In these examples, the entity from O1 and the entity from O2 have no word or token in common, but they are synonymous. As a result, lexical and string-based matchers are not able to identify such correspondences, and a semantic matcher is needed. However, semantic matching is the main limitation of current ontology matching tools, because it is the most challenging type of matching. To find similarities, a semantic matcher often requires external resources, such as dictionaries and thesauri in different domains. LogMap and AML also missed some additional straightforward cases. These correspondences were not detected because the external dictionaries used do not necessarily contain certain words (especially domain-specific ones). For example, WordNet, which is a large lexical database of English, does not recognize the word Czechia or the word Kyrgyz (although it recognizes the word Czech and the word Kyrgyzstan). Besides, the words from O1 are proper nouns. Thus, word preprocessing techniques used by the matching tools (e.g., tokenization and lemmatization) do not even return a common token between these entities.
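The difference between lexical and semantic matching in these cases can be sketched as follows. The synonym table is a hypothetical, hand-made stand-in for an external resource; it is not WordNet and not the background knowledge actually used by LogMap or AML. The function names are likewise assumptions.

```python
def tokens(name):
    """Lower-case tokens of an entity name, split on underscores."""
    return {t for t in name.lower().split("_") if t}


# Hypothetical synonym table: maps a lower-cased name to a canonical form.
SYNONYMS = {
    "swaziland": "eswatini",
    "near_east": "westernasia",
    "palestine": "west_bank_and_gaza",
}


def canonical(name):
    """Return the canonical form of a name according to the synonym table."""
    key = name.lower()
    return SYNONYMS.get(key, key)


def lexical_match(a, b):
    """True when the two names share at least one token."""
    return bool(tokens(a) & tokens(b))


def semantic_match(a, b):
    """True when the two names map to the same canonical form."""
    return canonical(a) == canonical(b)


for a, b in [("Swaziland", "Eswatini"), ("Near_East", "WesternAsia")]:
    print(a, b, lexical_match(a, b), semantic_match(a, b))
```

The sketch also makes the limitation visible: a pair is found only if the external resource happens to contain it, which is exactly where general-purpose dictionaries fall short for domain-specific proper nouns.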

Example 5 (Abbreviations and Acronyms Case).
In Experiment 6, neither LogMap nor AML could identify certain correspondences. It is tricky for any ontology matching tool to detect them because one of the two entities contains so many tokens that the pair does not appear to the tool to be a correspondence. Alternatively, the matching tool may detect these correspondences but assign them very low confidence values. This shows that not all low-confidence correspondences are necessarily incorrect. Abbreviations and acronyms make it even harder for the matching tools to detect such correspondences. For example, in Experiment 3, LogMap did not find one such correspondence, whereas AML identified it and gave it a confidence value of 0.51. In all these cases, human intervention is necessary to remove the false correspondences and add the missing correct ones. No automated ontology matching tool, even the most powerful, can avoid uncertainty.

Lessons Learned
Despite the very good performance of ontology matching tools, the set of experiments that have been conducted shows how even a relatively simple use case can present several flaws that arguably lead to a situation of uncertainty once the integration process is automated. Besides, state-of-the-art approaches for managing uncertainty in ontology alignments are not effective: they cannot avoid deleting correct correspondences while filtering ambiguous or low-confidence correspondences out of the alignment.
We notice that there are many uncertainty cases in ontology alignments, and it is difficult to treat all of them at once. However, it is very important to decrease uncertainty as much as possible. Uncertainty in ontology alignments can appear in several forms, such as matcher unreliability (or confidence unreliability), a high level of domain complexity, ambiguity of correspondences, and incompleteness and/or incorrectness of correspondences. Several factors can thus contribute to uncertainty in ontology matching, and no single existing metric captures all of its aspects. Still, it would be interesting to combine the available evaluation metrics into a single metric that globally reflects the overall uncertainty of an alignment.
There are no specific evaluation metrics dedicated to measuring uncertainty in ontology alignments. The error rate associated with a matching process can be measured according to standard metrics that evaluate the quality of alignments. These metrics assess the completeness and correctness of an alignment compared to a reference. We have also found the ambiguity degree metric in the literature, which is very relevant for uncertainty measurement. However, the acceptability of a given error rate at an application level may depend on several factors. In general terms, it is reasonable to distinguish between error-tolerant applications that may tolerate a certain amount of imprecision and inaccuracy (such as text annotation and recommendation applications) and critical applications that require high correctness and precision (such as query answering applications or the case study previously presented). Future work should propose new metrics that better reflect the uncertainty in ontology alignments.
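For reference, the standard correctness and completeness metrics can be computed as in the minimal sketch below, where an alignment is modeled as a set of (source, target) pairs. The toy data are hypothetical and only loosely echo the entity names discussed above; they are not results from the experiments.

```python
def evaluate(output, reference):
    """Return (precision, recall, f_measure) of an output alignment against a reference."""
    output, reference = set(output), set(reference)
    tp = len(output & reference)   # detected and expected
    fp = len(output - reference)   # detected but not expected
    fn = len(reference - output)   # expected but not detected
    precision = tp / (tp + fp) if output else 0.0
    recall = tp / (tp + fn) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy reference and output alignments (hypothetical).
reference = {
    ("Repub._of_the_Congo", "Congo"),
    ("Swaziland", "Eswatini"),
    ("Near_East", "WesternAsia"),
    ("China", "China"),
}
output = {
    ("Repub._of_the_Congo", "Congo"),
    ("China", "China"),
    ("Repub._of_the_Congo", "Democratic_Republic_of_the_Congo"),
}
precision, recall, f_measure = evaluate(output, reference)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```

Precision penalizes false correspondences and recall penalizes missing ones, which is why both are needed to judge whether an alignment is acceptable for a given application.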
It is difficult to quantify uncertainty in ontology alignments, especially implicit uncertainty that cannot be measured directly. For example, how can we know whether the confidence values are reliable? In this case, one can suggest a sample-checking process in which the user manually evaluates some randomly chosen correspondences from the alignment, and a global score is then computed from the user's responses. Similarly, how can we know whether the domain(s) of the input ontologies has (have) a high level of complexity? In this case, a score can be proposed by the user or the expert after examining the two ontologies to be matched or the ambiguous correspondences of the output alignment.
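The sample-checking idea can be sketched as follows. The function name `sample_check`, its parameters, and the use of a callback in place of the human evaluator are assumptions made for illustration; in practice the `judge` function would present each sampled correspondence to a user.

```python
import random


def sample_check(alignment, judge, sample_size, seed=0):
    """Estimate alignment reliability from a manually judged random sample.

    judge is a callable returning True when a correspondence is deemed correct;
    here it stands in for the human evaluator.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(alignment, min(sample_size, len(alignment)))
    if not sample:
        return 0.0
    correct = sum(1 for c in sample if judge(c))
    return correct / len(sample)


# Toy alignment (hypothetical); the set below plays the role of the user's answers.
alignment = [
    ("Repub._of_the_Congo", "Congo"),
    ("Repub._of_the_Congo", "Democratic_Republic_of_the_Congo"),
    ("China", "China"),
    ("Swaziland", "Eswatini"),
]
judged_correct = {alignment[0], alignment[2], alignment[3]}
score = sample_check(alignment, lambda c: c in judged_correct, sample_size=4)
print(score)
```

The resulting score is only an estimate: with a small sample it can differ noticeably from the true proportion of correct correspondences, so the sample size should grow with the size of the alignment.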
We can also get an idea of the risk of ambiguity between the two input ontologies to be matched. We can suggest a metric that reflects the percentage of common words (or tokens) among all the words (or tokens) composing the entities of both ontologies. It can be considered a metric that shows how similar the two ontologies are, since ontology similarity is an important uncertainty indicator. The more similar the ontologies are, the greater the risk of ambiguity and uncertainty in the matching process. These proposed metrics can help us estimate the degree of uncertainty of the returned alignment.
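One possible reading of the token-overlap metric suggested above is the share of distinct tokens that occur in both ontologies among all distinct tokens of either. The sketch below implements that reading; the tokenization rule and the function names are assumptions, and other definitions (for instance, weighting tokens by frequency) would be equally plausible.

```python
import re


def tokens(name):
    """Lower-case alphanumeric tokens of an entity name."""
    return {t for t in re.split(r"[^A-Za-z0-9]+", name.lower()) if t}


def token_overlap(entities_1, entities_2):
    """Share of distinct tokens common to both ontologies among all distinct tokens."""
    t1 = set().union(*(tokens(e) for e in entities_1))
    t2 = set().union(*(tokens(e) for e in entities_2))
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0


# Property names from Experiment 6 as they appear in the datasets.
original = token_overlap(
    ["Short_Name", "Table_Name", "Long_Name"], ["Country_or_Area"]
)
# The more expressive names proposed earlier in the text.
renamed = token_overlap(
    ["Short_Country_Name", "Table_Country_Name", "Long_Country_Name"],
    ["Country_Name"],
)
print(original, renamed)
```

With the original names the overlap is zero, which is consistent with the tools finding nothing to match, while the proposed names share the tokens "country" and "name".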
Overall, uncertainty is not well addressed in the ontology matching area. Uncertainty management in ontology alignments is still an open issue. New metrics for uncertainty evaluation and new uncertainty management solutions are needed in the area of ontology matching (and in schema matching in general).
The overall reliability of the fully automated integration process suggests adopting semi-supervised matching approaches. These can intuitively facilitate manual operations by providing suggestions that need human validation before being committed to the system. Generally speaking, semi-supervised matching processes are not necessarily scalable. However, they can be considered effective in validating detected correspondences by rejecting the incorrect ones. On the other hand, semi-supervision is not as effective in managing missed correspondences, because it requires the validator to check the whole alignment. Such an issue becomes more and more relevant, if not an actual bottleneck, as the amount of information to integrate increases. A situation in which the target alignment presents ambiguities or is not completely clear constitutes a special case for which automated matching can provide an acceptable approximation, especially on a large scale.
In conclusion, despite the notable technological advances at both a theoretical level and an application level, a reliable automated integration process cannot currently be guaranteed, regardless of the scale and the characteristics of the target environment.

Conclusion
This paper reviews the ontology matching area, focusing on the aspects of uncertainty and uncertainty management. It also deals with the implementation of a knowledge-building case study from which we make observations, discuss uncertainty situations, and extract the lessons learned from using automatic ontology matching systems. The case study aims to provide an integrated semantic data space from heterogeneous raw datasets by leveraging automatic matching tools. We have conducted experiments on real data by combining the virtual table method with existing tools for automatic ontology matching. Specifically, we first use a tool that supports a user-friendly conversion of data into a semantic format. Then, we perform an automatic ontology matching process using well-known tools for ontology matching.
The results clearly show the significant uncertainty introduced by automatic ontology matching methods, even considering a relatively simple case study. In general terms, regardless of the adopted matching tool, the experiments show a high number of false positives and false negatives. This can be summarized as a fundamental unreliability of automatic matching techniques, probably with the exception of error-tolerant applications. Realistically, we believe that a further consolidation of current matching techniques with an increased level of customization may result in a more effective application of semi-supervised integration methods.
Future work will aim to manage uncertainty in practice by inferring potential relationships between the characteristics of the raw data to be integrated and the resulting integrated data space. Besides, we are starting to explore a holistic alignment-based approach for integrating multiple ontologies, driven by a holistic ontology matching process.

Figure 1: Knowledge-Building Process Driven by Data Integration.
Datasets in Experiment 1 describe a list of countries and several associated indicators (e.g., region, population, population density, surface area, GDP (Gross Domestic Product), birth/fertility rate, net migration, literacy, etc.). In Experiments 2, 3, and 4, datasets provide indicators of food security and undernourishment for different countries in different years. Experiments 5 and 6 describe economic indicators for different countries in different years, while Experiment 7 targets life expectancy indicators. Finally, Experiments 8, 9, and 10 perform cross-domain matching (between the domains of democracy, violence against children, and CO2 and greenhouse gas emissions). Regardless of the meaning of the data, the original tables report the values of given indicators for different countries in different years. The actual structure may vary from case to case, but, in most cases, it follows typical patterns used to organize spatio-temporal data. For tables describing a particular indicator, rows represent the names of countries and columns represent years (or intervals of years), or vice versa. In some other cases, tables report the values of more than one indicator for different countries in a single year (or in a single interval of years): here, rows represent the names of countries, and columns represent indicators. More rarely, tables report the values of multiple indicators for different countries in different years: in this case, rows represent ID numbers (enumerated numbers/indexes), and columns contain a year or interval column, a country column, and indicator columns.

Figure 2: Alignment Disambiguation (A Simplified Approach for the Stable Marriage [57]).

† R: Number of correspondences in the reference alignment R = number of expected correspondences = TP + FN.
‡ A: Number of correspondences in the output alignment A = number of detected correspondences = TP + FP.
§ Amb: Number of ambiguous correspondences in the output alignment A.

Table 4: Results Summary (after Alignment Trimming and Disambiguation).

Table 5: Comparison between Precision Results in Different Tables.

Table 6: Comparison between Recall Results in Different Tables.

Table 7: Comparison between F-measure Results in Different Tables.

Table 9: Comparison between Ambiguity Results in Different Tables.