An Entropy-Based Weighted Concept Lattice for Merging Multi-Source Geo-Ontologies

To deal with the complexities associated with the rapid growth in a merged concept lattice, a formal method based on an entropy-based weighted concept lattice (EWCL) is proposed as a mechanism for merging multi-source geographic ontologies (geo-ontologies). First, formal concept analysis (FCA) is used to formalize different term-based representations in relation to the geographic domain, and to construct a merged formal context. Second, a weighted concept lattice (WCL) is applied to reduce the merged concept lattice, based on information entropy and a deviance analysis. The entropy of the attribute set is exploited to acquire the intent weight value, and the standard deviation contributes to computing the intent importance deviance value, according to the user preferences and interests. Some nodes of the merged concept lattice are then removed if their intent weights are lower than the intent importance thresholds specified by the user. Finally, experiments were conducted by combining fundamental geographic information data and spatial data in the hydraulic engineering domain from China. The results indicate that the proposed method is feasible and valid for reducing the complexities associated with the merging of geo-ontologies. Although there are still some problems in the application, the manuscript offers a new approach for the merging of geo-ontologies.


Introduction
Currently, geo-ontology is widely used for representing and sharing spatial information in various application domains, and integrating different geospatial information between interoperating systems has become a hot topic in many scientific disciplines [1]. However, due to the different data standards and incompatible terminologies for expressing spatial information in geographic information science, semantic heterogeneity arises easily. For instance, semantic differences regarding rivers often occur in the distributed systems of the Ministry of Water Resources and the Ministry of Land and Resources in China. The former emphasizes the natural morphology of rivers, and the latter stresses the navigability of waterways. These problems could possibly be avoided if ontologies were applied. At present, ontology is widely used as a tool to address heterogeneity problems in many areas, such as knowledge representation, information retrieval, and the semantic web [2,3]. Furthermore, ontology has been defined as "a formal explicit specification of a shared conceptualization" [3], and geo-ontology is no exception in real applications: it is used to define a common vocabulary that facilitates interoperability and handles some problems with data integration in various systems [4]. The existing geo-ontology building frameworks involve different experts, different tools, and different techniques. Geo-ontologies may therefore differ and even conflict, even though they exist in the same domain. As a consequence, merging geo-ontologies from multi-source geospatial data is still a big challenge.
To date, some significant progress in merging geo-ontologies has been achieved. Kokla et al. [5] combined semantic factoring and a concept lattice for integrating multiple ontologies. They used an example of integrating the concept type "stream", as defined by three different ontologies: the CYC top-level ontology, WordNet, and SDTS. This method can detect possible implicit relations between concepts that are not predefined. Similarly, Zhu [6] presented a formal method based on concept lattices to form a more general semantic level, and designed an algorithm to reduce the redundant concept relations.
Meanwhile, Buccella et al. [7] proposed a merging method that uses a set of matching functions and inferences over the ontologies in order to find the most suitable correspondences. This method minimizes the redundant information and improves the understandability of data by applying the ISO 19109 and ISO 19107 standards to normalize the geographic ontologies. Torre et al. [4] provided a conceptualized framework of geographic application ontologies for sharing and integrating geospatial information. This method is based on abstract classes to cognitively classify geographic concepts, and directly translates the relationships between mapping concepts by a set of axiomatic relations. In addition, Stumme et al. [8] proposed the formal concept analysis merge (FCA-MERGE) method as a semi-automatic method for translating concept lattices into a merged ontology, but the method still requires revision by experts. Chen et al. [9] provided a new method combining WordNet and the fuzzy formal concept analysis technique for merging ontologies within the same domain. Two ontologies, a base ontology and a revision ontology, can be converted into a novel fuzzy ontology by using the revision ontology to update the base ontology, but the method utilizes only partial semantic factors to determine the relationship between elements, and much more detailed information needs to be considered in the future. The applications mentioned above mainly focus on top-level ontologies, a conceptualized framework, matching functions, and formal concept analysis. However, the previous works have two drawbacks: (1) Each method must extract the formal representation of semantics from the definitions of geographic entities by FCA, and the complexity at the different levels does not get full attention, despite the rapid growth of the merged concept lattice. (2) These methods generally not only assume that all the intents are equally important during the process of constructing the concept lattice, but also do not fully consider the requirements and preferences of the user. Hence, new techniques must be added to handle these shortcomings.
However, due to the high space and time complexity, the performance of most algorithms for constructing a concept lattice for dense and large contexts is not desirable [10]. Accordingly, reducing the size of the merged concept lattice in most of the approaches is essential for addressing the first problem. Since each node in the structure of a concept lattice is composed of an intent and an extent [11,12], we approach the reduction from the viewpoints of finding an appropriate object set and an appropriate attribute set, respectively. From the point of view of object reduction, in our previous work [13] we applied a fuzzy equivalence relation matrix to construct the equivalent characteristic components of the extent of the concept lattice. We then selected an appropriate threshold value to obtain sets of concept extents in different granulations, and measured the similarity of any two extents in the granulation. Finally, the experimental results indicated that the merging process is a stepwise refinement process corresponding to different levels of granularity, and that conceptual similarity in the fine-grained levels was higher than in the coarse-grained levels.
Following this premise, our current work is motivated by the need to address the two problems mentioned above. The main contributions of this manuscript can be summarized as follows. First, FCA is introduced for the merging of multi-source geo-ontologies. Second, EWCL is applied to consider the importance of the different intents of geo-ontologies in GIScience: the entropy of the attribute set is used to acquire the single-attribute intent weight value, and the standard deviation is used for the intent importance deviance value between the multiple attributes. Third, the merged concept lattice is simplified from the point of view of attribute reduction, in terms of the importance threshold specified by the user.
The rest of this paper is organized as follows: Section 2 briefly reviews some notions of FCA and analyses some of the semantic representations of geo-ontologies; Section 3 describes the basic contents of the merging process of EWCL; Section 4 describes the experiments undertaken to investigate the merging of geo-ontologies, and presents an analysis of the results; and Section 5 draws conclusions and discusses future work.

Basic Notions of FCA
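FCA, a branch of applied mathematics based on lattice theory, is a conceptual framework proposed by Wille in 1982, and has been applied to many different fields, such as data analysis, knowledge discovery, software engineering, and information retrieval [14][15][16][17]. To demonstrate the relations between objects and attributes in a given application domain, a concept in FCA is defined within a formal context. Here, we only briefly review some basic semantic analysis by FCA. For a more extensive introduction, refer to [18]. In a formal context $K = (G, M, I)$, where $G$ is a set of objects, $M$ is a set of attributes, and $I \subseteq G \times M$ is a binary relation between them, the derivation operators for $A \subseteq G$ and $B \subseteq M$ are

$$A' = \{ m \in M \mid (g, m) \in I \text{ for all } g \in A \}, \qquad B' = \{ g \in G \mid (g, m) \in I \text{ for all } m \in B \}.$$

The formal concepts of the context are derived in terms of the following operations:

$$(A'', A') \quad \text{and} \quad (B', B'').$$

The order between two concepts $(A_1, B_1)$ and $(A_2, B_2)$ in a formal context $K$ is defined as follows:

$$(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \iff B_2 \subseteq B_1.$$

In the above condition, "$\le$" is called the hierarchical order of concepts.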
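The derivation operators and the hierarchical order can be illustrated with a short Python sketch. The toy context below is hypothetical (it is not Table 3), the function names are chosen for this illustration only, and the enumeration is a naive closure over all object subsets rather than the incremental algorithm cited later in the paper.

```python
from itertools import combinations

# Hypothetical toy context (not Table 3): object -> set of attributes.
CONTEXT = {
    "lake": {"water", "natural", "depression"},
    "reservoir": {"water", "artificial", "depression"},
    "canal": {"water", "artificial", "strip"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """A': attributes shared by every object in `objects`."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def common_objects(attributes):
    """B': objects that possess every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def all_concepts():
    """Enumerate concepts (A'', A') by closing every object subset (exponential, toy use only)."""
    concepts = set()
    objects = list(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset)
            extent = common_objects(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def is_subconcept(c1, c2):
    """Hierarchical order: (A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


if __name__ == "__main__":
    for extent, intent in sorted(all_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

For realistic contexts the exponential enumeration would be replaced by a dedicated lattice construction algorithm; the sketch only makes the definitions concrete.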

A Semantic Representation of Geospatial Information
In the current research on concept lattices, it is usually assumed that the extent and intent in the formal concept are of equal importance. The extent represents the entities, whereas the intent includes their intrinsic characteristics. However, the semantic basis for spatial concept types in the hydrological domain usually lies in the intent, and the intent is determined by essential geographic properties [4]. The extent and intent of a spatial concept may be extracted from the common understanding, taxonomic structure, and recognized vocabulary of the domain knowledge derived from professional dictionaries and standards. Based on the fundamental philosophical notions of identity, unity, essence, and dependence, Guarino and Welty [19] presented a set of meta-properties to represent the behavior of the essential properties, including the rigid property, non-rigid property, anti-rigid property, semi-rigid property, carrier identity, and external dependence. For example, we normally think that reservoirs in the hydrological domain possess properties such as "store water", "storage capacity", "name", etc. The "store water" property is normally rigid for each individual reservoir. The "name" property cannot support identity while being the same individual. The "storage capacity" property is non-rigid, because different reservoirs have different storage capacities. Therefore, "store water" may represent the ontological property of the reservoir. In geographic ontologies, definitions contain rich sources of scientific knowledge of the geographic domain. In general, they are also the key and the only descriptions of category terms, from which the semantic definitions of geographic categories (e.g., purpose, cause, material) are derived [20]. In order to identify a set of semantic properties and relations, Wang et al.
[21] proposed the properties of geospatial ontology from the view of a top-level ontology, including space, time, cause, material, function, object, and metrics. On the basis of their work, we identified partial semantic properties and relations of inland hydrological concepts from GB/T 13923-2006 (Specifications for feature classification and codes of fundamental geographic information), GB/T 20258.1-2007 (Data dictionary for fundamental geographic information features), and SL 213-98 (Specification on basic information coding of water conservancy projects) in China (see Table 1 and Table 2). In the given normative descriptions, heterogeneity problems inevitably exist. For example, one standard considers a lake to consist of the following semantic properties and relations: material with value "water", spatial morphology with value "low depressions", and metrics with value "wide areas and slowly changes the yield of water"; the other defines a lake with the corresponding semantic properties: material with value "water", spatial morphology with value "natural depressions on the earth", and function with value "store water". In terms of the formal definitions of ontologies, geo-ontologies are applied to capture the universal concepts and meanings in the geospatial domain. However, "lake: wide areas and slowly changes the yield of water" is rather subjective and is a non-rigid property, because different lakes have different areas, flow velocities, and water yields. Hence, in order to replace "wide areas and slowly changes the yield of water", we adopt a more appropriate expert standard: metrics with value "≥ 10^5 m^3" [22]; meanwhile, "natural depressions" is a part of "low depressions". In addition, other context-specific semantic elements are also identified. For instance, the semantic properties in relation to hydrography are complemented by domain experts, such as time (perennial or seasonal) and cause (natural or artificial). Similarly, "reservoir: retain river runoff" is a
part of "reservoir: prevent flood", and "in the river, the valley, the depressions and underground permeable layer" is considered as spatial morphology with value "depressions on the earth". Here, the ontological properties of geospatial objects in Table 1 and Table 2 have been extracted from the normative descriptions; in particular, the ontological properties in parentheses were updated and complemented by domain experts. In reality, extracting semantic information from the normative descriptions may encounter differences and conflicts, such as inconsistencies in normative definitions, differences between spatial locations, incomplete characteristics, or overlapping functions. Furthermore, a certain vagueness, caused by different languages, also exists in the literal descriptions of the concepts of geospatial objects. In order to deal with these heterogeneities in the merging of multi-source geospatial data, a formal method based on a top-level ontology should be considered. The formal conceptualization of a geographical concept consists of two parts: the extent and the intent. The former includes the entities or objects that belong to the concept, whereas the latter represents its intrinsic meaning or properties. Each row and each column of a formal context represents an extent and an intent of a geographical concept, respectively. Due to the extensive contents of the two above-mentioned domains, we selected only partial elements to construct the formal contexts by FCA in Table 3.

For the running example in Table 3, $A'$ and $B'$ take forms such as $\{\text{seasonal lake}\}'$, $\{\text{seasonal river}\}'$, and $\{\text{dike}\}'$. Here, $\{\text{seasonal lake}\}'$ is the set of attributes corresponding to the semantic factor seasonal lake, whereas $\{c\}' = \{\text{seasonal lake}, \text{seasonal river}, \text{lake}, \text{pond}\}$ is the set of semantic factors denoted by the attribute c (cause with value "nature").

The Entropy-Based Weighted Concept Lattice
Although all the intents are generally of equal importance during the construction of a concept lattice [10][11][12], in some practical applications a user is usually interested in certain attribute characteristics, according to his/her preferences and requirements. For example, we may pay more attention to the "shipping property" of a canal than to its "water storage". Hence, we add weights to the intents to capture their importance, so that we do not need to investigate all the nodes, but only those nodes that meet our needs. Motivated by an incremental updating algorithm used to effectively construct a weighted concept lattice [10][11][12], the proposed EWCL approach is outlined in the definitions below to resolve the above-mentioned problem. In parallel, we briefly recall some basic notions and judgment methods for each intent weight value, with regard to weighting a concept lattice. Refer to [10][11][12] for a more extensive introduction.
In a general formal context $K = (G, M, I)$, each attribute $b_i$ in the attribute set $M$ is assigned a weight $w_i$ that indicates its importance. If the intent $B$ of a concept consists of one attribute, $B$ is denoted as a single-attribute intent; otherwise, $B$ is denoted as a multi-attribute intent. Here, the weight of the multi-attribute intent, $weight(B)$, is defined as the arithmetic average of the weights of the corresponding attributes, computed as follows:

$$weight(B) = \frac{1}{|B|} \sum_{b_i \in B} w_i,$$

where, if $A = \emptyset$ or $B = \emptyset$, we assume that $weight(B) = 1$. In general, the weight of the single-attribute intent is determined by domain experts. However, the current spatial objects stem from different domains, and it is difficult for experts in one specific domain to determine the weights. Therefore, lacking such prior knowledge, we adopt an objective probability method to quantify the related weights by using the axiomatic characterizations of information entropy, according to Shannon [11,23,24,25].

For each attribute $b_i$, $w_i$ is denoted as the weight value of the single-attribute intent $b_i$. The probability and the weight value are computed following [11,25], from the probability of each object possessing $b_i$ and the average information $E(b_i)$ (Definition 4); the resulting $w_i$ is generally in a normalized form. Here, we regard a weighted concept lattice whose weight values are produced by the information entropy as an entropy-based weighted concept lattice. However, in practical applications, $weight(B)$ does not take into account the deviation among the weights of the attributes in the intent. This is not conducive to the sensitive extraction of the knowledge a user is interested in. Therefore, in order to explore how far each $w_i$ deviates from $weight(B)$, we introduce a deviation analysis to evaluate the importance of the multi-attribute intent weight value and select an appropriate threshold value to further meet the needs of the user. The deviation analysis is computed as follows [11,12]:

$$D(B) = \sqrt{\frac{1}{|B|} \sum_{b_i \in B} \left( w_i - weight(B) \right)^2},$$

where $D(B)$ is denoted as the deviation of the multi-attribute intent weight value. In particular, if $B$ is a single-attribute intent, then $D(B) = 0$.

From the above analysis, the basic steps of merging multi-source geo-ontologies based on EWCL are shown in Figure 1. The approach consists of three stages: extracting ontological properties, building a general concept lattice, and reducing the general concept lattice based on information entropy and a deviance analysis. In Section 4, we present experimental results to highlight the relevance of our method for merging multi-source geographic ontologies.
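The weighting stage can be sketched in Python as follows. This is an illustration under stated assumptions, not the exact formulation of the paper: the entropy of an attribute is assumed here to be $E(b_i) = p_i \log_2(1/p_i)$, with $p_i$ the fraction of objects possessing $b_i$, and the weights are assumed to be normalized by the sum of all entropies. The average weight and the deviation follow the arithmetic-mean and standard-deviation descriptions in the text. All function names and the toy data are hypothetical.

```python
import math


def entropy_weights(context, attributes):
    """Normalized single-attribute weights w_i.

    Assumption of this sketch: E(b_i) = p_i * log2(1 / p_i), where p_i is the
    fraction of objects in a non-empty `context` that possess attribute b_i.
    """
    n_objects = len(context)
    entropies = {}
    for b in attributes:
        p = sum(1 for attrs in context.values() if b in attrs) / n_objects
        entropies[b] = p * math.log2(1 / p) if p > 0 else 0.0
    total = sum(entropies.values())
    return {b: (e / total if total else 0.0) for b, e in entropies.items()}


def intent_weight(intent, weights):
    """weight(B): arithmetic mean of the attribute weights; 1 for an empty intent."""
    if not intent:
        return 1.0
    return sum(weights[b] for b in intent) / len(intent)


def intent_deviation(intent, weights):
    """D(B): standard deviation of the attribute weights around weight(B)."""
    if not intent:
        return 0.0
    mean = intent_weight(intent, weights)
    return math.sqrt(sum((weights[b] - mean) ** 2 for b in intent) / len(intent))


if __name__ == "__main__":
    # Hypothetical toy context: object -> attributes.
    toy = {"o1": {"x", "y", "z"}, "o2": {"x", "z"}, "o3": {"z"}, "o4": {"z"}}
    w = entropy_weights(toy, ["x", "y", "z"])
    print(w)
    print(intent_weight({"x", "z"}, w), intent_deviation({"x", "z"}, w))
```

Under the assumed entropy form, an attribute shared by every object carries no information and receives weight 0; a different entropy definition would change the weights but not the averaging and deviation steps.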

Case Study and Discussion
Our method is focused on the interdisciplinary merging of geo-ontologies, which is quite different from other approaches in the same field. For example, we employed the shared concepts related to the hydrographic ontology between fundamental geographic information (GB/T 13923-2006) and hydraulic engineering (SL 213-98) to implement the merging of ontologies. We selected several extents from each of these two domains and constructed a synthetic formal context in Table 3, in which lakes, reservoirs, spillways, and dikes are interdisciplinary common objects. The intents of the two domain ontologies were expanded consistently, based on domain experts (see Table 3). The process of building ontologies is not discussed in detail here; refer to [5,21] for a more extensive introduction. Here, we reduced the merged concept lattice from the intent direction by EWCL.
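Constructing a synthetic formal context from two sources can be sketched as below. The sketch assumes that each source context is given as a mapping from object names to attribute sets, and that an object shared by both sources keeps the union of its attributes. The attribute labels reuse names from Table 3, but the assignment of attributes to objects is illustrative only, and the function names are hypothetical.

```python
def merge_contexts(ctx_a, ctx_b):
    """Merge two formal contexts given as {object: set_of_attributes}.

    Assumption of this sketch: an object present in both sources keeps the
    union of the attributes recorded for it.
    """
    merged = {}
    for ctx in (ctx_a, ctx_b):
        for obj, attrs in ctx.items():
            merged.setdefault(obj, set()).update(attrs)
    return merged


def to_cross_table(context):
    """Return (sorted attributes, {object: row}), with '*' marking a satisfied criterion."""
    attributes = sorted(set().union(*context.values()))
    rows = {
        obj: ["*" if m in context[obj] else "" for m in attributes]
        for obj in sorted(context)
    }
    return attributes, rows


if __name__ == "__main__":
    # Illustrative fragments, not the actual contents of Table 3.
    geographic = {
        "lake": {"material/water", "cause/nature"},
        "reservoir": {"material/water", "cause/artificial"},
    }
    hydraulic = {
        "reservoir": {"material/water", "function/store water"},
        "dike": {"material/soil or stone", "function/prevent flood"},
    }
    attributes, rows = to_cross_table(merge_contexts(geographic, hydraulic))
    print(attributes)
    for obj, row in rows.items():
        print(obj, row)
```

Taking the union for shared objects is one simple merging policy; conflicting or overlapping descriptions would still need the expert revision described in the text.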
First, by using Equations (5)-(7), we obtained the single intent weight values of the merged concept lattice in Table 4, based on the information entropy computed from Table 3. Then, by using the incremental updating algorithm of the concept lattice [11,12,26,27], we drew a general weighted concept lattice (WCL) for representing the merged ontologies, which was induced from the formal context in Table 3.
Despite semantic heterogeneities between multi-source geospatial data, the integrated concept lattice comprised a common and equally perceived part of geospatial objects in the two domains. As shown in Figure 2, all the intents were of equal importance, and all concept lattice nodes $C_i$ were the union of the sets of concepts of the form $(A'', A')$ and $(B', B'')$ derived from Table 3. Specifically, the last term of each node $C_i$ denotes the weight value. $C_{12}, C_{13}, C_{14}, C_{15}, C_{16}, C_{17}, C_{18}$ and $C_{19}$ are the original concepts obtained by examining the formal context in Table 3. The least concept $C_{20}$ does not possess a corresponding meaning in the geospatial domain; it is included only for completeness, to form the bottom of the concept lattice. $C_0, C_1, \ldots, C_{10}$ and $C_{11}$ are the concepts newly generated by the algorithm. Although not every new concept corresponds to a meaningful concept or a specialized term, the new concepts help reveal the hierarchical relationships between geographic categories. The manuscript does not further discuss the hierarchical semantic classification of the integrated concept lattice, since our work does not contribute to those aspects. Instead, we focus on reducing the merged concept lattice, based on information entropy and a deviance analysis. Figure 2 is only a concept lattice generated by partial objects in relation to hydrography, and is not a complete ontology category.
Second, according to Table 4, we computed the average weight of the multi-attribute intent and the deviation of the intent importance using Equations (5) and (8). The results are shown in Table 5. In the construction process of the WCL, we took into account the user preference and interest by combining the average weight of the multi-attribute intent and the deviation of the intent importance to set thresholds for the intent importance. For any weighted concept $(A, B, w)$, we defined the quantity $\theta$ $(0 \le \theta \le 1)$ as the threshold of the intent importance. If $w \ge \theta$, the node $n_w$ is denoted as a frequent weighted node; otherwise, $n_w$ is denoted as an infrequent weighted node. In general, the WCL is composed of all of the frequent weighted nodes [10,11]. As shown in Table 6, the quantity $\theta$ is given in different granulations according to the range of $weight(B)$ in Table 5. If we set $\theta = 0.040$ as the threshold, then C1, C3, and C5 fall below the threshold ($w < \theta$). This indicates that one concept, which involved only the intents material/water and spatial location/on the earth, did not satisfy the threshold of the intent importance. In order to ensure the completeness of the concept lattice, C1 and C3 are temporarily retained, and C5 is removed. Similarly, if we set $\theta = 0.052$ as the threshold, C1, C3, C5, C6 and C7 are removed. The results are shown in Figure 3 and Figure 4 at different granularities. Figure 3 is greatly simplified compared with Figure 2, and Figure 4 is greatly simplified compared with Figure 3. From the above analysis, we conclude that the Hasse diagram is gradually simplified as the granularity ($\theta$) increases, and that the process of reducing the integrated concept lattice is a stepwise refinement process corresponding to different levels of granularity.
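The threshold step can be illustrated with a minimal Python sketch. It assumes that a node is frequent when its intent weight is greater than or equal to the intent importance threshold (called `theta` here, a name chosen for this sketch). The node labels and weight values are hypothetical and are not taken from Table 5, and the sketch does not model the temporary retention of nodes for lattice completeness.

```python
def split_by_importance(node_weights, theta):
    """Partition nodes into frequent (weight >= theta) and infrequent (weight < theta).

    `node_weights` maps a node label to its intent weight, weight(B).
    """
    frequent = {name: w for name, w in node_weights.items() if w >= theta}
    infrequent = {name: w for name, w in node_weights.items() if w < theta}
    return frequent, infrequent


def reduce_at_granularities(node_weights, thresholds):
    """For each threshold (ascending), list the labels of the nodes that remain frequent."""
    return {
        theta: sorted(split_by_importance(node_weights, theta)[0])
        for theta in sorted(thresholds)
    }


if __name__ == "__main__":
    # Hypothetical weights for four nodes.
    weights = {"n1": 0.031, "n2": 0.047, "n3": 0.050, "n4": 0.066}
    for theta, kept in reduce_at_granularities(weights, [0.040, 0.052]).items():
        print(theta, kept)
```

Because the frequent set at a higher threshold is always a subset of the frequent set at a lower one, raising the threshold can only simplify the lattice further, which matches the stepwise refinement described in the text.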
Table 5. The intent weight value and importance deviation value of the lattice node.
Finally, we defined the quantity $\delta$ $(0 \le \delta \le 1)$ as the threshold of the deviation of the intent importance. In terms of $D(B)$ in Table 5, if we set $\delta = 0.27$ as the deviation threshold when $\theta = 0.40$, then C12, C13, C16, C18, and C19 are removed because $D(B) < \delta$. In other words, although the intent weight values of these nodes (C12, C13, C16, C18, and C19) are greater than the predefined threshold ($\theta$), the deviation values of their intent importance are lower than the predefined threshold ($\delta$), so these nodes should also be removed, as can be seen in the strong weighted concept lattice in Figure 5. The reduced WCL when $\theta = 0.40$ and $\delta = 0.27$ is shown in Figure 5, which is greatly simplified compared to Figure 3. In addition, according to the meaning of the deviance analysis, the smaller the standard deviation, the less the single-attribute intent weights deviate from $weight(B)$; conversely, the greater the difference between the single-attribute intent weight values, the greater the deviation value $D(B)$. For example, the deviation value of C6 is greater than that of C4 in Table 5. That is to say, the potential weight value difference between "a" and "d" is greater than that between "a" and "e". By ranking the deviance values in the process of reducing the multi-attribute concept lattice, the method in general gives priority to retaining nodes with a greater difference between their single-attribute intent weight values, and removes nodes with a small difference. The example shows that the proposed method is feasible and effective for reducing the merged concept lattice in the geo-ontological domain, and is appropriate for different user requirements. Consequently, we can select an appropriate threshold to reduce the complexity of the merging of the geo-ontologies by combining information entropy and a deviance analysis.
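The combined use of the two thresholds can be sketched as follows. The sketch assumes that a node is kept only if its intent weight reaches the importance threshold and its deviation reaches the deviation threshold (named `theta` and `delta` here for illustration); the treatment of values exactly equal to a threshold is an assumption. The node labels and values are hypothetical and do not come from Table 5.

```python
def reduce_by_weight_and_deviation(nodes, theta, delta):
    """Split nodes into kept and removed labels.

    `nodes` maps a label to a (weight, deviation) pair. A node is kept when
    weight >= theta and deviation >= delta (boundary handling is assumed).
    """
    kept, removed = [], []
    for label, (weight, deviation) in nodes.items():
        if weight >= theta and deviation >= delta:
            kept.append(label)
        else:
            removed.append(label)
    return kept, removed


def rank_by_deviation(nodes):
    """Order labels from the largest to the smallest deviation, i.e., retention priority."""
    return sorted(nodes, key=lambda label: nodes[label][1], reverse=True)


if __name__ == "__main__":
    # Hypothetical (weight, deviation) pairs.
    example = {"n1": (0.50, 0.10), "n2": (0.45, 0.30), "n3": (0.35, 0.40)}
    print(reduce_by_weight_and_deviation(example, theta=0.40, delta=0.27))
    print(rank_by_deviation(example))
```

In this toy run, a node with a high weight but a low deviation is removed by the deviation threshold, which is the situation the text describes for nodes that pass the importance threshold alone.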

Conclusion and Future Work
In this study, we presented a novel method for the merging of multi-source geo-ontologies by EWCL. First, to deal with the semantic heterogeneity of the geospatial information, FCA was used to extract the formal semantics, and we constructed the formal contexts from two different domains: fundamental geographic information and hydraulic engineering. Second, in order to address the complexity of the merged concept lattice, which grows rapidly with the ontology size, we proposed a merging method for the geo-ontologies in different granulations. According to the user preference and interest, we reduced the intents of the merged concept lattice by the WCL, based on information entropy and a deviance analysis. We can then select an appropriate threshold value to reduce the merged concept lattice according to the specific need. Finally, experiments were conducted by combining fundamental geographic information data and spatial data in the hydraulic engineering domain. The results showed that the proposed method is both feasible and valid. As a matter of fact, the merging of multi-source geo-ontologies is still a challenge, and error-prone problems inevitably exist. WCL is a known theory in other disciplinary areas, but the EWCL proposed in this paper is an application of information entropy theory to the merging of geo-ontologies, and the research is a new attempt in this direction.
In the previous work, although FCA derived some new implicit concepts from the given multi-source geospatial data set, it also generated some redundant concepts and relations. The intent weight value, based on information entropy and a deviance analysis, was regarded as a kind of constraint in order to reduce the merged concept lattice. Sometimes, however, the reduction achieved by EWCL is not enough; in such situations, a reasonable threshold relying on domain experts might be useful to control the simplicity. For example, extracting the semantic relations and concepts is not just a technical issue, but requires revision by domain experts to make sure that the resulting concepts make sense. Consequently, in the future, we will concentrate on multiple ways of acquiring the intent weights of the concept lattice. Other important aspects must also be taken into account in order to achieve semantic integration. For instance, we will further apply a formal reasoning mechanism to improve the merging process, and we will evaluate the quality of the merged ontology.

Definition 2. In a formal context with attribute set $M$, $w_i$ denotes the importance degree of the attribute $m_i$. A weighted formal context is defined as a quadruple $K_w$ consisting of $G$, $M$, $I$, and $W$, where $G$ and $M$ are two non-empty sets called objects and attributes, respectively. $W$ is a set of weight values, which indicates the importance of each single attribute in $M$, and $I \subseteq G \times M$ is a binary relation between $G$ and $M$.

Definition 4. For any object $a_j$, the probability of $a_j$ possessing the corresponding attribute $b_i$ is considered, and $E(b_i)$ is called the average information of weight of $G$ providing the attribute $b_i$. In the formal context $K_w$, $w_i$ is then denoted as the weight value of the single-attribute intent $b_i$.

Figure 1. The workflow of merging multi-source geo-ontologies based on EWCL.

Figure 2. The merged concept lattice based on Table 3.

Figure 3. The reduced weighted concept lattice when (

Figure 4. The reduced weighted concept lattice when (

Figure 5. The reduced weighted concept lattice when ($\theta = 0.40$ and $\delta = 0.27$).
material/water, cause/nature, time/seasonal, material state/flow (spatial morphology/long strip slot, spatial location/on the earth).
reservoir: A body of water or buildings generated from constructing all kinds of dam, gate, dike, and weir, which retain river runoff. Semantic properties: material/water, cause/artificial, spatial adjacency/dam, gate, dike and weir, function/prevent flood (function/store water, spatial morphology/depressions on the earth).

Table 2. Partial semantics of hydraulic engineering concepts in SL 213-98.

Table 3. Parts of the formal contexts of two different ontologies. The letters "a" to "o" represent material/water, material/soil or stone, cause/nature, cause/artificial, spatial morphology/long strip slot, spatial morphology/depressions, spatial morphology/buildings, spatial location/on the earth, spatial location/underground, time/perennial, time/seasonal, material state/flow, function/shipping, function/prevent flood, and function/store water, respectively. "*" stands for criterion satisfied.
operations result in the following concepts:

Table 4. Acquisition method for the single intent weight value.

Table 6. The reduced concept lattice nodes in different granulations.