A Novel Approach to Semantic Similarity Measurement Based on a Weighted Concept Lattice: Exemplifying Geo-Information

Abstract: The measurement of semantic similarity is widely recognized as playing a fundamental role in information science and information systems. Although various models have been proposed to measure semantic similarity, they cannot effectively quantify the weights of the factors that affect the judgment of semantic similarity, such as the attributes of concepts, the application context, and the concept hierarchy. In this paper, we propose a novel approach that comprehensively considers the effects of these factors on semantic similarity judgment, which we name semantic similarity measurement based on a weighted concept lattice (SSMWCL). SSMWCL integrates a feature model and a network model. Based on the feature model, the combined weight of each attribute of the concepts is calculated by merging its information entropy and inclusion-degree importance in a specific application context. By establishing the weighted concept lattice, the relative hierarchical depths of the concepts under comparison are computed according to the principle of the network model. Integrating the feature model and the network model enables SSMWCL to take account of differences between concepts more comprehensively when measuring semantic similarity. Additionally, a workflow of SSMWCL is designed to demonstrate these procedures, and a case study of geo-information is conducted to assess the approach.


Introduction
In information science and systems, semantic similarity plays a major role in various fields such as information retrieval, data integration, and data mining [1][2][3]. Since its emergence in these fields, various theories on semantic similarity have been proposed [4]. At the same time, a large number of approaches based on these theories have been developed and utilized to measure semantic similarity [5][6][7][8]. The objects of semantic similarity measurement are given different names, such as classes or concepts, across various articles. In this paper, the term 'concept' is used to represent the measurement object. The classical models of semantic similarity measurement, named feature models [9], are based on the representation of object-attribute knowledge, in which concepts are represented as a set of features and sometimes the relationships between them. Semantic similarity can then be translated into a comparison of the commonalities and differences of the sets of features that represent different concepts. Some other semantic similarity measurement approaches are based on network models, which encode knowledge in the form of a semantic network [10]. Concepts in the semantic network are represented as vertices, which are connected with links. Thus, the 'distances' between vertices, which can be defined in various ways, such as the shortest path or the weighted path length, can be regarded as measurements of semantic similarity between concepts. Apart from these two main models, some other models and approaches to measuring semantic similarity have been proposed [4,11]. Some new approaches aim to overcome the shortcomings of existing solutions by integrating different models [8,12].
Although many different models and approaches have been proposed to deal with this issue, there is no general model or approach that is broadly applicable to all fields. This is partly because the methods used to judge semantic similarity depend on different factors, such as the application context and purpose [13]. Most of the existing approaches are incapable of dealing with the dependence of semantic similarity judgment on different application contexts. Although some modern approaches take into account the influence of distinguishing contexts, they still cannot effectively assign weights to various factors, such as the semantic granularity of a concept and the application context, which have different influences on semantic similarity judgment. With reference to various existing approaches, we propose in this paper a novel approach to semantic similarity measurement based on a weighted concept lattice (SSMWCL), which combines both a feature model and a network model. SSMWCL is able to quantify the effects of different contexts on semantic similarity. Based on the knowledge representation of the feature model [9], SSMWCL first calculates the combined weight of each attribute of the concepts by merging its information entropy and the importance of its inclusion degree in a specific application context. After this, our approach generates the weighted formal concepts using formal concept analysis and builds a hierarchical network structure, which is named the weighted concept lattice. In this lattice, the weighted concepts are represented as vertices linked with edges. Following this, the absolute semantic similarity between concepts is measured by comparing the commonalities and differences of their weighted attributes, in which the relative hierarchical depth of the concepts in the lattice is taken into account based on the principle of the network model. Next, the semantic similarity is measured using the proportion of absolute semantic similarity in the semantic intent of the concept. Finally, the semantic similarity is normalized by introducing an exponential function. Our work has three main innovative points. The first involves transforming the impact of the specific application context on semantic similarity judgment into weights of the attributes of concepts; the second is the integration of the feature model and the network model in SSMWCL; and the third introduces the size of the concept intent's weight as an influencing factor on semantic similarity judgment.
The remainder of this article is organized as follows: Section 2 provides a brief survey of work related to semantic similarity measurement, formal concept analysis and the weighted concept lattice applied in information science and systems. At the same time, we present a workflow of SSMWCL. In Section 3, we introduce relevant algorithms to construct a weighted concept lattice based on the knowledge representation of the feature model. For this construction, some mathematical tools are applied, including the rough set, information entropy, and formal concept analysis. In Section 4, a novel approach for semantic similarity measurement based on a weighted concept lattice (SSMWCL) is proposed, and an implementation workflow of SSMWCL is presented at the end of the section. Section 5 demonstrates a case of measuring semantic similarities of geo-concepts in accordance with the workflow of SSMWCL, and the results are discussed briefly. Section 6 summarizes our approach, qualitatively compares it to other similar methods, and presents an outlook for the future.

Background
The importance of semantic similarity in theory and practice has been acknowledged for decades in information science and systems, and an increasing number of relevant studies have been conducted. First proposed and used in psychology [14], the geometric model regards semantic distance (similarity) as analogous to spatial distance, and computes it as a function of spatial distance. On the basis of this model, Gärdenfors, Raubal and Schwering used conceptual spaces, which developed the geometric model, to measure semantic distance [15,16]. Tversky et al. introduced and used a set-theoretical similarity measurement, nowadays known as the feature model. Based on the feature model, Rodriguez and Egenhofer proposed the matching-distance similarity measure (MDSM), which distinguishes three different types of features of spatial entity classes to determine the semantic similarity of these classes [8]. Furthermore, Janowicz et al. developed the SIM-DL and SIM-DLA theories, which introduce description logics into the feature model [17,18]. Based on the network model, which connects concepts to establish a semantic network, Ballatore et al. computed the semantic similarity of geographic concepts in the OpenStreetMap (OSM) semantic network [7]. Janowicz et al. proposed a generic framework for semantic similarity measurements in geographic information retrieval, which allows designers to compare and select different measurement approaches for a specific application [19]. Kim et al. matched place descriptions using an overall similarity (including string, linguistic and spatial similarities), which can be regarded as a combined semantic similarity [20]. More recently, some other theories have been introduced to measure semantic similarity. Francis-Landau et al. tried to capture semantic similarity for entity linking with a convolutional neural network, which is a machine learning approach [21]. Mihalcea et al. assessed cross-level semantic similarity with deep learning [22].
In this work, we focus on the semantic similarity measurement between a pair of concepts. In this specific field, four main sets of approaches are used to achieve this goal [23]. The most popular is the structural approach, which uses the network model and relies on graph traversal. The shortest path [24], random walk [25] and other interconnections [26] between nodes are the main variables used to define the semantic similarity function. Based on the feature model, the feature-based approach [8,27] compares the commonalities and differences between a pair of concepts in order to obtain the semantic similarity. The information-theoretical approach relies on Shannon's information theory [28]. With this approach, the semantic similarity of concepts is measured by comparing their information content (IC) [29,30]. The hybrid approach combines several of the aforementioned paradigms. Singh et al. mixed the information-theoretical and structural strategies [31]. Rodríguez and Egenhofer proposed mixing the structural approach and the feature-based approach [2]. However, the aforementioned approaches have several limitations. All of them require a taxonomy or ontology structure describing the elements to compare [23]. The structural approaches require knowledge to be modeled in a specific manner in the graph and are not designed to take non-binary relationships into account. This means that different types of relationships among concepts cannot be distinguished and weighted. Feature-based approaches usually cannot assign different weights to different attributes according to the various application backgrounds. In our work, the proposed SSMWCL does not require an existing taxonomy or ontology structure. However, this does not mean that SSMWCL cannot work with a taxonomy or an ontology structure. In fact, an existing ontology makes it easier to implement SSMWCL, because the essential properties and the classification hierarchy of concepts can be conveniently extracted from it. On the other hand, the SSMWCL approach enables the assignment of different weights to different essential properties of concepts based on the various application backgrounds. In order to distinguish the influence weights of the essential properties of concepts for measuring semantic similarity, we first analyze and calculate the combined weights of the properties of concepts by mixing feature-based and information-theoretical approaches. Following this, the depth of a concept in the network structure, integrated with the weights of its features, can be used to define the semantic similarity between a pair of concepts.
Formal concept analysis (FCA), proposed by Wille [32], has become an important branch of applied mathematics. Its application has expanded into various fields, such as linguistics, information science, software engineering, and computer science. Integrating heterogeneous data or information from many sources is an important characteristic of FCA. Stumme and Maedche developed the FCA-MERGE method, which entails building a concept lattice and semi-automatically creating a target ontology from the lattice [33]. Kokla and Kavouras applied FCA in order to establish a unified concept lattice in integrating geo-ontologies [34]. Xiao and He proposed the combined weights of formal concepts using FCA and constructed a weighted concept lattice with these weighted concepts [35]. In most of these studies, semantic integration and semantic similarity measurement are interrelated. In this paper, we introduce the weighted concept lattice into semantic similarity measurement.
In order to present semantic similarity measurement based on the weighted concept lattice (SSMWCL) in a clear and understandable way, a workflow diagram (Figure 1) has been designed to demonstrate the main procedures. First, the existing classification knowledge and the essential features of concepts that are extracted from the knowledge representations are brought together to build a decision table. Secondly, a formal context is built from the decision table, and the weights of the attributes in the formal context are calculated by combining inclusion-degree importance and information entropy. Following this, a weighted concept lattice is constructed from this formal context with weighted attributes. Finally, we calculate the semantic similarity between concepts based on the weighted concept lattice by comparing the commonalities and differences of their weighted attributes, in which the relative hierarchical depths of the concepts in the lattice are taken into account.

Knowledge Representation of the Feature Model
Discussing and establishing knowledge representation is not the main focus of the present paper. However, in order to make the discussion and demonstration clearer and more convenient, we first introduce the dataset that will be used as the sample data in Section 5.
Ontology, defined as 'a formal specification of a shared conceptualization' [36], is considered to be an effective tool for specifically representing knowledge in various studies [37,38]. In geographic information science (GIScience), a number of scholars have tried to define essential properties of geo-ontologies and to extract these properties from the definitions or descriptions of the geo-concepts' categories and specifications [39,40]. Referring to a previous study [35], we extracted the essential properties of geo-ontologies (representing geo-concepts) based on geo-categories from GB/T 13923-2006 (specifications for feature classification and codes of fundamental geographic information) and definitions of these geo-categories from GB/T 20258.1-2007 (data dictionary for fundamental geographic information features). Partial inland hydrological concepts and their essential properties are presented in Table 1.


Combined Weight of Attribute
Determining the weights of conceptual features is the key to measuring semantic similarity based on the feature model. In this paper, we represent conceptual features with properties or attributes. In the decision table, we use the property to refer to features. In the formal context and weighted concept lattices, a feature is represented as the attribute, which is the value of a specific property. For example, in Table 1, the geo-concept lake has the attribute of water, which is the value of its property material in Table 2. The weight of an attribute is computed via merging two factors that have mutually independent influences in our proposal. The first factor is the inclusion degree importance, which represents the degree to which the attribute impacts existing conceptual classification knowledge. We introduce information entropy as the other factor, which represents the average information of the weight of concepts comprising an attribute. The combination of these two influencing factors on an attribute, which is also known as its combined weight, represents the degree to which this attribute influences the semantic understanding of the concept.

Inclusion Degree Importance of a Property
In order to calculate the inclusion degree importance of a property, we first introduce the property importance from rough set theory. For more basic knowledge about the rough set, we refer readers to a previous study [41].
Definition 1. An information system can be denoted as S = (U, A, V, f), in which U is a non-empty and finite set of objects called the universe; A is a non-empty and finite set of attributes; $V = \bigcup_{a \in A} V_a$, where $V_a$ is the domain of a; and $f : U \times A \to V$ is an information function that assigns an information value to each property of each object, that is, $\forall a \in A, x \in U, f(x, a) \in V_a$. In particular, let A = C ∪ D, where C is the set of condition properties and D is the set of decision properties. If C ∩ D = ∅, S is called a decision table.
For example, Table 3 is an information system, in which U includes lake, pond, seasonal lake, ground river, seasonal river, reservoir, spillway and dike; and A includes material, cause, spatial morphology, spatial location, time, material state, function and category. At the same time, it is a decision table if the attribute 'category' is considered as the decision property.
Definition 2. Let S = (U, A, V, f) be an information system. Each non-empty subset B ⊆ A determines an indiscernibility relation as follows:
$$ind(B) = \{(x, y) \in U \times U \mid \forall a \in B,\ f(x, a) = f(y, a)\}$$
Obviously, ind(B) determines a partition of U denoted as U/ind(B) (for short U/B), which is also called a quotient set of U:
$$U/B = \{[x]_B \mid x \in U\}$$
where $[x]_B$ is the equivalence class determined by x with regard to B:
$$[x]_B = \{y \in U \mid (x, y) \in ind(B)\}$$
For example, in Table 3, U/{material} = {{lake, pond, seasonal lake, ground river, seasonal river, reservoir, spillway}, {dike}}, and U/{material, cause} = {{lake, seasonal lake, ground river, seasonal river}, {pond, reservoir, spillway}, {dike}}.
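The partition U/B of Definition 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the table below contains only two properties, and its values (`"water"`, `"soil"`, `"natural"`, `"artificial"`) are assumptions chosen so that the two partitions quoted in the example come out; the name `partition` is likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical fragment of a decision table. Only the two properties used in
# the example are included, and the values are assumptions, not Table 3 itself.
TABLE = {
    "lake":           {"material": "water", "cause": "natural"},
    "pond":           {"material": "water", "cause": "artificial"},
    "seasonal lake":  {"material": "water", "cause": "natural"},
    "ground river":   {"material": "water", "cause": "natural"},
    "seasonal river": {"material": "water", "cause": "natural"},
    "reservoir":      {"material": "water", "cause": "artificial"},
    "spillway":       {"material": "water", "cause": "artificial"},
    "dike":           {"material": "soil",  "cause": "artificial"},
}


def partition(table, props):
    """Return U/B: objects grouped by their tuple of values on the properties in B."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[p] for p in props)  # objects with equal keys are indiscernible
        classes[key].add(obj)
    return {frozenset(c) for c in classes.values()}


if __name__ == "__main__":
    for block in partition(TABLE, ["material", "cause"]):
        print(sorted(block))
```

Grouping by the value tuple avoids comparing every pair of objects, which is all the indiscernibility relation requires.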
Definition 3. Let S = (U, A, V, f) be an information system, with B ⊆ A and X ⊆ U. $\underline{B}X$ and $\overline{B}X$ denote the B-lower and B-upper approximations of X, respectively:
$$\underline{B}X = \{x \in U \mid [x]_B \subseteq X\}, \qquad \overline{B}X = \{x \in U \mid [x]_B \cap X \neq \emptyset\}$$
For example, in Table 3, let B = {cause} and X = {lake, pond, seasonal lake, ground river, seasonal river}; thus, $\underline{B}X$ = {lake, seasonal lake, ground river, seasonal river} and $\overline{B}X$ = {lake, pond, seasonal lake, ground river, seasonal river, reservoir, spillway, dike}.
Definition 4. Let S = (U, C ∪ D, V, f) be a decision table and B ⊆ C. $POS_B(D)$ is called the positive region of the partition U/D with respect to B:
$$POS_B(D) = \bigcup_{X \in U/D} \underline{B}X$$
For example, in […] In particular, let B = {a}; the importance of U/D with respect to a is: […] where […]
The decision table (Table 3) is built by inserting classification knowledge of concepts into Table 1. Generally, the decision properties are represented by the existing classification knowledge, such as industry standards or specifications and expert opinions. The existing classification knowledge of concepts reflects a specific application context to a great extent, as it captures the general understanding of a specific domain over a relatively long period of time. Therefore, the importance weight of a condition property, which is inversely calculated from the decision properties, reflects the degree of influence of this condition property on the semantic understanding of the concept. Although the importance of a condition property can quantify the weight of its influence on the semantic understanding of the concept, this parameter loses its ability to distinguish the importance of different properties when unnecessary properties exist in the decision table or when the decision table has more than one reduction result. Therefore, we introduce the inclusion degree proposed in a previous study [42], which is a solution to this problem.
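The lower and upper approximations and the positive region can be sketched as follows, using the standard rough-set definitions. The table values, including the `category` column that stands in for the decision property, are illustrative assumptions rather than the contents of Table 3, and the function names are hypothetical.

```python
# Hypothetical decision table: the values are assumptions for illustration,
# and "category" plays the role of the decision property.
TABLE = {
    "lake":           {"material": "water", "cause": "natural",    "category": "lake"},
    "pond":           {"material": "water", "cause": "artificial", "category": "lake"},
    "seasonal lake":  {"material": "water", "cause": "natural",    "category": "lake"},
    "ground river":   {"material": "water", "cause": "natural",    "category": "river"},
    "seasonal river": {"material": "water", "cause": "natural",    "category": "river"},
    "reservoir":      {"material": "water", "cause": "artificial", "category": "facility"},
    "spillway":       {"material": "water", "cause": "artificial", "category": "facility"},
    "dike":           {"material": "soil",  "cause": "artificial", "category": "facility"},
}


def classes(table, props):
    """Equivalence classes of the indiscernibility relation induced by props."""
    groups = {}
    for obj, row in table.items():
        groups.setdefault(tuple(row[p] for p in props), set()).add(obj)
    return list(groups.values())


def lower(table, props, X):
    """B-lower approximation: union of the classes fully contained in X."""
    return {x for c in classes(table, props) if c <= X for x in c}


def upper(table, props, X):
    """B-upper approximation: union of the classes that intersect X."""
    return {x for c in classes(table, props) if c & X for x in c}


def positive_region(table, cond_props, dec_props):
    """POS_B(D): union of the B-lower approximations of every decision class."""
    region = set()
    for X in classes(table, dec_props):
        region |= lower(table, cond_props, X)
    return region


if __name__ == "__main__":
    X = {"lake", "pond", "seasonal lake", "ground river", "seasonal river"}
    print(sorted(lower(TABLE, ["cause"], X)))
    print(sorted(upper(TABLE, ["cause"], X)))
    print(sorted(positive_region(TABLE, ["material", "cause"], ["category"])))
```

An object belongs to the positive region only when its condition values determine its decision class unambiguously, which is why the region shrinks as condition properties are removed.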
Given two sets, X and Y, we define a function as follows: […] as the inclusion degree of $U/A_2$ to $U/A_1$: […] Obviously, $0 \le CON(A_1/A_2) \le n$. In particular, if $A_1$ is smaller than $A_2$, meaning that for every $X_i$ there exists a $Y_j$ such that $X_i \subseteq Y_j$, then $CON(A_1/A_2)$ takes its maximum value of n.

Formal Context
Before discussing the information entropy of attributes, we introduce the formal context from formal concept analysis (FCA).
Definition 8. In FCA, a formal context is described by a triple K = (G, M, I), where G and M are two non-empty sets of objects (called extent) and attributes (called intent), respectively, while I is a subset of the Cartesian product of G and M (I ⊆ G × M). For g ∈ G and m ∈ M, gIm means that object g has the attribute m, or equivalently that attribute m belongs to object g.
For example, Table 4 is a formal context, in which $G = \{s_1, s_2, s_3, s_4, s_5, s_6, s_7, s_8\}$ and M = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o}. According to Definition 8, we can construct a formal context from the decision table (excluding decision properties), in which the values of the properties are transformed into attributes of the formal context. As presented in Table 4, an attribute that holds for a specific concept is marked with '*'. Based on the formal context, which is also an object-attribute table, we introduce the information entropy to quantify the weights of the attributes.
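The conversion from a decision table to a formal context can be sketched as below. This is one plausible reading of the step, in which every distinct (property, value) pair becomes one binary attribute; the three-row table, its values and the function names are assumptions for illustration, not Tables 3 and 4.

```python
# Hypothetical three-row excerpt of a decision table. The values are
# illustrative, and "category" is the decision property that gets excluded.
DECISION_TABLE = {
    "lake":      {"material": "water", "cause": "natural",    "category": "lake"},
    "reservoir": {"material": "water", "cause": "artificial", "category": "facility"},
    "dike":      {"material": "soil",  "cause": "artificial", "category": "facility"},
}


def to_formal_context(table, condition_props):
    """Turn each (property, value) pair into one binary attribute of K = (G, M, I)."""
    objects = sorted(table)
    incidence = {
        (g, f"{p}={table[g][p]}")
        for g in objects
        for p in condition_props
        if table[g].get(p) is not None  # an empty value yields no attribute
    }
    attributes = sorted({m for _, m in incidence})
    return objects, attributes, incidence


def cross_table(objects, attributes, incidence):
    """Render the context as rows of '*' marks, like an object-attribute table."""
    return [["*" if (g, m) in incidence else "" for m in attributes] for g in objects]


G, M, I = to_formal_context(DECISION_TABLE, ["material", "cause"])

if __name__ == "__main__":
    print(M)
    for g, row in zip(G, cross_table(G, M, I)):
        print(g, row)
```

Passing only the condition properties keeps the decision property out of the attribute set, matching the exclusion described in the text.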

Information Entropy of Attributes
Definition 9. Let K = (G, M, I) be a formal context, where $G = \{g_1, g_2, \ldots, g_n\}$ and $M = \{m_1, m_2, \ldots, m_k\}$. p(m/g) denotes the probability of object g possessing the corresponding attribute m, while E(m), the information entropy, is the average information of attribute m provided by G, the set of objects. We can compute $E(m_j)$ $(1 \le j \le k)$ according to the following formula: […]
Before the combined weight of the attribute is calculated, we introduce the inclusion degree importance of the attribute, based on the inclusion degree importance of the properties, Equation (11).
For example, in […] where $m_j \in C$ represents one of the conditional properties in decision table S, its possible value is $m_j$, and $|m_j|$ is the number of nonempty values of $m_j$.
For example, according to the decision table (Table 3) […] Thus, we can calculate the combined weight of attributes using the information entropy of attributes, Equation (12), and the inclusion-degree importance of the attributes, Equation (13).

Definition 11. Let K = (G, M, I) be a formal context, where $G = \{g_1, g_2, \ldots, g_n\}$ and $M = \{m_1, m_2, \ldots, m_k\}$, and let $w_{m_j}$ $(1 \le j \le k)$ be the combined weight of inclusion-degree importance and information entropy of attribute $m_j$. We define $K_W = (G, M, I, W)$ as a weighted formal context generated from K, where $w_{m_j} \in W$ and
$$w_{m_j} = \frac{SIG(m_j)\, E(m_j)}{\sum_{i=1}^{k} SIG(m_i)\, E(m_i)} \tag{14}$$
where $E(m_j)$ and $SIG(m_j)$ are calculated via Equations (12) and (13), respectively.
For example, $w_a = SIG(a)\, E(a) / \sum_{i=1}^{k} SIG(m_i)\, E(m_i) = 0.0153$. The combined weights of all attributes are listed in Section 5.
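The normalization used in this example, in which each attribute's product SIG · E is divided by the sum of those products over all attributes, can be sketched as follows. The numeric inputs are made-up placeholders and not the values of the case study, and `combined_weights` is a hypothetical name.

```python
def combined_weights(sig, entropy):
    """Normalize SIG(m) * E(m) over all attributes so that the weights sum to 1."""
    products = {m: sig[m] * entropy[m] for m in sig}
    total = sum(products.values())
    if total == 0:
        raise ValueError("all SIG * E products are zero; the weights are undefined")
    return {m: p / total for m, p in products.items()}


# Placeholder inputs (assumptions, not the paper's data).
SIG = {"a": 0.2, "b": 0.5, "c": 0.3}
E = {"a": 0.5, "b": 1.0, "c": 0.5}
W = combined_weights(SIG, E)

if __name__ == "__main__":
    for attribute, weight in W.items():
        print(attribute, round(weight, 4))
```

Because of the normalization, an attribute's weight depends on how its product compares with the others, not on the absolute scale of SIG or E.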

Construction of the Weighted Concept Lattice
From Definition 11, the weighted formal context is defined by assigning a combined weight to each attribute of the formal context. Based on the theory of formal concept analysis, we then define the weighted concept lattice from the weighted formal context. We refer readers to [32] for more basic knowledge about the concept lattice in FCA.
Definition 12. Let K = (G, M, I) be a formal context, with two sets A ⊆ G and B ⊆ M. The set of attributes shared by every object in A and the set of objects that possess every attribute in B are denoted as follows:
$$A' = \{m \in M \mid \forall g \in A,\ gIm\}, \qquad B' = \{g \in G \mid \forall m \in B,\ gIm\}$$
Then, the tuple (A, B) is a formal concept of K if A' = B and B' = A.
Given that $(A_1, B_1)$ and $(A_2, B_2)$ are two formal concepts of K, there is a partial order relation (≤) between them if they satisfy the following condition:
$$(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \ (\iff B_2 \subseteq B_1)$$
All formal concepts created from the formal context K = (G, M, I) form a hierarchical structure based on the partial order relation (≤) between concepts, named the concept lattice, which is denoted as (L(G, M, I), ≤) or (L, ≤). A concept lattice is a complete lattice.
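As an illustration of Definition 12 and the partial order, the following sketch enumerates all formal concepts of a toy context by closing every attribute subset. The context is made up for the example, and the brute-force enumeration is exponential in the number of attributes, so it only suits small contexts; practical lattice construction uses dedicated algorithms.

```python
from itertools import combinations

# Toy formal context (an assumption for illustration, not Table 4).
G = ["s1", "s2", "s3"]
M = ["a", "b", "c"]
I = {("s1", "a"), ("s1", "b"), ("s2", "b"), ("s2", "c"), ("s3", "b")}


def intent_of(objs):
    """A': the attributes shared by every object in A."""
    return frozenset(m for m in M if all((g, m) in I for g in objs))


def extent_of(attrs):
    """B': the objects that possess every attribute in B."""
    return frozenset(g for g in G if all((g, m) in I for m in attrs))


def all_concepts():
    """Collect (extent, intent) pairs obtained by closing each attribute subset."""
    concepts = set()
    for size in range(len(M) + 1):
        for attrs in combinations(M, size):
            extent = extent_of(attrs)
            concepts.add((extent, intent_of(extent)))  # satisfies A' = B and B' = A
    return concepts


def leq(c1, c2):
    """(A1, B1) <= (A2, B2) holds when A1 is a subset of A2."""
    return c1[0] <= c2[0]


CONCEPTS = all_concepts()

if __name__ == "__main__":
    for extent, intent in sorted(CONCEPTS, key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Each pair produced this way is a concept because applying the two derivation operators in sequence is a closure: the intent of `extent_of(attrs)` has exactly that extent again.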
For example, according to Definition 12, the tuple $(\{s_6, s_8\}, \{d, h, n\})$ is a formal concept generated from the formal context (Table 4), and $(\{s_8\}, \{b, d, g, h, n\})$ is a formal concept too. These two formal concepts also have a partial order relation, which is denoted as $(\{s_8\}, \{b, d, g, h, n\}) \le (\{s_6, s_8\}, \{d, h, n\})$.
Definition 13. Let K = (G, M, I) be a formal context, $K_W = (G, M, I, W)$ the weighted formal context generated from K, and (L(G, M, I), ≤) the concept lattice established from K. We define $(L_W(G, M, I, W), \le)$ as the weighted concept lattice of $K_W$. $(L_W(G, M, I, W), \le)$ has the same elements as (G, M, I) and the same structure as (L(G, M, I), ≤), while each attribute in $(L_W(G, M, I, W), \le)$ is assigned its own weight.
Definition 14. Let (L(G, M, I), ≤) and $(L_W(G, M, I, W), \le)$ be a concept lattice and a weighted concept lattice, respectively. Given that (A, B) is a formal concept in (L, ≤), we define (A, B, w) as a weighted formal concept of $(L_W, \le)$, in which w denotes the sum of the weights of the attributes that belong to its intent B:
$$w = \sum_{m \in B} w_m \tag{18}$$
According to Equation (14), the combined weight of each attribute can be calculated. Following this, we use Equation (18) to assign a weight to each formal concept. For example, let $A = (\{s_8\}, \{b, d, g, h, n\})$. A is a formal concept, and the weight of A is the sum of the weights of {b, d, g, h, n}; thus $w = \sum_{m \in \{b, d, g, h, n\}} w_m = 0.2184$. Therefore, $(\{s_8\}, \{b, d, g, h, n\}, 0.2184)$ is a weighted formal concept.

Semantic Similarity Measurement
With Definitions 13 and 14, we construct a weighted concept lattice based on a weighted formal context. Every vertex in this weighted concept lattice represents a weighted formal concept, the weight of which is calculated using Equation (18). In this section, a novel approach with detailed procedures is introduced to measure the semantic similarity between different weighted formal concepts in the weighted concept lattice. In particular, the formal concepts that include only one original geo-concept in their extent can be regarded as the original geo-concepts themselves. Therefore, the semantic similarities between these weighted formal concepts are considered to be those of the original geo-concepts. Meanwhile, the super-categories of these original geo-concepts can be represented by the weighted concepts that possess the same attributes in the lattice. Therefore, the concepts for comparison are not restricted to the original geo-concepts but extend to their super-categories.
Before measuring the semantic similarity of the weighted concepts in the weighted concept lattice, we first introduce the relative hierarchical depth of concepts in the concept lattice to quantify the degree of impact that the concept hierarchy has on the semantic differences.

Relative Hierarchical Depth
Definition 15. Let (L, ≤) be a concept lattice. Given that L' ⊆ L is a subset of L, we denote sup(L') as the supremum of the subset L'. As the concept lattice (L, ≤) is a complete lattice, sup(L') ∈ L and sup(L') ≠ ∅. In particular, if L' contains only two elements (L' = {a, b}), we denote sup(a, b) as the supremum of concepts a and b.
For example, in the concept lattice (L, ≤) (Figure 2), sup(g, h) = sup(h, g) = b.
Definition 16. Let (L, ≤) be a concept lattice. We define a and c as two formal concepts, given that a, c ∈ L and a ≤ c. We denote dis(a, c) as the shortest distance from a to c in the lattice, that is, the least number of edges connecting vertices a and c in the lattice. Furthermore, we define dis(a, a) = 0.
For instance, in the concept lattice (L, ≤) (Figure 2), as k ≤ c in the lattice, we can calculate dis(k, c) = 3 and dis(j, c) = 1. However, because c ≰ k, dis(c, k) is not defined. Likewise, dis(i, j) and dis(j, i) are not defined. The shortest distance is only defined from a sub-concept to its super-concept or itself.
In the network model, the distance between vertices representing concepts is usually an important indicator that reflects their semantic similarity. In the transformation model [43], the steps through which one concept is transformed into another are also regarded as a specific type of 'distance'. This particular 'distance' is also used to represent the semantic similarity between these two concepts. Similarly, the shortest distance from a sub-concept to its super-concept in Definition 16 serves such a function. However, this parameter, the shortest distance dis, focuses on representing the degree of difference between the sub-concept and one of its super-concepts. In the concept lattice, the intent of the super-concept is always a proper subset of the intent of its sub-concept. Therefore, the sub-concepts have the same general characteristics as the super-concept in the concept lattice. Thus, the shortest distance reflects the degree of difference between the sets of characteristics of a sub-concept and its specific super-concept. A longer shortest distance indicates a larger degree of difference between these two concepts. For instance, in the concept lattice (L, ≤) (Figure 2), the concepts i and j have the same super-concept c, but dis(i, c) = 2 while dis(j, c) = 1. Therefore, the concept i has a larger degree of difference from c than the concept j does.
Definition 17. Let (L, ≤) be a concept lattice, and let a and c be two formal concepts. Given that a, c ∈ L and a ≤ c, we denote rhd(a, c) as the relative hierarchical depth of a to c in the concept lattice. rhd(a, c) is defined as follows:
$$rhd(a, c) = \frac{dis(a, c)}{dis(a, c) + dis(c, \sup(L))}$$
where sup(L) is the supremum of L, which is the largest element of the lattice (L, ≤). Furthermore, we define rhd(a, a) = 0.
Although the shortest distance proposed in Definition 16 is able to reflect the degree of difference between concepts that are related by the partial order in the concept lattice, this parameter is unable to distinguish the hierarchical differences of concepts. For instance, in the concept lattice (L, ≤) (Figure 2), dis (k, h) = dis (e, c) = 1. However, it is intuitively clear that the degree of difference between k and h is quite different from that between e and c. Therefore, in Definition 17, we introduce the relative hierarchical depth, which takes into account the impact of different hierarchy levels on the shortest distance between concepts. Thus, rhd (k, h) = 1/4 and rhd (e, c) = 1/2, which demonstrates that a lower position of the concepts in the hierarchy results in a smaller impact on the degree of difference.
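The two quantities above can be illustrated with a minimal Python sketch. It assumes that dis counts upward steps along cover relations and that rhd (a, c) = dis (a, c)/(dis (a, c) + dis (c, sup (L))). The hierarchy `COVERS`, the node name `top` and the function names are hypothetical; the toy hierarchy is not the lattice of Figure 2, although it is built so that it reproduces the distances quoted in the text.

```python
from collections import deque
from fractions import Fraction

# Hypothetical cover relation (concept -> its direct super-concepts).
# This toy hierarchy is an assumption for illustration, NOT Figure 2.
COVERS = {
    "top": [],
    "c": ["top"],
    "e": ["c"],
    "j": ["c"],
    "h": ["e"],
    "i": ["e"],
    "k": ["h"],
}


def dis(a, c, covers=COVERS):
    """Fewest upward steps from a to c; None when a is not below c."""
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, steps = queue.popleft()
        if node == c:
            return steps
        for parent in covers[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, steps + 1))
    return None  # undefined: c is not a super-concept of a


def rhd(a, c, top="top", covers=COVERS):
    """Relative hierarchical depth, assuming dis(a,c) / (dis(a,c) + dis(c,top))."""
    if a == c:
        return Fraction(0)
    d_ac = dis(a, c, covers)
    if d_ac is None:
        raise ValueError(f"{a} is not a sub-concept of {c}")
    return Fraction(d_ac, d_ac + dis(c, top, covers))


print(dis("k", "c"), dis("j", "c"), dis("c", "k"))  # 3 1 None
print(rhd("k", "h"), rhd("e", "c"))                 # 1/4 1/2
```

Both pairs (k, h) and (e, c) are one step apart, yet the pair lying deeper in the hierarchy receives the smaller relative depth, which is the distinction the definition is meant to capture.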

Semantic Similarity Model
Proposed by Tversky, the contrast and ratio models (Equations (20) and (21)) are the prototypes for most approaches based on the feature model. These models have been widely used and developed into many new versions. For example, the MDSM extended them by distinguishing different types of features, including parts, attributes, and functions [8].
In our work, we first introduce the absolute semantic similarity between concepts in the weighted concept lattice based on the contrast model, Equation (20). Following this, we propose a relative semantic similarity of concepts that is able to reflect and explain, to some extent, the cognitive cause of the asymmetric property of semantic similarity.
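For reference, Tversky's two models are commonly written in the following form, where A and B are the feature sets of concepts a and b, f is a measure of the salience of a feature set, and θ, α, β ≥ 0 are weighting parameters. This is the standard textbook notation and is given here as an assumption; the symbols may differ from those used in Equations (20) and (21).

```latex
% Contrast model: commonalities minus weighted distinctive features
S(a, b) = \theta f(A \cap B) - \alpha f(A - B) - \beta f(B - A)

% Ratio model: commonalities as a share of all weighted features
S(a, b) = \frac{f(A \cap B)}{f(A \cap B) + \alpha f(A - B) + \beta f(B - A)}
```

The contrast model is unbounded below when the distinctive features dominate, whereas the ratio model is confined to [0, 1].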
where rhd (a, c) and rhd (b, c) are the relative hierarchical depths of concepts a and b to c.
In Equation (22), we compare the commonalities and differences of weighted attributes between concepts based on the contrast model. Unlike the original contrast model, however, this formula for the absolute semantic similarity takes into account both the combined weighted difference of attributes and the hierarchical depth difference of concepts.
From Equation (22), the absolute semantic similarity has a symmetrical property and thus sim (a, b) = sim (b, a). However, according to Tversky's well-known statement that similarity or dissimilarity is judged depending on the prominence or relative salience of concepts, semantic similarity has an asymmetrical property. Therefore, we introduce the relative semantic similarity, which is measured by the proportion of absolute semantic similarity in the semantic intent of the concept.

Definition 19. Let (L(G, M, I, W), ≤) be a weighted concept lattice, and let a and b be two formal concepts. Given a, b ∈ L, we denote SIM (a, b) as the relative semantic similarity, or semantic similarity, of concept a to b, which is defined as follows:

$$\mathrm{SIM}(a, b) = 2^{\frac{\mathrm{sim}(a, b)}{\mathrm{sim}(a, a)} - 1} \qquad (23)$$

where sim (a, b) = sim (b, a) is the absolute semantic similarity between concepts a and b, and sim (a, a) is the absolute semantic similarity of concept a to itself.
In Equation (23), sim (a, b)/sim (a, a) represents the relative proportion of the absolute semantic similarity in the concept a. The range of this parameter is from negative infinity to one, that is, sim (a, b)/sim (a, a) ∈ (−∞, 1]. Following this, we introduce the exponential function to normalize it, and define the normalized result as the relative semantic similarity, or semantic similarity, of concept a to b. From Definition 19, we know that this has an asymmetrical property and a range of SIM (a, b) ∈ (0, 1]. In particular, SIM (a, b) = 1/2 means that the similarity and dissimilarity between concepts a and b are approximately equal. In this case, SIM (a, b) = SIM (b, a) = 1/2.
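The normalization step can be sketched in a few lines of Python. The sketch assumes a base-2 exponential, SIM (a, b) = 2^(sim (a, b)/sim (a, a) − 1), which maps (−∞, 1] onto (0, 1] and sends a ratio of zero to 1/2. The function name and all numeric inputs are hypothetical.

```python
def relative_similarity(sim_ab, sim_aa):
    """SIM(a, b), assuming the base-2 form 2 ** (sim(a, b) / sim(a, a) - 1)."""
    if sim_aa <= 0:
        raise ValueError("sim(a, a) must be positive")
    return 2.0 ** (sim_ab / sim_aa - 1.0)


# Hypothetical values: one symmetric absolute similarity sim(a, b) = sim(b, a)
# and two different self-similarities sim(a, a) and sim(b, b).
sim_ab = 0.2
sim_aa, sim_bb = 0.3, 0.45

print(relative_similarity(sim_ab, sim_aa))  # SIM(a, b)
print(relative_similarity(sim_ab, sim_bb))  # SIM(b, a), smaller than SIM(a, b)
```

Because the same absolute similarity is divided by different self-similarities, the concept with the smaller self-similarity comes out as more similar to the other, which is where the asymmetry originates.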

Case Study and Discussion
In Section 3.2, by extracting the essential properties of a subset of geo-concepts from their definitions in GB/T 20258.1-2007, we built the object-properties table (Table 1) of geo-concepts. Taking the classification of these geo-concepts in GB/T 13923-2006 as the decision property, we then added this decision property to Table 1 in order to establish a decision table (Table 3). Finally, a formal context (Table 4) was built by converting the decision table (Table 3).
Using Equation (11) in Section 3.2.1, the inclusion-degree weight of each property in the decision table is calculated, with the results shown in Table 5. Following this, the information entropy and inclusion degree of each attribute are calculated via Equations (12) and (13), respectively. Next, we substitute these parameters into Equation (14) to obtain the combined weight of each attribute. The combined weights of the attributes after normalization are shown in Table 6. We then calculate all formal concepts from the formal context of Table 4. After calculating the weight of each formal concept based on Equation (18), the weighted formal concepts are listed as follows.
C0 = ({s1, s2, s3, s4, s5, s6, s7, s8}; ∅; 0);
C1 = ({s1, s2, s3, s4, s5, s6, s7}; {a}; 0.0154);
C2 = ({s1, s2, s3, s4, s5, s6, s8}; {h}; 0.0188);
C3 = ({s1, s2, s3, s4, s5, s6}; {a, h}; 0.0342);
C4 = ({s1, s3, s4, s5}; {a, c, h}; 0.0516);
C5 = ({s2, s6, s7, s8}; {d}; 0.0174);
C6 = ({s1, s2, s3, s6}; {a, f, h, o}; 0.243);

According to Definitions 18 and 19, the semantic similarity of one concept to another can be calculated. As the semantic similarity (SIM) in Equation (23) has an asymmetrical property, we measure the semantic similarity of each of the concepts s1 to s8 to every other, as shown in Table 7. In order to evaluate SSMWCL, we also calculate the semantic similarity between every pair of concepts using the feature-based approach. According to Equation (21), we design an executable equation as follows:

$$S(a, b) = \frac{|A \cap B|}{|A \cap B| + 0.5\,|A - B| + 0.5\,|B - A|}$$

In this equation, A and B are the sets of attributes of a and b respectively, and |A| denotes the number of elements of A.
For example, in Table 4, |s1| = 6 and S(s1, s2) = 5/(5 + 0.5 + 0.5) = 0.83. With this equation, the semantic similarity results between every pair of concepts in Table 4 are listed in Table 8.

In Table 7, we list the semantic similarities between the original geo-concepts. The value in each cell of that table represents the semantic similarity of the concept in its row to that in its column. For example, SIM (s5, s8) = 0.225, while SIM (s8, s5) = 0.135. According to the properties of semantic similarity (Definition 19), the values in the table lie in the range SIM ∈ (0, 1]. If SIM (a, b) > 0.5, the similarity of concept a to b outweighs their difference; if SIM (a, b) < 0.5, the difference outweighs the similarity.

Comparing the results of Tables 7 and 8 in Figure 4, we find that the distributions of the two sets of results are quite similar. Moreover, the correlation coefficient between them is 0.944, which means that the two sets of semantic similarity results are highly correlated. At the same time, there are some differences between them. We analyze the results below to illustrate the validity of SSMWCL.
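The feature-based baseline behind Table 8 can be sketched as follows, assuming it is the ratio model with both weights fixed at 0.5, as the worked example 5/(5 + 0.5 + 0.5) suggests. The function name is hypothetical, and the two attribute sets are illustrative placeholders (six attributes each, five shared) rather than rows copied from Table 4.

```python
def feature_similarity(attrs_a, attrs_b, alpha=0.5, beta=0.5):
    """Ratio-model similarity on plain (unweighted) attribute sets."""
    common = len(attrs_a & attrs_b)   # |A ∩ B|
    only_a = len(attrs_a - attrs_b)   # |A - B|
    only_b = len(attrs_b - attrs_a)   # |B - A|
    denominator = common + alpha * only_a + beta * only_b
    return common / denominator if denominator else 0.0


# Illustrative attribute sets (assumed, not taken from Table 4):
# six attributes each, five of them shared.
s1 = {"a", "c", "f", "h", "k", "o"}
s2 = {"a", "d", "f", "h", "k", "o"}

print(round(feature_similarity(s1, s2), 2))  # 0.83
```

Since every attribute counts equally here, any two pairs with the same overlap pattern receive the same score, which is why this baseline cannot separate pairs that the weighted lattice distinguishes.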
First, the results in Tables 7 and 8 both show that concepts belonging to the same super-category in the specification (GB/T 13923-2006) have relatively large semantic similarities to each other. For instance, the lake, pond and seasonal lake (s1, s2, s3) are under the same super-category in the specification, and there are relatively large semantic similarities between them. The pair of ground river and seasonal river (s4, s5), as well as the pair of reservoir and spillway (s6, s7), each belong to the same super-category, and the semantic similarities within each pair are also relatively large. However, the feature-based approach sometimes cannot reveal small differences in semantic similarity between concepts. For example, in Table 8, S(s1, s2) = S(s1, s3) = S(s2, s6) = 0.83, while in Table 7, SIM (s1, s2) = 0.943, SIM (s1, s3) = 0.907 and SIM (s2, s6) = 0.822.
Secondly, the results show that concepts belonging to different categories in the specification can nevertheless have high semantic similarity. The concept of reservoir (s6) is quite similar to the concepts of lake, pond and seasonal lake (s1, s2, s3). Dike (s8) was found to have high semantic similarities with reservoir and spillway (s6, s7). Sometimes, the semantic similarity between concepts in different categories is even greater than that between concepts in the same category: the semantic similarity of pond to reservoir (SIM (s2, s6) = 0.822) is greater than that of pond to seasonal lake (SIM (s2, s3) = 0.776). In this sense, our method might be able to reveal some implicit similarity of concepts and is a valid complement to the existing classification knowledge.
Third, Table 7 shows that, of two concepts, the one with the smaller weight (indicating a poorer semantic intent) has the larger semantic similarity to the other. For example, for the two concepts of pond (s2) and reservoir (s6), s6 has a larger weight (0.4426) than s2 (0.2897), which means that the granularity of s6 is finer than that of s2. The semantic similarity of pond to reservoir (SIM (s2, s6) = 0.822) is larger than that of reservoir to pond (SIM (s6, s2) = 0.692). The results demonstrate the asymmetrical property of semantic similarity and show that a coarse-grained concept tends to be more similar to a finer one than the fine-grained concept is to the coarser one.
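The pond and reservoir figures also allow a rough consistency check, sketched below under two assumptions: that the relative similarity has the base-2 form SIM (a, b) = 2^(sim (a, b)/sim (a, a) − 1), and that sim (a, a) equals the weight of concept a. Inverting the formula for each direction should then recover approximately the same symmetric absolute similarity. The function name is hypothetical; the four numbers are the values quoted in the paragraph above.

```python
import math


def absolute_from_relative(sim_rel, self_sim):
    """Invert SIM = 2 ** (sim / self_sim - 1) to recover sim(a, b)."""
    return self_sim * (1.0 + math.log2(sim_rel))


# Weights and relative similarities quoted in the text for pond (s2)
# and reservoir (s6); treating the weight as sim(a, a) is an assumption.
w_pond, w_reservoir = 0.2897, 0.4426
sim_from_pond = absolute_from_relative(0.822, w_pond)            # from SIM(s2, s6)
sim_from_reservoir = absolute_from_relative(0.692, w_reservoir)  # from SIM(s6, s2)

print(sim_from_pond, sim_from_reservoir)
print(abs(sim_from_pond - sim_from_reservoir) < 1e-3)  # True
```

The two recovered values agree to within rounding of the published figures, so the asymmetry in Table 7 is consistent with a single symmetric absolute similarity divided by two different concept weights.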
Finally, some formal concepts in the lattice match the corresponding super-category concepts in the specification. For example, the nodes C6, C12 and C18 correspond to the super-categories of (s1, s2, s3), (s4, s5) and (s6, s7), respectively. These super-concepts can be brought into the semantic similarity calculation of SSMWCL in the same way. For instance, we can calculate the semantic similarity between the concept reservoir and its super-concept C18, obtaining SIM (C18, s6) = 0.832 and SIM (s6, C18) = 0.64. In fact, the semantic similarity of any concept to another in the weighted concept lattice can be calculated via SSMWCL.

Conclusions and Outlook
We propose semantic similarity measurement based on a weighted concept lattice (SSMWCL) as a new approach to measuring semantic similarity among concepts. The concepts are represented by the feature model and the existing conceptual classification is applied. SSMWCL introduces the decision table from rough set theory, together with information entropy, in order to calculate the combined weight of features. Following this, formal concept analysis is used to establish the weighted concept lattice. Based on the hierarchical characteristics of the lattice, SSMWCL combines the feature model with the network model for semantic similarity measurement. The feature model is applied to compare the commonalities and differences of weighted conceptual features. At the same time, the network model is used to assign different weights to concepts based on their relative hierarchical depths in the lattice. Finally, the absolute semantic similarity and the relative semantic similarity are distinguished. The absolute semantic similarity is one factor that affects the semantic similarity, while the other factor is the size of the weight of the concept's intent. As semantic similarity measurement between pairs of geo-concepts is widely applied in fields such as geo-information retrieval, geospatial semantic integration and spatial data mining, SSMWCL can serve the same function in those fields. Although SSMWCL is a knowledge-based approach, an ontology is not indispensable for its implementation. In other words, SSMWCL can be used with an ontology by extracting the essential properties of classes (concepts) from the ontology, but it can also be applied by extracting essential properties through text analysis or from a domain specification. How to extract essential properties through text analysis requires further study. Furthermore, if the formal context is large, involving for example a few thousand concepts, the algorithm for building a concept lattice will need much more time and space. Therefore, optimizing the algorithm of SSMWCL will be one of our future efforts.
In similar works [44,45] that introduce formal concept analysis (FCA) into their algorithms, either the feature model and the network model, or the feature model and information theory, are combined to measure semantic similarity. The attributes of the objects in these algorithms are implicitly assumed to carry equal weight in their impact on semantic similarity. However, real-world objects have an almost unlimited number of features. It is natural for us to understand the semantic meaning of a concept based on its features, and different features influence this understanding to different degrees. Moreover, the approaches proposed in the aforementioned papers could not maintain the asymmetrical property of semantic similarity. In our work, we have distinguished the weight differences among features according to the application context. Furthermore, the use of the hierarchical depth of the concept node in the lattice enables us to preserve the asymmetrical property of semantic similarity.
There are several main characteristics of SSMWCL: (1) the transformation from the specific application context (existing classification knowledge) to combined weights of attributes; (2) the use of a combined attribute weight that integrates the inclusion-degree importance and information entropy; (3) the combination of the feature model and the network model, which takes advantage of the hierarchical depth of concepts in the concept lattice; and (4) the preservation of the asymmetrical property of semantic similarity between concepts. Geo-information is widely applied in various fields today. However, researchers in different fields or application contexts have quite different cognition, which is usually represented through distinct classification systems. Therefore, it is important to be able to measure the semantic similarity of geo-concepts based on different application backgrounds. Whether SSMWCL is valid in other application backgrounds and fields requires more evidence. Our future work will aim to generalize SSMWCL to broader areas.
There are two main aims for future studies following this work. First, we aim to extend SSMWCL to the measurement of semantic relatedness. Semantic similarity has been regarded as a particular subset of the notion of semantic relatedness [46]. In SSMWCL, the network hierarchy of a weighted concept lattice is only applied to assign weights to the concepts according to their relative hierarchical depth in the lattice. However, we can also use the network hierarchy of the lattice to take advantage of network models and approaches, such as the semantic network. Therefore, further work should focus on studying semantic relatedness measurement based on SSMWCL by constructing and evaluating the relationships of concepts in a concept lattice and integrating these network relationships into SSMWCL. Second, we plan to study the extraction of essential features based on the feature model. Although ontology has been identified as an effective tool for representing an entity class or concept with essential features, it is not as effective in dealing with entity objects. Therefore, another focus of our work is to develop an approach to extracting essential features that are able to represent not only the entity class but also the entity objects. We hope that the knowledge representation (whether of an entity class or object) obtained using this approach will be applicable to SSMWCL.

Figure 1. Workflow for semantic similarity measurement based on a weighted concept lattice (SSMWCL).


Definition 18. Let (L(G, M, I, W), ≤) be a weighted concept lattice, and let a = (A1, B1) and b = (A2, B2) be two formal concepts. Given a, b ∈ L and c = sup (a, b) = (A3, B3), we denote sim (a, b) as the absolute semantic similarity between concepts a and b, which is defined as follows: sim(a, b)

Figure 3. The weighted concept lattice (established by linking weighted formal concepts).

Figure 4. Line chart in which semantic similarities are from Tables 7 and 8.


Table 1. Geo-concepts with attributes table. Each letter from 'a' to 'o' represents an attribute in Table 2, while ∅ indicates that the object does not contain the attribute. As the object 'reservoir' includes two 'function' values (n and o), we represent its 'function' with '(n, o)', which is a new value of the 'function' property.

Table 2. Comparison table between attributes and identifiers.
Note: Each letter from 'a' to 'o' has the same meaning as in Table 2, while ∅ indicates that the object does not contain the attribute. The value of the field 'category' is the classification code of the super-category of the objects, which is the decision attribute in the decision table. As the object 'reservoir' includes two 'function' values (n and o) in the formal context (Table 1), we represent its 'function' with '(n, o)', which is a new value of the 'function' property in the decision table.

Table 4. A formal context converted from the decision table. Each letter from 'a' to 'o' represents an attribute in Table 2, while * represents a satisfied criterion.

Table 5. Weights of properties in the decision table.

Table 6. Attribute weight value of the formal context.
