Geospatial Information Categories Mapping in a Cross-lingual Environment: A Case Study of “Surface Water” Categories in Chinese and American Topographic Maps

Integrating geospatial information (GI) data from heterogeneous sources has become increasingly important for geographic information system (GIS) interoperability. Using domain ontologies to clarify and integrate the semantics of data is considered a crucial step toward successful semantic integration in the GI domain. Nevertheless, mechanisms are still needed to facilitate semantic mapping between GI ontologies described in different natural languages. This research establishes a formal ontology model for cross-lingual geospatial information ontology mapping. Semantic primitives are first extracted from the free-text definitions of categories in two GI classification standards written in different natural languages. An ontology-driven approach is then used to establish a formal ontology model that represents these semantic primitives as semantic statements, in which the spatial-related properties and relations are considered crucial for the representation and identification of the semantics of the GI categories. Next, an algorithm is proposed to compare these semantic statements in a cross-lingual environment. We further design a similarity calculation algorithm based on the proposed formal ontology model to measure the semantic similarities and identify the mapping relationships between categories. In particular, we work with two GI classification standards for Chinese and American topographic maps. The experimental results demonstrate the feasibility and reliability of the proposed model for cross-lingual geospatial information ontology mapping.


Introduction
The vision of a "Digital Earth" articulated by US Vice President Al Gore [1-3] has contributed significantly to the growth in global geospatial information (GI) on physical and social environments. However, how to query, retrieve, and manipulate those data from heterogeneous sources has challenged the GI community [2-5]. Thus, approaches to integrating GI data from various heterogeneous sources have gained importance [6].
A data integration process is not as simple as joining several systems because any effort at information sharing runs into the problem of semantic heterogeneity [7]. Semantic heterogeneity arises when enabling interoperability across geographic information systems (GIS) [8-11] because GIS are often designed to handle data from highly distributed, multidisciplinary, and cross-lingual data sources with different application demands [12]. Clarifying the semantics of data is therefore a crucial step toward successful data integration [13]. To achieve this, domain ontologies are built as mediators for exchanging information in such a way that the precise meaning of the data (i.e., its semantics) is readily retrievable, beyond simple keyword matching, via knowledge representation languages and reasoning [7,13-15]. Thus, ontology engineering has been regarded as an effective means of providing a seamless connection between component GIS at the semantic level [8,12,16].
While the GI community widely acknowledges the utility of ontology technologies, two main problems remain to be solved for GI ontology engineering and sharing: (1) traditional ontology research and technologies, which focus on terminology and schema, cannot answer the question of how to engineer GI ontologies and integrate them with GIS or Spatial Data Infrastructures (SDI) [6]; and (2) mechanisms still need to be explored for GI ontology mapping in cross-lingual environments to facilitate semantic integration between GI ontologies described in different natural languages [17-20].
The reason for the first problem is that GI features and categories are a product of spatial cognition and social convention. Ontology engineering in the GI domain therefore differs from that in other domains, because location, topology, mereology, and other spatial relations play a major role in the identification and representation of GI semantics [14]. For example, from a feature-driven ontology perspective, the geographic categories "river" and "bank" should be specified as different classes, and the spatial relation "adjacent-to" between these two categories is normally missing. Moreover, geographic and non-geographic entities are ontologically distinct in a number of ways [21]. To enhance semantic expressiveness and overcome semantic heterogeneity during the GI ontology engineering process, the spatial-related characteristics of GI categories must be considered to enrich the spatial-related semantics of the given ontology.
Although the majority of current GI ontologies have been developed in English with English vocabularies, the amount of multilingual content on the Semantic Web, and thus the number of vocabularies/ontologies in multiple languages, continues to grow [22]. Methods for matching vocabularies across languages have therefore become increasingly important for making data accessible to end users in multiple languages [23]. As a motivating scenario, suppose a user wants to query water level data along the Mekong River (the seventh longest river in the world, covering six countries: Cambodia, Laos, Myanmar, Thailand, Vietnam, and China, each with a different official language). Several data providers offer the related GI data via their national GIS in their native natural languages. This situation poses a substantial challenge to integrating highly heterogeneous GI data across natural language barriers.
The purpose of this study is to establish a formal ontology model for cross-lingual geospatial information ontology mapping. Starting from two GI classification standards in different natural languages, Chinese and English (for the sake of simplicity and clarity, this study is restricted to the "surface water" categories of these two standards), a set of semantic primitives is extracted from the free-text definitions of the categories in the standards by applying Natural Language Processing (NLP) techniques. Then, an ontology-driven approach is used, and the formal ontology model is established to formally represent these semantic primitives as semantic statements, in which the spatial-related properties and relations are considered crucial for the representation and identification of the semantics of the GI categories. To overcome the natural language barrier, the statements in Chinese are translated into English using machine translation tools, and the mapping relationships between statements are determined within an English context; these then serve as the basis for the similarity calculation between categories in different GI ontologies. Finally, a similarity calculation algorithm is designed to measure the semantic similarity between GI categories in different ontologies, and the final mapping relationships between pairs of categories are determined from the calculated similarity values. The contributions of the proposed approach include (1) the construction of spatial-related semantic properties and relations that serve the representation and identification of the spatial characteristics of the GI categories and (2) algorithms for GI ontology mapping in a cross-lingual environment based on formally represented and comparable semantic statements.
The remainder of this paper is organized as follows. Section 2 presents the related work in the literature. The main procedure of our methodology is presented in Section 3. A case study demonstrating the application of our method is shown in Section 4. Finally, conclusions are drawn, and future work is noted.

Semantic Interpretation
Knowledge acquisition (KA) is a broad field that encompasses the processes of extracting, creating, and structuring knowledge from heterogeneous resources [24]. Semantic interpretation (SI) for KA is defined as the composition of two sub-processes: the extraction of semantic primitives from the free-text definitions in ontologies and semantic enrichment based on the extracted semantic primitives.
Research on semantic primitive extraction builds on a large body of work within the field of Natural Language Processing (NLP) [25]. NLP and text mining are research fields aimed at exploiting rich knowledge resources with the goal of understanding, extracting, and retrieving semantic information from unstructured written text. Knowledge resources that have been used for these purposes include the entire range of terminologies, including lexicons, controlled vocabularies, thesauri, and ontologies [26,27]. Although numerous methods and algorithms have been developed recently (such as symbolic, statistical, and hybrid approaches) [26], a fully automated algorithm for semantic extraction using NLP techniques still seems out of reach, and some manual assistance is normally unavoidable.
For semantic enrichment, the authors in [28,29] proposed a systematic methodology to explore and identify semantic information provided by categories in geographic ontologies, in which the semantic representations of categories are enriched with a set of semantic properties and relations to reveal similarities and heterogeneities between these categories. The authors in [30] presented an axiomatic formalization of a theory of top-level relations (parthood relations, sub-universal relations, and cross-categorical relations) between three categories of geospatial-related entities, namely, individuals, universals, and collections. In addition, they demonstrated how a more exact understanding helps to overcome semantic heterogeneity problems in the information integration process. In [13], the semantics of a concept in GI ontologies were presented using an extendable and structural definition framework composed of a number of RDF triple statements, and a comparison algorithm was designed to determine the semantic relationships of concepts between different domains. The primary objective of these studies was to extract and represent the semantics of concepts/entities based on structural common vocabularies, which make the semantics of the concepts/entities comparable. However, the structural common vocabularies in these works are determined manually by domain experts; thus, the objectivity and automation of the algorithms (avoiding ad hoc manual procedures and subjective expert knowledge) remain quite limited.

Ontology Mapping
Mapping relationship discovery for ontologies has attracted considerable attention in recent years. Various approaches based on finding similarities between different but related ontologies have emerged [31]. With respect to the literature specifically oriented toward geospatial information (GI) ontologies, the authors in [32] analyzed the different models of semantic similarity measurement and evaluated these models with respect to the particular requirements of geospatial data. The authors in [7,33] systematically surveyed several of the most recent and often-referenced works on integrating GI and GI ontology mapping by applying comparison criteria such as logical inference, mapping approaches, degree of automation, and geospatial relativity. They conclude that, for the ontology mapping task, the use of formal ontologies and, consequently, of reasoners should be mandatory.
In recent years, Volunteered Geographic Information (VGI) has been proposed for GI ontology mapping in a web environment. The authors in [34,35] devised a mechanism for computing the semantic similarity of OpenStreetMap (OSM) geographic classes using volunteered lexical definitions to alleviate the semantic gap between different VGI producers. Another set of studies focused on introducing an artificial neural network approach to simulate human perception and measure the semantic similarity between spatial entities in order to improve the automation of the ontology mapping process [36,37].
All these proposals, which combine different models of semantic similarity measurement, provide solutions to existing GI ontology mapping problems in English environments. However, the Semantic Web and ontology engineering have experienced significant advancements in standards and techniques, and an increasing number of domain ontologies and localized content on the Semantic Web are described using native natural languages [23]. There is a pressing need for cross-lingual ontology mapping mechanisms in the GI community that are designed to reconcile the semantics of different ontologies in multilingual environments and to improve the accessibility of various GI ontologies across language barriers [38].

Methodologies
The main procedure of our methodology is divided into two sub-processes, as shown in Figure 1. In the semantic interpretation process, two GI formal ontologies, namely, $O_A$ and $O_B$, are established from the free-text definitions of the corresponding classification standards in different natural languages. In the ontology mapping process, all the category names and semantic statements in $O_A$ are translated from $L_A$ into $L_B$, and the mapping relationships between category names and semantic statements are determined within the same language context; these then serve as the basis for the similarity calculation between categories in different GI ontologies. Finally, a similarity calculation algorithm is designed, and the final mapping results between pairs of categories in different classification standards are determined.
ISPRS Int. J. Geo-Inf. 2016, 5, 90

Semantic Primitive Extraction
In geospatial information repositories, free-text definitions are often the primary and only available objective descriptions of categories. Semantic primitives are syntactic and lexical patterns in the free-text definition and can be extracted using NLP tools [55]. Research in NLP has developed methods and algorithms for information retrieval and extraction from free-text knowledge resources. The methodology adopted here for analyzing definitions and extracting semantic primitives was introduced by [56]. In this research, the lexical patterns of nominal phrases and verb phrases are considered as semantic primitives. An example is illustrated in Figure 2, and the main steps of the process are as follows:
1. One category definition in free-text format is chosen as the input natural language material;
2. Word segmentation is performed to split the whole sentence into individual words;
3. Words are categorized by part of speech and labeled with the corresponding tags (see Tables 1 and 2);
4. The nominal phrases and verb phrases are chunked, and the sentence structure is analyzed to extract lexical patterns as the semantic primitives.
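
The four steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the tooling used in the study: the tiny hand-written lexicon `LEXICON`, the Penn-style tags, and the function names are all assumptions introduced here, and a real pipeline would rely on a trained segmenter, tagger, and chunker.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration): word -> POS tag.
LEXICON = {
    "manmade": "JJ", "waterway": "NN", "used": "VBN", "by": "IN",
    "watercraft": "NN", "or": "CC", "for": "IN", "drainage": "NN",
    "irrigation": "NN", "mining": "NN", "water": "NN", "power": "NN",
}

def segment(definition):
    """Step 2: split the definition into lowercase words and commas."""
    return re.findall(r"[a-z]+|,", definition.lower())

def tag(tokens):
    """Step 3: label each token; unknown words default to noun (NN)."""
    return [(t, "," if t == "," else LEXICON.get(t, "NN")) for t in tokens]

def chunk(tagged):
    """Step 4: group adjective/noun runs into nominal phrases and keep verbs."""
    primitives, current = [], []
    for word, pos in tagged:
        if pos in ("JJ", "NN"):
            current.append(word)
            continue
        if current:  # any other tag closes the open nominal phrase
            primitives.append(" ".join(current))
            current = []
        if pos.startswith("VB"):
            primitives.append(word)
    if current:
        primitives.append(" ".join(current))
    return primitives

def extract_primitives(definition):
    """Steps 1-4: definition text in, list of semantic primitives out."""
    return chunk(tag(segment(definition)))

# Step 1: the "canal" definition quoted later in this paper.
CANAL = ("manmade waterway used by watercraft or for drainage, "
         "irrigation, mining, or water power")
primitives = extract_primitives(CANAL)
```

With this toy lexicon, the sketch yields the same phrases that the paper lists for the "canal" category; on other definitions the result depends entirely on the lexicon supplied.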

Part-of-speech tag set for Chinese (partial), given as description: tag.
[...]: DEG
measure word: M
"bei" ("被") in short bei-const.: SB
"de" ("得") in V-de const. and V-de-R: DER
other particle: MSP
sentence-final particle: SP
"di" ("地") before VP: DEV
common noun: NN
predicative adjective: VA
"shi" ("是"): VC
"you" ("有") as the main verb: VE
other verb: VV

Construction of the Formal Ontology Model
According to Wikipedia, an ontology in information science is a formal naming and definition of the types, properties, and interrelationships of the concepts that really or fundamentally exist for a particular domain of discourse. It is thus a practical application of philosophical ontology, with a taxonomy. In addition, a domain ontology (or domain-specific ontology) represents concepts that belong to a particular domain. Thus, for a formal representation [57,58], the domain ontology (denoted by $O_{Domain}$) and the concepts in the domain can be summarized by Equations (1)-(3).
In Equation (1), $S(C_{Domain})$ represents the set of concepts in a domain, and the semantics of each concept in the domain are categorized into different groups, namely, $S(H_C)$, $S(R_C)$, and $S(P_C)$; $S(H_C)$ represents the set of hierarchical relations carrying the taxonomic information in $O_{Domain}$, $S(R_C)$ represents the set of other interrelations between these concepts, and $S(P_C)$ represents the set of semantic properties belonging to the concepts in this domain.
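
As a hedged sketch, Equations (1) to (4) can be read as follows. The exact form is an assumption inferred from the symbol descriptions in this section and from the structure of Equation (5); in particular, the generic function $f$ in Equation (4) is illustrative only.

```latex
\begin{align}
O_{Domain} &= \{ S(C_{Domain}),\; S(H_C),\; S(R_C),\; S(P_C) \} \tag{1}\\
C_{Domain} &= \{ T_C,\; D_C \} \tag{2}\\
D_C &= \{ P_C,\; H_C,\; R_C \} \tag{3}\\
C_{Domain} &= f(T_C, R_C, H_C, P_C), \quad
R_C \in S(R_C),\; H_C \in S(H_C),\; P_C \in S(P_C) \tag{4}
\end{align}
```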
In Equation (2), the semantics of a concept in the domain are considered as the composition of the terminology of this concept (denoted by $T_C$) and the structural definition of this concept (denoted by $D_C$). Unlike the free-text format of a definition, $D_C$ commonly consists of the semantic properties of the concept ($P_C$), the hierarchical relation ($H_C$), and other interrelations ($R_C$) between this concept and other concepts in the domain. Thus, from Equations (2) and (3), a certain concept in the domain, $C_{Domain}$, can be expressed as a function of $T_C$, $R_C$, $H_C$, and $P_C$ in Equation (4), in which $R_C$, $H_C$, and $P_C$ represent the semantics of this concept and belong to $S(R_C)$, $S(H_C)$, and $S(P_C)$, respectively.
Considering the situation in the GI domain, we use the word "category" instead of "concept". Because the semantic characteristics of a GI category are highly correlated in space and time [59], the spatial- and temporal-related semantic properties and relations should be included in the model as crucial vocabularies for the representation and identification of the semantics of the GI categories. Thus, the GI ontology $O_{GI}$ and the semantics of a certain category $C_{GI}$ in $O_{GI}$ can be represented as Equations (5) and (6):
$$O_{GI} = \{ S(T_C),\, S(R_S),\, S(R_T),\, S(R_C),\, S(H_C),\, S(P_S),\, S(P_T),\, S(P_C) \} \quad (5)$$
In Equation (5), $S(R_S)$ and $S(R_T)$ denote the sets of spatial- and temporal-related relations, and $S(P_S)$ and $S(P_T)$ denote the sets of spatial- and temporal-related properties.
To solve the problems of geographic representation, the authors in [60] distinguished three main theoretical tools that are required for developing an overall formal theory of spatial representation, namely, mereology, location, and topology. These theoretical tools are selected as the basis for defining spatial-related semantics in our formal ontology model. In addition, geographic entities in reality are essentially dynamic; the authors in [61] pointed out that a good ontology must be capable of accounting for spatial reality both synchronically (as it exists at a time) and diachronically (as it unfolds through time). Thus, the "time point" and "time period" properties are used to describe the dynamic characteristics of the geographic entities in our model. Moreover, to specify the semantic relations and properties used in geographic definitions, the authors in [28] analyzed several geographic ontologies and identified patterns that were systematically used to express specific semantic relations and properties, including hierarchical relations, part-whole relations, and neighborhood relations, and semantic properties such as purpose, nature, material, and size.
Based on previous research and our formal ontology model in Equation (5), the semantic property and relation types in our model are subdivided as shown in Figure 3.
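
The grouping in Equation (5) can be mirrored by a simple data structure. The sketch below is an assumption-laden illustration, not the study's implementation: the class name `GICategory`, its field names, and the dictionary layout (statement type mapped to a list of values) are hypothetical, while the statement types "Hypernym", "Purpose", and "Nature" are taken from the "canal" example in this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Slot = Dict[str, List[str]]  # statement type -> list of values


@dataclass
class GICategory:
    """One category C_GI with one slot per component of Equation (5)."""
    term: str                                                    # T_C
    spatial_relations: Slot = field(default_factory=dict)        # R_S
    temporal_relations: Slot = field(default_factory=dict)       # R_T
    other_relations: Slot = field(default_factory=dict)          # R_C
    hierarchical_relations: Slot = field(default_factory=dict)   # H_C
    spatial_properties: Slot = field(default_factory=dict)       # P_S
    temporal_properties: Slot = field(default_factory=dict)      # P_T
    other_properties: Slot = field(default_factory=dict)         # P_C

    def statements(self) -> List[Tuple[str, str, str]]:
        """Flatten every filled slot into (group, type, value) statements."""
        groups = [
            ("R_S", self.spatial_relations),
            ("R_T", self.temporal_relations),
            ("R_C", self.other_relations),
            ("H_C", self.hierarchical_relations),
            ("P_S", self.spatial_properties),
            ("P_T", self.temporal_properties),
            ("P_C", self.other_properties),
        ]
        return [(group, stype, value)
                for group, slot in groups
                for stype, values in slot.items()
                for value in values]


canal = GICategory(
    term="canal",
    hierarchical_relations={"Hypernym": ["waterway"]},
    other_properties={
        "Purpose": ["watercraft", "drainage", "irrigation", "mining", "water power"],
        "Nature": ["manmade"],
    },
)
```

Leaving unused slots empty reflects the idea that a category does not need to fill every relation or property group of the model.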


Transformation from Semantic Primitives to Formal Ontology Model
To make the semantic primitives structured and comparable, domain experts analyze these semantic primitives and transform them into different groups of semantic properties/relations in our geospatial formal ontology model. The well-known Subject-Predicate-Object triple statement and the Web Ontology Language (OWL) are selected as the basis for presenting the semantic properties/relations and their values in a machine-readable manner. The Subject represents a $C_{GI}$ in $O_{GI}$; the Predicate is a semantic property or semantic relation type illustrated in Figure 3. All semantic relations are represented by object properties, and the Object in these relations is another $C_{GI}$ in $O_{GI}$ or an "owl:Class" object type. Most semantic properties are also represented by object properties; a few are represented by datatype properties in OWL syntax, in which case the Object is an "rdfs:Literal" datatype. The following rules are adopted to handle the formalization process:
(1) A GI category can be represented by a number of semantic relations/properties; however, the number of semantic relations/properties involved should be minimized to avoid redundancy.
(2) A GI category need not cover all semantic relations/properties in the model, and there is no guarantee that two different categories use the same set of semantic relations/properties to represent their semantics.
(3) The semantic information of a category in our model is the combination of different semantic relations/properties and their values. This combination should represent all the semantic information of the category and be able to distinguish the different geospatial categories within and beyond domain ontologies to avoid ambiguity.
(4) The hypernym, hyponym, and synonym relations should be included in the hierarchical relation group. If category A is a hyponym of category B, A must inherit all the semantic properties/relations of B to retain semantic consistency.
According to the above rules, the semantic primitives can be assigned to these property/relation types as structured statements for the identification and representation of the GI categories. For example, the free-text definition of the "canal" category in English is "manmade waterway used by watercraft or for drainage, irrigation, mining, or water power". The semantic primitives of the "canal" category, extracted by applying NLP tools, are the phrases "manmade waterway", "used", "watercraft", "drainage", "irrigation", "mining", and "water power". Transforming these semantic primitives into the proposed formal ontology model, the semantics of the category "canal" can be represented as the following set of semantic statements:
$$C_{Canal} = \{\, T_C = \text{"canal"} \;\cap\; H_C = \text{"Hypernym: waterway"} \;\cap\; P_C = \text{"(Purpose: watercraft)} \cup \text{(Purpose: drainage)} \cup \text{(Purpose: irrigation)} \cup \text{(Purpose: mining)} \cup \text{(Purpose: water power)"} \;\cap\; P_C = \text{"Nature: Manmade"} \,\}$$
The representation in OWL format is illustrated in Figure 4.
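
The transformation from primitives to Subject-Predicate-Object statements can be sketched as a lookup step followed by triple construction. In the study this assignment is made by domain experts; the table `EXPERT_ASSIGNMENT` below is a hypothetical stand-in for that judgment, and the `gi:` prefix and the Turtle-style rendering are assumptions for illustration rather than the OWL serialization shown in Figure 4.

```python
# Hypothetical expert decisions: primitive -> list of (predicate, object).
EXPERT_ASSIGNMENT = {
    "manmade waterway": [("Hypernym", "waterway"), ("Nature", "manmade")],
    "used": [],  # signals the Purpose pattern but carries no value itself
    "watercraft": [("Purpose", "watercraft")],
    "drainage": [("Purpose", "drainage")],
    "irrigation": [("Purpose", "irrigation")],
    "mining": [("Purpose", "mining")],
    "water power": [("Purpose", "water power")],
}


def to_triples(subject, primitives, assignment):
    """Build (subject, predicate, object) triples from extracted primitives."""
    triples = []
    for primitive in primitives:
        for predicate, obj in assignment.get(primitive, []):
            triples.append((subject, predicate, obj))
    return triples


def to_turtle_lines(triples, prefix="gi"):
    """Render triples as Turtle-style lines; spaces become underscores."""
    def name(text):
        return f"{prefix}:{text.replace(' ', '_')}"
    return [f"{name(s)} {prefix}:has{p} {name(o)} ." for s, p, o in triples]


canal_primitives = ["manmade waterway", "used", "watercraft", "drainage",
                    "irrigation", "mining", "water power"]
canal_triples = to_triples("canal", canal_primitives, EXPERT_ASSIGNMENT)
canal_lines = to_turtle_lines(canal_triples)
```

Keeping the expert assignment in a separate table makes the manual step explicit and leaves the triple construction itself fully mechanical.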

Semantics Translation
Assume that we have formal ontologies $O_A$ and $O_B$ presented in different natural languages, namely, language A ($L_A$) and language B ($L_B$), respectively. According to the geospatial formal ontology model introduced in Section 3.1.2, the semantics of ontologies $O_A$ and $O_B$ consist of category name sets $S(CN_A)$ and $S(CN_B)$ and semantic statement sets $S(SS_A)$ and $S(SS_B)$, labeled in different natural languages, in which the semantic statement consists of semantic property/relation types (as illustrated in Figure 3) [...]
Take the "运河" category in Chinese as an example. The semantic primitives of the "运河" category, extracted by applying NLP tools, are the phrases "跨流域", "开凿", "供调水", "航运", and "人工水道". Transforming these semantic primitives into the proposed formal ontology model, the semantics of the category "运河" can be represented as a set of semantic statements, and the semantics translation result of $C_{\text{运河}}$ in English is as follows:
$$C_{\text{运河}} = \{\, T_C = \text{"(Canal)"} \;\cap\; H_C = \text{"Hypernym: (Waterway, Aqueduct)"} \;\cap\; P_C = \text{"(Purpose: (Water transfer, Diversion))} \cup \text{(Purpose: (Shipping))"} \;\cap\; P_C = \text{"Nature: (Manual, Artificial)"} \;\cap\; R_S = \text{"Topology: (Inter-basin, Across river basins)"} \,\} \quad (9)$$
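
The translation step can be sketched as replacing each statement value with its list of candidate translations. Several things here are assumptions for illustration: the lookup table `CANDIDATES` stands in for a machine translation tool, the Chinese statement values are guessed from the primitives listed above (only the English result in Equation (9) is taken from the paper), and the statement types are assumed to come from the shared model vocabulary and therefore need no translation.

```python
# Hypothetical lookup standing in for a machine translation tool.
# The English candidates are those shown in Equation (9).
CANDIDATES = {
    "运河": ["Canal"],
    "水道": ["Waterway", "Aqueduct"],
    "调水": ["Water transfer", "Diversion"],
    "航运": ["Shipping"],
    "人工": ["Manual", "Artificial"],
    "跨流域": ["Inter-basin", "Across river basins"],
}


def translate_value(value, lookup=CANDIDATES):
    """Return every candidate translation; keep the source term if unknown."""
    return list(lookup.get(value, [value]))


def translate_category(name, statements, lookup=CANDIDATES):
    """Translate a category name and the value of each (type, value) statement."""
    return {
        "name": translate_value(name, lookup),
        "statements": [(stype, translate_value(value, lookup))
                       for stype, value in statements],
    }


# Assumed Chinese-side statements for the "运河" category.
yunhe_statements = [
    ("Hypernym", "水道"),
    ("Purpose", "调水"),
    ("Purpose", "航运"),
    ("Nature", "人工"),
    ("Topology", "跨流域"),
]
translated = translate_category("运河", yunhe_statements)
```

Keeping all candidates rather than a single best translation follows the form of Equation (9), where several English renderings are listed for one value.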

Semantic Statement Mapping
To determine the mapping relationships between categories in different GI ontologies, the mapping relationships at the semantic statement level should be determined first because the semantic statements present the most detailed semantic characteristics of the compared categories. Once these relationships are determined, the similarity between categories can be determined quantitatively. Algorithm 2 shows the comparison process for category names and semantic statements between $O_A$ and $O_B$. In addition, all the mapping results $M(O_A, O_B)$ are stored as the basis for the similarity calculation between the categories in different GI ontologies.
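
A minimal sketch of statement-level comparison followed by a category-level score is given below. It is not Algorithm 2 or the paper's similarity formula, neither of which is reproduced in this text: the matching rule (same statement type and at least one shared candidate value, ignoring case) and the Dice-style ratio are assumptions chosen only to make the two-stage idea concrete.

```python
def statements_match(a, b):
    """Two statements map when their types agree and candidate values overlap."""
    type_a, values_a = a
    type_b, values_b = b
    shared = {v.lower() for v in values_a} & {v.lower() for v in values_b}
    return type_a == type_b and bool(shared)


def similarity(statements_a, statements_b):
    """Share of statements on both sides that have a counterpart (0.0 to 1.0)."""
    matched_a = sum(any(statements_match(a, b) for b in statements_b)
                    for a in statements_a)
    matched_b = sum(any(statements_match(b, a) for a in statements_a)
                    for b in statements_b)
    total = len(statements_a) + len(statements_b)
    return (matched_a + matched_b) / total if total else 0.0


# Translated statements of the Chinese "运河" category (values from Equation (9)).
yunhe = [
    ("Hypernym", ["Waterway", "Aqueduct"]),
    ("Purpose", ["Water transfer", "Diversion"]),
    ("Purpose", ["Shipping"]),
    ("Nature", ["Manual", "Artificial"]),
    ("Topology", ["Inter-basin", "Across river basins"]),
]
# Statements of the English "canal" category.
canal = [
    ("Hypernym", ["waterway"]),
    ("Purpose", ["watercraft"]),
    ("Purpose", ["drainage"]),
    ("Purpose", ["irrigation"]),
    ("Purpose", ["mining"]),
    ("Purpose", ["water power"]),
    ("Nature", ["manmade"]),
]
score = similarity(yunhe, canal)
```

With exact string overlap, only the "Hypernym" statements match in this example, so near-synonyms such as "Artificial" and "manmade" go undetected; any practical comparison would need a richer notion of equivalence than this sketch provides.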

Study Material
To illustrate the methodologies, two classification standards in two different natural languages were selected for use in the mapping process. $CS_C$ is developed based on the national topographic map standards in China (the standards "Cartographic symbols for national fundamental scale maps" and "Specifications for feature classification and codes of fundamental geographic information"). $CS_A$ is developed by the U.S. Geological Survey (http://cegis.usgs.gov/ttl/USTopographic.ttl). Both standards are digital literature materials; the category names and their free-text definitions are provided as source information for our experiment. For the sake of simplicity and clarity, our study was restricted to the "surface water" categories from these two classification standards. Table 3 briefly lists the characteristics of these two selected datasets, with detailed explanations as follows:
(1) Both standards have their own classification system to address the categories of "surface water". The categories in $CS_C$ are organized using a four-level hierarchy with six major categories. By contrast, the categories in $CS_A$ are organized by a four-level hierarchy with 81 major categories, which means that the hierarchical structure of $CS_A$ does not closely match that of $CS_C$.
(2) The free-text definitions in both standards are used as category definitions.
(3) The number of categories in $CS_C$ is 74, and the number of categories in $CS_A$ is 92; thus, $CS_A$ covers more category types than $CS_C$ does.
(4) The natural language in $CS_C$ is Chinese, whereas the natural language in $CS_A$ is English, which means that there is a natural language barrier between these two GI classification standards.

Results
The well-defined category definitions in both $CS_C$ and $CS_A$ serve as the basis for our study. The Web Ontology Language (OWL) API is used to implement the proposed algorithms in Java within Eclipse, and the experimental results are as follows.

Semantic Statement Mappings
The semantic primitives are extracted using the Stanford Natural Language Processing Tools (http://nlp.stanford.edu/software/), transformed by domain experts into the formal ontologies $O_C$ and $O_A$ with their sets of category names and semantic statements, and encoded in OWL via Protégé. Using the semantic statement mapping algorithm introduced in Section 3.2.2, the number of mapping relationships between the statements in $O_C$ and $O_A$ is recorded, and the mapping results for different semantic property/relation types are shown in Table 4. The total number of semantic statements in $O_C$ is 142, and the total number of such statements in $O_A$ is 181. The total mapping rate of the semantic statements between $O_C$ and $O_A$ is 28.69%. The details of the mapping relationships between semantic statements of each type can be found in Appendix 5.
For the semantic statements about semantic property types, the type with the most matched statements is "purpose". This is because the semantic property type "purpose" is used to represent manmade categories, such as "ditch", "canal", and "dam", and the free-text definitions of these categories in the Chinese and American classification standards are very similar. Semantic information about purpose and functionality is considered a crucial characteristic of these categories. The semantic property type "nature" has the highest mapping rate, namely, 100%, because there are only two values for this type of semantic statement, "natural" and "manmade", in both $O_C$ and $O_A$. For the semantic property type "location", there are seven semantic statements in $O_C$ and eight in $O_A$, but the mapping rate of this type is extremely low (only one semantic statement is mapped, giving a mapping rate of 7.14%). This is because the semantic property type "location" describes the regional environment in which a geographic category is found, and many of the categories in $O_A$ are bay-related or glacier-related, such as "glacier", "ice cap", and "iceberg tongue", whose "location" values are "mountainous area", "regions of perennial frost", and "coast", respectively; there are no such categories in $O_C$. For the semantic statements about relation types, the most matched type is "spatial relation", which is also the type with the highest mapping rate, indicating that spatial relations play a major role in the identification and representation of GI semantics.
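The text does not state how the mapping rate is computed. As a hedged reading, the reported "location" figure (one mapped statement, seven statements in $O_C$, eight in $O_A$, rate 7.14%) is consistent with dividing the number of matched statements by the number of distinct statements on both sides, that is, matched / (n_C + n_A - matched). The sketch below checks that assumption; the function name `mapping_rate` is invented here, and the count of 72 matched statements is a back-calculated, hypothetical value that does not appear in the text.

```python
def mapping_rate(n_source: int, n_target: int, n_matched: int) -> float:
    """Matched statements divided by the distinct statements on both sides.

    Assumed formula (not stated in the text):
    matched / (n_source + n_target - matched).
    """
    distinct = n_source + n_target - n_matched
    if distinct <= 0:
        raise ValueError("statement counts must leave at least one statement")
    return n_matched / distinct


# "location": 7 statements in O_C, 8 in O_A, 1 mapped (figures from the text).
location = mapping_rate(7, 8, 1)
print(f"{location:.2%}")

# Totals: 142 and 181 statements are from the text; 72 matched statements is
# a hypothetical count chosen because it reproduces the reported 28.69%.
total = mapping_rate(142, 181, 72)
print(f"{total:.2%}")
```

Under this reading, the 100% rate for "nature" also follows, since two matched statements out of two on each side give 2 / (2 + 2 - 2) = 1.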

Similarity Calculation and Category Mappings
The similarities between concepts are calculated using the semantic statement mapping relationships and Algorithm 3 proposed in Section 3.2.3. Three typical examples of the mapping results between categories are chosen for further discussion. Table 5 shows the names and free-text definitions of the compared category pairs. The corresponding semantic statements, calculated similarity values, and final mapping relationships between these category pairs are presented in Table 6.

The rows of Table 5 for concept pairs 2 and 3 read as follows:

| Concept | Name | Free-text definition |
| --- | --- | --- |
| Concept 2 in $O_C$ | Arroyo (dry river) | Precipitation or snowmelt water within a short time after the river or river diversions left after the river. |
| Concept 2 in $O_A$ | Wash | The usually dry portion of a stream bed that contains water only during or after a local rainstorm or heavy snowmelt. |
| Concept 3 in $O_C$ | Water System | River, river, lake, sea, wells, springs and reservoirs, ponds, ditches, and other natural and artificial water bodies and the connected system in general. |
| Concept 3 in $O_A$ | Surface Water | The water portion of the Earth's surface, including the surface of sea and inland waters |

For concept pair 3, Table 6 lists the following statement mappings:

| Statement | Mapping relationship | Matched statement |
| --- | --- | --- |
| "Hyponym: river" | Exact match | "Hyponym: River" |
| "Hyponym: river" | Exact match | "Hyponym: Stream" |
| "Hyponym: lake" | Exact match | "Hyponym: Lake" |
| "Hyponym: sea" | Exact match | "Hyponym: Sea" |
| "Hyponym: spring" | Exact match | "Hyponym: Spring" |
| "Hyponym: reservoir" | Exact match | "Hyponym: Reservoir" |
| "Hyponym: pond" | Exact match | "Pond" |
| "Hyponym: ditch" | Exact match | "Ditch" |
| "Hyponym: body of water" | Exact match | "Hyponym: Water body" |
| "Nature: natural" | Exact match | "Nature: Natural" |
| "Nature: artificial" | Exact match | "Nature: Manmade" |
| "Material: water" | Exact match | "Material: Water" |

Example 1: Concept pair of "spillway" in $O_C$ and "spillway" in $O_A$. These two concepts are comparable because the mapping relationship between their concept names is "exact match". Because their concept names and four semantic statements are matched (detailed mapping relationships are illustrated in Table 6, lines 1 and 2), the second condition in Equation (9) is used to calculate the final similarity between "spillway" in $O_C$ and "spillway" in $O_A$. The similarity value between these two concepts is calculated as 0.78; thus, the mapping relationship between these two concepts is "close match". This example demonstrates the simplest case for the calculation of the semantic similarity between concepts.
Example 2: Concept pair of "arroyo (dry river)" in $O_C$ and "wash" in $O_A$. In this example, the mapping relationship between the concept names "arroyo (dry river)" and "wash" cannot be determined based on the mapping algorithm in Section 3.2.1. However, the similarity value between these two concepts is higher than the value in Example 1. This is because all the semantic statements used to represent the semantic meaning of these two concepts are correspondingly matched (detailed mapping relationships are illustrated in Table 6, lines 3 and 4), and all the mapping relationships between them are "exact match". The first condition in Equation (9) is used to calculate the final similarity between the concepts "arroyo (dry river)" in $O_C$ and "wash" in $O_A$. The similarity value between these two concepts is calculated as 1.0; thus, the mapping relationship between these two concepts is "exact match". This example demonstrates a common situation in the cross-lingual environment, in which two concepts have the same semantic meaning while their names are entirely different. It also illustrates the utility of our methodology for cross-lingual GI ontology integration.

Example 3: Concept pair of concept 3 "water system" in $O_C$ and concept 3 "surface water" in $O_A$. At first glance, the semantic statements of the concepts "water system" and "surface water" are not matched very well, and the names of these two concepts cannot be matched either. This is because both concepts are the top concepts in their own taxonomies, and they are abstract concepts: they do not represent real-world objects with detailed characteristics, such as rivers, lakes, and oceans. Thus, the definitions of this category in different languages may be very different, even when they convey the same meaning. Therefore, the solution for representing the semantic meaning of this type of concept is not the same as the solution used in Examples 1 and 2. The semantic meaning of the hyponym-related concepts should be considered to infer the integrated semantic meaning of this abstract concept. After the implicit semantic statements have been inferred (detailed mapping relationships are illustrated in Table 6, lines 5 and 6), the first condition in Equation (9) is used to calculate the final similarity between the concepts "water system" in $O_C$ and "surface water" in $O_A$. The similarity value between these two concepts is calculated as 0.92; thus, the mapping relationship between these two concepts is "close match".
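The inference of implicit statements for an abstract concept can be sketched as follows. This is one possible reading, not the authors' procedure: the function `infer_statements`, the choice to inherit only "Nature" and "Material" statements, the restriction to direct hyponyms, and the sample statements for "river", "lake", and "reservoir" are all assumptions made for illustration. The idea is that an abstract top concept receives a "Hyponym" statement for each of its hyponyms and inherits selected property statements from them.

```python
from typing import Dict, List, Set, Tuple

Statement = Tuple[str, str]  # (property/relation type, value)


def infer_statements(
    concept: str,
    hyponyms: Dict[str, List[str]],
    explicit: Dict[str, Set[Statement]],
    inherited_types: Tuple[str, ...] = ("Nature", "Material"),
) -> Set[Statement]:
    """Combine a concept's own statements with ones inferred from hyponyms.

    Each direct hyponym contributes a ("Hyponym", name) statement, and its
    statements of the types in `inherited_types` are added to the parent.
    """
    inferred = set(explicit.get(concept, set()))
    for child in hyponyms.get(concept, []):
        inferred.add(("Hyponym", child))
        for stype, value in explicit.get(child, set()):
            if stype in inherited_types:
                inferred.add((stype, value))
    return inferred


# Invented sample taxonomy and statements, for illustration only.
hyponyms = {"water system": ["river", "lake", "reservoir"]}
explicit = {
    "river": {("Nature", "natural"), ("Material", "water")},
    "lake": {("Nature", "natural"), ("Material", "water")},
    "reservoir": {("Nature", "artificial"), ("Material", "water"),
                  ("Purpose", "water storage")},
}

result = infer_statements("water system", hyponyms, explicit)
for statement in sorted(result):
    print(statement)
```

With the inferred statements in place, the abstract concept can be compared through the same statement mapping step as any other category.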

Conclusions and Future Work
The presented research focuses on the determination of semantic mapping relationships between categories in different GI ontologies separated by natural language barriers. The formal ontology model proposed in this study is used to represent and identify the semantic characteristics of the GI categories with OWL-based semantic statements transformed from the free-text definitions of two GI classification standards. A new similarity calculation algorithm based on this formal ontology model is presented to measure the semantic similarities and identify the mapping relationships between categories.
In particular, we work with two classification standards for topographic maps, in Chinese and in American English. The experiment indicates that the proposed approach successfully determines the mapping relationships between categories in different GI ontologies and facilitates ontology integration in a cross-lingual environment. Because the NLP tools used in our experiment support multiple languages, the model can be readily replicated to determine the mapping relationships between other GI ontologies described in natural languages other than Chinese. However, this model has only been applied to geospatial information (GI) integration at the category level, and research on GI integration at the data level has not yet been carried out; this will form the basis for future study. In addition, publishing the cross-lingual mapping information as linked data in a semantic web environment should also be considered.

Figure 1. Main procedure for our methodologies.


Figure 2. Extraction of the semantic primitives from the free-text definitions by applying NLP tools in Chinese and English.

$S(T_C)$ represents the set of the category names in $O_{GI}$; $S(R_S)$ and $S(R_T)$ represent the sets of the spatial-related and temporal-related semantic relations between categories; $S(H_C)$ represents the set of hierarchical relations; $S(P_S)$ and $S(P_T)$ represent the sets of the spatial-related and temporal-related semantic properties belonging to the categories in $O_{GI}$; and $S(R_C)$ and $S(P_C)$ represent the sets of other semantic relations and properties in $O_{GI}$. In Equation (6), $V_x$ represents the values of certain semantic properties/relations of $C_{GI}$; $T_C$, $R_S$, $R_T$, $R_C$, $H_C$, $P_S$, $P_T$, and $P_C$ are used to represent the semantics of $C_{GI}$ and belong to $S(T_C)$, $S(R_S)$, $S(R_T)$, $S(R_C)$, $S(H_C)$, $S(P_S)$, $S(P_T)$, and $S(P_C)$, respectively.
ISPRS Int. J. Geo-Inf. 2016, 5, 90
in which $R_C$, $H_C$, and $P_C$ are used to represent the semantics of this concept and belong to $S(R_C)$, $S(H_C)$, and $S(P_C)$, respectively.

Figure 3. Semantic property and relation groups in the Geospatial Formal Ontology Model.

Figure 4. Representation of the category "canal" in OWL format: (a) the OntoGraf view in Protégé and (b) the semantic statement presentation in Turtle file format.

To cross the natural language barrier between $O_A$ and $O_B$, Algorithm 1 illustrates the process of semantics translation between $L_A$ and $L_B$.

Algorithm 1. Semantics Translation.
Input: Formal ontologies $O_A(S(CN_A), S(SS_A))$ in $L_A$.
Output: Translation candidate result set of the semantics in $O_A$, $O'_A(S(TC(CN_A)), S(TC(SS_A\text{-object})))$ in $L_B$.
Symbols:
$S(TC(CN_A))$: translation candidate result set of $S(CN_A)$ in $L_B$.
$S(TC(SS_A))$: translation candidate result set of $S(SS_A)$ in $L_B$.
$ss_A$-object: the Object part of the semantic statement $ss_A$.
1: for each category name $cn_A$ in $S(CN_A)$, translate $cn_A$ in $L_A$ into $cn'_A$ in $L_B$ by using different Machine Translation (MT) web services (Google Translator API at "http://translate.

Table 1. Summary of the Penn Treebank Part-of-Speech Tag sets in English.


Table 2. Summary of the Penn Treebank Part-of-Speech Tag sets in Chinese.

Algorithm 2. Comparison of category names and semantic statements between $O_A$ and $O_B$.
Output: Mapping result set $M(O_A, O_B)$ about category names and semantic statements between $O_A$ and $O_B$.
Symbols:
$T(ss)$: semantic property/relation type of a certain semantic statement $ss$.
$M(O'_A, O_B)$: mapping relationships about category names and semantic statements between $O'_A$ and $O_B$.
1: for each category name $cn_A$ in $S(CN_A)$, find the translation candidate results of $cn_A$, $TC(cn_A)$,
2: for each translation candidate $tc(cn_A)$ in $TC(cn_A)$, search $S(CN_B)$ in $O_B$, find the matched category name $cn_B$ in $S(CN_B)$ by applying Equation (10),
3: if a translation candidate $tc(cn_A)$ has the mapping relationship "exact match" with $cn_B$, store the mapping result $m(cn_A, cn_B, \text{'exact match'})$ in $M(O_A, O_B)$;
4: else if a translation candidate $tc(cn_A)$ has the mapping relationship "close match" with $cn_B$, store the mapping result $m(cn_A, cn_B, \text{'close match'})$ in $M(O_A, O_B)$;
5: for each semantic statement $ss_A$ in $S(SS_A)$, find the translation candidate results of $ss_A$-object, $TC(ss_A\text{-object})$,
6: for each translation candidate $tc(ss_A\text{-object})$ in $TC(ss_A\text{-object})$, search $S(SS_B\text{-object})$ in $O_B$, find the matched semantic statement Object, $ss_B$-object in $S(SS_B)$ by applying Equation (10),
7: ... $ss_B$-object, and $T(ss_A)$ equals $T(ss_B)$, store the mapping result m($ss_A$, $ss_B$, '...
8: ... $ss_B$-object, and $T(ss_A)$ equals $T(ss_B)$, store the mapping result m($ss_A$, $ss_B$, '...

Table 3. Characteristics of $CS_C$ and $CS_A$.

Table 4. Condition of the mapping statements between $O_C$ and $O_A$.

Table 5. Names and free-text definitions of the compared concept pairs.

Table 6. Examples of category definitions and similarity calculation.

Table A1. Detail of the mapping statements between $O_C$ and $O_A$.