A Review of Geospatial Semantic Information Modeling and Elicitation Approaches

Abstract: The present paper provides a review of two research topics that are central to geospatial semantics: information modeling and elicitation. The first topic deals with the development of ontologies at different levels of generality and formality, tailored to various needs and uses. The second topic involves a set of processes that aim to draw out latent knowledge from unstructured or semi-structured content: semantic-based extraction, enrichment, search, and analysis. These processes focus on eliciting a structured representation of information in various forms such as: semantic metadata, links to ontology concepts, a collection of topics, etc. The paper reviews the progress made over the last five years in these two very active areas of research. It discusses the problems and the challenges faced, highlights the types of semantic information formalized and extracted, as well as the methodologies and tools used, and identifies directions for future research.


Introduction
Geospatial semantics has been at the center of research over the last 20 years. The semantic problems occurring during the exchange, reuse, and integration of heterogeneous spatial data and the need to achieve semantic interoperability were among the first research challenges faced by the Geographic Information Science (GIScience) community [1]. In this endeavor, data about location and thematic attributes were not considered rich enough to fully represent the complexity of geographic entities and to support their integration and interpretation in different contexts. Hence, the discussion shifted from geospatial data to geospatial concepts and from data specifications to the nature and conceptualization of geographic entities and phenomena. Research on ontologies allowed for formal representations of geospatial concepts and enriched the discussion on the nature and conceptualization of geographic space.
Over these years, several reviews of geospatial semantics have highlighted different research directions based on the challenges of that specific period. In an early review of geospatial semantics, Kuhn [1] identified two aspects pertaining to this research field: Understanding Geographic Information Systems (GIS) contents and capturing this understanding in formal theories. He also highlighted that reasoning is even more important than formalization of meaning. He identified several challenges related to geospatial semantics and semantic interoperability: data discovery and evaluation, and service discovery, evaluation, and composition.
The interpretation of data across heterogeneous data sources and scientific domains became even more challenging in the context of the Semantic Web. The Semantic Web has further complicated the modeling, interpretation, reuse, and integration of knowledge, since "web knowledge has no primitives,

Upper-Level Ontologies
Geographic ontologies may be considered as domain ontologies. However, top-level ontologies are also relevant to geospatial knowledge. They define central notions for the geospatial domain such as space, time, spatial regions, boundaries, and processes and investigate ontological issues regarding the hypostasis and dimensionality of geographic entities and their dependence on spatial regions and boundaries [30][31][32]. Prominent upper-level ontologies that have influenced geospatial ontology development and research are: Basic Formal Ontology (BFO) [33,34], Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) [35], General Formal Ontology (GFO) [36], Generalized Upper Model (GUM) [37], and Suggested Upper Merged Ontology (SUMO) [38]. WordNet [39], although not designed as an ontology but as a lexical database and semantic network, has also been used in the geospatial domain for ontology-based information extraction and retrieval, and semantic similarity computation.
Top-level ontologies are core components of formal ontology, the discipline that integrates aspects of philosophy, formal logic, and artificial intelligence [40]. Formal ontology is defined as the theory of a priori distinctions among the entities of the world (physical objects, events, processes, quantities, etc.) and among the meta-categories used to model the world (concepts, properties, qualities, states, etc.) [41]. An important ontological distinction upon which several top-level ontologies are built is that between continuants and occurrents. Continuants (also called endurants) are objects that are wholly present through time, whereas occurrents (also called perdurants) are objects that are temporally restricted and have temporal parts, such as processes or events. Space, spatial regions, and spatial relations, as well as time and temporal phenomena are fundamental notions in upper-level ontologies, as are mereology, the theory of parthood relations [42], and topology, the theory of spatial continuity and compactness [43].
In the last five years, the discipline of formal ontology has focused on further formalization of these notions. Baumann et al. [44] introduced GFO-Space, a first-order formalization of an ontology of space for General Formal Ontology (GFO). The principles underlying the ontology originate from Brentano's ideas on space and continuum and focus on phenomenal space (i.e., space determined by material objects and relations between them), as appearing to the mind of a subject. GFO-Space accounts for mereotopological relations, boundaries and dimensionality of entities. The axiomatization of their theory is based on four primitives: The category "space region" (i.e., the basic entities of phenomenal space), and the relations: "spatial part of", "spatial boundary of", and "coincidence".
ISPRS Int. J. Geo-Inf. 2020, 9, 146
Mereogeometry is the theory of space and spatial relations which formalizes geometric notions within a region-based mereological framework. On the other hand, mereotopological relations, such as contact, parthood, and overlap, are also fundamental for the qualitative representation of spatial information. Schmidtke [45] outlines a mereogeometric framework for the formalization of geometric relations of incidence, congruence, and parallelism over extended regions. Hahmann [46] extends the axiomatization of CODI (Containment and Dimension), a first-order logic ontology of multidimensional mereotopology, with the mereological operations intersection and difference that apply to pairs of regions regardless of their dimensions. CODI provides a first-order formalization of the notions of spatial containment and relative spatial dimension and defines a set of six intuitive mereotopological relations: Containment and its refinement parthood as well as contact and its refinements partial overlap, incidence, and superficial contact.
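As a rough illustration of the qualitative relations named above (parthood, overlap, and contact), the following Python sketch models regions as finite sets of grid cells. The set-based model, the 4-neighbour notion of contact, the function names, and the example regions are all assumptions made for illustration; they are not the first-order axiomatizations of CODI or GFO-Space.

```python
from typing import FrozenSet, Set, Tuple

Cell = Tuple[int, int]
Region = FrozenSet[Cell]


def part_of(x: Region, y: Region) -> bool:
    """Parthood: every cell of x is also a cell of y."""
    return x <= y


def overlaps(x: Region, y: Region) -> bool:
    """Overlap: x and y share at least one common part (here, a cell)."""
    return bool(x & y)


def _neighbours(cell: Cell) -> Set[Cell]:
    """Edge-adjacent (4-neighbour) cells of a grid cell."""
    row, col = cell
    return {(row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)}


def in_contact(x: Region, y: Region) -> bool:
    """Contact: the regions overlap, or some cell of x is edge-adjacent to a cell of y."""
    if overlaps(x, y):
        return True
    return any(_neighbours(cell) & y for cell in x)


def externally_connected(x: Region, y: Region) -> bool:
    """Contact without overlap (the regions only touch)."""
    return in_contact(x, y) and not overlaps(x, y)


# Hypothetical example regions on a small grid.
city = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
park = frozenset({(0, 0)})
river = frozenset({(0, 2), (1, 2)})
forest = frozenset({(5, 5)})
```

In this toy model, `park` is a part of `city`, `city` and `river` touch without overlapping, and `forest` is in contact with neither.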
Time and temporal phenomena are also prominent notions within upper-level ontologies. Key issues in how time and temporal phenomena are treated concern: (a) Dimensionality (i.e., the treatment of instants and intervals), (b) frame-dependence (i.e., definition of time with respect to a reference frame), and (c) indexicality (i.e., reference to and distinction of the past, the present, and the future) [47].
In order to reconcile the data-modeling and process-modeling requirements of GIScience, Galton [48] laid the foundation for a formal theory of events and processes integrating two distinct ways of viewing time, referred to as historical and experiential time. Historical time, presupposed by data-modeling functions, such as storage, retrieval, manipulation, and presentation, is considered as static and "frozen" and places emphasis on completed events. Experiential time, on the other hand, presupposed by process-modeling functions, such as explanation, prediction, and simulation, is considered as active and "fluid" and places emphasis on ongoing processes. Temporal scale or granularity is considered important to delineate the relationship between processes and events. The theory distinguishes different types of processes, specified by activity conditions, and different types of events, specified by occurrence conditions. It also presents a formalization of operations for deriving events from processes, processes from events, and events from events.
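One simple reading of "deriving events from processes" can be sketched in Python: if a process is recorded as a sequence of activity flags per time step, each maximal run of activity that has ended can be reported as a completed event, while a run that is still active is left as an ongoing process. The discrete time steps, the boolean encoding of activity, and the function name are illustrative assumptions; the formal operations of the theory in [48] are not reproduced here.

```python
from typing import List, Tuple


def completed_events(active: List[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of maximal activity runs that have ended.

    A run that is still active at the last time step is treated as an
    ongoing process and is not reported as an event.
    """
    events: List[Tuple[int, int]] = []
    start = None
    for t, is_active in enumerate(active):
        if is_active and start is None:
            start = t  # the process becomes active
        elif not is_active and start is not None:
            events.append((start, t - 1))  # activity ended: one completed event
            start = None
    return events


# Hypothetical activity record of a "raining" process over nine time steps.
rain = [False, True, True, False, False, True, False, True, True]
print(completed_events(rain))  # [(1, 2), (5, 5)]
```

The last run (steps 7 and 8) is not reported because it has not ended, which mirrors the contrast between completed events and ongoing processes.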
Understanding the models and theories subsumed by upper-level ontologies is critical for making the ontological commitments explicit, specifying relations to other upper-level ontologies and extending upper-level ontologies to create new domain-specific ones. Hanzal et al. [49] present a comparative survey of how foundational and Semantic Web ontologies (Event Ontology (http://purl.org/NET/c4dm/event.owl#), Simple Event Model Ontology [50], Linking Open Descriptions of Events (http://linkedevents.org/ontology/), etc.) formalize the notion of event. Based on modeling considerations followed by these ontologies, they provide an empirical classification of events in four categories: actions, happenings, planned "social" events, and structural components of temporal entities. Categorization relies on the distinction between object and relationship as formalized by the modeling language PURO [51]. Muñoz and Grüninger [52] apply the process of ontology verification on core temporal concepts of the Suggested Upper Merged Ontology (SUMO) to rule out unintended models and characterize missing intended ones. The verification process also supports ontology mappings to other time ontologies, PSL-Core ontology [53] and Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) upper ontology [35].

Domain, Task, and Application Ontologies
Besides upper-level ontologies, a variety of other ontologies have been developed in the geospatial domain at different levels of formality and generality (Figure 1). Domain ontologies have been developed at different levels of detail, covering whole domains, such as earth and environmental sciences [54], oceanography [20], land cover and land use [55], etc., as well as specific domain concepts, such as city, locality, and forest.
Research in the last five years addresses or revisits issues related to the formalization of interdisciplinary or vague geospatial concepts, grounding their meaning on upper-level ontologies. Furthermore, the formalization of geospatial concepts with a cognitive and linguistic basis such as places, landforms, or landscapes aims to bridge the gap between the qualitative human perception and the necessity to have rigorous, unambiguous definitions in view of their implementation.
Alvarez and Bennett [56] propose a framework for the formal definition of the highly vague and interdisciplinary concept "forest". The framework encapsulates different aspects of forest definitions pertaining to their classification (i.e., determining whether an object is an instance of a class), individuation (i.e., establishing how many distinct individual objects of a given type exist), and demarcation (i.e., determining the boundary of an object), such as location, morphological, metrical, topological, and mereological restrictions, qualitative characteristics, scale, etc. The framework, which is implemented in a prototype Prolog-based GIS, is based on supervaluation semantics to express the variety of possible meanings of the forest concept from different perspectives. The fundamental idea of supervaluation semantics is that a vague language can be interpreted in many different precise ways, each of which can be modelled in terms of precise truth conditions for each predicate of the language; each such interpretation is referred to as a precisification.
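The idea of precisifications can be sketched in a few lines of Python. Each precisification fixes precise thresholds for the vague predicate; a claim is "supertrue" if it holds under every admissible precisification, "superfalse" if it holds under none, and indeterminate otherwise. The attribute names, threshold values, and function names below are hypothetical and are not taken from the framework in [56].

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Precisification:
    """One precise reading of 'forest' (hypothetical thresholds)."""
    min_area_ha: float
    min_canopy_cover: float  # fraction between 0 and 1


@dataclass(frozen=True)
class LandParcel:
    area_ha: float
    canopy_cover: float


def is_forest(parcel: LandParcel, p: Precisification) -> bool:
    """Precise truth condition of 'forest' under a single precisification."""
    return parcel.area_ha >= p.min_area_ha and parcel.canopy_cover >= p.min_canopy_cover


def supervaluate(parcel: LandParcel, precisifications: List[Precisification]) -> str:
    """Classify a parcel across all admissible precisifications."""
    verdicts = [is_forest(parcel, p) for p in precisifications]
    if all(verdicts):
        return "supertrue"
    if not any(verdicts):
        return "superfalse"
    return "indeterminate"


# Three invented admissible readings of the concept.
ADMISSIBLE = [
    Precisification(min_area_ha=0.5, min_canopy_cover=0.10),
    Precisification(min_area_ha=1.0, min_canopy_cover=0.20),
    Precisification(min_area_ha=0.5, min_canopy_cover=0.30),
]

print(supervaluate(LandParcel(2.0, 0.50), ADMISSIBLE))  # supertrue
print(supervaluate(LandParcel(0.7, 0.25), ADMISSIBLE))  # indeterminate
print(supervaluate(LandParcel(0.1, 0.05), ADMISSIBLE))  # superfalse
```

The borderline parcel in the second call is a forest under one reading only, which is exactly the kind of case a single crisp definition would hide.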
Ballatore [57] takes a preliminary step towards the formalization of an ontology of place by surveying the representation of this vague, polysemous, and culturally-dependent notion across a range of lightweight and formal ontologies. He also discusses issues (cultural and linguistic dependence of place, temporal dimension, social role, scale, and themes) and outlines conceptual tools for a multi-faceted formal ontology of place.
Calafiore et al. [58] focus on cities, which are viewed as systems of various types of urban artefacts interacting with human activities by playing multiple roles, probably at the same time. They perform an ontological analysis of urban artefacts and their social uses. In this context, a distinction is made between the intended and actual uses of urban artefacts, and places are treated as social concepts. The ontological analysis is grounded on the DOLCE foundational ontology due to its cognitive orientation. The general taxonomy in DOLCE is enriched with the notions of urban artefact, urban artefact types and roles, social practices, institutional places, and social places, and uses of artefacts are modeled in terms of roles theory.
In the context of historical geographic information systems, Garbacz et al. [59] identify diachronic criteria of identity for localities in order to model the type of changes they may undergo over time. In this context, localities are considered as endurants, which possess qualities relating to their name, location, type, and mereology, and participate in events or processes (considered as perdurants). Event-based criteria of identity are used to define qualitative transformations of localities with respect to one of these qualities.
Gharebaghi and Mostafavi [60] propose an ontology of the urban environment for modeling the interaction between humans and their social and physical environments and supporting the accessibility of people with disabilities. The ontology is based on the nature-development perspective. In this approach, concepts belong either to natural environment (such as forest and tree) or to developed environment (such as sidewalk and building) and have physical and social properties which are related to each other.
Stephen and Hahmann [61] developed an ontological framework for the formalization of surface and subsurface spatio-temporal processes that describe hydrologic flow (Figure 2). The framework extends static hydrogeological concepts from the Hydro Foundational Ontology (HyFO) [61] and also builds on the distinction between endurants and perdurants and the participation relation PC(x, y, t) between them, as defined by the DOLCE upper-level ontology. The participation relation PC(x, y, t) expresses that an endurant x participates in a perdurant y at time t. Flow processes are distinguished based on how and what kinds of endurants can participate and are formalized using semantic roles such as theme participant, source participant, goal participant, and locative participant. The taxonomy and related concepts are axiomatized in first-order logic.
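The participation relation and the role-based distinction of participants can be illustrated with a small Python data model. The class names, the encoding of time as an integer, and the example entities are hypothetical stand-ins chosen for illustration; the framework in [61] is axiomatized in first-order logic, not in code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set


class Role(Enum):
    THEME = "theme participant"
    SOURCE = "source participant"
    GOAL = "goal participant"
    LOCATIVE = "locative participant"


@dataclass(frozen=True)
class Participation:
    """One fact PC(x, y, t), annotated with the semantic role x plays in y."""
    endurant: str   # x
    perdurant: str  # y
    time: int       # t
    role: Role


def participants(facts: Iterable[Participation], perdurant: str,
                 time: int, role: Role) -> Set[str]:
    """Endurants that participate in `perdurant` at `time` with the given role."""
    return {
        f.endurant
        for f in facts
        if f.perdurant == perdurant and f.time == time and f.role == role
    }


# Invented facts about a single flow process.
FACTS = [
    Participation("water_body_1", "flow_1", 0, Role.THEME),
    Participation("hillslope_1", "flow_1", 0, Role.SOURCE),
    Participation("stream_channel_1", "flow_1", 0, Role.GOAL),
    Participation("water_body_1", "flow_1", 1, Role.THEME),
]

print(participants(FACTS, "flow_1", 0, Role.GOAL))  # {'stream_channel_1'}
```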
Several domain ontologies were also developed for landforms. Landforms are fuzzy objects whose perception and classification mainly depend on the context and on human cognition. Hence, it has long been acknowledged that a universal ontology of landforms does not exist. Since landforms are mainly characterized by their shape, several ontologies for specific fields or languages were proposed. These ontologies structure landforms into general concepts that are further specialized with a more precise shape. Yan et al. [62] present an ontology of undersea features following the nomenclature from the International Hydrographic Organization. They first define broad categories such as depression and eminence that are then refined into features such as trench and reef based on shape descriptors such as "elongated" or "deep". The ontology is applied to feature classification from bathymetric data. Grenoble et al. [63] propose an ontology including both land and sea terms in the Kalaallisut language, categorizing features according to their shape and their function. More generally, Sinha et al. [64] present a broader landform reference ontology where landforms are also organized according to their shape and according to their dependence on the planetary body (Figure 3). The ontology is applied to mapping linguistic categories to landform categories.
One limitation of these approaches is that the shape of a landform is always defined within a context: terms such as "large" or "narrow" depend on this context, and the use of a generic landform ontology for classifying landforms from data is still a challenge. Hence, Guilbert et al. [65] propose to use salient elements of the terrain forming the backbone (or skeleton) of the landforms, assuming that these elements can be translated into geometrical shapes (point or line) that can be identified from a digital terrain model. The landform is then delineated by a region built around its skeleton. The model proposes to handle fuzziness by defining a core region surrounded by a larger region where its boundary is located (Figure 4). Guilbert and Moulin [66] describe a framework for applying this approach to landform description, illustrating it on specific landforms. However, the framework remains mainly a conceptual approach that needs to be implemented for generic purposes.
Domain ontologies have also attracted considerable interest in remote sensing and Geographic Object-Based Image Analysis (GEOBIA). Most processing in remote sensing requires the classification of data (usually raster images) into significant segments based on geometrical, topological, and semantic attributes. While most methods are data-driven, the abundance of data in different formats and the need for generic approaches require more knowledge-driven approaches that would allow for the integration of expert knowledge in the process. For that purpose, domain ontologies are relevant since ontologies can store knowledge and help share this knowledge [67]. Ontologies can include both human concepts, based on cognition, and their translation into numerical concepts required for processing. In this way, the same concept can be mapped to different numerical definitions associated with different types of data. As mentioned by Arvor et al. (2019) [67], the use of ontologies does not improve classification results but provides a more reliable representation of expert knowledge. A geographical object is defined by its characteristics in a domain ontology. Each characteristic is then related to some characteristic values measured on the image (Figure 5). In the context of foreshore identification, Argyridis and Argialas [15] designed an ontology to formalize the implicit spectral, geometric, and spatial relationships described in the interpretation criteria, and employ them during identification. Rajbhandari et al. [14] also emphasize that ontologies allow for a better modularization of the methods: Common knowledge can be reused together with features more specific to an application. They also facilitate the automation of the classification since knowledge is transferred, minimizing human intervention. They show in a case study that ontologies can be combined with machine learning approaches where the ontology stores generic knowledge and machine learning supplements the classification with specific rules. The application of some rules to the ontology often depends on the definition of some threshold values. These thresholds can be learned and applied for each specific case. The result of the classification can then be assessed by semantic similarity measures.
In recent years, there has been a research shift towards the development of task and application ontologies to define, share, and reuse more specific knowledge. These ontologies are especially relevant for defining complicated geospatial tasks and operations or simulating geospatial workflows necessary to solve complex geospatial problems. They usually require the combination of other ontologies. For example, Zhuang et al. [68] developed an ontology-based approach using various ontologies (task ontology, process ontology, GIS operation ontology, interface ontology, data type ontology, GIS data ontology, and GIService-type ontology) to support a task-oriented knowledge base for modeling meteorological early-warning (MEW) analysis (Figure 6). In the context of web-service approaches, Hofer et al. [69] developed a knowledge base to support spatial analysis workflow development. Yue et al. [23] present a linked data approach for discovering geospatial resources in the Web of Data to build geoprocessing workflows. The approach leverages existing ontologies and vocabularies to semantically describe different types of geospatial resources, such as sensors, observations, raster data, and geospatial services.
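To give a feel for how knowledge about operations and data types can support workflow construction, the following Python sketch chains operations by matching the output type of one operation to the input type of the next. The operation catalogue, the data type names, and the breadth-first search are assumptions made for illustration; they are not the knowledge bases or methods of [23,68,69].

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical operation catalogue: name -> (input data type, output data type).
OPERATIONS: Dict[str, Tuple[str, str]] = {
    "interpolate": ("point_observations", "raster"),
    "reclassify": ("raster", "classified_raster"),
    "vectorize": ("classified_raster", "polygons"),
    "buffer": ("polygons", "polygons"),
}


def plan_workflow(source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest operation chain from source to target type.

    Returns the list of operation names, an empty list if source equals
    target, or None if no chain exists in the catalogue.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        data_type, chain = queue.popleft()
        if data_type == target:
            return chain
        for name, (in_type, out_type) in OPERATIONS.items():
            if in_type == data_type and out_type not in seen:
                seen.add(out_type)
                queue.append((out_type, chain + [name]))
    return None


print(plan_workflow("point_observations", "polygons"))
# ['interpolate', 'reclassify', 'vectorize']
```

A real task-oriented knowledge base would carry far richer descriptions (interfaces, services, parameters); the sketch only shows the type-matching step.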
Rospocher [70] outlines the methodology for creating the Personalized Environmental Service Configuration and Delivery Orchestration (PESCaDO) Ontology for supporting personalized environmental decision support. The methodology involves requirements specification, reuse of existing models and ontologies, such as GeoSPARQL (http://www.ogc.org/standards/geosparql) and PROVenance Interchange Ontology (PROV-O) (https://www.w3.org/TR/prov-o/), terminology extraction, formalization, revision, and documentation. The upper level of the PESCaDO ontology includes three interrelated components: (a) The problem component describing the user request, activity, and profile; (b) the data component describing meteorological, pollen, and air quality data; and (c) the conclusion component for encoding warnings, recommendations, and suggestions provided by the system.
Application ontologies have also been developed for managing natural disaster information [71,72]. In this context, ontologies are designed not only as a knowledge base but rather as a tool for assisting decision making in emergency response. Zhong et al. [72] implemented an application ontology where further information is inferred from semantic and spatial data. Linyao et al. [71] propose a tool for retrieving and managing disaster-related data. The objective is to automatically link some sources by measuring spatial and temporal similarities. Both approaches provide a system where data are collected or disseminated through web services. Ontologies at different levels of generality are also built to formalize and support reasoning over cartographic knowledge. Earlier works mostly focused on the formalization of the cartographic generalization process [73] classifying cartographic features, spatial relations, and relational constraints to be observed by the generalization process. Gould and Mackaness [74] also include generalization algorithms in their ontology in order to have a knowledge base that could be used in an automated system. Yan et al. [75] propose an ontology for the generalization of isobaths on charts. In their ontology, they not only store cartographic elements such as soundings and isobaths, but also include features formed by groups of isobaths and soundings representing submarine features as perceived by the readers. They also integrate generalization constraints and operators that can apply to these larger features. The ontology was implemented in a triplestore and used to select and perform operations in a multi-agent system. However, these works were mostly considering the development of a knowledge base to support map generalization. 
More recently, Varanka and Usery [76] consider the map itself as a knowledge base and propose an ontology that includes not only data and design concepts but also the semantic and logical knowledge embedded in the map. They also present an architecture based on a triplestore where map elements can be retrieved with SPARQL and GeoSPARQL queries. Huang and Harrie [77] propose a broader model including visualization knowledge and new concepts describing scale and portrayal. They propose a system architecture with a web server retrieving geospatial and portrayal data from a knowledge base and producing a map with the required style for a client application. Hahmann and Usery [78] present a first-order logic formalization of contour semantics to support qualitative and quantitative reasoning about contours. The formalization comprises four fundamental concepts: Contour regions, contour lines, contour values, and contour sets, as well as their subclasses and relations between them. The ontology is developed in first-order logic to support the detailed ontological analysis of contour semantics in general but is also translated into an OWL ontology for storing and querying information of particular contour maps.
From this review, we can see that, while geospatial ontologies were first designed as a tool to formalize expert knowledge, the last few years saw a growing focus on the integration of domain and task ontologies as computational ontologies into larger systems for data retrieval and analysis. The latest works on geovisualization and disaster management make use of ontologies to develop interoperable systems and assist users with tailor-made solutions.

Ontology Design Patterns and Lightweight Ontologies
The ontologies presented above are referred to as heavyweight ontologies. They are fully developed ontologies describing a whole domain or a domain concept. Recent developments are also concerned with the design of more modular ontologies applicable to different problems. This issue is addressed by two kinds of ontologies: lightweight ontologies and ontology design patterns (ODP).
The distinction between heavyweight and lightweight ontologies is based on the expressivity provided by ontologies: lightweight ontologies have restricted expressivity as they provide the simplest formalization of the simplest model of a domain, adequate for the task at hand; they typically consist of a hierarchy of concepts and a set of relations holding between those concepts [79]. Lightweight ontologies are also proposed as an alternative to foundational and domain ontologies in order to capture the semantics of a domain. Although they lack the expressivity of heavyweight ontologies, they are especially useful for supporting connectivity and interoperability across communities and platforms in the context of the Semantic Web and linked data [80] and may be applied in key applications such as document classification, semantic search, and data integration [81].
A lightweight ontology can be used to extend or integrate other ontologies. Hasan et al. [82] defined a lightweight ontology for earthquake engineering that was integrated with WordNet. Hong and Kuo [18] propose the development of lightweight ontologies to semantically integrate concepts from two domains (topography and land use). Natural language definitions of concepts are transformed into structured representations, which are then compared and associated on the basis of the type of semantic relationship that holds between them (exact, subset, superset, overlap, or null). A bridge ontology is used to formally represent the semantic relationships between the concepts of the lightweight ontologies.
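The comparison step can be illustrated with a minimal sketch, assuming that each concept definition has already been reduced to a plain set of descriptive features. This is a strong simplification: the structured representations used in [18] are not reproduced here, and the function name `semantic_relationship` and the example features are hypothetical. The labels below describe the relationship between the two feature sets themselves.

```python
def semantic_relationship(features_a, features_b):
    """Compare two feature sets and name the relationship between them."""
    a, b = frozenset(features_a), frozenset(features_b)
    if a == b:
        return "exact"      # identical feature sets
    if a < b:
        return "subset"     # every feature of A also describes B
    if a > b:
        return "superset"   # every feature of B also describes A
    if a & b:
        return "overlap"    # some, but not all, features are shared
    return "null"           # no shared feature


# Hypothetical feature sets for one topographic and one land-use concept.
topographic_concept = {"water", "flowing", "natural"}
land_use_concept = {"water", "standing", "artificial"}
print(semantic_relationship(topographic_concept, land_use_concept))
```

In a bridge ontology, each returned label would correspond to one kind of link between a concept of the first lightweight ontology and a concept of the second.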
Kuai et al. [83] propose a lightweight ontology for mapping concepts from topographic maps in English and Chinese. They define a hierarchy of concepts from natural language definitions. Concepts from both languages are then related based on their similarity. Kordjamshidi and Moens [84] propose a lightweight spatial ontology for locating objects in space based on a spatial annotation scheme. The ontology consists of spatial concepts (trajector, landmark) and spatial relations (region, direction, and distance). Its aim is to define mappings between cognitive-linguistic spatial concepts in natural language and qualitative spatial representation models. A global supervised machine-learning model for ontology population is used to implement the mappings between natural language and the lightweight spatial ontology.
More recently, Couclelis [2] proposed a conceptual model for the development of micro-ontologies for specific user needs and purposes encompassing three interrelated views of information: (1) Measurements and formal operations on these, (2) semantics for defining the meaning of information, and (3) context for the interpretation and use of information. The model encompasses a process called "semantic contraction" which generates a sequence of representational layers varying in semantic richness to support the varying perspectives and interpretations of information and the resulting representational and informational requirements of different users.
The specification of ontology design patterns is pursued as an alternative to ontology design and more profound ontology integration approaches when semantic heterogeneities and conflicting requirements do not allow for the development of a common, integrated ontology. Ontology design patterns, or content ontology design patterns, are small, modular ontologies that may be used as building blocks in the ontology design process [85]. They facilitate ontology reuse, evaluation, alignment, and mapping across domains. For example, Calafiore et al. [86] defined two ODPs modeling the built environment and social behavior. The two ODPs are then integrated to mine social behavior patterns in urban areas from crowdsourced data.
The GeoLink project [19] has developed a framework for integrating seven data repositories mainly in the domain of ocean science using semantic technologies such as linked data and ontology design patterns (ODP). The GeoLink oceanography ontology [20] is a collection of ODPs used to model notions such as person, organization, dataset, cruise, feature, geofeature, place, etc. A pattern may also be aligned to another pattern or to an external ontology.
Because they are modular, ODPs have also been used in topography and landform description. Indeed, the definition of a landform varies with the application domain and the perception of the user or the expert. Hence, instead of building a complete landform ontology, several authors propose to apply ODPs for the description of the topography in different contexts. Sinha et al. [87] defined an ODP for surface water features that can be used to describe different water bodies with their shape and flow. Guilbert and Moulin [66] propose a landform ODP based on landform saliences together with a conceptual framework. They apply their approach to the modeling of landforms based on existing models. Such ODPs allow for more flexibility in the description of water features and landforms. However, their usability has not yet been demonstrated by designing and integrating new ontologies based on these ODPs.
Finally, Janowicz et al. [88] propose the Sensor, Observation, Sample, and Actuator (SOSA) lightweight ontology that provides specifications for modeling interactions with sensors and samplers. It is now a W3C recommendation and OGC standard. SOSA is based on an ODP that designs the core structure from which different perspectives are derived. The ODP provides a pattern for designing lightweight ontologies that can be integrated into larger ontologies.

Bottom-Up Ontological Approaches
Geographic ontologies are traditionally defined top-down by authoritative organizations or groups of experts using semantic modeling approaches. However, the amount of information available today has shifted the focus to bottom-up approaches for enriching existing geographic ontologies. Different approaches can be found including semantic information extraction and text mining techniques.
Bennett and Cialone [89] describe a methodology, called corpus-guided sense cluster analysis, for ontology development which combines two different modes of investigation: (a) Logic-based, formal semantic analysis of concepts and relations of expert knowledge; and (b) corpus-based statistical analysis of the actual use of terminology in natural language texts highlighting the range and frequency of senses associated with a lexical term. Although the methodology is general, it focuses on the construction of spatial ontologies: spatial entities like surfaces and cavities and spatial relations like "surrounding", "enclosing", and "containing" are used for the explication of the proposed approach. Instead of adhering to a single precise definition, the methodology is based on the notion of sense cluster where the referent of a conceptual term is modelled by a probability distribution over a set of precise definitions.
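To make the notion of a sense cluster more concrete, the following minimal sketch estimates a probability distribution over candidate definitions of a term from the relative frequency of sense labels assigned to its corpus occurrences. The relative-frequency estimate, the function name `sense_distribution`, and the sense labels are illustrative assumptions; they are not taken from [89].

```python
from collections import Counter


def sense_distribution(observed_senses):
    """Estimate P(definition | term) from sense labels observed in a corpus sample."""
    counts = Counter(observed_senses)
    total = sum(counts.values())
    if total == 0:
        return {}  # no observations, so no distribution can be estimated
    return {sense: n / total for sense, n in counts.items()}


# Hypothetical sense labels assigned to four corpus occurrences of "surrounding".
labels = ["fully_encloses", "fully_encloses", "partly_encloses", "fully_encloses"]
distribution = sense_distribution(labels)
print(distribution)
```

Under this reading, the term is not tied to a single precise definition; each candidate definition simply carries a weight reflecting how often it matches actual usage.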
Hu and Janowicz [21] proposed a workflow to mine bottom-up geographic knowledge from the Linked Open Data (LOD) (https://lod-cloud.net/) cloud in order to enrich existing geographic ontologies developed top-down by groups of experts with diverse perspectives provided by general users. Geographic knowledge is expressed through instances and property-value pairs attributed to relevant target categories.
Zhu et al. [90] propose a bottom-up, data-driven approach for identifying similarities and differences in the semantics of geographic feature types to supplement existing top-down alignment methods. Three major gazetteers are used to demonstrate the proposed approach: DBpedia places [91], GeoNames [92], and Getty Thesaurus of Geographic Names [93]. The semantics of gazetteers are examined not on the basis of feature type terms or definitions but on the basis of instances belonging to them. Three kinds of spatial statistical features are extracted from instances: spatial point patterns, spatial autocorrelations, and spatial interactions with other geographic features.
Kokla et al. [94] used a top-down and a bottom-up approach for enriching and populating a geospatial ontology in order to enable semantic information extraction. The top-down approach is applied in order to incorporate knowledge from existing ontologies. The bottom-up approach is applied in order to enrich and populate the geospatial ontology with semantic information (concepts, relations, and instances) extracted from domain-specific web content.

Modeling vs. Encoding
While ontologies are supposed to be designed for formalizing knowledge in view of using it in some implementation, Kuhn [95] pointed out, in the Semantic Web context, that ontology tasks should be separated into modeling and encoding. Through this review, one can see that ontologies were indeed developed with a focus on either of these tasks (Table 1). On the one hand, ontologies were built to formalize expert knowledge in a specific area to provide unambiguous concepts and rules. For example, landform ontologies such as [63,64] propose concepts to structure landforms in lattices while [78] provides a formalization of existing rules for contour maps in first-order logic. In such approaches, ontologies are used to constrain the interpretation of the different concepts.
On the other hand, domain ontologies were built to provide implementation standards [88] and facilitate data processing [67], or even, for bottom-up approaches, were directly built from data. They provide design patterns that allow for better transfer of knowledge from concepts to data-dependent models and for data operability with the use of standard patterns. Hence, one major novelty in these last few years was the application of ontologies in other areas such as remote sensing and image processing where, traditionally, semantics was given limited consideration. Ontologies are used to associate concepts with their data representation. For example, vegetation is associated with a vegetation index and a topographic eminence is associated with an elevation difference.
Within this five-year period, the first few years mainly saw works related to modeling without much concern for implementation, while the last two years saw an interest in providing workable solutions. Hence, a "qualitative-quantitative divide" [96] still exists but some works such as [71,77] show promising implementations. Recent research on GEOBIA shows that ontologies are no longer the monopoly of a specialized community but are also of interest for researchers looking for solutions to cope with the vagueness inherent in common problems such as vegetation mapping or landform classification. However, as mentioned in [56], vagueness can be of two kinds: conceptual vagueness and threshold (or sorites) vagueness. Ontologies provide a way to handle the first one while, in [14], the second is handled with machine learning approaches.
While machine learning can achieve good prediction in setting threshold values for different attributes, its computation is fully based on data. It thus requires training for each type of data (according to data resolution, source, etc.) and does not provide any elicitation of this vagueness at a conceptual level. Indeed, defining threshold values for distinguishing what should be big or small, for example, depends on the context in which the analysis is done. Domain ontologies on the modeling side can be designed to provide this contextual information. Hence, we believe that a better integration between "modeling" and "encoding" ontologies would help to define thresholds according to properties obtained from the concepts, leading to semantic thresholds rather than data-driven thresholds.

Semantic Information Elicitation
In recent years, research has focused on eliciting meaning from the considerable amount of semi-structured content, such as HTML pages and metadata records, and unstructured content, such as scientific reports, news articles, travel blogs, and historical archives. These sources provide a wealth of information on geospatial concepts, places, events, activities, etc. The term elicitation in the context of this paper is used to encompass a set of related processes that focus on making this knowledge explicit and discoverable: semantic information extraction, enrichment, and search.
Information extraction supports the automatic processing of unstructured or semi-structured resources and the identification of relevant entities, concepts, and relations. In addition to the extraction of these semantic elements, topic modeling techniques unveil latent abstract topics and semantic associations from large text collections.
Ontology-based information extraction (OBIE) [97,98] is an emerging subfield of information extraction, in which the information extraction process is guided by an ontology as a means to formally describe domain knowledge and assist the extraction of pre-defined concepts, properties, and instances. OBIE includes: (a) Approaches that use an ontology as a guide to acquire knowledge from unstructured and semi-structured text; and (b) approaches that seek to build the ontology by processing texts, also known as ontology learning and population approaches [98].
Eliciting places, events, concepts, relations, topics, etc. from texts in natural language and connecting these to their meaning through ontologies and knowledge bases supports semantic enrichment (or annotation or tagging), i.e., linking the content to other relevant resources based on an understanding of what each term is about. It is also significant for semantic search based not only on keywords used to formulate a query, but also on the meaning of terms behind the query.
Furthermore, it is used for computing semantic similarity and applying semantic analysis, classification, and spatialization techniques to explore numerous cognitive and semantic aspects, such as the relations among places, the historical evolution of cities, the progression of physical phenomena and social events, and people's perception of landscapes and regions.
Data-driven geospatial semantics refers to a bottom-up approach for eliciting geospatial knowledge from natural language texts that contain either explicit or implicit locations of places [99]. It is distinct from top-down or expert-driven approaches that extract geospatial knowledge from experts in a specific field. In terms of methodology and scientific tools used, data-driven approaches are also contrasted with conventional human-participants studies that elicit cognitive aspects, such as the meaning of geographic categories [100–102] or the representation of cognitive regions [103]. However, the increased availability of data and their analysis using statistical and data mining techniques does not necessarily ensure meaningful knowledge discovery, a task that presents many challenges and requires the support of synthetic and semantic tools [104].

Geospatial Semantic Information Extraction and Enrichment
Semantic information extraction aims at eliciting salient, specific types of information from unstructured or semi-structured data sources [105]. Entities, concepts, and/or semantic relations that are implicit in a given source are made explicit to support semantic annotation, content-based exploration, semantic search, and data-driven geographic analysis. Figure 7 shows spatial entities (in blue), spatial concepts (in green), and relations (underlined) that may be extracted from a small passage.
Named entities in this context refer to specific real-world objects or instances, such as persons, locations, and organizations, denoted by proper nouns. For example, in Figure 7, Maldives and Tuvalu are named spatial entities. Named entity recognition (NER) [105] is used to identify named entity mentions in text and classify them in pre-defined categories. Gazetteer lists that provide words or phrases representing individual instances of a specific category (e.g., location, time, and organization) are widely used in NER tasks. Named entity disambiguation (NED) [105] is used to determine the identity of a named entity mention and link it to a unique entity in a knowledge base. NED is crucial when the same entity mention may refer to different instances, e.g., the entity mention "Washington" may refer to different real-world entities such as George Washington, the State of Washington, or Washington D.C.
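A minimal sketch of gazetteer-based NER followed by a simple NED step is given below. The toy gazetteer, its cue words, and the function names `recognize` and `disambiguate` are hypothetical, and choosing the candidate with the largest cue-word overlap is an assumed heuristic used only for illustration, not a method taken from [105].

```python
import re

# Hypothetical toy gazetteer: surface form -> candidate entities with cue words.
GAZETTEER = {
    "Washington": [
        {"id": "George_Washington", "category": "person",
         "cues": {"president", "general"}},
        {"id": "Washington_(state)", "category": "location",
         "cues": {"state", "seattle", "pacific"}},
        {"id": "Washington,_D.C.", "category": "location",
         "cues": {"capital", "congress", "district"}},
    ],
    "Maldives": [{"id": "Maldives", "category": "location", "cues": {"atoll"}}],
    "Tuvalu": [{"id": "Tuvalu", "category": "location", "cues": {"atoll"}}],
}


def recognize(text):
    """NER step: return (mention, start offset) for each gazetteer name in the text."""
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return [(m.group(1), m.start()) for m in pattern.finditer(text)]


def disambiguate(mention, text):
    """NED step: pick the candidate whose cue words overlap most with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return max(GAZETTEER[mention], key=lambda c: len(c["cues"] & words))


sentence = "Congress met in Washington, the capital."
for mention, _ in recognize(sentence):
    entity = disambiguate(mention, sentence)
    print(mention, "->", entity["id"], "(" + entity["category"] + ")")
```

When no cue word occurs in the text, `max` simply returns the first candidate, which shows why real NED systems rely on much richer context than this sketch does.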
Concept extraction techniques are used to identify words or phrases that denote semantic classes, such as the concepts "atoll", "coastal area", and "island" in Figure 7. Concept extraction involves related processes such as term extraction, keyword/key phrase extraction, and topic modeling [105]. These processes are applied on a text collection, which usually describes one or several domains. Term extraction provides insight into the core concepts of the domain(s) that the text collection is about. Keyword/key phrase extraction focuses on extracting domain concepts that describe a given text. Topic modeling aims at analyzing co-occurrences of related keywords that describe higher-level semantic topics of the text collection.
Concept extraction usually relies on pre-processing steps such as tokenization to split text into words, phrases, symbols, or other meaningful elements called tokens; sentence splitting to divide the texts into sentences; part-of-speech (POS) tagging [106] to mark up each word as corresponding to a particular part of speech (noun, verb, adjective, adverb, etc.); and lemmatization to identify the base or dictionary form of a word (lemma). Some approaches, called window-based, then consider a window around terms and examine contextual information within this window to judge whether the term may indeed represent a specific concept. Other approaches define a set of rules in the form of regular expressions and lexico-syntactic patterns for the identification of predefined domain concepts.
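Some of these pre-processing steps and the window-based idea can be sketched with the standard library alone. The sketch below is a deliberately naive illustration: the regular expressions and the function names `split_sentences`, `tokenize`, and `context_window` are assumptions, and POS tagging and lemmatization are omitted because they require trained models or lexical resources.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Split a sentence into lowercase word tokens; hyphenated words stay whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())


def context_window(tokens, term, size=2):
    """Return, for each occurrence of `term`, the tokens within `size` positions of it."""
    windows = []
    for i, token in enumerate(tokens):
        if token == term:
            windows.append(tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size])
    return windows


text = "The coral atoll lies offshore. Low-lying islands flood easily!"
for sentence in split_sentences(text):
    tokens = tokenize(sentence)
    print(tokens, context_window(tokens, "atoll", size=1))
```

A window-based approach would then inspect the returned context tokens, for instance by checking them against a list of domain terms, to decide whether the candidate term denotes a concept of interest.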
Topic modeling techniques [107] such as latent Dirichlet allocation and latent semantic analysis (also known as latent semantic indexing) are text mining [108] techniques employed to identify terms that frequently co-occur in given collections of documents. These term co-occurrences form clusters which represent latent abstract topics, e.g., the words cyclone, hurricane, typhoon, blizzard, and thunderstorm may co-occur under a "meteorological disasters" topic. These techniques are used to identify the relevant context of a given corpus and aid the classification, search, and retrieval processes.
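The notion of term co-occurrence mentioned above can be illustrated with a much simpler stand-in: counting, for each pair of terms, the number of documents in which both appear. This is neither latent Dirichlet allocation nor latent semantic analysis, which require dedicated libraries; the function name `cooccurrence_counts` and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count in how many documents each unordered pair of terms appears together."""
    counts = Counter()
    for doc in documents:
        terms = sorted(set(doc))  # sort so each pair has one canonical order
        counts.update(combinations(terms, 2))
    return counts


# Hypothetical pre-tokenized documents.
docs = [
    ["cyclone", "hurricane", "typhoon"],
    ["hurricane", "typhoon", "flood"],
    ["atoll", "island"],
]
counts = cooccurrence_counts(docs)
print(counts.most_common(3))
```

Pairs with high counts, such as hurricane and typhoon here, are the kind of evidence that a topic model generalizes into clusters of terms representing latent topics.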
Relation extraction techniques are used to identify semantic relations between concepts and entities (e.g., hypernymy, synonymy, and relatedness). For example, in Figure 7, Maldives and Tuvalu may be defined as instances of the concept "low-lying countries". Various techniques are used for the extraction of predefined relations based on text processing, on mapping to external resources such as ontologies, thesauri, and computational lexicons, or on a combination of both. In the first case, key methods involve: (a) the identification of the head/modifier terms of a phrase (e.g., the head of the phrase "low-lying countries" is "countries", while "low-lying" is a modifier that specializes the head and creates a hyponym, i.e., a subclass); and (b) the formulation of lexico-syntactic patterns. For example, the Hearst pattern [109] "Low-lying countries such as Maldives and Tuvalu..." signifies that Maldives and Tuvalu are instances of "low-lying countries". In the second case, ontologies or other external sources such as WordNet and DBpedia provide predefined relations between concepts (e.g., hypernyms/hyponyms, synonyms, and related concepts). However, identifying the correct sense of a term, called word sense disambiguation [110], is a very challenging issue that greatly influences the outcome of relation extraction. The extraction of a broader range of relations between entities is even more demanding, requiring the combination of the previously mentioned techniques with techniques such as distant supervision, extraction of syntactic dependencies between entities, and calculation of the frequencies of n-grams occurring in the text window surrounding both entities [105]. For an extensive survey of information extraction techniques in a Semantic Web context refer to [105]. For comprehensive reviews of OBIE systems and the contribution of ontologies in information extraction refer to [97,98]. Figure 8 shows an overview of semantic information processes and extracted elements.
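The "such as" pattern and the head/modifier analysis discussed above can be sketched with a regular expression. The sketch assumes that the class phrase consists of at most two words and that instances are single capitalized words, which is far more restrictive than real lexico-syntactic pattern matching; the names `HEARST_SUCH_AS`, `extract_instances`, and `head_of` are hypothetical.

```python
import re

# "<class phrase> such as <Name>, <Name> and <Name>"
# Assumption: class phrase = optional modifier + head; names = one capitalized word.
HEARST_SUCH_AS = re.compile(
    r"\b(?P<cls>(?:[A-Za-z-]+ )?[A-Za-z-]+) such as "
    r"(?P<items>[A-Z][a-z]+(?:(?:, and |, | and )[A-Z][a-z]+)*)"
)


def extract_instances(sentence):
    """Return (instance, class) pairs found by the 'such as' pattern."""
    match = HEARST_SUCH_AS.search(sentence)
    if match is None:
        return []
    cls = match.group("cls").lower()
    items = re.split(r", and |, | and ", match.group("items"))
    return [(item, cls) for item in items]


def head_of(phrase):
    """Return the head of a phrase, assumed here to be its last word."""
    return phrase.split()[-1]


pairs = extract_instances("Low-lying countries such as Maldives and Tuvalu are at risk.")
print(pairs)
print(head_of("low-lying countries"))
```

Each extracted pair can be read as an instance-of relation, while the head of the class phrase suggests the broader concept ("countries") of which the class phrase is a subclass.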
In the geospatial domain, place names (or toponyms) are the principal type of knowledge extracted from natural language texts using a process called geoparsing [111]. Geoparsing involves both the identification of place names in text and their disambiguation in order to be linked to unambiguous spatial references using geographic gazetteers. For an extensive survey of geoparsing challenges and techniques refer to [112,113]. Besides place names, some applications, such as the analysis of historical archives, also require the extraction of temporal expressions and person names that refer to historical personalities. Regarding relations, emphasis has been placed on identifying spatial relations between objects for interpreting indirect geographic references and for associating an entity or event to a reference location. In natural language texts, spatial relations are usually specified qualitatively using spatial expressions such as at, close to, to the left of, north of, etc. These expressions convey various spatial relations, whose meaning is often vague and context dependent, hindering their automatic extraction and disambiguation [114,115].
Wallgrün et al. [116] built a corpus of natural language expressions extracted from web documents for analyzing and modeling Spatial Relational Expressions (SREs). The corpus consists of geo-referenced triplets which contain a located object, a spatial relation and a reference object. Mirrezaei et al. [117] developed TRIPLEX-ST, a system for the extraction of spatio-temporal information about entities and their properties from textual resources. TRIPLEX-ST is based on a distantly supervised approach, which uses linguistic annotations (e.g., dependency relations, named entities, and lexical constraints) together with information available in existing KBs (e.g., DBpedia and YAGO knowledge bases), to extract triples associated with temporal and spatial contexts and infer templates that capture spatiotemporal information from unseen sentences. Each template includes dependency relations, POS tags, named entities, WordNet synsets, subject and object types, and syntactic/semantic restrictions of verbs in dependency paths.
Dittrich et al. [118] presented a classification schema for disambiguating spatial from non-spatial uses of an extensive list of English prepositions such as above, below, next to, in (the) front of, north of, etc. The schema involves two sets of rules. The first identifies and excludes abstract entities (an abstract locatum or an abstract relatum) involved in locative expressions, based on knowledge sources such as WordNet and DBpedia. The second identifies and excludes non-spatial uses of prepositions, such as a temporal relation, the material of an object, the agent of an action, or the topic of communication, based on dependency trees provided by a natural language processing (NLP) parser. Radke et al. [119] modified a machine-learning approach for generic spatial role labelling [120] to automatically extract prepositions that are used specifically in a geospatial sense and to distinguish geospatial uses of prepositions from other spatial and non-spatial uses. A geospatial relation is defined as one in which the preposition has a spatial sense and the reference object to which the preposition applies is a named place or a geographic feature type. Place names are identified using the GeoNames gazetteer, whereas geographic feature types are identified using a dictionary of geographic feature types. The approach is evaluated on the basis of a corpus of 1876 instances of preposition usage manually annotated as geospatial, spatial (but not geospatial), and non-spatial.
Derungs and Purves [115] extract spatial expressions of the nearness relation using an analysis of n-grams (contiguous sequences of tokens) retrieved from a large corpus. The n-grams have the form "A near B", where A and B are both place names, referring to different spatial granularities. The analysis of n-grams allows estimations of the probabilities of phrases expressing the nearness relation and the exploration of what is considered to be near at different scales.
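As a rough illustration of how n-gram counts can yield such probability estimates, the sketch below counts trigrams of the form "A near B" and converts them into relative frequencies. The function name near_probabilities, the tokenization, and the restriction to single-token place names are assumptions made here; this is not the procedure of [115].

```python
from collections import Counter

def near_probabilities(sentences, place_names):
    """Estimate P(B | 'A near') from trigram counts of the form 'A near B'."""
    pair_counts = Counter()
    for sentence in sentences:
        tokens = sentence.replace(",", " ").replace(".", " ").split()
        # Slide a window of three tokens (a trigram) over the sentence.
        for a, rel, b in zip(tokens, tokens[1:], tokens[2:]):
            if rel == "near" and a in place_names and b in place_names:
                pair_counts[(a, b)] += 1
    # Total number of 'A near ...' trigrams observed for each located place A.
    totals = Counter()
    for (a, _), n in pair_counts.items():
        totals[a] += n
    return {(a, b): n / totals[a] for (a, b), n in pair_counts.items()}

corpus = [
    "Winterthur near Zurich is growing.",
    "We stayed in Winterthur near Zurich.",
    "Winterthur near Frauenfeld hosts a fair.",
]
print(near_probabilities(corpus, {"Winterthur", "Zurich", "Frauenfeld"}))
```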
The combined extraction of spatial and temporal information from documents supports the discovery of complex information such as events or trajectories from historical documents or travel literature. Abraham et al. [121] developed an approach for extracting spatial, temporal, and attributive information from historical documents in order to analyze the movement of rival troops during the anti-colonial resistance war of 1904 in Namibia. Named entity recognition is used for the extraction of place names, date expressions, and person names based on self-tailored gazetteers for tagging historical place names and temporal expressions that are not included in traditional gazetteers. Pattern-based rules are formulated to extract spatio-temporal relationships.
Wang and Stewart [16] extracted spatiotemporal and semantic information on natural hazards from web news reports. The semantic information extraction process was based on a hazard ontology manually constructed from authoritative sources on hazards, which was integrated with spatial, temporal, and semantic gazetteers that capture these three aspects of hazards. The extraction process was implemented with GATE 8.0 (General Architecture for Text Engineering) [122] and involved several steps: linguistic processing, named entity recognition to identify spatial and temporal information, manual annotation to build the semantic gazetteer, mapping the terms from the semantic gazetteer to the corresponding ontology classes, rule-based association of events to spatial and temporal information, and geocoding the results for their subsequent visualization.
In order to explore the emotional structure of place classes such as city, forest, and road, Ballatore and Adams [123] developed a vocabulary of place nouns of natural and built places based on DBpedia [91], GeoNames [92], and WordNet [39]. A natural language processing (NLP) technique was implemented to extract place emotions from a corpus of travel blog posts based on the emotion vocabulary WordNet-Affect [124]. These are used for constructing emotion vectors based on term co-occurrences within a context window around place terms.
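The idea of building emotion vectors from co-occurrences inside a context window can be sketched as follows. The miniature lexicon is a hypothetical stand-in for a resource such as WordNet-Affect, and the function name, window size, and raw-count weighting are assumptions made here rather than details of [123].

```python
from collections import Counter

# Hypothetical mini emotion lexicon mapping emotion words to emotion categories.
EMOTION_LEXICON = {"peaceful": "joy", "happy": "joy", "scary": "fear", "afraid": "fear"}

def emotion_vectors(tokens, place_terms, window=3):
    """Count emotion categories occurring within +/- `window` tokens of each place term."""
    vectors = {}
    for i, token in enumerate(tokens):
        if token not in place_terms:
            continue
        # Context window: up to `window` tokens on each side of the place term.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        counts = vectors.setdefault(token, Counter())
        for word in context:
            if word in EMOTION_LEXICON:
                counts[EMOTION_LEXICON[word]] += 1
    return vectors

text = "the forest was peaceful but the road felt scary and the forest made us happy"
print(emotion_vectors(text.split(), {"forest", "road"}))
```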
Egorova et al. [125] developed linguistic rules based on morpho-syntactic patterns for the automated extraction and classification of three types of fictive motion (actual motion of the observer, description of a vista somewhere along the way or encyclopedic knowledge) based on an annotated corpus of alpine narratives.
Deriving spatial concepts and entities from texts also relates to the research on geospatial semantic analysis and spatialization of text corpora. Derungs and Purves [126] analyzed a large corpus describing Swiss alpine landscapes in order to explore the description of landscapes. The approach extracts toponyms using the Swissnames (https://shop.swisstopo.admin.ch/en/products/landscape/names3D) gazetteer; these toponyms are then linked to nouns referring to natural features using linguistic processing and manual annotation based on predefined rules. In another study of people's perceptions of landscapes and their prominent characteristics, Wartmann et al. [127] collected landscape descriptions of five landscape types across three different sources (participant free lists obtained through interviews, hiking blogs, and Flickr tags). The sources were manually annotated based on a coding scheme including toponyms, biophysical, cultural, and perceptual landscape elements, activities, sense of place, and people. The comparison of sources based on coded terms using cosine similarity showed that descriptions from the same source were significantly more similar, irrespective of the landscape type.
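For readers unfamiliar with the measure, cosine similarity between two sets of coded terms can be computed as in the sketch below. The bag-of-terms representation with raw frequencies and the example terms are assumptions made for illustration; the weighting used in [127] may differ.

```python
import math
from collections import Counter

def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-frequency vectors."""
    a, b = Counter(terms_a), Counter(terms_b)
    # Only terms present in both descriptions contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two hypothetical coded descriptions of the same landscape type.
blog_terms = ["lake", "mountain", "hiking"]
tag_terms = ["lake", "mountain", "view"]
print(cosine_similarity(blog_terms, tag_terms))
```

A value of 1 indicates identical term distributions, while 0 indicates that the two descriptions share no coded terms.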
Cooper et al. [128] proposed Geographical Text Analysis (GTA) for the spatialization and analysis of digital texts. The information extraction process identifies place names, as well as thematic tags representing topics (e.g., education, warfare, and farming) obtained using the UCREL Semantic Analysis System (USAS). Bruggmann and Fabrikant [129,130] extracted place names (toponyms) and temporal information from an online dictionary about Swiss history in order to visualize temporal relations between Swiss toponyms using spatialization techniques. Place names are extracted using the Swissnames gazetteer and temporal information (such as dates and periods of time) is extracted using the HeidelTime temporal tagger. Salvini and Fabrikant [131] used semantic information analysis and spatialization techniques of Wikipedia content for the construction and the empirical investigation of a multi-relational world city network. User-generated tags, which categorize the content of the articles and their organization using hypernymic and hyponymic relationships, are used to compute similarity relationships between shared articles and group them into thematically relevant clusters using the topic modeling technique.
Semantic enrichment (also known as semantic tagging or semantic annotation) is a related process aiming at adding semantic metadata to help machines make sense of the content and reveal latent relations and other semantic information. It is used for information organization and retrieval, semantic search and knowledge discovery, and ontology development and population. Semantic enrichment has been used to add semantic metadata to different types of content, including: unstructured documents [132], maps [133], images [134,135], metadata [136], and videos [137]. Semantic metadata describe the meaning of content in terms of abstract concepts and entities, such as people, things, and places.
Lemmens et al. [22] discuss ways to semantically enrich unstructured VGI content with concepts from both informal structures (folksonomies) and formal structures (ontologies) in order to support information retrieval. Tardy et al. [135] present a method for the semantic enrichment of photo tags with place characteristics such as feature type, function, use, shape, material, appearance, etc., with special emphasis on places in urban areas with a small number of photos. The method uses a combination of geometric and linguistic techniques to classify the tags and then select those that would most likely semantically enrich feature descriptions. In order to improve the classification of events in tweets, Romero and Becker [138] use semantic enrichment to identify entities and relevant vocabulary from tweets and related web pages and associate these features with concepts extracted from the LOD cloud. A pruning technique is then applied in order to discard semantic features that are too generic or too specific and select those with the most discriminative power for event classification.
Besides unstructured and semi-structured content such as texts, tweets, and photo tags, other types of spatiotemporal data, such as trajectories and movement data can also be semantically enriched. In this context, trajectories and movement data are not considered as mere sequences of geographic coordinates but as patterns with meaning that needs to be made explicit in order to support a more profound interpretation of the movement performed [139]. The Baquara2 framework is developed to semantically enrich and analyze movement data based on a multilevel hierarchical ontological model [140]. The semantic enrichment process annotates movement segments with references to concepts and instances from ontologies and linked open data collections (e.g., DBpedia, LinkedGeoData (http://linkedgeodata.org)), which describe the place, event, goal, or environmental conditions of the movement segment.

Semantic Search and Knowledge Discovery
Semantic-based search is a prevalent research topic in the context of the Semantic Web aiming to overcome the limitations of keyword-based search. In traditional keyword-based approaches, resources that are semantically related to a user's query but described differently from the query keyword are considered irrelevant and are excluded from the search results. Semantic-based search seeks to identify relevant sources by not only matching the keywords used but also based on the meaning behind a user's query. This could potentially result in identifying relevant sources that do not necessarily contain the keyword used in the query. For example, a semantic search for resources related to "geological phenomena in the Mediterranean" would ideally return resources including "earthquakes in Ionian Islands", "volcanic eruption of Mount Vesuvius" and "soil erosion in Spain" although these resources may not explicitly mention the terms "geological phenomena" or "Mediterranean".
The semantic description of both source data and users' queries is critical for achieving this and associating a search request with the most relevant source. Semantic information extraction is used to educe this semantic description from unstructured or semi-structured sources, whereas semantic annotation is used for query expansion, retrieval, and ranking. Query expansion consists of selecting and adding relevant terms to the user's query for minimizing query-document mismatch and improving retrieval performance [141]. An ontology's conceptual hierarchy enables the generalization/specialization of semantic search based on concepts and entities and the relations between them, especially hypernymy/hyponymy and synonymy relations.
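Ontology-based query expansion can be illustrated with a small sketch built around the "geological phenomena" example given earlier. The ontology fragment, its synonym entries, and the function name expand_query are hypothetical and serve only to show how hyponymy and synonymy relations add terms to a query.

```python
# Hypothetical toy ontology fragment: each concept lists its synonyms and its
# narrower concepts (hyponyms). A real system would query an actual ontology.
ONTOLOGY = {
    "geological phenomenon": {
        "synonyms": [],
        "narrower": ["earthquake", "volcanic eruption", "soil erosion"],
    },
    "earthquake": {"synonyms": ["quake"], "narrower": []},
    "volcanic eruption": {"synonyms": [], "narrower": []},
    "soil erosion": {"synonyms": [], "narrower": []},
}

def expand_query(term, depth=1):
    """Return the term plus its synonyms and narrower concepts down to `depth` levels."""
    entry = ONTOLOGY.get(term, {"synonyms": [], "narrower": []})
    expanded = [term] + entry["synonyms"]
    if depth > 0:
        for child in entry["narrower"]:
            # Recurse into each hyponym, skipping terms that were already added.
            for candidate in expand_query(child, depth - 1):
                if candidate not in expanded:
                    expanded.append(candidate)
    return expanded

print(expand_query("geological phenomenon"))
```

With the expanded term list, a resource describing "earthquakes in Ionian Islands" can match the query even though it never mentions "geological phenomena".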
Geographic information retrieval (GIR) [142,143] is another relevant field which deals with the automated interpretation of place names and spatial relationships in queries and in documents and with indexing, relevance ranking, and retrieval of the relevant content. GIR faces significant challenges such as detecting geographical references and associated spatial natural language qualifiers, disambiguating place names or other geographic information, and ranking resources with respect to spatial, temporal, and thematic relevance. Although GIR is a distinct research field beyond the scope of this paper, some aspects of semantic search are also relevant to GIR, such as the semantic description of source data and users' queries, query expansion, and semantic relatedness.
Semantic search in the geospatial domain is based on different approaches: top-down ontological approaches, bottom-up data mining approaches, or a combination of both. In the first case, ontologies are used to interpret users' queries and enrich them with other meaningful terms using ontology concepts and the relations between them [144,145]. Concept-based information retrieval represents both documents and queries using semantic concepts instead of keywords and performs retrieval in concept space [146]. Synonymous terms, as well as more general, more specific, or semantically related concepts are used for query expansion. In the second case, topic modeling techniques such as latent Dirichlet allocation and latent semantic analysis are used to analyze large corpora and discover abstract topics that co-occur in these collections of documents.
De Andrade et al. [147] propose a framework for improving geographic information retrieval in spatial data infrastructures. The framework includes a tagging process for enriching metadata with information about the space, theme, and time related to each feature type offered by a service. Thematic tagging relates the themes of feature types to ontology concepts. To facilitate semantic retrieval of remote sensing services, Nys et al. [148] created a self-learning knowledge graph that structures the concepts used in related queries. Natural language processing techniques (part-of-speech tagging and lemmatization) are used to reduce the complexity of both queries and service descriptions. Queries are then expanded based on the UNESCO thesaurus and a subgraph of the nearest neighbors of selected concepts is extracted including broader, narrower, and related concepts. The spatial aspect of queries is also used for expansion using the location on a map and administrative subdivisions provided by GeoNames.
Zaila and Montesi [149] developed GeoNW, a geographic ontology combining GeoNames, WordNet, and Wikipedia to support toponym extraction and disambiguation. The ontology includes physical or administrative places associated with synonyms derived from GeoNames and WordNet, and nationality adjectives from Wikipedia. This information is subsequently used to identify related terms of a geographic entity in a document. This approach is based on the assumption that the more related a geographic entity is to other geographic terms in the document, the more likely it is that the geographic term is associated with the document.
In order to enable semantic search and knowledge discovery for ArcGIS Online, Hu et al. [133] designed a specific ontology based on the ArcGIS Online schema and used two semantic annotation systems, DBpedia Spotlight [150] and OpenCalais [151], to extract entities and classes from map titles and descriptions. Semantically similar terms were also taken into account for query expansion based on the UMBC Semantic Similarity Service [152]. Jiang et al. [153] proposed an approach to discover semantic relationships between domain-specific vocabularies in order to obtain the semantic context of a user's query. The approach integrates the results from four different methods: user search history analysis, clickstream analysis, metadata analysis, and ontology concept similarity and is implemented in oceanographic data discovery.
Li et al. [154] developed a semantic search tool integrated in GeoNetwork (https://geonetworkopensource.org/) to support search and retrieval of polar datasets. The semantic search tool is based on latent semantic indexing (LSI) for full-text indexing, searching, and ranking relevant metadata records. Semantic associations between dataset metadata terminologies are identified and stored in a semantic matrix maintained within the catalogue. The tool also adopts a revised cosine similarity measure for ranking relevant results.
Frankenplace [155] is an interactive thematic map search engine that supports exploratory search tasks. An indexing approach using a discrete global grid and term boosting as well as topic modeling are used to support information exploration at different scales (zooming in and out) both geographically and thematically.
Huang [17] combines ontologies with latent semantic analysis (LSA) in order to categorize geographic features from text documents. LSA is used to capture the semantic context of terms, while ontologies are used to represent domain knowledge and support the extraction of terms from text, as well as the identification of query terms that represent the predefined categories.
Cross-linguistic information discovery is also an issue attracting attention especially in the context of geoportals. The use of different languages both for the documentation of data sources and the formulation of queries by different users further complicates the process of semantic search and knowledge discovery. Multilingual thesauri and ontologies are used to enrich metadata with synonyms and term translations, as well as query terms for improving metadata discovery [156]. Laurini [157] uses geometric characteristics and toponyms derived from gazetteers to formulate inference rules for matching multilingual ontological concepts. Since toponyms and concept types across different languages are not always strictly equivalent, the matching is based on homology relations which are reflexive and symmetric, instead of equivalence relations which are reflexive, symmetric, and transitive.
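The difference between a homology relation and an equivalence relation can be checked mechanically, as in the sketch below. The English/French river example and the helper function names are assumptions chosen for illustration; they are not taken from [157].

```python
def is_reflexive(elements, relation):
    return all((x, x) in relation for x in elements)

def is_symmetric(relation):
    return all((y, x) in relation for (x, y) in relation)

def is_transitive(relation):
    # For every chain x ~ y and y ~ z, the pair x ~ z must also be present.
    return all((x, z) in relation
               for (x, y) in relation
               for (w, z) in relation if y == w)

# Assumed example: English "river" corresponds to both French "fleuve" and
# "rivière", but the two French concepts are not matched to each other.
concepts = {"river", "fleuve", "rivière"}
homology = {(c, c) for c in concepts} | {
    ("river", "fleuve"), ("fleuve", "river"),
    ("river", "rivière"), ("rivière", "river"),
}

print(is_reflexive(concepts, homology), is_symmetric(homology), is_transitive(homology))
```

The relation passes the reflexivity and symmetry checks but fails transitivity, which is why it qualifies as a homology relation and not as an equivalence relation.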
The notion of semantic similarity is central to information search and knowledge discovery. Similar entities and concepts are used for linking data among different sources, answering user queries, and disambiguating word senses and place names. Adams [158] proposed an observation-to-generalization model that distinguishes between observed attributes of the environment at a specific location and generalized attributes about places that are inferred from these observations. Within this model, a place is defined by a six-tuple: a set of toponyms, a type, a set of spatial footprints, a set of associated observations, a set of generalizations, and a set of relations to other places. The model and a suite of operations based on the invariance of generalized place attributes are used to address the problem of similar-place search.
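The six-tuple definition of a place maps naturally onto a record type. The sketch below is one possible encoding; the class name, field names, and field types are assumptions made here, since the source lists only the six components.

```python
from dataclasses import dataclass, field

@dataclass
class Place:
    """One possible encoding of the six-tuple place definition (types are assumed)."""
    toponyms: set            # names by which the place is known
    place_type: str          # feature type, e.g., "island"
    footprints: list         # spatial footprints, e.g., (lat, lon) points or polygons
    observations: list = field(default_factory=list)     # observed attributes at locations
    generalizations: dict = field(default_factory=dict)  # attributes inferred from observations
    relations: list = field(default_factory=list)        # (relation, other place) pairs
```

Similar-place search would then compare the generalizations of two Place records, while observations remain tied to specific locations.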
Kim et al. [159] proposed an approach for extracting and disambiguating place names from unstructured place descriptions. The approach uses natural language parsing to extract place names and spatial relations between them from triplets consisting of an object to be located, a reference object and their spatial relation [160,161]. Graph matching based on string, linguistic, and spatial similarities between places is further used to find corresponding spatial objects.
Besides semantic similarity, the notion of semantic relatedness has also begun to attract the attention of GIScience. Semantic relatedness deals with concepts that are not necessarily similar, but are somehow related through various types of relations, such as the part-of relation between forests and trees and the caused-by relation between tsunami and earthquake.
Semantic similarity/relatedness have been studied in the context of spatial data infrastructures to identify similar geographic data sources on the basis of their metadata records [162]. The similarity/relatedness measures applied are usually based on external sources like thesauri and computational lexicons such as WordNet and also take into account the semi-structured descriptive information of metadata records such as title, keywords, spatial and temporal coverage.
Unstructured data may also be used to infer knowledge about places and other spatial entities and semantically enrich more structured representations. Hu et al. [163] developed a computational framework to detect semantic relatedness between cities mentioned in the same news articles. The framework is based on the labeled latent Dirichlet allocation (LLDA) model to extract likelihood scores for different topics (e.g., politics, culture, business, and sports) from news articles. The semantic relatedness between two cities under a topic is quantified as the number of news articles containing the co-occurrences of the two cities and also discussing the topic. The extracted semantic relatedness is then aggregated according to the topic (e.g., culture) or to the time period using the number of news articles published in that time period.
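The counting step described above can be sketched as follows, assuming that per-article topic scores are already available (for example, from a labeled LDA model). The article structure, the function name city_relatedness, and the score threshold used to decide whether an article "discusses" a topic are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def city_relatedness(articles, topic, threshold=0.5):
    """Count, per city pair, the articles that mention both cities and discuss `topic`."""
    counts = Counter()
    for article in articles:
        # Assumed rule: an article discusses a topic if its score reaches the threshold.
        if article["topics"].get(topic, 0.0) < threshold:
            continue
        # Every unordered pair of co-mentioned cities gets one count for this article.
        for pair in combinations(sorted(article["cities"]), 2):
            counts[pair] += 1
    return counts

articles = [
    {"cities": {"Athens", "Rome"}, "topics": {"culture": 0.8, "sports": 0.1}},
    {"cities": {"Athens", "Rome", "Paris"}, "topics": {"culture": 0.6}},
    {"cities": {"Athens", "Rome"}, "topics": {"sports": 0.9}},
]
print(city_relatedness(articles, "culture"))
```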
The evaluation of semantic similarity/relatedness results is also a critical factor, which is measured by the degree to which a computational measure approximates human-generated judgements. Ballatore et al. [164] have developed the Geo Relatedness and Similarity Dataset (GeReSiD), an open dataset for the evaluation of computational measures of geo-semantic relatedness and similarity. Geo-semantic relatedness focuses on relations involving at least one term with a spatial dimension. The dataset includes 97 geographic terms forming 50 term pairs and was developed based on human judgments.
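One common way to quantify how closely a computational measure approximates human judgements is a rank correlation between the two sets of scores. The sketch below computes Spearman's correlation in plain Python; choosing this statistic is an assumption made here, and the scores in the example are invented placeholders, not values from GeReSiD.

```python
import math

def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined when one ranking is constant
    return cov / (sx * sy)

human_scores = [0.9, 0.1, 0.5, 0.7]    # placeholder human ratings for four term pairs
measure_scores = [0.8, 0.2, 0.4, 0.6]  # placeholder outputs of a computational measure
print(spearman(human_scores, measure_scores))
```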

Conclusions and Future Challenges
The paper provides a review of semantic-based approaches to information modeling and elicitation spanning the last five years of research. In this context, information modeling refers to the development of ontologies at different levels of generality and formality, tailored to various needs and uses. Elicitation, on the other hand, includes a set of processes educing semantic information from semi-structured and unstructured resources. Modeling and elicitation may be used synergistically to enhance both: extracted information is used to improve and enrich an ontology, which in turn is used to refine the information extraction results. In the last five years, information modeling and elicitation focused on a variety of spatial concepts: interdisciplinary and vague geospatial concepts such as places and forests, dynamic concepts such as phenomena, events, and trajectories, and concepts with a cognitive and linguistic basis such as social roles associated with cities, emotions evoked by places, and characteristics of landscapes. Such concepts may be defined in a more traditional way in top-down approaches fitted for a formalization of expert knowledge, but also in bottom-up approaches built more easily from user perceptions and sense of their spatial environment, as expressed in natural language texts.
While remarkable progress has been made over these last five years, much is still to be done to solve problems caused by different conceptualizations and interpretations of geospatial information and facilitate knowledge exchange also across different domains and languages, as outlined below.
Although ontologies have provided a sound basis for the semantic formalization of domain knowledge, many interdisciplinary geospatial concepts are formalized in different ways by different domains. Multidomain knowledge integration is critical to solve complex scientific problems that require geospatial resources pertaining to multiple domains (e.g., geography, environmental science, Earth sciences, hydrology, forestry, spatial planning, and natural resource management). The arrival of the Internet of Things has pushed for the development of new ontologies but also calls for a better integration of these ontologies. Existing works have shown the direction with the development of ontology design patterns, lightweight or micro-ontologies as an alternative to upper level and domain ontologies. These alternative approaches, varying in semantic expressivity, focus on representing different perspectives and interpretations of geospatial information.
The development of methods capable of expressing a wide range of cross-domain knowledge and deeper semantic information elicitation processes for geospatial and other relevant concepts would greatly enhance the ability of scientists and general users to search for and combine information. Existing geospatial information extraction approaches focus mostly on locations, concepts, and topics, and to a lesser extent on predefined properties and relations between concepts. The extraction of properties and interrelations between entities, as well as the joint extraction and modeling of various elements and their interrelations is a more complicated and challenging research topic which is also critical for further populating both domain-specific ontologies and general-knowledge KBs such as DBpedia [91], YAGO [165], and BabelNet [166].
The majority of information extraction approaches rely on language-specific tools that are developed primarily for the English language followed by other common languages such as French, Spanish, and Chinese. However, there is a growing need for extracting semantic information from sources written in other languages, as well as for integrating cross-lingual information. Besides the development of extraction tools tailored to other languages, this requires the use of multilingual KBs, such as DBpedia or BabelNet, and the development of more robust tools for multi- or cross-lingual information extraction, search, and retrieval. Moreover, since information extraction and search approaches are typically complex, requiring different inputs and producing different outputs based on different methods, their evaluation is critical for assessing not only their value, but also their reproducibility for different resources and languages and portability to different domains.
Information elicitation can help bridge the gap between the user and the machine, also allowing the user to express qualitative high-level concepts. The volume and variety of available resources, most of them in semi-structured and unstructured format, raise additional challenges as to the interpretations and valid uses of this information. The inherent ambiguity, polysemy, and context-dependence of natural language further complicate the semantic-based extraction, enrichment, retrieval, and analysis of these resources, especially in a multilingual and multisource context. The need to overcome these barriers in knowledge discovery is a future research challenge highly relevant to data linking in the context of the Semantic Web and the advancement of geo-portals. In this context, besides more powerful techniques for multilingual multi-context semantic-based extraction, enrichment, and search, users would greatly benefit from novel approaches to interactive knowledge exploration and visualization. These approaches could provide users with contextual information for a deeper comprehension of query entities, concepts, and semantic relations between them and thus enable them to formulate more relevant and effective queries.