Development of an Ontology-Based Framework to Enhance Geospatial Data Discovery and Selection in Geoportals for Natural-Hazard Early Warning Systems

Vahdat, Amirhossein; Badard, Thierry; Pouliot, Jacynthe

doi:10.3390/ijgi14100369


by Amirhossein Vahdat *, Thierry Badard and Jacynthe Pouliot

Centre de Recherche en Données et Intelligence Géospatiales (CRDIG), Département des Sciences Géomatiques, Université Laval, Québec, QC G1V 0A6, Canada

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2025, 14(10), 369; https://doi.org/10.3390/ijgi14100369

Submission received: 31 July 2025 / Revised: 4 September 2025 / Accepted: 18 September 2025 / Published: 23 September 2025

(This article belongs to the Special Issue Advances in Remote Sensing and GIS for Natural Hazards Monitoring and Management)


Abstract

Discovering and selecting relevant geospatial datasets from heterogeneous sources remains difficult in conventional geoportals, where keyword-based search often fails to capture thematic relationships or user intent. This article presents an ontology-based framework that augments geoportals with semantic-aware discovery and selection. The contributions are as follows: (1) the geospatial metadata ontology (GMO), which reuses W3C and OGC ontologies and aligns with ISO 19115 to provide a uniform metadata representation enriched with thematic hierarchies and relations; and (2) GeoFit, a discovery framework that integrates GMO into geoportal workflows. The framework extends conventional functionality by enabling semantic query expansion, faceted exploration of thematic hierarchies, and ranking of datasets according to conceptual proximity and fitness-for-use criteria. These capabilities demonstrate how ontology integration operationalizes domain knowledge in the discovery process and makes dataset selection more interpretable and targeted. Validation demonstrated feasibility in the context of natural hazard Early Warning Systems (EWSs), where the prototype surfaced datasets relevant to different components, organized them into ranked and navigable results, and illustrated portability of the method to applied settings. The study confirms that embedding an ontology layer into geoportals provides semantic capabilities absent from keyword-only interfaces and establishes a foundation for extending discovery functions in heterogeneous geospatial infrastructures.

Keywords:

geospatial data discovery; geomatics; ontology modeling; geoportal; natural hazard warning system

1. Introduction

The establishment of Early Warning Systems (EWSs) is an essential part of a comprehensive strategy for natural-hazard risk management [1,2]. To protect people efficiently, natural-hazard forecasting models depend not only on specialists’ skills but also on the quality and quantity of available geospatial information [3]. Such information comes from various sources, such as meteorological data and forecasts, multispectral remote-sensing images, time series of extreme meteorological events, and incident reports. Consequently, large-scale and rapid access to geospatial data is required to improve the timeliness of natural-hazard warnings and awareness, as well as hazard mitigation and response strategies [4]. However, the volume and diversity of geospatial data, especially real-time observations, are growing exponentially, which presents a substantial challenge for the retrieval, selection, integration, and analysis of data in EWSs [5]. The Qaujjikairit project (Available online: https://sentinellenord.ulaval.ca/en/research/qaujikkaut-line-advanced-foresight-tool-extreme-meteorological-events-and-natural-hazards-nunavik, accessed on 19 September 2025), funded by Sentinel North and co-led by the Centre d’études nordiques (CEN) and the Centre de recherche en données et intelligence géospatiales (CRDIG) of Université Laval, seeks to develop an advanced online tool for forecasting extreme meteorological events and natural hazards in Nunavik. The project was motivated by the frequent occurrence of landslides and avalanches, flooding, permafrost degradation, and coastal and river erosion in this territory. Nunavik is located north of the 55th parallel in Quebec, Canada, and is home to about 14,000 people in 14 villages spread over 440,000 km² [6].
Climate predictions indicate that northern Quebec will experience a significant increase in temperature and precipitation, raising the probability and severity of climate-related natural hazards [7]. This change poses a direct threat to the safety, health, and well-being of the inhabitants, as well as to local communities and industries.

The process of finding and selecting appropriate data for EWSs, such as the one envisioned for the Qaujjikairit project in Nunavik, faces significant obstacles [8]. Three groups of problems can be identified. The first concerns identifying the geospatial data required in a natural hazard EWS. Despite the opportunities offered by artificial intelligence and analytical models for predicting natural hazards, there are gaps in the availability, understandability, and usability of the required geospatial datasets [9]. In hazard-monitoring models, experts select geospatial datasets based on thematic, temporal, and spatial relevance, yet it remains uncertain whether these datasets fully capture the essential variables required to accurately assess a wide range of hazards [10]. For instance, fog-warning systems that differentiate warnings by road type (highways, major roads, secondary roads, etc.) show how specific geospatial data characteristics, such as road network type, matter for identifying accident risks during sudden low-visibility conditions [11]. Therefore, as noted by Wu et al. [11], the first problem faced by specialists designing such prediction models is determining what information is required, based on the type of hazard and the data demands of potential analytical models.

Even if the required geospatial datasets are identified, they still need to be discovered and retrieved. The second group of problems concerns geospatial data discovery in spatial data infrastructures (SDIs) and standalone geoportals. Worldwide, efforts to improve geospatial data discovery and retrieval have led to the creation of geoportals by organizations and initiatives such as the United States National Aeronautics and Space Administration (NASA) (Available online: https://www.nasa.gov/, accessed on 19 September 2025), Environment and Climate Change Canada (ECCC) (Available online: https://www.canada.ca/en/environment-climate-change.html, accessed on 19 September 2025), the European Space Agency (ESA) (Available online: https://www.esa.int/, accessed on 19 September 2025), and the Australian Geoscience Data Cube (AGDC) (Available online: https://www.dea.ga.gov.au/about/open-data-cube, accessed on 19 September 2025). The static nature of these infrastructures, while efficient for straightforward data cataloging and retrieval, falls short of addressing the complex, context-specific needs of applications in multidisciplinary fields [12]. Most geoportal search engines do not incorporate descriptions of domain entities or of the relationships between them [13,14]. For instance, Figure 1 illustrates the results of queries for “Wildfire” (Figure 1a) and “Natural Hazards” (Figure 1b) in NASA’s Earthdata Search geoportal.

Although wildfire is categorized under natural hazards, the query for “Wildfire” returns more results than the query for “Natural Hazards”. Moreover, most geospatial search engines rely primarily on simple keyword-based matching [13]. In this approach, the system identifies correspondences between the keywords provided by users and the textual annotations associated with geospatial assets [15,16]. As noted by Quarati et al. [17], this reliance on basic keyword matching fails to reflect the user’s intent accurately or to accommodate preferences for multidimensional data characteristics. For example, when a user searches for “hourly water level,” the system may treat ‘hourly’, ‘water’, and ‘level’ independently, failing to recognize that the user is specifically interested in water-level datasets with an hourly temporal resolution. This reveals a significant misalignment between how users phrase their searches and what they are actually looking for, a fundamental limitation in capturing search intent. Combined with the overwhelming volume of data returned by searches, this limitation underscores the need for more sophisticated search capabilities that better accommodate users’ specific needs and preferences [18].
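To make the “hourly water level” example concrete, the following minimal Python sketch contrasts bag-of-words keyword matching with a stricter phrase match over a few hypothetical dataset titles. Neither function reproduces the search engine of any particular geoportal; the records and function names are illustrative assumptions.

```python
# Hypothetical dataset titles; not taken from any real catalog.
RECORDS = {
    "ds1": "Hourly water level observations at tide gauges",
    "ds2": "Daily water temperature of northern lakes",
    "ds3": "Hourly air temperature from weather stations",
    "ds4": "Sea level anomaly from satellite altimetry",
}


def tokenize(text: str) -> list[str]:
    """Lower-case and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def keyword_match(query: str, records: dict[str, str]) -> set[str]:
    """Bag-of-words OR matching: a record matches if it shares any query term."""
    terms = set(tokenize(query))
    return {rid for rid, title in records.items() if terms & set(tokenize(title))}


def phrase_match(query: str, records: dict[str, str]) -> set[str]:
    """Stricter lexical variant: the query terms must appear as one contiguous phrase."""
    q = tokenize(query)
    hits = set()
    for rid, title in records.items():
        t = tokenize(title)
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            hits.add(rid)
    return hits


print(sorted(keyword_match("hourly water level", RECORDS)))  # all four records
print(sorted(phrase_match("hourly water level", RECORDS)))   # only ds1
print(sorted(phrase_match("water level hourly", RECORDS)))   # nothing
```

Token-level matching returns every record that shares any word with the query, while the phrase match fails as soon as the user reorders the words; both remain purely lexical, which is the gap that concept-level, ontology-based matching aims to close.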

The third group of problems concerns the fitness for use of geospatial data. Fitness for use can be investigated by comparing information on different data quality dimensions, including objective attributes such as completeness, accuracy, timeliness, and consistency, as well as subjective attributes such as relevancy, credibility, accessibility, and data authority [19,20]. Metadata models based on a standardized paradigm and conforming to established spatial data quality standards, such as those of the International Organization for Standardization (ISO), may enable users to assess the suitability of various geospatial datasets for their specific needs [21]. Nevertheless, as highlighted by Kalantari et al. [22], inconsistent and irrelevant information within metadata records poses a substantial obstacle to the efficient and effective selection of spatial data. Typical problems with metadata records include titles with inconsistent information (e.g., scale) or unclear abbreviations, divergences between keywords and the actual description of the data, and abstracts that are either overly detailed or too brief to be useful. Moreover, not all data come with metadata, and even when metadata are present, they might not adhere to data quality standards such as those of the Open Geospatial Consortium [23,24].
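Some of these metadata problems can be screened automatically. The Python sketch below, a hypothetical illustration rather than a validated quality model, computes the share of required discovery fields that are filled in and flags keywords that never appear in the abstract, a crude proxy for keyword/description divergence. The list of required fields is an assumption loosely inspired by ISO 19115 discovery elements.

```python
from dataclasses import dataclass, field

# Assumed list of discovery fields, loosely inspired by ISO 19115 core elements.
REQUIRED_FIELDS = ("title", "abstract", "keywords", "spatial_extent", "temporal_extent", "lineage")


@dataclass
class MetadataRecord:
    fields: dict = field(default_factory=dict)


def completeness(record: MetadataRecord) -> float:
    """Fraction of required fields that are present and non-empty (0.0 to 1.0)."""
    present = sum(1 for name in REQUIRED_FIELDS if record.fields.get(name))
    return present / len(REQUIRED_FIELDS)


def keywords_missing_from_abstract(record: MetadataRecord) -> list[str]:
    """Keywords that never occur in the abstract text (case-insensitive substring test)."""
    abstract = record.fields.get("abstract", "").lower()
    return [kw for kw in record.fields.get("keywords", []) if kw.lower() not in abstract]


# Hypothetical record with no temporal extent or lineage information.
rec = MetadataRecord({
    "title": "Snow depth, Nunavik stations",
    "abstract": "Daily snow depth measured at automatic weather stations.",
    "keywords": ["snow depth", "permafrost"],
    "spatial_extent": (-80.0, 55.0, -58.0, 63.0),
})
print(round(completeness(rec), 2))          # 0.67
print(keywords_missing_from_abstract(rec))  # ['permafrost']
```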

Accordingly, this paper (based on the doctoral thesis of the first author, defended at Laval University, Canada, in 2025 [25]) addresses a critical phase of the Qaujjikairit project dedicated to identifying and selecting essential geospatial data from geoportals and subsequently integrating these data into a natural hazard EWS. The primary goal of this research is therefore to improve the discovery and selection of geospatial data within geoportals. The complementary objectives are (1) to establish comprehensive mappings of geospatial information for natural hazard EWSs and (2) to develop a prototype demonstrating the feasibility of implementing the proposed value-added framework within a geospatial data portal. The methodology follows a primarily quantitative approach similar to an engineering process: observing a phenomenon (in this case, the discovery of suitable geospatial datasets in a geoportal for natural hazard EWSs), formulating hypotheses about possible solutions, designing a selected solution, and testing its validity in terms of feasibility and added value.

2. Related Work

2.1. Semantic Enhancement for Data Discovery and Understanding

In practice, researchers often rely on datasets with which they are familiar [26]. Because of the limitations of current geospatial search engines, they are often unaware of alternative datasets that might offer better fitness for use in their models or applications [26,27]. This situation poses significant challenges for the information retrieval community, which needs to develop more sophisticated methods for intelligent geospatial data discovery and to establish semantic search platforms that improve the effectiveness of data discovery and selection [28,29].

Semantic web technologies have been used to discover geospatial data, opening new perspectives on the heterogeneity of geospatial data and its usage [30]. For machines to autonomously interpret and execute user requests involving diverse reasoning operations, knowledge must also be represented at the semantic level [31]. A knowledge base (KB), or an ontology, is a formal, explicit specification of a shared conceptualization [32]. An ontology is a formal, machine-readable description of the entities of a domain and the relationships between them [33]. Ontologies can also provide standards for conceptualizing and referring to domains of interest [33]. In various applications, ontologies built with the Web Ontology Language (OWL) are integrated to improve reasoning capabilities [34,35,36,37]. This integration helps identify geospatial data sources and interpret metadata from different standards through ontology matching and reasoning [38].

The rapid increase in geospatial data on the web and advances in web technologies have led to the creation of a joint OGC and W3C working group on Spatial Data on the Web (SDW) (Available online: https://www.w3.org/TR/sdw-bp/, accessed on 19 September 2025). This group focuses on establishing best practices for publishing geospatial data online, aiming to make SDI data more accessible and effective. In line with the SDW working group’s recommendations, integrating semantic web technologies such as the Resource Description Framework (RDF) and OWL into geoportals has been shown to significantly enhance their functionality [39]. These technologies enable the explicit representation of data semantics, facilitating the richer data integration and querying capabilities essential for effectively using extensive geospatial datasets across diverse applications [40,41]. However, while RDF provides a foundational structure for representing data as triples (subject–predicate–object), it lacks the expressiveness required for complex domain-knowledge modeling. OWL builds on RDF by introducing formal semantics that allow a more detailed representation of relationships and concepts. OWL 2, an extension of OWL, offers even greater expressiveness and reasoning power, enabling a richer representation of geospatial concepts and the automatic inference of new knowledge. Comprehensive semantic metadata models are foundational to interoperability, consistency, and reusability across diverse systems [41]. Beretta et al. [42] demonstrated this through a user-centric metadata model that leverages the SOSA (Sensor, Observation, Sample, and Actuator) ontology. SOSA is a lightweight ontology designed to model the process of making observations, including the sensors used and the resulting data, and provides a standard paradigm based on the observation concept [43]. As Beretta et al. [42] discussed, their model improves dataset discoverability and semantic interoperability by employing ontologies and vocabularies such as PROV-O (Available online: https://www.w3.org/TR/prov-o/, accessed on 19 September 2025) for data provenance and SWEET (Available online: http://cor.esipfed.org/ont?iri=http://sweetontology.net/sweetAll, accessed on 19 September 2025) for environmental data. In their case study, they demonstrated the model’s effectiveness by integrating datasets on yellowfin tuna populations in the Indian Ocean, using SOSA to link biological observation data with environmental properties such as sea surface temperature and chlorophyll concentration. The authors reported that this approach enhanced semantic annotations, facilitated more efficient data discovery and integration, and allowed researchers to retrieve relevant datasets seamlessly across disciplines, thereby supporting complex interdisciplinary research tasks.
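The difference between storing triples and inferring from them can be illustrated with two RDFS entailment rules, rdfs9 (instances inherit superclasses) and rdfs11 (subclass transitivity) from the RDF 1.1 Semantics specification. This minimal Python sketch is far less expressive than an OWL 2 reasoner and uses hypothetical class and instance names; it only shows how new triples follow from existing ones.

```python
# Triples as (subject, predicate, object); prefixed names are plain strings here.
TRIPLES = {
    (":Wildfire", "rdfs:subClassOf", ":NaturalHazard"),
    (":NaturalHazard", "rdfs:subClassOf", ":Hazard"),
    (":event42", "rdf:type", ":Wildfire"),
}


def rdfs_closure(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply rdfs11 (subClassOf transitivity) and rdfs9 (type inheritance) to a fixpoint."""
    closure = set(triples)
    while True:
        derived = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Both rules need a second triple of the form (o, rdfs:subClassOf, o2).
                if s2 != o or p2 != "rdfs:subClassOf":
                    continue
                if p == "rdfs:subClassOf":
                    derived.add((s, "rdfs:subClassOf", o2))  # rdfs11
                elif p == "rdf:type":
                    derived.add((s, "rdf:type", o2))  # rdfs9
        if derived <= closure:
            return closure
        closure |= derived


inferred = rdfs_closure(TRIPLES) - TRIPLES
for triple in sorted(inferred):
    print(triple)
```

Because the event is typed as :Wildfire, a query for instances of :NaturalHazard now finds it, which is the kind of hierarchical link that keyword queries such as those in Figure 1 cannot exploit.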

Advances in query techniques based on ontology reasoning have notably improved the effectiveness of data discovery, especially recall [44,45]. Li et al. [46] reported on the development of a rule-based, semantic-enabled service-chain model to support intelligent spatiotemporal question answering and open knowledge discovery. The authors introduced this model as part of a cyberinfrastructure that integrates various data sources and supports the organization and visualization of geospatial data. Key components of the model include a spatial and temporal reasoner that resolves spatial and temporal details in user queries and disambiguates place names. Li et al. [46] stated that this spatiotemporal capability could improve the efficiency of data discovery by ensuring that relevant data are accurately linked and processed according to user-defined parameters.
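Once a spatiotemporal reasoner has resolved a place name and a time window, the remaining filtering step reduces to overlap tests. The Python sketch below is not Li et al.’s model; it assumes a hypothetical gazetteer (the Kuujjuaq box is approximate), hypothetical catalog entries, and bounding boxes in (min_lon, min_lat, max_lon, max_lat) order.

```python
from datetime import date

# Hypothetical gazetteer; coordinates are approximate and for illustration only.
GAZETTEER = {"kuujjuaq": (-68.6, 58.0, -68.2, 58.2)}

# Hypothetical catalog entries with bounding boxes and temporal extents.
DATASETS = [
    {"id": "precip_nunavik", "bbox": (-80.0, 55.0, -58.0, 63.0),
     "start": date(2000, 1, 1), "end": date(2024, 12, 31)},
    {"id": "snow_south_qc", "bbox": (-75.0, 45.0, -70.0, 48.0),
     "start": date(2010, 1, 1), "end": date(2024, 12, 31)},
    {"id": "tide_1990s", "bbox": (-69.0, 57.5, -68.0, 58.5),
     "start": date(1990, 1, 1), "end": date(1999, 12, 31)},
]


def bbox_intersects(a, b) -> bool:
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap (antimeridian ignored)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def interval_overlaps(start1, end1, start2, end2) -> bool:
    """True if two closed date intervals share at least one day."""
    return start1 <= end2 and start2 <= end1


def resolve_and_filter(place: str, start: date, end: date, datasets) -> list[str]:
    """Resolve a place name to a box, then keep datasets overlapping it in space and time."""
    bbox = GAZETTEER.get(place.strip().lower())
    if bbox is None:
        raise KeyError(f"unknown place name: {place!r}")
    return [d["id"] for d in datasets
            if bbox_intersects(bbox, d["bbox"])
            and interval_overlaps(start, end, d["start"], d["end"])]


print(resolve_and_filter("Kuujjuaq", date(2020, 1, 1), date(2020, 12, 31), DATASETS))
```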

Nevertheless, most semantic approaches concentrate on improving metadata descriptions and retrieval precision, without sufficiently addressing the practical challenge of identifying datasets that are truly required for a user’s analytical or decision-making tasks. In many cases, users are still left with large sets of “relevant” datasets, yet with no systematic way to determine whether these datasets are actually suitable for the problem at hand.

This indicates a need for approaches that go beyond the semantic enrichment of metadata and incorporate mechanisms for aligning dataset discovery with explicit information requirements. Such mechanisms would support users in moving from “finding datasets that exist” to “finding datasets that can be effectively used for their intended application”.

2.2. Geospatial Data Quality and Fitness for Use

Supporting the fitness for use of geospatial data within geoportals requires sophisticated ranking approaches and advanced search capabilities such as faceted search [47]. As Vockner et al. [48] highlighted, these approaches and capabilities align data discovery and retrieval with user needs by prioritizing search results based on relevance and user intention.
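Faceted search can be reduced to two operations over a result set: counting how many results carry each value of a facet, and narrowing the results to a selected value. The Python sketch below illustrates both with hypothetical result records and facet names.

```python
from collections import Counter

# Hypothetical search results with a multi-valued "theme" facet and a single-valued "format".
RESULTS = [
    {"id": "a", "theme": ["Precipitation"], "format": "NetCDF"},
    {"id": "b", "theme": ["Precipitation", "Snow"], "format": "GeoTIFF"},
    {"id": "c", "theme": ["Snow"], "format": "NetCDF"},
]


def facet_values(record: dict, facet: str) -> list:
    """Return a facet's values as a list, whether stored as one string or as a list."""
    value = record.get(facet, [])
    return [value] if isinstance(value, str) else list(value)


def facet_counts(results: list[dict], facet: str) -> Counter:
    """Number of results carrying each facet value (each value counted once per record)."""
    counts = Counter()
    for record in results:
        counts.update(set(facet_values(record, facet)))
    return counts


def apply_facet(results: list[dict], facet: str, value) -> list[dict]:
    """Narrow the result set to records that carry the selected facet value."""
    return [r for r in results if value in facet_values(r, facet)]


print(facet_counts(RESULTS, "theme"))
print([r["id"] for r in apply_facet(RESULTS, "format", "NetCDF")])
```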

Ranking functions play a critical role in determining the relevance of search results in geospatial data portals, but their effectiveness is often limited by reliance on single-attribute metrics [49,50,51]. Hervey et al. [52] offered an extensive review of search functionality across open geospatial portals, focusing mainly on how these systems process and interpret user queries, with a detailed examination of search facets and ranking functions. They found that most portals use conventional full-text search engines such as Apache Solr (The Apache Software Foundation, Forest Hill, MD, USA) and rely on basic keyword-frequency metrics, such as Term Frequency-Inverse Document Frequency (TF-IDF), for ranking, which often yields static result lists that lack contextual refinement or dynamic re-ranking based on user interactions. The authors identified several usability constraints of these prevalent methods, including the inability to handle complex, multidimensional queries, ranking based on single attributes, and the lack of support for user feedback to refine search results dynamically. They suggested integrating alternative interfaces and more sophisticated query-processing pipelines that better address the nuanced needs of diverse user communities. Specifically, they advocated adopting semantic technologies, which could significantly improve query understanding and result relevance by revealing the meaning behind user queries and the data context, thereby enabling more dynamic and user-friendly search interfaces.
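For reference, the single-attribute keyword-frequency ranking described above can be sketched with a basic TF-IDF score: term frequency normalized by document length, multiplied by log(N/df). This is a textbook variant applied to hypothetical records; production engines such as Apache Solr use more elaborate scoring functions.

```python
import math
from collections import Counter

# Hypothetical metadata texts (title and abstract flattened into one string).
DOCS = {
    "d1": "flood extent from radar imagery flood mapping",
    "d2": "river discharge hourly observations",
    "d3": "flood risk zones",
}


def tfidf_rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Score each document as the sum over query terms of tf(t, d) * idf(t)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        scores[doc_id] = sum(
            tf[t] / len(tokens) * math.log(n_docs / doc_freq[t])
            for t in query.lower().split()
            if doc_freq[t]  # skip terms absent from the whole collection
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(tfidf_rank("flood", DOCS))       # the short d3 outranks d1, which mentions flood twice
print(tfidf_rank("inundation", DOCS))  # every score is 0: no lexical overlap
```

The score depends only on term statistics within one text attribute: a query for “inundation” scores zero against all three records even though d1 and d3 are conceptually related, and nothing in the score reflects spatial, temporal, or quality attributes.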

Although research has explored ranking functions and search functionalities in geoportals, these approaches remain largely limited to conventional keyword-based methods, often producing static results and struggling with complex, multi-dimensional queries. Prior work has not fully leveraged semantic technologies to enable the semantic ranking of datasets, which could capture both query meaning and data context. Addressing these gaps is essential for developing geoportals that provide intelligent, context-aware, and user-responsive search experiences.

2.3. Geospatial Data Discovery Methods for Natural Hazard EWSs

As previously mentioned, incorporating semantic technologies can enhance the interoperability, integration, discovery, and usability of geospatial data, thereby reducing both the dependency on user expertise and the time needed to gather geospatial data during discovery. Most previous studies in disaster management use semantics (including ontologies) in geospatial data discovery and focus on integrating geospatial datasets for specifically defined tasks [53,54]. These systems enabled the automated analysis of data from different geospatial sources. However, data discovery remains limited in existing EWSs: either the required geospatial data are identified in advance, or the approaches do not consider the geospatial characteristics that could improve dataset selection.

Qiu et al. [53] explored the role of ontologies in enhancing the discovery and management of geospatial data within an integrated flood disaster management system (FDMS). Their ontology-based approach facilitated semantic interoperability between diverse datasets and environmental models. For example, the system used ontologies to define multi-level semantic mappings that quantitatively and qualitatively describe the relationships between data attributes and model requirements. These mappings include spatiotemporal, feature, and preference constraints. In practice, the FDMS could dynamically select the most relevant hydrological and meteorological datasets for a flood prediction model based on the model’s predefined requirements in the ontology, such as temporal resolution and data type, without manual intervention. As the authors noted, the FDMS approach streamlined data integration, reduced response times in critical flood management situations, and enhanced the overall effectiveness of the disaster management workflow.
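Matching datasets to a model’s predefined requirements can be sketched as hard constraints followed by a preference ordering. The dataclasses, field names, and sample values below are hypothetical and do not reproduce the FDMS implementation of Qiu et al.; data type stands in for a feature constraint, a maximum time step for a temporal constraint, and “finest resolution first” for a preference constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    data_type: str              # e.g. "precipitation", "discharge"
    time_step_hours: float      # temporal resolution of the records


@dataclass(frozen=True)
class ModelRequirement:
    data_type: str
    max_time_step_hours: float  # coarsest temporal resolution the model accepts


def select_for_model(datasets: list[Dataset], req: ModelRequirement) -> list[Dataset]:
    """Keep datasets meeting the hard constraints, finest temporal resolution first."""
    eligible = [
        d for d in datasets
        if d.data_type == req.data_type and d.time_step_hours <= req.max_time_step_hours
    ]
    return sorted(eligible, key=lambda d: d.time_step_hours)


# Hypothetical catalog entries.
CATALOG = [
    Dataset("reanalysis_3h", "precipitation", 3),
    Dataset("gauge_hourly", "precipitation", 1),
    Dataset("monthly_climatology", "precipitation", 720),
    Dataset("discharge_daily", "discharge", 24),
]
print([d.name for d in select_for_model(CATALOG, ModelRequirement("precipitation", 6))])
```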

Phengsuwan et al. [54] presented a formal knowledge base for the landslide domain. The research emphasized the verification of warning signs, using social media and observation data to construct an EWS. While the proposed framework effectively identified potential datasets for validating the occurrence of landslides, its contribution was limited to the monitoring and prediction element of the EWS, with no mention of the other elements. Furthermore, the authors did not specify how spatial attributes, such as spatial coverage derived from metadata, were represented in their design.

Based on the literature reviewed here, the role of potential geospatial information and data sources in supporting all elements of a natural hazard EWS within a generalized framework remains largely unexplored.

3. Ontology-Based Geospatial Data Discovery and Selection Framework

3.1. Introduction

The literature review reveals that effective discovery of geospatial data requires more than keyword search or static metadata; it demands a semantic understanding of what the data describe and how they align with application needs. To address the challenges of discovering relevant geospatial datasets in dynamic, multi-thematic contexts such as EWSs, the GeoFit framework is proposed (Figure 2). GeoFit is an ontology-based framework designed to improve geospatial data discovery and ranking by aligning dataset characteristics with application-specific needs. It is structured around three main steps: (1) assembling and introducing a geospatial metadata ontology by reusing W3C and OGC ontologies; (2) mapping and instantiating metadata from existing geoportals using the ontology to create semantically enriched descriptions; and (3) using the GMO in the discovery and selection process to support the assessment of data fitness for use in a geoportal. The following sections present each step of the GeoFit framework.

3.2. Geospatial Metadata Ontology

The geospatial metadata ontology (GMO), shown in Figure 3, is designed to enhance data discovery and selection in geoportals by integrating semantics into metadata and extending the inherent capabilities of geoportals: enabling faceted search, supporting thematic classification, improving relevance ranking through semantic similarity, and recommending additional datasets based on inferred user needs. Once created, the GMO remains stable and does not require modification during the operational phases of the framework; it functions as a predefined semantic structure that supports querying, reasoning, and selection of geospatial data.
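This section does not specify how semantic similarity is computed, so the following Python sketch makes an explicit assumption: conceptual proximity is taken as 1 / (1 + d), where d is the number of broader/narrower edges separating two concepts in a hypothetical single-parent SKOS-like hierarchy. It illustrates the general mechanism of ranking datasets by conceptual proximity, not the measure actually used with the GMO.

```python
from typing import Optional

# Hypothetical child -> parent links, in the spirit of skos:broader.
BROADER = {
    "Wildfire": "NaturalHazard",
    "Flood": "NaturalHazard",
    "NaturalHazard": "EarthScience",
    "Precipitation": "Atmosphere",
    "Atmosphere": "EarthScience",
}


def ancestors(concept: str) -> list[str]:
    """The concept followed by its broader concepts, nearest first."""
    chain = [concept]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def path_distance(a: str, b: str) -> Optional[int]:
    """Edges between a and b via their lowest common ancestor, or None if unrelated."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for i, concept in enumerate(chain_a):
        if concept in chain_b:
            return i + chain_b.index(concept)
    return None


def proximity(a: str, b: str) -> float:
    """1.0 for identical concepts, decreasing with hierarchical distance, 0.0 if unrelated."""
    d = path_distance(a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


def rank_by_proximity(query: str, dataset_themes: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Rank (dataset id, theme concept) pairs by proximity of the theme to the query concept."""
    scored = [(ds, proximity(query, theme)) for ds, theme in dataset_themes]
    return sorted(scored, key=lambda item: item[1], reverse=True)


print(rank_by_proximity("Flood", [("ds_fire", "Wildfire"), ("ds_rain", "Precipitation"), ("ds_flood", "Flood")]))
```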

The ontology is implemented in Protégé (version 5.5.0; Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA) because of its comprehensive support for OWL 2 and its widespread use in the ontology-development community. As illustrated in Figure 3, the GMO integrates several established ontologies and vocabularies, including the Data Catalog Vocabulary (DCAT), SOSA (Sensor, Observation, Sample, and Actuator), OWL-Time, GeoSPARQL, and the Simple Knowledge Organization System (SKOS), into a cohesive model that supports advanced semantic reasoning over geospatial metadata. The figure shows how classes such as dcat:Dataset, sosa:ObservationCollection, and skos:Concept are connected, emphasizing the semantic integration of dataset description, observation context, and thematic classification. The proposed metadata model reflects the key discovery elements recommended in ISO 19115-1 [55] and ISO 19115-2 [56], such as dcterms:title and dcterms:description for resource identification, foaf:Agent for responsible parties, dcterms:spatial and dcterms:temporal for spatial and temporal extent, dcat:downloadURL and dcterms:accessRights for access and constraints, skos:Concept for thematic classification, and dqv:QualityMeasurement for data quality. Although not all ISO elements are explicitly included, the model remains extensible and aligned with core discovery requirements. The most important ontologies reused for terms and relationships are as follows.

DCAT and GeoDCAT-AP are used because they are designed to enhance the interoperability of data catalogs on the web: DCAT is a W3C standard, and GeoDCAT-AP is its geospatial extension. DCAT promotes standardization and linked data principles, enabling enriched metadata descriptions and better data discoverability. GeoDCAT-AP, an extension of DCAT developed by the European Commission’s Joint Research Centre (JRC), is tailored to the specific needs of geospatial data catalogs. It integrates geospatial metadata standards, such as ISO 19115 [55] and INSPIRE (Infrastructure for Spatial Information in the European Community) [57], into the DCAT framework. DCAT and GeoDCAT-AP were selected over other metadata schemas, such as Dublin Core, because of their specific alignment with geospatial standards and their ability to enhance data discoverability and interoperability. GeoDCAT-AP includes a comprehensive set of metadata elements essential for geospatial data catalogs and discovery, covering INSPIRE/ISO 19115 elements [55,57] related to identification (such as resource titles, abstracts, and identifiers), contacts (points of contact), temporal aspects (dates and temporal references), spatial aspects (geographic bounding boxes and coordinate reference systems), and various facets of data quality [58].

As mentioned, DCAT and its geospatial extension GeoDCAT-AP provide a structured model for catalog-level metadata, including spatial and temporal extent, responsible organizations, and data-access mechanisms. However, they offer no explicit constructs for describing the observational context in which data are generated. To address this limitation, the SOSA ontology is integrated to capture the observation context of data such as in situ observations and remote sensing imagery. It groups a set of observations into a collection that carries the main properties of the observation context. SOSA was preferred over alternatives such as the Observations and Measurements (O&M) model because of its flexibility and its integration with the broader Semantic Sensor Network (SSN) ontology, which adds crucial capabilities for representing observation contexts. The SSN ontology builds on SOSA and includes more detailed descriptions of sensors’ operational ranges, accuracy, and other properties, enabling a richer representation of sensor networks and their data. In Figure 3, this is shown by the link from dcat:Dataset to sosa:ObservationCollection and by the relationships among sosa:Observation, sosa:Sensor, sosa:Platform, sosa:ObservableProperty, and sosa:FeatureOfInterest.

Extensions to the SSN ontology provide additional capabilities, such as the class sosa:ObservationCollection. Observations are often grouped into collections whose shared properties are critical for efficient data discovery and transfer, so standardizing how observation descriptions are packaged at the collection level supports the accessibility and usability of these datasets. Note that if a sosa:Observation individual is a member of a sosa:ObservationCollection, the properties of the entire collection apply to that observation as well as to all other observations and sub-collections within it. This shared inheritance of metadata is crucial for enabling semantic reasoning over grouped observational datasets.
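The inheritance behavior described above can be sketched as a recursive merge over nested collections. The Node class, property keys, and values below are hypothetical stand-ins for SOSA individuals, and letting a member’s own value override an inherited one is an assumption of this sketch rather than a rule stated in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a sosa:Observation or sosa:ObservationCollection."""
    name: str
    properties: dict = field(default_factory=dict)
    members: list = field(default_factory=list)  # empty for a single observation


def effective_properties(node: Node, inherited: Optional[dict] = None) -> dict:
    """Map every node name to its own properties merged with those of enclosing collections."""
    merged = {**(inherited or {}), **node.properties}  # local values win (assumption)
    result = {node.name: merged}
    for member in node.members:
        result.update(effective_properties(member, merged))
    return result


campaign = Node(
    "snowCampaign2023",
    {"sensor": "ultrasonic snow-depth sensor", "featureOfInterest": "snowpack"},
    [
        Node("obs1", {"resultTime": "2023-02-01"}),
        Node("stationA", {"platform": "station A"}, [Node("obs2", {"resultTime": "2023-02-02"})]),
    ],
)
print(effective_properties(campaign)["obs2"])
```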

To support enhanced semantic discovery of geospatial datasets, two sets of SKOS-based concepts are introduced: FeatureTheme and PropertyTheme. These sets enable the classification of datasets along two distinct yet complementary thematic dimensions. FeatureTheme denotes general categories of geographic features described in a dataset, such as rivers, wetlands, or glaciers. PropertyTheme captures observable or measurable characteristics, such as water level, soil moisture, or vegetation index. While inspired by the sosa:FeatureOfInterest and sosa:ObservableProperty classes from the SOSA ontology, these concepts function as metadata-level thematic abstractions rather than representations of individual observational instances. Within the proposed framework, :FeatureTheme classifies the dataset as a whole, whereas sosa:FeatureOfInterest refers to the real-world object that is the target of an observation. This modeling approach is conceptually aligned with ISO 19110 (Feature Catalog) [59] and ISO 19156 (Observations and Measurements) [60]. FeatureTheme corresponds to the concept of feature types defined in ISO 19110, while PropertyTheme reflects the types of observable properties described in ISO 19156, both adapted for thematic classification purposes. As shown in Figure 3, FeatureTheme and PropertyTheme are defined as subclasses of skos:Concept, each forming a separate conceptual hierarchy. Although the two classes are not linked at the schema level, all PropertyTheme instances are defined as skos:narrower concepts of corresponding FeatureTheme instances. This design supports cross-thematic navigation while maintaining modularity and semantic clarity within the ontology.
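Because PropertyTheme concepts are linked as skos:narrower concepts of FeatureTheme concepts, a query on a broad theme can be expanded to every concept beneath it, which addresses the “Wildfire” versus “Natural Hazards” anomaly shown in Figure 1. The Python sketch below performs a breadth-first traversal of skos:narrower links; all concept and dataset names are hypothetical.

```python
from collections import deque

# Hypothetical skos:narrower links (FeatureTheme concepts down to PropertyTheme concepts).
NARROWER = {
    "NaturalHazards": ["Wildfire", "Flood"],
    "Flood": ["WaterLevel"],
    "River": ["WaterLevel", "Discharge"],
}

# Hypothetical thematic annotations of datasets.
DATASET_THEMES = {
    "ds_fire_perimeters": {"Wildfire"},
    "ds_gauge": {"WaterLevel"},
    "ds_hazard_inventory": {"NaturalHazards"},
    "ds_discharge": {"Discharge"},
}


def expand(concept: str) -> set[str]:
    """The concept plus all concepts reachable through skos:narrower (breadth-first)."""
    seen, queue = {concept}, deque([concept])
    while queue:
        for narrower in NARROWER.get(queue.popleft(), []):
            if narrower not in seen:
                seen.add(narrower)
                queue.append(narrower)
    return seen


def search(concept: str, index: dict[str, set[str]], expand_query: bool = True) -> list[str]:
    """Datasets annotated with the concept (or, if expanded, with any narrower concept)."""
    concepts = expand(concept) if expand_query else {concept}
    return sorted(ds for ds, themes in index.items() if themes & concepts)


print(search("NaturalHazards", DATASET_THEMES, expand_query=False))
print(search("NaturalHazards", DATASET_THEMES))
```

Without expansion, the broad query only finds datasets tagged with the broad concept itself; with expansion, it also reaches datasets tagged with narrower feature or property themes.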

In this work, the Global Change Master Directory (GCMD) science keyword taxonomy is adopted as the source vocabulary for both FeatureTheme and PropertyTheme. This choice is driven by the fact that the ontology is used to catalog metadata from NASA Earth observation datasets, which are already annotated with the GCMD keywords. By transforming the GCMD keyword taxonomy into SKOS and integrating it into the ontology, this work demonstrates the practical feasibility of organizing real-world Earth-science metadata using these thematic concepts. Importantly, this process is not limited to the GCMD because the ontology framework is designed with extensibility in mind. The use of SKOS as the modeling structure allows seamless integration of additional controlled vocabularies by representing their terms as SKOS concepts within the existing thematic hierarchies, or as separate hierarchies linked through SKOS semantic relations. This flexible architecture enables the ontology to accommodate diverse domain vocabularies without restructuring its core, supporting semantic interoperability and enhancing dataset classification and discovery across multiple scientific fields.
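The transformation of GCMD keywords into the two SKOS hierarchies can be sketched as below. The sketch rests on several assumptions:

- Each keyword is available as a ">"-delimited path (e.g., `EARTH SCIENCE > ATMOSPHERE > ...`).
- Labels serve as concept identifiers instead of GCMD UUIDs.
- The first level below the category is typed as :FeatureTheme and deeper levels as :PropertyTheme, following the “Atmosphere” > “Precipitation” example in Section 3.3.

The function name and output shape (string triples) are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def gcmd_paths_to_skos(paths: List[str], category: str = "EARTH SCIENCE") -> Set[Triple]:
    """Turn '>'-delimited GCMD keyword paths into SKOS-style triples.

    The first level under the category is typed as a feature theme; every
    deeper level is typed as a property theme and linked to its parent with
    skos:broader and the inverse skos:narrower.
    """
    triples: Set[Triple] = set()
    for path in paths:
        levels = [part.strip() for part in path.split(">") if part.strip()]
        if levels and levels[0].upper() == category:
            levels = levels[1:]  # drop the "EARTH SCIENCE" category level
        for depth, label in enumerate(levels):
            theme_class = ":FeatureTheme" if depth == 0 else ":PropertyTheme"
            triples.add((label, "rdf:type", theme_class))
            if depth > 0:
                parent = levels[depth - 1]
                triples.add((label, "skos:broader", parent))
                triples.add((parent, "skos:narrower", label))
    return triples


paths = [
    "EARTH SCIENCE > ATMOSPHERE > PRECIPITATION > PRECIPITATION AMOUNT > HOURLY PRECIPITATION AMOUNT",
    "EARTH SCIENCE > ATMOSPHERE > PRECIPITATION > PRECIPITATION AMOUNT > 3 HOUR PRECIPITATION AMOUNT",
]
for triple in sorted(gcmd_paths_to_skos(paths)):
    print(triple)
```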

3.3. Metadata Processing

This step aligns metadata from geoportal datasets with the GMO and classifies the datasets thematically. Metadata parsing and transformation are well established, for example in the GeoKnow project [61], and are not the primary focus of this study. Nevertheless, a metadata-parsing module is created manually to determine which element of the proposed semantic metadata corresponds to each attribute of the initial metadata. Specifically, this process identifies the matches between the initial metadata elements and the corresponding classes, object properties, and data properties in the ontology. Once identified, these mappings are applied to all metadata resources to transform them. Adhering to the ISO 19115 standard in the semantic metadata model significantly streamlines this process, because the ontology already includes classes, object properties, and data properties for the ISO 19115 metadata elements.
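The mapping step can be pictured as a lookup table applied uniformly to every record. The sketch below is a hypothetical simplification with three assumptions. Records are flat dictionaries keyed by shortened ISO 19115 element paths. Only four elements are mapped, to common DCAT and Dublin Core terms. Unmapped elements are reported for manual review rather than silently dropped. The table, function, and record contents are illustrative, not the study's module.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from shortened ISO 19115 element paths to ontology
# properties; a complete mapping would cover many more elements.
ISO19115_TO_GMO: Dict[str, str] = {
    "identificationInfo/citation/title": "dct:title",
    "identificationInfo/abstract": "dct:description",
    "identificationInfo/descriptiveKeywords/keyword": "dcat:keyword",
    "identificationInfo/extent/geographicElement": "dct:spatial",
}


def transform_record(
    dataset_id: str, record: Dict[str, List[str]]
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Apply the mapping to one metadata record.

    Returns the generated (subject, property, value) triples and the list
    of element paths that have no mapping yet.
    """
    triples: List[Tuple[str, str, str]] = []
    unmapped: List[str] = []
    for element, values in record.items():
        prop = ISO19115_TO_GMO.get(element)
        if prop is None:
            unmapped.append(element)  # flag for manual review
            continue
        for value in values:
            triples.append((dataset_id, prop, value))
    return triples, unmapped


record = {
    "identificationInfo/citation/title": ["Daily precipitation grid"],
    "identificationInfo/descriptiveKeywords/keyword": ["precipitation", "rain"],
    "identificationInfo/resourceMaintenance": ["asNeeded"],
}
triples, unmapped = transform_record("ds-001", record)
print(triples)
print(unmapped)
```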

To assign dcat:theme values to geospatial datasets systematically, this study employs a methodology inspired by Li [30]. The process parses structured metadata, extracts descriptive information from fields such as the “Abstract”, “Title”, “Description”, and “Keyword” tags, and uses N-gram analysis to extract relevant keywords. A semantic analysis module then matches these keywords against the knowledge encoded in the ontology, which incorporates taxonomies from the GCMD. Themes are assigned by evaluating the hierarchical relationships between the matched concepts rather than by relying on single-word matches: the system considers a chain of related concepts ordered from broader to more specific categories. For example, within the GCMD taxonomy, the concept “Total Surface Precipitation Rate” is nested under the broader concept “Precipitation”, which itself falls under the even broader category “Atmosphere”. In this hierarchy, “Atmosphere” serves as the general feature theme, while “Precipitation” and “Total Surface Precipitation Rate” represent more detailed property themes.
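The keyword-matching step can be approximated as follows. This is a sketch under stated assumptions, not the module used in the study:

- Concept labels (and, in practice, skos:altLabel variants) are available as a lower-cased label-to-concept dictionary.
- The hierarchy is stored as a child-to-parent dictionary.
- When several concepts on the same branch match, only the most specific one is kept and reported with its full broader-to-narrower path.

The function names are hypothetical.

```python
import re
from typing import Dict, List, Set


def ngrams(text: str, max_n: int = 4) -> Set[str]:
    """All word n-grams of length 1..max_n in lower-cased text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def path_from_root(concept: str, parent: Dict[str, str]) -> List[str]:
    """Chain of concepts from the top of the hierarchy down to `concept`."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def assign_themes(
    fields: Dict[str, str], labels: Dict[str, str], parent: Dict[str, str]
) -> List[List[str]]:
    """Match n-grams from metadata fields to concepts and return the
    hierarchical path of each most specific matched concept."""
    grams: Set[str] = set()
    for text in fields.values():
        grams |= ngrams(text)
    matched = {labels[g] for g in grams if g in labels}
    # Concepts that are ancestors of another match are less specific; drop them.
    ancestors = {a for c in matched for a in path_from_root(c, parent)[:-1]}
    return sorted(path_from_root(c, parent) for c in matched - ancestors)


labels = {
    "atmosphere": "Atmosphere",
    "precipitation": "Precipitation",
    "total surface precipitation rate": "Total Surface Precipitation Rate",
}
parent = {
    "Precipitation": "Atmosphere",
    "Total Surface Precipitation Rate": "Precipitation",
}
fields = {
    "Title": "Monthly total surface precipitation rate",
    "Keyword": "precipitation",
}
print(assign_themes(fields, labels, parent))
# → [['Atmosphere', 'Precipitation', 'Total Surface Precipitation Rate']]
```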

3.4. Data Selection and Fitness for Use Processing

The next step in GeoFit is data selection and fitness-for-use processing. The GeoFit framework employs a semantic ranking model and faceted search for data selection to ensure fitness for use. This step begins with the creation of a semantic ranking model, which ranks the datasets by their semantic similarity to the user’s search query. Once the results are ranked, the system displays them so that users can access the datasets. If further refinement is necessary, faceted search filters the datasets according to specific criteria. The final output of this step is a set of ranked datasets that have been evaluated for their fitness for use, based on the user’s specific requirements.

By incorporating semantic ranking into the retrieval process, the system can rank datasets not only based on exact matches but also by considering their semantic proximity to the user’s query. To apply this enhanced ranking methodology, the concept of semantic distance is utilized, which measures the closeness between two concepts or individuals based on their semantic properties [62]. This is achieved by analyzing the hierarchical relationships defined within the ontology, where SKOS concepts establish skos:broader and skos:narrower relationships, and OWL2 provides the underlying structure that supports reasoning over these relationships. Semantic distance is then calculated by examining these relationships and determining the connections between concepts, including their common ancestors and relative positions within the hierarchy. Consequently, when a user seeks to retrieve datasets related to a particular theme, the ontology can infer connections between datasets that may not be directly linked but share a broader or narrower relationship through common ancestors or related concepts. This approach ensures that the system retrieves relevant results and ranks them according to their semantic distance. In this study, semantic similarity is computed using the GCMD vocabulary and its SKOS representation, which provide hierarchical relationships (skos:broader, skos:narrower) and lexical variants (skos:altLabel) aligned with Earth science concepts. General-purpose lexical resources such as WordNet were not applied, since the objective was to capture domain-specific semantics rather than purely linguistic similarity. The Q-TREE algorithm [62] is used as the foundation for computing the shortest distance within this structure. The relative order of subcategories associated with a given node is determined using the SKOS relationships, taking advantage of the fact that these relationships provide an ordered taxonomy. 
Given any two nodes v and z in the hierarchy, their distance d(v,z) is computed as follows:

$d(v,z) = d(v,r) + d(z,r) - 2\,d(\mathrm{LCA}(v,z), r)$

(1)

where d(v,r) and d(z,r) are the distances of nodes v and z from the root r (assuming both nodes share the same root; otherwise, they are not related to each other), and d(LCA(v,z),r) is the distance from the root to their lowest common ancestor.
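Equation (1) can be checked with a short sketch. Q-TREE [62] is designed to answer such queries efficiently; the code below does not reproduce it. Instead, it walks ancestor chains naively, assuming each concept has at most one skos:broader parent (a tree) stored in a child-to-parent dictionary. Function names are hypothetical, and `None` signals that the two concepts have different roots and are therefore unrelated.

```python
from typing import Dict, Optional


def depth(node: str, parent: Dict[str, str]) -> int:
    """Number of skos:broader steps from `node` up to its root, i.e. d(node, r)."""
    steps = 0
    while node in parent:
        node = parent[node]
        steps += 1
    return steps


def lowest_common_ancestor(v: str, z: str, parent: Dict[str, str]) -> Optional[str]:
    """First ancestor of z (z included) that is also an ancestor of v (v included)."""
    ancestors_of_v = {v}
    node = v
    while node in parent:
        node = parent[node]
        ancestors_of_v.add(node)
    node = z
    while node not in ancestors_of_v:
        if node not in parent:
            return None  # reached z's root without meeting v's chain
        node = parent[node]
    return node


def semantic_distance(v: str, z: str, parent: Dict[str, str]) -> Optional[int]:
    """Equation (1): d(v,z) = d(v,r) + d(z,r) - 2 d(LCA(v,z), r)."""
    lca = lowest_common_ancestor(v, z, parent)
    if lca is None:
        return None
    return depth(v, parent) + depth(z, parent) - 2 * depth(lca, parent)


# The precipitation hierarchy used in the example that follows.
parent = {
    "Precipitation": "Atmosphere",
    "Precipitation Amount": "Precipitation",
    "Hourly Precipitation Amount": "Precipitation Amount",
    "3 Hour Precipitation Amount": "Precipitation Amount",
}
print(semantic_distance("Hourly Precipitation Amount", "3 Hour Precipitation Amount", parent))
# → 2
```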

For instance, suppose a user searches for datasets that measure “Hourly Precipitation Amount”. The SKOS-based ontology places this concept in the hierarchy “Atmosphere” > “Precipitation” > “Precipitation Amount” > “Hourly Precipitation Amount”, in which each concept is narrower than the one before it. The system first retrieves datasets that match the search query. It then computes the semantic distance between this concept and other related themes, retrieving the relevant results and ranking them by that distance. In this example, Equation (1) gives a semantic distance of two between the query concept and each of the themes “3 Hour Precipitation Amount”, “6 Hour Precipitation Amount”, and “12 Hour Precipitation Amount”, which share the lowest common ancestor “Precipitation Amount” with it:

Atmosphere > Precipitation > Precipitation Amount > 3 Hour Precipitation Amount.
Atmosphere > Precipitation > Precipitation Amount > 6 Hour Precipitation Amount.
Atmosphere > Precipitation > Precipitation Amount > 12 Hour Precipitation Amount.
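The value of two follows directly from Equation (1). Take “Atmosphere” as the root r. Write h for “Hourly Precipitation Amount”, t for “3 Hour Precipitation Amount”, and a for their lowest common ancestor “Precipitation Amount”. Both leaf concepts lie three edges below r, and a lies two edges below it:

```latex
\begin{aligned}
d(h,t) &= d(h,r) + d(t,r) - 2\,d(\mathrm{LCA}(h,t), r) \\
       &= d(h,r) + d(t,r) - 2\,d(a, r) \\
       &= 3 + 3 - 2 \times 2 = 2
\end{aligned}
```

The same computation applies to the “6 Hour” and “12 Hour” themes, which occupy the same position in the hierarchy.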

The most relevant datasets will be assigned five stars, while the least relevant ones will receive one star. To determine the star ratings, the system analyzes the computed semantic distances. Datasets with the shortest distances to the user query concept (or highest semantic similarity) receive higher star ratings:

Five stars are assigned to datasets with exact matches.
Four stars are given to datasets with minimal semantic distance.
Three stars are assigned to moderately relevant datasets.
Two stars are given to datasets with lower relevance.
One star is assigned to datasets with the least relevance but still related to the query.

To normalize the ranking and map it onto this 5-star scale, the following steps are taken:

Compute Semantic Distances: First, the system computes the semantic distances between the user’s query concept and each dataset using the Q-TREE algorithm.
Determine Maximum and Minimum Distances: Identify the maximum and minimum semantic distances among the retrieved datasets to define the range of relevance. A maximum distance threshold of four for narrower and sibling concepts (themes in the SKOS GCMD ontology) is applied in this study. This threshold corresponds to the typical depth of the GCMD thematic hierarchies, which generally range from three to five levels. Distances greater than four usually indicate only very broad conceptual overlap (e.g., at the root level), which is not meaningful for ranking datasets in terms of fitness for use. The value of four, therefore, balances coverage of relevant semantic relations with interpretability in the 5-star ranking system. Although various optimization methods exist, such as those discussed by Hervey et al. [52], this threshold was selected to maintain a straightforward and practically interpretable approach in the context of this study.
Normalize Distances: Normalize the distances to a 0–1 scale using Equation (2):

$\mathrm{NormalizedDistance} = \frac{\mathrm{Distance} - \mathrm{MinDistance}}{\mathrm{MaxDistance} - \mathrm{MinDistance}}$

(2)
Map to Star Ratings: Convert the normalized distances to star ratings using predefined thresholds. The thresholds are selected to distribute the datasets evenly across the star ratings:
- Exact match: Five stars.
- (0, 0.2]: Four stars (minimal distance).
- (0.2, 0.4]: Three stars (moderately relevant).
- (0.4, 0.6]: Two stars (lower relevance).
- (0.6, 1]: One star (least relevance).
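The four steps can be combined into a single function, shown here as a minimal sketch. It assumes semantic distances have already been computed (step 1) and are passed in as a dataset-to-distance dictionary. The function name is hypothetical. The text does not specify two edge cases, so the sketch resolves them by assumption:

- If all retained distances are equal, the normalized distance is taken as 0.
- A non-exact match whose normalized distance is 0 (which happens when no exact match is retrieved) receives four stars.

```python
from typing import Dict

MAX_DISTANCE = 4  # maximum distance threshold used in this study


def star_ratings(distances: Dict[str, int], max_distance: int = MAX_DISTANCE) -> Dict[str, int]:
    """Map semantic distances (dataset -> distance) to 1-5 star ratings."""
    # Step 2: discard datasets beyond the threshold, then find the range.
    kept = {ds: d for ds, d in distances.items() if d <= max_distance}
    if not kept:
        return {}
    min_d, max_d = min(kept.values()), max(kept.values())

    ratings: Dict[str, int] = {}
    for ds, d in kept.items():
        if d == 0:
            ratings[ds] = 5  # exact match
            continue
        # Step 3: Equation (2); equal min and max is treated as 0 (assumption).
        norm = (d - min_d) / (max_d - min_d) if max_d > min_d else 0.0
        # Step 4: interval thresholds; norm == 0 for a non-exact match -> 4 stars (assumption).
        if norm <= 0.2:
            ratings[ds] = 4
        elif norm <= 0.4:
            ratings[ds] = 3
        elif norm <= 0.6:
            ratings[ds] = 2
        else:
            ratings[ds] = 1
    return ratings


print(star_ratings({"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5}))
# → {'a': 5, 'b': 3, 'c': 2, 'd': 1, 'e': 1}
```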

This approach is similar to the discretization techniques used in data mining, where continuous attributes are converted into categorical values [63]. This star-rating system provides users with a clear, intuitive understanding of the relevance of each dataset, facilitating more efficient data selection.

4. Ontology for Required Geospatial Information in Natural Hazard EWSs

Despite extensive research, no existing ontology fully models a natural hazard Early Warning System (EWS) together with the geospatial information and data it requires. Understanding the characteristics of key natural hazards and selecting appropriate geospatial data requires analysis of the preparatory phase of the hazard process. The preparatory phase, which includes both long-term environmental conditions and short-term dynamic factors, provides essential context for identifying which spatial variables are relevant for hazard assessment and early warning [64,65,66]. As proposed by Sättele et al. [66], natural-hazard processes can be characterized by three factors: (1) basic disposition, the general, long-term characteristics of an area, such as topography, land use, and soil texture, that increase the potential for natural-hazard occurrence; (2) variable disposition, the time-dependent parameters, such as rain, temperature, and wind; and (3) trigger events, which initiate natural-hazard processes when the combined dispositions (basic disposition + variable disposition) are significant. Recognizing these elements is essential for modeling hazard dynamics and informing EWSs.

An effective EWS encompasses four main elements: (1) risk knowledge, (2) monitoring and prediction, (3) dissemination of information, and (4) response [67]. Since the response element primarily involves educating communities about risks and reaction protocols, it falls outside the scope of this study, which concentrates instead on the geospatial data and information components that are critical for risk assessment, monitoring, and decision-making within EWSs.

For the risk knowledge element, a geospatial information system (GIS) is helpful for storing and regularly updating natural-hazard risk datasets and for accessing datasets published as web services. To make maps and data widely available in this element, it is essential to address key questions such as the following: Are the hazards and vulnerabilities well understood? What are the patterns and trends of hazards in the region? For the first question, geospatial data related to basic disposition, such as land use/cover and topography, serve to characterize hazards, while vulnerability-related geospatial data include information on surface and subsurface structures that are prone to risk [68]. One critical criterion for determining a region’s vulnerability is the population at risk, which can vary with factors such as season and time of day; time must therefore be considered when assessing the region’s overall vulnerability. For the second question, historical event data (e.g., historical flood data) are necessary to identify regional patterns and trends.

In the monitoring and prediction element, it is vital to continuously receive and process data related to the variable disposition factor, or to the natural-hazard process itself, in meaningful formats and in real time or near real time. Knowledge of natural-hazard warning signs can also help identify appropriate data sources for verifying hazard precursors; incidents that people observe and report at a specific location are therefore another data source. Moreover, when prediction systems incorporate statistical or machine learning models, historical event data can also serve as a critical input for trend analysis and hazard forecasting. This information enables an EWS to support dynamic and timely decision-making in response to natural hazards. The related real-time or near-real-time geospatial data are organized into four categories: remote sensing data, in situ observations, reported incidents, and predicted data.

Finally, for the dissemination element, decision-makers and responsible authorities need simple, usable warnings that they can understand and act on with confidence. Geospatial data about vulnerable areas and about the major factors of hazard risk (basic disposition) in the warned regions help clarify what the danger is and where it is greatest. Considering the real-time positions of people, and of moving objects in general, also gives decision-makers a clear understanding of when an event has a significant impact and who is in danger.

An ontology of the geospatial information required in natural hazard EWSs, named the GeoNHEWS ontology (Figure 4), has been developed to represent the elements of natural hazard EWSs and the geospatial information they require. Overall, this ontology aligns with international EWS principles by ensuring that risk knowledge is grounded in historical and static spatial data, monitoring and prediction are supported by dynamic and real-time observations, and dissemination is driven by spatially contextualized warnings and population data. Integrating the GeoNHEWS ontology into GeoFit lets users benefit from the framework’s enhanced capabilities for geospatial data discovery and selection. This integration is intended to reduce the need for domain expertise when querying for relevant geospatial data and to provide quick and convenient access to data in support of decision-making. The GeoFit framework can be adapted to other domains by specifying the geospatial data themes they require. In this way, an application ontology (e.g., the GeoNHEWS ontology) integrates with the GMO, enabling the system to perform further discovery through the ontology-based data discovery and selection framework.

In the OWL representation, the core EWS elements are captured using dedicated classes. For example, EWS:RiskKnowledge is defined with object properties linking to EWS:EventHistory, EWS:BasicDisposition, and EWS:Vulnerability. These relationships reflect the conceptual elements discussed earlier and ensure the ontology captures both static and dynamic components of hazard understanding. Likewise, EWS:Monitoring is connected to EWS:VariableDisposition, while EWS:Dissemination reuses disposition and vulnerability links to inform decision-making and communication. To provide information for measuring dispositions and vulnerabilities, the ontology uses the classes :FeatureTheme and :PropertyTheme. :FeatureTheme represents ultimate features of interest, such as rivers, buildings, and other infrastructure, which are needed to determine basic dispositions and vulnerabilities. :PropertyTheme represents observable or measurable characteristics used to assess dispositions, such as changes in soil moisture or temperature, as part of the monitoring process. The GeoNHEWS ontology integrates with the GMO through instances of the :FeatureTheme and :PropertyTheme classes, which define dataset themes. The GeoNHEWS ontology was developed with the same procedure used for the geospatial metadata ontology and is implemented in OWL 2 using Protégé.
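The sketch below shows how these links let the system move from an EWS element to dataset themes, with plain dictionaries standing in for the OWL individuals and object properties. The element-to-component links follow the description above. The hazard-specific theme instances (e.g., `:PrecipitationRate`, `:Topography`) and the function name are hypothetical placeholders. A deployed system would express the same traversal as a SPARQL query over the ontology. The traversal mirrors the kind of question posed as competency question 1 in Section 6.3.

```python
from typing import Dict, List, Tuple

# Links from EWS elements to the components they draw on (from the text).
ELEMENT_COMPONENTS: Dict[str, List[str]] = {
    "EWS:RiskKnowledge": ["EWS:EventHistory", "EWS:BasicDisposition", "EWS:Vulnerability"],
    "EWS:Monitoring": ["EWS:VariableDisposition"],
    "EWS:Dissemination": ["EWS:BasicDisposition", "EWS:Vulnerability"],
}

# Hypothetical (hazard, component) -> theme instances; placeholders only.
COMPONENT_THEMES: Dict[Tuple[str, str], List[str]] = {
    ("Flood", "EWS:VariableDisposition"): [":PrecipitationRate", ":Snowmelt", ":SoilMoisture"],
    ("Flood", "EWS:BasicDisposition"): [":Topography", ":LandUse"],
    ("Flood", "EWS:Vulnerability"): [":Buildings"],
}


def themes_for(element: str, hazard: str) -> List[str]:
    """Collect the :FeatureTheme/:PropertyTheme instances reachable from an
    EWS element for a given hazard, without duplicates, in link order."""
    themes: List[str] = []
    for component in ELEMENT_COMPONENTS.get(element, []):
        for theme in COMPONENT_THEMES.get((hazard, component), []):
            if theme not in themes:
                themes.append(theme)
    return themes


print(themes_for("EWS:Monitoring", "Flood"))
# → [':PrecipitationRate', ':Snowmelt', ':SoilMoisture']
```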

5. Testing the Feasibility of the GeoFit Framework in a Geoportal

A geoportal is designed and implemented to demonstrate the feasibility of incorporating the proposed ontology-based data discovery and selection framework within geoportals. The following sections first show how the framework fits into the architecture of conventional geoportals and then explain how the approach works in practice.

5.1. System Architecture

The components shown in black in Figure 5 represent the architecture of a conventional geoportal, as described in the SDI Cookbook [69] and supported by the OGC Catalogue Services Specification 2.0.2 [70]. It starts with a user searching for specific geospatial resources through the user interface. The user’s request is sent to the related catalog services by the catalog portal. Each catalog service queries the metadata database and retrieves geospatial resources that meet the specified search criteria. The catalog portal then aggregates and presents the results to the user through the user interface.

As shown in Figure 5, the system architecture extends the conventional geoportal model (black components) by incorporating additional components (in green) to support enhanced capabilities. The core added component is the GeoFit Engine, which integrates an ontology-based approach for semantic enrichment and data processing. The GeoFit Engine is integrated within the catalog service, enhancing its capabilities while maintaining alignment with the conventional geoportal system architecture. This engine interacts with the ontology-based metadata to perform complex queries and generates a ranked list of results by calculating the semantic similarity between the user’s query and the themes of datasets. The Faceted Search Module is integrated into the user interface, allowing the dynamic filtering and updating of search results based on user interaction. This module sends queries to the catalog portal, which then forwards them to the catalog service and the GeoFit Engine, ensuring refined and relevant search results are presented back to the user.

5.2. Technical Support for Testing the Feasibility of the Ontology-Based Geospatial Data Discovery Framework

In evaluating the feasibility of our system architecture, Spring Boot (version 3.2.0; Pivotal Software, San Francisco, CA, USA) is selected as the primary framework due to its robust capabilities in building scalable and enterprise-level Java applications. This choice is particularly advantageous given the need to integrate Apache Jena (version 4.9.0; The Apache Software Foundation, Forest Hill, MD, USA), which is also Java-based, ensuring a seamless and consistent development process. Alternative frameworks like Django (version 5.0; Django Software Foundation, Lawrence, KS, USA) and Node.js (version 20.8; OpenJS Foundation, San Mateo, CA, USA) were considered, but Spring Boot is preferred for its extensive ecosystem, strong community support, and ease of integration with other Java-based tools. Spring Boot’s capabilities enable the integration of front-end technologies such as Thymeleaf (version 3.1.1; The Thymeleaf Project, Seville, Spain) for dynamic HTML rendering. Thymeleaf offers seamless integration with Spring Boot and provides a natural syntax for creating server-side rendered views, while maintaining the flexibility required for complex applications. For interactive map functionalities, several options are available, including Leaflet (version 1.9.4; Vladimir Agafonkin, Kyiv, Ukraine), OpenLayers (version 8.2.0; OpenLayers Contributors, Worldwide), and the Google Maps API (Google LLC, Mountain View, CA, USA). Each of these libraries offers robust capabilities for web-based mapping. Leaflet is chosen for its simplicity, ease of use, and lightweight design. Apache Tomcat (version 10.1.13; The Apache Software Foundation, Forest Hill, MD, USA) serves as the server environment for hosting the application, as it is the default embedded server in Spring Boot and offers ease of configuration along with broad adoption in the Java ecosystem. 
Among RDF stores, options such as Virtuoso (version 7.2.9; OpenLink Software, Burlington, MA, USA) and RDF4J (version 4.3.5; Eclipse Foundation, Brussels, Belgium) were considered; however, Apache Jena is selected for its comprehensive feature set and, in particular, its robust native support for GeoSPARQL. Alternatives such as RDFLib (version 6.3.2; RDFLib Team, Worldwide) and Owlready2 (version 0.44; LORIA, Nancy, France) were also reviewed, but their lack of native GeoSPARQL support excluded them from further consideration. Apache Jena Fuseki (version 4.9.0; The Apache Software Foundation, Forest Hill, MD, USA) is used to handle direct SPARQL queries; its built-in user interface, known as the Fuseki Admin Interface, provides a query form and a results display.

5.3. System Workflow for Feasibility Testing

Key elements of the system workflow focus on guiding the user through the processes of dataset search, filtering, and ranking. These components are designed to facilitate efficient data discovery by supporting interactive exploration and semantically enriched selection. The key elements involved in the system workflow are (a) User Search Interaction, (b) Faceted Search Module, and (c) Semantic Ranking. Each of these elements will be discussed in detail:

(a): User Search Interaction: The system’s web interface is designed with a user-centric approach to streamline data discovery. As shown in Figure 6, users start by entering a search query into the system’s web interface. This search box supports free-text queries, allowing users to input keywords or phrases related to the geospatial data they are looking for. Additionally, the interface includes a button for SPARQL queries, enabling users to perform advanced and precise searches using SPARQL. The auto-completion function calculates the semantic distance between the search query and theme terminologies, suggesting possible themes during typing.
(b): Faceted Search Module: The faceted search module provides search filters based on the initial search results. The user interface sends a query for faceted search filters to the catalog portal, which forwards it to the catalog service and the GeoFit Engine. The GeoFit Engine queries the ontology for filter data, which is sent back through the same path to the user interface. The user interface then updates the faceted search filters and displays the results and updated filters to the user. Figure 7 shows the spatial coverage filter and the sensor facet.
(c): Semantic Ranking: To test dataset relevance and semantic ranking in the geoportal, a total of 75 geospatial datasets related to three main themes were selected from relevant NASA data archives: Precipitation (47 datasets), Soil Moisture and Water Content (15 datasets), and Digital Terrain Model (DTM) and Digital Elevation Model (DEM) (13 datasets). The datasets were sourced from the following NASA data centers:

Precipitation datasets from the NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) (Available online: https://disc.gsfc.nasa.gov/, accessed on 19 September 2025).
Soil Moisture/Water Content datasets from the NASA National Snow and Ice Data Center (NSIDC) (Available online: https://nsidc.org/home, accessed on 19 September 2025).
DTM/DEM datasets from the NASA Land Processes Distributed Active Archive Center (LP DAAC) (Available online: https://lpdaac.usgs.gov/, accessed on 19 September 2025).

Not all datasets match their theme exactly; some are annotated with related themes. For instance, within the Precipitation theme, related themes include precipitation, rain, liquid precipitation, precipitation rate, precipitation amount, and total surface precipitation rate, among others. These related themes were chosen to assess the system’s ability to handle semantic similarity and ranking, that is, to retrieve relevant datasets and rank them by their relevance.

Figure 8 presents the search results for “Precipitation”, for which the system retrieved all 47 datasets associated with this theme. In contrast, for the query “Total surface precipitation rate” (Figure 9), the system retrieved only 24 results. This reduction reflects the more focused nature of the query. For the broader “Precipitation” search, all 47 datasets fell within the acceptable range (the maximum distance threshold of four). For the more specific “Total surface precipitation rate” search, only datasets whose themes were narrower or sibling concepts in the SKOS GCMD ontology at a semantic distance of four or less were included, ensuring that the results were highly relevant to the narrower query.
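The effect of the distance threshold on the number of results can be illustrated with a toy example. The sketch below is an assumption-laden simplification:

- It computes distances by breadth-first search over skos:broader links treated as undirected edges, which in a tree gives the same value as Equation (1).
- It applies the threshold to every concept regardless of direction, rather than only to narrower and sibling concepts.
- It uses a small invented hierarchy and dataset list rather than the actual GCMD content or the NASA datasets.

```python
from collections import deque
from typing import Dict, List, Tuple


def distances_from(query: str, parent: Dict[str, str]) -> Dict[str, int]:
    """Breadth-first search over skos:broader links treated as undirected edges."""
    neighbours: Dict[str, List[str]] = {}
    for child, broader in parent.items():
        neighbours.setdefault(child, []).append(broader)
        neighbours.setdefault(broader, []).append(child)
    dist = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def retrieve(query: str, dataset_themes: Dict[str, str], parent: Dict[str, str],
             max_distance: int = 4) -> List[Tuple[str, int]]:
    """Datasets whose theme lies within max_distance of the query, closest first."""
    dist = distances_from(query, parent)
    hits = [(ds, dist[theme]) for ds, theme in dataset_themes.items()
            if theme in dist and dist[theme] <= max_distance]
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Invented toy hierarchy and datasets, for illustration only.
parent = {
    "Precipitation": "Atmosphere",
    "Precipitation Rate": "Precipitation",
    "Total Surface Precipitation Rate": "Precipitation Rate",
    "Liquid Precipitation": "Precipitation",
    "Rain": "Liquid Precipitation",
    "Soil Moisture": "Land Surface",
}
dataset_themes = {
    "ds1": "Total Surface Precipitation Rate",
    "ds2": "Rain",
    "ds3": "Precipitation",
    "ds4": "Soil Moisture",
}
print(retrieve("Precipitation", dataset_themes, parent))
# → [('ds3', 0), ('ds1', 2), ('ds2', 2)]
print(retrieve("Total Surface Precipitation Rate", dataset_themes, parent, max_distance=3))
# → [('ds1', 0), ('ds3', 2)]
```

A more specific query, or a tighter threshold, shrinks the result set, which mirrors the qualitative difference between the two searches described above.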

The system’s performance in retrieving and ranking datasets is strongly influenced by the use of the GCMD taxonomy, which is aligned with NASA’s standards for organizing and categorizing datasets. Because the NASA datasets are already annotated with GCMD keywords, incorporating this taxonomy allowed the system to classify them accurately within the geoportal. Furthermore, the geospatial metadata ontology was developed to accommodate the different controlled vocabularies used by various geoportals. If a geoportal employs a different taxonomy or vocabulary, the system can be configured to integrate that vocabulary and categorize datasets in line with the geoportal’s standards. This adaptability is intended to help maintain accuracy across datasets from different sources by using the relevant taxonomy for each. Although the ontology-based data-discovery framework was tested only with the GCMD vocabulary and NASA datasets, the system is expected to classify and retrieve datasets effectively even for geoportals lacking a specific controlled vocabulary. This expectation draws on the work of Li [30], which demonstrated that, even without a predefined taxonomy, combining thematic classification with semantic analysis can significantly improve dataset retrieval, achieving an overall accuracy above 90% for datasets across different categories.

6. Evaluating the Added Value of GeoFit in Natural Hazard EWS Applications

6.1. Showcase on River Flood in Nunavik

To evaluate the feasibility and added value of the GeoFit approach combined with the GeoNHEWS ontology, the approach is tested through a river flood EWS showcase in Nunavik. First, the flood-associated information to be populated in the GeoNHEWS ontology (i.e., basic disposition, variable disposition, and vulnerabilities) is identified; next, the ontology is tested with the geoportal. All information gathered for the river flood EWS showcase in Nunavik originates from comprehensive investigations documented by Allard et al. [71,72,73,74,75,76,77,78,79,80,81] and Deslauriers et al. [82]. For instance, Table 1 presents the main required information and data themes related to the basic and variable dispositions of river flooding in Nunavik.

6.2. Geoportal Interface for GeoNHEWS Data Discovery System

The geoportal interface developed for the GeoNHEWS data discovery system provides a user-friendly environment for exploring geospatial information and data. In addition to the data-discovery capabilities discussed so far, the web portal includes interactive controls that let users select an EWS element. Based on this selection, users can then choose the natural hazards relevant to the selected EWS element, as shown in Figure 10. These selections are used to retrieve geospatial information and the data themes that were semantically defined and linked in the GeoNHEWS ontology.

6.3. Testing the Ontology-Based Discovery Framework of EWS-Related Geospatial Data for the Flood

To assess how the system supports data discovery, two levels of validation are performed. The first level relies on competency questions. The second level compares dataset relevance and semantic ranking with manual verification performed by the authors. Competency questions help determine whether the ontology can correctly and quickly link concepts, support domain experts in locating appropriate geospatial data, and provide relevant and comprehensive information for decision-making. Specifically, competency question 1 asks “What are observed properties that can be used in the monitoring element of EWS for flood events?” and is designed to confirm that the links between EWS concepts, natural-hazard processes, and their associated geospatial information are correct. Figure 11 illustrates the results of competency question 1, with Figure 11a showing the outputs in Protégé and Figure 11b showing the outputs in the geoportal.

Competency question 2, “What additional observed properties, beyond direct measurements such as rising water levels, may be needed in the discovery process of datasets for flood event monitoring?”, is intended to assess whether the ontology supports the identification of conditions that trigger flood events. These may include properties such as precipitation rate, snowmelt, or soil saturation, which are not direct measurements of flooding but play a crucial role in its onset. Figure 12 shows the results of competency question 2 in Protégé and in the geoportal. As illustrated, users can select this suggested information and search for related geospatial datasets. The system then uses the selected geospatial information as the search query and returns results using the ontology-based data-discovery framework within the GeoFit approach. For example, when “Total surface precipitation rate” is selected, the system retrieves the datasets shown in Figure 13.

The same logic was used to verify all geospatial information represented in the geoportal. For example, questions such as “What are the vulnerable entities for flood hazards in Nunavik?” and “What are the basic dispositions for flood hazards in Nunavik?” were successfully answered by the geoportal. The results of these competency questions show that the ontology can provide the necessary geospatial information and potential geospatial datasets. Moreover, through its inference capabilities, the GeoNHEWS ontology can suggest related geospatial information and potential datasets that domain experts may use in an EWS.

7. Discussion and Conclusions

In this paper, we have presented a new framework, called GeoFit, that seeks to improve the discovery of geospatial data, and we evaluated its relevance to the needs of various geoportal users. The focus was on identifying and organizing crucial geospatial information and data sources for natural hazard EWSs.

The GeoFit framework consists of (1) assembling and introducing a geospatial metadata ontology (GMO) by reusing W3C and OGC ontologies, and (2) using GMO in the discovery and selection process to support data fitness for use. The framework was shown to improve data discovery and selection by relying on semantic technologies to address data integration, usability, and interoperability, and ultimately to extend the inherent capabilities of geoportals for data selection and fitness-for-use assessment. Among its core components, GMO plays a central role in achieving these improvements. Its primary advantage is its ability to unify the dataset aspects essential for discovery and to integrate diverse geospatial data sources into a single, cohesive metadata model. Additionally, this semantic metadata model is built entirely on the recommendations of ISO 19115 and other well-known semantic standards. Thus, rather than proposing a new standard for data producers to publish their metadata, the model gathers and structures semantic information from the original metadata and enhances the capabilities of existing geoportals to improve data selection.

An added benefit of GMO is its support for thematic classification. Using :FeatureTheme and :PropertyTheme, each dataset’s theme is defined as a hierarchy of concepts rather than as isolated words. This thematic classification allows geoportals to incorporate the description of domain entities and their relationships during the search process. Rather than relying on basic keyword matching, the geoportal can therefore search over a hierarchy of concepts. For example, when searching for “sea hourly water level”, instead of treating “sea”, “hourly”, “water”, and “level” individually, the system can interpret the user’s intention and retrieve datasets related to “water levels” of the “sea” at an “hourly” resolution. Thematic classification also enables semantic ranking, allowing the system to evaluate and rank datasets by how closely their themes match the user’s search intent. This process retrieves datasets not only by interpreting the user’s intention through a hierarchy of concepts, but also by identifying and ranking datasets that are semantically similar to the intended dataset, thereby increasing the relevance of search results.
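One way to picture ranking by conceptual proximity is to measure path length in the theme hierarchies. The minimal sketch below assumes the query has already been mapped onto a feature theme, a property theme, and a temporal resolution (as in the “sea hourly water level” example), and scores each dataset by the summed number of edges to the lowest common ancestor in each hierarchy. Both hierarchies and the catalog are hypothetical, and edge-count distance is a common proximity measure chosen here for illustration; it is not necessarily the measure GeoFit uses.

```python
# Hypothetical child -> parent links for the two theme hierarchies.
FEATURE_PARENT = {
    "Sea": "Ocean", "Ocean": "Hydrosphere",
    "River": "Surface water", "Lake": "Surface water",
    "Surface water": "Hydrosphere",
}
PROPERTY_PARENT = {
    "Water level": "Hydrologic property",
    "Water temperature": "Hydrologic property",
    "Discharge rate": "Hydrologic property",
    "Precipitation rate": "Atmospheric property",
    "Hydrologic property": "Property",
    "Atmospheric property": "Property",
}


def ancestors(concept, parent):
    """Return [concept, parent, grandparent, ...] up to the root."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def tree_distance(a, b, parent):
    """Edges on the path a -> lowest common ancestor -> b, or None if the
    two concepts share no ancestor in this hierarchy."""
    depth_in_b = {c: i for i, c in enumerate(ancestors(b, parent))}
    for i, c in enumerate(ancestors(a, parent)):
        if c in depth_in_b:
            return i + depth_in_b[c]
    return None


def rank(query, datasets):
    """query and each dataset are (feature theme, property theme, resolution).
    Order by summed theme distance, then exact resolution match, then id."""
    scored = []
    for ds_id, (feature, prop, resolution) in datasets.items():
        d_feature = tree_distance(query[0], feature, FEATURE_PARENT)
        d_property = tree_distance(query[1], prop, PROPERTY_PARENT)
        if d_feature is None or d_property is None:
            continue  # thematically unrelated: not retrieved
        scored.append((d_feature + d_property, resolution != query[2], ds_id))
    return [ds_id for _, _, ds_id in sorted(scored)]


DATASETS = {  # hypothetical catalog entries
    "tide-gauges-hourly": ("Sea", "Water level", "hourly"),
    "tide-gauges-daily": ("Sea", "Water level", "daily"),
    "river-gauges-hourly": ("River", "Water level", "hourly"),
    "sea-temperature-hourly": ("Sea", "Water temperature", "hourly"),
    "precip-hourly": ("Sea", "Precipitation rate", "hourly"),
}
print(rank(("Sea", "Water level", "hourly"), DATASETS))
```

The hourly sea water level dataset ranks first and its daily counterpart second, while the river gauges and the sea precipitation dataset tie at distance 4; weighting the two hierarchies differently would be one way to break such ties.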

The GeoNHEWS ontology was able to discover and select suitable geospatial datasets by establishing relationships between key concepts in natural hazard EWSs and the necessary geospatial information. With respect to the requirements of natural hazard EWSs, the knowledge base was able to suggest geospatial information that is directly or indirectly vital when establishing the different parts of an EWS. Compared with other work on natural hazard EWSs, GeoNHEWS is not limited to a specific region or to existing tasks. Our research indicates that many places, such as Nunavik, have no established natural hazard EWS, and creating one requires a roadmap that identifies the geospatial information and datasets needed for each element of an EWS. While some research has addressed the gathering of these datasets, there is no established environment for assessing how well they fit the needs of EWS elements. Our proposal integrates GeoFit to discover the required geospatial datasets and introduces the concept of fitness for use into the selection of geospatial datasets for natural hazard EWSs, setting the stage for future studies.

The central contribution of this research is architectural: the design and integration of a metadata ontology that enriches geoportal infrastructures with semantic capabilities. The evaluation strategy focused on feasibility and applicability rather than on optimizing retrieval accuracy. The ontology layer was shown to operate within a conventional geoportal architecture and to enable new capabilities: semantic expansion of queries, hierarchical ranking of related datasets, and ontology-driven filtering. These capabilities were exercised on NASA Earth observation datasets, which already include thematic tags from the GCMD vocabulary. As a result, classification quality was ensured by design, and the evaluation illustrated how these existing annotations can be operationalized semantically through an ontology structure. The study therefore concentrated on demonstrating that ontology integration is both possible and beneficial for extending discovery mechanisms in practical contexts such as natural hazard EWSs.
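Because the NASA datasets already carry hierarchical GCMD tags, ontology-driven filtering can be pictured as selecting a node in the hierarchy and keeping every dataset tagged at or below it. The minimal sketch below works under that assumption: keyword paths are split on “>”, facet counts are computed for the children of the selected node, and filtering uses prefix matching on the path. The dataset identifiers are hypothetical, the keyword paths are illustrative examples written in the GCMD “Category > Topic > Term > Variable” style, and in GMO the hierarchy would come from ontology relations rather than from splitting strings.

```python
from collections import Counter

# Hypothetical datasets tagged with keyword paths written in the GCMD
# "Category > Topic > Term > Variable" style; the paths are illustrative.
DATASETS = {
    "ds-precip-rate": "EARTH SCIENCE > ATMOSPHERE > PRECIPITATION > PRECIPITATION RATE",
    "ds-precip-amount": "EARTH SCIENCE > ATMOSPHERE > PRECIPITATION > PRECIPITATION AMOUNT",
    "ds-sea-height": "EARTH SCIENCE > OCEANS > SEA SURFACE TOPOGRAPHY > SEA SURFACE HEIGHT",
    "ds-swe": "EARTH SCIENCE > CRYOSPHERE > SNOW/ICE > SNOW WATER EQUIVALENT",
}


def path(keyword):
    """Split a hierarchical keyword into a tuple of concept labels."""
    return tuple(part.strip() for part in keyword.split(">"))


def facet_counts(datasets, selected=()):
    """Count datasets under each child of the selected node, which is how a
    facet panel can show the next level of the hierarchy with result counts."""
    depth = len(selected)
    counts = Counter()
    for keyword in datasets.values():
        p = path(keyword)
        if p[:depth] == tuple(selected) and len(p) > depth:
            counts[p[depth]] += 1
    return dict(counts)


def filter_by_concept(datasets, selected):
    """Keep datasets tagged with the selected concept or any descendant."""
    selected = tuple(selected)
    return sorted(ds for ds, kw in datasets.items() if path(kw)[:len(selected)] == selected)


print(facet_counts(DATASETS, ("EARTH SCIENCE",)))
print(filter_by_concept(DATASETS, ("EARTH SCIENCE", "ATMOSPHERE", "PRECIPITATION")))
```

Selecting a broader node keeps every narrower dataset, so a user who drills down from “ATMOSPHERE” to “PRECIPITATION” sees both the rate and the amount datasets without typing either term.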

Cross-portal benchmarking (for example, against NASA Earthdata Search) would not be informative for this contribution. Each portal indexes different datasets, applies distinct vocabularies, and often uses proprietary ranking features, so differences in retrieval performance would reflect those factors rather than the ontology layer. A more appropriate test is a within-system comparison that assesses capabilities with and without ontology support on the same catalog service. Results demonstrated that ontology integration enables semantic capabilities not available in conventional metadata-driven portals.
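The within-system comparison described above holds the catalog, the matching function, and the relevance judgements fixed, and varies only whether the query is expanded through the ontology. The minimal sketch below assumes a toy catalog, a hypothetical fragment of narrower concepts, and a hypothetical relevance set; the precision and recall it prints illustrate the mechanics of such a comparison and are not results from this study.

```python
# Hypothetical catalog: dataset id -> set of keyword strings.
CATALOG = {
    "d1": {"precipitation"},
    "d2": {"precipitation rate"},
    "d3": {"rainfall"},
    "d4": {"snowfall"},
    "d5": {"sea surface temperature"},
}
# Hypothetical ontology fragment: concept -> narrower concepts.
NARROWER = {"precipitation": ["precipitation rate", "rainfall", "snowfall"]}
# Hypothetical manual relevance judgement for the query "precipitation".
RELEVANT = {"d1", "d2", "d3", "d4"}


def matches(terms, keywords):
    """True if any query term occurs inside any dataset keyword."""
    return any(t in kw for t in terms for kw in keywords)


def keyword_search(term):
    """Baseline: substring matching on the raw query term only."""
    return {d for d, kws in CATALOG.items() if matches({term}, kws)}


def expand(term):
    """Collect the term and all transitively narrower concepts."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(NARROWER.get(t, []))
    return seen


def ontology_search(term):
    """Same matcher and catalog, but with the expanded term set."""
    terms = expand(term)
    return {d for d, kws in CATALOG.items() if matches(terms, kws)}


def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall (0.0 for empty denominators)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


for name, search in [("keyword", keyword_search), ("ontology", ontology_search)]:
    retrieved = search("precipitation")
    print(name, sorted(retrieved), precision_recall(retrieved, RELEVANT))
```

Because only the query-expansion step differs between the two runs, any change in the scores can be attributed to the ontology layer, which is the property a cross-portal comparison lacks.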

The ontology-based framework is also retrieval-model agnostic. Its semantic substrate of explicit concepts, relationships, and constraints can support diverse retrieval approaches, from keyword matching to neural embeddings and learning-to-rank methods. Established retrieval methods could be combined with the ontology structure to further improve accuracy and user satisfaction. The framework therefore provides both an immediate enhancement of semantic capabilities and a foundation for integrating future retrieval techniques.
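Because the ontology contributes explicit concepts and distances rather than a particular scoring model, it can be blended with whatever retriever a geoportal already runs. The sketch below shows one simple option, a linear late fusion of a normalised base score with an ontology proximity score; the weighting scheme, the 1/(1 + distance) proximity, and all names and numbers are assumptions for illustration, not part of the GeoFit specification.

```python
from typing import Callable, Iterable, List


def hybrid_rank(candidates: Iterable[str],
                base_score: Callable[[str], float],
                concept_distance: Callable[[str], int],
                alpha: float = 0.5) -> List[str]:
    """Rank candidates by alpha * base + (1 - alpha) * proximity.

    base_score: any retriever's score, assumed normalised to [0, 1]
    (keyword, embedding, or learning-to-rank output).
    concept_distance: edges between the dataset theme and the query concept
    in the ontology; proximity = 1 / (1 + distance) lies in (0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    def score(doc: str) -> float:
        proximity = 1.0 / (1.0 + concept_distance(doc))
        return alpha * base_score(doc) + (1.0 - alpha) * proximity

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda d: (-score(d), d))


# Hypothetical scores from an arbitrary retriever and hypothetical distances.
BASE = {"a": 0.9, "b": 0.6, "c": 0.8}
DISTANCE = {"a": 4, "b": 0, "c": 1}

print(hybrid_rank(BASE, BASE.__getitem__, DISTANCE.__getitem__, alpha=0.5))
print(hybrid_rank(BASE, BASE.__getitem__, DISTANCE.__getitem__, alpha=1.0))
```

With alpha = 1.0 the ordering is exactly that of the base retriever; lowering alpha lets datasets that sit close to the query concept in the ontology move up, so the retriever can be swapped without touching the ontology side.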

While the model offers benefits, including more integrated and effective discovery across various geospatial data sources, it also has certain limitations. During metadata transformation, we primarily mapped ISO 19115 metadata elements onto our system. However, metadata collections may follow other formats, such as the Directory Interchange Format (DIF), which was not addressed in this study. This could limit the interoperability and integration of datasets whose metadata do not conform to ISO 19115. Another limitation concerns the search method. Although this research focused on semantic improvements, we did not fully incorporate natural language query techniques. Natural Language Processing (NLP) and Large Language Models (LLMs) could enhance user interaction by enabling more intuitive, conversational searches that would complement our ontology-based framework.
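The ISO 19115 mapping step can be pictured as reading a handful of metadata elements and re-expressing them as RDF statements. The minimal sketch below assumes records encoded in the ISO 19139 XML schema (gmd/gco namespaces) and uses dct:title and dcat:keyword as plausible W3C targets; the actual GMO properties, the dataset IRI base, and the sample record are hypothetical. A DIF record would need its own parser and mapping table, which is the gap the limitation above refers to.

```python
import xml.etree.ElementTree as ET

# ISO 19139 (XML encoding of ISO 19115) namespaces.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
DCT_TITLE = "http://purl.org/dc/terms/title"
DCAT_KEYWORD = "http://www.w3.org/ns/dcat#keyword"
DATASET_NS = "https://example.org/dataset/"  # hypothetical IRI base

TITLE_PATH = ("gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
              "gmd:CI_Citation/gmd:title/gco:CharacterString")
KEYWORD_PATH = (".//gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/"
                "gco:CharacterString")


def iso19139_to_triples(xml_text):
    """Map a few ISO 19115 elements to (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text.strip())
    identifier = root.findtext("gmd:fileIdentifier/gco:CharacterString", namespaces=NS)
    if not identifier or not identifier.strip():
        raise ValueError("record has no gmd:fileIdentifier")
    subject = DATASET_NS + identifier.strip()
    triples = []
    title = root.findtext(TITLE_PATH, namespaces=NS)
    if title and title.strip():
        triples.append((subject, DCT_TITLE, title.strip()))
    for element in root.findall(KEYWORD_PATH, NS):
        if element.text and element.text.strip():
            triples.append((subject, DCAT_KEYWORD, element.text.strip()))
    return triples


SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier><gco:CharacterString>example-001</gco:CharacterString></gmd:fileIdentifier>
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:citation><gmd:CI_Citation><gmd:title>
      <gco:CharacterString>Hourly total surface precipitation rate</gco:CharacterString>
    </gmd:title></gmd:CI_Citation></gmd:citation>
    <gmd:descriptiveKeywords><gmd:MD_Keywords><gmd:keyword>
      <gco:CharacterString>EARTH SCIENCE &gt; ATMOSPHERE &gt; PRECIPITATION &gt; PRECIPITATION RATE</gco:CharacterString>
    </gmd:keyword></gmd:MD_Keywords></gmd:descriptiveKeywords>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""

for triple in iso19139_to_triples(SAMPLE):
    print(triple)
```

Keeping the XPath expressions in named constants makes the mapping table explicit, so supporting another element (or another metadata format) means adding paths rather than rewriting the parser.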

Although the ontology-based framework enhances data discovery, relying solely on metadata may still exclude valuable datasets whose metadata are of poor quality. To address this limitation, the GeoFit framework is currently being extended with a geospatial data recommender system that leverages user behavior and interaction history to suggest relevant datasets.

Author Contributions

Conceptualization, methodology, and validation, Amirhossein Vahdat; investigation, software, formal analysis, and writing—original draft preparation, Amirhossein Vahdat; writing—review and editing, Amirhossein Vahdat, Jacynthe Pouliot, and Thierry Badard; supervision, Jacynthe Pouliot and Thierry Badard. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Sentinel North program of Université Laval (via a research fund to R. Fortier and T. Badard), made possible, in part, thanks to funding from the Canada First Research Excellence Fund, under project number GF522489.

Acknowledgments

The authors wish to acknowledge the other members of the Working Group ‘Qaujjikairit project’, especially Richard Fortier.

Conflicts of Interest

The authors declare no conflict of interest.

References

Pulwarty, R.S.; Sivakumar, M.V.K. Information systems in a changing climate: Early warnings and drought risk management. Weather Clim. Extrem. 2014, 3, 14–21.
Zschau, J.; Küppers, A.N. Early Warning Systems for Natural Disaster Reduction; Springer Science & Business Media: Berlin, Germany, 2013; pp. 3-1–3-12.
Prudhomme, C.; Homburg, T.; Ponciano, J.J.; Boochs, F.; Cruz, C.; Roxin, A.M. Interpretation and automatic integration of geospatial data into the Semantic Web: Towards a process of automatic geospatial data interpretation, classification and integration using semantic technologies. Computing 2020, 102, 365–391.
Yu, M.; Yang, C.; Li, Y. Big data in natural disaster management: A review. Geosciences 2018, 8, 165.
Gomes, V.C.F.; Queiroz, G.R.; Ferreira, K.R. An overview of platforms for big earth observation data management and analysis. Remote Sens. 2020, 12, 1253.
Statistics Canada. Census Profile. 2021 Census of Population 2023. Available online: https://www12.statcan.gc.ca/census-recensement/2021/dp-pd/prof/index.cfm?Lang=E (accessed on 9 February 2022).
Pérez Bello, A. Extreme Precipitation and Temperature Relationship in Current and Future Climate/Relations Entre Précipitations Extrêmes et Température en Climat Actuel et Futur. Ph.D. Thesis, Université du Québec, Institut National de la Recherche Scientifique, Québec, QC, Canada, 2023; 175p.
Sheremata, M.; Tsuji, L.; Gough, W.A. Collaborative uses of geospatial technology to support climate change adaptation in indigenous communities of the Circumpolar North. In Geospatial Technology-Environmental and Social Applications; InTech: Rijeka, Croatia, 2016; pp. 197–215.
Gregory, K.; Groth, P.; Cousijn, H.; Scharnhorst, A.; Wyatt, S. Searching data: A review of observational data retrieval practices in selected disciplines. J. Assoc. Inf. Sci. Technol. 2019, 70, 419–432.
Ullah, K.; Zhang, J. GIS-based flood hazard mapping using relative frequency ratio method: A case study of Panjkora River Basin, eastern Hindu Kush, Pakistan. PLoS ONE 2020, 15, e0229153.
Wu, Y.; Abdel-Aty, M.; Park, J.; Selby, R.M. Effects of real-time warning systems on driving under fog conditions using an empirically supported speed choice modeling framework. Transp. Res. Part C Emerg. Technol. 2018, 86, 97–110.
Masó, J.; Pons, X.; Zabala, A. Tuning the second-generation SDI: Theoretical aspects and real use cases. Int. J. Geogr. Inf. Sci. 2012, 26, 983–1014.
Gui, Z.; Yang, C.; Xia, J.; Liu, K.; Xu, C.; Li, J.; Lostritto, P. A performance, semantic and service quality-enhanced distributed search engine for improving geospatial resource discovery. Int. J. Geogr. Inf. Sci. 2013, 27, 1109–1132.
Hu, Y.; Li, W. Spatial data infrastructures. arXiv 2017, arXiv:1707.03969.
Li, W.; Goodchild, M.F.; Raskin, R. Towards geospatial semantic search: Exploiting latent semantic relations in geospatial data. Int. J. Digit. Earth 2014, 7, 17–37.
Zhang, C.; Zhao, T.; Li, W. Automatic search of geospatial features for disaster and emergency management. Int. J. Appl. Earth Obs. Geoinf. 2010, 12, 409–418.
Quarati, A.; De Martino, M.; Rosim, S. Geospatial open data usage and metadata quality. ISPRS Int. J. Geo-Inf. 2021, 10, 30.
Jiang, Y. Improving Geospatial Data Search Ranking Using Deep Learning and User Behaviour Data. Ph.D. Thesis, George Mason University, Fairfax, VA, USA, 2018.
Devillers, R.; Bédard, Y.; Jeansoulin, R.; Moulin, B. Towards spatial data quality information analysis tools for experts assessing the fitness for use of spatial data. Int. J. Geogr. Inf. Sci. 2007, 21, 261–282.
Horita, F.E.A.; de Albuquerque, J.P.; Marchezini, V. Understanding the decision-making process in disaster risk monitoring and early-warning: A case study within a control room in Brazil. Int. J. Disaster Risk Reduct. 2018, 28, 22–31.
Qin, J.; D’Ignazio, J. The central role of metadata in a science data literacy course. J. Libr. Metad. 2010, 10, 188–204.
Kalantari, M.; Syahrudin, S.; Rajabifard, A.; Subagyo, H.; Hubbard, H. Spatial metadata usability evaluation. ISPRS Int. J. Geo-Inf. 2020, 9, 463.
Wentz, E.; Shimizu, M. Measuring spatial data fitness-for-use through multiple criteria decision making. Ann. Am. Assoc. Geogr. 2018, 108, 1150–1167.
Van Oort, P. Spatial Data Quality: From Description to Application. Ph.D. Thesis, Wageningen University and Research, Wageningen, The Netherlands, 2006.
Vahdat, A. A New Approach to Enhance Geospatial Data Selection in Geoportals: Application for Supporting Natural Hazard Early Warning System in Nunavik, Québec. Ph.D. Thesis, Université Laval, Québec, QC, Canada, 2025.
Li, W.; Yang, C.; Nebert, D.; Raskin, R.; Houser, P.; Wu, H.; Li, Z. Semantic-based web service discovery and chaining for building an Arctic spatial data infrastructure. Comput. Geosci. 2011, 37, 1752–1762.
Gray, J.; Liu, D.T.; Nieto-Santisteban, M.; Szalay, A.; DeWitt, D.J.; Heber, G. Scientific data management in the coming decade. ACM SIGMOD Rec. 2005, 34, 34–41.
Jiang, H.; van Genderen, J.; Mazzetti, P.; Koo, H.; Chen, M. Current status and future directions of geoportals. Int. J. Digit. Earth 2020, 13, 1093–1114.
van den Brink, L.; Barnaghi, P.; Tandy, J.; Atemezing, G.; Atkinson, R.; Cochrane, B.; Fathy, Y.; García Castro, R.; Haller, A.; Harth, A. Best practices for publishing, retrieving, and using spatial data on the web. Semant. Web 2019, 10, 95–114.
Li, W. Lowering the barriers for accessing distributed geospatial big data to advance spatial data science: The PolarHub solution. Ann. Am. Assoc. Geogr. 2018, 108, 773–793.
Lutz, M.; Sprado, J.; Klien, E.; Schubert, C.; Christ, I. Overcoming semantic heterogeneity in spatial data infrastructures. Comput. Geosci. 2009, 35, 739–752.
Li, W. Automated Data Discovery, Reasoning and Ranking in Support of Building an Intelligent Geospatial Search Engine. Ph.D. Thesis, George Mason University, Fairfax, VA, USA, 2010.
Gruber, T.R. A translation approach to portable ontology specifications. Knowl. Acquis. 1993, 5, 199–220.
Chandra, R.; Kumar, S.S.; Patra, R.; Agarwal, S. Decision support system for Forest fire management using Ontology with Big Data and LLMs. arXiv 2024, arXiv:2405.11346.
Bechhofer, S. OWL: Web ontology language. In Encyclopedia of Database Systems, 2nd ed.; Liu, L., Özsu, M.T., Eds.; Springer: New York, NY, USA, 2018; pp. 2640–2641.
Sánchez-Zas, C.; Villagrá, V.A.; Vega-Barbas, M.; Larriva-Novo, X.; Moreno, J.I.; Berrocal, J. Ontology-based approach to real-time risk management and cyber-situational awareness. Future Gener. Comput. Syst. 2023, 141, 462–472.
Hamdani, Y.; Xiao, G.; Ding, L.; Calvanese, D. An ontology-based framework for geospatial integration and querying of raster data cube using virtual knowledge graphs. ISPRS Int. J. Geo-Inf. 2023, 12, 375.
Martin, P.; Magagna, B.; Liao, X.; Zhao, Z. Semantic linking of research infrastructure metadata. In Towards Interoperable Research Infrastructures for Environmental and Earth Sciences: A Reference Model Guided Approach for Common Challenges; Springer: Cham, Switzerland, 2020; pp. 226–246.
Zhang, F.; Lu, Q.; Du, Z.; Chen, X.; Cao, C. A comprehensive overview of RDF for spatial and spatiotemporal data management. Knowl. Eng. Rev. 2021, 36, e10.
Athanasis, N.; Kalabokidis, K.; Vaitis, M.; Soulakellis, N. Towards a semantics-based approach in the development of geographic portals. Comput. Geosci. 2009, 35, 301–308.
Frey, J.; Müller, K.; Hellmann, S.; Rahm, E.; Vidal, M.E. Evaluation of metadata representations in RDF stores. Semant. Web 2019, 10, 205–229.
Beretta, V.; Desconnets, J.-C.; Mougenot, I.; Arslan, M.; Barde, J.; Chaffard, V. A user-centric metadata model to foster sharing and reuse of multidisciplinary datasets in environmental and life sciences. Comput. Geosci. 2021, 154, 104807.
Janowicz, K.; Haller, A.; Cox, S.J.D.; Le Phuoc, D.; Lefrançois, M. SOSA: A lightweight ontology for sensors, observations, samples, and actuators. J. Web Semant. 2019, 56, 1–10.
Smits, P.C.; Friis-Christensen, A. Resource discovery in a European spatial data infrastructure. IEEE Trans. Knowl. Data Eng. 2006, 19, 85–95.
Vockner, B.; Mittlböck, M. Geo-enrichment and semantic enhancement of metadata sets to augment discovery in geoportals. ISPRS Int. J. Geo-Inf. 2014, 3, 345–367.
Li, W.; Song, M.; Tian, Y. An ontology-driven cyberinfrastructure for intelligent spatiotemporal question answering and open knowledge discovery. ISPRS Int. J. Geo-Inf. 2019, 8, 496.
Li, Y.; Jiang, Y.; Yang, C.; Yu, M.; Kamal, L.; Armstrong, E.M.; Huang, T.; Moroni, D.; McGibbney, L.J. Improving search ranking of geospatial data based on deep learning using user behavior data. Comput. Geosci. 2020, 142, 104520.
Vockner, B.; Belgiu, M.; Mittlboeck, M. Recommender-based enhancement of discovery in Geoportals. Int. J. Spat. Data Infrastruct. Res. 2012, 7, 441–463.
Jiang, Y.; Li, Y.; Yang, C.; Hu, F.; Armstrong, E.M.; Huang, T.; Moroni, D.; McGibbney, L.J.; Greguska, F.; Finch, C.J. A smart web-based geospatial data discovery system with oceanographic data as an example. ISPRS Int. J. Geo-Inf. 2018, 7, 62.
Vockner, B.; Richter, A.; Mittlböck, M. From geoportals to geographic knowledge portals. ISPRS Int. J. Geo-Inf. 2013, 2, 256–275.
Wilson, G.; Devillers, R.; Hoeber, O. Fuzzy logic ranking for personalized geographic information retrieval. In Proceedings of the Third International Conference on Intelligent Human Computer Interaction (IHCI 2011), Prague, Czech Republic, 29–31 August 2011; pp. 111–123.
Hervey, T.; Lafia, S.; Kuhn, W. Search facets and ranking in geospatial dataset search. In Proceedings of the 11th International Conference on Geographic Information Science (GIScience 2021)—Part I, Poznań, Poland, 27–30 September 2021; pp. 5:1–5:15.
Qiu, L.; Du, Z.; Zhu, Q.; Fan, Y. An integrated flood management system based on linking environmental models and disaster-related data. Environ. Model. Softw. 2017, 91, 111–126.
Phengsuwan, J.; Shah, T.; James, P.; Thakker, D.; Barr, S.; Ranjan, R. Ontology-based discovery of time-series data sources for landslide early warning system. Computing 2020, 102, 745–763.
ISO 19115-1:2014; Geographic Information—Metadata—Part 1: Fundamentals. International Organization for Standardization: Geneva, Switzerland, 2014.
ISO 19115-2:2019; Geographic Information—Metadata—Part 2: Extensions for Acquisition and Processing. International Organization for Standardization: Geneva, Switzerland, 2019.
European Commission. Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 Establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). Official Journal of the European Union, L 108, 25 April 2007; 1–14.
Perego, A.; Cetl, V.; Friis-Christensen, A.; Lutz, M. GeoDCAT-AP: Representing geographic metadata by using the “DCAT application profile for data portals in Europe”. In Proceedings of the Joint UNECE/UNGGIM Europe Workshop on Integrating Geospatial and Statistical Standards, Stockholm, Sweden, 6–8 November 2017.
ISO 19110:2016; Geographic Information—Methodology for Feature Cataloguing. International Organization for Standardization: Geneva, Switzerland, 2016.
ISO 19156:2011; Geographic Information—Observations and Measurements. International Organization for Standardization: Geneva, Switzerland, 2011.
Lehmann, J.; Athanasiou, S.; Both, A.; Garcia Rojas, A.; Giannopoulos, G.; Hladky, D.; Le Grange, J.J.; Ngonga Ngomo, A.C.; Sherif, M.A.; Stadler, C.; et al. Managing geospatial linked data in the GeoKnow project. In The Semantic Web in Earth and Space Science: Current Status and Future Directions; IOS Press: Amsterdam, The Netherlands, 2015; pp. 51–78.
Djidjev, H.N.; Pantziou, G.E.; Zaroliagis, C.D. Computing shortest paths and distances in planar graphs. In Proceedings of the 18th International Colloquium on Automata, Languages and Programming, Madrid, Spain, 8–12 July 1991; pp. 327–338.
Kotsiantis, S.; Kanellopoulos, D. Discretization techniques: A recent survey. GESTS Int. Trans. Comput. Sci. Eng. 2006, 32, 47–58.
Sreevalsan-Nair, J.; Mundayatt, A. Evolution of Data-driven Single- and Multi-Hazard Susceptibility Mapping and Emergence of Deep Learning Methods. arXiv 2025, arXiv:2502.09045.
Van Westen, C.J. Remote sensing and GIS for natural hazards assessment and disaster risk management. In Treatise on Geomorphology; Elsevier: Amsterdam, The Netherlands, 2013; Volume 3, pp. 259–298.
Sättele, M.; Bründl, M.; Straub, D. A classification of warning system for natural hazards. In Proceedings of the 10th International Probabilistic Workshop, Stuttgart, Germany, 15–16 November 2012; Moormann, C., Huber, M., Proske, D., Eds.; Institut für Geotechnik der Universität Stuttgart: Stuttgart, Germany, 2012; pp. 257–270.
UNDRR: Early Warning Systems Terminology. Available online: https://www.undrr.org/terminology/early-warning-system (accessed on 15 June 2024).
Prayudi, S.D. Multiparameter Land Subsidence Vulnerability Assessment through Satellite Imagery, GIS, and Spatial Data Integration. Bull. Geol. 2023, 7, 1261–1270.
Nebert, D. Developing Spatial Data Infrastructures: The SDI Cookbook v. 2.0. Glob. Spat. Data Infrastruct. 2004, 2, 39–56.
Rose, L. Geospatial Portal Reference Architecture: A Community Guide to Implementing Standards-Based Geospatial Portals. OpenGIS Discuss. Pap. 2004, OGC 04-039. Available online: https://portal.ogc.org/files/?artifact_id=6669 (accessed on 19 September 2025).
Allard, M.; Aubé-Michaud, S.; L’Hérault, E.; Mathon-Dufour, V.; Deslauriers, C. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 1: Summary Document, Community of Inukjuak; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 64p.
Allard, M.; Aubé-Michaud, S.; L’Hérault, E.; Mathon-Dufour, V.; Deslauriers, C. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 1: Summary Document, Community of Ivujivik; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 61p.
Allard, M.; Aubé-Michaud, S.; L’Hérault, E.; Mathon-Dufour, V.; Deslauriers, C. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 1: Summary Document, Community of Kangiqsujuaq; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 63p.
Allard, M.; Aubé-Michaud, S.; L’Hérault, E.; Mathon-Dufour, V.; Deslauriers, C. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 1: Summary Document, Community of Puvirnituq; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 68p.
Allard, M.; Aubé-Michaud, S.; L’Hérault, E.; Mathon-Dufour, V.; Deslauriers, C. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 1: Summary Document, Community of Quaqtaq; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 70p.
Allard, M.; Aubé-Michaud, S.; L’Hérault, E.; Mathon-Dufour, V.; Deslauriers, C.; Chiasson, A. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 2: Summary Document, Community of Akulivik; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 63p.
Allard, M.; Aubé-Michaud, S.; Mathon-Dufour, V.; Deslauriers, C.; Chiasson, A. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 2: Summary Document, Community of Aupaluk; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 62p.
Allard, M.; Aubé-Michaud, S.; Mathon-Dufour, V.; Deslauriers, C.; Chiasson, A. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 2: Summary Document, Community of Kuujjuarapik; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 57p.
Allard, M.; Aubé-Michaud, S.; L’Hérault, E.; Mathon-Dufour, V.; Deslauriers, C.; Chiasson, A. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 2: Summary Document, Community of Kuujjuaq; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 64p.
Allard, M.; Aubé-Michaud, S.; L’Hérault, E.; Mathon-Dufour, V.; Deslauriers, C.; Chiasson, A. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 2: Summary Document, Community of Salluit; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 74p.
Allard, M.; Aubé-Michaud, S.; Mathon-Dufour, V.; Deslauriers, C.; Chiasson, A. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 2: Summary Document, Community of Tasiujaq; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 65p.
Deslauriers, C.; Allard, M.; Aubé-Michaud, S.; Mathon-Dufour, V.; Chiasson, A. Identification of Current and Potential Risks from Climate Change for Nunavik Community Territory—Phase 2: Summary Document, Community of Kangirsuk; Final Report; Ministère de la Sécurité Publique (Government of Québec), Centre for Northern Studies, Université Laval: Québec, QC, Canada, 2020; 65p.

Figure 1. Search examples based on NASA’s Earthdata Search geoportal (Available online: https://search.earthdata.nasa.gov/search, accessed on 19 September 2025). (a) Query result for “Wildfire.” (b) Query result for “Natural Hazards”.

Figure 2. The GeoFit framework (UML activity diagram).

Figure 3. Proposed Geospatial Metadata Ontology (GMO).

Figure 4. The core classes and object properties in the model of the GeoNHEWS ontology. Solid arrows denote rdfs:subClassOf.

Figure 5. Conventional geoportal architecture (components shown in black) integrated with the proposed overall geoportal architecture (components shown in green).

Figure 6. Geoportal system’s main web interface page.

Figure 7. Spatial coverage filter interface and sensor-faceted search interface.

Figure 8. Number of results for “precipitation” search term.

Figure 9. Number of results for “total surface precipitation rate” search term.

Figure 10. GeoNHEWS data discovery main page and Suggested Natural Hazards for exploring information.

Figure 11. Results of competency question 1: “What are observed properties that can be used in the monitoring element of EWS for flood events?” In the red boxes: (a) results in Protégé software; (b) results in the geoportal.

Figure 12. Results of competency question 2: “What additional observed properties, beyond direct measurements such as rising water levels, may be needed in the discovery process of datasets for flood event monitoring?” In the red boxes: (a) results in Protégé software; (b) results in the geoportal.

Figure 13. Results of selecting the suggested information “total surface precipitation rate” from competency question 2 in the geoportal.

Table 1. The main required information and data themes related to the basic and variable dispositions of river flooding in Nunavik.

Associated Basic Disposition Information and Required Data Themes		Associated Variable Disposition Information and Required Data Themes
Basic Disposition	Related Main Data Themes	Variable Disposition	Related Main Data Themes
Presence of River Systems and Lake Basins	Rivers/Streams, Lakes	Rising Water Levels (Directly related to monitoring the flood)	Flow Direction, Discharge Rate, Water Temperature, Flow Velocity
Existence of Wetlands and Low-Lying Areas	Wetlands, Floodplains	Pre-Storm Wind Patterns (Related to monitoring the triggering event of Storm Surges)	Atmospheric Winds, Air Pressure, Sea Surface Temperature
Existence of Glacial Deposits and Varied Topography	Glacial Deposits, Topography	Seasonal and Daily Temperature Fluctuations (Related to monitoring the triggering event of Rapid Snowmelt)	Surface Air Temperature, Solar Radiation
Presence of Rocky Terrains and Minimal Soil Cover	Terrain, Soil Texture	Snowmelt and Ice Breakup (Related to monitoring the triggering events of Rapid Snowmelt/Ice Jams)	Surface Air Temperature, River Ice Thickness, Ice Sheets/Ice Shelves
Continuous Permafrost Zones	Permafrost	Precipitation (Related to monitoring the triggering event of Heavy Rainfall)	Precipitation, Accumulated Precipitation
Thawing Permafrost Areas	Permafrost Degradation	Seasonal Precipitation Shifts (Related to monitoring the triggering events of Heavy Rainfall/Ice Jams)	Precipitation, Surface Air Temperature
Sparse Tundra Vegetation	Tundra	Snowpack Variability (Related to monitoring the triggering events of Heavy Rainfall/Ice Jams)	Snow Cover, Snow Water Equivalent
Changes in Vegetation Cover	Vegetation Cover, Land Cover	Fall Rainfall Intensity and Freeze-up Timing (Related to the triggering event of Ice Jams)	Precipitation, Surface Air Temperature, Ice Formation

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Vahdat, A.; Badard, T.; Pouliot, J. Development of an Ontology-Based Framework to Enhance Geospatial Data Discovery and Selection in Geoportals for Natural-Hazard Early Warning Systems. ISPRS Int. J. Geo-Inf. 2025, 14, 369. https://doi.org/10.3390/ijgi14100369

AMA Style

Vahdat A, Badard T, Pouliot J. Development of an Ontology-Based Framework to Enhance Geospatial Data Discovery and Selection in Geoportals for Natural-Hazard Early Warning Systems. ISPRS International Journal of Geo-Information. 2025; 14(10):369. https://doi.org/10.3390/ijgi14100369

Chicago/Turabian Style

Vahdat, Amirhossein, Thierry Badard, and Jacynthe Pouliot. 2025. "Development of an Ontology-Based Framework to Enhance Geospatial Data Discovery and Selection in Geoportals for Natural-Hazard Early Warning Systems" ISPRS International Journal of Geo-Information 14, no. 10: 369. https://doi.org/10.3390/ijgi14100369

APA Style

Vahdat, A., Badard, T., & Pouliot, J. (2025). Development of an Ontology-Based Framework to Enhance Geospatial Data Discovery and Selection in Geoportals for Natural-Hazard Early Warning Systems. ISPRS International Journal of Geo-Information, 14(10), 369. https://doi.org/10.3390/ijgi14100369

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

