Article

A Semi-Automated Framework for Flood Ontology Construction with an Application in Risk Communication

1 Department of Computer Science, University of Alabama, Tuscaloosa, AL 35487, USA
2 Department of Civil Engineering, University of Alabama, Tuscaloosa, AL 35487, USA
* Author to whom correspondence should be addressed.
Water 2025, 17(19), 2801; https://doi.org/10.3390/w17192801
Submission received: 21 July 2025 / Revised: 16 September 2025 / Accepted: 16 September 2025 / Published: 23 September 2025
(This article belongs to the Special Issue Recent Advances in Flood Risk Assessment and Management)

Abstract

Flash floods are increasingly frequent and severe, yet standard risk communication messages are often too generic and lack actionable guidance, causing them to be ignored. This research aims to enhance flood risk communication by first, developing a robust flood ontology using a novel semi-automated approach, and second, demonstrating its potential as a semantic foundation for translating complex data into clear, personalized public alerts. We introduce a semi-automated, human-in-the-loop ontology engineering strategy that integrates expert-defined schemas with Large Language Model (LLM)-driven expansion and refinement from authoritative sources. Evaluation results are twofold: (1) Technical metrics confirm our LLM-constructed ontology achieves superior relationship richness and expressiveness compared with existing disaster ontologies. (2) A proof-of-concept case study demonstrates the ontology’s potential by showing how its specific classes and relations (e.g., ‘neededForElderly’ relation linking the class ‘SpecialConsideration’ to ‘ElderlyCommunityMember’) can be used to generate targeted advice like “check on elderly neighbors”, transforming a generic alert into a clear and actionable message. Consequently, this research delivers two key contributions: a replicable and domain-adaptable methodology for semi-automated ontology construction and a practical demonstration of how such an ontology can bridge the critical gap between flood data and public understanding, empowering communities to respond more effectively.

1. Introduction

Flash floods are occurring with greater frequency and intensity due to the accelerating effects of climate change, disproportionately impacting socioeconomically vulnerable communities that often lack the resources for effective preparedness and recovery [1,2]. In this context, effective flood risk communication is critical for enabling rapid response and timely evacuation. However, flood risk communication often fails at the most critical moment [3]. Public alerts are frequently too generic, filled with technical jargon like ‘rapid onset flooding,’ and lack the specific, actionable guidance needed to prompt immediate, life-saving responses from at-risk residents. This communication failure stems from a deeper, systemic issue: current flood information ecosystems remain fragmented. Local responders, relief agencies, and residents often consult siloed databases or ad-hoc reports whose terminologies are inconsistent, hindering rapid sense-making and coordinated action [4].
Structured, semantically meaningful representations, such as ontologies, offer a powerful tool to resolve this fragmentation by creating a shared, logical understanding of the flood domain [5]. However, while several flood and disaster ontologies exist, a critical analysis reveals two key limitations relevant to this research. First, many existing models are highly specialized frameworks designed for expert-level data integration and lack the specific, granular concepts required for public-facing risk communication. For example, ontologies like the Ontology for Flood Process Observation (OFPO) are excellent for organizing sensor and observation data, but are not explicitly designed to translate these data into clear, personalized alerts that account for socioeconomic vulnerability [6]. Second, their construction often relies on either slow, manual expert-driven methods that are difficult to scale and adapt, or on fully automated techniques using large language models (LLMs), which are prone to semantic drift, hallucination, and limitations in trust and accuracy, especially in high-stakes messaging [7,8]. Together, these findings reveal a dual methodological gap that this research addresses: first, the need for an efficient semi-automated pipeline that integrates human expertise with the power of modern AI to build a contextually rich ontology. Second, the need for an ontology specifically designed and demonstrated to bridge the gap between technical flood data and effective public warnings.
To address this dual gap, our research develops a structured, semi-automated methodology in two primary phases. The first phase focuses on ontology construction. It begins with domain experts defining the initial scope, followed by LLMs extending this into a seed ontology. This ontology is then iteratively refined and validated using Competency Questions (CQs) and enriched with structured knowledge extracted from authoritative sources like Federal Emergency Management Agency (FEMA) guidebooks and National Weather Service (NWS) guidelines. The second phase demonstrates the ontology’s practical application in enhancing flood risk communication. To achieve this, we conduct a proof-of-concept case study where the integrated ontology is used as a semantic framework to systematically deconstruct and revise a standard Flash Flood Warning from the NWS. This process involves a comparative ‘before-and-after’ analysis, showing how the ontology’s logical structure can translate generic, technical information into a specific, actionable, and personalized public alert. This two-part methodology provides a robust, context-sensitive resource and a clear validation of its effectiveness, directly linking the technical construction of the ontology to its real-world communication benefits.
The primary contribution of this paper is a novel, semi-automated methodology for ontology construction that effectively integrates human expertise with AI-driven knowledge extraction. This replicable, human-in-the-loop workflow addresses key challenges of fully automated generation, such as semantic drift and hallucination, resulting in a more trustworthy and contextually rich ontological model. Furthermore, to validate the practical relevance of this ontology, the paper presents a proof-of-concept case study. This demonstration shows the significant potential for the generated ontology to enhance flood risk communication by illustrating how its semantic structure can be used to translate a generic, technical warning into a clear, specific, and actionable public alert. The remainder of the paper is structured as follows: The next section reviews related works, discussing broader ontology applications within disaster communication and management contexts. Following that, the paper outlines the proposed methodological framework, detailing the specific phases of ontology development. The subsequent sections present and discuss the evaluation outcomes, while the final section summarizes the key findings and their broader implications.

2. Related Works

2.1. Ontologies in Flood-Related Applications

Ontologies and knowledge graphs can be understood as the abstract and concrete representations of a shared knowledge structure. An ontology specifies the formal vocabulary and schema of a domain, defining the concepts that exist and the relationships among them. A knowledge graph, in turn, instantiates this schema by populating it with concrete data. For example, in the flood-risk domain, the ontology defines entities such as floods, rivers, and evacuation procedures, while the knowledge graph integrates real-world information about particular flood events, river systems, and responses. This interpretation aligns with and synthesizes the range of definitions proposed in the literature. A widely cited foundation comes from Thomas Gruber, who defines an ontology as “an explicit specification of a conceptualization,” establishing the groundwork for treating ontologies as sharable and machine-readable specifications [9]. Practitioner guides such as Ontology Development 101 extend this view, describing an ontology as “a formal explicit description of concepts in a domain of discourse … an ontology together with a set of individual instances of classes constitutes a knowledge base”, thereby explicitly linking ontologies to instance data [10]. In this paper, we interpret the referenced “knowledge base” as the knowledge graph. Guarino et al. further situate ontologies on a continuum from informal to formal, ranging from glossaries and data dictionaries to logic programming and first-order logic, and emphasize that interoperability depends on a shared understanding achieved through community agreement [11]. In this light, an ontology is less a static dictionary than a social contract that aligns stakeholders around shared meanings.
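To make this distinction concrete, the short sketch below separates the two levels using rdflib; the namespace and the class and instance names (e.g., RiverineFlood, Elbe2021Flood) are illustrative placeholders, not terms from the ontologies discussed in this paper.

```python
# Illustrative only: the ontology (schema) level vs. the knowledge-graph
# (instance) level, expressed as RDF triples with rdflib.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/flood#")   # hypothetical namespace
g = Graph()

# Ontology level: the vocabulary and schema of the domain.
g.add((EX.Flood, RDF.type, RDFS.Class))
g.add((EX.RiverineFlood, RDFS.subClassOf, EX.Flood))

# Knowledge-graph level: a concrete event instantiating that schema.
g.add((EX.Elbe2021Flood, RDF.type, EX.RiverineFlood))
g.add((EX.Elbe2021Flood, RDFS.label, Literal("2021 Elbe riverine flood")))

print(g.serialize(format="turtle"))
```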
Table 1 provides a summary of recent flood ontologies, highlighting their role in enhancing the efficient sharing of situational facts during emergencies. Early efforts focused on semantic interoperability, exemplified by Elmhadhbi et al.’s POLARISCO suite, which provides a shared dictionary for French responders through modular ontologies [12]. Recent ontologies expand their scope to the entire disaster-management lifecycle. Khantong et al. introduced a flood-evacuation ontology grounded in foundational ontologies, designed to structure and share both static and dynamic information across organizations in the response phase [13]. Bu Daher et al. showed that their ontology could integrate sensor readings, spatial layers, and social information to infer evacuation priorities and guide flood disaster response [14]. Similarly, Shukla et al.’s Disaster Management Ontology aligns its classes directly with India’s national disaster responsibility matrix [15]. The latest contributions, such as Du et al.’s Ontology for Flood Process Observation (OFPO), model not only the domain but the observation-and-decision workflow itself, linking tasks, data, methods, and sensors across mitigation, preparedness, response, and recovery stages [6]. Hofmeister et al. further advance this trend by explicitly targeting software agents, aligning their ontology with emerging artificial intelligence technologies [16]. Nevertheless, as shown in Table 1, flood ontologies still rely on manual design. Where AI is applied, as in Du et al.’s work, its role is limited to named entity recognition (BiLSTM-CRF and Word2Vec) for populating the ontology with instances and constructing the knowledge graph, rather than supporting ontology design itself [6]. In the critical domain of flooding, where knowledge must remain current, advanced AI approaches such as large language models have yet to be leveraged for ontology construction.
Most disaster and flood ontologies adopt established standards and are primarily implemented in OWL using Protégé. On Guarino’s continuum of formality—from plain glossaries (score 1) to logical ontologies in OWL or first-order logic (score 5)—these OWL-based approaches are scored at the highest level of formality in Table 1. However, greater formal rigor does not necessarily yield superior performance in practice, since lower-formality ontologies such as SQL database schemas or UML diagrams can offer greater usability and efficiency, highlighting the need to balance expressiveness with practical applicability [11]. In the flood domain, this balance is especially critical, since ontologies are not only technical artifacts but also vehicles for communication among diverse stakeholders.

2.2. Ontologies for Flood-Related Communication

Effective communication requires the gathering of relevant information, the sense-making of that information, and the transmission of the resulting understanding as a negotiated and participatory process [23,24]. In flood-related communication, local officials, community leaders, and forecasting agencies gather and interpret information to issue warnings, support preparedness/outreach, and sustain communication into recovery [25,26]. For those responsible for communication, information resources remain fragmented and overwhelming in volume, causing information overload that impedes sense-making and distillation of clear messages [27]. Agencies may label the same phenomenon as “flash flood”, “pluvial event”, or “surface-water inundation”, which complicates cross-dataset queries and coordination. Because floods often escalate into multi-agency crises, fragmented information across hydrologists, utility operators, and emergency managers can further hinder a timely response [28]. The absence of a shared platform, as noted by Dorasamy et al., also reinforces siloed systems and delays collaboration [29].
Ontologies address fragmentation by centralizing information, contextualizing it within a shared domain, and formalizing common concepts [30]. They provide a stable foundation for data interpretation and knowledge exchange, ensuring that communication remains clear and aligned across stakeholders. In flood operations, warnings are disseminated through standardized protocols such as the Common Alerting Protocol (CAP) with NWS event codes in the United States [31]. These pipelines establish models for communicating hazard information but do not reconcile divergent hydrological vocabularies or fragmented agency data. Moreover, hand-crafted models struggle to keep pace with evolving terminology and the coordination demands of multi-agency practices. A flood ontology offers a way to align hazard terminology and codes with message fields while contextualizing information for intended audiences. This motivates a more automated approach that allows practitioners to iteratively validate and refine a schema for communicative content generation. Closest to this perspective, Sermet and Demir propose an information-centric flood ontology with a communication focus and AI-enabled interaction through a “knowledge engine,” supporting natural-language queries and multi-channel delivery [20]. Their ontology, however, is manually engineered and does not incorporate large language models or AI-assisted ontology construction. In contrast, the present work addresses this gap by contributing an AI-assisted methodology for ontology development, demonstrated through a proof-of-concept that generates context-rich flood messages.

2.3. Ontology Construction Methods

Manually crafting ontologies is widely recognized as resource-intensive and impractical to scale across domains [32]. Recent advances apply artificial intelligence and machine learning, categorized by Ghidalia et al. into ontology learning, semantic mining, and learning-reasoning systems, to address challenges of scalability, consistency, and explainability [33]. Ontology learning (OL) has emerged as an approach for creating, maintaining, and populating ontologies with minimal human intervention, ranging from semi-automated systems with targeted human input to fully automated methods using text mining, information extraction, and symbolic reasoning [34]. In recent years, large language models have begun automating ontology engineering tasks [35]. LLMs can rapidly process large volumes of unstructured data with high accuracy, making them particularly advantageous for updating and refining knowledge in dynamic, multi-stakeholder, and time-sensitive domains such as flood communication. Once provided with a well-designed prompt template, modern LLMs can handle much of the tedious work of ontology engineering, as demonstrated by Castro et al., who showed that GPT-4 correctly identified, geocoded, and structured ecological distribution information from text in 87–100% of cases [36]. In disaster-related contexts, AI has been shown to generate structured insights that support decision-making and relieve experts of routine burdens [37].
A review of LLMs in ontology engineering by Li et al. [57] highlights that most work to date focuses on early phases such as conceptualization and encoding. Systems such as OntoGenix use LLMs for preprocessing, schema planning, and refinement, achieving ontology quality comparable to manual models, though still requiring expert input [35]. Other efforts automate knowledge acquisition, such as Kommineni et al.’s pipeline where LLMs generate competency questions, answer them from corpora, and convert results into ontology axioms, reducing expert interviews but still needing human validation [38]. Lo et al. propose a different approach with OLLM, which treats ontology construction as a “sequence-to-graph” task and fine-tunes a 7B-parameter model to generate ontology subgraphs, outperforming extraction-based methods and generalizing well to new domains [39]. Table 2 provides a structured mini-review of representative approaches, illustrating the diversity of input data sources, tasks, automation levels, and evaluation strategies across domains as varied as semiconductor design, news media, and healthcare. Together, this body of work positions LLMs as powerful tools for automating ontology workflows. However, the risk of hallucinations—fluent yet unfounded assertions—highlights the continuing importance of human oversight, as evidenced by the consistent incorporation of expert validation across existing studies [8].
Addressing hallucinations requires moving beyond “human-in-the-loop” supervision toward genuine human–AI collaboration. Raees et al. describe this shift as a move toward interactive AI, where experts not only validate outputs but also shape how systems generate and refine knowledge [40]. Mazarakis et al. reinforce this direction, emphasizing interdisciplinary perspectives from human–computer interaction, psychology, and information science to ensure that humans remain central in design and application [41]. In the disaster response domain, Karanjit et al. illustrate how their Human–AI Convergence framework integrates machine-learning-based flood forecasts, expert knowledge, and social media inputs to improve evacuation planning [42]. Lokala et al. describe a human-led expansion of the Drug-Abuse Ontology, which was used to train an AI classifier that substantially reduced its false-positive rate on social-media data, although AI was not involved in the ontology construction itself [43]. More recent systems, such as Ontogenia, leverage LLMs to translate user stories and competency questions into OWL ontologies, while Tsaneva and Sabou demonstrate a human-in-the-loop crowdsourcing pipeline where semi-expert contributors validated ontology axioms with high accuracy [44,45]. Yet, Ontogenia remains confined to benchmark-style domains, and neither approach demonstrates genuine human–AI collaborative ontology construction and evaluation from expert documents or in critical, domain-specific settings such as flood-related communication. Collectively, these studies suggest a division of labor in which AI manages large-scale information processing, while human experts define scope, resolve ambiguities, and safeguard quality.
Table 2. Ontology learning with large language models—mini-review.
Ref. | Year | Input Data Source | Goal/Task | LLM Role/Strategy | Methodology/Pipeline | Automation | Technology | Evaluation
[46] | 2025 | Unstructured RAM technical documents (e.g., Semiconductor Draft Document 6578) plus user-provided targeted knowledge snippets | Interactive ontology extraction and subsequent knowledge graph generation tailored to Reliability and Maintainability domain | OpenAI LLM with adaptive iterative Chain-of-Thought prompting inside a conversational user interface | Dialogue collection → CoT ontology extraction (concepts, relations, properties) → KG create and review → Cypher export → Neo4j load | Semi-automatic: human validates ontology steps; KG generation and database import automated | OpenAI API, adaptive CoT algorithm, Neo4j graph DB, Cypher MERGE, interactive web UI | Case study on Semiconductor Draft Document 6578; qualitative human review; future competency question evaluation planned
[47] | 2024 | IEEE Thesaurus v1.02 PDF + IEEE-Rel-1K (1000 topic pairs) | Relation classification (broader, narrower, same-as, other) for topic ontology | 17 LLMs zero-shot; standard and chain-of-thought prompts with one/two-way heuristics | Prompt generation → LLM inference → heuristic aggregation → metric computation | Fully automatic; experts only build gold standard | Python scripts via Amazon Bedrock, OpenAI API, KoboldAI | Precision, recall, F1 on IEEE-Rel-1K
[48] | 2024 | Natural language wine domain description and competency questions | Automatic ontology generation (specification, conceptualization, implementation) | GPT-3.5 CoT, role-play, few-shot prompting with iterative self-repair | Draft generation → RDFLib syntax check → HermiT consistency check → OOPS pitfall resolution | Fully automatic, post-hoc human analysis | GPT-3.5 API; RDFLib; HermiT; OOPS API; Turtle; metaphactory | Comparison to Stanford wine ontology using OntoMetrics counts and structural/inference analysis
[49] | 2024 | Reuters Nord Stream pipeline news article (first 12 sentences) | Ontology extraction (classes, individuals, properties) from unstructured text | GPT-4o zero-shot prompts at T = 0.3; direct, sequential, sentence-level variants | Direct one-shot → Sequential (class→individuals→relations) → Sentence-level extraction → Merge | Fully automatic extraction; no human in loop | GPT-4o API; RDF/Turtle; Python scripts for merging and metrics | Precision, recall, F1; average degree score; qualitative inspection against ground truth
[50] | 2023 | LLM-as-source; GPT-3.5 latent knowledge seeded by a single domain concept | End-to-end concept-hierarchy (taxonomy) induction from scratch for a chosen domain | GPT-3.5 generates lists, descriptions, and self-verifies relations via zero-/few-shot prompting with frequency sampling | Seed concept → existence check → subconcept listing → description → multi-query verification → KRIS-based insertion into hierarchy | Fully automatic batch run; no human intervention during construction | Python + OpenAI GPT-3.5 API; parallel calls; KRIS insertion algorithm; output ontologies in OWL (RDF/XML) | Manual subjective inspection; structural stats (concepts, subsumptions, prompts/concept, cost)
[51] | 2023 | WordNet WN18RR terms; GeoNames categories; UMLS (NCI, MEDCIN, SNOMED CT) concepts; Schema.org type taxonomy | Zero-shot term typing, taxonomy discovery, and non-taxonomic relation extraction to construct ontologies | Seven LLMs queried with cloze/prefix prompts; FLAN instruction-tuning evaluated for gains | Prompt design → LLM inference → compare outputs to gold ontologies via MAP@1 or F1 metrics | Fully automatic zero-shot runs; domain experts only planned for later validation | HuggingFace models (BERT, BART, BLOOM, Flan-T5) and GPT-3; open-source Python codebase | Gold WordNet, GeoNames, UMLS, Schema.org sets; MAP@1 for typing, F1 for taxonomy and relations
[39] | 2024 | Wikipedia titles and summaries; arXiv titles and abstracts (2020–2022); each document annotated with categories | End-to-end taxonomy induction—discovering concepts and taxonomic is-a relations from scratch | Mistral-7B finetuned via LoRA; custom frequency-masked loss; generates document-level subgraph paths | Linearise relevant paths → LLM outputs subgraphs → sum edge weights → prune loops, inverses, low-weights → final ontology | Fully automatic batch pipeline; no human-in-the-loop after data collection | LoRA-adapted Mistral, vLLM runtime, Sentence-BERT embeddings, Hungarian assignment, simple graph convolutions for metrics | Literal, Fuzzy, Continuous, Graph F1 plus motif distance against Wikipedia and arXiv gold taxonomies
[52] | 2024 | Rule sets from seven ontologies—Wine, Economy, Olympics, Transport, SUMO, FoodOn, Gene Ontology | Ontology completion—predict missing concept-inclusion axioms within each ontology | Fine-tuned or zero-shot LLMs used as NLI classifiers on verbalised rules; act as fallback judge in hybrid system | Extract rule templates → build concept graph → GNN scores candidates → NLI classifier → hybrid combines GNN first, LLM when no template matches | Fully automatic pipeline; human effort limited to annotating hard negative test rules | DeepOnto BERTSubs; RoBERTa, Llama-2, Mistral, Vicuna; GCN/GAT/R-GCN with ConCN embeddings | F1 on manually validated hard negatives across seven ontologies; inter-annotator κ up to 0.83 for negatives
[53] | 2025 | Relational database schemas, natural-language schema documentation, external BioPortal ontologies | Iterative ontology generation and enrichment from relational database schemas | Gen-LLM with hybrid recursive RAG; Judge-LLM or expert refinement; zero-shot prompts | Table traversal → RAG retrieval → prompt → delta ontology → judge validation → merge → iterate | Mostly automatic; optional human or Judge-LLM review of each fragment | OWL 2 DL (Manchester), Faiss ANN index, SBERT embeddings, Protégé, HermiT reasoner | Protégé syntax, HermiT consistency, OOPS pitfalls, structural metrics, semantic coverage, CQ scores on two medical databases
[35] | 2025 | Six Kaggle CSV datasets on airlines, Amazon beauty ratings, BigBasket products, Brazilian e-commerce orders, consumer complaints, UK e-commerce sales | Semi-automatic ontology construction plus RML mapping and RDF knowledge-graph materialisation from tabular data | GPT-4 multi-agent prompting (Prompt Crafter, Plan Sage, OntoBuilder, OntoMapper) with iterative self-repair of mappings | GUI interaction → data preprocessing → schema definition → ontology building (Turtle) → RML mapping → KGen RDF generation with feedback loop | Semi-automatic; LLM drives tasks while users iteratively refine prompts and validate results via Assist Bot GUI | Python OntoGenix MVC GUI, GPT-4 (gpt-4-1106-preview), OWL/Turtle, RML, KGen, Morph-KGC; code on GitHub/Zenodo | Compared six OntoGenix vs. human ontologies using 19 OQuaRE metrics, OOPS! pitfalls, expert review, and time-saving analysis
[54] | 2024 | PubMed breast-cancer research articles and NCCN treatment guidelines | Expand seed ontology and populate breast-cancer treatment knowledge graph | ChatGPT-3.5 fine-tuned on domain texts; prompts generate CQs; RAG answers; LLM judge scores outputs | Seed ontology → LLM CQs → expert check → RAG retrieval → redundancy pruning → LLM triple extraction → KG assembly | Semi-automatic with domain-expert validation of LLM-generated CQs and triples | Protégé editor, PubMed RAG pipeline, ChatGPT-3.5 backend | Five PubMed articles manually tagged; LLM judge scored accuracy, completeness, relevance, consistency (1–5)
[55] | 2025 | Elicited user stories and competency questions plus the existing Music Meta OWL ontology text | Conversational support for requirement elicitation, CQ extraction, analysis, and testing of ontologies | GPT-3.5-turbo with one-shot/few-shot prompts acting as elicitor, generator, clusterer, and judge for CQ verification | Persona chat → CQ generation and refinement → redundancy removal and clustering → ontology verbalisation → prompt-based CQ unit tests | Semi-automatic; human-in-loop refinement and confirmation, with automatic clustering and testing stages | Python 3.11, Gradio UI on HuggingFace Spaces, OpenAI GPT-3.5 API, OWL verbaliser module | Music Meta case study; N = 6 experts and N = 8 engineers surveys; CQ test accuracy 87.5% (P 88%, R 85.7%)
Nonetheless, significant challenges remain. Benson et al. report that large language models often deviate from established upper-level frameworks such as BFO, producing isolated “ontology silos” and undermining definitional clarity [56]. Li et al.’s systematic review goes further, arguing that much of the field suffers from ad hoc task design, inconsistent evaluation practices, and poor reproducibility [57]. Their call for standardized benchmarks and hybrid methods underscores that automation alone cannot ensure ontology quality. Later tasks such as documentation and maintenance also remain underexplored [57]. David et al. reach a similar conclusion in the domain of flood management, finding AI–human integration to be rare and highlighting the absence of end-to-end systems linking machine learning, image analysis, and expert input across disaster phases [58]. Taken together, these findings suggest that while AI can accelerate ontology construction, methodological weaknesses and limited integration with expert workflows continue to undermine reliability, especially in high-stakes settings. To address this gap, our methodology adopts an end-to-end human–AI collaborative approach that spans concept and relation extraction from expert sources, ontology construction, and evaluation, culminating in a practical proof of concept in flood-related communication.

3. Methodology

The proposed ontology-engineering strategy follows a progressive enrichment paradigm, in which human expertise delineates the conceptual boundary of the domain, and LLMs subsequently elaborate on that boundary in a series of data-driven refinement cycles. This semi-automated system aims to unlock the full potential of human-AI collaboration by having LLMs perform tasks that traditionally require human labor. Nevertheless, human intervention for verification and fine-tuning is integrated to ensure accuracy and prevent hallucination in all intermediate results from the LLMs. Figure 1 shows the workflow of this semi-automated system, with its main stages detailed below.

3.1. Stage 1: Expert Formulation of the Initial Ontology

This research began by closely examining program requirements and conducting stakeholder interviews, with a specific focus on flood risk communication, disaster management, and resilience within socioeconomically vulnerable communities. Guided by insights derived from this initial analysis, we developed a seed ontology whose top-level classes represent six key narrative pillars: Types of Floods, Flood Phases, Impacts, Socioeconomic Vulnerability Factors, Environmental Contexts, and Community Resilience Measures. Each of these top-level classes included explicit subclasses, for example, under the Types of Floods category, subclasses such as CoastalFlood, RiverineFlood, FlashFlood, and UrbanFlood were defined. LLMs were then utilized to generate meaningful labels and descriptions for each top-level class and its corresponding subclasses. These labels facilitate human readability, while the accompanying descriptions support machine processing through embedding-based semantic identification. At the end of Stage 1, the ontology primarily consisted of a structured class hierarchy, without yet incorporating specific properties. The resulting initial ontology structure is illustrated in Figure 2.
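As an illustration of the Stage 1 output, the seed taxonomy and the LLM labeling step can be sketched as follows; only the subclasses named above are listed, and the model name and prompt wording are assumptions standing in for the actual Stage 1 tooling.

```python
# A minimal sketch of the Stage 1 seed taxonomy and LLM-generated descriptions.
from openai import OpenAI

# Six narrative pillars; only subclasses named in the text are shown here.
seed_taxonomy = {
    "TypesOfFloods": ["CoastalFlood", "RiverineFlood", "FlashFlood", "UrbanFlood"],
    "FloodPhases": [],
    "Impacts": [],
    "SocioeconomicVulnerabilityFactors": [],
    "EnvironmentalContexts": [],
    "CommunityResilienceMeasures": [],
}

client = OpenAI()

def describe(class_name: str) -> str:
    """Ask the LLM for a one-sentence, embedding-friendly class description."""
    resp = client.chat.completions.create(
        model="gpt-4o",          # assumed model; prompt wording is illustrative
        temperature=0.1,
        messages=[{"role": "user",
                   "content": f"Write a one-sentence definition of the "
                              f"flood-domain ontology class '{class_name}'."}],
    )
    return resp.choices[0].message.content.strip()

descriptions = {c: describe(c)
                for c in list(seed_taxonomy) + sum(seed_taxonomy.values(), [])}
```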

3.2. Stage 2: Competency Question-Driven Ontology Expansion with LLMs

Using the seed ontology as the foundational context, we leveraged the GPT-4o model to generate a diverse and comprehensive set of CQs, explicitly aiming to identify analytical challenges essential to flood risk communication knowledge frameworks. Following this automated generation, we conducted a rigorous manual review, retaining only forty high-quality CQs. These questions specifically targeted critical analytical issues, such as correlating socioeconomic vulnerability factors with evacuation delays and distinguishing environmental conditions between flash floods and riverine floods. To facilitate subsequent evaluation and ontology refinement, we partitioned these selected CQs into two distinct groups: 70% for ontology enhancement and 30% reserved exclusively for evaluating the final ontology.
The 70% group of CQs was used to drive the ontology’s expansion by extracting new conceptual entities. This was accomplished using the GPT-4o model with a low temperature of 0.1 to ensure deterministic and accurate outputs, thereby mitigating the risk of LLM hallucination. The model was tasked with identifying and describing key classes and relationships from the CQs. The specific prompts used for CQ generation, entity extraction, and entity placement are detailed in Appendix A. The resulting candidate entities were then subjected to a rigorous two-step verification and deduplication protocol. First, an automated filtering process utilized OpenAI’s text-embedding-3-large model to calculate the cosine similarity between all new and existing entity descriptions. It removed new candidates with a high degree of similarity (threshold > 0.7), a value determined through experimentation to provide an optimal balance between filtering true duplicates and preserving conceptual nuance. This initial pass proved highly effective at reducing redundancy. Second, the remaining candidates underwent a brief human review to address nuanced issues of conceptual granularity, such as consolidating ‘Inundation Warning System’ into the more general parent concept of ‘Warning System.’ Due to the efficacy of the automated pre-filtering, we found a high degree of consensus among reviewers on these final consolidation decisions.
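A minimal sketch of this embedding-based duplicate filter is shown below, assuming OpenAI's Python SDK; the function names and data layout are ours, while the model and the 0.7 threshold follow the text.

```python
# Sketch: drop new candidate entities whose descriptions are near-duplicates
# of existing ones, using cosine similarity over text-embedding-3-large.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit rows

def filter_duplicates(new_descs: list[str], existing_descs: list[str],
                      threshold: float = 0.7) -> list[str]:
    """Keep only candidates with no existing description above the threshold."""
    if not existing_descs:
        return list(new_descs)
    sims = embed(new_descs) @ embed(existing_descs).T  # pairwise cosine sims
    keep = sims.max(axis=1) < threshold
    return [d for d, k in zip(new_descs, keep) if k]
```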
Once the set of new entities was validated, the LLM was prompted again to propose the most appropriate hierarchical placement for each new class within the existing taxonomy and to suggest potential semantic relationships. These structural suggestions underwent a final review by a human expert to ensure logical consistency and appropriate domain–range assignments. The integration of these fully approved entities and relationships transformed the initial Stage 1 taxonomy into a richly connected T-Box, capable of addressing the targeted CQs.

3.3. Stage 3: Schema Enrichment from Authoritative Documents

To further enrich the ontology with additional domain-relevant concepts and instances, we collected two categories of external materials. The first category included federal recommendation documents, primarily sourced from authoritative agencies such as FEMA and the Department of Homeland Security. The second category comprised relevant and timely news articles collected through web scraping. To ensure LLMs could accurately process the document content, we utilized a document extraction tool, MinerU [59], to preprocess these materials from their original PDF format into machine-readable Markdown text. Leveraging LLMs, we subsequently identified sentences containing domain-specific knowledge pertinent to the existing class hierarchy. By following a structured chain-of-thought process, the LLMs systematically classified these sentences through a hierarchical approach, progressing from broad classifications aligned with top-level classes down to more detailed subclass categorizations.
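This hierarchical classification step can be sketched as a recursive walk over the class tree, as below; the prompt wording and the dictionary encoding of the hierarchy are illustrative assumptions (the actual prompts appear in Appendix A).

```python
# Sketch: top-down classification of an extracted sentence, descending the
# class hierarchy one level at a time until a leaf class is reached.
from openai import OpenAI

client = OpenAI()

def classify(sentence: str, tree: dict[str, list[str]], node: str = "Root") -> str:
    """Return the most granular ontology class for `sentence`."""
    children = tree.get(node, [])
    if not children:            # leaf reached: most specific class available
        return node
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.1,
        messages=[{"role": "user",
                   "content": f"Sentence: {sentence}\n"
                              f"Pick the single best-fitting class from: "
                              f"{', '.join(children)}. "
                              f"Reply with the class name only."}],
    )
    choice = resp.choices[0].message.content.strip()
    # Stop at the current level if the model answers outside the options.
    return classify(sentence, tree, choice) if choice in children else node
```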
Given that the scraped news articles predominantly captured current and up-to-date flood-related events, we treated the knowledge extracted from this source primarily as ontology instances. These simple, event-driven instances were directly populated into the ontology and subsequently visualized within a knowledge graph, preparing the groundwork for subsequent stages of analysis. In contrast, the knowledge extracted from federal documents contained a mixture of new concepts and instances. Consequently, we divided these extracted entities into two distinct groups: conceptual knowledge and instance data. The conceptual knowledge was integrated into the ontology following a systematic four-step procedure: (i) removing duplicates via embedding-based semantic similarity, (ii) leveraging LLMs to assign each concept to the most appropriate class in the existing hierarchy, (iii) inserting the validated classes and relationships into the ontology structure, and (iv) subjecting the integrated ontology to human expert review and approval. This collaborative human–AI approach yielded a richer, more policy-aware ontological schema while preserving the foundational structure established in Stages 1 and 2. Because the full ontology is too large to display in a single, comprehensible figure, we instead illustrate in Figure 3 one fully expanded branch down to its lowest-level leaf nodes. The refinement of the LLM-generated candidates followed the two-step verification protocol detailed in Section 3.2, which combines an automated embedding-based filter with human expert review. This hybrid process proved to be highly selective, with approximately 8% and 10% of the initial entity and relationship candidates rejected during this stage. Table 3 shows representative examples of rejections across three categories: hallucination, ambiguity, and semantic drift.

3.4. Stage 4: Instance Population from Web-Scraped Articles

Although the primary focus of this paper is the construction of an ontological schema, we demonstrate its practical utility through a sample knowledge graph populated with real-world data. A targeted web scraper collects recent news articles that describe significant flooding events and their impacts on socioeconomically vulnerable communities. The unstructured text is processed by a custom pipeline that employs LLMs to iteratively classify extracted entities and conceptual knowledge into progressively deeper subclasses defined by the ontology. At each level of the hierarchy, entities are assigned to child classes if available; otherwise, classification terminates at a leaf class. This iterative process continues until all entities are categorized into their most granular ontological class. The resulting set of candidate entities is then refined using an additional LLM-based filter to ensure semantic quality and contextual relevance. Subsequently, another LLM agent infers candidate relationships between the filtered entities, guided by the set of permissible relationships defined in the ontology. It is important to note that these relationships are derived from the semantic and contextual content of the entities themselves. Thus, the existence of a valid ontological relationship between two classes does not entail that such a relationship holds between all corresponding entities of those classes. A relationship is created in the knowledge graph only when there is sufficient contextual evidence within two entities to support it. The fully curated nodes and edges are then ingested into a Neo4j graph database. Figure 4 illustrates a sample visualization of the resulting ontology-backed knowledge graph at different granularity levels.
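A condensed sketch of the ingestion step is given below, using the official neo4j Python driver; the node label, property keys, and connection details are placeholders rather than the project's actual schema.

```python
# Sketch: load curated entities and relationships into a Neo4j graph database.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # placeholder creds

def ingest(entities: list[tuple[str, str]],
           relations: list[tuple[str, str, str]]) -> None:
    """entities: (name, leaf_class) pairs; relations: (src, rel_type, dst)."""
    with driver.session() as session:
        for name, leaf_class in entities:
            session.run("MERGE (n:Entity {name: $name}) "
                        "SET n.ontology_class = $cls",
                        name=name, cls=leaf_class)
        for src, rel, dst in relations:
            # Relationship types cannot be parameterized in Cypher, so the
            # (ontology-validated) type is interpolated into the query string.
            session.run(f"MATCH (a:Entity {{name: $src}}), "
                        f"(b:Entity {{name: $dst}}) "
                        f"MERGE (a)-[:{rel}]->(b)",
                        src=src, dst=dst)
```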

3.5. Stage 5: Case Study: Demonstrating Ontology Application

To validate the practical utility of the ontology constructed in the preceding stages, we conducted a proof-of-concept case study. This final stage demonstrates how the ontology can be applied as a semantic framework to transform a real-world flood warning into a more effective public alert. We selected an authentic Flash Flood Warning issued by the National Weather Service (NWS) as our baseline message for this case study. The original message, issued for Bradley and McMinn Counties in Tennessee, serves as a representative example of current warning templates. A systematic analysis of this baseline message reveals several critical communication gaps that hinder public comprehension and response. First, it suffers from a lack of geographic specificity; a warning for two entire counties is too broad to inform a resident whether their specific neighborhood or street is in immediate danger. Second, the impact assessment is vague, relying on technical data like rainfall measured in inches instead of describing tangible impacts, such as impassable roads, that are meaningful to residents. Third, the call to action is generic, advising residents to “act quickly” without providing specific, actionable instructions like naming a safe evacuation point. Fourth, the one-size-fits-all message has no consideration for vulnerable populations, failing to account for the unique needs of the elderly, persons with disabilities, or those with language barriers. Our methodology demonstrates the potential of the ontology as a semantic framework to address these gaps systematically. This process is not a fully automated system, but a proof-of-concept illustrating how the ontology’s structure can be queried to enrich and translate a baseline message. This is achieved by leveraging the ontology’s concepts (Entities) and semantic logic (Triplets) to resolve each of the four identified communication gaps:
1. Addressing Lack of Geographic Specificity. To counter the broadness of the original alert, the framework utilizes the ontology’s conceptual hierarchy by querying for specific Entities like “Community” and “ZipCode”. The semantic logic encoded in Triplets, such as the relation “(FlashFlood_Event) locatedInArea (Community)”, provides the mechanism to pinpoint the warning to a precise, at-risk neighborhood.
2. Translating Data into Tangible Impacts. To resolve the use of technical jargon and data, the system leverages the “Impacts” hierarchy. This allows it to model real-world consequences using Entities like “InfrastructuralImpact” and “RoadDamage”. The logical connection in a Triplet, such as “(FlashFlood_Event) hasImpact (RoadDamage)”, enables the translation of abstract data (e.g., rainfall in inches) into a concrete and understandable impact (e.g., impassable roads). Furthermore, the ontology’s catalog of historical incidents can be queried to provide crucial context. By identifying a similar past event through a semantic relation, the alert can offer a precedent, such as, “This event may cause flooding similar to the May 2021 flood”, making the potential threat more immediate and relatable for residents.
3. Providing Actionable Instructions. To replace generic advice, the ontology contains specific Entities for actions and resources, such as “EvacuationProcedure”. The framework can then query for local knowledge stored as Triplets, such as “(Fairview_Evacuation_Route) isA (EvacuationProcedure)”, to provide specific guidance like naming safe locations.
4. Tailoring Messages for Vulnerable Populations. The ontology’s rich hierarchy of Entities, including “ElderlyCommunityMember” and “PersonWithDisability”, allows for targeted messaging. Key Triplets, such as the relation “(SpecialConsideration) neededForElderly (ElderlyCommunityMember)”, provide the explicit logic needed to generate critical, life-saving advice for at-risk groups that is absent in generic alerts.
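To illustrate, the sketch below shows how such gap-filling lookups might be issued against the ontology-backed Neo4j graph from Stage 4; the relationship names mirror the Triplets above, while node properties and graph layout are assumptions.

```python
# Sketch: query the ontology-backed graph for content to enrich a flood alert.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # placeholder creds

def enrich_alert(event: str) -> dict:
    with driver.session() as session:
        # Gap 1: pinpoint the at-risk community for the event.
        area = session.run(
            "MATCH (e {name: $e})-[:locatedInArea]->(c) RETURN c.name AS area",
            e=event).single()
        # Gap 2: translate data into tangible, named impacts.
        impacts = session.run(
            "MATCH (e {name: $e})-[:hasImpact]->(i) RETURN collect(i.name) AS xs",
            e=event).single()
        # Gap 4: retrieve special considerations for elderly residents.
        elderly = session.run(
            "MATCH (s)-[:neededForElderly]->(:ElderlyCommunityMember) "
            "RETURN collect(s.name) AS xs").single()
    return {"area": area["area"] if area else None,
            "impacts": impacts["xs"] if impacts else [],
            "elderly_advice": elderly["xs"] if elderly else []}
```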
Figure 5 provides a side-by-side comparison of the original NWS message and the revised, ontology-driven alert, highlighting the key improvements. The ontology-driven message uses information extracted from official documents and targeted web articles, parsed into the ontology, and used as a structured knowledge source to produce community-specific, tailored, and actionable insights.

3.6. Domain Adaptability and Generalization

While this study focuses on flood risk, the proposed semi-automated ontology construction methodology is designed to be domain-adaptable and generalizable to other disaster contexts, such as wildfires, earthquakes, or public health emergencies. The strength of the four-stage workflow (expert-driven schema formulation, LLM-driven expansion, CQ-based validation, and enrichment from authoritative sources) lies in its process, which is not specific to hydrology. It provides a robust and replicable method for combining human expertise with AI-driven knowledge extraction in any complex domain.
Applying this methodology to a different disaster requires adapting the domain-specific inputs at each stage. For a wildfire scenario, for instance, the authoritative sources would shift from the NWS and FEMA to agencies like the National Interagency Fire Center (NIFC), and the ontology’s core concepts would involve Fire Behavior, Containment Status, and Air Quality Index instead of Flood Phases. Similarly, an earthquake ontology would be built from U.S. Geological Survey (USGS) data and include key entities such as Epicenter, Magnitude, and Aftershock. For a health disaster, sources like the CDC and WHO would inform a schema centered on Pathogen, Transmission Vector, and Public Health Measures. Despite these necessary adaptations of domain content, the core approach of leveraging the ontology to transform a generic technical warning into a clear, actionable public alert carries over unchanged, demonstrating the methodology’s practical utility across domains.

4. Evaluation Results

To systematically evaluate the quality and completeness of the ontology developed in this research, the evaluation is divided into two parts. First, we assessed the schema quality using established OntoQA metrics [60], which provide insights into Relationship Richness (RR) and Inheritance Richness (IR). Second, to examine the comprehensiveness of the ontology in covering essential domain concepts, we conducted a concept coverage check. This evaluation involved applying the remaining 30% of CQs that were deliberately set aside from the earlier ontology development phase. If the ontology could successfully align with and represent the concepts inherent in these questions, it would indicate robust conceptual coverage, suggesting that the ontology can effectively support query answering and information retrieval once populated with domain-specific data.

4.1. Structural Evaluation Using OntoQA Metrics

To rigorously evaluate the structural quality of our generated ontology, we utilized two established OntoQA metrics: RR and IR. These metrics provide quantitative insights into the expressivity, completeness, and semantic richness of the ontology. RR measures the proportion of relationships defined within the ontology schema, excluding inheritance relations. Specifically, it is calculated as the ratio of the number of non-inheritance relationships (object properties) to the sum of these relationships and subclass (inheritance) relationships:
$$\mathrm{RR} = \frac{P}{P + SC}$$
where $P$ is the number of object properties (non-inheritance relationships), and $SC$ is the number of subclass relationships (inheritance).
Inheritance Richness (IR) evaluates the distribution and depth of the ontology’s class hierarchy, indicating how extensively classes are specialized. IR is defined as the average number of subclasses per class and is calculated as follows:
$$\mathrm{IR} = \frac{SC}{C}$$
where $C$ is the total number of classes.
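For illustration, both metrics can be computed directly from an OWL serialization; the sketch below uses rdflib and assumes that OWL object properties, rdfs:subClassOf axioms, and OWL classes supply the counts $P$, $SC$, and $C$ from the definitions above.

```python
# Sketch: compute OntoQA Relationship Richness and Inheritance Richness
# from an RDF/XML OWL file (the filename is a placeholder).
from rdflib import Graph, RDF, RDFS, OWL

g = Graph().parse("flood_ontology.owl", format="xml")

P = len(set(g.subjects(RDF.type, OWL.ObjectProperty)))    # non-inheritance relations
SC = len(list(g.triples((None, RDFS.subClassOf, None))))  # inheritance relations
C = len(set(g.subjects(RDF.type, OWL.Class)))             # total classes

RR = P / (P + SC)  # relationship richness
IR = SC / C        # inheritance richness
print(f"RR = {RR:.2f}, IR = {IR:.2f}")
```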
Furthermore, the total number of axioms from Protégé is utilized to reflect the structural complexity of the ontology. This metric encompasses statements defining classes, relationships, property characteristics, and individual assertions. To benchmark the quality of our generated ontology, we compare it with several well-established, high-quality disaster-related ontologies, including Flood Disaster Support Ontology (FDSO) [17], Disaster Management Domain Ontology (DMDO) [15], and OntoCity [61].
Based on the comparative results summarized in Table 4, our ontology demonstrates notable advantages and some limitations relative to established disaster ontologies. Utilizing LLMs for ontology construction allowed our ontology to achieve a higher Relationship Richness (RR = 0.58), substantially outperforming FDSO (0.16), DMDO (0.01), and OntoCity (0.18). This indicates that our ontology leverages more extensive semantic relationships beyond basic hierarchical structures, providing a richer, interconnected conceptual framework. Moreover, our ontology contains a large number of axioms (3754), indicating significant expressiveness and semantic depth comparable to comprehensive ontologies such as DMDO (4075 axioms). However, despite having a substantial class count (350), our inheritance richness (IR = 1) suggests a relatively flat hierarchical structure, limiting its depth of specialization compared with more hierarchically nuanced ontologies, such as OntoCity (IR = 0.81). Therefore, future improvements will involve refining the hierarchical structure and introducing data properties to boost descriptive specificity and practical applicability.

4.2. Concept Coverage Evaluation via Competency Questions

4.2.1. Concept Extraction

Concept coverage is a descriptive measure of our ontology, answering the question: “Are the concepts in the CQs adequately covered by the concepts in the ontology?” One method to assess concept coverage is to extract words from the CQs using named entity recognition (NER) and perform exact word matching. However, this approach would miss concepts that are semantically similar but spelled differently. To address this, we use vector embeddings, which encode the semantic meaning of words, phrases, or sentences into a numerical, high-dimensional latent space. Given a corpus of text, each word has complex, nonlinear relationships and patterns with other words. The embedding model, trained on massive text datasets, allows us to compare the similarity between words, phrases, or sentences in this high-dimensional space. For this evaluation, we use OpenAI’s text-embedding-3-large model. The evaluation is conducted on a CQ test set that is distinct from the CQs upon which the ontology was originally built.
The first step involves extracting concepts from the CQs using an LLM. Figure 6 illustrates the designed prompt, where we constrained the LLM to select three to seven key concepts. Surprisingly, these selected concepts included both explicit and implicit notions.
Interestingly, we found that the model occasionally extracted concepts that were not directly stated in the CQs but represented a higher level of abstraction. For example, from the question, “How can communities develop effective evacuation plans that account for all three flood phases?” the LLM extracted “flood management”. While not explicitly stated, it is a relevant concept for answering the question. An example of this is provided in Figure 7. This behavior can be regulated by adjusting the prompt or the model’s temperature. Including these non-explicit concepts is beneficial, as important ideas for answering a question may not be stated directly in its language, but it also carries the risk of generating irrelevant concepts. Per the prompt, the concepts need not be single words, and we avoid stop words like “how”, “what”, etc.
To ensure the LLM concept extraction achieved good coverage and quality for ontology construction, we administered a comprehensive evaluation quiz to three evaluators who are actively involved in constructing the flood ontology. The quiz presented each evaluator with all 56 LLM-extracted concepts across 12 CQs, asking them to assess each concept’s relevance, accuracy, and utility for building a flood management ontology. Evaluators could mark concepts as “good” or “bad” and suggest additional concepts the LLM missed, as shown in Figure 8.
The results were processed using a majority voting system where concepts receiving approval from at least 2 out of 3 evaluators (>50%) were retained, while those with minority support were removed. This validation process demonstrated high LLM performance with a 93.5% approval rate for extracted concepts and an average quality rating of 4.5/5 from evaluators. The human validation resulted in a refined concept set of 66 concepts (up from 56), with two generic concepts like “Flooding” removed due to insufficient evaluator support and 12 new concepts added based on their suggestions. This human-validated concept extraction ensures higher-quality evaluation standards and establishes a human feedback-driven process for continuously refining and extending our flood ontology.

4.2.2. Evaluation and Results

Once all concepts are validated for each CQ, they are embedded using the text-embedding-3-large model. We also embed the ontology labels, which we refer to as the embedded ontology concepts. We then measure the distance between each CQ concept and every ontology concept embedding using cosine similarity. Because we are uncertain about the optimal similarity score, we compare results across different similarity thresholds. We call this our Type 1 comparison, with the results shown in Figure 9. Figure 10 presents a radial plot with the concepts from test question 40 at the center, surrounded by candidate ontology labels. The proximity of each label to the center indicates the degree of cosine similarity, with closer labels representing higher similarity.
A good example illustrating the need for threshold testing is the similarity between the terms “communities”, “Community”, and “community” shown in Figure 11. While we intuitively recognize that they represent the same concept, the embedding model treats them as distinct and assigns different similarity scores when compared with the extracted CQ word “Communities”. But which score should we use as the cutoff? For instance, if we set the threshold at 0.6, we might capture “Community” but miss “community”. This example also highlights the need to identify and possibly merge similar concepts. For example, “Community” appears at depth one in the ontology, while “community” appears at depth two, effectively duplicating the same concept at different levels.
A significant issue with embedding single-word concepts is the loss of context. First, we make the assumption that the concepts to answer the question are explicitly stated in the question itself. Though most of the time true, it may not always be the case. We showed previously that the LLM can extract some implicit concepts for us, but that only happens occasionally. Second, homonyms, words with the same spelling but different meanings, present a challenge. Consider “She made a deposit at the bank” versus “She went to the river bank”. Both use the word “bank”, but with different meanings. If we only extract “bank”, how do we determine which meaning is intended? This problem is known as “Word-Sense Disambiguation”. To combat this, we embed the ontology label comments, which we call ontology concept descriptions. This means we look for matches between the CQ concepts and the descriptions of the ontology concepts. The results of this Type 2 comparison are shown in Figure 12.
We also embed the ontology label along with its description, concatenated as “label:” + “description”, and compare it to the extracted concepts. We refer to this as Type 3, shown in Figure 13. To preserve the context of the concepts even further, we can embed the entire question and identify which ontology concepts best match the embedding of the full CQ. In Figure 14, we embed the question and compare it to the “label:” + “description” embedding. This approach utilizes the available context for both the question and the ontology concepts; however, it skips the concept extraction step. Nonetheless, it is worth testing to ensure we are not missing out on coverage results.
The complete algorithm for our coverage evaluation is provided in Algorithm 1.
Algorithm 1 Ontology Coverage Evaluation
Require:
  1: Competency Questions Q
  2: Ontology Concepts OC
  3: Ontology Concept Descriptions OD
  4: Threshold Set T
  5: LLM Prompt P
Ensure: Coverage results {R1, R2, R3, R4} for each τ ∈ T
     Pre-processing:
  6: E_OC ← {embed(c) | c ∈ OC}  ▹ label embeddings
  7: E_OD ← {embed(d) | d ∈ OD}  ▹ description embeddings
  8: E_OC+OD ← {embed(c ⊕ d) | c ∈ OC, d ∈ OD}  ▹ label+description embeddings
  9: for each q ∈ Q do
 10:   e_q ← embed(q)  ▹ question embedding
 11:   C_q ← LLM_extract(q, P)  ▹ concepts in q
 12:   E_Cq ← {embed(c) | c ∈ C_q}  ▹ concept embeddings
     Coverage Evaluation:
 13:   for each τ ∈ T do
 14:     R1 ← {c ∈ C_q | max_{e ∈ E_OC} sim(e, e_c) ≥ τ}  ▹ concept vs. label
 15:     R2 ← {c ∈ C_q | max_{e ∈ E_OD} sim(e, e_c) ≥ τ}  ▹ concept vs. description
 16:     R3 ← {c ∈ C_q | max_{e ∈ E_OC+OD} sim(e, e_c) ≥ τ}  ▹ concept vs. label+description
 17:     R4 ← [max_{e ∈ E_OC+OD} sim(e, e_q) ≥ τ]  ▹ question vs. label+description
 18:   end for
 19: end for
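A compact Python rendering of the Type 1 branch of Algorithm 1 is sketched below (Types 2 and 3 substitute description or label+description embeddings, and Type 4 embeds the whole question); the helper names are ours.

```python
# Sketch: concept-vs-label coverage (Type 1) across similarity thresholds.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit rows

def type1_coverage(cq_concepts: list[str], ontology_labels: list[str],
                   thresholds=(0.3, 0.4, 0.5, 0.6, 0.7)) -> dict[float, float]:
    """Fraction of CQ concepts whose best label match reaches each threshold."""
    best = (embed(cq_concepts) @ embed(ontology_labels).T).max(axis=1)
    return {t: float((best >= t).mean()) for t in thresholds}
```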
To see the top-k concepts from the ontology for each type of coverage evaluation, where k is the count of the extracted concepts, please refer to Table 5. This table shows the cosine similarity score for each concept and provides insights into what is considered “similar” depending on what we embedded. Table 6 provides a detailed numerical summary for every question and its concept matches for Type 1 and Type 2 comparisons. For Type 3 and Type 4, we are comparing against the entire question, so we assess the coverage of ontology concepts relative to the question itself, not its extracted concepts. In these cases, we indicate coverage at a given threshold with a “Y” for yes and a “-” for no. Q-Cov represents the number of questions for which all expected concepts were successfully matched at or above a given similarity threshold, expressed as a fraction of the total number of questions.

4.2.3. Analysis

Across all four comparison types, we observe a consistent coverage–threshold trade-off as the similarity threshold τ increases, though the steepness of this trade-off varies significantly with the type of comparison. At the lowest cutoff (τ = 0.3), all evaluations achieve perfect coverage, capturing all 66 concepts in the concept-level tests and all 12 questions in the question-level test. Increasing the threshold to τ = 0.4 slightly reduces coverage: concept-to-label matches (Type 1) remain nearly complete at 98.5%, while concept-to-description (Type 2) and concept-to-“label + description” (Type 3) matches decrease to 89.4% and 90.9%, respectively. An example of the difference the descriptions make appears in the third question of Table 5, where “emergency coordination” matches “animal evacuation coordination” for Type 1, while Types 2 and 3 yield “unified evacuation orders.”
The most significant divergence occurs between thresholds τ = 0.5 and τ = 0.6 (Figure 9, Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14). At τ = 0.5, coverage rates stand at 92.4% for Type 1, 62.1% for Type 2, 59.1% for Type 3, and 100% for Type 4. The unweighted average coverage across these four comparison types at τ = 0.5 is
(92.4% + 62.1% + 59.1% + 100.0%) / 4 = 78.4%.
However, raising the threshold incrementally to τ = 0.6 causes a sharper decline, with coverage falling to 75.8% for Type 1, 36.4% for Type 2, 31.8% for Type 3, and dropping dramatically to only 25.0% for Type 4. Further increasing the threshold to τ = 0.7 significantly reduces coverage to just 45.5% for Type 1, with single-digit or zero coverage for the other three types.
Taken together, these trends underscore the importance of contextual detail and locate a practical “elbow” at roughly τ = 0.5–0.6. Operating in this band retains relevant matches across every mode while filtering obvious noise. Some of the false negatives that appear at higher thresholds point to structural issues in the ontology itself, such as duplicate classes (e.g., redundant instances of Community) or missing terminology for key domain concepts like flood phases. Merging these redundancies, enriching descriptions, and incorporating salient terms could improve alignment in future iterations.
The concept coverage evaluation is not a perfect reflection of the ontology’s accuracy or utility in downstream tasks and contains ambiguity regarding the “right threshold”. We could achieve 100% concept coverage by comparing our CQs against a dictionary of all words, but we know that a dictionary is not built for our use case and does not contain the domain reasoning among concepts. It is still important to know that we have successfully modeled the necessary concepts within the constraints of our research problem. When we populate this ontology and locate real data instances to answer the CQs, we will have a way to navigate to the data through the conceptual blueprint we have generated.
This evaluation also reveals areas for improvement in the ontology, such as duplicate classes that need merging, inconsistencies, and missing concepts. The more questions we ask, the better we can iteratively improve the ontology. If we wish to add concepts from a new domain and ask new CQs, we can use this method to evaluate whether those concepts are contained. Because we use an LLM to generate CQs, we are much more efficient at covering a given domain. Human oversight is still necessary to ensure the questions are worthwhile and relevant, but this process is much faster than manual question creation.

5. Discussion

The results of this study introduce a novel semi-automated, human-in-the-loop methodology for ontology construction and demonstrate its application in enhancing flood risk communication. This section provides a critical interpretation of our findings, positioning them within the context of prior work, addressing key methodological advancements, and acknowledging the limitations of the current research.

5.1. A Critical Comparison of Ontological Structure and Purpose

Our evaluation highlights a distinct structural profile for our ontology when compared with established models like the FDSO, DMDO, and OntoCity. The most notable distinction lies in our ontology’s substantially higher Relationship Richness (RR = 0.58) compared with FDSO (0.16), DMDO (0.01), and OntoCity (0.18). This outcome reflects our application-driven design. Ontologies like FDSO and DMDO are primarily architected for expert-level data integration, prioritizing deep, hierarchical classification to organize vast datasets. In contrast, our ontology is purpose-built to provide the semantic framework for public-facing risk communication. Effective communication requires modeling complex, cross-cutting relationships that link, for instance, a “SocioeconomicVulnerabilityFactor” to a specific “InfrastructuralImpact” and a corresponding “CommunityResilienceMeasure”. The dense network of non-hierarchical object properties is essential for capturing this context, enabling the translation of technical data into meaningful public alerts.
This design choice involves a trade-off, reflected in our Inheritance Richness (IR = 1), which suggests a relatively flat class hierarchy. While this indicates less depth of specialization compared with more formally structured ontologies, it aligns with our immediate goal of developing a communication-centric framework. Nonetheless, we acknowledge this as a limitation and a clear avenue for future work, where incorporating more granular data will be essential to enrich this hierarchical structure without losing relational richness.
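Following the OntoQA definitions [60], both metrics can be computed directly from the counts in Table 4, where P is the number of object (non-inheritance) properties, SC the number of subclass relations, and C the number of classes:

\[
RR = \frac{P}{P + SC} = \frac{473}{473 + 343} \approx 0.58,
\qquad
IR = \frac{SC}{C} = \frac{343}{350} \approx 1.
\]

Applying the same formulas to the other ontologies in Table 4 reproduces the reported values, e.g., RR = 114/(114 + 607) ≈ 0.16 for FDSO.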

5.2. Bridging Documented Gaps in Flood Risk Communication

A primary application of this research is addressing the persistent gap between the issuance of a flood warning and the public’s ability to take appropriate protective action. The risk communication literature consistently identifies the use of technical jargon, a lack of geographic specificity, and generic advice as critical barriers to effective response [3,27]. Our work addresses this “semantic gap” directly. The proof-of-concept case study demonstrates how our ontology serves as a semantic bridge to overcome these challenges. By querying the ontology’s rich relational network, a generic alert for “two inches of rain” can be translated into a tangible, understandable warning about specific “RoadDamage” on “Mouse Creek Road”. Similarly, vague instructions like “act quickly” are transformed into “ActionableInstructions” by retrieving knowledge of local resources, such as a named “evacuationProcedure”. This practical application substantiates our claim that a communication-oriented ontology can systematically resolve the well-documented shortcomings of current public alerting systems.
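To make the querying step concrete, the sketch below shows how an alert generator could retrieve impact and instruction knowledge from the ontology with Python’s rdflib. The file name, namespace IRI, and property names (affectsLocation, hasImpact, recommendsAction) are illustrative assumptions, not the ontology’s actual identifiers.

from rdflib import Graph

# Load the flood ontology; the file name is illustrative.
g = Graph()
g.parse("flood_ontology.owl")

# The namespace and term names below are hypothetical placeholders.
query = """
PREFIX flood: <http://example.org/flood#>
SELECT ?impact ?instruction WHERE {
    ?alert a flood:FloodAlert ;
           flood:affectsLocation ?road .
    ?road flood:hasImpact ?impact .
    ?impact flood:recommendsAction ?instruction .
}
"""

for row in g.query(query):
    # Each result pairs a tangible local impact with an actionable instruction.
    print(f"Impact: {row.impact} -> Instruction: {row.instruction}")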

5.3. Advancing Semi-Automated Ontology Construction Methodologies

Beyond the application, our contribution extends to the methodology of ontology engineering itself. The literature on LLM-based ontology construction highlights significant risks, including factual “hallucination”, “semantic drift” away from the core domain, and the creation of isolated “ontology silos” that lack definitional clarity [8,56,57]. Our four-stage, human-in-the-loop methodology was explicitly designed to mitigate these known failure modes. By combining expert-led schema formulation, CQ-driven expansion, and enrichment from authoritative documents, we establish strong constraints on the LLM’s generative process.
The effectiveness of this approach is evidenced in our candidate rejection process, as detailed in Table 3. Our hybrid verification protocol, which combines automated filtering with human expert review, successfully identified and rejected logically flawed hierarchies (hallucination), ambiguous concepts, and overly broad terms (semantic drift). This demonstrates that a structured, collaborative human-AI workflow does more than just accelerate ontology creation; it serves as a critical quality assurance mechanism that addresses the well-documented weaknesses of fully automated approaches, resulting in a more trustworthy and contextually robust model.

5.4. Limitations and Future Directions

While this study establishes a strong proof-of-concept, we acknowledge its limitations. First, the evaluation presented is primarily structural (OntoQA) and qualitative (case study); a crucial next step is a quantitative, task-oriented performance comparison. An empirical study measuring the comprehension and perceived actionability of the ontology-driven messages among target populations would provide definitive evidence of their real-world effectiveness. Second, as noted, the ontology’s hierarchical structure is relatively shallow. Future iterations will focus on incorporating more granular data from diverse sources to deepen the taxonomy, thereby increasing its descriptive power. Finally, the framework has not yet been field-tested in a live operational environment. Integrating this system into a real-world alerting pipeline, while a significant long-term objective, is essential for validating its practical utility and scalability under the pressures of an actual disaster event.

6. Conclusions

This research demonstrates that a semi-automated, human-in-the-loop methodology can successfully produce a semantically rich ontology specifically engineered to overcome critical gaps in public flood risk communication. By strategically combining human expertise with the generative power of LLMs, our work provides both a replicable workflow and a practical knowledge framework that bridges the divide between complex disaster data and actionable public understanding.
The broader implications of this study are twofold. For flood risk communication, it provides a blueprint for moving beyond the current paradigm of static, one-size-fits-all warnings toward next-generation intelligent alerting systems. Such systems, built on a robust ontological foundation, can enable the dynamic generation of context-aware, personalized, and adaptive messages tailored to the specific needs of vulnerable populations. For the field of ontology design, our application-driven approach advocates the development of purpose-built models over generic ones, while our methodology presents a validated template for human-AI collaboration that mitigates the known risks of fully automated construction.
Looking ahead, our future work will proceed along three concrete paths. First, we will move beyond qualitative assessment to conduct empirical, user-centric studies that quantitatively measure the improved comprehension and actionability of ontology-driven alerts. Second, we aim to enhance the ontology’s technical capabilities by deepening its hierarchical structure and integrating dynamic, real-time data sources such as weather radar feeds and social media sentiment. Finally, our long-term vision is to develop a pilot program in partnership with an emergency management agency, allowing us to test and refine this framework in an operational environment, ultimately advancing the goal of saving lives through clearer communication.

Author Contributions

Conceptualization, S.L., C.E., M.Z. and X.G.; Methodology, S.L. and C.E.; Project administration, Q.D. and J.G.; Validation, S.L. and M.Z.; Data curation, S.L. and C.E.; Software, S.L., C.E. and M.Z.; Writing—review and editing, S.L., C.E. and M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This project was supported in part by the National Science Foundation (NSF Award #2333836) and by the National Oceanic and Atmospheric Administration (NOAA) through the Cooperative Institute for Research to Operations in Hydrology (CIROH) at The University of Alabama under Cooperative Agreement NA22NWS4320003. The statements, findings, conclusions, and recommendations are those of the authors and do not necessarily reflect the views of NSF, NOAA, or the U.S. Department of Commerce.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT (GPT-4o) to check grammar, improve readability, and smooth transitions between paragraphs. The tool was not used for study design, data analysis, or generation of scientific content. The authors reviewed and edited all AI-assisted suggestions and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no competing interests.

Appendix A. LLM Prompts

Appendix A.1. Competency Question Generation

Figure A1. Prompt for generating ontology-driven Competency Questions (CQs) in the flood resilience domain. A low model temperature (T = 0.1) is applied to ensure consistency and reproducibility across CQ generation outputs, while Python’s Pydantic models are employed to enforce structured and reliable LLM responses.

Appendix A.2. Entity Extraction from Competency Questions

Figure A2. Prompt for extracting concepts from ontology-driven Competency Questions generated from Figure A1. The provided Competency Questions (boxed in blue) are dynamically inserted using Python string operations. A low model temperature (T = 0.1) is applied to ensure consistency and reproducibility across concept extraction outputs, while Python’s Pydantic models are employed to enforce structured and reliable LLM responses.
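As a minimal sketch of the structured-output pattern this caption describes, the Pydantic schema below validates the LLM’s extraction response; the field names are assumptions modeled on the question–concept mapping in Figure 7, not the exact schema used in our pipeline.

from pydantic import BaseModel, Field

class QuestionConcepts(BaseModel):
    # Structured response schema enforced on the LLM's extraction output.
    question: str = Field(description="The competency question analyzed")
    concepts: list[str] = Field(description="Concepts extracted from the question")

# Validate a raw JSON response from the LLM against the schema;
# a malformed or incomplete response raises a ValidationError.
raw = (
    '{"question": "How can communities develop effective evacuation plans '
    'that account for all three flood phases?", '
    '"concepts": ["Communities", "Evacuation plans", "Flood phases"]}'
)
parsed = QuestionConcepts.model_validate_json(raw)
print(parsed.concepts)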

Appendix A.3. Ontology Integration and Property Alignment

Figure A3. Prompt for integrating new classes and aligning object properties within an existing OWL 2 ontology. Python’s Pydantic models are employed to ensure structured, reliable, and schema-compliant LLM outputs. A low model temperature (T = 0.1) is applied to guarantee consistency and reproducibility across iterations of ontology extension and entity integration.

References

  1. Chang, S.E. Socioeconomic Impacts of Infrastructure Disruptions. In Oxford Research Encyclopedia of Natural Hazard Science; Oxford University Press: Oxford, UK, 2016. [Google Scholar]
  2. Hao, S.; Wang, W.; Ma, Q.; Li, C.; Wen, L.; Tian, J.; Liu, C. Analysis on the Disaster Mechanism of the “8.12” Flash Flood in the Liulin River Basin. Water 2022, 14, 2017. [Google Scholar] [CrossRef]
  3. Stephens, K.K.; Blessing, R.; Tasuji, T.; McGlone, M.S.; Stearns, L.N.; Lee, Y.; Brody, S.D. Investigating ways to better communicate flood risk: The tight coupling of perceived flood map usability and accuracy. Environ. Hazards 2024, 23, 92–111. [Google Scholar] [CrossRef]
  4. Merz, B.; Vorogushyn, S.; Uhlemann, S.; Viglione, A.; Blöschl, G. Understanding Heavy Tails of Flood Peak Distributions. Water Resour. Res. 2022, 58, e2021WR030506. [Google Scholar] [CrossRef]
  5. Elmhadhbi, L.; Ghedira, C.; Bouaziz, R. An Ontological Approach to Enhancing Information Sharing in Disaster Response. Information 2021, 12, 432. [Google Scholar] [CrossRef]
  6. Du, W.; Liu, C.; Xia, Q.; Wen, M.; Hu, Y. OFPO & KGFPO: Ontology and knowledge graph for flood process observation. Environ. Model. Softw. 2025, 185, 106317. [Google Scholar] [CrossRef]
  7. Raman, R.; Kowalski, R.; Achuthan, K.; Iyer, A.; Nedungadi, P. Navigating Artificial General Intelligence Development: Societal, Technological, Ethical, and Brain-Inspired Pathways. Sci. Rep. 2025, 15, 8443. [Google Scholar] [CrossRef]
  8. Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Liu, T. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 2025, 43, 42. [Google Scholar] [CrossRef]
  9. Gruber, T.R. A Translation Approach to Portable Ontology Specifications. Knowl. Acquis. 1993, 5, 199–220. [Google Scholar]
  10. Noy, N.F.; McGuinness, D.L. Ontology Development 101: A Guide to Creating Your First Ontology; Technical Report KSL-01-05; Stanford Knowledge Systems Laboratory: Stanford, CA, USA, 2001. [Google Scholar]
  11. Guarino, N.; Oberle, D.; Staab, S. What Is an Ontology? In Handbook on Ontologies; Staab, S., Studer, R., Eds.; Springer: Berlin, Germany, 2009; pp. 1–17. [Google Scholar]
  12. Elmhadhbi, L.; Karray, M.-H.; Archimède, B.; Otte, J.; Smith, B. A modular ontology for semantically enhanced interoperability in operational disaster response. In Proceedings of the 16th International Conference on Information Systems for Crisis Response and Management—ISCRAM 2019, Valencia, Spain, 19–22 May 2019. [Google Scholar]
  13. Khantong, S.; Sharif, M.N.A.; Mahmood, A.K. An ontology for sharing and managing information in disaster response: An illustrative case study of flood evacuation. Int. Rev. Appl. Sci. Eng. 2020, 11, 22–33. [Google Scholar] [CrossRef]
  14. Bu Daher, J.; Huygue, T.; Stolf, P.; Hernandez, N. An ontology and a reasoning approach for evacuation in flood disaster response. In Proceedings of the 17th International Conference on Knowledge Management (IKCM 2022), Potsdam, Germany, 23–24 June 2022; pp. 117–131. [Google Scholar]
  15. Shukla, D.; Azad, H.K.; Abhishek, K.; Shitharth, S. Disaster management ontology—An ontological approach to disaster management automation. Sci. Rep. 2023, 13, 8091. [Google Scholar]
  16. Hofmeister, M.; Bai, J.; Brownbridge, G.; Mosbach, S.; Lee, K.F.; Farazi, F.; Hillman, M.; Agarwal, M.; Ganguly, S.; Akroyd, J.; et al. Semantic agent framework for automated flood assessment using dynamic knowledge graphs. Data-Centric Eng. 2024, 5, e14. [Google Scholar] [CrossRef]
  17. Dutta, B.; Sinha, P.K. An ontological data model to support urban flood disaster response. J. Inf. Sci. 2023, 49, 1–22. [Google Scholar] [CrossRef]
  18. Mughal, M.H.; Shaikh, Z.A.; Wagan, A.I.; Khand, Z.H.; Hassan, S. ORFFM: An Ontology-Based Semantic Model of River Flow and Flood Mitigation. IEEE Access 2021, 9, 44003–44029. [Google Scholar] [CrossRef]
  19. Yahya, H.; Ramli, R. Ontology for Evacuation Center in Flood Management Domain. In Proceedings of the 2020 8th International Conference on Information Technology and Multimedia (ICIMU 2020), Selangor, Malaysia, 24–25 August 2020; Institute of Electrical and Electronics Engineers (IEEE): Piscataway, NJ, USA, 2020; pp. 288–291. [Google Scholar]
  20. Sermet, Y.; Demir, I. Towards an information centric flood ontology for information management and communication. Earth Sci. Inform. 2019, 12, 541–551. [Google Scholar] [CrossRef]
  21. Kurte, K.R.; Durbha, S.S. Spatio-Temporal Ontology for Change Analysis of Flood Affected Areas Using Remote Sensing Images. In Proceedings of the 10th International Conference on Formal Ontology in Information Systems (FOIS 2016), Annecy, France, 6–10 July 2016. Paper ONTO-COMP-D2. [Google Scholar]
  22. Agresta, A.; Fattoruso, G.; Pollino, M.; Pasanisi, F.; Tebano, C.; De Vito, S.; Di Francia, G. An Ontology Framework for Flooding Forecasting. In Proceedings of the 14th International Conference on Computational Science and Its Applications (ICCSA 2014), University of Minho, Campus de Azurém, Guimarães, Portugal, 30 June–3 July 2014; Lecture Notes in Computer Science, Volume 8582. Springer International Publishing: Cham, Switzerland, 2014; pp. 417–428. [Google Scholar]
  23. van Ruler, B. Communication Theory: An Underrated Pillar on Which Strategic Communication Rests. Int. J. Strateg. Commun. 2018, 12, 367–381. [Google Scholar] [CrossRef]
  24. Rowley, J. The Wisdom Hierarchy: Representations of the DIKW Hierarchy. J. Inf. Sci. 2007, 33, 163–180. [Google Scholar] [CrossRef]
  25. MacKinnon, J.; Heldsinger, N.; Peddle, S. A Community Guide to Effective Flood Risk Communication; Partners for Action: Waterloo, ON, Canada, 2018. [Google Scholar]
  26. Rollason, E.; Bracken, L.J.; Hardy, R.J.; Large, A.R.G. Rethinking flood risk communication. Nat. Hazards 2018, 92, 1665–1686. [Google Scholar] [CrossRef]
  27. Zajac, M.; Kulawiak, C.; Li, S.; Erickson, C.; Hubbell, N.; Gong, J. Unifying Flood-Risk Communication: Empowering Community Leaders Through AI-Enhanced, Contextualized Storytelling. Hydrology 2025, 12, 204. [Google Scholar] [CrossRef]
  28. Steen-Tveit, K. Identifying Information Requirements for Improving the Common Operational Picture in Multi-Agency Operations. In Proceedings of the 17th ISCRAM Conference, Blacksburg, VA, USA, 24–27 May 2020; pp. 252–263. [Google Scholar]
  29. Dorasamy, M.; Raman, M.; Kaliannan, M. Knowledge management systems in support of disasters management: A two-decade review. Technol. Forecast. Soc. Change 2013, 80, 1834–1853. [Google Scholar] [CrossRef]
  30. Guarino, N. Formal Ontology and Information Systems. In Proceedings of the Formal Ontology in Information Systems (FOIS’98), Trento, Italy, 6–8 June 1998; pp. 3–15. [Google Scholar]
  31. National Weather Service (NWS). CAP Documentation–NWS Common Alerting Protocol. Available online: https://vlab.noaa.gov/web/nws-common-alerting-protocol/cap-documentation (accessed on 28 August 2025).
  32. Asim, M.N.; Wasim, M.; Khan, M.U.G.; Mahmood, W.; Abbasi, H.M. A Survey of Ontology Learning Techniques and Applications. Database 2018, 2018, bay101. [Google Scholar] [CrossRef]
  33. Ghidalia, S.; Labbani Narsis, O.; Bertaux, A.; Nicolle, C. Combining Machine Learning and Ontology: A Systematic Literature Review. arXiv 2024, arXiv:2401.07744. [Google Scholar] [CrossRef]
  34. Zulkipli, Z.Z.; Maskat, R.; Teo, N.H.I. A Systematic Literature Review of Automatic Ontology Construction. Indones. J. Electr. Eng. Comput. Sci. 2022, 28, 878–889. [Google Scholar] [CrossRef]
  35. Val-Calvo, M.; Egaña-Aranguren, M.; Mulero-Hernández, J.; Almagro-Hernández, I.; Deshmukh, P.; Bernabé-Díaz, J.A.; Espinoza-Arias, P.; Sánchez-Fernández, J.L.; Mueller, J.; Fernández-Breis, G.T. OntoGenix: Leveraging Large Language Models for enhanced ontology engineering from datasets. Inf. Process. Manag. 2025, 62, 104042. [Google Scholar] [CrossRef]
  36. Castro, A.; Pinto, J.; Reino, L.; Pipek, P.; Capinha, C. Large language models overcome the challenges of unstructured text data in ecology. Ecol. Inform. 2024, 82, 102742. [Google Scholar] [CrossRef]
  37. Abid, S.K.; Sulaiman, N.; Chan, S.W. Present and Future of Artificial Intelligence in Disaster Management. In Proceedings of the International Conference on Engineering Management of Communication and Technology (EMCTECH), Vienna, Austria, 16–18 October 2023; IEEE: Kuala Lumpur, Malaysia, 2023; pp. 1–8. [Google Scholar]
  38. Kommineni, V.K.; König-Ries, B.; Samuel, S. From human experts to machines: An LLM-supported approach to ontology and knowledge graph construction. arXiv 2024, arXiv:2403.08345. [Google Scholar]
  39. Lo, A.; Jiang, A.Q.; Li, W.; Jamnik, M. End-to-End Ontology Learning with Large Language Models. In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 10–15 December 2024; NeurIPS Foundation: Vancouver, BC, Canada, 2024. [Google Scholar]
  40. Raees, M.; Meijerink, I.; Lykourentzou, I.; Khan, V.-J.; Papangelis, K. From explainable to interactive AI: A literature review on current trends in human-AI interaction. Int. J. Hum.-Comput. Stud. 2024, 189, 103301. [Google Scholar] [CrossRef]
  41. Mazarakis, A.; Bernhard-Skala, C.; Braun, M.; Peters, I. What is critical for human-centered AI at work?—Toward an interdisciplinary theory. Front. Artif. Intell. 2023, 6, 1257057. [Google Scholar] [CrossRef]
  42. Karanjit, R.; Samadi, V.; Hughes, A.; Murray-Tuite, P.; Stephens, K. Converging human intelligence with AI systems to advance flood evacuation decision making. Nat. Hazards Earth Syst. Sci. Discuss. 2024, in review. [Google Scholar]
  43. Lokala, U.; Lamy, F.; Daniulaityte, R.; Gaur, M.; Gyrard, A.; Thirunarayan, K.; Kursuncu, U.; Sheth, A. Drug Abuse Ontology to Harness Web-Based Data for Substance Use Epidemiology Research: Ontology Development Study. JMIR Public Health Surveill. 2022, 8, e24938. [Google Scholar] [CrossRef]
  44. Tsaneva, S.; Sabou, M. Enhancing Human-in-the-Loop Ontology Curation Results through Task Design. ACM J. Data Inf. Qual. 2024, 16, 4. [Google Scholar] [CrossRef]
  45. Lippolis, A.S.; Saeedizade, M.J.; Keskisärkkä, R.; Zuppiroli, S.; Ceriani, M.; Gangemi, A.; Blomqvist, E.; Nuzzolese, A.G. Ontology Generation Using Large Language Models. arXiv 2025, arXiv:2503.05388. [Google Scholar] [CrossRef]
  46. Abolhasani, M.S.; Pan, R. OntoKGen: A Genuine Ontology and Knowledge Graph Generator Using Large Language Model. In Proceedings of the Annual Reliability & Maintainability Symposium (RAMS), Destin, FL, USA, 27–30 January 2025; pp. 20–25. [Google Scholar]
  47. Aggarwal, T.; Salatino, A.; Osborne, F.; Motta, E. Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field. Inf. Process. Manag. 2024, submitted. [CrossRef]
  48. Fathallah, N.; Das, A.; De Giorgis, S.; Poltronieri, A.; Haase, P.; Kovriguina, L. NeOn-GPT: A Large Language Model-Powered Pipeline for Ontology Learning. In Proceedings of the Semantic Web: ESWC 2024 Satellite Events, Hersonissos, Crete, Greece, 26–30 May 2024; Meroño Peñuela, A., Corcho, O., Groth, P., Simperl, E., Tamma, V., Nuzzolese, A.G., Poveda-Villalón, M., Sabou, M., Presutti, V., Celino, I., Eds.; Lecture Notes in Computer Science. Springer: Cham, Switzerland, 2025; Volume 15344, pp. 36–50. [Google Scholar]
  49. Bakker, R.M.; Di Scala, D.L.; de Boer, M.H.T. Ontology Learning from Text: An Analysis on LLM Performance. In Proceedings of the NLP4KGC: 3rd International Workshop on Natural Language Processing for Knowledge Graph Creation, in conjunction with SEMANTiCS 2024 Conference, Amsterdam, The Netherlands, 17–19 September 2024; CEUR Workshop Proceedings: Aachen, Germany, 2024. [Google Scholar]
  50. Funk, M.; Hosemann, S.; Jung, J.C.; Lutz, C. Towards Ontology Construction with Language Models. arXiv 2023, arXiv:2309.09898. [Google Scholar] [CrossRef]
  51. Babaei Giglou, H.; D’Souza, J.; Auer, S. LLMs4OL: Large Language Models for Ontology Learning. In Proceedings of the 22nd International Semantic Web Conference, Athens, Greece, 6–10 November 2023; Proceedings, Part II. [Google Scholar]
  52. Li, N.; Bailleux, T.; Bouraoui, Z.; Schockaert, S. Ontology Completion with Natural Language Inference and Concept Embeddings: An Analysis. arXiv 2024, arXiv:2403.17216. [Google Scholar] [CrossRef]
  53. Nayyeri, M.; Yogi, A.A.; Fathallah, N.; Thapa, R.B.; Tautenhahn, H.-M.; Schnurpel, A.; Staab, S. Retrieval-Augmented Generation of Ontologies from Relational Databases. arXiv 2025, arXiv:2506.01232. [Google Scholar] [CrossRef]
  54. Yang, H.; Liu, Z.; Xiao, L.; Chen, J.; Zhu, R. An LLM Supported Approach to Ontology and Knowledge Graph Construction. In Proceedings of the 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Lisbon, Portugal, 3–6 December 2024; pp. 5240–5246. [Google Scholar]
  55. Zhang, B.; Carriero, V.A.; Schreiberhuber, K.; Tsaneva, S.; Sánchez González, L.; Kim, J.; de Berardinis, J. OntoChat: A Framework for Conversational Ontology Engineering Using Language Models. In Proceedings of the 21st European Semantic Web Conference (ESWC 2024), Hersonissos, Crete, Greece, 26–30 May 2024; pp. 102–121. [Google Scholar]
  56. Benson, C.-B.; Sculley, A.; Liebers, A.; Beverley, J. My Ontologist: Evaluating BFO-Based AI for Definition Support. In Proceedings of the Workshop on the Convergence of Large Language Models and Ontologies, 14th International Conference on Formal Ontology in Information Systems (FOIS 2024), Enschede, The Netherlands, 2024; pp. 1–10. [Google Scholar]
  57. Li, J.; Garijo, D.; Poveda-Villalón, M. Large Language Models for Ontology Engineering: A Systematic Literature Review. Semant. Web J. 2025, submitted.
  58. David, A.O.; Ndambuki, J.M.; Muloiwa, M.; Kupolati, W.K.; Snyman, J. A Review of the Application of Artificial Intelligence in Climate Change-Induced Flooding—Susceptibility and Management Techniques. CivilEng 2024, 5, 1185–1198. [Google Scholar] [CrossRef]
  59. Wang, B.; Xu, C.; Zhao, X.; Ouyang, L.; Wu, F.; Zhao, Z.; Xu, R.; Liu, K.; Qu, Y.; Shang, F.; et al. Mineru: An Open-Source Solution for Precise Document Content Extraction. arXiv 2024, arXiv:2409.18839. [Google Scholar]
  60. Tartir, S.; Arpinar, I.B.; Moore, M.; Sheth, A.; Aleman-Meza, B. OntoQA: Metric-Based Ontology Quality Analysis. In Proceedings of the IEEE Workshop on Evaluation of Ontologies for the Web (EON), Houston, TX, USA, 27 November 2005. [Google Scholar]
  61. Alirezaie, M.; Khameneh, A.M.; Nagel, T.; Pileggi, S.F. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring. Sensors 2017, 17, 2545. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Workflow of the proposed semi-automated human-AI ontology construction framework.
Figure 2. Initial ontology produced via expert formulation.
Figure 3. This expanded ontology illustrates one fully expanded branch down to its lowest-level leaf nodes. The example path shown follows: Flood Phases → Pre-Flood Preparedness → Warning Time → leaf nodes.
Figure 4. Progressive visualization of the ontology and knowledge graph across increasing levels of granularity. From the full knowledge graph, the figure narrows to Community Awareness Initiatives, then to its subclass Warning Systems. Three exemplar relationships illustrate how communication systems reduce flood risk: (ClearMessaging → Alerts → HazardousAreas), (ClearMessaging → Facilitates_Communication_For → MarginalizedCommunities), and (DisasterInformation → Guides_Relocation_To → HostAreas). By zooming in, the figure highlights how Warning Systems bridge high-level resilience planning with concrete, actionable strategies that strengthen community preparedness.
Figure 5. Numbered highlights indicate where the NWS message falls short and our framework addresses communication gaps, as discussed in Section 3.5: (1) addressing lack of geographic specificity, (2) translating data into tangible impacts, (3) providing actionable instructions, and (4) tailoring messages for vulnerable populations.
Figure 6. Prompt used for extracting relevant concepts from competency questions about flood management and resilience.
Figure 7. Question–concept mapping in JSON for an extracted implicit concept.
Figure 8. Human validation of extracted concepts quiz.
Figure 9. Threshold analysis for Type 1: CQ concepts vs. ontology labels.
Figure 10. Cosine similarity visual for Type 1 evaluation: Test question 40 concepts.
Figure 11. Example of concept matching with varying similarity scores for same concepts.
Figure 12. Threshold analysis for Type 2: CQ concepts vs. ontology descriptions.
Figure 13. Threshold analysis for Type 3: CQ concepts vs. ontology labels + descriptions.
Figure 14. Threshold analysis for Type 4: CQ questions vs. ontology labels + descriptions.
Table 1. Recent Flood and Disaster Ontologies Mini-Review.
Ref. | Year | Domain | Location | Informed-by | Methodology | Intended Users | Use Case | Evaluation | Formality Score | Technology
[6] | 2025 | Flood-process observation | China | UNISDR stages; China Disaster-Relief Plan (2023); OGC Time + GeoSPARQL; W3C SSN; domain experts | Bottom-up, reuse-based design (Protégé); No AI | Emergency managers; flood GIS analysts | Integrated query/decision support across flood stages | Henan 2021 case-study + OntoQA metrics | 5/5 | OWL in Protégé; GraphDB
[16] | 2024 | Flood-impact assessment | UK | Existing ontologies (ENVO, SWEET); GeoSPARQL; Public APIs (EA, Met Office, HM Land Registry) | Hybrid top-down/bottom-up, competency-question driven; No AI | Emergency planners; City Planners; Software agents | Real-time flood-risk impact assessment | Competency questions; HermiT reasoning | 5/5 | OWL in Protégé; Blazegraph
[15] | 2023 | Disaster-general | India | National Disaster Management Plan (India); National Disaster Management Authority matrix; BFO; literature | BFO-aligned custom modelling; OWL-DL + SWRL; No AI | Government disaster managers | Responsibility allocation and relief-decision support | Scenario-based reasoning tests | 5/5 | OWL-DL + SWRL in Protégé
[17] | 2023 | Flood-response | Bangalore, India | Authoritative Documents (NDMP, KSDMP); Competency Questions; Existing Ontologies (FOAF, EM-DAT) | YAMO + NeOn methods; No AI | Emergency responders | Urban-flood rescue/relief coordination | Reasoners; OOPS!; SPARQL CQs | 5/5 | OWL DL (Protégé)
[14] | 2022 | Flood disaster response & evacuation | France (Pyrénées) | Prior models; domain experts (firefighters); institutional databases (BD TOPO); hydraulic models | NeOn design methodology; No AI | Firefighters; emergency managers | Decision support for generating flood evacuation priorities | Real-world case study; performance testing (execution time); visualization | 5/5 | OWL (Protégé); SHACL; SPARQL; Virtuoso
[18] | 2021 | Flood-mitigation | Pakistan (Indus River) | Govt reports (NDMA/PDMA), irrigation manuals, existing ontologies, domain experts | UPON + METHONTOLOGY; No AI | Irrigation & disaster managers | River-flow/flood-mitigation coordination | Competency questions; HermiT reasoner | 5/5 | OWL 2 DL (Protégé)
[13] | 2020 | Flood-evacuation | Thailand | Foundational ontologies (UFO, DEMO); academic literature | Design Science Research; Uschold & King; Gómez-Pérez et al.; No AI | Flood response stakeholders | Structuring & sharing information for disaster response | Expert-based (semi-structured interviews) | 5/5 | OWL/OWL-S in Protégé; UML
[19] | 2020 | Flood-evacuation-center | Malaysia | Academic literature; existing ontologies; JKM domain input; Previous research | Conceptual modelling (no stated framework); No AI | Emergency managers (JKM/NADMA) | Shared victim-profile data | None | 3/5 | Modeling Diagrams
[12] | 2019 | Operational disaster response | France | Interviews with experts; feedback documents; BFO; CCO; prior ontologies | METHONTOLOGY; modularization; competency questions; No AI | Emergency responders | Cross-agency semantic messaging for operational response | HermiT consistency checks; SPARQL over competency questions (Richter-65) | 5/5 | OWL (Protégé)
[20] | 2019 | Flood-information | - | NOAA/FEMA/USGS docs; prior flood ontologies; domain experts | Top-down UML → XMI; No AI | Information-system developers; emergency managers | NLQ knowledge engine; data exchange; Communication | Application-based + data-driven | 4/5 | UML/XMI (GenMyModel)
[21] | 2016 | Flood-change detection | - | Existing ontologies (BFO 2.0, W3C Time); Spatial & Temporal models (RCC-8, Allen interval algebra); Domain Observations | Ontology reuse; rule-based encoding; No AI | Remote-sensing analysts; emergency managers | Spatio-temporal flood detection in RS images | Automated reasoning tests (Pellet) | 5/5 | OWL-DL + SWRL (Protégé)
[22] | 2014 | Flood-forecasting | - | Existing ontologies (SSN, SWEET); hydro/hydraulic literature; domain experts | Uschold–Gruninger (skeletal, middle-out); No AI | Authorities; risk managers | Interoperable sensor–hydraulic flood forecast/alert | None | 5/5 | OWL in Protégé
Table 3. Examples of LLM-generated candidates rejected during human review.
Rejected Proposal (Entity/Hierarchy) | Rejection Reason | Explanation (Human Reviewer’s Rationale)
Hierarchy: SocioeconomicVulnerabilityFactors → Age → Gender → Race | Hallucination/Logical Error | The LLM incorrectly created a hierarchical chain where Race is a subclass of Gender, and Gender is a subclass of Age. This is a nonsensical, logically flawed structure. The human reviewer rejects this hierarchy and restructures them as parallel sibling classes, all of which are direct subclasses of SocioeconomicVulnerabilityFactors.
Entities: CommunityResilienceMeasures → Community; EnvironmentalContexts → GeographicArea → Community | Ambiguity | The LLM generated two concepts with nearly identical labels but placed them in different parts of the ontology. This creates significant ambiguity. A human reviewer consolidates these, likely keeping Community as a subclass of GeographicArea and relating it to CommunityResilienceMeasures through an object property (e.g., community -hasResilienceMeasure-> ...), rather than making it a subclass.
Entity: EnvironmentalContexts → GeographicFactors → Geography | Semantic Drift | The concept “Geography” refers to an entire academic discipline and is far too broad for the specific scope of this ontology. It has “drifted” from the core topic of flood risk. The human reviewer rejects this entity in favor of more specific and relevant concepts like Topography or Watershed.
Entity: FloodEvent → FloodCharacteristic → KonaStorm | Semantic Drift | “Kona Storm” is a highly specific type of cyclone that primarily affects Hawaii. Unless the ontology’s scope is explicitly global or focused on that region, this concept is too specific and not generalizable. The human reviewer rejects it to maintain the ontology’s focus on more broadly applicable flood concepts.
Entity: TypesOfFloods → CoastalFlood → 1%-annual-chance-flood-level | Hallucination/Logical Error | The parent class CoastalFlood describes a physical event (the inundation of land), while the proposed subclass 1%-annual-chance flood level is a statistical metric used to measure risk. A metric is a characteristic of a flood or a floodplain, not a type of flood itself. The human reviewer rejects it.
Table 4. Comparison of ontology structural metrics across different ontologies.
Ontology | Object Properties (P) | Subclass Relations (SC) | RR | No. of Classes (C) | No. of Subclasses | IR | Axiom Count
FDSO | 114 | 607 | 0.16 | 403 | 607 | 1 | 2683
DMDO | 12 | 1001 | 0.01 | 366 | 1001 | 1 | 4075
OntoCity | 17 | 78 | 0.18 | 56 | 96 | 0.8 | 1196
Our Ontology | 473 | 343 | 0.58 | 350 | 343 | 1 | 3754
Table 5. Ontology coverage matches for the first 4 questions in the test set at top k = extracted concept count.
Test Question | Extracted Concepts | Type 1 | Type 2 | Type 3 | Type 4
What factors make urban areas more susceptible to flooding compared to rural areas? | Urban areas | Coastal areas; 0.6188 | areas inhabited by homeless people; 0.4778 | areas inhabited by homeless people; 0.4691 | UrbanFlood; 0.5367
 | Rural areas | rural residents; 0.6994 | rural residents; 0.5519 | rural residents; 0.5231 | SocioeconomicVulnerabilityFactors; 0.5353
 | Flood susceptibility | Flood Risk; 0.7509 | SocioeconomicVulnerabilityFactors; 0.7287 | SocioeconomicVulnerabilityFactors; 0.6889 | EnvironmentalContexts; 0.4759
 | Urbanization | UrbanFlood; 0.4161 | Development Type; 0.3980 | Development Type; 0.3751 | Vulnerable Population; 0.4756
 | Land use | LandCover; 0.5916 | zoning; 0.5000 | zoning; 0.5182 | Age; 0.4682
 | Drainage systems | EvacuationCapacities; 0.4116 | Warning System; 0.4275 | Warning System; 0.4153 | Extreme Rainfall; 0.4669
 | Risk factors | Danger Factor; 0.5673 | socially vulnerable; 0.4989 | socially vulnerable; 0.5092 | Poverty; 0.4664
How can communities identify which type of flooding poses the greatest risk to their specific location? | Communities | Community; 0.6570 | well-known community; 0.5149 | Social Network; 0.5022 | Vulnerable Population; 0.5541
 | Type of flooding | TypesOfFloods; 0.7795 | TypesOfFloods; 0.6840 | TypesOfFloods; 0.6607 | CommunityResilienceMeasures; 0.5445
 | Greatest risk | Health Risk; 0.5677 | socially vulnerable; 0.4122 | socially vulnerable; 0.4126 | Community Impact; 0.5334
 | Specific location | location; 0.6841 | Geographic Area; 0.5397 | Geographic Area; 0.4652 | GeographicFactors; 0.5304
 | Flood identification | Flood Event; 0.7138 | Geographic Area; 0.6244 | TypesOfFloods; 0.6140 | Flood Behavior; 0.5258
What communication systems should communities establish before flooding occurs to coordinate during emergencies? | Communication systems | Reunification Systems; 0.4454 | Warning System; 0.4238 | mainstream media access; 0.3950 | LanguageBarriers; 0.5811
 | Communities | Community; 0.6570 | well-known community; 0.5149 | Social Network; 0.5022 | Warning System; 0.5593
 | Emergency coordination | Animal evacuation coordination; 0.6710 | unified evacuation orders; 0.6202 | unified evacuation orders; 0.6143 | EvacuationCapacities; 0.5228
 | Pre-flood phase | FloodPhases; 0.6231 | Impact Phase; 0.4933 | Impact Phase; 0.4781 | disaster information; 0.5087
 | Pre Flood Preparedness | PreFloodPreparedness; 0.8294 | Emergency Supply; 0.6055 | Emergency Supply; 0.5861 | Evacuating Jurisdictions; 0.5073
 | During/After Flood Needs | DuringFloodResponse; 0.6909 | meet basic human needs; 0.6454 | meet basic human needs; 0.6282 | reduce losses; 0.5044
How can communities develop effective evacuation plans that account for all three flood phases? | Communities | Community; 0.6571 | well-known community; 0.5149 | Social Network; 0.5022 | Evacuating Jurisdictions; 0.5846
 | Evacuation plans | evacuation plan; 0.8730 | plan compliance; 0.6200 | plan compliance; 0.5991 | Evacuation Procedure; 0.5528
 | Flood phases | FloodPhases; 0.8984 | FloodPhases; 0.6546 | FloodPhases; 0.6101 | EvacuationCapacities; 0.5503
 | Effective development | Development Speed; 0.6358 | Development Type; 0.4732 | Development Type; 0.4000 | CommunityResilienceMeasures; 0.5422
 | Flood management | floodplain management tool; 0.7074 | FloodPhases; 0.6242 | flooding reduction; 0.6002 | FloodPhases; 0.5153
Table 6. Coverage analysis by comparison type and threshold.
CQ Concepts vs. Ontology Labels
τ | Q14 | Q15 | Q19 | Q20 | Q24 | Q25 | Q29 | Q30 | Q34 | Q35 | Q39 | Q40 | Total | Q-Cov
0.3 | 7/7 | 5/5 | 6/6 | 5/5 | 4/4 | 7/7 | 5/5 | 7/7 | 5/5 | 5/5 | 7/7 | 3/3 | 100.0% | 12/12
0.4 | 7/7 | 5/5 | 6/6 | 5/5 | 4/4 | 7/7 | 5/5 | 7/7 | 5/5 | 4/5 | 7/7 | 3/3 | 98.5% | 12/12
0.5 | 5/7 | 5/5 | 5/6 | 5/5 | 4/4 | 6/7 | 5/5 | 7/7 | 5/5 | 4/5 | 7/7 | 3/3 | 92.4% | 12/12
0.6 | 3/7 | 4/5 | 5/6 | 5/5 | 4/4 | 2/7 | 5/5 | 6/7 | 4/5 | 3/5 | 7/7 | 2/3 | 75.8% | 12/12
0.7 | 1/7 | 2/5 | 1/6 | 3/5 | 3/4 | 2/7 | 4/5 | 5/7 | 2/5 | 3/5 | 3/7 | 1/3 | 45.5% | 12/12
0.8 | 0/7 | 0/5 | 1/6 | 2/5 | 2/4 | 0/7 | 3/5 | 4/7 | 1/5 | 2/5 | 1/7 | 0/3 | 24.2% | 8/12
0.9 | 0/7 | 0/5 | 0/6 | 0/5 | 1/4 | 0/7 | 1/5 | 2/7 | 0/5 | 1/5 | 0/7 | 0/3 | 7.6% | 4/12
CQ Concepts vs. Ontology Descriptions
τ | Q14 | Q15 | Q19 | Q20 | Q24 | Q25 | Q29 | Q30 | Q34 | Q35 | Q39 | Q40 | Total | Q-Cov
0.3 | 7/7 | 5/5 | 6/6 | 5/5 | 4/4 | 7/7 | 5/5 | 7/7 | 5/5 | 5/5 | 7/7 | 3/3 | 100.0% | 12/12
0.4 | 6/7 | 5/5 | 6/6 | 5/5 | 4/4 | 4/7 | 5/5 | 7/7 | 5/5 | 4/5 | 5/7 | 3/3 | 89.4% | 12/12
0.5 | 3/7 | 4/5 | 4/6 | 4/5 | 3/4 | 3/7 | 4/5 | 6/7 | 2/5 | 2/5 | 3/7 | 3/3 | 62.1% | 12/12
0.6 | 1/7 | 2/5 | 3/6 | 3/5 | 1/4 | 2/7 | 3/5 | 3/7 | 2/5 | 2/5 | 1/7 | 1/3 | 36.4% | 12/12
0.7 | 1/7 | 0/5 | 0/6 | 0/5 | 0/4 | 1/7 | 0/5 | 2/7 | 1/5 | 0/5 | 0/7 | 1/3 | 9.1% | 5/12
0.8 | 0/7 | 0/5 | 0/6 | 0/5 | 0/4 | 0/7 | 0/5 | 0/7 | 0/5 | 0/5 | 0/7 | 0/3 | 0.0% | 0/12
0.9 | 0/7 | 0/5 | 0/6 | 0/5 | 0/4 | 0/7 | 0/5 | 0/7 | 0/5 | 0/5 | 0/7 | 0/3 | 0.0% | 0/12
CQ Concepts vs. Ontology Label+Description
τ | Q14 | Q15 | Q19 | Q20 | Q24 | Q25 | Q29 | Q30 | Q34 | Q35 | Q39 | Q40 | Total | Q-Cov
0.3 | 7/7 | 5/5 | 6/6 | 5/5 | 4/4 | 7/7 | 5/5 | 7/7 | 5/5 | 5/5 | 7/7 | 3/3 | 100.0% | 12/12
0.4 | 6/7 | 5/5 | 5/6 | 4/5 | 4/4 | 5/7 | 5/5 | 7/7 | 5/5 | 4/5 | 7/7 | 3/3 | 90.9% | 12/12
0.5 | 4/7 | 3/5 | 4/6 | 4/5 | 3/4 | 2/7 | 4/5 | 5/7 | 2/5 | 3/5 | 3/7 | 2/3 | 59.1% | 12/12
0.6 | 1/7 | 2/5 | 2/6 | 3/5 | 1/4 | 2/7 | 3/5 | 2/7 | 2/5 | 2/5 | 1/7 | 1/3 | 31.8% | 12/12
0.7 | 0/7 | 0/5 | 0/6 | 0/5 | 0/4 | 0/7 | 0/5 | 2/7 | 0/5 | 0/5 | 0/7 | 0/3 | 3.0% | 1/12
0.8 | 0/7 | 0/5 | 0/6 | 0/5 | 0/4 | 0/7 | 0/5 | 0/7 | 0/5 | 0/5 | 0/7 | 0/3 | 0.0% | 0/12
0.9 | 0/7 | 0/5 | 0/6 | 0/5 | 0/4 | 0/7 | 0/5 | 0/7 | 0/5 | 0/5 | 0/7 | 0/3 | 0.0% | 0/12
CQ Questions vs. Ontology Label+Description
τ | Covered Questions | Total | Q-Cov
0.3 | all 12 (Q14–Q40) | 100.0% | 12/12
0.4 | all 12 | 100.0% | 12/12
0.5 | all 12 | 100.0% | 12/12
0.6 | 3 of 12 | 25.0% | 3/12
0.7 | none | 0.0% | 0/12
0.8 | none | 0.0% | 0/12
0.9 | none | 0.0% | 0/12