A Semi-Automated Framework for Flood Ontology Construction with an Application in Risk Communication

Li, Shenglin; Erickson, Caleb; Zajac, Michal; Guo, Xiaoming; Duan, Qiuhua; Gong, Jiaqi

doi:10.3390/w17192801

Open AccessArticle

A Semi-Automated Framework for Flood Ontology Construction with an Application in Risk Communication

by

Shenglin Li

¹

,

Caleb Erickson

¹,

Michal Zajac

¹

,

Xiaoming Guo

¹,

Qiuhua Duan

²

and

Jiaqi Gong

^1,*

¹

Department of Computer Science, University of Alabama, Tuscaloosa, AL 35487, USA

²

Department of Civil Engineering, University of Alabama, Tuscaloosa, AL 35487, USA

^*

Author to whom correspondence should be addressed.

Water 2025, 17(19), 2801; https://doi.org/10.3390/w17192801

Submission received: 21 July 2025 / Revised: 16 September 2025 / Accepted: 16 September 2025 / Published: 23 September 2025

(This article belongs to the Special Issue Recent Advances in Flood Risk Assessment and Management)

Download

Browse Figures

Versions Notes

Abstract

Flash floods are increasingly frequent and severe, yet standard risk communication messages are often too generic and lack actionable guidance, causing them to be ignored. This research aims to enhance flood risk communication by first, developing a robust flood ontology using a novel semi-automated approach, and second, demonstrating its potential as a semantic foundation for translating complex data into clear, personalized public alerts. We introduce a semi-automated, human-in-the-loop ontology engineering strategy that integrates expert-defined schemas with Large Language Model (LLM)-driven expansion and refinement from authoritative sources. Evaluation results are twofold: (1) Technical metrics confirm our LLM-constructed ontology achieves superior relationship richness and expressiveness compared with existing disaster ontologies. (2) A proof-of-concept case study demonstrates the ontology’s potential by showing how its specific classes and relations (e.g., ‘neededForElderly’ relation linking the class ‘SpecialConsideration’ to ‘ElderlyCommunityMember’) can be used to generate targeted advice like “check on elderly neighbors”, transforming a generic alert into a clear and actionable message. Consequently, this research delivers two key contributions: a replicable and domain-adaptable methodology for semi-automated ontology construction and a practical demonstration of how such an ontology can bridge the critical gap between flood data and public understanding, empowering communities to respond more effectively.

Keywords:

flood risk communication; ontology engineering; large language models; knowledge representation; human-AI collaboration

1. Introduction

Flash floods are occurring with greater frequency and intensity due to the accelerating effects of climate change, disproportionately impacting socioeconomically vulnerable communities that often lack the resources for effective preparedness and recovery [1,2]. In this context, effective flood risk communication is critical for enabling rapid response and timely evacuation. However, flood risk communication often fails at the most critical moment [3]. Public alerts are frequently too generic, filled with technical jargon like ‘rapid onset flooding,’ and lack the specific, actionable guidance needed to prompt immediate, life-saving responses from at-risk residents. This communication failure stems from a deeper, systemic issue: current flood information ecosystems remain fragmented. Local responders, relief agencies, and residents often consult siloed databases or ad-hoc reports whose terminologies are inconsistent, hindering rapid sense-making and coordinated action [4].

Structured, semantically meaningful representations, such as ontologies, offer a powerful tool to resolve this fragmentation by creating a shared, logical understanding of the flood domain [5]. However, while several flood and disaster ontologies exist, a critical analysis reveals two key limitations relevant to this research. First, many existing models are highly specialized frameworks designed for expert-level data integration and lack the specific, granular concepts required for public-facing risk communication. For example, ontologies like the Ontology for Flood Process Observation (OFPO) are excellent for organizing sensor and observation data, but are not explicitly designed to translate these data into clear, personalized alerts that account for socioeconomic vulnerability [6]. Second, their construction often relies on either slow, manual expert-driven methods that are difficult to scale and adapt, or on fully automated techniques using large language models (LLMs), which are prone to semantic drift, hallucination, and limitations in trust and accuracy, especially in high-stakes messaging [7,8]. Together, these findings reveal a dual methodological gap that this research addresses: first, the need for an efficient semi-automated pipeline that integrates human expertise with the power of modern AI to build a contextually rich ontology. Second, the need for an ontology specifically designed and demonstrated to bridge the gap between technical flood data and effective public warnings.

To address this dual gap, our research develops a structured, semi-automated methodology in two primary phases. The first phase focuses on ontology construction. It begins with domain experts defining the initial scope, followed by LLMs extending this into a seed ontology. This ontology is then iteratively refined and validated using Competency Questions (CQs) and enriched with structured knowledge extracted from authoritative sources like Federal Emergency Management Agency (FEMA) guidebooks and National Weather Service (NWS) guidelines. The second phase demonstrates the ontology’s practical application in enhancing flood risk communication. To achieve this, we conduct a proof-of-concept case study where the integrated ontology is used as a semantic framework to systematically deconstruct and revise a standard Flash Flood Warning from the NSW. This process involves a comparative ‘before-and-after’ analysis, showing how the ontology’s logical structure can translate generic, technical information into a specific, actionable, and personalized public alert. This two-part methodology provides a robust, context-sensitive resource and a clear validation of its effectiveness, directly linking the technical construction of the ontology to its real-world communication benefits.

The primary contribution of this paper is a novel, semi-automated methodology for ontology construction that effectively integrates human expertise with AI-driven knowledge extraction. This replicable, human-in-the-loop workflow addresses key challenges of fully automated generation, such as semantic drift and hallucination, resulting in a more trustworthy and contextually rich ontological model. Furthermore, to validate the practical relevance of this ontology, the paper presents a proof-of-concept case study. This demonstration shows the significant potential for the generated ontology to enhance flood risk communication by illustrating how its semantic structure can be used to translate a generic, technical warning into a clear, specific, and actionable public alert. The remainder of the paper is structured as follows: The next section reviews related works, discussing broader ontology applications within disaster communication and management contexts. Following that, the paper outlines the proposed methodological framework, detailing the specific phases of ontology development. The subsequent sections present and discuss the evaluation outcomes, while the final section summarizes the key findings and their broader implications.

2. Related Works

2.1. Ontologies in Flood-Related Applications

Ontologies and knowledge graphs can be understood as the abstract and concrete representations of a shared knowledge structure. An ontology specifies the formal vocabulary and schema of a domain, defining the concepts that exist and the relationships among them. A knowledge graph, in turn, instantiates this schema by populating it with concrete data. For example, in the flood-risk domain, the ontology defines entities such as floods, rivers, and evacuation procedures, while the knowledge graph integrates real-world information about particular flood events, river systems, and responses. This interpretation aligns with and synthesizes the range of definitions proposed in the literature. A widely cited foundation comes from Thomas Gruber, who defines an ontology as “an explicit specification of a conceptualization,” establishing the groundwork for treating ontologies as sharable and machine-readable specifications [9]. Practitioner guides such as Ontology Development 101 extend this view, describing an ontology as “a formal explicit description of concepts in a domain of discourse … an ontology together with a set of individual instances of classes constitutes a knowledge base”, thereby explicitly linking ontologies to instance data [10]. In this paper, we interpret the referenced “knowledge base” as the knowledge graph. Guarino et al. further situate ontologies on a continuum from informal to formal, ranging from glossaries and data dictionaries to logic programming and first-order logic, and emphasize that interoperability depends on a shared understanding achieved through community agreement [11]. In this light, an ontology is less a static dictionary than a social contract that aligns stakeholders around shared meanings.

Table 1 provides a summary of recent flood ontologies, highlighting their role in enhancing the efficient sharing of situational facts during emergencies. Early efforts focused on semantic interoperability, exemplified by Elmhadhbi et al.’s POLARISCO suite, which provides a shared dictionary for French responders through modular ontologies [12]. Recent ontologies expand their scope to the entire disaster-management lifecycle. Khantong et al. introduced a flood-evacuation ontology grounded in foundational ontologies, designed to structure and share both static and dynamic information across organizations in the response phase [13]. Bu Daher et al. showed that their ontology could integrate sensor readings, spatial layers, and social information to infer evacuation priorities and guide flood disaster response [14]. Similarly, Shukla et al.’s Disaster Management Ontology aligns its classes directly with India’s national disaster responsibility matrix [15]. The latest contributions, such as Du et al.’s Ontology for Flood Process Observation (OFPO), model not only the domain but the observation-and-decision workflow itself, linking tasks, data, methods, and sensors across mitigation, preparedness, response, and recovery stages [6]. Hofmeister et al. further advance this trend by explicitly targeting software agents, aligning their ontology with emerging artificial intelligence technologies [16]. Nevertheless, as shown in Table 1, flood ontologies still rely on manual design. Where AI is applied, as in Du et al.’s work, its role is limited to named entity recognition (BiLSTM-CRF and Word2Vec) for populating the ontology with instances and constructing the knowledge graph, rather than supporting ontology design itself [6]. In the critical domain of flooding, where knowledge must remain current, advanced AI approaches such as large language models have yet to be leveraged for ontology construction.

Most disaster and flood ontologies adopt established standards and are primarily implemented in OWL using Protégé. On Guarino’s continuum of formality—from plain glossaries (score 1) to logical ontologies in OWL or first-order logic (score 5)—these OWL-based approaches are scored at the highest level of formality in Table 1. However, Greater formal rigor does not necessarily yield superior performance in practice, since lower-formality ontologies such as SQL database schemas or UML diagrams can offer greater usability and efficiency, highlighting the need to balance expressiveness with practical applicability [11]. In the flood domain, this balance is especially critical, since ontologies are not only technical artifacts but also vehicles for communication among diverse stakeholders.

2.2. Ontologies for Flood-Related Communication

Effective communication requires the gathering of relevant information, the sense-making of that information, and the transmission of the resulting understanding as a negotiated and participatory process [23,24]. In flood-related communication, local officials, community leaders, and forecasting agencies gather and interpret information to issue warnings, support preparedness/outreach, and sustain communication into recovery [25,26]. For those responsible for communication, information resources remain fragmented and overwhelming in volume, causing information overload that impedes sense-making and distillation of clear messages [27]. Agencies may label the same phenomenon as “flash flood”, “pluvial event”, or “surface-water inundation”, which complicates cross-dataset queries and coordination. Because floods often escalate into multi-agency crises, fragmented information across hydrologists, utility operators, and emergency managers can further hinder a timely response [28]. The absence of a shared platform, as noted by Dorasamy et al., also reinforces siloed systems and delays collaboration [29].

Ontologies address fragmentation by centralizing information, contextualizing it within a shared domain, and formalizing common concepts [30]. They provide a stable foundation for data interpretation and knowledge exchange, ensuring that communication remains clear and aligned across stakeholders. In flood operations, warnings are disseminated through standardized protocols such as the Common Alerting Protocol (CAP) with NWS event codes in the United States [31]. These pipelines establish models for communicating hazard information but do not reconcile divergent hydrological vocabularies or fragmented agency data. Moreover, hand-crafted models struggle to keep pace with evolving terminology and the coordination demands of multi-agency practices. A flood ontology offers a way to align hazard terminology and codes with message fields while contextualizing information for intended audiences. This motivates a more automated approach that allows practitioners to iteratively validate and refine a schema for communicative content generation. Closest to this perspective, Sermet and Demir propose an information-centric flood ontology with a communication focus and AI-enabled interaction through a “knowledge engine,” supporting natural-language queries and multi-channel delivery [20]. Their ontology, however, is manually engineered and does not incorporate large language models or AI-assisted ontology construction. In contrast, the present work addresses this gap by contributing an AI-assisted methodology for ontology development, demonstrated through a proof-of-concept that generates context-rich flood messages.

2.3. Ontology Construction Methods

Manually crafting ontologies is widely recognized as resource-intensive and impractical to scale across domains [32]. Recent advances apply Artificial Intelligence and Machine learning, categorized by Ghidalia et al. into ontology learning, semantic mining, and learning-reasoning systems, to address challenges of scalability, consistency, and explainability [33]. Ontology learning (OL) has emerged as an approach for creating, maintaining, and populating ontologies with minimal human intervention, ranging from semi-automated systems with targeted human input to fully automated methods using text mining, information extraction, and symbolic reasoning [34]. In recent years, large language models have begun automating ontology engineering tasks [35]. LLMs can rapidly process large volumes of unstructured data with high accuracy, making them particularly advantageous for updating and refining knowledge in dynamic, multi-stakeholder, and time-sensitive domains such as flood communication. Once provided with a well-designed prompt template, modern LLMs can handle much of the tedious work of ontology engineering, as demonstrated by Castro et al., who showed that GPT-4 correctly identified, geocoded, and structured ecological distribution information from text in 87–100% of cases [36]. In disaster-related contexts, AI has been shown to generate structured insights that support decision-making and relieve experts of routine burdens [37].

A review of LLMs in ontology engineering by Li et al. highlights that most work to date focuses on early phases such as conceptualization and encoding. Systems such as OntoGenix use LLMs for preprocessing, schema planning, and refinement, achieving ontology quality comparable to manual models, though still requiring expert input [35]. Other efforts automate knowledge acquisition, such as Kommineni et al.’s pipeline where LLMs generate competency questions, answer them from corpora, and convert results into ontology axioms, reducing expert interviews but still needing human validation [38]. Lo et al. propose a different approach with OLLM, which treats ontology construction as a “sequence-to-graph” task and fine-tunes a 7B-parameter model to generate ontology subgraphs, outperforming extraction-based methods and generalizing well to new domains [39]. Table 2 provides a structured mini-review of representative approaches, illustrating the diversity of input data sources, tasks, automation levels, and evaluation strategies across domains as varied as semiconductor design, news media, and healthcare. Together, this body of work positions LLMs as powerful tools for automating ontology workflows. However, the risk of hallucinations—fluent yet unfounded assertions—highlights the continuing importance of human oversight, as evidenced by the consistent incorporation of expert validation across existing studies [8].

Addressing hallucinations requires moving beyond “human-in-the-loop” supervision toward genuine human–AI collaboration. Raees et al. describe this shift as a move toward interactive AI, where experts not only validate outputs but also shape how systems generate and refine knowledge [40]. Mazarakis et al. reinforce this direction, emphasizing interdisciplinary perspectives from human–computer interaction, psychology, and information science to ensure that humans remain central in design and application [41]. In the disaster response domain, Karanjit et al. illustrate how their Human–AI Convergence framework integrates machine-learning-based flood forecasts, expert knowledge, and social media inputs to improve evacuation planning [42]. Lokala et al. describe a human-led expansion of the Drug-Abuse Ontology, which was used to train an AI classifier that substantially reduced its false-positive rate on social-media data, although AI was not involved in the ontology construction itself [43]. More recent systems, such as Ontogenia, leverage LLMs to translate user stories and competency questions into OWL ontologies, while Tsaneva and Sabou demonstrate a human-in-the-loop crowdsourcing pipeline where semi-expert contributors validated ontology axioms with high accuracy [44,45]. Yet, Ontogenia remains confined to benchmark-style domains, and neither approach demonstrates genuine human–AI collaborative ontology construction and evaluation from expert documents or in critical, domain-specific settings such as flood-related communication. Collectively, these studies suggest a division of labor in which AI manages large-scale information processing, while human experts define scope, resolve ambiguities, and safeguard quality.

Table 2. Ontology learning with large language models—mini-review.

Ref.	Year	Input Data Source	Goal/Task	LLM Role/Strategy	Methodology/Pipeline	Automation	Technology	Evaluation
[46]	2025	Unstructured RAM technical documents (e.g., Semiconductor Draft Document 6578) plus user-provided targeted knowledge snippets	Interactive ontology extraction and subsequent knowledge graph generation tailored to Reliability and Maintainability domain	OpenAI LLM with adaptive iterative Chain-of-Thought prompting inside a conversational user interface	Dialogue collection → CoT ontology extraction (concepts, relations, properties) → KG create and review → Cypher export → Neo4j load	Semi-automatic: human validates ontology steps; KG generation and database import automated	OpenAI API, adaptive CoT algorithm, Neo4j graph DB, Cypher MERGE, interactive web UI	Case study on Semiconductor Draft Document 6578; qualitative human review; future competency question evaluation planned
[47]	2024	IEEE Thesaurus v1.02 PDF + IEEE-Rel-1K (1000 topic pairs)	Relation classification (broader, narrower, same-as, other) for topic ontology	17 LLMs zero-shot; standard and chain-of-thought prompts with one/two-way heuristics	Prompt generation → LLM inference → heuristic aggregation → metric computation	Fully automatic; experts only build gold standard	Python scripts via Amazon Bedrock, OpenAI API, KoboldAI	Precision, recall, F1 on IEEE-Rel-1K
[48]	2024	Natural language wine domain description and competency questions	Automatic ontology generation (specification, conceptualization, implementation)	GPT-3.5 CoT, role-play, few-shot prompting with iterative self-repair	Draft generation → RDFLib syntax check → HermiT consistency check → OOPS pitfall resolution	Fully automatic, post-hoc human analysis	GPT-3.5 API; RDFLib; HermiT; OOPS API; Turtle; metaphactory	Comparison to Stanford wine ontology using OntoMetrics counts and structural/inference analysis
[49]	2024	Reuters Nord Stream pipeline news article (first 12 sentences)	Ontology extraction (classes, individuals, properties) from unstructured text	GPT-4o zero-shot prompts at T = 0.3; direct, sequential, sentence-level variants	Direct one-shot → Sequential (class→individuals→relations) → Sentence-level extraction → Merge	Fully automatic extraction; no human in loop	GPT-4o API; RDF/Turtle; Python scripts for merging and metrics	Precision, recall, F1; average degree score; qualitative inspection against ground truth
[50]	2023	LLM-as-source; GPT-3.5 latent knowledge seeded by a single domain concept	End-to-end concept-hierarchy (taxonomy) induction from scratch for a chosen domain	GPT-3.5 generates lists, descriptions, and self-verifies relations via zero-/few-shot prompting with frequency sampling	Seed concept → existence check → subconcept listing → description → multi-query verification → KRIS-based insertion into hierarchy	Fully automatic batch run; no human intervention during construction	Python + OpenAI GPT-3.5 API; parallel calls; KRIS insertion algorithm; output ontologies in OWL (RDF/XML)	Manual subjective inspection; Structural stats (concepts, subsumptions, prompts/concept, cost)
[51]	2023	WordNet WN18RR terms; GeoNames categories; UMLS (NCI, MEDCIN, SNOMED CT) concepts; Schema.org type taxonomy	Zero-shot term typing, taxonomy discovery, and non-taxonomic relation extraction to construct ontologies	Seven LLMs queried with cloze/prefix prompts; FLAN instruction-tuning evaluated for gains	Prompt design → LLM inference → compare outputs to gold ontologies via MAP@1 or F1 metrics	Fully automatic zero-shot runs; domain experts only planned for later validation	HuggingFace models (BERT, BART, BLOOM, Flan-T5) and GPT-3; open-source Python codebase	Gold WordNet, GeoNames, UMLS, Schema.org sets; MAP@1 for typing, F1 for taxonomy and relations
[39]	2024	Wikipedia titles and summaries; arXiv titles and abstracts (2020–2022); each document annotated with categories	End-to-end taxonomy induction—discovering concepts and taxonomic is-a relations from scratch	Mistral-7B finetuned via LoRA; custom frequency-masked loss; generates document-level subgraph paths	Linearise relevant paths → LLM outputs subgraphs → sum edge weights → prune loops, inverses, low-weights → final ontology	Fully automatic batch pipeline; no human-in-the-loop after data collection	LoRA-adapted Mistral, vLLM runtime, Sentence-BERT embeddings, Hungarian assignment, simple graph convolutions for metrics	Literal, Fuzzy, Continuous, Graph F1 plus motif distance against Wikipedia and arXiv gold taxonomies
[52]	2024	Rule sets from seven ontologies—Wine, Economy, Olympics, Transport, SUMO, FoodOn, Gene Ontology	Ontology completion—predict missing concept-inclusion axioms within each ontology	Fine-tuned or zero-shot LLMs used as NLI classifiers on verbalised rules; act as fallback judge in hybrid system	Extract rule templates → build concept graph → GNN scores candidates → NLI classifier → hybrid combines GNN first, LLM when no template match	Fully automatic pipeline; human effort limited to annotating hard negative test rules	DeepOnto BERTSubs; RoBERTa, Llama-2, Mistral, Vicuna; GCN/GAT/R-GCN with ConCN embeddings	F1 on manually validated hard negatives across seven ontologies; inter-annotator k up to 0.83 for negatives
[53]	2025	Relational database schemas, natural-language schema documentation, external BioPortal ontologies	Iterative ontology generation and enrichment from relational database schemas	Gen-LLM with hybrid recursive RAG; Judge-LLM or expert refinement; zero-shot prompts	Table traversal → RAG retrieval → prompt → delta ontology → judge validation → merge → iterate	Mostly automatic; optional human or Judge-LLM review of each fragment	OWL 2 DL (Manchester), Faiss ANN index, SBERT embeddings, Protégé, HermiT reasoner	Protégé syntax, HermiT consistency, OOPS pitfalls, structural metrics, semantic coverage, CQ scores on two medical databases
[35]	2025	Six Kaggle CSV datasets on airlines, Amazon beauty ratings, BigBasket products, Brazilian e-commerce orders, consumer complaints, UK e-commerce sales	Semi-automatic ontology construction plus RML mapping and RDF knowledge-graph materialisation from tabular data	GPT-4 multi-agent prompting (Prompt Crafter, Plan Sage, OntoBuilder, OntoMapper) with iterative self-repair of mappings	GUI interaction → data preprocessing → schema definition → ontology building (Turtle) → RML mapping → KGen RDF generation with feedback loop	Semi-automatic; LLM drives tasks while users iteratively refine prompts and validate results via Assist Bot GUI	Python OntoGenix MVC GUI, GPT-4 (gpt-4-1106-preview), OWL/Turtle, RML, KGen, Morph-KGC; code on GitHub/Zenodo	Compared six OntoGenix vs. human ontologies using 19 OQuaRE metrics, OOPS! pitfalls, expert review, and time-saving analysis
[54]	2024	PubMed breast-cancer research articles and NCCN treatment guidelines	Expand seed ontology and populate breast-cancer treatment knowledge graph	ChatGPT-3.5 fine-tuned on domain texts; prompts generate CQs; RAG answers; LLM judge scores outputs	Seed ontology → LLM CQs → expert check → RAG retrieval → redundancy pruning → LLM triple extraction → KG assembly	Semi-automatic with domain-expert validation of LLM-generated CQs and triples	Protégé editor, PubMed RAG pipeline, ChatGPT-3.5 backend	Five PubMed articles manually tagged; LLM judge scored accuracy, completeness, relevance, consistency (1–5)
[55]	2025	Elicited user stories and competency questions plus the existing Music Meta OWL ontology text	Conversational support for requirement elicitation, CQ extraction, analysis, and testing of ontologies	GPT-3.5-turbo with one-shot/few-shot prompts acting as elicitor, generator, clusterer, and judge for CQ verification	Persona chat → CQ generation and refinement → redundancy removal and clustering → ontology verbalisation → prompt-based CQ unit tests	Semi-automatic; human-in-loop refinement and confirmation, with automatic clustering and testing stages	Python 3.11, Gradio UI on HuggingFace Spaces, OpenAI GPT-3.5 API, OWL verbaliser module	Music Meta case study; N = 6 experts and N = 8 engineers surveys; CQ test accuracy 87.5% (P 88%, R 85.7%)

Nonetheless, significant challenges remain. Benson et al. report that large language models often deviate from established upper-level frameworks such as BFO, producing isolated “ontology silos” and undermining definitional clarity [56]. Li et al.’s systematic review goes further, arguing that much of the field suffers from ad hoc task design, inconsistent evaluation practices, and poor reproducibility [57]. Their call for standardized benchmarks and hybrid methods underscores that automation alone cannot ensure ontology quality. Later tasks such as documentation and maintenance also remain underexplored [57]. David et al. reach a similar conclusion in the domain of flood management, finding AI–human integration to be rare and highlighting the absence of end-to-end systems linking machine learning, image analysis, and expert input across disaster phases [58]. Taken together, these findings suggest that while AI can accelerate ontology construction, methodological weaknesses and limited integration with expert workflows continue to undermine reliability, especially in high-stakes settings. To address this gap, our methodology adopts an end-to-end human–AI collaborative approach that spans concept and relation extraction from expert sources, ontology construction, and evaluation, culminating in a practical proof of concept in flood-related communication.

3. Methodology

The proposed ontology-engineering strategy follows a progressive enrichment paradigm, in which human expertise delineates the conceptual boundary of the domain, and LLMs subsequently elaborate on that boundary in a series of data-driven refinement cycles. This semi-automated system aims to unlock the full potential of human-AI collaboration by having LLMs perform tasks that traditionally require human labor. Nevertheless, human intervention for verification and fine-tuning is integrated to ensure the accuracy and prevent hallucination in all intermediate results from the LLMs. Figure 1 shows the workflow of this semi-automated system, with its four main stages detailed below.

3.1. Stage 1: Expert Formulation of the Initial Ontology

This research began by closely examining program requirements and conducting stakeholder interviews, with a specific focus on flood risk communication, disaster management, and resilience within socioeconomically vulnerable communities. Guided by insights derived from this initial analysis, we developed a seed ontology whose top-level classes represent six key narrative pillars: Types of Floods, Flood Phases, Impacts, Socioeconomic Vulnerability Factors, Environmental Contexts, and Community Resilience Measures. Each of these top-level classes included explicit subclasses, for example, under the Types of Floods category, subclasses such as CoastalFlood, RiverineFlood, FlashFlood, and UrbanFlood were defined. LLMs were then utilized to generate meaningful labels and descriptions for each top-level class and its corresponding subclasses. These labels facilitate human readability, while the accompanying descriptions support machine processing through embedding-based semantic identification. At the end of Stage 1, the ontology primarily consisted of a structured class hierarchy, without yet incorporating specific properties. The resulting initial ontology structure is illustrated in Figure 2.

3.2. Stage 2: Competency Question-Driven Ontology Expansion with LLMs

Using the seed ontology as the foundational context, we leveraged the GPT-4o model to generate a diverse and comprehensive set of CQs, explicitly aiming to identify analytical challenges essential to flood risk communication knowledge frameworks. Following this automated generation, we conducted a rigorous manual review, retaining only forty high-quality CQs. These questions specifically targeted critical analytical issues, such as correlating socioeconomic vulnerability factors with evacuation delays and distinguishing environmental conditions between flash floods and riverine floods. To facilitate subsequent evaluation and ontology refinement, we partitioned these selected CQs into 51 distinct groups: 70% for ontology enhancement and 30% reserved exclusively for evaluating the final ontology.

The 70% group of CQs was used to drive the ontology’s expansion by extracting new conceptual entities. This was accomplished using the GPT-4o model with a low temperature of 0.1 to ensure deterministic and accurate outputs, thereby mitigating the risk of LLM hallucination. The model was tasked with identifying and describing key classes and relationships from the CQs. The specific prompts used for CQ generation, entity extraction, and entity placement are detailed in the Appendix A. The resulting candidate entities were then subjected to a rigorous two-step verification and deduplication protocol. First, an automated filtering process utilized OpenAI’s text-embedding-3-large model to calculate the cosine similarity between all new and existing entity descriptions. It removed new candidates with a high degree of similarity (threshold > 0.7), a value determined through experimentation to provide an optimal balance between filtering accurate duplicates and preserving conceptual nuance. This initial pass proved highly effective at reducing redundancy. Second, the remaining candidates underwent a brief human review to address nuanced issues of conceptual granularity, such as consolidating ‘Inundation Warning System’ into the more general parent concept of ‘Warning System.’ Due to the efficacy of the automated pre-filtering, we found a high degree of consensus among reviewers on these final consolidation decisions.

Once the set of new entities was validated, the LLM was prompted again to propose the most appropriate hierarchical placement for each new class within the existing taxonomy and to suggest potential semantic relationships. These structural suggestions underwent a final review by a human expert to ensure logical consistency and appropriate domain-range assignments. The integration of these fully-approved entities and relationships transformed the initial Stage 1 taxonomy into a richly connected T-Box, capable of addressing the targeted CQs.

3.3. Stage 3: Schema Enrichment from Authoritative Documents

To further enrich the ontology with additional domain-relevant concepts and instances, we collected two categories of external materials. The first category included federal recommendation documents, primarily sourced from authoritative agencies such as FEMA and the Department of Homeland Security. The second category comprised relevant and timely news articles collected through web scraping. To ensure LLMs could accurately process the document content, we utilized a document extraction tool, MunerU [59], to preprocess these materials from their original PDF format into machine-readable Markdown text. Leveraging LLMs, we subsequently identified sentences containing domain-specific knowledge pertinent to the existing class hierarchy. By following a structured chain-of-thought process, the LLMs systematically classified these sentences through a hierarchical approach, progressing from broad classifications aligned with top-level classes down to more detailed subclass categorizations.

Given that the scraped news articles predominantly captured current and up-to-date flood-related events, we treated the knowledge extracted from this source primarily as ontology instances. These simple, event-driven instances were directly populated into the ontology and subsequently visualized within a knowledge graph, preparing the groundwork for subsequent stages of analysis. In contrast, the knowledge extracted from federal documents contained a mixture of new concepts and instances. Consequently, we divided these extracted entities into two distinct groups: conceptual knowledge and instance data. The conceptual knowledge was integrated into the ontology following a systematic four-step procedure: (i) removing duplicates via embedding-based semantic similarity, (ii) leveraging LLMs to assign each concept to the most appropriate class in the existing hierarchy, (iii) inserting the validated classes and relationships into the ontology structure, and (iv) subjecting the integrated ontology to human expert review and approval. This collaborative human–AI approach yielded a richer, more policy-aware ontological schema while preserving the foundational structure established in Stages 1 and 2. Because the full ontology is too large to display in a single, comprehensible figure, we instead illustrate in Figure 3 one fully expanded branch down to its lowest-level leaf nodes. The refinement of the LLM-generated candidates followed the two-step verification protocol detailed in Section 3.2, which combines an automated embedding-based filter with human expert review. This hybrid process proved to be highly selective, with approximately 8% and 10% of the initial entity and relationship candidates rejected during this stage. Table 3 shows representative examples of rejections across three categories: hallucination, ambiguity, and semantic drift.

3.4. Stage 4: Instance Population from Web-Scraped Articles

Although the primary focus of this paper is the construction of an ontological schema, we demonstrate its practical utility through a sample knowledge graph populated with real-world data. A targeted web scraper collects recent news articles that describe significant flooding events and their impacts on socioeconomically vulnerable communities. The unstructured text is processed by a custom pipeline that employs LLMs to iteratively classify extracted entities and conceptual knowledge into progressively deeper subclasses defined by the ontology. At each level of the hierarchy, entities are assigned to child classes if available; otherwise, classification terminates at a leaf class. This iterative process continues until all entities are categorized into their most granular ontological class. The resulting set of candidate entities is then refined using an additional LLM-based filter to ensure semantic quality and contextual relevance. Subsequently, another LLM agent infers candidate relationships between the filtered entities, guided by the set of permissible relationships defined in the ontology. It is important to note that these relationships are derived from the semantic and contextual content of the entities themselves. Thus, the existence of a valid ontological relationship between two classes does not entail that such a relationship holds between all corresponding entities of those classes. A relationship is created in the knowledge graph only when there is sufficient contextual evidence within two entities to support it. The fully curated nodes and edges are then ingested into a Neo4j graph database. Figure 4 illustrates a sample visualization of the resulting ontology-backed knowledge graph at different granularity levels.

3.5. Stage 5: Case Study: Demonstrating Ontology Application

To validate the practical utility of the ontology constructed in the preceding stages, we conducted a proof-of-concept case study. This final stage demonstrates how the ontology can be applied as a semantic framework to transform a real-world flood warning into a more effective public warning. We selected an authentic Flash Flood Warning issued by the National Weather Service (NWS) as our baseline message for this case study. The original message, issued for Bradley and McMinn Counties in Tennessee, serves as a representative example of current warning templates. A systematic analysis of this baseline message reveals several critical communication gaps that hinder public comprehension and response. First, it suffers from a lack of geographic specificity; a warning for two entire counties is too broad to inform a resident whether their specific neighborhood or street is in immediate danger. Second, the impact assessment is vague, relying on technical data like rainfall measured in inches instead of describing tangible impacts, such as impassable roads, that are meaningful to residents. Third, the call to action is generic, advising residents to “act quickly” without providing specific, actionable instructions like naming a safe evacuation point. Fourth, the one-size-fits-all message has no consideration for vulnerable populations, failing to account for the unique needs of the elderly, persons with disabilities, or those with language barriers. Our methodology demonstrates the potential of the ontology as a semantic framework to address these gaps systematically. This process is not a fully automated system, but a proof-of-concept illustrating how the ontology’s structure can be queried to enrich and translate a baseline message. This is achieved by leveraging the ontology’s concepts (Entities) and semantic logic (Triplets) to resolve each of the four identified communication gaps:

1. Addressing Lack of Geographic Specificity. To counter the broadness of the original alert, the framework utilizes the ontology’s conceptual hierarchy by querying for specific Entities like “Community” and “ZipCode”. The semantic logic encoded in Triplets, such as the relation “

(FlashFlood_Event) \to locatedInArea \to (Community)

”, provides the mechanism to pinpoint the warning to a precise, at-risk neighborhood.

2. Translating Data into Tangible Impacts. To resolve the use of technical jargon and data, the system leverages the “Impacts” hierarchy. This allows it to model real-world consequences using Entities like “InfrastructuralImpact” and “RoadDamage”. The logical connection in a Triplet, such as “

(FlashFlood_Event) \to hasImpact \to (RoadDamage)

”, enables the translation of abstract data (e.g., rainfall in inches) into a concrete and understandable impact (e.g., impassable roads). Furthermore, the ontology’s catalog of historical incidents can be queried to provide crucial context. By identifying a similar past event through a semantic relation, the alert can offer a precedent, such as, “This event may cause flooding similar to the May 2021 flood”, making the potential threat more immediate and relatable for residents.

3. Providing Actionable Instructions. To replace generic advice, the ontology contains specific Entities for actions and resources, “evacuationProcedure”. The framework can then query for local knowledge stored as Triplets, such as “

(Fairview_Evacuation_Route) \to isA \to (EvacuationProcedure)

”, to provide specific guidance like naming safe locations.

4. Tailoring Messages for Vulnerable Populations. The ontology’s rich hierarchy of Entities, including “ElderlyCommunityMember” and “PersonWithDisability”, allows for targeted messaging. Key Triplets, such as the relation “

(SpecialConsideration) \to neededForElderly \to (ElderlyCommunityMember)

”, provide the explicit logic needed to generate critical, life-saving advice for at-risk groups that is absent in generic alerts.

Figure 5 provides a side-by-side comparison of the original NWS message and the revised, ontology-driven alert, highlighting the key improvements. The ontology-driven message uses information extracted from official documents and targeted web articles, parsed into the ontology, and used as a structured knowledge source to produce community-specific, tailored, and actionable insights.

3.6. Domain Adaptability and Generalization

While this study focuses on flood risk, the proposed semi-automated ontology construction methodology is designed to be domain-adaptable and generalizable to other disaster contexts, such as wildfires, earthquakes, or public health emergencies. The strength of the four-stage workflow, expert-driven schema formulation, LLM-driven expansion, CQ-based validation, and enrichment from authoritative sources lies in its process, which is not specific to hydrology. It provides a robust and replicable method for combining human expertise with AI-driven knowledge extraction in any complex domain.

Applying this methodology to a different disaster requires adapting the domain-specific inputs at each stage. For a wildfire scenario, for instance, the authoritative sources would shift from the NWS and FEMA to agencies like the National Interagency Fire Center (NIFC), and the ontology’s core concepts would involve Fire Behavior, Containment Status, and Air Quality Index instead of Flood Phases. Similarly, an earthquake ontology would be built from U.S. Geological Survey (USGS) data and include key entities such as Epicenter, Magnitude, and Aftershock. For a health disaster, sources like the CDC and WHO would inform a schema centered on Pathogen, Transmission Vector, and Public Health Measures. Despite necessary adaptations to domain content, the proposed methodology, leveraging the ontology to transform a generic technical warning into a clear, actionable public warning, demonstrates the ontology’s practical utility in a universally applicable way.

4. Evaluation Results

To systematically evaluate the quality and completeness of the ontology developed in this research, the evaluation is divided into two parts. First, we assessed the schema quality using established OntoQA metrics [60], which provide insights into the Relationship Richness (RR) and Inheritance Richness (IR). Second, to examine the comprehensiveness of the ontology in covering essential domain concepts, we conducted a concept convergence check. This evaluation involved applying the remaining 30% of CQs that were deliberately set aside from the earlier ontology development phase. If the ontology could successfully align with and represent the concepts inherent in these questions, it would indicate robust conceptual coverage, suggesting that the ontology can effectively support query answering and information retrieval once populated with domain-specific data.

4.1. Structural Evaluation Using OntoQA Metrics

To rigorously evaluate the structural quality of our generated ontology, we utilized three established OntoQA metrics: RR and IR. These metrics provide quantitative insights into the expressivity, completeness, and semantic richness of the ontology. RR measures the proportion of relationships defined within the ontology schema, excluding inheritance relations. Specifically, it is calculated as the ratio of the number of non-inheritance relationships (object properties) to the sum of these relationships and subclass (Inheritance) relationships:

R R = \frac{P}{P + S C}

(1)

where P is the number of object properties (non-inheritance relationships), and

S C

is the number of subclass relationships (inheritance).

Inheritance Richness (

I R

) evaluates the distribution and depth of the ontology’s class hierarchy, indicating how extensively classes are specialized.

I R

is defined as the average number of subclasses per class and is calculated as follows:

I R = \frac{S C}{C}

(2)

where C is the total number of classes.

Furthermore, the total number of axioms from Protégé is utilized to reflect the structural complexity of the ontology. This metric encompasses statements defining classes, relationships, property characteristics, and individual assertions. To benchmark the quality of our generated ontology, we compare it with several well-established, high-quality disaster-related ontologies, including Flood Disaster Support Ontology (FDSO) [17], Disaster Management Domain Ontology (DMDO) [15], and OntoCity [61].

Based on the comparative results summarized in Table 4, our ontology demonstrates notable advantages and some limitations relative to established disaster ontologies. Utilizing LLMs for ontology construction allowed our ontology to achieve a higher Relationship Richness (

R R

= 0.58), substantially outperforming FDSO (0.16), DMDO (0.01), and OntoCity (0.18). This indicates that our ontology leverages more extensive semantic relationships beyond basic hierarchical structures, providing a richer, interconnected conceptual framework. Moreover, our ontology contains a large number of axioms (3754), indicating significant expressiveness and semantic depth comparable to comprehensive ontologies such as DMDO (4075 axioms). However, despite having a substantial class count (350), our inheritance richness (

I R

= 1) suggests a relatively flat hierarchical structure, limiting its depth of specialization compared with more hierarchically nuanced ontologies, such as OntoCity (

I R

= 0.81). Therefore, future improvements will involve refining the hierarchical structure and introducing data properties to boost descriptive specificity and practical applicability.

4.2. Concept Coverage Evaluation via Competency Questions

4.2.1. Concept Extraction

Concept coverage is a descriptive measure of our ontology, answering the question: “Are the concepts in the CQs adequately covered by the concepts in the ontology?” One method to assess concept coverage is to extract words from the CQs using named entity recognition (NER) and perform exact word matching. However, this approach would miss concepts that are semantically similar but are spelled differently. To address this, we use vector embeddings, which encode the semantic meaning of words, phrases, or sentences into a numerical, high-dimensional latent space. Given a corpus of text, each word has complex, nonlinear relationships and patterns with other words. The embedding model, trained on massive text datasets, allows us to compare the similarity between words, phrases or sentences in this high-dimensional space. For this evaluation, we will be using OpenAI’s text-embedding-3-large model. The evaluation is conducted on a CQ test set that is distinct from the CQs upon which the ontology was originally built.

The first step involves extracting concepts from the CQs using an LLM. Figure 6 illustrates the designed prompt, where we constrained the LLM to select three to seven key concepts. Surprisingly, these selected concepts included both explicit and implicit notions.

Interestingly, we found that the model occasionally extracted concepts that were not directly stated in the CQs but represented a higher level of abstraction. For example, from the question, “How can communities develop effective evacuation plans that account for all three flood phases?” the LLM extracted “flood management”. While not explicitly stated, it is a relevant concept for answering the question. An example of this is provided in Figure 7. This behavior can be regulated by adjusting the prompt or the model’s temperature. However, including these non-explicit concepts is beneficial, as important ideas for answering a question may not be stated directly in the language. But it does also carry the risk of generating irrelevant concepts. Based on the prompt, the concepts do not need to be single words, and we avoid stop words like “how”, “what”, etc.

To ensure the LLM concept extraction achieved good coverage and quality for ontology construction, we administered a comprehensive evaluation quiz to three evaluators who are actively involved in constructing the flood ontology. The quiz presented each evaluator with all 56 LLM-extracted concepts across 12 CQs, asking them to assess each concept’s relevance, accuracy, and utility for building a flood management ontology. Evaluators could mark concepts as “good” or “bad” and suggest additional concepts the LLM missed, as shown in Figure 8.

The results were processed using a majority voting system where concepts receiving approval from at least 2 out of 3 evaluators (>50%) were retained, while those with minority support were removed. This validation process demonstrated high LLM performance with a 93.5% approval rate for extracted concepts and an average quality rating of 4.5/5 from evaluators. The human validation resulted in a refined concept set of 66 concepts (up from 56), with two generic concepts like “Flooding” removed due to insufficient evaluator support and 12 new concepts added based on their suggestions. This human-validated concept extraction ensures higher-quality evaluation standards and establishes a human feedback-driven process for continuously refining and extending our flood ontology.

4.2.2. Evaluation and Results

Once all concepts are validated for each CQ, they are then embedded using text-embedding-3-large model. We also embed the ontology labels, which we refer to as the embedded ontology concepts. We then measure the distance between each CQ concept and every ontology concept embedding using cosine similarity. Because we are uncertain about the optimal similarity score, we compare results across different similarity thresholds. We call this our Type 1 comparison, with the results shown in Figure 9. Figure 10 presents a radial plot with the concepts from test question 40 at the center, surrounded by candidate ontology labels. The proximity of each label to the center indicates the degree of cosine similarity, with closer labels representing higher similarity.

A good example illustrating the need for threshold testing is the similarity between the terms “communities”, “Community”, and “community” shown in Figure 11. While we intuitively recognize that they represent the same concept, the embedding model treats them as distinct and assigns different similarity scores when compared with the extracted CQ word “Communities”. But which score should we use as the cutoff? For instance, if we set the threshold at 0.6, we might capture “Community” but miss “community”. This example also highlights the need to identify and possibly merge similar concepts. For example, “Community” appears at depth one in the ontology, while “community” appears at depth two, effectively duplicating the same concept at different levels.

A significant issue with embedding single-word concepts is the loss of context. First, we make the assumption that the concepts to answer the question are explicitly stated in the question itself. Though most of the time true, it may not always be the case. We showed previously that the LLM can extract some implicit concepts for us, but that only happens occasionally. Second, homonyms, words with the same spelling but different meanings, present a challenge. Consider “She made a deposit at the bank” versus “She went to the river bank”. Both use the word “bank”, but with different meanings. If we only extract “bank”, how do we determine which meaning is intended? This problem is known as “Word-Sense Disambiguation”. To combat this, we embed the ontology label comments, which we call ontology concept descriptions. This means we look for matches between the CQ concepts and the descriptions of the ontology concepts. The results of this Type 2 comparison are shown in Figure 12.

We also embed the ontology label along with its description, concatenated as “label:” + “description”, and compare it to the extracted concepts. We refer to this as type 3, which is shown in Figure 13. To preserve the context of the concepts even further, we can embed the entire question and identify which ontology concepts best match the embedding of the full CQ. In Figure 14, we embed the question and compare it to the “label:” + “description” embedding. This approach utilizes the available context for both the question and the ontology concepts; however, it skips the concept extraction step. Nonetheless, it is worth testing to ensure we are not missing out on coverage results.

The complete algorithm for our coverage evaluation is provided in Algorithm 1.

Algorithm 1 Ontology Coverage Evaluation
Require:
1: Competency Questions $Q$
2: Ontology Concepts $OC$
3: Ontology Concept Descriptions $OD$
4: Threshold Set $T$
5: LLM Prompt P
Ensure: Coverage results ${R_{1}, R_{2}, R_{3}, R_{4}}$ for each $τ \in T$
Pre-processing
6: $E_{OC} \leftarrow {embed (c) ∣ c \in OC}$	▹ label embeddings
7: $E_{OD} \leftarrow {embed (d) ∣ d \in OD}$	▹ description embeddings
8: $E_{OC + OD} \leftarrow \{embed (c ∥ d) ∣ c \in OC, d \in OD\}$	▹ label+desc embeds
9: for each $q \in Q$ do
10: $e_{q} \leftarrow embed (q)$	▹ question embed
11: $C_{q} \leftarrow LLM_extract (q, P)$	▹ concepts in q
12: $E_{C_{q}} \leftarrow {embed (c) ∣ c \in C_{q}}$	▹ concept embeds
Coverage Evaluation
13: for each $τ \in T$ do
14: R1: $R_{1} = {c \in C_{q} ∣ {max}_{e \in E_{OC}} sim (e, e_{c}) \geq τ}$	▹ c vs. label
15: R2: $R_{2} = {c \in C_{q} ∣ {max}_{e \in E_{OD}} sim (e, e_{c}) \geq τ}$	▹ c vs. desc
16: R3: $R_{3} = {c \in C_{q} ∣ {max}_{e \in E_{OC + OD}} sim (e, e_{c}) \geq τ}$	▹ c vs. label+desc
17: R4: $R_{4} = [{max}_{e \in E_{OC + OD}} sim (e, e_{q}) \geq τ]$	▹ question vs. label+desc
18: end for
19: end for

To see the top-k concepts from the ontology for each type of coverage evaluation, where k is the count of the extracted concepts, please refer to Table 5. This table shows the cosine similarity score for each concept and provides insights into what is considered “similar” depending on what we embedded. Table 6 provides a detailed numerical summary for every question and its concept matches for Type 1 and Type 2 comparisons. For Type 3 and Type 4, we are comparing against the entire question, so we assess the coverage of ontology concepts relative to the question itself, not its extracted concepts. In these cases, we indicate coverage at a given threshold with a “Y” for yes and a “-” for no. Q-Cov represents the number of questions for which all expected concepts were successfully matched at or above a given similarity threshold, expressed as a fraction of the total number of questions.

4.2.3. Analysis

Across all four comparison types, we observe a consistent coverage–threshold trade-off as the similarity threshold

τ

increases. However, the steepness of this trade-off varies significantly depending on the type of comparison. At the lowest cutoff (

τ = 0.3

), all evaluations achieve perfect coverage, capturing all 66 concepts in the concept-level tests and all 12 questions in the question-level test. Increasing the threshold to

τ = 0.4

slightly reduces coverage, with concept-to-label matches (Type 1) remaining nearly complete at 98.5%, while concept-to-description (Type 2) and concept-to-“label + description” (Type 3) matches decrease to 89.4% and 90.9%, respectively. An example of the difference that the label description provides is for the third question in Table 5, where “emergency coordination” matches to “animal evacuation coordination” for Type 1, while for Types 2 and 3 it yields “unified evacuation orders.”

The most significant divergence occurs between thresholds

τ = 0.5

and

τ = 0.6

(Figure 9, Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14). At

τ = 0.5

, coverage rates stand at 92.4% for Type 1, 62.1% for Type 2, 59.1% for Type 3, and 100% for Type 4. The unweighted average coverage across these four comparison types at

τ = 0.5

is

\frac{92.4 % + 62.1 % + 59.1 % + 100.0 %}{4} = 78.4 % .

However, raising the threshold incrementally to

τ = 0.6

causes a sharper decline, with coverage falling to 75.8% for Type 1, 36.4% for Type 2, 31.8% for Type 3, and dropping dramatically to only 25.0% for Type 4. Further increasing the threshold to

τ = 0.7

significantly reduces coverage to just 45.5% for Type 1, with single-digit or zero coverage for the other three types.

Taken together, these trends underscore the importance of contextual detail while locating a practical “elbow” at roughly

τ = 0.5

–

0.6

. Operating in this band retains relevant matches across every mode while filtering obvious noise. Some of the false negatives that appear at higher thresholds point to structural issues in the ontology itself, such as duplicate classes (e.g., redundant instances of Community) or missing terminology for key domain concepts like flood phases. Addressing these by merging redundancies, enriching descriptions, and incorporating salient terms could improve alignment in future iterations.

The concept coverage evaluation is not a perfect reflection of the ontology’s accuracy or utility in downstream tasks and contains ambiguity regarding the “right threshold”. We could achieve 100% concept coverage by comparing our CQs against a dictionary of all words, but we know that a dictionary is not built for our use case and does not contain the domain reasoning among concepts. It is still important to know that we have successfully modeled the necessary concepts within the constraints of our research problem. When we populate this ontology and locate real data instances to answer the CQs, we will have a way to navigate to the data through the conceptual blueprint we have generated.

This evaluation also reveals areas for improvement in the ontology, such as duplicate classes that need merging, inconsistencies, and missing concepts. The more questions we ask, the better we can iteratively improve the ontology. If we wish to add concepts from a new domain and ask new CQs, we can use this method to evaluate whether those concepts are contained. Because we use an LLM to generate CQs, we are much more efficient at covering a given domain. Human oversight is still necessary to ensure the questions are worthwhile and relevant, but this process is much faster than manual question creation.

5. Discussion

The results of this study introduce a novel semi-automated, human-in-the-loop methodology for ontology construction and demonstrate its application in enhancing flood risk communication. This section provides a critical interpretation of our findings, positioning them within the context of prior work, addressing key methodological advancements, and acknowledging the limitations of the current research.

5.1. A Critical Comparison of Ontological Structure and Purpose

Our evaluation highlights a distinct structural profile for our ontology when compared with established models like the FDSO, DMDO, and OntoCity. The most notable distinction lies in our ontology’s substantially higher Relationship Richness (

R R

= 0.58) compared with FDSO (0.16), DMDO (0.01), and OntoCity (0.18). This outcome reflects our application-driven design. Ontologies like FDSO and DMDO are primarily architected for expert-level data integration, prioritizing deep, hierarchical classification to organize vast datasets. In contrast, our ontology is purpose-built to provide the semantic framework for public-facing risk communication. Effective communication requires modeling complex, cross-cutting relationships that link, for instance, a “SocioeconomicVulnerabilityFactor” to a specific “InfrastructuralImpact” and a corresponding “CommunityResilienceMeasure”. The dense network of non-hierarchical object properties is essential for capturing this context, enabling the translation of technical data into meaningful public alerts.

This design choice involves a trade-off, reflected in our Inheritance Richness (

I R

= 1), which suggests a relatively flat class hierarchy. While this indicates less depth of specialization compared with more formally structured ontologies, it aligns with our immediate goal of developing a communication-centric framework. Nonetheless, we acknowledge this as a limitation and a clear avenue for future work, where incorporating more granular data will be essential to enrich this hierarchical structure without losing relational richness.

5.2. Bridging Documented Gaps in Flood Risk Communication

A primary application of this research is addressing the persistent gap between the issuance of a flood warning and the public’s ability to take appropriate protective action. The risk communication literature consistently identifies the use of technical jargon, a lack of geographic specificity, and generic advice as critical barriers to effective response [3,27]. Our work addresses this “semantic gap” directly. The proof-of-concept case study demonstrates how our ontology serves as a semantic bridge to overcome these challenges. By querying the ontology’s rich relational network, a generic alert for “two inches of rain” can be translated into a tangible, understandable warning about specific “RoadDamage” on “Mouse Creek Road” Similarly, vague instructions like “act quickly” are transformed into “ActionableInstructions” by retrieving knowledge of local resources, such as a named “evacuationProcedure”. This practical application substantiates our claim that a communication-oriented ontology can systematically resolve the well-documented shortcomings of current public alerting systems.

5.3. Advancing Semi-Automated Ontology Construction Methodologies

Beyond the application, our contribution extends to the methodology of ontology engineering itself. The literature on LLM-based ontology construction highlights significant risks, including factual “hallucination", “semantic drift” away from the core domain, and the creation of isolated “ontology silos” that lack definitional clarity [8,56,57]. Our four-stage, human-in-the-loop methodology was explicitly designed to mitigate these known failure modes. By combining expert-led schema formulation, CQ-driven expansion, and enrichment from authoritative documents, we establish strong constraints on the LLM’s generative process.

The effectiveness of this approach is evidenced in our candidate rejection process, as detailed in Table 3. Our hybrid verification protocol, which combines automated filtering with human expert review, successfully identified and rejected logically flawed hierarchies (hallucination), ambiguous concepts, and overly broad terms (semantic drift). This demonstrates that a structured, collaborative human-AI workflow does more than just accelerate ontology creation; it serves as a critical quality assurance mechanism that addresses the well-documented weaknesses of fully automated approaches, resulting in a more trustworthy and contextually robust model.

5.4. Limitations and Future Directions

While this study establishes a strong proof-of-concept, we acknowledge its limitations. The evaluation presented is primarily structural (OntoQA) and qualitative (case study). A crucial next step is to conduct a quantitative, task-oriented performance comparison. An empirical study measuring the comprehension and perceived actionability of the ontology-driven messages among target populations would provide definitive evidence of its real-world effectiveness. Second, as noted, the ontology’s hierarchical structure is relatively shallow. Future iterations will focus on incorporating more granular data from diverse sources to deepen the taxonomy, thereby increasing its descriptive power. Finally, the framework has not yet been field-tested in a live operational environment. Integrating this system into a real-world alerting pipeline, while a significant long-term objective, is essential for validating its practical utility and scalability under the pressures of an actual disaster event.

6. Conclusions

This research demonstrates that a semi-automated, human-in-the-loop methodology can successfully produce a semantically rich ontology specifically engineered to overcome critical gaps in public flood risk communication. By strategically combining human expertise with the generative power of LLMs, our work provides both a replicable workflow and a practical knowledge framework that bridges the divide between complex disaster data and actionable public understanding.

The broader implications of this study are twofold. For flood risk communication, it provides a blueprint for moving beyond the current paradigm of static, one-size-fits-all warnings toward next-generation intelligent alerting systems. Such systems, built on a robust ontological foundation, can enable the dynamic generation of context-aware, personalized, and adaptive messages tailored to the specific needs of vulnerable populations. For the field of ontology design, our application-driven approach advocates the development of purpose-built models over generic ones, while our methodology presents a validated template for human-AI collaboration that mitigates the known risks of fully automated construction.

Looking ahead, our future work will proceed along three concrete paths. First, we will move beyond qualitative assessment to conduct empirical, user-centric studies that quantitatively measure the improved comprehension and actionability of ontology-driven alerts. Second, we aim to enhance the ontology’s technical capabilities by deepening its hierarchical structure and integrating dynamic, real-time data sources such as weather radar feeds and social media sentiment. Finally, our long-term vision is to develop a pilot program in partnership with an emergency management agency, allowing us to test and refine this framework in an operational environment, ultimately advancing the goal of saving lives through clearer communication.

Author Contributions

Conceptualization, S.L., C.E., M.Z. and X.G.; Methodology, S.L. and C.E.; Project administration, Q.D. and J.G.; Validation, S.L. and M.Z.; Data curation, S.L. and C.E.; Software, S.L., C.E. and M.Z.; Writing—review and editing, S.L., C.E. and M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This project was supported in part by the National Science Foundation (NSF Award #2333836) and by the National Oceanic and Atmospheric Administration (NOAA) through the Cooperative Institute for Research to Operations in Hydrology (CIROH) at The University of Alabama under Cooperative Agreement NA22NWS4320003. The statements, findings, conclusions, and recommendations are those of the authors and do not necessarily reflect the views of NSF, NOAA, or the U.S. Department of Commerce.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

During the preparation of this manuscript, the authors used (ChatGPT, version-4o) to check grammar, improve readability, and smooth transitions between paragraphs. The tool was not used for study design, data analysis, or generation of scientific content. The authors reviewed and edited all AI-assisted suggestions and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no competing interests.

Appendix A. LLM Prompts

Appendix A.1. Competency Question Generation

Figure A1. Prompt for generating ontology-driven Competency Questions (CQs) in the flood resilience domain. A low model temperature (

T = 0.1

) is applied to ensure consistency and reproducibility across CQ generation outputs, while Python’s Pydantic models are employed to enforce structured and reliable LLM responses.

Figure A1. Prompt for generating ontology-driven Competency Questions (CQs) in the flood resilience domain. A low model temperature (

T = 0.1

) is applied to ensure consistency and reproducibility across CQ generation outputs, while Python’s Pydantic models are employed to enforce structured and reliable LLM responses.

Appendix A.2. Entity Extraction from Competency Questions

Figure A2. Prompt for extracting concepts from ontology-driven Competency Questions generated from Figure A1. The provided Competency Questions (boxed in blue) are dynamically inserted using Python string operations. A low model temperature (

T = 0.1

) is applied to ensure consistency and reproducibility across CQ generation outputs, while Python’s Pydantic models are employed to enforce structured and reliable LLM responses.

Figure A2. Prompt for extracting concepts from ontology-driven Competency Questions generated from Figure A1. The provided Competency Questions (boxed in blue) are dynamically inserted using Python string operations. A low model temperature (

T = 0.1

) is applied to ensure consistency and reproducibility across CQ generation outputs, while Python’s Pydantic models are employed to enforce structured and reliable LLM responses.

Appendix A.3. Ontology Integration and Property Alignment

Figure A3. Prompt for integrating new classes and aligning object properties within an existing OWL 2 ontology. Python’s Pydantic models are employed to ensure structured, reliable, and schema-compliant LLM outputs. A low model temperature (

T = 0.1

) is applied to guarantee consistency and reproducibility across iterations of ontology extension and entity integration.

Figure A3. Prompt for integrating new classes and aligning object properties within an existing OWL 2 ontology. Python’s Pydantic models are employed to ensure structured, reliable, and schema-compliant LLM outputs. A low model temperature (

T = 0.1

) is applied to guarantee consistency and reproducibility across iterations of ontology extension and entity integration.

References

Chang, S.E. Socioeconomic Impacts of Infrastructure Disruptions. In Oxford Research Encyclopedia of Natural Hazard Science; Oxford University Press: Oxford, UK, 2016. [Google Scholar]
Hao, S.; Wang, W.; Ma, Q.; Li, C.; Wen, L.; Tian, J.; Liu, C. Analysis on the Disaster Mechanism of the “8.12” Flash Flood in the Liulin River Basin. Water 2022, 14, 2017. [Google Scholar] [CrossRef]
Stephens, K.K.; Blessing, R.; Tasuji, T.; McGlone, M.S.; Stearns, L.N.; Lee, Y.; Brody, S.D. Investigating ways to better communicate flood risk: The tight coupling of perceived flood map usability and accuracy. Environ. Hazards 2024, 23, 92–111. [Google Scholar] [CrossRef]
Merz, B.; Vorogushyn, S.; Uhlemann, S.; Viglione, A.; Blöschl, G. Understanding Heavy Tails of Flood Peak Distributions. Water Resour. Res. 2022, 58, e2021WR030506. [Google Scholar] [CrossRef]
Elmhadhbi, L.; Ghedira, C.; Bouaziz, R. An Ontological Approach to Enhancing Information Sharing in Disaster Response. Information 2021, 12, 432. [Google Scholar] [CrossRef]
Du, W.; Liu, C.; Xia, Q.; Wen, M.; Hu, Y. OFPO & KGFPO: Ontology and knowledge graph for flood process observation. Environ. Model. Softw. 2025, 185, 106317. [Google Scholar] [CrossRef]
Raman, R.; Kowalski, R.; Achuthan, K.; Iyer, A.; Nedungadi, P. Navigating Artificial General Intelligence Development: Societal, Technological, Ethical, and Brain-Inspired Pathways. Sci. Rep. 2025, 15, 8443. [Google Scholar] [CrossRef]
Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Liu, T. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 2025, 43, 42. [Google Scholar] [CrossRef]
Gruber, T.R. A Translational Approach to Portable Ontology Specifications. Knowl. Acquis. 1993, in press.
Noy, N.F.; McGuinness, D.L. Ontology Development 101: A Guide to Creating Your First Ontology; Technical Report KSL-01-05; Stanford Knowledge Systems Laboratory: Stanford, CA, USA, 2001. [Google Scholar]
Guarino, N.; Oberle, D.; Staab, S. What Is an Ontology? In Handbook on Ontologies; Staab, S., Studer, R., Eds.; Springer: Berlin, Germany, 2009; pp. 1–17. [Google Scholar]
Elmhadhbi, L.; Karray, M.-H.; Archimède, B.; Otte, J.; Smith, B. A modular ontology for semantically enhanced interoperability in operational disaster response. In Proceedings of the 16th International Conference on Information Systems for Crisis Response and Management—ISCRAM 2019, Valencia, Spain, 19–22 May 2019. [Google Scholar]
Khantong, S.; Sharif, M.N.A.; Mahmood, A.K. An ontology for sharing and managing information in disaster response: An illustrative case study of flood evacuation. Int. Rev. Appl. Sci. Eng. 2020, 11, 22–33. [Google Scholar] [CrossRef][Green Version]
Bu Daher, J.; Huygue, T.; Stolf, P.; Hernandez, N. An ontology and a reasoning approach for evacuation in flood disaster response. In Proceedings of the 17th International Conference on Knowledge Management (IKCM 2022), Potsdam, Germany, 23–24 June 2022; pp. 117–131. [Google Scholar][Green Version]
Shukla, D.; Azad, H.K.; Abhishek, K.; Shitharth, S. Disaster management ontology—An ontological approach to disaster management automation. Sci. Rep. 2023, 13, 8091. [Google Scholar][Green Version]
Hofmeister, M.; Bai, J.; Brownbridge, G.; Mosbach, S.; Lee, K.F.; Farazi, F.; Hillman, M.; Agarwal, M.; Ganguly, S.; Akroyd, J.; et al. Semantic agent framework for automated flood assessment using dynamic knowledge graphs. Data-Centric Eng. 2024, 5, e14. [Google Scholar] [CrossRef]
Dutta, B.; Sinha, P.K. An ontological data model to support urban flood disaster response. J. Inf. Sci. 2023, 49, 1–22. [Google Scholar] [CrossRef]
Mughal, M.H.; Shaikh, Z.A.; Wagan, A.I.; Khand, Z.H.; Hassan, S. ORFFM: An Ontology-Based Semantic Model of River Flow and Flood Mitigation. IEEE Access 2021, 9, 44003–44029. [Google Scholar] [CrossRef]
Yahya, H.; Ramli, R. Ontology for Evacuation Center in Flood Management Domain. In Proceedings of the 2020 8th International Conference on Information Technology and Multimedia (ICIMU 2020), Selangor, Malaysia, 24–25 August 2020; Institute of Electrical and Electronics Engineers (IEEE): Piscataway, NJ, USA, 2020; pp. 288–291. [Google Scholar]
Sermet, Y.; Demir, I. Towards an information centric flood ontology for information management and communication. Earth Sci. Inform. 2019, 12, 541–551. [Google Scholar] [CrossRef]
Kurte, K.R.; Durbha, S.S. Spatio-Temporal Ontology for Change Analysis of Flood Affected Areas Using Remote Sensing Images. In Proceedings of the 10th International Conference on Formal Ontology in Information Systems (FOIS 2016), Annecy, France, 6–10 July 2016. Paper ONTO-COMP-D2. [Google Scholar]
Agresta, A.; Fattoruso, G.; Pollino, M.; Pasanisi, F.; Tebano, C.; De Vito, S.; Di Francia, G. An Ontology Framework for Flooding Forecasting. In Proceedings of the 14th International Conference on Computational Science and Its Applications (ICCSA 2014), University of Minho, Campus de Azurém, Guimarães, Portugal, 30 June–3 July 2014; Lecture Notes in Computer Science, Volume 8582. Springer International Publishing: Cham, Switzerland, 2014; pp. 417–428. [Google Scholar]
van Ruler, B. Communication Theory: An Underrated Pillar on Which Strategic Communication Rests. Int. J. Strateg. Commun. 2018, 12, 367–381. [Google Scholar] [CrossRef]
Rowley, J. The Wisdom Hierarchy: Representations of the DIKW Hierarchy. J. Inf. Sci. 2007, 33, 163–180. [Google Scholar] [CrossRef]
MacKinnon, J.; Heldsinger, N.; Peddle, S. A Community Guide to Effective Flood Risk Communication; Partners for Action: Waterloo, ON, Canada, 2018. [Google Scholar]
Rollason, E.; Bracken, L.J.; Hardy, R.J.; Large, A.R.G. Rethinking flood risk communication. Nat. Hazards 2018, 92, 1665–1686. [Google Scholar] [CrossRef]
Zajac, M.; Kulawiak, C.; Li, S.; Erickson, C.; Hubbell, N.; Gong, J. Unifying Flood-Risk Communication: Empowering Community Leaders Through AI-Enhanced, Contextualized Storytelling. Hydrology 2025, 12, 204. [Google Scholar] [CrossRef]
Steen-Tveit, K. Identifying Information Requirements for Improving the Common Operational Picture in Multi-Agency Operations. In Proceedings of the 17th ISCRAM Conference, Blacksburg, VA, USA, 24–27 May 2020; pp. 252–263. [Google Scholar]
Dorasamy, M.; Raman, M.; Kaliannan, M. Knowledge management systems in support of disasters management: A two-decade review. Technol. Forecast. Soc. Change 2013, 80, 1834–1853. [Google Scholar] [CrossRef]
Guarino, N. Formal Ontology and Information Systems. In Proceedings of the Formal Ontology in Information Systems (FOIS’98), Trento, Italy, 6–8 June 1998; pp. 3–15. [Google Scholar]
National Weather Service (NWS). CAP Documentation–NWS Common Alerting Protocol. Available online: https://vlab.noaa.gov/web/nws-common-alerting-protocol/cap-documentation (accessed on 28 August 2025).
Asim, M.N.; Wasim, M.; Khan, M.U.G.; Mahmood, W.; Abbasi, H.M. A Survey of Ontology Learning Techniques and Applications. Database 2018, 2018, bay101. [Google Scholar] [CrossRef]
Ghidalia, S.; Labbani Narsis, O.; Bertaux, A.; Nicolle, C. Combining Machine Learning and Ontology: A Systematic Literature Review. arXiv 2024, arXiv:2401.07744. [Google Scholar] [CrossRef]
Zulkipli, Z.Z.; Maskat, R.; Teo, N.H.I. A Systematic Literature Review of Automatic Ontology Construction. Indones. J. Electr. Eng. Comput. Sci. 2022, 28, 878–889. [Google Scholar] [CrossRef]
Val-Calvo, M.; Egaña-Aranguren, M.; Mulero-Hernández, J.; Almagro-Hernández, I.; Deshmukh, P.; Bernabé-Díaz, J.A.; Espinoza-Arias, P.; Sánchez-Fernández, J.L.; Mueller, J.; Fernández-Breis, G.T. OntoGenix: Leveraging Large Language Models for enhanced ontology engineering from datasets. Inf. Process. Manag. 2025, 62, 104042. [Google Scholar] [CrossRef]
Castro, A.; Pinto, J.; Reino, L.; Pipek, P.; Capinha, C. Large language models overcome the challenges of unstructured text data in ecology. Ecol. Inform. 2024, 82, 102742. [Google Scholar] [CrossRef]
Abid, S.K.; Sulaiman, N.; Chan, S.W. Present and Future of Artificial Intelligence in Disaster Management. In Proceedings of the International Conference on Engineering Management of Communication and Technology (EMCTECH), Vienna, Austria, 16–18 October 2023; IEEE: Kuala Lumpur, Malaysia, 2023; pp. 1–8. [Google Scholar]
Kommineni, V.K.; König-Ries, B.; Samuel, S. From human experts to machines: An LLM-supported approach to ontology and knowledge graph construction. arXiv 2024, arXiv:2403.08345. [Google Scholar]
Lo, A.; Jiang, A.Q.; Li, W.; Jamnik, M. End-to-End Ontology Learning with Large Language Models. In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 10–15 December 2024; NeurIPS Foundation: Vancouver, BC, Canada, 2024. [Google Scholar]
Raees, M.; Meijerink, I.; Lykourentzou, I.; Khan, V.-J.; Papangelis, K. From explainable to interactive AI: A literature review on current trends in human-AI interaction. Int. J. Hum.-Comput. Stud. 2024, 189, 103301. [Google Scholar] [CrossRef]
Mazarakis, A.; Bernhard-Skala, C.; Braun, M.; Peters, I. What is critical for human-centered AI at work?—Toward an interdisciplinary theory. Front. Artif. Intell. 2023, 6, 1257057. [Google Scholar] [CrossRef]
Karanjit, R.; Samadi, V.; Hughes, A.; Murray-Tuite, P.; Stephens, K. Converging human intelligence with AI systems to advance flood evacuation decision making. Nat. Hazards Earth Syst. Sci. Discuss. 2024. in review. [Google Scholar]
Lokala, U.; Lamy, F.; Daniulaityte, R.; Gaur, M.; Gyrard, A.; Thirunarayan, K.; Kursuncu, U.; Sheth, A. Drug Abuse Ontology to Harness Web-Based Data for Substance Use Epidemiology Research: Ontology Development Study. JMIR Public Health Surveill. 2022, 8, e24938. [Google Scholar] [CrossRef]
Tsaneva, S.; Sabou, M. Enhancing Human-in-the-Loop Ontology Curation Results through Task Design. ACM J. Data Inf. Qual. 2024, 16, 4. [Google Scholar] [CrossRef]
Lippolis, A.S.; Saeedizade, M.J.; Keskisärkkä, R.; Zuppiroli, S.; Ceriani, M.; Gangemi, A.; Blomqvist, E.; Nuzzolese, A.G. Ontology Generation Using Large Language Models. arXiv 2025, arXiv:2503.05388. [Google Scholar] [CrossRef]
Abolhasani, M.S.; Pan, R. OntoKGen: A Genuine Ontology and Knowledge Graph Generator Using Large Language Model. In Proceedings of the Annual Reliability & Maintainability Symposium (RAMS), Destin, FL, USA, 27–30 January 2025; pp. 20–25. [Google Scholar]
Aggarwal, T.; Salatino, A.; Osborne, F.; Motta, E. Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field. Inf. Process. Manag. 2024, submitted. [CrossRef]
Fathallah, N.; Das, A.; De Giorgis, S.; Poltronieri, A.; Haase, P.; Kovriguina, L. NeOn-GPT: A Large Language Model-Powered Pipeline for Ontology Learning. In Proceedings of the Semantic Web: ESWC 2024 Satellite Events, Hersonissos, Crete, Greece, 26–30 May 2024; Meroño Peñuela, A., Corcho, O., Groth, P., Simperl, E., Tamma, V., Nuzzolese, A.G., Poveda-Villalón, M., Sabou, M., Presutti, V., Celino, I., Eds.; Lecture Notes in Computer Science. Springer: Cham, Switzerland, 2025; Volume 15344, pp. 36–50. [Google Scholar]
Bakker, R.M.; Di Scala, D.L.; de Boer, M.H.T. Ontology Learning from Text: An Analysis on LLM Performance. In Proceedings of the NLP4KGC: 3rd International Workshop on Natural Language Processing for Knowledge Graph Creation, in conjunction with SEMANTiCS 2024 Conference, Amsterdam, The Netherlands, 17–19 September 2024; CEUR Workshop Proceedings: Aachen, Germany, 2024. [Google Scholar]
Funk, M.; Hosemann, S.; Jung, J.C.; Lutz, C. Towards Ontology Construction with Language Models. arXiv 2023, arXiv:2309.09898. [Google Scholar] [CrossRef]
Babaei Giglou, H.; D’Souza, J.; Auer, S. LLMs4OL: Large Language Models for Ontology Learning. In Proceedings of the 22nd International Semantic Web Conference, Athens, Greece,, 6–10 November 2023; Proceedings, Part II. [Google Scholar]
Li, N.; Bailleux, T.; Bouraoui, Z.; Schockaert, S. Ontology Completion with Natural Language Inference and Concept Embeddings: An Analysis. arXiv 2024, arXiv:2403.17216. [Google Scholar] [CrossRef]
Nayyeri, M.; Yogi, A.A.; Fathallah, N.; Thapa, R.B.; Tautenhahn, H.-M.; Schnurpel, A.; Staab, S. Retrieval-Augmented Generation of Ontologies from Relational Databases. arXiv 2025, arXiv:2506.01232. [Google Scholar] [CrossRef]
Yang, H.; Liu, Z.; Xiao, L.; Chen, J.; Zhu, R. An LLM Supported Approach to Ontology and Knowledge Graph Construction. In Proceedings of the 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Lisbon, Portugal, 3–6 December 2024; pp. 5240–5246. [Google Scholar]
Zhang, B.; Carriero, V.A.; Schreiberhuber, K.; Tsaneva, S.; Sánchez González, L.; Kim, J.; de Berardinis, J. OntoChat: A Framework for Conversational Ontology Engineering Using Language Models. In Proceedings of the 21st European Semantic Web Conference (ESWC 2024), Hersonissos, Crete, Greece, 26–30 May 2024; pp. 102–121. [Google Scholar]
Benson, C.-B.; Sculley, A.; Liebers, A.; Beverley, J. My Ontologist: Evaluating BFO-Based AI for Definition Support. In Proceedings of the Workshop on the Convergence of Large Language Models and Ontologies, 14th International Conference on Formal Ontology in Information Systems (FOIS 2024), Enschede, The Netherlands; 2024; pp. 1–10. [Google Scholar]
Li, J.; Garijo, D.; Poveda-Villalón, M. Large Language Models for Ontology Engineering: A Systematic Literature Review. Semant. Web J. 2025, submitted.
David, A.O.; Ndambuki, J.M.; Muloiwa, M.; Kupolati, W.K.; Snyman, J. A Review of the Application of Artificial Intelligence in Climate Change-Induced Flooding—Susceptibility and Management Techniques. CivilEng 2024, 5, 1185–1198. [Google Scholar] [CrossRef]
Wang, B.; Xu, C.; Zhao, X.; Ouyang, L.; Wu, F.; Zhao, Z.; Xu, R.; Liu, K.; Qu, Y.; Shang, F.; et al. Mineru: An Open-Source Solution for Precise Document Content Extraction. arXiv 2024, arXiv:2409.18839. [Google Scholar]
Tartir, S.; Arpinar, I.B.; Moore, M.; Sheth, A.; Aleman-Meza, B. OntoQA: Metric-Based Ontology Quality Analysis. In Proceedings of the IEEE Workshop on Evaluation of Ontologies for the Web (EON), Houston, TX, USA, 27 November 2005. [Google Scholar]
Alirezaie, M.; Khameneh, A.M.; Nagel, T.; Pileggi, S.F. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring. Sensors 2017, 17, 2545. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Workflow of the proposed semi-automated human-AI ontology construction framework.

Figure 2. Initial ontology produced via expert formulation.

Figure 3. This expanded ontology illustrates one fully expanded branch down to its lowest-level leaf nodes. The example path shown follows: Flood Phases → Pre-Flood Preparedness → Warning Time → leaf nodes.

Figure 4. Progressive visualization of the ontology and knowledge graph across increasing levels of granularity. From the full knowledge graph, the figure narrows to Community Awareness Initiatives, then to its subclass Warning Systems. Three exemplar relationships illustrate how communication systems reduce flood risk: (ClearMessaging → Alerts → HazardousAreas), (ClearMessaging → Facilitates_Communication_For → MarginalizedCommunities), and (DisasterInformation → Guides_Relocation_To → HostAreas). By zooming in, the figure highlights how Warning Systems bridge high-level resilience planning with concrete, actionable strategies that strengthen community preparedness.

Figure 5. Numbered highlights indicate where the NWS message falls short and our framework addresses communication gaps, as discussed in Section 3.5: (1) addressing lack of geographic specificity, (2) translating data into tangible impacts, (3) providing actionable instructions, and (4) tailoring messages for vulnerable populations.

Figure 6. Prompt used for extracting relevant concepts from competency questions about flood management and resilience.

Figure 7. Question–concept mapping in JSON for an extracted implicit concept.

Figure 8. Human validation of extracted concepts quiz.

Figure 9. Threshold analysis for Type 1: CQ concepts vs. ontology labels.

Figure 10. Cosine similarity visual for Type 1 evaluation: Test question 40 concepts.

Figure 11. Example of concept matching with varying similarity scores for same concepts.

Figure 12. Threshold analysis for Type 2: CQ concepts vs. ontology descriptions.

Figure 13. Threshold Analysis for Type 3: CQ Questions vs. Ontology Labels.

Figure 14. Threshold Analysis for Type 4: CQ Questions vs. Ontology Descriptions.

Table 1. Recent Flood and Disaster Ontologies Mini-Review.

Ref.	Year	Domain	Location	Informed-by	Methodology	Intended Users	Use Case	Evaluation	Formality Score	Technology
[6]	2025	Flood-process observation	China	UNISDR stages; China Disaster-Relief Plan (2023); OGC Time + GeoSPARQL; W3C SSN; domain experts	Bottom-up, reuse-based design (Protégé); No AI	Emergency managers; flood GIS analysts	Integrated query/decision support across flood stages	Henan 2021 case-study + OntoQA metrics	5/5	OWL in Protégé; GraphDB
[16]	2024	Flood-impact assessment	UK	Existing ontologies (ENVO,SWEET); GeoSPARQL; Public APIs (EA, Met Office, HM Land Registry)	Hybrid top–down/bottom-up, competency-question driven; No AI	Emergency planners; City Planners; Software agents	Real-time flood-risk impact assessment	Competency questions; HermiT reasoning	5/5	OWL in Protégé; Blazegraph
[15]	2023	Disaster-general	India	National Disaster Management Plan (India); National Disaster Management Authority matrix; BFO; literature	BFO-aligned custom modelling; OWL-DL + SWRL; No AI	Government disaster managers	Responsibility allocation and relief-decision support	Scenario-based reasoning tests	5/5	OWL-DL + SWRL in Protégé
[17]	2023	Flood-response	Bangalore, India	Authoritative Documents (NDMP, KSDMP); Competency Questions; Existing Ontologies (FOAF, EM-DAT)	YAMO + NeOn methods; No AI	Emergency responders	Urban-flood rescue/relief coordination	Reasoners; OOPS!; SPARQL CQs	5/5	OWL DL (Protégé)
[14]	2022	Flood disaster response & evacuation	France (Pyrénées)	Prior models; domain experts (firefighters); institutional databases (BD TOPO); hydraulic models	NeOn design methodology; No AI	Firefighters; emergency managers	Decision support for generating flood evacuation priorities	Real-world case study; performance testing (execution time); visualization	5/5	OWL (Protégé); SHACL; SPARQL; Virtuoso
[18]	2021	Flood-mitigation	Pakistan (Indus River)	Govt reports (NDMA/PDMA), irrigation manuals, existing ontologies, domain experts	UPON + METHONTOLOGY; No AI	Irrigation & disaster managers	River-flow/flood-mitigation coordination	Competency questions; HermiT reasoner	5/5	OWL 2 DL (Protégé)
[13]	2020	Flood-evacuation	Thailand	Foundational ontologies (UFO, DEMO); academic literature	Design Science Research; Uschold & King; Gómez-Pérez et al.; No AI	Flood response stakeholders	Structuring & sharing information for disaster response	Expert-based (semi-structured interviews)	5/5	OWL/OWL-S in Protégé; UML
[19]	2020	Flood-evacuation-center	Malaysia	Academic literature; existing ontologies; JKM domain input; Previous research	Conceptual modelling (no stated framework); No AI	Emergency managers (JKM/NADMA)	Shared victim-profile data	None	3/5	Modeling Diagrams
[12]	2019	Operational disaster response	France	Interviews with experts; feedback documents; BFO; CCO; prior ontologies	METHONTOLOGY; modularization; competency questions; No AI	Emergency responders	Cross-agency semantic messaging for operational response	HermiT consistency checks; SPARQL over competency questions (Richter-65)	5/5	OWL (Protégé)
[20]	2019	Flood-information	–	NOAA/FEMA/USGS docs; prior flood ontologies; domain experts	Top-down UML → XMI; No AI	Information-system developers; emergency managers	NLQ knowledge engine; data exchange; Communication	Application-based + data-driven	4/5	UML/XMI (GenMyModel)
[21]	2016	Flood-change detection	–	Existing ontologies (BFO 2.0, W3C Time); Spatial&Temporal models(RCC-8, Allen interval algebra); Domain Observations	Ontology reuse; rule-based encoding; No AI	Remote-sensing analysts; emergency managers	Spatio-temporal flood detection in RS images	Automated reasoning tests (Pellet)	5/5	OWL-DL + SWRL (Protégé)
[22]	2014	Flood-forecasting	–	Existing ontologies (SSN,SWEET); hydro/hydraulic literature; domain experts	Uschold–Gruninger (skeletal, middle-out); No AI	Authorities; risk managers	Interoperable sensor–hydraulic flood forecast/alert	None	5/5	OWL in Protégé

Table 3. Examples of LLM-generated candidates rejected during human review.

Rejected Proposal (Entity/Hierarchy)	Rejection Reason	Explanation (Human Reviewer’s Rationale)
Hierarchy: SocioeconomicVulnerabilityFactors → Age → Gender → Race	Hallucination/Logical Error	The LLM incorrectly created a hierarchical chain where Race is a subclass of Gender, and Gender is a subclass of Age. This is a nonsensical, logically flawed structure. The human reviewer rejects this hierarchy and restructures them as parallel sibling classes, all of which are direct subclasses of SocioeconomicVulnerabilityFactors.
Entities: CommunityResilienceMeasures → Community; EnvironmentalContexts → GeographicArea → Community	Ambiguity	The LLM generated two concepts with nearly identical labels but placed them in different parts of the ontology. This creates significant ambiguity. A human reviewer consolidates these, likely keeping Community as a subclass of geographicArea and relating it to CommunityResilienceMeasures through an object property (e.g., community -hasResilienceMeasure-> ...), rather than making it a subclass.
Entity: EnvironmentalContexts → GeographicFactors → Geography	Semantic Drift	The concept “Geography” refers to an entire academic discipline and is far too broad for the specific scope of this ontology. It has “drifted” from the core topic of flood risk. The human reviewer rejects this entity in favor of more specific and relevant concepts like Topography or Watershed.
Entity: FloodEvent → FloodCharacteristic → KonaStorm	Semantic Drift	“Kona Storm” is a highly specific type of cyclone that primarily affects Hawaii. Unless the ontology’s scope is explicitly global or focused on that region, this concept is too specific and not generalizable. The human reviewer rejects it to maintain the ontology’s focus on more broadly applicable flood concepts.
Entity: TypesOfFloods → CoastalFlood → 1%-annual-chance-flood-level	Hallucination/Logical Error	The parent class CoastalFlood describes a physical event (the inundation of land), while the proposed subclass 1%-annual-chance flood level is a statistical metric used to measure risk. A metric is a characteristic of a flood or a floodplain, not a type of flood itself. The human reviewer rejects it.

Table 4. Comparison of ontology structural metrics across different ontologies.

Ontology	Object Properties (P)	Subclass Relations ( $SC$ )	$RR$	No. of Classes (C)	No. of Subclasses	$IR$	Axiom Count
FDSO	114	607	0.16	403	607	1	2683
DMDO	12	1001	0.01	366	1001	1	4075
OntoCity	17	78	0.18	56	96	0.81	196
Our Ontology	473	343	0.58	350	343	1	3754

Table 5. Ontology coverage matches for first 4 questions in test set at Top k = extracted concept count.

Test Question	Extracted Concepts	Type 1	Type 2	Type 3	Type 4
What factors make urban areas more susceptible to flooding compared to rural areas?	Urban areas	Coastal areas; 0.6188	areas inhabited by homeless people; 0.4778	areas inhabited by homeless people; 0.4691	UrbanFlood; 0.5367
	Rural areas	rural residents; 0.6994	rural residents; 0.5519	rural residents; 0.5231	SocioeconomicVulnerabilityFactors; 0.5353
	Flood susceptibility	Flood Risk; 0.7509	SocioeconomicVulnerabilityFactors; 0.7287	SocioeconomicVulnerabilityFactors; 0.6889	EnvironmentalContexts; 0.4759
	Urbanization	UrbanFlood; 0.4161	Development Type; 0.3980	Development Type; 0.3751	Vulnerable Population; 0.4756
	Land use	LandCover; 0.5916	zoning; 0.5000	zoning; 0.5182	Age; 0.4682
	Drainage systems	EvacuationCapacities; 0.4116	Warning System; 0.4275	Warning System; 0.4153	Extreme Rainfall; 0.4669
	Risk factors	Danger Factor; 0.5673	socially vulnerable; 0.4989	socially vulnerable; 0.5092	Poverty; 0.4664
How can communities identify which type of flooding poses the greatest risk to their specific location?	Communities	Community; 0.6570	well-known community; 0.5149	Social Network; 0.5022	Vulnerable Population; 0.5541
	Type of flooding	TypesOfFloods; 0.7795	TypesOfFloods; 0.6840	TypesOfFloods; 0.6607	CommunityResilienceMeasures; 0.5445
	Greatest risk	Health Risk; 0.5677	socially vulnerable; 0.4122	socially vulnerable; 0.4126	Community Impact; 0.5334
	Specific location	location; 0.6841	Geographic Area; 0.5397	Geographic Area; 0.4652	GeographicFactors; 0.5304
	Flood identification	Flood Event; 0.7138	Geographic Area; 0.6244	TypesOfFloods; 0.6140	Flood Behavior; 0.5258
What communication systems should communities establish before flooding occurs to coordinate during emergencies?	Communication systems	Reunification Systems; 0.4454	Warning System; 0.4238	mainstream media access; 0.3950	LanguageBarriers; 0.5811
	Communities	Community; 0.6570	well-known community; 0.5149	Social Network; 0.5022	Warning System; 0.5593
	Emergency coordination	Animal evacuation coordination; 0.6710	unified evacuation orders; 0.6202	unified evacuation orders; 0.6143	EvacuationCapacities; 0.5228
	Pre-flood phase	FloodPhases; 0.6231	Impact Phase; 0.4933	Impact Phase; 0.4781	disaster information; 0.5087
	Pre Flood Preparedness	PreFloodPreparedness; 0.8294	Emergency Supply; 0.6055	Emergency Supply; 0.5861	Evacuating Jurisdictions; 0.5073
	During/After Flood Needs	DuringFloodResponse; 0.6909	meet basic human needs; 0.6454	meet basic human needs; 0.6282	reduce losses; 0.5044
How can communities develop effective evacuation plans that account for all three flood phases?	Communities	Community; 0.6571	well-known community; 0.5149	Social Network; 0.5022	Evacuating Jurisdictions; 0.5846
	Evacuation plans	evacuation plan; 0.8730	plan compliance; 0.6200	plan compliance; 0.5991	Evacuation Procedure; 0.5528
	Flood phases	FloodPhases; 0.8984	FloodPhases; 0.6546	FloodPhases; 0.6101	EvacuationCapacities; 0.5503
	Effective development	Development Speed; 0.6358	Development Type; 0.4732	Development Type; 0.4000	CommunityResilienceMeasures; 0.5422
	Flood management	floodplain management tool; 0.7074	FloodPhases; 0.6242	flooding reduction; 0.6002	FloodPhases; 0.5153

Table 6. Coverage analysis by comparison type and threshold.

CQ Concepts vs. Ontology Labels	$τ$	Q14	Q15	Q19	Q20	Q24	Q25	Q29	Q30	Q34	Q35	Q39	Q40	Total	Q-Cov
	0.3	7/7	5/5	6/6	5/5	4/4	7/7	5/5	7/7	5/5	5/5	7/7	3/3	100.0%	12/12
	0.4	7/7	5/5	6/6	5/5	4/4	7/7	5/5	7/7	5/5	4/5	7/7	3/3	98.5%	12/12
	0.5	5/7	5/5	5/6	5/5	4/4	6/7	5/5	7/7	5/5	4/5	7/7	3/3	92.4%	12/12
	0.6	3/7	4/5	5/6	5/5	4/4	2/7	5/5	6/7	4/5	3/5	7/7	2/3	75.8%	12/12
	0.7	1/7	2/5	1/6	3/5	3/4	2/7	4/5	5/7	2/5	3/5	3/7	1/3	45.5%	12/12
	0.8	0/7	0/5	1/6	2/5	2/4	0/7	3/5	4/7	1/5	2/5	1/7	0/3	24.2%	8/12
	0.9	0/7	0/5	0/6	0/5	1/4	0/7	1/5	2/7	0/5	1/5	0/7	0/3	7.6%	4/12
CQ Concepts vs. Ontology Descriptions	$τ$	Q14	Q15	Q19	Q20	Q24	Q25	Q29	Q30	Q34	Q35	Q39	Q40	Total	Q-Cov
	0.3	7/7	5/5	6/6	5/5	4/4	7/7	5/5	7/7	5/5	5/5	7/7	3/3	100.0%	12/12
	0.4	6/7	5/5	6/6	5/5	4/4	4/7	5/5	7/7	5/5	4/5	5/7	3/3	89.4%	12/12
	0.5	3/7	4/5	4/6	4/5	3/4	3/7	4/5	6/7	2/5	2/5	3/7	3/3	62.1%	12/12
	0.6	1/7	2/5	3/6	3/5	1/4	2/7	3/5	3/7	2/5	2/5	1/7	1/3	36.4%	12/12
	0.7	1/7	0/5	0/6	0/5	0/4	1/7	0/5	2/7	1/5	0/5	0/7	1/3	9.1%	5/12
	0.8	0/7	0/5	0/6	0/5	0/4	0/7	0/5	0/7	0/5	0/5	0/7	0/3	0.0%	0/12
	0.9	0/7	0/5	0/6	0/5	0/4	0/7	0/5	0/7	0/5	0/5	0/7	0/3	0.0%	0/12
CQ Concepts vs. Ontology Label+Description	$τ$	Q14	Q15	Q19	Q20	Q24	Q25	Q29	Q30	Q34	Q35	Q39	Q40	Total	Q-Cov
	0.3	7/7	5/5	6/6	5/5	4/4	7/7	5/5	7/7	5/5	5/5	7/7	3/3	100.0%	12/12
	0.4	6/7	5/5	5/6	4/5	4/4	5/7	5/5	7/7	5/5	4/5	7/7	3/3	90.9%	12/12
	0.5	4/7	3/5	4/6	4/5	3/4	2/7	4/5	5/7	2/5	3/5	3/7	2/3	59.1%	12/12
	0.6	1/7	2/5	2/6	3/5	1/4	2/7	3/5	2/7	2/5	2/5	1/7	1/3	31.8%	12/12
	0.7	0/7	0/5	0/6	0/5	0/4	0/7	0/5	2/7	0/5	0/5	0/7	0/3	3.0%	1/12
	0.8	0/7	0/5	0/6	0/5	0/4	0/7	0/5	0/7	0/5	0/5	0/7	0/3	0.0%	0/12
	0.9	0/7	0/5	0/6	0/5	0/4	0/7	0/5	0/7	0/5	0/5	0/7	0/3	0.0%	0/12
CQ Questions vs. Ontology Label+Description	$τ$	Q14	Q15	Q19	Q20	Q24	Q25	Q29	Q30	Q34	Q35	Q39	Q40	Total	Q-Cov
	0.3	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	100.0%	12/12
	0.4	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	100.0%	12/12
	0.5	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	100.0%	12/12
	0.6	–	–	–	–	Y	–	–	Y	–	–	–	Y	25.0%	3/12
	0.7	–	–	–	–	–	–	–	–	–	–	–	–	0.0%	0/12
	0.8	–	–	–	–	–	–	–	–	–	–	–	–	0.0%	0/12
	0.9	–	–	–	–	–	–	–	–	–	–	–	–	0.0%	0/12

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, S.; Erickson, C.; Zajac, M.; Guo, X.; Duan, Q.; Gong, J. A Semi-Automated Framework for Flood Ontology Construction with an Application in Risk Communication. Water 2025, 17, 2801. https://doi.org/10.3390/w17192801

AMA Style

Li S, Erickson C, Zajac M, Guo X, Duan Q, Gong J. A Semi-Automated Framework for Flood Ontology Construction with an Application in Risk Communication. Water. 2025; 17(19):2801. https://doi.org/10.3390/w17192801

Chicago/Turabian Style

Li, Shenglin, Caleb Erickson, Michal Zajac, Xiaoming Guo, Qiuhua Duan, and Jiaqi Gong. 2025. "A Semi-Automated Framework for Flood Ontology Construction with an Application in Risk Communication" Water 17, no. 19: 2801. https://doi.org/10.3390/w17192801

APA Style

Li, S., Erickson, C., Zajac, M., Guo, X., Duan, Q., & Gong, J. (2025). A Semi-Automated Framework for Flood Ontology Construction with an Application in Risk Communication. Water, 17(19), 2801. https://doi.org/10.3390/w17192801

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Semi-Automated Framework for Flood Ontology Construction with an Application in Risk Communication

Abstract

1. Introduction

2. Related Works

2.1. Ontologies in Flood-Related Applications

2.2. Ontologies for Flood-Related Communication

2.3. Ontology Construction Methods

3. Methodology

3.1. Stage 1: Expert Formulation of the Initial Ontology

3.2. Stage 2: Competency Question-Driven Ontology Expansion with LLMs

3.3. Stage 3: Schema Enrichment from Authoritative Documents

3.4. Stage 4: Instance Population from Web-Scraped Articles

3.5. Stage 5: Case Study: Demonstrating Ontology Application

3.6. Domain Adaptability and Generalization

4. Evaluation Results

4.1. Structural Evaluation Using OntoQA Metrics

4.2. Concept Coverage Evaluation via Competency Questions

4.2.1. Concept Extraction

4.2.2. Evaluation and Results

4.2.3. Analysis

5. Discussion

5.1. A Critical Comparison of Ontological Structure and Purpose

5.2. Bridging Documented Gaps in Flood Risk Communication

5.3. Advancing Semi-Automated Ontology Construction Methodologies

5.4. Limitations and Future Directions

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. LLM Prompts

Appendix A.1. Competency Question Generation

Appendix A.2. Entity Extraction from Competency Questions

Appendix A.3. Ontology Integration and Property Alignment

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI