Research on the Construction and Application of a Water Conservancy Facility Safety Knowledge Graph Based on Large Language Models

Li, Cui; Wang, Yu; Gao, Lei; Ding, Qiaoyan

doi:10.3390/w18070840

¹ School of Economics and Management, University of Emergency Management, Langfang 065201, China

² Langfang Key Laboratory of Emergency Material Security and Logistics Management, University of Emergency Management, Langfang 065201, China

³ Department of Disciplines and Graduate Studies, University of Emergency Management, Langfang 065201, China

* Author to whom correspondence should be addressed.

Water 2026, 18(7), 840; https://doi.org/10.3390/w18070840

Submission received: 3 February 2026 / Revised: 17 March 2026 / Accepted: 31 March 2026 / Published: 1 April 2026

(This article belongs to the Special Issue Advances and Major Achievements in China’s Digital Twin River Basin Development)

Abstract

Water conservancy safety management faces several challenges, including the integration of multi-source heterogeneous data and inefficient knowledge utilization. To address these issues, this study proposes a knowledge graph (KG) construction method that combines ontology modeling with large language models (LLMs). First, an ontology for water conservancy facility safety is constructed, encompassing four core elements: agencies and personnel, engineering equipment, risks and hidden dangers, and systems and processes. Subsequently, a KG-LLM-GraphRAG architecture is designed, which improves LLM knowledge extraction through ontology-constrained prompt templates and uses the Neo4j graph database for knowledge storage and multi-hop reasoning. Experimental results demonstrate that the proposed method significantly outperforms traditional approaches in entity-relationship extraction tasks. The resulting KG supports hazard identification, emergency decision-making, and knowledge reuse, offering an efficient tool for organizing and reasoning over knowledge in water conservancy safety management and supporting the digital transformation of the water conservancy industry.

Keywords: knowledge graph (KG); large language model (LLM); water conservancy facility safety

1. Introduction

Water conservancy facility safety serves as a public, foundational, and strategic safeguard for national economic and social development [1]. It is directly related to the safety of people’s lives and property. It also affects the stability of the ecological environment and sustainable economic and social development. A vast network of water infrastructure exists worldwide. Taking China as an example, publicly available statistics show that the country currently has nearly 95,000 reservoirs. It also has over 320,000 km of embankments. In addition, more than 580,000 water conservancy facilities are in operation. These include sluices, pumping stations, and irrigation districts. These facilities play an irreplaceable role in many critical functions. These functions include flood control, disaster reduction, water supply security, agricultural irrigation, and hydropower generation. However, the safe operation of these facilities faces severe challenges. Facility failures or accidents can lead to catastrophic consequences. For instance, during the “7·20” extreme rainstorm disaster in Zhengzhou, Henan, China, in 2021, safety issues at the Guojiazui and Changzhuang reservoirs significantly amplified the flood’s destructive power. These issues included spillway blockages and gate operation failures. The disaster directly resulted in 398 deaths and missing persons. Economic losses exceeded 120 billion RMB. In 2020, the Edenville Dam in Michigan, USA, breached due to structural aging and inadequate supervision. This breach triggered a chain of disasters. Direct economic losses reached 250 million USD. Thousands of people were evacuated. The severe casualties and substantial property losses highlight the critical importance of water conservancy facility safety management.

Globally, major water conservancy nations have built intensive regulatory systems for safety management [2]. These systems are dynamically updated. China serves as an example. As of June 2024, China has 452 water safety regulations. These regulations cover a five-tier system. The tiers are national, river basin, provincial, municipal, and county levels. The average number of regulation updates each year exceeds 37. This creates a complex regulatory framework that evolves rapidly. This framework is combined with massive amounts of other data. These data include engineering equipment information, risk records, institutional responsibility descriptions, and historical case data. Together, they form the core knowledge foundation for managing water conservancy facility safety. However, as textual regulatory information grows increasingly complex, traditional manual approaches face severe challenges in managing water facility safety knowledge. This creates significant pressure on safety management. Specific problems include the following. First, the processing of massive data is inefficient [3]. Second, tracking and updating safety regulations involves high costs and poor timeliness. Third, complex dynamic relationships between elements cannot be adequately represented [4]. Fourth, manual retrieval of scattered information during emergencies is slow and error-prone. These limitations severely impact the timeliness and accuracy of emergency decision-making. Consequently, a significant gap exists. On one hand, there is a vast and rapidly iterating volume of cross-document knowledge. On the other hand, there is a need for instant and precise emergency decisions. Traditional manual management methods are inadequate to address these challenges. Therefore, they limit the accurate identification of safety risks and the reliability of emergency decisions. This gap underscores the need for more intelligent approaches. 
Knowledge graphs, with their powerful semantic network expression capabilities, are one such approach. They can support dynamic and integrated safety management.

KGs offer new pathways to overcome the bottlenecks of knowledge fragmentation and relationship complexity in water conservancy facility safety management. The technology has matured in fields such as healthcare and finance [5], and preliminary explorations in the water conservancy domain have also made progress, with applications including water disaster prevention [6], engineering scheduling [7], and operation and maintenance management [8]. However, existing research still faces two challenges. First, insufficient construction depth makes it difficult to accurately model the dynamic relational chains between water conservancy facilities. Second, traditional knowledge extraction methods are inefficient at semantically mining unstructured text, and because domain constraint mechanisms are lacking, “hallucination” errors frequently occur and severely constrain decision-making reliability. To address these problems, this paper proposes leveraging LLMs to enhance the information extraction capability of KGs by integrating the two technologies in an ontology-driven KG-LLM-GraphRAG collaborative architecture. The architecture first constructs a water conservancy facility safety ontology as the core framework of the ontology layer. It systematically integrates four entity types (agencies and personnel, engineering equipment, risks and hidden dangers, and systems and processes) and generalizes six semantic relationship types, such as “operates/is operated by,” forming a structured constraint template. A graph-based Retrieval-Augmented Generation (GraphRAG) mechanism is then adopted to achieve dynamic construction at the data layer. This mechanism extracts entity attributes through multi-source data cleansing and prompt optimization, and relies on the Neo4j graph database for efficient storage and dynamic updating of triples. Most critically, the predefined ontology serves as a generation template that corrects LLM outputs in real time, which improves the accuracy of professional domain knowledge extraction and suppresses the risk of “hallucination.”

This study aims to construct a water conservancy safety KG that integrates complex cross-document knowledge. This KG can accurately express complex dynamic relationships and support emergency response. It will effectively address the prominent contradictions in current water conservancy facility knowledge management. The intelligence level of water conservancy facility safety management will thus be enhanced.

Specifically, although existing studies have explored LLM + KG integration or GraphRAG, the innovation of this research lies in deeply integrating the domain ontology and GraphRAG into a unified framework. Unlike approaches that use the ontology only for post hoc verification or apply RAG without any structural constraints, this method embeds the ontology directly into the LLM prompt template as a hard constraint, guiding the extraction process to follow the predefined entity types and relationship categories. Moreover, the integration with GraphRAG supports multi-hop reasoning over the extracted knowledge, which is crucial for safety-related decisions.

2. Related Works

2.1. LLM Empowerment of KG

As a structured semantic knowledge base, a KG uses an ontology layer to achieve standardized knowledge representation [9], while organizing knowledge at the data layer as “entity-relation-entity” or “entity-attribute-attribute value” triples [10]. This structure provides a powerful framework for integrating multi-source heterogeneous data and revealing complex relationships between entities, making it critical infrastructure in the field of artificial intelligence [11]. Research on KG construction methods has undergone significant iteration. Early studies primarily focused on effectively extracting entities and relationships from textual data to build structured knowledge bases, for example by employing statistical models to automatically identify and associate multi-source entities [12], which laid the foundation for KG construction. However, these traditional methods relied on predefined rules and templates, limiting their ability to handle complex semantic relationships. To overcome these limitations, KG completion techniques based on deep learning have emerged, with graph neural networks seeing widespread application in complex entity-relationship graph structures [13]. Simultaneously, multi-task learning has been introduced to jointly process entity recognition and relationship extraction [14], effectively enhancing the accuracy and efficiency of KG construction. Nevertheless, since KGs are dynamically updated semantic networks, achieving efficient knowledge updating and completion remains an important research topic.

In recent years, LLMs have made breakthrough advancements in natural language understanding, text generation, and contextual reasoning [15]. The powerful semantic parsing capabilities of LLMs overcome the limitations of traditional rule-based and statistical models in handling professional terminology, implicit relationships, and long-text dependencies, significantly improving the efficiency of automatically extracting entities, attributes, and complex relationships from unstructured data. LLMs can generate triples dynamically through techniques such as zero-shot prompting and hybrid frameworks, which lowers the barrier to adapting them to new domains. By integrating Retrieval-Augmented Generation (RAG) technology, external knowledge sources are introduced to suppress domain-specific “hallucinations,” improving knowledge reliability. Furthermore, frameworks such as GraphRAG construct context-aware KGs, enhancing multi-hop reasoning capabilities in complex scenarios and promoting deep cognitive synergy between LLMs and KGs [16]. Currently, the integration of LLMs and KGs is being applied in domain-specific question-answering (QA) systems, personalized recommendations, and other areas [17].

The core principle of GraphRAG is to incorporate the graph structure of the knowledge graph into the RAG framework. Its advantage is that, by converting text into an entity-relation graph, it can perform multi-hop reasoning along the graph structure during retrieval and thereby answer complex questions that require integrating information across paragraphs and documents. However, GraphRAG also faces challenges: graphs automatically constructed from unstructured text may contain missing relationships or noise, causing retrieval results to deviate from the user’s true intention. In this study, we aim to improve the quality of graph construction at the source by introducing a domain ontology as a constraint template, thereby suppressing retrieval noise in GraphRAG.

2.2. Application Research of KG in Water Conservancy Facility Safety

Applying KG technology to the field of water conservancy facility safety holds immense potential for integrating fragmented knowledge and enhancing risk early warning and intelligent decision-making capabilities [18]. Water conservancy facility safety management involves multi-dimensional elements such as “institutions-equipment-hidden dangers-regulations” and their complex dynamic interactions, urgently requiring structured knowledge representation and reasoning support. Currently, research on KGs in the water conservancy domain remains in the exploratory stage [19]. Although general KG construction techniques (including knowledge extraction [20], fusion [21], and storage [22]) are relatively mature and have achieved significant results in fields such as finance [23], healthcare [24], and education [25], the water conservancy domain’s strong specialization and task-specific nature present unique challenges. Existing research on water conservancy facility safety knowledge management focuses on two aspects. The first is ontology construction: building ontologies that cover core concepts in the water conservancy field (e.g., institutions, equipment, hidden dangers, regulations) is recognized as a prerequisite for knowledge reuse and sharing [26], with tools such as Protégé used for formal modeling. The second is information extraction: water conservancy texts are highly unstructured and specialized, which makes domain-specific entity and relationship recognition difficult [27].

Although groundbreaking advances in LLM technology create opportunities for the deep application of KGs in water conservancy facility safety, existing research on LLM-driven KG construction in this domain still faces critical limitations. First, general-purpose LLMs exhibit insufficient understanding of domain-specific terminology and dynamic interactions between entities, leading to deviations in complex semantic parsing [28]. Second, the lack of robust domain-constraint mechanisms means that knowledge “hallucinations” are not effectively suppressed, compromising the reliability of professional knowledge. Third, the semantic framework of ontologies and the extraction capabilities of LLMs lack effective coordination, hindering accurate cross-source knowledge fusion. Fourth, current methods struggle to support real-time dynamic tracking of temporally evolving risks, limiting proactive early warning capabilities.

Therefore, this study addresses these core challenges by proposing a synergistically enhanced KG construction method that integrates domain ontologies with LLMs.

3. Methods

3.1. Overall Research Framework

Given the massive scale of water conservancy facilities and the critical safety responsibilities involved, facility failures could trigger severe consequences. However, safety knowledge in this domain is characterized by highly unstructured, multi-source, and heterogeneous data formats. Traditional technologies struggle to effectively integrate and model the complex dynamic relationships among “institutions, equipment, hidden dangers, and regulations,” which constrains risk early warning and intelligent decision-making capabilities. Meanwhile, existing KGs lack depth in water conservancy safety applications, and LLMs face challenges such as professional “hallucinations” and instability in long-text comprehension. To address these challenges, an LLM-enhanced KG construction method is proposed in this study. It aims to systematically resolve the core issues of complex knowledge, unstructured data, and utilization difficulties in water conservancy facility safety. This method has three main innovative contributions. First, a comprehensive ontology system is constructed that covers “institutions, equipment, hidden dangers, and regulations,” providing a unified and rigorous semantic framework for domain knowledge. Second, LLM and GraphRAG technologies are integrated for efficient knowledge extraction, enabling precise extraction of triples from unstructured texts. Ontology templates are used to suppress LLM hallucinations, which enhances the model’s adaptability to specialized domains. Third, a KG architecture with dynamic update capabilities is designed, which supports real-time updates of the graph.

This study followed the methodology described above. We constructed a water conservancy facility safety KG. This graph was built in the Neo4j graph database. We also developed an intelligent QA system. This system was created by integrating GraphRAG indexing with hybrid retrieval technology. The overall research framework is shown in Figure 1, which comprises three core modules: (1) Data Integration and Preprocessing: automated collection and cleansing of multi-source heterogeneous data; (2) KG Construction: defining the ontology, utilizing LLM + GraphRAG technology for knowledge extraction, and implementing graph storage and visualization; (3) QA System Research: achieving multi-hop reasoning and precise answer generation to serve practical scenarios such as hidden danger management and emergency decision-making.

3.2. Model Construction Process

3.2.1. Multi-Source Heterogeneous Data Processing

Texts in the field of water conservancy facility safety are highly unstructured and specialized. Direct knowledge extraction often leads to suboptimal results due to semantic ambiguity and contextual deficiencies. Therefore, data preprocessing is essential. Raw text must be transformed into a normalized corpus. This corpus should have a clear structure and unambiguous semantics. This lays the foundation for high-quality entity and relationship recognition in subsequent stages.

Formatting errors and garbled characters must be removed from the text, and content that fails to meet quality and relevance requirements must be eliminated, while the original format and expression forms of domain-specific materials are preserved [29]. This ensures the purity and consistency of the textual data. The step is performed with regex-based cleansing programs written in Python 3.12.0. The phase also involves manual curation, segmentation of long sentences, and removal of low-value information, providing robust data support for the subsequent construction of the water conservancy facility safety KG.
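The regex-based cleansing described above can be illustrated with a minimal sketch. The paper does not list its actual cleansing rules, so the patterns below (control characters, the Unicode replacement character, standalone page markers such as "- 3 -", and repeated whitespace) and the function name `clean_text` are assumptions chosen for illustration only.

```python
import re
import unicodedata

# Hypothetical patterns; the actual rules used in the study are not given.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\ufffd]")
_PAGE_MARKER = re.compile(r"^[ \t]*-\s*\d+\s*-[ \t]*$", re.MULTILINE)
_INLINE_SPACE = re.compile(r"[ \t\u3000]+")
_BLANK_LINES = re.compile(r"\n{3,}")


def clean_text(raw: str) -> str:
    """Remove garbled characters and layout debris, keeping the wording intact."""
    # NFC only composes characters; it does not rewrite full-width punctuation.
    text = unicodedata.normalize("NFC", raw)
    text = _CONTROL_CHARS.sub("", text)    # control chars and U+FFFD
    text = _PAGE_MARKER.sub("", text)      # lines such as "- 3 -"
    text = _INLINE_SPACE.sub(" ", text)    # runs of spaces/tabs/ideographic spaces
    text = _BLANK_LINES.sub("\n\n", text)  # at most one blank line in a row
    return text.strip()


sample = "Article 1\x00  The  dam\ufffd\n\n\n\n- 3 -\nArticle 2"
cleaned = clean_text(sample)
```

Relevance filtering and manual curation, which the text also mentions, are judgment steps and are not covered by this sketch.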

3.2.2. Domain Ontology Modeling

This study aims to achieve fine-grained representation of water conservancy facility safety knowledge. Considering the current lack of reusable ontology libraries, the content structure of data sources is comprehensively analyzed in this paper. A systematic review of normative documents is conducted. Existing cases and research are referenced to extract core elements. Consequently, a comprehensive water conservancy facility safety ontology library is constructed. Its core components are “Agency and Personnel,” “Engineering Equipment,” “Risks and Hidden Dangers,” and “Systems and Processes.” The structural representation of this ontology library is as follows:

$$O_{Wcf} = \{O_{Apo}, O_{Eeo}, O_{Rhd}, O_{Spo}, O_{Rbo}\} \tag{1}$$

In the formula, $O_{Wcf}$ denotes the Comprehensive Water Conservancy Facility Safety Ontology; $O_{Apo}$ represents the Agency and Personnel Ontology; $O_{Eeo}$ refers to the Engineering Equipment Ontology; $O_{Rhd}$ indicates the Risk and Hidden Danger Ontology; $O_{Spo}$ signifies the System and Process Ontology; and $O_{Rbo}$ captures the inter-relationships among these ontologies.

This study integrates water conservancy facility safety management regulations and guidelines as primary knowledge sources for constructing the safety management ontology. Utilizing Stanford University’s Protégé tool, we formally built the water conservancy facility safety knowledge ontology [30]. The constructed ontology comprises four major categories: Agency and Personnel, Engineering Equipment, Risks and Hidden Dangers, and Systems and Processes. Based on comprehensive consideration of conceptual logic, predefined semantic relationships between ontologies were established, including: “operates/is operated by”, “executes/is executed by”, “identifies/is identified by”, “complies with/regulates”, “triggers/affects”, and “prevents/exposes”.
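As a rough illustration of the ontology structure in Equation (1), the four categories and six relationship pairs named above could be held in a small data structure. This is a sketch only: the class name `Ontology`, its methods, and the constant `O_WCF` are hypothetical, and the study itself builds the ontology in Protégé rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    """Hypothetical container mirroring Equation (1)."""
    classes: tuple[str, ...]                    # the four core ontologies
    relation_pairs: tuple[tuple[str, str], ...]  # the paired labels (O_Rbo)

    def relation_labels(self) -> frozenset[str]:
        """All individual relation labels, flattened from the pairs."""
        return frozenset(label for pair in self.relation_pairs for label in pair)

    def paired_label(self, label: str) -> str:
        """Return the label listed together with `label` in the text."""
        for first, second in self.relation_pairs:
            if label == first:
                return second
            if label == second:
                return first
        raise KeyError(label)


O_WCF = Ontology(
    classes=(
        "Agency and Personnel",
        "Engineering Equipment",
        "Risks and Hidden Dangers",
        "Systems and Processes",
    ),
    relation_pairs=(
        ("operates", "is operated by"),
        ("executes", "is executed by"),
        ("identifies", "is identified by"),
        ("complies with", "regulates"),
        ("triggers", "affects"),
        ("prevents", "exposes"),
    ),
)
```

Which entity categories each relationship may connect is not stated in this section, so the sketch deliberately leaves domain and range constraints out.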

3.2.3. Retrieval-Augmented Knowledge Extraction with LLM

To address the limitations of traditional Retrieval-Augmented Generation (RAG) in the water conservancy facility safety domain, a KG-LLM-GraphRAG architecture is proposed in this paper. Traditional RAG has low retrieval precision [31] and poor intent alignment [32]. The proposed framework deeply integrates the structured semantic relationships of KGs with the generalized reasoning capabilities of LLMs. A vector database is incorporated as a supplementary retrieval source. The architecture supports natural language queries from users. Query intent is automatically parsed. Graph relationships are leveraged for multi-hop reasoning. Accurate, interpretable decision-support answers are generated.

GraphRAG leverages LLMs to extract and construct KGs from raw text. This addresses the needs for cross-context retrieval and complex reasoning. Its construction process transforms unstructured text into a semantically enriched entity-relationship network. A structured knowledge representation is formed. This representation is more expressive than plain text segments. The system’s ability to handle complex multi-hop questions is significantly enhanced. During this process, a prompt-based optimization strategy guides the LLM to extract structured triples from unstructured text. By introducing an ontology layer-based contextual prompting method, the model’s generalization capability in the water conservancy facility safety domain is effectively enhanced. The accuracy of entity and relationship extraction is improved. Building upon the constructed water conservancy facility safety ontology, tailored extraction templates are further designed. A contextual prompting mechanism is employed to create standardized prompt information suitable for LLM processing. The construction workflow can be divided into the following key steps, as illustrated in Figure 2.

Text Segmentation

Normative documents in the water conservancy domain are typically lengthy and complex. LLMs have inherent input length limits. Direct input of full text would cause loss of critical information due to truncation. Therefore, text segmentation is necessary. The original input documents are partitioned along semantic boundaries into manageable textual units (TextUnits). This ensures each text block carries a relatively complete semantic segment. Fragmentation of entities or relationships is avoided. This process lays the foundation for subsequent in-depth parsing.

Prompt Design and Entity-Relationship Extraction

Based on the partitioned TextUnits and the optimization strategy of contextual prompts, the LLM is invoked for sequential processing. This allows involved entities and their semantic relationships to be automatically identified and extracted. Preliminary triple structures are generated.

KG Construction and Community Generation

The extracted entities and relationships are constructed into a preliminary KG. Subsequently, graph clustering algorithms are introduced to aggregate semantically closely related entities and relationships into communities, thereby revealing knowledge substructures and their intrinsic connections. Furthermore, a structured summary report is generated for each community. This report outlines the community’s core themes, key entities and relationships contained within, and the community’s importance or influence within the overall knowledge network.
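The prompt design and extraction step above can be sketched as follows. The paper does not reproduce its prompt template, so the wording of the prompt, the JSON reply format, and the function names `build_prompt` and `filter_response` are all assumptions. No LLM is called here; the sketch only shows how an ontology could be embedded in a prompt and how a reply could be checked against it.

```python
import json

ENTITY_TYPES = [
    "Agency and Personnel",
    "Engineering Equipment",
    "Risks and Hidden Dangers",
    "Systems and Processes",
]
RELATIONS = [
    "operates", "is operated by", "executes", "is executed by",
    "identifies", "is identified by", "complies with", "regulates",
    "triggers", "affects", "prevents", "exposes",
]


def build_prompt(text_unit: str) -> str:
    """Embed the ontology in the prompt as an explicit constraint (hypothetical wording)."""
    return (
        "Extract triples from the text below.\n"
        f"Allowed entity types: {'; '.join(ENTITY_TYPES)}\n"
        f"Allowed relations: {'; '.join(RELATIONS)}\n"
        'Answer with a JSON list of objects with the keys "head", "head_type", '
        '"relation", "tail", "tail_type". Use only the allowed values.\n'
        f"Text: {text_unit}"
    )


def filter_response(response: str) -> list[dict]:
    """Keep only the triples in an LLM reply that conform to the ontology."""
    try:
        items = json.loads(response)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [
        item for item in items
        if isinstance(item, dict)
        and item.get("head_type") in ENTITY_TYPES
        and item.get("tail_type") in ENTITY_TYPES
        and item.get("relation") in RELATIONS
    ]
```

Filtering after generation is a complementary safeguard in this sketch: the prompt states the constraint, and the filter discards any reply element that ignores it.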

To illustrate how GraphRAG supports multi-hop reasoning and intent alignment, we use the following example query: ‘Who supervises the reinforcement project of the Huangzhuangwa Flood Diversion Sluice?’

Intent Parsing: The LLM identifies the key entities in the query: ‘Huangzhuangwa Flood Diversion Sluice’ and ‘reinforcement project.’
Graph Retrieval: The system locates the node for ‘Huangzhuangwa Flood Diversion Sluice’ in Neo4j. It then traverses along a ‘triggers’ edge (representing that a reinforcement project is typically triggered by or affects the facility) to reach the ‘Reinforcement Project’ node. To find the supervising entity, the system looks for an ‘is operated by’ edge from the project node. If a direct edge to ‘Tianjin Water Bureau’ exists, the answer is retrieved directly.
Multi-hop Reasoning: If no direct supervision relation is stored, but the graph contains the following paths:
‘Huangzhuangwa Flood Diversion Sluice’—[is operated by] → ‘Haihe River Management Committee’ (indicating affiliation),
‘Haihe River Management Committee’—[is operated by] → ‘Tianjin Water Bureau’ (indicating guidance),
then GraphRAG can reason through the combined path:
‘Reinforcement Project’—[triggers] → ‘Huangzhuangwa Flood Diversion Sluice’—[is operated by] → ‘Haihe River Management Committee’—[is operated by] → ‘Tianjin Water Bureau’.
This multi-hop traversal infers that the supervising entity is likely the Tianjin Water Bureau.
Answer Generation: The LLM integrates the retrieved path information and generates a natural language response: ‘Based on reasoning, the reinforcement project of the Huangzhuangwa Flood Diversion Sluice is likely supervised by the Tianjin Water Bureau, as the sluice is affiliated with the Haihe River Management Committee, which is under the guidance of the Tianjin Water Bureau.’

This example demonstrates how GraphRAG leverages the predefined semantic relationships (triggers, is operated by) to perform multi-hop reasoning and align the retrieved knowledge with the user’s intent.
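The multi-hop traversal in this example can be reproduced with a small breadth-first search over the three edges named above. In the study the traversal runs inside Neo4j; the in-memory edge list and the function `find_path` below are a stand-in written for illustration, with the hop limit as an assumed parameter.

```python
from collections import deque

# Edges taken from the worked example, as (head, relation, tail).
EDGES = [
    ("Reinforcement Project", "triggers", "Huangzhuangwa Flood Diversion Sluice"),
    ("Huangzhuangwa Flood Diversion Sluice", "is operated by", "Haihe River Management Committee"),
    ("Haihe River Management Committee", "is operated by", "Tianjin Water Bureau"),
]


def find_path(edges, start, goal, max_hops=3):
    """Breadth-first search returning the shortest edge path, or None."""
    adjacency = {}
    for head, relation, tail in edges:
        adjacency.setdefault(head, []).append((relation, tail))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, tail in adjacency.get(node, []):
            if tail not in visited:
                visited.add(tail)
                queue.append((tail, path + [(node, relation, tail)]))
    return None


path = find_path(EDGES, "Reinforcement Project", "Tianjin Water Bureau")
```

With a limit of three hops the search returns the full chain from the project to the bureau; with a limit of two it returns `None`, which corresponds to the case where the system cannot reach a supervising entity within the allowed depth.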

3.2.4. Graph Database Storage and Visualization

Based on the constructed triples [33], Cypher statements are employed to batch import them into the Neo4j graph database, achieving efficient knowledge storage. In the graph database, entities are mapped to nodes, relationships to edges, and attributes to properties of nodes and edges, collectively forming a semantically rich topological network structure [34]. Compared with relational databases, graph databases demonstrate significant advantages in storing topologically interconnected data. Their native graph structure supports efficient multi-hop queries and complex relational reasoning, making them particularly suited to the application requirements of water conservancy facility safety KGs. This provides a reliable foundation for knowledge organization and storage in this vertical domain.
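A batch import of this kind might be prepared as sketched below. The paper does not show its Cypher statements, so the node label `Entity`, the property `name`, and the helper names are assumptions. Because Cypher does not allow a relationship type to be passed as a query parameter, the sketch groups triples by relation and writes the type into the query text after sanitizing it; the entity names themselves travel as parameters.

```python
import re
from collections import defaultdict


def to_rel_type(label: str) -> str:
    """Turn a relation label such as 'is operated by' into IS_OPERATED_BY."""
    rel_type = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_").upper()
    if not re.fullmatch(r"[A-Z_][A-Z0-9_]*", rel_type):
        raise ValueError(f"cannot derive a relationship type from {label!r}")
    return rel_type


def build_import_batches(triples):
    """Return one (cypher_query, parameters) pair per relationship type."""
    grouped = defaultdict(list)
    for head, relation, tail in triples:
        grouped[to_rel_type(relation)].append({"head": head, "tail": tail})

    batches = []
    for rel_type, rows in grouped.items():
        query = (
            "UNWIND $rows AS row\n"
            "MERGE (h:Entity {name: row.head})\n"  # hypothetical label/property
            "MERGE (t:Entity {name: row.tail})\n"
            f"MERGE (h)-[:{rel_type}]->(t)"
        )
        batches.append((query, {"rows": rows}))
    return batches


batches = build_import_batches([
    ("Huangzhuangwa Flood Diversion Sluice", "is operated by", "Haihe River Management Committee"),
    ("Haihe River Management Committee", "is operated by", "Tianjin Water Bureau"),
    ("Reinforcement Project", "triggers", "Huangzhuangwa Flood Diversion Sluice"),
])
```

Each pair could then be executed through a Neo4j session, for example with the official Python driver's `session.run(query, parameters)`. Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes and edges. Relation labels without ASCII letters or digits are rejected by `to_rel_type` and would need an explicit mapping.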

4. Construction of Water Conservancy Facility Safety KG

4.1. Data Sources

This study selects representative multi-source normative texts from the water conservancy facility safety domain as the core data foundation for constructing the KG. A sample of the data used (see Table 1) covers multiple levels and types, including national laws, administrative regulations, departmental rules, local regulations, national standards, and local standards. Key examples include the “Water Law of the People’s Republic of China” and the “Regulations on Work Safety Management of Water Conservancy Projects”. The content covers institutional responsibilities, engineering equipment management, risk hazard identification, and emergency response procedures. These authoritative and structurally diverse texts collectively form the core knowledge source for water conservancy facility safety management, providing reliable and diversified data support for the deep integration of semantic understanding and structured knowledge reasoning.

4.2. KG Construction

4.2.1. Construction of Water Conservancy Facility Safety Ontology

Based on the ontology layer, the comprehensive water conservancy facility safety ontology library $O_{Wcf}$ is constructed. This framework explicitly defines four core ontologies, “Agency and Personnel” ($O_{Apo}$), “Engineering Equipment” ($O_{Eeo}$), “Risks and Hidden Dangers” ($O_{Rhd}$), and “Systems and Processes” ($O_{Spo}$), and presets six categories of semantic relationships ($O_{Rbo}$): “operates/is operated by”, “executes/is executed by”, “identifies/is identified by”, “complies with/regulates”, “triggers/affects”, and “prevents/exposes”.

Firstly, the specific construction process of the four core ontologies is as follows:

(1): Regarding the construction of the Agency and Personnel Ontology: At the institutional level, entities are classified into five categories based on hierarchical relationships, business scenarios, and emergency roles: government regulatory departments, project management units, construction and operation enterprises, technical support agencies, and emergency coordination agencies. At the personnel level, individuals are categorized into five types according to qualification constraints, duty associations, and cross-industry mappings: unit responsible persons, technical management personnel, operation and maintenance staff, administrative support personnel, and emergency response personnel. Given the complex and dynamic nature of organizational arrangements for institutions and personnel, this paper constructs separate models for institutions and personnel based on this classification to enhance the targeting and efficiency of institutional and personnel arrangements in water conservancy projects. Table 2 demonstrates the construction of the institution ontology using government regulatory departments as an example, while Table 3 illustrates the personnel ontology construction with technical management personnel as an example.

(2): Regarding the construction of the Engineering Equipment Ontology: Based on ISO 55000 (Asset Management Standards) [35] and water conservancy engineering systems theory, core concepts mentioned in various standard documents are refined. Engineering equipment is classified into five categories: water-retaining engineering equipment, water-discharging engineering equipment, water-diversion engineering equipment, monitoring and control engineering equipment, and auxiliary engineering equipment. Table 4 demonstrates the construction of the engineering equipment ontology.

(3): Regarding the construction of the Risk and Hidden Danger Ontology: Based on disaster chain theory, the evolution of hidden dangers exhibits temporality, requiring distinction between latent, trigger, and outbreak phases. It is worth noting that these three phases of hidden danger evolution occur sequentially: the latent phase accumulates risks, the trigger phase is initiated by external conditions, and ultimately leads to disasters in the outbreak phase. Table 5 demonstrates the construction of the risk and hidden danger ontology.

(4): Regarding the construction of the System and Process Ontology: Considering legal hierarchy, effectiveness, and management process stages, systems and processes are categorized by legal hierarchy into three types: national laws, administrative regulations and departmental rules, and local regulations. The classification must remain logically consistent, with no overlaps or omissions. Table 6 demonstrates the construction of the system and process ontology.

Secondly, the six types of semantic relationships are defined. After completing the construction of the four core ontologies, it is necessary to further clarify the semantic associations among them to form a knowledge network capable of supporting complex relational reasoning. Based on the inherent logic and interaction patterns among “institutions, equipment, hidden dangers, and regulations” in water conservancy facility safety management, this paper predefines six pairs of top-level relationships. Table 7 illustrates the definition of these six types of top-level semantic relationships.

In summary, this paper systematically constructs an ontology library for the water conservancy facility safety domain, which includes four core ontologies and six predefined semantic relationships among them. This provides a rigorous semantic framework and structural constraints for the subsequent LLM-based knowledge extraction and graph construction.

4.2.2. LLM-Integrated Prompt Engineering and Ontology-Constrained Entity-Relationship Extraction

After completing the construction of the domain ontology, we follow a structured semantic framework and carry out systematic operational procedures to accurately extract entities and relationships from unstructured texts, thereby completing the construction of the KG. The specific processing flow is as follows:

Text Segmentation

To avoid fragmenting key entities or relationships, the text is divided along semantic boundaries into segments called TextUnits, so that each TextUnit contains a complete semantic fragment.
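The segmentation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that sentence boundaries serve as the semantic boundaries and that each TextUnit has a character budget. The function name `segment_into_text_units` and the parameter `max_chars` are hypothetical.

```python
import re

# Sentence boundary: Latin punctuation followed by whitespace,
# or CJK punctuation (which is usually not followed by a space).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?;])\s+|(?<=[。！？；])")


def segment_into_text_units(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into TextUnits of at most max_chars.

    A sentence is never split, so a single sentence longer than
    max_chars becomes its own (oversized) TextUnit.
    """
    sentences = [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]
    units: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the unit.
            units.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        units.append(current)
    return units
```

Because units are only closed at sentence boundaries, an entity mention or a relation stated within one sentence always stays inside a single TextUnit; relations that span several sentences would need a larger budget or overlapping units.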

Prompt Design And Entity-Relationship Extraction

On the basis of text segmentation, a prompting strategy is employed to guide the LLM in extracting entities and relationships from unstructured text, thereby generating structured triples. Based on the water conservancy facility safety ontology, tailored extraction templates are constructed, and a contextual prompting method is integrated to design standardized prompt information for input into the LLM. The specific prompt content is illustrated in Figure 3.

After completing the design of the prompt, a large model (such as doubao-1-5-pro-32k-250115) is used to extract the entities therein. These entities typically refer to information such as persons, locations, institutions, and concepts appearing in the document. The purpose of entity recognition is to construct an Entity Graph, extracting all entities and preparing for subsequent relationship mining and queries.

On the basis of identifying entities, the text units are further analyzed to mine the semantic relationships existing between entities. These relationships define how entities are interrelated.
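A minimal sketch of ontology-constrained prompting and result filtering is given below. The actual prompt is the one in Figure 3 and is not reproduced here; the template wording, the JSON output format, and the names `build_prompt` and `parse_triples` are assumptions made for illustration. Only the forward name of each relationship pair is listed, and no LLM call is made.

```python
import json

# Vocabulary taken from the four core ontologies and six relationship pairs;
# the prompt wording below is hypothetical, not the template in Figure 3.
ENTITY_TYPES = [
    "Agency and Personnel",
    "Engineering Equipment",
    "Risk and Hidden Danger",
    "System and Process",
]
RELATION_TYPES = ["operates", "executes", "identifies",
                  "complies with", "triggers", "prevents"]
REQUIRED_KEYS = {"head", "head_type", "relation", "tail", "tail_type"}


def build_prompt(text_unit: str) -> str:
    """Assemble an extraction prompt that restricts the LLM to ontology terms."""
    return (
        "You extract knowledge about water conservancy facility safety.\n"
        f"Allowed entity types: {', '.join(ENTITY_TYPES)}.\n"
        f"Allowed relation types: {', '.join(RELATION_TYPES)}.\n"
        "Return a JSON list of objects with the keys "
        f"{', '.join(sorted(REQUIRED_KEYS))}.\n"
        "Use only the allowed types and extract nothing the text does not state.\n"
        f"Text: {text_unit}"
    )


def parse_triples(llm_output: str) -> list[dict]:
    """Keep only well-formed triples whose types belong to the ontology."""
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    valid = []
    for item in items:
        if not isinstance(item, dict) or not REQUIRED_KEYS <= item.keys():
            continue
        if (item["head_type"] in ENTITY_TYPES
                and item["tail_type"] in ENTITY_TYPES
                and item["relation"] in RELATION_TYPES):
            valid.append(item)
    return valid
```

Filtering the model output against the same vocabulary that appears in the prompt means that out-of-ontology types are dropped rather than silently added to the graph.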

KG Construction And Community Generation

Based on the extracted entities and their relationships, an initial KG is constructed. Graph clustering algorithms are then applied to group closely related entities and relationships into different communities. For each community, a community report is generated, summarizing the community’s core themes, key entities and relationships contained within, and the community’s importance or influence within the overall knowledge network.
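The sketch below illustrates the idea of grouping extracted triples into communities and summarizing each one. It is a simplification under stated assumptions: connected components stand in for the graph clustering algorithm (the paper does not name the algorithm in this section), and the "report" is a plain dictionary rather than a generated summary. The function names and the second example triple are illustrative.

```python
from collections import defaultdict


def build_graph(triples):
    """Undirected adjacency view of (head, relation, tail) triples."""
    adjacency = defaultdict(set)
    for head, _relation, tail in triples:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    return adjacency


def find_communities(adjacency):
    """Connected components via iterative depth-first search.

    This is a crude stand-in for a real community detection algorithm.
    """
    seen, communities = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        communities.append(component)
    return communities


def summarize(community, adjacency):
    """Toy community report: size and the best-connected entity."""
    hub = max(sorted(community), key=lambda n: len(adjacency[n]))
    return {"size": len(community), "hub": hub}


triples = [
    ("Huangzhuangwa Flood Diversion Sluice", "requires", "field safety inspection"),
    ("field safety inspection", "needs to investigate",
     "concrete structural appearance defects"),
    # Illustrative triple, not taken from the paper's graph.
    ("Ministry of Water Resources", "executes", "Water Law"),
]
graph = build_graph(triples)
reports = [summarize(c, graph) for c in find_communities(graph)]
```

In a full pipeline, the hub entity and member list of each community would be passed to the LLM to draft the community report described above.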

By combining the contextual prompts of the LLM with ontological constraints, the accuracy and consistency of entity and relationship extraction by the LLM in the field of water conservancy safety are effectively enhanced. A reliable data foundation is laid for the subsequent construction of the KG.

4.3. Model Performance

To evaluate the effectiveness of our proposed method, we compare it against several LLM-based extraction strategies (direct extraction, template-based extraction). It is worth noting that we did not include traditional non-LLM baselines such as rule-based methods or classical NER models (e.g., BiLSTM-CRF) in this comparative study. The primary reason is that the water conservancy safety domain involves highly unstructured, specialized texts with complex entity relations and domain-specific terminology. Prior research has demonstrated that traditional information extraction methods struggle in such settings due to their reliance on handcrafted features and limited ability to capture semantic nuances [20,28]. For instance, Duan et al. [19] highlighted the challenges of constructing knowledge graphs from heterogeneous water conservancy documents using conventional NLP techniques. Moreover, recent surveys on domain-specific knowledge extraction [27,36] confirm that LLM-based approaches significantly outperform traditional methods in zero/few-shot scenarios and when dealing with implicit relations. Therefore, our focus is on advancing LLM-based techniques, and the comparison with direct and template-based LLM extraction already provides sufficient evidence of the superiority of our ontology-constrained, GraphRAG-enhanced framework. Future work may include a comprehensive benchmark covering traditional methods to further validate the generalizability of our approach.

To systematically validate the feasibility and effectiveness of the proposed method for constructing a water conservancy facility safety KG, this study evaluates the approach along two dimensions: ontology construction quality and knowledge extraction performance. First, regarding ontology construction, experts in the water conservancy field and experienced frontline practitioners were invited to conduct a professional review of the constructed ontology [37]. They assessed the rationality of the ontology structure, the necessity of core concepts, and the similarity and distinction between concepts, and unanimously concluded that the ontology demonstrates validity and feasibility in terms of semantic coverage and logical consistency. Second, in terms of knowledge extraction effectiveness, precision (P), recall (R), and F1-score were used to quantitatively evaluate the entity and relationship extraction results driven by the LLM [38]. Precision measures the proportion of true positive instances among all instances predicted as positive. Recall measures the proportion of actual positive instances that are correctly predicted. The F1-score, as the harmonic mean of the two, comprehensively evaluates the overall performance of the model. The average F1-score is the arithmetic mean of the F1-scores across all categories and assesses the model’s generalization capability across different types of entities and relationships. To ensure the reliability of the results, all extraction experiments were repeated three times, and the average values were taken as the final performance metrics. The P, R, and F1 values for the entity and relationship extraction tasks were calculated by comparing the extraction results with actual safety assessment reports.

P = \frac{TP}{TP + FP} \quad (2)

R = \frac{TP}{TP + FN} \quad (3)

F1 = 2 \times \frac{P \times R}{P + R} \quad (4)

In the provided formulas, TP (True Positive), FP (False Positive), and FN (False Negative) are counts of the classifier’s identification outcomes. Together with TN (True Negative), which does not appear in Equations (2)–(4), they make up the four possible outcomes explained in Table 8. These outcomes are described under the assumption that there are only two categories: Positive and Negative.

Among them, TP (True Positive) refers to the number of positive samples that the model correctly predicts as positive; FP (False Positive) refers to the number of negative samples that the model wrongly predicts as positive; TN (True Negative) refers to the number of negative samples that the model correctly predicts as negative; FN (False Negative) refers to the number of positive samples that the model wrongly predicts as negative. In the entity and relationship extraction task of this study, “positive class” is defined as the target entities or relationships that need to be correctly extracted.
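As a worked illustration of Equations (2)–(4), the sketch below computes P, R, and F1 by comparing a set of predicted triples with a set of gold triples. The function name and the convention of returning 0.0 when a denominator is zero are assumptions; TN is not needed for these metrics.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Compute P, R and F1 from predicted and gold item sets."""
    tp = len(predicted & gold)   # extracted and correct
    fp = len(predicted - gold)   # extracted but not in the gold set
    fn = len(gold - predicted)   # in the gold set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, if 4 triples are predicted, 3 of them are correct, and the gold set contains 5 triples, then TP = 3, FP = 1, and FN = 2, giving P = 3/4 = 0.75, R = 3/5 = 0.60, and F1 = 2 × 0.75 × 0.60 / 1.35 ≈ 0.667.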

In order to comprehensively understand the failure cases and common extraction errors, we manually examined 100 randomly selected extraction errors from the test set and classified them into three main types: entity mismatch, implicit relationships, and nested entities. Table 9 summarizes the relevant findings, including the error types, descriptions, and representative examples.

To validate the feasibility of the proposed method, this study designed and conducted three sets of comparative experiments, each employing a different extraction method. The first experiment used a widely adopted LLM for direct information extraction; this method relies on the model’s pre-trained knowledge and capabilities and generates extraction results directly from the input text. The second experiment introduced a template prompting method, in which example templates manually extracted from a small number of samples are provided to enhance the model’s understanding of the specific domain context. The third experiment adopted the context-integrated method proposed in this study, which combines prompt technology with ontological knowledge: designed prompts guide the model and domain ontology knowledge is incorporated, with the aim of improving the accuracy and interpretability of the extraction results.

In the first setting, no prompting method was used, and the large model was applied directly for extraction. In the second, extraction was performed using manually designed templates, without the ontological template. In the third, both the ontological template and the prompt method were input into the LLM to obtain the extraction results. To ensure the reliability and generalization ability of the evaluation results, we randomly selected 20% of the text segments (approximately 500 TextUnits) from the 16 core data sources listed in Table 1 as the test set. This test set covers various text types such as laws and regulations, technical standards, and case reports, ensuring the comprehensiveness of the evaluation. For each indicator reported in Table 10, we independently conducted three experiments on the test set and took the arithmetic mean of the macro-averaged results to reduce the impact of randomness. The comparative results of the three experiments are shown in Table 10.
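The evaluation protocol can be sketched as below, assuming a seeded random sample for the 20% test split and per-category F1 scores for each run. The function names, the seed, and any numbers passed to these functions are hypothetical; the reported values are those in Table 10.

```python
import random
from statistics import mean


def sample_test_set(text_units: list, fraction: float = 0.2, seed: int = 42) -> list:
    """Draw a reproducible random sample of TextUnits as the test set."""
    rng = random.Random(seed)
    k = round(len(text_units) * fraction)
    return rng.sample(text_units, k)


def macro_f1(per_category_f1: dict[str, float]) -> float:
    """Macro average: unweighted mean of the per-category F1 scores."""
    return mean(per_category_f1.values())


def averaged_macro_f1(runs: list[dict[str, float]]) -> float:
    """Arithmetic mean of the macro-averaged F1 over repeated runs."""
    return mean(macro_f1(run) for run in runs)
```

Macro averaging gives every entity or relationship category the same weight regardless of how many instances it has, so a rare category with poor extraction quality lowers the score as much as a frequent one.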

Table 10 shows that the strategy combining the ontological template with the prompt method significantly outperforms the other two extraction methods. Commonly used LLM content extraction methods often face challenges when processing massive data, including insufficient semantic consistency and unclear expression of complex relationships. In contrast, the method proposed in this study combines ontology-driven knowledge extraction by a large model with KG construction, so that complex entity relationships and semantic information can be captured more accurately. This approach addresses the shortcomings of the existing direct extraction and template extraction methods and improves both extraction efficiency and the completeness of information expression. This indicates that LLMs possess strong learning capabilities and can effectively capture useful information or features from templates and prompts. These results also confirm that the combined use of ontological templates and prompt-based recognition is feasible.

5. KG Visualization and Application

Based on the ontological framework, this study aligns the entities and entity attribute data identified by the LLM with the entities and entity attributes in the ontological layer to complete the matching of entities and attributes. This process enables the construction of all entities, entity attributes, and relationships, ultimately forming a complete KG [39]. The constructed KG can be visually displayed in the Neo4j graph database.

The types of graph nodes are consistent with those in the ontology and can be broadly categorized into four major classes: agencies and personnel, engineering equipment, risks and hidden dangers, and systems and processes. The relationships can be divided into six pairs: “operates/is operated by,” “executes/is executed by,” “identifies/is identified by,” “complies with/regulates,” “triggers/affects,” and “prevents/exposes.” Specifically, the relationship between personnel and engineering equipment should be an operational relationship; the relationship between agencies and personnel and systems and processes should be an execution relationship; the relationship between agencies and personnel and risks and hidden dangers should be an identification relationship; the relationship between engineering equipment and systems and processes should be a compliance and regulation relationship; the relationship between engineering equipment and risks and hidden dangers should be a triggering and affecting relationship; and the relationship between systems and processes and risks and hidden dangers should be a prevention and exposure relationship. The extracted entity information is shown in Figure 4. The completed safety KG for water conservancy facilities, using the Huangzhuangwa Flood Diversion Sluice as an example, is shown in Figure 5.
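The class-to-class constraints described above can be expressed as a small domain/range check, sketched below. The direction assigned to each relationship and the treatment of the second name in each pair as the inverse of the first are assumptions made for illustration; the authoritative definitions are those in Table 7.

```python
# relation -> (head class, tail class); directions are assumed for illustration.
SCHEMA = {
    "operates": ("Agency and Personnel", "Engineering Equipment"),
    "executes": ("Agency and Personnel", "System and Process"),
    "identifies": ("Agency and Personnel", "Risk and Hidden Danger"),
    "complies with": ("Engineering Equipment", "System and Process"),
    "triggers": ("Engineering Equipment", "Risk and Hidden Danger"),
    "prevents": ("System and Process", "Risk and Hidden Danger"),
}

# Second member of each pair, assumed to be the inverse of the first.
INVERSE_OF = {
    "is operated by": "operates",
    "is executed by": "executes",
    "is identified by": "identifies",
    "regulates": "complies with",
    "affects": "triggers",
    "exposes": "prevents",
}


def is_valid(head_type: str, relation: str, tail_type: str) -> bool:
    """Check that a typed triple respects the ontology's relation constraints."""
    if relation in SCHEMA:
        return SCHEMA[relation] == (head_type, tail_type)
    if relation in INVERSE_OF:
        domain, range_ = SCHEMA[INVERSE_OF[relation]]
        # An inverse relation swaps the head and tail classes.
        return (head_type, tail_type) == (range_, domain)
    return False
```

A check of this kind can be run before triples are written to the graph database, so that a triple such as an "operates" edge from equipment to personnel is rejected or flagged for review.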

For example, to understand the newly built ancillary structures in the hazard mitigation and reinforcement project of the Huangzhuangwa Flood Diversion Sluice, one only needs to perform the corresponding query operation in the Neo4j graph database to retrieve the data. The results, as shown in Figure 6, indicate that the newly built ancillary structure in the Huangzhuangwa Flood Diversion Sluice hazard mitigation and reinforcement project is a management building.

In order to more intuitively evaluate the question-answering performance of the knowledge graph in practical applications, we have designed two example questions.

Example 1 (Correct Answer):

User question: “What ancillary facilities were newly constructed in the hazard mitigation and reinforcement project of the Huangzhuangwa Flood Diversion Sluice?”

Expected answer: “Management building.”

System response: “According to the query, the newly constructed ancillary structure of the Huangzhuangwa Flood Diversion Sluice hazard mitigation and reinforcement project is a management building.” (As shown in Figure 6.)

Result Analysis: The system accurately interpreted the user’s intention. Through the relationship in the graph (Huangzhuangwa Flood Diversion Sluice Hazard Mitigation and Reinforcement Project)-[New Construction]->(Management Building), it returned the correct answer. This shows that graph-based queries can precisely answer factual questions.

Example 2 (Incorrect/Broken Answer):

User question: “What are the potential safety hazards associated with the Huangzhuangwa Flood Diversion Sluice?”

Expected answer: Appearance defects of the concrete structure (damage, cracks, exposed reinforcement, etc.).

Possible incorrect system response (before optimization): “The Huangzhuangwa Flood Diversion Sluice requires field safety inspection.” This occurs if the system has only established the relationship (Huangzhuangwa Flood Diversion Sluice)-[needs to be]->(field safety inspection) but has not explored the specific hazards to be discovered during the inspection.

Result Analysis: In early versions of the graph, or when deep reasoning is lacking, the system may only be able to respond that an inspection is needed, without stating which hidden dangers to check for. Through the complete data chain we constructed, which includes the relationship (field safety inspection)-[need to investigate]->(appearance defects of concrete structure) (as shown in Figure 7), the optimized system can perform deeper reasoning and thus provide more accurate answers. This example also illustrates the crucial role of the completeness and depth of relationships in the knowledge graph for the accuracy of question answering.

To gain a clearer understanding of the data chain, one can also query the entire graph data chain containing the “field safety inspection” node to obtain relevant information about the field safety inspection, as shown in Figure 7.

The graph database, built on the graph structure, can clearly display the relationships between nodes through its inherent graph architecture [40]. Taking the field safety inspection as the starting point of the data chain, upward reasoning reveals that the Huangzhuangwa Flood Diversion Sluice was required to undergo field safety inspection. Downward reasoning indicates that the inspection aimed to identify concrete structural appearance defects (such as damage, cracks, exposed reinforcement, etc.), providing a more intuitive understanding of the overall facility management. The complete knowledge description of the facility management process is: “During the field safety inspection of the Huangzhuangwa Flood Diversion Sluice, concrete structural appearance defects (such as damage, cracks, exposed reinforcement, etc.) must be investigated.”
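The upward and downward reasoning along this data chain can be sketched with a small in-memory traversal. This is not the Neo4j query used in the study; the edge labels and the function name `trace` are illustrative, and the three nodes are the ones described in the paragraph above.

```python
from collections import deque

# (head, relation, tail); relation labels are illustrative.
EDGES = [
    ("Huangzhuangwa Flood Diversion Sluice", "requires", "field safety inspection"),
    ("field safety inspection", "needs to investigate",
     "concrete structural appearance defects"),
]


def trace(edges, start, direction):
    """Breadth-first walk along ("down") or against ("up") edge direction.

    Returns the traversed edges in the order they were discovered.
    """
    queue, visited, path = deque([start]), {start}, []
    while queue:
        node = queue.popleft()
        for head, relation, tail in edges:
            source, target = (head, tail) if direction == "down" else (tail, head)
            if source == node and target not in visited:
                visited.add(target)
                path.append((head, relation, tail))
                queue.append(target)
    return path


upward = trace(EDGES, "field safety inspection", "up")
downward = trace(EDGES, "field safety inspection", "down")
```

Starting from the facility node instead and walking "down" traverses both edges, which corresponds to the two-hop chain from the sluice to the defects that the inspection must investigate.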

6. Conclusions

To address issues such as multi-source heterogeneous data barriers and difficulties in expressing complex entity relationships in water conservancy facility safety management, this paper proposes a method for constructing a safety knowledge graph for water conservancy facilities based on LLMs and discusses related safety management issues. Introducing an ontological framework effectively improves the recognition accuracy of LLMs in this specialized domain. To tackle the complexity of entity relationships, LLMs are used to enhance the identification of entities and relationships in water conservancy facility data. The constructed knowledge graph provides support for establishing an intelligent safety management system, contributing to improved safety management.

This study still has certain limitations. First, although domain ontology constraints were introduced to enhance the extraction performance of LLMs, the accuracy of entity and relationship extraction in extremely complex contexts (such as multiple nested relationships and ambiguous professional terminology) still needs improvement. Second, the current research relies primarily on textual normative documents for knowledge extraction and graph construction and has not yet integrated multimodal data sources such as images, real-time sensor data, and geographic information systems (GIS). Third, a dynamic update mechanism for the knowledge graph has not been systematically implemented, which limits the comprehensiveness and timeliness of the knowledge in the constructed graph and restricts its completeness and capacity to evolve. This in turn reduces its effectiveness in practical scenarios such as proactive early warning and emergency response.

To address these identified limitations, future research will focus on the following concrete directions:

Multimodal Data Integration: In response to the current reliance on textual data only, we will explore deep integration mechanisms for incorporating non-textual data such as images (e.g., dam inspection photos, satellite imagery), real-time sensor data (e.g., water levels, stress gauges, seepage monitors), and geographic information systems (GIS). This will enable the construction of a more comprehensive knowledge graph that captures the physical states and spatial contexts of water conservancy facilities.
Real-time Sensor Stream Incorporation: To overcome the lack of dynamic updates, we will develop an event-driven pipeline that continuously ingests real-time sensor data streams. This will enable incremental updates to the knowledge graph, supporting temporal reasoning and enabling proactive early warning capabilities based on live monitoring data.
GIS Coupling for Spatial Reasoning: To enrich the contextual understanding of facility risks, we will couple the knowledge graph with GIS data, enabling spatial queries and visualizations (e.g., identifying facilities in flood-prone zones, analyzing spatial patterns of risks). This integration will enhance the system’s ability to support emergency response planning and resource allocation.
Enhanced Extraction for Complex Contexts: To improve accuracy in handling nested entities and implicit relations, we will investigate advanced NLP techniques, including nested NER models and relation inference modules, to better capture the semantic nuances in highly specialized texts.

By systematically pursuing these future work steps—directly targeting the limitations identified in this study—we aim to evolve the current prototype into a fully functional, dynamically updating knowledge graph system that can effectively support real-world water conservancy safety management and emergency decision-making.

Author Contributions

All the authors contributed to the study conception and design. Literature curation and investigation: Y.W., C.L. and L.G.; Formal analysis and synthesis: Y.W., C.L. and L.G.; Writing—original draft preparation: Y.W. and C.L.; Writing—review and editing: C.L., Y.W., L.G. and Q.D.; Supervision and project administration: C.L. All authors have read and agreed to the published version of the manuscript.

Funding

The Technology Innovation Program for Postgraduates at IDP is subsidized by the Fundamental Research Funds for Central Universities (ZY20260319), and it is also part of the Langfang Science and Technology Support Program Project and Science (2023013202).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors have no relevant financial or nonfinancial interests to disclose.

References

1. Wang, Y.; Hu, A. China’s Water Conservancy: Review and Outlook (1949–2050). J. Tsinghua 2011, 26, 99–112.
2. Ge, W. Editorial: Risk assessment and management of water conservancy projects. Front. Earth Sci. 2023, 11, 1330621.
3. Liu, Y.; Tang, Y.; Jing, L.; Chen, F.; Wang, P. Remote Sensing-Based Dynamic Monitoring of Immovable Cultural Relics, from Environmental Factors to the Protected Cultural Site: A Case Study of the Shunji Bridge. Sustainability 2021, 13, 6042.
4. Lu, J.; Feng, J.; Tang, Z.; Zhang, P. Research on Key Technologies of Water Conservancy Big Data Directory Service and Resource Sharing. Water Resour. Inform. 2017, 4, 17–20+27.
5. Qiu, L.; Zhang, A.; Li, S.; Zhang, Y.; Shen, M.; Zhou, P. A Review on Knowledge Graph Construction in Aviation Manufacturing. Appl. Res. Comput. 2022, 39, 968–977.
6. Huang, Y.; Yu, S.; Luo, B.; Li, R.; Li, C.; Huang, W. Exploring the Digital Twin Yangtze River for Joint Intelligent Scheduling of Basin Water Engineering Disaster Prevention. J. Hydraul. Eng. 2022, 53, 253–269.
7. Chen, Y.; Zhang, T.; Niu, W.; Qin, H. Research on Key Technologies for Digital Twin Construction of the Three Gorges Reservoir Area. Yangtze River 2023, 54, 19–24.
8. Xie, A.; Wu, Q.; Liu, F. Exploring Intelligent Operation and Maintenance Approaches for Pump Station Projects Based on Voiceprint Recognition and Knowledge Graph Technology. Yangtze River Technol. Econ. 2021, 5, 88–92.
9. Huang, H.; Yu, J.; Liao, X.; Xi, Y. A Survey of Knowledge Graph Research. Comput. Syst. Appl. 2019, 28, 1–12.
10. Zhou, Y.; Liu, Z.; Su, X.; Jin, T. Construction of a Q&A Knowledge Graph Ontology Model Integrating Multi-level Data. Libr. Inf. Serv. 2022, 66, 125–132.
11. Ibrahim, N.; Aboulela, S.; Ibrahim, A.; Kashef, R. A survey on augmenting knowledge graphs (KGs) with large language models (LLMs): Models, evaluation metrics, benchmarks, and challenges. Discov. Artif. Intell. 2024, 4, 76.
12. Lv, W.; Liao, Z.; Liu, S.; Zhang, Y. MEIM: A Multi-source Software Knowledge Entity Extraction Integration Model. Comput. Mater. Contin. 2020, 66, 1027–1042.
13. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24.
14. Chen, W.; Tian, J.; Xiao, L.; He, H.; Jin, Y. Exploring Logically Dependent Multi-task Learning with Causal Inference. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 2213–2225.
15. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large Language Models. arXiv 2023, arXiv:2303.18223.
16. Pan, S.; Luo, L.; Wang, Y.; Chen, C.; Wang, J.; Wu, X. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Trans. Knowl. Data Eng. 2024, 36, 3580–3599.
17. Chen, H.; Xie, R.; Cui, X.; Yan, Z.; Wang, X.; Xuan, Z.; Zhang, K. LKPNR: Large Language Models and Knowledge Graph for Personalized News Recommendation Framework. Comput. Mater. Contin. 2024, 79, 4283–4296.
18. Liu, X.; Lu, H.; Li, H. Intelligent generation method of emergency plan for hydraulic engineering based on knowledge graph—Take the South-to-North Water Diversion Project as an example. LHB 2022, 108, 2153629.
19. Duan, H.; Han, K.; Zhao, H.; Jiang, Y.; Li, H.; Mao, W. Research on the Construction of Comprehensive Water Conservancy Knowledge Graph. J. Hydraul. Eng. 2021, 52, 948–958.
20. Abdullah, M.H.A.; Aziz, N.; Abdulkadir, S.J.; Alhussian, H.S.A.; Talpur, N. Systematic Literature Review of Information Extraction From Textual Data: Recent Methods, Applications, Trends, and Challenges. IEEE Access 2023, 11, 10535–10562.
21. Zhang, J.; Zhang, X.; Wu, C.; Zhao, Z. Survey of Knowledge Graph Construction Techniques. Comput. Eng. 2022, 48, 23–37.
22. Liu, S.; Yang, H.; Li, J.; Kolmanič, S. Preliminary Study on the Knowledge Graph Construction of Chinese Ancient History and Culture. Information 2020, 11, 186.
23. Wang, W.; Xu, Y.; Du, C.; Chen, Y.; Wang, Y.; Wen, H. Data Set and Evaluation of Automated Construction of Financial Knowledge Graph. Data Intell. 2021, 3, 418–443.
24. Abu-Salih, B.; AL-Qurishi, M.; Alweshah, M.; AL-Smadi, M.; Alfayez, R.; Saadeh, H. Healthcare knowledge graph construction: A systematic review of the state-of-the-art, open issues, and opportunities. J. Big Data 2023, 10, 81.
25. Dang, F.-R.; Tang, J.-T.; Pang, K.-Y.; Wang, T.; Li, S.-S.; Li, X. Constructing an Educational Knowledge Graph with Concepts Linked to Wikipedia. J. Comput. Sci. Technol. 2021, 36, 1200–1211.
26. Cheng, Q.; Wang, J.; Lu, W.; Huang, Y.; Bu, Y. Keyword-citation-keyword network: A new perspective of discipline knowledge structure analysis. Scientometrics 2020, 124, 1923–1943.
27. Lin, J.; Zhao, Y.; Huang, W.; Liu, C.; Pu, H. Domain knowledge graph-based research progress of knowledge representation. Neural Comput. Appl. 2021, 33, 681–690.
28. Tariq, A.; Luo, M.; Urooj, A.; Das, A.; Jeong, J.; Trivedi, S.; Patel, B.; Banerjee, I. Domain-specific LLM Development and Evaluation—A Case-study for Prostate Cancer. medRxiv 2024.
29. Jacobs, G.; Hoste, V. SENTiVENT: Enabling supervised information extraction of company-specific events in economic and financial news. Lang. Resour. Eval. 2020, 56, 225–257.
30. Shu, X.; Yang, H. Ontology-driven intelligent assessment system for dam structural safety based on spatiotemporal anomaly detection framework. Comput.-Aided Civ. Infrastruct. Eng. 2025, 40, 5649–5671.
31. Ong, Q.C.; Ang, C.-S.; Chee, D.Z.Y.; Lawate, A.; Sundram, F.; Dalakoti, M.; Pasalic, L.; To, D.; Fox, T.E.; Bojic, I.; et al. Advancing health coaching: A comparative study of large language model and health coaches. Artif. Intell. Med. 2024, 157, 103004.
32. Sawarkar, K.; Mangal, A.; Solanki, S.R. Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers. In Proceedings of the 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), Orlando, FL, USA, 15–17 August 2024; pp. 155–161.
33. Li, J.; Hu, J.; Zhang, G. Enhancing Relational Triple Extraction in Specific Domains: Semantic Enhancement and Synergy of Large Language Models and Small Pre-Trained Language Models. CMC-Comput. Mater. Contin. 2024, 79, 2481–2503.
34. Firouzjaei, H.A. A deep learning-based approach for identifying unresolved questions on Stack Exchange Q&A communities through graph-based communication modelling. Int. J. Data Sci. Anal. 2024, 18, 205–218.
35. ISO 55000:2024; Asset Management—Vocabulary, Overview and Principles. ISO: Geneva, Switzerland, 2024.
36. Li, J.; Sun, A.; Han, J.; Li, C. A Survey on Deep Learning for Named Entity Recognition. IEEE Trans. Knowl. Data Eng. 2022, 34, 50–70.
37. Jiang, L.; Shi, J.; Wang, C. Multi-ontology fusion and rule development to facilitate automated code compliance checking using BIM and rule-based reasoning. Adv. Eng. Inform. 2022, 51, 101449.
38. Hodak, M.; Ellison, D.; Van Buren, C.; Jiang, X.; Dholakia, A. Benchmarking Large Language Models: Opportunities and Challenges; Springer: Cham, Switzerland, 2024; Volume 14247, pp. 77–89.
39. Yang, F.; Meng, B. Design of Computer-Aided Instruction Model Based on Knowledge Graph Construction and Learning Path Recommendation. Int. J. Web-Based Learn. Teach. Technol. 2025, 20, 16.
40. Dudáš, A.; Kleinedler, A. Effective Visualization of Data Structures in Graph Databases. J. Image Graph. 2024, 12, 283–291.

Figure 1. Construction framework of a water conservancy facility safety KG enhanced by LLM.

Figure 2. Construction workflow of GraphRAG.

Figure 3. Prompt template content.

Figure 4. Entity information.

Figure 5. Visualization of the water conservancy facility safety KG (partial).

Figure 6. Query results.

Figure 7. Data chain query.

Table 1. Statistics of Data Sources for Water Conservancy Facility Safety KG (Partial).

No.	Name	Category	Level/Source
S01	Water Law of the People’s Republic of China	National Law	National People’s Congress
S02	Flood Control Law of the People’s Republic of China	National Law	National People’s Congress
S03	Regulations on the Safety Management of Reservoir Dams	Administrative Regulation	State Council
S04	Provisions on Work Safety Management of Water Conservancy Projects	Departmental Rule	Ministry of Water Resources
S05	Provisions on Quality Management of Water Conservancy Projects	Departmental Rule	Ministry of Water Resources
S06	Sichuan Province Water Conservancy Engineering Management Regulations	Local Regulation	Sichuan People’s Congress
S07	Chongqing City Water Conservancy Engineering Management Regulations	Local Regulation	Chongqing People’s Congress
S08	GB/T 40582-2021 Basic Terminology for Hydropower Stations	National Standard	Standardization Administration
S09	DB11/T 2193-2023 Specification for Investigation and Management of Flood Prevention Hidden Dangers—Water Conservancy Projects	Local Standard	Beijing Municipality
S10	Guide to the List of Major Hidden Dangers for Production Safety in Water Conservancy Projects (2021 Edition)	Departmental Normative Document	Ministry of Water Resources
S11	Standard for Post Setting of Water Conservancy Project Management Units (Pilot) and Quota Standard for Water Conservancy Project Maintenance (Pilot)	Departmental Normative Document	Ministry of Water Resources
S12	Measures for the Assessment and Management of Work Safety for Principal Responsible Persons, Project Responsible Persons, and Full-time Work Safety Management Personnel of Water Conservancy and Hydropower Construction Enterprises	Departmental Normative Document	Ministry of Water Resources
S13	Guide for Identification and Risk Assessment of Operational Hazard Sources for Water Conservancy and Hydropower Projects (Reservoirs, Sluices) (Trial)	Departmental Normative Document	Ministry of Water Resources
S14	Guide to the List of Major Hidden Dangers for Production Safety in Water Conservancy Projects (2023 Edition)	Departmental Normative Document	Ministry of Water Resources
S15	Guidelines for Risk Assessment of Dams (ICOLD)	International Organization Guide	International Commission on Large Dams (ICOLD)
S16	Hebei Province Water Conservancy Engineering Management Regulations	Local Regulation	Hebei People’s Congress

Table 2. Conceptual Hierarchy of Water Conservancy Agency Ontology—Example of Government Regulatory Agencies.

Level 1 Concept	Level 2 Concept	Instances
Government Regulatory Agencies	National Regulatory Agencies	Ministry of Water Resources, Ministry of Emergency Management, Ministry of Finance
	Provincial Regulatory Agencies	Provincial Water Resources Department, Provincial Emergency Management Department, Provincial Finance Department
	Municipal Regulatory Agencies	Municipal Water Resources Bureau, Municipal Emergency Management Bureau, Municipal Finance Bureau
	County Regulatory Agencies	County Water Resources Bureau, County Emergency Management Bureau, County Finance Bureau
	Inter-basin Management Agencies	River Basin Management Agencies, Regional Coordination Agencies
	Specialized Regulatory Agencies	Hydrology Bureau, Water Conservancy Project Quality Supervision Station, Water Administration Supervision Detachment

Table 3. Conceptual Hierarchy of Water Conservancy Personnel Ontology—Example of Technical Management Personnel.

Level 1 Concept	Level 2 Concept	Instances
Technical Management Personnel	Safety Engineer	Beiyun River Levee and Gate Safety, Yangzhuang Reservoir Water Quality Collaborative Management
	Quality Supervisor	Chaobai River Levee Project Quality Control, Pipe Material Quality Dispute Handling, Ad hoc Quality Inspection During Flood Season Construction
	Hydrological Monitor	Jiyun River Salt-Fresh Water Interaction Monitoring, Storm Surge Red Warning Response

Table 4. Conceptual Hierarchy of Engineering Equipment Ontology.

Concept Classification	Instances
Water-Retaining Engineering Equipment	Dam, Levee, Gate
Water-Discharging Engineering Equipment	Spillway, Flood Discharge Tunnel, Drainage Valve
Water-Diversion Engineering Equipment	Diversion Channel, Pipeline, Pump Station
Monitoring and Control Engineering Equipment	Water Level Sensor, Stress Monitor, SCADA System
Auxiliary Engineering Equipment	Hoist, Trash Rake, Emergency Power Supply

Table 5. Conceptual Hierarchy of Risk and Hidden Danger Ontology.

Phase	Characteristics	Instances
Latent	Hidden danger exists but not triggered	Concrete Carbonation, Metal Fatigue
Trigger	External conditions exceed critical threshold	Water Level Exceeds Warning Line, Peak Ground Acceleration Exceeds Limit
Outbreak	System instability leads to disaster	Dam Breach, Pipeline Burst
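
The three phases in Table 5 can be read as an escalation sequence. The following is a minimal Python sketch of that reading, not the authors' implementation: the names `RiskPhase`, `PHASE_INSTANCES`, and `phase_of` are hypothetical, and the numeric ordering of the phases is an assumption drawn from the table's row order. Only the phase names and instances come from Table 5.

```python
from enum import IntEnum
from typing import Optional


class RiskPhase(IntEnum):
    """Phases from Table 5; the escalation ordering is an assumption."""
    LATENT = 1
    TRIGGER = 2
    OUTBREAK = 3


# Instances copied from Table 5; the dictionary layout is illustrative only.
PHASE_INSTANCES = {
    RiskPhase.LATENT: ["Concrete Carbonation", "Metal Fatigue"],
    RiskPhase.TRIGGER: [
        "Water Level Exceeds Warning Line",
        "Peak Ground Acceleration Exceeds Limit",
    ],
    RiskPhase.OUTBREAK: ["Dam Breach", "Pipeline Burst"],
}


def phase_of(instance: str) -> Optional[RiskPhase]:
    """Return the phase whose instance list contains `instance`, else None."""
    for phase, instances in PHASE_INSTANCES.items():
        if instance in instances:
            return phase
    return None


if __name__ == "__main__":
    print(phase_of("Metal Fatigue").name)
    print(phase_of("Metal Fatigue") < phase_of("Dam Breach"))
```

Using an ordered enumeration makes it possible to compare two hazards by how far they have escalated, which may be convenient when ranking query results.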

Table 6. Conceptual Hierarchy of System and Process Ontology.

Concept Classification	Instances
National Laws	Water Law of the People’s Republic of China, Flood Control Law of the People’s Republic of China
Administrative Regulations and Departmental Rules	Regulations on the Safety Management of Reservoir Dams, Provisions on Work Safety Management of Water Conservancy Projects, Provisions on Quality Management of Water Conservancy Projects
Local Regulations	Sichuan Province Water Conservancy Engineering Management Regulations, Chongqing City Water Conservancy Engineering Management Regulations

Table 7. Definition of Six Types of Top-Level Semantic Relationships.

Top-Level Semantic Relation	Integrated Similar Expressions
Operates/Is Operated By	Uses/Is Used By, Controls/Is Controlled By, Manages/Is Managed By, Manipulates/Is Manipulated By, Runs/Is Run By, Operates/Is Controlled By
Executes/Is Executed By	Implements/Is Implemented By, Carries Out/Is Carried Out By, Fulfills/Is Fulfilled By, Performs/Is Performed By, Executes/Is Commanded By, Responsible For/Is Responsibility Of
Identifies/Is Identified By	Discovers/Is Discovered By, Detects/Is Detected By, Monitors/Is Monitored By, Diagnoses/Is Diagnosed By, Determines/Is Determined By, Assesses/Is Assessed By
Complies With/Regulates	Obeys/Is Obeyed By, Based On/Is Basis For, Conforms To/Is Conformed To, Follows/Is Followed By, Regulates/Is Regulated By, Constrains/Is Constrained By
Triggers/Affects	Causes/Is Caused By, Induces/Is Induced By, Activates/Is Activated By, Results In/Is Resulted In By, Affects/Is Affected By, Exacerbates/Is Exacerbated By
Prevents/Exposes	Prevents/Is Prevented By, Avoids/Is Avoided By, Mitigates/Is Mitigated By, Controls/Is Controlled By, Exposes/Is Exposed By, Reveals/Is Revealed By
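
The mapping in Table 7 amounts to a lookup from a surface expression to a top-level relation. The sketch below illustrates one way this could be done in Python; it is an assumption about how such a normalization step might look, and the names `TOP_LEVEL_RELATIONS`, `build_index`, and `normalize_relation` are hypothetical. The relation strings are copied from Table 7. Because "Controls/Is Controlled By" is listed under two top-level relations in the table, the lookup returns a list of candidates rather than a single value.

```python
from typing import Dict, List

# Top-level relations and their similar expressions, copied from Table 7.
TOP_LEVEL_RELATIONS: Dict[str, List[str]] = {
    "Operates/Is Operated By": [
        "Uses/Is Used By", "Controls/Is Controlled By",
        "Manages/Is Managed By", "Manipulates/Is Manipulated By",
        "Runs/Is Run By", "Operates/Is Controlled By",
    ],
    "Executes/Is Executed By": [
        "Implements/Is Implemented By", "Carries Out/Is Carried Out By",
        "Fulfills/Is Fulfilled By", "Performs/Is Performed By",
        "Executes/Is Commanded By", "Responsible For/Is Responsibility Of",
    ],
    "Identifies/Is Identified By": [
        "Discovers/Is Discovered By", "Detects/Is Detected By",
        "Monitors/Is Monitored By", "Diagnoses/Is Diagnosed By",
        "Determines/Is Determined By", "Assesses/Is Assessed By",
    ],
    "Complies With/Regulates": [
        "Obeys/Is Obeyed By", "Based On/Is Basis For",
        "Conforms To/Is Conformed To", "Follows/Is Followed By",
        "Regulates/Is Regulated By", "Constrains/Is Constrained By",
    ],
    "Triggers/Affects": [
        "Causes/Is Caused By", "Induces/Is Induced By",
        "Activates/Is Activated By", "Results In/Is Resulted In By",
        "Affects/Is Affected By", "Exacerbates/Is Exacerbated By",
    ],
    "Prevents/Exposes": [
        "Prevents/Is Prevented By", "Avoids/Is Avoided By",
        "Mitigates/Is Mitigated By", "Controls/Is Controlled By",
        "Exposes/Is Exposed By", "Reveals/Is Revealed By",
    ],
}


def build_index(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each lower-cased expression to its candidate top-level relations."""
    index: Dict[str, List[str]] = {}
    for top_level, variants in table.items():
        for expression in [top_level, *variants]:
            candidates = index.setdefault(expression.lower(), [])
            if top_level not in candidates:
                candidates.append(top_level)
    return index


RELATION_INDEX = build_index(TOP_LEVEL_RELATIONS)


def normalize_relation(expression: str) -> List[str]:
    """Return candidate top-level relations; an empty list means no match."""
    return list(RELATION_INDEX.get(expression.strip().lower(), []))


if __name__ == "__main__":
    print(normalize_relation("Causes/Is Caused By"))
    print(normalize_relation("Controls/Is Controlled By"))
```

An expression that resolves to more than one candidate would need additional context, such as the ontology classes of the head and tail entities, to be disambiguated; that step is outside this sketch.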

Table 8. Confusion matrix for model performance evaluation.

Predicted \ Actual	Actual Positive	Actual Negative
Predicted Positive	TP	FP
Predicted Negative	FN	TN

Table 9. Categorization of Typical Extraction Errors.

Error Type	Description	Example
Ontology Mismatch	Entities or relations are assigned to incorrect ontology classes due to semantic ambiguity or insufficient context.	The term “emergency plan” is misclassified as “System and Process” instead of “Risk and Hidden Danger.”
Implicit Relations	Relations that are implied by the text but not explicitly stated are missed by the extraction model.	“The reservoir is managed by the local water bureau” implies a “manages” relation, but the model fails to extract it due to lack of explicit keywords like “manage”.
Nested Entities	Entities that contain other entities (e.g., compound terms) are not fully decomposed, leading to loss of fine-grained information.	“Reservoir dam safety assessment report” contains nested entities (“Reservoir dam”, “safety assessment”) that are extracted as a single entity, losing the relationship between them.

Table 10. Entity and Relationship Extraction Results.

Type	Precision (P)	Recall (R)	F1
Direct Extraction	0.435	0.560	0.490
Template Extraction	0.643	0.728	0.683
Prompt + Ontology	0.840	0.948	0.891
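
The metrics in Table 10 follow from the confusion-matrix counts in Table 8 through the standard definitions P = TP / (TP + FP), R = TP / (TP + FN), and F1 = 2PR / (P + R). The Python sketch below applies the F1 formula to the reported precision and recall as a consistency check. The function names and the `REPORTED` dictionary are hypothetical, and the counts in the usage example (84, 16) are made-up illustrative values, since the article's raw counts are not given here.

```python
def precision(tp: int, fp: int) -> float:
    """P = TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """R = TP / (TP + FN); returns 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """F1 = 2PR / (P + R), the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# (P, R, F1) values copied from Table 10.
REPORTED = {
    "Direct Extraction": (0.435, 0.560, 0.490),
    "Template Extraction": (0.643, 0.728, 0.683),
    "Prompt + Ontology": (0.840, 0.948, 0.891),
}

if __name__ == "__main__":
    # Recompute F1 from the reported P and R and compare with the table.
    for name, (p, r, reported_f1) in REPORTED.items():
        print(name, round(f1_score(p, r), 3), reported_f1)

    # Illustrative counts only (not from the article): 84 TP, 16 FP.
    print(precision(84, 16))
```

Recomputing F1 from each row's precision and recall reproduces the reported F1 values to three decimal places, so the three columns of Table 10 are internally consistent.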
