A Multi-Agent and Hybrid RAG-Based Framework for Security Evaluation and Intelligent Strategy Generation in Regional Water Resource Management

Yang, Libo; Mao, Libo; Wang, Xiaodong; Zhang, Xiuyu

doi:10.3390/su18126138


¹ School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China

² Advanced Research Institute for Digital-Twin Water Conservancy, North China University of Water Resources and Electric Power, Zhengzhou 450046, China

³ School of Water Resources, North China University of Water Resources and Electric Power, Zhengzhou 450046, China

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(12), 6138; https://doi.org/10.3390/su18126138 (registering DOI)

Submission received: 14 May 2026 / Revised: 5 June 2026 / Accepted: 10 June 2026 / Published: 15 June 2026

(This article belongs to the Section Sustainable Water Management)

Abstract

To address the issues of low intelligence and weak knowledge support in traditional water resource security risk analysis and response strategy generation, this paper proposes a novel framework based on hybrid retrieval augmentation and multi-agent collaboration. First, the proposed method integrates the DPSIR-CRITIC-TOPSIS framework with an obstacle degree model to construct an evaluation agent. This agent enables the intelligent assessment of regional water resource security and the precise extraction of key obstacle factors. Second, a water resource security knowledge graph and a vector knowledge base are constructed utilizing textual data, including policies and regulations, technical standards, the academic literature, and typical case studies. A hybrid retrieval augmentation mechanism—integrating graph reasoning, dual-path recall, and relation expansion—is designed to enhance the precision and relevance of the generated risk response strategies. Finally, a collaborative workflow comprising a master control agent alongside evaluation, retrieval, generation, and review agents is established to iteratively optimize the strategies through cross-validation and compliance reviews. In an empirical case study utilizing multi-year data from Henan Province and its 18 prefecture-level cities, experimental results demonstrate that the proposed method significantly outperforms baseline models across multi-dimensional semantic evaluation metrics, as well as BLEU-4 and ROUGE-L scores. The multi-agent collaborative architecture developed in this study, which integrates data-driven evaluation with knowledge-based hybrid retrieval augmentation, significantly elevates the intelligence level of water resource security assessment. It provides robust technical support for the analysis of regional water resource security situations and the intelligent generation of actionable response strategies.

Keywords: water resource security evaluation; hybrid retrieval augmentation; multi-agent collaboration; knowledge graph; intelligent strategy generation; sustainability

1. Introduction

Water resources are a critical foundation for socio-economic development, ecological environment protection, and regional security, and they serve as a cornerstone of long-term sustainability. Developing a water resource security analysis method that integrates scientific evaluation capabilities with governance support is of great practical significance for enhancing regional water resources management and advancing sustainable development goals [1].

Extensive research on water resource security evaluation has been conducted globally, resulting in relatively mature analytical frameworks and methodological systems. Regarding evaluation frameworks, models such as PSR and DPSIR have been widely applied to elucidate the interactive relationships among driving forces, pressures, states, impacts, and responses within resource and environmental systems [2,3]. Methodologically, approaches such as the Analytic Hierarchy Process (AHP) [4], entropy weight method [5], CRITIC [6], TOPSIS [7], and fuzzy comprehensive evaluation are frequently employed for indicator weighting and comprehensive measurement [8,9]. For problem identification, the obstacle degree model is utilized to further uncover the key constraints affecting regional water resource security [10]. The existing literature provides crucial support for assessing regional water resource security levels, analyzing spatiotemporal differentiation characteristics, and diagnosing major obstacle factors, thereby establishing a methodological foundation for water resource security governance.

However, current studies remain predominantly evaluation-oriented. In the subsequent phase of generating response strategies, the levels of intelligence and knowledge support are still insufficient. On the one hand, formulating specific management countermeasures from traditional evaluation results typically relies heavily on human expert experience, leading to low levels of automation and intelligence in decision support. On the other hand, although large language models (LLMs) have recently demonstrated robust capabilities in knowledge integration, textual reasoning, and solution generation [11,12], they continue to face challenges in highly specialized domains like water resource governance. These challenges include factual hallucinations, a lack of reliable evidence, weak regulatory compliance, and inadequate regional specificity [13,14]. In particular, traditional retrieval augmentation methods largely rely on simple textual similarity matching [15,16,17], which struggles to fully unearth the structured correlations among “obstacle factors–risk issues–governance measures–policy support.” Furthermore, strategies generated by a single model lack the necessary cross-reviews and constraint feedback mechanisms [18], thus limiting their practical application in regional water governance decision-making [19].

To address these limitations, this paper proposes a framework for water resource security evaluation and response strategy generation based on hybrid retrieval augmentation and multi-agent collaboration. First, based on the DPSIR framework, a regional water resource security evaluation indicator system is constructed. By integrating the CRITIC-TOPSIS method with the obstacle degree model, this system comprehensively measures regional water resource security levels and identifies key obstacle factors. Second, a water resource security knowledge graph and a vector knowledge base are constructed utilizing multi-source heterogeneous knowledge, including policies and regulations, technical standards, the academic literature, and typical case studies. A hybrid retrieval augmentation mechanism—incorporating graph reasoning, dual-path recall, and relation expansion—is designed to strengthen the effective linkage between evaluation results and governance knowledge. Finally, a collaborative workflow comprising a master control agent alongside evaluation, retrieval, generation, and review agents is established. This workflow achieves a closed-loop integration from quantitative evaluation to the output of governance recommendations through task decomposition, knowledge augmentation, strategy generation, and compliance review.

Compared to existing research, the primary contributions of this paper are threefold. First, it integrates quantitative evaluation and strategy generation into a unified framework, thereby extending water resource security analysis into decision support and raising its level of automation. Second, by utilizing a hybrid retrieval augmentation mechanism that fuses a knowledge graph with a vector knowledge base, it enhances the knowledge support, interpretability, and traceability of governance strategy generation. Third, the introduction of multi-agent collaboration and review feedback mechanisms mitigates, to a certain extent, the risks of hallucination and compliance deviation inherent in LLMs within specialized governance scenarios. This improves the specificity, rationality, and implementability of the generated strategies.

Building upon this foundation, this study takes Henan Province and its 18 prefecture-level cities as empirical subjects to construct an experimental system for regional water resource security evaluation and strategy generation. By validating the effectiveness of the proposed method, this research aims to provide novel insights into regional water resource security governance and intelligent decision-making.

2. Methods

2.1. Overall Architecture

As shown in Figure 1, this paper constructs a hierarchical methodological framework for water resource security evaluation and response strategy generation. Taking the raw data of the water resource system as its input, the framework integrates quantitative evaluation, domain knowledge organization, knowledge-augmented retrieval, and multi-agent collaborative reasoning. Together these form a closed-loop analytical process that runs from water resource security state identification, through key obstacle diagnosis, to governance strategy generation. The overall framework comprises three components: a calculation layer, a knowledge layer, and a strategy layer, which are respectively responsible for quantitative diagnosis, knowledge support, and strategy generation.

Specifically, the calculation layer focuses on the quantitative characterization of the water resource security state. This study designs a water resource security evaluation agent responsible for converting raw data into diagnostic information. First, based on the DPSIR theory, this agent constructs a water resource security evaluation indicator system to systematically characterize the water resource system across five dimensions: driving forces, pressures, states, impacts, and responses. Subsequently, the objective weights of each indicator are determined using the CRITIC method, and the comprehensive security score of each evaluation object is calculated by integrating the TOPSIS model. Based on this, the obstacle degree model is further utilized to identify the primary constraining factors affecting regional water resource security. The outputs of the evaluation agent—including the comprehensive evaluation score, security level determination, and a set of key obstacle factors—serve as problem-oriented inputs for the subsequent knowledge retrieval and strategy generation phases.

The knowledge layer is primarily responsible for the structured organization and on-demand supply of domain knowledge. Addressing the multi-source heterogeneous knowledge involved in the water resource security management process—such as policies and regulations, planning texts, technical standards, governance measures, and the academic literature—this paper constructs a unified knowledge graph and a vector knowledge base for the water resource security domain. Building upon this, a hybrid retrieval augmentation mechanism is introduced to conduct knowledge retrieval, relationship association, and contextual organization tailored to the evaluation results and obstacle factors.

The strategy layer primarily executes the collaborative generation and result verification of response strategies. This layer consists of four types of functional entities: a master control agent, a knowledge retrieval agent, a strategy generation agent, and a strategy review agent. First, the master control agent receives the evaluation results and obstacle factors output by the calculation layer, translates them into requirement directives, and schedules the operational workflow. Next, the knowledge retrieval agent generates search queries and employs the hybrid retrieval augmentation mechanism to extract relevant policies, standards, and governance case studies from the knowledge layer, thereby providing contextual augmentation. Subsequently, the strategy generation agent drafts preliminary adaptive management strategies for regional water resources by combining the diagnostic data with the augmented knowledge. The review agent then performs feasibility and compliance checks on these preliminary strategies. Finally, the governance strategies are revised iteratively under the constraints of the review feedback until a final version is produced.
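The control flow of the strategy layer can be sketched as a small loop. This is only an illustration under stated assumptions: `TaskState`, `run_workflow`, the `demo_*` stand-ins, and the `max_rounds` cap are hypothetical names and parameters introduced here, and each agent is reduced to a plain Python callable instead of an LLM-backed component.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TaskState:
    """Shared task state handed between agents (hypothetical structure)."""
    region: str
    obstacle_factors: List[str]
    evidence: List[str] = field(default_factory=list)
    draft: str = ""
    feedback: List[str] = field(default_factory=list)
    rounds: int = 0


def run_workflow(
    state: TaskState,
    retrieve: Callable[[List[str]], List[str]],
    generate: Callable[[TaskState], str],
    review: Callable[[str, List[str]], Tuple[bool, List[str]]],
    max_rounds: int = 3,  # assumed cap; the source does not state one
) -> TaskState:
    """Master-control loop: retrieve once, then generate and review until approved."""
    state.evidence = retrieve(state.obstacle_factors)
    for _ in range(max_rounds):
        state.rounds += 1
        state.draft = generate(state)
        approved, comments = review(state.draft, state.evidence)
        state.feedback = comments
        if approved:
            break
    return state


# Toy stand-ins for the retrieval, generation, and review agents.
def demo_retrieve(factors: List[str]) -> List[str]:
    return [f"evidence for {f}" for f in factors]


def demo_generate(state: TaskState) -> str:
    note = " (revised)" if state.feedback else ""
    factors = ", ".join(state.obstacle_factors)
    return f"Strategy for {state.region} addressing {factors}{note}"


def demo_review(draft: str, evidence: List[str]) -> Tuple[bool, List[str]]:
    approved = "(revised)" in draft
    comments = [] if approved else ["cite the supporting policy"]
    return approved, comments
```

With these stand-ins the first draft is rejected and the second is approved, which mirrors the review-feedback iteration described above in miniature.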

2.2. Water Resource Security Evaluation Method

Water resource security evaluation is a complex process. This study designs a water resource security evaluation agent responsible for the quantitative diagnosis of regional water resource systems, establishing a problem-oriented data foundation for subsequent knowledge retrieval and multi-agent strategy generation. Based on the DPSIR framework, this agent constructs an indicator system, computes the water resource security state by integrating CRITIC weights with the TOPSIS model [20], and extracts the primary constraining factors utilizing the obstacle degree model [21].

2.2.1. Construction of Evaluation Indicator System Based on DPSIR

The DPSIR (Driving forces-Pressures-States-Impacts-Responses) framework emphasizes the causal relationships between human activities and the resource-environment system, effectively describing the formation and evolution process of water resource security. Specifically, driving forces (D) reflect fundamental factors such as economic growth, population scale, and urbanization; pressures (P) represent external stresses exerted on the water system by resource exploitation, pollution discharge, and ecological disturbances; states (S) characterize the quantity, quality, and spatiotemporal distribution of regional water resources; impacts (I) reflect the consequences of water resource changes on the eco-environment, socio-economy, and water supply security; and responses (R) denote regulatory capabilities such as government governance, technological investment, water-saving measures, and institutional safeguards.

Following the indicator selection principles of scientific soundness, systematicity, data accessibility, and regional applicability, this paper constructs the evaluation indicator system and its attributes, as detailed in Figure 2. A larger value for a positive indicator (+) signifies a higher positive contribution to the regional water resource security system, whereas a larger value for a negative indicator (−) indicates greater risk pressure faced by the system. The indicator data are sourced from the Statistical Yearbooks and Water Resources Bulletins of Henan Province and its municipalities.

2.2.2. Comprehensive Evaluation and Obstacle Degree Model

To eliminate the dimensional differences among various evaluation indicators, the evaluation agent employs the min–max standardization method to map the raw data into the [0, 1] interval, yielding the standardized matrix $X' = (x_{ij})_{m \times n}$.

Next, the CRITIC method is used to calculate the objective weight $W_{j}$ of each indicator, which is then integrated with the TOPSIS model for comprehensive evaluation. The weights are multiplied by the standardized matrix to obtain the weighted normalized matrix $V = (v_{ij})_{m \times n}$, where $v_{ij} = W_{j} \cdot x_{ij}$. The Euclidean distances from the $i$-th evaluation object to the positive ideal solution $V^{+}$ and the negative ideal solution $V^{-}$ are calculated (denoted as $D_{i}^{+}$ and $D_{i}^{-}$, respectively), ultimately yielding the comprehensive closeness degree of water resource security $C_{i}$:

$$C_{i} = \frac{D_{i}^{-}}{D_{i}^{+} + D_{i}^{-}} \tag{1}$$

A larger value of $C_{i}$ indicates a higher level of water resource security in the region.
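A minimal numerical sketch of this pipeline may make the steps concrete. The source does not spell out the CRITIC formula, so the code assumes the common textbook form, in which the information content of indicator $j$ is its standard deviation multiplied by $\sum_{k}(1 - r_{jk})$. It also assumes direction-aware min–max scaling for negative indicators and takes the ideal solutions as the column-wise maximum and minimum of $V$. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def min_max(raw: Sequence[Sequence[float]], positive: Sequence[bool]) -> List[List[float]]:
    """Scale each column to [0, 1]; negative indicators are reversed (assumed)."""
    m, n = len(raw), len(raw[0])
    out = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [row[j] for row in raw]
        lo, hi = min(col), max(col)
        span = hi - lo
        for i in range(m):
            if span == 0:
                out[i][j] = 0.0  # constant column carries no information
            elif positive[j]:
                out[i][j] = (raw[i][j] - lo) / span
            else:
                out[i][j] = (hi - raw[i][j]) / span
    return out


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def critic_weights(x: Sequence[Sequence[float]]) -> List[float]:
    """Textbook CRITIC: weight_j is proportional to std_j * sum_k (1 - r_jk)."""
    m, n = len(x), len(x[0])
    cols = [[row[j] for row in x] for j in range(n)]
    info = []
    for j in range(n):
        mean = sum(cols[j]) / m
        std = math.sqrt(sum((v - mean) ** 2 for v in cols[j]) / (m - 1))
        conflict = sum(1.0 - pearson(cols[j], cols[k]) for k in range(n))
        info.append(std * conflict)
    total = sum(info)
    return [c / total for c in info]


def topsis_closeness(x: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Closeness C_i = D_i^- / (D_i^+ + D_i^-) on the weighted matrix V."""
    v = [[w[j] * row[j] for j in range(len(w))] for row in x]
    best = [max(col) for col in zip(*v)]   # positive ideal solution V+
    worst = [min(col) for col in zip(*v)]  # negative ideal solution V-
    scores = []
    for row in v:
        d_plus, d_minus = math.dist(row, best), math.dist(row, worst)
        total = d_plus + d_minus
        scores.append(d_minus / total if total else 0.0)
    return scores


# Toy data: 3 evaluation objects x 3 indicators (second indicator is negative).
raw = [[10.0, 5.0, 0.9], [20.0, 3.0, 0.2], [30.0, 1.0, 0.5]]
x = min_max(raw, positive=[True, False, True])
w = critic_weights(x)
c = topsis_closeness(x, w)
```

In this toy case the second object scores lowest, because it sits in the middle on two indicators and at the bottom on the one that receives the largest CRITIC weight.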

Upon obtaining the comprehensive evaluation index, the evaluation agent invokes the obstacle degree model to identify the key obstacle factors constraining water resource security. First, the deviation degree $I_{ij} = 1 - x_{ij}$ of the $i$-th evaluation object on the $j$-th indicator is calculated. Then, incorporating the indicator weight $W_{j}$, the obstacle degree $O_{ij}$ is calculated as follows:

$$O_{ij} = \frac{I_{ij} \cdot W_{j}}{\sum_{j = 1}^{n} I_{ij} \cdot W_{j}} \tag{2}$$

A larger value of $O_{ij}$ signifies a greater negative impact of that indicator on water resource security. By sorting the values of $O_{ij}$, the primary obstacle factors can be extracted.
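As a worked illustration of Equation (2), the following sketch computes obstacle degrees for one evaluation object and returns the highest-ranked factors. The function names, indicator labels, and numbers are hypothetical and serve only to show the arithmetic.

```python
from typing import List, Sequence, Tuple


def obstacle_degrees(x_row: Sequence[float], weights: Sequence[float]) -> List[float]:
    """O_ij = (1 - x_ij) * W_j, normalized over all indicators of object i."""
    contrib = [(1.0 - x) * w for x, w in zip(x_row, weights)]
    total = sum(contrib)
    if total == 0:
        return [0.0] * len(contrib)  # every indicator already at its best value
    return [c / total for c in contrib]


def top_obstacles(
    names: Sequence[str], x_row: Sequence[float], weights: Sequence[float], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k indicators with the largest obstacle degree."""
    degrees = obstacle_degrees(x_row, weights)
    ranked = sorted(zip(names, degrees), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative values only: standardized scores and weights for three indicators.
names = ["indicator A", "indicator B", "indicator C"]
x_row = [0.2, 0.9, 0.5]
weights = [0.5, 0.3, 0.2]
top2 = top_obstacles(names, x_row, weights, k=2)
```

Here the weighted deviations are 0.40, 0.03, and 0.10, so indicator A accounts for roughly three quarters of the total obstacle and indicator C ranks second.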

2.3. Hybrid Retrieval-Augmented Method

2.3.1. Construction of the Water Resource Security Knowledge Graph

Water resource security governance involves multi-source heterogeneous knowledge, including natural endowments, water consumption structures, policies and regulations, engineering measures, and typical case studies. This knowledge is scattered across policy texts, technical standards, the academic literature, and governance cases, lacking a unified structured representation. Consequently, it is difficult to directly support subsequent hybrid retrieval and multi-agent strategy generation. To this end, this study constructs a domain knowledge graph oriented toward water resource security problem diagnosis and governance strategies. The construction process comprises five steps: ontology design [22], identification of knowledge sources, entity and relationship extraction [23,24], entity disambiguation and fusion [25], and graph database storage [26,27].

Ontology design is the prerequisite for knowledge graph construction, with the core task of defining entity and relationship types within the graph. Focusing on the water resource security management chain of “evaluation and diagnosis–risk identification–governance response,” this study designs nine types of entities and eleven types of relationships. The design of entity types follows the principle of covering the entire operational process—from evaluation indicators to governance implementation—ensuring that causal reasoning chains can be formed through the relationships between different entities. The specific entity types and their connotations are shown in Table 1. The design of relationship pathways follows the logic of “risk issues → governance measures → policy support”, while also accounting for spatial attribution and case correlations. The schema of the water resource security knowledge graph is illustrated in Figure 3.

The construction of the knowledge graph is founded on multi-source domain knowledge. This primarily includes the academic literature related to water resource security evaluation, risk identification, ecological protection, and resource management; national and local policies and regulations, planning texts, and water resource bulletins; and typical case materials concerning water conservation governance, water pollution prevention and control, ecological restoration, and water conservancy project construction. This knowledge is utilized to extract entities—such as obstacle factors, risk issues, governance measures, policy constraints, and case evidence—and their respective relationships.

After determining the ontology schema, this study adopts a “rule mapping + manual annotation + expert verification” approach to accomplish knowledge extraction and triple construction. For structured and semi-structured materials, such as water resource bulletins, statistical tables, and policy provisions, triples are generated based on predefined field mapping rules. For instance, “Region–Indicator–Value” information is mapped to triples such as “Spatial Region–Has Indicator–Evaluation Indicator” or “Evaluation Indicator–Characterizes–Risk Issue”; similarly, “Policy Name–Governance Requirement–Applicable Object” information is mapped to triples like “Policy and Regulation–Constrains/Supports–Governance Measure” or “Policy and Regulation–Applicable to–Spatial Region”. For unstructured materials, such as the academic literature, planning texts, and typical cases, entity recognition and relationship annotation are conducted according to manual annotation rules to form candidate triples.
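The field mapping rules for structured materials can be pictured with a small sketch. The record layouts (`region`, `indicator`, `policy`, `measure`) and the placeholder values are assumptions made for illustration; only the relation labels are taken from the examples in the text.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def map_indicator_record(rec: Dict[str, str]) -> List[Triple]:
    """'Region-Indicator-Value' row -> 'Spatial Region-Has Indicator-Evaluation Indicator'."""
    return [(rec["region"], "Has Indicator", rec["indicator"])]


def map_policy_record(rec: Dict[str, Optional[str]]) -> List[Triple]:
    """'Policy Name-Governance Requirement-Applicable Object' row -> policy triples."""
    triples: List[Triple] = [(rec["policy"], "Constrains/Supports", rec["measure"])]
    if rec.get("region"):
        triples.append((rec["policy"], "Applicable to", rec["region"]))
    return triples


# Placeholder records; real rows would come from bulletins and policy provisions.
indicator_rows = [{"region": "Region A", "indicator": "Indicator X", "value": "12.3"}]
policy_rows = [{"policy": "Policy P1", "measure": "Measure M1", "region": "Region A"}]

candidate_triples: List[Triple] = []
for row in indicator_rows:
    candidate_triples.extend(map_indicator_record(row))
for row in policy_rows:
    candidate_triples.extend(map_policy_record(row))
```

Triples produced this way would still pass through the manual review and expert verification steps before being stored.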

The annotation rules primarily encompass the following aspects: First, entity annotation adheres to the principle of the “minimal semantically complete unit,” prioritizing the extraction of terms or phrases that can independently express water resource security concepts. Second, relationship annotation focuses on explicit semantic associations within a sentence or across adjacent sentences; corresponding relationships are established when explicit expressions of causality, support, constraint, governance, or spatial attribution exist in the text. Third, synonymous or near-synonymous expressions are unified based on domain terminology standards and synonym lexicons. For example, expressions like “low water use efficiency” and “inefficient water resource utilization” are merged into identical or similar obstacle factors. Fourth, ambiguous entity names are disambiguated and fused by incorporating contextual semantics, policy application scopes, and domain expert opinions.

Regarding validation procedures, this study implements quality control across three dimensions: entity standardization, relationship rationality, and triple consistency. First, a manual review of the extraction results is conducted to check whether entity type classifications are accurate, naming conventions are standardized, and duplicate nodes exist. Second, relationship connections are checked for legality based on ontology schema constraints, verifying, for example, whether an “Obstacle Factor” correctly points to a “Risk Issue,” and whether a “Governance Measure” correctly corresponds to a relevant “Policy and Regulation” or “Typical Case.” Third, experts in the fields of water resource management and ecological governance are invited to cross-validate the candidate triples, focusing on examining the rationality of causal relationships, governance measure matching relationships, and policy support relationships. Following multiple rounds of revision, triples with unclear semantics, insufficient evidence, or ambiguous relationship types are removed, and duplicate or synonymous entities are fused.

Finally, the triples are written into the Neo4j graph database, establishing a water resource security governance knowledge graph comprising 2418 entity nodes and 5792 relationship edges. The visualization of this graph database is presented in Figure 4.

2.3.2. Construction of Vector Knowledge Base

(1): Data Sources and Preprocessing

The data sources for the vector knowledge base primarily encompass the following categories: first, national and local water resource-related policies, regulations, and planning texts, such as the Water Law of the People’s Republic of China, the Comprehensive National Water Resources Plan, and various provincial and municipal water resources bulletins; second, technical specifications and standard documents in the domain of water resource security evaluation and governance; third, the academic literature and research reports relevant to the study area; and fourth, typical case studies of water resource security governance. Regarding the corpus size, this study collected a total of 86 textual documents, comprising 28 policy, regulatory, and planning texts; 17 technical specifications and standard documents; 31 academic literature and research reports; and 10 typical governance case studies. After text cleaning and format conversion, the raw corpus contained approximately 1.286 million characters. Because texts from diverse sources exhibit variations in structural hierarchy, terminology usage, and information granularity, direct storage is inadequate to meet the knowledge organization requirements for semantic retrieval and question-answering generation. Tailored to the knowledge characteristics of the water resource security domain, this section details the preprocessing, semantic chunking [28], vector representation, and index construction applied to the raw materials [29]. The construction process of the vector knowledge base is illustrated in Figure 5.

(2): Text Chunking

In this study, texts are chunked based on their semantic completeness and domain applicability. Specifically, the target length of the text chunks is set to 256–512 characters to balance retrieval granularity and vector encoding effectiveness. For policy and regulatory materials, articles, chapters, or specific management requirements serve as the primary chunking units; for evaluation standard materials, the corresponding relationships among indicator names, applicable conditions, calculation methods, and grading thresholds are preserved; and for governance case materials, the narrative structure of “problem type–governance measure–implementation effect” is maintained. After semantic chunking, a total of 2184 knowledge chunks were generated, including 732 policy and regulation chunks, 438 technical standard chunks, 764 academic literature and research report chunks, and 250 typical case chunks. The resulting knowledge chunks correspond to clearly defined water resource security themes, such as water resource carrying capacity evaluation, groundwater overexploitation governance, water source protection, ecological flow safeguarding, and river basin water allocation.

To enhance the interpretability of retrieval results, the knowledge chunks simultaneously retain necessary contextual information and metadata. For example, chunks related to water resource carrying capacity evaluation must retain their affiliated evaluation system, indicator meanings, and scope of application; chunks involving drinking water source protection must retain the protection zone type, regulatory requirements, and relevant legal basis.
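One simple way to approximate this chunking is to pack consecutive paragraphs greedily until the target length is reached, attaching source metadata to every chunk. The sketch below is an assumption-laden simplification: it treats paragraphs as the semantic unit, whereas the study chunks by articles, indicator definitions, or case narratives, and the function name and metadata fields are hypothetical.

```python
from typing import Dict, List


def chunk_paragraphs(
    paragraphs: List[str], source: str, min_len: int = 256, max_len: int = 512
) -> List[Dict[str, object]]:
    """Greedily merge paragraphs into chunks of roughly min_len..max_len characters."""
    chunks: List[Dict[str, object]] = []
    buf = ""

    def flush() -> None:
        nonlocal buf
        if buf:
            chunks.append({"chunk_id": len(chunks), "source": source, "text": buf})
            buf = ""

    for para in paragraphs:
        para = para.strip()
        if not para:
            continue
        # Hard-split any paragraph that alone exceeds the upper bound.
        while len(para) > max_len:
            flush()
            buf = para[:max_len]
            flush()
            para = para[max_len:]
        # Start a new chunk if appending would overflow the upper bound.
        if buf and len(buf) + 1 + len(para) > max_len:
            flush()
        buf = f"{buf}\n{para}" if buf else para
        if len(buf) >= min_len:
            flush()
    flush()  # the trailing chunk may be shorter than min_len
    return chunks


demo = chunk_paragraphs(["a" * 100, "b" * 100, "c" * 100, "d" * 600], source="doc")
```

In practice the metadata dictionary would carry the fields mentioned above, such as the affiliated evaluation system or the legal basis, so that each retrieved chunk stays traceable to its origin.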

(3): Vector Representation and Index Construction

After completing text chunking, a text vectorization model is employed to map the knowledge chunks into high-dimensional semantic vectors and establish a vector index. Vector retrieval possesses certain advantages in handling synonymous expressions, near-synonymous expressions, and scenario-based questions, making it suitable for the retrieval requirements of the water resource security domain, which is characterized by dense professional terminology and diverse problem formulations. For example, when a user asks, “What measures should be taken for the continuous decline of groundwater levels in a certain region?”, the retrieval process can associate relevant content such as “groundwater overexploitation governance,” “groundwater extraction reduction,” “water source replacement,” and “water-saving transformation,” rather than being limited to strictly identical literal expressions.

Let the set of knowledge chunks in the knowledge base be:

$$D = \{d_{1}, d_{2}, \dots, d_{N}\} \tag{3}$$

Their corresponding vector representations are:

$$V = \{v_{1}, v_{2}, \dots, v_{N}\} \tag{4}$$

For a user input question $q$, it is converted into a query vector $v_{q}$, and the cosine similarity is used to calculate the degree of semantic relevance between it and the knowledge chunk vectors:

$$\mathrm{Sim}(q, d_{i}) = \frac{v_{q} \cdot v_{i}}{\| v_{q} \| \, \| v_{i} \|} \tag{5}$$

where $v_{q}$ and $v_{i}$ represent the vector representations of the user question and the $i$-th knowledge chunk, respectively. Based on the similarity scores, the knowledge chunks are sorted to recall policy documents, evaluation standards, technical measures, and case experiences relevant to the current question.
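Equation (5) and the subsequent sorting step can be illustrated with a few lines of plain Python. The two-dimensional vectors are toy values; a real system would obtain the vectors from the text vectorization model, and the function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(
    v_q: Sequence[float], chunk_vectors: Sequence[Sequence[float]], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) pairs sorted by descending similarity."""
    scored = [(i, cosine(v_q, v)) for i, v in enumerate(chunk_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy query and chunk vectors.
ranking = rank_chunks([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
```

The chunk whose vector points in the same direction as the query ranks first, the diagonal vector second, and the orthogonal one last.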

During the index construction process, auxiliary retrieval fields are established based on the thematic attributes of water resource security knowledge, including categories such as water resource allocation, water environment quality, groundwater management, water conservation and control, ecological flow safeguarding, risk early warning, and emergency dispatch. Combining vector similarity retrieval with thematic field constraints can, to a certain extent, reduce the recall ratio of irrelevant content and improve the matching degree between retrieval results and specific operational problems. For instance, for questions like “What are the evaluation indicators for water resource carrying capacity?”, the retrieval results should primarily point to evaluation indicator systems, calculation methods, and grading standards; for questions like “How to prevent and control pollution risks in water source areas?”, the retrieval results should focus on associating content such as protection zone delineation, risk source screening, monitoring and early warning, and emergency response.

Furthermore, because the water resource security domain contains a large number of proper nouns, policy terms, and indicator names, semantic vector retrieval alone may struggle to meet the demand for precise terminology matching. Alongside the vector index, this paper therefore constructs a sparse keyword index: an inverted index is built through word segmentation and word frequency statistics, and term weights are calculated using the BM25 algorithm. This index compensates for the weaknesses of vector retrieval in the precise matching of regulation names, indicator names, and regional names. The result is a knowledge base structure combining semantic retrieval and keyword retrieval, providing a retrievable and traceable knowledge foundation for subsequent water resource security question-answering.
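The sparse index can be sketched as a small inverted index scored with Okapi BM25. Several details are assumptions rather than the study's settings: whitespace tokenization stands in for proper word segmentation, `k1 = 1.5` and `b = 0.75` are conventional defaults, and the IDF uses the common smoothed form $\ln\left(\frac{N - df + 0.5}{df + 0.5} + 1\right)$. The class name and sample texts are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class BM25Index:
    """Minimal inverted index with Okapi BM25 scoring."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        tokens = [doc.lower().split() for doc in docs]  # assumed tokenizer
        self.n_docs = len(docs)
        self.doc_len = [len(t) for t in tokens]
        self.avg_len = sum(self.doc_len) / self.n_docs
        # term -> {doc_id: term frequency}
        self.postings: Dict[str, Dict[int, int]] = {}
        for doc_id, doc_tokens in enumerate(tokens):
            for term, tf in Counter(doc_tokens).items():
                self.postings.setdefault(term, {})[doc_id] = tf

    def idf(self, term: str) -> float:
        df = len(self.postings.get(term, {}))
        return math.log((self.n_docs - df + 0.5) / (df + 0.5) + 1.0)

    def search(self, query: str, top_k: int = 5) -> List[Tuple[int, float]]:
        scores: Dict[int, float] = defaultdict(float)
        for term in query.lower().split():
            idf = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_k]


index = BM25Index([
    "groundwater overexploitation governance measures",
    "ecological flow safeguarding in rivers",
    "groundwater extraction reduction and water source replacement",
])
hits = index.search("groundwater extraction")
```

The third text matches both query terms and ranks first, while the second text shares no term with the query and is not returned, which is the exact-match behavior that complements dense retrieval.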

2.3.3. Hybrid Retrieval Mechanism

After completing the construction of the knowledge graph and the vector knowledge base, the key is to construct an interpretable and traceable evidence set necessary for subsequent generation, based on the key obstacle factors output from the quantitative diagnosis. This study designs a hybrid retrieval augmentation mechanism encompassing “task query construction—graph-augmented query—dual-path hybrid recall—graph relation expansion—knowledge integration”. This mechanism is illustrated in Figure 6.

(1): Task Query Construction

For each strategy sample, a task query $Q$ is constructed based on the comprehensive evaluation results, the security state, and the diagnostic information regarding the obstacle degrees. The top-$K$ factors ($K = 5$) from the set of key obstacle factors are extracted, denoted as:

$$F = \{f_{1}, f_{2}, \dots, f_{5}\} \tag{6}$$

This set $F$ simultaneously serves as the seed entity set for subsequent knowledge graph relation expansion.

(2): Graph-Augmented Query

Prior to conducting text retrieval, few-shot reasoning is performed using the knowledge graph on the risk chains, problem types, and governance paths associated with the obstacle factors to obtain a graph reasoning summary:

$$S_{KG} = \mathrm{GraphReason}(F, Q) \tag{7}$$

Subsequently, the task query and the reasoning summary are fused to construct an augmented query:

$$Q_{enh} = \mathrm{Concat}(Q, S_{KG}) \tag{8}$$

(3): Dual-Path Hybrid Recall and RRF Reranking

Considering the textual characteristics of the regional water resource domain, this paper adopts a dual-path recall strategy that fuses dense vectors and sparse keywords. A text vectorization model is utilized to map the augmented task query into high-dimensional semantic vectors, retrieving text chunks by calculating similarity to extract semantic features. Concurrently, sparse retrieval is conducted using the BM25 algorithm to enhance the recall rate of specific professional terminology, policy document titles, and named entities.

Upon obtaining the dual-path retrieval results, the Reciprocal Rank Fusion (RRF) algorithm is introduced to rerank the candidates. This algorithm calculates a comprehensive score based on the ranking positions of each text chunk across different retrieval paths, thereby integrating semantic relevance with lexical matching degree. The system selects the top 10 text chunks based on their reranking scores to construct an unstructured text evidence set.
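Reciprocal Rank Fusion is compact enough to show in full. In its usual formulation each candidate receives $\sum_{r} \frac{1}{k + \mathrm{rank}_{r}}$ over the retrieval paths $r$ in which it appears. The smoothing constant `k = 60` below is the value commonly used in the literature and is an assumption, since the source does not state it; the chunk identifiers are placeholders.

```python
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 10) -> List[str]:
    """Fuse ranked lists by summing 1 / (k + rank) for each candidate."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score; ties are broken by identifier for determinism.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_n]


dense_path = ["c1", "c2", "c3"]   # ranking from vector retrieval
sparse_path = ["c3", "c1", "c4"]  # ranking from BM25 retrieval
fused = rrf_fuse([dense_path, sparse_path])
```

Candidates that appear in both paths (`c1`, `c3`) rise above those found by only one path, even when a single-path candidate held a higher individual rank. Because only rank positions enter the score, the dense and sparse scores never need to be put on a common scale.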

(4): Graph Relation Expansion

Unstructured text retrieval possesses certain limitations when handling systematic associations among entities. To this end, the system introduces structured relational reasoning within the pre-constructed domain knowledge graph. Utilizing the key obstacle factors extracted during the numerical evaluation phase as seed entities, it conducts multi-hop relation expansion.

An excessively small number of expansion hops leads to insufficient extraction of logical chains, whereas too many hops can easily introduce redundant nodes and increase computational overhead. Balancing information coverage and the signal-to-noise ratio, this study sets the multi-hop expansion depth to two hops. This parameter setting enables the extraction of risk associations and governance logical chains adjacent to the key obstacle factors, providing structured background knowledge support for response strategy generation.
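A breadth-first traversal with a depth limit is one straightforward way to realize the two-hop expansion. The sketch below assumes the graph is available as an adjacency dictionary; the node and relation names are hypothetical and do not come from the constructed knowledge graph.

```python
from collections import deque

def expand_relations(graph, seeds, max_hops=2):
    """Collect (head, relation, tail) triples within max_hops of the seed entities."""
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    triples = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in graph.get(node, []):
            triples.append((node, relation, neighbor))
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return triples

# Hypothetical toy graph: node -> list of (relation, neighbor).
graph = {
    "high groundwater supply share": [("causes", "groundwater over-extraction")],
    "groundwater over-extraction": [("governed_by", "extraction reduction")],
    "extraction reduction": [("supported_by", "policy document")],
}
chain = expand_relations(graph, ["high groundwater supply share"])
```

With `max_hops=2` the traversal stops at "extraction reduction"; the third-hop edge to "policy document" is not returned, which illustrates how the depth limit bounds both noise and cost.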

(5): Knowledge Integration and Hierarchical Context Organization

The final phase of the retrieval stage involves the deep integration and contextual reconstruction of the aforementioned knowledge. This study fuses the structured logical chains obtained through graph relation expansion with the unstructured text evidence acquired via reranking, thereby constructing an augmented context for subsequent multi-agent collaborative reasoning.

To mitigate the “attention dispersion” and “factual hallucination” issues that large language models commonly encounter when processing large amounts of unstructured information, this study organizes the augmented context hierarchically. Specifically, the context follows a fixed logical sequence: “governance logical chain – correspondence between measures and policies – key facts and data snippets – explanation of response to key obstacle factors.” This structured prompting mechanism strengthens the guiding role of the numerical diagnosis during natural language generation, which is intended to improve the specificity, interpretability, and traceability of the governance strategies generated by the multi-agent system.
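The fixed ordering of the augmented context can be sketched as a small assembly function. The dictionary keys, section titles, and bullet layout are illustrative assumptions; the actual prompt templates are those given in Appendix A.

```python
# Hypothetical keys for the four context layers, in the order described above.
SECTION_ORDER = [
    ("governance_chains", "Governance logical chains"),
    ("measure_policy_links", "Correspondence between measures and policies"),
    ("facts", "Key facts and data snippets"),
    ("obstacle_responses", "Response to key obstacle factors"),
]

def build_context(evidence):
    """Assemble the augmented context in the fixed four-layer order."""
    parts = []
    for key, title in SECTION_ORDER:
        items = evidence.get(key, [])
        if not items:
            continue  # skip empty layers instead of emitting bare headings
        lines = "\n".join(f"- {item}" for item in items)
        parts.append(f"[{title}]\n{lines}")
    return "\n\n".join(parts)

context = build_context({
    "facts": ["placeholder data snippet"],
    "governance_chains": ["obstacle factor -> risk issue -> governance measure"],
})
```

The output order depends only on `SECTION_ORDER`, not on the order in which evidence was retrieved, so the structured chains always precede the raw text snippets.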

2.4. Multi-Agent Collaboration Method

2.4.1. Overall Multi-Agent Architecture and Functional Division

This paper develops a multi-agent collaborative strategy generation method orchestrated by a master control agent. Built on a unified task state, the method uses the hybrid retrieval augmentation mechanism to provide knowledge support for collaborative reasoning and uses review feedback to iteratively optimize strategies. It organizes agents with distinct functions to carry out the full generation process, from regional diagnostic results to the output of governance strategies. The multi-agent collaborative workflow is illustrated in Figure 7.

The multi-agent system constructed in this paper is denoted as:

A = \{A_{E}, A_{M}, A_{R}, A_{G}, A_{V}\}

(9)

where $A_{E}$ represents the evaluation agent, $A_{M}$ represents the master control agent, $A_{R}$ represents the knowledge retrieval agent, $A_{G}$ represents the strategy generation agent, and $A_{V}$ represents the strategy review agent.

To accommodate the complexity of collaborative governance of regional water resource security, this paper clarifies the specific roles and functional boundaries of the five types of agents, as shown in Table 2. The detailed prompt designs for each agent are presented in Appendix A, Table A1. Under the orchestration of the master control agent, each agent performs its respective duties, collectively forming an iterative collaborative architecture.

Building upon the explicit division of labor among the agents, the system input oriented toward regional governance tasks is defined as:

I_{i} = \{q_{i}, c_{i}, O_{i}, X_{i}^{(0)}\}

(10)

where $q_{i}$ is the natural language task query directed at the $i$-th evaluation object; $c_{i}$ is the comprehensive closeness degree of security; $O_{i} = \{O_{i1}, O_{i2}, \dots, O_{ik}\}$ is the set of key obstacle factors; and $X_{i}^{(0)}$ is the initial augmented context generated by the hybrid retrieval module.

Consequently, the initial shared state of the system can be expressed as:

S_{i}^{(0)} = \{q_{i}, c_{i}, O_{i}, X_{i}^{(0)}, P_{i}^{(0)}, F_{i}^{(0)}\},

(11)

where $P_{i}^{(0)}$ represents an empty initial candidate strategy, and $F_{i}^{(0)}$ represents empty initial feedback information.

In each iteration, the master control agent selects the agent to be executed based on the current shared state and completes the state update. Its scheduling process can be expressed as:

a_{i}^{(t)} = \pi_{M}(S_{i}^{(t)})

(12)

S_{i}^{(t + 1)} = Update (S_{i}^{(t)}, a_{i}^{(t)}, z_{i}^{(t)})

(13)

where $a_{i}^{(t)}$ is the agent scheduled in the $t$-th round, and $z_{i}^{(t)}$ is the output result of that agent. Through this shared state-driven mechanism, the system can unify evaluation results, retrieval evidence, generated drafts, and review comments into a single task space, providing a foundation for subsequent collaborative reasoning and iterative correction.
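The shared state and the scheduling rule of Equations (12) and (13) can be sketched with an immutable record and two functions. The field names, the agent labels, the `approved` flag, and the simple rule set inside `master_policy` are assumptions made for illustration; in the proposed system the master control agent is driven by an LLM prompt rather than hard-coded rules.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SharedState:
    query: str              # q_i: natural language task query
    closeness: float        # c_i: comprehensive closeness degree
    obstacles: tuple        # O_i: key obstacle factors
    context: str = ""       # X_i: augmented context
    strategy: str = ""      # P_i: candidate strategy (empty at t = 0)
    feedback: tuple = ()    # F_i: review feedback (empty at t = 0)
    approved: bool = False  # hypothetical flag set by the review agent

def master_policy(state):
    """pi_M: choose the next agent from the current shared state (toy rules)."""
    if state.approved:
        return "stop"
    if not state.context:
        return "retrieval"
    if not state.strategy or state.feedback:
        return "generation"
    return "review"

def update(state, agent, output):
    """Update(S, a, z): write the agent output z into its slot of the state."""
    if agent == "retrieval":
        return replace(state, context=output)
    if agent == "generation":
        return replace(state, strategy=output, feedback=())
    if agent == "review":
        return replace(state, approved=output["approved"],
                       feedback=tuple(output["feedback"]))
    raise ValueError(f"unknown agent: {agent}")

# Placeholder values only; 0.5 is not a result from the study.
s0 = SharedState(query="strategy for city X", closeness=0.5,
                 obstacles=("per capita water resources",),
                 context="initial evidence")
a0 = master_policy(s0)
s1 = update(s0, a0, "draft strategy")
```

Keeping the state immutable means every round produces a new snapshot, so the sequence of states can be logged and audited after the fact.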

2.4.2. Collaborative Reasoning Driven by Hybrid Retrieval Augmentation

This paper designs a collaborative reasoning workflow encompassing “diagnosis structuring—task rewriting—evidence retrieval—strategy generation,” enabling various types of agents to divide tasks and collaborate around the same problem chain.

First, the evaluation agent $A_{E}$ structurally represents the regional water resource security state. For the $i$-th evaluation object, the evaluation agent outputs the diagnostic result:

D_{i} = \{c_{i}, l_{i}, O_{i}, \Delta_{i}\}

(14)

where $l_{i}$ represents the security level, and $\Delta_{i}$ represents the degree to which each indicator deviates from the ideal state. This result retains not only the comprehensive evaluation conclusion but also the ranking information of the key obstacle factors, thereby providing a clear problem orientation for subsequent retrieval and generation.

Second, based on the diagnostic results and user demands, the master control agent $A_{M}$ rewrites and refines the original task to construct an intermediate task representation oriented toward retrieval and generation:

q'_{i} = f_{M}(q_{i}, D_{i})

(15)

The core of this process is organizing “regional object—security state—key obstacles—strategy objectives” into a unified problem description. For example, when a region’s key obstacle factors are concentrated on “insufficient per capita water resources,” “excessively high proportion of groundwater supply,” and “excessive water consumption per mu of farmland irrigation,” the master control agent will prioritize highlighting three types of governance demands: supply–demand contradiction, groundwater extraction reduction, and agricultural water conservation. This reduces semantic drift during the subsequent retrieval process.

Building upon this, the knowledge retrieval agent $A_{R}$ invokes the hybrid retrieval augmentation module to generate the evidence set most relevant to the current task. Its retrieval result can be expressed as:

E_{i} = \mathrm{Retrieve}(q'_{i}, O_{i}, G, V)

(16)

where $G$ represents the knowledge graph, $V$ represents the vector knowledge base, and $E_{i}$ encompasses multiple types of evidence, including policy provisions, planning requirements, technical specifications, case chunks, and graph relational chains. Unlike simple text recall, the evidence organization here also preserves the associative structure of “obstacle factors – risk issues – governance measures – policy support,” which makes the subsequently generated results easier to interpret.

Subsequently, the strategy generation agent $A_{G}$ generates a candidate governance strategy under the joint constraints of the diagnostic result $D_{i}$ and the evidence set $E_{i}$:

P_{i}^{(t)} = g_{G} (D_{i}, E_{i}, F_{i}^{(t - 1)})

(17)

where $F_{i}^{(t - 1)}$ is the previous round of review feedback. When $t = 1$, the generation agent primarily produces the initial draft based on the diagnostic results and retrieved evidence; when $t > 1$, it modifies local content based on the feedback. To improve the stability and verifiability of the generated content, this paper requires that the candidate strategies contain at least the following four categories of elements: first, problem summarization and cause explanation; second, governance measures corresponding to the obstacle factors; third, policy or case basis; and fourth, implementation priorities and phased arrangements. Such an output structure helps to mitigate issues like “providing measures without explaining the basis” or “only listing principles while lacking practical implementation points”.

During the collaborative reasoning process, the master control agent judges, based on the current state, whether the next step should prioritize supplementing evidence or directly generating a strategy; the evaluation agent looks back at the candidate strategy when necessary to judge whether it covers the key obstacle factors. Through this task state-oriented dynamic collaboration, the system introduces rigorous logical constraints and factual verification into the strategy generation phase. This not only elevates the automation level of the decision support process but also effectively averts the risk of domain knowledge hallucination potentially triggered by the unconstrained generation of a single model.

2.4.3. Iterative Generation Under Review Feedback Constraints

Considering that water resource governance strategies must not only address regional issues but also satisfy the requirements of policy constraints, factual consistency, and implementation feasibility, this paper introduces the strategy review agent $A_{V}$ following the generation phase. This establishes a closed-loop iterative mechanism of “generation – review – feedback – regeneration”.

For the candidate strategy $P_{i}^{(t)}$ generated in the $t$-th round, the review agent conducts verification across four dimensions:

(1): Problem matching degree: Whether the strategy responds to the current region’s key obstacle factors and risk issues;
(2): Knowledge support degree: Whether the core measures are supported by retrieved evidence, policy provisions, or case materials;
(3): Policy compliance: Whether the content is consistent with current laws and regulations, planning requirements, and rigid constraints;
(4): Implementation feasibility: Whether the measures conform to regional governance realities, and whether there are obvious logical conflicts or execution barriers.

Based on this, the review result can be expressed as:

R_{i}^{(t)} = \{s_{i}^{(t)}, v_{i}^{(t)}, F_{i}^{(t)}\}

(18)

where $s_{i}^{(t)}$ is the comprehensive review score, $v_{i}^{(t)}$ is the number of hard constraint violations, and $F_{i}^{(t)}$ is the structured feedback opinion. The comprehensive review score is defined as:

s_{i}^{(t)} = \sum_{m = 1}^{4} \omega_{m} s_{im}^{(t)}, \quad \sum_{m = 1}^{4} \omega_{m} = 1

(19)

where $s_{im}^{(t)}$ represents the score of the $m$-th review dimension, and $\omega_{m}$ is the corresponding weight. If the candidate strategy contains explicit policy conflicts, such as breaching total water consumption control requirements, neglecting groundwater extraction reduction constraints, or lacking necessary ecological water safeguards, it is recorded as a hard constraint violation.

To avoid ineffective iterations, this paper sets the following convergence determination condition:

b_{i}^{(t)} = \begin{cases} 1, & s_{i}^{(t)} \geq \tau \ \text{and} \ v_{i}^{(t)} = 0 \\ 0, & \text{otherwise} \end{cases}

(20)

where $\tau$ is the review passing threshold. When $b_{i}^{(t)} = 1$, the current strategy is deemed to have met the output requirements and can serve as the final result; otherwise, the review agent generates targeted feedback and returns it to the master control agent for the next round of scheduling. The corresponding state update process can be expressed as:

S_{i}^{(t + 1)} = Update (S_{i}^{(t)}, A_{V}, F_{i}^{(t)})

(21)

Driven by the feedback, the master control agent selects a more appropriate correction path based on the problem type. If the problem is primarily manifested as insufficient evidence, it prioritizes scheduling the knowledge retrieval agent to supplement regulatory or case basis. If the problem is primarily manifested as unclear measure descriptions or incomplete logic, it schedules the strategy generation agent to perform local rewriting. If the problem involves omitted obstacle factors, it re-invokes the evaluation agent to cross-check the diagnostic results against the strategy’s coverage. Such a differentiated correction approach can reduce repetitive generation and improve system stability.
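Equations (19) and (20) and the differentiated correction paths can be sketched in a few lines. The equal weights, the threshold value, the dimension scores, and the feedback labels are all assumptions for illustration; the values used in the experiments are not stated in this section.

```python
def review_score(dim_scores, weights):
    """Eq. (19): weighted sum over the four review dimensions."""
    if len(dim_scores) != 4 or len(weights) != 4:
        raise ValueError("expected four dimension scores and four weights")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, dim_scores))

def converged(score, violations, tau):
    """Eq. (20): pass only if the score reaches tau and no hard constraint is violated."""
    return 1 if score >= tau and violations == 0 else 0

# Hypothetical feedback labels mapped to the agent that handles each problem type.
ROUTES = {
    "insufficient_evidence": "retrieval",    # supplement regulatory or case basis
    "unclear_or_incomplete": "generation",   # local rewriting
    "omitted_obstacle": "evaluation",        # cross-check diagnostic coverage
}

def route(feedback_type):
    """Pick the correction path; default to regeneration for unknown labels."""
    return ROUTES.get(feedback_type, "generation")

weights = [0.25, 0.25, 0.25, 0.25]             # assumed equal weights
s = review_score([4.0, 3.0, 5.0, 4.0], weights)  # placeholder dimension scores
passed = converged(s, violations=0, tau=4.0)     # assumed threshold
```

Note that a single hard constraint violation blocks acceptance regardless of the score, so a fluent but non-compliant draft cannot pass on average quality alone.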

Ultimately, the system terminates when the review threshold is met or the maximum number of iteration rounds $T_{\max}$ is reached, and outputs the final governance strategy:

P_{i}^{*} = P_{i}^{(t^{*})}

(22)

where $t^{*}$ represents the round satisfying the stopping conditions. By introducing the iterative mechanism under the constraints of review feedback, the proposed method can reduce risks such as factual errors, policy deviations, and measure mismatches while preserving the natural language generation capabilities of large language models. Consequently, this enhances the specificity, compliance, and implementability of the regional water resource governance strategies. A specific example of the generated strategies can be found in Appendix A Table A2.
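The closed loop with its two stopping conditions can be sketched as below. The `generate` and `review` callables stand in for the LLM-backed agents, and the stub implementations, threshold, and round limit are hypothetical.

```python
def refine(generate, review, tau, t_max):
    """Run generation-review rounds until the review passes or t_max is reached.

    Returns the final strategy P* and the stopping round t*.
    """
    if t_max < 1:
        raise ValueError("t_max must be at least 1")
    feedback = []
    for t in range(1, t_max + 1):
        strategy = generate(feedback)
        score, violations, feedback = review(strategy)
        if score >= tau and violations == 0:
            break
    return strategy, t

# Stub agents for illustration only.
def stub_generate(feedback):
    return "draft (revised)" if feedback else "draft"

def stub_review(strategy):
    if "revised" in strategy:
        return 4.5, 0, []
    return 3.0, 1, ["cite groundwater extraction limits"]

final, rounds = refine(stub_generate, stub_review, tau=4.0, t_max=3)
```

If no round passes, the loop returns the last draft at `t_max`, so a caller that needs a guarantee of compliance should also inspect the final review result.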

3. Experimental Results

The experimental environment configuration of this study is presented in Table 3, and the parameter configuration of the large language models is shown in Table 4.

Regarding model selection and division of labor, DeepSeek-R1 serves as the core foundation model of the multi-agent framework. Qwen3-8B, Qwen2.5-72B, and GLM-4 are used as comparison models to verify how well the framework generalizes and adapts across foundation models of varying parameter scales. In addition, ChatGPT-5.4 is introduced as a judge model that scores the generated governance recommendations against predefined evaluation dimensions, which standardizes the experimental assessment. The specific parameters of each model are detailed in Table 4.

3.1. Study Area and Data Sources

This paper selects Henan Province and its 18 prefecture-level cities as the experimental subjects. Spanning four major river basins, including the Yellow River and the Yangtze River, Henan Province is a typical major agricultural and populous province. It faces extremely intense competition for water resource elements and the strict constraints of the “Four Determinations by Water” policy, making it an ideal study area to verify the proposed system’s capability in handling complex compliance conflicts. The regional map of Henan Province is shown in Figure 8.

The experimental data encompasses multi-source panel data and text corpora from 2018 to 2024. Among these, the structured panel data used for numerical calculations are sourced from the Henan Statistical Yearbook, the Henan Provincial Water Resources Bulletin, and ecological environment statistical materials. Furthermore, the unstructured knowledge texts used to construct the hybrid retrieval augmentation are derived from national and local water resource management regulations, planning outlines, technical standards, and typical governance case studies.

3.2. Reliability Verification of Regional Water Resource Security Evaluation

To verify the effectiveness of the evaluation function within the multi-agent collaborative framework proposed in this paper, calculations were performed using data from 18 prefecture-level cities in Henan Province from 2018 to 2024. Based on the calculated comprehensive closeness degree scores, a heatmap illustrating the evolution of water resource security in Henan Province was plotted, as shown in Figure 9.

The overall evolutionary trend of the heatmap indicates that during the study period, the water resource security scores of most prefecture-level cities in Henan Province exhibited a steady upward trend, with colors gradually transitioning toward high-score intervals. Regarding spatial distribution, the scores of the cities demonstrate obvious regional differences. Southern cities such as Xinyang and Nanyang, relying on favorable natural water resource conditions, maintained relatively high scores consistently. Conversely, water-deficient cities in the central and northern regions, such as Zhengzhou, Xuchang, and Xinxiang, generally exhibited low scores at the beginning of the study period. Over time, particularly after 2021, influenced by the increased water supply from the Middle Route of the South-to-North Water Diversion Project and the expanded scale of unconventional water utilization within the region, the scores of these water-receiving cities experienced a distinct growth inflection point.

Concurrently, since 2022, the province has intensified the implementation of water conservation and control actions, leading to widespread improvements in water consumption indicators, such as water consumption per 10,000 RMB of GDP. Consequently, the rate of increase in the comprehensive scores of various cities in the heatmap has also accelerated. These spatiotemporal distribution characteristics and evolutionary patterns reflected by the heatmap are highly consistent with Henan Province’s natural water resource endowment—which is abundant in the south and scarce in the north—as well as the effectiveness of water network regulation in recent years. This indicates that the quantitative results output by the evaluation module authentically reflect the dynamic changes in the regional water resource security state. It verifies the accuracy and effectiveness of the evaluation function, thereby establishing a reliable foundation for subsequent objective factual retrieval and strategy generation by the agents.

3.3. Evaluation Metrics and Methods for Response Strategy Generation

3.3.1. Multi-Dimensional Semantic Evaluation Standards Based on AHP

Considering that the task of this study involves strategy generation oriented toward regional governance scenarios, relying solely on similarity metrics is inadequate to fully reflect the relative merits of the methods [30,31]. Therefore, this paper constructs an experimental evaluation system comprising five evaluation dimensions, as shown in Table 5.

To mitigate subjective bias when fusing the indicators, this study uses the Analytic Hierarchy Process (AHP) to determine the weight of each evaluation dimension [32,33]. Specifically, drawing on the experience of experts in the water resource management domain, the 1–9 scale method of AHP is used to conduct pairwise importance comparisons among the five dimensions, yielding a $5 \times 5$ positive reciprocal judgment matrix. The consistency ratio of this matrix is $CR = 0.0122 < 0.10$, indicating that it passes the consistency test and that the expert weight assignments are logically consistent. After normalizing the principal eigenvector of this matrix, the weights $w$ of the evaluation dimensions are determined as 0.30, 0.15, 0.25, 0.20, and 0.10, respectively.

For any generated strategy sample, its single comprehensive evaluation score $S$ is calculated as follows:

S = \sum_{i = 1}^{5} w_{i} g_{i}

(23)

where $w_{i}$ represents the relative weight of the $i$-th evaluation dimension, and $g_{i}$ represents the specific score assigned by the model for that dimension. A higher comprehensive score $S$ indicates a superior overall performance of the generated strategy in regional governance.
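The AHP weighting and Equation (23) can be illustrated with a short pure-Python sketch. The expert judgment matrix is not reproduced in this section, so the example builds a hypothetical, perfectly consistent matrix from the reported weights ($a_{ij} = w_i / w_j$); it therefore yields $CR = 0$ rather than the reported 0.0122. The random index 1.12 is the standard Saaty value for $n = 5$, and the dimension scores are placeholders.

```python
def ahp_weights(matrix, iters=100):
    """Principal eigenvector of a positive reciprocal matrix via power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lam

def consistency_ratio(lam, n, random_index):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    return ((lam - n) / (n - 1)) / random_index

def composite_score(weights, scores):
    """Eq. (23): S = sum_i w_i * g_i."""
    return sum(w * g for w, g in zip(weights, scores))

reported = [0.30, 0.15, 0.25, 0.20, 0.10]
# Hypothetical consistent matrix, NOT the experts' judgment matrix.
matrix = [[wi / wj for wj in reported] for wi in reported]
weights, lam = ahp_weights(matrix)
cr = consistency_ratio(lam, n=5, random_index=1.12)
score = composite_score(weights, [5, 4, 4, 5, 3])  # placeholder dimension scores
```

With the placeholder scores the composite value is 0.30·5 + 0.15·4 + 0.25·4 + 0.20·5 + 0.10·3 = 4.4, which shows how the first and third dimensions dominate the result.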

3.3.2. Evaluation Methods

This paper adopts two complementary evaluation methods to assess the generated results: (1) judge scoring based on large language models (LLMs) to measure reasoning consistency and text alignment quality [34,35], and (2) objective overlap metrics based on N-grams to verify the literal precision and key information coverage of the generated content.

(1): Judge Evaluation

This study employs an LLM as a judge to conduct a quantitative evaluation of the generated results [36]. Considering the complex long-text reasoning and Chinese alignment requirements of the task, ChatGPT-5.4 is selected as the primary judge model, with its parameter settings detailed in Table 4.

During the evaluation process, system prompts containing strict scoring rubrics and few-shot examples were designed for the judge model to ensure the reproducibility of the scoring standards and discriminative consistency. For each sample to be tested, the judge model independently outputs the scores for each dimension along with the scoring rationale.

To validate the reliability of the overall evaluation and the credibility of the LLM-as-a-judge approach, a formal expert scoring mechanism was implemented. A random sample of 20% of the generated strategies was drawn for independent blind review by three senior domain experts in regional water resource management. The experts applied the same AHP-weighted multi-dimensional evaluation criteria used by the LLM. The final expert score for each sample was the arithmetic mean of the three experts’ ratings. Furthermore, to assess the reliability of using ChatGPT-5.4 as the primary judge model, the Pearson correlation coefficient between the LLM-generated scores and the average expert scores was computed on this subset. This cross-validation checks how closely the automated scoring aligns with professional human judgment, which supports a more objective assessment of the generated governance strategies.
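The agreement check between the judge model and the expert panel can be sketched as follows. All ratings below are placeholder values chosen for illustration, not data from this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder ratings: one row per sampled strategy, one column per expert.
expert_ratings = [[4.5, 4.0, 4.5], [3.5, 3.0, 3.5], [4.0, 4.5, 4.0]]
expert_mean = [sum(row) / len(row) for row in expert_ratings]
llm_scores = [4.4, 3.4, 4.1]   # placeholder judge-model scores
r = pearson(llm_scores, expert_mean)
```

A coefficient close to 1 indicates that the judge model ranks and spaces the strategies much as the experts do; it does not by itself show that the absolute score levels agree.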

(2): N-gram Objective Overlap Evaluation

To further validate the performance of the generated strategies regarding literal precision and information recall capabilities, in addition to semantic evaluation, this paper introduces objective evaluation metrics commonly used in the natural language generation (NLG) domain. These metrics are utilized to cross-validate the generated texts against historical, officially published standard water resource planning and response plans. Specifically, the BLEU-4 and ROUGE-L metrics are employed for measurement. The results serve to supplement the assessment of the degree of overlap and information coverage in key expressions between the generated texts and the standard reference texts.
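Both overlap metrics can be sketched in pure Python. This is a simplified single-reference, unsmoothed version; the tokenization (whitespace-split English words) and the example sentences are assumptions, and the experiments may rely on a standard toolkit with different tokenization and smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Unsmoothed sentence-level BLEU-4 with a single reference."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / 4)

def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-measure based on the longest common subsequence."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(candidate), lcs / len(reference)
    return (1 + beta ** 2) * p * r / (r + beta ** 2 * p)

ref = "reduce groundwater extraction in over exploited areas".split()
cand = "reduce groundwater extraction in the northern areas".split()
bleu, rouge = bleu4(cand, ref), rouge_l(cand, ref)
```

BLEU-4 rewards exact contiguous phrasing, whereas ROUGE-L tolerates inserted words as long as the order of shared tokens is preserved, which is why the two are reported together.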

3.4. Comparative and Ablation Experiments

3.4.1. Construction of Test Dataset

Based on the aforementioned multi-source structured panel data and unstructured knowledge texts, this study constructed a test dataset comprising 100 experimental samples to evaluate the generation performance. These samples cover primary operational categories such as indicator overload warnings, industrial water use conflicts, groundwater over-extraction governance, ecological base flow protection, and cross-regional water allocation. During the sample construction process, moderate data perturbations were introduced based on wet and dry year variations, shifts in water usage structures, and specific policy constraints to simulate input variances under a highly competitive water resource environment. Furthermore, each sample is structured as an “input-reference” pair. Alongside the input diagnostic scenario, each sample includes a corresponding standard governance strategy derived from officially published water resource planning documents, regulatory response plans, and validated expert schemes. These standard strategies serve as the reference texts for calculating objective N-gram overlap metrics, including BLEU-4 and ROUGE-L.

3.4.2. Baseline Comparison Methods for Strategy Generation

To verify the practical effectiveness of the proposed framework in generating water resource governance strategies, this section selects three baseline methods with different configurations for comparative evaluation:

(1): LLM-Direct: Lacks retrieval augmentation and review mechanisms, relying directly on the large language model’s internal knowledge to generate governance recommendations;
(2): VectorRAG-LLM: Introduces traditional RAG augmentation based on a vector knowledge base, without a review mechanism;
(3): HybridRAG-LLM: Single-model generation based on hybrid retrieval augmentation integrating a knowledge graph and vectors, without a review mechanism;
(4): The proposed method: A complete framework integrating hybrid retrieval augmentation with multi-agent collaborative review and feedback.

The comparative experimental results of each method are presented in Table 6. The proposed method demonstrates optimal performance across all objective metrics and subjective evaluations. Specifically, our framework achieves the highest Expert Score (4.570 ± 0.230, 95% CI: [4.52, 4.62]) and ROUGE-L score (0.528 ± 0.035, 95% CI: [0.521, 0.535]). Independent-samples t-tests indicate that these improvements over the best-performing baseline, HybridRAG-LLM, are statistically significant ($p < 0.001$).
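The reported confidence intervals can be reproduced with a normal-approximation calculation. The sketch assumes that the ± values are sample standard deviations and that $n = 100$ (the size of the test dataset); both are assumptions, although the resulting intervals agree with the reported ones after rounding.

```python
import math

def normal_ci95(mean, std, n):
    """95% confidence interval for a mean under the normal approximation."""
    half_width = 1.96 * std / math.sqrt(n)
    return mean - half_width, mean + half_width

# Reported mean and spread; n = 100 is an assumption.
expert_ci = normal_ci95(4.570, 0.230, 100)
rouge_ci = normal_ci95(0.528, 0.035, 100)
```

For the Expert Score the half-width is 1.96 × 0.230 / 10 ≈ 0.045, giving roughly [4.52, 4.62]; for ROUGE-L it is 1.96 × 0.035 / 10 ≈ 0.007, giving roughly [0.521, 0.535].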

From the perspective of internal mechanisms, LLM-Direct, which relies solely on internalized knowledge, performs the worst; it is highly susceptible to generating factual hallucinations when handling highly specialized water resource scenarios. Although VectorRAG-LLM improves scores through basic vector retrieval, its excessive reliance on literal text similarity makes it difficult to untangle the complex causal chains underlying regional water networks. HybridRAG-LLM enhances the factual basis utilizing the dual-path retrieval of the knowledge graph and vectors, but due to the absence of post-processing checks, its single-pass generation results still struggle to properly resolve complex policy conflicts.

In contrast, the core advantage of the proposed method lies in the introduction of a multi-agent collaborative workflow. Through multiple iterations of collaboration and review among agents with distinct functions, the system can automatically identify and eliminate response measures that violate water conservancy regulations or lack data support. This mechanism, driven by iterative feedback, effectively overcomes the flaws of traditional RAG methods that simply piece together retrieved texts. It ensures that the final output governance plans are highly tailored to local realities, exhibiting excellent operability and compliance.

3.4.3. Ablation Experiments

To verify the contributions of the key internal roles within the proposed multi-agent collaborative framework, ablation experiments were conducted by removing specific modules from the complete method. The results of the ablation experiments are presented in Table 7.

As shown in Table 7, the complete framework outperforms the ablated models across all evaluation metrics. The absence of any core module results in a decline in the quality of strategy generation, indicating that the “evaluation—retrieval—reasoning—review” workflow constructed in this study possesses inherent logical necessity. Specifically:

On the one hand, the knowledge retrieval module provides essential, domain-specific factual support. Upon the removal of this module, the evaluation metrics experienced the largest decline, with the BLEU-4 score dropping to 0.165 and the expert score falling to merely 3.550. Water resource management is subject to rigid policy constraints such as the “Four Determinations by Water” policy; relying solely on the internal parameters of large language models is prone to producing factual deviations. In the absence of external corpus injection from the knowledge graph and vector database, it is difficult for the model to precisely grasp policy boundaries, resulting in output recommendations that lack regional specificity and practical guiding value.

On the other hand, the review feedback mechanism ensures the compliance and feasibility of the generated strategies. When the retrieval module is retained but the review module is removed, although the model possesses a certain degree of knowledge integration capability, a gap remains between its performance and that of the complete framework in terms of the BLEU-4 score (0.352), which reflects text precision, and the expert score (4.125). Regional water resource governance faces intricate policy requirements and practical conditions; a single-pass generation struggles to completely avoid logical loopholes in the details. Lacking cross-validation and multi-round constraint feedback among multiple agents, the system is susceptible to omitting compliance reviews, thereby compromising the rigor and operability of the final governance plans.

3.5. Quantitative Evaluation of Hallucination and Policy Compliance

To directly address the critical issue of model hallucinations and ensure the rigorousness of the generated regional water resource management strategies, this study conducted a targeted quantitative analysis. While large language models demonstrate robust generation capabilities, they are prone to fabricating ungrounded measures or violating specific regional constraints if left unchecked.

3.5.1. Evaluation Metrics and Experimental Setup

We designed three highly specific quantitative metrics to evaluate the reliability and compliance of the generation frameworks:

(1): Factual Hallucination Rate (FHR): The percentage of generated strategies containing fabricated diagnostic data, non-existent obstacle factors, or incorrect technical standards.
(2): Policy Conflict Rate (PCR): The frequency of proposed measures that explicitly violate rigid regional constraints, such as the “Four Determinations by Water” principle or specific groundwater extraction limits in Henan Province.
(3): Citation Grounding Accuracy (CGA): The proportion of proposed governance measures that can be strictly traced back to the retrieved policy documents, technical standards, or verified case studies within the hybrid knowledge base.
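The three metrics can be computed from annotated outputs as sketched below. The record layout is hypothetical, and the choice of denominators is an interpretation of the definitions above: FHR is counted per strategy, while PCR and CGA are counted per proposed measure.

```python
from dataclasses import dataclass

@dataclass
class Measure:
    grounded: bool          # traceable to a retrieved source
    violates_policy: bool   # conflicts with a rigid regional constraint

@dataclass
class Strategy:
    has_fabrication: bool   # contains fabricated data, factors, or standards
    measures: list

def reliability_metrics(strategies):
    """Return (FHR, PCR, CGA) as percentages."""
    measures = [m for s in strategies for m in s.measures]
    if not strategies or not measures:
        raise ValueError("need at least one strategy with at least one measure")
    fhr = 100 * sum(s.has_fabrication for s in strategies) / len(strategies)
    pcr = 100 * sum(m.violates_policy for m in measures) / len(measures)
    cga = 100 * sum(m.grounded for m in measures) / len(measures)
    return fhr, pcr, cga

# Placeholder annotations, not data from the study.
strategies = [
    Strategy(False, [Measure(True, False), Measure(True, False)]),
    Strategy(True, [Measure(False, True), Measure(True, False)]),
]
fhr, pcr, cga = reliability_metrics(strategies)
```

Counting PCR per strategy instead of per measure would give a different value for the same annotations, so the unit of counting should be fixed before comparing methods.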

For the experimental dataset, we randomly sampled 100 representative evaluation instances from the 18 prefecture-level cities in Henan Province (spanning 2018 to 2024). These instances specifically included complex scenarios with high obstacle degrees in water scarcity and ecological degradation. Using DeepSeek-R1 as the foundational model, we compared our full multi-agent framework against the baseline methods. The evaluation was conducted using a highly constrained LLM-as-a-judge approach (ChatGPT-5.4) alongside a 20% random sample cross-validated by domain experts to ensure scoring strictness.

3.5.2. Experimental Results and Analysis

The quantitative evaluation results for hallucination reduction and compliance are presented in Table 8.

As shown in Table 8, the direct generation method (LLM-Direct) suffers from severe reliability issues, with a Factual Hallucination Rate of 38.5% and a Policy Conflict Rate of 22.0%. This indicates that, without external knowledge grounding, foundation models struggle to align strategy generation with strict regional water management norms. Introducing the hybrid RAG mechanism (HybridRAG-LLM) raised the Citation Grounding Accuracy to 76.5% and reduced factual errors, which supports the effectiveness of dual-path retrieval combined with knowledge graph expansion. However, the Policy Conflict Rate remained at 11.2%, because single-pass generation cannot iteratively correct complex normative conflicts. The full multi-agent collaborative framework achieved the best compliance: with the Review Agent integrated into an iterative feedback loop, the Policy Conflict Rate fell to 2.0% and the Factual Hallucination Rate to 1.5%. The Review Agent acted as a compliance filter, identifying and rejecting candidate strategies that proposed ungrounded measures or violated baseline constraints (such as suggesting water-intensive agricultural expansion in critically water-deficient northern cities like Xinxiang or Zhengzhou). Furthermore, the Citation Grounding Accuracy reached 95.8%, indicating that the proposed framework not only generates fluent text but also substantially improves the traceability, factual grounding, and policy compliance required for intelligent regional water governance.

3.6. Adaptability Experiments of Different Foundation Models

To verify whether the proposed decision-making framework depends strongly on a single model, and to evaluate its generalization across foundation models built on different technical routes, we replaced the underlying large language model of the framework. In addition to DeepSeek-R1, which has strong logical reasoning capabilities, the experiment introduced representative open-source models of different parameter scales (Qwen2.5-72B and Qwen3-8B), as well as GLM-4, which exhibits strong overall dialogue performance, for comparative testing.

As shown in Table 9, the proposed framework performs stably across large models with varying parameter scales. Overall, models with larger parameter sizes and stronger reasoning capabilities perform better. Among them, DeepSeek-R1 achieved the highest score, followed by Qwen2.5-72B and GLM-4. This indicates that with the assistance of the proposed method, high-performance models can effectively accomplish the water resource strategy generation task.

Meanwhile, the metrics for Qwen3-8B, which has a smaller parameter size, declined somewhat; in particular, the BLEU-4 score (0.345), which reflects text accuracy, dropped noticeably. This suggests that small-scale models still face limitations when handling complex water conservancy information integration and policy rule analysis. However, supported by the proposed framework, the expert evaluation score for Qwen3-8B still reached 4.075. This suggests that the hybrid retrieval and multi-agent collaboration mechanisms compensate, to a certain extent, for the capability deficiencies of foundation models, thereby reducing the system’s reliance on high-end computational resources in practical applications.

4. Conclusions and Prospects

4.1. Conclusions

To address the issues of low intelligence and weak knowledge support in traditional water resource security risk analysis and strategy generation, this paper proposes a framework for security evaluation and intelligent strategy generation based on hybrid retrieval and multi-agent collaboration. Through data validation from 18 prefecture-level cities in Henan Province, the main conclusions are as follows:

(1): Enhanced intelligence and knowledge support for evaluation and strategy generation. The system utilizes the DPSIR framework and the obstacle degree model to achieve intelligent quantitative evaluation. After extracting key indicators, it precisely matches them with policies, regulations, and typical case studies through hybrid retrieval combining a knowledge graph and vectors. This overcomes the limitations of traditional methods regarding knowledge support and provides a solid factual basis for strategy generation by large language models (LLMs), thereby better serving the sustainability objectives of regional water resource management.
(2): Multi-agent collaboration improves the rigor and compliance of the plans. The system constructs a workflow comprising a master control agent, along with evaluation, retrieval, generation, and review agents. When implementing strict management requirements such as the “Four Determinations by Water” policy, cross-validation and compliance review among agents reduce the risk of errors, omissions, and detachment from practical realities that often arise in single-model generation. Experiments demonstrate that this mechanism performs favorably across multiple objective metrics and subjective scores.
(3): Strong strategy specificity and model adaptability. This framework can output specific governance countermeasures tailored to local differences and exhibits stability when integrated with large models of varying parameter scales. The multi-agent workflow mitigates the capability deficiencies of small models to a certain extent, helping to secure a baseline quality for the final decision output, thereby indicating potential practical engineering application value.

4.2. Prospects

Although the proposed framework is a useful exploration of intelligent strategy generation, limitations remain in practical engineering implementation and system optimization, which outline directions for future research:

(1): Practical Application and Regional Generalizability: Currently, the framework’s effectiveness has primarily been empirically analyzed using data from Henan Province. Future research should expand the application to other river basins or provinces with different water resource endowment characteristics to further verify the framework’s regional generalizability. Additionally, exploring the integration of the system into real-world digital-twin water conservancy platforms for long-term operational testing is an important approach to evaluating its utility.
(2): Data Quality and Corpus Bias: The performance of retrieval augmentation heavily relies on the quality of the underlying knowledge base. The current corpus may exhibit inherent biases due to varying regional policy emphases or uneven data granularity. Future work will consider introducing automated data cleaning and dynamic knowledge graph updating mechanisms to mitigate the potential impact of corpus bias on generated strategies.

Author Contributions

Conceptualization, L.M. and L.Y.; methodology, L.M.; software, L.M. and X.W.; validation, L.M., X.W. and X.Z.; formal analysis, L.M. and L.Y.; investigation, L.M. and X.Z.; resources, L.Y.; data curation, X.W. and X.Z.; writing—original draft preparation, L.M.; writing—review and editing, L.M. and L.Y.; visualization, L.M. and X.W.; supervision, L.Y.; project administration, L.Y.; funding acquisition, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2024YFC3210800, and the Key R&D Special Project of Henan Province (Research and Application of Key Technologies for Water resource security in the Yellow River Diversion Area of Henan Province), grant number 261111321600. The APC was funded by the National Key R&D Program of China and the Key R&D Special Project of Henan Province.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Prompts of each agent.

Agent	Prompts
Master Control Agent	Input: Original user query, diagnostic results from the evaluation agent, and the current system state. Output: Rewritten task query and the next routing and scheduling instructions in JSON format. System prompt: You are the global scheduling hub of the intelligent water resource decision-making system. Based on the user query and the evaluation diagnostic results, execute task rewriting and routing scheduling: 1. Task rewriting: Integrate “regional object-security state-key obstacles-strategic goals” into a unified problem description to reduce semantic drift during retrieval. 2. Routing decision: Based on the current system state, decide the next agent to invoke. ○ If factual basis needs to be supplemented, route to: Retrieval Agent. ○ If evidence is sufficient and a strategy needs to be drafted, route to: Generation Agent. ○ If compliance validation is required, route to: Review Agent. The output format must include the rewritten_query and next_agent fields.
Evaluation Agent	Input: Regional raw indicator data of the DPSIR framework (JSON). Output: Structured diagnostic results (JSON format), including comprehensive security closeness, security grade conclusion, and the top five key obstacle factors. System prompt: Strictly prohibit self-calculation. Your task is to receive input data and strictly invoke the externally provided water_security_calculator algorithm tool. Extract the results returned by the tool, and standardize the output of the security closeness, security grade, and key obstacle factors. Do not output any redundant reasoning text.
Retrieval Agent	Input: Rewritten task query + Key obstacle factor set. Output: Hierarchically organized augmented context (including graph reasoning summaries and text chunks). System prompt: You are a hybrid retrieval expert in the water resource domain. Using the provided key obstacle factors as seed entities, perform the following Retrieval-Augmented Generation (RAG) tasks: 1. Perform a 2-hop relation expansion within the water resource security knowledge graph to extract the “risk problem–governance measure–policy support” logic chain. 2. Concatenate the graph reasoning summary with the query, execute a “dense vector + sparse BM25” dual-pathway recall in the vector knowledge base, and perform reranking using the RRF algorithm. 3. Extract the Top-10 text chunks and organize the final evidence set for output according to the hierarchical structure of “governance logic chain -> correspondence between measures and policies -> key data fragments”.
Generation Agent	Input: Diagnostic results + Augmented context + Previous round review feedback (Optional). Output: Candidate governance strategies in Markdown format. System prompt: You are an experienced think tank expert in regional water resource government administration. Please strictly base your work on the provided quantitative diagnostic results and retrieved contextual evidence to generate a targeted regional water resource governance strategy. The strategy must contain the following four standard modules: 1. Problem summary and cause explanation: Accurately describe the current situation based on the diagnostic results. 2. Targeted governance measures: Measures must strictly respond to key obstacle factors. 3. Policy, regulation, and case basis: Must cite the provided context text as support; subjective fabrication of facts is prohibited. 4. Implementation priority and phasing. Note: If the input contains review feedback, you must perform local rewriting and corrections targeting hard constraint violations or logical flaws in the feedback.
Review Agent	Input: Candidate governance strategies, diagnostic results, and retrieved evidence. Output: JSON data including comprehensive scoring, number of violations, and specific revision suggestions. System prompt: You are a strict compliance auditor for water-related policies. Please perform multi-dimensional cross-validation on the generated candidate strategies. You need to score from the following four dimensions (each with a maximum score of 100): Problem matching degree: Whether the strategy covers all key obstacle factors. Knowledge support degree: Whether core measures are supported by retrieved evidence. Policy compliance: Whether it conforms to current red-line constraints such as total water usage control. Implementation feasibility: Whether the logic is coherent and whether there are execution barriers. Hard constraint check: If obvious conflicts are found in “Policy compliance” or “Problem matching degree” (such as violating the “Four Determinations by Water” principle or seriously omitting key obstacles), please record the number of hard constraint violations as 1 or above. The return format must include the score of each dimension, the weighted total score, the number of hard constraint violations, and specific revision guidance feedback.
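The routing rule in the Master Control Agent prompt can be sketched as a small deterministic function. This is only an illustration under stated assumptions: in the framework the decision is made by the LLM from its prompt, whereas the `SystemState` fields, the diagnosis keys (`region`, `grade`, `key_obstacles`), and the function `route` below are hypothetical names introduced here. Only the output fields `rewritten_query` and `next_agent` come from the prompt itself.

```python
import json
from dataclasses import dataclass


@dataclass
class SystemState:
    """Hypothetical shared state; the field names are illustrative assumptions."""
    has_evidence: bool = False
    has_draft: bool = False


def route(user_query: str, diagnosis: dict, state: SystemState) -> str:
    """Return a JSON string with the rewritten query and the next agent to invoke."""
    # Task rewriting: region, security state, key obstacles, and goal in one description.
    obstacles = ", ".join(diagnosis["key_obstacles"])
    rewritten = (
        f"Region: {diagnosis['region']}; security state: {diagnosis['grade']}; "
        f"key obstacles: {obstacles}; goal: {user_query}"
    )
    # Routing decision based on the current system state.
    if not state.has_evidence:
        next_agent = "Retrieval Agent"   # factual basis still has to be supplemented
    elif not state.has_draft:
        next_agent = "Generation Agent"  # evidence is sufficient, draft a strategy
    else:
        next_agent = "Review Agent"      # a draft exists, validate compliance
    return json.dumps(
        {"rewritten_query": rewritten, "next_agent": next_agent},
        ensure_ascii=False,
    )


diagnosis = {
    "region": "Zhumadian City",
    "grade": "Monitoring",
    "key_obstacles": ["Proportion of Groundwater Supply", "Proportion of Ecological Water Use"],
}
decision = json.loads(route("reduce groundwater dependence", diagnosis, SystemState()))
print(decision["next_agent"])
```

Keeping the output as JSON with fixed field names lets downstream agents parse the routing instruction without relying on free-form text.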

Table A2. Demonstration of the multi-agent collaborative workflow for intelligent strategy generation: a case study of Zhumadian City.

Intelligent Strategy Generation for Water Resource Security in Zhumadian City Based on Multi-Agent Collaboration

To verify the effectiveness of the proposed multi-agent collaborative decision-making framework in practical and complex regional water governance scenarios, this study utilizes real observation data from Zhumadian City in 2024 as input to fully demonstrate the end-to-end process from quantitative diagnosis to intelligent strategy generation.
1. Evaluation Agent: Quantitative Calculation and Multi-Dimensional Problem Diagnosis
In the calculation layer, the Evaluation Agent processed the raw indicator data of Zhumadian City based on the DPSIR framework. The calculation results indicate that the comprehensive water resource security score (Ci) of the city in 2024 was 0.617703.
Through the obstacle degree model, the Evaluation Agent accurately extracted the top five core obstacle factors (Top 5 Obstacles) restricting the region’s water resource security:

Proportion of Groundwater Supply: The obstacle degree reached 14.8122, acting as the primary risk point and indicating a severe over-reliance on groundwater in the regional water supply structure.
Per Capita GDP: The obstacle degree was 14.6775, reflecting that the support of water resources for high-quality economic development needs to be enhanced.
Proportion of Ecological Water Use: The obstacle degree was 13.8262, highlighting the current situation where ecological guarantee flows are severely squeezed in water resource allocation.
Water Consumption per 10,000 Yuan of GDP (m³): The obstacle degree was 6.3199, indicating low water use efficiency at the macro level.
Average Water Consumption per Mu of Farmland Irrigation (m³): The obstacle degree was 2.1699, exposing the issue of extensive agricultural water use in Zhumadian, a major agricultural city.
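The ranking step above can be illustrated with a minimal sketch of an obstacle degree calculation. It assumes the commonly used form of the model, in which each indicator's obstacle degree is its weight multiplied by its deviation from the ideal value (1 minus the standardized value), normalized to a percentage over all indicators. The weights and standardized values below are invented for illustration and are not the Zhumadian City data; the function names are likewise hypothetical.

```python
def obstacle_degrees(weights: dict, normalized: dict) -> dict:
    """Obstacle degree (%) per indicator: w_j * (1 - r_j) / sum_k w_k * (1 - r_k) * 100."""
    contributions = {name: weights[name] * (1.0 - normalized[name]) for name in weights}
    total = sum(contributions.values())
    if total == 0:
        # Every indicator already sits at its ideal value, so nothing obstructs.
        return {name: 0.0 for name in weights}
    return {name: 100.0 * value / total for name, value in contributions.items()}


def top_obstacles(degrees: dict, k: int = 5) -> list:
    """Return the k indicators with the largest obstacle degree, in descending order."""
    return sorted(degrees.items(), key=lambda item: item[1], reverse=True)[:k]


# Illustrative inputs only (not taken from the paper).
weights = {
    "groundwater_share": 0.30,
    "eco_water_share": 0.25,
    "water_per_gdp": 0.25,
    "irrigation_per_mu": 0.20,
}
normalized = {
    "groundwater_share": 0.2,
    "eco_water_share": 0.4,
    "water_per_gdp": 0.8,
    "irrigation_per_mu": 0.9,
}

degrees = obstacle_degrees(weights, normalized)
for name, degree in top_obstacles(degrees, k=3):
    print(f"{name}: {degree:.2f}")
```

Because the degrees are normalized to sum to 100, a value such as 14.8122 in the list above reads as that indicator's share of the total obstruction in that year.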

2. Retrieval Agent: Hybrid Retrieval and Multi-Source Evidence Construction
After receiving the obstacle factors, the Master Control Agent triggered the Retrieval Agent to conduct a dual-path hybrid recall combining vector and graph retrieval within the pre-constructed policy knowledge bases. The core recalled evidence set is as follows:

Evidence addressing the “high proportion of groundwater supply”:
○ Establish a dual-control indicator system for groundwater withdrawal volume and water levels using county-level administrative regions as units, and continuously advance the comprehensive governance of groundwater overexploitation.
○ Prohibit the extraction of deep groundwater for agricultural irrigation, and take measures to gradually achieve a complete ban on extraction where it has already occurred.
○ Improve the groundwater monitoring network, and strictly prohibit newly added groundwater extraction for industry, agriculture, and service sectors within overexploitation areas.
Evidence addressing the “low proportion of ecological water use”:
○ Ecological water use should prioritize unconventional water, and construction projects that possess the conditions to use unconventional water but fail to fully utilize it shall not be granted new water abstraction permits.
○ Urban ecological landscapes should prioritize the use of reclaimed water to enhance the utilization level of reclaimed water.
Evidence addressing “economic water efficiency and agricultural water consumption”:
○ Strictly implement total volume and intensity control according to the requirements of the “Four Determinations by Water” (determining city, land, population, and production by water).
○ Promote high-efficiency water-saving irrigation on a large scale across different regions.
○ Promote technologies such as sprinkler irrigation, micro-irrigation, drip irrigation, low-pressure pipeline water conveyance, rainwater harvesting and supplementary irrigation, water-fertilizer integration, and mulching to preserve soil moisture.
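The fusion of the two recall paths can be sketched with reciprocal rank fusion (RRF), the reranking method named in the Retrieval Agent prompt in Table A1. The sketch below is a generic RRF implementation, not the authors' code: the smoothing constant `k = 60` is a conventional default assumed here, and the chunk identifiers are placeholders.

```python
from collections import defaultdict


def rrf_fuse(rankings: list, k: int = 60, top_n: int = 10) -> list:
    """Fuse several ranked lists of chunk IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) to a chunk's score (rank starts at 1),
    so chunks ranked highly by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))[:top_n]


# Placeholder results from the dense-vector path and the sparse BM25 path.
dense_hits = ["c3", "c1", "c7"]
sparse_hits = ["c1", "c9", "c3"]

fused = rrf_fuse([dense_hits, sparse_hits])
print(fused)
```

RRF uses only rank positions, so the dense similarity scores and BM25 scores never have to be put on a common scale before fusion.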

3. Generation Agent and Review Agent: Collaborative Reasoning and Iterative Feedback
The Generation Agent drafted an initial governance strategy (Draft V1) based on the aforementioned diagnosis and evidence. Subsequently, the Review Agent conducted a strict audit based on four evaluation dimensions (problem matching degree, knowledge support degree, policy compliance, and implementation feasibility), providing two rounds of critical feedback:

Review Feedback 1 (Policy Compliance Violation-Rejected):
○ Draft Content: To increase the proportion of ecological water use, it is recommended to increase the replenishment of urban rivers and landscape lakes with high-quality surface water and externally diverted water.
○ Review Comments: This severely violates policy red lines. According to the national policy guidelines, blindly expanding landscape water areas is strictly prohibited, and ecological water use must prioritize unconventional water. The unauthorized use of high-quality surface water sources for landscape replenishment is not allowed.
○ Correction Directive: The Generation Agent is required to modify the core strategy of ecological replenishment to “increase the resource utilization of unconventional water (such as reclaimed water and rainwater)”.
Review Feedback 2 (Lack of Implementation Feasibility-Revision Required):
○ Draft Content: In response to groundwater overexploitation, it is recommended that Zhumadian City immediately shut down all groundwater extraction wells and switch entirely to agricultural water sources.
○ Review Comments: Lacks practical feasibility and policy support. Policy documents explicitly state that for deep groundwater that has already been extracted, measures should be taken to “gradually achieve a complete ban on extraction”.
○ Correction Directive: It is necessary to integrate Zhumadian’s special location as a water-receiving area of the Middle Route of the South-to-North Water Diversion Project, introduce the logic of water source replacement, and propose a phased reduction plan.
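The generate-review cycle illustrated by the two feedback rounds above can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the stopping rule (no hard-constraint violations, or a round limit `max_rounds`), the report keys `hard_violations` and `feedback`, and the toy agents are all hypothetical stand-ins for the LLM-backed Generation and Review Agents.

```python
from typing import Callable, Optional


def refine_strategy(
    generate: Callable[[Optional[str]], str],
    review: Callable[[str], dict],
    max_rounds: int = 3,
) -> tuple[str, list[dict]]:
    """Alternate generation and review until no hard-constraint violation remains."""
    feedback: Optional[str] = None
    history: list[dict] = []
    draft = ""
    for _ in range(max_rounds):
        draft = generate(feedback)   # the first round has no feedback
        report = review(draft)
        history.append(report)
        if report["hard_violations"] == 0:
            break                    # accepted by the reviewer
        feedback = report["feedback"]
    return draft, history


# Toy stand-ins for the LLM-backed agents (illustrative only).
def toy_generate(feedback: Optional[str]) -> str:
    if feedback is None:
        return "Replenish landscape lakes with high-quality surface water."
    return "Replenish landscape lakes with reclaimed water and rainwater."


def toy_review(draft: str) -> dict:
    violated = "high-quality surface water" in draft
    return {
        "hard_violations": 1 if violated else 0,
        "feedback": "Prioritize unconventional water for ecological use." if violated else "",
    }


final_draft, review_history = refine_strategy(toy_generate, toy_review)
print(final_draft)
print([report["hard_violations"] for report in review_history])
```

Bounding the number of rounds keeps the loop from cycling indefinitely when a draft cannot satisfy the reviewer; how the actual system decides convergence is the responsibility of the Master Control Agent.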

4. Final Revised Output: Intelligent Strategy Generation Report
After multiple rounds of review and modification, the system ultimately generated a highly customized, compliant, and actionable comprehensive water resource governance strategy:
Collaborative Governance and Intelligent Enhancement Strategy for Water Resource Security in Zhumadian City
I. Comprehensive Diagnosis Summary. Zhumadian City’s current water security level is at the “Monitoring” status. The region is primarily constrained by the dual pressures of severe groundwater overexploitation (excessive water supply proportion, obstacle degree 14.8122) and the squeezing of ecological environment water use (obstacle degree 13.8262). Furthermore, low industrial and agricultural water efficiency (high obstacle degrees for water consumption per 10,000 yuan of GDP and farmland irrigation water consumption) further exacerbates the regional supply–demand contradiction. It is imperative to implement profound water-saving and water-control actions, and comprehensively enforce the “Four Determinations by Water” principle.
II. Systematic Governance Measures (Multi-Dimensional Execution Paths)
1. Deeply Optimize the Water Supply Structure and Resolutely Advance Groundwater Overexploitation Governance. Addressing the primary obstacle of an excessively high proportion of groundwater supply, Zhumadian City should fully leverage the benefits of the completed South-to-North Water Diversion Project, integrating water-saving efforts with the consumption of externally diverted water. Establish a dual-control indicator system for groundwater levels and volumes with the county-level administrative region as the unit. Within the coverage area of public water supply networks, close unauthorized private wells within a specified time limit, and prohibit new groundwater abstraction for industry, agriculture, and services in overexploitation areas. In the agricultural sector, strictly restrict and gradually achieve a complete ban on the extraction of deep groundwater for agricultural irrigation.
2. Mandate the Cascading Utilization of Unconventional Water to Address Ecological Water Shortages. Addressing the issue of an insufficient ecological water use proportion, it is strictly forbidden to blindly expand landscape water areas or consume high-quality surface water to supplement ecological water volumes. The unified allocation of unconventional water into water resources must be forcefully promoted. Upgrade existing sewage treatment facilities, and ensure that sectors such as urban ecological landscapes, urban greening, and environmental sanitation prioritize the use of reclaimed water. For construction projects that have the conditions to use unconventional water but fail to utilize it fully, new water abstraction permits must be resolutely denied, thereby unilaterally increasing the water replenishment proportion of ecosystems.
3. Enforce Rigid Water Resource Constraints and Bidirectionally Improve Industrial and Agricultural Water Efficiency. Facing the pain points of high water consumption per 10,000 yuan of GDP and farmland irrigation water consumption, strict whole-process water use management should be implemented, high-water-consuming project construction should be strictly controlled, and outdated production capacities should be legally and compliantly reduced and eliminated. Comprehensive planned water use management should be implemented for industrial enterprises with an annual water consumption of 10,000 cubic meters or more. Simultaneously, accelerate the modernization and transformation of large and medium-sized irrigation districts, and promote high-efficiency water-saving irrigation technologies such as micro-irrigation, drip irrigation, and water–fertilizer integration on a large scale across different regions to reduce agricultural water consumption intensity per mu from the source.

References

Zhao, M.; Wei, J.; Han, Y.; Shi, J.; Wang, S. Water resource security evaluation and barrier analysis in Henan Province utilizing the DPSIR framework. Front. Environ. Sci. 2024, 12, 1354175.
Hou, W.; Zhao, J.; Zhang, Y. Water resources security assessment in the upper reaches of the Yellow River based on the DPSIR model and multisource data. In Proceedings of the Seventh International Conference on Geoscience and Remote Sensing Mapping (GRSM 2025); SPIE: Bellingham, WA, USA, 2026; Volume 13996, pp. 330–336.
Lu, M.; Wang, X.; Liao, W.; Wang, C.; Lei, X.; Wang, H. An assessment of temporal and spatial dynamics of regional water resources security in the DPSIR framework in Jiangxi Province, China. Int. J. Environ. Res. Public Health 2022, 19, 3650.
Zhou, T.; Lin, T.; Cheng, R.; Wang, G.; Jiang, B. An integrated approach for spatio-temporal assessment and attribution of water resources carrying capacity: Incorporating AHP, TOPSIS, and Lorenz asymmetry coefficient methods. J. Hydrol. 2025, 650, 132536.
Dehkordi, M.F.; Hatefi, S.M.; Tamošaitienė, J. An Integrated Fuzzy Shannon Entropy and Fuzzy ARAS Model Using Risk Indicators for Water Resources Management Under Uncertainty. Sustainability 2025, 17, 5108.
Han, L.; Wang, Y.; Li, S.; Li, W.; Chen, X. Evaluation of water resource carrying capacity and analysis of driving factors in the Dadu River Basin based on the entropy weight method and CRITIC comprehensive evaluation method. Water 2025, 17, 2360.
Yang, L.; Hao, Y.; Wang, B.; Li, X.; Gao, W. Evaluation of the water resources carrying capacity in Shaanxi Province based on DPSIRM–TOPSIS analysis. Ecol. Indic. 2025, 173, 113369.
Xu, W.; Cao, Q.; Gao, G.; Wang, H.; Yin, Y.; Ren, J.; Li, J. Research on Water Resource Security Evaluation and Regulation Strategies for Multi-Source Water Supply Cities. Sustainability 2026, 18, 3492.
Ding, B.; Zhang, J.; Zheng, P.; Li, Z.; Wang, Y.; Jia, G.; Yu, X. Water security assessment for effective water resource management based on multi-temporal blue and green water footprints. J. Hydrol. 2024, 632, 130761.
Ye, A.; Li, X.; Cai, J.; Deng, Y. Evaluation of water ecological security and diagnosis of obstacles in the Yangtze River Delta, China. Sci. Rep. 2025, 15, 30981.
Liu, X.; Lu, H.; Li, H.; Huai, X.; Chen, X. Knowledge-driven intelligent generation method of emergency plans for water conservancy projects: A case study of the Middle Route of the South-to-North Water Diversion Project. J. Hydraul. Eng. 2023, 54, 666–676. (In Chinese)
Yang, Y.; Pan, S.; Liu, X.; Ma, W.; Feng, L. Risk response decision recommendation for water conservancy projects driven by the synergy of multimodal knowledge graphs and large models. J. Hydraul. Eng. 2025, 56, 519–530. (In Chinese)
Pan, S.; Luo, L.; Wang, Y.; Chen, C.; Wang, J.; Wu, X. Unifying large language models and knowledge graphs: A roadmap. IEEE Trans. Knowl. Data Eng. 2024, 36, 3580–3599.
Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 2025, 43, 1–55.
Edge, D.; Trinh, H.; Newman Cheng, J.B.; Chao, A.; Mody, A.; Truitt, S.; Metropolitansky, D.; Ness, R.O.; Larson, J. From local to global: A graph RAG approach to query-focused summarization. arXiv 2024, arXiv:2404.16130.
Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.T.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.
Sarmah, B.; Mehta, D.; Hall, B.; Rao, R.; Patel, S.; Pasquali, S. HybridRAG: Integrating knowledge graphs and vector retrieval augmented generation for efficient information extraction. In Proceedings of the 5th ACM International Conference on AI in Finance; ACM: New York, NY, USA, 2024; pp. 608–616.
Du, Y.; Li, S.; Torralba, A.; Tenenbaum, J.B.; Mordatch, I. Improving factuality and reasoning in language models through multiagent debate. In Proceedings of the Forty-first International Conference on Machine Learning; JMLR: Norfolk, MA, USA, 2024.
Guo, T.; Chen, X.; Wang, Y.; Chang, R.; Pei, S.; Chawla, N.V.; Wiest, O.; Zhang, X. Large language model based multi-agents: A survey of progress and challenges. arXiv 2024, arXiv:2402.01680.
Ma, D.; Duan, S.; Zhang, X.; Xu, B.; Xu, Y. Spatiotemporal dynamic assessment of water resources carrying capacity and identification of obstacle factors in Yunnan Province based on grey water footprint theory. Water 2024, 16, 3651.
Zhang, Y.; Lu, X. A comprehensive evaluation of food security in China and its obstacle factors. Int. J. Environ. Res. Public Health 2022, 20, 451.
Ayachi, R.; Guillon, D.; Aldanondo, M.; Vareilles, E.; Coudert, T.; Beauregard, Y.; Geneste, L. Risk knowledge modeling for offer definition in customer-supplier relationships in Engineer-To-Order situations. Comput. Ind. 2022, 138, 103608.
Al-Moslmi, T.; Ocaña, M.G.; Opdahl, A.L.; Veres, C. Named entity extraction for knowledge graphs: A literature overview. IEEE Access 2020, 8, 32862–32881.
Smirnova, A.; Cudré-Mauroux, P. Relation extraction using distant supervision: A survey. ACM Comput. Surv. 2018, 51, 1–35.
Zhao, X.; Jia, Y.; Li, A.; Jiang, R.; Song, Y. Multi-source knowledge fusion: A survey. World Wide Web 2020, 23, 2567–2592.
Zou, L.; Özsu, M.T. Graph-based RDF data management. Data Sci. Eng. 2017, 2, 56–70.
Wylot, M.; Hauswirth, M.; Cudré-Mauroux, P.; Sakr, S. RDF data storage and query processing schemes: A survey. ACM Comput. Surv. 2018, 51, 1–36.
Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Wang, H.; Wang, H.; et al. Retrieval-augmented generation for large language models: A survey. arXiv 2023, arXiv:2312.10997.
Johnson, J.; Douze, M.; Jégou, H. Billion-scale similarity search with GPUs. IEEE Trans. Big Data 2019, 7, 535–547.
Liu, Y.; Iter, D.; Xu, Y.; Wang, S.; Xu, R.; Zhu, C. G-Eval: NLG evaluation using GPT-4 with better human alignment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 2511–2522.
Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Chen, H.; Yi, X.; Wang, C.; Wang, Y.; et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–45.
Saaty, T.L. Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 2008, 1, 83–98.
Liu, J.; Han, T.; Zhao, J.; Mu, D.; Liu, H.; Tang, B. An Intelligent Risk Assessment Methodology for the Full Lifecycle Security of Data. Symmetry 2025, 17, 820.
Zheng, L.; Chiang, W.L.; Sheng, Y.; Zhuang, S.; Wu, Z.; Zhuang, Y.; Lin, Z.; Li, Z.; Li, D.; Xing, E.; et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Adv. Neural Inf. Process. Syst. 2023, 36, 46595–46623.
Li, H.; Dong, Q.; Chen, J.; Su, H.; Zhou, Y.; Ai, Q.; Ye, Z.; Liu, Y. LLMs-as-judges: A comprehensive survey on LLM-based evaluation methods. arXiv 2024, arXiv:2412.05579.
Qin, B.; Lu, P.; Xu, Y.; Deng, F.; Wang, Y.; Zeng, W.; Li, X.; Li, C. Application text generation framework integrating large language models and vector knowledge bases. J. Shenzhen Univ. Sci. Eng. 2025, 42, 597–605. (In Chinese)

Figure 1. Overall architecture of the water resource security decision support system.

Figure 2. The indicator system of water resource security evaluation.

Figure 3. Schema of water resource security knowledge graph.

Figure 4. Visualization of the water resource security knowledge graph.

Figure 5. Construction of the water resource security knowledge vector base.

Figure 6. Schematic diagram of the hybrid retrieval mechanism.

Figure 7. Multi-agent collaboration workflow.

Figure 8. Research area overview map.

Figure 9. Heatmap of water resource security evaluation scores for 18 prefecture-level cities in Henan Province from 2018 to 2024.

Table 1. Entity types in the knowledge graph.

Entity Type	Definition	Typical Examples
Regional Objects	Represents the spatial carriers in the knowledge graph, used to describe research objects such as administrative regions, basins, or cities.	Yellow River Basin, a Certain City, a Certain County, Irrigation Area
Water Security Status	Represents the water resource security level and its grade status of the research object during a specific period.	Safe, Relatively Safe, Critically Safe, Relatively Unsafe, Unsafe
Evaluation Indicators	Represents specific measurement indicators used to reflect the status of water resource security.	Water Resource Per Capita, Total Water Consumption, Sewage Treatment Rate, Water Resource Utilization Rate
Obstacle Factors	Represents key factors that exert significant constraints on water resource security.	Water Scarcity, Over-Extraction of Groundwater, Water Environment Pollution, Low Water Use Efficiency
Risk Issues	Represents systemic water resource security issues.	Imbalance Between Supply and Demand, Water Quality Deterioration, Ecological Degradation, Insufficient Governance Capacity
Governance Measures	Represents intervention paths or management methods adopted to address water resource security issues.	Water-Saving Renovation, Sewage Treatment, Ecological Water Replenishment, Total Water Consumption Control
Policies and Regulations	Represents laws, plans, systems, and regulatory documents that support the implementation of water resource management and governance.	Most Stringent Water Resource Management System, Integrated Basin Planning, Water-Saving Action Plan
Engineering Measures	Represents engineering facilities and construction projects related to water resource regulation, supply, or governance.	Reservoir Construction, Water Diversion Project, Sewage Treatment Plant, Reclaimed Water Utilization Project
Typical Cases	Represents exemplary regional governance practices and empirical samples.	Water-Saving Governance Case of a Certain Basin, Reclaimed Water Utilization Case of a Certain City

Table 2. Definitions of agent roles.

Agent Type	Symbol	Core Function Definition
Master Control Agent	$A_{M}$	Acts as the system hub, responsible for maintaining the global shared state, dynamically scheduling workflows, and determining the convergence of system iterations.
Retrieval Agent	$A_{R}$	Invokes the hybrid retrieval module to recall the relevant policy basis and supporting cases from the knowledge graph and the vector knowledge base.
Evaluation Agent	$A_{E}$	Calculates the water resource security level and diagnoses the core causes of risk.
Generation Agent	$A_{G}$	Reasons over the diagnostic conclusions and the retrieval-enhanced context to generate targeted candidate strategies for regional water resource governance.
Review Agent	$A_{V}$	Checks the generated candidate strategies against constraints on rationality, knowledge support, and policy compliance, and drives iterative system correction through feedback.
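
The division of roles in Table 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the `SharedState` class, the `run_workflow` function, the `max_rounds` limit, and the four stub agents are hypothetical names introduced here only to show how a master control loop might schedule evaluation, retrieval, generation, and review until the review agent accepts a strategy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SharedState:
    """Hypothetical global state kept by the master control agent (A_M)."""
    region: str
    diagnosis: str = ""
    context: List[str] = field(default_factory=list)
    strategy: str = ""
    feedback: str = ""
    iterations: int = 0


def run_workflow(state, evaluate, retrieve, generate, review, max_rounds=3):
    """Schedule the agents once, then iterate generation and review."""
    state.diagnosis = evaluate(state.region)        # A_E: diagnose the region
    state.context = retrieve(state.diagnosis)       # A_R: recall supporting knowledge
    for _ in range(max_rounds):
        state.iterations += 1
        # A_G: draft a candidate strategy, taking earlier feedback into account
        state.strategy = generate(state.diagnosis, state.context, state.feedback)
        # A_V: accept the draft or return feedback for the next round
        approved, state.feedback = review(state.strategy, state.context)
        if approved:
            break                                   # A_M: treat approval as convergence
    return state


# Stub agents (assumptions for illustration; real agents would call an LLM
# and the hybrid retrieval module).
def evaluate(region):
    return "Water Scarcity"


def retrieve(diagnosis):
    return ["Water-Saving Action Plan"]


def generate(diagnosis, context, feedback):
    draft = f"Promote water-saving renovation to address {diagnosis}"
    if feedback:  # revise only after the reviewer has asked for a policy basis
        draft += f" (basis: {context[0]})"
    return draft


def review(strategy, context):
    if any(doc in strategy for doc in context):
        return True, ""
    return False, "cite a policy basis"


final = run_workflow(SharedState(region="a hypothetical city"),
                     evaluate, retrieve, generate, review)
print(final.iterations, final.strategy)
```

In this toy run the first draft is rejected for lacking a policy basis and the second is accepted, which mirrors the feedback-driven correction attributed to $A_{V}$ in the table.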

Table 3. Experimental software and hardware configuration parameters.

Category	Configuration Details
Processor (CPU)	Intel Xeon Platinum 8352V (128 GB memory)
GPU	NVIDIA A800 80 GB
Development Environment	Python 3.11
Deep Learning Framework	PyTorch 2.6.0
Knowledge Graph Database	Neo4j 5.12
Vector Database	Milvus 2.3
Text Embedding Model	m3e-base
Base Generative Model	DeepSeek-R1
Multi-Agent Orchestration Framework	LangChain

Table 4. Model parameter table.

Model Name	Temperature	Top-p	Top-k	Experimental Application Role
DeepSeek-R1	0.7	1.0	10	Multi-Agent System Foundation Model
Qwen3-8B	1.0	1.0	10	Comparative Experiment Model
Qwen2.5-72B	1.0	1.0	10	Comparative Experiment Model
GLM-4	0.8	0.8	10	Comparative Experiment Model
ChatGPT-5.4	1.0	1.0	10	Judge Model

Table 5. Evaluation dimensions and content.

Evaluation Dimension	Evaluation Content	Score Range	Weight
Content Completeness	Whether the main water resource security problems in the region are accurately and comprehensively identified.	1–5 Points	0.30
Reasonableness of Cause Explanation	Whether the relationship between obstacle factors and risk issues can be reasonably explained.	1–5 Points	0.15
Measure Pertinence	Whether governance measures match regional problems and obstacle factors.	1–5 Points	0.25
Sufficiency of Policy Basis	Whether there is clear policy, planning, or normative support.	1–5 Points	0.20
Implementability	Whether the strategy conforms to the reality of regional governance and is operable.	1–5 Points	0.10
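
As a worked example of how the weights in Table 5 might be applied, the sketch below combines five dimension scores into one comprehensive score by a weighted sum. The weighted-sum aggregation, the `comprehensive_score` function name, and the example scores are assumptions made for illustration; only the dimension names, the 1–5 range, and the weights come from the table.

```python
# Weights copied from Table 5; they sum to 1.00.
WEIGHTS = {
    "Content Completeness": 0.30,
    "Reasonableness of Cause Explanation": 0.15,
    "Measure Pertinence": 0.25,
    "Sufficiency of Policy Basis": 0.20,
    "Implementability": 0.10,
}


def comprehensive_score(scores):
    """Weighted sum of per-dimension scores (assumed aggregation rule)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five dimensions")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must lie in the 1-5 range")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for a single generated strategy.
example = {
    "Content Completeness": 5,
    "Reasonableness of Cause Explanation": 4,
    "Measure Pertinence": 4,
    "Sufficiency of Policy Basis": 3,
    "Implementability": 4,
}
total = comprehensive_score(example)
print(f"{total:.2f}")  # 0.30*5 + 0.15*4 + 0.25*4 + 0.20*3 + 0.10*4 = 4.10
```

Because the weights sum to 1, the result stays on the same 1–5 scale as the individual dimensions.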

Table 6. Strategy generation results of different methods.

Method	LLM Comprehensive Score	Expert Score	BLEU-4	ROUGE-L
LLM-Direct	3.425 ± 0.312 [3.36, 3.49]	3.290 ± 0.320 [3.23, 3.35]	0.142 ± 0.041 [0.134, 0.150]	0.215 ± 0.045 [0.206, 0.224]
VectorRAG-LLM	3.970 ± 0.285 [3.91, 4.03]	3.875 ± 0.290 [3.82, 3.93]	0.258 ± 0.038 [0.251, 0.265]	0.324 ± 0.042 [0.316, 0.332]
HybridRAG-LLM	4.260 ± 0.260 [4.21, 4.31]	4.150 ± 0.275 [4.10, 4.20]	0.335 ± 0.035 [0.328, 0.342]	0.402 ± 0.039 [0.394, 0.410]
Ours	4.655 ± 0.215 [4.61, 4.70]	4.570 ± 0.230 [4.52, 4.62]	0.456 ± 0.032 [0.450, 0.462]	0.528 ± 0.035 [0.521, 0.535]

Notes: Data are presented as Mean ± Standard Deviation (SD), with 95% Confidence Intervals (CI) shown in square brackets [].
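
For readers who want to see how a mean, an SD, and a 95% CI relate, the following sketch computes a normal-approximation interval. The note above does not state how the intervals were obtained or how many samples were scored, so the formula (mean ± 1.96 · SD / √n), the `normal_ci` function name, and the sample size n = 100 are all assumptions made for illustration.

```python
import math


def normal_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a sample mean."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# LLM-Direct, first score column of Table 6: mean 3.425, SD 0.312.
# n = 100 is a hypothetical sample size, not a value reported in this section.
low, high = normal_ci(3.425, 0.312, n=100)
print(f"[{low:.2f}, {high:.2f}]")
```

Under these assumptions the interval rounds to [3.36, 3.49], which agrees with the value printed in the table, although that agreement alone does not establish the procedure the authors used.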

Table 7. Ablation experiment results.

Method	LLM Comprehensive Score	Expert Score	BLEU-4	ROUGE-L
w/o Retrieval Agent	3.650 ± 0.280 [3.59, 3.71]	3.550 ± 0.290 [3.49, 3.61]	0.165 ± 0.040 [0.157, 0.173]	0.232 ± 0.042 [0.224, 0.240]
w/o Review Agent	4.230 ± 0.250 [4.18, 4.28]	4.125 ± 0.260 [4.07, 4.18]	0.352 ± 0.036 [0.345, 0.359]	0.428 ± 0.038 [0.421, 0.435]
Ours	4.655 ± 0.215 [4.61, 4.70]	4.570 ± 0.230 [4.52, 4.62]	0.456 ± 0.032 [0.450, 0.462]	0.528 ± 0.035 [0.521, 0.535]

Notes: Data are presented as Mean ± Standard Deviation (SD), with 95% Confidence Intervals (CI) shown in square brackets [].

Table 8. Quantitative assessment of hallucination reduction and compliance across different methods.

Method	FHR	PCR	CGA
LLM-Direct	38.5%	22.0%	14.5%
VectorRAG-LLM	16.2%	14.5%	48.2%
HybridRAG-LLM	7.8%	11.2%	76.5%
Ours	1.5%	2.0%	95.8%

Table 9. Experimental results of different foundation models.

Method	LLM Comprehensive Score	Expert Score	BLEU-4	ROUGE-L
Qwen3-8B	4.160 ± 0.270 [4.11, 4.21]	4.075 ± 0.280 [4.02, 4.13]	0.345 ± 0.038 [0.338, 0.352]	0.412 ± 0.040 [0.404, 0.420]
GLM-4	4.530 ± 0.230 [4.48, 4.58]	4.475 ± 0.240 [4.43, 4.52]	0.432 ± 0.034 [0.425, 0.439]	0.495 ± 0.037 [0.488, 0.502]
Qwen2.5-72B	4.590 ± 0.220 [4.55, 4.63]	4.510 ± 0.235 [4.46, 4.56]	0.445 ± 0.033 [0.439, 0.451]	0.510 ± 0.036 [0.503, 0.517]
DeepSeek-R1	4.655 ± 0.215 [4.61, 4.70]	4.570 ± 0.230 [4.52, 4.62]	0.456 ± 0.032 [0.450, 0.462]	0.528 ± 0.035 [0.521, 0.535]
