Multi-Engine Collaborative Large Language Models Enhance the Intelligence of Eco-Environmental Monitoring and Governance in China

Li, Wenpan; Feng, Yu; Yan, Luyu; Ji, Kebin; Yang, Wanglong; Chang, Ming; Zhang, Qi; Chen, Chuanzhong

doi:10.3390/app16136557

Open Access Article

by Wenpan Li, Yu Feng, Luyu Yan *, Kebin Ji, Wanglong Yang, Ming Chang, Qi Zhang and Chuanzhong Chen

China National Environmental Monitoring Centre, Beijing 100012, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(13), 6557; https://doi.org/10.3390/app16136557

Submission received: 6 March 2026 / Revised: 23 June 2026 / Accepted: 26 June 2026 / Published: 1 July 2026

Abstract

The expansion of China’s modernized eco-environmental monitoring networks has generated vast amounts of data. Consequently, traditional, expertise-reliant analysis is increasingly ill-suited for agile regulatory decision-making. Although large language models (LLMs) present a promising alternative, their practical deployment remains limited by domain-specific knowledge gaps, hallucinations, and an inherent difficulty in managing multi-faceted ecological tasks. This study introduces EnvSentry, a novel multi-engine collaborative LLM framework designed for intelligent eco-environmental monitoring and governance. EnvSentry coordinates reasoning, instruction, and multimodal engines, supported by a dynamic, vector-indexed knowledge base and retrieval-augmented generation (RAG) to ensure factual veracity. By transitioning operational workflows from fragmented, high-latency batch processing to integrated, real-time intelligent agent chains, the system achieves a closed-loop capability of intent recognition, data retrieval, and quality control. The model was evaluated in two distinct environmental contexts: water quality anomaly detection and air quality forecasting. Results show that EnvSentry yields higher analytical precision and attribution rates than baseline methods, while compressing decision-making latency from hours to seconds. Relative to baseline models, EnvSentry achieves a 25-percentage-point improvement in water quality attribution accuracy (from 50% to 75%), a 90% reduction in decision-making latency for anomaly detection, and a 10% absolute gain in data anomaly detection accuracy. In air quality forecasting, it reduces expert judgment time from 60 to 20 min and attains >85% agreement with expert forecasts when used by non-specialist personnel. These improvements suggest a practical shift in eco-environmental monitoring, moving from fragmented, reactive measures toward an integrated and proactive system. Consequently, this approach offers a viable path toward data-driven autonomous ecological management.

Keywords:

eco-environmental monitoring; AI agents; artificial intelligence; data governance; large language models (LLMs); predictive reasoning; EnvSentry

1. Introduction

China is currently constructing a nationwide eco-environment monitoring network, which is simultaneously undergoing technological modernization and intelligent upgrades. As of 2024, this network already covered 33,000 monitoring sites, enabling the observation of environmental conditions and ecosystem changes at urban, regional, and national scales [1]. However, this expansion has created challenges related to rapidly increasing data volumes and limited data application efficiency. Meanwhile, traditional analysis models that rely heavily on manual expertise face significant bottlenecks when dealing with large-scale data processing [2,3]. Moreover, the multi-source and heterogeneous nature of eco-environmental monitoring data leads to data fragmentation and makes it difficult to form standardized, hierarchically classified data warehouses. This further constrains the agility of regulatory decision-making and cross-domain logical reasoning in eco-environmental monitoring. These practical contradictions highlight the urgency of intelligent transformation in environmental monitoring and the need for a more integrated system covering perception, analysis, and decision-making.

The advent of multimodal LLMs (e.g., GPT-4o, Llama) [4,5,6,7] has profoundly transformed the operational paradigms within vertical industries. The integration of generative LLMs in eco-environmental monitoring and science is advancing rapidly, demonstrating emerging potential in domains such as climate change analysis, environmental quality assessment, and the design of novel treatment technologies [8,9,10]. These models can process large and complex datasets, including scientific literature and policy documents, and support knowledge extraction and synthesis. The application of AI models in environmental domains spans environmental quality monitoring, modeling and prediction, treatment technology design, environmental policy assessment, and risk evaluation [11,12,13,14,15,16]. The recent proliferation of domain-specific LLMs has underscored the efficacy of fine-tuning strategies across diverse environmental disciplines. For instance, ClimateGPT is continually pre-trained on a massive 4.2-billion-token climate corpus and has shown competitive performance on specialized benchmarks like ClimateBench [17]. Similarly, WaterGPT augments a 7B-parameter backbone with 1.1 billion hydrology-specific tokens and multimodal tools, achieving 83% accuracy on the EvalWater benchmark [18]. OceanGPT, leveraging a corpus of 67,000 oceanography documents, provides expert-level assistance in marine science tasks [19]. These specialized models effectively address the challenges of complex terminology and heterogeneous data within their respective domains. Beyond single-discipline applications, integrated frameworks such as EnvGPT mark a significant step towards interdisciplinary environmental intelligence.
By fine-tuning an 8B-parameter model on a balanced, cross-domain instruction dataset (ChatEnv) covering climate, ecosystems, water, soil, and energy, EnvGPT achieves superior performance, even rivalling larger general-purpose models on domain-specific benchmarks like EnvBench and EnviroExam [12]. This underscores the critical importance of high-quality, domain-specific instruction tuning for enhancing the factual accuracy and practical utility of LLMs in scientific applications.

Concurrently, to overcome the inherent limitations of LLMs, such as hallucination and outdated knowledge [20,21], retrieval-augmented generation (RAG) frameworks have been widely adopted [22]. Systems such as ChatClimate ground conversational AI in authoritative, up-to-date scientific sources, including the IPCC AR6 reports, which improves the veracity, reliability, and timeliness of climate information provided to decision-makers and the public [23]. By leveraging big data analytics, cloud computing, and these advanced AI paradigms, environmental monitoring is evolving from basic data processing towards in-depth mining and comprehensive analysis, which supports more accurate predictive modeling and better-informed environmental governance.

However, current eco-environmental monitoring workflows still face several unresolved operational limitations. Multi-source monitoring data are often distributed across heterogeneous platforms, while standards, historical reports, and expert experience remain fragmented across different organizational levels. As a result, manual interpretation is still required for many cross-domain tasks, including anomaly tracing, causal attribution, and report generation. General-purpose LLMs can assist with text understanding, but they lack direct access to real-time monitoring data, authoritative standards, and domain-specific rule constraints. These limitations motivate the development of a domain-oriented framework that integrates data governance, knowledge retrieval, and multi-agent reasoning.

During the formative stages of large language model advancement, our team made efforts to integrate this technology into eco-environmental monitoring practices. Initially, in collaboration with several top domestic research institutions and leading enterprises, we surveyed leading domestic models and established a core pathway of AI mid-platform integration and domain knowledge embedding, completing the first round of private deployment of base models. In 2024, efforts focused on building a data resource center whose distributed architecture brings real-time query latency down to the millisecond level. At the same time, we established a three-level knowledge base system spanning the center, department, and individual levels, which transformed over 100,000 technical standards and monitoring reports into structured knowledge units.

With breakthroughs in deep reasoning capabilities of models such as DeepSeek-R1, we introduce EnvSentry, a novel multi-engine collaborative large language model designed for environmental monitoring. By coordinating a reasoning engine (DeepSeek-R1), an instruction engine (DeepSeek-V3), and a multimodal engine (Alibaba Qwen-VL), EnvSentry successfully implements an operational closed loop for intent recognition, data retrieval, and quality control. The model has demonstrated transformative operational utility, evidenced by enhanced accuracy in natural language queries and significantly improved efficiency in identifying anomalies within key river basins. Furthermore, by preliminarily deploying a matrix of 50 intelligent agents across four domains including rapid inquiry, intelligent analysis, deep insight, and automated review, the system supports the transition from manual analysis to near real-time response. In practical monitoring workflows, these agents are designed to undertake operational roles such as natural language query interpretation, multi-source data retrieval, abnormal section screening, preliminary pollution attribution, forecast explanation, quality-control checking, and structured report generation. By encapsulating domain rules and expert experience into task-specific agents, the system also reduces the knowledge barrier for non-specialist users and improves the consistency of routine monitoring analysis.

However, the evolution of AI technology still faces bottlenecks such as insufficient confidence in deep reasoning, high dependency on datasets, constraints on computational resources, and security concerns [24,25]. Recent research further emphasizes the necessity of building authoritative knowledge bases and RAG frameworks to mitigate model hallucination and enhance the verifiability and reliability of outputs in specialized domains such as environmental monitoring [26,27]. This paper investigates the technological integration mechanisms and the reconstruction of application paradigms for large models in eco-environmental monitoring. Using the EnvSentry system as an empirical case study, we provide an in-depth analysis of key breakthroughs in multi-source data governance, dynamic knowledge embedding, and multi-engine collaborative scheduling, while further exploring optimization pathways and future trends for intelligent transformation.

2. Architecture and Methodology

2.1. Schematic Diagram of EnvSentry

The data infrastructure of EnvSentry integrates three distinct input types: real-time streaming data, historical archives, and textual knowledge. Although these multi-source inputs inherently introduce processing obstacles such as data heterogeneity, quality degradation, and semantic gaps, the framework is engineered to resolve them through a combination of data governance, knowledge embedding, and multi-model collaboration. As shown in Figure 1, this co-design yields a cohesive closed-loop system characterized by unified data-knowledge-task integration.

To address the challenges posed by massive datasets and complex decision applications within the national intelligent eco-environmental monitoring network, we propose a framework that integrates AI capabilities into data governance and decision-making processes. The architecture of the national intelligent environmental monitoring network (Figure 1) illustrates the systematic use of artificial intelligence (AI) in the data aggregation, analysis, and decision-making layers, signifying a shift toward an intelligent eco-environmental governance paradigm. Currently, the framework is organized into two interrelated domains: AI-driven data governance and AI-driven decision-making.

The AI-driven data governance layer harmonizes multi-source data streams across diverse ambient domains encompassing remote sensing, meteorology, noise, ecology, and soil properties alongside dedicated modules for surface water, groundwater, and network maintenance. Through AI-enhanced correlation analysis and anomaly detection, this layer significantly refines the spatiotemporal granularity and reliability of environmental diagnostics. Complementing this foundation, the AI-driven decision-making layer integrates predictive modeling for surface water alongside emergency monitoring protocols and AI-based quality assurance. Underpinned by robust pollution source management and regulatory compliance, the decision-making layer operates within an integrated observation–analysis–prediction workflow to facilitate evidence-based policy formulation and proactive environmental intervention.

A critical future direction involves migrating AI capabilities to the network edge through AI-empowered front-end sensing devices. Embedding lightweight models directly within monitoring instruments enables in situ, real-time data preprocessing, adaptive sampling, and quality validation. This approach minimizes transmission latency and bandwidth constraints while fostering a responsive, decentralized, and resource-efficient monitoring ecosystem and supports the development of more adaptive environmental intelligence systems.

2.2. Development Progress of EnvSentry

The deployment of EnvSentry in environmental monitoring follows an evolutionary trajectory, advancing from rudimentary task-specific applications to sophisticated autonomous systems. This progression is conceptualized via a three-mode iterative framework (Figure 2), which delineates the concomitant maturation of system architecture and intelligence.

The initial stage, direct integration, focuses on establishing foundational capabilities. This mode encompasses several parallel approaches: the straightforward chaining of scenario design, model preparation, prompt engineering, and business integration; the incorporation of agent construction to enhance model interactivity; and the early adoption of RAG alongside tool components within a process orchestration layer to ground model responses in external knowledge and functionality. This mode represents the baseline for leveraging LLMs, primarily treating them as powerful, configurable predictors within a predefined pipeline.

As system requirements become more complex, the framework transitions into the process orchestration mode. This phase prioritizes dynamic workflow management and places a pronounced emphasis on data-centric engineering. Methodologies within this mode integrate data engineering as a critical precursor to model preparation, ensuring the LLM utilizes high-quality, contextually relevant environmental data. The cornerstone of this architecture is the orchestration of intelligent agents, which are modular units capable of executing discrete tasks, into cohesive workflows. This design effectively addresses complex challenges, such as multi-step data analysis or automated reporting, and supports the transition from static integration to adaptive and data-driven analysis.

2.3. Knowledge Base Construction and Vector Indexing

General-purpose large language models often lack the specialized domain knowledge required for advanced eco-environmental science applications. To surmount this limitation, EnvSentry implements a hierarchical three-tiered knowledge hub comprising (i) a national-level repository that synthesizes over 100,000 technical standards, scientific literature, and historical case studies into a dynamic knowledge graph via semantic slicing; (ii) a departmental-level core that aggregates tens of thousands of structured monitoring reports and research findings into specialized knowledge modules; and (iii) an individual-level interface linked to expert experience databases for personalized knowledge retrieval.

Before vector indexing, documents were first cleaned, deduplicated, and segmented according to document hierarchy, section headings, semantic boundaries, and domain-specific entities. Each semantic unit retained metadata such as document source, document type, monitoring domain, publication time, and applicable standard. Vector indexing was then used to support similarity-based retrieval, while metadata filtering constrained the retrieval scope and improved traceability. This procedure ensures that retrieved knowledge units remain semantically coherent, source-traceable, and suitable for downstream reasoning and verification.
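
The retrieval step described above, with metadata filtering narrowing the scope before similarity ranking, can be sketched roughly as follows. This is a minimal illustration and not the EnvSentry implementation: the `Chunk` class, the `retrieve` function, the three-dimensional toy vectors, and the placeholder source names are all assumptions introduced here, and a real system would obtain vectors from an embedding model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One semantic unit plus the metadata retained for filtering and traceability."""
    text: str
    vector: list
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, filters, top_k=3):
    """Keep chunks whose metadata matches every filter, then rank them by similarity."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:top_k]


# Toy knowledge units; vectors and the two placeholder sources are invented.
chunks = [
    Chunk("Antimony determination by atomic fluorescence", [0.9, 0.1, 0.0],
          {"domain": "water", "doc_type": "standard", "source": "HJ 694-2014"}),
    Chunk("Particulate sampling procedure", [0.8, 0.2, 0.1],
          {"domain": "air", "doc_type": "standard", "source": "placeholder-air-doc"}),
    Chunk("Lake eutrophication case review", [0.2, 0.9, 0.1],
          {"domain": "water", "doc_type": "case", "source": "placeholder-case-doc"}),
]

hits = retrieve([1.0, 0.0, 0.0], chunks, {"domain": "water"}, top_k=2)
for hit in hits:
    print(hit.metadata["source"], "-", hit.text)
```

Filtering before ranking means the air-domain chunk is never scored, even though its toy vector is close to the query, which is the traceability and scope-control behaviour the paragraph describes.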

Building upon the comprehensive technical pipeline of EnvSentry, the knowledge base is systematically structured across eight primary eco-environmental domains, encompassing both document-level resources and granular knowledge units (Figure 3). The literature category constitutes the most extensive segment of the knowledge base, with 780 documents and 125,000 knowledge chunks, providing contextual support for model understanding. This is followed by the technical guidelines category, which includes 610 documents and 88,000 knowledge chunks, offering important procedural references. The standards and specifications and policy and regulation categories also represent substantial components, containing 520 documents (72,000 chunks) and 480 documents (65,000 chunks) respectively, ensuring operational standardization and compliance.

Furthermore, the quality management system and case database categories contribute significant empirical knowledge, with the former containing 450 documents and 60,000 knowledge chunks, and the latter, despite having only 420 documents, comprising a highly granular 58,000 knowledge chunks essential for detailed analysis. The environmental reports category, with 312 documents and 42,000 knowledge chunks, serves as a foundational layer for general domain awareness. This distribution, with literature and technical guidelines as the main components, provides EnvSentry with a broad knowledge base for addressing environmental monitoring tasks in an evidence-based manner.

Based on optimized vector indexing technology, the accuracy of professional term recognition has been significantly enhanced. When a user queries a monitoring solution for the heavy metal antimony, the system automatically matches the technical clauses of the Water Quality, Determination of Mercury, Arsenic, Selenium, Bismuth, and Antimony by Atomic Fluorescence Spectrometry (HJ 694-2014) [28] and generates a standardized monitoring process framework. The knowledge embedding mechanism of EnvSentry improves on the traditional RAG paradigm through the hierarchical integration of eco-environmental monitoring domain knowledge and dynamic reasoning capabilities. Its core innovation lies in constructing a tripartite technical architecture comprising a structured knowledge hub, active knowledge injection, and multi-engine collaborative verification, which effectively addresses the shortcomings of traditional RAG in the environmental monitoring field, such as knowledge fragmentation, passive retrieval modes, and the lack of domain-specific logical constraints.

2.4. Data Pipeline of EnvSentry

The EnvSentry data pipeline systematically transforms multi-source data and expert knowledge into intelligent decision-making outputs (Figure 4). The pipeline begins with the integration of heterogeneous inputs, including standards, regulations, expert experience, rules, algorithms, and tabular data, which undergo granularity-based nested document chunking. These knowledge chunks are subsequently standardized and processed through four core analytical modules, i.e., visualization, algorithm, statistics, and analysis.

The processed information is then converted into vector representations using the BGE-M3 embedding model, with retrieval precision enhanced by the BGE-Reranker-large model. This enables dynamic integration of supplementary knowledge through intelligent scheduling mechanisms. The pipeline utilizes a comprehensive suite of tools and agents to perform specialized functions, such as code interpretation and real-time data processing. For example, the python_repl tool executes input code strings remotely in sandboxed runtime sessions, which persist session state in dedicated virtual workspace directories. It combines server-side code interpretation with automated visualization management and returns aggregated diagnostics through a single, bounded error-logging interface. The air_data_query tool exposes multi-dimensional environmental data extraction through a single endpoint that accepts a spatial scope and a temporal resolution. It validates the incoming parameter constraints, generates the backend query automatically, supports ranking through a limit parameter, and returns results in standardized output formats verified against domain expert knowledge sets.
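
To make the parameter-to-query step of a tool like air_data_query concrete, the sketch below validates a spatial scope, a temporal resolution, and an optional limit, then generates a parameterized SQL query. The source does not give the tool's signature or schema, so everything here is hypothetical: the function name `build_air_query`, the table naming pattern `air_<resolution>`, the column names, the allowed values, and the limit bounds are assumptions, and the sample rows are invented.

```python
import sqlite3

# Assumed whitelists; identifiers are checked against them before being
# interpolated into SQL, while user-supplied values are bound as parameters.
ALLOWED_RESOLUTIONS = {"hour", "day", "month"}
ALLOWED_POLLUTANTS = {"pm25", "pm10", "o3", "no2", "so2", "co"}
MAX_LIMIT = 100  # assumed upper bound for ranking queries


def build_air_query(region, pollutant, resolution, start, end, limit=None):
    """Validate the request and return (sql, params) for a station ranking query."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if pollutant not in ALLOWED_POLLUTANTS:
        raise ValueError(f"unsupported pollutant: {pollutant!r}")
    if limit is not None and not (isinstance(limit, int) and 1 <= limit <= MAX_LIMIT):
        raise ValueError(f"limit must be an integer in 1..{MAX_LIMIT}")

    sql = (
        f"SELECT station, AVG({pollutant}) AS mean_value "
        f"FROM air_{resolution} "
        "WHERE region = ? AND ts >= ? AND ts < ? "
        "GROUP BY station ORDER BY mean_value DESC"
    )
    params = [region, start, end]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return sql, params


# In-memory demo with invented rows for a hypothetical daily table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE air_day (station TEXT, region TEXT, ts TEXT, pm25 REAL)")
conn.executemany(
    "INSERT INTO air_day VALUES (?, ?, ?, ?)",
    [
        ("S1", "R1", "2024-01-01", 40.0),
        ("S1", "R1", "2024-01-02", 60.0),
        ("S2", "R1", "2024-01-01", 80.0),
        ("S3", "R2", "2024-01-01", 99.0),
    ],
)
sql, params = build_air_query("R1", "pm25", "day", "2024-01-01", "2024-01-03", limit=1)
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Rejecting unknown resolutions or pollutants before any SQL is built keeps a language model's free-form arguments from reaching the database layer unchecked, which is one plausible reading of "evaluates incoming parameter constraints".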

The final stage employs a large language model to synthesize processed information and generate structured analytical reports. In the prediction analysis of Yangtze River water quality compliance, the system effectively identifies at-risk sections and evaluates key parameters such as dissolved oxygen levels against regulatory standards. This integrated workflow demonstrates a sophisticated fusion of knowledge retrieval, computational analysis, and natural language generation, establishing a comprehensive framework for environmental monitoring and decision support.

Through its layered architecture, the pipeline bridges the gap between raw environmental data, i.e., primary ecological environment monitoring data (Table A2), and actionable analytics by deploying a tripartite framework that unifies a structured knowledge hub with active knowledge injection and multi-engine collaborative verification. This approach addresses critical limitations of conventional RAG systems in environmental monitoring, including knowledge fragmentation, passive retrieval modes, and insufficient domain logic constraints, while maintaining strong interpretability and regulatory compliance throughout the analytical process.

2.5. Multi-Engine Collaborative Mechanism

Given the current absence of a single large language model that is both universally capable and sufficiently specialized in the eco-environmental monitoring domain, we did not rely on a single, general-purpose model. Instead, EnvSentry adopts a multi-engine collaborative mechanism. To balance the demands of deep reasoning and real-time response, EnvSentry employs a task routing hub that dynamically allocates tasks to three specialized engines. This mechanism also encapsulates environmental mechanism models as callable functions, combining physical laws with AI inference. The multi-engine collaboration mechanism establishes an integrated architectural pipeline characterized by tiered processing, iterative feedback loops, and dynamic depth scaling. The operational architecture governing the multi-engine task routing is formulated below.

(1): The primary processing layer leverages the DeepSeek V3 engine to execute all initial user requests spanning conventional query resolution, intent recognition, and foundational logical reasoning.
(2): The secondary layer deploys the DeepSeek R1 engine to execute diagnostic depth enhancement whenever a user explicitly requests deep reasoning, a user registers dissatisfaction with agent outcomes, or the confidence of the DeepSeek V3 output falls below a pre-established threshold. Upon trigger activation, the original query and the initial V3 baseline response are passed together to the R1 model as a single input. Leveraging its reinforcement learning capacity and deep reasoning pathways, the R1 engine then generates an augmented or corrected response accompanied by an updated confidence score.
(3): The Qwen2.5-VL engine governs the multimodal tier to accommodate rich unstructured inputs such as images and videos or to automatically satisfy high-level tasks demanding deep cross-modal reasoning.
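
The three routing rules above can be sketched as a small dispatcher. This is an illustrative reading rather than the deployed router: the `Request` fields, the `answer` function, the stub engines, and the threshold value of 0.7 are assumptions (the source does not state the threshold), and real engines would be model calls rather than local functions.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7  # assumed value; the source gives no number


@dataclass
class Request:
    query: str
    has_media: bool = False             # images or video attached
    wants_deep_reasoning: bool = False  # user explicitly asks for deep reasoning
    user_dissatisfied: bool = False     # user rejected an earlier answer


def answer(request, engines):
    """Route a request and return (text, confidence) from the engine that answered."""
    # Rule (3): unstructured visual input goes to the multimodal tier.
    if request.has_media:
        return engines["multimodal"](request.query)

    # Rule (1): the instruction engine handles every initial request.
    draft, confidence = engines["instruction"](request.query)

    # Rule (2): escalate on explicit request, dissatisfaction, or low confidence.
    escalate = (
        request.wants_deep_reasoning
        or request.user_dissatisfied
        or confidence < CONFIDENCE_THRESHOLD
    )
    if escalate:
        # The original query and the first draft are passed on together.
        return engines["reasoning"](request.query, draft)
    return draft, confidence


# Stub engines standing in for DeepSeek V3, DeepSeek R1, and Qwen2.5-VL.
def stub_instruction(query):
    return f"V3 draft for: {query}", (0.55 if "cause" in query else 0.9)


def stub_reasoning(query, draft):
    return f"R1 revision of [{draft}]", 0.85


def stub_multimodal(query):
    return f"VL analysis for: {query}", 0.8


engines = {
    "instruction": stub_instruction,
    "reasoning": stub_reasoning,
    "multimodal": stub_multimodal,
}

print(answer(Request("list the applicable standards"), engines)[0])
print(answer(Request("cause of elevated total phosphorus"), engines)[0])
print(answer(Request("classify land use", has_media=True), engines)[0])
```

Passing the draft into the reasoning engine, instead of discarding it, mirrors the description of R1 producing an augmented or corrected version of the V3 response.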

For a query such as "attribute the cause of excessive total phosphorus in Huanggai Lake", the router assigns the reasoning engine to deduce pollution pathways, the instruction engine to retrieve hydrological data from tributaries, and the multimodal engine to analyze land use patterns from satellite images. The final output quantifies the contribution of agricultural non-point sources by integrating these results with a mass balance equation.
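
The source does not state which mass balance formulation is used. As a hedged illustration only, the sketch below assumes a simple steady-state balance in which the total phosphorus load entering the lake is the sum of tributary loads (concentration times flow), and whatever is not explained by known point-source discharges is attributed to non-point sources. It ignores retention, internal loading, and atmospheric deposition, and all numbers are invented.

```python
def load_g_per_s(conc_mg_per_l, flow_m3_per_s):
    """Pollutant load from concentration and flow.

    1 mg/L equals 1 g/m^3, so mg/L times m^3/s gives g/s.
    """
    return conc_mg_per_l * flow_m3_per_s


def nonpoint_share(total_load, point_load):
    """Fraction of the total load left after subtracting known point sources."""
    if total_load <= 0:
        raise ValueError("total load must be positive")
    residual = max(total_load - point_load, 0.0)
    return residual / total_load


# Invented tributary measurements: (total phosphorus in mg/L, flow in m^3/s).
tributaries = [(0.20, 10.0), (0.10, 5.0)]
total = sum(load_g_per_s(c, q) for c, q in tributaries)

# Invented sum of permitted point-source discharges, in g/s.
point_sources = 0.5

share = nonpoint_share(total, point_sources)
print(f"total load: {total:.2f} g/s, non-point share: {share:.0%}")
```

In this toy case the tributaries deliver 2.5 g/s in total, so subtracting 0.5 g/s of point-source load leaves 80% attributed to non-point sources.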

The framework adopts a user-intent-driven paradigm to preemptively resolve potential multi-engine conflicts. Under standard operations, the task routing topology isolates execution within a single selected engine, thereby precluding multi-engine operational friction. In scenarios where the user explicitly rejects the primary response, the architecture triggers a deep analysis sequence coupled with an automated response-evaluation routine designed to enforce ultimate output fidelity.

2.6. Context Adaptation

Upon completion of the EnvSentry model, a context-adaptive mechanism was designed to enhance model performance by incorporating more explicit instructions, structured reasoning steps, and domain-specific input formats into the model input. Figure 5 illustrates this dynamic and iterative workflow for context management within EnvSentry. It conceptualizes contexts not as static inputs, but as evolving guidance for task execution. The process begins with a Query, which initiates the cycle. The Inspirits phase represents the initial generation of ideas and responses. These are then organized into a coherent framework during the Structured Data phase. The core of the refinement loop involves Reflection & Structuring, where strategies are critically assessed and improved. Throughout this cycle, Rule Checking acts as a consistent governance mechanism, ensuring that all outputs adhere to predefined guidelines and quality standards. Ultimately, this modular process of generation, reflection, and curation allows strategies and knowledge to be systematically accumulated, refined, and organized, enabling the system to learn and adapt over time.

3. Application and Performance

3.1. RAG Performance in EnvSentry

The integration of retrieval-augmented generation (RAG) represents an important improvement in the deployment of large language models (LLMs) for eco-environmental domains. A tripartite evaluation of the EnvSentry model, based on accuracy, attribution rate, and response completeness, compares model performance with and without the RAG framework and shows its contribution (Figure 6). The consolidated results demonstrate that RAG does more than incrementally improve performance: it enhances the model’s reliability, verifiability, and utility while mitigating important limitations of base LLMs.

The evaluation dataset of RAG is constructed by synthesizing a question bank designed by eco-environmental domain experts with authentic user queries derived from real-world operational scenarios, systematically spanning six benchmarking tasks consisting of basic information resolution, standard specifications, case analysis, intent recognition, NL2SQL translation, and programmatic coding ability. To ensure the objectivity and reliability of the evaluation metrics, a rigorous ground-truth dataset is established for all benchmarking queries by extracting definitive reference answers directly from authoritative repository documentation under strict environmental domain expert verification.

The accuracy metrics reveal that RAG improves performance across operational dimensions, especially in knowledge-intensive tasks (Figure 6a). The increase in basic information accuracy from 35% to 90% and standard specifications from 53% to 88% underscores RAG’s critical function in providing access to an external, authoritative knowledge base. This drastically reduces factual errors and hallucinations. Significant gains in functional capabilities such as NL2SQL (68% to 94%) and case analysis (65% to 85%) further indicate that retrieved context enables more precise reasoning and execution, positioning the model as an analytical tool rather than only a text generator.

The attribution rate data offer the most striking evidence of RAG’s role in ensuring output verifiability (Figure 6b). The critically low attribution rates without RAG, particularly in standard specifications (5%), case analysis (9%), and basic information (7%), indicate that the base model operates as a black box, generating unsubstantiated and potentially unreliable content. The implementation of RAG elevates these rates to 90%, 87%, and 88%, respectively. This result shows that RAG links model outputs to retrievable context and provides more transparent and auditable evidence. This capability is indispensable for building trust and facilitating validation in a scientific or regulatory context.

The completeness evaluation demonstrates that RAG helps generate more complete and better-organized responses (Figure 6c). The extreme deficit in completeness without RAG, again most acute in basic information (27%) and case analysis (37%), reveals a tendency for the base model to produce fragmented, incomplete, or evasive answers. By providing a comprehensive informational scaffold, RAG guides the model to cover the required information, raising completeness to 98% and 83%, respectively, in these two tasks. This ensures that the outputs are not only correct but also functionally useful for end-users who require a full picture for decision-making.

The combined interpretation of these three metrics shows that the RAG framework systematically addresses the fundamental weaknesses of generative LLMs. Specifically, this architecture simultaneously rectifies semantic hallucination, knowledge obsolescence, and operational unreliability by addressing the underlying empirical shortfalls traditionally manifested in low accuracy, deficient attribution, and incomplete output generation.
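
For readers who want to reproduce this kind of three-metric comparison on their own question bank, one possible scoring scheme is sketched below. The source does not define how each metric is computed, so the definitions here are assumptions: accuracy is the share of answers graded correct against the reference, attribution rate is the share of answers that cite a retrievable source, and completeness is the mean fraction of required points covered. The record fields and the sample records are invented.

```python
from collections import defaultdict


def summarize(records):
    """Aggregate graded answers into per-task accuracy, attribution rate, and completeness."""
    by_task = defaultdict(list)
    for record in records:
        by_task[record["task"]].append(record)

    summary = {}
    for task, rows in by_task.items():
        n = len(rows)
        summary[task] = {
            # Share of answers an expert graded as matching the reference answer.
            "accuracy": sum(r["correct"] for r in rows) / n,
            # Share of answers that cite at least one retrievable source.
            "attribution_rate": sum(r["cited_source"] for r in rows) / n,
            # Mean fraction of required answer points that were covered.
            "completeness": sum(r["covered"] / r["required"] for r in rows) / n,
        }
    return summary


# Invented graded answers for one benchmarking task.
records = [
    {"task": "basic_information", "correct": True, "cited_source": True,
     "covered": 2, "required": 2},
    {"task": "basic_information", "correct": False, "cited_source": True,
     "covered": 1, "required": 2},
]

for task, metrics in summarize(records).items():
    print(task, metrics)
```

Running the same function over answers produced with and without retrieval would give the paired per-task values that a comparison such as Figure 6 reports.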

Generally, the transition from a base LLM to a RAG-augmented system marks the evolution from a proficient but unreliable pattern-matching engine to a robust, knowledge-grounded, and verifiable assistant. For technical domains such as environmental monitoring, where precision, auditability, and thoroughness are essential, the integration of RAG is not an optional enhancement but a foundational requirement for dependable and effective deployment.

3.2. Workflow of EnvSentry for Applications

Figure 7 illustrates EnvSentry as a novel LLM-driven agentic architecture designed for complex environmental monitoring and assessment tasks. A case study is conducted to assess and analyze the current anomalies in the mainstream of the Yangtze River and to forecast its compliance status for the current year.

The operational workflow commences with a user-initiated natural language query processed by the EnvSentry framework acting as the primary orchestration hub. To manage complex high-level objectives, this central planner executes automated task decomposition to translate abstract user inputs into a structured sequential schedule of executable sub-tasks.

To evaluate anomalies and forecast compliance status in this case study, the scheduling framework orchestrates three consecutive phases: a baseline analysis of Yangtze River water quality, a diagnosis of potential risks and their underlying drivers, and a prediction of current-year compliance trends. The pipeline is distributed across a multi-agent collective in which each sub-task maps directly to a specialized autonomous agent: a data retrieval agent for source ingestion, an analytical agent for risk and causal diagnostics, and a predictive agent embedded with advanced forecasting models. To carry out these sub-tasks, the agents dynamically invoke a domain-specific Model Context Protocol (MCP) toolkit that unifies modular capabilities including statistical reporting, correlation mapping, anomaly detection, geospatial watershed analysis, and multi-scale environmental compliance evaluations.

EnvSentry synthesizes the processed information and outputs from the agentic execution pipeline to generate a coherent, comprehensive response that directly addresses the user’s initial query. This architecture modularizes complex problem-solving, combining the reasoning and planning capabilities of an LLM with the specialized, executable functions of autonomous agents and domain-specific tools.
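The decomposition-and-dispatch pattern described in this subsection can be sketched as follows. This is a minimal illustration and not the EnvSentry implementation: the agent functions, the `SubTask` type, and the fixed three-step `plan` are assumptions made for the example, and in a real system the schedule would come from the LLM planner and the agents would call MCP tools rather than return strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubTask:
    name: str   # label of the step in the schedule
    agent: str  # key of the agent that executes the step


# Hypothetical stub agents: each reads the shared context and returns a result.
def retrieval_agent(ctx: dict) -> str:
    return f"data for {ctx['target']}"


def analysis_agent(ctx: dict) -> str:
    return f"risk diagnosis based on {ctx['baseline analysis']}"


def prediction_agent(ctx: dict) -> str:
    return f"compliance forecast based on {ctx['risk diagnosis']}"


AGENTS: dict[str, Callable[[dict], str]] = {
    "retrieval": retrieval_agent,
    "analysis": analysis_agent,
    "prediction": prediction_agent,
}


def plan(query: str) -> list[SubTask]:
    # Stand-in for LLM task decomposition: a fixed three-phase schedule.
    return [
        SubTask("baseline analysis", "retrieval"),
        SubTask("risk diagnosis", "analysis"),
        SubTask("compliance prediction", "prediction"),
    ]


def run(query: str, target: str) -> dict:
    # Execute the schedule in order; each step sees the earlier results.
    ctx = {"query": query, "target": target}
    for task in plan(query):
        ctx[task.name] = AGENTS[task.agent](ctx)
    return ctx


result = run("Assess anomalies and forecast compliance", "Yangtze River mainstream")
print(result["compliance prediction"])
```

Passing one shared context through the steps keeps each agent independent of the others' internals, which is one simple way to realize the sequential schedule described above.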

3.3. Case Study and Performance

3.3.1. Real-Time Water Quality Analysis and Assessment

We implemented a case study focused on real-time water quality analysis and warning, deploying the EnvSentry model across the national surface water quality monitoring network (Figure A1). The objective was to leverage the model’s capacity for processing massive real-time and historical datasets to achieve rapid identification of pollution anomalies. This addresses a critical limitation of conventional methods, which largely depend on univariate outlier statistics for single pollutants and are ill-suited for rapid localization when confronted with multi-source, multi-scale data. The EnvSentry-driven intelligent agent introduces a more integrated analytical framework that combines multivariate anomaly detection, spatiotemporal correlation analysis, and similarity matching against historical pollution events. This methodology enables the system to move beyond simple detection by inferring potential pollution causes and identifying source-tracing pathways.
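To make the idea of similarity matching against historical pollution events concrete, the sketch below ranks a new anomaly against stored event signatures using cosine similarity. The paper does not specify the similarity measure, so cosine similarity is an assumption here, and the event types, the four-parameter signature layout, and all numeric values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical signatures: standardized deviations of
# (COD, ammonia nitrogen, total phosphorus, conductivity) from a station baseline.
HISTORICAL_EVENTS = {
    "agricultural runoff": [0.5, 2.0, 2.5, 0.2],
    "industrial discharge": [2.5, 0.5, 0.3, 2.0],
    "domestic sewage": [1.5, 2.5, 1.0, 0.5],
}


def rank_matches(signature):
    # Return (event type, similarity) pairs, most similar first.
    scores = {
        name: cosine_similarity(signature, reference)
        for name, reference in HISTORICAL_EVENTS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_matches([2.2, 0.4, 0.5, 1.8])
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The ranked list is only a set of candidate causes; in the workflow described above it would be combined with spatiotemporal correlation evidence before any source attribution is reported.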

Integration of EnvSentry optimizes both the detection latency and diagnostic precision of pollution events. Empirical data indicate that the system elevates the accuracy of pollution source inference from the baseline 50% to 75%, while simultaneously compressing the diagnostic and evaluation period by 90% (Figure 8). This substantial reduction in processing time enables minute-level anomaly identification and causal tracing, thereby providing reliable empirical support for emergency responses.
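Because gains of this kind can be read either as percentage points or as relative change, the short sketch below works through both readings of the reported figures. The function names are illustrative, and the baseline duration of 100 time units is an arbitrary placeholder, since only the 90% reduction is reported here.

```python
def percentage_point_gain(before_pct, after_pct):
    # Absolute difference between two percentages.
    return after_pct - before_pct


def relative_gain_pct(before_pct, after_pct):
    # Change expressed as a percentage of the starting value.
    return (after_pct - before_pct) / before_pct * 100


def remaining_after_reduction(baseline, reduction_pct):
    # Time left after cutting `baseline` by `reduction_pct` percent.
    return baseline * (1 - reduction_pct / 100)


print(percentage_point_gain(50, 75))       # 25 percentage points
print(relative_gain_pct(50, 75))           # 50.0 percent relative to the baseline
print(remaining_after_reduction(100, 90))  # roughly 10 units of the original 100
```

Read this way, the rise from 50% to 75% is a gain of 25 percentage points, equivalent to a 50% relative improvement over the baseline accuracy.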

3.3.2. Air Quality Forecasting and Trend Assessment

In an application for regional air quality forecasting, the EnvSentry model was deployed as a knowledge coordination module. This system leverages comprehensive national networked datasets and high-frequency monitoring feeds to intelligently orchestrate a hybrid predictive framework, synergizing conventional numerical models, such as deterministic chemical transport models, with advanced statistical methods. This approach addresses the limitations of traditional numerical models, especially under complex scenarios. Because these models often exhibit significant deviations and lack sufficient spatiotemporal resolution, they cannot accurately capture local small-scale pollution events driven by atypical meteorology. Consequently, their practical value for operational guidance remains constrained.

The EnvSentry-driven intelligent agent functions as an adaptive post-processing and bias-correction layer. By analyzing and comprehending the complex, non-linear deviation patterns (i.e., systematic biases) between historical model outputs and observational data, the agent performs dynamic, real-time corrections on new model results. It further refines these corrected forecasts by assimilating near-real-time data from local monitoring stations, generating predictions that demonstrate better agreement with observed conditions.
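As a simplified stand-in for the bias-correction layer described above, the sketch below fits a linear correction between historical model outputs and observations by ordinary least squares and applies it to a new forecast. The paper describes non-linear deviation patterns learned by the agent, so this linear form is an assumption chosen for brevity, and the concentration values are hypothetical.

```python
from statistics import mean


def fit_linear_correction(model_values, observed_values):
    # Least-squares fit of observed = slope * model + intercept.
    mx, my = mean(model_values), mean(observed_values)
    sxx = sum((x - mx) ** 2 for x in model_values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(model_values, observed_values))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def correct(forecast, slope, intercept):
    # Apply the fitted correction to a new raw model forecast.
    return slope * forecast + intercept


# Hypothetical PM2.5 pairs (ug/m3): the numerical model runs systematically high.
historical_model = [40.0, 60.0, 80.0, 100.0]
historical_observed = [35.0, 50.0, 65.0, 80.0]

slope, intercept = fit_linear_correction(historical_model, historical_observed)
corrected = correct(90.0, slope, intercept)
print(slope, intercept, corrected)
```

In an operational setting the fit would be refreshed as new observations arrive, which corresponds to the near-real-time assimilation step mentioned in the text.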

Practically, this system delivers more precise and reliable trend forecasts for regional joint prevention and control authorities. This improved accuracy helps authorities formulate and execute timely mitigation strategies tailored to specific sub-regional needs. Quantitative evaluation confirmed the system’s performance and efficiency. First, the agent-driven workflow streamlined the analytical process, reducing the time required for comprehensive expert-led judgment from an average of 60 min to 20 min. Second, and perhaps more significantly, predictive judgments generated by non-specialist personnel invoking the agent achieved a consistency rate exceeding 85% when compared to forecasts produced by domain experts. This demonstrates the model’s ability to represent expert knowledge and support rapid-response analysis.
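One plausible way to compute such a consistency rate is the share of forecast cases in which the agent-assisted judgment matches the expert judgment exactly. The paper does not state the exact definition, so exact category matching is an assumption, and the category sequences below are hypothetical.

```python
def consistency_rate(agent_assisted, expert):
    # Percentage of cases where both judgments assign the same category.
    if len(agent_assisted) != len(expert) or not expert:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == e for a, e in zip(agent_assisted, expert))
    return matches / len(expert) * 100


# Hypothetical air quality categories for eight forecast cases.
expert = ["Good", "Good", "Lightly Polluted", "Excellent",
          "Moderately Polluted", "Good", "Lightly Polluted", "Good"]
agent_assisted = ["Good", "Good", "Lightly Polluted", "Excellent",
                  "Lightly Polluted", "Good", "Lightly Polluted", "Good"]

rate = consistency_rate(agent_assisted, expert)
print(f"{rate:.1f}%")
```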

3.3.3. Operations and Maintenance (O&M) of the National Ecological Environmental Monitoring Network

In an application focused on the intelligent Operations and Maintenance (O&M) of the national eco-environmental monitoring network, the EnvSentry model was deployed to address significant challenges in data validation and operational efficiency. The network’s O&M workflows were traditionally hindered by the difficulty in identifying data anomalies, the high labor costs and subjective errors associated with manual verification, and inefficiencies in managing frequent maintenance tasks and work order closures.

The EnvSentry solution utilizes LLM-based reasoning, enabling business rules to be input as natural language descriptions. The model then automatically queries, analyzes, and discriminates anomalous data patterns. The system is integrated with agentic tools and offers a dual-mode capability: it can execute pre-encoded rule-based algorithms for routine tasks, or it can allow non-technical business personnel to directly leverage the model’s reasoning for complex diagnostics. This implementation yielded substantial improvements in network integrity and efficiency. Quantitatively, the introduction of the EnvSentry model improved data anomaly detection accuracy by 10 percentage points. Furthermore, it achieved a 90% rule intelligence rate, indicating that the model can translate natural language rules into executable logic. This automation reduced the total weekly manual verification workload by approximately 30%, shortening the work order processing cycle and enhancing the overall stability and efficiency of the national monitoring network.
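The rule-based mode can be pictured as a registry that pairs each natural language rule with an executable check. In the sketch below the checks are written by hand to stand in for the translation step that the text attributes to the model; the two rules, the `Rule` type, and the threshold of six readings are hypothetical examples and not rules taken from the monitoring network.

```python
from dataclasses import dataclass
from typing import Callable


def has_flatline(series, run_length):
    # True if any value repeats for `run_length` consecutive readings (run_length >= 2).
    run = 1
    for previous, current in zip(series, series[1:]):
        run = run + 1 if current == previous else 1
        if run >= run_length:
            return True
    return False


@dataclass
class Rule:
    description: str                      # the rule as a business user would state it
    check: Callable[[list[float]], bool]  # returns True when the series is anomalous


RULES = [
    Rule("Dissolved oxygen must not be negative",
         lambda series: any(value < 0 for value in series)),
    Rule("Six identical consecutive readings indicate a stuck sensor",
         lambda series: has_flatline(series, 6)),
]


def triggered(series):
    # Descriptions of all rules that flag the given series.
    return [rule.description for rule in RULES if rule.check(series)]


print(triggered([7.8, 7.8, 7.8, 7.8, 7.8, 7.8]))
print(triggered([8.2, -0.1, 8.0]))
```

Keeping the description next to the check means every flagged record can be reported back in the wording the business user originally supplied.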

3.4. Performance Comparison with Typical LLMs

We compared the performance of EnvSentry with that of two general-purpose LLMs, ChatGPT-5 and DeepSeek V3.1, using two questions. The first question asked the models to generate a 72 h air quality forecast for the Beijing-Tianjin-Hebei (BTH) region, and the second asked them to assess the impact of rainfall during the 2024 flood season on water quality in the Yellow River Basin (YRB) (Table A1). The assessment focuses on forecast structure, meteorological integration, specificity, and practical utility. The user-level evaluation prompts, representative desensitized agent prompt templates, and task execution rules used to support this comparison are provided in Appendix A.3 to improve methodological transparency and reproducibility.

To move beyond qualitative observation and enable a reproducible quantitative comparison, we evaluated the three models using four standardized criteria: factual accuracy, temporal specificity, structural completeness, and operational practicality. Each criterion was scored on a 0–100 scale according to predefined scoring rules. The evaluation was independently conducted by two annotators using the same set of domain-specific queries and original model outputs reported in this section (Table 1).
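A simple way to combine such scores is to average the two annotators per criterion and then average across the four criteria. The aggregation rule used for Table 1 is not spelled out in this section, so equal weighting is an assumption, and the scores below are placeholders that do not correspond to any model in the study.

```python
from statistics import mean

CRITERIA = (
    "factual accuracy",
    "temporal specificity",
    "structural completeness",
    "operational practicality",
)


def aggregate(scores_by_annotator):
    # Mean score per criterion across annotators, plus the unweighted overall mean.
    per_criterion = {
        criterion: mean(scores[criterion] for scores in scores_by_annotator)
        for criterion in CRITERIA
    }
    overall = mean(per_criterion.values())
    return per_criterion, overall


# Placeholder scores on the 0-100 scale from two annotators for one model output.
annotator_a = dict(zip(CRITERIA, (80, 70, 90, 60)))
annotator_b = dict(zip(CRITERIA, (90, 70, 80, 80)))

per_criterion, overall = aggregate([annotator_a, annotator_b])
print(per_criterion, overall)
```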

For Query 1, the comparative outputs from the three LLMs revealed distinct approaches and varying levels of sophistication in generating a 72 h air quality forecast for the BTH region. ChatGPT-5 produced a well-structured, general-audience forecast using a generic Day 1–3 format. It correctly identified key meteorological drivers and included public health recommendations. However, its output remained hypothetical and non-specific, lacking temporal anchoring to a real forecast cycle, and therefore resembled a reusable template rather than a situation-specific prediction.

DeepSeek V3.1 demonstrated enhanced specificity by anchoring its forecast to explicit calendar dates (24th–26th November). Its output included a meteorological explanation, describing the progression from cold air intrusion to weakening pressure fields and poor dispersion conditions, including mention of specific phenomena like nocturnal floating dust. This indicated a competent synthesis of typical weather-pollution linkages, though it omitted a formal forecast citation.

In contrast, EnvSentry generated an output most closely aligned with operational forecasting standards. It began by citing a specific forecast issue date, establishing critical provenance. The forecast employed precise, standardized air quality terminology such as Excellent to Good, and provided the highest spatial resolution, identifying nuanced risk areas such as the foothills of Hebei. It communicated the overall trend while appropriately qualifying short-term peaks and localized risks, demonstrating a superior grasp of forecast confidence and spatial variability. Consequently, the EnvSentry output demonstrates superior operational verisimilitude and practical utility within both professional and public communication contexts.

For Query 2, ChatGPT-5 provided a concise, bullet-point summary of general impact mechanisms, including increased runoff, pollutant flushing, a dilution effect, and microbial changes. Its response is factually sound for a generic explanation but exhibits key limitations for the specific query. It makes no reference to the year 2024 mentioned in the prompt, offers no quantitative data, and lacks any temporal or spatial nuance specific to the Yellow River Basin (YRB). The output reads as a static, textbook-style overview suitable for foundational education but devoid of situational analysis or report-like structure.

DeepSeek V3.1 demonstrated a notable advance in conceptual framing and narrative coherence. It introduced a framework covering short-term shocks and long-term purification, capturing the competing temporal dynamics of flood impacts. Its response elaborated on specific pollutant types and processes such as channel scouring with greater technical detail than ChatGPT-5. While it anchored the response to 2024, it still operated on a conceptual level without integrating concrete, time-bound monitoring data, such as specific pollutant concentration trends or geographic hotspots. Its output is a well-reasoned, mechanistically rich discussion but remains a qualitative analysis.

EnvSentry generated a response that closely emulates a formal environmental assessment report. Its output is structurally organized with clear sections, including overall situation and key findings, mirroring professional scientific communication. Crucially, it incorporated specific data points (simulated or recalled), such as the V-shaped trend, the rise in the proportion of Class I–III water bodies from 79.8% to 86.1%, the count of Inferior Class V sections, and named vulnerable tributaries (e.g., the Malian River) and lakes (Ulansuhai Lake). Furthermore, it extended the analysis to include regionally differentiated findings, a mechanism summary, and actionable policy recommendations. This synthesis of quantitative metrics, spatial specificity, and a standardized reporting format results in an output of higher practical relevance and functional utility for a domain expert or decision-maker.

4. Discussion

Several research-oriented large language models have emerged for climate and environmental sciences. For example, systems such as ClimateGPT, WaterGPT, and OceanGPT have adapted LLMs to specific environmental subdomains [17,18,19,29,30], and frameworks like EnvGPT have made strides in interdisciplinary knowledge integration through supervised fine-tuning [12]. EnvSentry differs from these models in its system architecture. Whereas they primarily refine a single model’s knowledge and reasoning capabilities, the multi-engine collaborative mechanism of EnvSentry explicitly acknowledges the absence of a universal yet domain-specialized LLM. Through dynamic task routing across specialized engines, EnvSentry balances deep cognitive reasoning with real-time structured operations and thereby circumvents the limitations of a single generalized model.

A core advantage of EnvSentry lies in its systematic knowledge grounding. Similar to ChatClimate, which grounds its responses in the authoritative IPCC AR6 reports to mitigate hallucination [23], EnvSentry implements an enhanced RAG paradigm built upon a massive, hierarchically structured knowledge base. However, EnvSentry extends this concept beyond a single document corpus. The system utilizes an architecture comprising a structured knowledge hub, active knowledge injection, and multi-engine collaborative verification. By integrating national standards, technical guidelines, literature, and case databases, this framework enables the precise retrieval of technical specifications (e.g., HJ 694-2014) and ensures that outputs are both verifiable and compliant with regulatory frameworks. This addresses the critical issue of factual reliability, a challenge that persists even in advanced general models.

Furthermore, EnvSentry’s design as an agentic workflow orchestrator moves beyond a pure Q&A system. It transforms the LLM from an endpoint into a planning and dispatch center that decomposes complex user queries into executable schedules. By leveraging a suite of specialized agents and MCP tools, it connects language understanding with practical environmental modeling, achieving a closed-loop integration of perception, analysis, and decision-making that is only nascent in other environmental AI applications.

Despite the operational efficiencies demonstrated by the EnvSentry framework, multi-engine orchestration inherently introduces complex structural trade-offs that necessitate critical evaluation. First, regarding infrastructure scalability, the expanding topology of coordinated autonomous agents increases systemic communication overhead and algorithmic scheduling complexity during peak environmental monitoring periods. Second, the reliance on high-fidelity inference models such as DeepSeek R1 substantially elevates the cumulative computational cost and operational latency, presenting clear bottlenecks for massive real-time data ingestion pipelines. Finally, a critical vulnerability lies in the potential cascaded propagation of reasoning inconsistencies across interconnected agents; an unresolved cognitive anomaly or semantic deviation generated by the primary processing layer can systematically distort the downstream contextual payloads ingested by specialized analytical or multimodal engines.
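The cascade risk can be illustrated with a back-of-the-envelope calculation. Under the simplifying assumption that each of the n chained agents produces a correct intermediate result independently with probability p_i (an assumption introduced here for illustration, not a measured property of EnvSentry), the probability that the whole chain is error-free is the product of the individual probabilities:

```latex
P_{\text{chain}} = \prod_{i=1}^{n} p_i ,
\qquad \text{e.g. } p_i = 0.95,\ n = 4
\;\Rightarrow\; P_{\text{chain}} = 0.95^{4} \approx 0.81
```

Even highly reliable individual steps therefore compound into a noticeably lower end-to-end reliability, which is why verification between stages matters in multi-agent pipelines.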

The comparative analysis of the three LLMs (ChatGPT-5, DeepSeek V3.1, and EnvSentry) reveals a clear spectrum of capability in addressing domain-specific environmental queries, ranging from general knowledge summarization to near-professional situational reporting. This comparison shows the influence of model architecture, training data specialization, and prompting constraints on output utility for scientific applications. ChatGPT-5 serves as a competent general explanatory tool, suitable for foundational education on the topic but limited by its static knowledge cutoff and lack of domain-specific data integration. DeepSeek V3.1 offers greater technical depth and reasoning, potentially providing a more satisfactory answer for users seeking a detailed process-based understanding, yet it is still constrained by the absence of current-year facts. By generating a response that incorporates time-bound data, geographic specifics, and a formal reporting structure, EnvSentry delivers stronger domain-specific performance for professionals or stakeholders requiring situation-aware analysis. Compared with ChatGPT-5 and DeepSeek V3.1, EnvSentry illustrates the evolution from a general-purpose explanatory tool to a domain-specialized analytical assistant. This highlights a broader principle in AI application: while foundational LLMs provide broad accessibility to knowledge, maximizing utility in specialized scientific and operational fields increasingly requires tailored models or robust retrieval-augmented generation systems capable of accessing and formatting domain-specific information.

Despite these advancements, several challenges persist, aligning with broader limitations in the field. First, while the reasoning engine handles complex tasks, the depth of causal reasoning in novel, multi-factorial environmental scenarios remains a frontier. Enhancing this capability may require integration with more sophisticated symbolic reasoning or mechanistic models, an area where systems like ClimateGPT have begun exploration by synthesizing interdisciplinary research [17,31]. Second, model performance is heavily dependent on data standardization and quality. Inconsistencies in data formats and governance across the vast monitoring network can hinder optimal performance, a common bottleneck for data-driven AI systems. Our future strategy involves completing the migration to a unified domestic database and refining the knowledge hierarchy to mitigate this issue. Third, the computational demand of multi-engine orchestration and large-scale knowledge retrieval presents significant resource constraints. This mirrors the high costs associated with training and fine-tuning large models such as EnvGPT [12,32]. The proposed cloud-edge-terminal computing architecture offers a targeted infrastructure solution for distributing this computational load efficiently. Fourth, regarding the non-trivial risks of operational security and semantic hallucination, the integration of retrieval-augmented generation and multi-engine verification within EnvSentry significantly mitigates factual errors, yet the system remains vulnerable to unsubstantiated output in edge cases. Future research will therefore incorporate methods such as federated learning for data privacy and knowledge anchoring for output fidelity, with rigorous source citation, to advance system robustness and reliability.

5. Conclusions

The EnvSentry model represents a critical advancement in the application of large language model and agent technology to the eco-environmental monitoring domain. By innovating a multi-engine collaborative framework grounded in robust data governance and a comprehensive knowledge base, EnvSentry has effectively transformed the paradigm from delayed, fragmented analysis to real-time, integrated, and intelligent decision-making. The demonstrated performance improvements across diverse application scenarios validate the feasibility of an integrated sensing, analysis, and decision-making framework for environmental governance. Nevertheless, large-scale deployment of EnvSentry still depends on continuous data standardization, stable computational resources, secure access control, and human verification for low-confidence outputs. These practical constraints should be addressed before fully autonomous environmental governance can be achieved in real-world monitoring networks.

Future development of EnvSentry will move toward a highly autonomous, interpretable, and collaborative environmental intelligence agent. A critical pathway involves extending EnvSentry’s capabilities to the terminal layer of environmental monitoring by embedding lightweight models directly within front-end sensing devices. This edge-level integration will enable in situ data processing and preliminary analysis at the data source, facilitating end-to-end artificial intelligence deployment across the entire environmental monitoring, data governance, and decision-making pipeline. Further work will prioritize the joint advancement of multimodal data fusion and high-resolution spatiotemporal prediction models. To strengthen scientific rigor, this integrated analytical framework will incorporate domain-specific mechanistic models. Together, these developments are expected to yield a fully closed-loop intelligent environmental management system.

Author Contributions

Conceptualization, W.L. and C.C.; methodology, W.L. and L.Y.; software, W.L., L.Y. and Y.F.; validation, K.J., W.Y. and M.C.; formal analysis, W.L.; investigation, C.C. and Q.Z.; resources, C.C.; data curation, W.L. and Y.F.; writing—original draft preparation, W.L. and L.Y.; writing—review and editing, K.J. and M.C.; visualization, W.L. and C.C.; supervision, W.L.; project administration, W.L.; funding acquisition, C.C. All authors have read and agreed to the published version of the manuscript.

Funding

Jing-Jin-Ji Regional Integrated Environmental Improvement-National Science and Technology Major Project (Funder: Ministry of Science and Technology of the People’s Republic of China, Grant No. 2026ZD1208800).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1

Figure A1. Screenshots of the EnvSentry model. (a) the startup page of EnvSentry (环境哨兵 is the Chinese name of EnvSentry), (b) model operation example (Miyun Reservoir water quality analysis).

Table A1. Comparison of generated answers to eco-environmental monitoring questions from ChatGPT-5, DeepSeek V3.1, and EnvSentry.

Query	ChatGPT-5	DeepSeek V3.1	EnvSentry
1. Forecasting the air quality in the Beijing-Tianjin-Hebei (BTH) region for the next 72 h	Brief air quality Forecast for the BTH Region (Next 72 h): Day1 (0–24 h): good to excellent air quality in the northern parts, with light pollution expected in the southern areas. Day2 (24–48 h): air quality may improve if wind strength or precipitation increases; under stagnant conditions, light to moderate pollution will persist in the central and southern regions. Day3 (48–72 h): the arrival of cold air or precipitation could lead to a general improvement to good air quality; however, localized moderate pollution remains possible if conditions remain stagnant. Key risk areas: central and southern Hebei, coastal areas of Tianjin, industrial zones, and heavy-traffic zones. Primary pollutants: PM₂.₅ and PM₁₀. Recommendations: limit outdoor activities when air quality is poor. Sensitive groups are advised to take necessary precautions.	Air quality outlook in the BTH (Next 72 h): 24th, Nov: under the influence of cold air, good to excellent air quality in the north; haze is expected to diminish and dissipate. 25th, Nov: as atmospheric pressure weakens, the north will experience good to lightly polluted conditions, while central and southern areas will see light to moderate pollution. 26th, Nov: poor dispersion conditions will lead to good to light pollution in the north, and light to moderate pollution in central and southern regions. At night, floating dust may occur in western Hebei. Overall, air quality will mainly range from good to lightly polluted, with dispersion conditions deteriorating after the mid-term period.	Air Quality Outlook for the BTH in the Next 72 h: According to the regional forecast issued on 23 Nov 2025, the overall air quality in the BTH region during the next 72 h (24–26 November) is expected to be primarily Excellent to Good. The northern parts will maintain consistently Excellent to Good levels, while the central and southern areas will mostly experience Good to Lightly Polluted conditions. A short-term peak in pollution may occur in the southern region on the 24th. By the 26th, localized areas along the foothills of Hebei and in northwestern Shandong may face a risk of Moderately Polluted conditions.
2. Please explain the impact of rainfall during 2024 flood season on water quality in the Yellow River Basin (YRB)	The Impact of rainfall during 2024 flood season on water quality in the YRB can be summarized as follows: 1. Increased runoff Volume: heavy rainfall leads to increased surface runoff, carrying more sediment and suspended solids into the river, thereby elevating water turbidity. 2. Pollutant flushing: non-point source pollutants (e.g., nitrogen, phosphorus, heavy metals) from agricultural lands, urban areas, and industrial zones are washed into the river. This can increase nutrient concentrations in the water body, posing a risk of localized eutrophication. 3. Pollutant dilution effect: during large-scale rainfall events, increased river discharge may temporarily dilute the concentrations of certain pollutants. However, this dilution effect can be offset by sustained rainfall and continued sediment input. 4. Microbial changes: rainwater runoff readily introduces bacteria and pathogenic microorganisms, potentially causing short-term fluctuations and increases in microbial indicators of water quality. Overall, the short-term impact of flood-season rainfall on Yellow River water quality is primarily characterized by increased turbidity and fluctuations in nutrient and microbial concentrations. While increased water volume provides a certain dilution effect, the impact of non-point source pollution shocks cannot be overlooked.	The impact of rainfall during 2024 flood season on the water quality of the YRB presents a complex scenario characterized by the coexistence of “short-term shocks” and “long-term purification.” 1. Short-term negative Impacts (shocks): Exacerbation of non-point source pollution: initial intense rainfall generates surface runoff, which flushes large quantities of pollutants, such as chemical fertilizers, pesticides, livestock manure from riparian farmlands, and contaminants from urban surfaces—into the river. 
This leads to a sharp short-term increase in indicators like ammonia nitrogen, total phosphorus, and chemical oxygen demand (COD). Increase in water turbidity: rainfall erodes riverbanks and the land surface, carrying substantial amounts of sediment and suspended solids into the Yellow River. This causes a rapid rise in water turbidity, impairing aesthetic quality and potentially facilitating the adsorption of other pollutants. 2. Long-Term/Subsequent Positive Impacts (Purification): Increased Water Volume, Dilution and Degradation: Sustained rainfall and increased upstream inflow significantly raise the runoff volume in the main stem of the Yellow River. The increased flow dilutes pollutants and enhances the water body’s self-purification capacity, contributing to gradual water quality improvement during the mid-to-late flood season or thereafter. Channel Scouring and Water Renewal: Higher flow rates help scour and resuspend riverbed sediments, mobilizing and transporting some accumulated pollutants towards the sea. This “flushing” effect cleanses the channel, creating favorable conditions for subsequent water quality improvement. In summary, rainfall in the early flood season often causes temporary and localized deterioration of Yellow River water quality. However, as runoff volume continues to increase, the effects of dilution and purification gradually become dominant, playing a positive role in enhancing the overall environmental carrying capacity of the Yellow River’s water system. Management authorities should prioritize preventing the shock load of non-point source pollution associated with initial rainwater runoff.	I. Overall situation During the 2024 flood season (June–September), the water quality in the YRB exhibited a V-shaped trend, characterized by an initial decline followed by an improvement. Influenced by the initial flushing effect of rainfall, water quality declined somewhat in July. 
However, from August to September, significant improvement was observed, with quality surpassing pre-flood season levels, primarily due to the dilution effect of sustained rainfall. II. Key findings 1. Water quality variation characteristics Proportion of Excellent/Good water bodies (Class I–III): increased from a low of 79.8% in July to 86.1% by September. Number of Inferior Class V monitoring sections: peaked at 8 in July and decreased to 4 by September. Primary pollution indicators: Chemical Oxygen Demand (COD), Permanganate Index, and 5-day Biochemical Oxygen Demand (BOD₅). 2. Significant regional disparities Mainstream of the Yellow River: water quality remained stable, consistently maintaining an “Excellent” grade for four consecutive months, demonstrating strong resilience to pollution shocks. Major Tributaries: These were more sensitive and vulnerable. Specific tributaries, such as the Malian River and Zuli River, exhibited issues with heavy metal pollution. Lakes and reservoirs: Ulansuhai Lake remained in the Inferior Class V category continuously, indicating a pronounced cumulative effect of pollutants. III. Analysis of impact mechanisms Rainfall during the flood season exerted a dual impact on water quality. In the initial phase (June–July), surface runoff scoured and transported pollutants into the river, leading to temporary deterioration. In the later phase (August–September), sustained rainfall established stable runoff, which improved water quality through a dilution effect. IV. Recommended measures 1. Enhance pollution monitoring and control in key tributaries during the critical June-July period. 2. Formulate targeted remediation plans for tributaries affected by heavy metal pollution. 3. Establish an early warning and response mechanism for water quality during the flood season to enable precise prevention and control. 4. Prioritize long-term restoration efforts for lake and reservoir water bodies. V. 
Conclusion Current water quality management in the Yellow River Basin has yielded significant results, with stable quality in the mainstream. However, pollution control in the tributaries requires continued strengthening. It is recommended to focus on non-point source pollution control and improve the emergency response system for the flood season.

Appendix A.2

Table A2. Sample of raw environmental data.

Months	Year	Basin_Name	Waterbody_Name	Station_Name	Turbidity (NTU)	DO (mg/L)	Water Temperature	Electrical Conductivity (mS/m)	BOD₅ (mg/L)	COD (mg/L)	CODMn (mg/L)	TP (mg/L)
9	2025	Yangtze River Basin	Yangtze River	Gaogang Wharf (Left Bank)	18.2	7.8	24.3	36.8	1.1	6.9	1.7	0.056
9	2025	Yangtze River Basin	Yangtze River	Yuanbaowei (Left Bank)	31.7	8.2	21.8	34.5	0.9	7.3	1.7	0.051
9	2025	Yangtze River Basin	Yangtze River	Hukou (Right Bank)	25	8.2	20.9	38.2	1	7.2	1.6	0.061
9	2025	Yangtze River Basin	Yangtze River	Yunchi (Baiyang)	9.5	9	20.6	41.3	0.5	5.8	1.4	0.042
9	2025	Yangtze River Basin	Yangtze River	Yanji	22.6	8	21.6	38.3	1.1	10.6	1.6	0.075
9	2025	Yangtze River Basin	Yangtze River	Liukou	18.5	9.1	20.5	44.4	0.8	9.2	1.1	0.042
9	2025	Yangtze River Basin	Yangtze River	Shaxi Town	44.5	8.2	21.5	44.9	1	6.1	1.4	0.054
9	2025	Yangtze River Basin	Yangtze River	Sujia (Left Bank)	36.5	8.1	20.3	44.2	1.5	10.2	1.4	0.053
9	2025	Yangtze River Basin	Yangtze River	Chongming Dongtan (Left Bank)	121.9	9.2	20.3	348.9	1.3	7	2	0.069
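As an illustration of how records like those in Table A2 might be screened, the sketch below applies a per-parameter modified z-score (based on the median and the median absolute deviation) to the turbidity and conductivity columns. This is a generic robust outlier check and is not presented as the detection method used by EnvSentry; the constant 0.6745 and the cutoff of 3.5 are the conventional choices for the modified z-score.

```python
from statistics import median

# Subset of Table A2: station -> (turbidity in NTU, electrical conductivity in mS/m).
STATIONS = {
    "Gaogang Wharf (Left Bank)": (18.2, 36.8),
    "Yuanbaowei (Left Bank)": (31.7, 34.5),
    "Hukou (Right Bank)": (25.0, 38.2),
    "Yunchi (Baiyang)": (9.5, 41.3),
    "Yanji": (22.6, 38.3),
    "Liukou": (18.5, 44.4),
    "Shaxi Town": (44.5, 44.9),
    "Sujia (Left Bank)": (36.5, 44.2),
    "Chongming Dongtan (Left Bank)": (121.9, 348.9),
}


def robust_z_scores(values):
    # Modified z-score: 0.6745 * (x - median) / MAD; all zeros if MAD is zero.
    center = median(values)
    mad = median(abs(value - center) for value in values)
    if mad == 0:
        return [0.0] * len(values)
    return [0.6745 * (value - center) / mad for value in values]


def flag_outliers(column, threshold=3.5):
    # Names of stations whose value in `column` exceeds the cutoff.
    names = list(STATIONS)
    values = [STATIONS[name][column] for name in names]
    scores = robust_z_scores(values)
    return [name for name, score in zip(names, scores) if abs(score) > threshold]


print("turbidity:", flag_outliers(0))
print("conductivity:", flag_outliers(1))
```

A flag from this check marks a value as unusual relative to the other stations, not as erroneous; the flagged station lies near the Yangtze estuary, so its readings may reflect real local conditions and would need contextual review.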

Appendix A.3

Appendix A.3.1. Prompt Word Framework

# [Agent Name]—System Prompt Template

## 1. Role Definition

You are a [domain-specific] assistant. Your role is to [core function description].

## 2. Data Source Constraint

- All responses must be based strictly on the connected [data source type] data.
- You shall NOT construct self-built models, simulate data, or generate outputs beyond the available data.
- If the requested information is not available, explicitly state: “Data not available in current scope.”

## 3. Task Scope

You are authorized to handle the following types of queries:

- [Task Type 1]: [Brief description]
- [Task Type 2]: [Brief description]
- [Task Type 3]: [Brief description]

## 4. Prohibited Behaviors

- [Prohibition 1]: e.g., Do not fabricate data.
- [Prohibition 2]: e.g., Do not infer beyond available data.
- [Prohibition 3]: e.g., Do not modify time range or parameters to force results.

## 5. Output Format Rules

- If the user requests a list: output in tabular format.
- If no data is found: state “No matching results found”.
- If the query involves comparison: include temporal context (e.g., “compared to previous period”).

## 6. Domain Knowledge Constraints (if applicable)

- [Constraint 1]: e.g., Causal attribution must respect seasonal and regional constraints.
- [Constraint 2]: e.g., Certain source types should only be considered under specific conditions.
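
The template above can be filled programmatically. The following is a minimal sketch of one way to do so, assuming the placeholders are supplied as plain strings. The `AgentPromptSpec` class, the `render_system_prompt` function, and the example values are hypothetical and are not part of EnvSentry.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPromptSpec:
    """Hypothetical container for the bracketed placeholders in the template."""
    agent_name: str
    domain: str
    core_function: str
    data_source: str
    tasks: dict = field(default_factory=dict)  # task type -> brief description
    prohibitions: list = field(default_factory=list)
    domain_constraints: list = field(default_factory=list)


def render_system_prompt(spec):
    """Assemble the six template sections; section 6 is skipped when empty."""
    lines = [
        f"# {spec.agent_name}: System Prompt",
        "",
        "## 1. Role Definition",
        f"You are a {spec.domain} assistant. Your role is to {spec.core_function}.",
        "",
        "## 2. Data Source Constraint",
        f"- All responses must be based strictly on the connected {spec.data_source} data.",
        "- You shall NOT construct self-built models, simulate data, "
        "or generate outputs beyond the available data.",
        "- If the requested information is not available, explicitly state: "
        '"Data not available in current scope."',
        "",
        "## 3. Task Scope",
        "You are authorized to handle the following types of queries:",
    ]
    lines += [f"- {name}: {desc}" for name, desc in spec.tasks.items()]
    lines += ["", "## 4. Prohibited Behaviors"]
    lines += [f"- {rule}" for rule in spec.prohibitions]
    lines += [
        "",
        "## 5. Output Format Rules",
        "- If the user requests a list: output in tabular format.",
        '- If no data is found: state "No matching results found".',
        "- If the query involves comparison: include temporal context.",
    ]
    if spec.domain_constraints:
        lines += ["", "## 6. Domain Knowledge Constraints"]
        lines += [f"- {rule}" for rule in spec.domain_constraints]
    return "\n".join(lines)


# Example values are illustrative only.
spec = AgentPromptSpec(
    agent_name="Water Environment Assessment Agent",
    domain="water-quality monitoring",
    core_function="answer questions about surface water monitoring sections",
    data_source="monitoring database",
    tasks={"Section query": "Look up readings for a named section."},
    prohibitions=["Do not fabricate data."],
    domain_constraints=[
        "Causal attribution must respect seasonal and regional constraints."
    ],
)
prompt = render_system_prompt(spec)
print(prompt)
```

Keeping the fixed wording of sections 2 and 5 inside the renderer means every agent inherits the same data-source and output rules, while only the role, tasks, and constraints vary per agent.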

Appendix A.3.2. Representative Desensitized Agent Prompt for Air Quality Forecasting

Agent Role: Water Environment Assessment Agent

Task Execution Rules:

1. Query Mode:
   - For simple, non-decomposition queries: forward the user’s original query directly to the data tool without rephrasing or simplification.
2. Profile Analysis Mode:
   - When performing profile analysis, the agent must first retrieve contextual data from surrounding areas.
   - The agent shall NOT perform anomaly tracing or root-cause diagnosis in this mode.
3. Section List Handling:
   - For each identified problem section, the agent shall call the tool sequentially to query: exceeded-standard sections, year-on-year/month-on-month deteriorating sections, inferior Class V sections, and toxic/heavy-metal exceeded sections.
   - If no data is returned for a category, the agent shall conclude “no matching sections found” and shall NOT re-query with modified time ranges.
4. Output Format:
   - If the user requests a list, output in tabular format.
5. Domain Knowledge Constraints:
   - Causal attribution must respect seasonal and regional constraints (e.g., agricultural non-point source pollution should not be attributed in winter for northern regions due to minimal agricultural activity and precipitation).
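
The Section List Handling rule (rule 3) describes a fixed control flow: one tool call per category, in order, with no retry on an empty result. A minimal sketch of that flow follows. The `query_tool` callable, its `(category, time_range)` signature, and the stub data are assumptions for illustration; the actual EnvSentry tool interface is not described here.

```python
from typing import Callable

# The four categories named in rule 3, in the order listed there.
CATEGORIES = (
    "exceeded-standard sections",
    "year-on-year/month-on-month deteriorating sections",
    "inferior Class V sections",
    "toxic/heavy-metal exceeded sections",
)
NO_MATCH = "no matching sections found"


def collect_problem_sections(query_tool: Callable[[str, str], list],
                             time_range: str) -> dict:
    """Query each category once, in order, with the time range unchanged."""
    report = {}
    for category in CATEGORIES:
        rows = query_tool(category, time_range)
        # An empty result is reported as-is; there is no retry with a
        # widened or shifted time range.
        report[category] = rows if rows else NO_MATCH
    return report


# Stand-in for the real data tool (hypothetical): records each call and
# returns canned rows for one category only.
calls = []


def fake_tool(category, time_range):
    calls.append((category, time_range))
    canned = {"inferior Class V sections": ["Section A"]}
    return canned.get(category, [])


report = collect_problem_sections(fake_tool, "2025-09")
for category, outcome in report.items():
    print(f"{category}: {outcome}")
```

Enforcing the no-retry rule in code as well as in the prompt is one way to guard against the model silently changing the time range to force a result.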

Figure 1. Schematic diagram of the AI-driven national intelligent eco-environmental monitoring network. The AI-integrated component, EnvSentry, comprises data governance across space–air–ground–sea data sources and decision-making.

Figure 2. EnvSentry evolutionary and iterative framework.

Figure 3. Statistics of documents and knowledge chunks across the eight major categories of the eco-environmental knowledge base.

Figure 4. EnvSentry data pipeline: from multi-source data and expert knowledge to intelligent decision-making.

Figure 5. Iterative knowledge structuring workflow in EnvSentry.

Figure 6. RAG performance in EnvSentry.

Figure 7. Workflow of EnvSentry as applied to eco-environmental monitoring problems.

Figure 8. Comparative analysis of eco-environmental monitoring task completion time with and without EnvSentry.

Table 1. Quantitative evaluation results of three LLM-based systems in eco-environmental monitoring tasks.

Model	Factual Accuracy	Temporal Specificity	Structural Completeness	Operational Practicality	Overall Score
ChatGPT-5	62	88	86	60	74
DeepSeek V3.1	76	88	86	78	82
EnvSentry	92	88	86	90	89
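
The Overall Score column in Table 1 matches the unweighted arithmetic mean of the four dimension scores. The short check below recomputes it from the table values. That the authors weighted the dimensions equally by design is an inference from the numbers, not something stated in the table.

```python
# Scores copied from Table 1: four dimension scores (Factual Accuracy,
# Temporal Specificity, Structural Completeness, Operational Practicality)
# followed by the reported Overall Score.
TABLE_1 = {
    "ChatGPT-5": (62, 88, 86, 60, 74),
    "DeepSeek V3.1": (76, 88, 86, 78, 82),
    "EnvSentry": (92, 88, 86, 90, 89),
}


def unweighted_mean(scores):
    """Arithmetic mean with equal weight on every dimension."""
    return sum(scores) / len(scores)


recomputed = {}
for model, (*dimensions, reported) in TABLE_1.items():
    recomputed[model] = unweighted_mean(dimensions)
    print(f"{model}: recomputed {recomputed[model]:.1f}, reported {reported}")
```

Because all three systems tie on Temporal Specificity (88) and Structural Completeness (86), the differences in Overall Score come entirely from Factual Accuracy and Operational Practicality.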

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
