Automating the Construction of Environmental Policy Knowledge Graph with Large Language Models
Abstract
1. Introduction
2. Methods
2.1. IEO Framework: Iterative Knowledge-Augmented Construction
- Extractor: The three-phase extractor runs NER → relation extraction → validation on the current text chunk, with the background generated by Equation (2) injected into the prompts; the output is a set of candidate triples.
- Optimizer: The two-phase optimizer performs structural cleaning and semantic normalization to produce a set of normalized triples.
- Dynamic context: A dynamic context is generated via density peak clustering, which selects associated text chunks by parent/child/sibling node type and by importance score; the two quantities underlying this selection are calculated by Equations (3) and (4), respectively.
- Historical graph: A graph is constructed from the historical triples; the subgraph most relevant to the current entities serves as background.
- Dynamic Context Enhancement: When processing each new text chunk, the model can reference a dynamically generated background knowledge base, which facilitates a more comprehensive resolution of complex and long-distance relationships between entities.
- Early Error Correction: By performing immediate validation and optimization on newly generated triples in each iteration, local errors can be effectively identified and corrected. This prevents them from accumulating and amplifying during the construction process, thereby reducing the burden of global fusion for the final graph.
- Progressive Knowledge Discovery: As the background knowledge base grows, the framework can uncover deeper and more implicit entity relationships within the text, contributing to the construction of a more semantically coherent and consistent knowledge graph.
1. Extraction: A set of candidate triples is generated from the current text chunk using the three-phase extractor, with the subgraph generated from historical triples and the associated text chunks selected by the density-based context perception (DBCP) mechanism serving as background knowledge.
2. Optimization: The newly generated candidate triples are cleaned and normalized using the two-phase optimizer.
3. Knowledge Augmentation: The optimized triples are merged into the triple pool, and the updated pool is carried into the next iteration.
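The three steps above can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `run_ieo`, `toy_extract`, and `toy_optimize` are hypothetical names, the extractor is a stand-in for the LLM-based three-phase extractor, and background selection (subgraph plus DBCP chunks) is reduced to passing the whole pool.

```python
from typing import Callable, Iterable

Triple = tuple[str, str, str]


def run_ieo(
    chunks: Iterable[str],
    extract: Callable[[str, list[Triple]], list[Triple]],
    optimize: Callable[[list[Triple]], list[Triple]],
) -> list[Triple]:
    """Process chunks in order, feeding the growing triple pool back as background."""
    pool: list[Triple] = []
    for chunk in chunks:
        candidates = extract(chunk, pool)  # step 1: extraction with background
        cleaned = optimize(candidates)     # step 2: cleaning and normalization
        for triple in cleaned:             # step 3: knowledge augmentation
            if triple not in pool:
                pool.append(triple)
    return pool


# Toy stand-ins (hypothetical): a real extractor would prompt an LLM.
def toy_extract(chunk: str, background: list[Triple]) -> list[Triple]:
    head, relation, tail = (part.strip() for part in chunk.split("|"))
    return [(head, relation, tail)]


def toy_optimize(triples: list[Triple]) -> list[Triple]:
    return [t for t in triples if all(t)]


pool = run_ieo(
    ["Law A | regulates | Permit B",
     "Law A | regulates | Permit B",
     "Law A | references | Law C"],
    toy_extract,
    toy_optimize,
)
print(pool)
```

Because validation happens inside the loop, a duplicate or malformed triple is dropped in the iteration that produced it rather than during a final global merge.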
2.2. IEO: Extraction Process
2.2.1. Density-Based Context Perception Mechanism
1. Semantic Hierarchy Network Construction: We map all text chunks into a high-dimensional semantic space using an embedding model. Subsequently, we apply the density peak clustering (DPC) algorithm [32] to calculate the local density and relative distance of each text chunk and to determine its “parent node,” which is the nearest node with a higher density.
2. Associated Context Search: For the current text chunk node i to be processed, we define three types of associated nodes (parent, child, and sibling nodes) to construct context. The selection of these three node types is a direct product of the semantic hierarchy network structure built by the DPC, ensuring the logical relevance and hierarchical nature of the context.
3. Context Assignment: After searching for associated nodes, we adopt a strategy that combines balanced allocation and importance ranking to select the final context. First, we determine the basic quota for each category from the preset total number of contexts and the number of non-empty node categories. Within each category, nodes are arranged in descending order of importance score, prioritizing key nodes in the network. This strategy ensures the diversity and quality of context sources, and a fallback mechanism (such as a cosine similarity supplement) is designed to handle isolated nodes and other special cases.
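The three steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the paper's exact procedure: the cutoff-based density, the tie-breaking by index, the sibling definition (nodes sharing the same parent), and the importance score `rho * delta` are assumptions, and the fallback for isolated nodes is omitted. All function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def density_peaks(vectors, cutoff):
    """Return (rho, delta, parent) for each vector, in the style of DPC."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    delta, parent = [0.0] * n, [None] * n
    for i in range(n):
        # Density ties are broken by index so that exactly one root remains.
        denser = [j for j in range(n) if (rho[j], -j) > (rho[i], -i)]
        if denser:
            parent[i] = min(denser, key=lambda j: dist[i][j])
            delta[i] = dist[i][parent[i]]
        else:
            delta[i] = max(dist[i])  # densest point: use its largest distance
    return rho, delta, parent


def associated_nodes(i, parent):
    """Group candidate context chunks by their role relative to chunk i."""
    p = parent[i]
    return {
        "parent": [p] if p is not None else [],
        "children": [j for j, q in enumerate(parent) if q == i],
        "siblings": [j for j, q in enumerate(parent)
                     if p is not None and q == p and j != i],
    }


def select_context(groups, importance, total):
    """Split the quota evenly over non-empty groups, ranking each by importance."""
    non_empty = [members for members in groups.values() if members]
    if not non_empty:
        return []  # the paper's fallback (e.g., cosine similarity) would apply here
    quota = max(1, total // len(non_empty))
    picked = []
    for members in non_empty:
        ranked = sorted(members, key=lambda j: importance[j], reverse=True)
        picked.extend(ranked[:quota])
    return picked[:total]


# Toy 2-D "embeddings" (hypothetical data).
vectors = [(1.0, 0.0), (0.98, 0.2), (0.95, 0.3), (0.0, 1.0), (0.1, 0.99)]
rho, delta, parent = density_peaks(vectors, cutoff=0.05)
importance = [r * d for r, d in zip(rho, delta)]  # assumed importance score
context = select_context(associated_nodes(2, parent), importance, total=4)
print(parent, context)
```

Splitting the quota across node types before ranking keeps one dense cluster from crowding out the parent or child context.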
2.2.2. Three-Phase Extraction Mechanism
- Phase 1: Named Entity Recognition (NER)
- Phase 2: Triplet Extraction
- Phase 3: Triplet Review
2.3. IEO: Optimization Process
2.3.1. Phase 1: Triplet Cleaning
- Syntactic and Structural Filtering: This step aims to eliminate the most basic level of noise. We begin by standardizing the text of all triplet components (head, relation, tail), for instance, by removing excess leading/trailing whitespace and meaningless special characters. Subsequently, we identify and remove three types of structurally invalid triplets: null-value triplets (any element is empty), semantically void triplets (elements containing placeholders like “null” or “N/A”), and irrational self-loops (e.g., the head and tail entities are identical in a non-reflexive relationship).
- Confidence-based Filtering: Leveraging the confidence score assigned to each triplet during the extraction phase, we establish a configurable threshold. All triplets with a confidence score below this threshold are filtered out.
- Exact Deduplication: We detect and remove completely identical triplets by concatenating the text of the (head, relation, tail) of each triplet into a unique string identifier. This step is a coarse-grained cleaning process that only considers whether the textual representation of the triplets is identical, without regard to their deeper semantics (such as the specific entity types). When duplicates are found, the version with the highest confidence score is retained.
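The three cleaning steps above could look roughly like the following sketch. The placeholder vocabulary, the set of reflexive relations, the default threshold of 0.5, and the dictionary layout of a triplet are all assumptions made for illustration; the section does not specify them.

```python
PLACEHOLDERS = {"null", "n/a", "none"}  # assumed placeholder vocabulary
REFLEXIVE = {"equivalent to"}           # assumed relations that may link an entity to itself
SEP = "\x1f"                            # unit separator, unlikely to occur in entity text


def clean_triplets(triplets, threshold=0.5):
    """Apply structural filtering, confidence filtering, and exact deduplication."""
    best = {}
    for t in triplets:
        # Standardize text: trim whitespace and surrounding quote characters.
        parts = [str(t.get(k) or "").strip(" \t\n\"'") for k in ("head", "relation", "tail")]
        head, relation, tail = parts
        if any(not p or p.lower() in PLACEHOLDERS for p in parts):
            continue  # null-value or semantically void triplet
        if head == tail and relation not in REFLEXIVE:
            continue  # self-loop on a non-reflexive relation
        confidence = float(t.get("confidence", 0.0))
        if confidence < threshold:
            continue  # below the configurable confidence threshold
        key = SEP.join(parts)  # concatenated string identifier
        if key not in best or confidence > best[key]["confidence"]:
            best[key] = {"head": head, "relation": relation, "tail": tail,
                         "confidence": confidence}
    return list(best.values())


# Hypothetical input covering each rejection rule.
raw = [
    {"head": " Law A ", "relation": "regulates", "tail": "Permit B", "confidence": 0.9},
    {"head": "Law A", "relation": "regulates", "tail": "Permit B", "confidence": 0.7},
    {"head": "Law C", "relation": "regulates", "tail": "N/A", "confidence": 0.95},
    {"head": "Agency D", "relation": "supervises", "tail": "Agency D", "confidence": 0.8},
    {"head": "Law C", "relation": "references", "tail": "Law E", "confidence": 0.3},
]
cleaned = clean_triplets(raw)
print(cleaned)
```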
2.3.2. Phase 2: Triplet Normalization
1. Relation Standardization: To resolve issues of synonymous or near-synonymous relationship expressions (e.g., “stipulates,” “requires,” “establishes”), we apply a set of predefined mapping rules to unify these varied expressions into a standardized set of relation predicates. This can enhance semantic coherence and query efficiency.
2. Conflict Detection and Resolution: We identify potential semantic conflicts by grouping triplets by entity pair. For example, the same entity pair may carry mutually exclusive triplets whose relations contradict each other. In such cases, we adopt a confidence-driven principle, automatically selecting the triplet with the highest confidence score as the final valid relation, thereby resolving internal logical contradictions.
3. Fine-grained Fusion: This step addresses deep semantic redundancies that “exact deduplication” cannot resolve. It merges semantically equivalent triplets by constructing a more stringent unique identifier that includes entity types (e.g., head name, head type, relation, tail name, tail type). This allows for the precise differentiation of homonymous but different-typed entities (e.g., distinguishing between “Apple” the company and “apple” the fruit), enabling more fine-grained knowledge fusion and ensuring the semantic consistency of the knowledge graph.
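A compact sketch of the three normalization steps follows. The mapping table, the list of mutually exclusive predicates, and the rule that fusion keeps the higher-confidence copy are assumptions for illustration; the section describes the steps but not their concrete rules.

```python
from collections import defaultdict

# Hypothetical mapping rules and exclusivity groups.
RELATION_MAP = {"stipulates": "requires", "establishes": "requires"}
EXCLUSIVE_RELATIONS = [{"permits", "prohibits"}]


def normalize(triplets):
    # 1. Relation standardization via predefined mapping rules.
    mapped = [{**t, "relation": RELATION_MAP.get(t["relation"], t["relation"])}
              for t in triplets]

    # 2. Conflict resolution: group by entity pair, keep the most confident rival.
    by_pair = defaultdict(list)
    for t in mapped:
        by_pair[(t["head"], t["tail"])].append(t)
    resolved = []
    for group in by_pair.values():
        for exclusive in EXCLUSIVE_RELATIONS:
            rivals = [t for t in group if t["relation"] in exclusive]
            if len({t["relation"] for t in rivals}) > 1:
                winner = max(rivals, key=lambda t: t["confidence"])
                group = [t for t in group
                         if t["relation"] not in exclusive or t is winner]
        resolved.extend(group)

    # 3. Fine-grained fusion keyed on names, types, and relation.
    fused = {}
    for t in resolved:
        key = (t["head"], t["head_type"], t["relation"], t["tail"], t["tail_type"])
        if key not in fused or t["confidence"] > fused[key]["confidence"]:
            fused[key] = t
    return list(fused.values())


def make(head, relation, tail, tail_type, confidence):
    return {"head": head, "head_type": "Law", "relation": relation,
            "tail": tail, "tail_type": tail_type, "confidence": confidence}


# Hypothetical triplets.
normalized = normalize([
    make("Law A", "stipulates", "Assessment B", "Requirement", 0.8),
    make("Law A", "requires", "Assessment B", "Requirement", 0.9),
    make("Law A", "permits", "Activity C", "Activity", 0.6),
    make("Law A", "prohibits", "Activity C", "Activity", 0.7),
])
print(normalized)
```

Including the entity types in the fusion key is what keeps two same-named entities of different types from being merged.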
3. Case
3.1. Corpus Construction
3.1.1. Data Collection
- Macro-strategic documents: Niger’s Sustainable Development and Inclusive Growth Strategy 2035, which establishes the top-level design for the national environment and development.
- Core laws and regulations: key legal texts in critical areas, including the Mining Law (2022 version), the Petroleum Law (2017/2007 versions), and the Environmental Management Framework Law (1998 version).
- Specific policies and guidelines: including the National Climate Change Adaptation Plan, the National Petroleum Policy, and the Country Guide for Foreign Investment and Cooperation—Niger (2024 Edition), published on China’s Belt and Road Portal.
- News and cases: covering policy dynamics and project cases between China and Niger, obtained from China’s Belt and Road Portal.
3.1.2. Text Preprocessing
- Text Standardization: As a cleaning step, we applied a set of standardization rules aimed at preserving the original semantic structure. This included handling special punctuation, normalizing paragraphs and line breaks, and removing excess whitespace, thereby ensuring format consistency for subsequent processing.
- Semantic-Aware Chunking: To preserve semantic coherence as much as possible while segmenting the text, we used the RecursiveCharacterTextSplitter from the LangChain framework. This method performs hierarchical splitting based on the syntactic structure of Chinese (e.g., periods, semicolons). We set a chunk size of 300 characters and an overlap of 20 characters. Through this step, the entire corpus was divided into 5277 independent text chunks.
- Vector Embedding: Finally, we used ZHIPU AI’s Embedding-3 model to map each text chunk to a high-dimensional semantic vector. This step completed the conversion from discrete text to a continuous semantic space, providing the foundation for the subsequent selection of text chunks based on DBCP.
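To make the standardization and chunking steps concrete, here is a simplified pure-Python sketch. It is not LangChain's RecursiveCharacterTextSplitter and will not reproduce its exact chunk boundaries; it only illustrates the idea of splitting on Chinese sentence punctuation and packing sentences into chunks of at most 300 characters with a 20-character overlap. The function names and the specific cleaning rules are assumptions.

```python
import re


def standardize(text):
    """Normalize line breaks and collapse excess whitespace."""
    text = text.replace("\r\n", "\n").replace("\u3000", " ")
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def split_sentences(text):
    """Split after sentence-level punctuation, keeping it with the sentence."""
    parts = re.split(r"(?<=[。；！？\n])", text)
    return [p for p in parts if p.strip()]


def chunk_text(text, chunk_size=300, overlap=20):
    """Greedily pack sentences into chunks no longer than chunk_size."""
    step = chunk_size - overlap
    chunks, current = [], ""
    for sentence in split_sentences(standardize(text)):
        # Hard-split any sentence too long to fit next to an overlap tail.
        pieces = [sentence[i:i + step] for i in range(0, len(sentence), step)]
        for piece in pieces:
            if current and len(current) + len(piece) > chunk_size:
                chunks.append(current)
                # Carry the tail of the previous chunk forward as overlap.
                current = current[-overlap:] if overlap else ""
            current += piece
    if current:
        chunks.append(current)
    return chunks


sample = "环境保护。" * 100  # 500 characters of toy text
chunks = chunk_text(sample, chunk_size=300, overlap=20)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the start of each chunk repeats the end of the previous one, which helps a relation that straddles a boundary appear intact in at least one chunk.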
3.2. Performance Evaluation Scheme
3.2.1. Benchmark and Comparative Methods
3.2.2. Evaluation Metrics
3.3. Construction and Application of the Knowledge Graph
3.3.1. Policy Network Analysis
- Multi-dimensional Regulatory Correlation Analysis: For example, as shown in Figure 4a, a multi-hop graph traversal can reveal the deeply embedded relationships of the Petroleum Law within Niger’s legal system. It can not only show its direct regulatory relationship with the Public Tax Law but also trace its legal origins back to the 1999 Constitution. This analysis examines environmental regulations within the broader context of the national fiscal and constitutional framework, revealing the multiple considerations in policymaking.
- Tracing Hidden Risk Pathways: For example, as shown in Figure 4b, by querying the path from “mining permit holder” to specific environmental impact entities, a key link can be identified. This link connects through regional organizations like the “Economic Community of West African States” to the “Climate Change and Environmental Degradation Risks and Adaptation Project” (CEDRA). This clearly demonstrates how local economic activities are linked to international and regional environmental governance frameworks, helping to trace complex risk transmission pathways. Such multi-layered, implicit relationships are difficult to uncover with traditional text analysis.
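The multi-hop traversals described above amount to path search over the triple set. The following sketch uses breadth-first search over an undirected view of the graph; `find_path` is a hypothetical helper, and the relation labels in the toy data are placeholders, since the section names the entities on the path but not the relations between them.

```python
from collections import defaultdict, deque


def find_path(triples, source, target, max_hops=4):
    """Return the shortest entity/relation path from source to target, or None."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph[tail].append((f"inverse of {relation}", head))
    queue = deque([(source, [source])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if (len(path) - 1) // 2 >= max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [relation, neighbor]))
    return None


# Toy graph: entity names follow the text, relation labels are placeholders.
triples = [
    ("mining permit holder", "related to", "Economic Community of West African States"),
    ("Economic Community of West African States", "related to", "CEDRA"),
    ("mining permit holder", "related to", "Mining Law"),
]
path = find_path(triples, "mining permit holder", "CEDRA")
print(" -> ".join(path))
```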
3.3.2. Aided Knowledge Retrieval and Analysis
- Multi-dimensional Policy Correlation Identification: Querying all association paths between the Petroleum Law and the Environmental Management Framework Law returns various types of connections, including legislative synergy, legal repeal and updates, and contractual compliance requirements, providing a factual basis for policy coherence analysis.
- Profiling Regulatory Systems in Specific Domains: By querying the theme of “environmental protection in the mining industry”, with the Environmental Management Framework Law as the core, the system aggregates all related international agreements, site remediation regulations, emission standards, etc., to dynamically generate a regulatory network view of the domain.
- Deconstruction of Macro-strategies: Querying the “environmental protection” measures contained in the 2035 Sustainable Development Strategy allows the graph to break them down into a series of specific components, such as associated petroleum policies, multi-departmental coordination plans, and rural development strategies, which helps to reveal the implementation pathways of macro-strategies.
- Identification of Policy Coordination Mechanisms: By analyzing all interaction paths between the two core concept nodes of “environmental protection” and “economic development”, various policy coordination mechanisms can be identified, such as policy formulation, fundraising, high-level coordination, and the application of economic instruments.
4. Discussion
- We propose and validate a new method for constructing knowledge graphs from complex domain-specific texts. The IEO framework surpasses the traditional, static “one-pass extraction” pipeline by introducing the idea of “dynamic iteration and progressive enhancement.” This approach offers a feasible path for using large language models to process specialized, context-dependent corpora.
- Through the case study of Niger’s environmental policy, we demonstrate the potential of transforming large volumes of unstructured policy texts into structured knowledge. The knowledge graph built by IEO is not just a collection of knowledge but also a powerful analytical tool capable of supporting compliance checks, risk pathway tracing, and multi-dimensional policy correlation analysis. This has practical application value for decision support and risk management in transnational projects and investments, such as those under the Belt and Road Initiative.
5. Conclusions
- Through its innovative iterative knowledge augmentation and density-based context perception mechanisms, the IEO framework performs well in the completeness of knowledge extraction. Its recall rate is superior to existing benchmark methods, helping to alleviate problems such as knowledge omission and long-range dependencies in complex texts.
- While achieving a high recall rate, the IEO framework maintains a competitive F1 score, indicating a good balance between knowledge coverage and accuracy. The framework demonstrates robust performance across different large language models, showcasing the potential universality of its methodology.
- In the case study of Niger’s environmental policy, we preliminarily constructed a structured knowledge graph for in-depth analysis and demonstrated its potential in revealing complex policy networks and supporting intelligent knowledge retrieval.
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Zhao, L.; Liu, J.; Li, D.; Yang, Y.; Wang, C.; Xue, J. China’s Green Energy Investment Risks in Countries along the Belt and Road. J. Clean. Prod. 2022, 380, 134938.
2. Mahadevan, R.; Sun, Y. Effects of Foreign Direct Investment on Carbon Emissions: Evidence from China and Its Belt and Road Countries. J. Environ. Manag. 2020, 276, 111321.
3. Cai, X.; Che, X.; Zhu, B.; Zhao, J.; Xie, R. Will Developing Countries Become Pollution Havens for Developed Countries? An Empirical Investigation in the Belt and Road. J. Clean. Prod. 2018, 198, 624–632.
4. Cheng, S.; Wang, B. Impact of the Belt and Road Initiative on China’s Overseas Renewable Energy Development Finance: Effects and Features. Renew. Energy 2023, 206, 1036–1048.
5. Duan, F.; Ji, Q.; Liu, B.-Y.; Fan, Y. Energy investment risk assessment for nations along China’s belt & road initiative. J. Clean. Prod. 2018, 170, 535–547.
6. Macintosh, A.; Wilkinson, D. Complexity Theory and the Constraints on Environmental Policymaking. J. Environ. Law 2016, 28, 65–93.
7. Vesterager, J.P.; Frederiksen, P.; Kristensen, S.B.P.; Vadineanu, A.; Gaube, V.; Geamana, N.A.; Pavlis, V.; Terkenli, T.S.; Bucur, M.M.; van der Sluis, T.; et al. Dynamics in national agri-environmental policy implementation under changing EU policy priorities: Does one size fit all? Land Use Policy 2016, 57, 764–776.
8. Meckling, J.; Kelsey, N.; Biber, E.; Zysman, J. Winning Coalitions for Climate Policy. Science 2015, 349, 1170–1171.
9. Macris, A.M.; Georgakellos, D.A. A New Teaching Tool in Education for Sustainable Development: Ontology-Based Knowledge Networks for Environmental Training. J. Clean. Prod. 2006, 14, 855–867.
10. Dong, L.; Ren, M.; Xiang, Z.; Zheng, P.; Cong, J.; Chen, C.-H. A Novel Smart Product-Service System Configuration Method for Mass Personalization Based on Knowledge Graph. J. Clean. Prod. 2023, 382, 135270.
11. Hofmeister, M.; Lee, K.F.; Tsai, Y.-K.; Müller, M.; Nagarajan, K.; Mosbach, S.; Akroyd, J.; Kraft, M. Dynamic Control of District Heating Networks with Integrated Emission Modelling: A Dynamic Knowledge Graph Approach. Energy AI 2024, 17, 100376.
12. Wu, P.; Tu, H.; Mou, X.; Gong, L. An Intelligent Energy Management Method for the Manufacturing Systems Using the Knowledge Graph and Large Language Model. J. Intell. Manuf. 2025, 1–20.
13. Wang, X.; Meng, L.; Wang, X.; Wang, Q. The Construction of Environmental-Policy-Enterprise Knowledge Graph Based on PTA Model and PSA Model. Resour. Conserv. Recycl. Adv. 2021, 12, 200057.
14. Islam, M.S.; Proma, A.; Zhou, Y.-S.; Akter, S.N.; Wohn, C.; Hoque, E. KnowUREnvironment: An Automated Knowledge Graph for Climate Change and Environmental Issues. In Proceedings of the AAAI 2022 Fall Symposium: The Role of AI in Responding to Climate Challenges, Arlington, VA, USA, 17–19 November 2022.
15. Zhong, L.; Wu, J.; Li, Q.; Peng, H.; Wu, X. A Comprehensive Survey on Automatic Knowledge Graph Construction. ACM Comput. Surv. 2024, 56, 1–62.
16. Val-Calvo, M.; Egaña Aranguren, M.; Mulero-Hernández, J.; Almagro-Hernández, G.; Deshmukh, P.; Bernabé-Díaz, J.A.; Espinoza-Arias, P.; Sánchez-Fernández, J.L.; Mueller, J.; Fernández-Breis, J.T. OntoGenix: Leveraging Large Language Models for Enhanced Ontology Engineering from Datasets. Inf. Process. Manag. 2025, 62, 104042.
17. Guo, L.; Yan, F.; Li, T.; Yang, T.; Lu, Y. An Automatic Method for Constructing Machining Process Knowledge Base from Knowledge Graph. Robot. Comput. Integr. Manuf. 2022, 73, 102222.
18. Hu, Y.; Zou, F.; Han, J.; Sun, X.; Wang, Y. LLM-TIKG: Threat Intelligence Knowledge Graph Construction Utilizing Large Language Model. Comput. Secur. 2024, 145, 103999.
19. Laver, M.; Benoit, K.; Garry, J. Extracting Policy Positions from Political Texts Using Words as Data. Am. Pol. Sci. Rev. 2003, 97, 311–331.
20. Wang, X.; Huang, L.; Daim, T.; Li, X.; Li, Z. Evaluation of China’s New Energy Vehicle Policy Texts with Quantitative and Qualitative Analysis. Technol. Soc. 2021, 67, 101770.
21. Egbemhenghe, A.U.; Ojeyemi, T.; Iwuozor, K.O.; Emenike, E.C.; Ogunsanya, T.I.; Anidiobi, S.U.; Adeniyi, A.G. Revolutionizing Water Treatment, Conservation, and Management: Harnessing the Power of AI-Driven ChatGPT Solutions. Environ. Chall. 2023, 13, 100782.
22. Liao, W.; Lu, X.; Fei, Y.; Gu, Y.; Huang, Y. Generative AI Design for Building Structures. Autom. Constr. 2024, 157, 105187.
23. Jurišević, N.; Kowalik, R.; Gordić, D.; Novaković, A.; Vukasinovic, V.; Rakić, N.; Nikolić, J.; Vukicevic, A. Large Language Models as Tools for Public Building Energy Management: An Assessment of Possibilities and Barriers. Int. J. Qual. Res. 2025, 19, 817–830.
24. Zhu, Y.; Wang, X.; Chen, J.; Qiao, S.; Ou, Y.; Yao, Y.; Deng, S.; Chen, H.; Zhang, N. LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities. World Wide Web 2024, 27, 58.
25. Carta, S.; Giuliani, A.; Piano, L.; Podda, A.S.; Pompianu, L.; Tiddia, S.G. Iterative Zero-Shot LLM Prompting for Knowledge Graph Construction. arXiv 2023, arXiv:2307.01128.
26. Polak, M.P.; Morgan, D. Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering. Nat. Commun. 2024, 15, 1569.
27. Wang, Z.; Chen, H.; Xu, G.; Ren, M. A Novel Large-Language-Model-Driven Framework for Named Entity Recognition. Inf. Process. Manag. 2025, 62, 104054.
28. Wei, X.; Cui, X.; Cheng, N.; Wang, X.; Zhang, X.; Huang, S.; Xie, P.; Xu, J.; Chen, Y.; Zhang, M.; et al. ChatIE: Zero-Shot Information Extraction via Chatting with ChatGPT. arXiv 2024, arXiv:2302.10205.
29. Chen, B.; Bertozzi, A.L. AutoKG: Efficient Automated Knowledge Graph Generation for Language Models. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023; pp. 3117–3126.
30. Yang, R.; Yang, B.; Feng, A.; Ouyang, S.; Blum, M.; She, T.; Jiang, Y.; Lecue, F.; Lu, J.; Li, I. Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective. arXiv 2024, arXiv:2410.17600.
31. Zhang, B.; Soh, H. Extract, Define, Canonicalize: An LLM-Based Framework for Knowledge Graph Construction. arXiv 2024, arXiv:2404.03868.
32. Rodriguez, A.; Laio, A. Clustering by Fast Search and Find of Density Peaks. Science 2014, 344, 1492–1496.
33. Gutiérrez, B.J.; Shu, Y.; Gu, Y.; Yasunaga, M.; Su, Y. HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models. Adv. Neural Inf. Process. Syst. 2024, 37, 59532–59569.
34. AI Powered Knowledge Graph Generator. Available online: https://github.com/robert-mcdermott/ai-knowledge-graph (accessed on 17 July 2025).
35. Castro Ferreira, T.; Gardent, C.; Ilinykh, N.; van der Lee, C.; Mille, S.; Moussallem, D.; Shimorina, A. The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task: Overview and Evaluation Results (WebNLG+ 2020). In Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), Virtual, Ireland, 18 December 2020; Castro Ferreira, T., Gardent, C., Ilinykh, N., van der Lee, C., Mille, S., Moussallem, D., Shimorina, A., Eds.; Association for Computational Linguistics: Dublin, Ireland, 2020; pp. 55–76.
36. Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. arXiv 2019, arXiv:1904.09675.
37. Schütze, H.; Manning, C.D.; Raghavan, P. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008; Volume 39.
38. Zhang, W.; Guo, H.; Yang, J.; Tian, Z.; Zhang, Y.; Chaoran, Y.; Li, Z.; Li, T.; Shi, X.; Zheng, L.; et al. mABC: Multi-Agent Blockchain-Inspired Collaboration for Root Cause Analysis in Micro-Services Architecture. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; Association for Computational Linguistics: Miami, FL, USA, 2024; pp. 4017–4033.
39. Pei, C.; Wang, Z.; Liu, F.; Li, Z.; Liu, Y.; He, X.; Kang, R.; Zhang, T.; Chen, J.; Li, J.; et al. Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis. In Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, Sydney, Australia, 28 April–2 May 2025; Association for Computing Machinery: New York, NY, USA, 2025; pp. 422–431.





| Metric | HippoRAG | AIKG | IEO |
|---|---|---|---|
| Micro-Precision | 0.92235 | 0.8902 | 0.76335 |
| Micro-Recall | 0.6919 | 0.8259 | 0.93025 |
| Micro-F1 | 0.78395 | 0.85545 | 0.8368 |
| Macro-Precision | 0.8443 | 0.887 | 0.75365 |
| Macro-Recall | 0.7155 | 0.8498 | 0.9484 |
| Macro-F1 | 0.74175 | 0.83805 | 0.80975 |
| Total Reference Triplets | 2809 | 2809 | 2809 |
| Total Candidate Triplets | 2116.5 | 2612.5 | 3442.5 |
| Matched Triplets | 1943.5 | 2320 | 2613 |
| Missed Triplets | 865.5 | 489 | 196 |
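
As a rough cross-check of the table, the micro-level metrics can be recomputed from the triplet counts with the standard precision, recall, and F1 definitions. This is a sketch under an assumption: the fractional counts suggest the rows are averages over several runs, so ratios of averaged counts only approximate the reported averaged metrics. The function name `micro_prf` is hypothetical.

```python
def micro_prf(matched, candidates, references):
    """Standard precision, recall, and F1 from matched/candidate/reference counts."""
    precision = matched / candidates
    recall = matched / references
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# IEO column of the table above.
precision, recall, f1 = micro_prf(matched=2613, candidates=3442.5, references=2809)
missed = 2809 - 2613
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} missed={missed}")
```

The recomputed recall agrees closely with the reported Micro-Recall, while precision differs in the third decimal place, which is consistent with the averaging assumption.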
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, Y.; Liu, X.; Tu, X.; Lu, Y.; Wang, Y. Automating the Construction of Environmental Policy Knowledge Graph with Large Language Models. Sustainability 2025, 17, 10282. https://doi.org/10.3390/su172210282

