Research on Large Language Model-Based Automatic Knowledge Extraction for Coal Mine Equipment Safety
Abstract
1. Introduction
- 1. Because no public dataset exists for the coal mine equipment safety domain, this study builds a dedicated corpus from 194 safety-related documents. The documents are processed with OCR, segmented with regular expressions, and manually annotated on the Doccano platform under expert guidance, with dual-layer verification by three safety engineers and an inter-annotator agreement of 0.85. The result is a high-quality dataset tailored to domain-specific knowledge extraction.
- 2. Traditional triples (head entity–relationship–tail entity) give head and tail entities unequal semantic status, and relation extraction tends to favor one-sided semantics. To address these problems in the coal mine equipment safety field, we propose a knowledge extraction framework based on symmetric constraints. Its core is a quintuple representation (head entity–head entity type–relationship–tail entity–tail entity type). Imposing a symmetry constraint on the head entity type and the tail entity type in the prompt gives the head and tail entities a mirror-image structure in the type dimension. This symmetric design balances the semantic status of the two entities and guides the model to capture two-way semantic associations more accurately, so that entity recognition and relation extraction can be completed jointly within a single model.
- 3. Three sets of experiments were conducted. First, the proposed framework was compared with the traditional step-by-step method, which extracts entities first and relations second, on Ernie-4, ChatGLM4-9B, Qwen-plus, and DeepSeek-R1. Second, on DeepSeek-R1, the best-performing large language model, the effects of three prompt strategies (zero-shot, few-shot, and chain-of-thought (CoT)) on knowledge extraction were examined. Third, the effect of model scale was examined using DeepSeek-R1-distill-qwen-14b/32b, Qwen2.5-14B/32B, Qwen-plus, and DeepSeek-R1. The results demonstrate the crucial role of extraction paradigm selection in acquiring domain-specific knowledge, provide empirical evidence on the boundaries of model capability in different extraction scenarios, and offer a practical guide for implementing coal mine equipment safety knowledge extraction with minimal supervision and configuration.
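The quintuple representation in the second contribution can be illustrated with a small data structure. The sketch below is only an illustration, not the authors' implementation: the paper imposes the symmetry constraint inside the prompt, whereas this code applies the same type check to head and tail after extraction. The class name `Quintuple` and its methods are hypothetical; the entity types are those listed in the ontology table.

```python
from dataclasses import dataclass

# Entity types as listed in the paper's ontology table.
ENTITY_TYPES = frozenset({
    "Organizations and Personnel",
    "Activities and Operations",
    "Equipment and Tools",
    "Safety Management",
    "Environment and Conditions",
    "Regulations and Standards",
})


@dataclass(frozen=True)
class Quintuple:
    """Hypothetical container for one extracted knowledge item."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str

    def type_errors(self) -> list:
        """Run the identical check on head and tail (the symmetric part)."""
        errors = []
        for role, entity, etype in (
            ("head", self.head, self.head_type),
            ("tail", self.tail, self.tail_type),
        ):
            if not entity.strip():
                errors.append(f"{role} entity is empty")
            if etype not in ENTITY_TYPES:
                errors.append(f"{role} type {etype!r} is not in the ontology")
        return errors

    def entities(self) -> set:
        """Typed entities implied by the quintuple (the entity-recognition view)."""
        return {(self.head, self.head_type), (self.tail, self.tail_type)}

    def triple(self) -> tuple:
        """Classic (head, relation, tail) view for relation extraction."""
        return (self.head, self.relation, self.tail)


# Example item taken from the Result row of the prompt table.
example = Quintuple(
    head="drilling operations",
    head_type="Activities and Operations",
    relation="requires",
    tail="Hydrogen sulfide protection measures",
    tail_type="Environment and Conditions",
)
```

Because one quintuple yields both typed entities and a relation triple, a single model output can be scored for entity recognition and relation extraction without a second extraction pass.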
2. Related Research
3. Materials and Methods
3.1. Dataset Construction
3.1.1. Data Acquisition and Pre-Processing
3.1.2. Ontology Definition
3.1.3. Data Annotation
3.2. Entity–Relation Extraction
3.2.1. Models
3.2.2. Symmetric Joint Entity–Relation Extraction
3.2.3. Prompt Strategy
3.2.4. The Crucial Role of Symmetry in Knowledge Extraction
4. Experimental Results and Discussion
4.1. Experimental Environment
4.2. Computational Efficiency Analysis
4.3. Indicators for Model Evaluation
4.4. Experiments and Analysis of Results
4.4.1. Baseline Comparison Experiment
4.4.2. Prompt Ablation Experiment
4.4.3. Model Parameter Experiment
4.5. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Entity Type |
---|
Organizations and Personnel |
Activities and Operations |
Equipment and Tools |
Safety Management |
Environment and Conditions |
Regulations and Standards |
Relationship Type | Relationship Type | Relationship Type |
---|---|---|
shall | requires | allowed |
shall_comply | contains | without_approval |
prohibited | performs | is_a |
strictly_prohibited | configures | belongs_to |
must | | |
Statistic | Value |
---|---|
Total entity pairs (triples) | 1018 |
Entity Type Distribution | |
Organizations and Personnel | 428 (21.0%) |
Equipment and Tools | 624 (30.6%) |
Activities and Operations | 476 (23.4%) |
Environment and Conditions | 344 (16.9%) |
Safety Management | 136 (6.7%) |
Regulations and Standards | 28 (1.4%) |
Relation Type Distribution | |
shall | 337 (33.1%) |
must | 139 (13.7%) |
prohibited | 112 (11.0%) |
strictly_prohibited | 76 (7.5%) |
shall_comply | 134 (13.2%) |
is_a | 42 (4.1%) |
performs | 38 (3.7%) |
contains | 25 (2.5%) |
requires | 48 (4.7%) |
belongs_to | 17 (1.7%) |
allowed | 15 (1.5%) |
without_approval | 12 (1.2%) |
configures | 23 (2.3%) |
Average sentence length (words) | 32.7 |
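The percentages in the statistics table can be recomputed from the raw counts. In the sketch below, the counts are copied from the table, while the variable and function names are illustrative. The entity counts sum to 2036, exactly twice the 1018 pairs, which is consistent with counting one head and one tail entity per pair; that reading is an inference from the numbers, not a statement from the paper.

```python
TOTAL_PAIRS = 1018  # "Total entity pairs (triples)" from the table

entity_counts = {
    "Organizations and Personnel": 428,
    "Equipment and Tools": 624,
    "Activities and Operations": 476,
    "Environment and Conditions": 344,
    "Safety Management": 136,
    "Regulations and Standards": 28,
}

relation_counts = {
    "shall": 337, "must": 139, "prohibited": 112, "strictly_prohibited": 76,
    "shall_comply": 134, "is_a": 42, "performs": 38, "contains": 25,
    "requires": 48, "belongs_to": 17, "allowed": 15, "without_approval": 12,
    "configures": 23,
}


def shares(counts: dict) -> dict:
    """Percentage of each category, rounded to one decimal as in the table."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


entity_share = shares(entity_counts)
relation_share = shares(relation_counts)

# Consistency checks against the reported total.
entity_total = sum(entity_counts.values())      # 2036
relation_total = sum(relation_counts.values())  # 1018
```

The same shares make the class imbalance explicit: the four normative relations `shall`, `must`, `prohibited`, and `shall_comply` each exceed 10%, while `without_approval` accounts for only 1.2% of the relations.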
Variable | Value |
---|---|
Role | You are a knowledge extraction expert. |
Ontology | Entity types: Organizations and Personnel: CBM enterprises, contractors, workers, certified personnel. Activities and Operations: design, drilling, cementing, logging, fracturing, extraction. Equipment and Tools: safety devices, pumpjacks, compressors, ESPs, gas–water separators. Safety Management: safety protocols, training programs, emergency response plans. Environment and Conditions: sites, pipelines, explosion-proof requirements. Laws and Regulations: laws, administrative regulations, rules, standards, technical codes. Relation list: ['should', 'belongs to', 'requires', 'includes', 'stipulates', 'meets requirements', 'is prohibited', 'is', 'is conducted by', 'must not', 'guides', 'is allowed'] |
Instruction | 1. Read and understand the safety regulation content. 2. Identify the entities and their types (e.g., “Activities and Operations”, “Equipment and Tools”). 3. Identify the relation between entities (e.g., “should”, “requires”, “belongs to”). 4. Represent each knowledge point as a five-tuple in the form: ('head entity', 'relation', 'tail entity', 'head entity type', 'tail entity type'). 5. If a noun is modified by an adjective or phrase, include the full noun phrase as the entity. 6. If multiple knowledge points appear in one sentence, create one five-tuple for each. 7. Avoid redundancy and ambiguity in tuple outputs. 8. Maintain semantic fidelity while allowing reordering to ensure relation terms match the relation list. 9. Use “includes” to express hierarchical or conditional relationships between entities. |
Content | Given text: “When conducting drilling operations in CBM enterprises, Hydrogen sulfide protection measures must be taken.” |
Result | ('CBM enterprises', 'conducts', 'drilling operations', 'Organizations and Personnel', 'Activities and Operations'), ('drilling operations', 'requires', 'Hydrogen sulfide protection measures', 'Activities and Operations', 'Environment and Conditions') |
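The prompt table pairs four input variables (Role, Ontology, Instruction, Content) with a Result made of five-tuples. The sketch below shows one way such a prompt could be assembled and the returned text parsed back into tuples. It is a minimal illustration under stated assumptions: the function names, the blank-line layout of the prompt, and the parsing strategy are hypothetical and not taken from the paper. The sample output is written with typographic quotes to exercise the quote normalization step.

```python
import ast
import re

# Relation list copied from the Ontology row of the prompt table.
RELATION_LIST = [
    "should", "belongs to", "requires", "includes", "stipulates",
    "meets requirements", "is prohibited", "is", "is conducted by",
    "must not", "guides", "is allowed",
]

# Map typographic quotes to ASCII so the tuples parse as Python literals.
_QUOTES = str.maketrans({"‘": "'", "’": "'", "“": '"', "”": '"'})
_TUPLE_BODY = re.compile(r"\(([^()]*)\)")


def build_prompt(role: str, ontology: str, instruction: str, content: str) -> str:
    """Join the four prompt variables in table order (assumed layout)."""
    return "\n\n".join([role, ontology, instruction, content])


def parse_five_tuples(raw: str) -> list:
    """Return every well-formed five-tuple of strings found in the output."""
    found = []
    for body in _TUPLE_BODY.findall(raw.translate(_QUOTES)):
        try:
            value = ast.literal_eval(f"({body},)")
        except (ValueError, SyntaxError):
            continue  # skip malformed tuples rather than guessing a repair
        if len(value) == 5 and all(isinstance(v, str) for v in value):
            found.append(value)
    return found


def off_list_relations(tuples: list) -> list:
    """Relations (second field) that do not appear in RELATION_LIST."""
    return [t[1] for t in tuples if t[1] not in RELATION_LIST]


# Sample output copied from the Result row of the prompt table.
raw_output = (
    "(’CBM enterprises’, ’conducts’, ’drilling operations’, "
    "’Organizations and Personnel’, ’Activities and Operations’), "
    "(’drilling operations’, ’requires’, ’Hydrogen sulfide protection measures’, "
    "’Activities and Operations’, ’Environment and Conditions’)"
)
parsed = parse_five_tuples(raw_output)
flagged = off_list_relations(parsed)
```

This parser is deliberately strict: entities that contain apostrophes or parentheses fail to parse and are skipped. On the sample Result, the check against the relation list flags `conducts`, which is not among the relations listed in the Ontology row, so a post-processing step of this kind may be useful when relation terms must match the list.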
Model (NER task) | P | R | F1 |
---|---|---|---|
Ernie-4-SERE | 0.1258 | 0.1031 | 0.1133 |
ChatGLM4-9B-SERE | 0.1074 | 0.1073 | 0.1073 |
Qwen-plus-SERE | 0.1338 | 0.1313 | 0.1325 |
DeepSeek-R1-SERE | 0.0988 | 0.1479 | 0.1185 |
Ernie-4-JERE | 0.3219 | 0.2458 | 0.2787 |
ChatGLM4-9B-JERE | 0.4050 | 0.3708 | 0.3872 |
Qwen-plus-JERE | 0.3516 | 0.3062 | 0.3273 |
DeepSeek-R1-JERE | 0.2920 | 0.3188 | 0.3048 |
Model (RE task) | P | R | F1 |
---|---|---|---|
Ernie-4-SERE | 0.5869 | 0.6131 | 0.5997 |
ChatGLM4-9B-SERE | 0.4325 | 0.6433 | 0.5173 |
Qwen-plus-SERE | 0.3179 | 0.4490 | 0.3723 |
DeepSeek-R1-SERE | 0.2737 | 0.4729 | 0.3468 |
Ernie-4-JERE | 0.7589 | 0.5716 | 0.6521 |
ChatGLM4-9B-JERE | 0.8232 | 0.7118 | 0.7635 |
Qwen-plus-JERE | 0.7518 | 0.6656 | 0.7061 |
DeepSeek-R1-JERE | 0.7668 | 0.8376 | 0.8006 |
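The P, R, and F1 columns in the two tables above are precision, recall, and their harmonic mean. The sketch below computes them from sets of predicted and gold items and checks the harmonic-mean formula against one reported row. Exact-match comparison of whole tuples is an assumption here, since the matching criterion is not stated in this part of the paper; the function names and the toy items are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def prf(predicted: set, gold: set) -> tuple:
    """Precision, recall, and F1 under exact-match scoring (an assumption)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy data for illustration only; the second predicted triple is made up.
gold = {
    ("CBM enterprises", "conducts", "drilling operations"),
    ("drilling operations", "requires", "Hydrogen sulfide protection measures"),
}
predicted = {
    ("drilling operations", "requires", "Hydrogen sulfide protection measures"),
    ("workers", "shall", "wear helmets"),
}
p, r, f1 = prf(predicted, gold)

# Reported DeepSeek-R1-JERE row of the second table: P = 0.7668, R = 0.8376.
reported_f1 = round(f1_score(0.7668, 0.8376), 4)
```

Recomputing F1 from the reported precision and recall of the DeepSeek-R1-JERE row gives 0.8006, which agrees with the table.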
Prompt | Task | DeepSeek-R1-JERE P | DeepSeek-R1-JERE R | DeepSeek-R1-JERE F1 | Qwen-Plus-JERE P | Qwen-Plus-JERE R | Qwen-Plus-JERE F1 |
---|---|---|---|---|---|---|---|
zero-shot | RE | 0.754 | 0.8057 | 0.779 | 0.6591 | 0.8806 | 0.7539 |
zero-shot | NER | 0.2529 | 0.2718 | 0.262 | 0.2413 | 0.275 | 0.2571 |
few-shot | RE | 0.7668 | 0.8376 | 0.8006 | 0.7267 | 0.9061 | 0.8065 |
few-shot | NER | 0.292 | 0.3188 | 0.3048 | 0.2375 | 0.2771 | 0.2558 |
CoT | RE | 0.817 | 0.7962 | 0.8065 | 0.7844 | 0.9268 | 0.8496 |
CoT | NER | 0.4779 | 0.4729 | 0.4754 | 0.2453 | 0.3000 | 0.2699 |
Model | Task | P | R | F1 |
---|---|---|---|---|
DeepSeek-R1-distill-qwen-14b | RE | 0.6945 | 0.7277 | 0.7107 |
DeepSeek-R1-distill-qwen-14b | NER | 0.2207 | 0.2198 | 0.2203 |
DeepSeek-R1-distill-qwen-32b | RE | 0.7643 | 0.8933 | 0.8238 |
DeepSeek-R1-distill-qwen-32b | NER | 0.2243 | 0.25 | 0.2365 |
DeepSeek-R1 | RE | 0.7668 | 0.8376 | 0.8006 |
DeepSeek-R1 | NER | 0.292 | 0.3188 | 0.3048 |
Qwen2.5-14B | RE | 0.8328 | 0.7930 | 0.8124 |
Qwen2.5-14B | NER | 0.2372 | 0.2271 | 0.2320 |
Qwen2.5-32B | RE | 0.7884 | 0.8901 | 0.8362 |
Qwen2.5-32B | NER | 0.2566 | 0.2854 | 0.2702 |
Qwen-plus | RE | 0.7518 | 0.6656 | 0.7061 |
Qwen-plus | NER | 0.3516 | 0.3062 | 0.3273 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Z.; Ding, R.; Liu, Y.; Ma, H. Research on Large Language Model-Based Automatic Knowledge Extraction for Coal Mine Equipment Safety. Symmetry 2025, 17, 1490. https://doi.org/10.3390/sym17091490