Enhancing Geological Knowledge Engineering with Retrieval-Augmented Generation: A Case Study of the Qin–Hang Metallogenic Belt
Abstract
1. Introduction
2. Related Research
2.1. Geological Background and Research Significance
2.2. Research on the ChatGLM3-6B Model
2.3. Overview of the LangChain Open-Source Framework
- Observability Layer: At the top level, LangSmith offers tools to monitor and optimize application performance in real time. Debugging: lets developers inspect applications during development to quickly identify and resolve issues. Playground: an interactive testing platform for verifying application performance and behavior under real-world conditions. Evaluation: provides performance and accuracy assessment, helping improve response quality. Annotation: facilitates data annotation and labeling for effective data management during testing and training. Monitoring: offers real-time application monitoring to identify potential issues and support preventive maintenance. Together, these tools give LangChain comprehensive debugging and monitoring capabilities, allowing developers to test, analyze, and improve application performance effectively.
- Deployments Layer: This layer includes LangServe and Templates, which focus on application deployment and scaffolding. LangServe: deploys chain-based tasks as REST APIs, allowing LangChain applications to integrate seamlessly into broader systems. Templates: preconfigured application templates that help developers quickly build and configure applications for common tasks. The deployment layer simplifies application integration and expansion, making it easier for developers to publish applications as REST APIs or to set up specific applications quickly from templates.
- Cognitive Architectures Layer: Comprising LangChain's core modules, this layer is essential for building intelligent applications. Chains: support chain-based task execution, allowing developers to create workflows by linking operational steps. Agents: intelligent task units that make decisions dynamically, choosing tools or executing specific tasks according to requirements. Retrieval strategies: offer various methods for extracting relevant information from large datasets, for example in knowledge-based Q&A. These cognitive architectures give developers the tools to build complex, adaptable applications capable of automated decision-making and multistep tasks.
- Integration Components Layer: Contains the LangChain-Community module, which integrates components for model interaction and retrieval. Model I/O: components for model calls, prompt templates, example selectors, and output parsing. Retrieval: supports functions such as document loading, vector storage, and embedding models for precise information retrieval. Agent Tooling: toolkits that enhance agent interactivity and data processing within applications. These components enable rapid setup and customization of applications that interact with LLMs, increasing flexibility and scalability.
- Protocol Layer: The LangChain Expression Language (LCEL) in this layer enables essential features such as parallelization, fault handling, and asynchronous execution, providing low-level support that ensures high efficiency and scalability across the other modules (a minimal LCEL sketch follows this list).
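To make the protocol layer concrete, the following is a minimal LCEL sketch (an illustrative setup, not the code used in this study): a prompt template, a chat model, and an output parser are composed into a single runnable with the `|` operator. Here ChatGLM3-6B is assumed to be served behind a local OpenAI-compatible endpoint; the URL, model name, and prompt wording are placeholders.

```python
# Minimal LCEL sketch: prompt -> chat model -> parser composed with "|".
# The endpoint URL, api_key, and model name below are illustrative assumptions.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY", model="chatglm3-6b")
prompt = ChatPromptTemplate.from_template(
    "Answer the geological question using the given context.\n"
    "Context: {context}\nQuestion: {question}"
)
chain = prompt | llm | StrOutputParser()  # an LCEL runnable: supports invoke/stream/batch

answer = chain.invoke({"context": "...", "question": "What is the Qin-Hang metallogenic belt?"})
print(answer)
```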
2.4. Overview of LangChain-Chatchat
3. Knowledge Base Construction
3.1. Technical Approach
- Data Loading and Preprocessing: Various file types within the local knowledge base (e.g., HTML, MD, JSON, CSV, PDF, and TXT) are parsed via an OCR parser to extract content, converting it to a standard text format for further processing. An unstructured data loader ensures compatibility with subsequent processing steps. The extracted text is then segmented with the RecursiveCharacterTextSplitter on the basis of predefined content and length parameters, maintaining logical coherence at the sentence or paragraph level (see the ingestion sketch after this list).
- Text Vectorization: Each text segment is converted into a vector representation via the bge-large-zh-v1.5 embedding model to capture its semantic features. This process encodes each segment as a high-dimensional vector, which is stored in a vector database (Faiss) to facilitate similarity-based retrieval. Text vectorization not only retains semantic content but also enables efficient similarity searches through vector operations.
- Vector Indexing for Retrieval: All text vectors are stored in the Faiss database to construct a vector index, enabling approximate nearest neighbor (ANN) searches. When a user submits a question, the system can quickly identify similar text segments within the vector space. To optimize retrieval speed, the system chooses an appropriate indexing structure (e.g., HNSW or IVF) on the basis of the size of the knowledge base and its use cases, allowing rapid, accurate retrieval of relevant information from large datasets (see the index-structure sketch after this list).
- Query Vectorization and Retrieval: Upon receiving a user query, the system first vectorizes the question. It then matches this query vector against the stored vectors in the database to retrieve the top-K most relevant text segments. These retrieved segments serve as context, forming a “question + context” prompt that is fed into the generative language model to enhance response relevance (see the query-stage sketch after this list).
- Answer Generation: The ChatGLM3-6B generative model processes the “question + context” prompt, synthesizing a response that draws on both the model’s intrinsic knowledge and the retrieved content. By accessing up-to-date external information from the knowledge base, the RAG approach enhances response accuracy and relevance, allowing the model to generate answers that are better tailored to complex user queries.
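To illustrate the loading, splitting, vectorization, and indexing steps described above, here is a minimal ingestion sketch assuming LangChain's community integrations; the file path and chunking parameters are illustrative, not the values used for the Qin–Hang knowledge base.

```python
# Ingestion sketch: load text, split it, embed chunks with bge-large-zh-v1.5,
# and build a Faiss index via LangChain's wrapper. Paths/parameters are assumptions.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

docs = TextLoader("qin_hang_corpus.txt", encoding="utf-8").load()     # illustrative path
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)                                # coherent sentence/paragraph segments

embedder = HuggingFaceEmbeddings(model_name="BAAI/bge-large-zh-v1.5")  # 1024-dimensional vectors
vector_store = FAISS.from_documents(chunks, embedder)                  # in-memory Faiss index
vector_store.save_local("kb_index")                                    # persist for query time
```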
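The index-structure choice (HNSW versus IVF) can also be made directly in Faiss; the snippet below is a small illustration with random stand-in vectors and illustrative parameters, not the configuration used in this study.

```python
# HNSW vs. IVF in raw Faiss; dimension 1024 matches bge-large-zh-v1.5 embeddings.
import faiss
import numpy as np

d = 1024
vectors = np.random.rand(10_000, d).astype("float32")  # stand-in for real chunk embeddings

# HNSW: graph-based ANN index, no training step, strong recall for mid-sized corpora.
hnsw = faiss.IndexHNSWFlat(d, 32)            # 32 = neighbors per graph node
hnsw.add(vectors)

# IVF: partitions the space into nlist clusters and must be trained before adding data.
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 100)  # nlist = 100
ivf.train(vectors)
ivf.add(vectors)
ivf.nprobe = 10                              # clusters scanned per query (speed/recall trade-off)

distances, ids = hnsw.search(vectors[:1], 5)  # top-5 approximate nearest neighbors
```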
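Finally, the query-stage sketch below embeds the user question, retrieves the top-K chunks, assembles the "question + context" prompt, and calls ChatGLM3-6B. Loading the model through transformers with trust_remote_code is one common setup; the question, K, and paths are illustrative assumptions rather than the authors' exact configuration.

```python
# Query-stage sketch: retrieve top-K chunks and generate an answer with ChatGLM3-6B.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from transformers import AutoModel, AutoTokenizer

embedder = HuggingFaceEmbeddings(model_name="BAAI/bge-large-zh-v1.5")
store = FAISS.load_local("kb_index", embedder, allow_dangerous_deserialization=True)

question = "What deposit types occur in the Qin-Hang metallogenic belt?"
top_docs = store.similarity_search(question, k=3)                 # top-K relevant segments
context = "\n\n".join(doc.page_content for doc in top_docs)
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
response, _history = model.chat(tokenizer, prompt, history=[])    # ChatGLM3 chat interface
print(response)
```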
3.2. Data Collection and Preprocessing
4. Analysis and Evaluation of Intelligent Dialog Results
4.1. Dialog Interface
4.2. Comparative Analysis of Dialog Responses
4.3. Evaluation
4.3.1. Introduction to the BLEU Evaluation Method
4.3.2. Evaluation Metrics
4.3.3. Evaluation Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhou, Y.Z.; Xiao, F. Overview: A glimpse of the latest advances in artificial intelligence and big data geoscience research. Earth Sci. Front. 2024, 31, 1–6. [Google Scholar] [CrossRef]
- Liu, Y.; He, H.; Han, T.; Zhang, X.; Liu, M.; Tian, J.; Zhang, Y.; Wang, J.; Gao, X.; Zhong, T.; et al. Understanding LLMs: A Comprehensive Overview from Training to Inference. arXiv 2024, arXiv:2401.02038. [Google Scholar] [CrossRef]
- Liu, Y.; Han, T.; Ma, S.; Zhang, J.; Yang, Y.; Tian, J.; He, H.; Li, A.; He, M.; Liu, Z.; et al. Summary of chatgpt-related research and perspective towards the future of large language models. Meta-Radiology 2023, 1, 100017. [Google Scholar] [CrossRef]
- Boitel, E.; Mohasseb, A.; Haig, E. A Comparative Analysis of GPT-3 and BERT Models for Text-Based Emotion Recognition: Performance, Efficiency, and Robustness; UK Workshop on Computational Intelligence; Springer: Cham, Switzerland, 2023; pp. 567–579. [Google Scholar] [CrossRef]
- Anthropic, A.I. The Claude 3 Model Family: Opus, Sonnet, Haiku; 2024; Claude-3 Model Card; pp. 1–42. Available online: https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf (accessed on 29 July 2025).
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Scialom, T. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288. [Google Scholar] [CrossRef]
- Clusmann, J.; Kolbinger, F.R.; Muti, H.S. The future landscape of large language models in medicine. Commun. Med. 2023, 3, 141. [Google Scholar] [CrossRef] [PubMed]
- Karabacak, M.; Margetis, K. Embracing Large Language Models for Medical Applications: Opportunities and Challenges. Cureus 2023, 15, e39305. [Google Scholar] [CrossRef]
- Castro, A.; Pinto, J.; Reino, L.; Pipek, P.; Capinha, C. Large language models overcome the challenges of unstructured text data in ecology. bioRxiv 2024, 82, 102742. [Google Scholar] [CrossRef]
- Tan, S.Z.; Zheng, Z.; Lu, X.Z. Exploring and Discussion on the Application of Large Language Models in Construction Engineering. Ind. Constr. 2023, 53, 162–169. [Google Scholar] [CrossRef]
- Smetana, M.; Salles de Salles, L.; Sukharev, I.; Khazanovich, L. Highway Construction Safety Analysis Using Large Language Models. Appl. Sci. 2024, 14, 1352. [Google Scholar] [CrossRef]
- Dudhee, V.; Vukovic, V. How large language models and artificial intelligence are transforming civil engineering. Proc. Inst. Civil Eng. 2023, 176, 4. [Google Scholar] [CrossRef]
- Chen, Z.; Lin, M.; Wang, Z.; Zang, M.; Bai, Y. PreparedLLM: Effective pre-pretraining framework for domain-specific large language models. Big Earth Data 2024, 8, 649–672. [Google Scholar] [CrossRef]
- Lin, M.; Jin, M.; Li, J.; Bai, Y. GEOSatDB: Global civil earth observation satellite semantic database. Big Earth Data 2024, 8, 522–539. [Google Scholar] [CrossRef]
- Wang, D.; Tong, X.; Dai, C.; Guo, C.; Lei, Y.; Qiu, C.; Li, H.; Sun, Y. Voxel modeling and association of ubiquitous spatiotemporal information in natural language texts. Int. J. Digit. Earth 2023, 16, 868–890. [Google Scholar] [CrossRef]
- Augenstein, I.; Baldwin, T.; Cha, M.; Chakraborty, T.; Ciampaglia, G.L.; Corney, D.; Zagni, G. Factuality challenges in the era of large language models and opportunities for fact-checking. Nat. Mach. Intell. 2024, 6, 852–863. [Google Scholar] [CrossRef]
- Weiser, B.; Schweber, N. Lawyer who used ChatGPT faces penalty for made up citations. New York Times. 8 June 2023, p. 8. Available online: https://www.nytimes.com/2023/06/08/nyregion/lawyer-chatgpt-sanctions.html (accessed on 27 December 2023).
- Opdahl, A.L.; Tessem, B.; Dang-Nguyen, D.T.; Motta, E.; Setty, V.; Throndsen, E.; Trattner, C. Trustworthy journalism through AI. Data Knowl. Eng. 2023, 146, 102182. [Google Scholar] [CrossRef]
- Shen, Y.; Heacock, L.; Elias, J.; Hentel, K.D.; Reig, B.; Shih, G.; Moy, L. ChatGPT and other large language models are double-edged swords. Radiology 2023, 307, e230163. [Google Scholar] [CrossRef]
- Farquhar, S.; Kossen, J.; Kuhn, L.; Gal, Y. Detecting hallucinations in large language models using semantic entropy. Nature 2024, 630, 625–630. [Google Scholar] [CrossRef]
- Ling, C.; Zhao, X.; Lu, J.; Deng, C.; Zheng, C.; Wang, J.; Zhao, L. Domain specialization as the key to make large language models disruptive: A comprehensive survey. arXiv 2023, arXiv:2305.18703. [Google Scholar] [CrossRef]
- Lu, R.S.; Lin, C.C.; Tsao, H.Y. Empowering Large Language Models to Leverage Domain-Specific Knowledge in E-Learning. Appl. Sci. 2024, 14, 5264. [Google Scholar] [CrossRef]
- Marvin, G.; Hellen, N.; Jjingo, D.; Nakatumba-Nabende, J. Prompt Engineering in Large Language Models. In Data Intelligence and Cognitive Informatics; Jacob, I.J., Piramuthu, S., Falkowski-Gilski, P., Eds.; ICDICI 2023; Algorithms for Intelligent Systems; Springer: Singapore, 2024. [Google Scholar] [CrossRef]
- Hu, Y.; Chen, Q.; Du, J.; Peng, X.; Keloth, V.K.; Zuo, X.; Xu, H. Improving large language models for clinical named entity recognition via prompt engineering. J. Am. Med. Inform. Assoc. 2024, 31, 1812–1820. [Google Scholar] [CrossRef]
- Hsueh, C.Y.; Zhang, Y.; Lu, Y.W.; Han, J.C.; Meesawad, W.; Tsai, R.T.H. NCU-IISR: Prompt Engineering on GPT-4 to Stove Biological Problems in BioASQ 11b Phase B. In Proceedings of the CLEF, Thessaloniki, Greece, 18–21 September 2023; pp. 114–121. Available online: https://api.semanticscholar.org/CorpusID:264441290 (accessed on 27 December 2023).
- Knoth, N.; Tolzin, A.; Janson, A.; Leimeister, J.M. AI literacy and its implications for prompt engineering strategies. Comput. Educ. Artif. Intell. 2024, 6, 100225. [Google Scholar] [CrossRef]
- Heston, T.F.; Khun, C. Prompt Engineering in Medical Education. Int. Med. Educ. 2023, 2, 198–205. [Google Scholar] [CrossRef]
- Lee, U.; Jung, H.; Jeon, Y.; Sohn, Y.; Hwang, W.; Moon, J.; Kim, H. Few-shot is enough: Exploring ChatGPT prompt engineering method for automatic question generation in english education. Educ. Inf. Technol. 2024, 29, 11483–11515. [Google Scholar] [CrossRef]
- Giray, L. Prompt Engineering with ChatGPT: A Guide for Academic Writers. Ann. Biomed. Eng. 2023, 51, 2629–2633. [Google Scholar] [CrossRef]
- Velásquez-Henao, J.D.; Franco-Cardona, C.J.; Cadavid-Higuita, L. Prompt Engineering: A methodology for optimizing interactions with AI-Language Models in the field of engineering. Dyna 2023, 90, 9–17. [Google Scholar] [CrossRef]
- Borzunov, A.; Ryabinin, M.; Chumachenko, A.; Baranchuk, D.; Dettmers, T.; Belkada, Y.; Raffel, C.A. Distributed inference and fine-tuning of large language models over the internet. Adv. Neural Inf. Process. Syst. 2024, 36, 12312–12331. [Google Scholar]
- Li, Y.; Wang, S.; Ding, H.; Chen, H. Large language models in finance: A survey. In Proceedings of the 4th ACM International Conference on AI in Finance, Brooklyn, NY, USA, 27–29 November 2023; pp. 374–382. [Google Scholar] [CrossRef]
- Wu, S.; Irsoy, O.; Lu, S.; Dabravolski, V.; Dredze, M.; Gehrmann, S.; Mann, G. Bloomberggpt: A large language model for finance. arXiv 2023, arXiv:2303.17564. [Google Scholar] [CrossRef]
- Liu, X.Y.; Wang, G.; Yang, H.; Zha, D. Fingpt: Democratizing internet-scale data for financial large language models. arXiv 2023, arXiv:2307.10485. [Google Scholar] [CrossRef]
- Singhal, K.; Tu, T.; Gottweis, J.; Sayres, R.; Wulczyn, E.; Hou, L.; Natarajan, V. Towards expert-level medical question answering with large language models. arXiv 2023, arXiv:2305.09617. [Google Scholar] [CrossRef] [PubMed]
- Zhang, H.; Chen, J.; Jiang, F.; Yu, F.; Chen, Z.; Li, J.; Chen, G.; Wu, X.; Zhang, Z.; Xiao, Q.; et al. HuatuoGPT, towards Taming Language Model to Be a Doctor. arXiv 2023, arXiv:2305.15075. [Google Scholar] [CrossRef]
- Wang, H.; Liu, C.; Xi, N.; Qiang, Z.; Zhao, S.; Qin, B.; Liu, T. HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge. arXiv 2023, arXiv:2304.06975. [Google Scholar] [CrossRef]
- Zhang, Y.; Wei, C.; He, Z.; Yu, W. GeoGPT: An assistant for understanding and processing geospatial tasks. Int. J. Appl. Earth Obs. Geoinform. 2024, 131, 103976. [Google Scholar] [CrossRef]
- Nie, Y.; Zelikman, E.; Scott, A.; Paletta, Q.; Brandt, A. SkyGPT: Probabilistic ultra-short-term solar forecasting using synthetic sky images from physics-constrained VideoGPT. Adv. Appl. Energy 2024, 14, 100172. [Google Scholar] [CrossRef]
- Deng, C.; Zhang, T.; He, Z.; Chen, Q.; Shi, Y.; Xu, Y.; He, J. K2: A foundation language model for geoscience knowledge understanding and utilization. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Merida, Mexico, 4–8 March 2024; pp. 161–170. [Google Scholar] [CrossRef]
- Lin, Z.; Deng, C.; Zhou, L.; Zhang, T.; Xu, Y.; Xu, Y.; He, Z.; Shi, Y.; Dai, B.; Song, Y.; et al. GeoGalactica: A Scientific Large Language Model in Geoscience. arXiv 2023, arXiv:2401.00434. [Google Scholar] [CrossRef]
- Asai, A.; Min, S.; Zhong, Z.; Chen, D. Retrieval-based language models and applications. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 7 July 2023; Volume 6, pp. 41–46. [Google Scholar] [CrossRef]
- Guu, K.; Lee, K.; Tung, Z.; Pasupat, P.; Chang, M. Retrieval augmented language model pre-training. In International Conference on Machine Learning; PMLR: New York, NY, USA, 2020; pp. 3929–3938. [Google Scholar]
- Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Wang, H. Retrieval-augmented generation for large language models: A survey. arXiv 2023, arXiv:2312.10997. [Google Scholar] [CrossRef]
- Fan, W.; Ding, Y.; Ning, L.; Wang, S.; Li, H.; Yin, D.; Li, Q. A survey on rag meeting llms: Towards retrieval-augmented large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 6491–6501. [Google Scholar] [CrossRef]
- Quinonez, C.; Meij, E. A new era of AI-assisted journalism at Bloomberg. AI Mag. 2024, 45, 187–199. [Google Scholar] [CrossRef]
- Qian, J.; Jin, Z.; Zhang, Q.; Cai, G.; Liu, B. A Liver Cancer Question-Answering System Based on Next-Generation Intelligence and the Large Model Med-PaLM 2. Int. J. Comput. Sci. Inf. Technol. 2024, 2, 28–35. [Google Scholar] [CrossRef]
- Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3615–3620. [Google Scholar] [CrossRef]
- Chaminé, H.I.; Fernandes, I. The role of engineering geology mapping and GIS-based tools in geotechnical practice. In Advances on Testing and Experimentation in Civil Engineering; Springer: Cham, Switzerland, 2022; pp. 3–27. [Google Scholar] [CrossRef]
- Wang, C.; Wang, X.; Chen, J. Digital Geological Mapping to Facilitate Field Data Collection, Integration, and Map Production in Zhoukoudian, China. Appl. Sci. 2021, 11, 5041. [Google Scholar] [CrossRef]
- Cai, B.; Zhao, J.; Yu, X. A methodology for 3D geological mapping and implementation. Multimed. Tools Appl. 2019, 78, 28703–28713. [Google Scholar] [CrossRef]
- Chen, H.; Liu, H.; Shen, C.; Xie, W.; Liu, T.; Zhang, J.; Lu, J.; Li, Z.; Peng, Y. Research on Geological-Engineering Integration Numerical Simulation Based on EUR Maximization Objective. Energies 2024, 17, 3644. [Google Scholar] [CrossRef]
- Song, Y.; Yin, T.; Zhang, C.; Wang, N.; Hou, X. Application of sedimentary numerical simulation in sequence stratigraphy study. Arab. J. Geosci. 2020, 13, 267. [Google Scholar] [CrossRef]
- Pham, L.T.; Oliveira, S.P.; Le, C.V.A. Editorial for the Special Issue “Application of Geophysical Data Interpretation in Geological and Mineral Potential Mapping”. Minerals 2024, 14, 63. [Google Scholar] [CrossRef]
- Lai, J.; Su, Y.; Xiao, L.; Zhao, F.; Bai, T.; Li, Y.; Qin, Z. Application of geophysical well logs in solving geologic issues: Past, present and future prospect. Geosci. Front. 2024, 15, 101779. [Google Scholar] [CrossRef]
- Hdeid, O.M.; Morsli, Y.; Raji, M.; Baroudi, Z.; Adjour, M.; Nebagha, K.C.; Vall, I.B. Application of Remote Sensing and GIS in Mineral Alteration Mapping and Lineament Extraction Case of Oudiane Elkharoub (Requibat Shield, Northern of Mauritania). Open J. Geol. 2024, 14, 823–854. [Google Scholar] [CrossRef]
- Bety, A.K.; Hassan, M.M.; Salih, N.M.; Thannoun, R.G. The Application of Remote Sensing Techniques for Identification the Gypsum Rocks in the Qara Darbandi Anticline, Kurdistan Region of Iraq. Iraqi Geol. J. 2024, 57, 308–322. [Google Scholar] [CrossRef]
- Hofer, M.; Obraczka, D.; Saeedi, A.; Köpcke, H.; Rahm, E. Construction of Knowledge Graphs: Current State and Challenges. Information 2024, 15, 509. [Google Scholar] [CrossRef]
- Zhou, Y.Z.; Zhang, Q.L.; Huang, Y.J.; Yang, W.; Xiao, F.; Ji, J. Constructing knowledge graph for the porphyry copper deposit in the Qingzhou-Hangzhou Bay area: Insight into knowledge graph based mineral resource prediction and evaluation. Earth Sci. Front. 2021, 28, 9. [Google Scholar] [CrossRef]
- Zhang, Q.L.; Zhou, Y.Z.; Guo, L.X.; Yuan, G.Q.; Yu, P.P.; Wang, H.Y.; Zhu, B.B.; Han, F.; Long, S.Y. Intelligent application of knowledge graphs in mineral prospecting: A case study of porphyry copper deposits in the Qin-Hang metallogenic belt. Earth Sci. Front. 2024, 31, 7–15. [Google Scholar] [CrossRef]
- Zhang, Q.L.; Zhou, Y.Z.; Yu, P.P.; Wang, H.Y.; Han, F.; He, J.X. Ontology construction of multi-level ore deposit and its application in knowledge graph. Bull. Mineral. Petrol. Geochem. 2024, 43, 211–217. [Google Scholar] [CrossRef]
- Zhang, C.; Govindaraju, V.; Borchardt, J.; Foltz, T.; Ré, C.; Peters, S. GeoDeepDive: Statistical inference using familiar data-processing languages. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 23–28 June 2013; pp. 993–996. [Google Scholar] [CrossRef]
- Janowicz, K.; Gao, S.; McKenzie, G.; Hu, Y.; Bhaduri, B. GeoAI: Spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond. Int. J. Geogr. Inf. Sci. 2020, 34, 625–636. [Google Scholar] [CrossRef]
- Choi, Y. GeoAI: Integration of Artificial Intelligence, Machine Learning, and Deep Learning with GIS. Appl. Sci. 2023, 13, 3895. [Google Scholar] [CrossRef]
- Bornstein, T.; Lange, D.; Münchmeyer, J.; Woollam, J.; Rietbrock, A.; Barcheck, G.; Tilmann, F. PickBlue: Seismic phase picking for ocean bottom seismometers with deep learning. Earth Space Sci. 2024, 11, e2023EA003332. [Google Scholar] [CrossRef]
- Maio, R.; Arko, R.A.; Lehnert, K.; Ji, P. Entity Linking Leveraging the GeoDeepDive Cyberinfrastructure and Managing Uncertainty with Provenance; AGU Fall Meeting Abstract; American Geophysical Union: Washington, DC, USA, 2017; p. #IN33B-0118. Available online: https://ui.adsabs.harvard.edu/abs/2017AGUFMIN33B0118M (accessed on 27 December 2023).
- Goring, S.; Marsicek, J.; Ye, S.; Williams, J.W.; Meyers, S.; Peters, S.E.; Marcott, S. A Model Workflow for GeoDeepDive: Locating Pliocene and Pleistocene Ice-Rafted Debris. EarthArXiv 2021. [Google Scholar] [CrossRef]
- Husson, J.M.; Peters, S.E.; Ross, I.; Czaplewski, J.J. Macrostrat and GeoDeepDive: A platform for Geological Data Integration and Deep-time Research; AGU Fall Meeting Abstract; American Geophysical Union: Washington, DC, USA, 2016; p. IN23F-04. Available online: https://ui.adsabs.harvard.edu/abs/2016AGUFMIN23F..04H (accessed on 27 December 2023).
- Kumpf, B. Evaporites Through Phanerozoic Time: Using GeoDeepDive, Macrostrat, and Geochemical Modeling to Investigate and Model Changes in Seawater Chemistry. Ph.D. Thesis, University of Victoria, Victoria, BC, Canada, 2024. Available online: https://hdl.handle.net/1828/16395 (accessed on 27 December 2023).
- Liu, P.; Biljecki, F. A review of spatially-explicit GeoAI applications in Urban Geography. Int. J. Appl. Earth Obs. Geoinform. 2022, 112, 102936. [Google Scholar] [CrossRef]
- Li, W.; Hsu, C.-Y. GeoAI for Large-Scale Image Analysis and Machine Vision: Recent Progress of Artificial Intelligence in Geography. ISPRS Int. J. Geo-Inf. 2022, 11, 385. [Google Scholar] [CrossRef]
- Woollam, J.; Münchmeyer, J.; Tilmann, F.; Rietbrock, A.; Lange, D.; Bornstein, T.; Soto, H. SeisBench—A toolbox for machine learning in seismology. Seismol. Soc. Am. 2022, 93, 1695–1709. [Google Scholar] [CrossRef]
- Ramaneti, K.; Rajkumar, S. An Overview of Recent Advances and Applications of Machine Learning in Seismic Phase Picking. ResearchGate 2022. [Google Scholar] [CrossRef]
- Pita-Sllim, O.; Chamberlain, C.J.; Townend, J.; Warren-Smith, E. Parametric testing of EQTransformer’s performance against a high-quality, manually picked catalog for reliable and accurate seismic phase picking. Seism. Rec. 2023, 3, 332–341. [Google Scholar] [CrossRef]
- Münchmeyer, J.; Saul, J.; Tilmann, F. Learning the deep and the shallow: Deep-learning-based depth phase picking and earthquake depth estimation. Seismol. Res. Lett. 2024, 95, 1543–1557. [Google Scholar] [CrossRef]
- Chen, H.L.; Chen, H.Z.; Han, K.F.; Zhu, G.X.; Zhao, Y.C.; Du, Y. Domain-Specific Foundation-Model Customization: Theoretical Foundation and Key Technology. J. Data Acquis. Process. 2024, 39, 524–546. [Google Scholar] [CrossRef]
- He, J.X.; Zhang, Q.L.; Xu, Y.T.; Liu, Y.Q.; Wang, W.X.; Zhou, Y.Z.; Yu, P.P. Research progress of Qinzhou—Hangzhou metallogenic belt—Analysed from CiteSpace community discovery. Geol. Rev. 2023, 69, 1919–1927. [Google Scholar] [CrossRef]
- Shi, D.; Fan, S.; Li, G.; Zhu, Y.; Yan, Q.; Jia, M.; Faisal, M. Genesis of Yongping copper deposit in the Qin-Hang Metallogenic Belt, SE China: Insights from sulfide geochemistry and sulfur isotopic data. Ore Geol. Rev. 2024, 173, 106231. [Google Scholar] [CrossRef]
- Duan, R.C.; Jiang, S.Y. Fluid Inclusions and H-O-C-S-Pb Isotope Studies of the Xinmin Cu-Au-Ag Polymetallic Deposit in the Qinzhou-Hangzhou Metallogenic Belt, South China: Constraints on Fluid Origin and Evolution. Geofluids 2021, 1, 5171579. [Google Scholar] [CrossRef]
- Wang, J.; Wang, J.; Zhang, X.; Zhang, J.; Liu, A.; Jiang, M.; Bai, X. Geology, geochronology, and stable isotopes of the Triassic Tianjingshan orogenic gold deposit, China: Implications for ore genesis of the Qinzhou Bay–Hangzhou Bay metallogenic belt. Ore Geol. Rev. 2021, 131, 103952. [Google Scholar] [CrossRef]
- Zhou, B.; Yan, C.; Zhan, Y.; Sun, X.; Li, S.; Wen, X.; Huang, M. Deep electrical structures of Qinzhou-Fangcheng Junction Zone in Guangxi and seismogenic environment of the 1936 Lingshan M 6¾ earthquake. Sci. China Earth Sci. 2024, 67, 584–603. [Google Scholar] [CrossRef]
- Xu, D.M.; Qu, Z.Y.; Long, W.G.; Zhang, K.; Wang, L.; Zhou, D.; Huang, H. Research History and Current Situation of Qinzhou-Hangzhou Metallogenic Belt, South China. Geol. Mineral. Resour. South China 2012, 28, 277. [Google Scholar] [CrossRef]
- Wang, G.; Fang, H.; Qiu, G.; Pei, F.; He, M.; Du, B.; Peng, Y. Three-dimensional magnetotelluric imaging of the Eastern Qinhang Belt between the Yangtze Block and Cathaysia Block: Implications for lithospheric architecture and associated metallogenesis. Ore Geol. Rev. 2023, 158, 105490. [Google Scholar] [CrossRef]
- Liu, Y.; Lu, Q.; Farquharson, C.; Yan, J. Metallogenic Mechanism of the North Qinhang Belt, South China, from Gravity and Magnetic Inversions. In Proceedings of the 25th European Meeting of Environmental and Engineering Geophysics, The Hague, The Netherlands, 8–12 September 2019; Volume 1, pp. 1–5. [Google Scholar] [CrossRef]
- Zheng, M.; Xu, T.; Lü, Q.; Lin, J.; Huang, M.; Bai, Z.; Badal, J. Upper crustal structure beneath the Qin-Hang and Wuyishan metallogenic belts in Southeast China as revealed by a joint active and passive seismic experiment. Geophys. J. Int. 2023, 232, 190–200. [Google Scholar] [CrossRef]
- Zhang, D.; Li, F.; He, X.L. Mesozoic tectonic deformation and its rock/ore-control mechanism in the important metallogenic belts in South China. J. Geomech. 2021, 27, 497–528. [Google Scholar] [CrossRef]
- Song, C.Z.; Li, J.H.; Yan, J.Y.; Wang, Y.Y.; Liu, Z.D.; Yuan, F.; Li, Z.W. A tentative discussion on some tectonic problems in the east of South China continent. Geol. China 2019, 46, 704–722. [Google Scholar] [CrossRef]
- Zhang, Y.Q.; Xu, Y.; Yan, J.Y.; Xu, Z.W.; Zhao, J.H. Crustal thickness, properties and its relations to mineralization in the southeastern part of South China: Constraint from the teleseismic receiver functions. Geol. China 2019, 46, 723–736. [Google Scholar] [CrossRef]
- Zeng, T.G.; Xu, B.; Wang, B.; Zhang, C.; Yin, D.; Rojas, D.; Feng, G.; Zhao, H.; Lai, H.; Yu, H.; et al. ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools. arXiv 2024, arXiv:2406.12793. [Google Scholar] [CrossRef]
- Topsakal, O.; Akinci, T.C. Creating large language model applications utilizing langchain: A primer on developing llm apps fast. Int. Conf. Appl. Eng. Nat. Sci. 2023, 1, 1050–1056. [Google Scholar] [CrossRef]
- Wang, Z.; Liu, J.; Zhang, S.; Yang, Y. Poisoned LangChain: Jailbreak LLMs by LangChain. arXiv 2024, arXiv:2406.18122. [Google Scholar] [CrossRef]
- Jeong, S.; Baek, J.; Cho, S.; Hwang, S.J.; Park, J.C. Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity. arXiv 2024, arXiv:2403.14403. [Google Scholar] [CrossRef]
- Massey, P.A.; Montgomery, C.; Zhang, A.S. Comparison of ChatGPT–3.5, ChatGPT-4, and orthopaedic resident performance on orthopaedic assessment examinations. JAAOS-J. Am. Acad. Orthop. Surg. 2023, 31, 1173–1179. [Google Scholar] [CrossRef] [PubMed]
- Crawford, J.; Cowling, M.; Ashton-Hay, S.; Kelder, J.-A.; Middleton, R.; Wilson, G. Artificial Intelligence and authorship editor policy: ChatGPT, Bard Bing, and beyond. J. Univ. Teach. Learn. Pract. 2023, 20, 1. [Google Scholar] [CrossRef]
- Reiter, E. A structured review of the validity of BLEU. Comput. Linguist. 2018, 44, 393–401. [Google Scholar] [CrossRef]
- Imran, M.; Almusharraf, N. Google Gemini as a next generation AI educational tool: A review of emerging educational technology. Smart Learn. Environ. 2024, 11, 22. [Google Scholar] [CrossRef]
- Zhou, Y.Z.; Zhang, L.J.; Zhang, A.D.; Wang, J. Bigdata Mining & Machine Learning in Geoscience; Sun Yat-sen University Press: Zhuhai, China, 2018. [Google Scholar] [CrossRef]
Model | Answer |
---|---|
ChatGPT-4o | |
Bing | |
Questions | Answers |
---|---|
Metallogenic Characteristics of Porphyry Copper Deposits in the Qin–Hang Metallogenic Belt | Porphyry copper deposits in the Qin–Hang metallogenic belt are distributed across the northern, central, and southern sections, with primary mineralization occurring during the Yanshanian period. The porphyries associated with mineralization in the Qin–Hang belt predominantly belong to the calc-alkaline magma series, with rock types including granodiorite porphyry, granite porphyry, and subrhyolitic porphyry. These mineralized porphyries exhibit trace element characteristics similar to those of Neoproterozoic island arc igneous rocks, suggesting they originated from the partial melting of a mixture of asthenosphere-derived magma and middle-lower crustal material. This is indicative of crust-mantle contamination, likely linked to lithospheric mantle influenced by earlier subduction. The geodynamic setting of the Yanshanian porphyry copper deposits is closely associated with the subduction of the Pacific plate, while the arc magmatic features of the porphyry and copper deposits are related to pre-Mesozoic tectonic events in South China. The calc-alkaline porphyries in the Qin–Hang belt derive from partial melting of basaltic rocks at the base of a Proterozoic island arc (lower crust) during the Mesozoic, fundamentally carrying island arc subduction-related characteristics. |
Regional Geological Background of the Qin–Hang Metallogenic Belt | The main extent of the Qin–Hang metallogenic belt corresponds to the Qin–Hang junction zone, a multiblock collisional zone formed by the collision and amalgamation of the Yangtze and Cathaysia blocks during the Jinningian period. The belt stretches approximately 2000 km from Qinzhou Bay in Guangxi to Hangzhou Bay in Zhejiang, with an overall S-shaped arc pattern. In the south, it is bounded by the Pingxiang-Guangfeng-Shaoxing fault separating it from the Cathaysia block, while in the north, the Yifeng-Jingdezhen-Shexian fault marks the boundary with the Yangtze block. The eastern segment of the Qin–Hang belt includes the entirety of the eastern Qin–Hang junction zone, along with Yanshanian-period magmatic mineralization zones on either side, including the eastern segment of the Jiangnan orogenic belt to the north and the Wugongshan-Tiantai uplift zone to the south. This region spans approximately 900 km in length and 190–230 km in width, covering an area of approximately 180,000 square kilometers. The basement strata of the eastern segment are Neoproterozoic in age, including formations such as the Shuangqiaoshan, Zhangcun, Xikou, Wannian, Shuangxiwu, Chencai, Yifeng, Tianli, and Tieshajie groups, which collectively represent a suite of marine volcanic-sedimentary deposits. Since the Neoproterozoic, the Qin–Hang belt has undergone multiple tectonic events. The Jinningian period involved oceanic-arc subduction and continental collision, leading to the convergence of the Yangtze and Cathaysia blocks along the Northeast Jiangxi major fault, forming the Jiangnan orogenic belt in the northern Qin–Hang belt, with a complete trench-arc-basin system. The Caledonian orogeny resulted in the complete amalgamation of the Yangtze and Cathaysia blocks, creating the unified South China block and broadly shaping the structure of the Qin–Hang junction zone and its surroundings. During the Hercynian period, intracontinental rifting along the western side of the Northeast Jiangxi deep fault, between the Yifeng-Jingdezhen-Shexian fault and the Pingxiang-Guangfeng-Shaoxing fault, formed the Zhejiang-Jiangxi depression zone (Pingxiang-Qiantang depression). This was followed by intense intracontinental folding and orogeny during the Indosinian period, leading to the formation of the Wugongshan-Tiantai uplift zone, resulting in the east segment of the Qin–Hang metallogenic belt’s characteristic “two uplifts with one depression.” The Yanshanian period witnessed a tectonic shift involving lithospheric extension and thinning, triggering large-scale granite magmatism and a surge in magmatic mineralization. |
Distribution of Deposit Types in the Qin–Hang Metallogenic Belt | The distribution of deposits in the Qin–Hang metallogenic belt follows a distinct pattern, significantly influenced by the internal structure and evolutionary heterogeneity of the Qin–Hang junction zone. Yongzhang Zhou and colleagues proposed a division of the belt into northern, central, and southern sections based on the regional distribution of dominant ore types. The northern section is rich in copper, iron, and precious metals, with representative deposits including the Dexing copper deposit, Lengshuikeng lead-zinc-silver deposit, Yinshan gold-copper-lead-zinc deposit, Jinshan gold deposit, and Pingshui copper deposit, among others. The central section corresponds to the Nanling metallogenic belt, known for its abundance of granite and as a major source of tungsten-tin polymetallic minerals. Key deposits include Dongpo, Shuikoushan, Xitian, Xihuashan, Dajishan, Tongshanling, Huangshaping, and Shizhuyuan, all associated with magmatic activity. The southern section is primarily known for gold, silver, and polymetallic deposits, with major resources located in the Yunkaidashan, Dayao, and Shiwan mountains. Representative deposits include the Hetai gold deposit, Pangxidong silver deposit, Daming Mountain gold-silver deposit, Yun’an Gaochang lead-zinc-silver deposit, and the Changkeng large silver-gold deposit. Jingwen Mao et al. categorized the Qin–Hang belt and adjacent deposits into two metallogenic series and three subseries: the Neoproterozoic Seafloor Exhalative-Sedimentary Copper-Zinc Deposit Series and the Yanshanian Granite-Related Tungsten-Tin-Copper-Lead-Zinc Polymetallic Deposit Series. The latter is further divided into the Middle-Late Jurassic Porphyry-Skarn-Hydrothermal Vein Copper Polymetallic Subseries, the Late Jurassic Granite-Related Tungsten-Tin Polymetallic Subseries, and the Cretaceous Epithermal Low-Temperature Gold-Silver-Lead-Zinc-Tungsten-Tin Subseries associated with subvolcanic activity. Mao proposed that the Neoproterozoic massive sulfide copper-zinc deposits formed in a back-arc or fore-arc basin setting, while large-scale tungsten-tin polymetallic mineralization at approximately 160–150 Ma is related to major granite magmatic events. After 135 Ma, mineral resources were concentrated in volcanic and fault-basin settings. Deming Xu and colleagues identified seven metallogenic series, including the Meso-Neoproterozoic seafloor exhalative-sedimentary copper-polymetallic deposits, Neoproterozoic marine sedimentary-metamorphic iron-manganese deposits, Paleozoic marine sedimentary-superimposed copper-lead-zinc-iron-manganese deposits, Caledonian granite-related tungsten-molybdenum-gold-silver polymetallic deposits, Indosinian granite-related tungsten-tin-niobium-tantalum-uranium polymetallic deposits, Yanshanian granite-related copper-lead-zinc-gold-tungsten-tin polymetallic deposits, and gold-silver deposits associated with regional dynamic metamorphic hydrothermal activity. |
Model | Average Precision | Average Recall | Average F1 Score |
---|---|---|---|
ChatGLM3-6B | 0.7916 | 0.8022 | 0.7969 |
ChatGLM3-6B RAG | 0.8663 | 0.8421 | 0.8535 |
ChatGPT-4o | 0.8507 | 0.8521 | 0.8514 |
Bing | 0.8456 | 0.8145 | 0.8298 |
Gemini | 0.8152 | 0.8140 | 0.8146 |
Comparison Dimension | GeoGPT | GeoGalactica | K2 | ChatGLM3-6B RAG (This Study) |
---|---|---|---|---|
System Architecture | KG + GPT-based QA | Scientific generative LLM | Foundation model for geoscience | RAG-based QA (LangChain + ChatGLM) |
Language Support | English only | English only | English only | Bilingual (Chinese + English) |
Geographic Focus | Global terminology | Global scientific corpus | Multi-region remote sensing | Focused on the Qin–Hang metallogenic belt |
Openness | Partially open (no weights) | Structure open, no weights released | Fully open (code + data) | Fully open (code + corpus structure) |
QA Capability | Entity recognition & basic queries | Generative QA (no benchmark) | No QA module | Vector retrieval + generative answers |
Performance Evaluation | Not reported | Not reported | Evaluated on GeoBench | Evaluated using F1/Precision/Recall |