Domain-Specialized Large Language Model for Corrosion Analysis: Construction and Evaluation of Corr-Lora-RAG
Abstract
1. Introduction
2. Materials and Methods
2.1. Corrosion Knowledge Dataset
2.2. Fine-Tuning on the Corrosion Dataset
2.3. Retrieval-Augmented Generation
2.4. Prompt Engineering
2.5. Model Evaluation
2.6. Automatic Evaluation
2.7. Human Evaluation
3. Results and Discussion
3.1. Prompt Engineering and Hallucination Analysis
3.2. Evaluation of Fine-Tuning and RAG Integration
3.3. Expert Review Results
3.4. Analysis of the Relationship Between Model Performance and Storage Footprint
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
LoRA | Low-Rank Adaptation |
SFT | Supervised Fine-Tuning |
RAG | Retrieval-Augmented Generation |
CKD | Corrosion Knowledge Database |
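Low-Rank Adaptation (LoRA), listed above, keeps a pretrained weight matrix W (d_out × d_in) frozen and trains only a low-rank update ΔW = (α/r)·B·A, where A is r × d_in, B is d_out × r, and the rank r is much smaller than d_in and d_out. In the original LoRA formulation B starts at zero, so training begins from the unmodified base model. The pure-Python sketch below illustrates that forward pass on toy matrices; the class name, dimensions, rank, α, and all values are illustrative assumptions, not the configuration used for Corr-LoRA.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus a trainable update (alpha / r) * B @ A.

    Illustrative sketch only; no training loop is included.
    """

    def __init__(self, weight: Matrix, a: Matrix, b: Matrix, alpha: float) -> None:
        self.weight = weight        # frozen, shape (d_out, d_in)
        self.a = a                  # trainable, shape (r, d_in)
        self.b = b                  # trainable, shape (d_out, r); zero at initialization
        self.rank = len(a)
        self.scale = alpha / self.rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)                  # W x
        low_rank = matvec(self.b, matvec(self.a, x))   # B (A x); B @ A is never materialized
        return [y + self.scale * d for y, d in zip(base, low_rank)]

    def trainable_params(self) -> int:
        """Parameters in A and B only; W stays frozen."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


# Toy example: d_in = 3, d_out = 2, rank r = 1.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
A = [[0.5, 0.5, 0.5]]      # shape (1, 3)
B = [[0.0], [0.0]]         # shape (2, 1); zero init, so the output equals W x
layer = LoRALinear(W, A, B, alpha=2.0)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))    # [7.0, -1.0]
layer.b = [[1.0], [-1.0]]  # pretend training has updated B
print(layer.forward(x))    # [13.0, -7.0]
print(layer.trainable_params())  # 5
```

Only A and B are trained, so each adapted matrix adds r·(d_in + d_out) trainable parameters instead of d_in·d_out; this is why a LoRA adapter is small compared with the base model it modifies.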
Appendix A. Q–A Data Example
- (a) Accepted Q–A Pair
- (b) Revised Q–A Pair
- (c) Deleted Q–A Pair
Appendix B. System Prompt Design
Appendix C. Large Language Model Information
References
1. Comizzoli, R.B.; Frankenthal, R.P.; Milner, P.C.; Sinclair, J.D. Corrosion of Electronic Materials and Devices. Science 1986, 234, 340–345.
2. Revie, R.W. Corrosion and Corrosion Control: An Introduction to Corrosion Science and Engineering; John Wiley & Sons: New York, NY, USA, 2008.
3. Koch, G.H.; Brongers, M.P.; Thompson, N.G.; Virmani, Y.P.; Payer, J.H. Corrosion Cost and Preventive Strategies in the United States; United States Federal Highway Administration: Washington, DC, USA, 2002.
4. Guo, X.; Ding, X.; Wang, Y.; Wang, J.; Tan, W.; Li, Y.; Chen, Z.; Li, Z.; Chen, W.; Ma, L. High-Throughput Screening of Green Amino Acid and Surfactant Mixtures with High Corrosion Inhibition Efficiency: Experimental and Modelling Perspectives. Corros. Sci. 2024, 240, 112460.
5. Guo, X.; Zhang, X.; Ma, L.; Li, Y.; Le, J.; Fu, Z.; Lu, L.; Zhang, D. Understanding the Adsorption of Imidazole Corrosion Inhibitor at the Copper/Water Interface by Ab Initio Molecular Dynamics. Corros. Sci. 2024, 236, 112237.
6. Wang, K.; Li, C.; Lu, J.; Nan, C.; Zhang, Q.; Zhang, H. Intelligent Evaluation of Marine Corrosion of Q420 Steel Based on Image Recognition Method. Coatings 2022, 12, 881.
7. Luo, M.-J.; Pang, J.; Bi, S.; Lai, Y.; Zhao, J.; Shang, Y.; Cui, T.; Yang, Y.; Lin, Z.; Zhao, L.; et al. Development and Evaluation of a Retrieval-Augmented Large Language Model Framework for Ophthalmology. JAMA Ophthalmol. 2024, 142, 798–805.
8. Li, Y.; Zhao, J.; Li, M.; Dang, Y.; Yu, E.; Li, J.; Sun, Z.; Hussein, U.; Wen, J.; Abdelhameed, A.M.; et al. RefAI: A GPT-Powered Retrieval-Augmented Generative Tool for Biomedical Literature Recommendation and Summarization. J. Am. Med. Inform. Assoc. 2024, 31, 2030–2039.
9. Zakka, C.; Shad, R.; Chaurasia, A.; Dalal, A.R.; Kim, J.L.; Moor, M.; Fong, R.; Phillips, C.; Alexander, K.; Ashley, E.; et al. Almanac—Retrieval-Augmented Language Models for Clinical Medicine. NEJM AI 2024, 1, AIoa2300068.
10. Ram, O.; Levine, Y.; Dalmedigos, I.; Muhlgay, D.; Shashua, A.; Leyton-Brown, K.; Shoham, Y. In-Context Retrieval-Augmented Language Models. Trans. Assoc. Comput. Linguist. 2023, 11, 1316–1331.
11. Siriwardhana, S.; Weerasekera, R.; Wen, E.; Kaluarachchi, T.; Rana, R.; Nanayakkara, S. Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering. Trans. Assoc. Comput. Linguist. 2023, 11, 1–17.
12. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv 2021, arXiv:2005.11401.
13. Matsumoto, N.; Moran, J.; Choi, H.; Hernandez, M.E.; Venkatesan, M.; Wang, P.; Moore, J.H. KRAGEN: A Knowledge Graph-Enhanced RAG Framework for Biomedical Problem Solving Using Large Language Models. Bioinformatics 2024, 40, btae353.
14. Yan, Z.; Liang, H.; Wang, J.; Zhang, H.; da Silva, A.K.; Liang, S.; Rao, Z.; Zeng, X. PDGPT: A Large Language Model for Acquiring Phase Diagram Information in Magnesium Alloys. Mater. Genome Eng. Adv. 2024, 2, e77.
15. He, C.; Li, W.; Jin, Z.; Xu, C.; Wang, B.; Lin, D. OpenDataLab: Empowering General Artificial Intelligence with Open Datasets. arXiv 2024, arXiv:2407.13773.
16. Zhang, L.; Li, S.; Peng, H. Lora for Dense Passage Retrieval of ConTextual Masked Auto-Encoding. Signal Image Video Process. 2024, 19, 23.
17. Ong, C.S.; Obey, N.T.; Zheng, Y.; Cohan, A.; Schneider, E.B. SurgeryLLM: A Retrieval-Augmented Generation Large Language Model Framework for Surgical Decision Support and Workflow Enhancement. Npj Digit. Med. 2024, 7, 364.
18. Luu, R.K.; Buehler, M.J. BioinspiredLLM: Conversational Large Language Model for the Mechanics of Biological and Bio-Inspired Materials. Adv. Sci. 2024, 11, 2306724.
19. Buehler, M.J. MechGPT, a Language-Based Strategy for Mechanics and Materials Modeling That Connects Knowledge across Scales, Disciplines, and Modalities. Appl. Mech. Rev. 2024, 76, 021001.
20. Kresevic, S.; Giuffrè, M.; Ajcevic, M.; Accardo, A.; Crocè, L.S.; Shung, D.L. Optimization of Hepatological Clinical Guidelines Interpretation by Large Language Models: A Retrieval Augmented Generation-Based Framework. Npj Digit. Med. 2024, 7, 102.
21. Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-Train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv. 2023, 55, 195.
22. Wang, M.; Wang, M.; Xu, X.; Yang, L.; Cai, D.; Yin, M. Unleashing ChatGPT’s Power: A Case Study on Optimizing Information Retrieval in Flipped Classrooms via Prompt Engineering. IEEE Trans. Learn. Technol. 2024, 17, 629–641.
23. Abacha, A.B.; Yim, W.; Fu, Y.; Sun, Z.; Yetisgen, M.; Xia, F.; Lin, T. MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes. arXiv 2025, arXiv:2412.19260.
24. Grattafiori, A.; Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Vaughan, A.; et al. The Llama 3 Herd of Models. arXiv 2024, arXiv:2407.21783.
25. Yang, A.; Yang, B.; Hui, B.; Zheng, B.; Yu, B.; Zhou, C.; Li, C.; Li, C.; Liu, D.; Huang, F.; et al. Qwen2 Technical Report. arXiv 2024, arXiv:2407.10671.
26. Yang, A.; Yang, B.; Zhang, B.; Hui, B.; Zheng, B.; Yu, B.; Li, C.; Liu, D.; Huang, F.; Wei, H.; et al. Qwen2.5 Technical Report. arXiv 2025, arXiv:2412.15115.
27. Glm, T.; Zeng, A.; Xu, B.; Wang, B.; Zhang, C.; Yin, D.; Zhang, D.; Rojas, D.; Feng, G.; Zhao, H.; et al. ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools. arXiv 2024, arXiv:2406.12793.
Issue Type | Handling Method | Example Modification |
---|---|---|
Minor factual error | Corrected by reviewer | Corrected misreported corrosion rate from 0.1 mm/y to 0.01 mm/y |
Ambiguous terminology | Clarified wording | Changed “damage” to “localized pitting damage” |
Difficulty mismatch | Adjusted question or answer complexity | Simplified multi-step electrochemical equation for basic-level question |
Irrelevant content | Removed from dataset | Deleted Q–A about unrelated polymer degradation |
Incomplete answer | Expanded with missing details | Added standard testing method reference |
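Appendix A groups the expert-reviewed Q–A pairs into accepted, revised, and deleted outcomes, and the issue-type table above describes how each problem was handled. The following Python sketch shows one hypothetical way to encode that review step: irrelevant content leads to deletion, any other listed issue requires a corrected answer, and pairs without issues are accepted unchanged. The names `QAPair`, `Issue`, `Decision`, `review`, and `build_dataset`, and the example pairs, are illustrative and do not come from the authors' tooling.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Issue(Enum):
    """Issue types taken from the review table."""
    MINOR_FACTUAL_ERROR = "Minor factual error"
    AMBIGUOUS_TERMINOLOGY = "Ambiguous terminology"
    DIFFICULTY_MISMATCH = "Difficulty mismatch"
    IRRELEVANT_CONTENT = "Irrelevant content"
    INCOMPLETE_ANSWER = "Incomplete answer"


class Decision(Enum):
    ACCEPTED = "accepted"
    REVISED = "revised"
    DELETED = "deleted"


@dataclass
class QAPair:
    question: str
    answer: str
    issues: List[Issue] = field(default_factory=list)
    revised_answer: Optional[str] = None  # reviewer's corrected answer, if any


def review(pair: QAPair) -> Decision:
    """Hypothetical rule: irrelevant content is deleted; any other issue needs revision."""
    if Issue.IRRELEVANT_CONTENT in pair.issues:
        return Decision.DELETED
    if pair.issues:
        return Decision.REVISED
    return Decision.ACCEPTED


def build_dataset(pairs: List[QAPair]) -> List[QAPair]:
    """Keep accepted pairs as-is and replace revised pairs with their corrected answers."""
    kept: List[QAPair] = []
    for pair in pairs:
        decision = review(pair)
        if decision is Decision.DELETED:
            continue
        if decision is Decision.REVISED:
            if pair.revised_answer is None:
                raise ValueError(f"Revised pair lacks a corrected answer: {pair.question!r}")
            pair = QAPair(pair.question, pair.revised_answer)
        kept.append(pair)
    return kept


# Hypothetical examples mirroring the three outcomes.
pairs = [
    QAPair("What initiates pitting in stainless steel?", "Chloride-induced breakdown of the passive film."),
    QAPair("What corrosion rate was measured?", "0.1 mm/y",
           [Issue.MINOR_FACTUAL_ERROR], revised_answer="0.01 mm/y"),
    QAPair("How do polymers degrade?", "Off-topic answer.", [Issue.IRRELEVANT_CONTENT]),
]
for p in build_dataset(pairs):
    print(p.question, "->", p.answer)
```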
Model Name | Parameters | Provider | Open Source |
---|---|---|---|
GPT-4 | About 200B [23] | OpenAI | No |
GPT-3.5 | About 175B [23] | OpenAI | No |
Llama3 | 8B [24] | Meta | Yes |
Llama3 | 70B [24] | Meta | Yes |
Qwen2 | 72B [25] | Alibaba Cloud | Yes |
Qwen2 | 7B [25] | Alibaba Cloud | Yes |
Qwen2.5 | 7B [26] | Alibaba Cloud | Yes |
GLM-4 | 9B [27] | Zhipu AI | Yes |
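Section 3.4 relates model performance to storage footprint. A rough rule of thumb for dense models is that the weights occupy about N × b bytes, where N is the parameter count and b is the bytes per parameter (4 for FP32, 2 for FP16/BF16, 1 for INT8, 0.5 for 4-bit formats). The sketch below applies this arithmetic to the nominal sizes of the open-source models in the table. It assumes the rounded labels (e.g., 7B) are exact and ignores tokenizer and configuration files, quantization scales, activations, and KV-cache memory, so actual checkpoint sizes will differ somewhat; the dictionary and function names are illustrative.

```python
# Approximate bytes per parameter for common storage formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

# Nominal parameter counts of the open-source models in the table.
MODELS = {
    "Llama3-8B": 8e9,
    "Llama3-70B": 70e9,
    "Qwen2-7B": 7e9,
    "Qwen2-72B": 72e9,
    "Qwen2.5-7B": 7e9,
    "GLM-4-9B": 9e9,
}


def weight_storage_gb(num_params: float, dtype: str = "bf16") -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n in MODELS.items():
    print(f"{name:>11}: {weight_storage_gb(n):6.1f} GB (bf16), "
          f"{weight_storage_gb(n, 'int4'):5.1f} GB (int4)")
```

Under these assumptions, an 8B model needs about 16 GB in BF16 and a 70B model about 140 GB, roughly a ninefold difference in storage for the two Llama3 variants.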
Evaluation Aspect | Main Concerns | Evaluation Options |
---|---|---|
Technical Accuracy | Accuracy and correctness in understanding corrosion mechanisms, material properties, and environmental factors | Completely consistent / Partially consistent / Inconsistent / No standard answer |
Content Completeness | Completeness of information, including coverage of corrosion factors, key points of diagnosis, and mitigation strategies | Comprehensive / Mostly comprehensive / Incomplete / Not applicable |
Practical Relevance | Relevance to practical problem scenarios and consistency with real-world corrosion conditions or engineering requirements | Highly relevant / Partially relevant / Not relevant / Unclear |
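Each human-evaluation aspect offers three ordered options plus one that signals the criterion could not be applied (No standard answer, Not applicable, Unclear). The table does not say how ratings were aggregated; the sketch below assumes one plausible scheme, mapping the ordered options to 2, 1, and 0 points, excluding the abstention option, and averaging per aspect. The point values, the `SCALES` mapping, the function name `aspect_scores`, and the example ratings are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical point mapping; None marks an abstention excluded from the average.
SCALES: Dict[str, Dict[str, Optional[int]]] = {
    "Technical Accuracy": {
        "Completely consistent": 2, "Partially consistent": 1,
        "Inconsistent": 0, "No standard answer": None,
    },
    "Content Completeness": {
        "Comprehensive": 2, "Mostly comprehensive": 1,
        "Incomplete": 0, "Not applicable": None,
    },
    "Practical Relevance": {
        "Highly relevant": 2, "Partially relevant": 1,
        "Not relevant": 0, "Unclear": None,
    },
}


def aspect_scores(ratings: List[Dict[str, str]]) -> Dict[str, Optional[float]]:
    """Average the mapped points of each aspect over all ratings, skipping abstentions."""
    result: Dict[str, Optional[float]] = {}
    for aspect, scale in SCALES.items():
        points = [scale[r[aspect]] for r in ratings if scale[r[aspect]] is not None]
        result[aspect] = float(mean(points)) if points else None
    return result


# Two hypothetical expert ratings of the same model answer.
ratings = [
    {"Technical Accuracy": "Completely consistent",
     "Content Completeness": "Mostly comprehensive",
     "Practical Relevance": "Highly relevant"},
    {"Technical Accuracy": "Partially consistent",
     "Content Completeness": "Not applicable",
     "Practical Relevance": "Partially relevant"},
]
print(aspect_scores(ratings))
```

Averaging ordinal labels treats them as equally spaced, which is an additional assumption; reporting the count of each label per aspect avoids it.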
Question | What are the main factors influencing the corrosion of carbon steel in a marine atmospheric environment? |
---|---|
Reference factors | Temperature; Humidity; Light intensity; Salinity; Pollutants; Chloride ion concentration |
Temp = 0.2 | Salt content; Humidity; Temperature; Pollutants; Wind force; Ultraviolet (UV) radiation |
Temp = 0.5 | Salt content; Humidity; Temperature; Wind speed; Pollutants; Protective coatings |
Temp = 0.7 | Salt content; Humidity; Temperature; Oxygen content; Pollutants; Mechanical stress |
Temp = 0.1 | Salt content; Humidity; Temperature; Wind force; Pollutants |
Qwen2.5-7B prompt-factor | Chloride ion concentration; Salt content; Oxygen content; Carbon dioxide content; Temperature; Humidity; Salt spray concentration; Wind speed; Microbial activity; Electrochemical processes |
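The Temp rows above presumably vary the decoding (sampling) temperature T. In standard temperature sampling, the logits z_i are divided by T before the softmax, p_i = exp(z_i / T) / Σ_j exp(z_j / T), so a low T concentrates probability on the highest-scoring tokens and a high T spreads it across more candidates. The minimal sketch below shows this effect; the three logit values and their labels are invented for illustration, and the function name is hypothetical.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token logits for three candidate factors.
logits = [2.0, 1.0, 0.5]  # e.g. "salt", "humidity", "wind"
for t in (0.1, 0.2, 0.5, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The ranking of candidates never changes with T; only how sharply probability mass is concentrated on the top candidate does, which is what makes lower settings more repeatable from run to run.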
Item | Response |
---|---|
Question | In one sentence, describe what the key factors contributing to pitting corrosion in stainless steel are? |
Reference Answer | Pitting corrosion in stainless steel is primarily caused by the localized breakdown of the passive oxide film, typically initiated by chloride ions. Contributing factors include chloride ion concentration, temperature, pH level, and alloy composition. |
Qwen2.5-7B | Pitting corrosion in stainless steel is primarily driven by the combined effects of chloride ions, halide ions, and a moist, oxygen-rich environment, often exacerbated by mechanical stress or localized areas of stagnant or stagnant-like conditions. |
Corr-LoRA | Pitting corrosion in stainless steel is primarily driven by chloride ion penetration and localized film breakdown. Other influencing factors include ambient salinity, residual welding stress, and the dislocation density near grain boundaries. |
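Side-by-side answers such as these can also be compared with the reference automatically. As an illustration only (not necessarily the metric used in Section 2.6), the sketch below computes a SQuAD-style bag-of-words token F1 between the first sentence of the Corr-LoRA answer and the first sentence of the reference answer. The lowercasing and punctuation-stripping normalization, and the function names `tokenize` and `token_f1`, are assumptions.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric word tokens (an assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of token precision and recall; shared tokens count min(count) times."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = ("Pitting corrosion in stainless steel is primarily caused by the localized breakdown "
             "of the passive oxide film, typically initiated by chloride ions.")
candidate = ("Pitting corrosion in stainless steel is primarily driven by chloride ion penetration "
             "and localized film breakdown.")
print(round(token_f1(candidate, reference), 3))
```

Lexical overlap rewards shared wording, not correctness: it cannot judge whether an added factor (for example, residual welding stress) is technically justified, so it complements rather than replaces expert review.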
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wu, W.; Xu, D.; Liu, L.; Wang, B.; Zhao, Y.; Cheng, X.; Li, X. Domain-Specialized Large Language Model for Corrosion Analysis: Construction and Evaluation of Corr-Lora-RAG. Appl. Sci. 2025, 15, 9226. https://doi.org/10.3390/app15169226
Chicago/Turabian Style: Wu, Weitong, Di Xu, Liangan Liu, Bingqin Wang, Yadi Zhao, Xuequn Cheng, and Xiaogang Li. 2025. "Domain-Specialized Large Language Model for Corrosion Analysis: Construction and Evaluation of Corr-Lora-RAG" Applied Sciences 15, no. 16: 9226. https://doi.org/10.3390/app15169226