Fine-Tuned Large Language Models for High-Accuracy Prediction of Band Gap and Stability in Transition Metal Sulfides
Abstract
1. Introduction
2. Materials and Methods
2.1. High-Quality Dataset Construction
2.2. Traditional Language Models
2.3. Prompt-Only ChatGPT Approach
2.4. Fine-Tuning of Large Language Models
3. Results and Discussion
3.1. Description of Fine-Tuned Datasets for Transition Metal Sulfides
3.2. Fine-Tuning for Performance Enhancement and Model Iterations
3.3. Comparative Analysis of Model Prediction Accuracy
3.4. Analysis of Prediction Patterns and Model Improvement
3.5. Advantages of Fine-Tuning LLMs for Materials Science Applications
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Metric | GPT-3.5 | GPT-4.0 | GPT-FT
---|---|---|---
R² | 0.7937 | 0.8542 | 0.9989
RMSE | 0.3041 | 0.2453 | 0.0252

Metric | RF | SVM | XGBoost | LightGBM
---|---|---|---|---
Train R² | 0.9713 | 0.9533 | 0.9624 | 0.9612
Train RMSE | 0.1077 | 0.1874 | 0.0842 | 0.1208
Test R² | 0.9655 | 0.9410 | 0.9460 | 0.9592
Test RMSE | 0.1491 | 0.1951 | 0.1867 | 0.1623
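The two metrics reported in the tables above, the coefficient of determination (R²) and the root-mean-square error (RMSE), can be computed from paired true and predicted values. The following is a minimal sketch that assumes the standard textbook definitions, since the formulas used by the authors are not given here. The function names and the toy band-gap values are hypothetical and are not taken from the article's dataset.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition, assumed)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy band gaps in eV, for illustration only (not the paper's data).
band_gap_true = [0.0, 1.0, 2.0, 3.0]
band_gap_pred = [0.1, 0.9, 2.1, 2.9]

print(f"R2   = {r2_score(band_gap_true, band_gap_pred):.4f}")
print(f"RMSE = {rmse(band_gap_true, band_gap_pred):.4f}")
```

Under these definitions, a higher R² and a lower RMSE both indicate closer agreement between predicted and reference values, which is how the table entries are compared across models.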
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, Z.; Hu, L.; Wang, H. Fine-Tuned Large Language Models for High-Accuracy Prediction of Band Gap and Stability in Transition Metal Sulfides. Materials 2025, 18, 3793. https://doi.org/10.3390/ma18163793