Bandgap Prediction of Silicon Oxide Materials for Electric Furnace Refractories Based on Explainable Machine Learning
Abstract
1. Introduction
2. Model Building
2.1. Establish a Silicon Oxide Materials Database
2.2. Feature Extraction for Silicon Oxide Material Data
2.3. Hyperparameter Optimization: Exploring Optimal Model Parameters
2.4. Cross-Validation: Model Performance Evaluation
2.5. Model Evaluation Methodology
3. Results and Discussion
3.1. Hyperparameter Optimization and Cross-Validation Results of the Model
3.2. Comparison of Model Performance
4. Resolution of Bandgap Regulators in Silicon Oxides Based on SHAP Analysis
5. Conclusions
- The ensemble learning model makes accurate predictions even without conduction band minimum (CBM) and valence band maximum (VBM) features, and it improves performance by more than 25% compared with traditional methods. Its ensemble strategy also reduces the effect of noise on prediction stability while still explaining about 80% of the variance in the data (R² ≈ 0.80). This decoupled-feature approach removes traditional band theory's reliance on features that are directly correlated with the bandgap.
- Among the seven machine learning models compared, AdaBoost predicted the bandgap of silicon oxide materials most accurately, with the highest R² (0.80) and the lowest MAE (0.5 eV).
- SHAP explainability analysis established a physical link between material features and predicted bandgaps, showing that the model's decision-making is consistent with the principles of condensed matter physics. The thermodynamic stability indicator ‘energy above hull’ reaches a feature importance of 0.45, making it the core regulatory factor in the predictive model. SHAP analysis also revealed how key features such as ‘energy above hull’, ‘num of unique magnetic sites’, and ‘formation energy per atom’ relate to the bandgap. Optimizing these features can significantly enhance the thermal stability and erosion resistance of refractory materials, thereby reducing heat and electrical energy losses during electric furnace smelting and lowering CO₂ emissions.
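The SHAP analysis summarized above rests on Shapley values (Lundberg and Lee). The following is a minimal sketch of what those values are. It brute-forces exact Shapley values for a made-up three-input model (`toy_model`, hypothetical, not the paper's AdaBoost model), and it assumes that absent features are replaced by a single baseline input, whereas the shap library averages over a background dataset. It then averages absolute Shapley values over samples to get a global importance score. This is a common convention, but this excerpt does not say whether the 0.45 importance of ‘energy above hull’ was computed exactly this way.

```python
import statistics
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, brute-forced over all feature coalitions.

    A feature outside the coalition takes its baseline value (an assumption;
    the shap library averages over a background dataset instead).
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition keep their value from x; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # sizes 0 .. n-1 of coalitions that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution
        phi.append(total)
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical model with two additive terms and one interaction term."""
    return 3.0 * z[0] - 2.0 * z[1] + z[0] * z[2]


baseline = [0.0, 0.0, 0.0]
x = [1.0, 2.0, 3.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 6) for p in phi])  # [4.5, -4.0, 1.5]

# Global importance: mean |Shapley value| per feature over a hypothetical sample set.
samples = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
all_phi = [shapley_values(toy_model, s, baseline) for s in samples]
importance = [statistics.fmean(abs(p[j]) for p in all_phi) for j in range(len(baseline))]
print(importance)
```

The attributions satisfy the efficiency property: they sum to f(x) − f(baseline), which is 2.0 here. The interaction term z0·z2 is split equally between features 0 and 2. Enumeration needs 2^(n−1) coalitions per feature. That is manageable for a dozen features but grows quickly, so for tree models the shap library's TreeExplainer uses the polynomial-time TreeSHAP algorithm instead.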
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zharmenov, A.; Yefremova, S.; Satbaev, B.; Shalabaev, N.; Satbaev, S.; Yermishin, S.; Kablanbekov, A. Production of refractory materials using a renewable source of silicon dioxide. Minerals 2022, 12, 1010.
- Sadik, C.; El Amrani, I.E.; Albizane, A. Recent advances in silica-alumina refractory: A review. J. Asian Ceram. Soc. 2014, 2, 83–96.
- Liu, B.; Sun, J.; Guo, L.; Shi, H.; Feng, G.; Feldmann, L.; Yin, X.; Riedel, R.; Fu, Q.; Li, H. Materials design of silicon based ceramic coatings for high temperature oxidation protection. Mater. Sci. Eng. R Rep. 2025, 163, 100936.
- Matsumoto, Y.; Melendez, F.; Asomoza, R. Plasma CVD deposited p-type silicon oxide wide-bandgap material for solar cells. Sol. Energy Mater. Sol. Cells 1998, 52, 251–260.
- Tao, J.; Yan, Z.; Yang, J.; Li, J.; Lin, Y.; Huang, Z. Boosting the cell performance of the SiOx@C anode material via rational design of a Si-valence gradient. Carbon Energy 2022, 4, 129–141.
- Dinic, F.; Neporozhnii, I.; Voznyy, O. Machine learning models for the discovery of direct band gap materials for light emission and photovoltaics. Comput. Mater. Sci. 2024, 231, 112580.
- Morelock, R.J.; Bare, Z.J.L.; Musgrave, C.B. Bond-valence parameterization for the accurate description of DFT energetics. J. Chem. Theory Comput. 2022, 18, 3257–3267.
- Lee, J.; Seko, A.; Shitara, K.; Nakayama, K.; Tanaka, I. Prediction model of band gap for inorganic compounds by combination of density functional theory calculations and machine learning techniques. Phys. Rev. B 2016, 93, 115104.
- Mattur, M.N.; Nagappan, N.; Rath, S.; Thomas, T. Prediction of nature of band gap of perovskite oxides (ABO₃) using a machine learning approach. J. Mater. 2022, 8, 937–948.
- Lin, C.M.; Khatri, A.; Yan, D.; Chen, C.C. Machine Learning and First-Principle Predictions of Materials with Low Lattice Thermal Conductivity. Materials 2024, 17, 5372.
- Zhuo, Y.; Tehrani, A.M.; Brgoch, J. Predicting the band gaps of inorganic solids by machine learning. J. Phys. Chem. Lett. 2018, 9, 1668–1673.
- Li, J.; Song, Q.; Liu, Z.; Wang, D. Machine Learning for Predicting Band Gap in Boron-containing Materials. Acta Chim. Sin. 2024, 82, 387.
- Gok, E.C.; Yildirim, M.O.; Haris, M.P.U.; Eren, E.; Pegu, M.; Hemasiri, N.H.; Huang, P.; Kazim, S.; Oksuz, A.U.; Ahmad, S. Predicting perovskite bandgap and solar cell performance with machine learning. Sol. RRL 2022, 6, 2100927.
- Olsthoorn, B.; Geilhufe, R.M.; Borysov, S.S.; Balatsky, A.V. Band gap prediction for large organic crystal structures with machine learning. Adv. Quantum Technol. 2019, 2, 1900023.
- Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30.
- Jain, A.; Ong, S.P.; Hautier, G.; Chen, W.; Richards, W.D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G.; et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 2013, 1, 011002.
- Jain, A.; Montoya, J.; Dwaraknath, S.; Zimmermann, N.E.R.; Dagdelen, J.; Horton, M.; Huck, P.; Winston, D.; Cholia, S.; Ong, S.P.; et al. The materials project: Accelerating materials design through theory-driven data and tools. Handb. Mater. Model. Methods Theory Model. 2020, 2020, 1751–1784.
- Nargesian, F.; Samulowitz, H.; Khurana, U.; Khalil, E.B.; Turaga, D. Learning Feature Engineering for Classification. Proc. IJCAI 2017, 17, 2529–2535.
- Zhi, C.; Wang, S.; Sun, S.; Li, C.; Li, Z.; Wan, Z.; Wang, H.; Li, Z.; Liu, Z. Machine-learning-assisted screening of interface passivation materials for perovskite solar cells. ACS Energy Lett. 2023, 8, 1424–1433.
- Yu, T.; Zhu, H. Hyper-parameter optimization: A review of algorithms and applications. arXiv 2020, arXiv:2003.05689.
- Browne, M.W. Cross-validation methods. J. Math. Psychol. 2000, 44, 108–132.
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154.
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
- Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
No. | Feature | Count | Mean | Std | Min | 25% | 50% | 75% | Max |
---|---|---|---|---|---|---|---|---|---|
1 | number of sites | 5291 | 72.14 | 38.47 | 3 | 36 | 70 | 114 | 162 |
2 | number of elements | 5291 | 4.66 | 1.19 | 2 | 4 | 5 | 6 | 9 |
3 | volume (Å³) | 5291 | 3.27 | 0.94 | 0.57 | 2.77 | 3.16 | 3.64 | 9.19 |
4 | density (g/cm³) | 5291 | 3.27 | 0.94 | 0.57 | 2.77 | 3.16 | 3.64 | 9.19 |
5 | atomic density | 5291 | 13.60 | 2.84 | 7.06 | 12.07 | 13.34 | 14.49 | 58.15 |
6 | uncorrected energy per atom (eV/atom) | 5291 | −7.29 | 1.44 | −26.95 | −7.55 | −7.12 | −6.67 | −2.66 |
7 | energy per atom (eV/atom) | 5291 | −7.50 | 0.72 | −17.21 | −7.98 | −7.56 | −7.09 | −3.07 |
8 | formation energy per atom (eV/atom) | 5291 | −2.77 | 0.46 | −3.93 | −3.06 | −2.81 | −2.54 | 1.39 |
9 | energy above hull (eV/atom) | 5291 | 0.10 | 0.18 | 0 | 0.02 | 0.06 | 0.11 | 4.67 |
10 | efermi (eV) | 5291 | 2.21 | 1.96 | −6.28 | 1.20 | 1.95 | 2.88 | 14.32 |
11 | num of magnetic sites | 5291 | 1.65 | 3.87 | 0 | 0 | 0 | 2 | 83 |
12 | num of unique magnetic sites | 5291 | 1.49 | 4.81 | 0 | 0 | 0 | 1 | 79 |
13 | bandgap (eV) | 5291 | 2.74 | 1.56 | 0 | 1.63 | 2.78 | 3.84 | 6.99 |
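The summary statistics in the table above follow the layout of pandas' `DataFrame.describe()`. Here is a stdlib-only sketch of the same per-column statistics. It assumes describe()'s defaults (sample standard deviation and linearly interpolated quartiles). The five bandgap values are hypothetical and are not taken from the dataset.

```python
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Count, mean, std, min, quartiles and max of one feature column.

    Mirrors pandas DataFrame.describe() defaults: sample standard deviation
    (ddof=1) and linearly interpolated quartiles.
    """
    if len(values) < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    # method="inclusive" interpolates linearly between order statistics.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical bandgap values (eV) for five structures.
stats = describe([0.0, 1.5, 2.8, 3.9, 7.0])
for name, value in stats.items():
    print(f"{name:>5}: {value:.2f}")
```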
Hyperparameter | AdaBoost | GBR | XGBoost | LightGBM | Decision Tree | SVR | Random Forest |
---|---|---|---|---|---|---|---|
n_iter | 100 | 100 | 100 | 100 | 100 | 20 | 100 |
n_estimators | (50, 200) | (50, 300) | (50, 300) | (1, 300) | | | |
learning_rate | (0.01, 1) | (0.01, 1) | (0.01, 1) | (0.01, 1) | | | |
max_depth | (1, 20) | (1, 15) | (1, 15) | (1, 15) | (3, 20) | | (3, 20) |
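The search spaces in the table above can be explored with a fixed budget of n_iter sampled configurations. The sketch below assumes plain random search with uniform sampling. This excerpt does not say whether the authors used random search, Bayesian optimization, or another optimizer. The sketch minimizes a hypothetical `toy_objective`, which stands in for the cross-validated error of a model trained with the sampled hyperparameters.

```python
import random
from typing import Callable, Dict, Optional, Tuple, Union

Number = Union[int, float]
Space = Dict[str, Tuple[Number, Number]]

# AdaBoost search space copied from the table above. Integer bounds are sampled
# as inclusive integer ranges; float bounds as continuous uniform ranges (assumption).
ADABOOST_SPACE: Space = {
    "n_estimators": (50, 200),
    "learning_rate": (0.01, 1.0),
    "max_depth": (1, 20),
}


def sample_params(space: Space, rng: random.Random) -> Dict[str, Number]:
    """Draw one configuration uniformly at random from the search space."""
    params: Dict[str, Number] = {}
    for name, (low, high) in space.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            params[name] = rng.uniform(low, high)
    return params


def random_search(
    objective: Callable[[Dict[str, Number]], float],
    space: Space,
    n_iter: int,
    seed: int = 0,
) -> Tuple[Dict[str, Number], float]:
    """Return the sampled configuration with the lowest objective value."""
    if n_iter < 1:
        raise ValueError("n_iter must be positive")
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_params: Optional[Dict[str, Number]] = None
    best_score = float("inf")
    for _ in range(n_iter):
        params = sample_params(space, rng)
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params: Dict[str, Number]) -> float:
    """Hypothetical stand-in for a cross-validated MAE; lower is better."""
    return (
        abs(params["learning_rate"] - 0.5)
        + abs(params["max_depth"] - 10) / 20
        + abs(params["n_estimators"] - 150) / 150
    )


best, score = random_search(toy_objective, ADABOOST_SPACE, n_iter=100)
print(best, round(score, 3))
```

In a real run, `toy_objective` would be replaced by a function that trains the model with `params` and returns its mean cross-validated error.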
Hyperparameter | AdaBoost | GBR | XGBoost | LightGBM |
---|---|---|---|---|
n_estimators | 189 | 217 | 104 | 100 |
learning_rate | 0.98 | 0.03 | 0.08 | 0.12 |
max_depth | 17 | 13 | 14 | 15 |
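For the gradient-boosting models (GBR, XGBoost, LightGBM), `learning_rate` plays the role of the shrinkage factor ν in Friedman's stagewise additive update, and `n_estimators` sets the number of stages M. A smaller ν generally needs more stages to reach the same training fit. The update is sketched below in Friedman's notation, assuming h(x; a_m) is the m-th regression tree fitted to the negative gradient of the loss and ρ_m is its step size:

$$
F_m(\mathbf{x}) = F_{m-1}(\mathbf{x}) + \nu\,\rho_m\,h(\mathbf{x};\mathbf{a}_m), \qquad 0 < \nu \le 1, \quad m = 1, \dots, M
$$

The final predictor is F_M(x).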
Model | MAE (eV) | MSE (eV²) | R² |
---|---|---|---|
AdaBoost | 0.52 ± 0.02 | 0.58 ± 0.04 | 0.75 ± 0.02 |
GBR | 0.55 ± 0.02 | 0.59 ± 0.03 | 0.75 ± 0.02 |
XGBoost | 0.55 ± 0.02 | 0.6 ± 0.04 | 0.75 ± 0.02 |
LightGBM | 0.58 ± 0.01 | 0.63 ± 0.02 | 0.73 ± 0.01 |
Decision Tree | 0.71 ± 0.04 | 1 ± 0.04 | 0.58 ± 0.02 |
SVR | 0.6 ± 0.04 | 0.74 ± 0.04 | 0.69 ± 0.02 |
Random Forest | 0.59 ± 0.02 | 0.63 ± 0.03 | 0.73 ± 0.02 |

Values are mean ± standard deviation.
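The mean and standard deviation values in the table above can be produced by k-fold cross-validation. The sketch below assumes k = 5 with shuffled folds, since the fold count is not given in this excerpt, and it reports sample standard deviations. To run with the standard library alone, it substitutes a one-feature least-squares line (hypothetical `fit_line`) for the ensemble models. The data are synthetic.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAE, MSE, R²) for one fold."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # R² = 1 - SS_res / SS_tot
    return mae, mse, r2


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical placeholder model: one-feature ordinary least squares."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_validate(xs, ys, k: int = 5, seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Return {metric: (mean, sample std)} over k folds."""
    fold_scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in test_set]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        y_true = [ys[i] for i in test_idx]
        y_pred = [model(xs[i]) for i in test_idx]
        fold_scores.append(regression_metrics(y_true, y_pred))
    return {
        name: (statistics.fmean(col), statistics.stdev(col))
        for name, col in zip(("MAE", "MSE", "R2"), zip(*fold_scores))
    }


# Synthetic data: y = 2x + 1 plus Gaussian noise (illustration only).
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.3) for x in xs]
for metric, (mean, sd) in cross_validate(xs, ys).items():
    print(f"{metric}: {mean:.2f} ± {sd:.2f}")
```

Reporting the spread across folds, and not only the mean, shows how stable each model is across data splits. This stability is what the standard deviation values in the table convey.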
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, X.; Wu, Y.; Yang, J.; Zhao, X.; Han, Y. Bandgap Prediction of Silicon Oxide Materials for Electric Furnace Refractories Based on Explainable Machine Learning. Processes 2025, 13, 1595. https://doi.org/10.3390/pr13051595
Chicago/Turabian Style: Zhao, Xin, Yanqing Wu, Jinmei Yang, Xuan Zhao, and Yang Han. 2025. "Bandgap Prediction of Silicon Oxide Materials for Electric Furnace Refractories Based on Explainable Machine Learning" Processes 13, no. 5: 1595. https://doi.org/10.3390/pr13051595