CNN–BiLSTM–Attention-Based Hybrid-Driven Modeling for Diameter Prediction of Czochralski Silicon Single Crystals
Abstract
1. Introduction
- To address the nonlinearity and large time delays in electronic-grade semiconductor silicon single-crystal growth, as well as the difficulty of directly measuring the crystal diameter, this paper proposes a hybrid modeling approach that integrates mechanistic and data-driven models, combining interpretability with high predictive accuracy. The approach provides a theoretical foundation and practical guidance for crystal diameter prediction and for improving the quality of silicon single crystals;
- By integrating control theory with machine learning, the proposed hybrid modeling framework provides a new perspective on semiconductor silicon single-crystal growth and has significant implications for advancing the intelligent manufacturing of semiconductor materials.
2. Proposed Hybrid Modeling Framework
2.1. Mechanistic Modeling of the Silicon Single-Crystal Growth System
2.1.1. Heat-Transfer Model
- Governing equation for the temporal evolution of the heater temperature.
- Heat capacity of the heater.
- Heater volume.
- Radiative heat-transfer rate from heater to crucible. The quantities appearing in these relations are the heater power P, the heater specific heat capacity, the heater density, the outer and inner radii, the crucible surface area, the Stefan–Boltzmann constant, and the heater and crucible temperatures.
- Governing equation for the temporal evolution of the crucible temperature.
- Heat capacity of the crucible.
- Crucible volume. The quantities appearing in these relations are the radiative heat-transfer rate from the crucible to the environment, the radiative heat-transfer rate from the crucible to the melt, the conductive heat-transfer rate from the crucible to the melt, the specific heat capacity of the crucible material, the crucible material density, and the crucible height.
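For orientation, the lumped energy balances described in the list above can be sketched in a generic form. This is a minimal sketch under stated assumptions, not the source's exact equations. All symbols other than $P$ are notation assumed here, including the heater height $h_h$ and the heat-flow names $Q_{hc}$, $Q_{ce}$, $Q_{cm}^{\mathrm{rad}}$, and $Q_{cm}^{\mathrm{cond}}$. The radiative term assumes ideal black-body exchange over the crucible surface area, with no emissivity or view-factor corrections. The crucible volume is omitted because its geometry is not specified here.

```latex
% Assumed notation: subscript h = heater, c = crucible, e = environment, m = melt.
\begin{align}
C_h \frac{dT_h}{dt} &= P - Q_{hc}
  && \text{heater energy balance} \\
C_h &= c_h \, \rho_h \, V_h
  && \text{heater heat capacity} \\
V_h &= \pi \left( r_o^2 - r_i^2 \right) h_h
  && \text{hollow-cylinder volume ($h_h$ assumed)} \\
Q_{hc} &= \sigma A_c \left( T_h^4 - T_c^4 \right)
  && \text{radiation, heater to crucible} \\
C_c \frac{dT_c}{dt} &= Q_{hc} - Q_{ce} - Q_{cm}^{\mathrm{rad}} - Q_{cm}^{\mathrm{cond}}
  && \text{crucible energy balance} \\
C_c &= c_c \, \rho_c \, V_c
  && \text{crucible heat capacity}
\end{align}
```

Each balance states that the stored heat changes at the rate of heat flowing in minus heat flowing out, which is why the heater-to-crucible term appears with opposite signs in the two temperature equations.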
2.1.2. Geometric Model
- Governing equation for the temporal evolution of the melt height at the solid–liquid interface, obtained from mass conservation. The quantities involved are the crystal pulling velocity, the rate of change of the meniscus height, the melt density, and the crucible radius.
- Equation for the meniscus height (Equation (9)) and the relation that follows from it. The quantities involved are the capillary length, which depends on the meniscus surface tension and the melt density, the crystal growth angle, and the crystal tilt angle.
- Relationship between the temporal variation of the crystal radius and the growth rate.
- Temporal evolution of the tilt angle, which also involves the crucible lifting rate.
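Two of the geometric relations listed above can be illustrated numerically. The sketch below assumes the common convention $a = \sqrt{2\sigma / (\rho g)}$ for the capillary length and the widely used kinematic relation $\dot{r} = v_g \tan\theta$ between the radius, the growth rate $v_g$, and the tilt angle $\theta$. Both forms are assumptions that may differ from the source's exact expressions. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def capillary_length(surface_tension, melt_density, g=G):
    """Capillary length a = sqrt(2*sigma / (rho*g)) in metres (assumed convention)."""
    return math.sqrt(2.0 * surface_tension / (melt_density * g))


def integrate_radius(r0, growth_rate, tilt_angle_of_t, dt, n_steps):
    """Forward-Euler integration of dr/dt = v_g * tan(theta(t)) (assumed form).

    r0            initial crystal radius, m
    growth_rate   growth rate v_g, m/s (held constant in this sketch)
    tilt_angle_of_t  callable returning the tilt angle in radians at time t
    dt            time step, s
    """
    radii = [r0]
    r = r0
    for k in range(n_steps):
        theta = tilt_angle_of_t(k * dt)
        r += dt * growth_rate * math.tan(theta)
        radii.append(r)
    return radii


# Illustrative values only: approximate silicon melt properties, a 2 s step,
# and a tilt angle of 5 degrees that drops to zero after 600 s.
a = capillary_length(surface_tension=0.72, melt_density=2570.0)
radii = integrate_radius(
    r0=0.10,
    growth_rate=1.0e-5,
    tilt_angle_of_t=lambda t: math.radians(5.0) if t < 600.0 else 0.0,
    dt=2.0,
    n_steps=600,
)
print(f"capillary length: {a * 1e3:.2f} mm")
print(f"radius change: {(radii[-1] - radii[0]) * 1e3:.3f} mm")
```

With a zero tilt angle the radius stays constant, which corresponds to the constant-diameter stage; a positive tilt angle widens the crystal, as in shoulder formation.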
2.2. Convolutional Neural Network (CNN)
2.3. Long Short-Term Memory (LSTM) Network
2.3.1. Recurrent Neural Network (RNN)
2.3.2. Gating Mechanisms
2.4. Bidirectional Long Short-Term Memory (BiLSTM) Network
2.5. Self-Attention Mechanism
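As a point of reference for this section, a scaled dot-product self-attention step can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the paper's exact formulation. It uses two-dimensional query and key projections to match the hyperparameter table, and lets the input features act as the values directly (no value projection). The weight matrices, feature values, and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(weights, vector):
    """Multiply a (rows x cols) weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def self_attention(features, w_query, w_key):
    """Scaled dot-product self-attention over T time steps.

    The inputs themselves serve as values, a simplification made for this
    sketch. Returns (context_vectors, attention_weights).
    """
    queries = [project(w_query, x) for x in features]
    keys = [project(w_key, x) for x in features]
    scale = math.sqrt(len(keys[0]))
    contexts, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        row = softmax(scores)  # one weight per time step, summing to 1
        weights.append(row)
        contexts.append(
            [sum(w * x[j] for w, x in zip(row, features))
             for j in range(len(features[0]))]
        )
    return contexts, weights


# Hypothetical example: 4 time steps with 3 features each, projected to
# two-dimensional query/key vectors.
features = [[0.1, 0.2, 0.3], [0.0, 0.5, 0.1], [0.9, 0.1, 0.4], [0.3, 0.3, 0.3]]
w_query = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
w_key = [[0.2, 0.4, -0.1], [-0.3, 0.6, 0.7]]
contexts, weights = self_attention(features, w_query, w_key)
for t, row in enumerate(weights):
    print(t, [round(w, 3) for w in row])
```

Each row of `weights` shows how strongly one time step attends to every other time step, which is the per-time-step weighting the attention layer contributes to the model.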
2.6. CNN–BiLSTM–Attention-Based Hybrid-Driven Modeling Method
- (1) Data acquisition: To enrich the experimental sample dataset, data for the shoulder-formation and constant-diameter stages are collected from two sources: actual CZ silicon single-crystal growth processes in an industrial furnace (TDR-180, National and Local Engineering Research Center of Crystal Growth Equipment and System Integration, Xi’an University of Technology, Xi’an, China), sampled at an interval of 2 s, and crystal growth models constructed from first principles under different operating conditions in multiple furnaces.
- (2) Mechanistic model simulation: Using Simulink R2023a, the lifting speed and heater power recorded under actual operating conditions are fed into the mechanistic model to obtain diameter predictions. These predictions exhibit relatively large errors: purely mechanistic modeling yields crystal diameters that reach only about 50% of the actual diameter.
- (3) Data preprocessing: The data used for model training are preprocessed by handling missing values and outliers, filtering out random noise, and applying normalization. The dataset is then partitioned into training and test sets.
- (4) Training the CNN model for feature extraction: During silicon single-crystal growth, variations in crystal diameter are influenced by multiple factors, including thermal conditions and operating actions such as heater power and crystal pulling speed, which often exhibit spatial interrelationships. The CNN layer extracts local features from the raw input data, such as the relationship between heater power and crystal pulling speed, and its convolution operations identify these key patterns across different time points and spatial ranges. The extracted features provide rich inputs for the subsequent BiLSTM layer and diverse feature representations for the self-attention mechanism, so that the subsequent time-series modeling and the weighting of key features rest on a more accurate basis.
- (5) Training the BiLSTM network model: The BiLSTM layer captures temporal dependencies in the time-series data, using forward and backward LSTM networks to learn past and future information in the input sequence, respectively. In crystal diameter prediction, the BiLSTM extracts information from historical data that is relevant to the current diameter and, through the backward LSTM, captures the potential impact of future operations on diameter variation. The feature representations provided by the CNN serve as inputs to the BiLSTM layer, helping it to learn dynamic relationships and trends across time and to supply the attention layer with a temporally informed feature sequence. This bidirectional processing enables the model to predict future diameter changes more accurately, particularly under long control horizons and complex process conditions.
- (6) Self-attention focusing: The attention layer assigns different weights to the input features at each time step according to their importance. In the silicon single-crystal growth process, the control variables do not all affect the crystal diameter to the same extent; at certain time instants, heater power or pulling speed may have a much stronger influence on diameter variations, while other variables have only a minor effect. By weighting the features at each time step, the attention mechanism enables the model to focus adaptively on the most relevant information, thereby improving predictive accuracy. In the proposed model, the attention layer draws on both the local features extracted by the CNN and the temporal information captured by the BiLSTM to strengthen the focus on data at critical time instants.
- (7) Diameter prediction: During CZ silicon single-crystal growth, the crystal diameter is predicted by summing the measured data with the differential prediction values and feeding the result into the hybrid modeling module.
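The preprocessing in step (3) and the compensation idea behind step (7) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The specific choices (forward fill, clipping at k standard deviations, a causal moving average, min-max scaling, an 80/20 chronological split) are illustrative stand-ins for operations the text names only in general terms. The compensation follows one reading of the symbol table, in which the final prediction is the mechanistic prediction plus a data-driven correction. All function names and numbers are hypothetical.

```python
import statistics


def fill_missing(series):
    """Replace None by the previous valid value (first valid value at the start)."""
    last = next(v for v in series if v is not None)
    filled = []
    for v in series:
        last = v if v is not None else last
        filled.append(last)
    return filled


def clip_outliers(series, k=3.0):
    """Clip values lying more than k standard deviations from the mean."""
    mu, sd = statistics.fmean(series), statistics.pstdev(series)
    lo, hi = mu - k * sd, mu + k * sd
    return [min(max(v, lo), hi) for v in series]


def moving_average(series, window=3):
    """Causal moving-average filter to damp random noise."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def min_max_normalize(series):
    """Scale to [0, 1]; also return (min, max) so the scaling can be undone."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for constant series
    return [(v - lo) / span for v in series], (lo, hi)


def chronological_split(samples, train_fraction=0.8):
    """Split without shuffling so the test set lies after the training set in time."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def residual_targets(measured, mechanistic):
    """Training target for the data-driven model: measured minus mechanistic."""
    return [m - p for m, p in zip(measured, mechanistic)]


def compensate(mechanistic, correction):
    """Final prediction: mechanistic prediction plus estimated correction."""
    return [p + c for p, c in zip(mechanistic, correction)]


# Hypothetical diameters in mm; the mechanistic values sit near 50% of the
# measured ones, mirroring the error level reported in step (2).
measured = fill_missing([200.0, None, 200.4, 200.2, 200.6])
mechanistic = [100.0, 100.1, 100.2, 100.1, 100.3]
targets = residual_targets(measured, mechanistic)
scaled, (lo, hi) = min_max_normalize(moving_average(clip_outliers(targets)))
train, test = chronological_split(scaled)
# With a perfect correction, compensation recovers the measured diameter.
print(compensate(mechanistic, targets))
```

Learning the residual rather than the diameter itself keeps the mechanistic model in the loop, so the data-driven part only has to explain what the physics misses.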
3. Industrial Experiment Simulation
3.1. Performance Evaluation Metrics
3.2. Prediction Results and Analysis
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Symbol | Physical Meaning | Unit |
|---|---|---|
| P | Heater power | W |
| | Crystal pulling rate | mm/min |
| | Measured crystal diameter | mm |
| | Initial diameter prediction from the mechanistic model | mm |
| | Estimated diameter correction from the data-driven model | mm |
| | Final predicted diameter after compensation (mechanistic prediction plus data-driven correction) | mm |

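| Metric | Definition | Formula |
|---|---|---|
| MSE | Mean Squared Error | $\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2$ |
| RMSE | Root Mean Squared Error | $\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}$ |
| MAE | Mean Absolute Error | $\frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i-\hat{y}_i\right\rvert$ |
| MAPE | Mean Absolute Percentage Error | $\frac{100\%}{n}\sum_{i=1}^{n}\left\lvert \frac{y_i-\hat{y}_i}{y_i}\right\rvert$ |
| R² | Coefficient of Determination | $1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}$ |
| Note: $y_i$ is the measured value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the measured values, and $n$ the number of samples. | | |
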
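
The five evaluation metrics in the table above can be computed as follows. This is a minimal sketch that assumes the standard textbook definitions of each metric; the function name and the sample values are hypothetical, not results from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (in percent) and R^2 for two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE requires non-zero true values, which holds for diameters.
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical measured and predicted diameters in mm.
metrics = regression_metrics([200.0, 202.0, 204.0], [201.0, 202.0, 203.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because RMSE is the square root of MSE, the two columns in the results tables can be cross-checked against each other; for example, the square root of 2.4848 is approximately 1.5763.
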
| Stage | Convolution/Pooling Structure | BiLSTM Layer | Self-Attention Layer | Learning Rate | Epochs |
|---|---|---|---|---|---|
| Shoulder-formation | One convolution layer; one pooling layer | One layer with 12 hidden units | Two-dimensional key/query vectors | 0.005 | 50 |
| Constant-diameter | Sixteen convolution layers; one max-pooling layer | One layer with 15 hidden units | Two-dimensional key/query vectors | 0.001 | 150 |

| Model | R² | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|
| CNN | 71.30% | 2.4848 | 1.5763 | 1.1718 | 0.551% |
| LSTM | 74.40% | 2.2162 | 1.4887 | 1.2924 | 0.608% |
| BiLSTM | 80.31% | 1.7045 | 1.3056 | 0.9769 | 0.458% |
| CNN-BiLSTM | 91.71% | 0.7174 | 0.8470 | 0.6231 | 0.292% |
| **CNN-BiLSTM-Attention** | **98.54%** | **0.1262** | **0.3553** | **0.3173** | **0.151%** |
| Note: Bold indicates the proposed method in this study. | | | | | |


| Model | R² | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|
| CNN | 58.53% | 0.1023 | 0.3199 | 0.2629 | 0.041% |
| LSTM | 77.23% | 0.0293 | 0.1714 | 0.1461 | 0.083% |
| BiLSTM | 82.69% | 0.0460 | 0.2145 | 0.1719 | 0.039% |
| CNN-BiLSTM | 94.54% | 0.0131 | 0.1147 | 0.0861 | 0.049% |
| **CNN-BiLSTM-Attention** | **98.31%** | **0.0040** | **0.0636** | **0.0517** | **0.029%** |
| Note: Bold indicates the proposed method in this study. | | | | | |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, P.; Pan, H.; Chen, C.; Jing, Y.; Liu, D. CNN–BiLSTM–Attention-Based Hybrid-Driven Modeling for Diameter Prediction of Czochralski Silicon Single Crystals. Crystals 2026, 16, 57. https://doi.org/10.3390/cryst16010057
