Article

CNN–BiLSTM–Attention-Based Hybrid-Driven Modeling for Diameter Prediction of Czochralski Silicon Single Crystals

1 School of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China
2 National and Local Engineering Research Center of Crystal Growth Equipment and System Integration, Xi’an University of Technology, Xi’an 710048, China
3 Growing Factory, Xi’an ESWIN Material Technology Co., Ltd., Xi’an 710114, China
4 E2 Growing Technology Department, Xi’an XINSEMI Material Technology Co., Ltd., Xi’an 710065, China
* Author to whom correspondence should be addressed.
Crystals 2026, 16(1), 57; https://doi.org/10.3390/cryst16010057
Submission received: 5 December 2025 / Revised: 5 January 2026 / Accepted: 10 January 2026 / Published: 13 January 2026
(This article belongs to the Section Inorganic Crystalline Materials)

Abstract

High-precision prediction of the crystal diameter during the growth of electronic-grade silicon single crystals is a critical step for the fabrication of high-quality single crystals. However, the process features high-temperature operation, strong nonlinearities, significant time-delay dynamics, and external disturbances, which limit the accuracy of conventional mechanism-based models. In this study, mechanism-based models denote physics-informed heat-transfer and geometric models that relate heater power and pulling rate to diameter evolution. To address this challenge, this paper proposes a hybrid deep learning model combining a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and self-attention to improve diameter prediction during the shoulder-formation and constant-diameter stages. The proposed model leverages the CNN to extract localized spatial features from multi-source sensor data, employs the BiLSTM to capture temporal dependencies inherent to the crystal growth process, and utilizes the self-attention mechanism to dynamically highlight critical feature information, thereby substantially enhancing the model’s capacity to represent complex industrial operating conditions. Experiments on operational production data collected from an industrial Czochralski (Cz) furnace, model TDR-180, demonstrate improved prediction accuracy and robustness over mechanism-based and single data-driven baselines, supporting practical process control and production optimization.

1. Introduction

Silicon, as a canonical semiconductor material, exhibits outstanding electronic properties and permits effective modulation of electrical conductivity between the conductive and insulating regimes, making it indispensable to fields such as integrated circuits, automotive electronics, and telecommunications. The Czochralski (CZ) method is a widely used silicon single-crystal growth technique, extensively employed in semiconductor manufacturing [1,2]. The core step of this method involves heating high-purity silicon granules to temperatures above approximately 1400 °C, rendering them molten. A small seed crystal is subsequently immersed in the molten silicon and slowly withdrawn while being rotated, so as to ensure uniform crystal growth. During growth, precise control of the temperature and pulling rate [3,4,5] ensures that silicon atoms deposit onto the surface of the seed crystal, forming a well-ordered crystal lattice. Silicon single crystals produced by the CZ process exhibit excellent electrical properties and high purity, making them well suited for the fabrication of advanced electronic devices such as microprocessors, memory chips, and optoelectronic sensors.
In recent years, advances in the integrated circuit industry have imposed more stringent requirements on the quality management of single-crystal silicon wafers and related manufacturing processes. Typically, the principal manipulated variables governing variations in crystal diameter are the heater power and the crystal pulling rate. Among these, the optimal adjustment range of the crystal pulling rate is narrow, and frequent changes can induce pronounced fluctuations in the growth interface, increasing the risk of crystal breakage and defect formation [6]. By contrast, regulating the crystal diameter via heater power is a slow, time-delayed process that offers a wider actuation range and is less likely to disturb the growth interface. Consequently, the development of a robust growth process model that quantitatively relates the manipulated variables (heater power and pulling rate) to crystal diameter, together with an accurate diameter prediction framework, constitutes a critical prerequisite for achieving high-quality CZ silicon single-crystal growth.
One of the widely adopted models for crystal diameter prediction [7] is the mechanistic model. Mechanistic modeling enables quantitative characterization and prediction of system behavior and processes by leveraging fundamental scientific principles and physical laws. This modeling approach is extensively employed in physics, chemistry, biology, and engineering to facilitate a deeper understanding of the dynamic behaviors of complex systems. Liu et al. [8] performed a transient global simulation of CZ silicon single-crystal growth, systematically investigating the generation, transport, and segregation behaviors of oxygen and carbon, and elucidating their interfacial distribution characteristics in relation to melt depth and fluid flow structures. Wang et al. [9] formulated a lumped-parameter model of oxygen concentration dynamics for CZ silicon single-crystal growth, thereby facilitating the real-time implementation of mechanistic models for growth optimization and control. Popescu et al. [10] used numerical models to elucidate the temporal evolution of temperature and oxygen concentration in CZ silicon single-crystal growth and quantified the sensitivity of their fluctuations to the pulling rate and crucible rotation rate. In modeling Cz silicon single-crystal growth, both lumped-parameter and distributed-parameter models face practical limitations. Lumped-parameter models assume spatially uniform temperatures for components such as the heater, crucible, and melt and represent energy balance with ordinary differential equations, whereas distributed-parameter models resolve spatially varying fields via partial differential equations. Parameter uncertainty, sensitivity to operating conditions, and the high computational cost of distributed-parameter models hinder real-time industrial implementation. Furthermore, critical model information and control-relevant data are difficult to acquire online in real time. 
Thus, mechanistic approaches are limited by high modeling cost, strong dependence on domain expertise, difficulty in online acquisition of critical parameters, and inaccuracies induced by model simplifications.
Another commonly employed class of models is data-driven modeling. With continued advances in computing hardware, modeling approaches based on data analytics and machine learning have attracted increasing attention. Sun et al. [11] analyzed the necessity of employing deep learning for soft sensing, reviewed mainstream models and toolchains, and summarized the practical requirements and challenges encountered in real-world applications. Kutsukate et al. [12] employed machine learning models to quantify the effects of process parameters on the interstitial oxygen concentration (Oi) in CZ silicon single-crystal growth, demonstrating that Oi increases with higher crucible rotation rates and lower argon flow, and confirming nonlinear parameter dependencies, thereby informing growth optimization strategies. Liu et al. [13] developed a constant-pulling-rate control scheme for CZ silicon single-crystal growth, in which a stacked sparse autoencoder-based generalized predictive controller substantially enhances diameter control accuracy. Wan et al. [14] proposed a data-driven model predictive control framework leveraging a hybrid weighted stacked autoencoder–random forest model to optimize growth rate-to-temperature-gradient ratio (V/G) control in CZ silicon single-crystal growth, improving crystal quality to satisfy process requirements. Jiang et al. [15] developed a real-time soft sensor based on deep belief network–support vector regression to mitigate prediction latency and enable high-accuracy tail-diameter prediction in CZ silicon single-crystal growth. Qi et al. [16] developed a neural networks–genetic algorithm hybrid optimization scheme for CZ silicon single-crystal growth parameters, lowering growth-interface shape deviation and oxygen concentration to enhance crystal quality.
Despite high accuracy and low implementation cost, data-driven approaches often yield black-box models with limited interpretability, offering little mechanistic insight into the intrinsic dynamics of CZ silicon single-crystal growth.
A hybrid model is constructed by integrating mechanistic and data-driven approaches, leveraging their complementary strengths to develop and optimize the soft-sensing model. Zabihi et al. [17] developed a hybrid fault detection framework fusing physics-based sensor models with data-driven learning, and experimentally verified its effectiveness for fault detection and quantification in target components. Chen et al. [18] introduced a theory-guided hard-constraint projection (HCP) that embeds domain knowledge in neural networks and uses a projection to enforce physical consistency, reducing data needs and improving accuracy and robustness. Kato et al. [19] proposed a gray-box model for predicting control variables in the CZ silicon single-crystal growth process, achieving significantly higher predictive accuracy than conventional first-principles models. Ren et al. [20] developed a hybrid mechanism and data-driven model to predict critical quality indicators in CZ silicon single-crystal growth, thereby achieving substantially improved predictive accuracy. Wan et al. [21] proposed a soft sensor and a performance-driven hierarchical control scheme for online monitoring of critical variables in CZ silicon single-crystal growth, validated on industrial data. Hybrid-driven modeling, by jointly leveraging mechanistic knowledge and data, enables real-time monitoring and high-accuracy prediction. Its distinctive strength lies in combining the interpretability of mechanism-based models with the generalization capacity of data-driven approaches, thereby enhancing robustness and adaptability and allowing the resulting model to more faithfully reflect actual operating conditions.
The novelty of this work lies in a hybrid modeling approach that integrates mechanistic and data-driven models to predict crystal diameter during the shoulder-formation and constant-diameter stages in the growth of semiconductor-grade silicon single crystals. The mechanistic model accurately characterizes the evolution of crystal diameter during growth, while the data-driven model refines the diameter prediction using process data, thereby achieving predictions that are both interpretable and accurate. Concurrent with this, we propose an algorithm that integrates a CNN [22,23,24], a BiLSTM [25,26] network, and an adaptive prediction-enhancement module, thereby improving the accuracy and robustness of diameter prediction. The main contributions of this work are as follows:
  • To address the nonlinearity and large time delays in electronic-grade semiconductor silicon single-crystal growth, as well as the difficulty of directly measuring the crystal diameter, this paper proposes a hybrid modeling approach that integrates mechanistic and data-driven models, combining interpretability with high predictive accuracy. This innovative research approach provides a theoretical foundation and practical guidance for crystal diameter prediction and for improving the quality of silicon single crystals;
  • To enhance the predictive accuracy and robustness of crystal diameter estimation, this paper proposes an algorithm based on a CNN–BiLSTM architecture augmented with an attention mechanism [27,28], thereby broadening its scope of practical application and validating its accuracy and efficiency;
  • By integrating control theory with machine learning, the proposed hybrid modeling framework provides a new perspective on semiconductor silicon single-crystal growth and has significant implications for advancing the intelligent manufacturing of semiconductor materials.

2. Proposed Hybrid Modeling Framework

This paper proposes a hybrid-driven modeling approach, based on a CNN–BiLSTM–Attention architecture, for diameter prediction during the shoulder-formation and constant-diameter stages of CZ silicon single-crystal growth. Figure 1 illustrates the key stages of silicon single-crystal growth, including charging/melting, seeding, shoulder-formation, and constant-diameter growth. Across these stages, the crystal diameter is strongly influenced by operating variables such as heater power and pulling rate. Motivated by the strong coupling between these operating variables and diameter evolution, we focus on the shoulder-formation and constant-diameter stages, in which accurate diameter prediction is essential for stable diameter control. Figure 2 presents the proposed two-module modeling architecture: (i) The mechanistic module, developed from heat-transfer and geometric relations and complemented by energy-balance and fluid-dynamics considerations, provides an initial diameter prediction. (ii) The data-driven module then learns to predict and compensate for the residual error using historical and real-time multi-source sensor data, yielding a refined diameter prediction for decision-making and process control. Table 1 summarizes the main variables involved and their physical meanings as well as units.
The data-driven module leverages machine learning and statistical learning techniques to capture data regularities in silicon single-crystal growth, such as the empirical effects of heater power and pulling rate on diameter evolution. It produces dynamic short-horizon predictions of the diameter residual $\hat{e}_D$, which are used in Figure 2 to refine the mechanistic prediction. These predictions support control-oriented decisions, for example by informing heater power adjustments for energy-efficient and stable diameter regulation. Moreover, large predicted diameter fluctuations can serve as an early warning signal of elevated instability risk, helping reduce defect-prone operating conditions. Prediction is implemented using a CNN–BiLSTM–Attention approach, and the detailed algorithmic framework is shown in Figure 3, which provides a zoomed-in view of the data-driven module in Figure 2.
The network takes as inputs the residual signal produced by the mechanistic model together with historical multivariate time-series data collected during silicon single-crystal growth, where the time series consist of sequential measurements of process variables (e.g., heater power, pulling rate, measured diameter, and other sensor signals) that reflect the dynamic evolution of the growth process. After dataset construction, a BiLSTM serves as the core temporal modeling component. A sequence-folding layer reshapes each multivariate sequence into a two-dimensional matrix, enabling the CNN to apply two-dimensional convolution kernels to extract localized inter-variable correlations at each time step. The BiLSTM then captures bidirectional temporal dependencies, and the self-attention mechanism emphasizes informative time steps. Finally, a fully connected layer aggregates the learned features and outputs the prediction. The predicted signal is inverse-normalized to obtain the predicted diameter-related time series in the original scale, and the performance is evaluated using mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE).
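As a concrete illustration of the final evaluation step, the error metrics and inverse normalization described above can be sketched in Python as follows. The function names and the min–max inverse-scaling convention are illustrative assumptions, not the exact implementation used in this work:

```python
import numpy as np

def inverse_normalize(y_scaled, y_min, y_max):
    """Map min-max-scaled predictions back to the original diameter scale."""
    return y_scaled * (y_max - y_min) + y_min

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```

In practice the three metrics are computed on the inverse-normalized predictions, so that they are reported in physical units (e.g., millimeters of diameter) rather than on the normalized scale.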

2.1. Mechanistic Modeling of the Silicon Single-Crystal Growth System

2.1.1. Heat-Transfer Model

Assuming that all energy released by the heater is completely absorbed by the crucible with no losses, the first law of thermodynamics allows us to derive lumped-parameter models for the respective components.
  • Governing equation for the temporal evolution of the heater temperature $\dot{T}_h$:
    $$\dot{T}_h = \frac{1}{C_h}\left(P_{in} - q_{hc}\right)$$
  • The heat capacity $C_h$ of the heater is given by the following:
    $$C_h = S_h \, \rho_h \, V_h$$
  • Heater volume $V_h$:
    $$V_h = \pi \left(R_{ho}^2 - R_{hi}^2\right) H_h$$
  • Radiative heat-transfer rate from heater to crucible $q_{hc}$:
    $$q_{hc} = A_c \, \sigma \left(T_h^4 - T_c^4\right)$$
    where $P_{in}$ is the heater power, $S_h$ is the specific heat capacity of the heater material, $\rho_h$ is the heater density, $R_{ho}$ and $R_{hi}$ are the outer and inner heater radii, $H_h$ is the heater height, $A_c$ is the crucible surface area, $\sigma$ is the Stefan–Boltzmann constant, and $T_h$ and $T_c$ are the heater and crucible temperatures.
  • Governing equation for the temporal evolution of the crucible temperature $\dot{T}_c$:
    $$\dot{T}_c = \frac{1}{C_c}\left(q_{hc} - q_{co} - q_{cs} - q_{cm}\right)$$
  • The crucible heat capacity $C_c$ is computed as follows:
    $$C_c = S_c \, \rho_c \, V_c$$
  • The crucible volume $V_c$ is computed as follows:
    $$V_c = \pi R_c^2 H_c$$
    where $q_{co}$ denotes the radiative heat-transfer rate from the crucible to the environment, $q_{cs}$ the radiative heat-transfer rate from the crucible to the melt, $q_{cm}$ the conductive heat-transfer rate from the crucible to the melt, $S_c$ the specific heat capacity of the crucible material, $\rho_c$ the crucible material density, and $H_c$ the crucible height.
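The lumped-parameter heater/crucible energy balance above can be sketched as a simple explicit-Euler time step. This is a minimal sketch under the stated no-loss coupling assumption; the parameter names and the dictionary-based interface are illustrative, and the crucible loss terms are treated as given constants rather than computed from radiation/conduction laws:

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def heater_crucible_step(T_h, T_c, P_in, p, dt):
    """One explicit-Euler step of the lumped heater/crucible energy balance.

    p holds illustrative parameters: S_h, rho_h, R_ho, R_hi, H_h (heater),
    S_c, rho_c, R_c, H_c, A_c (crucible), and loss terms q_co, q_cs, q_cm.
    """
    V_h = np.pi * (p["R_ho"] ** 2 - p["R_hi"] ** 2) * p["H_h"]  # heater volume
    C_h = p["S_h"] * p["rho_h"] * V_h                           # heater heat capacity
    V_c = np.pi * p["R_c"] ** 2 * p["H_c"]                      # crucible volume
    C_c = p["S_c"] * p["rho_c"] * V_c                           # crucible heat capacity

    q_hc = p["A_c"] * SIGMA * (T_h ** 4 - T_c ** 4)             # radiative heater-to-crucible transfer

    T_h_next = T_h + dt * (P_in - q_hc) / C_h                   # heater energy balance
    T_c_next = T_c + dt * (q_hc - p["q_co"] - p["q_cs"] - p["q_cm"]) / C_c
    return T_h_next, T_c_next
```

A real implementation would use a stiff ODE solver and temperature-dependent material properties; the sketch only mirrors the structure of the two balance equations.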

2.1.2. Geometric Model

  • Governing equation $\dot{H}_m$ for the temporal evolution of the melt height at the solid–liquid interface from mass conservation:
    $$\dot{H}_m = \frac{\rho_s R_i^2 \left(V_p - \dot{H}_{men}\right)}{\rho_m R_c^2 - \rho_s R_i^2}$$
    where $V_p$ denotes the crystal pulling velocity, $\dot{H}_{men}$ the rate of change of the meniscus height, $\rho_s$ the crystal density, $\rho_m$ the melt density, and $R_c$ the crucible radius.
  • Equation for the meniscus height $H_{men}$:
    $$H_{men} = a \sqrt{\frac{1 - \sin(\alpha_0 + \alpha_c)}{1 + a/(2R_i)}}$$
    From Equation (9), it follows that
    $$\dot{H}_{men} = \frac{\partial H_{men}}{\partial R_i}\,\dot{R}_i + \frac{\partial H_{men}}{\partial \alpha_c}\,\dot{\alpha}_c$$
    where $a$ denotes the capillary length, which depends on the meniscus surface tension and the melt density, $\alpha_0$ denotes the crystal growth angle, and $\alpha_c$ denotes the crystal tilt angle.
  • Relationship between the temporal variation of the crystal radius $R_i$ and the growth rate $V_g$:
    $$\dot{R}_i = V_g \tan\alpha_c$$
  • Temporal evolution of the tilt angle $\dot{\alpha}_c$:
    $$\dot{\alpha}_c = \frac{V_p - V_{cruc} - C_{\alpha z} V_g}{C_{\alpha n}}$$
    where
    $$C_{\alpha z} = 1 - \frac{\rho_s R_i^2}{\rho_m R_c^2} + \left[\left(1 - \frac{R_i^2}{R_c^2}\right)\frac{\partial H_{men}}{\partial R_i} - \frac{2 R_i H_{men}}{R_c^2} - \frac{a^2}{R_c^2}\cos(\alpha_0 + \alpha_c)\right]\tan\alpha_c$$
    $$C_{\alpha n} = \left(1 - \frac{R_i^2}{R_c^2}\right)\frac{\partial H_{men}}{\partial \alpha_c} + \frac{a^2 R_i}{R_c^2}\sin(\alpha_0 + \alpha_c)$$
    $$\frac{\partial H_{men}}{\partial R_i} = \frac{a^2\left(1 - \sin(\alpha_0 + \alpha_c)\right)}{4 R_i^2 \left(1 + a/(2R_i)\right) S_\alpha}$$
    $$\frac{\partial H_{men}}{\partial \alpha_c} = -\frac{a \cos(\alpha_0 + \alpha_c)}{2 S_\alpha}$$
    $$S_\alpha = \sqrt{\left(1 - \sin(\alpha_0 + \alpha_c)\right)\left(1 + \frac{a}{2R_i}\right)}$$
    where $V_{cruc}$ denotes the crucible lifting rate.
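Two of the geometric relations lend themselves to a short numerical sketch. The code below assumes the meniscus-height relation of Equation (9), $H_{men} = a\sqrt{(1-\sin(\alpha_0+\alpha_c))/(1+a/(2R_i))}$, and the radius-rate relation $\dot{R}_i = V_g\tan\alpha_c$; the function names and the sample parameter values are illustrative only:

```python
import numpy as np

def meniscus_height(R_i, a, alpha):
    """Meniscus height H_men for crystal radius R_i, capillary length a,
    and total angle alpha = alpha_0 + alpha_c (Equation (9))."""
    return a * np.sqrt((1.0 - np.sin(alpha)) / (1.0 + a / (2.0 * R_i)))

def radius_rate(V_g, alpha_c):
    """Rate of change of the crystal radius: dR_i/dt = V_g * tan(alpha_c)."""
    return V_g * np.tan(alpha_c)
```

The sketch reflects two useful sanity checks: when the total angle reaches 90° the meniscus height vanishes, and when the tilt angle is zero the crystal grows at constant radius, which is exactly the condition targeted during the constant-diameter stage.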

2.2. Convolutional Neural Network (CNN)

In silicon single-crystal growth, the measured process variables form a multivariate time series, where each time step contains synchronized samples from multiple sensor channels. In this work, the input at each time step consists of heater power, pulling rate, and the measured crystal diameter, which are stacked as multiple channels. Convolutional layers apply sliding filters across the channel dimension to extract localized interparameter couplings, that is, short-range correlations among variables within the same time step, while reducing the sensitivity to noise and improving feature compactness.
A CNN is a deep learning architecture designed for processing grid-structured data [29], as illustrated in Figure 4. By exploiting local receptive fields and weight sharing, the CNN efficiently learns hierarchical feature representations through stacked layers. In the proposed framework, the CNN serves as the first feature extraction stage of the data-driven module shown in Figure 3. The multichannel time-series input is first reshaped by the sequence-folding operation into a two-dimensional representation, after which the CNN extracts spatial features that are passed to the subsequent BiLSTM and attention blocks for temporal modeling and feature weighting. Pooling layers perform downsampling to reduce the dimensionality of intermediate feature maps. In our setting, pooling compresses the CNN feature maps along the folded representation, which decreases computational cost while retaining the most informative local patterns for downstream BiLSTM modeling.
The convolutional layer constitutes the core of a CNN and extracts local features by applying sliding convolutional kernels over the input. The convolution operation is essentially an inner product between the input feature map and a convolution kernel, and weight sharing substantially reduces the number of trainable parameters. The convolutional layer output $S(t)$ is given by the following:
$$S(t) = (X * W)(t) = \sum_{i=1}^{k} x_{t-i+1} \, w_i$$
where $X = (x_1, x_2, \ldots, x_T)$ is the input sequence, $W$ is the convolution kernel, $k$ is the kernel size, $t$ is the convolution position, and $S(t)$ denotes the output at position $t$.
Pooling layers reduce data dimensionality and computational burden while preserving the principal feature information. The most common pooling operations are max pooling and average pooling. Assuming the convolutional output is $A = (a_1, a_2, \ldots, a_T)$ and max pooling slides a window of length $k$ with stride $s$ along the time axis, the max-pooling operation is given by the following:
$$P(t) = \max\left(a_t, a_{t+1}, \ldots, a_{t+k-1}\right)$$
where $P(t)$ denotes the pooled output at position $t$, i.e., the maximum value within the pooling window. For average pooling, the operation is as follows:
$$P(t) = \frac{1}{k}\sum_{i=t}^{t+k-1} a_i$$
where $P(t)$ denotes the pooled output, i.e., the average value within the pooling window.
The fully connected layer maps the features extracted by the convolutional and pooling layers to the final output space. Assuming the output $A$ of the convolution and pooling blocks is flattened and fed to the fully connected layer, the fully connected mapping is as follows:
$$y = W_f \cdot A + b_f$$
where $y$ denotes the output of the fully connected layer, $W_f$ is the weight matrix of the fully connected layer, $b_f$ is the bias term, and $A$ is the output from the convolution and pooling layers.
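The three CNN building blocks above can be sketched with plain NumPy; this is a minimal illustration of the stated formulas (note the kernel flip implied by the convolution sum), not the network layers actually used, and the function names are assumptions:

```python
import numpy as np

def conv1d_valid(x, w):
    """'Valid' 1-D convolution: S(t) = sum_i x[t-i+1] * w[i].
    The kernel is reversed, matching true convolution."""
    k = len(w)
    return np.array([np.dot(x[t:t + k], w[::-1]) for t in range(len(x) - k + 1)])

def max_pool1d(a, k, s):
    """Max pooling with window length k and stride s."""
    return np.array([a[t:t + k].max() for t in range(0, len(a) - k + 1, s)])

def avg_pool1d(a, k, s):
    """Average pooling with window length k and stride s."""
    return np.array([a[t:t + k].mean() for t in range(0, len(a) - k + 1, s)])
```

For example, `conv1d_valid` with kernel `[1, 0]` simply shifts the sequence by one step, since only the $w_1$ tap is nonzero; the result agrees with `np.convolve(x, w, mode='valid')`.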

2.3. Long Short-Term Memory (LSTM) Network

For diameter prediction during the constant-diameter stage of CZ silicon single-crystal growth, the input data used for model construction are obtained from measurements and simulations and exhibit pronounced temporal characteristics. The LSTM [30,31] effectively captures temporal dependencies in these inputs and accommodates complex inter-feature relationships in multi-input settings.
An LSTM is a deep learning model for sequential data. Each unit comprises three gates—the input, forget, and output gates—and a cell state. The forget gate determines how much past information is retained, the input gate governs the incorporation of new information, and the output gate determines the current hidden state, thereby enabling flexible updating and propagation of information through the cell state. Compared with conventional Recurrent Neural Networks (RNNs) [32,33], LSTMs more effectively mitigate vanishing/exploding gradients, capture long-range dependencies, and substantially improve model performance.

2.3.1. Recurrent Neural Network (RNN)

The traditional RNN is a class of neural networks for processing sequential data, as shown in Figure 5. Unlike feedforward neural networks, an RNN maintains an internal recurrent state that retains past information while processing the input sequence, thereby capturing temporal dependencies. The basic RNN cell combines the current input with the previous hidden state via recurrent connections to form a new hidden state. The standard update equation is as follows:
$$h_t = f\left(W_h \cdot h_{t-1} + W_x \cdot x_t + b\right)$$
where $h_t$ is the hidden state at time $t$, $h_{t-1}$ is the hidden state at the previous time step, $x_t$ is the current input, $W_h$ and $W_x$ are weight matrices, $b$ is the bias vector, and $f(\cdot)$ is typically a nonlinear activation function.
For long sequences, RNNs can suffer from vanishing or exploding gradients during backpropagation, which hinders learning of long-range dependencies.
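The vanilla RNN update is a one-liner; the sketch below uses $\tanh$ as the activation $f(\cdot)$, which is a common but not mandatory choice:

```python
import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """Vanilla RNN update: h_t = tanh(W_h h_{t-1} + W_x x_t + b)."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)
```

Because $\tanh$ saturates, repeated application of this update shrinks gradients flowing backward through many time steps, which is precisely the vanishing-gradient issue the gated architectures below are designed to mitigate.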

2.3.2. Gating Mechanisms

To address the limitations of conventional RNNs, LSTM networks introduce a gating mechanism. The input, forget, and output gates dynamically regulate what information is retained, updated, and passed to the hidden state at each time step, thereby controlling information flow through the cell state and the hidden state and enabling effective learning of long-range dependencies. This mechanism enables LSTM to capture long-range dependencies in sequential data more effectively than standard RNNs. In silicon single-crystal growth, such long-range dependencies arise from pronounced time-delay dynamics, where early operating actions can influence the diameter evolution much later in the process. Building on LSTM, the BiLSTM structure processes the sequence in both forward and backward directions, which further enhances the modeling of delayed and stage-dependent effects.
The forget gate determines how much past information is discarded from the cell state:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
where $f_t$ is the forget-gate output (valued in $(0, 1)$), $\sigma(\cdot)$ is the sigmoid activation that maps inputs to $(0, 1)$, $W_f$ is the weight matrix, $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, and $b_f$ is the bias term; $f_t$ controls the proportion of information retained from the previous cell state $C_{t-1}$.
The input gate determines which new information is added to the cell state:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
where $i_t$ denotes the output of the input gate, $W_i$ is the weight matrix, and $b_i$ is the bias term. The output gate determines the hidden state of the next step (i.e., the output of the current time step):
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(C_t)$$
where $o_t$ denotes the output of the output gate, $W_o$ is the weight matrix, and $b_o$ is the bias term. The cell state is updated as follows:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
where $C_t$ is the weighted sum of information retained from the previous cell state via the forget gate and new information selected via the input gate, and $\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)$ is the candidate cell state.
At each time step, the forget gate determines—conditioned on the current input—how much past information to discard; the input gate selects which new information to incorporate; and the output gate decides which portion of the cell state is exposed as the hidden state, which is then propagated to subsequent layers, as shown in Figure 6. In silicon single-crystal growth, the diameter response is governed by coupled heat-transfer and melt–crystal dynamics and exhibits pronounced thermal inertia and time-delay effects, such that early variations in heater power or pulling rate can influence the diameter evolution much later. By flexibly updating and transmitting information, the gating mechanism enables LSTM to retain salient process information over long horizons and thereby capture these long-range dependencies effectively.
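One full LSTM step following the gate equations above can be sketched as follows; the dictionary-based parameter layout (keys `"f"`, `"i"`, `"o"`, `"c"`) is an illustrative convention, not a standard API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, C_prev, x_t, W, b):
    """One LSTM step: forget/input/output gates plus cell-state update.
    W and b map gate names ('f', 'i', 'o', 'c') to weights and biases
    acting on the concatenated vector [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    C_bar = np.tanh(W["c"] @ z + b["c"])     # candidate cell content
    C_t = f_t * C_prev + i_t * C_bar         # cell-state update
    h_t = o_t * np.tanh(C_t)                 # hidden state
    return h_t, C_t
```

Note how the cell state $C_t$ is updated only by element-wise scaling and addition; this additive path is what lets gradient information survive over the long horizons associated with the process's thermal inertia.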

2.4. Bidirectional Long Short-Term Memory (BiLSTM) Network

For diameter prediction during the constant-diameter stage of silicon single-crystal growth, the diameter response is governed by coupled heat-transfer and melt–crystal dynamics and exhibits pronounced thermal inertia and time-delay effects. As a result, the diameter at the prediction point depends on information distributed over a time window rather than only on the most recent measurements. To exploit such contextual information, we encode the input sequence using bidirectional temporal modeling around the prediction point. In this work, the available measurements constitute heterogeneous multivariate time series, since different sensor channels may have different physical meanings, scales, and sampling characteristics. Therefore, the signals are first synchronized, cleaned, and normalized to form aligned multichannel sequences. Under the proposed multi-input training setting, the data-driven model jointly receives the mechanistic model diameter prediction, its corresponding residual (error) signal, and multichannel process measurements such as pulling rate and heater power. This design encourages the network to focus on learning the residual dynamics that are not captured by the mechanistic model, thereby refining the overall prediction. A BiLSTM is adopted because it can model long-range temporal dependencies and multivariate feature interactions more effectively than a unidirectional recurrent model. As shown in Figure 7, the forward and backward LSTM streams process the sequence in opposite directions and fuse their hidden representations to form contextual features. These features are subsequently weighted by the attention mechanism to emphasize the most informative time steps and variable combinations under different operating conditions.
This bidirectional structure enables comprehensive capture of contextual information by simultaneously considering forward and backward information in the sequence, thereby improving the understanding of sequential data. Its basic structure comprises a forward LSTM that processes the sequence in its natural order, from the first to the last time step $(x_1, x_2, \ldots, x_T)$, and a backward LSTM that processes the input sequence in reverse order, from the last to the first time step $(x_T, x_{T-1}, \ldots, x_1)$. In this model, the input signal is processed by a forward LSTM producing $\overrightarrow{h}_t$ and a backward LSTM producing $\overleftarrow{h}_t$; together, they determine the value fed to the hidden layer, yielding the BiLSTM output $y_t$, whose update equations are as follows:
$$\overrightarrow{h}_t = \mathrm{LSTM}\left(x_t, \overrightarrow{h}_{t-1}\right)$$
$$\overleftarrow{h}_t = \mathrm{LSTM}\left(x_t, \overleftarrow{h}_{t+1}\right)$$
$$y_t = W_{\overrightarrow{h}}\,\overrightarrow{h}_t + W_{\overleftarrow{h}}\,\overleftarrow{h}_t + b$$
The hidden states of the forward and backward LSTMs are concatenated to form the output:
$$h_t = \left[\overrightarrow{h}_t, \overleftarrow{h}_t\right]$$
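The bidirectional pass and per-step concatenation can be sketched generically. The sketch accepts any recurrent step function (an LSTM cell in the actual model) so it stays self-contained; the function name and interface are illustrative assumptions:

```python
import numpy as np

def bidirectional(seq, step, h0):
    """Run a recurrent step function forward and backward over seq,
    then concatenate the two hidden states at each time step."""
    T = len(seq)
    h, fwd = h0, []
    for t in range(T):               # forward pass: x_1 .. x_T
        h = step(h, seq[t])
        fwd.append(h)
    h, bwd = h0, [None] * T
    for t in reversed(range(T)):     # backward pass: x_T .. x_1
        h = step(h, seq[t])
        bwd[t] = h
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

Each output vector thus carries context from both the past and the future of the window, which is the property exploited when predicting the diameter at an interior point of the input sequence.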

2.5. Self-Attention Mechanism

In diameter prediction for CZ silicon single-crystal growth, the self-attention mechanism [34,35] captures long-range temporal dependencies, dynamically reweights salient features, handles complex data patterns, and improves interpretability, thereby significantly enhancing predictive accuracy and robustness, as shown in Figure 8. Compared with traditional models, it more effectively handles nonlinearity, noise disturbances, and long-range temporal dependencies, while providing higher parallel computing efficiency, thereby accelerating training and enabling optimization of control strategies.
Self-attention is typically used to enhance a model’s ability to focus on different parts of the input sequence, thereby helping it capture long-range dependencies and salient features [36]. When combined with convolutional layers and a BiLSTM, the self-attention mechanism can effectively handle complex patterns in time-series data.
Let $X \in \mathbb{R}^{n \times d}$ denote the input matrix, where $n$ is the sequence length and $d$ is the feature dimension per time step. $X$ is mapped to the query $Q$, key $K$, and value $V$ via linear transformations:
$$Q = X W_q$$
$$K = X W_k$$
$$V = X W_v$$
The core of self-attention is to compute similarities between the queries $Q$ and the keys $K$, and to use these similarities to weight the value matrix $V$:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$
where $Q K^{T}$ denotes the dot products between the queries and keys, measuring query–key similarity, $d_k$ is the key dimension used to scale the dot products and avoid excessively large values, and $V$ is the value matrix, whose weighted sum yields the output at each time step.
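Scaled dot-product self-attention as defined above can be written in a few lines of NumPy; this is a minimal single-head sketch with an illustrative function name, using the numerically stable form of the row-wise softmax:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention:
    softmax(Q K^T / sqrt(d_k)) V with Q = X W_q, K = X W_k, V = X W_v."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V
```

Each output row is a convex combination of the value rows, so time steps with higher query–key similarity contribute more to the representation at that position, which is exactly the reweighting behavior exploited in the proposed model.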

2.6. CNN–BiLSTM–Attention-Based Hybrid-Driven Modeling Method

This paper proposes a hybrid modeling approach that integrates CNN, BiLSTM, and self-attention. First, the CNN extracts local features from the input sequence, captures spatial relationships in high-dimensional data, and feeds them to the BiLSTM network. BiLSTM performs bidirectional processing of the input data in the forward and backward directions using two independent LSTM networks, thereby capturing information from both past and future contexts in the sequence. Subsequently, the attention mechanism assigns distinct importance weights to each time step in the sequence, enabling the model to focus on the most relevant features and further improving predictive performance. By integrating these three components, the method enhances the model’s understanding of temporal dynamics and optimizes feature extraction and information weighting, thereby improving overall predictive performance.
As illustrated in Figure 9, the procedure for crystal diameter prediction in CZ silicon single-crystal growth is as follows:
(1)
Data acquisition: Shoulder-formation and constant-diameter stage data are collected from two sources: actual CZ silicon single-crystal growth runs on an industrial furnace (TDR-180, National and Local Engineering Research Center of Crystal Growth Equipment and System Integration, Xi’an University of Technology, Xi’an, China), sampled at 2 s intervals, and first-principles crystal growth models evaluated under different operating conditions in multiple furnaces, in order to enrich the experimental sample dataset.
(2)
Mechanistic model simulation: Using Simulink R2023a, the pulling speed and heater power recorded under actual operating conditions are fed into the mechanistic model to obtain diameter predictions, which exhibit relatively large errors: the purely mechanistic model yields crystal diameters that reach only about 50% of the measured diameter.
(3)
Data preprocessing: The data used for model training are preprocessed by handling missing values and outliers, filtering out random noise, applying normalization, and then partitioning the dataset into training and test sets.
(4)
Training the CNN model for feature extraction: During silicon single-crystal growth, variations in crystal diameter are influenced by multiple factors, including thermal conditions and operating actions such as heater power and crystal pulling speed, which often exhibit spatial interrelationships. The CNN layer is responsible for extracting local features from the raw input data, such as the relationship between heater power and crystal pulling speed. Convolution operations can identify these key patterns across different time points and spatial ranges. The extracted features not only provide rich inputs for the subsequent BiLSTM layer, but also supply diverse feature representations for the self-attention mechanism. The feature outputs of the CNN are then passed to the BiLSTM and attention layers, ensuring that subsequent time-series modeling and the weighting of key features are performed on a more accurate basis.
(5)
Training the BiLSTM network model: The primary role of the BiLSTM layer is to capture temporal dependencies in time-series data by using forward and backward LSTM networks to learn past and future information in the input sequence, respectively. In crystal diameter prediction for silicon single-crystal growth, the BiLSTM not only extracts information from historical data that is relevant to the current diameter, but also, through the backward LSTM, captures the potential impact of future operations on diameter variation. The feature representations provided by the CNN serve as inputs to the BiLSTM layer, helping it to learn dynamic relationships and trends across time, and to supply the attention layer with a temporally informed feature sequence. This bidirectional information-processing capability enables the model to predict future diameter changes more accurately, particularly under long control horizons and process complexity.
(6)
Self-attention focusing: The attention layer plays a focusing role in the overall model by assigning different weights to the input features at each time step according to their importance. In the silicon single-crystal growth process, not all control variables affect the crystal diameter to the same extent; at certain time instants, heater power or pulling speed may have a much stronger influence on diameter variations, while other variables have only a minor effect. By assigning weights to the features at each time step, the attention mechanism enables the model to adaptively focus on the most relevant information, thereby improving predictive accuracy. In the proposed model, the attention layer not only relies on the feature representations produced by the CNN and BiLSTM, but also makes use of their temporal information and local features to further strengthen the focus on data at critical time instants.
(7)
Final prediction: During CZ silicon single-crystal growth, the crystal diameter prediction is obtained by feeding the process data into the hybrid modeling module and adding the resulting correction to the mechanistic prediction, i.e., $D_M = D_s + \hat{e}_D$.
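The residual-compensation idea behind the hybrid model (Figure 2) can be sketched as follows. The array values are purely illustrative, and `e_hat` stands in for the output of the CNN–BiLSTM–Attention network:

```python
import numpy as np

# Illustrative series only (not data from the paper):
# D_s: mechanistic (Simulink) diameter predictions, D_t: measured diameters.
D_s = np.array([150.0, 151.2, 152.1, 153.0])  # mm
D_t = np.array([300.1, 301.0, 301.8, 302.5])  # mm

# Training target for the data-driven module: the residual e_D = D_t - D_s.
e_D = D_t - D_s

# At inference, the network's estimated correction e_hat compensates the
# mechanistic output: D_M = D_s + e_hat. A perfect correction recovers D_t.
e_hat = e_D
D_M = D_s + e_hat
assert np.allclose(D_M, D_t)
```

The design choice here is that the network only has to learn the (smaller, smoother) model error rather than the full diameter dynamics, which is why the simplified mechanistic model remains useful despite its large standalone error.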

3. Industrial Experiment Simulation

In this section, we develop a hybrid-driven model based on a CNN–BiLSTM–Attention architecture to predict the crystal diameter during the shoulder-formation and constant-diameter stages of CZ silicon single-crystal growth. We then compare its performance with that of other data-driven modeling methods.

3.1. Performance Evaluation Metrics

To comprehensively compare model performance, five evaluation metrics are employed, as listed in Table 2: mean squared error (MSE), RMSE, MAE, MAPE, and the coefficient of determination ( R 2 ).
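The five metrics of Table 2 can be computed directly from predicted and true series. This is a generic NumPy sketch, not the authors' MATLAB evaluation code; the sample arrays are illustrative:

```python
import numpy as np

def metrics(y, y_hat):
    """MSE, RMSE, MAE, MAPE (in percent), and R^2, as defined in Table 2."""
    err = y - y_hat
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y)) * 100.0
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# Illustrative diameter series (mm).
y = np.array([300.0, 301.0, 302.0, 303.0])
y_hat = np.array([300.1, 300.9, 302.2, 302.8])
m = metrics(y, y_hat)
assert m["RMSE"] == np.sqrt(m["MSE"]) and m["R2"] <= 1.0
```

Note that MAPE divides by the true value at each step, so it is only meaningful here because the measured diameter never approaches zero.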

3.2. Prediction Results and Analysis

In this study, the crystal diameter in the Czochralski silicon single-crystal growth process is taken as the target variable. Twelve-inch silicon single-crystal (SSC) growth data collected from a TDR-180 single-crystal furnace were used to evaluate the predictive capability of the proposed model by continuously recording process data during the shoulder-formation and constant-diameter stages. In the data preprocessing stage, missing and abnormal values were corrected, random noise was removed, and the data were standardized. Subsequently, 720 data points from the end of the shoulder-formation stage, together with 4100 data points from the subsequent constant-diameter stage, were selected after missing-value processing and partitioned into training and testing sets at a ratio of 7:3. Figure 10 shows the curves of the actual diameter measurements during the shoulder-formation and constant-diameter stages.
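The chronological 7:3 split and training-set standardization described above can be sketched as follows. The function names are illustrative assumptions; the study's exact preprocessing code is not published:

```python
import numpy as np

def chronological_split(data, train_ratio=0.7):
    """Split a time series into contiguous train/test segments,
    preserving temporal order (no shuffling across the boundary)."""
    n_train = int(round(len(data) * train_ratio))
    return data[:n_train], data[n_train:]

def zscore_fit_apply(train, test):
    """Standardize both sets using statistics of the training set only,
    to avoid leaking test-set information into training."""
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sigma, (test - mu) / sigma

# Illustrative: 4100 constant-diameter samples split 7:3.
x = np.arange(4100, dtype=float).reshape(-1, 1)
tr, te = chronological_split(x)
assert len(tr) == 2870 and len(te) == 1230
tr_n, te_n = zscore_fit_apply(tr, te)
assert abs(tr_n.mean()) < 1e-9
```

A chronological split is the appropriate choice for process data like this: a random split would let the model interpolate between temporally adjacent samples and overstate accuracy.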
The growth environment and process of CZ silicon single crystals are highly complex and involve numerous interrelated influencing factors. This study focuses on variables specifically related to crystal diameter during the shoulder-formation and constant-diameter stages, among which heater power, crystal pulling speed, and the deviation between the actual crystal diameter and the mechanistic model prediction are identified as key variables for diameter prediction. On the basis of these core factors, a modeling study is carried out with the aim of improving the accuracy of crystal diameter prediction.
In this study, a hybrid modeling approach is employed to accomplish crystal diameter prediction. First, Simulink simulations are used to obtain diameter predictions from a mechanism-based model. The predicted values are then compared with experimental measurements to obtain the corresponding prediction deviations. Figure 11 shows the curves of the measured values of crystal diameter, the predicted values of the mechanism model, and the prediction error values of the model. Finally, the deviation values, together with other relevant variables, are used as the input–output parameters to train a data-driven CNN-BiLSTM-Attention model, which compensates for the errors of the simplified mechanism model and improves the prediction accuracy.
The network parameters of the proposed CNN–BiLSTM–Attention model for the small-sample shoulder-formation stage and for the constant-diameter stage are summarized in Table 3. All deep learning models were implemented and trained in MATLAB R2023a using the Deep Learning Toolbox. The Adam optimizer was adopted for training, and the mean squared error (MSE) was used as the loss function. An early-stopping strategy based on the validation MSE was applied to mitigate overfitting. The main hyperparameter search space included the learning rate {0.01, 0.005, 0.001}, the number of BiLSTM hidden units {50, 100, 150}, the number of CNN filters {16–32, 32–64}, the number of attention heads {2, 4, 8}, the dropout rate {0.2, 0.3, 0.4}, and the batch size {64, 128}. The training time was approximately 10 min for the shoulder-formation stage and 30 min for the constant-diameter stage. All experiments were conducted on a PC equipped with an Intel Core i9-13900HX CPU, 16 GB RAM, and an NVIDIA GeForce RTX 4060 Laptop GPU.
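The reported search space can be enumerated exhaustively with a Cartesian product; a sketch using the standard library (the CNN filter ranges are omitted because their grouping is ambiguous in the source, and the dictionary keys are illustrative names):

```python
from itertools import product

# Hyperparameter search space as reported in the text (filter ranges omitted).
space = {
    "learning_rate": [0.01, 0.005, 0.001],
    "bilstm_units": [50, 100, 150],
    "attention_heads": [2, 4, 8],
    "dropout": [0.2, 0.3, 0.4],
    "batch_size": [64, 128],
}

# One configuration dictionary per element of the Cartesian product.
grid = [dict(zip(space, combo)) for combo in product(*space.values())]
assert len(grid) == 3 * 3 * 3 * 3 * 2  # 162 candidate configurations
```

Even without the filter dimension, the grid already contains 162 configurations, which is consistent with validation-based early stopping being used to keep each trial short.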
Figure 12 and Figure 13 present the compensation prediction results and the corresponding prediction error curves of the proposed CNN-BiLSTM-Attention model for the shoulder-formation stage and the constant-diameter stage, respectively. For comparison, CNN-, LSTM-, and BiLSTM-based models were also implemented for diameter prediction. Figure 14 and Figure 15 show the performance comparison and the diameter variation curves for the shoulder-formation stage, respectively, while Figure 16 and Figure 17 show the corresponding results for the constant-diameter stage.
Based on the data in Table 4 and Table 5, the comparison includes CNN, LSTM, BiLSTM, CNN–BiLSTM, and the proposed CNN–BiLSTM–Attention algorithm. In terms of the $R^2$ metric, the value in the shoulder-formation stage increases from 71.30% for CNN to 98.54% for the proposed model, and in the constant-diameter stage it increases from 58.53% to 98.31%, indicating that the proposed model captures data trends and features more comprehensively and thereby achieves the highest prediction accuracy. In terms of MSE and RMSE, the CNN–BiLSTM–Attention model performs best, achieving the lowest MSE of 0.1262 and RMSE of 0.3553 in the shoulder-formation stage, and the lowest MSE of 0.0040 and RMSE of 0.0636 in the constant-diameter stage, indicating the smallest overall error magnitude. In terms of MAE, the absolute error in the shoulder-formation stage decreases significantly from 1.1718 for the CNN model to 0.3173 for the CNN–BiLSTM–Attention model, and in the constant-diameter stage from 0.2629 to 0.0517, showing that the proposed method is more effective in reducing absolute error. In terms of mean absolute percentage error (MAPE), the proposed model again performs best, with values of 0.151% in the shoulder-formation stage and 0.029% in the constant-diameter stage. These results correspond to only negligible percentage deviations from the true values and demonstrate the model's superior predictive accuracy and stability.
In summary, hybrid models generally exhibit superior performance compared with single models. Among them, the CNN–BiLSTM–Attention model achieves the best results across multiple key evaluation metrics. This superiority stems from its effective integration of local spatial features extracted by CNN with long-term temporal dependencies modeled by BiLSTM, while the attention mechanism adaptively assigns weights to and focuses on critical information. The synergy between this multi-level feature fusion and dynamic information selection substantially enhances the model’s representational capacity, effectively suppresses prediction errors, and improves generalization performance.

4. Conclusions

This study addresses the critical problem of diameter prediction during the shoulder-formation and constant-diameter stages of CZ silicon single-crystal growth and proposes a hybrid-driven modeling framework based on a CNN–BiLSTM–Attention architecture. The framework adopts a multi-level feature enhancement strategy: the CNN extracts local spatial features from multichannel process measurements, the BiLSTM captures long-range temporal dependencies in diameter evolution, and the self-attention module emphasizes informative time steps to improve prediction performance near stage transitions. Experimental results show that the proposed hybrid model achieves higher prediction accuracy and stronger robustness to process disturbances than conventional single-model baselines. The model is trained and evaluated using operational data collected from an industrial single-crystal growth furnace rather than purely simulated data, and it has been validated offline on historical production datasets with satisfactory accuracy. Online closed-loop control in real-time growth experiments has not yet been implemented and will be investigated in future work. Overall, the proposed approach provides an effective data-driven solution for diameter prediction in CZ silicon single-crystal growth.

Author Contributions

Conceptualization, P.Z.; methodology, P.Z. and H.P.; investigation, H.P.; formal analysis, P.Z.; validation, P.Z. and Y.J.; visualization, P.Z.; writing—original draft, P.Z.; writing—review and editing, H.P., C.C. and D.L.; data curation, C.C. and Y.J.; funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Major Scientific Instrument Development Project of China under Grant 62127809.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study. Requests to access the datasets should be directed to liud@xaut.edu.cn.

Conflicts of Interest

Author Pengju Zhang is employed by Xi’an ESWIN Material Technology Co., Ltd. Author Hao Pan is employed by Xi’an XINSEMI Material Technology Co., Ltd. These companies had no involvement in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. The other authors declare no conflicts of interest.

References

  1. Dezfoli, A.R.A. Review of simulation and modeling techniques for silicon Czochralski crystal growth. J. Cryst. Growth 2024, 648, 127921. [Google Scholar] [CrossRef]
  2. Zheng, Z.-C.; Seto, T.; Kim, S.; Kano, M.; Fujiwara, T.; Mizuta, M.; Hasebe, S. A first-principle model of 300 mm Czochralski single-crystal Si production process for predicting crystal radius and crystal growth rate. J. Cryst. Growth 2018, 492, 105–113. [Google Scholar] [CrossRef]
  3. Hou, L.; Gao, D.; Wang, S.; Zhang, W.; Lin, H.; An, Y. Particle Swarm Optimization–Long Short-Term Memory-Based Dynamic Prediction Model of Single-Crystal Furnace Temperature and Heating Power. Crystals 2025, 15, 110. [Google Scholar] [CrossRef]
  4. Ren, J.-C.; Liu, D.; Wan, Y. Modeling and application of Czochralski silicon single crystal growth process using hybrid model of data-driven and mechanism-based methodologies. J. Process Control 2021, 104, 74–85. [Google Scholar] [CrossRef]
  5. Li, Y.-K.; Chen, C.; Liu, D.; Li, D.-P. Anti-disturbance switching control for silicon single crystal growth systems under unmeasured states. IEEE Trans. Cybern. 2025, 55, 4865–4877. [Google Scholar] [CrossRef]
  6. Yen, C.-C.; Singh, A.K.; Chung, Y.-M.; Chou, H.-Y.; Wuu, D.-S. Study of flow pattern defects and oxidation induced stacking faults in Czochralski single-crystal silicon growth. Crystals 2023, 13, 336. [Google Scholar] [CrossRef]
  7. Liu, D.; Zhao, X.-G.; Zhao, Y. A review of growth process modeling and control of czochralski silicon single crystal. Control Theory Appl. 2017, 34, 1–12. [Google Scholar]
  8. Liu, X.; Harada, H.; Miyamura, Y.; Han, X.-F.; Nakano, S.; Nishizawa, S.; Kakimoto, K. Transient global modeling for the pulling process of Czochralski silicon crystal growth. II. Investigation on segregation of oxygen and carbon. J. Cryst. Growth 2020, 532, 125404. [Google Scholar] [CrossRef]
  9. Wang, K.; Koch, H.; Trempa, M.; Kranert, C.; Friedrich, J.; Derby, J.J. Physically-based, lumped-parameter models for the prediction of oxygen concentration during Czochralski growth of silicon crystals. J. Cryst. Growth 2021, 576, 126384. [Google Scholar] [CrossRef]
  10. Popescu, A.; Vizman, D. Particularities of the thermal and oxygen concentration instabilities in a Czochralski process for solar silicon growth. J. Cryst. Growth 2023, 611, 127177. [Google Scholar] [CrossRef]
  11. Sun, Q.-Q.; Ge, Z.-Q. A Survey on Deep Learning for Data-Driven Soft Sensors. IEEE Trans. Ind. Inform. 2021, 17, 5853–5866. [Google Scholar] [CrossRef]
  12. Kutsukake, K.; Nagai, Y.; Banba, H. Virtual experiments of Czochralski growth of silicon using machine learning: Influence of processing parameters on interstitial oxygen concentration. J. Cryst. Growth 2022, 584, 126580. [Google Scholar] [CrossRef]
  13. Liu, D.; Zhang, N.; Jiang, L.; Zhao, X.-G.; Duan, W.-F. Nonlinear Generalized Predictive Control of the Crystal Diameter in CZ-Si Crystal Growth Process Based on Stacked Sparse Autoencoder. J. IEEE Trans. Control Syst. Technol. 2020, 28, 1132–1139. [Google Scholar] [CrossRef]
  14. Wan, Y.; Liu, D.; Liu, C.-C.; Zhao, X.-G.; Ren, J.-C. Data-Driven Model Predictive Control of Cz Silicon Single Crystal Growth Process With V/G Value Soft Measurement Model. J. IEEE Trans. Semicond. Manuf. 2021, 34, 420–428. [Google Scholar] [CrossRef]
  15. Jiang, L.; Teng, D.; Zhao, Y. A Soft Measurement Method for the Tail Diameter in the Growing Process of Czochralski Silicon Single Crystals. Appl. Sci. 2024, 14, 1569. [Google Scholar] [CrossRef]
  16. Qi, X.-F.; Ma, W.-C.; Dang, Y.-F.; Su, W.-J.; Liu, L.-J. Optimization of the melt/crystal interface shape and oxygen concentration during the Czochralski silicon crystal growth process using an artificial neural network and a genetic algorithm. J. Cryst. Growth 2020, 548, 125828. [Google Scholar] [CrossRef]
  17. Zabihi, M.; Mehrizi, R.V.; Kasaiezadeh, A.; Pirani, M.; Khajepour, A. A Hybrid Model-Data Vehicle Sensor and Actuator Fault Detection and Diagnosis System. IEEE Trans. Intell. Transp. Syst. 2024, 25, 8121–8133. [Google Scholar] [CrossRef]
  18. Chen, Y.-T.; Huang, D.; Zhang, D.-X.; Zeng, J.-S.; Wang, N.-Z.; Zhang, H.-R.; Yan, J.-Y. Theory-guided hard constraint projection (HCP): A knowledge-based data-driven scientific machine learning method. J. Comput. Phys. 2021, 445, 110624. [Google Scholar] [CrossRef]
  19. Kato, S.; Kim, S.; Kano, M.; Fujiwara, T.; Mizuta, M. Gray-box modeling of 300 mm diameter Czochralski single-crystal Si production process. J. Cryst. Growth 2021, 553, 125929. [Google Scholar] [CrossRef]
  20. Ren, J.-C.; Liu, D.; Wan, Y. Data-Driven and Mechanism-Based Hybrid Model for Semiconductor Silicon Monocrystalline Quality Prediction in the Czochralski Process. IEEE Trans. Semicond. Manuf. 2022, 35, 658–669. [Google Scholar] [CrossRef]
  21. Wan, Y.; Liu, D.; Ren, J.-C. Performance-driven semiconductor silicon crystal quality control. J. Process Control 2022, 120, 68–85. [Google Scholar] [CrossRef]
  22. Sun, B.; Liu, X.-D.; Wang, J.-Y.; Wei, X.-Z.; Yuan, H.; Dai, H.-F. Short-term performance degradation prediction of a commercial vehicle fuel cell system based on CNN and LSTM hybrid neural network. Int. J. Hydrogen Energy 2023, 48, 8613–8628. [Google Scholar] [CrossRef]
  23. Pan, S.-W.; Yang, B.; Wang, S.-K.; Guo, Z.; Wang, L.; Liu, J.-H.; Wu, S.-Y. Oil well production prediction based on CNN-LSTM model with self-attention mechanism. Energy 2023, 284, 128701. [Google Scholar] [CrossRef]
  24. Gao, M.-Y.; Xie, Y.-J.; Song, P.; Qian, J.-H.; Sun, X.-G.; Liu, J.Y. A definition rule for defect classification and grading of solar cells photoluminescence feature images and estimation of CNN-based automatic defect detection method. Crystals 2023, 13, 819. [Google Scholar] [CrossRef]
  25. Zhang, X.; Yang, Y.; Liu, J.; Zhang, Y.; Zheng, Y. A CNN-BiLSTM monthly rainfall prediction model based on SCSSA optimization. J. Water Clim. Change 2024, 15, 4862–4876. [Google Scholar] [CrossRef]
  26. Li, F.; Liu, S.-H.; Wang, T.-H.; Liu, R.-R. Optimal planning for integrated electricity and heat systems using CNN-BiLSTM-Attention network forecasts. Energy 2024, 309, 133042. [Google Scholar] [CrossRef]
  27. Qin, C.-Y.; Qin, D.-L.; Jiang, Q.-X.; Zhu, B.-Z. Forecasting carbon price with attention mechanism and bidirectional long short-term memory network. Energy 2024, 299, 131410. [Google Scholar] [CrossRef]
  28. Zhang, S.; Liu, Z.; Chen, Y.; Jin, Y.; Bai, G. Selective kernel convolution deep residual network based on channel-spatial attention mechanism and feature fusion for mechanical fault diagnosis. ISA Trans. 2023, 133, 369–383. [Google Scholar] [CrossRef]
  29. Gao, S.-Y.; Zhao, Z.-M.; Liu, X.-J.; Jiao, Y.-L.; Song, C.-Y.; Zhao, J.-D. Vehicle Lane Change Multistep Trajectory Prediction Based on Data and CNN_BiLSTM Model. J. Adv. Transp. 2024, 2024, 7129562. [Google Scholar] [CrossRef]
  30. Li, P.-H.; Zhang, Z.-J.; Xiong, Q.-Y.; Ding, B.-C.; Hou, J.; Luo, D.-C.; Rong, Y.-J.; Li, S.-Y. State-of-health estimation and remaining useful life prediction for the lithium-ion battery based on a variant long short term memory neural network. J. Power Sources 2020, 459, 228069. [Google Scholar] [CrossRef]
  31. Dropka, N.; Ecklebe, S.; Holena, M. Real time predictions of VGF-GaAs growth dynamics by LSTM neural networks. Crystals 2021, 11, 138. [Google Scholar] [CrossRef]
  32. Cheng, Z.-H.; Chen, B.; Lu, R.-Y.; Wang, Z.-J.; Zhang, H.; Meng, Z.-Y.; Yuan, X. Recurrent neural networks for snapshot compressive imaging. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 2264–2281. [Google Scholar] [CrossRef] [PubMed]
  33. Fernandez, J.G.; Keemink, S.; van Gerven, M. Gradient-free training of recurrent neural networks using random perturbations. Front. Neurosci. 2024, 18, 1439155. [Google Scholar] [CrossRef]
  34. Wang, S.; Zhu, D.-H.; Chen, J.; Bi, J.-B.; Wang, W.-Y. Deepfake face discrimination based on self-attention mechanism. Pattern Recognit. Lett. 2024, 183, 92–97. [Google Scholar] [CrossRef]
  35. Lin, X.-Z.; Chao, S.-H.; Yan, D.-M.; Guo, L.-L.; Liu, Y.; Li, L.-J. Multi-sensor data fusion method based on self-attention mechanism. Appl. Sci. 2023, 13, 11992. [Google Scholar] [CrossRef]
  36. Luo, S.-C.; Wang, B.-S.; Gao, Q.-Z.; Wang, Y.-B.; Pang, X.-F. Stacking integration algorithm based on CNN-BiLSTM-Attention with XGBoost for short-term electricity load forecasting. Energy Rep. 2024, 12, 2676–2689. [Google Scholar] [CrossRef]
Figure 1. The growth process flow of silicon single crystals.
Figure 2. Hybrid modeling architecture for diameter prediction. The mechanistic model takes heater power P and pulling rate V t as inputs and generates an initial diameter prediction D s . The measured crystal diameter is denoted by D t . The data-driven model outputs a diameter correction e ^ D , and the final predicted diameter is computed as D M = D s + e ^ D .
Figure 3. Overall algorithmic framework of the CNN–BiLSTM–Attention model, which provides a detailed realization of the data-driven module in Figure 2. The network takes as inputs the multivariate time-series signals of the silicon single-crystal growth process and the residual from the mechanistic model, where e D = D t D s . It outputs the estimated correction e ^ D , which is used in Figure 2 to refine the mechanistic prediction as D M = D s + e ^ D . Here, CNN denotes a convolutional neural network, BiLSTM denotes a bidirectional long short-term memory network, and Attention denotes a self-attention mechanism.
Figure 4. CNN-based feature extraction block used in the data-driven module. The CNN takes the multichannel input sequence formed by heater power, pulling rate, and measured diameter at each time step and extracts compact feature maps, which are passed to the subsequent BiLSTM and attention modules in Figure 3.
Figure 5. Recurrent unit schematic underlying the BiLSTM block in Figure 3. This unit processes multichannel time-series measurements from silicon single-crystal growth sequentially to capture temporal dependencies and produce hidden representations for subsequent attention and prediction layers.
Figure 6. Schematic of an LSTM unit underlying the BiLSTM block in the data-driven module, as shown in Figure 3. The unit uses input, forget, and output gates to regulate information flow through the cell state and hidden state, enabling the modeling of long-range temporal dependencies in silicon single-crystal growth time-series data.
Figure 7. Schematic of a BiLSTM block in the data-driven module, as shown in Figure 3. The forward and backward streams process the silicon single-crystal growth time series in opposite directions and combine their hidden representations for downstream prediction.
Figure 8. Schematic of a self-attention mechanism.
Figure 9. Flowchart of the hybrid CNN–BiLSTM–Attention prediction model.
Figure 10. Actual diameter measurements in the shoulder-formation and constant-diameter stages.
Figure 11. Measured crystal diameter and model prediction error (mm) in the shoulder-formation and constant-diameter stages. (a) Measured crystal diameter and model prediction error (mm) in the shoulder-formation stage. (b) Measured crystal diameter and model prediction error (mm) in the constant-diameter stage.
Figure 12. Model-compensated prediction results and errors (mm) of the CNN–BiLSTM–Attention algorithm in the shoulder-formation stage. (a) Model-compensated prediction results (mm) of the CNN–BiLSTM–Attention algorithm in the shoulder-formation stage. (b) Model-compensated prediction errors (mm) of the CNN–BiLSTM–Attention algorithm in the shoulder-formation stage.
Figure 13. Model-compensated prediction results and errors (mm) of the CNN–BiLSTM–Attention algorithm in the constant-diameter stage. (a) Model-compensated prediction results (mm) of the CNN–BiLSTM–Attention algorithm in the constant-diameter stage. (b) Model-compensated prediction errors (mm) of the CNN–BiLSTM–Attention algorithm in the constant-diameter stage.
Figure 14. Comparison of shoulder-formation stage prediction performance across modeling methods with diameter error (mm).
Figure 15. Comparison of crystal diameter (mm) in the shoulder-formation stage across modeling methods.
Figure 16. Comparison of constant-diameter stage prediction performance across modeling methods with diameter error (mm).
Figure 17. Comparison of crystal diameter (mm) in the constant-diameter stage across modeling methods.
Table 1. List of variables, physical meanings, and units.

Symbol | Physical Meaning | Unit
$P$ | Heater power | W
$V_t$ | Crystal pulling rate | mm/min
$D_t$ | Measured crystal diameter | mm
$D_s$ | Initial diameter prediction from the mechanistic model | mm
$\hat{e}_D$ | Estimated diameter correction from the data-driven model | mm
$D_M$ | Final predicted diameter after compensation, $D_M = D_s + \hat{e}_D$ | mm
Table 2. Model performance evaluation metrics.

| Metric | Definition | Formula |
|---|---|---|
| MSE | Mean Squared Error | $\frac{1}{N}\sum_{i=1}^{N}\left(y(i)-\hat{y}(i)\right)^2$ |
| RMSE | Root Mean Squared Error | $\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y(i)-\hat{y}(i)\right)^2}$ |
| MAE | Mean Absolute Error | $\frac{1}{N}\sum_{i=1}^{N}\left|y(i)-\hat{y}(i)\right|$ |
| MAPE | Mean Absolute Percentage Error | $\frac{1}{N}\sum_{i=1}^{N}\left|\frac{y(i)-\hat{y}(i)}{y(i)}\right|\times 100\%$ |
| $R^2$ | Coefficient of Determination | $1-\frac{\sum_{i=1}^{N}\left(y(i)-\hat{y}(i)\right)^2}{\sum_{i=1}^{N}\left(y(i)-\bar{y}\right)^2}$ |
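The metrics in Table 2 can be computed directly from paired measured and predicted diameter series. A pure-Python sketch (no external dependencies assumed):

```python
import math


def metrics(y_true, y_pred):
    """Compute the five evaluation metrics of Table 2.

    y_true: measured diameters y(i); y_pred: predicted diameters y_hat(i).
    MAPE assumes no measured value is zero.
    """
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / yt) for e, yt in zip(errors, y_true)) / n
    y_mean = sum(y_true) / n
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}
```

Note that RMSE is the square root of MSE by definition, which is a quick consistency check on any reported results table.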
Table 3. Model parameters for the shoulder-formation and constant-diameter stages.

| Stage | Convolution/Pooling Structure | BiLSTM Layer | Self-Attention Layer | Learning Rate | Epochs |
|---|---|---|---|---|---|
| Shoulder-formation | One 3 × 1 convolution layer; one 2 × 1 pooling layer | One layer with 12 hidden units | Two-dimensional key/query vectors | 0.005 | 50 |
| Constant-diameter | Sixteen 3 × 1 convolution layers; one 2 × 1 max-pooling layer | One layer with 15 hidden units | Two-dimensional key/query vectors | 0.001 | 150 |
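Table 3 specifies two-dimensional key/query vectors for the self-attention layer. A pure-Python sketch of scaled dot-product self-attention with 2-D key/query projections, using hypothetical projection matrices; the paper's trained layer (including any value projection and learned weights) is not reproduced here:

```python
import math


def self_attention(features, w_q, w_k):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    features: list of d-dimensional feature vectors (one per time step).
    w_q, w_k: 2 x d projection matrices (lists of rows), giving the
    two-dimensional query/key vectors of Table 3. For simplicity this
    sketch attends over the raw features themselves (no value projection).
    """
    d_k = 2  # key/query dimensionality per Table 3

    def project(w, x):
        return [sum(w_row[j] * x[j] for j in range(len(x))) for w_row in w]

    queries = [project(w_q, x) for x in features]
    keys = [project(w_k, x) for x in features]
    out = []
    for q in queries:
        # Scaled dot-product scores against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Attended output: convex combination of the feature vectors.
        out.append([sum(w * x[j] for w, x in zip(weights, features))
                    for j in range(len(features[0]))])
    return out
```

The attention weights form a convex combination over time steps, which is what lets the model dynamically emphasize the sensor readings most relevant to the current diameter estimate.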
Table 4. Prediction performance metrics of the models in the shoulder-formation stage.

| Model | $R^2$ | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|
| CNN | 71.30% | 2.4848 | 1.5763 | 1.1718 | 0.551% |
| LSTM | 74.40% | 2.2162 | 1.4887 | 1.2924 | 0.608% |
| BiLSTM | 80.31% | 1.7045 | 1.3056 | 0.9769 | 0.458% |
| CNN–BiLSTM | 91.71% | 0.7174 | 0.8470 | 0.6231 | 0.292% |
| **CNN–BiLSTM–Attention** | **98.54%** | **0.1262** | **0.3553** | **0.3173** | **0.151%** |

Note: Bold indicates the proposed method in this study.
Table 5. Prediction performance metrics of the models in the constant-diameter stage.

| Model | $R^2$ | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|
| CNN | 58.53% | 0.1023 | 0.3199 | 0.2629 | 0.041% |
| LSTM | 77.23% | 0.0293 | 0.1714 | 0.1461 | 0.083% |
| BiLSTM | 82.69% | 0.0460 | 0.2145 | 0.1719 | 0.039% |
| CNN–BiLSTM | 94.54% | 0.0131 | 0.1147 | 0.0861 | 0.049% |
| **CNN–BiLSTM–Attention** | **98.31%** | **0.0040** | **0.0636** | **0.0517** | **0.029%** |

Note: Bold indicates the proposed method in this study.
Zhang, P.; Pan, H.; Chen, C.; Jing, Y.; Liu, D. CNN–BiLSTM–Attention-Based Hybrid-Driven Modeling for Diameter Prediction of Czochralski Silicon Single Crystals. Crystals 2026, 16, 57. https://doi.org/10.3390/cryst16010057