Hybrid Mechanism–Data-Driven Modeling for Crystal Quality Prediction in Czochralski Process

Zhao, Duqiao; Ren, Junchao; Du, Xiaoyan; Wang, Yixin; Ding, Dong

doi:10.3390/cryst16020086

Open Access Article


by Duqiao Zhao ¹, Junchao Ren ², Xiaoyan Du ¹,*, Yixin Wang ¹ and Dong Ding ¹

¹ Department of Information Technology, Shaanxi Police College, Xi’an 710021, China

² School of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China

* Author to whom correspondence should be addressed.

Crystals 2026, 16(2), 86; https://doi.org/10.3390/cryst16020086

Submission received: 28 November 2025 / Revised: 10 January 2026 / Accepted: 22 January 2026 / Published: 25 January 2026

(This article belongs to the Section Industrial Crystallization)


Abstract

The V/G criterion is a critical indicator for monitoring dynamic changes during Czochralski silicon single crystal (Cz-SSC) growth. However, because it cannot be measured in real time, process regulation must rely on offline feedback, which leads to imprecise control and compromised crystal quality. To overcome this limitation, this paper proposes a novel soft sensor modeling framework that integrates mechanism-based knowledge with data-driven learning for real-time prediction of a key crystal quality parameter, the V/G value (the ratio of growth rate to axial temperature gradient). The proposed approach builds a hybrid prediction model by combining a data-driven sub-model with a physics-informed mechanism sub-model. The data-driven component is an attention-based dynamic stacked enhanced autoencoder (AD-SEAE) network, in which the SEAE structure introduces layer-wise reconstruction operations to mitigate information loss during hierarchical feature extraction. Furthermore, an attention mechanism dynamically weights historical and current samples, enhancing the temporal representation of process dynamics. Finally, the outputs of the two sub-models are fused with an adaptive weighting strategy based on prediction accuracy, yielding a robust ensemble that provides more reliable V/G predictions under varying operating conditions. Experimental validation on actual industrial Cz-SSC production data demonstrates that the proposed method achieves high prediction accuracy and effectively supports real-time process optimization and quality monitoring.

Keywords:

Czochralski silicon single crystal (Cz-SSC); V/G prediction; soft sensor; mechanism–data fusion; autoencoder; attention mechanism

1. Introduction

Silicon single crystal (SSC) serves as a foundational material for the modern semiconductor industry and remains an irreplaceable base material for manufacturing integrated circuit chips. In particular, the technology for pulling large-size, high-quality ingots occupies a pivotal front-end position in the industry chain and directly determines the performance ceiling of subsequent chip manufacturing processes [1,2]. As chip linewidths continue to shrink and silicon wafer sizes continue to grow, the quality requirements for SSC production become increasingly stringent [3]. An online, real-time prediction model for crystal quality during SSC growth therefore has significant practical engineering value. Such a model can continuously monitor the dynamic growth conditions of the crystal, enabling process personnel to promptly assess the growth status and make optimized adjustments. The high-quality prediction data it provides also offer a crucial decision-making basis for maintaining stable and reliable operation of the growth control system.

It is well known that the Czochralski (Cz) crystal growth process is the most widely adopted mainstream technology for preparing high-quality, large-size SSC [4]. This growth process is fundamentally based on precise control of the solid–liquid phase transition, and its internal mechanism is extremely complex, involving multi-physics field coupling and interactions among the gas, melt (liquid), and crystal (solid) phases [5,6]. The evaluation of crystal quality in Cz-SSC growth revolves around two core aspects: geometric dimensional accuracy at the macroscopic level and crystal defect density at the microscopic level. The microscopic defects, which are of pivotal importance, are predominantly intrinsic point defects (vacancies and self-interstitials) generated as the crystal solidifies from the melt. The gradual accumulation of these intrinsic point defects in the crystal may cause uneven resistance distribution in integrated circuit chips, increased leakage current, reduced photoelectric conversion efficiency, and failure of other key technical indicators to meet design requirements, thereby directly degrading chip performance [7,8]. The current theory of point defect generation in crystals can be traced back to the V/G criterion first proposed by Voronkov [9] in the 1980s, where “V” represents the crystal growth rate and “G” denotes the axial temperature gradient near the solid–liquid interface. Adjusting the V/G ratio is a key method for controlling the type of point defects within crystals during Cz growth. By regulating this ratio, point defects within the crystal can be guided from one type to the other, promoting rapid mutual recombination between self-interstitials and vacancies [7].
This mechanism effectively reduces residual defects and their aggregation, providing an important process control approach for growing high-quality silicon single crystals with low defect density. Real-time monitoring of the V/G value during Cz-SSC growth is therefore of great significance, because fluctuations in this critical ratio directly affect the type and density of defects within the crystal. Such monitoring gives on-site operators clear guidance for adjusting process parameters, enabling dynamic optimization of the growth procedure. This not only helps improve the consistency of single crystal quality but also supports efficient and stable operation of the control system.

The V/G criterion is widely regarded as a key indicator for evaluating the quality of silicon single crystal (SSC) wafers [10,11,12]. However, traditional hard sensors cannot directly measure this ratio during actual crystal growth, which poses a significant challenge for real-time monitoring and control. Adopting an artificial intelligence-based soft sensor technique that predicts the V/G criterion in real time from indirectly measurable process parameters has therefore become an effective solution. Soft sensor modeling is a model-based method for real-time estimation or prediction of process variables that cannot be measured directly. Unlike traditional hard sensors, which rely on physical probes for direct measurement, soft sensors use mathematical models built with advanced modeling and computing techniques (such as machine learning algorithms) to analyze the measurable process variables already available in the system (such as temperature, pressure, and rotational speed), thereby estimating and predicting target variables that are difficult to measure directly [13,14,15]. Soft sensor modeling mainly follows two paradigms: mechanism-based modeling and data-driven modeling. Although soft sensor models based on mechanistic knowledge are explicitly interpretable, they suffer from unmodeled dynamics and are prone to low prediction accuracy. In contrast, a data-driven soft sensor model relies solely on the abundant operational data generated during production to establish a high-accuracy predictive mapping from measurable variables to target variables.
The core advantage of such models lies in their ability to automatically learn and capture complex dynamic characteristics from data; as a result, they play an increasingly important role in predicting key variables across a wide range of complex industrial processes [16,17].

The core principle of data-driven soft sensor modeling is to first select, as inputs, auxiliary variables from the production process that are highly correlated with the primary variable (i.e., the variable to be predicted), and then to construct a mathematical mapping between them from historical or real-time data using machine learning algorithms. Such a model can depict complex, nonlinear dynamic relationships and ultimately enables real-time online prediction of the key variables, completing the soft sensing task. The massive data accumulated in industrial processes often exhibit strong redundancy and significant process noise, so the effective information is overwhelmed by large amounts of irrelevant or repetitive data. Consequently, the key challenge and research focus in current soft sensor modeling is to extract, accurately and efficiently, feature information that reflects the intrinsic nature of the process from multidimensional, highly noisy raw data. Traditional shallow modeling methods (e.g., least squares regression and principal component analysis) often fail to capture complex, nonlinear deep features because of their relatively simple structures, which severely limits their representational capacity for industrial process data with strongly coupled, high-dimensional dynamic characteristics. In contrast, deep learning has been widely applied in soft sensor modeling owing to its outstanding capability for deep feature extraction and representation learning. Various deep learning models, such as stacked autoencoders [18], convolutional neural networks [19], long short-term memory networks [20], and deep belief networks [21], have been successfully introduced and have significantly improved the prediction performance for complex process variables.
For instance, as a typical deep learning model, the stacked autoencoder (SAE) offers significant advantages in soft sensor modeling because it can fully extract the underlying deep features and key information from the data while effectively reducing the dimensionality of high-dimensional data. In this context, Gao et al. proposed a generative modeling framework that combined the deep feature learning and representation capabilities of stacked variational autoencoders (VAEs) with the stable generation of Wasserstein generative adversarial networks (WGANs) to construct a hybrid generative model; their experimental results show that this model significantly improves the prediction accuracy and generalization of soft sensors under complex conditions [22]. Wang et al. proposed a hybrid modeling framework that integrates an improved support vector machine with a sparse autoencoder network, effectively enhancing classification accuracy in pipeline leakage detection [23]. To address the failure of prediction models to generalize when working conditions change in complex industrial processes, Ren et al. proposed a stack-enhanced autoencoder transfer learning algorithm based on variational mode decomposition, which effectively solved the domain-adaptive prediction problem and provided a feasible solution for adaptive prediction in actual industrial applications [24]. At present, the application of deep learning to soft sensor modeling of the Czochralski silicon single crystal (Cz-SSC) growth process is still at an initial exploratory stage, and high-quality research results remain relatively limited. Numerous open questions therefore remain in areas such as model architecture design, training strategy optimization, and integration of physical mechanisms.

Inspired by the aforementioned soft sensor modeling approaches, this paper presents a soft sensor modeling method driven by the fusion of mechanism and data. In this method, an enhanced stacked autoencoder network incorporating an attention mechanism is designed to improve the prediction performance of the data-driven module and strengthen its ability to capture historical process dynamics. Furthermore, an adaptive weight adjustment strategy is designed for the stage in which the data-driven and mechanistic sub-models are fused. This strategy adaptively adjusts the contribution of each sub-model according to the real-time operating conditions and its local prediction performance. Through this adaptive weighted fusion, the model can flexibly and synergistically leverage the generalizability of mechanistic knowledge and the compensatory strengths of the data-driven approach, ultimately achieving stable and accurate online prediction of the key crystal quality variable, the V/G criterion. Specifically, the main contributions of this paper are as follows:

(1): To address the difficulty of directly detecting the V/G criterion with hard sensors during crystal growth, a mechanism and data fusion-driven soft sensor model (referred to as M-AD-SEAE) is designed that integrates prior mechanistic knowledge with the adaptive learning ability of data-driven methods, overcoming the limitations of any single modeling approach.
(2): In the data-driven sub-model (i.e., AD-SEAE), an attention mechanism is introduced to dynamically calculate the weights of different historical samples, thereby focusing on the key dynamic features. The weighted historical information is combined with the current sample and fed into the stack-enhanced autoencoder (SEAE) network for feature extraction and modeling. This design not only enhances the model’s ability to capture temporal dependencies but also helps preserve key process dynamic information, reducing information loss and significantly improving prediction accuracy.
(3): An adaptive weight adjustment strategy based on the entropy weight method is proposed to achieve dynamic fusion of the mechanistic and data-driven sub-models. Rather than using a fixed proportion, this method objectively calculates and allocates fusion weights through entropy weighting according to the performance of each sub-model during real-time prediction. This data-driven weight allocation effectively enhances the overall prediction accuracy and robustness of the soft sensor model for the V/G crystal quality criterion.

The rest of the paper is organized as follows: Section 2 briefly describes the Cz-SSC growth process and the need to predict the V/G criterion of crystal quality. Section 3 systematically elaborates on the theoretical foundations and modeling methodology of the proposed soft sensor model, providing the essential framework for the subsequent model construction. Section 4 presents comparative experiments that qualitatively and quantitatively evaluate the proposed approach against existing mainstream modeling methods, analyzing its performance advantages and scope of applicability. Finally, Section 5 summarizes the research, outlines the core contributions and limitations of the methodology, and offers prospects for future investigations.

2. Cz Process and Problem Description

The growth of Cz-SSC is essentially a liquid–solid phase change process carried out under precise control [25]. The core of this process lies in the coordinated regulation of multiple parameters such as the temperature field, pulling speed, and crystal rotation, to achieve precise control over the morphology and dynamics of the crystal growth interface. The core objective of the process operation is to produce high-quality and large-sized SSCs with high efficiency and low costs, while ensuring the long-term stable operation of the single crystal furnace, in order to meet the strict requirements of semiconductor manufacturing for the perfection and consistency of materials. The growth process is shown in Figure 1.

The crystal growth environment inside the Cz-SSC furnace involves the coupling of multiple physical fields, such as the temperature, flow, and stress fields, as well as complex interactions among the gas, liquid, and solid phases, making it a typical strongly nonlinear, dynamic, and time-varying process [26], as depicted in Figure 2. The process also takes place under extreme conditions, including high temperature, high vacuum, and strong magnetic fields. These harsh working conditions make it difficult for traditional hard sensors to monitor crystal quality indicators (such as defect density and microscopic nonuniformity) directly and in real time, which restricts the implementation of precise, optimal control of the SSC growth process. Fortunately, a series of process variables can be monitored directly or indirectly during Cz-SSC growth, such as the crystal rotation rate, crystal pulling rate, crucible rotation rate, crucible lifting rate, crystal diameter, thermal field temperature distribution, and heater power. Most of these variables can be reliably obtained through corresponding sensors, providing an important multi-source data foundation for indirectly perceiving, assessing, and predicting the internal quality state of the crystal.

Therefore, establishing an accurate mapping model between measurable process variables and crystal quality indicators is of great practical significance for guiding the process personnel at the Cz-SSC growth site to accurately assess the crystal growth status and achieve dynamic control of key point defects. This is precisely the core starting point and motivation for this paper to conduct in-depth research on this issue.

3. Mechanism and Data Fusion-Driven Soft Sensor Model

3.1. Mechanism Sub-Model

Figure 3 illustrates the physics-based (mechanistic) sub-model for the Cz-SSC growth process. It consists of two coupled parts: (i) an energy-transfer block that characterizes heat transport to the growth interface, and (ii) a hydrodynamic/geometric pulling block that links meniscus shape and diameter evolution [27,28]. The measurable time series $\{P, v_{p}, v_{c}, D_{cry}\}$ serve as inputs to the mechanistic block, where $v_{c}$ represents the crucible rising rate, $v_{p}$ denotes the pulling rate, $P$ is the heater power, and $D_{cry}$ is the crystal diameter.

In this study, because the energy transfer model involves a complete thermal field with coupled radiation, conduction, and convection, it is simplified to make soft sensor modeling feasible. Following the theoretical derivation of the energy transfer model in reference [27], this paper retains only the physical relationships most closely related to V/G (e.g., Equations (1), (3) and (4)). The complex heat transfer process can therefore be represented equivalently as a computable thermal state estimate. Specifically, given the measurable inputs $\{P, v_{p}, v_{c}, D_{cry}\}$, the melt-side gradient proxy $G_{men}$ is calculated first. The solid-side gradient $G_{c}$ is then deduced from interface energy conservation, and the V/G output of the mechanism sub-model is finally obtained. The V/G output of the mechanism sub-model is calculated as follows:

Firstly, it is assumed that the process near the meniscus is axially symmetric and quasi-steady at each sampling interval. The thermophysical properties change slowly and are taken as constants. The solid–liquid interface temperature is fixed at the melting point of silicon. The axial temperature gradient at the meniscus is approximated by a finite difference along the meniscus height.

Secondly, the melt temperature and meniscus height are calculated: the melt-side axial temperature gradient at the meniscus is approximated as

$$G_{men} = -\frac{T_{m} - T^{*}}{h_{men}} \tag{1}$$

where $T^{*}$ is the melting point of silicon and $T_{m}$ is the melt temperature at the bottom of the meniscus (provided or estimated by the energy-transfer block). The meniscus height $h_{men}$ is modeled as

$$h_{men} = a \sqrt{\frac{1 - \sin(\alpha_{c} + \alpha_{0})}{1 + a/(\sqrt{2}\, r_{cry})}} \tag{2}$$

where $a = \sqrt{\gamma/(\rho_{m} g)}$ is the capillary constant, $\gamma$ denotes the surface tension, $\rho_{m}$ is the melt density, and $g$ is the gravitational acceleration. In addition, $\alpha_{0}$ is the growth angle, typically taken as $\alpha_{0} = 11^{\circ}$ for silicon; $\alpha_{c}$ denotes the crystal slope angle; and $r_{cry} = 0.5 D_{cry}$ is the crystal radius.

Next, interface energy conservation relates the growth rate to the melt-side and solid-side gradients:

$$v_{g} = \frac{k_{men} G_{men} - k_{c} G_{c}}{\rho_{s} \Delta H} \tag{3}$$

where $k_{men}$ and $k_{c}$ are the thermal conductivities of the melt and crystal phases, respectively, $\rho_{s}$ is the crystal density, and $\Delta H$ is the latent heat of crystallization. $G_{c}$ denotes the axial temperature gradient on the crystal (solid) side of the growth interface. Since $G_{c}$ is not directly measurable in our industrial setup, we estimate it by using Equations (1)–(3) as

$$G_{c} = \frac{k_{men} G_{men} - \rho_{s} \Delta H\, v_{g}}{k_{c}} \tag{4}$$

Here, the growth (solidification) rate is approximated from the relative motion as $v_{g} \approx v_{p} - v_{c}$.
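For readers checking the algebra, Eq. (4) is Eq. (3) solved for the solid-side gradient: multiply both sides of Eq. (3) by $\rho_{s}\Delta H$ and isolate $k_{c} G_{c}$.

```latex
\rho_{s}\,\Delta H\,v_{g} = k_{men} G_{men} - k_{c} G_{c}
\quad\Longrightarrow\quad
k_{c} G_{c} = k_{men} G_{men} - \rho_{s}\,\Delta H\,v_{g}
\quad\Longrightarrow\quad
G_{c} = \frac{k_{men} G_{men} - \rho_{s}\,\Delta H\,v_{g}}{k_{c}}
```

Under the sign convention of Eq. (1), $G_{men} < 0$ whenever $T_{m} > T^{*}$; for $v_{g} > 0$ both numerator terms are then negative, so $G_{c}$ is negative as well, and this sign carries through to Eq. (6).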

The crystal radius evolution follows the curved-interface geometry as

$$\dot{r}_{cry} = \frac{d r_{cry}}{d t} = v_{g} \tan\alpha_{c} \tag{5}$$
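Equation (5) can be used in two directions. The sketch below is a minimal illustration, not the authors' implementation: it integrates Eq. (5) with explicit Euler steps to propagate the radius, and inverts it to estimate the slope angle $\alpha_{c}$ from two consecutive diameter measurements. Using this inversion to supply $\alpha_{c}$ for Eq. (2) is our assumption, since the text does not state how $\alpha_{c}$ is obtained; SI units are assumed and the function names are hypothetical.

```python
import math


def integrate_radius(r0, v_g, alpha_c_deg, dt):
    """Propagate r_cry with explicit Euler steps of Eq. (5): dr/dt = v_g * tan(alpha_c).

    v_g (m/s) and alpha_c_deg (degrees) are equal-length sequences sampled every dt seconds.
    Returns the radius trajectory (m), starting with r0.
    """
    radii = [r0]
    for v, alpha in zip(v_g, alpha_c_deg):
        radii.append(radii[-1] + dt * v * math.tan(math.radians(alpha)))
    return radii


def slope_angle_from_diameter(d_prev, d_curr, dt, v_g):
    """Hypothetical inversion of Eq. (5): alpha_c = atan(r_dot / v_g), in degrees.

    r_dot is a finite difference of two diameter samples taken dt seconds apart;
    assumes v_g > 0 over the interval.
    """
    r_dot = 0.5 * (d_curr - d_prev) / dt
    return math.degrees(math.atan2(r_dot, v_g))
```

With $\alpha_{c} = 0$ the radius stays constant (cylindrical body growth), and feeding two consecutive points of a trajectory generated at a fixed angle back into `slope_angle_from_diameter` recovers that angle.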

According to Voronkov’s criterion, the mechanistic V/G is defined as

$$\frac{V}{G} = \frac{v_{g}}{G_{c}} \tag{6}$$

In summary, given $\{P, v_{p}, v_{c}, D_{cry}\}$, the mechanistic block sequentially computes $h_{men}$, $G_{men}$, $v_{g}$, $G_{c}$, and $V/G$, providing a physics-consistent prior for the subsequent hybrid fusion.
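The computation sequence summarized above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the mapping from heater power $P$ to the melt temperature $T_{m}$ (the energy-transfer block) is not specified here, so the sketch takes $T_{m}$ directly as an input; the constants in the hypothetical `SiliconProps` class are approximate literature values for silicon near its melting point, chosen for illustration only and not the parameters used in the paper. Because both gradients are negative under the sign convention of Eq. (1), Eq. (6) yields a negative ratio, so the sketch also reports its magnitude.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SiliconProps:
    """Approximate properties of silicon near its melting point (illustrative only)."""
    T_melt: float = 1687.0          # T*, melting point (K)
    gamma: float = 0.73             # melt surface tension (N/m)
    rho_melt: float = 2570.0        # rho_m, melt density (kg/m^3)
    rho_solid: float = 2330.0       # rho_s, crystal density (kg/m^3)
    k_melt: float = 60.0            # k_men, melt thermal conductivity (W/(m K))
    k_solid: float = 22.0           # k_c, crystal thermal conductivity (W/(m K))
    latent_heat: float = 1.8e6      # Delta H, latent heat of crystallization (J/kg)
    growth_angle_deg: float = 11.0  # alpha_0 (degrees)
    g: float = 9.81                 # gravitational acceleration (m/s^2)


def mechanistic_vg(T_m, v_p, v_c, D_cry, alpha_c_deg=0.0, props=None):
    """Evaluate Eqs. (1), (2), (4) and (6) for one sampling instant (SI units).

    T_m is the melt temperature at the bottom of the meniscus (K), assumed to be
    supplied by an energy-transfer model that is not reproduced here.
    """
    p = props or SiliconProps()
    a = math.sqrt(p.gamma / (p.rho_melt * p.g))                  # capillary constant
    r_cry = 0.5 * D_cry                                          # crystal radius
    theta = math.radians(alpha_c_deg + p.growth_angle_deg)
    h_men = a * math.sqrt((1.0 - math.sin(theta))
                          / (1.0 + a / (math.sqrt(2.0) * r_cry)))  # Eq. (2)
    G_men = -(T_m - p.T_melt) / h_men                            # Eq. (1)
    v_g = v_p - v_c                                              # growth-rate approximation
    G_c = (p.k_melt * G_men
           - p.rho_solid * p.latent_heat * v_g) / p.k_solid      # Eq. (4)
    v_over_g = v_g / G_c                                         # Eq. (6)
    return {"h_men": h_men, "G_men": G_men, "v_g": v_g, "G_c": G_c,
            "V_over_G": v_over_g, "abs_V_over_G": abs(v_over_g)}
```

For example, `mechanistic_vg(T_m=1697.0, v_p=1.0e-3 / 60, v_c=0.1e-3 / 60, D_cry=0.2)` evaluates a 200 mm crystal pulled at 1.0 mm/min with a 0.1 mm/min crucible lift and a 10 K melt superheat; these inputs are made-up illustrative values, not plant data.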

3.2. Data-Driven Sub-Model

3.2.1. Attention Mechanisms

In deep learning, the design of the attention mechanism draws inspiration from the human brain’s ability to selectively focus on key parts of the information it processes. Its core principle is that the model automatically assesses the importance of different parts of the input and assigns higher weights to more critical elements, thereby concentrating limited computing resources on the features most relevant to the current task. Figure 4 shows a schematic of the attention mechanism model [29].

Figure 4 illustrates the core framework of the attention mechanism, in which the source data are modeled as a sequence of <Key, Value> pairs. For a given Query in the target domain, the weight of each Value is obtained by computing the similarity between the Query and the corresponding Key. The final attention output is the weighted sum of all Values:

$$\mathrm{Attention}(\mathrm{Query}, \mathrm{Source}) = \sum_{i=1}^{L_{x}} \mathrm{Similarity}(\mathrm{Query}, \mathrm{Key}_{i}) \cdot \mathrm{Value}_{i} \tag{7}$$

where $L_{x} = \lVert \mathrm{Source} \rVert$ is the length of Source. The core objective of attention mechanisms is to select the components most critical to the current task from redundant or complex inputs. This suppresses the influence of secondary or distracting information and thereby enhances the model’s ability to represent and learn key features. The focusing process of the attention mechanism essentially amounts to computing different weight coefficients for the input information. The magnitude of each weight directly reflects the model’s degree of attention to the corresponding Value: the higher the weight, the more the model focuses on that Value. In other words, the weight represents the importance of a piece of information, and the Value is the information itself.

The computational process of the attention mechanism can be systematically divided into three stages, with their logical relationships illustrated in Figure 5.

In the first stage, the similarity or correlation between the Query and each $\mathrm{Key}_{i}$ is calculated. In this paper, the vector dot product is used, i.e.,

$$S(\mathrm{Query}, \mathrm{Key}_{i}) = \mathrm{Query} \cdot \mathrm{Key}_{i} \tag{8}$$

In the second stage, the raw scores from the first stage are normalized with the softmax function, which transforms them into a probability distribution whose elements sum to one. Because softmax exponentiates the scores, the weights of the most relevant elements are further accentuated. The calculation is as follows:

$$\alpha_{i} = \mathrm{softmax}(S_{i}) = \frac{e^{S_{i}}}{\sum_{j=1}^{L_{x}} e^{S_{j}}} \tag{9}$$

where $\alpha_{i}$ is the weighting coefficient corresponding to $\mathrm{Value}_{i}$.

In the third stage, the Values are summed using these weighting coefficients to obtain the Attention value of the Query, i.e.,

$$\mathrm{Attention}(\mathrm{Query}, \mathrm{Source}) = \sum_{i=1}^{L_{x}} \alpha_{i} \cdot \mathrm{Value}_{i} \tag{10}$$
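The three stages can be written compactly with NumPy. This is a generic sketch of dot-product attention as described by Eqs. (8)–(10), not code from the paper; the function name is hypothetical, and subtracting the maximum score before exponentiating is a standard numerical-stability step that leaves the softmax weights unchanged.

```python
import numpy as np


def dot_product_attention(query, keys, values):
    """Return (attention value, weights) for one query.

    query: shape (d,); keys: shape (L_x, d); values: shape (L_x, d_v).
    """
    scores = keys @ query                        # stage 1: S_i = Query . Key_i, Eq. (8)
    exp_scores = np.exp(scores - scores.max())   # shift by the max for numerical stability
    weights = exp_scores / exp_scores.sum()      # stage 2: softmax normalization, Eq. (9)
    return weights @ values, weights             # stage 3: weighted sum of Values, Eq. (10)
```

For a query `[1, 0]` with keys `[[1, 0], [0, 1]]`, the scores are 1 and 0, so the weights are $e/(1+e) \approx 0.731$ and $1/(1+e) \approx 0.269$, and the first Value dominates the output.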

3.2.2. Stacked Autoencoder

SAE is a neural network widely employed for deep feature extraction; its architecture consists of multiple autoencoders (AEs) stacked sequentially, enabling layer-by-layer abstraction of the input data. As shown in Figure 6, each basic autoencoder unit comprises an input layer, a hidden (encoding) layer, and an output (decoding) layer, and it learns an effective encoding of the data through unsupervised training. Functionally, each unit consists of an encoder and a decoder: the input layer and hidden layer form the encoder, while the hidden layer and output layer form the decoder [18]. The encoding and decoding computations are given below:

$$h = s_{f}(W_{1} x + b_{1}) \tag{11}$$

$$\hat{x} = s_{d}(W_{2} h + b_{2}) \tag{12}$$

where $x$ denotes the network input and $\hat{x}$ the reconstructed output. Both $s_{f}$ and $s_{d}$ are activation functions, the former for encoding and the latter for decoding; the sigmoid function is used throughout this paper. $h$ denotes the hidden-layer features, $W_{1}$ and $W_{2}$ are the encoding and decoding weights, and $b_{1}$ and $b_{2}$ are the encoding and decoding biases, respectively.

The training of the AE network is accomplished by minimizing the reconstruction error between its input and output, with the specific form of this optimization objective shown in Equation (13).

$$J(W_{1}, W_{2}, b_{1}, b_{2}) = \frac{1}{m} \sum_{i=1}^{m} \lVert x_{i} - \hat{x}_{i} \rVert_{2}^{2} \tag{13}$$

where $m$ is the number of training samples.
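As a minimal illustration of Eqs. (11)–(13), the sketch below evaluates one autoencoder with sigmoid activations for both $s_{f}$ and $s_{d}$, as stated above, and returns the reconstruction error $J$. Storing samples as rows of `x` is our convention, and the function names are hypothetical; training would minimize this value with respect to $W_{1}, W_{2}, b_{1}, b_{2}$, for example by gradient descent.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def ae_forward(x, W1, b1, W2, b2):
    """Encode and decode a batch; x holds one sample per row."""
    h = sigmoid(x @ W1.T + b1)        # Eq. (11): h = s_f(W1 x + b1)
    x_hat = sigmoid(h @ W2.T + b2)    # Eq. (12): x_hat = s_d(W2 h + b2)
    return h, x_hat


def reconstruction_loss(x, x_hat):
    """Eq. (13): mean over the m samples of the squared L2 reconstruction error."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))
```

As a quick sanity check, with all-zero weights and biases every output equals 0.5, so a batch of all-ones inputs with three features gives $J = 3 \times 0.25 = 0.75$.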

3.2.3. Stack-Enhanced Autoencoder

Traditional SAE networks achieve deep feature extraction by minimizing the AE reconstruction error layer by layer. However, because the reconstruction cannot guarantee that each AE layer reproduces its input accurately and losslessly, information loss inevitably accumulates as feature learning proceeds layer by layer. To address this problem, this paper uses a stack-enhanced autoencoder (SEAE) [24], whose structure is shown schematically in Figure 7.

The SEAE network shown in Figure 7 is built by stacking $L$ enhanced autoencoders (EAEs), whose structure is shown in Figure 8. The key mechanism of the SEAE network lies in its recursive input reconstruction. Specifically, each layer not only reconstructs its own input (the output of the preceding layer) but also incorporates the original input information as a constraint or reference in the reconstruction process. By integrating the original input in this way, the network retains the integrity of the source data to the greatest extent during deep feature extraction, compensating for the information loss caused by layer-by-layer abstraction in a traditional SAE and enhancing the robustness and physical interpretability of the feature representation.

For the first EAE, only the original input needs to be reconstructed during training, so its loss function can be formulated simply as:

$$J_{1}(W_{1}, \hat{W}_{1}, b_{1}, \hat{b}_{1}) = \frac{1}{m} \sum_{i=1}^{m} \lVert x_{i} - \hat{x}_{i} \rVert_{2}^{2} \tag{14}$$

where $x$ denotes the original input, $\hat{x}$ represents the reconstructed output, and $s$ denotes the number of neurons in the hidden layer. After EAE 1 has been trained, its output layer is removed and its hidden layer serves as the input of EAE 2. For EAE 2, the variables to be reconstructed are therefore the hidden-layer features $h^{1}$ of EAE 1 and the original input $x$. The output layer of EAE 2 is then removed and its hidden-layer output serves as the input of EAE 3, and so on.

Assuming that the SEAE has a total of $L$ hidden layers, the loss function of $\mathrm{EAE}_{l}$ ($1 < l \leq L$) can be expressed as follows [24]:

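$$J_{l}(W_{l}, \hat{W}_{l}, b_{l}, \hat{b}_{l}) = \frac{1}{m} \sum_{i=1}^{m} \left( \lVert x_{i} - \hat{x}_{i}^{l} \rVert_{2}^{2} + \lVert z_{i}^{l} - \hat{z}_{i}^{l} \rVert_{2}^{2} \right), \quad l = 2, \dots, L \tag{15}$$

where $z^{l}$ is the input of $\mathrm{EAE}_{l}$, i.e., the hidden-layer output $h^{l-1}$ of $\mathrm{EAE}_{l-1}$, and $\hat{x}^{l}$ and $\hat{z}^{l}$ are the reconstructions of $x$ and $z^{l}$ produced by $\mathrm{EAE}_{l}$.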

It should be emphasized that compared with the traditional SAE network, the SEAE network simultaneously reconstructs the shallow data features and the original data information during the network training process to ensure that the deep features of the data are obtained while minimizing unnecessary information loss.
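The greedy layer-wise procedure and the losses in Eqs. (14) and (15) can be sketched with NumPy and hand-derived gradients. This is a simplified illustration under our own assumptions, not the training code of [24]: inputs are scaled to [0, 1] to match the sigmoid decoders, each EAE is trained by full-batch gradient descent with a fixed learning rate, fine-tuning and the supervised output layer are omitted, and the class and function names are hypothetical. The first EAE reconstructs only $x$ (Eq. (14)); every later EAE reconstructs both its input $z^{l}$ and the original input $x$ (Eq. (15)).

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


class EAE:
    """One enhanced autoencoder: encodes z, decodes to x and (for l >= 2) to z."""

    def __init__(self, n_in, n_orig, n_hidden, first, rng):
        self.first = first
        self.W = rng.normal(0.0, 0.1, (n_hidden, n_in))     # encoder weights
        self.b = np.zeros(n_hidden)
        self.Wx = rng.normal(0.0, 0.1, (n_orig, n_hidden))  # decoder head for original x
        self.bx = np.zeros(n_orig)
        if not first:                                       # extra head reconstructing z^l
            self.Wz = rng.normal(0.0, 0.1, (n_in, n_hidden))
            self.bz = np.zeros(n_in)

    def encode(self, z):
        return sigmoid(z @ self.W.T + self.b)

    def step(self, x, z, lr):
        """One full-batch gradient step; returns the loss before the update."""
        m = x.shape[0]
        h = self.encode(z)
        x_hat = sigmoid(h @ self.Wx.T + self.bx)
        loss = np.sum((x - x_hat) ** 2)
        d_x = (2.0 / m) * (x_hat - x) * x_hat * (1.0 - x_hat)  # dJ/d(pre-activation of x_hat)
        d_h = d_x @ self.Wx
        if not self.first:
            z_hat = sigmoid(h @ self.Wz.T + self.bz)
            loss += np.sum((z - z_hat) ** 2)
            d_z = (2.0 / m) * (z_hat - z) * z_hat * (1.0 - z_hat)
            d_h += d_z @ self.Wz
            self.Wz -= lr * d_z.T @ h
            self.bz -= lr * d_z.sum(axis=0)
        d_pre = d_h * h * (1.0 - h)                          # back through the encoder sigmoid
        self.Wx -= lr * d_x.T @ h
        self.bx -= lr * d_x.sum(axis=0)
        self.W -= lr * d_pre.T @ z
        self.b -= lr * d_pre.sum(axis=0)
        return loss / m


def train_seae(x, hidden_sizes, lr=0.5, epochs=300, seed=0):
    """Greedy layer-wise training; each hidden output becomes the next EAE's input."""
    rng = np.random.default_rng(seed)
    layers, losses, z = [], [], x
    for l, n_hidden in enumerate(hidden_sizes):
        eae = EAE(z.shape[1], x.shape[1], n_hidden, first=(l == 0), rng=rng)
        losses.append([eae.step(x, z, lr) for _ in range(epochs)])
        layers.append(eae)
        z = eae.encode(z)          # drop the decoders; h^l feeds EAE l+1
    return layers, z, losses
```

After training, the returned `z` holds the deepest features, which a regression layer would then map to the target variable.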

3.2.4. SEAE Network Based on Attention Mechanism

The traditional SAE network is a static model and therefore ignores the influence of historical information on the current output. In actual industrial production, however, historical information is highly relevant to current production, and neglecting it degrades both the feature extraction ability and the final prediction accuracy of the network. To account for historical information, this paper proposes an attention-based dynamic stacked enhanced autoencoder network (AD-SEAE), shown in Figure 9; Figure 10 illustrates how the attention sample is computed.

In actual industrial production, the historical information most closely related to current production is concentrated in the recent past, whereas information from the distant past has a very limited effect on the present. To reduce computational cost, this paper therefore uses a sliding window to limit the number of historical samples fed to the network. As shown in Figure 9, the input of the AD-SEAE network consists of two parts: the attention-weighted historical information and the current information. The attention-weighted historical information is calculated as follows:

1. The historical samples in the window are mapped to the Key space and the Value space. The mapping process is as follows:

k_{i} = f_{k} (x_{i}) = W_{k} x_{i} + b_{k}

(16)

v_{i} = f_{v} (x_{i}) = W_{v} x_{i} + b_{v}

(17)

where $k_{i}$ and $v_{i}$ denote the $i$th Key and Value, respectively; $W_{k}$ and $b_{k}$ denote the Key mapping weight matrix and bias, respectively; and $W_{v}$ and $b_{v}$ denote the Value mapping weight matrix and bias, respectively. Here, $W_{v}$ is the identity matrix and $b_{v}$ is the all-zero vector, so the Values are the historical samples themselves.

2. The attention scores are computed in two steps: a dot product measures the similarity, and a softmax function converts the similarities into normalized weights.

3. The attention value is obtained as the weighted sum of the Values.
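Steps 1 to 3 can be sketched for a single window as follows. The text does not state which vector acts as the query in the dot product, so this sketch assumes, hypothetically, that the query is the Key mapping of the current sample; the function name and argument layout are likewise illustrative. Following the text, the Value mapping uses $W_{v} = I$ and $b_{v} = 0$.

```python
import numpy as np

def attention_sample(history, current, W_k, b_k):
    """Attention-weighted combination of the historical samples in one window.

    history: (L, n) historical samples in the sliding window
    current: (n,) current sample (assumed source of the query)
    W_k, b_k: Key mapping of Equation (16), shapes (d, n) and (d,)
    Returns the attention sample (n,) and the attention weights (L,).
    """
    keys = history @ W_k.T + b_k            # Eq. (16): k_j = W_k x_j + b_k
    values = history                        # Eq. (17) with W_v = I and b_v = 0
    query = W_k @ current + b_k             # assumption: query = Key mapping of current sample
    scores = keys @ query                   # step 2a: dot-product similarity
    scores = scores - scores.max()          # shift for numerical stability (softmax unchanged)
    weights = np.exp(scores) / np.exp(scores).sum()   # step 2b: softmax
    return weights @ values, weights        # step 3: weighted sum of the Values
```

With all-zero Key parameters every score is equal, so the attention sample reduces to the plain mean of the window; learned Key parameters shift the weight toward the historical samples most similar to the current one.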

AD-SEAE differs from SEAE mainly in the first EAE, whose inputs include both the current input and the historical input combined through the attention mechanism. Its network parameters can therefore be expressed as follows:

W_{1}^{'} = \begin{Bmatrix} W_{1} \\ W_{\mathrm{attention}} \end{Bmatrix}, \quad b_{1}^{'} = \begin{Bmatrix} b_{1} \\ b_{\mathrm{attention}} \end{Bmatrix}

(18)

{\hat{W}}_{1}^{'} = \begin{Bmatrix} {\hat{W}}_{1} \\ {\hat{W}}_{\mathrm{attention}} \end{Bmatrix}, \quad {\hat{b}}_{1}^{'} = \begin{Bmatrix} {\hat{b}}_{1} \\ {\hat{b}}_{\mathrm{attention}} \end{Bmatrix}

(19)

where $W_{1}^{'}$ and ${\hat{W}}_{1}^{'}$ denote the encoding and decoding weights of the first EAE in the AD-SEAE network, respectively; $b_{1}^{'}$ and ${\hat{b}}_{1}^{'}$ are the corresponding biases; and $W_{\mathrm{attention}}$, ${\hat{W}}_{\mathrm{attention}}$, $b_{\mathrm{attention}}$, and ${\hat{b}}_{\mathrm{attention}}$ denote the weights and biases associated with the combined historical input. The loss function of the first EAE in the AD-SEAE network is:

J_{1} (W_{1}, {\hat{W}}_{1}, b_{1}, {\hat{b}}_{1}) = \frac{1}{m} \sum_{i = 1}^{m} \left( {‖x_{i} - {\hat{x}}_{i}‖}_{2}^{2} + λ {‖x_{\mathrm{attention}}^{i} - {\hat{x}}_{\mathrm{attention}}^{i}‖}_{2}^{2} \right)

(20)

where $λ$ is a weighting parameter that adjusts the contribution of the historical information. The loss function thus covers not only the reconstruction of the original information but also the reconstruction of the attention-combined historical information, which ensures that the encoder extracts features of both the current sample and the relevant history.
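As a minimal sketch, Equation (20) can be evaluated as below. Each squared term is read as a squared Euclidean norm summed over features, the reconstructions are assumed to be given, and the function name `first_eae_loss` is hypothetical.

```python
import numpy as np

def first_eae_loss(x, x_hat, x_att, x_att_hat, lam):
    """Equation (20): loss of the first EAE in AD-SEAE.

    x, x_hat:         (m, n) current inputs and their reconstruction
    x_att, x_att_hat: (m, n_a) attention samples and their reconstruction
    lam:              weight of the historical-information term (lambda)
    """
    current_term = np.sum((x - x_hat) ** 2, axis=1)
    history_term = np.sum((x_att - x_att_hat) ** 2, axis=1)
    return float(np.mean(current_term + lam * history_term))
```

Setting `lam = 0` recovers an ordinary reconstruction loss on the current input only; larger values push the first encoder to preserve more of the attention-combined history.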

The AD-SEAE procedure consists of three steps. First, the historical samples are selected by choosing an appropriate window length, and the combined historical information is computed with the attention mechanism. Second, the attention samples and the current samples are combined and fed into the AD-SEAE model to extract deep combined features. Finally, the mapping from these deep combined features to the target variable is modeled to predict the target. Table 1 summarizes the AD-SEAE modeling algorithm.
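The first two steps, sliding the window over the time-ordered samples and concatenating each current sample with its attention sample, can be sketched as below. The attention computation is passed in as a function so that any implementation of Figure 10 can be plugged in; the function name and the choice to skip the first $L$ samples (which have no complete window) are illustrative assumptions.

```python
import numpy as np

def build_first_layer_inputs(X, L, attention_fn):
    """Build the first-EAE inputs [x_i ; x_attention_i] for every i >= L.

    X: (N, n) time-ordered samples
    L: sliding window length
    attention_fn(history, current) -> (n,) attention sample for one window
    Returns an (N - L, 2n) array, one row per time step with a full window.
    """
    rows = []
    for i in range(L, len(X)):
        history = X[i - L:i]        # the L samples x_{i-L}, ..., x_{i-1}
        current = X[i]
        rows.append(np.concatenate([current, attention_fn(history, current)]))
    return np.asarray(rows)
```

For example, `build_first_layer_inputs(X, 10, lambda h, c: h.mean(axis=0))` uses a plain window average as a stand-in for the attention sample.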

3.3. Output Fusion of Mechanisms Model and Data Model

To make full use of the respective strengths of the mechanism sub-model and the data-driven sub-model, a fusion-weight adjustment method based on the entropy weight method is proposed. The entropy weight method is an objective weighting method that assigns weights according to the variability of each indicator [30]: the smaller the variation of an indicator, the less information it carries and the lower its weight. The fusion proceeds as follows:

1. The root mean square error (RMSE) and the mean absolute error (MAE) serve as the basis for computing the weights:

R M S E = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}}

(21)

M A E = \frac{1}{N} \sum_{i = 1}^{N} |{\hat{y}}_{i} - y_{i}|

(22)

where ${\hat{y}}_{i}$ is the predicted value, $y_{i}$ is the actual value, and $N$ is the number of samples.

2. The prediction accuracy index $PA$ of each sub-model is defined as

P A = R M S E + M A E

(23)

A smaller $PA$ indicates a more accurate sub-model. Based on $PA$, the likelihood measure $lh_{r}$ of the $r$th sub-model is further defined as:

l h_{r} = 1 - \frac{P A_{r}}{\sum_{j = 1}^{n} P A_{j}}

(24)

where $n$ is the number of sub-models ($n = 2$ here). In Equation (24), a larger $lh$ value indicates a more accurate sub-model. Let $V = {[l h_{1}, l h_{2}, \dots, l h_{L}]}^{T}$ collect the $lh$ values of one sub-model over a window of length $L$; the accuracy matrix $V^{*}$ of both sub-models can then be expressed as:

V^{*} = \begin{bmatrix} l h_{11} & l h_{12} \\ l h_{21} & l h_{22} \\ \vdots & \vdots \\ l h_{L 1} & l h_{L 2} \end{bmatrix}

(25)

where $l h_{l 1}$ and $l h_{l 2}$ ($1 \leq l \leq L$) denote the accuracy of the $l$th prediction output of the mechanism sub-model and the data-driven sub-model, respectively, within a window of length $L$.

3. Column-wise min-max standardization of $V^{*}$ in Equation (25) yields the standardized matrix:

l h_{l r}^{'} = \frac{l h_{l r} - \min_{l} \{l h_{l r}\}}{\max_{l} \{l h_{l r}\} - \min_{l} \{l h_{l r}\}}, l = 1, 2, \dots, L; r = 1, 2

(26)

V^{'} = \begin{bmatrix} l h_{11}^{'} & l h_{12}^{'} \\ l h_{21}^{'} & l h_{22}^{'} \\ \vdots & \vdots \\ l h_{L 1}^{'} & l h_{L 2}^{'} \end{bmatrix}

(27)

Let $p_{i j} = l h_{i j}^{'} / \sum_{i = 1}^{L} l h_{i j}^{'}$, $j = 1, 2$; the entropy can then be expressed as:

E_{j} = - \frac{1}{\ln L} \sum_{i = 1}^{L} p_{i j} \ln p_{i j}, \quad (0 \leq E_{j} \leq 1)

(28)

When $l h_{i j}^{'} = 0$, the term $p_{i j} \ln p_{i j}$ is taken to be 0.

4. With the information entropies of the mechanism sub-model and the data-driven sub-model denoted $[E_{1}, E_{2}]$, the fusion weight of each sub-model is:

α_{j} = \frac{1 - E_{j}}{2 - \sum_{j = 1}^{2} E_{j}}

(29)

Therefore, the final prediction output can be expressed as follows:

\hat{y} = α_{1} {\hat{y}}_{1} + α_{2} {\hat{y}}_{2}

(30)

where ${\hat{y}}_{1}$ is the predicted output of the mechanism sub-model and ${\hat{y}}_{2}$ is the predicted output of the data-driven sub-model. Note that the RMSE/MAE-driven weight update described above is an offline calibration: the fusion weights are determined on historical data (including offline-reconstructed labels). During online operation, only forward inference and fusion are performed, and no online labels are required.
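Steps 1 to 4 can be sketched in NumPy as below. This is a minimal illustration with hypothetical function names, not the authors' implementation. The text does not fully specify how the per-step accuracy values entering $V^{*}$ are obtained within a window, so the sketch takes the $L \times 2$ matrix $V^{*}$ as given. The entropy is scaled by $1/\ln L$ so that $0 \leq E_{j} \leq 1$, as Equation (28) states; a column of $V^{*}$ that is constant over the window would make Equation (26) undefined and is rejected.

```python
import numpy as np

def rmse(y_hat, y):
    """Equation (21)."""
    e = np.asarray(y_hat, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def mae(y_hat, y):
    """Equation (22)."""
    e = np.asarray(y_hat, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(np.abs(e)))

def prediction_accuracy_index(y_hat, y):
    """Equation (23): PA = RMSE + MAE (smaller is better)."""
    return rmse(y_hat, y) + mae(y_hat, y)

def likelihood_measures(pa):
    """Equation (24): lh_r = 1 - PA_r / sum_j PA_j for each sub-model r."""
    pa = np.asarray(pa, dtype=float)
    return 1.0 - pa / pa.sum()

def entropy_weights(v_star):
    """Equations (26)-(29): fusion weights from an (L, n) matrix of lh values."""
    v = np.asarray(v_star, dtype=float)
    L, n = v.shape
    col_min = v.min(axis=0)
    col_range = v.max(axis=0) - col_min
    if np.any(col_range == 0):
        raise ValueError("every column of V* must vary within the window")
    v_norm = (v - col_min) / col_range               # Eq. (26), column-wise min-max
    p = v_norm / v_norm.sum(axis=0)                  # p_ij
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)  # p ln p := 0 when p = 0
    e = -plogp.sum(axis=0) / np.log(L)               # Eq. (28), scaled into [0, 1]
    return (1.0 - e) / (n - e.sum())                 # Eq. (29) with n sub-models

def fuse(y1_hat, y2_hat, alpha):
    """Equation (30): weighted sum of mechanism and data-driven predictions."""
    return alpha[0] * np.asarray(y1_hat) + alpha[1] * np.asarray(y2_hat)
```

Consistent with the offline-calibration remark above, `entropy_weights` would be run once on historical data, and only `fuse` would run online.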

3.4. M-AD-SEAE-Based Soft Sensor

The main steps of the M-AD-SEAE soft sensor modeling method proposed in this paper are as follows:

1. Data Preprocessing

(1): Outlier processing: Industrial process data inevitably contain abnormal points that deviate from the expected values because of noise and other factors. If these outliers are modeled together with normal data, model accuracy can be severely degraded. Therefore, this paper uses the $3 σ$ criterion to eliminate outliers from the data set. Let the measured values be $x_{1}, x_{2}, \dots, x_{n}$, with mean $\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}$, deviation $Δ x_{i} = x_{i} - \bar{x}$, and standard deviation $σ = {[\sum Δ x_{i}^{2} / (n - 1)]}^{1 / 2}$. If a measured value $x_{i}$ satisfies $|Δ x_{i}| > 3 σ$, it is considered abnormal and is eliminated.
(2): Standardization: The auxiliary variables used by the data-driven sub-model differ in magnitude, and training on raw values lengthens the convergence time of the network. Therefore, both the input and output data are standardized:

x^{*} = \frac{x - \bar{x}}{σ}

(31)

2. Auxiliary feature variable selection:

Based on the standardized data, auxiliary variables are selected using a mixed correlation that combines the Pearson and Spearman coefficients between each auxiliary variable and the target variable:

\begin{array}{l} λ_{x, y} = k | ρ_{x, y} | + (1 - k) | r_{x, y} | \\ = k |\frac{\sum (x - \bar{x}) (y - \bar{y})}{\sqrt{\sum {(x - \bar{x})}^{2} \sum {(y - \bar{y})}^{2}}}| + (1 - k) |1 - \frac{6 \sum d_{i}^{2}}{N (N^{2} - 1)}| \end{array}

(32)

where $k$ is the correlation adjustment parameter, $N$ is the number of samples, and $d_{i} = R(x_{i}) - R(y_{i})$ is the difference between the ranks of $x_{i}$ and $y_{i}$, with $R(\cdot)$ denoting the rank. The auxiliary variables are sorted in descending order of mixed correlation, and the most relevant ones are selected.

3. AD-SEAE network modeling:

First, the modeling samples are selected. Then, the attention scores are calculated and the attention samples are obtained. Next, the attention samples and current samples are fed into the AD-SEAE network. Finally, the network is pre-trained layer by layer and then fine-tuned by backpropagation to obtain the final model.

4. Establishment of mechanism sub-model:

According to Equations (1)–(4) as shown above, the corresponding mechanism sub-model is established.

5. Prediction output fusion:

First, the prediction accuracy of the mechanism sub-model and the data-driven sub-model is computed. Then, the information entropy and fusion weight of each sub-model are calculated from the prediction accuracy. Finally, the predicted outputs are fused according to Equation (30).
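Steps 1 and 2 of the procedure above (outlier removal, standardization, and mixed-correlation variable selection) can be sketched in NumPy as follows. The function names, the default $k = 0.5$, and the `top` cut-off are hypothetical. The outlier filter is applied once, as described, and drops every sample in which any variable violates the $3σ$ criterion; the rank-difference form of the Spearman coefficient assumes no tied values.

```python
import numpy as np

def remove_outliers_3sigma(data):
    """Drop samples (rows) in which any variable has |x_i - mean| > 3 * sigma."""
    x = np.asarray(data, dtype=float)
    cols = x.reshape(len(x), -1)              # treat 1-D input as a single variable
    mean = cols.mean(axis=0)
    sigma = cols.std(axis=0, ddof=1)          # (n - 1) in the denominator, as in the text
    keep = np.all(np.abs(cols - mean) <= 3 * sigma, axis=1)
    return x[keep]

def standardize(data, mean=None, sigma=None):
    """Equation (31): z-score each variable; reuse given statistics if provided."""
    x = np.asarray(data, dtype=float)
    if mean is None:
        mean = x.mean(axis=0)
    if sigma is None:
        sigma = x.std(axis=0, ddof=1)
    return (x - mean) / sigma, mean, sigma

def ranks(a):
    """1-based ranks; assumes no ties, as the 6 * sum(d^2) form requires."""
    a = np.asarray(a, dtype=float)
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(1, len(a) + 1)
    return r

def mixed_correlation(x, y, k=0.5):
    """Equation (32): k * |Pearson| + (1 - k) * |Spearman|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    pearson = np.corrcoef(x, y)[0, 1]
    d = ranks(x) - ranks(y)
    spearman = 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
    return k * abs(pearson) + (1.0 - k) * abs(spearman)

def select_variables(candidates, target, top, k=0.5):
    """Sort candidate variables (name -> series) by mixed correlation, keep the top ones."""
    scores = {name: mixed_correlation(s, target, k) for name, s in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

In a train/test setting, the mean and σ computed on the training data would typically be reused to standardize the test data, which the optional `mean` and `sigma` arguments allow.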

4. Case Study

To validate the performance of the proposed algorithm, simulation experiments were conducted on two systems: a numerical case and an actual Cz-SSC production process. Prediction performance was evaluated with three metrics, MAE, MAPE, and RMSE, where MAE and RMSE are defined in Equations (22) and (21), and MAPE is defined as

M A P E = \frac{1}{N} \sum_{i = 1}^{N} |\frac{{\hat{y}}_{i} - y_{i}}{y_{i}}| \times 100 \%

(33)

4.1. Case 1: Generalized Numerical Case

To verify the applicability and effectiveness of the proposed M-AD-SEAE, the following discrete-time nonlinear system was used to characterize the batch process of a class of industrial objects.

\begin{array}{l} x (k) = 1.5 u (k) - 1.5 u^{2} (k) \\ y (k + 1) = 0.6 y (k) - 0.1 y (k - 1) + 1.2 x (k) - 0.1 x (k - 1) \end{array}

(34)

where $u(k)$, $x(k)$, and $y(k)$ denote the input, state, and output of the system at time $k$. To mimic six batch runs (six operating conditions), six different input trajectories $u_{i}(k) = a_{i}(k)$, $i = 1, \dots, 6$, were generated as follows:

[\begin{array}{l} a_{1} \\ a_{2} \\ a_{3} \\ a_{4} \\ a_{5} \\ a_{6} \end{array}] = [\begin{array}{l} \sin (k) \\ {(\sin (k))}^{2} \\ {(\sin (k))}^{3} \\ \sin (k) + {(\sin (k))}^{2} \\ {(\sin (k))}^{2} + {(\sin (k))}^{3} \\ 1.3 \sin (k) - 2 {(\sin (k))}^{2} \end{array}]

(35)

For each case $i$, $u_{i}(k)$ was injected into Equation (34) to generate the corresponding output $y_{i}(k)$. In this soft-sensing setting, the first five outputs $\{y_{1}(k), y_{2}(k), \dots, y_{5}(k)\}$ were treated as auxiliary variables (inputs), while the sixth output $y_{6}(k)$ was treated as the target variable to be estimated.

The initial conditions were randomly set as $x(1) \in [0, 1]$ and $y(1) \in [0, 1]$. To emulate measurement noise under different operating conditions, additive noise with a different amplitude was imposed on each output sequence $y_{i}(k)$, which yielded the noisy data shown in Figure 11. To verify the effectiveness of the proposed M-AD-SEAE soft sensor modeling method, it was compared with the SAE, SEAE, and AD-SEAE models.
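The data-generating process of Equations (34) and (35) can be sketched as below. The noise amplitudes, sequence length, random seed, and the pre-initial values $x(0) = y(0) = 0$ needed to start the second-order recursion are not given in the text and are hypothetical choices; $x(1)$ and $y(1)$ are drawn uniformly from $[0, 1]$ for each batch, as stated.

```python
import numpy as np

def input_trajectories(k):
    """Equation (35): the six input trajectories a_1(k), ..., a_6(k)."""
    s = np.sin(k)
    return [s, s ** 2, s ** 3, s + s ** 2, s ** 2 + s ** 3, 1.3 * s - 2 * s ** 2]

def simulate(u, x1, y1):
    """Equation (34) driven by input u; array index 0 corresponds to k = 1."""
    u = np.asarray(u, dtype=float)
    x = 1.5 * u - 1.5 * u ** 2        # state equation x(k)
    x[0] = x1                         # random initial state x(1), as in the text
    y = np.zeros(len(u))
    y[0] = y1                         # random initial output y(1)
    y_prev, x_prev = 0.0, 0.0         # assumption: y(0) = x(0) = 0
    for k in range(len(u) - 1):
        y[k + 1] = 0.6 * y[k] - 0.1 * y_prev + 1.2 * x[k] - 0.1 * x_prev
        y_prev, x_prev = y[k], x[k]
    return y

def make_dataset(K=500, noise_std=(0.01, 0.02, 0.03, 0.04, 0.05, 0.06), seed=0):
    """Noisy outputs y_1..y_5 as auxiliary inputs and y_6 as the target."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, K + 1, dtype=float)
    outputs = []
    for u, sd in zip(input_trajectories(k), noise_std):
        y = simulate(u, rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
        outputs.append(y + rng.normal(0.0, sd, size=K))   # additive measurement noise
    Y = np.column_stack(outputs)
    return Y[:, :5], Y[:, 5]
```

The linear part of the output recursion has characteristic roots of modulus $\sqrt{0.1}$, so the simulated outputs stay bounded for bounded inputs.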

Before the soft sensor models were established, the simulated data were preprocessed. Because the network parameters (i.e., weights and biases) significantly affect model performance, particle swarm optimization was employed to optimize them. The resulting network structure was [1,3,6,7]. The initial learning rate was set to 0.25 with a reduction factor of 0.5, and the activation function was ‘sigm’.

Figure 12 and Figure 13 show the test-set prediction outputs and the corresponding error curves of the four soft sensor models, which differ markedly in prediction quality. For the original SAE model, the maximum absolute error was 16.3277. Because layer-by-layer feature extraction in the SAE network can lose information, the SEAE model was introduced, which reduced the maximum absolute error of its predictions to 12.8334. Building on SEAE, AD-SEAE additionally accounts for the system's historical information; its predictions fluctuated closely around the true value, but its maximum absolute error was relatively large at 17.2917.

The main reason is that incorporating historical information also brings in historical noise, which produces large output errors when the true value changes abruptly. This problem can be mitigated by incorporating a mechanistic model, whose combination with the data-driven model further improves prediction accuracy: compared with AD-SEAE, M-AD-SEAE reduced the maximum absolute error to 15.1239.

The prediction performance indices of the four models are presented in Figure 14 and Table 2, which show that prediction performance improves with each successive model enhancement. Compared with the original SAE model, SEAE, AD-SEAE, and M-AD-SEAE reduced MAE by 18.36%, 35.55%, and 62.76%, MAPE by 18.37%, 35.57%, and 62.78%, and RMSE by 16.89%, 28.46%, and 56.12%, respectively. These findings further demonstrate the suitability of the proposed method for soft sensor modeling in the numerical case.
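The percentage reductions quoted above follow directly from Table 2 as $100\,(e_{\mathrm{SAE}} - e_{\mathrm{model}})/e_{\mathrm{SAE}}$ for each error metric $e$. The short check below reproduces them from the tabulated values; the names `TABLE2`, `relative_reduction`, and `reductions_vs` are illustrative.

```python
# Error metrics (MAE, MAPE in %, RMSE) copied from Table 2
TABLE2 = {
    "SAE": (6.1128, 14.48, 7.0356),
    "SEAE": (4.9905, 11.82, 5.8470),
    "AD-SEAE": (3.9396, 9.33, 5.0335),
    "M-AD-SEAE": (2.2764, 5.39, 3.0871),
}

def relative_reduction(baseline, improved):
    """Percentage reduction of an error metric with respect to a baseline."""
    return 100.0 * (baseline - improved) / baseline

def reductions_vs(table, baseline="SAE", digits=2):
    """Reduction of every metric for each model relative to the baseline model."""
    base = table[baseline]
    return {
        name: tuple(round(relative_reduction(b, m), digits) for b, m in zip(base, metrics))
        for name, metrics in table.items()
        if name != baseline
    }
```

The same `relative_reduction` function, applied pairwise to the Table 4 values, reproduces the Section 4.2 figures as well (for example, 91.4% for the MAE of AD-SEAE relative to SEAE).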

4.2. Case 2: Cz-SSC Growth Process

The historical database of the Cz-SSC furnace records a large amount of process data reflecting the state of crystal growth. In single-furnace production, the constant-diameter growth stage of Cz-SSC lasts a long time. Therefore, 20,000 samples were extracted from historical industrial growth logs at a sampling interval of 2 s, all from the same furnace. The first 85% of the data were used for model training and validation, and the last 15% were used to test the prediction performance of the model.

According to expert experience, the auxiliary variables that affect the V/G value were preliminarily determined. As listed in Table 3, they are crystal diameter, crystal pulling rate, main heater power, heating element temperature, crucible rise speed, liquid level temperature, crystal rotation speed, and crucible rotation speed. Pearson and Spearman coefficients were used to measure the correlation. The preprocessed auxiliary variables and target variable are shown in Figure 15.

To verify the effectiveness of M-AD-SEAE in the Cz-SSC process, it was compared with the SAE, SEAE, and AD-SEAE models. The network structure of M-AD-SEAE, determined by particle swarm optimization, was [1,3,8,17], where “17” is the input dimension, “1” is the output dimension, and [3,8] indicates two hidden layers. The initial learning rate was 0.05, the learning rate reduction factor was 0.9, and the activation function was ‘sigm’. To ensure fairness, the SAE, SEAE, and AD-SEAE networks used the same parameters as M-AD-SEAE.

To compare the prediction performance of each model intuitively, Figure 16 shows the V/G predictions of SAE, SEAE, AD-SEAE, and M-AD-SEAE. The SEAE model was more accurate than the SAE model because the SEAE network additionally reconstructs the original data during feature extraction, which reduces the cumulative loss of original information. AD-SEAE was in turn more accurate than SEAE, mainly because the self-attention module increases the model's focus on key information; its predictions tracked the actual values closely and matched the inflection points well. Overall, the V/G values predicted by the proposed M-AD-SEAE model agreed well with the actual curve in both value and trend, further illustrating the validity and accuracy of the proposed method. Note that the “actual value” in Figure 16 is not a V/G value measured directly by an online sensor but a V/G reference label obtained through offline reconstruction and calibration.

The prediction error curves of the different models are shown in Figure 17. The prediction errors of the SAE and SEAE models were significantly higher than those of AD-SEAE and M-AD-SEAE, a clear drawback for semiconductor SSC growth, which imposes strict crystal quality control requirements. Although both AD-SEAE and the proposed M-AD-SEAE showed a small abrupt error change at t = 2800–2830, their errors remained stable for the rest of the test period. Overall, the proposed M-AD-SEAE model achieved high prediction accuracy and can meet the monitoring requirements of the Cz-SSC growth process.

In order to compare the prediction performance of each model more clearly, this paper analysed the error histogram as shown in Figure 18 and the prediction performance index of each model as shown in Table 4.

Figure 18 shows that incorporating the attention module markedly improves AD-SEAE relative to SEAE. Moreover, by integrating the mechanistic sub-model, the proposed M-AD-SEAE further outperformed AD-SEAE. As reported in Table 4, introducing the original-information pathway already improved SEAE over the baseline SAE. With attention enabled, AD-SEAE reduced MAE, MAPE, and RMSE by 91.4%, 91.3%, and 71.0%, respectively, compared with SEAE. M-AD-SEAE delivered further reductions of 52.8%, 52.0%, and 53.2% in MAE, MAPE, and RMSE, respectively, over AD-SEAE, yielding the smallest prediction error and the highest accuracy. Overall, M-AD-SEAE provided the best predictive performance among the compared methods and is well suited for online estimation of the V/G quality indicator in Cz-SSC growth.

5. Conclusions

In this study, a hybrid physics–data modeling framework was developed to enable online estimation of the V/G ratio, a key quality-related indicator, in the Czochralski process. The proposed method integrated first-principles knowledge with advanced deep learning techniques to address the challenges posed by system nonlinearity, process variability, and limited online measurability. Specifically, the data-driven sub-model was constructed using an attention-based dynamic stacked enhanced autoencoder (AD-SEAE), which improved temporal feature extraction by incorporating historical attention and mitigating information loss during hierarchical encoding. Meanwhile, the mechanism sub-model captured domain-specific physical relationships to reinforce predictive interpretability. An adaptive fusion strategy was then introduced to balance the contributions of both sub-models according to their prediction accuracy, thereby enhancing robustness and generalization. Experimental evaluations based on real industrial data demonstrated that the proposed method significantly outperformed conventional soft sensors in terms of prediction accuracy and resilience under varying operating conditions. These results confirmed the effectiveness of the hybrid model as a reliable tool for real-time crystal quality monitoring. Future work should focus on improving the model's adaptability to batch-to-batch variations and integrating it with intelligent control frameworks for closed-loop optimization.

Author Contributions

Methodology, Investigation, Writing—original draft, Writing—Review & editing: D.Z.; Project administration, Conceptualization: J.R.; Formal analysis, Supervision, Conceptualization: X.D., Y.W.; Data curation: D.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Natural Science Foundation of China (Grant No. 62303376) and the Independent Research Project of Shaanxi Key Laboratory of Intelligent Policing (No. SXZJ25ZZ07, No. SXZJ25ZZ08).

Data Availability Statement

The datasets presented in this article are not readily available because they are derived from highly confidential industrial production processes and are subject to strict commercial confidentiality agreements. Requests for verification of the key conclusions of this work should be directed to the author, Duqiao Zhao, who can facilitate a supervised review of the key figures and statistical analyses presented in the paper.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Fisher, G.; Seacrist, M.R.; Standley, R.W. Silicon crystal growth and wafer technologies. Proc. IEEE 2012, 100, 1454–1474.
Duffar, T. Crystal Growth Processes Based on Capillarity: Czochralski, Floating Zone, Shaping and Crucible Techniques; Wiley: New York, NY, USA, 2010.
Wan, Y.; Liu, D.; Ren, J. Performance-driven semiconductor silicon crystal quality control. J. Process Control 2022, 120, 68–85.
Zulehner, W. Czochralski growth of silicon. J. Cryst. Growth 1983, 65, 189–213.
Ren, J.; Liu, D.; Wan, Y. Modeling and application of Czochralski silicon single crystal growth process using hybrid model of data-driven and mechanism-based methodologies. J. Process Control 2021, 104, 74–85.
Bukhari, H.Z.; Hovd, M.; Winkler, J. Inverse response behaviour in the bright ring radius measurement of the Czochralski process I: Investigation. J. Cryst. Growth 2021, 568–569, 126039.
Vanhellemont, J. The v/G criterion for defect-free silicon single crystal growth from a melt revisited: Implication for large diameter crystals. J. Cryst. Growth 2013, 381, 134–138.
Voronkov, V.V. Grown-in defects in silicon produced by agglomeration of vacancies and self-interstitials. J. Cryst. Growth 2008, 310, 1307–1314.
Voronkov, V.V. The mechanism of swirl defects formation in silicon. J. Cryst. Growth 1982, 59, 625–643.
Vanhellemont, J.; Kamiyama, E.; Nakamura, K.; Śpiewak, P.; Sueoka, K. Impacts of thermal stress and doping on intrinsic point defect properties and clustering during single crystal silicon and germanium growth from a melt. J. Cryst. Growth 2017, 474, 96–103.
Sabanskis, A.; Virbulis, J. Modelling of thermal field and point defect dynamics during silicon single crystal growth using CZ technique. J. Cryst. Growth 2019, 519, 7–13.
Friedrich, J.; Jung, T.; Trempa, M.; Reimann, C.; Denisov, A.; Muehe, A. Considerations on the limitations of the growth rate during pulling of silicon crystals by the Czochralski technique for PV applications. J. Cryst. Growth 2019, 524, 125168.
Teng, R.; Dai, X.; Xu, W. Numerical simulation of the influence of thermal shield optimization on the growth of large-diameter single-crystal silicon. J. Synth. Cryst. 2012, 41, 238–242.
Huang, W.; Liu, D. Modeling and multi-objective optimization of silicon single crystal growth process parameters. J. Artif. Cryst. 2017, 46, 2095–2101.
Guan, X.; Zhang, X. Simulation of V/G During Φ450 mm Czochralski grown silicon single crystal growth under the different crystal and crucible rotation rates. In MATEC Web of Conferences; EDP Sciences: London, UK, 2016; Volume 67, pp. 107–112.
Sun, Q.; Ge, Z. A Survey on Deep Learning for Data-Driven Soft Sensors. IEEE Trans. Ind. Inform. 2021, 17, 5853–5866.
Yuan, X.; Gu, Y.; Wang, Y.; Yang, C.; Gui, W. A deep supervised learning framework for data-driven soft sensor modeling of industrial processes. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4737–4746.
Yuan, X.; Ou, C.; Wang, Y.; Yang, C.; Gui, W. Deep quality-related feature extraction for soft sensing modeling: A deep learning approach with hybrid VW-SAE. Neurocomputing 2020, 396, 375–382.
Yuan, X.; Qi, S.; Shardt, Y.A.; Wang, Y.; Yang, C.; Gui, W. Soft sensor model for dynamic processes based on multichannel convolutional neural network. Chemom. Intell. Lab. Syst. 2020, 203, 104050.
Xu, B.; Pooi, C.K.; Tan, K.M.; Huang, S.; Shi, X.; Ng, H.Y. A novel long short-term memory artificial neural network (LSTM)-based soft-sensor to monitor and forecast wastewater treatment performance. J. Water Process Eng. 2023, 54, 104041.
Wang, G.; Jia, Q.-S.; Zhou, M.; Bi, J.; Qiao, J. Soft-sensing of Wastewater Treatment Process via Deep Belief Network with Event-triggered Learning. Neurocomputing 2021, 436, 103–113.
Gao, S.; Qiu, S.; Ma, Z.; Tian, R.; Liu, Y. SVAE-WGAN-Based Soft Sensor Data Supplement Method for Process Industry. IEEE Sens. J. 2022, 22, 601–610.
Wang, C.; Han, F.; Zhang, Y.; Lu, J. An SAE-based resampling SVM ensemble learning paradigm for pipeline leakage detection. Neurocomputing 2020, 403, 237–246.
Ren, J.; Liu, D.; Wan, Y. VMD-SEAE-TL-Based Data-Driven soft sensor modeling for a complex industrial batch processes. Measurement 2022, 198, 111439.
Liu, D.; Zhang, N.; Jiang, L.; Zhao, X.-G.; Duan, W.-F. Nonlinear generalized predictive control of the crystal diameter in CZ-Si crystal growth process based on stacked sparse autoencoder. IEEE Trans. Control. Syst. Technol. 2020, 28, 1132–1139.
Rahmanpour, P.; Sælid, S.; Hovd, M. Run-to-run control of the Czochralski process. Comput. Chem. Eng. 2017, 104, 353–365.
Zheng, Z.; Seto, T.; Kim, S.; Kano, M.; Fujiwara, T.; Mizuta, M.; Hasebe, S. A first-principle model of 300 mm Czochralski single-crystal Si production process for predicting crystal radius and crystal growth rate. J. Cryst. Growth 2018, 492, 105–113.
Kato, S.; Kim, S.; Kano, M.; Fujiwara, T.; Mizuta, M. Gray-box modeling of 300 mm diameter Czochralski single-crystal Si production process. J. Cryst. Growth 2021, 553, 125929.
Kumar, I.; Tripathi, B.K.; Singh, A. Attention-based LSTM network-assisted time series forecasting models for petroleum production. Eng. Appl. Artif. Intell. 2023, 123, 106440.
He, N.; Qian, C.; Shen, C.; Huangfu, Y. A fusion framework for lithium-ion batteries state of health estimation using compressed sensing and entropy weight method. ISA Trans. 2023, 135, 585–604.

Figure 1. Schematic diagram of Cz-SSC growth process.

Figure 2. Growth environment in Cz-SSC furnace.

Figure 3. Schematic diagram of the mechanism model of Cz-SSC growth process.

Figure 4. Schematic diagram of the attention mechanism.

Figure 5. Three-stage process of calculating Attention.

Figure 6. Autoencoder network structure.

Figure 7. Stacked enhanced auto-encoder structure.

Figure 8. Structure of the EAE.

Figure 9. AD-SEAE network structure diagram.

Figure 10. Schematic diagram of attention sample calculation.

Figure 11. Original data—simulated measured process variables with additive noise.

Figure 12. Model prediction output.

Figure 13. Model prediction error.

Figure 14. Comparison of predictive performance indicators of different models.

Figure 15. Auxiliary variable and target variable data.

Figure 16. The V/G index prediction output results of different models.

Figure 17. Prediction output errors of different models.

Figure 18. Comparison of predictive performance indicators of different models.

Table 1. Pseudo-code of the AD-SEAE modeling algorithm.

Input: training data $x_{train}$, target variable data $y$, test data $x_{test}$, moving window length $L$, number of network layers $Q$
Output: predicted value of the target variable $\hat{y}$

Training stage:
for $i = 1 \to N$ do
    1. $X_{w_{i}} = [x_{i - L}, x_{i - L + 1}, \dots, x_{i - 1}]$    // $X_{w_{i}}$ is the historical data in the training sample obtained after the $i$th window shift
    2. Calculate the attention sample $X_{attention_{i}}$ according to Figure 10
end for
for $i = 1 \to N$ do
    for $q = 1 \to Q$ do
        if $q = 1$ then
            $X_{input_{i}}^{q} = [X_{train_{i}}; X_{attention_{i}}]$
        else
            $X_{input_{i}}^{q} = h_{i}^{q - 1}$
        end if
        $h_{i}^{q} = Encoder_{q}(X_{input_{i}}^{q})$    // $Encoder_{q}$ denotes the encoding layer of the $q$th EAE of the AD-SEAE network
        Update the network parameters of the $q$th EAE
    end for
    Predict the training target: ${\hat{y}}_{i} = f(h_{i}^{Q})$
    Update the parameters of the entire network
end for
Testing phase:
    $\hat{y} = \text{AD-SEAE}_{trained}(x_{test})$
return $\hat{y}$

Table 2. Model predictive performance index.

Model	MAE	MAPE (%)	RMSE
SAE	6.1128	14.48	7.0356
SEAE	4.9905	11.82	5.8470
AD-SEAE	3.9396	9.33	5.0335
M-AD-SEAE	2.2764	5.39	3.0871

Table 3. Auxiliary Variables.

Variable	Unit	Correlation	Variable	Unit	Correlation
Crystal pulling rate	mm/min	0.9832	Crucible rise speed	mm/min	0.7784
Crystal diameter	mm	0.9844	Liquid level temperature	K	0.6712
Main heater power	kW	0.7830	Crystal rotation speed	rad/min	0.5985
Heating element temperature	K	0.6780	Crucible rotation speed	rad/min	0.4873

Table 4. Model predictive performance index.

Model	MAE	MAPE (%)	RMSE
SAE	0.0138	0.0310	0.0174
SEAE	0.0128	0.0286	0.0162
AD-SEAE	0.0011	0.0025	0.0047
M-AD-SEAE	5.1903 × 10⁻⁴	0.0012	0.0022

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhao, D.; Ren, J.; Du, X.; Wang, Y.; Ding, D. Hybrid Mechanism–Data-Driven Modeling for Crystal Quality Prediction in Czochralski Process. Crystals 2026, 16, 86. https://doi.org/10.3390/cryst16020086
