1. Introduction
As the cornerstone of the information industry, integrated circuits directly determine a country’s technological competitiveness and industrial security [1]. Owing to its excellent semiconductor properties, single-crystal silicon is the preferred substrate material for large-scale integrated circuits and supports more than 90% of the world’s semiconductor device production [2]. As semiconductor technology evolves, the feature size of integrated circuits keeps shrinking, which places increasingly stringent requirements on the crystal integrity, micro-defect density, and impurity distribution uniformity of single-crystal silicon [3]. Even a tiny crystal defect may degrade chip performance or cause failure. Achieving stable growth of high-quality single-crystal silicon has therefore become a core technical challenge upstream in the semiconductor industry [4,5]. Among the available preparation technologies, the Czochralski (CZ) method has become the mainstream for industrial production of large-diameter silicon single crystals: Dash necking and the free solid–liquid interface give a high success rate for single-crystal growth, the crystal is pulled directly from the melt without the difficulties of crucible removal, and the process is more economical than the floating-zone method [6,7]. The crystal defect dynamics theory proposed by Voronkov et al. shows that the ratio of the crystal pulling speed V to the axial temperature gradient G at the solid–liquid interface (V/G) is the key parameter determining the type, concentration, and distribution of intrinsic point defects in single-crystal silicon [8]. Only by keeping V/G within an extremely narrow window around its critical value can a “perfect crystal” that meets the requirements of advanced processes be grown [9]. However, because the CZ furnace operates sealed and at high temperature, and the solid–liquid interface is buried inside the silicon melt, no sensor can currently measure the temperature gradient G directly and continuously. Online acquisition of the V/G value has therefore become a key bottleneck restricting fine control of the silicon single-crystal growth process [10]. At present, the industry generally relies on a trial-and-error cycle of “offline testing–process adjustment”, which has a long development cycle and high production cost and cannot support real-time closed-loop control, making consistent product quality difficult to guarantee.
To address the bottleneck of online V/G detection, researchers worldwide have conducted extensive work, and the methods have evolved from early traditional approaches to modern data-driven ones [11]. Early research relied mainly on two types of methods: mechanism-driven modeling and simple data fitting. Mechanism-driven methods are built on numerical simulation and calculate V/G values from multi-physics coupling models. Their physical meaning is clear, but their long computation time prevents online application [12]. Simple data-fitting methods, such as principal component regression (PCR) and partial least squares (PLS), are shallow models. Although they are easy to implement, they struggle to capture the strong nonlinearity of the growth process, offer limited prediction accuracy, and cannot adapt to changes in working conditions [13]. With the development of industrial sensors and artificial intelligence, data-driven soft measurement has become mainstream. Starting from shallow artificial neural networks (ANNs), it has progressed to time-series models such as long short-term memory (LSTM) and gated recurrent unit (GRU) networks, convolutional neural network–long short-term memory (CNN-LSTM) hybrid models, and deep generative models such as variational autoencoders (VAEs), which have significantly improved prediction accuracy and noise resistance under specific working conditions [14]. However, these models assume a stable data distribution and therefore cannot handle the distribution shift that arises under multiple working conditions. Their generalization ability is insufficient to meet the long-term stable operation requirements of industrial sites [15].
To address the domain shift problem in multi-condition modeling, transfer learning has been widely introduced into industrial soft measurement. The core idea is to learn cross-domain invariant features and use the labeled data in the source domain to assist modeling in a target domain with no or few labels, thereby improving the model’s cross-condition generalization ability. Early transfer learning methods fall mainly into two categories: statistical discrepancy alignment and adversarial domain adaptation. Maximum mean discrepancy (MMD) is a commonly used statistical alignment criterion that achieves domain adaptation by minimizing the distribution difference between source- and target-domain data. A domain-adversarial neural network (DANN) forces the feature extractor to learn domain-invariant features through adversarial training and has achieved some success in multi-condition soft measurement. However, such methods do not fully consider the complex topological relationships between variables in industrial processes, which results in poor interpretability of the extracted features [16,17].
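As a rough illustration of the statistical alignment idea, the following sketch computes a (biased) squared MMD between two small sets of feature vectors. The RBF kernel, the bandwidth parameter `gamma`, and the function names are assumptions chosen for illustration; the works cited above may use other kernels or estimators.

```python
import math


def rbf(u, v, gamma):
    """RBF kernel between two equal-length vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two lists of feature vectors."""

    def mean_kernel(p, q):
        # Average kernel value over all pairs drawn from p and q.
        return sum(rbf(u, v, gamma) for u in p for v in q) / (len(p) * len(q))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


if __name__ == "__main__":
    src = [[0.0], [1.0]]
    tgt = [[5.0], [6.0]]
    print(mmd_squared(src, src))  # identical sets: no discrepancy
    print(mmd_squared(src, tgt))  # shifted sets: clearly positive
```

In a domain adaptation setting, a quantity of this kind would be added to the training loss so that the feature extractor is pushed to make source and target features statistically indistinguishable.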
The rise of graph neural networks (GNNs) has provided an effective tool for characterizing the complex relationships between variables in industrial processes. GNNs can explicitly model the physical and causal relationships between measurable variables. Compared with traditional fully connected neural networks, they offer stronger interpretability and generalization ability and have been successfully applied to modeling a variety of complex industrial processes [18]. Recently, Ren and Zhao [19] proposed the Invariant-Specific Bidirectional Graph Neural Network (IS-BiGNN), which combines graph neural networks with invariant feature learning from transfer learning. It constructs two parallel GNN branches that learn, respectively, the cross-domain invariant relationships and the domain-specific relationships between variables. It achieves cross-condition transfer through feature alignment and outperforms traditional transfer learning methods and single data-driven models in multiple industrial soft measurement tasks, providing a new idea for multi-condition V/G soft measurement. However, the original IS-BiGNN still has clear limitations when dealing with complex dynamic industrial processes such as silicon single-crystal growth and cannot fully meet the practical needs of V/G prediction. These limitations are the starting point of this paper.
Specifically, the limitations of the original IS-BiGNN model lie in three aspects. First, it assumes a static graph structure, that is, the correlations between process variables remain unchanged throughout the entire growth cycle. This cannot characterize how variable correlations evolve across growth stages (seeding, shouldering, and constant-diameter growth), even though the thermal field differs significantly from stage to stage and the correlations are clearly time-varying. Second, it assigns the same weight to all source-domain samples and ignores how strongly each one relates to the target domain. When the operating conditions of some source-domain samples differ significantly from those of the target domain, negative transfer is easily introduced, which reduces prediction accuracy. Third, it does not fully exploit the large amount of unlabeled data in the target domain. Training relies mainly on labeled source-domain data, and unlabeled target-domain data are used only for simple feature alignment, so their potential to improve generalization is left largely untapped.
To address these issues, this paper proposes a multi-condition V/G soft measurement method based on an improved IS-BiGNN. The method models the silicon single-crystal growth process under different production conditions as multiple related domains with distribution shifts. Three core improvements are introduced on top of IS-BiGNN: a dynamic sample graph construction mechanism, which adaptively learns the dynamic correlations between variables through sample-level attention and stage-aware nodes; a source-domain credibility evaluation mechanism, which dynamically allocates sample weights based on inter-domain distribution differences and prediction uncertainties; and a semi-supervised consistency training framework, which exploits unlabeled target-domain data by combining conditional distribution alignment with regression consistency constraints. Experiments on industrial data from a 12-inch silicon single-crystal production line show that the method accurately predicts V/G values under batch, thermal field, and process parameter variations, with significantly better overall performance than mainstream soft measurement methods. The method thus provides a new technical approach for online detection of key parameters in silicon single-crystal growth and a reference for multi-condition modeling in other complex industrial processes.
The subsequent content of this paper is arranged as follows:
Section 2 introduces the relevant theoretical foundations of graph neural networks and IS-BiGNN;
Section 3 elaborates on the proposed Dynamic Weighted Conditional Invariant-Specific BiGNN (DWC-ISBiGNN) model;
Section 4 verifies the effectiveness of the proposed method through experiments and analyzes the results; and
Section 5 summarizes the work of this paper and looks forward to future research directions.
3. V/G Value Prediction Method Based on Multi-Condition Invariant Feature Extraction
Although the IS-BiGNN model provides a sound framework for soft measurement of multi-condition industrial processes and has performed well in several industrial scenarios, its inherent limitations are amplified when it is applied directly to V/G prediction in the Czochralski process, because silicon single-crystal growth is complex, dynamic, and multi-stage. As a result, it cannot fully meet the practical needs of V/G prediction, specifically in the following three aspects:
(1) The original IS-BiGNN uses a static graph structure, which cannot capture the dynamic drift and multi-stage differences in the variable relationships during silicon single-crystal growth. It may misjudge stage changes within the same domain as inter-domain differences, hindering invariant feature extraction.
(2) The model treats all source domains equally during training, failing to distinguish the similarity between each source domain and the target domain. This leads to negative transfer from low-similarity source domains, weakening the cross-condition generalization ability.
(3) The model only performs coarse-grained alignment at the overall graph level, ignoring the fine-grained correspondence between samples in the same stage, and relies solely on labeled data, failing to utilize the operating condition information contained in the large number of unlabeled samples in the target domain.
To address the limitations of the IS-BiGNN model in static graph modeling, equal treatment of the source domain, and global coarse-grained alignment, this paper proposes an adaptive multi-operating-condition invariant feature extraction method, named DWC-ISBiGNN (Dynamic Weighted Conditional IS-BiGNN). This method achieves accurate prediction of V/G values across operating conditions by organically integrating three mechanisms: dynamic sample graph construction, source-domain confidence weighting, and stage condition alignment–regression consistency constraints.
3.1. Dynamic Graph Construction and Feature Extraction
The model dynamically generates the graph topology in each training batch. A sample-level attention mechanism computes the correlation strength between any two process variables $i$ and $j$ in the current batch:
$$A_{ij} = \sigma\left(\mathbf{a}^{\top}\left[\mathbf{x}_i \oplus \mathbf{x}_j\right]\right)$$
Here, $\mathbf{x}_l$ is a feature vector composed of the values of the $l$-th process variable of all samples in the current batch, ⊕ represents the vector concatenation operation, $\mathbf{a}$ is a learnable attention parameter, and $\sigma$ is the sigmoid activation function, which normalizes the correlation strength to the [0,1] interval.
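A minimal sketch of this sample-level attention is given below, assuming that the attention parameter is a single vector whose length is twice the batch size, so that it can be applied to the concatenation of two per-variable batch vectors. The function names and this shape convention are illustrative assumptions, not details confirmed by the text.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dynamic_adjacency(batch, a):
    """Build a variable-by-variable correlation matrix for one batch.

    batch: list of samples, each a list of n_vars values.
    a: attention parameters of length 2 * len(batch) (assumed shape).
    Returns an n_vars x n_vars matrix with entries in (0, 1).
    """
    n_samples = len(batch)
    n_vars = len(batch[0])
    if len(a) != 2 * n_samples:
        raise ValueError("a must have length 2 * batch size")
    # Column l of the batch is the feature vector of the l-th variable.
    cols = [[sample[l] for sample in batch] for l in range(n_vars)]
    adjacency = []
    for i in range(n_vars):
        row = []
        for j in range(n_vars):
            concat = cols[i] + cols[j]  # vector concatenation
            score = sum(w * v for w, v in zip(a, concat))
            row.append(sigmoid(score))
        adjacency.append(row)
    return adjacency


if __name__ == "__main__":
    toy_batch = [[0.0, 1.0, -1.0], [0.0, 0.0, 0.0]]  # 2 samples, 3 variables
    for row in dynamic_adjacency(toy_batch, [1.0, 0.0, 0.0, 0.0]):
        print([round(v, 3) for v in row])
```

Because the two halves of the attention vector weight the two variables differently, the resulting matrix is in general not symmetric, so the learned graph is directed.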
To perceive the growth stage, a learnable global “stage” node is added and connected to all physical variable nodes. Its state update follows the GCN message-passing rule:
$$\mathbf{h}_s^{(l+1)} = \phi\left(\mathbf{A}_{sv}\,\mathbf{H}_v^{(l)}\,\mathbf{W}^{(l)}\right)$$
where $\mathbf{h}_s$ represents the stage node features, $\mathbf{H}_v$ represents the physical variable node feature matrix, $\mathbf{A}_{sv}$ represents the adjacency matrix between the stage node and the physical variable nodes, $\mathbf{W}$ represents the learnable weights, and $\phi(\cdot)$ denotes a nonlinear activation. After several layers of propagation, the hidden state of the stage node encodes the production stage information of the current batch, so that each physical variable node can perceive the macroscopic stage it belongs to. This remedies the insensitivity of static graphs to stage information.
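The following sketch shows one propagation step for such a stage node under simple assumptions: the stage node aggregates the physical variable node features with its edge weights, applies a linear map, and passes the result through a ReLU. The choice of ReLU and the function name are assumptions made for illustration only.

```python
def stage_node_update(a_sv, h_v, w):
    """One GCN-style update of the global stage node.

    a_sv: length-N edge weights between the stage node and the variable nodes.
    h_v: N x d feature matrix of the physical variable nodes.
    w: d x d_out learnable weight matrix.
    Returns the updated stage-node feature vector of length d_out.
    """
    n, d = len(h_v), len(h_v[0])
    d_out = len(w[0])
    # Aggregate: edge-weighted sum of the variable-node features.
    agg = [sum(a_sv[i] * h_v[i][k] for i in range(n)) for k in range(d)]
    # Transform: linear map followed by ReLU (assumed activation).
    return [
        max(0.0, sum(agg[k] * w[k][m] for k in range(d)))
        for m in range(d_out)
    ]


if __name__ == "__main__":
    edges = [0.5, 0.5]
    nodes = [[1.0, 2.0], [3.0, 4.0]]
    weights = [[1.0, -1.0], [0.0, -1.0]]
    print(stage_node_update(edges, nodes, weights))
```

Stacking several such steps, with the stage-node state fed back to the variable nodes in between, is what lets the variable nodes become aware of the current growth stage.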
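Based on the dynamic graph structure, a parameter-sharing graph convolutional network extracts invariant features $\mathbf{Z}_{\mathrm{inv}}$, and a domain-private graph convolutional network extracts specific features $\mathbf{Z}_{\mathrm{spe}}^{(k)}$:
$$\mathbf{Z}_{\mathrm{inv}} = \mathrm{GCN}_{\mathrm{shared}}\left(\mathbf{H}_v, \mathbf{A}_{\mathrm{inv}}\right), \qquad \mathbf{Z}_{\mathrm{spe}}^{(k)} = \mathrm{GCN}_{k}\left(\mathbf{H}_v, \mathbf{A}_{\mathrm{spe}}^{(k)}\right)$$
where $\mathbf{H}_v$ is the physical variable node feature matrix, $\mathbf{A}_{\mathrm{inv}}$ and $\mathbf{A}_{\mathrm{spe}}^{(k)}$ are the dynamic invariant and specific graphs, $\mathbf{Z}_{\mathrm{inv}}$ represents the cross-domain shared invariant feature, and $\mathbf{Z}_{\mathrm{spe}}^{(k)}$ represents the $k$-th domain-specific feature. The stage node information $\mathbf{h}_s$ is concatenated to the physical node features or used as a conditional input, so that invariant feature extraction is aware of stage differences and learns a more accurate cross-domain common representation at each stage.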
3.2. Adaptive Weighting of Source-Domain Credibility
Based on the invariant graph, which best reflects the essential cross-domain relationships, the Frobenius norm is used to quantify the graph structure difference between the $i$-th source domain and the target domain:
$$d_i = \left\| \mathbf{A}_{\mathrm{inv}}^{S_i} - \mathbf{A}_{\mathrm{inv}}^{T} \right\|_F$$
The smaller the difference, the closer the variable association patterns of the source and target domains. The differences are converted into normalized confidence weights using a Softmax function with a temperature coefficient $\tau$:
$$w_i = \frac{\exp\left(-d_i/\tau\right)}{\sum_{j}\exp\left(-d_j/\tau\right)}$$
The temperature coefficient controls the steepness of the weight distribution: the smaller $\tau$ is, the more the weights of source domains with large differences approach 0 while the weight mass concentrates on the source domains with small differences; the larger $\tau$ is, the smoother the weight distribution. $\tau$ is determined through cross-validation.
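The weighting step can be sketched as follows, assuming the weights are a Softmax over the negative Frobenius distances divided by the temperature. The function names are illustrative, and subtracting the largest logit is only a standard numerical safeguard.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two equally sized matrices."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def credibility_weights(source_graphs, target_graph, tau):
    """Softmax over negative graph distances with temperature tau."""
    dists = [frobenius_distance(g, target_graph) for g in source_graphs]
    logits = [-d / tau for d in dists]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    target = [[0.0, 1.0], [1.0, 0.0]]
    sources = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    for tau in (0.01, 1.0, 100.0):
        print(tau, [round(w, 4) for w in credibility_weights(sources, target, tau)])
```

Running the example with a small, a moderate, and a large temperature shows the behavior described above: a small temperature places almost all weight on the closest source domain, whereas a large temperature yields nearly uniform weights.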
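The credibility weight $w_i$ is applied to the source-domain terms in the topological similarity loss and the task loss, respectively, so that high-quality source domains are enhanced and irrelevant source domains are suppressed:
$$\mathcal{L}_{\mathrm{topo}} = \sum_{i} w_i\,\mathcal{L}_{\mathrm{topo}}^{S_i}, \qquad \mathcal{L}_{\mathrm{task}} = \sum_{i} w_i\,\mathcal{L}_{\mathrm{task}}^{S_i} + \mathcal{L}_{\mathrm{task}}^{T}$$
where $\mathcal{L}_{\mathrm{topo}}^{S_i}$ and $\mathcal{L}_{\mathrm{task}}^{S_i}$ are the topological similarity loss and task loss of the $i$-th source domain, and $\mathcal{L}_{\mathrm{task}}^{T}$ is the task loss on the labeled target-domain samples.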
3.3. Conditional Alignment and Regression Consistency Constraints
A stage-specific conditional alignment loss is defined, which forces samples from different domains in the same stage to align in the invariant feature space. The mean vector $\boldsymbol{\mu}_{d,g}$ of the invariant features $\mathbf{z}_{\mathrm{inv}}$ of all samples in the $g$-th stage of the $d$-th domain is computed, and the squared distance between the means of the same stage in different domains is minimized:
$$\mathcal{L}_{\mathrm{cond}} = \sum_{g}\sum_{d < d'} \left\| \boldsymbol{\mu}_{d,g} - \boldsymbol{\mu}_{d',g} \right\|_2^2$$
where $\boldsymbol{\mu}_{d,g}$ represents the mean of the invariant features in the $d$-th domain and the $g$-th stage.
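A minimal sketch of the stage-conditional alignment is shown below. It groups invariant features by (domain, stage), computes the group means, and sums the squared distances between the means of the same stage over all pairs of domains. Summing over unordered domain pairs without further normalization is an assumption, as are the function and argument names.

```python
from collections import defaultdict
from itertools import combinations


def stage_conditional_alignment(features, domains, stages):
    """Sum of squared distances between same-stage feature means across domains.

    features: list of invariant feature vectors (one per sample).
    domains, stages: per-sample domain labels and stage labels.
    """
    groups = defaultdict(list)
    for feat, dom, stage in zip(features, domains, stages):
        groups[(dom, stage)].append(feat)

    # Mean invariant feature for every (domain, stage) group.
    means = {}
    for key, vecs in groups.items():
        dim = len(vecs[0])
        means[key] = [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]

    loss = 0.0
    for stage in sorted({g for _, g in means}):
        doms = sorted(d for d, g in means if g == stage)
        # Only domains that actually contain this stage are compared.
        for d1, d2 in combinations(doms, 2):
            m1, m2 = means[(d1, stage)], means[(d2, stage)]
            loss += sum((p - q) ** 2 for p, q in zip(m1, m2))
    return loss


if __name__ == "__main__":
    feats = [[0.0, 0.0], [2.0, 0.0], [1.0, 2.0], [5.0, 5.0]]
    doms = ["A", "A", "B", "A"]
    stgs = [0, 0, 0, 1]
    print(stage_conditional_alignment(feats, doms, stgs))
```

In the toy example, stage 1 appears in only one domain and therefore contributes nothing, which reflects the intent of not aligning samples from stages that have no counterpart in the other domain.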
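A teacher–student semi-supervised framework is introduced. The student model is updated via gradient descent, while the teacher model parameters are updated from the student model using an exponential moving average (EMA):
$$\theta_{\mathrm{tea}} \leftarrow \alpha\,\theta_{\mathrm{tea}} + (1-\alpha)\,\theta_{\mathrm{stu}}$$
where $\theta_{\mathrm{stu}}$ and $\theta_{\mathrm{tea}}$ are the student and teacher parameters and $\alpha$ is the attenuation (decay) coefficient. Feeding the unlabeled target-domain samples into the student and teacher models yields the predicted values $\hat{y}^{\mathrm{stu}}$, $\hat{y}^{\mathrm{tea}}$ and the intermediate-layer features $\mathbf{f}^{\mathrm{stu}}$, $\mathbf{f}^{\mathrm{tea}}$, respectively. The regression consistency loss is defined as
$$\mathcal{L}_{\mathrm{cons}} = \left\| \hat{y}^{\mathrm{stu}} - \hat{y}^{\mathrm{tea}} \right\|_2^2 + \beta \left\| \mathbf{f}^{\mathrm{stu}} - \mathbf{f}^{\mathrm{tea}} \right\|_2^2$$
where $\beta$ is the feature consistency coefficient. The weight $\lambda(t)$ of $\mathcal{L}_{\mathrm{cons}}$ follows an exponential warm-up strategy that raises it to its maximum over the warm-up period, where $t$ is the number of training steps, $T$ is the total number of warm-up steps (set to 80), and $\lambda_{\max}$ is the maximum weight.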
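The three ingredients of this semi-supervised scheme can be sketched as follows. The EMA update follows the usual teacher–student convention. The warm-up curve uses the exponential ramp that is common in mean-teacher style training, and the consistency loss uses batch-averaged squared errors; both are assumptions, since the exact forms are not recoverable from the text. All function names are illustrative.

```python
import math


def ema_update(teacher, student, alpha):
    """Move each teacher parameter toward the student with decay alpha."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def warmup_weight(step, warmup_steps, max_weight):
    """Exponential ramp-up of the consistency weight (assumed functional form)."""
    if step >= warmup_steps:
        return max_weight
    phase = 1.0 - step / warmup_steps
    return max_weight * math.exp(-5.0 * phase * phase)


def consistency_loss(y_stu, y_tea, f_stu, f_tea, beta):
    """Squared error between student and teacher outputs plus a feature term."""
    n = len(y_stu)
    pred_term = sum((a - b) ** 2 for a, b in zip(y_stu, y_tea)) / n
    feat_term = sum(
        (a - b) ** 2 for fs, ft in zip(f_stu, f_tea) for a, b in zip(fs, ft)
    ) / n
    return pred_term + beta * feat_term


if __name__ == "__main__":
    print(ema_update([1.0, 2.0], [0.0, 0.0], alpha=0.9))
    print([round(warmup_weight(t, 80, 1.0), 4) for t in (0, 20, 40, 80)])
    print(consistency_loss([1.0, 2.0], [1.0, 4.0],
                           [[0.0, 0.0], [1.0, 1.0]],
                           [[0.0, 0.0], [1.0, 3.0]], beta=0.5))
```

Keeping the consistency weight small in the first steps prevents the still unreliable teacher predictions from dominating training before the student has learned from the labeled data.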
3.4. Overall Model Framework and Training Objectives
The three improvement strategies mentioned above are organically integrated to construct the DWC-ISBiGNN model based on multi-condition invariant feature extraction, as shown in
Figure 3. First, a dynamic graph structure adapted to the current operating condition is generated through a dynamic sample graph construction strategy, enabling accurate capture of dynamic changes in process variables and growth stage information. Based on the constructed dynamic graph structure, feature extraction is performed, obtaining cross-domain invariant features and domain-specific features. The similarity between each source domain and the target domain is calculated through a source-domain credibility evaluation module, and source-domain weights are dynamically allocated accordingly, strengthening the guiding role of high-quality source domains and suppressing interference from irrelevant source domains. Fine-grained stage condition alignment constraints achieve accurate alignment of sample features from different domains in the same growth stage, and regression consistency constraints are used to mine the potential value of unlabeled data in the target domain, further improving the extraction quality of invariant features. Finally, based on the output results of each step, an end-to-end joint training method is used to optimize the model parameters overall, ensuring that the model has excellent cross-condition generalization ability and V/G value prediction accuracy. To achieve optimized training and accurate prediction of the model, and considering the core objectives of each improvement module, the final training objective of the model is defined as a weighted sum of the loss terms, as follows:
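$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{task}} + \lambda_{1}\,\mathcal{L}_{\mathrm{topo}} + \lambda_{2}\,\mathcal{L}_{\mathrm{cond}} + \lambda(t)\,\mathcal{L}_{\mathrm{cons}}$$
where $\mathcal{L}_{\mathrm{task}}$, $\mathcal{L}_{\mathrm{topo}}$, $\mathcal{L}_{\mathrm{cond}}$, and $\mathcal{L}_{\mathrm{cons}}$ are the task, topological similarity, stage-conditional alignment, and regression consistency losses, respectively, and $\lambda_{1}$, $\lambda_{2}$, and $\lambda(t)$ are their weights. All weights are determined through cross-validation to maximize the predictive performance of the model.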
3.5. Methodological Differences from IS-BiGNN
DWC-ISBiGNN is developed from the IS-BiGNN framework, but the modifications are designed for the specific non-stationary and multi-stage characteristics of Czochralski silicon growth rather than for simple parameter tuning. The original IS-BiGNN separates invariant and domain-specific graph representations, which is useful for transfer learning. However, when it is directly applied to V/G prediction in the CZ process, three practical limitations become evident.
The first limitation is the static treatment of graph structure. In the original IS-BiGNN, the graph learned for a domain is assumed to be representative of the variable relationships in that domain. During constant-diameter CZ growth, however, the relationships among pulling speed, heater power, crucible motion, melt-level-related variables, and crystal geometry are not fixed. They evolve with crystal length and thermal history. Therefore, DWC-ISBiGNN introduces dynamic graph construction and stage-aware global nodes, so that the graph topology can adapt to the current operating state instead of remaining fixed throughout the growth process.
The second limitation is the equal treatment of source domains. In length-dependent CZ growth data, different source domains do not have the same relevance to the target domain. Some earlier length ranges may share similar thermal and growth characteristics with the target region, whereas others may differ substantially. If all source domains are forced to contribute equally, less relevant domains may introduce negative transfer. For this reason, DWC-ISBiGNN introduces a source-domain credibility weighting mechanism based on invariant graph–structure distance, allowing more relevant source domains to contribute more strongly to transfer learning.
The third limitation is the insufficient use of unlabeled target-domain trajectories. In industrial production, reliable V/G labels are difficult to obtain, whereas unlabeled process trajectories from the target domain are relatively abundant. DWC-ISBiGNN therefore combines stage-conditional alignment with teacher–student regression consistency. The former avoids aligning samples from incomparable growth stages, and the latter uses unlabeled target samples to regularize the regression behavior. In this way, the proposed model modifies the graph construction, source-transfer strategy, and target-domain adaptation mechanism of IS-BiGNN to better match the physical and data characteristics of CZ silicon growth.
3.6. Algorithm Overview
The overall process of the DWC-ISBiGNN algorithm is as follows. First, for each training batch, an invariant graph $\mathbf{A}_{\mathrm{inv}}$ and a specific graph $\mathbf{A}_{\mathrm{spe}}$ are dynamically generated through sample-level attention, and the growth stage information $\mathbf{h}_s$ is encoded using stage-aware global nodes. Then, based on the dynamic graph structure, a shared GNN extracts invariant features $\mathbf{Z}_{\mathrm{inv}}$, and a domain-private GNN extracts specific features $\mathbf{Z}_{\mathrm{spe}}$. Next, the Frobenius distance between the invariant graphs of each source domain and the target domain is calculated, the source-domain confidence weight $w_i$ is obtained through Softmax with a temperature coefficient, and a weighted topological similarity loss $\mathcal{L}_{\mathrm{topo}}$ and task loss $\mathcal{L}_{\mathrm{task}}$ are applied. Simultaneously, a stage-conditional alignment loss $\mathcal{L}_{\mathrm{cond}}$ is introduced to enforce a consistent cross-domain feature distribution within the same stage, and a teacher–student semi-supervised framework is constructed that applies a regression consistency loss $\mathcal{L}_{\mathrm{cons}}$ to the unlabeled target-domain data (using EMA updates and a warm-up strategy). Finally, the total loss $\mathcal{L}_{\mathrm{total}}$, the weighted sum of these terms, is jointly optimized, and the model is trained end to end. The specific implementation is shown in Algorithm 1.
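Algorithm 1 Training Process of DWC-ISBiGNN
Require: Labeled source domain sets $\{\mathcal{D}_{S_i}\}$, labeled target domain set $\mathcal{D}_{T}^{l}$, unlabeled target domain set $\mathcal{D}_{T}^{u}$, hyperparameters
Ensure: Trained DWC-ISBiGNN model
1: Initialize student model $\theta_{\mathrm{stu}}$, teacher model $\theta_{\mathrm{tea}}$, AdamW optimizer, step $t = 0$
2: for epoch = 1 to $E$ do
3:   Mix all data and split into batches of size $B$
4:   for each batch in the batch set do
5:     $t \leftarrow t + 1$
6:     For each domain, generate the dynamic invariant graph $\mathbf{A}_{\mathrm{inv}}$ and specific graph $\mathbf{A}_{\mathrm{spe}}$; inject stage-aware nodes into the graph structure
7:     Shared GNN extracts invariant features $\mathbf{Z}_{\mathrm{inv}}$; private GNN extracts specific features $\mathbf{Z}_{\mathrm{spe}}$
8:     Compute invariant graph distance $d_i$ and adaptive weight $w_i$
9:     Teacher and student models predict the unlabeled samples; compute $\mathcal{L}_{\mathrm{cons}}$
10:     Dynamic weight: compute the warm-up weight $\lambda(t)$
11:     Compute the total loss $\mathcal{L}_{\mathrm{total}}$ as the weighted sum of the loss terms
12:     Backward propagation to update $\theta_{\mathrm{stu}}$
13:     EMA update: $\theta_{\mathrm{tea}} \leftarrow \alpha\,\theta_{\mathrm{tea}} + (1-\alpha)\,\theta_{\mathrm{stu}}$
14:   end for
15:   Evaluate on the validation set; apply early stopping if there is no improvement for 30 epochs
16: end for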
4. Experimental Verification and Result Analysis
To verify the effectiveness of the proposed DWC-ISBiGNN model in predicting the V/G value of single-crystal silicon, this section presents a set of progressive experiments. First, the composition of the experimental data, the preprocessing procedures, the evaluation metrics, and the experimental configuration are described. Then, an overall performance comparison, module ablation, and mechanism analysis are conducted to examine the method in terms of prediction accuracy, generalization ability, and mechanism of action.
4.1. Data Description and Experimental Setup
4.1.1. Data Source and Composition
The experimental data come from actual single-crystal silicon production records of a domestic producer and cover multiple production batches. Data from the constant-diameter growth stage of complete production batches were selected. Based on the thermal field configuration and differences in raw material batches, the data are divided into four source domains (denoted as Source A, Source B, Source C, and Source D) and one target domain (denoted as Target 1). The statistical information of the samples in each domain is shown in Table 1.
The proportion of labeled samples in the target domain is low, accounting for only 15% of the total target-domain samples. This matches actual silicon single-crystal production, where V/G values are difficult to measure online and labeled data are scarce.
The temperature gradient G at the solid–liquid interface was obtained through numerical simulations of the thermal field during the Czochralski growth process. The simulations were performed using the commercial software package CGSim (STR Group). The simulation model is designed to be highly consistent with the actual physical system, incorporating key factors such as heat transfer via conduction, radiation, and convection within the crucible, silicon melt, and growing crystal. The thermal-field simulation was calibrated against experimental temperature measurements from the production furnace to ensure its accuracy. The temperature gradient G was extracted from the calibrated simulated thermal field at the solid–liquid interface and then correlated with the experimentally recorded pulling speed V to obtain the V/G values used in this study.
To further assess the reliability of the simulation-derived V/G labels, the simulated V/G trends were cross-validated against post-growth defect inspection data. Specifically, the defect tendency inferred from the simulated V/G values was compared with the experimentally observed defect patterns at the tail regions of sampled ingots. The comparison showed agreement for approximately 90% of the inspected ingots, indicating that the calibrated CGSim model can reasonably capture the defect-sensitive variation in V/G. Considering the residual deviation in thermal-field calibration and the measurement noise in the recorded process variables, the overall uncertainty of the generated V/G labels was estimated to be within approximately .
A total of 14 input variables were selected: crystal diameter, crystal rise rate, main heating power, auxiliary heating power, crystal rotation speed, crucible rotation speed, crucible rise rate, heating element temperature, liquid surface temperature feedback, crystal weight, crystal length, meniscus bottom temperature, meniscus height, and growth rate. V/G value labels were jointly calibrated based on offline numerical simulation results and defect detection data from the tail ingots of some batches. Each sequence consists of a time window of length $w = 6$ over the 14 variables, so the input dimension is $6 \times 14$.
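The windowing step can be sketched as follows. Pairing each window with the label at its last time step and using a stride of one are assumptions made for illustration; the text only specifies the window length and the number of input variables.

```python
def make_windows(series, labels, w=6):
    """Cut a multivariate time series into overlapping windows of length w.

    series: list of T rows, each holding the process variables at one time step.
    labels: list of T target values (for example V/G) aligned with the rows.
    Returns (windows, targets); each window is paired with the label at its
    last time step (assumed convention).
    """
    windows, targets = [], []
    for end in range(w, len(series) + 1):
        windows.append(series[end - w:end])
        targets.append(labels[end - 1])
    return windows, targets


if __name__ == "__main__":
    toy_series = [[float(t)] * 14 for t in range(8)]  # 8 time steps, 14 variables
    toy_labels = [float(t) for t in range(8)]
    xs, ys = make_windows(toy_series, toy_labels, w=6)
    print(len(xs), len(xs[0]), len(xs[0][0]), ys)
```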
To quantify the differences among the length-defined domains, Table 2 summarizes the mean and standard deviation of key process variables for each domain. As shown, the target domain (755–932 mm) exhibits a higher average main heater power, a lower average growth speed, and a lower mean V/G value than most source domains. These differences confirm that the length-defined domains correspond to distinct operating distributions within the constant-diameter stage, thereby providing a practical basis for the multi-domain transfer-learning setting.
To further quantify the distribution shift, the Kolmogorov–Smirnov (KS) statistic and Wasserstein distance were calculated between each source domain and the target domain. As shown in Table 3, clear distribution shifts exist in main heater power, growth speed, and V/G. The results also indicate that different source domains have different degrees of relevance to the target domain, which motivates the adaptive source-domain credibility weighting mechanism in DWC-ISBiGNN.
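Both shift measures can be computed directly from the empirical distribution functions of a single variable in two domains. The sketch below is a plain-Python illustration for one-dimensional samples, with illustrative function names; it is not the implementation used in the study, for which a statistics library would normally be used.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    points = sorted(set(xs) | set(ys))
    return max(
        abs(bisect_right(xs, p) / len(xs) - bisect_right(ys, p) / len(ys))
        for p in points
    )


def wasserstein_1d(x, y):
    """1-Wasserstein distance: area between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


if __name__ == "__main__":
    source = [0.0, 1.0, 2.0]
    target = [1.0, 2.0, 3.0]
    print(ks_statistic(source, target), wasserstein_1d(source, target))
```

The KS statistic is bounded by 1 and reflects the largest local disagreement between the two distributions, whereas the Wasserstein distance is expressed in the units of the variable and grows with the size of the shift, so the two measures are complementary.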
4.1.2. Data Preprocessing
To suppress high-frequency random noise, median filtering is applied to the original time-series signal. For each process variable, min–max normalization is applied independently within each domain to map the values to the [0,1] interval:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original variable, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the variable in the corresponding domain, respectively.
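The two preprocessing steps can be sketched as follows. The kernel width of the median filter and the handling of the signal edges (a truncated window) are assumptions, since neither is specified in the text, and the function names are illustrative.

```python
from statistics import median


def median_filter(signal, kernel=5):
    """Sliding median with a window truncated at the signal edges."""
    half = kernel // 2
    n = len(signal)
    return [
        median(signal[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def min_max_normalize(values):
    """Map the values of one variable in one domain to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal carries no range information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    raw = [1.0, 1.0, 9.0, 1.0, 1.0]  # one isolated spike
    print(median_filter(raw, kernel=3))
    print(min_max_normalize([2.0, 4.0, 6.0]))
```

In practice the normalization would be called once per variable and per domain, so that each domain is scaled with its own minimum and maximum.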
4.1.3. Evaluation Metrics
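The prediction performance is evaluated using three common quantitative metrics for regression tasks:
1. Root Mean Square Error (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$$
2. Mean Absolute Error (MAE)
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$
3. Coefficient of Determination ($R^2$)
$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}$$
where $m$ is the number of test samples, $y_i$ is the true V/G value, $\hat{y}_i$ is the model predicted value, and $\bar{y}$ is the mean of the true values. The smaller the RMSE and MAE, and the closer $R^2$ is to 1, the better the model’s prediction accuracy and generalization ability.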
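For reference, the three metrics can be computed with a few lines of plain Python. The function name is illustrative, and the sketch assumes that the true values are not all identical, so that the denominator of the coefficient of determination is nonzero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired true and predicted values."""
    m = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / m)
    mae = sum(abs(e) for e in errors) / m
    mean_true = sum(y_true) / m
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # assumes y_true is not constant
    return rmse, mae, r2


if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```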
4.1.4. Experimental Environment and Parameter Configuration
All models are implemented based on the PyTorch 2.9.1 framework, using the Adam optimizer, with a learning rate of , a batch size of 64, and 250 training epochs. Key hyperparameters of the model were as follows: , initial temperature coefficient , continuously adjusted during training through learning; preheating steps = 30; consistency loss weight ; feature consistency coefficient ; EMA decay coefficient .
To ensure a comprehensive and fair comparison, four mainstream baseline models were selected for performance benchmarking. The specific configurations of each model are shown in Table 4. All baseline models were trained using all labeled data from the source domains and the small amount of labeled data from the target domain, without unlabeled data, and with the same input variables and preprocessing procedure.
4.2. Overall Performance Comparison Experiment
To verify the advantages of the proposed DWC-ISBiGNN, a comprehensive performance comparison was conducted on the target domain against the original IS-BiGNN model and the mainstream baseline models (deep neural network (DNN), extreme gradient boosting (XGBoost), LSTM, Transformer, and Domain-Adversarial Neural Network with Long Short-Term Memory (DANN-LSTM)). All experiments used the same experimental configuration, data preprocessing procedures, and evaluation metrics to ensure a fair and reliable comparison. Each model was trained multiple times, and the average of the results was used as the final performance metric to reduce the impact of random factors. Specific comparison results are shown in Table 5. For clarity, all figures in this section display only the first 500 consecutive samples from the test set; all quantitative metrics are computed on the full test set.
To show the performance differences between the models more intuitively, Figure 4 compares the model predictions with the actual values.
Combining Table 5 and Figure 4, the temporal modeling methods (LSTM, Transformer) clearly outperform the traditional static modeling methods (DNN, XGBoost). Specifically, the RMSE (0.005208) and MAE (0.003147) of the LSTM model are lower than those of DNN (RMSE = 0.006169, MAE = 0.004857) and XGBoost (RMSE = 0.006253, MAE = 0.005539), while its $R^2$ (0.9273) is higher than that of DNN (0.898) and XGBoost (0.8952). The Transformer model performs slightly better than the IS-BiGNN model, with an $R^2$ of 0.9179, slightly lower than LSTM’s 0.9273, and an RMSE (0.005535) slightly higher than that of LSTM. This result indicates that the silicon single-crystal growth process exhibits strong time-series dependence and that the temporal evolution of the process parameters significantly affects V/G prediction. Temporal modeling methods (in particular the short-term dependency capture of LSTM and the long-range correlation capture of Transformer) can effectively mine this temporal information, which supports the necessity of temporal modeling in V/G prediction. The static modeling methods (DNN, XGBoost) ignore the temporal correlation of the process parameters and predict from the parameters at a single moment, which makes it difficult for them to follow the dynamic growth process and results in relatively low prediction accuracy.
Graph structure modeling can improve V/G value prediction accuracy. The IS-BiGNN model (RMSE = 0.006088, MAE = 0.004385, R² = 0.9006) outperforms the traditional static modeling methods (DNN, XGBoost) on all performance metrics, while slightly underperforming the temporal modeling methods (LSTM, Transformer). This result suggests that constructing 12 process parameters as graph nodes and learning the complex coupling relationships between parameters through a data-driven approach can capture the intrinsic correlations among process parameters during silicon single-crystal growth. This overcomes the limitation of traditional models in modeling the interactions between parameters and is a key path to improving V/G prediction performance. Although IS-BiGNN does not incorporate temporal modeling, its graph structure modeling still allows it to outperform the static models, further supporting the effectiveness of graph structure modeling in this task.
The proposed DWC-ISBiGNN significantly improves model performance, achieving the best overall performance. The experimental data show that as the improvement strategies are gradually superimposed, the model performance improves continuously, supporting the effectiveness and synergistic effect of each improvement strategy. Specifically, its RMSE = 0.0041, MAE = 0.00285, and R² = 0.9549. Compared to the original IS-BiGNN, the RMSE is reduced by 32.7%, the MAE by 35.0%, and the R² is increased by 5.43 percentage points. Compared to LSTM, the best of the conventional baselines, the RMSE is reduced by 21.3%, the MAE by 9.4%, and the R² is increased by 2.76 percentage points. Compared to the Transformer model, the RMSE is reduced by 25.9%, the MAE by 37.2%, and the R² is increased by 3.7 percentage points. These results support the combined effectiveness of the three improvement strategies proposed in this paper: dynamic graph construction, source-domain confidence weighting, and conditional consistency constraints. Together, these three strategies address the problems of static modeling, equal weighting of the source domains, and low data utilization in the IS-BiGNN model, improving the model’s prediction accuracy and generalization ability.
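As a worked check of how such improvement figures are obtained, the comparison with the original IS-BiGNN can be written out explicitly. The error metrics are compared as relative reductions, while the R² gain is an absolute difference expressed in percentage points; the Δ symbols are notation introduced here for illustration and do not appear elsewhere in the paper.

```latex
\begin{align*}
\Delta_{\mathrm{RMSE}} &= \frac{\mathrm{RMSE}_{\mathrm{ref}} - \mathrm{RMSE}_{\mathrm{new}}}{\mathrm{RMSE}_{\mathrm{ref}}} \times 100\%
  = \frac{0.006088 - 0.0041}{0.006088} \times 100\% \approx 32.7\%, \\
\Delta_{\mathrm{MAE}} &= \frac{\mathrm{MAE}_{\mathrm{ref}} - \mathrm{MAE}_{\mathrm{new}}}{\mathrm{MAE}_{\mathrm{ref}}} \times 100\%
  = \frac{0.004385 - 0.00285}{0.004385} \times 100\% \approx 35.0\%, \\
\Delta_{R^2} &= \left(R^2_{\mathrm{new}} - R^2_{\mathrm{ref}}\right) \times 100
  = (0.9549 - 0.9006) \times 100 = 5.43 \text{ percentage points}.
\end{align*}
```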
To further benchmark the proposed method against a recent domain adaptation approach, we additionally implemented a DANN-LSTM model, which combines a Domain-Adversarial Neural Network (DANN) with LSTM for adversarial domain alignment and temporal sequence modeling. As shown in
Table 5 and
Figure 5, DANN-LSTM achieved an RMSE of 0.004475, an MAE of 0.003706, and an R² of 0.9463. This result is notably better than all other baseline models and is competitive with the proposed DWC-ISBiGNN, demonstrating the benefit of adversarial domain adaptation for the multi-condition V/G prediction task. However, DWC-ISBiGNN still achieves superior performance, with RMSE reduced by 8.4% and R² improved by 0.86 percentage points compared with DANN-LSTM. This additional improvement can be attributed to the explicit modeling of dynamic graph structures, adaptive source-domain weighting, and semi-supervised consistency learning, which jointly address the non-stationary and multi-stage characteristics of the CZ growth process that adversarial alignment alone cannot fully capture.
Comparison with Weighted Ensemble Baseline
To further examine whether the performance improvement of DWC-ISBiGNN can be achieved by a simpler model aggregation strategy, an additional weighted ensemble baseline was constructed. The ensemble used DNN, XGBoost, LSTM, and Transformer as base learners, and the ensemble weights were determined by non-negative least squares (NNLS). To avoid overly optimistic weight estimation, the NNLS weights were calibrated on held-out source-domain validation samples that were not used for training the base learners. Under this conservative calibration protocol, the NNLS optimization assigned the largest weight to XGBoost, reflecting its best fit to the source-domain validation samples rather than necessarily indicating the best target-domain performance.
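The weighted ensemble idea can be sketched as follows: base-learner predictions on held-out validation samples form the columns of a matrix, and non-negative weights are fitted to the validation targets. The paper does not state which solver was used, so this sketch swaps in a simple projected-gradient routine written in pure Python as a stand-in for a library NNLS solver such as SciPy's `scipy.optimize.nnls`. The function names and the toy numbers are hypothetical and are not data from the paper.

```python
def ensemble_sse(P, y, w):
    """Sum of squared errors of the weighted ensemble on the given samples."""
    return sum((sum(p * wj for p, wj in zip(row, w)) - t) ** 2 for row, t in zip(P, y))

def nnls_projected_gradient(P, y, steps=5000):
    """Minimise 0.5 * ||P w - y||^2 subject to w >= 0.

    P has one row per validation sample and one column per base learner.
    Projected gradient descent is used here only as an illustrative
    stand-in for a dedicated NNLS solver.
    """
    n, k = len(P), len(P[0])
    # Step size 1 / ||P||_F^2 is no larger than 1 / L (L = largest eigenvalue
    # of P^T P), so the objective never increases from one step to the next.
    lr = 1.0 / sum(v * v for row in P for v in row)
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(steps):
        resid = [sum(P[i][j] * w[j] for j in range(k)) - y[i] for i in range(n)]
        grad = [sum(P[i][j] * resid[i] for i in range(n)) for j in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[j] - lr * grad[j]) for j in range(k)]
    return w

# Toy validation targets and base-learner predictions (illustrative only).
y_val = [1.0, 2.0, 3.0, 4.0]
P_val = [
    [1.1, 0.5, 2.0],
    [2.1, 2.5, 1.0],
    [2.9, 3.5, 4.0],
    [4.2, 3.0, 3.0],
]

equal_weights = [1.0 / 3] * 3
weights = nnls_projected_gradient(P_val, y_val)
print("weights:", weights)
print("SSE equal weights:", ensemble_sse(P_val, y_val, equal_weights))
print("SSE fitted weights:", ensemble_sse(P_val, y_val, weights))
```

Because the weights are fitted on source-domain validation samples, a learner that fits those samples best can dominate the ensemble even if it is not the best performer on the target domain, which matches the behavior reported for XGBoost above.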
Figure 6 compares the NNLS-weighted ensemble with the proposed DWC-ISBiGNN. The NNLS ensemble achieved an RMSE of 0.006247, an MAE of 0.005520, and an R² of 0.895385. In contrast, DWC-ISBiGNN achieved an RMSE of 0.004100, an MAE of 0.002850, and an R² of 0.954900. Compared with the NNLS ensemble, DWC-ISBiGNN reduced RMSE by 34.37% and MAE by 48.37%, while improving R² by 5.95 percentage points. These results indicate that the proposed model’s advantage cannot be reproduced by a simple weighted aggregation of conventional baseline models.
4.3. Ablation Experiments and Module Validation
To further verify the individual contributions and synergistic effects of the three improvement strategies proposed in this paper, three model variants (D-ISBiGNN, DW-ISBiGNN, and DWC-ISBiGNN) were constructed by sequentially adding dynamic sample graph construction (D), source-domain confidence weighting (W), and conditional alignment and regression consistency constraints (C) to the original IS-BiGNN, which serves as the baseline. Incremental comparisons were then performed on the test set. The experimental results are shown in
Table 6.
As shown in
Figure 7, each added module delivers a further performance gain, and the cumulative effect increases steadily. Compared to the baseline IS-BiGNN (RMSE = 0.006088, R² = 0.9006), D-ISBiGNN, which incorporates only dynamic sample graphs, reduces RMSE to 0.005717 (a decrease of 6.1%) and increases R² to 0.9124, verifying the effectiveness of the dynamic graph structure in capturing non-stationary changes within the operating conditions. Adding source-domain confidence weighting (DW-ISBiGNN) yields the largest single improvement, with RMSE dropping to 0.004343 (a decrease of 28.7% compared to IS-BiGNN) and R² rising to 0.9494 (an increase of 4.88 percentage points), indicating that adaptive source-domain weighting favors high-quality source domains and suppresses negative transfer. Finally, the complete model DWC-ISBiGNN, which incorporates conditional alignment and regression consistency constraints, further reduces the RMSE to 0.004100 (a 5.6% decrease relative to DW-ISBiGNN) and improves the R² to 0.9549, while maintaining a low MAE, suggesting that semi-supervised consistency constraints can extract useful information from unlabeled data and refine the feature representations. No added module degraded performance at any step, indicating that the three improvements are complementary.
In summary, the ablation experiments show that the three strategies, namely dynamic sample graph construction (D), source-domain confidence weighting (W), and conditional alignment and regression consistency constraints (C), are each effective and reinforce one another, jointly improving the prediction accuracy of IS-BiGNN. Compared to the original IS-BiGNN, the complete model DWC-ISBiGNN shows a relative reduction of 32.7% in RMSE and an absolute increase of 5.43 percentage points in R², supporting the rationality of the proposed improvement method.
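The incremental gains quoted in this section can be recomputed from the RMSE and R² values stated in the text. The sketch below is only a consistency check on those numbers: the helper names are hypothetical, and each variant is compared both with the IS-BiGNN baseline and with the immediately preceding variant.

```python
# (model, RMSE, R^2) as quoted in the ablation discussion above.
ablation = [
    ("IS-BiGNN", 0.006088, 0.9006),
    ("D-ISBiGNN", 0.005717, 0.9124),
    ("DW-ISBiGNN", 0.004343, 0.9494),
    ("DWC-ISBiGNN", 0.004100, 0.9549),
]

def rel_reduction_pct(ref, new):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (ref - new) / ref

def gain_pp(ref, new):
    """Absolute gain of R^2, in percentage points."""
    return 100.0 * (new - ref)

_, base_rmse, base_r2 = ablation[0]
prev_rmse = base_rmse
rows = []
for name, rmse_value, r2_value in ablation[1:]:
    rows.append({
        "model": name,
        "rmse_drop_vs_base_pct": round(rel_reduction_pct(base_rmse, rmse_value), 1),
        "rmse_drop_vs_prev_pct": round(rel_reduction_pct(prev_rmse, rmse_value), 1),
        "r2_gain_vs_base_pp": round(gain_pp(base_r2, r2_value), 2),
    })
    prev_rmse = rmse_value  # the next variant is compared with this one

for row in rows:
    print(row)
```

Keeping the two reference points separate makes clear that the 28.7% figure for DW-ISBiGNN is measured against IS-BiGNN, whereas the 5.6% figure for DWC-ISBiGNN is measured against DW-ISBiGNN.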
4.4. Physical Interpretability Analysis of the Learned Graph Structure
To further investigate the physical interpretability of the proposed DWC-ISBiGNN model, the learned invariant adjacency matrix in the target domain was extracted and visualized, as shown in
Figure 8. In this graph, each node represents one of the 14 measured process variables, and each edge weight denotes the learned interaction strength between two variables. Specifically, the nodes correspond to crystal diameter, crystal lift speed, main heater power, auxiliary heater power, crystal rotation speed, crucible rotation speed, crucible lift speed, heater temperature, melt surface temperature feedback, crystal weight, crystal length, meniscus bottom temperature, meniscus height, and growth speed. Therefore, the learned adjacency matrix provides a direct way to examine whether the graph structure captured by the model is consistent with known coupling relationships in the Czochralski silicon growth process.
As shown in
Figure 8, several physically meaningful dependencies can be observed. The strongest interaction appears between crystal lift speed and growth speed, which is consistent with the pulling-growth dynamics in the Czochralski process. Crystal lift speed also shows strong connections with crucible lift speed, main heater power, auxiliary heater power, crystal length, and crystal weight, indicating that the pulling process is coupled with crucible movement, thermal input, and crystal geometry evolution. In addition, heater-related variables and growth-related variables are connected, suggesting that the model captures part of the coupling between thermal control and interface evolution.
It should be noted that the learned graph is not intended to be a strict mechanistic heat-transfer network. Instead, it represents data-driven invariant dependencies that are useful for V/G prediction. Some learned edges are consistent with known physical couplings, whereas others may reflect indirect correlations induced by process control strategies, operating stages, or batch-level variations. Nonetheless, the learned adjacency matrix provides useful interpretability for analyzing variable interactions and offers clues for understanding the underlying process dynamics.
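One simple way to read such an adjacency matrix is to rank variable pairs by edge weight. The sketch below assumes the matrix is available as a nested list and averages the two directions of each pair, which is an assumption about how directed weights might be summarised rather than a description of the authors' procedure. The matrix entries are placeholder values chosen only to mirror the couplings named in the text; they are not the learned weights.

```python
VARIABLES = [
    "crystal diameter", "crystal lift speed", "main heater power",
    "auxiliary heater power", "crystal rotation speed", "crucible rotation speed",
    "crucible lift speed", "heater temperature", "melt surface temperature feedback",
    "crystal weight", "crystal length", "meniscus bottom temperature",
    "meniscus height", "growth speed",
]

def top_edges(adj, names, k=5):
    """Return the k strongest variable pairs as (weight, name_i, name_j)."""
    edges = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            # Average both directions so that each pair is reported once.
            weight = 0.5 * (abs(adj[i][j]) + abs(adj[j][i]))
            edges.append((weight, names[i], names[j]))
    edges.sort(reverse=True)
    return edges[:k]

# Placeholder adjacency matrix: all zeros plus a few illustrative entries.
n = len(VARIABLES)
adj = [[0.0] * n for _ in range(n)]

def set_edge(a, b, weight):
    """Set a symmetric placeholder weight between two named variables."""
    i, j = VARIABLES.index(a), VARIABLES.index(b)
    adj[i][j] = adj[j][i] = weight

set_edge("crystal lift speed", "growth speed", 0.9)         # placeholder value
set_edge("crystal lift speed", "crucible lift speed", 0.6)  # placeholder value
set_edge("crystal lift speed", "main heater power", 0.4)    # placeholder value

for weight, a, b in top_edges(adj, VARIABLES, k=3):
    print(f"{a} <-> {b}: {weight:.2f}")
```

A ranking of this kind only surfaces strong statistical dependencies; as noted above, deciding whether an edge reflects a physical coupling or an indirect correlation still requires process knowledge.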
5. Summary
To address the challenges of online measurement of V/G values, data distribution shifts across multiple operating conditions, and scarcity of labeled samples during the Czochralski silicon single-crystal growth process, this paper proposes a soft measurement method, DWC-ISBiGNN, based on dynamically weighted conditionally invariant feature extraction. Building upon the IS-BiGNN framework, the method introduces a dynamic sample graph construction mechanism (sample-level attention and stage-aware global nodes) that adjusts the graph topology with each batch, capturing the non-stationary characteristics and stage information of the growth process. A source-domain credibility evaluation module adaptively allocates source-domain weights based on the distance of the invariant graph structure, which suppresses negative transfer. In addition, a stage-conditional alignment loss and a teacher–student semi-supervised regression consistency constraint are constructed to achieve fine-grained alignment of cross-domain features within the same stage and to exploit unlabeled data in the target domain. Experiments on industrial data from a 12-inch silicon single-crystal production line show that DWC-ISBiGNN achieves the best prediction performance across multiple scenarios involving changes in batch size, thermal field, and process parameters, with RMSE and MAE reaching 0.0041 and 0.00285, respectively, and R² reaching 0.9549. Compared to the original IS-BiGNN, RMSE is reduced by 32.7%, and R² is increased by 5.43 percentage points in absolute terms. Ablation experiments further support the individual effectiveness and synergistic gain of the three improvement strategies. This study provides a feasible technical solution for online detection of key parameters in silicon single-crystal growth and offers a useful reference for multi-condition soft measurement modeling of complex industrial processes.
It should be noted that the “invariant” features learned in this study are data-driven dependencies that are stable across the observed source domains and target domain under the same furnace configuration (i.e., same thermal field design, crucible geometry, and heat shield structure). Their generalizability to different furnace hardware (e.g., altered crucible material, heat shield geometry, or hot-zone design) is not guaranteed and would require recalibration or fine-tuning with data from the new configuration. This is an inherent limitation of all data-driven approaches, whose generalization is bounded by the coverage of the training data. Cross-furnace generalization is a promising direction for future research, where few-shot transfer learning or domain adaptation with limited data could be explored to enable rapid adaptation to new furnace designs.
Future work will focus on model lightweighting (e.g., knowledge distillation, pruning) and physical mechanism fusion (heat conduction equations, fluid dynamic constraints) to improve real-time inference capabilities and interpretability under extreme conditions.