Research on V/G Value Prediction Method for Silicon Single-Crystal Growth Based on Multi-Condition Invariant Feature Extraction

Wan, Yin; Han, Chun-Jie; Liu, Ding; Lei, Hao-Nan; Ren, Jun-Chao

doi:10.3390/cryst16070420


by Yin Wan, Chun-Jie Han, Ding Liu *, Hao-Nan Lei and Jun-Chao Ren

Department of Information and Control Engineering, School of Automation and Information Engineering, Jinhua Campus, Xi’an University of Technology, Xi’an 710048, China

* Author to whom correspondence should be addressed.

Crystals 2026, 16(7), 420; https://doi.org/10.3390/cryst16070420 (registering DOI)

Submission received: 18 May 2026 / Revised: 23 June 2026 / Accepted: 26 June 2026 / Published: 29 June 2026

(This article belongs to the Special Issue Microstructure and Characterization of Crystalline Materials)


Abstract

In the Czochralski process of silicon single-crystal growth, the V/G value at the solid–liquid interface is a key parameter affecting intrinsic crystal defects. However, online V/G detection remains difficult because the temperature gradient G cannot be directly measured, while multi-condition distribution shifts and limited labeled data reduce the robustness of data-driven models. To address these issues, this paper proposes DWC-ISBiGNN, an adaptive multi-condition invariant feature extraction method based on the Invariant-Specific Bidirectional Graph Neural Network. The proposed method introduces dynamic sample graph construction with stage-aware global nodes to capture non-stationary process correlations, source-domain credibility weighting to suppress negative transfer, and a semi-supervised training framework combining stage-conditional alignment with teacher–student regression consistency to exploit unlabeled target-domain data. Experiments on industrial data from a 12-inch silicon single-crystal production line show that DWC-ISBiGNN achieves an RMSE of 0.0041, an MAE of 0.00285, and an $R^{2}$ of 0.9549. Compared with the original IS-BiGNN, the RMSE is reduced by 32.6%, and $R^{2}$ is increased by 5.43 percentage points. The results demonstrate that the proposed method provides an effective soft-sensing approach for V/G prediction under multiple operating conditions.

Keywords: silicon single-crystal growth; V/G ratio; soft measurement; multi-condition modeling; causal invariant features; graph neural network; transfer learning

1. Introduction

As the core cornerstone of the information industry, the development level of integrated circuits directly determines the country’s technological competitiveness and industrial security [1]. Due to its excellent semiconductor physical properties, single-crystal silicon is the preferred substrate material for manufacturing large-scale integrated circuits, supporting more than 90% of the world’s semiconductor device production [2]. With the continuous evolution of semiconductor technology, the feature size of integrated circuits is constantly shrinking, which puts forward unprecedentedly stringent requirements on the crystal integrity, micro-defect density and impurity distribution uniformity of single-crystal silicon [3]. Any tiny crystal defect may lead to chip performance degradation or even failure. Therefore, achieving stable growth of high-quality single-crystal silicon has become a core technical challenge in the upstream of the semiconductor industry [4,5]. Among the many single-crystal silicon preparation technologies, the Czochralski (CZ) method has become the mainstream technology for industrial production of large-diameter silicon single crystals because Dash necking and the free solid–liquid interface provide a high success rate for single-crystal growth, the crystal can be directly pulled from the melt without the difficulties associated with crucible removal, and the process is more economical than the floating-zone method [6,7]. The crystal defect dynamics theory proposed by Voronkov et al. clearly points out that the ratio of the crystal pulling speed V to the axial temperature gradient G at the solid–liquid interface (V/G) is the key parameter that determines the type, concentration and distribution of intrinsic point defects inside single-crystal silicon [8]. Only by precisely controlling the V/G value within an extremely narrow window near the critical value can a “perfect crystal” that meets the requirements of advanced processes be grown [9]. 
However, because the CZ single-crystal furnace is a sealed, high-temperature environment and the solid–liquid interface is deeply buried inside the silicon melt, no existing sensor can directly and continuously measure the temperature gradient G. Online acquisition of the V/G value has therefore become a key bottleneck restricting fine control of the silicon single-crystal growth process [10]. The industry currently relies on a trial-and-error cycle of “offline testing–process adjustment”, which has a long development cycle and high production costs, cannot support real-time closed-loop control, and makes consistent product quality difficult to guarantee.

To address the core bottleneck of online V/G detection, researchers worldwide have conducted extensive studies, and the field has evolved from early traditional methods to modern data-driven methods [11]. Early research relied mainly on two types of methods: mechanism-driven modeling and simple data fitting. Mechanism-driven methods are built on numerical simulation and calculate V/G values from multi-physics coupling models. Their physical meaning is clear, but long computation times prevent online application [12]. Simple data-fitting methods, such as principal component regression (PCR) and partial least squares (PLS), are shallow models. They are easy to implement, but they struggle to capture the strong nonlinearity of the growth process, offer limited prediction accuracy, and cannot adapt to changes in working conditions [13]. With the development of industrial sensors and artificial intelligence, data-driven soft measurement has become mainstream. Starting from shallow artificial neural network (ANN) models, it has progressed to time-series models such as long short-term memory (LSTM) and gated recurrent unit (GRU) networks, convolutional neural network–long short-term memory (CNN-LSTM) hybrid models, and deep generative models such as the variational autoencoder (VAE), which have significantly improved prediction accuracy and noise resistance under specific working conditions [14]. However, all such models assume a stable data distribution and therefore cannot handle the distribution shift that arises under multiple working conditions. Their generalization ability is insufficient to meet the long-term stable operation requirements of industrial sites [15].

To address the domain offset problem in multi-condition modeling, transfer learning techniques have been widely introduced into the field of industrial soft measurement. Its core idea is to learn cross-domain invariant features and utilize the existing labeled data knowledge in the source domain to assist in modeling tasks with unlabeled or minimally labeled data in the target domain, thereby improving the model’s cross-condition generalization ability. Early transfer learning methods were mainly divided into two categories: statistical discrepancy alignment and adversarial domain adaptation. Among them, maximum mean discrepancy (MMD) is a commonly used statistical alignment method, which achieves domain adaptation by minimizing the distribution difference between the source and target domain data. A domain-adversarial neural network (DANN) forces the feature extractor to learn domain invariant features through adversarial training mechanisms, achieving certain results in multi-condition soft measurement. However, such methods fail to fully consider the complex topological relationships between variables in industrial processes, resulting in poor interpretability of feature extraction [16,17].

The rise of graph neural networks (GNNs) has provided an effective tool for characterizing the complex relationships between variables in industrial processes. It can explicitly model the physical relationships and causal relationships between measurable variables. Compared with traditional fully connected neural networks, it has stronger interpretability and generalization ability and has been successfully applied to modeling a variety of complex industrial processes [18]. Recently, Ren and Zhao [19] proposed the Invariant-Specific Bidirectional Graph Neural Network (IS-BiGNN) model, which innovatively combines graph neural networks with invariant feature learning in transfer learning to construct two parallel graph neural network branches, respectively, learning cross-domain invariant relationships and domain-specific relationships between variables. It achieves cross-condition transfer through feature alignment and shows better performance than traditional transfer learning methods and single data-driven models in multiple industrial soft measurement tasks, providing a new idea for V/G value soft measurement under multiple conditions. However, the original IS-BiGNN model still has obvious limitations when dealing with complex dynamic industrial processes such as silicon single-crystal growth, and it is difficult to fully adapt to the actual needs of V/G value prediction. This has become the weak link in current research and is the core research starting point of this paper.

Specifically, the limitations of the original IS-BiGNN model are mainly reflected in three aspects. First, it adopts a static graph structure assumption, assuming that the correlation between process variables remains unchanged throughout the entire silicon single-crystal growth cycle. This fails to characterize the dynamic evolution characteristics of variable correlations at different stages of growth (crystal introduction, shoulder formation, and constant diameter growth), while the thermal field characteristics of each stage of silicon single-crystal growth differ significantly, and the correlation of variables exhibits obvious time-varying characteristics. Second, it assigns the same weight to all source-domain samples, without considering the differences in correlation between different source-domain samples and the target domain. When the operating conditions of some source-domain samples differ significantly from those of the target domain, it is easy to introduce negative transfer phenomena, reducing the model’s prediction accuracy. Third, it fails to fully exploit the information value of a large amount of unlabeled data in the target domain. Model training mainly relies on labeled data in the source domain, and the utilization of unlabeled data in the target domain is limited to simple feature alignment, failing to fully leverage the role of unlabeled data in improving the model’s generalization ability.

To address the aforementioned issues, this paper proposes a multi-condition V/G value soft measurement method based on an improved IS-BiGNN. The method models the silicon single-crystal growth process under different production conditions as multiple related domains with distribution shifts. Building on IS-BiGNN, three core improvements are introduced: a dynamic sample graph construction mechanism, which adaptively learns the dynamic correlation between variables through sample-level attention and growth-stage-aware nodes; a source-domain credibility evaluation mechanism, which dynamically allocates sample weights based on inter-domain distribution differences and prediction uncertainties; and a semi-supervised consistency training framework, which exploits unlabeled target-domain data by combining conditional distribution alignment with regression consistency constraints. Experiments using industrial data from a 12-inch silicon single-crystal production line demonstrate that the method accurately predicts V/G values under batch, thermal field, and process parameter variations, with significantly better overall performance than mainstream soft measurement methods. The method thus provides a new technical approach for online detection of key parameters in the silicon single-crystal growth process and a reference for multi-condition modeling problems in other complex industrial processes.

The subsequent content of this paper is arranged as follows: Section 2 introduces the relevant theoretical foundations of graph neural networks and IS-BiGNN; Section 3 elaborates on the proposed Dynamic Weighted Conditional Invariant-Specific BiGNN (DWC-ISBiGNN) model; Section 4 verifies the effectiveness of the proposed method through experiments and analyzes the results; and Section 5 summarizes the work of this paper and looks forward to future research directions.

2. Theoretical Foundations of Graph Neural Networks and IS-BiGNN

2.1. Fundamental Theories of Graph Neural Networks

A graph is a mathematical structure describing the relationships between entities, formally represented as $G = (V, E)$, where $V = \{v_{1}, v_{2}, \dots, v_{n}\}$ is the set of nodes, each node corresponding to an entity; $E = \{e_{1}, e_{2}, \dots, e_{m}\}$ is the set of edges, each edge $e_{ij}$ corresponding to the relationship between two nodes $v_{i}$ and $v_{j}$, reflecting the degree of mutual influence between process variables [20].

Based on the properties of the edges, graphs can be divided into three categories: undirected graphs have no direction on the edges, reflecting bidirectional influence between variables; directed graphs have direction on the edges, reflecting unidirectional influence between variables; and weighted graphs have weights on the edges, which can quantify the strength of the relationship between variables [21].

The core mathematical representation of a graph comprises three types of matrices: the adjacency matrix $A$ is an $n \times n$ square matrix, where $A_{ij}$ represents the association strength between nodes $v_{i}$ and $v_{j}$, taking a value of 0 when there is no association; the node feature matrix $X$ is an $n \times d$ matrix, where $X_{ij}$ represents the j-th feature of the i-th node, corresponding to process variable monitoring data; and the degree matrix $D$ is an $n \times n$ diagonal matrix, where $D_{ii}$ is the degree of node $v_{i}$, which is the sum of the weights of the connecting edges. These three types of matrices work together to provide the foundation for information aggregation and node feature updates in GNNs, and their rationality directly affects the modeling effect.

Core Mechanism of Graph Neural Network and Graph Convolutional Network

The core idea of a graph neural network is to update each node representation by aggregating information from its neighboring nodes, as shown in Figure 1. The traditional GNN is based on fixed-point theory and iterates until the graph state converges, but it suffers from slow convergence and poor interpretability [22].

For any node $v_{n}$ in graph $G$, its state update follows a local iteration rule, and the transition function and output function are as follows:

$$x_{n} = f_{w}(l_{n}, l_{co[n]}, x_{ne[n]}, l_{ne[n]}) \tag{1}$$

$$o_{n} = g_{w}(x_{n}, l_{n}) \tag{2}$$

where $l_{n}$ is the feature vector of node $v_{n}$ itself; $l_{co[n]}$ is the feature set of the edges connected to node $v_{n}$; $x_{ne[n]}$ is the set of current states of neighboring nodes; $l_{ne[n]}$ is the feature set of neighboring nodes; $f_{w}$ is the learnable parameterized transition function (usually composed of a multilayer perceptron); and $g_{w}$ is the local output function. At the global level, the local functions are stacked to obtain the global transition function $F_{w}$ and the output function $G_{w}$, and $F_{w}$ is required to be a contraction mapping to ensure iterative convergence.

Graph convolutional networks (GCNs), as the most representative variant of GNNs, were proposed by Kipf and Welling [23]. They simplify spectral convolution to a first-order spatial approximation and adopt layered forward propagation, which greatly improves training efficiency. The core operation of GCN is “neighborhood information aggregation”, that is, each node updates its representation by aggregating the features of its neighboring nodes and combining them with its own features. Stacking multiple layers can expand the receptive field of the nodes [24]. Its single-layer forward propagation formula is:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right) \tag{3}$$

where $H^{(l)}$ is the node feature matrix of the l-th layer; $\tilde{A} = A + I_{N}$ (adding self-loops to retain the node’s own information); $\tilde{D}$ is the degree matrix of $\tilde{A}$ (used for normalization to avoid feature bias caused by differences in node degree); $W^{(l)}$ is the learnable weight matrix; and $\sigma(\cdot)$ is the ReLU nonlinear activation function.

The forward propagation of GCN consists of two steps: first, the node features are mapped to a new feature space by multiplying $H^{(l)}$ and $W^{(l)}$; second, the neighborhood information is weighted and aggregated by multiplying the normalized $\tilde{A}$ and the mapped feature matrix [25,26,27].
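The two-step propagation of Equation (3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `gcn_layer`, the toy path graph, and the identity weight matrix are assumptions chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer, ReLU(D~^{-1/2} (A + I) D~^{-1/2} H W), following Equation (3).

    Assumes A has non-negative weights, so every degree of A + I is at least 1.
    """
    A_tilde = A + np.eye(A.shape[0])               # add self-loops
    deg = A_tilde.sum(axis=1)                      # diagonal of the degree matrix of A~
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]  # symmetric normalization
    mapped = H @ W                                 # step 1: map features to the new space
    return np.maximum(A_hat @ mapped, 0.0)         # step 2: aggregate neighbors, then ReLU

# Toy example (hypothetical data): path graph 0-1-2 with two features per node.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)
H_next = gcn_layer(A, H, W)
```

In this toy graph, node 0 has degree 2 after the self-loop is added and node 1 has degree 3, so the first output row mixes node 0's own features (weight 1/2) with those of node 1 (weight 1/√6).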

2.2. Invariant-Specific BiGNN (IS-BiGNN)

The Invariant-Specific BiGNN is designed for transfer learning in multi-condition industrial processes. It achieves knowledge transfer by decoupling cross-domain “invariant” relations from domain “specific” relations and combining graph structure alignment and feature orthogonalization. It is the baseline model in this paper, as shown in Figure 2.

2.2.1. Symbol Definition and Problem Modeling

To clearly describe the working mechanism of IS-BiGNN in the context of the silicon single-crystal V/G value prediction task, the relevant symbols are defined as follows: let there be $N$ labeled source domains $D_{s}^{1}, D_{s}^{2}, \dots, D_{s}^{N}$ and one target domain $D_{t}$. The source domains are historical operating conditions with a large amount of labeled data, and the target domain is the current operating condition to be predicted (containing a small amount of labeled data and a large amount of unlabeled data).

The n-th source-domain data is represented as $D_{s}^{n} = \{(x_{n}^{s_{1}}, y_{n}^{s_{1}}), \dots, (x_{n}^{s_{a}}, y_{n}^{s_{a}})\}$, where $a$ is the number of samples, $x_{n}^{s_{1}}$ is the time series of $d$ process variables for the first sample, and $y_{n}^{s_{1}}$ is the corresponding V/G value label; the target domain $D_{t}$ contains $b$ labeled samples and $c$ unlabeled samples, with $b \ll c$, which is highly consistent with the actual scenario in silicon single-crystal production where V/G values are difficult to measure online and labeled data is scarce.

2.2.2. Graph Construction Module

The $d$ process variables of each domain are modeled as a fully connected graph with $d$ nodes, and the adjacency matrix is learned from data. Two parallel graph learning layers are designed:

(1) Invariant Graph Learning Layer

This layer learns stable relationships common to all domains, with parameters shared across domains. For any two variables $v_{l}$ and $v_{q}$, the relationship strength is calculated as

$$a_{l,q}^{inv} = f\left(\theta_{c} \cdot (v_{l} \oplus v_{q})\right) \tag{4}$$

where $\theta_{c} \in \mathbb{R}^{2w}$ is the trainable parameter vector shared by all domains, $\oplus$ denotes the vector concatenation operation, and $f(\cdot)$ is the Sigmoid activation function. After obtaining the upper triangular matrix $\tilde{A}^{inv}$, the complete adjacency matrix is constructed by symmetrization and adding self-loops:

$$A^{inv} = \tilde{A}^{inv} + (\tilde{A}^{inv})^{\top} + I_{d} \tag{5}$$

To maintain sparsity, a threshold $\epsilon$ is introduced for filtering the entries of $\tilde{A}^{inv}$:

$$\tilde{a}_{l,q}^{inv} = \begin{cases} a_{l,q}^{inv}, & \text{if } a_{l,q}^{inv} > \epsilon, \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$
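As a rough sketch of Equations (4) to (6), the snippet below scores every variable pair, thresholds the scores, keeps the strict upper triangle, and then symmetrizes and adds self-loops. The function name `learn_invariant_adjacency` and the random toy inputs are assumptions for illustration; in the actual model $\theta_{c}$ would be trained rather than sampled.

```python
import numpy as np

def learn_invariant_adjacency(X, theta_c, eps):
    """Build A^inv from a (d, w) data matrix X whose row l is the vector v_l.

    theta_c has length 2w and is shared by all domains.
    """
    d, w = X.shape
    # theta_c . concat(v_l, v_q) splits into theta_c[:w] . v_l + theta_c[w:] . v_q
    left = X @ theta_c[:w]
    right = X @ theta_c[w:]
    scores = 1.0 / (1.0 + np.exp(-(left[:, None] + right[None, :])))  # Eq. (4), sigmoid
    scores = np.where(scores > eps, scores, 0.0)                      # Eq. (6), sparsify
    upper = np.triu(scores, k=1)                                      # keep pairs with l < q
    return upper + upper.T + np.eye(d)                                # Eq. (5)

# Hypothetical toy input: d = 4 variables, window length w = 6.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
theta_c = rng.normal(size=12)
A_inv = learn_invariant_adjacency(X, theta_c, eps=0.5)
```

Because only the upper triangle is scored before symmetrization, the result is symmetric by construction, and every off-diagonal entry is either zero or a sigmoid value above the threshold.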

(2) Specific Graph Learning Layer

The parameters of this layer are private to each domain and are not shared across domains. For domain $k$, the relation strength is calculated with the domain-specific parameter $\theta_{p}^{(k)}$:

$$a_{l,q}^{spe,(k)} = f\left(\theta_{p}^{(k)} \cdot (v_{l}^{(k)} \oplus v_{q}^{(k)})\right) \tag{7}$$

After symmetrization, adding self-loops, and sparsification, we obtain the domain-specific graph adjacency matrix $A_{spe}^{k}$.

Through the above dual-channel graph construction, the data of each domain is simultaneously represented as two graphs: an invariant graph depicting cross-domain commonalities, and a domain-specific graph reflecting the domain’s characteristics. For example, for the i-th source domain $s_{i}$, we obtain $A_{inv}^{s_{i}}$ and $A_{spe}^{s_{i}}$; for the target domain $t$, we obtain $A_{inv}^{t}$ and $A_{spe}^{t}$.

2.2.3. Feature Extraction and Alignment Module

The feature extraction and alignment module is responsible for extracting features from the graph structure and learning cross-domain invariant features through multiple constraints. This module works collaboratively from two levels: “invariant relation alignment” and “specific information filtering”.

(1) Invariant Relation Alignment

Invariant relation alignment aims to ensure that the invariant relations between pairs of the same variables in different domains are as consistent as possible, and that the graph neural networks used to extract invariant features also have consistent parameters. IS-BiGNN achieves this goal through topological similarity constraints and parameter-sharing GNN.

Topological Similarity Constraints: A topological vector $r_{q}^{i}$ is defined for each variable in each domain. The topological difference between different domains $i, j$ is measured by L2 distance, and the topological loss for all domain pairs is

$$L_{top} = \sum_{(i,j)} \sum_{q=1}^{d} \lVert r_{q}^{i} - r_{q}^{j} \rVert_{2}^{2} \tag{8}$$
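A small numerical sketch of Equation (8) follows. The text does not spell out how the topological vector $r_{q}^{i}$ is built, so this example assumes it is row $q$ of the invariant adjacency matrix of domain $i$; under that assumption the inner sum over $q$ equals the squared Frobenius distance between two adjacency matrices. The function name `topological_loss` is hypothetical.

```python
import numpy as np
from itertools import combinations

def topological_loss(adjacencies):
    """Sum of ||r_q^i - r_q^j||_2^2 over unordered domain pairs (i, j) and variables q.

    Assumption: r_q^i is row q of the adjacency matrix of domain i.
    """
    loss = 0.0
    for A_i, A_j in combinations(adjacencies, 2):
        loss += float(np.sum((A_i - A_j) ** 2))
    return loss

# Hypothetical toy graphs: two identical domains and one that differs.
A1 = np.eye(2)
A2 = np.array([[1.0, 0.5],
               [0.5, 1.0]])
loss_same = topological_loss([A1, A1])
loss_mixed = topological_loss([A1, A1, A2])
```

Identical graphs contribute nothing, while each (A1, A2) pair contributes 0.5² + 0.5² = 0.5, so the three-domain example sums to 1.0.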

Parameter-Sharing GNNs: Besides maintaining a consistent graph topology, the GNN parameters for extracting features from invariant graphs should also be shared across all domains. IS-BiGNN uses graph convolution operations for feature extraction:

$$h_{c} = \mathrm{GNN}_{c}(X, A^{inv}) = \sigma\left((\tilde{D}^{inv})^{-1/2} A^{inv} (\tilde{D}^{inv})^{-1/2} X W_{shared}\right) \tag{9}$$

where $X \in \mathbb{R}^{d \times w}$ is the input data matrix, $A^{inv}$ is the invariant graph adjacency matrix, $\tilde{D}^{inv}$ is the degree matrix of $A^{inv}$, $W_{shared} \in \mathbb{R}^{w \times h}$ is the trainable weight matrix shared by all domains, $\sigma(\cdot)$ is the non-linear activation function (usually ReLU), and $h_{c} \in \mathbb{R}^{d \times h}$ are the extracted invariant features.

$\mathrm{GNN}_{c}$ shares the parameter $W_{shared}$ across all source and target domains, ensuring that the same invariant relations are mapped to the same feature space. Domain-specific features are extracted from a domain-specific GNN:

$$h_{p}^{k} = \mathrm{GNN}_{p}^{k}(X_{k}, A_{spe}^{k}) = \sigma\left((\tilde{D}_{spe}^{k})^{-1/2} A_{spe}^{k} (\tilde{D}_{spe}^{k})^{-1/2} X_{k} W_{unshared}^{k}\right) \tag{10}$$

Here, $W_{unshared}^{k}$ is a weight parameter unique to domain $k$, and $h_{p}^{k}$ is the specific feature of domain $k$. The $\mathrm{GNN}_{p}^{k}$ of each domain does not share parameters, so that the specific features of each domain can fully capture the private data characteristics of that domain.

(2) Specific Information Filtering: The purpose of specific information filtering is to ensure that the extracted invariant feature $h_{c}$ does not contain domain-specific information. This is achieved in two ways: a domain classification task and a feature orthogonality constraint.

Domain Classification Task: A domain classifier is connected after the specific feature $h_{p}$, so that $h_{p}$ contains rich domain-discriminative information. The loss $L_{domain}$ is the cross-entropy over domain labels:

$$L_{domain} = -\sum_{k} I[d = k] \log p_{domain,k} \tag{11}$$

where $I[d = k]$ is an indicator function, taking a value of 1 when the true domain label of the sample is $k$, and 0 otherwise, and $p_{domain,k}$ is the predicted probability of domain $k$. Minimizing $L_{domain}$ drives $\mathrm{GNN}_{p}$ to learn specific features with strong domain discriminative power, making them contain as much domain-specific information as possible.

Feature Orthogonality Constraint: This requires that the invariant features $h_{c}$ and specific features $h_{p}$ of the same batch of samples be as orthogonal as possible in the vector space.

$$L_{dec} = \lVert (h_{c})^{\top} h_{p} \rVert_{F}^{2} \tag{12}$$

Here, $(h_{c})^{\top} h_{p}$ is the inner product of the two feature matrices, and $\lVert \cdot \rVert_{F}$ is the Frobenius norm. The loss reaches its minimum of 0 when $h_{c}$ and $h_{p}$ are perfectly orthogonal in the inner product space. The orthogonality constraint decouples invariant features from specific features at the numerical optimization level, preventing domain-specific information from contaminating the invariant features.
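The orthogonality loss of Equation (12) is easy to check numerically. The sketch below uses small hand-made feature matrices (hypothetical values, not data from the paper) so that the orthogonal and non-orthogonal cases can be verified by inspection.

```python
import numpy as np

def orthogonality_loss(h_c, h_p):
    """Squared Frobenius norm of h_c^T h_p, as in Equation (12)."""
    return float(np.sum((h_c.T @ h_p) ** 2))

# Columns of h_c lie along the first coordinate; columns of h_p avoid it.
h_c = np.array([[1.0, 0.0],
                [0.0, 0.0],
                [0.0, 0.0]])
h_p = np.array([[0.0, 0.0],
                [2.0, 0.0],
                [0.0, 3.0]])
loss_orthogonal = orthogonality_loss(h_c, h_p)  # every column pair is orthogonal
loss_overlap = orthogonality_loss(h_c, h_c)     # features fully overlap with themselves
```

The first value is exactly zero because every column of `h_c` is orthogonal to every column of `h_p`; the second is 1.0 because `h_c.T @ h_c` has a single non-zero entry equal to 1.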

2.2.4. Joint Training Objectives

Combining the above loss terms and adding the task loss from downstream V/G value prediction, the complete training objective of IS-BiGNN is

$$L = \alpha \cdot L_{top} + \beta \cdot L_{dec} + \sum_{i=1}^{N} L_{task}^{s,i} + L_{task}^{t} + \gamma \cdot \left(\sum_{i=1}^{N} L_{domain}^{s,i} + L_{domain}^{t}\right) \tag{13}$$

where $L_{task}^{s,i}$ and $L_{task}^{t}$ are the downstream task losses for the i-th source and target domains, respectively, and mean squared error loss is used in this paper:

$$L_{task} = \frac{1}{m} \sum_{j=1}^{m} (y_{j} - \hat{y}_{j})^{2} \tag{14}$$

where $m$ is the number of samples, $y_{j}$ is the true V/G value, and $\hat{y}_{j}$ is the model predicted value (obtained from the invariant feature $h_{c}$ through a fully connected regression layer). $\alpha, \beta, \gamma$ are hyperparameters that control the relative weights of the topological similarity loss, feature orthogonality loss, and domain classification loss, respectively. These hyperparameters are typically determined on the validation set through grid search or Bayesian optimization.

3. V/G Value Prediction Method Based on Multi-Condition Invariant Feature Extraction

Although the IS-BiGNN model provides an excellent theoretical framework for soft measurement of multi-condition industrial processes and has achieved good results in multiple industrial scenarios, its inherent limitations are significantly amplified when it is applied directly to V/G value prediction in the Czochralski silicon single-crystal growth process, because that process is complex, dynamic, and multi-stage. As a result, the model cannot fully meet the practical needs of V/G value prediction, specifically in the following three aspects:

(1) The original IS-BiGNN uses a static graph structure, which cannot capture the dynamic drift and multi-stage differences in the variable relationships during silicon single-crystal growth. It may misjudge stage changes within the same domain as inter-domain differences, hindering invariant feature extraction.

(2) The model treats all source domains equally during training, failing to distinguish the similarity between each source domain and the target domain. This leads to negative transfer from low-similarity source domains, weakening the cross-condition generalization ability.

(3) The model only performs coarse-grained alignment at the overall graph level, ignoring the fine-grained correspondence between samples in the same stage, and relies solely on labeled data, failing to utilize the operating condition information contained in the large number of unlabeled samples in the target domain.

To address the limitations of the IS-BiGNN model in static graph modeling, equal treatment of the source domain, and global coarse-grained alignment, this paper proposes an adaptive multi-operating-condition invariant feature extraction method, named DWC-ISBiGNN (Dynamic Weighted Conditional IS-BiGNN). This method achieves accurate prediction of V/G values across operating conditions by organically integrating three mechanisms: dynamic sample graph construction, source-domain confidence weighting, and stage condition alignment–regression consistency constraints.

3.1. Dynamic Graph Construction and Feature Extraction

The model dynamically generates the graph topology in each training batch. A sample-level attention mechanism is used to calculate the correlation strength between any two process variables $v_{l}$ and $v_{q}$ in the current batch:

$$a_{l,q} = f\left(\theta_{c} \cdot (v_{l}^{batch} \oplus v_{q}^{batch})\right) \tag{15}$$

Here, $v_{l}^{batch}$ is a feature vector composed of the values of the l-th process variable of all samples in the current batch, $\oplus$ represents the vector concatenation operation, $\theta_{c}$ is a learnable attention parameter, and $f(\cdot)$ is the sigmoid activation function, which normalizes the correlation strength to the [0,1] interval.

To perceive the growth stage, a learnable “stage” global node is added and connected to all physical variable nodes. Its state update follows the GCN message-passing rule:

$$h_{stage}^{(l+1)} = \sigma\left(\tilde{D}_{stage}^{-1/2} \tilde{A}_{stage} \tilde{D}_{stage}^{-1/2} [h_{stage}^{(l)}; H_{var}^{(l)}] W_{stage}\right) \tag{16}$$

where $h_{stage}$ represents the stage node features, $H_{var}$ represents the physical variable node feature matrix, $\tilde{A}_{stage}$ represents the adjacency matrix between the stage node and the physical variable nodes, and $W_{stage}$ represents the learnable weights. After several layers of propagation, the hidden state of the stage node encodes the production stage information of the current batch, enabling the physical variable nodes to perceive the macroscopic stage they are in and overcoming the insensitivity of static graphs to stage information.
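One plausible reading of Equation (16) is sketched below: the variable graph is augmented with a single global node linked to every variable node, a normalized GCN step is applied to the stacked features, and the row belonging to the global node is taken as the updated stage state. The augmented-graph layout (stage node at index 0), the function name `stage_node_update`, and the toy inputs are assumptions and may differ from the authors' construction.

```python
import numpy as np

def stage_node_update(A_var, H_var, h_stage, W_stage):
    """One propagation step over the graph augmented with a global stage node.

    Assumptions: A_var is the (d, d) variable adjacency without self-loops,
    and the stage node is placed at index 0 with unit-weight edges to all variables.
    """
    d = A_var.shape[0]
    A_aug = np.zeros((d + 1, d + 1))
    A_aug[1:, 1:] = A_var
    A_aug[0, 1:] = 1.0                      # stage node -> every variable node
    A_aug[1:, 0] = 1.0                      # every variable node -> stage node
    A_tilde = A_aug + np.eye(d + 1)         # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    H_all = np.vstack([h_stage[None, :], H_var])        # [h_stage; H_var]
    H_next = np.maximum(A_hat @ H_all @ W_stage, 0.0)   # ReLU activation
    return H_next[0], H_next[1:]            # updated stage state, updated variable states

# Hypothetical toy input: two unconnected variables and a zero-initialized stage node.
A_var = np.zeros((2, 2))
H_var = np.eye(2)
h_stage = np.zeros(2)
new_stage, new_vars = stage_node_update(A_var, H_var, h_stage, np.eye(2))
```

Even though the two variables are not connected to each other, the stage node receives a contribution from both, which is the sense in which it summarizes the whole batch.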

Based on the dynamic graph structure, a parameter-sharing graph convolutional network is used to extract invariant features $h_{c}$, and a domain-private graph convolutional network is used to extract specific features $h_{p}^{k}$:

$$h_{c} = \mathrm{GNN}_{c}(X, A_{inv}(t), h_{stage}), \quad h_{p}^{k} = \mathrm{GNN}_{p}^{k}(X_{k}, A_{spe}^{k}(t)) \tag{17}$$

where $h_{c}$ represents the cross-domain shared invariant feature, and $h_{p}^{k}$ represents the specific feature of the k-th domain. The stage node information $h_{stage}$ is concatenated to the physical node features or used as a conditional input, enabling the invariant feature extraction process to perceive stage differences and thus learn a more accurate cross-domain commonality representation at different stages.

3.2. Adaptive Weighting of Source-Domain Credibility

Based on the invariant graph that best reflects the essential cross-domain relationship, the Frobenius norm is used to quantify the graph structure difference between the i-th source domain and the target domain:

dist (i, t) = {∥A_{s, i}^{inv} - A_{t}^{inv}∥}_{F}^{2}

(18)

The smaller the difference, the closer the variable association patterns between the source and target domains are. The differences are converted into normalized confidence weights using a Softmax function with a temperature coefficient

τ

:

w_{i} = \frac{e^{- dist (i, t) / τ}}{\sum_{i^{'}} e^{- dist (i^{'}, t) / τ}}

(19)

The temperature coefficient

τ

controls the steepness of the weight distribution: the smaller

τ

is, the weights of source regions with large differences tend to be close to 0, and the weights of source regions with small differences tend to be close to 1; the larger

τ

is, the smoother the weight distribution.

τ

is determined through cross-validation.
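Equations (18) and (19) can be sketched in a few lines. The function names and the toy adjacency matrices below are hypothetical; subtracting the minimum distance before exponentiation is a standard numerical-stability step that leaves the Softmax output unchanged.

```python
import math


def frobenius_sq(a, b):
    """Squared Frobenius norm of the difference of two equal-shape matrices, Eq. (18)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def credibility_weights(source_graphs, target_graph, tau=1.0):
    """Temperature Softmax over negative graph distances, Eq. (19)."""
    dists = [frobenius_sq(a, target_graph) for a in source_graphs]
    d_min = min(dists)  # shift for numerical stability; weights are unaffected
    exps = [math.exp(-(d - d_min) / tau) for d in dists]
    total = sum(exps)
    return [e / total for e in exps]


# Toy invariant graphs (illustrative values only).
target = [[0.0, 1.0], [1.0, 0.0]]
sources = [
    [[0.0, 1.0], [1.0, 0.0]],  # identical to the target, distance 0
    [[0.0, 0.5], [0.5, 0.0]],  # distance 0.5
    [[0.0, 0.0], [0.0, 0.0]],  # distance 2
]
w_smooth = credibility_weights(sources, target, tau=1.0)
w_sharp = credibility_weights(sources, target, tau=0.1)
print(w_smooth, w_sharp)
```

With the smaller temperature, almost all of the weight concentrates on the source graph closest to the target.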

The credibility weight

w_{i}

is applied to the source-domain terms in the topological similarity loss and the task loss, respectively, to achieve “enhancing high-quality source domains and suppressing irrelevant source domains”:

L_{top} = \sum_{i} w_{i} \cdot L_{top}^{s, i, t} + L_{top}^{s, s}

(20)

L_{task} = \sum_{i} w_{i} \cdot L_{task}^{s, i} + L_{task}^{t}

(21)

3.3. Conditional Alignment and Regression Consistency Constraints

Define a stage-specific conditional alignment loss, which forces samples from different domains in the same stage to align in the invariant feature space. Calculate the mean vector

μ_{d_{i}, g}

of the invariant features

h_{c}

of all samples in the g-th stage in the d-th domain, and then minimize the squared distance between the means of the same stage in different domains:

L_{cond} = \sum_{g} \sum_{d_{i} < d_{j}} {∥μ_{d_{i}, g} - μ_{d_{j}, g}∥}_{2}^{2}

(22)

where

μ_{d, g}

represents the mean of the invariant features in the d-th domain and the g-th stage.
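A minimal sketch of Equation (22) follows, assuming each sample carries a domain label and a stage label. The helper names are hypothetical, and a stage observed in only one domain contributes no pairs to the sum.

```python
from itertools import combinations


def stage_means(features, domains, stages):
    """Mean invariant feature vector for every (domain, stage) pair."""
    sums, counts = {}, {}
    for h, d, g in zip(features, domains, stages):
        key = (d, g)
        if key not in sums:
            sums[key] = [0.0] * len(h)
            counts[key] = 0
        sums[key] = [s + x for s, x in zip(sums[key], h)]
        counts[key] += 1
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


def conditional_alignment_loss(features, domains, stages):
    """Sum of squared distances between same-stage means of different domains."""
    mu = stage_means(features, domains, stages)
    loss = 0.0
    for g in sorted(set(stages)):
        doms = sorted(d for (d, gg) in mu if gg == g)
        for di, dj in combinations(doms, 2):
            loss += sum((a - b) ** 2 for a, b in zip(mu[(di, g)], mu[(dj, g)]))
    return loss


# Toy data: two domains share stage 0; stage 1 appears in domain 0 only.
feats = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, 1.0], [9.0, 9.0]]
doms = [0, 0, 1, 1, 0]
stgs = [0, 0, 0, 0, 1]
l_cond = conditional_alignment_loss(feats, doms, stgs)
print(l_cond)
```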

A teacher–student semi-supervised framework is introduced. The student model is updated via gradient descent, while the teacher model parameters are updated from the student model using exponential moving average (EMA):

θ_{teacher} \leftarrow ρ \cdot θ_{teacher} + (1 - ρ) \cdot θ_{student}

(23)

The decay coefficient is set to

ρ = 0.99

. Feeding the unlabeled target-domain samples into the student model and the teacher model yields the predicted values

y_{student}

,

y_{teacher}

and the intermediate-layer features

z_{student}

,

z_{teacher}

. The regression consistency loss is defined as

L_{cons} = MSE (y_{student}, y_{teacher}) + λ_{feat} \cdot MSE (z_{student}, z_{teacher})

(24)

where

λ_{feat} = 0.5

. The weight of

L_{cons}

follows an exponential warm-up schedule:

λ_{cons} (t) = λ_{cons, \max} \cdot (1 - e^{- t / T})

(25)

where t is the number of training steps, T is the total number of warm-up steps (set to 80), and

λ_{cons, \max}

is the maximum weight.
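Equations (23) to (25) can be sketched together as follows. Parameters are flattened into plain lists for brevity, the function names are hypothetical, and ρ, λ_feat, T, and the maximum weight are passed as arguments rather than fixed.

```python
import math


def ema_update(teacher, student, rho):
    """Eq. (23): move each teacher parameter slightly toward the student."""
    return [rho * t + (1.0 - rho) * s for t, s in zip(teacher, student)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def consistency_loss(y_student, y_teacher, z_student, z_teacher, lam_feat):
    """Eq. (24): prediction consistency plus weighted feature consistency."""
    return mse(y_student, y_teacher) + lam_feat * mse(z_student, z_teacher)


def warmup_weight(step, warmup_steps, lam_max):
    """Eq. (25): exponential warm-up of the consistency weight."""
    return lam_max * (1.0 - math.exp(-step / warmup_steps))


teacher = ema_update([1.0, 0.0], [0.0, 0.0], rho=0.99)
l_cons = consistency_loss([1.0, 2.0], [1.0, 2.0], [0.0], [1.0], lam_feat=0.5)
schedule = [warmup_weight(t, 80, 1.0) for t in (0, 80, 400)]
print(teacher, l_cons, schedule)
```

Note that this schedule approaches the maximum weight asymptotically: at t = T it has reached about 63% of the maximum (1 − e⁻¹), and it is within 1% of the maximum only after roughly 5T steps.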

3.4. Overall Model Framework and Training Objectives

The three improvement strategies mentioned above are organically integrated to construct the DWC-ISBiGNN model based on multi-condition invariant feature extraction, as shown in Figure 3. First, a dynamic graph structure adapted to the current operating condition is generated through a dynamic sample graph construction strategy, enabling accurate capture of dynamic changes in process variables and growth stage information. Based on the constructed dynamic graph structure, feature extraction is performed, obtaining cross-domain invariant features and domain-specific features. The similarity between each source domain and the target domain is calculated through a source-domain credibility evaluation module, and source-domain weights are dynamically allocated accordingly, strengthening the guiding role of high-quality source domains and suppressing interference from irrelevant source domains. Fine-grained stage condition alignment constraints achieve accurate alignment of sample features from different domains in the same growth stage, and regression consistency constraints are used to mine the potential value of unlabeled data in the target domain, further improving the extraction quality of invariant features. Finally, based on the output results of each step, an end-to-end joint training method is used to optimize the model parameters overall, ensuring that the model has excellent cross-condition generalization ability and V/G value prediction accuracy. To achieve optimized training and accurate prediction of the model, and considering the core objectives of each improvement module, the final training objective of the model is defined as a weighted sum of the loss terms, as follows:

L = α \cdot L_{top} + β \cdot L_{dec} + L_{task} + γ \cdot L_{domain} + λ_{cond} \cdot L_{cond} + λ_{cons} (t) \cdot L_{cons}

(26)

All weights are determined through cross-validation to ensure optimal predictive performance of the model.
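The weighted sum in Equation (26) is simple to express in code. In this sketch the consistency term is scaled by the warm-up weight of Equation (25), so its contribution is zero at the first step; the function name and the placeholder loss values are hypothetical.

```python
def total_loss(l_top, l_dec, l_task, l_domain, l_cond, l_cons,
               alpha, beta, gamma, lam_cond, lam_cons_t):
    """Eq. (26): the task loss has unit weight; every other term is scaled."""
    return (alpha * l_top + beta * l_dec + l_task + gamma * l_domain
            + lam_cond * l_cond + lam_cons_t * l_cons)


# Placeholder loss values of 1.0, used only to show the relative weighting.
example = total_loss(1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
                     alpha=0.2, beta=0.02, gamma=0.001,
                     lam_cond=0.01, lam_cons_t=0.0)
print(example)
```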

3.5. Methodological Differences from IS-BiGNN

DWC-ISBiGNN is developed from the IS-BiGNN framework, but the modifications are designed for the specific non-stationary and multi-stage characteristics of Czochralski silicon growth rather than for simple parameter tuning. The original IS-BiGNN separates invariant and domain-specific graph representations, which is useful for transfer learning. However, when it is directly applied to V/G prediction in the CZ process, three practical limitations become evident.

The first limitation is the static treatment of graph structure. In the original IS-BiGNN, the graph learned for a domain is assumed to be representative of the variable relationships in that domain. During constant-diameter CZ growth, however, the relationships among pulling speed, heater power, crucible motion, melt-level-related variables, and crystal geometry are not fixed. They evolve with crystal length and thermal history. Therefore, DWC-ISBiGNN introduces dynamic graph construction and stage-aware global nodes, so that the graph topology can adapt to the current operating state instead of remaining fixed throughout the growth process.

The second limitation is the equal treatment of source domains. In length-dependent CZ growth data, different source domains do not have the same relevance to the target domain. Some earlier length ranges may share similar thermal and growth characteristics with the target region, whereas others may differ substantially. If all source domains are forced to contribute equally, less relevant domains may introduce negative transfer. For this reason, DWC-ISBiGNN introduces a source-domain credibility weighting mechanism based on invariant graph–structure distance, allowing more relevant source domains to contribute more strongly to transfer learning.

The third limitation is the insufficient use of unlabeled target-domain trajectories. In industrial production, reliable V/G labels are difficult to obtain, whereas unlabeled process trajectories from the target domain are relatively abundant. DWC-ISBiGNN therefore combines stage-conditional alignment with teacher–student regression consistency. The former avoids aligning samples from incomparable growth stages, and the latter uses unlabeled target samples to regularize the regression behavior. In this way, the proposed model modifies the graph construction, source-transfer strategy, and target-domain adaptation mechanism of IS-BiGNN to better match the physical and data characteristics of CZ silicon growth.

3.6. Algorithm Overview

The overall process of the DWC-ISBiGNN algorithm is as follows: first, for each training batch, an invariant graph

A_{inv} (t)

and a specific graph

A_{spe}^{k} (t)

are dynamically generated through sample-level attention, and the growth stage information

h_{stage}

is encoded using stage-aware global nodes. Then, based on the dynamic graph structure, a shared GNN is used to extract invariant features

h_{c}

, and a domain-private GNN is used to extract specific features

h_{p}^{k}

. Next, the Frobenius distance between the invariant graphs of each source domain and the target domain is calculated, and the source-domain confidence weight

w_{i}

is obtained through Softmax with a temperature coefficient, and a weighted topological similarity loss

L_{top}

and task loss

L_{task}

are applied. Simultaneously, a stage conditional alignment loss

L_{cond}

is introduced to force consistent cross-domain feature distribution within the same stage, and a teacher–student semi-supervised framework is constructed, applying a regression consistency loss

L_{cons}

to the unlabeled data in the target domain (using EMA updates and a warm-up strategy). Finally, the total loss is jointly optimized as

L = α \cdot L_{top} + β \cdot L_{dec} + L_{task} + γ \cdot L_{domain} + λ_{cond} \cdot L_{cond} + λ_{cons} (t) \cdot L_{cons}

, and the end-to-end model is trained. The specific implementation is shown in Algorithm 1.

Algorithm 1 Training Process of DWC-ISBiGNN

Require: Labeled source-domain sets $\{S_{1}, S_{2}, \dots, S_{N}\}$, labeled target-domain set $D_{t}^{l}$, unlabeled target-domain set $D_{t}^{u}$, hyperparameters $Θ$
Ensure: Trained DWC-ISBiGNN model
1: Initialize student model $θ_{s}$, teacher model $θ_{t} \leftarrow θ_{s}$, AdamW optimizer, step $t = 0$
2: for $epoch = 1$ to $E$ do
3:     Mix all data and split into batches of size $B$
4:     for each batch in the batch set do
5:         $t \leftarrow t + 1$
6:         For each domain, generate dynamic invariant graph $A_{inv}^{k}$ and specific graph $A_{spe}^{k}$; inject stage-aware nodes into the graph structure
7:         Shared GNN extracts invariant features $h_{c}$; private GNN extracts specific features $h_{p}^{k}$
8:         Compute invariant graph distance $dist (i, t)$ and adaptive weight $w_{i}$
9:         Teacher and student models predict unlabeled samples; compute $L_{cons}$
10:        Dynamic weight: $λ_{cons} (t) = λ_{cons, \max} \cdot (1 - e^{- t / T_{w}})$
11:        Total loss: $L = α L_{top} + β L_{dec} + L_{task} + γ L_{domain} + λ_{cond} L_{cond} + λ_{cons} (t) L_{cons}$
12:        Backpropagate to update $θ_{s}$
13:        EMA update: $θ_{t} \leftarrow ρ \cdot θ_{t} + (1 - ρ) \cdot θ_{s}$
14:    end for
15:    Evaluate on validation set; stop early if no improvement for 30 epochs
16: end for
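Step 15 of Algorithm 1 can be sketched as a small patience counter. Only the patience of 30 epochs comes from the algorithm; the class name `EarlyStopping` and the choice of a validation error (lower is better) as the monitored quantity are assumptions.

```python
class EarlyStopping:
    """Signal a stop when the monitored value has not improved for `patience` epochs."""

    def __init__(self, patience=30):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_error):
        """Record one epoch's validation error; return True if training should stop."""
        if val_error < self.best:
            self.best = val_error
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Short demonstration with a patience of 2 instead of 30.
stopper = EarlyStopping(patience=2)
decisions = [stopper.step(v) for v in (1.0, 0.9, 0.95, 0.97)]
print(decisions, stopper.best)
```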

4. Experimental Verification and Result Analysis

To verify the effectiveness of the proposed DWC-ISBiGNN model in predicting the V/G value of single-crystal silicon, this section presents a progressive set of experiments. First, the composition of the experimental data, the preprocessing procedures, the evaluation metrics, and the experimental configuration are described. Then, an overall performance comparison, module ablation experiments, and an analysis of the learned graph structure are carried out, examining the model in terms of prediction accuracy, generalization ability, and mechanism of action.

4.1. Data Description and Experimental Setup

4.1.1. Data Source and Composition

The experimental data come from the actual single-crystal silicon production records of a domestic manufacturer and cover multiple production batches. Data from the constant-diameter growth stage of complete production batches were selected. Based on the thermal field configuration and differences in raw material batches, the data are divided into four source domains (denoted as Source A, Source B, Source C, and Source D) and one target domain (denoted as Target 1). The statistical information of the samples in each domain is shown in Table 1.

The proportion of labeled samples in the target domain is very low, accounting for only 15% of the total samples in the target domain. This matches the characteristics of actual silicon single-crystal production, where V/G values are difficult to measure online and labeled data are scarce.

The temperature gradient G at the solid–liquid interface was obtained through numerical simulations of the thermal field during the Czochralski growth process. The simulations were performed using the commercial software package CGSim (STR Group). The simulation model is designed to be highly consistent with the actual physical system, incorporating key factors such as heat transfer via conduction, radiation, and convection within the crucible, silicon melt, and growing crystal. The thermal-field simulation was calibrated against experimental temperature measurements from the production furnace to ensure its accuracy. The temperature gradient G was extracted from the calibrated simulated thermal field at the solid–liquid interface and then correlated with the experimentally recorded pulling speed V to obtain the V/G values used in this study.

To further assess the reliability of the simulation-derived V/G labels, the simulated V/G trends were cross-validated against post-growth defect inspection data. Specifically, the defect tendency inferred from the simulated V/G values was compared with the experimentally observed defect patterns at the tail regions of sampled ingots. The comparison showed agreement for approximately 90% of the inspected ingots, indicating that the calibrated CGSim model can reasonably capture the defect-sensitive variation in V/G. Considering the residual deviation in thermal-field calibration and the measurement noise in the recorded process variables, the overall uncertainty of the generated V/G labels was estimated to be within approximately

\pm 5\%

.

A total of 14 input variables were selected: crystal diameter, crystal rise rate, main heating power, auxiliary heating power, crystal rotation speed, crucible rotation speed, crucible rise rate, heating element temperature, liquid surface temperature feedback, crystal weight, crystal length, meniscus bottom temperature, meniscus height, and growth rate. V/G value labels were jointly calibrated based on offline numerical simulation results and partial batch tail ingot defect detection data. Each sample is a time window of length w = 6, so the input dimension is

d \times w = 14 \times 6

.
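The windowing step can be illustrated as below. The window length w = 6 and the 14 variables come from the text; the stride of one time step and the function name `make_windows` are assumptions, since the overlap between consecutive windows is not stated.

```python
def make_windows(series, w=6):
    """Slice a T x d series (list of rows) into overlapping w x d windows, stride 1."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


# Toy series: 10 time steps of 14 variables (values are placeholders).
series = [[float(t)] * 14 for t in range(10)]
windows = make_windows(series, w=6)
print(len(windows), len(windows[0]), len(windows[0][0]))
```

A series shorter than the window length yields no samples.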

To quantify the differences among the length-defined domains, Table 2 summarizes the mean and standard deviation of key process variables for each domain. As shown, the target domain (755–932 mm) exhibits a higher average main heater power, a lower average growth speed, and a lower mean V/G value than most source domains. These differences confirm that the length-defined domains correspond to distinct operating distributions within the constant-diameter stage, thereby providing a practical basis for the multi-domain transfer-learning setting.

To further quantify the distribution shift, the Kolmogorov–Smirnov (KS) statistic and Wasserstein distance were calculated between each source domain and the target domain. As shown in Table 3, clear distribution shifts exist in main heater power, growth speed, and V/G. The results also indicate that different source domains have different degrees of relevance to the target domain, which motivates the adaptive source-domain credibility weighting mechanism in DWC-ISBiGNN.
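For reference, both shift measures can be computed for a single variable directly from the empirical distribution functions. The sketch below is a plain standard-library version with hypothetical function names; it is not the code used to produce Table 3.

```python
import bisect


def ecdf(sorted_sample, x):
    """Empirical CDF of a sorted sample evaluated at x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)


def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance: area between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    grid = sorted(set(sa + sb))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both CDFs are constant on [left, right), so the area is a rectangle.
        total += abs(ecdf(sa, left) - ecdf(sb, left)) * (right - left)
    return total


# Toy check: shifting a sample by 1 gives a Wasserstein distance of 1.
source = [0.0, 1.0, 2.0]
target = [1.0, 2.0, 3.0]
print(ks_statistic(source, target), wasserstein_1d(source, target))
```

The KS statistic is bounded by 1 and insensitive to how far apart two non-overlapping samples are, whereas the Wasserstein distance grows with the size of the shift, which is why the two are reported together.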

4.1.2. Data Preprocessing

To suppress high-frequency random noise, median filtering is applied to the original time-series signals. Each process variable is then min–max normalized independently within each domain, mapping its values to the [0,1] interval:

{\tilde{v}}^{l} = \frac{v^{l} - v_{\min}^{l}}{v_{\max}^{l} - v_{\min}^{l}}

(27)

where

v^{l}

is the original variable, and

v_{\min}^{l}

and

v_{\max}^{l}

are the minimum and maximum values of the variable in the corresponding domain, respectively.
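The two preprocessing steps can be sketched for a single variable as follows. The filter window size and the edge handling (repeating the first and last values) are assumptions, as neither is specified; the normalization follows Equation (27) with the minimum and maximum taken within each domain.

```python
from statistics import median


def median_filter(signal, k=3):
    """Sliding median with an odd window k; edges are padded by repeating end values."""
    if k % 2 == 0:
        raise ValueError("window size k must be odd")
    half = k // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + k]) for i in range(len(signal))]


def minmax_per_domain(values, domains):
    """Eq. (27): scale each value with the min and max of its own domain."""
    lo, hi = {}, {}
    for v, d in zip(values, domains):
        lo[d] = min(v, lo.get(d, v))
        hi[d] = max(v, hi.get(d, v))
    scaled = []
    for v, d in zip(values, domains):
        span = hi[d] - lo[d]
        # A constant variable has zero span; map it to 0 to avoid division by zero.
        scaled.append((v - lo[d]) / span if span > 0 else 0.0)
    return scaled


smoothed = median_filter([1, 100, 2, 3, 4], k=3)
scaled = minmax_per_domain([10, 20, 30, 100, 200], ["A", "A", "A", "B", "B"])
print(smoothed, scaled)
```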

4.1.3. Evaluation Metrics

The prediction performance is comprehensively evaluated using three common quantitative metrics for regression tasks:

1. Root Mean Square Error (RMSE)

RMSE = \sqrt{\frac{1}{m} \sum_{i = 1}^{m} {(y_{i} - {\hat{y}}_{i})}^{2}}

(28)

2. Mean Absolute Error (MAE)

MAE = \frac{1}{m} \sum_{i = 1}^{m} |y_{i} - {\hat{y}}_{i}|

(29)

3. Coefficient of Determination (R²)

R^{2} = 1 - \frac{\sum_{i = 1}^{m} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{m} {(y_{i} - \bar{y})}^{2}}

(30)

where m is the number of test samples,

y_{i}

is the true V/G value,

{\hat{y}}_{i}

is the model predicted value, and

\bar{y}

is the mean of the true values. The smaller the RMSE and MAE, and the closer

R^{2}

is to 1, the better the model’s prediction accuracy and generalization ability.
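Equations (28) to (30) translate directly into code. The function names and the toy values are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, Eq. (28)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (29)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination, Eq. (30)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2_score(y_true, y_pred))
```

Because RMSE squares the residuals, it penalizes occasional large errors more heavily than MAE, so a model can rank differently on the two metrics.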

4.1.4. Experimental Environment and Parameter Configuration

All models are implemented based on the PyTorch 2.9.1 framework, using the Adam optimizer, with a learning rate of

3 \times 10^{- 3}

, a batch size of 64, and 250 training epochs. The key hyperparameters of the model are as follows:

α = 0.2, β = 0.02, γ = 0.001, λ_{cond} = 0.01

; initial temperature coefficient

τ_{0} = 1

, which is then adjusted continuously during training through learning; warm-up steps

T_{w} = 30

; maximum consistency loss weight

λ_{cons, \max} = 0.095

; feature consistency coefficient

λ_{feat} = 0.1

; EMA decay coefficient

ρ = 0.999

.

To ensure the comprehensiveness and fairness of the comparison, four mainstream baseline models were selected for performance benchmarking. The specific configurations of each model are shown in Table 4. All baseline models were trained using only all labeled data from the source domain and a small amount of labeled data from the target domain, without introducing unlabeled data, and maintaining consistency between the input variables and the preprocessing procedure.

4.2. Overall Performance Comparison Experiment

To verify the effectiveness of the proposed DWC-ISBiGNN, a comprehensive performance comparison was conducted on the target domain against the original IS-BiGNN model, four mainstream baseline models (deep neural network (DNN), extreme gradient boosting (XGBoost), LSTM, and Transformer), and a Domain-Adversarial Neural Network with Long Short-Term Memory (DANN-LSTM). All experiments used the same experimental configuration, data preprocessing procedures, and evaluation metrics to ensure a fair and reliable comparison. Each model was retrained multiple times, and the average result was used as the final performance metric to reduce the influence of random factors. The comparison results are shown in Table 5. For clarity, all figures in this section display only the first 500 consecutive samples from the test set; all quantitative metrics are computed on the full test set.

To more intuitively demonstrate the performance differences between the models, Figure 4 shows the experimental results comparing the model predictions with the actual values.

Table 5 and Figure 4 show that the temporal modeling methods (LSTM, Transformer) generally outperform the traditional static modeling methods (DNN, XGBoost). Specifically, the RMSE (0.005208) and MAE (0.003147) of the LSTM model are lower than those of DNN (RMSE = 0.006169, MAE = 0.004857) and XGBoost (RMSE = 0.006253, MAE = 0.005539), while the

R^{2}

(0.9273) is higher than that of both DNN (0.898) and XGBoost (0.8952). The Transformer model performs slightly better than the IS-BiGNN model, with an

R^{2}

of 0.9179, slightly lower than LSTM’s 0.9273, and a slightly higher RMSE (0.005535) than LSTM. This result demonstrates that the silicon single-crystal growth process exhibits strong time-series dependence, and the temporal evolution characteristics of process parameters significantly impact V/G value prediction. Temporal modeling methods (especially the short-term dependency capture capability of LSTM and the long-range correlation capture capability of Transformer) can effectively mine the temporal information of parameters, validating the necessity and superiority of temporal modeling in V/G value prediction. Traditional static modeling methods (DNN, XGBoost) do not consider the temporal correlation of process parameters, relying solely on parameters at a single moment for prediction, making it difficult to adapt to the dynamic process of silicon single-crystal growth, thus resulting in relatively low prediction accuracy.

Graph structure modeling can significantly improve V/G value prediction accuracy. The IS-BiGNN model (RMSE = 0.006088, MAE = 0.004385,

R^{2}

= 0.9006) outperforms the traditional static modeling methods (DNN, XGBoost) on all performance metrics, while slightly underperforming the temporal modeling methods (LSTM, Transformer). This result indicates that treating the 14 process variables as graph nodes and learning the coupling relationships between them in a data-driven manner can capture the intrinsic correlations among process parameters during silicon single-crystal growth. This overcomes the limitation of traditional models in modeling the interactions between parameters and is a key path to improving V/G prediction performance. Although IS-BiGNN does not incorporate temporal modeling, its graph structure still allows it to outperform the static models, further supporting the effectiveness of graph structure modeling in this task.

The proposed DWC-ISBiGNN achieves the best overall performance. As the improvement strategies are added one by one, model performance improves steadily, which supports the effectiveness and complementarity of each strategy. Specifically, its RMSE = 0.0041, MAE = 0.00285, and

R^{2}

= 0.9549. Compared to the original IS-BiGNN, the RMSE is reduced by 32.7%, the MAE by 35.0%, and the R² is increased by 5.43 percentage points. Compared to LSTM, the best of the four mainstream baseline models, the RMSE is reduced by 21.3%, the MAE by 9.4%, and the

R^{2}

is increased by 2.76 percentage points. Compared to the Transformer model, the RMSE is reduced by 25.9%, the MAE by 37.2%, and the

R^{2}

is increased by 3.70 percentage points. These results support the combined effectiveness of the three improvement strategies proposed in this paper: dynamic graph construction, source-domain confidence weighting, and conditional consistency constraints. Together, these strategies address the static graph modeling, equal weighting of source domains, and low data utilization of the IS-BiGNN model, improving the model’s prediction accuracy and generalization ability.
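The improvement figures quoted against IS-BiGNN can be reproduced from the reported metrics. In this sketch, error reductions are relative to the baseline value, whereas the R² gain is an absolute difference expressed in percentage points; the function names are hypothetical.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which an error metric drops relative to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


def point_gain(baseline, proposed):
    """Absolute increase of a [0, 1] score, in percentage points."""
    return 100.0 * (proposed - baseline)


# Metrics as reported in the text for IS-BiGNN and DWC-ISBiGNN.
is_bignn = {"rmse": 0.006088, "mae": 0.004385, "r2": 0.9006}
dwc = {"rmse": 0.0041, "mae": 0.00285, "r2": 0.9549}

rmse_drop = relative_reduction(is_bignn["rmse"], dwc["rmse"])
mae_drop = relative_reduction(is_bignn["mae"], dwc["mae"])
r2_gain = point_gain(is_bignn["r2"], dwc["r2"])
print(f"{rmse_drop:.1f}% {mae_drop:.1f}% {r2_gain:.2f} points")
# prints: 32.7% 35.0% 5.43 points
```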

To further benchmark the proposed method against a recent domain adaptation approach, we additionally implemented a DANN-LSTM model, which combines a Domain-Adversarial Neural Network (DANN) with LSTM for adversarial domain alignment and temporal sequence modeling. As shown in Table 5 and Figure 5, DANN-LSTM achieved an RMSE of 0.004475, an MAE of 0.003706, and an

R^{2}

of 0.9463. This result is notably better than all other baseline models and is competitive with the proposed DWC-ISBiGNN, demonstrating the benefit of adversarial domain adaptation for the multi-condition V/G prediction task. However, DWC-ISBiGNN still achieves superior performance, with RMSE reduced by 8.4% and

R^{2}

improved by 0.86 percentage points compared with DANN-LSTM. This additional improvement can be attributed to the explicit modeling of dynamic graph structures, adaptive source-domain weighting, and semi-supervised consistency learning, which jointly address the non-stationary and multi-stage characteristics of the CZ growth process that adversarial alignment alone cannot fully capture.

Comparison with Weighted Ensemble Baseline

To further examine whether the performance improvement of DWC-ISBiGNN can be achieved by a simpler model aggregation strategy, an additional weighted ensemble baseline was constructed. The ensemble used DNN, XGBoost, LSTM, and Transformer as base learners, and the ensemble weights were determined by non-negative least squares (NNLS). To avoid overly optimistic weight estimation, the NNLS weights were calibrated on held-out source-domain validation samples that were not used for training the base learners. Under this conservative calibration protocol, the NNLS optimization assigned the largest weight to XGBoost, reflecting its best fit to the source-domain validation samples rather than necessarily indicating the best target-domain performance.

Figure 6 compares the NNLS-weighted ensemble with the proposed DWC-ISBiGNN. The NNLS ensemble achieved an RMSE of 0.006247, an MAE of 0.005520, and an

R^{2}

of 0.895385. In contrast, DWC-ISBiGNN achieved an RMSE of 0.004100, an MAE of 0.002850, and an

R^{2}

of 0.954900. Compared with the NNLS ensemble, DWC-ISBiGNN reduced RMSE by 34.37% and MAE by 48.37%, while improving

R^{2}

by 5.95 percentage points. These results indicate that the proposed model’s advantage cannot be reproduced by a simple weighted aggregation of conventional baseline models.

4.3. Ablation Experiments and Module Validation

To further verify the independent contributions and synergistic effects of the three improvement strategies proposed in this paper, three model variants (D-ISBiGNN, DW-ISBiGNN, and DWC-ISBiGNN) were constructed by successively adding dynamic sample graph construction (D), source-domain confidence weighting (W), and conditional alignment and regression consistency constraints (C) to the original IS-BiGNN baseline. Incremental comparisons were then performed on the test set. The experimental results are shown in Table 6.

As shown in Figure 7, comparing the ablation experiment data and results, each improved module independently delivers performance gains, and the cumulative effect continuously increases. Compared to the baseline IS-BiGNN (RMSE = 0.006088,

R^{2}

= 0.9006), the D-ISBiGNN, which only incorporates dynamic sample graphs, reduces RMSE to 0.005717 (a decrease of 6.1%) and increases

R^{2}

to 0.9124, verifying the effectiveness of the dynamic graph structure in capturing non-stationary changes within the operating conditions. Adding source-domain confidence weighting (DW-ISBiGNN) yields the largest single gain, with RMSE dropping to 0.004343 (a decrease of 28.7% compared to IS-BiGNN) and

R^{2}

rising to 0.9494 (an increase of 4.88 percentage points), indicating that adaptive source-domain weighting effectively favors high-quality source domains and suppresses negative transfer. Finally, the complete model DWC-ISBiGNN, which incorporates conditional alignment and regression consistency constraints, further reduces the RMSE to 0.004100 (a 5.6% decrease relative to DW-ISBiGNN) and improves the

R^{2}

to 0.9549, while maintaining a low MAE, indicating that the semi-supervised consistency constraints can extract useful information from unlabeled data and further refine the feature representations. No step in the sequence degraded performance, indicating that the three improvements are complementary.

In summary, the ablation experiments show that the three strategies, namely dynamic sample graph construction (D), source-domain confidence weighting (W), and conditional alignment and regression consistency constraints (C), are each effective and reinforce one another, jointly improving the prediction accuracy of IS-BiGNN. Compared to the original IS-BiGNN, the complete model DWC-ISBiGNN shows a relative reduction of 32.7% in RMSE and an absolute increase of 5.43 percentage points in

R^{2}

, strongly supporting the advancement and rationality of the proposed improvement method.

4.4. Physical Interpretability Analysis of the Learned Graph Structure

To further investigate the physical interpretability of the proposed DWC-ISBiGNN model, the learned invariant adjacency matrix in the target domain was extracted and visualized, as shown in Figure 8. In this graph, each node represents one of the 14 measured process variables, and each edge weight denotes the learned interaction strength between two variables. Specifically, the nodes correspond to crystal diameter, crystal lift speed, main heater power, auxiliary heater power, crystal rotation speed, crucible rotation speed, crucible lift speed, heater temperature, melt surface temperature feedback, crystal weight, crystal length, meniscus bottom temperature, meniscus height, and growth speed. Therefore, the learned adjacency matrix provides a direct way to examine whether the graph structure captured by the model is consistent with known coupling relationships in the Czochralski silicon growth process.

As shown in Figure 8, several physically meaningful dependencies can be observed. The strongest interaction appears between crystal lift speed and growth speed, which is consistent with the pulling-growth dynamics in the Czochralski process. Crystal lift speed also shows strong connections with crucible lift speed, main heater power, auxiliary heater power, crystal length, and crystal weight, indicating that the pulling process is coupled with crucible movement, thermal input, and crystal geometry evolution. In addition, heater-related variables and growth-related variables are connected, suggesting that the model captures part of the coupling between thermal control and interface evolution.

It should be noted that the learned graph is not intended to be a strict mechanistic heat-transfer network. Instead, it represents data-driven invariant dependencies that are useful for V/G prediction. Some learned edges are consistent with known physical couplings, whereas others may reflect indirect correlations induced by process control strategies, operating stages, or batch-level variations. Nonetheless, the learned adjacency matrix provides useful interpretability for analyzing variable interactions and offers clues for understanding the underlying process dynamics.

5. Summary

To address the difficulty of measuring V/G values online, the data distribution shifts across multiple operating conditions, and the scarcity of labeled samples during Czochralski silicon single-crystal growth, this paper proposes DWC-ISBiGNN, a soft measurement method based on dynamically weighted, conditionally invariant feature extraction. Building upon the IS-BiGNN framework, the method introduces a dynamic sample graph construction mechanism (sample-level attention and stage-aware global nodes) that adjusts the graph topology with each batch, capturing the non-stationary characteristics and stage information of the growth process. A source-domain credibility evaluation module adaptively allocates source-domain weights based on the distance between invariant graph structures to suppress negative transfer. In addition, a stage-conditional alignment loss and a teacher–student semi-supervised regression consistency constraint are constructed to achieve fine-grained alignment of cross-domain features within the same stage and to exploit unlabeled data in the target domain. Experiments on industrial data from a 12-inch silicon single-crystal production line show that DWC-ISBiGNN achieves the best prediction performance across multiple scenarios, including batch size, thermal field, and process parameters, with RMSE and MAE reaching 0.0041 and 0.00285, respectively, and

R^{2}

reaching 0.9549. Compared to the original IS-BiGNN, RMSE is reduced by 32.7%, and

R^{2}

is increased by 5.43 percentage points in absolute terms. Ablation experiments further support the effectiveness and complementary gains of the three improvement strategies. This study provides a feasible technical solution for online detection of key parameters in silicon single-crystal growth and a reference for multi-condition soft measurement modeling of complex industrial processes.

It should be noted that the “invariant” features learned in this study are data-driven dependencies that are stable across the observed source domains and target domain under the same furnace configuration (i.e., same thermal field design, crucible geometry, and heat shield structure). Their generalizability to different furnace hardware (e.g., altered crucible material, heat shield geometry, or hot-zone design) is not guaranteed and would require recalibration or fine-tuning with data from the new configuration. This is an inherent limitation of all data-driven approaches, whose generalization is bounded by the coverage of the training data. Cross-furnace generalization is a promising direction for future research, where few-shot transfer learning or domain adaptation with limited data could be explored to enable rapid adaptation to new furnace designs.

Future work will focus on model lightweighting (e.g., knowledge distillation, pruning) and physical mechanism fusion (heat conduction equations, fluid dynamic constraints) to improve real-time inference capabilities and interpretability under extreme conditions.

Author Contributions

Conceptualization, Y.W.; methodology, C.-J.H.; software, C.-J.H. and H.-N.L.; formal analysis, D.L.; investigation, H.-N.L.; resources, J.-C.R.; data curation, J.-C.R.; writing—original draft preparation, Y.W.; writing—review and editing, all authors; visualization, H.-N.L.; supervision, D.L. and J.-C.R.; project administration, D.L.; funding acquisition, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China [grant numbers 62503387 and 62303376] and the National Major Scientific Instrument Development Project of China [grant number 62127809].

Data Availability Statement

The data presented in this study are proprietary and confidential, being part of an ongoing industrial design project. Due to trade-secret and intellectual-property restrictions, the raw data cannot be shared publicly. All results reported in this paper are based on internal analyses that are fully described in the methods section. Requests to access the datasets should be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figure 1. Evolution of node states in a traditional graph neural network.

Figure 2. Schematic diagram of the IS-BiGNN model structure.

Figure 3. Schematic diagram of the DWC-ISBiGNN model structure.

Figure 4. Comparison chart of predicted and actual values from various models.

Figure 5. Performance comparison between DANN-LSTM and the proposed DWC-ISBiGNN.

Figure 6. Comparison between the NNLS-weighted ensemble baseline and the proposed DWC-ISBiGNN model. The NNLS ensemble uses DNN, XGBoost, LSTM, and Transformer as base learners, while the DWC-ISBiGNN metrics are the paper-reported results.

Figure 7. Performance comparison of different model variants in ablation experiments.

Figure 8. Learned invariant adjacency matrix in the target domain. Each node represents a measured process variable, and brighter colors indicate stronger learned interaction strengths between variables.

Table 1. Experimental dataset statistics.

Domain	Type	Number of Labeled Samples	Number of Unlabeled Samples	Total Number of Samples
Source A	Source Domain	7262	0	7262
Source B	Source Domain	7275	0	7275
Source C	Source Domain	7279	0	7279
Source D	Source Domain	7244	0	7244
Target 1	Target Domain	1086	6154	7240

Table 2. Quantitative description of crystal-length-defined source and target domains.

Domain	Role	Crystal Length (mm)	Main Heater Power	Growth Speed
1	Source	0–179	71.6939 ± 1.4742	0.7427 ± 0.1398
2	Source	180–372	69.8924 ± 0.4259	0.7944 ± 0.1345
3	Source	373–567	70.0416 ± 0.4811	0.7955 ± 0.1059
4	Source	568–754	70.8077 ± 0.3977	0.7657 ± 0.0522
5	Target	755–932	71.7143 ± 0.3619	0.7282 ± 0.0351

Table 3. Distribution shift from each source domain to the target domain, measured by KS statistics and Wasserstein distance.

Source Domain	Main Heater Power KS	Growth Speed KS	V/G KS	V/G Wasserstein
1	0.4795	0.2863	0.2863	0.0322
2	0.9767	0.5929	0.5929	0.0544
3	0.9421	0.6243	0.6243	0.0452
4	0.7879	0.4617	0.4617	0.0229
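For readers who want to reproduce shift measures of the kind reported in Table 3, the following is a minimal pure-Python sketch of the two-sample KS statistic and the one-dimensional Wasserstein-1 distance, both computed from empirical CDFs. It assumes the standard definitions of these quantities; the paper does not state which implementation was used, and the sample values below are synthetic, not data from the production line.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Empirical CDF of a pre-sorted sample, evaluated at x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over observed points."""
    sa, sb = sorted(a), sorted(b)
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)


def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance: integral of |F_a(x) - F_b(x)| dx."""
    sa, sb = sorted(a), sorted(b)
    grid = sorted(sa + sb)
    total = 0.0
    # Both empirical CDFs are constant on each interval [left, right).
    for left, right in zip(grid, grid[1:]):
        total += abs(ecdf(sa, left) - ecdf(sb, left)) * (right - left)
    return total


# Synthetic example: the target sample is the source sample shifted by 2.
source = [0.0, 1.0, 2.0, 3.0]
target = [2.0, 3.0, 4.0, 5.0]
print(ks_statistic(source, target))
print(wasserstein_1d(source, target))
```

In practice, `scipy.stats.ks_2samp` and `scipy.stats.wasserstein_distance` provide the same quantities. The KS statistic is bounded in [0, 1] and is insensitive to the size of a shift once the two samples no longer overlap, whereas the Wasserstein distance is expressed in the units of the variable, which is why the two columns of Table 3 need not rank the source domains identically.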

Table 4. Configuration details of compared algorithms.

Algorithm	Configuration
DNN	Four-layer fully connected network, hidden layer dimensions 256-128-64-32, ReLU activation + Dropout regularization
LSTM	Two-layer LSTM, hidden dimension 128, followed by a fully connected regression layer
Transformer	Four-head self-attention + two-layer encoder, feed-forward network dimension 256
XGBoost	Gradient boosting tree ensemble, number of trees 500, max depth 8, learning rate 0.05
DANN-LSTM	Same LSTM backbone (two-layer LSTM, hidden dimension 128) with an additional domain-adversarial discriminator (two-layer MLP, hidden dimension 64) and gradient reversal layer. Labeled target samples used for regression; unlabeled target samples for domain alignment only

Table 5. Performance comparison of different models for V/G value prediction.

Model	RMSE	MAE	R²
DWC-ISBiGNN	0.0041	0.00285	0.9549
DANN-LSTM	0.004475	0.003706	0.9463
LSTM	0.005208	0.003147	0.9273
Transformer	0.005535	0.00454	0.9179
IS-BiGNN	0.006088	0.004385	0.9006
DNN	0.006169	0.004857	0.8980
XGBoost	0.006260	0.005542	0.8949
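The three columns of Table 5 can be computed as in the sketch below, assuming the standard definitions of RMSE, MAE, and the coefficient of determination R². The function names and the four sample values are illustrative assumptions and do not come from the paper's dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic values for illustration only.
y_true = [0.10, 0.12, 0.14, 0.16]
y_pred = [0.11, 0.12, 0.13, 0.16]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2_score(y_true, y_pred))
```

Because RMSE squares the residuals, it penalizes occasional large errors more heavily than MAE does. This is one plausible reason why two models in Table 5 can be ordered differently by the two metrics, as LSTM and DANN-LSTM are.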

Table 6. Performance comparison of ablation experiments.

Model	RMSE	MAE	R²
IS-BiGNN	0.006088	0.004385	0.9006
D-ISBiGNN	0.005717	0.004063	0.9124
DW-ISBiGNN	0.004343	0.002786	0.9494
DWC-ISBiGNN	0.0041	0.00285	0.9549

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
