Article

Optimizing Product Quality Prediction in Smart Manufacturing Through Parameter Transfer Learning: A Case Study in Hard Disk Drive Manufacturing

by
Somyot Kaitwanidvilai
1,*,
Chaiwat Sittisombut
1,
Yu Huang
2 and
Sthitie Bom
2
1
School of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
2
Seagate Technology LLC, Cupertino, CA 95014, USA
*
Author to whom correspondence should be addressed.
Processes 2025, 13(4), 962; https://doi.org/10.3390/pr13040962
Submission received: 8 February 2025 / Revised: 6 March 2025 / Accepted: 15 March 2025 / Published: 24 March 2025
(This article belongs to the Special Issue Process Automation and Smart Manufacturing in Industry 4.0/5.0)

Abstract

In recent years, the semiconductor industry has embraced advanced artificial intelligence (AI) techniques to facilitate intelligent manufacturing across its organizations, with particular emphasis on virtual metrology (VM) systems. Nonetheless, the practical application of data-driven virtual metrology for product quality inspection encounters notable hurdles, such as annotating inspections in highly dynamic industrial environments, which makes data acquisition and VM model training complex and expensive. To address these challenges, we delved into transfer learning (TL). TL offers a valuable avenue for knowledge sharing and for scaling AI models across various processes and factories, yet research on transfer learning in VM systems remains limited. We propose a novel parameter transfer learning (PTL) architecture for VM systems and examine its application in industrial process automation. We implemented cross-factory and cross-recipe transfer learning to enhance VM performance and offer practical advice on adapting TL to individual needs and use cases. By leveraging extensive data from Seagate wafer factories, known for their large scale and high dimensionality, we achieved significant PTL performance improvements across multiple performance metrics: in the cross-factory study, the true positive rate (TPR) increased by 29% and the false positive rate (FPR) decreased by 43%, while in the cross-recipe study, the TPR increased by 27.3% and the FPR decreased by 6.5%. With the proposed PTL architecture and its performance achievements, the data scarcity of new manufacturing sites, new production lines, and new products is addressed with shorter VM model training times, lower computational cost, and strong final quality prediction confidence.

1. Introduction

Machine learning (ML) has significantly enhanced the capabilities of intelligent manufacturing systems over the past few years. This improvement is particularly notable in industrial automation, where novel data-driven methodologies, such as predictive maintenance, computer vision, and anomaly detection, have played a pivotal role in advancing systems that are easier to automate and more robust than ever before. However, according to a recent survey of semiconductor device makers conducted by McKinsey [1], only about 30% of respondents stated that they are already generating value through ML, while the other 70% are still in the pilot phase and their progress has stalled. The production of semiconductors and microelectronics entails highly complex and expensive equipment and intricate fabrication processes that require a high degree of precision. To reduce costs, improve yields, and increase overall fab throughput, these device makers actively leverage advanced machine learning techniques in multiple specific use cases, such as visual inspection, defect identification, root cause analysis, virtual metrology (VM), etc. This paper focuses on the scalability of ML-based virtual metrology models for semiconductor and microelectronics manufacturing.
The semiconductor manufacturing process is characterized by high fragmentation, with a diverse range of products flowing through production lines. This requires complex machines and multiple process recipes and steps. In a given process recipe, engineers typically specify one constant time frame for each step. However, the variability of individual wafers may introduce statistical or systematic fluctuations in the time frame required for a given step. A process may keep running until it achieves the desired outcome, increasing cycle time and resource waste and potentially even damaging chips. To improve processing accuracy, semiconductor companies leverage live tool-sensor data and tool-sensor readings from previous process steps, allowing machine learning models to capture nonlinear relationships between processes and wafer-inspection outcomes (measured by a metrology system).
Metrology is the science of measuring and characterizing tiny structures and materials. Metrology systems are responsible for ensuring production quality in the semiconductor manufacturing industry. Offline sampling inspection is a widely used method to achieve quality control goals. However, this approach can only evaluate the quality of a limited number of sampled wafers, resulting in a waiting period to obtain metrology values after the completion of the manufacturing process. The delay associated with metrology acquisition in offline sampling inspection precludes real-time monitoring of product quality, thus reducing its effectiveness. To overcome the time and cost limitations imposed by offline sampling, VM systems have been developed to predict metrology variables based on process and wafer state information [2]. A machine learning-based virtual metrology system, made possible by advancements in machine learning techniques, can be trained to automatically detect and classify defects on wafers with comparable or superior accuracy to human inspectors. Additionally, specialized hardware, such as tensor-processing units, and cloud offerings [3] enable the automated training of machine learning algorithms at scale. This, in turn, allows quicker piloting, real-time inference, and scalable deployment of VM systems.
In recent times, VM has garnered substantial attention from researchers, and its effectiveness in wafer inspection has been demonstrated through several studies, including a locally weighted partial least squares approach for the dry etching process [4], Gaussian process regression models for the chemical mechanical polishing process [5], etc. Recently, soft-sensing ConFormer [6] was developed in the first empirical study of wafer inspection, based on semiconductor manufacturing data provided by the IEEE BigData 2021 Cup in Soft Sensing at Scale-Seagate [2].
However, the practical implementation of deep learning VM models is impeded by two distinct characteristics. First, ML algorithms assume that training and testing data originate from the same probability distribution; the training dataset and the actual problem must be similar in terms of their feature space and the distribution of data therein. This assumption may not hold in reality, as data collected from different production contexts are likely to arise from different probability distributions. Moreover, machine learning algorithms are limited to learning only the effects that are present in the training data. Consequently, their efficacy relies on the quality and quantity of the data, which must be large and diverse enough to include rare events as well. In practice, acquiring such comprehensive datasets becomes increasingly arduous as problems grow more intricate. Second, retraining an ML-based VM model is comparable to training a completely untrained one: it necessitates a substantial amount of computational power and access to all of the training data. In the context of a highly dynamic industrial automation environment such as a semiconductor manufacturer, where production lines regularly switch between products, tools, or processes, this approach is impractical.
Transfer learning (TL) offers a potential solution to mitigate both issues at hand. TL refers to a collection of techniques that aim to reduce the volume and caliber of necessary data while simultaneously enabling the utilization of prior knowledge instead of commencing each learning task from scratch. This is accomplished by transferring knowledge between tasks, thus producing distributed cooperative learning systems. Although transfer learning has been extensively studied in areas such as medical imaging, spam detection, and speech recognition, there appears to be a lack of similar research in the semiconductor and microelectronics industrial automation sector.
We investigate the value of the transfer learning approach within the context of semiconductor manufacturing, with the objective of achieving a more precise virtual metrology model that can effectively manage data scarcity across a variety of processing scenarios. Specifically, this study introduces a TL method that utilizes parameter transfer. Parameter transfer enables the adaptation of a machine learning model to variations in the feature space arising from differences in the types and number of sensors used in the process monitoring system, without requiring complete retraining of the algorithm [7]. It has been adopted in quality management [8], anomaly detection [9,10], etc. The proposed approach involves reusing a pre-trained deep neural network from the source domains to enhance the predictive capability in the target domains, thereby reducing the requirement for extensive training data in the target domains. The TL in this study is based on the soft-sensing ConFormer (CONvolutional transFORMER) [6], a VM model that serves as the backbone for transfer learning in wafer fault diagnostics. This model comprises multi-head convolution modules that leverage the benefits of fast and lightweight convolution operations while also being capable of learning robust representations through a multi-head design akin to transformers.
In this paper, performances of different TL tasks—cross-factory and cross-recipe—are evaluated with a focus on the key parameters of the successful strategies. Numerical experiments are conducted on real-world industrial semiconductor manufacturing data provided by Seagate wafer factories.

2. Preliminaries

Aligned with the objective of the current study, this section presents a review of the limited existing literature pertaining to transfer learning within the context of virtual metrology. In [11], the authors propose a unified VM model for two identically designed chambers utilizing a deep learning architecture, specifically a domain adversarial neural network. This model incorporates a discriminator that distinguishes between the two chambers during the examination process. In [12], the use of TL techniques was explored for equipment with identical designs in scenarios where the number of wafer records for the target equipment is inadequate. A VM modeling approach based on the paradigm of transfer learning in a fragmented production context was presented in [13], which exploits a convolutional neural network (CNN)-based spatial pyramid pooling model to perform TL with inputs of different sizes. Hsieh et al. [14] proposed an automated VM (AVM) server that also employs CNNs for efficient VM processing, achieved by optimizing the CNN architecture and developing an automated data alignment scheme to align the inputs, enhancing the feasibility of deployment. Building on that, an advanced AVM system [15] based on a convolutional autoencoder and TL was proposed to address the practical application challenges, i.e., insufficient metrology data and online model refreshing. Experimental results confirmed the feasibility of employing the advanced AVM system for onsite applications in actual production lines.
However, given the limited literature related to TL in VM, there are insufficient guidelines to find a good exemplary TL application to achieve scalable VM in semiconductor manufacturing. This paper aims to offer insights into the adaptability of scalable deep learning-based VM with a pragmatic transfer learning approach to the specific requirements of diverse industrial processes.

2.1. Transfer Learning

Transfer learning is a field that explores and develops machine learning methods by leveraging knowledge gained from previously solved source tasks to more efficiently solve new target tasks. In the published literature, inconsistencies remain in transfer learning terminology. Regarding labeled data availability, three common problem categories are distinguished:
  • Inductive transfer learning is where target domain labels are provided.
  • Transductive transfer learning is where only the source domain labels are available.
  • Unsupervised transfer learning is where neither source nor target domain labels are available.
Accordingly, four main approach categories are defined among statistical transfer learning and deep transfer learning approaches:
  • Instance transfer learning describes approaches that add (weighted) instances from the source domain(s) to the target domain to improve training on the target task.
  • Feature representation transfer learning involves mapping instances from both the source and target domains into a shared feature space. This approach can enhance training for the target task.
  • Parameter transfer learning involves sharing parameters or priors between source and target domain models to enhance the initial model before training on the target task. In deep transfer learning, this is achieved through the partial reuse of deep neural networks pre-trained on the source domain(s).
  • Relational knowledge transfer learning maps relational knowledge from the source to the target domains, which usually requires domain expertise. However, deep transfer learning using generative adversarial networks or end-to-end approaches can alleviate this issue by integrating domain adaptation into the decision-making function.
The applicability of different approach categories in a real-world setting depends on specific factors such as dataset and storage sizes, communication bandwidth, and the availability of expert knowledge. It is important to note that this applicability is not solely determined by the advantages or disadvantages of the approaches in general or the categorization of the problem, as described above.
Parameter transfer learning (PTL) fundamentally involves the transfer of model parameters, specifically weights and biases, from a pre-trained model within a source domain to a new model in a target domain. In contrast, traditional transfer learning predominantly emphasizes the transfer of data instances from the source domain to the target domain, adjusting the model’s weights during the target domain’s training process. The principal advantage of PTL over traditional transfer learning lies in its ability to significantly reduce both the training time and the computational resources required to retrain the target model, thereby enhancing efficiency and scalability in model deployment.
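As a minimal illustration of this distinction, consider the following Python sketch (a hypothetical Keras stand-in, not the paper's actual implementation; the network and all names are assumptions): parameter transfer amounts to initializing the target model from the source model's weights and biases rather than training from scratch.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_vm_model(input_dim: int) -> keras.Model:
    """Hypothetical stand-in for a VM network."""
    return keras.Sequential([
        keras.Input(shape=(input_dim,)),
        layers.Dense(64, activation="relu", name="hidden"),
        layers.Dense(1, activation="sigmoid", name="head"),
    ])

# Source model, assumed to have been trained on abundant source-domain data.
source_model = build_vm_model(input_dim=100)

# Parameter transfer: the target model starts from the source weights and
# biases instead of random initialization, then trains on target-domain data.
target_model = build_vm_model(input_dim=100)
target_model.set_weights(source_model.get_weights())
```

Training then continues on the (typically small) target dataset, optionally freezing subsets of the transferred parameters, as discussed in Section 3.2.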
The application of parameter transfer learning (PTL) in industrial process automation is constrained by the necessity for congruence between the source and target domains. Specifically, the industrial processes in both domains must exhibit similar features or tasks to ensure effective knowledge transfer. For instance, implementing PTL in a new factory’s process automation (target domain) necessitates that the setup processes closely resemble those of an existing factory (source domain). This requirement for similarity poses a significant limitation, as it restricts the versatility and broader applicability of PTL in diverse industrial settings where process characteristics may differ substantially.

2.2. Virtual Metrology

In a bid to explore the practicality of implementing virtual metrology in its manufacturing operations, Seagate Technology conducted a comprehensive study utilizing a large volume of tool-sensor data from multiple manufacturing sites worldwide. As part of this study, the Seagate researchers proposed an autoencoder-based model that achieved dimension reduction and virtual metrology prediction simultaneously. The virtual metrology problem is complex and requires the utilization of large-scale deep learning models, particularly in scenarios involving time-series data. Incorporating sensor data sequences into the model enhances its predictive capabilities, but it simultaneously necessitates a more intricate model design. To this end, the researchers developed the soft-sensing transformer model, which leverages self-attention mechanisms that have been proven to be effective for sequential data and efficient for high-dimensional inputs. However, while the soft-sensing transformer model demonstrated exceptional performance on some virtual metrology tasks in Seagate's sensor data, it was unable to uncover correlations between sensors. The gaps in correlation and interpretability were filled by the ConFormer model [6]. ConFormer combines convolutional networks and transformers to extract local correlations between neighboring features (sensors) and enhance accuracy, and a graph neural network built on this idea incorporates correlations among all sensors. The ConFormer model exhibits impressive capabilities in prediction performance and feature-correlation extraction. DeepViz is a technique that visualizes feature importance and can be applied to all the aforementioned models. By visualizing importance, DeepViz not only facilitates the interpretation of how the models make predictions but also provides a means of improving model performance by assigning higher weights to more important features. In addition, the researchers at Seagate conducted a thorough investigation of model training and data processing techniques, yielding valuable insights.

2.3. Problem Formulation: Transfer Learning in Seagate Factories

Given a source domain D_S and source learning task T_S, and a target domain D_T and target learning task T_T, transfer learning aims to improve the learning of the target predictive function (VM model) f(·) in D_T by using the knowledge in D_S and T_S, where D_S ≠ D_T.
In this definition, a domain is a pair D = {X, P(X)}, where X is the feature or sensor space and P(X) is the marginal probability distribution over samples X = {x_1, …, x_n}. Thus, the condition D_S ≠ D_T indicates that either X_S ≠ X_T or P(X_S) ≠ P(X_T).
Motivated by the diversity of wafer manufacturing lines, we therefore propose the following two base use cases to examine best-practice examples of TL and facilitate their adaptation to other scenarios.
  • Cross-factory refers to the transfer of knowledge and expertise from one location to another, as illustrated in Figure 1a. This practice involves leveraging the knowledge gained from one factory or site and applying it to other similar entities located elsewhere. This allows the dissemination of valuable knowledge and the sharing of best practices across different sites, ultimately improving overall performance and efficiency. In the realm of semiconductor manufacturing, a diverse range of deposition tools are utilized to create electronic components via the deposition of a thin film of material onto a substrate. They share the same process parameters, i.e., X_S = X_T, while the tools differ from one another and are located at different sites, resulting in different performance, i.e., P(X_S) ≠ P(X_T);
  • Cross-recipe refers to the transfer of knowledge from one distinct manufacturing process recipe to another, for instance, from the processing of one product to that of a different one, as illustrated in Figure 1b. A recipe in wafer manufacturing typically refers to a set of instructions detailing the specific steps and parameters required to fabricate a component at a given operation in the process flow. In this case, X_S ≠ X_T and P(X_S) ≠ P(X_T).
It is noteworthy that, from the VM task perspective, a task can be defined as T = {Y, f(·)}, where Y is the label space and f(·) is the predictive VM model, which can be expressed as P(Y|X). Taking deposition machining as an example, the film thickness on all wafers is examined after the process finishes, i.e., Y_T = Y_S. The differing performance of different tools will result in variations in the thickness measurements; that is, P(Y_S|X_S) ≠ P(Y_T|X_T).
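Restated compactly in LaTeX notation (a summary of the conditions above, not additional assumptions):

```latex
% Cross-factory: shared sensor space, shifted marginal distribution
\mathcal{X}_S = \mathcal{X}_T, \qquad P(X_S) \neq P(X_T)

% Cross-recipe: different sensor spaces and distributions
\mathcal{X}_S \neq \mathcal{X}_T, \qquad P(X_S) \neq P(X_T)

% In both cases the label space is shared while the predictive
% distribution shifts:
\mathcal{Y}_S = \mathcal{Y}_T, \qquad P(Y_S \mid X_S) \neq P(Y_T \mid X_T)
```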

2.4. Manufacturing Data Acquisition

The wafer manufacturing process is a complex and time-consuming operation, involving numerous stages such as metal deposition, dielectric deposition, etching, electroplating, planarization, and lithography. The intricacy of the process makes it challenging to maintain manufacturing stability, hindering quality control in industrial production. In order to enhance the predictability of qualified product yield, a large sensor network is installed in the manufacturing line to monitor wafer quality. At each stage of processing, engineers collect and analyze multiple critical sensor records. These records provide data on key quality indicators that allow the engineers to assess the quality of the wafer. To do this, they use internal heuristic threshold values for each indicator. However, the sensor data collected are often highly nonlinear, dynamic, and noisy, making them difficult to handle. To overcome this challenge, Seagate wafer factories have adopted data-driven VM models [6,16]. These models use multivariate time-series sensor data to predict inspection results, specifically the pass or fail of binary indicators.
Specifically, the data are obtained from two manufacturing sites with similar processes. However, while the dataset from Factory 1 exhibits greater volume and diversity, the dataset from Factory 2 is relatively smaller in scale. The objective of this study is to facilitate the transfer of knowledge acquired from the larger and more diverse dataset at Factory 1 to the smaller dataset at Factory 2. We utilize datasets obtained from five distinct manufacturing process recipes spanning a duration of 2 years to train and validate the models. The test set, conversely, comprises data from a more recent time period and is divided into four periods of equal duration.
The sample sizes of the training and validation sets from Factory 1 and Factory 2, grouped according to the five process recipes, are presented in Figure 2. The training-set and validation-set sizes of Factory 2 are less than 60% and 67% of those of Factory 1, respectively. Meanwhile, Figure 3 displays the sample sizes of the test sets across four distinct testing periods for both factories; the test-set sizes of Factory 2 are less than 72% of those of Factory 1.

3. Materials and Methods

3.1. Virtual Metrology Base Model Architecture

The ConFormer (CONvolutional transFORMER) Model [6] is used as the baseline for the virtual metrology transfer learning model. The ConFormer architecture consists of two main components: a convolutional neural network (CNN) and multi-head self-attention layers [17]. The CNN and multi-head self-attention mechanism are described in detail in Section 3.1.1 and Section 3.1.2, respectively.
The ConFormer Model is illustrated in Figure 4. The sensor data, as described in Section 2.4, are fed into a dense layer acting as an embedding layer to reduce the high-dimensional input, as expressed in Equation (1):
X = δ(FC(input))
where δ is a sigmoid activation function and FC(·) is a fully connected (FC) layer, i.e., a dense layer.
The dense layer output is fed to the multi-head self-attention layer, which is described in Section 3.1.2. In this ConFormer Model, there are three convolutional blocks, each of which consists of the following layers:
  • Gated linear unit (GLU) [18], which is expressed in Equation (2).
    X_t = (W_A X_{t−1}) ⊗ δ(W_B X_{t−1})
    where δ is a sigmoid activation function, ⊗ denotes element-wise multiplication, and W_A and W_B are weight parameters applied to the previous input X_{t−1}.
  • Convolutional neural network (CNN), which is described in Section 3.1.1
  • Batch Normalization [19], which is a layer to normalize activations in-between deep neural network layers. It also aims to improve deep learning model performance and speed up model training convergence.
  • Swish activation layer [20], which can be calculated from Equation (3).
    f(x) = x · δ(x)
    where δ is a sigmoid activation function.
  • Fully connected layer (also known as a dense layer), which is expressed in Equation (4):
    y_k(x) = Σ_{j=1}^{n} W_jk x_j + b_0
    where W_jk denotes the weight parameter connecting input x_j to output k, and b_0 denotes a bias parameter.
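To make the block structure concrete, the following Keras sketch assembles the layers listed above into one such convolutional block (sizes and the input shape are illustrative assumptions, and the multi-head self-attention stage of Figure 4 is omitted for brevity; the actual architecture follows [6]):

```python
from tensorflow import keras
from tensorflow.keras import layers

def glu(x, units: int):
    """Gated linear unit, Equation (2): (W_A x) gated by sigmoid(W_B x)."""
    a = layers.Dense(units)(x)
    b = layers.Dense(units, activation="sigmoid")(x)
    return layers.Multiply()([a, b])

def conv_block(x, filters: int = 64, kernel_size: int = 3):
    """One convolutional block: GLU -> CNN -> BatchNorm -> Swish."""
    x = glu(x, units=filters)
    x = layers.Conv1D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("swish")(x)   # Equation (3): f(x) = x * sigmoid(x)
    return x

inputs = keras.Input(shape=(128, 1))                 # assumed input shape
x = layers.Dense(64, activation="sigmoid")(inputs)   # embedding, Equation (1)
for _ in range(3):                                   # three convolutional blocks
    x = conv_block(x)
x = layers.GlobalAveragePooling1D()(x)               # assumed pooling before head
outputs = layers.Dense(1, activation="sigmoid")(x)   # dense layer, Equation (4)
model = keras.Model(inputs, outputs)
```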

3.1.1. Convolutional Neural Network (CNN)

In the convolutional neural network (CNN), the convolution is performed by multiplying a filter or kernel with a data fragment at each point and moving the filter by n strides along the X and Y directions of the data to cover the whole image. Feature maps are generated as a result. The stride is the number of hops by which the filter moves across the input [21].
Figure 5 shows a graphical explanation of the convolution process, where the input data have a shape of (7 × 7 × 1) and there are 3 kernels, K_11^(1), K_12^(1), and K_13^(1), each with a shape of 3 × 3, in the first convolutional layer. As a result, 3 feature maps, A_1^(1), A_2^(1), and A_3^(1), are generated. In the second convolutional layer, there are nine kernel slices, namely K_11^(2), K_21^(2), K_31^(2), K_12^(2), K_22^(2), K_32^(2), K_13^(2), K_23^(2), and K_33^(2) (one per input channel for each of the three output maps), each with a shape of 2 × 2, which are convolved with the previous feature maps to obtain the next feature maps A_1^(2), A_2^(2), and A_3^(2).
The convolution result can be expressed as in Equation (5):
A_j^(l) = f^(l)( Σ_{i=1}^{M^(l−1)} A_i^(l−1) * K_ij^(l) + b_j^(l) )
where M^(l−1) is the number of feature maps in the previous layer, A_i^(l−1) are the feature maps from the previous layer l−1, K_ij^(l) is a kernel in the current layer, f^(l) is the activation function, and b_j^(l) denotes the bias in the current layer.
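The following NumPy sketch reproduces Equation (5) and the shapes of Figure 5 with random placeholder values (the activation f^(l) is assumed to be ReLU purely for illustration):

```python
import numpy as np

def conv2d(A_prev, kernels, bias, stride=1):
    """Valid 2D convolution per Equation (5): sum over input maps, add bias.

    A_prev : (H, W, M_in) feature maps from the previous layer
    kernels: (k, k, M_in, M_out)
    bias   : (M_out,)
    """
    H, W, _ = A_prev.shape
    k, _, _, M_out = kernels.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    A = np.zeros((out_h, out_w, M_out))
    for j in range(M_out):                          # each output feature map
        for r in range(out_h):
            for c in range(out_w):
                patch = A_prev[r*stride:r*stride+k, c*stride:c*stride+k, :]
                A[r, c, j] = np.sum(patch * kernels[:, :, :, j]) + bias[j]
    return np.maximum(A, 0)                         # f = ReLU, for illustration

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7, 1))                  # input of Figure 5
K1 = rng.standard_normal((3, 3, 1, 3))              # three 3x3 kernels
A1 = conv2d(x, K1, bias=np.zeros(3))                # -> (5, 5, 3) feature maps
K2 = rng.standard_normal((2, 2, 3, 3))              # nine 2x2 kernel slices
A2 = conv2d(A1, K2, bias=np.zeros(3))               # -> (4, 4, 3) feature maps
print(A1.shape, A2.shape)
```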
Figure 4. ConFormer (CONvolutional transFORMER) Architecture.
Figure 5. Two-Layer Convolutional Neural Network with 3 Filters in Each Layer.

3.1.2. Multi-Head Self-Attention

Multi-head self-attention was introduced by [17] and was first used in transformer models for natural language processing applications. Self-attention captures the dependency among input tokens: it is calculated as the scaled dot product between the Q and K input tokens, normalized by a softmax function, and applied to the token responses V, as expressed in Equation (6). Multi-head self-attention is a concatenation of the self-attention outputs of each head i, as described in Equation (8).
Attention(Q, K, V) = Softmax(Q K^T / √d_k) V
An alternative concise expression of the scaled dot product operation is also represented in Equation (7).
Attention(X) = Attention(X W^Q, X W^K, X W^V) = Attention(Q, K, V)
As shown in Figure 6a, the initial input to the scaled dot-product attention is the input embedding X = (x_1, x_2, …, x_T), a matrix of size T × d_model. Three matrices are then generated from the input X:
  • The query Q = X W^Q, where W^Q is a d_model × d_q matrix; as a result, Q is a T × d_q matrix.
  • The key K = X W^K, where W^K is a d_model × d_k matrix, so K is a T × d_k matrix. The scaled dot product between Q and K requires d_q = d_k.
  • The value V = X W^V, where W^V is a d_model × d_v matrix, so V is a T × d_v matrix.
The normalization by √d_k of the dot product between the query (Q) and key (K) vectors is required to control the magnitude of the dot product; a large magnitude pushes the softmax function into a region with small gradients [22].
The benefit of the attention mechanism is that it has no recurrent connections and can compute all input tokens in parallel within the same layer. As a result, it gains better effectiveness, efficiency, and scalability [23]. Running several such attention computations in parallel is called multi-head attention, which is shown in Figure 6b and expressed in Equation (8):
Multi-Head Attention(X) = Concat(head_1, head_2, …, head_h) W^O
where head_i = Attention(X W_i^Q, X W_i^K, X W_i^V).
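A compact NumPy sketch of Equations (6)-(8), with illustrative dimensions (T = 10, d_model = 64, h = 4 heads, d_k = d_v = 16):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention, Equation (6)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (T, T) similarity scores
    return softmax(scores) @ V                      # (T, d_v)

def multi_head_attention(X, W_Q, W_K, W_V, W_O):
    """Multi-head attention, Equations (7)-(8): project, attend, concatenate."""
    heads = [attention(X @ wq, X @ wk, X @ wv)
             for wq, wk, wv in zip(W_Q, W_K, W_V)]
    return np.concatenate(heads, axis=-1) @ W_O

rng = np.random.default_rng(0)
T, d_model, h, d_k = 10, 64, 4, 16                  # d_k = d_v = d_model / h
W_Q = [rng.standard_normal((d_model, d_k)) for _ in range(h)]
W_K = [rng.standard_normal((d_model, d_k)) for _ in range(h)]
W_V = [rng.standard_normal((d_model, d_k)) for _ in range(h)]
W_O = rng.standard_normal((h * d_k, d_model))
X = rng.standard_normal((T, d_model))               # input embedding
print(multi_head_attention(X, W_Q, W_K, W_V, W_O).shape)  # (10, 64)
```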

3.2. Parameter Transfer Learning Architecture

This study aims to investigate the parameter transfer approach, a technique that employs a parametric model to encode and transfer knowledge. The central objective of this approach is to minimize the expected risk associated with the target task by leveraging relevant knowledge from effective parameters in the source domain and applying it to the target domain. To accomplish this, we propose a strategy that entails learning optimal parameters in the source domain and transferring a subset of these parameters to the target domain. One way to control the transfer is to directly constrain the target model's parameters with those of the source model.
In this paper, as shown in Figure 7, the parameter transfer learning design is executed in two steps: pre-training followed by fine-tuning. In the first step, sufficient historical data collected from a variety of tools in the source domain are used to build the initial model. After pre-training the VM model with sufficient qualified data, the ConFormer blocks within the VM model contain deep, discriminative convolutional filters learned from source domain data. However, due to the different characteristics that exist among tools from different domains, the prediction accuracy may be poor when this initial model is applied to other tools, even of the same type. Therefore, to further enhance the accuracy of pre-trained VM models on the target domain, the pre-trained model needs to be fine-tuned. During the fine-tuning process, however, the gradient is allowed to propagate back through the whole network, which may compromise the discriminative filters in the ConFormer blocks. To avoid this problem, it is recommended to freeze all convolutional layers within the VM model while fine-tuning the final layers (the last FC layer and the softmax function). In the context of convolutional neural networks, freezing layers refers to a technique used to control the weight update process. Specifically, by selectively freezing different numbers of convolutional layers, the weight behavior of the convolutional layers can be carried over to the target model, holding all other factors constant. Finally, by fine-tuning the modifiable parameters until reaching the desired objective function, the re-trained VM model should be able to recognize the different object types or classes of the target domain.
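A minimal Keras sketch of this freeze-and-fine-tune step follows; the network below is a hypothetical stand-in with three plain Conv1D layers rather than the actual ConFormer, and all shapes and names are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for the pre-trained source-domain VM model.
inputs = keras.Input(shape=(128, 1))
x = layers.Conv1D(32, 3, padding="same", activation="relu")(inputs)
x = layers.Conv1D(32, 3, padding="same", activation="relu")(x)
x = layers.Conv1D(32, 3, padding="same", activation="relu")(x)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(2, activation="softmax")(x)
pretrained = keras.Model(inputs, outputs)

# Freeze the convolutional layers so their discriminative filters survive
# fine-tuning; only the final layers will receive gradient updates.
for layer in pretrained.layers:
    if isinstance(layer, layers.Conv1D):
        layer.trainable = False

# Replace the head: a fresh dense layer + softmax for the target domain.
features = pretrained.layers[-2].output
new_head = layers.Dense(2, activation="softmax", name="target_head")(features)
fine_tune_model = keras.Model(pretrained.input, new_head)
```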
Key challenges associated with applying PTL in VM lie in the necessity for extensive pre-training of the source model on large-scale datasets. This pre-training is crucial, as it lays the groundwork for subsequent model optimization and fine-tuning processes. Such rigorous preparation is essential, as it directly influences the VM’s performance metrics, specifically the true positive rate (TPR), false positive rate (FPR), and area under the curve (AUC). Thus, the performance and efficacy of the VM are contingent upon these comprehensive and meticulous preparatory stages.

3.3. Experiment Setting

3.3.1. Design of Experiments

First, we aimed to investigate the effectiveness of transfer learning in the context of virtual metrology modeling with cross-factory adaptation. Specifically, we employed datasets from two distinct factories, with those from Factory 1 and Factory 2 serving as the source domain and target domain, respectively. The utilization of Factory 2's dataset as the target domain is of particular interest given its relatively small size; such a scenario highlights the potential benefits of TL in addressing data scarcity. Second, a cross-recipe transfer learning experiment was conducted using a dataset derived from two separate manufacturing process recipe groups. The source recipe group exhibited a greater degree of diversity than recipe groups 1 to 5. As such, the source recipe group was designated as the source domain, while recipe groups 1 to 5 were selected as the target domains for this experiment.

3.3.2. Evaluation Metrics

The evaluation of virtual metrology models' performance involves assessing the true positive rate (TPR), false positive rate (FPR), and area under the receiver operating characteristic curve (AUC-ROC). This is carried out for models trained on the base domain, the target domain, and with transfer learning [24]. Following the confusion matrix in Figure 8, the TPR, FPR, and AUC score can be expressed as in Equations (9), (10), and (11), respectively. Higher TPR and AUC indicate superior model performance, while a lower FPR suggests better model performance.
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
AUC = ∫_0^1 TPR d(FPR)
A higher TPR indicates that PTL is effectively leveraging the knowledge gained from the pre-trained model to accurately identify true positives in the VM task. Conversely, a lower FPR shows that PTL directly increases precision in VM performance, meaning that the VM model's positive predictions are more likely to be correct.
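These metrics can be computed from model outputs as in the following sketch (using scikit-learn for the AUC; the labels and scores are toy placeholders):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def vm_metrics(y_true, y_score, threshold=0.5):
    """TPR (Equation (9)), FPR (Equation (10)), and AUC (Equation (11))."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn), fp / (fp + tn), roc_auc_score(y_true, y_score)

# Toy placeholder labels and scores.
y_true = [0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
tpr, fpr, auc = vm_metrics(y_true, y_score)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} AUC={auc:.2f}")
```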

3.3.3. Implementation Details

The ConFormer model and its training methodology are utilized across all domains. The model was implemented in the Keras 2.3 framework. The optimal hyperparameter values were determined by comparing the results of various choices. The embedding size was set to 64, as it outperformed the other sizes in the candidate pool [32, 64, 128, 256]. To address overfitting, dropout values ranging from no dropout to 0.8 were tested, and a dropout of 0.5 was found to be optimal. To further regularize the model, regularization values in the set [1e-3, 1e-4, 1e-5, 1e-6] were tested, and 1e-4 was chosen. The batch size was set to 1024. Optimization was performed using the Adam optimizer, with the learning rate scheduled as in [6]. Early stopping was triggered when the model's performance on the validation dataset began to degrade, with patience set to 50 epochs.
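In Keras terms, the settings above correspond roughly to the configuration below; the model is a minimal stand-in rather than the ConFormer, and the data are random placeholders for the Seagate sensor datasets:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

EMBEDDING_SIZE = 64   # best of [32, 64, 128, 256]
DROPOUT = 0.5         # best of the range 0.0-0.8
L2_REG = 1e-4         # best of [1e-3, 1e-4, 1e-5, 1e-6]
BATCH_SIZE = 1024

# Minimal stand-in model; only the hyperparameter values come from the text.
model = keras.Sequential([
    keras.Input(shape=(100,)),
    layers.Dense(EMBEDDING_SIZE, activation="sigmoid",
                 kernel_regularizer=regularizers.l2(L2_REG)),
    layers.Dropout(DROPOUT),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy")

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=50,
                                           restore_best_weights=True)

# Random placeholders standing in for the Seagate sensor datasets; the epoch
# count here is kept tiny purely so the sketch runs quickly.
rng = np.random.default_rng(0)
x, y = rng.standard_normal((2048, 100)), rng.integers(0, 2, 2048)
model.fit(x[:1536], y[:1536], validation_data=(x[1536:], y[1536:]),
          batch_size=BATCH_SIZE, epochs=10, callbacks=[early_stop])
```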
The transfer learning procedure realizing the idea of parameter transfer of deep VM models comprises the following steps:
  • Source domain training and evaluation: The base model was trained on data from the source domain, and the weight parameters of the model were preserved. In this step, the maximum number of epochs is set to 2000, since increasing the number of epochs improves model performance [25]. The initial learning rate is set to 0.001. The resulting model serves as the pre-trained baseline.
  • Target domain training and evaluation: the base model was trained separately on the target domain dataset using hyperparameter settings identical to those of the source domain.
  • Parameter transfer and evaluation: The base model was initialized using the weights of the pre-trained baseline. The weight parameters of the three CNN layers discussed in Section 3.1 are frozen, which restricts learning and prevents weight updates. Conversely, the input layer of the pre-trained model is unfrozen, and it may be necessary to modify the input dimensions in cases where the data shapes of the source and target domains differ. A dense layer is appended as the final layer, creating a transfer learning model through fine-tuning. The partially frozen model is re-trained on the target domain, with the number of epochs limited to 200 and the initial learning rate reduced to 0.0001. Reducing the number of epochs not only decreases training time but also minimizes computational power consumption. (The steps above translate roughly into the Keras calls sketched below.)
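Assuming the objects defined in the earlier sketches (`pretrained`, `fine_tune_model`, and `early_stop`), the procedure maps onto the following hedged Keras calls; the fit calls are commented out because `x_source`/`y_source` and `x_target`/`y_target` stand for the proprietary datasets:

```python
from tensorflow import keras

# Step 1: pre-train on the source domain (up to 2000 epochs, lr = 0.001).
pretrained.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                   loss="sparse_categorical_crossentropy")
# pretrained.fit(x_source, y_source, epochs=2000, batch_size=1024,
#                callbacks=[early_stop])

# Step 2: for comparison, the same architecture is trained separately on the
# target domain with identical hyperparameters (omitted here).

# Step 3: fine-tune the partially frozen model (see Section 3.2) on the
# target domain with fewer epochs and a reduced learning rate.
fine_tune_model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                        loss="sparse_categorical_crossentropy")
# fine_tune_model.fit(x_target, y_target, epochs=200, batch_size=1024,
#                     callbacks=[early_stop])
```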

4. Results

In this section, the cross-factory transfer learning results are shown in Section 4.1 and cross-recipe transfer learning results are described in Section 4.2.

4.1. Cross-Factory Transfer Learning Results

The cross-factory VM transfer learning performance results are shown in Table 1. Because the source domain dataset (Factory 1) is significantly larger and more diverse, the model encounters difficulty in learning all of its patterns. Therefore, the model's performance on the source domain may not surpass that on the target domain, as evidenced in the first two column groups of Table 1. The parameter transfer learning approach enhances the performance of the VM model on the target domain (Factory 2). For the five recipe groups transferred from Factory 1 to Factory 2, the overall TPRs in the four test periods increased while the FPRs decreased. In particular, for Group 1, the average test TPR increased by 29% while the average FPR decreased by 43%. The cross-factory performance improvement is shown in Figure 9.
However, it should be noted that cross-factory transfer learning may not consistently improve the performance of VM models. For instance, we observed a reduction in TPRs after transfer learning for Group 3 and Group 5, which can be attributed to the poor performance of the model in the source domain. Nevertheless, given the significant improvement in FPRs, such a reduction in TPRs is considered acceptable.

4.2. Cross-Recipe Transfer Learning Results

The VM performance on the source recipe domain is shown in Table 2. The cross-recipe transfer learning experiment results are shown in Table 3. By utilizing the knowledge in the source recipe group, we can significantly improve the performance of target recipe Groups 1 to 4. However, in target recipe Group 5, the average FPR on the test dataset increased significantly from 3.37% to 9.87%, while the average TPR improved from 55% to 82.3%. The cross-recipe performance improvement is shown in Figure 10.
Figure 9. Cross-Factory Performance Improvement.
Figure 10. Cross-Recipe Performance Improvement.

5. Conclusions

This study presents a parameter transfer learning approach for virtual metrology using extensive manufacturing data from Seagate Technology. The performance of our parameter transfer learning approach was evaluated on various datasets sourced from different factories, time periods, and processing recipes. With our PTL architecture, performance improved, with the true positive rate (TPR) increasing by 29% and the false positive rate (FPR) decreasing by 43% in the cross-factory study, while in the cross-recipe study, the TPR increased by 27.3% and the FPR decreased by 6.5%. These results demonstrate that our parameter transfer learning approach can significantly enhance a manufacturing model's quality prediction capability, particularly on insufficient datasets or datasets with limited diversity. Our approach therefore not only improves quality prediction where new manufacturing sites, new production lines, or new products lack sufficient data for VM model training, but also requires shorter model training time and less computational power to achieve a high true prediction rate.

6. Discussion

Although extensive datasets have been investigated, they constitute only a limited fraction of the complete Seagate manufacturing lines. Our future investigations will entail more in-depth modeling and the utilization of transfer learning across data from various product lines within Seagate, so that manufacturing sites can apply our PTL approach to other manufacturing processes, e.g., wafer inspection.
Model parameters are meticulously optimized for each virtual metrology (VM) scenario. This optimization process necessitates a high degree of similarity in features, tasks, and operational conditions between the source domain and the target domain. Such congruence is essential to ensure that the knowledge transferred from the pre-trained model is effectively leveraged, thereby optimizing the performance of the model in the target domain.

Author Contributions

Conceptualization, S.K. and S.B.; methodology, C.S. and Y.H.; software, C.S.; validation, S.K. and S.B.; formal analysis, C.S. and Y.H.; investigation, S.K. and S.B.; resources, S.K. and S.B.; data curation, S.B.; writing—original draft preparation, S.K., C.S., and Y.H.; writing—review and editing, S.K. and S.B.; funding acquisition, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by Seagate Technology.

Data Availability Statement

The data presented in this study are available from the corresponding author upon request.

Conflicts of Interest

Authors Yu Huang and Sthitie Bom were employed by the company Seagate Technology. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AUC    Area Under the Curve
CNN    Convolutional Neural Network
FPR    False Positive Rate
PTL    Parameter Transfer Learning
TPR    True Positive Rate
TL     Transfer Learning
VM     Virtual Metrology

References

  1. Göke, S.; Staight, K.; Vrijen, R. Scaling AI in the Sector That Enables It: Lessons for Semiconductor-Device Makers; McKinsey & Company: New York, NY, USA, 2021.
  2. Petrov, S.; Zhang, C.; Yella, J.; Huang, Y.; Qian, X.; Bom, S. IEEE BigData 2021 Cup: Soft sensing at scale. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 5780–5785.
  3. Symeonidis, G.; Nerantzis, E.; Kazakis, A.; Papakostas, G.A. MLOps: Definitions, tools and challenges. In Proceedings of the 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 26–29 January 2022; pp. 0453–0460.
  4. Hirai, T.; Kano, M. Adaptive virtual metrology design for semiconductor dry etching process through locally weighted partial least squares. IEEE Trans. Semicond. Manuf. 2015, 28, 137–144.
  5. Wan, J.; McLoone, S. Gaussian process regression for virtual metrology-enabled run-to-run control in semiconductor manufacturing. IEEE Trans. Semicond. Manuf. 2017, 31, 12–21.
  6. Yella, J.; Zhang, C.; Petrov, S.; Huang, Y.; Qian, X.; Minai, A.A.; Bom, S. Soft-sensing ConFormer: A curriculum learning-based convolutional transformer. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 1990–1998.
  7. Maschler, B.; Weyrich, M. Deep transfer learning for industrial automation: A review and discussion of new techniques for data-driven machine learning. IEEE Ind. Electron. Mag. 2021, 15, 65–75.
  8. Tercan, H.; Guajardo, A.; Heinisch, J.; Thiele, T.; Hopmann, C.; Meisen, T. Transfer-learning: Bridging the gap between real and simulation data for machine learning in injection molding. Procedia CIRP 2018, 72, 185–190.
  9. Liang, P.; Yang, H.D.; Chen, W.S.; Xiao, S.Y.; Lan, Z.Z. Transfer learning for aluminium extrusion electricity consumption anomaly detection via deep neural networks. Int. J. Comput. Integr. Manuf. 2018, 31, 396–405.
  10. Hsieh, R.J.; Chou, J.; Ho, C.H. Unsupervised online anomaly detection on multivariate sensing time series data for smart manufacturing. In Proceedings of the 2019 IEEE 12th Conference on Service-Oriented Computing and Applications (SOCA), Kaohsiung, Taiwan, 18–21 November 2019; pp. 90–97.
  11. Gentner, N.; Kyek, A.; Yang, Y.; Carletti, M.; Susto, G.A. Enhancing scalability of virtual metrology: A deep learning-based approach for domain adaptation. In Proceedings of the 2020 Winter Simulation Conference (WSC), Orlando, FL, USA, 14–18 December 2020; pp. 1898–1909.
  12. Kang, P.; Kim, D.; Cho, S. Semi-supervised support vector regression based on self-training with label uncertainty: An application to virtual metrology in semiconductor manufacturing. Expert Syst. Appl. 2016, 51, 85–106.
  13. Clain, R.; Borodin, V.; Juge, M.; Roussy, A. Virtual metrology for semiconductor manufacturing: Focus on transfer learning. In Proceedings of the 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), Lyon, France, 23–27 August 2021; pp. 1621–1626.
  14. Hsieh, Y.M.; Wang, T.J.; Lin, C.Y.; Peng, L.H.; Cheng, F.T.; Shang, S.Y. Convolutional neural networks for automatic virtual metrology. IEEE Robot. Autom. Lett. 2021, 6, 5720–5727.
  15. Hsieh, Y.M.; Wang, T.J.; Lin, C.Y.; Tsai, Y.F.; Cheng, F.T. Convolutional autoencoder and transfer learning for automatic virtual metrology. IEEE Robot. Autom. Lett. 2022, 7, 8423–8430.
  16. Zhang, C.; Yella, J.; Huang, Y.; Qian, X.; Petrov, S.; Rzhetsky, A.; Bom, S. Soft sensing transformer: Hundreds of sensors are worth a single word. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 1999–2008.
  17. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
  18. Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 933–941.
  19. Bjorck, N.; Gomes, C.P.; Selman, B.; Weinberger, K.Q. Understanding batch normalization. Adv. Neural Inf. Process. Syst. 2018, 31.
  20. Ramachandran, P.; Zoph, B.; Le, Q.V. Swish: A self-gated activation function. arXiv 2017, arXiv:1710.05941.
  21. Anaya-Isaza, A.; Mera-Jiménez, L.; Zequera-Diaz, M. An overview of deep learning in medical imaging. Inform. Med. Unlocked 2021, 26, 100723.
  22. Zhang, C.; Bis, D.; Liu, X.; He, Z. Biomedical word sense disambiguation with bidirectional long short-term memory and attention-based neural networks. BMC Bioinform. 2019, 20, 502.
  23. Liu, L.; Liu, J.; Han, J. Multi-head or single-head? An empirical comparison for transformer training. arXiv 2021, arXiv:2106.09650.
  24. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
  25. Ajayi, O.G.; Ashi, J. Effect of varying training epochs of a faster region-based convolutional neural network on the accuracy of an automatic weed classification scheme. Smart Agric. Technol. 2023, 3, 100128.
Figure 1. (a) Cross-Factory Transfer Learning; (b) Cross-Recipe Transfer Learning.
Figure 2. Training and Validation Sample Sizes from Factory 1 and Factory 2.
Figure 3. Test Dataset Sample Sizes from 4 Test Periods from Factory 1 and Factory 2.
Figure 6. (a) Scaled Dot-Product Attention; (b) Multi-Head Attention.
Figure 7. Parameter Transfer Learning Architecture.
Figure 8. Confusion Matrix.
Table 1. Cross-Factory Transfer Learning Performance and Result. Each cell lists TPR / FPR / AUC.

Groups | Factory 1 Baseline | Factory 2 Baseline | Transfer Learning | Performance Improvement
Group 1 Train | 0.93201 / 0.10800 / 0.97113 | 0.88554 / 0.08318 / 0.96831 | 0.93820 / 0.05142 / 0.96720 | 5.95% / 38.18% / −0.11%
Group 1 Validate | 0.70629 / 0.11295 / 0.85057 | 0.63462 / 0.09050 / 0.86057 | 0.80000 / 0.06798 / 0.90522 | 26.06% / 24.88% / 5.19%
Group 1 Test Period 1 | 0.67398 / 0.10550 / 0.86864 | 0.56369 / 0.10275 / 0.79409 | 0.88621 / 0.05657 / 0.96594 | 57.22% / 44.94% / 21.64%
Group 1 Test Period 2 | 0.92121 / 0.10061 / 0.96872 | 0.68571 / 0.09773 / 0.83002 | 0.86333 / 0.05568 / 0.93467 | 25.90% / 43.03% / 12.61%
Group 1 Test Period 3 | 0.95015 / 0.18340 / 0.93859 | 0.72783 / 0.14003 / 0.85951 | 0.85172 / 0.05846 / 0.96448 | 17.02% / 58.25% / 12.21%
Group 1 Test Period 4 | 0.80059 / 0.10215 / 0.93714 | 0.83333 / 0.13264 / 0.90021 | 0.96667 / 0.09797 / 0.96113 | 16.00% / 26.14% / 6.77%
Group 2 Train | 0.93409 / 0.11528 / 0.97415 | 0.94842 / 0.05319 / 0.98210 | 0.88152 / 0.05926 / 0.93433 | −7.05% / −11.43% / −4.86%
Group 2 Validate | 0.61000 / 0.11607 / 0.75931 | 0.71000 / 0.06187 / 0.86621 | 0.95000 / 0.06630 / 0.95063 | 33.80% / −7.16% / 9.75%
Group 2 Test Period 1 | 0.75909 / 0.11886 / 0.85305 | 0.47826 / 0.05971 / 0.77913 | 0.66522 / 0.06060 / 0.90365 | 39.09% / −1.50% / 15.98%
Group 2 Test Period 2 | 0.86522 / 0.11623 / 0.91393 | 0.90909 / 0.06180 / 0.97580 | 0.95000 / 0.07753 / 0.98087 | 4.50% / −25.46% / 0.52%
Group 2 Test Period 3 | 0.85455 / 0.13802 / 0.91420 | 0.82609 / 0.07526 / 0.87500 | 0.82609 / 0.06124 / 0.93577 | 0.00% / 18.63% / 6.94%
Group 2 Test Period 4 | 0.84545 / 0.12663 / 0.92936 | 0.85217 / 0.07244 / 0.94098 | 0.91304 / 0.06530 / 0.95695 | 7.14% / 9.85% / 1.70%
Group 3 Train | 0.89352 / 0.04240 / 0.98279 | 0.91161 / 0.02756 / 0.99365 | 0.80991 / 0.01586 / 0.94794 | −11.16% / 42.43% / −4.60%
Group 3 Validate | 0.68333 / 0.04118 / 0.82259 | 0.75000 / 0.03240 / 0.87674 | 0.83333 / 0.02041 / 0.93287 | 11.11% / 37.01% / 6.40%
Group 3 Test Period 1 | 0.73571 / 0.04196 / 0.83895 | 0.80000 / 0.02920 / 0.92838 | 0.84615 / 0.01580 / 0.99339 | 5.77% / 45.87% / 7.00%
Group 3 Test Period 2 | 0.86429 / 0.04168 / 0.98271 | 0.76923 / 0.04006 / 0.96515 | 0.57692 / 0.02505 / 0.92206 | −25.00% / 37.46% / −4.47%
Group 3 Test Period 3 | 0.80588 / 0.08444 / 0.92772 | 0.96000 / 0.04666 / 0.98552 | 0.93333 / 0.04873 / 0.99147 | −2.78% / −4.42% / 0.60%
Group 3 Test Period 4 | 0.72941 / 0.05041 / 0.95821 | 0.82857 / 0.02601 / 0.92222 | 0.78571 / 0.01300 / 0.94119 | −5.17% / 50.00% / 2.06%
Group 4 Train | 0.95660 / 0.02760 / 0.99066 | 0.98889 / 0.03744 / 0.99164 | 0.93670 / 0.02269 / 0.97884 | −5.28% / 39.39% / −1.29%
Group 4 Validate | 0.75833 / 0.03079 / 0.90862 | 0.75000 / 0.03889 / 0.89917 | 0.83333 / 0.02602 / 0.90209 | 11.11% / 33.10% / 0.33%
Group 4 Test Period 1 | 0.78462 / 0.03438 / 0.91987 | 1.00000 / 0.04249 / 0.99548 | 1.00000 / 0.02875 / 0.99634 | 0.00% / 32.33% / 0.09%
Group 4 Test Period 2 | 0.85385 / 0.03801 / 0.97279 | 0.96923 / 0.03265 / 0.98558 | 1.00000 / 0.01986 / 0.99885 | 3.17% / 39.16% / 1.35%
Group 4 Test Period 3 | 0.86429 / 0.09857 / 0.93350 | 0.94286 / 0.06368 / 0.96600 | 1.00000 / 0.02785 / 0.99525 | 6.06% / 56.27% / 3.03%
Group 4 Test Period 4 | 0.72667 / 0.03707 / 0.89895 | 0.95714 / 0.05412 / 0.95541 | 0.92857 / 0.02163 / 0.97827 | −2.99% / 60.04% / 2.39%
Group 5 Train | 0.96456 / 0.11689 / 0.97999 | 0.95556 / 0.04223 / 0.99388 | 0.85062 / 0.01805 / 0.96464 | −10.98% / 57.25% / −2.94%
Group 5 Validate | 0.71111 / 0.05238 / 0.87560 | 0.73333 / 0.03164 / 0.91117 | 0.88889 / 0.02222 / 0.99342 | 21.21% / 29.76% / 9.03%
Group 5 Test Period 1 | 0.59000 / 0.11687 / 0.81431 | 0.80000 / 0.03235 / 0.91209 | 0.75556 / 0.01846 / 0.99007 | −5.56% / 42.93% / 8.55%
Group 5 Test Period 2 | 0.76364 / 0.12612 / 0.88361 | 0.82000 / 0.03294 / 0.90490 | 0.70000 / 0.02950 / 0.86829 | −14.63% / 10.46% / −4.05%
Group 5 Test Period 3 | 0.87000 / 0.08823 / 0.95611 | 0.83636 / 0.03217 / 0.93341 | 0.90909 / 0.01783 / 0.91890 | 8.70% / 44.56% / −1.55%
Group 5 Test Period 4 | 0.91818 / 0.14869 / 0.95935 | 0.97778 / 0.03656 / 0.99774 | 1.00000 / 0.01895 / 0.99599 | 2.27% / 48.17% / −0.17%
Table 2. Source Recipe Group Baseline Performance.

Groups | TPR | FPR | AUC
Source Recipe Group Train | 0.86458 | 0.15097 | 0.93659
Source Recipe Group Validate | 0.72516 | 0.14726 | 0.85814
Source Recipe Group Test Period 1 | 0.71519 | 0.15521 | 0.84321
Source Recipe Group Test Period 2 | 0.81882 | 0.15440 | 0.90256
Source Recipe Group Test Period 3 | 0.81758 | 0.18887 | 0.85617
Source Recipe Group Test Period 4 | 0.80371 | 0.15542 | 0.88162
Table 3. Cross-Recipe Transfer Learning Performance and Result. Each cell lists TPR / FPR / AUC.

Groups | Target Recipe Group Baseline | Transfer Learning | Performance Improvement
Group 1 Train | 0.77044 / 0.22518 / 0.86848 | 0.85151 / 0.09789 / 0.93316 | 10.52% / 56.53% / 7.45%
Group 1 Validate | 0.55652 / 0.22767 / 0.70558 | 0.85932 / 0.09372 / 0.95082 | 54.41% / 58.84% / 34.76%
Group 1 Test Period 1 | 0.57600 / 0.23694 / 0.69810 | 0.84621 / 0.09345 / 0.92758 | 46.91% / 60.56% / 32.87%
Group 1 Test Period 2 | 0.64615 / 0.22997 / 0.79149 | 0.84504 / 0.09677 / 0.92687 | 30.78% / 57.92% / 17.11%
Group 1 Test Period 3 | 0.68571 / 0.22463 / 0.80887 | 0.83282 / 0.09268 / 0.92132 | 21.45% / 58.74% / 13.90%
Group 1 Test Period 4 | 0.62857 / 0.23274 / 0.76081 | 0.85038 / 0.09796 / 0.93269 | 35.29% / 57.91% / 22.59%
Group 2 Train | 0.82561 / 0.15326 / 0.92443 | 0.85751 / 0.10418 / 0.93781 | 3.86% / 32.03% / 1.45%
Group 2 Validate | 0.84444 / 0.15816 / 0.92170 | 0.86780 / 0.09495 / 0.95482 | 2.77% / 39.97% / 3.59%
Group 2 Test Period 1 | 0.39000 / 0.16171 / 0.63282 | 0.85308 / 0.10428 / 0.94622 | 118.74% / 35.52% / 49.52%
Group 2 Test Period 2 | 0.81000 / 0.15680 / 0.89111 | 0.86183 / 0.10449 / 0.93526 | 6.40% / 33.36% / 4.95%
Group 2 Test Period 3 | 0.66667 / 0.16279 / 0.81847 | 0.82443 / 0.10696 / 0.94029 | 23.66% / 34.30% / 14.88%
Group 2 Test Period 4 | 0.70909 / 0.18538 / 0.87006 | 0.88538 / 0.10634 / 0.93933 | 24.86% / 42.64% / 7.96%
Group 3 Train | 0.84516 / 0.18762 / 0.91333 | 0.86249 / 0.10833 / 0.93528 | 2.05% / 42.26% / 2.40%
Group 3 Validate | 0.65714 / 0.17379 / 0.82062 | 0.87034 / 0.09511 / 0.95759 | 32.44% / 45.27% / 16.69%
Group 3 Test Period 1 | 0.55652 / 0.20530 / 0.76141 | 0.82137 / 0.11008 / 0.92161 | 47.59% / 46.38% / 21.04%
Group 3 Test Period 2 | 0.67826 / 0.17845 / 0.83246 | 0.88779 / 0.10504 / 0.95808 | 30.89% / 41.14% / 15.09%
Group 3 Test Period 3 | 0.71852 / 0.21638 / 0.84151 | 0.87481 / 0.10673 / 0.93662 | 21.75% / 50.68% / 11.30%
Group 3 Test Period 4 | 0.69231 / 0.18294 / 0.82349 | 0.89313 / 0.11340 / 0.95149 | 29.01% / 38.01% / 15.54%
Group 4 Train | 0.79363 / 0.29263 / 0.85537 | 0.86276 / 0.10677 / 0.93883 | 8.71% / 63.52% / 9.76%
Group 4 Validate | 0.60000 / 0.23741 / 0.75515 | 0.85043 / 0.10095 / 0.94609 | 41.74% / 57.48% / 25.28%
Group 4 Test Period 1 | 0.69474 / 0.29447 / 0.77176 | 0.85191 / 0.10246 / 0.94022 | 22.62% / 65.20% / 21.83%
Group 4 Test Period 2 | 0.72222 / 0.30188 / 0.79792 | 0.84462 / 0.10409 / 0.92466 | 16.95% / 65.52% / 15.88%
Group 4 Test Period 3 | 0.65000 / 0.26328 / 0.77420 | 0.83206 / 0.10741 / 0.93382 | 28.01% / 59.20% / 20.62%
Group 4 Test Period 4 | 0.78947 / 0.31724 / 0.86429 | 0.88923 / 0.10638 / 0.95566 | 12.64% / 66.47% / 10.57%
Group 5 Train | 0.92308 / 0.02060 / 0.99582 | 0.85469 / 0.09876 / 0.93944 | −7.41% / −379.49% / −5.66%
Group 5 Validate | 0.50000 / 0.02586 / 0.81466 | 0.91207 / 0.09390 / 0.95903 | 82.41% / −263.07% / 17.72%
Group 5 Test Period 1 | 0.30000 / 0.03594 / 0.62656 | 0.82016 / 0.10111 / 0.94309 | 173.39% / −181.35% / 50.52%
Group 5 Test Period 2 | 0.50000 / 0.02000 / 0.82385 | 0.80000 / 0.09482 / 0.92558 | 60.00% / −374.12% / 12.35%
Group 5 Test Period 3 | 0.80000 / 0.05113 / 0.89398 | 0.85736 / 0.10059 / 0.94645 | 7.17% / −96.75% / 5.87%
Group 5 Test Period 4 | 0.60000 / 0.02754 / 0.97971 | 0.81719 / 0.09835 / 0.92863 | 36.20% / −257.15% / −5.21%
