Enhancing the Effectiveness of Cycle Time Estimation in Wafer Fabrication: Efficient Methodology and Managerial Implications

Cycle time management plays an important role in improving the performance of a wafer fabrication factory. It starts from the estimation of the cycle time of each job in the wafer fabrication factory. Although this topic has been widely investigated, several issues still need to be addressed, such as how to classify jobs suitable for the same estimation mechanism into the same group. In most existing methods, jobs are classified according to their attributes. However, the differences between the attributes of two jobs may not be reflected in their cycle times. The bi-objective nature of classification and regression tree (CART) makes it especially suitable for tackling this problem. However, in CART, the cycle times of jobs of a branch are estimated with the same value, which is far from accurate. For these reasons, this study proposes a joint use of principal component analysis (PCA), CART, and back propagation network (BPN), in which PCA is applied to construct a series of linear combinations of the original variables to form new variables that are as unrelated to each other as possible. According to the new variables, jobs are classified using CART before estimating their cycle times with BPNs. A real case was used to evaluate the effectiveness of the proposed methodology. The experimental results supported the superiority of the proposed methodology over some existing methods. In addition, the managerial implications of the proposed methodology are discussed with an example.


Introduction
Wafer fabrication is a complex and time-consuming process. First, photoresist patterns are photo-masked onto the surface of a wafer. Then, the wafer is exposed to short-wave ultraviolet light. The unexposed areas are etched away and cleaned. Subsequently, hot chemical vapors are deposited onto the desired zones, so that ions can be implanted to form specific patterns at the required depth. These steps are repeated hundreds of times, depending on the complexity of the desired circuits and connections that are usually measured in fractions of micrometers.
This study aims to estimate the cycle time of a job in a wafer fabrication factory. The cycle time (flow time, manufacturing lead time) of a job is the time required for the job to go through the factory. Therefore, it is subject to future demand, capacity constraints, the factory congestion level, the quality of job scheduling, and many other factors. As a result, the cycle time of a job is highly uncertain. According to the competitive semiconductor manufacturing (CSM) survey, the best-performing wafer fabrication factory of memory products achieved an average cycle time of about two days per layer of circuitry [1]. Cycle time management activities include cycle time estimation, internal due date assignment, job sequencing and scheduling, and cycle time reduction (see Figure 1). In practice, shortening the cycle times of jobs is considered an effective way to improve the responsiveness to changes in demand [2]. In addition, according to [3], the number of defects per die has a positive relationship with the cycle time, which means reducing the cycle time can improve product quality. Further, estimating the cycle time of a job helps establish the internal due date for the job. At semiconductor manufacturing factories, the quantities allocated to customers are distributed to daily time buckets according to the available to promise (ATP), which is calculated according to the historical cycle time. For these reasons, estimating and shortening the cycle times of jobs is a very important task to maintain a competitive edge in this industry [4].
In the literature, various types of methods have been proposed to estimate the cycle time of a job in a factory. For example, probability-based statistical methods, such as queuing theory and regression, have been proposed. However, these studies made some restrictive assumptions, such as an exponential processing time distribution [5]. Recently, Pearn et al. [6] fitted the waiting time of a job in a wafer fabrication factory with a Gamma distribution. After adding the waiting time to the release time, the cycle time can be derived. This is one of the most important tasks in controlling a wafer fabrication factory. However, the fitted distribution became invalid quickly, making some cycle time estimates far from accurate [7]. Wu [8] constructed a Petri net to estimate the stage cycle time of dual-arm cluster tools with wafer revisiting. Hsieh et al. [9] modeled the response surface between the cycle time of normal lots and the percentage of hot lots in semiconductor manufacturing. Chen [10] fitted a fuzzy linear regression (FLR) equation to estimate the cycle time of a job in a wafer fabrication factory. A precise range of the cycle time was also determined. For the same purposes, Chien et al. [11] fitted a nonlinear regression equation instead. Chen [7] applied classification and regression tree (CART) to estimate the cycle time of each job in a wafer fabrication factory. Principal component analysis (PCA) was also applied to generate independent variables from the original ones, which then served as the new inputs to CART.
The application of artificial neural networks (ANNs) is also mainstream in this field. For example, a self-organization map (SOM) was developed in Chen [12] to classify the jobs in a wafer fabrication factory into a number of categories. Chen [13] and Chang and Hsieh [14] constructed back propagation networks (BPNs) (or feed-forward neural networks, FNNs) to estimate the cycle time of a job based on the attributes of the job and the current factory conditions. These studies indicated that linear methods are incapable of estimating the cycle time of a job, which supported the application of nonlinear methods such as ANNs. In addition, to improve the effectiveness of an ANN approach, classifying jobs before (or after) estimating the cycle times has been shown to be a viable strategy. To this end, several classifiers were applied, such as k-means (kM) [15], fuzzy c-means (FCM) [16], and SOM [12,17]. A common feature of these classifiers is that all attributes of a job are considered at the same time. In contrast, there are classifiers that consider only some of the job attributes, such as CART and fuzzy inference systems (FISs). The joint use of CART and BPN for estimating the cycle time of a job has rarely been discussed in this field. Chen [18] proposed a BPN tree approach in which the jobs of a branch are separated into two parts, and a BPN is constructed for each part to estimate the cycle times of jobs. However, Chen's approach relies on extensive and iterative BPN re-learning.
Genetic algorithms (GAs) and genetic programming (GP) have also been applied to optimize the parameters of existing FISs or ANNs for estimating the cycle time of a job, e.g., Chang et al. [19] and Nguyen et al. [20].
This study proposes a hybrid principal component analysis (PCA), CART, and BPN approach to estimate the cycle time of a job in a wafer fabrication factory. The significance of doing so is six-fold:
(1) Hybrid algorithms have been shown to be more effective than (pure) algorithms.
(2) Although variable replacement has been applied to forecasting in many industries, it has not been applied to job cycle time forecasting for semiconductor manufacturers. This study applies PCA to enhance the forecasting performance of the CART-BPN approach.
(3) In CART, the cycle times of jobs of a branch are estimated with the same value, which is not accurate. In the proposed PCA-CART-BPN approach, a BPN is established for each branch to estimate the cycle times of jobs, which is expected to enhance the estimation accuracy.
(4) Although clusterwise models, such as SUPPORT and treed Gaussian process models, have been used in various fields, they have not been applied to estimating the cycle time of a job in a manufacturing system.
(5) Compared with the existing methods, the proposed PCA-CART-BPN approach classifies jobs based on fewer job attributes, which is relatively easy to implement. It is also possible to assign more jobs to a branch.
(6) The existing classifiers in this field, such as kM, FCM, and SOM, classify jobs based on their attributes rather than their compatibilities with the estimation mechanism. However, the compatibility with the estimation mechanism is important, and may have a greater influence on the estimation performance. In this regard, CART considers the estimation performance in classifying jobs, which makes it more suitable for this purpose.
In the proposed PCA-CART-BPN approach, the original variables are first replaced according to the results of PCA. Then, jobs are classified using CART. Finally, a BPN is constructed for each category to estimate the cycle times of jobs within the category.
Table 1 compares the proposed PCA-CART-BPN approach with some existing methods in this field. Some methods are easy to use because many software packages are available that can seamlessly complete all the necessary steps. The procedure of the proposed PCA-CART-BPN approach is detailed in Section 2, followed by an application to a real case in Section 3. Several existing methods were also applied to the same case for comparison, and the advantages and/or disadvantages of each method are discussed. The managerial implications of the proposed methodology are discussed in Section 4. Finally, Section 5 summarizes the findings of this study and puts forward some directions that can be explored in future studies.

Methodology
The proposed methodology is composed of five steps, as illustrated in Figure 2.

Variable Replacement Using PCA
PCA replaces the original variables with new variables that are uncorrelated with each other; these new variables become the inputs to the CART-BPN that estimates job cycle times.
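PCA comprises four steps:
(1) Raw data standardization: The original variables may have excessively large numerical differences and dimensional conflicts. To standardize the dimensions and differences, apply Equations (1) through (3):
$$x^{*}_{ji} = \frac{x_{ji} - \bar{x}_i}{\sigma_i} \tag{1}$$
$$\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ji} \tag{2}$$
$$\sigma_i = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)^2} \tag{3}$$
where $x_{ji}$ is the value of attribute $i$ of job $j$; $n$ is the number of jobs; $\bar{x}_i$ and $\sigma_i$ denote the mean and standard deviation of job attribute $i$.
(2) Calculation of the correlation matrix of the standardized data, together with its eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$ and the corresponding eigenvectors $\mathbf{u}_1, \dots, \mathbf{u}_m$, where $m$ is the number of original variables.
(3) Determination of the number of principal components: The variance contribution rate is:
$$\rho_k = \frac{\lambda_k}{\sum_{i=1}^{m} \lambda_i}$$
and the accumulated variance contribution rate is:
$$\rho_{\Sigma}(p) = \sum_{k=1}^{p} \rho_k$$
Choose the smallest $p$ value such that $\rho_{\Sigma}(p) \ge$ 85%~90%.
(4) Formation of the following matrices:
$$\mathbf{U} = [\mathbf{u}_1 \; \mathbf{u}_2 \; \cdots \; \mathbf{u}_p], \qquad \mathbf{Z} = \mathbf{X}^{*}\mathbf{U}$$
where $\mathbf{X}^{*}$ is the $n \times m$ matrix of standardized data, and row $j$ of $\mathbf{Z}$, $\mathbf{z}_j = [z_{j1} \; \cdots \; z_{jp}]$, contains the component scores (new attributes) of job $j$.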
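The four PCA steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: it assumes NumPy is available, the function name `pca_replace` and the synthetic data are hypothetical, and the sample standard deviation (divisor n - 1) is an assumption.

```python
import numpy as np

def pca_replace(X, threshold=0.85):
    """Return (component scores, p, accumulated contribution rates) for X (jobs x attributes)."""
    X = np.asarray(X, dtype=float)
    # Step 1: standardize each attribute (assumption: sample standard deviation).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation matrix and its eigen-decomposition.
    R = np.corrcoef(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 3: smallest p whose accumulated variance contribution reaches the threshold.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    p = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    # Step 4: project the standardized data onto the first p eigenvectors.
    Z = Xs @ eigvecs[:, :p]
    return Z, p, cumulative

# Hypothetical data: 120 jobs, 6 correlated attributes.
rng = np.random.default_rng(0)
base = rng.normal(size=(120, 3))
X = np.hstack([base, base @ rng.normal(size=(3, 3)) + 0.1 * rng.normal(size=(120, 3))])
Z, p, cumulative = pca_replace(X)
gram = Z.T @ Z   # off-diagonal entries near zero: the new variables are uncorrelated
```

The columns of `Z` play the role of the new job attributes that are passed on to CART and the BPNs.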

Job Classification Using CART
CART, introduced by Breiman et al. [21], is a statistical procedure primarily used for object classification. The objective is to classify a group of objects into two or more populations. CART can handle both categorical and continuous data. CART uses an exhaustive search in splitting, with the objective of improving the impurity measure, e.g., reducing the residual sum of squares.
The procedure of CART is composed of three stages: tree growing, stopping, and pruning. The first stage is to grow the tree using a recursive partitioning technique that selects variables and split points according to a pre-specified criterion. Criteria to this end include Gini, twoing, ordered twoing, and maximum deviance reduction [21].
Tree growing stops if any of the following conditions is satisfied: (1) The improvement in the performance measure has become insignificant with more branches.
(2) A certain number of nodes have been generated.
(3) The depth of the tree has reached a certain level.
A large tree may overfit, while a small tree may not reflect the inherent structure of the data. Cost-complexity pruning is usually used to tackle this issue. The cost-complexity of a tree T is the sum of the sum of squared errors (SSE) and the penalty on the complexity/size of the tree:
$$C_{\alpha}(T) = \sum_{j}\left(CT_j - y_j\right)^2 + \alpha\,|T|$$
where $CT_j$ and $y_j$ are the cycle time estimate and actual cycle time of job j, respectively; $\alpha \ge 0$ is the penalty per unit of tree size; $|T|$ is the size of the tree. The results of cost-complexity pruning are a nested sequence of subtrees starting from the largest tree and ending with the smallest tree (with only a single node). The effectiveness of a subtree can be evaluated by cross-validation or using another (testing) data set.
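The trade-off behind cost-complexity pruning can be illustrated with a small sketch. It assumes that tree size is measured by the number of terminal nodes; the candidate subtrees and their SSE values are hypothetical numbers, not results from this study.

```python
def cost_complexity(sse, n_leaves, alpha):
    """SSE plus a penalty proportional to the size of the tree."""
    return sse + alpha * n_leaves

def best_subtree(subtrees, alpha):
    """subtrees: list of (n_leaves, sse) for a nested sequence of pruned trees."""
    return min(subtrees, key=lambda t: cost_complexity(t[1], t[0], alpha))

# Hypothetical nested subtrees: larger trees fit the training data better.
subtrees = [(1, 5000.0), (2, 2600.0), (4, 1500.0), (8, 1300.0)]
choices = {alpha: best_subtree(subtrees, alpha) for alpha in (0, 100, 2000, 3000)}
```

With no penalty the largest tree is chosen; as the penalty grows, progressively smaller subtrees become optimal, which is what produces the nested sequence of trees.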
In the traditional CART approach, the cycle times of jobs assigned to a branch (b) are estimated with the same value $\bar{y}(b)$ that is equal to the average of the historical cycle times:
$$\bar{y}(b) = \frac{1}{|G(b)|}\sum_{j \in G(b)} y_j$$
where G(b) represents the set of jobs of branch b. However, such a treatment is far from accurate. For this reason, in this study, a BPN is constructed for each branch to estimate the cycle times of jobs assigned to this branch:
$$CT_j = \mathrm{BPN}_b(\mathbf{z}_j), \quad j \in G(b)$$
where $\mathrm{BPN}_b$ is the BPN constructed for branch b to estimate the job cycle times; $\mathbf{z}_j$ is the vector of the new attributes of job j. The comparison of CART, CART-BPN, and the proposed methodology is illustrated in Figure 3, in which $\Delta_k$ is the boundary value used to split the data along the new attribute k.
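The difference between a constant estimate per branch and a dedicated model per branch can be seen in a toy example. In this sketch a least-squares line stands in for the per-branch BPN (an assumption made only to keep the code short and dependency-free), and the data and the split value `delta` are hypothetical.

```python
def sse(actual, estimate):
    return sum((a - e) ** 2 for a, e in zip(actual, estimate))

def branch_mean_estimates(z, y, delta):
    """Traditional CART: every job in a branch receives the branch average."""
    means = {}
    for side in (True, False):
        ys = [b for a, b in zip(z, y) if (a <= delta) == side]
        means[side] = sum(ys) / len(ys)
    return [means[a <= delta] for a in z]

def fit_line(zs, ys):
    """Least-squares line; a stand-in for the per-branch BPN (assumption)."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = sum((a - mz) * (b - my) for a, b in zip(zs, ys)) / sum((a - mz) ** 2 for a in zs)
    return lambda v: my + slope * (v - mz)

def branch_model_estimates(z, y, delta):
    """One fitted model per branch instead of one constant per branch."""
    models = {}
    for side in (True, False):
        pts = [(a, b) for a, b in zip(z, y) if (a <= delta) == side]
        models[side] = fit_line([a for a, _ in pts], [b for _, b in pts])
    return [models[a <= delta](a) for a in z]

# Hypothetical jobs: one new attribute z and the actual cycle time y.
z = [1, 2, 3, 4, 6, 7, 8, 9]
y = [10, 12, 14, 16, 30, 33, 36, 39]
sse_mean = sse(y, branch_mean_estimates(z, y, delta=5))
sse_model = sse(y, branch_model_estimates(z, y, delta=5))
```

Because cycle times still vary within each branch, the branch averages leave a sizeable residual, whereas the per-branch models capture the within-branch trend.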

Job Cycle Time Estimation within Each Node
In a wafer fabrication factory, the relationship between the cycle time and attributes of a job has been shown to be nonlinear [11]. Consider the simplest example. In some steps wafers are processed piece by piece, while in other steps tens of wafers are processed as a whole. As a result, the relationship between the cycle time and size of a job cannot be fitted with a linear equation. BPN is a well-known tool for fitting nonlinear relationships, as is CART. A combination of CART and BPN is natural, and has potential for improving the performance of estimating the cycle time of a job.
Another question is why jobs with similar attributes have very different cycle times. That is because most of the attributes of a job are determined when the job is released into the factory, whereas the cycle time of a job depends on the future conditions of the factory, which may differ considerably even for jobs with similar attributes. That explains the limited effectiveness of the existing methods based on job classification.

Procedure
Subsequently, a BPN is constructed to estimate the cycle times of jobs. The BPN is configured as follows. There are K inputs to the BPN including the new attributes of a job. Many past studies have shown that a BPN with a single hidden layer can achieve a satisfactory approximation performance [15][16][17]22]. In addition, several ways have been proposed in the literature to determine the number of neurons in the hidden layer, e.g., [22][23][24]. In this study, common neural network learning parameters, including the number of hidden layers, the number of hidden layer units, the learning rate, and the learning time, are all determined using a trial-and-error method. The activation/transformation functions for the input and hidden layers are the linear activation function and the hyperbolic tangent sigmoid function, respectively:
$$f(x) = x \quad \text{(linear activation)}$$
$$f(x) = \frac{2}{1 + e^{-2x}} - 1 = \tanh(x) \quad \text{(hyperbolic tangent sigmoid)}$$
Inputs to the BPN are multiplied by the weights of the connections between the input and hidden layers, and are then summed on each neuron in the hidden layer. After being compared with the threshold on the neuron, only significant signals will be transformed and outputted as:
$$h_{jl} = \tanh\left(\sum_{k=1}^{K} w^{h}_{kl}\, z_{jk} - \theta^{h}_{l}\right)$$
where: $h_{jl}$ is the outputted signal from hidden-layer neuron l for job j; $z_{jk}$ is the k-th input (new attribute) of job j; $\theta^{h}_{l}$ is the threshold on hidden-layer neuron l; $w^{h}_{kl}$ is the weight of the connection between input-layer neuron k and hidden-layer neuron l.
Signals outputted from the hidden-layer neurons are transmitted to the neuron in the output layer in the same manner. Finally, the output from the BPN is generated as:
$$o_j = f^{o}\left(\sum_{l} w^{o}_{l}\, h_{jl} - \theta^{o}\right)$$
where: $f^{o}$ is the activation function of the output-layer neuron; $\theta^{o}$ is the threshold on the output-layer neuron; $w^{o}_{l}$ is the weight of the connection between hidden-layer neuron l and the output-layer neuron. Subsequently, many algorithms can be applied to train a BPN, such as the gradient descent (GD) algorithm, the conjugate gradient algorithm, the Levenberg-Marquardt (LM) algorithm, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, and others. For a recent comparison of these algorithms, refer to [22]. Among these algorithms, the LM algorithm has a faster convergence speed, and is therefore applied in the proposed methodology, as described below.
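Before turning to training, the forward computation of the network can be sketched as follows. This is a minimal illustration rather than the configuration used in the study: the function name `bpn_forward` and the weights are hypothetical, and a linear output neuron is assumed because the text does not state the output-layer activation.

```python
import math

def bpn_forward(z, w_h, theta_h, w_o, theta_o):
    """Forward pass of a single-hidden-layer network.

    z: the K inputs; w_h[k][l]: weight from input k to hidden neuron l;
    theta_h[l]: hidden thresholds; w_o[l]: weight from hidden neuron l to the
    output neuron; theta_o: output threshold.
    """
    hidden = []
    for l in range(len(theta_h)):
        net = sum(w_h[k][l] * z[k] for k in range(len(z))) - theta_h[l]
        hidden.append(math.tanh(net))        # hyperbolic tangent sigmoid
    # Assumption: linear output neuron (weighted sum minus threshold).
    return sum(w_o[l] * hidden[l] for l in range(len(hidden))) - theta_o

# Hypothetical network with K = 2 inputs and 2 hidden neurons.
out = bpn_forward([0.5, -0.2],
                  w_h=[[1.0, 0.0], [0.0, 1.0]],
                  theta_h=[0.0, 0.0],
                  w_o=[2.0, 1.0],
                  theta_o=0.1)
```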
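First, the inputs and the BPN parameters are placed in vectors:
$$\mathbf{z}_j = [z_{j1}, \dots, z_{jK}]$$
and
$$\boldsymbol{\beta} = [w^{h}_{11}, \dots, w^{h}_{KL}, \theta^{h}_{1}, \dots, \theta^{h}_{L}, w^{o}_{1}, \dots, w^{o}_{L}, \theta^{o}]$$
respectively, where L is the number of hidden-layer neurons. Then, the network output $o_j$ can be represented with:
$$o_j = f(\mathbf{z}_j, \boldsymbol{\beta})$$
Substituting this representation into the SSE gives:
$$SSE = \sum_{j}\left(y_j - f(\mathbf{z}_j, \boldsymbol{\beta})\right)^2$$
To find the optimal values of β, an iterative procedure is used in the LM algorithm:
(1) Specify the initial values of β, e.g., β = [1, 1, ..., 1].
(2) Replace β by β + δ, where δ is the vector of adjustments to be determined; the network output becomes $f(\mathbf{z}_j, \boldsymbol{\beta} + \boldsymbol{\delta})$.
(3) Approximating $f(\mathbf{z}_j, \boldsymbol{\beta} + \boldsymbol{\delta})$ by its linearization gives:
$$f(\mathbf{z}_j, \boldsymbol{\beta} + \boldsymbol{\delta}) \approx f(\mathbf{z}_j, \boldsymbol{\beta}) + \mathbf{J}_j \boldsymbol{\delta}$$
where:
$$\mathbf{J}_j = \frac{\partial f(\mathbf{z}_j, \boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$$
is the gradient vector of f with respect to β. SSE becomes:
$$SSE \approx \sum_{j}\left(y_j - f(\mathbf{z}_j, \boldsymbol{\beta}) - \mathbf{J}_j \boldsymbol{\delta}\right)^2$$
(4) The optimal value of δ can be obtained by taking the derivative of SSE with respect to δ and setting the result to zero. For details refer to [25].
(5) Return to step (2).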
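The iteration can be illustrated on a deliberately tiny problem. The sketch below fits a hypothetical two-parameter model f(x, β) = β₀·tanh(β₁x) instead of a full BPN, solves the damped normal equations (JᵀJ + λI)δ = Jᵀr in closed form, and uses a simple accept/reject rule for the damping factor λ. The model, the data, and the damping schedule are all assumptions chosen for brevity.

```python
import math

def model(x, b):
    return b[0] * math.tanh(b[1] * x)

def jacobian_row(x, b):
    t = math.tanh(b[1] * x)
    return t, b[0] * x * (1.0 - t * t)        # df/db0, df/db1

def sse(xs, ys, b):
    return sum((y - model(x, b)) ** 2 for x, y in zip(xs, ys))

def levenberg_marquardt(xs, ys, b, lam=1e-2, iters=100):
    for _ in range(iters):
        # Accumulate J^T J (2x2, symmetric) and J^T r, with r = y - f(x, b).
        a00 = a01 = a11 = g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            j0, j1 = jacobian_row(x, b)
            r = y - model(x, b)
            a00 += j0 * j0
            a01 += j0 * j1
            a11 += j1 * j1
            g0 += j0 * r
            g1 += j1 * r
        # Solve (J^T J + lam * I) delta = J^T r by Cramer's rule.
        d00, d11 = a00 + lam, a11 + lam
        det = d00 * d11 - a01 * a01
        delta = ((g0 * d11 - a01 * g1) / det, (d00 * g1 - a01 * g0) / det)
        trial = (b[0] + delta[0], b[1] + delta[1])
        if sse(xs, ys, trial) < sse(xs, ys, b):
            b, lam = trial, lam / 10.0        # accept the step, reduce damping
        else:
            lam *= 10.0                       # reject the step, increase damping
    return b

# Hypothetical noise-free data generated with beta = (2.0, 0.5).
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.tanh(0.5 * x) for x in xs]
start = (1.0, 1.0)                            # initial values, as in step (1)
fitted = levenberg_marquardt(xs, ys, start)
```

A step is kept only when it lowers the SSE, so the error never increases from one iteration to the next; a large λ makes the update behave like a short gradient-descent step, and a small λ like a Gauss-Newton step.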

Applications
A real case containing the data of 120 jobs from a wafer fabrication factory located in Taichung Scientific Park, Taiwan, was used to evaluate the effectiveness of the proposed methodology (see Table 2). There are tens of dynamic random access memory (DRAM) products in the wafer fabrication factory. Six attributes of a job, including job size, factory utilization, the queue length on the route, the queue length before the bottleneck, work-in-process (WIP), and the average waiting time, are denoted by $\mathbf{x}_j = [x_{j1} \; x_{j2} \; x_{j3} \; x_{j4} \; x_{j5} \; x_{j6}]$. The average waiting time is the average of the waiting times of the three most recently completed jobs, which measures the extent of delay that each job is likely to face. The six attributes were chosen from about twenty candidates by backward elimination in regression analysis. The proposed PCA-CART-BPN approach was implemented on a PC with an Intel Dual CPU E2200 2.2 GHz and 2.0 GB of RAM. The BPN was implemented with the Neural Network Toolbox of MATLAB 2006a. First, a Pareto comparison of the percentage of variability explained by each principal component is shown in Figure 4. An individual component by itself was able to explain less than 30% of the variance; the present study aimed to explain 85%-90% of the variance, and thus p was set to 5.
The five principal components shown in Figure 4 explained approximately 85% of the total variability in the standardized data; therefore, the five-component analysis was a reasonable reduction of dimensions. The coordinates of the original data were calculated in terms of the new coordinate system to produce component scores. Table 3 shows the component scores, which were used as new inputs to the BPN. Subsequently, the new data were divided into two parts: the training data (the first 90 jobs) and the testing data (the remaining 30 jobs) (see Table 2). The training data were not normalized before creating the CART tree, but were normalized into [0.1, 0.9] before they were learned by the BPNs. In fact, data normalization is not helpful for CART, but is conducive to the fast convergence of the BPN. Only the performance on the testing data was evaluated. In addition, four-fold cross-validation was applied to reduce bias and variability. Subsequently, CART was applied to classify jobs according to the new attributes. The impurity measure in CART was Gini. The popular 1-SE rule was applied for the CART tree pruning. In this way, the smallest tree whose cross-validation cost was less than the minimum cross-validation cost plus one standard error of the cross-validation cost was chosen. The results are shown in Figure 5.
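Two of the preprocessing choices described here, normalization into [0.1, 0.9] and tree selection by the 1-SE rule, can be sketched as follows. The function names and all numbers are hypothetical; the candidate list stands for the nested sequence of pruned trees with their cross-validation costs and standard errors.

```python
def normalize(values, low=0.1, high=0.9):
    """Min-max normalization of a list of values into [low, high]."""
    vmin, vmax = min(values), max(values)
    return [low + (high - low) * (v - vmin) / (vmax - vmin) for v in values]

def one_se_tree(candidates):
    """candidates: list of (n_leaves, cv_cost, cv_se) for the pruned subtrees.

    Returns the smallest tree whose cross-validation cost does not exceed
    the minimum cost plus the standard error of that minimum.
    """
    best = min(candidates, key=lambda c: c[1])
    limit = best[1] + best[2]
    eligible = [c for c in candidates if c[1] <= limit]
    return min(eligible, key=lambda c: c[0])

# Hypothetical cycle times (hours) and hypothetical cross-validation results.
scaled = normalize([1000.0, 1200.0, 1500.0])
candidates = [(1, 900.0, 40.0), (3, 520.0, 30.0), (5, 410.0, 25.0),
              (8, 395.0, 24.0), (12, 400.0, 26.0)]
chosen = one_se_tree(candidates)
```

In this hypothetical case the eight-leaf tree has the lowest cost, but the five-leaf tree lies within one standard error of it and is therefore preferred as the simpler model.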

Obviously, the number of job groups is much larger than that obtained using existing classifiers such as kM, FCM, or SOM. The classification results were also different from those using CART alone without PCA (see Figure 6). However, some groups contained too few jobs to train a BPN. For this reason, BPNs were applied to improve the cycle time estimates only for groups with more than five jobs.
Jobs in each group were used to train the BPN of the group. In this study, a trial-and-error method was used to determine the optimal number of neurons. With the learning rate (η) set to 0.1, various numbers of hidden-layer neurons were tested to observe their effects on the prediction error. After five repeated tests, the minimal prediction error (minimal RMSE) was obtained when the number of hidden-layer neurons was set to 8 (Figure 7). To determine the optimal values of the epochs and learning rate (η) and to obtain the best prediction results, the changes in the prediction error when using various combinations of learning rates (η) and epochs were analyzed. The results indicated that the minimal RMSE was achieved when the epochs were 50,000 and the learning rate (η) was 0.9 (Figure 8). However, that resulted in a very lengthy learning process. In addition, when the learning rate (η) was 0.9, setting the epochs to 30,000 was a favorable choice. For this reason, in the experiment, the learning rate (η) and epochs were set to 0.9 and 30,000, respectively.
To evaluate which method among the multiple linear regression (MLR), BPN, CART, case-based reasoning (CBR), CART-BPN, PCA-BPN, kM-BPN, and PCA-CART-BPN methods was more accurate, the estimation performances of the eight methods are summarized in Table 8, and the RMSE, MAE, and MAPE were employed to determine the estimation accuracy. However, we did not compare with methods based on expert collaboration, such as Chen [7] and Chen [10], because there is no collaboration part in the proposed methodology. In addition, some hybrid methods including kM-FBPN [15], FCM-BPN and SOM-FBPN [12] are similar in nature to kM-BPN, and therefore were not compared. Further, the symmetric-partitioning and incremental-relearning classification and BPN approach proposed by Chen [18] was not compared because the execution time was more than 5 min. Only the estimation performances on the testing/validation data were compared.
(1) As evident from Tables 5 and 6, the most obvious advantage of CART-BPN over CART was in terms of MAPE, which reached 10%. The prediction error of CART-BPN was smaller than that of CART; thus, the CART-BPN predictions were more accurate.
To validate the more accurate prediction of the CART-BPN, the paired t test was employed.
H0: When estimating the job cycle time, the estimation accuracy of the CART-BPN method is the same as that obtained from using the CART method.
H1: When estimating the job cycle time, the CART-BPN method is more accurate than the CART method.
As expected, the CART-BPN forecasting method was superior to the CART method. Table 9 summarizes the results of the comparative analysis. Therefore, a nonlinear analysis is indeed beneficial to the job cycle time estimation problem.
(2) The estimation efficacy of PCA-CART-BPN exceeded that of the statistical regression model (MLR) by 41%. To make sure that such a difference is statistically significant, a paired t test was used:
H0: When estimating the job cycle time, the estimation accuracy of the PCA-CART-BPN methodology is the same as that of the statistical regression approach.
H1: When estimating the job cycle time, the PCA-CART-BPN methodology is more accurate than the statistical regression approach.
The comparison results are summarized in Table 10. As a result, the advantage of the PCA-CART-BPN methodology over statistical regression was statistically significant at α = 0.1.
(3) Compared with the CART-BPN and kM-BPN models, the PCA-CART-BPN method reduced the MAE by 52% and 66%, respectively. Therefore, PCA-CART was a better job classifier than kM and CART.
(4) The most obvious advantage of the proposed methodology over CART was up to 70% in terms of MAE.
(5) The most obvious advantage of the proposed methodology over CART-BPN was up to 52% in terms of MAE, which confirmed the effectiveness of variable replacement using PCA.
(6) If the differences between the attributes of two jobs could be fully reflected in their cycle times, CBR would be very effective. However, the results did not support this viewpoint.
(7) The same problem happened to kM-BPN. Nevertheless, the performance of kM-BPN was quite close to that of the proposed methodology, which can be attributed to the approximation ability of BPN.
(8) The execution times of the eight methods are compared in Table 11. Direct methods like MLR were very fast. Iterative methods like CART, CBR, and BPN were somewhat slower. Nevertheless, training a BPN using the LM algorithm was also efficient. The execution times of the hybrid methods (CART-BPN, PCA-BPN, kM-BPN, and PCA-CART-BPN) were approximately equal, which supports using the proposed methodology instead of CART-BPN, PCA-BPN, or kM-BPN.
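The accuracy measures and the paired t statistic used in the comparisons above can be computed as in the following sketch. The function names and the sample numbers are hypothetical and do not reproduce the values in Tables 4 to 10; the t statistic would be compared with the critical value of the t distribution with n - 1 degrees of freedom.

```python
import math

def rmse(actual, estimate):
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimate)) / len(actual))

def mae(actual, estimate):
    return sum(abs(a - e) for a, e in zip(actual, estimate)) / len(actual)

def mape(actual, estimate):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - e) / a for a, e in zip(actual, estimate)) / len(actual)

def paired_t(errors_a, errors_b):
    """t statistic and degrees of freedom for paired absolute errors (a minus b)."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)   # sample variance of differences
    return mean / math.sqrt(var / n), n - 1

# Hypothetical actual and estimated cycle times (hours).
actual = [1000.0, 1200.0, 900.0, 1100.0]
estimate = [950.0, 1260.0, 900.0, 1045.0]
scores = (rmse(actual, estimate), mae(actual, estimate), mape(actual, estimate))

# Hypothetical absolute errors of two methods on the same five test jobs.
t_stat, dof = paired_t([60.0, 45.0, 80.0, 30.0, 55.0], [50.0, 40.0, 65.0, 32.0, 48.0])
```

A positive t statistic here means the first method has the larger errors on average, which corresponds to the one-sided alternative hypothesis that the second method is more accurate.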

Managerial Implications
Estimating and shortening the cycle time of each job is an important task to maintain a competitive edge in the DRAM industry. For example, the famous DRAM maker, Samsung, implemented the short cycle time and low inventory (SLIM) method to estimate the cycle times and WIP levels for various manufacturing steps, so that a more effective control of the factory was possible. As a result, the average cycle times of some DRAM products were reduced from more than 80 days to less than 30 days, bringing Samsung a benefit of about US$1 billion [23].
Incorrectly estimating the cycle times of jobs increases the difficulties in projecting the monthly output of a wafer fabrication factory [10], which then misleads the production planning personnel in making the release plan. If many wafers are outputted from a wafer fabrication factory at the wrong time, during a month with very low average selling prices (ASPs), the wafer fabrication factory will suffer considerable losses. To illustrate this, an example is given as follows. There are two products, A and B, in the factory. The ASPs of the two products are shown in Table 12. The gross dies and wafer yields of the two products are the same: 500 dies per wafer and 95%. The monthly capacity of the factory is 10,000 wafers, distributed between the two products according to their ASPs. The actual cycle times of the two products are two and three months, respectively. The release plan based on the correct cycle times is shown in Table 13. The yearly revenues (months 1 to 12) are US$205 million. If the cycle time of product A is mistaken as three months, the release plan will be as shown in Table 14. The yearly revenues reduce to US$203 million, resulting in a loss of US$2 million per year.
In future studies, a collaboration mechanism can be incorporated into the PCA-CART-BPN approach, so that multiple experts can estimate the cycle time of a job collaboratively.

Figure 2. The procedure for the proposed PCA-CART-BPN approach.
Figure 3. Comparison of CART and the proposed methodology.
Figure 7. Various numbers of neurons affected the prediction error.
Figure 8. Effects of the number of epochs.

Table 1. A comparison of the proposed methodology with some existing methods.
Table 2. The collected data.
Table 3. New inputs to the BPN.
Table 4. The estimation accuracy of the PCA-CART-BPN method (four-fold cross-validation).
Table 5. The estimation accuracy of the CART method (four-fold cross-validation).
Table 6. The estimation accuracy of the CART-BPN method (four-fold cross-validation).
Table 7. The estimation accuracy of the PCA-BPN method (four-fold cross-validation).
Table 8. Comparisons of the performances of various methods.
Table 9. CART-BPN and CART of the paired t test.
Table 10. PCA-CART-BPN and regression of the paired t test.
Table 11. The comparison results.
Table 12. The average selling prices (ASPs) of the two products.
Table 13. The correct release plan.
Table 15. Some software packages for implementing PCA, CART, and BPN.