A Hybrid Approach Combining Fuzzy c-Means-Based Genetic Algorithm and Machine Learning for Predicting Job Cycle Times for Semiconductor Manufacturing

Job cycle time is the time required to complete a job, and its prediction is a critical task for a semiconductor fabrication factory. An accurate predictive model helps the factory pursue sustainable development, meet customer requirements, and promote downstream operations. To predict job cycle time effectively in semiconductor fabrication factories, we propose a hybrid approach that combines a fuzzy c-means (FCM)-based genetic algorithm (GA) with a backpropagation network (BPN). All job records are divided into two datasets: the first is used for clustering and training, and the second for testing. An FCM-based GA classification method is developed to pre-classify the first dataset of job records into several clusters. The classification results are then fed into a BPN predictor, whose predicted cycle times are compared with the second dataset. Finally, we present a case study using an actual dataset obtained from a semiconductor fabrication factory to demonstrate the effectiveness and efficiency of the proposed approach.


Introduction
Job cycle time is the cycle time of a job or the time required to complete a job, the prediction of which is a critical task associated with different types of systems, such as production systems [1,2], computer systems [3], and network systems [4]. Various managerial goals can be achieved once the job cycle time has been predicted accurately at a factory, including ordering-decision support, internal due-date assignment, output projection, enhancing customer relationships, guiding subsequent operations [5], pursuing sustainable development, meeting customer requirements, and promoting downstream operations. These advantages can support the competitiveness of a system to allow it to survive and be developed sustainably.
In semiconductor manufacturing, each wafer fabrication factory is a complicated production system with idiosyncratic features such as changing demand, various product types and priorities, equipment unreliability, unbalanced capacity, job reentry into machines, alternative machines, sequence-dependent setup time, and shifting bottlenecks [6], which strongly affect job cycle time and make it very difficult to predict. However, these important features could be detected by mining and analyzing current job cycle time data. Herein, we present a hybrid approach that comprises the fuzzy c-means (FCM)-based genetic algorithm (GA) and a backpropagation network (BPN) to predict job cycle times for semiconductor manufacturing factories.
Semiconductor wafer fabrication is currently a very complex manufacturing process, presenting various production planning and control issues. Predicting the cycle time of each job in a wafer fabrication factory is a critical task for every wafer manufacturer. Glassey and Resende [7] presented a closed-loop job-release control policy to minimize the average cycle time of jobs in a wafer fabrication factory. Various studies [2,8-10] then emphasized the significance of predicting job cycle times for semiconductor fabrication factories. Here, we identify and review various studies that surveyed job cycle time prediction issues in semiconductor manufacturing.
Initially, Chung and Huang [11] classified the existing approaches used to predict job cycle times in semiconductor manufacturing into four categories, namely (i) statistical, (ii) analytical, (iii) simulation, and (iv) hybrid methods. Now, because of the development of machine learning technology, six categories [2] are generally used, namely (i) statistical, (ii) production simulation (PS), (iii) artificial neural networks (ANNs), (iv) case-based reasoning (CBR), (v) fuzzy modeling, and (vi) hybrid approaches. Herein, we review previous studies in the context of (i) statistical analysis, (ii) analytical methods, (iii) ANNs, (iv) CBR, (v) PS, (vi) fuzzy modeling methods, and (vii) hybrid approaches.
Statistical analysis is a prevalent method in practical applications. A regression model [12] was used to forecast the fab throughput time, including the wait time and processing time. Backus et al. [13] used statistical methods based on modern data-mining algorithms to develop nonlinear predictors for estimating the cycle time of a target lot in a factory. Yang et al. [14] proposed a nonlinear regression metamodel based on queuing theory to generate cycle-time-throughput curves, where simulation experiments were built up sequentially using a multistage procedure. Subsequently, Yang et al. [15] used a factory simulation to fit the metamodels to determine the parameters of the generalized gamma distribution. Pearn et al. [16] presented a due-date assignment model by modeling the waiting time for each product type using a gamma distribution for the due-date assignment problem in a semiconductor fabrication factory. Tai et al. [17] developed an accurate cycle-time estimation method to satisfy a targeted on-time delivery rate. A statistical approach was used to calculate the cycle time for the sum of multiple Weibull-distributed waiting times in the multilayer semiconductor final testing process.
Analytical methods are attracting considerable attention as a way to estimate job cycle times for semiconductor fabrication factories. Chung and Huang [11] provided an analytical approach to developing cycle-time estimation algorithms for engineering lots by analyzing the material flow characteristics in a wafer fabrication factory. Shanthikumar et al. [18] developed a novel solution to reduce the cycle time for each lot in a semiconductor fabrication factory by relaxing a fundamental assumption in classical queuing theory. Morrison and Martin [19] developed a G/G/m-queue model to estimate the total time which a product lot spent in the G/G/m-queue system.
The neural network is one of the most well-known techniques in artificial intelligence [20]. Many studies have also shown that ANN-based methods outperform traditional methods while avoiding lengthy and rigorous experiments during prediction. Chang and Hsieh [21] presented a neural network model to forecast the due date of each order at a wafer fabrication factory, where the experimental results indicated that the proposed approach was effective and efficient when compared with some traditional methods. Sha and Hsu [22] developed an ANN-based due-date assignment rule combined with simulation technology and statistical analysis for predicting lead times at a wafer fabrication factory. Chien et al. [23] developed a manufacturing intelligence approach by integrating Gauss-Newton regression and BPN as a basic model for forecasting the cycle time of a production line. Gazder and Ratrout [24] presented a logit-ANN model for mode selection and applied it in a case study for border transportation. Singh et al. [20] proposed a novel approach for multicriteria decision-making problems using the analytical hierarchy process for evaluating the total transportation cost and an ANN model to investigate the prediction of the total transportation cost.
CBR methods have been widely used in previous studies. Chang et al. [1] explored an application of CBR and developed a CBR system using a similarity measure among orders for due-date assignment problems at a wafer fabrication factory. Chiu et al. [25] developed a CBR approach that used a k-nearest-neighbor method with dynamic feature weights and nonlinear similarity functions to predict order due dates for the due-date assignment problem at a wafer fabrication factory. Chang et al. [26] developed a CBR model in which a GA was used to predict the job cycle time, and a self-organizing map was used to cluster the job cycle time and related shop-floor status at a semiconductor fabrication factory. Liu et al. [27] proposed an approach to predicting job cycle times by applying evolving fuzzy CBR and self-organizing map methods for a semiconductor fabrication factory.
In addition to the aforementioned approaches, various PS methods have also been proposed for estimating job cycle times for semiconductor fabrication factories. Vig and Dooley [28] presented two new dynamic due-date assignment rules used to predict the job cycle time based on recently completed jobs. Those two new rules were also compared with other established job cycle time-estimation models via computer simulation based on the criterion of due-date performance. Veeger et al. [29] proposed an aggregate model that an extensive simulation study demonstrated could accurately predict the cycle-time distribution of integrated processing workstations in a semiconductor fabrication factory. Hsieh et al. [30] proposed a progressive simulation metamodeling methodology that allowed efficient development of the response surface between the cycle time of regular lots and the percentage of hot lots of high priority in a semiconductor fabrication factory. Yang, Hu [31] built a simulation model for due-date assignment by using the orthogonal kernel least-squares algorithm and imitating the production process of a highly dynamic job shop.
Many fuzzy modeling methods have also been developed previously. Chen [2] developed a fuzzy BPN to incorporate production-control expert judgments with expert opinions to enhance the performance of an existing crisp BPN for predicting the output times of wafer lots. Chang et al. [32] presented a fuzzy modeling method that was evolved further with a GA for the due-date assignment problem in semiconductor manufacturing. Chen and Romanowski [6] proposed a fuzzy data-mining approach based on an innovative fuzzy BPN to determine the lower and upper bounds of job cycle times for wafer fabrication factories. Chen [33] developed a fuzzy neural network-based fluctuation-smoothing rule to better estimate the remaining cycle time of a job at a wafer fabrication factory.
Recently, many hybrid approaches [34,35] based on machine learning and classification have been proposed to improve job cycle time prediction accuracy by analyzing the data from semiconductor fabrication factories. Chen [36] applied a fuzzy BPN approach by pre-classifying wafer lots with the k-means classifier before predicting the output time of a job. Chen et al. [37] proposed two hybrid approaches with job record post-classification, namely (i) the equally divided method and (ii) the proportional-to-error method, where a job was post-classified by a BPN after the forecasting error was generated. Tirkel [38] developed cycle-time prediction models by applying machine learning and data-mining methods, where the best neural network model obtained a higher prediction accuracy than that of the decision-tree method. Chen and Lin [39] proposed a fuzzy BPN approach to improve the accuracy of wafer-lot output-time prediction using a relatively small training set. Wang and Zhang [10] designed big-data analytics to predict wafer-lot cycle times. Chen [40] proposed a BPN-based hybrid approach to estimate job cycle times and determine the cycle-time range for a semiconductor manufacturing factory. Chen and Wang [41] estimated the cycle time of a job using a BPN after a nonlinear approach had been used to normalize job cycle times.
Previous studies have used various hybrid approaches for job cycle time prediction [42,43]. Generally, comparisons and trade-offs are made to select a suitable approach among the aforementioned categories. Analytical methods are inappropriate for job cycle time prediction because it is challenging to construct a complex semiconductor manufacturing system analytically. Excessive simulation time and the need for a large amount of data are the main disadvantages of PS [44]. Besides, wafer-lot priorities and machine-dispatching rules change in real time in semiconductor manufacturing systems, limiting the applicability of statistical methods in predicting the cycle times of wafer lots [10]. ANNs, CBR, and fuzzy modeling methods can provide reasonable prediction accuracy with a relatively small amount of data [44]. Many recent results have shown that classification-based hybrid approaches can improve the accuracy of job cycle time prediction without requiring a large amount of data from semiconductor fabrication factories [44,45]. To improve the performance of job cycle time prediction, we develop a new hybrid approach that combines data mining (FCM-based GA) and machine learning (BPN) methods.
The selection of the classification method is critical because the quality of the resulting classification affects the accuracy of job cycle time prediction for semiconductor fabrication factories. However, it is not appropriate to classify the job records exclusively into several clusters. In other words, one job record can belong to multiple clusters with different membership values, and these membership values indicate how strongly the record contributes to each of the corresponding clusters. Besides, because it is difficult to determine which cluster a job record belongs to when it falls on the border of two adjacent clusters, FCM has been used in prediction models to classify the data into several clusters in advance, an approach that has been used widely in various fields [46-48]. The FCM algorithm uses the sum of the membership values for all clusters, making it sensitive to noise and isolated data. In addition, FCM is essentially a kind of local hill-climbing algorithm, which makes it sensitive to the initial cluster centers and prone to converging to a local extremum [49,50]. To overcome these defects, an FCM-based GA has been introduced.
Before applying the FCM-based GA method, all job records are divided into (i) a dataset for clustering and training and (ii) a dataset for testing. We then use an FCM-based GA classifier to classify the first dataset of job records into several clusters because (i) FCM allows for flexibility in classifying job records and reduces the sensitivity to noise and isolated data, and (ii) a GA can improve the FCM performance by preventing cluster centers from converging to local extrema. The cluster center results obtained from the FCM-based GA classifier are fed into the BPN predictor for training. We found that the BPN predictor overfits when it is trained on raw data containing noise. The BPN predictor predicts job cycle times and compares them with the second dataset.
The remainder of this paper is organized as follows. In Section 2, we describe the proposed FCM-based GA and BPN methods for predicting job cycle times for semiconductor manufacturing. Then, Section 3 uses an actual dataset obtained from a semiconductor fabrication factory to test the proposed hybrid approach. Finally, in Section 4, we conclude by showing this study's contributions and future directions.

Methodology
In this section, we present our hybrid approach and explain how it predicts the job cycle time. In semiconductor fabrication factories, six attributes are usually considered for determining the final job cycle time, namely: (i) the job size (pieces); (ii) the factory work-in-process (jobs); (iii) the queue length before the bottleneck (jobs); (iv) the queue length on the route (jobs); (v) the average waiting time of recently completed jobs (hours); and (vi) the factory utilization rate [40,51]. These attributes make different contributions to the final cycle time (hours) of a job. Before we explain the approach in detail, we introduce the notation used herein.

Notation
The notation used in this model is as follows.
Sets
$\mathcal{R}$: Dataset of job records for clustering and training, indexed by $r \in \mathcal{R}$
$\mathcal{R}'$: Dataset of job records for testing, indexed by $r' \in \mathcal{R}'$
$\mathcal{B}$: Set of job attributes, indexed by $b \in \mathcal{B}$
Parameters
$x_{rb}$: Value of attribute $b$ of job record $r$
$t_r$: Cycle time of job record $r$
$w_b$: Weighted value of attribute $b$
$G$: Number of clusters, indexed by $c$, $c \in \{1, \dots, G\}$
$K$: Number of attributes
$m$: Fuzziness exponent value
Decision variables
$v_{cb}$: $c$-th cluster center of attribute $b$
$V_c$: $c$-th K-dimensional cluster center
$u_{rc}$: Membership value of job record $r$ to the $c$-th cluster
$\bar{t}_c$: Expected job cycle time of cluster $c$
$\hat{t}_{r'}$: Cycle time prediction of new job record $r'$

Fuzzy c-Means Clustering
FCM clustering is a soft partitioning algorithm, where jobs with similar attribute values are classified into the same cluster. FCM was developed by Dunn [52] and later improved by Bezdek and Dunn [53]. It is used to assign patterns or data to different clusters, where each data point is allowed to belong to several clusters with different membership values. These membership values represent the extent to which each point belongs to each cluster, and they are also used to update the cluster centers. With this FCM clustering method, the number of clusters is pre-determined, and each data point is then assigned to one or more clusters. The FCM algorithm can be seen as a fuzzified version of the k-means clustering algorithm and is based on minimizing an objective function called the c-means function [53,54]. This takes three input parameters, namely: (i) the number of clusters $G$; (ii) the fuzziness exponent value $m > 1$; and (iii) the termination tolerance $\delta > 0$. Given a set of job records $\mathcal{R}$, including six attributes $\mathcal{B}$, FCM attempts to minimize the objective function using the following steps.
Since the proposed hybrid FCM-based GA needs an efficient chromosome design, it is crucial to understand the FCM spatially. To help readers understand, an example of FCM is given in Figure 1 with six data points in circles, two attributes on the axes, and three cluster centers in rectangles in a two-dimensional space. The distance is calculated once cluster centers are obtained.
(1) Minimization of c-means functional The FCM minimizes an objective function called the c-means functional, which measures the sum of fuzzy similarity for job records $\mathcal{R}$ over all clusters. Note that, at each iteration, the FCM updates the centers of all clusters. The objective function is given by
$$J = \sum_{r \in \mathcal{R}} \sum_{c=1}^{G} (u_{rc})^{m} \, d_{rc}^{2}, \tag{1}$$
where $u_{rc}$ is the membership value of job record $r$ to cluster $c$, $m$ is the fuzziness exponent, and $d_{rc}$ is the Euclidean distance between job record $r$ and the center of cluster $c$.
(2) Classification (membership updates) We update the membership value $u_{rc}$ of each job record $r$ toward cluster $c$. Note that at the start of the algorithm, these membership values are initialized randomly such that $0 \le u_{rc} \le 1$ and $\sum_{c=1}^{G} u_{rc} = 1$. The fuzziness coefficient $1 < m < \infty$ represents the required clustering tolerance. The membership value is updated by
$$u_{rc} = \frac{1}{\sum_{g=1}^{G} \left( d_{rc} / d_{rg} \right)^{2/(m-1)}}. \tag{2}$$
(3) Determination of cluster center We calculate the K-dimensional center $V_c = (v_{c1}, v_{c2}, \dots, v_{cK})$ for each cluster $c$ using Equation (3) and then update the Euclidean distance $d_{rc}$ from each job record $r$ to the center of cluster $c$, as well as the membership value $u_{rc}$, using Equation (4):
$$v_{cb} = \frac{\sum_{r \in \mathcal{R}} (u_{rc})^{m} \, x_{rb}}{\sum_{r \in \mathcal{R}} (u_{rc})^{m}}, \tag{3}$$
$$d_{rc} = \sqrt{\sum_{b \in \mathcal{B}} w_b \, (x_{rb} - v_{cb})^{2}}, \tag{4}$$
where $x_{rb}$ is the value of attribute $b$ of job record $r$ and $w_b$ is the weighted value of attribute $b$. Because the range of values differs from attribute to attribute, the original data should be normalized into the same range for all attributes so that attributes with larger values do not overwhelm those with smaller values. In this process of reducing prediction errors, the weighted value $w_b$ of each attribute is determined.
(4) Termination condition The incremental improvement in the objective function value determines whether the iteration continues. Given the FCM termination tolerance $\delta$, the FCM terminates if
$$\left| J^{(s)} - J^{(s-1)} \right| < \delta,$$
where $s$ is the iteration number.
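The four FCM steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' C++ implementation: all function names are hypothetical, every attribute weight is taken as 1 (so the distance is the plain Euclidean distance), and the toy records at the end are made up.

```python
import math
import random


def distances(x, centers):
    """Euclidean distance from one record to every cluster center.
    Assumption: all attribute weights equal 1."""
    return [math.sqrt(sum((xb - vb) ** 2 for xb, vb in zip(x, v))) for v in centers]


def update_memberships(records, centers, m):
    """Standard FCM membership update for a fuzziness exponent m > 1."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in records:
        d = distances(x, centers)
        if any(dc == 0.0 for dc in d):
            # The record coincides with a center: give it full membership there.
            hits = [1.0 if dc == 0.0 else 0.0 for dc in d]
            memberships.append([h / sum(hits) for h in hits])
        else:
            memberships.append([1.0 / sum((dc / dg) ** power for dg in d) for dc in d])
    return memberships


def update_centers(records, memberships, m):
    """Each center is the mean of the records weighted by membership**m."""
    centers = []
    for c in range(len(memberships[0])):
        weights = [u[c] ** m for u in memberships]
        total = sum(weights)
        centers.append([sum(w * x[b] for w, x in zip(weights, records)) / total
                        for b in range(len(records[0]))])
    return centers


def objective(records, centers, memberships, m):
    """c-means functional: sum of membership**m times squared distance."""
    return sum((u_rc ** m) * (d_rc ** 2)
               for x, u in zip(records, memberships)
               for u_rc, d_rc in zip(u, distances(x, centers)))


def fcm(records, n_clusters, m=2.0, tol=1e-6, max_iter=200, seed=0):
    rng = random.Random(seed)
    # Random initial memberships, normalized so that each row sums to 1.
    memberships = []
    for _ in records:
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        memberships.append([v / sum(row) for v in row])
    previous = float("inf")
    for _ in range(max_iter):
        centers = update_centers(records, memberships, m)
        memberships = update_memberships(records, centers, m)
        current = objective(records, centers, memberships, m)
        if abs(previous - current) < tol:  # termination condition
            break
        previous = current
    return centers, memberships, current


# Made-up toy data: two well-separated groups of two-attribute records.
toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
toy_centers, toy_memberships, toy_objective = fcm(toy, n_clusters=2)
```

Because the result depends on the random initial memberships, running the sketch with several seeds shows the sensitivity to initialization that motivates combining FCM with a GA.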

Design of FCM-Based GA
As explained earlier, a GA has been incorporated into FCM to improve the FCM performance by preventing the cluster centers from converging to local extrema too easily. Past research indicates that the combination of GA and FCM methods has apparent strength. Ding and Fu [50] studied a combination of the kernel-based FCM and GA, leading to improved clustering performance. Wikaisuksakul [55] demonstrated that the FCM-NSGA (non-dominated sorting GA) achieves the best partitioning over the other techniques. Ye and Jin [56] suggested a clustering algorithm based on quantum GA, resulting in a better clustering than the general FCM clustering algorithm. Herein, we develop an FCM-based GA approach that combines a GA with FCM. We use center-based string encoding, nonlinear ranking selection, and an adaptive crossover and mutation strategy [55], discussed below.
(1) Chromosome structure An effective and efficient chromosome structure is important in any GA. The chromosomes of our proposed FCM-based GA represent the cluster centers by encoding them as a center-based string. Since the job records in this study have K attributes, the K-dimensional centers of the clusters are called genes and are concatenated into a string, as shown in Figure 2. An individual (chromosome) therefore comprises G K-dimensional centers. The FCM is implemented whenever the K-dimensional centers of the G clusters need to be found, given the attribute values and their membership values to cluster c.
(2) Fitness function The fitness function is the measure used to judge and evaluate individuals (chromosomes) in each generation until either a maximum number of generations is reached or the fitness value has converged (i.e., the fitness variance is small enough). In GAs, individuals with higher fitness values are considered better and are more likely to survive. In our case, we use the reciprocal of the objective function as the fitness function to evaluate each individual's fitness:
$$F = \frac{1}{J}, \tag{7}$$
where $J$ is the value of the FCM objective function (the c-means functional) for that individual.
(3) Genetic operators The genetic operators that drive the search process are as follows.
(1) Selection operator Here, the constant-ratio selection method is used to determine individuals to which genetic operations are applied. This selection operator allows us to use good individuals as parents for the population in the next generation. A group is created by selecting a predetermined percentage (or constant ratio) of individuals from the population, and the best individual among that group is chosen. This process is repeated as many times as the size of the population. This method can preserve the best individual in the current population. With this strategy, fitter individuals have higher survival probabilities, although this does not guarantee that the fittest individual is selected.
(2) Crossover operator First, we generate a random number in the interval [0, 1] and compare it with the crossover probability. If the random number is smaller than the crossover probability, the crossover operator is applied to two randomly selected parents to generate two new children. We use a single-point crossover operator, and the crossover point is a randomly selected integer position within the chromosome. The crossover process is shown in Figure 3.
(3) Mutation operator For each individual, we generate a random value in the interval [0, 1] and compare it with the mutation probability. If the random value is smaller than the mutation probability, we mutate that individual using single-point mutation, replacing a randomly selected gene with a new random K-dimensional center $V^* = (v_1^*, v_2^*, \dots, v_K^*)$, as shown in Figure 4.
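As a rough illustration of the three operators, the sketch below treats a chromosome as a list of G genes, each gene being one K-dimensional center. The function names, the gene-boundary crossover range (1 to G-1), the sampling bounds for a mutated center, and the "smaller than the probability" comparisons are assumptions made for this example rather than details confirmed by the text.

```python
import random


def select(population, fitness, ratio, rng):
    """Constant-ratio selection: repeatedly sample a fixed share of the
    population and keep the fittest individual of each sample."""
    group_size = max(1, int(round(ratio * len(population))))
    chosen = []
    for _ in range(len(population)):
        group = rng.sample(range(len(population)), group_size)
        best = max(group, key=lambda i: fitness[i])
        chosen.append([gene[:] for gene in population[best]])
    return chosen


def crossover(parent_a, parent_b, p_c, rng):
    """Single-point crossover at a gene (cluster-center) boundary."""
    if len(parent_a) < 2 or rng.random() >= p_c:
        # No crossover: the children are copies of the parents.
        return [g[:] for g in parent_a], [g[:] for g in parent_b]
    point = rng.randint(1, len(parent_a) - 1)  # assumed range 1..G-1
    child_a = [g[:] for g in parent_a[:point] + parent_b[point:]]
    child_b = [g[:] for g in parent_b[:point] + parent_a[point:]]
    return child_a, child_b


def mutate(individual, p_m, bounds, rng):
    """Single-point mutation: replace one randomly chosen gene with a new
    random center drawn inside per-attribute (low, high) bounds."""
    mutant = [g[:] for g in individual]
    if rng.random() < p_m:
        position = rng.randrange(len(mutant))
        mutant[position] = [rng.uniform(low, high) for low, high in bounds]
    return mutant
```

A typical generation under these assumptions would call `select` once, pair the selected individuals for `crossover`, and then pass each child through `mutate`.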

(4) Steps of FCM-based GA
The following steps must be implemented to apply the proposed FCM-based GA.
Step 1: Set the parameter values, namely the number of clusters G, the number of generations, the population size, the crossover probability, the mutation probability, the fuzziness value, and the termination tolerance.
Step 2: Initialize the population. Chromosomes are generated using the FCM clustering method as many times as the population size. To produce a chromosome, we calculate the membership value of each job record to each cluster from randomly generated numbers in the interval [0, 1], scaled so that the membership values of each job record sum to 1 over all clusters.
Step 3: Apply genetic operations. The value of the fitness function (Equation (7)) can be calculated for each individual. We then use genetic operations, namely the selection, crossover, and mutation operators, to improve population diversity.
Step 4: Apply optimal preservation. For each generation, the fitness values are recalculated after the genetic operations have been applied to evaluate each individual. Individuals with higher fitness values are more likely to be chosen for survival.
Step 5: Check termination condition. In the present study, the iteration terminates either after a given number of generations or when a given fitness variance value is achieved. If either of these conditions is satisfied, then evolution stops. Otherwise, we return to Step 3. For a population of $N$ individuals, indexed by $i$, with fitness values $F_i$, the variance is calculated as
$$\sigma^{2} = \frac{1}{N} \sum_{i=1}^{N} \left( F_i - \bar{F} \right)^{2}, \qquad \bar{F} = \frac{1}{N} \sum_{i=1}^{N} F_i.$$
After the evolution process is complete, we obtain the K-dimensional center of each cluster and the membership values. We then associate a job record with a cluster only if its membership value to that cluster exceeds a given threshold value. An auxiliary binary variable ϵ is used to calculate the expected job cycle time for all clusters: ϵ equals 1 if the membership value of the job record to the cluster exceeds the threshold and 0 otherwise. Finally, the expected job cycle time of each cluster is calculated from the cycle times of the job records for which ϵ equals 1. Figure 5 shows a flowchart of the complete FCM-based GA approach.
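The termination check in Step 5 and the threshold-based association of job records with clusters might look as follows. This is a hedged sketch: the function names are hypothetical, the population variance (division by the population size) is assumed, and the expected cycle time is assumed to be a membership-weighted mean over the associated records, which the text does not spell out. The sample memberships and cycle times are made up.

```python
def fitness_variance(fitness_values):
    """Population variance of the fitness values (assumed 1/N form)."""
    mean = sum(fitness_values) / len(fitness_values)
    return sum((f - mean) ** 2 for f in fitness_values) / len(fitness_values)


def should_stop(generation, max_generations, fitness_values, variance_tol):
    """Stop after a fixed number of generations or once fitness has converged."""
    return (generation >= max_generations
            or fitness_variance(fitness_values) <= variance_tol)


def expected_cycle_times(memberships, cycle_times, threshold):
    """Expected cycle time per cluster.

    Assumption: a membership-weighted mean over the job records whose
    membership to the cluster exceeds the threshold (indicator = 1).
    Returns None for a cluster with no associated record.
    """
    result = []
    for c in range(len(memberships[0])):
        numerator = denominator = 0.0
        for u, t in zip(memberships, cycle_times):
            indicator = 1 if u[c] > threshold else 0
            numerator += indicator * u[c] * t
            denominator += indicator * u[c]
        result.append(numerator / denominator if denominator > 0 else None)
    return result


# Made-up example: three job records, two clusters, threshold 0.05.
sample_memberships = [[0.9, 0.1], [0.8, 0.2], [0.02, 0.98]]
sample_cycle_times = [1000.0, 1200.0, 1500.0]
sample_expected = expected_cycle_times(sample_memberships, sample_cycle_times, 0.05)
```

In this made-up example the third record is excluded from the first cluster because its membership of 0.02 does not exceed the threshold, which is the effect the auxiliary binary variable is meant to capture.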

Backpropagation Network (BPN) Predictor
After the classification is complete, the K-dimensional center for each cluster and the corresponding expected job cycle time are obtained. This information represents the overall behavior and relationship of the dataset. It is important to determine the number of clusters required to capture the most information from the dataset. With too few clusters, it is hard to identify the main features of the dataset. Meanwhile, with too many clusters, the locations of the cluster centers are easily affected by some isolated and noisy job records. Therefore, we test multiple numbers of clusters in the next section to identify the best combination of experimental parameters and number of clusters.
For the cluster centers, the relationship between the independent variables (or attributes) and the corresponding job cycle time has been shown to be nonlinear [23]. As reviewed in Section 1, BPNs are well-known tools for fitting nonlinear relationships. In this sense, a BPN is a good choice for fitting the relationship and predicting the job cycle time [57,58]. Law [59] emphasized that the results predicted by BPNs are accurate with relatively few errors because a BPN adjusts its weights in the output layer to model the training elements. A BPN compares its output with the actual values and propagates the error back through the network. This process is repeated until the error falls within the range of acceptance, whereupon the neural network has been trained successfully. However, determining suitable training and architectural parameters remains difficult. These parameters are usually determined either by trial and error or by pairwise comparisons. The main structure of our BPN is shown in Figure 6.
Since there is no standard method to construct BPNs for a given problem [20], we have tested several BPN structures on a trial-and-error basis. A BPN has a strong nonlinear mapping ability and a flexible network structure. The number of hidden layers may be one or more, depending on the complexity of the problem, but more hidden layers might result in overfitting. Therefore, this study uses a BPN with one hidden layer, rather than an ANN with more hidden layers, for fitting the nonlinear relationships between the attribute values and job cycle time. One hidden layer also gives faster convergence without losing much quality.

Figure 6. Structure of the BPN (input layer, hidden layer, output layer).
The BPN usually consists of at least three layers. Each layer contains many neurons, and the neurons in adjacent layers are interconnected by a set of weights. The detailed configuration and training information of the BPN are described as follows:
(1) Input layer There are G cluster centers, each with K attributes. Therefore, K neurons are set in the input layer. The K attribute values of the cluster centers and the expected job cycle times are normalized to fall within the interval [0, 1] according to
$$N(z) = \frac{z - z_{\min}}{z_{\max} - z_{\min}},$$
where $z$ is a value of a given attribute (or an expected job cycle time) and $z_{\min}$ and $z_{\max}$ are the minimum and maximum of that quantity over all cluster centers.
(2) Single hidden layer The number of hidden layers may be one or more, depending on the complexity of the problem. In this study, one hidden layer is used for faster convergence and to avoid overfitting without losing much quality.
(3) Output layer The predicted job cycle time is obtained in normalized form.
(4) Training method Gradient descent has been used as a training method in this study since it is the most commonly used.
(5) Activation/transformation function We use the following nonlinear sigmoid function [60] as the activation function:
$$f(x) = \frac{1}{1 + e^{-x}},$$
where $x$ is the input variable.
(6) Learning rate (π): 0.01-1.0.
(7) Convergence criterion Many measures can be used to determine when to cease BPN training. Here, a convergence measure T is used in the BPN, computed from the job cycle time of each cluster c that is calculated at the output layer using the set of currently trained weights. The BPN training stops when T falls below a preset small threshold.
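To make the configuration above concrete, here is a minimal single-hidden-layer network trained by gradient descent with sigmoid activations, written in plain Python. It is a sketch under assumptions, not the authors' implementation: the class and function names are hypothetical, the error measure is taken to be half the sum of squared errors per epoch, updates are applied record by record, and min-max scaling is assumed for normalization.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def min_max(values):
    """Scale a list of numbers into [0, 1] (assumed min-max normalization)."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


class OneHiddenLayerBPN:
    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each hidden row holds n_inputs weights followed by one bias term.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        # Output weights: one per hidden neuron followed by one bias term.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                         + self.w_out[-1])
        return hidden, output

    def train(self, inputs, targets, rate=0.5, tol=1e-4, max_epochs=20000):
        """Gradient descent; stops when the epoch error drops below tol.
        Assumption: error = half the sum of squared errors over one epoch."""
        error = float("inf")
        for _ in range(max_epochs):
            error = 0.0
            for x, t in zip(inputs, targets):
                hidden, out = self.forward(x)
                error += 0.5 * (t - out) ** 2
                # Error signals, computed before any weight is changed.
                delta_out = (out - t) * out * (1.0 - out)
                delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                                for j, h in enumerate(hidden)]
                for j, h in enumerate(hidden):
                    self.w_out[j] -= rate * delta_out * h
                self.w_out[-1] -= rate * delta_out
                for j, row in enumerate(self.w_hidden):
                    for i, v in enumerate(x):
                        row[i] -= rate * delta_hidden[j] * v
                    row[-1] -= rate * delta_hidden[j]
            if error < tol:
                break
        return error
```

In the setting described here, each training input would be one normalized cluster center (K = 6 values) and the target would be the normalized expected job cycle time of that cluster; predictions are mapped back to hours by inverting the normalization.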

Data Description
To demonstrate how the proposed approach is applied, we use a series of 120 job records used in previous research [51,58]. They were collected from a wafer fabrication factory and are shown in Figure 7. Job cycle time is measured in hours (h); it has an average of 1237 h and a standard deviation of 205 h. The job cycle time pattern is highly nonstationary. Additionally, the job cycle time is correlated strongly with the six attribute variables mentioned in the Methodology section. To evaluate the predictive performance of the proposed approach fairly, we split these 120 job records into two datasets, namely (i) a training dataset and (ii) a testing dataset. Since the dataset used for testing usually accounts for a small proportion of the whole dataset, in this study (i) the first 110 job records are used for clustering and training and (ii) the remaining 10 job records are used for testing.

Experimental Settings
The FCM-based GA is used to classify the first 110 job records into clusters. The obtained cluster centers are fed into the BPN. After learning and training, 10 new testing jobs are given to the trained BPN predictor to predict their cycle times, where the prediction performance is assessed in terms of the mean absolute error (MAE), mean absolute percentage error (MAPE), and root-mean-squared error (RMSE), which are calculated as
$$\mathrm{MAE} = \frac{1}{n} \sum_{j=1}^{n} e_j, \qquad \mathrm{MAPE} = \frac{100\%}{n} \sum_{j=1}^{n} \frac{e_j}{t_j}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} e_j^{2}},$$
where $n$ is the number of testing job records, $t_j$ is the observed cycle time of testing job $j$, and $e_j$ is the absolute difference between the predicted and observed values.
Many parameters affect the accuracy of the job cycle time prediction. Herein, we test the performance and compare the results by using different cluster numbers G, different neuron numbers ξ in the hidden layer, and different membership threshold values. In addition, all experiments are conducted using populations with the same number of individuals. The remaining parameters used for the FCM-based GA and BPN are summarized in Table 1.
Table 1. Parameters used for the FCM-based GA and BPN.
Learning rate (π): 0.99
Number of neurons in the input layer (φ): 6
Number of neurons in the hidden layer (ξ): 7, 9, 11, 13, and 15
We implement the proposed hybrid approach in C++ using Visual Studio 2013. The experiments are run on a computer with an Intel Core i7-4790 CPU @3.6 GHz and 16 GB of memory under Windows 10 Professional edition. Because the BPN performance is sensitive to the initial conditions of random weights [39], the experiment is repeated at least five times with different initial conditions. The best and average predicted values are recorded for subsequent analysis.
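The three error measures are straightforward to compute once the predicted and observed cycle times of the testing jobs are available. The short sketch below uses hypothetical function names and made-up numbers purely for illustration; it does not reproduce any result from the case study.

```python
import math


def mae(observed, predicted):
    """Mean absolute error, in the unit of the data (hours here)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, relative to the observed values."""
    return 100.0 * sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root-mean-squared error, in the unit of the data (hours here)."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Made-up observed and predicted cycle times (hours) for two jobs.
observed_hours = [1000.0, 1200.0]
predicted_hours = [1100.0, 1140.0]
scores = (mae(observed_hours, predicted_hours),
          mape(observed_hours, predicted_hours),
          rmse(observed_hours, predicted_hours))
```

For these made-up values the absolute errors are 100 h and 60 h, so the MAE is 80 h and the MAPE is 7.5%; the RMSE is slightly larger than the MAE because it penalizes the larger error more heavily.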

Experimental Results
The predicted job cycle times and the observed job cycle times for different numbers of clusters and different membership threshold values are presented in Figure 8. It shows that the predicted job cycle time follows similar patterns to those of the observed data. The proposed approach is quite effective in predicting the job cycle time, and computational results show that the observed and predicted job cycle times are matched well when the number of clusters goes from 16 to 20.
Note that the membership threshold value may indeed affect the accuracy of the job cycle time prediction. As the number of clusters increases from 20 to 24, the job cycle time prediction performance fluctuates more under different membership threshold values. Next, we discuss the job cycle time prediction accuracy in terms of the MAE (hours), MAPE (%), and RMSE (hours). To investigate how the number of clusters affects the prediction accuracy, we experimented using six different numbers of clusters, namely 14, 16, 18, 20, 22, and 24. The MAE, MAPE, and RMSE are used to evaluate the effects of the number of clusters over all testing data for the identical threshold value and the identical number of neurons in the BPN hidden layer, in Figures 9-11, respectively. These figures show similar trends in the MAE, MAPE, and RMSE values against different numbers of clusters, indicating that the optimal number of clusters for a given dataset may deserve further study, which is beyond the scope of this study. The curves of MAE, MAPE, and RMSE show that the job cycle time prediction accuracy is strongly correlated with the number of clusters. The job cycle time prediction performance varies with the number of clusters and is poor with too many or too few clusters, meaning that simply increasing or decreasing the number of clusters does not provide better job cycle time predictions. In Figure 9, we generally obtain better job cycle time prediction performance (MAE) with threshold values of 0.05 or 0.08. These threshold values provide good data classification that improves the job cycle time prediction accuracy in the BPN predictor. Similar trends are obtained for the MAPE and RMSE values in Figures 10 and 11, respectively.
To investigate the effect of the number of neurons in the BPN hidden layer, we test the job cycle time prediction performance with three numbers of clusters (18, 20, and 22), five numbers of neurons in the hidden layer (7, 9, 11, 13, and 15), and four threshold values (0.01, 0.05, 0.08, and 0.1). Since the performance measures (MAE, MAPE, and RMSE) are prediction errors, lower values are better; the best prediction results, shaded grey in Table 2, were obtained with 20 clusters, a threshold value of 0.01, and ξ = 7 neurons in the hidden layer. Furthermore, to evaluate the effectiveness and efficiency of the proposed hybrid approach, we compare it with five existing approaches that were applied to the same dataset in previous works (Chen 2007b, Chen 2016b): (i) linear regression (LR), (ii) BPN, (iii) k-means BPN, (iv) k-means fuzzy BPN [36], and (v) post-classification BPN (Chen [58]).
In Table 3, the best prediction results obtained with the proposed approach (20 clusters, a threshold value of 0.01, and ξ = 7) are compared with those of the five existing approaches. The best results in the previous research were obtained by post-classification BPN (Chen 2016b). As shown in Table 3, the proposed approach improves the prediction accuracy by 16.3%, 4.6%, and 19.6% in terms of MAE, MAPE, and RMSE, respectively. This comparison suggests that the proposed approach is effective and applicable. It is also clear that the classification-based hybrid approaches (k-means BPN, k-means fuzzy BPN, post-classification BPN, and the proposed approach) perform better than the approaches without classification (LR and BPN). The proposed hybrid approach outperforms the BPN without classification because a BPN trained directly on raw job records is prone to overfitting. It also outperforms the k-means fuzzy BPN approach because the FCM-based GA method yields a better classification.
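As a hedged illustration of how such percentages can be derived, the sketch below assumes that an improvement is the relative reduction of an error measure with respect to a baseline approach; this definition is our assumption and is not stated explicitly in the text. The function name `relative_improvement` and the numbers are hypothetical and are not taken from Table 3.

```python
def relative_improvement(baseline_error, proposed_error):
    """Percentage reduction of an error measure relative to a baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error

# Hypothetical error values (hours), not taken from Table 3.
example = relative_improvement(200.0, 150.0)
```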

Conclusions
This study investigated the relationships between job attributes and job cycle time using a dataset obtained from a semiconductor fabrication factory. To enhance the effectiveness of job cycle time prediction for semiconductor fabrication factories, a new hybrid approach comprising an FCM-based GA and a BPN is proposed. Many previous studies used pre-classification or post-classification methods. However, such methods have several drawbacks, such as unequal sizes of different clusters. The FCM-based GA is a new soft unsupervised classification method in which the GA is used to improve the classification performance of the FCM. By applying the FCM-based GA, a good set of cluster centers that captures the main features of the dataset is obtained while the fuzzy memberships of data points to the different centers are balanced. The results obtained from the FCM-based GA are used as the training data and fed into the BPN predictor. The BPN predictor is then trained to predict the cycle time of new jobs with different attributes.
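The notion of fuzzy membership can be made concrete with the following minimal Python sketch of the standard FCM membership formula for a single job record. It does not include the GA and is not the implementation used in this study; the fuzzifier m = 2, the rule that a record is assigned to every cluster whose membership reaches the threshold, the function names, and the sample record and centers are all assumptions made for illustration.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

def clusters_above_threshold(memberships, threshold):
    """Indices of clusters whose membership reaches the threshold (assumed rule)."""
    return [i for i, u in enumerate(memberships) if u >= threshold]

# Hypothetical two-attribute job record and three cluster centers.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0), (1.0, 4.0)])
selected = clusters_above_threshold(u, 0.05)
```

The memberships of a record sum to one, so a lower threshold lets a record contribute to more clusters, whereas a higher threshold moves the assignment closer to a hard classification.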
We also presented a case study to demonstrate the effectiveness of the proposed approach. Various parameter settings were tested in the experiments to obtain the best job cycle time prediction performance. The results were compared with those of previous research and show that the proposed approach outperforms the existing methodologies. These efforts lead us to the following conclusions: (1) The FCM-based GA can better pre-classify the first dataset of job records, enabling the BPN to produce accurate job cycle time predictions; (2) Stable and fuzzy job-record classification results allow the BPN to predict the job cycle time more accurately, as shown in Table 3; (3) The results of the proposed hybrid approach indicate that classification-based methods are superior to those without job-record classification.
This study has several limitations. First, the computational times were not recorded during the experiments, since most studies using a single-hidden-layer ANN with backpropagation do not report them in a meaningful way. More precisely, the computational time can be divided into (1) the time for FCM clustering and the GA and (2) the time for ANN training; the prediction time is almost instantaneous. The computational time for FCM clustering and the GA varies with many parameters, including the number of centers, the threshold values, and the GA parameters (shown in Table 1), so optimizing the performance would require a significant effort in the design of experiments. The training time depends on the size of the data and the training algorithm. Since the primary focus of this study was to propose a novel algorithm, an experimental design for characterizing the computational performance is beyond the scope of this study.
Another limitation is also related to the optimization of the performance. A simple train/test split was used instead of k-fold cross-validation, which might improve the performance of the ANN training. The primary reason is that the goal of this study was to propose a novel methodology, not to optimize the proposed algorithms. In addition, since FCM clustering is used, we believe that its effects may be aggregated as a grouping in k-fold cross-validation. Furthermore, since we used the clusters produced by a stochastic algorithm (i.e., the GA) instead of the raw data, the resampling performed by k-fold cross-validation may not be justified. Nevertheless, the benefits of k-fold cross-validation for the proposed algorithm can be studied in the future.
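For reference, k-fold cross-validation partitions the job records into k folds and uses each fold once as the testing data. The following minimal Python sketch of such a partition is a generic illustration and not part of the proposed approach; the function name `k_fold_indices`, the seed, and the sizes are hypothetical.

```python
import random

def k_fold_indices(n_records, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each record is test data exactly once."""
    if not 2 <= k <= n_records:
        raise ValueError("k must be between 2 and n_records")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Strided slicing gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx

# Hypothetical example: 10 job records split into 5 folds.
splits = list(k_fold_indices(10, 5))
```

If this scheme were applied to the proposed approach, the clustering and the BPN training would presumably have to be repeated on the training indices of every fold, which multiplies the computational time by roughly k.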
In future work, we plan to explore the following problems. The optimal number of clusters could be identified through further research on preprocessing the dataset. Given the importance of the lower and upper bounds of the job cycle time, the proposed approach could be modified to predict them. In addition, fuzzy concepts could be incorporated into the BPN to enhance prediction accuracy. Another interesting direction would be identifying and eliminating invalid job records, because such noisy data may affect prediction accuracy. All these questions will be considered in future research.