An FCM–GABPN Ensemble Approach for Material Feeding Prediction of Printed Circuit Board Template

Featured Application: The application of this work is to optimize the material feeding of a printed circuit board (PCB) template and thereby reduce the comprehensive cost caused by surplus and supplemental feeding.

Abstract: Accurate prediction of material feeding before production of a printed circuit board (PCB) template can reduce the comprehensive cost caused by surplus and supplemental feeding. In this study, a novel hybrid approach combining fuzzy c-means (FCM), feature selection algorithms, and a genetic algorithm (GA) with back-propagation networks (BPNs) was developed to predict the material feeding of a PCB template. In the proposed FCM-GABPN, input templates were first clustered by FCM, and seven feature selection mechanisms were utilized to select the critical attributes related to the scrap rate for each category of templates before they were fed into the GABPN. Then, templates belonging to different categories were trained with different GABPNs, in which the separately selected attributes were taken as inputs and the initial parameters of the BPNs were optimized by GA. After training, an ensemble predictor formed with all GABPNs was used to predict the scrap rate. Finally, another BPN was adopted to conduct nonlinear aggregation of the outputs from the component BPNs and to determine the predicted feeding panel of the PCB template with a transformation. To validate the effectiveness and superiority of the proposed approach, an experiment and a comparison with other approaches were conducted based on actual records collected from a PCB template production company. The results indicated that the prediction accuracy of the proposed approach was better than that of the other methods. Moreover, the proposed FCM-GABPN exhibited superiority in reducing surplus and/or supplemental feeding in most of the simulated cases, as compared to the other methods. Both contribute to the superiority of the proposed approach.


Introduction
The printed circuit board (PCB) is found in practically all electrical and electronic equipment, being the base of the electronics industry [1]. Due to the rapid development of computers, communications, consumer electronics, 5G, and automotive electronics, as well as the frequent updates of their products, the demand for PCB orders with specialized design features and manufacturing requirements, often referred to as PCB templates in the factory, has increased rapidly. The mode of production for a PCB factory with many template orders has changed from mass production to customer-oriented small-batch production, which causes companies to face serious challenges. Accurate prediction of material feeding for each order is one of the critical problems.
In the proposed FCM-GABPN approach, input samples were first clustered with FCM, and seven feature selection methods were utilized to select critical attributes related to scrap rate for each category (a cluster is taken as a category) of PCB templates before they were fed into the BPN. Then, samples belonging to different categories were trained with different BPNs, in which the separately selected attributes were taken as their inputs and the initial parameters were optimized with GA. After training, an ensemble predictor formed with all GABPNs was taken to predict the scrap rate. Finally, another BPN was adopted to conduct nonlinear aggregation of the outputs from the component BPNs and determine the predicted feeding panel of the PCB template with a transformation. The proposed FCM-GABPN approach is illustrated in Figure 1.
Linear correlation (LC) [20], maximal information coefficient (MIC) [21], recursive feature elimination (RFE) [22], LR [23], lasso regression [24], ridge regression [25], and random forest regression (RFR) [26] were the seven feature selection approaches taken to select the critical attributes of each category divided by FCM. The GA was used to optimize the initialization parameters of the BPN for each category. The reason for employing FCM is that it accounts for flexible classification (a template might be clustered into multiple categories with different membership degrees) and has been widely used in many fields [27][28][29]. The reason for applying GA is that it is easy to encode the problem and achieve good optimization results with GA; it has also been widely employed to optimize the structure (the number of layers and nodes in each hidden layer) and/or the initial weights and biases [18,19,30] of a BPN to improve its prediction effectiveness. An aggregator BPN was adopted to conduct the nonlinear aggregation because, theoretically, a BPN can approximate any nonlinear relationship [31].
The remainder of the paper is organized as follows. In Section 2, the variables specification and sample collection are described. The FCM, feature selection methods, GABPN, and the nonlinear aggregation BPN are introduced in Section 3, followed by experimental results and discussion in Section 4. Lastly, conclusions are given in Section 5.


Variables and Sample
The data used in this study were collected from Guangzhou FastPrint Technology Co., Ltd. A total of 56 variables inherited from an enterprise resource planning system, combined with derived variables, were selected and specified in Table 1, in which variables 1 to 35 are product/process attributes, while 36 to 56 are statistic variables. The delivery unit in a panel, the required quantity/panel/area, and the delivery unit area, with No. 36, 38, 39, 47, and 46, respectively, can be taken not only as statistic items but also as attribute candidates for establishing the prediction model. Set and unit are the two types of delivery unit, whereas the panel, as a production unit, is partitioned into either sets or units according to the customer's requirement before delivery. If the number of final qualified sets/units (the feeding sets/units minus the scrapped sets/units) is larger than the demanded number, surplus sets/units result; conversely, supplemental feeding is caused. On this basis, 30,117 samples of orders were collected, multivariate boxplots [2] were used to detect outliers, and, finally, 29,157 samples were left for this study. The performance of the proposed FCM-GABPN is compared to that of five other approaches based on the same samples. The value range in the last column of Table 1 is the statistic result over the 29,157 samples, and variables 40 to 56 are statistics of the manual feeding adopted by FastPrint.

Methodology
The procedure of the proposed approach (FCM-GABPN) is shown in Figure 2, and various aspects of FCM-GABPN are discussed in the following subsections.

Data Preparation and Template Classification with FCM
Data preparation collects the historical data of PCB templates based on the variables given in Table 1. Then, 0-1 normalization was conducted for each variable to reduce the influence of value-range differences. On this basis, the input attributes for FCM were selected based on the experience of experts from PCB workshops. The 17 attributes marked in boldface in Table 1 were selected, in which the attributes Ln and Noo represent the overall characteristics of the template; Mwil, Mlsil, Mwol, and Mlsol are the design requirements of the hole and line; and Reqq, Reqp, and Reqa describe the production scale of each template order. The others are surface-finishing options.
Samples of templates were pre-classified into K categories with the selected 17 attributes by FCM before they were fed into the BPNs. One recent example of FCM application is Tang et al. [27], in which FCM combined with an adaptive neural network was applied to predict lane changes under different simulation scenarios, and the results showed that the prediction performance and stability were considerably improved compared with ANN, SVM, and MLR. Besides, Rezaee et al. [28] incorporated a dynamic FCM in an ANN for the online prediction of companies in the stock exchange; according to the experimental results, their algorithm was efficient at clustering samples. In addition, Fathabadi [29] applied a dynamic-FCM-clustering-based ANN approach to reconfigure power-distribution networks; the experimental results indicated that the approach has benefits such as a short process time, a very simple structure, and higher accuracy compared to the others.
Samples are clustered by FCM through minimizing the objective J_m = Σ_{i=1}^{n} Σ_{c=1}^{C} µ_i(c)^m e_i(c)^2, where C is the required number of clusters; n is the number of samples; µ_i(c) represents the membership of sample i belonging to cluster c; e_i(c) measures the distance from sample i to the centroid of c; and m ∈ (1, ∞) is the hyper-parameter that controls how fuzzy the clusters will be. The procedure of applying FCM to cluster samples is as follows [31]: (1) The cluster membership values µ_i(c) (the degree of sample i being in cluster c) are initialized randomly to establish an initial clustering result.
(2) (Iteration) Obtain the center of each cluster as x̄(c) = Σ_{i=1}^{n} µ_i(c)^m x_i / Σ_{i=1}^{n} µ_i(c)^m, which is the centroid of cluster c.
(3) Re-measure the distance of each PCB template to the centroid of every cluster, and then recalculate the corresponding membership value. (4) Stop if the number of iterations is larger than a set value. Otherwise, return to Step (2).
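The four steps above can be sketched as a minimal fuzzy c-means loop, assuming Euclidean distance and the standard FCM membership update (function and variable names are illustrative):

```python
import numpy as np

def fcm(X, C, m=2.0, max_iter=100, seed=0):
    """Minimal fuzzy c-means: returns memberships u[i, c] and cluster centroids."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    u = rng.random((n, C))
    u /= u.sum(axis=1, keepdims=True)                     # step (1): random memberships
    for _ in range(max_iter):                             # step (4): fixed iteration budget
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]    # step (2): weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = (d + 1e-12) ** (-2.0 / (m - 1.0))           # step (3): re-measure distances
        u = inv / inv.sum(axis=1, keepdims=True)          #          and update memberships
    return u, centers
```

A sample whose membership u[i, c] is at least the threshold µ_L would then enter the training set of category c, so one sample can feed several GABPNs.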
After clustering, samples of different categories (clusters) are then trained with different BPNs. First, a membership threshold value µ L for selecting samples in network learning has to be determined. Only samples with µ i(c) ≥ µ L will be taken in training the BPN to obtain the weights and bias geared to the c th category. As a result, a sample might be selected by multiple categories.

Attributes Selection for Each BPN Prediction Model
It is necessary to remove irrelevant and redundant attributes to reduce the complexity of the analysis and of the generated models, and also to improve the efficiency of the whole modelling process [2,32]. In this study, seven feature selection approaches, namely LC [20], MIC [21], RFE [22], LR [23], lasso regression [24], ridge regression [25], and RFR [26], were employed to select the critical attributes related to the scrap rate for each category of samples. The scrap rate is taken as the dependent variable, and the independent variables are the attributes with No. 1-36, 38, 39, 46, and 47 given in Table 1. The scores of the independent variables obtained by each feature selection method were calculated, and the variables whose average score was greater than a certain threshold (e.g., 0.15) were taken as the input attributes of the prediction model.
The LC uses the linear correlation coefficient lcc(x, y) = cov(x, y)/√(var(x)var(y)) to measure the relationship between the (independent) variable x and the variable y (namely, the scrap rate here), where var is the variance of a variable and cov(x, y) denotes the covariance between x and y [20]. MIC is based on the idea that, if a relationship exists between two variables, then a grid can be drawn on the scatterplot of the two variables that partitions the data so as to encapsulate the relationship. To calculate the MIC of a set of two-variable data, all grids up to a maximal grid resolution are explored by computing, for every pair of integers (x, y), the largest possible mutual information achievable by any x-by-y grid applied to the data. These mutual information (MI) values are then normalized to ensure a fair comparison between grids of different dimensions and to obtain modified values between 0 and 1. Finally, the highest normalized MI achieved by any x-by-y grid is taken as the value of MIC [21]. The main idea of RFE is to train an estimator on the initial set of variables, with weights assigned to each of them at first; then, the variables whose absolute weights are the smallest are pruned from the current set. This procedure is recursively repeated on the pruned set until the desired number of variables to select is eventually reached [22].
The LR establishes the regression equation of the dependent variable based on the independent variables, in which the importance of an independent variable is determined according to the F-test: the smaller the associated p-value, the more important the variable is to the regression equation [23]. The lasso regression is a regularized LR obtained by putting an L1-norm penalty on the regression coefficients. Lasso regression drives more coefficients of weakly correlated independent variables to zero and thereby facilitates the selection of strongly correlated variables [24]. The ridge regression is similar to lasso regression but puts an L2-norm penalty on the regression to penalize the weakly correlated variables in the regression model [25]. RFR is an ensemble of unpruned classification or regression trees, in which each branch of the trees calculates the importance of each attribute unused in previous steps and thereby facilitates important-attribute selection [26]. The above seven approaches were realized in this study with the encapsulated functions of the Python machine learning library "sklearn" (scikit-learn).
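The score-averaging idea can be sketched with a subset of the seven selectors via scikit-learn. The 0.15 threshold follows the paper; everything else (the synthetic dataset, the regularization strengths, the per-method max-normalization) is an illustrative assumption:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge

# Synthetic stand-in for the template data; `coef` keeps the true weights
# only so the selection can be sanity-checked afterwards.
X, y, coef = make_regression(n_samples=200, n_features=8, n_informative=3,
                             coef=True, random_state=0)

scores = [
    np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]),  # LC
    np.abs(Lasso(alpha=0.1).fit(X, y).coef_),                            # lasso
    np.abs(Ridge(alpha=1.0).fit(X, y).coef_),                            # ridge
    RandomForestRegressor(n_estimators=50, random_state=0)               # RFR
        .fit(X, y).feature_importances_,
]

# Scale each method's scores to [0, 1], then average across methods.
mean_score = np.mean([s / s.max() for s in scores], axis=0)
selected = np.where(mean_score > 0.15)[0]   # attribute indices fed to the GABPN
```

In the paper the same averaging runs over all seven methods and is additionally confirmed by factory experts before the attribute set is fixed.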

GABPN-Based Scrap Rate Prediction for Each Category
The configuration of the BPN is established as follows: (1) Input: the 0-1 normalized data of the selected attributes for each category.
(2) Architecture: a single hidden layer. Taking (number of nodes in the input layer + number of nodes in the output layer)/2 is one of the commonly used ways to determine a suitable number of neurons in the hidden layer; therefore, the number of hidden-layer nodes depends on the number of selected attributes in this study. To achieve better prediction accuracy (a larger number of hidden-layer nodes is theoretically conducive to improving the prediction accuracy) and to keep consistency, the number of neurons in the hidden layer of each BPN was set to 12 for every category in the proposed approach, considering the number of selected attributes (up to 23 selected attributes for the samples that will be discussed in Section 4).
The performance of a BPN is sensitive to the initial condition; therefore, the initial weights and biases of the BPN were optimized with GA. The design and configuration of the GA are as follows: (1) Encoding and decoding: each individual chromosome in the population was encoded as [W1, Φ1, W2, Φ2], in which W1 = [w1,1, w1,2, ..., w1,12, w2,1, w2,2, ..., w2,12, ..., wi,1, wi,2, ..., wi,12] (i selected attributes as input and 12 neurons in the hidden layer) represents the weights between the nodes in the input layer and the hidden layer; W2 = [w1,1, w2,1, ..., w12,1] represents the weights between the nodes in the hidden layer and the output layer; Φ1 = [θ1, θ2, ..., θ12] is the bias vector of the nodes in the hidden layer; and Φ2 is the bias of the output node. Decoding assigns the corresponding weights and biases to each node based on the BPN structure and then conducts forward propagation to compute the output of each BPN. (2) Population initialization: each individual chromosome in the population was initialized randomly with its elements between −3 and 3, following the encoding principle.
(3) Fitness evaluation: the sum of absolute errors between the reversely normalized scrap-rate forecasts ô_k and the actual scrap rates o_k, F = Σ_k |ô_k − o_k|, was taken as the fitness of each individual. The smaller the fitness, the more accurate the prediction; the minimization objective function, which the problem seeks to optimize, is therefore the same as the fitness function. (4) Reproduction, crossover, and mutation operations: Reproduction: roulette wheel selection was taken to select individuals for reproduction, in which the fittest individuals have a greater chance of survival than weaker ones; since the fitness is minimized, the probability of the ith individual being selected can be taken as p_i = (1/F_i)/Σ_{j=1}^{N}(1/F_j), where F_i is the fitness of the ith individual and N is the number of individuals. Crossover: two empty offspring chromosomes, O1 and O2, were initialized first, and two chromosomes, P1 and P2, were randomly selected from the reproduced population. The crossover location was selected at random; offspring O1 then consisted of the genes of P1 before the crossover location and the genes of P2 after it, while offspring O2 consisted of the genes of P2 before the crossover location and the genes of P1 after it.
Mutation: one-point mutation was utilized as the mutation operator. A chromosome in the population was randomly selected, and one gene was chosen randomly from the selected chromosome. Then, a random r with a value in (0, 1) was generated to mutate the gene. If r > 0.5, then a_j = a_j + (a_j − a_max) × r; otherwise, a_j = a_j + (a_min − a_j) × r, where a_j is the value of the jth position in the chromosome selected for mutation, and a_max and a_min are the maximum and minimum of the jth position over all chromosomes in the current generation, respectively.
After the templates were clustered, a portion of the templates in each category were taken as "training samples" for the GABPN to determine the weights and bias values for the category. Three phases were involved in the training stage. First, the initial weights and biases were optimized by the GA. Second, forward propagation was conducted: the inputs (selected attributes, with bias) were multiplied by the weights (the weights of biases are 1), summed, and transferred to the hidden layer; the results of the hidden-layer nodes were further processed by the sigmoid function and transferred to the output layer with the same procedure. Finally, the output of the GABPN was compared with the actual scrap rate, and the accuracy of the GABPN, measured with the mean squared error (MSE), was evaluated.
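The GA operators described above can be sketched as follows. The selection probability is assumed inversely proportional to the error-based fitness (smaller fitness is better); the crossover and mutation rules follow the text, with names chosen for illustration:

```python
import random

def roulette_select(pop, fits):
    """Roulette wheel for a minimization fitness: smaller error, larger slice."""
    inv = [1.0 / (f + 1e-9) for f in fits]       # invert so small fitness wins
    r, acc = random.uniform(0.0, sum(inv)), 0.0
    for chrom, w in zip(pop, inv):
        acc += w
        if acc >= r:
            return chrom
    return pop[-1]

def crossover(p1, p2):
    """One-point crossover producing offspring O1 and O2."""
    loc = random.randrange(1, len(p1))           # random crossover location
    return p1[:loc] + p2[loc:], p2[:loc] + p1[loc:]

def mutate(pop, chrom):
    """One-point mutation pushing gene j toward the generation max or min."""
    j = random.randrange(len(chrom))
    a_max = max(c[j] for c in pop)
    a_min = min(c[j] for c in pop)
    r = random.random()
    if r > 0.5:
        chrom[j] = chrom[j] + (chrom[j] - a_max) * r
    else:
        chrom[j] = chrom[j] + (a_min - chrom[j]) * r
    return chrom
```

Each chromosome here would be the flattened [W1, Φ1, W2, Φ2] vector of one candidate BPN initialization.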
Subsequently, the backward pass, which propagates derivatives (the error between the prediction and the actual value) from the output layer to the hidden layers, was conducted. The backward pass for a 3-layer BPN starts by computing the partial derivative for the output node (only one node here), and the error terms δ_j of nodes j in the hidden layer can be calculated according to δ_j = e W_j f′(x_j), in which e is the error of the output node, W_j is the weight connecting node j to the output node, and f′(x_j) is the derivative of the sigmoid activation function at the input x_j. On this basis, adjustments were made to the connection weights and biases to reduce the MSE. Network learning stops when the number of iterations exceeds a given number in this study.
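For a 3-layer network with a single output node, the hidden-layer error terms δ_j = e · W_j · f′(x_j) reduce to a one-liner, since the sigmoid satisfies f′(x) = f(x)(1 − f(x)) (a minimal sketch; names are illustrative):

```python
import numpy as np

def hidden_error_terms(e, W, x):
    """delta_j = e * W_j * f'(x_j) for sigmoid activation f."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))   # f(x_j)
    return e * np.asarray(W, dtype=float) * s * (1.0 - s)   # e * W_j * f'(x_j)
```

For example, with e = 1, W_j = 1, and x_j = 0, the term is 1 · 1 · 0.5 · 0.5 = 0.25.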
The trained GABPN was tested with the remaining portion of the templates in each category using the same performance indicator, MSE. Ideally, the GABPN would then be used to predict the scrap rate of new templates that "completely" belong to the clustered category; however, the complete assignment of a template to only one category is usually impossible. When a new template order arrives, the selected attributes associated with the new template are recorded, and its membership in each category is calculated. Then, the ensemble predictor formed with all GABPNs can be taken to predict the scrap rate for the new template.

Nonlinear Aggregation with Another BPN and Transformation
To aggregate the predicted results from the component GABPNs into a single value representing the predicted scrap rate of the template, another BPN was employed in this study to conduct the nonlinear aggregation, with the following configuration: (1) Input: 2K parameters consisting of the predicted results of the component GABPNs for the template and the membership values of the template belonging to each category. (2) Architecture: a single hidden layer, with the number of nodes in the hidden layer set to the same as that in the input layer, 2K. The aggregator BPN also underwent training and testing, and its output (i.e., the aggregation result) determined the normalized scrap-rate prediction of the template. Finally, the transformation from the scrap rate to the feeding panel was carried out: reverse normalization was conducted on the output of the aggregation BPN to obtain the predicted scrap rate (Scrar_Pd), and then the predicted feeding panel (Fedp_Pd) was computed by Fedq_Pd = 100 × Reqq/(100 − Scrar_Pd) and Fedp_Pd = Fedq_Pd/Duap, where Reqq is the required quantity, Duap is the delivery unit in a panel, and Fedq_Pd is the predicted feeding quantity.
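The final transformation is a direct two-step formula; a minimal sketch (the paper does not state whether the panel count is rounded up to a whole panel, so the raw ratio is returned here):

```python
def predicted_feeding_panel(scrar_pd, reqq, duap):
    """Fedq_Pd = 100 * Reqq / (100 - Scrar_Pd); Fedp_Pd = Fedq_Pd / Duap."""
    fedq_pd = 100.0 * reqq / (100.0 - scrar_pd)  # quantity inflated by scrap rate (%)
    return fedq_pd / duap                        # panels = quantity / units per panel
```

For example, a template with Reqq = 100 units, a predicted scrap rate of 20%, and Duap = 10 delivery units per panel needs 125 units, i.e., 12.5 panels.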

Performance Indicators
In order to evaluate the effectiveness of the model, the MSE, the mean absolute error (MAE), and the mean absolute percentage error (MAPE) were adopted as the indicators of the performance of the approaches, in which the predicted data ô_i are the predicted least feeding panel and the original data o_i are the least feeding panel. They are defined as MSE = (1/N) Σ_{i=1}^{N} (ô_i − o_i)^2, MAE = (1/N) Σ_{i=1}^{N} |ô_i − o_i|, and MAPE = (1/N) Σ_{i=1}^{N} (|ô_i − o_i|/o_i) × 100, respectively, where N is the number of samples.
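The three indicators can be written directly from the formulas above (a minimal NumPy sketch):

```python
import numpy as np

def mse(o, o_hat):
    """Mean squared error between actual o and predicted o_hat."""
    return float(np.mean((np.asarray(o_hat, float) - np.asarray(o, float)) ** 2))

def mae(o, o_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(o_hat, float) - np.asarray(o, float))))

def mape(o, o_hat):
    """Mean absolute percentage error, in percent."""
    o, o_hat = np.asarray(o, float), np.asarray(o_hat, float)
    return float(np.mean(np.abs((o_hat - o) / o)) * 100.0)
```

For instance, actual panels [1, 2] against predictions [1, 4] give MSE = 2, MAE = 1, and MAPE = 100%.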
The indicators surplus rate (Surpr) and supplemental feeding rate (Supfr) in the PCB template workshop were also considered. The predicted surplus rate (Surpr_Pd) and predicted supplemental feeding rate Supfr_Pd can be computed with Equations (10) and (11) in [2], respectively. The final performance is evaluated by the MSE, MAE, MAPE, Supfr_Pd, and Surpr_Pd.

Experimental Results and Discussions
The proposed FCM-GABPN was implemented in Python 3.6. The number of clusters was set to three when conducting FCM, not only to reduce the training, testing, and model-maintenance effort in the workshop, but also to achieve good enough prediction accuracy based on some initial tests. The hyper-parameter m that controls how fuzzy the clusters are is commonly set to 2 [31], and this value was adopted here. The maximum number of iterations of FCM was set to 800.
If each sample is assigned to the category with the highest membership value, the templates are clustered into C1, C2, and C3, with 20,773, 1354, and 7030 samples, respectively. The membership degrees of each sample in the three categories (samples were assigned to C1, C2, and C3 here for visualization) are illustrated in Figure 3. The membership degrees of each sample are taken as part of the input of the aggregator BPN to perform the nonlinear aggregation, as shown in Figure 1.
The mean values of the input attributes in the three categories are given in Figure 4. The mean values of Reqq, Reqp, and Reqa are comparatively different across the three categories; they are the main attributes to distinguish and identify the samples within each category, which is consistent with practice, in which the workshop also regards the order scale (Reqq, Reqp, and Reqa) as an important variable for classifying orders. Meanwhile, the mean values of Mwil, Mlsil, and Mwol in C2 are lower than the corresponding values in C1 and C3, but Ln is higher, which indicates that the higher Ln is, the denser the lines are, coinciding with the actual situation.
The membership threshold µ_L should be specified for adopting samples in network learning. The numbers of samples within the three categories under different µ_L are given in Table 2. The value 0.4 was selected as the threshold to generate the training and testing samples, not only to make sure there were enough training and testing samples for each category, but also because a template can be clustered into multiple categories with different membership degrees. Then, 2/3 and 1/3 of mutually exclusive samples were randomly selected as the training and testing data for each category at each run. The unclassified samples were not taken as input to train each GABPN; however, they were taken as samples for the final test. The numbers of training and testing samples for each category are given in Table 3.
For the candidate input attributes (Table 1), the aforementioned seven feature selection mechanisms were employed to calculate the importance of each attribute with respect to the scrap rate. The importance (mean) score of each attribute for the three categories and for all samples is given in Figure 5, and the corresponding No. is given in Table 4.
The attributes with importance scores greater than 0.15 were chosen as the input of the GABPN, considering the number of selected attributes, and were confirmed by experts from the factory; 23, 9, 20, and 16 attributes were selected for C1, C2, C3, and all data, respectively, and are marked with "▲" in Table 4. It can be seen that the critical attributes for the different categories of samples differ, and one of the reasons is that the samples may have multiple complex distributions.
Each GABPN model was trained by the training samples and the selected attributes given in Table 4. All samples belonging to a category compete in the same way in training the GABPN geared to the category. Prediction models of GABPN were trained and tested for each category separately, while the aggregator BPN was trained with all the training samples and tested by all the testing samples.
The GA parameters of population size, crossover probability, mutational probability, and the number of iterations of the three GABPNs were set to 100, 0.8, 0.05, and 100, according to some initial test. The convergences of GA for the initial parameter optimization of the three BPNs are illustrated in Figure 6. On the basis of the optimized parameters, the three BPNs were trained in parallel, and the output of the three prediction models was set into the aggregator BPN, with the membership degree of each sample obtained by FCM given in Figure 3. The predicted feeding panel of each sample can be determined according to the transformation described in Section 3.4, based on the reversely normalized output of the aggregator BPN.
To quantify its performance, the FCM-GABPN was compared to five approaches: manual feeding, BPN, MSC-ANN, FCM-GABPN without aggregation (denoted FCM-GABPN w/o aggregation), and FCM-BPN. Manual feeding determines the feeding panel for each template based on the workers' experience in the PCB factory. BPN establishes a single BPN prediction model without pre-classification and takes the 16 attributes marked in the column "All" of Table 4 as inputs. MSC-ANN [2] considers only the required panel to classify the records and divides the samples into six groups. FCM-GABPN w/o aggregation applies only the BPN of the category to which the sample's membership degree is the highest, and no BPN aggregation is conducted. FCM-BPN uses no GA to optimize the initial parameters of each BPN.
The testing samples were taken to evaluate the performance of the approaches, and the average MSE, MAE, MAPE, Surpr_Pd, and Supfr_Pd of five runs for BPN, MSC-ANN, FCM-GABPN w/o aggregation, FCM-BPN, and FCM-GABPN are given in Table 5. The improvements of the different approaches over manual feeding (the actual results of the factory) according to the performance indicators are also given, and the following discussions are made: (1) The prediction accuracy (measured with MSE, MAE, and MAPE) of the FCM-GABPN approach was significantly better than those of the other approaches in most cases, achieving a 95.91%, 83.03%, and 89.57% reduction in MSE, MAE, and MAPE, respectively, over manual feeding. Meanwhile, the proposed FCM-GABPN exhibited superiority in the reduction of surplus and/or supplemental feeding in most cases compared to the other methods, reducing Surpr_Pd by 70.16% and Supfr_Pd by 31.03% over manual feeding.
(3) The superiority of FCM-GABPN w/o aggregation, FCM-BPN, and FCM-GABPN over MSC-ANN in terms of MSE, MAE, MAPE, and Surpr_Pd, with only a 5.83% inferiority in Supfr_Pd for the FCM-BPN approach and the same value for FCM-GABPN, indicates that the pre-classification by clustering, which considers many attributes, surpassed the MSC classification, which considers only one attribute. In addition, FCM-GABPN w/o aggregation, FCM-BPN, and FCM-GABPN established only three BPNs for the three categories of samples, while MSC-ANN pre-classified the samples into six categories and trained a prediction model for each category.
(4) FCM-BPN and FCM-GABPN achieved lower MSE, MAE, MAPE, and Surpr_Pd than FCM-GABPN w/o aggregation, which indicates that applying the aggregator BPN to derive the representative value by considering the membership degree of each sample improves the prediction for these four performance indicators. The 13.61% and 7.78% increases in Surpr_Pd for FCM-BPN and FCM-GABPN may be attributable to the 9.07% and 11.86% reductions in Supfr_Pd, respectively. In practice, the reduction of surplus feeding and that of supplemental feeding conflict, because it is difficult to minimize both of them in the factory. However, reducing the surplus rate is the goal with the greatest cost impact in the factory, because individualized surplus template products can only be placed in inventory or directly destroyed, and reducing surplus production will reduce the comprehensive cost caused by the waste of material, production, inventory, and disposal/recycling.
(5) The FCM-GABPN surpassed FCM-BPN according to all five indicators, which verifies the effectiveness of the GA-based initialization optimization. The reason is that BPN is sensitive to its initial conditions [30]; in particular, the samples in the three categories were learned with different BPNs, which may be greatly influenced by the combination of the BPNs' initial parameters.
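The accuracy indicators compared above are standard and can be computed as in the sketch below. The reading of Surpr_Pd and Supfr_Pd as proportions of templates with surplus or supplemental feeding is an assumption for illustration, as the paper's exact definitions are not reproduced in this excerpt.

```python
import numpy as np

def prediction_metrics(y_true, y_pred):
    """MSE, MAE, and MAPE (in percent) of predictions against true values.
    MAPE assumes no true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100.0
    return mse, mae, mape

def surplus_supplement_rates(least_panel, fed_panel):
    """Assumed reading of Surpr_Pd / Supfr_Pd: the proportions of templates
    fed above (surplus) or below (needing supplemental feeding) the least
    feeding panel."""
    least = np.asarray(least_panel)
    fed = np.asarray(fed_panel)
    return float(np.mean(fed > least)), float(np.mean(fed < least))
```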

Conclusions
In order to enhance the accuracy of material feeding prediction of a PCB template, an ensemble predictor, FCM-GABPN, was proposed. In the proposed approach, the input templates were firstly clustered by FCM, and seven feature selection mechanisms were utilized to select critical attributes related to the scrap rate for each category of templates. Then, a GABPN was trained to predict the scrap rate for each category of templates, and the GABPNs for all the categories formed an ensemble predictor with a nonlinear aggregator BPN. Finally, the predicted feeding panel for each template was determined from the predicted scrap rate with a transformation. The effectiveness and superiority of the approach were validated with experiments based on actual data. On the basis of the experimental results, the conclusions and contributions are highlighted as follows:

(1) The accuracy of the proposed approach was better than those of the other approaches, achieving a 95.91%, 83.03%, and 89.57% reduction in MSE, MAE, and MAPE, respectively, over the comparison basis (manual feeding). Meanwhile, the FCM-GABPN's performance was superior to that of the other methods in the reduction of simulated surplus and/or supplemental feeding in most of the cases, achieving a 70.16% reduction in Surpr_Pd and a 31.03% reduction in Supfr_Pd over manual feeding.

(2) To the best of our knowledge, the material feeding prediction problem for PCB templates, which involves category fuzziness of samples and diverse samples with different influencing factors, differs from existing production quality prediction and optimization problems. The novelty of the proposed FCM-GABPN is that samples are fuzzily clustered into different categories with FCM and a membership threshold is specified to adopt samples for each category. Meanwhile, a component GABPN prediction model was established for each category, with separately selected input attributes and GA-optimized initial parameters.
Furthermore, an aggregator BPN was employed to aggregate the predicted results of the component GABPNs by considering the membership values of each template.
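The aggregation step can be sketched as follows. The exact input layout of the aggregator BPN is an assumption: the sketch builds a plausible input vector (component predictions concatenated with the FCM membership degrees) and shows the linear membership-weighted baseline that the nonlinear aggregator BPN generalizes.

```python
import numpy as np

def aggregator_input(component_preds, memberships):
    """Assumed feature vector for the aggregator BPN: the scrap-rate
    prediction of each component GABPN concatenated with the sample's
    FCM membership degrees."""
    return np.concatenate([np.asarray(component_preds, dtype=float),
                           np.asarray(memberships, dtype=float)])

def membership_weighted_baseline(component_preds, memberships):
    """Linear baseline that a nonlinear aggregator BPN generalizes: the
    membership-weighted average of the component predictions."""
    p = np.asarray(component_preds, dtype=float)
    m = np.asarray(memberships, dtype=float)
    return float(np.dot(p, m) / m.sum())
```

A template with high membership to one category is dominated by that category's GABPN, while a boundary template blends several component predictions, which is exactly the situation where a single hard-assigned BPN (as in FCM-GABPN w/o aggregation) loses information.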
Training an ensemble predictor with many sub-models that can automatically extract shared attributes for similar templates without explicit pre-classification remains to be studied; with such a predictor, we would not have to divide the samples, select critical attributes for each category, and build a prediction model separately. Meanwhile, the rapid development and evolution of PCB templates should also be considered. Transfer learning and lifelong learning may be mechanisms worth attempting in order to handle the aforementioned problems.