Integration of Functional Link Neural Networks into a Parameter Estimation Methodology
Appl. Sci. 2021, 11, 9178

Abstract: In the field of robust design, most estimation methods for output responses of input factors are based on the response surface methodology (RSM), which makes several assumptions regarding the input data. However, these assumptions may not consistently hold in real-world industrial problems. Recent studies using artificial neural networks (ANNs) indicate that input–output relationships can be effectively estimated without the assumptions mentioned above. The primary objective of this research is to develop a new robust design dual-response estimation method based on ANNs. First, a second-order functional-link-NN-based robust design estimation approach is proposed for the process mean and standard deviation (i.e., the dual-response model). Second, the optimal structure of the proposed network is determined based on the Bayesian information criterion. Finally, the response functions estimated by the proposed functional-link-NN-based method are compared with those obtained using the conventional least squares method (LSM)-based RSM. The numerical example results imply that the proposed functional-link-NN-based dual-response robust design estimation model can provide more effective optimal solutions than the LSM-based RSM, according to the expected quality loss criteria.


Introduction
Over the past two decades, robust parameter design (RPD), also known as robust design (RD), has been widely applied to improve the quality of products in the offline stage of practical manufacturing processes. Bendell et al. [1] and Dehnad [2] discussed the applications of RD to various engineering concerns in the process, automobile, information technology, and plastic technology industries. The primary purpose of RD is to obtain the optimal settings of control factors by minimizing the process variability and the deviation between the mean value and the presupposed target, which is often referred to as the process bias. As Taguchi explored [3], RD includes two main stages: design of experiments and two-step modeling. However, the orthogonal arrays, statistical analyses, and signal-to-noise ratios used in conventional techniques to solve RD problems have been questioned by engineers and statisticians, such as León et al. [4], Box [5], Box et al. [6], and Nair et al. [7]. As a result, several advanced studies have been proposed to resolve these shortcomings.
The most significant alternative to Taguchi's approach is the dual-response model approach based on the response surface methodology (RSM) [8]. In this approach, the process mean and variance (or standard deviation) are approximated as two separate functions of the input factors based on the LSM. In addition, the dual-response model approach provides an RD optimization model that minimizes the process variability while the process [...]
[...] control factor settings in the RD model without the use of estimation formulas. Winiczenko et al. [34] introduced an efficient optimization method by combining the RSM and a genetic algorithm (GA) to find the optimal topology of ANNs for predicting color changes in rehydrated apple cubes.
Therefore, the main objective of this study is to propose a new dual-response estimation approach based on NNs. First, the normal quadratic process mean and standard deviation functions in RD are estimated using the proposed functional-link-NN-based estimation method. Second, the Bayesian information criterion (BIC) is used to determine the number of neurons in the hidden layer, which affects the coefficients in the estimated input-output equations. Finally, the results of a case study are presented to verify the effectiveness of the proposed NN-based estimation method compared with the LSM-based RSM. A graphical overview of the proposed NN-based estimation method is shown in Figure 1.
The remainder of the study is organized as follows: Section 2 introduces the conventional LSM-based RSM. Section 3 describes the functional-link-NN-based dual-response estimation model. Section 4 presents the outputs generated by the proposed NN-based estimation method and compares them with those of the LSM-based RSM based on the results of the case study. Finally, Section 5 concludes the study and outlines further studies.

Conventional LSM-Based RSM
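The RSM was first introduced by Box and Wilson [35] and is used to model empirical relationships between output and input variables. Myers [36] and Khuri and Mukhopadhyay [37] present insightful commentaries on the different development phases of the RSM. When the exact functional relationship is very complicated or unknown, the conventional LSM is used to estimate the functional relationships between the inputs and the output response [38,39]. In general, estimated second-order response surface functions are used to analyze RD problems. The estimated process mean and standard deviation functions proposed by Vining and Myers [8] can be defined as follows:

$$\hat{\mu}(\mathbf{x}) = \hat{\beta}_0 + \sum_{i=1}^{p} \hat{\beta}_i x_i + \sum_{i=1}^{p} \hat{\beta}_{ii} x_i^2 + \sum_{i<j}^{p} \hat{\beta}_{ij} x_i x_j$$

$$\hat{\sigma}(\mathbf{x}) = \hat{\delta}_0 + \sum_{i=1}^{p} \hat{\delta}_i x_i + \sum_{i=1}^{p} \hat{\delta}_{ii} x_i^2 + \sum_{i<j}^{p} \hat{\delta}_{ij} x_i x_j$$

where $\mathbf{x} = (x_1, \ldots, x_i, \ldots, x_j, \ldots, x_p)$ is a vector of input variables, and $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\delta}}$ are the estimated coefficients in the mean and standard deviation functions, respectively. These regression coefficients can be estimated using the LSM as

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\bar{\mathbf{y}}, \qquad \hat{\boldsymbol{\delta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{s}$$

where $\mathbf{X}$ is the design matrix of control factors, and $\bar{\mathbf{y}}$ and $\mathbf{s}$ are the mean and standard deviation of the observed responses, respectively.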
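As a minimal sketch of the LSM step described above, the following Python code expands each design point into second-order terms and solves the normal equations for the mean and standard deviation models. The function names (`quadratic_row`, `solve`, `lsm_fit`) and the two-factor synthetic data are illustrative assumptions, not part of the original study, which used MATLAB.

```python
from itertools import combinations, product

def quadratic_row(x):
    """Expand one design point into second-order model terms:
    [1, x_1..x_p, x_1^2..x_p^2, x_i*x_j for i<j]."""
    row = [1.0] + list(x) + [v * v for v in x]
    row += [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return row

def solve(a, b):
    """Solve a @ z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def lsm_fit(points, response):
    """Least squares coefficients from the normal equations (X^T X) b = X^T y."""
    X = [quadratic_row(p) for p in points]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, response)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic two-factor example (hypothetical data, not the printing data).
points = list(product((-1, 0, 1), repeat=2))
y_bar = [2 + 3 * a - b + 0.5 * a * a + 4 * a * b for a, b in points]
s = [1 + 0.5 * b * b for a, b in points]

beta_hat = lsm_fit(points, y_bar)   # coefficients of the mean model
delta_hat = lsm_fit(points, s)      # coefficients of the std. deviation model
```

The coefficient order follows `quadratic_row`: intercept, linear terms, pure quadratic terms, then interactions. In practice a numerical library would be preferable to hand-written elimination; the pure-Python solver is used here only to keep the sketch self-contained.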

Proposed Functional Link NN
ANNs comprise interconnected processing elements known as artificial neurons or nodes. Each neuron receives input signals from the previous nodes, aggregates these signals with the related weights, and generates an output signal through a transfer function (or activation function). This output signal forms the input signal for other nodes. The multilayer feed-forward back-propagation NN is the most influential model applied to various practical problems. Such a network comprises several layers, each containing several nodes. The first and last layers are the input and output layers, respectively, as they contain the input and output units of the NN system, and one or more hidden layers are located between them. The overall structure of a multilayer feed-forward back-propagation NN is illustrated in Figure 2.
The universal approximation theorem for multilayer feed-forward networks was proposed by Cybenko [21] and Hornik et al. [20]. Based on this theorem, a multilayer feed-forward NN with a hidden layer can approximate an arbitrary multidimensional, continuous, nonlinear function with any desired accuracy, as mentioned in Funahashi [22] and Hartman et al. [40]. In the hidden layer, the transfer function is used to capture the functional relationship between the input and output factors.
Popular transfer functions used in ANNs include step-like, hard limit, sigmoidal, tan sigmoid, log sigmoid, hyperbolic tangent sigmoid, linear, radial basis, saturating linear, multivariate, softmax, competitive, symmetric saturating linear, universal, generalized universal, and triangular basis transfer functions [41,42]. In RD, there are two characteristics of the output responses that are of particular interest: the mean and standard deviation. Each output performance can be separately analyzed and computed in a single NN structure based on the dual-response estimation framework. Figure 3 illustrates the proposed functional-link-NN-based dual-response estimation approach.
As shown in Figure 3, $x_1, x_2, \ldots, x_k$ denote the $k$ control variables in the input layer. The input to each hidden neuron is the weighted sum of the $k$ factors plus the corresponding bias $b_1, b_2, \ldots, b_h$. This weighted sum is transformed by the activation function $f(x) = x + x^2$, also known as the transfer function. The transformed combination is the output of the hidden layer, $y_{hid}$, and serves as the input to the single output layer. Analogously, the weighted combination of the hidden-layer outputs plus the output bias gives the output neuron ($\hat{y}$ or $\hat{s}$), whose transfer function is the linear activation function $f(x) = x$. In an $h$-hidden-node NN system, the hidden nodes are indexed by $1, \ldots, j, \ldots, h$, and $w$ and $b$ denote the weight and bias terms, respectively. In particular, the weight connecting the input factor $x_i$ to the hidden node $j$ is written as $w_{ji}$, while $w_j$ is the weight connecting the hidden node $j$ to the output. In addition, $b_{hid,j}$ and $b_{out}$ represent the biases at hidden node $j$ and at the output, respectively. The output of hidden neuron $j$ can be represented as:

$$y_{hid,j} = \left(\sum_{i=1}^{k} w_{ji} x_i + b_{hid,j}\right) + \left(\sum_{i=1}^{k} w_{ji} x_i + b_{hid,j}\right)^{2}$$

The outcome of the functional-link-NN-based RD estimation model can be written as:

$$\hat{y} = \sum_{j=1}^{h} w_j \, y_{hid,j} + b_{out}$$

Hence, the regressed formulas for the estimated mean and standard deviation are given as:

$$\hat{\mu}(\mathbf{x}) = \sum_{j=1}^{h\_mean} w_j^{mean}\left[\left(\sum_{i=1}^{k} w_{ji}^{mean} x_i + b_{hid,j}^{mean}\right) + \left(\sum_{i=1}^{k} w_{ji}^{mean} x_i + b_{hid,j}^{mean}\right)^{2}\right] + b_{out}^{mean}$$

$$\hat{\sigma}(\mathbf{x}) = \sum_{j=1}^{h\_std} w_j^{std}\left[\left(\sum_{i=1}^{k} w_{ji}^{std} x_i + b_{hid,j}^{std}\right) + \left(\sum_{i=1}^{k} w_{ji}^{std} x_i + b_{hid,j}^{std}\right)^{2}\right] + b_{out}^{std}$$

where h_mean and h_std denote the numbers of hidden neurons of the h-hidden-node NN for the mean and standard deviation functions, respectively.
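The forward pass described above can be sketched in a few lines of Python. This is an illustrative sketch under the assumptions stated in the text (a hidden transfer function of the form x + x^2 and a linear output neuron); the function names and the small weight values are hypothetical and are not the trained weights reported in the study.

```python
def hidden_activation(z):
    """Second-order functional-link transfer function f(z) = z + z^2."""
    return z + z * z

def flnn_predict(x, w_in_hid, b_hid, w_hid_out, b_out):
    """Forward pass: quadratic hidden layer followed by a linear output neuron.

    w_in_hid[j][i] is the weight from input i to hidden node j,
    b_hid[j] is the bias of hidden node j,
    w_hid_out[j] is the weight from hidden node j to the output,
    b_out is the output bias.
    """
    hidden = []
    for w_j, b_j in zip(w_in_hid, b_hid):
        z = sum(w * xi for w, xi in zip(w_j, x)) + b_j  # weighted sum plus bias
        hidden.append(hidden_activation(z))
    return sum(w * h for w, h in zip(w_hid_out, hidden)) + b_out

# Hypothetical two-input, two-hidden-node network.
x = [1.0, -1.0]
w_in_hid = [[0.5, 0.5], [1.0, 0.0]]
b_hid = [0.0, 1.0]
w_hid_out = [2.0, -1.0]
b_out = 0.5

y_hat = flnn_predict(x, w_in_hid, b_hid, w_hid_out, b_out)
print(y_hat)  # -5.5
```

Because the hidden activation is a quadratic polynomial and the output is linear, the network output is itself a second-order polynomial in the inputs, which is what allows it to be compared directly with a quadratic response surface. Two such networks, one for the mean and one for the standard deviation, would give the dual-response model.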

Learning Algorithm
The learning or training process in NNs determines suitable weight values. Back-propagation is the learning algorithm used to train feed-forward NNs; the name reflects that errors are propagated backward from the output layer to the hidden layer. First, the weights of the neural network are randomly initialized. Next, based on these initial weights, the NN output is computed and compared with the desired output target. The goal is to minimize the error term $E$ between the estimated output $\hat{y}_{out}$ and the desired output $y_{out}$. Finally, each iteration of the gradient descent algorithm updates the weight $w_j$ according to:

$$w_j^{new} = w_j^{old} + \Delta w_j$$

where

$$\Delta w_j = -\eta \frac{\partial E}{\partial w_j}$$

The parameter $\eta$ ($> 0$) is known as the learning rate. When the steepest descent approach is used to train a multilayer network, the magnitude of the gradient may be very small, resulting in small changes to the weights and biases even when they are far from their optimal values. The harmful effects of these small-magnitude partial derivatives can be eliminated using the resilient back-propagation training algorithm (trainrp), in which the direction of the weight update is determined only by the sign of the derivative. In addition, the Marquardt-Levenberg algorithm (trainlm), an approximation to Newton's method, is designed to approach second-order training speed without computing the Hessian matrix.
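To make the two update rules concrete, the sketch below contrasts a plain gradient descent step with a sign-only step. It assumes a squared-error term E = 0.5 (ŷ − y)² for a single linear output neuron, which is a common choice but is an assumption here, and the sign-only step uses a fixed step size, so it is a simplification of resilient back-propagation rather than the adaptive scheme implemented by trainrp. All names and numbers are illustrative.

```python
def gd_step(weights, grads, eta=0.1):
    """Plain gradient descent: w <- w - eta * dE/dw."""
    return [w - eta * g for w, g in zip(weights, grads)]

def sign_step(weights, grads, step=0.01):
    """Sign-only update: the step direction depends only on the sign of the
    derivative (fixed step size; real Rprop adapts the step per weight)."""
    def sign(g):
        return (g > 0) - (g < 0)
    return [w - step * sign(g) for w, g in zip(weights, grads)]

def output_layer_grads(weights, bias, hidden, target):
    """Gradients of the assumed error E = 0.5 * (y_hat - y)^2 with respect to
    the hidden-to-output weights of a linear output neuron."""
    y_hat = sum(w * h for w, h in zip(weights, hidden)) + bias
    err = y_hat - target
    return [err * h for h in hidden], err

# Hypothetical hidden-layer outputs and target value.
weights, bias = [0.0, 0.0], 0.0
hidden, target = [1.0, 2.0], 3.0

for _ in range(200):
    grads, err = output_layer_grads(weights, bias, hidden, target)
    weights = gd_step(weights, grads, eta=0.05)

final_output = sum(w * h for w, h in zip(weights, hidden)) + bias
tiny_gradient_update = sign_step([1.0, 1.0], [1e-9, -5.0], step=0.1)
```

In `tiny_gradient_update`, the first weight moves by the full step even though its gradient is almost zero, which illustrates why a sign-based rule is insensitive to small-magnitude derivatives.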
One problem with the NN training process is overfitting, which is characterized by large errors when new data are presented to the network even though the errors on the training set are very small. This implies that the network has memorized the training examples but cannot generalize to new situations. To avoid overfitting, researchers can use the "early stopping" technique to improve the generalization ability. In this approach, the dataset is separated into three subsets used for training, validation, and testing. The weights and biases of the network are updated on the training set, on which the gradient is also estimated. The error on the validation set is monitored during the training process, and the test set is used to examine how well the trained network generalizes. The exact split among training, testing, and validation data is chosen by the designer; typical training:testing:validation ratios are 50:25:25, 60:20:20, or 70:15:15.
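The data split and the stopping rule can be sketched as follows. This is a minimal illustration, assuming a 70:15:15 split and a simple "patience" rule (stop once the validation error has not improved for a given number of consecutive epochs); the function names, the patience value, and the validation error sequence are hypothetical.

```python
import random

def split_dataset(samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle the samples and split them into training, validation,
    and test subsets according to the given ratios."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    n_train = round(ratios[0] * len(data))
    n_val = round(ratios[1] * len(data))
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

def early_stopping_epoch(val_errors, patience=3):
    """Return the epoch with the lowest validation error seen before training
    is stopped, i.e., before `patience` consecutive epochs without improvement."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

train, val, test = split_dataset(range(20))
# Hypothetical validation errors: they rise after epoch 2, so training stops
# at epoch 5 and the weights from epoch 2 would be kept.
stop_at = early_stopping_epoch([0.9, 0.5, 0.4, 0.45, 0.5, 0.6, 0.1])
```

Note that the late improvement at the final epoch is never reached; this is the usual trade-off of a patience-based rule.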

Number of Hidden Neurons
The optimal NN structure depends on the number of neurons in the hidden layer. An arbitrary choice of the number of hidden neurons can cause overfitting or underfitting. Several approaches exist for determining the number of hidden neurons in NNs; a literature review can be found in Sheela and Deepa [43]. However, no single method is effective and accurate under all circumstances. In this study, Schwarz's Bayesian criterion, known as the BIC, is used to determine the number of hidden neurons. The BIC combines a goodness-of-fit term with a complexity penalty of $p \ln(n)$, where $n$ and $p$ represent the sample size and the number of variables in the model formula, respectively. Because of the $\ln(n)$ factor, the BIC tends to penalize complex models heavily. Moreover, as the size of the dataset $n$ increases, the BIC becomes more likely to select the correctly matched model.
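As an illustration of BIC-based structure selection, the sketch below scores candidate hidden-layer sizes and picks the one with the smallest BIC. It assumes the Gaussian-error form BIC = n ln(RSS/n) + p ln(n), where RSS is the residual sum of squares, and it counts p as all weights and biases of a k-input, h-hidden-node, single-output network; both choices are assumptions made for this example, and the RSS values are hypothetical.

```python
import math

def bic_gaussian(rss, n, p):
    """BIC under an assumed Gaussian error model: n*ln(RSS/n) + p*ln(n)."""
    return n * math.log(rss / n) + p * math.log(n)

def n_parameters(k, h):
    """Weights and biases of a k-input, h-hidden-node, single-output network:
    h*(k+1) for the hidden layer plus h+1 for the output neuron."""
    return h * (k + 1) + (h + 1)

def select_hidden(rss_by_h, n, k):
    """Return the number of hidden neurons with the smallest BIC."""
    return min(
        rss_by_h,
        key=lambda h: bic_gaussian(rss_by_h[h], n, n_parameters(k, h)),
    )

# Hypothetical residual sums of squares for 1 to 4 hidden neurons,
# with n = 27 runs and k = 3 control factors.
rss_by_h = {1: 500.0, 2: 120.0, 3: 110.0, 4: 105.0}
best_h = select_hidden(rss_by_h, n=27, k=3)
print(best_h)  # 2
```

In this example the fit improves only marginally beyond two hidden neurons, so the ln(n) penalty on the extra parameters outweighs the gain and the smaller network is selected.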

Case Study
The printing data from Box and Draper [38] are used in this study for comparative analysis; these data were also used by Vining and Myers [8] and Lin and Tu [11]. Three experimental parameters of a printing machine, $x_1$, $x_2$, and $x_3$ (speed, pressure, and distance), are treated as input variables to examine the machine's ability to apply colored inks to package labels ($y$). Each of the three control factors is examined at three levels (−1, 0, +1), so a general full factorial design covering all level combinations contains 27 experimental runs. The experiment was conducted in standard order, with three replicates for each run. The experimental data (Box and Draper [38]) list the experimental configurations, which include the process mean, standard deviation, and variability, together with their corresponding design points.
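The structure of this design can be reproduced with a short Python sketch that enumerates the 27 runs of a three-level, three-factor full factorial and summarizes three replicates into a sample mean and standard deviation. The replicate values shown are hypothetical placeholders, not the printing data, and the "standard order" convention used here (first factor varying fastest) is an assumption.

```python
from itertools import product
from statistics import mean, stdev

LEVELS = (-1, 0, 1)

# product() varies the last position fastest; reversing each tuple makes
# x1 vary fastest, which is one common convention for standard order.
design = [levels[::-1] for levels in product(LEVELS, repeat=3)]

def summarize(replicates):
    """Return the sample mean and sample standard deviation of one run."""
    return mean(replicates), stdev(replicates)

# Hypothetical replicate observations for a single run.
run_mean, run_std = summarize([10.0, 12.0, 14.0])

print(len(design))   # 27
print(design[:3])    # [(-1, -1, -1), (0, -1, -1), (1, -1, -1)]
```

Applying `summarize` to every run yields the mean and standard deviation columns that serve as the two responses of the dual-response model.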
A variety of criteria have been used to analyze RD solutions. Among them, the expected quality loss (EQL) is widely used as a critical optimization criterion. The expectation of the loss function can be expressed as

$$EQL(\mathbf{x}) = \theta\left[\left(\hat{\mu}(\mathbf{x}) - \tau\right)^{2} + \hat{\sigma}^{2}(\mathbf{x})\right]$$

where $\theta$ is a positive loss coefficient (here, $\theta = 1$), and $\hat{\mu}(\mathbf{x})$, $\tau$, and $\hat{\sigma}(\mathbf{x})$ are the estimated mean function, desirable target value, and estimated standard deviation function, respectively. In this example, the target value is $\tau = 500$. As this model does not impose the unrealistic constraint of forcing the estimated mean response to a specific target value, it avoids the misleading zero-bias logic. Because the main objective is to minimize process bias and variability jointly, a slight bias between the estimated mean and its assigned target is allowed in order to obtain efficient solutions. For this reason, the EQL is selected as an identification and comparison tool to evaluate the optimal solutions obtained from each model.
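A worked example may help fix the criterion. The sketch below evaluates the EQL as the loss coefficient times the sum of the squared bias and the variance, which is the standard form for this criterion and is assumed here; the function name and the candidate mean and standard deviation values are hypothetical and are not results from the case study.

```python
def expected_quality_loss(mean, std, target=500.0, theta=1.0):
    """EQL = theta * ((mean - target)^2 + std^2): squared bias plus variance."""
    bias = mean - target
    return theta * (bias ** 2 + std ** 2)

# Two hypothetical candidate settings with target tau = 500 and theta = 1.
on_target_high_var = expected_quality_loss(mean=500.0, std=45.0)
slight_bias_low_var = expected_quality_loss(mean=495.0, std=40.0)

print(on_target_high_var)   # 2025.0
print(slight_bias_low_var)  # 1625.0
```

The second setting has a nonzero bias yet a lower EQL, which illustrates why allowing a slight bias can give a better overall solution than forcing the mean onto the target.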
MATLAB is used in this study to estimate the regression functions of the mean and standard deviation using the proposed dual-response approach and the conventional LSM-based RSM, respectively. The correlation coefficients of the estimated response functions based on Vining and Myers' [8] dual-response approach are listed in Table 1. Table 2 lists the proposed functional-link-NN-based dual-response RD estimation model after the training procedure. The weights and biases of the NN for the estimated mean and standard deviation functions are listed in Tables 3 and 4, respectively. In these tables, $W^{mean}_{in\_hid}$, $(w^{mean}_{hid\_out})^{T}$, $b^{mean}_{hid}$, and $b^{mean}_{out}$ represent the weights connecting the input layer to the hidden layer, the weights connecting the hidden layer to the output, the biases in the hidden layer, and the bias in the output layer of the mean formula, respectively.
Similarly, $W^{std}_{in\_hid}$, $(w^{std}_{hid\_out})^{T}$, $b^{std}_{hid}$, and $b^{std}_{out}$ represent the weights connecting the input layer to the hidden layer, the weights connecting the hidden layer to the output, the biases in the hidden layer, and the bias in the output layer of the standard deviation formula, respectively. According to the estimated regression formulas of the process mean and standard deviation, the response functions of the dual-response models with respect to parameters $x_1$ and $x_2$ for the two estimation methods (i.e., LSM and NN) are illustrated in Figures 4 and 5, including statistical indexes such as the coefficient of determination ($R^2$) and the root-mean-square error (RMSE). The corresponding plots for the remaining parameter pairs ($x_1$ and $x_3$, $x_2$ and $x_3$) are provided in Appendix A. The process mean, bias, and variance obtained from the proposed functional-link-NN-based dual-response estimation approach and the conventional LSM-based RSM are compared in Table 5 based on the corresponding optimal input variable settings and the EQL results. Table 5 indicates that the proposed RD dual-response estimation approach based on the functional link NN produced a significantly smaller EQL value than the conventional LSM-based RSM. In addition, compared with the conventional LSM, the proposed method provides a lower process variance. The process mean and the squared bias plotted against the variability for the standard LSM-based RSM and the proposed NN-based estimation model are shown in Figure 6. The criterion spaces of the estimated regression functions are denoted by red stars, and the optimal setting is marked with a green star in each figure.

Conclusions and Further Studies
Many studies have sought to improve the RSM by combining statistical and mathematical techniques, but these methods often require particular data types or rely on assumptions to define the functions relating the mean and variance to the input factors. NNs, which have recently been widely applied in artificial intelligence, can represent simple mathematical models (functions) using artificial neurons and determine unknown interactions between the input and output performance of a process without any prior knowledge of the underlying principles.
This study has described a functional-link-NN-based estimation method that offers an alternative RD technique without the assumptions inherent in the conventional LSM-based RSM. Compared with the existing RD dual-response estimation approach, the proposed method provides significant advantages in determining both the functional relationship between the control factors and output performances and the optimal solutions. The proposed dual-response estimation approach can be quickly and efficiently implemented using MATLAB (see Appendix B). Experimental results show that the proposed NN-based estimation method can achieve better solutions than the conventional LSM-based RSM in terms of the EQL metric.
In the future, the proposed functional-link-NN-based dual-response RD estimation approach will be extended to time series data and multiple-response optimization problems. In addition, we plan to search for the optimal structure by binary coding the neural network structure with a genetic algorithm and to conduct research on optimizing the weights of the neural network.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. The Estimated Regression Formulas of the Process Mean and Standard Deviation
The response functions of the dual-response model, obtained from the estimated regression formulas of the process mean and standard deviation for the estimation methods, are illustrated in Figures A1-A4.
