Selecting the Best Quantity and Variety of Surrogates for an Ensemble Model

Abstract: Surrogate modeling techniques are widely used to replace computationally expensive black-box functions in engineering. As a combination of individual surrogate models, an ensemble of surrogates is often preferred for its robustness. However, selecting the best quantity and variety of surrogates for an ensemble has always been a challenging task. In this work, five popular surrogate modeling techniques, polynomial response surface (PRS), radial basis functions (RBF), kriging (KRG), Gaussian process (GP) and linear Shepard (SHEP), are taken as the basic surrogate models, yielding twenty-six ensemble models built with a previously presented weights selection method. The best ensemble model is sought through comparative studies of prediction accuracy and robustness. By testing eight mathematical problems and two engineering examples, we found that: (1) in general, using as many accurate surrogates as possible to construct an ensemble improves the prediction performance; and (2) ensemble models serve as insurance rather than offering significant improvements. Moreover, the ensemble of the three surrogates PRS, RBF and KRG is preferred based on prediction performance. The results provide engineering practitioners with guidance on choosing the quantity and variety of surrogates for an ensemble.


Introduction
Surrogate models, also called metamodels, use interpolation and regression methods to approximate computation-intensive black-box functions. They have attracted increasing attention in recent decades. Researchers have long sought to develop the best surrogate modeling technique for all engineering applications, and many comparative studies of surrogate modeling techniques under multiple modeling criteria have therefore been presented. Jin et al. [1] provided the first systematic comparative report on the performance of four surrogates, polynomial response surface (PRS), radial basis functions (RBF), kriging (KRG) and multivariate adaptive regression splines (MARS), based on multiple performance criteria and multiple test problems. The results show that these four surrogate modeling techniques each have advantages and disadvantages. Mullur [2] proposed an extended radial basis function (E-RBF), which offers more flexibility than the typical radial basis functions (RBF). To further understand the advantages and limitations of this new technique, Mullur and Messac [3] compared the performance of E-RBF with that of PRS, RBF and KRG, and found that E-RBF outperforms the other surrogate models. Zhao and Xue [4] examined the relationships between sample quality merits and the performance measures of the PRS, RBF, KRG and Bayesian neural network (BNN) models, and provided simple guidelines for selecting candidate surrogate models.

Research Objectives
The question of how to determine the weight factors when constructing an ensemble of surrogates has attracted most researchers' attention, and many weights selection methods have been presented [9,10,14,15,27,28]. When forming an ensemble of surrogates, the basic surrogate models are generally selected according to past experience and personal preference. However, selecting inappropriate surrogates may result in a loss of accuracy, and selecting excessive surrogates leads to a loss of modeling efficiency. The research objective of this work is to carry out a systematic and comprehensive study answering the following two questions: (1) How should the appropriate variety of surrogates for an ensemble be selected? Do we need to use surrogates that are as accurate as possible? (2) How should the appropriate quantity of surrogates for an ensemble be selected? Do we need to use as many surrogates as possible?
We do not discuss the measures for evaluating the weight factors in this paper; the existing weights selection method presented in the authors' previous work [27] is used, and is briefly introduced in the next section.

Weights Selection Method
Intelligent selection of weights is important for building a superior ensemble of surrogates. The weights are carefully allocated to improve the overall prediction accuracy of the ensemble. Goel et al. [9] argue that the weights should not only reflect the confidence in each surrogate, but also filter out the adverse effects of surrogates that perform poorly in sparsely sampled regions. A weights selection method addressing these two issues is proposed and formulated as

$$ w_i = \frac{w_i^*}{\sum_{j=1}^{M} w_j^*}, \qquad w_i^* = \left( E_i + \alpha \overline{E} \right)^{\beta}, $$

where $w_i$ is the weight associated with the $i$th basic surrogate, $M$ is the number of basic surrogates, $E_i$ is the given error measure of the $i$th basic surrogate, and $\overline{E}$ is the average of all surrogates' error measures. This weighting scheme requires two parameters, $\alpha$ and $\beta$, which control the importance of averaging and of the individual surrogates, respectively. Goel et al. [9] found that $\alpha = 0.05$ and $\beta = -1$ resulted in a better ensemble model in their study. Acar and Rais-Rohani [10] considered that fixing the two key parameters $\alpha$ and $\beta$ leads to a loss of flexibility. Inspired by the work of Goel et al. [9], they selected the weights by solving an optimization problem of the form

$$ \min_{w_i} \ \mathrm{Err}\{\hat{y}_e\} \quad \text{s.t.} \quad \sum_{i=1}^{M} w_i = 1, \qquad \hat{y}_e(x_k) = \sum_{i=1}^{M} w_i \, \hat{y}_i(x_k), \quad k = 1, \dots, N, $$

where $\mathrm{Err}\{\cdot\}$ is the chosen error metric measuring the accuracy of the ensemble model $\hat{y}_e$, $N$ is the number of training points, $y(x_k)$ denotes the true response value at the $k$th sample point $x_k$, and $\hat{y}_i(x_k)$ is the predicted response of the $i$th basic surrogate at $x_k$. Note that the number of optimization parameters in this weights selection method equals the number of basic surrogates; that is, the more surrogates used to construct the ensemble, the more computational cost is required. Here, we instead advise that the two parameters $\alpha$ and $\beta$ be optimized to minimize the error measure of the ensemble model.
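As a brief illustration, the heuristic weighting scheme of Goel et al. [9] can be sketched in a few lines of Python. The function name and the error values below are illustrative assumptions, not taken from the paper:

```python
# Goel-style heuristic weighting: each surrogate's weight is derived from its
# error measure E_i and the average error; beta < 0 gives accurate surrogates
# larger weights, while alpha controls the importance of the average.
def goel_weights(errors, alpha=0.05, beta=-1.0):
    """Return normalised ensemble weights for the given error measures."""
    e_avg = sum(errors) / len(errors)
    raw = [(e + alpha * e_avg) ** beta for e in errors]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical cross-validation errors for three surrogates (toy values).
w = goel_weights([0.2, 0.5, 1.0])
print(w)  # the most accurate surrogate receives the largest weight
```

With the default $\alpha = 0.05$ and $\beta = -1$, the surrogate with the smallest error dominates the ensemble while poor surrogates are down-weighted rather than discarded.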
We propose a weights selection method that solves the optimization problem

$$ \min_{\alpha, \beta} \ E_e \quad \text{s.t.} \quad E_i + \alpha \overline{E} > 0, \quad i = 1, \dots, M, $$

where the weights $w_i$ are obtained from $\alpha$ and $\beta$ as in the weighting scheme above, and $E_e$ and $E_i$ represent the error measure, i.e., the prediction sum of squares root mean square error (PRESS$_{\mathrm{RMS}}$), of the ensemble model and of the $i$th basic surrogate, respectively. Here, $\hat{y}_i(x_k)$ denotes the predicted response of the $i$th basic surrogate built from the $N-1$ training points excluding the $k$th sample point $x_k$ (i.e., the leave-one-out cross-validation strategy). The constraint $E_i + \alpha \overline{E} > 0$, which is more restrictive than those in [9,10], is added to improve the quality of the proposed weights selection method. The MATLAB® function "fmincon" with the sequential quadratic programming (SQP) algorithm is employed to solve this optimization problem.
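As a rough sketch of this idea, the toy code below searches for (α, β) minimizing the ensemble's leave-one-out RMS error under the constraint E_i + αĒ > 0. The paper solves this with MATLAB's "fmincon" (SQP); here a coarse grid search and made-up data stand in, so the helper names, grid ranges and values are all assumptions:

```python
import math

def press_rms(y_true, y_pred):
    # Root mean square of the leave-one-out prediction errors.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def ensemble_press(y_true, loo_preds, alpha, beta):
    """PRESS_RMS of the weighted ensemble; loo_preds[i][k] is surrogate i's
    leave-one-out prediction at sample point k."""
    errors = [press_rms(y_true, p) for p in loo_preds]
    e_avg = sum(errors) / len(errors)
    if any(e + alpha * e_avg <= 0 for e in errors):  # enforce E_i + alpha*E > 0
        return float("inf")
    raw = [(e + alpha * e_avg) ** beta for e in errors]
    w = [r / sum(raw) for r in raw]
    blended = [sum(w[i] * loo_preds[i][k] for i in range(len(w)))
               for k in range(len(y_true))]
    return press_rms(y_true, blended)

def optimise(y_true, loo_preds):
    """Coarse grid search over alpha in (0, 0.2] and beta in [-3, -0.25]."""
    best = None
    for a in [0.01 * i for i in range(1, 21)]:
        for b in [-0.25 * i for i in range(1, 13)]:
            err = ensemble_press(y_true, loo_preds, a, b)
            if best is None or err < best[0]:
                best = (err, a, b)
    return best

# Two surrogates' leave-one-out predictions of four true responses (toy data).
y = [1.0, 2.0, 3.0, 4.0]
loo = [[1.1, 1.9, 3.2, 3.8],   # a fairly accurate surrogate
       [1.5, 2.6, 2.4, 4.9]]   # a less accurate one
err, alpha, beta = optimise(y, loo)
print(err < press_rms(y, loo[0]))  # tuned ensemble beats the best single model here
```

A gradient-based solver such as SQP would replace the grid search in practice; the sketch only shows how the weights depend on (α, β) through the scheme above.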

Basic Surrogate Models and Derived Ensemble Models
Five basic surrogate models, PRS, RBF, KRG, GP and SHEP, are selected to form twenty-six different ensembles of surrogates. For the sake of simplicity, the selected individual surrogates and the formed ensemble models are named IS1~IS5 and ES1~ES26, respectively. Table 1 provides details of the twenty-six ensemble models and five basic surrogates used in the investigation. The SURROGATES toolbox developed by Viana [29] was employed to build the five basic surrogate models; it fits the RBF model using the RBF toolbox of Jekabsons [30], the KRG model using the DACE toolbox of Lophaven et al. [31], and the GP model using the GP toolbox of Forrester [32]. PRS is a typical regression method with benefits in handling convex problems; a second-order, fully quadratic polynomial is used in this work, with the unknown regression coefficients determined by the least squares method. The RBF model, composed of multiple radial basis functions, is well suited to predicting scattered multivariate data; it is based on the multiquadric formulation with the constant c = 1. The KRG model is good at dealing with nonlinear problems; it estimates the value of a function as a combination of a known function and unknown departures. A Gaussian correlation function and a constant trend model are used in the KRG model. The GP model provides a flexible model and a model-based estimate of the prediction error even if the simulation itself is deterministic; it can also be used when the simulation is stochastic, although this requires an extension of the model. The covariance function in the GP model is the squared exponential function with an automatic relevance determination distance measure. The SHEP model is well suited for interpolating the material density field using nodal density values; the linear Shepard interpolation scheme is employed to build it.
IS1   PRS        ES11  PRS-RBF-KRG
IS2   RBF        ES12  PRS-RBF-GP
IS3   KRG        ES13  PRS-RBF-SHEP
IS4   GP         ES14  PRS-KRG-GP
IS5   SHEP       ES15  PRS-KRG-SHEP
ES1   PRS-RBF    ES16  PRS-GP-SHEP
ES2   PRS-KRG    ES17  RBF-KRG-GP
ES3   PRS-…
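To make the PRS idea above concrete, here is a minimal, self-contained sketch of a second-order polynomial fit by least squares in one variable. It is only an illustration of the underlying regression step; the paper's PRS models are multivariate and built with the SURROGATES toolbox, and all names and data below are assumptions:

```python
# Fit y ~ b0 + b1*x + b2*x^2 to 1-D data by least squares
# (normal equations solved with Gaussian elimination).

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_quadratic(xs, ys):
    """Least-squares coefficients [b0, b1, b2] via the normal equations."""
    basis = [[1.0, x, x * x] for x in xs]
    BtB = [[sum(bk[i] * bk[j] for bk in basis) for j in range(3)] for i in range(3)]
    Bty = [sum(bk[i] * yk for bk, yk in zip(basis, ys)) for i in range(3)]
    return solve(BtB, Bty)

# Noise-free quadratic data (y = 1 + 2x^2) is recovered up to rounding.
coeffs = fit_quadratic([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 9.0, 19.0])
print(coeffs)
```

The same normal-equation machinery extends to the fully quadratic multivariate basis used in the paper; only the basis rows change.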

Test Problems
The prediction capability of the surrogate models is tested on eight mathematical problems and two engineering examples taken from the publications [10,27]. The mathematical test functions include, among others: (3) the Goldstein and Price function (GF) with n = 2; (4) the Hartman function (HN3) with n = 3; and (8) the Dixon-Price function (DP) with n = 12. (9) The tension spring design (TSD) problem is taken from Arora [33]. The schematic of this design is illustrated in Figure 1. The objective is to minimize the weight of a tension spring; three design variables are identified: the diameter x1, the mean coil diameter x2 and the number of active coils x3. (10) The I-beam design (IBD) problem was first presented by Messac and Mullur [34]. The schematic of this design is illustrated in Figure 2. It aims to minimize the vertical deflection of an I-beam.

The known parameters for the IBD problem include: Young's modulus of elasticity E = 20,000 kN/cm², maximal bending force P = 600 kN and length of the beam L = 200 cm.
The Latin hypercube sampling (LHS) method is used to determine the locations of the points for all test problems. The MATLAB® function "lhsdesign" with the "maximin" criterion, using a maximum of 100 and 10 iterations, is employed to generate the training points and test points, respectively. Details of the training and test data for each test problem are given in Table 2. As shown in Table 2, the training set for each test problem comprises 12 to 182 points, depending on the number of input variables. In order to reduce the influence of random sampling, 1000 different training sets are used for all test problems.
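A plain-Python sketch of Latin hypercube sampling with a maximin-style restart criterion, in the spirit of the "lhsdesign"/"maximin" usage described above. This is not the MATLAB implementation; the function names and the restart strategy are illustrative assumptions:

```python
import random

def lhs(n_points, n_dims, rng):
    """One random Latin hypercube in [0, 1)^n_dims: every dimension is split
    into n_points equal strata and each stratum is sampled exactly once."""
    cols = []
    for _ in range(n_dims):
        perm = list(range(n_points))
        rng.shuffle(perm)
        cols.append([(p + rng.random()) / n_points for p in perm])
    return [[cols[d][k] for d in range(n_dims)] for k in range(n_points)]

def min_pairwise_dist(design):
    # Smallest Euclidean distance between any two points of the design.
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
               for i, p in enumerate(design) for q in design[i + 1:])

def maximin_lhs(n_points, n_dims, iters=100, seed=0):
    """Keep the design with the largest minimum pairwise distance."""
    rng = random.Random(seed)
    return max((lhs(n_points, n_dims, rng) for _ in range(iters)),
               key=min_pairwise_dist)

pts = maximin_lhs(12, 2)  # e.g., 12 training points in 2 dimensions
print(len(pts), len(pts[0]))  # 12 2
```

Each dimension is stratified exactly once per point, so projections onto any single variable remain well spread even for small training sets.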

Performance Measures
The prediction accuracy of the five individual surrogate models and twenty-six ensemble models is evaluated by three classical performance measures: the coefficient of determination R², the root mean square error (RMSE) and the maximum absolute error (MAE) [35-37]. They are expressed as

$$ R^2 = \frac{\left[\sum_{i=1}^{K}(y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})\right]^2}{\sum_{i=1}^{K}(y_i-\bar{y})^2 \sum_{i=1}^{K}(\hat{y}_i-\bar{\hat{y}})^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(y_i-\hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \max_{1 \le i \le K} \left|y_i-\hat{y}_i\right|, $$

where $K$ is the number of test points, $y_i$ and $\hat{y}_i$ are the actual and predicted response values at the $i$th test point, and $\bar{y}$ and $\bar{\hat{y}}$ are the averages of the actual and predicted responses over all test points, respectively. In this work, the MATLAB® function "corrcoef" is employed to calculate R². A larger R² together with smaller RMSE and MAE indicates good prediction capability.
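The three measures can be written out directly. The sketch below computes R² as the squared correlation between actual and predicted responses, mirroring the "corrcoef"-based calculation mentioned above; note that MAE here is the *maximum* absolute error, as defined in the text. The data are toy values for illustration only:

```python
import math

def rmse(y, yhat):
    # Root mean square error over all test points.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y))

def mae(y, yhat):
    return max(abs(a - p) for a, p in zip(y, yhat))  # maximum absolute error

def r_squared(y, yhat):
    # Squared Pearson correlation between actual and predicted responses.
    my, mp = sum(y) / len(y), sum(yhat) / len(yhat)
    cov = sum((a - my) * (p - mp) for a, p in zip(y, yhat))
    vy = sum((a - my) ** 2 for a in y)
    vp = sum((p - mp) ** 2 for p in yhat)
    return cov * cov / (vy * vp)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.9]
print(round(rmse(y_true, y_pred), 4), mae(y_true, y_pred),
      round(r_squared(y_true, y_pred), 4))
```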

Results and Discussions
The prediction accuracy and robustness of all twenty-six ensemble models derived from the five basic surrogate models PRS, RBF, KRG, GP and SHEP were tested and compared; the results are presented in this section. The average values of the performance measures R², RMSE and MAE for all test problems are given in Tables 3-5. To facilitate the comparison, the average values of RMSE and MAE are normalized with respect to the most accurate stand-alone surrogate, while the R² metric is not (since it is already a normalized value). The best value of each performance measure is shown in boldface. The numbers in brackets indicate the ranks of prediction accuracy, and the summation of ranks over all test problems is also provided. As shown in Tables 3-5, no single surrogate model performs best for all test problems. The individual surrogate models IS1 (PRS) and IS2 (RBF) perform best for test problems CB and GF, respectively, while IS3 (KRG) performs best for test problem BH. That is, an ensemble of surrogates may be less accurate than an individual surrogate for certain problems. Moreover, the stand-alone surrogate models PRS, RBF and KRG generally perform better than the GP and SHEP models. It is therefore not obvious how to accurately and efficiently select the appropriate surrogate models.
First, we discuss the first question from Section 2: do we need to use surrogates that are as accurate as possible? As shown in Table 3, among the individual surrogate models, IS3 (KRG) performs best, IS2 (RBF) takes second place, and IS1 (PRS) comes third. For the ensembles of two surrogates ES1~ES10, the R² values of ES5 (RBF-KRG), ES2 (PRS-KRG) and ES1 (PRS-RBF) are larger than those of the others. Notably, ES5, ES2 and ES1 are built from the more accurate stand-alone surrogates KRG, RBF and PRS, and the two most accurate surrogates, KRG and RBF, lead to the most accurate two-member ensemble, ES5 (RBF-KRG). Similarly, for the ensembles of three surrogates ES11~ES20, the total rank of ES11 (PRS-RBF-KRG) is significantly smaller than those of the other three-member ensembles. With regard to the ensembles of four surrogates ES21~ES25, ES21 (PRS-RBF-KRG-SHEP) and ES22 (PRS-RBF-KRG-GP) rank as the top two. It is also noted that the R² value of ES11 is larger than that of any other surrogate model considered. The results for RMSE and MAE in Tables 4 and 5 show the same pattern as Table 3. For RMSE, the top three individual surrogate models RBF, KRG and PRS produce the most accurate ensemble models: ES11 (PRS-RBF-KRG), ES5 (RBF-KRG) and ES22. Consistent with R², ES11 (PRS-RBF-KRG) has the smallest RMSE and hence the best prediction performance. For MAE, the top three individual surrogate models PRS, KRG and RBF produce the most accurate ensemble models ES11 (PRS-RBF-KRG), ES2 (PRS-KRG) and ES22. In accordance with R² and RMSE, the MAE of ES11 (PRS-RBF-KRG) is the smallest among the thirty-one surrogate models considered.
In summary, we can conclude that selecting surrogates that are as accurate as possible will lead to more accurate ensemble models when using the same quantity of stand-alone surrogates.
Next, we discuss the second question from Section 2: do we need to use as many surrogates as possible? The comparison results in Table 3 indicate that including poor surrogate models causes a loss of accuracy in the ensemble. For example, the R² value of the ensemble model ES5 (RBF-KRG) is larger than those of ES17 (RBF-KRG-GP), ES18 (RBF-KRG-SHEP) and ES25 (RBF-KRG-GP-SHEP). Conversely, adding satisfactory surrogate models improves the accuracy of the ensemble: for instance, the R² value of ES11 (PRS-RBF-KRG) is better than those of ES1 (PRS-RBF), ES2 (PRS-KRG) and ES5 (RBF-KRG). The RMSE and MAE results in Tables 4 and 5 lead to the same conclusion as Table 3: whether to use more surrogate models depends on their prediction performance. To facilitate the comparison of ensemble models of different sizes, Table 6 provides the average total ranks of the performance measures over all test problems. The overall prediction performance of the stand-alone surrogate models is inferior to that of the ensemble models. Furthermore, the more surrogate models used to construct the ensembles, the more accurate the predictions obtained. The results also show that the average prediction performance of the four-member ensembles ES21~ES25 is comparable to that of the five-member ensemble ES26. We conclude that using as many surrogates as possible may not improve the prediction performance of ensemble models, because of the inaccurate surrogates; however, using as many accurate surrogates as possible does help enhance the prediction performance. In fact, the ensemble model ES11 (PRS-RBF-KRG) shows the best prediction performance among the twenty-six ensemble models and five individual surrogate models.
To assess the robustness of the different surrogate models, statistical graphics, i.e., boxplots, are used to show the dispersion of the performance measure results. A smaller/shorter box implies a smaller standard deviation, and the symbol (+) denotes an outlier. Figures 3-5 show the boxplots of R², RMSE and MAE, respectively. The comparative results show that the box sizes of all surrogate models vary noticeably across test problems. Outliers appear for essentially all test problems, owing to the large number of repeated experiments. Meanwhile, the standard deviations of the individual surrogate models PRS, RBF and KRG are smaller than those of many ensemble models, while the standard deviations of all ensemble models are smaller than that of the worst stand-alone surrogate model. The relatively smaller boxes of the ensemble models for most test problems demonstrate their higher robustness compared with the individual surrogate models. In conclusion: (1) the prediction performances of the five individual surrogate models and twenty-six ensemble models vary appreciably across test problems; (2) the individual surrogate models PRS, RBF and KRG show satisfactory prediction performance, no worse than most of the ensemble models; (3) all twenty-six ensemble models perform better than the worst individual surrogate model, which demonstrates the value of adopting ensemble techniques; (4) the four-member ensembles ES21~ES25 and the five-member ensemble ES26 on the whole provide better accuracy and robustness under the three performance measures for the ten test problems in this work. In general, we suggest that ensemble models be used as insurance rather than as a source of significant improvement. From the accuracy and robustness perspectives, the ensemble model ES11 (PRS-RBF-KRG) is recommended in this work.
Figure 5. Boxplots of MAE for all test problems.

Conclusions
In this work, five basic surrogate models, PRS, RBF, KRG, GP and SHEP, are selected to form twenty-six ensemble models using a previously presented weights selection method. The prediction performances of all thirty-one surrogate models are tested on eight mathematical problems and two engineering examples. The performance measures R², RMSE and MAE are calculated to reflect prediction accuracy and robustness. The comparative studies answer the two questions raised in this paper. We suggest that more accurate surrogates be used to construct ensemble models, provided sufficient computational resources are available. Moreover, the ensemble of three surrogates ES11 (PRS-RBF-KRG) is preferred in view of its prediction accuracy and robustness. The objective of this work is to guide researchers in selecting the appropriate quantity and variety of surrogate modeling techniques for building ensemble models, rather than to develop novel ensemble modeling methods. However, this study still has some limitations. First, only five basic surrogate models were used. Second, the selection of the weight factors was not investigated. In future work, more individual surrogate models will be included in the comparative study, and the choice of weights in different ensembles will also be investigated.

Conflicts of Interest:
The authors declare no conflict of interest.
