1. Introduction
Surrogate models, also called metamodels, use interpolation and regression methods to approximate computation-intensive black-box functions. They have attracted increasing attention in recent decades, and researchers have sought to develop a surrogate modeling technique that is best for all engineering applications. To this end, many comparative studies of surrogate modeling techniques under multiple modeling criteria have been presented. Jin et al. [1] provided the first systematic comparison of the performances of four surrogates, polynomial response surface (PRS), radial basis functions (RBF), kriging (KRG) and multivariate adaptive regression splines (MARS), based on multiple performance criteria and multiple test problems. The results show that each of these four surrogate modeling techniques has both advantages and disadvantages. Mullur [2] proposed an extended radial basis function (E-RBF), which offers more flexibility than typical radial basis functions. To further understand the advantages and limitations of this new technique, Mullur and Messac [3] compared the performance of E-RBF with that of PRS, RBF and KRG, and found that E-RBF outperforms the other surrogate models. Zhao and Xue [4] examined the relationships between sample quality merits and the performance measures of the PRS, RBF, KRG and Bayesian neural network (BNN) models, and provided simple guidelines for selecting candidate surrogate models according to these quantities. Gelder et al. [5] carried out a comparative study of five surrogate modeling techniques, PRS, RBF, KRG, MARS and neural networks (NN), to guide users in selecting a reliable and time-efficient surrogate modeling technique for building energy simulations; they recommend the KRG and NN models. Salem and Tomaso [6] proposed an automatic selection method to determine the most suitable surrogate models for specific problems, measuring the quality of surrogate models by internal accuracy, predictive performance and a roughness penalty. Keane and Voutchkov [7] compared and contrasted a wide range of surrogate models for aerodynamic section performance modeling and found that the NN model outperforms many existing surrogate modeling techniques when large quantities of data are available. Kianifar and Campean [8] evaluated the performance of the PRS, RBF and KRG techniques using eighteen test problems and four engineering examples; the results show that KRG performs consistently well across different problems, although it can be very time-consuming for large samples. Together, these studies show that different surrogate modeling techniques with diverse characteristics suit different engineering problems, and most researchers face difficulty in selecting the appropriate surrogate models for their own applications. There is therefore great interest in an intuitive way to combine multiple surrogate models.
Research on ensembles of surrogates in the literature can be divided into two parts: (1) the measures for evaluating the weight factors, and (2) the engineering applications using ensembles of surrogates. According to the measures for evaluating the weight factors, existing ensemble modeling techniques can be classified into global measures [9,10,11,12,13], local measures [14,15,16,17,18] and combined global and local measures [19,20]. Weight factors evaluated by global measures remain constant over the entire design space. Goel et al. [9] allocated weight factors for the individual surrogates PRS, RBF and KRG by using the generalized mean square cross-validation error (GMSE); they state that an ensemble of surrogates can improve the robustness of the predictions by reducing the impact of a poor surrogate model. Acar and Rais-Rohani [10] determined the weight factors directly by minimizing a certain global error metric; their results show that the optimized ensemble model provides more accurate predictions than the stand-alone surrogates. In contrast, weight factors evaluated by local measures change pointwise with the prediction point. Lee and Choi [15] presented a pointwise ensemble of surrogates whose weights are calculated using a v-nearest-points cross-validation error; regression models are also suggested as basic surrogates for ensemble construction, since the proposed control function allows them to interpolate the function values at the training points. To take advantage of both global and local measures, Chen et al. [19] divided the design space into two parts and evaluated the weight factors in each part with a different strategy. A large proportion of the research indicates that an ensemble of surrogates can provide more accurate and robust results, and successful engineering applications using ensembles of surrogates are therefore continuously reported [21,22,23,24,25,26]. Hamza and Saitou [21] presented a multi-scenario co-evolutionary genetic algorithm (MSCGA) for vehicle structural crashworthiness via an ensemble of surrogates. Gu et al. [22] employed an ensemble of surrogates for the reliability-based design optimization of a vehicle occupant protection system. Dhamotharan et al. [23] proposed an ensemble-of-surrogates-based optimization framework for Savonius wind turbine design. Chen and Lu [24] developed an adaptive approach for reliability analysis based on the ensemble learning of multiple competitive surrogate models, including KRG, support vector regression (SVR) and polynomial chaos expansion. In all of these instances, the quantity and variety of individual surrogates in the ensemble are chosen in advance.
This work carries out a systematic and comprehensive study on selecting the best quantity and variety of surrogates for an ensemble. Five classical surrogates, PRS, RBF, KRG, GP and SHEP, are used as the basic surrogate models for building twenty-six ensembles, whose weight factors are determined by the authors’ previous work [27]. The performances of the twenty-six ensembles, along with the five individual surrogates, are measured in terms of prediction accuracy and robustness. On this basis, guidelines for selecting the appropriate quantity and variety of surrogates for an ensemble are proposed.
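The ensembles studied throughout this work combine the individual surrogates' predictions as a weighted sum whose weight factors sum to one. A minimal sketch of this prediction step (the three lambda "surrogates" and the weights below are illustrative placeholders, not the fitted models of this study):

```python
def ensemble_predict(x, surrogates, weights):
    """Weighted-sum ensemble prediction: y_e(x) = sum_i w_i * y_i(x)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weight factors must sum to one"
    return sum(w * s(x) for w, s in zip(weights, surrogates))

# Illustrative stand-ins for fitted PRS/RBF/KRG predictors (not real fits):
prs = lambda x: 1.0 + 2.0 * x
rbf = lambda x: 1.1 + 1.9 * x
krg = lambda x: 0.9 + 2.1 * x

y = ensemble_predict(0.5, [prs, rbf, krg], [0.2, 0.3, 0.5])
```

The entire design question of this paper is then which surrogates to include in the list and how to choose the weights.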
3. Weights Selection Method
Intelligent selection of the weights is important for building a superior ensemble of surrogates: the weights must be allocated carefully to improve the overall prediction accuracy of the ensemble. Goel et al. [9] consider that the weights should not only reflect the confidence in each surrogate, but also filter out the adverse effects of surrogates that perform poorly in sparsely sampled regions. A weights selection method addressing these two issues is formulated as:

$w_i = \frac{w_i^*}{\sum_{j=1}^{n} w_j^*}, \qquad w_i^* = \left(E_i + \alpha \bar{E}\right)^{\beta}, \qquad \bar{E} = \frac{1}{n}\sum_{i=1}^{n} E_i$

where $w_i$ is the weight associated with the $i$th basic surrogate, $E_i$ is the given error measure of the $i$th basic surrogate, and $\bar{E}$ indicates the average value of all surrogates’ error measures. This weighting scheme requires two parameters, $\alpha$ and $\beta$, which control the importance of averaging and of the individual surrogates, respectively. Goel et al. [9] found that $\alpha = 0.05$ and $\beta = -1$ produced a better ensemble model in their study.
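The weighting scheme of Goel et al. can be sketched in a few lines; the GMSE values passed in below are hypothetical, chosen only to illustrate the behavior:

```python
def heuristic_weights(errors, alpha=0.05, beta=-1.0):
    """Weight factors in the spirit of Goel et al.:
    w_i* = (E_i + alpha * E_avg)^beta, then normalized to sum to one.
    beta < 0 assigns larger weights to surrogates with smaller errors,
    while alpha > 0 mixes in the average error so that no single
    surrogate dominates."""
    e_avg = sum(errors) / len(errors)
    w_star = [(e + alpha * e_avg) ** beta for e in errors]
    total = sum(w_star)
    return [w / total for w in w_star]

# Hypothetical GMSE values for three surrogates (smaller is better):
weights = heuristic_weights([0.1, 0.2, 0.4])
```

With $\beta = -1$, the surrogate with the smallest error receives the largest weight, and the weights always sum to one by construction.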
Acar and Rais-Rohani [10] considered that fixing the two key parameters $\alpha$ and $\beta$ leads to a loss of flexibility. Inspired by the work of Goel et al. [9], they selected the weights directly by solving an optimization problem of the form:

$\min_{w_i} \; Err\{\hat{y}_e\} \quad \text{s.t.} \; \sum_{i=1}^{n} w_i = 1, \qquad Err\{\hat{y}_e\} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\Big(y(x_k) - \sum_{i=1}^{n} w_i\, \hat{y}_i(x_k)\Big)^2}$

where $Err\{\cdot\}$ indicates the chosen error metric measuring the accuracy of the ensemble model $\hat{y}_e$, $N$ is the number of training points, $y(x_k)$ denotes the true response value at the $k$th sample point $x_k$, and $\hat{y}_i(x_k)$ indicates the predicted response of the $i$th basic surrogate at $x_k$. Note that the number of optimization variables in this weights selection method equals the number of basic surrogates; that is, the more surrogates used to construct the ensemble, the higher the computational cost. Here, we instead optimize the two parameters $\alpha$ and $\beta$ to minimize the error measure of the ensemble model. We propose a weights selection method that solves the optimization problem:

$\min_{\alpha,\beta} \; E_e \quad \text{s.t.} \; \sum_{i=1}^{n} w_i = 1, \quad E_e \le \min_{i} E_i$
where $E_e$ and $E_i$ represent the error measure of the ensemble model and of the $i$th basic surrogate, respectively, i.e., the prediction sum of squares root mean square error, $PRESS_{RMS} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\big(y(x_k) - \hat{y}^{(-k)}(x_k)\big)^2}$, in which $\hat{y}^{(-k)}(x_k)$ indicates the predicted response at $x_k$ of the model built from the $N-1$ training points excluding the $k$th sample point $x_k$ (i.e., the leave-one-out cross-validation strategy). Here, the more restrictive constraint $E_e \le \min_i E_i$ is added to improve the quality of the proposed weights selection method in contrast to the earlier work [9,10]. The MATLAB® function “fmincon”, which implements a sequential quadratic programming (SQP) algorithm, is employed to solve this optimization problem.
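The optimization-based selection of weights can be sketched as follows. This is not the authors' MATLAB implementation: it uses SciPy's SLSQP solver (an SQP method playing the role of "fmincon"), optimizes the weights directly rather than $\alpha$ and $\beta$, and the leave-one-out predictions are synthetic placeholders:

```python
import numpy as np
from scipy.optimize import minimize

def select_weights(Y_cv, y_true):
    """Pick ensemble weights by minimizing the leave-one-out RMSE of the
    ensemble, subject to the weights summing to one.

    Y_cv   : (N, n) array; column i holds the leave-one-out predictions
             of the i-th basic surrogate at the N training points
    y_true : (N,) array of true responses y(x_k)
    """
    n = Y_cv.shape[1]

    def press_rms(w):
        return float(np.sqrt(np.mean((y_true - Y_cv @ w) ** 2)))

    res = minimize(
        press_rms,
        x0=np.full(n, 1.0 / n),          # start from equal weights
        method="SLSQP",                  # sequential quadratic programming
        constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        bounds=[(0.0, 1.0)] * n,         # nonnegative weights, a simplification
    )
    return res.x

# Synthetic leave-one-out predictions for three hypothetical surrogates,
# with increasing noise levels (the third surrogate is the least accurate):
rng = np.random.default_rng(0)
y = np.linspace(0.0, 1.0, 20)
Y = np.column_stack([y + 0.05 * rng.standard_normal(20),
                     y + 0.10 * rng.standard_normal(20),
                     y + 0.50 * rng.standard_normal(20)])
w = select_weights(Y, y)
```

Because each "pure" weight vector (all mass on one surrogate) is feasible, the optimized ensemble's cross-validation error can be no worse than that of the best individual surrogate, which is the intent of the $E_e \le \min_i E_i$ constraint.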
5. Results and Discussions
The prediction accuracy and robustness of all twenty-six ensemble models derived from the five basic surrogate models PRS, RBF, KRG, GP and SHEP were tested and compared; the results are presented in this section. The average values of the performance measures R2, RMSE and MAE over all test problems are given in Table 3, Table 4 and Table 5. To facilitate the comparison, the average values of RMSE and MAE are normalized with respect to the most accurate stand-alone surrogate, whereas the R2 metric is not (it is already a normalized value). The best value of each performance measure is shown in boldface, the numbers in brackets indicate the prediction-accuracy ranks, and the summation of ranks over all test problems is also provided. As Table 3, Table 4 and Table 5 show, no single surrogate model performs best for all test problems. The individual surrogate models IS1 (PRS) and IS2 (RBF) perform best for test problems CB and GF, respectively, while IS3 (KRG) performs best for test problem BH; that is, an ensemble of surrogates may be less accurate than an individual surrogate for certain problems. Moreover, the stand-alone surrogate models PRS, RBF and KRG generally perform better than the GP and SHEP models. It is therefore not obvious how to select the appropriate surrogate models accurately and efficiently.
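The three performance measures follow their standard definitions, which can be computed as below; the response values here are illustrative, not data from the tables:

```python
import numpy as np

def performance_measures(y_true, y_pred):
    """R^2, RMSE and MAE between true and predicted responses
    (standard definitions)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))            # root mean square error
    mae = float(np.mean(np.abs(resid)))                   # maximum/mean absolute error family
    r2 = 1.0 - float(np.sum(resid ** 2)
                     / np.sum((y_true - y_true.mean()) ** 2))
    return r2, rmse, mae

# Illustrative values only:
r2, rmse, mae = performance_measures([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Larger R2 (closer to one) and smaller RMSE and MAE indicate better prediction accuracy, which is how the ranks in the tables are assigned.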
First, we discuss the first question from Section 2: do we need to use surrogates that are as accurate as possible? As shown in Table 3, among the individual surrogate models, IS3 (KRG) performs best, IS2 (RBF) takes second place, and IS1 (PRS) comes third. For the ensembles of two surrogates, ES1~ES10, the R2 values of ES5 (RBF-KRG), ES2 (PRS-KRG) and ES1 (PRS-RBF) are larger than those of the others. Notably, ES5, ES2 and ES1 are built from the more accurate stand-alone surrogates KRG, RBF and PRS, and the two most accurate surrogates, KRG and RBF, lead to the most accurate two-surrogate ensemble, ES5 (RBF-KRG). Similarly, for the ensembles of three surrogates, ES11~ES20, the total rank of ES11 (PRS-RBF-KRG) is significantly smaller than those of the other three-surrogate ensembles. Among the ensembles of four surrogates, ES21~ES25, ES21 (PRS-RBF-KRG-SHEP) and ES22 (PRS-RBF-KRG-GP) rank as the top two. Note also that the R2 value of ES11 is larger than that of any other surrogate model considered. The RMSE and MAE results in Table 4 and Table 5 show the same pattern as Table 3. For RMSE, the top three individual surrogate models, RBF, KRG and PRS, yield the most accurate ensemble models: ES11 (PRS-RBF-KRG), ES5 (RBF-KRG) and ES22 (PRS-RBF-KRG-GP). As with R2, the smallest RMSE indicates that ES11 (PRS-RBF-KRG) has the best prediction performance. For MAE, the top three individual surrogate models, PRS, KRG and RBF, yield the most accurate ensemble models ES11 (PRS-RBF-KRG), ES2 (PRS-KRG) and ES22 (PRS-RBF-KRG-GP). Consistent with the R2 and RMSE results, the MAE of ES11 (PRS-RBF-KRG) is the smallest among the thirty-one surrogate models considered. In summary, selecting surrogates that are as accurate as possible leads to more accurate ensemble models for a given quantity of stand-alone surrogates.
Next, we discuss the second question from Section 2: do we need to use as many surrogates as possible? The comparison results in Table 3 indicate that including poor surrogate models reduces the accuracy of the ensemble. For example, the R2 value of the ensemble model ES5 (RBF-KRG) is larger than those of ES17 (RBF-KRG-GP), ES18 (RBF-KRG-SHEP) and ES25 (RBF-KRG-GP-SHEP). Conversely, adding satisfactory surrogate models improves ensemble accuracy: the R2 value of ES11 (PRS-RBF-KRG) is better than those of ES1 (PRS-RBF), ES2 (PRS-KRG) and ES5 (RBF-KRG). The RMSE and MAE results in Table 4 and Table 5 lead to the same conclusion as Table 3. That is, whether to use more surrogate models depends on their prediction performance.
To facilitate comparison of the prediction performances of ensemble models of different quantities, Table 6 provides the average total ranks of the performance measures over all test problems. The overall prediction performance of the stand-alone surrogate models is inferior to that of the ensemble models, and, in general, the more surrogate models used to construct an ensemble, the more accurate its predictions. The results also show that the average prediction performance of the ensembles of four surrogates, ES21~ES25, is comparable to that of the ensemble of five surrogates, ES26. We therefore conclude that using as many surrogates as possible does not necessarily improve the prediction performance of an ensemble, because inaccurate surrogates can be included, whereas using as many accurate surrogates as possible does enhance it. Indeed, the ensemble model ES11 (PRS-RBF-KRG) shows the best prediction performance among the twenty-six ensemble models and five individual surrogate models.
To reveal the robustness of the different surrogate models, boxplots are used to show the deviations of the performance measure results. A smaller (shorter) box implies a smaller standard deviation, and the symbol (+) denotes an outlier. Figure 3, Figure 4 and Figure 5 show the boxplots of the performance measures R2, RMSE and MAE, respectively. The comparative results show that the box sizes of all surrogate models vary noticeably across the test problems, and outliers appear for all test problems owing to the large number of experiments. The standard deviations of the individual surrogate models PRS, RBF and KRG are smaller than those of many ensemble models, while the standard deviations of all ensemble models are smaller than that of the worst stand-alone surrogate model. The relatively smaller boxes of the ensemble models for most test problems demonstrate their higher robustness compared with the individual surrogate models. In conclusion: (1) the prediction performances of the five individual surrogate models and twenty-six ensemble models vary appreciably with the test problem; (2) the individual surrogate models PRS, RBF and KRG show satisfactory prediction performance, no worse than that of most of the ensemble models; (3) all twenty-six ensemble models perform better than the worst individual surrogate model, which demonstrates the value of adopting ensemble techniques; (4) the ensembles of four surrogates, ES21~ES25, and the ensemble of five surrogates, ES26, provide better overall accuracy and robustness under the three performance measures for the ten test problems in this work. In general, we suggest that ensemble models be used as insurance rather than as a source of significant improvement. From the accuracy and robustness perspectives, the ensemble model ES11 (PRS-RBF-KRG) is recommended in this work.
6. Conclusions
In this work, five basic surrogate models, PRS, RBF, KRG, GP and SHEP, are used to form twenty-six ensemble models through a previously presented weights selection method. The prediction performances of the thirty-one surrogate models in total are tested on eight mathematical problems and two engineering examples, with the performance measures R2, RMSE and MAE calculated to reflect prediction accuracy and robustness. The comparative studies answer the two questions raised in this paper: we suggest that more accurate surrogates be used to construct the ensemble models, provided sufficient computational resources are available, and the ensemble of three surrogates ES11 (PRS-RBF-KRG) is preferred in view of prediction accuracy and robustness. The objective of this work is to guide researchers in selecting the appropriate quantity and variety of surrogate modeling techniques for building ensemble models, rather than to develop novel ensemble modeling methods. Some limitations remain. First, only five basic surrogate models are used; more individual surrogate models will be included in the comparative study in future work. Second, the selection of weight factors is not investigated in depth; the choice of weights in different ensembles will also be studied in the future.