Multi-Fidelity Sparse Polynomial Chaos and Kriging Surrogate Models Applied to Analytical Benchmark Problems

Abstract: In this article, multi-fidelity kriging and sparse polynomial chaos expansion (SPCE) surrogate models are constructed. In addition, a novel combination of the two surrogate approaches into a multi-fidelity SPCE-Kriging model is presented. Accurate surrogate models, once obtained, can be employed for evaluating a large number of designs for uncertainty quantification, optimization, or design space exploration. Analytical benchmark problems are used to show that accurate multi-fidelity surrogate models can be obtained at lower computational cost than high-fidelity models. The benchmarks include non-polynomial and polynomial functions of various input dimensions, lower-dimensional heterogeneous non-polynomial functions, as well as a coupled spring-mass-system. Overall, multi-fidelity models are more accurate than high-fidelity ones for the same cost, especially when only a few high-fidelity training points are employed. Full-order PCEs tend to be a factor of two or so worse than SPCEs in terms of overall accuracy. The combination of the two approaches into the SPCE-Kriging model leads to a more accurate and flexible method overall.


Introduction
The North Atlantic Treaty Organization (NATO) Science and Technology Organization (STO) working group AVT-331 "Goal-driven, multi-fidelity approaches for military vehicle system-level design" is currently investigating computing frameworks that could significantly impact the design of next-generation military vehicles [1]. In order to save computational time for uncertainty quantification, optimization, or design space exploration, the use of surrogate models is an effective strategy. In a surrogate model, expensive function evaluations are replaced by an approximation, which is computationally inexpensive and can thus be evaluated extensively. Polynomial chaos expansions (PCEs) [2-4] and kriging [5-8], in particular, have been widely used in the community. Kriging accuracy can be improved by a dynamic training point selection process rather than picking all training points randomly at the beginning [9]. This is analogous to the concept of expected improvement (EI), which tries to keep a balance between exploration (space filling) and exploitation (resolving interesting areas in the design space).
Additionally, compressive sensing (CS) theory [22-29] has shown great potential to reduce the curse of dimensionality for PCEs, depending on the decay rate of the PCE coefficients and thus the underlying sparsity of the solution. CS offers a framework to use fewer training points than basis functions and to still recover a good approximation to the sparse solution [27-30]. CS also increases the robustness to noise thanks to the introduced sparsity constraint [28,29]. In summary, CS aims at selecting a small number of basis polynomials with great impact on the model response [31], and the resulting surrogate model is called sparse PCE (SPCE).
Thus far, these two distinct surrogate modeling approaches (SPCE and kriging) have been applied in various fields largely standalone. There have been a few attempts to combine kriging and SPCE in a systematic way for single-fidelity applications [32,33], but none for multi-fidelity. SPCE is well known for capturing the overall trends of the solution, whereas kriging handles the values around training points well [32,33]. In this work, a proposed multi-fidelity (MF) SPCE-Kriging method aims at combining the advantages of both metamodeling methods, with the expectation that fewer training points are required for constructing an accurate model, thus reducing the number of computationally expensive high-fidelity (HF) simulations.
The general goal of this work is to construct very accurate surrogate models at small overall computational cost assuming that the dominant cost factor is obtaining high-fidelity training point information. The strategy to accomplish this is to enhance surrogate modeling techniques by employing lower fidelity information, adaptively selecting training points, as well as compressed sensing and to use a combined SPCE-Kriging meta-modeling method.

Methodology
The multi-fidelity kriging and sparse PCE methods in Sections 2.2 and 2.3 have been presented in previous works [30,34,35], but are briefly summarized here for completeness. Section 2.4 presents the novel combination of the two methods, but first a brief overview of universal kriging is given in Section 2.1.

Single-Fidelity Universal Kriging
Kriging [5] estimates a function value at a sample location, f(x), as the realization of a Gaussian random process:

\hat{f}(x) = \beta^T h(x) + Z(x),    (1)

where \beta^T h(x) is the mean value or trend of the Gaussian process and Z(x) represents a stationary Gaussian process with zero mean and variance \sigma^2. The covariance of two locations, x_i and x_j, is given by

\mathrm{Cov}[Z(x_i), Z(x_j)] = \sigma^2 R(x_i, x_j; \theta),

where R is an autocorrelation function which, due to stationarity, depends only on the distance between the two locations and a hyper-parameter, \theta. In this work, Wendland C4 [36] auto-correlation functions are employed. The most general kriging formulation is called universal kriging, in which the mean is given by a sum of P + 1 pre-selected trend functions, h_k(x), that is

\beta^T h(x) = \sum_{k=0}^{P} \beta_k h_k(x),    (2)

where \beta_k are the trend coefficients. In ordinary kriging the mean has a constant but unknown value, so P = 0, h_0(x) = 1, and \beta_0 is unknown. Given a value for \theta, the calibration of the universal kriging model parameters can be computed analytically using an empirical best linear unbiased estimator (BLUE):

\hat{\beta}(\theta) = (H^T R^{-1} H)^{-1} H^T R^{-1} Y,    (3)

where Y = \{ f(x_i), i = 1, \ldots, N \} are the N training point observations, H_{ik} = h_k(x_i) is the matrix of trend function evaluations, and R_{ij} = R(x_i, x_j; \theta) is the correlation matrix of the training points with process variance estimate

\hat{\sigma}^2(\theta) = \frac{1}{N} (Y - H\hat{\beta})^T R^{-1} (Y - H\hat{\beta}).    (4)

The optimal \hat{\theta} is determined here by maximization of a log-likelihood function via a genetic algorithm (for more details see Yamazaki et al. [16]):

\ln L(\theta) = -\frac{N}{2} \ln \hat{\sigma}^2(\theta) - \frac{1}{2} \ln \det R(\theta),    (5)

and the resulting \hat{\beta}(\hat{\theta}) can then be used to predict the kriging response via

\hat{f}(x) = \hat{\beta}^T h(x) + r^T(x) R^{-1} (Y - H\hat{\beta}),    (6)

where r_i(x) = R(x, x_i; \hat{\theta}) is the correlation between a new sample location, x, and an existing training point, x_i.
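As an illustration, the BLUE calibration and the resulting predictor can be sketched in a few lines. This is a minimal sketch assuming a fixed hyper-parameter theta and, for brevity, a Gaussian correlation function in place of the Wendland C4 family used in the paper; all function names are illustrative.

```python
import numpy as np

def corr_matrix(X, theta):
    # Gaussian correlation (stand-in for the paper's Wendland C4 functions)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-theta * d2)

def universal_kriging_fit(X, y, h, theta, nugget=1e-10):
    """BLUE of the trend coefficients beta for a fixed hyper-parameter theta."""
    R = corr_matrix(X, theta) + nugget * np.eye(len(X))
    H = np.array([h(x) for x in X])              # N x (P+1) trend matrix
    Rinv = np.linalg.inv(R)
    beta = np.linalg.solve(H.T @ Rinv @ H, H.T @ Rinv @ y)
    w = Rinv @ (y - H @ beta)                    # kriging weights on residuals
    return beta, w

def universal_kriging_predict(x, X, beta, w, h, theta):
    # Predictor: trend plus correlation-weighted residual interpolation
    r = np.exp(-theta * ((X - x) ** 2).sum(-1))
    return h(x) @ beta + r @ w
```

With a small nugget, the predictor interpolates the training data; in the full method, theta would additionally be optimized via the log-likelihood.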

Multi-Fidelity Kriging
In order to create a multi-fidelity surrogate model one has to map the trend of the more intensively sampled low-fidelity (LF) data to the less intensively sampled high-fidelity (HF) data. Here, a hybrid bridge function approach adopted from Han et al. [18] is used that has been presented in previous work [37]. The relationship between low- and high-fidelity surrogate model values (indicated by a hat) at any location x is given by

\hat{f}_{HF}(x) = \hat{\phi}(x) \hat{f}_{LF}(x) + \hat{\gamma}(x),    (7)

where \hat{\phi}(x) = g^T(x)\hat{\rho} is a low-order polynomial with q + 1 basis functions g^T(x) = [1, g_1(x), \ldots, g_q(x)] and corresponding coefficients \hat{\rho} = [\hat{\rho}_0, \hat{\rho}_1, \ldots, \hat{\rho}_q]^T, and \hat{\gamma}(x) is an additive bridge function. A multi-fidelity kriging surrogate model is constructed via the following three steps (visualized in Figure 1):

1. Using N_{LF_1} lowest fidelity training points, construct a kriging model, \hat{f}_{LF_1}.

2. Construct another kriging model for the additive bridge function, \hat{\gamma}_2, where \gamma_2(x_i) = f_{LF_2}(x_i) - \phi_2(x_i) \hat{f}_{LF_1}(x_i). An optimal \hat{\rho}_2 (and \hat{\theta}_2) is computed during the maximum likelihood estimation updates for \hat{\gamma}_2, which means the training point values \gamma_2(x_i) change constantly; once converged, one can evaluate \hat{f}_{LF_2}(x) = \hat{\phi}_2(x) \hat{f}_{LF_1}(x) + \hat{\gamma}_2(x).

3. Repeat step 2 until reaching the highest fidelity level (this is analogous to a multigrid strategy).
Numerical experimentation confirmed what intuition suggests, namely that the HF training locations should be a subset of the LF locations to ensure that no error is added from \hat{f}_{LF}(x) when computing \gamma(x) = f_{HF}(x) - \hat{\phi}(x) \hat{f}_{LF}(x) at all N_{HF} training points. The other (N_{LF} - N_{HF}) LF training locations are selected via Latin hypercube sampling subject to a distance constraint to the existing HF points. Furthermore, whenever the dynamic training point algorithm [9] adds a HF point to the set, the corresponding LF point is also added.
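A minimal one-dimensional sketch of the bridge-function idea follows. The fidelity pair is the classic Forrester functions used only as an illustrative stand-in for an expensive HF / cheap LF pair, cubic splines stand in for the kriging models, and the constant multiplicative term rho is fitted by least squares rather than the paper's maximum-likelihood update.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Illustrative fidelity pair (classic Forrester functions, stand-ins only)
f_hf = lambda x: (6 * x - 2) ** 2 * np.sin(12 * x - 4)
f_lf = lambda x: 0.5 * f_hf(x) + 10 * (x - 0.5) - 5

# Step 1: surrogate of the LF data (cubic spline as a kriging stand-in)
x_lf = np.linspace(0.0, 1.0, 11)
lf_hat = CubicSpline(x_lf, f_lf(x_lf))

# Step 2: hybrid bridge f_hf(x) ~ rho * lf_hat(x) + gamma(x); the HF
# locations are a subset of the LF locations, as recommended in the text
x_hf = x_lf[::2]
y_hf = f_hf(x_hf)
rho = np.linalg.lstsq(lf_hat(x_hf)[:, None], y_hf, rcond=None)[0][0]
gamma_hat = CubicSpline(x_hf, y_hf - rho * lf_hat(x_hf))  # additive bridge

mf_hat = lambda x: rho * lf_hat(x) + gamma_hat(x)
```

Because the bridge surrogate interpolates the residuals, the combined model reproduces the HF values exactly at the HF training points, which is the property the subset requirement above is designed to preserve.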

Multi-Fidelity Sparse Polynomial Chaos Expansions
An all-at-once approach developed by Bryson and Rumpfkeil [13] is used to simultaneously determine additive, \delta(x), and multiplicative, \alpha(x), corrections to the low-fidelity data that best approximate the high-fidelity function in a least-squares sense:

\hat{f}_{HF}(x) = \alpha(x) f_{LF}(x) + \delta(x).    (8)

A non-intrusive point collocation regression procedure is employed, and the PCE coefficients are determined by a singular value decomposition. Gauss-Patterson knots are used to determine the training points, and higher grid levels contain the lower levels as a subset. In particular, Smolyak sparse grids with a slow-exponential growth rule are used via Burkardt's sparse grid mixed growth anisotropic rules library [38].
For compressive sensing one can recover the signal \hat{b} from an incomplete measurement vector Y by solving an optimization problem of the form

\hat{b} = \arg\min_b \; \| Y - A b \|_2^2 + \lambda \| b \|_1,    (9)

where \lambda is a penalty coefficient and A is the regression matrix. The LASSO (least absolute shrinkage and selection operator) algorithm, implemented via the software library mlpack [39], is employed to solve Equation (9). Note that selecting a good value for \lambda is crucial: if \lambda is too small, it may result in over-fitting or a less sparse solution, while for large values of \lambda, the reconstructed signal may not be accurate enough. Following Salehi et al. [31], to determine an optimal \hat{\lambda} either the leave-one-out cross-validation error [40] or a true root-mean-square error (RMSE) has been used in previous work [30]; the latter is employed here. In order to find \hat{\lambda}, the optimization problem is solved for various values of \lambda and the corresponding RMSE is computed. Then, \hat{\lambda} is defined as the value of \lambda yielding the smallest RMSE.
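The lambda sweep can be sketched as follows, using scikit-learn's Lasso in place of mlpack and an assumed toy target that is a sparse Legendre expansion; as in the CS setting described above, fewer training points than basis functions are deliberately used.

```python
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Assumed toy target: a sparse Legendre expansion on [-1, 1]
truth = lambda x: (1.0 + 0.8 * legendre.Legendre.basis(3)(x)
                   + 0.3 * legendre.Legendre.basis(7)(x))

def design(x, p=16):
    # Regression matrix A: one column per Legendre basis polynomial
    return np.stack([legendre.Legendre.basis(k)(x) for k in range(p)], axis=1)

# Fewer training points (N = 14) than basis functions (P + 1 = 16)
x_train = rng.uniform(-1, 1, 14)
A, y = design(x_train), truth(x_train)

# Sweep the penalty and keep the lambda with the smallest true RMSE,
# evaluated on a dense test grid (as done in the paper)
x_test = np.linspace(-1, 1, 201)
A_test, y_test = design(x_test), truth(x_test)
best_rmse, best_lam, best_b = np.inf, None, None
for lam in np.logspace(-6, -1, 12):
    b = Lasso(alpha=lam, fit_intercept=False, max_iter=100000).fit(A, y).coef_
    rmse = np.sqrt(np.mean((A_test @ b - y_test) ** 2))
    if rmse < best_rmse:
        best_rmse, best_lam, best_b = rmse, lam, b
```

Note that scikit-learn's `alpha` plays the role of lambda (up to a 1/(2N) scaling of the least-squares term), so the sweep, not the absolute values, is what matters.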

Multi-Fidelity SPCE-Kriging
The descriptions given in Sections 2.1 and 2.3 show that kriging is interpolatory in nature and PCE is a regression. Thus, kriging usually recreates the local variations well, whereas PCE captures the global trends better [32,33]. Therefore, it is natural to try to combine the advantages of both kriging and SPCE for a more efficient meta-modeling technique. The resulting SPCE-Kriging model is a kriging surrogate, whose trend functions are determined by an SPCE model using the same underlying training data. The construction of a single-fidelity SPCE-Kriging consists of the following three main steps:

1. Choose PCE orthogonal bases associated with the probability distributions of the (random) inputs;

2. Solve for the PCE coefficients using the LASSO algorithm as described in Section 2.3 to determine which bases are most important;

3. Use only those important bases as trend functions h_k in the universal kriging mean value given by Equation (2) as well as in the maximum likelihood estimation using Equation (5).
For the construction of a multi-fidelity SPCE-Kriging, which to the authors' knowledge is a novel contribution, the algorithm presented in Section 2.2 is employed. However, there is an important caveat: SPCE-Kriging is only used in the first step (the training of the lowest fidelity surrogate model, \hat{f}_{LF_1}), and for the additive bridge function, \hat{\gamma}_2, an ordinary kriging model is constructed. The reason for this is that in step two, the genetic algorithm requires many log-likelihood evaluations to compute the optimal \hat{\rho}_2 (and \hat{\theta}_2), which is required to compute \hat{\phi}_2(x) = g^T(x)\hat{\rho}_2 and thus the N_{LF_2} training point values for the additive bridge function, \gamma_2(x_i) = f_{LF_2}(x_i) - \phi_2(x_i) \hat{f}_{LF_1}(x_i). Hence, using SPCE-Kriging for the construction of the additive bridge function, \hat{\gamma}_2, would require many SPCE evaluations, which are much more expensive than simple ordinary kriging constant trend computations.
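The three single-fidelity construction steps can be sketched as follows; scikit-learn's Lasso and GaussianProcessRegressor serve as stand-ins for the paper's mlpack LASSO and universal kriging. As a simplification, the trend coefficients are kept from the LASSO fit and the Gaussian process is trained on the residual, rather than re-estimating the trend via the BLUE; the target function is an assumed toy example.

```python
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import Lasso
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
f = lambda x: np.sin(6.0 * x) + 0.5 * x ** 3      # assumed toy target

# Steps 1 and 2: Legendre bases + LASSO to find the important ones
x = np.sort(rng.uniform(-1.0, 1.0, 25))
A = np.stack([legendre.Legendre.basis(k)(x) for k in range(10)], axis=1)
coef = Lasso(alpha=1e-3, fit_intercept=False, max_iter=100000).fit(A, f(x)).coef_
sparse_coef = np.where(np.abs(coef) > 1e-8, coef, 0.0)   # surviving bases

# Step 3 (simplified): kriging on the residual of the sparse trend; full
# universal kriging would re-estimate the trend coefficients via the BLUE
trend = A @ sparse_coef
gp = GaussianProcessRegressor(kernel=RBF(0.3), alpha=1e-6,
                              optimizer=None).fit(x[:, None], f(x) - trend)

def sk_predict(xq):
    Aq = np.stack([legendre.Legendre.basis(k)(xq) for k in range(10)], axis=1)
    return Aq @ sparse_coef + gp.predict(xq[:, None])
```

The sparse trend captures the global shape while the Gaussian process corrects it locally, so the combined model interpolates the training data closely.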

Numerical Experiments Setup
The AVT-331 group aims to investigate goal-driven, multi-fidelity approaches for the system-level design of military vehicles, which usually requires the solution of optimization problems and the exploration/visualization of the global design space.

Assessment Metric
In order to assess the global accuracy of a surrogate model, the root-mean-square error (RMSE) between the exact function, f, and the surrogate prediction, \hat{f}, calculated on a D-dimensional Cartesian mesh with N_t total nodes will be used:

\mathrm{RMSE} = \sqrt{ \frac{1}{N_t} \sum_{i=1}^{N_t} \left( f(x_i) - \hat{f}(x_i) \right)^2 }.

If the surrogate model is not deterministic (e.g., kriging), it will be executed five times, and the average RMSE and its standard deviation will be reported. For a deterministic surrogate (e.g., PCE) a single run is sufficient. The Cartesian mesh is equispaced with N_x nodes in each coordinate direction as given in Table 1. Note that for D = 10 not the entire normalized interval [0, 1] is covered but rather only [0.05, 0.95], since otherwise a majority of the points would likely be located in the extrapolation regimes of the surrogate models, which tend to be less accurate. Table 2 summarizes the analytical benchmarks recommended by the AVT-331 group to assess and compare the performance of multi-fidelity methods. The following subsections provide more details for each of the functions.

Forrester Function
The Forrester function [41] is a well-known one-dimensional benchmark for multi-fidelity methods, defined on the domain 0 ≤ x ≤ 1 by four analytical expressions (from fidelity level 1, the highest, to level 4, the lowest) and shown in Figure 2. Its high-fidelity level is the classic f_1(x) = (6x - 2)^2 \sin(12x - 4).
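The mesh-based RMSE metric described in this subsection can be sketched as a short routine (function names are illustrative):

```python
import numpy as np
from itertools import product

def rmse_on_grid(f_exact, f_hat, lo, hi, n_per_dim):
    """RMSE between exact and surrogate predictions on an equispaced
    D-dimensional Cartesian mesh with n_per_dim nodes per direction."""
    axes = [np.linspace(a, b, n_per_dim) for a, b in zip(lo, hi)]
    pts = np.array(list(product(*axes)))           # N_t x D node matrix
    err = np.array([f_exact(p) - f_hat(p) for p in pts])
    return np.sqrt(np.mean(err ** 2))
```

For example, a surrogate with a constant offset of 0.1 from the truth yields an RMSE of exactly 0.1 on any mesh.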

Rosenbrock Function
The Rosenbrock function is a well-known D-dimensional optimization benchmark problem on [-2, 2]^D described by the following equation:

f_1(x) = \sum_{i=1}^{D-1} \left[ (1 - x_i)^2 + 100 \left( x_{i+1} - x_i^2 \right)^2 \right].

The global minimum lies inside a long, narrow, parabolic-shaped flat valley. It is relatively straightforward to capture the overall quartic shape, but difficult to capture the local flat valley.
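A direct implementation of the standard D-dimensional Rosenbrock function used as the high-fidelity level:

```python
import numpy as np

def rosenbrock(x):
    """Standard D-dimensional Rosenbrock function (high-fidelity level)."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)
```

The function vanishes at the global minimum x = (1, ..., 1) in any dimension.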
The extension to multi-fidelity is obtained by defining two additional approximations, where f 2 can be considered a medium-fidelity level and f 3 the lowest fidelity.
The three-fidelity levels are shown for two dimensions in Figure 3.

Shifted-Rotated Rastrigin Function
To address real-world optimization problems, where the design space of the objective function is usually multi-modal, the Rastrigin function is selected as a benchmark. The variable range is defined as x ∈ [-0.1, 0.2]^D, and the function is shifted to change the position of the minimum and rotated to change the properties of the function itself within the variable space. Here, R is the rotation matrix in two dimensions, which can be extended to D dimensions by using the Aguilera-Perez algorithm [42], and the rotation angle is set to θ = 0.2. The fidelity levels are defined following the work of Wang and Jin [43] through a resolution parameter φ, with φ 1 = 10,000 (high fidelity), φ 2 = 5000 (medium fidelity), and φ 3 = 2500 (low fidelity), where a higher resolution corresponds to a smaller resolution error. These fidelity levels are displayed in Figure 4.
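A sketch of the shift-and-rotate construction in two dimensions; the shift vector used below and the plain Rastrigin form are illustrative assumptions, and higher dimensions would chain 2-D rotations via the Aguilera-Perez algorithm mentioned above.

```python
import numpy as np

def rastrigin(z):
    # Plain Rastrigin; global minimum value 0 at z = 0
    z = np.asarray(z, dtype=float)
    return 10.0 * z.size + np.sum(z ** 2 - 10.0 * np.cos(2.0 * np.pi * z))

def shifted_rotated_rastrigin(x, shift, theta=0.2):
    # Shift moves the minimum; R is the 2-D rotation matrix from the text
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return rastrigin(R @ (np.asarray(x, dtype=float) - shift))
```

By construction, the minimum of the shifted-rotated function sits at the shift vector, while the rotation couples the input coordinates.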

ALOS Function
This function employs heterogeneous non-polynomial analytic functions defined on unit cubes in one, two, and three dimensions. The one- and two-dimensional high-fidelity functions are taken from Clark and Bae [44], and an extension to three dimensions is used as well. Low-fidelity versions are obtained by applying linear additive and multiplicative bridge functions to the high-fidelity values g(x, y) and h(x, y, z):

g_{LF}(x, y) = (g - 2.0 + x + y) / (5.0 + 0.25x + 0.5y),    (26)
h_{LF}(x, y, z) = (h - 2.0 + x + y + z) / (5.0 + 0.25x + 0.5y - 0.75z).    (27)

The advantages of this problem are that it is heterogeneous and non-polynomial. The disadvantages are that it cannot be scaled to arbitrary dimensions and that the LF model is a simple non-linear polynomial scaling.
The one-and two-dimensional functions are displayed together with their low-fidelity approximations in Figure 5.

Coupled Spring-Mass-System
Three masses sliding along a frictionless surface are attached to each other by four springs, as shown in Figure 6. The following constants, variables, and assumptions are used in the analysis:

Spring Constants
The springs obey Hooke's law with constants k i , i = 1, . . . , 4 and the mass of each spring is negligible.

Fixed Ends
The first and last spring are attached to fixed walls and have the same spring constant.

Mass Constants
m 1 , m 2 , m 3 are point masses.

Position Variables
The variables x 1 (t), x 2 (t), x 3 (t) represent the mass positions measured from their equilibrium positions (negative is left and positive is right).
The equations of motion for this setup are given by

m_1 \ddot{x}_1 = -k_1 x_1 + k_2 (x_2 - x_1),
m_2 \ddot{x}_2 = -k_2 (x_2 - x_1) + k_3 (x_3 - x_2),    (28)
m_3 \ddot{x}_3 = -k_3 (x_3 - x_2) - k_4 x_3,

which can be rewritten as

M \ddot{x} = -K x,    (29)

where the mass matrix M, stiffness matrix K, and the displacement x are defined as

M = \mathrm{diag}(m_1, m_2, m_3), \quad
K = \begin{bmatrix} k_1 + k_2 & -k_2 & 0 \\ -k_2 & k_2 + k_3 & -k_3 \\ 0 & -k_3 & k_3 + k_4 \end{bmatrix}, \quad
x = [x_1, x_2, x_3]^T.

This is a constant-coefficient homogeneous system of second-order ODEs, the solution of which is given by

x(t) = \sum_i v_i \left( a_i \cos(\omega_i t) + b_i \sin(\omega_i t) \right),    (30)

where \omega_i = \sqrt{-\lambda_i}, the \lambda_i are the (negative) eigenvalues of the matrix -M^{-1}K, and v_i are the corresponding eigenvectors. The constants a_i and b_i are determined by the initial conditions x(t = 0) = x_0 and \dot{x}(t = 0) = \dot{x}_0.

Proposed Benchmark Problem
As a small numerical example, consider the reduced two-mass case with m_1 = m_2 = 1 and k_1 = k_2 = k_3 = 1, together with the initial conditions x_0 = [1, 0]^T and \dot{x}_0 = 0. The real eigenvalues are \lambda_1 = -1 and \lambda_2 = -3 with corresponding eigenvectors v_1 = [1, 1]^T and v_2 = [1, -1]^T. Since the initial velocity is zero, we have b_1 = b_2 = 0, and using x(t = 0) = x_0 in Equation (30) yields a_1 = a_2 = 0.5 such that

x_1(t) = 0.5 \cos(t) + 0.5 \cos(\sqrt{3}\, t) \quad \text{and} \quad x_2(t) = 0.5 \cos(t) - 0.5 \cos(\sqrt{3}\, t).    (31)

These solutions are plotted in Figure 7 for 0 ≤ t ≤ 30. Converting Equation (29) into a system of first-order ODEs and using the fourth-order accurate Runge-Kutta time-marching method yields a multi-fidelity analysis problem by varying the time-step size ∆t, as shown in Figure 8. Note that the high-fidelity (HF) analysis with ∆t = 0.01 is not distinguishable from the analytical solution given by Equation (31) and shown in Figure 7, while the low-fidelity (LF) analysis with ∆t = 0.6 exhibits the correct trends but is somewhat inaccurate. Treating two spring constants as independent input variables with 1 ≤ k_1 = k_3, k_2 ≤ 4 while m_1 = m_2 = 1 and computing x_1(t = 6) yields the two-dimensional design space shown in Figure 9. One can infer that the lower fidelity trends match the high-fidelity ones.
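The RK4 multi-fidelity analysis of the reduced two-mass example can be reproduced in a few lines; the state-vector layout and function names are illustrative.

```python
import numpy as np

# Reduced two-mass example from the text: m1 = m2 = 1, k1 = k2 = k3 = 1
M = np.eye(2)
K = np.array([[2.0, -1.0], [-1.0, 2.0]])

def rhs(y):
    # State y = [x1, x2, v1, v2]; M x'' = -K x as a first-order system
    x, v = y[:2], y[2:]
    return np.concatenate([v, -np.linalg.solve(M, K @ x)])

def rk4(y0, t_end, dt):
    # Classical fourth-order Runge-Kutta time marching
    y, t = np.asarray(y0, dtype=float), 0.0
    while t < t_end - 1e-12:
        h = min(dt, t_end - t)
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

# Analytical x1(t) for this parameter set
exact_x1 = lambda t: 0.5 * np.cos(t) + 0.5 * np.cos(np.sqrt(3.0) * t)
```

Running the integrator with ∆t = 0.01 reproduces the analytical solution to numerical precision, while ∆t = 0.6 gives the visibly less accurate low-fidelity response, mirroring the HF/LF pair in the text.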
In this work, a two-dimensional (varying the two spring constants) and a four-dimensional (varying two spring constants and two masses) version of this benchmark problem will be employed. However, other benchmark problems for uncertainty quantification and optimization can easily be conceived based on this problem [35].

Results
In the following, the results for each benchmark problem will be shown and discussed. Note that lines are plotted between markers as a visual aid, but do not necessarily reflect performance at intermediate points. For the PCE models, an oversampling ratio of two is always imposed, and the (full-order) PCE results are shown with dashed lines. In general, sparse PCEs (solid lines) outperform full-order PCEs up to the point where the λ term in Equation (9) introduces errors into the fit and the RMSE cannot drop further. SPCE runs for more than four dimensions (and more than two for MF) were not possible due to memory issues caused by the resulting large matrices. Ordinary kriging results are shown with dashed lines and SPCE-Kriging results (abbreviated as SK in the figures) are displayed with solid lines.
Results for high-fidelity (HF) training data alone are always shown in black whereas multi-fidelity results using fidelity levels 2, 3 or 4 are shown in red, green, and blue, respectively. For the stochastic kriging model, five runs were performed, and the figures show the average RMSE and its standard deviation. The total computational cost is calculated by the sum over all fidelity levels of the product of the number of function evaluations at that level and the given cost at that fidelity level (as given in Table 2). Please note that the vertical error axes differ in each figure since they were chosen to best display each particular result.
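The cost bookkeeping described above reduces to a one-line helper (illustrative function name):

```python
def total_cost(n_evals, cost_per_eval):
    """Total computational cost: sum over fidelity levels of the number of
    function evaluations times the per-evaluation cost at that level."""
    return sum(n * c for n, c in zip(n_evals, cost_per_eval))
```

For instance, 10 HF evaluations at unit cost plus 100 LF evaluations at cost 0.05 each give a total cost of 15 HF-equivalent evaluations.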

Forrester
The Forrester function results are shown in Figure 10 for both MF SPCE and MF kriging. The acronyms in this and the following figures should be interpreted as follows: HF uses high-fidelity data alone with either PCE or ordinary kriging surrogate models, whereas HFS and HFSK employ SPCE and SPCE-Kriging surrogates, respectively. MF2 implies a multi-fidelity surrogate (either PCE or kriging) using data from the highest fidelity level and, in this case, fidelity level 2. Again, MF2S and MF2SK instead employ SPCE and SPCE-Kriging surrogates, respectively. Since for MF kriging the number of low-fidelity points can be arbitrary, the starting amount is given; in this case, 30 LF points (that is, all the starting HF locations plus Latin hypercube sampled locations subject to a distance constraint). The number of LF training points for the MF PCEs is two grid levels higher for the Forrester function (i.e., 63 LF points are used when using 15 HF points). For the MF SPCE models, various orders for the polynomials were combined with several values of λ to find the combination with the lowest root-mean-square error (RMSE). The resulting best values for HF only and MF3 are shown in Table 3. Values for MF2 and MF4 are somewhat similar and are thus omitted here. One can infer that SPCE performs better than PCE up to a point and that MF models perform better than HF models, except for MF2, where the LF model is just too expensive. Kriging performs similarly to PCE when using only a few training points but then falls behind. SPCE-Kriging performs similarly to ordinary kriging.

Rosenbrock
The Rosenbrock function results are shown in Figure 11 for MF SPCE and in Figure 12 for MF kriging. The PCE model should match the truth function exactly as soon as sufficient training data is available to build a fourth-order polynomial (including an oversampling ratio of two). For example, the full-order PCE surrogate model requires 30 HF training points in two dimensions and 2002 points in ten dimensions. Some of these points may be LF for the MF models. The number of LF training points in two dimensions is four grid levels higher (i.e., 97 LF points are used when using 17 HF points), and in five and ten dimensions it is one level higher (e.g., 51 HF are combined with 151 LF points in five dimensions, and 201 HF are combined with 1201 LF points in ten dimensions). For ten dimensions, a two-level difference was also tried, leading, as expected, to a better performance for the cheaper low-fidelity model, but a worse performance for the more expensive one. Overall, the results behave as one would expect, verifying that the PCE surrogate models are correctly implemented.
Kriging clearly does not perform as well as PCE for this polynomial test case, but one can infer that MF kriging, at least for the cheaper low-fidelity model, outperforms kriging based on HF data alone. One can also see again that an expensive LF model (MF2) is not worth considering. HF SPCE-Kriging performs better than both PCE and SPCE, and MF SPCE-Kriging performs better than MF ordinary kriging, but not as well as HF SPCE-Kriging, though MF3 is better when only a few HF training points are used. This demonstrates that the ordinary kriging performance, at least for polynomial functions, can be dramatically improved, and also that using only ordinary kriging for the additive bridge function leaves room for improvement.

Rastrigin
The Rastrigin function results are shown in Figure 13 for MF SPCE and in Figure 14 for MF kriging. Again, many different orders for the polynomials were combined with various values of λ to find the combination with the lowest root-mean-square error (RMSE) for the SPCE models. The resulting best values for HF and MF2 for the two-dimensional case are shown in Table 4. One can infer that SPCE performs better than PCE and that MF models perform better than HF models in the beginning, when only a few HF training points are used. Once again, for ten dimensions, one- and two-grid-level differences were used, leading, counterintuitively, to a better performance for the more expensive low-fidelity model, but a worse one for the cheaper one. Kriging performs similarly to PCE in both five and ten dimensions, but falls behind for more training points in one dimension. Both HF and MF SPCE-Kriging again perform similarly to ordinary kriging.

ALOS
The ALOS function results are shown in Figure 15 for MF SPCE and in Figure 16 for MF kriging. Again, many different orders for the polynomials were combined with various values of λ to find the combination with the lowest root-mean-square error (RMSE) for the SPCE models. The resulting best values for HF and MF2 for the two-dimensional case are shown in Table 5. As usual, SPCE performs better than PCE up to a point and MF models perform better than HF models in the beginning, when only a few HF training points are used. Overall, the performance of the multi-fidelity kriging is somewhat better than MF SPCE in two and three dimensions and somewhat comparable in one dimension. Both HF and MF SPCE-Kriging perform slightly better than ordinary kriging, especially when more training points are used and in two and three dimensions.

Coupled Spring-Mass-System
For the MF SPCE models of the coupled spring-mass-system, many different orders for the polynomials were combined with several values of λ to find the combination with the lowest root-mean-square error (RMSE). The resulting best values for the two-dimensional case are shown in Table 6. The results for these settings are plotted on the left in Figure 17, whereas the performance of a MF kriging model is shown on the right. In general, one can see that the SPCE surrogate models outperform the full-order ones (compare solid to dashed lines), again up to a point, and the mono-fidelity models are inferior to the multi-fidelity ones (compare black to red lines) for both the MF SPCE and MF kriging models. Overall, the performance of the multi-fidelity ordinary kriging is comparable to that of MF SPCE and MF SPCE-Kriging. In addition to the two-dimensional case, a four-dimensional one, where both spring constants and both masses are varied between one and four, was considered as well. Again, many different orders for the SPCE are combined with various values for λ to find the combination with the lowest RMSE. The resulting best values are given in Table 7, and the results are shown in Figure 18. Once again, one can see that the SPCE surrogate model outperforms the full-order one (compare solid to dashed lines), and the MF models are better than the mono-fidelity ones (compare red to black lines) for both the MF SPCE and MF kriging models. Overall, the performance of the multi-fidelity ordinary kriging is somewhat better than MF SPCE.

Conclusions
The goal of this work was to build very accurate surrogate models for applications in design space exploration and optimization, as well as uncertainty quantification at small overall computational cost assuming that the dominant cost factor is obtaining high-fidelity training point information. The employed strategy was to enhance surrogate modeling techniques by employing lower fidelity information and adaptively selecting training points as well as to use compressed sensing and a combined SPCE-Kriging meta-modeling method.
Several analytical benchmark problems have been used to assess the performance of a multi-fidelity sparse polynomial chaos expansion (MF SPCE) model, a MF ordinary kriging model, as well as a MF SPCE-Kriging model. Overall, multi-fidelity models are more accurate than high-fidelity ones for the same computational cost, especially when only a few high-fidelity training points are employed. Full-order PCEs tend to be a factor of two or so worse than SPCEs in terms of overall accuracy as measured by the root-mean-square error (RMSE) compared to a truth model. The combination of the sparse PCE and kriging models into a SPCE-Kriging model leads to a more accurate model overall, which for the employed benchmark problems was either on par with or outperformed both underlying methods, since it incorporates the best of both approaches. It is also more flexible than the PCE or SPCE models, since the training points can be in arbitrary locations.

Institutional Review Board Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to public release restrictions.