Multi-Fidelity Gradient-Based Strategy for Robust Optimization in Computational Fluid Dynamics

: Efﬁcient Robust Design Optimization (RDO) strategies coupling a parsimonious uncertainty quantiﬁcation (UQ) method with a surrogate-based multi-objective genetic algorithm (SMOGA) are investigated for a test problem in computational ﬂuid dynamics (CFD), namely the inverse robust design of an expansion nozzle. The low-order statistics (mean and variance) of the stochastic cost function are computed through either a gradient-enhanced kriging (GEK) surrogate or through the less expensive, lower ﬁdelity, ﬁrst-order method of moments (MoM). Both the continuous (non-intrusive) and discrete (intrusive) adjoint methods are evaluated for computing the gradients required for GEK and MoM. In all cases, the results are assessed against a reference kriging UQ surrogate not using gradient information. Subsequently, the GEK and MoM UQ solvers are fused together to build a multi-ﬁdelity surrogate with adaptive inﬁll enrichment for the SMOGA optimizer. The resulting hybrid multi-ﬁdelity SMOGA RDO strategy ensures a good tradeoff between cost and accuracy, thus representing an efﬁcient approach for complex RDO problems.


Introduction
In recent years, robust design optimization (RDO) [1] has received increasing interest in engineering applications, due to its ability to provide efficient designs with a stable behavior under uncertainties of a diverse nature, such as randomly fluctuating operating conditions, geometric tolerances, and model uncertainties. Taguchi's method [2], relying on the simultaneous optimization of the average and variance of the stochastic cost functions, is by far the most popular RDO method, although approaches allowing accounting for rare events, such as the low-quantile [3,4] or the "horsetail matching" [5] methods, have been paid significant interest recently.
The main ingredient for RDO is an uncertainty quantification (UQ) method, allowing characterizing the probability distribution functions (pdf) or, at least, the lower order statistics of the cost functionals for each proposed design, in order to select those that guarantee the best possible average performance while avoiding critical deviations when nominal design conditions are not matched. According to the RDO method in use, a single objective deterministic design problem is generally converted into a multi-objective (Pareto front) one, with the aim to optimize the average performance while avoiding critical performance loss at off-design conditions. For this reason, RDO often combines an UQ solver with evolutionary algorithms (typically, multi-objective genetic algorithms (MOGA) [6,7]), which are naturally suited for providing a full set of compromise solutions among the multiple objectives. On the other hand, evolutionary optimizers are generally very demanding in terms of cost function evaluations, which may require in the end a prohibitive computational effort for problems described by costly computer models, such as those encountered in computational fluid dynamics (CFD), despite the use of massive parallelization [8][9][10]. In order to reduce the number of costly function calls, it is crucial to select parsimonious UQ methods and optimizers, the overall cost of RDO being typically the product of the cost of the two approaches [11]. Past examples of RDO in CFD include various forms of UQ solvers based on non-intrusive polynomial chaos expansion [11,12] or surrogate models such as simplex stochastic collocation [13] or kriging [9]. All of them require a number of CFD solves quickly increasing with the number of uncertain parameters, and their direct coupling with MOGA optimizers is not computationally affordable for industrial applications, especially if massively parallel computers are not available.
An interesting option for reducing the cost of UQ solves is to use gradient information. A simple method for approximating statistical moments of the cost function by Taylor series expansions is the so-called first-order method of moments (MoM) [14]. Such a method can be remarkably fast if the derivatives of the cost function with respect to the uncertain variables are readily available by means of a discrete or continuous adjoint solver [15][16][17][18][19][20]. Nevertheless, its accuracy is limited to Gaussian or weakly non-Gaussian processes with small uncertainties, since higher-order terms become increasingly important for strongly non-Gaussian input distributions. Some improvement can be achieved by using higher-order expansions, but these require information about higher-order sensitivity derivatives, which may represent a delicate and highly intrusive task. A more complete discussion can be found in [21]. An alternative to MoM, better suited for high uncertainty ranges and generic pdf, consists of leveraging gradient information to construct a high-quality surrogate from a reduced number of samples. Such an approach is used for instance in gradient-enhanced kriging (GEK) surrogates [22][23][24]. Once the surrogate is available, inexpensive Monte Carlo sampling on the response surface can be used to estimate the required statistics.
Massive parallelization is of great help for speeding up the RDO process [10,11,25], but it is not promptly applicable for routine industrial use. A way of drastically reducing the required number of function calls consists of replacing the costly CFD or UQ solvers with surrogate models, such as radial basis functions [26], artificial neural networks [27], and kriging [9,10], approximating variations of the cost functions through the design space. Such an approach is called a surrogate-based multi-objective genetic algorithm (SMOGA).
Further reductions of computational time can be achieved by combining models with various levels of fidelity during the optimization. Such so-called multi-fidelity (MF) models [28,29] leverage the use of inexpensive, but low-fidelity (LF) models for efficiently exploring the design or stochastic spaces, while using parsimonious high-fidelity (HF) samples to improve model accuracy. Examples of LF models are given by coarse-grid approximations [30][31][32][33], data-fit interpolation and regression models [34], projection-based reduced models [35,36], machine-learning-based models [37], or simplified models relying on approximations of the underlying physics [38][39][40]. In addition to an LF and an HF model, a correlation model is also required for combining data with various fidelity levels. A simple approach consists of linking the HF and the LF models by means of an additive correlation [41]: given an LF model f LF (ξ) and an HF model f HF (ξ), it is assumed that is an error function to be estimated. This approach is accurate enough when HF and LF models have similar scales and a good correlation, as is the case for coarse-grid approximations [42]. As an alternative, a multiplicative correlation can be used [43][44][45]: f HF (ξ) = ρ(ξ) f LF (ξ), with ρ(ξ) a constant scalar multiplier. A more comprehensive formulation combining the preceding ones was proposed in [32]: f HF (ξ) = ρ f LF (ξ) + δ(ξ). This approach is considerably more robust, and it has been extensively used in conjunction with Gaussian process approximation (including kriging) and Bayesian inference [46][47][48][49]). More sophisticated correlations exist [50][51][52], but they may be difficult to implement for complex problems.
In the present paper, we build on a SMOGA-based RDO technique introduced in [10], relying on the coupling of two nested Bayesian kriging (BK) surrogates: the first one is used to compute the required statistics of the objective functions in the uncertain parameter space, while the second one is used to model the response of these statistics to the design variables. Such an approach is called "combined kriging" [53]. An expected improvement criterion is used to update the second kriging surrogate during convergence towards the optimum. This technique has been successfully applied to the design of turbine blades for organic Rankine cycles [9] and to the RDO of the thermodynamic cycle [54]. Assuming that each kriging surrogate requires a number of samples approximately equal to 10 times the cardinality of the parameter space to build a reasonably accurate approximation [55], the nested BK RDO strategy needs O(100 × n unc × n des ) function evaluations (with n unc the number of uncertain parameters and n des the number of design variables) in the first generation of the GA to generate the initial kriging surrogates for the statistical moments. Additional O(10 × n unc ) evaluations are required for each update of the external kriging surrogate. Such a number of function calls is still too expensive for industrial CFD problems, even for uncertain or design spaces of low to moderate dimensionality (up to about eight uncertain or design parameters). GEK surrogates can be used to reduce the number of samples for the UQ step, but GEK-based MOGA is not straightforward in the context of RDO problems, since it requires also the gradient of the statistical moments of the QoI's pdf with respect to the design variables. Obtaining such a piece of information by using efficient adjoint methods is not a trivial task; on the other hand, finite difference approximations are easily applicable, but at the price of a considerable computational expense for high-dimensional design spaces. This is why we propose in this work a new multi-fidelity strategy for RDO that drastically reduces the required number of function calls by leveraging an inexpensive (but low-accuracy) first-order method of moments (MoM) and a Bayesian GEK [56]. Using gradient-based solvers allows reducing per se the number of function solves in the UQ step and to mitigate the curse of dimensionality. Here, the two models are fused together by using a methodology similar to [57] to generate a surrogate model for the MOGA optimization that combines the efficiency of MoM and the accuracy of GEK. The surrogate is enriched based on the expected improvement criterion during MOGA convergence, as in [9]. The required gradient information is obtained by using either continuous or discrete adjoint formulations. The new MF-RDO strategy is applied to an inexpensive test problem, i.e., the stochastic inverse design of a supersonic quasi-1D nozzle. The results show that a few GEK UQ solves are sufficient to correct the MoM solution, thus reducing the computational cost of the RDO by approximately one order of magnitude with respect to the nested BK strategy. Computational gains are expected to be even more substantial for costly industrial CFD problems.
The paper is organized as follows. In Section 2, we present the RDO problem and the test problem. The UQ methods considered in the study are described in Section 3. Section 4 presents the surrogate-based and multi-fidelity RDO strategies. In Section 5, we first apply various UQ methods to the test configuration and compare their accuracy and computational costs; afterwards, such methods are combined with a SMOGA or MF-SMOGA, and their efficiency in solving the RDO problem is assessed. Conclusions and final remarks are reported in Section 6.

Problem Definition
Following Taguchi's RDO method, we look for a methodology allowing optimizing a set of QoIs, J = J(x, ξ), J ∈ R m depending on a vector of deterministic design parameters x ∈ R n des and on a vector of uncertain parameters ξ ∈ R n unc . Note that some of the design parameters may also be uncertain. We formulate the RDO problem by using the expectancy and the variance of J as measures of robustness, which leads to the solution of the two objective deterministic optimization problem in Equation (1).
The preceding optimization problem is solved by means of an MOGA. More precisely, following our previous studies [10,12], we adopt the non-dominated sorting genetic algorithm (NSGA-II) of Deb et al. [58], which provides an approximated Pareto front of optimal solutions corresponding to different trade-offs between average performance and robustness for the various QoIs at hand. For simplicity, in the following, we consider only the case of a single QoI, m = 1, but the approach can be extended to multiple QoIs. The required statistics of the QoIs are calculated by means of a non-intrusive UQ method, which, for CFD models, is coupled with a suitable (costly) flow solver. Thus, the first ingredient of the RDO process is an efficient UQ approach, which provides accurate approximations of E[J] and var[J] based on a set of N deterministic samples of the solution. The UQ methods investigated in this work are described in Section 3.
Direct coupling of the MOGA with the UQ solver is overly costly for CFD, due to the high number of function evaluations. For instance, running the MOGA with a population of n pop individuals over n gen generations and using N samples for UQ lead to an overall number of CFD evaluations of the QoI of about N × n pop × n gen . The computational cost can be greatly alleviated by running N × n pop deterministic runs in parallel at each NSGA generation [8,25], but: (i) the required number of computational cores may exceed the computational resources available, and (ii) even with a perfect parallel scaling at each generation, the turn-around time of the RDO equals at least the average cost of a CFD run multiplied by n gen . To reduce the computational cost, a second (external) surrogate model is introduced to predict the response of the cost functions to the design parameters, as described in Section 4.

Test Problem: Quasi-1D Supersonic Nozzle
Various RDO strategies are assessed against an inexpensive test problem (also studied in [10,17]), namely the inverse design of a supersonic quasi-1D diverging nozzle. This allows validating UQ methods against MC sampling.
The nozzle geometry is assigned through the area distribution S(x) along the longitudinal axis x. This is chosen to be of the form: with a, b, c, and d coefficients defining the geometry. The nozzle length is set to L = 10. A typical nozzle geometry is depicted in Figure 1. For this test case, the QoI J (Equation (3)) is a scalar, namely the mean quadratic error of the actual pressure distribution in the nozzle P(x) with respect to the target design pressure distribution P des (x). The latter corresponds to the pressure distribution for a nozzle geometry with a = 1.75, b = 0.699, c = 1.00, and d = 3.80, a reservoir pressure P T,in = 1 bar, an outlet static pressure P out = 0.6 bar, and a gas specific heat capacity ratio γ = 1.4.
The optimization goal is to determine the design parameters a, b, c, d in Equation (2) providing the best fit to the target pressure distribution under multiple uncertainties, in the sense of Equation (1).
The flow is assumed to be governed by the steady Euler equations for quasi-1D flows (Equation (4)): where w = [ρ, ρv, ρe t ] T and f (w) = [ρv, ρv 2 + P, ρvh t ] T are, respectively, the conservative variable and the physical flux vectors [59]. In the preceding equations, ρ is the fluid density, v is the velocity along the nozzle axis, e t and h t are the total specific energy and enthalpy, and K = [0, −P, 0] T . The system of equations is supplemented by the equation of state for thermally and calorically perfect gases, P = (γ − 1)ρ(e T − v 2 /2). The governing equations are discretized by a cell-centered finite volume formulation, using Rusanov's first-order upwind scheme for space integration. The steady state solution is computed iteratively by solving a false transient with four-stage explicit Runge-Kutta time-stepping [59]. Characteristic boundary conditions based on Riemann invariants are imposed at the nozzle inlet and outlet. Sonic flow conditions are prescribed at the inlet, so that all Riemann invariants enter the domain. The range of variation of the total pressure is such that a shock is always created in the divergent nozzle. As a consequence, outlet flow conditions are always subsonic. In this case, we impose the outlet static pressure, which is treated as deterministic, and fixed to P out = 0.6 bar.
Based on a preliminary mesh study, a computational grid of 300 uniformly spaced cells is used in all of the following calculations.
The system is assumed to be subject to uncertainties of various nature, specifically: • geometric tolerances on the nozzle shape, modeled by treating the shape parameters a, b, c, d as normally distributed random variables, with mean µ and coefficient of variation CoV = σ/µ, with σ the standard deviation; • uncertainties in inlet total pressure P T,in described as a uniformly-distributed random variable with imposed lower and upper bounds; • uncertainties in the gas properties, here represented by the specific heat ratio γ, which is also assumed as uniformly distributed.
The characteristics of the random parameters are listed in Table 1. In the inverse design process, the uncertain geometric parameters are also (uncertain) design variables: for this reason, their mean is not fixed, but varies within the ranges corresponding to the bounds of the design space. This means that, even for designs corresponding to the upper/lower bounds, a realization of the nozzle geometry may lie outside the prescribed limits, due to geometric tolerances.

Uncertainty Quantification Methods
The goal of UQ methods is to estimate variability in the output of a model, corresponding to a set of QoIs, given variations in the model inputs. We present in the following two gradient-based UQ methods. The first one is a so-called probabilistic and non-intrusive approach, which predicts the approximate pdfs of the output QoIs, given the probability distributions assigned to the inputs and a set of samples of the QoI obtained by running the model as a black-box function. It relies on MC sampling on a gradient-enhanced kriging surrogate. The second one is a deterministic, more intrusive, approach based on Taylor-series expansion of the QoI with respect to the uncertain parameters, and it corresponds to a first-order adjoint method of moments (MoM). In addition to those UQ methods, we also consider a UQ solver based on the BK surrogate of [9], which does not use gradient information. BK is also used as the external surrogate model for SMOGA in the next section.

Bayesian Kriging and Gradient-Enhanced Kriging
Surface-response methods based on Bayesian kriging (BK) are implemented to achieve a non-intrusive estimate of the cost function statistics. In the following, we first introduce a BK formulation using only the observed values of the QoI J. Afterwards, we extend the formulation to account for gradient information, leading to Bayesian GEK.

Bayesian Kriging
A surface-response method based on BK was implemented to achieve a non-intrusive estimate of the cost function statistics. The QoI J is modeled as a regression function of the form: where we dropped the dependency on the deterministic quantities for simplicity, so that ξ defines a generic point in the n unc -dimensional uncertain space, f i are basis functions of the regression model, β i are regression coefficients to be determined, and Z(ξ) is a zero mean Gaussian process (GP) [60], modeling the deviation between the regression function and the data. Choosing a zeroth-order polynomial as the regression model leads to the so-called ordinary kriging. The deviation Z is represented as a multi-variate normal distribution Z ∼ N(0, R), with zero mean and covariance matrix R.
The kriging approximation can be formulated in a Bayesian statistical framework. In this case, the BK surrogate predicts a set of M values of the QoI J(ξ), conditional on N observed data J * , where J * is selected from the full set of J through the observation matrix H of size N × M: J * = H J with: for i = 1, ..., M and j = 1, ..., N. The vector of the unknown QoI J(ξ) is supposed to follow a normal prior distribution p(J), J∼N(0, P), where the covariance matrix P has to be estimated. Besides, the conditional probability to observe the N data J * given the unknowns J (known as the likelihood function) is also supposed to be a normal distribution of the form: The observation error is modeled as uniform and uncorrelated, resulting in a covariance matrix of the form R = 2 I, where I is the unit matrix and a pre-specified error of the observed variable values. The posterior distribution of the unknowns, conditional to the observed data, can be inferred by means of the Bayes theorem, such that: The posterior being the product of two normal distributions results in being a normal distribution as well, whose parameters depend on the expression of the prior covariance P: Assuming that J and J * are normalized such that J * has zero mean and unit variance, the covariance matrix P of the prior distribution p(J) is defined such that its elements are generated by the squared exponential kernel (reported below in the 1D case, for simplicity): where h ij = ξ i − ξ j is the correlation range, ξ i the generic coordinate of the observation point J * , and θ a hyperparameter that has to be estimated (for multi-dimensional problems, it is a vector whose dimension is the cardinality of the parameter space). Following [61], the kriging predictor mean and variance are: The advantage of this methodology is that the kriging variance provides a natural measure of the accuracy of the surrogate model. Because we deal with an ordinary kriging model where the hyperparameters need to be estimated, the maximum likelihood estimation (MLE) approach is implemented in order to evaluate the hyperparameter vector θ as the solution of the optimization problem: where A = (R + HPH T ). Since the optimization problem in (13) is defined over a multidimensional space of cardinality n unc , whose cost grows as O(n s N 3 ), where n s is the number of optimization steps and N is the number of observed samples, it represents a bottleneck for the above methodology. The MLE problem is classically solved by means of the Nelder-Mead downhill simplex method [62].
Here, a global search algorithm based on a differential genetic evolution approach was implemented, providing an improvement in terms of computation time. Besides, efficient inversion of matrix A is achieved through Cholesky decomposition. Once the kriging surrogate has been trained from the data and the hyperparameters are estimated, the statistics of the QoI are calculated through Monte Carlo sampling of the kriging surface, according to some prescribed joint probability function distribution of the parameters ξ.

Gradient Enhanced Kriging
Bayesian kriging can be accelerated by adding gradient information. The formulation described hereafter is the form of so-called gradient-enhanced co-kriging, denoted GEK for simplicity, where the gradient data are added as covariables. This approach has been shown to be simpler and more robust for indirect gradient-enhanced kriging, where the gradient information is included in the surrogate by adding finite difference-based values at small distances from the sample locations, and then, a standard kriging is performed on the resulting augmented sample [22][23][24].
For the sake of clarity, we first describe GEK in the one-dimensional case: J = J(ξ). Assuming that the derivatives of dJ/dξ are known at the M sampling points, the observation vector is redefined by concatenating the gradient information to the function values: The covariance matrix is subsequently modified to account for the observational errors on the derivatives, also assumed to be uniform and uncorrelated. This results in a diagonal matrix of dimension 2M × 2M: where ς 2 is the variance of the (identically distributed) observation errors on the gradients. Finally, the prior covariance is reformulated as in [63]: P = P 00 P 01 P 10 P 11 (16) with In the above, P 00 is the covariance of the function values, P 11 is the covariance of the derivatives, and P 01 and P 10 are the cross-covariances.
The GEK formulation is extended to M-dimensional problems in a straightforward manner: the observation vector is expanded by adding N M-dimensional gradients, which results in the N × (M + 1) compiled observations [64]. The prior covariance P becomes: where: Due to the use of additional gradient information, GEK achieves a given level of accuracy with a number of samples that is consistently lower than BK. However, the solution quality may be highly dependent on the accuracy of the computed gradients [24], which must be properly accounted for when building the surrogate.

Method of Moments
Among deterministic UQ methodologies, an interesting approach is the MoM, which approximates the statistical moments of the fitness function by Taylor series expansions. This method may provide fast and sufficiently accurate estimates of the QoI statistics, as long as the fitness sensitivity derivatives with respect to uncertain parameters are provided by an efficient method and complete output statistics are not required [17]. Both first-and second-order MoM formulations have been considered in the literature (see for instance [17]). Here, we restrict our attention to the first-order MoM, as a candidate low-fidelity UQ method to be combined with a higher-fidelity approach, as discussed later. Following [65], the first-order approximation for the expected value (mean) of J is simply given by: which is nothing but the deterministic evaluation of function J at the mean value of the inputξ. The first-order variance is: with cov(ξ i , ξ j ) the covariance matrix. If the variables ξ = {ξ 1 , ξ 2 , ....ξ N } are uncorrelated, the covariance is a diagonal matrix and Equation (20) takes the simplified form: where σ 2 i stands for the variance of the i-th uncertain parameter.

Gradient Calculation
The gradient information required for the GEK and MoM approaches is calculated by means of the adjoint method. Both the continuous and the discrete adjoint formulations are used in the following. Their derivation for the present quasi-1D nozzle problem is described in Appendix A.

Robust Design Optimization Strategy
As the RDO problem defined in Equation (1) is an intrinsically multi-objective problem, the well-known non-dominated sorting genetic algorithm NSGA-II [58] has been used as the optimizer. The MOGA requires evaluations of the RDO cost functions through the UQ method for each individual and for each generation. To reduce the computational cost of the RDO, one possibility is to construct a BK or GEK surrogate over a parameter space extended to the design variables, i.e., of dimensionality n unc + n des . Due to the curse of dimensionality, constructing a reliable surrogate over such a large parameter space may result in an unacceptable cost of the kriging approximation. To avoid this issue and to facilitate the use of gradient-based UQ, including MoM, we chose to construct instead a separate surrogate model mapping a variable in the design space to the cost function space.

BK-Based Robust Design Optimization
In [9,10,54], an external BK surrogate was constructed to describe variations of the mean and variance of the QoIs in the design space, by using n init samples chosen according to a preliminary design of experiments (DOE). In those references, a first-level UQ BK using N samples was used to evaluate the cost functions. In the present work, a similar approach is used, whereby a GEK surrogate or the MoM may replace first-level BK for the UQ step; a flowchart of the RDO process is provided in Figure 2.
The construction of the second-level surrogate requires N × n init deterministic runs of the direct CFD solver in the case of BK and of the direct and adjoint solvers for GEK, which can be parallelized in a straightforward manner. The moderate extra-cost of adjoint solves for GEK is in general counter-balanced by a smaller N required for achieving a given accuracy. If the MoM is used as the UQ solver, the cost of the initial surrogate is reduced to n init deterministic runs of the direct (nonlinear) and adjoint CFD solvers. Numerical tests show that a sufficiently accurate BK surrogate can be obtained by setting n init = 10n des . Once n init estimates of the QoI mean and variance have been obtained, a second-level BK is constructed and coupled with the MOGA. In order to control the accuracy of the approximated cost functions, an adaptive infill strategy is adopted to enrich the external BK surrogate during the evolution. For this purpose, we add to the initial new DOE samples selected during the MOGA iteration and retrain the BK model. Several infill criteria are available. For an overview about all these techniques the reader is addressed to [66,67]. Among them, the expected improvement (EI) criterion [68] has been shown to provide a good trade-off between exploitation and exploration, and it was therefore selected for the RDO carried out in the present work. The new samples are drawn in such a way that the EI of the global minimum is maximized [69,70]. The EI quantifies the probability to improve the surrogate on the design space parameters. A global optimization of the EI function is then carried out to find the most suitable location for the new sample (see [70] for details).
Since RDO is intrinsically a multi-objective optimization process, the EI function is a surface of a hyper-dimensional parameter space, and a more complex formulation is developed, referred-to as the multi objective expected improvement (MOEI) approach (see [70]). The accuracy of the surrogate is rapidly improved with a few MOEI infills, so that n MOEI << n gen . An MOEI update implies running the UQ algorithm for the additional sample to be integrated to the second-level BK. The N deterministic CFD and adjoint runs required for the BK and GEK UQ solvers can be carried out in parallel. Finally, the cost of the RDO based on the adaptive infill strategy in terms of cost-function evaluation is given by N × (n init + n MOEI ) direct CFD calculations for BK, N × (n init + n MOEI ) direct and adjoint CFD calculations for GEK, and (n init + n MOEI ) direct and adjoint calculations for MoM. The turn-around time, in the case of a perfectly-scaling parallel implementation, is approximately equal to n MOEI + 1 runs. Based on numerical experiments, an MOEI adaptation every three to five MOGA generations is generally sufficient to achieve an accurate approximation of the optimum, as shown in the Results Section.
The full BK-based RDO loop is described in the pseudo-code provided hereafter in Algorithm 1.

Multi-Fidelity Methods for RDO
Although the BK-based SMOGA allows a considerable reduction of function calls during the RDO loop, the overall computational cost remains significant for high-dimensional design and uncertain spaces. While the computational cost is higher for BK or GEK solvers than for MoM, the former may provide an accurate estimate of the QoI statistics, which is not the case for the first-order MoM. With the aim of achieving a compromise between the accuracy of kriging-surrogate UQ solvers and the computational efficiency of the MoM, in this section, we introduce an advanced SMOGA, based on a multi-fidelity surrogate model of the design space replacing the preceding external BK surrogate (MF-SMOGA).
In the MF surrogate, the MoM UQ solver is used as the low-fidelity (LF) model, and the GEK UQ solver is the high-fidelity (HF) one. In the present implementation, an initial DoE is run at the first iteration of the SMOGA, where the HF and LF sampling points are generated independently. The auto-regressive correlation: is used for correcting the discrepancy between the LF and HF models. More specifically, we follow the implementation proposed in [57], and we use the LF model as a basis function for the regression term expression of a universal kriging model; therefore, the term m(ξ) in Equation (5) becomes: where β ρ is an estimate of the coefficient ρ of Equation (22) by means of a GP regression. Assuming that the LF and HF models are independent, the mean µ HF and variance σ 2 HF of the high-fidelity model are given by: with µ LF , σ 2 LF , µ δ , and σ 2 δ the means and variances of the LF model and of the discrepancy function δ, respectively. In order to improve the MF surrogate accuracy during SMOGA convergence, adaptive infill based on the MOEI criterion is used. In this case, however, either the LF or the HF model can be used for the infill. In the following calculations, we adopt the strategy of [42]: for the infill, priority is given to the less expensive LF model, and the HF one is used only when improvement achieved with the lower level of fidelity is below a given tolerance, tol = 10 −4 in the present calculations. Either way, re-sampling at the same location is avoided.
Pseudo-code for the MF model is provided below in Algorithm 2; for more information about it, the reader is addressed to [57].

Results
The quasi-1D supersonic nozzle test problem presented in Section 2.1 is first used to assess the BK, GEK, and MoM UQ methods against reference MC sampling. Afterwards, the methods are applied to a sample of nozzle geometries and used to build single or multi-fidelity surrogates used in the SMOGA RDO loop.

Preliminary Assessment of the UQ Methods
We select one of the nozzle geometries in the design space by assigning the geometric coefficients' fixed normal pdfs with the standard deviation equal to 1% of the mean. The pdf parameters for the geometric variables are provided in Table 2. The operation and thermodynamic parameters are assigned the same pdf as in Table 1. The pdfs are sampled in order to build BK and GEK surrogates, while the MoM is applied by computing the QoI and its gradient at the expected value of the input parameters.
A summary of the UQ results is given in Table 3, where we report the approximated mean and variance of the QoI J according to the various UQ methods. A reference calculation based on MC integration over 10 5 samples is carried out to provide a reference solution. For the present inexpensive test problem, the CPU time required for MC sampling is approximately 3 × 10 6 seconds on a personal computer having an Intel(R) Xeon(R) CPU E5-1620 v3 at 3.50 GHz. Results are also reported for MC sampling on the BK surrogate. The latter uses 60 function evaluations at points selected according to a Latin hypercube sampling (LHS) of the six-dimensional uncertain space. The sample size corresponds to the empirical rule N = 10n unc . This is already sufficient to achieve extremely low errors with respect to the MC mean and variance, while reducing the overall computational cost of the UQ by three orders of magnitude. The GEK UQ based on the discrete adjoint solver provides an accuracy similar to BK by using only 15 samples. Despite the additional adjoint solves for gradient computations, the computational cost is reduced by more than 1/3 with respect to BK. The accuracy of GEK is confirmed by inspection of Figure 3, showing the full empirical pdfs of J computed with PC, BK, and GEK. A very good agreement between MC, BK, and GEK is observed.  The accuracy is less satisfactory for the continuous adjoint approach, due to the less accurate computation of gradients. This reflects numerical errors introduced by the discretization of the adjoint equations and the treatment of boundary conditions, which are not completely consistent with the discretization errors introduced by the direct CFD solver. On the other hand, the continuous adjoint solver is developed independently on the direct solver, and in this sense, it is non-intrusive. The computational cost of GEK samples using the direct or continuous adjoint is approximately the same. The MoM method is obviously less accurate than the other UQ solvers. Nevertheless, the errors on both the computed mean and variance remain very reasonable, despite the presence of a shock in the divergent nozzle(where the Taylor-series expansion is not defined) and the relatively large uncertainty range on the reservoir pressure (approximately 20%). The shock is in practice regularized by numerical diffusion both in the direct and the adjoint solvers, while the uncertainty ranges of all other parameters are reasonably small. While the discrete and continuous-adjoint MoM calculations provide strictly the same results for E[J] (which does not use gradient information), they predict different results for the standard deviation. The slightly lower errors obtained for the continuous adjoint method is the effect of the compensation of errors for the case at hand. Since the MoM requires only one direct and one adjoint CFD calculation, its computational cost is essentially 1/15 of the GEK sampling. Overall, the first-order MoM provides a reasonably accurate estimate of the lower order statistics of the QoI and a very good tradeoff between cost and accuracy for the present problem. Its accuracy is however expected to decrease for more severe uncertainty ranges. For this reason, the MoM is categorized as a lower fidelity model than BK or GEK.
The UQ results for MC, BK, and GEK can be used to carry out a global sensitivity analysis and to identify the random parameters contributing the most to the variance of the QoI by means of an analysis of variance (ANOVA) decomposition. Specifically, exact or surrogate-based MC samples are employed to calculate the Sobol indexes [71] in the parameter space defined by all uncertain inputs. For this purpose, we used the SALib Python library [72]. Only the discrete GEK results (reported in Figure 4) are considered here, the MC, BK, and continuous GEK methods leading to similar conclusions. The figure reports the first-order Sobol indexes with respect to each uncertain parameter, as well as the sum of the higher order indexes, corresponding to the interaction between parameters when these are changed simultaneously. Sobol total indexes are also included in order to further evaluate the relative importance of the various input parameters. The inlet total pressure P T,in appears to be by far the most influential parameter, consistent with the much larger range of its pdf. Nevertheless, the interactions terms are very significant, due to the highly nonlinear nature of the compressible CFD problem, with geometric parameters c and d contributing by more than 10% to the total variance and parameters a and b by more than 2%. For this reason, in the RDO calculations of the following section, we treat all six input parameters as random variables.

RDO Results
In this section, the UQ methods are coupled with a SMOGA RDO optimizer. The objective is to investigate how different approximations of the statistical models of J, i.e., of the RDO cost functions, may affect the RDO process. The BK and GEK UQ solvers were used in the same setting as Section 5.1, i.e., they are trained for each new design by using 60 and 15 samples in the uncertain space, respectively. This choice provides a good tradeoff between accuracy and computational cost. More sophisticated approaches involving an adaptive refinement of the UQ surrogates are possible and will be the object of future work. Secondly, an MF SMOGA RDO is carried out, and the results are compared with single fidelity RDO. For the single fidelity surrogates, an initial LHS DOE of 10 × n des = 40 UQ runs was carried out prior to the first generation. Then, the first population was randomly initialized in the design space, and the GA was allowed to evolve over 80 generations. MOEI infills were conducted every five generations. For the MF surrogate, the initial DOE contained 40 LF continuous MoM and 10 HF discrete GEK UQ runs. Low-fidelity MOEI infills were added every five generations except for the last infills, for which the tolerance criterion suggested an HF infill. The evolution of the SMOGA search in the objective space E[J] vs. var[J] is depicted in Figure 5 for various RDO optimizers: for each strategy, designs belonging to the first generation, the generations after the first and the second MOEI infill, and the last one are highlighted in red, while all other designs are depicted in grey. For the present inverse design problem, the Pareto front collapses onto a single point corresponding to simultaneous minimization of both objectives E[J] and var [J]. The progression toward the optimal point is similar for all UQ techniques and optimization surrogates. For more clarity, Figure 6 shows the locations of selected GA populations in the objective space. For each technique, four subplots are provided showing in red all points calculated respectively at the SMOGA first generation, at the generation after the first MOEI infill, at the generation after the second MOEI infill, and at the last generation. Moreover, in grey, as a background for each subplot, the reader can find all points calculated with each technique during the SMOGA optimization. It appears that kriging, GEK, the MoM, and MF evolve similarly, as they quickly reach the area near the optimum after only two MOEI infills.  The optimal solutions calculated with each technique are reported in Table 4. Despite the different accuracy of the UQ techniques in use, all RDO loops converge to very similar solutions identified through the expected values of the four design parameters [a, b, c, d], which differ only on the sixth decimal digit or less for BK, GEK, and MF and on the fifth decimal digit for the MoM. The table also displays the values of the objective functions computed on the surrogate models. BK-BK, GEK-BK, and MF surrogates lead to close estimates of the objective functions, with discrepancies of less than 1% on the mean and less than 2% for the variance. The CPU cost of the different RDO algorithms on a single processor personal computer are reported in the same table. Due to the reduced number of samples used in the UQ runs, the GEK algorithm allows gaining a factor of two to 2.5 over the BK. A further reduction of a factor of 15 or larger is obtained using the MoM, but caution must be taken in systematically preferring such a method for RDO problems, because of its lower accuracy. Finally, the MF SMOGA RDO has a computational time eight times smaller than BK and four times smaller than GEK while ensuring similar accuracy to BK SMOGA. As a further verification of the RDO results, the optimal design is recomputed using MC sampling, resulting in E[J] = 0.32325 and var[J] = 0.09487, which is in rather good agreement with the SMOGA estimates, appearing to be slightly over-optimistic in predicting the objective functions. Indeed, the full pdf of the QoI computed by propagating the uncertain parameters through the CFD solution for the optimal nozzle geometry by using the BK and GEK methods (shown in Figure 7 alongside the MC distribution) exhibits moderate deviations from the MC one. Globally, the optimal distributions are much closer to zero on average, and they exhibit a smaller variance than the baseline geometry investigated in the preceding section, showing that the RDO effectively improves both criteria.  Table 4.
Finally, the optimized geometry is depicted for all of the employed methodologies in Figure 8, and it is compared with the baseline one. The plot is obtained by propagating the pdf of the geometric coefficients through Equation (2), with the mean equal to the optimal RDO values and CoV = 0.01 as prescribed in Table 4, by using MC sampling. Afterwards, the average geometries and the 3σ confidence intervals are computed. In the figure, average geometries for all methods are essentially superposed to within plotting accuracy, and the differences are largely smaller than the geometric uncertainty.

Conclusions
In the present work, we first assessed various gradient-based methods for uncertainty quantifications, in view of their use for the robust design optimization (RDO) of CFD problems. Specifically, the capability of a Bayesian gradient-enhanced kriging surrogate model and a first-order method of moments to accurately and efficiently compute the lower order statistical moments of a quantity of interest was evaluated for an inexpensive test problem, representative of a supersonic divergent nozzle, for which the results can be compared with well-converged Monte Carlo sampling. The results show that GEK allows computational gains of a factor of two or more with respect to a Bayesian kriging surrogate not using gradient information, when the gradients are efficiently computed using an adjoint method. In the present work, both a discrete and a continuous adjoint method were used for building GEK surrogates. The first one provides more accurate results, but it is somewhat more costly and requires intrusive automatic derivation of the CFD code, which is not possible if, for instance, a commercially available CFD solver is to be employed. The continuous adjoint method allows developing a non-intrusive adjoint solver, but it is less accurate due to inconsistencies in the numerical treatment of the direct and adjoint equations. In the present implementation, the continuous adjoint solver benefits from a direct solution of the linear system of adjoint equations, and it is therefore quicker than the discrete adjoint solver, which converges by means of a pseudo-transient technique.
The first-order moment method, based on either discrete or continuous adjoint gradients, is less accurate than GEK, but it still provides satisfactory estimates of the QoI for the present shocked flow problem, due to the relatively small variation ranges of the uncertain parameters.
The UQ methods are then combined with a genetic algorithm for solving the RDO problem. The computations are sped up by constructing a Bayesian kriging surrogate model of the design space. The surrogate is enriched during GA iterations by means of a multi-objective expected improvement (MOEI) infill criterion. For the test problem at hand, the RDO results are found to be similar for the BK, GEK, and MoM methods, the latter being much less expensive in terms of CPU time, but slightly less accurate than the former ones. In order to benefit from the computational efficiency of the MoM and the accuracy of the GEK UQ solvers at the same time, a multi-fidelity surrogate model is built by fusing together the low-fidelity MoM and the high-fidelity GEK. An MOEI infill is used again to enrich the surrogate during convergence, with preference for low-fidelity infills. The multi-fidelity approach successfully identifies the RDO optimum, while dividing by a factor of 3 ÷ 4 the computational cost with respect to GEK. Such an approach is then identified as a promising candidate for more complex RDO problems using CFD models. Further work is however required to assess its effectiveness for more realistic and complex CFD problems. For this aim, its application to the RDO of the stator row of an organic Rankine cycle turbine is underway and will be reported in the near future. In such a context, the introduction of multi-fidelity UQ solvers combining BK and/or GEK based on different numbers of samples could be an interesting future development for further speed up of the RDO procedure.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Calculation of the Gradient from CFD Codes
As GEK and the MoM both require as input also the gradient of the QoIs with respect to the uncertain parameters, it is worth providing a quick overview of the methodologies used to calculate the derivatives of the quantities from the CFD code employed.
In general, the gradient evaluation problem may be stated as follows: J is a vector of objective functions and constraints of dimension N j (Equation (A1)) that must comply with a set R of governing equations (Equation (A2)). where: • α is a vector of the control/design variables of dimension N α , which parametrizes the problem.
• ω is a vector of state variables depending on α with dimension N ω .
In this work, J is the objective function defined in Equation (3); the governing equations R correspond to Euler equations for quasi-1D flow Equation (4); ω are the conservative variables [ρ, ρv, ρe t ]; and α is the vector of the design parameters parametrizing the geometry [a, b, c, d] that should be optimized.
The final aim of the problem is to evaluate the gradient of the objective functions and constraints J expressed in Equation (A1) with respect to the design variables α, as defined in Equation (A3).
In the present work, the continuous adjoint and discrete adjoint were used to compute the gradient in Equation (A3); hereafter, in the following sections, the adopted formulation of both of these techniques is provided.
For the discrete adjoint approach, both J and R must be considered in discrete form. Therefore, as the differential form of the governing equation R Equation (A2) is equal to zero, it is possible to plug it in the gradient equation Equation (A3), obtaining Equation (A4).
where Ψ is an arbitrary line vector with dimension N ω containing the discrete adjoint variables [73]. Equation (A4) can be arranged to obtain Equation (A5).
The first term of Equation (A5) can be eliminated by choosing the values for the components of vector Ψ to satisfy the adjoint Equation (A6).
Thus, gradient dJ dα can be easily calculated as in Equation (A7).
In the present work, the direct and the discrete adjoint codes presented in [17] were used; the latter was implemented by means of the algorithmic differentiation software Tapenade [74].

Appendix A.2. Continuous Adjoint
For the continuous adjoint, the adjoint equations with respect to J are derived from the continuous form of the governing equations R; afterwards, they are discretized and solved.
Considering the quasi-1D nozzle geometry described in Section 2.1 and the definition of J provided in Equation (3), one can write the differential form of J as in Equation (A8).
∂J ∂α = L 0 (P − P des ) δP δα dx (A8) As the differential form of the governing equation R Equation (A2) is equal to zero, it is possible to multiply it by a vector θ T , then integrate it over the domain, and finally, subtract it from the variation of the cost function dJ, obtaining the relation in Equation (A9).
where θ T is an arbitrary line vector with dimension N ω containing the continuous adjoint variables.
As the objective is to obtain the gradient dJ dα , Equation (A9) can be differentiated, and with the assumption that θ T is also differentiable, it is possible to integrate the integrals containing θ T by parts, obtaining the formulation as in Equation (A10).
where K, S, and f were already defined in Equation (4). At this point, since θ T is an arbitrary differentiable function, it can be chosen in order to remove the dependence of dJ from the variation of the state vector dω, obtaining the differential adjoint problem defined in Equation (A11).  Once θ T has been determined, the variation of the cost function dJ with respect to the design parameters α can be easily calculated by considering just the variation of dS, as stated in Equation (A14).
In the present work, the system of linear equations defined in Equation (A12) is discretized by a cell-centered finite volume formulation, using Rusanov's first-order scheme for space integration, while the boundary conditions in Equation (A13) with a second-order backward finite differences scheme; this set of equation was solved with Gaussian elimination with partial pivoting.

Appendix A.3. Discrete and Continuous Adjoint Validation
In order to validate both the discrete and continuous adjoint, a comparison with gradients calculated with finite differences was performed for the baseline geometry defined in Table 2. For finite differences, a second-order central scheme was employed with a discretization step of 10 −5 .
In Table A1, we compare the gradients from the finite differences method vs. the one calculated with the discrete adjoint, while Table A2 presents the comparison with the continuous adjoint.