Optimal ANOVA-Based Emulators of Models With(out) Derivatives

Lamboni, Matieyendou

doi:10.3390/stats8010024

by

Matieyendou Lamboni

^1,2

¹

Department DFR-ST, University of Guyane, Cayenne 97346, French Guiana

²

228-UMR Espace-Dev, University of Guyane, University of Réunion, IRD, University of Montpellier, 34090 Montpellier, France

Stats 2025, 8(1), 24; https://doi.org/10.3390/stats8010024

Submission received: 31 January 2025 / Revised: 1 March 2025 / Accepted: 12 March 2025 / Published: 17 March 2025

(This article belongs to the Section Statistical Methods)

Abstract

This paper proposes new ANOVA-based approximations of functions and emulators of high-dimensional models using either available derivatives or local stochastic evaluations of such models. Our approach makes use of sensitivity indices to design adequate structures of emulators. For high-dimensional models with available derivatives, our derivative-based emulators reach dimension-free mean squared errors (MSEs) and a parametric rate of convergence (i.e.,

O (N^{- 1})

). This approach is extended to cope with every model (without available derivatives) by deriving global emulators that account for the local properties of models or simulators. Such generic emulators enjoy dimension-free biases, parametric rates of convergence, and MSEs that depend on the dimensionality. Dimension-free MSEs are obtained for high-dimensional models with particular distributions of the inputs. Our emulators are also competitive in dealing with different distributions of the input variables and in selecting inputs and interactions. Simulations show the efficiency of our approach.

Keywords:

derivative-based ANOVA; emulators; high-dimensional models; independent input variables; optimal estimators

PACS:

62J10; 62L20; 62Fxx; 49Q12; 26D10

1. Introduction

Derivatives are sometimes available in modeling, either because of the nature of the observations of the phenomena of interest ([1,2] and the references therein) or through low-cost evaluations of exact derivatives for some classes of PDE/ODE-based models, thanks to adjoint methods [3,4,5,6,7,8]. Some models are even defined via their rates of change with respect to their inputs; implicit functions (defined via their derivatives) are instances. Additionally, and particularly for complex models or simulators, efficient estimators of gradients and second-order derivatives using stochastic approximations are provided in refs. [9,10,11,12,13]. Being able to reconstruct functions using such derivatives is worth investigating [14], as having a practical, fast-evaluated model that links stochastic parameters and/or stochastic initial conditions to the output of interest of PDE/ODE models remains a challenge due to numerous uncertain inputs [15].

Moreover, the first-order derivatives of models are used to quickly select non-relevant input variables of simulators or functions, leading to effective screening measures. Efficient variance-based screening methods rely on the upper bounds of generalized sensitivity indices, including Sobol’s indices (see refs. [16,17,18,19,20,21,22] for independent inputs and [23] for non-independent variables).

For high-dimensional simulators, dimension reductions via screening input variables are often performed before building their emulators using Gaussian processes (kriging models) [24,25,26,27,28], polynomial chaos expansions [29,30,31], SS-ANOVA [32,33], or other machine learning approaches [34,35]. Indeed, such emulators rely on nonparametric or semi-parametric regressions and struggle to reconstruct simulators for a moderate to large number of inputs (e.g., [35]). Often, nonparametric rates of convergence are achieved by such emulators (see ref. [33] for SS-ANOVA and [36,37,38] for polynomial chaos expansions). Regarding the stability and accuracy of polynomial chaos expansions, the number of model runs needed was first estimated at the square of the dimension of the basis used [36] and then reduced to that dimension up to a logarithmic factor [37,38]. Note that for d inputs, such a dimension is about

{(w + 1)}^{d}

for the tensor-product basis, including, at most, the monomial of degree w.

For models with a moderate to large number of relevant inputs, Bayesian additive regression trees have been used for building emulators of such models using only the input and output observations (see ref. [35] and the references therein). Such approaches rely to some extent on rule-ensemble learning, constructing homogeneous or local base learners (see ref. [34] and references therein). Combining the model outputs and model derivatives can help to build emulators that account for both local and global properties of simulators. For instance, including derivatives in the Gaussian processes (considered in ref. [2]) allows for improving such emulators. Emulators based on Taylor series (see ref. [14]) combine both the model outputs and derivative outputs with interesting theoretical results, such as dimension-free rates of convergence. However, concrete constructions of such emulators are not provided in that paper.

Note that the aforementioned emulators are part of the class of global approximations of functions. While global emulators can be used to approximate models at any point in the entire domain, local or point-based emulators require building different emulators for the same model. Conceptually, the main issues related to such practical emulators are the truncation errors and the biases due to epistemic uncertainty. Indeed, none of the above emulators rely on exact and finite expansions of functions in general. Thus, additional assumptions about the order of such errors are necessary to derive the rates of convergence of emulators. For instance, decreasing eigenvalues of kernels are assumed in kernel-based learning approaches (see, e.g., ref. [39]).

So far, the recent derivative-based (Db) ANOVA provides exact expansions of smooth functions for different types of input distributions, using the model outputs together with the first-order and second-order derivatives, up to the

d^{th}

-order cross-partial derivatives [22]. It was used in ref. [13] to derive the plug-in and crude emulators of models by replacing the unknown derivatives with their estimates. However, no convergence analysis of such emulators is available, and derivative-free methods are much more convenient for applications in which the computations of cross-partial derivatives are too expensive or impossible [9,10,11].

Therefore, for high-dimensional simulators, whether or not all the inputs are important, it is worth investigating the development of emulators based directly on available derivatives or derivative-free methods. The contribution of this paper is threefold:

We designed adequate structures of emulators based on information gathered from global and derivative-based sensitivity analysis, such as unbiased orders of truncations and the selection of relevant ANOVA components (inputs and interactions);
We constructed derivative-based or derivative-free global emulators that are easy to fit and compute and can cope with every distribution of continuous input variables;
We examined the convergence analysis of our emulators, with a particular focus on the (i) dimension-free upper bounds of the biases and MSEs; (ii) the parametric rates of convergence (i.e., $O (N^{- 1})$ ); and (iii) the number of model runs needed to obtain the stability and accuracy of our estimations.

In this paper, flexible emulators of complex models or simulators and approximations of functions are proposed using exact and finite expansions of functions involving cross-partial derivatives, known as Db-ANOVA. Section 2 first deals with the general formulation of Db-ANOVA using different continuous distribution functions and emulators of models with available derivatives. Such emulators reach the parametric rates of convergence, and their MSEs do not depend on dimensionality (i.e., d). Second, adequate structures of emulators are investigated using Sobol’s indices and their upper bounds, as the components of such emulators are interpretable as the main effects and interactions of a given order. The orders of unbiased truncations have been derived, leading to the possibility of selecting the ANOVA components that were included in our emulators.

For non-highly smooth models and for high-dimensional simulators for which the computations of derivatives are too expensive or impossible, new and efficient emulators of simulators are developed in Section 3, together with their statistical properties. First, we provide such emulators under the assumption of quasi-uniform distributions of inputs so as to (i) obtain practical conditions for using such emulators and (ii) derive emulators that enjoy dimension-free MSEs for particular distributions of inputs. Second, such an assumption is removed to cope with every distribution of inputs. The proposed emulators have dimension-free biases and reach the parametric rate of convergence as well. Numerical illustrations (see Section 4) and an application to a heat diffusion PDE model (see Section 5) are considered to show the efficiency of our estimators, and we conclude this work in Section 6.

General Notation

For an integer,

d > 0

, denote with

X : = (X_{1}, \dots, X_{d})

a random vector of d independent and continuous variables with marginal cumulative distribution functions (CDFs),

F_{j}

, and probability density functions (PDFs),

ρ_{j}, j = 1, \dots, d

.

For a non-empty subset,

u \subseteq {1, \dots, d}

,

| u |

stands for its cardinality, and

(\sim u) : = {1, \dots, d} ∖ u

. Additionally,

X_{u} : = (X_{j}, \forall j \in u)

denotes a subset of such variables, and the partition

X = (X_{u}, X_{\sim u})

holds. Finally, we use

{||\cdot||}_{2}

for the Euclidean norm,

E [\cdot]

for the expectation operator, and

V [\cdot]

for the variance operator.

2. New Insight into Derivative-Based ANOVA and Emulators

Given an integer,

n > 0

, and an open set,

Ω \subseteq R^{d}

, consider a weakly partially differentiable function,

f : Ω \to R^{n}

[40,41], and a subset,

v \subseteq {1, \dots, d}

, with

| v | > 0

. Denote with

D^{| v |} f : = (\prod_{k \in v} \frac{\partial}{\partial x_{k}}) f

the

{| v |}^{th}

weak cross-partial derivatives of each component of f with respect to each

x_{k}

with

k \in v

and

L^{2} (Ω) : = \{f : Ω \to R^{n} : E [{||f (X)||}_{2}^{2}] < + \infty\}

, i.e., the Hilbert space of functions. Consider the following Hilbert–Sobolev space:

W^{d, 2} : = \{f \in L^{2} (Ω) : D^{| v |} f \in L^{2} (Ω); \forall | v | \leq d\} .

In what follows, assume the following:

Assumption A1.

X

is a random vector of independent and continuous variables, supported on an open domain, Ω.

Assumption A2.

f (\cdot)

is a deterministic function with

f (\cdot) \in W^{d, 2}

.

2.1. Full Derivative-Based Emulators

Under Assumption 2, every sufficiently smooth function,

f (\cdot)

, admits the derivative-based ANOVA (Db-ANOVA) expansion (see refs. [13,22]), that is,

\forall x \in Ω

,

f (x) = E_{X^{'}} [f (X^{'})] + \sum_{\begin{matrix} v, v \subseteq {1, \dots, d} \\ | v | > 0 \end{matrix}} E_{X^{'}} [D^{| v |} f (X^{'}) \prod_{k \in v} \frac{G_{k} (X_{k}^{'}) - 𝟙_{[X_{k}^{'} \geq x_{k}]}}{g_{k} (X_{k}^{'})}],

(1)

where

X^{'} : = (X_{1}^{'}, \dots, X_{d}^{'})

is a random vector of independent variables, having the CDFs

X_{j}^{'} \sim G_{j}

and the PDFs

g_{j} : = \frac{d G_{j}}{d x_{j}^{'}}

. Evaluating

f (\cdot)

at the random vector,

X

, and taking

G_{j} = F_{j}

, yields the unique and orthogonal Db-ANOVA decomposition of

f (X)

, that is,

f (X) = E [f (X^{'})] + \sum_{\begin{matrix} v \subseteq {1, \dots, d} \\ | v | > 0 \end{matrix}} f_{v} (X_{v}); f_{v} (X_{v}) : = E_{X^{'}} [D^{| v |} f (X^{'}) \prod_{k \in v} \frac{F_{k} (X_{k}^{'}) - 𝟙_{[X_{k}^{'} \geq X_{k}]}}{ρ_{k} (X_{k}^{'})}] .

(2)

When analytical cross-partial derivatives are available, or the derivative datasets are observed (see refs. [1,2] and the references therein), we can derive emulators of time-consuming complex models using the method of moments. Indeed, given a sample,

{\{X_{i}^{'}\}}_{i = 1}^{N} : = {\{(X_{i, 1}^{'}, \dots, X_{i, d}^{'})\}}_{i = 1}^{N}

, from

X^{'}

, and the associated sample of the (analytical or observed) derivatives’ outputs, that is,

({\{D^{| v |} f (X_{i}^{'})\}}_{i = 1}^{N}, \forall v \subseteq {1, \dots, d}),

the consistent (full) emulator or estimator of f at any sample point,

x

, of

X

is

\hat{f_{N}} (x) : = \frac{1}{N} \sum_{i = 1}^{N} \sum_{\begin{matrix} v, v \subseteq {1, \dots, d} \\ | v | \geq 0 \end{matrix}} D^{| v |} f (X_{i}^{'}) \prod_{k \in v} \frac{F_{k} (X_{i, k}^{'}) - 𝟙_{[X_{i, k}^{'} \geq x_{k}]}}{ρ_{k} (X_{i, k}^{'})},

(3)

with

D^{| \emptyset |} f = f

and

\prod_{k \in \emptyset} c_{k} : = 1

for every real

c_{k}

by convention. We can check that

\hat{f_{N}} (x)

is an unbiased estimator and that it reaches the parametric mean squared error (MSE) rate of convergence, that is,

E [{(\hat{f_{N}} (x) - f (x))}^{2}] = O (N^{- 1})

. This rate is dimension-free, provided that

V [\sum_{\begin{matrix} v, v \subseteq {1, \dots, d} \\ | v | \geq 0 \end{matrix}} D^{| v |} f (X_{1}^{'}) \prod_{k \in v} \frac{F_{k} (X_{1, k}^{'}) - 𝟙_{[X_{1, k}^{'} \geq x_{k}]}}{ρ_{k} (X_{1, k}^{'})}] < + \infty

.
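As a minimal sketch of Equation (3), assume d = 2 and independent inputs uniformly distributed on (0, 1), so that F_k(t) = t and ρ_k(t) = 1. The test function f(x) = x_1 x_2 + x_1, the helper name `db_anova_emulator`, and the dictionary of derivatives are illustrative choices that do not come from the paper.

```python
import itertools
import numpy as np

def db_anova_emulator(x, sample, derivs):
    """Method-of-moments estimate of f(x) following Eq. (3).

    Assumes X'_k ~ U(0, 1), hence F_k(t) = t and rho_k(t) = 1.
    `derivs` maps each subset v (a tuple of indices) to a function
    returning D^{|v|} f evaluated on the rows of `sample`.
    """
    n, d = sample.shape
    # weight_{i,k} = (F_k(X'_{i,k}) - 1[X'_{i,k} >= x_k]) / rho_k(X'_{i,k})
    weights = sample - (sample >= x).astype(float)
    total = np.zeros(n)
    for size in range(d + 1):
        for v in itertools.combinations(range(d), size):
            cols = np.array(v, dtype=int)
            # The empty product (v = empty set) equals 1 by convention.
            total += derivs[v](sample) * np.prod(weights[:, cols], axis=1)
    return total.mean()

# Illustrative function f(x) = x1 * x2 + x1 and its cross-partial derivatives.
derivs = {
    (): lambda s: s[:, 0] * s[:, 1] + s[:, 0],
    (0,): lambda s: s[:, 1] + 1.0,
    (1,): lambda s: s[:, 0],
    (0, 1): lambda s: np.ones(len(s)),
}
rng = np.random.default_rng(0)
sample = rng.random((100_000, 2))
x = np.array([0.3, 0.7])
estimate = db_anova_emulator(x, sample, derivs)
print(estimate, 0.3 * 0.7 + 0.3)  # emulator value next to the exact value
```

The same sample of derivative outputs is reused for every evaluation point x; only the indicator weights change, which is what makes the emulator cheap to evaluate once the derivatives are stored.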

For complex models without cross-partial derivatives, optimal estimators of such derivatives (i.e.,

\hat{D^{| v |} f}

) have been used to construct the plug-in consistent emulator of f [13]. Such an emulator is given by

\hat{f_{N, p}} (x) : = \frac{1}{N} \sum_{i = 1}^{N} \sum_{\begin{matrix} v, v \subseteq {1, \dots, d} \\ | v | \geq 0 \end{matrix}} \hat{D^{| v |} f} (X_{i}^{'}) \prod_{k \in v} \frac{F_{k} (X_{i, k}^{'}) - 𝟙_{[X_{i, k}^{'} \geq x_{k}]}}{ρ_{k} (X_{i, k}^{'})} .

While the estimator,

\hat{D^{| v |} f}

, provided in ref. [13] has a dimension-free upper bound for the bias and reaches the parametric rate of convergence, its MSE increases with

d^{| v |}

, showing the necessity of using the number of model runs,

N \propto d^{| v |}

, to expect a significant reduction in the MSE for higher-order cross-partial derivatives.

2.2. Adequate Structures of Emulators and Truncations

In high-dimensional settings, it is common to expect a reduction in the dimensionality before building emulators. The use of truncations is common practice in the polynomial approximation of functions [29,32,33,37] and in ANOVA-practicing communities [17,42,43,44], leading to truncation errors. When using Db-ANOVA, controlling such errors is made possible by the information gathered from global sensitivity analysis [13,17]. Indeed, the variances of the terms in the Db-ANOVA expansion of a function are exactly the main-effect and interaction Sobol’s indices, up to a normalizing constant, when

G_{j} = F_{j}

. Thus, we are able to avoid non-relevant terms in our emulators according to the values of Sobol’s indices, suggesting that adequate truncations will not have any impact on the MSEs and the above parametric rate of convergence. For the sake of simplicity,

n = 1

is considered in what follows.

Definition 1.

Consider an integer,

d_{0} \in {1, \dots, d}

, and the full Db-ANOVA given by (2). The truncated Db-ANOVA of f (in the superposition sense [42,43,44,45]) of order

d_{0}

is given by

f_{T, d_{0}} (X) : = E [f (X^{'})] + \sum_{\begin{matrix} v \subseteq {1, \dots, d} \\ 0 < | v | \leq d_{0} \end{matrix}} f_{v} (X_{v}) .

While

f_{T, d_{0}}

is an approximation of f in general, the equality holds for some functions. Given an integer,

α \geq 0

, consider the space of functions:

L_{α, 0} : = \{f : R^{d} \to R^{n} : |f (x) - \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | \leq α \end{matrix}} E_{X^{'}} [D^{| w |} f (X^{'}) \prod_{k \in w} \frac{G_{k} (X_{k}^{'}) - 𝟙_{X_{k}^{'} \geq x_{k}}}{g_{k} (X_{k}^{'})}]| = 0\} .

We can see that

L_{α, 0}

is a class of functions having, at most,

α

-order interactions. Moreover,

L_{α, 0}

contains the class of functions given by

D^{| {v, j_{0}} |} f = 0, \forall v \subseteq {1, \dots, d}, j_{0} \in (\sim v), | v | = α .

It is clear, then, that we have

f (X) = f_{T, d_{0}} (X)

when

f \in L_{α = d_{0}, 0}

.

Definition 2.

A truncation of f given by

f_{T, d_{0}}

is said to be an unbiased one whenever

f (X) = f_{T, d_{0}} (X)

.

Thus,

L_{α, 0}

is the class of unbiased truncations of f up to the order of

d_{0} = α

.

Formally, based on available derivatives, we are able to derive unbiased truncations for some classes of functions. Consider the following Db expressions of Sobol’s indices of the input variables

X_{j}, j = 1, \dots, d

and their upper bounds (see ref. [22]):

S_{j} : = \frac{1}{V [f (X)]} E [\frac{\partial f}{\partial x_{j}} (X) \frac{\partial f}{\partial x_{j}} (X^{'}) \frac{F_{j} (min (X_{j}, X_{j}^{'})) - F_{j} (X_{j}) F_{j} (X_{j}^{'})}{ρ_{j} (X_{j}) ρ_{j} (X_{j}^{'})}],

S_{T_{j}} : = \frac{1}{V [f (X)]} E [\frac{\partial f}{\partial x_{j}} (X) \frac{\partial f}{\partial x_{j}} (X_{j}^{'}, X_{\sim j}) \frac{F_{j} (min (X_{j}, X_{j}^{'})) - F_{j} (X_{j}) F_{j} (X_{j}^{'})}{ρ_{j} (X_{j}) ρ_{j} (X_{j}^{'})}],

and

S_{j} \leq S_{T_{j}} \leq U B_{j} : = \frac{1}{2 V [f (X)]} E [{(\frac{\partial f}{\partial x_{j}} (X))}^{2} \frac{F_{j} (X_{j}) (1 - F_{j} (X_{j}))}{{[ρ_{j} (X_{j})]}^{2}}] .

Note that the computations of

S_{j}

and

U B_{j}

are straightforward, using given first-order derivatives (see ref. [13]), whereas those of

S_{T_{j}}

require using (i) nonparametric methods for given derivatives or (ii) derivatives of specific input values. Using such indices, adequate structures of

f (\cdot)

can be constructed. For instance, it is known that when

\sum_{j = 1}^{d} S_{j} = 1

, f is an additive function of the form

f (x) = \sum_{j = 1}^{d} h_{j} (x_{j})

, with

h_{j}

being a real-valued function. Thus, a truncation (in the superposition sense) of order

d_{0} = 1

is an unbiased truncation. Other values of

d_{0}

are given in Proposition 1.

Proposition 1.

Consider the main and total indices given by

S_{j}, S_{T_{j}}

with

j = 1, \dots, d

. Then,

$\sum_{j = 1}^{d} (S_{j} + S_{T_{j}}) = 2$ implies that using $d_{0} = 2$ leads to unbiased truncations;
$\sum_{j = 1}^{d} (2 S_{j} + S_{T_{j}}) \in] 2, 3]$ implies that using $d_{0} = 3$ leads to unbiased truncations;
If there exists an integer, α, such that $(α - 1) \sum_{j = 1}^{d} S_{T_{j}} + \sum_{j = 1}^{d} S_{j} \geq α$ and $\sum_{j = 1}^{d} S_{T_{j}} + (α - 1) \sum_{j = 1}^{d} S_{j} \leq α$ , then $d_{0} = α$ leads to unbiased truncations.

Proof.

See Appendix A. □
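The first item of Proposition 1 can be checked with a short calculation, sketched here under the standard variance decomposition for independent inputs. The notation S_v for the Sobol index of the component f_v and A_k for the sum of all indices of order k is introduced only for this illustration; the paper's own proof is the one in Appendix A.

```latex
A_k := \sum_{|v| = k} S_v \geq 0, \qquad
\sum_{k=1}^{d} A_k = 1, \qquad
\sum_{j=1}^{d} S_j = A_1, \qquad
\sum_{j=1}^{d} S_{T_j} = \sum_{v} |v| \, S_v = \sum_{k=1}^{d} k A_k ,

\sum_{j=1}^{d} (S_j + S_{T_j})
  = 2 A_1 + \sum_{k \geq 2} k A_k
  = 2 \sum_{k=1}^{d} A_k + \sum_{k \geq 3} (k - 2) A_k
  = 2 + \sum_{k \geq 3} (k - 2) A_k .
```

Since every A_k is non-negative, the sum equals 2 only if A_k = 0 for all k ≥ 3, that is, only main effects and second-order interactions contribute to the variance, which is consistent with d_0 = 2.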

In general, if

D^{| {v, j_{0}} |} f = 0, \forall v \subseteq {1, \dots, d}, j_{0} \in (\sim v), | v | = α

, then

d_{0} = α

leads to unbiased truncations. Often, low-order derivatives are available or can be efficiently computed using fewer model runs, leading to truncated emulators in the superposition sense. It is worth noting that our emulators still enjoy the above parametric rate of convergence for unbiased truncations. In the presence of truncation errors, it is usually difficult to derive the rates of convergence without additional assumptions about the order of such errors.

In addition to such truncations in the superposition sense, screening measures allow for quickly identifying non-relevant inputs (i.e.,

U B_{j} \approx 0

), leading to possible dimension reductions. For instance, we can see the following:

$S_{j} = S_{T_{j}}$ or $S_{j} \approx U B_{j}$ implies removing all cross-partial derivatives or interactions involving $X_{j}$ ;
$S_{j} = 0$ and $U B_{j} \geq S_{T_{j}} ≫ 0, \forall j \in {1, \dots, d}$ suggest removing the first-order terms, corresponding to $d_{0} = 1$ ;
$1 - (S_{i} + S_{T_{j}}) \leq S_{T_{\sim {i, j}}} = 0$ or, equivalently, $S_{i} + S_{T_{j}} = 1$ and $S_{T_{k}} = 0, \forall k \notin {i, j}$ implies keeping only $X_{i}$ and $X_{j}$ .
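The screening rules above compare S_j with UB_j, both of which only need first-order derivatives. The following sketch estimates them by plain Monte Carlo from the Db expressions of Section 2.2, assuming inputs uniformly distributed on (0, 1) (so F_j(t) = t and ρ_j = 1). The function name `sobol_main_and_bound`, the linear test function f(x) = x_1 + 2 x_2, and the use of its known variance 5/12 are illustrative assumptions.

```python
import numpy as np

def sobol_main_and_bound(grad_j, x, x_prime, j, var_f):
    """Monte Carlo estimates of S_j and UB_j, assuming X_j ~ U(0, 1).

    `grad_j` returns the partial derivative of f with respect to x_j on the
    rows of its argument; `x` and `x_prime` are independent samples.
    """
    a, b = x[:, j], x_prime[:, j]
    # F_j(min(X_j, X'_j)) - F_j(X_j) F_j(X'_j), divided by rho_j rho_j = 1
    kernel = np.minimum(a, b) - a * b
    s_j = np.mean(grad_j(x) * grad_j(x_prime) * kernel) / var_f
    # UB_j = E[(df/dx_j)^2 F_j (1 - F_j) / rho_j^2] / (2 V[f])
    ub_j = np.mean(grad_j(x) ** 2 * a * (1.0 - a)) / (2.0 * var_f)
    return s_j, ub_j

# Illustrative additive function f(x) = x1 + 2 * x2 with V[f] = 5 / 12.
rng = np.random.default_rng(1)
x = rng.random((100_000, 2))
x_prime = rng.random((100_000, 2))
grad_0 = lambda s: np.ones(len(s))
s_0, ub_0 = sobol_main_and_bound(grad_0, x, x_prime, 0, 5.0 / 12.0)
print(s_0, ub_0)
```

For this additive example, the two estimates should be close to each other, which matches the first rule of the list: no interaction involving X_1 needs to be kept.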

For non-highly smooth functions and the models for which the computations of derivatives are too expensive or impossible, derivative-free methods combined with unbiased truncations remain an interesting framework.

3. Derivative-Free Emulators of Models

This section covers the development of the model emulators, even when all the inputs are important according to screening measures.

3.1. Stochastic Surrogates of Functions Using Db-ANOVA

Consider integers

L > 0, q > 0

;

β_{ℓ} \in R

with

ℓ = 1, \dots, L

;

h : = (h_{1}, \dots, h_{d}) \in R_{+}^{d}

, and denote with

V : = (V_{1}, \dots, V_{d})

a d-dimensional random vector of independent variables satisfying:

\forall j \in {1, \dots, d}

,

E [V_{j}] = 0; E [{(V_{j})}^{2}] = σ^{2}; E [{(V_{j})}^{2 q + 1}] = 0; E [{(V_{j})}^{2 q}] < + \infty .

Any random variable that is symmetric about zero is an instance of

V_{j}

s. Additionally, denote

β_{ℓ} h V : = (β_{ℓ} h_{1} V_{1}; \dots, β_{ℓ} h_{d} V_{d})

. For concise reporting of the results, elementary symmetric polynomials (ESPs) were used (see, e.g., [46,47]).

Definition 3.

Given

u \subseteq {1, \dots, d}

with

| u | > 0

and

r_{u} : = (r_{k}, \forall k \in u) \in R^{| u |}

, the

p^{th}

ESP of

r_{u}

is defined as follows:

e_{p}^{(u)} (r_{u}) : = \{\begin{matrix} 0 & \text{if } p > | u | \text{ or } p < 0 \\ 1 & \text{if } p = 0 \\ \sum_{\begin{matrix} w \subseteq u \\ | w | = p \end{matrix}} \prod_{k \in w} r_{k} & \text{if } p = 1, \dots, | u | \end{matrix} .
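Definition 3 sums over all subsets of size p, but the ESPs can be computed without enumerating subsets. The sketch below uses the classical recurrence obtained by adding one value at a time; the function name `elementary_symmetric` is an illustrative choice.

```python
def elementary_symmetric(values, p):
    """Return the p-th elementary symmetric polynomial of `values`."""
    n = len(values)
    if p < 0 or p > n:
        return 0.0
    # e[k] holds the k-th ESP of the values processed so far; e[0] = 1.
    e = [1.0] + [0.0] * p
    for r in values:
        # Update from high to low order so each value is used at most once.
        for k in range(p, 0, -1):
            e[k] += r * e[k - 1]
    return e[p]

# For r = (1, 2, 3): e_1 = 1 + 2 + 3, e_2 = 1*2 + 1*3 + 2*3, e_3 = 1*2*3.
print([elementary_symmetric([1.0, 2.0, 3.0], p) for p in range(5)])
```

The cost is of order p times the number of values, instead of the number of subsets of size p.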

Note that

r : = r_{{1, \dots, d}} = (r_{1}, \dots, r_{d}) \in R^{d}

,

e_{p}^{(1 : d)} (r) : = e_{p}^{({1, \dots, d})} (r)

. In addition, given

X_{j}^{'} \sim G_{j}, \forall j \in {1, \dots, d}

, define

R_{k} (x_{k}, X_{k}^{'}, V_{k}) : = \frac{G_{k} (X_{k}^{'}) - 𝟙_{[X_{k}^{'} \geq x_{k}]}}{g_{k} (X_{k}^{'}) h_{k} σ^{2}} V_{k}, k = 1, \dots, d;

R_{u} (x_{u}, X_{u}^{'}, V_{u}) : = (R_{k} (x_{k}, X_{k}^{'}, V_{k}), \forall k \in u); R (x, X^{'}, V) : = R_{{1, \dots, d}} (x, X^{'}, V) .

Without loss of generality, we are going to focus on the modified output, that is,

f^{c} (x) : = f (x) - E [f (X^{'})]

. Based on the above framework, Theorem 1 provides a new approximation of every function or surrogate of a deterministic simulator.

Theorem 1.

Consider distinct

β_{ℓ}

s. If f is smooth enough and Assumption 1 holds, then there exists

α_{d} \in {1, \dots, L}

and real coefficients

C_{1}^{(p)}, \dots, C_{L}^{(p)}, p = 1, \dots, d

such that

f^{c} (x) = \sum_{ℓ = 1}^{L} \sum_{p = 1}^{d} C_{ℓ}^{(p)} E [f (X^{'} + β_{ℓ} h V) e_{p}^{(1 : d)} (R (x, X^{'}, V))] + O ({||h||}_{2}^{2 α_{d}}) .

(4)

Proof.

See Appendix B. □

The setting

L = 1, β_{1} = 1, C_{1}^{(p)} = 1

with

p = 1, \dots, d

provides an approximation of order

O ({||h||}_{2}^{2})

. Equivalently, the same order is obtained using the constraints:

\{\begin{matrix} \sum_{ℓ = 1}^{L} C_{ℓ}^{(p)} β_{ℓ}^{r} = δ_{p, r}; r = 0, \dots, L - 1, & \text{if } p \leq L - 1 \\ \sum_{ℓ = 1}^{L} C_{ℓ}^{(p)} β_{ℓ}^{r} = δ_{p, r}; r = 0, \dots, L - 2, p, & \text{otherwise} \end{matrix},

with

p = 1, \dots, d

and

L \leq d + 1

. In the same sense, taking

\sum_{ℓ = 1}^{L} C_{ℓ}^{(p)} β_{ℓ}^{r} = δ_{p, r}; r = p, \dots, L + p - 1

with

p = 1, \dots, d

yields an approximation of order

O ({||h||}_{2}^{2 L})

. Such constraints implicitly define the coefficients

C_{1}^{(p)}, \dots, C_{L}^{(p)}, p = 1, \dots, d

, and they rely on the (generalized) Vandermonde matrices. Distinct values of

β_{ℓ}

s (i.e.,

β_{ℓ_{1}} \neq β_{ℓ_{2}}

) ensure the existence and uniqueness of such coefficients, as such matrices are invertible (see refs. [47,48]).
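Each set of constraints above is a square linear system in the coefficients C_1^{(p)}, ..., C_L^{(p)}, whose matrix has entries β_ℓ^r. A minimal sketch of solving one such system is given below; the helper name `solve_coefficients` and the numerical values of β_ℓ are illustrative assumptions, and the list of exponents r must have exactly L entries.

```python
import numpy as np

def solve_coefficients(betas, p, exponents):
    """Solve sum_l C_l * beta_l**r = delta_{p, r} for r in `exponents`.

    Assumes distinct, non-zero betas and as many exponents as betas,
    so that the (generalized) Vandermonde matrix is square.
    """
    betas = np.asarray(betas, dtype=float)
    exponents = np.asarray(exponents, dtype=int)
    # Row indexed by r, column indexed by l: entry beta_l ** r.
    matrix = betas[np.newaxis, :] ** exponents[:, np.newaxis]
    rhs = (exponents == p).astype(float)  # Kronecker delta in r
    return np.linalg.solve(matrix, rhs)

# Setting L = 1, beta_1 = 1 recovers C_1^{(p)} = 1.
print(solve_coefficients([1.0], p=1, exponents=[1]))
# L = 2 with r = p, ..., L + p - 1 for p = 1, i.e., r = 1, 2.
print(solve_coefficients([1.0, 2.0], p=1, exponents=[1, 2]))
```

The same helper can be called once per value of p with the exponent list prescribed by the chosen set of constraints.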

To improve the approximations of lower-order terms (i.e., lower values of p) in Equation (4), take an integer

r^{*} \in {0, \dots, d - 1}

with

r^{*} \leq L - 2

and consider the following constraints:

\{\begin{matrix} \sum_{ℓ = 1}^{L} C_{ℓ}^{(p)} β_{ℓ}^{r} = δ_{p, r}; r = 0, \dots, r^{*}, p + 2 λ_{p} + 2, \dots, p + 2 λ_{p} + 2 (L - 1 - r^{*}), & \text{if } p \leq r^{*} \\ \sum_{ℓ = 1}^{L} C_{ℓ}^{(p)} β_{ℓ}^{r} = δ_{p, r}; r = 0, \dots, r^{*}, p, p + 2, \dots, p + 2 (L - r^{*} - 2), & \text{otherwise} \end{matrix},

(5)

where

λ_{p} : = [\frac{r^{*} - p}{2}]

stands for the largest integer that is less than

\frac{r^{*} - p}{2}

. The above choice of coefficients requires

L \geq r^{*} + 2

, and

L^{*} : = r^{*} + 2

is the minimum number of model runs used for deriving surrogates of functions. Such coefficients are more suitable for truncated surrogates. For instance, the truncated surrogate of order

d_{0} \leq d

(in the superposition sense) is given by

\tilde{f_{T, d_{0}}^{c}} (x) : = \sum_{ℓ = 1}^{L} \sum_{p = 1}^{d_{0}} C_{ℓ}^{(p)} E [f (X^{'} + β_{ℓ} h V) e_{p}^{(1 : d)} (R (x, X^{'}, V))] .

Note that in this case, one must require

r^{*} \in {0, \dots, d_{0} - 1}

, and we will see that

r^{*} = d_{0} - 1

is the best choice to improve the MSEs.

Likewise, when

X_{u_{I}}

with

u_{I} \subset {1, \dots, d}

is the vector of the most influential input variables according to variance-based sensitivity analysis (see Section 2.2), the following truncated surrogate should be considered:

\tilde{f_{u_{I}}^{c}} (x_{u_{I}}) : = \sum_{ℓ = 1}^{L} \sum_{p = 1}^{| u_{I} |} C_{ℓ}^{(p)} E [f (X^{'} + β_{ℓ} h V) e_{p}^{(u_{I})} (R_{u_{I}} (x_{u_{I}}, X_{u_{I}}^{'}, V_{u_{I}}))] .

Based on Equation (4), the method of moments allows for deriving the emulator of any simulator or the estimator of any function. To that end, we are given two independent samples of size N, that is,

{\{X_{i}^{'}\}}_{i = 1}^{N} : = {\{(X_{i, 1}^{'}, \dots, X_{i, d}^{'})\}}_{i = 1}^{N}

from

X^{'}

and

{\{V_{i}\}}_{i = 1}^{N} : = {\{(V_{i, 1}, \dots, V_{i, d})\}}_{i = 1}^{N}

from

V

. The full and consistent emulator is given by

\hat{f_{N}^{c}} (x) : = \frac{1}{N} \sum_{i = 1}^{N} \sum_{ℓ = 1}^{L} \sum_{p = 1}^{d} C_{ℓ}^{(p)} f (X_{i}^{'} + β_{ℓ} h V_{i}) e_{p}^{(1 : d)} (R (x, X_{i}^{'}, V_{i})) .

(6)

The derivations of the truncated emulators (i.e.,

\hat{\tilde{f_{T, d_{0}}^{c}}}

and

\hat{\tilde{f_{u_{I}}^{c}}}

) are straightforward. All these emulators rely on

N L

model runs with the possibility

L ≪ d

. This property is useful for high-dimensional simulators.
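A minimal sketch of the emulator in Equation (6) is given below for the simplest setting mentioned after Theorem 1, namely L = 1, β_1 = 1, and C_1^{(p)} = 1. It assumes X'_k ~ U(0, 1) and V_k ~ U(-ξ, ξ) with ξ = √3 (so σ² = 1), and uses the identity that the sum of the ESPs of orders 1 to d equals ∏_k (1 + R_k) - 1. The function name, the bandwidth h = 0.5, and the test function f(x) = x_1 x_2 + x_1 (for which this setting happens to be exact for any h) are illustrative assumptions.

```python
import numpy as np

def derivative_free_emulator(f, x, x_sample, v_sample, h, sigma2):
    """Estimate f^c(x) = f(x) - E[f(X')] from Eq. (6) with L = 1, C = 1.

    Assumes X'_k ~ U(0, 1), so G_k(t) = t and g_k(t) = 1.
    """
    # (G_k(X'_{i,k}) - 1[X'_{i,k} >= x_k]) / g_k(X'_{i,k})
    w = x_sample - (x_sample >= x).astype(float)
    # R_k(x_k, X'_k, V_k) as defined before Theorem 1
    r = w * v_sample / (h * sigma2)
    # sum_{p=1}^{d} e_p(R) = prod_k (1 + R_k) - 1
    esp_sum = np.prod(1.0 + r, axis=1) - 1.0
    outputs = f(x_sample + h * v_sample)  # N model runs since L = 1
    return np.mean(outputs * esp_sum)

f = lambda z: z[:, 0] * z[:, 1] + z[:, 0]
rng = np.random.default_rng(2)
n = 200_000
x_sample = rng.random((n, 2))
v_sample = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, 2))
x = np.array([0.3, 0.7])
fc_hat = derivative_free_emulator(f, x, x_sample, v_sample, h=0.5, sigma2=1.0)
# Exact centered value: f(x) - E[f(X')] = 0.51 - 0.75
print(fc_hat, 0.51 - 0.75)
```

Only evaluations of f at perturbed sample points are needed, with no derivative of the model; for general functions a small bandwidth h and the coefficients from the constraints of Section 3.1 would be required to control the bias.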

3.2. Statistical Properties of Our Emulators

While the emulator

\hat{f_{N}^{c}} (x)

does not rely on the model derivatives, structural and technical assumptions about f are needed to derive the biases of this emulator, such as the Hölder space of functions. Given

\vec{ı} : = (i_{1}, \dots, i_{d}) \in N^{d}

, denote

D^{(\vec{ı})} f : = (\prod_{k = 1}^{d} \frac{\partial^{i_{k}}}{\partial x_{k}^{i_{k}}}) f

,

{(x)}^{\vec{ı}} : = \prod_{k = 1}^{d} x_{k}^{i_{k}}

,

\vec{ı}! = i_{1}! \dots i_{d}!

and

{||\vec{ı}||}_{1} : = i_{1} + \dots + i_{d}

. Given

α \geq 0

, the Hölder space of

α

-smooth functions is given by

\forall x, y \in R^{d}

,

H_{α} : = \{f : R^{d} \to R : |f (x) - \sum_{{||\vec{ı}||}_{1} = 0}^{α - 1} \frac{D^{(\vec{ı})} f (y)}{\vec{ı}!} {(x - y)}^{\vec{ı}}| \leq M_{α} {||x - y||}_{2}^{α}\},

with

M_{α} > 0

, and

D^{(\vec{ı})} f (y)

is a (weak) cross-partial derivative.

Moreover, given

B_{α} \geq 0

and CDFs

G_{j}

s, define the following space of functions:

L_{α, B_{α}} : = \{f : R^{d} \to R : |f (x) - \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | \leq α \end{matrix}} E_{X^{'}} [D^{| w |} f (X^{'}) \prod_{k \in w} \frac{G_{k} (X_{k}^{'}) - 𝟙_{X_{k}^{'} \geq x_{k}}}{g_{k} (X_{k}^{'})}]| \leq B_{α}\} .

We can see that

L_{α, B_{α}}

contains constants;

L_{α, 0}

is a class of functions having, at most,

α

-order interactions, and

L_{d, 0}

is a class of all smooth functions, as

B_{d} = 0

. Lemma 1 provides the links between both spaces. To that end, consider

M_{| w |}^{'} : = {||D^{| w |} f||}_{\infty}

for all

w \subseteq {1, \dots, d}

with

{||\cdot||}_{\infty}

being the infinity norm.

Assumption A3.

g_{j} \geq ρ_{min} > 0

for any

j \in {1, \dots, d}

.

Assumption 3 aims to cover the class of quasi-uniform distributions and other distributions for which the event

g_{j} \geq ρ_{min}

occurs with a high probability. It is the case for most unbounded distributions.

Lemma 1.

Consider

0 < d_{0} \leq d

, and assume that

f \in H_{α}

with

α \in {0, d}

and Assumptions 1 and 3 hold. Then, there exists

γ_{0} > 0

such that

f \in L_{d_{0}, D_{d_{0}, ρ_{min}}}

, with

D_{d_{0}, ρ_{min}} : = 2 γ_{0} M_{0}^{'} {[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d} + 1]}^{d_{0}} \{{[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d} + 1]}^{d - d_{0}} - 1\} .

Proof.

See Appendix C. □

Note that Lemma 1 also provides the upper bound of the remaining terms when approximating

f (x)

using the truncated function

f_{d_{0}} (x) : = \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | \leq d_{0} \end{matrix}} E_{X^{'}} [D^{| w |} f (X^{'}) \prod_{k \in w} \frac{G_{k} (X_{k}^{'}) - 𝟙_{X_{k}^{'} \geq x_{k}}}{g_{k} (X_{k}^{'})}] .

When

\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d} \to 0

,

D_{d_{0}, ρ_{min}}

is equivalent to

D_{d_{0}, ρ_{min}} \equiv \frac{(d - d_{0}) γ_{0} {(M_{0}^{'})}^{(d - 1) / d} {(M_{d}^{'})}^{1 / d}}{ρ_{min}} [\frac{d_{0}}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d} + 1] .

3.2.1. Biases of the Proposed Emulators

To derive the bias of

\hat{f_{N, d_{0}}^{c}}

(i.e., an estimator of

f_{d_{0}}^{c}

) in Theorem 2 using the aforementioned spaces of functions, denote with

Z : = (Z_{1}, \dots, Z_{d})

a d-dimensional random vector of independent variables that are centered about zero and standardized (i.e.,

E [Z_{k}^{2}] = 1

,

k = 1, \dots, d

), and

R_{c}

denotes the set of such random vectors. For any

r \in N

and

w \subseteq {1, \dots, d}

, define

Γ_{r} : = \sum_{ℓ = 1}^{L} |C_{ℓ}^{(| w |)} β_{ℓ}^{r}|; K_{w, L} : = inf_{Z \in R_{c}} E [{||Z^{2}||}_{2}^{L} \prod_{k \in w} Z_{k}^{2}] Γ_{| w | + 2 L};

L_{w}^{'} : = ([\frac{r^{*} - | w |}{2}] + L - r^{*}) 𝟙_{| w | \leq r^{*}} + (L - r^{*} - 1) 𝟙_{| w | > r^{*}} .

Theorem 2.

Assume

f \in H_{α}

with

α \in \{0, max (d, d_{0} + 2 (L - r^{*} - 1))\}

and Assumptions 1 and 3 hold. Then, we have

|E [\hat{f_{N, d_{0}}^{c}} (x)] - f^{c} (x)| \leq \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} σ^{2 L_{w}^{'}} {||h^{2}||}_{2}^{L_{w}^{'}} M_{| w | + 2 L_{w}^{'}} K_{w, L_{w}^{'}} {(\frac{1}{2 ρ_{min}})}^{| w |} + D_{d_{0}, ρ_{min}} .

(7)

Moreover, if

V_{k} \sim U (- ξ, ξ)

with

ξ > 0

and

k = 1, \dots, d

, then

|E [\hat{f_{N, d_{0}}^{c}} (x)] - f^{c} (x)| \leq \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} ξ^{2 L_{w}^{'}} {||h^{2}||}_{1}^{L_{w}^{'}} M_{| w | + 2 L_{w}^{'}} Γ_{| w | + 2 L_{w}^{'}} {(\frac{1}{2 ρ_{min}})}^{| w |} + D_{d_{0}, ρ_{min}} .

(8)

Proof.

See Appendix D. □

Using the fact that

h_{k} \to 0

, the results provided in Theorem 2 have simple upper bounds (see Corollary 1). To provide such results, consider

K_{1, r^{*}, d_{0}}^{max} : = max_{\begin{matrix} w \subseteq {1, \dots, d} \\ r^{*} < | w | \leq d_{0} \end{matrix}} \{K_{w, (L - r^{*} - 1)} M_{| w | + 2 (L - r^{*} - 1)}\};

K_{2, r^{*}, d_{0}}^{max} : = max_{\begin{matrix} w \subseteq {1, \dots, d} \\ r^{*} < | w | \leq d_{0} \end{matrix}} \{M_{| w | + 2 (L - r^{*} - 1)} Γ_{| w | + 2 (L - r^{*} - 1)}\};

K_{1, ρ_{min}, r^{*}} : = [2 ρ_{min} {(\frac{d}{2 ρ_{min}})}^{r^{*} + 1} \frac{{(\frac{d}{2 ρ_{min}})}^{d_{0} - r^{*}} - 1}{d - 2 ρ_{min}}] 𝟙_{r^{*} < d_{0} - 1} + (\binom{d}{d_{0}}) {(\frac{1}{2 ρ_{min}})}^{d_{0}} 𝟙_{r^{*} = d_{0} - 1} .

Corollary 1.

Assume

f \in H_{α}

with

α \in \{0, max (d, d_{0} + 2 (L - r^{*} - 1))\}

and Assumptions 1 and 3 hold. If

h_{k} \to 0

, then

|E [\hat{f_{N, d_{0}}^{c}} (x)] - f^{c} (x)| \leq {||h^{2}||}_{2}^{L - r^{*} - 1} σ^{2 (L - r^{*} - 1)} K_{1, r^{*}, d_{0}}^{max} K_{1, ρ_{min}, r^{*}} + D_{d_{0}, ρ_{min}} + O ({||h^{2}||}_{2}^{L - r^{*} - 1}) .

(9)

Moreover, if

V_{k} \sim U (- ξ, ξ)

with

ξ > 0

and

k = 1, \dots, d

, then

|E [\hat{f_{N, d_{0}}^{c}} (x)] - f^{c} (x)| \leq {||h^{2}||}_{1}^{L - r^{*} - 1} ξ^{2 (L - r^{*} - 1)} K_{2, r^{*}, d_{0}}^{max} K_{1, ρ_{min}, r^{*}} + D_{d_{0}, ρ_{min}} + O ({||h^{2}||}_{1}^{L - r^{*} - 1}) .

(10)

Proof.

See Appendix E. □

Using the above results, the bias of the full emulator of

f^{c}

is straightforward when taking

d_{0} = d

and knowing that

D_{d, ρ_{min}} = 0

. Moreover, Corollary 2 provides the bias of such an emulator under different structural assumptions about f so as to cope with many functions. To that end, define

K_{1 : d} : = inf_{Z \in R_{c}} E [{||Z||}_{2} \prod_{k = 1}^{d} Z_{k}^{2}] .

Corollary 2.

Let

d_{0} = d

;

r^{*} \leq d - 1

and

L = r^{*} + 2

. Assume

f \in H_{α}

with

α \in \{0, d + 1\}

and Assumptions 1 and 3 hold. If

h_{k} \to 0

, then

|E [\hat{f_{N}^{c}} (x)] - f^{c} (x)| \leq σ {||h||}_{2} M_{d + 1} K_{1 : d} Γ_{d + 1} \prod_{k = 1}^{d} E [|E_{k}|] + O ({||h||}_{2}) .

(11)

Moreover, if

V_{k} \sim U (- ξ, ξ)

with

ξ > 0

and

k = 1, \dots, d

, then

|E [\hat{f_{N}^{c}} (x)] - f^{c} (x)| \leq ξ {||h||}_{1} M_{d + 1} Γ_{d + 1} \prod_{k = 1}^{d} E [|E_{k}|] + O ({||h||}_{1}) .

(12)

Proof.

See Appendix F. □

In view of Corollaries 1 and 2, Equation (12) can lead to a dimension-free upper bound of the bias. Indeed, using the uniform bandwidth

h_{k} = h

and

ξ \leq \frac{1}{d M_{d + 1} Γ_{d + 1} \prod_{k = 1}^{d} E [|E_{k}|]},

we can see that the upper bound of the bias is

|E [\hat{f_{N}^{c}} (x)] - f^{c} (x)| \leq h

. Furthermore, it is worth noting that

|E [\hat{f_{N, d_{0}}^{c}} (x)] - f^{c} (x)| \leq h^{L - r^{*} - 1}

for any function

f \in L_{d_{0}, 0}

.
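
The step from Equation (12) to the bound h can be spelled out as follows. This is only a sketch of the arithmetic for the leading term: it assumes the uniform bandwidth, reads the norm as the sum of the d bandwidths, and leaves the remainder term aside.

```latex
\|h\|_{1} = \sum_{k=1}^{d} h_{k} = d\,h ,
\qquad
\xi \, \|h\|_{1} \, M_{d+1} \Gamma_{d+1} \prod_{k=1}^{d} E[|E_{k}|]
\;\leq\;
\frac{d\,h \, M_{d+1} \Gamma_{d+1} \prod_{k=1}^{d} E[|E_{k}|]}
     {d \, M_{d+1} \Gamma_{d+1} \prod_{k=1}^{d} E[|E_{k}|]}
\;=\; h .
```

The factor d in the denominator of the condition on ξ is therefore what cancels the growth of the norm with the dimension.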

3.2.2. Mean Squared Errors

We start this section with the variance of the proposed emulators, followed by their mean squared errors and different rates of convergence. For the variance, define

ϝ_{r^{*}, d_{0}}^{max} : = max_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | \leq d_{0} \end{matrix}} \{M_{min (r^{*}, | w | - 1) + 1} Γ_{min (r^{*}, | w | - 1) + 1}\}; ϝ_{d_{0}}^{max} : = max_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | \leq d_{0} \end{matrix}} \{M_{| w |} Γ_{| w |}\} .

Theorem 3.

Consider the coefficients given by Equation (5), and assume

f \in H_{α}

with

α \in \{0, max (d, d_{0} + 2 (L - r^{*} - 1))\}

and Assumptions 1–3 hold. Then,

V [\hat{f_{N, d_{0}}^{c}} (x)] \leq \frac{{(ϝ_{r^{*}, d_{0}}^{max})}^{2}}{N} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} \prod_{k \in w} (\frac{E [E_{k}^{2}]}{h_{k}^{2} σ^{4}} E [V_{1}^{2} {||h V||}_{2}^{2 \frac{min (r^{*}, | w | - 1) + 1}{| w |}}]) .

Moreover, if

r^{*} = d_{0} - 1

,

h_{k} = h

and

Z_{k} = V_{k} / σ

, then

V [\hat{f_{N, d_{0}}^{c}} (x)] \leq \frac{d {(ϝ_{d_{0}}^{max})}^{2} E [Z_{1}^{2} {||Z||}_{2}^{2}]}{N (d E [Z_{1}^{2} {||Z||}_{2}^{2}] - 3 ρ_{min}^{2})} [{(\frac{d E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{d_{0}} - 1] .

Proof.

See Appendix G. □

It turns out that the upper bounds of the variance in Theorem 3 do not depend on the uniform bandwidths when

r^{*} = d_{0} - 1

, leading to the derivations of the parametric MSEs of

\hat{f_{N, d_{0}}^{c}} (x)

and

\hat{f_{N}^{c}} (x)

. To that end, consider the upper bound of the above variance, that is,

{\bar{Σ}}_{d_{0}} : = \frac{d {(ϝ_{d_{0}}^{max})}^{2} E [Z_{1}^{2} {||Z||}_{2}^{2}]}{N (d E [Z_{1}^{2} {||Z||}_{2}^{2}] - 3 ρ_{min}^{2})} [{(d \frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{d_{0}} - 1] .

Remark 1.

Based on the expression of

{\bar{Σ}}_{d_{0}}

, the random variable

V_{j}

or

Z_{j} = V_{j} / σ

having the smallest value of fourth moments or kurtosis should be used. Under the additional condition

E [Z_{1}^{2} {||Z||}_{2}^{2}] \geq 3 ρ_{min}^{2}

, we can check that (see Appendix G)

{\bar{Σ}}_{d_{0}} \leq \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} [2 {(d \frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{d_{0}} 𝟙_{d_{0} \leq d_{0}^{*}} + 2^{d} {(\frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{d_{0}} 𝟙_{d_{0} > d_{0}^{*}}],

with

d_{0}^{*} : = \frac{(d - 1) ln (2)}{ln (d)}

.
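
The quantities of Remark 1 are easy to evaluate numerically. The following Python sketch simply codes the stated expressions; the function names and the argument `m4` (standing for the moment E[Z_1^2 ||Z||_2^2]) are hypothetical and only serve the illustration.

```python
import math


def d0_star(d):
    """Threshold d_0^* = (d - 1) ln(2) / ln(d) from Remark 1 (needs d >= 2)."""
    return (d - 1) * math.log(2.0) / math.log(d)


def sigma_bar(d, d0, n, m4, rho_min, f_max=1.0):
    """Variance bound Sigma-bar_{d0}; m4 stands for E[Z_1^2 ||Z||_2^2]."""
    a = d * m4 / (3.0 * rho_min ** 2)  # common ratio d * m4 / (3 rho_min^2)
    if math.isclose(a, 1.0):
        raise ValueError("the closed form is undefined when the ratio equals 1")
    # d m4 / (d m4 - 3 rho^2) equals a / (a - 1)
    return f_max ** 2 / n * a / (a - 1.0) * (a ** d0 - 1.0)


def remark1_expression(d, d0, n, m4, rho_min, f_max=1.0):
    """Right-hand side written in Remark 1 (requires m4 >= 3 rho_min^2)."""
    ratio = m4 / (3.0 * rho_min ** 2)
    if d0 <= d0_star(d):
        return f_max ** 2 / n * 2.0 * (d * ratio) ** d0
    return f_max ** 2 / n * 2.0 ** d * ratio ** d0


if __name__ == "__main__":
    print(d0_star(10), sigma_bar(10, 2, 500, 10.8, 1.0))
```

For small truncation orders the first branch applies, and the closed form stays below twice the leading power because the ratio is at least d.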

In what follows, we are going to use

E [{(\hat{f_{N, d_{0}}^{c}} (x) - f^{c} (x))}^{2}]

for the MSE of

\hat{f_{N, d_{0}}^{c}}

and

E [{(\hat{f_{N, d_{0}}^{c}} - f^{c})}^{2}] : = E_{X} \{E [{(\hat{f_{N, d_{0}}^{c}} (X) - f^{c} (X))}^{2}]\},

for the expected or integrated MSE (IMSEs) of

\hat{f_{N, d_{0}}^{c}}

.

Corollary 3.

Given (5),

r^{*} = d_{0} - 1

, assume

f \in H_{α}

with

α \in \{0, max (d, d_{0} + 2 (L - d_{0}))\}

;

h_{k} = h \to 0

and Assumptions 1–3 hold. Then, the MSE and IMSE share the same upper bound, given as follows:

\begin{matrix} E [{(\hat{f_{N, d_{0}}^{c}} - f^{c})}^{2}] & \leq & 2 D_{d_{0}, ρ_{min}} {||h^{2}||}_{2}^{L - d_{0}} σ^{2 (L - d_{0})} K_{1, d_{0} - 1, d_{0}}^{max} (\binom{d}{d_{0}}) {(\frac{1}{2 ρ_{min}})}^{d_{0}} + D_{d_{0}, ρ_{min}}^{2} \\ & & + {\bar{Σ}}_{d_{0}} + {||h^{2}||}_{2}^{2 (L - d_{0})} σ^{4 (L - d_{0})} {(K_{1, d_{0} - 1, d_{0}}^{max})}^{2} {(\binom{d}{d_{0}})}^{2} {(\frac{1}{2 ρ_{min}})}^{2 d_{0}} . \end{matrix}

Moreover, if

G_{j} = F_{j}

with

j = 1, \dots, d

, then the IMSE is given by

\begin{matrix} E \{E [{(\hat{f_{N, d_{0}}^{c}} - f^{c})}^{2}]\} & \leq & {||h^{2}||}_{2}^{2 (L - d_{0})} σ^{4 (L - d_{0})} {(K_{1, d_{0} - 1, d_{0}}^{max})}^{2} {(\binom{d}{d_{0}})}^{2} {(\frac{1}{2 ρ_{min}})}^{2 d_{0}} \\ & & + D_{d_{0}, ρ_{min}}^{2} + {\bar{Σ}}_{d_{0}} . \end{matrix}

Proof.

See Appendix H. □

The presence of

D_{d_{0}, ρ_{min}}

in Corollary 3 is going to decrease the rates of convergence of our estimators without additional assumptions about

D_{d_{0}, ρ_{min}}^{2}

. Corollary 4 starts providing such conditions and the associated MSEs and IMSEs.

Corollary 4.

Under the conditions of Corollary 3, assume that

f \in L_{d_{0}, 0}

. Then, the upper bound of the IMSE and MSE is given by

{||h^{2}||}_{2}^{2 (L - d_{0})} σ^{4 (L - d_{0})} {(K_{1, d_{0} - 1, d_{0}}^{max})}^{2} {(\binom{d}{d_{0}})}^{2} {(\frac{1}{2 ρ_{min}})}^{2 d_{0}} + {\bar{Σ}}_{d_{0}} .

Moreover, if

V_{k} \sim U (- ξ, ξ)

with

ξ > 0

and

k = 1, \dots, d

, then this bound becomes

{||h^{2}||}_{1}^{2 (L - d_{0})} ξ^{4 (L - d_{0})} {(M_{d_{0} + 2 (L - d_{0})} Γ_{d_{0} + 2 (L - d_{0})})}^{2} {(\binom{d}{d_{0}})}^{2} {(\frac{1}{2 ρ_{min}})}^{2 d_{0}} + {\bar{Σ}}_{d_{0}} .

Proof.

Using Corollary 1, the results are straightforward. □

Based on the upper bounds of Corollary 4, interesting choices of

σ

or

ξ

on one hand and

ρ_{min}

and h on the other hand help in obtaining the parametric rates of convergence due to the fact that

{\bar{Σ}}_{d_{0}}

does not depend on h.

Corollary 5.

Let

r^{*} = d_{0} - 1

,

L = d_{0} + 1

. Assume

f \in H_{α}

with

α \in {0, max (d, d_{0} + 2)}

;

f \in L_{d_{0}, 0}

and Assumptions 1–3 hold. If

h_{k} = h \propto N^{- η}

with

η \in] \frac{1}{4}, 1 [

and

ξ^{2} \leq {(d M_{d_{0} + 2} Γ_{d_{0} + 2} (\binom{d}{d_{0}}) {(\frac{1}{2 ρ_{min}})}^{d_{0}})}^{- 1}

, then the upper bound of MSE and IMSE is

E [{(\hat{f_{N, d_{0}}^{c}} - f^{c})}^{2}] \leq \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} [2 {(\frac{d (d + 0.8)}{3 ρ_{min}^{2}})}^{d_{0}} 𝟙_{d_{0} \leq d_{0}^{*}} + 2^{d} {(\frac{d + 0.8}{3 ρ_{min}^{2}})}^{d_{0}} 𝟙_{d_{0} > d_{0}^{*}}] + O (N^{- 1}),

provided that

d + 0.8 \geq 3 ρ_{min}^{2}

.

Moreover, if

ρ_{min} = \sqrt{\frac{c_{0} d (d + 0.8)}{3}}

with the real

1 < c_{0} \leq 2

, then

E [{(\hat{f_{N, d_{0}}^{c}} - f^{c})}^{2}] \leq \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N (c_{0} - 1)} + O (N^{- 1}) .

Proof.

See Appendix I. □
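
The constant d + 0.8 in Corollary 5 is consistent with uniform perturbations: for V_k ~ U(-ξ, ξ) and Z_k = V_k / σ with σ² = ξ²/3, the fourth moment of Z_1 is 9/5, so E[Z_1² ||Z||_2²] = 9/5 + (d - 1) = d + 0.8. The Python sketch below checks this value by Monte Carlo and evaluates the variance factor obtained when ρ_min² = c_0 d (d + 0.8) / 3. Function names, the sample size, and the seed are hypothetical choices made for the illustration.

```python
import numpy as np


def uniform_moment(d):
    """Exact E[Z_1^2 ||Z||_2^2] for standardized uniform Z_k: 9/5 + (d - 1)."""
    return 9.0 / 5.0 + (d - 1)


def mc_moment(d, xi=0.3, n=200_000, seed=0):
    """Monte Carlo estimate of the same moment (seed and n are arbitrary)."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(-xi, xi, size=(n, d))
    z = v / (xi / np.sqrt(3.0))  # standardize: Var(V_k) = xi^2 / 3
    return float(np.mean(z[:, 0] ** 2 * np.sum(z ** 2, axis=1)))


def variance_factor(d0, c0):
    """N * Sigma-bar / (F_max)^2 when d (d + 0.8) / (3 rho_min^2) = 1 / c0."""
    a = 1.0 / c0
    return a / (a - 1.0) * (a ** d0 - 1.0)  # equals (1 - c0**(-d0)) / (c0 - 1)


if __name__ == "__main__":
    print(uniform_moment(5), mc_moment(5), variance_factor(2, 1.5))
```

Since the factor equals (1 - c_0^{-d_0}) / (c_0 - 1), it is bounded by 1 / (c_0 - 1) for every truncation order, which is the dimension-free constant of the corollary.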

It is worth noting that the parametric rates of convergence are reached for any function that belongs to

L_{d_{0}, 0}

(see Corollary 5). Moreover, taking

c_{0} \in] 1, 2]

leads to the dimension-free MSEs, which hold for particular distributions of the inputs given by

X_{j} \sim F_{j}^{*}, ρ_{j}^{*} \geq ρ_{min}^{*} : = \sqrt{\frac{c_{0} d (d + 0.8)}{3}}, \forall j \in {1, \dots, d} .

For uniform distributions,

X_{j} \sim U (a_{j}, b_{j})

, we must have

b_{j} - a_{j} \leq \frac{1}{ρ_{min}^{*}}

. Obviously, such conditions are a bit strong, as a few distributions are covered. In the same sense, using

\hat{f_{N, d_{0}}^{c}}

as an emulator of

f^{c}

for any sample point,

x

, of

X \sim F

requires choosing

X_{j}^{'}

such that its support contains that of

X_{j}

,

j = 1, \dots, d

. Thus, Assumption 3, given by

g_{j} > ρ_{min}

, implicitly depends on the distribution of

X_{j}

. For instance, given the bounded support of

X_{j}^{'}

, that is,

(a_{j}^{'}, b_{j}^{'})

, we must have

\frac{1}{ρ_{min}^{*}} > b_{j}^{'} - a_{j}^{'} \geq b_{j} - a_{j}

, with

(a_{j}, b_{j})

being the support of

X_{j}

, limiting our ability to deploy

\hat{f_{N, d_{0}}^{c}}

as a dimension-free, global emulator for some distributions of inputs.

Nevertheless, the assumption

g_{j} \geq ρ_{min}^{*}

is always satisfied for the finite dimensionality, d, if we are only interested in estimating

f (x_{0})

for a given point

x_{0}

, leading to local emulators. Indeed, taking

X_{j}^{'} \sim G_{j}

to depend on the point

x^{0}

at which f must be evaluated allows for enjoying the parametric rate of convergence and dimension-free MSEs for sample points falling in a neighborhood of

x_{0}

. An example of such a choice is

X_{j}^{'} \sim U (x_{j}^{0} - \frac{1}{2 ρ_{min}^{*}}, x_{j}^{0} + \frac{1}{2 ρ_{min}^{*}}) .

However, different emulators are going to be built in order to estimate

f (x)

for any value,

x

, of

X

. Constructions of balls of given nodes and the radius

1 / ρ_{min}^{*}

are an interesting perspective.
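
The local proposal described above can be sampled directly. The Python sketch below draws X'_j uniformly on an interval of width 1/ρ*_min centred at the point of interest; the function names, the default c_0 = 1.5, and the seed are hypothetical.

```python
import numpy as np


def rho_min_star(d, c0=1.5):
    """rho*_min = sqrt(c0 d (d + 0.8) / 3) for a constant 1 < c0 <= 2."""
    return float(np.sqrt(c0 * d * (d + 0.8) / 3.0))


def sample_local_proposal(x0, n, c0=1.5, seed=0):
    """Draw n points X' with X'_j ~ U(x0_j - 1/(2 rho*), x0_j + 1/(2 rho*))."""
    x0 = np.asarray(x0, dtype=float)
    half_width = 1.0 / (2.0 * rho_min_star(x0.size, c0))
    rng = np.random.default_rng(seed)
    # each coordinate then has the constant density rho*_min on its interval
    return rng.uniform(x0 - half_width, x0 + half_width, size=(n, x0.size))


if __name__ == "__main__":
    pts = sample_local_proposal([0.2, -1.0, 3.0], n=4)
    print(pts)
```

The interval shrinks roughly like 1/d, which illustrates why one such emulator only covers a small neighborhood of the chosen point.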

Remark 2.

When

f \notin L_{d_{0}, 0}

, in-depth structural assumptions about f that should allow for enjoying the above MSEs concern the truncation error, resulting from keeping only all the

{| v |}^{th}

interactions or cross-partial derivatives with

| v | \leq d_{0}

. One way to handle it consists of choosing

d_{0} = d_{0, N}

such that the residual bias is less than

1 / N

(i.e.,

D_{d_{0, N}, ρ_{min}}^{2} < 1 / N

), thanks to sensitivity indices.

While truncations are sometimes necessary in higher dimensions, it is interesting to have the rates of convergence without any truncation to cover lower or moderate dimensional functions, for instance.

Corollary 6.

Let

d_{0} = d

;

r^{*} = d - 1

,

L = d + 1

and

h_{k} = h

. If

f \in H_{α}

with

α \in {0, d + 1}

,

h \to 0

, and Assumptions 1–3 hold, then the upper bound of MSE and IMSE is

E [{(\hat{f_{N}^{c}} - f^{c})}^{2}] \leq σ^{2} {||h||}_{2}^{2} M_{d + 1}^{2} K_{1 : d}^{2} Γ_{d + 1}^{2} {(\frac{1}{2 ρ_{min}})}^{2 d} + \frac{{(ϝ_{d}^{max})}^{2}}{N} [{(\frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}} + 1)}^{d} - 1] .

Moreover, if

V_{k} \sim U (- ξ, ξ)

with

ξ > 0

and

k = 1, \dots, d

, then

E [{(\hat{f_{N}^{c}} - f^{c})}^{2}] \leq ξ^{2} {||h||}_{1}^{2} M_{d + 1}^{2} Γ_{d + 1}^{2} {(\frac{1}{2 ρ_{min}})}^{2 d} + \frac{{(ϝ_{d}^{max})}^{2}}{N} [{(\frac{d + 0.8}{3 ρ_{min}^{2}} + 1)}^{d} - 1] .

Proof.

See Appendix J. □

In the case of the full emulator of

f^{c}

, remark that

\frac{{(ϝ_{d}^{max})}^{2}}{N} [{(\frac{d + 0.8}{3 ρ_{min}^{2}} + 1)}^{d} - 1] \leq \frac{d {(ϝ_{d}^{max})}^{2} (d + 0.8)}{N (d (d + 0.8) - 3 ρ_{min}^{2})} [{(d \frac{(d + 0.8)}{3 ρ_{min}^{2}})}^{d} - 1] .

Based on Corollary 6, different rates of convergence can be obtained depending on the support of the input variables

X_{j} \sim F_{j}

via the choice of

ρ_{min}

.

Corollary 7.

Let

r^{*} = d - 1

and

L = d + 1

. Assume

f \in H_{α}

with

α \in {0, d + 1}

;

ξ \leq {[d M_{d + 1} Γ_{d + 1} {(\frac{1}{2 ρ_{min}})}^{d}]}^{- 1}

;

h_{k} = h \propto N^{- η}

with

η \in] \frac{1}{2}, 1 [

and Assumptions 1–3 hold. Then, the (MSE and IMSE) rates of convergence are given as follows:

E [{(\hat{f_{N}^{c}} - f^{c})}^{2}] \leq \frac{{(ϝ_{d}^{max})}^{2}}{N} [{(\frac{d + 0.8}{3 ρ_{min}^{2}} + 1)}^{d} - 1] + O (N^{- 1}) .

(i): If $ρ_{min} = \sqrt{\frac{d + 0.8}{3}}$ , then $E [{(\hat{f_{N}^{c}} - f^{c})}^{2}] \leq \frac{{(ϝ_{d}^{max})}^{2}}{N} (2^{d} - 1) + O (N^{- 1})$ .
(ii): If $ρ_{min} = \sqrt{\frac{c_{0} d (d + 0.8)}{3}}$ with $1 \leq c_{0} \leq 2$ , then

E [{(\hat{f_{N}^{c}} - f^{c})}^{2}] \leq \frac{{(ϝ_{d}^{max})}^{2}}{N} min \{\frac{1}{c_{0} - 1}, {(\frac{1}{c_{0} d} + 1)}^{d} - 1\} + O (N^{- 1}),

with

{(\frac{1}{c_{0} d} + 1)}^{d} - 1 \equiv \frac{1}{c_{0}}

when

c_{0} d \to \infty

.

Proof.

It is straightforward bearing in mind Corollaries 5 and 6. □

Again, the assumptions in Points (i)–(ii) are satisfied for fewer distributions, but they are always satisfied if we are only interested in estimating

f (x_{0})

for a given point

x_{0}

, leading to building different local emulators.

3.2.3. Mean Squared Errors for Every Distribution of Inputs

In this section, we are going to remove the assumption about the quasi-uniform distribution of inputs (Assumption 3) so as to cover any probability distribution of inputs. Note that Assumption 3 is used to derive

E [E_{k}^{2}] \leq \frac{1}{3 ρ_{min}^{2}}

and

E [| E_{k} |] \leq \frac{1}{2 ρ_{min}}

. Such an assumption can be avoided by using the following inequalities:

E [| E_{k} |] = E [\frac{[1 - G (X_{k}^{'})] 𝟙_{X_{k}^{'} \geq X_{k}} + G (X_{k}^{'}) 𝟙_{X_{k}^{'} < X_{k}}}{g_{k} (X_{k}^{'})}] \leq κ_{1},

with

κ_{1} : = sup_{k \in {1, \dots, d}} sup_{x_{k}^{'} \in Ω_{k}} E [\frac{F_{k} (X_{k}^{'}) + G (X_{k}^{'}) - 2 G (X_{k}^{'}) F_{k} (X_{k}^{'})}{g_{k} (X_{k}^{'})}];

and

E [E_{k}^{2}] = E [\frac{G^{2} (X_{k}^{'}) + F_{k} (X_{k}^{'}) - 2 G (X_{k}^{'}) F_{k} (X_{k}^{'})}{g_{k}^{2} (X_{k}^{'})}] \leq κ_{2},

with

κ_{2} : = sup_{k \in {1, \dots, d}} sup_{x_{k}^{'} \in Ω_{k}} E [\frac{G^{2} (x_{k}^{'}) + F_{k} (x_{k}^{'}) - 2 G (x_{k}^{'}) F_{k} (x_{k}^{'})}{g_{k}^{2} (x_{k}^{'})}] .
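
As a sanity check of the two moments above, consider the simplest case G_k = F_k = U(0, 1), for which g_k = 1 and E_k = X'_k - 𝟙(X'_k ≥ X_k). The integrands then reduce to 2u(1 - u) and u(1 - u), whose means are 1/3 and 1/6. The Python sketch below reproduces these values by Monte Carlo; the function name, sample size, and seed are hypothetical.

```python
import numpy as np


def e_k_moments(n=200_000, seed=0):
    """Monte Carlo estimates of E|E_k| and E[E_k^2] for G_k = F_k = U(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n)        # X_k  ~ F_k
    x_prime = rng.uniform(size=n)  # X'_k ~ G_k, independent of X_k
    indicator = (x_prime >= x).astype(float)
    e_k = (x_prime - indicator) / 1.0  # G_k(u) = u and g_k(u) = 1 on (0, 1)
    return float(np.mean(np.abs(e_k))), float(np.mean(e_k ** 2))


if __name__ == "__main__":
    print(e_k_moments())
```

Both values lie below the bounds 1/(2 ρ_min) = 1/2 and 1/(3 ρ_min²) = 1/3 obtained under Assumption 3 with ρ_min = 1.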

Using such inequalities, the following results are straightforward keeping in mind Corollaries 5–7.

Corollary 8.

Let

r^{*} = d_{0} - 1

and

L = d_{0} + 1

. Assume

f \in H_{α}

with

α \in {0, max (d, d_{0} + 2)}

;

f \in L_{d_{0}, 0}

and Assumptions 1 and 2 hold. If

h_{k} = h \propto N^{- η}

with

η \in] \frac{1}{4}, 1 [

and

ξ^{2} \leq {(d M_{d_{0} + 2} Γ_{d_{0} + 2} (\binom{d}{d_{0}}) κ_{1}^{d_{0}})}^{- 1}

, then we have

E [{(\hat{f_{N, d_{0}}^{c}} - f^{c})}^{2}] \leq \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} [2 {(κ_{2} d (d + 0.8))}^{d_{0}} 𝟙_{d_{0} \leq d_{0}^{*}} + 2^{d} {(κ_{2} (d + 0.8))}^{d_{0}} 𝟙_{d_{0} > d_{0}^{*}}],

provided that

κ_{2} (d + 0.8) \geq 1

.

Corollary 9.

Let

r^{*} = d - 1

and

L = d + 1

. Assume

f \in H_{α}

with

α \in {0, d + 1}

;

ξ \leq {[d M_{d + 1} Γ_{d + 1} κ_{1}^{d}]}^{- 1}

;

h_{k} = h \propto N^{- η}

with

η \in] \frac{1}{2}, 1 [

; and Assumptions 1 and 2 hold. Then,

E [{(\hat{f_{N}^{c}} - f^{c})}^{2}] \leq \frac{{(ϝ_{d}^{max})}^{2}}{N} [{(κ_{2} (d + 0.8) + 1)}^{d} - 1] .

Regarding the choice of

G_{k}

, from the expression

E [E_{k}^{2}] = E [\frac{F_{k} (X_{k}^{'}) + G (X_{k}^{'}) [G (X_{k}^{'}) - 2 F_{k} (X_{k}^{'})]}{g_{k}^{2} (X_{k}^{'})}],

an interesting choice of

G_{k}

must satisfy

G_{k} \leq 2 F_{k}

in order to reduce the value of

E [E_{k}^{2}]

. The following proposition gives interesting choices of

G_{k}

s. Recall that

X_{k} \sim F_{k}

and

X_{k}^{'} \sim G_{k}

are supported on

Ω_{k}, Ω_{k}^{'}

, respectively, with

Ω_{k} \subseteq Ω_{k}^{'}

.

Proposition 2.

Consider a PDF,

ρ_{k}^{'}

, supported on

Ω_{k}^{'} ∖ Ω_{k}

and

τ \in] 0, 1]

. The distribution,

G_{k}

, defined as a mixture of

ρ_{k}

and

ρ_{k}^{'}

, that is,

g_{k} = τ ρ_{k} 𝟙_{Ω_{k}} + (1 - τ) ρ_{k}^{'} 𝟙_{Ω_{k}^{'} ∖ Ω_{k}},

allows for reducing

E [E_{k}^{2}]

, provided that

sup {x : x \in Ω} \leq y_{0}

,

\forall y_{0} \in Ω_{k}^{'} ∖ Ω_{k}

.

Proof.

We can check that

G_{k} \leq F_{k}

. □

In what follows, we are going to consider

τ = 1

(i.e.,

G_{k} = F_{k}

) and

τ < 1

with

ρ_{k}^{'}

being the uniform distribution.
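
A minimal Python sketch of the mixture of Proposition 2 is given below, assuming F_k = U(a, b) and a uniform ρ'_k on an extension (b, b_ext) placed to the right of the support, as the condition of the proposition requires. The names `sample_mixture`, `mixture_cdf`, and `b_ext` are hypothetical.

```python
import numpy as np


def sample_mixture(n, a, b, b_ext, tau=0.9, seed=0):
    """Draw from g = tau * U(a, b) + (1 - tau) * U(b, b_ext)."""
    rng = np.random.default_rng(seed)
    from_f = rng.uniform(size=n) < tau          # component labels
    inside = rng.uniform(a, b, size=n)          # draws from rho_k on Omega_k
    outside = rng.uniform(b, b_ext, size=n)     # draws from rho'_k outside Omega_k
    return np.where(from_f, inside, outside)


def mixture_cdf(x, a, b, b_ext, tau=0.9):
    """CDF G_k of the mixture; on (a, b) it equals tau * F_k."""
    x = np.asarray(x, dtype=float)
    f_in = np.clip((x - a) / (b - a), 0.0, 1.0)
    f_out = np.clip((x - b) / (b_ext - b), 0.0, 1.0)
    return tau * f_in + (1.0 - tau) * f_out


if __name__ == "__main__":
    draws = sample_mixture(10, a=0.0, b=1.0, b_ext=1.2)
    print(draws)
```

On the original support the mixture CDF is τ F_k, so the inequality G_k ≤ F_k used in the proof holds by construction.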

4. Illustrations

In this section, we deploy our derivative-free emulators to approximate analytical functions. For the setting of different parameters needed, we rely on the results of Corollary 5. Indeed, we use

V_{k} \sim U (- ξ, ξ)

with

ξ = {(d (\binom{d}{d_{0}}) {(\frac{1}{2 ρ_{min}})}^{d_{0}})}^{- 1 / 2}

;

L = d_{0} + 1

for the identified

d_{0}

. For each function, we use

N L

runs of the model to construct our emulators, corresponding to

f (X_{i} + β_{ℓ} h V_{i})

with the following:

$i = 1, \dots, N = 500$ ;
$β_{ℓ} \in {0, \pm 2^{k - 1} : k = 1, \dots, \frac{L - 1}{2}}$ if L is odd, and $β_{ℓ} \in {\pm 2^{k} : k = 1, \dots, \frac{L}{2}}$ otherwise;
$h = N^{- 1}$ .

Then, we predict the output for

N = 500

sample points, that is,

f (X_{i})

. Finally, sample values are generated using Sobol’s sequence, and we compare the predictions associated with the initial distributions, that is,

G_{k} = F_{k}

, with a mixture distribution of

F_{k}

(i.e.,

τ = 0.9

) (see Proposition 2).
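
The settings listed above can be generated with a few lines of Python. This is only a sketch: the function names are hypothetical, and the value of ρ_min used in the usage line (the density of U(-π, π)) is an assumption made for the illustration.

```python
import math


def beta_nodes(L):
    """Evaluation nodes: {0, ±2^(k-1)} for odd L and {±2^k} for even L."""
    if L % 2 == 1:
        magnitudes = [2.0 ** (k - 1) for k in range(1, (L - 1) // 2 + 1)]
        return [0.0] + [s * b for b in magnitudes for s in (1.0, -1.0)]
    magnitudes = [2.0 ** k for k in range(1, L // 2 + 1)]
    return [s * b for b in magnitudes for s in (1.0, -1.0)]


def xi_setting(d, d0, rho_min):
    """xi = (d * C(d, d0) * (1 / (2 rho_min))^d0)^(-1/2)."""
    return (d * math.comb(d, d0) * (1.0 / (2.0 * rho_min)) ** d0) ** -0.5


if __name__ == "__main__":
    d, d0, N = 3, 2, 500
    L = d0 + 1
    h = 1.0 / N
    print(beta_nodes(L), xi_setting(d, d0, rho_min=1.0 / (2.0 * math.pi)), h)
```

Both node sets contain exactly L values, so the budget of N L model runs is matched.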

4.1. Test Functions

4.1.1. Ishigami’s Function ( $d = 3$ )

The Ishigami function is given by

f (x) = sin (x_{1}) + 7 {sin}^{2} (x_{2}) + 0.1 x_{3}^{4} sin (x_{1}),

with

X_{j} \sim U (- π, π)

,

j = 1, 2, 3

. The sensitivity indices are

S_{1} = 0.3139

,

S_{2} = 0.4424

,

S_{3} = 0.0

,

S_{T_{1}} = 0.567

,

S_{T_{2}} = 0.442

, and

S_{T_{3}} = 0.243

. Thus, we have

d_{0} = 2

because

\sum_{j = 1}^{3} (S_{j} + S_{T_{j}}) = 2.01

(see Proposition 1). Moreover, we are going to remove (in our emulator) the main effect term corresponding to

X_{3}

, as

S_{3} = 0

. Figure 1 depicts the predictions versus observations (i.e., simulated outputs) for 500 sample points. We can see that our predictions are in line with the simulated values of the output for both distributions used.
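
For reference, the test function and the index sum quoted above can be coded as follows. This is a minimal Python sketch; the use of NumPy pseudo-random draws (rather than Sobol's sequence) and the seed are choices made here only for the illustration.

```python
import numpy as np


def ishigami(x):
    """Ishigami function with the coefficients 7 and 0.1 used in the text."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return (np.sin(x[:, 0]) + 7.0 * np.sin(x[:, 1]) ** 2
            + 0.1 * x[:, 2] ** 4 * np.sin(x[:, 0]))


# Sensitivity indices as reported in the text
S = [0.3139, 0.4424, 0.0]
S_T = [0.567, 0.442, 0.243]
index_sum = sum(S) + sum(S_T)  # quantity compared with 2 to select d_0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.uniform(-np.pi, np.pi, size=(500, 3))
    print(ishigami(sample)[:3], round(index_sum, 2))
```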

4.1.2. Sobol’s g-Function ( $d = 10$ )

The g-function [49] is defined as follows:

f (x) = \prod_{j = 1}^{d = 10} \frac{| 4 x_{j} - 2 | + a_{j}}{1 + a_{j}},

with

X_{j} \sim U (0, 1)

,

j = 1, \dots, d = 10

. This function is differentiable almost everywhere and has different properties according to the values of

a = (a_{j}, j = 1, 2, \dots, d)

[17]. Indeed, the following applies:

If $a = {[0, 0, 6.52, 6.52, 6.52, 6.52, 6.52, 6.52, 6.52, 6.52]}^{T}$ (i.e., type A), we have $S_{1} = S_{2} = 0.39$ , $S_{j} = 0.0069$ , $\forall j > 2$ , $S_{T_{1}} = S_{T_{2}} = 0.54$ , and $S_{T_{j}} = 0.013$ , $\forall j > 2$ . We have $d_{0} = 2$ , as $\sum_{j = 1}^{d} (S_{j} + S_{T_{j}}) = 2.01$ . Moreover, we have $S_{1} + S_{T_{2}} \approx 1$ , suggesting that we should only include $X_{1}$ and $X_{2}$ in our emulator;
If $a = {[50, 50, 50, 50, 50, 50, 50, 50, 50, 50]}^{T}$ (i.e., type B), we have $S_{j} = S_{T_{j}} = 0.1$ , $\forall j \in {1, 2, \dots, d}$ , leading to $d_{0} = 1$ ;
If $a = {[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}^{T}$ (i.e., type C), we have $S_{j} = 0.02$ and $S_{T_{j}} = 0.27$ , $\forall j \in {1, 2, \dots, d}$ , and we can check that $d_{0} = 4$ and that all the inputs are important. Thus, we have to include a lot of ANOVA components in our emulator with small effective effects since the variance of that function is fixed. More information is needed to design the structure of this function better.
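
The index values listed above can be reproduced from the well-known closed form of the g-function, in which each input contributes the partial variance V_j = 1 / (3 (1 + a_j)²) and the total variance is the product of the factors (1 + V_j) minus one. The Python sketch below uses that closed form; the function names are hypothetical.

```python
import numpy as np


def g_function(x, a):
    """Sobol's g-function for inputs in [0, 1]^d and coefficients a."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    a = np.asarray(a, dtype=float)
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=1)


def g_sobol_indices(a):
    """Analytical first-order and total indices of the g-function."""
    a = np.asarray(a, dtype=float)
    v = 1.0 / (3.0 * (1.0 + a) ** 2)       # partial variances V_j
    prod_all = np.prod(1.0 + v)
    total_var = prod_all - 1.0
    first = v / total_var
    total = v * (prod_all / (1.0 + v)) / total_var
    return first, total


if __name__ == "__main__":
    type_a = [0, 0] + [6.52] * 8
    s, st = g_sobol_indices(type_a)
    print(np.round(s, 4), np.round(st, 4))
```

Evaluating the same function for types B and C gives indices that agree with the rounded values quoted in the list.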

Figure 2 and Figure 3 depict the predictions versus the simulated outputs (i.e., observations) of the g-function of type A and type B, respectively, for 500 sample points. Even though we obtain quasi-perfect predictions in the case of the g-function of type B, those of type A face some difficulties in predicting small and null values.

5. Application: Heat Diffusion Models with Stochastic Initial Conditions

We deployed our emulators to approximate a high-dimensional model defined by the one-dimensional (1-D) diffusion PDE with stochastic initial conditions, that is,

\{\begin{matrix} \frac{\partial f}{\partial t} - D \frac{\partial^{2} f}{\partial^{2} x} = 0, & x \in] 0, 1 [, t \in [0, T] \\ f (x, t = 0) = Z (x) & x \in [0, 1] \\ f (x = 0, t) = 0, f (x = 1, t) = 1, & t \in [0, T] \end{matrix},

where

D = 0.0011

represents the diffusion coefficient. The quantity of interest (QoI) is given by

J (Z (x)) : = \frac{1}{2} \int_{0}^{T} \int_{0}^{10} {(f (x, t))}^{2} d x d t

. The spatial discretization consists of subdividing the spatial domain,

[0, 1]

, into d equally-sized cells, leading to d initial conditions or inputs, that is,

Z (x_{j})

with

j = 1, \dots, d

. We assume that the

d = 50

inputs are independent, and

X_{j} : = Z (x_{j}) \sim U (sin (2 π x_{j}) - 1.96, sin (2 π x_{j}) + 1.96)

. A time step of

0.025

was considered, starting from 0 up to

T = 5

.
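
To make the model concrete, the Python sketch below solves the diffusion problem with an explicit finite-difference scheme and accumulates the quantity of interest with a rectangle rule. The numerical scheme is not specified in the text, so everything here is an assumption of the sketch: the interior-node grid, the explicit time stepping, the integration of the squared solution over the unit spatial domain, and the function name `heat_qoi`.

```python
import numpy as np


def heat_qoi(z0, D=0.0011, T=5.0, dt=0.025):
    """Approximate J = 0.5 * integral of f^2 for the 1-D diffusion model.

    z0 holds the d initial values at interior nodes x_j = j / (d + 1);
    boundary values are f(0, t) = 0 and f(1, t) = 1.
    """
    z0 = np.asarray(z0, dtype=float)
    d = z0.size
    dx = 1.0 / (d + 1)
    lam = D * dt / dx ** 2
    if lam > 0.5:
        raise ValueError("explicit scheme unstable: reduce dt or d")
    f = np.concatenate(([0.0], z0, [1.0]))
    n_steps = int(round(T / dt))
    qoi = 0.0
    for _ in range(n_steps):
        qoi += 0.5 * np.sum(f ** 2) * dx * dt  # left rectangle rule in time
        # central second difference on interior nodes; boundaries stay fixed
        f[1:-1] = f[1:-1] + lam * (f[2:] - 2.0 * f[1:-1] + f[:-2])
    return qoi


if __name__ == "__main__":
    d = 50
    x = np.arange(1, d + 1) / (d + 1)
    rng = np.random.default_rng(0)
    z = np.sin(2.0 * np.pi * x) + rng.uniform(-1.96, 1.96, size=d)
    print(heat_qoi(z))
```

With d = 50, the stability ratio D dt / dx² is about 0.07, well inside the range in which the explicit scheme is stable.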

It is known [12] that the exact gradient can be computed as follows:

\nabla_{Z} J (Z (x)) = f^{a} (x, 0)

, where

f^{a} (x, 0)

stands for the adjoint model of f evaluated at

(x, t = 0)

. Note that only one evaluation of such a function is needed to obtain the gradient of the QoI. The adjoint model is given by (see ref. [12])

\{\begin{matrix} - \frac{\partial f^{a}}{\partial t} - D \frac{\partial^{2} f^{a}}{\partial^{2} x} = f, & x \in] 0, 1 [, t \in [0, T] \\ f^{a} (x = 0, t) = f^{a} (x = 1, t) = 0, & t \in [0, T] \\ f^{a} (x, T) = 0, & x \in [0, 1] \end{matrix} .

The values of the hyper-parameters derived in this paper (considered at the beginning of Section 4) were used to compute the results below. Using the exact values of the gradient (i.e.,

f^{a} (x, 0)

), we computed the main indices (

S_{j}

s) and the upper bounds of the total indices (

U B_{j}

s) (see Figure 4, top-left panel). It appears that the upper bounds are almost equal to the main indices, showing the absence of interactions. This information is confirmed by the fact that

\sum_{j = 1}^{d = 50} S_{j} = 1.09

, leading to

d_{0} = 1

. Based on this information, Figure 4 (top-right panel) depicts the predictions versus the observations (simulated outputs) using the derivative-based emulator with all the first-order partial derivatives. In the same sense, Figure 4 (bottom-left panel) depicts the observations versus predictions by including only the ANOVA components for which

U B_{j} > 0.01

, that is, 37 components. Both results are close together and are in line with the observations. Finally, Figure 4 (bottom-right panel) shows the observations versus predictions for derivative-free emulators using only the components for which

U B_{j} > 0.01

. It turns out that our emulators provide reliable estimations. As expected (see MSEs), the derivative-based emulator using exact values of derivatives performs better.

6. Conclusions

In this paper, we have proposed simple, practical, and sound emulators of simulators or estimators of functions using either the available derivatives and distribution functions of inputs or derivative-free methods, such as stochastic approximations. Since our emulators or estimators rely exactly on Db-ANOVA, Sobol’s indices and their upper bounds were used to derive the appropriate structures of such emulators so as to reduce epistemic uncertainty. The derivative-based and derivative-free emulators reach the parametric rate of convergence (i.e.,

O (N^{- 1})

) and have dimension-free biases. Moreover, the former emulators enjoy dimension-free MSEs when all cross-partial derivatives are available and, therefore, can cope with higher-dimensional models. However, the MSEs of the derivative-free estimators depend on dimensionality, and we have shown that the stability and accuracy of such emulators require about

N \approx {(d + 1)}^{d}

model runs for full emulators and about

N \approx min (d^{2 d_{0}}, 2^{d} d^{d_{0}})

runs for unbiased, truncated emulators of order

d_{0}

.

To be able to deploy our emulators in practice, we have provided the best known values for the hyper-parameters needed. The numerical results have revealed that our emulators provide efficient predictions of models once the adequate structures of such models are used. While such results are promising, further improvements are going to be investigated in our next works by (i) considering distributions of

V_{j}

s that may help in reducing the dimensionality in MSEs, (ii) taking into account the discrepancy errors by using the output observations (rather than their mean only), and (iii) considering local emulators. It is also worth investigating adaptations of such methods in the presence of empirical data.

Funding

This research received no external funding, except the APC funded by stats-MDPI.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No empirical datasets are used in this paper. All simulated data are already in the paper.

Acknowledgments

We would like to thank the three reviewers for their suggestions and comments that have helped improve our manuscript.

Conflicts of Interest

The author has no conflicts of interest to declare regarding the content of this paper.

Appendix A. Derivations of Unbiased Truncations (Proposition 1)

Keeping in mind the Sobol indices, it is known that

\sum_{\begin{matrix} v \subset {1, \dots, d} \\ | v | > 0 \end{matrix}} S_{v} = 1

, which comes down to

\sum_{\begin{matrix} v \subset {1, \dots, d} \\ | v | \leq d_{0} \end{matrix}} S_{v} = 1

for functions of the form

f (X) = \sum_{\begin{matrix} v \subset {1, \dots, d} \\ | v | \leq d_{0} \end{matrix}} f_{v} (X_{v})

. Thus, we have

d_{0} = \sum_{\begin{matrix} v \subset {1, \dots, d} \\ | v | \leq d_{0} \end{matrix}} d_{0} S_{v}, \sum_{j = 1}^{d} S_{T_{j}} = \sum_{\begin{matrix} v \subset {1, \dots, d} \\ | v | \leq d_{0} \end{matrix}} | v | S_{v} .

Taking the difference yields

d_{0} - \sum_{j = 1}^{d} S_{T_{j}} = \sum_{\begin{matrix} v \subset {1, \dots, d} \\ | v | \leq d_{0} - 1 \end{matrix}} (d_{0} - | v |) S_{v} ⟹ d_{0} - \sum_{j = 1}^{d} S_{T_{j}} - (d_{0} - 1) \sum_{j = 1}^{d} S_{j} = \sum_{\begin{matrix} v \subset {1, \dots, d} \\ 2 \leq | v | \leq d_{0} - 1 \end{matrix}} (d_{0} - | v |) S_{v} \geq 0

which implies that

\sum_{j = 1}^{d} S_{T_{j}} + (d_{0} - 1) \sum_{j = 1}^{d} S_{j} \leq d_{0}

.

Additionally, taking

d_{0} - (d_{0} - 1) \sum_{j = 1}^{d} S_{T_{j}} = \sum_{\begin{matrix} v \subset {1, \dots, d} \\ | v | \leq d_{0} \end{matrix}} [d_{0} - (d_{0} - 1) | v |] S_{v}

yields

d_{0} - (d_{0} - 1) \sum_{j = 1}^{d} S_{T_{j}} - \sum_{j = 1}^{d} S_{j} = \sum_{\begin{matrix} v \subset {1, \dots, d} \\ 2 \leq | v | \leq d_{0} \end{matrix}} [d_{0} - (d_{0} - 1) | v |] S_{v} \leq 0,

which implies

(d_{0} - 1) \sum_{j = 1}^{d} S_{T_{j}} + \sum_{j = 1}^{d} S_{j} \geq d_{0}

.
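
The two inequalities derived in this appendix suggest a simple numerical screening of the truncation order. The Python sketch below returns the smallest order that is compatible with both inequalities up to a tolerance. It is a heuristic reading of these necessary conditions rather than the statement of Proposition 1, and the function name and the tolerance (meant to absorb rounding of the reported indices) are hypothetical.

```python
def smallest_consistent_order(first, total, tol=0.05):
    """Smallest d0 satisfying both inequalities of this appendix up to tol.

    first: main indices S_j; total: total indices S_Tj.
    """
    sum_s, sum_t = sum(first), sum(total)
    for d0 in range(1, len(first) + 1):
        upper_ok = sum_t + (d0 - 1) * sum_s <= d0 + tol
        lower_ok = (d0 - 1) * sum_t + sum_s >= d0 - tol
        if upper_ok and lower_ok:
            return d0
    return None


if __name__ == "__main__":
    # Ishigami indices as reported in Section 4.1.1
    print(smallest_consistent_order([0.3139, 0.4424, 0.0],
                                    [0.567, 0.442, 0.243]))
```

For an order of two, both inequalities collapse to the single condition that the main and total indices sum to two, which is the check used for the Ishigami function and the g-function of type A.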

Appendix B. Proof of Theorem 1

Denote

\vec{ı} : = (i_{1}, \dots, i_{d}) \in N^{d}

,

{||\vec{ı}||}_{1} = i_{1} + \dots + i_{d}

,

\vec{ı}! : = i_{1}! \dots i_{d}!

,

x^{\vec{ı}} : = x_{1}^{i_{1}} \dots x_{d}^{i_{d}}

, and

D^{(\vec{ı})} f : = \partial_{1}^{i_{1}} \dots \partial_{d}^{i_{d}} f

. The Taylor expansion of

f (x + β_{ℓ} h V)

about

x

of order

α

is given by

\begin{matrix} f (x + β_{ℓ} h V) & = & \sum_{r = 0}^{α} \sum_{{||\vec{ı}||}_{1} = r} \frac{D^{(\vec{ı})} f (x)}{\vec{ı}!} β_{ℓ}^{r} {(h V)}^{\vec{ı}} + O ({||β_{ℓ} h V||}_{1}^{α + 1}) . \end{matrix}

For any

w \subseteq {1, \dots, d}

with the cardinality

| w |

; using

𝟙_{w} (\cdot)

for the indicator function and

\vec{w} : = (𝟙_{w} (1), \dots, 𝟙_{w} (d))

lead to

D^{(\vec{w})} f = D^{| w |} f

. Additionally, using

E_{k} : = \frac{G_{k} (X_{k}^{'}) - 𝟙_{X_{k}^{'} \geq x_{k}}}{g_{k} (X_{k}^{'})}

implies that

R_{k} = \frac{V_{k}}{h_{k} σ^{2}} E_{k}

,

k = 1, \dots, d

.

First, by evaluating the above expansion at

X^{'}

and taking the expectation with respect to

V

,

A : = \sum_{ℓ = 1}^{L} C_{ℓ}^{(| u |)} E_{V} [f (X^{'} + β_{ℓ} h V) e_{u}^{(1 : d)} (R (x, X^{'}, V))]

can be written as

\begin{matrix} A & = & \sum_{r \geq 0} \sum_{{||\vec{ı}||}_{1} = r} \frac{D^{(\vec{ı})} f (X^{'})}{\vec{ı}!} (\sum_{ℓ} C_{ℓ}^{(| u |)} β_{ℓ}^{r}) E_{V} [{(h V)}^{\vec{ı}} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | = | u | \end{matrix}} {(\frac{V}{h σ^{2}} E)}^{\vec{w}}] \\ & = & \sum_{\begin{matrix} r \geq 0 \\ {||\vec{ı}||}_{1} = r \end{matrix}} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | = | u | \end{matrix}} \frac{D^{(\vec{ı})} f (X^{'}) (\sum_{ℓ} C_{ℓ}^{(| u |)} β_{ℓ}^{r})}{\vec{ı}! σ^{2 | u |}} E_{V} [{(V)}^{\vec{ı} + \vec{w}} {(h)}^{\vec{ı} - \vec{w}} {(E)}^{\vec{w}}] . \end{matrix}

We can see that

E_{V} [{(V)}^{\vec{ı} + \vec{w}} {(h)}^{\vec{ı} - \vec{w}}] \neq 0

iff

\vec{ı} + \vec{w} = 2 \vec{q}, \forall \vec{q} \in N^{d}

. Equation

\vec{ı} + \vec{w} = 2 \vec{q}

implies

i_{k} = 2 q_{k} \geq 0

if

k \notin w

; otherwise,

i_{k} = 2 q_{k} - 1 \geq 0

. The last quantity is equivalent to

i_{k} = 2 q_{k} + 1

when

k \in w

with

q_{k} \in N

, and it leads to

\vec{ı} = 2 \vec{q} + \vec{w}

,

\forall \vec{q} \in N^{d}

, which also implies that

{||\vec{ı}||}_{1} \geq {||\vec{w}||}_{1} = | u |

.

When

{||\vec{q}||}_{1} = 0

, we have

\vec{ı} = \vec{w}

;

D^{(\vec{ı})} f = D^{w} f

and

E [{(V)}^{\vec{ı} + \vec{w}} {(h)}^{\vec{ı} - \vec{w}}] = E [{(V)}^{2 \vec{w}}] = σ^{2 | u |}

, by independence. Thus, we have

\begin{matrix} A & = & \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | = | u | \end{matrix}} D^{| w |} f (X^{'}) (\sum_{ℓ} C_{ℓ}^{(| u |)} β_{ℓ}^{| u |}) \prod_{k \in w} E_{k} + \sum_{\begin{matrix} s \geq 1 \\ {||\vec{q}||}_{1} = s \end{matrix}} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | = | u | \end{matrix}} \\ & & \times \frac{D^{(2 \vec{q} + \vec{w})} f (X^{'}) (\sum_{ℓ} C_{ℓ}^{(| u |)} β_{ℓ}^{| u | + 2 s})}{(2 \vec{q} + \vec{w})! σ^{2 | u |}} E [{(V)}^{2 (\vec{q} + \vec{w})} {(h)}^{2 \vec{q}} {(E)}^{\vec{w}}], \end{matrix}

using the change of variable

2 s = r - | u |

. At this point, the setting

L = 1, β_{ℓ} = 1

and

C_{ℓ}^{(| u |)} = 1

gives the approximation of

\sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | = | u | \end{matrix}} D^{| w |} f (X^{'}) \prod_{k \in w} \frac{G_{k} (X_{k}^{'}) - 𝟙_{X_{k}^{'} \geq x_{k}}}{g_{k} (X_{k}^{'})}

of order

O ({||h||}_{2}^{2})

for all

u \subseteq {1, \dots, d}

.

Second, taking the expectation with respect to

X^{'}

and the sum over

| u | = 1, \dots, d

, we obtain the result, that is,

\sum_{| u | = 1}^{d} E_{X^{'}} [A] = \sum_{| u | = 1}^{d} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | = | u | \end{matrix}} E_{X^{'}} [D^{| w |} f (X^{'}) \prod_{k \in w} \frac{G_{k} (X_{k}^{'}) - 𝟙_{X_{k}^{'} \geq x_{k}}}{g_{k} (X_{k}^{'})}] + O ({||h||}_{2}^{2}),

bearing in mind Equation (1).

Third, we can increase this order up to

O ({||h||}_{2}^{2 L})

by using the constraints

\sum_{ℓ = 1}^{L} C_{ℓ}^{(| u |)} β_{ℓ}^{2 s + | u |} = δ_{0, s}

for

s = 0, 1, \dots, L

to eliminate some higher-order terms. Thus, the order

O ({||h||}_{2}^{2 L})

is reached. Other constraints can lead to the order

O ({||h||}_{2}^{2 α_{| u |}})

with

α_{| u |} = 1, \dots, L

.
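
The moment constraints above form a small linear system in the coefficients. The Python sketch below solves it in the minimum-norm least-squares sense for a chosen number of constraints. This is an illustration only: Equation (5) may select the coefficients differently, and the function name and the argument `n_constraints` are hypothetical.

```python
import numpy as np


def solve_coefficients(betas, order, n_constraints):
    """Solve sum_l C_l * beta_l^(2 s + order) = delta_{0, s}, s = 0..n_constraints-1.

    Returns the minimum-norm solution and the residual of the constraints.
    """
    betas = np.asarray(betas, dtype=float)
    powers = order + 2 * np.arange(n_constraints)
    A = betas[None, :] ** powers[:, None]  # one row per constraint
    rhs = np.zeros(n_constraints)
    rhs[0] = 1.0
    coeffs, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return coeffs, A @ coeffs - rhs


if __name__ == "__main__":
    c, res = solve_coefficients([-2.0, -1.0, 0.0, 1.0, 2.0], order=1,
                                n_constraints=2)
    print(c, res)
```

With the symmetric nodes {0, ±1, ±2} and first-order derivatives, only two constraints are linearly independent, and the minimum-norm solution coincides with the classical five-point central-difference weights (1/12, -2/3, 0, 2/3, -1/12).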

Appendix C. Proof of Lemma 1

Recall that

|E_{k}| = |\frac{G_{k} (X_{k}^{'}) - 𝟙_{X_{k}^{'} \geq x_{k}}}{g_{k} (X_{k}^{'})}|

. By using the definition of the absolute value and the fact that

0 \leq G (x) \leq 1

, we can check that

E |E_{k}| \leq \frac{1}{2 ρ_{min}}

. Additionally, using the following inequality (see Lemma 1 in ref. [10])

M_{| w |}^{'} \leq 2 γ_{0} {(M_{0}^{'})}^{1 - | w | / d} {(M_{d}^{'})}^{| w | / d} = 2 γ_{0} M_{0}^{'} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{| w | / d},

for a given

γ_{0}

, we can write

\begin{matrix} D_{d_{0}, ρ_{min}} & : = & \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | > d_{0} \end{matrix}} M_{| w |}^{'} {(\frac{1}{2 ρ_{min}})}^{| w |} \leq 2 γ_{0} M_{0}^{'} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | > d_{0} \end{matrix}} {[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d}]}^{| w |} \\ = & 2 γ_{0} M_{0}^{'} \{\sum_{\begin{matrix} w \subseteq {1, \dots, d} \end{matrix}} {[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d}]}^{| w |} - \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | \leq d_{0} \end{matrix}} {[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d}]}^{| w |}\} \\ \leq & 2 γ_{0} M_{0}^{'} \{\sum_{\begin{matrix} ℓ = 0 \end{matrix}}^{d} (\binom{d}{ℓ}) {[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d}]}^{ℓ} - \sum_{\begin{matrix} ℓ = 0 \end{matrix}}^{d_{0}} (\binom{d_{0}}{ℓ}) {[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d}]}^{ℓ}\} \\ = & 2 γ_{0} M_{0}^{'} \{{[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d} + 1]}^{d} - {[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d} + 1]}^{d_{0}}\} \\ = & 2 γ_{0} M_{0}^{'} {[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d} + 1]}^{d_{0}} \{{[\frac{1}{2 ρ_{min}} {(\frac{M_{d}^{'}}{M_{0}^{'}})}^{1 / d} + 1]}^{d - d_{0}} - 1\}, \end{matrix}

and the result holds.
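The last two steps of the display can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, and the scalar `x` stands for the quantity \frac{1}{2 ρ_{min}} {(M_{d}^{'} / M_{0}^{'})}^{1 / d}, assumed positive. It enumerates the subsets w with | w | > d_{0} by brute force and compares the resulting sum with the closed form {(1 + x)}^{d_{0}} [{(1 + x)}^{d - d_{0}} - 1].

```python
from itertools import combinations


def tail_subset_sum(d, d0, x):
    """Sum of x**|w| over all subsets w of {1, ..., d} with |w| > d0 (brute force)."""
    total = 0.0
    for size in range(d0 + 1, d + 1):
        # Count the subsets of this size by explicit enumeration.
        n_subsets = sum(1 for _ in combinations(range(d), size))
        total += n_subsets * x ** size
    return total


def lemma1_closed_form(d, d0, x):
    """Closed form (1 + x)**d0 * ((1 + x)**(d - d0) - 1) from the last line of the display."""
    return (1.0 + x) ** d0 * ((1.0 + x) ** (d - d0) - 1.0)


# Small illustration with d = 8 inputs and x = 0.5 (arbitrary values).
for d0 in range(0, 9):
    print(d0, tail_subset_sum(8, d0, 0.5), lemma1_closed_form(8, d0, 0.5))
```

The two quantities coincide for d_{0} = 0 and d_{0} = d; in between, the closed form is larger because \binom{d}{ℓ} \geq \binom{d_{0}}{ℓ}.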

Appendix D. Proof of Theorem 2

Recall that

R_{k} = \frac{V_{k}}{h_{k} σ^{2}} E_{k}

with

E_{k} : = \frac{G_{k} (X_{k}^{'}) - 𝟙_{X_{k}^{'} \geq x_{k}}}{g_{k} (X_{k}^{'})}

,

k = 1, \dots, d

. Using

A_{1} : = \sum_{p = 1}^{d_{0}} \sum_{ℓ = 1}^{L} C_{ℓ}^{(p)} E \{[f (X^{'} + β_{ℓ} h V)] e_{p}^{(1 : d)} (R (x, X^{'}, V))\},

the bias

B : = A_{1} - f^{c} (x)

becomes

\begin{matrix} B & = & \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} \{\sum_{ℓ = 1}^{L} C_{ℓ}^{(| w |)} E [f (X^{'} + β_{ℓ} h V) \prod_{k \in w} \frac{V_{k}}{h_{k} σ^{2}} E_{k}] - E_{X^{'}} [D^{| w |} f (X^{'}) \prod_{k \in w} E_{k}]\} \\ - \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | > d_{0} \end{matrix}} E_{X^{'}} [D^{| w |} f (X^{'}) \prod_{k \in w} E_{k}] \\ = & \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} E_{X^{'}} \{(\prod_{k \in w} E_{k}) (\sum_{ℓ = 1}^{L} C_{ℓ}^{(| w |)} E_{V} [f (X^{'} + β_{ℓ} h V) \prod_{k \in w} \frac{V_{k}}{h_{k} σ^{2}}] - D^{| w |} f (X^{'}))\} \\ - \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | > d_{0} \end{matrix}} E_{X^{'}} [D^{| w |} f (X^{'}) \prod_{k \in w} E_{k}] . \end{matrix}

Note that the quantity

\sum_{ℓ = 1}^{L} C_{ℓ}^{(| w |)} E_{V} [f (X^{'} + β_{ℓ} h V) \prod_{k \in w} \frac{V_{k}}{h_{k} σ^{2}}] - D^{| w |} f (X^{'})

has been investigated in ref. [13]. To make use of such results in our context given by Equation (5), let

L_{w}^{'} : = ([\frac{r^{*} - | w |}{2}] + L - r^{*}) 𝟙_{| w | \leq r^{*}} + (L - r^{*} - 1) 𝟙_{| w | > r^{*}}

. Thus, we have

|D^{| w |} f (x) - \sum_{ℓ = 1}^{L} C_{ℓ}^{(| w |)} E [f (x + β_{ℓ} h V) \prod_{k \in w} \frac{V_{k}}{h_{k} σ^{2}}]| \leq σ^{2 L_{w}^{'}} M_{| w | + 2 L_{w}^{'}} K_{1, L_{w}^{'}} {||h^{2}||}_{2}^{L_{w}^{'}} .

(A1)

When

V_{k} \sim U (- ξ, ξ)

with

ξ > 0

and

k = 1, \dots, d

, we have

|D^{| w |} f (x) - \sum_{ℓ = 1}^{L} C_{ℓ}^{(| w |)} E [f (x + β_{ℓ} h V) \prod_{k \in w} \frac{V_{k}}{h_{k} σ^{2}}]| \leq M_{| w | + 2 L_{w}^{'}} ξ^{2 L_{w}^{'}} {||h^{2}||}_{1}^{L_{w}^{'}} Γ_{| w | + 2 L_{w}^{'}} .

(A2)

Using Equation (A1), the lower bound

g_{k} > ρ_{min}

and the fact that

E |E_{k}| \leq \frac{1}{2 ρ_{min}}

, we can write

\begin{matrix} | B | & \leq & \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} σ^{2 L_{w}^{'}} M_{| w | + 2 L_{w}^{'}} K_{1, L_{w}^{'}} {||h^{2}||}_{2}^{L_{w}^{'}} \prod_{k \in w} E [|E_{k}|] + \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | > d_{0} \end{matrix}} M_{| w |}^{'} \prod_{k \in w} E [|E_{k}|] \\ \leq & \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} σ^{2 L_{w}^{'}} M_{| w | + 2 L_{w}^{'}} K_{1, L_{w}^{'}} {||h^{2}||}_{2}^{L_{w}^{'}} \prod_{k \in w} \frac{1}{2 ρ_{min}} + \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | > d_{0} \end{matrix}} M_{| w |}^{'} \prod_{k \in w} \frac{1}{2 ρ_{min}}, \end{matrix}

where

M_{| w |}^{'} = {||D^{| w |} f||}_{\infty}

. The results hold by using Lemma 1.
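The quantity bounded in Equation (A1) can be illustrated on a toy case. The sketch below assumes the simplest configuration L = 1, C_{1} = 1, β_{1} = 1, a common step h_{k} = h, and independent V_{k} \sim U (- ξ, ξ) with σ^{2} = ξ^{2} / 3; these choices, the d = 2 test function and the function names are illustrative assumptions, not the constrained coefficients of Equation (5). The expectation over V is replaced by a midpoint rule on a symmetric grid, so the output is deterministic.

```python
import math


def derivative_surrogate(f, x, w, h=0.1, xi=1.0, n=60):
    """Midpoint-rule value of E_V[f(x + h V) * prod_{k in w} V_k / (h sigma^2)]
    for d = 2 independent inputs V_k ~ U(-xi, xi)."""
    sigma2 = xi ** 2 / 3.0  # variance of U(-xi, xi)
    nodes = [-xi + (i + 0.5) * 2.0 * xi / n for i in range(n)]
    total = 0.0
    for v1 in nodes:
        for v2 in nodes:
            v = (v1, v2)
            weight = 1.0
            for k in w:
                weight *= v[k] / (h * sigma2)
            total += f(x[0] + h * v1, x[1] + h * v2) * weight
    return total / n ** 2


def toy_model(x1, x2):
    """Smooth test function with known derivatives (an arbitrary choice)."""
    return math.sin(x1) * math.exp(x2)


point = (0.3, 0.2)
exact = math.cos(point[0]) * math.exp(point[1])  # equals both d/dx1 and d^2/(dx1 dx2) here
print("first-order surrogate :", derivative_surrogate(toy_model, point, (0,)))
print("cross-partial surrogate:", derivative_surrogate(toy_model, point, (0, 1)))
print("exact derivative value :", exact)
```

With a single evaluation point per draw, the approximation error is of order h^{2}; the additional constraints on C_{ℓ} and β_{ℓ} are what raise this order in the bounds above.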

Appendix E. Proof of Corollary 1

First, keeping in mind Theorem 2, we can see that the smallest value of

L_{w}^{'}

is

L - r^{*} - 1

, which is reached when

| w | > r^{*}

. Thus, the bias verifies

\begin{matrix} |B| & \leq & {||h^{2}||}_{2}^{L - r^{*} - 1} σ^{2 (L - r^{*} - 1)} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ r^{*} < | w | \leq d_{0} \end{matrix}} K_{w, (L - r^{*} - 1)} M_{| w | + 2 (L - r^{*} - 1)} {(\frac{1}{2 ρ_{min}})}^{| w |} \\ + D_{d_{0}, ρ_{min}} + O ({||h^{2}||}_{2}^{L - r^{*} - 1}) . \end{matrix}

Second, using

K_{1, r^{*}, d_{0}}^{max}

, we can write

\begin{matrix} A_{3} & : = & \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ r^{*} < | w | \leq d_{0} \end{matrix}} K_{w, (L - r^{*} - 1)} M_{| w | + 2 (L - r^{*} - 1)} {(\frac{1}{2 ρ_{min}})}^{| w |} \leq K_{1, r^{*}, d_{0}}^{max} \sum_{ı = r^{*} + 1}^{d_{0}} (\binom{d}{ı}) {(\frac{1}{2 ρ_{min}})}^{ı} \\ \leq & 2 K_{1, r^{*}, d_{0}}^{max} ρ_{min} {(\frac{d}{2 ρ_{min}})}^{r^{*} + 1} \frac{{(\frac{d}{2 ρ_{min}})}^{d_{0} - r^{*}} - 1}{d - 2 ρ_{min}}, \end{matrix}

because

(\binom{d}{ı}) \leq d^{ı}

and

\sum_{ı = r^{*} + 1}^{d_{0}} (\binom{d}{ı}) {(\frac{1}{2 ρ_{min}})}^{ı} \leq \sum_{ı = r^{*} + 1}^{d_{0}} {(\frac{d}{2 ρ_{min}})}^{ı} = {(\frac{d}{2 ρ_{min}})}^{r^{*} + 1} \frac{{(\frac{d}{2 ρ_{min}})}^{d_{0} - r^{*}} - 1}{\frac{d}{2 ρ_{min}} - 1}

.
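The geometric-sum step above can be verified numerically. The following sketch uses hypothetical helper names and arbitrary parameter values, and assumes d \neq 2 ρ_{min} so that the closed form is defined. It compares the binomial sum with its geometric upper bound and with the equivalent form that appears in the bound on A_{3}.

```python
from math import comb


def binomial_sum(d, r_star, d0, rho_min):
    """Sum over i = r* + 1, ..., d0 of C(d, i) * (1 / (2 rho_min))**i."""
    x = 1.0 / (2.0 * rho_min)
    return sum(comb(d, i) * x ** i for i in range(r_star + 1, d0 + 1))


def geometric_bound(d, r_star, d0, rho_min):
    """q**(r* + 1) * (q**(d0 - r*) - 1) / (q - 1) with q = d / (2 rho_min), q != 1."""
    q = d / (2.0 * rho_min)
    return q ** (r_star + 1) * (q ** (d0 - r_star) - 1.0) / (q - 1.0)


def a3_form(d, r_star, d0, rho_min):
    """Same bound written as 2 rho_min q**(r* + 1) (q**(d0 - r*) - 1) / (d - 2 rho_min)."""
    q = d / (2.0 * rho_min)
    return 2.0 * rho_min * q ** (r_star + 1) * (q ** (d0 - r_star) - 1.0) / (d - 2.0 * rho_min)


for args in [(10, 1, 4, 0.5), (6, 0, 3, 0.2), (5, 2, 5, 1.5)]:
    print(args, binomial_sum(*args), geometric_bound(*args), a3_form(*args))
```

The bound relies only on \binom{d}{ı} \leq d^{ı}, so it can be loose when d is large relative to d_{0}.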

Finally, if

V_{k} \sim U (- ξ, ξ)

with

ξ > 0

and

k = 1, \dots, d

, the following bias is used to derive the result:

\begin{matrix} |B| & \leq & {||h^{2}||}_{1}^{L - r^{*} - 1} ξ^{2 (L - r^{*} - 1)} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ r^{*} < | w | \leq d_{0} \end{matrix}} M_{| w | + 2 (L - r^{*} - 1)} Γ_{| w | + 2 (L - r^{*} - 1)} {(\frac{1}{2 ρ_{min}})}^{| w |} \\ + D_{d_{0}, ρ_{min}} + O ({||h^{2}||}_{1}^{L - r^{*} - 1}) . \end{matrix}

Appendix F. Proof of Corollary 2

For

r^{*} < | w | \leq d - 1

, we can see that

L_{w}^{'} = 1

, and the order of approximation in Corollary 1 becomes

O ({||h^{2}||}_{2})

or

O ({||h^{2}||}_{1})

because

| w | + 2 \leq d + 1

and

L = r^{*} + 2

. When

| w | = d

, the smallest order is obtained thanks to ref. [13], Corollary 2.

Appendix G. Proof of Theorem 3

For the variance of our emulator, we can write

\begin{matrix} V [\hat{f_{N, d_{0}}^{c}} (x)] & = & \frac{1}{N} V [\sum_{p = 1}^{d_{0}} \sum_{ℓ = 1}^{L} C_{ℓ}^{(p)} f (X^{'} + β_{ℓ} h V) e_{p}^{(1 : d)} (R (x, X^{'}, V))] \\ \leq & \frac{1}{N} E [{\{\sum_{p = 1}^{d_{0}} \sum_{ℓ = 1}^{L} C_{ℓ}^{(p)} f (X^{'} + β_{ℓ} h V) e_{p}^{(1 : d)} (R (x, X^{'}, V))\}}^{2}] \\ \leq & \frac{1}{N} E [{\{\sum_{p = 1}^{d_{0}} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ | w | = p \end{matrix}} (\prod_{k \in w} \frac{V_{k}}{h_{k} σ^{2}} E_{k}) (\sum_{ℓ = 1}^{L} C_{ℓ}^{(p = | w |)} f (X^{'} + β_{ℓ} h V))\}}^{2}] . \end{matrix}

Using

A_{2} : = \sum_{ℓ = 1}^{L} C_{ℓ}^{(| w |)} f (X^{'} + β_{ℓ} h V)

, and keeping in mind Equation (5), we can write

\begin{matrix} |A_{2}| & = & |\sum_{ℓ = 1}^{L} C_{ℓ}^{(| w |)} [f (X^{'} + β_{ℓ} h V) - \sum_{r = 0}^{min (r^{*}, | w | - 1)} \sum_{{||\vec{ı}||}_{1} = r} \frac{D^{(\vec{ı})} f (X^{'})}{\vec{ı}!} β_{ℓ}^{r} {(h V)}^{\vec{ı}}]| \\ \leq & M_{min (r^{*}, | w | - 1) + 1} {||h V||}_{2}^{min (r^{*}, | w | - 1) + 1} \sum_{ℓ = 1}^{L} |C_{ℓ}^{(| w |)} β_{ℓ}^{min (r^{*}, | w | - 1) + 1}|, \end{matrix}

because

f \in H_{| w | + 1}

. Keeping in mind that

Γ_{min (r^{*}, | w | - 1) + 1} = \sum_{ℓ = 1}^{L} |C_{ℓ}^{(| w |)} β_{ℓ}^{min (r^{*}, | w | - 1) + 1}|

and using

ϝ_{r^{*}, | w |} : = M_{min (r^{*}, | w | - 1) + 1} Γ_{min (r^{*}, | w | - 1) + 1}

, the variance becomes

\begin{matrix} V [\hat{f_{N, d_{0}}^{c}} (x)] & \leq & \frac{1}{N} E [{\{\sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} ϝ_{r^{*}, | w |} {||h V||}_{2}^{min (r^{*}, | w | - 1) + 1} (\prod_{k \in w} \frac{V_{k}}{h_{k} σ^{2}} E_{k})\}}^{2}] \\ \leq & \frac{{(ϝ_{r^{*}, d_{0}}^{max})}^{2}}{N} E [{\{\sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} \prod_{k \in w} (\frac{V_{k} E_{k}}{h_{k} σ^{2}} {||h V||}_{2}^{\frac{min (r^{*}, | w | - 1) + 1}{| w |}})\}}^{2}] \\ = & \frac{{(ϝ_{r^{*}, d_{0}}^{max})}^{2}}{N} E [\sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} \prod_{k \in w} {(\frac{V_{k} E_{k}}{h_{k} σ^{2}} {||h V||}_{2}^{\frac{min (r^{*}, | w | - 1) + 1}{| w |}})}^{2}], \end{matrix}

because

E [V_{k}] = 0

,

E [V_{k} {||h V||}_{2}^{\frac{min (r^{*}, | w | - 1) + 1}{| w |}}] = 0

and when

w \neq w^{'}

,

E [\prod_{k \in w} (\frac{V_{k} E_{k}}{h_{k} σ^{2}} {||h V||}_{2}^{\frac{min (r^{*}, | w | - 1) + 1}{| w |}}) \prod_{ℓ \in w^{'}} (\frac{V_{ℓ} E_{ℓ}}{h_{ℓ} σ^{2}} {||h V||}_{2}^{\frac{min (r^{*}, | w^{'} | - 1) + 1}{| w^{'} |}})] = 0 .

For the second result, by expanding

E_{k}^{2}

and knowing that

0 \leq G (x) \leq 1

, we can see that

E [E_{k}^{2}] \leq \frac{1}{3 ρ_{min}^{2}}

. Additionally, using

Z_{k} = V_{k} / σ

and the fact that

h_{k} = h, r^{*} = d_{0} - 1

and

| w | \leq d_{0}

, we have

\begin{matrix} V [\hat{f_{N, d_{0}}^{c}} (x)] & \leq & \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} \prod_{k \in w} (\frac{E [V_{1}^{2} {||V||}_{2}^{2}]}{σ^{4}} E [E_{k}^{2}]) \\ \leq & \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} \prod_{k \in w} (E [Z_{1}^{2} {||Z||}_{2}^{2}] E [E_{k}^{2}]) \\ \leq & \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} \sum_{\begin{matrix} w \subseteq {1, \dots, d} \\ 0 < | w | \leq d_{0} \end{matrix}} {(\frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{| w |} = \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} \sum_{p = 1}^{d_{0}} (\binom{d}{p}) {(\frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{p} \\ \leq & \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} \frac{d E [Z_{1}^{2} {||Z||}_{2}^{2}]}{d E [Z_{1}^{2} {||Z||}_{2}^{2}] - 3 ρ_{min}^{2}} [{(\frac{d E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{d_{0}} - 1] . \end{matrix}

(A3)
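The moment bounds on E_{k} used above, E |E_{k}| \leq 1 / (2 ρ_{min}) and E [E_{k}^{2}] \leq 1 / (3 ρ_{min}^{2}), can be illustrated as follows. The sketch assumes that X_{k}^{'} has density g_{k} \geq ρ_{min}, so that u = G_{k} (X_{k}^{'}) is uniform on (0, 1) and the moments of ρ_{min} |E_{k}| are bounded by integrals of |u - 𝟙_{u \geq t}|^{p} with t = G_{k} (x_{k}). This distributional assumption and the function names are ours.

```python
def indicator_moment(t, power, n=20000):
    """Midpoint-rule value of the integral over u in (0, 1) of |u - 1{u >= t}|**power."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        total += abs(u - (1.0 if u >= t else 0.0)) ** power
    return total / n


def indicator_moment_exact(t, power):
    """Closed form (t**(power + 1) + (1 - t)**(power + 1)) / (power + 1)."""
    return (t ** (power + 1) + (1.0 - t) ** (power + 1)) / (power + 1)


# The closed form is maximal at t = 0 or t = 1, giving 1/2 (power 1) and 1/3 (power 2).
for t in (0.0, 0.25, 0.5, 0.9, 1.0):
    print(t, indicator_moment_exact(t, 1), indicator_moment_exact(t, 2))
```

Dividing by ρ_{min} and ρ_{min}^{2}, respectively, recovers the two constants used in Appendices C and G under the stated assumption.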

Finally, when

E [Z_{1}^{2} {||Z||}_{2}^{2}] \geq 3 ρ_{min}^{2}

and knowing that

\sum_{p = 1}^{d_{0}} (\binom{d}{p}) \leq \sum_{p = 1}^{d_{0}} d^{p} = \frac{d^{d_{0}} - 1}{d - 1} d \leq 2 d^{d_{0}}

, we can write

\begin{matrix} V [\hat{f_{N, d_{0}}^{c}} (x)] & \leq & \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} \sum_{p = 1}^{d_{0}} (\binom{d}{p}) {(\frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{p} \\ \leq & \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} {(\frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{d_{0}} \sum_{p = 1}^{d_{0}} (\binom{d}{p}) \leq \frac{2 {(ϝ_{d_{0}}^{max})}^{2}}{N} {(d \frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{d_{0}} . \end{matrix}

Additionally, note that

\sum_{p = 1}^{d_{0}} (\binom{d}{p}) \leq 2^{d}

.
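The two counting bounds used in this last step, \sum_{p = 1}^{d_{0}} \binom{d}{p} \leq 2 d^{d_{0}} and \sum_{p = 1}^{d_{0}} \binom{d}{p} \leq 2^{d}, can be compared in exact integer arithmetic. The helper names below are hypothetical and d \geq 2 is assumed.

```python
from math import comb


def subset_count(d, d0):
    """Number of non-empty subsets of {1, ..., d} with at most d0 elements."""
    return sum(comb(d, p) for p in range(1, d0 + 1))


def power_sum(d, d0):
    """Sum of d**p for p = 1, ..., d0, i.e. d * (d**d0 - 1) / (d - 1) for d >= 2."""
    return sum(d ** p for p in range(1, d0 + 1))


# Compare the exact count with both upper bounds for a few settings.
for d, d0 in [(10, 2), (10, 5), (50, 3), (50, 50)]:
    print(d, d0, subset_count(d, d0), 2 * d ** d0, 2 ** d)
```

Which bound is tighter depends on the truncation level: 2 d^{d_{0}} is smaller for d_{0} much less than d, whereas 2^{d} is smaller when d_{0} is close to d.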

Appendix H. Proof of Corollary 3

Since

r^{*} = d_{0} - 1

and

| w | \leq d_{0}

, we have

K_{1, ρ_{min}, d_{0} - 1} = (\binom{d}{d_{0}}) {(\frac{1}{2 ρ_{min}})}^{d_{0}}

. The first result follows directly from the variance of the emulator provided in Theorem 3 and the bias from Corollary 1.

The second result is due to the fact that when

G_{j} = F_{j}

, the terms in the Db expansion of f are

L_{2}

-orthogonal.

Appendix I. Proof of Corollary 5

Since

V_{k} \sim U (- ξ, ξ)

, we have

Z_{k} \sim U (- \sqrt{3}, \sqrt{3})

and

E [Z_{1}^{2} {||Z||}_{2}^{2}] = d + 4 / 5

. Thus, the first result holds.
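The value d + 4 / 5 follows from E [Z_{1}^{2} {||Z||}_{2}^{2}] = E [Z_{1}^{4}] + (d - 1) {(E [Z_{1}^{2}])}^{2} for independent components, together with E [Z^{2 m}] = a^{2 m} / (2 m + 1) when Z \sim U (- a, a). A minimal sketch in exact rational arithmetic (function names are hypothetical):

```python
from fractions import Fraction


def uniform_even_moment(a_squared, m):
    """E[Z**(2m)] for Z ~ U(-a, a), expressed through a**2: (a**2)**m / (2m + 1)."""
    return Fraction(a_squared) ** m / (2 * m + 1)


def weighted_norm_moment(d, a_squared=3):
    """E[Z_1**2 * ||Z||_2**2] for d i.i.d. components Z_k ~ U(-a, a)."""
    m2 = uniform_even_moment(a_squared, 1)  # E[Z^2] = 1 when a^2 = 3
    m4 = uniform_even_moment(a_squared, 2)  # E[Z^4] = 9/5 when a^2 = 3
    return m4 + (d - 1) * m2 * m2


for d in (1, 3, 10, 50):
    print(d, weighted_norm_moment(d))
```

The default a^{2} = 3 corresponds to the standardized variable Z_{k} = V_{k} / σ with σ^{2} = ξ^{2} / 3.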

For the second result, taking

ρ_{min} = \sqrt{\frac{c_{0} d (d + 0.8)}{3}}

with

1 < c_{0} \leq 2

yields

V [\hat{f_{N, d_{0}}^{c}} (x)] \leq {\bar{Σ}}_{d_{0}} = \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} (\frac{1 - {(1 / c_{0})}^{d_{0}}}{c_{0} - 1}) \leq \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N (c_{0} - 1)} .
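To see how this choice of ρ_{min} simplifies Equation (A3), note that 3 ρ_{min}^{2} = c_{0} d (d + 4 / 5), so the ratio d E [Z_{1}^{2} {||Z||}_{2}^{2}] / (3 ρ_{min}^{2}) equals 1 / c_{0}. The sketch below (hypothetical function names, exact rational arithmetic) evaluates the factor multiplying {(ϝ_{d_{0}}^{max})}^{2} / N in Equation (A3) and compares it with the expression stated in the display above.

```python
from fractions import Fraction


def a3_factor(d, d0, c0):
    """Factor of Equation (A3) multiplying (F_max**2 / N) under the stated rho_min."""
    c0 = Fraction(c0)
    moment = d + Fraction(4, 5)        # E[Z_1^2 ||Z||_2^2] for uniform Z
    three_rho_sq = c0 * d * moment     # 3 * rho_min**2 for the chosen rho_min
    y = d * moment / three_rho_sq      # reduces to 1 / c0
    return y / (y - 1) * (y ** d0 - 1)


def corollary_factor(d0, c0):
    """(1 - (1 / c0)**d0) / (c0 - 1), the factor stated in the display above."""
    c0 = Fraction(c0)
    return (1 - (1 / c0) ** d0) / (c0 - 1)


for c0 in (Fraction(3, 2), Fraction(2)):
    for d0 in (1, 2, 5):
        print(c0, d0, a3_factor(20, d0, c0), corollary_factor(d0, c0))
```

Both functions return the same rational number, and the result does not depend on d, which is the dimension-free behaviour claimed for the variance.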

Appendix J. Proof of Corollary 6

Using Equation (A3) and the fact that

r^{*} = d - 1

,

d_{0} = d

, we can check that

V [\hat{f_{N, d_{0}}^{c}} (x)] \leq \frac{{(ϝ_{d_{0}}^{max})}^{2}}{N} [{(\frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}} + 1)}^{d} - 1] \leq {\bar{Σ}}_{d},

with

{\bar{Σ}}_{d} : = \frac{d {(ϝ_{d_{0}}^{max})}^{2} E [Z_{1}^{2} {||Z||}_{2}^{2}]}{N (d E [Z_{1}^{2} {||Z||}_{2}^{2}] - 3 ρ_{min}^{2})} [{(d \frac{E [Z_{1}^{2} {||Z||}_{2}^{2}]}{3 ρ_{min}^{2}})}^{d} - 1] .

Thus, the results hold using Corollaries 2 and 3.

References

1. Morris, M.D.; Mitchell, T.J.; Ylvisaker, D. Bayesian Design and Analysis of Computer Experiments: Use of Derivatives in Surface Prediction. Technometrics 1993, 35, 243–255.
2. Solak, E.; Murray-Smith, R.; Leithead, W.; Leith, D.; Rasmussen, C. Derivative observations in Gaussian process models of dynamic systems. Adv. Neural Inf. Process. Syst. 2002, 15, 1–8.
3. Le Dimet, F.X.; Talagrand, O. Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. Tellus A Dyn. Meteorol. Oceanogr. 1986, 38, 97–110.
4. Le Dimet, F.X.; Ngodock, H.E.; Luong, B.; Verron, J. Sensitivity analysis in variational data assimilation. J. Meteorol. Soc. Jpn. 1997, 75, 245–255.
5. Cacuci, D.G. Sensitivity and Uncertainty Analysis: Theory; Chapman & Hall/CRC: Boca Raton, FL, USA, 2005.
6. Gunzburger, M.D. Perspectives in Flow Control and Optimization; SIAM: Philadelphia, PA, USA, 2003.
7. Borzi, A.; Schulz, V. Computational Optimization of Systems Governed by Partial Differential Equations; SIAM: Philadelphia, PA, USA, 2012.
8. Ghanem, R.; Higdon, D.; Owhadi, H. Handbook of Uncertainty Quantification; Springer International Publishing: Berlin/Heidelberg, Germany, 2017.
9. Agarwal, A.; Dekel, O.; Xiao, L. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. In Proceedings of the COLT, Haifa, Israel, 27–29 June 2010; Citeseer: Princeton, NJ, USA, 2010; pp. 28–40.
10. Bach, F.; Perchet, V. Highly-Smooth Zero-th Order Online Optimization. In Proceedings of the 29th Annual Conference on Learning Theory, New York, NY, USA, 23–26 June 2016; Volume 49, pp. 257–283.
11. Akhavan, A.; Pontil, M.; Tsybakov, A.B. Exploiting higher order smoothness in derivative-free optimization and continuous bandits. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS ’20), Vancouver, BC, Canada, 6–12 December 2020.
12. Lamboni, M. Optimal and Efficient Approximations of Gradients of Functions with Nonindependent Variables. Axioms 2024, 13, 426.
13. Lamboni, M. Optimal Estimators of Cross-Partial Derivatives and Surrogates of Functions. Stats 2024, 7, 1–22.
14. Chkifa, A.; Cohen, A.; DeVore, R.; Schwab, C. Sparse adaptive Taylor approximation algorithms for parametric and stochastic elliptic PDEs. ESAIM Math. Model. Numer. Anal. 2013, 47, 253–280.
15. Patil, P.; Babaee, H. Reduced-Order Modeling with Time-Dependent Bases for PDEs with Stochastic Boundary Conditions. SIAM/ASA J. Uncertain. Quantif. 2023, 11, 727–756.
16. Sobol, I.M.; Kucherenko, S. Derivative based global sensitivity measures and the link with global sensitivity indices. Math. Comput. Simul. 2009, 79, 3009–3017.
17. Kucherenko, S.; Rodriguez-Fernandez, M.; Pantelides, C.; Shah, N. Monte Carlo evaluation of derivative-based global sensitivity measures. Reliab. Eng. Syst. Saf. 2009, 94, 1135–1148.
18. Lamboni, M.; Iooss, B.; Popelin, A.L.; Gamboa, F. Derivative-based global sensitivity measures: General links with Sobol’ indices and numerical tests. Math. Comput. Simul. 2013, 87, 45–54.
19. Roustant, O.; Fruth, J.; Iooss, B.; Kuhnt, S. Crossed-derivative based sensitivity measures for interaction screening. Math. Comput. Simul. 2014, 105, 105–118.
20. Roustant, O.; Barthe, F.; Iooss, B. Poincaré inequalities on intervals: Application to sensitivity analysis. Electron. J. Statist. 2017, 11, 3081–3119.
21. Lamboni, M. Derivative-based integral equalities and inequality: A proxy-measure for sensitivity analysis. Math. Comput. Simul. 2021, 179, 137–161.
22. Lamboni, M. Weak derivative-based expansion of functions: ANOVA and some inequalities. Math. Comput. Simul. 2022, 194, 691–718.
23. Lamboni, M.; Kucherenko, S. Multivariate sensitivity analysis and derivative-based global sensitivity measures with dependent variables. Reliab. Eng. Syst. Saf. 2021, 212, 107519.
24. Krige, D.G. A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand. J. Chem. Metall. Soc. S. Afr. 1951, 52, 119–139.
25. Currin, C.; Mitchell, T.; Morris, M.; Ylvisaker, D. Bayesian Prediction of Deterministic Functions, with Applications to the Design and Analysis of Computer Experiments. J. Am. Stat. Assoc. 1991, 86, 953–963.
26. Oakley, J.E.; O’Hagan, A. Probabilistic sensitivity analysis of complex models: A Bayesian approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 2004, 66, 751–769.
27. Conti, S.; O’Hagan, A. Bayesian emulation of complex multi-output and dynamic computer models. J. Stat. Plan. Inference 2010, 140, 640–651.
28. Kennedy, M.C.; O’Hagan, A. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2001, 63, 425–464.
29. Xiu, D.; Karniadakis, G. The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 2002, 24.
30. Ghanem, R.G.; Spanos, P.D. Stochastic Finite Elements: A Spectral Approach; Springer: New York, NY, USA, 1991; pp. 1–214.
31. Sudret, B. Global sensitivity analysis using polynomial chaos expansions. Reliab. Eng. Syst. Saf. 2008, 93, 964–979.
32. Wahba, G. Spline Models for Observational Data; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1990.
33. Wong, R.K.W.; Storlie, C.B.; Lee, T.C.M. A Frequentist Approach to Computer Model Calibration. J. R. Stat. Soc. Ser. B Stat. Methodol. 2016, 79, 635–648.
34. Friedman, J.H.; Popescu, B.E. Predictive learning via rule ensembles. Ann. Appl. Stat. 2008, 2, 916–954.
35. Horiguchi, A.; Pratola, M.T. Estimating Shapley Effects in Big-Data Emulation and Regression Settings using Bayesian Additive Regression Trees. arXiv 2024, arXiv:2304.03809.
36. Migliorati, G.; Nobile, F.; Schwerin, E.; Tempone, R. Analysis of Discrete L2 Projection on Polynomial Spaces with Random Evaluations. Found. Comput. Math. 2014, 14, 419–456.
37. Hampton, J.; Doostan, A. Coherence motivated sampling and convergence analysis of least squares polynomial Chaos regression. Comput. Methods Appl. Mech. Eng. 2015, 290, 73–97.
38. Cohen, A.; Davenport, M.A.; Leviatan, D. On the stability and accuracy of least squares approximations. arXiv 2018, arXiv:math.NA/1111.4422.
39. Tsybakov, A. Introduction to Nonparametric Estimation; Springer: New York, NY, USA, 2009.
40. Zemanian, A. Distribution Theory and Transform Analysis: An Introduction to Generalized Functions, with Applications; Dover Books on Advanced Mathematics; Dover Publications: Garden City, NY, USA, 1987.
41. Strichartz, R. A Guide to Distribution Theory and Fourier Transforms; Studies in Advanced Mathematics; CRC Press: Boca Raton, FL, USA, 1994.
42. Caflisch, R.E.; Morokoff, W.J.; Owen, A.B. Valuation of mortgage-backed securities using Brownian bridges to reduce effective dimension. J. Comput. Financ. 1997, 1, 27–46.
43. Owen, A. The dimension distribution and quadrature test functions. Stat. Sin. 2003, 13, 1–17.
44. Rabitz, H. General foundations of high dimensional model representations. J. Math. Chem. 1999, 25, 197–233.
45. Kuo, F.; Sloan, I.; Wasilkowski, G.; Woźniakowski, H. On decompositions of multivariate functions. Math. Comput. 2010, 79, 953–966.
46. Alatawi, M.S.; Martinucci, B. On the Elementary Symmetric Polynomials and the Zeros of Legendre Polynomials. J. Math. 2022, 1–9.
47. Arafat, A.; El-Mikkawy, M. A Fast Novel Recursive Algorithm for Computing the Inverse of a Generalized Vandermonde Matrix. Axioms 2023, 12, 27.
48. Rawashdeh, E. A simple method for finding the inverse matrix of Vandermonde matrix. Matematički Vesnik 2019, 71, 207–213.
49. Homma, T.; Saltelli, A. Importance measures in global sensitivity analysis of nonlinear models. Reliab. Eng. Syst. Saf. 1996, 52, 1–17.

Figure 1. Predictions versus simulated outputs (observations) for Ishigami’s function.

Figure 2. Predictions versus simulated outputs (observations) for the g-function of type A.

Figure 3. Predictions versus simulated outputs (observations) for the g-function of type B.

Figure 4. Main indices (∘) and upper bounds of total indices (+) of

d = 50

inputs (top-left panel); observations versus predictions using either derivative-based emulators (see the top-right panel when including all components and the bottom-left panel for all other cases) or derivative-free emulators (bottom-right panel).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
