Abstract
Computing cross-partial derivatives using fewer model runs is relevant in many modeling contexts, such as stochastic approximation, derivative-based ANOVA, the exploration of complex models, and active subspaces. This paper introduces surrogates of all the cross-partial derivatives of functions by evaluating such functions at N randomized points and using a set of L constraints. The randomized points rely on independent, centered, and symmetric variables. The associated estimators, based on model runs, reach the optimal rates of convergence (i.e., ), and the biases of our approximations do not suffer from the curse of dimensionality for a wide class of functions. Such results are used for (i) computing the main sensitivity indices and upper bounds of the total sensitivity indices, and (ii) deriving emulators of simulators or surrogates of functions thanks to the derivative-based ANOVA. Simulations are presented to show the accuracy of our emulators and estimators of sensitivity indices. The plug-in estimates of indices based on one-sample U-statistics are numerically much more stable.
Keywords:
derivative-based ANOVA; high-dimensional models; independent input variables; optimal estimators of derivatives; sensitivity analysis
MSC:
62Fxx; 62J10; 49-XX; 26D10
1. Introduction
Derivatives are relevant in many modeling tasks, such as inverse problems, first-order and second-order stochastic approximation methods [1,2,3,4], the exploration of complex mathematical models or simulators, derivative-based ANOVA (Db-ANOVA) or exact expansions of functions, and active subspaces. First-order derivatives or gradients are sometimes available in modeling. Instances include (i) models defined via their rates of change with respect to their inputs; (ii) implicit functions defined via their derivatives [5,6]; and (iii) the cases listed in [7,8] and the references therein.
In FANOVA and sensitivity analysis user and developer communities (see, e.g., [9,10,11,12,13,14]), screening of the input variables and interactions of high-dimensional simulators is often performed before building emulators of such models using Gaussian processes [15,16,17,18,19], polynomial chaos expansions and SS-ANOVA [20,21], or other machine learning approaches. Emulators are fast-to-evaluate models that approximate complex and/or too-expensive simulators. Efficient variance-based screening methods rely on the upper bounds of generalized sensitivity indices, including Sobol’ indices (see [14,22,23,24,25,26] for independent inputs and [27,28,29] for non-independent variables). Such upper bounds require the computation of cross-partial derivatives, even for simulators for which these computations are time-demanding or impossible. Also, active subspaces rely on the first-order derivatives for performing dimension reduction and then for approximating complex models [30,31,32,33].
For functions with full interactions, all the cross-partial derivatives are used in the integral representations of the infinitesimal increment of functions [34], and in the unanchored decompositions of functions in the Sobolev space [35]. Recently, such derivatives have become crucial in the Db-ANOVA representation of every smooth function, such as high-dimensional PDE models. Indeed, it is known, in [14], that every smooth function f admits an exact Db-ANOVA decomposition, that is, ,
where stands for the cross-partial derivative with respect to for any ; is a random vector of independent variables, supported on an open with margins s and densities s (i.e., ).
Computing all the cross-partial derivatives using a few model runs or evaluations of functions is challenging. Direct computations of accurate cross-partial derivatives were considered in [36,37] using a generalization of Richardson’s extrapolation. Such approximations of all the cross-partial derivatives require a number of model runs that strongly depends on the dimensionality (i.e., d). While adjoint-based methods can provide the gradients for some ODE/PDE-based models using only one simulation of the adjoint models [38,39,40,41,42,43], note that computing the Hessian matrix in general requires running d second-order adjoint models, provided that such models are available (see [38,44]).
Stochastic perturbation methods or Monte Carlo approaches have been used in stochastic approximation (see, e.g., [1,2,3,4]), including derivative-free optimization (see [2,3,4,45,46,47,48] and the references therein), for computing the gradients and Hessians of functions. Such approaches lead to estimates of gradients using a number of model runs that can be less than the dimensionality [48,49]. While gradient computations and the convergence analysis are considered in first-order stochastic approximations, estimators of the Hessian matrices are investigated in second-order stochastic approximations [2,50,51,52,53]. Most of such approaches rely on Taylor expansions of functions and randomized kernels and/or a random vector that is uniformly distributed on the unit sphere. Nevertheless, independent variables are used in [48,50,53], and the approaches considered in [53,54] rely on the Stein identity [55]. Note that the upper bounds of the biases of such approximations depend on the dimensionality, except in the work [48] for the gradients only. Moreover, to our knowledge, a convergence analysis for cross-partial derivatives beyond the second order is not available.
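To fix ideas, a minimal sketch of the stochastic-perturbation idea reviewed above is given below: gradients are estimated from randomized central differences averaged over several independent draws. The Rademacher perturbation law, the step size h, and the helper name sp_gradient are illustrative assumptions for this generic scheme from the literature, not the estimators introduced in this paper.

```python
import numpy as np

def sp_gradient(f, x, n_draws=100, h=1e-3, rng=None):
    """Monte Carlo gradient estimate of f at x from randomized central differences."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    d = x.size
    grad = np.zeros(d)
    for _ in range(n_draws):
        v = rng.choice([-1.0, 1.0], size=d)        # perturbation symmetric about zero
        diff = f(x + h * v) - f(x - h * v)         # two model runs per draw
        grad += diff * v / (2.0 * h)               # v_j**2 = 1, so dividing by v_j equals multiplying by v_j
    return grad / n_draws

# Check on a quadratic with a known gradient.
f = lambda z: np.sum(z ** 2) + z[0] * z[1]
print(sp_gradient(f, np.array([1.0, -2.0, 0.5]), n_draws=2000, rng=0))  # approx [0, -3, 1]
```

Each draw costs two model runs, so the total budget is 2 * n_draws regardless of the dimensionality, which is precisely the appeal of such randomized schemes.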
Given a smooth function defined on , this paper proposes new approaches for deriving surrogates of cross-partial derivatives and derivative-based emulators of functions that
- Are simple to use and generic by making use of d independent variables that are symmetrically distributed about zero and a set of constraints;
- Lead to dimension-free upper bounds of the biases related to the approximations of cross-partial derivatives for a wide class of functions;
- Provide estimators of cross-partial derivatives that reach the optimal and parametric rates of convergence;
- Can be used for computing all the cross-partial derivatives and emulators of functions at given points using a small number of model runs.
In this paper, new expressions of cross-partial derivatives of any order are derived in Section 3 by combining the properties of (i) the generalized Richardson extrapolation approach, so as to increase the approximation accuracy, and (ii) Monte Carlo approaches based only on independent random variables that are symmetrically distributed about zero. Such expressions are followed by their orders of approximation and biases. We also derive estimators of such new expressions and their associated mean squared errors, including the rates of convergence for some classes of functions (see Section 3.3). Section 3.4 provides the derivative-based emulators of functions, depending on the strength of the interactions or (equivalently) the cross-partial derivatives, thanks to Equation (1). The strength of the interactions can be assessed using sensitivity analysis. Thus, Section 4 deals with the derivation of new expressions of sensitivity indices and their estimators by making use of the proposed surrogates of cross-partial derivatives. Simulations based on test functions are considered in Section 5 to show the accuracy of our approach, and we conclude this work in Section 6.
2. Preliminary
For an integer , let be a random vector of d independent and continuous variables with marginal cumulative distribution functions (CDFs) and probability density functions (PDFs) .
For a non-empty subset , we use for its cardinality (i.e., the number of elements in u) and . Also, we use for a subset of inputs, and we have the partition . Assume that:
Assumption 1
(A1). is a random vector of independent variables, supported on Ω.
Working with partial derivatives requires a specific mathematical space. Given an integer and an open set , consider a weak partial differentiable function [56,57] and a subset with . Namely, we use for the -th weak cross-partial derivatives of each component of f with respect to each with .
Likewise, given , denote and if and zero otherwise. Thus, taking yields . Moreover, denote , , and consider the Hölder space of -smooth functions given by
with , , and as weak cross-partial derivatives. We use for the Euclidean norm, for the -norm, for the expectation, and for the variance.
3. Surrogates of Cross-Partial Derivatives and New Emulators of Functions
3.1. New Expressions of Cross-Partial Derivatives
This section aims at providing expressions of cross-partial derivatives using evaluations of the model of interest and new independent random vectors. We provide approximated expressions of for all and the associated orders of approximation.
Given , consider with , , and denote with as d-dimensional random vectors of independent variables satisfying
Random vectors of d independent variables that are symmetrically distributed about zero are instances of , including the standard Gaussian random vector and symmetric uniform distributions about zero.
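For illustration, the sketch below generates such perturbation vectors: d independent components, each symmetrically distributed about zero. The three laws (standard Gaussian, symmetric uniform, Rademacher) and the helper name symmetric_perturbations are illustrative choices; any law satisfying the stated independence and symmetry requirements could be used.

```python
import numpy as np

def symmetric_perturbations(d, n, law="gaussian", a=np.sqrt(3.0), rng=None):
    """Draw n vectors of d independent components, each symmetric about zero."""
    rng = np.random.default_rng(rng)
    if law == "gaussian":
        return rng.standard_normal((n, d))            # mean 0, variance 1
    if law == "uniform":
        return rng.uniform(-a, a, size=(n, d))        # a = sqrt(3) gives unit variance
    if law == "rademacher":
        return rng.choice([-1.0, 1.0], size=(n, d))   # +/- 1 with probability 1/2
    raise ValueError("unknown law")
```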
- Denote . The reals s are used for controlling the order of derivatives (i.e., ) we are interested in, while s help in selecting one particular derivative of order . Finally, s aim at defining a neighborhood of a sample point of that will be used. Thus, using and keeping in mind the variance of , we assume that
Assumption 2
(A2). .
Based on the above framework, Theorem 1 provides a new expression of the cross-partial derivatives of f. Recall that is the cardinality of u and .
Theorem 1.
Consider distinct s, and assume that with and (A2) holds. Then, for any with , there exists and coefficients such that
Proof.
The detailed proof is provided in Appendix A. □
In view of Theorem 1, we are able to compute all the cross-partial derivatives using the same evaluations of functions with the same or different orders of approximation, depending on the constraints imposed to determine the coefficients (see Appendix A). While the setting or the constraints ; lead to the order , one can increase that order up to by using either or the full constraints given by . The latter setting improves the approximations and the numerical computation of derivatives. Since increasing the number of constraints requires more evaluations of the simulator, and since, in the ANOVA-like decomposition of , it is common to neglect the higher-order components or, equivalently, the higher-order cross-partial derivatives thanks to Equation (1), the following parsimonious number of constraints may be considered. Given an integer , controlling the partial derivatives of order up to can be performed using the constraints
Equation (3) gives approximations of all the cross-partial derivatives of where and if o is even and otherwise. This equation relies on the Vandermonde matrices and the generalized Vandermonde matrices, which ensure the existence and uniqueness of the coefficients for distinct values of s (i.e., ) because the determinant is of the form (see [58,59] for more details and the inverse of such matrices).
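As an illustration of how such coefficients arise, the sketch below solves a Vandermonde-type system for a familiar special case, namely the fourth-order central-difference rule for a first derivative, as in classical Richardson extrapolation. The constraint set used in this paper is more general (and involves the quantities defined above), so the node choice and right-hand side here are assumptions made only to show why distinct nodes guarantee existence and uniqueness of the coefficients.

```python
import numpy as np
from math import factorial

def richardson_coefficients(nodes, order):
    """Solve sum_l c_l * nodes[l]**k = k! * (k == order) for k = 0, ..., L-1."""
    nodes = np.asarray(nodes, dtype=float)
    L = nodes.size
    V = np.vander(nodes, N=L, increasing=True).T   # V[k, l] = nodes[l] ** k
    rhs = np.zeros(L)
    rhs[order] = factorial(order)
    return np.linalg.solve(V, rhs)                 # unique since the Vandermonde determinant is nonzero

nodes = [-2.0, -1.0, 1.0, 2.0]
c = richardson_coefficients(nodes, order=1)
# f'(x) is then approximated by (1/h) * sum_l c[l] * f(x + nodes[l] * h);
# here c = [1/12, -2/3, 2/3, -1/12], the classical fourth-order rule.
print(c)
```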
Remark 1.
When , we must have , . Thus, the coefficient does not necessarily depend on . Taking L for an even integer, the following nodes may be considered: . When L is odd, one may add 0 to the above set. Of course, other possibilities can be considered provided that .
Remark 2.
For a given , if we are only interested in all the cross-partial derivatives with , it is better to set in Equation (2).
Remark 3.
Links to other works.
Consider , and with . Using or or , our estimators of the first-order and the second-order cross-partial derivatives are very similar to the results obtained in [53].
Using the uniform perturbations and and , our estimators of the first-order and the second-order cross-partial derivatives are similar to those provided in [50]. However, we will see later that specific uniform distributions allow for obtaining dimension-free upper bounds of the biases.
3.2. Upper Bounds of Biases
To derive precise biases of our approximations provided in Theorem 1, different structural assumptions on the deterministic functions f and are considered. We assume that with , which is sufficient to define for any . Note that such an assumption does not depend on the dimensionality d. For the sake of generality, we provide the upper bounds of the biases for any value of L by considering two sets of constraints.
Denote with a d-dimensional random vector of independent variables that are centered about zero and standardized (i.e., , ), and the set of such random vectors. For any , define
Corollary 1.
Consider distinct s and the constraints with . If and (A2) hold, then there is such that
Moreover, if with and , then
Proof.
See Appendix B for the detailed proof. □
In view of Corollary 1, one obtains upper bounds that do not depend on the dimensionality d by choosing , for instance. When , the choice is more appropriate. Corollary 1 provides the results for highly smooth functions. To be able to derive the optimal rates of convergence for a wide class of functions (i.e., ), Corollary 2 starts by providing the biases for this class of functions under a specific set of constraints. To that end, define
Corollary 2.
For distinct s, consider and the constraints with . If and (A2) hold, then there is such that
Moreover, if with and , then
Proof.
See Appendix C. □
Note that the upper bounds derived in Corollary 2 depend on through and . Thus, taking and will give a dimension-free upper bound that does not increase with . The crucial role and importance of is highlighted in Section 3.3.
Remark 4.
Remark 5.
It is worth noting that we obtain exact approximations of in Corollary 1 for the class of functions described by
In general, exact approximations of are obtained when for highly smooth functions.
3.3. Convergence Analysis
Given a sample of , that is, and using Equation (2), the method of moments implies that the estimator of is given by
Statistically, it is common to measure the quality of an estimator using the mean squared error (MSE), including the rates of convergence. The MSEs can also help in determining the optimal value of . Theorem 2 provides such quantities under different assumptions. To that end, define
Theorem 2.
For distinct s, consider and with and . If and (A2) hold, then
Moreover, if with , then
Proof.
See Appendix D for the detailed proof. □
Theorem 2 provides the upper bounds of MSEs for the anisotropic case. Using a uniform bandwidth, that is, , reveals that such upper bounds clearly depend on the dimensionality of the function of interest. Indeed, we can check that the upper bounds of the MSEs provided in Equations (8) and (9) become, respectively,
By minimizing such upper bounds with respect to h, the optimal rates of convergence of the proposed estimators are derived in Corollary 3.
Corollary 3.
Under the assumptions made in Theorem 2, if , then
Proof.
See Appendix E for the detailed proof. □
The optimal rates of convergence obtained in Corollary 3 are far from the parametric ones, and such rates decrease with . Nevertheless, such optimal rates are functions of for any using . The maximum rate of convergence that can be reached is by taking .
To derive the optimal and parametric rates of convergence, let us now choose and with . Thus, we can see that the second terms of the upper bounds of the MSEs (provided in Theorem 2) are functions of , but they are independent of h. This key observation leads to Corollary 4.
Corollary 4.
Under the assumptions made in Theorem 2, if ; and with , then we have
Proof.
The proof is straightforward since and if . □
It is worth noting that the upper bound of the squared bias obtained in Corollary 4 does not depend on the dimensionality thanks to . Also, the optimal and parametric rates of convergence are reached by means of model evaluations, and such model runs can still be used for computing for every with . Based on the same assumptions, it appears that the results provided in Corollary 4 are much more convenient for in higher dimensions, while those obtained in Corollary 3 are well suited for higher dimensions and for higher values of .
For highly smooth functions and for large values of and d, we are able to derive intermediate rates of convergence of the estimator of (see Theorem 3). To that end, consider an integer , , and denote with the largest integer that is less than b for any real b.
Theorem 3.
For an integer , consider . If and (A2) hold, then
Moreover, if with and , then
For a given , taking leads to
Proof.
Detailed proofs are provided in Appendix F. □
It turns out that the optimal rate of convergence derived in Theorem 3 is a trade-off between the sample size N and the dimensionality d. For instance, when , the optimal rate becomes , which improves the rate obtained in Corollary 3, but under different assumptions.
Remark 6.
Since the bias vanishes for the class of functions , taking yields (see Appendix G). Note that such an optimal rate of convergence is dimension-free.
3.4. Derivative-Based Emulators of Smooth Functions
Using Equation (1) and bearing in mind the estimators of the cross-derivatives provided in Section 3.3, this section aims at providing surrogates of smooth functions, also known as emulators. The general expression of the surrogate of f is given below.
Corollary 5.
For any , consider , . Assume that and (A1) and (A2) hold. Then, an approximation of f at is given by
The above plug-in estimator is consistent by the law of large numbers. For a given , it is worth noting that the choice of s is arbitrary, provided that such distributions are supported on an open neighborhood of .
Often, the higher-order cross-partial derivatives or, equivalently, the higher-order interactions among the model inputs almost vanish, leading us to consider truncated expressions. Given an integer s with and keeping in mind the ANOVA decomposition, consider the class of functions that admit at most the s-th-order interactions, that is, . Truncating the functional expansion is a standard practice within the ANOVA community, that is, is assumed in higher dimensions [10,12]. For such a class of functions, requiring with is sufficient to derive our results. Thus, the truncated surrogate of f is given by
Under the assumptions made in Corollary 5, reaches the optimal and parametric rate of convergence for the class of functions . For instance, taking leads to the first-order emulator of f, which relies only on the gradient information. Thus, provides accurate estimates of additive models of the form , where s are given functions. Likewise, allows for incorporating the second-order terms, but it requires the second-order cross-partial derivatives. Thus, it is relevant to identify the class of functions that contains the model of interest before building emulators. The following section deals with such issues.
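A structural illustration of the gradient-only (first-order) emulator discussed above, not a reproduction of Equation (1): for an additive model, integrating each partial derivative along its own coordinate from a reference point recovers the function exactly, and it gives a first-order approximation otherwise. The reference point, the quadrature rule, and the central-difference evaluation of the derivatives are illustrative assumptions.

```python
import numpy as np

def first_order_emulator(f, x0, x, n_quad=64, h=1e-5):
    """Approximate f(x) from f(x0) plus the line integral of each partial derivative."""
    x0, x = np.asarray(x0, dtype=float), np.asarray(x, dtype=float)
    approx = f(x0)
    for j in range(x0.size):
        t = np.linspace(x0[j], x[j], n_quad)        # nodes on the segment [x0_j, x_j]
        deriv = np.empty(n_quad)
        for k, tk in enumerate(t):
            zp, zm = x0.copy(), x0.copy()
            zp[j], zm[j] = tk + h, tk - h
            deriv[k] = (f(zp) - f(zm)) / (2.0 * h)  # partial derivative along x_j, others frozen at x0
        approx += np.sum(0.5 * (deriv[1:] + deriv[:-1]) * np.diff(t))  # trapezoidal rule
    return approx

# Exact (up to quadrature error) on an additive model.
f = lambda z: np.sin(z[0]) + z[1] ** 2 + np.exp(z[2])
x_new = np.array([0.3, -1.0, 0.7])
print(first_order_emulator(f, np.zeros(3), x_new), f(x_new))
```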
4. Applications: Computing Sensitivity Indices
In high-dimensional settings, reducing the dimension of functions is often achieved by using screening measures, that is, measures that can be used for quickly identifying non-relevant input variables. Screening measures based on the upper bounds of the total sensitivity indices rely on derivatives [14,23,24,25,60,61]. This section aims at providing optimal computations of upper bounds of the total indices, followed by the computations of the main indices using derivatives.
By evaluating the function f given by Equation (1) at a random vector using , one obtains a random vector of the model outputs. Generalized sensitivity indices, including Sobol’s indices, rely on the variance–covariance of sensitivity functionals (SFs), which are also random vectors containing the information about the overall contributions of inputs [14,26,62]. The derivative-based expressions of SFs are given below (see [14] for more details). Given , the interaction SF of the inputs is given by
and the first-order SF of is given by
Likewise, the total-interaction SF of is given by [14]
and the total SF of is given as [14]
For a single input , we have and . Among similarity measures [28,29], taking the variance–covariances of SFs, that is, , and , leads to [14]
Thus, is the upper bound of . Likewise, is the upper bound of (i.e., ), and it can be used for screening the input variables.
To provide new expressions of the screening measures and the main indices in the following proposition, denote with an i.i.d. copy of , and assume that
Assumption 3
(A3). has finite second-order moments.
Proposition 1.
Under the assumptions made in Corollary 4, assume that (A1)–(A3) hold. Then,
Proof.
The method of moments allows for deriving the estimators of and for all . For screening inputs of models, we provide the estimators of and for any . To that end, we are given four independent samples, that is, from , from , from , and from . Consistent estimators of and are, respectively, given by
The above (direct) estimators require model runs for obtaining the estimates for any . In addition to such estimators, we derive plug-in estimators, which are relevant when estimates of the first-order derivatives are already available. To provide such estimators, we denote with a sample of known or estimated first-order derivatives (i.e., ). Using Equation (2), such estimates are obtained by considering or or and N. Keeping in mind Equations (12) and (13) and the U-statistic theory for one sample, the plug-in estimator of the main index of is given by
Likewise, the plug-in estimator of the upper bound of the total index of is given by
Note that the plug-in estimators are consistent and require a total of model runs for computing such indices, where is the number of model runs used for computing the gradient of f at .
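The plug-in principle can be illustrated as follows: given a sample of (estimated) gradients at random input points, screening measures are formed as empirical averages. The sketch below computes the classical derivative-based measure given by the mean squared partial derivative and a Poincaré-type upper bound of the total Sobol' index for inputs that are uniform on [0, 1]; the constants and the exact index expressions of this paper are not reproduced, so treat this only as a generic DGSM-style illustration.

```python
import numpy as np

def dgsm_upper_bounds(grad_sample, outputs):
    """grad_sample: (N, d) gradients at i.i.d. input points; outputs: (N,) values of f there."""
    nu = np.mean(grad_sample ** 2, axis=0)        # E[(df/dx_j)^2], one value per input
    var_f = np.var(outputs, ddof=1)
    poincare = 1.0 / np.pi ** 2                   # Poincare constant for U(0, 1) inputs
    return poincare * nu / var_f                  # generic upper bounds of total Sobol' indices

rng = np.random.default_rng(0)
X = rng.uniform(size=(5000, 3))
f = lambda x: x[:, 0] + 2.0 * x[:, 1] ** 2        # the third input is inert
grads = np.stack([np.ones(5000), 4.0 * X[:, 1], np.zeros(5000)], axis=1)
print(dgsm_upper_bounds(grads, f(X)))             # third bound is 0: that input can be screened out
```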
5. Illustrations: Screening and Emulators of Models
5.1. Test Functions
5.1.1. Ishigami’s Function ()
The Ishigami function includes three independent inputs following a uniform distribution on , and it is given by
The sensitivity indices are , , , , , and .
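For reference, a sketch of the Ishigami function in its standard parameterization is given below; the coefficients a = 7 and b = 0.1 are the usual values and are assumed here, since the display above does not show them.

```python
import numpy as np

def ishigami(x, a=7.0, b=0.1):
    """Ishigami function; rows of x are points in [-pi, pi]^3."""
    x = np.atleast_2d(x)
    return np.sin(x[:, 0]) + a * np.sin(x[:, 1]) ** 2 + b * x[:, 2] ** 4 * np.sin(x[:, 0])

rng = np.random.default_rng(1)
X = rng.uniform(-np.pi, np.pi, size=(10_000, 3))
print(ishigami(X).var())   # close to 13.84, the analytical variance for a = 7, b = 0.1
```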
5.1.2. Sobol’s g-Function ()
The g-function [63] includes ten independent inputs following a uniform distribution on , and it is defined as follows:
Note that such a function is differentiable almost everywhere. According to the values of , this function has different properties [23], as listed below (a code sketch with the usual coefficient choices follows the list):
- If , the values of sensitivity indices are , , , , and , . Thus, this function has a low effective dimension (function of type A), and it belongs to with (see Section 3.4).
- If , the first and total indices are given as follows: , . Thus, all inputs are important, but there is no interaction among these inputs. This function has a high effective dimension (function of type B). Note that it belongs to .
- If , the function belongs to the class of functions with important interactions among inputs. Indeed, we have and , . All the inputs are relevant due to important interactions (function of type C). Then, this function belongs to with .
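The sketch below implements the g-function with the coefficient choices commonly associated with the three types above (the first two coefficients equal to 0 and the rest equal to 6.52 for type A, all equal to 6.52 for type B, all equal to 0 for type C); these values are standard in the literature and are assumed here, since the displays above were stripped of their formulas.

```python
import numpy as np

def g_function(x, a):
    """Sobol' g-function; rows of x are points in [0, 1]^d."""
    x, a = np.atleast_2d(x), np.asarray(a, dtype=float)
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=1)

d = 10
a_type = {
    "A": np.array([0.0, 0.0] + [6.52] * (d - 2)),   # two important inputs
    "B": np.full(d, 6.52),                          # all inputs important, no interactions
    "C": np.zeros(d),                               # all inputs important, strong interactions
}
rng = np.random.default_rng(2)
X = rng.uniform(size=(10_000, d))
for t, a in a_type.items():
    print(t, g_function(X, a).var())                # type C has the largest variance, driven by interactions
```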
5.2. Numerical Comparisons of Estimators
This section provides a comparison of the direct and plug-in estimators of the main indices and the upper bounds of the total indices using the test functions of Section 5.1. Different total budgets for the model evaluations are considered in this paper, that is, , 1000, 1500, 2000, 3000, 5000, 10,000, 15,000, 20,000. In the case of the plug-in estimators, we used . For generating different random values, Sobol’s sequence (scrambled = 3) from the R-package randtoolbox [64] is used. We replicated each estimation 30 times by randomly choosing the seed, and the reported results are the average of the 30 estimates.
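The design points in the experiments above are generated with the scrambled Sobol' sequence of the R package randtoolbox; an equivalent sketch in Python, using SciPy's quasi-Monte Carlo module as a stand-in, is shown below. The seed handling and the number of points differ from the actual experiments, so this only illustrates how such randomized low-discrepancy points can be produced.

```python
import numpy as np
from scipy.stats import qmc

sampler = qmc.Sobol(d=3, scramble=True, seed=42)
points = sampler.random(n=1024)                          # 1024 scrambled Sobol' points in [0, 1)^3
points = qmc.scale(points, [-np.pi] * 3, [np.pi] * 3)    # map to the Ishigami input range [-pi, pi]^3
print(points.shape)
```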
Figure 1, Figure 2, Figure 3 and Figure 4 show the mean squared errors related to the estimates of the main indices for the Ishigami function, the g-functions of type A, type B, and type C, respectively. Each figure depicts the results for and . All the figures show the convergence and accuracy of our estimates using either or .
Likewise, Figure 5, Figure 6, Figure 7 and Figure 8 show the mean squared gaps (differences) between the true total index and its estimated upper bound for the Ishigami function, the g-functions of type A, type B, and type C, respectively.
It turns out that the plug-in estimators outperform the direct ones. Also, increasing the values of L gives the same results. Moreover, the direct estimators associated with fail to provide accurate estimates (we do not report such results here). On the contrary, the plug-in estimates using are reported in Table 1 for the Ishigami function and in Table 2 for the three types of the g-function. Such results suggest considering or with for plug-in estimators when the total budget of model runs is small. For a larger budget of model runs, the direct estimators associated with and can be considered as well in practice.
Table 1.
Average of 30 estimates of the main indices and upper bounds of total indices for the Ishigami function using the plug-in estimators, , and 2000 model runs.
Table 2.
Average of 30 estimates of the main indices and upper bounds of total indices for the g-functions using the plug-in estimators, , and 2000 model runs.
5.3. Emulations of the g-Function of Type B
Based on the results obtained in Section 5.2 (see Table 2), all the inputs are important in the case of the g-function of type B, meaning that the dimension reduction is not possible. Also, the estimated upper bounds suggest weak interactions among inputs. As expected, our estimated results confirm that the g-function of type B belongs to . Thus, an emulator of f based only on the first-order derivatives is sufficient. Using this information, we have derived the emulator of that function (i.e., ) under the assumptions made in Corollary 5 () and using with . For a given L, we used 300 model runs to build the emulator, and Figure 9 depicts the approximations of that function (called predictions) at the sample points involved in the construction of the emulator. Note that the evaluations of f at such sample points (called observations) are not directly used in the construction of such an emulator. It turns out from Figure 9 that provides predictions that are in line with the observations, showing the accuracy of our emulator.
Figure 9.
Predictions of g-function of type B using the emulator versus observations using and .
6. Conclusions
In this paper, we firstly provided (i) stochastic expressions of cross-partial derivatives of any order, followed by their biases, and (ii) estimators of such expressions. Our estimators of the -th cross-partial derivatives () reach the parametric rates of convergence (i.e., ) by means of a set of constraints for the Hölder space of -smooth functions with . Moreover, we showed that the upper bounds of the biases of such estimators do not suffer from the curse of dimensionality. Secondly, the proposed surrogates of cross-partial derivatives are used for deriving (i) new derivative-based emulators of simulators or surrogates of models, even when a large number of model inputs contribute to the model outputs, and (ii) new expressions of the main sensitivity indices and the upper bounds of the total sensitivity indices.
Numerical simulations confirmed the accuracy of our approaches for not only screening the input variables, but also for identifying the class of functions that contains our simulator of interest, such as the class of functions with important or no interaction among inputs. This relevant information allows for designing and building the appropriate emulators of functions. In the case of the g-function of type B, our emulator of this function (based only on the first-order derivatives) provided approximations or predictions that are in line with the observations.
For functions with important interactions or, equivalently, important higher-order cross-partial derivatives, further numerical schemes are necessary to increase the numerical accuracy of the computations of such derivatives and predictions. Such perspectives will be investigated in the near future, as well as the computation of the total sensitivity indices using the proposed surrogates of derivatives. Moreover, a theoretical investigation is needed in order to derive parametric rates of convergence for the above estimators that do not suffer from the curse of dimensionality. Working in rather than in may be helpful.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data are contained within the article.
Acknowledgments
I would like to thank the three reviewers for their comments that helped improve my manuscript.
Conflicts of Interest
The author declares no conflict of interest.
Appendix A. Proof of Theorem 1
Firstly, as , denote and . The Taylor expansion of about of order is given by
Multiplying such an expansion by the constant , and taking the sum over , the expectation becomes
We can see that iff . Equation implies if and otherwise. Thus, using when is much more convenient, and it leads to , , which also implies that . We then obtain when or , and the fact that by independence. We can then write
using the change of variable . At this point, setting , and results in the approximation of of order .
Secondly, for , the constraints allow us to eliminate some higher-order terms so as to reach the order . One can also use to increase the accuracy of the approximations, while keeping the order when .
Appendix B. Proof of Corollary 1
- Let , , and consider the set . As , the expansion of gives with the remainder term . Thus, and
Using , Theorem 1 implies that the absolute value of the bias is given by
as .
Using , the results hold because .
For with and , we have .
Appendix C. Proof of Corollary 2
- Let . As , we can write with the remainder term . Using and Theorem 1, the results hold by analogy to the proof of Corollary 1. Indeed, if , then
For with and , we have
Appendix D. Proof of Theorem 2
As implies that with , we have
Using the fact that for , we can write
which leads to .
By taking the variance of the proposed estimator, we have
where .
If , .
The results hold using Corollary 2 and the fact that .
Appendix E. Proof of Corollary 3
Let , and . By minimizing , we obtain
and
Appendix F. Proof of Theorem 3
The first two results hold by combining the biases obtained in Corollary 1 and the upper bounds of the variance provided in Theorem 2.
For the last result, let , and . By minimizing the last upper bound, we obtain and
Appendix G. On Remark 6
The variance is and
References
- Robbins, H.; Monro, S. A Stochastic Approximation Method. Ann. Math. Stat. 1951, 22, 400–407. [Google Scholar] [CrossRef]
- Fabian, V. Stochastic approximation. In Optimizing Methods in Statistics; Elsevier: Amsterdam, The Netherlands, 1971; pp. 439–470. [Google Scholar]
- Nemirovsky, A.; Yudin, D. Problem Complexity and Method Efficiency in Optimization; Wiley & Sons: New York, NY, USA, 1983. [Google Scholar]
- Polyak, B.; Tsybakov, A. Optimal accuracy orders of stochastic approximation algorithms. Probl. Peredachi Inf. 1990, 2, 45–53. [Google Scholar]
- Cristea, M. On global implicit function theorem. J. Math. Anal. Appl. 2017, 456, 1290–1302. [Google Scholar] [CrossRef]
- Lamboni, M. Derivative formulas and gradient of functions with non-independent variables. Axioms 2023, 12, 845. [Google Scholar] [CrossRef]
- Morris, M.D.; Mitchell, T.J.; Ylvisaker, D. Bayesian design and analysis of computer experiments: Use of derivatives in surface prediction. Technometrics 1993, 35, 243–255. [Google Scholar] [CrossRef]
- Solak, E.; Murray-Smith, R.; Leithead, W.; Leith, D.; Rasmussen, C. Derivative observations in Gaussian process models of dynamic systems. In Advances in Neural Information Processing Systems 15; MIT Press: Cambridge, MA, USA, 2002. [Google Scholar]
- Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 1948, 19, 293–325. [Google Scholar] [CrossRef]
- Efron, B.; Stein, C. The jacknife estimate of variance. Ann. Stat. 1981, 9, 586–596. [Google Scholar] [CrossRef]
- Sobol, I.M. Sensitivity analysis for non-linear mathematical models. Math. Model. Comput. Exp. 1993, 1, 407–414. [Google Scholar]
- Rabitz, H. General foundations of high dimensional model representations. J. Math. Chem. 1999, 25, 197–233. [Google Scholar] [CrossRef]
- Saltelli, A.; Chan, K.; Scott, E. Variance-Based Methods, Probability and Statistics; John Wiley and Sons: Hoboken, NJ, USA, 2000. [Google Scholar]
- Lamboni, M. Weak derivative-based expansion of functions: ANOVA and some inequalities. Math. Comput. Simul. 2022, 194, 691–718. [Google Scholar] [CrossRef]
- Currin, C.; Mitchell, T.; Morris, M.; Ylvisaker, D. Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. J. Am. Stat. Assoc. 1991, 86, 953–963. [Google Scholar] [CrossRef]
- Oakley, J.E.; O’Hagan, A. Probabilistic sensitivity analysis of complex models: A bayesian approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 2004, 66, 751–769. [Google Scholar] [CrossRef]
- Conti, S.; O’Hagan, A. Bayesian emulation of complex multi-output and dynamic computer models. J. Stat. Plan. Inference 2010, 140, 640–651. [Google Scholar] [CrossRef]
- Haylock, R.G.; O’Hagan, A.; Bernardo, J.M. On inference for outputs of computationally expensive algorithms with uncertainty on the inputs. In Bayesian Statistics 5: Proceedings of the Fifth Valencia International Meeting; Oxford Academic: Oxford, UK, 1996; Volume 5, pp. 629–638. [Google Scholar]
- Kennedy, M.C.; O’Hagan, A. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2001, 63, 425–464. [Google Scholar] [CrossRef]
- Sudret, B. Global sensitivity analysis using polynomial chaos expansions. Reliab. Eng. Syst. Saf. 2008, 93, 964–979. [Google Scholar] [CrossRef]
- Wahba, G. An introduction to (smoothing spline) anova models in rkhs with examples in geographical data, medicine, atmospheric science and machine learning. arXiv 2004, arXiv:math/0410419. [Google Scholar] [CrossRef]
- Sobol, I.M.; Kucherenko, S. Derivative based global sensitivity measures and the link with global sensitivity indices. Math. Comput. Simul. 2009, 79, 3009–3017. [Google Scholar] [CrossRef]
- Kucherenko, S.; Rodriguez-Fernandez, M.; Pantelides, C.; Shah, N. Monte Carlo evaluation of derivative-based global sensitivity measures. Reliab. Eng. Syst. Saf. 2009, 94, 1135–1148. [Google Scholar] [CrossRef]
- Lamboni, M.; Iooss, B.; Popelin, A.-L.; Gamboa, F. Derivative-based global sensitivity measures: General links with Sobol’ indices and numerical tests. Math. Comput. Simul. 2013, 87, 45–54. [Google Scholar] [CrossRef]
- Roustant, O.; Fruth, J.; Iooss, B.; Kuhnt, S. Crossed-derivative based sensitivity measures for interaction screening. Math. Comput. Simul. 2014, 105, 105–118. [Google Scholar] [CrossRef]
- Lamboni, M. Derivative-based generalized sensitivity indices and Sobol’ indices. Math. Comput. Simul. 2020, 170, 236–256. [Google Scholar] [CrossRef]
- Lamboni, M.; Kucherenko, S. Multivariate sensitivity analysis and derivative-based global sensitivity measures with dependent variables. Reliab. Eng. Syst. Saf. 2021, 212, 107519. [Google Scholar] [CrossRef]
- Lamboni, M. Measuring inputs-outputs association for time-dependent hazard models under safety objectives using kernels. Int. J. Uncertain. Quantif. 2024, 1–17. [Google Scholar] [CrossRef]
- Lamboni, M. Kernel-based measures of association between inputs and outputs using ANOVA. Sankhya A 2024. [CrossRef]
- Russi, T.M. Uncertainty Quantification with Experimental Data and Complex System Models; Spring: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
- Constantine, P.; Dow, E.; Wang, S. Active subspace methods in theory and practice: Applications to kriging surfaces. SIAM J. Sci. Comput. 2014, 36, 1500–1524. [Google Scholar] [CrossRef]
- Zahm, O.; Constantine, P.G.; Prieur, C.; Marzouk, Y.M. Gradient-based dimension reduction of multivariate vector-valued functions. SIAM J. Sci. Comput. 2020, 42, A534–A558. [Google Scholar] [CrossRef]
- Kucherenko, S.; Shah, N.; Zaccheus, O. Application of Active Subspaces for Model Reduction and Identification of Design Space; Springer: Berlin/Heidelberg, Germany, 2024; pp. 412–418. [Google Scholar]
- Kubicek, M.; Minisci, E.; Cisternino, M. High dimensional sensitivity analysis using surrogate modeling and high dimensional model representation. Int. J. Uncertain. Quantif. 2015, 5, 393–414. [Google Scholar] [CrossRef]
- Kuo, F.; Sloan, I.; Wasilkowski, G.; Woźniakowski, H. On decompositions of multivariate functions. Math. Comput. 2010, 79, 953–966. [Google Scholar] [CrossRef]
- Bates, D.; Watts, D. Relative curvature measures of nonlinearity. J. Royal Stat. Soc. Ser. B 1980, 42, 1–25. [Google Scholar] [CrossRef]
- Guidotti, E. calculus: High-dimensional numerical and symbolic calculus in R. J. Stat. Softw. 2022, 104, 1–37. [Google Scholar] [CrossRef]
- Le Dimet, F.-X.; Talagrand, O. Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. Tellus A Dyn. Meteorol. Oceanogr. 1986, 38, 97–110. [Google Scholar] [CrossRef]
- Le Dimet, F.X.; Ngodock, H.E.; Luong, B.; Verron, J. Sensitivity analysis in variational data assimilation. J. Meteorol. Soc. Jpn. 1997, 75, 245–255. [Google Scholar] [CrossRef]
- Cacuci, D.G. Sensitivity and Uncertainty Analysis—Theory, Chapman & Hall; CRC: Boca Raton, FL, USA, 2005. [Google Scholar]
- Gunzburger, M.D. Perspectives in Flow Control and Optimization; SIAM: Philadelphia, PA, USA, 2003. [Google Scholar]
- Borzi, A.; Schulz, V. Computational Optimization of Systems Governed by Partial Differential Equations; SIAM: Philadelphia, PA, USA, 2012. [Google Scholar]
- Ghanem, R.; Higdon, D.; Owhadi, H. Handbook of Uncertainty Quantification; Springer International Publishing: Berlin/Heidelberg, Germany, 2017. [Google Scholar]
- Wang, Z.; Navon, I.M.; Le Dimet, F.-X.; Zou, X. The second order adjoint analysis: Theory and applications. Meteorol. Atmos. Phys. 1992, 50, 3–20. [Google Scholar] [CrossRef]
- Agarwal, A.; Dekel, O.; Xiao, L. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Proceedings of the 23rd Conference on Learning Theory, Haifa, Israel, 27–29 June 2010; pp. 28–40. [Google Scholar]
- Bach, F.; Perchet, V. Highly-smooth zero-th order online optimization. In Proceedings of the 29th Annual Conference on Learning Theory, New York, NY, USA, 23–26 June 2016; Feldman, V., Rakhlin, A., Shamir, O., Eds.; Volume 49, pp. 257–283. [Google Scholar]
- Akhavan, A.; Pontil, M.; Tsybakov, A.B. Exploiting Higher Order Smoothness in Derivative-Free Optimization and Continuous Bandits, NIPS’20; Curran Associates Inc.: Red Hook, NY, USA, 2020. [Google Scholar]
- Lamboni, M. Optimal and efficient approximations of gradients of functions with nonindependent variables. Axioms 2024, 13, 426. [Google Scholar] [CrossRef]
- Patelli, E.; Pradlwarter, H. Monte Carlo gradient estimation in high dimensions. Int. J. Numer. Methods Eng. 2010, 81, 172–188. [Google Scholar] [CrossRef]
- Prashanth, L.; Bhatnagar, S.; Fu, M.; Marcus, S. Adaptive system optimization using random directions stochastic approximation. IEEE Trans. Autom. Control. 2016, 62, 2223–2238. [Google Scholar]
- Agarwal, N.; Bullins, B.; Hazan, E. Second-order stochastic optimization for machine learning in linear time. J. Mach. Learn. Res. 2017, 18, 4148–4187. [Google Scholar]
- Zhu, J.; Wang, L.; Spall, J.C. Efficient implementation of second-order stochastic approximation algorithms in high-dimensional problems. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 3087–3099. [Google Scholar] [CrossRef] [PubMed]
- Zhu, J. Hessian estimation via stein’s identity in black-box problems. In Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, Online, 15–17 August 2022; Bruna, J., Hesthaven, J., Zdeborova, L., Eds.; Volume 145 of Proceedings of Machine Learning Research, PMLR. pp. 1161–1178. [Google Scholar]
- Erdogdu, M.A. Newton-stein method: A second order method for glms via stein’ s lemma. In Advances in Neural Information Processing Systems; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
- Stein, C.; Diaconis, P.; Holmes, S.; Reinert, G. Use of exchangeable pairs in the analysis of simulations. Lect.-Notes-Monogr. Ser. 2004, 46, 1–26. [Google Scholar]
- Zemanian, A. Distribution Theory and Transform Analysis: An Introduction to Generalized Functions, with Applications, Dover Books on Advanced Mathematics; Dover Publications: Mineola, NY, USA, 1987. [Google Scholar]
- Strichartz, R. A Guide to Distribution Theory and Fourier Transforms, Studies in Advanced Mathematics; CRC Press: Boca, FL, USA, 1994. [Google Scholar]
- Rawashdeh, E. A simple method for finding the inverse matrix of Vandermonde matrix. Math. Vesn. 2019, 71, 207–213. [Google Scholar]
- Arafat, A.; El-Mikkawy, M. A fast novel recursive algorithm for computing the inverse of a generalized Vandermonde matrix. Axioms 2023, 12, 27. [Google Scholar] [CrossRef]
- Morris, M. Factorial sampling plans for preliminary computational experiments. Technometrics 1991, 33, 161–174. [Google Scholar] [CrossRef]
- Roustant, O.; Barthe, F.; Iooss, B. Poincaré inequalities on intervals-application to sensitivity analysis. Electron. J. Stat. 2017, 11, 3081–3119. [Google Scholar] [CrossRef]
- Lamboni, M. Multivariate sensitivity analysis: Minimum variance unbiased estimators of the first-order and total-effect covariance matrices. Reliab. Eng. Syst. Saf. 2019, 187, 67–92. [Google Scholar] [CrossRef]
- Homma, T.; Saltelli, A. Importance measures in global sensitivity analysis of nonlinear models. Reliab. Eng. Syst. Saf. 1996, 52, 1–17. [Google Scholar] [CrossRef]
- Dutang, C.; Savicky, P. randtoolbox: Generating and Testing Random Numbers. R Package, version 1.13; The R Foundation: Vienna, Austria, 2013. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).