Optimal and Efficient Approximations of Gradients of Functions with Nonindependent Variables

Lamboni, Matieyendou

doi:10.3390/axioms13070426

Open AccessArticle

Optimal and Efficient Approximations of Gradients of Functions with Nonindependent Variables

by

Matieyendou Lamboni

^1,2

¹

Department DFR-ST, University of Guyane, 97346 Cayenne, France

²

228-UMR Espace-Dev, University of Guyane, University of Réunion, IRD, University of Montpellier, 34090 Montpellier, France

Axioms 2024, 13(7), 426; https://doi.org/10.3390/axioms13070426

Submission received: 12 May 2024 / Revised: 16 June 2024 / Accepted: 19 June 2024 / Published: 25 June 2024

(This article belongs to the Special Issue Recent Research on Functions with Non-Independent Variables)

Download

Browse Figures

Versions Notes

Abstract

Gradients of smooth functions with nonindependent variables are relevant for exploring complex models and for the optimization of the functions subjected to constraints. In this paper, we investigate new and simple approximations and computations of such gradients by making use of independent, central, and symmetric variables. Such approximations are well suited for applications in which the computations of the gradients are too expansive or impossible. The derived upper bounds of the biases of our approximations do not suffer from the curse of dimensionality for any 2-smooth function, and they theoretically improve the known results. Also, our estimators of such gradients reach the optimal (mean squared error) rates of convergence (i.e.,

O (N^{- 1})

) for the same class of functions. Numerical comparisons based on a test case and a high-dimensional PDE model show the efficiency of our approach.

Keywords:

dependent variables; gradients; high-dimensional models; optimal estimators; tensor metric of nonindependent variables

MSC:

26A24; 60H25; 62Gxx; 49Qxx

1. Introduction

Nonindependent variables arise when at least two variables do not vary independently, and such variables are often characterized by their covariance matrices, distribution functions, copulas, and weighted distributions (see, e.g., [1,2,3,4,5,6,7]). Recently, dependency models provide explicit functions that link these variables together by means of additional independent variables [8,9,10,11,12]. Models with nonindependent input variables, including functions subjected to constraints, are widely encountered in different scientific fields, such as data analysis, quantitative risk analysis, and uncertainty quantification (see, e.g., [13,14,15]).

Analyzing such functions requires being able to calculate or to compute their dependent gradients, that is, the gradients that account for the dependencies among the inputs. Recall that gradients are involved in (i) inverse problems and optimization (see, e.g., [16,17,18,19,20]), (ii) exploring complex mathematical models or simulators (see [21,22,23,24,25,26,27,28] for independent inputs and [9,15] for nonindependent variables); (iii) Poincaré inequalities and equalities [9,28,29,30], and recently in (iv) the derivative-based ANOVA (i.e., exact expansions) of functions [28]. While the first-order derivatives of functions with nonindependent variables have been derived in [9] for screening dependent inputs of high-dimensional models, the theoretical expressions of the gradients of such functions (dependent gradients) have been introduced in [15], thereby enhancing the difference between the gradients and the first-order partial derivatives when the input variables are dependent or correlated.

In high-dimensional settings and for time-demanding models, having an efficient approach for computing the dependent gradients provided in [15] using a few model evaluations is worth investigating. So far, the adjoint methods can provide the exact classical gradients for some classes of PDE/ODE-based models [31,32,33,34,35,36]. Additionally, Richardson’s extrapolation and its generalization considered in [37] provide accurate estimates of the classical gradients using a number of model runs that strongly depends on the dimensionality. In contrast, the Monte Carlo approach allows for computing the classical gradients using a number of model runs that can be very less than the dimensionality (i.e.,

d \in N

) [17,38,39]. The Monte Carlo approach is a consequence of the Stokes theorem, which claims that the expectation of a function evaluated at a random point about

x \in R^{d}

is the gradient of a certain function. Such a property leads to randomized approximations of the classical gradients in derivative-free optimization or zero-order stochastic optimization (see [16,18,19,20] and the references therein). Such approximations are also relevant for applications in which the computations of the gradients are impossible [20].

Most of the randomized approximations of the classical gradients, including the Monte Carlo approach, rely on randomized kernels and/or random vectors that are uniformly distributed on the unit ball. The qualities of such approximations are often assessed by the upper bounds of the biases and the rates of convergence. The upper bounds provided in [19,20,40] depend on the dimensionality in general.

In this paper, we propose new surrogates of the gradients of smooth functions with nonindependent inputs and the associated estimators that comply with the following requirements:

They are simple and applicable to a wide class of functions by making use of model evaluations at randomized points, which are only based on independent, central, and symmetric variables;
They lead to a dimension-free upper bound of the bias, and they improve the best known upper bounds of the bias for the classical gradients;
They lead to the optimal and parametric (mean squared error) rates of convergence;
They are going to increase the computational efficiency and accuracy of the gradients estimates by means of a set of constraints.

The surrogates of the dependent gradients are derived in Section 3 by combining the properties of (i) the generalized Richardson extrapolation approach thanks to a set of constraints and (ii) the Monte Carlo approach based only on independent random variables that are symmetrically distributed about zero. Such expressions are followed by their order of approximations, biases, and a comparison with known results for the classical gradients. We also provide the estimators of such surrogates and their associated mean squared errors, including the rates of convergence for a wide class of functions (see Section 3.3). A number of numerical comparisons is considered so as to assess the efficiency of our approach. Section 4 presents comparisons of our approach to other methods and simulations based on a high-dimensional PDE (spatiotemporal) model with given autocollaborations among the initial conditions, which are considered in Section 5 to compare our approach to the adjoint-based methods. We conclude this work in Section 6.

2. Preliminaries

For an integer

d > 0

, let

X : = (X_{1}, \dots, X_{d})

be a random vector of continuous and nonindependent variables having F as the joint cumulative distribution function (CDF) (i.e.,

X \sim F

). For any

j \in {1, \dots, d}

, we use

F_{x_{j}}

or

F_{j}

for the marginal CDF of

X_{j}

and

F_{j}^{- 1}

for its inverse. Also, we use

(\sim j) : = (1, \dots, j - 1, j + 1, \dots, d)

and

X_{\sim j} : = (X_{1}, \dots, X_{j - 1}, X_{j + 1}, \dots, X_{d})

. The equality (in distribution)

X \overset{d}{=} Z

means that X and Z have the same CDF.

As the sample values of

X

are dependent, here we use

\frac{\partial f}{\partial x_{k}}

for the formal partial derivative of f with respect to

x_{k}

, that is, the partial derivative obtained by considering other inputs as constant or independent of

x_{k}

. Thus,

\nabla f : = {[\frac{\partial f}{\partial x_{1}}, \dots, \frac{\partial f}{\partial x_{d}}]}^{T}

stands for the formal or classical gradient of f.

Given an open set

Ω \subseteq R^{d}

, consider a weak partial differentiable function

f : Ω \to R

[41,42]. Given

\vec{ı} : = (i_{1}, \dots, i_{d}) \in N^{d}

, denote

D^{(\vec{ı})} f : = (\prod_{k = 1}^{d} \frac{\partial^{i_{k}}}{\partial x_{k}}) f

;

{(x)}^{\vec{ı}} = x^{\vec{ı}} : = \prod_{k = 1}^{d} x_{k}^{i_{k}}

,

\vec{ı}! = i_{1}! \dots i_{d}!

, and consider the Hölder space of

α

-smooth functions given by

\forall x, y \in R^{d}

H_{α} : = \{f : R^{d} \to R^{n} : |f (x) - \sum_{0 \leq i_{1} + \dots + i_{d} \leq α - 1} \frac{D^{(\vec{ı})} f (y)}{\vec{ı}!} {(x - y)}^{\vec{ı}}| \leq M_{α} {||x - y||}_{2}^{α}\},

with

α \geq 1

and

M_{α} > 0

. We use

{||\cdot||}_{2}

for the Euclidean norm,

| | \cdot {| |}_{1}

for the

L_{1}

norm,

E (\cdot)

for the expectation, and

V (\cdot)

for the variance.

For the stochastic evaluations of functions, consider

L, q \in N \ {0}

,

β_{ℓ} \in R

with

ℓ = 1, \dots, L

,

h : = (h_{1}, \dots, h_{d}) \in R_{+}^{d}

, and denote with

V : = (V_{1}, \dots, V_{d})

the d-dimensional random vectors of independent variables satisfying the following:

\forall j \in {1, \dots, d}

,

E [V_{j}] = 0; E [{(V_{j})}^{2}] = σ^{2}; E [{(V_{j})}^{2 q + 1}] = 0; E [{(V_{j})}^{2 q}] < + \infty .

The random vectors of the independent variables that are symmetrically distributed about zero are instances of

V

, including the standard Gaussian random vector and the symmetric uniform distributions about zero.

Also, denote

h V : = (h_{1} V_{1}; \dots, h_{d} V_{d})

;

h^{- 1} V : = (V_{1} / h_{1}; \dots, V_{d} / h_{d})

and

β_{ℓ} h V : = (β_{ℓ} h_{1} V_{1}; \dots, β_{ℓ} h_{d} V_{d})

. The real

β_{ℓ}

s are used for controlling the order of approximations and the order of derivatives (i.e.,

| | \vec{ı} {| |}_{1} = 1, 2

) that we are interested in. Finally,

h_{j}

s are used to define a neighborhood of a sample point of

X

(i.e.,

x

). Thus, using

β_{m a x} : = max (| β_{1} |, \dots, | β_{L} |)

and keeping in mind the variance of

β_{ℓ} h_{j} V_{j}

, we assume that

\forall j \in {1, \dots, d}

,

Assumption 1.

β_{m a x} h_{j} σ \leq 1 / 2

or equivalently

0 < β_{m a x} h_{j} | V_{j} | \leq 1

for bounded

V_{j}^{'} s

.

3. Main Results

This section aims at providing new expressions of the gradient of a function with nonindependent variables and the associated order of approximations. We are also going to derive the estimators of such a gradient, including the optimal and parametric rates of convergence. Recall that the input variables are said to be nonindependent whenever there exist at least two variables

X_{j}, X_{k}

such that the joint CDF

F_{j, k} (x_{j}, x_{k}) \neq F_{j} (x_{j}) F_{k} (x_{k})

.

3.1. Stochastic Expressions of the Gradients of Functions with Dependent Variables

Using the fact that

X \sim F

, with

F (x) \neq \prod_{j = 1}^{d} F_{j} (x_{j})

, we are able to model

X

as follows [8,9,10,11,12,14,43]:

\begin{matrix} X_{\sim j} & \overset{d}{=} & r_{j} (X_{j}, Z_{\sim j}) \\ = & {[r_{1, j} (X_{j}, Z_{\sim j}), \dots, r_{j - 1, j} (X_{j}, Z_{\sim j}), r_{j + 1, j} (X_{j}, Z_{\sim j}) \dots, r_{d, j} (X_{j}, Z_{\sim j})]}^{T}, \end{matrix}

(1)

where

r_{j} : R^{d} \to R^{d - 1}

;

X_{j}

; and

Z_{\sim j} : = (Z_{1}, \dots, Z_{j - 1}, Z_{j + 1}, \dots Z_{d})

are independent. Moreover, we have

(X_{j}, X_{\sim j}) \overset{d}{=} (X_{j}, r_{j} (X_{j}, Z))

, and it is worth noting that the function

r_{j}

is invertible with respect to

Z_{\sim j}

for continuous variables, that is,

Z_{\sim j} = r_{j}^{- 1} (X_{\sim j} | X_{j}) .

Note that the formal Jacobian matrix of

g : R^{d} \to R^{d}

,

x \mapsto x

is the identity matrix. As

x

is a sample value of

X

, the dependent Jacobean of g based on the above dependency function is clearly not the identity matrix due to the fact that such a matrix accounts for the dependencies among the elements of

x

. The dependent partial derivatives of

x

with respect to

x_{j}

are then given by [9,15]

J^{(j)} (x) : = \frac{\partial x}{\partial x_{j}} = {[\frac{\partial r_{1, j}}{\partial x_{j}} \dots \underset{j^{th} position}{\underset{︸}{1}} \dots \frac{\partial r_{d, j}}{\partial x_{j}}]}^{T} (x_{j}, r_{j}^{- 1} (x_{\sim j} | x_{j})),

and the dependent Jacobian matrix becomes (see [15] for more details)

J^{d} (x) : = [J^{(1)} (x), \dots, J^{(d)} (x)] .

Moreover, the gradient of f with nonindependent variables is given by [15]

g r a d (f) (x) : = {[J^{d} {(x)}^{T} J^{d} (x)]}^{- 1} \nabla f (x) = G^{- 1} (x) \nabla f (x),

(2)

with

G (x) : = J^{d} {(x)}^{T} J^{d} (x)

being the tensor metric and

G^{- 1} (x)

being its generalized inverse. Based on the above framework, Theorem 1 provides the stochastic expression of

g r a d (f) (x)

. In what follows, denote

𝟙_{•} : = {[1, \dots, 1]}^{T} \in R^{d}

.

Theorem 1.

Assume that

f \in H_{α}

, with

α \geq 2 L

, Assumption 1 holds and that the

β_{ℓ}

s are distinct. Then, there exists

α_{1} \in {1, \dots, L}

and reals coefficients

C_{1}, \dots, C_{L}

such that

g r a d (f) (x) = G^{- 1} (x) \sum_{ℓ = 1}^{L} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}] + O ({||h||}_{2}^{2 α_{1}}) 𝟙_{•} .

(3)

Proof.

See Appendix A for the detailed proof. □

Using the Kronecker symbol

δ_{1, r}

, the setting

L = 1, β_{1} = 1, C_{1} = 1

, or the constraints

\sum_{ℓ = 1}^{L = 2} C_{ℓ} β_{ℓ}^{r} = δ_{1, r}; r = 0, 1

lead to the order of approximation

O ({||h||}_{2}^{2})

, while the constraints

\sum_{ℓ = 1}^{L} C_{ℓ} β_{ℓ}^{r} = δ_{1, r}; r = 1, 3, 5, \dots, 2 L - 1

allow for increasing that order up to

O ({||h||}_{2}^{2 L})

. For distinct

β

s, the above constraints lead to the existence of the constants

C_{1}, \dots, C_{L}

. Indeed, some constraints rely on the Vandermonde matrix of the form

A_{L} : = [\begin{matrix} 1 & 1 & \dots & 1 \\ β_{1} & β_{2} & \dots & β_{L} \\ β_{1}^{2} & β_{2}^{2} & \dots & β_{L}^{2} \\ ⋮ & ⋮ & ⋮ & ⋮ \end{matrix}],

which is invertible for distinct values of the

β_{ℓ}

s (i.e.,

β_{ℓ_{1}} \neq β_{ℓ_{2}}

), because the determinant

det (A_{L}) = \prod_{1 \leq ℓ_{1} < ℓ_{2} \leq L} (β_{ℓ_{1}} - β_{ℓ_{2}})

.

Remark 1.

For an even integer L, the following nodes may be considered:

\{β_{1}, \dots, β_{L}\} = \{\pm 2^{k}, k = 0, \dots, \frac{L - 2}{2}\}

. When L is odd, one may add 0 to the above set. Of course, there are other possibilities, provided that

\sum_{ℓ = 1}^{L} C_{ℓ} β_{ℓ} = 1

.

Beyond the strong assumption made on the functions in Theorem 1, and knowing that increasing L will require more evaluations of f at random points, we are going to derive the upper bounds of the biases of our appropriations under different structural assumptions on the deterministic functions f and

V

, such as

f \in H_{α}

, with

α > 1

. To that end, denote

R : = (R_{1}, \dots, R_{d})

as a d-dimensional random vector of independent variables that are centered about zero and standardized (i.e.,

E [R_{k}^{2}] = 1

,

k = 1, \dots, d

), and let

R_{c}

be the set of such random vectors. Define

K_{1} : = inf_{R \in R_{c}} {|||G^{- 1} (x)| E [R^{2} {||R||}_{2}]||}_{1}; K_{2} : = inf_{R \in R_{c}} {|||G^{- 1} (x)| E [R^{2} {||R||}_{2}]||}_{2};

with

|G^{- 1}|

the matrix obtained by putting the entries of

G^{- 1}

in the absolute value.

When

1 < α \leq 2

, only

L = 1

or

L = 2

can be considered for any function that belongs to

H_{α}

. To be able to derive the parametric rates of convergence, Corollary 1 starts providing the upper bounds of the bias when

L = 2

.

Corollary 1.

Consider

β_{1} = 1, β_{2} = - 1

;

C_{1} = 1 / 2

; and

C_{2} = - 1 / 2

. If

f \in H_{2}

and Assumption 1 hold, then there exists

M_{2} > 0

such that

{||g r a d (f) (x) - G^{- 1} (x) \sum_{ℓ = 1}^{L = 2} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{1} \leq σ M_{2} K_{1} {||h||}_{2};

(4)

{||g r a d (f) (x) - G^{- 1} (x) \sum_{ℓ = 1}^{L = 2} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{2} \leq σ M_{2} K_{2} {||h||}_{2} .

(5)

Proof.

Detailed proofs are provided in Appendix B. □

For a particular choice of

V

, we obtain the results below.

Corollary 2.

Consider

β_{1} = 1, β_{2} = - 1

;

C_{1} = 1 / 2

;

C_{2} = - 1 / 2

. If

V_{k} \sim U (- ξ, ξ)

, where

ξ > 0

,

k = 1, \dots, d

,

f \in H_{2}

, and Assumption 1 hold, then

{||g r a d (f) (x) - G^{- 1} (x) \sum_{ℓ = 1}^{L = 2} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{1} \leq M_{2} ξ {|||G^{- 1} (x)| 𝟙_{•}||}_{1} {||h||}_{1};

(6)

{||g r a d (f) (x) - G^{- 1} (x) \sum_{ℓ = 1}^{L = 2} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{2} \leq M_{2} ξ {|||G^{- 1} (x)| 𝟙_{•}||}_{2} {||h||}_{1} .

(7)

Proof.

Since

| V_{k} | \leq ξ

, we have

{||h V||}_{1} \leq ξ {||h||}_{1}

, and the results hold using the following upper bounds:

M_{2} {|||G^{- 1} (x)| E [\frac{V^{2}}{σ^{2}} {||h V||}_{1}]||}_{1}

and

M_{2} {|||G^{- 1} (x)| E [\frac{V^{2}}{σ^{2}} {||h V||}_{1}]||}_{2}

obtained in Appendix B. □

It is worth noting that choosing

h_{k} = h

and

ξ = 1 / d^{2}

leads to the dimension-free upper bound of the bias, that is,

{||g r a d (f) (x) - G^{- 1} (x) \sum_{ℓ = 1}^{L = 2} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{1} \leq \frac{M_{2} h}{d} {|||G^{- 1} (x)| 𝟙_{•}||}_{1},

because

{|||G^{- 1} (x)| 𝟙_{•}||}_{2}

is a function of d in general.

For the sequel of generality, Corollary 3 provides the bias of our approximations for highly smooth functions. To that end, define

K_{2, L} : = inf_{R \in R_{c}} {|||G^{- 1} (x)| E [R^{2} {||R||}_{2}^{L}]||}_{2}; K_{3} : = \sum_{ℓ = 1}^{L + 1} |C_{ℓ} β_{ℓ}^{1 + L}| .

Corollary 3.

For an odd integer

L > 2

, consider

\sum_{ℓ = 1}^{L + 1} C_{ℓ} β_{ℓ}^{r} = δ_{1, r}; r = 0, 1, \dots, L

. If

f \in H_{1 + L}

and Assumption 1 hold, then there exists

M_{1 + L} > 0

such that

{||g r a d (f) (x) - G^{- 1} (x) \sum_{ℓ = 1}^{L + 1} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{2} \leq σ^{L} M_{1 + L} K_{2, L} K_{3} {||h||}_{2}^{L} .

Moreover, if

V_{k} \sim U (- ξ, ξ)

, with

ξ > 0

and

k = 1, \dots, d

, then

{||g r a d (f) (x) - G^{- 1} (x) \sum_{ℓ = 1}^{L + 1} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{2} \leq ξ^{L} M_{1 + L} {|||G^{- 1} (x)| 𝟙_{•}||}_{2} {||h||}_{1}^{L} K_{3} .

Proof.

The proofs are similar to those of Corollary 1 (see Appendix B). □

In view of the results provided in Corollary 3, finding the

β

s and Cs that minimize the quantity

K_{3} = \sum_{ℓ = 1}^{L + 1} |C_{ℓ} β_{ℓ}^{1 + L}|

might be helpful for improving the above upper bounds.

3.2. Links to Other Works for Independent Input Variables

Recall that for independent input variables, the matrix

|G^{- 1} (x)|

comes down to the identity matrix, and

g r a d (f) = \nabla f

. Thus, Equation (7) becomes

{||\nabla f (x) - \sum_{ℓ = 1}^{L = 2} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{2} \leq M_{2} h,

when

ξ = \sqrt{d} / d^{2}

. Taking

ξ = \sqrt{d} / d

leads to the upper bound

M_{2} h d

.

Other results about the upper bounds of the bias of the (formal) gradient approximations have been provided in [19,20] (and the references therein) under the same assumptions made on f and the evaluations of f. Such results rely on a random vector

S

that is uniformly distributed on the unit ball and a kernel K. Under such a framework, the upper bound derived in [19,20] is

{||\nabla f (x) - \frac{d}{h} E [f (x + U h S) S K (U)]||}_{2} \leq 2 \sqrt{2} α d M_{α} h^{α - 1},

where

U \sim U (- 1, 1)

is independent of

S

. Therefore, our results improve the upper bound obtained in [19,20] when

α = 2

, for instance.

3.3. Computation of the Gradients of Functions with Dependent Variables

Consider a sample of

V

given by

{\{V_{i} : = (V_{i, 1}, \dots, V_{i, d})\}}_{i = 1}^{N}

. Using Equation (3), the estimator of

g r a d (f) (x)

is derived as follows:

\hat{g r a d (f)} (x) : = G^{- 1} (x) \frac{1}{N} \sum_{i = 1}^{N} \sum_{ℓ = 1}^{L} C_{ℓ} f (x + β_{ℓ} h V_{i}) \frac{V_{i} h^{- 1}}{σ^{2}} .

To assess the quality of such an estimator, it is common to use the mean squared error (MSE), including the rates of convergence. The MSEs are often used in statistics for determining the optimal value of

h

as well. Theorem 2 and Corollary 4 provide such quantities of interest. To that end, define

K_{4} : = inf_{R \in R_{c}} E [{||G^{- 1} (x) R h^{- 1}||}_{2}^{2} {||R^{2}||}_{2}] .

Theorem 2.

Consider

β_{1} = 1, β_{2} = - 1

;

C_{1} = 1 / 2

; and

C_{2} = - 1 / 2

. If

f \in H_{2}

and Assumption 1 hold, then

E [{||\hat{g r a d (f)} (x) - g r a d (f) (x)||}_{2}^{2}] \leq σ^{2} M_{2}^{2} K_{2}^{2} {||h||}_{2}^{2} + \frac{M_{1}^{2} K_{4} {||h^{2}||}_{2}}{N} .

(8)

Moreover, if

V_{k} \sim U (- ξ, ξ)

, with

ξ > 0

,

k = 1, \dots, d

and

R_{0} : = V / σ

, then

\begin{matrix} E [{||\hat{g r a d (f)} (x) - g r a d (f) (x)||}_{2}^{2}] & \leq & M_{2}^{2} ξ^{2} {|||G^{- 1} (x)| 𝟙_{•}||}_{2}^{2} {||h||}_{1}^{2} \\ + \frac{3 \sqrt{d} M_{1}^{2} {||h^{2}||}_{2}}{N} E [{||G^{- 1} (x) R_{0} h^{- 1}||}_{2}^{2}] . \end{matrix}

(9)

Proof.

See Appendix C. □

Using a uniform bandwidth, that is,

h_{k} = h

, with

k = 1, \dots, d

, the upper bounds of MSEs provided in Theorem 2 have simple expressions. Indeed, the upper bounds in Equations (8) and (9) become, respectively,

σ^{2} M_{2}^{2} K_{2}^{2} d h^{2} + \frac{M_{1}^{2} \sqrt{d}}{N} inf_{R \in R_{c}} E [{||G^{- 1} (x) R||}_{2}^{2} {||R^{2}||}_{2}];

M_{2}^{2} ξ^{2} {|||G^{- 1} (x)| 𝟙_{•}||}_{2}^{2} d^{2} h^{2} + \frac{3 d M_{1}^{2}}{N} E [{||G^{- 1} (x) R_{0}||}_{2}^{2}] .

It comes out that the second terms of the above upper bounds do not depend on the bandwidth h. This key observation leads to the derivation of the optimal and parametric rates of convergence of the proposed estimator.

Corollary 4.

Under the assumptions made in Theorem 2, if

ξ = d^{- 3 / 2}

and

h_{k} = h \propto N^{- γ / 2}

, with

γ \in] 1, 2 [

, then we have

E [{||\hat{g r a d (f)} (x) - g r a d (f) (x)||}_{2}^{2}] = O (N^{- 1} d^{2}) .

Proof.

The proof is straightforward, since

h^{2} \propto N^{- γ}

and

N h \to \infty

when

N \to \infty

. □

It is worth noting that the upper bound of the squared bias obtained in Corollary 4 does not depend on the dimensionality, thanks to the choice of

ξ = d^{- 3 / 2}

. But, the derived rate of convergence depends on

d^{2}

, thus meaning that our estimator suffers from the curse of dimensionality. In higher dimensions, an attempt to improve our results consists of controlling the upper bound of the second-order moment of the estimator through

\sum_{ℓ = 1}^{L} |C_{ℓ} β_{ℓ}|

.

Remark 2.

For highly smooth functions (i.e.,

f \in H_{1 + L}

, with

L > 3

) and under the assumptions made in Corollary 3, we can check that (see Appendix C)

\begin{matrix} E [{||\hat{g r a d (f)} (x) - g r a d (f) (x)||}_{2}^{2}] & \leq & ξ^{2 L} M_{1 + L}^{2} {‖ |G^{- 1} (x)| 𝟙_{•} ‖}_{2}^{2} {||h||}_{1}^{2 L} K_{3}^{2} \\ + \frac{3 \sqrt{d} M_{1}^{2} {||h^{2}||}_{2}}{N} {(\sum_{ℓ = 1}^{L + 1} |C_{ℓ} β_{ℓ}|)}^{2} E [{||G^{- 1} (x) R_{0} h^{- 1}||}_{2}^{2}] . \end{matrix}

4. Computations of the Formal Gradient of Rosenbrock’s Function

For comparing our approach to (i) the finite differences method (FDM) using the R package numDeriv [44], with

h = 10^{- 4}

, and (ii) the Monte Carlo (MC) approach provided in [17], with

h = 10^{- 4}

, let us consider the Rosenbrock function given as follows:

\forall x \in R^{d}

,

r (x) : = \sum_{k = 1}^{d - 1} [{(1 - x_{k})}^{2} + 100 {(x_{k + 1} - x_{k}^{2})}^{2}] .

The gradient of that function at

0

is

\nabla r (0) = {[- 2, \dots, - 2, 0]}^{T} \in R^{100}

(see [17]). To assess the numerical accuracy of each approach, the following measure is considered:

E r r : = \frac{{||\nabla r (0) - \hat{\nabla r} (0)||}_{1}}{{||\nabla r (0)||}_{1}},

where

\hat{\nabla r} (0)

is the estimated value of the gradient. Table 1 reports the values of

E r r

for the three approaches. To obtain the results using our approach, we have used

h = 1 / \sqrt{N}

, with N as the sample size and

ξ = 1 / d^{2} = 10^{- 4}

, with

d = 100

. Also, the Sobol sequence has been used for generating the values of the

V_{j}

s, and the Gram–Schmidt algorithm is applied to obtain (perfect) orthogonal vectors for a given N.

Based on Table 1, our approach provides efficient results compared to other methods. Since the FDM is not possible when

N < 2 d = 200

, it comes out that our approach is quite flexible thanks to L and the fact that the gradient can be computed for every value of N. Increasing N improved our results, as expected.

5. Application to a Heat PDE Model with Stochastic Initial Conditions

5.1. Heat Diffusion Model and Its Formal Gradient

Consider a time-dependent model

f (x, t)

defined by the one-dimensional (1-D) diffusion PDE with stochastic initial conditions, that is,

\{\begin{matrix} \frac{\partial f}{\partial t} - D \frac{\partial^{2} f}{\partial^{2} x} = 0, & x \in] 0, 1 [, t \in [0, T] \\ f (x, t = 0) = Z (x) & x \in [0, 1] \\ f (x = 0, t) = 0, f (x = 1, t) = 1, & t \in [0, T] \end{matrix},

where

D \in R_{+}

represents the diffusion coefficient. It is common to consider

J (Z (x)) : = \frac{1}{2} \int_{0}^{T} \int_{0}^{10} {(f (x, t))}^{2} d x d t

as the quantity of interest (QoI). The spatial discretisation consists in subdividing the spatial domain

[0, 1]

in d equally sized cells, which leads to d initial conditions or inputs given by

Z (x_{j})

with

j = 1, \dots, d

. Given zero-mean random variables

(R_{j}, j = 1, \dots, d)

, assume that

X_{j} : = Z (x_{j}) = sin (2 π x_{j}) + s_{j} R_{j}, j = 1, \dots, d

, where

s_{j} \in R_{+}

represents the inverse precision about our knowledge on the initial conditions. For the dynamic aspect, a time step of

0.025

is considered starting from 0 up to

T = 5

.

Given a direction

z (x)

and the Gâteaux derivative

\overset{ˇ}{f} (x, t) : = \frac{\partial f}{\partial z (x)}

, the tangent linear model is derived as follows:

\{\begin{matrix} \frac{\partial \overset{ˇ}{f}}{\partial t} - D \frac{\partial^{2} \overset{ˇ}{f}}{\partial^{2} x} = 0, & x \in] 0, 1 [, t \in [0, T] \\ \overset{ˇ}{f} (x, t = 0) = z (x), & x \in [0, 1] \\ \overset{ˇ}{f} (x = 0, t) = \overset{ˇ}{f} (x = 1, t) = 0, & t \in [0, T] \end{matrix},

and we can check that the adjoint model (AM) (i.e.,

f^{a}

) is given by

\{\begin{matrix} - \frac{\partial f^{a}}{\partial t} - D \frac{\partial^{2} f^{a}}{\partial^{2} x} = f, & x \in] 0, 1 [, t \in [0, T] \\ f^{a} (x = 0, t) = f^{a} (x = 1, t) = 0, & t \in [0, T] \\ f^{a} (x, T) = 0, & x \in [0, 1] \end{matrix} .

The formal gradient of

J (Z (x))

with respect to the inputs

Z (x)

is

\nabla_{Z} J (Z (x)) = f^{a} (x, 0)

. Remark that the above gradient relies on

f^{a} (x, 0)

, and only one evaluation of such a function is needed.

5.2. Spatial Autocorrelations of Initial Conditions and the Tensor Metric

Recall that the above gradient is based on the assumption of independent input variables, thus suggesting that the initial conditions within different cells are uncorrelated. To account for the spatial autocorrelations between different cells, assume that the d input variables follow the Gaussian process with the following autocorrelation function:

ρ (X_{j_{1}}, X_{j_{2}}) = {(\frac{1}{2})}^{| j_{1} - j_{2} |} 𝟙_{[0, 3]} (| j_{1} - j_{2} |); \forall j_{1}, j_{2} \in {1, \dots, d},

where

𝟙_{[0, 3]} (| j_{1} - j_{2} |) = 1

if

| j_{1} - j_{2} | \in [0, 3]

and is zero otherwise. Such spatial autocorrelations lead to the correlation matrix of the form

R : = [\begin{matrix} 1 & 0.5 & 0.25 & 0.125 & 0 & 0 & 0 & \dots & 0 \\ 0.5 & 1 & 0.5 & 0.25 & 0.125 & 0 & 0 & \dots & 0 \\ 0.25 & 0.5 & 1 & 0.5 & 0.25 & 0.125 & 0 & \dots & 0 \\ 0.125 & 0.25 & 0.5 & 1 & 0.5 & 0.25 & 0.125 & 0 & \dots 0 \\ 0 & ⋱ & ⋱ & ⋱ & ⋱ & ⋱ & ⋱ & ⋱ & ⋱ \dots \end{matrix}] .

Using the same standard deviation

s_{j} = s

leads to the following covariance matrix

Σ = s^{2} R

, and

X = (X_{1}, \dots, X_{d}) \sim N_{d} (μ, Σ)

, with

μ : = (sin (2 π c_{1}), \dots, sin (2 π c_{d}))

, and

c_{1}, \dots, c_{d}

being the centers of the cells. The associated dependency model is given below.

Consider the diagonal matrix

D_{\sim j} = d i a g (Σ_{1, 1}, \dots, Σ_{j - 1, j - 1}, Σ_{j + 1, j + 1}, \dots, Σ_{d, d})

and the Gaussian random vector

W \sim N_{d - 1} (μ_{\sim j}, D_{\sim j})

. Denote with

Σ^{(j)}

the matrix obtained by moving the

j^{th}

row and column of

Σ

to the the first row and column;

L^{(j)}

is the Cholesky factor of

Σ^{(j)}

, and

μ^{(j)} : = (μ_{j}, μ_{1}, \dots, μ_{j - 1}, μ_{j + 1}, \dots, μ_{d})

. We can see that

(X_{j}, X_{\sim j}) \sim N_{d} (μ^{(j)}, Σ^{(j)})

, and the dependency model is given by [10]

\begin{matrix} (X_{j}, X_{\sim j}) & = & L^{(j)} [\begin{matrix} \frac{1}{\sqrt{Σ_{j, j}}} (X_{j} - E [X_{j}]) \\ D_{\sim j}^{- 1 / 2} (W - μ_{\sim j}) \end{matrix}] + μ^{(j)}; j = 1, \dots, d . \end{matrix}

(10)

Based on Equation (10), we have

\frac{\partial X_{\sim j}}{\partial x_{j}} = \frac{L_{\sim 1, 1}^{(j)}}{\sqrt{Σ_{j, j}}} = \frac{Σ_{\sim 1, 1}^{(j)}}{Σ_{j, j}} = \frac{Σ_{\sim j, j}}{Σ_{j, j}}

. Thus, we can deduce that

J^{(j)} = \frac{Σ_{•, j}}{Σ_{j, j}}

, with

Σ_{•, j}

being the

j^{th}

column of

Σ

, and the dependent Jacobian becomes

J^{d} = [J^{(1)}, \dots, J^{(d)}] = [\frac{Σ_{•, 1}}{Σ_{1, 1}}, \dots, \frac{Σ_{•, d}}{Σ_{d, d}}] = \frac{Σ}{s^{2}} = R

, since

Σ_{j, j} = s_{j}^{2} = s^{2}

, and

Σ = s^{2} R

. The tensor metric is given by

G = R^{T} R

.

5.3. Comparisons between Exact Gradient and Estimated Gradients

For running the above PDE-based model using the R package deSolve [45], we are given

D = 0.0011

and

s = 1.96

. The exact and formal gradient associated with the mean values of the initial conditions is obtained by running the corresponding adjoint model. For estimating the gradient using the proposed estimators, we consider

L = 2, 3

and

N = 50, 100, 150, 200

. We also use

h = 1 / \sqrt{N}

and

V_{j} \sim N (0, 1), j = 1, \dots, d = 50

. The Sobol sequence is used for generating the random values of the

V_{j}

s, and the Gram–Schmidt algorithm is applied to obtain perfect orthogonal vectors for a given N.

Figure 1 shows the comparisons between the estimated and the exact values of the formal gradient

\nabla f

(i.e.,

ρ (X_{j_{1}}, Z_{j_{2}}) = 0

) for

L = 1, 2

. Likewise, Figure 2 and Figure 3 depict the dependent gradient

g r a d (f) = {(R^{T} R)}^{- 1} \nabla f

and its estimation. The estimates of both gradients are in line with the exact values using only

N L = 50

(respectively,

N L = 100

) model evaluations when

L = 1

and

N = 50

(respectively,

L = 1

and

N = 100

or

L = 2

and

N = 50

). Increasing the values of L and N gives the same quasiperfect results for both the formal and dependent gradients (see Figure 3).

6. Conclusions

In this paper, we have proposed new, simple, and generic approximations of the gradients of functions with nonindependent input variables by means of independent, central, and symmetric variables and a set of constraints. It comes out that the biases of our approximations for a wide class of functions, such as 2-smooth functions, do not suffer from the curse of dimensionality by properly choosing the set of independent, central, and symmetric variables. For functions including only independent input variables, a theoretical comparison has shown that the upper bounds of the bias of the formal gradient derived in this paper outperformed the best known results.

For computing the dependent gradient of the function of interest, we have provided estimators of such a gradient by making use of evaluations of that function at

L N

randomized points. Such estimators reach the optimal (mean squared error) rates of convergence (i.e.,

O (N^{- 1} d^{2})

) for a wide class of functions. Numerical comparisons using a test case and simulations based on a PDE model with given autocollaborations among the initial conditions have shown the efficiency of our approach, even when

L = 1, 2

constraints were used. Our approach is therefore flexible, thanks to L and the fact that the gradient can be computed for every value of the sample size N in general.

While the proposed estimators reach the parametric rate of convergence, note that the second-order moments of such estimators depend on

d^{2}

. An attempt to reach a dimension-free rate of convergence requires working in

C

rather than

R

when

L = 2

. In the future, it is worth investigating the derivation of the optimal rates of convergence that are dimension-free or (at least) are linear with respect to d by considering

L > 3

constraints. Also, combining such a promising approach with a transformation of the original space might be helpful for reducing the number of model evaluations in higher dimensions.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We would like to thank the reviewers for their comments that have helped to improve our manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

As

\vec{ı} = (ı_{1}, \dots, ı_{d})

, let

\vec{k} = (0, \dots, 0, \underset{k th position}{\underset{︸}{1}}, 0 \dots, 0) \in R^{d}

and

\vec{q} = (q_{1}, \dots, q_{d}) \in N^{d}

. Multiplying the Taylor expansion of

f (x + β_{ℓ} h V)

about

x

, that is,

f (x + β_{ℓ} h V) = \sum_{p = 0}^{m} \sum_{| | \vec{ı} {| |}_{1} = p} \frac{D^{(\vec{ı})} f (x)}{\vec{ı}!} β_{ℓ}^{p} {(h V)}^{\vec{ı}} + O ({||β_{ℓ} h V||}_{2}^{m + 1}),

by

\frac{V h^{- 1}}{σ^{2}} \in R^{d}

and the constant

C_{ℓ}

, as well as taking the sum over

ℓ = 1, \dots, L

, we can see that the expectation

E : = \sum_{ℓ = 1}^{L} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]

becomes

\begin{matrix} E & = & \sum_{p \geq 0} \sum_{| | \vec{ı} {| |}_{1} = p} \frac{D^{(\vec{ı})} f (x)}{\vec{ı}!} (\sum_{ℓ} C_{ℓ} β_{ℓ}^{p}) E [\frac{{(V)}^{\vec{ı}} {(h)}^{\vec{ı}} V h^{- 1}}{σ^{2}}] . \end{matrix}

Firstly, for a given

k \in {1, \dots, d}

and by independence, we can see that

E [{(V)}^{\vec{ı}} {(h)}^{\vec{ı}} V_{k} h_{k}^{- 1}] = E [{(V)}^{\vec{ı} + \vec{k}} {(h)}^{\vec{ı} - \vec{k}}] \neq 0

if

ı_{k} = 2 q_{k} + 1

; thus,

ı_{j} = 2 q_{j}

for any

j \in {1, \dots, d} \ {k}

, with

q_{k}, q_{j} \in N

, which implies that

\vec{ı} = \vec{k} + 2 \vec{q}

. Thus, one obtains

\frac{\partial f}{\partial x_{k}}

when

| | \vec{ı} {| |}_{1} = | | \vec{k} + 2 \vec{q} {| |}_{1} = 1

, and the fact that

E [V_{k}^{2}] = σ^{2}

;

E [V_{j}] = 0

; and

\sum_{ℓ} C_{ℓ} β_{ℓ} = 1

. At this point, by taking

k = 1, \dots, d

and setting

L = 1, β_{ℓ} = 1

and

C_{ℓ} = 1

, this results in the approximation of

\nabla f (x)

of order

O ({||h||}_{2}^{2})

, because when

| | \vec{ı} {| |}_{1} = 2

,

E [{(V)}^{\vec{ı}} {(h)}^{\vec{ı}} V_{k} h_{k}^{- 1}] = 0

.

Secondly, for

L > 1

the constraints

\sum_{ℓ = 1}^{L} C_{ℓ} β_{ℓ}^{r + 1} = δ_{0, r}

r = 0, 2, \dots, 2 (L - 1)

, we eliminate some higher-order terms so as to reach the order

O ({||h||}_{2}^{2 L})

. Using other constraints, we complete the proof, thus bearing in mind Equation (2).

Appendix B. Proof of Corollary 1

For

\vec{q} = (q_{1}, \dots, q_{d}) \in N^{d}

;

k \in {1, \dots, d}

and

\vec{k} = (0, \dots, 0, \underset{k th position}{\underset{︸}{1}}, 0 \dots, 0) \in R^{d}

, consider

s_{k} : = \{\vec{q} + \vec{k} : | | \vec{q} {| |}_{1} = 1\}

. As

f \in H_{2}

, we can write

f (x + β_{ℓ} h V) = \sum_{| | \vec{ı} {| |}_{1} = 0}^{1} D^{(\vec{ı})} f (x) β_{ℓ}^{| | \vec{ı} {| |}_{1}} \frac{{(h V)}^{\vec{ı}}}{\vec{ı}!} + \sum_{\begin{matrix} | | \vec{ı} {| |}_{1} = 2 \\ \vec{ı} \notin s_{k} \end{matrix}} D^{(\vec{ı})} f (x) \frac{β_{ℓ}^{2} {(h V)}^{\vec{ı}}}{\vec{ı}!} + R_{k} (h, β_{ℓ}, V),

with the remainder term

\begin{matrix} R_{k} (h, β_{ℓ}, V) & = & \sum_{\begin{matrix} | | \vec{ı} {| |}_{1} = 2 \\ \vec{ı} \in s_{k} \end{matrix}} D^{(\vec{ı})} f (x + β_{ℓ} h V) \frac{β_{ℓ}^{2} {(h V)}^{\vec{ı}}}{\vec{ı}!} = \sum_{\begin{matrix} | | \vec{q} {| |}_{1} = 1 \\ \vec{ı} \in s_{k} \end{matrix}} D^{(\vec{k} + \vec{q)}} f (x + β_{ℓ} h V) \frac{β_{ℓ}^{2} {(h V)}^{\vec{q} + \vec{k}}}{(\vec{k} + \vec{q})!} \\ = & h_{k} V_{k} β_{ℓ}^{2} \sum_{\begin{matrix} | | \vec{q} {| |}_{1} = 1 \end{matrix}} D^{(\vec{k} + \vec{q)}} f (x + β_{ℓ} h V) \frac{{(h V)}^{\vec{q}}}{(\vec{k} + \vec{q})!} . \end{matrix}

Denote

R_{k}^{0} : = \sum_{\begin{matrix} | | \vec{q} {| |}_{1} = 1 \end{matrix}} D^{(\vec{k} + \vec{q)}} f (x + β_{ℓ} h V) \frac{{(h V)}^{\vec{q}}}{(\vec{k} + \vec{q})!}

, and remark that

| R_{k}^{0} | \leq M_{2} {||h V||}_{1}

. Using Theorem 1, we can see that the absolute value of the bias, that is,

B : = {||g r a d (f) (x) - G^{- 1} (x) \sum_{ℓ = 1}^{L} C_{ℓ} E [f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{1}

is given by

\begin{matrix} B & = & {||G^{- 1} (x) E [\nabla f (x) - \sum_{ℓ = 1}^{L} C_{ℓ} f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{1} \\ = & {||G^{- 1} (x) \sum_{ℓ = 1}^{L} C_{ℓ} \frac{β_{ℓ}^{2}}{σ^{2}} E {[V_{1}^{2} R_{1}^{0}, \dots, V_{d}^{2} R_{d}^{0}]}^{T}||}_{1} \\ \leq & \sum_{ℓ = 1}^{L} |C_{ℓ}| \frac{β_{ℓ}^{2}}{σ^{2}} {||G^{- 1} (x) E {[V_{1}^{2} R_{1}^{0}, \dots, V_{d}^{2} R_{d}^{0}]}^{T}||}_{1} \\ \leq & \sum_{ℓ = 1}^{L} |C_{ℓ}| β_{ℓ}^{2} M_{2} {|||G^{- 1} (x)| E [\frac{V^{2}}{σ^{2}} {||h V||}_{1}]||}_{1}, \end{matrix}

using the expansion of the product between matrices.

Using the same reasoning and taking the Euclidean norm, we obtain

\begin{matrix} B_{2} & = & {||G^{- 1} (x) E [\nabla f (x) - \sum_{ℓ = 1}^{L} C_{ℓ} f (x + β_{ℓ} h V) \frac{V h^{- 1}}{σ^{2}}]||}_{2} \\ \leq & \sum_{ℓ = 1}^{L} |C_{ℓ}| β_{ℓ}^{2} M_{2} {|||G^{- 1} (x)| E [\frac{V^{2}}{σ^{2}} {||h V||}_{1}]||}_{2} . \end{matrix}

The results hold using

R : = V / σ

.

Appendix C. Proof of Theorem 2

Firstly, remark that

M S E : = E [{||\hat{g r a d (f)} (x) - g r a d (f) (x)||}_{2}^{2}]

is given by

M S E = E [{||\hat{g r a d (f)} (x) - E [\hat{g r a d (f)} (x)]||}_{2}^{2} + {||E [\hat{g r a d (f)} (x)] - g r a d (f) (x)||}_{2}^{2}] .

Since the bias

E [{||E [\hat{g r a d (f)} (x)] - g r a d (f) (x)||}_{2}^{2}]

has been derived in previous Corollaries, we are going to treat the second-order moment.

Secondly, since

f \in H_{2}

implies that

f \in H_{1}

, we have

|f (x + β_{ℓ} h V) - f (x)| \leq M_{1} {||β_{ℓ} h V||}_{2} .

Also, since

\sum_{ℓ = 1}^{L} C_{ℓ} = 0

, we then have

\sum_{ℓ = 1}^{L} C_{ℓ} f (x + β_{ℓ} h V) = \sum_{ℓ = 1}^{L} C_{ℓ} [f (x + β_{ℓ} h V) - f (x)],

which leads to

|\sum_{ℓ = 1}^{L} C_{ℓ}^{(| u |)} f (x + β_{ℓ} h V)| \leq \sum_{ℓ = 1}^{L} |C_{ℓ} β_{ℓ}| M_{1} {||h V||}_{2}

and

Q (x) : = G^{- 1} (x) \frac{V h^{- 1}}{σ^{2}} \sum_{ℓ = 1}^{L} C_{ℓ} f (x + β_{ℓ} h V) = G^{- 1} (x) \frac{V h^{- 1}}{σ^{2}} \sum_{ℓ = 1}^{L} C_{ℓ} [f (x + β_{ℓ} h V) - f (x)] .

(A1)

Thirdly, using (3), we can see that

E [Q (x)] = E [\hat{g r a d (f)} (x)]

. Bearing in mind the definition of the Euclidean norm and the variance, the centered second-order moment, that is,

V_{g r a d} : = E [{||\hat{g r a d (f)} (x) - E [\hat{g r a d (f)} (x)]||}_{2}^{2}]

is given by

\begin{matrix} V_{g r a d} & \leq & \frac{1}{N} E [{||G^{- 1} (x) \frac{V h^{- 1}}{σ^{2}} \sum_{ℓ = 1}^{L} C_{ℓ} f (x + β_{ℓ} h V) - E [\hat{g r a d (f)} (x)]||}_{2}^{2}] \\ \leq & \frac{1}{N} E [{||G^{- 1} (x) \frac{V h^{- 1}}{σ^{2}} \sum_{ℓ = 1}^{L} C_{ℓ} f (x + β_{ℓ} h V)||}_{2}^{2}] \\ \overset{(A 1)}{=} & \frac{1}{N} E [{||G^{- 1} (x) \frac{V h^{- 1}}{σ^{2}} \sum_{ℓ = 1}^{L} C_{ℓ} \{f (x + β_{ℓ} h V) - f (x)\}||}_{2}^{2}] \\ \leq & \frac{1}{N} E [{||G^{- 1} (x) \frac{V h^{- 1}}{σ^{2}}||}_{2}^{2} {||h V||}_{2}^{2}] M_{1}^{2} {(\sum_{ℓ = 1}^{L} |C_{ℓ} β_{ℓ}|)}^{2} \\ \leq & \frac{1}{N} E [{||G^{- 1} (x) \frac{V h^{- 1}}{σ^{2}}||}_{2}^{2} {||V^{2}||}_{2}] {||h^{2}||}_{2} M_{1}^{2} {(\sum_{ℓ = 1}^{L} |C_{ℓ} β_{ℓ}|)}^{2} \end{matrix}

when bearing in mind the Hölder inequality. The results hold using

R : = V / σ

and the fact that when

V_{k} \sim U (- ξ, ξ)

,

{||V^{2}||}_{2} \leq \sqrt{d} ξ^{2}

, and

σ^{2} = ξ^{2} / 3

.

References

Rosenblatt, M. Remarks on a Multivariate Transformation. Ann. Math. Statist. 1952, 23, 470–472. [Google Scholar] [CrossRef]
Nataf, A. Détermination des distributions dont les marges sont données. Comptes Rendus L’Académie Des Sci. 1962, 225, 42–43. [Google Scholar]
Joe, H. Dependence Modeling with Copulas; Chapman & Hall/CRC: London, UK, 2014. [Google Scholar]
McNeil, A.J.; Frey, R.; Embrechts, P. Quantitative Risk Management; Princeton University Press: Princeton, NJ, USA; Oxford, UK, 2015. [Google Scholar]
Navarro, J.; Ruiz, J.M.; Aguila, Y.D. Multivariate weighted distributions: A review and some extensions. Statistics 2006, 40, 51–64. [Google Scholar] [CrossRef]
Sklar, A. Fonctions de Rpartition à n Dimensions et Leurs Marges. Publ. l’Institut Stat. L’Université Paris 1959, 8, 229–231. [Google Scholar]
Durante, F.; Ignazzi, C.; Jaworski, P. On the class of truncation invariant bivariate copulas under constraints. J. Math. Anal. Appl. 2022, 509, 125898. [Google Scholar] [CrossRef]
Skorohod, A.V. On a representation of random variables. Theory Probab. Appl. 1976, 21, 645–648. [Google Scholar]
Lamboni, M.; Kucherenko, S. Multivariate sensitivity analysis and derivative-based global sensitivity measures with dependent variables. Reliab. Eng. Syst. Saf. 2021, 212, 107519. [Google Scholar] [CrossRef]
Lamboni, M. Efficient dependency models: Simulating dependent random variables. Math. Comput. Simul. 2022, 200, 199–217. [Google Scholar] [CrossRef]
Lamboni, M. On exact distribution for multivariate weighted distributions and classification. Methodol. Comput. Appl. Probab. 2023, 25, 41. [Google Scholar] [CrossRef]
Lamboni, M. Measuring inputs-outputs association for time-dependent hazard models under safety objectives using kernels. Int. J. Uncertain. Quantif. 2024, 1–17. [Google Scholar] [CrossRef]
Kucherenko, S.; Klymenko, O.; Shah, N. Sobol’ indices for problems defined in non-rectangular domains. Reliab. Eng. Syst. Saf. 2017, 167, 218–231. [Google Scholar] [CrossRef]
Lamboni, M. On dependency models and dependent generalized sensitivity indices. arXiv 2021, arXiv:2104.12938. [Google Scholar]
Lamboni, M. Derivative Formulas and Gradient of Functions with Non-Independent Variables. Axioms 2023, 12, 845. [Google Scholar] [CrossRef]
Nemirovsky, A.; Yudin, D. Problem Complexity and Method Efficiency in Optimization; Wiley & Sons: New York, NY, USA, 1983; p. 404. [Google Scholar]
Patelli, E.; Pradlwarter, H. Monte Carlo gradient estimation in high dimensions. Int. J. Numer. Methods Eng. 2010, 81, 172–188. [Google Scholar] [CrossRef]
Agarwal, A.; Dekel, O.; Xiao, L. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. In Proceedings of the The 23rd Conference on Learning Theory, COLT 2010, Haifa, Israel, 27–29 June 2010; pp. 28–40. [Google Scholar]
Bach, F.; Perchet, V. Highly-Smooth Zero-th Order Online Optimization. In Proceedings of the 29th Annual Conference on Learning Theory, New York, NY, USA, 23–26 June 2016; Volume 49, pp. 257–283. [Google Scholar]
Akhavan, A.; Pontil, M.; Tsybakov, A.B. Exploiting Higher Order Smoothness in Derivative-Free Optimization and Continuous Bandits. Red Hook, NY, USA, 2020. NIPS’20. Available online: https://arxiv.org/abs/2006.07862 (accessed on 18 June 2024).
Sobol, I.M.; Kucherenko, S. Derivative based global sensitivity measures and the link with global sensitivity indices. Math. Comput. Simul. 2009, 79, 3009–3017. [Google Scholar] [CrossRef]
Kucherenko, S.; Rodriguez-Fernandez, M.; Pantelides, C.; Shah, N. Monte Carlo evaluation of derivative-based global sensitivity measures. Reliab. Eng. Syst. Saf. 2009, 94, 1135–1148. [Google Scholar] [CrossRef]
Lamboni, M.; Iooss, B.; Popelin, A.L.; Gamboa, F. Derivative-based global sensitivity measures: General links with Sobol’ indices and numerical tests. Math. Comput. Simul. 2013, 87, 45–54. [Google Scholar] [CrossRef]
Roustant, O.; Fruth, J.; Iooss, B.; Kuhnt, S. Crossed-derivative based sensitivity measures for interaction screening. Math. Comput. Simul. 2014, 105, 105–118. [Google Scholar] [CrossRef]
Fruth, J.; Roustant, O.; Kuhnt, S. Total interaction index: A variance-based sensitivity index for second-order interaction screening. J. Stat. Plan. Inference 2014, 147, 212–223. [Google Scholar] [CrossRef]
Lamboni, M. Derivative-based generalized sensitivity indices and Sobol’ indices. Math. Comput. Simul. 2020, 170, 236–256. [Google Scholar] [CrossRef]
Lamboni, M. Derivative-based integral equalities and inequality: A proxy-measure for sensitivity analysis. Math. Comput. Simul. 2021, 179, 137–161. [Google Scholar] [CrossRef]
Lamboni, M. Weak derivative-based expansion of functions: ANOVA and some inequalities. Math. Comput. Simul. 2022, 194, 691–718. [Google Scholar] [CrossRef]
Bobkov, S. Isoperimetric and Analytic Inequalities for Log-Concave Probability Measures. Ann. Probab. 1999, 27, 1903–1921. [Google Scholar] [CrossRef]
Roustant, O.; Barthe, F.; Iooss, B. Poincaré inequalities on intervals-application to sensitivity analysis. Electron. J. Statist. 2017, 11, 3081–3119. [Google Scholar] [CrossRef]
Le Dimet, F.X.; Talagrand, O. Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. Tellus A Dyn. Meteorol. Oceanogr. 1986, 38, 97–110. [Google Scholar] [CrossRef]
Le Dimet, F.X.; Ngodock, H.E.; Luong, B.; Verron, J. Sensitivity analysis in variational data assimilation. J.-Meteorol. Soc. Jpn. 1997, 75, 245–255. [Google Scholar] [CrossRef]
Cacuci, D.G. Sensitivity and Uncertainty Analysis—Theory; Chapman & Hall/CRC: Boca Raton, FL, USA, 2005. [Google Scholar]
Gunzburger, M.D. Perspectives in Flow Control and Optimization; SIAM: Philadelphia, PA, USA, 2003. [Google Scholar]
Borzi, A.; Schulz, V. Computational Optimization of Systems Governed by Partial Differential Equations; SIAM: Philadelphia, PA, USA, 2012. [Google Scholar]
Ghanem, R.; Higdon, D.; Owhadi, H. Handbook of Uncertainty Quantification; Springer International Publishing: Berlin/Heidelberg, Germany, 2017. [Google Scholar]
Guidotti, E. Calculus: High-Dimensional Numerical and Symbolic Calculus in R. J. Stat. Softw. 2022, 104, 1–37. [Google Scholar] [CrossRef]
Ancell, B.; Hakim, G.J. Comparing Adjoint- and Ensemble-Sensitivity Analysis with Applications to Observation Targeting. Mon. Weather Rev. 2007, 135, 4117–4134. [Google Scholar] [CrossRef]
Pradlwarter, H. Relative importance of uncertain structural parameters. Part I: Algorithm. Comput. Mech. 2007, 40, 627–635. [Google Scholar] [CrossRef]
Polyak, B.; Tsybakov, A. Optimal accuracy orders of stochastic approximation algorithms. Probl. Peredachi Inf. 1990, 26, 45–53. [Google Scholar]
Zemanian, A. Distribution Theory and Transform Analysis: An Introduction to Generalized Functions, with Applications; Dover Books on Advanced Mathematics; Dover Publications: Mineola, NY, USA, 1987. [Google Scholar]
Strichartz, R. A Guide to Distribution Theory and Fourier Transforms; Studies in Advanced Mathematics; CRC Press: Boca, FL, USA, 1994. [Google Scholar]
Lamboni, M. Kernel-based Measures of Association Between Inputs and Outputs Using ANOVA. Sankhya A 2024. [Google Scholar] [CrossRef]
Gilbert, P.; Varadhan, R. R-Package numDeriv: Accurate Numerical Derivatives; CRAN Repository. 2019. Available online: http://optimizer.r-forge.r-project.org/ (accessed on 18 June 2024).
Soetaert, K.; Petzoldt, T.; Setzer, R.W. Solving Differential Equations in R: Package deSolve. J. Stat. Softw. 2010, 33, 1–25. [Google Scholar] [CrossRef]

Figure 1. Exact gradient versus estimated gradients using

L = 1

(∘) and

L = 2

(+) of the QoI by considering the inputs as independent (formal gradients).

Figure 1. Exact gradient versus estimated gradients using

L = 1

(∘) and

L = 2

(+) of the QoI by considering the inputs as independent (formal gradients).

Figure 2. Exact gradient versus estimated gradients using

L = 1

(∘) and

L = 2

(+) of the QoI by considering the autocorrelations among the inputs (dependent gradients).

Figure 2. Exact gradient versus estimated gradients using

L = 1

(∘) and

L = 2

(+) of the QoI by considering the autocorrelations among the inputs (dependent gradients).

Figure 3. Exact gradient versus estimated gradients using

L = 2

(∘) and

L = 3

(+) of the QoI by considering the autocorrelations anong the inputs (dependent gradients).

Figure 3. Exact gradient versus estimated gradients using

L = 2

(∘) and

L = 3

(+) of the QoI by considering the autocorrelations anong the inputs (dependent gradients).

Table 1. Values of

E r r

for three different approximations of the formal gradients.

Table 1. Values of

E r r

for three different approximations of the formal gradients.

	Number of Total Model Evaluations (i.e., $LN$ )
	100	150	200	200	1000	1000
Methods
FDM [44]	-	-	-	0.005	-	-
MC [17]	$0.042$	-	-	-	-	-
	$L = 1$	$L = 1$	$L = 1$	$L = 2$	$L = 1$	$L = 2$
This paper	0.035	0.014	0.009	0.009	0.0020	0.00199

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lamboni, M. Optimal and Efficient Approximations of Gradients of Functions with Nonindependent Variables. Axioms 2024, 13, 426. https://doi.org/10.3390/axioms13070426

AMA Style

Lamboni M. Optimal and Efficient Approximations of Gradients of Functions with Nonindependent Variables. Axioms. 2024; 13(7):426. https://doi.org/10.3390/axioms13070426

Chicago/Turabian Style

Lamboni, Matieyendou. 2024. "Optimal and Efficient Approximations of Gradients of Functions with Nonindependent Variables" Axioms 13, no. 7: 426. https://doi.org/10.3390/axioms13070426

APA Style

Lamboni, M. (2024). Optimal and Efficient Approximations of Gradients of Functions with Nonindependent Variables. Axioms, 13(7), 426. https://doi.org/10.3390/axioms13070426

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Optimal and Efficient Approximations of Gradients of Functions with Nonindependent Variables

Abstract

1. Introduction

2. Preliminaries

3. Main Results

3.1. Stochastic Expressions of the Gradients of Functions with Dependent Variables

3.2. Links to Other Works for Independent Input Variables

3.3. Computation of the Gradients of Functions with Dependent Variables

4. Computations of the Formal Gradient of Rosenbrock’s Function

5. Application to a Heat PDE Model with Stochastic Initial Conditions

5.1. Heat Diffusion Model and Its Formal Gradient

5.2. Spatial Autocorrelations of Initial Conditions and the Tensor Metric

5.3. Comparisons between Exact Gradient and Estimated Gradients

6. Conclusions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Proof of Theorem 1

Appendix B. Proof of Corollary 1

Appendix C. Proof of Theorem 2

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI