1. Introduction
We often work with models defined through functions that include non-independent variables, such as correlated input variables. This is also the case for models defined via functions with independent input variables together with equations or inequations connecting such inputs, and for functions subject to constraints involving the input variables and/or the model output. Given the key role of partial derivatives at a given point in (i) the mathematical analysis of functions and convergence, (ii) Poincaré inequalities ([1,2]) and equalities ([3,4]), (iii) optimization and active subspaces ([5,6]), (iv) implicit functions ([7,8]), and (v) differential geometry (see, e.g., [9,10,11]), it is interesting and relevant to have formulas that enable the calculation of partial derivatives of functions in the presence of non-independent variables, including the gradients. Of course, such formulas must account for the dependency structures among the model inputs, including the constraints imposed on such inputs.
Actual partial derivatives aim to calculate the partial derivatives of functions while taking into account the relationships between the input variables ([12]). For instance, let us consider the function given by under the constrained equation , where h is any smooth function. Using to represent the formal partial derivative of f with respect to x, which is the partial derivative of f when considering the other inputs as constant or independent, the chain rule yields the following partial derivative:
In probability and statistics, the point represents a realization or a sample value of the implicit random vector . The quantities , and do not have a definite meaning when the variables are non-independent. At this point, determining and is often challenging without supplementary assumptions, such as the choice of directions or paths. When or are independent and using the equation , we can write ([12]):
It is clear that the actual partial derivatives are not unique, as such derivatives rely on two different paths or assumptions. While each supplementary assumption can make sense in some cases, it cannot always be guaranteed by the constrained equation in general. Indeed, when all the initial input variables are dependent or correlated, the above partial derivatives are no longer valid, even for a linear function h, and it is worth finding the correct relationship between the input variables and the partial derivatives, including the gradient.
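To make this path dependence concrete, here is a minimal illustration in our own notation (an assumption for illustration, not a reproduction of the example above): take a smooth function f(x, y, z) and a single constraint h(x, y, z) = 0. Depending on which remaining variable is held fixed, the standard constrained chain rule already yields two different "actual" derivatives of f with respect to x:

```latex
% Two "actual" partial derivatives of f with respect to x under h(x, y, z) = 0,
% depending on which remaining variable is held fixed (the chosen path):
\left(\frac{\partial f}{\partial x}\right)_{y}
  = \frac{\partial f}{\partial x}
  - \frac{\partial f}{\partial z}\,
    \frac{\partial h/\partial x}{\partial h/\partial z},
\qquad
\left(\frac{\partial f}{\partial x}\right)_{z}
  = \frac{\partial f}{\partial x}
  - \frac{\partial f}{\partial y}\,
    \frac{\partial h/\partial x}{\partial h/\partial y}.
```

These two expressions generally differ, and neither choice of path is dictated by the constraint itself.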
In differential geometry (see, e.g., [9,10,11]), using the differential of the function f, that is, , the gradient of f is defined as the dual of with respect to a given tensor metric. Obviously, different tensor metrics will yield different gradients of the same function. While the Euclidean metric, given by , is appropriate for independent inputs, finding the appropriate metric is challenging in general. Indeed, the first fundamental form (in differential geometry) requires the Jacobian matrix of the inputs to define the associated tensor metric.
For non-independent variables with F as the joint cumulative distribution function (CDF), the bivariate dependency models ([13]) and multivariate dependency models ([3,14,15]), including the conditional and inverse Rosenblatt transformations ([16,17]), establish formal and analytical relationships among such variables using either CDFs, the corresponding copulas, or new distributions that look and behave like a copula ([18]). A dependency function characterizes the probabilistic dependency structures among these variables. For a d-dimensional random vector of non-independent variables, the dependency models express a subset of variables as a function of independent variables, consisting of the remaining inputs and new independent variables.
In this paper, we propose a new approach for calculating the partial derivatives of functions, considering the dependency structures among the input variables. Our approach relies on dependency models. By providing known relationships between the dependent inputs (including constraints imposed on the inputs or outputs), dependency models can be regarded as global and probabilistic implicit functions (see Section 2.2). Such dependency models are used to determine the dependent Jacobian and the tensor metric for non-independent variables. The contributions of this paper are threefold:
To provide a generalization of the actual partial derivatives of functions with non-independent variables and establish its limits;
To introduce the general derivative formulas of functions with non-independent variables (known as dependent partial derivatives) without any additional assumptions;
To provide the gradient, Hessian, and Taylor-type expansions of functions with non-independent variables that comply with the dependency structures among the input variables.
In Section 2, we first review the dependency models of dependent input variables, including correlated variables. Next, we derive interesting properties of these models regarding the calculus of partial derivatives and probabilistic implicit functions. By coupling dependency functions with the function of interest, we extend the actual partial derivatives of functions with non-independent variables in Section 3. To avoid their drawbacks, the dependent partial derivatives of functions with non-independent variables are provided in Section 4. The gradient and Hessian matrix of these functions are derived in Section 5 using the framework of differential geometry. We provide an application in Section 6 and conclude this work in Section 7.
General Notations
For an integer , let be a random vector of continuous variables with F as the joint cumulative distribution function (CDF) (i.e., ). For any , we use or for the marginal CDF of and for its inverse. Also, we use for an arbitrary permutation of and .
For a function f that includes as inputs, we use for the formal partial derivative of f with respect to , considering other inputs as constant or independent of , and . We use for the partial derivative of f with respect to , which takes the dependencies among inputs into account. We also use . Of course, for independent inputs.
2. Probabilistic Characterization of Functions with Non-Independent Variables
In probability theory, it is common to treat input variables as random vectors with associated CDFs. For instance, for inputs that take their values within a known domain or space, the Bayesian framework allows for assigning a joint distribution, known as a prior distribution, to such inputs. Without additional information about the inputs, it is common to use non-informative prior distributions, such as uniform distributions or Gaussian distributions with large variances (see, e.g., [19]).
Functions with non-independent variables include many types of models encountered in practice. An example is the models defined via a given function and equations or inequations connecting its inputs. The resulting inputs that comply with such constraints are often dependent or correlated. In the subsequent discussion, we use probability theory to characterize non-independent variables (see Definition 1).
Definition 1. Consider and a function that includes as inputs.
Then, f is said to be a function with non-independent variables whenever there exists at least one pair with , such that

Using for the multivariate normal distribution, we can check that a function that includes as inputs, with being a non-diagonal covariance matrix, is a member of the class of functions defined in Definition 1.
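As a quick numerical illustration of this pairwise non-independence condition (the covariance values below are arbitrary choices, not taken from the paper), the joint CDF of a bivariate Gaussian vector with a non-diagonal covariance matrix differs from the product of its marginal CDFs at some points:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Illustrative check: non-diagonal covariance => joint CDF differs from the
# product of the marginal CDFs, so the inputs are non-independent.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
point = np.array([0.0, 0.0])

joint = multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(point)
product = norm.cdf(point[0]) * norm.cdf(point[1])
print(joint, product)   # approximately 0.40 versus 0.25
```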
2.1. New Insight into Dependency Functions
In this section, we recall useful results about generic dependency models of non-independent variables (see [3,13,14,15,18]). For a d-dimensional random vector of non-independent variables (i.e., ), a dependency model of consists of expressing a subset of variables (i.e., ) as a function of independent variables, including .
Formally, if with , then there exists ([3,13,14,15,18]):
- (i)
New independent variables , which are independent of ;
- (ii)
A dependency function ,
such that
where , and means that the random variables A and B have the same CDF.
It is worth noting that the dependency model is not unique in general. Uniqueness can be obtained under additional conditions provided in Proposition 1, which enable the inversion of the dependency function .
Proposition 1. Consider a dependency model of the continuous random vector , given by , with a prescribed order .
If is the explanatory variable and the distribution of is prescribed, then:
- (i)
The dependency model is uniquely defined;
- (ii)
The dependency model is invertible, and the unique inverse is given by
It should be noted that the dependency models (DMs) are vector-valued functions of independent input variables. Thus, DMs facilitate the calculus of partial derivatives, such as the partial derivatives of with respect to . Moreover, the inverse of a DM avoids working with . A natural choice of the order is .
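As a concrete sketch of a DM and its inverse, consider a bivariate standard Gaussian pair with correlation rho (an assumption made purely for illustration; the function names below are ours). The DM with the first variable as the explanatory input, and the inverse guaranteed by Proposition 1, can be written as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.7                      # assumed correlation of the bivariate Gaussian pair

# Dependency model of (X1, X2) ~ N(0, [[1, rho], [rho, 1]]) with X1 as the
# explanatory variable: X2 = D(X1, Z), where Z ~ N(0, 1) is independent of X1.
def dependency(x1, z, rho=rho):
    return rho * x1 + np.sqrt(1.0 - rho**2) * z

# Inverse dependency model: recover the innovation Z from (X1, X2).
def inverse_dependency(x1, x2, rho=rho):
    return (x2 - rho * x1) / np.sqrt(1.0 - rho**2)

x1 = rng.standard_normal(100_000)
z = rng.standard_normal(100_000)
x2 = dependency(x1, z)

print(np.corrcoef(x1, x2)[0, 1])                    # close to rho
print(np.allclose(inverse_dependency(x1, x2), z))   # True: the DM is invertible
```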
2.2. Enhanced Implicit Functions: Dependency Functions
In this section, we provide a probabilistic version of the implicit function using DMs.
Consider , a sample value of given by , and a function with an integer. When connecting the input variables by p compatible equations, that is,

the well-known theorem of the implicit function (see Theorem 1 below) states that for each sample value satisfying , a subset of can be expressed as a function of the others in the neighborhood of . To recall this theorem, we use with , , , and (resp. ) for an open ball centered on (resp. ) with a radius of (resp. ). Again, (resp. ) is the formal Jacobian of h with respect to (resp. ).
Theorem 1 (implicit function). Assume that and is invertible. Then, there exists a function , such that

While Theorem 1 is useful, it turns out that the implicit function theorem provides only a local relationship among the variables. It should be noted that the DMs derived in Section 2.1 provide global relationships once the CDFs of the input variables are known. The distribution function of the variables that satisfy the constraints given by is needed to construct the global implicit function. To derive this distribution function, we assume that:
(A1) All the constraints are compatible.
Under (A1), the constraints introduce new dependency structures on the initial CDF F, which matters for our analysis. Probability theory ensures the existence of a distribution function that captures these dependencies.
Proposition 2. Let and be the constrained variables. If (A1) holds, we have where denotes equality in distribution.

Introducing constraints on the initial variables leads us to work with constrained variables that follow a new CDF, that is, . Some examples of generic and constrained variables and their corresponding distribution functions can be found in [14,15,20]. When analytical derivations of the CDF of are challenging or not possible, a common practice is to fit a distribution function to the observations of using numerical simulations (see [14,21,22,23,24,25] for examples of distribution and density estimations). By using the new distributions of the input variables, Corollary 1 provides the probabilistic version of the implicit function.
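The numerical route mentioned above can be sketched as follows: draw samples of the initial variables, keep those satisfying the constraint, and tabulate or fit a distribution for the constrained vector. The uniform inputs and the inequality constraint below are illustrative assumptions, not the paper's example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Initial independent inputs (illustrative choice): X1, X2 ~ U(0, 1).
x = rng.uniform(size=(n, 2))

# Keep only the realizations satisfying the inequality constraint X1 + X2 <= 1.
constrained = x[x.sum(axis=1) <= 1.0]

# Empirical CDF of the first constrained variable, evaluated on a grid.
grid = np.linspace(0.0, 1.0, 11)
ecdf = np.array([(constrained[:, 0] <= t).mean() for t in grid])
print(ecdf)   # approximates the exact constrained CDF 2t - t^2 on [0, 1]
```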
Definition 2. A distribution G is considered to be a degenerate CDF when G is the CDF of the Dirac measure with as the probability mass function, where .
Corollary 1 ([14,15]). Consider a random vector that follows as a CDF. Assume that (A1) holds and is a nondegenerate CDF. Then, there exists a function and independent variables , such that is independent of and

Proof. This result is the DM for the distribution (see Section 2.1). □
While Corollary 1 gives the explicit function that links to , we can sometimes extend this result as follows:

where is a vector of independent variables, is independent of , and (see Section 2.1 and [14]).
Remark 1. We can easily generalize the above process to handle (i) constrained inequations such as or (see Section 6 and [15]), and (ii) a mixture of constrained equations and inequalities involving different variables.

Remark 2. For a continuous random vector with C as the copula and , as the marginal distributions, an expression of its DM is given by ([3,14])

where is the conditional copula, is the inverse of , and and are real-valued functions.

2.3. Representation of Functions with Non-Independent Variables
In general, a function may include a group of independent variables, as well as groups of non-independent variables such as correlated variables and/or dependent variables. We can then organize these input variables as follows:
(O): the random vector consists of K independent random vector(s) given by , where the sets form a partition of . The random vector is independent of for every pair with . Without loss of generality, we use for a random vector of independent variable(s) and with for a random vector of dependent variables.
We use to denote the ordered permutation of (i.e., ). For any , we use to refer to an element of ; and . Keeping in mind the DMs (see Section 2.1), we can represent by

where ; is a random vector of independent variables; and is independent of . Based on the above DM of with , let us introduce new functions, that is, , given by and

The function maps the independent variables onto , and the chart leads to a new representation of functions with non-independent variables. Indeed, composing f with c yields

where .
The equivalent representation of f, given by (6), relies on the innovation variables . Recall that for the continuous random vector , the DM , given by (5), is always invertible (see Proposition 1), and, therefore, is also invertible. These inversions are helpful for working with only.
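To illustrate the representation in Equation (6), the short sketch below composes a hypothetical function f with a chart c built from the Gaussian DM used earlier; both the function and the DM are assumptions made for illustration only:

```python
import numpy as np

rho = 0.5   # assumed correlation (illustration only)

def chart(x1, z, rho=rho):
    """Map the independent pair (x1, z) to (x1, x2) through the dependency model."""
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * z
    return np.column_stack([x1, x2])

def f(x):
    """A hypothetical model with two non-independent inputs."""
    return x[:, 0] + x[:, 1] ** 2

rng = np.random.default_rng(2)
x1 = rng.standard_normal(5)
z = rng.standard_normal(5)

# Equivalent representation of f in terms of independent variables: (f o c)(x1, z).
print(f(chart(x1, z)))
```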
3. Actual Partial Derivatives
This section discusses the calculus of partial derivatives of functions with non-independent variables using only one relationship among the inputs, such as the DM given by Equation (5). The usual assumptions made are:
(A2) The joint (resp. marginal) CDF is continuous and has a density function on its open support;
(A3) Each component of the dependency function is differentiable with respect to ;
(A4) Each component of the function f, that is, with , is differentiable with respect to each input.
Without loss of generality and for the sake of simplicity, here we suppose that . Namely, we use for the identity matrix and for the null matrix. It is common to use for the formal partial derivatives of f with respect to each input of (i.e., the derivatives obtained by considering the inputs as independent) with . Thus, the formal gradient of f (i.e., the gradient with respect to the Euclidean metric) is given by

Keeping in mind the function , the partial derivatives of each component of with respect to are given by

We use for the element of . For instance, and represent the partial derivative of with respect to . It is worth recalling that is a vector-valued function of , and Lemma 1 expresses as a function of only.
Lemma 1. Let be a sample value of . If Assumptions (A2)–(A4) hold, the partial derivatives of with respect to evaluated at are given by

Again, with represents the component of , as provided in Lemma 1. Using these components and the chain rule, Theorem 2 provides the actual partial derivatives of functions with non-independent variables (i.e., ), which are the derivatives obtained by utilizing only one dependency function given by Equation (5).
Theorem 2. Let be a sample value of and with . If Assumptions (A2)–(A4) hold, then:
- (i)
The actual Jacobian matrix of is given by
- (ii)
The actual Jacobian matrix of c or is given by
- (iii)
The actual partial derivatives of f are given by
The results from Theorem 2 are based on only one dependency function, which uses as the explanatory input. Thus, the actual Jacobian and the actual partial derivatives of f provided in (10)–(12) will change with the choice of the explanatory input for every . All these possibilities are not surprising. Indeed, while no additional explicit assumption is necessary for calculating the partial derivatives of with respect to (i.e., ), we implicitly keep the other variables fixed when calculating the partial derivative of with respect to , that is, for each . Such an implicit assumption is due to the reciprocal rule used to derive the results (see Appendix C). In general, the components of , such as and , are both functions of and at the very least. Thus, the different possibilities for the actual Jacobians are based on different implicit assumptions, making it challenging to use the actual partial derivatives. Further drawbacks of the actual partial derivatives of f are illustrated in Example 1 below.

Example 1
We consider the function , which includes two correlated inputs . We see that . Using the DM of given by (see [3,14,15])

the actual Jacobian matrix of c and the actual partial derivatives of f are given by

When , both inputs are perfectly correlated, and we have , which also implies that . We can check that . However, when , both inputs are independent, and we should expect the actual partial derivatives to be equal to the formal gradient , but this is not the case. Moreover, using the second DM, which is given by

it becomes apparent that

which differs from the previous results. All these drawbacks are due to the implicit assumptions made (e.g., keeping some variables fixed), which can be avoided (see Section 4).
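The mechanism behind these drawbacks can be reproduced symbolically. The sketch below uses a bivariate Gaussian DM (an assumption for illustration) and applies the chain rule together with the reciprocal rule described above; it illustrates the path dependence of the actual derivatives and is not a reproduction of the exact expressions in (10)–(12):

```python
import sympy as sp

x1, z, rho = sp.symbols("x1 z rho", real=True, positive=True)
f1, f2 = sp.symbols("f_x1 f_x2")       # formal partial derivatives of f (symbols)

# DM with x1 as the explanatory input: x2 = D(x1, z), bivariate Gaussian assumption.
x2 = rho * x1 + sp.sqrt(1 - rho**2) * z

# Actual derivative w.r.t. x1: x2 moves with x1 through the DM (z held fixed).
actual_d1 = f1 + f2 * sp.diff(x2, x1)              # f_x1 + rho * f_x2

# Actual derivative w.r.t. x2 from the same DM via the reciprocal rule,
# i.e. dx1/dx2 = 1 / (dx2/dx1) with z held fixed.
actual_d2 = f2 + f1 / sp.diff(x2, x1)              # f_x2 + f_x1 / rho

print(sp.simplify(actual_d1), sp.simplify(actual_d2))
# As rho -> 0 (independent inputs), actual_d2 blows up instead of reducing to f_x2,
# and the second DM (with x2 explanatory) would yield yet other expressions.
```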
4. Dependent Jacobian and Partial Derivatives
This section aims to derive the first- and second-order partial derivatives of functions with non-independent variables without relying on any additional assumption, whether explicit or implicit. Essentially, we calculate or compute the partial derivatives of with respect to using only the dependency function that includes as an explanatory input, which can be expressed as follows:

By using the above dependency function, the partial derivatives of with respect to are given as follows (see (9)):

It should be noted that does not require any supplementary assumption, as and are independent. Thus, different DMs are necessary to derive the dependent Jacobian and partial derivatives of f (see Theorem 3).
Theorem 3. Let be a sample value of , and assume that (A2)–(A4) hold:
- (i)
For all , the dependent Jacobian matrix is given by
- (ii)
The dependent Jacobian matrix is given by
- (iii)
The partial derivatives of f are given by
Although the results from Theorem 3 require different DMs, these results are more comprehensive than the actual partial derivatives because no supplementary assumption is needed for any non-independent variable.
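In contrast with the sketch at the end of Section 3, the following hedged example computes derivatives for the same bivariate Gaussian pair by using one DM per explanatory input, so that no reciprocal rule is involved; the DMs and the resulting expressions are our illustration of the idea behind Theorem 3, not a restatement of its formulas:

```python
import sympy as sp

x1, x2, z, rho = sp.symbols("x1 x2 z rho", real=True)
f1, f2 = sp.symbols("f_x1 f_x2")     # formal partial derivatives of f (symbols)

# One DM per explanatory input (bivariate standard Gaussian assumption):
dm_for_x1 = rho * x1 + sp.sqrt(1 - rho**2) * z   # x2 expressed from x1 and z
dm_for_x2 = rho * x2 + sp.sqrt(1 - rho**2) * z   # x1 expressed from x2 and z

# Dependent partial derivatives: differentiate along the DM whose explanatory
# input is the variable of interest; z is independent of that input.
dep_d1 = f1 + f2 * sp.diff(dm_for_x1, x1)        # f_x1 + rho * f_x2
dep_d2 = f2 + f1 * sp.diff(dm_for_x2, x2)        # f_x2 + rho * f_x1

print(dep_d1.subs(rho, 0), dep_d2.subs(rho, 0))  # reduce to the formal gradient
```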
To derive the second-order partial derivatives of f, we use to denote the formal cross-partial derivative of f with respect to and , and to denote the formal or ordinary Hessian matrices of f restricted to for . In the same sense, we use to denote the formal cross-Hessian matrix of f restricted to for every pair with . To ensure the existence of the second-order partial derivatives, we assume that:
(A5) The function f is twice (formal) differentiable with respect to each input;
(A6) Every dependency function is twice differentiable with respect to .
By considering the DMs of (i.e., , with ) used to derive the dependent Jacobian, we can write for the second partial derivatives of with respect to . By using to represent a diagonal matrix with as its diagonal elements and for all , Theorem 4 provides the dependent second-order partial derivatives (i.e., ).
Theorem 4. Let be a sample value of . If (A2), (A5), and (A6) hold, then

Example 1 (Revisited)
Since and the DMs of are given by

we can check that

For instance, when , we have , and when , we have and . Thus, the dependent partial derivatives of f align with the formal gradient and Hessian matrix when the inputs are independent.
5. Expansion of Functions with Non-Independent Variables
Although Section 4 provided the partial derivatives and cross-partial derivatives of f, it is misleading to think that the infinitesimal increment of f, given by , should result only in the individual effect quantified by with and . Indeed, moving leads to partial movements of the other variables, and the effects we observe (i.e., ) can also be attributed to other variables. The dependency structures of these effects are described by the dependent Jacobian matrix (see Equation (13)). Therefore, the definition of the gradient and Hessian of f with non-independent variables requires the introduction of a tensor metric or a Riemannian tensor.
In differential geometry, the function of the form for every can be seen as a parametrization of a manifold in . The column entries of the dependent Jacobian matrix span a local -dimensional vector space, also known as the tangent space at , where is the rank of , indicating the number of linearly independent columns in .

By considering all the K groups of inputs and the corresponding dependent Jacobian matrix , we can see that the support of the random vector forms an m-dimensional manifold in , where m is the rank of . When , we work within the tangent space (or local coordinate system) spanned by the m column entries of that are linearly independent. Working in rather than ensures that the Riemannian tensor induced by using the dot product is invertible. Since the Riemannian tensor metric is symmetric, the Moore–Penrose generalized inverse of symmetric matrices ([26,27,28]) allows us to keep working in in the subsequent discussion. Using the first fundamental form (see, e.g., [9,10,11]), the induced tensor metric is defined as the inner product between the column entries of the dependent Jacobian matrix of the dependency functions.
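Written out under the assumption that J_r denotes the dependent Jacobian of the dependency functions (our notation, consistent with the first fundamental form but not a quotation of the original display), this inner-product construction takes the form:

```latex
% Induced tensor metric (first fundamental form): inner products of the columns
% of the dependent Jacobian J_r of the dependency functions.
G(\mathbf{x}) \;=\; J_r(\mathbf{x})^{\top} J_r(\mathbf{x}),
\qquad
G_{jk}(\mathbf{x}) \;=\;
\left\langle \frac{\partial r}{\partial u_j},\, \frac{\partial r}{\partial u_k} \right\rangle .
```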
Based on these elements, the gradient and Hessian matrix are provided in Corollary 2. To that end, we use to represent the inverse of the metric G, as given by Equation (17), when , or the generalized inverse of G for every ([26,27,28]). For any , the Christoffel symbols are defined by ([9,11,29,30])

where is the formal partial derivative of with respect to .
Corollary 2. Let be a sample value of , and assume that (A2) and (A5)–(A6) hold:
(i) The gradient of f is given by
(ii) The Hessian matrix of f is given by

Proof. Points (i)–(ii) result from the definition of the gradient and the Hessian matrix within a Riemannian geometric context equipped with the metric G (see [9,10,11,31]). □
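A minimal numerical sketch of this construction is given below, assuming the induced metric G = JᵀJ of Equation (17). The dependent Jacobian J and the formal gradient are hypothetical stand-ins, and the gradient formula used is the standard coordinate expression from Riemannian geometry rather than a quotation of Corollary 2:

```python
import numpy as np

# Hypothetical dependent Jacobian at a sample point (e.g. a Gaussian DM, rho = 0.7)
# and a hypothetical formal (Euclidean) gradient of f at the corresponding point.
J = np.array([[1.0, 0.0],
              [0.7, np.sqrt(1 - 0.7**2)]])
formal_grad = np.array([1.0, 1.0])

G = J.T @ J                               # induced tensor metric (first fundamental form)
G_inv = np.linalg.pinv(G)                 # Moore-Penrose inverse covers rank-deficient G

grad_coords = G_inv @ (J.T @ formal_grad) # gradient components in the local coordinates
grad_ambient = J @ grad_coords            # tangent vector pushed to the ambient space
print(grad_coords, grad_ambient)
```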
Taylor’s expansion is widely used to approximate functions with independent variables. In the subsequent discussion, we are concerned with the approximation of a function with non-independent variables. The Taylor-type expansion of a function with non-independent variables is provided in Corollary 3 using the gradient and Hessian matrix.
Corollary 3. Let , be two sample values of , and assume that (A2) and (A5)–(A6) hold. Then, we have provided that is close to .

Proof. The proof is straightforward using the dot product induced by the tensor metric G within the tangent space and considering the Taylor expansion provided in [11]. □
Example 1 (Revisited)
For the function in Example 1, we can check that the tensor metric is , and the gradient is

which reduces to when the variables are independent (i.e., ).
6. Application
In this section, we consider three independent input factors with , , a constant , and the function

Also, we consider the constraint . It is known in [15] (Corollary 4) that the DM of is given by

We can then write

Using the above derivatives and the symmetry among the inputs, the dependent Jacobian and the tensor metric are given by

The following partial derivatives of f can be deduced:
For given values of , , and , and as , we can see that , which is exactly the partial derivative of f when the inputs are independent. Note that as , the inputs become independent, as the constraint imposed on is always satisfied.
Keeping in mind Equation (6), it is worth noting that the partial derivatives of f can be directly derived by making use of an equivalent DM of , that is, , where , and are independent, with representing the beta distribution of the first kind (see [15], Corollary 2). Indeed, we have and because

As a matter of fact, we obtain the same partial derivatives of f, keeping in mind the symmetry among the inputs.
7. Conclusions
A new approach for calculating the partial derivatives, gradient, and Hessian of functions with non-independent variables is proposed and studied in this paper. It relies on (i) dependency functions that model the dependency structures among dependent variables, including correlated variables; (ii) the dependent Jacobian of the dependency functions; and (iii) the tensor metric induced by the dependent Jacobian. Based on the unique tensor metric given by the first fundamental form, the unique gradient of a function with non-independent variables is provided. Since the so-called dependent partial derivatives and the dependent Jacobian do not require any additional assumption (unlike the actual partial derivatives), such derivatives (including the gradient) should be used.
The results obtained depend on the parameters of the distribution function or the density function of non-independent variables. For the values of such parameters that lead to independent variables, the proposed gradient and partial derivatives reduce to the formal gradient or the gradient with respect to the Euclidean metric. In the same sense, the proposed tensor metric reduces to the Euclidean metric using the above values of the parameters of the distribution function.
Using the proposed gradient and Hessian matrix, the Taylor-type expansion of a function with non-independent variables is provided. Although the generalized inverse of a symmetric matrix is used in some cases, more investigation is needed for the gradient calculus when the tensor metric is not invertible. The proposed gradient will be used for (i) the development of the active subspaces of functions with non-independent variables, and (ii) enhancing the optimization of functions subject to constraints.