1. Introduction
We often work with models defined through functions that include non-independent variables, such as correlated input variables. This is also the case for models defined via functions with independent input variables together with equations or inequations connecting such inputs, and for functions subject to constraints involving the input variables and/or the model output. Given the key role of partial derivatives at a given point in (i) the mathematical analysis of functions and convergence, (ii) Poincaré inequalities ([1,2]) and equalities ([3,4]), (iii) optimization and active subspaces ([5,6]), (iv) implicit functions ([7,8]), and (v) differential geometry (see, e.g., [9,10,11]), it is interesting and relevant to have formulas that enable the calculation of partial derivatives of functions in the presence of non-independent variables, including the gradients. Of course, such formulas must account for the dependency structures among the model inputs, including the constraints imposed on such inputs.
Actual partial derivatives aim to calculate the partial derivatives of functions while taking into account the relationships between the input variables ([12]). For instance, let us consider the function given by under the constrained equation , where h is any smooth function. Using to represent the formal partial derivative of f with respect to x, which is the partial derivative of f when considering the other inputs as constant or independent, the chain rule yields the following partial derivative:
In probability and statistics, the point represents a realization or a sample value of the implicit random vector . The quantities , and do not have a definite meaning when the variables are non-independent. At this point, determining and is often challenging without supplementary assumptions, such as the choice of directions or paths. When or are independent and using the equation , we can write ([12]):
It is clear that the actual partial derivatives are not unique, as such derivatives rely on two different paths or assumptions. While each supplementary assumption can make sense in some cases, it cannot always be guaranteed by the constrained equation in general. Indeed, when all the initial input variables are dependent or correlated, the above partial derivatives are no longer valid, even for a linear function h, and it is worth finding the correct relationship between the input variables and the partial derivatives, including the gradient.
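To make this path dependence concrete, here is a minimal illustration in our own notation (an assumption for illustration, not a reproduction of the example above): take a smooth function f(x, y, z) and a single constraint h(x, y, z) = 0. Depending on which remaining variable is held fixed, the standard constrained chain rule already yields two different "actual" derivatives of f with respect to x:

```latex
% Two "actual" partial derivatives of f with respect to x under h(x, y, z) = 0,
% depending on which remaining variable is held fixed (the chosen path):
\left(\frac{\partial f}{\partial x}\right)_{y}
  = \frac{\partial f}{\partial x}
  - \frac{\partial f}{\partial z}\,
    \frac{\partial h/\partial x}{\partial h/\partial z},
\qquad
\left(\frac{\partial f}{\partial x}\right)_{z}
  = \frac{\partial f}{\partial x}
  - \frac{\partial f}{\partial y}\,
    \frac{\partial h/\partial x}{\partial h/\partial y}.
```

These two expressions generally differ, and neither choice of path is dictated by the constraint itself.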
In differential geometry (see, e.g., [9,10,11]), using the differential of the function f, that is, , the gradient of f is defined as the dual of with respect to a given tensor metric. Obviously, different tensor metrics will yield different gradients of the same function. While the Euclidean metric, given by , is appropriate for independent inputs, finding the appropriate metric is challenging in general. Indeed, the first fundamental form (in differential geometry) requires the Jacobian matrix of the inputs to define the associated tensor metric.
For non-independent variables with F as the joint cumulative distribution function (CDF), the bivariate dependency models ([13]) and multivariate dependency models ([3,14,15]), including the conditional and inverse Rosenblatt transformations ([16,17]), establish formal and analytical relationships among such variables using either CDFs, the corresponding copulas, or new distributions that look and behave like a copula ([18]). A dependency function characterizes the probabilistic dependency structures among these variables. For a d-dimensional random vector of non-independent variables, the dependency models express a subset of variables as a function of independent variables, consisting of the remaining inputs and new independent variables.
In this paper, we propose a new approach for calculating the partial derivatives of functions, considering the dependency structures among the input variables. Our approach relies on dependency models. By providing known relationships between the dependent inputs (including constraints imposed on the inputs or outputs), dependency models can be regarded as global and probabilistic implicit functions (see Section 2.2). Such dependency models are used to determine the dependent Jacobian and the tensor metric for non-independent variables. The contributions of this paper are threefold:
To provide a generalization of the actual partial derivatives of functions with non-independent variables and establish its limits;
To introduce the general derivative formulas of functions with non-independent variables (known as dependent partial derivatives) without any additional assumptions;
To provide the gradient, Hessian, and Taylor-type expansions of functions with non-independent variables that comply with the dependency structures among the input variables.
In Section 2, we first review the dependency models of dependent input variables, including correlated variables. Next, we derive interesting properties of these models regarding the calculus of partial derivatives and probabilistic implicit functions. By coupling dependency functions with the function of interest, we extend the actual partial derivatives of functions with non-independent variables in Section 3. To avoid their drawbacks, the dependent partial derivatives of functions with non-independent variables are provided in Section 4. The gradient and Hessian matrix of these functions are derived in Section 5 using the framework of differential geometry. We provide an application in Section 6 and conclude this work in Section 7.
General Notations
For an integer , let be a random vector of continuous variables with F as the joint cumulative distribution function (CDF) (i.e., ). For any , we use or for the marginal CDF of and for its inverse. Also, we use for an arbitrary permutation of and .
For a function f that includes as inputs, we use for the formal partial derivative of f with respect to , considering other inputs as constant or independent of , and . We use for the partial derivative of f with respect to , which takes the dependencies among inputs into account. We also use . Of course, for independent inputs.
2. Probabilistic Characterization of Functions with Non-Independent Variables
In probability theory, it is common to treat input variables as random vectors with associated CDFs. For instance, for inputs that take their values within a known domain or space, the Bayesian framework allows for assigning a joint distribution, known as a prior distribution, to such inputs. Without additional information about the inputs, it is common to use non-informative prior distributions, such as uniform distributions or Gaussian distributions with large variances (see, e.g., [19]).
Functions with non-independent variables include many types of models encountered in practice. An example is the models defined via a given function and equations or inequations connecting its inputs. The resulting inputs that comply with such constraints are often dependent or correlated. In the subsequent discussion, we use probability theory to characterize non-independent variables (see Definition 1).
Definition 1. Consider and a function that includes as inputs.
Then, f is said to be a function with non-independent variables whenever there exists at least one pair with , such that

Using for the multivariate normal distribution, we can check that a function that includes as inputs, with being a non-diagonal covariance matrix, is a member of the class of functions defined in Definition 1.
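As a quick numerical illustration of this pairwise non-independence condition (the covariance values below are arbitrary choices, not taken from the paper), the joint CDF of a bivariate Gaussian vector with a non-diagonal covariance matrix differs from the product of its marginal CDFs at some points:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Illustrative check: non-diagonal covariance => joint CDF differs from the
# product of the marginal CDFs, so the inputs are non-independent.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
point = np.array([0.0, 0.0])

joint = multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(point)
product = norm.cdf(point[0]) * norm.cdf(point[1])
print(joint, product)   # approximately 0.40 versus 0.25
```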
2.1. New Insight into Dependency Functions
In this section, we recall useful results about generic dependency models of non-independent variables (see [3,13,14,15,18]). For a d-dimensional random vector of non-independent variables (i.e., ), a dependency model of consists of expressing a subset of variables (i.e., ) as a function of independent variables, including .
Formally, if with , then there exists ([3,13,14,15,18]):
- (i)
New independent variables , which are independent of ;
- (ii)
A dependency function ,
such that
where , and means that the random variables A and B have the same CDF.
It is worth noting that the dependency model is not unique in general. Uniqueness can be obtained under additional conditions provided in Proposition 1, which enable the inversion of the dependency function .
Proposition 1. Consider a dependency model of the continuous random vector , given by , with a prescribed order .
If is the explanatory variable and the distribution of is prescribed, then:
- (i)
The dependency model is uniquely defined;
- (ii)
The dependency model is invertible, and the unique inverse is given by
It should be noted that the dependency models (DMs) are vector-valued functions of independent input variables. Thus, DMs facilitate the calculus of partial derivatives, such as the partial derivatives of with respect to . Moreover, the inverse of a DM avoids working with . A natural choice of the order is .
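As a concrete sketch of a DM and its inverse, consider a bivariate standard Gaussian pair with correlation rho (an assumption made purely for illustration; the function names below are ours). The DM with the first variable as the explanatory input, and the inverse guaranteed by Proposition 1, can be written as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.7                      # assumed correlation of the bivariate Gaussian pair

# Dependency model of (X1, X2) ~ N(0, [[1, rho], [rho, 1]]) with X1 as the
# explanatory variable: X2 = D(X1, Z), where Z ~ N(0, 1) is independent of X1.
def dependency(x1, z, rho=rho):
    return rho * x1 + np.sqrt(1.0 - rho**2) * z

# Inverse dependency model: recover the innovation Z from (X1, X2).
def inverse_dependency(x1, x2, rho=rho):
    return (x2 - rho * x1) / np.sqrt(1.0 - rho**2)

x1 = rng.standard_normal(100_000)
z = rng.standard_normal(100_000)
x2 = dependency(x1, z)

print(np.corrcoef(x1, x2)[0, 1])                    # close to rho
print(np.allclose(inverse_dependency(x1, x2), z))   # True: the DM is invertible
```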
2.2. Enhanced Implicit Functions: Dependency Functions
In this section, we provide a probabilistic version of the implicit function using DMs.
Consider , a sample value of given by , and a function with an integer. When connecting the input variables by p compatible equations, that is,

the well-known theorem of the implicit function (see Theorem 1 below) states that for each sample value satisfying , a subset of can be expressed as a function of the others in the neighborhood of . To recall this theorem, we use with , , , and (resp. ) for an open ball centered on (resp. ) with a radius of (resp. ). Again, (resp. ) is the formal Jacobian of h with respect to (resp. ).
Theorem 1 (implicit function). Assume that and is invertible. Then, there exists a function , such that

While Theorem 1 is useful, it turns out that the implicit function theorem provides only a local relationship among the variables. It should be noted that the DMs derived in Section 2.1 provide global relationships once the CDFs of the input variables are known. The distribution function of the variables that satisfy the constraints given by is needed to construct the global implicit function. To derive this distribution function, we assume that:
(A1) All the constraints are compatible.
Under (A1), the constraints introduce new dependency structures on the initial CDF F, which matters for our analysis. Probability theory ensures the existence of a distribution function that captures these dependencies.
Proposition 2. Let and be the constrained variables. If (A1) holds, we have where denotes equality in distribution.

Introducing constraints on the initial variables leads us to work with constrained variables that follow a new CDF, that is, . Some examples of generic and constrained variables and their corresponding distribution functions can be found in [14,15,20]. When analytical derivations of the CDF of are challenging or not possible, a common practice is to fit a distribution function to the observations of using numerical simulations (see [14,21,22,23,24,25] for examples of distribution and density estimations). By using the new distributions of the input variables, Corollary 1 provides the probabilistic version of the implicit function.
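The numerical route mentioned above can be sketched as follows: draw samples of the initial variables, keep those satisfying the constraint, and tabulate or fit a distribution for the constrained vector. The uniform inputs and the inequality constraint below are illustrative assumptions, not the paper's example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Initial independent inputs (illustrative choice): X1, X2 ~ U(0, 1).
x = rng.uniform(size=(n, 2))

# Keep only the realizations satisfying the inequality constraint X1 + X2 <= 1.
constrained = x[x.sum(axis=1) <= 1.0]

# Empirical CDF of the first constrained variable, evaluated on a grid.
grid = np.linspace(0.0, 1.0, 11)
ecdf = np.array([(constrained[:, 0] <= t).mean() for t in grid])
print(ecdf)   # approximates the exact constrained CDF 2t - t^2 on [0, 1]
```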
Definition 2. A distribution G is considered to be a degenerate CDF when G is the CDF of the Dirac measure with as the probability mass function, where .
Corollary 1 ([14,15]). Consider a random vector that follows as a CDF. Assume that (A1) holds and is a nondegenerate CDF. Then, there exists a function and independent variables , such that is independent of and

Proof. This result is the DM for the distribution (see Section 2.1). □
While Corollary 1 gives the explicit function that links to , we can sometimes extend this result as follows:

where is a vector of independent variables, is independent of , and (see Section 2.1 and [14]).
Remark 1. We can easily generalize the above process to handle (i) constrained inequations such as or (see Section 6 and [15]), and (ii) a mixture of constrained equations and inequalities involving different variables.

Remark 2. For a continuous random vector with C as the copula and , as the marginal distributions, an expression of its DM is given by ([3,14])

where is the conditional copula, is the inverse of , and and are real-valued functions.

2.3. Representation of Functions with Non-Independent Variables
In general, a function may include a group of independent variables, as well as groups of non-independent variables such as correlated variables and/or dependent variables. We can then organize these input variables as follows:
(O): the random vector consists of K independent random vector(s) given by , where the sets form a partition of . The random vector is independent of for every pair with . Without loss of generality, we use for a random vector of independent variable(s) and with for a random vector of dependent variables.
We use to denote the ordered permutation of (i.e., ). For any , we use to refer to an element of ; and . Keeping in mind the DMs (see Section 2.1), we can represent by

where ; is a random vector of independent variables; and is independent of . Based on the above DM of with , let us introduce new functions, that is, , given by and

The function maps the independent variables onto , and the chart leads to a new representation of functions with non-independent variables. Indeed, composing f with c yields

where .
The equivalent representation of f, given by (6), relies on the innovation variables . Recall that for the continuous random vector , the DM , given by (5), is always invertible (see Proposition 1), and, therefore, is also invertible. These inversions are helpful for working with only.
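To illustrate the representation in Equation (6), the short sketch below composes a hypothetical function f with a chart c built from the Gaussian DM used earlier; both the function and the DM are assumptions made for illustration only:

```python
import numpy as np

rho = 0.5   # assumed correlation (illustration only)

def chart(x1, z, rho=rho):
    """Map the independent pair (x1, z) to (x1, x2) through the dependency model."""
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * z
    return np.column_stack([x1, x2])

def f(x):
    """A hypothetical model with two non-independent inputs."""
    return x[:, 0] + x[:, 1] ** 2

rng = np.random.default_rng(2)
x1 = rng.standard_normal(5)
z = rng.standard_normal(5)

# Equivalent representation of f in terms of independent variables: (f o c)(x1, z).
print(f(chart(x1, z)))
```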
3. Actual Partial Derivatives
This section discusses the calculus of partial derivatives of functions with non-independent variables using only one relationship among the inputs, such as the DM given by Equation (5). The usual assumptions made are:
(A2) The joint (resp. marginal) CDF is continuous and has a density function on its open support;
(A3) Each component of the dependency function is differentiable with respect to ;
(A4) Each component of the function f, that is, with , is differentiable with respect to each input.
Without loss of generality and for the sake of simplicity, here we suppose that . Namely, we use for the identity matrix and for the null matrix. It is common to use for the formal partial derivatives of f with respect to each input of (i.e., the derivatives obtained by considering the inputs as independent) with . Thus, the formal gradient of f (i.e., the gradient with respect to the Euclidean metric) is given by

Keeping in mind the function , the partial derivatives of each component of with respect to are given by

We use for the element of . For instance, and represent the partial derivative of with respect to . It is worth recalling that is a vector-valued function of , and Lemma 1 expresses as a function of only.
Lemma 1. Let be a sample value of . If Assumptions (A2)–(A4) hold, the partial derivatives of with respect to evaluated at are given by

Again, with represents the component of , as provided in Lemma 1. Using these components and the chain rule, Theorem 2 provides the actual partial derivatives of functions with non-independent variables (i.e., ), which are the derivatives obtained by utilizing only one dependency function given by Equation (5).
Theorem 2. Let be a sample value of and with . If Assumptions (A2)–(A4) hold, then:
- (i)
The actual Jacobian matrix of is given by
- (ii)
The actual Jacobian matrix of c or is given by
- (iii)
The actual partial derivatives of f are given by
The results from Theorem 2 are based on only one dependency function, which uses as the explanatory input. Thus, the actual Jacobian and the actual partial derivatives of f provided in (10)–(12) will change with the choice of the explanatory input for every . All these possibilities are not surprising. Indeed, while no additional explicit assumption is necessary for calculating the partial derivatives of with respect to (i.e., ), we implicitly keep the other variables fixed when calculating the partial derivative of with respect to , that is, for each . Such an implicit assumption is due to the reciprocal rule used to derive the results (see Appendix C). In general, the components of , such as and , are both functions of and at the very least. Thus, the different possibilities for the actual Jacobians are based on different implicit assumptions, making it challenging to use the actual partial derivatives. Further drawbacks of the actual partial derivatives of f are illustrated in Example 1 below.

Example 1
We consider the function , which includes two correlated inputs . We see that . Using the DM of given by (see [3,14,15])

the actual Jacobian matrix of c and the actual partial derivatives of f are given by

When , both inputs are perfectly correlated, and we have , which also implies that . We can check that . However, when , both inputs are independent, and we should expect the actual partial derivatives to be equal to the formal gradient , but this is not the case. Moreover, using the second DM, which is given by

it becomes apparent that

which differs from the previous results. All these drawbacks are due to the implicit assumptions made (e.g., keeping some variables fixed), which can be avoided (see Section 4).
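The mechanism behind these drawbacks can be reproduced symbolically. The sketch below uses a bivariate Gaussian DM (an assumption for illustration) and applies the chain rule together with the reciprocal rule described above; it illustrates the path dependence of the actual derivatives and is not a reproduction of the exact expressions in (10)–(12):

```python
import sympy as sp

x1, z, rho = sp.symbols("x1 z rho", real=True, positive=True)
f1, f2 = sp.symbols("f_x1 f_x2")       # formal partial derivatives of f (symbols)

# DM with x1 as the explanatory input: x2 = D(x1, z), bivariate Gaussian assumption.
x2 = rho * x1 + sp.sqrt(1 - rho**2) * z

# Actual derivative w.r.t. x1: x2 moves with x1 through the DM (z held fixed).
actual_d1 = f1 + f2 * sp.diff(x2, x1)              # f_x1 + rho * f_x2

# Actual derivative w.r.t. x2 from the same DM via the reciprocal rule,
# i.e. dx1/dx2 = 1 / (dx2/dx1) with z held fixed.
actual_d2 = f2 + f1 / sp.diff(x2, x1)              # f_x2 + f_x1 / rho

print(sp.simplify(actual_d1), sp.simplify(actual_d2))
# As rho -> 0 (independent inputs), actual_d2 blows up instead of reducing to f_x2,
# and the second DM (with x2 explanatory) would yield yet other expressions.
```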
4. Dependent Jacobian and Partial Derivatives
This section aims to derive the first- and second-order partial derivatives of functions with non-independent variables without relying on any additional assumption, whether explicit or implicit. Essentially, we calculate or compute the partial derivatives of with respect to using only the dependency function that includes as an explanatory input, which can be expressed as follows:

By using the above dependency function, the partial derivatives of with respect to are given as follows (see (9)):

It should be noted that does not require any supplementary assumption, as and are independent. Thus, different DMs are necessary to derive the dependent Jacobian and partial derivatives of f (see Theorem 3).
Theorem 3. Let be a sample value of , and assume that (A2)–(A4) hold:
- (i)
For all , the dependent Jacobian matrix is given by
- (ii)
The dependent Jacobian matrix is given by
- (iii)
The partial derivatives of f are given by
Although the results from Theorem 3 require different DMs, these results are more comprehensive than the actual partial derivatives because no supplementary assumption is needed for any non-independent variable.
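In contrast with the sketch at the end of Section 3, the following hedged example computes derivatives for the same bivariate Gaussian pair by using one DM per explanatory input, so that no reciprocal rule is involved; the DMs and the resulting expressions are our illustration of the idea behind Theorem 3, not a restatement of its formulas:

```python
import sympy as sp

x1, x2, z, rho = sp.symbols("x1 x2 z rho", real=True)
f1, f2 = sp.symbols("f_x1 f_x2")     # formal partial derivatives of f (symbols)

# One DM per explanatory input (bivariate standard Gaussian assumption):
dm_for_x1 = rho * x1 + sp.sqrt(1 - rho**2) * z   # x2 expressed from x1 and z
dm_for_x2 = rho * x2 + sp.sqrt(1 - rho**2) * z   # x1 expressed from x2 and z

# Dependent partial derivatives: differentiate along the DM whose explanatory
# input is the variable of interest; z is independent of that input.
dep_d1 = f1 + f2 * sp.diff(dm_for_x1, x1)        # f_x1 + rho * f_x2
dep_d2 = f2 + f1 * sp.diff(dm_for_x2, x2)        # f_x2 + rho * f_x1

print(dep_d1.subs(rho, 0), dep_d2.subs(rho, 0))  # reduce to the formal gradient
```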
To derive the second-order partial derivatives of f, we use to denote the formal cross-partial derivative of f with respect to and , and to denote the formal or ordinary Hessian matrices of f restricted to for . In the same sense, we use to denote the formal cross-Hessian matrix of f restricted to for every pair with . To ensure the existence of the second-order partial derivatives, we assume that:
(A5) The function f is twice (formal) differentiable with respect to each input;
(A6) Every dependency function is twice differentiable with respect to .
By considering the DMs of (i.e., , with ) used to derive the dependent Jacobian, we can write for the second partial derivatives of with respect to . By using to represent a diagonal matrix with as its diagonal elements and for all , Theorem 4 provides the dependent second-order partial derivatives (i.e., ).
Theorem 4. Let be a sample value of . If (A2), (A5), and (A6) hold, then

Example 1 (Revisited)
Since and the DMs of are given by

we can check that

For instance, when , we have , and when , we have and . Thus, the dependent partial derivatives of f align with the formal gradient and Hessian matrix when the inputs are independent.
5. Expansion of Functions with Non-Independent Variables
Although Section 4 provided the partial derivatives and cross-partial derivatives of f, it is misleading to think that the infinitesimal increment of f, given by , should result only in the individual effect quantified by with and . Indeed, moving leads to partial movements of the other variables, and the effects we observe (i.e., ) can also be attributed to other variables. The dependency structures of these effects are described by the dependent Jacobian matrix (see Equation (13)). Therefore, the definition of the gradient and Hessian of f with non-independent variables requires the introduction of a tensor metric or a Riemannian tensor.
In differential geometry, the function of the form for every can be seen as a parametrization of a manifold in . The column entries of the dependent Jacobian matrix span a local -dimensional vector space, also known as the tangent space at , where is the rank of , indicating the number of linearly independent columns in .

By considering all the K groups of inputs and the corresponding dependent Jacobian matrix , we can see that the support of the random vector forms an m-dimensional manifold in , where m is the rank of . When , we work within the tangent space (or local coordinate system) spanned by the m column entries of that are linearly independent. Working in rather than ensures that the Riemannian tensor induced by using the dot product is invertible. Since the Riemannian tensor metric is symmetric, the Moore–Penrose generalized inverse of symmetric matrices ([26,27,28]) allows us to keep working in in the subsequent discussion. Using the first fundamental form (see, e.g., [9,10,11]), the induced tensor metric is defined as the inner product between the column entries of the dependent Jacobian matrix of the dependency functions.
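Written out under the assumption that J_r denotes the dependent Jacobian of the dependency functions (our notation, consistent with the first fundamental form but not a quotation of the original display), this inner-product construction takes the form:

```latex
% Induced tensor metric (first fundamental form): inner products of the columns
% of the dependent Jacobian J_r of the dependency functions.
G(\mathbf{x}) \;=\; J_r(\mathbf{x})^{\top} J_r(\mathbf{x}),
\qquad
G_{jk}(\mathbf{x}) \;=\;
\left\langle \frac{\partial r}{\partial u_j},\, \frac{\partial r}{\partial u_k} \right\rangle .
```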
Based on these elements, the gradient and Hessian matrix are provided in Corollary 2. To that end, we use to represent the inverse of the metric G, as given by Equation (17), when , or the generalized inverse of G for every ([26,27,28]). For any , the Christoffel symbols are defined by ([9,11,29,30])

where is the formal partial derivative of with respect to .
Corollary 2. Let be a sample value of , and assume that (A2) and (A5)–(A6) hold:
(i) The gradient of f is given by
(ii) The Hessian matrix of f is given by

Proof. Points (i)–(ii) result from the definition of the gradient and the Hessian matrix within a Riemannian geometric context equipped with the metric G (see [9,10,11,31]). □
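A minimal numerical sketch of this construction is given below, assuming the induced metric G = JᵀJ of Equation (17). The dependent Jacobian J and the formal gradient are hypothetical stand-ins, and the gradient formula used is the standard coordinate expression from Riemannian geometry rather than a quotation of Corollary 2:

```python
import numpy as np

# Hypothetical dependent Jacobian at a sample point (e.g. a Gaussian DM, rho = 0.7)
# and a hypothetical formal (Euclidean) gradient of f at the corresponding point.
J = np.array([[1.0, 0.0],
              [0.7, np.sqrt(1 - 0.7**2)]])
formal_grad = np.array([1.0, 1.0])

G = J.T @ J                               # induced tensor metric (first fundamental form)
G_inv = np.linalg.pinv(G)                 # Moore-Penrose inverse covers rank-deficient G

grad_coords = G_inv @ (J.T @ formal_grad) # gradient components in the local coordinates
grad_ambient = J @ grad_coords            # tangent vector pushed to the ambient space
print(grad_coords, grad_ambient)
```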
Taylor’s expansion is widely used to approximate functions with independent variables. In the subsequent discussion, we are concerned with the approximation of a function with non-independent variables. The Taylor-type expansion of a function with non-independent variables is provided in Corollary 3 using the gradient and Hessian matrix.
Corollary 3. Let , be two sample values of , and assume that (A2) and (A5)–(A6) hold. Then, we have provided that is close to .

Proof. The proof is straightforward using the dot product induced by the tensor metric G within the tangent space and considering the Taylor expansion provided in [11]. □
Example 1 (Revisited)
For the function in Example 1, we can check that the tensor metric is , and the gradient is

which reduces to when the variables are independent (i.e., ).
6. Application
In this section, we consider three independent input factors with , , a constant , and the function

Also, we consider the constraint . It is known in [15] (Corollary 4) that the DM of is given by

We can then write

Using the above derivatives and the symmetry among the inputs, the dependent Jacobian and the tensor metric are given by

The following partial derivatives of f can be deduced:
For given values of , , and , and as , we can see that , which is exactly the partial derivative of f when the inputs are independent. Note that as , the inputs become independent, as the constraint imposed on is always satisfied.
Keeping in mind Equation (6), it is worth noting that the partial derivatives of f can be directly derived by making use of an equivalent DM of , that is, , where , and are independent, with representing the beta distribution of the first kind (see [15], Corollary 2). Indeed, we have and because

As a matter of fact, we obtain the same partial derivatives of f, keeping in mind the symmetry among the inputs.
7. Conclusions
A new approach for calculating the partial derivatives, gradient, and Hessian of functions with non-independent variables is proposed and studied in this paper. It relies on (i) dependency functions that model the dependency structures among dependent variables, including correlated variables; (ii) the dependent Jacobian of the dependency functions; and (iii) the tensor metric induced by the dependent Jacobian. Based on the unique tensor metric given by the first fundamental form, the unique gradient of a function with non-independent variables is provided. Since the so-called dependent partial derivatives and the dependent Jacobian do not require any additional assumption (unlike the actual partial derivatives), such derivatives (including the gradient) should be used.
The results obtained depend on the parameters of the distribution function or the density function of non-independent variables. For the values of such parameters that lead to independent variables, the proposed gradient and partial derivatives reduce to the formal gradient or the gradient with respect to the Euclidean metric. In the same sense, the proposed tensor metric reduces to the Euclidean metric using the above values of the parameters of the distribution function.
Using the proposed gradient and Hessian matrix, the Taylor-type expansion of a function with non-independent variables is provided. Although the generalized inverse of a symmetric matrix is used in some cases, more investigation is needed for the gradient calculus when the tensor metric is not invertible. The proposed gradient will be used for (i) the development of the active subspaces of functions with non-independent variables, and (ii) enhancing the optimization of functions subject to constraints.