1. Introduction
Taylor’s theorem [1] is one of the most fundamental results in mathematical analysis. Taylor–Young and Maclaurin series are widely used for approximating smooth functions around a given point, called a node. For instance, such series have been used in partial differential equations [2], in stochastic computations of gradients [3,4,5,6,7,8,9], and in statistical analysis of functions [10,11]. Other instances are listed in [12,13,14] and the references therein. Stochastic Taylor expansions, or Itô–Taylor expansions, have been proposed to expand functions of solutions of stochastic differential equations (see, e.g., [15,16]), and a version of the probabilistic Taylor expansion has been considered in [17,18,19,20,21] by treating the input as a random variable.
Taylor series approximate a smooth function using the derivatives of that function at a single node only, which eases the derivation of results in the aforementioned scientific fields and others. Expanding a function at a single node leads to a local approximation, and the quality of such an approximation often decays quickly for input values that are far away from this node. Such local approximations make it difficult to assess the behavior of numerical schemes and/or algorithms relying on Taylor series when the input values are not close to the node, because the remainder term is often unknown. Such epistemic uncertainties of Taylor series around given nodes have been addressed in [12] using a Gaussian process. The posterior variance of the Gaussian process given derivative data sets serves as a measure of credibility for the Taylor approximations when the input values are far away from the corresponding nodes.
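The locality of a single-node expansion can be seen numerically. The following Python sketch is an illustration only (f = exp, node 0, and order p = 2 are arbitrary choices, not taken from the paper): it evaluates the degree-p Taylor polynomial about the node and prints the absolute error at increasing distances from it.

```python
import math


def taylor_exp(x: float, node: float, p: int) -> float:
    """Degree-p Taylor polynomial of exp about `node`, evaluated at x."""
    # Every derivative of exp is exp, so the k-th coefficient is exp(node) / k!.
    return sum(math.exp(node) * (x - node) ** k / math.factorial(k) for k in range(p + 1))


def abs_error(x: float, node: float, p: int) -> float:
    """Absolute error of the Taylor approximation at x."""
    return abs(math.exp(x) - taylor_exp(x, node, p))


if __name__ == "__main__":
    node, p = 0.0, 2
    for x in (0.1, 0.5, 1.0, 2.0, 4.0):
        print(f"x = {x:3.1f}   |error| = {abs_error(x, node, p):.3e}")
```

The error vanishes at the node and grows quickly with the distance to it, which is exactly the information hidden in the unknown remainder term.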
Choosing a unique node becomes an issue in some situations, as derivative data sets become available at different input values [22,23], each of which can serve as a node for function approximation. In contrast to one-node-based approximations, it is of interest to make the best use of such derivative data sets for function approximation, particularly when first-order derivative information is available at some points and second-order information is observed at other points. Moreover, reducing the epistemic uncertainty due to the remainder term has been considered in [13,14,24] by deriving the optimal Taylor-like formula, which requires evaluating the first-order and second-order derivatives at different nodes. Each node has a specific weight, leading to a kind of discrete random variable.
So far, the aforementioned expansions or approximations of functions have been developed in the spirit of Taylor’s theorem and/or the mean value theorem. In this paper, a new and unified stochastic framework for Taylor-type expansions of smooth functions is proposed by means of independent random variables. The proposed probabilistic expansions of a function are able to incorporate evaluations of derivatives at different points, leading to a global approach. It turns out that the nodes of the proposed approach are expectations of the random variables used, and the coefficients of such expansions involve the moments of derivatives. Also, exact expansions are obtained for any order of available derivatives, and such expansions enable the statistical inference of the remainder terms. Moreover, the proposed approach can be deployed for approximating functions that are not differentiable at countably many points. The traditional Taylor–Young and Maclaurin series turn out to be particular cases of the proposed approach, obtained when the random variables follow Dirac measures. Thus, taking the Dirac probability measure leads to enhanced guidelines for using Taylor series. Different ways of choosing the optimal distributions of the random variables are provided when truncations are applied.
The paper is organized as follows:
Section 2 provides the generic expansions of functions using the concept of zero-one-ended functions, including cumulative distribution functions. Cumulative distribution functions are used to derive the probabilistic Taylor-type expansions of functions. Links with the Taylor series are established in that section.
Section 3 addresses a kind of convergence analysis by deriving the error bounds of truncated expansions at a given order. Such upper bounds are then used for deriving the optimal cumulative distribution functions.
Section 4 provides simulated results comparing the proposed approach to the traditional Taylor expansions using different functions, and we conclude this work in Section 5.
Consider a set and a measurable and deterministic function given by . Denote with Y a continuous random variable having F as the cumulative distribution function (CDF) and as the probability density function (PDF). Given an integer , with are i.i.d. copies of Y. Also, denote with (resp. ) the expectation (resp. variance) operator taken w.r.t. when there is no ambiguity.
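To work with a large class of functions, the concept of weakly differentiable functions is needed. Let $\mathcal{D}(\Omega)$ be a class of test functions, that is, any $\phi \in \mathcal{D}(\Omega)$ has a compact support included in an open set $\Omega$ and is infinitely differentiable [25,26]. For any $\phi \in \mathcal{D}(\Omega)$, $\phi'$ is its derivative.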
Definition 1 ([25,26]). A locally integrable function f is said to be weakly differentiable if there exists a locally integrable function h such that, for any $\phi \in \mathcal{D}(\Omega)$,
$$\int_\Omega f(x)\,\phi'(x)\,dx = -\int_\Omega h(x)\,\phi(x)\,dx.$$
The function h is the weak derivative of f w.r.t. x, that is, $h = f'$, and it is uniquely defined almost everywhere. Moreover, distribution theory allows for defining the weak derivatives of f of every order. For a given integer k, denote with $f^{(k)}$ the kth-order weak derivative of f w.r.t. x, and define the following space:
where  means that , with F being a CDF.
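As a sanity check of Definition 1, the classical example f(x) = |x|, which is not differentiable at 0, has weak derivative h(x) = sign(x). The Python sketch below is an illustration with an arbitrarily chosen bump test function supported on (−0.8, 1.2); it approximates both sides of the integration-by-parts identity with a midpoint rule, and they should agree up to discretization error.

```python
import math


def bump(x: float, c: float = 0.2, r: float = 1.0) -> float:
    """Smooth test function supported on (c - r, c + r)."""
    t = (x - c) / r
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def bump_prime(x: float, c: float = 0.2, r: float = 1.0) -> float:
    """Classical derivative of `bump` (chain rule)."""
    t = (x - c) / r
    if abs(t) >= 1.0:
        return 0.0
    return bump(x, c, r) * (-2.0 * t / (1.0 - t * t) ** 2) / r


def midpoint(g, a: float, b: float, n: int = 100_000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(x: float) -> float:
    return abs(x)  # not differentiable at 0


def h(x: float) -> float:
    return math.copysign(1.0, x)  # sign(x); its value at 0 is irrelevant (a null set)


# Both sides of: int f * phi' dx = - int h * phi dx, over the support of the bump.
lhs = midpoint(lambda x: f(x) * bump_prime(x), -0.8, 1.2)
rhs = -midpoint(lambda x: h(x) * bump(x), -0.8, 1.2)
print(f"int f*phi' = {lhs:.6f},  -int h*phi = {rhs:.6f}")
```

The bump is deliberately not centered at 0, so both integrals are nonzero and the check is not trivially satisfied by symmetry.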
2. Expansions of a Function Using Other Functions
This section aims at providing expansions of a function using its available weak derivatives and specific functions. We start with expansions of f based on specific functions (called zero-one-ended functions), followed by (i) the probabilistic expansions of the deterministic f using the CDFs of continuous random variables, and (ii) the links between such expansions and Taylor series.
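Given $a \in \mathbb{R}$, $\mathbb{1}_{x \geq a}$ denotes the Heaviside function or indicator function, that is, $\mathbb{1}_{x \geq a} = 1$ if $x \geq a$ and 0 otherwise. Its distributional derivative is the Dirac delta function or measure $\delta_a$, leading to $\frac{d}{dx}\mathbb{1}_{x \geq a} = \delta_a$.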
Definition 2. A function is said to be zero-one-ended whenever Obviously, when g has limits at a and b, then and . Instances of zero-one-ended functions are on with , and the large class of the well-known CDFs of continuous random variables, defined on the supports of such variables. Weakly differentiable zero-one-ended functions (even those taking negative values) enable expansions of every smooth function.
Assumption 1 (A1). The function f belongs to the space given by Equation (1).
Assumption (A1) ensures not only that f is smooth enough and integrable w.r.t. the probability measure F, but also that one can work with finite second-order moments of  when it is evaluated at a random variable. The global expansion of f based on the zero-one-ended function g is derived in the following lemma.
Lemma 1. Let g be a zero-one-ended function on Ω. Assume and (A1) hold. Then, the pth-order expansion of f at is
Proof. We derive this result by induction. First, integration by parts gives
Adding an integral to the above equation yields the zero-order expansion of f. Indeed,
Second, by applying the zero-order expansion to the function , that is, and replacing with the above expression in Equation (3), we obtain the first-order expansion of f given by
Third, combining the remainder term of Equation (2) corresponding to the order with the zero-order expansion of , that is, yields and the result holds. □
The identity in Equation (2) is an exact expansion of f at any . We then approximate f by omitting the last term of Equation (2). Such an approximation remains exact for polynomials of degree at most p. In view of Lemma 1, there are many possible choices of g. The natural choice of g in probability theory is a CDF, as it can bring information about the input or the output space. Another interesting choice of g might rely on regression functions used in statistical modeling. Before taking CDFs as g, let us introduce the moments of f evaluated at a random variable X using its first-order derivative and the exact expansion given by Equation (2). In what follows, denote  and , which become  and  when f has limits at such points.
Proposition 1. Let be a random variable having G as the CDF. If , then
Proof. For any smooth function h, using Lemma 1 and bearing in mind Fubini’s theorem [27,28], one can write
and the result holds. □
Applications of such a result give new identities involving the moments of functions evaluated at random variables. The following corollary provides such identities.
Corollary 1. Let be a random variable having G as the CDF. Assume and has finite th-order moment. Then, for any ,
Proof. The first result holds using , and the second one using (see Proposition 1). □
In view of Corollary 1, the moments of  do not depend on g, which is used to expand f. It is also shown in [29] that the variance of  is
2.1. Probabilistic Expansions of Functions
In this section, the expansions of f based on particular zero-one-ended functions, such as CDFs, are considered. Before giving such expansions, the following intermediate results are needed. Denote with F a continuous CDF and the corresponding probability density function, that is, .
Proposition 2. Let , and be a random variable having finite th-order moment. Then, the following identity holds:
Proof. Let . It follows from integration by parts that
and the result holds. □
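The displayed identity of Proposition 2 is not reproduced here. A closely related integration-by-parts identity, which we state only as an illustration and do not claim is Proposition 2 verbatim, links integrals of the CDF to moments of Y: for Y supported on [a, b], x ∈ [a, b], and an integer k ≥ 0,
$$\int_a^b (z - x)^k\,\bigl(\mathbb{1}_{z \geq x} - F(z)\bigr)\,dz = \frac{\mathbb{E}\bigl[(Y - x)^{k+1}\bigr]}{k + 1}.$$
The Python sketch verifies it by quadrature for the hypothetical CDF F(z) = z² on [0, 1].

```python
def simpson(g, a: float, b: float, n: int = 10_000) -> float:
    """Composite Simpson rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4.0 * sum(g(a + i * h) for i in range(1, n, 2))
    s += 2.0 * sum(g(a + i * h) for i in range(2, n, 2))
    return s * h / 3.0


A, B = 0.0, 1.0


def cdf(z: float) -> float:
    """Hypothetical CDF F(z) = z^2 on [0, 1]."""
    return z * z


def pdf(z: float) -> float:
    """Its density F'(z) = 2z."""
    return 2.0 * z


def lhs(x: float, k: int) -> float:
    # 1{z >= x} - F(z) equals -F(z) on [A, x) and 1 - F(z) on [x, B].
    return (simpson(lambda z: (z - x) ** k * (-cdf(z)), A, x)
            + simpson(lambda z: (z - x) ** k * (1.0 - cdf(z)), x, B))


def rhs(x: float, k: int) -> float:
    moment = simpson(lambda z: (z - x) ** (k + 1) * pdf(z), A, B)  # E[(Y - x)^(k+1)]
    return moment / (k + 1)


for k in range(4):
    print(k, lhs(0.5, k), rhs(0.5, k))
```

For k = 0 and x = 0.5 both sides equal E[Y] − 0.5 = 2/3 − 1/2 = 1/6.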
Lemma 2. Let , be a random variable and be i.i.d. copies of Y. If Y has finite th-order moment, then
with the sequences
Proof. Let us show the result by induction. The first step (i.e., ) holds using (4) with , that is,
Second, suppose that Equation (5) is valid for all .
Third, we show that (5) is also true when . Indeed,
using the change of variables of the form . □
It is sometimes useful to express the above expansion around a constant, such as the expectation of Y. Lemma 3 provides such a result.
Lemma 3. Let , be a random variable and be i.i.d. copies of Y. If Y has finite th-order moment, then
with the sequences
Proof. The first step (i.e., ) holds using (4) with , that is,
Second, suppose that Equation (7) is valid for all .
Third, we show that (7) is also true when . Indeed,
using the change of variables  and the fact that . □
Now, we have all the elements in hand to provide the Taylor-type expansions of functions based on CDFs of random variables and their moments (see Theorem 1). To that end, consider the sequences  and  given by (6) and (8), respectively, and define the coefficients
and
Theorem 1. Let , be a random variable, be i.i.d. copies of Y, and assume (A1) holds. Then, the pth-order expansions of f at x are
Proof. When taking , Equation (2) becomes
with the convention . For the first result, using Lemma 2, that is,
we can write
The second result holds by the same reasoning thanks to Lemma 3, that is,
□
Theorem 1 provides exact and global expansions of smooth functions at any order , and different expansions are possible depending on the choice of CDF. For polynomials of degree p, the last terms in Equations (9) and (10) vanish, and the expansions remain exact using only the weak derivatives of f up to order p. In general, the last terms are known as remainder terms; the smaller they are, the better the expansions without such terms, that is,
Instead of using the above approximations, one may consider the exact expansions at order , which include the remainder term when only the weak derivatives up to order p are available. Indeed, the expectation  can be computed in that situation, leading to the following estimator:
where  is an i.i.d. sample of  for any .
In addition to polynomials of degree at most p, note that the approximations  of f become exact when  for any function . The most important question is as follows: can we find or construct distribution functions F such that ? Without an answer to that open problem, this study examines error bounds in order to find a reasonable CDF (see Section 3).
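Equations (9) and (10) are not reproduced in this extraction. As a hedged illustration only, iterating the integration-by-parts identity $f(x) = \mathbb{E}[f(Y)] + \int f'(z)\,(F(z) - \mathbb{1}_{z \geq x})\,dz$ twice (our own derivation, not the paper's general formula) gives the truncated second-order expansion
$$f(x) \approx \mathbb{E}[f(Y_1)] + \mathbb{E}[f'(Y_2)]\,(x - \mu) + \mathbb{E}[f''(Y_3)]\,\frac{(x - \mu)^2 - \sigma^2}{2},$$
with μ = E[Y] and σ² = Var(Y). Its node is μ and its coefficients are moments of derivatives, in line with the description of Theorem 1; it is exact for polynomials of degree at most 2, and it reduces to the second-order Taylor polynomial about a when Y follows the Dirac measure at a. The sketch assumes Y ∼ U(a, b) and computes the expectations by quadrature.

```python
import math


def simpson(g, a: float, b: float, n: int = 2_000) -> float:
    """Composite Simpson rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4.0 * sum(g(a + i * h) for i in range(1, n, 2))
    s += 2.0 * sum(g(a + i * h) for i in range(2, n, 2))
    return s * h / 3.0


def second_order_uniform(derivs, x: float, a: float, b: float) -> float:
    """Truncated second-order expansion with Y ~ Uniform(a, b).

    derivs = (f, f', f''); each coefficient E[f^(k)(Y)] is computed by quadrature.
    """
    mu, var = (a + b) / 2.0, (b - a) ** 2 / 12.0
    c0, c1, c2 = (simpson(d, a, b) / (b - a) for d in derivs)
    return c0 + c1 * (x - mu) + c2 * ((x - mu) ** 2 - var) / 2.0


def taylor2(derivs, x: float, node: float) -> float:
    """Classical second-order Taylor polynomial about `node`."""
    f, df, d2f = derivs
    return f(node) + df(node) * (x - node) + d2f(node) * (x - node) ** 2 / 2.0


quad = (lambda x: 3 * x * x - x + 1, lambda x: 6 * x - 1, lambda x: 6.0)
exp3 = (math.exp, math.exp, math.exp)
a, b = 0.0, 2.0
for x in (0.0, 1.0, 2.0):
    print(x, math.exp(x), taylor2(exp3, x, (a + b) / 2), second_order_uniform(exp3, x, a, b))
```

For the quadratic the expansion reproduces f at every x; for exp it differs from the Taylor polynomial about the midpoint, and which one is more accurate depends on x. No claim about the paper's numerical results is made here.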
2.2. Links with Taylor–Young and Maclaurin Series
Taylor series are well known and used by different communities for locally approximating functions. In this section, Corollary 2 shows that such series are particular cases of the expansions of functions derived in Theorem 1. To that end, for any , denote with  a sequence of uniformly distributed random variables having  as the CDF. Applying Equation (10) using  yields
which is a sequence of functions that expand f as well. Note, however, that  holds only for  because  is a zero-one-ended function on that support.
Corollary 2. Let , and assume (A1) holds. Then, we have
Proof. The sequence of random variables given by  has  as density for all . The density  is also known as the rectangular function, and the corresponding random variables converge in distribution toward the Dirac delta measure, that is, . The CDF of  is given by  and . Therefore, the sequence given by Equation (8) becomes
and a direct calculation gives
because for all
Moreover, given the identity
one can write the following limit of :
because the last limit vanishes by the dominated convergence theorem. Indeed, using  for all real  and  for all sets , we have
and
Since the last term is integrable, the dominated convergence theorem allows us to write
and the result holds. □
Corollary 2 thus provides the exact expansion of , and good approximations of  for any  with . Extending such approximations to any  might not give accurate results, except in some particular cases. Indeed, it is common to expand f within different subsets of  that form a partition of  when derivative data sets are available within each subset.
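In the spirit of Corollary 2, the sketch below takes Y ∼ U(a − h, a + h) (our own parametrization of a shrinking uniform law) and f = exp, for which $\mathbb{E}[f^{(k)}(Y)] = e^a \sinh(h)/h$ for every k. It plugs these moments into the second-order expansion obtained by iterating the integration-by-parts identity (our own derivation, not the paper's Equation (10)),
$$f(x) \approx \mathbb{E}[f(Y_1)] + \mathbb{E}[f'(Y_2)]\,(x - \mu) + \mathbb{E}[f''(Y_3)]\,\frac{(x - \mu)^2 - \sigma^2}{2}, \qquad \mu = a,\ \sigma^2 = h^2/3,$$
and shows the gap to the second-order Taylor polynomial about a shrinking as h → 0.

```python
import math


def expansion_exp(x: float, a: float, h: float) -> float:
    """Second-order expansion of exp with Y ~ Uniform(a - h, a + h)."""
    coef = math.exp(a) * math.sinh(h) / h  # E[exp(Y)] = E[exp'(Y)] = E[exp''(Y)]
    mu, var = a, h * h / 3.0
    return coef * (1.0 + (x - mu) + ((x - mu) ** 2 - var) / 2.0)


def taylor2_exp(x: float, a: float) -> float:
    """Second-order Taylor polynomial of exp about a."""
    return math.exp(a) * (1.0 + (x - a) + (x - a) ** 2 / 2.0)


a, x = 0.0, 1.0
for h in (0.5, 0.1, 0.01, 1e-4):
    gap = abs(expansion_exp(x, a, h) - taylor2_exp(x, a))
    print(f"h = {h:g}   |expansion - Taylor| = {gap:.2e}")
```

The gap behaves like O(h²), since both sinh(h)/h − 1 and σ² are of that order.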
3. Error Bounds and Choice of Distribution Functions
This section treats the choice of the distribution functions of random variables according to selected quality measures of function expansions, such as integrated or mean errors, mean squared errors (MSEs), and integrated MSEs (IMSEs).
In this section, the input values of a function follow a prescribed distribution function, that is, . Without additional information on X, it is common to use uniform distributions or a Gaussian distribution with a large variance. Otherwise,  can be estimated using well-established methods for density and/or distribution estimation (see, e.g., [30,31,32,33]).
Recall that the main issue is to find distribution functions (i.e., F) such that
In the case where $Y \sim \delta_a$, with $\delta_a$ being the Dirac measure at a, the distribution function is given by $F(y) = \mathbb{1}_{y \geq a}$ and $Y = a$ almost surely. Therefore,  is achieved, and the exact Taylor expansion at that point is obtained. However, the corresponding approximation is only accurate around the node a, as discussed in Section 2.2.
3.1. Zero-Mean Errors
Concerning Equation (10),  can be seen as an error term, and it is common to require a zero mean error over the input space (i.e., a zero integrated mean) and the smallest integrated variance. To achieve a zero mean error, Theorem 2 gives the distribution function F of the independent variables as well as the corresponding IMSE.
Theorem 2. Consider the remainder term of Equations (9) and (10) and assume that (A1) holds. If , then
Moreover, if , then
Proof. As  with , the first result holds using Fubini’s theorem [27] and the fact that , leading to .
Since  and , the integrated MSE is given by
Indeed, for the term , it is obvious that
and when  and , one can see that
and
If
and the second result holds by iterating the above procedure corresponding to the expectation  for any .
The last result holds because
□
3.2. Integrated Mean Squared Errors
This section aims at providing distribution functions that help minimize IMSEs, leading to possible optimal distribution functions for the approximation of f. To that end, define
Theorem 3. Consider the remainder term of Equations (9) and (10) and assume that (A1) holds. Then,
Proof. It follows from the proof of Theorem 2 that the integrated MSE is given by
Since  and  for any , one can write
and the result holds. □
In view of Theorem 3, the ideal optimal distribution function  must satisfy the following functional equation:
Unfortunately, there is no solution in , which suggests working in  in the hope of finding . Nevertheless, such a functional equation is interesting, as it could lead to a zero-one-ended function that is not necessarily a distribution function (see Lemma 1).
For the probabilistic expansions considered in this paper, finding an approximately optimal CDF requires using the upper bound of the IMSE provided in Theorem 3. To derive , denote with  the set of continuous CDFs on the support .
Proposition 3. Under the conditions of Theorem 3, assume that . Then, an optimal CDF is given by
Such a result is straightforward. Of course, parametric approaches may be considered as well. Such approaches consist of (i) choosing a particular class of CDFs to which F belongs, and (ii) determining the optimal parameters of the corresponding CDFs, that is, the values of the parameters that minimize the upper bound of the IMSE. Examples include the Gaussian, arcsine, and beta distributions.
Obtaining an optimal CDF that depends on  requires relying on CDFs of the form  with , which leads to the following result.
Proposition 4. Under the conditions of Theorem 3, assume that with . Then, an optimal CDF is given by with Proof. One can see that is a CDF and the corresponding density is . □
4. Applications
To assess the quality of different methods, including the Taylor approach, the following error measure is considered:
where  is the exact evaluation of f at  and  is the corresponding approximation. The values  are equally spaced in the bounded domain . Three types of approximations are computed: the traditional Taylor approximation and the proposed approach with and without the remainder term. All of these approximations use derivatives up to the same order p.
For each function, the R package pracma is used for performing the Taylor expansion about (i) the middle of the domain , that is,  with , or (ii) the expectation of X, that is, , for infinite domains. The same R package is used for obtaining derivatives of functions of orders up to . Concerning the proposed approach, different CDFs are considered: the truncated Gaussian distribution on  with mean  and standard deviation  (i.e., in practice, the Dirac measure); the uniform distribution on ; a mixture of Gaussian distributions; and the optimal beta distribution (when possible) according to Propositions 3 and 4. The coefficients  of the proposed probabilistic expansion (see Theorem 1) are computed by the Monte Carlo approach with sample size , that is,
Table 1 and Table 2 report the estimated values of the coefficients  and the nodes  for expanding  and , respectively. These values are obtained from a single simulation and without the remainder terms.
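The Monte Carlo step can be sketched as follows. The sample size used in the paper is not recoverable here, so N = 100 000, the seed, f = exp, and Y ∼ U(0, 2) are illustrative assumptions. Each coefficient E[f^(k)(Y_k)] is estimated by a sample mean over its own i.i.d. sample, mirroring the i.i.d. copies of Y in Theorem 1, and is reported with its standard error; for exp on U(a, b) the exact value (e^b − e^a)/(b − a) is available for comparison.

```python
import numpy as np


def mc_coefficients(derivs, sampler, n_samples, rng):
    """Monte Carlo estimates of E[f^(k)(Y_k)], k = 0, 1, ..., one fresh i.i.d. sample per order.

    Returns (estimates, standard_errors) as NumPy arrays.
    """
    estimates, std_errors = [], []
    for d in derivs:
        y = sampler(rng, n_samples)  # i.i.d. draws of Y used for this order only
        values = d(y)
        estimates.append(values.mean())
        std_errors.append(values.std(ddof=1) / np.sqrt(n_samples))
    return np.array(estimates), np.array(std_errors)


A, B = 0.0, 2.0   # support of Y (illustrative)
N = 100_000       # sample size (assumption; the paper's value is not reproduced here)


def uniform_sampler(rng, n):
    """Draw n values of Y ~ Uniform(A, B)."""
    return rng.uniform(A, B, size=n)


rng = np.random.default_rng(12345)
derivs = (np.exp, np.exp, np.exp)  # f = exp and its first two derivatives
est, se = mc_coefficients(derivs, uniform_sampler, N, rng)
exact = (np.exp(B) - np.exp(A)) / (B - A)  # E[exp(Y)] for Y ~ Uniform(A, B)

for k, (m, s) in enumerate(zip(est, se)):
    print(f"k = {k}: estimate {m:.4f} (std. error {s:.4f}), exact {exact:.4f}")
```

Drawing a fresh sample per order keeps the estimated coefficients independent, at the cost of more function evaluations than reusing a single sample.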
It turns out that the coefficients used for the proposed expansions of functions differ from those of the traditional Taylor approach, except for the truncated Gaussian distribution, as expected.
The following figures show the simulated error values (on a log10 scale) for different functions. Each panel corresponds to a specific CDF and shows the error values associated with Taylor’s approximation compared to the proposed approach, that is, the probabilistic Taylor-type expansion without the remainder term (PTWoR) and/or with the remainder term (PTWR).
Figure 1 and Figure 2 depict the error values for the functions  and  when , respectively. Similar results are obtained when  and 3. It appears that the practical Dirac measure almost exactly reproduces the well-known Taylor approximations, and the other CDFs (PTWoR) perform well when the input values are far away from the node used.
Figure 3 and Figure 4 depict the error values for the functions  and  when , respectively. Again, the practical Dirac measure exactly reproduces the well-known Taylor approximations. The proposed PTWR approach performs well compared to PTWoR for the uniform distribution and the beta distribution.
Figure 5 and Figure 6 show the error values for the absolute value function  when  and 3, respectively. The practical Dirac measure reproduces the Taylor approximations. PTWoR performs well when , while PTWR associated with the Dirac measure gives the best results compared to PTWoR when . Note that Taylor series fail for this function.
5. Conclusions
This paper has proposed a generic expansion of every smooth function based on any zero-one-ended function. CDFs are special zero-one-ended functions that lead to a unified stochastic framework for Taylor-type expansions of functions, including Taylor series. The proposed probabilistic expansions form a global approach, as they allow derivative information at several points to be incorporated. Among the possible CDFs, it is shown in this paper that the CDF of the Dirac measure is the best one for the proposed probabilistic expansions, leading to Taylor series. However, such series are exact at the node used and give accurate results only for input values that fall within a neighborhood of that node. A functional equation is proposed in order to derive the best CDF over the whole domain. As this equation has no solution in , error bounds have been used for selecting quasi-optimal CDFs.
A practical approximation of the Dirac probability measure, such as a truncated Gaussian distribution with a very small variance, is used to recover the well-known Taylor expansion. Some CDFs turn out to improve on the Taylor approximations. Such results are promising and require further investigation. For instance, different CDFs could be involved in the proposed expansion instead of a single CDF. Significant advantages of the proposed probabilistic expansion over Taylor series could be expected with the best CDFs.