Probabilistic Taylor-Type Expansions of Functions

Lamboni, Matieyendou

doi:10.3390/math14040712

Open Access Article


by Matieyendou Lamboni (1,2)

(1) Department DFR-ST, University of Guyane, 97346 Cayenne, France

(2) 228-UMR Espace-Dev, University of Guyane, University of Réunion, IRD, University of Montpellier, 34090 Montpellier, France

Mathematics 2026, 14(4), 712; https://doi.org/10.3390/math14040712

Submission received: 30 December 2025 / Revised: 15 February 2026 / Accepted: 16 February 2026 / Published: 18 February 2026

(This article belongs to the Section C: Mathematical Analysis)


Abstract

Taylor–Young and Maclaurin series are widely used for approximating smooth functions around a given point. This study investigates a unified stochastic framework for Taylor-type expansions of functions by means of independent random variables. The proposed probabilistic expansions of a function can incorporate evaluations of derivatives at different points, leading to a global approach. Exact expansions are obtained for any order of available derivatives, and such Taylor-type expansions enable statistical inference on the remainder terms. The traditional Taylor–Young and Maclaurin series turn out to be particular cases of the proposed approach, obtained with the Dirac probability measure, and this yields enhanced guidelines for using Taylor series. Different ways of choosing optimal distributions of the random variables are provided, particularly when truncations are applied. Numerical comparisons are provided as well.

Keywords:

approximations of smooth functions; error bounds; optimal distribution functions; Taylor’s theorem and probabilistic Taylor’s expansion

MSC:

41A58; 65N75; 65C20; 42A61

1. Introduction

Taylor’s theorem [1] is one of the most fundamental results in functional analysis. Taylor–Young and Maclaurin series are widely used for approximating smooth functions around a given point, a.k.a. a node. For instance, such series have been used in partial differential equations [2], in stochastic computations of gradients [3,4,5,6,7,8,9], and in statistical analysis of functions [10,11]. Other instances are listed in [12,13,14] and the references therein. Stochastic Taylor expansions or Ito–Taylor expansions have been proposed to expand functions of solutions of stochastic differential equations (see e.g., [15,16]), and a version of probabilistic Taylor’s expansion has been considered in [17,18,19,20,21] by treating the input as a random variable.

Basically, Taylor series linearize smooth functions using information about their derivatives at a single node only, which eases the derivation of results in the aforementioned scientific fields and others. Linearizing a function at a single node leads to a local approximation, and the quality of such an approximation often decays quickly for input values that are far away from this node. Such local approximations make it difficult to assess the behavior of numerical schemes and/or algorithms relying on Taylor series when the input values are not close to the node, because the remainder term is often unknown. Such epistemic uncertainties of Taylor series around given nodes have been addressed in [12] using a Gaussian process. The posterior variance of the Gaussian process given derivative data sets serves as a measure of credibility for the Taylor approximations when the input values are far away from the corresponding nodes.

The issue of choosing a single node arises in some situations, as derivative data sets become available at different input values [22,23], and these values can serve as nodes for function approximations. In contrast to one-node-based approximations, it is of great interest to make the best use of such derivative data sets for function approximations, particularly when first-order derivative information is available at some points and second-order information is observed at other points. Moreover, reducing the epistemic uncertainty due to the remainder term has been considered in [13,14,24] by deriving the optimal Taylor-like formula, which requires evaluating the first-order and second-order derivatives at different nodes. Each node has a specific weight, leading to a kind of discrete random variable.

So far, the aforementioned expansions or approximations of functions have been developed in the spirit of Taylor’s theorem and/or the mean value theorem. In this paper, a new and unified stochastic framework for Taylor-type expansions of smooth functions is proposed by means of independent random variables. The proposed probabilistic expansions of a function can incorporate evaluations of derivatives at different points, leading to a global approach. It turns out that the derived nodes for the proposed approach are expectations of the random variables used, and the coefficients used in such expansions involve the moments of derivatives. Also, exact expansions are obtained for any order of available derivatives, and such expansions enable statistical inference on the remainder terms. Moreover, the proposed approach can be deployed for approximating functions that are not differentiable at a countable set of points. The traditional Taylor–Young and Maclaurin series turn out to be particular cases of the proposed approach, obtained by letting the random variables follow the Dirac measure. Thus, taking the Dirac probability measure leads to enhanced guidelines for using Taylor series. Different ways of choosing the optimal distributions of random variables are provided when truncations are applied.

The paper is organized as follows: Section 2 provides the generic expansions of functions using the concept of zero-one-ended functions, including cumulative distribution functions, which are then used to derive the probabilistic Taylor-type expansions of functions. Links with Taylor series are also established in that section. Section 3 addresses a kind of convergence analysis by deriving error bounds for expansions truncated at a given order. Such upper bounds are then used for deriving the optimal cumulative distribution functions. Section 4 provides simulated results comparing the proposed approach to traditional Taylor expansions using different functions, and we conclude this work in Section 5.

Setup and general notation

Consider a set $Ω \subseteq \mathbb{R}$ and a measurable, deterministic function $f : Ω \to \mathbb{R}$ given by $x \mapsto f(x)$. Denote with $Y$ a continuous random variable having $F$ as its cumulative distribution function (CDF) and $ρ$ as its probability density function (PDF). Given an integer $p > 0$, the variables $Y^{(j)}$, $j = 1, \dots, p$, are i.i.d. copies of $Y$. Also, denote with $E[\cdot]$ (resp. $V[\cdot]$) the expectation (resp. variance) operator taken w.r.t. the $Y^{(j)}$'s when there is no ambiguity.

To work with a large class of functions, the concept of weakly differentiable functions is needed. Let $C_{c}^{\infty}(Ω)$ be the class of test functions, that is, any $ϕ \in C_{c}^{\infty}(Ω)$ is infinitely differentiable ($ϕ \in C^{\infty}$) and has a compact support included in the open set $Ω := (a, b)$ [25,26]. For any $ϕ \in C_{c}^{\infty}(Ω)$, $ϕ'$ denotes its derivative.

Definition 1 ([25,26]). Let $h : Ω \subseteq \mathbb{R} \to \mathbb{R}$. A function $f$ is said to be weakly differentiable if there exists a locally integrable function $h$ such that, for any $ϕ \in C_{c}^{\infty}(Ω)$,

$$\int_{Ω} f(x)\, ϕ'(x)\, dx = - \int_{Ω} h(x)\, ϕ(x)\, dx .$$

The function $h$ is the weak derivative of $f$ w.r.t. $x$, that is, $\frac{df}{dx}(x) = h(x)$, and it is uniquely defined almost everywhere. Moreover, distribution theory allows for defining weak derivatives of $f$ of every order. For a given integer $k > 0$, denote with $D^{k} f$ the $k$th-order weak derivative of $f$ w.r.t. $x$, and define the following space:

$$W^{p} := \left\{ f : Ω \to \mathbb{R} \;:\; f \in C^{p}(Ω),\; D^{k} f \in L^{2}(Ω, F),\; \forall k \in \{0, 1, \dots, p\} \right\}, \tag{1}$$

where $D^{k} f \in L^{2}(Ω, F)$ means that

$$E_{X \sim F}\left[ \{D^{k} f(X)\}^{2} \right] = \int_{Ω} \{D^{k} f(x)\}^{2}\, F(dx) < +\infty,$$

with $F$ being a CDF.
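A quick numerical illustration of Definition 1 may help. The sketch below is a minimal example under illustrative assumptions that do not come from the paper: the kink location $0.3$, the standard bump test function supported on $(-1, 1)$, and Gauss–Legendre quadrature. It checks that $h(x) = \operatorname{sign}(x - 0.3)$ satisfies the defining identity for $f(x) = |x - 0.3|$, which has no classical derivative at $x = 0.3$.

```python
import numpy as np

KINK = 0.3  # illustrative kink location (assumption)

def gauss_legendre(func, a, b, n=200):
    """Approximate the integral of a vectorised func over [a, b]."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    x = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    return 0.5 * (b - a) * np.sum(weights * func(x))

def bump(x):
    """Test function phi(x) = exp(-1 / (1 - x^2)) on (-1, 1), zero elsewhere."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

def bump_prime(x):
    """Classical derivative phi'(x) = phi(x) * (-2x / (1 - x^2)^2) on (-1, 1)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    xi = x[inside]
    out[inside] = np.exp(-1.0 / (1.0 - xi ** 2)) * (-2.0 * xi / (1.0 - xi ** 2) ** 2)
    return out

def f(x):
    return np.abs(x - KINK)   # continuous, not differentiable at KINK

def h(x):
    return np.sign(x - KINK)  # candidate weak derivative

def integral_over_support(func):
    # Split at the kink so that each piece is smooth for the quadrature rule.
    return gauss_legendre(func, -1.0, KINK) + gauss_legendre(func, KINK, 1.0)

lhs = integral_over_support(lambda x: f(x) * bump_prime(x))  # int f phi' dx
rhs = -integral_over_support(lambda x: h(x) * bump(x))       # -int h phi dx
print(lhs, rhs)  # agree up to quadrature error
```

Replacing `h` with any function that differs from the sign function on a set of positive measure would, in general, break the agreement.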

2. Expansions of a Function Using Other Functions

This section aims at providing expansions of a function $f : Ω \to \mathbb{R}$ using its available weak derivatives and specific functions. We start with expansions of $f$ based on specific functions (called zero-one-ended functions), followed by (i) the probabilistic expansions of the deterministic $f$ using the CDFs of continuous random variables, and (ii) the links between such expansions and Taylor series.

Given $a \in \mathbb{R}$, $H_{a}(x) := \mathbb{1}_{[x \geq a]}$ denotes the Heaviside (or indicator) function, that is, $\mathbb{1}_{[x \geq a]} = 1$ if $x \geq a$ and $0$ otherwise. Its distributional derivative is the Dirac delta function (or measure) $δ_{a}(x)$, leading to $H_{a}'(x) = δ_{a}(x)$.

Definition 2. A function $g : Ω = (a, b) \to \mathbb{R}$ is said to be zero-one-ended whenever

$$g(a_{+}) := \lim_{x \to a^{+}} g(x) = 0; \qquad g(b_{-}) := \lim_{x \to b^{-}} g(x) = 1 .$$

Obviously, when $g$ is defined and continuous at $a$ and $b$, then $g(a_{+}) = g(a)$ and $g(b_{-}) = g(b)$. Instances of zero-one-ended functions are $g_{λ}(x) := \exp(λ x)\, \mathbb{1}_{\mathbb{R}_{-}}(x)$ on $Ω_{0} := \mathbb{R}_{-}$ with $λ > 0$, and the large class of well-known CDFs of continuous random variables, defined on the supports of such variables. Weakly differentiable, zero-one-ended functions (possibly taking negative values) enable the expansions of every smooth function.

Assumption 1 (A1). The function $f$ belongs to $W^{p+1}$ given by Equation (1).

Assumption (A1) not only ensures that $f$ is smooth enough and that $D^{k} f$ is integrable w.r.t. the probability measure $F$, but also allows working with finite second-order moments of $D^{k} f$ when it is evaluated at a random variable. The global expansion of $f$ based on a zero-one-ended function $g$ is derived in the following lemma.

Lemma 1. Let $g \in W^{1}$ be a zero-one-ended function on $Ω$. Assume that $\int_{Ω} D^{k} f(y) \frac{dg}{dy}(y)\, dy < \infty$ for all $k \leq p$, and that (A1) holds. Then, the $p$th-order expansion of $f$ at $x_{0} \in Ω$ is

$$\begin{aligned} f(x_{0}) = {} & \int_{Ω} f(y) \frac{dg}{dy}(y)\, dy + \sum_{k=1}^{p} \int_{Ω} D^{k} f(y) \frac{dg}{dy}(y)\, dy \int_{Ω^{k}} \prod_{j=1}^{k} \left( g(x_{j}) - \mathbb{1}_{[x_{j} \geq x_{j-1}]} \right) dx_{1} \cdots dx_{k} \\ & + \int_{Ω^{p+1}} D^{p+1} f(x_{p+1}) \prod_{j=1}^{p+1} \left( g(x_{j}) - \mathbb{1}_{[x_{j} \geq x_{j-1}]} \right) dx_{1} \cdots dx_{p+1} . \end{aligned} \tag{2}$$

Proof. We derive this result by induction. Firstly, integration by parts gives

$$\int_{Ω} f(y) \frac{dg}{dy}(y)\, dy = \left[ f(y) g(y) \right]_{a}^{b} - \int_{Ω} Df(y)\, g(y)\, dy = f(b) - \int_{Ω} Df(y)\, g(y)\, dy .$$

Adding an integral to the above equation yields the zero-order expansion of $f$ at $x$. Indeed,

$$\begin{aligned} \int_{Ω} f(y) \frac{dg}{dy}(y)\, dy + \int_{Ω} Df(y) \left( g(y) - \mathbb{1}_{[y \geq x]} \right) dy &= f(b) - \int_{Ω} Df(y)\, \mathbb{1}_{[y \geq x]}\, dy \\ &= f(b) - \left[ f(y) \right]_{x}^{b} = f(x) . \end{aligned} \tag{3}$$

Secondly, by applying the zero-order expansion to the function $Df(y)$, that is,

$$Df(y) = \int_{Ω} Df(z) \frac{dg}{dz}(z)\, dz + \int_{Ω} D^{2} f(z) \left( g(z) - \mathbb{1}_{[z \geq y]} \right) dz,$$

and replacing $Df(y)$ with the above expression in Equation (3), we obtain the first-order expansion of $f$ given by

$$\begin{aligned} f(x) = {} & \int_{Ω} f(y) \frac{dg}{dy}(y)\, dy + \int_{Ω} Df(y) \frac{dg}{dy}(y)\, dy \int_{Ω} \left( g(y) - \mathbb{1}_{[y \geq x]} \right) dy \\ & + \int_{Ω^{2}} D^{2} f(z) \left( g(z) - \mathbb{1}_{[z \geq y]} \right) \left( g(y) - \mathbb{1}_{[y \geq x]} \right) dz\, dy . \end{aligned}$$

Thirdly, combining the remainder term of Equation (2) at order $p - 1$ with the zero-order expansion of $D^{p} f(x_{p})$, that is,

$$D^{p} f(x_{p}) = \int_{Ω} D^{p} f(x_{p+1}) \frac{dg}{dx_{p+1}}(x_{p+1})\, dx_{p+1} + \int_{Ω} D^{p+1} f(x_{p+1}) \left( g(x_{p+1}) - \mathbb{1}_{[x_{p+1} \geq x_{p}]} \right) dx_{p+1},$$

yields

$$\begin{aligned} R_{p} := {} & \int_{Ω^{p}} D^{p} f(x_{p}) \prod_{j=1}^{p} \left( g(x_{j}) - \mathbb{1}_{[x_{j} \geq x_{j-1}]} \right) dx_{1} \cdots dx_{p} \\ = {} & \int_{Ω} D^{p} f(x_{p+1}) \frac{dg}{dx_{p+1}}(x_{p+1})\, dx_{p+1} \int_{Ω^{p}} \prod_{j=1}^{p} \left( g(x_{j}) - \mathbb{1}_{[x_{j} \geq x_{j-1}]} \right) dx_{1} \cdots dx_{p} \\ & + \int_{Ω^{p+1}} D^{p+1} f(x_{p+1}) \prod_{j=1}^{p+1} \left( g(x_{j}) - \mathbb{1}_{[x_{j} \geq x_{j-1}]} \right) dx_{1} \cdots dx_{p+1}, \end{aligned}$$

and the result holds. □
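The zero-order identity (3) and the first-order expansion obtained in the proof can be checked numerically. The following sketch assumes $Ω = (0, 1)$, the zero-one-ended function $g(y) = y^{2}$, and $f = \exp$. These choices and the helper names are hypothetical. The integrals are approximated by Gauss–Legendre quadrature, split at the jumps of the indicator functions.

```python
import numpy as np

NODES, WEIGHTS = np.polynomial.legendre.leggauss(100)

def integrate(func, a, b):
    """Gauss-Legendre approximation of the integral of a vectorised func over [a, b]."""
    x = 0.5 * (b - a) * NODES + 0.5 * (b + a)
    return 0.5 * (b - a) * np.dot(WEIGHTS, func(x))

def integrate_split(func, a, b, c):
    # The integrands jump at c (indicator term), so integrate each side separately.
    return integrate(func, a, c) + integrate(func, c, b)

def step(y, x):
    """Indicator 1[y >= x] returned as floats."""
    return np.where(y >= x, 1.0, 0.0)

# Illustrative choices (assumptions): Omega = (0, 1), g(y) = y^2, f = exp.
g = lambda y: y ** 2
dg = lambda y: 2.0 * y
f = df = d2f = np.exp

def zero_order(x):
    # Equation (3): int f g' dy + int Df(y) (g(y) - 1[y >= x]) dy
    return (integrate(lambda y: f(y) * dg(y), 0.0, 1.0)
            + integrate_split(lambda y: df(y) * (g(y) - step(y, x)), 0.0, 1.0, x))

def first_order(x):
    # Equation (2) with p = 1, remainder included.
    c0 = integrate(lambda y: f(y) * dg(y), 0.0, 1.0)
    c1 = integrate(lambda y: df(y) * dg(y), 0.0, 1.0)
    w1 = integrate_split(lambda y: g(y) - step(y, x), 0.0, 1.0, x)

    def inner(ys):
        # int D^2 f(z) (g(z) - 1[z >= y]) dz for every y in ys
        return np.array([integrate_split(lambda z: d2f(z) * (g(z) - step(z, y)), 0.0, 1.0, y)
                         for y in ys])

    remainder = integrate_split(lambda y: inner(y) * (g(y) - step(y, x)), 0.0, 1.0, x)
    return c0 + c1 * w1 + remainder

for x in (0.1, 0.5, 0.9):
    print(x, zero_order(x), first_order(x), np.exp(x))  # the three values coincide
```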

The identity in Equation (2) is an exact expansion of $f$ at any $x_{0} \in Ω$. We then approximate $f$ by omitting the last term of Equation (2). Such an approximation remains exact for polynomials of degree up to $p$, since $D^{p+1} f = 0$ for them. In view of Lemma 1, there are many possible choices of $g$. The natural choice of $g$ in probability theory is a CDF, as it can bring information about the input or the output space. Another interesting choice of $g$ might rely on regression functions used in statistical modeling. Before taking CDFs to represent $g$, let us express the moments of $f$ evaluated at a random variable $X$ using its first-order derivative and the exact expansion given by Equation (2). In what follows, denote $f(b_{-}) := \lim_{x \to b^{-}} f(x)$ and $f(a_{+}) := \lim_{x \to a^{+}} f(x)$, which reduce to $f(b)$ and $f(a)$ when $f$ is continuous at these points.

Proposition 1. Let $X \sim G$ be a random variable with values in $Ω = (a, b)$ and having $G$ as its CDF. If $h \in W^{1}$, then

$$E[h(X)] = h(b_{-}) - \int_{Ω} \frac{dh}{dy}(y)\, G(y)\, dy .$$

Proof. For any smooth function $h$, using the zero-order expansion (3) from Lemma 1 (with any zero-one-ended $g$) and bearing in mind Fubini’s theorem [27,28], one can write

$$\begin{aligned} E[h(X)] &= \int_{Ω} h(y) \frac{dg}{dy}(y)\, dy + \iint_{Ω^{2}} \frac{dh}{dy}(y) \left( g(y) - \mathbb{1}_{[y \geq x]} \right) dy\, dG(x) \\ &= \int_{Ω} h(y) \frac{dg}{dy}(y)\, dy + \int_{Ω} \frac{dh}{dy}(y) \left( g(y) - G(y) \right) dy \\ &= \left[ h(y) g(y) \right]_{a}^{b} - \int_{Ω} \frac{dh}{dy}(y)\, G(y)\, dy \\ &= \lim_{x \to b^{-}} h(x) - \int_{Ω} \frac{dh}{dy}(y)\, G(y)\, dy, \end{aligned}$$

and the result holds. □
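As a sanity check of Proposition 1, the sketch below compares $E[h(X)]$, computed directly, with $h(b_{-}) - \int_{Ω} h'(y) G(y)\, dy$. For illustration only, it assumes $X \sim \mathrm{Beta}(2, 3)$ on $(0, 1)$, with density $12y(1-y)^{2}$ and CDF $6y^{2} - 8y^{3} + 3y^{4}$, and $h(y) = \cos(3y)$.

```python
import numpy as np

NODES, WEIGHTS = np.polynomial.legendre.leggauss(100)

def integrate(func, a=0.0, b=1.0):
    """Gauss-Legendre approximation of the integral of a vectorised func over [a, b]."""
    x = 0.5 * (b - a) * NODES + 0.5 * (b + a)
    return 0.5 * (b - a) * np.dot(WEIGHTS, func(x))

# Illustrative distribution (assumption): X ~ Beta(2, 3) on Omega = (0, 1).
pdf = lambda y: 12.0 * y * (1.0 - y) ** 2
cdf = lambda y: 6.0 * y ** 2 - 8.0 * y ** 3 + 3.0 * y ** 4

# Illustrative smooth function h and its derivative (assumption).
h = lambda y: np.cos(3.0 * y)
dh = lambda y: -3.0 * np.sin(3.0 * y)

direct = integrate(lambda y: h(y) * pdf(y))               # E[h(X)]
via_prop1 = h(1.0) - integrate(lambda y: dh(y) * cdf(y))  # h(b-) - int h' G dy
print(direct, via_prop1)
```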

Applications of such a result give new identities involving the moments of functions evaluated at random variables. The following corollary provides such identities.

Corollary 1. Let $X \sim G$ be a random variable having $G$ as its CDF. Assume that $f \in W^{1}$ and that $f(X)$ has a finite $(p+1)$th-order moment. Then, for any integer $1 \leq r \leq p + 1$,

$$E[f^{r}(X)] = f^{r}(b_{-}) - r \int_{Ω} f^{r-1}(y)\, Df(y)\, G(y)\, dy,$$

$$E\left[ \left( f(X) - E[f(X)] \right)^{r} \right] = \left( f(b_{-}) - E[f(X)] \right)^{r} - r \int_{Ω} \left( f(y) - E[f(X)] \right)^{r-1} Df(y)\, G(y)\, dy .$$

Proof. The first result follows from Proposition 1 with $h = f^{r}$, and the second one with $h = \left( f - E[f(X)] \right)^{r}$. □

In view of Corollary 1, the moments of $f(X)$ do not depend on the function $g$ used to expand $f$. It is also shown in [29] that the variance of $f(X)$ is

$$V[f(X)] = \int_{Ω^{2}} Df(y)\, Df(z) \left[ G(\min(y, z)) - G(y) G(z) \right] dy\, dz .$$
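The identities of Corollary 1 (with $r = 2$) and the variance formula can be verified on a case with known answers. Assume, for illustration, $X \sim \mathcal{U}(0, 1)$ (so $G(y) = y$) and $f(x) = x^{2}$. Then $E[f^{2}(X)] = 1/5$ and $V[f(X)] = 4/45$. The sketch below recovers these values by quadrature, splitting the inner integral at the kink of $\min(y, z)$.

```python
import numpy as np

NODES, WEIGHTS = np.polynomial.legendre.leggauss(60)

def integrate(func, a, b):
    """Gauss-Legendre approximation of the integral of a vectorised func over [a, b]."""
    x = 0.5 * (b - a) * NODES + 0.5 * (b + a)
    return 0.5 * (b - a) * np.dot(WEIGHTS, func(x))

# Illustrative choices (assumptions): X ~ U(0, 1), so G(y) = y on (0, 1); f(x) = x^2.
G = lambda y: y
f = lambda y: y ** 2
Df = lambda y: 2.0 * y

# Corollary 1 with r = 2: E[f^2(X)] = f^2(b-) - 2 int f Df G dy.
second_moment = f(1.0) ** 2 - 2.0 * integrate(lambda y: f(y) * Df(y) * G(y), 0.0, 1.0)

def kernel(y, z):
    # Df(y) [G(min(y, z)) - G(y) G(z)]
    return Df(y) * (G(np.minimum(y, z)) - G(y) * G(z))

def inner(zs):
    # Inner integral over y, split at z where min(y, z) has a kink.
    return np.array([integrate(lambda y: kernel(y, z), 0.0, z) + integrate(lambda y: kernel(y, z), z, 1.0)
                     for z in zs])

variance = integrate(lambda z: Df(z) * inner(z), 0.0, 1.0)
mean = integrate(f, 0.0, 1.0)
print(second_moment, variance, second_moment - mean ** 2)  # close to 1/5, 4/45, 4/45
```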

2.1. Probabilistic Expansions of Functions

In this section, the expansions of $f$ based on particular zero-one-ended functions, such as CDFs, are considered. Before giving such expansions, the following intermediate results are needed. Denote with $F$ a continuous CDF and $ρ$ the corresponding probability density function, that is, $DF = ρ$.

Proposition 2. Let $x \in Ω$, $α \in \mathbb{R}$, and let $Y \sim F$ be a random variable having a finite $(p+1)$th-order moment. Then, the following identity holds:

$$E\left[ (Y - α)^{p} \frac{F(Y) - \mathbb{1}_{[Y \geq x]}}{ρ(Y)} \right] = \frac{1}{p+1} \left\{ (x - α)^{p+1} - E\left[ (Y - α)^{p+1} \right] \right\} . \tag{4}$$

Proof. Let $B := \frac{1}{p+1} E\left[ (Y - α)^{p+1} \right]$. Since the density $ρ$ cancels inside the expectation, integration by parts gives

$$\begin{aligned} E\left[ (Y - α)^{p} \frac{F(Y) - \mathbb{1}_{[Y \geq x]}}{ρ(Y)} \right] &= \int_{Ω} (y - α)^{p} F(y)\, dy - \int_{x}^{b} (y - α)^{p}\, dy \\ &= \frac{1}{p+1} \left[ (y - α)^{p+1} F(y) \right]_{a}^{b} - B - \frac{1}{p+1} \left[ (y - α)^{p+1} \right]_{x}^{b} \\ &= \frac{1}{p+1} (x - α)^{p+1} - \frac{1}{p+1} E\left[ (Y - α)^{p+1} \right], \end{aligned}$$

and the result holds. □
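Because the density cancels inside the expectation, the left-hand side of (4) reduces to $\int_{Ω} (y - α)^{p} (F(y) - \mathbb{1}_{[y \geq x]})\, dy$. The sketch below checks (4) under illustrative assumptions: $Y \sim \mathrm{Beta}(2, 3)$, $α = 0.4$, $p = 3$, and $x = 0.7$. It uses Gauss–Legendre quadrature split at $x$.

```python
import numpy as np

NODES, WEIGHTS = np.polynomial.legendre.leggauss(100)

def integrate(func, a, b):
    """Gauss-Legendre approximation of the integral of a vectorised func over [a, b]."""
    t = 0.5 * (b - a) * NODES + 0.5 * (b + a)
    return 0.5 * (b - a) * np.dot(WEIGHTS, func(t))

# Illustrative choices (assumptions): Y ~ Beta(2, 3) on (0, 1), alpha = 0.4, p = 3, x = 0.7.
pdf = lambda y: 12.0 * y * (1.0 - y) ** 2
cdf = lambda y: 6.0 * y ** 2 - 8.0 * y ** 3 + 3.0 * y ** 4
alpha, p, x = 0.4, 3, 0.7

# Left-hand side of (4): int (y - alpha)^p (F(y) - 1[y >= x]) dy, split where the indicator jumps.
integrand = lambda y: (y - alpha) ** p * (cdf(y) - np.where(y >= x, 1.0, 0.0))
lhs = integrate(integrand, 0.0, x) + integrate(integrand, x, 1.0)

# Right-hand side of (4).
moment = integrate(lambda y: (y - alpha) ** (p + 1) * pdf(y), 0.0, 1.0)
rhs = ((x - alpha) ** (p + 1) - moment) / (p + 1)
print(lhs, rhs)
```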

Lemma 2. Let $Y^{(0)} := x \in Ω$, let $Y \sim F$ be a random variable, and let $\{Y^{(j)}\}_{j=1}^{p+1}$ be i.i.d. copies of $Y$. If $Y$ has a finite $(p+1)$th-order moment, then

$$E\left[ \prod_{j=1}^{p} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] = \sum_{j=0}^{p} \frac{x^{j}}{j!} U_{p-j}, \tag{5}$$

with the sequence

$$U_{0} := 1, \qquad U_{k+1} = - \sum_{i=1}^{k+1} \frac{E[Y^{i}]}{i!} U_{k+1-i}, \quad \forall k \geq 0 . \tag{6}$$

Proof. Let us show the result by induction on $p$. The first step (i.e., $p = 1$) holds by applying (4) with exponent $0$ (the value of $α$ being then irrelevant), that is,

$$E\left[ \frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq x]}}{ρ(Y^{(1)})} \right] = x - E[Y] = U_{1} + x\, U_{0} .$$

Secondly, suppose that Equation (5) holds for all orders $q = 1, \dots, p$.

Thirdly, let us show that (5) also holds for $q = p + 1$. Conditioning on $Y^{(1)}$, applying the induction hypothesis (Step 2) with $x = Y^{(1)}$, and then (4) with $α = 0$, we obtain

$$\begin{aligned} & E\left[ \prod_{j=1}^{p+1} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] \\ &= E_{Y^{(1)}}\left[ E_{Y^{(2)}, \dots, Y^{(p+1)}}\left[ \prod_{j=2}^{p+1} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] \frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq Y^{(0)}]}}{ρ(Y^{(1)})} \right] \\ &= E_{Y^{(1)}}\left[ E_{Y^{(2)}, \dots, Y^{(p+1)}}\left[ \prod_{k=1}^{p} \frac{F(Y^{(k+1)}) - \mathbb{1}_{[Y^{(k+1)} \geq Y^{(k)}]}}{ρ(Y^{(k+1)})} \right] \frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq Y^{(0)}]}}{ρ(Y^{(1)})} \right] \\ &\overset{\text{Step 2}}{=} \sum_{j=0}^{p} \frac{U_{p-j}}{j!} E_{Y^{(1)}}\left[ (Y^{(1)})^{j} \frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq Y^{(0)}]}}{ρ(Y^{(1)})} \right] \\ &\overset{(4)}{=} \sum_{j=0}^{p} \frac{U_{p-j}}{(j+1)!} (Y^{(0)})^{j+1} - \sum_{j=0}^{p} \frac{U_{p-j}}{(j+1)!} E[Y^{j+1}] \\ &= \sum_{j'=1}^{p+1} \frac{U_{p+1-j'}}{j'!} (Y^{(0)})^{j'} + U_{p+1} = \sum_{j'=0}^{p+1} \frac{U_{p+1-j'}}{j'!} (Y^{(0)})^{j'}, \end{aligned}$$

using the change of index $j' = j + 1$ and Equation (6). □
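Equation (6) is the coefficient recursion for the reciprocal of the power series $\sum_{i \geq 0} E[Y^{i}] t^{i}/i!$, that is, of the moment generating function of $Y$. For $Y \sim \mathcal{U}(0, 1)$ this reciprocal is $t/(e^{t} - 1)$. Hence $U_{k} = B_{k}/k!$, where $B_{k}$ are the Bernoulli numbers (with $B_{1} = -1/2$), and the right-hand side of (5) is $B_{p}(x)/p!$ with the Bernoulli polynomial $B_{p}$. The exact-arithmetic sketch below checks this for the uniform case. The helper names are hypothetical, and the code is not taken from the paper.

```python
from fractions import Fraction
from math import factorial

def u_sequence(raw_moments, order):
    """U_0, ..., U_order from Equation (6); raw_moments[i] = E[Y^i], raw_moments[0] = 1."""
    u = [Fraction(1)]
    for n in range(1, order + 1):  # n plays the role of k + 1 in (6)
        u.append(-sum(Fraction(raw_moments[i]) / factorial(i) * u[n - i]
                      for i in range(1, n + 1)))
    return u

def lemma2_rhs(u, p, x):
    """Right-hand side of Equation (5): sum_j x^j / j! * U_{p-j}."""
    return sum(Fraction(x) ** j / factorial(j) * u[p - j] for j in range(p + 1))

# Illustrative case (assumption): Y ~ U(0, 1), whose raw moments are E[Y^i] = 1 / (i + 1).
moments = [Fraction(1, i + 1) for i in range(9)]
U = u_sequence(moments, 8)
print([str(U[k] * factorial(k)) for k in range(9)])
# ['1', '-1/2', '1/6', '0', '-1/30', '0', '1/42', '0', '-1/30']  (Bernoulli numbers)
print(lemma2_rhs(U, 2, Fraction(1, 3)))  # -1/36, i.e. B_2(1/3) / 2!
```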

It is often useful to express the above expectation around a constant, such as the expectation of $Y$. Such a result is provided in Lemma 3.

Lemma 3. Let $Y^{(0)} := x \in Ω$, let $Y \sim F$ be a random variable, and let $\{Y^{(j)}\}_{j=1}^{p+1}$ be i.i.d. copies of $Y$. If $Y$ has a finite $(p+1)$th-order moment, then

$$E\left[ \prod_{j=1}^{p} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] = \sum_{j=0}^{p} V_{p-j} \frac{(x - E[Y])^{j}}{j!}, \tag{7}$$

with the sequence

$$V_{0} := 1, \qquad V_{1} := 0, \qquad V_{k+1} = - \sum_{i=2}^{k+1} V_{k+1-i} \frac{E\left[ (Y - E[Y])^{i} \right]}{i!}, \quad \forall k \geq 1 . \tag{8}$$

Proof. The first step (i.e., $p = 1$) holds by applying (4) with exponent $0$, that is,

$$E\left[ \frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq x]}}{ρ(Y^{(1)})} \right] = x - E[Y] = V_{1} + V_{0} (x - E[Y]) .$$

Secondly, suppose that Equation (7) holds for all orders $q = 1, \dots, p$.

Thirdly, let us show that (7) also holds for $q = p + 1$. Indeed, using the induction hypothesis (Step 2) with $x = Y^{(1)}$ and then (4) with $α = E[Y]$,

$$\begin{aligned} & E\left[ \prod_{j=1}^{p+1} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] \\ &= E_{Y^{(1)}}\left[ E_{Y^{(2)}, \dots, Y^{(p+1)}}\left[ \prod_{j=2}^{p+1} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] \frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq Y^{(0)}]}}{ρ(Y^{(1)})} \right] \\ &= E_{Y^{(1)}}\left[ E_{Y^{(2)}, \dots, Y^{(p+1)}}\left[ \prod_{k=1}^{p} \frac{F(Y^{(k+1)}) - \mathbb{1}_{[Y^{(k+1)} \geq Y^{(k)}]}}{ρ(Y^{(k+1)})} \right] \frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq Y^{(0)}]}}{ρ(Y^{(1)})} \right] \\ &\overset{\text{Step 2}}{=} \sum_{j=0}^{p} \frac{V_{p-j}}{j!} E_{Y^{(1)}}\left[ (Y^{(1)} - E[Y])^{j} \frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq Y^{(0)}]}}{ρ(Y^{(1)})} \right] \\ &\overset{(4)}{=} \sum_{j=0}^{p} \frac{V_{p-j}}{(j+1)!} (Y^{(0)} - E[Y])^{j+1} - \sum_{j=0}^{p} \frac{V_{p-j}}{(j+1)!} E\left[ (Y - E[Y])^{j+1} \right] \\ &= \sum_{j'=1}^{p+1} \frac{V_{p+1-j'}}{j'!} (Y^{(0)} - E[Y])^{j'} + V_{p+1} = \sum_{j'=0}^{p+1} \frac{V_{p+1-j'}}{j'!} (Y^{(0)} - E[Y])^{j'}, \end{aligned}$$

using the change of index $j' = j + 1$, Equation (8), and the fact that $E[Y - E[Y]] = 0$. □
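A companion sketch for Lemma 3 first computes $V_{0}, \dots, V_{4}$ from (8) in exact arithmetic for $Y \sim \mathcal{U}(0, 1)$, where one finds $V_{2} = -1/24$ and $V_{4} = 7/5760$. It then checks (7) with $p = 2$ for $Y \sim \mathrm{Beta}(2, 3)$ and $x = 0.7$ by nested quadrature; in that check the densities cancel and a double integral of CDF differences remains. The helper names and the distributions are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial
import numpy as np

def v_sequence(central_moments, order):
    """V_0..V_order from Equation (8); central_moments[i] = E[(Y - E[Y])^i]."""
    v = [1, 0][: order + 1]
    for n in range(2, order + 1):  # n plays the role of k + 1 in (8)
        v.append(-sum(v[n - i] * central_moments[i] / factorial(i) for i in range(2, n + 1)))
    return v

# (a) Y ~ U(0, 1): E[(Y - 1/2)^i] = (1/2)^i / (i + 1) for even i and 0 for odd i.
uniform_central = [Fraction(1, 2) ** i / (i + 1) if i % 2 == 0 else Fraction(0) for i in range(5)]
print([str(v) for v in v_sequence(uniform_central, 4)])  # ['1', '0', '-1/24', '0', '7/5760']

# (b) Numerical check of (7) with p = 2 for Y ~ Beta(2, 3) (illustrative assumption).
NODES, WEIGHTS = np.polynomial.legendre.leggauss(80)

def integrate(func, a, b):
    t = 0.5 * (b - a) * NODES + 0.5 * (b + a)
    return 0.5 * (b - a) * np.dot(WEIGHTS, func(t))

def integrate_split(func, c):
    # Integral over (0, 1) split at the jump location c of an indicator.
    return integrate(func, 0.0, c) + integrate(func, c, 1.0)

pdf = lambda y: 12.0 * y * (1.0 - y) ** 2
cdf = lambda y: 6.0 * y ** 2 - 8.0 * y ** 3 + 3.0 * y ** 4
step = lambda y, c: np.where(y >= c, 1.0, 0.0)
mean = integrate(lambda y: y * pdf(y), 0.0, 1.0)
central = [integrate(lambda y: (y - mean) ** i * pdf(y), 0.0, 1.0) for i in range(3)]
V = v_sequence(central, 2)

x = 0.7
inner = lambda ys: np.array([integrate_split(lambda y2: cdf(y2) - step(y2, y1), y1) for y1 in ys])
lhs = integrate_split(lambda y1: (cdf(y1) - step(y1, x)) * inner(y1), x)
rhs = sum(V[2 - j] * (x - mean) ** j / factorial(j) for j in range(3))
print(lhs, rhs)
```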

Now, we have all the elements in hand to provide the Taylor-type expansions of functions based on CDFs of random variables and their moments (see Theorem 1). To that end, consider the sequences $U_{k}$ and $V_{k}$ given by (6) and (8), respectively, and define the coefficients

$$α_{j} := \sum_{k=0}^{p-j} U_{k}\, E\left[ D^{k+j} f(Y) \right]; \qquad β_{j} := \sum_{k=0}^{p-j} V_{k}\, E\left[ D^{k+j} f(Y) \right]; \qquad j = 0, \dots, p;$$

and

$$R_{p} := E\left[ D^{p} f(Y^{(p)}) \prod_{j=1}^{p} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] .$$

Theorem 1. Let $Y^{(0)} := x \in Ω$, let $Y \sim F$ be a random variable, let $\{Y^{(j)}\}_{j=1}^{p+1}$ be i.i.d. copies of $Y$, and assume that (A1) holds. Then, the $p$th-order expansions of $f$ at $x$ are

$$f(x) = \sum_{j=0}^{p} α_{j} \frac{x^{j}}{j!} + R_{p+1}; \tag{9}$$

$$f(x) = \sum_{j=0}^{p} β_{j} \frac{(x - E[Y])^{j}}{j!} + R_{p+1} . \tag{10}$$

Proof. When taking $g = F$, Equation (2) becomes

$$\begin{aligned} f(x) = {} & \sum_{k=0}^{p} E\left[ D^{k} f(Y) \right] E\left[ \prod_{j=1}^{k} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] \\ & + E\left[ D^{p+1} f(Y^{(p+1)}) \prod_{j=1}^{p+1} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right], \end{aligned}$$

with the convention $\prod_{j=1}^{0} τ_{j} := 1$. For the first result, using Lemma 2 at order $k$, that is,

$$E\left[ \prod_{j=1}^{k} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] = \sum_{j=0}^{k} \frac{U_{k-j}}{j!} x^{j},$$

we can write

$$\begin{aligned} f(x) &= \sum_{k=0}^{p} E\left[ D^{k} f(Y) \right] \sum_{j=0}^{k} U_{k-j} \frac{x^{j}}{j!} + E\left[ D^{p+1} f(Y^{(p+1)}) \prod_{j=1}^{p+1} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] \\ &= \sum_{j=0}^{p} \sum_{k=j}^{p} E\left[ D^{k} f(Y) \right] U_{k-j} \frac{x^{j}}{j!} + E\left[ D^{p+1} f(Y^{(p+1)}) \prod_{j=1}^{p+1} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right], \end{aligned}$$

which is (9) after the change of index $k \mapsto k - j$ in the inner sum. The second result holds by the same reasoning, thanks to Lemma 3 at order $k$, that is,

$$E\left[ \prod_{j=1}^{k} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] = \sum_{j=0}^{k} \frac{V_{k-j}}{j!} (x - E[Y])^{j} .$$

□

Theorem 1 provides exact and global expansions of smooth functions at any order $p \geq 1$, and different expansions are possible depending on the choice of the CDF. For polynomials of degree at most $p$, the last terms in Equations (9) and (10) vanish, so that exact expansions hold using only the weak derivatives of $f$ up to order $p$. In general, the last terms are known as remainder terms. The smaller they are, the more accurate the truncated expansions are, that is,

$$f(x) \approx \tilde{f}_{1}(x) := \sum_{j=0}^{p} α_{j} \frac{x^{j}}{j!}; \qquad f(x) \approx \tilde{f}_{2}(x) := \sum_{j=0}^{p} β_{j} \frac{(x - E[Y])^{j}}{j!} .$$
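The truncated approximation $\tilde{f}_{2}$ can be sketched as follows. The code computes $β_{j}$ from (8) for $Y \sim \mathcal{U}(0, 1)$ and then (a) checks exactness for the cubic $x^{3} - 2x + 1$ with $p = 3$, and (b) for $f = \exp$ with $p = 2$, prints the maximum error over $[0, 1]$ next to that of the second-order Taylor polynomial at $E[Y] = 1/2$. The distribution, the test functions, the closed-form derivative means, and the helper names are illustrative assumptions.

```python
import numpy as np
from math import factorial

def v_sequence(central_moments, order):
    """V_0..V_order from Equation (8); central_moments[i] = E[(Y - E[Y])^i]."""
    v = [1.0, 0.0][: order + 1]
    for n in range(2, order + 1):
        v.append(-sum(v[n - i] * central_moments[i] / factorial(i) for i in range(2, n + 1)))
    return v

def beta_coefficients(deriv_means, central_moments, p):
    """beta_j = sum_{k=0}^{p-j} V_k E[D^{k+j} f(Y)], where deriv_means[k] = E[D^k f(Y)]."""
    v = v_sequence(central_moments, p)
    return [sum(v[k] * deriv_means[k + j] for k in range(p - j + 1)) for j in range(p + 1)]

def f_tilde2(x, beta, mean):
    """Expansion (10) with the remainder R_{p+1} dropped."""
    return sum(b * (x - mean) ** j / factorial(j) for j, b in enumerate(beta))

# Illustrative case (assumption): Y ~ U(0, 1), E[Y] = 1/2,
# E[(Y - 1/2)^i] = (1/2)^i / (i + 1) for even i, 0 for odd i.
central = [0.5 ** i / (i + 1) if i % 2 == 0 else 0.0 for i in range(6)]

# (a) f(x) = x^3 - 2x + 1, p = 3: E[f], E[f'], E[f''], E[f'''] under U(0, 1).
cubic = lambda x: x ** 3 - 2 * x + 1
beta_cubic = beta_coefficients([0.25, -1.0, 3.0, 6.0], central, 3)

# (b) f = exp, p = 2: every derivative has mean e - 1 under U(0, 1).
beta_exp = beta_coefficients([np.e - 1.0] * 3, central, 2)

grid = np.linspace(0.0, 1.0, 201)
err_cubic = np.max(np.abs(f_tilde2(grid, beta_cubic, 0.5) - cubic(grid)))
err_exp = np.max(np.abs(f_tilde2(grid, beta_exp, 0.5) - np.exp(grid)))
taylor = sum(np.exp(0.5) * (grid - 0.5) ** j / factorial(j) for j in range(3))
err_taylor = np.max(np.abs(taylor - np.exp(grid)))
print(err_cubic, err_exp, err_taylor)
```

Supplying other central moments, together with the matching derivative means, lets one compare other choices of $F$ in the same way.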

Instead of using the truncated approximations $\tilde{f}_{1}$ and $\tilde{f}_{2}$, one may consider the exact expansion at order $p - 1$, which includes the remainder term $R_{p}$, when only the weak derivatives of $f$ up to order $p$ are available. Indeed, the expectation $R_{p}$ can be estimated in that situation, leading to the following estimator:

$$\hat{f}(x) = \sum_{j=0}^{p-1} β_{j} \frac{(x - E[Y])^{j}}{j!} + \frac{1}{N} \sum_{i=1}^{N} D^{p} f(Y_{i}^{(p)}) \prod_{j=1}^{p} \frac{F(Y_{i}^{(j)}) - \mathbb{1}_{[Y_{i}^{(j)} \geq Y_{i}^{(j-1)}]}}{ρ(Y_{i}^{(j)})},$$

where the coefficients $β_{j}$ are those of the expansion at order $p - 1$, $Y_{i}^{(0)} := x$, and $\{Y_{i}^{(j)}\}_{i=1}^{N}$ is an i.i.d. sample of $Y^{(j)}$ for any $j \in \{1, \dots, p\}$.
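The estimator $\hat{f}$ can be sketched for $p = 2$. For illustration only, the sketch assumes $Y \sim \mathcal{U}(0, 1)$ (so $F(y) = y$, $ρ(y) = 1$, and $E[Y] = 1/2$) and $f = \exp$. At order $p - 1 = 1$, the coefficients are then $β_{0} = E[f(Y)]$ and $β_{1} = E[Df(Y)]$, both equal to $e - 1$. The function name `f_hat`, the sample size, and the seed are hypothetical choices.

```python
import numpy as np

def f_hat(x, n_samples=200_000, seed=0):
    """Monte Carlo sketch of the estimator with p = 2 (assumptions: Y ~ U(0, 1), f = exp).

    The order-1 part uses beta_0 = E[f(Y)] and beta_1 = E[Df(Y)] (V_1 = 0), both e - 1 here.
    The remainder R_2 is estimated by averaging
    D^2 f(Y2) * (F(Y1) - 1[Y1 >= x]) * (F(Y2) - 1[Y2 >= Y1]), with F(y) = y and rho(y) = 1.
    """
    rng = np.random.default_rng(seed)
    y1 = rng.random(n_samples)
    y2 = rng.random(n_samples)
    beta0 = beta1 = np.e - 1.0
    factor1 = y1 - np.where(y1 >= x, 1.0, 0.0)   # (F(Y1) - 1[Y1 >= Y0]) / rho(Y1), Y0 = x
    factor2 = y2 - np.where(y2 >= y1, 1.0, 0.0)  # (F(Y2) - 1[Y2 >= Y1]) / rho(Y2)
    remainder = np.mean(np.exp(y2) * factor1 * factor2)
    return beta0 + beta1 * (x - 0.5) + remainder

for x in (0.1, 0.3, 0.8):
    print(x, f_hat(x), np.exp(x))  # agreement up to Monte Carlo error
```

Each summand is bounded by $e$ in absolute value here, so the Monte Carlo standard error is at most about $e/\sqrt{N}$.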

In addition to polynomials of degree up to $p$, note that, for any function $f \in W^{p+1}$, the approximations $\tilde{f}_{1}$ and $\tilde{f}_{2}$ of $f$ become exact when

$$R_{p+1} = E\left[ D^{p+1} f(Y^{(p+1)}) \prod_{j=1}^{p+1} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] = 0 .$$

The most important question is as follows: can we find or construct distribution functions $F$ such that $R_{p+1} = 0$? Without an answer to that open problem, this study examines error bounds so as to find a reasonable CDF (see Section 3).

2.2. Links with Taylor–Young and Maclaurin Series

Taylor series are well known and used by different communities for locally approximating functions. In this section, we show in Corollary 2 that such series are particular cases of the expansions of functions derived in Theorem 1. To that end, for any $x_{0} \in Ω$, denote with $\{U_{m} \sim \mathcal{U}(x_{0} - \frac{1}{2m}, x_{0} + \frac{1}{2m})\}_{m \in \mathbb{N} \setminus \{0\}}$ a sequence of uniformly distributed random variables, $U_{m}$ having $F_{m}$ as its CDF and $ρ_{m}$ as its density. Applying Equation (10) with $F_{m}$ yields

$$\begin{aligned} f_{m}(x) := {} & \sum_{j=0}^{p} \sum_{k=0}^{p-j} E\left[ D^{k+j} f(U_{m}) \right] V_{k,m} \frac{(x - x_{0})^{j}}{j!} \\ & + E\left[ D^{p+1} f(U_{m}^{(p+1)}) \prod_{j=1}^{p+1} \frac{F_{m}(U_{m}^{(j)}) - \mathbb{1}_{[U_{m}^{(j)} \geq U_{m}^{(j-1)}]}}{ρ_{m}(U_{m}^{(j)})} \right], \end{aligned}$$

where $V_{k,m}$ denotes the sequence (8) associated with $U_{m}$ and $U_{m}^{(0)} := x$. This is a sequence of functions that also expand $f$. It is worth noting, however, that $f(x) = f_{m}(x)$ holds only for $x \in [x_{0} - \frac{1}{2m}, x_{0} + \frac{1}{2m}]$, because $F_{m}$ is a zero-one-ended function on that support only.

Corollary 2. Let $x \in Ω$ and assume that (A1) holds. Then,

$$\lim_{m \to \infty} f_{m}(x) = \sum_{j=0}^{p} D^{j} f(x_{0}) \frac{(x - x_{0})^{j}}{j!} .$$

Proof. The random variable $U_{m} \sim \mathcal{U}(x_{0} - \frac{1}{2m}, x_{0} + \frac{1}{2m})$ has the density $ρ_{m}(x) := m\, \mathbb{1}_{[x_{0} - \frac{1}{2m}, x_{0} + \frac{1}{2m}]}(x)$ for all $m > 0$. The density $ρ_{m}$ is also known as the rectangular function. As $m \to \infty$, it converges to the Dirac delta $δ_{x_{0}}(x) := δ(x - x_{0})$ in the sense of distributions, that is, $U_{m}$ converges in distribution to $x_{0}$. The CDF of $U_{m}$ is given by $F_{m}(x) := m (x - x_{0} + \frac{1}{2m})$ on its support, and $E[U_{m}] = x_{0}$. Therefore, the sequence given by Equation (8) becomes

$$V_{0,m} := 1, \qquad V_{1,m} := 0, \qquad V_{k+1,m} = - \sum_{i=2}^{k+1} V_{k+1-i,m} \frac{E\left[ (U_{m} - x_{0})^{i} \right]}{i!}, \quad \forall k \geq 1,$$

and a direct calculation gives $\lim_{m \to \infty} V_{k+1,m} = 0$ because, for all $i > 0$,

$$\lim_{m \to \infty} E\left[ (U_{m} - x_{0})^{i} \right] = \lim_{m \to \infty} \int_{\mathbb{R}} (x - x_{0})^{i} ρ_{m}(x)\, dx = \int_{\mathbb{R}} δ_{x_{0}}(x) (x - x_{0})^{i}\, dx = 0 .$$

Moreover, given the identity

$$\lim_{m \to \infty} E\left[ D^{j} f(U_{m}) \right] = \lim_{m \to \infty} \int_{\mathbb{R}} D^{j} f(x) ρ_{m}(x)\, dx = \int_{\mathbb{R}} D^{j} f(x) δ_{x_{0}}(x)\, dx = D^{j} f(x_{0}),$$

one can write the following limit of $f_{m}$:

$$\begin{aligned} \lim_{m \to \infty} f_{m}(x) = {} & \sum_{j=0}^{p} \lim_{m \to \infty} E\left[ D^{j} f(U_{m}) \right] \frac{(x - x_{0})^{j}}{j!} \\ & + \lim_{m \to \infty} E\left[ D^{p+1} f(U_{m}^{(p+1)}) \prod_{j=1}^{p+1} \left( U_{m}^{(j)} - x_{0} + \frac{1}{2m} - \frac{\mathbb{1}_{[U_{m}^{(j)} \geq U_{m}^{(j-1)}]}}{m} \right) \right] \\ = {} & \sum_{j=0}^{p} D^{j} f(x_{0}) \frac{(x - x_{0})^{j}}{j!}, \end{aligned}$$

because the last limit vanishes by the dominated convergence theorem. Indeed, using $|a + b| \leq |a| + |b|$ for all real $a, b$ and $\mathbb{1}_{A} \leq \mathbb{1}_{B}$ for all sets $A \subseteq B$, we have, for $m \geq 1$,

$$m \left| z^{(j)} - x_{0} + \frac{1}{2m} - \frac{\mathbb{1}_{[z^{(j)} \geq z^{(j-1)}]}}{m} \right| \mathbb{1}_{[x_{0} - \frac{1}{2m}, x_{0} + \frac{1}{2m}]}(z^{(j)}) \leq 2\, \mathbb{1}_{[x_{0} - \frac{1}{2}, x_{0} + \frac{1}{2}]}(z^{(j)}),$$

and

$$\begin{aligned} & \left| D^{p+1} f(z^{(p+1)}) \prod_{j=1}^{p+1} \left( z^{(j)} - x_{0} + \frac{1}{2m} - \frac{\mathbb{1}_{[z^{(j)} \geq z^{(j-1)}]}}{m} \right) m\, \mathbb{1}_{[x_{0} - \frac{1}{2m}, x_{0} + \frac{1}{2m}]}(z^{(j)}) \right| \\ & \leq \left| D^{p+1} f(z^{(p+1)}) \right| \prod_{j=1}^{p+1} 2\, \mathbb{1}_{[x_{0} - \frac{1}{2}, x_{0} + \frac{1}{2}]}(z^{(j)}) . \end{aligned}$$

Since the last term is integrable, the dominated convergence theorem allows us to write

$$\begin{aligned} & \lim_{m \to \infty} E\left[ D^{p+1} f(U_{m}^{(p+1)}) \prod_{j=1}^{p+1} \left( U_{m}^{(j)} - x_{0} + \frac{1}{2m} - \frac{\mathbb{1}_{[U_{m}^{(j)} \geq U_{m}^{(j-1)}]}}{m} \right) \right] \\ &= \int \lim_{m \to \infty} D^{p+1} f(z^{(p+1)}) \prod_{j=1}^{p+1} \left( z^{(j)} - x_{0} + \frac{1}{2m} - \frac{\mathbb{1}_{[z^{(j)} \geq z^{(j-1)}]}}{m} \right) m\, \mathbb{1}_{[x_{0} - \frac{1}{2m}, x_{0} + \frac{1}{2m}]}(z^{(j)})\, dz^{(j)} \\ &= \int D^{p+1} f(z^{(p+1)}) \prod_{j=1}^{p+1} (z^{(j)} - x_{0})\, δ_{x_{0}}(z^{(j)})\, dz^{(j)} = 0, \end{aligned}$$

and the result holds. More directly, each factor $U_{m}^{(j)} - x_{0} + \frac{1}{2m} - \frac{1}{m} \mathbb{1}_{[U_{m}^{(j)} \geq U_{m}^{(j-1)}]}$ lies in $[-\frac{1}{m}, \frac{1}{m}]$. Hence the remainder is bounded in absolute value by $m^{-(p+1)} \sup_{|z - x_{0}| \leq \frac{1}{2m}} |D^{p+1} f(z)|$, which tends to zero since $D^{p+1} f$ is continuous under (A1). □

Corollary 2 thus recovers the Taylor polynomial of $f$ at $x_{0}$, which is exact at $x = x_{0}$ and provides good approximations of $f(x)$ for $x \in [x_{0} - \frac{1}{2m}, x_{0} + \frac{1}{2m}]$ with $m > 1$. Extending such approximations to any $x \in Ω$ might not give accurate results, except in some particular cases. Indeed, it is common to expand $f$ within different subsets $Ω_{i}$ of $Ω$ that form a partition of $Ω$ when derivative data sets are available within each $Ω_{i}$.
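Corollary 2 can be illustrated numerically. As $m$ grows, the coefficients $\sum_{k=0}^{p-j} V_{k,m} E[D^{k+j} f(U_{m})]$ of $(x - x_{0})^{j}/j!$ in $f_{m}$ approach the Taylor coefficients $D^{j} f(x_{0})$. The sketch below assumes $f = \sin$, $x_{0} = 0.7$, and $p = 3$, and it evaluates the expectations under $U_{m}$ by Gauss–Legendre quadrature. These choices and the helper names are illustrative.

```python
import numpy as np
from math import factorial

NODES, WEIGHTS = np.polynomial.legendre.leggauss(30)

def uniform_mean(func, x0, m):
    """E[func(U_m)] for U_m ~ U(x0 - 1/(2m), x0 + 1/(2m)), by Gauss-Legendre quadrature."""
    half = 0.5 / m
    return 0.5 * np.dot(WEIGHTS, func(x0 + half * NODES))

def v_sequence(central_moments, order):
    """V_0..V_order from Equation (8)."""
    v = [1.0, 0.0][: order + 1]
    for n in range(2, order + 1):
        v.append(-sum(v[n - i] * central_moments[i] / factorial(i) for i in range(2, n + 1)))
    return v

def coefficients_m(derivs, x0, m, p):
    """Coefficients of (x - x0)^j / j! in f_m; derivs[k] is a vectorised D^k f."""
    central = [uniform_mean(lambda t: (t - x0) ** i, x0, m) for i in range(p + 1)]
    v = v_sequence(central, p)
    means = [uniform_mean(d, x0, m) for d in derivs]
    return [sum(v[k] * means[k + j] for k in range(p - j + 1)) for j in range(p + 1)]

# Illustrative case (assumption): f = sin, x0 = 0.7, p = 3.
derivs = [np.sin, np.cos, lambda t: -np.sin(t), lambda t: -np.cos(t)]
x0, p = 0.7, 3
taylor_coeffs = np.array([d(x0) for d in derivs])

errors = {}
for m in (1, 10, 1000):
    coeffs = np.array(coefficients_m(derivs, x0, m, p))
    errors[m] = np.max(np.abs(coeffs - taylor_coeffs))
print(errors)  # the gap to the Taylor coefficients shrinks as m grows
```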

3. Error Bounds and Choice of Distribution Functions

This section treats the choice of the distribution functions of random variables according to selected quality measures of function expansions, such as integrated or mean errors, mean squared errors (MSEs), and integrated MSEs (IMSEs).

In this section, the input values of a function follow a prescribed distribution function, that is, $X \sim F_{0}$. Without additional information on $X$, it is common to use uniform distributions or a Gaussian distribution with a large variance. Otherwise, $F_{0}$ can be estimated using well-established methods for density and/or distribution estimation (see e.g., [30,31,32,33]).

Recall that the main issue is to find distribution functions $F$ such that

$$R_{p+1} = E\left[ D^{p+1} f(Y^{(p+1)}) \prod_{j=1}^{p+1} \frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})} \right] = 0 .$$

In the case where $Y \sim δ_{a}$, with $δ_{a}$ being the Dirac measure, the distribution function is given by $F(y) = \mathbb{1}_{[y \geq a]}$ and $Y^{(j)} = a$ almost surely. Therefore, $R_{p+1} = 0$ is achieved, and the exact Taylor expansion at that point is obtained. However, the corresponding approximation is only accurate around the node $a$, as discussed in Section 2.2.

3.1. Zero-Mean Errors

Concerning Equation (10), $R_{p+1}$ can be seen as an error term, and it is common to require a zero mean error over the input space (i.e., a zero integrated mean) and the smallest integrated variance. To achieve a zero mean error, Theorem 2 gives the distribution function $F$ of the independent variables, as well as an upper bound on the corresponding IMSE.

Theorem 2. Consider the remainder term $R_{p+1}$ of Equations (9) and (10) and assume that (A1) holds. Denote with $ρ_{0}$ the density of $F_{0}$. If $F = F_{0}$, then

$$E[R_{p+1}(X)] = 0,$$

$$E\left[ R_{p+1}(X)^{2} \right] \leq \frac{1}{3} \left[ \sup_{y \in Ω} \frac{F_{0}(y) [1 - F_{0}(y)]}{ρ_{0}^{2}(y)} \right]^{p} E\left[ \left\{ \frac{D^{p+1} f(Y^{(p+1)})}{ρ_{0}(Y^{(p+1)})} \right\}^{2} \right] .$$

Moreover, if $ρ_{0} \geq ϵ > 0$, then

$$E\left[ R_{p+1}(X)^{2} \right] \leq \frac{1}{3 ϵ^{2p}} E\left[ \left\{ \frac{D^{p+1} f(Y^{(p+1)})}{ρ_{0}(Y^{(p+1)})} \right\}^{2} \right] .$$

Proof.

As

R_{p + 1} (X) = E [D^{p + 1} f (Y^{(p + 1)}) \prod_{j = 1}^{p + 1} \frac{F (Y^{(j)}) - {1 I}_{[Y^{(j)} \geq Y^{(j - 1)}]}}{ρ (Y^{(j)})}]

with

Y^{(0)} = X

, the first result holds using the Fubini theorem [27] and the fact that

E_{X} [{1 I}_{[Y^{(1)} \geq X]}] = F_{0} (Y^{(1)})

, leading to

F (Y^{(1)}) - E_{X} [{1 I}_{[Y^{(1)} \geq X]}] = 0

.

Since

F = F_{0}

and

E [R_{p + 1} (X)] = 0

, the integrated MSE is given by

\begin{matrix} E_{X} [R_{p + 1}^{2} (X)] & \leq & E_{X} [E^{2} \{|D^{p + 1} f (Y^{(n + 1)}) \prod_{j = 1}^{p + 1} \frac{F (Y^{(j)}) - {1 I}_{[Y^{(j)} \geq Y^{(j - 1)}]}}{ρ (Y^{(j)})}\}]] \\ \leq & E [{\{\frac{D^{p + 1} f (Y^{(p + 1)})}{ρ (Y^{(p + 1)})}\}}^{2}] E [ρ^{2} (Y^{(p + 1)}) \prod_{j = 1}^{p + 1} {(\frac{F (Y^{(j)}) - {1 I}_{[Y^{(j)} \geq Y^{(j - 1)}]}}{ρ (Y^{(j)})})}^{2}] \\ \leq & \frac{1}{3} {[sup_{y \in Ω} \frac{F_{0} (y) [1 - F_{0} (y)]}{ρ_{0}^{2} (y)}]}^{p} E [{\{\frac{D^{p + 1} f (Y^{(p + 1)})}{ρ (Y^{(p + 1)})}\}}^{2}] . \end{matrix}

For the third inequality, consider the term

E\left[ρ^{2}(Y^{(p+1)}) \prod_{j=1}^{p+1} \left(\frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})}\right)^{2}\right].

Since the square of an indicator is the indicator itself,

\left(\frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})}\right)^{2} = \frac{F^{2}(Y^{(j)}) - 2F(Y^{(j)}) \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]} + \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ^{2}(Y^{(j)})},

and, for $j = 1$ and $j = p + 1$, one obtains

E_{X}\left[\left(\frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq X]}}{ρ(Y^{(1)})}\right)^{2}\right] = \frac{F^{2}(Y^{(1)}) - 2F(Y^{(1)}) F_{0}(Y^{(1)}) + F_{0}(Y^{(1)})}{ρ^{2}(Y^{(1)})},

and

\begin{aligned} E_{Y^{(p+1)}}\left[ρ^{2}(Y^{(p+1)}) \left(\frac{F(Y^{(p+1)}) - \mathbb{1}_{[Y^{(p+1)} \geq Y^{(p)}]}}{ρ(Y^{(p+1)})}\right)^{2}\right] & = \frac{1}{3} - \{1 - F^{2}(Y^{(p)})\} + 1 - F(Y^{(p)}) \\ & = \frac{1}{3} + F^{2}(Y^{(p)}) - F(Y^{(p)}) \leq \frac{1}{3}, \end{aligned}

where the last computation uses the fact that $F(Y^{(p+1)})$ is uniformly distributed on $(0, 1)$, so that $E[F^{2}(Y^{(p+1)})] = 1/3$, $2E[F(Y^{(p+1)}) \mathbb{1}_{[Y^{(p+1)} \geq y]}] = 1 - F^{2}(y)$, and $E[\mathbb{1}_{[Y^{(p+1)} \geq y]}] = 1 - F(y)$.

If $F = F_{0}$, then

E_{X}\left[\left(\frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq X]}}{ρ(Y^{(1)})}\right)^{2}\right] = \frac{F_{0}(Y^{(1)})[1 - F_{0}(Y^{(1)})]}{ρ_{0}^{2}(Y^{(1)})},

and the second result follows by iterating the above procedure with the expectations

E_{Y^{(j-1)}}\left[\left(\frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})}\right)^{2}\right] = \frac{F(Y^{(j)})[1 - F(Y^{(j)})]}{ρ^{2}(Y^{(j)})},

for any $j \in \{2, \dots, p\}$.

The last result holds because

E_{X}\left[\left(\frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq X]}}{ρ(Y^{(1)})}\right)^{2}\right] = \frac{F_{0}(Y^{(1)})[1 - F_{0}(Y^{(1)})]}{ρ_{0}^{2}(Y^{(1)})} \leq \frac{1}{ρ_{0}^{2}(Y^{(1)})} \leq \frac{1}{ϵ^{2}},

and the same bound applies to each factor $j \in \{2, \dots, p\}$.

□
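The proof above rests on three one-dimensional expectations. A quick Monte Carlo sanity check of them is sketched below, assuming continuous CDFs so that $F(Y)$ and $F_{0}(X)$ are Uniform(0, 1); the inputs $F(y) = 0.3$ and $F_{0}(y) = 0.6$ and the helper name `mc_identities` are arbitrary illustrative choices, not part of the paper.

```python
import random


def mc_identities(Fy, F0y, n=200_000, seed=1):
    """Monte Carlo estimates of the expectations used in the proof of Theorem 2.

    Fy  : value F(y) of the CDF F at a fixed point y
    F0y : value F0(y) of the input CDF F0 at the same point
    Assumes continuous CDFs, so that with U, V ~ Uniform(0, 1):
      1[y >= X] = 1[V <= F0(y)] when X ~ F0, and F(Y) = U when Y ~ F.
    """
    rng = random.Random(seed)
    first = middle = last = 0.0
    for _ in range(n):
        v, u = rng.random(), rng.random()
        # j = 1: E_X[(F(y) - 1[y >= X])^2] with X ~ F0
        first += (Fy - (1.0 if v <= F0y else 0.0)) ** 2
        # j = 2..p: E_{Y^(j-1)}[(F(t) - 1[t >= Y^(j-1)])^2], t fixed with F(t) = Fy
        middle += (Fy - (1.0 if u <= Fy else 0.0)) ** 2
        # j = p + 1 (rho^2 cancels): E_Y[(F(Y) - 1[Y >= y])^2] with Y ~ F
        last += (u - (1.0 if u >= Fy else 0.0)) ** 2
    return first / n, middle / n, last / n


first, middle, last = mc_identities(Fy=0.3, F0y=0.6)
print(first, 0.3**2 - 2 * 0.3 * 0.6 + 0.6)  # F^2 - 2 F F0 + F0
print(middle, 0.3 * (1 - 0.3))              # F (1 - F)
print(last, 1 / 3 + 0.3**2 - 0.3)           # 1/3 + F^2 - F
```

Each estimate should agree with the closed form printed next to it up to Monte Carlo error of order $10^{-3}$.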

3.2. Integrated Mean Squared Errors

This section aims to provide distribution functions that help minimize the IMSE, leading to possibly optimal distribution functions for the approximation of $f$. To that end, define

K_{p} := E\left[\left\{\frac{D^{p+1} f(Y^{(p+1)})}{ρ(Y^{(p+1)})}\right\}^{2}\right].
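Because $K_{p}$ is an expectation over $Y^{(p+1)} \sim F$, it can be estimated by plain Monte Carlo. The sketch below assumes a concrete setting for illustration only: $f = \sin$, $p = 1$ (so $D^{2} f = -\sin$), and $Y$ uniform on $(0, 2π)$, for which the exact value is $(2π)^{2} E[\sin^{2} Y] = 2π^{2}$. The function name `estimate_Kp` is hypothetical.

```python
import math
import random


def estimate_Kp(d_p1_f, rho, sample_y, n=200_000, seed=0):
    """Monte Carlo estimate of K_p = E[{D^{p+1} f(Y) / rho(Y)}^2], Y drawn from the density rho."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = sample_y(rng)
        total += (d_p1_f(y) / rho(y)) ** 2
    return total / n


# Assumed example: f = sin, p = 1, Y ~ Uniform(0, 2*pi), so rho = 1 / (2*pi).
two_pi = 2 * math.pi
K1 = estimate_Kp(d_p1_f=lambda y: -math.sin(y),        # D^2 sin = -sin
                 rho=lambda y: 1 / two_pi,
                 sample_y=lambda r: r.uniform(0.0, two_pi))
print(K1, 2 * math.pi ** 2)  # estimate vs. exact value 2 pi^2
```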

Theorem 3.

Consider the remainder term $R_{p+1}$ of Equations (9) and (10) and assume that (A1) holds. Then,

E[R_{p+1}(X)^{2}] \leq \frac{K_{p}}{3} \sup_{y \in Ω} \frac{F^{2}(y) - 2F(y)F_{0}(y) + F_{0}(y)}{ρ^{2}(y)} \left[\sup_{y \in Ω} \frac{F(y)[1 - F(y)]}{ρ^{2}(y)}\right]^{p-1}.

Proof.

It follows from the proof of Theorem 2 that the integrated MSE satisfies

E_{X}[R_{p+1}^{2}(X)] \leq \frac{1}{3} E\left[\prod_{j=1}^{p} \left(\frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})}\right)^{2}\right] E\left[\left\{\frac{D^{p+1} f(Y^{(p+1)})}{ρ(Y^{(p+1)})}\right\}^{2}\right].

Since

E_{X}\left[\left(\frac{F(Y^{(1)}) - \mathbb{1}_{[Y^{(1)} \geq X]}}{ρ(Y^{(1)})}\right)^{2}\right] = \frac{F^{2}(Y^{(1)}) - 2F(Y^{(1)}) F_{0}(Y^{(1)}) + F_{0}(Y^{(1)})}{ρ^{2}(Y^{(1)})}

and

E_{Y^{(j-1)}}\left[\left(\frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})}\right)^{2}\right] = \frac{F(Y^{(j)})[1 - F(Y^{(j)})]}{ρ^{2}(Y^{(j)})}

for any $j \in \{2, \dots, p\}$, one can write

\begin{aligned} A_{1} & := E\left[\prod_{j=1}^{p} \left(\frac{F(Y^{(j)}) - \mathbb{1}_{[Y^{(j)} \geq Y^{(j-1)}]}}{ρ(Y^{(j)})}\right)^{2}\right] \\ & \leq \sup_{y \in Ω} \frac{F^{2}(y) - 2F(y)F_{0}(y) + F_{0}(y)}{ρ^{2}(y)} \left[\sup_{y \in Ω} \frac{F(y)[1 - F(y)]}{ρ^{2}(y)}\right]^{p-1}, \end{aligned}

and the result holds. □

In view of Theorem 3, the ideal optimal distribution function $F_{op}$ must satisfy the following functional equation:

F_{op}^{2}(y) - 2F_{op}(y)F_{0}(y) + F_{0}(y) = 0.

Unfortunately, this equation has no solution in $\mathbb{R}$: its discriminant, $4F_{0}(y)[F_{0}(y) - 1]$, is negative whenever $0 < F_{0}(y) < 1$. This suggests working in $\mathbb{C}$ in order to find $F_{op}$. Nevertheless, such a functional equation is interesting as it could lead to finding a zero-one-ended function, which is not necessarily a distribution function (see Lemma 1).
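Solved pointwise in $y$, the functional equation is a quadratic in $F_{op}(y)$ whose roots are $F_{0}(y) \pm i\sqrt{F_{0}(y)[1 - F_{0}(y)]}$. The short sketch below computes these complex roots with Python's `cmath` module for a few illustrative values of $F_{0}(y)$; the helper names are hypothetical.

```python
import cmath
import math


def op_roots(F0):
    """Roots z of z**2 - 2*F0*z + F0 = 0, i.e. the functional equation at one point y."""
    disc = 4.0 * F0 * (F0 - 1.0)  # negative whenever 0 < F0 < 1
    sq = cmath.sqrt(disc)         # purely imaginary in that case
    return (2.0 * F0 + sq) / 2.0, (2.0 * F0 - sq) / 2.0


def numerator(F, F0):
    """F^2 - 2 F F0 + F0, the numerator of the first supremum in Theorem 3."""
    return F * F - 2.0 * F * F0 + F0


for F0 in (0.1, 0.5, 0.9):
    z1, z2 = op_roots(F0)
    # Both roots have real part F0 and imaginary parts +/- sqrt(F0 (1 - F0)).
    print(F0, z1, z2, math.sqrt(F0 * (1.0 - F0)))
```

For real $F$, the identity $F^{2} - 2FF_{0} + F_{0} = (F - F_{0})^{2} + F_{0}(1 - F_{0})$ also shows that this numerator is smallest at $F = F_{0}$, where it equals $F_{0}(1 - F_{0})$.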

For the probabilistic expansions in $\mathbb{R}$ considered in this paper, finding an approximately optimal CDF $F_{op}^{*}$ requires using the upper bound of the IMSE provided in Theorem 3. To derive $F_{op}^{*}$, denote by $F_{Ω}^{c}$ the set of continuous CDFs on the support $Ω$.

Proposition 3.

Under the conditions of Theorem 3, assume that $p \gg 1$. Then, an optimal CDF is given by

F_{op}^{*} := \arg\min_{F \in F_{Ω}^{c}} \left(\sup_{y \in Ω} \frac{F(y)[1 - F(y)]}{ρ^{2}(y)}\right).

This result is straightforward, since for large $p$ the factor raised to the power $p - 1$ dominates the upper bound of Theorem 3. Of course, parametric approaches may be considered as well. Such approaches consist of (i) choosing a particular class of CDFs to which $F$ belongs, and (ii) determining the optimal parameters of the corresponding CDFs, that is, the parameter values that minimize the upper bound of the IMSE. Examples of such classes are the Gaussian, arcsine, and beta distributions.
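A minimal sketch of the criterion in Proposition 3 is given below, assuming $Ω = (0, 1)$ and approximating the supremum on an interior grid. It compares three candidates drawn from the classes mentioned above: the uniform law, a beta(2, 2) law, and the arcsine law. The beta parameters are an illustrative assumption, not the optimal beta distribution used later in the paper.

```python
import math


def sup_ratio(F, rho, n=2000):
    """Grid approximation of sup_{y in (0,1)} F(y)[1 - F(y)] / rho(y)^2 (interior points i/n)."""
    best = 0.0
    for i in range(1, n):
        y = i / n
        best = max(best, F(y) * (1.0 - F(y)) / rho(y) ** 2)
    return best


# (CDF, density) pairs on (0, 1).
candidates = {
    "uniform": (lambda y: y, lambda y: 1.0),
    "beta(2,2)": (lambda y: 3 * y**2 - 2 * y**3, lambda y: 6 * y * (1 - y)),
    "arcsine": (lambda y: 2 / math.pi * math.asin(math.sqrt(y)),
                lambda y: 1 / (math.pi * math.sqrt(y * (1 - y)))),
}
scores = {name: sup_ratio(F, rho) for name, (F, rho) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

On this grid the three suprema are $1/4$ (uniform), $1/9$ (beta(2, 2)), and $π^{2}/16$ (arcsine), each attained at $y = 1/2$, so beta(2, 2) ranks first among these three. This only ranks a few candidates; Proposition 3 itself minimizes over all continuous CDFs on $Ω$.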

To obtain an optimal CDF that depends on $F_{0}$, one can rely on CDFs of the form $F = F_{0}^{τ}$ with $τ > 0$, which leads to the following result.

Proposition 4.

Under the conditions of Theorem 3, assume that $F = F_{0}^{τ}$ with $τ > 0$. Then, an optimal CDF is given by $F_{op}^{**} := F_{0}^{τ^{*}}$ with

τ^{*} := \arg\min_{τ \in \mathbb{R}_{+} \setminus \{0\}} \left\{\sup_{y \in Ω} \frac{F_{0}^{2}(y) - 2F_{0}^{3-τ}(y) + F_{0}^{3-2τ}(y)}{τ^{2} ρ_{0}^{2}(y)} \left(\sup_{y \in Ω} \frac{F_{0}^{2-τ}(y)[1 - F_{0}^{τ}(y)]}{τ^{2} ρ_{0}^{2}(y)}\right)^{p-1}\right\}.

Proof.

One can see that $F = F_{0}^{τ}$ is a CDF with density $ρ = τ F_{0}^{τ-1} ρ_{0}$. Substituting these expressions into the upper bound of Theorem 3 yields the criterion defining $τ^{*}$. □
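A minimal numerical sketch of Proposition 4 follows. It assumes $Ω = (0, 1)$ with $F_{0}$ uniform (so $F_{0}(y) = y$ and $ρ_{0} = 1$), approximates both suprema on an interior grid, and searches a coarse set of candidate values of $τ$; the grid, the candidate set, and the function name `criterion` are illustrative choices, not taken from the paper.

```python
def criterion(tau, p, n=20000):
    """Objective of Proposition 4 for F0(y) = y, rho0(y) = 1 on (0, 1), on the grid y = i/n."""
    s1 = 0.0  # sup of (F0^2 - 2 F0^(3-tau) + F0^(3-2tau)) / tau^2
    s2 = 0.0  # sup of F0^(2-tau) (1 - F0^tau) / tau^2
    for i in range(1, n):
        y = i / n
        t1 = (y**2 - 2 * y**(3 - tau) + y**(3 - 2 * tau)) / tau**2
        t2 = y**(2 - tau) * (1 - y**tau) / tau**2
        s1 = max(s1, t1)
        s2 = max(s2, t2)
    return s1 * s2 ** (p - 1)


taus = [0.25 * k for k in range(1, 9)]  # candidate tau values 0.25, 0.5, ..., 2.0
p = 2
values = {tau: criterion(tau, p) for tau in taus}
tau_star = min(values, key=values.get)
print(tau_star, values[tau_star], values[1.0])
```

With $p = 2$ the search picks $τ = 1.25$ on this coarse grid, ahead of $τ = 1$, which recovers $F = F_{0}$ and gives the value $(1/4)^{p}$ appearing in Theorem 2; with $p = 1$ the same search returns $τ = 1$. Values $τ > 1.5$ are heavily penalized because $F_{0}^{3-2τ}(y)$ blows up as $y \to 0$.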

4. Applications

To assess the quality of the different methods, including the Taylor approach, the following error measure is considered:

\mathrm{Err}(x_{ℓ}) := \frac{|\tilde{f}(x_{ℓ}) - f(x_{ℓ})|}{|f(x_{ℓ})|}, \quad ℓ = 1, \dots, L,

where $f(x_{ℓ})$ is the exact evaluation of $f$ at $x_{ℓ} \in Ω$ and $\tilde{f}(x_{ℓ})$ is the corresponding approximation. The $L = 100$ values $x_{ℓ}$ are equally spaced in the bounded domain $Ω$. Three types of approximations are computed: the traditional Taylor approximation, and the proposed approach with and without the remainder term. All of these approximations use the same order $p$ of derivatives.
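The error measure above can be sketched for the traditional Taylor baseline as follows. This is a minimal illustration under stated assumptions: $f = \exp$, $Ω = (0, 2)$, expansion about the midpoint with $p = 1$, and $L = 100$ equally spaced points including both endpoints. The paper's computations use the R package pracma; the Python helpers here are hypothetical stand-ins.

```python
import math


def taylor_approx(derivs_at_node, node, x):
    """Order-p Taylor polynomial: sum_m D^m f(node) (x - node)^m / m!."""
    return sum(d * (x - node) ** m / math.factorial(m)
               for m, d in enumerate(derivs_at_node))


def relative_errors(f, approx, xs):
    """Err(x_l) = |approx(x_l) - f(x_l)| / |f(x_l)| for every x_l in xs."""
    return [abs(approx(x) - f(x)) / abs(f(x)) for x in xs]


a, b, L, p = 0.0, 2.0, 100, 1                        # assumed domain, L and order
xs = [a + l * (b - a) / (L - 1) for l in range(L)]   # equally spaced, endpoints included
node = (a + b) / 2                                   # middle of the domain
derivs = [math.exp(node)] * (p + 1)                  # D^m exp = exp for every m
errs = relative_errors(math.exp, lambda x: taylor_approx(derivs, node, x), xs)
log_errs = [math.log10(max(err, 1e-300)) for err in errs]  # errors in log10
print(min(errs), max(errs))
```

The error is smallest next to the node and grows toward the ends of $Ω$, which is the local behavior of Taylor approximations discussed in the text.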

For each function, the R package pracma is used to perform the Taylor expansion about (i) the middle of the domain $Ω = (a, b)$, that is, $(a + b)/2$, or (ii) the expectation of $X$, that is, $E[X]$, for infinite domains. The same R package is used to obtain derivatives of functions of orders up to $p = 3$. For the proposed approach, different CDFs are considered: the truncated Gaussian distribution on $Ω$ with mean $(a + b)/2$ and standard deviation $10^{-6}$ (i.e., in practice, the Dirac measure); the uniform distribution on $Ω$; a mixture of Gaussian distributions; and the optimal beta distribution (when possible) according to Propositions 3 and 4. The coefficients $β_{j}$ of the proposed probabilistic expansion (see Theorem 1) are computed by the Monte Carlo approach with sample size $N = 1000$, that is,

β_{j} \approx \frac{1}{N} \sum_{k=0}^{p-j} \sum_{i=1}^{N} V_{k} D^{k+j} f(Y_{i}).
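The Monte Carlo formula for $β_{j}$ can be sketched as below. The weights $V_{k}$ are defined in Theorem 1 and are not reproduced in this section, so the code takes them as a caller-supplied function `V(k, y)`; this placeholder and the weight choice `keep_k0` (which keeps only the $k = 0$ term, making $β_{j}$ the sample mean of $D^{j} f(Y_{i})$) are hypothetical and serve only to exercise the estimator. The sample mimics the practical Dirac measure (Gaussian with mean 3.14 and standard deviation $10^{-6}$; truncation to $Ω$ is negligible at this scale).

```python
import math
import random


def estimate_beta(j, p, derivs, ys, V):
    """Monte Carlo estimate beta_j ~ (1/N) sum_{k=0}^{p-j} sum_{i=1}^{N} V_k D^{k+j} f(Y_i).

    derivs[m] evaluates D^m f (derivs[0] is f itself).
    V(k, y) is a hypothetical stand-in for the weights V_k of Theorem 1;
    it receives the sample point in case those weights depend on Y_i.
    """
    total = 0.0
    for y in ys:
        for k in range(p - j + 1):
            total += V(k, y) * derivs[k + j](y)
    return total / len(ys)


derivs_sin = [math.sin, math.cos, lambda y: -math.sin(y), lambda y: -math.cos(y)]
rng = random.Random(123)
N, p = 1000, 2
ys = [rng.gauss(3.14, 1e-6) for _ in range(N)]   # practical Dirac measure at 3.14
keep_k0 = lambda k, y: 1.0 if k == 0 else 0.0    # hypothetical weights, not Theorem 1's V_k
betas = [estimate_beta(j, p, derivs_sin, ys, keep_k0) for j in range(p + 1)]
print(betas)
```

With a sample concentrated at 3.14, these placeholder weights return values close to $\sin(3.14)$, $\cos(3.14)$, and $-\sin(3.14)$; the actual coefficients depend on the $V_{k}$ of Theorem 1.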

Table 1 and Table 2 report the estimated values of the coefficients $β_{j}$ and of the nodes $E[Y]$ for expanding $\sin(x)$ and $\cos(x)$, respectively. These values are obtained from a single simulation and without the remainder terms.

It turns out that the coefficients of the proposed expansions differ from those of the traditional Taylor approach, except, as expected, for the truncated Gaussian distribution.

Figures 1–6 show the $L = 100$ simulated error values (in $\log_{10}$) for different functions. Each panel corresponds to a specific CDF and compares the error values of Taylor's approximation with those of the proposed approach, that is, the probabilistic Taylor-type expansion without the remainder term (PTWoR) and/or with the remainder term (PTWR).

Figures 1 and 2 depict the error values for the functions $\sin(x)$ and $\cos(x)$, respectively, when $p = 2$. Similar results are obtained when $p = 1$ and $p = 3$. The practical Dirac measure almost exactly reproduces the well-known Taylor approximations, and the other CDFs (PTWoR) perform well when the input values are far away from the node used.

Figures 3 and 4 depict the error values for the functions $\exp(x)$ and $\log(x + 1)$, respectively, when $p = 1$. Again, the practical Dirac measure reproduces the well-known Taylor approximations exactly. For the uniform and beta distributions, the proposed PTWR approach performs well compared with PTWoR.

Figures 5 and 6 show the error values for the absolute function $|x|$ when $p = 2$ and $p = 3$, respectively. The practical Dirac measure reproduces the Taylor approximations. PTWoR performs well when $p = 3$, whereas, when $p = 2$, PTWR associated with the Dirac measure gives better results than PTWoR. It should be noted that Taylor series fail for this function, which is not differentiable at $x = 0$.

5. Conclusions

This paper has proposed a generic expansion of every smooth function using any zero-one-ended function. Special zero-one-ended functions are CDFs, which lead to a unified stochastic framework for Taylor-type expansions of functions, including Taylor series. The proposed probabilistic expansions form a global approach, as they allow one to incorporate derivative information at several points. Among the possible CDFs, it is shown in this paper that the Dirac measure and its CDF are the best choice for the proposed probabilistic expansions, leading to Taylor series. However, such series are exact at the node used and give accurate results only for input values that fall within a neighborhood of that node. A functional equation is proposed in order to derive the best CDF over the whole domain. As this equation has no solution in $\mathbb{R}$, error bounds have been used to select quasi-optimal CDFs.

A practical approximation of the Dirac probability measure, such as a truncated Gaussian distribution with a very small variance, is used to recover the well-known Taylor expansion. Some CDFs improve on the Taylor approximations. These results are promising and call for further investigation; for instance, several different CDFs could be used in the proposed expansion instead of a single one. Significant advantages of the proposed probabilistic expansion over Taylor series can be expected from using the best CDFs.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to thank the two reviewers for their comments, which have helped improve our manuscript.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Taylor, B. Methodus Incrementorum Directa et Inversa; Prop. VII, Th. III; Innys: London, UK, 1717.
2. Chen, P.; Ghattas, O. Taylor Approximation for Chance Constrained Optimization Problems Governed by Partial Differential Equations with High-Dimensional Random Parameters. SIAM/ASA J. Uncertain. Quantif. 2021, 9, 1381–1410.
3. Patelli, E.; Pradlwarter, H. Monte Carlo gradient estimation in high dimensions. Int. J. Numer. Methods Eng. 2010, 81, 172–188.
4. Agarwal, A.; Dekel, O.; Xiao, L. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. In Proceedings of COLT; Citeseer: University Park, PA, USA, 2010; pp. 28–40.
5. Shamir, O. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Mach. Learn. Res. 2017, 18, 1703–1713.
6. Berahas, A.S.; Cao, L.; Choromanski, K.; Scheinberg, K. A theoretical and empirical comparison of gradient approximations in derivative-free optimization. Found. Comput. Math. 2022, 22, 507–560.
7. Gasnikov, A.; Dvinskikh, D.; Dvurechensky, P.; Gorbunov, E.; Beznosikov, A.; Lobanov, A. Randomized Gradient-Free Methods in Convex Optimization. In Encyclopedia of Optimization; Pardalos, P.M., Prokopyev, O.A., Eds.; Springer International Publishing: Cham, Switzerland, 2023; pp. 1–15.
8. Lamboni, M. Optimal and Efficient Approximations of Gradients of Functions with Nonindependent Variables. Axioms 2024, 13, 426.
9. Lamboni, M. Dimension-Free Estimators of Gradients of Functions With(out) Non-Independent Variables. Axioms 2026, 15, 22.
10. Novák, L.; Novák, D. On Taylor Series Expansion for Statistical Moments of Functions of Correlated Random Variables. Symmetry 2020, 12, 1379.
11. Lamboni, M. Optimal ANOVA-Based Emulators of Models With(out) Derivatives. Stats 2025, 8, 24.
12. Karvonen, T.; Cockayne, J.; Tronarp, F.; Särkkä, S. A probabilistic Taylor expansion with Gaussian processes. Trans. Mach. Learn. Res. 2023.
13. Chaskalovic, J.; Jamshidipour, H. A New First Order Expansion Formula with a Reduced Remainder. Axioms 2022, 11, 562.
14. Chaskalovic, J.; Assous, F.; Jamshidipour, H. A new second order Taylor-like theorem with an optimized reduced remainder. J. Comput. Appl. Math. 2024, 438, 115496.
15. Baudoin, F.; Zhang, X. Taylor expansion for the solution of a stochastic differential equation driven by fractional Brownian motions. Electron. J. Probab. 2012, 17, 1–21.
16. Rößler, A. Stochastic Taylor Expansions for the Expectation of Functionals of Diffusion Processes. Stoch. Anal. Appl. 2004, 22, 1553–1576.
17. Massey, W.A.; Whitt, W. A probabilistic generalization of Taylor’s theorem. Stat. Probab. Lett. 1993, 16, 51–54.
18. Lin, G.D. On a probabilistic generalization of Taylor’s theorem. Stat. Probab. Lett. 1994, 19, 239–243.
19. Aliprantis, C.D.; Border, K.C. Infinite Dimensional Analysis: A Hitchhiker’s Guide; Springer: Berlin/Heidelberg, Germany, 1999.
20. Di Crescenzo, A. A probabilistic analogue of the mean value theorem and its applications to reliability theory. J. Appl. Probab. 1999, 36, 706–719.
21. Yang, Y.; Zhou, X.; Wang, M. Taylor’s Theorem and Mean Value Theorem for Random Functions and Random Variables. arXiv 2025, arXiv:2102.10429.
22. Solak, E.; Murray-Smith, R.; Leithead, W.; Leith, D.; Rasmussen, C. Derivative observations in Gaussian process models of dynamic systems. Adv. Neural Inf. Process. Syst. 2002, 15.
23. Morris, M.D.; Mitchell, T.J.; Ylvisaker, D. Bayesian Design and Analysis of Computer Experiments: Use of Derivatives in Surface Prediction. Technometrics 1993, 35, 243–255.
24. Chaskalovic, J.; Assous, F. Optimized first-order Taylor-like formulas and Gauss quadrature errors. Int. J. Numer. Anal. Model. 2025, 22, 824–842.
25. Zemanian, A. Distribution Theory and Transform Analysis: An Introduction to Generalized Functions, with Applications; Dover Books on Advanced Mathematics; Dover Publications: New York, NY, USA, 1987.
26. Strichartz, R. A Guide to Distribution Theory and Fourier Transforms; Studies in Advanced Mathematics; CRC Press: Boca Raton, FL, USA, 1994.
27. Fubini, G. Sugli Integrali Multipli: Nota; Tipografia della R. Accademia dei Lincei: Roma, Italy, 1907.
28. Morrow, M. Fubini’s theorem and non-linear change of variables over a two-dimensional local field. arXiv 2007, arXiv:0712.2177.
29. Lamboni, M. Derivative-based integral equalities and inequality: A proxy-measure for sensitivity analysis. Math. Comput. Simul. 2021, 179, 137–161.
30. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 1956, 27, 832–837.
31. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
32. Epanechnikov, V. Nonparametric estimation of a multidimensional probability density. Theory Probab. Appl. 1969, 14, 153–158.
33. Silverman, B. Density Estimation for Statistics and Data Analysis; Chapman & Hall: New York, NY, USA, 1986.

Figure 1. Average of 30 error values (in $\log_{10}$) for the function $\sin(x)$ when $p = 2$. In addition to the error values for the traditional Taylor approximations, Panel (a) shows the error values corresponding to the practical Dirac measure; Panel (b) shows those associated with the uniform distribution; and Panel (c) shows those for the Gaussian mixture distribution.

Figure 2. Average of 30 error values (in $\log_{10}$) for the function $\cos(x)$ when $p = 2$. In addition to the error values for the traditional Taylor approximations, Panel (a) shows the error values corresponding to the practical Dirac measure; Panel (b) shows those associated with the uniform distribution; and Panel (c) shows those for the Gaussian mixture distribution.

Figure 3. Average of 30 error values (in $\log_{10}$) for the function $\exp(x)$ when $p = 1$. In addition to the error values for the traditional Taylor approximations, Panel (a) shows the error values corresponding to the practical Dirac measure; Panel (b) shows those associated with the uniform distribution; Panel (c) shows those for the Gaussian mixture distribution; and Panel (d) shows those for the optimal beta distribution. The error values obtained when including the estimated remainder term (PTWR) are also shown.

Figure 4. Average of 30 error values (in $\log_{10}$) for the function $\log(x + 1)$ when $p = 1$. In addition to the error values for the traditional Taylor approximations, Panel (a) shows the error values corresponding to the practical Dirac measure; Panel (b) shows those associated with the uniform distribution; Panel (c) shows those for the Gaussian mixture distribution; and Panel (d) shows those for the optimal beta distribution. The error values obtained when including the estimated remainder term (PTWR) are also shown.

Figure 5. Average of 30 error values (in $\log_{10}$) for the absolute function $|x|$ when $p = 2$. In addition to the error values for the traditional Taylor approximations, Panel (a) shows the error values corresponding to the practical Dirac measure; Panel (b) shows those associated with the uniform distribution; and Panel (c) shows those for the Gaussian mixture distribution.

Figure 6. Average of 30 error values (in $\log_{10}$) for the absolute function $|x|$ when $p = 3$. In addition to the error values for the traditional Taylor approximations, Panel (a) shows the error values corresponding to the practical Dirac measure; Panel (b) shows those associated with the uniform distribution; and Panel (c) shows those for the Gaussian mixture distribution.

Table 1. Estimated values of the coefficients $β_{j}$ and of the nodes for expanding $\sin(x)$ using the proposed approach associated with different CDFs. For the traditional Taylor approach, the coefficients are $f(3.14)$, $Df(3.14)$, and $D^{2}f(3.14)$.

	Truncated Gaussian CDF	Uniform CDF	Mixture CDF	Taylor
$β_{0}$	0.0015	−0.068	−0.008	0.0015
$β_{1}$	−0.999	−0.007	−0.352	−0.999
$β_{2}$	−0.002	0.026	0.004	−0.001
Node	3.14	3.14	3.223	3.14

Table 2. Estimated values of the coefficients $β_{j}$ and of the nodes for expanding $\cos(x)$ using the proposed approach associated with different CDFs. For the traditional Taylor approach, the coefficients are $f(3.14)$, $Df(3.14)$, and $D^{2}f(3.14)$.

	Truncated Gaussian CDF	Uniform CDF	Mixture CDF	Taylor
$β_{0}$	−0.999	−0.139	−0.667	−0.999
$β_{1}$	−0.002	−0.015	−0.019	−0.0015
$β_{2}$	0.999	0.053	0.356	0.999
Node	3.14	3.14	3.125	3.14

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
