## 1. Introduction

Given an abstract probability space

$(\Omega ,\mathcal{F},\mathbb{P})$ and a random variable

$X:\Omega \to \mathbb{R}$, its probability density function

${f}_{X}$ is defined as the Borel measurable and non-negative function that satisfies

$\mathbb{P}[X\in B]={\int}_{B}{f}_{X}\left(x\right)\phantom{\rule{0.166667em}{0ex}}\mathrm{d}x$ for any Borel set

$B\subseteq \mathbb{R}$ ([

1], Ch. 2). In other words,

${f}_{X}\left(x\right)=\frac{\mathrm{d}{\mathbb{P}}_{X}\left(x\right)}{\mathrm{d}x}$ is the Radon–Nikodym derivative of the probability law

${\mathbb{P}}_{X}=\mathbb{P}\circ {X}^{-1}$ with respect to the Lebesgue measure. When the density function exists, the random variable (its probability law) is said to be absolutely continuous. Examples of absolutely continuous distributions are Uniform, Triangular, Normal, Gamma, etc.

When the probability law of

X is unknown in explicit form but realizations of

X can be drawn, a kernel estimation may be used to reconstruct the probability density function of

X. It is a non-parametric method that uses a finite data sample from

X. If

${x}_{1},\dots ,{x}_{M}$ are independent realizations of

X, then the kernel density estimator takes the form

${\widehat{f}}_{X}^{M}\left(x\right)=\frac{1}{Mb}{\sum}_{i=1}^{M}K\left(\frac{x-{x}_{i}}{b}\right)$, where

K is the kernel function and

b is the bandwidth, see [

2].

Kernel density estimation may present slow convergence rate with the number of realizations and may smooth out discontinuity and non-differentiability points of the target density. For certain random variable transformations

$X=g(U,V)$, where

U and

V are independent random quantities and

g is a deterministic real function, we will see that there is an alternative estimation method of parametric nature that is based on Monte Carlo simulation. Such approach has been used for certain random differential equations problems, see [

3,

4,

5,

6,

7]. However, a detailed comparison with kernel density estimation, both theoretically and numerically, has not been performed yet. This detailed comparison is the novelty of the present paper. We will demonstrate that the parametric method is more efficient and correctly captures density features. Numerical experiments will illustrate these improvements.

As it shall be seen later (see expression (

2)), our approach relies upon the representation of the probability density via the expectation of a transformation derived through the application of the so-called Random Variable Transformation (RVT) method [

1] (also termed “change of variable formula”, see [

8]). This method depends on the computation of invertible functions with Jacobian determinant. We point out that, when the number of random model parameters is large, it is convenient to apply competitive techniques to carry out computations. In this regard, we here highlight recent contributions that address this important issue in the machine learning setting. In [

8], the authors applied real-valued non-volume preserving (real NVP) transformations to estimate the density when reconstructing natural images on datasets through sampling, log-likelihood evaluation, and latent variable manipulations. Recently, in [

9], the real NVP was improved by combining the so called flow-based generative models. In the machine learning context, these are methods based upon the application of the “change of variable formula”—together with Knothe–Rosenblatt rearrangement.

## 2. Method

Let $X=g(U,V)$ be a random variable, where U is a random variable, V is a random variable/vector, and g is a deterministic real function. The aim is to estimate the probability density function of X.

A kernel density estimator takes the form

where

${u}_{1},\dots ,{u}_{M}$ and

${v}_{1},\dots ,{v}_{M}$ are independent realizations of

U and

V, respectively.

Let us see an alternative method when

U and

V are independent. Suppose that

U has an explicit density function

${f}_{U}$. Suppose also that

$g(\xb7,V)$ is invertible for all

V, where

$h(\xb7,V)$ is the inverse. Then,

where

${\partial}_{1}$ is the partial derivative with respect to the first variable and

$\mathbb{E}$ is the expectation operator. Notice that we are not requiring

V to have a probability density.

Indeed, if

${\mathbb{P}}_{V}=\mathbb{P}\circ {V}^{-1}$ denotes the probability law of

V, then the following chain of equalities holds:

We have used the independence, the transformed density function ([

1], Section 6.2.1), and Fubini’s theorem.

To estimate the expectation from (

2) when the probability law of

V is unknown in closed form but realizations of it can be drawn, Monte Carlo simulation may be conducted. For example, the crude Monte Carlo method estimates

${f}_{X}\left(x\right)$ by using

where

${v}_{1},\dots ,{v}_{M}$ are independent realizations of

V ([

10], pp. 53–54).

Notice that the structure and the complexity of this estimator are very similar to a kernel one. For (

1), one needs realizations

${u}_{1},\dots ,{u}_{M}$,

${v}_{1},\dots ,{v}_{M}$, while (

3) only uses

${v}_{1},\dots ,{v}_{M}$. Hence, the complexity of drawing realizations of

V affects (

1) and (

3) in exactly the same way. On the other hand, (

1) evaluates

K and

g, while (

3) evaluates

${f}_{U}$ and

h.

By the central limit theorem, the root mean square error of the Monte Carlo estimate (

3) of the mean is

$\sigma /\sqrt{M}=\mathcal{O}\left({M}^{-1/2}\right)$, where

$\sigma $ is the standard deviation of the random quantity within the expectation (

2) (which is assumed to be finite). Variance reduction methods may be applied, such as antithetic or control variates, see [

11]. By contrast, the root mean square error of the kernel estimate (

1) is

$\mathcal{O}\left({M}^{-r}\right)$,

$r<1/2$ [

2].

The density estimation of ${f}_{X}\left(x\right)$ thus becomes a parametric problem, as we are estimating the expectation parameter from a population distribution. This is in contrast to kernel density estimation, which is non-parametric because it reconstructs a distribution. Moreover, the method presented here acts pointwise, so discontinuities and non-differentiability points are correctly captured, without smoothing them out.

## 3. Some Numerical Experiments

In this section, we compare the new parametric method with kernel density estimation numerically. We use the software Mathematica^{®} (Wolfram Research, Inc, Mathematica, Version 12.0, Champaign, IL, USA, 2019).

**Example** **1.** Let

$X=U+V$, where

$U\sim \mathrm{Normal}(0,1)$ and

$V\sim \mathrm{Uniform}(-1,1)$ are independent random variables. Obviously, the exact density function of

X is known via convolution:

${f}_{X}\left(x\right)={\int}_{-\infty}^{\infty}{f}_{U}(u-v){f}_{V}\left(v\right)\phantom{\rule{0.166667em}{0ex}}\mathrm{d}v$. However, this example allows for comparing the new parametric method with kernel density estimation by calculating exact errors. With

$g(U,V)=U+V$ and

$h(U,V)=U-V$, expression (

2) becomes

${f}_{X}\left(x\right)=\mathbb{E}\left[{f}_{U}(x-V)\right]$. From

M realizations, we conduct a kernel density estimation (Gaussian kernel with Silverman’s rule to determine the bandwidth,

$b=0.9\times min\{\widehat{\sigma},\mathrm{I}\widehat{\mathrm{Q}}\mathrm{R}/1.34\}\times {M}^{-1/5}$, where

$\widehat{\sigma}$ is the standard deviation of the sample and

$\mathrm{I}\widehat{\mathrm{Q}}\mathrm{R}$ is the interquartile range of the sample), crude Monte Carlo simulation on

$\mathbb{E}\left[{f}_{U}(x-V)\right]$, antithetic variates method (

$M/2$ realizations of

V and the sample is completed by changing signs), and control variates method (the control variable here is

V). Let

${\widehat{f}}_{X}^{M}\left(x\right)$ be the density estimate of

${f}_{X}\left(x\right)$. We consider the error measure

${\delta}^{M}={\int}_{-\infty}^{\infty}{({f}_{X}\left(x\right)-{\widehat{f}}_{X}^{M}\left(x\right))}^{2}\phantom{\rule{0.166667em}{0ex}}\mathrm{d}x$. Numerically, this integral is computed by fixing a large interval

$[-R,R]$ and performing Gauss–Legendre quadrature on it. As

${\delta}^{M}$ is random, we better consider

${\u03f5}^{M}={\left(\mathbb{E}\left[{\delta}^{M}\right]\right)}^{1/2}$. This expectation is estimated by repeating the density estimate several times (we did so 20 times).

Figure 1 reports the estimated error

${\widehat{\u03f5}}^{M}$ in log-log scale, for different orders

M growing geometrically. Observe that the parametric approach is more efficient than the kernel method. Variance reduction methods also allow for lowering the error. The lines corresponding to the three parametric methods have slope

$-1/2$ approximately, due to the well-known decrease of the Monte Carlo error.

For example, to achieve an error ${\u03f5}^{M}<0.01$, the kernel density estimation requires about $\mathrm{25,000}$ realizations, while the parametric crude Monte Carlo estimation needs 400 realizations. The antithetic method decreases the number of required realizations to 50–100, and the control variates approach does so even more, to less than 50.

In

Table 1, we present the timings in seconds for achieving root mean square errors less than

$0.01$ and

$0.001$. We work at the density location

$x=1$. Observe that the kernel density estimation is the least efficient method, especially as we require smaller errors.

**Example** **2.** Let

$X=UV$, where

$U\sim \mathrm{Normal}(0,1)$ and

$V=P+{\sum}_{i=1}^{12}{Q}_{i}$, being

$P\sim \mathrm{Poisson}\left(1\right)$ and

${Q}_{i}\sim \mathrm{Triangular}(0.1,0.105)$. All of the random variables are assumed to be independent. With

$g(U,V)=UV$ and

$h(U,V)=U/V$, expression (

2) is

${f}_{X}\left(x\right)=\mathbb{E}\left[{f}_{U}\left(\frac{x}{V}\right)\frac{1}{V}\right]$. Notice that, despite

V being discrete,

X has a density. This expectation cannot be computed via quadratures, due to the large dimension of the random space. We employ our parametric method using crude Monte Carlo simulation. In

Figure 2, first panel, we depict the estimated density function

${\widehat{f}}_{X}^{M}\left(x\right)$ using

M realizations. For

$M=\mathrm{10,000}$ and

$M=\mathrm{30,000}$, we perceive good agreement between the approximations. This is in contrast to the second panel of

Figure 2, where the kernel density estimation does not show convergence yet.

**Example** **3.** Consider the same setting of Example 2 but

$U\sim \mathrm{Uniform}(-1,1)$ (now

${f}_{U}$ is not continuous on

$\mathbb{R}$). In

Figure 3, first plot, we show the estimated density function

${\widehat{f}}_{X}^{M}\left(x\right)$ by the parametric crude Monte Carlo method using

M realizations. The discontinuity points of the target density

${f}_{X}$ are correctly captured. Indeed, the method acts pointwise in

x, so any feature of

${f}_{X}$ will be correctly identified, no matter how rare it is. By contrast, the kernel density estimation smooths out the discontinuities, see the second panel of the figure.

## 4. Application to Random Differential Equations

A random differential equation problem considers some of the terms in the system (input coefficients, forcing term, initial conditions, etc.) as random variables or stochastic processes [

1]. This is due to errors in measurements of data when trying to model a physical phenomenon, which introduces a variability in the parameters estimation. The solution to the system is then a stochastic process. Its deterministic trajectories are not the main concern. Instead, the study of the statistical content of the solution is the main goal. An important aim is to compute its probability density function. The key idea is that the solution is a transformation of the input random parameters; therefore, the probability density may be estimated as described in this paper whenever the solution is given in closed form. In the notation used in this paper,

U and

V would denote the input random parameters of the system, and

g would be the transformation mapping that relates the output with the inputs. A specific input parameter is selected as

U, with respect to which the mapping

g is easily invertible to obtain

h.

We consider the Ricatti random differential equation

where

$A>0$ and

${X}_{0}$ are independent random variables. The solution to (

4) is given by

By taking

$U={X}_{0}$, the density function of

$X\left(t\right)$ is given by

From

M realizations of

A, say

${A}^{\left(1\right)},\dots ,{A}^{\left(M\right)}$, the expectation is estimated via crude Monte Carlo simulation:

As seen in the previous sections, this procedure is more efficient and certain than kernel density estimation, expressed as

where the superscript

i denotes the

i-th realization,

$i=1,\dots ,M$.

For a specific numerical example, let us fix

$A\sim \mathrm{Uniform}(-1,1)$ and

${X}_{0}\sim \mathrm{Uniform}(0.1,0.13)$.

Figure 4, first panel, plots (

5) for a certain number of realizations

M for

A, for time

$t=0.3$. A comparison is conducted with kernel density estimation with a Gaussian kernel (second panel). It must be appreciated that the parametric Monte Carlo method (

5) correctly captures the non-differentiability points of the target density.

For another example, let us consider the damped pendulum differential equation with uncertainties:

where

$T>0$,

${w}_{0}\ne 0$,

$\xi \ne 0$ and

${\xi}^{2}<1$ (underdamped case) are constant, the initial position

${X}_{0}$ and the initial velocity

${X}_{1}$ are random variables, and the source/forcing term

$Y\left(t\right)$ is a stochastic process ([

1], Example 7.2). The solution is given by

where

${\omega}_{1}={\omega}_{0}\sqrt{1-{\xi}^{2}}\ne 0$ and

$p\left(t\right)=\frac{1}{{\omega}_{1}}{\mathrm{e}}^{-\xi {\omega}_{0}t}sin\left({\omega}_{1}t\right)$. In [

4], the density function of the response

$X\left(t\right)$ is expressed in terms of an expectation (

2), which is estimated by means of parametric crude Monte Carlo simulation due to the large dimension of the random space. By taking

$U={X}_{0}$, one of the formulas derived is

where the superscript

i denotes the

i-th realization,

$i=1,\dots ,M$. Compare (

8) with a kernel density estimation

The expressions and their complexities are very similar, but the convergence of (

8) is faster with

M, as justified in the previous sections.

Let us see a numerical example. We take the upper time

$T=1$, the damping ratio

$\xi =1/2$ and the natural frequency

${\omega}_{0}=\pi /2$. Consider

${X}_{0}\sim \mathrm{Exponential}\left(3\right)$,

${X}_{1}\sim \mathrm{Binomial}(7,0.31)$ and

$Y\left(t\right)={\sum}_{j=1}^{\infty}\frac{\sqrt{2}}{j\pi}sin\left(tj\pi \right){\xi}_{j}$. The series is understood in

${\mathrm{L}}^{2}([0,1]\times \Omega )$ and

${\left\{{\xi}_{j}\right\}}_{j=1}^{\infty}$ is a sequence of independent random variables with

$\mathrm{Uniform}(-\sqrt{3},\sqrt{3})$ distribution. This is a Karhunen–Loève expansion ([

10], p. 47). It is assumed that

${X}_{0}$,

${X}_{1}$, and

Y are independent. By applying (

8) with the sum of

$Y\left(t\right)$ truncated to a finite-term series, we estimate the density function of

$X\left(t\right)$. For example, in

Figure 5, first panel,

${f}_{X\left(0.5\right)}\left(x\right)$ is plotted for orders of truncation

$N=4$, 5, and 6. Overlapping of the graphs is clearly appreciated. The strong oscillations of the density are perfectly captured because the method acts pointwise. In the second panel of the figure, the kernel density estimate with Gaussian kernel is plotted, with

${10}^{6}$ realizations. Although there is apparent agreement between both panels, our parametric Monte Carlo method captures all the peaks while the kernel method smooths them out.

Further applications of the methodology for random differential equations may be consulted in [

3,

4,

5,

6,

7]. The present paper forms the theoretical and computational foundations of the methodology used in those recent contributions.

## 5. Conclusions

In this paper, we have been concerned with the density estimation of random variables. We have proposed an alternative to kernel density estimation for certain random variable transformations. The alternative is based on the estimation of an expectation parameter via Monte Carlo methods; therefore, it is of parametric nature and improves kernel methods in terms of efficiency. Furthermore, the method captures density features and does not smooth out discontinuity and non-differentiability points of the target density.

As shown here, the solution to some random differential equation problems is an explicit transformation of the input random parameters. The methodology proposed in this paper may be employed to estimate the density function of the closed-form stochastic solution parametrically, instead of relying on kernel estimation.