1. Introduction
Statistical projection estimates in the Monte Carlo method were first proposed by N.N. Chentsov [1]. He developed a general technique for optimizing such estimates; this technique, however, requires clarification in specific problems [2]. In [3], projection estimates based on Legendre polynomials were proposed for the marginal probability densities of solutions to stochastic differential equations. The mean square error of these estimates was studied, and the obtained projection estimates were compared with the histogram using examples. The analysis of the results showed that, for the same sample size, the projection estimate approximates the density more accurately. In addition, such an estimate is specified analytically and is a smooth function, so it is preferable in, e.g., filtering problems and the control of nonlinear systems [4,5,6,7].
This paper presents two statistical algorithms, based on the algorithms with Legendre polynomials from [3], for jointly obtaining projection estimates of the density and distribution function of a random variable.
When solving various problems by statistical estimation algorithms, it is important to make an optimal (consistent) choice of their parameters for finding the mathematical expectation of a certain functional that depends on a random variable. Therefore, this paper analyzes the mean square error of the projection estimate from this point of view. Such parameters are the projection expansion length and the sample size. We solve a conditional optimization problem considered by G.A. Mikhailov [8]. The objective of this problem is to minimize the algorithm complexity while achieving the required level of approximation accuracy. We study how to minimize the mean square error of the projection estimates of the density and distribution function by equalizing its deterministic and stochastic components. The accuracy of the projection estimates also depends on the degree of smoothness of the density; therefore, we consider the dependence of the error not only on the projection expansion length but also on the degree of smoothness of the approximated function.
The obtained theoretical results are confirmed with examples using a two-parameter family of densities that allows one both to choose the degree of smoothness and to perform a simple calculation of the expansion coefficients with respect to Legendre polynomials.
The rest of this paper has the following structure. Section 2 contains the necessary information on the Fourier–Legendre series and the definition of projection estimates of the density and distribution function; in this section, relations for the expansion coefficients of the density and distribution function with respect to Legendre polynomials are obtained. Section 3 presents algorithms for jointly obtaining randomized projection estimates of the density and distribution function. The analysis and conditional optimization of randomized projection estimates are carried out in Section 4. Section 5 proposes a two-parameter family of densities with different degrees of smoothness and related distribution functions; it presents an algorithm for modeling the corresponding random variables, gives the expansion coefficients of the considered densities and distribution functions, and studies the convergence rate of their expansions. Numerical experiments and their analysis are discussed in Section 6. In Section 7, a projection estimate of the density is compared with a histogram. Brief conclusions are given in Section 8.
3. Algorithms for Jointly Obtaining Randomized Projection Estimates of Density and Distribution Functions
The randomization of the projection estimate of the density g is obtained from the first equation of Formula (14) by calculating the linear functional (15) for the Legendre polynomials (6) by the Monte Carlo method with independent realizations of the random variable ξ:
where ξ_l is the lth realization and N is the sample size (number of realizations).
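As an illustration, the following minimal sketch shows this Monte Carlo estimator of the expansion coefficients, assuming Legendre polynomials orthonormalized on [−1, 1] and a NumPy array of realizations; the names and the interval are assumptions made here for illustration, not the paper's notation.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def coefficient_estimates(sample, n):
    """Estimate the first n + 1 expansion coefficients of the density:
    the ith estimate is the sample mean of the ith orthonormal Legendre
    polynomial evaluated at the realizations."""
    sample = np.asarray(sample)
    c_hat = np.empty(n + 1)
    for i in range(n + 1):
        e_i = np.zeros(i + 1)
        e_i[i] = 1.0                                  # selects the polynomial of degree i
        norm = np.sqrt((2 * i + 1) / 2.0)             # orthonormalization on [-1, 1]
        c_hat[i] = norm * leg.legval(sample, e_i).mean()
    return c_hat
```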
The estimates of the expansion coefficients can also be obtained from expression (16) as
where statistical estimates of the initial moments of the random variable ξ are used. Using Formulae (4) and (5), we can write the expressions for the estimates of the density expansion coefficients separately for even indices as
and for odd indices as
The randomization of the projection estimate of the distribution function f is obtained from the second equation of Formula (14) and recurrence relations (20):
Here and below, the dependence of the estimates of both expansion coefficients and functions on the sample size N is not indicated for simplicity.
Remark 3. To obtain a projection estimate of the distribution function f based on the first Legendre polynomials, it is necessary to find estimates of the expansion coefficients of the density g with respect to the first Legendre polynomials.
Further, we present Algorithm 1 for jointly obtaining randomized projection estimates of the density and distribution function of the random variable ξ.
Algorithm 1: Jointly obtaining projection estimates of the density and distribution function using explicit formulae for the Legendre polynomials.
0. Set the projection expansion length n and the sample size N.
1. Simulate N realizations of the random variable ξ.
2. Find statistical estimates of the initial moments of the random variable ξ using Formula (23).
3. Find estimates of the expansion coefficients of the density g using Formula (22) or Formulae (24) and (25).
4. Find estimates of the expansion coefficients of the distribution function f using Formula (26).
5. Find the randomized projection estimates of the density and distribution function using Formula (14).
In step 3 of Algorithm 1, errors can occur due to the peculiarities of machine arithmetic when calculating the expansion coefficient estimates. To avoid this, it is recommended to use Formula (21) together with recurrence relation (1) and expression (6).
Next, we formulate Algorithm 2 for jointly obtaining randomized projection estimates of the density and distribution function.
Algorithm 2: Jointly obtaining projection estimates of the density and distribution function using the recurrence relation for the Legendre polynomials.
0. Set the projection expansion length n and the sample size N.
1. Simulate N realizations of the random variable ξ.
2. Find estimates of the expansion coefficients of the density g using Formula (21) together with recurrence relation (1) and expression (6).
3. Find estimates of the expansion coefficients of the distribution function f using Formula (26).
4. Find the randomized projection estimates of the density and distribution function using Formula (14).
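To make the workflow concrete, here is a sketch that follows the structure of Algorithm 2 under simplifying assumptions: the polynomials are orthonormalized on [−1, 1], and the distribution function estimate is obtained by numerically integrating the density estimate rather than by the recurrence relations (20) and (26), which are not reproduced here; the test sample is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def joint_projection_estimates(sample, n):
    """Sketch of Algorithm 2: estimate the density and distribution function
    of a random variable supported on [-1, 1] from a sample, using a
    projection expansion of length n in orthonormal Legendre polynomials."""
    sample = np.asarray(sample)
    norms = np.sqrt((2 * np.arange(n + 1) + 1) / 2.0)    # orthonormalization factors
    basis = np.eye(n + 1)                                # rows select individual polynomials
    # Estimates of the expansion coefficients of the density g
    c_hat = np.array([norms[i] * leg.legval(sample, basis[i]).mean()
                      for i in range(n + 1)])
    # Projection estimate of the density
    def g_hat(x):
        return sum(c_hat[i] * norms[i] * leg.legval(x, basis[i])
                   for i in range(n + 1))
    # Distribution function estimate: antiderivative of g_hat on a grid
    # (the paper uses recurrence relations for the coefficients of f instead)
    grid = np.linspace(-1.0, 1.0, 2001)
    vals = g_hat(grid)
    cdf_grid = np.concatenate(([0.0],
                               np.cumsum((vals[1:] + vals[:-1]) / 2 * np.diff(grid))))
    def f_hat(x):
        return np.interp(x, grid, cdf_grid)
    return g_hat, f_hat

# Usage with a hypothetical sample on [-1, 1]
rng = np.random.default_rng(0)
xi = 2.0 * rng.beta(2.0, 3.0, size=10_000) - 1.0
g_hat, f_hat = joint_projection_estimates(xi, n=16)
```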
4. Analysis and Conditional Optimization of Randomized Projection Estimates
In this section, we analyze the error of the projection estimate relative to the density g of the random variable ξ in the L2 norm. By Jensen's inequality [21], it suffices to study the mean square error. Further, we consider the following expression:
The two functions in this expression are orthogonal in L2 since they belong to different subspaces: the first one is formed by the Legendre polynomials of degrees not exceeding n, and the second one is formed by the Legendre polynomials of higher degrees. This is a consequence of the equalities
hence,
According to Parseval's identity [9], we have
and
Therefore, taking into account that the unbiased estimate of the mathematical expectation is used [18], we obtain
where the variance of the coefficient estimates appears.
The equality
is satisfied by the construction of the estimates. For the other coefficients, the variance of the estimates and the sample size N are inversely proportional [22]; therefore, additionally using inequality (9) under the condition from Remark 1, we find
where the constants are independent of n and N.
The error of the projection estimate relative to the distribution function f of the random variable ξ in the L2 norm is analyzed similarly:
where the constants are independent of n and N.
From the obtained estimates, it is clear how the mean square error depends on the projection expansion length n and the sample size N.
Further, we consider the problem of the optimal (consistent) choice of parameters of the statistical algorithms for obtaining projection estimates of the density and distribution function: the projection expansion length n and the sample size N. For jointly obtaining these estimates by Algorithms 1 and 2, it is sufficient to consider the optimal choice of parameters for the density estimation only and to use them for the distribution function estimation. This is because the degree of smoothness of f is greater than that of g, so the projection expansion converges to f faster than it converges to g.
The main result is stated using the symbol “≍”. For suitable functions u and v of n and N, the expression u ≍ v means that u = O(v) and v = O(u); i.e., there exist positive constants c1 and c2 such that c1 v ≤ u ≤ c2 v, where n and N are natural numbers.
Theorem 1. Let the density g of the random variable ξ belong to the smoothness space considered above, and let the randomized projection estimate of the density be obtained by Algorithm 1 or 2 with the projection expansion length n and the sample size N. Then, the minimum complexity of obtaining the estimate is achieved with the parameters n and N that satisfy the relations
where γ is the required approximation accuracy for the density g in the L2 norm.
Proof. To find the optimal parameters n and N for the estimate, it is sufficient to equate the terms [8] in Formula (28) and express the required approximation accuracy γ from the relation
From the equality
we obtain the relationship for the optimal parameters, as well as the expressions for n and N in terms of γ. □
Theorem 1 establishes the relationship between the parameters n and N in Algorithms 1 and 2, as well as the dependence of the approximation accuracy on the parameter s. By choosing the parameters in this way, we have
where the constant is independent of n and N.
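As a numerical illustration of this conditional optimization (and of the equalization of the error components used in the proof), the following sketch assumes a model squared-error bound of the form C1·n^(−2s) + C2·n/N; the constants and the exact form are illustrative assumptions, not the paper's Formula (28).

```python
import math

def conditionally_optimal_parameters(gamma, s, c1=1.0, c2=1.0):
    """Equalize the deterministic term c1 * n**(-2 * s) and the stochastic
    term c2 * n / N of an assumed squared-error bound, so that each one
    equals gamma**2 / 2 for the prescribed accuracy gamma."""
    n = math.ceil((2.0 * c1 / gamma ** 2) ** (1.0 / (2.0 * s)))
    N = math.ceil(2.0 * c2 * n / gamma ** 2)
    return n, N

# Example: accuracy 0.01 for smoothness s = 2
print(conditionally_optimal_parameters(gamma=0.01, s=2.0))   # (12, 240000)
```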
Remark 4. The error (28) of the randomized projection estimate relative to the density g is based on inequality (9), but inequality (10) can also be used. Then, taking into account Remark 1, we have
It is possible to formulate an analogue of Theorem 1 and show that the following relationship between the parameters for estimating the density g is conditionally optimal:
Remark 5. If the distribution function f of the random variable ξ has the corresponding degree of smoothness (this condition holds if the conditions of Theorem 1 are satisfied for the density g), then the minimum complexity of obtaining the randomized projection estimate by Algorithm 1 or 2 is achieved with parameters n and N that satisfy the relations
where γ is the required approximation accuracy for the function f in the L2 norm. The proof is similar to that of Theorem 1. Another relationship between the parameters can be found if inequality (10) is taken instead of inequality (9) (see Remark 4).

5. Two-Parameter Family of Densities with Different Degrees of Smoothness
In this section, a special example is proposed to test statistical algorithms for obtaining projection estimates depending on the projection expansion length, sample size, and smoothness of the estimated function.
5.1. Densities with Different Degrees of Smoothness and Related Distribution Functions
Let ξ be the random variable defined by the density of the form
where the two parameters are natural numbers and the normalizing constant is given by
because
The function has the following properties:
(a) The density is continuous on the set of real numbers;
(b) Its support is the set Ω;
(c) The normalization condition holds:
(d) The density is differentiable only r times at the corresponding points:
(e) The derivative of order r + 1 exists almost everywhere on Ω; if this derivative is understood in a generalized sense, then
Next, we determine the distribution function for the random variable ξ with density (30):
It is easy to see that the distribution function equals zero to the left of the support and one to the right of it. Moreover,
and consequently,
The distribution function is differentiable one more time than the density at the corresponding points, since
due to relationship (31). Thus,
If we do not restrict ourselves to the space with natural s (see Remark 1), then sharper degrees of smoothness can be established for the density and distribution function.
Consider the first case. Then, the generalized derivative is represented as a linear combination of two functions, one of which involves an indicator function. It suffices to find a condition on the parameter which ensures the required degree of smoothness, where the corresponding quantity is defined by Formula (11).
If x and y have the same sign, then
and if x and y have different signs, then
Hence,
The integrals on the right-hand side of the latter equality coincide since the integrand does not change when the signs of x and y change simultaneously. The convergence condition for them imposes a restriction on the parameter. Indeed, under this restriction we have
and
Otherwise, these integrals obviously diverge. The second case is treated similarly, so, finally, the stated degrees of smoothness of the density and distribution function hold, provided that the parameters satisfy the corresponding conditions.
5.2. Modeling Random Variables with Given Test Distributions Using Monte Carlo Method
The modeling formula for the random variable ξ with the given distribution function and parameters can be derived using the inverse function method [22]: a realization of ξ is obtained by applying the inverse distribution function to a random variable having a uniform distribution on the unit interval.
Given the distribution function, we can formulate Algorithm 3 for modeling the random variable ξ.
Algorithm 3: Modeling the random variable with given test density and distribution function.
0. Set the parameters and calculate the required constants:
1. Obtain a realization of the random variable having a uniform distribution on the unit interval.
2. If this realization does not exceed the threshold value, then the realization of ξ is a root of algebraic Equation (33) from the corresponding interval; otherwise, it is a root of algebraic Equation (34) from the corresponding interval.
For small parameter values, the roots can be found analytically. Next, we obtain the modeling formulae for the two cases used below.
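Since the test distribution functions are increasing piecewise polynomials, realizations can alternatively be generated by numerical inversion, without the closed-form roots of Propositions 1 and 2. A minimal sketch, assuming the distribution function F is available as a Python callable that is continuous and increasing on [a, b] (the placeholder F below is illustrative, not the paper's test function):

```python
import numpy as np
from scipy.optimize import brentq

def sample_by_inversion(F, a, b, size, rng=None):
    """Inverse function method: for each uniform value u, solve F(x) = u on
    [a, b] by bracketed root finding (F must be continuous and increasing)."""
    rng = np.random.default_rng() if rng is None else rng
    u_values = rng.uniform(0.0, 1.0, size)
    return np.array([brentq(lambda x, u=u: F(x) - u, a, b) for u in u_values])

# Usage with a placeholder distribution function on [-1, 1]
F = lambda x: (x + 1.0) ** 2 / 4.0
xi = sample_by_inversion(F, -1.0, 1.0, size=1000)
```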
Proposition 1. For the random variable ξ with the density
the modeling formula is as follows:
where
Proof. The density g of the random variable ξ is included in the two-parameter family (30) for specific parameter values. For the given parameters, we have
First, we consider the first case; i.e., we should solve Equation (33). This is the quadratic equation
with coefficients determined by the given parameters.
The corresponding quadratic function attains a negative minimum on the relevant interval. This means that the quadratic equation has two real roots (the discriminant is positive), and the largest of them determines the realization of ξ:
Second, we consider the other case; i.e., we should solve Equation (34). This is the cubic equation
with coefficients determined by the given parameters.
The corresponding cubic function has two extrema, a minimum and a maximum, whose values have opposite signs. This means that the cubic equation has three real roots (the discriminant is positive), and the realization of ξ is determined by the root that lies between the smallest and the largest roots. By using Cardano's formulae for the roots [23], we have
where the quantities under the cube roots correspond to the positive discriminant; therefore, A and B are complex conjugate numbers:
Writing A and B in trigonometric form and comparing their modulus and argument, we obtain
and
Thus,
i.e., Formula (35) is valid. □
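The step with the complex conjugate quantities A and B is the trigonometric form of Cardano's formulae for a cubic with three real roots. A generic sketch for a depressed cubic t^3 + p·t + q = 0 in this case, cross-checked against numpy.roots (the coefficients below are illustrative, not those of Equation (34)):

```python
import numpy as np

def depressed_cubic_real_roots(p, q):
    """Three real roots of t**3 + p * t + q = 0 when the cubic has three real
    roots (p < 0), via the trigonometric form of Cardano's formulae."""
    rho = 2.0 * np.sqrt(-p / 3.0)
    phi = np.arccos(3.0 * q / (p * rho))
    return np.array([rho * np.cos((phi - 2.0 * np.pi * k) / 3.0) for k in range(3)])

# Cross-check on an illustrative cubic: t**3 - 7t + 6 = 0 has roots -3, 1, 2
p, q = -7.0, 6.0
print(np.sort(depressed_cubic_real_roots(p, q)))       # [-3.  1.  2.]
print(np.sort(np.roots([1.0, 0.0, p, q]).real))        # same roots via numpy
```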
Proposition 2. For the random variable ξ with the density
the modeling formula is as follows:
where
Proof. The density g of the random variable ξ is included in the two-parameter family (30) for specific parameter values. In this case,
The proof is the same as for Proposition 1, so some details are omitted. We only note that, in the first case, Equation (33) is the quartic equation
whose polynomial attains a negative minimum on the relevant interval. The quartic equation has two real roots, and the largest of them determines the realization of ξ. It is convenient to find the roots using Ferrari's formulae [23].
In the second case, Equation (34) is a cubic equation that is solved similarly to the cubic equation from the proof of Proposition 1. Such reasoning leads to Formula (36). □
5.3. Expansion Coefficients of Test Functions (Fourier–Legendre Series)
To exactly calculate the second term of the projection estimate error (27) in the examples, we find the expansion coefficients of the test density, as well as the expansion coefficients of the test distribution function, with respect to Legendre polynomials (6):
First, we obtain the following values:
We multiply the left-hand and right-hand sides of relation (1) by the corresponding factor and integrate over the interval:
or
Similarly, by multiplying the left-hand and right-hand sides of relation (1) by another factor and integrating over the interval, we obtain
These relations can be formally applied for all but the initial indices; therefore, the initial case should be considered separately:
For the initial indices, we have
and then we use relation (17):
If the index i is even, then the corresponding value follows from Formula (5). If i is odd, then we can apply the explicit Formula (4) or obtain an additional recurrence relation. We choose the latter way and take into account relation (1):
The same reasoning leads to the following results:
and
for even i. For odd i, we have
Thus, we obtain the general expression for calculating these values with arbitrary non-negative integer indices:
so that the expansion coefficients of the test density with respect to Legendre polynomials (6) are expressed as follows (relation (37) is also used here):
Expressions for the expansion coefficients of the test distribution function are obtained similarly:
These expressions for the expansion coefficients of the density and distribution function are used for their approximation as
and for the approximation error:
5.4. Analysis of Convergence Rate for Expansions of Test Functions
Consider the function
where the parameter is a natural number. Its expansion coefficients with respect to Legendre polynomials (6) are expressed through the previously found values:
Further, we derive a recurrence relation for these coefficients, different from relation (38). We multiply the left-hand and right-hand sides of relation (1) by the considered function and integrate over the interval:
Next, we use the rule of integration by parts:
and consequently, taking into account equality (17), we obtain
therefore,
or
i.e.,
The series formed by the squared expansion coefficients converges. It can be represented as a sum of two series:
The Raabe–Duhamel test [24] implies that the first series (and similarly the second one) converges in the same way as a comparison series
since the corresponding sequence of ratios has a limit, while the convergence of this series is equivalent to the convergence of an integral
which takes place under the corresponding condition on the parameter.
As a result, using Parseval's identity, we find
where
and this corresponds to estimate (9) with the limit value of the smoothness parameter (see Remark 1).
The obtained result can be transferred to the reflected function and its expansion coefficients with respect to Legendre polynomials (6). The easiest way to prove this is to use the relation between the expansion coefficients of an arbitrary function and those of its reflection [25]. Therefore, the same result holds for the test density. This result can be extended to the test distribution function, since its degree of smoothness is greater by one than that of the density.
Thus,
where the involved quantities are some constants. Moreover, the parameters can be assumed to be real positive numbers, and if the considered function is not required to be a density, i.e., is not bound by the probability-theoretic framework, then a wider range of parameter values is admissible. Such a convergence reflects the degrees of smoothness of the test density and distribution function. It corresponds to estimate (9).
6. Numerical Experiments
In this section, we present the results of the joint estimation of the density and distribution function for two examples that use the two-parameter family (30) of densities with different degrees of smoothness. In these examples, the results are presented in tables that contain the errors of the projection estimates of the density and distribution function in the L2 norm. We study the dependence of the error on the projection expansion length (for the maximum degree n of the Legendre polynomials, the values 4, 8, 16, 32, and 64 are used), on the sample size N, and on the degree of smoothness of the approximated density (see Examples 1 and 2 below). Algorithm 2 is applied for the estimation.
These examples show the approximation errors of the density and distribution function, which are calculated using Formula (39), i.e., the deterministic components of the projection estimate errors. In the tables, they are given in the rows marked by the symbol “∗”. The remaining rows contain errors that include both deterministic and stochastic components. The formulae for these errors follow from Parseval's identity:
where the estimates of the corresponding expansion coefficients are used. For an arbitrary density g, a similar formula was used to obtain estimate (27).
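For reference, the Parseval-based error computation used in the tables can be sketched as follows, assuming the exact expansion coefficients (a sufficiently long vector), their estimates for the first n + 1 indices, and the squared L2 norm of the estimated function are available; the names are illustrative.

```python
import numpy as np

def projection_errors(c_exact, c_hat, norm_sq):
    """L2 errors via Parseval's identity: the deterministic component is the
    tail of the exact coefficients, and the total error additionally includes
    the squared deviations of the estimated coefficients."""
    c_exact = np.asarray(c_exact)
    c_hat = np.asarray(c_hat)
    k = len(c_hat)                                   # k = n + 1 retained coefficients
    deterministic_sq = norm_sq - np.sum(c_exact[:k] ** 2)
    total_sq = deterministic_sq + np.sum((c_hat - c_exact[:k]) ** 2)
    return np.sqrt(deterministic_sq), np.sqrt(total_sq)
```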
Example 1. Let the density be taken from the two-parameter family (30), and let the related distribution function be described by Formula (32), with the corresponding parameter values. The modeling formula is given in Proposition 1. The density is continuous, non-differentiable at one point, but its first-order derivative exists almost everywhere on Ω. However, if we do not restrict ourselves to the space with natural s (see Remark 1), then a larger (fractional) degree of smoothness holds for the density, and the degree of smoothness of the distribution function is greater by one. This corresponds to Formulae (40), which take a specific form for the given parameters. According to Theorem 1, with the limit value of the smoothness parameter, the conditionally optimal parameters n and N required to achieve the approximation accuracy γ satisfy the corresponding relationship; this is confirmed by the statistical modeling results (see Table 1 and Table 2). In the row “∗” of Table 1, the deterministic component of the error decreases by an approximately constant factor when the projection expansion length n is doubled (similarly in Table 2). In the rest of these tables, the errors corresponding to the optimal parameters n and N are highlighted in bold, and they are consistent with the obtained relationship. Table 2 demonstrates the higher accuracy of the distribution function estimation. In the calculations with the formulae for the errors, the squared norms of the test density and distribution function are used:
For clarity, Figure 1 shows the approximation errors in graphical form. One axis corresponds to the value that determines the projection expansion length, and another axis corresponds to the value that determines the sample size; the vertical axis corresponds to the density approximation errors. The example under consideration corresponds to the left part of Figure 1, which presents two surfaces. The first one (red) corresponds to the obtained computational error; it is formed from the data in Table 1. The second one (blue, with marked nodes) corresponds to the theoretical error according to Formula (28):
The constants in this formula are approximately determined from the condition of the minimum sum of squared deviations between the theoretical and computational errors.
Example 2. Let the density be taken from the two-parameter family (30), and let the related distribution function be described by Formula (32), with the corresponding parameter values. The modeling formula is given in Proposition 2. The density is continuous, differentiable everywhere on Ω, and its second-order derivative exists almost everywhere on Ω. Considering the space with real non-negative s (see Remark 1), a larger (fractional) degree of smoothness holds for the density, and the degree of smoothness of the distribution function is greater by one. This corresponds to Formula (40) when the given parameters are substituted:
Theorem 1 implies that, with the limit value of the smoothness parameter, the conditionally optimal parameters n and N required to achieve the approximation accuracy γ satisfy the corresponding relationship, and this is illustrated by the statistical modeling results from Table 3 and Table 4. In the row “∗” of Table 3, if the projection expansion length n is doubled, then the deterministic component of the error decreases by an approximately constant factor (similarly in Table 4). In the rest of these tables, the errors corresponding to the optimal parameters n and N are shown in bold, and they are consistent with the obtained relationship. Table 4 shows the higher accuracy of the distribution function estimation. In the calculations with the formulae for the errors, the squared norms of the test density and distribution function are used:
Figure 1 contains a graphical representation of this numerical experiment; the meaning of the axes is described in Example 1. This example corresponds to the right part of Figure 1 with two surfaces. The first one (red) is constructed from the computational error given in Table 3, and the second one (blue, with marked nodes) corresponds to the theoretical error according to Formula (28):
where the constants are chosen from the same condition as in Example 1.

7. Comparison of Projection Density Estimate and Histogram
The classical method of estimating the density of a random variable is associated with the histogram [18], which is very often used in applied problems. We treat the histogram as a projection estimate since the main results of this paper are related to projection estimates.
We can define block pulse functions on the set Ω as
where L, a natural number, is the number of block pulse functions and h is the corresponding step. It is advisable to redefine the corresponding function in such a way that it becomes continuous on the left at the endpoint.
Block pulse functions (41) form an orthonormal system of functions in L2(Ω). This system is not complete, but it can be used to approximate an arbitrary function u:
where
For L = 2^m with natural m, the first L Walsh or Haar functions on Ω are exactly expressed through block pulse functions (41). Therefore, the results of this section can easily be adapted to the projection estimates of the distribution of a random variable using Walsh or Haar functions, which form complete orthonormal systems of functions [26].
The approximation accuracy in the L2 norm is usually estimated as follows [27]:
where the constant does not depend on L, under the corresponding smoothness condition.
As the function u in the given formulae, we can use the density g and the distribution function f. We restrict ourselves to the density g only (with a separate notation for the corresponding expansion coefficients):
where
Thus, the calculation of the expansion coefficients of the density from the two-parameter family (30) is reduced to the calculation of values of the corresponding distribution function described by Formula (32), i.e.,
To calculate the approximation error for the density, we can use a formula similar to the first equation of Formula (39):
The histogram can be defined by the expression based on approximation (43):
where the estimates of the expansion coefficients are based on observations of the random variable ξ. For example,
where ξ_l is the lth realization and N is the sample size (number of realizations). The value of h, which depends on L, specifies the histogram step.
The error of the histogram relative to the density g includes deterministic and stochastic components, and it is estimated from below by its deterministic component.
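For comparison with the Legendre-based estimates, the histogram built from block pulse functions can be sketched as follows, assuming the support [−1, 1] split into L equal bins of width h = 2/L; the sample below is the same hypothetical one as in the earlier sketches.

```python
import numpy as np

def block_pulse_density_estimate(sample, L, a=-1.0, b=1.0):
    """Histogram as a projection estimate onto L block pulse functions: on
    each bin, the estimate equals (bin count) / (N * h) with h = (b - a) / L."""
    sample = np.asarray(sample)
    h = (b - a) / L
    counts, _ = np.histogram(sample, bins=L, range=(a, b))
    heights = counts / (sample.size * h)
    def g_hist(x):
        idx = np.clip(((np.asarray(x) - a) / h).astype(int), 0, L - 1)
        return heights[idx]
    return g_hist

# Usage with the same hypothetical sample on [-1, 1]
rng = np.random.default_rng(0)
xi = 2.0 * rng.beta(2.0, 3.0, size=10_000) - 1.0
g_hist = block_pulse_density_estimate(xi, L=16)
```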
The results of calculations using Formula (44) for both densities from Examples 1 and 2 are given in Table 5 and Table 6, respectively. They should be compared with the rows “∗” in Table 1 and Table 3. Such a comparison shows the undoubted advantage of projection density estimates using Legendre polynomials: the approximation accuracy achieved with a large number of block pulse functions corresponds to the accuracy achieved with Legendre polynomials of noticeably lower maximum degree in Examples 1 and 2. If the number of block pulse functions L is doubled, then the deterministic component of the error decreases by approximately 2 times, and this conclusion does not depend on the degree of smoothness of the estimated density.
The problem of the conditional optimization of the algorithm for obtaining the histogram was solved in [1]. In this problem, the optimal parameters satisfy the relationship
which corresponds to inequality (42). A generalization of this result in the context of stochastic differential equations can be found in [3]. With the parameters chosen in this way, the computational error will be approximately twice as large as its deterministic component, since the conditional optimization of the algorithm for obtaining the histogram assumes the equalization of the deterministic and stochastic components.
For the densities from Examples 1 and 2, to reduce the approximation error of the histogram by 2 times, it is necessary to increase the number of block pulse functions L by 2 times and the sample size N by the corresponding factor. The projection estimates of densities using Legendre polynomials are more effective in these examples: to reduce the approximation error by 2 times, it suffices to increase the projection expansion length n and the sample size N by factors determined by the degree of smoothness (in this case, increasing n and N implies their subsequent rounding-up).