Estimating Smoothness and Optimal Bandwidth for Probability Density Functions

Dimitris N. Politis; Peter F. Tarassenko; Vyacheslav A. Vasiliev

doi:10.3390/stats6010003

¹ Department of Mathematics and Halicioglu Data Science Institute, University of California, San Diego, CA 92093-0112, USA

² Institute of Applied Mathematics and Computer Science, Tomsk State University, 36 Lenin Ave., 634050 Tomsk, Russia

* Author to whom correspondence should be addressed.

Stats 2023, 6(1), 30-49; https://doi.org/10.3390/stats6010003

This article belongs to the Special Issue Advances in Probability Theory and Statistics

Abstract

The properties of non-parametric kernel estimators for probability density functions from two special classes are investigated. Each class is parametrized by a distribution smoothness parameter. One of the classes was introduced by Rosenblatt; the other is introduced in this paper. For the case of a known smoothness parameter, the rates of mean square convergence of optimal (with respect to the bandwidth) density estimators are found. For the case of an unknown smoothness parameter, an estimation procedure for the parameter is developed and its almost sure convergence is proved. The convergence rates in the almost sure sense of these estimators are obtained. Adaptive estimators of densities from the given classes, based on the constructed smoothness parameter estimators, are presented. Examples show how the parameters of the adaptive density estimation procedures can be chosen. Non-asymptotic and asymptotic properties of these estimators are investigated. Specifically, upper bounds for the mean square error of the adaptive density estimators for a fixed sample size are found and their strong consistency is proved. The convergence of these estimators in the almost sure sense is established. Simulation results illustrate the realization of the asymptotic behavior when the sample size grows large.

Keywords:

non-parametric kernel density estimators; adaptive density estimators; mean square and almost sure convergence; rate of convergence; smoothness class

MSC:

62G07; 62G20; 62F12; 62G99

1. Introduction

Let

X_{1}, \dots, X_{n}

be independent identically distributed random variables (i.i.d. r.v.’s) having a probability density function f. In the typical non-parametric set-up, nothing is assumed about f except that it possesses a certain degree of smoothness, e.g., that it has r continuous derivatives.

Estimating f via kernel smoothing is a sixty-year-old problem; M. Rosenblatt, who was one of its originators, discusses the subject’s history and evolution in the monograph [1]. For some point x, the kernel smoothed estimator of

f (x)

is defined by

f_{n, h} (x) = \frac{1}{n} \sum_{j = 1}^{n} \frac{1}{h} K (\frac{x - X_{j}}{h})

(1)

where the kernel

K (\cdot)

is a bounded function satisfying

\int K (x) d x = 1

and

\int K^{2} (x) d x < \infty

, and the positive bandwidth parameter h is a decreasing function of the sample size n.
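As a minimal illustration of definition (1), the following sketch evaluates the estimator with a Gaussian kernel. The Gaussian kernel, the function name `kde`, and the bandwidth n^{-1/5} are assumptions chosen only for illustration; the paper itself works with flat-top kernels.

```python
import numpy as np

def kde(x, sample, h):
    """Kernel estimator (1) with a standard Gaussian kernel K (illustrative choice)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - sample[None, :]) / h           # (x - X_j) / h for every pair
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # K(u), integrates to 1
    return k.mean(axis=1) / h                        # (1/n) * sum_j K(u_j) / h

rng = np.random.default_rng(0)
sample = rng.standard_normal(2000)     # i.i.d. draws from a known density
h = len(sample) ** (-1.0 / 5.0)        # rate for a second-order kernel (assumption)
est = kde([0.0, 1.0], sample, h)       # estimates of f(0) and f(1)
```

The bandwidth enters twice, once inside the kernel argument and once as the normalizing factor 1/h, which is what keeps the estimate integrating to one.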

If

K (\cdot)

has finite moments up to q-th order, and moments of order up to

q - 1

equal to zero, then q is called the ‘order’ of the kernel

K (\cdot)

. Since the unknown function f is assumed to have r continuous derivatives, it typically follows that

\mathrm{Var} (f_{n, h} (x)) = \frac{C_{f, K} (x)}{h n} + o (\frac{1}{h n}),

and

\mathrm{Bias} (f_{n, h} (x)) = c_{f, K} (x) h^{k} + o (h^{k}),

where

k = min (q, r)

, and

C_{f, K} (x), c_{f, K} (x)

are bounded functions depending on

K (\cdot)

as well as f and its derivatives, cf. [1] p. 8.

The idea of choosing a kernel of order q greater than or equal to r in order to ensure that

\mathrm{Bias} (f_{n, h} (x))

is

O (h^{r})

dates back to the early 1960s in the work of [2,3]; recent references on higher-order kernels include [4,5,6,7,8,9,10]. Note that since r is typically unknown and can be arbitrarily large, it is possible to use kernels of infinite order that achieve the minimal bias condition

\mathrm{Bias} (f_{n, h} (x)) = O (h^{r})

for any r; Ref. [11] gives many properties of kernels of infinite order. In this paper, we will employ a particularly useful class of infinite order kernels, namely the flat-top family; see [12] for a general definition.

It is a well-known fact that optimal bandwidth selection is perhaps the most crucial issue in such non-parametric smoothing problems; see [13], as well as the book [14]. The goal typically is minimization of the large-sample mean squared error (MSE) of

f_{n, h} (x)

. However, to perform this minimization, the practitioner needs to know the degree of smoothness r, as well as the constants

C_{f, K} (x)

and

c_{f, K} (x)

. Using an infinite order kernel and focusing just on optimizing the order of magnitude of the large-sample MSE, it is apparent that the optimal bandwidth h must be asymptotically of order

n^{- 1 / (2 r + 1)}

; this yields a large-sample MSE of order

n^{- 2 r / (2 r + 1)}

.
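The stated orders can be recovered by balancing the variance against the squared bias. The following worked sketch uses generic positive constants C and c as stand-ins for C_{f,K}(x) and c_{f,K}(x) (an assumption made to keep the display short) and takes k = r, as for an infinite order kernel.

```latex
\mathrm{MSE}(h) \approx \frac{C}{n h} + c^{2} h^{2 r},
\qquad
\frac{d}{d h} \mathrm{MSE}(h) = - \frac{C}{n h^{2}} + 2 r c^{2} h^{2 r - 1} = 0
\;\Longrightarrow\;
h_{\mathrm{opt}} = \left( \frac{C}{2 r c^{2} n} \right)^{\frac{1}{2 r + 1}} \asymp n^{- \frac{1}{2 r + 1}},
\qquad
\mathrm{MSE}(h_{\mathrm{opt}}) \asymp n^{- \frac{2 r}{2 r + 1}} .
```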

A generalization of the above scenario is possible using a degree of smoothness r that is understood in a different sense and is not necessarily an integer. Let

[r]

denote the integer part of r, and define

γ = r - [r]

; then, one may assume that f has

[r]

continuous derivatives, and that the

[r]

th derivative satisfies a Lipschitz condition of order

γ

. Interestingly, even in this case where f is assumed to belong to the Hölder class of degree r (the derivative of order [r] of the density function satisfies a Lipschitz condition of order γ), the MSE–optimal bandwidth h is still of order

n^{- 1 / (2 r + 1)}

and again yields a large-sample MSE of the order

n^{- 2 r / (2 r + 1)}

(see, e.g., [15,16,17,18] among others).

The problem of course is that, as previously mentioned, the underlying degree of smoothness r is typically unknown. In Section 4 of the paper at hand, we develop an estimator

r_{n}

of r and prove its strong consistency; this is perhaps the first such result in the literature. In order to construct our estimator

r_{n}

, we operate under a class of functions that is slightly more general than the aforementioned Hölder class; this class of functions is formally defined in Section 2 via Equation (3) or (4).

Under such a condition on the tails of the characteristic function we are able to show in Section 3 that the optimized MSE of

{\hat{f}}_{n} (x)

is again of order

n^{- 2 r / (2 r + 1)}

for possibly non-integer

r;

this is true, for example, when the characteristic function

ϕ (s)

has tails of order

O (1 / | s |^{r + 1}),

see Example 2.

Furthermore, in Section 5 we develop an adaptive estimator

{\hat{f}}_{n} (x)

that achieves the optimal MSE rate of

n^{- 2 r / (2 r + 1)}

within a logarithmic factor despite the fact that r is unknown; see the Examples after Theorem 3. A similar effect arises in the adaptive estimation problem for densities from the Hölder class; see [18,19,20]. It should be pointed out that problems of asymptotically optimal adaptive density estimation for other classes have also been considered in the literature; see, e.g., [14,21,22,23].

The construction of

{\hat{f}}_{n} (x)

is rather technical; it uses the new estimator

r_{n}

, and it is inspired by the construction of sequential estimates, although we are in a fixed-n, non-sequential setting. As the major theoretical result of our paper, we are able to prove a non-asymptotic upper bound for the MSE of

{\hat{f}}_{n} (x)

that satisfies the above-mentioned optimal rate. Section 6 contains some simulation results showing the performance of the new estimator

{\hat{f}}_{n} (x)

in practice. All proofs are deferred to Section 7, while Section 8 contains our conclusions and suggestions for future work.

2. Problem Set-Up and Basic Assumptions

Let

X_{1}, \dots, X_{n}

be i.i.d. random variables having a probability density function f. Denote by

ϕ (s) = \int e^{i s x} f (x) d x

the characteristic function of f, and define the sample characteristic function

ϕ_{n} (s) = \frac{1}{n} \sum_{k = 1}^{n} e^{i s X_{k}} .
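The sample characteristic function is straightforward to compute. The sketch below is only an illustration: the function name `ecf` and the Laplace test distribution are assumptions. The Laplace(0, 1) law is used because its characteristic function 1/(1 + s^2) is known in closed form and decays like |s|^{-2}.

```python
import numpy as np

def ecf(s, sample):
    """Sample characteristic function phi_n(s) = (1/n) sum_k exp(i s X_k)."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return np.exp(1j * s[:, None] * sample[None, :]).mean(axis=1)

rng = np.random.default_rng(1)
sample = rng.laplace(size=5000)            # Laplace(0, 1): phi(s) = 1 / (1 + s^2)
s = np.array([0.0, 0.5, 1.0, 2.0])
err = np.abs(ecf(s, sample) - 1.0 / (1.0 + s**2))   # pointwise estimation error
```

At a fixed s the error is of order n^{-1/2}, so the sample characteristic function carries no usable information at frequencies where |ϕ(s)| itself is below that noise level.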

For some finite

r > 0

, define two families

F_{r}^{+}

and

F_{r}

of bounded, i.e.,

\exists 0 < \bar{f} < \infty : sup_{y \in R^{1}} f (y) \leq \bar{f},

(2)

and continuous functions f satisfying one of the following conditions, respectively:

\int {| s |}^{r} | ϕ (s) | d s < \infty, \int {| s |}^{r + ε} | ϕ (s) | d s = \infty, for all ε > 0,

(3)

\int {| s |}^{r - ε} | ϕ (s) | d s < \infty, \int {| s |}^{r} | ϕ (s) | d s = \infty, for all 0 < ε < r .

(4)

In other words,

F_{r}^{+}

is the family of functions (introduced by M. Rosenblatt) satisfying (2) and (3), while

F_{r}

is the family of functions (introduced in this paper) satisfying (2) and (4). It should be noted that the new class

F_{r}

is slightly wider than the classical class

F_{r}^{+}

.

In addition, define the family

F_{r, m}^{+}

(respectively,

F_{r, m}

) as the family of functions f that belong to

F_{r}^{+}

(respectively,

F_{r}

) but with f being such that its characteristic function

| ϕ (s) |

has monotonically decreasing tails.

Consider the class

Ξ

of non-parametric kernel smoothed estimators

f_{n, h} (x)

of

f (x)

as given in Equation (1). Note that we can alternatively express

f_{n, h} (x)

in terms of the Fourier transform of kernel

K (\cdot)

, i.e.,

f_{n, h} (x) = \frac{1}{n} \sum_{j = 1}^{n} \frac{1}{h} K (\frac{x - X_{j}}{h}) = \frac{1}{2 π} \int λ (s, h) ϕ_{n} (s) e^{- i s x} d s

(5)

where

λ (s, h) = \int \frac{1}{h} K (\frac{x}{h}) e^{i s x} d x .

In this paper, we will employ the family of flat-top infinite order kernels, i.e., we will let the function

λ (s, h)

be of the form

λ_{c} (s, h) = \begin{cases} 1 & \text{if } | s | \leq 1 / h, \\ g (s, h) & \text{if } 1 / h < | s | \leq c / h, \\ 0 & \text{if } | s | \geq c / h, \end{cases}

where c is a fixed number in

[1, \infty)

chosen by the practitioner, and

g (s, h)

is some properly chosen continuous, real-valued function satisfying

g (s, h) = g (- s, h),

g (s, 1) = g (s / h, h),

and

| g (s, h) | \leq 1,

for any

s,

with

g (1 / h, h) = 1,

and

g (c / h, h) = 0

; see [12,24,25,26] for more details on the above flat-top family of kernels.
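The Fourier representation (5) combined with a flat-top λ_c can be evaluated by direct numerical integration, since λ_c vanishes outside |s| ≤ c/h. The sketch below is a hedged illustration: the linear taper used for g, the value c = 2, the trapezoid quadrature, the function names, and the Laplace test sample are all assumptions, not choices prescribed by the text.

```python
import numpy as np

def flat_top(s, h, c=2.0):
    """lambda_c(s, h) with a linear taper g (one admissible choice, assumed here)."""
    a = np.abs(s) * h
    return np.clip((c - a) / (c - 1.0), 0.0, 1.0)   # 1 on |s|<=1/h, 0 on |s|>=c/h

def fourier_kde(x, sample, h, c=2.0, m=1001):
    """Estimator (5): (1/2pi) * integral of lambda_c(s,h) phi_n(s) exp(-isx) ds."""
    s = np.linspace(-c / h, c / h, m)               # support of lambda_c
    phi_n = np.exp(1j * s[:, None] * sample[None, :]).mean(axis=1)
    integrand = flat_top(s, h, c) * phi_n * np.exp(-1j * s * x)
    ds = s[1] - s[0]
    integral = ds * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))  # trapezoid rule
    return integral.real / (2.0 * np.pi)

rng = np.random.default_rng(2)
sample = rng.laplace(size=1000)      # Laplace(0, 1) density: f(x) = exp(-|x|) / 2
h = len(sample) ** (-1.0 / 3.0)      # n^{-1/(2r+1)} with r = 1 (assumption)
est = fourier_kde(1.0, sample, h)    # estimate of f(1)
```

With this taper, g(1/h, h) = 1, g(c/h, h) = 0, |g| ≤ 1, and g(s, 1) = g(s/h, h), so the listed requirements on g hold for any c > 1.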

Define

g_{h}^{'} (s, h)

the partial derivative of the function

g (s, h)

with respect to the bandwidth

h .

We will also assume that for some

c_{0} > 0

\limsup_{h \to 0} \sup_{1 / h < | s | < c / h} | g_{h}^{'} (s, h) | / | s | < c_{0} .

(6)

Denote for every

0 \leq γ < r

the functions

δ_{γ} (h) = \int_{1 / h < | s | < c / h} {| s |}^{r - γ} | ϕ (s) | d s, when h > 0, and δ_{γ} (0) = 0 .

From (3) and (4) it follows that

δ_{γ} (h) = o (1)

as

h \to 0

for

f \in F_{r}^{+}

and

γ = 0,

as well as for

f \in F_{r}

and

0 < γ < r .

In other cases

δ_{γ} (h) = \infty .

Define the following classes

{\bar{F}}_{r} = F_{r}^{+} \cup F_{r}

and

{\bar{F}}_{r, m} = F_{r, m}^{+} \cup F_{r, m} .

The main aim of the paper is the estimation of the parameter r of these classes and adaptive estimation of densities from the class

{\bar{F}}_{r}

with the unknown parameter

r .

3. Asymptotic Mean Square Optimal Estimation

The mean square error (MSE)

u_{f}^{2} (f_{n, h}) = E_{f} {(f_{n, h} (x) - f (x))}^{2}

of the estimators

f_{n, h} (x) \in Ξ,

f \in {\bar{F}}_{r}

has the following form:

u_{f}^{2} (f_{n, h}) = U_{f}^{2} (h, c) - \frac{1}{n} {(\int K (v) f (x - h v) d v)}^{2},

(7)

where

U_{f}^{2} (h, c)

is the principal term of the MSE,

U_{f}^{2} (h, c) = \frac{L_{1} f (x)}{n h} + {[\frac{1}{2 π} \int_{1 / h < | s | < c / h} (1 - g (s, h)) ϕ (s) e^{- i s x} d s]}^{2},

L_{1} = \int K^{2} (v) d v .

Thus, in particular,

sup_{f \in {\bar{F}}_{r}} |\int K (v) f (x - h v) d v| < \infty .

To minimize the principal term

U_{f}^{2} (h, c)

over h, we set its first derivative with respect to

h

to zero, which gives the following equality for the optimal (in the mean square sense) value

h^{0} = h^{0} (n) :

\int_{1 / h^{0} < | s | < c / h^{0}} (g (s, h^{0}) - 1) ϕ (s) e^{- i s x} d s \cdot \left\{ c ϕ (c / h^{0}) e^{- \frac{i c x}{h^{0}}} + c ϕ (- c / h^{0}) e^{\frac{i c x}{h^{0}}} + {(h^{0})}^{2} \int_{1 / h^{0} < | s | < c / h^{0}} g_{h}^{'} (s, h^{0}) ϕ (s) e^{- i s x} d s \right\} = \frac{2 π^{2} L_{1} f (x)}{n} .

(8)

From the definition of the class of kernels for cases

δ_{γ} (h) < \infty

we have

|\int_{1 / h < | s | < c / h} (g (s, h) - 1) ϕ (s) e^{- i s x} d s| \leq 2 h^{r - γ} δ_{γ} (h)

and for h small enough, according to (6)

|\int_{1 / h < | s | < c / h} g_{h}^{'} (s, h) ϕ (s) e^{- i s x} d s| \leq c_{0} h^{r - 1 - γ} δ_{γ} (h) .

Then, by the definition of the class

{\bar{F}}_{r, m},

for h small enough, denoting

c_{1} (γ) = \frac{r + 1 - γ}{2 (c^{r + 1 - γ} - 1)},

we have

δ_{γ} (h) \geq \int_{1 / h}^{c / h} {| s |}^{r - γ} d s \cdot [inf_{1 / h < s < c / h} | ϕ (s) | + inf_{1 / h < s < c / h} | ϕ (- s) |]

\geq {(c_{1} (γ))}^{- 1} h^{- (r + 1 - γ)} [| ϕ (c / h) | + | ϕ (- c / h) |] .

Thus, for

h \ll 1,

| ϕ (c / h) | + | ϕ (- c / h) | \leq c_{1} (γ) h^{r + 1 - γ} δ_{γ} (h)

and from (8) it follows

{(h^{0})}^{2 r + 1 - 2 γ} δ_{γ}^{2} (h^{0}) \geq \frac{π^{2} L_{1} f (x)}{(c_{0} + c_{1} (γ)) n} .

Define the number

h_{1}^{0} = h_{1}^{0} (n)

from the equality

{(h_{1}^{0})}^{2 r + 1 - 2 γ} δ_{γ}^{2} (h_{1}^{0}) = \frac{π^{2} L_{1} f (x)}{(c_{0} + c_{1} (γ)) n} .

(9)

It is obvious that

0 < h_{1}^{0} \leq h^{0}

and

{(h_{1}^{0})}^{2 r + 1 - 2 γ} δ_{γ}^{2} (h_{1}^{0}) \leq {(h^{0})}^{2 r + 1 - 2 γ} δ_{γ}^{2} (h^{0}) .

Then, from (7) and (9), for every

f \in {\bar{F}}_{r, m}

and

f_{n}^{0} = f_{n, h^{0}}

as

n \to \infty,

we have

u_{f}^{2} (f_{n}^{0}) \leq u_{f}^{2} (f_{n, h_{1}^{0}}) \leq \frac{L_{1} f (x)}{n h_{1}^{0}} + \frac{1}{π^{2}} {(h_{1}^{0})}^{2 r - 2 γ} δ_{γ}^{2} (h_{1}^{0}) = C_{γ} \cdot \frac{δ_{γ}^{\frac{2}{2 r + 1 - 2 γ}} (h_{1}^{0})}{n^{\frac{2 r - 2 γ}{2 r + 1 - 2 γ}}},

(10)

where

C_{γ} = L_{1}^{\frac{2 r - 2 γ}{2 r + 1 - 2 γ}} (x) (1 + \frac{π^{2}}{c_{0} + c_{1} (γ)}) {(\frac{c_{0} + c_{1} (γ)}{π^{2}})}^{\frac{1}{2 r + 1 - 2 γ}} .

In such a way we have proved the following theorem, which gives the rates of convergence of the random quantities

f_{n}^{0} (x)

and

f_{n, h_{1}^{0}} (x) .

We can loosely call

f_{n}^{0} (x)

and

f_{n, h_{1}^{0}} (x)

‘estimators’, although it is clear that these functions cannot be considered as estimators in the usual sense in view of the dependence of the bandwidths

h^{0}

and

h_{1}^{0}

on unknown parameters r and

f (x) .

Nevertheless, this theorem can be used for the construction of bona fide adaptive estimators with optimal and suboptimal convergence rates; see Examples 1 and 2, as well as Section 5.3 in what follows.

Theorem 1.

Let

f (x) > 0 .

Then, for the asymptotically optimal (with respect to bandwidth h) in the MSE sense ‘estimator’

f_{n}^{0} (x)

of the function

f \in {\bar{F}}_{r}

and for the ‘estimator’

f_{n, h_{1}^{0}} (x)

of

f \in {\bar{F}}_{r, m}

the following limit relations, as

n \to \infty,

hold

\begin{array}{l} 1^{\circ} . \ \sup_{f \in {\bar{F}}_{r}} \left| \inf_{h} u_{f}^{2} (f_{n, h}) - U_{f}^{2} (h^{0}, c) \right| = O (\frac{1}{n}); \\ 2^{\circ} . \ \text{for every } f \in {\bar{F}}_{r, m}, \text{ with } γ = 0 \text{ if } f \in F_{r, m}^{+} \text{ and every } 0 < γ < r \text{ if } f \in F_{r, m}, \\ u_{f}^{2} (f_{n}^{0}) \leq u_{f}^{2} (f_{n, h_{1}^{0}}) \leq C_{γ} \cdot \frac{δ_{γ}^{\frac{2}{2 r + 1 - 2 γ}} (h_{1}^{0})}{n^{\frac{2 r - 2 γ}{2 r + 1 - 2 γ}}}, \quad n \geq 1 . \end{array}

Remark 1.

The definition (9) of

h_{1}^{0}

is essentially simpler than the definition (8) of the optimal bandwidth

h^{0} .

From Theorem 1 it follows that the (slightly) suboptimal ‘estimator’

f_{n, h_{1}^{0}}

can be successfully used instead.

It should be noted that the parameter

γ

is chosen by the practitioner here and that

γ = 0 if f \in F_{r, m}^{+}

but

0 < γ < r if f \in F_{r, m}

in which case we want to choose

γ

close to 0.

We shall write in the sequel

φ (s) \approx ψ (s)

as

s \to \infty

instead of the limit relations

0 < \liminf_{s \to \infty} \frac{φ (s)}{ψ (s)} \leq \limsup_{s \to \infty} \frac{φ (s)}{ψ (s)} < \infty .

Example 1.

Consider an estimation problem of the function

f \in F_{r, m}^{+},

satisfying the following additional condition

| ϕ (s) | \approx \frac{1}{{| s |}^{r + 1} {ln}^{1 + φ} | s |} \text{ as } | s | \to \infty, φ > 0,

using the kernel estimator

(f_{n, h} (x)) \in Ξ .

By making use of (9) and (10) we find the rate of convergence of the MSE

u_{f}^{2} (f_{n}^{0})

and

u_{f}^{2} (f_{n, h_{1}^{0}}) .

To this end we calculate

δ_{0} (h) = \int_{1 / h < | s | < c / h} {| s |}^{r} | ϕ (s) | d s \approx \frac{1}{{(ln h^{- 1})}^{φ}} - \frac{1}{{(ln h^{- 1} + ln c)}^{φ}} \approx \frac{1}{{(ln h^{- 1})}^{1 + φ}} .

It is easy to verify that

f_{n}^{0}, f_{n, h_{1}^{0}} \in Ξ .

Thus, from (9), as

n \to \infty,

h_{1}^{0} \approx {(n γ_{n})}^{- \frac{1}{2 r + 1}},

where

γ_{n} \approx {ln}^{- 2 (1 + φ)} n, n \to \infty

is a solution of the equation

γ_{n} {[ln n + ln γ_{n}]}^{2 (1 + φ)} = 1 .

Therefore, as

n \to \infty,

we have

h_{1}^{0} \approx {(\frac{{ln}^{2 (1 + φ)} n}{n})}^{\frac{1}{2 r + 1}} \text{ and } u_{f}^{2} (f_{n, h_{1}^{0}}) = O {(\frac{1}{n^{2 r} {ln}^{2 (1 + φ)} n})}^{\frac{1}{2 r + 1}} .
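The implicit equation for γ_n has no closed-form solution, but it is easy to solve numerically. The sketch below uses bisection; the function name `gamma_n`, the values of n, φ, and r, and the decision to set every constant hidden by the ≈ relations equal to one are illustrative assumptions.

```python
import math

def gamma_n(n, phi):
    """Solve gamma * (ln n + ln gamma)**(2*(1+phi)) = 1 for gamma by bisection."""
    p = 2.0 * (1.0 + phi)
    f = lambda g: g * math.log(n * g) ** p - 1.0   # increasing in g when n*g > 1
    lo, hi = 1.0 / n, 1.0                          # sign change on (1/n, 1] for n > e
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, phi, r = 10**6, 0.5, 1.0                 # hypothetical values
g = gamma_n(n, phi)
h1 = (n * g) ** (-1.0 / (2.0 * r + 1.0))    # bandwidth order (n * gamma_n)^{-1/(2r+1)}
```

Because ln γ_n is of order ln ln n, the solution approaches ln^{-2(1+φ)} n only slowly, which is why the relation is stated with ≈ rather than as an equality.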

Consider the piecewise linear flat-top kernel

λ_{c}^{L I N} (s, h),

introduced by [25] (see [26] as well):

λ_{c}^{L I N} (s, h) = \frac{c}{c - 1} {(1 - \frac{h}{c} | s |)}^{+} - \frac{1}{c - 1} {(1 - h | s |)}^{+},

where

{(x)}^{+} = max (x, 0)

is the positive part function.

Then, from (8) we obtain

{\left\{ ϕ (c / h^{0}) e^{- \frac{i c x}{h^{0}}} + ϕ (- c / h^{0}) e^{\frac{i c x}{h^{0}}} \right\}}^{2} \approx \frac{1}{n}

and, for n large enough

{| ϕ (c / h^{0}) |}^{2} + {| ϕ (- c / h^{0}) |}^{2} \geq \frac{C}{n} .

Thus, similarly to

h_{1}^{0},

as

n \to \infty,

for

f \in F_{r}

we find

h^{0} \approx {(\frac{{ln}^{2 (1 + φ)} n}{n})}^{\frac{1}{2 (r + 1)}}

and

u_{f}^{2} (f_{n}^{0}) = O {(\frac{1}{n^{2 r + 1} {ln}^{2 (1 + φ)} n})}^{\frac{1}{2 (r + 1)}} = o (u_{f}^{2} (f_{n, h_{1}^{0}})) .

Example 2.

Consider an estimation problem of the function

f \in F_{r, m},

satisfying the following additional condition:

| ϕ (s) | \approx \frac{1}{{| s |}^{r + 1}} \text{ as } | s | \to \infty,

using the kernel estimator

(f_{n, h} (x)) \in Ξ .

Using (9) and (10) we will find the rate of convergence of the MSE

u_{f}^{2} (f_{n}^{0})

and

u_{f}^{2} (f_{n, h_{1}^{0}}) .

To this end, we calculate

δ_{γ} (h) = \int_{1 / h < | s | < c / h} {| s |}^{r - γ} | ϕ (s) | d s \approx h^{γ} .

It is easy to verify that

f_{n}^{0}, f_{n, h_{1}^{0}} \in Ξ .

Thus, from (9), as

n \to \infty,

h_{1}^{0} \approx n^{- \frac{1}{2 r + 1}} .

Therefore, we have

u_{f}^{2} (f_{n, h_{1}^{0}}) = O (\frac{1}{n^{\frac{2 r}{2 r + 1}}}), n \to \infty .
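The intermediate steps can be written out as follows. This is a worked sketch in which all constants are absorbed into the ≈ relation; it shows that the dependence on γ cancels.

```latex
δ_{γ} (h) \approx \int_{1/h}^{c/h} s^{r - γ} s^{-(r+1)} \, d s
  = \frac{1 - c^{-γ}}{γ} \, h^{γ} \approx h^{γ},
\qquad
(h_{1}^{0})^{2 r + 1 - 2 γ} δ_{γ}^{2} (h_{1}^{0}) \approx (h_{1}^{0})^{2 r + 1} \approx \frac{1}{n}
\;\Longrightarrow\;
h_{1}^{0} \approx n^{- \frac{1}{2 r + 1}},
\\[6pt]
\frac{δ_{γ}^{\frac{2}{2 r + 1 - 2 γ}} (h_{1}^{0})}{n^{\frac{2 r - 2 γ}{2 r + 1 - 2 γ}}}
  \approx n^{- \frac{2 γ + (2 r - 2 γ)(2 r + 1)}{(2 r + 1)(2 r + 1 - 2 γ)}}
  = n^{- \frac{2 r (2 r + 1 - 2 γ)}{(2 r + 1)(2 r + 1 - 2 γ)}}
  = n^{- \frac{2 r}{2 r + 1}} .
```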

Similarly to Example 1 as

n \to \infty,

for

f \in F_{r}

we find

h^{0} \approx \frac{1}{n^{\frac{1}{2 (r + 1)}}}

and

u_{f}^{2} (f_{n}^{0}) = O (\frac{1}{n^{\frac{2 r + 1}{2 (r + 1)}}}) = o (u_{f}^{2} (f_{n, h_{1}^{0}})) .

4. Estimation of the Degree of Smoothness r

Define the functions

Φ_{α} (A, B) = \int_{A < | s | < B} {| s |}^{α} | ϕ (s) | d s, Φ_{α} = Φ_{α} (0, \infty),

Φ_{n, α} (A, B) = \int_{A < | s | < B} {| s |}^{α} | ϕ_{n} (s) | d s, Φ_{n, α} = Φ_{n, α} (0, \infty) .

Let

{(δ_{n})}_{n \geq 1}

and

{(ϱ_{n})}_{n \geq 1}

be two given sequences of positive numbers chosen by the practitioner such that

δ_{n} \to 0

and

ϱ_{n} \to \infty

as

n \to \infty .

The sequence

(δ_{n})

represents the ‘grid’-size in our search for the correct exponent

r,

while

(ϱ_{n})

represents an upper bound that limits this search.

Define the following sets of non-random sequences

C_{+} = \{ {(A_{n}, B_{n}, δ_{n})}_{n \geq 1} : A_{n} \to \infty, 0 < A_{n} < B_{n} \to \infty, δ_{n} \to 0 \text{ as } n \to \infty;

\text{for some } m_{0} \geq 2, \sum_{n \geq 1} \frac{B_{n}^{2 m_{0} (ϱ_{n} + 1 + δ_{n})}}{n^{m_{0}}} < \infty; Φ_{r + ε} (A_{n}, B_{n}) \to \infty, \forall ε > 0 \},

C = \{ {(A_{n}, B_{n}, δ_{n})}_{n \geq 1} : A_{n} \to \infty, 0 < A_{n} < B_{n} \to \infty, δ_{n} \to 0 \text{ as } n \to \infty;

\text{for some } m_{0} \geq 2, \sum_{n \geq 1} \frac{B_{n}^{2 m_{0} (ϱ_{n} + 1 + δ_{n})}}{n^{m_{0}}} < \infty; Φ_{r} (A_{n}, B_{n}) \to \infty \} .

Remark 2.

Formally, the definitions of the sets

C_{+},

C

and, consequently, of the estimators

r_{n}^{+}

and

r_{n},

as well as of the sets

C_{+}^{*},

C_{*}

defined below, depend on the unknown function

Φ_{α} (A, B) .

At the same time, the set

C_{+}

(and, consequently, the estimator

r_{n}^{+}

and the set

C_{+}^{*})

can be defined independently of

Φ_{α} (A, B) .

Indeed, denote

α_{s} = {| s |}^{r + 1} | ϕ (s) |

– Let

f \in F_{r}^{+} .

Then for every

ε > 0

\lim_{s \to \infty} s^{ε / 2} α_{s} = \infty \text{ and } \limsup_{s \to \infty} α_{s} \cdot \log s < \infty .

Thus

(A_{n}, B_{n}, δ_{n}) \in C_{+}

for appropriately chosen

(δ_{n})

and

A_{n} = O (B_{n}^{1 / 2})

because (consider for simplicity the case

A_{n} > 0)

Φ_{r + ε} (A_{n}, B_{n}) = \int_{A_{n}}^{B_{n}} {| s |}^{r + ε} | ϕ (s) | d s = \int_{A_{n}}^{B_{n}} s^{ε - 1} α_{s} d s = \int_{0}^{B_{n}} s^{ε - 1} α_{s} d s - \int_{0}^{A_{n}} s^{ε - 1} α_{s} d s \geq C_{1} \int_{0}^{B_{n}} s^{ε / 2 - 1} d s - C_{2} \int_{0}^{A_{n}} s^{ε - 1} {log}^{- 1} s d s \sim B_{n}^{ε / 2} - A_{n}^{ε} {log}^{- 1} A_{n} + \int_{0}^{A_{n}} s^{ε - 1} {log}^{- 2} s d s \to \infty .

According to the definition of the class

F_{r}

it is impossible to find elements of the set

C

independently of the function to be estimated without using a priori information about

f .

Consider one simple example.

– Let

f \in F_{r} .

Suppose, e.g., in addition that

\liminf_{s \to \infty} \log s \cdot \frac{1}{s} \int_{0}^{s} {| u |}^{r + 1} | ϕ (u) | d u > 0 .

Then

(A_{n}, B_{n}, δ_{n}) \in C

for appropriately chosen

(δ_{n})

and

A_{n} = o (B_{n})

because

Φ_{r} (A_{n}, B_{n}) = \int_{A_{n}}^{B_{n}} s^{r} | ϕ (s) | d s = \int_{A_{n}}^{B_{n}} s^{- 1} d \int_{0}^{s} α_{u} d u d s = \frac{1}{s} \int_{0}^{s} α_{u} d u |_{A_{n}}^{B_{n}} + \int_{A_{n}}^{B_{n}} s^{- 2} \int_{0}^{s} α_{u} d u d s \geq C \int_{A_{n}}^{B_{n}} \frac{1}{s log s} d s \geq C log log B_{n} \to \infty .

Further examples are given in Example 3 (see also Remark 3 and Example 4).

For an arbitrary given

H > 0

chosen by the practitioner, define the estimators

{(r_{n}^{+})}_{n \geq 1}

and

{(r_{n})}_{n \geq 1}

of the parameter r in (3) and (4) as follows

r_{n}^{+} = \min \left[ ϱ_{n}, \left( δ_{n} \cdot \inf \{ k \geq 1 : Φ_{n, (k + 1) δ_{n}} (A_{n}, B_{n}) \geq H, (A_{n}, B_{n}, δ_{n}) \in C_{+} \} \right) \right],

(11)

r_{n} = \min \left[ ϱ_{n}, \left( δ_{n} \cdot \inf \{ k \geq 1 : Φ_{n, k δ_{n}} (A_{n}, B_{n}) \geq H, (A_{n}, B_{n}, δ_{n}) \in C \} \right) \right] .

(12)
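The grid search in (11) and (12) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `r_hat`, the trapezoid quadrature, and all numerical values in the usage lines are assumptions, and membership of the tuning sequences in C_+ or C (which depends on the unknown ϕ) is not checked.

```python
import numpy as np

def r_hat(sample, A, B, delta, rho, H, plus=False, m=801):
    """Grid search (12), or (11) if plus=True, capped at rho.

    Returns delta * k for the first k >= 1 at which the empirical integral
    Phi_{n,alpha}(A, B) reaches the threshold H, or rho if that comes first.
    """
    s = np.linspace(A, B, m)
    # |phi_n(-s)| = |phi_n(s)| for real data, so only s > 0 is evaluated
    mod = np.abs(np.exp(1j * s[:, None] * sample[None, :]).mean(axis=1))
    ds = s[1] - s[0]
    k = 1
    while k * delta < rho:
        alpha = (k + 1) * delta if plus else k * delta
        vals = 2.0 * s**alpha * mod                   # both half-lines A < |s| < B
        if ds * (vals.sum() - 0.5 * (vals[0] + vals[-1])) >= H:
            return k * delta
        k += 1
    return rho

rng = np.random.default_rng(4)
sample = rng.laplace(size=2000)                       # illustrative data
r1 = r_hat(sample, A=1.0, B=8.0, delta=0.05, rho=3.0, H=5.0)
r2 = r_hat(sample, A=1.0, B=8.0, delta=0.05, rho=3.0, H=50.0)
```

For a fixed sample, raising the threshold H can only delay the first crossing, so the returned value is non-decreasing in H.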

Example 3.

For the functions

ϕ (\cdot)

from Examples 1 and 2, we can use the definitions (11) and (12) with the following choices:

B_{n} = ln n, ϱ_{n} = ρ \frac{ln n}{ln ln n}, ρ \in (0, {(2 m_{0})}^{- 1}),

arbitrary

δ_{n} \to 0

and

A_{n} = o (B_{n}),

as

n \to \infty .

Indeed, for

f \in F_{r, m}^{+}

and every

ε > 0

(Example 1),

Φ_{r + ε} (A_{n}, B_{n}) = \int_{A_{n} < | s | < B_{n}} {| s |}^{r + ε} | ϕ (s) | d s \approx \frac{B_{n}^{ε}}{{ln}^{φ} B_{n}} - \frac{A_{n}^{ε}}{{ln}^{φ} A_{n}} \approx \frac{{ln}^{ε} n}{{ln}^{φ} ln n} \to \infty

and for

f \in F_{r, m}

(Example 2),

Φ_{r} (A_{n}, B_{n}) = \int_{A_{n} < | s | < B_{n}} {| s |}^{r} | ϕ (s) | d s \approx ln \frac{B_{n}}{A_{n}} \to \infty

and, consequently, the classes

C_{+}

and

C

are not empty.

Define

J_{n, α} = \int_{A_{n} < | s | < B_{n}} {| s |}^{α} | ϕ_{n} (s) - ϕ (s) | d s, α > 0, n \geq 1 .

Lemma 1.

Let

(A_{n}, B_{n}, δ_{n}) \in C_{+} \cup C .

Then, for every

α > 0, m \geq 1

and

n \geq 1

there exist positive numbers

C_{α, m}

such that

sup_{f \in {\bar{F}}_{r}} E_{f} J_{n, α}^{2 m} \leq C_{α, m} \frac{B_{n}^{2 m (α + 1)}}{n^{m}}

(13)

and for every

f \in {\bar{F}}_{r}

J_{n, ϱ_{n} + δ_{n}} = o (1) \quad P_{f}\text{-a.s.}

Define the sets

C_{+}^{*}

and

C_{*}

of non-random sequences

{(A_{n}, B_{n}, δ_{n})}_{n \geq 1}

C_{+}^{*} = \{ (A_{n}, B_{n}, δ_{n}) \in C_{+} : \lim_{n \to \infty} Φ_{r + δ_{n}} (A_{n}, B_{n}) = \infty \},

C_{*} = \{ (A_{n}, B_{n}, δ_{n}) \in C : \lim_{n \to \infty} Φ_{r - δ_{n}} (A_{n}, B_{n}) = 0 \} .

Remark 3.

It can be directly verified that under the conditions of Remark 2 the sequences

(A_{n}, B_{n}, δ_{n}) \in C_{+}^{*}

if

A_{n} = O (B_{n}^{1 / 2})

and

δ_{n}^{- 1} = o (log B_{n}),

as well as

(A_{n}, B_{n}, δ_{n}) \in C_{*}

if

A_{n} = o (B_{n})

and

δ_{n} = o (\frac{log log log B_{n}}{log B_{n}}) .

Moreover, under the conditions of Example 3.1,

(A_{n}, B_{n}, δ_{n}) \in C_{+}^{*}

if we put

δ_{n} = δ \cdot \frac{ln ln ln (n + 1)}{ln ln (n + 1)}, δ > φ .

Example 4.

Consider the functions

ϕ (\cdot)

from Examples 1, 2 and suppose that the smoothness parameter

r \leq R

for some known number

R .

Then the sequences

(A_{n}, B_{n}, δ_{n}) \in C_{+}^{*} \cup C_{*}

if we put

B_{n} = n^{b}, 0 < b < \frac{m_{0} - 1}{4 (R + 1)}, A_{n} = o (B_{n}), ϱ_{n} = R,

δ_{n} = δ \cdot \frac{ln ln (n + 1)}{ln (n + 1)}, δ b > φ .
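The tuning sequences of Example 4 are simple to generate. In the sketch below, the function name `example4_sequences` and the choice A_n = B_n^{1/2} (one of many sequences with A_n = o(B_n)) are assumptions; the condition δ b > φ cannot be checked in code because φ is a property of the unknown density.

```python
import math

def example4_sequences(n, R, m0, b, delta):
    """Return (A_n, B_n, delta_n, rho_n) following the choices of Example 4."""
    if n < 2:
        raise ValueError("n >= 2 is needed so that ln ln (n + 1) > 0")
    if not 0.0 < b < (m0 - 1) / (4.0 * (R + 1.0)):
        raise ValueError("b must lie in (0, (m0 - 1) / (4 (R + 1)))")
    B_n = float(n) ** b
    A_n = math.sqrt(B_n)                     # assumption: any A_n = o(B_n) is allowed
    delta_n = delta * math.log(math.log(n + 1)) / math.log(n + 1)
    return A_n, B_n, delta_n, float(R)       # rho_n = R is constant in n

A_n, B_n, delta_n, rho_n = example4_sequences(10**4, R=2, m0=26, b=2.0, delta=1.0)
```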

Theorem 2.

The estimators

r_{n}^{+}

and

r_{n},

defined in (11) and (12), respectively, with

ϱ_{n} \to \infty

have the following properties

1^{\circ}

(a) if

f \in F_{r}^{+}

and

(A_{n}, B_{n}, δ_{n}) \in C_{+},

then

\lim_{n \to \infty} r_{n}^{+} = r \quad P_{f}\text{-a.s.}

(b) if

f \in F_{r}

and

(A_{n}, B_{n}, δ_{n}) \in C,

then

\lim_{n \to \infty} r_{n} = r \quad P_{f}\text{-a.s.}

2^{\circ}

(a) if

f \in F_{r}^{+}

and for some

δ_{n} \to 0

the sequences

(A_{n}, B_{n}, δ_{n}) \in C_{+}^{*},

then

\lim_{n \to \infty} δ_{n}^{- 1} (r_{n}^{+} - r) = 0 \quad P_{f}\text{-a.s.}

(b) if

f \in F_{r}

and for some

δ_{n} \to 0

the sequences

(A_{n}, B_{n}, δ_{n}) \in C_{*},

then

\lim_{n \to \infty} δ_{n}^{- 1} (r_{n} - r) = 0 \quad P_{f}\text{-a.s.}

5. Adaptive Estimation of the Functions $f \in {\bar{F}}_{r}$

The purpose of this section is the construction and investigation of an adaptive estimator of the function

f \in {\bar{F}}_{r}

with unknown

r,

which can either serve as the main estimator (since it achieves the optimal rate of convergence within

{\bar{F}}_{r})

or can serve as a ‘pilot’ estimator to be used in (8) and (9) for the construction of adaptive optimal and suboptimal bandwidths

{\hat{h}}^{0}

and

{\hat{h}}_{1}^{0} .

5.1. Adaptive MSE–Optimal Estimation

We define an adaptive estimator of

f \in {\bar{F}}_{r}

as follows

{\hat{f}}_{n} (x) = \frac{1}{n} \sum_{j = 1}^{n} Λ_{j - 1} (x - X_{j}) = \frac{1}{2 π n} \sum_{j = 1}^{n} \int λ_{j - 1} (s) e^{- i s (x - X_{j})} d s,

(14)

where

Λ_{j - 1} (z) = \frac{1}{{\hat{h}}_{j - 1}} K (\frac{z}{{\hat{h}}_{j - 1}}) = \frac{1}{2 π} \int λ_{j - 1} (s) e^{- i s z} d s

is the smoothing kernel, and

λ_{j - 1} (s) = λ_{c} (s, {\hat{h}}_{j - 1});

the required bandwidths are defined by

{\hat{h}}_{j} = {(j + 1)}^{- \frac{1}{1 + 2 r (j)}}, j \geq 1,

where

r (j) = r_{j}^{+}

if

f \in F_{r}^{+}

and

r (j) = r_{j}

if

f \in F_{r};

recall that the estimators

r_{j}^{+}

and

r_{j}

are defined in (11) and (12), respectively.
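A compact sketch of the estimator (14) is given below. Several points are assumptions made for illustration: the linear taper for g (whose kernel has the closed form used in `flat_top_kernel`), c = 2, the convention that the bandwidth formula at j = 0 yields 1, and the constant placeholder path r(j) = 1. In an actual application, r(j) would be computed from the first j observations only.

```python
import numpy as np

def flat_top_kernel(u, c=2.0):
    """K(u) whose Fourier transform is the trapezoid lambda_c(s, 1) with linear taper."""
    u = np.asarray(u, dtype=float)
    small = np.abs(u) < 1e-4
    safe = np.where(small, 1.0, u)                     # avoid 0/0 at the origin
    k = (np.cos(safe) - np.cos(c * safe)) / (np.pi * (c - 1.0) * safe**2)
    return np.where(small, (c + 1.0) / (2.0 * np.pi), k)   # limit value at u = 0

def adaptive_estimate(x, sample, r_path, c=2.0):
    """Estimator (14): observation X_j is smoothed with the bandwidth h_{j-1}."""
    j = np.arange(len(sample))                         # j - 1 = 0, ..., n - 1
    h = (j + 1.0) ** (-1.0 / (1.0 + 2.0 * np.asarray(r_path, dtype=float)))
    return float(np.mean(flat_top_kernel((x - sample) / h, c) / h))

rng = np.random.default_rng(3)
sample = rng.laplace(size=2000)                        # illustrative data, f(1) = exp(-1)/2
est = adaptive_estimate(1.0, sample, r_path=np.ones(len(sample)))
```

Each observation is paired with a bandwidth that depends only on earlier data, which is what allows martingale-type arguments in a fixed-n setting; it is also why the result depends on the order of the sample.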

From the definition of

r (j)

it follows that

{\hat{h}}_{j} \leq {\bar{h}}_{j}, j \geq 1,

where

{\bar{h}}_{j} = {(j + 1)}^{- \frac{1}{1 + 2 ϱ_{j}}} .

Note that

{\bar{h}}_{j} \leq 1

and

{\bar{h}}_{j} \to 0

if the following additional condition

lim_{j \to \infty} \frac{ϱ_{j}}{ln j} = 0

(15)

on the sequence

(ϱ_{j})

defined in the beginning of Section 4 holds.

Denote

n_{1} = \begin{cases} \sup \{ n \geq 1 : Φ_{r} (A_{n}, B_{n}) > H - 1 \} & \text{if } f \in F_{r}^{+}, \\ \sup \{ n \geq 1 : Φ_{r - δ_{n}} (A_{n}, B_{n}) > H - 1 \} & \text{if } f \in F_{r}, \end{cases}

n_{2} = \begin{cases} \sup \{ n \geq 1 : Φ_{r + δ_{n}} (A_{n}, B_{n}) < H + 1 \} & \text{if } f \in F_{r}^{+}, \\ \sup \{ n \geq 1 : Φ_{r} (A_{n}, B_{n}) < H + 1 \} & \text{if } f \in F_{r}, \end{cases}

where the constant H was first used in (11) and (12). Define the following sequences for

j \geq 0,

0 \leq γ \leq r,

h_{j} = {(j + 1)}^{- \frac{1}{1 + 2 r - 2 γ}}, h_{j}^{*} = {(j + 1)}^{- \frac{1}{1 + 2 (r - γ - δ_{j + 1})}},

{\tilde{h}}_{j} = {(j + 1)}^{- \frac{1}{1 + 2 (r - γ + δ_{j + 1})}} and Δ_{γ} (h) = \int_{| s | > h^{- 1}} {| s |}^{r - γ} | ϕ (s) | d s,

as well as the constants

C_{1} = \bar{f} \cdot \int K^{2} (u) d u, {\tilde{C}}_{1} = C_{1} (\sum_{j = 1}^{n_{1}} j + C_{r, 2 m_{0}} \sum_{j > n_{1}} \frac{B_{j}^{4 m_{0} (r + 1)}}{j^{2 m_{0} - 1}}), {\tilde{C}}_{2} (γ) = {\bar{f}}^{2} + C_{2} (γ),

C_{2} (γ) = \frac{1}{4 π^{2}} [\sum_{j = 1}^{n_{2}} {\bar{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\bar{h}}_{j - 1}) + C_{r + 1, m_{0}} \sum_{j > n_{2}} {\bar{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\bar{h}}_{j - 1}) \frac{B_{j}^{2 m_{0} (r + 1 + δ_{j})}}{n^{m_{0}}}]

and the function

Ψ_{γ} (n) = \frac{C_{1}}{n^{2}} \sum_{j = 1}^{n} \frac{1}{h_{j - 1}^{*}} + \frac{1}{4 π^{2} n} \sum_{j = 1}^{n} {\tilde{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\tilde{h}}_{j - 1}) .

Note that the summability of the series in the definitions of the constants

{\tilde{C}}_{1}

and

C_{2} (γ)

follows from the corresponding requirement in the definition of the classes

C_{+}

and

C .

The main properties of the constructed estimators are stated in the following theorem.

Theorem 3.

Let the sequences

(A_{n}, B_{n}, δ_{n})

in the definition of the estimator

r_{n}^{+}

belong to the set

C_{+}^{*}

and in the definition of the estimator

r_{n}

to the set

C_{*}

and let condition (15) be fulfilled.

Let

γ = 0

if

f \in F_{r}^{+}

and

γ \in (0, r)

if

f \in F_{r} .

Then, for every

f \in {\bar{F}}_{r}

and

n \geq 1

the estimator (14) has the following properties:

1^{\circ} .

u_{f}^{2} ({\hat{f}}_{n}) \leq Ψ_{γ} (n) + \frac{{\tilde{C}}_{1}}{n^{2}} + \frac{{\tilde{C}}_{2} (γ)}{n};

2^{\circ} .

the estimator

{\hat{f}}_{n}

is strongly consistent:

\lim_{n \to \infty} {\hat{f}}_{n} (x) = f (x) \quad P_{f}\text{-a.s.}

Example 11. (Examples 1 and 4 revisited,

f \in F_{r}^{+}

) In this case

\frac{1}{n^{2}} \sum_{j = 1}^{n} \frac{1}{h_{j - 1}^{*}} = \frac{1}{n^{2}} \sum_{j = 1}^{n} \frac{1}{h_{j - 1}} \cdot {(ln j)}^{\frac{2 δ}{{(1 + 2 r)}^{2}}} \approx \frac{1}{n h_{n}} \cdot {(ln n)}^{\frac{2 δ}{{(1 + 2 r)}^{2}}}

and

\frac{1}{n} \sum_{j = 1}^{n} {\tilde{h}}_{j - 1}^{2 r} Δ_{0}^{2} ({\tilde{h}}_{j - 1}) \approx \frac{1}{n} \sum_{j = 1}^{n} h_{j - 1}^{2 r} \cdot {(ln j)}^{\frac{4 r δ}{{(1 + 2 r)}^{2}} - 2 φ} .

Thus, under the following conditions

m_{0} > 16 R (R + 1) + 1, 4 R < b < \frac{m_{0} - 1}{4 (R + 1)}, \frac{φ}{b} < δ < \frac{φ}{4 R}

we have, as

n \to \infty,

\frac{1}{n} \sum_{j = 1}^{n} {\tilde{h}}_{j - 1}^{2 r} Δ_{0}^{2} ({\tilde{h}}_{j - 1}) = o (\frac{1}{n^{2}} \sum_{j = 1}^{n} \frac{1}{h_{j - 1}^{*}})

and, consequently,

Ψ_{0} (n) \approx \frac{1}{n h_{n}} \cdot {(ln n)}^{\frac{2 δ}{{(1 + 2 r)}^{2}}} \approx \frac{1}{n^{\frac{2 r}{1 + 2 r}}} \cdot {(ln n)}^{\frac{2 δ}{{(1 + 2 r)}^{2}}} .

Then, according to Theorem 2, in this case the rate of convergence of adaptive density estimators of

f \in F_{r}^{+}

differs from the rate of non-adaptive estimators in [26] by an extra log-factor only.

For the functions

f \in F_{r}

and

γ \in (0, \min (r, 1))

from Examples 2 and 4, it is easy to verify that

Ψ_{γ} (n) \approx \frac{1}{n^{\frac{2 (r - γ)}{1 + 2 (r - γ)}}} {(ln n)}^{\frac{δ}{1 + 2 (r - γ)}} as n \to \infty .

5.2. A Symmetric Estimator

Noting that the construction of the estimator

{\hat{f}}_{n} (x)

depends on the order by which the data

X_{1}, \dots, X_{n}

are employed, a simple improvement is immediately available. Let

X_{[1]} \leq X_{[2]} \leq \dots \leq X_{[n]}

be the order statistics that are a sufficient statistic in the case of our i.i.d. sample

X_{1}, \dots, X_{n}

. Hence, by the Rao–Blackwell theorem, the estimator

E ({\hat{f}}_{n} (x) | X_{[1]}, \dots, X_{[n]})

(16)

will have smaller (or, at least, not larger) MSE than

{\hat{f}}_{n} (x)

.

Unfortunately, the estimator (16) is difficult to compute. However, it is possible to construct a simple estimator that captures the same idea. To do this, consider all distinct permutations of the data

X_{1}, \dots, X_{n}

, and order them in some fashion so that

X_{1}^{(k)}, \dots, X_{n}^{(k)}

is the kth permutation. To unify the presentation, the first permutation is taken to be the original data

X_{1}, \dots, X_{n}

. Because of the continuity of the r.v.s

X_{1}, \dots, X_{n}

, the number of such permutations is

n!

with probability one.

So let

{\hat{f}}_{n}^{(k)} (x)

be the estimator

{\hat{f}}_{n} (x)

as computed from the kth permutation

X_{1}^{(k)}, \dots, X_{n}^{(k)}

, i.e., Equation (14) with

X_{1}^{(k)}, \dots, X_{n}^{(k)}

instead of

X_{1}, \dots, X_{n}

.

Finally, let

b \leq n!

be a positive integer (possibly depending on n), and let

{\bar{f}}_{n, b} (x) = \frac{1}{b} \sum_{k = 1}^{b} {\hat{f}}_{n}^{(k)} (x) .

(17)

Theorem 4.

For any choice of

b (\leq n!)

, we have

\mathrm{MSE} ({\bar{f}}_{n, b} (x)) \leq \mathrm{MSE} ({\hat{f}}_{n} (x)) .

Ideally, the practitioner would use a high value of b—even

b = n!

if the latter is computationally feasible. However, even moderate values of b would give some improvement; in this case, the b permutations to be included in the construction of

{\bar{f}}_{n, b} (x)

might be picked randomly as in resampling/subsampling methods—see e.g., [27].
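The randomized version of (17) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `sequential_kde` and `symmetric_kde`, the Gaussian kernel, and the rule-of-thumb bandwidth are hypothetical stand-ins for the flat-top kernel and the adaptive bandwidth used in (14). What the sketch keeps is the structure: the bandwidth applied to the jth observation depends only on the earlier observations, so the estimate depends on the data order, and averaging over b orderings (the first being the original one) reduces that dependence. Randomly drawn permutations may in principle repeat, which is very unlikely for moderate n.

```python
import numpy as np

def sequential_kde(x, data, h_init=1.0):
    """Order-dependent estimate (1/n) * sum_j K((x - X_j) / h_{j-1}) / h_{j-1}.

    Assumptions (not from the paper): Gaussian K, and
    h_{j-1} = std(X_1..X_{j-1}) * (j-1)^(-1/5) once two earlier points exist.
    """
    n = len(data)
    total = 0.0
    for j in range(n):                      # j = number of earlier observations
        if j >= 2:
            h = max(np.std(data[:j]) * j ** (-0.2), 1e-8)
        else:
            h = h_init                      # no data-driven bandwidth available yet
        u = (x - data[j]) / h
        total += np.exp(-0.5 * u * u) / (np.sqrt(2.0 * np.pi) * h)
    return total / n

def symmetric_kde(x, data, b, rng):
    """Average of the sequential estimate over b orderings of the data."""
    estimates = [sequential_kde(x, data)]   # k = 1: original order
    for _ in range(b - 1):                  # remaining orderings drawn at random
        estimates.append(sequential_kde(x, rng.permutation(data)))
    return float(np.mean(estimates))

rng = np.random.default_rng(0)
sample = rng.laplace(size=200)
f_hat = sequential_kde(1.0, sample)
f_bar = symmetric_kde(1.0, sample, b=20, rng=rng)
print(f_hat, f_bar)
```

With b = 1 the function returns the original order-dependent estimate, which mirrors the convention that the first permutation is the data as observed.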

5.3. Adaptive Optimal Bandwidth

Define

L (h, ϕ) = \int_{1 / h < | s | < c / h} (g (s, h) - 1) ϕ (s) e^{- i s x} d s \cdot {c ϕ (c / h) e^{- \frac{i c x}{h}} + c ϕ (- c / h) e^{\frac{i c x}{h}} + h^{2} \int_{1 / h < | s | < c / h} g_{h}^{'} (s, h) ϕ (s) e^{- i s x} d s} .

According to (8) the optimal bandwidth

h^{0}

is defined from the equality

L (h^{0}, ϕ) = \frac{2 π^{2} L_{1} f (x)}{n} .

Thus, it is natural to define the adaptive (to the unknown parameter r and the function

f (x))

optimal bandwidth

{\hat{h}}^{0}

from the equality

L ({\hat{h}}^{0}, ϕ_{n}) = \frac{2 π^{2} L_{1} {\bar{f}}_{n, b} (x)}{n},

where the adaptive estimator

{\bar{f}}_{n, b} (x)

is defined in (17).

It is hoped that the bandwidths

h^{0}

and

{\hat{h}}^{0}

have similar asymptotic properties in view of the fact that, according to Theorems 3 and 4 the function

n Ψ^{- 1 / 2} (n) [L (h^{0}, ϕ) - L ({\hat{h}}^{0}, ϕ_{n})]

is bounded in probability.

6. Simulation Results

In this section we provide results of a simulation study regarding the estimators introduced in Section 3.

Two flat-top kernels have been used in the simulation. The first one has the piecewise linear kernel characteristic function introduced in [26], i.e.,

λ (s) = \{\begin{matrix} 1, & | s | \leq 1, \\ (c - | s |) / (c - 1), & 1 < | s | < c, \\ 0, & | s | \geq c . \end{matrix}

The piecewise linear characteristic function and corresponding kernel are shown in Figure 1.

Figure 1. Piecewise linear characteristic function (left) and corresponding kernel (right),

c = 1.5

.
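As a rough numerical companion to the piecewise linear characteristic function above, the sketch below evaluates λ(s) and the corresponding kernel. The closed form K(x) = (cos x − cos cx) / (π (c − 1) x²), with K(0) = (c + 1) / (2π), is obtained here by integrating (1/2π) ∫ λ(s) e^{−isx} ds by parts; it is a derivation added for illustration, and the function names are hypothetical. The last function checks the closed form against direct numerical integration.

```python
import numpy as np

def lam_trapezoid(s, c=1.5):
    """Piecewise linear flat-top characteristic function (requires c > 1)."""
    a = np.abs(np.asarray(s, dtype=float))
    # equals 1 for |s| <= 1, decays linearly on (1, c), and is 0 for |s| >= c
    return np.clip((c - a) / (c - 1.0), 0.0, 1.0)

def kernel_trapezoid(x, c=1.5):
    """Closed-form inverse Fourier transform of lam_trapezoid."""
    x = np.asarray(x, dtype=float)
    small = np.abs(x) < 1e-6
    safe = np.where(small, 1.0, x)          # avoid 0/0 at the origin
    k = (np.cos(safe) - np.cos(c * safe)) / (np.pi * (c - 1.0) * safe**2)
    return np.where(small, (c + 1.0) / (2.0 * np.pi), k)

def kernel_numeric(x, c=1.5, m=20001):
    """(1/pi) * integral_0^c lam(s) cos(s x) ds via the trapezoidal rule."""
    s = np.linspace(0.0, c, m)
    y = lam_trapezoid(s, c) * np.cos(s * x)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(s)) / (2.0 * np.pi))

for x0 in (0.0, 0.7, 3.0):
    print(x0, float(kernel_trapezoid(x0)), kernel_numeric(x0))
```

The kernel takes negative values for some x, which is expected for flat-top (infinite-order) kernels.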

The second case refers to the infinitely differentiable flat-top kernel characteristic function defined in [28], i.e.,

λ (s) = \{\begin{matrix} 1, & | s | \leq c, \\ \exp [- b \exp [- b / {(| s | - c)}^{2}] / {(| s | - 1)}^{2}], & c < | s | < 1, \\ 0, & | s | \geq 1 . \end{matrix}

The characteristic function and kernel of the second case are shown in Figure 2.

Figure 2. Infinitely differentiable flat-top characteristic function (left) and corresponding kernel (right),

c = 0.05

,

b = 1

.
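The infinitely differentiable flat-top function can be evaluated directly from its definition. The sketch below is a minimal implementation with the Figure 2 values c = 0.05 and b = 1 as defaults; the function name is hypothetical. It only evaluates the middle branch on c < |s| < 1, which avoids division by zero at the end points.

```python
import numpy as np

def lam_smooth(s, c=0.05, b=1.0):
    """Infinitely differentiable flat-top characteristic function (0 < c < 1)."""
    a = np.abs(np.atleast_1d(np.asarray(s, dtype=float)))
    out = np.zeros_like(a)                  # value 0 for |s| >= 1
    out[a <= c] = 1.0                       # flat top
    mid = (a > c) & (a < 1.0)
    am = a[mid]
    # exp[-b * exp(-b / (|s| - c)^2) / (|s| - 1)^2] on the transition region
    out[mid] = np.exp(-b * np.exp(-b / (am - c) ** 2) / (am - 1.0) ** 2)
    return out

print(lam_smooth([0.0, 0.05, 0.5, 0.9, 1.0, 2.0]))
```

On the transition region the function decreases monotonically from 1 to 0, since both exp(−b/(|s| − c)²) and 1/(|s| − 1)² increase with |s| there.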

We examine kernel density estimators of triangular, exponential, Laplace, and gamma (with various shape parameters) distributions. Figure 3, Figure 4 and Figure 5 illustrate the estimator MSE as a function of the sample size.

Figure 3. MSE of kernel estimators multiplied by

n^{3 / 4}

as a function of the sample size n for the triangle density function. (a) MSE of estimator with piecewise linear kernel characteristic function. (b) MSE of estimator with infinitely differentiable flat-top kernel characteristic function.

Figure 4. MSE of kernel estimators (with piecewise linear kernel characteristic function) as a function of the sample size n. (a) Laplace density function (

r = 1

, MSE multiplied by

n^{3 / 4}

). (b) Gamma density function (

k = 3

,

r = 2

, MSE multiplied by

n^{5 / 6}

).

Figure 5. MSE of kernel estimators (with piecewise linear kernel characteristic function) as a function of the sample size n. (a) Gamma distribution shape parameter

k = 4

,

r = 3

(MSE multiplied by

n^{7 / 8}

). (b) Gamma distribution shape parameter

k = 6

,

r = 5

(MSE multiplied by

n^{11 / 12}

).

Using the notation

C (x) = \{\begin{matrix} 0, & x < 0, \\ 1, & x \geq 0 \end{matrix}

for the Heaviside step function, the triangular density function is defined as

f (x) = ((λ - | x |) / λ^{2}) C (λ - x) C (λ + x)

having characteristic function

ϕ (s) = 2 (1 - cos (λ s)) / {(λ s)}^{2} .

The Laplace density

f (x) = (λ / 2) \exp (- λ | x |)

has characteristic function

ϕ (s) = λ^{2} / (λ^{2} + s^{2})

, and the gamma density

f (x) = λ^{k} x^{k - 1} e^{- λ x} / Γ (k)

has characteristic function

ϕ (s) = λ^{k} / {(λ - i s)}^{k}

.

In all cases we choose the scale parameter

λ

so that the variance equals 1, and consider estimation of the density function

f (x)

at point

x = 1

.
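For the densities as parametrized above, unit variance corresponds to λ = √6 (triangular, variance λ²/6), λ = √2 (Laplace, variance 2/λ²) and λ = √k (gamma, variance k/λ²). These values follow from the standard variance formulas and are not quoted from the paper; the sketch below checks them by numerical integration. All function names are hypothetical.

```python
import math
import numpy as np

def trapz(y, x):
    """Trapezoidal rule on a grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def triangular_pdf(x, lam):
    return np.clip(lam - np.abs(x), 0.0, None) / lam**2

def laplace_pdf(x, lam):
    return 0.5 * lam * np.exp(-lam * np.abs(x))

def gamma_pdf(x, lam, k):
    xp = np.clip(x, 0.0, None)              # the density is zero for x < 0
    dens = lam**k * xp ** (k - 1) * np.exp(-lam * xp) / math.gamma(k)
    return np.where(x >= 0, dens, 0.0)

def variance(pdf, grid):
    f = pdf(grid)
    mean = trapz(grid * f, grid)
    return trapz(grid**2 * f, grid) - mean**2

grid = np.linspace(-60.0, 60.0, 600001)
cases = {
    "triangular": lambda x: triangular_pdf(x, math.sqrt(6.0)),
    "laplace": lambda x: laplace_pdf(x, math.sqrt(2.0)),
    "gamma, k=3": lambda x: gamma_pdf(x, math.sqrt(3.0), 3),
}
variances = {name: variance(pdf, grid) for name, pdf in cases.items()}
for name, v in variances.items():
    print(name, round(v, 6), "f(1) =", float(cases[name](np.array([1.0]))[0]))
```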

All the above-mentioned characteristic functions

ϕ (s)

satisfy condition (4) for

r = 1

(triangular and Laplace), and

r = k - 1

(gamma,

k > 1

); therefore, all distributions belong to the family

F_{r}

with corresponding value of r. In addition, all

ϕ (s)

meet the requirements of Example 2. Thus, the bandwidth can be taken in the form

h = O (n^{- 1 / (2 (r + 1))})

and the expected convergence rate of the kernel estimator MSE is

n^{- (2 r + 1) / (2 (r + 1))} .
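For the values of r used in the figures, the two exponents above can be tabulated exactly. The short sketch below (function names are illustrative) uses rational arithmetic; the resulting MSE exponents match the scaling factors quoted in the captions of Figures 3, 4 and 5.

```python
from fractions import Fraction

def bandwidth_exponent(r):
    """Exponent of n in h = O(n^{-1/(2(r+1))})."""
    return Fraction(-1, 2 * (r + 1))

def mse_exponent(r):
    """Exponent of n in the expected MSE rate n^{-(2r+1)/(2(r+1))}."""
    return Fraction(-(2 * r + 1), 2 * (r + 1))

for r in (1, 2, 3, 5):
    print(f"r={r}: h ~ n^({bandwidth_exponent(r)}), MSE ~ n^({mse_exponent(r)})")
# r=1: h ~ n^(-1/4), MSE ~ n^(-3/4)
# r=2: h ~ n^(-1/6), MSE ~ n^(-5/6)
# r=3: h ~ n^(-1/8), MSE ~ n^(-7/8)
# r=5: h ~ n^(-1/12), MSE ~ n^(-11/12)
```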

The main goal of the simulation study is to investigate how the MSE of the kernel estimator behaves as the sample size grows. We generate sequences of 150 samples for sample sizes from 25 to 2000 with step 25 and, for some distributions, for sample sizes from 2000 to 20,000 with step 100 or 200. Then, for each sample size we calculate the estimator MSE multiplied by

n^{(2 r + 1) / (2 (r + 1))}

and expect visual stabilization of the sequence of resulting values with the growth of

n .
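A scaled-down version of this experiment can be sketched as follows for the Laplace case (r = 1) at x = 1 with the piecewise linear flat-top kernel, c = 1.5. Several choices are assumptions made for illustration only: the constant in h = O(n^{−1/(2(r+1))}) is set to 1, only three sample sizes are used, the closed-form kernel is the one derived from λ(s) by integration by parts, and the function names are hypothetical. The MSE is multiplied by n^{(2r+1)/(2(r+1))}, as in the figure captions.

```python
import numpy as np

def kernel_trapezoid(u, c=1.5):
    """Kernel whose characteristic function is the piecewise linear flat-top."""
    u = np.asarray(u, dtype=float)
    small = np.abs(u) < 1e-6
    safe = np.where(small, 1.0, u)
    k = (np.cos(safe) - np.cos(c * safe)) / (np.pi * (c - 1.0) * safe**2)
    return np.where(small, (c + 1.0) / (2.0 * np.pi), k)

def scaled_mse(n, r=1, x=1.0, reps=150, rng=None):
    """Monte Carlo MSE of the kernel estimate at x, times n^{(2r+1)/(2(r+1))}."""
    rng = np.random.default_rng() if rng is None else rng
    lam = np.sqrt(2.0)                          # Laplace rate with unit variance
    truth = 0.5 * lam * np.exp(-lam * abs(x))
    h = n ** (-1.0 / (2 * (r + 1)))             # assumption: constant 1 in h = O(.)
    samples = rng.laplace(scale=1.0 / lam, size=(reps, n))
    est = kernel_trapezoid((x - samples) / h).mean(axis=1) / h
    mse = float(np.mean((est - truth) ** 2))
    return mse * n ** ((2 * r + 1) / (2 * (r + 1))), float(est.mean()), truth

rng = np.random.default_rng(1)
results = {n: scaled_mse(n, rng=rng) for n in (100, 500, 2000)}
for n, (scaled, mean_est, truth) in results.items():
    print(n, scaled, mean_est, truth)
```

With only 150 replications per sample size the scaled values fluctuate noticeably, so stabilization is a visual judgment rather than a formal test.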

Typical examples of the simulation results are presented in Figure 3 (for

r = 1

), Figure 4 (for

r = 1

and

r = 2

), and Figure 5 (for

r = 3

and

r = 5

). The expected stabilization of the scaled MSE is observed in all cases. Moreover, the sample size needed to reach the limiting asymptotic behavior increases with r. For

r = 1

and

r = 2

we can see stabilization starting from

n \approx 500

, for

r = 3

it starts from

n \approx 1500

, while for

r = 5

the asymptotic behavior is observed to start from sample size

n \approx 15{,}000

.

7. Technical Proofs

7.1. Proof of Lemma 1

First we note that for every

m \geq 1

and

n \geq 1

there exist positive numbers

κ_{m}

such that

\sup_{f \in {\bar{F}}_{r}} E_{f} {| ϕ_{n} (s) - ϕ (s) |}^{2 m} \leq \frac{κ_{m}}{n^{m}} .

These inequalities follow from the Burkholder inequality (see, for example, [29]) for the martingale

(\sum_{k = 1}^{n} (e^{i s X_{k}} - ϕ (s)), F_{n}^{X}),

F_{n}^{X} = σ {X_{1}, \dots, X_{n}}

and finiteness of the function

ϕ (\cdot) .
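For m = 1 the bound can be made explicit without the Burkholder inequality: since ϕ_n(s) is an average of the i.i.d. terms e^{isX_k}, we have E|ϕ_n(s) − ϕ(s)|² = (1 − |ϕ(s)|²)/n ≤ 1/n, so κ_1 = 1 works. The sketch below checks this identity by simulation for a unit-variance Laplace sample; the function name and the choice of distribution, s, n and the number of replications are illustrative assumptions.

```python
import numpy as np

def ecf_second_moment(s, n, reps, rng):
    """Monte Carlo estimate of E|phi_n(s) - phi(s)|^2 for Laplace data."""
    lam = np.sqrt(2.0)                          # unit-variance Laplace
    phi = lam**2 / (lam**2 + s**2)              # true characteristic function
    x = rng.laplace(scale=1.0 / lam, size=(reps, n))
    phi_n = np.exp(1j * s * x).mean(axis=1)     # empirical characteristic function
    return float(np.mean(np.abs(phi_n - phi) ** 2)), phi

rng = np.random.default_rng(2)
n, s = 200, 1.5
moment, phi = ecf_second_moment(s, n, reps=4000, rng=rng)
exact = (1.0 - phi**2) / n                      # identity for m = 1
print(moment, exact, 1.0 / n)
```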

Using this bound and Hölder’s inequality, we can estimate

sup_{f \in {\bar{F}}_{r}} E_{f} J_{n, α}^{2 m} = sup_{f \in {\bar{F}}_{r}} E_{f} {[\int_{A_{n} < | s | < B_{n}} {| s |}^{α} | ϕ_{n} (s) - ϕ (s) | d s]}^{2 m}

\leq {(\int_{A_{n} < | s | < B_{n}} {| s |}^{\frac{2 m α}{2 m - 1}} d s)}^{2 m - 1} \cdot \int_{A_{n} < | s | < B_{n}} sup_{f \in {\bar{F}}_{r}} E_{f} {| ϕ_{n} (s) - ϕ (s) |}^{2 m} d s \leq \frac{C_{α, m} B_{n}^{2 m (α + 1)}}{n^{m}} .

From the Borel–Cantelli lemma and the assumed summability of the right-hand side of (13) for

m = m_{0}, α = ϱ_{n} + δ_{n}

and

f \in {\bar{F}}_{r}

follows the second assertion of Lemma 1.

7.2. Proof of Theorem 2

We now prove statements

1^{\circ}

(a) and

2^{\circ}

(a) of Theorem 2. First, we show that, for n large enough,

r_{n}^{+} < ϱ_{n} \quad P_{f}\text{-a.s.}

(18)

To this end, according to the definition of the estimator

r_{n}^{+},

it is enough to establish for some

α > r

the limiting relation

\lim_{n \to \infty} Φ_{n, α} (A_{n}, B_{n}) = \infty \quad P_{f}\text{-a.s.},

which follows from the definition of the class

C_{+}

and Lemma 1:

Φ_{n, α} (A_{n}, B_{n}) = Φ_{n, α} (A_{n}, B_{n}) - Φ_{α} (A_{n}, B_{n}) + Φ_{α} (A_{n}, B_{n}) \geq Φ_{α} (A_{n}, B_{n}) - J_{n, α} \to \infty \quad P_{f}\text{-a.s.}

From (18) and by the definition of the estimator

r_{n}^{+},

for n large enough, we have

Φ_{n, r_{n}^{+} + δ_{n}} (A_{n}, B_{n}) \geq H .

Thus,

\liminf_{n \to \infty} Φ_{r_{n}^{+} + δ_{n}} (A_{n}, B_{n}) \geq \liminf_{n \to \infty} Φ_{n, r_{n}^{+} + δ_{n}} (A_{n}, B_{n})

- \limsup_{n \to \infty} J_{n, ϱ_{n} + δ_{n}} \geq H \quad P_{f}\text{-a.s.}

(19)

Analogously,

\limsup_{n \to \infty} Φ_{r_{n}^{+}} (A_{n}, B_{n}) \leq H \quad P_{f}\text{-a.s.}

(20)

From (19) and (20) it follows that, for any

ε > 0

and

δ > 0

for n large enough

r - ε - δ < r_{n}^{+} < r + ε \quad P_{f}\text{-a.s.}

and the assertion 1(a) of Theorem 2 is proved.

From the definitions of the estimator

r_{n}^{+},

class

C_{+}^{*},

Chebyshev’s inequality and (13), for

m \geq 1, n > n_{1}

and

f \in F_{r}^{+}

we have

P_{f} (r_{n - 1}^{+} < r - δ_{n - 1}) \leq P_{f} (Φ_{n, r} (A_{n}, B_{n}) \geq H) \leq P_{f} (J_{n, r} \geq H - Φ_{r} (A_{n}, B_{n}))

\leq \frac{E_{f} J_{n, r}^{2 m}}{{(H - Φ_{r} (A_{n}, B_{n}))}^{2 m}} \leq \frac{C_{r, m} B_{n}^{2 m (r + 1)}}{n^{m}} .

(21)

Similar to (21) for

n > n_{2}

and

m \geq 1

we obtain

P_{f} (r_{n - 1}^{+} > r + δ_{n - 1}) \leq P_{f} (Φ_{n, r + δ_{n}} (A_{n}, B_{n}) < H) = P_{f} (Φ_{r + δ_{n}} (A_{n}, B_{n}) - H \leq J_{n, r + δ_{n}}) \leq \frac{E_{f} J_{n, r + δ_{n}}^{2 m}}{{(Φ_{r + δ_{n}} (A_{n}, B_{n}) - H)}^{2 m}} \leq \frac{C_{r + 1, m} B_{n}^{2 m (r + 1 + δ_{n})}}{n^{m}}

(22)

and, consequently, for

f \in F_{r}^{+},

P_{f} (δ_{n}^{- 1} | r_{n}^{+} - r | \geq 1) \leq \frac{2 C_{r + 1, m} B_{n}^{2 m (r + 1 + δ_{n})}}{n^{m}} .

(23)

From the Borel–Cantelli lemma and the assumed summability of the right-hand side in (23) for

m \geq m_{0}

follows the assertion 2(a) of Theorem 2.

The other statements of Theorem 2 for the estimator

r_{n}

can be proved analogously.

7.3. Proof of Theorem 3

Consider the deviation of the estimator (14) in the following form:

{\hat{f}}_{n} (x) - f (x) = I_{1} (n) + I_{2} (n),

(24)

where

I_{1} (n) = {\hat{f}}_{n} (x) - {\tilde{f}}_{n} (x), I_{2} (n) = {\tilde{f}}_{n} (x) - f (x),

{\tilde{f}}_{n} (x) = \frac{1}{n} \sum_{j = 1}^{n} \int K (z) f (x - {\hat{h}}_{j - 1} z) d z = \frac{1}{2 π n} \sum_{j = 1}^{n} \int λ_{j - 1} (s) ϕ (s) e^{- i s x} d s .

Now we estimate second moments of

I_{1} (n)

and

I_{2} (n) .

Denote

F_{j}^{X} = σ {X_{1}, \dots, X_{j}} .

For

f \in {\bar{F}}_{r}

we have

E_{f} I_{1}^{2} (n) = \frac{1}{n^{2}} E_{f} {(\sum_{j = 1}^{n} \frac{1}{{\hat{h}}_{j - 1}} [K (\frac{x - X_{j}}{{\hat{h}}_{j - 1}}) - {\hat{h}}_{j - 1} \int K (z) f (x - {\hat{h}}_{j - 1} z) d z])}^{2}

= \frac{1}{n^{2}} E_{f} \sum_{j = 1}^{n} \frac{1}{{\hat{h}}_{j - 1}^{2}} E_{f} ({[K (\frac{x - X_{j}}{{\hat{h}}_{j - 1}}) - {\hat{h}}_{j - 1} \int K (z) f (x - {\hat{h}}_{j - 1} z) d z]}^{2} | F_{j - 1}^{X})

= \frac{1}{n^{2}} E_{f} \sum_{j = 1}^{n} \frac{1}{{\hat{h}}_{j - 1}} \int {[K (u) - {\hat{h}}_{j - 1} \int K (z) f (x - {\hat{h}}_{j - 1} z) d z]}^{2} f (x - {\hat{h}}_{j - 1} u) d u

\leq \frac{C_{1}}{n^{2}} \sum_{j = 1}^{n} E_{f} \frac{1}{{\hat{h}}_{j - 1}} + \frac{{\bar{f}}^{2}}{n} \leq \frac{C_{1}}{n^{2}} \sum_{j = 1}^{n} \frac{1}{h_{j - 1}^{*}} + \frac{C_{1}}{n^{2}} \sum_{j = 1}^{n} j P_{f} (r (j) < r - δ_{j}) + \frac{{\bar{f}}^{2}}{n} .

From (21) for

m = 2 m_{0}

we obtain

E_{f} I_{1}^{2} (n) \leq \frac{C_{1}}{n^{2}} \sum_{j = 1}^{n} \frac{1}{h_{j - 1}^{*}} + \frac{C_{1}}{n^{2}} \sum_{j = 1}^{n_{1}} j + \frac{C_{1} C_{r, 2 m_{0}}}{n^{2}} \sum_{j > n_{1}} \frac{j B_{j}^{4 m_{0} (r + 1)}}{j^{2 m_{0}}} + \frac{{\bar{f}}^{2}}{n}

= \frac{C_{1}}{n^{2}} \sum_{j = 1}^{n} \frac{1}{h_{j - 1}^{*}} + \frac{{\tilde{C}}_{1}}{n^{2}} + \frac{{\bar{f}}^{2}}{n} .

(25)

Further, by the definition of the function

\tilde{f} (x),

the Cauchy–Bunyakovskii–Schwarz inequality and from (22) we have

E_{f} I_{2}^{2} (n) = \frac{1}{4 π^{2} n^{2}} E_{f} {[\sum_{j = 1}^{n} \int_{| s | \geq {\hat{h}}_{j - 1}^{- 1}} (λ_{j - 1} (s) - 1) ϕ (s) e^{- i s x} d s]}^{2}

\leq \frac{1}{4 π^{2} n} \sum_{j \geq 1} E_{f} {[\int_{| s | \geq {\hat{h}}_{j - 1}^{- 1}} | ϕ (s) | d s]}^{2}

\leq \frac{1}{4 π^{2} n} \sum_{j = 1}^{n} E_{f} {\hat{h}}_{j - 1}^{2 r - 2 γ} \cdot {[\int_{| s | \geq {\hat{h}}_{j - 1}^{- 1}} {| s |}^{r - γ} | ϕ (s) | d s]}^{2}

= \frac{1}{4 π^{2} n} \sum_{j = 1}^{n} E_{f} {\hat{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\hat{h}}_{j - 1})

\leq \frac{1}{4 π^{2} n} [\sum_{j = 1}^{n} {\tilde{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\tilde{h}}_{j - 1}) + \sum_{j = 1}^{n_{2}} {\bar{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\bar{h}}_{j - 1}) + \sum_{j > n_{2}} {\bar{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\bar{h}}_{j - 1}) P_{f} (r (j) > r - γ + δ_{j})]

\leq \frac{1}{4 π^{2} n} [\sum_{j = 1}^{n} {\tilde{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\tilde{h}}_{j - 1}) + \sum_{j = 1}^{n_{2}} {\bar{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\bar{h}}_{j - 1}) + C_{r + 1, m_{0}} \sum_{j > n_{2}} {\bar{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\bar{h}}_{j - 1}) \frac{B_{j}^{2 m_{0} (r + 1 + δ_{j})}}{n^{m_{0}}}]

\leq \frac{1}{4 π^{2} n} \sum_{j = 1}^{n} {\tilde{h}}_{j - 1}^{2 r - 2 γ} Δ_{γ}^{2} ({\tilde{h}}_{j - 1}) + \frac{C_{2}}{n} .

(26)

From (24)–(26) follows the first assertion of Theorem 3.

To prove the second assertion, we first estimate, for some integer

m > 1

, the rate of convergence of the moment

E_{f} I_{1}^{2 m} (n) .

Let

α_{1}, \dots, α_{m}

be non-negative integers and denote

{\tilde{K}}_{j} (x) = K (\frac{x - X_{j}}{{\hat{h}}_{j - 1}}) - {\hat{h}}_{j - 1} \int K (z) f (x - {\hat{h}}_{j - 1} z) d z,

{\bar{K}}_{j} (x) = \int {[K (u) - {\hat{h}}_{j - 1} \int K (z) f (x - {\hat{h}}_{j - 1} z) d z]}^{2} f (x - {\hat{h}}_{j - 1} u) d u .

By the Burkholder inequality for the martingale

\sum_{j = 1}^{n} \frac{1}{{\hat{h}}_{j - 1}} {\tilde{K}}_{j} (x)

we have

E_{f} I_{1}^{2 m} (n) = \frac{1}{n^{2 m}} E_{f} {(\sum_{j = 1}^{n} \frac{1}{{\hat{h}}_{j - 1}} {\tilde{K}}_{j} (x))}^{2 m} \leq \frac{C}{n^{2 m}} E_{f} {(\sum_{j = 1}^{n} \frac{1}{{\hat{h}}_{j - 1}^{2}} {\tilde{K}}_{j}^{2} (x))}^{m}

\leq \frac{C}{n^{2 m}} \sum_{α_{1} + \dots + α_{m} = m} \sum_{1 \leq j_{1} < \dots < j_{m} \leq n} E_{f} \frac{1}{{\hat{h}}_{j_{1} - 1}^{2} \cdot \dots \cdot {\hat{h}}_{j_{m} - 1}^{2}} {\tilde{K}}_{j_{1}}^{2} (x) \cdot \dots \cdot {\tilde{K}}_{j_{m}}^{2} (x)

\leq \frac{C}{n^{2 m}} \sum_{α_{1} + \dots + α_{m} = m} \sum_{1 \leq j_{1} < \dots < j_{m} \leq n} E_{f} \frac{{\tilde{K}}_{j_{1}}^{2} (x) \cdot \dots \cdot {\tilde{K}}_{j_{m - 1}}^{2} (x) \cdot {\bar{K}}_{j_{m}} (x)}{{\hat{h}}_{j_{1} - 1}^{2} \cdot \dots \cdot {\hat{h}}_{j_{m - 1} - 1}^{2} \cdot h_{j_{m} - 1}^{*}}

\leq \frac{C}{n^{2 m}} \sum_{α_{1} + \dots + α_{m} = m} \sum_{1 \leq j_{1} < \dots < j_{m} \leq n} E_{f} \frac{1}{{\hat{h}}_{j_{1} - 1}^{2} \cdot \dots \cdot {\hat{h}}_{j_{m - 2} - 1}^{2} \cdot h_{j_{m - 1} - 1}^{*} \cdot h_{j_{m} - 1}^{*}}

\cdot {\tilde{K}}_{j_{1}}^{2} (x) \cdot \dots \cdot {\tilde{K}}_{j_{m - 2}}^{2} (x) \cdot {\bar{K}}_{j_{m - 1}} (x) \leq \dots \leq \frac{C}{n^{2 m}} {(\sum_{j = 1}^{n} \frac{1}{h_{j - 1}^{*}})}^{m} .

By the definition of

h_{j}^{*}

for some

0 < r_{*} < r - γ

and

j_{*} < \infty,

h_{j}^{*} \geq n^{- \frac{1}{1 + 2 r_{*}}}

and, consequently,

E_{f} I_{1}^{2 m} (n) = O (n^{- \frac{2 m r_{*}}{1 + 2 r_{*}}}) \quad \text{as } n \to \infty .

Thus

\frac{2 m r_{*}}{1 + 2 r_{*}} > 1

for

m > \frac{1 + 2 r_{*}}{2 r_{*}}

and by the Borel–Cantelli lemma, as

n \to \infty,

I_{1} (n) \to 0 \quad P_{f}\text{-a.s.}

(27)

Further,

| I_{2} (n) | \leq \frac{1}{4 π^{2} n} \sum_{j = 1}^{n} {\hat{h}}_{j - 1}^{r} | Δ_{γ} ({\hat{h}}_{j - 1}) |

and, consequently, as

n \to \infty,

I_{2} (n) \to 0 \quad P_{f}\text{-a.s.}

(28)

From (27) and (28) follows the second assertion of Theorem 3.

7.4. Proof of Theorem 4

Note that the distribution of

{\hat{f}}_{n}^{(k)} (x)

is the same as that of

{\hat{f}}_{n}^{(j)} (x)

for all

k, j

. Hence,

E {\bar{f}}_{n, b} (x) = E {\hat{f}}_{n} (x) .

Now

| \mathrm{Cov} ({\hat{f}}_{n}^{(k)} (x), {\hat{f}}_{n}^{(j)} (x)) | \leq \mathrm{Var} ({\hat{f}}_{n}^{(1)} (x))

by the Cauchy–Schwarz inequality and the fact that

\mathrm{Var} ({\hat{f}}_{n}^{(k)} (x)) = \mathrm{Var} ({\hat{f}}_{n}^{(j)} (x))

. Thus,

\mathrm{Var} ({\bar{f}}_{n, b} (x)) = \frac{1}{b^{2}} \sum_{k = 1}^{b} \sum_{j = 1}^{b} \mathrm{Cov} ({\hat{f}}_{n}^{(k)} (x), {\hat{f}}_{n}^{(j)} (x)) \leq \mathrm{Var} ({\hat{f}}_{n}^{(1)} (x))

and the theorem is proven.

8. Conclusions

Non-parametric kernel estimation crucially depends on the bandwidth choice which, in turn, depends on the smoothness of the underlying function. Focusing on estimating a probability density function, we define a smoothness class and propose a data-based estimator of the underlying degree of smoothness. The convergence rates in the almost sure sense of the proposed estimators are obtained. Adaptive estimators of densities from the given class on the basis of the constructed smoothness parameter estimators are also presented, and their consistency is established. Simulation results illustrate the realization of the asymptotic behavior when the sample size grows large.

Recently, there has been an increasing interest in nonparametric estimation with dependent data both in terms of theory as well as applications; see, e.g., [15,30,31,32,33]. With respect to probability density estimation, many asymptotic results remain true when moving from i.i.d. data to data that are weakly dependent. For example, the estimator variance, bias and MSE have the same asymptotic expansions as in the i.i.d. case subject to some limitations on the allowed bandwidth rate; fortunately, the optimal bandwidth rate of

n^{- 1 / 5}

is in the allowed range—see [34,35].

Consequently, it is conjectured that our proposed estimator of smoothness, as well as the resulting data-based bandwidth choice and probability density estimator, will retain their validity even when the data are weakly dependent. Confirming this conjecture is left to future work, especially since working with dependent data can be quite intricate. For example, [36] extended the results of [34] from the realm of linear time series to strong-mixing processes. In so doing, Remark 5 of [36] pointed to a nontrivial error in the work of [34] which is directly relevant to optimal bandwidth choice.

Author Contributions

All authors contributed equally to this project. All authors have read and agreed to the published version of the manuscript.

Funding

Partial funding by NSF grant DMS 19-14556.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors are very thankful to O. Lepskii for helpful comments and remarks.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Rosenblatt, M. Stochastic Curve Estimation; NSF-CBMS Regional Conference Series in Probability and Statistics, 3; Institute of Mathematical Statistics: Hayward, CA, USA, 1991.
2. Bartlett, M.S. Statistical estimation of density function. Sankhya Indian J. Stat. 1963, A25, 245–254.
3. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
4. Devroye, L. A Course in Density Estimation; Birkhäuser: Boston, MA, USA; Basel, Switzerland; Stuttgart, Germany, 1987.
5. Gasser, T.; Müller, H.-G.; Mammitzsch, V. Kernels for nonparametric curve estimation. J. R. Stat. Soc. Ser. B 1985, 60, 238–252.
6. Granovsky, B.L.; Müller, H.-G. Optimal kernel methods: A unifying variational principle. Int. Stat. Rev. 1991, 59, 373–388.
7. Jones, M.C. On higher order kernels. J. Nonparametric Stat. 1995, 5, 215–221.
8. Marron, J.S. Visual understanding of higher order kernels. J. Comput. Graph. Stat. 1994, 3, 447–458.
9. Müller, H.-G. Nonparametric Regression Analysis of Longitudinal Data; Springer: Berlin/Heidelberg, Germany, 1988.
10. Scott, D.W. Multivariate Density Estimation: Theory, Practice and Visualization; Wiley: New York, NY, USA, 1992.
11. Devroye, L. A note on the usefulness of superkernels in density estimation. Ann. Stat. 1992, 20, 2037–2056.
12. Politis, D.N. On nonparametric function estimation with infinite-order flat-top kernels. In Probability and Statistical Models with Applications; Charalambides, C., Ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2001; pp. 469–483.
13. Jones, M.C.; Marron, J.S.; Sheather, S.J. A brief survey of bandwidth selection for density estimation. J. Am. Stat. Assoc. 1996, 91, 401–407.
14. Tsybakov, A. Introduction to Nonparametric Estimation; Springer Series in Statistics; Springer: New York, NY, USA, 2009.
15. Dobrovidov, A.V.; Koshkin, G.M.; Vasiliev, V.A. Non-Parametric State Space Models; Kendrick Press: Heber City, UT, USA, 2012.
16. Ibragimov, I.A.; Khasminskii, R.Z. Statistical Estimation: Asymptotic Theory; Springer: Berlin/Heidelberg, Germany, 1981.
17. Lehmann, E.L.; Romano, J.P. Testing Statistical Hypotheses, 4th ed.; Springer Texts in Statistics: New York, NY, USA, 2022.
18. Lepskii, O.V.; Spokoiny, V.G. Optimal pointwise adaptive methods in nonparametric estimation. Ann. Stat. 1997, 25, 2512–2546.
19. Brown, L.D.; Low, M.G. Superefficiency and Lack of Adaptibility in Functional Estimation; Technical Report; Cornell University: Ithaca, NY, USA, 1992.
20. Lepskii, O.V. On a problem of adaptive estimation in Gaussian white noise. Theory Probab. Its Appl. 1990, 35, 454–466.
21. Butucea, C. Exact adaptive pointwise estimation on Sobolev classes of densities. ESAIM Probab. Stat. 2001, 5, 1–31.
22. Goldenshluger, A.; Lepski, O. Bandwidth selection in kernel density estimation: Oracle inequalities and adaptive minimax optimality. Ann. Statist. 2011, 39, 1608–1632.
23. Lacour, C.; Massart, P.; Rivoirard, V. Estimator selection: A new method with applications to kernel density estimation. Sankhya A 2017, 79, 298–335.
24. Politis, D.N. Adaptive bandwidth choice. J. Nonparametric Stat. 2003, 15, 517–533.
25. Politis, D.N.; Romano, J.P. On a Family of Smoothing Kernels of Infinite Order. In Computing Science and Statistics, Proceedings of the 25th Symposium on the Interface, San Diego, CA, USA, 14–17 April 1993; Tarter, M., Lock, M., Eds.; The Interface Foundation of North America: San Diego, CA, USA, 1993; pp. 141–145.
26. Politis, D.N.; Romano, J.P. Multivariate density estimation with general flat-top kernels of infinite order. J. Multivar. Anal. 1999, 68, 1–25.
27. Politis, D.N.; Romano, J.P.; Wolf, M. Subsampling; Springer: New York, NY, USA, 1999.
28. McMurry, T.; Politis, D.N. Nonparametric regression with infinite order flat-top kernels. J. Nonparametric Stat. 2004, 16, 549–562.
29. Liptser, R.; Shiryaev, A. Theory of Martingales; Springer: New York, NY, USA, 1988.
30. Bijloos, G.; Meyers, J.A. Fast-Converging Kernel Density Estimator for Dispersion in Horizontally Homogeneous Meteorological Conditions. Atmosphere 2021, 12, 1343.
31. Cortes Lopez, J.C.; Jornet Sanz, M. Improving Kernel Methods for Density Estimation in Random Differential Equations Problems. Math. Comput. Appl. 2020, 25, 33.
32. Correa-Quezada, R.; Cueva-Rodriguez, L.; Alvarez-Garcia, J.; del Rio-Rama, M.C. Application of the Kernel Density Function for the Analysis of Regional Growth and Convergence in the Service Sector through Productivity. Mathematics 2020, 8, 1234.
33. Vasiliev, V.A. A truncated estimation method with guaranteed accuracy. Ann. Inst. Stat. Math. 2014, 66, 141–163.
34. Hallin, M.; Tran, L.T. Kernel density estimation for linear processes: Asymptotic normality and bandwidth selection. Ann. Inst. Stat. Math. 1996, 48, 429–449.
35. Wu, W.-B.; Mielniczuk, J. Kernel density estimation for linear processes. Ann. Stat. 2002, 30, 1441–1459.
36. Lu, Z. Asymptotic normality of kernel density estimators under dependence. Ann. Inst. Stat. Math. 2001, 53, 447–468.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
