Linear Wavelet-Based Estimators of Partial Derivatives of Multivariate Density Function for Stationary and Ergodic Continuous Time Processes

Didi, Sultana; Bouzebda, Salim

doi:10.3390/e27040389

Open AccessArticle

Linear Wavelet-Based Estimators of Partial Derivatives of Multivariate Density Function for Stationary and Ergodic Continuous Time Processes

by

Sultana Didi

^1,†

and

Salim Bouzebda

^2,*,†

¹

Department of Statistics and Operations Research, College of Sciences, Qassim University, P.O. Box 6688, Buraydah 51452, Saudi Arabia

²

LMAC (Laboratory of Applied Mathematics of Compiègne), Université de Technologie de Compiègne, CS 60 319-60 203 Compiègne Cedex, 60203 Compiègne, France

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Entropy 2025, 27(4), 389; https://doi.org/10.3390/e27040389

Submission received: 6 February 2025 / Revised: 1 April 2025 / Accepted: 4 April 2025 / Published: 6 April 2025

(This article belongs to the Section Information Theory, Probability and Statistics)

Download Versions Notes

Abstract

In this work, we propose a wavelet-based framework for estimating the derivatives of a density function in the setting of continuous, stationary, and ergodic processes. Our primary focus is the derivation of the integrated mean square error (IMSE) over compact subsets of

R^{d}

, which provides a quantitative measure of the estimation accuracy. In addition, a uniform convergence rate and normality are established. To establish the asymptotic behavior of the proposed estimators, we adopt a martingale approach that accommodates the ergodic nature of the underlying processes. Importantly, beyond ergodicity, our analysis does not require additional assumptions regarding the data. By demonstrating that the wavelet methodology remains valid under these weaker dependence conditions, we extend earlier results originally developed in the context of independent observations.

Keywords:

density estimation; stationarity; ergodicity; rates of strong convergence; wavelet-based estimators; martingale differences; discrete time; stochastic processes; time series

MSC:

62G07; 62G08; 62G05; 62G20; 62H05; 60G42; 60G46

1. Introduction

Multivariate data analysis frequently relies on the behavior of the partial derivatives of underlying density functions, especially for tasks such as identifying modal regions. Despite their significance, research on the nonparametric estimation of higher-order density derivatives remains relatively scarce. The primary goal of this work is to introduce wavelet-based approaches for estimating the partial derivatives of multivariate densities in a nonparametric framework. Estimating the derivatives of unknown functions—whether they are densities or regression functions—constitutes a fundamental statistical operation that underpins numerous applications. Such estimation tasks are common in fields like economics and industrial engineering, where complex systems must be modeled in the face of substantial prior uncertainty. For instance, the second-order derivatives of densities can be used to construct statistical tests for detecting modes [1,2], and they also inform the choice of the bandwidth in kernel density estimation [3]. In nonparametric signal estimation, the ratio between a density’s derivative and the density itself (i.e., the logarithmic derivative) serves a central role in formulating optimal filtering and interpolation equations [4,5]. The accurate estimation of this ratio thereby becomes critical for signal processing. Meanwhile, gradient estimation is essential for isolating filaments in point clouds, a technique frequently employed in medical imaging, remote sensing, seismology, and cosmology [6]. Other motivations include challenges in regression analysis, Fisher information evaluation, parameter estimation, and hypothesis testing [7]. Foundational studies on density derivative estimation can be found in [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22], among others.

Kernel-based estimators are recognized for their strong performance when the underlying target function (whether a density or a regression function) is defined on an unbounded domain. However, they often exhibit deficiencies near the boundaries of a compact support. Furthermore, traditional kernel-based strategies typically assume a sufficiently high degree of smoothness on the target function, for instance, by requiring at least two continuous derivatives. In the absence of such smoothness guarantees, wavelet-based estimators present a compelling alternative. They leverage function spaces that capture refined notions of regularity rather than mere differentiability requirements. Remarkably, although wavelet estimators do not explicitly demand these smoothness properties, they nevertheless attain optimal convergence rates equivalent to those of strategies presupposing full knowledge of the function’s true regularity; see [23].

Wavelet-based methods have thus gained prominence in statistical estimation by deftly accommodating heterogeneous regularity properties and managing discontinuities with ease. An additional merit is their typically low computational overhead, coupled with modest memory usage. For a thorough exposition of wavelet methods in nonparametric function estimation, see [24]. Representative applications of wavelets include the estimation of the integrated squared derivative of a univariate density under independence [25] and under certain forms of dependence [26]. Extensions to the partial derivatives of multivariate densities under independence were explored by [27], while the mixing hypothesis was investigated in [28,29]. More recently, [30] studied wavelet estimators for the partial derivatives of multivariate densities in the presence of additive noise.

Although mixing assumptions are widely employed for their technical convenience, they may not hold when the data exhibit more intricate dependence patterns. In essence, “mixing” conditions entail a form of asymptotic independence that may be unrealistic in settings characterized by strong dependence. Methods that address more general dependence structures remain relatively undeveloped. In this respect, ergodicity provides a broader alternative to strong mixing assumptions by dispensing with their stringent requirements and associated probabilistic complexity [31]. All mixing conditions imply ergodicity; however, ergodicity itself is strictly weaker, enlarging the range of processes that can be analyzed [32]. These advantages are examined in more detail in [33,34], where discussions of ergodic processes and their practical significance can be found. Several examples illustrate that certain processes are ergodic but not mixing [32,35]. In particular, ref. [36] (Section 5.3) constructed a strictly stationary process,

{(T_{i}, λ_{i})}_{i \in Z}

, satisfying

T_{i} ∣ T_{i - 1} \sim Poisson (λ_{i})

, with

λ_{i} = κ (λ_{i - 1}, T_{i - 1})

for some mapping,

κ

. While this process is ergodic under general conditions, it need not be mixing [37]. Another important example involves fractional Brownian motion,

{W_{t}^{H} : t \geq 0}

. When

H \in (0, 1)

, this process has strictly stationary increments, and the associated fractional Gaussian noise process

{G_{t}^{H} : t \geq 0}

exhibits long memory for

H > 1 / 2

, thereby violating strong mixing [38,39], although suitable correlation decay conditions can still yield ergodicity [40]. In general, verifying ergodicity is often simpler than proving mixing, making ergodic frameworks appealing for analyzing diverse classes of processes, including those arising in chaotic systems with noise [41]. Formally, let

{X_{n}}_{n \in Z}

be a stationary sequence with backward

σ

-fields,

A_{n} = σ (X_{k} : k \leq n)

, and forward

σ

-fields,

B_{m} = σ (X_{k} : k \geq m)

. The sequence is deemed *strongly mixing* if

sup_{A \in A_{0}, B \in B_{n}} |P (A \cap B) - P (A) P (B)| ⟶ 0 as n \to \infty .

It is said to be *ergodic* if

\frac{1}{n} \sum_{k = 0}^{n - 1} |P (A \cap τ^{- k} B) - P (A) P (B)| ⟶ 0 as n \to \infty,

where

τ

denotes the shift operator. Strong mixing implies ergodicity, but the converse does not always hold [32]. Many authors advocate for an ergodic perspective precisely because it accommodates an expanded class of stochastic processes [33,34].

Despite the literature on wavelet estimators for the partial derivatives of multivariate densities, it appears that no existing work addresses dependence structures strictly beyond strong mixing. The present study aims to fill this gap by developing wavelet-based estimators that operate under ergodic conditions. This extension necessitates a suite of martingale techniques that diverge substantially from the standard tools typically employed for strong mixing processes. More importantly, bridging this methodological gap requires novel theoretical contributions specifically tailored to estimating partial derivatives in an ergodic context, rather than a mere combination of pre-existing results obtained using wavelet methods and ergodic theory.

The organization of this paper is as follows. Section 2 is devoted to the mathematical background of the study. The proposed linear wavelet estimators are stated as well. The assumptions and main results concerning the uniform convergence rates and asymptotic normality under the weak dependence assumptions are given in Section 3. Some concluding remarks and suggestions for future work are given in Section 6. Finally, to avoid interrupting the flow of the presentation, the proofs are postponed to Section 7.

Notation

Throughout the paper, we let C be some positive constant, which may be different from one term to another. Set

1_{A} (\cdot)

is the indicator function of A. For the numerical sequences of positive constants,

a_{n}

,

b_{n}

, where

\{a_{n}, n \geq 1\}

and

\{b_{n}, n \geq 1\}

, we denote

a_{n} = O (b_{n})

for

a_{n} \leq C b_{n}

, and

a_{n} = o (b_{n})

implies that

\frac{a_{n}}{b_{n}} \to 0

as

n \to \infty

.

2. Mathematical Background

2.1. Linear Wavelet Estimator

We begin by reviewing fundamental concepts in wavelet analysis, adopting the notation in [42]. Consider a multiresolution analysis (MRA),

{V_{j}}_{j = 1}^{\infty}

, of

L_{2} (R^{d})

, with

ϕ (\cdot)

and

ψ (\cdot)

denoting the scaling function and the orthogonal wavelet, respectively. Both

ϕ

and

ψ

are assumed to be compactly supported on

{[- L, L]}^{d}

for some

L > 0

and r-regular (for

r \geq 1

) if

ϕ (\cdot) \in C^{r}

and all its partial derivatives up to a total order of r are rapidly decreasing, i.e., for every integer

i > 0

, there exists a constant

A_{i}

such that

|(D^{β} ϕ) (x)| \leq \frac{2^{j (d + | β |) / 2} A_{i}}{{(1 + ∥ x ∥)}^{i}}, for all ∣ β ∣ \leq r .

(1)

For every integer j and

k \in Z^{d}

, the family

{\{ϕ_{j, k} (x) = 2^{\frac{j d}{2}} ϕ (2^{j} x - k)\}}_{k \in Z^{d}}

constitutes an orthonormal basis of

V_{j}

. Furthermore, one obtains

2^{d} - 1

orthonormal wavelet functions:

\{ψ_{i, j, k} (x) = 2^{\frac{j d}{2}} ψ_{i} (2^{j} x - k); i = 1, \dots, 2^{d} - 1, k \in Z^{d}\},

which together form an orthonormal basis of

L_{2} (R^{d})

.

Next, let

{X_{i}}_{i = 1}^{n}

be a collection of (weakly dependent) random variables drawn from a distribution with an unknown density,

f (\cdot)

. Assume f is partially differentiable up to a total order of r. Our main interest is to estimate the partial derivatives of f; that is,

(\partial^{β} f) (x) = \frac{\partial^{β} f (x)}{\partial x_{1}^{β_{1}} \dots \partial x_{d}^{β_{d}}},

where

β = (β_{1}, \dots, β_{d})

and

| β | = \sum_{i = 1}^{d} β_{i}

. In particular, we aim to construct wavelet-based estimators of

(\partial^{β} f) (\cdot)

that exhibit both strong convergence and asymptotic normality. Assuming that

(\partial^{β} f) (\cdot)

lies in

L_{2} (R^{d})

, one can invoke [43] to write, for any integer m,

(\partial^{β} f) (x) = \sum_{k \in Z^{d}} a_{m, k} ϕ_{m, k} (x),

where each wavelet coefficient satisfies

a_{m, k} = \int_{R^{d}} (\partial^{β} f) (u) ϕ_{m, k} (u) d u = {(- 1)}^{| β |} \int_{R^{d}} f (u) (\partial^{β} ϕ_{m, k}) (u) d u .

Here,

(\partial^{β} ϕ_{m, k}) (\cdot)

denotes the

β

th partial derivative of

ϕ_{m, k} (\cdot)

.

A natural linear estimator of

(\partial^{β} f) (\cdot)

at a resolution level of

m = m (n) \to \infty

is

{\hat{(\partial^{β} f)}}_{n} (x) = \sum_{k \in Z^{d}} {\hat{a}}_{m, k} ϕ_{m, k} (x),

where

{\hat{a}}_{m, k}

is the following unbiased empirical estimator of

a_{m, k}

:

{\hat{a}}_{m, k} = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} (\partial^{β} ϕ_{m, k}) (X_{i}) .

2.2. Besov Spaces

In what follows, we will work within the framework of Besov spaces,

B_{s, p, q}

(with

s > 0

,

1 \leq p \leq \infty

, and

1 \leq q \leq \infty

). According to [42], these spaces can be characterized using wavelet coefficients. Specifically, if

0 < s < r

and

f (\cdot) \in L_{p} (R^{d})

(where s captures the real-valued smoothness of f), then the membership

f (\cdot) \in B_{s, p, q}

is equivalent to the following two conditions:

(B.1): $J_{s, p, q} (f) = {∥P_{V_{0}} f∥}_{L_{p}} + {(\sum_{j > 0} {(2^{j s} {∥ P_{W_{j}} f ∥}_{L_{p}})}^{q})}^{1 / q} < \infty,$
(B.2): $J_{s, p, q}^{'} (f) = {∥a_{0, \cdot}∥}_{l_{p}} + {(\sum_{j > 0} {(2^{j [s + d (\frac{1}{2} - \frac{1}{p})]} {∥b_{j, \cdot}∥}_{l_{p}})}^{q})}^{1 / q} < \infty,$

where

$a_{0, \cdot} = \int_{R^{d}} f (u) ϕ_{0, k} (u) d u, b_{i, j, k} = \int_{R^{d}} f (u) ψ_{i, j, k} (u) d u,$

and

${∥b_{j, \cdot}∥}_{l_{p}} = {(\sum_{i = 1}^{d} \sum_{k \in Z^{d}} {| b_{i, j, k} |}^{p})}^{1 / p} .$

When $q = \infty$ , one uses the corresponding supremum norm. Notably, these Besov spaces encapsulate various function classes frequently used in statistical analysis, including the Hilbert space $H^{s} = B_{s, 2, 2}$ and the space of bounded s-Lipschitz functions, $C^{s} = B_{s, \infty, \infty}$ (for a non-integer s), among others. Besov spaces, $B_{s, p, q}$ , are of central importance in statistical estimation and approximation theory, as they provide a flexible framework for characterizing smoothness properties; for instance, see [44,45,46,47,48]. Additional equivalent characterizations and advantages of Besov spaces in approximation theory and statistics can be found in [49,50,51,52] and in the Appendix A.

2.3. Linear Wavelet Estimator

We begin by reviewing key elements of wavelet theory, following the notation introduced in [42]. Consider a multiresolution analysis,

{V_{j}}_{j = 1}^{\infty}

, of the space

L_{2} (R^{d})

. Let

ϕ (\cdot)

be the scaling function and

ψ (\cdot)

the orthogonal wavelet function, both assumed to be r-regular (with

r \geq 1

) and compactly supported within

{[- L, L]}^{d}

for some

L > 0

. For each integer j and each

k \in Z^{d}

, define

ϕ_{j, k} (x) = 2^{j d / 2} ϕ (2^{j} x - k) .

It is well known that

{ϕ_{j, k}}_{k \in Z^{d}}

forms an orthonormal basis for

V_{j}

. Furthermore, there exist

2^{d} - 1

associated wavelet functions, giving rise to the family

\{ψ_{i, j, k} (x) = 2^{j d / 2} ψ_{i} (2^{j} x - k) ∣ i = 1, \dots, 2^{d} - 1, k \in Z^{d}\},

which constitutes an orthonormal basis for

L_{2} (R^{d})

.

Let

{X_{i}}_{i \geq 1}

be an

R^{d}

-valued strictly stationary ergodic process defined on a probability space,

(Ω, A, P)

. Let f be the common density function of the sample

X_{1}, \dots, X_{n}

, which is assumed to be bounded and continuous, that is partially differentiable up to a total order of r. In this work, our principal objective is to estimate the partial derivative of the density:

(\partial^{β} f) (x) = \frac{\partial^{β} f (x)}{\partial x_{1}^{β_{1}} \dots \partial x_{d}^{β_{d}}},

where

β = (β_{1}, \dots, β_{d})

and

| β | = \sum_{i = 1}^{d} β_{i}

. As highlighted in the Introduction, we aim to construct wavelet-based estimators of

(\partial^{β} f) (\cdot)

that exhibit both MISE and strong convergence as well as asymptotic normality.

According to [43], for any integer m, this derivative of the density can be expanded in the subspace

V_{m}

as

(\partial^{β} f) (x) = \sum_{k \in Z^{d}} a_{m, k} ϕ_{m, k} (x),

where each wavelet coefficient,

a_{m, k}

, is given by

a_{m, k} = \int_{R^{d}} (\partial^{β} f) (u) ϕ_{m, k} (u) d u = {(- 1)}^{| β |} \int_{R^{d}} f (u) (\partial^{β} ϕ_{m, k}) (u) d u,

where we have applied multiple integration by parts.

Remark 1.

The fact that

ϕ (\cdot)

and

ψ_{i} (\cdot)

are bounded and compactly supported, with the support being the monotonically increasing function of their degree of differentiability, ensures that the above summations on

k \in Z^{d}

are finite for each fixed

x

(i.e., they converge in a pointwise sense; for instance, see [43]).

Note that

(\partial^{β} ϕ_{m, k}) (\cdot)

denotes the

β

th partial derivative of

ϕ_{m, k} (\cdot)

. We define a linear estimator of

(\partial^{β} f) (\cdot)

at the resolution level

m = m (n) \to \infty

(to be specified later) by

{\hat{(\partial^{β} f)}}_{T} (x) = \sum_{k \in Z^{d}} {\hat{a}}_{m, k} ϕ_{m, k} (x),

(2)

where the unbiased empirical estimate

{\hat{a}}_{m, k}

of

a_{m, k}

is given by

{\hat{a}}_{m, k} = \frac{{(- 1)}^{| β |}}{T} \int_{0}^{T} (\partial^{β} ϕ_{m, k}) (X_{t}) d t .

(3)

This construction will serve as the foundation for our subsequent analysis.

3. Assumptions and Main Results

We introduce the following notation and our hypotheses to facilitate the presentation of our main results. For the remainder of the paper, any small, real

0 \leq t \leq T

and

δ > 0

, we will denote by

n = \frac{T}{δ} \in N, a n d T_{j} = j δ, f o r j = 1, \dots, n .

Let

F_{t - δ}

be the

σ

-field generated by

\{(X_{s}) : 0 \leq s < t - δ\} .

Let

F_{j}

be the

σ

-field generated by

\{(X_{s}) : 0 \leq s \leq T_{j}\} .

We define the conditional density of

X_{i}

given

F_{i - 1}

as

f_{X_{t}}^{F_{t - δ}} (\cdot) = f_{t}^{F_{t - δ}} (\cdot) .

The following assumptions are imposed throughout the paper.

(C.1): For every $x \in S$ , the sequence

$\frac{1}{T} \int_{0}^{T} f_{t}^{F_{t - δ}} (x) d t,$

converges to $f (x)$ as $n \to \infty$ , both almost surely (a.s.) and in the $L^{2}$ sense.
(C.2): Moreover, for every $x \in S$ ,

$lim_{n \to \infty} sup_{x \in R^{d}} |\frac{1}{T} \int_{0}^{T} f_{t}^{F_{t - δ}} (x) d t - f (x)| = 0,$

again in both the almost sure and $L^{2}$ senses.
(C.3): The partial derivative function $(\partial^{β} f) (\cdot)$ belongs to the Besov space $B_{s, p, q}$ for some $0 < s < r$ , $l \leq p, q \leq \infty$ .
(C.4): For all values of $i = 1; \dots; n$ , the partial derivative function $(\partial^{β} f^{F_{i - 1}}) (\cdot)$ belongs to the Besov space $B_{s, p, q}$ for some $0 < s < r$ , $l \leq p, q \leq \infty$ .

Define the kernel

K (u, v)

by

K (u, v) : = \sum_{k \in Z^{d}} ϕ (u - k) ϕ (v - k) a n d h_{n} = 2^{- m (n)} .

(4)

Consequently,

K^{(β)} (u, v) : = \sum_{k \in Z^{d}} ϕ (u - k) (\partial_{v}^{β} ϕ) (v - k),

(5)

where

K^{(β)} (u, v) : = (\partial_{v}^{β} k) (u, v)

denotes the

β

th partial derivative of

K (u, v)

with respect to

v

. We infer that the derivative kernel function

K^{(β)} (\cdot)

defined in (5) converges uniformly and satisfies [42] (p. 33), i.e., for

∣ α ∣ \leq r

,

∣ β ∣ \leq r

, and for some constant

C_{m} > 0

with

m \geq 1

, the following inequality holds:

| (\partial_{u}^{α} \partial_{v}^{β} K) (u, v) | \leq \frac{C_{m}}{{(1 + ∥ v - u ∥}_{2})^{m}},

where, for any

x = (x_{1}, \dots, x_{d}) \in \in R^{d}, {∥ x ∥}_{2} = \sqrt{\sum_{i = 1}^{d} x_{i}^{2}}

. For

α = 0

and

m = d + | β |

, we obtain

| K^{(β)} (u, v) | \leq \frac{C_{d + | β |}}{{(1 + ∥ v - u ∥}_{2})^{d + | β |}} .

(6)

That is, for any

j \geq 1

, we have

\int_{R^{d}} {| K^{(β)} (v, u) |}^{j} d v \leq G_{j} (d) .

The function

G_{j} (d)

is defined as

G_{j} (d) = 2 π^{d / 2} \frac{Γ (d) Γ (j + d (j - 1))}{Γ (d / 2) Γ ((d + 1) j)} C_{d + 1}^{j},

where

Γ (t)

represents the Gamma function, given by

Γ (t) : = \int_{0}^{\infty} y^{t - 1} e^{- y} d y .

Furthermore, by assuming

| α | = 1

and

m = 2

, we deduce that

|\frac{\partial_{u} \partial_{v}^{β} K (u, y)}{\partial u_{i}}| \leq \frac{C_{m}}{{(1 + ∥ u - y ∥}_{2})^{2}} \leq C_{2}, i = 1, \dots, d .

(7)

This, in turn, implies that

|K^{β} (u, y) - K^{β} (v, y)| \leq C_{2} \sum_{i = 1}^{d} | u_{i} - v_{i} | \leq d^{1 / 2} C_{2} {∥ u - v ∥}_{2} .

(8)

By incorporating Equations (2), (3), and (5), the estimator

{\hat{(\partial^{β} f)}}_{n} (x)

can be expressed in an extended kernel estimation framework as follows:

{\hat{(\partial^{β} f)}}_{T} (x) = \frac{{(- 1)}^{| β |}}{T h_{T}^{d + | β |}} \int_{0}^{T} K^{(β)} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}}) d t, where h_{T} = 2^{- m (T)} .

(9)

We refer the reader to [27] for further details.

4. Comments on Hypotheses

Assume that the random functions,

f_{t}^{F_{t - δ}} (x)

, for any

t \in [0, T]

, belong to the space,

C^{0}

, of continuous functions, which is a separable Banach space. Moreover, approaching the integral

\int_{0}^{T} f_{t}^{G_{t - δ}} (x) d t

through its Riemann’s sum, it follows that

\begin{matrix} \frac{1}{T} \int_{0}^{T} f_{t}^{F_{t - δ}} (x) d t & = & \frac{1}{T} \sum_{i = 1}^{n} \int_{T_{i - 1}}^{T_{i}} f_{t}^{F_{t - δ}} (x) d t \\ ⋍ & \frac{1}{n} \sum_{i = 1}^{n} f_{T_{i - 1}}^{F_{T_{i - 2}}} (x) . \end{matrix}

Since the process

{(X_{T_{j}})}_{j \geq 1}

is stationary and ergodic, following [53] (see Lemma 4 and Corollary 1, together with their proofs), one may prove that the sequence,

{(f_{j δ, (j - 1) δ} (x))}_{j \geq 1}

, of random functions is stationary and ergodic. Indeed, in the work of [53], it sufficed to replace the conditional densities with

f_{j δ, (j - 1) δ}

and the density with the function

f (\cdot)

. In nonparametric estimation, boundedness conditions (C.3) and (C.4) are typically assumed [54]. These conditions can be relaxed by using weighted approximations or distances [55,56]. However, this relaxation requires developing new probabilistic tools for the empirical process, tools that have not yet been established for dependent cases.

Theorem 1.

Under the stated assumptions (C.1) and (C.3), let f be an element of the Besov space

B_{s, p, q}

with

s > 1 / p

and

1 \leq p, q \leq \infty

. In this setting, the linear wavelet estimator

{\hat{(\partial^{β} f)}}_{n}

satisfies

E {∥{\hat{(\partial^{β} f)}}_{T} - \partial^{β} f∥}_{2}^{2} = O (T^{- \frac{2 (s - | β |)}{2 s + 1}}) .

Theorem 2.

Assume that

m (T) = m \to \infty a n d \frac{2^{d m (T)} log T}{T} \to 0 a s n \to \infty .

For every compact subset,

D \subset R^{d}

, under assumptions (C.1) and (C.3), we have, almost surely,

sup_{x \in D} |{\hat{(\partial^{β} f)}}_{T} (x) - E [{\hat{(\partial^{β} f)}}_{T} (x)]| = O ({(\frac{log T}{T h_{T}^{d + 2 | β |}})}^{1 / 2}) + O (h_{T}^{- 1 / 2}) .

A result of the bias term is given in the following lemma.

Lemma 1.

Using the assumption (C.4), we obtain the following result:

B_{T} (x) = {∥E [{\hat{(\partial^{β} f)}}_{T} (x)] - (\partial^{β} f) (x)∥}_{L^{\infty}} = O (h_{T}^{δ}),

for

δ = s - \frac{d}{p} > 0

.

Remark 2.

The term

2^{- m (T)}

in the preceding theorems is directly linked to the bandwidth

h_{T}

used in Parzen–Rosenblatt kernel density estimation. In practical applications, determining the multiresolution level

m (T)

in the wavelet framework tends to be more straightforward than selecting the bandwidth

h_{T}

. This is because, in practice, only a limited set of values for

m (T)

(typically three or four) needs to be considered, making the procedure both more intuitive and computationally efficient.

Remark 3.

It is well established that kernel estimators lose accuracy as the dimensionality of the data increases. This phenomenon, widely known as the curse of dimensionality [57], arises because in high-dimensional settings, local neighborhoods require remarkably large sample sizes to gather an adequate number of observations. Consequently, unless the sample size is extremely large, one must use bandwidths so wide that the concept of truly “local” averaging is essentially lost. A detailed account of these challenges, including numerical examples, can be found in [58], while more recent analyses appear in [59,60]. Notwithstanding the popularity of penalized splines, considerable uncertainty persists regarding their asymptotic performance even in conventional nonparametric problems, and only a handful of theoretical studies have examined their behavior. Furthermore, most functional regression methods rely on minimizing an

L_{2}

norm and thus remain susceptible to outliers. Alternative, less conventional but potentially effective strategies include approaches employing delta sequences and wavelet-based techniques [61].

Remark 4.

An intriguing avenue for future research lies in broadening the current theoretical framework to accommodate processes that exhibit memory, including non-Markovian stochastic dynamics. However, realizing such an extension will likely necessitate a methodological departure from the techniques employed here. Consequently, we leave the pursuit of this challenging direction open for subsequent investigations.

4.1. Asymptotic Normality Results

In this section, we establish a central limit theorem for the estimator defined in Equation (2) under the assumption of weak dependence. Our findings are derived under straightforward regularity conditions on the estimator and minimal bandwidth requirements. Notably, the mathematical premises and outcomes of Theorem 1 remain applicable in subsequent discussions.

We denote by

Z_{n} \overset{D}{\to} N (0, Σ^{2} (x))

that the sequence of random variables

{(Z_{n})}_{n \geq 1}

converges in distribution to a normal distribution with a mean of zero and the covariance matrix

Σ^{2} (x)

.

Theorem 3.

Assume that the hypotheses (C.1)–(C.4) are satisfied. Additionally, suppose that

T h_{T}^{d + 2 | β | + 2 δ} \to 0 a s T \to \infty . T h_{T}^{2 d + 2 | β |} \to 0 a s T \to \infty .

(10)

Then, the following convergence holds:

\sqrt{T h_{T}^{d + 2 | β |}} ({\hat{(\partial^{β} f)}}_{T} (x) - (\partial^{β} f) (x)) \overset{D}{\to} N (0, Σ^{2} (x)),

where

Σ^{2} (x) : = Σ_{(β)}^{2} (x) : = f_{X} (x) \int_{R^{d}} {(K^{(β)})}^{2} (0, u) d u .

The proof of Theorem 3 is detailed in Section 7.

Remark 5.

By leveraging Theorem 1, we can derive a novel result regarding the MSE of the wavelet-based multivariate density estimator. Specifically, under suitable conditions, we establish that

E {∥{\hat{f}}_{T} - f∥}_{2}^{2} = O (T^{- \frac{2 s}{2 s + 1}}) .

Furthermore, employing a similar analytical framework to Theorem 3, we obtain an asymptotic normality result. In particular, we show that

\sqrt{T h_{T}^{d}} ({\hat{f}}_{T} (x) - f (x)) \overset{D}{⟶} N (0, Σ_{(0)}^{2} (x)),

where the asymptotic variance is given by

Σ_{(0)}^{2} (x) = f_{X} (x) \int_{R^{d}} K^{2} (0, u) d u .

These results align with the theoretical insights presented in Theorem 4.7 of [33], reinforcing the effectiveness of wavelet estimators in nonparametric multivariate settings.

4.2. Confidence Interval

The asymptotic variance

Σ^{2} (x)

in the central limit theorem depends on the unknown density function

f_{X} (\cdot)

of

X

and must be estimated for practical applications. This estimation involves selecting a suitable family of bounded, compactly supported discrete wavelets from the existing literature (such as the commonly utilized Daubechies wavelets [43]) with an adequately large multiresolution level,

m (T)

, and an adaptive initial level,

j_{0}

. The unknown parameters are chosen using wavelet-based methods combined with a plug-in approach. We define

{\hat{Σ}}^{2} (x)

as a consistent estimator of the variance

Σ^{2} (x)

by setting

{\hat{Σ}}^{2} (x) : = {\hat{f}}_{X} (x) \int_{R^{d}} {(K^{(β)})}^{2} (0, u) d u,

where

{\hat{f}}_{X} (\cdot)

is a consistent estimator of

f_{X} (\cdot)

. Consequently, an approximate confidence interval for

(\partial^{β} f) (x)

can be constructed as

(\partial^{β} f) (x) \in [{\hat{(\partial^{β} f)}}_{T} (x) \pm c_{α} \frac{{\hat{Σ}}^{2} (x)}{\sqrt{T h_{T}^{d + 2 β}}}],

where

c_{α}

represents the

(1 - α)

quantile of the standard normal distribution.

5. Application to Multivariate Mode Estimation

In this section, we address the problem of estimating the nonparametric mode, following the framework established in [62] and retaining the same notation and definitions. The kernel mode estimator is defined as any random variable,

{\hat{Θ}}_{T}

, that satisfies

{\hat{f}}_{T} ({\hat{Θ}}_{T}) = sup_{x \in R^{p}} {\hat{f}}_{T} (x),

(11)

where

{\hat{(\partial^{0} f)}}_{T} = {\hat{f}}_{T}

. This estimator can be explicitly characterized as

{\hat{Θ}}_{T} = inf \{y \in R^{d} ∣ {\hat{f}}_{T} (y) = sup_{x \in R^{d}} {\hat{f}}_{T} (x)\},

where the infimum is taken with respect to the lexicographic order on

R^{p}

. This definition ensures the measurability of the wavelet-based mode estimator. Assuming that the true mode

Θ

is nondegenerate, meaning that the Hessian matrix

D^{2} f (Θ)

(the second-order derivative at

Θ

) is nonsingular, we denote by

\nabla ℓ (\cdot)

the gradient of

ℓ (\cdot)

. From the definition of

{\hat{Θ}}_{T}

, it follows that

\nabla {\hat{f}}_{T} ({\hat{Θ}}_{T}) = 0 .

Rearranging this, we obtain

\nabla {\hat{f}}_{T} ({\hat{Θ}}_{T}) - \nabla {\hat{f}}_{T} (Θ) = - \nabla {\hat{f}}_{T} (Θ) .

(12)

Applying Taylor’s expansion to the partial derivative

\frac{\partial f_{n} (\cdot)}{\partial x_{i}}

, we find that there exists a vector,

ξ_{T} (i) = {(ξ_{T; 1} (i), \dots, ξ_{T; d} (i))}^{⊤}

, such that

\frac{\partial {\hat{f}}_{T}}{\partial x_{i}} ({\hat{Θ}}_{T}) - \frac{\partial {\hat{f}}_{n}}{\partial x_{i}} (Θ) = \sum_{j = 1}^{d} \frac{\partial^{2} {\hat{f}}_{T}}{\partial x_{i} \partial x_{j}} (ξ_{T} (i)) ({\hat{θ}}_{T, j} - θ_{j}),

where

| ξ_{T; j} (i) - θ_{j} | \leq | {\hat{θ}}_{T, j} - θ_{j} |

for all

j = 1, \dots, d

. Next, define the

d \times d

matrix

H_{T} = {(H_{T, i, j})}_{1 \leq i, j \leq d}

as

H_{T, i, j} = \frac{\partial^{2} {\hat{f}}_{T}}{\partial x_{i} \partial x_{j}} (ξ_{T} (i)) .

Substituting this into (12), we obtain

H_{T} ({\hat{Θ}}_{T} - Θ) = - \nabla {\hat{f}}_{T} (Θ) .

(13)

Applying the first result of Theorem 2, we establish that

lim_{T \to \infty} sup_{x \in R^{d}} |\frac{\partial^{2} {\hat{f}}_{T}}{\partial x_{i} \partial x_{j}} (x) - E (\frac{\partial^{2} {\hat{f}}_{T}}{\partial x_{i} \partial x_{j}} (x))| = 0, a . s .

Moreover, standard results ensure that

E (\frac{\partial^{2} {\hat{f}}_{T}}{\partial x_{i} \partial x_{j}} (x))

converges uniformly to

\frac{\partial^{2} f}{\partial x_{i} \partial x_{j}} (x)

in a neighborhood of

Θ

. Consequently, it follows that

lim_{T \to \infty} {\hat{Θ}}_{T} = Θ, a . s .

which further implies

lim_{T \to \infty} H_{T} = D^{2} f (Θ) .

From Equation (13), the convergence rate of

{\hat{Θ}}_{T} - Θ

is determined using the behavior of

D^{2} f (Θ) \nabla f_{T} (Θ)

. Under appropriate regularity conditions and leveraging Theorem 2, we obtain

\begin{matrix} |{\hat{Θ}}_{T} - Θ| = O ({(\frac{log T}{T h_{T}^{d + 2 | β |}})}^{1 / 2}) + O (h_{T}^{s - \frac{d}{p}}) . \end{matrix}

(14)

For related results on modal regression, see [41].

Remark 6.

It is well established that the performance of kernel estimators deteriorates as the dimensionality of the data increases. This issue, commonly referred to as the curse of dimensionality [57], arises because, in high-dimensional spaces, a substantial number of observations are required within local regions to ensure reliable estimation. However, unless the sample size is exceptionally large, the bandwidth must be set so broadly that the fundamental principle of “local” averaging is undermined. A comprehensive discussion of these challenges, along with numerical demonstrations, is available in [58], and further insights can be found in more recent studies [59,60]. Although penalized splines remain widely used, their asymptotic properties remain insufficiently understood, even in conventional nonparametric frameworks. Theoretical studies examining their long-term behavior are relatively scarce. Additionally, it is important to recognize that many functional regression methodologies rely on minimizing the

L_{2}

norm, making them particularly vulnerable to the influence of outliers. Alternative approaches, which may provide more robust solutions, include methods based on delta sequences and wavelet techniques [61].

Remark 7.

Let us recall some integral functions of the density function

T_{1} (F) = \int_{R} {(f^{'} (x))}^{2} d x, T_{2} (F) = - \int_{R} a (F (x)) f^{'} (x) d x a n d T_{3} (F) = \int_{R} {(f (x))}^{2} d x .

Notice that the functional

T_{3} (F)

is a special case of

T_{2} (F)

. The functionals

T_{1} (F)

and

T_{3} (F)

appear in plug-in data-driven bandwidth selection procedures in density estimation (refer to [63] and the references therein), and the functional

T_{2} (F)

arises as part of the variance in nonparametric location and regression estimation based on linear rank statistics (refer especially to [64]). Consider the following general class of the integral functionals of the density:

T (F) = \int_{R} φ (x, F (x), F^{(1)} (x), \dots, F^{(r)} (x)) d F (x),

where F is a cumulative distribution function on

R

with

r \geq 1

derivatives,

F^{(m)}

; refer to [65,66] for more details. One can estimate

T (F)

through plug-in methods making use of the wavelet estimate of the density function and its derivatives. The proof of such a statement, however, should require a different methodology than that used in the present paper, and we leave this problem open for future research.

Remark 8.

The nonlinear thresholding method provides alternative estimators to

f_{X} (x)

. These estimators are constructed as follows:

{\hat{f}}_{X} (x) = \sum_{k \in Z^{d}} {\hat{a}}_{j_{0}, k} ϕ_{j_{0}, k} (x) + \sum_{j = j_{0}}^{τ} \sum_{i = 1}^{N} \sum_{k \in Z^{d}} {\hat{b}}_{i j k} 1 \{|{\hat{b}}_{i j k}| > δ_{j, T}\} ψ_{i, j, k} (x),

(15)

where

δ_{j, T}

represents a carefully chosen threshold. In the case of a univariate setting (

d = 1

), the estimator

{\hat{f}}_{X}

was originally introduced by [49]. Further investigation into the effectiveness and adaptability of these estimators in higher-dimensional settings would be a promising avenue for future research.

Remark 9.

The convergence rates in the sup-norm established in the theorems are comparable to those obtained in [23,67,68,69], where optimal estimation results were derived. Notably, the precise logarithmic rates of convergence are influenced by the resolution level, which in turn depends on the smoothness parameter s of the function

f (\cdot)

defined within the Besov space

B_{s, p, q}

. This characteristic is a well-known feature of nonparametric estimation techniques and is consistent with most references in the literature. The smoothness assumptions considered in this work extend beyond the standard integer-order differentiability conditions typically required for convolution kernel methods, even though the exact value of s is generally unknown.

To ensure the estimator is applicable in practice, various techniques have been introduced for selecting the optimal adaptive value of

m (T)

. Among the most widely used methods are “Stein’s method”, the “rule of thumb”, and “cross-validation”. For a comprehensive discussion on these approaches and their role in deriving asymptotically optimal empirical bandwidth selection rules, we refer to [24,70].

Remark 10.

To provide insight into the validity of our assumptions as outlined in [71], we present several examples that illustrate how these conditions hold in different settings:

1.: Long memory discrete time processes: Consider a white noise sequence, ${(ϵ_{t})}_{t \in Z}$ , with the variance $σ^{2}$ , and let I and B denote the identity and backshift operators, respectively. As established in [72] (Theorem 1, p. 55), the k-factor Gegenbauer process is given by

$\prod_{i \leq i \leq k} {(I - 2 v_{i} B + B^{2})}^{d_{i}} X_{t} = ϵ_{t},$

where the parameters satisfy $0 < d_{i} < 1 / 2$ for $| ν_{i} | < 1$ and $0 < d_{i} < 1 / 4$ for $| ν_{i} | = 1$ , with $i = 1, \dots, k$ . This process is characterized by stationarity, causality, invertibility, and long memory behavior. Additionally, it admits a moving average representation, $X_{t} = \sum_{j \geq 0} ψ_{j} (d, v) ϵ_{t - j}$ , where the condition $\sum_{j = 0}^{\infty} ψ_{j}^{2} (d, v) < \infty$ ensures its asymptotic stability.
However, [73] demonstrated that when ${(ϵ_{t})}_{t \in Z}$ follows a Gaussian distribution, the process is not strongly mixing. Despite this, the moving average formulation guarantees that the process remains stationary, Gaussian, and ergodic. This example underscores the nuanced role of mixing conditions and highlights the significance of the moving average representation in understanding long-term dependencies.
2.: The stationary solution of a linear Markov $AR (1)$ process: Consider the autoregressive process defined by $X_{i} = \frac{1}{2} X_{i - 1} + ϵ_{i}$ , where $(ϵ_{i})$ are independent symmetric Bernoulli variables taking the values $- 1$ and 1. As established in [35], this process does not satisfy the α-mixing property due to its intrinsic dependency structure. Nonetheless, it retains key statistical properties such as stationarity, Markovianity, and ergodicity. This example illustrates that a process can be Markovian and ergodic without necessarily being strongly mixing, which has important implications for statistical inference in time series and functional data analysis.
3.: A stationary process with an $AR (1)$ representation: Consider an independent and identically distributed (i.i.d.) sequence, $(u_{i})$ , uniformly distributed over ${1, \dots, 9}$ , and define the process as

$X_{t} : = \sum_{i = 0}^{\infty} 10^{- i - 1} u_{t - i},$

where $u_{t}, u_{t - 1}, \dots$ represent the decimal expansion of $X_{t}$ . This process satisfies stationarity and can be expressed in an $AR (1)$ form:

$X_{t} = \frac{1}{10} X_{t - 1} + \frac{1}{10} u_{t} = \frac{1}{10} X_{t - 1} + \frac{1}{2} + ϵ_{t},$

where $ϵ_{t} = \frac{1}{10} u_{t} - \frac{1}{2}$ constitutes a strong white noise process. Although this process does not satisfy the α-mixing property [74] (Example A.3, p. 349), it remains ergodic. This example highlights that even in the absence of strong mixing, ergodicity can be preserved, making the process applicable in areas such as nonparametric functional data analysis.

Remark 11.

In many continuous time settings, observations are obtained through a sampling design. The literature contains several approaches to discretization, including both deterministic and randomized schemes; see, for instance, [75,76,77,78,79]. To illustrate this, let us consider the estimation of the density function

f (\cdot)

from the continuous time sample

{X_{t} : t \in [0, T]}

by means of discrete observations,

{X (t_{k}) : k = 1, \dots, n}

. A natural kernel-based density estimator is therefore

f_{n} (x) = \frac{1}{n h_{n}} \sum_{j = 1}^{n} K (\frac{x - X_{t_{j}}}{h_{n}^{1 / p}}) .

Following [75], we focus on two types of sampling designs: deterministic sampling and random sampling.

Deterministic sampling.: Suppose the observation times ${(t_{k})}_{1 \leq k \leq n}$ are irregularly spaced but deterministic and satisfy

$inf_{1 \leq k \leq n} |t_{k + 1} - t_{k}| = \frac{1}{τ}$

for some constant $τ > 0$ . For each $1 \leq k \leq n$ , let $G_{k} = σ (X (t_{k}))$ denote the σ-field generated by all observations, ${X_{s} : 0 \leq s \leq t_{k}}$ . Evidently, the sequence ${(G_{k})}_{k = 1}^{n}$ forms an increasing family of σ-fields.
Random sampling.: Assume the sampling times ${(t_{k})}_{1 \leq k \leq n}$ are independent random variables uniformly distributed on $[0, T]$ , independent of the process ${X_{t} : t \in [0, T]}$ . Denote by

$0 \leq τ_{1} < τ_{2} < \dots < τ_{n} \leq T$

the corresponding order statistics, which serve as the actual observation points. Since these times are strictly increasing, their spacings are strictly positive. Defining $G_{k} = σ (X (t_{k}))$ as the σ-field generated by ${X_{s} : 0 \leq s \leq τ_{k}}$ again yields an increasing sequence, ${(G_{k})}_{k = 1}^{n}$ .

We also note that a penalization method for selecting the mesh δ of observations achieves the optimal rate of convergence. However, we leave a detailed analysis of this approach in the context of ergodic processes for future research.

Remark 12.

Several studies, including [49,80,81], have demonstrated that nonlinear wavelet-based thresholding methods often outperform linear approaches when evaluated using the mean integrated error. However, Theorem 3 and its corollary establish that, for any fixed function, g, belonging to the Besov space

B_{s, p, q}

, linear wavelet estimators already achieve the classical optimal convergence rates under the almost-sure sup-norm criterion in the standard i.i.d. framework. This indicates that, in terms of convergence rates, the nonlinear thresholded estimators in (15) do not offer any improvement over their linear counterparts in (2), when

β = 0

and when the almost-sure sup-norm is used as the evaluation metric.

This finding naturally raises the question of why the sup-norm criterion may be preferable to the mean integrated error. A key reason lies in the local behavior of functions within the broad Besov space

B_{s, p, q}

, many of which exhibit pronounced local oscillations. Such fluctuations can lead to significant estimation errors in small regions of

R^{d}

, which may have little impact on integrated error measures but are readily captured by the sup-norm. This issue is particularly relevant for discontinuous functions, where large estimation errors near discontinuities may remain undetected when using the mean integrated error. Empirical evidence from [80,81] supports this argument, highlighting how wavelet-based estimators can exhibit substantial pointwise bias and variance fluctuations that are largely obscured in the mean integrated squared error.

Furthermore, in most practical applications, only a single realization of the underlying process is available. The almost-sure sup-norm criterion provides insights into estimator performance for nearly every realization, in contrast to error metrics such as the mean square error or mean integrated squared error, which reflect only the average behavior over multiple realizations. Therefore, we argue that the almost-sure sup-norm serves as a more appropriate benchmark for density and regression estimation in

B_{s, p, q}

. Further discussion on this perspective can be found in [68].

6. Concluding Remarks

This research investigated the estimation of the partial derivatives of multivariate density functions. In particular, we introduced a class of nonparametric estimators based on linear wavelet methods and examined their theoretical properties. We established strong uniform consistency over compact subsets of

R^{d}

and determined the corresponding rates of convergence. Additionally, we proved the asymptotic normality of the proposed estimators. A significant contribution of this study is the extension of existing results to the setting of ergodic processes.

One of the main open questions in this work concerns the optimal selection of smoothing parameters to minimize the mean squared error (MSE) of the estimators. This remains a crucial problem that warrants further investigation and will be the subject of future research. Another important avenue for exploration involves extending the proposed methodology to functional ergodic data, which presents considerable mathematical challenges beyond the scope of this study. Additionally, a promising research direction is the adaptation of this framework to scenarios with incomplete data, including cases where data are missing at random or subject to censoring under various mechanisms, particularly in the context of spatially dependent data.

An interesting extension of this work would be to relax the assumption of stationarity and consider locally stationary processes, developing comparable theoretical results. However, this generalization would require a fundamentally different mathematical approach, which is left for future study. Another potential research direction involves extending the estimation problem to the framework of stationary continuous time processes, which could provide further insights.

Each of these extensions involves distinct mathematical techniques beyond those employed in the present study and remains an open topic for future exploration. To enhance the practical applicability of the proposed methods, conducting extensive numerical experiments on both simulated and real datasets would be highly valuable, as it would provide empirical support and facilitate more concrete recommendations.

In conclusion, carrying out extensive simulation experiments and analyzing real datasets would greatly enhance the practical value of this study. Such empirical evaluations can provide clear guidelines for effectively implementing these techniques, thereby demonstrating their potential to advance modern statistical analysis.

7. Proofs

We derived an upper bound for the partial sums of unbounded martingale differences, which is vital for establishing the asymptotic behavior of the density estimator constructed from strictly stationary and ergodic samples. Throughout the paper, the symbol C designates a positive constant whose value may vary from one occurrence to another.

Proof of Lemma 1.

The study of the bias term is purely analytical and can be proved by the same arguments as in [27] since it is not affected by the dependence property. Therefore, the details are omitted. □

The relevant upper bound inequality is formulated in the lemmas that follow.

Lemma 2 (Burkholder–Rosenthal inequality).

Following Notation 1 in [82], let

{(X_{i})}_{i \geq 1}

be a stationary martingale adapted to the filtration

{(F_{i})}_{i \geq 1}

and define

{(d_{i})}_{i \geq 1}

as the sequence of martingale differences adapted to

{(F_{i})}_{i \geq 1}

and

S_{n} = \sum_{i = 1}^{n} d_{i}

; then, for any positive integer, n,

∥ max_{1 \leq j \leq n} | S_{j} {|) ∥}_{p} ≪ n^{1 / p} {∥ d_{1} ∥}_{p} + {∥\sum_{k = 1}^{n} E (d_{k}^{2} / F_{k - 1})∥}_{p / 2}^{1 / 2}, f o r a n y p \geq 2;

(16)

where as usual the norm is

{∥ \cdot ∥}_{p} = {(E [| \cdot |^{p}])}^{1 / p}

.

Lemma 3.

Let

{(Z_{n})}_{n \geq 1}

be a sequence of real martingale differences with respect to the sequence of σ-fields

{(F_{n} = σ (Z_{1}, \dots, Z_{n}))}_{n \geq 1}

, where is the σ-field generated by the random variables

Z_{1}, \dots, Z_{n}

. Set

S_{n} = \sum_{i = 1}^{n} Z_{i} .

For any

p \geq 2

and any

n \geq 1

, assume that there exist some nonnegative constants, C and

d_{n}

, such that

E [Z_{n}^{p} | F_{n - 1}] \leq C^{p - 1} p! d_{n}^{2}, a l m o s t s u r e l y .

Then, for any

ϵ > 0

, we have

P (| S_{n} | > ϵ) \leq 2 exp \{- \frac{ϵ^{2}}{2 (D_{n} + C ϵ)}\},

where

D_{n} = \sum_{i = 1}^{n} d_{i}^{2} .

Proof of Lemma 3.

The proof follows from a particular case of Theorem 8.2.2 due to [83]. □

In order to establish Theorem 1, we draw upon the following two lemmas.

Lemma 4.

Let

k \in Z^{d}

. Then, under the assumption (C.1), we have

E [{\hat{a}}_{m, k}] = a_{m, k}, a s T \to \infty .

(17)

Proof of Lemma 4.

Consider first the decomposition

E [{\hat{a}}_{m, k}] - a_{m, k} = (E [{\hat{a}}_{m, k}] - {\tilde{a}}_{m, k}) + ({\tilde{a}}_{m, k} - a_{m, k}) = A_{m, k, 1} + A_{m, k, 2},

(18)

where

{\tilde{a}}_{m, k} = \frac{{(- 1)}^{| β |}}{T} \int_{0}^{T} E [(\partial^{β} ϕ_{m, k}) (X_{i}) | F_{t - δ}] d t .

(19)

By invoking assumption (C.1), we can further expand

{\tilde{a}}_{m, k}

as follows:

\begin{matrix} {\tilde{a}}_{m, k} & = \frac{{(- 1)}^{| β |}}{T} \int_{0}^{T} \int_{R^{d}} (\partial^{β} ϕ_{m, k}) (x) f^{F_{t - δ}} (x) d x d t \\ = {(- 1)}^{| β |} \int_{R^{d}} (\partial^{β} ϕ_{m, k}) (x) (\frac{1}{T} \int_{0}^{T} f^{F_{t - δ}} (x) d t) d x \\ = {(- 1)}^{| β |} \int_{R^{d}} (\partial^{β} ϕ_{m, k}) (x) (f (x) + o (1)) d x \\ = {(- 1)}^{| β |} \int_{R^{d}} (\partial^{β} ϕ_{m, k}) (x) f (x) d x + o (1) \\ = a_{m, k} + o (1) . \end{matrix}

(20)

Consequently,

{\tilde{a}}_{m, k} = a_{m, k} as n \to \infty,

(21)

which implies

A_{m, k, 2} = o (1) a . s .

(22)

Hence, we deduce that

E [{\hat{a}}_{m, k}] - a_{m, k} = A_{m, k, 1} + o (1) a . s .

A_{m, k, 1}

remains to be analyzed. Notice that by applying assumption (C.1), we have

\begin{matrix} A_{m, k, 1} & = & E [{\hat{a}}_{m, k}] - {\tilde{a}}_{m, k} \\ = & \frac{{(- 1)}^{| β |}}{T} \int_{0}^{T} (E [(\partial^{β} ϕ_{m, k}) (X_{t})] - E [(\partial^{β} ϕ_{m, k}) (X_{t}) | F_{t - δ}] d t) \\ = & \frac{{(- 1)}^{| β |}}{T} \int_{0}^{T} \int_{R^{d}} (\partial^{β} ϕ_{m, k}) (x) (f (x) - f^{F_{t - δ}} (x)) d x d t \\ = & {(- 1)}^{| β |} \int_{R^{d}} (\partial^{β} ϕ_{m, k}) (x) (f (x) - (\frac{1}{T} \int_{0}^{T} f^{F_{t - δ}} (x) d t)) d x \\ = & o (1) ({(- 1)}^{| β |} \int_{R^{d}} (\partial^{β} ϕ_{m, k}) (x) d x) \\ = & o (1), \end{matrix}

(23)

since

(\partial^{β} ϕ_{m, k})

is compactly supported, which implies

A_{m, k, 1} = o (1) a . s .

(24)

The proof of (17) is achieved. □

Lemma 5.

Let

k \in Z^{d}

; then, under the assumptions (C.1) and (C.3), we have

E [{({\hat{a}}_{m, k} - a_{m, k})}^{2}] = O (\frac{2^{2 m (| β | - d)}}{n}), a s T \to \infty .

(25)

Proof of Lemma 5.

Consider the following decomposition:

{\hat{a}}_{m, k} - a_{m, k} = ({\hat{a}}_{m, k} - {\tilde{a}}_{m, k}) + ({\tilde{a}}_{m, k} - a_{m, k}) .

(26)

From this, it follows that

\begin{matrix} E [{({\hat{a}}_{m, k} - a_{m, k})}^{2}] & = & E [{({\hat{a}}_{m, k} - {\tilde{a}}_{m, k})}^{2}] + E [{({\tilde{a}}_{m, k} - a_{m, k})}^{2}] \\ + 2 E [({\hat{a}}_{m, k} - {\tilde{a}}_{m, k}) ({\tilde{a}}_{m, k} - a_{m, k})] \\ = & A_{m, k, 1} + A_{m, k, 2} + A_{m, k, 3} . \end{matrix}

(27)

By applying statement (22), one obtains

A_{m, k, 2} = o (1) .

(28)

Moreover, using statements (22) and (24), we deduce

\begin{matrix} A_{m, k, 3} & = o (1) E [{\hat{a}}_{m, k} - {\tilde{a}}_{m, k}] \\ = o (1) \frac{{(- 1)}^{| β |}}{T} \int_{0}^{T} (E [(\partial^{β} ϕ_{m, k}) (X_{t})] - E [E [(\partial^{β} ϕ_{m, k}) (X_{t}) | F_{t - δ}]]) d t \\ = 0 . \end{matrix}

(29)

Consequently, our attention now shifts to the first term in the decomposition (27). Notice that

\begin{matrix} {\hat{a}}_{m, k} - {\tilde{a}}_{m, k} & = & \frac{{(- 1)}^{| β |}}{T} \int_{0}^{T} ((\partial^{β} ϕ_{m, k}) (X_{t}) - E [(\partial^{β} ϕ_{m, k}) (X_{t}) | F_{t - δ}]) \\ = & \frac{{(- 1)}^{| β |}}{T} \sum_{i = 1}^{n} Φ_{m, k, i}, \end{matrix}

where

Φ_{m, k, i} = \int_{T_{i - 1}}^{T_{i}} ((\partial^{β} ϕ_{m, k}) (X_{t}) - E [(\partial^{β} ϕ_{m, k}) (X_{t}) | F_{t - δ}]) .

Notice that

{(Φ_{m, k, i})}_{0 \leq k \leq m_{n}}

is a sequence of martingale differences with respect to the sequence of

σ

-fields,

{(F_{i - 1})}_{0 \leq i \leq m_{n}}

. Using the inequality provided by Lemma 2, we immediately deduce

A_{m, k, 1} = \frac{{(- 1)}^{2 | β |}}{T^{2}} E [{|\sum_{i = 1}^{n} Φ_{m, k, i}|}^{2}] .

Furthermore, one can establish

\begin{matrix} {(E [{|\sum_{i = 1}^{n} Φ_{m, k, i}|}^{2}])}^{\frac{1}{2}} & \leq & \sqrt{n} {∥ Φ_{m, k, 1} ∥}_{2} + {∥\sum_{i = 1}^{n} E [Φ_{m, k, i}^{2} | F_{i - 2}]∥}_{1}^{\frac{1}{2}} \\ = & Φ_{(1)} + Φ_{(2)} . \end{matrix}

(30)

To analyze these terms, we use a standard decomposition with Jensen’s inequality and note that

F_{0}

is the trivial

σ

-field. Hence, one obtains

\begin{matrix} \frac{1}{n} Φ_{(1)}^{2} & = & ∥ Φ_{m, k, 1} ∥_{2}^{2} \\ = & E [{|\int_{0}^{δ} ((\partial^{β} ϕ_{m, k}) (X_{t}) - E [(\partial^{β} ϕ_{m, k}) (X_{t}) | F_{0}]) d t|}^{2}] \\ \leq & \int_{0}^{δ} \sum_{j = 0}^{2} (\binom{2}{j}) E [{|(\partial^{β} ϕ_{m, k}) (X_{t})|}^{j} {(E [|(\partial^{β} ϕ_{m, k}) (X_{t})|])}^{2 - j}] d t \\ = & \int_{0}^{δ} \sum_{j = 0}^{2} (\binom{2}{j}) E [{|(\partial^{β} ϕ_{m, k}) (X_{t})|}^{j}] {(E [|(\partial^{β} ϕ_{m, k}) (X_{t})|])}^{2 - j} d t \\ \leq & 4 \int_{0}^{δ} E [{|(\partial^{β} ϕ_{m, k}) (X_{t})|}^{2}] d t . \end{matrix}

(31)

Note additionally that

\begin{matrix} \int_{0}^{δ} E [{|(\partial^{β} ϕ_{m, k}) (X_{t})|}^{2}] d t & = & δ \int_{R^{d}} {(\partial^{β} ϕ_{m, k})}^{2} (u) f (u) d u \\ = & 2^{m (d + 2 | β |)} \int_{R^{d}} {(\partial^{β} ϕ)}^{2} (2^{m} u - k) f (u) d u \\ = & 2^{2 m | β |} \int_{R^{d}} {(\partial^{β} ϕ)}^{2} (v) f (\frac{v + k}{2^{m}}) d v . \end{matrix}

By employing a first-order Taylor expansion alongside Equation (1) and assumption (C.3), we obtain

\begin{matrix} \int_{0}^{δ} E [{|(\partial^{β} ϕ_{m, k}) (X_{t})|}^{2}] d t & = & δ 2^{2 m | β |} \int_{R^{d}} {(\partial^{β} ϕ)}^{2} (v) (f (v) + O (2^{- m d})) d v \\ = & O (2^{m (| β |)}) . \end{matrix}

(32)

This leads to

\begin{matrix} Φ_{1}^{2} & = & 4 (δ n) 2^{2 m | β |} \int_{R^{d}} {(\partial^{β} ϕ)}^{2} (v) (f (v) + O (2^{- m d})) d v \\ Φ_{1} & = & O (2^{m (| β |)} T^{1 / 2}) . \end{matrix}

(33)

Next, we analyze the second component of the decomposition in Equation (30). Specifically, we consider

\begin{matrix} Φ_{2} & = & {(E (\sum_{i = 1}^{n} E [Φ_{m, k, i}^{2} | F_{i - 2}]))}^{1 / 2} = {(\sum_{i = 1}^{n} E [Φ_{m, k, i}^{2}])}^{1 / 2} . \end{matrix}

Applying a Jensen inequality with a standard identity, we find

\begin{matrix} E [Φ_{m, k, i}^{2}] & = & E [{(\int_{T_{i - 1}}^{T_{i}} ((\partial^{β} ϕ_{m, k}) (X_{t}) - E [(\partial^{β} ϕ_{m, k}) (X_{t}) | F_{t - δ}]) d t)}^{2}] \\ \leq & 2 \int_{T_{i - 1}}^{T_{i}} (E [∣ (\partial^{β} ϕ_{m, k}) (X_{t}) ∣^{2}] d t + E [| E [(\partial^{β} ϕ_{m, k}) (X_{t}) F_{t - δ}] |^{2}]) d t \\ \leq & 4 \int_{T_{i - 1}}^{T_{i}} E [∣ (\partial^{β} ϕ_{m, k}) (X_{t}) ∣^{2}] d t . \end{matrix}

Using Equation (32), it follows that

\begin{matrix} Φ_{2} & = & O (2^{m (| β |)} T^{1 / 2}) . \end{matrix}

(34)

Combining the results from Equations (33) and (34), we derive

\begin{matrix} {(E [{|\sum_{i = 1}^{n} Φ_{m, k} (X_{i})|}^{2}])}^{\frac{1}{2}} = O (2^{m (| β |)} T^{1 / 2}) . \end{matrix}

Consequently,

\begin{matrix} A_{m, k, 1} & = & E [{({\hat{a}}_{m, k} - {\tilde{a}}_{m, k})}^{2}] \\ = & \frac{{(- 1)}^{2 | β |}}{T^{2}} E [{|\sum_{i = 1}^{n} Φ_{m, k} (X_{i})|}^{2}] \\ = & O (2^{2 m (| β |)} T^{- 1}) . \end{matrix}

(35)

Finally, by integrating the findings from Equations (28), (29), and (35), we conclude that

E [{({\hat{a}}_{m, k} - a_{m, k})}^{2}] = O (\frac{2^{2 m (| β |)}}{T}) .

This completes the proof. □

Proof of Theorem 1.

The fundamental argument is constructed using established techniques; for a comprehensive exploration, please refer to [24]. Specifically, based on the projected normality’s definition, we define

E {∥{\hat{(\partial^{β} f)}}_{T} - \partial^{β} f∥}_{2}^{2} = E {∥\sum_{k \in Z^{d}} ({\hat{a}}_{m, k} - a_{m, k}) ϕ_{m, k}∥}_{2}^{2} = \sum_{k \in Z^{d}} E [{({\hat{a}}_{m, k} - a_{m, k})}^{2}] .

(36)

According to Lemma 5 and the fact that

∣ V_{ℓ} ∣ \sim 2^{m}

and

2^{m} \sim n^{\frac{1}{2 s + 1}}

and

T \sim δ n

,

E {∥{\hat{(\partial^{β} f)}}_{T} - \partial^{β} f∥}_{2}^{2} = (n^{\frac{1}{2 s + 1}}) O (\frac{2^{2 m (| β |)}}{T}) = O (T^{- \frac{2 (s - | β |)}{2 s + 1}}) .

(37)

□

Proof of Theorem 2.

We define

L (T) = {(\frac{2^{(d + 2) m (T)} T}{log T})}^{d / 2} .

Since the domain D is compact, it can be covered by a finite number,

L = L (T)

, of cubes,

I_{j} = I_{T, j}

, with centers,

x_{j} = x_{T, j}

, and side lengths,

ℓ_{T}

, for

j = 1, \dots, L (T)

. It is straightforward to see that

ℓ_{n} = \frac{const}{L^{1 / d} (T)} .

Furthermore, we define the term

{\bar{f}}_{T} (x) = \frac{{(- 1)}^{| β |}}{T h_{T}^{d + | β |}} \int_{0}^{T} E [K^{(β)} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}}) ∣ F_{t - δ}] d t .

Now, we consider the following decomposition:

\begin{matrix} sup_{x \in D} |{\hat{(\partial^{β} f)}}_{T} (x) - E [{\hat{(\partial^{β} f)}}_{T} (x)]| & \leq & sup_{x \in D} |{\hat{(\partial^{β} f)}}_{T} (x) - {\bar{f}}_{T} (x)| \\ + sup_{x \in D} |{\bar{f}}_{T} (x) - E [{\hat{(\partial^{β} f)}}_{T} (x)]| \\ = & G_{T, 1} (x) + G_{T, 2} (x) . \end{matrix}

(38)

Making use of the fact that D is compact, we readily infer that

\begin{matrix} sup_{x \in D} |{\hat{(\partial^{β} f)}}_{T} (x) - {\bar{f}}_{T} (x)| & = & max_{1 \leq j \leq L (T)} sup_{x \in D \cap I_{j}} |{\hat{(\partial^{β} f)}}_{T} (x) - {\bar{f}}_{T} (x)| \\ \leq & max_{1 \leq j \leq L (T)} sup_{x \in D \cap I_{j}} |{\hat{(\partial^{β} f)}}_{T} (x) - {\hat{(\partial^{β} f)}}_{T} (x_{j})| \\ + max_{1 \leq j \leq L (n)} |{\hat{(\partial^{β} f)}}_{T} (x_{j}) - {\bar{f}}_{T} (x_{j})| \\ + max_{1 \leq j \leq L (T)} sup_{x \in D \cap I_{j}} |{\bar{f}}_{T} (x_{j}) - {\bar{f}}_{T} (x)| \\ = & Q_{1} + Q_{2} + Q_{3} . \end{matrix}

(39)

Statement (8) allows us to infer that

| {\hat{(\partial^{β} f)}}_{T} (x) - {\hat{(\partial^{β} f)}}_{T} (x_{j}) | \leq \frac{d^{1 / 2} C_{2}}{h_{T}^{d + | β | + 1}} ∥ x - x_{j} ∥,

which implies that

Q_{1} \leq \frac{d^{1 / 2} C_{2} ℓ_{T}}{h_{T}^{d + | β | + 1}} = \frac{c o n s t}{L^{1 / d} (T) h_{T}^{d + | β | + 1}} = O ({(\frac{log T}{T h_{T}^{d + 2 | β |}})}^{1 / 2}), a . s .

(40)

A similar argument shows, likewise, that

Q_{3} \leq \frac{d^{1 / 2} C_{2} ℓ_{T}}{h_{T}^{d + | β | + 1}} = \frac{c o n s t}{L^{1 / d} (T) h_{T}^{d + | β | + 1}} = O ({(\frac{log T}{T h_{T}^{d + 2 | β |}})}^{1 / 2}), a . s .

(41)

Now, we turn our attention to the main term and show that

Q_{2} = O ({(\frac{log T}{T h_{T}^{d + 2 | β |}})}^{1 / 2}), a . s .

(42)

Observe that

\begin{matrix} Q_{2} & = & max_{1 \leq j \leq L (n)} |{\hat{(\partial^{β} f)}}_{T} (x_{j}) - {\bar{f}}_{T} (x_{j})| \\ = & max_{1 \leq j \leq L (T)} |\frac{{(- 1)}^{| β |}}{T h_{T}^{d + | β |}} \sum_{i = 1}^{n} \int_{T_{i - 1}}^{T_{i}} (K^{(β)} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) - E [K^{(β)} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{t - δ}]) d t| \\ = & max_{1 \leq j \leq L (T)} |\frac{{(- 1)}^{| β |}}{T h_{T}^{d + | β |}} \sum_{i = 1}^{n} Z_{n, i} (x_{j})|, \end{matrix}

where, for

i = 1, \dots, n

,

Z_{n, i} (x_{j}) = {(- 1)}^{| β |} \int_{T_{i - 1}}^{T_{i}} (K^{(β)} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) - E [K^{(β)} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{t - δ}]) d t,

is a sequence of martingale difference arrays with respect to the

σ

-field

F_{i - 1}

. Observe that by applying the Jensen then Minkowski inequalities, we obtain

\begin{matrix} |E [{(Z_{n, i} (x_{j}))}^{p} | F_{i - 2}]| \\ \leq \int_{T_{i - 1}}^{T_{i}} E [{|K^{(β)} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) - E [K^{(β)} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{t - δ}]|}^{p} | F_{i - 2}] d t \\ = \int_{T_{i - 1}}^{T_{i}} (E^{1 / p} [{(K^{(β)})}^{p} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{i - 2}] \\ + E^{1 / p} [{(E [K^{(β)} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{t - δ}])}^{p} | F_{i - 2}]) d t . \end{matrix}

(43)

By using the fact that

{(E [K^{(β)} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{t - δ}])}^{p - k}

is

F_{i - 2}

-measurable since all

δ \in [T_{i - 1}, T_{i})

, we have

F_{i - 2} \subset F_{t - δ},

It follows from (43) that

\begin{matrix} |E [{(Z_{n, i} (x_{j}))}^{p} | F_{i - 2}]| & \leq & 2^{p} \int_{T_{i - 1}}^{T_{i}} E^{1 / p} [{(K^{(β)})}^{p} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{i - 2}] d t . \end{matrix}

It follows readily from (6), for any integer,

ζ

, that

\begin{matrix} E [{(K^{(β)})}^{p} (\frac{x_{j}}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{i - 2}] \\ = \int_{R^{d}} {(K^{(β)})}^{p} (\frac{x_{j}}{h_{T}}, \frac{y}{h_{T}}) f^{F_{i - 2}} (y) d y \\ \leq \int_{R^{d}} {(\frac{C_{d + | β |}}{(1 + h_{n}^{- 1} ∥ x_{j} - y {∥_{2})}^{d + | β |}})}^{p} f^{F_{i - 2}} (y) d y \\ = \int_{R^{d}} {(\frac{h_{T}^{d + | β |} C_{d + | β |}}{(h_{T} + ∥ x_{j} - y {∥)}^{d + | β |}})}^{p} f^{F_{i - 2}} (y) d y \\ \leq h_{T}^{p (d + | β |)} C_{d + | β |}^{p} \int_{R^{d}} f^{F_{i - 2}} (y) d y = h_{T}^{p (d + | β |)} C_{d + | β |}^{p}, \end{matrix}

(44)

which gives the following upper bound:

\begin{matrix} |E [{(Z_{n, i} (x_{j}))}^{p} | F_{i - 2}]| & \leq & 2^{p} δ h_{T}^{p (d + | β |)} C_{d + | β |}^{p} \leq p! C^{p - 1} d_{i}^{2} . \end{matrix}

We shall apply Lemma 3 to the sum of

{Z_{n} (x_{j}, X_{i})}

, with

C = 2 C_{d + | β |}, d_{i}^{2} = C_{d + | β |} δ h_{T}^{d + | β |}, D_{n} = \sum_{i = 1}^{n} d_{i}^{2} = O (T h_{T}^{d + | β |})

and

ϵ_{T} = ϵ_{0} {(log T / T h_{T}^{d + 2 | β |})}^{1 / 2} .

Therefore, we have the following chain of inequalities for some positive constant,

C_{2}

:

\begin{matrix} P (max_{1 \leq j \leq L (T)} |\frac{1}{T h_{T}^{d + | β |}} \sum_{i = 1}^{n} Z_{n, i} (x_{j})| > ϵ_{T}) \\ \leq \sum_{j = 1}^{L (T)} P (|\sum_{i = 1}^{n} Z_{n, i} (x_{j})| > ϵ_{T} T h_{T}^{d + | β |}) \\ \leq 2 L (T) exp \{- \frac{ϵ_{0}^{2} {(T h_{T}^{d + | β |})}^{2} (log T / T h_{T}^{d + 2 | β |})}{2 (D_{n} + 2 C_{d + 1} T h_{T}^{d + | β |} {(log T / T h_{T}^{d + 2 | β |})}^{1 / 2})}\} \\ \leq 2 {(\frac{T}{h_{T}^{d + 1} log T})}^{d / 2} exp \{- \frac{ϵ_{0}^{2} T h_{T}^{d + | β |} (log T / h_{T}^{| β |})}{O (T h_{T}^{d + | β |}) (1 + 2 C_{d + 1} {(log T / T h_{T}^{d + 2 | β |})}^{1 / 2})}\} \\ \leq 2 {(\frac{T}{h_{T}^{d + 1} log T})}^{d / 2} (T^{- \frac{C_{2} ϵ_{0}^{2}}{h_{T}^{| β |}}}) \\ = \frac{2 T^{d / 2 - \frac{C_{2} ϵ_{0}^{2}}{h_{T}^{| β |}}}}{{(h_{T}^{d + 1} log T)}^{d / 2}} \\ = \frac{2}{{((T h_{T}^{d + 1}) log T)}^{d / 2} T^{(C_{2} ϵ_{0}^{2} / h_{T}^{| β |}) - d}} . \end{matrix}

Observe that

\frac{1}{h_{T}^{| β |}} \to \infty as n \to \infty .

We choose a sufficiently large

ϵ_{0}

, such that

(C_{2} ϵ_{0}^{2} / h_{T}^{| β |}) - d > 0 .

We readily obtain that

\sum_{n \geq 1} P (max_{1 \leq j \leq L (T)} |\frac{1}{T h_{T}^{d + | β |}} \sum_{i = 1}^{n} Z_{n, i} (x_{j})| > ϵ_{T}) < \infty .

(45)

We obtain assertion (42) through a routine application of the Borel–Cantelli lemma. Therefore, the conclusion of Theorem 2 follows, using decomposition (39) in combination with (40)–(42). The second term on the right side of (38) remains to be evaluated. One can see that

\begin{matrix} sup_{x \in D} |{\bar{f}}_{T} (x) - E [{\hat{(\partial^{β} f)}}_{T} (x)]| \\ = sup_{x \in D} |\frac{{(- 1)}^{| β |}}{T h_{T}^{d + | β |}} \int_{0}^{T} (E [K^{β} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{t - δ}] - E [K^{β} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}})]) d t| \\ = sup_{x \in D} |\frac{{(- 1)}^{| β |}}{T h_{T}^{d + | β |}} \int_{0}^{T} \int_{R^{d}} K^{β} (\frac{x}{h_{T}}, \frac{y}{h_{T}}) (f^{F_{t - δ}} (y) - f (y)) d y d t| \\ = sup_{x \in D} |\frac{{(- 1)}^{| β |}}{h_{T}^{d + | β |}} \int_{R^{d}} K^{β} (\frac{x}{h_{T}}, \frac{y}{h_{T}}) (\frac{1}{T} \int_{0}^{T} f^{F_{t - δ}} (y) d t - f (y)) d y| . \end{matrix}

Making use of the Cauchy–Schwarz inequality and statement (6) when

m = d + 2 | β | + 1

, we readily obtain that

\begin{matrix} |\frac{{(- 1)}^{| β |}}{h_{T}^{d + | β |}} \int_{R^{d}} K^{β} (\frac{x}{h_{T}}, \frac{y}{h_{T}}) (\frac{1}{T} \int_{0}^{T} f^{F_{t - δ}} (y) d t - f (y)) d y| \\ \leq {(\int_{R^{d}} {|\frac{1}{h_{n}^{d + | β |}} K^{β} (\frac{x}{h_{n}}, \frac{y}{h_{n}})|}^{2} d y)}^{1 / 2} {(\int_{R^{d}} | \frac{1}{T} \int_{0}^{T} f^{F_{t - δ}} (y) d t - f (y) |^{2} d y)}^{1 / 2} \\ \leq ∥ \frac{1}{T} \int_{0}^{T} f^{F_{t - δ}} d t - f ∥_{L^{2}} {(\int_{R^{d}} (\frac{1}{h_{T}^{d + 2 | β |}} \frac{C_{d + 2 | β | + 1}}{(1 + h_{T}^{- 1} {∥ x - y ∥)}^{d + 2 | β | + 1}}) |\frac{1}{h_{T}^{d}} K^{β} (\frac{x}{h_{T}}, \frac{y}{h_{T}})| d y)}^{1 / 2} \\ \leq h_{T}^{1 / 2} C_{d + 2 | β | + 1} ∥ \frac{1}{T} \int_{0}^{T} f^{F_{t - δ}} d t - f ∥_{L^{2}} G_{1}^{1 / 2} (d) . \end{matrix}

Under assumption (C.2), we deduce that

\begin{matrix} G_{n, 2} (x) & = & sup_{x \in D} |{\bar{f}}_{n} (x) - E [{\hat{(\partial^{β} f)}}_{n} (x)]| \leq h_{T}^{1 / 2} C_{d + 2 | β | + 1} G_{1}^{1 / 2} (d) \\ = & O (h_{T}^{1 / 2}) . \end{matrix}

(46)

Hence, the proof is completed. □

Proof of Theorem 3.

Recall the decomposition

\begin{matrix} \sqrt{T h_{T}^{d + 2 | β |}} ({\hat{(\partial^{β} f)}}_{T} (x) - (\partial^{β} f) (x)) \\ = \sqrt{T h_{T}^{d + 2 | β |}} (({\hat{(\partial^{β} f)}}_{n} (x) - {\bar{f}}_{T} (x)) + ({\bar{f}}_{T} (x) - \hat{(\partial^{β} f)} (x))) \\ = \sqrt{T h_{T}^{d + 2 | β |}} (Q_{T} (x) + B_{T} (x)) . \end{matrix}

(47)

Observe that

\begin{matrix} B_{T} (x) & = & ({\bar{f}}_{T} (x) - E [{\hat{(\partial^{β} f)}}_{T} (x)]) + (E [{\hat{(\partial^{β} f)}}_{T} (x)] - \hat{(\partial^{β} f)} (x)) \\ = & B_{T, 1} (x) + B_{T, 2} (x) . \end{matrix}

(48)

Hence, using a similar argument as for (46) and under hypothesis (C.2) and statement (6) when

m = 2 (d + | β |)

and condition (10), we readily obtain that

\begin{matrix} {(T h_{T}^{d + 2 | β |})}^{1 / 2} B_{T, 1} (x) \\ = {(T h_{T}^{d + 2 | β |})}^{1 / 2} (\frac{{(- 1)}^{| β |}}{T h_{T}^{d + | β |}} \int_{0}^{T} (E [K^{β} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{t - δ}] - E [K^{β} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}})]) d t) \\ = O (h_{T}^{d / 2} {(T h_{T}^{d + 2 | β |})}^{1 / 2} = o (1) . \end{matrix}

(49)

On the other hand, according to condition (10) and Lemma 1, we have

{(T h_{T}^{d + 2 | β |})}^{1 / 2} B_{T, 2} (x) = O (h_{T}^{δ} {(T h_{T}^{d + 2 | β})}^{1 / 2}) = o (1) .

(50)

Observe that

\begin{matrix} \sqrt{T h_{T}^{d + 2 | β |}} Q_{T} (x) \\ = \sum_{i = 1}^{n} \{\int_{T_{i - 1}}^{T_{i}} (\frac{{(- 1)}^{| β |}}{\sqrt{T h_{T}^{d}}} K^{(β)} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}})) d t - \int_{T_{i - 1}}^{T_{i}} E [\frac{{(- 1)}^{| β |}}{\sqrt{T h_{T}^{d}}} K^{(β)} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{t - δ}] d t\} \\ = \sum_{i = 1}^{n} \{ξ_{n i} (x) - {\bar{ξ}}_{n i} (x)\} \\ = \sum_{i = 1}^{n} χ_{n i} (x), \end{matrix}

(51)

where

\begin{matrix} χ_{n i} (x) & = & ξ_{n i} (x) - {\bar{ξ}}_{n i} (x), \\ ξ_{n i} (x) & = & \frac{{(- 1)}^{| β |}}{\sqrt{T h_{T}^{d}}} \int_{T_{i - 1}}^{T_{i}} (K^{(β)} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}})) d t, \\ {\bar{ξ}}_{n i} (x) & = & \frac{{(- 1)}^{| β |}}{\sqrt{T h_{T}^{d}}} \int_{T_{i - 1}}^{T_{i}} E [K^{(β)} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}}) | F_{t - δ}] d t, \end{matrix}

where

χ_{n i} (x, φ)

is a triangular array of martingale differences with respect to the

σ

-field

F_{i - 1}

. This allows us to apply the central limit theorem for discrete time arrays of martingales (see [84]) to establish the asymptotic normality of

\sqrt{n h_{n}^{d + 2 | β |}} Q_{n} (x)

. This can be performed if we establish the following statements:

(a): Lyapunov condition:

$\sum_{i = 1}^{n} E [χ_{n i}^{2} (x) ∣ F_{i - 2}] \overset{P}{\to} Σ^{2} (x);$
(b): Lindberg condition:

$n E [χ_{n i}^{2} (x) 1_{\{|χ_{n i} (x)| > ϵ\}}] = o (1), holds true for any ϵ > 0 .$

Proof of part (a)

Observe that

|\sum_{i = 1}^{n} E [ξ_{n i}^{2} (x_{i}) ∣ F_{i - 2}] - \sum_{i = 1}^{n} E [χ_{n i}^{2} (x) ∣ F_{i - 2}]| = \sum_{i = 1}^{n} {(E [ξ_{n i} (x) ∣ F_{i - 2}])}^{2} .

By employing a first-order Taylor expansion alongside Equation (6) and assumption (C.4), we obtain

\begin{matrix} E [ξ_{n i} (x) ∣ F_{i - 2}] & = & \frac{{(- 1)}^{| β |}}{\sqrt{n h_{n}^{d}}} \int_{T_{i - 1}}^{T_{i}} E [(K^{(β)} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}})) | F_{i - 2}] d t \\ = & \frac{{(- 1)}^{| β |}}{\sqrt{n h_{n}^{d}}} \int_{T_{i - 1}}^{T_{i}} \int_{R^{d}} K^{β} (\frac{x}{h_{n}}, \frac{y}{h_{n}}) f_{t}^{F_{i - 2}} (y) d y d t \\ = & \frac{{(- 1)}^{| β |} \sqrt{h_{n}^{d}}}{\sqrt{n}} \int_{T_{i - 1}}^{T_{i}} \int_{R^{d}} K^{β} (\frac{x}{h_{n}}, \frac{x}{h_{n}} + u) f_{t}^{F_{i - 2}} (x + h_{n} u) d u d t \\ \leq & (\frac{{(- 1)}^{| β |} \sqrt{h_{n}^{d}}}{\sqrt{n}}) (\frac{C_{d + | β |}}{{(1 + ∥ u ∥}_{2})^{d + | β |}}) (\int_{T_{i - 1}}^{T_{i}} f_{t}^{F_{i - 2}} (x) d t + o (1)) . \end{matrix}

Let

g_{i - 1}^{F_{i - 2}} (x) = {(\int_{T_{i - 1}}^{T_{i}} f_{t}^{F_{i - 2}} (x) d t)}^{2} .

Making use of the Riemann sum, the quantity

g_{i - 1}^{F_{i - 2}} (x)

may be approached, whenever

δ

is small enough, by

δ f_{T_{i - 1}}^{F_{i - 2}}

. It is then clear from the discussion above that the process

{(g_{T_{i - 1}}^{F_{i - 2}})}_{i \geq 1}

is stationary and ergodic. So, the sum

\frac{1}{n} \sum_{i = 1}^{n} g_{i - 1}^{F_{i - 2}} (x)

has a finite limit, (see [85], Theorem 4.4), which is

E [f_{0}^{F_{- δ}} (x)] = {(\int_{0}^{δ} f (x) d t)}^{2} = δ^{2} f^{2} (x),

(52)

where

F_{- δ}

is the trivial

σ

-field. Therefore,

\begin{matrix} \sum_{i = 1}^{n} {(E [ξ_{n i} (x) ∣ F_{i - 2}])}^{2} \\ = (h_{n}^{d}) {(\frac{C_{d + | β |}}{{(1 + ∥ u ∥}_{2})^{d + | β |}})}^{2} (\frac{1}{n} \sum_{i = 1}^{n} {(\int_{T_{i - 1}}^{T_{i}} f_{t}^{F_{i - 2}} (x) d t)}^{2} + o (1)) = O (h_{n}^{d}) . \end{matrix}

Statement (a) then follows from

lim_{n \to \infty} \sum_{i = 1}^{n} E [ξ_{n i}^{2} (x) ∣ G_{i - 1}] \overset{P}{=} Σ_{φ}^{2} (x) .

Observe the fact that (see [42])

K^{(β)} (u, v) = K^{(β)} (u + k, v + k), f o r k \in Z^{d} .

According to assumptions (C.1) and (C.4), we have

\begin{matrix} \sum_{i = 1}^{n} E [ξ_{n i}^{2} (x) ∣ F_{i - 2}] \\ = \sum_{i = 1}^{n} E [{(\frac{{(- 1)}^{| β |}}{\sqrt{T h_{T}^{d}}} \int_{T_{i - 1}}^{T_{i}} K^{(β)} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}}) d t)}^{2} | F_{i - 1}] \\ \leq \frac{1}{T h_{T}^{d}} \sum_{i = 1}^{n} \int_{T_{i - 1}}^{T_{i}} \int_{R^{d}} {(K^{(β)})}^{2} (\frac{x}{h_{T}}, \frac{u}{h_{T}}) f_{t}^{F_{i - 2}} (u) d u d t \\ = \frac{1}{n δ} \sum_{i = 1}^{n} \int_{T_{i - 1}}^{T_{i}} \int_{R^{d}} {(K^{(β)})}^{2} (0, v) f_{t}^{F_{i - 2}} (x + h_{T} v) d v d t \\ = \frac{1}{δ} (\int_{R^{d}} {(K^{(β)})}^{2} (0, v) d v) (\frac{1}{n} \sum_{i = 1}^{n} (\int_{T_{i - 1}}^{T_{i}} f_{t}^{F_{i - 2}} (x) d t) + o (1)) \\ = \frac{1}{δ} (δ f (x) + o (1)) (\int_{R^{d}} {(K^{(β)})}^{2} (0, v) d v) . \end{matrix}

We deduce

lim_{n \to \infty} \sum_{i = 1}^{n} E [ξ_{n i}^{2} (x) ∣ F_{i - 2}] = f (x) \int_{R^{d}} {(K^{(β)})}^{2} (0, v) d v .

(53)

□

Proof of part (b).

The Lindberg condition results from Corollary 9.5.2 in [86], which implies that

n E [χ_{n i}^{2} (x) 1_{\{|χ_{n i} (x)| > ϵ\}}] \leq 4 n E [ξ_{n i}^{2} (x) 1_{\{|ξ_{n i} (x)| > ϵ / 2\}}] .

Let

a > 1

and

b > 1

such that

\frac{1}{a} + \frac{1}{b} = 1 .

Making use of Hölder and Markov inequalities, one can write, for all

ϵ > 0,

\begin{matrix} E [ξ_{n i}^{2} (x) 1_{\{|ξ_{n i} (x)| > ϵ / 2\}}] & \leq & \frac{E {|ξ_{n i} (x)|}^{2 a}}{{(ϵ / 2)}^{2 a / b}} . \end{matrix}

Therefore, by using condition (6) and assumption (C.3), we obtain

\begin{matrix} 4 n E [ξ_{n i}^{2} (x) 1_{\{|ξ_{n i} (x)| > ϵ / 2\}}] \\ \leq \frac{4}{T^{a - 1} h_{T}^{a d} {(ϵ / 2)}^{2 a / b}} \int_{T_{i - 1}}^{T_{i}} E {|(K^{(β)} (\frac{x}{h_{T}}, \frac{X_{t}}{h_{T}}))|}^{2 a} d t \\ = \frac{4}{T^{a - 1} h_{n}^{a d} {(ϵ / 2)}^{2 a / b}} \int_{T_{i - 1}}^{T_{i}} \int_{R^{d}} {(K^{(β)})}^{2 a} (\frac{x}{h_{T}}, \frac{u}{h_{T}}) f (u) d u d t \\ = \frac{4}{T^{a - 1} h_{T}^{(a - 1) d} {(ϵ / 2)}^{2 a / b}} \int_{T_{i - 1}}^{T_{i}} \int_{R^{d}} {(K^{(β)})}^{2 a} (0, v) f (x + h_{T} v) d v d t \\ = \frac{4}{T^{a - 1} h_{T}^{(a - 1) d} {(ϵ / 2)}^{2 a / b}} \int_{T_{i - 1}}^{T_{i}} \int_{R^{d}} {(K^{(β)})}^{2 a} (0, v) (f (x) + o (1)) d v d t . \end{matrix}

We infer that

\begin{matrix} 4 n E [ξ_{n i}^{2} (x, φ) 1_{\{|ξ_{n i} (x, φ)| > ϵ / 2\}}] & \leq & \frac{4 C_{d + 1}^{2 a}}{T^{a - 1} h_{T}^{(a - 1) d} {(ϵ / 2)}^{2 a / b}} \\ = & O ({(\frac{1}{T h_{T}^{d}})}^{a - 1}) . \end{matrix}

(54)

Combining statements (53) and (54), we achieve the proof of the theorem. □

□

Author Contributions

Conceptualization, S.D. and S.B.; methodology, S.D. and S.B.; validation, S.D. and S.B.; formal analysis, S.D. and S.B.; investigation, S.D. and S.B.; resources, S.D. and S.B.; writing—original draft preparation, S.D. and S.B.; writing—review and editing, S.D. and S.B. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by the Deanship of Scientific Research, Qassim University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The researchers would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2025). The authors gratefully acknowledge Qassim University, represented by the Deanship of Scientific Research. The authors extend their sincere gratitude to the Editor-in-Chief, the Associate Editor, and the four reviewers for their invaluable feedback and for pointing out a number of oversights in the version initially submitted. Their insightful comments have greatly refined and focused the original work, resulting in markedly improved presentation.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Besov Spaces

There are many equivalent definitions of Besov spaces; see, e.g., [87]. Following [68], consider the parameters

1 \leq p, q \leq \infty

, and define the shift operator

S_{τ}

acting on a function, f, as

(S_{τ} f) (x) = f (x - τ) .

For a fractional order,

0 < s < 1

, introduce the seminorm

γ_{s, p, q} (f) = {(\int_{R^{d}} {(\frac{∥ S_{τ} {f - f ∥}_{L_{p}}}{{∥ τ ∥}^{s}})}^{q} \frac{d τ}{{∥ τ ∥}^{d}})}^{1 / q},

with the limiting case when

q = \infty

, given by

γ_{s, p, \infty} (f) = sup_{τ \in R^{d}} \frac{∥ S_{τ} {f - f ∥}_{L_{p}}}{{∥ τ ∥}^{s}} .

For the special case

s = 1

, the seminorm takes the form

γ_{1, p, q} (f) = {(\int_{R^{d}} {(\frac{∥ S_{τ} f + S_{- τ} {f - 2 f ∥}_{L_{p}}}{∥ τ ∥})}^{q} \frac{d τ}{{∥ τ ∥}^{d}})}^{1 / q},

and in the supremum norm,

γ_{1, p, \infty} (f) = sup_{τ \in R^{d}} \frac{∥ S_{τ} f + S_{- τ} {f - 2 f ∥}_{L_{p}}}{∥ τ ∥} .

The Besov space

B_{s, p, q}

for

0 < s < 1

and

1 \leq p, q \leq \infty

consists of all functions,

f \in L_{p} (R^{d})

, for which

γ_{s, p, q} (f)

is finite:

B_{s, p, q} = {f \in L_{p} (R^{d}) : γ_{s, p, q} (f) < \infty} .

For higher regularity levels, where

s > 1

, decompose s into its integer and fractional components as

s = {[s]}^{-} + {s}^{+}

, where

{[s]}^{-}

is the greatest integer less than or equal to s and

0 < {s}^{+} \leq 1

. In this case, the Besov space

B_{s, p, q}

comprises functions,

f \in L_{p} (R^{d})

, whose weak derivatives,

D^{j} f

, belong to

B_{{s}^{+}, p, q}

for all multi-indices, j, satisfying

| j | \leq {[s]}^{-}

. The associated norm is given by

{∥ f ∥}_{B_{s, p, q}} = {∥ f ∥}_{L_{p}} + \sum_{| j | \leq {[s]}^{-}} γ_{{s}^{+}, p, q} (D^{j} f) .

classically differentiable according to the fundamental theorem of calculus. One defines the Sobolev spaces of weakly differentiable functions on A; for

1 \leq p < \infty

, the

L^{p}

Sobolev space of the order

s \in N

is defined as

H_{p}^{s} (A) = \{f \in L^{p} : \partial^{j} f \in L^{p} (A), \forall j = {1, \dots, s : ∥ f ∥}_{H_{p}^{s} (A)} \equiv {∥ f ∥}_{p} + {∥\partial^{s} f∥}_{p} < \infty\},

Notable instances of Besov spaces include the Sobolev space

H_{2}^{s} = B_{s, 2, 2}

and the space of bounded s-Lipschitz functions,

B_{s, \infty, \infty}

.

Remark A1.

It is worth noting that the optimal

L^{2}

risk for density estimation on a Sobolev ball with the regularity index s is of the order

O (n^{- 2 s / (2 s + 1)})

; see [88,89,90]. For a comprehensive examination of the connections between classical function spaces and Besov spaces—including the Fourier analytical characterizations of Sobolev spaces for

p \neq 2

—the reader is referred to [91,92]. In ref. [93], the relationship between

V_{p}

spaces (comprising functions of bounded p-variation) and Besov spaces is explored further, drawing on interpolation methods detailed in [52]. An alternative, more traditional treatment of p-variation spaces can be found in [94]. Moreover, an important reference addressing Besov spaces defined on broader geometric structures, such as manifolds and Dirichlet spaces, is [95].

References

Genovese, C.R.; Perone-Pacifico, M.; Verdinelli, I.; Wasserman, L. Non-parametric inference for density modes. J. R. Stat. Soc. Ser. B Stat. Methodol. 2016, 78, 99–126. [Google Scholar] [CrossRef]
Sasaki, H.; Noh, Y.K.; Niu, G.; Sugiyama, M. Direct density derivative estimation. Neural Comput. 2016, 28, 1101–1140. [Google Scholar] [CrossRef] [PubMed]
Noh, Y.K.; Sugiyama, M.; Liu, S.; du Plessis, M.C.; Park, F.C.; Lee, D.D. Bias reduction and metric learning for nearest-neighbor estimation of Kullback-Leibler divergence. Neural Comput. 2018, 30, 1930–1960. [Google Scholar] [CrossRef] [PubMed]
Vasiliev, V.A.; Dobrovidov, A.V.; Koshkin, G.M. Neparametricheskoe Otsenivanie Funktsionalov ot Raspredeleniĭ Statsionarnykh Posledovatel’ Nosteĭ; Nauka: Moscow, Russia, 2004; p. 510. [Google Scholar]
Dobrovidov, A.V.; Ruds’ko, I.M. Bandwidth selection in nonparametric estimator of density derivative by smoothed cross-validation method. Autom. Remote Control 2010, 71, 209–224. [Google Scholar] [CrossRef]
Genovese, C.R.; Perone-Pacifico, M.; Verdinelli, I.; Wasserman, L. On the path density of a gradient field. Ann. Stat. 2009, 37, 3236–3271. [Google Scholar] [CrossRef]
Singh, R.S. Applications of estimators of a density and its derivatives to certain statistical problems. J. R. Stat. Soc. Ser. B 1977, 39, 357–363. [Google Scholar] [CrossRef]
Meyer, T.G. Bounds for estimation of density functions and their derivatives. Ann. Stat. 1977, 5, 136–142. [Google Scholar] [CrossRef]
Silverman, B.W. Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Stat. 1978, 6, 177–184. [Google Scholar] [CrossRef]
Singh, R.S. Mean squared errors of estimates of a density and its derivatives. Biometrika 1979, 66, 177–180. [Google Scholar] [CrossRef]
Katkovnik, V.Y.; Poletaeva, N.G. Synthesis of optimal kernel estimates for the probability density and its derivatives in the presence of errors of observations. Kibern. Vychisl. Tekh. Kiev 1985, 67, 9–20. [Google Scholar]
Horová, I.; Vieu, P.; Zelinka, J. Optimal choice of nonparametric estimates of a density and of its derivatives. Stat. Decis. 2002, 20, 355–378. [Google Scholar] [CrossRef]
Abdous, B.; Germain, S.; Ghazzali, N. A unified treatment of direct and indirect estimation of a probability density and its derivatives. Stat. Probab. Lett. 2002, 56, 239–250. [Google Scholar] [CrossRef]
Chacón, J.E.; Duong, T.; Wand, M.P. Asymptotics for general multivariate kernel density derivative estimators. Stat. Sin. 2011, 21, 807–840. [Google Scholar] [CrossRef]
Henderson, D.J.; Parmeter, C.F. Normal reference bandwidths for the general order, multivariate kernel density derivative estimator. Stat. Probab. Lett. 2012, 82, 2198–2205. [Google Scholar] [CrossRef]
Henderson, D.J.; Parmeter, C.F. Canonical higher-order kernels for density derivative estimation. Stat. Probab. Lett. 2012, 82, 1383–1387. [Google Scholar] [CrossRef][Green Version]
Funke, B.; Hirukawa, M. Density derivative estimation using asymmetric kernels. J. Nonparametr. Stat. 2024, 36, 994–1017. [Google Scholar] [CrossRef]
Cao, K.; Zeng, X. Data-driven wavelet estimations for density derivatives. Bull. Malays. Math. Sci. Soc. 2024, 47, 18. [Google Scholar] [CrossRef]
Durastanti, C.; Turchi, N. Nonparametric needlet estimation for partial derivatives of a probability density function on the d-torus. J. Nonparametr. Stat. 2023, 35, 733–772. [Google Scholar] [CrossRef]
Guo, L.; Song, W.; Shi, J. Estimating multivariate density and its derivatives for mixed measurement error data. J. Multivariate Anal. 2022, 191, 18. [Google Scholar] [CrossRef]
Xu, J. Wavelet thresholding estimation of density derivatives from a negatively associated size-biased sample. Int. J. Wavelets Multiresolut. Inf. Process. 2020, 18, 15. [Google Scholar] [CrossRef]
Chacón, J.E.; Duong, T. Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting. Electron. J. Stat. 2013, 7, 499–532. [Google Scholar] [CrossRef]
Bouzebda, S. Limit theorems for wavelet conditional U-statistics for time series models. Math. Methods Statist. 2025, 35, 1–42. [Google Scholar]
Härdle, W.; Kerkyacharian, G.; Picard, D.; Tsybakov, A. Wavelets, Approximation, and Statistical Applications; Lecture Notes in Statistics; Springer: New York, NY, USA, 1998; Volume 129, pp. xviii+265. [Google Scholar] [CrossRef]
Prakasa Rao, B.L.S. Nonparametric estimation of the derivatives of a density by the method of wavelets. Bull. Inf. Cybern. 1996, 28, 91–100. [Google Scholar]
Chaubey, Y.P.; Doosti, H.; Prakasa Rao, B.L.S. Wavelet based estimation of the derivatives of a density with associated variables. Int. J. Pure Appl. Math. 2006, 27, 97–106. [Google Scholar]
Rao, B.L.S.P. Nonparametric Estimation of Partial Derivatives of a Multivariate Probability Density by the Method of Wavelets. In Asymptotics in Statistics and Probability; Puri, M.L., Ed.; De Gruyter: Berlin, Germany; Boston, MA, USA, 2000; pp. 321–330. [Google Scholar] [CrossRef]
Hosseinioun, N.; Doosti, H.; Niroumand, H.A. Nonparametric estimation of a multivariate probability density for mixing sequences by the method of wavelets. Ital. J. Pure Appl. Math. 2011, 28, 31–40. [Google Scholar]
Koshkin, G.; Vasil’iev, V. An estimation of a multivariate density and its derivatives by weakly dependent observations. In Statistics and Control of Stochastic Processes. The Liptser Festschrift. Papers from the Steklov Seminar Held in Moscow, Russia, 1995–1996; World Scientific: Singapore, 1997; pp. 229–241. [Google Scholar]
Prakasa Rao, B.L.S. Wavelet estimation for derivative of a density in the presence of additive noise. Braz. J. Probab. Stat. 2018, 32, 834–850. [Google Scholar] [CrossRef]
Masry, E. Nonparametric regression estimation for dependent functional data: Asymptotic normality. Stoch. Process. Appl. 2005, 115, 155–177. [Google Scholar] [CrossRef]
Bradley, R.C. Introduction to Strong Mixing Conditions; Kendrick Press: Heber City, UT, USA, 2007; Volume 3, pp. xii+597. [Google Scholar]
Bouzebda, S.; Didi, S.; El Hajj, L. Multivariate wavelet density and regression estimators for stationary and ergodic continuous time processes: Asymptotic results. Math. Methods Statist. 2015, 24, 163–199. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Commun. Stat. Theory Methods 2017, 46, 1367–1406. [Google Scholar] [CrossRef]
Andrews, D.W.K. Non-strong mixing autoregressive processes. J. Appl. Probab. 1984, 21, 930–934. [Google Scholar] [CrossRef]
Leucht, A.; Neumann, M.H. Degenerate U- and V-statistics under ergodicity: Asymptotics, bootstrap and applications in statistics. Ann. Inst. Statist. Math. 2013, 65, 349–386. [Google Scholar] [CrossRef]
Neumann, M.H. Absolute regularity and ergodicity of Poisson count processes. Bernoulli 2011, 17, 1268–1284. [Google Scholar] [CrossRef]
Beran, J. Statistics for long-memory processes. In Monographs on Statistics and Applied Probability; Chapman and Hall: New York, NY, USA, 1994; Volume 61, pp. x+315. [Google Scholar]
Lu, Z. Analyse des Processus Longue Mémoire Stationnaires et Non-Stationnaires: Estimations, Applications et préVisions. Ph.D. Thesis, Cachan, Ecole Normale Supérieure, Cachan, France, 2009. [Google Scholar]
Maslowski, B.; Pospíšil, J. Ergodicity and parameter estimates for infinite-dimensional fractional Ornstein-Uhlenbeck process. Appl. Math. Optim. 2008, 57, 401–429. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S. Some asymptotic properties of kernel regression estimators of the mode for stationary and ergodic continuous time processes. Rev. Mat. Complut. 2021, 34, 811–852. [Google Scholar] [CrossRef] [PubMed]
Meyer, Y. Wavelets and operators. In Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 1992; Volume 37, pp. xvi+224. [Google Scholar]
Daubechies, I. Ten lectures on wavelets. In CBMS-NSF Regional Conference Series in Applied Mathematics; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1992; Volume 61, pp. xx+357. [Google Scholar] [CrossRef]
Donoho, D.L.; Johnstone, I.M. Minimax estimation via wavelet shrinkage. Ann. Statist. 1998, 26, 879–921. [Google Scholar] [CrossRef]
Donoho, D.L.; Vetterli, M.; DeVore, R.A.; Daubechies, I. Data Compression and Harmonic Analysis. 1998; Volume 44, pp. 2435–2476. [Google Scholar] [CrossRef]
Birgé, L.; Massart, P. An adaptive compression algorithm in Besov spaces. Constr. Approx. 2000, 16, 1–36. [Google Scholar] [CrossRef]
Nickl, R.; Pötscher, B.M. Bracketing metric entropy rates and empirical central limit theorems for function classes of Besov- and Sobolev-type. J. Theoret. Probab. 2007, 20, 177–199. [Google Scholar] [CrossRef]
Nickl, R. Empirical and Gaussian processes on Besov classes. Lect. Notes Monogr. Ser. 2006, 51, 185–195. [Google Scholar] [CrossRef]
Donoho, D.L.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Density estimation by wavelet thresholding. Ann. Statist. 1996, 24, 508–539. [Google Scholar] [CrossRef]
Schneider, C. Beyond Sobolev and Besov—Regularity of Solutions of PDEs and Their Traces in Function Spaces; Lecture Notes in Mathematics; Springer: Cham, Switzerland, 2021; Volume 2291, pp. xviii+327. [Google Scholar] [CrossRef]
Sawano, Y. Theory of Besov Spaces; Developments in Mathematics; Springer: Singapore, 2018; Volume 56, pp. xxiii+945. [Google Scholar] [CrossRef]
Peetre, J. New Thoughts on Besov Spaces; Duke University Mathematics Series, No. 1; Duke University; Mathematics Department: Durham, NC, USA, 1976; pp. vi+305. [Google Scholar]
Delecroix, M. Sur l’estimation et la Prévision non Paramétrique des Processus Ergodiques. Ph.D. Thesis, Thèse de doctorat dirigée par Bosq, Denis Sciences, Mathématiques Lille 1, Villeneuve-d’Ascq, France, 1987. [Google Scholar]
Allaoui, S.; Bouzebda, S.; Chesneau, C.; Liu, J. Uniform almost sure convergence and asymptotic distribution of the wavelet-based estimators of partial derivatives of multivariate density function under weak dependence. J. Nonparametr. Stat. 2021, 33, 170–196. [Google Scholar] [CrossRef]
Giné, E.; Koltchinskii, V.; Sakhanenko, L. Kernel density estimators: Convergence in distribution for weighted sup-norms. Probab. Theory Relat. Fields 2004, 130, 167–198. [Google Scholar] [CrossRef]
Giné, E.; Koltchinskii, V.; Zinn, J. Weighted uniform consistency of kernel density estimators. Ann. Probab. 2004, 32, 2570–2605. [Google Scholar] [CrossRef][Green Version]
Bellman, R. Adaptive Control Processes: A Guided Tour; Princeton University Press: Princeton, NJ, USA, 1961; pp. xvi+255. [Google Scholar]
Scott, D.W.; Wand, M.P. Feasibility of multivariate density estimates. Biometrika 1991, 78, 197–205. [Google Scholar] [CrossRef]
Bouzebda, S. Weak convergence of the conditional single index U-statistics for locally stationary functional time series. AIMS Math. 2024, 9, 14807–14898. [Google Scholar] [CrossRef]
Bouzebda, S. Limit Theorems in the Nonparametric Conditional Single-Index U-Processes for Locally Stationary Functional Random Fields under Stochastic Sampling Design. Mathematics 2024, 12, 1996. [Google Scholar] [CrossRef]
Didi, S.; Bouzebda, S. Wavelet Density and Regression Estimators for Continuous Time Functional Stationary and Ergodic Processes. Mathematics 2022, 10, 4356. [Google Scholar] [CrossRef]
Mokkadem, A.; Pelletier, M. The law of the iterated logarithm for the multivariable kernel mode estimator. ESAIM Probab. Stat. 2003, 7, 1–21. [Google Scholar] [CrossRef]
Hall, P.; Marron, J.S. Estimation of integrated squared density derivatives. Statist. Probab. Lett. 1987, 6, 109–115. [Google Scholar] [CrossRef]
Jurečková, J. Asymptotic linearity of a rank statistic in regression parameter. Ann. Math. Statist. 1969, 40, 1889–1900. [Google Scholar] [CrossRef]
Giné, E.; Mason, D.M. Uniform in bandwidth estimation of integral functionals of the density function. Scand. J. Statist. 2008, 35, 739–761. [Google Scholar] [CrossRef]
Levit, B.Y. Asymptotically efficient estimation of nonlinear functionals. Problemy Peredachi Informatsii 1978, 14, 65–72. [Google Scholar]
Masry, E. Multivariate probability density estimation by wavelet methods: Strong consistency and rates for stationary time series. Stoch. Process. Appl. 1997, 67, 177–193. [Google Scholar] [CrossRef]
Masry, E. Wavelet-based estimation of multivariate regression functions in Besov spaces. J. Nonparametr. Statist. 2000, 12, 283–308. [Google Scholar] [CrossRef]
Allaoui, S.; Bouzebda, S.; Liu, J. Multivariate wavelet estimators for weakly dependent processes: Strong consistency rate. Commun. Statist. Theory Methods 2023, 52, 8317–8350. [Google Scholar] [CrossRef]
Hall, P.; Penev, S. Cross-validation for choosing resolution level for nonlinear wavelet curve estimators. Bernoulli 2001, 7, 317–341. [Google Scholar] [CrossRef]
Chaouch, M.; Laïb, N. Regression estimation for continuous-time functional data processes with missing at random response. J. Nonparametr. Stat. 2024, 36, 1–32. [Google Scholar] [CrossRef]
Giraitis, L.; Leipus, R. A generalized fractionally differencing approach in long-memory modeling. Liet. Mat. Rink. 1995, 35, 65–81. [Google Scholar] [CrossRef]
Guégan, D.; Ladoucette, S. Non-mixing properties of long memory processes. C. R. Acad. Sci. Paris Sér. I Math. 2001, 333, 373–376. [Google Scholar] [CrossRef]
Francq, C.; Zakoïan, J.M. GARCH Models: Structure, Statistical Inference and Financial Applications; John Wiley & Sons, Ltd.: Chichester, UK, 2010; pp. xiv+489. [Google Scholar] [CrossRef]
Masry, E. Probability density estimation from sampled data. IEEE Trans. Inform. Theory 1983, 29, 696–709. [Google Scholar] [CrossRef]
Prakasa Rao, B.L.S. Nonparametric density estimation for stochastic processes from sampled data. Publ. Inst. Statist. Univ. Paris 1990, 35, 51–83. [Google Scholar]
Prakasa Rao, B.L.S. Statistical Inference for Diffusion Type Processes. In Kendall’s Library of Statistics; Edward Arnold: London, UK; Oxford University Press: New York, NY, USA, 1999; Volume 8, pp. xvi+349. [Google Scholar]
Bosq, D. Nonparametric Statistics for Stochastic Processes: Estimation and Prediction, 2nd ed.; Lecture Notes in Statistics; Springer: New York, NY, USA, 1998; Volume 110, pp. xvi+210. [Google Scholar] [CrossRef]
Blanke, D.; Pumo, B. Optimal sampling for density estimation in continuous time. J. Time Ser. Anal. 2003, 24, 1–23. [Google Scholar] [CrossRef]
Hall, P.; Patil, P. On wavelet methods for estimating smooth functions. Bernoulli 1995, 1, 41–58. [Google Scholar] [CrossRef]
Hall, P.; Patil, P. Formulae for mean integrated squared error of nonlinear wavelet-based density estimators. Ann. Statist. 1995, 23, 905–928. [Google Scholar] [CrossRef]
Burkholder, D.L. Distribution function inequalities for martingales. Ann. Probab. 1973, 1, 19–42. [Google Scholar] [CrossRef]
de la Peña, V.H.; Giné, E. Decoupling; Probability and its Applications (New York); Springer: New York, NY, USA, 1999; pp. xvi+392. [Google Scholar] [CrossRef]
Hall, P.; Heyde, C.C. Martingale Limit Theory and Its Application; Probability and Mathematical Statistics; Academic Press, Inc.: New York, NY, USA; Harcourt Brace Jovanovich, Publishers: London, UK, 1980; pp. xii+308. [Google Scholar]
Krengel, U. Ergodic Theorems. In De Gruyter Studies in Mathematics; Walter de Gruyter & Co.: Berlin, Germany, 1985; Volume 6, pp. viii+357. [Google Scholar] [CrossRef]
Chow, Y.S.; Teicher, H. Probability Theory; Springer: New York, NY, USA; Heidelberg, Germany, 1978; pp. xv+455. [Google Scholar]
Triebel, H. Theory of function spaces. II. In Monographs in Mathematics; Birkhäuser: Basel, Switzerland, 1992; Volume 84, pp. viii+370. [Google Scholar] [CrossRef]
Schipper, M. Optimal rates and constants in L₂-minimax estimation of probability density functions. Math. Methods Statist. 1996, 5, 253–274. [Google Scholar]
Efromovich, S. Adaptive estimation of and oracle inequalities for probability densities and characteristic functions. Ann. Statist. 2008, 36, 1127–1155. [Google Scholar] [CrossRef]
Efromovich, S. Lower bound for estimation of Sobolev densities of order less 12. J. Statist. Plann. Inference 2009, 139, 2261–2268. [Google Scholar] [CrossRef]
Triebel, H. Theory of Function Spaces; Monographs in Mathematics; Birkhäuser: Basel, Switzerland, 1983; Volume 78, p. 284. [Google Scholar] [CrossRef]
DeVore, R.A.; Lorentz, G.G. Constructive Approximation; Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]; Springer: Berlin, Germany, 1993; Volume 303, pp. x+449. [Google Scholar]
Bourdaud, G.; Lanza de Cristoforis, M.; Sickel, W. Superposition operators and functions of bounded p-variation. Rev. Mat. Iberoam. 2006, 22, 455–487. [Google Scholar] [CrossRef]
Dudley, R.M.; Norvaiša, R. Differentiability of Six Operators on Nonsmooth Functions and P-variation; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1999; Volume 1703, pp. viii+277. [Google Scholar] [CrossRef]
Geller, D.; Pesenson, I.Z. Band-limited localized Parseval frames and Besov spaces on compact homogeneous manifolds. J. Geom. Anal. 2011, 21, 334–371. [Google Scholar] [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Didi, S.; Bouzebda, S. Linear Wavelet-Based Estimators of Partial Derivatives of Multivariate Density Function for Stationary and Ergodic Continuous Time Processes. Entropy 2025, 27, 389. https://doi.org/10.3390/e27040389

AMA Style

Didi S, Bouzebda S. Linear Wavelet-Based Estimators of Partial Derivatives of Multivariate Density Function for Stationary and Ergodic Continuous Time Processes. Entropy. 2025; 27(4):389. https://doi.org/10.3390/e27040389

Chicago/Turabian Style

Didi, Sultana, and Salim Bouzebda. 2025. "Linear Wavelet-Based Estimators of Partial Derivatives of Multivariate Density Function for Stationary and Ergodic Continuous Time Processes" Entropy 27, no. 4: 389. https://doi.org/10.3390/e27040389

APA Style

Didi, S., & Bouzebda, S. (2025). Linear Wavelet-Based Estimators of Partial Derivatives of Multivariate Density Function for Stationary and Ergodic Continuous Time Processes. Entropy, 27(4), 389. https://doi.org/10.3390/e27040389

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Linear Wavelet-Based Estimators of Partial Derivatives of Multivariate Density Function for Stationary and Ergodic Continuous Time Processes

Abstract

1. Introduction

Notation

2. Mathematical Background

2.1. Linear Wavelet Estimator

2.2. Besov Spaces

2.3. Linear Wavelet Estimator

3. Assumptions and Main Results

4. Comments on Hypotheses

4.1. Asymptotic Normality Results

4.2. Confidence Interval

5. Application to Multivariate Mode Estimation

6. Concluding Remarks

7. Proofs

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Besov Spaces

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI