Wavelet Estimation of Partial Derivatives in Multivariate Regression Under Discrete-Time Stationary Ergodic Processes

Didi, Sultana; Bouzebda, Salim

doi:10.3390/math13101587

Open AccessArticle

Wavelet Estimation of Partial Derivatives in Multivariate Regression Under Discrete-Time Stationary Ergodic Processes

by

Sultana Didi

^1,†

and

Salim Bouzebda

^2,*,†

¹

Department of Statistics and Operations Research, College of Sciences, Qassim University, P.O. Box 6688, Buraydah 51452, Saudi Arabia

²

Université de Technologie de Compiègne, LMAC (Laboratory of Applied Mathematics of Compiègne), CS 60 319-60 203 Compiègne, France

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Mathematics 2025, 13(10), 1587; https://doi.org/10.3390/math13101587

Submission received: 11 April 2025 / Revised: 6 May 2025 / Accepted: 9 May 2025 / Published: 12 May 2025

(This article belongs to the Special Issue Mathematical Statistics and Nonparametric Inference)

Download Versions Notes

Abstract

This study introduces a wavelet-based framework for estimating derivatives of a general regression function within discrete-time, stationary ergodic processes. The analysis focuses on deriving the integrated mean squared error (IMSE) over compact subsets of

R^{d}

, while also establishing rates of uniform convergence and the asymptotic normality of the proposed estimators. To investigate their asymptotic behavior, we adopt a martingale-based approach specifically adapted to the ergodic nature of the data-generating process. Importantly, the framework imposes no structural assumptions beyond ergodicity, thereby circumventing restrictive dependence conditions. By establishing the limiting behavior of the wavelet estimators under these minimal assumptions, the results extend existing findings for independent data and highlight the flexibility of wavelet methods in more general stochastic settings.

Keywords:

regression estimation; stationarity; ergodicity; rates of strong convergence; wavelet-based estimators; martingale differences; discrete time; stochastic processes; time series

MSC:

62G07; 62G08; 62G05; 62G20; 62H05; 60G42; 60G46

1. Introduction

Nonparametric estimation has attracted sustained attention over many years, leading to a broad spectrum of methodological developments. Given its extensive range of applications and significant role within mathematical statistics, the estimation of both density and regression functions has become a central research theme. Among the most frequently employed approaches are kernel-type estimators, recognized for their flexibility and robust performance. For comprehensive treatments of these methods and their applications, the reader is referred to [1,2], and the references therein. Estimating derivatives of a function—whether a density or a regression function—serves as a powerful technique in statistical data analysis. Despite their importance, however, the nonparametric estimation of higher-order density derivatives has not been extensively explored. A key objective of this work is thus to investigate wavelet-based nonparametric estimators for the partial derivatives of multivariate densities. Derivative estimation arises in many disciplines, including economics and industry, where complex systems must often be modeled under limited prior knowledge. For instance, the second-order derivative of a density can support statistical tests for mode detection [3] and also guides bandwidth selection in kernel density estimation [4]. In nonparametric signal estimation, the logarithmic derivative of the density—defined by the ratio of the derivative to the density itself—is instrumental for optimal filtering and interpolation procedures [5], thereby making precise estimation crucial for accurate signal processing. Moreover, gradient estimation is essential for filament detection in point cloud data, a task of broad relevance in medical imaging, remote sensing, seismology, and cosmology [6]. Additional motivations and applications appear in regression analysis, Fisher information estimation, parameter estimation, and hypothesis testing [7]. Seminal investigations into density derivative estimation include [8,9,10], among others. Further context for related statistical challenges, such as regression, Fisher information estimation, parameter estimation, and hypothesis testing, may be found in [7].

It is likewise well documented that estimating first- and higher-order regression derivatives holds substantial practical importance. Examples include modeling human growth processes [11], assessing kidney function in lupus nephritis patients [12], and analyzing Raman spectra of bulk materials [13]. In nonparametric regression, derivative estimation is indispensable for constructing confidence intervals [14], selecting kernel bandwidths [15], and comparing regression curves [16]. In the setting of a homoscedastic regression model, ref. [17] proposed kernel M-estimators to estimate the first derivative of the regression function in a nonparametric fashion, and extended these ideas heuristically to higher-order derivatives. Derivative information also underpins modal regression, an alternative to traditional regression methods for investigating the relationship between a response variable

Y

and a predictor variable

X

; see [18]. Foundational studies of regression function estimation can be found in [19,20]. In contrast, fewer works address the estimation of derivatives for stationary densities or regression functions, with most established results pertaining to independent and identically distributed data. Wavelet-based techniques have recently attracted growing interest in statistical estimation, largely owing to their adaptability to varying function regularity and their capacity to handle discontinuities effectively. Furthermore, wavelet methods often lead to computationally efficient algorithms that require relatively modest memory. For a detailed overview of wavelet methodologies in nonparametric functional estimation, see [21]. Examples of wavelet applications include estimating the integrated squared derivative of a univariate density for independent data [22] and for negatively or positively associated sequences [23]. The work of [24] extended these methods to partial derivatives of multivariate densities under independence, while [25,26] addressed the mixing scenario. More recently, ref. [27] studied wavelet estimators for the partial derivatives of multivariate densities under additive noise.

To the best of our knowledge, methods for wavelet-based estimation of partial derivatives of multivariate densities have not yet been extended to more general dependence structures beyond the strong mixing framework. Addressing this gap forms the central motivation of our work. In particular, we draw on a collection of martingale-based techniques that markedly differ from the tools typically employed under strong mixing conditions. Nevertheless, as the subsequent sections will illustrate, bridging this gap entails much more than merely combining pre-existing methodologies: it requires advanced mathematical developments tailored to wavelet-based estimation in an ergodic setting.

The remainder of this article is organised as follows. Section 2 reviews the necessary mathematical foundations and introduces the proposed class of linear wavelet estimators. Section 3 specifies the underlying assumptions and presents the principal theoretical results—uniform convergence rates and asymptotic normality established under weak-dependence conditions. Section 4 applies the methodology to regression-derivative estimation, whereas Section 5 explores its extension to mode regression. Concluding remarks and avenues for further investigation are given in Section 6. To maintain the flow of exposition, all proofs are collected in Section 7.

Notation

Unless otherwise specified, C denotes a positive constant whose value may change from line to line. We write

1_{A} (\cdot)

for the indicator function of a set A. For two sequences of positive real numbers

{a_{n}}_{n \geq 1}

and

{b_{n}}_{n \geq 1}

, we use the Landau notation

a_{n} = O (b_{n})

when there exists a constant

C > 0

such that

a_{n} \leq C b_{n}

for all sufficiently large n, and

a_{n} = o (b_{n})

when

lim_{n \to \infty} \frac{a_{n}}{b_{n}} = 0

.

2. Mathematical Backgrounds

2.1. Besov Spaces

Throughout this study we adopt the Besov scale

B_{s, p, q}

with parameters

s > 0

and

1 \leq p, q \leq \infty

. Meyer’s wavelet characterization [28] shows that, for

0 < s < r

and any

f \in L_{p} (R^{d})

, the inclusion

f \in B_{s, p, q}

is equivalent to the finiteness of either of the two seminorms

(B.1): $J_{s, p, q} (f) = {∥P_{V_{0}} f∥}_{L_{p}} + {(\sum_{j > 0} {(2^{j s} {∥ P_{W_{j}} f ∥}_{L_{p}})}^{q})}^{1 / q} < \infty,$
(B.2): $J_{s, p, q}^{'} (f) = {∥a_{0, \cdot}∥}_{l_{p}} + {(\sum_{j > 0} {(2^{j [s + d (\frac{1}{2} - \frac{1}{p})]} {∥b_{j, \cdot}∥}_{l_{p}})}^{q})}^{1 / q} < \infty,$

where the scaling and wavelet coefficients are defined by

a_{0, k} = \int_{R^{d}} f (u) ϕ_{0, k} (u) d u, b_{i, j, k} = \int_{R^{d}} f (u) ψ_{i, j, k} (u) d u,

and

∥ b_{j, \cdot} ∥_{ℓ_{p}} = {(\sum_{i = 1}^{d} \sum_{k \in Z^{d}} {| b_{i, j, k} |}^{p})}^{1 / p} .

When

q = \infty

, the summations in

(B . 1)

and

(B . 2)

are replaced by the essential supremum. The Besov framework subsumes many classical smoothness classes that are ubiquitous in statistics and machine learning—for instance, the Sobolev (Hilbert) space

H^{s}

corresponds to

B_{s, 2, 2}

, while the (non-integer) Hölder–Lipschitz–Zygmund class

C^{s}

coincides with

B_{s, \infty, \infty}

. Additional equivalent formulations, together with their advantages in approximation theory and statistical methodology, are detailed in [29,30,31,32] and in Appendix A.

2.2. Linear Wavelets Estimator

This section opens with a concise review of the essentials of wavelet theory, following the notation of [28]. Let

{V_{j}}_{j \in Z}

be a multiresolution analysis of the Hilbert space

L_{2} (R^{d})

. Denote by

ϕ

the scaling function and by

ψ

the associated orthogonal wavelet, both assumed to be r-regular (

r \geq 1

) and compactly supported in the hyper-cube

{[- L, L]}^{d}

for some

L > 0

. For every integer j and every index vector

k \in Z^{d}

, define

ϕ_{j, k} (x) = 2^{j d / 2} ϕ (2^{j} x - k) .

The family

{ϕ_{j, k}}_{k \in Z^{d}}

forms an orthonormal basis of

V_{j}

; moreover, each partial derivative of

ϕ

up to total order r is rapidly decreasing. Specifically, for every integer

i > 0

there exists a constant

A_{i} > 0

such that (see [28], p. 29, Thm.2)

|(\partial^{β} ϕ) (x)| \leq \frac{A_{i}}{{(1 + ∥ x ∥)}^{i}}, for all ∣ β ∣ \leq r .

(1)

In addition, there exist precisely

2^{d} - 1

companion wavelets

ψ_{1}, \dots, ψ_{2^{d} - 1}

. Together they generate the collection

\{ψ_{i, j, k} (x) = 2^{j d / 2} ψ_{i} (2^{j} x - k) | i = 1, \dots, 2^{d} - 1, k \in Z^{d}\},

which constitutes an orthonormal basis of

L_{2} (R^{d})

.

We next recall the notions of strong mixing and ergodicity and the relationship between them. Let

{X_{n}}_{n \in Z}

be a strictly stationary sequence, and set

A_{n} = σ (X_{k} : k \leq n)

and

B_{m} = σ (X_{k} : k \geq m) .

The sequence is (

α

-)strongly mixing if

α (n) = sup_{A \in A_{0}, B \in B_{n}} |P (A \cap B) - P (A) P (B)| ⟶ 0 (n \to \infty) .

It is ergodic when

lim_{n \to \infty} \frac{1}{n} \sum_{k = 0}^{n - 1} |P (A \cap τ^{- k} B) - P (A) P (B)| = 0,

where

τ

denotes the left-shift operator. The foregoing definition of strong mixing is stricter than the one often used for measure-preserving transformations—namely,

{lim}_{n \to \infty} P (A \cap τ^{- n} B) = P (A) P (B)

for all measurable sets

A, B

[33]. While strong mixing implies ergodicity, the converse need not hold (cf. Remark 2.6, p. 50, and Proposition 2.8, p. 51, of [34]). A number of authors have argued that an ergodic, rather than a strongly mixing, dependence structure is often preferable; see, for example, the discussion and illustrative examples in [35].

Finally, let

(X, Y)

be a random vector with

X = (X_{1}, \dots, X_{d}) \in R^{d}

and

Y = (Y_{1}, \dots, Y_{q}) \in R^{q}

. Define the joint distribution function (df) of

(X, Y)

by

F (x, y) : = P (X \leq x, Y \leq y), x \in R^{d}, y \in R^{q} .

Henceforth, for vectors

v^{'} = (v_{1}^{'}, \dots, v_{r}^{'}) \in R^{r}

and

v^{″} = (v_{1}^{″}, \dots, v_{r}^{″}) \in R^{r}

, we write

v^{'} \leq v^{″}

if and only if

v_{j}^{'} \leq v_{j}^{″}

for all

j = 1, \dots, r

. We let

I

and

J

be two fixed subsets of

R^{d}

such that

I = \prod_{j = 1}^{d} [a_{j}, b_{j}] \subset J = \prod_{j = 1}^{d} [c_{j}, d_{j}] \subset R^{d},

with

- \infty < c_{j} < a_{j} < b_{j} < d_{j} < \infty for each j = 1, \dots, d .

Suppose

(X, Y)

has a joint density function

f_{X, Y} (x, y) : = \frac{\partial^{d + q}}{\partial x_{1} \dots \partial x_{d} \partial y_{1} \dots \partial y_{q}} F (x, y) on J \times R^{q},

with respect to the Lebesgue measure

d x d y

. Denote

f_{X} (x) = \int_{R^{q}} f (x, y) d y, x \in J,

as the marginal density of

X

on

J

. Let

ρ : R^{q} \to R

be a measurable function. We are chiefly interested in estimating derivatives of the form

\begin{matrix} (\partial^{β} r) (ρ; x) & : = \partial^{β} [E [ρ (Y) ∣ X = x]] f_{X} (x) \\ = \int_{R^{q}} ρ (y) (\partial^{β} f) (x, y) d y, \end{matrix}

where

(\partial^{β} f) (x, y) = \frac{\partial^{| β |} f (x, y)}{\partial x_{1}^{β_{1}} \dots \partial x_{d}^{β_{d}}}, β = (β_{1}, \dots, β_{d}), | β | = \sum_{i = 1}^{d} β_{i} .

When the regression function

m_{ρ} (x) = E [ρ (Y) ∣ X = x]

is sufficiently smooth, we may also consider its derivative

\partial_{x}^{β} m_{ρ} (x) = \frac{\partial^{| β |} m_{ρ} (x)}{\partial x_{1}^{β_{1}} \dots \partial x_{d}^{β_{d}}} .

Assume now that

(\partial^{β} r) (ρ; \cdot) \in L_{2} (R^{d})

. According to [36], for any integer

τ

, the function

(\partial^{β} r) (ρ; \cdot)

admits an expansion in the subspace

V_{τ}

of the multiresolution analysis:

(\partial^{β} r) (x) = \sum_{k \in Z^{d}} a_{τ, k} ϕ_{τ, k} (x),

(2)

where each wavelet coefficient

a_{τ, k}

can be represented (via repeated integration by parts) as

\begin{matrix} a_{τ, k} & = \int_{R^{d}} (\partial^{β} r) (ρ; u) ϕ_{τ, k} (u) d u \\ = {(- 1)}^{| β |} \int_{R^{d}} r (ρ; u) (\partial^{β} ϕ_{τ, k}) (u) d u . \end{matrix}

Let

\partial^{β} ϕ_{τ, k} (\cdot)

denote the

β

-order partial derivative of

ϕ_{τ, k} (\cdot)

. We define the linear estimator of

(\partial^{β} r) (ρ; \cdot)

at the resolution level

τ = m (n) \to \infty

(the precise divergence rate of

m (n)

will be specified below) by

{\hat{(\partial^{β} r)}}_{n} (ρ; x) = \sum_{k \in Z^{d}} {\hat{a}}_{τ, k} ϕ_{τ, k} (x),

(3)

where

{\hat{a}}_{τ, k}

is the unbiased empirical estimator of

a_{τ, k}

:

{\hat{a}}_{τ, k} = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) .

(4)

Remark 1.

Since

ϕ (\cdot)

and

ψ_{i} (\cdot)

are bounded and compactly supported—where the support grows in a controlled manner with increasing differentiability—it follows that, for any fixed point

x

, only finitely many terms in the sums over

k \in Z^{d}

contribute to the value of the wavelet expansions (see [36] for further details). This implies pointwise convergence of the expansions.

3. Assumptions and Main Results

To facilitate the presentation of our main results, we introduce the following notation. Let us denote by

g (\cdot)

the density function of the random variable Y. Let

F_{n}

be the

σ

-field generated by

{(X_{i}, Y_{i}) : 1 \leq i \leq n}

, let

f_{(X, Y)}^{F_{i - 1}} (\cdot)

and

g^{F_{i - 1}} (\cdot)

and be the conditional densities of

(X, Y)

and Y respectively, given the

σ -

field

F_{i - 1}

. Define the

σ -

field

G_{n} = σ ((X_{k}, Y_{k}); (X_{n + 1}) : 1 \leq k \leq n) .

The following assumptions are imposed throughout the paper.

(C.1)

For every

x \in S

, the sequence

\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (x)

converges to

f (x)

as

n \to \infty

, both almost surely (a.s.) and in the

L^{2}

sense.

(C.2)

Moreover, for every

x \in S

,

lim_{n \to \infty} sup_{x \in R^{d}} |\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (x) - f (x)| = 0,

again in both the almost sure and

L^{2}

senses.

(C.3)

(i): Assume that the multiresolution analysis is r-regular.
(ii): The density $(\partial^{β} r) (ρ; \cdot) \in B_{s p q}$ for some $0 < s < r, 1 ⩽ p, q ⩽ \infty$ .

(C.4)

(i): The density $f (\cdot) \in B_{s p q}$ for some $0 < s < r, 1 ⩽ p, q ⩽ \infty$ .
(ii): The conditional density $f^{F_{i - 1}} (\cdot) \in B_{s p q}$ for some $0 < s < r, 1 ⩽ p, q ⩽ \infty$ .

(N.0)

E [| ρ (Y_{1}) |^{ν}] < \infty

for some

ν > 2

.

(N.1)

For any

y \in R

lim_{n \to \infty} sup_{y \in R} |\frac{1}{n g (y)} \sum_{i = 1}^{n} g^{F_{i - 1}} (y) - 1| = 0, in the a . s . and L^{2} sense .

We may refer to [37] for further details.

(N.2)

(i): The conditional mean of $Y_{i}$ given the $σ -$ field $S_{i}$ depends only on $X_{i}$ , i.e., for any $i \geq 1$ ,

$E [ρ (Y_{i}) ∣ G_{i - 1}] = E [ρ (Y_{i}) ∣ X_{i}] = m_{ρ} (x) .$
(ii): for all $η \leq 2$ ,

$E [{|ρ (Y_{i})|}^{η} ∣ G_{i - 1}] = E [{|ρ (Y_{i})|}^{η} ∣ X_{i}] = m_{ρ, η} (x) .$

and the function $m_{ρ, η} (x)$ is continuous at $x$

(N.3)

(i): We shall suppose that there exist two known constants $C_{m} > 0$ such that

$sup_{x \in S} m_{ρ} (x) \leq C_{m} .$
(ii): The function regression satisfies a Hölder condition, that is, there exist constants $β > 0$ and $c_{3} > 0$ such that, for any $(u, v) \in R^{2 d}$

$|m_{ρ} (u) - m_{ρ} (v)| \leq c_{3} {∥ u - v ∥}^{β} .$
(iii): The function regression $m_{ρ, η}$ satisfy a Hölder condition that is, there exist constants $β_{1} > 0$ and $c_{4} > 0$ such that, for any $(u, v) \in R^{2 d}$

$|m_{ρ, η} (u) - m_{ρ, η} (v)| \leq c_{4} {∥ u - v ∥}^{β_{1}} .$

Lemma 1.

Under condition (C.3), for

s > d / p

, we have

\begin{matrix} sup_{x \in R^{d}} |E [{\hat{(\partial^{β} r)}}_{n} (ρ; x)] - (\partial^{β} r) (ρ; x)| & = sup_{x \in R^{d}} |(\partial^{β} r) (ρ; x) - (P_{V_{m}} (\partial^{β} r) (ρ; x)) (x)| \\ ⩽ const . 2^{- (s - d / p) m (n)} J_{s p q} ((\partial^{β} r) (ρ; \cdot)) . \end{matrix}

Define the kernel

K (u, v)

by

K (u, v) : = \sum_{k \in Z^{d}} ϕ (u - k) ϕ (v - k) and h_{n} = 2^{- m (n)} .

(5)

consequently

K^{(β)} (u, v) : = \sum_{k \in Z^{d}} ϕ (u - k) (\partial_{v}^{β} ϕ) (v - k) .

(6)

where

K^{(β)} (u, v) : = (\partial_{v}^{β} k) (u, v)

denotes the

β

-th partial derivative of

K (u, v)

with respect to

v

. We infer that the derivative kernel function

K^{(β)} (\cdot)

defined in (6) converges uniformly and satisfies ([28], p. 33), i.e., for

∣ α ∣ \leq r

,

∣ β ∣ \leq r

, and for some constant

C_{m} > 0

with

m \geq 1

, the following inequalities holds:

| (\partial_{u}^{α} \partial_{v}^{β} K) (u, v) | \leq \frac{C_{m}}{{(1 + ∥ v - u ∥}_{2})^{m}},

where, for any

x = (x_{1}, \dots, x_{d}) \in R^{d},

{∥ x ∥}_{2} = \sqrt{\sum_{i = 1}^{d} x_{i}^{2}} .

For

α = 0

and

m = d + | β | + 1

, we obtain

| K^{(β)} (u, v) | \leq \frac{C_{d + | β | + 1}}{{(1 + ∥ v - u ∥}_{2})^{d + | β | + 1}} .

(7)

That is, for any

j \geq 1

, we have

\int_{R^{d}} {| K^{(β)} (v, u) |}^{j} d v \leq G_{j} (d) .

The function

G_{j} (d)

is defined as:

G_{j} (d) = 2 π^{d / 2} \frac{ℓ (d) ℓ (j + d (j - 1))}{ℓ (d / 2) ℓ ((d + 1) j)} C_{d + 1}^{j},

where

ℓ (t)

represents the ell function, given by

ℓ (t) : = \int_{0}^{\infty} y^{t - 1} e^{- y} d y .

Furthermore, by assuming

| α | = 1

and

m = 2

, we deduce that

|\frac{\partial_{u} \partial_{v}^{β} K (u, y)}{\partial u_{i}}| \leq \frac{C_{m}}{{(1 + ∥ u - y ∥}_{2})^{2}} \leq C_{2}, i = 1, \dots, d .

(8)

This, in turn, implies that

|K^{β} (u, y) - K^{β} (v, y)| \leq C_{2} \sum_{i = 1}^{d} | u_{i} - v_{i} | \leq d^{1 / 2} C_{2} {∥ u - v ∥}_{2} .

(9)

By incorporating Equations (4), (3), and (6), the estimator

{\hat{(\partial^{β} r)}}_{n} (x)

can be expressed in an extended kernel estimation framework as follows:

{\hat{(\partial^{β} r)}}_{n} (x) = \frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} ρ (Y_{i}) K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}), where h_{n} = 2^{- m (n)} .

(10)

We refer the reader to [24] for further details.

Theorem 1.

Under the stated assumptions (C.1), (C.4)(i), (N.0) and (N.2)(i), let r be an element of the Besov space

B_{s, p, q}

with

s > 1 / p

, and

1 \leq p, q \leq \infty

. In this setting, the linear wavelet estimator

{\hat{(\partial^{β} r)}}_{n}

satisfies

E {∥ {\hat{(\partial^{β} r)}}_{n} (ρ; x) - (\partial^{β} r) (ρ; x) ∥}_{2}^{2} = O (n^{- \frac{2 (s - | β | - d / 4)}{2 s + 1}}) .

Theorem 2.

Assume that assumptions (C.2), (N.1), (N.2), (N.3)(i) and

m (n) = m \to \infty a n d \frac{2^{d m (n)} log n}{n} \to 0 a s n \to \infty .

Define

T_{n}

and assume that for

m (n) \to \infty

slowly enough such that

{(\frac{2^{m (n) d} log n}{n})}^{1 / 2} T_{n} = {(\frac{log n}{n h_{n}^{d}})}^{1 / 2} T_{n} \to 0, as n \to \infty .

(11)

For every compact subset

D \subset R^{d}

, under assumptions (C.1) and (C.3), we have almost surely,

sup_{x \in D} |{\hat{(\partial^{β} r)}}_{n} (ρ; x) - E [{\hat{(\partial^{β} r)}}_{n} (ρ; x)]| = O ({(\frac{log n}{n h_{n}^{d + 2 | β |}})}^{1 / 2}) + O (h_{n}^{- 1 / 2}) .

Remark 2.

The factor

2^{- m (n)}

that appears in the preceding theorems corresponds directly to the bandwidth

h_{n}

in Parzen–Rosenblatt kernel density estimation. In practice, however, selecting the multiresolution level

m (n)

within the wavelet framework is typically much simpler than choosing

h_{n}

. Because only a small, discrete set of candidate levels—usually three or four—needs to be examined, the procedure remains both conceptually transparent and computationally inexpensive. Adaptive choice of

m (n)

most commonly relies on Stein’s unbiased risk estimate, the classical rule of thumb, and cross-validation. Comprehensive treatments of these methods, as well as their use in establishing asymptotically optimal data-driven bandwidth selection rules, are provided by [38] and [21]. For the univariate setting (

d = 1

), the cross-validation criterion at resolution level j is given by

C V (j) = \sum_{k} [\frac{2}{n (n - 1)} \sum_{i = 1}^{n} {(ϕ_{j, k} (X_{i}))}^{2} - \frac{n + 1}{n^{2} (n - 1)} {(\sum_{i = 1}^{n} ϕ_{j, k} (X_{i}))}^{2}] .

The statistic

C V (j)

depends solely on the observations

X_{1}, \dots, X_{n}

and the resolution index j. The optimal level is selected by

j_{0} = arg min_{j} C V (j) .

Remark 3.

Kernel estimators suffer a well-documented decline in accuracy as the dimension of the covariate space grows—a manifestation of the curse of dimensionality [39]. In high dimensions, the volume of a local neighbourhood expands so rapidly that an impractically large sample is required to collect even a modest number of observations inside it. Unless the sample size is very large, one must therefore adopt bandwidths so wide that the resulting estimate is no longer truly local. A thorough exposition, complete with numerical illustrations, appears in [40], while more recent analyses are given in [41,42]. Although penalised splines enjoy considerable empirical popularity, their asymptotic behaviour is still not fully resolved even in classical non-parametric settings, and only a handful of theoretical investigations address the issue. In addition, most functional-regression techniques minimise an $L_2$ criterion and thus remain sensitive to outliers. Promising, if less conventional, alternatives include methods based on delta sequences and wavelet representations.

3.1. Asymptotic Normality Results

This section establishes a central limit theorem for the estimator defined in Equation (3). This problem was considered in several papers. In [10,43], the authors analyzed a non-parametric regression model with repeated measurements corrupted by mixing errors and, under mild moment and mixing-coefficient assumptions, established the r-th mean consistency, almost-sure consistency, complete consistency, optimal almost-sure convergence rates, and asymptotic normality of the associated wavelet estimator. For deconvolution density estimation with moderately ill-posed noise, ref. [44] proves the asymptotic normality of wavelet estimators when the target density lies in suitable Besov spaces. The study in [45] revisits non-parametric regression, deriving asymptotic normality of the wavelet estimator when the errors constitute an asymptotically negatively associated sequence. In a censored-data setting, ref. [46] treats density estimation where survival and censoring times form a stationary

α

-mixing sequence, again demonstrating asymptotic normality of the wavelet estimator. A fixed-design non-parametric regression model driven by strongly mixing errors is considered in [47], which likewise confirms asymptotic normality for the wavelet estimator of the regression function. Wavelet methods for linear processes—Gaussian or otherwise—with long, short, or negative memory are examined in [48]; both the log-regression and Whittle wavelet estimators are shown to be asymptotically normal, and explicit expressions for the limiting variance are derived via a general result for the suitably centred and normalised scalogram. Finally, ref. [49] proposes a wavelet estimator for fixed-design non-parametric regression with strictly stationary, associated errors, proving pointwise weak consistency and uniform asymptotic normality. For additional treatments of asymptotic normality of wavelet estimators in alternative frameworks, see [10,50,51]. Our results are obtained under mild regularity conditions on the estimator and only minimal bandwidth assumptions. We write

Z_{n} \overset{D}{\to} N (0, Σ^{2} (x))

to indicate that the sequence of random variables

{(Z_{n})}_{n \geq 1}

converges in distribution to a mean-zero normal distribution with covariance matrix

Σ^{2} (x)

.

Theorem 3.

Assume that the hypotheses (C.1)–(C.4), (N.0), (N.2)(ii) and (N.3)(iii) are satisfied. Additionally, suppose that

n h_{n}^{d + 2 | β | + 2 δ} \to 0 as n \to \infty .

(12)

Then, the following convergence holds:

\sqrt{n h_{n}^{d + 2 | β |}} ({\hat{(\partial^{β} r)}}_{n} (ρ; x) - (\partial^{β} r) (ρ; x)) \overset{D}{\to} N (0, Σ_{(β)}^{2} (x)),

where

Σ_{(β)}^{2} (x) : = m_{2} (ρ, x) f_{X} (x) \int_{R^{d}} {(K^{(β)})}^{2} (0, u) d u,

and

m_{2} (ρ, x) = E [ρ^{2} (Y) ∣ X = x] = \frac{1}{f_{X} (x)} \int_{R^{q}} ρ^{2} (y) f_{X, Y} (x, y) d y .

The proof of Theorem 3 is detailed in Section 7.

Remark 4.

Invoking Theorem 1, we obtain a refined bound for the mean-squared error of the wavelet-based estimator of multivariate density derivatives. Under the usual smoothness and moment assumptions,

E {∥{\hat{(\partial^{β} f)}}_{n} - \partial^{β} f∥}_{2}^{2} = O (n^{- \frac{2 (s - | β |)}{2 s + 1}}) .

A parallel analysis, this time relying on Theorem 3, yields an asymptotic normality result. Specifically, for any fixed

x \in R^{d}

,

\sqrt{n h_{n}^{d + 2 | β |}} ({\hat{(\partial^{β} f)}}_{n} (x) - \partial^{β} f (x)) \overset{D}{\to} N (0, {\tilde{Σ}}_{(β)}^{2} (x)),

with asymptotic variance

{\tilde{Σ}}_{(β)}^{2} (x) = f_{X} (x) \int_{R^{d}} {(K^{(β)})}^{2} (0, u) d u .

Both results are consistent with the theoretical developments in Theorem 4.8 of [52], further demonstrating the efficacy of wavelet methods for non-parametric estimation in multivariate contexts.

Remark 5.

A noteworthy case of

(\partial^{β} r) (ρ; \cdot)

arises when

ρ (y) = 1_{{y \leq t}}

. In this setting, from (3) we obtain

{\hat{(\partial^{β} r)}}_{n} (1_{{\cdot \leq t}}; x) = \sum_{k \in Z^{d}} {\overset{˘}{a}}_{τ, k} ϕ_{τ, k} (x),

where

{\overset{˘}{a}}_{τ, k}

is the unbiased estimator given by

{\overset{˘}{a}}_{τ, k} = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} 1_{{Y_{i} \leq t}} (\partial^{β} ϕ_{τ, k}) (X_{i}) .

Since this is a specific instance of the estimator in (3), Theorem 3 implies that

\sqrt{n h_{n}^{d + 2 β}} ({\hat{(\partial^{β} r)}}_{n} (1_{{\cdot \leq t}}; x) - \partial^{β} r (1_{{\cdot \leq t}}; x)) \overset{D}{\to} N (0, {\overset{˘}{Σ}}^{2} (x)),

where

{\overset{˘}{Σ}}^{2} (x) : = m_{2} (1_{{\cdot \leq t}}, x) f_{X} (x) \int_{R^{d}} {(K^{(β)})}^{2} (0, u) d u .

This result is pivotal for determining the asymptotic behavior of wavelet estimators derived from the conditional distribution.

3.2. Confidence Interval

The asymptotic variance

Σ_{(β)}^{2} (x)

that features in the central limit theorem depends on two unknown components—the conditional variance of Y given

X

and the density

f_{X} (\cdot)

of

X

—and therefore must be estimated in practice. We obtain consistent estimates by employing a bounded, compactly supported wavelet basis drawn from the literature, such as the widely used Daubechies family [36]. Using a sufficiently large multiresolution level

τ_{n}

and an adaptively chosen initial level $j_0$, we recover the requisite nuisance parameters via wavelet-based estimation and subsequently insert them through a plug-in scheme. Using the estimators (3) we establish

Σ_{n, (β)}^{2} (x)

a consistent estimate of the variance

Σ_{(β)}^{2} (x)

given by

Σ_{n, (β)}^{2} (x) : = m_{n, 2} (ρ, x) f_{n, X} (x) \int_{R^{d}} {(K^{(β)})}^{2} (0, u) d u,

(13)

where

f_{n, X} (\cdot)

is the wavelet estimator of

f_{X} (\cdot)

and

m_{n, 2} (ρ, x) = \sum_{k \in Z^{d}} {\hat{a}}_{j_{0} k}^{″} ϕ_{j_{0}, k} (x) + \sum_{j = j_{0}}^{τ} \sum_{i = 1}^{N} \sum_{k \in Z^{d}} {\hat{b}}_{i j k}^{″} ψ_{i, j, k} (x) .

(14)

The considered coefficients estimators are given, respectively, for any

j_{0} \leq τ

, by

{\hat{a}}_{τ, k}^{″} = \frac{1}{n} \sum_{i = 1}^{n} ρ^{2} (Y_{i}) ϕ_{τ, k} (X_{i}),

and

{\hat{b}}_{i j k}^{″} = \frac{1}{n} \sum_{i = 1}^{n} ρ^{2} (Y_{i}) ψ_{i, j, k} (X_{i}) .

The approximate confidence intervals of

(\partial^{β} r) (ρ; x)

can be obtained as

(\partial^{β} r) (ρ; x) \in [{\hat{(\partial^{β} r)}}_{n} (ρ; x) \pm c_{α} \frac{Σ_{n, (β)}^{2} (x)}{\sqrt{n h_{n}^{d + 2 | β |}}}],

where

c_{α}

denotes the

(1 - \frac{α}{2}) -

quantile of the standard normal condition.

4. Application to the Regression Derivatives

In this section, we will follow the same notation as in [20]. We will consider especially the conditional expectation of

ρ (Y)

given

X = x

, for

p = q = 1

. Recall that

\begin{matrix} m (x, ρ) & = E (ρ (Y) ∣ X = x) = \frac{1}{f_{X} (x)} \int_{R} ρ (y) f_{X, Y} (x, y) d y \\ = \frac{r (ρ, x)}{f_{X} (x)} . \end{matrix}

Recall the following derivatives

m_{ρ}^{'} (x) = \frac{r^{'} (ρ, x)}{f_{X} (x)} - \frac{r (ρ, x) f_{X}^{'} (x)}{f_{X}^{2} (x)},

(15)

and

\begin{matrix} m_{ρ}^{″} (x) & = & \frac{r^{″} (ρ, x)}{f_{X} (x)} - \frac{2 r^{'} (ψ, x) f_{X}^{'} (x)}{f_{X}^{2} (x)} \\ + \frac{r (ρ, x) {2 {(f_{X}^{'} (x))}^{2} - f_{X} (x) f_{X}^{″} (x)}}{f_{X}^{3} (x)} . \end{matrix}

(16)

In order to estimate the derivatives of

m_{ρ}^{'} (x)

in (15) and

m_{ρ}^{″} (x)

in (16) by replacing

f_{X} (\cdot),

f_{X}^{'} (\cdot)

,

f_{X}^{″} (\cdot)

,

r (ρ; \cdot)

,

r^{'} (ρ; \cdot)

and

r^{″} (ρ; \cdot)

by

f_{X; n} (\cdot, h_{n})

,

f_{X; n}^{'} (\cdot, h_{n})

,

f_{X; n}^{″} (\cdot, h_{n})

,

r_{n} (ρ; \cdot, h_{n})

,

r_{n}^{'} (ρ; \cdot, h_{n})

and

r_{n}^{″} (ρ; \cdot, h_{n})

. We so define

m_{ρ, n}^{'} (x; h_{n})

and

m_{ρ, n}^{″} (x; h_{n})

when

f_{X; n} (x, h_{n}) \neq 0

. The definition of

m_{ρ, n}^{'} (x; h_{n})

and

m_{ρ, n}^{″} (x; h_{n})

is completed by setting

m_{ρ, n}^{'} (x; h_{n}) = m_{ρ, n}^{″} (x; h_{n}) = 0

when

f_{X; n} (x, h_{n}) = 0

.

The following theorem is more or less a straightforward consequence of Theorem 2.

Corollary 1.

Under the assumptions of Theorem 2, we have

\begin{matrix} sup_{x \in J} |m_{ρ, n}^{'} (x; h_{n}) - m_{ρ}^{'} (x)| = o_{P} (1), \\ sup_{x \in J} |m_{ρ, n}^{″} (x; h_{n}) - m_{ρ}^{″} (x)| = o_{P} (1) . \end{matrix}

5. Mode Regression

Now the location

θ

and size

m_{ρ} (θ)

of the mode of

m_{ρ} (\cdot)

are estimated from the respective functionals

{\hat{θ}}_{n}

and

{\hat{m}}_{ρ, n} ({\hat{θ}}_{n})

pertaining to

{\hat{m}}_{ρ, n} (\cdot)

, i.e.,

{\hat{θ}}_{n}

is chosen through the equation

\begin{matrix} {\hat{m}}_{ρ, n} ({\hat{θ}}_{n}) = max_{x \in I} {\hat{m}}_{ρ, n} (x) . \end{matrix}

(17)

Then, a natural estimator of

θ

is

{\hat{θ}}_{n} : = arg max_{x \in I} {\hat{m}}_{ρ, n} (x) .

(18)

Note that the estimate

{\hat{θ}}_{n}

is not necessarily unique and our results are valid for any chosen value satisfying (18). We point out that we can specify our choice by taking

\begin{matrix} {\hat{θ}}_{n} = inf \{y \in R such that {\hat{m}}_{ρ, n} (y) = sup_{x \in R} {\hat{m}}_{ρ, n} (x)\} . \end{matrix}

It is known that kernel estimators tend to produce some additional and superfluous modality. However, this has no bearing on the asymptotic theory; our results are valid for any choice of

{\hat{θ}}_{n}

satisfying (18). Following the paper [53], we have the convergence result given in the following corollary.

Corollary 2.

Under the assumptions of Theorem 2, we have

\begin{matrix} |{\hat{θ}}_{n} - θ| = o_{P} (1) . \end{matrix}

Remark 6.

Using weak-convergence techniques in the space of continuous functions, ref. [54] strengthened Parzen’s distributional result under broadly weaker assumptions. He showed that an appropriately rescaled kernel-density estimator converges weakly to a randomly translated parabola that passes through the origin and possesses a fixed second derivative. In [55], the key insight is that, on a shrinking neighbourhood around the mode, the estimator f converges weakly—under very general conditions—to a Gaussian process whose mean function is a parabola through the origin. This mean depends on the kernel moments and on the unknown density and its derivatives evaluated at the mode. More precisely, define

Z_{n} (t) = h_{n}^{- 2} [{\hat{f}}_{n} (\overset{˘}{θ} + h_{n} t) - {\hat{f}}_{n} (\overset{˘}{θ})], t \in R,

where

\overset{˘}{θ}

denotes the mode of f. The process

Z_{n}

is thus a normalised version of the estimator in an interval centred at the mode. The limiting Gaussian process

Z (t)

is given by

\begin{matrix} Z (t) = Y_{α} (t) + {(- 1)}^{p + 1} \frac{d}{c p!} f^{(p + 1)} (\overset{˘}{θ}) B_{p} t + \frac{1}{2} f^{(2)} (\overset{˘}{θ}) t^{2}, t \in R, \end{matrix}

where

c > 0

,

d \geq 0

,

p \geq 2

is fixed,

B_{p} = \int x^{p} K (x) d x,

and

Y_{α}

is a mean-zero Gaussian process with covariance

R (s, t) = \frac{f (\overset{˘}{θ})}{c^{2 + α}} V_{α} (s / t) s t^{1 - α}, 0 \leq α \leq 1, | s | \leq | t | .

The function

V_{α}

is defined by

V_{α} (γ) = lim_{δ \to 0} δ^{α} \int [\frac{K (γ δ - x) - K (- x)}{γ δ}] [\frac{K (δ - x) - K (- x)}{δ}] d x .

Ref. [55] establishes weak convergence of

{Z_{n}}

to Z for independent data. A central step in his Theorem 2.1 is to express

Z_{n} (t)

as an average of i.i.d. variables, facilitating covariance computations and convergence of finite-dimensional distributions. Tightness is obtained via Theorem 15.7 of [56], which reduces the argument to evaluating the variance of independent variables (relation 2.9). The present study tackles the same problem via alternative methods, extending the framework of [18] to censored, dependent observations. Adapting Eddy’s approach to this setting would be an attractive direction for future work. Such an extension, however, requires new probabilistic results analogous to those in [55] but tailored to dependent—e.g., mixing—samples, a significant undertaking that we leave for subsequent research.

Remark 7.

We note that, when

| s | \geq 2

,

m_{ρ, n}^{(| s |)} (x, h_{n}) = D^{| s |} (m_{ρ, n} (x, h_{n}))

may be obtained likewise through the usual Leibniz expansion of derivatives of products given by

m_{ρ, n}^{(| s |)} (x, h_{n}) = \sum_{| j | = 0}^{| s |} C_{| s |}^{| j |} r_{n}^{(| j |)} (ρ; x, h_{n}) {\{f_{X; n}^{- 1} (x, h_{n})\}}^{(| s | - | j |)}, f_{X; n} (x, h_{n}) \neq 0 .

Remark 8.

It is well known that the accuracy of kernel estimators deteriorates as the dimension of the predictor space grows. This phenomenon—popularly termed the curse of dimensionality [39]—stems from the fact that, in high dimensions, an impractically large number of observations is needed within any local neighborhood to secure a dependable estimate. When the available sample size is modest, practitioners are forced to adopt a bandwidth so wide that the very notion of “local” averaging effectively collapses. A thorough treatment of these difficulties, complemented by illustrative numerical experiments, can be found in [40]; see also the more recent contributions [41,42,57] for additional insights. Despite their popularity, penalized splines remain theoretically under-explored: rigorous results describing their asymptotic behaviour are sparse even in classical non-parametric settings. Moreover, a large proportion of functional-regression techniques are predicated on minimising the $L_2$ loss, rendering them particularly susceptible to outlying observations. More robust alternatives—such as methods based on delta sequences or on wavelet decompositions—offer promising directions for mitigating this sensitivity.

Remark 9.

Let us recall some integral functions of the density function

T_{1} (F) = \int_{R} {(f^{'} (x))}^{2} d x, T_{2} (F) = - \int_{R} a (F (x)) f^{'} (x) d x

and

T_{3} (F) = \int_{R} {(f (x))}^{2} d x .

Notice that the functional

T_{3} (F)

is a special case of

T_{2} (F)

. The functionals

T_{1} (F)

and

T_{3} (F)

appear in plug-in data-driven bandwidth selection procedures in density estimation (refer to [58] and the references therein) and the functional

T_{2} (F)

arises as part of the variance in non-parametric location and regression estimation based on linear rank statistics (refer especially to [59]. Consider the following general class of integral functionals of the density:

T (F) = \int_{R} ρ (x, F (x), F^{(1)} (x), \dots, F^{(r)} (x)) d F (x),

where F is a cumulative distribution function on

R

with

r \geq 1

derivatives

F^{(m)}

, refer to the [60,61] for more details. One can estimate

T (F)

the plug-in methods making use of the wavelet estimate of the density function and its derivatives. The proof of such a statement, however, should require a different methodology than that used in the present paper, and we leave this problem open for future research.

Remark 10.

The nonlinear thresholding framework furnishes an alternative class of estimators for the unknown density

f_{X} (x)

. Specifically,

{\hat{f}}_{X} (x) = \sum_{k \in Z^{d}} {\hat{a}}_{j_{0}, k} ϕ_{j_{0}, k} (x) + \sum_{j = j_{0}}^{τ} \sum_{i = 1}^{N} \sum_{k \in Z^{d}} {\hat{b}}_{i j k} 1 \{|{\hat{b}}_{i j k}| > δ_{j, n}\} ψ_{i, j, k} (x),

(19)

where

δ_{j, n}

denotes an appropriately selected threshold. In the univariate case (

d = 1

), this estimator was first proposed by [29]. Extending its theoretical guarantees and empirical performance to higher-dimensional settings remains an intriguing direction for future research.

Remark 11.

The sup-norm convergence rates proved in our theorems coincide with the optimal rates reported by [62,63,64]. The exact logarithmic factors depend on the resolution level, which is itself governed by the smoothness index s of the target function f in the Besov space

B_{s, p, q}

. Such a dependence on s is a hallmark of non-parametric estimation and is widely documented in the literature. By allowing f to possess general Besov regularity, our analysis relaxes the customary requirement of an integer-order derivative that underpins classical convolution-kernel methods, even though s is typically unknown in practice.

Remark 12.

To demonstrate that the assumptions stated in [65] are attainable, we provide three illustrative examples in which they are fulfilled:

Long-memory discrete-time processes: Let ${(ϵ_{t})}_{t \in Z}$ be white noise with variance $σ^{2}$ ; denote the identity and back-shift operators by I and B, respectively. According to [66] (Theorem 1, p. 55), the k-factor Gegenbauer process satisfies

$\prod_{i \leq i \leq k} {(I - 2 v_{i} B + B^{2})}^{d_{i}} X_{t} = ϵ_{t},$

with $0 < d_{i} < 1 / 2$ when $| ν_{i} | < 1$ and $0 < d_{i} < 1 / 4$ when $| ν_{i} | = 1$ for $i = 1, \dots, k$ . This specification yields a stationary, causal, invertible series that exhibits long-range dependence. Moreover, it admits the moving-average representation

$X_{t} = \sum_{j \geq 0} ψ_{j} (d, v) ϵ_{t - j},$

and the condition

$\sum_{j = 0}^{\infty} ψ_{j}^{2} (d, v) < \infty$

guarantees asymptotic stability. Nevertheless, ref. [67] shows that if ${(ϵ_{t})}_{t \in Z}$ is Gaussian, the process is not strongly mixing. Even so, the moving-average form secures stationarity, Gaussianity, and ergodicity, clarifying the subtle influence of mixing conditions and emphasizing the interpretive value of the moving-average representation for long-memory dynamics.
Stationary solution of a linear Markov $AR (1)$ process: Consider

$X_{i} = \frac{1}{2} X_{i - 1} + ϵ_{i},$

where $(ϵ_{i})$ are independent symmetric Bernoulli variables taking the values $- 1$ and 1. As shown in [68], this model is not α-mixing because of its dependence structure. It nevertheless remains stationary, Markov, and ergodic, illustrating that strong mixing is not necessary for either Markovianity or ergodicity—a point of direct relevance to statistical inference for time series and functional data.
A stationary process with an $AR (1)$ representation: Let $(u_{i})$ be an independent and identically distributed sequence uniformly distributed on ${1, \dots, 9}$ , and define

$X_{t} : = \sum_{i = 0}^{\infty} 10^{- i - 1} u_{t - i},$

so that $u_{t}, u_{t - 1}, \dots$ constitute the decimal expansion of $X_{t}$ . The series is stationary and can be written in $AR (1)$ form:

$X_{t} = \frac{1}{10} X_{t - 1} + \frac{1}{10} u_{t} = \frac{1}{10} X_{t - 1} + \frac{1}{2} + ϵ_{t},$

where

$ϵ_{t} = \frac{1}{10} u_{t} - \frac{1}{2}$

is strong white noise. Although it fails the α-mixing criterion [69] (Example A.3, p. 349), the process is ergodic. This confirms that ergodicity may persist even in the absence of strong mixing, underscoring its suitability for non-parametric functional data analysis.

Remark 13.

The paper already includes several well-known examples from the literature of processes that are ergodic yet not mixing. Broadening this to encompass more general linear Markov models—such as ARMA

(p, q)

or ARIMA

(p, d, q)

with

q \neq 0

—would certainly be of considerable interest. Nonetheless, pursuing this extension would necessitate a meticulous and comprehensive study to rigorously delineate the circumstances under which such processes remain ergodic while failing to satisfy mixing conditions.

6. Concluding Remarks

This study tackles the estimation of partial derivatives of multivariate regression functions. We introduce a family of non-parametric estimators based on linear wavelet methods and provide a rigorous theoretical analysis. Specifically, we establish strong uniform consistency on compact subsets of

R^{d}

and derive the corresponding convergence rates. We also prove the estimators’ asymptotic normality and extend these results to ergodic processes, thereby broadening the scope of existing theory.

A key open problem is how to select smoothing parameters optimally so as to minimise the mean-squared error; addressing this question will be the focus of future work. Other promising directions include extending the methodology to functional ergodic data—a task that presents substantial mathematical challenges—and adapting it to settings with incomplete observations, such as data missing at random or subject to various censoring mechanisms, particularly in spatially dependent contexts.

Further progress could come from relaxing the stationarity assumption to accommodate locally stationary processes and from developing comparable results for stationary continuous-time models. Both extensions would require fundamentally different analytical tools.

Finally, extensive numerical experiments on simulated and real datasets would enhance the practical relevance of our procedures, and the development of weighted bootstrap techniques—building on recent contributions such as [70,71]—offers another fruitful avenue for investigation.

7. Proofs

We derive an upper bound for the partial sums of unbounded martingale differences—a key ingredient in analyzing the asymptotic properties of the wavelet estimator based on strictly stationary, ergodic observations. Throughout the paper, C denotes a generic positive constant that may change from one occurrence to the next. The necessary inequality is presented in the ensuing lemmas.

Lemma 2.

(Burkholder-Rosenthal inequality) Following Notation 1 in [72].

Let

{(X_{i})}_{i \geq 1}

be a stationary martingale adapted to the filtration

{(F_{i})}_{i \geq 1}

, define

{(d_{i})}_{i \geq 1}

is the sequence of martingale differences adapted to

{(F_{i})}_{i \geq 1}

and

S_{n} = \sum_{i = 1}^{n} d_{i},

then for any positive integer n,

∥ max_{1 \leq j \leq n} | S_{j} {|) ∥}_{p} ≪ n^{1 / p} {∥ d_{1} ∥}_{p} + {∥\sum_{k = 1}^{n} E (d_{k}^{2} / F_{k - 1})∥}_{p / 2}^{1 / 2}, f o r a n y p \geq 2;

(20)

where as usual the norm

{∥ \cdot ∥}_{p} = {(E [| \cdot |^{p}])}^{1 / p} .

Lemma 3.

Let

{(Z_{n})}_{n \geq 1}

be a sequence of real martingale differences with respect to the sequence of

σ -

fields

{(F_{n} = σ (Z_{1}, \dots, Z_{n}))}_{n \geq 1}

, where is the σ-field generated by the random variables

Z_{1}, \dots, Z_{n}

. Set

S_{n} = \sum_{i = 1}^{n} Z_{i} .

For any

p \geq 2

and any

n \geq 1

, assume that there exists some nonnegative constants C and

d_{n}

such that

E [Z_{n}^{p} | F_{n - 1}] \leq C^{p - 1} p! d_{n}^{2}, a l m o s t s u r e l y .

Then, for any

ϵ > 0

, we have

P (| S_{n} | > ϵ) \leq 2 exp \{- \frac{ϵ^{2}}{2 (D_{n} + C ϵ)}\},

where

D_{n} = \sum_{i = 1}^{n} d_{i}^{2} .

Proof of Lemma 3.

The proof follows as a particular case of Theorem 8.2.2 due to [73]. □

To prove Theorem 1, we utilize the following two lemmas. Define the conditional expectation of

a_{τ, k}

{\tilde{a}}_{τ, k} = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} E [ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) | F_{i - 1}] .

(21)

Lemma 4.

For any

k \in Z^{d}

, under the assumptions (C.1) and (N.2)(i), the following holds

{\tilde{a}}_{τ, k} = a_{τ, k}, as n \to \infty .

(22)

Proof of Lemma 4.

Observe that by invoking assumptions (N.2)(i) and (C.1), and the fact that

F_{i - 1} \subset G_{i - 1}

, we can further expand

{\tilde{a}}_{τ, k}

as follows:

\begin{matrix} {\tilde{a}}_{τ, k} & = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} E [E [ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) | G_{i - 1}] | F_{i - 1}] \\ = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} E [E [ρ (Y_{i}) | X_{i}] (\partial^{β} ϕ_{τ, k}) (X_{i}) | F_{i - 1}] \\ = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} \int_{R^{d}} m_{ρ} (x) (\partial^{β} ϕ_{τ, k}) (x) f^{F_{j - 1}} (x) d x \\ = {(- 1)}^{| β |} \int_{R^{d}} m_{ρ} (x) (\partial^{β} ϕ_{τ, k}) (x) (\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (x)) d x \\ = {(- 1)}^{| β |} \int_{R^{d}} m_{ρ} (x) (\partial^{β} ϕ_{τ, k}) (x) (f (x) + o (1)) d x \\ = {(- 1)}^{| β |} \int_{R^{d}} r (ρ, x) (\partial^{β} ϕ_{τ, k}) (x) d x + o (1) \\ = a_{τ, k} + o (1) . \end{matrix}

(23)

Consequently,

{\tilde{a}}_{τ, k} = a_{τ, k} as n \to \infty,

(24)

□

Lemma 5.

For any

k \in Z^{d}

, under the assumption (N.2)(i),the following holds:

E [{\hat{a}}_{τ, k}] = a_{τ, k}, as n \to \infty a . s .

(25)

Proof of Lemma 5.

Following a similar argument as for (23) and using the property

E [E [\cdot | F]] = E [\cdot]

where

F

is a sigma field, under the assumption (N.2)(i), we have

\begin{matrix} E [{\hat{a}}_{τ, k}] & = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} E [ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i})] \\ = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} E [E [ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) | X_{i}]] \\ = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} E [m_{ρ} (X_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i})] \\ = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} \int_{R^{d}} m_{ρ} (x) (\partial^{β} ϕ_{τ, k}) (x, y) f (x) d x d y \\ = {(- 1)}^{| β |} \int_{R^{d} \times R} \frac{r (ρ, x)}{f (x)} (\partial^{β} ϕ_{τ, k}) (x) f (x) d x \end{matrix}

(26)

\begin{matrix} = {(- 1)}^{| β |} \int_{R^{d} \times R} r (ρ, x) (\partial^{β} ϕ_{τ, k}) (x) d x \end{matrix}

(27)

which implies

E [{\hat{a}}_{τ, k}] = a_{τ, k}, as n \to \infty . a . s .

(28)

This completes the proof of (25). □

Lemma 6.

For any

k \in Z^{d}

, under the assumptions (C.1), (C.4)(i), (N.0), and (N.2)(i), the following holds:

E [{({\hat{a}}_{τ, k} - a_{τ, k})}^{2}] = O (\frac{2^{2 m (d / 4 + | β |)}}{n}), as n \to \infty .

(29)

Proof of Lemma 6.

We present the following decomposition:

{\hat{a}}_{τ, k} - a_{τ, k} = ({\hat{a}}_{τ, k} - {\tilde{a}}_{τ, k}) + ({\tilde{a}}_{τ, k} - a_{τ, k}) .

(30)

Therefore, we conclude that

\begin{matrix} E [{({\hat{a}}_{τ, k} - a_{τ, k})}^{2}] & = E [{({\hat{a}}_{τ, k} - {\tilde{a}}_{τ, k})}^{2}] + E [{({\tilde{a}}_{τ, k} - a_{τ, k})}^{2}] \\ + 2 E [({\hat{a}}_{τ, k} - {\tilde{a}}_{τ, k}) ({\tilde{a}}_{τ, k} - a_{τ, k})] \\ = A_{m, k, 1} + A_{m, k, 2} + A_{m, k, 3} . \end{matrix}

(31)

By applying using Lemma 4 and its assumptions, especially statement (22), one obtains

A_{m, k, 2} = o (1) .

(32)

Moreover, using the same statement (22), we deduce

\begin{matrix} A_{m, k, 3} & = o (1) E [{\hat{a}}_{τ, k} - {\tilde{a}}_{τ, k}] \\ = o (1) \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} (E [ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i})] - E [E [ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) | F_{i - 1}]]) \\ = 0 . \end{matrix}

(33)

We now direct our focus to the first term in the decomposition (31). Observe that

\begin{matrix} {\hat{a}}_{τ, k} - {\tilde{a}}_{τ, k} & = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} (ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) - E [ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) | F_{i - 1}]) \\ = \frac{{(- 1)}^{| β |}}{n} \sum_{i = 1}^{n} Φ_{τ, k} (X_{i}, Y_{i}) . \end{matrix}

Observe that

{(Φ_{τ, k} (X_{i}))}_{0 \leq k \leq m_{n}}

forms a martingale difference sequence with respect to the filtration

{(F_{i})}_{0 \leq k \leq m_{n}}

. Applying Lemma 2, we directly conclude:

A_{m, k, 1} = \frac{{(- 1)}^{2 | β |}}{n^{2}} E [{|\sum_{i = 1}^{n} Φ_{τ, k} (X_{i}, Y_{i})|}^{2}] .

Moreover, the following holds:

\begin{matrix} {(E [{|\sum_{i = 1}^{n} Φ_{τ, k} (X_{i}, Y_{i})|}^{2}])}^{\frac{1}{2}} \\ \leq \sqrt{n} {∥ Φ_{τ, k} (X_{1}, Y_{1}) ∥}_{2} + {∥\sum_{i = 1}^{n} E [Φ_{τ, k}^{2} (X_{i}, Y_{i}) | F_{i - 1}]∥}_{1}^{\frac{1}{2}} \\ = Φ_{(1)} + Φ_{(2)} . \end{matrix}

(34)

To examine these components, we employ a standard decomposition approach and observe that

F_{0}

corresponds to the trivial

σ

-algebra. Consequently, the following result is derived:

\begin{matrix} \frac{1}{n} Φ_{(1)}^{2} & = ∥ Φ_{τ, k} (X_{1}, Y_{1}) ∥_{2}^{2} \\ = E [{|ρ (Y_{1}) (\partial^{β} ϕ_{τ, k}) (X_{1}) - E [ρ (Y_{1}) (\partial^{β} ϕ_{τ, k}) (X_{1}) | F_{0}]|}^{2}] \\ \leq \sum_{j = 0}^{2} (\binom{2}{j}) E [{|ρ (Y_{1}) (\partial^{β} ϕ_{τ, k}) (X_{1})|}^{j} {(E [|ρ (Y_{1}) (\partial^{β} ϕ_{τ, k}) (X_{1})|])}^{2 - j}] \\ = \sum_{j = 0}^{2} (\binom{2}{j}) E [{|ρ (Y_{1}) (\partial^{β} ϕ_{τ, k}) (X_{1})|}^{j}] {(E [|ρ (Y_{1}) (\partial^{β} ϕ_{τ, k}) (X_{1})|])}^{2 - j} \\ = E [{|ρ (Y_{1}) (\partial^{β} ϕ_{τ, k}) (X_{1})|}^{2}] + 3 {(E [|ρ (Y_{1}) (\partial^{β} ϕ_{τ, k}) (X_{1})|])}^{2} \\ \leq 4 E [{|ρ (Y_{1}) (\partial^{β} ϕ_{τ, k}) (X_{1})|}^{2}] . \end{matrix}

(35)

By invoking the Cauchy-Schwarz inequality we obtain

\begin{matrix} E [{|ρ (Y_{1}) (\partial^{β} ϕ_{τ, k}) (X_{1})|}^{2}] & \leq & E^{1 / 2} [{|ρ (Y_{1})|}^{4}] E^{1 / 2} [{|(\partial^{β} ϕ_{τ, k}) (X_{1})|}^{4}] \end{matrix}

Under assumption (N.0), there exists a constant

C_{ρ, 4}

such that

E [{|ρ (Y_{1})|}^{4}] \leq C_{ρ, 4} < \infty .

on the other hand, by employing a first-order Taylor expansion alongside Equation (1) and assumption (C.4)(i), we obtain

\begin{matrix} E [{|(\partial^{β} ϕ_{τ, k}) (X_{1})|}^{4}] & = \int_{R^{d}} {(\partial^{β} ϕ_{τ, k})}^{4} (u) f (u) d u \\ = 2^{m (2 d + 4 | β |)} \int_{R^{d}} {(\partial^{β} ϕ)}^{4} (2^{m} u - k) f (u) d u \\ = 2^{m (d + 4 | β |)} \int_{R^{d}} {(\partial^{β} ϕ)}^{4} (v) f (\frac{v + k}{2^{m}}) d v . \\ = 2^{m (d + 4 | β |)} \int_{R^{d}} {(\partial^{β} ϕ)}^{4} (v) (f (v) + O (2^{- m d})) d v \\ \leq 2^{m (d + 4 | β |)} {(\frac{A_{i}}{{(1 + ∥ x ∥)}^{i}})}^{4} \\ = O (2^{4 m (d / 4 + | β |)}) . \end{matrix}

(36)

This leads to

\begin{matrix} Φ_{(1)} & = & O (2^{m (d / 4 + | β |)} n^{1 / 2}) . \end{matrix}

(37)

Next, we analyze the second component of the decomposition in Equation (34). Specifically, we consider

\begin{matrix} Φ_{(2)} & = {(E (\sum_{i = 1}^{n} E [Φ_{τ, k}^{2} (X_{i}, Y_{i}) | F_{i - 1}]))}^{1 / 2} \\ = {(\sum_{i = 1}^{n} E (E [Φ_{τ, k}^{2} (X_{i}, Y_{i}) | F_{i - 1}]))}^{1 / 2} \\ = {(\sum_{i = 1}^{n} E [Φ_{τ, k}^{2} (X_{i}, Y_{i})])}^{1 / 2} . \end{matrix}

Applying a standard identity, we find

\begin{matrix} E [Φ_{τ, k}^{2} (X_{i}, Y_{i})] & = E [ρ (Y_{i}) {((\partial^{β} ϕ_{τ, k}) (X_{i}) - E [ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) | F_{i - 1}])}^{2}] \\ \leq 2 E [∣ ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) ∣^{2}] + 2 E [E [∣ ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) ∣^{2} | F_{i - 1}]] \\ \leq 4 E [∣ ρ (Y_{i}) (\partial^{β} ϕ_{τ, k}) (X_{i}) ∣^{2}] . \end{matrix}

Using Equation (36), it follows that

\begin{matrix} Φ_{(2)} & = & O (2^{m (d / 4 + | β |)} n^{1 / 2}) . \end{matrix}

(38)

Combining the results from Equations (42) and (38), we derive

\begin{matrix} {(E [{|\sum_{i = 1}^{n} Φ_{τ, k} (X_{i}, Y_{i})|}^{2}])}^{\frac{1}{2}} = O (2^{m (d / 4 + | β |)} n^{1 / 2}) . \end{matrix}

Consequently,

\begin{matrix} A_{m, k, 1} & = E [{({\hat{a}}_{τ, k} - {\tilde{a}}_{τ, k})}^{2}] \\ = \frac{{(- 1)}^{2 | β |}}{n^{2}} E [{|\sum_{i = 1}^{n} Φ_{τ, k} (X_{i}, Y_{i})|}^{2}] \\ = O (2^{2 m (d / 4 + | β |)} n^{- 1}) . \end{matrix}

(39)

Finally, by integrating the findings from Equations (32), (33), and (39), we conclude that

E [{({\hat{a}}_{τ, k} - a_{τ, k})}^{2}] = O (\frac{2^{2 m (d / 4 + | β |)}}{n}) .

This completes the proof. □

Proof of Lemma 1.

The analysis of the bias term is purely analytical and follows from arguments analogous to those in [24], as it remains unaffected by the dependence structure. For brevity, we omit the detailed derivation. □

Proof of Theorem 1.

Building on standard wavelet estimation techniques (see [21] for a detailed exposition), we derive the main result in the following manner. First, by applying the definition of projector normality, we obtain

\begin{matrix} E {∥{\hat{(\partial^{β} r)}}_{n} (ρ; x) - (\partial^{β} r) (ρ; x)∥}_{2}^{2} \\ = E {∥\sum_{k \in Z^{d}} ({\hat{a}}_{τ, k} - a_{τ, k}) ϕ_{τ, k}∥}_{2}^{2} = \sum_{k \in Z^{d}} E [{({\hat{a}}_{τ, k} - a_{τ, k})}^{2}] . \end{matrix}

Next, by invoking Lemma 6 and using the facts that

| V_{ℓ} | \sim 2^{m}

and

2^{m} \sim n^{\frac{1}{2 s + 1}}

, we deduce

\begin{matrix} E {∥ {\hat{(\partial^{β} r)}}_{n} (ρ; x) - (\partial^{β} r) (ρ; x) ∥}_{2}^{2} & = (n^{\frac{1}{2 s + 1}}) O (\frac{2^{2 m (d / 4 + | β |)}}{n}) \\ = O (n^{- \frac{2 (s - | β | - d / 4)}{2 s + 1}}) . \end{matrix}

□

Proof of Theorem 2.

Now, we consider the following decomposition:

\begin{matrix} sup_{x \in D} |{\hat{(\partial^{β} r)}}_{n} (ρ; x) - E [{\hat{(\partial^{β} r)}}_{n} (ρ; x)]| \\ \leq sup_{x \in D} |{\hat{(\partial^{β} r)}}_{n} (ρ; x) - {\bar{r}}_{n} (ρ; x)| + sup_{x \in D} |{\bar{r}}_{n} (ρ; x) - E [{\hat{(\partial^{β} r)}}_{n} (ρ; x)]| \\ = G_{n, 1} (ρ; x) + G_{n, 2} (ρ; x) . \end{matrix}

(40)

To establish the uniform consistency of the above term we will follow the method used in ([74]). Let us introduce the truncated version of

{\hat{(\partial^{β} r)}}_{n} (ρ; x)

as follows. Let

{\hat{a}}_{τ, k}^{T} = \frac{1}{n} \sum_{i = 1}^{n} ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} (\partial^{β} ϕ_{τ, k}) (X_{i}),

(41)

and

{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x) = \sum_{k \in Z^{d}} {\hat{a}}_{τ, k}^{T} ϕ_{τ, k} (x) .

(42)

Throughout this work, the notation

1 {A}

represents the indicator function for the set A. In a similar way as in the (10), we write

{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x)

as an extended kernel estimator

{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x) = \frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{β} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) .

(43)

Furthermore, we define:

{\bar{r}}_{n} (ρ; x) = \frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} E [ρ (Y_{i}) K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) ∣ F_{i - 1}] .

and

{\bar{r}}_{n}^{T} (x) = \frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} E [ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{β} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}] .

(44)

We begin by breaking down the initial term of the Equation (40),

G_{n, 1} (ρ; x)

into three distinct components, expressed as a sum through the following formulation:

\begin{matrix} G_{n, 1} (ρ; x) & = & sup_{x \in D} |{\hat{(\partial^{β} r)}}_{n} (ρ; x) - {\bar{r}}_{n} (ρ; x)| \\ \leq & sup_{x \in D} |{\hat{(\partial^{β} r)}}_{n} (ρ; x) - {\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x)| + sup_{x \in D} |{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x) - {\bar{r}}_{n}^{T} (ρ; x)| \\ + sup_{x \in D} |{\bar{r}}_{n}^{T} (ρ; x) - {\bar{r}}_{n} (ρ; x)| \\ = & W_{n, 1} (ρ; x) + W_{n, 2} (ρ; x) + W_{n, 3} (ρ; x) . \end{matrix}

(45)

Recalling statement (7), we obtain readily that

sup_{x \in D} |{\hat{(\partial^{β} r)}}_{n} (ρ; x) - {\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x)| \leq \frac{{(- 1)}^{| β |} h_{n} C_{d + | β | + 1}}{n} \sum_{i = 1}^{n} |ρ (Y_{i}) 1 {| ρ (Y_{i}) | > T_{n}}| .

(46)

Markov inequality, in turn, implies that

P (| ρ (Y_{n}) | > T_{n}) \leq T_{n}^{- ν} E (| ρ (Y_{n}) |^{ν}),

under assumption (N.0) and using the fact that

\sum_{n > 1} T_{n}^{- ν} < \infty,

By the Borel-Cantelli lemma,

| ρ (Y_{n}) | \leq T_{n}

almost surely for all sufficiently large n. Given the monotonicity of

T_{n}

, this implies

| ρ (Y_{i}) | \leq T_{n}

for all

i \leq n

. Combining these results with the inequality (46) as in [74], we conclude that

W_{n, 1} (ρ; x) = sup_{x \in D} |{\hat{(\partial^{β} r)}}_{n} (ρ; x) - {\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x)| = o (h_{n}), a . s .

(47)

Once more, by (7), we infer that

\begin{matrix} sup_{x \in D} |{\bar{r}}_{n}^{T} (ρ; x) - {\bar{r}}_{n} (ρ; x)| \\ \leq \frac{1}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} |E [ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{β} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}] \\ - E [ρ (Y_{i}) K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) ∣ F_{i - 1}]| \\ \leq \frac{1}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} E [ρ (Y_{i}) 1 {| ρ (Y_{i}) | > T_{n}} K^{β} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}] \\ \leq \frac{1}{n} \sum_{i = 1}^{n} E [ρ (Y_{i}) 1 {| ρ (Y_{i}) | > T_{n}} \frac{C_{d + | β | + 1}}{h_{n}^{d + | β |} (1 + ∥ x - X_{i} {∥_{2} h_{n}^{- 1})}^{d + | β | + 1}} | F_{i - 1}] \\ \leq \frac{h_{n} C_{d + | β | + 1}}{n} \sum_{i = 1}^{n} E [ρ (Y_{i}) 1 {| ρ (Y_{i}) | > T_{n}} | F_{i - 1}] . \end{matrix}

Observe that the measurability of the functions

ρ (\cdot)

and

v \mapsto v^{q}

, combined with the ergodicity of

{(Y_{n})}_{n \in N}

, ensures that the process

{(g^{F_{i - 1}} (v))}_{n \in N}

satisfies condition (N.1). Applying the Hölder and Markov inequalities, for any

ϵ > 0

and exponents

p, q

fulfilling

\frac{1}{p} + \frac{1}{q} = 1,

that

\begin{matrix} E [| ρ (Y_{i}) | 1 {| ρ (Y_{i}) | > T_{n}} | F_{i - 1}] \\ \leq {(E [| ρ (Y_{i}) |^{q} | F_{i - 1}])}^{1 / q} {(P {| ρ (Y_{i}) | > T_{n} | F_{i - 1}})}^{1 / p} \\ \leq T_{n}^{- q} E [| ρ (Y_{i}) |^{q} | F_{i - 1}] \\ = T_{n}^{- q} \int_{R} {| ρ (v) |}^{q} g^{F_{i - 1}} (v) d v . \end{matrix}

This, in turn, readily implies that

\begin{matrix} sup_{x \in D} |{\bar{r}}_{n}^{T} (ρ; x) - {\bar{r}}_{n} (ρ; x)| \\ \leq h_{n} C_{d + | β | + 1} sup_{y \in R} ∥\frac{1}{n g (y)} \sum_{i = 1}^{n} g^{F_{i - 1}} (y)∥ T_{n}^{- q} \int_{R} {| ρ (v) |}^{q} g (v) d v \\ = h_{n} sup_{y \in R} ∥\frac{1}{n g (y)} \sum_{i = 1}^{n} g^{F_{i - 1}} (y)∥ T_{n}^{- q} E [| ρ (Y_{i}) |^{q}], \end{matrix}

which gives

W_{n, 3} (ρ; x) = sup_{x \in D} |{\bar{r}}_{n}^{T} (ρ; x) - {\bar{r}}_{n} (ρ; x)| = O (h_{n}) a . s .

(48)

Lastly, we turn our attention to second term of the decomposition (45), given the compactness of D, there exists a finite covering by

L (n)

cubes

I_{n, j}

, each centered at

x_{n, j}

with side length

ℓ_{n}

, where

j = 1, \dots, L (n)

. It follows directly that

ℓ_{n} = \frac{const}{L^{1 / d} (n)} .

where

L (n) = {(\frac{2^{(d + 2) m (n)} n T_{n}^{2}}{log n})}^{d / 2} = {(\frac{n T_{n}^{2}}{h_{n}^{d + 2} log n})}^{d / 2} .

we readily infer that

\begin{matrix} W_{n, 2} (ρ; x) & = & sup_{x \in D} |{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x) - {\bar{r}}_{n}^{T} (ρ; x)| \\ = & max_{1 \leq j \leq L (n)} sup_{x \in D \cap I_{j}} |{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x) - {\bar{r}}_{n}^{T} (ρ; x)| \\ \leq & max_{1 \leq j \leq L (n)} sup_{x \in D \cap I_{j}} |{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x) - {\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x_{j})| \\ + max_{1 \leq j \leq L (n)} |{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x_{j}) - {\bar{r}}_{n}^{T} (ρ; x_{j})| \\ + max_{1 \leq j \leq L (n)} sup_{x \in D \cap I_{j}} |{\bar{r}}_{n}^{T} (ρ; x_{j}) - {\bar{r}}_{n}^{T} (ρ; x)| \\ = & Q_{1} + Q_{2} + Q_{3} . \end{matrix}

(49)

The statement (9) allows us to infer that

\begin{matrix} |{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x) - {\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x_{j})| \\ = |\frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} (K (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) - K (\frac{x_{j}}{h_{n}}, \frac{X_{i}}{h_{n}}))| \\ \leq \frac{d^{1 / 2} C_{2} T_{n}}{h_{n}^{d + | β | + 1}} ∥ x - x_{i} ∥, \end{matrix}

which implies that

\begin{matrix} Q_{1} & = max_{1 \leq j \leq L (n)} sup_{x \in D \cap I_{j}} |{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x) - {\bar{r}}_{n}^{T} (ρ; x)| \\ \leq \frac{d^{1 / 2} C_{2} T_{n} ℓ_{n}}{h_{n}^{d + | β | + 1}} = \frac{d^{1 / 2} C_{2} T_{n} c o n s t}{L^{1 / d} (n) h_{n}^{d + | β | + 1}} \\ = O ({(\frac{log n}{n h_{n}^{d + 2 | β |}})}^{1 / 2}), a . s . \end{matrix}

(50)

A similar argument shows, likewise, that

\begin{matrix} Q_{3} & = max_{1 \leq j \leq L (n)} sup_{x \in D \cap I_{j}} |{\bar{r}}_{n}^{T} (ρ; x_{j}) - {\bar{r}}_{n}^{T} (ρ; x)| \\ \leq \frac{d^{1 / 2} C_{2} T_{n} ℓ_{n}}{h_{n}^{d + | β | + 1}} = \frac{d^{1 / 2} C_{2} T_{n} c o n s t}{L^{1 / d} (n) h_{n}^{d + | β | + 1}} \\ = O ({(\frac{log n}{n h_{n}^{d + 2 | β |}})}^{1 / 2}), a . s . \end{matrix}

(51)

We now analyze the term

Q_{2}

in the right side of (49) and show that

Q_{2} = O ({(\frac{log n}{n h_{n}^{d + 2 | β |}})}^{1 / 2}), a . s .

(52)

Observe that

\begin{matrix} Q_{2} & = & max_{1 \leq j \leq L (n)} |{\hat{(\partial^{β} r)}}_{n}^{T} (ρ; x_{j}) - {\bar{r}}_{n}^{T} (ρ; x_{j})| \\ = & max_{1 \leq j \leq L (n)} |\frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} (ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) \\ - E [ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}])| \\ = & max_{1 \leq j \leq L (n)} |\frac{1}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} Z (ρ, x_{j})|, \end{matrix}

where, for

i = 1, \dots, n

,

\begin{matrix} Z (ρ, x_{j}) & = & {(- 1)}^{| β |} (ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x_{j}}{h_{n}}, \frac{X_{i}}{h_{n}}) \\ - E [ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x_{j}}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}]) \end{matrix}

forms a sequence of martingale-difference arrays relative to the

σ

-field

F_{i}

. We now utilize the Lemma 3 for the partial sums of unbounded of martingale differences to obtain an upper bound. Observe that

\begin{matrix} {(Z (ρ, x_{j}))}^{p} \\ = & {(- 1)}^{p | β |} (ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) \\ {- E [ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}])}^{p} \\ = & {(- 1)}^{p | β |} \sum_{k = 0}^{p} {(- 1)}^{p - k} C_{p}^{k} {(ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{k} \\ {(E [ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}])}^{p - k} . \end{matrix}

By using the fact that

{(E [ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x_{j}}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}])}^{p - k}

is

F_{i - 1}

-measurable, it follows then

\begin{matrix} |E [Z {(ρ, x_{j})}^{p} | F_{i - 1}]| \\ \leq \sum_{k = 0}^{p} C_{p}^{k} E [|{(ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x_{j}}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{k}| | F_{i - 1}] \\ \times E [|{(ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x_{j}}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{p - k}| | F_{i - 1}] . \end{matrix}

(53)

recalling the fact that

F_{i - 1} \subset G_{i - 1}

and applying the assumption (N.2)(ii), for any integer

ℓ > 1

, we obtain

\begin{matrix} E [{|ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | F_{i - 1}] \\ = E [E [{|ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | G_{i - 1}] | F_{i - 1}] \\ \leq E [E [{|ρ (Y_{i})|}^{ℓ} | G_{i - 1}] {|K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | F_{i - 1}] \\ = E [m_{ρ, ℓ} (X_{i}) {|K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | F_{i - 1}], \end{matrix}

(54)

where

m_{ρ, ℓ} (X_{i})

is a positive function as stipulated in assumption (N.2)(ii), ensures that

\begin{matrix} E [{|ρ (Y_{i}) 1 {| ρ (Y_{i}) | \leq T_{n}} K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | F_{i - 1}] \\ \leq E [∣ m_{ρ, ℓ} (X_{i}) - m_{ρ, ℓ} (x) ∣ {|K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | F_{i - 1}] \\ + m_{ρ, ℓ} (x) E [{|K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | F_{i - 1}] \\ = [sup_{∥ u - x ∥ \leq h_{n}} ∣ m_{ρ, ℓ} (u) - m_{ρ, ℓ} (x) ∣ + m_{ρ, ℓ} (x)] E [{|K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | F_{i - 1}] \\ \leq C_{0} E [{|K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | F_{i - 1}], \end{matrix}

where

C_{0}

is a positive constant. It follows readily from (7),

\begin{matrix} E [{|K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | F_{i - 1}] & \leq & E [{|K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})|}^{ℓ} | F_{i - 1}] \\ = & \int_{R^{d}} {|K^{(β)} (\frac{x}{h_{n}}, \frac{y}{h_{n}})|}^{ℓ} f^{F_{i - 1}} (y) d y \\ \leq & \int_{R^{d}} {(\frac{C_{d + | β |}}{(1 + h_{n}^{- 1} {∥ x - y ∥}_{2})^{d + | β |}})}^{ℓ} f^{F_{i - 1}} (y) d y \\ \leq & h_{n}^{ℓ (d + | β |)} C_{d + | β |}^{ℓ}, \end{matrix}

(55)

statements (53), (54) and (55) give the following upper bound

\begin{matrix} |E [Z {(ρ, x_{j})}^{p} | F_{i - 1}]| & \leq & C_{0}^{2} \sum_{k = 0}^{p} C_{p}^{k} h_{n}^{p (d + | β |)} C_{d + | β |}^{p} \\ \leq & C_{0}^{2} 2^{p} h_{n}^{d + | β |} C_{d + | β |}^{p} \leq p! C^{p - 1} d_{i}^{2} . \end{matrix}

We apply Lemma 3 to the summation of the martingale differences

Z_{n} (x_{j}, X_{i})

, where

C = C_{d + | β |}, d_{i}^{2} = C_{0}^{2} C_{d + | β |} h_{n}^{d + | β |}, D_{n} = \sum_{i = 1}^{n} d_{i}^{2} = O (n h_{n}^{d + | β |})

and

ϵ_{n} = ϵ_{0} {(log n / n h_{n}^{d + 2 | β |})}^{1 / 2} .

Consequently, there exists a positive constant

C_{2} > 0

such that the following inequalities hold:

\begin{matrix} P (max_{1 \leq j \leq L (n)} |\frac{1}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} Z (ρ, x_{j})| > ϵ_{n}) \\ \leq \sum_{j = 1}^{L (n)} P (|\sum_{i = 1}^{n} Z (ρ, x_{j})| > ϵ_{n} n h_{n}^{d + | β |}) \\ \leq 2 L (n) exp \{- \frac{ϵ_{0}^{2} {(n h_{n}^{d + | β |})}^{2} (log n / n h_{n}^{d + 2 | β |})}{2 (D_{n} + 2 C_{d + 1} n h_{n}^{d + | β |} {(log n / n h_{n}^{d + 2 | β |})}^{1 / 2})}\} \\ \leq {(\frac{n T_{n}^{2}}{h_{n}^{d + 1} log n})}^{d / 2} exp \{- \frac{ϵ_{0}^{2} n h_{n}^{d + | β |} (log n / h_{n}^{| β |})}{O (n h_{n}^{d + | β |}) (1 + 2 C_{d + 1} {(log n / n h_{n}^{d + 2 | β |})}^{1 / 2})}\} \\ \leq {(T_{n} {(\frac{log n}{n h_{n}^{d}})}^{1 / 2})}^{d} {(\frac{1}{{(n h_{n})}^{1 / 2} log n})}^{d} n^{- (\frac{C_{2} ϵ_{0}^{2}}{h_{n}^{| β |}} - \frac{3 d}{2})} . \end{matrix}

Observe that

n h_{n} \to \infty as n \to \infty, and \frac{1}{h_{n}^{| β |}} \to \infty as n \to \infty .

this implies that when

n \to \infty

we have

\frac{C_{2} ϵ_{0}^{2}}{h_{n}^{| β |}} - \frac{3 d}{2} > 0,

we readily obtain, under condition (11), that

\sum_{n \geq 1} P (max_{1 \leq j \leq L (n)} |\frac{1}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} Z (ρ, x_{j})| > ϵ_{n}) < \infty .

(56)

The assertion (52) is established through a standard application of the Borel-Cantelli lemma. Consequently, the result stated in Theorem 2 follows by applying the decompositions (40) that resulted in two terms: the first one was rewritten in (45) as the sum of three terms, the statements (47) and (48) and the decomposition (49) in conjunction with estimates (50), (51), and (52). To complete the argument, we now analyze the second term on the right-hand side of (40). Employing the same reasoning as in (54), and in view of assumptions (N.2)(i) and (N.3)(i), we observe that

\begin{matrix} sup_{x \in D} |{\bar{r}}_{n} (ρ; x) - E [{\hat{(\partial^{β} r)}}_{n} (ρ; x)]| \\ = sup_{x \in D} |\frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} (E [ρ (Y_{i}) K^{β} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}] - E [ρ (Y_{i}) K^{β} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})])| \\ = sup_{x \in D} |\frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} (E [m_{ρ} (X_{i}) K^{β} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}] - E [m_{ρ} (X_{i}) K^{β} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})])| \\ = sup_{x \in D} |\frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} \int_{R^{d}} m_{ρ} (y) K^{β} (\frac{x}{h_{n}}, \frac{y}{h_{n}}) (f^{F_{i - 1}} (y) - f (y)) d y| \\ = sup_{x \in D} |\frac{{(- 1)}^{| β |}}{h_{n}^{d + | β |}} \int_{R^{d}} m_{ρ} (y) K^{β} (\frac{x}{h_{n}}, \frac{y}{h_{n}}) (\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (y) - f (y)) d y| \\ \leq C_{m} sup_{x \in D} |\frac{{(- 1)}^{| β |}}{h_{n}^{d + | β |}} \int_{R^{d}} K^{β} (\frac{x}{h_{n}}, \frac{y}{h_{n}}) (\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (y) - f (y)) d y| . \end{matrix}

Making use of the Cauchy-Schwarz inequality and statement (7) when

m = d + 2 | β | + 1,

we obtain readily that

\begin{matrix} |\frac{{(- 1)}^{| β |}}{h_{n}^{d + | β |}} \int_{R^{d}} K^{β} (\frac{x}{h_{n}}, \frac{y}{h_{n}}) (\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (y) - f (y)) d y| \\ \leq {(\int_{R^{d}} {|\frac{1}{h_{n}^{d + | β |}} K^{β} (\frac{x}{h_{n}}, \frac{y}{h_{n}})|}^{2} d y)}^{1 / 2} {(\int_{R^{d}} | \frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (y) - f (y) |^{2} d y)}^{1 / 2} \\ \leq ∥ \frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} - f ∥_{L^{2}} {(\int_{R^{d}} (\frac{1}{h_{n}^{d + 2 | β |}} \frac{C_{d + 2 | β | + 1}}{(1 + h_{n}^{- 1} {∥ x - y ∥)}^{d + 2 | β | + 1}}) |\frac{1}{h_{n}^{d}} K^{β} (\frac{x}{h_{n}}, \frac{y}{h_{n}})| d y)}^{1 / 2} \\ \leq h_{n}^{1 / 2} C_{d + 2 | β | + 1} ∥ \frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} - f ∥_{L^{2}} {(\int_{R^{d}} |K^{β} (\frac{x}{h_{n}}, z)| d z)}^{1 / 2} \\ \leq h_{n}^{1 / 2} C_{d + 2 | β | + 1} ∥ \frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} - f ∥_{L^{2}} G_{1}^{1 / 2} (d) . \end{matrix}

Under Assumption (C.2), we deduce that

\begin{matrix} G_{n, 2} (x) & = sup_{x \in D} |{\bar{r}}_{n} (ρ; x) - E [{\hat{(\partial^{β} r)}}_{n} (ρ; x)]| \\ \leq C_{m} h_{n}^{1 / 2} C_{d + 2 | β | + 1} G_{1}^{1 / 2} (d) ∥ \frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} - f ∥_{L^{2}} \\ = O (h_{n}^{1 / 2}) . \end{matrix}

(57)

Hence the proof is complete. □

Proof of Theorem 3.

Recall the decomposition

\begin{matrix} \sqrt{n h_{n}^{d + 2 | β |}} ({\hat{(\partial^{β} r)}}_{n} (ρ; x) - (\partial^{β} r) (ρ; x)) \\ = \sqrt{n h_{n}^{d + 2 | β |}} (({\hat{(\partial^{β} r)}}_{n} (ρ; x) - {\bar{r}}_{n} (ρ; x)) \\ + ({\bar{r}}_{n} (ρ; x) - \hat{(\partial^{β} r)} (ρ; x))) \\ = \sqrt{n h_{n}^{d + 2 | β |}} (Q_{n} (ρ; x) + B_{n} (ρ; x)) . \end{matrix}

(58)

Observe that

\begin{matrix} B_{n} (ρ; x) & = & ({\bar{r}}_{n} (ρ; x) - E [{\hat{(\partial^{β} r)}}_{n} (ρ; x)]) \\ + (E [{\hat{(\partial^{β} r)}}_{n} (ρ; x)] - \hat{(\partial^{β} r)} (ρ; x)) \\ = & B_{n, 1} (ρ; x) + B_{n, 2} (ρ; x) . \end{matrix}

(59)

Thus, employing reasoning analogous to that used for (57), under hypothesis (C.2), statement (7) with

m = 2 (d + | β |)

, and Condition (12), we directly conclude that

\begin{matrix} {(n h_{n}^{d + 2 | β |})}^{1 / 2} B_{n, 1} (x) \\ = {(n h_{n}^{d + 2 | β |})}^{1 / 2} (\frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} (E [ρ (Y_{i}) K^{β} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}] - E [ρ (Y_{i}) K^{β} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})])) \\ = O (h_{n}^{d / 2}) {(n h_{n}^{d + 2 | β |})}^{1 / 2} \\ = o (1) . \end{matrix}

(60)

We turn our attention to the term

B_{n, 2} (ρ; x)

and observe that under assumption (C.3), we have

\begin{matrix} E [{\hat{(\partial^{β} r)}}_{n} (ρ; x)] & = & \frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} E [ρ (Y_{i}) K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})] \\ = & \frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} E [E [ρ (Y_{i}) | X_{i}] K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})] \\ = & \frac{{(- 1)}^{| β |}}{n h_{n}^{d + | β |}} \sum_{i = 1}^{n} \int_{R^{d}} m_{ρ} (u) K^{(β)} (\frac{x}{h_{n}}, \frac{u}{h_{n}}) f (u) d u \\ = & \frac{1}{h_{n}^{d}} \int_{R^{d}} (\partial^{β} r) (ρ; u) K (\frac{x}{h_{n}}, \frac{u}{h_{n}}) d u \\ = & \int_{{[- 2 L, 2 L]}^{d}} (\partial^{β} r) (ρ; x + h_{n} v) K (\frac{x}{h_{n}}, \frac{x}{h_{n}} + v) d v \\ = & (\partial^{β} r) (ρ; x) + h_{n} (\partial^{β + 1} r) (ρ; x) \int_{{[- 2 L, 2 L]}^{d}} K (0, v) d v . \end{matrix}

(61)

On the other hand, by Condition (12), we have

{(n h_{n}^{d + 2 | β |})}^{1 / 2} B_{n, 2} (ρ; x) = O (h_{n}^{δ} {(n h_{n}^{d + 2 | β})}^{1 / 2}) = o (1) .

(62)

Observe that

\begin{matrix} \sqrt{n h_{n}^{d + 2 | β |}} Q_{n} (x) & = & \sum_{i = 1}^{n} \{(\frac{{(- 1)}^{| β |}}{\sqrt{n h_{n}^{d}}} ρ (Y_{i}) K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}})) \end{matrix}

\begin{matrix} - E [\frac{{(- 1)}^{| β |}}{\sqrt{n h_{n}^{d}}} ρ (Y_{i}) K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}]\} \\ = & \sum_{i = 1}^{n} \{ξ_{n i} (x_{j}, X_{i}) - E [ξ_{n i} (x_{j}, X_{i}) ∣ F_{i - 1}]\} \end{matrix}

(63)

\begin{matrix} = \sum_{i = 1}^{n} χ_{n i} (x_{j}, X_{i}), \end{matrix}

(64)

Here,

χ_{n i} (x, ρ)

constitutes a martingale difference array adapted to the filtration

F_{i}

. This justifies applying the martingale central limit theorem for discrete-time arrays (cf. [75]) to demonstrate the asymptotic normality of

\sqrt{n h_{n}^{d + 2 | β |}} Q_{n} (x)

. To achieve this, it suffices to verify the following conditions:

(a): Lyapunov condition:

$\sum_{i = 1}^{n} E [χ_{n i}^{2} (x_{j}, X_{i}) ∣ F_{i - 1}] \overset{P}{\to} Σ^{2} (x);$
(b): Lindberg condition:

$n E [χ_{n i}^{2} (x_{j}, X_{i}) 1_{\{|χ_{n i} (x_{j}, X_{i})| > ϵ\}}] = o (1), holds true for any ϵ > 0 .$

Proof of part (a).

Observe that

|\sum_{i = 1}^{n} E [ξ_{n i}^{2} (x_{j}, X_{i}) ∣ F_{i - 1}] - \sum_{i = 1}^{n} E [χ_{n i}^{2} (x_{j}, X_{i}) ∣ F_{i - 1}]| = \sum_{i = 1}^{n} {(E [ξ_{n i} (x_{j}, X_{i}) ∣ F_{i - 1}])}^{2} .

Employing the same reasoning as in (54), and employing a first-order Taylor expansion alongside Equation (7), and in view of assumptions (N.2)(i), (N.3)(i), and (C.4)(ii), we obtain

\begin{matrix} E [ξ_{n i} (x_{j}, X_{i}) ∣ F_{i - 1}] & = & \frac{{(- 1)}^{| β |}}{\sqrt{n h_{n}^{d}}} E [ρ (Y_{i}) K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}] \\ = & \frac{{(- 1)}^{| β |}}{\sqrt{n h_{n}^{d}}} E [E [ρ (Y_{i}) | G_{i - 1}] K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}) | F_{i - 1}] \\ = & \frac{{(- 1)}^{| β |}}{\sqrt{n h_{n}^{d}}} \int_{R^{d}} m_{ρ} (y) K^{β} (\frac{x}{h_{n}}, \frac{y}{h_{n}}) f^{F_{i - 1}} (y) d y \\ \leq & \frac{{(- 1)}^{| β |} C_{m} \sqrt{h_{n}^{d}}}{\sqrt{n}} \int_{R^{d}} K^{β} (\frac{x}{h_{n}}, \frac{x}{h_{n}} + u) f^{F_{i - 1}} (x + h_{n} u) d u \\ \leq & (\frac{{(- 1)}^{| β |} C_{m} \sqrt{h_{n}^{d}}}{\sqrt{n}}) (\frac{C_{d + | β |}}{{(1 + ∥ u ∥}_{2})^{d + | β |}}) (f^{F_{i - 1}} (x) + o (1)) . \end{matrix}

Therefore, we infer

\begin{matrix} \sum_{i = 1}^{n} {(E [ξ_{n i} (x_{j}, X_{i}) ∣ F_{i - 1}])}^{2} \\ = (h_{n}^{d}) {(\frac{C_{d + | β |} C_{m}}{{(1 + ∥ u ∥}_{2})^{d + | β |}})}^{2} (\frac{1}{n} \sum_{i = 1}^{n} {(f^{F_{i - 1}} (x))}^{2} + o (1)) . \end{matrix}

The ergodicity of the process

{(X_{n})}_{n \geq 1}

implies that the process defined by

{({(f^{F_{i - 1}} (x))}^{2})}_{n \geq 1}

is ergodic too and verifies the condition (C.2), which means that

\frac{1}{n} \sum_{i = 1}^{n} {(f^{F_{i - 1}} (x))}^{2} = {(f (x))}^{2},

in both the almost sure and

L^{2}

senses. This implies

\sum_{i = 1}^{n} {(E [ξ_{n i} (x_{j}, X_{i}) ∣ F_{i - 1}])}^{2} = O (h_{n}^{d}) .

The statement (a) follows then from

lim_{n \to \infty} \sum_{i = 1}^{n} E [ξ_{n i}^{2} (x_{j}, X_{i}) ∣ F_{i - 1}] \overset{P}{=} Σ_{ρ}^{2} (x) .

Observe that by the fact that (see [28])

K^{(β)} (u, v) = K^{(β)} (u + k, v + k), f o r k \in Z^{d} .

By assumption (N.2)(ii), we have

\begin{matrix} \sum_{i = 1}^{n} E [ξ_{n i}^{2} (x_{j}, X_{i}) ∣ F_{i - 1}] \\ = \sum_{i = 1}^{n} E [{(\frac{{(- 1)}^{| β |}}{\sqrt{n h_{n}^{d}}} ρ (Y_{i}) K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{2} | F_{i - 1}] \\ = \frac{1}{n h_{n}^{d}} \sum_{i = 1}^{n} E [E [ρ^{2} (Y_{i}) | G_{i - 1}] {(K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{2} | F_{i - 1}] \\ = \frac{1}{n h_{n}^{d}} \sum_{i = 1}^{n} E [m_{ρ, 2} (X_{i}) {(K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{2} | F_{i - 1}] \\ \leq \frac{1}{n h_{n}^{d}} \sum_{i = 1}^{n} E [sup_{{u : ∥ x - u ∥ < h}} |m_{ρ, 2} (X_{i}) - m_{ρ, 2} (x)| {(K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{2} | F_{i - 1}] \\ + \frac{1}{n h_{n}^{d}} \sum_{i = 1}^{n} E [m_{ρ, 2} (x) {(K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{2} | F_{i - 1}] \\ = I_{n, 1} (x) + I_{n, 2} (x) . \end{matrix}

(65)

Under assumptions (C.1), (C.4)(ii), and (N.3)(iii) we have

\begin{matrix} I_{n, 1} (x) & = & \frac{1}{n h_{n}^{d}} \sum_{i = 1}^{n} E [sup_{{u : ∥ x - u ∥ < h}} |m_{ρ, 2} (X_{i}) - m_{ρ, 2} (x)| {(K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{2} | F_{i - 1}] \\ = & \frac{o (1)}{n h_{n}^{d}} \sum_{i = 1}^{n} E [{(K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{2} | F_{i - 1}] \\ = & o (1) \frac{1}{n} \sum_{i = 1}^{n} \int_{{[- 2 L, 2 L]}^{d}} {(K^{(β)} (\frac{x}{h_{n}}, \frac{x}{h_{n}} + v))}^{2} f^{F_{i - 1}} (x + h_{n} v) d v \\ = & o (1) (\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (x) + o (1)) \int_{{[- 2 L, 2 L]}^{d}} {(K^{(β)} (\frac{x}{h_{n}}, \frac{x}{h_{n}} + v))}^{2} d v \\ = & o (1) (f (x) + o (1)) \int_{{[- 2 L, 2 L]}^{d}} {(K^{(β)} (0, v))}^{2} d v \\ = & o (1) . \end{matrix}

(66)

Observe that

\begin{matrix} I_{n, 2} (x) & = & m_{ρ, 2} (x) (\frac{1}{n h_{n}^{d}}) \sum_{i = 1}^{n} E [{(K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}))}^{2} | F_{i - 1}] \\ = & m_{ρ, 2} (x) (\frac{1}{n h_{n}^{d}}) \sum_{i = 1}^{n} \int_{R^{d}} {(K^{(β)})}^{2} (\frac{x}{h_{n}}, \frac{u}{h_{n}}) f^{F_{i - 1}} (u) d u \\ = & m_{ρ, 2} (x) \frac{1}{n} \sum_{i = 1}^{n} \int_{R^{d}} {(K^{(β)} (0, v))}^{2} f^{F_{i - 1}} (x + h_{n} v) d v \\ = & m_{ρ, 2} (x) \int_{R^{d}} {(K^{(β)} (0, v))}^{2} (\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (x) + o (1)) d v . \end{matrix}

(67)

Combining Equations (66) and (67) we deduce

\begin{matrix} lim_{n \to \infty} \sum_{i = 1}^{n} E [ξ_{n i}^{2} (x_{j}, X_{i}) ∣ G_{i - 1}] & \overset{P}{=} Σ_{ρ}^{2} (x) \\ = m_{ρ, 2} (x) f (x) \int_{R^{d}} {(K^{(β)} (0, v))}^{2} d v . \end{matrix}

(68)

Proof of (b).

The Lindberg condition results from Corollary 9.5.2 in [76] which implies that

n E [χ_{n i}^{2} (x_{j}, X_{i}) 1_{\{|χ_{n i} (x_{j}, X_{i})| > ϵ\}}] \leq 4 n E [ξ_{n i}^{2} (x_{j}, X_{i}) 1_{\{|ξ_{n i} (x_{j}, X_{i})| > ϵ / 2\}}] .

Let

a > 1

and

b > 1

such that

\frac{1}{a} + \frac{1}{b} = 1 .

Making use of Hölder and Markov inequalities one can write, for all

ϵ > 0,

\begin{matrix} E [ξ_{n i}^{2} (x_{j}, X_{i}) 1_{\{|ξ_{n i} (x_{j}, X_{i})| > ϵ / 2\}}] & \leq & \frac{E {|ξ_{n i} (x_{j}, X_{i})|}^{2 a}}{{(ϵ / 2)}^{2 a / b}} . \end{matrix}

Therefore, by using the condition (7) when

m = d + 1

, we obtain

\begin{matrix} 4 n E [ξ_{n i}^{2} (x_{j}, X_{i}) 1_{\{|ξ_{n i} (x_{j}, X_{i})| > ϵ / 2\}}] \\ \leq \frac{4}{n^{a - 1} h_{n}^{a d} {(ϵ / 2)}^{2 a / b}} E {|ρ (Y_{i}) (K^{(β)} (\frac{x}{h_{n}}, \frac{X_{i}}{h_{n}}))|}^{2 a} \\ \leq \frac{4}{n^{a - 1} h_{n}^{a d} {(ϵ / 2)}^{2 a / b}} E {|ρ (Y_{i}) (\frac{C_{d + 1}}{(1 + h_{n}^{- 1} ∥ x - X_{i} {∥_{2})}^{d + 1}})|}^{2 a} \\ = \frac{4 C_{d + 1}^{2 a}}{n^{a - 1} h_{n}^{(a - 1) d} {(ϵ / 2)}^{2 a / b}} E {|ρ (Y_{i})|}^{2 a} . \end{matrix}

Hence, under (N.0) we infer that

\begin{matrix} 4 n E [ξ_{n i}^{2} (x, ρ) 1_{\{|ξ_{n i} (x, ρ)| > ϵ / 2\}}] & \leq \frac{4 C_{d + 1}^{2 a}}{n^{a - 1} h_{n}^{(a - 1) d} {(ϵ / 2)}^{2 a / b}} \\ = O ({(\frac{1}{n h_{n}^{d}})}^{a - 1}) . \end{matrix}

(69)

Combining statements (68) and (69), we achieve the proof of the theorem. □

Author Contributions

Conceptualization, S.D. and S.B.; methodology, S.D. and S.B.; validation, S.D. and S.B.; formal analysis, S.D. and S.B.; investigation, S.D. and S.B.; resources, S.D. and S.B.; writing—original draft preparation, S.D. and S.B.; writing—review and editing, S.D. and S.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors gratefully acknowledge Qassim University, represented by the Deanship of Scientific Research. The authors extend their sincere gratitude to the Editor-in-Chief, the Associate Editor, and the four reviewers for their invaluable feedback and for pointing out a number of oversights in the version initially submitted. Their insightful comments have greatly refined and focused the original work, resulting in markedly improved presentation.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Besov Spaces

Following [74], let

1 \leq p, q \leq \infty

and introduce the shift operator

(S_{τ} f) (x), f (x - τ), τ \in R^{d} .

Define

ℓ_{s, p, q} (f) = {(\int_{R^{d}} {[{∥ τ ∥}^{- s} {∥ S_{τ} f - f ∥}_{L_{p}}]}^{q} \frac{d τ}{{∥ τ ∥}^{d}})}^{1 / q},

with the usual modification

ℓ_{s, p, \infty} (f) = sup_{τ \in R^{d}} \frac{∥ S_{τ} {f - f ∥}_{L_{p}}}{{∥ τ ∥}^{s}} .

For first-order regularity one replaces the first difference with a second difference:

ℓ_{1, p, q} (f) = {(\int_{R^{d}} {[{∥ τ ∥}^{- 1} {∥ S_{τ} f + S_{- τ} f - 2 f ∥}_{L_{p}}]}^{q} \frac{d τ}{{∥ τ ∥}^{d}})}^{1 / q},

and

ℓ_{1, p, \infty} (f) = sup_{τ \in R^{d}} \frac{∥ S_{τ} f + S_{- τ} {f - 2 f ∥}_{L_{p}}}{∥ τ ∥} .

For

0 < s < 1

the Besov space

B_{s, p, q} = \{f \in L_{p} (R^{d}) : ℓ_{s, p, q} (f) < \infty\} .

When

s > 1

write

s = {[s]}^{-} + {s}^{+}

with

{[s]}^{-} \in N

and

0 < {s}^{+} \leq 1

. Then

f \in B_{s, p, q}

iff every weak derivative

D^{j} f

with

| j | \leq {[s]}^{-}

satisfies

D^{j} f \in B_{{s}^{+}, p, q};

equivalently,

{∥ f ∥}_{B_{s, p, q}} = {∥ f ∥}_{L_{p}} + \sum_{| j | \leq {[s]}^{-}} ℓ_{{s}^{+}, p, q} (D^{j} f) < \infty .

Prominent examples include the Sobolev space

H_{2}^{s} = B_{s, 2, 2}

and the class of bounded s-Lipschitz functions

B_{s, \infty, \infty}

.

Remark A1.

For density estimation over a Sobolev ball of smoothness s, the minimax

L^{2}

-risk is

O (n^{- 2 s / (2 s + 1)})

; see [77]. Comprehensive surveys of the relationships between classical function spaces and Besov spaces—including Fourier-analytic characterisations of Sobolev spaces when

p \neq 2

—are given in [78,79]. Connections between Besov spaces and the

V_{p}

spaces of functions of bounded p-variation are developed in [80], relying on interpolation theory from [32]; a traditional exposition of p-variation may be found in [81]. For Besov spaces on more general geometric settings such as manifolds or Dirichlet spaces, consult [82].

References

Wand, M.P.; Jones, M.C. Kernel smoothing. In Monographs on Statistics and Applied Probability; Chapman and Hall, Ltd.: London, UK, 1995; Volume 60, pp. xii+212. [Google Scholar] [CrossRef]
Eggermont, P.P.B.; LaRiccia, V.N. Maximum Penalized Likelihood Estimation. Volume II. Regression; Springer Series in Statistics; Springer: Dordrecht, The Netherlands, 2009; pp. xx+571. [Google Scholar] [CrossRef]
Genovese, C.R.; Perone-Pacifico, M.; Verdinelli, I.; Wasserman, L. Non-parametric inference for density modes. J. R. Stat. Soc. Ser. B Stat. Methodol. 2016, 78, 99–126. [Google Scholar] [CrossRef]
Noh, Y.K.; Sugiyama, M.; Liu, S.; du Plessis, M.C.; Park, F.C.; Lee, D.D. Bias reduction and metric learning for nearest-neighbor estimation of Kullback-Leibler divergence. Neural Comput. 2018, 30, 1930–1960. [Google Scholar] [CrossRef]
Dobrovidov, A.V.; Ruds’ko, I.M. Bandwidth selection in nonparametric estimator of density derivative by smoothed cross-validation method. Autom. Remote Control 2010, 71, 209–224. [Google Scholar] [CrossRef]
Genovese, C.R.; Perone-Pacifico, M.; Verdinelli, I.; Wasserman, L. On the path density of a gradient field. Ann. Stat. 2009, 37, 3236–3271. [Google Scholar] [CrossRef] [PubMed]
Singh, R.S. Applications of estimators of a density and its derivatives to certain statistical problems. J. R. Stat. Soc. Ser. B 1977, 39, 357–363. [Google Scholar] [CrossRef]
Meyer, T.G. Bounds for estimation of density functions and their derivatives. Ann. Stat. 1977, 5, 136–142. [Google Scholar] [CrossRef]
Silverman, B.W. Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Stat. 1978, 6, 177–184. [Google Scholar] [CrossRef]
Allaoui, S.; Bouzebda, S.; Chesneau, C.; Liu, J. Uniform almost sure convergence and asymptotic distribution of the wavelet-based estimators of partial derivatives of multivariate density function under weak dependence. J. Nonparametr. Stat. 2021, 33, 170–196. [Google Scholar] [CrossRef]
Ramsay, J.O.; Silverman, B.W. Applied Functional Data Analysis; Springer Series in Statistics; Springer: New York, NY, USA, 2002; pp. x+190, Methods and case studies. [Google Scholar] [CrossRef]
Ramsay, J.O.; Silverman, B.W. Functional Data Analysis, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2005; pp. xx+426. [Google Scholar]
Liu, S.; Kong, X. A generalized correlated C_p criterion for derivative estimation with dependent errors. Comput. Statist. Data Anal. 2022, 171, 107473. [Google Scholar] [CrossRef]
Eubank, R.L.; Speckman, P.L. Confidence bands in nonparametric regression. J. Amer. Statist. Assoc. 1993, 88, 1287–1301. [Google Scholar] [CrossRef]
Ruppert, D.; Sheather, S.J.; Wand, M.P. An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc. 1995, 90, 1257–1270. [Google Scholar] [CrossRef]
Park, C.; Kang, K.H. SiZer analysis for the comparison of regression curves. Comput. Statist. Data Anal. 2008, 52, 3954–3970. [Google Scholar] [CrossRef]
Härdle, W.; Gasser, T. On robust kernel estimation of derivatives of regression functions. Scand. J. Statist. 1985, 12, 233–240. [Google Scholar]
Ziegler, K. On the asymptotic normality of kernel regression estimators of the mode in the nonparametric random design model. J. Statist. Plann. Inference 2003, 115, 123–144. [Google Scholar] [CrossRef]
Georgiev, A.A. Speed of convergence in nonparametric kernel estimation of a regression function and its derivatives. Ann. Inst. Statist. Math. 1984, 36, 455–462. [Google Scholar] [CrossRef]
Deheuvels, P.; Mason, D.M. General asymptotic confidence bands based on kernel-type function estimators. Stat. Inference Stoch. Process. 2004, 7, 225–277. [Google Scholar] [CrossRef]
Härdle, W.; Kerkyacharian, G.; Picard, D.; Tsybakov, A. Wavelets, approximation, and statistical applications. In Lecture Notes in Statistics; Springer: New York, NY, USA, 1998; Volume 129, pp. xviii+265. [Google Scholar] [CrossRef]
Prakasa Rao, B.L.S. Nonparametric estimation of the derivatives of a density by the method of wavelets. Bull. Inf. Cybern. 1996, 28, 91–100. [Google Scholar]
Chaubey, Y.P.; Doosti, H.; Prakasa Rao, B.L.S. Wavelet based estimation of the derivatives of a density with associated variables. Int. J. Pure Appl. Math. 2006, 27, 97–106. [Google Scholar]
Rao, B.L.S.P. Nonparametric Estimation of Partial Derivatives of a Multivariate Probability Density by the Method of Wavelets. In Asymptotics in Statistics and Probability. Papers in Honor of George Gregory Roussas; Puri, M.L., Ed.; De Gruyter: Berlin, Germany; Boston, MA, USA, 2000; pp. 321–330. [Google Scholar] [CrossRef]
Hosseinioun, N.; Doosti, H.; Niroumand, H.A. Nonparametric estimation of a multivariate probability density for mixing sequences by the method of wavelets. Ital. J. Pure Appl. Math. 2011, 28, 31–40. [Google Scholar]
Koshkin, G.; Vasil’iev, V. An estimation of a multivariate density and its derivatives by weakly dependent observations. In Statistics and Control of Stochastic Processes. The Liptser Festschrift. Papers from the Steklov Seminar Held in Moscow, Russia, 1995–1996; World Scientific: Singapore, 1997; pp. 229–241. [Google Scholar]
Prakasa Rao, B.L.S. Wavelet estimation for derivative of a density in the presence of additive noise. Braz. J. Probab. Stat. 2018, 32, 834–850. [Google Scholar] [CrossRef]
Meyer, Y. Wavelets and operators. In Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 1992; Volume 37, pp. xvi+224. [Google Scholar]
Donoho, D.L.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Density estimation by wavelet thresholding. Ann. Statist. 1996, 24, 508–539. [Google Scholar] [CrossRef]
Schneider, C. Beyond Sobolev and Besov—Regularity of solutions of PDEs and their traces in function spaces. In Lecture Notes in Mathematics; Springer: Cham, Switzerland, 2021; Volume 2291, pp. xviii+327. [Google Scholar] [CrossRef]
Sawano, Y. Theory of Besov spaces. In Developments in Mathematics; Springer: Singapore, 2018; Volume 56, pp. xxiii+945. [Google Scholar] [CrossRef]
Peetre, J. New Thoughts on Besov Spaces; Duke University Mathematics Series, No. 1; Duke University, Mathematics Department: Durham, NC, USA, 1976; pp. vi+305. [Google Scholar]
Rosenblatt, M. Uniform ergodicity and strong mixing. Z. Wahrscheinlichkeitstheorie Verw. Geb. 1972, 24, 79–84. [Google Scholar] [CrossRef]
Bradley, R.C. Introduction to Strong Mixing Conditions; Kendrick Press: Heber City, UT, USA, 2007; Volume 3, pp. xii+597. [Google Scholar]
Didi, S.; Bouzebda, S. Wavelet Density and Regression Estimators for Continuous Time Functional Stationary and Ergodic Processes. Mathematics 2022, 10, 4356. [Google Scholar] [CrossRef]
Daubechies, I. Ten lectures on wavelets. In CBMS-NSF Regional Conference Series in Applied Mathematics; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1992; Volume 61, pp. xx+357. [Google Scholar] [CrossRef]
Peškir, G. The uniform mean-square ergodic theorem for wide sense stationary processes. Stoch. Anal. Appl. 1998, 16, 697–720. [Google Scholar] [CrossRef]
Hall, P.; Penev, S. Cross-validation for choosing resolution level for nonlinear wavelet curve estimators. Bernoulli 2001, 7, 317–341. [Google Scholar] [CrossRef]
Bellman, R. Adaptive Control Processes: A Guided Tour; Princeton University Press: Princeton, NJ, USA, 1961; pp. xvi+255. [Google Scholar]
Scott, D.W.; Wand, M.P. Feasibility of multivariate density estimates. Biometrika 1991, 78, 197–205. [Google Scholar] [CrossRef]
Bouzebda, S. Weak convergence of the conditional single index U-statistics for locally stationary functional time series. AIMS Math. 2024, 9, 14807–14898. [Google Scholar] [CrossRef]
Bouzebda, S. Limit Theorems in the Nonparametric Conditional Single-Index U-Processes for Locally Stationary Functional Random Fields under Stochastic Sampling Design. Mathematics 2024, 12, 1996. [Google Scholar] [CrossRef]
Shen, A.; Li, X.; Zhang, Y.; Qiu, Y.; Wang, X. Consistency and asymptotic normality of wavelet estimator in a nonparametric regression model. Stochastics 2021, 93, 868–885. [Google Scholar] [CrossRef]
Liu, Y.; Zeng, X. Asymptotic normality for wavelet deconvolution density estimators. Appl. Comput. Harmon. Anal. 2020, 48, 321–342. [Google Scholar] [CrossRef]
Tang, X.; Xi, M.; Wu, Y.; Wang, X. Asymptotic normality of a wavelet estimator for asymptotically negatively associated errors. Statist. Probab. Lett. 2018, 140, 191–201. [Google Scholar] [CrossRef]
Niu, S.l. Asymptotic normality of wavelet density estimator under censored dependent observations. Acta Math. Appl. Sin. Engl. Ser. 2012, 28, 781–794. [Google Scholar] [CrossRef]
Li, Y.; Guo, j. Asymptotic normality of wavelet estimator for strong mixing errors. J. Korean Statist. Soc. 2009, 38, 383–390. [Google Scholar] [CrossRef]
Roueff, F.; Taqqu, M.S. Asymptotic normality of wavelet estimators of the memory parameter for linear processes. J. Time Ser. Anal. 2009, 30, 534–558. [Google Scholar] [CrossRef]
Li, Y.; Yang, S.; Zhou, Y. Consistency and uniformly asymptotic normality of wavelet estimator in regression model with associated samples. Statist. Probab. Lett. 2008, 78, 2947–2956. [Google Scholar] [CrossRef]
Debbarh, M. Normalité asymptotique de l’estimateur par ondelettes des composantes d’un modèle additif de régression. C. R. Math. Acad. Sci. Paris 2006, 343, 601–606. [Google Scholar] [CrossRef]
Allaoui, S.; Bouzebda, S.; Liu, J. Asymptotic distribution of the wavelet-based estimators of multivariate regression functions under weak dependence. J. Math. Inequal. 2023, 17, 481–515. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Comm. Statist. Theory Methods 2017, 46, 1367–1406. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S. Some asymptotic properties of kernel regression estimators of the mode for stationary and ergodic continuous time processes. Rev. Mat. Complut. 2021, 34, 811–852. [Google Scholar] [CrossRef]
Eddy, W.F. Optimum kernel estimators of the mode. Ann. Statist. 1980, 8, 870–882. [Google Scholar] [CrossRef]
Eddy, W.F. The asymptotic distributions of kernel estimators of the mode. Z. Wahrsch. Verw. Geb. 1982, 59, 279–290. [Google Scholar] [CrossRef]
Billingsley, P. Convergence of Probability Measures; John Wiley & Sons, Inc.: New York, NY, USA; London, UK; Sydney, Australia, 1968; pp. xii+253. [Google Scholar]
Bouzebda, S.; Chaouch, M. Uniform limit theorems for a class of conditional Z-estimators when covariates are functions. J. Multivar. Anal. 2022, 189, 104872. [Google Scholar] [CrossRef]
Hall, P.; Marron, J.S. Estimation of integrated squared density derivatives. Statist. Probab. Lett. 1987, 6, 109–115. [Google Scholar] [CrossRef]
Jurečková, J. Asymptotic linearity of a rank statistic in regression parameter. Ann. Math. Statist. 1969, 40, 1889–1900. [Google Scholar] [CrossRef]
Giné, E.; Mason, D.M. Uniform in bandwidth estimation of integral functionals of the density function. Scand. J. Statist. 2008, 35, 739–761. [Google Scholar] [CrossRef]
Levit, B.Y. Asymptotically efficient estimation of nonlinear functionals. Probl. Peredachi Informatsii 1978, 14, 65–72. [Google Scholar]
Masry, E. Multivariate probability density estimation by wavelet methods: Strong consistency and rates for stationary time series. Stoch. Process. Appl. 1997, 67, 177–193. [Google Scholar] [CrossRef]
Allaoui, S.; Bouzebda, S.; Liu, J. Multivariate wavelet estimators for weakly dependent processes: Strong consistency rate. Comm. Statist. Theory Methods 2023, 52, 8317–8350. [Google Scholar] [CrossRef]
Bouzebda, S. Limit theorems for wavelet conditional U-statistics for time series models. Math. Methods Statist. 2025, 35, 1–42. [Google Scholar]
Chaouch, M.; Laïb, N. Regression estimation for continuous-time functional data processes with missing at random response. J. Nonparametr. Stat. 2024, 36, 1–32. [Google Scholar] [CrossRef]
Giraitis, L.; Leipus, R. A generalized fractionally differencing approach in long-memory modeling. Liet. Mat. Rink. 1995, 35, 65–81. [Google Scholar] [CrossRef]
Guégan, D.; Ladoucette, S. Non-mixing properties of long memory processes. C. R. Acad. Sci. Paris Sér. I Math. 2001, 333, 373–376. [Google Scholar] [CrossRef]
Andrews, D.W.K. Non-strong mixing autoregressive processes. J. Appl. Probab. 1984, 21, 930–934. [Google Scholar] [CrossRef]
Francq, C.; Zakoïan, J.M. GARCH Models: Structure, Statistical Inference and Financial Applications; John Wiley & Sons, Ltd.: Chichester, UK, 2010; pp. xiv+489. [Google Scholar] [CrossRef]
Bouzebda, S.; Limnios, N. On general bootstrap of empirical estimator of a semi-Markov kernel with applications. J. Multivar. Anal. 2013, 116, 52–62. [Google Scholar] [CrossRef]
Bouzebda, S. On the weak convergence and the uniform-in-bandwidth consistency of the general conditional U-processes based on the copula representation: Multivariate setting. Hacet. J. Math. Stat. 2023, 52, 1303–1348. [Google Scholar] [CrossRef]
Burkholder, D.L. Distribution function inequalities for martingales. Ann. Probab. 1973, 1, 19–42. [Google Scholar] [CrossRef]
de la Peña, V.H.; Giné, E. Decoupling; Probability and Its Applications (New York); Springer: New York, NY, USA, 1999; pp. xvi+392. [Google Scholar] [CrossRef]
Masry, E. Wavelet-based estimation of multivariate regression functions in Besov spaces. J. Nonparametr. Statist. 2000, 12, 283–308. [Google Scholar] [CrossRef]
Hall, P.; Heyde, C.C. Martingale Limit Theory and Its Application; Probability and Mathematical Statistics; Academic Press, Inc.; Harcourt Brace Jovanovich, Publishers: New York, NY, USA; London, UK, 1980; pp. xii+308. [Google Scholar]
Chow, Y.S.; Teicher, H. Probability Theory; Springer: New York, NY, USA; Berlin/Heidelberg, Germany, 1978; pp. xv+455. [Google Scholar]
Efromovich, S. Lower bound for estimation of Sobolev densities of order less $\frac{1}{2}$ . J. Statist. Plann. Inference 2009, 139, 2261–2268. [Google Scholar] [CrossRef]
Triebel, H. Theory of function spaces. In Monographs in Mathematics; Birkhäuser Verlag: Basel, Switzerland, 1983; Volume 78, p. 284. [Google Scholar] [CrossRef]
DeVore, R.A.; Lorentz, G.G. Constructive approximation. In Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]; Springer: Berlin/Heidelberg, Germany, 1993; Volume 303, pp. x+449. [Google Scholar]
Bourdaud, G.; Lanza de Cristoforis, M.; Sickel, W. Superposition operators and functions of bounded p-variation. Rev. Mat. Iberoam. 2006, 22, 455–487. [Google Scholar] [CrossRef]
Dudley, R.M.; Norvaiša, R. Differentiability of six operators on nonsmooth functions and p-variation. In Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1999; Volume 1703, pp. viii+277. [Google Scholar] [CrossRef]
Geller, D.; Pesenson, I.Z. Band-limited localized Parseval frames and Besov spaces on compact homogeneous manifolds. J. Geom. Anal. 2011, 21, 334–371. [Google Scholar] [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Didi, S.; Bouzebda, S. Wavelet Estimation of Partial Derivatives in Multivariate Regression Under Discrete-Time Stationary Ergodic Processes. Mathematics 2025, 13, 1587. https://doi.org/10.3390/math13101587

AMA Style

Didi S, Bouzebda S. Wavelet Estimation of Partial Derivatives in Multivariate Regression Under Discrete-Time Stationary Ergodic Processes. Mathematics. 2025; 13(10):1587. https://doi.org/10.3390/math13101587

Chicago/Turabian Style

Didi, Sultana, and Salim Bouzebda. 2025. "Wavelet Estimation of Partial Derivatives in Multivariate Regression Under Discrete-Time Stationary Ergodic Processes" Mathematics 13, no. 10: 1587. https://doi.org/10.3390/math13101587

APA Style

Didi, S., & Bouzebda, S. (2025). Wavelet Estimation of Partial Derivatives in Multivariate Regression Under Discrete-Time Stationary Ergodic Processes. Mathematics, 13(10), 1587. https://doi.org/10.3390/math13101587

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Wavelet Estimation of Partial Derivatives in Multivariate Regression Under Discrete-Time Stationary Ergodic Processes

Abstract

1. Introduction

Notation

2. Mathematical Backgrounds

2.1. Besov Spaces

2.2. Linear Wavelets Estimator

3. Assumptions and Main Results

3.1. Asymptotic Normality Results

3.2. Confidence Interval

4. Application to the Regression Derivatives

5. Mode Regression

6. Concluding Remarks

7. Proofs

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Besov Spaces

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI