Abstract
In this study, we consider wavelet bases for the nonparametric estimation of density and regression functions for continuous-time functional stationary processes in Hilbert space. The mean integrated squared error over compact subsets is established. We employ a martingale approach to obtain the asymptotic properties of these wavelet estimators. These findings are established under rather broad assumptions: all we require of the data is that they are ergodic. In this paper, the mean integrated squared error results obtained in the independence or mixing setting are generalized to the ergodic setting. The theoretical results presented in this study are valuable resources for various cutting-edge functional data analysis applications, including the conditional distribution, the conditional quantile, entropy, and curve discrimination.
Keywords:
multivariate regression estimation; multivariate density estimation; stationarity; ergodicity; rates of strong convergence; wavelet-based estimators; martingale differences; continuous time series
MSC:
62G07; 62G08; 62G05; 62G20; 62H05; 60G42; 60G46
1. Introduction and Motivations
In recent years, the statistical literature has become increasingly interested in statistical issues pertaining to the study of functional random variables, that is, variables with values in an infinite-dimensional space. The availability of data measured on ever-finer temporal/spatial grids, such as in meteorology, medicine, and satellite imagery, is driving the expansion of this research topic; the statistical modeling of these data as random functions has uncovered numerous complex theoretical and numerical research challenges. For a summary of the theoretical and practical aspects of functional data analysis, the reader may consult the following monographs: [1] for linear models for random variables with values in a Hilbert space, and [2] for scalar-on-function and function-on-function linear models, functional principal component analysis, and parametric discriminant analysis. [3], on the other hand, concentrated on nonparametric methods, particularly kernel-type estimation for scalar-on-function nonlinear regression models; such tools were extended to classification and discrimination analysis. [4] discussed the application of several interesting statistical concepts to the functional data framework, including goodness-of-fit tests, portmanteau tests, and change point problems. [5] was interested in the analysis of variance for functional data, whereas [6] was more concerned with regression analysis for Gaussian processes. Recent studies and surveys on functional data modeling and analysis can be found in [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22].
The subject of estimating conditional models has been extensively studied in the statistical literature, utilizing several estimation techniques, the most prevalent of which is the conventional kernel method. This is because these methods have broad applications and play an important role in statistical inference. However, such methods may have certain drawbacks when predicting compactly supported or discontinuous curves at boundary locations. The prevalence of alternative wavelet approaches can be attributed to the adaptability of these methods to discontinuities in the curve to be approximated. In applications, the wavelet method offers an estimation algorithm that is straightforward to both implement and compute. Refs. [23,24,25,26] provide more information on wavelet theory. We quote the work of [27], which discusses the properties of wavelet approximation and analyzes the application of wavelets in various curve estimation issues in depth. In [28], numerous applications of wavelet theory are described for the independent unidimensional case, with an emphasis on the estimation of the integrated squared derivative of the density function. The findings of [28] were later extended by [29] to the estimation of density derivatives for negatively and positively associated sequences. In [30], wavelet estimators for partial derivatives of a multivariate probability density function were constructed, and convergence rates were determined in the independence scenario. [31] discusses in detail the estimation of partial derivatives of a multivariate probability density function in the presence of additive noise. We refer to [32,33] for the most recent information on this subject. Using the independent and identically distributed paradigm, [34] examined density and regression estimation issues unique to functional data; [34] proposed a novel adaptive method based on the term-by-term selection of wavelet coefficient estimators and wavelet bases for Hilbert spaces of functions.
The primary purpose of this paper is to provide further context for the earlier discussion of stationary ergodic processes. The lack of research on the general dependence framework for wavelet analysis prompted us to conduct the present study. Several arguments for contemplating an ergodic dependence structure in the data, as opposed to a mixing structure, are presented in [35,36,37,38,39,40,41,42], where further information on the notion of the ergodic property of processes and examples of such processes are provided. In [43], one of the arguments used to justify the ergodic setting is that establishing ergodic properties rather than the mixing condition can be significantly simpler for some classes of processes. Therefore, the ergodicity hypothesis provides the ideal framework for examining data series created by chaotic noise.
In the body of statistical research that has been published, estimation issues based on discretely sampled observations as well as on a continuous-time record have been investigated. In finance, even if the underlying process is continuous across time, only its values at a finite number of points are known; the discrete setting is therefore the more important one from the standpoint of financial econometrics (see [44,45]). On the other hand, the technical development of statistical inference using a continuous time record of observations is far easier to accomplish than inference based on discretely observed data. It makes it possible to advance further in the model’s statistical analysis and to find solutions to various questions concerning discretely observed diffusions that had not been resolved until this point. Further, remember that the continuous-time model arises as the limit of the discrete-time model when the discretization step tends to zero (see [46]). As a result, the asymptotic behavior of estimation procedures in practice may be close to the asymptotic behavior established theoretically for continuous-time observations if the available data are “dense enough” relative to the observation period. The knowledge of the most accurate estimate derived from continuous-time data is also of practical significance. Ref. [47] investigated the challenge of estimating an invariant density function for a continuous-time Markov process and established the mean-square consistency of kernel-type estimators. Ref. [48] demonstrated the homogeneity and consistency of these estimators, and showed that a family of smoothed estimators, including kernel-type estimators, is asymptotically normal with a rate of convergence for continuous-time processes. This finding is unexpected because the invariant density estimators behave differently in this instance than in discrete-time processes, where the convergence rate is often smaller than . Evidently, numerous physical phenomena, such as seismic waves, the Earth’s magnetic field, and isotherms and isobars in meteorology, are functional variables observable in continuous time.
In [49], we considered wavelet bases for the nonparametric estimation of density and regression functions for discrete-time functional stationary processes in Hilbert space, and we characterized the mean integrated squared errors. This research aims to provide the first complete theoretical rationale for wavelet-based functional density and regression function estimation for continuous-time stationary processes by extending our earlier work [49] to the continuous setting. The employment of wavelet estimators in functional ergodic data frameworks, and the resulting difficulty of establishing the mean integrated squared error over suitable decomposition spaces, is, to the best of our knowledge, still an unresolved topic in the literature. By merging different martingale theory techniques in the mathematical construction of the proofs, we aim to contribute to the literature by addressing this gap. The tools employed for regression estimation in the independent or mixing setting change significantly in the ergodic setting. Nevertheless, we shall see that more than merely combining preexisting ideas and results is needed to address the issue: in order to deal with wavelet estimators in an ergodic setting, one must resort to intricate mathematical derivations.
The paper’s structure is as follows: The multiresolution analysis is described in Section 2. The principal density estimation results are presented in Section 3. Section 4 summarizes the principal outcomes for regression estimation. Section 5 provides some examples of potential applications. Some concluding observations are given in Section 6. All proofs are compiled in Section 7.
2. Multiresolution Analysis
Following [34,49,50], we will now introduce some basic notation for defining wavelet bases for Hilbert spaces of functions, with a few modifications to accommodate our context. In this study, nonlinear, thresholded, wavelet-based estimators are examined. Beginning with a description of the fundamental theory of wavelet approaches, we then introduce nonlinear wavelet-based estimators. The interested reader should consult [23,24], see also [51,52] and the references therein, even though the wavelet basis here is built on a separable Hilbert space of real or complex-valued functions on a complete separable metric space. Let represent the separable Hilbert space of real-valued functions defined on a separable complete metric space . Given that is separable, it possesses an orthonormal basis
where is a countable index set. The space is equipped with an inner product and a norm . Consider the sequence of subsets an increasing sequence of finite subsets of such that
The subset denotes the orthogonal complement of in , i.e.,
Choose, for any , , and , , such that the following matrices
fulfill one of the two following conditions, for instance, see [34,50] and the references therein.
- (A.1)
- and where and for and are positive constants.
- (A.2)
- and where and for and are positive constants.
Condition (A.1) implies that
which means that all the columns of and are not zero vectors. As for (A.2), it gives
saying that all the rows of and are not zero vectors. For any , we set
where
The following collection serves as the orthonormal basis for (see Theorem 2 of [50]):
For further details, see [34,50,53]. Hence, we infer that for any , we have
where
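To fix ideas, in generic notation (the exact objects are those of [34,50] and of expansion (7); the symbols below are only schematic placeholders), this expansion and its coefficients take the following form.

```latex
% Schematic wavelet-type expansion over the Hilbert-space basis
% (generic placeholders; see [34,50] for the precise construction):
f \;=\; \sum_{k \in I_{j_0}} \alpha_{j_0,k}\,\Phi_{j_0,k}
      \;+\; \sum_{j \ge j_0} \sum_{k \in J_j} \beta_{j,k}\,\Psi_{j,k},
\qquad
\alpha_{j_0,k} = \langle f, \Phi_{j_0,k} \rangle, \quad
\beta_{j,k}   = \langle f, \Psi_{j,k} \rangle .
```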
Adding two additional assumptions to the orthonormal basis :
- (E.1)
- There exists a constant such that, for any integer , we have
- (i)
- (ii)
- (E.2)
- There exists a constant such that, for any integer , we have
Remark 1.
Clearly, the assumption (E.1) is satisfied under assumption (A.1) whenever ; we may also consult [50], Example 2 and its applications, for further details. Refs. [50,53] have provided three examples satisfying condition (E.2), taking
see also [53], Theorem 3.2. Moreover, [34] used both assumptions in the setting of independent and identically distributed functional data.
Besov Space
Over the years, numerous statisticians have pondered the following question: given an estimation method and a required estimation rate for a certain loss function, what is the largest space over which this rate can be accomplished? See, for example, [27,54] and the references therein. We are interested in estimation methods based on thresholding of wavelet bases, which arise naturally in this situation. It is common knowledge that wavelet bases serve to characterize smoothness spaces such as the Hölder spaces , the Sobolev spaces , and the Besov spaces for a range of indices s that depend on both the smoothness of and that of its dual function ; see, for instance, [54] for more details and examples; at this point, we may also consult [55]. The following statistical definition is used in approximation theory for the study of nonlinear procedures, such as thresholding and greedy algorithms; see [27,52,54,56].
Definition 1
(Besov space). Let . We say that the function , defined by statement (7), belongs to the Besov space if and only if:
Definition 2
(Weak Besov space). Let . We say that the function , defined by statement (7), belongs to the weak Besov space if and only if:
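In the classical maxiset literature [52,55], the weak Besov condition amounts to a weak-ℓr constraint on the wavelet coefficients. A generic rendering (not the exact Hilbert-space version used above) is the following.

```latex
% Generic weak Besov (weak-l_r) condition on the wavelet coefficients:
\sup_{\lambda > 0}\; \lambda^{r}\,
  \#\bigl\{ (j,k) \,:\, |\beta_{j,k}| > \lambda \bigr\} \;<\; \infty .
```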
3. The Density Estimation
Let denote a sequence of strictly stationary ergodic pairs of random elements, where takes values in a complete separable metric space of Hilbert space associated with the corresponding Borel -algebra and is a real or complex-valued variable. Let denote the probability measure induced by on . Assume that there exists -finite measure on the measurable space in such a way that is dominated by . The Radon–Nikodym theorem guarantees the existence of a measurable function that is nonnegative in such a way that
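In other words, in generic notation (the paper's own symbols appear in the displays), the density is the Radon–Nikodym derivative of the law of the functional variable with respect to the dominating measure:

```latex
% Radon-Nikodym characterization (generic notation):
P_X(B) \;=\; \int_{B} f \,\mathrm{d}\nu
\quad \text{for every Borel set } B,
\qquad \text{i.e.,} \quad f = \frac{\mathrm{d}P_X}{\mathrm{d}\nu}.
```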
In this framework, we intend to estimate on the basis of n observed functional data . We assume that , where is a separable Hilbert space of real or complex-valued functions defined on and square integrable with respect to the -finite measure . In this research, we are especially interested in the wavelet estimation processes created in the 1990s, see Meyer’s work for the functional data of a Hilbert space and, more specifically, the nonlinear estimators. The majority of this model’s approaches involve introducing kernel estimator techniques to estimate the functional component of the model, see [57]. Let be the sample’s common density function , which is assumed to be
- (F.1)
- is a known constant such that
Density Function Estimator
From now on, suppose that the density function belongs to the separable Hilbert space ; then satisfies the wavelet representation (7). Assume that we observe a sequence of copies of that is supposed to be functional stationary and ergodic, with density function . We examine density estimation utilizing the wavelet bases for Hilbert spaces of functions of [50]. We consider the estimated coefficients and given, respectively, by (14) and (15), for any . Here, the resolution level grows at the rate specified below. We assume that and have compact support, so that the summations in (7) are finite for each fixed (note that, in this case, the support of and is a monotonically increasing function of their degree of differentiability [24]). We focus our attention on the nonlinear estimators (13), which will be studied in terms of the mean integrated squared error over adapted decomposition spaces, in a similar way as in [34] in the setting of independent and identically distributed functional processes; in particular, we refer to [49]. The density wavelet hard thresholding estimator is defined, for all , by
where
Here denotes a large-enough constant, and is the integer satisfying
See [36,38] for further details on the multivariate case.
Comments on the Method of Estimation
Our estimation method is divided into three steps (a schematic code sketch is given after the list):
- 1.
- Estimation of the wavelet coefficients, as given by (14) and (15);
- 2.
- The greatest is selected by applying hard thresholding;
- 3.
- Reconstruction of the selected elements of the initial wavelet basis.
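As an illustration of the mechanics only, the following sketch applies hard thresholding to vectors of empirical detail coefficients. All names (threshold rule, coefficient arrays, constant c) are hypothetical stand-ins for the quantities defined in (13)–(15), not the paper's exact estimator.

```python
import numpy as np

def hard_threshold_density(alpha_hat, beta_hat, n, c=1.0):
    """Schematic hard-thresholding step for wavelet density estimation.

    alpha_hat : estimated approximation coefficients (kept untouched)
    beta_hat  : estimated detail coefficients, one array per resolution level
    n         : sample size, driving the universal-type threshold
    c         : a large-enough constant (theoretical choice in the paper)
    """
    lam = c * np.sqrt(np.log(n) / n)                      # universal-type threshold
    beta_thr = [b * (np.abs(b) > lam) for b in beta_hat]  # keep only the large coefficients
    return alpha_hat, beta_thr                            # reconstruction uses the retained terms

# Toy usage: two resolution levels of small, noisy detail coefficients.
rng = np.random.default_rng(0)
alpha = rng.normal(size=4)
betas = [rng.normal(scale=0.05, size=8), rng.normal(scale=0.05, size=16)]
print(hard_threshold_density(alpha, betas, n=500))
```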
It is essential to highlight that our choice is the universal threshold and that the definition of is based on theoretical considerations. The estimator under consideration does not depend on the smoothness of ; see [53] for more details in the case of the linear wavelet estimator of . Moreover, for additional information on the case of and more standard nonparametric models, refer to [27,58]. Some notation is necessary for stating the results. Throughout the paper, we shall denote by the open set of the Borel algebra . For any and small real, we define
and
as the distribution function and the conditional distribution function, given the field , respectively. Before presenting our results, we present supplementary notation and our hypotheses. For the remainder of the paper, for a positive real , we will denote by
Let be the field generated by
Let be the field generated by
and
The following assumptions are required for the entire paper.
- (C.0)
- There is a nonnegative measurable function such thatWe may refer to [3,35,59] for further details.
- (C.1)
- For anyAt this point, we may refer to [60] for further details.
Comments on hypotheses. Approximating the integral by its Riemann sum, we have
By the fact that process is stationary and ergodic, in a similar way as in [61] (see, Lemma 4 and Corollary 1 together with their proofs), one may establish that the sequence
of random functions is stationary and ergodic. Indeed, it suffices to substitute the conditional densities in the work of Delecroix with and the density by function . Refer to [35,62] and the references therein.
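The key fact behind this comment is the continuous-time ergodic theorem: for a stationary ergodic process and any integrable functional, time averages converge to the corresponding expectation. In generic notation:

```latex
% Continuous-time ergodic theorem and Riemann-sum approximation
% (generic notation):
\frac{1}{T}\int_{0}^{T} g(X_t)\,\mathrm{d}t
  \;\xrightarrow[T \to \infty]{\text{a.s.}}\;
  \mathbb{E}\bigl[g(X_0)\bigr],
\qquad
\frac{1}{T}\int_{0}^{T} g(X_t)\,\mathrm{d}t
  \;\approx\; \frac{1}{n}\sum_{i=1}^{n} g\bigl(X_{t_i}\bigr)
  \quad \text{over a grid } t_1 < \cdots < t_n .
```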
Theorem 1.
Under the assumptions (C.0), (C.1), (F.1), (E.1), and (E.2), combined with condition (16), there is a constant such that
for a large-enough T.
The following upper bound finding is an immediate consequence: if for , then there is a constant in such a way that
This convergence rate is close to optimal in the “standard” minimax configuration (see, [27]). Moreover, on the application of [52] (Theorem 3.2), one obtains that is the “maxiset” associated with at the convergence rate of , i.e.,
4. The Regression Estimation
Let be a measurable function. The regression function is defined by
where is a random variable independent of with . We assume that , where is a separable Hilbert space of real or complex-valued functions defined on and square-integrable with respect to the -finite measure . We shall assume that there is a known constant and in such a way that
In this framework, we redefine the probability measure in (11) and assume that is a nonnegative measurable known function.
- (M.1)
- We shall assume that there is a known constant in such a way that
- (M.2)
- We shall suppose that there is a known constant in such a way that
Regression Function Estimator
In this framework, our goal is to estimate based on observed functional data . The kernel estimator for the regression function of functional data has been suggested by [59]
This estimator is similar to the one proposed by [59] in the discrete framework. By using the idea of [58], we propose the hard wavelet thresholding estimator , for all , by
where
where is the integer satisfying
and is a large-enough constant. The multivariate case for discrete- and continuous-time linear wavelet estimators was examined by [36,38].
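For orientation, the discrete-time functional kernel (Nadaraya–Watson type) regression estimator of [59], of which the estimator above is the continuous-time, wavelet-based counterpart, has the familiar form below (generic notation: K a kernel, d a semi-metric on the function space, h_n a bandwidth).

```latex
% Functional Nadaraya-Watson regression estimator (discrete time,
% generic notation, following [59]):
\widehat{r}_n(\chi)
  = \frac{\displaystyle \sum_{i=1}^{n} \psi(Y_i)\,
          K\!\bigl(d(\chi, X_i)/h_n\bigr)}
         {\displaystyle \sum_{i=1}^{n}
          K\!\bigl(d(\chi, X_i)/h_n\bigr)} .
```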
Theorem 2.
Under the assumptions (E.1), (E.2) (M.1), (M.2), (C.0), and (C.1), combined with condition (24), for any , , there is a constant such that
for T large enough.
Assume that and fulfill (M.1) and, for any
where is in (Definition 1) with and is in (Definition 2) with , then there is a constant in such a way that
for T large enough. Again, note that for , if
then there is a constant in such a way that
This convergence rate is close to optimal in the “standard” minimax framework (see, [27]) up to an extra logarithmic term. According to our knowledge, Theorem 2 is the first one to examine an adaptive wavelet-based estimator for functional data in the context of nonparametric regression for an ergodic process. Ref. [49] studied the same estimator in a discrete time setting.
From the fact that the coefficients defined in Equations (22) and (23) depend on the unknown function , it is possible to use
The wavelet hard thresholding estimator is defined by , for all , by
Keep in mind the following elementary observation
We can infer, from the last equation, the following
Notice that
This gives
Under the conditions of Theorems 1 and 2, one can find a positive constant such that
A combination of Theorem 1 with Theorem 2 gives the following corollary.
Corollary 1.
Under the hypotheses of Theorems 1 and 2, there exists a constant in such a way that
for T large enough.
Remark 2.
Let be a stationary sequence. Consider the backward field and the forward field . The sequence is strongly mixing if
The sequence is ergodic if
where τ is the shift transformation or time-evolution. The labeling of strong mixing in the preceding definition is more rigorous than what is commonly referred to as strong mixing (when using the terminology of measure-preserving dynamical systems), meaning that it satisfies the following conditions
for any two measurable sets , see, for instance, [63]. Therefore, substantial mixing requires ergodicity, although the opposite is not necessarily true (see, for instance, Remark 2.6 on page 50 about Proposition 2.8 on page 51 in [64]).
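For completeness, the classical definitions alluded to in this remark can be written as follows (standard notation; α denotes the strong mixing coefficient and τ the shift transformation).

```latex
% Strong (alpha-)mixing: the mixing coefficient vanishes,
\alpha(n) \;=\; \sup_{A \in \mathcal{F}_{-\infty}^{0},\; B \in \mathcal{F}_{n}^{\infty}}
  \bigl| P(A \cap B) - P(A)\,P(B) \bigr|
  \;\xrightarrow[n \to \infty]{}\; 0 .
% Ergodicity: Cesaro averages of the same quantities converge,
\frac{1}{n}\sum_{k=1}^{n} P\bigl(A \cap \tau^{-k}B\bigr)
  \;\xrightarrow[n \to \infty]{}\; P(A)\,P(B)
  \qquad \text{for all measurable sets } A, B .
```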
Remark 3.
Ref. [43] provided an example of an ergodic but non-mixing process in their discussion, which can be summarized as follows: Let be a strictly stationary process such that is a Poisson process with parameter , where is the -field generated by . Assume
and is a given function. This process is not mixing in general (see Remark 3 of [65]). It is known that any sequence of independent and identically distributed random variables is ergodic. Hence, according to Proposition 2.10 in [64], it is easy to see that with
for some Borel-measurable function . Ref. [66] has constructed an example of a non-mixing ergodic continuous-time process. It is well known that the fractional Brownian motion with parameter has strictly stationary increments. Otherwise, the fractional Gaussian noise, defined for every by
is a strictly stationary-centered, long memory process when (for instance, see [67] p. 55, [68] p. 17), hence the condition of strong mixing is not satisfied. Let be a strictly stationary-centered Gaussian process with correlation function
Relying on the work of [69], Lemma 4.2, it follows that the process is ergodic whenever
which is the case for the process .
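For concreteness, the autocovariance of fractional Gaussian noise with Hurst index H is the standard expression below; it tends to zero at a hyperbolic rate, which is what drives the ergodicity claim, even though for H > 1/2 the decay is too slow (non-summable) for strong mixing.

```latex
% Autocovariance of fractional Gaussian noise with Hurst index H:
\gamma(k) \;=\; \tfrac{1}{2}\Bigl( |k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H} \Bigr)
        \;\sim\; H(2H-1)\,|k|^{2H-2}, \qquad k \to \infty,
% so gamma(k) -> 0; a stationary Gaussian process whose correlation
% function vanishes at infinity is ergodic.
```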
Remark 4.
In continuous time, sampling is frequently used to obtain data. Several discretization strategies, including deterministic and random sampling, have been proposed in the literature. The interested reader is referred to [70,71,72,73,74]. To simplify the concept, we consider the density estimator of based on and its sampled discrete sequence . The estimator of the sampled density is
As stated in [70], we can only recollect two designs: irregular sampling and random sampling.
- Deterministic sampling. Consider the situation where is deterministically irregularly spaced with for some . Let be the σ-field generated by . Clearly, is a family of increasing σ-fields.
- Random sampling. Assume that the instants in the interval form a sequence of uniform random variables independent of the process . Define as the corresponding order statistics. Observe that are the observation points for the procedure. Clearly, all of the distances between these sites are positive. As a consequence, taking the σ-field generated by , it follows that is a sequence of increasing σ-fields.
As established in [75], the penalization approach for the choice of the mesh δ of the observations provides an optimal rate of convergence; we leave this subject open for future investigation within the context of ergodic processes.
Remark 5.
In a previous publication [36], we tackled the nonparametric estimation of the density and regression functions in a finite-dimensional setting using an orthonormal wavelet basis. Our findings there differ significantly from those presented in this publication. In [36], we demonstrated the strong uniform consistency properties of these estimators over compact subsets of under a general ergodic condition on the underlying processes. In addition, we demonstrated the asymptotic normality of the wavelet-based estimators. The Burkholder–Rosenthal inequality, a more involved tool than the exponential inequality used in the previous publication, is the central ingredient of the present study. Significantly, the current study examines the mean integrated squared error over compact subsets, which is fundamentally different from the conclusions of the prior publication.
5. Applications
5.1. The Conditional Distribution
Our findings can be utilized to examine the conditional distribution for . To be more precise, let . The wavelet hard thresholding estimator of is defined, for all , by
where
A direct consequence of Theorem 2 is
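Schematically, the construction is the usual plug-in one: the conditional distribution is the regression operator applied to an indicator response, so Theorem 2 applies directly (generic notation, with ψ_y denoting the indicator below).

```latex
% Conditional distribution as a regression functional (generic notation):
F(y \mid \chi) \;=\; \mathbb{E}\bigl[\mathbb{1}_{\{Y \le y\}} \mid X = \chi\bigr]
              \;=\; r_{\psi_y}(\chi),
\qquad \psi_y(u) = \mathbb{1}_{\{u \le y\}},
% so the hard thresholding estimator of Section 4 applied to psi_y(Y)
% yields an estimator of F(. | chi).
```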
5.2. The Conditional Quantile
Remark that whenever is strictly increasing and continuous in a neighborhood of , the function has a unique quantile of order at a point , which is . In this situation
which can be estimated by . Consequently, using the fact that
and the fact that is continuous and strictly increasing, we then have
implying that,
In addition, suppose that, for fixed is differentiable at with
where is a real number, and is uniformly continuous for all . From Taylor’s expansion of the function in the neighborhood of , we have
where is a point between and . Using the fact that converges a.s. toward as T goes to infinity, combined with the uniform continuity of , allows us to write that
By the fact that is uniformly bounded from below, we can then claim that
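The inversion argument sketched above can be summarized, in generic notation, by the following chain: the Taylor expansion transfers the estimation error on the conditional distribution to the error on the quantile, provided the conditional density is bounded away from zero near the quantile.

```latex
% Quantile inversion via a first-order Taylor expansion (generic notation):
F\bigl(\widehat{\xi}_{\alpha}(\chi) \mid \chi\bigr)
  - F\bigl(\xi_{\alpha}(\chi) \mid \chi\bigr)
  \;=\; \bigl(\widehat{\xi}_{\alpha}(\chi) - \xi_{\alpha}(\chi)\bigr)\,
        f\bigl(\xi^{*}_{\alpha}(\chi) \mid \chi\bigr),
% with xi* between the two quantiles, whence
\bigl|\widehat{\xi}_{\alpha}(\chi) - \xi_{\alpha}(\chi)\bigr|
  \;\le\; \frac{\sup_{y}\,\bigl|\widehat{F}(y \mid \chi) - F(y \mid \chi)\bigr|}
               {\inf_{y \in \mathcal{N}(\xi_{\alpha}(\chi))} f(y \mid \chi)},
% where N(.) denotes a neighborhood of the true quantile.
```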
Remark 6.
(Expectile regression). For , the choice given by leads to quantities called expectiles by [76]. Expectiles, as defined by [76], may be introduced either as a generalization of the mean or as an alternative to quantiles. Indeed, classical regression provides us with a high sensitivity to extreme values, allowing for more reactive risk management. Quantile regression, on the other hand, provides the ability to acquire exhaustive information on the effect of the explanatory variable on the response variable by examining its conditional distribution; refer to [77,78,79] for further details on expectiles in functional data settings.
Remark 7.
(Conditional winsorized mean). As in [80], if we consider , , or , then will be the conditional winsorized mean. Notably, this parameter was not considered in the literature on nonparametric functional data analysis involving wavelet estimators. Our paper offers asymptotic results for the conditional winsorized mean when the covariates are functions.
5.3. Shannon’s Entropy
The differential (or Shannon) entropy of is defined to be
whenever this integral is meaningful. We adopt the convention 0 log 0 = 0, since u log u → 0 as u → 0. The notion of differential entropy was originally introduced in [81]. Since then, the concept of entropy has been the topic of extensive theoretical and applied research. We refer to [82] (Chapter 8) for a thorough overview of differential entropy, its applications, and its mathematical properties. Entropy concepts and principles play a fundamental role in many applications, such as quantization theory [83], statistical decision theory [84], and contingency table analysis [85]. Ref. [86] introduced the concept of convergence in entropy and showed that the latter implies convergence in . This property indicates that entropy is a useful concept for measuring “closeness in distribution”, and it also heuristically justifies the usage of sample entropy as a test statistic when designing entropy-based tests of goodness-of-fit. This line of research has been pursued in [87,88,89] (including the references therein). The idea here is that many families of distributions are characterized by the maximization of entropy subject to constraints (see [90,91]). Given in Equation (13), we estimate using representation (35), by setting
where
and is a sequence of positive constants. To prove the strong consistency of , we shall consider another, but more appropriate and more computationally convenient, centering factor than the expectation , which is delicate to handle. This is given by
We first decompose , as in [92,93,94,95], into the sum of two components, by writing
We remark that for all ,
Thus, for any , we obtain
Making use of Theorem 1, gives
for a positive constant C. Note that this is the first estimation result for the entropy of functional continuous-time series processes. A similar idea can be used to estimate other functionals of the density, such as the extropy; refer to [96] for a definition.
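As a purely illustrative sketch (hypothetical names; a grid-based plug-in with truncation, not the paper's exact construction), Shannon's entropy can be approximated from any density estimate as follows.

```python
import numpy as np

def plugin_entropy(density_values, grid_weights, eps=1e-6):
    """Plug-in estimate of Shannon's differential entropy.

    density_values : estimated density evaluated on a grid (e.g., a wavelet
                     hard-thresholding estimate)
    grid_weights   : quadrature weights of the grid points
    eps            : truncation level, mimicking the positive sequence used
                     in the paper to avoid taking the log of tiny values
    """
    f = np.asarray(density_values, dtype=float)
    w = np.asarray(grid_weights, dtype=float)
    mask = f > eps            # keep only points where the estimate exceeds the threshold
    return -np.sum(w[mask] * f[mask] * np.log(f[mask]))

# Toy usage: standard normal density on a grid (true entropy is about 1.4189).
x = np.linspace(-6.0, 6.0, 2001)
dens = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)
print(plugin_entropy(dens, np.full_like(x, x[1] - x[0])))
```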
5.4. The Curve Discrimination
The curve discrimination problem can be stated as follows. Let denote a sample of curves, and each of them is known to pertain to one among groups . We denote the group of the curve by . Suppose that each pair of variables has the same distribution as the pair . Given a new curve , the question is to determine its class membership. To do so, we will estimate the conditional probability for every , as follows:
Following the proposal made in [97,98], these probabilities can be estimated by
where
As remarked by [97,98], for each we make use of the notation
then we can write
An application of Theorem 2 is
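In short, each class probability is itself a regression function with an indicator response, and the discrimination rule assigns the new curve to the most probable class (generic notation):

```latex
% Class probabilities as regression functions and the plug-in rule
% (generic notation, with calligraphic G the set of groups):
p_g(\chi) \;=\; P\bigl(G = g \mid X = \chi\bigr)
          \;=\; \mathbb{E}\bigl[\mathbb{1}_{\{G = g\}} \mid X = \chi\bigr],
\qquad
\widehat{g}(\chi) \;=\; \arg\max_{g \in \mathcal{G}} \,\widehat{p}_{g}(\chi).
```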
6. Concluding Remarks
In this study, we investigated the nonparametric estimation of the density and regression functions for continuous-time functional stationary processes using wavelet bases for Hilbert spaces of functions. We characterized the mean integrated squared error over compact subsets. The martingale method was used to determine the asymptotic properties of these estimators, and it differs significantly from the techniques used in the mixing and independent settings. Ergodicity is the only assumption made on the dependence structure of the process. Extending nonparametric functional concepts to locally stationary processes is a relatively new area of research. It would be interesting to extend our work to the case of functional locally stationary processes, but this would require nontrivial mathematics and is well outside the scope of this paper. The exact logarithmic rates of convergence depend on the smoothness parameters of the functions and defined in the space . These results are typical of nonparametric estimation and are consistent with the vast majority of the literature. Although is unknown, the smoothness assumptions made here are more nuanced than the integer-order differentiability criteria required for convolution kernel methods. To make the estimator more practical, techniques have been developed to select the optimal adaptive value of , such as the widely used Stein method, the rule of thumb, and cross-validation; we refer to [99] and [27] for details on an asymptotically optimal empirical bandwidth selection rule. It would be worthwhile to investigate this subject in the future.
7. Proofs
This section is dedicated to proving our findings. The previously presented notations are used again in the sequel. In this research, we need an upper bound inequality for partial sums of unbounded martingale differences to derive the asymptotic results for the estimations of the density and regression functions based on strictly stationary and ergodic functional data. Here and in the following, “C” denotes a positive constant that may vary from line to line. The following lemmas express this inequality.
Lemma 1.
(Burkholder–Rosenthal inequality). Following Notation 1 in [100].
Let be a stationary martingale adapted to the filtration , define as the sequence of martingale differences adapted to and
then for any positive integer n,
where, as usual, the norm .
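For the reader's convenience, the classical form of this inequality for p ≥ 2 reads as follows (standard notation; see [100]).

```latex
% Burkholder-Rosenthal inequality (classical form, p >= 2):
% for martingale differences (d_i) with respect to a filtration (F_i),
\mathbb{E}\,\Bigl|\sum_{i=1}^{n} d_i\Bigr|^{p}
\;\le\; C_p \Biggl\{
   \mathbb{E}\Bigl( \sum_{i=1}^{n}
       \mathbb{E}\bigl(d_i^{2} \mid \mathcal{F}_{i-1}\bigr) \Bigr)^{p/2}
   \;+\; \sum_{i=1}^{n} \mathbb{E}\,|d_i|^{p}
\Biggr\},
% where C_p is a constant depending only on p.
```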
Lemma 2
([101]). Let denote a sequence of martingale differences in such a way that
then, for all and all n large enough, we obtain
The subsequent lemmas characterize the asymptotic behavior of estimators and .
Lemma 3.
For each and each , under conditions (C.0), (C.1), (F.1), and (E.1)(i), there is a constant in such a way that
Lemma 4.
For each and each , and under conditions (C.0), (C.1), (F.1), (E.1), and (E.2), and condition (16), there is a constant in such a way that
Lemma 5.
For each and each , for sufficiently large and under conditions (C.0), (C.1), (F.1), (E.1), and (E.2), and assumption (16) there is a constant in such a way that
Proof of Theorem 1.
Keep in mind that the proof of Theorem 1 is a direct application of [52] (Theorem 3.1), using Lemmas 3–5 alongside , , . We modified and expanded the approach used to prove Theorem 3.1 of [34] to include the stationary ergodic process. □
Proof of Lemma 3.
Consider the subsequent decomposition
where
Under the assumptions (C.0) and (C.1), we have
We readily obtain
implying that
Therefore, we infer that
We now consider the term . We infer
where
Notice that, with respect to the sequence of fields , is a sequence of martingale differences. By Lemma 1, we readily infer that
where
Observe that for all , the -field represents the trivial one. On the one hand, by using the classical decomposition in connection with the fact that is the trivial field, we have
Remark that, under the conditions (F.1) and (E.1)(i) and the fact that is an orthonormal basis of H, we obtain
where is a positive constant,
therefore,
Furthermore, we examine the second term of decomposition (49), and remark that
using the notable identity combined with Jensen’s inequality, we obtain
observe that, using (50), we have
Hence,
Therefore, there exists a constant , such that
Hence the proof is complete. □
Proof of Lemma 4.
Consider the following decomposition
where
Remark that, under conditions (F.1) and (E.1)(i) and making use of the fact that is an orthonormal basis of H, and reasoning in a similar way as in Equation (47), we infer that
This, in turn, implies that
Therefore, we obtain
Hence, we readily infer
where
Remark that, with respect to the sequence of fields , is a sequence of martingale differences, an application of the Burkholder–Rosenthal inequality (see Lemma 1), gives
Consider the first term of Equation (59). We recall that for all , the -field represents the trivial one. Applying Jensen’s inequality, we have
Using the Minkowski inequality, we obtain
This gives that
By reasoning in a similar way as in Equation (50) and making use of conditions (F.1) and (E.1)(i), for all , we infer that
where C denotes a positive constant. In addition, by the Cauchy–Schwarz inequality in combination with conditions (E.1)(ii), (E.2) and condition (16), we infer
We then obtain
Recall that , we deduce that
We now consider the upper bound of in Equation (59). Remark that
Observe that, for all and for all , we have . Making use of Jensen's and Minkowski's inequalities, it follows that
For all , using the stationarity of the process , we have
Remark, under conditions (F.1), (E.1)(i), (C.0), and (C.1) and statement (61), that
It follows that
We conclude that
This implies that there exists a constant , such that
The proof is achieved. □
Proof of Lemma 5.
Consider the previous decomposition in Lemma 4, to write that
where
Making use of Equation (57), we achieve the desired result for the term
We now remark
The application of Lemma 2 implies that
Let . Then, for all and for T sufficiently large, we have
where
By choosing such that we have
The proof of (45) is achieved. □
Recall that
where
Lemma 6.
For each and each and under conditions (E.1)(i), (M.1), (M.2), (C.0), and (C.1), there is a constant in such a way that
Lemma 7.
For each and each , and under conditions (E.1), (E.2) (M.1), (M.2), (C.0), and (C.1), in combination with condition (24), there is a constant in such a way that
Lemma 8.
For each and each , for sufficiently large, (E.1), (E.2) (M.1), (M.2), (C.0), and (C.1), in combination with condition (24), there is a constant in such a way that
Proof of Theorem 2.
Remark that the proof of Theorem 2 is a consequence of [52] (Theorem 3.1) with , , in connection with Lemmas 6–8. We generalized the method of the proof in [34] (Theorem 4.1). □
Proof of Lemma 6.
Consider the subsequent decomposition
where
For all , we have and, from the independence between and , we have
Observe that
This implies that
Using conditions (M.1), (M.2), (C.0), and (C.1), we obtain
We readily obtain that
implying that
Therefore, we infer that
Remark that, with respect to the sequence of fields , is a sequence of martingale differences. Proceeding as in the proof of Equation (54), we observe that
where
On the one hand, using Jensen and Minkowski’s inequalities combined, we obtain
It follows, from assumptions (M.1) and (M.2), that
combined with the independence between and , . Remark that, under condition (E.1)(i) and using the fact that is an orthonormal basis of H, we infer
Therefore, we have
Furthermore, we examine the second term of decomposition (79); proceeding as in the proof of (53) and considering (81), one obtains
therefore, combining Equations (82) and (83), we obtain
Hence, there exists a positive constant C such that
Thus, the proof of Lemma 6 is completed. □
Proof of Lemma 7.
Consider the subsequent decomposition
where
Remark that, under conditions (M.1), (M.2), and (C.1), Equation (76), and using the fact that is an orthonormal basis of H, by reasoning as in Equation (77), we obtain
giving that
Therefore, we have
Hence we obtain
where
Notice that is a sequence of martingale differences with respect to the sequence of fields , applying the Burkholder–Rosenthal inequality (see Lemma 1), we obtain
Consider the first term of Equation (90). Applying Jensen's and Minkowski's inequalities, we have
By proceeding as in Equation (81) and under the same conditions (M.1), (M.2), and (E.1)(i), for all , we obtain
where C is a positive constant. Moreover, by the Cauchy–Schwarz inequality in connection with conditions (E.1)(ii) and (E.2) and assumption (24), we have
It follows that
Let us now examine the upper bound of the second term of Equation (90). Remark that
Observe that, for all and for all , we have
Making use of Jensen and Minkowski’s inequality in the same manner as in Equation (91), it follows
From the independence between and , for all , we have
For all , using the stationarity of the process , we have
We deduce
where
It follows
Hence there exists a constant , such that
The proof is achieved. □
Proof of Lemma 8.
Statement (88) achieves the desired result for the term
We consider the next decomposition
where
and
and represents a constant that will be selected later. Remark that
where
First, we aim to bound the term of Equation (103). The Markov inequality and the Cauchy–Schwarz inequality yield
Observe that
Using Equation (92) combined with an elementary Gaussian inequality and taking to have
We obtain
where
Now, we are going to determine an upper bound for from decomposition (103). We begin by verifying the conditions of Lemma 2. Assuming that conditions (M.1) and (M.2) are fulfilled, in connection with Equation (93), we infer
which implies
where , let , then, for all with a sufficiently large T, we have
where
Taking such that we have
The proof of Equation (74) is achieved. □
8. Besov Spaces
In terms of wavelet coefficients, the Besov space was described in [23] as follows: if is a real-valued smoothness parameter of , then membership in the space is equivalent to
- (B.1)
- (B.2)
with
and
and the classical sup-norm modification for . The Besov spaces are useful for describing smoothness properties in functional estimation and approximation theory. They contain well-known spaces from statistical research, such as the Hilbert–Sobolev space , the Hölder space for , and others. We refer readers to [58] for additional descriptions of the Besov space and its merits in approximation theory and statistics. Reasoning as in [102], suppose that . Let
For , set
For set
For and define
For , put with an integer and . Define to be the space of functions in such that for all . The norm is defined by
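As a point of reference, the classical one-dimensional wavelet-coefficient characterization of the Besov norm (see [27,58]) can be written as follows; this is the textbook form rather than the exact Hilbert-space version used above.

```latex
% Classical wavelet characterization of the Besov norm B^s_{p,q}(R),
% with scaling function phi and mother wavelet psi (see [27]):
\|f\|_{B^{s}_{p,q}}
\;\asymp\;
\Bigl( \sum_{k} |\alpha_{j_0,k}|^{p} \Bigr)^{1/p}
+ \Biggl( \sum_{j \ge j_0}
    \Bigl( 2^{\,j(s + 1/2 - 1/p)}
      \Bigl( \sum_{k} |\beta_{j,k}|^{p} \Bigr)^{1/p} \Bigr)^{q}
  \Biggr)^{1/q},
\qquad
\alpha_{j_0,k} = \int f\,\phi_{j_0,k}, \quad
\beta_{j,k} = \int f\,\psi_{j,k},
% with the usual sup modification when p or q equals infinity.
```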
Author Contributions
S.D. and S.B.: conceptualization, methodology, investigation, writing—original draft, and writing—review and Editing. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Not applicable.
Acknowledgments
The authors would like to thank the Special Issue Editor of the Special Issue on “Functional Data Analysis: Theory and Applications to Different Scenarios”, Mustapha Rachdi, for the invitation. The authors are very grateful to the three referees whose comments helped in improving the quality of this work.
Conflicts of Interest
The authors declare that they have no competing interests.
References
- Bosq, D. Linear processes in function spaces. In Lecture Notes in Statistics; Springer: New York, NY, USA, 2000; Volume 149, pp. xiv+283. [Google Scholar] [CrossRef]
- Ramsay, J.O.; Silverman, B.W. Functional Data Analysis, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2005; pp. xx+426. [Google Scholar]
- Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer Series in Statistics; Springer: New York, NY, USA, 2006; pp. xx+258. [Google Scholar]
- Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Series in Statistics; Springer: New York, NY, USA, 2012; pp. xiv+422. [Google Scholar] [CrossRef]
- Zhang, J.T. Analysis of variance for functional data. In Monographs on Statistics and Applied Probability; CRC Press: Boca Raton, FL, USA, 2014; Volume 127, pp. xxiv+386. [Google Scholar]
- Shi, J.Q.; Choi, T. Gaussian Process Regression Analysis for Functional Data; CRC Press: Boca Raton, FL, USA, 2011; pp. xx+196. [Google Scholar]
- Geenens, G. Curse of dimensionality and related issues in nonparametric functional regression. Stat. Surv. 2011, 5, 30–43. [Google Scholar] [CrossRef]
- Cuevas, A. A partial overview of the theory of statistics with functional data. J. Statist. Plann. Inference 2014, 147, 1–23. [Google Scholar] [CrossRef]
- Shang, H.L. A survey of functional principal component analysis. AStA Adv. Stat. Anal. 2014, 98, 121–142. [Google Scholar] [CrossRef]
- Horváth, L.; Rice, G. An introduction to functional data analysis and a principal component approach for testing the equality of mean curves. Rev. Mat. Complut. 2015, 28, 505–548. [Google Scholar] [CrossRef]
- Müller, H.G. Peter Hall, functional data analysis and random objects. Ann. Statist. 2016, 44, 1867–1887. [Google Scholar] [CrossRef]
- Nagy, S. An overview of consistency results for depth functionals. In Functional Statistics and Related Fields; Springer: Cham, Switzerland, 2017; pp. 189–196. [Google Scholar]
- Vieu, P. On dimension reduction models for functional data. Statist. Probab. Lett. 2018, 136, 134–138. [Google Scholar] [CrossRef]
- Aneiros, G.; Cao, R.; Fraiman, R.; Genest, C.; Vieu, P. Recent advances in functional data analysis and high-dimensional statistics. J. Multivariate Anal. 2019, 170, 3–9. [Google Scholar] [CrossRef]
- Ling, N.; Vieu, P. Nonparametric modelling for functional data: Selected survey and tracks for future. Statistics 2018, 52, 934–949. [Google Scholar] [CrossRef]
- Goia, A.; Vieu, P. An introduction to recent advances in high/infinite dimensional statistics [Editorial]. J. Multivariate Anal. 2016, 146, 1–6. [Google Scholar] [CrossRef]
- Almanjahie, I.M.; Bouzebda, S.; Chikr Elmezouar, Z.; Laksaci, A. The functional kNN estimator of the conditional expectile: Uniform consistency in number of neighbors. Stat. Risk Model 2022, 38, 47–63. [Google Scholar] [CrossRef]
- Almanjahie, I.M.; Bouzebda, S.; Kaid, Z.; Laksaci, A. Nonparametric estimation of expectile regression in functional dependent data. J. Nonparametr. Stat. 2022, 34, 250–281. [Google Scholar] [CrossRef]
- Bouzebda, S.; Nezzal, A. Uniform consistency and uniform in number of neighbors consistency for nonparametric regression estimates and conditional U-statistics involving functional data. Jpn. J. Stat. Data Sci. 2022, 5, 431–533. [Google Scholar] [CrossRef]
- Bouzebda, S.; Nemouchi, B. Weak-convergence of empirical conditional processes and conditional U-processes involving functional mixing data. Stat. Inference Stoch. Process. 2022, 1–56. [Google Scholar] [CrossRef]
- Bouzebda, S.; Nemouchi, B. Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data. J. Nonparametr. Stat. 2020, 32, 452–509. [Google Scholar] [CrossRef]
- Almanjahie, I.M.; Kaid, Z.; Laksaci, A.; Rachdi, M. Estimating the Conditional Density in Scalar-On-Function Regression Structure: K-N-N Local Linear Approach. Mathematics 2022, 10, 902. [Google Scholar] [CrossRef]
- Meyer, Y. Wavelets and Operators; Cambridge University Press: Cambridge, UK, 1993; pp. 35–58. [Google Scholar] [CrossRef]
- Daubechies, I. Ten lectures on wavelets. In CBMS-NSF Regional Conference Series in Applied Mathematics; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1992; Volume 61, pp. xx+357. [Google Scholar] [CrossRef]
- Mallat, S. A Wavelet Tour of Signal Processing, 3rd ed.; Elsevier/Academic Press: Amsterdam, The Netherlands, 2009; pp. xxii+805. [Google Scholar]
- Vidakovic, B. Statistical Modeling by Wavelets; Wiley Series in Probability and Statistics: Applied Probability and Statistics; John Wiley & Sons Inc.: New York, NY, USA, 1999; pp. xiv+382. [Google Scholar] [CrossRef]
- Härdle, W.; Kerkyacharian, G.; Picard, D.; Tsybakov, A. Wavelets, approximation, and statistical applications. In Lecture Notes in Statistics; Springer: New York, NY, USA, 1998; Volume 129, pp. xviii+265. [Google Scholar] [CrossRef]
- Rao, B.L.S.P. Nonparametric estimation of the derivatives of a density by the method of wavelets. Bull. Inform. Cybernet. 1996, 28, 91–100. [Google Scholar] [CrossRef]
- Chaubey, Y.P.; Doosti, H.; Prakasa Rao, B.L.S. Wavelet based estimation of the derivatives of a density for a negatively associated process. J. Stat. Theory Pract. 2008, 2, 453–463. [Google Scholar] [CrossRef]
- Rao, B.L.S.P. Nonparametric Estimation of Partial Derivatives of a Multivariate Probability Density by the Method of Wavelets. In Asymptotics in Statistics and Probability, Papers in Honor of George Gregory Roussas; De Gruyter: Berlin, Germany, 2018; pp. 321–330. [Google Scholar] [CrossRef]
- Prakasa Rao, B.L.S. Wavelet estimation for derivative of a density in the presence of additive noise. Braz. J. Probab. Stat. 2018, 32, 834–850. [Google Scholar] [CrossRef]
- Allaoui, S.; Bouzebda, S.; Chesneau, C.; Liu, J. Uniform almost sure convergence and asymptotic distribution of the wavelet-based estimators of partial derivatives of multivariate density function under weak dependence. J. Nonparametr. Stat. 2021, 33, 170–196. [Google Scholar] [CrossRef]
- Allaoui, S.; Bouzebda, S.; Liu, J. Multivariate wavelet estimators for weakly dependent processes: Strong consistency rate. Comm. Statist. Theory Methods 2022, 1–34. [Google Scholar] [CrossRef]
- Chesneau, C.; Kachour, M.; Maillot, B. Nonparametric estimation for functional data by wavelet thresholding. REVSTAT 2013, 11, 211–230. [Google Scholar]
- Laib, N.; Louani, D. Nonparametric kernel regression estimation for functional stationary ergodic data: Asymptotic properties. J. Multivar. Anal. 2010, 101, 2266–2281. [Google Scholar] [CrossRef]
- Bouzebda, S.; Didi, S.; El Hajj, L. Multivariate wavelet density and regression estimators for stationary and ergodic continuous time processes: Asymptotic results. Math. Methods Statist. 2015, 24, 163–199. [Google Scholar] [CrossRef]
- Bouzebda, S.; Didi, S. Additive regression model for stationary and ergodic continuous time processes. Comm. Statist. Theory Methods 2017, 46, 2454–2493. [Google Scholar] [CrossRef]
- Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Comm. Statist. Theory Methods 2017, 46, 1367–1406. [Google Scholar] [CrossRef]
- Bouzebda, S.; Didi, S. Some asymptotic properties of kernel regression estimators of the mode for stationary and ergodic continuous time processes. Rev. Mat. Complut. 2021, 34, 811–852. [Google Scholar] [CrossRef]
- Bouzebda, S.; Didi, S. Some results about kernel estimators for function derivatives based on stationary and ergodic continuous time processes with applications. Comm. Statist. Theory Methods 2022, 51, 3886–3933. [Google Scholar] [CrossRef]
- Bouzebda, S.; Chaouch, M. Uniform limit theorems for a class of conditional Z-estimators when covariates are functions. J. Multivar. Anal. 2022, 189, 104872. [Google Scholar] [CrossRef]
- Bouzebda, S.; Chaouch, M.; Didi, S. Asymptotics for function derivatives estimators based on stationary and ergodic discrete time processes. Ann. Inst. Statist. Math. 2022, 74, 1–35. [Google Scholar] [CrossRef]
- Leucht, A.; Neumann, M.H. Degenerate U- and V-statistics under ergodicity: Asymptotics, bootstrap and applications in statistics. Ann. Inst. Statist. Math. 2013, 65, 349–386. [Google Scholar] [CrossRef]
- Aït-Sahalia, Y. Maximum likelihood estimation of discretely sampled diffusions: A closed-form approximation approach. Econometrica 2002, 70, 223–262. [Google Scholar] [CrossRef]
- Jiang, G.J.; Knight, J.L. A nonparametric approach to the estimation of diffusion processes, with an application to a short-term interest rate model. Econom. Theory 1997, 13, 615–645. [Google Scholar] [CrossRef]
- Kutoyants, Y.A. Efficient density estimation for ergodic diffusion processes. Stat. Inference Stoch. Process. 1998, 1, 131–155. [Google Scholar] [CrossRef]
- Banon, G. Nonparametric identification for diffusion processes. SIAM J. Control Optim. 1978, 16, 380–395. [Google Scholar] [CrossRef]
- Castellana, J.V.; Leadbetter, M.R. On smoothed probability density estimation for stationary processes. Stoch. Process. Appl. 1986, 21, 179–193. [Google Scholar] [CrossRef]
- Didi, S.; Alharby, A.; Bouzebda, S. Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time. Mathematics 2022, 10, 3433. [Google Scholar] [CrossRef]
- Goh, S.S. Wavelet bases for Hilbert spaces of functions. Complex Var. Elliptic Equ. 2007, 52, 245–260. [Google Scholar] [CrossRef]
- Kerkyacharian, G.; Picard, D. Density estimation in Besov spaces. Statist. Probab. Lett. 1992, 13, 15–24. [Google Scholar] [CrossRef]
- Kerkyacharian, G.; Picard, D. Thresholding algorithms, maxisets and well-concentrated bases. Test 2000, 9, 283–344. [Google Scholar] [CrossRef]
- Prakasa Rao, B.L.S. Nonparametric density estimation for functional data via wavelets. Commun. Stat. Theory Methods 2010, 39, 1608–1618. [Google Scholar] [CrossRef]
- Cohen, A.; DeVore, R.; Kerkyacharian, G.; Picard, D. Maximal spaces with given rate of convergence for thresholding algorithms. Appl. Comput. Harmon. Anal. 2001, 11, 167–191. [Google Scholar] [CrossRef]
- DeVore, R.A. Nonlinear approximation. In Acta Numerica, 1998; Cambridge University Press: Cambridge, UK, 1998; Volume 7, pp. 51–150. [Google Scholar] [CrossRef]
- Autin, F. Point de vue Maxiset en Estimation non Paramétrique. Ph.D. Thesis, Université Paris, Paris, France, 2004. [Google Scholar]
- Gasser, T.; Hall, P.; Presnell, B. Nonparametric estimation of the mode of a distribution of random curves. J. R. Stat. Soc. Ser. B Stat. Methodol. 1998, 60, 681–691. [Google Scholar] [CrossRef]
- Donoho, D.L.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Density estimation by wavelet thresholding. Ann. Statist. 1996, 24, 508–539. [Google Scholar] [CrossRef]
- Ferraty, F.; Vieu, P. Nonparametric models for functional data, with application in regression, time-series prediction and curve discrimination. J. Nonparametr. Stat. 2004, 16, 111–125, The International Conference on Recent Trends and Directions in Nonparametric Statistics. [Google Scholar] [CrossRef]
- Peškir, G. The uniform mean-square ergodic theorem for wide sense stationary processes. Stoch. Anal. Appl. 1998, 16, 697–720. [Google Scholar] [CrossRef]
- Delecroix, M. Sur l’Estimation et la Prévision non Paramétrique des Processus Ergodiques Cycles économiques et Taches Solaires. Ph.D. Thesis, Université Lille, Lille, France, 1987. [Google Scholar]
- Didi, S.; Louani, D. Asymptotic results for the regression function estimate on continuous time stationary and ergodic data. Stat. Risk Model. 2014, 31, 129–150. [Google Scholar] [CrossRef]
- Rosenblatt, M. Uniform ergodicity and strong mixing. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 1972, 24, 79–84. [Google Scholar] [CrossRef]
- Bradley, R.C. Introduction to Strong Mixing Conditions; Kendrick Press: Heber City, UT, USA, 2007; Volume 3, pp. xii+597. [Google Scholar]
- Neumann, M.H. Absolute regularity and ergodicity of Poisson count processes. Bernoulli 2011, 17, 1268–1284. [Google Scholar] [CrossRef]
- Didi, S. Quelques propriétés Asymptotiques en Estimation non Paramétrique de Fonctionnelles de Processus Stationnaires en Temps Continu. Ph.D. Thesis, Université Pierre et Marie Curie, Paris, France, 2014. [Google Scholar]
- Beran, J. Statistics for Long-Memory Processes. In Monographs on Statistics and Applied Probability; Chapman and Hall: New York, NY, USA, 1994; Volume 61, pp. x+315. [Google Scholar]
- Lu, Z. Analysis of Stationary and Non-Stationary Long Memory Processes: Estimation, Applications and Forecast. 2009. Available online: https://tel.archives-ouvertes.fr/tel-00422376/document (accessed on 10 October 2022).
- Maslowski, B.; Pospíšil, J. Ergodicity and parameter estimates for infinite-dimensional fractional Ornstein-Uhlenbeck process. Appl. Math. Optim. 2008, 57, 401–429. [Google Scholar] [CrossRef]
- Masry, E. Probability density estimation from sampled data. IEEE Trans. Inform. Theory 1983, 29, 696–709. [Google Scholar] [CrossRef]
- Prakasa Rao, B.L.S. Nonparametric density estimation for stochastic processes from sampled data. Publ. Inst. Statist. Univ. Paris 1990, 35, 51–83. [Google Scholar]
- Prakasa Rao, B.L.S. Statistical inference for diffusion type processes. In Kendall’s Library of Statistics; Edward Arnold: London, UK; Oxford University Press: New York, NY, USA, 1999; Volume 8, pp. xvi+349. [Google Scholar]
- Bosq, D. Nonparametric Statistics for Stochastic Processes, 2nd ed.; Springer: New York, NY, USA, 1998; Volume 110, pp. xvi+210. [Google Scholar] [CrossRef]
- Blanke, D.; Pumo, B. Optimal sampling for density estimation in continuous time. J. Time Ser. Anal. 2003, 24, 1–23. [Google Scholar] [CrossRef]
- Comte, F.; Merlevède, F. Adaptive estimation of the stationary density of discrete and continuous time mixing processes. ESAIM Probab. Statist. 2002, 6, 211–238. [Google Scholar] [CrossRef]
- Newey, W.K.; Powell, J.L. Asymmetric least squares estimation and testing. Econometrica 1987, 55, 819–847. [Google Scholar] [CrossRef]
- Mohammedi, M.; Bouzebda, S.; Laksaci, A. On the nonparametric estimation of the functional expectile regression. C. R. Math. Acad. Sci. Paris 2020, 358, 267–272. [Google Scholar] [CrossRef]
- Mohammedi, M.; Bouzebda, S.; Laksaci, A. The consistency and asymptotic normality of the kernel type expectile regression estimator for functional data. J. Multivar. Anal. 2021, 181, 104673. [Google Scholar] [CrossRef]
- Litimein, O.; Laksaci, A.; Mechab, B.; Bouzebda, S. Local linear estimate of the functional expectile regression. Statist. Probab. Lett. 2023, 192, 109682. [Google Scholar] [CrossRef]
- Huber, P.J. Robust estimation of a location parameter. Ann. Math. Statist. 1964, 35, 73–101. [Google Scholar] [CrossRef]
- Shannon, C.E. A mathematical theory of communication. Bell System Tech. J. 1948, 27, 379–423, 623–656. [Google Scholar] [CrossRef]
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley Series in Telecommunications; John Wiley & Sons Inc.: New York, NY, USA, 1991; pp. xxiv+542. [Google Scholar] [CrossRef]
- Rényi, A. On the dimension and entropy of probability distributions. Acta Math. Acad. Sci. Hungar. 1959, 10, 193–215. [Google Scholar] [CrossRef]
- Kullback, S. Information Theory and Statistics; John Wiley & Sons, Inc: New York, NY, USA; Chapman & Hall, Ltd.: London, UK, 1959; pp. xvii+395. [Google Scholar]
- Gokhale, D.V.; Kullback, S. The information in contingency tables. In Statistics: Textbooks and Monographs; Marcel Dekker, Inc.: New York, NY, USA, 1978; Volume 23, pp. x+365. [Google Scholar]
- Csiszár, I. Informationstheoretische Konvergenzbegriffe im Raum der Wahrscheinlichkeitsverteilungen. Magy. Tud. Akad. Mat. Kutató Int. Közl. 1962, 7, 137–158. [Google Scholar]
- Vasicek, O. A test for normality based on sample entropy. J. Roy. Statist. Soc. Ser. B 1976, 38, 54–59. [Google Scholar] [CrossRef]
- Dudewicz, E.J.; van der Meulen, E.C. Entropy-based tests of uniformity. J. Amer. Statist. Assoc. 1981, 76, 967–974. [Google Scholar] [CrossRef]
- Ebrahimi, N.; Habibullah, M.; Soofi, E. Testing exponentiality based on Kullback-Leibler information. J. Roy. Statist. Soc. Ser. B 1992, 54, 739–748. [Google Scholar] [CrossRef]
- Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620–630. [Google Scholar] [CrossRef]
- Verdugo Lazo, A.C.G.; Rathie, P.N. On the entropy of continuous probability distributions. IEEE Trans. Inform. Theory 1978, 24, 120–122. [Google Scholar] [CrossRef]
- Bouzebda, S.; Elhattab, I. A strong consistency of a nonparametric estimate of entropy under random censorship. C. R. Math. Acad. Sci. Paris 2009, 347, 821–826. [Google Scholar] [CrossRef]
- Bouzebda, S.; Elhattab, I. Uniform in bandwidth consistency of the kernel-type estimator of the Shannon’s entropy. C. R. Math. Acad. Sci. Paris 2010, 348, 317–321. [Google Scholar] [CrossRef]
- Bouzebda, S.; Elhattab, I.; Nemouchi, B. On the uniform-in-bandwidth consistency of the general conditional U-statistics based on the copula representation. J. Nonparametr. Stat. 2021, 33, 321–358. [Google Scholar] [CrossRef]
- Bouzebda, S.; El-hadjali, T. Uniform convergence rate of the kernel regression estimator adaptive to intrinsic dimension in presence of censored data. J. Nonparametr. Stat. 2020, 32, 864–914. [Google Scholar] [CrossRef]
- Lad, F.; Sanfilippo, G.; Agrò, G. Extropy: Complementary dual of entropy. Statist. Sci. 2015, 30, 40–58. [Google Scholar] [CrossRef]
- Ferraty, F.; Vieu, P. The functional nonparametric model and application to spectrometric data. Comput. Statist. 2002, 17, 545–564. [Google Scholar] [CrossRef]
- Ouassou, I.; Rachdi, M. Regression operator estimation by delta-sequences method for functional data and its applications. AStA Adv. Stat. Anal. 2012, 96, 451–465. [Google Scholar] [CrossRef]
- Hall, P.; Penev, S. Cross-validation for choosing resolution level for nonlinear wavelet curve estimators. Bernoulli 2001, 7, 317–341. [Google Scholar] [CrossRef]
- Burkholder, D.L. Distribution function inequalities for martingales. Ann. Probab. 1973, 1, 19–42. [Google Scholar] [CrossRef]
- de la Peña, V.H.; Giné, E. Decoupling; Probability and its Applications (New York); Springer: New York, NY, USA, 1999; pp. xvi+392. [Google Scholar] [CrossRef]
- Masry, E. Wavelet-based estimation of multivariate regression functions in Besov spaces. J. Nonparametr. Statist. 2000, 12, 283–308. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).