Abstract
This study introduces a wavelet-based framework for estimating derivatives of a general regression function within discrete-time, stationary ergodic processes. The analysis focuses on deriving the integrated mean squared error (IMSE) over compact subsets of , while also establishing rates of uniform convergence and the asymptotic normality of the proposed estimators. To investigate their asymptotic behavior, we adopt a martingale-based approach specifically adapted to the ergodic nature of the data-generating process. Importantly, the framework imposes no structural assumptions beyond ergodicity, thereby circumventing restrictive dependence conditions. By establishing the limiting behavior of the wavelet estimators under these minimal assumptions, the results extend existing findings for independent data and highlight the flexibility of wavelet methods in more general stochastic settings.
Keywords:
regression estimation; stationarity; ergodicity; rates of strong convergence; wavelet-based estimators; martingale differences; discrete time; stochastic processes; time series
MSC:
62G07; 62G08; 62G05; 62G20; 62H05; 60G42; 60G46
1. Introduction
Nonparametric estimation has attracted sustained attention over many years, leading to a broad spectrum of methodological developments. Given its extensive range of applications and significant role within mathematical statistics, the estimation of both density and regression functions has become a central research theme. Among the most frequently employed approaches are kernel-type estimators, recognized for their flexibility and robust performance. For comprehensive treatments of these methods and their applications, the reader is referred to [1,2], and the references therein. Estimating derivatives of a function—whether a density or a regression function—serves as a powerful technique in statistical data analysis. Despite their importance, however, the nonparametric estimation of higher-order density derivatives has not been extensively explored. A key objective of this work is thus to investigate wavelet-based nonparametric estimators for the partial derivatives of multivariate densities. Derivative estimation arises in many disciplines, including economics and industry, where complex systems must often be modeled under limited prior knowledge. For instance, the second-order derivative of a density can support statistical tests for mode detection [3] and also guides bandwidth selection in kernel density estimation [4]. In nonparametric signal estimation, the logarithmic derivative of the density—defined by the ratio of the derivative to the density itself—is instrumental for optimal filtering and interpolation procedures [5], thereby making precise estimation crucial for accurate signal processing. Moreover, gradient estimation is essential for filament detection in point cloud data, a task of broad relevance in medical imaging, remote sensing, seismology, and cosmology [6]. Additional motivations and applications appear in regression analysis, Fisher information estimation, parameter estimation, and hypothesis testing [7]. 
Seminal investigations into density derivative estimation include [8,9,10], among others. Further context for related statistical challenges, such as regression, Fisher information estimation, parameter estimation, and hypothesis testing, may be found in [7].
It is likewise well documented that estimating first- and higher-order regression derivatives holds substantial practical importance. Examples include modeling human growth processes [11], assessing kidney function in lupus nephritis patients [12], and analyzing Raman spectra of bulk materials [13]. In nonparametric regression, derivative estimation is indispensable for constructing confidence intervals [14], selecting kernel bandwidths [15], and comparing regression curves [16]. In the setting of a homoscedastic regression model, ref. [17] proposed kernel M-estimators to estimate the first derivative of the regression function in a nonparametric fashion, and extended these ideas heuristically to higher-order derivatives. Derivative information also underpins modal regression, an alternative to traditional regression methods for investigating the relationship between a response variable and a predictor variable ; see [18]. Foundational studies of regression function estimation can be found in [19,20]. In contrast, fewer works address the estimation of derivatives for stationary densities or regression functions, with most established results pertaining to independent and identically distributed data. Wavelet-based techniques have recently attracted growing interest in statistical estimation, largely owing to their adaptability to varying function regularity and their capacity to handle discontinuities effectively. Furthermore, wavelet methods often lead to computationally efficient algorithms that require relatively modest memory. For a detailed overview of wavelet methodologies in nonparametric functional estimation, see [21]. Examples of wavelet applications include estimating the integrated squared derivative of a univariate density for independent data [22] and for negatively or positively associated sequences [23]. 
The work of [24] extended these methods to partial derivatives of multivariate densities under independence, while [25,26] addressed the mixing scenario. More recently, ref. [27] studied wavelet estimators for the partial derivatives of multivariate densities under additive noise.
To the best of our knowledge, methods for wavelet-based estimation of partial derivatives of multivariate densities have not yet been extended to more general dependence structures beyond the strong mixing framework. Addressing this gap forms the central motivation of our work. In particular, we draw on a collection of martingale-based techniques that markedly differ from the tools typically employed under strong mixing conditions. Nevertheless, as the subsequent sections will illustrate, bridging this gap entails much more than merely combining pre-existing methodologies: it requires advanced mathematical developments tailored to wavelet-based estimation in an ergodic setting.
The remainder of this article is organised as follows. Section 2 reviews the necessary mathematical foundations and introduces the proposed class of linear wavelet estimators. Section 3 specifies the underlying assumptions and presents the principal theoretical results—uniform convergence rates and asymptotic normality established under weak-dependence conditions. Section 4 applies the methodology to regression-derivative estimation, whereas Section 5 explores its extension to mode regression. Concluding remarks and avenues for further investigation are given in Section 6. To maintain the flow of exposition, all proofs are collected in Section 7.
Notation
Unless otherwise specified, C denotes a positive constant whose value may change from line to line. We write for the indicator function of a set A. For two sequences of positive real numbers and , we use the Landau notation
when there exists a constant such that for all sufficiently large n, and
when .
2. Mathematical Backgrounds
2.1. Besov Spaces
Throughout this study we adopt the Besov scale with parameters and . Meyer’s wavelet characterization [28] shows that, for and any , the inclusion is equivalent to the finiteness of either of the two seminorms
- (B.1)
- (B.2)
where the scaling and wavelet coefficients are defined by
and
When , the summations in and are replaced by the essential supremum. The Besov framework subsumes many classical smoothness classes that are ubiquitous in statistics and machine learning—for instance, the Sobolev (Hilbert) space corresponds to , while the (non-integer) Hölder–Lipschitz–Zygmund class coincides with . Additional equivalent formulations, together with their advantages in approximation theory and statistical methodology, are detailed in [29,30,31,32] and in Appendix A.
2.2. Linear Wavelets Estimator
This section opens with a concise review of the essentials of wavelet theory, following the notation of [28]. Let be a multiresolution analysis of the Hilbert space . Denote by the scaling function and by the associated orthogonal wavelet, both assumed to be r-regular () and compactly supported in the hyper-cube for some . For every integer j and every index vector , define
The family forms an orthonormal basis of ; moreover, each partial derivative of up to total order r is rapidly decreasing. Specifically, for every integer there exists a constant such that (see [28], p. 29, Thm.2)
In addition, there exist precisely companion wavelets . Together they generate the collection
which constitutes an orthonormal basis of .
We next recall the notions of strong mixing and ergodicity and the relationship between them. Let be a strictly stationary sequence, and set
and
The sequence is (-)strongly mixing if
It is ergodic when
where denotes the left-shift operator. The foregoing definition of strong mixing is stricter than the one often used for measure-preserving transformations—namely, for all measurable sets [33]. While strong mixing implies ergodicity, the converse need not hold (cf. Remark 2.6, p. 50, and Proposition 2.8, p. 51, of [34]). A number of authors have argued that an ergodic, rather than a strongly mixing, dependence structure is often preferable; see, for example, the discussion and illustrative examples in [35].
Finally, let be a random vector with and . Define the joint distribution function (df) of by
Henceforth, for vectors and , we write if and only if for all . We let and be two fixed subsets of such that
with
Suppose has a joint density function
with respect to the Lebesgue measure . Denote
as the marginal density of on . Let be a measurable function. We are chiefly interested in estimating derivatives of the form
where
When the regression function
is sufficiently smooth, we may also consider its derivative
Assume now that . According to [36], for any integer , the function admits an expansion in the subspace of the multiresolution analysis:
where each wavelet coefficient can be represented (via repeated integration by parts) as
Let denote the -order partial derivative of . We define the linear estimator of at the resolution level (the precise divergence rate of will be specified below) by
where is the unbiased empirical estimator of :
Remark 1.
Since and are bounded and compactly supported—where the support grows in a controlled manner with increasing differentiability—it follows that, for any fixed point , only finitely many terms in the sums over contribute to the value of the wavelet expansions (see [36] for further details). This implies pointwise convergence of the expansions.
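To make the construction concrete, the empirical coefficients and the resulting projection estimator can be sketched in a few lines. The following is a minimal, hypothetical illustration for the univariate density case (d = 1, derivative order p = 0) using the Haar scaling function; Haar is compactly supported but not r-regular, so a smoother (e.g., Daubechies) scaling function would be needed for derivative estimation. All function names below are our own.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function phi = 1_[0,1): a compactly supported
    stand-in for the scaling function of the multiresolution analysis."""
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def linear_wavelet_density(sample, j, grid):
    """Linear wavelet density estimator at resolution level j (d = 1, p = 0):
    alpha_hat[k] = n^{-1} sum_i phi_{j,k}(X_i), phi_{j,k}(x) = 2^{j/2} phi(2^j x - k)."""
    scale = 2.0 ** j
    # Only finitely many k contribute, since phi has compact support (cf. Remark 1).
    ks = np.arange(int(np.floor(scale * min(grid.min(), sample.min()))) - 1,
                   int(np.ceil(scale * max(grid.max(), sample.max()))) + 1)
    alpha = np.array([np.sqrt(scale) * haar_phi(scale * sample - k).mean() for k in ks])
    basis = np.sqrt(scale) * haar_phi(scale * grid[:, None] - ks[None, :])
    return basis @ alpha

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 5000)          # true density equals 1 on [0, 1]
grid = np.linspace(0.05, 0.95, 10)
est = linear_wavelet_density(x, j=3, grid=grid)
```

On a uniform sample the estimate is close to 1 on the interior of [0, 1], illustrating the pointwise convergence discussed in Remark 1.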
3. Assumptions and Main Results
To facilitate the presentation of our main results, we introduce the following notation. Let us denote by the density function of the random variable Y. Let be the -field generated by , let and and be the conditional densities of and Y respectively, given the field . Define the field
The following assumptions are imposed throughout the paper.
- (C.1)
- For every , the sequence converges to as , both almost surely (a.s.) and in the sense.
- (C.2)
- Moreover, for every , again in both the almost sure and senses.
- (C.3)
- (i)
- Assume that the multiresolution analysis is r-regular.
- (ii)
- The density for some .
- (C.4)
- (i)
- The density for some .
- (ii)
- The conditional density for some .
- (N.0)
- for some .
- (N.1)
- For any
We may refer to [37] for further details.
- (N.2)
- (i)
- The conditional mean of given the field depends only on , i.e., for any ,
- (ii)
- for all , and the function is continuous at
- (N.3)
- (i)
- We shall suppose that there exist two known constants such that
- (ii)
- The regression function satisfies a Hölder condition; that is, there exist constants and such that, for any
- (iii)
- The regression function satisfies a Hölder condition; that is, there exist constants and such that, for any
Lemma 1.
Under condition (C.3), for , we have
Define the kernel by
Theorem 1.
Under the stated assumptions (C.1), (C.4)(i), (N.0) and (N.2)(i), let r be an element of the Besov space with , and . In this setting, the linear wavelet estimator satisfies
Theorem 2.
Assume that assumptions (C.2), (N.1), (N.2), (N.3)(i) and
Define and assume that for slowly enough such that
For every compact subset , under assumptions (C.1) and (C.3), we have almost surely,
Remark 2.
The factor that appears in the preceding theorems corresponds directly to the bandwidth in Parzen–Rosenblatt kernel density estimation. In practice, however, selecting the multiresolution level within the wavelet framework is typically much simpler than choosing . Because only a small, discrete set of candidate levels—usually three or four—needs to be examined, the procedure remains both conceptually transparent and computationally inexpensive. Adaptive choice of most commonly relies on Stein’s unbiased risk estimate, the classical rule of thumb, and cross-validation. Comprehensive treatments of these methods, as well as their use in establishing asymptotically optimal data-driven bandwidth selection rules, are provided by [38] and [21]. For the univariate setting (), the cross-validation criterion at resolution level j is given by
The statistic depends solely on the observations and the resolution index j. The optimal level is selected by
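For the Haar (histogram-type) linear estimator in the univariate case, the leave-one-out criterion above admits a closed form, which makes the selection rule easy to sketch. The code below is an illustrative stand-in rather than the paper's exact procedure; the Beta sample and the candidate levels are our assumptions.

```python
import numpy as np

def cv_score(sample, j):
    """Leave-one-out cross-validation score
    CV(j) = int fhat_j^2 - (2/n) sum_i fhat_j^{(-i)}(X_i)
    for the Haar (histogram-type) linear wavelet estimator on [0, 1)."""
    n = len(sample)
    counts = np.bincount(np.minimum((sample * 2 ** j).astype(int), 2 ** j - 1),
                         minlength=2 ** j)
    integral_sq = 2 ** j * np.sum((counts / n) ** 2)
    # Closed form: fhat^{(-i)}(X_i) = 2^j (n_k - 1)/(n - 1) on the bin k containing X_i.
    loo = 2 ** j * np.sum(counts * (counts - 1)) / (n * (n - 1))
    return integral_sq - 2 * loo

rng = np.random.default_rng(1)
data = rng.beta(2, 2, 2000)              # smooth density on [0, 1]
levels = [1, 2, 3, 4, 5, 6, 7]
best_j = min(levels, key=lambda j: cv_score(data, j))
```

The selected level balances the squared-bias contribution (too small j) against the variance contribution, which grows like 2^j/n (too large j).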
Remark 3.
Kernel estimators suffer a well-documented decline in accuracy as the dimension of the covariate space grows—a manifestation of the curse of dimensionality [39]. In high dimensions, the volume of a local neighbourhood expands so rapidly that an impractically large sample is required to collect even a modest number of observations inside it. Unless the sample size is very large, one must therefore adopt bandwidths so wide that the resulting estimate is no longer truly local. A thorough exposition, complete with numerical illustrations, appears in [40], while more recent analyses are given in [41,42]. Although penalised splines enjoy considerable empirical popularity, their asymptotic behaviour is still not fully resolved even in classical non-parametric settings, and only a handful of theoretical investigations address the issue. In addition, most functional-regression techniques minimise an $L_2$ criterion and thus remain sensitive to outliers. Promising, if less conventional, alternatives include methods based on delta sequences and wavelet representations.
3.1. Asymptotic Normality Results
This section establishes a central limit theorem for the estimator defined in Equation (3). This problem was considered in several papers. In [10,43], the authors analyzed a non-parametric regression model with repeated measurements corrupted by mixing errors and, under mild moment and mixing-coefficient assumptions, established the r-th mean consistency, almost-sure consistency, complete consistency, optimal almost-sure convergence rates, and asymptotic normality of the associated wavelet estimator. For deconvolution density estimation with moderately ill-posed noise, ref. [44] proves the asymptotic normality of wavelet estimators when the target density lies in suitable Besov spaces. The study in [45] revisits non-parametric regression, deriving asymptotic normality of the wavelet estimator when the errors constitute an asymptotically negatively associated sequence. In a censored-data setting, ref. [46] treats density estimation where survival and censoring times form a stationary -mixing sequence, again demonstrating asymptotic normality of the wavelet estimator. A fixed-design non-parametric regression model driven by strongly mixing errors is considered in [47], which likewise confirms asymptotic normality for the wavelet estimator of the regression function. Wavelet methods for linear processes—Gaussian or otherwise—with long, short, or negative memory are examined in [48]; both the log-regression and Whittle wavelet estimators are shown to be asymptotically normal, and explicit expressions for the limiting variance are derived via a general result for the suitably centred and normalised scalogram. Finally, ref. [49] proposes a wavelet estimator for fixed-design non-parametric regression with strictly stationary, associated errors, proving pointwise weak consistency and uniform asymptotic normality. For additional treatments of asymptotic normality of wavelet estimators in alternative frameworks, see [10,50,51]. 
Our results are obtained under mild regularity conditions on the estimator and only minimal bandwidth assumptions. We write
to indicate that the sequence of random variables converges in distribution to a mean-zero normal distribution with covariance matrix .
Theorem 3.
Assume that the hypotheses (C.1)–(C.4), (N.0), (N.2)(ii) and (N.3)(iii) are satisfied. Additionally, suppose that
Then, the following convergence holds:
where
and
The proof of Theorem 3 is detailed in Section 7.
Remark 4.
Invoking Theorem 1, we obtain a refined bound for the mean-squared error of the wavelet-based estimator of multivariate density derivatives. Under the usual smoothness and moment assumptions,
A parallel analysis, this time relying on Theorem 3, yields an asymptotic normality result. Specifically, for any fixed ,
with asymptotic variance
Both results are consistent with the theoretical developments in Theorem 4.8 of [52], further demonstrating the efficacy of wavelet methods for non-parametric estimation in multivariate contexts.
Remark 5.
A noteworthy case of arises when . In this setting, from (3) we obtain
where is the unbiased estimator given by
Since this is a specific instance of the estimator in (3), Theorem 3 implies that
where
This result is pivotal for determining the asymptotic behavior of wavelet estimators derived from the conditional distribution.
3.2. Confidence Interval
The asymptotic variance that features in the central limit theorem depends on two unknown components—the conditional variance of Y given and the density of —and therefore must be estimated in practice. We obtain consistent estimates by employing a bounded, compactly supported wavelet basis drawn from the literature, such as the widely used Daubechies family [36]. Using a sufficiently large multiresolution level and an adaptively chosen initial level $j_0$, we recover the requisite nuisance parameters via wavelet-based estimation and subsequently insert them through a plug-in scheme. Using the estimators (3) we establish a consistent estimate of the variance given by
where is the wavelet estimator of and
The considered coefficients estimators are given, respectively, for any , by
and
The approximate confidence intervals of can be obtained as
where denotes the quantile of the standard normal distribution.
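As a minimal plug-in sketch, assuming a CLT normalization of the form sqrt(n 2^{-jd})(r_hat - r) converging to N(0, sigma^2), consistent with Theorem 3, and taking the plug-in variance estimate as given, the interval can be computed as follows; all numerical inputs here are hypothetical.

```python
from statistics import NormalDist

def wavelet_ci(point_est, sigma2_hat, n, j, d, alpha=0.05):
    """Asymptotic (1 - alpha) interval: point_est +/- z_{1-alpha/2} * sqrt(sigma2_hat / (n 2^{-jd})),
    mirroring the normalization sqrt(n 2^{-j_n d}) (r_hat - r) -> N(0, sigma^2).
    sigma2_hat is assumed to come from the plug-in wavelet estimates of Section 3.2."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * (sigma2_hat / (n * 2.0 ** (-j * d))) ** 0.5
    return point_est - half, point_est + half

lo, hi = wavelet_ci(point_est=0.8, sigma2_hat=2.5, n=5000, j=4, d=1, alpha=0.05)
```

The effective sample size n 2^{-jd} shrinks as the resolution level grows, so the interval widens with j, as expected for a local estimator.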
4. Application to the Regression Derivatives
In this section, we follow the same notation as in [20]. We consider, in particular, the conditional expectation of given , for . Recall that
Recall the following derivatives
and
We estimate the derivatives of in (15) and in (16) by replacing , , , and with , , , , and , respectively. We thus define and when . The definition of and is completed by setting when .
The following theorem is more or less a straightforward consequence of Theorem 2.
Corollary 1.
Under the assumptions of Theorem 2, we have
5. Mode Regression
Now the location and size of the mode of are estimated from the respective functionals and pertaining to , i.e., is chosen through the equation
Then, a natural estimator of is
Note that the estimate is not necessarily unique and our results are valid for any chosen value satisfying (18). We point out that we can specify our choice by taking
It is known that kernel estimators tend to produce some additional and superfluous modality. However, this has no bearing on the asymptotic theory; our results are valid for any choice of satisfying (18). Following [53], we have the convergence result given in the following corollary.
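A hedged univariate sketch of the argmax rule in (18): we use a level-j Haar regressogram as a stand-in for the wavelet regression estimator and take the leftmost maximizer, matching the convention adopted above. The data-generating model and all names are our own illustration.

```python
import numpy as np

def regressogram(x, y, j):
    """Haar-level-j (regressogram) estimate of r = psi/f on [0, 1):
    on bin k, r_hat equals the average of the Y_i whose X_i fall in bin k."""
    bins = np.minimum((x * 2 ** j).astype(int), 2 ** j - 1)
    num = np.bincount(bins, weights=y, minlength=2 ** j)
    den = np.bincount(bins, minlength=2 ** j)
    return num / np.maximum(den, 1)

def mode_location(x, y, j):
    """Estimated mode location: a maximizer of r_hat as in (18),
    taking the leftmost one when the argmax is not unique."""
    r_hat = regressogram(x, y, j)
    k = int(np.argmax(r_hat))            # np.argmax returns the first maximizer
    return (k + 0.5) / 2 ** j            # midpoint of the maximizing bin

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 4000)
y = np.exp(-40.0 * (x - 0.3) ** 2) + 0.1 * rng.standard_normal(4000)  # r has mode at 0.3
theta_hat = mode_location(x, y, j=4)
```

The spurious extra modes mentioned above would simply produce ties or nearby local maxima; the leftmost-argmax convention makes the estimator well defined in all cases.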
Corollary 2.
Under the assumptions of Theorem 2, we have
Remark 6.
Using weak-convergence techniques in the space of continuous functions, ref. [54] strengthened Parzen’s distributional result under broadly weaker assumptions. He showed that an appropriately rescaled kernel-density estimator converges weakly to a randomly translated parabola that passes through the origin and possesses a fixed second derivative. In [55], the key insight is that, on a shrinking neighbourhood around the mode, the estimator f converges weakly—under very general conditions—to a Gaussian process whose mean function is a parabola through the origin. This mean depends on the kernel moments and on the unknown density and its derivatives evaluated at the mode. More precisely, define
where denotes the mode of f. The process is thus a normalised version of the estimator in an interval centred at the mode. The limiting Gaussian process is given by
where , , is fixed,
and is a mean-zero Gaussian process with covariance
The function is defined by
Ref. [55] establishes weak convergence of to Z for independent data. A central step in his Theorem 2.1 is to express as an average of i.i.d. variables, facilitating covariance computations and convergence of finite-dimensional distributions. Tightness is obtained via Theorem 15.7 of [56], which reduces the argument to evaluating the variance of independent variables (relation 2.9). The present study tackles the same problem via alternative methods, extending the framework of [18] to censored, dependent observations. Adapting Eddy’s approach to this setting would be an attractive direction for future work. Such an extension, however, requires new probabilistic results analogous to those in [55] but tailored to dependent—e.g., mixing—samples, a significant undertaking that we leave for subsequent research.
Remark 7.
We note that, when , may be obtained likewise through the usual Leibniz expansion of derivatives of products given by
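For concreteness, the Leibniz expansion referred to here is, in the univariate case, the first display below; assuming, as in Section 4, that the numerator functional factors as psi = r f, it yields the recursion in the second display for the derivatives of r (our rearrangement, valid wherever f does not vanish):

```latex
\frac{d^{p}}{dx^{p}}\bigl(u(x)\,v(x)\bigr)
  = \sum_{k=0}^{p}\binom{p}{k}\,u^{(k)}(x)\,v^{(p-k)}(x),
\qquad
r^{(p)} = \frac{1}{f}\Bigl(\psi^{(p)} - \sum_{k=0}^{p-1}\binom{p}{k}\,r^{(k)}\,f^{(p-k)}\Bigr).
```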
Remark 8.
It is well known that the accuracy of kernel estimators deteriorates as the dimension of the predictor space grows. This phenomenon—popularly termed the curse of dimensionality [39]—stems from the fact that, in high dimensions, an impractically large number of observations is needed within any local neighborhood to secure a dependable estimate. When the available sample size is modest, practitioners are forced to adopt a bandwidth so wide that the very notion of “local” averaging effectively collapses. A thorough treatment of these difficulties, complemented by illustrative numerical experiments, can be found in [40]; see also the more recent contributions [41,42,57] for additional insights. Despite their popularity, penalized splines remain theoretically under-explored: rigorous results describing their asymptotic behaviour are sparse even in classical non-parametric settings. Moreover, a large proportion of functional-regression techniques are predicated on minimising the $L_2$ loss, rendering them particularly susceptible to outlying observations. More robust alternatives—such as methods based on delta sequences or on wavelet decompositions—offer promising directions for mitigating this sensitivity.
Remark 9.
Let us recall some integral functions of the density function
and
Notice that the functional is a special case of . The functionals and appear in plug-in data-driven bandwidth selection procedures in density estimation (refer to [58] and the references therein), and the functional arises as part of the variance in non-parametric location and regression estimation based on linear rank statistics (see especially [59]). Consider the following general class of integral functionals of the density:
where F is a cumulative distribution function on with derivatives ; refer to [60,61] for more details. One can estimate such functionals by the plug-in method, making use of the wavelet estimates of the density function and its derivatives. The proof of such a statement, however, would require a different methodology from that used in the present paper, and we leave this problem open for future research.
Remark 10.
The nonlinear thresholding framework furnishes an alternative class of estimators for the unknown density . Specifically,
where denotes an appropriately selected threshold. In the univariate case (), this estimator was first proposed by [29]. Extending its theoretical guarantees and empirical performance to higher-dimensional settings remains an intriguing direction for future research.
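A minimal univariate sketch of such a thresholded estimator, assuming the Haar wavelet and a universal-type threshold lambda = sqrt(log n / n) purely for illustration (the appropriate threshold is left unspecified in the text):

```python
import numpy as np

def haar_detail_coeffs(sample, j):
    """Empirical Haar wavelet (detail) coefficients at level j on [0, 1):
    beta_hat[k] = n^{-1} sum_i psi_{j,k}(X_i), psi = 1_[0,1/2) - 1_[1/2,1)."""
    t = sample * 2 ** j
    k = np.minimum(t.astype(int), 2 ** j - 1)
    sign = np.where(t - k < 0.5, 1.0, -1.0)      # first vs second half of the dyadic bin
    beta = np.zeros(2 ** j)
    np.add.at(beta, k, 2 ** (j / 2) * sign)
    return beta / len(sample)

def hard_threshold(beta, lam):
    """Keep only the empirically large coefficients: beta * 1{|beta| > lam}."""
    return np.where(np.abs(beta) > lam, beta, 0.0)

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 2000)                  # uniform: all true detail coefficients are 0
beta = haar_detail_coeffs(x, j=2)
lam = np.sqrt(np.log(len(x)) / len(x))           # an illustrative universal-type threshold
kept = hard_threshold(beta, lam)
```

On uniform data the population coefficients vanish, so thresholding suppresses most of the purely stochastic fluctuations, which is the adaptivity mechanism behind the nonlinear estimator.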
Remark 11.
The sup-norm convergence rates proved in our theorems coincide with the optimal rates reported by [62,63,64]. The exact logarithmic factors depend on the resolution level, which is itself governed by the smoothness index s of the target function f in the Besov space . Such a dependence on s is a hallmark of non-parametric estimation and is widely documented in the literature. By allowing f to possess general Besov regularity, our analysis relaxes the customary requirement of an integer-order derivative that underpins classical convolution-kernel methods, even though s is typically unknown in practice.
Remark 12.
To demonstrate that the assumptions stated in [65] are attainable, we provide three illustrative examples in which they are fulfilled:
- Long-memory discrete-time processes: Let be white noise with variance ; denote the identity and back-shift operators by I and B, respectively. According to [66] (Theorem 1, p. 55), the k-factor Gegenbauer process satisfies with when and when for . This specification yields a stationary, causal, invertible series that exhibits long-range dependence. Moreover, it admits the moving-average representation and the condition guarantees asymptotic stability. Nevertheless, ref. [67] shows that if is Gaussian, the process is not strongly mixing. Even so, the moving-average form secures stationarity, Gaussianity, and ergodicity, clarifying the subtle influence of mixing conditions and emphasizing the interpretive value of the moving-average representation for long-memory dynamics.
- Stationary solution of a linear Markov process: Consider where are independent symmetric Bernoulli variables taking the values and 1. As shown in [68], this model is not α-mixing because of its dependence structure. It nevertheless remains stationary, Markov, and ergodic, illustrating that strong mixing is not necessary for either Markovianity or ergodicity—a point of direct relevance to statistical inference for time series and functional data.
- A stationary process with an representation: Let be an independent and identically distributed sequence uniformly distributed on , and define so that constitute the decimal expansion of . The series is stationary and can be written in form: where is strong white noise. Although it fails the α-mixing criterion [69] (Example A.3, p. 349), the process is ergodic. This confirms that ergodicity may persist even in the absence of strong mixing, underscoring its suitability for non-parametric functional data analysis.
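The second example above is easy to simulate. The sketch below takes the innovation values to be 0 and 1 (the exact values are elided in the text, so this is an assumption on our part, following the classical Andrews-type construction); ergodic averages still converge to the stationary expectation even though the chain is not α-mixing.

```python
import numpy as np

def linear_markov_chain(n, seed=0):
    """Simulate X_t = (X_{t-1} + eps_t) / 2 with i.i.d. Bernoulli(1/2) innovations
    eps_t in {0, 1}: a stationary, Markov, ergodic chain that is not alpha-mixing.
    Its stationary law is Uniform[0, 1]."""
    rng = np.random.default_rng(seed)
    eps = rng.integers(0, 2, size=n)
    x = np.empty(n)
    x[0] = rng.uniform()                  # start from the stationary distribution
    for t in range(1, n):
        x[t] = 0.5 * (x[t - 1] + eps[t])
    return x

path = linear_markov_chain(200_000)
ergodic_mean = path.mean()                # ergodic theorem: converges to E[X] = 1/2
```

Intuitively, X_t encodes the entire innovation history in its binary expansion, which is what destroys the mixing property while leaving ergodicity intact.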
Remark 13.
The paper already includes several well-known examples from the literature of processes that are ergodic yet not mixing. Broadening this to encompass more general linear Markov models—such as ARMA or ARIMA with —would certainly be of considerable interest. Nonetheless, pursuing this extension would necessitate a meticulous and comprehensive study to rigorously delineate the circumstances under which such processes remain ergodic while failing to satisfy mixing conditions.
6. Concluding Remarks
This study tackles the estimation of partial derivatives of multivariate regression functions. We introduce a family of non-parametric estimators based on linear wavelet methods and provide a rigorous theoretical analysis. Specifically, we establish strong uniform consistency on compact subsets of and derive the corresponding convergence rates. We also prove the estimators’ asymptotic normality and extend these results to ergodic processes, thereby broadening the scope of existing theory.
A key open problem is how to select smoothing parameters optimally so as to minimise the mean-squared error; addressing this question will be the focus of future work. Other promising directions include extending the methodology to functional ergodic data—a task that presents substantial mathematical challenges—and adapting it to settings with incomplete observations, such as data missing at random or subject to various censoring mechanisms, particularly in spatially dependent contexts.
Further progress could come from relaxing the stationarity assumption to accommodate locally stationary processes and from developing comparable results for stationary continuous-time models. Both extensions would require fundamentally different analytical tools.
Finally, extensive numerical experiments on simulated and real datasets would enhance the practical relevance of our procedures, and the development of weighted bootstrap techniques—building on recent contributions such as [70,71]—offers another fruitful avenue for investigation.
7. Proofs
We derive an upper bound for the partial sums of unbounded martingale differences—a key ingredient in analyzing the asymptotic properties of the wavelet estimator based on strictly stationary, ergodic observations. Throughout the paper, C denotes a generic positive constant that may change from one occurrence to the next. The necessary inequality is presented in the ensuing lemmas.
Lemma 2.
(Burkholder-Rosenthal inequality) Following Notation 1 in [72].
Let be a stationary martingale adapted to the filtration ; let denote the sequence of martingale differences adapted to , and
then for any positive integer n,
where as usual the norm
Lemma 3.
Let be a sequence of real martingale differences with respect to the sequence of fields , where is the σ-field generated by the random variables . Set
For any and any , assume that there exist nonnegative constants C and such that
Then, for any , we have
where
Proof of Lemma 3.
The proof follows as a particular case of Theorem 8.2.2 due to [73]. □
To prove Theorem 1, we utilize the following two lemmas. Define the conditional expectation of
Lemma 4.
For any , under the assumptions (C.1) and (N.2)(i), the following holds
Proof of Lemma 4.
Observe that by invoking assumptions (N.2)(i) and (C.1), and the fact that , we can further expand as follows:
Consequently,
□
Lemma 5.
For any , under the assumption (N.2)(i),the following holds:
Proof of Lemma 5.
Lemma 6.
For any , under the assumptions (C.1), (C.4)(i), (N.0), and (N.2)(i), the following holds:
Proof of Lemma 6.
We present the following decomposition:
Therefore, we conclude that
By applying Lemma 4 and its assumptions, especially statement (22), one obtains
Moreover, using the same statement (22), we deduce
We now direct our focus to the first term in the decomposition (31). Observe that
Observe that forms a martingale difference sequence with respect to the filtration . Applying Lemma 2, we directly conclude:
Moreover, the following holds:
To examine these components, we employ a standard decomposition approach and observe that corresponds to the trivial -algebra. Consequently, the following result is derived:
By invoking the Cauchy-Schwarz inequality we obtain
Under assumption (N.0), there exists a constant such that
On the other hand, by employing a first-order Taylor expansion alongside Equation (1) and assumption (C.4)(i), we obtain
This leads to
Next, we analyze the second component of the decomposition in Equation (34). Specifically, we consider
Applying a standard identity, we find
Using Equation (36), it follows that
Combining the results from Equations (42) and (38), we derive
Consequently,
Finally, by integrating the findings from Equations (32), (33), and (39), we conclude that
This completes the proof. □
Proof of Lemma 1.
The analysis of the bias term is purely analytical and follows from arguments analogous to those in [24], as it remains unaffected by the dependence structure. For brevity, we omit the detailed derivation. □
Proof of Theorem 1.
Building on standard wavelet estimation techniques (see [21] for a detailed exposition), we derive the main result in the following manner. First, by applying the definition of projector normality, we obtain
Next, by invoking Lemma 6 and using the facts that and , we deduce
□
Proof of Theorem 2.
Now, we consider the following decomposition:
To establish the uniform consistency of the above term, we will follow the method used in [74]. Let us introduce the truncated version of as follows. Let
and
Throughout this work, the notation represents the indicator function of the set A. In a similar way as in (10), we write as an extended kernel estimator
Furthermore, we define:
and
We begin by breaking down the initial term of the Equation (40), into three distinct components, expressed as a sum through the following formulation:
Recalling statement (7), we obtain readily that
Markov inequality, in turn, implies that
under assumption (N.0) and using the fact that
By the Borel-Cantelli lemma,
almost surely for all sufficiently large n. Given the monotonicity of , this implies for all . Combining these results with the inequality (46) as in [74], we conclude that
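For completeness, the direction of the Borel–Cantelli lemma invoked here is the standard first half, stated generically (the specific events are those defined by the displays above):

```latex
\sum_{n \ge 1} \mathbb{P}(A_n) < \infty
\;\Longrightarrow\;
\mathbb{P}\Big(\limsup_{n \to \infty} A_n\Big) = 0,
```

so that, almost surely, only finitely many of the events occur, which is exactly the "for all sufficiently large n" conclusion used in the argument.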
Once more, by (7), we infer that
Observe that the measurability of the functions and , combined with the ergodicity of , ensures that the process satisfies condition (N.1). Applying the Hölder and Markov inequalities, for any and exponents fulfilling
that
This, in turn, readily implies that
which gives
Lastly, we turn our attention to the second term of the decomposition (45). Given the compactness of D, there exists a finite covering by cubes , each centered at with side length , where . It follows directly that
where
we readily infer that
The statement (9) allows us to infer that
which implies that
A similar argument shows, likewise, that
We now analyze the term on the right-hand side of (49) and show that
Observe that
where, for ,
forms a sequence of martingale-difference arrays relative to the -field . We now utilize Lemma 3 for partial sums of unbounded martingale differences to obtain an upper bound. Observe that
By using the fact that
is -measurable, it then follows that
recalling the fact that
and applying assumption (N.2)(ii), for any integer , we obtain
where is a positive function, as stipulated in assumption (N.2)(ii); this ensures that
where is a positive constant. It follows readily from (7),
Statements (53), (54), and (55) give the following upper bound:
We apply Lemma 3 to the summation of the martingale differences , where
and
Consequently, there exists a positive constant such that the following inequalities hold:
Observe that
this implies that when we have
we readily obtain, under condition (11), that
The assertion (52) is established through a standard application of the Borel–Cantelli lemma. Consequently, the result stated in Theorem 2 follows from the decomposition (40), whose first term was rewritten in (45) as a sum of three terms and controlled via statements (47) and (48), together with the decomposition (49) and the estimates (50), (51), and (52). To complete the argument, we now analyze the second term on the right-hand side of (40). Employing the same reasoning as in (54), and in view of assumptions (N.2)(i) and (N.3)(i), we observe that
Making use of the Cauchy-Schwarz inequality and statement (7) when
we obtain readily that
Under Assumption (C.2), we deduce that
Hence the proof is complete. □
Proof of Theorem 3.
Recall the decomposition
Observe that
Thus, employing reasoning analogous to that used for (57), under hypothesis (C.2), statement (7) with , and Condition (12), we directly conclude that
We turn our attention to the term and observe that under assumption (C.3), we have
On the other hand, by Condition (12), we have
Observe that
Here, constitutes a martingale difference array adapted to the filtration . This justifies applying the martingale central limit theorem for discrete-time arrays (cf. [75]) to demonstrate the asymptotic normality of . To achieve this, it suffices to verify the following conditions:
- (a)
- Lyapunov condition:
- (b)
- Lindeberg condition:
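For reference, in the martingale CLT for triangular arrays (cf. [75]), conditions of this type take the following generic form; we state them in generic notation for a martingale-difference array with respect to a filtration, since the paper's specific normalizations appear in the displays above:

```latex
% Convergence of the conditional variances:
\sum_{i=1}^{n} \mathbb{E}\!\left[\xi_{n,i}^{2} \mid \mathcal{F}_{n,i-1}\right]
  \xrightarrow{\;\mathbb{P}\;} \sigma^{2},
% and the conditional Lindeberg condition:
\sum_{i=1}^{n} \mathbb{E}\!\left[\xi_{n,i}^{2}\,
  \mathbf{1}\{|\xi_{n,i}| > \varepsilon\} \mid \mathcal{F}_{n,i-1}\right]
  \xrightarrow{\;\mathbb{P}\;} 0
  \quad \text{for every } \varepsilon > 0,
```

which together yield the asymptotic normality of the normalized partial sums.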
Proof of part (a).
Observe that
Employing the same reasoning as in (54), together with a first-order Taylor expansion alongside Equation (7), and in view of assumptions (N.2)(i), (N.3)(i), and (C.4)(ii), we obtain
Therefore, we infer
The ergodicity of the process implies that the process defined by is also ergodic and satisfies condition (C.2), which means that
in both the almost sure and senses. This implies
The statement (a) follows then from
Observe that by the fact that (see [28])
By assumption (N.2)(ii), we have
Under assumptions (C.1), (C.4)(ii), and (N.3)(iii) we have
Observe that
Combining Equations (66) and (67) we deduce
Proof of (b).
The Lindeberg condition follows from Corollary 9.5.2 in [76], which implies that
Let and such that
Making use of the Hölder and Markov inequalities, one can write, for all
Therefore, by using the condition (7) when , we obtain
Hence, under (N.0) we infer that
Combining statements (68) and (69), we achieve the proof of the theorem. □
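As an illustrative numerical sanity check (not part of the proof), one can simulate a simple stationary, ergodic martingale-difference sequence and verify that normalized partial sums are approximately standard normal, in the spirit of the martingale CLT invoked above. The ARCH(1)-type construction below is a hypothetical toy example, not the estimator studied in this paper; its parameters are chosen so the stationary variance equals one.

```python
import numpy as np

rng = np.random.default_rng(0)

def arch_mds(n, omega=0.5, alpha=0.5, rng=rng):
    """Generate an ARCH(1) martingale-difference sequence:
    e_t = sigma_t * Z_t with sigma_t^2 = omega + alpha * e_{t-1}^2,
    so E[e_t | F_{t-1}] = 0 and the stationary variance is
    omega / (1 - alpha) = 1 for these parameter values."""
    z = rng.standard_normal(n)
    e = np.empty(n)
    prev = 0.0
    for t in range(n):
        sigma2 = omega + alpha * prev ** 2
        e[t] = np.sqrt(sigma2) * z[t]
        prev = e[t]
    return e

# Normalized partial sums S_n / sqrt(n) over many replications:
# the martingale CLT predicts they are approximately N(0, 1).
n, reps = 2000, 500
sums = np.array([arch_mds(n).sum() / np.sqrt(n) for _ in range(reps)])

print(round(float(sums.mean()), 2), round(float(sums.var()), 2))
```

The empirical mean should be close to 0 and the empirical variance close to 1, consistent with the limiting normal distribution.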
Author Contributions
Conceptualization, S.D. and S.B.; methodology, S.D. and S.B.; validation, S.D. and S.B.; formal analysis, S.D. and S.B.; investigation, S.D. and S.B.; resources, S.D. and S.B.; writing—original draft preparation, S.D. and S.B.; writing—review and editing, S.D. and S.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data are contained within the article.
Acknowledgments
The authors gratefully acknowledge Qassim University, represented by the Deanship of Scientific Research. The authors extend their sincere gratitude to the Editor-in-Chief, the Associate Editor, and the four reviewers for their invaluable feedback and for pointing out a number of oversights in the version initially submitted. Their insightful comments have greatly refined and focused the original work, resulting in markedly improved presentation.
Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A. Besov Spaces
Following [74], let and introduce the shift operator
Define
with the usual modification
For first-order regularity one replaces the first difference with a second difference:
and
For the Besov space
When write
with and . Then iff every weak derivative with satisfies
equivalently,
Prominent examples include the Sobolev space and the class of bounded s-Lipschitz functions .
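For orientation, the classical definition underlying the constructions above expresses the Besov norm through the modulus of smoothness (see, e.g., [78,79]); in standard notation, with an integer r exceeding the smoothness index s,

```latex
\omega_r(f, t)_p \;=\; \sup_{|h| \le t} \big\| \Delta_h^{r} f \big\|_{L^p},
\qquad
\|f\|_{B^{s}_{p,q}} \;=\; \|f\|_{L^p}
  + \left( \int_0^\infty \Big[\, t^{-s}\, \omega_r(f, t)_p \,\Big]^q \,\frac{dt}{t} \right)^{1/q},
\quad r > s,
```

with the usual modification (a supremum over t > 0) when q is infinite.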
Remark A1.
For density estimation over a Sobolev ball of smoothness s, the minimax -risk is ; see [77]. Comprehensive surveys of the relationships between classical function spaces and Besov spaces—including Fourier-analytic characterisations of Sobolev spaces when —are given in [78,79]. Connections between Besov spaces and the spaces of functions of bounded p-variation are developed in [80], relying on interpolation theory from [32]; a traditional exposition of p-variation may be found in [81]. For Besov spaces on more general geometric settings such as manifolds or Dirichlet spaces, consult [82].
References
- Wand, M.P.; Jones, M.C. Kernel smoothing. In Monographs on Statistics and Applied Probability; Chapman and Hall, Ltd.: London, UK, 1995; Volume 60, pp. xii+212.
- Eggermont, P.P.B.; LaRiccia, V.N. Maximum Penalized Likelihood Estimation. Volume II. Regression; Springer Series in Statistics; Springer: Dordrecht, The Netherlands, 2009; pp. xx+571.
- Genovese, C.R.; Perone-Pacifico, M.; Verdinelli, I.; Wasserman, L. Non-parametric inference for density modes. J. R. Stat. Soc. Ser. B Stat. Methodol. 2016, 78, 99–126.
- Noh, Y.K.; Sugiyama, M.; Liu, S.; du Plessis, M.C.; Park, F.C.; Lee, D.D. Bias reduction and metric learning for nearest-neighbor estimation of Kullback-Leibler divergence. Neural Comput. 2018, 30, 1930–1960.
- Dobrovidov, A.V.; Ruds’ko, I.M. Bandwidth selection in nonparametric estimator of density derivative by smoothed cross-validation method. Autom. Remote Control 2010, 71, 209–224.
- Genovese, C.R.; Perone-Pacifico, M.; Verdinelli, I.; Wasserman, L. On the path density of a gradient field. Ann. Stat. 2009, 37, 3236–3271.
- Singh, R.S. Applications of estimators of a density and its derivatives to certain statistical problems. J. R. Stat. Soc. Ser. B 1977, 39, 357–363.
- Meyer, T.G. Bounds for estimation of density functions and their derivatives. Ann. Stat. 1977, 5, 136–142.
- Silverman, B.W. Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Stat. 1978, 6, 177–184.
- Allaoui, S.; Bouzebda, S.; Chesneau, C.; Liu, J. Uniform almost sure convergence and asymptotic distribution of the wavelet-based estimators of partial derivatives of multivariate density function under weak dependence. J. Nonparametr. Stat. 2021, 33, 170–196.
- Ramsay, J.O.; Silverman, B.W. Applied Functional Data Analysis; Springer Series in Statistics; Springer: New York, NY, USA, 2002; pp. x+190, Methods and case studies.
- Ramsay, J.O.; Silverman, B.W. Functional Data Analysis, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2005; pp. xx+426.
- Liu, S.; Kong, X. A generalized correlated Cp criterion for derivative estimation with dependent errors. Comput. Statist. Data Anal. 2022, 171, 107473.
- Eubank, R.L.; Speckman, P.L. Confidence bands in nonparametric regression. J. Amer. Statist. Assoc. 1993, 88, 1287–1301.
- Ruppert, D.; Sheather, S.J.; Wand, M.P. An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc. 1995, 90, 1257–1270.
- Park, C.; Kang, K.H. SiZer analysis for the comparison of regression curves. Comput. Statist. Data Anal. 2008, 52, 3954–3970.
- Härdle, W.; Gasser, T. On robust kernel estimation of derivatives of regression functions. Scand. J. Statist. 1985, 12, 233–240.
- Ziegler, K. On the asymptotic normality of kernel regression estimators of the mode in the nonparametric random design model. J. Statist. Plann. Inference 2003, 115, 123–144.
- Georgiev, A.A. Speed of convergence in nonparametric kernel estimation of a regression function and its derivatives. Ann. Inst. Statist. Math. 1984, 36, 455–462.
- Deheuvels, P.; Mason, D.M. General asymptotic confidence bands based on kernel-type function estimators. Stat. Inference Stoch. Process. 2004, 7, 225–277.
- Härdle, W.; Kerkyacharian, G.; Picard, D.; Tsybakov, A. Wavelets, approximation, and statistical applications. In Lecture Notes in Statistics; Springer: New York, NY, USA, 1998; Volume 129, pp. xviii+265.
- Prakasa Rao, B.L.S. Nonparametric estimation of the derivatives of a density by the method of wavelets. Bull. Inf. Cybern. 1996, 28, 91–100.
- Chaubey, Y.P.; Doosti, H.; Prakasa Rao, B.L.S. Wavelet based estimation of the derivatives of a density with associated variables. Int. J. Pure Appl. Math. 2006, 27, 97–106.
- Rao, B.L.S.P. Nonparametric Estimation of Partial Derivatives of a Multivariate Probability Density by the Method of Wavelets. In Asymptotics in Statistics and Probability. Papers in Honor of George Gregory Roussas; Puri, M.L., Ed.; De Gruyter: Berlin, Germany; Boston, MA, USA, 2000; pp. 321–330.
- Hosseinioun, N.; Doosti, H.; Niroumand, H.A. Nonparametric estimation of a multivariate probability density for mixing sequences by the method of wavelets. Ital. J. Pure Appl. Math. 2011, 28, 31–40.
- Koshkin, G.; Vasil’iev, V. An estimation of a multivariate density and its derivatives by weakly dependent observations. In Statistics and Control of Stochastic Processes. The Liptser Festschrift. Papers from the Steklov Seminar Held in Moscow, Russia, 1995–1996; World Scientific: Singapore, 1997; pp. 229–241.
- Prakasa Rao, B.L.S. Wavelet estimation for derivative of a density in the presence of additive noise. Braz. J. Probab. Stat. 2018, 32, 834–850.
- Meyer, Y. Wavelets and operators. In Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 1992; Volume 37, pp. xvi+224.
- Donoho, D.L.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Density estimation by wavelet thresholding. Ann. Statist. 1996, 24, 508–539.
- Schneider, C. Beyond Sobolev and Besov—Regularity of solutions of PDEs and their traces in function spaces. In Lecture Notes in Mathematics; Springer: Cham, Switzerland, 2021; Volume 2291, pp. xviii+327.
- Sawano, Y. Theory of Besov spaces. In Developments in Mathematics; Springer: Singapore, 2018; Volume 56, pp. xxiii+945.
- Peetre, J. New Thoughts on Besov Spaces; Duke University Mathematics Series, No. 1; Duke University, Mathematics Department: Durham, NC, USA, 1976; pp. vi+305.
- Rosenblatt, M. Uniform ergodicity and strong mixing. Z. Wahrscheinlichkeitstheorie Verw. Geb. 1972, 24, 79–84.
- Bradley, R.C. Introduction to Strong Mixing Conditions; Kendrick Press: Heber City, UT, USA, 2007; Volume 3, pp. xii+597.
- Didi, S.; Bouzebda, S. Wavelet Density and Regression Estimators for Continuous Time Functional Stationary and Ergodic Processes. Mathematics 2022, 10, 4356.
- Daubechies, I. Ten lectures on wavelets. In CBMS-NSF Regional Conference Series in Applied Mathematics; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1992; Volume 61, pp. xx+357.
- Peškir, G. The uniform mean-square ergodic theorem for wide sense stationary processes. Stoch. Anal. Appl. 1998, 16, 697–720.
- Hall, P.; Penev, S. Cross-validation for choosing resolution level for nonlinear wavelet curve estimators. Bernoulli 2001, 7, 317–341.
- Bellman, R. Adaptive Control Processes: A Guided Tour; Princeton University Press: Princeton, NJ, USA, 1961; pp. xvi+255.
- Scott, D.W.; Wand, M.P. Feasibility of multivariate density estimates. Biometrika 1991, 78, 197–205.
- Bouzebda, S. Weak convergence of the conditional single index U-statistics for locally stationary functional time series. AIMS Math. 2024, 9, 14807–14898.
- Bouzebda, S. Limit Theorems in the Nonparametric Conditional Single-Index U-Processes for Locally Stationary Functional Random Fields under Stochastic Sampling Design. Mathematics 2024, 12, 1996.
- Shen, A.; Li, X.; Zhang, Y.; Qiu, Y.; Wang, X. Consistency and asymptotic normality of wavelet estimator in a nonparametric regression model. Stochastics 2021, 93, 868–885.
- Liu, Y.; Zeng, X. Asymptotic normality for wavelet deconvolution density estimators. Appl. Comput. Harmon. Anal. 2020, 48, 321–342.
- Tang, X.; Xi, M.; Wu, Y.; Wang, X. Asymptotic normality of a wavelet estimator for asymptotically negatively associated errors. Statist. Probab. Lett. 2018, 140, 191–201.
- Niu, S.L. Asymptotic normality of wavelet density estimator under censored dependent observations. Acta Math. Appl. Sin. Engl. Ser. 2012, 28, 781–794.
- Li, Y.; Guo, J. Asymptotic normality of wavelet estimator for strong mixing errors. J. Korean Statist. Soc. 2009, 38, 383–390.
- Roueff, F.; Taqqu, M.S. Asymptotic normality of wavelet estimators of the memory parameter for linear processes. J. Time Ser. Anal. 2009, 30, 534–558.
- Li, Y.; Yang, S.; Zhou, Y. Consistency and uniformly asymptotic normality of wavelet estimator in regression model with associated samples. Statist. Probab. Lett. 2008, 78, 2947–2956.
- Debbarh, M. Normalité asymptotique de l’estimateur par ondelettes des composantes d’un modèle additif de régression. C. R. Math. Acad. Sci. Paris 2006, 343, 601–606.
- Allaoui, S.; Bouzebda, S.; Liu, J. Asymptotic distribution of the wavelet-based estimators of multivariate regression functions under weak dependence. J. Math. Inequal. 2023, 17, 481–515.
- Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Comm. Statist. Theory Methods 2017, 46, 1367–1406.
- Bouzebda, S.; Didi, S. Some asymptotic properties of kernel regression estimators of the mode for stationary and ergodic continuous time processes. Rev. Mat. Complut. 2021, 34, 811–852.
- Eddy, W.F. Optimum kernel estimators of the mode. Ann. Statist. 1980, 8, 870–882.
- Eddy, W.F. The asymptotic distributions of kernel estimators of the mode. Z. Wahrsch. Verw. Geb. 1982, 59, 279–290.
- Billingsley, P. Convergence of Probability Measures; John Wiley & Sons, Inc.: New York, NY, USA; London, UK; Sydney, Australia, 1968; pp. xii+253.
- Bouzebda, S.; Chaouch, M. Uniform limit theorems for a class of conditional Z-estimators when covariates are functions. J. Multivar. Anal. 2022, 189, 104872.
- Hall, P.; Marron, J.S. Estimation of integrated squared density derivatives. Statist. Probab. Lett. 1987, 6, 109–115.
- Jurečková, J. Asymptotic linearity of a rank statistic in regression parameter. Ann. Math. Statist. 1969, 40, 1889–1900.
- Giné, E.; Mason, D.M. Uniform in bandwidth estimation of integral functionals of the density function. Scand. J. Statist. 2008, 35, 739–761.
- Levit, B.Y. Asymptotically efficient estimation of nonlinear functionals. Probl. Peredachi Informatsii 1978, 14, 65–72.
- Masry, E. Multivariate probability density estimation by wavelet methods: Strong consistency and rates for stationary time series. Stoch. Process. Appl. 1997, 67, 177–193.
- Allaoui, S.; Bouzebda, S.; Liu, J. Multivariate wavelet estimators for weakly dependent processes: Strong consistency rate. Comm. Statist. Theory Methods 2023, 52, 8317–8350.
- Bouzebda, S. Limit theorems for wavelet conditional U-statistics for time series models. Math. Methods Statist. 2025, 35, 1–42.
- Chaouch, M.; Laïb, N. Regression estimation for continuous-time functional data processes with missing at random response. J. Nonparametr. Stat. 2024, 36, 1–32.
- Giraitis, L.; Leipus, R. A generalized fractionally differencing approach in long-memory modeling. Liet. Mat. Rink. 1995, 35, 65–81.
- Guégan, D.; Ladoucette, S. Non-mixing properties of long memory processes. C. R. Acad. Sci. Paris Sér. I Math. 2001, 333, 373–376.
- Andrews, D.W.K. Non-strong mixing autoregressive processes. J. Appl. Probab. 1984, 21, 930–934.
- Francq, C.; Zakoïan, J.M. GARCH Models: Structure, Statistical Inference and Financial Applications; John Wiley & Sons, Ltd.: Chichester, UK, 2010; pp. xiv+489.
- Bouzebda, S.; Limnios, N. On general bootstrap of empirical estimator of a semi-Markov kernel with applications. J. Multivar. Anal. 2013, 116, 52–62.
- Bouzebda, S. On the weak convergence and the uniform-in-bandwidth consistency of the general conditional U-processes based on the copula representation: Multivariate setting. Hacet. J. Math. Stat. 2023, 52, 1303–1348.
- Burkholder, D.L. Distribution function inequalities for martingales. Ann. Probab. 1973, 1, 19–42.
- de la Peña, V.H.; Giné, E. Decoupling; Probability and Its Applications (New York); Springer: New York, NY, USA, 1999; pp. xvi+392.
- Masry, E. Wavelet-based estimation of multivariate regression functions in Besov spaces. J. Nonparametr. Statist. 2000, 12, 283–308.
- Hall, P.; Heyde, C.C. Martingale Limit Theory and Its Application; Probability and Mathematical Statistics; Academic Press, Inc.; Harcourt Brace Jovanovich, Publishers: New York, NY, USA; London, UK, 1980; pp. xii+308.
- Chow, Y.S.; Teicher, H. Probability Theory; Springer: New York, NY, USA; Berlin/Heidelberg, Germany, 1978; pp. xv+455.
- Efromovich, S. Lower bound for estimation of Sobolev densities of order less . J. Statist. Plann. Inference 2009, 139, 2261–2268.
- Triebel, H. Theory of function spaces. In Monographs in Mathematics; Birkhäuser Verlag: Basel, Switzerland, 1983; Volume 78, p. 284.
- DeVore, R.A.; Lorentz, G.G. Constructive approximation. In Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]; Springer: Berlin/Heidelberg, Germany, 1993; Volume 303, pp. x+449.
- Bourdaud, G.; Lanza de Cristoforis, M.; Sickel, W. Superposition operators and functions of bounded p-variation. Rev. Mat. Iberoam. 2006, 22, 455–487.
- Dudley, R.M.; Norvaiša, R. Differentiability of six operators on nonsmooth functions and p-variation. In Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1999; Volume 1703, pp. viii+277.
- Geller, D.; Pesenson, I.Z. Band-limited localized Parseval frames and Besov spaces on compact homogeneous manifolds. J. Geom. Anal. 2011, 21, 334–371.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).