Abstract
Let $f_n$ be the non-parametric kernel density estimator based on a kernel function K and a sequence of independent and identically distributed random vectors taking values in $\mathbb{R}^d$. Under some mild conditions, we establish sharp moderate deviations for the kernel density estimator; that is, we provide an asymptotic equivalent for the tail probabilities of this estimator.
MSC:
60F10; 62G07; 60E05; 62E20
1. Introduction
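Let $(X_i)_{i \geq 1}$ be a sequence of independent and identically distributed (i.i.d.) random vectors taking values in $\mathbb{R}^d$, defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, with density function f. Let $K: \mathbb{R}^d \to \mathbb{R}$ be a kernel function. The kernel density estimator of f is defined by
$$f_n(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h_n}\right), \qquad x \in \mathbb{R}^d,$$
where $(h_n)_{n \geq 1}$ is a bandwidth sequence, that is, a sequence of positive numbers satisfying
$$h_n \to 0 \quad \text{and} \quad n h_n^d \to \infty \quad \text{as } n \to \infty.$$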
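The estimator defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the Gaussian product kernel is our hypothetical choice (it is bounded and symmetric in each coordinate), and the function names `kde` and `gaussian_product_kernel` are ours.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def gaussian_product_kernel(u: Vector) -> float:
    """Standard Gaussian kernel on R^d (product of 1-D standard normal densities)."""
    d = len(u)
    sq_norm = sum(ui * ui for ui in u)
    return math.exp(-0.5 * sq_norm) / (2.0 * math.pi) ** (d / 2.0)


def kde(x: Vector, sample: Sequence[Vector], h: float,
        kernel: Callable[[Vector], float] = gaussian_product_kernel) -> float:
    """Kernel density estimate f_n(x) = (n h^d)^{-1} * sum_i K((x - X_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n, d = len(sample), len(x)
    total = 0.0
    for xi in sample:
        # Rescaled difference (x - X_i) / h, coordinate by coordinate.
        u = [(xj - xij) / h for xj, xij in zip(x, xi)]
        total += kernel(u)
    return total / (n * h ** d)
```

For example, `kde([0.0], [[-0.3], [0.1], [0.4]], h=0.5)` evaluates the one-dimensional estimate at the origin from three observations; any other bounded kernel can be passed through the `kernel` argument.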
A comprehensive reference for such estimators is [1]. Among the many applications of kernel density estimation, let us cite the elegant paper [2], which uses this estimator for an important ecological problem: the detection of contaminants through the valve-closure response of bivalves. Our results may be used to derive a decision rule for this question.
In this paper, we are interested in pointwise sharp moderate deviations for the kernel density estimator, obtained by the empirical process approach; the monograph [3] provides an excellent overview of such questions. In order to present our main results, let us first introduce some notation and assumptions. Let g be a real function. As usual, denote by
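$$\|g\|_p = \left(\int |g(u)|^p \, du\right)^{1/p}, \quad p \geq 1, \qquad \|g\|_\infty = \sup_u |g(u)|,$$
the $\mathbf{L}^p$-norm of g and the supremum norm of g, respectively.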
The consistency of the kernel density estimator has been widely studied. Let f be continuously differentiable on such that and ( is the derivative of f). In addition, suppose that and , where c is a constant. Under some mild conditions, Joutard [4] proved the following pointwise sharp large deviation result: for any and ,
where is such that and , with . Concerning uniform consistency, Gao [5] proved, under some mild conditions, the following moderate deviation principle (MDP). Let be a sequence of positive real numbers satisfying
Then, for any , it holds that
where
Gao [5] also established a pointwise MDP. Refinements of the pointwise MDP, which give an asymptotic equivalent of the tail probabilities rather than only their logarithmic rate, are called sharp moderate deviations. Sharp moderate deviations are also known as Cramér moderate deviations and have attracted considerable interest; we refer to Cramér [6], Petrov [7], Beknazaryan et al. [8] and Fan et al. [9] for results of this type. In this paper, we establish sharp moderate deviations for the kernel density estimator.
2. Main Results
The following assumptions will be used in this paper.
- (A)
- Assume that the kernel function K satisfies
- (B)
- There exist a constant and a non-negative integer s such that, for any , where A is a positive constant, is the Euclidean distance and
- (C)
- Assume
Remark 1.
Following [1], we recall that the above assumptions are standard; we restate them here in the multivariate setting:
- 1.
- The first condition in Assumption (A) ensures that the estimator integrates to 1; the second one is not necessary, but no striking or useful unbounded kernel has been used in density estimation. Moreover, this condition makes it unnecessary to assume that , for example.
- 2.
- Assumption (B) is a regularity condition of order on f, with and as introduced above.
- 3.
- The first condition in Assumption (C) is used to prove that the expressions involved are square-integrable. The second part of this condition is more subtle: it ensures that the Taylor expansion up to order s yields the relation since all the intermediate terms vanish. Note that such kernels K do exist. A very simple and usual case is (second-order regularity), which holds when K is symmetric with respect to each of its coordinates; in this case, it is possible to obtain and then the estimator is still a density (since it is non-negative). For the general case , a standard procedure to prove the existence of such kernels is to define for a fixed bounded density function and , where is the degree of the polynomial P. Then, the system of equations in (C), together with the first part of (A), is linear in the coefficients of P, and it is invertible because the matrix with coefficients is symmetric positive definite; this point is a straightforward extension of Lemma 3.3.1 in [10] to our multidimensional setting.
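The polynomial-times-density construction in item 3 of Remark 1 can be made concrete in dimension one. The sketch below is our own illustration under stated assumptions, not the multivariate construction of the paper: we take d = 1, the standard normal density as the fixed bounded density, a polynomial P of degree s - 1, and we impose that K = P times that density integrates to 1 with vanishing moments of orders 1 to s - 1. The resulting linear system has the Gram matrix of Gaussian moments, which is positive definite, and is solved exactly with rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction
from typing import List


def gaussian_moment(k: int) -> Fraction:
    """E[Z^k] for Z ~ N(0, 1): zero for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return Fraction(0)
    m = 1
    for j in range(k - 1, 0, -2):
        m *= j
    return Fraction(m)


def solve(a: List[List[Fraction]], b: List[Fraction]) -> List[Fraction]:
    """Exact Gauss-Jordan elimination, pivoting on the first non-zero entry."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def higher_order_kernel_coeffs(s: int) -> List[Fraction]:
    """Coefficients a_0..a_{s-1} of P such that K(u) = P(u) * phi(u) satisfies
    int K = 1 and int u^j K(u) du = 0 for 1 <= j <= s - 1 (d = 1 only)."""
    gram = [[gaussian_moment(i + j) for j in range(s)] for i in range(s)]
    rhs = [Fraction(1)] + [Fraction(0)] * (s - 1)
    return solve(gram, rhs)
```

For s = 2 this returns the Gaussian kernel itself, and for s = 4 it returns the coefficients of P(u) = (3 - u^2)/2. Since this P takes negative values, the corresponding K does too, which is why, outside the s = 2 case, the estimator need not be a density.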
Assume that for some Denote
We have the following pointwise sharp moderate deviations for the kernel density estimator.
Theorem 1.
Assume that Conditions (A)–(C) are satisfied. Assume for some Then, it holds that
uniformly for as Moreover, the same equality remains valid when is replaced by .
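A statement of the form in Theorem 1 controls the relative error between a tail probability and the Gaussian tail 1 - Φ(y). The following sketch illustrates that kind of relative-error behaviour in a case where the tail can be computed exactly. It rests on hypothetical choices of ours: d = 1 and the uniform kernel K(u) = 1/2 on [-1, 1] (bounded and symmetric). Then 2 n h_n f_n(x) equals the number N of observations in [x - h_n, x + h_n], which is Binomial(n, p) with p = P(|X_1 - x| <= h_n). We standardize by the exact standard deviation of f_n(x), not by the normalization used in Theorem 1, and we keep p fixed, so this is only an illustration of the ratio, not of the theorem's regime.

```python
import math


def binom_upper_tail(n: int, p: float, k0: int) -> float:
    """P(N >= k0) for N ~ Binomial(n, p), summing the pmf in log-space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for k in range(max(k0, 0), n + 1):
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total


def gaussian_upper_tail(y: float) -> float:
    """1 - Phi(y), computed via erfc to avoid cancellation for large y."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))


def kde_tail_ratio(n: int, p: float, y: float) -> float:
    """P(f_n(x) - E f_n(x) >= y * sd(f_n(x))) / (1 - Phi(y)) for the uniform kernel.

    The factor 1 / (2 n h) relating N to f_n(x) cancels after standardization,
    so the event is {N >= n p + y * sqrt(n p (1 - p))}.
    """
    sd = math.sqrt(n * p * (1.0 - p))
    k0 = math.ceil(n * p + y * sd)  # N is integer-valued
    return binom_upper_tail(n, p, k0) / gaussian_upper_tail(y)
```

Evaluating `kde_tail_ratio(n, 0.1, 1.0)` for increasing n shows the ratio moving towards 1; the remaining gap for moderate n comes mostly from the lattice structure of N.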
For the non-centered case, we have the following pointwise sharp moderate deviations for the kernel density estimator. Denote
Theorem 2.
Assume that Conditions (A)–(C) are satisfied. Assume for some . Then, it holds that
uniformly for as Moreover, the same equality remains valid when is replaced by .
Remark 2.
Let us comment on Theorem 2.
- 1.
- In the expression of , recall that (1) in Remark 1 entails that
- 2.
- This result provides practitioners with precise confidence intervals that are easy to compute, and it applies directly to hypothesis testing: explicit asymptotic p-values can be obtained straightforwardly. For instance, consider the hypothesis test with . Denote . Then, by Theorem 2, the p-value is asymptotically equal to , provided that satisfies as .
- 3.
- Other non-parametric estimators, such as the Nadaraya–Watson kernel regression estimator (see, e.g., El Machkouri et al. [11]), non-linear regression estimators or conditional expectations for prediction purposes, derivative estimators, or quantile regression estimators (see Rosenblatt [1]), will be treated in subsequent papers.
- 4.
- Although a version of this result for dependent data is within reach, we prefer to present a simple result in the i.i.d. case.
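The p-value in item 2 of Remark 2 relies on normalizations that are not reproduced here. The sketch below shows how such a one-sided p-value is typically computed, under assumptions of our own: the alternative is f(x) > f_0(x), the standard asymptotic variance f_0(x) ∫K^2 of the rescaled estimator is used, and the bias of the estimator is assumed negligible (which the proviso in item 2 presumably guarantees). The function name and its arguments are hypothetical.

```python
import math


def kde_pvalue(fn_x: float, f0_x: float, n: int, h: float, d: int, int_k2: float) -> float:
    """Hypothetical one-sided p-value for H0: f(x) = f0(x) against H1: f(x) > f0(x).

    Uses z = sqrt(n h^d) * (f_n(x) - f0(x)) / sqrt(f0(x) * int K^2), the standard
    Gaussian approximation, and returns 1 - Phi(z); the bias of f_n(x) is ignored.
    """
    if f0_x <= 0:
        raise ValueError("f0(x) must be positive")
    z = math.sqrt(n * h ** d) * (fn_x - f0_x) / math.sqrt(f0_x * int_k2)
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For the Gaussian kernel in dimension one, `int_k2` equals 1 / (2 * math.sqrt(math.pi)). Theorem 2 is what justifies using the Gaussian tail here even for moderately large z, where a plain central limit theorem would only control the absolute error.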
By Theorem 2, we obtain the following Berry–Esseen bound for , that is,
In particular, by taking , we obtain
Moreover, if and , i.e., f is 1-Hölder-continuous, then it holds that
Conclusions. When and is symmetric with respect to 0, which implies that for all (by taking in Assumption (C)), we have
which implies
Then, Theorems 1 and 2 hold with . Theorems 1 and 2 provide moderate deviations for the expressions and , which are related through the bias of ; see Remark 2. Remarks 1 and 2 describe in detail the bias calculation, which is essential here.
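The role of the bias in the symmetric case s = 2 can be checked numerically. In the sketch below, which is our own illustration, we assume d = 1, take both the kernel K and the true density f to be standard normal, and compare the exact bias E f_n(x) - f(x) = ∫K(u)[f(x - h u) - f(x)] du (computed by the trapezoidal rule) with the second-order term (h^2/2) f''(x) ∫u^2 K(u) du. This assumes f is twice differentiable, which is stronger than the Hölder-type condition in Assumption (B); the function names are hypothetical.

```python
import math


def std_normal_pdf(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def kde_bias(x: float, h: float, m: int = 20000) -> float:
    """E f_n(x) - f(x) = int K(u) [f(x - h u) - f(x)] du with K = f = N(0, 1).

    Trapezoidal rule on [-10, 10]; the Gaussian tails beyond are negligible.
    """
    step = 20.0 / m
    total = 0.0
    for i in range(m + 1):
        u = -10.0 + i * step
        weight = 0.5 if i in (0, m) else 1.0
        total += weight * std_normal_pdf(u) * (std_normal_pdf(x - h * u) - std_normal_pdf(x))
    return total * step


def second_order_bias(x: float, h: float) -> float:
    """(h^2 / 2) f''(x) int u^2 K(u) du, with int u^2 K = 1 and f''(x) = (x^2 - 1) phi(x)."""
    return 0.5 * h * h * (x * x - 1.0) * std_normal_pdf(x)
```

At x = 0 and h = 0.05, the ratio `kde_bias(0.0, 0.05) / second_order_bias(0.0, 0.05)` is close to 1, and the bias shrinks like h^2 as the bandwidth decreases, since the first-order term vanishes by symmetry of K.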
3. Proof of Theorem 1
For , let be i.i.d. centered random variables. Denote and . Assume that . Fan et al. [12] (see also Cramér [6]) established the following asymptotic expansion for the tail probabilities of moderate deviations for .
Lemma 1.
Assume that there exists a constant such that for all
Then,
holds uniformly for
Proof.
Lemma 1 is a simple consequence of Fan et al. [12]. □
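The moment hypothesis of Lemma 1 is, presumably, a Bernstein-type condition such as (5); its exact constants are not reproduced here. We take the form |E ξ^k| ≤ (k!/2) H^{k-2} E ξ^2, k ≥ 2, in the style of Fan et al. [12], as an assumption. For bounded summands such a condition holds automatically, which is why a bounded kernel makes the verification of the Bernstein condition in the proof below routine. A short derivation, assuming |ξ_i| ≤ B almost surely:

```latex
% Assumption: |\xi_i| \le B almost surely. For every k \ge 2,
|\mathbb{E}\,\xi_i^k| \le \mathbb{E}\big[|\xi_i|^{k-2}\,\xi_i^2\big]
  \le B^{k-2}\,\mathbb{E}\,\xi_i^2
  \le \frac{k!}{2}\,B^{k-2}\,\mathbb{E}\,\xi_i^2,
  \qquad \text{since } \frac{k!}{2} \ge 1 \text{ for } k \ge 2.

% Kernel case (K bounded):
\xi_i = K\Big(\frac{x - X_i}{h_n}\Big) - \mathbb{E}\,K\Big(\frac{x - X_1}{h_n}\Big),
  \qquad |\xi_i| \le 2\,\|K\|_\infty =: B.
```

If the summands are further divided by a normalizing constant c_n > 0, the same bound holds with B replaced by B/c_n, because both sides scale in the same way.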
With the above lemma, we are in a position to prove Theorem 1. It is easy to see that
In the sequel, we give an estimate for the right-hand side of the last equality. Notice that
Denote
We now show that satisfies the Bernstein condition (5). Indeed, we can deduce that, for all
For the variance of , we have the following estimate:
It is easy to see that
By Assumption (B), it is easy to see that
where and A is given by Assumption (B). Again by Condition (C), we can deduce that
By Condition (B), it is easy to see that
From Equations (10) and (11), we have
When we obtain
and
Therefore, by Lemma 1, we can deduce that for all
Applying inequality (12) to the last inequality, we deduce that for all
Since
it is easy to see that for all ,
Hence, we obtain for any
By the last equality, it follows that
Therefore, we have, for all
This completes the proof of Theorem 1.
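The variance estimate used in the proof of Theorem 1 rests on the standard fact that n h_n^d Var(f_n(x)) tends to f(x) ∫K^2. The sketch below checks this limit numerically in a hypothetical setting of our own (d = 1, K and f both standard normal), without reproducing the paper's exact bounds (10)–(11). With K_h(v) = K(v/h)/h and the substitution u = (x - y)/h, one gets h Var(K_h(x - X)) = ∫K^2(u) f(x - h u) du - h (∫K(u) f(x - h u) du)^2, and n h Var(f_n(x)) equals this quantity.

```python
import math
from typing import Callable


def std_normal_pdf(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def trapezoid(g: Callable[[float], float], a: float, b: float, m: int) -> float:
    """Composite trapezoidal rule with m sub-intervals on [a, b]."""
    step = (b - a) / m
    total = 0.5 * (g(a) + g(b))
    for i in range(1, m):
        total += g(a + i * step)
    return total * step


def scaled_variance(x: float, h: float, m: int = 20000) -> float:
    """n h Var(f_n(x)) = h Var(K_h(x - X)) for K = f = N(0, 1) and d = 1."""
    second = trapezoid(lambda u: std_normal_pdf(u) ** 2 * std_normal_pdf(x - h * u), -10.0, 10.0, m)
    first = trapezoid(lambda u: std_normal_pdf(u) * std_normal_pdf(x - h * u), -10.0, 10.0, m)
    return second - h * first ** 2
```

Here the limit is f(0) ∫K^2 = φ(0) / (2√π), and `scaled_variance(0.0, h)` approaches it as h decreases; the leading correction is of order h, coming from the squared-mean term.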
4. Proof of Theorem 2
It is easy to see that
By inequality (9), we deduce that
Substituting the last bound into Equation (13), we obtain, for all
This completes the proof of Theorem 2.
Author Contributions
Writing—original draft preparation, X.F., P.D., S.L. and H.H.; supervision, X.F.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.
Funding
This work was also funded in the frame of the chair FIME: https://fime-lab.org/ and CY-AS (“Investissements d’Avenir” ANR-16-IDEX-0008), “EcoDep” PSI-AAP2020-0000000013.
Data Availability Statement
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Acknowledgments
The authors would like to thank the anonymous referees for their valuable comments and remarks.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Rosenblatt, M. Stochastic Curve Estimation; NSF-CBMS Regional Conference Series in Probability and Statistics; Institute of Mathematical Statistics: Waite Hill, OH, USA, 1991; Volume 3.
- Ciret, T.; Damien, P.; Ciutat, A.; Durrieu, G.; Massabuau, J.-C. Estimation of potential and limits of bivalve closure response to detect contaminants: Application to cadmium. Environ. Toxicol. Chem. 2003, 22, 914–920.
- de la Peña, V.H.; Lai, T.L.; Shao, Q.M. Self-Normalized Processes: Limit Theory and Statistical Applications (Probability and Its Applications); Springer: Berlin/Heidelberg, Germany, 2009.
- Joutard, C. Sharp large deviations in nonparametric estimation. J. Nonparam. Stat. 2006, 18, 293–306.
- Gao, F. Moderate deviations and large deviations for kernel density estimators. J. Theoret. Probab. 2003, 16, 401–418.
- Cramér, H. Sur un nouveau théorème-limite de la théorie des probabilités. Actual. Sci. Indust. 1938, 736, 5–23.
- Petrov, V.V. Sums of Independent Random Variables; Springer: Berlin/Heidelberg, Germany, 1975.
- Beknazaryan, A.; Sang, H.; Xiao, Y. Cramér type moderate deviations for random fields. J. Appl. Probab. 2019, 56, 223–245.
- Fan, X.; Hu, H.; Xu, L. Cramér-type moderate deviations for Euler-Maruyama scheme for SDE. Sci. China Math. 2024, 67, 1865–1880.
- Doukhan, P. Stochastic Models for Time Series; Series Mathématiques et Applications; Springer: Berlin/Heidelberg, Germany, 2018; Volume 80.
- El Machkouri, M.; Fan, X.; Reding, L. On the Nadaraya-Watson kernel regression estimator for irregularly spaced spatial data. J. Statist. Plann. Infer. 2020, 205, 92–114.
- Fan, X.; Grama, I.; Liu, Q. Cramér large deviation expansions for martingales under Bernstein’s condition. Stoch. Process. Appl. 2013, 123, 3919–3942.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).