Abstract
Using kernel methods, Lepski and Willer study a convolution structure density model and establish adaptive and optimal risk estimations over an anisotropic Nikol'skii space (Lepski, O.; Willer, T. Oracle inequalities and adaptive estimation in the convolution structure density model. Ann. Stat. 2019, 47, 233–287). Motivated by their work, we consider the same problem over Besov balls by using wavelets in this paper and first provide a linear wavelet estimator. Subsequently, a non-linear wavelet estimator is introduced for adaptivity, which attains nearly-optimal convergence rates in some cases.
1. Introduction
The estimation of a probability density from independent and identically distributed (i.i.d.) random observations of X is a classical problem in statistics. A representative work is Donoho et al. [1], who established an adaptive and nearly-optimal estimation (up to a logarithmic factor) over Besov spaces by using wavelets.
However, the observed data are often polluted by noise in many real-life applications. One of the important problems is the density estimation with an additive noise. Let $Z_1, Z_2, \dots, Z_n$ be i.i.d. random variables that have the same distribution as
$$Z = X + Y, \qquad (1)$$
where X denotes a real-valued random variable with unknown probability density function f and Y stands for an independent random noise (error) with a known probability density g. The problem is to estimate f from $Z_1, \dots, Z_n$ in some sense. Moreover, it is also called a deconvolution problem (model), because the density h of Z equals the convolution of f and g. Fan and Koo [2] studied the MISE performance ($L^2$-risk) of a linear wavelet deconvolution estimator over a Besov ball. The $L^\infty$-risk optimal wavelet estimations were investigated by Lounici and Nickl [3]. Furthermore, Li and Liu [4] provided $L^p$-risk optimal deconvolution estimations using wavelet bases.
In this paper, we consider a generalized deconvolution model introduced by Lepski & Willer [5,6]. More precisely, let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $Z_1, Z_2, \dots, Z_n$ be i.i.d. random variables having the same distribution as
$$Z = X + \epsilon Y, \qquad (2)$$
where the symbols X and Y are the same as in model (1), and f and g are the corresponding densities, respectively. Moreover, the biggest difference from model (1) is that a Bernoulli random variable $\epsilon$ with $\mathbb{P}(\epsilon = 1) = 1 - \mathbb{P}(\epsilon = 0) = \alpha$ is added in (2), and $\alpha \in [0, 1]$ is known. The problem is again to estimate f from the observed data $Z_1, \dots, Z_n$ in some sense.
When $\alpha = 1$, model (2) reduces to the deconvolution one (see [2,3,4,7,8] et al.), while $\alpha = 0$ corresponds to the classical density model with no errors (see [1,9,10,11] et al.). Clearly, the density function h of Z in (2) satisfies
$$h(x) = (1 - \alpha) f(x) + \alpha (f * g)(x).$$
Here, $f * g$ stands for the convolution of f and g. Furthermore, taking Fourier transforms on both sides, we have
$$h^{ft}(t) = f^{ft}(t) \big[ (1 - \alpha) + \alpha g^{ft}(t) \big],$$
where g and $\alpha$ are known and $f^{ft}$ is the Fourier transform of $f \in L^1(\mathbb{R})$ given by
$$f^{ft}(t) := \int_{\mathbb{R}} f(x) e^{-itx}\, dx.$$
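To make the generalized model concrete, the following sketch simulates observations from model (2); the concrete distributions (X standard normal, noise Y ~ N(0, 0.5²)) and the value α = 0.3 are illustrative assumptions, not choices from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 100_000, 0.3

# Model (2): Z = X + eps * Y with eps ~ Bernoulli(alpha), alpha known.
X = rng.normal(0.0, 1.0, n)        # density f (unknown in practice)
Y = rng.normal(0.0, 0.5, n)        # noise density g (known)
eps = rng.binomial(1, alpha, n)    # contamination indicator
Z = X + eps * Y                    # observed data

# The density h of Z is the mixture (1 - alpha) * f + alpha * (f * g),
# so here Var(Z) = (1 - alpha) * 1 + alpha * (1 + 0.25) = 1 + 0.25 * alpha.
print(Z.var())
```

For α = 0 the sample reduces to direct observations of X, and for α = 1 to the classical deconvolution data Z = X + Y.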
Based on the model (2) with some mild assumptions on g, Lepski and Willer [5] provided a lower bound estimation over the $L^p$ risk on an anisotropic Nikol'skii space. Moreover, they investigated an adaptive and optimal estimate by using the kernel method in Ref. [6]. Recently, Wu et al. [12] established a pointwise lower bound estimation for model (2) under the local Hölder condition.
When compared with the classical kernel estimation of density functions, the wavelet estimations provide more local information and fast algorithms [13]. We will consider the $L^p$-risk estimations under the model (2) over Besov balls by using wavelets and expect to obtain the corresponding convergence rates.
The same as Assumption 4 in [6], we also need the following condition on Y,
$$\big| (1 - \alpha) + \alpha g^{ft}(t) \big| \gtrsim (1 + t^2)^{-\beta/2} \qquad (3)$$
with $\beta > 0$ for $\alpha = 1$ and $\beta = 0$ for others. It is reasonable, because it holds automatically for $\alpha \in [0, 1/2)$, while the same condition for $\alpha = 1$ is necessary for the deconvolution estimations [4,7]. In fact, the condition (3) is necessary to prove Lemmas 2 and 3 in Section 2. Here and after, $A \lesssim B$ denotes $A \le c B$ for a fixed constant $c > 0$; $A \gtrsim B$ means $B \lesssim A$; $A \sim B$ stands for both $A \lesssim B$ and $B \lesssim A$.
It is well-known that the wavelet estimation depends on an orthonormal wavelet expansion in $L^2(\mathbb{R})$, even in $L^p(\mathbb{R})$. Let $\{V_j\}_{j \in \mathbb{Z}}$ be a classical Multiresolution Analysis of $L^2(\mathbb{R})$ with scaling function $\varphi$ and $\psi$ being the corresponding wavelet. Subsequently, for $f \in L^2(\mathbb{R})$,
$$f = \sum_{k \in \mathbb{Z}} \alpha_{j_0 k}\, \varphi_{j_0 k} + \sum_{j \ge j_0} \sum_{k \in \mathbb{Z}} \beta_{jk}\, \psi_{jk}, \qquad (4)$$
where $\alpha_{j_0 k} := \langle f, \varphi_{j_0 k} \rangle$, $\beta_{jk} := \langle f, \psi_{jk} \rangle$ and $\varphi_{jk}(x) := 2^{j/2} \varphi(2^j x - k)$, $\psi_{jk}(x) := 2^{j/2} \psi(2^j x - k)$. A scaling function $\varphi$ is called m-regular ($m \in \mathbb{N}$), if $\varphi \in C^m(\mathbb{R})$ and $|\varphi^{(i)}(x)| \lesssim (1 + x^2)^{-\ell}$ for each $\ell \in \mathbb{Z}$ ($i = 0, 1, \dots, m$). Clearly, the m-regularity of $\varphi$ implies that of the corresponding $\psi$, and $\int_{\mathbb{R}} x^i \psi(x)\, dx = 0$ ($i = 0, 1, \dots, m$) due to the integration by parts. An important example is Daubechies' scaling function $D_{2N}$ with N large enough.
As usual, let $P_j$ be the orthogonal projection from $L^2(\mathbb{R})$ onto the scaling space $V_j$,
$$P_j f = \sum_{k \in \mathbb{Z}} \alpha_{jk}\, \varphi_{jk} \quad \text{with} \quad \alpha_{jk} := \langle f, \varphi_{jk} \rangle.$$
If $\varphi$ is m-regular, then $P_j f$ is well-defined for $f \in L^p(\mathbb{R})$ ($1 \le p \le \infty$). Moreover, the identity (4) holds in $L^p(\mathbb{R})$ for $1 \le p < \infty$.
The following lemma is needed for later discussions.
Lemma 1
([13]). Let ϑ be an orthogonal scaling function or a wavelet satisfying m-regularity. Subsequently, there exist $c_2 \ge c_1 > 0$, such that, for $j \ge 0$ and $1 \le p \le \infty$,
$$c_1\, 2^{j(\frac{1}{2} - \frac{1}{p})} \|\lambda_{j\cdot}\|_p \le \Big\| \sum_{k \in \mathbb{Z}} \lambda_{jk}\, \vartheta_{jk} \Big\|_p \le c_2\, 2^{j(\frac{1}{2} - \frac{1}{p})} \|\lambda_{j\cdot}\|_p,$$
where $\|\lambda_{j\cdot}\|_p^p := \sum_{k \in \mathbb{Z}} |\lambda_{jk}|^p$.
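A quick numerical illustration (assuming the standard form of this norm equivalence, $\|\sum_k \lambda_{jk}\vartheta_{jk}\|_p \sim 2^{j(1/2 - 1/p)}\|\lambda_{j\cdot}\|_p$): for the Haar scaling function the two sides agree exactly, since the translates $\vartheta_{jk}$ have disjoint supports.

```python
import numpy as np

j, p = 3, 4
lam = np.array([0.7, -1.2, 0.4, 2.0, -0.1, 0.9, 0.0, 1.5])   # lambda_{jk}, k = 0..7

# Evaluate f = sum_k lam_k * 2^{j/2} * 1_[k 2^-j, (k+1) 2^-j) on a grid of [0, 1)
# and compute its L^p norm by a (here exact) midpoint Riemann sum.
x = (np.arange(8000) + 0.5) / 8000
f = np.zeros_like(x)
for k in range(8):
    f += lam[k] * 2**(j / 2) * ((k * 2.0**-j <= x) & (x < (k + 1) * 2.0**-j))
lp_norm = np.mean(np.abs(f)**p)**(1 / p)
bound = 2**(j * (1 / 2 - 1 / p)) * np.sum(np.abs(lam)**p)**(1 / p)
print(lp_norm, bound)                            # the two values coincide for Haar
```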
One of the advantages of wavelet bases is that they can characterize Besov spaces, which contain the $L^2$-Sobolev spaces and Hölder spaces as special examples.
Proposition 1
([13]). Let the scaling function φ be m-regular and ψ be the corresponding wavelet. Afterwards, for $r, q \in [1, \infty]$ and $0 < s < m$, the following conditions are equivalent:
- (i).
- $f \in B^s_{r,q}(\mathbb{R})$;
- (ii).
- $\{2^{js} \| P_j f - f \|_r\}_{j \ge 0} \in \ell^q$; and,
- (iii).
- $\{2^{j(s + 1/2 - 1/r)} \|\beta_{j\cdot}\|_r\}_{j \ge 0} \in \ell^q$.
The Besov norm of f can be defined by
$$\|f\|_{B^s_{r,q}} := \|\alpha_{j_0\cdot}\|_r + \big\| \big( 2^{j(s + 1/2 - 1/r)} \|\beta_{j\cdot}\|_r \big)_{j \ge j_0} \big\|_{\ell^q}.$$
When $r, q \in [1, \infty]$ and $s > 1/r$, it is well-known that
- (1)
- $B^s_{r,q}(\mathbb{R}) \hookrightarrow B^{s - 1/r + 1/p}_{p,q}(\mathbb{R})$ for $r \le p$;
- (2)
- $B^s_{r,q}(\mathbb{R}) \hookrightarrow B^s_{p,q}(\mathbb{R})$ for $r > p$ and compactly supported f,
where $A \hookrightarrow B$ stands for a Banach space A continuously embedded in another Banach space B. More precisely, $\|f\|_B \le C \|f\|_A$ holds for some $C > 0$.
In this paper, we use the notation $B^s_{r,q}(A, L)$ with some constants $A, L > 0$ to stand for a Besov ball, i.e.,
$$B^s_{r,q}(A, L) := \big\{ f : f \text{ is a density function, } \operatorname{supp} f \subseteq [-A, A] \text{ and } \|f\|_{B^s_{r,q}} \le L \big\}.$$
Next, we will estimate f with $L^p$ risk by constructing wavelet estimators from the observed data $Z_1, \dots, Z_n$. To introduce the wavelet estimators, we take φ having compact support and m-regularity in this paper. Moreover, denote
$$\hat{\alpha}_{jk} := \frac{1}{n} \sum_{i=1}^{n} (\mathcal{K} \varphi_{jk})(Z_i), \qquad (5)$$
where
$$(\mathcal{K} \varphi_{jk})^{ft}(t) := \frac{\varphi_{jk}^{ft}(t)}{(1 - \alpha) + \alpha g^{ft}(-t)} \qquad (6)$$
and $\hat{\beta}_{jk}$ is defined in the same way with φ replaced by ψ. Clearly, $\mathbb{E}\, \hat{\alpha}_{jk} = \alpha_{jk}$ due to the Plancherel formula. Subsequently, the linear wavelet estimator is given by
$$\hat{f}_n^{lin}(x) := \sum_{k \in \Lambda_{j_0}} \hat{\alpha}_{j_0 k}\, \varphi_{j_0 k}(x), \qquad (7)$$
where $\Lambda_{j_0} := \{ k \in \mathbb{Z} : \operatorname{supp} f \cap \operatorname{supp} \varphi_{j_0 k} \neq \emptyset \}$. In particular, the cardinality of $\Lambda_{j_0}$ satisfies $|\Lambda_{j_0}| \lesssim 2^{j_0}$, when f and φ have compact supports.
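As a minimal illustration of the linear estimator (7), the sketch below treats the error-free case α = 0, where the empirical coefficient reduces to $\hat{\alpha}_{j_0 k} = \frac{1}{n} \sum_i \varphi_{j_0 k}(Z_i)$; the Haar scaling function and the uniform target density are illustrative assumptions (the paper works with a compactly supported m-regular φ such as Daubechies').

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: indicator of [0, 1)."""
    return ((0.0 <= x) & (x < 1.0)).astype(float)

def linear_estimator(data, j0, x):
    """Linear wavelet estimator sum_k hat{alpha}_{j0,k} phi_{j0,k}(x), alpha = 0 case."""
    est = np.zeros_like(x)
    for k in range(int(2**j0 * x.min()) - 1, int(2**j0 * x.max()) + 2):
        phi_jk = lambda t, k=k: 2**(j0 / 2) * haar_phi(2**j0 * t - k)
        est += phi_jk(data).mean() * phi_jk(x)   # empirical coefficient * basis
    return est

rng = np.random.default_rng(1)
data = rng.uniform(0.0, 1.0, 50_000)             # true density f = 1 on [0, 1)
x = np.linspace(0.1, 0.9, 9)
fhat = linear_estimator(data, j0=4, x=x)
print(np.round(fhat, 2))                         # close to f(x) = 1 at every point
```

The resolution level j0 = 4 trades bias against variance exactly as in Theorem 1: a larger j0 tracks finer features but inflates the stochastic error of each coefficient.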
Now, we are in a position to state the first result of this paper.
Theorem 1.
For $1 \le p < \infty$ and $f \in B^s_{r,q}(A, L)$ with $r, q \in [1, \infty)$ and $s > 1/r$, the estimator $\hat{f}_n^{lin}$ in (7) with $2^{j_0} \sim n^{\frac{1}{2s' + 2\beta + 1}}$ satisfies
$$\mathbb{E} \big\| \hat{f}_n^{lin} - f \big\|_p^p \lesssim n^{-\frac{p s'}{2 s' + 2\beta + 1}},$$
where $s' := s - (1/r - 1/p)_+$ and $x_+ := \max\{x, 0\}$.
Remark 1.
When $\alpha = 1$, the conclusion of Theorem 1 reduces to Theorem 3 of Li & Liu [4].
Note that the estimator $\hat{f}_n^{lin}$ is non-adaptive, because the choice of $j_0$ depends on the unknown parameter s. To obtain an adaptive estimate, define
$$\lambda_j := \lambda\, 2^{\beta j} \sqrt{\frac{\ln n}{n}}. \qquad (8)$$
Here, the constant $\lambda > 0$ will be determined later on. Subsequently, the non-linear wavelet estimator is defined by
$$\hat{f}_n^{non}(x) := \sum_{k \in \Lambda_{j_0}} \hat{\alpha}_{j_0 k}\, \varphi_{j_0 k}(x) + \sum_{j = j_0}^{j_1} \sum_{k \in \Lambda_j} \hat{\beta}_{jk}\, \mathbb{1}_{\{ |\hat{\beta}_{jk}| \ge \lambda_j \}}\, \psi_{jk}(x), \qquad (9)$$
where $j_0$ and $j_1$ are positive integers satisfying $2^{j_0} \sim n^{\frac{1}{2m + 2\beta + 1}}$ and $2^{j_1} \sim \big( \frac{n}{\ln n} \big)^{\frac{1}{2\beta + 1}}$, respectively. Clearly, $j_0$ and $j_1$ do not depend on the unknown parameters s, r and q, which means that the estimator $\hat{f}_n^{non}$ in (9) is adaptive.
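The hard-thresholding step of the non-linear estimator (9) can be sketched as follows; the threshold form $\lambda\, 2^{\beta j} \sqrt{\ln n / n}$ and the constants `lam` and `beta` below are illustrative assumptions rather than the paper's exact choices.

```python
import numpy as np

def hard_threshold(beta_hat, j, n, lam=1.0, beta=0.0):
    """Keep hat{beta}_{jk} only when it exceeds a level-dependent threshold."""
    thr = lam * 2**(beta * j) * np.sqrt(np.log(n) / n)
    return np.where(np.abs(beta_hat) >= thr, beta_hat, 0.0)

coeffs = np.array([0.5, 0.01, -0.3, 0.002])      # empirical wavelet coefficients
kept = hard_threshold(coeffs, j=3, n=10_000)     # threshold approx 0.03 here
print(kept)                                      # small coefficients set to zero
```

Coefficients below the noise level are discarded, which is what makes the estimator adaptive: the kept set adjusts automatically to the unknown smoothness of f.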
Theorem 2.
Let $f \in B^s_{r,q}(A, L)$ with $r, q \in [1, \infty)$, $s > 1/r$ and $1 \le p < \infty$. Then the estimator $\hat{f}_n^{non}$ in (9) satisfies
$$\mathbb{E} \big\| \hat{f}_n^{non} - f \big\|_p^p \lesssim \Big( \frac{\ln n}{n} \Big)^{p \delta},$$
where
$$\delta := \min \Big\{ \frac{s}{2s + 2\beta + 1},\ \frac{s - 1/r + 1/p}{2(s - 1/r) + 2\beta + 1} \Big\}.$$
Remark 2.
When $\alpha = 0$, the convergence rate of Theorem 2 coincides with that of Theorem 3 in Donoho et al. [1]. On the other hand, for the case $\alpha = 1$, the conclusion of Theorem 4 in Li & Liu [4] follows directly from this theorem.
Remark 3.
When comparing the result of Theorem 2 with Theorem 1, we find easily that for the case $r < p$, the convergence rate of the non-linear estimator is better than that of the linear one.
Remark 4.
The convergence rates of Theorem 2 with the cases $\alpha = 0$ and $\alpha = 1$ are nearly-optimal (up to a logarithmic factor) by Donoho et al. [1] and Li & Liu [4], respectively. However, it is not clear whether the estimation in Theorem 2 is optimal (nearly-optimal) or not for $\alpha \in (0, 1)$. Therefore, one of our future works is to determine a lower bound estimate for model (2) with $\alpha \in (0, 1)$. This problem may be much more complicated than the cases of $\alpha = 0$ and $\alpha = 1$.
2. Preliminaries
This section is devoted to introducing some useful lemmas. The following inequality is necessary in the proof of Lemma 2.
Rosenthal's inequality ([13]). Let $p \ge 2$ and $X_1, \dots, X_n$ be independent random variables such that $\mathbb{E} X_i = 0$ and $\mathbb{E} |X_i|^p < \infty$. Subsequently, there exists $C(p) > 0$, such that
$$\mathbb{E} \Big| \sum_{i=1}^{n} X_i \Big|^p \le C(p) \Big[ \sum_{i=1}^{n} \mathbb{E} |X_i|^p + \Big( \sum_{i=1}^{n} \mathbb{E} X_i^2 \Big)^{p/2} \Big].$$
Lemma 2.
Let $\hat{\alpha}_{jk}$ and $\hat{\beta}_{jk}$ be defined by (5) and (6) with $2^j \le n$. Then for $1 \le p < \infty$,
$$\mathbb{E} \big| \hat{\alpha}_{jk} - \alpha_{jk} \big|^p \lesssim n^{-p/2}\, 2^{\beta j p} \quad \text{and} \quad \mathbb{E} \big| \hat{\beta}_{jk} - \beta_{jk} \big|^p \lesssim n^{-p/2}\, 2^{\beta j p}.$$
Proof.
Obviously, one only needs to prove the first inequality and the second one is similar. Define (). Subsequently, are i.i.d. samples and (). By the definitions of and , and
According to (6), one obtains that
This with (3), regularity of and shows
Hence, for
On the other hand, follows from . Afterwards,
Furthermore, due to the Plancherel formula. The same arguments as (11) imply that
We state another classical inequality before giving the proof of Lemma 3.
Bernstein's inequality ([13]). Let $X_1, \dots, X_n$ be independent random variables with $\mathbb{E} X_i = 0$ and $|X_i| \le M < \infty$. Then for each $\gamma > 0$,
$$\mathbb{P} \Big\{ \Big| \frac{1}{n} \sum_{i=1}^{n} X_i \Big| \ge \gamma \Big\} \le 2 \exp \Big( - \frac{n \gamma^2}{2 (\sigma^2 + \gamma M / 3)} \Big),$$
where $\sigma^2 := \frac{1}{n} \sum_{i=1}^{n} \mathbb{E} X_i^2$.
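A quick Monte Carlo sanity check of this bound, with the illustrative choice $X_i \sim \mathrm{Uniform}(-1, 1)$, so that $M = 1$ and $\sigma^2 = 1/3$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, gamma, M, sigma2 = 200, 0.08, 1.0, 1.0 / 3.0

# Empirical tail P(|mean| >= gamma) over 20000 replications vs Bernstein's bound.
means = rng.uniform(-1.0, 1.0, size=(20_000, n)).mean(axis=1)
emp_tail = np.mean(np.abs(means) >= gamma)
bound = 2.0 * np.exp(-n * gamma**2 / (2.0 * (sigma2 + gamma * M / 3.0)))
print(emp_tail, bound)                           # empirical tail stays below the bound
```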
Lemma 3.
If and , then there exists some constant such that for any ,
where .
3. Proofs of Main Results
We shall give the proofs of Theorems 1 and 2 in this section.
Proof of Theorem 1.
It is sufficient to prove the case for . In fact, when and f has a compact support, does because of having the same property. Subsequently, it follows from Hölder inequality and Jensen’s inequality that
Clearly, by Proposition 1, thanks to the well-known embedding theorem for .
On the other hand, implies and Lemma 2 tells that . This with Lemma 1 and shows that
Finally, (16) reduces to
because of . The proof is done. □
Now, we give a proof of Theorem 2, which is the most important result.
Proof of Theorem 2.
By the same arguments as in the proof of Theorem 1, one only needs to prove the case for .
It is known that for with . Hence, due to Proposition 1. Moreover, it follows from the choice of , and that
Similar to (17), one obtains
by , and the definition of .
Next, the main work of this proof is to estimate . By Lemma 1,
Define and . Subsequently, where
with denoting the complement of A in . Hence, it suffices to prove .
To estimate , note that follows from . Moreover, by the Hölder inequality. Afterwards, for large ,
thanks to , Lemmas 2 and 3.
For , due to . Combining this with Lemma 3, one knows . Therefore, by , and large ,
In order to estimate and , one defines satisfying
Recall that , and . Then due to and .
To estimate , one divides into
Similar to (17), according to ,
Define
When , . It is easy to see that for . Hence,
In addition, by Lemma 2 and due to Proposition 1 and . These with (18), and lead to
For the case , . Denote . Then . Hence, the same arguments as (18) show that
On the other hand, follows from and Proposition 1. Furthermore, thanks to Lemma 2. These with (19) and imply that
because and .
Finally, it remains to estimate . Obviously, for . Thus,
Because and , . This with implies that
Note that by , and due to . Subsequently, with and ,
At last, one shows the conclusion for the remaining case. Subsequently,
By the same arguments as (22), one finds
4. Conclusions
The current paper shows the $L^p$ risk estimations of both linear and non-linear wavelet estimators under a convolution structure density model over Besov balls. The corresponding conclusions are given by Theorems 1 and 2, which can be seen as an extension of the works of Donoho et al. [1] and Li & Liu [4].
It should be pointed out that the non-linear wavelet estimator is adaptive, and its convergence rate is better than that of the linear one for the case $r < p$. However, it is not clear whether the estimations are optimal (nearly-optimal) or not for $\alpha \in (0, 1)$. Therefore, one of our future works is to determine a lower bound estimate for model (2) with $\alpha \in (0, 1)$.
Author Contributions
Conceptualization, K.C. and X.Z.; methodology, K.C. and X.Z.; validation, X.Z.; formal analysis, K.C.; writing–original draft preparation, K.C.; writing–review and editing, K.C. and X.Z.; supervision, X.Z.; funding acquisition, X.Z.; project administration, X.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This paper is supported by the National Natural Science Foundation of China (No. 11901019) and the Science and Technology Program of Beijing Municipal Commission of Education (No. KM202010005025).
Acknowledgments
The authors would like to thank the referees for their very helpful comments.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Donoho, D.L.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Density estimation by wavelet thresholding. Ann. Stat. 1996, 24, 508–539.
- Fan, J.; Koo, J.-Y. Wavelet deconvolution. IEEE Trans. Inform. Theory 2002, 48, 734–747.
- Lounici, K.; Nickl, R. Global uniform risk bounds for wavelet deconvolution estimators. Ann. Stat. 2011, 39, 201–231.
- Li, R.; Liu, Y.M. Wavelet optimal estimations for a density with some additive noises. Appl. Comput. Harmon. Anal. 2014, 36, 416–433.
- Lepski, O.; Willer, T. Lower bounds in the convolution structure density model. Bernoulli 2017, 23, 884–926.
- Lepski, O.; Willer, T. Oracle inequalities and adaptive estimation in the convolution structure density model. Ann. Stat. 2019, 47, 233–287.
- Liu, Y.M.; Zeng, X.C. Asymptotic normality for wavelet deconvolution density estimators. Appl. Comput. Harmon. Anal. 2020, 48, 321–342.
- Pensky, M.; Vidakovic, B. Adaptive wavelet estimator for nonparametric density deconvolution. Ann. Stat. 1999, 27, 2033–2053.
- Cao, K.K.; Liu, Y.M. On the Reynaud-Bouret–Rivoiard–Tuleau-Malot problem. Int. J. Wavelets Multiresolut. Inf. Process. 2018, 16, 1850038.
- Kerkyacharian, G.; Picard, D. Density estimation in Besov spaces. Stat. Probab. Lett. 1992, 13, 15–24.
- Liu, Y.M.; Wang, H.Y. Convergence order of wavelet thresholding estimator for differential operators on Besov spaces. Appl. Comput. Harmon. Anal. 2012, 32, 342–356.
- Wu, C.; Wang, J.R.; Zeng, X.C. A pointwise lower bound for generalized deconvolution density estimation. Appl. Anal. 2019.
- Härdle, W.; Kerkyacharian, G.; Picard, D.; Tsybakov, A. Wavelets, Approximation and Statistical Applications; Springer: New York, NY, USA, 1998.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).