Adaptive Wavelet Estimations in the Convolution Structure Density Model

Using kernel methods, Lepski and Willer studied a convolution structure density model and established adaptive and optimal $L^p$ risk estimates over anisotropic Nikol'skii spaces (Lepski, O.; Willer, T. Oracle inequalities and adaptive estimation in the convolution structure density model. Ann. Stat. 2019, 47, 233–287). Motivated by their work, we consider the same problem over Besov balls using wavelets. We first provide a linear wavelet estimator. A non-linear wavelet estimator is then introduced for adaptivity, and it attains nearly optimal convergence rates in some cases.


Introduction
The estimation of a probability density from independent and identically distributed (i.i.d.) random observations $X_1, X_2, \dots, X_n$ of $X$ is a classical problem in statistics. A representative work is Donoho et al. [1], who established an adaptive and nearly optimal estimate (up to a logarithmic factor) over Besov spaces using wavelets.
However, in many real-life applications the observed data are contaminated by noise. One important problem is density estimation with additive noise. Let $Z_1, Z_2, \dots, Z_n$ be i.i.d. random variables with the same distribution as
$$Z = X + Y, \qquad (1)$$
where $X$ denotes a real-valued random variable with unknown probability density function $f$ and $Y$ stands for an independent random noise (error) with a known probability density $g$. The problem is to estimate $f$ from $Z_1, Z_2, \dots, Z_n$ in some sense. It is also called a deconvolution problem (model), because the density $h$ of $Z$ equals the convolution of $f$ and $g$. Fan and Koo [2] studied the MISE performance ($L^2$ risk) of a linear wavelet deconvolution estimator over a Besov ball. The $L^\infty$ risk optimal wavelet estimations were investigated by Lounici and Nickl [3]. Furthermore, Li and Liu [4] provided $L^p$ ($1 \le p \le \infty$) risk optimal deconvolution estimations using wavelet bases.
In this paper, we consider a generalized deconvolution model introduced by Lepski and Willer [5,6]. More precisely, let $(\Omega, \mathcal{F}, P)$ be a probability space and $Z_1, Z_2, \dots, Z_n$ be i.i.d. random variables having the same distribution as
$$Z = X + \varepsilon Y, \qquad (2)$$
where $X$ and $Y$ are the same as in model (1), and $f$ and $g$ are the corresponding densities. The main difference from model (1) is the Bernoulli random variable $\varepsilon \in \{0, 1\}$ with $P\{\varepsilon = 1\} = \alpha$, $\alpha \in [0, 1]$, which is independent of $X$ and $Y$. The density $h$ of $Z$ then satisfies
$$h = (1-\alpha) f + \alpha\, f * g.$$
Here, $f * g$ stands for the convolution of $f$ and $g$. Furthermore, when the function $G_\alpha(t) := (1-\alpha) + \alpha g^{ft}(t) \neq 0$, we have $f^{ft}(t) = h^{ft}(t) / G_\alpha(t)$, where $g$ and $\alpha$ are known and $f^{ft}$ is the Fourier transform of $f \in L^1(\mathbb{R})$ given by
$$f^{ft}(t) := \int_{\mathbb{R}} f(x) e^{-itx}\, dx.$$
Based on model (2) with some mild assumptions on $G_\alpha$, Lepski and Willer [5] provided a lower bound for the $L^p$ risk over an anisotropic Nikol'skii space. Moreover, they constructed an adaptive and optimal $L^p$ estimate using the kernel method in Ref. [6]. Recently, Wu et al. [12] established a pointwise lower bound for model (2) under a local Hölder condition.
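As an illustration of model (2), the following minimal sketch simulates $Z = X + \varepsilon Y$ with $P\{\varepsilon = 1\} = \alpha$ and checks empirically that the Fourier transform of the density of $Z$ equals $G_\alpha(t) f^{ft}(t)$ with $G_\alpha(t) = (1-\alpha) + \alpha g^{ft}(t)$. The standard normal $f$, the standard Laplace $g$, and the helper names `sample_z`, `g_ft`, `G_alpha`, and `empirical_ft` are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def sample_z(n, alpha, rng):
    """Draw n observations of Z = X + eps * Y (illustrative f and g)."""
    x = rng.normal(0.0, 1.0, size=n)    # X ~ f: standard normal (assumption)
    y = rng.laplace(0.0, 1.0, size=n)   # Y ~ g: standard Laplace (assumption)
    eps = rng.random(n) < alpha         # Bernoulli(alpha) switch, independent of X, Y
    return x + eps * y

def g_ft(t):
    """Fourier transform of the standard Laplace density: 1 / (1 + t^2)."""
    return 1.0 / (1.0 + np.asarray(t, dtype=float) ** 2)

def G_alpha(t, alpha):
    """G_alpha(t) = (1 - alpha) + alpha * g_ft(t)."""
    return (1.0 - alpha) + alpha * g_ft(t)

def empirical_ft(z, t):
    """Empirical version of h_ft(t) = E exp(-i t Z)."""
    return np.mean(np.exp(-1j * t * z))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = sample_z(200_000, alpha=0.3, rng=rng)
    t = 1.0
    # For standard normal X, f_ft(t) = exp(-t^2 / 2).
    print(empirical_ft(z, t), G_alpha(t, 0.3) * np.exp(-t ** 2 / 2))
```

Setting `alpha=0` in this sketch gives direct observations of $X$, and `alpha=1` gives the classical deconvolution model (1).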
Compared with classical kernel estimation of density functions, wavelet estimation provides more local information and fast algorithms [13]. We consider $L^p$ ($1 \le p < \infty$) risk estimation under model (2) over Besov balls using wavelets, with the aim of obtaining the corresponding convergence rates.
As in Assumption 4 of [6], we also need the following condition on $Y$:
$$|G_\alpha(t)| \gtrsim (1 + |t|^2)^{-\frac{\beta(\alpha)}{2}}, \qquad (3)$$
with $\beta(\alpha) = \beta \ge 0$ for $\alpha = 1$ and $\beta(\alpha) = 0$ otherwise. This condition is reasonable: it holds automatically for $\alpha = 0$, while the same condition for $\alpha = 1$ is necessary for deconvolution estimation [4,7]. In addition, when $0 < \alpha < \frac{1}{2}$, $\beta(\alpha) = 0$ and $|G_\alpha(t)| \ge 1 - \alpha - \alpha |g^{ft}(t)| \gtrsim 1$ thanks to $\|g^{ft}\|_\infty \le 1$. In fact, condition (3) is needed to prove Lemmas 2 and 3 in Section 2. Here and after, $A \lesssim B$ denotes $A \le cB$ for a fixed constant $c > 0$; $A \gtrsim B$ means $B \lesssim A$; $A \sim B$ stands for both $A \lesssim B$ and $A \gtrsim B$.
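For instance (an illustrative choice, not one made in the paper), take $g$ to be the standard Laplace density $g(x) = \frac{1}{2} e^{-|x|}$, whose Fourier transform is $g^{ft}(t) = (1+t^2)^{-1}$. Assuming $G_\alpha(t) = (1-\alpha) + \alpha g^{ft}(t)$ and reading the condition on $Y$ as $|G_\alpha(t)| \gtrsim (1+|t|^2)^{-\beta(\alpha)/2}$, the two regimes can be checked by hand:

```latex
\begin{aligned}
\alpha = 1:&\quad |G_1(t)| = \frac{1}{1+t^2} = (1+|t|^2)^{-\beta/2}
  \quad \text{with } \beta = 2,\\
0 \le \alpha < 1:&\quad |G_\alpha(t)| = (1-\alpha) + \frac{\alpha}{1+t^2} \ge 1-\alpha > 0,
  \quad \text{so } \beta(\alpha) = 0 \text{ suffices}.
\end{aligned}
```

In this example the lower bound for $\alpha \in [\frac{1}{2}, 1)$ relies on $g^{ft}$ being non-negative; it does not follow from $\|g^{ft}\|_\infty \le 1$ alone.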
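As usual, let $P_j$ be the orthogonal projection from $L^2(\mathbb{R})$ onto $V_j$,
$$P_j f = \sum_{k \in \mathbb{Z}} \alpha_{jk} \varphi_{jk}, \qquad \alpha_{jk} = \langle f, \varphi_{jk} \rangle.$$
If $\varphi$ is $m$-regular, then $P_j f$ is well-defined for $f \in L^p(\mathbb{R})$. Moreover, the identity (4) holds in $L^p(\mathbb{R})$ for $1 \le p \le \infty$.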
The following lemma is needed for later discussions.
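Lemma 1 ([13]). Let $\vartheta$ be an orthogonal scaling function or a wavelet satisfying $m$-regularity. Then there exist $C_2 \ge C_1 > 0$ such that, for $\lambda = \{\lambda_k\} \in l^p(\mathbb{Z})$ and $1 \le p \le \infty$,
$$C_1 2^{j(\frac{1}{2} - \frac{1}{p})} \|\lambda\|_{l^p} \le \Big\| \sum_{k \in \mathbb{Z}} \lambda_k \vartheta_{jk} \Big\|_p \le C_2 2^{j(\frac{1}{2} - \frac{1}{p})} \|\lambda\|_{l^p}.$$
One of the advantages of wavelet bases is that they can characterize Besov spaces, which contain the $L^2$-Sobolev spaces and Hölder spaces as special examples.
Proposition 1 ([13]). Let the scaling function $\varphi$ be $m$-regular with $m > s > 0$ and $\psi$ be the corresponding wavelet. Then, for $r, q \in [1, \infty]$ and $f \in L^r(\mathbb{R})$, the following conditions are equivalent:
(i) $f \in B^s_{r,q}(\mathbb{R})$;
(ii) $\{2^{js} \|P_j f - f\|_r\}_{j \ge 0} \in l^q$;
(iii) $\{2^{j(s - \frac{1}{r} + \frac{1}{2})} \|\beta_{j\cdot}\|_{l^r}\}_{j \ge 0} \in l^q$, where $\beta_{jk} = \langle f, \psi_{jk} \rangle$.
The Besov norm can be defined by
$$\|f\|_{B^s_{r,q}} := \|\alpha_{0\cdot}\|_{l^r} + \big\| \{2^{j(s - \frac{1}{r} + \frac{1}{2})} \|\beta_{j\cdot}\|_{l^r}\}_{j \ge 0} \big\|_{l^q}.$$
When $s > 0$ and $1 \le r, p, q \le \infty$, it is well-known that
(1) $B^s_{r,q} \hookrightarrow B^{s'}_{p,q}$ for $r \le p$ and $s' = s - \frac{1}{r} + \frac{1}{p}$;
(2) $B^s_{r,q} \hookrightarrow L^\infty(\mathbb{R})$ for $s > \frac{1}{r}$,
where $A \hookrightarrow B$ stands for a Banach space $A$ continuously embedded in another Banach space $B$. More precisely, $\|u\|_B \le c \|u\|_A$ ($u \in A$) holds for some $c > 0$.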
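The scaling factor $2^{j(\frac{1}{2} - \frac{1}{p})}$ in norm equivalences of the Lemma 1 type can be made concrete with the Haar scaling function $\varphi = 1_{[0,1)}$, for which the functions $\varphi_{jk} = 2^{j/2} \varphi(2^j \cdot - k)$ have disjoint supports and the equivalence holds with equality. The sketch below is only a sanity check under that assumption; the Haar function is not the smooth, $m$-regular $\varphi$ used in the paper, and the names `haar_lp_norm` and `lemma1_scale` are hypothetical.

```python
import numpy as np

def haar_lp_norm(lam, j, p, oversample=8):
    """L^p norm of sum_k lam[k] * phi_jk for the Haar scaling function.

    The sum is piecewise constant, equal to 2^(j/2) * lam[k] on
    [k 2^-j, (k+1) 2^-j), so a Riemann sum on an aligned grid is exact.
    """
    lam = np.asarray(lam, dtype=float)
    values = 2.0 ** (j / 2) * np.repeat(lam, oversample)
    dx = 2.0 ** (-j) / oversample
    return (np.sum(np.abs(values) ** p) * dx) ** (1.0 / p)

def lemma1_scale(lam, j, p):
    """2^(j (1/2 - 1/p)) * ||lam||_{l^p}, the quantity appearing in Lemma 1."""
    lam = np.asarray(lam, dtype=float)
    return 2.0 ** (j * (0.5 - 1.0 / p)) * np.sum(np.abs(lam) ** p) ** (1.0 / p)

if __name__ == "__main__":
    lam = [1.0, -2.0, 0.5, 3.0]
    for p in (1.0, 2.0, 3.0):
        print(p, haar_lp_norm(lam, 3, p), lemma1_scale(lam, 3, p))
```

For a general $m$-regular $\vartheta$ the supports overlap, which is why the lemma only asserts two-sided bounds with constants $C_1$ and $C_2$.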
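In this paper, we use the notation $B^s_{r,q}(L, M)$, with constants $L, M > 0$, for a Besov ball. Next, we estimate $f$ in $L^p$ risk by constructing wavelet estimators from the observed data $Z_1, Z_2, \dots, Z_n$. To introduce the wavelet estimators, we take $\varphi$ compactly supported and $m$-regular with $m \ge \beta(\alpha) + 2$ in this paper. Moreover, let $\hat{\alpha}_{jk}$ denote the empirical estimate of $\alpha_{jk}$ defined through $K\varphi_{jk}$, with $K\psi_{jk}$ defined in the same way. Clearly, $E\hat{\alpha}_{jk} = \alpha_{jk}$ due to the Plancherel formula. The linear wavelet estimator is then given by
$$\hat{f}^{\mathrm{lin}}_n := \sum_{k \in \Lambda_{j_0}} \hat{\alpha}_{j_0 k} \varphi_{j_0 k},$$
where $\Lambda_j := \{k \in \mathbb{Z} : \operatorname{supp} f \cap \operatorname{supp} \varphi_{jk} \neq \emptyset\}$. In particular, the cardinality of $\Lambda_j$ satisfies $|\Lambda_j| \sim 2^j$ when $f$ and $\varphi$ have compact supports. Now, we are in a position to state the first result of this paper.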
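One form of $\hat{\alpha}_{jk}$ that is consistent with the unbiasedness claim can be sketched as follows. It assumes the Fourier convention $f^{ft}(t) = \int f(x) e^{-itx}\, dx$, the relation $h^{ft} = G_\alpha f^{ft}$ for the density $h$ of $Z$, real-valued $f$ and $\varphi$, and $\alpha_{jk} = \langle f, \varphi_{jk} \rangle$; the exact normalization used in the paper may differ.

```latex
\begin{aligned}
(K\varphi)_{jk}(y) &:= \frac{1}{2\pi} \int_{\mathbb{R}} e^{ity}\,
   \frac{\varphi_{jk}^{ft}(t)}{G_\alpha(-t)}\, dt,
\qquad
\hat{\alpha}_{jk} := \frac{1}{n} \sum_{i=1}^{n} (K\varphi)_{jk}(Z_i), \\
E\hat{\alpha}_{jk} &= \int_{\mathbb{R}} (K\varphi)_{jk}(y)\, h(y)\, dy
 = \frac{1}{2\pi} \int_{\mathbb{R}} \varphi_{jk}^{ft}(t)\,
   \frac{h^{ft}(-t)}{G_\alpha(-t)}\, dt \\
&= \frac{1}{2\pi} \int_{\mathbb{R}} \varphi_{jk}^{ft}(t)\,
   \overline{f^{ft}(t)}\, dt
 = \langle \varphi_{jk}, f \rangle = \alpha_{jk}.
\end{aligned}
```

The second equality uses Fubini's theorem together with $\int h(y) e^{ity}\, dy = h^{ft}(-t)$, the third uses $f^{ft}(-t) = \overline{f^{ft}(t)}$ for real $f$, and the last step is the Plancherel formula.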
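Note that the estimator $\hat{f}^{\mathrm{lin}}_n$ is non-adaptive, because the choice of $j_0$ depends on the unknown parameter $s$. To obtain an adaptive estimate, define thresholded empirical wavelet coefficients with threshold $\tau_{j,n} = c\gamma 2^{j\beta(\alpha)} \sqrt{\frac{j}{n}}$, where the constants $c, \gamma$ will be determined later on. The non-linear wavelet estimator $\hat{f}^{\mathrm{non}}_n$ in (9) then combines the linear part at level $j_0$ with the thresholded wavelet terms at levels from $j_0$ to $j_1$, where $j_0, j_1$ are positive integers satisfying $2^{j_0} \sim n^{\frac{1}{2m + 2\beta(\alpha) + 1}}$ and $2^{j_1} \sim \frac{n}{\ln n}$ respectively. Clearly, $j_0$ and $j_1$ do not depend on the unknown parameters $s, r, q$, which means that the estimator $\hat{f}^{\mathrm{non}}_n$ in (9) is adaptive.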
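The level and threshold choices above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the threshold is read as $\tau_{j,n} = c\gamma 2^{j\beta(\alpha)} \sqrt{j/n}$ and the upper level as $2^{j_1} \sim n / \ln n$; hard thresholding (keep a coefficient only if its magnitude exceeds the threshold) is assumed; and the default values of `c` and `gamma` as well as the function names are placeholders, since the paper fixes these constants only later.

```python
import math

def resolution_levels(n, m, beta):
    """Positive integers j0, j1 with 2^j0 ~ n^(1/(2m+2beta+1)), 2^j1 ~ n/ln(n)."""
    j0 = max(1, round(math.log2(n) / (2 * m + 2 * beta + 1)))
    j1 = max(j0, round(math.log2(n / math.log(n))))
    return j0, j1

def threshold(j, n, beta, c=1.0, gamma=1.0):
    """tau_{j,n} = c * gamma * 2^(j * beta) * sqrt(j / n); c, gamma are placeholders."""
    return c * gamma * 2.0 ** (j * beta) * math.sqrt(j / n)

def hard_threshold(coeffs, tau):
    """Keep coefficients with |b| > tau and set the others to zero (assumed rule)."""
    return [b if abs(b) > tau else 0.0 for b in coeffs]

if __name__ == "__main__":
    n, m, beta = 1000, 2, 0.0
    j0, j1 = resolution_levels(n, m, beta)
    for j in range(j0, j1 + 1):
        print(j, threshold(j, n, beta))
```

Neither function takes $s$, $r$, or $q$ as input, which is the sense in which the resulting estimator is adaptive.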

Remark 3.
Comparing Theorem 2 with Theorem 1, we easily find that, for the case $r \le p$, the convergence rate of the non-linear estimator is better than that of the linear one, which is $n^{-\frac{s'p}{2s' + 2\beta(\alpha) + 1}}$ with $s' = s - \frac{1}{r} + \frac{1}{p}$.

Remark 4.
The convergence rates of Theorem 2 in the cases $\alpha = 0$ and $\alpha = 1$ are nearly optimal (up to a logarithmic factor) by Donoho et al. [1] and Li & Liu [4] respectively. However, it is not clear whether the estimation in Theorem 2 is optimal (or nearly optimal) for $\alpha \in (0, 1)$. Therefore, one direction for our future work is to determine a lower bound for model (2) with $\alpha \in (0, 1)$. This problem may be much more complicated than the cases $\alpha = 0$ and $\alpha = 1$.

Preliminaries
This section introduces some useful lemmas. The following inequality is needed in the proof of Lemma 2.
We state another classical inequality, before giving the proof of Lemma 3.

Proofs of Main Results
We prove Theorems 1 and 2 in this section.
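Proof of Theorem 1. It is sufficient to prove the case $r \le p$. In fact, when $r > p$ and $f$ has compact support, so does $\hat{f}^{\mathrm{lin}}_n$, because $\varphi$ has the same property. It then follows from Hölder's inequality and Jensen's inequality that
$$E\|\hat{f}^{\mathrm{lin}}_n - f\|_p^p \lesssim E\|\hat{f}^{\mathrm{lin}}_n - f\|_r^p \le \big(E\|\hat{f}^{\mathrm{lin}}_n - f\|_r^r\big)^{\frac{p}{r}},$$
which reduces the case $r > p$ to a bound on the $L^r$ risk.
According to (5) and (7), one easily finds that
$$E\|\hat{f}^{\mathrm{lin}}_n - f\|_p^p \lesssim E\Big\| \sum_{k \in \Lambda_{j_0}} (\hat{\alpha}_{j_0 k} - \alpha_{j_0 k}) \varphi_{j_0 k} \Big\|_p^p + \|P_{j_0} f - f\|_p^p. \qquad (16)$$
Clearly, by Proposition 1, $\|P_{j_0} f - f\|_p^p \lesssim 2^{-j_0 s' p}$ thanks to the well-known embedding theorem $B^s_{r,q} \hookrightarrow B^{s'}_{p,q}$ for $r \le p$. On the other hand, $s > \frac{1}{r}$ implies $\|f\|_\infty \lesssim 1$, and Lemma 2 tells that $E|\hat{\alpha}_{jk} - \alpha_{jk}|^p \lesssim 2^{j\beta(\alpha)p} n^{-\frac{p}{2}}$. This with Lemma 1 and $|\Lambda_{j_0}| \sim 2^{j_0}$ shows that
$$E\Big\| \sum_{k \in \Lambda_{j_0}} (\hat{\alpha}_{j_0 k} - \alpha_{j_0 k}) \varphi_{j_0 k} \Big\|_p^p \lesssim 2^{j_0(\frac{p}{2} - 1)} \sum_{k \in \Lambda_{j_0}} E|\hat{\alpha}_{j_0 k} - \alpha_{j_0 k}|^p \lesssim 2^{j_0 (\beta(\alpha) + \frac{1}{2}) p}\, n^{-\frac{p}{2}}.$$
Finally, (16) reduces to
$$E\|\hat{f}^{\mathrm{lin}}_n - f\|_p^p \lesssim 2^{j_0 (\beta(\alpha) + \frac{1}{2}) p}\, n^{-\frac{p}{2}} + 2^{-j_0 s' p} \lesssim n^{-\frac{s'p}{2s' + 2\beta(\alpha) + 1}}$$
because of $2^{j_0} \sim n^{\frac{1}{2s' + 2\beta(\alpha) + 1}}$. The proof is done.
Now, we give a proof of Theorem 2, which is the most important result.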
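The choice $2^{j_0} \sim n^{\frac{1}{2s' + 2\beta(\alpha) + 1}}$ balances the stochastic term against the bias term. The sketch below checks this numerically. It assumes the stochastic term takes the form $(2^{j_0(2\beta(\alpha)+1)}/n)^{p/2}$, which is what Lemma 1, Lemma 2, and $|\Lambda_{j_0}| \sim 2^{j_0}$ give when combined, and it ignores all constants; the function names are hypothetical and $j_0$ is treated as a real number for the purpose of the check.

```python
import math

def linear_risk_bound(j0, n, s_prime, beta, p):
    """Stochastic term plus bias term of the linear estimator, constants dropped."""
    variance = (2.0 ** (j0 * (2 * beta + 1)) / n) ** (p / 2)
    bias = 2.0 ** (-j0 * s_prime * p)
    return variance + bias

def balanced_level(n, s_prime, beta):
    """Real-valued j0 with 2^j0 = n^(1 / (2 s' + 2 beta + 1))."""
    return math.log2(n) / (2 * s_prime + 2 * beta + 1)

def rate(n, s_prime, beta, p):
    """Resulting rate n^(-s' p / (2 s' + 2 beta + 1))."""
    return n ** (-s_prime * p / (2 * s_prime + 2 * beta + 1))

if __name__ == "__main__":
    n, s_prime, beta, p = 10 ** 6, 1.5, 1.0, 2.0
    j0 = balanced_level(n, s_prime, beta)
    # At the balanced level both terms equal the rate, so the bound is twice the rate.
    print(linear_risk_bound(j0, n, s_prime, beta, p), 2 * rate(n, s_prime, beta, p))
```

Moving `j0` away from the balanced level in either direction makes one of the two terms dominate and the bound larger, which is why $j_0$ must depend on the unknown smoothness in the linear case.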

Conclusions
This paper establishes $L^p$ ($1 \le p < \infty$) risk estimates for both linear and non-linear wavelet estimators under a convolution structure density model over Besov balls. The results are stated in Theorems 1 and 2, which can be seen as an extension of the works of Donoho et al. [1] and Li & Liu [4].
It should be pointed out that the non-linear wavelet estimator is adaptive, and that its convergence rate is better than that of the linear one for the case $r \le p$. However, it is not clear whether the estimations are optimal (or nearly optimal) for $\alpha \in (0, 1)$. Therefore, one direction for our future work is to determine a lower bound for model (2) with $\alpha \in (0, 1)$.