Nonparametric Density Estimation in a Mixed Model Using Wavelets

Liang, Dan; Kou, Junke

doi:10.3390/axioms14100741


by

Dan Liang

and

Junke Kou

^*

School of Mathematics and Computational Science, Guilin University of Electronic Technology, Guilin 541004, China

^*

Author to whom correspondence should be addressed.

Axioms 2025, 14(10), 741; https://doi.org/10.3390/axioms14100741

Submission received: 5 August 2025 / Revised: 22 September 2025 / Accepted: 24 September 2025 / Published: 30 September 2025


Abstract

This paper investigates nonparametric estimation of a density function within a mixed density model. A linear wavelet density estimator and an adaptive nonlinear wavelet estimator are proposed using the wavelet method and a hard thresholding algorithm. Under some mild conditions, the convergence rates of the two wavelet density estimators under the mean integrated squared error are proved. Compared with the optimal convergence rates of nonparametric wavelet estimation, both estimators are optimal in some cases. Finally, the performance of the two wavelet estimators is verified by numerical experiments.

Keywords:

nonparametric estimation; density function; mixed model; wavelets

MSC:

62G07; 62G20; 42C40

1. Introduction

In this paper, we investigate a mixed density model as follows. The random vectors

X_{1}, X_{2}, \dots, X_{n}

are independent and identically distributed, and share the same density

g (x)

satisfying

g (x) = θ h (x) + (1 - θ) f (x), x \in Ω .

(1)

In the above equation,

Ω

is a compact subset of

R^{d}

, and

θ

is a known mixture parameter,

θ \in (0, 1)

. Both

h (x)

and

f (x)

are bounded densities. The model aims to recover the unknown density

f (x)

using the observed sample

{X_{i}}_{i = 1}^{n}

.
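To make the sampling scheme behind model (1) concrete, the following minimal sketch draws observations from g = θh + (1 − θ)f by first drawing a component label and then sampling from the chosen component. The function name `sample_mixture`, the callables `sample_h` and `sample_f`, and the illustrative choices of h, f and θ are assumptions introduced here and do not come from the paper.

```python
import numpy as np

def sample_mixture(n, theta, sample_h, sample_f, rng):
    """Draw n i.i.d. points from g = theta*h + (1 - theta)*f.

    sample_h and sample_f are hypothetical callables taking (size, rng)
    and returning draws from h and f, respectively.
    """
    from_h = rng.random(n) < theta      # component labels, P(label = h) = theta
    x = np.empty(n)
    n_h = int(from_h.sum())
    x[from_h] = sample_h(n_h, rng)
    x[~from_h] = sample_f(n - n_h, rng)
    return x, from_h

# Illustration only: h uniform on [0, 1), f with density 2x on [0, 1).
rng = np.random.default_rng(0)
x, from_h = sample_mixture(
    4096, 0.25,
    sample_h=lambda m, r: r.random(m),
    sample_f=lambda m, r: np.sqrt(r.random(m)),  # inverse-CDF draw for f(x) = 2x
    rng=rng,
)
```

In practice only `x` is observed; the labels `from_h` are returned here solely to make the sketch easy to check.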

This mixed density model has many practical applications. In the contamination problem [1,2], the density function

f (x)

, which represents a reasonable assumed distribution, is contaminated by an arbitrary distribution

h (x)

. In addition, this model is also widely used in microarray analysis [3,4,5], neuroimaging [6] and other testing problems [7,8]. In multiple testing, Efron et al. [9] used the above mixed model to estimate the local false discovery rate.

For the density estimation problem (1), significant results have been established through various methods, including the kernel method, maximum likelihood estimation, and polynomial techniques. Olkin and Spiegelman [10] estimated the mixture parameter

θ

by the maximum likelihood method, and proposed a kernel density estimator of the true density function

f (x)

. Priebe and Marchette [11] proposed using parameter estimates as weights for kernel density estimation. James et al. [12] constructed a semiparametric density estimator, and studied the almost sure convergence rate of this estimator under some mild conditions. Robin et al. [13] estimated the unknown density function employing a weighted kernel function and adaptively selected the weights. The pointwise quadratic risk for the randomly weighted kernel estimator mentioned above was derived in [14], assuming the unknown density function belongs to a Hölder class. The convergence rates and asymptotic behavior of the density estimator were derived in [15], considering the cases of a known and an unknown mixture parameter, respectively. Deb et al. [16] proposed likelihood-based methods for estimating the distribution function corresponding to the density

f (x)

, and discussed the statistical and computational properties of this method. To the best of our knowledge, no existing paper focuses on wavelet methods for estimating the unknown density function

f (x)

.

This paper focuses on developing wavelet-based methods to address the density estimation problem (1). Wavelets have the important property of local time–frequency analysis. Owing to this property, a wavelet estimator can choose suitable scale parameters for estimating functions that behave differently on different intervals. Wavelet methods are now a staple tool in nonparametric statistics; see [17,18,19,20,21,22]. In this paper, a linear wavelet density estimator is first introduced via the wavelet projection approach. It is noteworthy that this linear estimator is built from unbiased coefficient estimators. A convergence rate of the linear estimator under

L^{2}

-risk is proved in Besov spaces. Although it achieves the optimal convergence rate typical of nonparametric wavelet estimation in some cases, this linear estimator lacks adaptability. Secondly, an adaptive nonlinear wavelet estimator is derived using the hard thresholding method. Compared to the linear estimator, the nonlinear estimator achieves a superior convergence rate when

1 \leq p < 2

. Finally, a series of numerical experiments is provided to study the performance of the two wavelet estimators.

The rest of the paper proceeds as follows. Section 2 will provide the definitions of two wavelet density estimators and the convergence rates of those two estimators under

L^{2}

-risk. The performance of the two wavelet estimators is studied through numerical experiments in Section 3. The proofs of the main results and some auxiliary results are presented in Section 4.

2. Wavelet Estimators and Main Results

This work investigates wavelet-based density estimation for mixed models within Besov spaces. Let us first review several fundamental concepts in wavelet theory. The orthogonal multiresolution analysis (MRA) [23] defines a sequence of nested and closed linear subspaces

V_{j} \subset V_{j + 1}

, which belong to the space of square-integrable functions

L^{2} (R^{d})

for any

j \in Z

, and satisfy the following conditions:

(i): $⋂ V_{j} = {0}$ , $\bar{⋃ V_{j}} = L^{2} (R^{d});$
(ii): $f (x) \in V_{0}$ if and only if $f (2^{j} x) \in V_{j};$
(iii): There exists a function $Φ (x) \in V_{0}$ for which the set ${Φ (x - κ) ∣ κ \in Z^{d}}$ forms an orthonormal basis of $V_{0}$ .

Given an orthonormal scaling function

Φ

, the corresponding wavelets are denoted by

Ψ_{μ}

for

μ \in \{1, \dots, 2^{d} - 1\}

. Then

S = {Φ_{ϱ, κ} : = 2^{ϱ d / 2} Φ (2^{ϱ} x - κ), Ψ_{j, κ, μ} : = 2^{j d / 2} Ψ_{μ} (2^{j} x - κ), j \geq ϱ, κ \in Λ_{j}}

forms an orthonormal basis of

L^{2} (Ω)

. For every integer

j_{0} \geq ϱ

, the function

f (x) \in L^{2} (Ω)

can be represented by S in the following wavelet series:

f (x) = \sum_{κ \in Λ_{j_{0}}} a_{j_{0}, κ} Φ_{j_{0}, κ} (x) + \sum_{j = j_{0}}^{\infty} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} b_{j, κ, μ} Ψ_{j, κ, μ} (x) .

(2)

In this equation,

Λ_{j} = {0, 1, \dots, 2^{j d} - 1}

,

a_{j, κ} = {〈 f, Φ_{j, κ} 〉}_{L^{2} (Ω)}

,

b_{j, κ, μ} = {〈 f, Ψ_{j, κ, μ} 〉}_{L^{2} (Ω)}

.

Let

P_{V_{j}}

denote the orthogonal projection operator from

L^{2} (Ω)

onto the subspace

V_{j}

. Then, for any

f (x) \in L^{2} (Ω)

,

P_{V_{j}} f (x) = \sum_{κ \in Λ_{j}} a_{j, κ} Φ_{j, κ} (x) .

(3)

For nonparametric density estimation based on wavelet methods, it is very common to assume that the unknown density function belongs to a Besov space. Besov spaces are very general function spaces and can be characterized simply in terms of wavelet coefficients. Following [23], we next define Besov spaces via their wavelet coefficient characterization.

Lemma 1.

Suppose the scaling function Φ is regular of order ω,

0 < s < ω

, let

f \in L^{2} (Ω)

,

1 \leq p, q < \infty

. Then the following statements are equivalent:

(i): $f \in B_{p, q}^{s} (Ω);$
(ii): ${2^{j s} {∥ f - P_{V_{j}} f ∥}_{p}} \in l_{q};$
(iii): ${2^{j (s + \frac{d}{2} - \frac{d}{p})} {∥ b_{j, κ, μ} ∥}_{p}} \in l_{q}$ .

One can characterize the Besov norm of f as follows:

{∥f∥}_{B_{p, q}^{s}} : = {∥(a_{ϱ, κ})∥}_{p} + {∥{(2^{j (s + \frac{d}{2} - \frac{d}{p})} {∥ b_{j, κ, μ} ∥}_{p})}_{j \geq ϱ}∥}_{q}

with

{∥b_{j, κ, μ}∥}_{p}^{p} = \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} {|b_{j, κ, μ}|}^{p}

.

We now introduce the linear wavelet estimator as follows

{\hat{f}}_{n}^{lin} (x) : = \sum_{κ \in Λ_{j_{0}}} {\hat{a}}_{j_{0}, κ} Φ_{j_{0}, κ} (x) .

(4)

In the above definition,

{\hat{a}}_{j, κ} : = \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{1 - θ} Φ_{j, κ} (X_{i}) - γ_{j, κ}

(5)

and

γ_{j, κ} = \int_{Ω} \frac{θ}{1 - θ} h (x) Φ_{j, κ} (x) d x

. In these definitions,

θ

is the known mixture parameter of the model (1). For the scaling function

Φ (x)

, this paper uses the Daubechies wavelets [24]. It is well known that the simplest of these is the Haar wavelet. The corresponding scaling functions of Daubechies wavelets can be obtained by an iterative scheme starting from the Haar scaling function [24,25]. Next, we derive the convergence rate of the linear wavelet estimator, using the notation:

x_{+} : = max {x, 0}

. Throughout,

u ≲ v

means that

u \leq c v

for some constant

c > 0

;

u ≳ v

denotes

v ≲ u

; and

u \sim v

denotes both

u ≲ v

and

v ≲ u

.
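The linear estimator (4) with coefficients (5) can be sketched in a few lines for the simplest case. The sketch below assumes d = 1, support [0, 1) and the Haar scaling function (the paper uses general Daubechies wavelets); the names `haar_phi` and `linear_estimator`, the midpoint quadrature used for γ, and the illustrative h, f and θ are all assumptions introduced here rather than the authors' implementation.

```python
import numpy as np

def haar_phi(j, k, x):
    """Haar scaling function Phi_{j,k}(x) = 2^(j/2) * 1{k <= 2^j x < k + 1}."""
    y = (2.0 ** j) * np.asarray(x, dtype=float)
    return (2.0 ** (j / 2)) * ((y >= k) & (y < k + 1))

def linear_estimator(sample, theta, h, j0, grid, n_quad=256):
    """Evaluate the linear estimator (4) on `grid` (Haar basis, d = 1, support [0, 1))."""
    est = np.zeros(len(grid))
    width = 2.0 ** (-j0)
    for k in range(2 ** j0):
        # gamma_{j0,k}: midpoint rule over the support [k/2^j0, (k+1)/2^j0) of Phi_{j0,k}
        mid = k * width + (np.arange(n_quad) + 0.5) * width / n_quad
        gamma = theta / (1 - theta) * np.mean(h(mid) * haar_phi(j0, k, mid)) * width
        # a_hat_{j0,k} as in (5): rescaled empirical mean minus the known h-term
        a_hat = np.mean(haar_phi(j0, k, sample)) / (1 - theta) - gamma
        est += a_hat * haar_phi(j0, k, grid)
    return est

# Illustration only: h uniform on [0, 1), f(x) = 2x, theta = 0.25 (all assumed here).
rng = np.random.default_rng(1)
n, theta = 4096, 0.25
from_h = rng.random(n) < theta
sample = np.where(from_h, rng.random(n), np.sqrt(rng.random(n)))
grid = (np.arange(512) + 0.5) / 512
f_hat = linear_estimator(sample, theta, lambda x: np.ones_like(x), j0=3, grid=grid)
```

With the Haar basis the estimate is piecewise constant on 2^{j_0} bins, which makes the role of the correction term γ easy to see: it removes the known contribution of h from each bin.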

Theorem 1.

Consider the model (1),

f (x) \in B_{p, q}^{s} (Ω)

, where

p, q \in [1, \infty)

,

s > \frac{d}{p}

. Define the linear wavelet estimator

{\hat{f}}_{n}^{lin} (x)

by (4), taking

2^{j_{0}} \sim n^{\frac{1}{2 s^{'} + d}}

where

s^{'} = s - d {(\frac{1}{p} - \frac{1}{2})}_{+}

, then

\begin{matrix} E [∥ {\hat{f}}_{n}^{lin} (x) - f (x) ∥_{2}^{2}] ≲ n^{- \frac{2 s^{'}}{2 s^{'} + d}} . \end{matrix}
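As a small numerical companion to Theorem 1, the sketch below computes s', the real-valued level j_0 solving 2^{j_0} = n^{1/(2s'+d)}, and the rate n^{-2s'/(2s'+d)}. The function name `linear_scale_and_rate` is hypothetical, and the constants hidden in the symbols ∼ and ≲ are taken to be 1 purely for illustration.

```python
import math

def linear_scale_and_rate(n, s, p, d=1):
    """Return (s', j0, rate) for the linear estimator, ignoring hidden constants.

    s'   = s - d * (1/p - 1/2)_+
    j0   solves 2^j0 = n^(1 / (2 s' + d))   (real-valued; round in practice)
    rate = n^(-2 s' / (2 s' + d))
    """
    if s <= d / p:
        raise ValueError("Theorem 1 assumes s > d/p")
    s_prime = s - d * max(1.0 / p - 0.5, 0.0)
    j0 = math.log2(n) / (2 * s_prime + d)
    rate = n ** (-2 * s_prime / (2 * s_prime + d))
    return s_prime, j0, rate

# Illustrative values (not from the paper): n = 4096, s = 2, p = 2, d = 1.
s_prime, j0, rate = linear_scale_and_rate(4096, s=2.0, p=2.0, d=1)
```

Because j_0 must be an integer in practice, the returned value is only a guide; the experiments in Section 3 select j_0 by minimizing the MSE instead.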

Remark 1.

When

p \geq 2

, the convergence rate

(n^{- \frac{2 s}{2 s + d}})

of the linear wavelet estimator

{\hat{f}}_{n}^{lin} (x)

matches the optimal convergence rate [26] for standard nonparametric wavelet estimation problems.

Compared with the optimal convergence rate

n^{- \frac{2 s}{2 s + d}}

[26], the linear wavelet estimator has a slower convergence rate in the case of

1 \leq p < 2

. Moreover, the definition of the linear wavelet estimator

{\hat{f}}_{n}^{lin} (x)

requires knowledge of the smoothness parameter s of the unknown density

f (x)

. However, the smoothness parameter s of the density function is usually unknown in practical applications. As a result, this linear wavelet estimator is not adaptive. To overcome these shortcomings of the linear estimator, this paper employs a hard thresholding method to construct a nonlinear wavelet estimator.

We establish the following nonlinear wavelet estimator

{\hat{f}}_{n}^{non} (x) : = \sum_{κ \in Λ_{j_{0}}} {\hat{a}}_{j_{0}, κ} Φ_{j_{0}, κ} (x) + \sum_{j = j_{0}}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} {\hat{b}}_{j, κ, μ} I_{{| {\hat{b}}_{j, κ, μ} | \geq τ r_{n}}} Ψ_{j, κ, μ} (x) .

(6)

In this equation,

{\hat{b}}_{j, κ, μ} : = \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{1 - θ} Ψ_{j, κ, μ} (X_{i}) - ϑ_{j, κ, μ}

(7)

and

ϑ_{j, κ, μ} = \int_{Ω} \frac{θ}{1 - θ} h (x) Ψ_{j, κ, μ} (x) d x

. The wavelet function

Ψ (x)

can be constructed by the scaling relation and the scale function

Φ (x)

. For more details, see [24,25]. Here, the function

I_{G}

is the indicator function of the set G and

r_{n} = \sqrt{\frac{ln n}{n}}

. The convergence rate of this nonlinear wavelet estimator is presented as follows.

Theorem 2.

Consider the model (1),

f (x) \in B_{p, q}^{s} (Ω)

, where

p, q \in [1, \infty)

,

s > \frac{d}{p}

. The nonlinear estimator

{\hat{f}}_{n}^{non} (x)

is constructed by (6) together with

2^{j_{0}} \sim n^{\frac{1}{2 ω + d}} (ω > s)

,

2^{j_{2}} \sim {(\frac{n}{ln n})}^{\frac{1}{d}}

; then

\begin{matrix} E [∥ {\hat{f}}_{n}^{non} (x) - f (x) ∥_{2}^{2}] ≲ (ln n) n^{- \frac{2 s}{2 s + d}} . \end{matrix}
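The hard-thresholding construction in (6) and (7) can be illustrated with a minimal sketch. It assumes d = 1, support [0, 1) and the Haar scaling function and wavelet (the paper uses Daubechies wavelets); the helper names `haar_phi`, `haar_psi`, `corrected_coef` and `nonlinear_estimator`, the midpoint quadrature, and the example values of h, f, θ, j_0, j_2 and τ are assumptions made here for illustration only.

```python
import numpy as np

def haar_phi(j, k, x):
    """Haar scaling function Phi_{j,k}."""
    y = (2.0 ** j) * np.asarray(x, dtype=float)
    return (2.0 ** (j / 2)) * ((y >= k) & (y < k + 1))

def haar_psi(j, k, x):
    """Haar wavelet Psi_{j,k}: +2^(j/2) on the left half of its support, -2^(j/2) on the right."""
    y = (2.0 ** j) * np.asarray(x, dtype=float) - k
    left = ((y >= 0) & (y < 0.5)).astype(float)
    right = ((y >= 0.5) & (y < 1)).astype(float)
    return (2.0 ** (j / 2)) * (left - right)

def corrected_coef(basis, j, k, sample, theta, h, n_quad=256):
    """Coefficient estimator of type (5)/(7): empirical mean / (1 - theta) minus the known h-term."""
    width = 2.0 ** (-j)
    mid = k * width + (np.arange(n_quad) + 0.5) * width / n_quad  # midpoint rule nodes
    h_term = theta / (1 - theta) * np.mean(h(mid) * basis(j, k, mid)) * width
    return np.mean(basis(j, k, sample)) / (1 - theta) - h_term

def nonlinear_estimator(sample, theta, h, j0, j2, tau, grid):
    """Evaluate (6) on `grid`; also return how many wavelet coefficients survive."""
    n = len(sample)
    threshold = tau * np.sqrt(np.log(n) / n)      # tau * r_n
    est = np.zeros(len(grid))
    for k in range(2 ** j0):
        est += corrected_coef(haar_phi, j0, k, sample, theta, h) * haar_phi(j0, k, grid)
    kept = 0
    for j in range(j0, j2 + 1):
        for k in range(2 ** j):
            b_hat = corrected_coef(haar_psi, j, k, sample, theta, h)
            if abs(b_hat) >= threshold:           # hard thresholding
                est += b_hat * haar_psi(j, k, grid)
                kept += 1
    return est, kept

# Illustration only: h uniform on [0, 1), f(x) = 2x, theta = 0.25 (all assumed here).
rng = np.random.default_rng(1)
n, theta = 4096, 0.25
h = lambda x: np.ones_like(x)
from_h = rng.random(n) < theta
sample = np.where(from_h, rng.random(n), np.sqrt(rng.random(n)))
grid = (np.arange(512) + 0.5) / 512
f_hat, kept = nonlinear_estimator(sample, theta, h, j0=2, j2=6, tau=1.0, grid=grid)
```

Note that the estimator itself never uses the smoothness s: only n, θ, h and the threshold constant τ enter, which is the sense in which the construction is adaptive.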

Remark 2.

Note that the convergence rate of this nonlinear wavelet estimator matches the optimal convergence rate

n^{- \frac{2 s}{2 s + d}}

up to the

ln n

factor.

Remark 3.

Both the linear and the nonlinear estimators attain the optimal rate (the latter up to a logarithmic factor) for

p \geq 2

. However, the nonlinear wavelet estimator achieves a better convergence rate when

1 \leq p < 2

. More importantly, the definition of the nonlinear estimator is independent of the smoothness parameter of the unknown density function, making the wavelet estimator adaptive.
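As a worked illustration of the comparison between Theorems 1 and 2 (the parameter values are chosen here only as an example and do not come from the paper), take d = 1, p = 1 and s = 2, so that s > d/p holds. The two bounds then read:

```latex
s' = s - d\left(\frac{1}{p} - \frac{1}{2}\right)_{+} = 2 - \frac{1}{2} = \frac{3}{2},
\qquad
n^{-\frac{2 s'}{2 s' + d}} = n^{-\frac{3}{4}} \quad \text{(linear)},
\qquad
(\ln n)\, n^{-\frac{2 s}{2 s + d}} = (\ln n)\, n^{-\frac{4}{5}} \quad \text{(nonlinear)}.
```

Since (ln n) n^{-4/5} / n^{-3/4} = (ln n) n^{-1/20} tends to zero, the nonlinear bound is of smaller order as n grows, although the logarithmic factor means that this advantage is an asymptotic one.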

Remark 4.

According to the model (1), it is easy to see that the density function

f (x)

has compact support Ω in

R^{d}

. It should be pointed out that this compact support property plays an important role in the definitions of the two wavelet estimators and in the proofs of the two theorems. Specifically, in the definitions of the two wavelet estimators,

κ \in Λ_{j_{0}}

and

κ \in Λ_{j}

rely on the compact support property. Then the cardinalities of

Λ_{j_{0}}

and

Λ_{j}

satisfy

| Λ_{j_{0}} | \sim 2^{j_{0} d}

and

| Λ_{j} | \sim 2^{j d}

, respectively. These results are used in some steps of the proofs of the two theorems, such as (12), (13) and (27). Hence, similar to the classical and important work of Donoho et al. [26], this paper considers the nonparametric estimation of a density function under a compact support condition. On the other hand, for nonparametric density estimation under a non-compact support assumption, some significant studies have been conducted in [27,28].

3. Numerical Experiments

This section presents numerical experiments using R2024 software to examine the approximation performance of the linear and nonlinear wavelet estimators. During the experiments, the density function

f (x)

is estimated from the observed data set

{X_{i}}_{i = 1}^{n}

. Because, in the definitions of the two wavelet estimators, the scale parameters

j_{0}

and

j_{2}

are related to the sample size n, we choose the sample size

n = 4096

in the following simulation studies. In the model (1), the mixture parameter

θ

is selected to be

θ = 0.025

. To assess the performance of both estimators, we adopt the mean square error (MSE) as the evaluation criterion, i.e.,

MSE (f, \hat{f}) = \frac{1}{n} \sum_{i = 1}^{n} {(f (x_{i}) - \hat{f} (x_{i}))}^{2}

, which is a classical and effective evaluation method in nonparametric estimations.
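The evaluation criterion above is straightforward to compute; a minimal sketch follows. The function name `mse` and the choice of evaluation points are assumptions made here, since the text does not state whether the points x_i are the observations or a fixed grid.

```python
import numpy as np

def mse(f, f_hat, x):
    """Mean square error (1/n) * sum_i (f(x_i) - f_hat(x_i))^2 over the points x."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((f(x) - f_hat(x)) ** 2))

# Toy check: a constant offset of 0.1 gives an MSE of 0.1^2 = 0.01.
x = np.linspace(0.0, 1.0, 101)
err = mse(lambda t: t, lambda t: t + 0.1, x)
```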

Based on the definition of the linear wavelet estimator, we construct a set of linear estimators

{\hat{f}}_{n}^{lin} (x)

with different scales

j_{0} = 0, 1, \dots, log_{2} (n) - 1

. During the following simulation studies, we use

j_{-}^{*}

to stand for the scale parameter

j_{0}

, which means that

j_{-}^{*} = j_{0}

. By minimizing the mean square error, we can obtain the best linear wavelet estimator with the optimal scale parameter

j_{0}

. For example, in Example 1, the MSE results of the linear wavelet estimator for different values of the scale parameter

j_{0}

are shown in Figure 1c. Then, it is easy to see that the MSE of linear wavelet estimator is least when the scale parameter

j_{0}

is 5, 6, 7, 8, 9 or 10. For simplicity and efficiency, we choose the smallest of these values, i.e.,

j_{0} = 5

.
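The selection rule described above (take the smallest scale among those that essentially minimize the MSE) can be sketched as follows. The function name `smallest_near_optimal_scale`, the relative tolerance `rel_tol`, and the MSE values in the usage example are hypothetical; the numbers are made up for illustration and are not read from Figure 1c.

```python
def smallest_near_optimal_scale(mse_by_scale, rel_tol=0.05):
    """Return the smallest scale whose MSE is within rel_tol of the minimum MSE.

    mse_by_scale maps a candidate scale j0 to the MSE of the corresponding estimator.
    """
    best = min(mse_by_scale.values())
    return min(j for j, value in mse_by_scale.items() if value <= best * (1 + rel_tol))

# Made-up MSE values, for illustration only.
toy_mse = {3: 0.05, 4: 0.02, 5: 0.0101, 6: 0.0100, 7: 0.0102}
j0_star = smallest_near_optimal_scale(toy_mse)
```

Preferring the smallest near-optimal scale keeps the number of coefficients, and hence the computation, as low as possible without a noticeable loss in MSE.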

The nonlinear estimator, according to the definition (6), has two scale parameters,

j_{0}

and

j_{2}

. For the first scale parameter, we use the same optimal parameter

j_{0}

as for the linear estimator. The other scale parameter

j_{2}

is set at the maximum level permitted in the wavelet decomposition (i.e.,

j_{2} = log_{2} (n) - 1

with

n = 4096

). In addition, the optimal thresholding parameter

λ (λ = τ r_{n})

is selected by minimizing the mean square error of the nonlinear wavelet estimator. For example, in Example 1, the two scale parameters of the nonlinear wavelet estimator are

j_{0} = 5

and

j_{2} = 11

. The MSE values of the nonlinear wavelet estimator for different thresholding values are shown in Figure 1d, from which we select the optimal thresholding parameter

λ = 0.0321070234

. According to the model (1), six different functions will be selected as the density function

f (x)

in the following simulation study.

Example 1.

In model (1), we take the density to be

f_{1} (x) = 8 (4 x - 2) e^{- {(4 x - 2)}^{2}}

. The density function

h (x) = 0.57 + 0.5 x^{2} + 0.3 cos (2 x)

and

x \in [0.5, 1.5]

. The performance of both linear and nonlinear wavelet estimators is illustrated in Figure 1a,b, respectively. As can be observed, each estimator provides an effective approximation of the density function

f (x)

. As clearly shown in Figure 1c, the optimal scale parameter is

j_{0} = 5

. The optimal threshold parameter

λ = 0.0321070234

is depicted in Figure 1d.
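As a quick sanity check on the functions of Example 1, the sketch below integrates f_1 and h numerically over [0.5, 1.5]. The helper `integrate_midpoint` is introduced here for illustration; the closed form for f_1 is 1 − e^{−16}, while h as printed integrates to roughly 1.0066, so it is a density only up to rounding of its coefficients.

```python
import numpy as np

def integrate_midpoint(func, a, b, m=200_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    x = a + (np.arange(m) + 0.5) * (b - a) / m
    return float(np.mean(func(x)) * (b - a))

def f1(x):
    return 8 * (4 * x - 2) * np.exp(-(4 * x - 2) ** 2)

def h1(x):
    return 0.57 + 0.5 * x ** 2 + 0.3 * np.cos(2 * x)

mass_f1 = integrate_midpoint(f1, 0.5, 1.5)   # close to 1 - exp(-16)
mass_h1 = integrate_midpoint(h1, 0.5, 1.5)   # close to 1.0066
```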

Example 2.

In model (1), we set the unknown density to

f_{2} (x) = 10 x {(1 - 4 x I_{{x \leq 0.35}})}^{2} I_{{x \leq 0.35}} + (- x^{2} + 0.5 x + 0.5) I_{{x > 0.35}} + 0.71

,

h (x) = 0.7 + 0.5 x^{2} + 0.3 cos (2 x)

and

x \in [0, 1]

. The performance of both linear and nonlinear wavelet estimators is illustrated in Figure 2a,b, respectively. Based on Figure 2c,d, the optimal scale parameter

j_{0} = 6

and the optimal threshold parameter

λ = 0.0051170569

are obtained. From those results, it is evident that the nonlinear wavelet estimator outperforms the linear estimator, particularly in capturing sharp features.

Example 3.

In model (1), we take

f_{3} (x) = 0.81 {cos}^{2} x

on

x \in [- 0.5, 1.5]

and set

h (x) = 0.36 + 0.2 x^{2} + 0.1 cos (2 x)

. The efficacy of the two wavelet estimators in approximating the density function

f (x)

is evidenced in Figure 3a,b. The corresponding best parameters of two wavelet estimators are presented in Figure 3c,d.

Example 4.

In model (1), the density function is specified as

f_{4} (x) = 2 (0.32 + 0.6 x + 0.3 e^{- 100 {(x - 0.3)}^{2}}) I_{{x \leq 0.5}} - 2 (0.28 - 0.6 x - 0.3 e^{- 100 {(x - 1.3)}^{2}}) I_{{x > 0.5}}

with

h (x) = 0.43 + 0.1 x^{2} + 0.1 cos (2 x)

and

x \in [- 1, 1]

. The performance of the linear and nonlinear wavelet estimators is illustrated in Figure 4a,b, respectively. Both estimators effectively approximate the density function, but the nonlinear estimator demonstrates superior performance near discontinuities.

Example 5.

For the experiment, we set the target density to

f_{5} (x) = (4 sin (4 π x) - sign (x - 0.3) - sign (0.72 - x)) / 10 + 1.1

,

h (x) = 0.7 + 0.5 x^{2} + 0.3 cos (2 x)

and

x \in [0, 1]

. According to Figure 5c,d, the optimal scale parameter

j_{0} = 5

and the optimal thresholding parameter

λ = 0.0929765886

are obtained. Under these conditions, the density function

f (x)

can be estimated by the two wavelet estimators shown in Figure 5a,b.

Example 6.

For the experiment, we select the “Time Shift Sine” as the density function

f_{6} (x)

[29]. In addition,

h (x) = 0.7 + 0.5 x^{2} + 0.3 cos (2 x)

and

x \in [0, 1]

. Figure 6c,d show that the optimal scale parameter is

j_{0} = 6

and the optimal thresholding parameter is

λ = 0.006204013

. From the following results, both wavelet estimators accurately estimate the unknown density.

For both the linear and nonlinear wavelet estimators, the best scale parameter, the optimal thresholding parameter and the values of MSE are shown in Table 1. Based on Table 1 and the preceding results, both wavelet estimators effectively approximate the density function, with the nonlinear variant exhibiting superior performance. More importantly, according to the results of Examples 2, 4, 5 and 6, both wavelet estimators perform well in cases with discontinuities and spikes.

4. Proof of Main Theorem and Auxiliary Results

In this section, we give some auxiliary results and the proofs of Theorems 1 and 2. It should be pointed out that this paper does not use symbolic computation software (such as Mathematica, Maple, etc.) for the following theoretical derivations.

4.1. Auxiliary Results

Lemma 2.

For the model (1), let

{\hat{a}}_{j, κ}

be defined as (5) and

{\hat{b}}_{j, κ, μ}

given by (7). Then we have

\begin{matrix} E [{\hat{a}}_{j, κ}] = a_{j, κ}, E [{\hat{b}}_{j, κ, μ}] = b_{j, κ, μ} . \end{matrix}

Proof.

According to

{\hat{a}}_{j, κ} = \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{1 - θ} Φ_{j, κ} (X_{i}) - γ_{j, κ}

,

γ_{j, κ} = \int_{Ω} \frac{θ}{1 - θ} h (x) Φ_{j, κ} (x) d x

, we observe that

\begin{matrix} E [{\hat{a}}_{j, κ}] & = E [\frac{1}{n} \sum_{i = 1}^{n} \frac{1}{1 - θ} Φ_{j, κ} (X_{i}) - γ_{j, κ}] \\ = E [\frac{1}{n} \sum_{i = 1}^{n} \frac{1}{1 - θ} Φ_{j, κ} (X_{i}) - \int_{Ω} \frac{θ}{1 - θ} h (x) Φ_{j, κ} (x) d x] \\ = E [\frac{1}{1 - θ} Φ_{j, κ} (X_{1})] - \int_{Ω} \frac{θ}{1 - θ} h (x) Φ_{j, κ} (x) d x \\ = \int_{Ω} \frac{1}{1 - θ} g (x) Φ_{j, κ} (x) d x - \int_{Ω} \frac{θ}{1 - θ} h (x) Φ_{j, κ} (x) d x . \end{matrix}

From Equation (1),

g (x) = θ h (x) + (1 - θ) f (x)

and

\begin{matrix} E [{\hat{a}}_{j, κ}] & = \int_{Ω} \frac{1}{1 - θ} [θ h (x) + (1 - θ) f (x)] Φ_{j, κ} (x) d x - \int_{Ω} \frac{θ}{1 - θ} h (x) Φ_{j, κ} (x) d x \\ = \int_{Ω} f (x) Φ_{j, κ} (x) d x \\ = {〈 f, Φ_{j, κ} 〉}_{L_{Ω}^{2}} = a_{j, κ} . \end{matrix}

The proof of the second equation is similar to the first one. This concludes the proof of Lemma 2. □
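The identity at the heart of Lemma 2 can also be checked numerically. The sketch below assumes d = 1, support [0, 1), the Haar scaling function, and illustrative choices of h, f, θ, j and κ that are not taken from the paper; it compares the expectation of the coefficient estimator, written as integrals against g and h, with the true coefficient of f.

```python
import numpy as np

def haar_phi(j, k, x):
    """Haar scaling function Phi_{j,k}(x) = 2^(j/2) * 1{k <= 2^j x < k + 1}."""
    y = (2.0 ** j) * np.asarray(x, dtype=float)
    return (2.0 ** (j / 2)) * ((y >= k) & (y < k + 1))

def integral(func, m=2 ** 16):
    """Midpoint rule for the integral of func over [0, 1]."""
    x = (np.arange(m) + 0.5) / m
    return float(np.mean(func(x)))

# Illustrative setting (assumed here): h uniform, f(x) = 2x, theta = 0.3.
theta = 0.3
h = lambda x: np.ones_like(x)
f = lambda x: 2 * x
g = lambda x: theta * h(x) + (1 - theta) * f(x)
j, k = 3, 5
phi = lambda x: haar_phi(j, k, x)

# E[a_hat] = (1/(1-theta)) * int g*phi  -  (theta/(1-theta)) * int h*phi
expected_a_hat = (integral(lambda x: g(x) * phi(x)) / (1 - theta)
                  - theta / (1 - theta) * integral(lambda x: h(x) * phi(x)))
# True coefficient a_{j,k} = int f*phi
a_true = integral(lambda x: f(x) * phi(x))
```

The two quantities agree up to floating-point rounding, mirroring the cancellation of the h-term in the proof.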

Lemma 3.

Consider the model (1) with

θ \in (0, δ)

and

0 < δ < 1

. Two unbiased estimators

{\hat{a}}_{j, κ}

and

{\hat{b}}_{j, κ, μ}

of the wavelet coefficients are proposed by (5) and (7), respectively. Then we have

\begin{matrix} E [{({\hat{a}}_{j, κ} - a_{j, κ})}^{2}] ≲ \frac{1}{n}, E [{({\hat{b}}_{j, κ, μ} - b_{j, κ, μ})}^{2}] ≲ \frac{1}{n} . \end{matrix}

Proof.

According to Lemma 2 and the definition of

γ_{j, κ}

, one has

var [{\hat{a}}_{j, κ}] = E [{({\hat{a}}_{j, κ} - E [{\hat{a}}_{j, κ}])}^{2}]

,

\begin{matrix} E [{({\hat{a}}_{j, κ} - a_{j, κ})}^{2}] & = var [\frac{1}{n} \sum_{i = 1}^{n} \frac{1}{1 - θ} Φ_{j, κ} (X_{i}) - γ_{j, κ}] \\ = var [\frac{1}{n} \sum_{i = 1}^{n} \frac{1}{1 - θ} Φ_{j, κ} (X_{i})] \\ = \frac{1}{n^{2}} var [\sum_{i = 1}^{n} \frac{1}{1 - θ} Φ_{j, κ} (X_{i})] \\ \leq \frac{1}{n} E [\frac{1}{{(1 - θ)}^{2}} Φ_{j, κ}^{2} (X_{1})] \\ ≲ \frac{1}{n} E [Φ_{j, κ}^{2} (X_{1})] . \end{matrix}

Due to the boundedness of

h (x)

,

f (x)

and mixture parameter

θ

, we have

\begin{matrix} E [Φ_{j, κ}^{2} (X_{1})] & = \int_{Ω} g (x) Φ_{j, κ}^{2} (x) d x \\ = \int_{Ω} [θ h (x) + (1 - θ) f (x)] Φ_{j, κ}^{2} (x) d x \\ = \int_{Ω} θ h (x) Φ_{j, κ}^{2} (x) d x + \int_{Ω} (1 - θ) f (x) Φ_{j, κ}^{2} (x) d x \\ ≲ \int_{Ω} Φ_{j, κ}^{2} (x) d x = 1 . \end{matrix}

Hence, we have proved the first inequality,

E [{({\hat{a}}_{j, κ} - a_{j, κ})}^{2}] ≲ \frac{1}{n} .

By similar arguments, the second inequality can be proved easily. Hence, Lemma 3 is proved. □

Lemma 4.

Consider the model (1) with

θ \in (0, δ)

and

0 < δ < 1

. The wavelet coefficients estimator

{\hat{b}}_{j, κ, μ}

is given by (7). For

2^{j d} \leq \frac{n}{ln n}

, there exists a constant

τ > 1

such that

\begin{matrix} \Pr (| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} | \geq τ r_{n}) ≲ n^{- 4} . \end{matrix}

Proof.

In order to prove the above result simply, we take

B_{i} : = \frac{1}{1 - θ} (Ψ_{j, κ, μ} (X_{i}) - E [Ψ_{j, κ, μ} (X_{i})]) .

Using Lemma 2,

\begin{matrix} |{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}| & = | \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{1 - θ} Ψ_{j, κ, μ} (X_{i}) - ϑ_{j, κ, μ} - E [{\hat{b}}_{j, κ, μ}] | \\ = | \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{1 - θ} Ψ_{j, κ, μ} (X_{i}) - E [\frac{1}{1 - θ} Ψ_{j, κ, μ} (X_{i})] | \\ = \frac{1}{n} | \sum_{i = 1}^{n} \frac{1}{1 - θ} (Ψ_{j, κ, μ} (X_{i}) - E [Ψ_{j, κ, μ} (X_{i})]) | \\ = \frac{1}{n} | \sum_{i = 1}^{n} B_{i} | . \end{matrix}

Thus, the following can be concluded:

\Pr \{| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} | \geq τ r_{n}\} = \Pr \{\frac{1}{n} | \sum_{i = 1}^{n} B_{i} | \geq τ r_{n}\} .

(8)

Note that

E [B_{i}] = 0

. The boundedness of

h (x)

,

f (x)

and

θ

implies that

\begin{matrix} | B_{i} | & = | \frac{1}{1 - θ} (Ψ_{j, κ, μ} (X_{i}) - E [Ψ_{j, κ, μ} (X_{i})]) | ≲ | Ψ_{j, κ, μ} (X_{i}) - E [Ψ_{j, κ, μ} (X_{i})] | \\ \leq | Ψ_{j, κ, μ} (X_{i}) | + | E [Ψ_{j, κ, μ} (X_{i})] | \\ = | 2^{\frac{j d}{2}} Ψ_{μ} (2^{j} X_{i} - κ) | + | \int_{Ω} g (x) 2^{\frac{j d}{2}} Ψ_{μ} (2^{j} x - κ) d x | ≲ 2^{\frac{j d}{2}} . \end{matrix}

This together with

2^{j d} \leq \frac{n}{ln n}

shows that

| B_{i} | ≲ \sqrt{\frac{n}{ln n}} .

(9)

On the other hand, using the properties of variance and wavelet function, we have

\begin{matrix} E [B_{i}^{2}] & = E [\frac{1}{{(1 - θ)}^{2}} {(Ψ_{j, κ, μ} (X_{i}) - E [Ψ_{j, κ, μ} (X_{i})])}^{2}] \\ ≲ var [Ψ_{j, κ, μ} (X_{i})] \leq E [Ψ_{j, κ, μ}^{2} (X_{i})] ≲ 1 . \end{matrix}

(10)

Finally, it follows from (9), (10) and Bernstein’s inequality [23] that

\begin{matrix} \Pr (\frac{1}{n} | \sum_{i = 1}^{n} B_{i} | \geq τ r_{n}) & ≲ exp \{- \frac{n τ^{2} r_{n}^{2}}{2 (1 + τ r_{n} \sqrt{\frac{n}{ln n}} / 3)}\} \\ ≲ exp \{- \frac{(ln n) τ^{2}}{2 (1 + τ / 3)}\} ≲ n^{- \frac{τ^{2}}{2 (1 + τ / 3)}} . \end{matrix}

Then, a sufficiently large

τ

can be chosen such that

\begin{matrix} \Pr (| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} | \geq τ r_{n}) ≲ n^{- \frac{τ^{2}}{2 (1 + τ / 3)}} ≲ n^{- 4} . \end{matrix}

The proof of Lemma 4 is completed. □
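For a sense of scale, suppose the constants hidden in (9) and (10) are equal to 1 (an assumption made here only for illustration; the proof absorbs them into the symbol ≲). The requirement that the exponent be at least 4 can then be solved explicitly:

```latex
\frac{\tau^{2}}{2\left(1 + \tau/3\right)} \geq 4
\iff 3\tau^{2} - 8\tau - 24 \geq 0
\iff \tau \geq \frac{8 + \sqrt{352}}{6} = \frac{4 + 2\sqrt{22}}{3} \approx 4.46 .
```

With different hidden constants the numerical value changes, but the conclusion that a fixed, sufficiently large τ suffices does not.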

4.2. Proof of Main Theorem

Proof of Theorem 1.

According to (2) and (3),

\begin{matrix} E [∥ {\hat{f}}_{n}^{lin} (x) - f (x) ∥_{2}^{2}] & = E [∥ {\hat{f}}_{n}^{lin} (x) - P_{V_{j_{0}}} f (x) + P_{V_{j_{0}}} f (x) - f (x) ∥_{2}^{2}] \\ = E [∥ {\hat{f}}_{n}^{lin} (x) - P_{V_{j_{0}}} {f (x) ∥}_{2}^{2}] + ∥ P_{V_{j_{0}}} {f (x) - f (x) ∥}_{2}^{2} . \end{matrix}

(11)

For the first part, it can be inferred that

\begin{matrix} E [∥ {\hat{f}}_{n}^{lin} (x) - P_{V_{j_{0}}} f (x) ∥_{2}^{2}] & = E [{∥\sum_{κ \in Λ_{j_{0}}} ({\hat{a}}_{j_{0}, κ} - a_{j_{0}, κ}) Φ_{j_{0}, κ} (x)∥}^{2}] \\ = E [\int_{Ω} {|\sum_{κ \in Λ_{j_{0}}} ({\hat{a}}_{j_{0}, κ} - a_{j_{0}, κ}) Φ_{j_{0}, κ} (x)|}^{2} d x] \\ = E [\sum_{κ \in Λ_{j_{0}}} {|{\hat{a}}_{j_{0}, κ} - a_{j_{0}, κ}|}^{2}] \\ = \sum_{κ \in Λ_{j_{0}}} E [{|{\hat{a}}_{j_{0}, κ} - a_{j_{0}, κ}|}^{2}] . \end{matrix}

By Lemma 3, since

|Λ_{j_{0}}| \sim 2^{j_{0} d}

and

2^{j_{0}} \sim n^{\frac{1}{2 s^{'} + d}}

, we have

\begin{matrix} E [∥ {\hat{f}}_{n}^{lin} (x) - P_{V_{j_{0}}} f (x) ∥_{2}^{2}] & = \sum_{κ \in Λ_{j_{0}}} E [{|{\hat{a}}_{j_{0}, κ} - a_{j_{0}, κ}|}^{2}] \\ ≲ \sum_{κ \in Λ_{j_{0}}} \frac{1}{n} = \frac{2^{j_{0} d}}{n} \sim n^{- \frac{2 s^{'}}{2 s^{'} + d}} . \end{matrix}

(12)

For the second part, when

p \geq 2

, we can get

d {(\frac{1}{p} - \frac{1}{2})}_{+} = 0

and

s^{'} = s

. Moreover, Hölder’s inequality implies that

\begin{matrix} {∥ P_{V_{j_{0}}} f (x) - f (x) ∥}_{2}^{2} & = \int_{Ω} {| P_{V_{j_{0}}} f (x) - f (x) |}^{2} \cdot 1 d x \\ \leq {(\int_{Ω} {| P_{V_{j_{0}}} f (x) - f (x) |}^{2 \cdot \frac{p}{2}} d x)}^{\frac{2}{p}} {(\int_{Ω} 1^{\frac{p}{p - 2}} d x)}^{1 - \frac{2}{p}} \\ ≲ {(\int_{Ω} {| P_{V_{j_{0}}} f (x) - f (x) |}^{p} d x)}^{\frac{2}{p}} \\ = {∥ P_{V_{j_{0}}} f (x) - f (x) ∥}_{p}^{2} . \end{matrix}

Furthermore, using Lemma 1 and

f (x) \in B_{p, q}^{s} (Ω)

, we can get

\begin{matrix} {∥ P_{V_{j_{0}}} f (x) - f (x) ∥}_{2}^{2} ≲ {∥ P_{V_{j_{0}}} f (x) - f (x) ∥}_{p}^{2} ≲ 2^{- 2 j_{0} s} \sim n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(13)

When

1 \leq p < 2

and

s > \frac{d}{p}

, we know that

s^{'} = s - d (\frac{1}{p} - \frac{1}{2})

and

B_{p, q}^{s} (Ω) \subseteq B_{2, + \infty}^{s^{'}} (Ω)

. In addition, the following conclusion is true,

\begin{matrix} ∥ P_{V_{j_{0}}} {f (x) - f (x) ∥}_{2}^{2} ≲ \sum_{j = j_{0}}^{\infty} 2^{- 2 j s^{'}} \sim 2^{- 2 j_{0} s^{'}} \sim n^{- \frac{2 s^{'}}{2 s^{'} + d}} . \end{matrix}

(14)

Hence, for

1 \leq p < \infty

, the results (13) and (14) show that

\begin{matrix} ∥ P_{V_{j_{0}}} {f (x) - f (x) ∥}_{2}^{2} ≲ n^{- \frac{2 s^{'}}{2 s^{'} + d}} . \end{matrix}

(15)

Finally, together with (11), (12) and (15), we prove that

\begin{matrix} E [∥ {\hat{f}}_{n}^{lin} (x) - f (x) ∥_{2}^{2}] ≲ n^{- \frac{2 s^{'}}{2 s^{'} + d}} . \end{matrix}

□
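The choice of j_0 in Theorem 1 can be read as balancing the stochastic term 2^{j_0 d}/n from (12) against the approximation term 2^{-2 j_0 s'}. A short derivation, with all constants taken to be 1 for illustration:

```latex
\frac{2^{j_{0} d}}{n} = 2^{-2 j_{0} s'}
\iff 2^{j_{0} (2 s' + d)} = n
\iff 2^{j_{0}} = n^{\frac{1}{2 s' + d}},
\qquad \text{and then} \qquad
\frac{2^{j_{0} d}}{n} = n^{\frac{d}{2 s' + d} - 1} = n^{-\frac{2 s'}{2 s' + d}} .
```

A smaller j_0 lets the approximation term dominate and a larger j_0 lets the stochastic term dominate, so either deviation gives a slower rate than the balanced choice.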

Proof of Theorem 2.

According to the definitions of linear estimator, nonlinear estimator and projection operator, we have

\begin{matrix} {\hat{f}}_{n}^{non} (x) - f (x) & = ({\hat{f}}_{n}^{lin} (x) - P_{V_{j_{0}}} f (x)) - (f (x) - P_{V_{j_{2} + 1}} f (x)) \\ + \sum_{j = j_{0}}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} ({\hat{b}}_{j, κ, μ} I_{{| {\hat{b}}_{j, κ, μ} | \geq τ r_{n}}} - b_{j, κ, μ}) Ψ_{j, κ, μ} (x) . \end{matrix}

Hence,

\begin{matrix} E [∥ {\hat{f}}_{n}^{non} (x) - f (x) ∥_{2}^{2}] ≲ T_{1} + T_{2} + D . \end{matrix}

(16)

In this inequality,

\begin{matrix} T_{1} : = E [{∥{\hat{f}}_{n}^{lin} (x) - P_{V_{j_{0}}} f (x)∥}_{2}^{2}], \\ T_{2} : = {∥f (x) - P_{V_{j_{2} + 1}} f (x)∥}_{2}^{2}, \\ D : = E [{∥\sum_{j = j_{0}}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} ({\hat{b}}_{j, κ, μ} I_{{| {\hat{b}}_{j, κ, μ} | \geq τ r_{n}}} - b_{j, κ, μ}) Ψ_{j, κ, μ} (x)∥}_{2}^{2}] . \end{matrix}

For

T_{1}

. According to Lemma 3, (12) and

2^{j_{0}} \sim n^{\frac{1}{2 ω + d}} (ω > s)

,

\begin{matrix} T_{1} & = E [{∥{\hat{f}}_{n}^{lin} (x) - P_{V_{j_{0}}} f (x)∥}_{2}^{2}] \\ = \sum_{κ \in Λ_{j_{0}}} E [{|{\hat{a}}_{j_{0}, κ} - a_{j_{0}, κ}|}^{2}] ≲ \sum_{κ \in Λ_{j_{0}}} \frac{1}{n} \\ ≲ \frac{2^{j_{0} d}}{n} \sim n^{- \frac{2 ω}{2 ω + d}} \leq n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(17)

For

T_{2}

. Consistent with the reasoning in (15), it follows that

\begin{matrix} T_{2} ≲ (ln n) n^{- \frac{2 s}{2 s + d}}, \end{matrix}

(18)

with the condition

2^{j_{2}} \sim {(\frac{n}{ln n})}^{\frac{1}{d}}

.

It remains to prove

\begin{matrix} D = E [{∥\sum_{j = j_{0}}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} ({\hat{b}}_{j, κ, μ} I_{{| {\hat{b}}_{j, κ, μ} | \geq τ r_{n}}} - b_{j, κ, μ}) Ψ_{j, κ, μ} (x)∥}_{2}^{2}] ≲ (ln n) n^{- \frac{2 s}{2 s + d}} . \end{matrix}

According to the properties of wavelet function, we have

\begin{matrix} D = \sum_{j = j_{0}}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} E [{|{\hat{b}}_{j, κ, μ} I_{{| {\hat{b}}_{j, κ, μ} | \geq τ r_{n}}} - b_{j, κ, μ}|}^{2}] . \end{matrix}

Note that the following conclusion is true:

\begin{matrix} {|{\hat{b}}_{j, κ, μ} I_{{| {\hat{b}}_{j, κ, μ} | \geq τ r_{n}}} - b_{j, κ, μ}|}^{2} & ≲ {|{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}|}^{2} I_{{| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} | > \frac{τ r_{n}}{2}}} \\ + {|{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \geq \frac{τ r_{n}}{2}}} + {|b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \leq 2 τ r_{n}}} . \end{matrix}

Then, we have

\begin{matrix} D = \sum_{j = j_{0}}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} E [{|{\hat{b}}_{j, κ, μ} I_{{| {\hat{b}}_{j, κ, μ} | \geq τ r_{n}}} - b_{j, κ, μ}|}^{2}] ≲ D_{1} + D_{2} + D_{3}, \end{matrix}

(19)

\begin{matrix} D_{1} = \sum_{j = j_{0}}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} E [{|{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}|}^{2} I_{{| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} | > \frac{τ r_{n}}{2}}}], \\ D_{2} = \sum_{j = j_{0}}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} E [{|{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \geq \frac{τ r_{n}}{2}}}], \\ D_{3} = \sum_{j = j_{0}}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} {|b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \leq 2 τ r_{n}}} . \end{matrix}

For

D_{1}

. Applying Hölder's inequality,

\begin{matrix} E [{|{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}|}^{2} I_{{| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} | > \frac{τ r_{n}}{2}}}] & \leq {(E [| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} |^{4}])}^{\frac{1}{2}} {(E [I_{{| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} | > \frac{τ r_{n}}{2}}}])}^{\frac{1}{2}} \\ \leq {(E [| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} |^{4}])}^{\frac{1}{2}} {(\Pr (| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} | \geq \frac{τ r_{n}}{2}))}^{\frac{1}{2}} . \end{matrix}

According to Lemmas 3 and 4,

\begin{matrix} E [{|{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}|}^{2} I_{{| {\hat{b}}_{j, κ, μ} - b_{j, κ, μ} | > \frac{τ r_{n}}{2}}}] & \leq {(E [{|{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}|}^{2} {({\hat{b}}_{j, κ, μ} - b_{j, κ, μ})}^{2}])}^{\frac{1}{2}} \cdot \frac{1}{n^{2}} \\ ≲ {(\frac{n}{ln n} E [{({\hat{b}}_{j, κ, μ} - b_{j, κ, μ})}^{2}])}^{\frac{1}{2}} \cdot \frac{1}{n^{2}} \\ ≲ \frac{1}{n^{2} \sqrt{ln n}} . \end{matrix}

Then,

D_{1} ≲ \sum_{j = j_{0}}^{j_{2}} \frac{2^{j d}}{n^{2} \sqrt{ln n}} ≲ \frac{2^{j_{2} d}}{n^{2} \sqrt{ln n}} ≲ \frac{1}{n {(ln n)}^{\frac{3}{2}}} \leq \frac{1}{n} \leq n^{- \frac{2 s}{2 s + d}}

. Hence,

\begin{matrix} D_{1} ≲ n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(20)

For

D_{2}

, we define

2^{j_{1}} \sim n^{\frac{1}{2 s + d}}

. Then

2^{j_{0}} \sim n^{\frac{1}{2 ω + d}} (ω > s) \leq 2^{j_{1}} \sim n^{\frac{1}{2 s + d}} \leq 2^{j_{2}} \sim {(\frac{n}{ln n})}^{\frac{1}{d}}

. Moreover, we can rewrite

D_{2}

as

\begin{matrix} D_{2} & : = (\sum_{j = j_{0}}^{j_{1}} + \sum_{j = j_{1} + 1}^{j_{2}}) \{\sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} E [{|{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \geq \frac{τ r_{n}}{2}}}]\} \\ : = D_{21} + D_{22} . \end{matrix}

(21)
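
As a rough illustration of the ordering of the levels j_{0}, j_{1} and j_{2} used to split D_{2}, the sketch below computes real-valued levels from the three scaling relations. It ignores the constants hidden in the symbol ∼ and any rounding to integers. The function name and the example values of n, s, ω and d are assumptions chosen for illustration only.

```python
import math

def resolution_levels(n, s, omega, d):
    """Return real-valued (j0, j1, j2) solving
    2^j0 = n^(1/(2*omega+d)), 2^j1 = n^(1/(2*s+d)), 2^j2 = (n/ln n)^(1/d)."""
    j0 = math.log2(n) / (2 * omega + d)
    j1 = math.log2(n) / (2 * s + d)
    j2 = math.log2(n / math.log(n)) / d
    return j0, j1, j2

# Hypothetical example: n = 10^4 observations, s = 1, omega = 2, d = 1.
j0, j1, j2 = resolution_levels(10_000, 1.0, 2.0, 1)
levels_are_ordered = j0 <= j1 <= j2
```

The first inequality follows from ω > s; the second needs n / ln n to dominate n^{d / (2 s + d)}, which holds once n is large enough.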

Using Lemma 3,

\begin{matrix} D_{21} & = \sum_{j = j_{0}}^{j_{1}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} E [{|{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \geq \frac{τ r_{n}}{2}}}] \\ ≲ \sum_{j = j_{0}}^{j_{1}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} \frac{1}{n} ≲ \sum_{j = j_{0}}^{j_{1}} \frac{2^{j d}}{n} ≲ \frac{2^{j_{1} d}}{n} \sim n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(22)

For

D_{22}

, note that

\begin{matrix} D_{22} & : = \sum_{j = j_{1} + 1}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} E [{|{\hat{b}}_{j, κ, μ} - b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \geq \frac{τ r_{n}}{2}}}] \\ ≲ \sum_{j = j_{1} + 1}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} \frac{1}{n} I_{{| b_{j, κ, μ} | \geq \frac{τ r_{n}}{2}}} . \end{matrix}

For

p \geq 2

, according to

f \in B_{p, q}^{s} (Ω)

,

r_{n} = \sqrt{\frac{ln n}{n}}

and Hölder's inequality,

\begin{matrix} D_{22} & ≲ \sum_{j = j_{1} + 1}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} \frac{1}{n} {(\frac{| b_{j, κ, μ} |}{τ r_{n} / 2})}^{2} ≲ \sum_{j = j_{1} + 1}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} {| b_{j, κ, μ} |}^{2} \\ = \sum_{j = j_{1} + 1}^{j_{2}} ∥ b_{j, κ, μ} ∥_{2}^{2} \leq \sum_{j = j_{1} + 1}^{j_{2}} 2^{j d (1 - \frac{2}{p})} {∥ b_{j, κ, μ} ∥}_{p}^{2} \\ ≲ \sum_{j = j_{1} + 1}^{j_{2}} 2^{- 2 j s} ≲ 2^{- 2 j_{1} s} \sim n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(23)
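
The Hölder step in the bound above compares the l^2 and l^p norms of a finite coefficient vector: for a vector of N entries and p ≥ 2, the squared l^2 norm is at most N^{1 - 2/p} times the squared l^p norm, and at level j the number of coefficients N is of order 2^{j d}. The following Python sketch checks this sequence inequality on random vectors. The function names and the test vectors are illustrative assumptions, not part of the paper.

```python
import random

def lp_norm(values, p):
    """l^p norm of a finite sequence."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def holder_gap(values, p):
    """N^(1 - 2/p) * ||b||_p^2 - ||b||_2^2, nonnegative for p >= 2."""
    n_coeffs = len(values)
    return n_coeffs ** (1 - 2 / p) * lp_norm(values, p) ** 2 - lp_norm(values, 2) ** 2

def holder_check(trials=1_000, size=64, seed=0):
    """Return True if the gap is nonnegative (up to rounding) on every draw."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = rng.uniform(2.0, 10.0)
        vec = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        if holder_gap(vec, p) < -1e-9:
            return False
    return True
```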

On the other hand, in the case of

1 \leq p < 2

, by Lemma 1 and the definition of

2^{j_{1}}

,

\begin{matrix} D_{22} & ≲ \sum_{j = j_{1} + 1}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} \frac{1}{n} I_{{| b_{j, κ, μ} | \geq \frac{τ r_{n}}{2}}} \\ \leq \sum_{j = j_{1} + 1}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} \frac{1}{n} {(\frac{| b_{j, κ, μ} |}{τ r_{n} / 2})}^{p} ≲ (ln n) n^{\frac{p}{2} - 1} \sum_{j = j_{1} + 1}^{j_{2}} {∥ b_{j, κ, μ} ∥}_{p}^{p} \\ ≲ (ln n) n^{\frac{p}{2} - 1} \sum_{j = j_{1} + 1}^{j_{2}} 2^{- j (s + \frac{d}{2} - \frac{d}{p}) p} ≲ (ln n) n^{\frac{p}{2} - 1} 2^{- j_{1} (s + \frac{d}{2} - \frac{d}{p}) p} \\ \sim (ln n) n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(24)
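
The last step of the bound above relies on an exponent identity: with 2^{j_{1}} ∼ n^{1 / (2 s + d)}, the power of n in n^{p/2 - 1} 2^{- j_{1} (s + d/2 - d/p) p} equals - 2 s / (2 s + d). The sketch below verifies this arithmetic exactly with rational numbers. The function names and the sample values of s, d and p are assumptions made for illustration.

```python
from fractions import Fraction

def exponent_p_lt_2(s, d, p):
    """Power of n in n^(p/2 - 1) * 2^(-j1 * (s + d/2 - d/p) * p)
    when 2^j1 = n^(1 / (2s + d))."""
    s, d, p = Fraction(s), Fraction(d), Fraction(p)
    return (p / 2 - 1) - (s + d / 2 - d / p) * p / (2 * s + d)

def target_exponent(s, d):
    """The exponent -2s / (2s + d) of the claimed rate."""
    s, d = Fraction(s), Fraction(d)
    return -2 * s / (2 * s + d)

def identity_holds(s, d, p):
    """Exact rational comparison of the two exponents."""
    return exponent_p_lt_2(s, d, p) == target_exponent(s, d)
```

The identity holds for every admissible s, d and p, since the terms involving p cancel; the code only spot-checks chosen values.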

This, together with (21)–(23), shows that

\begin{matrix} D_{2} ≲ (ln n) n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(25)

For

D_{3}

, it can be written as

\begin{matrix} D_{3} & = (\sum_{j = j_{0}}^{j_{1}} + \sum_{j = j_{1} + 1}^{j_{2}}) \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} {|b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \leq 2 τ r_{n}}} \\ : = D_{31} + D_{32} . \end{matrix}

(26)

For the upper bound of

D_{31}

, it is easy to see that

\begin{matrix} D_{31} & = \sum_{j = j_{0}}^{j_{1}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} {|b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \leq 2 τ r_{n}}} \\ \leq \sum_{j = j_{0}}^{j_{1}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} {| 2 τ r_{n} |}^{2} ≲ \sum_{j = j_{0}}^{j_{1}} \frac{ln n}{n} 2^{j d} \\ ≲ \frac{ln n}{n} 2^{j_{1} d} ≲ (ln n) n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(27)

For the upper bound for

D_{32}

, when

p \geq 2

, using Lemma 1 and Hölder’s inequality,

\begin{matrix} D_{32} & = \sum_{j = j_{1} + 1}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} {|b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \leq 2 τ r_{n}}} \\ ≲ \sum_{j = j_{1} + 1}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} {|b_{j, κ, μ}|}^{2} ≲ \sum_{j = j_{1} + 1}^{j_{2}} 2^{j d (1 - \frac{2}{p})} {∥ b_{j, κ, μ} ∥}_{p}^{2} \\ ≲ \sum_{j = j_{1} + 1}^{j_{2}} 2^{- 2 j s} ≲ 2^{- 2 j_{1} s} \sim n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(28)

For

1 \leq p < 2

, we have

{|b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \leq 2 τ r_{n}}} \leq | b_{j, κ, μ} |^{p} {| 2 τ r_{n} |}^{2 - p}

. Furthermore,

\begin{matrix} D_{32} & = \sum_{j = j_{1} + 1}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} {|b_{j, κ, μ}|}^{2} I_{{| b_{j, κ, μ} | \leq 2 τ r_{n}}} \\ ≲ \sum_{j = j_{1} + 1}^{j_{2}} \sum_{μ = 1}^{2^{d} - 1} \sum_{κ \in Λ_{j}} | b_{j, κ, μ} |^{p} | 2 τ r_{n} |^{2 - p} ≲ \sum_{j = j_{1} + 1}^{j_{2}} {∥ b_{j, κ, μ} ∥}_{p}^{p} {(\frac{ln n}{n})}^{\frac{2 - p}{2}} \\ ≲ {(\frac{ln n}{n})}^{\frac{2 - p}{2}} \sum_{j = j_{1} + 1}^{j_{2}} 2^{- j (s + \frac{d}{2} - \frac{d}{p}) p} ≲ {(\frac{ln n}{n})}^{\frac{2 - p}{2}} 2^{- j_{1} (s + \frac{d}{2} - \frac{d}{p}) p} \\ ≲ (ln n) n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(29)
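
The elementary inequality used for 1 ≤ p < 2 follows from writing |b|^2 = |b|^p |b|^{2 - p} and bounding the second factor by T^{2 - p} whenever |b| ≤ T, where T plays the role of 2 τ r_{n}. The Python sketch below checks it on random inputs. The function names and sampling ranges are illustrative assumptions.

```python
import random

def interpolation_holds(b, threshold, p):
    """Check |b|^2 * 1{|b| <= T} <= |b|^p * T^(2 - p) for 1 <= p < 2."""
    left = b * b if abs(b) <= threshold else 0.0
    right = abs(b) ** p * threshold ** (2 - p)
    return left <= right + 1e-12

def interpolation_check(trials=100_000, seed=0):
    """Return True if the inequality holds on every random draw."""
    rng = random.Random(seed)
    for _ in range(trials):
        b = rng.uniform(-3.0, 3.0)
        threshold = rng.uniform(0.01, 3.0)
        p = rng.uniform(1.0, 2.0)
        if not interpolation_holds(b, threshold, p):
            return False
    return True
```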

Then, according to (26)–(29),

\begin{matrix} D_{3} ≲ (ln n) n^{- \frac{2 s}{2 s + d}} . \end{matrix}

(30)

By (19), (20), (25) and (30), we obtain

\begin{matrix} D ≲ (ln n) n^{- \frac{2 s}{2 s + d}} . \end{matrix}

Together with (16)–(18), this yields

\begin{matrix} E [∥ {\hat{f}}_{n}^{non} (x) - f (x) ∥_{2}^{2}] ≲ (ln n) n^{- \frac{2 s}{2 s + d}} . \end{matrix}

□
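
The quantities in this proof can be made concrete with a small sketch of the hard-thresholding rule, which keeps an estimated coefficient only when its absolute value is at least τ r_{n} with r_{n} = \sqrt{ln n / n}, together with the resulting rate (ln n) n^{- 2 s / (2 s + d)}. The function names, the sample coefficients and the value of τ are assumptions for illustration; the paper's admissible values of τ are not reproduced here.

```python
import math

def hard_threshold(coeffs, n, tau):
    """Keep a coefficient only if |coefficient| >= tau * sqrt(ln n / n)."""
    r_n = math.sqrt(math.log(n) / n)
    return [c if abs(c) >= tau * r_n else 0.0 for c in coeffs]

def rate_with_log(n, s, d):
    """Upper-bound rate (ln n) * n^(-2s / (2s + d)), constants omitted."""
    return math.log(n) * n ** (-2 * s / (2 * s + d))

# Hypothetical example: three estimated coefficients, n = 1000, tau = 1.
kept = hard_threshold([0.5, 0.01, -0.2], 1000, 1.0)
```

With these assumed inputs the threshold is about 0.083, so the middle coefficient is set to zero and the other two are kept.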

5. Conclusions

This paper systematically investigates the application of wavelet methods in density estimation under nonparametric mixture models. Under mild regularity conditions, two estimators are constructed: a linear estimator and a nonlinear adaptive estimator. Meanwhile, we conduct theoretical analysis to derive the convergence rates of these estimators under the

L^{2}

-risk criterion. The results show that, in some cases, both estimators achieve the optimal convergence rate of nonparametric density estimation; for the adaptive nonlinear estimator, the bound proved above matches this rate up to a logarithmic factor. Furthermore, the numerical experiments indicate that the practical performance of the proposed estimators is consistent with the theoretical conclusions.

Author Contributions

Methodology, J.K.; Writing—original draft, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is supported by Guangxi Natural Science Foundation (Nos. 2024GXNSFBA010379, 2023GXNSFAA026042), the National Natural Science Foundation of China (No. 12361016), Center for Applied Mathematics of Guangxi (GUET), Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

All authors would like to thank the reviewers for their important comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Huber, P.J. A robust version of the probability ratio test. Ann. Math. Stat. 1965, 36, 1753–1758.
2. McLachlan, G.; Peel, D. Finite Mixture Models; Wiley: New York, NY, USA, 2000.
3. Allison, D.B.; Gadbury, G.L.; Heo, M.; Fernández, J.R.; Lee, C.K.; Prolla, T.A.; Weindruch, R. A mixture model approach for the analysis of microarray gene expression data. Comput. Stat. Data Anal. 2002, 39, 1–20.
4. Aubert, J.; Bar-Hen, A.; Daudin, J.J.; Robin, S. Determination of the differentially expressed genes in microarray experiments using local FDR. BMC Bioinform. 2004, 5, 125.
5. Liao, J.G.; Lin, Y.; Selvanayagam, Z.E.; Shih, W.J. A mixture model for estimating the local false discovery rate in DNA microarray analysis. Bioinformatics 2004, 20, 2694–2701.
6. Shu, H.; Nan, B.; Koeppe, R. Multiple testing for neuroimaging via hidden Markov random field. Biometrics 2015, 71, 741–750.
7. Efron, B. Large-scale simultaneous hypothesis testing. J. Am. Stat. Assoc. 2004, 99, 96–104.
8. McLachlan, G.J.; Bean, R.W.; Jones, L.B.T. A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays. Bioinformatics 2006, 22, 1608–1615.
9. Efron, B.; Tibshirani, R.; Storey, J.D.; Tusher, V. Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc. 2001, 96, 1151–1160.
10. Olkin, I.; Spiegelman, C.H. A semiparametric approach to density estimation. J. Am. Stat. Assoc. 1987, 82, 858–865.
11. Priebe, C.E.; Marchette, D.J. Alternating kernel and mixture density estimates. Comput. Stat. Data Anal. 2000, 35, 43–65.
12. James, L.F.; Priebe, C.E.; Marchette, D.J. Consistent estimation of mixture complexity. Ann. Stat. 2001, 29, 1281–1296.
13. Robin, S.; Bar-Hen, A.; Daudin, J.J.; Pierre, L. A semi-parametric approach for mixture models: Application to local false discovery rate estimation. Comput. Stat. Data Anal. 2007, 51, 5483–5493.
14. Nguyen, V.H.; Matias, C. Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation. ESAIM Probab. Stat. 2014, 18, 584–612.
15. Patra, R.K.; Sen, B. Estimation of a two-component mixture model with applications to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 2016, 78, 869–893.
16. Deb, N.; Saha, S.; Guntuboyina, A.; Sen, B. Two-component mixture model in the presence of covariates. J. Am. Stat. Assoc. 2022, 117, 1820–1834.
17. Chesneau, C.; Doosti, H.; Stone, L. Adaptive wavelet estimation of a function from an m-dependent process with possibly unbounded m. Commun. Stat.-Theory Methods 2018, 48, 1123–1135.
18. Amato, U.; Antoniadis, A.; Feis, I.D.; Gijbels, I. Wavelet-based robust estimation and variable selection in nonparametric additive models. Stat. Comput. 2022, 32, 11.
19. Niles-Weed, J.; Berthet, Q. Minimax estimation of smooth densities in Wasserstein distance. Ann. Stat. 2022, 50, 1519–1540.
20. Shirazi, E.; Doosti, H. Evaluation of threshold selection methods for adaptive wavelet quantile density estimation in the presence of bias. Commun. Stat.-Simul. Comput. 2024, 53, 6633–6646.
21. Benhaddou, R.; Liu, Q. Wavelet estimation for the nonparametric additive model in random design and long-memory dependent errors. J. Nonparametric Stat. 2024, 36, 1088–1113.
22. Rademacher, D.; Krebs, J.; Sachs, R.V. Statistical inference for wavelet curve estimators of symmetric positive definite matrices. J. Stat. Plan. Inference 2024, 231, 106140.
23. Härdle, W.; Kerkyacharian, G.; Picard, D.; Tsybakov, A. Wavelets, Approximation and Statistical Applications; Springer: New York, NY, USA, 1997.
24. Daubechies, I. Ten Lectures on Wavelets; SIAM: Philadelphia, PA, USA, 1992.
25. Boggess, A.; Narcowich, F.J. A First Course in Wavelets with Fourier Analysis; Wiley and Sons: Toronto, ON, Canada, 2009.
26. Donoho, D.L.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Density estimation by wavelet thresholding. Ann. Stat. 1996, 24, 508–539.
27. Juditsky, A.; Lambert-Lacroix, S. On minimax density estimation on R. Bernoulli 2004, 10, 187–220.
28. Reynaud-Bouret, P.; Rivoirard, V.; Tuleau-Malot, C. Adaptive density estimation: A curse of support. J. Stat. Plan. Inference 2011, 141, 115–139.
29. Chesneau, C.; Kolei, S.E.; Kou, J.K.; Navarro, F. Nonparametric estimation in a regression model with additive and multiplicative noise. J. Comput. Appl. Math. 2020, 380, 112971.

Figure 1. Estimations of two wavelet estimators with density function

f (x) = f_{1} (x)

. (a) Linear wavelet estimator result

{\hat{f}}_{n}^{lin} (x)

; (b) nonlinear wavelet estimator result

{\hat{f}}_{n}^{non} (x)

; (c)

M S E ({\hat{f}}_{n}^{lin}, f)

with different scale parameter

j_{0}

; (d)

M S E ({\hat{f}}_{n}^{non}, f)

with different thresholding parameter

λ

.

Figure 2. Estimations of two wavelet estimators with density function

f (x) = f_{2} (x)

. (a) Linear wavelet estimator result

{\hat{f}}_{n}^{lin} (x)

; (b) nonlinear wavelet estimator result

{\hat{f}}_{n}^{non} (x)

; (c)

M S E ({\hat{f}}_{n}^{lin}, f)

under different scale parameters

j_{0}

; (d)

M S E ({\hat{f}}_{n}^{non}, f)

under different threshold parameters

λ

.

Figure 3. Estimations of two wavelet estimators with density function

f (x) = f_{3} (x)

. (a) Linear wavelet estimator result

{\hat{f}}_{n}^{lin} (x)

; (b) nonlinear wavelet estimator result

{\hat{f}}_{n}^{non} (x)

; (c)

M S E ({\hat{f}}_{n}^{lin}, f)

with different scale parameter

j_{0}

; (d)

M S E ({\hat{f}}_{n}^{non}, f)

with different thresholding parameter

λ

.

Figure 4. Estimations of two wavelet estimators with density function

f (x) = f_{4} (x)

. (a) Linear wavelet estimator result

{\hat{f}}_{n}^{lin} (x)

; (b) nonlinear wavelet estimator result

{\hat{f}}_{n}^{non} (x)

; (c)

M S E ({\hat{f}}_{n}^{lin}, f)

under different scale parameters

j_{0}

; (d)

M S E ({\hat{f}}_{n}^{non}, f)

under different threshold parameters

λ

.

Figure 5. Estimations of two wavelet estimators with density function

f (x) = f_{5} (x)

. (a) Linear wavelet estimator result

{\hat{f}}_{n}^{lin} (x)

; (b) nonlinear wavelet estimator result

{\hat{f}}_{n}^{non} (x)

; (c)

M S E ({\hat{f}}_{n}^{lin}, f)

with different scale parameter

j_{0}

; (d)

M S E ({\hat{f}}_{n}^{non}, f)

with different thresholding parameter

λ

.

Figure 6. Estimations of two wavelet estimators with density function

f (x) = f_{6} (x)

. (a) Linear wavelet estimator result

{\hat{f}}_{n}^{lin} (x)

; (b) nonlinear wavelet estimator result

{\hat{f}}_{n}^{non} (x)

; (c)

M S E ({\hat{f}}_{n}^{lin}, f)

under different scale parameters

j_{0}

; (d)

M S E ({\hat{f}}_{n}^{non}, f)

under different threshold parameters

λ

.

Table 1. Results of the wavelet estimators.

Quantity	$f_{1}$	$f_{2}$	$f_{3}$	$f_{4}$	$f_{5}$	$f_{6}$
$j_{0}$	5	6	9	8	5	6
$λ$	0.0321070234	0.0051170569	0.0255852843	0.1056856187	0.0929765886	0.006204013
$M S E ({\hat{f}}_{n}^{lin}, f)$	0.0082318890	0.0004092525	0.0004416361	0.0024805800	0.0008602093	0.0005513363
$M S E ({\hat{f}}_{n}^{non}, f)$	0.0046535810	0.0002191909	0.0002000215	0.0012282120	0.0002903662	0.0002366971

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
