Pointwise Optimality of Wavelet Density Estimation for Negatively Associated Biased Sample

Abstract: This paper focuses on the density estimation problem that arises when the sample is negatively associated and biased. We construct a block thresholding wavelet estimator to recover the density function from the negatively associated biased sample. The pointwise optimality of this wavelet density estimator is established under $L^p$ ($1 \le p < \infty$) risks over Besov spaces. To validate the effectiveness of the block thresholding wavelet method, we provide some examples and carry out numerical simulations. The results indicate that our block thresholding wavelet density estimator is superior to the nonlinear wavelet density estimator in terms of the mean squared error (MSE).


Introduction
Let $X_1, X_2, \cdots, X_n$ be the unobserved realizations of a random variable $X$ with density function $g$, and let $Y_1, Y_2, \cdots, Y_n$ be the recorded observations of a random variable $Y$ with density function:

$$f(y) = \frac{\eta(y)\, g(y)}{\mu}, \quad y \in [0, 1], \tag{1}$$

where $\eta(y)$ represents a biasing function and $\mu = E(\eta(X)) = \int_0^1 \eta(y)\, g(y)\, dy$ ($0 < \mu < \infty$) is its expectation.
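As a minimal sketch of Model (1), the following Python snippet draws an independent (not NA) biased sample by rejection sampling and recovers $\mu$ from the biased observations alone, using the identity $E[1/\eta(Y)] = \int_0^1 \eta(y)^{-1}\, \eta(y)\, g(y)\, \mu^{-1}\, dy = 1/\mu$. The choices $g \equiv 1$ on $[0, 1]$ and $\eta(y) = y + 1$ (so that $\mu = 3/2$), as well as all function names, are illustrative assumptions and are not taken from the simulation study.

```python
import random


def eta(y):
    """Illustrative biasing function (assumption): eta(y) = y + 1 on [0, 1]."""
    return y + 1.0


def draw_biased_sample(n, rng):
    """Draw n i.i.d. values from eta(y) g(y) / mu with g = Uniform[0, 1].

    Rejection sampling: propose X ~ g and accept it with probability
    eta(X) / max(eta), where max(eta) = 2 on [0, 1].
    """
    sample = []
    while len(sample) < n:
        x = rng.random()
        if rng.random() < eta(x) / 2.0:
            sample.append(x)
    return sample


def estimate_mu(sample):
    """Harmonic-mean estimate of mu, based on E[1 / eta(Y)] = 1 / mu."""
    return len(sample) / sum(1.0 / eta(y) for y in sample)


rng = random.Random(0)
ys = draw_biased_sample(20000, rng)
mu_hat = estimate_mu(ys)  # expected to be close to the true value mu = 1.5
```

The sample here is independent by construction; generating an NA sample requires a separate mechanism such as the one described in the simulation study.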
To recover the density function $g(y)$ in Model (1) from the sample $Y_1, Y_2, \cdots, Y_n$, statisticians have conducted thorough explorations [1][2][3][4]. The wavelet method, which can adapt to the local features of the density function, has been widely used in this line of research. For instance, Ramirez and Vidakovic [2] estimated the density from a stratified size-biased sample using a linear wavelet method and proved the consistency of their wavelet estimator under the $L^2$ risk over Besov spaces. Chesneau [3] considered a pointwise wavelet estimation on the $L^p$ ($1 \le p < \infty$) risks [4] and extended this univariate wavelet estimation to the multivariate case. However, the above results rely on the independence assumption for the size-biased sample, which is a serious restriction in practical applications. When the size-biased sample is dependent, many researchers have modelled the dependence as negative association (NA), which is defined as follows.

Definition 1. Let $A$ and $B$ be an arbitrary pair of disjoint nonempty subsets of $\{1, 2, \cdots, n\}$. Assume that $f_1$ and $f_2$ are real-valued coordinate-wise nondecreasing functions. For a sequence of random variables $Y_1, Y_2, \cdots, Y_n$, if the covariances of the random variable functions exist and satisfy

$$\mathrm{Cov}\big(f_1(Y_i, i \in A),\ f_2(Y_j, j \in B)\big) \le 0,$$

then $Y_1, Y_2, \cdots, Y_n$ are said to be NA.
The concept of negative association was first proposed by Alam and Saxena [5], and its basic properties were investigated by Joag-Dev and Proschan [6]. Negative association has been widely applied to multivariate statistical analysis and systems reliability since it covers many multivariate distributions, such as: (a) multinomial, (b) convolution of unlike multinomial, (c) Dirichlet, (d) negatively correlated normal distribution, and (e) random sampling without replacement. Several studies have investigated the fundamental and asymptotic properties of an NA sample [7][8][9][10][11][12]. Recently, the NA sequence has been introduced into Model (1), and the wavelet density estimation of an NA size-biased sample has been studied. For example, Chesneau [13] obtained the optimal convergence rate of the linear wavelet method for an NA size-biased sample under the $L^2$ risk. As the linear wavelet method is not adaptive, Liu and Xu [14] and Guo and Kou [15] considered a nonlinear wavelet method to estimate the density function from an NA (stratified) size-biased sample. This nonlinear wavelet density estimation has been shown to be adaptive, and a pointwise convergence rate over $L^p$ risks was established.
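To make the covariance condition in the definition of NA concrete, the sketch below computes the exact covariance of $f_1(Y_1)$ and $f_2(Y_2)$ for the first two draws without replacement from a small finite population, which is case (e) in the list above. It only checks the index sets $A = \{1\}$ and $B = \{2\}$ for two particular pairs of nondecreasing functions, so it illustrates the condition rather than verifying it in full; the population and the function names are assumptions.

```python
import math
from itertools import permutations


def cov_without_replacement(population, f1, f2):
    """Exact Cov(f1(Y1), f2(Y2)) for the first two draws without replacement.

    Every ordered pair of distinct population members is equally likely.
    """
    pairs = list(permutations(population, 2))
    m = len(pairs)
    e12 = sum(f1(a) * f2(b) for a, b in pairs) / m
    e1 = sum(f1(a) for a, _ in pairs) / m
    e2 = sum(f2(b) for _, b in pairs) / m
    return e12 - e1 * e2


population = [1, 2, 3, 4, 5]

# Identity functions: the covariance is -sigma^2 / (N - 1) = -2 / 4.
cov_identity = cov_without_replacement(population, lambda y: y, lambda y: y)

# Another pair of nondecreasing functions; the covariance stays non-positive.
cov_monotone = cov_without_replacement(population, lambda y: y ** 3, math.exp)
```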
As far as we know, a good density estimation procedure should simultaneously achieve two objectives: computational efficiency and adaptivity [16,17]. Although the nonlinear method in [14,15] is adaptive, its convergence rate is only nearly optimal (up to a logarithmic factor). Cai and Chicken [16] proposed a block thresholding method that can remove the logarithmic factor from the convergence rate of the wavelet density estimation. The block thresholding method provides spatial adaptivity to relatively subtle changes in the underlying density function. More specifically, it thresholds the wavelet coefficients in blocks of length $L$ rather than term by term; thus, it produces a degree of graduated smoothing, which amounts to choosing an appropriate bandwidth in kernel estimation. In contrast, the hard thresholding method in Liu and Xu [14] applied to density estimation amounts to using a global bandwidth in kernel estimation.
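The contrast between term-by-term and block thresholding can be sketched as follows. This is a generic illustration and not the estimator studied in this paper: the keep-or-kill rule based on the $L^p$ mean of each block, the threshold value, and the function names are all assumptions chosen for readability.

```python
def hard_threshold(coeffs, thresh):
    """Term-by-term rule: keep a coefficient only if it is large on its own."""
    return [c if abs(c) >= thresh else 0.0 for c in coeffs]


def block_threshold(coeffs, block_len, thresh, p=2):
    """Block rule: keep or kill all coefficients of a block together.

    A block is kept when the L^p mean of its coefficients reaches the threshold.
    """
    out = []
    for start in range(0, len(coeffs), block_len):
        block = coeffs[start:start + block_len]
        lp_mean = (sum(abs(c) ** p for c in block) / len(block)) ** (1.0 / p)
        out.extend(block if lp_mean >= thresh else [0.0] * len(block))
    return out


coeffs = [0.9, 0.4, 0.05, 0.02, 0.03, 0.6]
hard = hard_threshold(coeffs, thresh=0.5)
block = block_threshold(coeffs, block_len=2, thresh=0.5)
```

With these toy values, the term-by-term rule discards 0.4 and keeps the isolated 0.6, whereas the block rule keeps 0.4 because its neighbour is large and discards 0.6 because its neighbour is negligible. Decisions are pooled over neighbouring coefficients, which is the source of the graduated smoothing described above.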
We aim to remove the logarithmic factor in the convergence rate of wavelet density estimation and thus enhance computational efficiency. We adopt the block thresholding technique [16,18] and allow the sample to be NA size-biased. We construct an $L^p$ version of the block thresholding wavelet estimator for the density function $g(y)$ in Model (1). This estimator is adaptive and simultaneously achieves the pointwise optimal convergence rate over Besov spaces. Some examples are provided, and the corresponding simulations were conducted using R software. The results indicate that the block thresholding wavelet density estimator is better than the nonlinear wavelet density estimator in terms of the mean squared error (MSE).

Notations and Assumptions
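Denote the scaling function and its associated wavelet function by $\phi$ and $\psi$, respectively. We assume that both $\phi$ and $\psi$ are $r$-regular and have periodic boundary conditions on $[0, 1]$. Let $\phi_{i_0 j}(y) = 2^{i_0/2} \phi(2^{i_0} y - j)$ and $\psi_{ij}(y) = 2^{i/2} \psi(2^{i} y - j)$. We define an orthonormal basis (ONB) of $L^2[0, 1]$ as:

$$\{\phi_{i_0 j},\ j = 0, 1, \cdots, 2^{i_0} - 1;\ \psi_{ij},\ i \ge i_0,\ j = 0, 1, \cdots, 2^{i} - 1\}.$$

Then, a function $g \in L^2[0, 1]$ can be reconstructed as:

$$g(y) = \sum_{j=0}^{2^{i_0}-1} \alpha_{i_0 j}\, \phi_{i_0 j}(y) + \sum_{i \ge i_0} \sum_{j=0}^{2^{i}-1} \beta_{ij}\, \psi_{ij}(y),$$

where $\alpha_{i_0 j} = \int_0^1 g(y)\, \phi_{i_0 j}(y)\, dy$ and $\beta_{ij} = \int_0^1 g(y)\, \psi_{ij}(y)\, dy$.

The Besov space $B^s_{h,q}$ includes the common Sobolev ($B^s_{2,2}$) and Hölder ($B^s_{\infty,\infty}$) spaces, so Besov spaces offer a flexible collection of smooth functions. For a function $g(y)$ that belongs to the Besov space, the Besov sequence norm $b^s_{h,q}$ of the wavelet coefficients is bounded [19], which implies that a positive constant $K$ exists, such that:

$$\|\alpha_{i_0}\|_h + \Big( \sum_{i \ge i_0} \big( 2^{i(s + 1/2 - 1/h)} \|\beta_i\|_h \big)^q \Big)^{1/q} \le K,$$

where $\beta_i$ and $\alpha_{i_0}$ denote the vectors of coefficients at the corresponding levels, $s$ is an index of regularity that satisfies $0 < s < r$, and $h$ and $q$ satisfy $1 \le h, q \le \infty$.

To establish our theorem, we list some assumptions that are necessary in our proofs.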
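The dilation and translation rule $\psi_{ij}(y) = 2^{i/2} \psi(2^{i} y - j)$ and the orthonormality of the resulting family can be checked numerically. The sketch below uses the Haar wavelet purely because it has a closed form; Haar does not satisfy the $r$-regularity assumed in this section, so it is an illustration of the indexing only, and the function names are assumptions.

```python
def haar_mother(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def psi(i, j):
    """Return the function y -> 2^(i/2) * psi(2^i * y - j)."""
    scale = 2.0 ** (i / 2.0)
    return lambda y: scale * haar_mother(2.0 ** i * y - j)


def inner_product(u, v, grid_size=1024):
    """Midpoint-rule approximation of the L2[0, 1] inner product of u and v."""
    step = 1.0 / grid_size
    total = sum(u((k + 0.5) * step) * v((k + 0.5) * step) for k in range(grid_size))
    return total * step


norm_sq = inner_product(psi(2, 1), psi(2, 1))      # unit norm
same_level = inner_product(psi(2, 1), psi(2, 2))   # disjoint supports
cross_level = inner_product(psi(1, 0), psi(2, 1))  # different resolution levels
```

Because Haar functions are piecewise constant on dyadic intervals, the midpoint rule on a dyadic grid reproduces these inner products up to floating-point rounding.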
(A1) The function $g(y)$ is bounded; that is, a positive constant $c_1$ exists, such that $\sup_{y \in [0, 1]} g(y) \le c_1$.
(A2) The biasing function $\eta(y)$ is non-increasing, and two positive constants $c_2$ and $c_3$ exist, such that $c_2 \le \eta(y) \le c_3$ for all $y \in [0, 1]$.

Estimator and Main Result
Let $L = (\log n)^{(p \vee 2)}$ and divide the set $\{0, 1, \cdots, 2^{i} - 1\}$ into consecutive, non-overlapping blocks of length $L$ at every resolution level $i$, i.e., 2 log 2 (log 2 n) and i 1 = log 2 n log 2 n and let $\sum_{(im)}$ represent the summation over $j \in \Delta_{im}$. Then, the block thresholding estimator is given by:

We outline the pointwise convergence rate of the wavelet density estimator $\hat{g}(y)$ in the following theorem.
Theorem 1. Let $g \in B^s_{h,q}$ and $s > \max\{1/h, 1/2\}$. For the block thresholding wavelet estimator $\hat{g}(y)$ defined by (5), if conditions (A1) and (A2) are satisfied, then for $1 \le p < \infty$ and $h \ge p(2s + 1)$, a positive constant $C$ exists, such that:

Remark 2. The block thresholding wavelet estimator is adaptive since $i_0$ and $i_1$ do not depend on $s$, $h$, or $q$. If $p = 2$ and $\eta(y) \equiv 1$, Model (1) reduces to the standard density estimation model and the convergence rate in Theorem 1 coincides with the result of Cai and Chicken [16]. If the NA bias reduces to the independent bias, the result of Theorem 1 reduces to Theorem 4.1 in Chesneau [3].

Remark 3.
In the presence of NA bias, Liu and Xu [14] and Guo and Kou [15] established near-optimal convergence rates (up to a logarithmic term) of a nonlinear wavelet estimator for the density function in Model (1). Note that the logarithmic term in the convergence rate has been removed in Theorem 1, which improves on the convergence rates in [14,15].

Simulation Study
We used the method in Alam and Saxena [5] and Liu and Xu [14] to generate NA samples, firstly drawing n samples Z 1 , Y n are NA (refer to Liu and Xu [14]). Throughout the simulations, we took $\delta = 1/7$. We now provide three examples that cover linear and nonlinear density functions with both continuous and discontinuous derivatives.

Example 2.
Let $\eta(y) = y + 1$ and:

For the block wavelet threshold, we set $c_2 = \mu$ and $c_1 = \max_{1 \le t \le n} \{g(Y_t)\} + 1$. To evaluate the performance of the density estimators, we considered the MSE, which is defined as:

Figures 1-3 display the recovery of the density functions by the wavelet density estimators for NA samples in the above examples. Table 1 lists the MSEs of these estimators. The results indicate that both the nonlinear wavelet method and the block thresholding wavelet method performed well in the density estimation problem even though the samples were NA size-biased. The block wavelet density estimator performed better than the nonlinear wavelet density estimator in terms of the MSE, and the estimates became increasingly accurate as the sample size increased.

Proof of Theorem 1
The proof of Theorem 1 is similar to that of Theorem 4.2 in Chesneau [20] and Theorem 4.1 in Chesneau [3]. The difference is that the samples considered here are not only biased, but also NA. Hence, we had to overcome some non-trivial technical difficulties. Before elaborating the detailed proof, we introduce some basic properties and inequalities of the NA sequence in the following lemmas.

Lemma 1 [13]. For a sequence of NA random variables $Z_1, Z_2, \cdots, Z_n$ and non-empty subsets $B_1, B_2, \cdots, B_k$ of $\{1, 2, \cdots, n\}$, if $B_1, B_2, \cdots, B_k$ are pairwise disjoint and $f_1, f_2, \cdots, f_k$ are $k$ coordinate-wise non-decreasing Borel functions, then $f_1(Z_t, t \in B_1), f_2(Z_t, t \in B_2), \cdots, f_k(Z_t, t \in B_k)$ are still NA.

Lemma 2 [20]. Let $\{Z_t, t \ge 1\}$ be NA random variables and $\{Z_t^*, t \ge 1\}$ be independent random variables with the same marginal distribution as $Z_t$. If $f$ is a non-decreasing function, then:

Lemma 3 [20]. If $\{Z_t, t \ge 1\}$ are NA random variables with $E Z_t = 0$ and $E|Z_t|^2 < \infty$, then we have:
(1) Rosenthal-type inequality: If $E|Z_t|^p < \infty$ for some $p \ge 2$, then a constant $C_p$ (depending only on $p$) exists, such that:
(2) Kolmogorov-type inequality: Denote $b_n^2 = \sum_{t=1}^{n} E Z_t^2$; we have, for all $0 < b < 1$, $z > 0$ and $\tau > 0$:
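For orientation, Rosenthal-type moment bounds for zero-mean NA sequences are usually written in the form below. This is the standard statement from the literature on NA sequences and is offered as an assumption about the shape of part (1) of Lemma 3, not as a quotation from [20]; the constant $C_p$ depends only on $p$.

```latex
E\left| \sum_{t=1}^{n} Z_t \right|^{p}
\le C_p \left\{ \sum_{t=1}^{n} E|Z_t|^{p}
+ \left( \sum_{t=1}^{n} E Z_t^{2} \right)^{p/2} \right\},
\qquad p \ge 2 .
```

The bound has the same two-term structure as the classical Rosenthal inequality for independent variables, which is why moment arguments developed for independent samples can often be carried over to NA samples.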

Proof of Theorem 1.
According to the proof of Theorem 4.1 in Chesneau [3], the proof of Theorem 1 will be completed if we can show that the moment inequality and the large deviation inequality hold at resolution level $i$ (the case of the primary resolution level $i_0$ can be treated similarly). Therefore, the remainder of the proof is composed of the following two parts.

Part one: Moment inequality. By the proof of Proposition 4.1 in Chesneau [3], we have that the inequality (6) is equivalent to where: We first consider the term $G_{ij}$. Write: Since $\psi$ is a function of bounded variation, two bounded nonnegative nondecreasing functions $\tilde{\psi}$ and $\bar{\psi}$ exist, such that $\psi = \tilde{\psi} - \bar{\psi}$. Denote: where $\tilde{\beta}_{ij} = \int_0^1 g(y)\, \tilde{\psi}_{ij}(y)\, dy$ and $\bar{\beta}_{ij} = \int_0^1 g(y)\, \bar{\psi}_{ij}(y)\, dy$ and, hence, $\xi_t = \tilde{\xi}_t - \bar{\xi}_t$. By Lemma 1 and $E(\xi_1) = \int_0^1 \mu\, \eta^{-1}(y)\, \psi_{ij}(y)\, g(y)\, \eta(y)\, \mu^{-1}\, dy - \beta_{ij} = 0$, $\tilde{\xi}_t$ and $\bar{\xi}_t$ ($t = 1, 2, \cdots, n$) are zero-mean NA random variables according to the monotonicity of $\eta(y)$ in assumption (A2).