Regression Estimation with Errors in the Variables via the Laplace Transform

This paper considers nonparametric regression estimation with errors in the variables. It is a standard assumption that the characteristic function of the covariate error does not vanish on the real line; this assumption is rather strong. In this paper, we assume that the covariate error distribution is a convolution of uniform distributions, whose characteristic function has zeros on the real line. Our regression estimator is constructed via the Laplace transform. We prove its strong consistency and establish its convergence rate. It turns out that the zeros of the characteristic function have no effect on the convergence rate of our estimator.


Introduction
This paper considers a regression model with errors in the variables. Suppose the observations (W_1, Y_1), …, (W_n, Y_n) are i.i.d. (independent and identically distributed) random variables generated by the model

$$Y_j = m(X_j) + \varepsilon_j, \qquad W_j = X_j + \delta_j, \qquad j = 1, \dots, n. \tag{1}$$

The i.i.d. random variables δ_j are independent of X_j and Y_j; the errors ε_j are independent of X_j, with Eε_j = 0 and Eε_j² < +∞. The functions f_δ (known) and f_X (unknown) stand for the densities of δ_j and X_j, respectively. The goal is to estimate the regression function m(x) from the observations (W_1, Y_1), …, (W_n, Y_n). Errors-in-variables regression problems have been extensively studied in the literature; see, for example, [1–7]. Regression models with errors in the variables play an important role in many areas of science and social science [8–10]. Nadaraya and Watson [11,12] propose a kernel regression estimator for the classical regression model (δ_j = 0). Since the Fourier transform turns a convolution into an ordinary product, it is a common tool for deconvolution problems. Fan and Truong [4] generalize the Nadaraya–Watson regression estimator from the classical regression model to the regression model (1) via the Fourier transform. They study the convergence rate under the assumption that integer-order derivatives of f_X and m are bounded. Compared with integer-order derivatives, the Hölder condition describes smoothness more precisely. Meister [6] establishes the convergence rate under a local Hölder condition.
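To make the data-generating mechanism concrete, the following minimal Python sketch simulates model (1) with the covariate error δ distributed as a γ-fold convolution of uniforms (the error model introduced in Section 2). The regression function m and the design density are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, gamma=2, theta=0.5):
    """Draw (W_j, Y_j) from model (1): Y = m(X) + eps, W = X + delta,
    where delta is a gamma-fold convolution of Uniform[-theta, theta]."""
    m = lambda x: np.sin(2 * x)          # illustrative regression function
    X = rng.normal(size=n)               # illustrative design density f_X
    eps = 0.1 * rng.normal(size=n)       # E eps = 0, E eps^2 < +infinity
    # gamma-fold convolution: sum of gamma independent uniforms
    delta = rng.uniform(-theta, theta, size=(gamma, n)).sum(axis=0)
    return X + delta, m(X) + eps         # observed (W, Y)

W, Y = simulate(1000)
```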
The references above on model (1) all assume that the characteristic function of the covariate errors δ_j has no zeros on the real line. This assumption is rather strong. For example, if f_δ is the uniform density on [−1, 1], its characteristic function is sin(v)/v, which vanishes at v = kπ, k = ±1, ±2, …. Delaigle and Meister [1] consider the regression model (1) with Fourier-oscillating noise, meaning that the Fourier transform of f_δ vanishes periodically.
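For completeness, the characteristic function of the uniform density on [−1, 1] is computed directly:

$$\varphi_\delta(v) = \int_{-1}^{1} \frac{1}{2}\, e^{ivt}\, dt = \frac{\sin v}{v}, \qquad \varphi_\delta(k\pi) = 0, \quad k \in \mathbb{Z}\setminus\{0\}.$$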
They show that if f_X and m are compactly supported, then they can be estimated at the standard rate, just as in the case where f_δ does not vanish in the Fourier domain. Later work extends Delaigle and Meister [1]'s results to multivariate settings.
Compact support is the price paid for eliminating the effect of the zeros in the Fourier domain. Belomestny and Goldenshluger [16] apply the Laplace transform to construct a deconvolution density estimator without assuming the density to be compactly supported. They provide sufficient conditions under which the zeros of the corresponding characteristic function have no effect on the estimation accuracy. Goldenshluger and Kim [17] also construct a deconvolution density estimator via the Laplace transform; they study how the multiplicity of the zeros affects the estimation accuracy. Motivated by this work, we apply the Laplace transform to study the regression model (1) with errors following a convolution of uniform distributions.
The paper is organized as follows. Section 2 collects background on the covariate error distribution and the functional classes. Section 3 introduces the kernel regression estimator based on the Laplace transform. The consistency and the convergence rate of our estimator are discussed in Sections 4 and 5, respectively.

Preparation
This section introduces the covariate error distribution and the functional classes. For an integrable function f, the bilateral Laplace transform [18] is defined by

$$\hat{f}(z) := \int_{-\infty}^{+\infty} f(t)\, e^{-zt}\, dt.$$

The Laplace transform $\hat f(z)$ is an analytic function in the convergence region Σ_f, which is a vertical strip:

$$\Sigma_f = \{ z \in \mathbb{C} : \sigma_- < \operatorname{Re}(z) < \sigma_+ \}.$$

The inverse Laplace transform is given by the formula

$$f(t) = \frac{1}{2\pi i} \int_{s - i\infty}^{s + i\infty} \hat{f}(z)\, e^{zt}\, dz, \qquad s \in (\sigma_-, \sigma_+).$$

Let the covariate error distribution be a γ-fold convolution of the uniform distribution on [−θ, θ], θ > 0. This means

$$f_\delta = \underbrace{u_\theta * \cdots * u_\theta}_{\gamma\ \text{times}}, \qquad \text{where } u_\theta(t) = \frac{1}{2\theta}\, \mathbf{1}_{[-\theta,\theta]}(t),$$

so that

$$\hat{f}_\delta(z) = \left( \frac{1 - e^{2\theta z}}{-2\theta z\, e^{\theta z}} \right)^{\!\gamma} = (1 - e^{2\theta z})^{\gamma} \cdot \frac{1}{(-2\theta z)^{\gamma}\, e^{\gamma\theta z}}. \tag{2}$$

Here, $\hat f_\delta(z)$ is the product of two factors: the factor (1 − e^{2θz})^γ has zeros only on the imaginary axis, while the factor 1/((−2θz)^γ e^{γθz}) has no zeros, by the analyticity of (−2θz)^γ e^{γθz}. The zeros of $\hat f_\delta(z)$ are z_k = ikπ/θ, where k ∈ ℤ\{0}. Now, we introduce some functional classes.
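As a quick check of (2) in the simplest case γ = 1, a direct computation gives

$$\hat{u}_\theta(z) = \frac{1}{2\theta}\int_{-\theta}^{\theta} e^{-zt}\, dt = \frac{e^{\theta z} - e^{-\theta z}}{2\theta z} = \frac{1 - e^{2\theta z}}{-2\theta z\, e^{\theta z}},$$

which vanishes exactly when e^{2θz} = 1 with z ≠ 0, i.e., at z_k = ikπ/θ, k ∈ ℤ\{0}.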

Definition 1.
For A > 0, σ > 0, and β > 0, a function f : ℝ → ℝ is said to satisfy the local Hölder condition at x with smoothness parameter β if f is k times continuously differentiable and

$$|f^{(k)}(y) - f^{(k)}(\tilde{y})| \leq A\, |y - \tilde{y}|^{\beta_0} \qquad \text{for all } y, \tilde{y} \in [x - \sigma, x + \sigma], \tag{3}$$

where β = k + β_0 and 0 < β_0 ≤ 1. The set of all such functions is denoted by H_{σ,β;x}(A).
If (3) holds for all y, ỹ ∈ ℝ, then f satisfies the (global) Hölder condition with smoothness parameter β. The set of all such functions is denoted by H_β(A).
Clearly, the k in Definition 1 equals max{l ∈ ℕ : l < β}. In later discussions, we write ⌊β⌋ := max{l ∈ ℕ : l < β}, so that k = ⌊β⌋. It is easy to see that every f ∈ H_β(A) is contained in H_{σ,β;x}(A) for each x ∈ ℝ; however, the reverse is not necessarily true.
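As a quick illustration of this convention (note that it differs from the usual floor at integer values: for β = 2 it gives 1, not 2), take β = 2.5:

$$\lfloor \beta \rfloor = \max\{l \in \mathbb{N} : l < 2.5\} = 2, \qquad \beta_0 = \beta - \lfloor \beta \rfloor = 0.5,$$

so condition (3) requires the second derivative of f to be 0.5-Hölder in a neighborhood of x.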

Example 2 ([19]). Consider the function constructed in [19], which satisfies the local Hölder condition at a fixed point x ∈ ℝ but does not belong to the global class H_β(A); this illustrates that the reverse inclusion fails.

Note that (3) is a local Hölder condition around x ∈ ℝ. When we consider pointwise estimation, it is natural to assume that the unknown function satisfies a local smoothness condition.

Definition 2. Let r > 0 and B > 0 be real numbers. We say that a function f belongs to the functional class M_r(B) if

$$|f(t)| \leq B\, e^{-r|t|} \qquad \text{for all } t \in \mathbb{R}.$$

We denote F_{σ,β,r;x}(A, B) := H_{σ,β;x}(A) ∩ M_r(B).

Kernel Estimator
This section will construct the kernel regression estimator.Two kernels K and L s,h will be used.
Assume that the kernel K : ℝ → ℝ is bounded and compactly supported, with supp(K) ⊆ [−k_0, k_0], and satisfies the following conditions: (i) ∫ K(x) dx = 1; (ii) ∫ x^l K(x) dx = 0 for l = 1, …, ⌊β⌋. For instance, a polynomial kernel supported on [−1, 1] with suitably chosen coefficients satisfies conditions (i) and (ii) with k_0 = 1.
Motivated by Belomestny and Goldenshluger [16], we construct the regression estimator via the Laplace transform. Note that $\hat f_\delta(-z)$ has no zeros off the imaginary axis. The kernel L_{s,h} is then defined by the inverse Laplace transform

$$L_{s,h}(t) := \frac{1}{2\pi i} \int_{s - i\infty}^{s + i\infty} \frac{\hat{K}(hz)}{\hat{f}_\delta(-z)}\, e^{zt}\, dz, \tag{4}$$

where s ≠ 0, h > 0, and $\hat K(\cdot)$ is the Laplace transform of the kernel K, with convergence region Σ_K = ℂ. The integral in (4) is a complex-valued improper integral; it can be computed using the properties of the Laplace transform, see [18].
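The following identity, which we include for orientation, is the design principle behind (4): the transform of L_{s,h} is chosen so that convolution with f_δ reproduces the rescaled kernel K_h(t) = h^{−1} K(t/h). Since f_δ is symmetric, $\hat f_\delta(-z) = \hat f_\delta(z)$, and at the transform level

$$\widehat{L_{s,h} * f_\delta}(z) = \hat{L}_{s,h}(z)\, \hat{f}_\delta(z) = \frac{\hat{K}(hz)}{\hat{f}_\delta(-z)}\, \hat{f}_\delta(z) = \hat{K}(hz) = \hat{K}_h(z).$$

Consequently, E[L_{s,h}(x − W_1)] = (L_{s,h} * f_X * f_δ)(x) = (K_h * f_X)(x), so the empirical averages defined below estimate smoothed versions of f_X and m f_X.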
The following lemma provides an infinite series representation of the kernel L_{s,h}(t). It is a special case of Lemma 2 in [16]. In order to explain the construction of the estimator, we give the details of the proof.
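As a sketch of the underlying expansion (our own orientation, under the transform conventions above, not a restatement of the lemma): for Re z = s < 0 we have |e^{2θz}| < 1, so the negative binomial series gives

$$\frac{1}{(1 - e^{2\theta z})^{\gamma}} = \sum_{k=0}^{\infty} \binom{k + \gamma - 1}{k}\, e^{2\theta k z}.$$

Term-by-term inversion (using that e^{az} \hat f(z) corresponds to f(t + a) and z^γ \hat f(z) to f^{(γ)}(t)) turns each summand into a shifted γ-th derivative of K_h, so that, schematically,

$$L_{s,h}(t) = (-2\theta)^{\gamma} \sum_{k=0}^{\infty} \binom{k + \gamma - 1}{k}\, K_h^{(\gamma)}\big(t + (\gamma + 2k)\theta\big), \qquad s < 0,$$

with a mirror-image expansion in powers of e^{−2θz} for s > 0.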
Truncation is used to handle the infinite series. Select the parameter N so that N/γ ∈ ℕ₊. The cut-off kernels L^{(N)}_{s,h} are defined by retaining the terms of the series representation of L_{s,h} up to index N/γ. Denote

$$\hat{g}^{(N)}_h(x) := \frac{1}{n} \sum_{j=1}^{n} Y_j\, L^{(N)}_{s,h}(x - W_j). \tag{6}$$

Motivated by the Nadaraya–Watson regression estimator, we define the regression estimator of m(x) as

$$\hat{m}^{(N)}_h(x) := \frac{\hat{g}^{(N)}_h(x)}{\hat{p}^{(N)}_h(x)}, \tag{7}$$

where

$$\hat{p}^{(N)}_h(x) := \frac{1}{n} \sum_{j=1}^{n} L^{(N)}_{s,h}(x - W_j). \tag{8}$$

In what follows, we write $\hat m^{(N)}_{+,h}(x)$ and $\hat m^{(N)}_{-,h}(x)$ for the estimator (7) associated with s > 0 and s < 0, respectively. Finally, our regression estimator is denoted by

$$\hat{m}^{(N)}_h(x) := \hat{m}^{(N)}_{+,h}(x)\, \mathbf{1}\{x \geq 0\} + \hat{m}^{(N)}_{-,h}(x)\, \mathbf{1}\{x < 0\}. \tag{9}$$
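The following Python sketch illustrates the shape of the resulting procedure. It assumes the truncated-series form of the cut-off kernel from the expansion sketched after the lemma above (shifted γ-th derivatives of K_h with negative-binomial weights, s < 0 branch); the Gaussian kernel and all numerical choices are illustrative assumptions of ours, not the paper's.

```python
import numpy as np
from math import comb
from numpy.polynomial.hermite_e import hermeval

def Kh_deriv(t, h, order):
    """order-th derivative of K_h(t) = K(t/h)/h for a Gaussian kernel K,
    via the identity phi^(n)(u) = (-1)^n He_n(u) phi(u)."""
    u = t / h
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    c = np.zeros(order + 1)
    c[order] = 1.0                                # coefficients selecting He_order
    return (-1)**order * hermeval(u, c) * phi / h**(order + 1)

def L_cutoff(t, h, gamma=2, theta=0.5, n_terms=20):
    """Truncated series for the cut-off kernel (s < 0 branch), assuming
    L(t) ~ (-2*theta)^gamma * sum_k C(k+gamma-1, k) Kh^(gamma)(t + (gamma+2k)*theta)."""
    out = np.zeros_like(np.asarray(t, dtype=float))
    for k in range(n_terms):
        out += comb(k + gamma - 1, k) * Kh_deriv(t + (gamma + 2 * k) * theta, h, gamma)
    return (-2 * theta)**gamma * out

def m_hat(x, W, Y, h, **kw):
    """Nadaraya-Watson-type ratio (7): g_hat / p_hat at the point x."""
    L = L_cutoff(x - W, h, **kw)
    return (Y * L).mean() / L.mean()
```

With the simulated data from the sketch in the Introduction, `m_hat(0.3, W, Y, h=0.2)` returns a pointwise estimate of m(0.3); the truncation level `n_terms` plays the role of N/γ.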

Strong Consistency
In this section, we investigate the consistency of the regression estimator (9). Roughly speaking, consistency means that the estimator $\hat m^{(N)}_h(x)$ converges to m(x) as the sample size tends to infinity. Theorem 1 below establishes strong consistency under the bandwidth choice h = n^{−1/(6(γ+1))} and a truncation level N that grows with n.
Proof. Consider first x ≥ 0, with s > 0. The first step is to show that $\hat p^{(N)}_h(x) \to p(x)$ almost surely, where p = f_X denotes the design density. For any ε > 0, by Markov's inequality, we obtain

$$P\left( \left| \hat{p}^{(N)}_h(x) - E\,\hat{p}^{(N)}_h(x) \right| > \varepsilon \right) \leq \varepsilon^{-2s}\, E\left| \hat{p}^{(N)}_h(x) - E\,\hat{p}^{(N)}_h(x) \right|^{2s}$$

with s := 4(γ + 1). This motivates us to derive an upper bound on $E|\hat p^{(N)}_h(x) - E\,\hat p^{(N)}_h(x)|^{2s}$. Combining the definition of the cut-off kernels with (8) and expanding the 2s-th moment of the centered sum, we obtain a bound indexed by the tuples (j_1, …, j_{2s}). Let #A denote the number of elements contained in the set A.
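The counting argument that follows is the standard moment computation for sums of i.i.d. centered terms; schematically, writing $\xi_j := L^{(N)}_{s,h}(x - W_j) - E\,L^{(N)}_{s,h}(x - W_j)$,

$$E\left| \frac{1}{n} \sum_{j=1}^{n} \xi_j \right|^{2s} = \frac{1}{n^{2s}} \sum_{j_1, \dots, j_{2s}} E\big[ \xi_{j_1} \cdots \xi_{j_{2s}} \big],$$

and only tuples in which every index appears at least twice contribute, since E ξ_j = 0 and the ξ_j are independent; there are O(n^{s}) such tuples, which yields a bound of order n^{−s} up to moment factors depending on h and N.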
On the other hand, if #{j_1, …, j_{2s}} = s_1 for some s_1 ≤ s, then by Jensen's inequality we obtain a bound on the corresponding mixed moment. Inserting this into (10), we obtain a bound in terms of the moments of $L^{(N)}_{s,h}(x - W_1)$. Note that the (W_j, Y_j) are identically distributed. Then, it follows from (11) and $E(Y_j e^{ivW_j}) = E(Y_j e^{ivX_j})\, E(e^{iv\delta_j})$ that the required moments can be controlled in the transform domain, where the factor $E(e^{iv\delta_j})$ is evaluated explicitly via (2). Hence, since p ∈ M_r(B) and K is bounded, the resulting bound holds for all sufficiently small h. It follows from r > 1/2, h = n^{−1/(6(γ+1))}, and the growth condition on N that the truncation term vanishes. This, together with (14), establishes (20) for all sufficiently large n. Since s = 4(γ + 1), the resulting probabilities are summable in n; hence, for any ε > 0, it follows from the Borel–Cantelli lemma that $\hat p^{(N)}_h(x) - E\,\hat p^{(N)}_h(x) \to 0$ almost surely.
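Recall the form of the Borel–Cantelli argument being used here: if for every ε > 0

$$\sum_{n=1}^{\infty} P\big( |Z_n - E Z_n| > \varepsilon \big) < \infty, \qquad \text{then} \qquad Z_n - E Z_n \to 0 \ \text{a.s.},$$

applied with $Z_n = \hat p^{(N)}_h(x)$ and, by the same reasoning, with $\hat g^{(N)}_h(x)$.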
By (21), we have the analogous decomposition for x < 0 (i.e., s < 0). Using the factor (1 − e^{−2θz})^γ together with (2), and arguing as in (17), we obtain the corresponding bound. Thus, since p ∈ M_r(B) and K is bounded, the bound holds for all sufficiently small h. Similarly to the case x ≥ 0, we get, almost surely,

$$\lim_{n \to \infty} \hat{m}^{(N)}_h(x) = m(x).$$
This completes the proof.

Remark 1. Theorem 1 shows the strong consistency of the kernel estimator $\hat m^{(N)}_h(x)$.
It differs from the work of Meister [6] in that the density function of our covariate error δ has zeros in the Fourier domain. Our covariate error belongs to the class of Fourier-oscillating noises considered by Delaigle and Meister [1]. Compared with their work, we construct a regression estimator via the Laplace transform without assuming f_X and m to be compactly supported.

Convergence Rate
In this section, we focus on the convergence rate in the weak sense. Meister [6] introduces the weak convergence rate by modifying the concept of weak consistency. A regression estimator $\hat m_n(x)$ is said to attain the weak convergence rate r_n if

$$\lim_{C \to \infty} \limsup_{n \to \infty} \sup_{(m, f_X) \in \mathcal{P}} P\big( |\hat{m}_n(x) - m(x)| \geq C\, r_n \big) = 0.$$

The set 𝒫 is the collection of all pairs (m, f_X) that satisfy certain conditions. The order of limits is first n → ∞ and then C → ∞; here, C is independent of n.
Define the set 𝒫 accordingly. The following lemma is used to prove the theorem of this section.
Remark 2. Our convergence rate coincides with the rate in the ordinary smoothness case of Meister [6], where the density function of the covariate error does not vanish in the Fourier domain. Compared with Delaigle and Meister [1], we do not assume f_X and m to be compactly supported.
Remark 3. Belomestny and Goldenshluger [16] consider the density deconvolution problem with non-standard error distributions. They assume that the density function to be estimated satisfies the Hölder condition. In pointwise estimation, it is natural to impose a local smoothness condition instead. Hence, f_X and m f_X are assumed to satisfy the local Hölder condition in our discussion.