Abstract
This paper considers nonparametric regression estimation with errors in the variables. It is a standard assumption that the characteristic function of the covariate error does not vanish on the real line. This assumption is rather strong. In this paper, we assume the covariate error distribution is a convolution of uniform distributions, the characteristic function of which contains zeros on the real line. Our regression estimator is constructed via the Laplace transform. We prove its strong consistency and show its convergence rate. It turns out that zeros in the characteristic function have no effect on the convergence rate of our estimator.
Keywords:
nonparametric regression; errors in variables; Laplace transform; strong consistency; convergence rate
MSC:
62G08; 62G20
1. Introduction
This paper considers a regression model with errors in the variables. Suppose observations are i.i.d. (independent and identically distributed) random variables generated by the model
The i.i.d. random variables are independent of and . are independent of , and . The functions (known) and (unknown) stand for the densities of and , respectively. The goal is to estimate the regression function from the observations . Errors-in-variables regression problems have been extensively studied in the literature, see, for example, ([1,2,3,4,5,6,7]). Regression models with errors in the variables play an important role in many areas of science and social science ([8,9,10]).
Nadaraya and Watson ([11,12]) propose a kernel regression estimator for the classical regression model . Since the Fourier transform turns a convolution into an ordinary product, it is a standard tool for deconvolution problems. Fan and Truong [4] generalize the Nadaraya–Watson regression estimator from the classical regression model to the regression model (1) via the Fourier transform. They study the convergence rate under the assumption that the integer-order derivatives of and m are bounded. Compared to integer-order derivatives, the Hölder condition describes smoothness more precisely. Meister [6] shows the convergence rate under the local Hölder condition.
The above references on model (1) both assume that the characteristic function of the covariate error does not have zeros on the real line. This assumption is rather strong. For example, if has the uniform density on [−1, 1], then it vanishes at in the Fourier domain. Delaigle and Meister [1] consider the regression model (1) with Fourier-oscillating noise, meaning that the Fourier transform of vanishes periodically. They show that if and m are compactly supported, then they can be estimated at the standard rate, as in the case where does not vanish in the Fourier domain. Guo and Liu ([13,14,15]) extend the work of Delaigle and Meister [1] to multivariate settings.
Compact support is the price paid for eliminating the effect of the zeros in the Fourier domain. Belomestny and Goldenshluger [16] apply the Laplace transform to construct a deconvolution density estimator without assuming the density to be compactly supported. They provide sufficient conditions under which the zeros of the corresponding characteristic function have no effect on the estimation accuracy. Goldenshluger and Kim [17] also construct a deconvolution density estimator via the Laplace transform; they study how the multiplicity of the zeros affects the estimation accuracy. Motivated by the above work, we apply the Laplace transform to study the regression model (1) with errors following a convolution of uniform distributions.
The organization of the paper is as follows. In Section 2, we present some knowledge about the covariate error distribution and functional classes. Section 3 introduces the kernel regression estimator via the Laplace transform. The consistency and convergence rate of our estimator are discussed in Section 4 and Section 5, respectively.
2. Preparation
This section will introduce the covariate error distribution and functional classes.
For an integrable function f, the bilateral Laplace transform [18] is defined by
The Laplace transform is an analytic function in the convergence region , which is a vertical strip:
The inverse Laplace transform is given by the formula
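For the reader's orientation, the bilateral Laplace transform and its inversion take the following standard form (cf. [18]); the notation below is illustrative, since the paper's own displayed formulas did not survive extraction and its symbols may differ:

```latex
\mathcal{L}[f](z)=\int_{-\infty}^{\infty} f(x)\,e^{-zx}\,dx,
\qquad \sigma_{-}<\operatorname{Re} z<\sigma_{+},
\]
\[
f(x)=\frac{1}{2\pi i}\int_{s-i\infty}^{s+i\infty}\mathcal{L}[f](z)\,e^{zx}\,dz,
\qquad s\in(\sigma_{-},\sigma_{+}).
```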
Let the covariate error distribution be a -fold convolution of the uniform distribution on . This means
where are i.i.d. with density . Hence,
Here, is the product of two functions: the function has zeros only on the imaginary axis, while the function has no zeros, by the analyticity of . The zeros of are , where .
Now, we introduce some functional classes.
Definition 1.
For , , and , a function is said to satisfy the local Hölder condition with smoothness parameter β if f is k times continuously differentiable and
where and . All these functions are denoted by .
If (3) holds for any , f satisfies the Hölder condition with smoothness parameter . All these functions are denoted by .
Clearly, k in Definition 1 equals . In later discussions, .
Example 1.
Function
Then, and .
It is easy to see that must be contained in for each . However, the reverse is not necessarily true.
Example 2
([19]). Consider the function
where is the indicator function on the interval for a non-negative integer l. Then, for each . However, .
Note that (3) is a local Hölder condition around . When considering pointwise estimation, it is natural to assume that the unknown function satisfies a local smoothness condition.
Definition 2.
Let and be real numbers. We say that a function f belongs to the functional class if
We denote .
3. Kernel Estimator
This section will construct the kernel regression estimator. Two kernels K and will be used.
Assume that the kernel satisfies the following conditions:
(i) , and supp ;
(ii) There exists a fixed positive integer such that
Example 3
([20]). Function
where
and . Then, the kernel satisfies conditions (i) and (ii) with .
Motivated by Belomestny and Goldenshluger [16], we will construct the regression estimator via the Laplace transform. Note that does not have zeros out of the imaginary axis. Then, the kernel is defined by the inverse Laplace transform
where , and is the Laplace transform of the kernel K with convergence region . The integral in (4) is a complex-valued improper integral; it can be computed using the properties of the Laplace transform, see [18].
The following lemma provides an infinite series representation of the kernel . It is a special case of Lemma 2 in [16]. To explain the construction of the estimator, we give the details of the proof.
Truncation is used to deal with the infinite series. Select the parameter N so that . The cut-off kernels are defined by
Denote
Motivated by the Nadaraya–Watson regression estimator, we define the regression estimator of as
where
In what follows, we will write and for the estimator (7) associated with and , respectively. Finally, our regression estimator is denoted by
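The estimator (7) replaces the kernel in the classical Nadaraya–Watson ratio with the cut-off deconvolution kernel (6), evaluated at the contaminated covariates. Since the paper's displayed formulas did not survive extraction, the sketch below shows only the classical Nadaraya–Watson ratio that motivates (7); it is not the paper's deconvolution estimator, and all names in it are illustrative:

```python
import numpy as np

def nadaraya_watson(x, X, Y, h, kernel=lambda u: np.exp(-0.5 * u**2)):
    """Classical Nadaraya-Watson estimate of m(x) = E[Y | X = x]:
    a ratio of kernel-weighted responses to kernel weights."""
    w = kernel((x - X) / h)                       # kernel weights at x
    denom = w.sum()
    return np.dot(w, Y) / denom if denom > 0 else np.nan

# Illustrative data: m(x) = x^2 with small additive noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 500)
Y = X**2 + 0.1 * rng.standard_normal(500)
print(nadaraya_watson(0.5, X, Y, h=0.1))          # close to m(0.5) = 0.25
```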
4. Strong Consistency
In this section, we investigate the consistency of the regression estimator (9). Roughly speaking, consistency means that the estimator converges to as the sample size tends to infinity.
Theorem 1
Proof.
We consider the estimator for .
Note that , and . Then, it is sufficient to prove and .
Now, we prove . For any ,
By Markov’s inequality, we obtain
for . This motivates us to derive an upper bound on . Combining (5) with (8), we have
and
We obtain
where .
Thus,
Let denote the number of elements in the set A. If , then at least one of is independent of all the other , . Hence,
On the other hand, if for , by Jensen’s inequality, we obtain
where . Let . Then,
Since for all k, we obtain that for . This, with , leads to
Inserting this into (10), we obtain
Hence,
Since and considering the boundedness of K,
holds for an h that is small enough. It follows from , and that
Note that the kernel function K satisfies condition (i) and , then
holds for each Lebesgue point x of p. Hence, for an n that is sufficiently large, the term vanishes. This, with (14), shows
for an n that is large enough. Since , we have
For any , it follows from the Borel–Cantelli lemma that
Thus,
When putting almost surely, we have
Hence,
By and (2), we have that . So,
Thus, we have
Since and considering the boundedness of K,
holds for an h that is small enough.
Similar to , we get
This completes the proof. □
Remark 1.
Theorem 1 shows the strong consistency of the kernel estimator . It differs from the work of Meister [6] in that the density function of our covariate error δ has zeros in the Fourier domain. Our covariate error belongs to the class of Fourier-oscillating noise considered by Delaigle and Meister [1]. Compared to their work, we construct a regression estimator via the Laplace transform without assuming and m to be compactly supported.
5. Convergence Rate
In this section, we focus on the convergence rate in the weak sense. Meister [6] introduces the weak convergence rate by modifying the concept of weak consistency. A regression estimator is said to attain the weak convergence rate if
The set is the collection of all pairs that satisfy some conditions. The order of limits is first , and then . Here, C is independent of n.
Define the set
where .
The following lemma is used to prove the theorem in this section.
Lemma 2
([6]). If , , and , then, for a small enough ,
with two positive constants, and .
Theorem 2.
Proof.
(1) We assume that and consider the estimator . Applying Lemma 2 and Markov’s inequality, we obtain
where is the larger of and , and appear in Lemma 2. Then,
and
First, we estimate and . By (18), we have
By Taylor expansion of p with the degree , there exists such that
Since kernel K satisfies condition (ii) and , we have
By , we find that
holds for an h that is small enough. Equations (19) and (28) imply the following upper bound:
Now, we estimate and . By (8), we have
Note that . It follows from and that . Then,
It follows from (5) that
Therefore,
Let
where is the number of weak compositions of l into parts [21]. Note that
Then,
By supp , we have supp . Denote . If , the intervals and are disjoint for . For an h that is small enough, we obtain
Denote . By supp and ,
Since , we have
This, with (37), leads to
When , we obtain
by and similar arguments to [16]. Similarly,
and
Hence,
Similar to estimate , we have
Since and ,
Note that . Then,
This leads to the result of Theorem 2 for .
Similar arguments to (34)–(37) show
holds for an h that is small enough, where . Denote . Similar to (38),
Similar to (45),
This leads to the result of Theorem 2 for .
This completes the proof. □
Remark 2.
Our convergence rate is the same as that in the ordinary smoothness case of Meister [6], where the density function of the covariate error does not vanish in the Fourier domain. Compared to Delaigle and Meister [1], we do not assume and m to be compactly supported.
Remark 3.
Belomestny and Goldenshluger [16] consider the density deconvolution problem with non-standard error distributions. They assume the density function to be estimated satisfies the Hölder condition. It is natural to assume a local smooth condition in point estimation. Hence, and are assumed to satisfy the local Hölder condition in our discussion.
Remark 4.
Theorem 1 shows the strong consistency of the regression estimator without any smoothness assumption. The main tool is the Borel–Cantelli lemma, which requires a convergent series. It is easy to see from (13) and (20) that the choice of h is not unique. Theorem 2 gives a weak convergence rate, defined by modifying the notion of weak consistency. It is natural to assume a smoothness condition when discussing convergence rates. In Theorem 2, the choice of h depends on the smoothness index β. It follows from (44) in our proof that this choice of h is unique up to a constant factor.
Remark 5.
It would be interesting to provide a numerical illustration of our estimator. We shall investigate this in future work.
Author Contributions
Writing—original draft preparation, H.G. and Q.B.; Writing—review and editing, H.G. All authors have read and agreed to the published version of the manuscript.
Funding
This paper is supported by the National Natural Science Foundation of China (No. 12001132), the Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation, and the Center for Applied Mathematics of Guangxi (GUET).
Data Availability Statement
Not applicable.
Acknowledgments
The authors would like to thank the editor and reviewers for their important comments.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Delaigle, A.; Meister, A. Nonparametric function estimation under Fourier-oscillating noise. Stat. Sin. 2011, 21, 1065–1092. [Google Scholar] [CrossRef]
- Dong, H.; Otsu, T.; Taylor, L. Bandwidth selection for nonparametric regression with errors-in-variables. Econom. Rev. 2023, 42, 393–419. [Google Scholar] [CrossRef]
- Di Marzio, M.; Fensore, S.; Taylor, C.C. Kernel regression for errors-in-variables problems in the circular domain. Stat. Methods Appl. 2023. [Google Scholar] [CrossRef]
- Fan, J.Q.; Truong, Y.K. Nonparametric regression with errors in variables. Ann. Stat. 1993, 21, 1900–1925. [Google Scholar] [CrossRef]
- Hu, Z.R.; Ke, Z.T.; Liu, J.S. Measurement error models: From nonparametric methods to deep neural networks. Stat. Sci. 2022, 37, 473–493. [Google Scholar] [CrossRef]
- Meister, A. Deconvolution Problems in Nonparametric Statistics; Springer: Berlin, Germany, 2009. [Google Scholar]
- Song, W.X.; Ayub, K.; Shi, J.H. Extrapolation estimation for nonparametric regression with measurement error. Scand. J. Stat. 2023. [Google Scholar] [CrossRef]
- Carroll, R.J.; Delaigle, A.; Hall, P. Non-parametric regression estimation from data contaminated by a mixture of Berkson and classical errors. J. R. Stat. Soc. Ser. B Stat. Methodol. 2007, 69, 859–878. [Google Scholar] [CrossRef] [PubMed]
- Zhou, S.; Pati, D.; Wang, T.Y.; Yang, Y.; Carroll, R.J. Gaussian processes with errors in variables: Theory and computation. J. Mach. Learn. Res. 2023, 24, 1–53. [Google Scholar]
- Delaigle, A.; Hall, P.; Jamshidi, F. Confidence bands in non-parametric errors-in-variables regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 2015, 77, 149–169. [Google Scholar] [CrossRef]
- Nadaraya, E.A. On estimating regression. Theory Probab. Its Appl. 1964, 9, 141–142. [Google Scholar] [CrossRef]
- Watson, G.S. Smooth regression analysis. Sankhyā Indian J. Stat. 1964, 26, 359–372. [Google Scholar]
- Guo, H.J.; Liu, Y.M. Strong consistency of wavelet estimators for errors-in-variables regression model. Ann. Inst. Stat. Math. 2017, 69, 121–144. [Google Scholar] [CrossRef]
- Guo, H.J.; Liu, Y.M. Convergence rates of multivariate regression estimators with errors-in-variables. Numer. Funct. Anal. Optim. 2017, 38, 1564–1588. [Google Scholar] [CrossRef]
- Guo, H.J.; Liu, Y.M. Regression estimation under strong mixing data. Ann. Inst. Stat. Math. 2019, 71, 553–576. [Google Scholar] [CrossRef]
- Belomestny, D.; Goldenshluger, A. Density deconvolution under general assumptions on the distribution of measurement errors. Ann. Stat. 2021, 49, 615–649. [Google Scholar] [CrossRef]
- Goldenshluger, A.; Kim, T. Density deconvolution with non-standard error distributions: Rates of convergence and adaptive estimation. Electron. J. Stat. 2021, 15, 3394–3427. [Google Scholar] [CrossRef]
- Oppenheim, A.V.; Willsky, A.S.; Nawab, H.S. Signals & Systems, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 1996. [Google Scholar]
- Liu, Y.M.; Wu, C. Point-wise estimation for anisotropic densities. J. Multivar. Anal. 2019, 171, 112–125. [Google Scholar] [CrossRef]
- Stein, E.M.; Shakarchi, R. Real Analysis: Measure Theory, Integration, and Hilbert Spaces; Princeton University Press: Princeton, NJ, USA, 2005. [Google Scholar]
- Stanley, R.P. Enumerative Combinatorics; Cambridge University Press: Cambridge, UK, 1997; Volume 1. [Google Scholar]