Article

Pointwise Estimation of Anisotropic Regression Functions Using Wavelets with Data-Driven Selection Rule

1 School of Mathematics, Sichuan University of Arts and Science, Dazhou 635000, China
2 School of Mathematics and Computational Science, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(1), 98; https://doi.org/10.3390/math12010098
Submission received: 3 November 2023 / Revised: 20 December 2023 / Accepted: 25 December 2023 / Published: 27 December 2023
(This article belongs to the Section Probability and Statistics)

Abstract

For nonparametric regression estimation, conventional research has focused on isotropic regression functions. In this paper, a linear wavelet estimator of an anisotropic regression function is constructed, and the rate of convergence of this estimator is established in anisotropic Besov spaces. More importantly, in order to obtain an adaptive estimator, a regression estimator with a data-driven selection rule for the scaling parameter is proposed. It turns out that our results attain the optimal convergence rate of nonparametric pointwise estimation.

1. Introduction

The classical regression estimation model considers independent and identically distributed (i.i.d.) random variables $(U_1, V_1), \ldots, (U_n, V_n)$ defined on $[0,1]^d \times \mathbb{R}$, and a regression function given by
$$r(x) := E\big(\rho(V) \mid U = x\big) = \frac{\int_{\mathbb{R}} \rho(y)\, g(x,y)\, dy}{h(x)}, \qquad x \in [0,1]^d. \tag{1}$$
In this setting, $\rho$ is a known function, $g$ stands for the density function of the random vector $(U,V)$, and $h$ is the density function of $U$. The aim of this classical model is to estimate the unknown regression function $r(x)$ from the data $\{(U_i, V_i),\ i = 1, \ldots, n\}$. However, in some situations the data $(U_1, V_1), \ldots, (U_n, V_n)$ are not observed directly; only a version of them contaminated by some noise is available. For this problem, this paper considers the following biased sampling regression model. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be i.i.d. random variables with the common density function
$$f(x,y) = \frac{\omega(x,y)\, g(x,y)}{\mu}, \qquad (x,y) \in [0,1]^d \times \mathbb{R}, \tag{2}$$
where $\omega$ is a known function and $\mu := E\big(\omega(U,V)\big) < +\infty$. We want to estimate the unknown regression function $r(x)$ from the observed data $\{(X_i, Y_i),\ 1 \le i \le n\}$.
The above biased sampling regression model arises in many practical applications; see, for instance, [1,2,3]. To illustrate one application, consider the relationship between company business income $V$ and research and development (R&D) investment $U$ in a country. In order to raise working efficiency, the researchers narrow the sample range and obtain data $(X, Y)$ from some special companies with higher R&D investment and business income. Because only these special companies are chosen to provide data, the density $f$ of $(X, Y)$ can be modeled as $f(x,y) = \omega(x,y)\, g(x,y)/\mu$ with a known biased sampling function $\omega$ and the density $g$ of $(U, V)$. Then, the researchers can estimate the relationship function $r(x)$ between business income and R&D investment in the country from the observed data $\{(X_i, Y_i),\ 1 \le i \le n\}$.
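To make the sampling mechanism concrete, the following Python sketch simulates biased observations $(X_i, Y_i)$ from model (2) by rejection sampling; the particular design density, regression relation, and weight function $\omega$ used here are illustrative assumptions of ours, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def omega(x, y):
    # Hypothetical bounded weight function (cf. Hypothesis 3): 0 < c2 <= omega <= c3.
    return 1.0 + 0.5 * np.tanh(x + y)

def sample_biased(n, c3=1.5):
    """Draw n pairs (X, Y) with density f = omega * g / mu by rejection sampling from g."""
    xs, ys = [], []
    while len(xs) < n:
        u = rng.uniform(0.0, 1.0)                                # U ~ Uniform[0, 1] (d = 1 here)
        v = np.sin(2 * np.pi * u) + 0.3 * rng.standard_normal()  # V given U (toy regression relation)
        if rng.uniform(0.0, c3) < omega(u, v):                   # accept with probability omega / c3
            xs.append(u)
            ys.append(v)
    return np.array(xs), np.array(ys)

X, Y = sample_biased(2000)
```

Accepting a draw from $g$ with probability $\omega/c_3$ produces exactly the reweighted density $f = \omega g/\mu$, so the accepted pairs mimic the biased observations described above.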
For nonparametric estimation problems, the wavelet method has been widely used by the local property in both time and frequency domain; see [4,5,6,7,8]. For the above biased sampling regression model, the mean integrated square error ( L 2 risk) of linear and nonlinear wavelet estimators is discussed by [9]. An optimal convergence rate of wavelet estimators over the L p ( 1 p < + ) risk is considered by [10,11]. Ref. [12] studied the pointwise l p ( 1 p < + ) risk of wavelet estimators under independent conditions. However, all those results require that the regression functions are isotropic functions (a multivariate function f ( x ) is isotropic if its smoothing parameters are the same at each coordinate axis direction). Moreover, the key scaling parameters of the above conventional wavelet estimators are chosen based only on the sample size n of the observed data { ( X i , Y i ) , 1 i n } . Any other important information about the observed data is neglected. More importantly, the definitions of the conventional wavelet estimators usually do not bring with them a rule to choose the smoothing parameter of the regression function, which means that the conventional wavelet estimators are not adaptive.
In this paper, we firstly construct a linear wavelet estimator for an anisotropic regression function (a multivariate function f ( x ) is anisotropic if its smoothing parameters are different in the coordinate directions; see [13,14]) and study the pointwise l p ( 1 p < + ) risk of the wavelet estimator in anisotropic Besov spaces. In order to overcome the shortage of conventional wavelet estimators, a data-driven selection of a wavelet estimator is proposed. The choice of the scaling parameter of our data-driven estimator depends on not only the sample size but also on other important information of the observed data. Furthermore, the definition of the data-driven estimator is determined by the observed data, which means that our data-driven estimator is completely adaptive. Finally, it should be pointed out that our results attain the optimal convergence rate of nonparametric pointwise estimation when the anisotropic regression function reduces to isotropic function.
The structure of this paper is organized as follows. The definitions and properties of wavelet and Besov space are given in Section 2. Section 3 will propose a linear wavelet estimator, and study the rate of convergence of this estimator in anisotropic Besov spaces. Furthermore, a data-driven estimator with scaling parameter data-driven selection rule is constructed in Section 4.

2. Anisotropic Wavelets and Besov Space

In order to construct wavelet estimators for anisotropic regression functions, some basic concepts of the anisotropic and orthonormal wavelet basis will be given in the following. More details can be found in [14,15,16,17,18,19,20,21,22,23].
Let $\{V_j,\ j \in \mathbb{Z}\}$ be a classical orthonormal multiresolution analysis of $L^2(\mathbb{R})$ with a compactly supported scaling function $\tilde\phi$ and wavelet function $\tilde\psi$. The function space generated by the wavelet functions $\{2^{j/2}\tilde\psi(2^j y - l),\ l \in \mathbb{Z}\}$ is denoted by $W_j$. For $\tau = (\tau_1, \ldots, \tau_d)$ with $\tau_1, \ldots, \tau_d > 0$ and $\tau_1 + \cdots + \tau_d = d$, we define
$$V_{j\tau_i} := V_{\lfloor j\tau_i \rfloor},$$
where $j \ge 0$, $i \in \{1, \ldots, d\}$, and $\lfloor y \rfloor$ denotes the integer part of $y$. Then, for $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ and $k = (k_1, \ldots, k_d) \in \mathbb{Z}^d$ (the set of multi-integers),
$$\phi_{j\tau,k}(x) := \prod_{i=1}^{d} \tilde\phi_{j\tau_i, k_i}(x_i) = \prod_{i=1}^{d} 2^{\lfloor j\tau_i \rfloor/2}\, \tilde\phi\big(2^{\lfloor j\tau_i \rfloor} x_i - k_i\big)$$
constitutes an orthonormal basis of $V_{j\tau} := \bigotimes_{i=1}^{d} V_{j\tau_i}$.
The notation $V_{j\tau_i}$ is reasonable since it depends only on $\lfloor j\tau_i \rfloor$ for all $i \in \{1, \ldots, d\}$. It is easy to see that, with $\Gamma = \{0,1\}^d \setminus \{0\}^d$,
$$V_{(j+1)\tau} = \bigotimes_{i=1}^{d} V_{(j+1)\tau_i} = \bigotimes_{i=1}^{d} \Big( V_{j\tau_i} \oplus \bigoplus_{l=\lfloor j\tau_i \rfloor}^{\lfloor (j+1)\tau_i \rfloor - 1} W_l \Big) = V_{j\tau} \oplus \bigoplus_{\gamma \in \Gamma} \bigotimes_{i=1}^{d} W_{j\tau_i}^{\gamma_i},$$
where $\gamma = (\gamma_1, \ldots, \gamma_d)$, $W_{j\tau_i}^{\gamma_i} := V_{j\tau_i}$ when $\gamma_i = 0$, and $W_{j\tau_i}^{\gamma_i} := \bigoplus_{l=\lfloor j\tau_i \rfloor}^{\lfloor (j+1)\tau_i \rfloor - 1} W_l$ when $\gamma_i = 1$. Then, the above equation can be rewritten as
$$V_{(j+1)\tau} = V_{j\tau} \oplus W_{j\tau}, \qquad W_{j\tau} = \bigoplus_{\gamma \in \Gamma} \bigotimes_{i=1}^{d} W_{j\tau_i}^{\gamma_i}.$$
Now, we give the anisotropic wavelet basis of $W_{j\tau}$. Define the index set $I_{j\tau} := \{(\gamma, m):\ \gamma \in \Gamma,\ m \in \mathbb{Z}^d\}$, where for any $i \in \{1, \ldots, d\}$, $m_i = \lfloor j\tau_i \rfloor$ when $\gamma_i = 0$, and $m_i \in [\lfloor j\tau_i \rfloor,\ \lfloor (j+1)\tau_i \rfloor)$ in the case of $\gamma_i = 1$.
Define
$$\psi_{j\tau,k}^{(\gamma,m)}(x) := 2^{\frac{|m|}{2}} \prod_{i=1}^{d} \psi^{\gamma_i}(2^{m_i} x_i - k_i),$$
where $(\gamma, m) \in I_{j\tau}$ and $|m| = m_1 + \cdots + m_d$. Moreover, $\psi^{\gamma_i} = \tilde\psi$ when $\gamma_i = 1$ and $\psi^{\gamma_i} = \tilde\phi$ when $\gamma_i = 0$. For $j_0 \ge 0$,
$$\big\{\phi_{j_0\tau,k},\ \psi_{j\tau,k}^{(\gamma,m)};\ j \ge j_0,\ (\gamma,m) \in I_{j\tau},\ k \in \mathbb{Z}^d\big\}$$
constitutes an orthonormal basis of $L^2(\mathbb{R}^d)$. Then, for each $f(x) \in L^2(\mathbb{R}^d)$,
$$f(x) = \sum_{k \in \mathbb{Z}^d} \alpha_{j_0\tau,k}\, \phi_{j_0\tau,k}(x) + \sum_{j=j_0}^{\infty} \sum_{(\gamma,m) \in I_{j\tau}} \sum_{k \in \mathbb{Z}^d} \beta_{j\tau,k}^{(\gamma,m)}\, \psi_{j\tau,k}^{(\gamma,m)}(x)$$
with $\alpha_{j_0\tau,k} = \int_{\mathbb{R}^d} f(x)\, \phi_{j_0\tau,k}(x)\, dx$ and $\beta_{j\tau,k}^{(\gamma,m)} = \int_{\mathbb{R}^d} f(x)\, \psi_{j\tau,k}^{(\gamma,m)}(x)\, dx$.
Let $P_{j\tau}$ be the orthogonal projection operator from $L^2(\mathbb{R}^d)$ onto $V_{j\tau}$ with the orthonormal basis $\{\phi_{j\tau,k}(x),\ k \in \mathbb{Z}^d\}$. In this paper, we choose the Daubechies scaling function as $\tilde\phi$; then the function $\phi \in L^2(\mathbb{R}^d) \cap L^{\infty}(\mathbb{R}^d)$ and $\sum_{k \in \mathbb{Z}^d} \phi(x-k)\, \overline{\phi(y-k)}$ converges absolutely almost everywhere. It can be shown that, for $f(x) \in L^p(\mathbb{R}^d)$ ($1 \le p \le \infty$),
$$P_{j\tau} f(x) := \sum_{k \in \mathbb{Z}^d} \alpha_{j\tau,k}\, \phi_{j\tau,k}(x)$$
holds almost everywhere on $\mathbb{R}^d$.
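To make the tensor-product construction above concrete, the following Python sketch (our own illustration, not part of the paper) evaluates the anisotropic scaling function $\phi_{j\tau,k}(x) = \prod_{i=1}^{d} 2^{\lfloor j\tau_i \rfloor/2}\tilde\phi(2^{\lfloor j\tau_i \rfloor}x_i - k_i)$ for a Daubechies scaling function $\tilde\phi$, tabulated on a dyadic grid with PyWavelets and then interpolated; the choice of db4 and the interpolation level are arbitrary.

```python
import numpy as np
import pywt

# Tabulate the Daubechies scaling function phi_tilde on a fine dyadic grid (db4 is an arbitrary choice).
_phi_vals, _psi_vals, _x_grid = pywt.Wavelet("db4").wavefun(level=10)

def phi_tilde(t):
    """Piecewise-linear approximation of the db4 scaling function; zero outside its support."""
    return np.interp(t, _x_grid, _phi_vals, left=0.0, right=0.0)

def phi_aniso(x, j, tau, k):
    """Evaluate the anisotropic scaling function phi_{j*tau, k} at a point x in R^d."""
    x, tau, k = np.asarray(x), np.asarray(tau), np.asarray(k)
    levels = np.floor(j * tau).astype(int)          # integer parts floor(j * tau_i)
    value = 1.0
    for xi, li, ki in zip(x, levels, k):
        value *= 2.0 ** (li / 2) * phi_tilde(2.0 ** li * xi - ki)
    return value

# Example: d = 2 with anisotropy tau = (1.5, 0.5) and resolution level j = 3.
print(phi_aniso(x=[0.3, 0.7], j=3, tau=[1.5, 0.5], k=[1, 0]))
```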
Besov spaces are important in both theory and applications; they contain the Hölder and $L^2$ Sobolev spaces as special cases. We now give the definition of the anisotropic Besov space $B^{s}_{p,q}(\mathbb{R}^d)$ ([14]). Let
$$\Delta_{t,i} f(x) := f(x + t e_i) - f(x), \qquad \Delta_{t,i}^{M_i} f(x) := \big(\underbrace{\Delta_{t,i} \cdots \Delta_{t,i}}_{M_i\ \text{times}} f\big)(x)$$
with $e_i = (\underbrace{0, \ldots, 0}_{i-1}, 1, 0, \ldots, 0)$, $t \in \mathbb{R}$, and $M_i \in \mathbb{N}^+$ ($i = 1, 2, \ldots, d$).
Definition 1.
Let $1 \le p, q < +\infty$, $s = (s_1, \ldots, s_d)$, $M = (M_1, \ldots, M_d)$, and $0 < s_i < M_i$. Then,
$$B^{s}_{p,q}(\mathbb{R}^d) = \big\{ f \in L^p(\mathbb{R}^d):\ \|f\|_{spqM} < +\infty \big\},$$
where $\|f\|_{spqM} = \|f\|_p + \sum_{i=1}^{d} \Big( \int_0^1 t^{-s_i q}\, \|\Delta_{t,i}^{M_i} f\|_p^q\, \frac{dt}{t} \Big)^{\frac{1}{q}}$ is the Besov norm of $f$.
The following lemma gives the wavelet description of the anisotropic Besov space.
Lemma 1.
Let $s := (s_1, \ldots, s_d)$, $s_i > 0$, $\frac{1}{s(d)} = \frac{1}{d}\sum_{i=1}^{d} \frac{1}{s_i}$, and $\tau_i = \frac{s(d)}{s_i}$. Then, the following assertions are equivalent:
(i) $f \in B^{s}_{p,q}(\mathbb{R}^d)$;
(ii) $\|f\|_{B^{s}_{p,q}} := \|\alpha_{j_0\tau\cdot}\|_p + \Big\{ \sum_{j=j_0}^{\infty} \sum_{(\gamma,m)\in I_{j\tau}} \Big[ 2^{j(s(d) - \frac{d}{p}) + \frac{|m|}{2}}\, \|\beta_{j\tau\cdot}^{(\gamma,m)}\|_p \Big]^q \Big\}^{\frac{1}{q}} < +\infty$;
(iii) $\|\beta_{j\tau\cdot}^{(\gamma,m)}\|_p \lesssim 2^{-j(s(d) - \frac{d}{p}) - \frac{|m|}{2}}$,
  • where $\|\alpha_{j_0\tau\cdot}\|_p^p = \sum_{k\in\mathbb{Z}^d} |\alpha_{j_0\tau,k}|^p$ and $\|\beta_{j\tau\cdot}^{(\gamma,m)}\|_p^p = \sum_{k\in\mathbb{Z}^d} |\beta_{j\tau,k}^{(\gamma,m)}|^p$.
Here, $A \lesssim B$ denotes $A \le cB$ for some constant $c > 0$; $A \gtrsim B$ means $B \lesssim A$; and $A \sim B$ stands for both $A \lesssim B$ and $A \gtrsim B$. In this paper, we assume $r(x) \in B^{s}_{p,q}(H)$ with $H > 0$, where
$$B^{s}_{p,q}(H) := \big\{ f \in B^{s}_{p,q}(\mathbb{R}^d):\ \|f\|_{B^{s}_{p,q}} \le H \big\}.$$
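For concreteness, here is a small worked example (our own illustration) of the anisotropy indices $s(d)$ and $\tau_i$ that appear in Lemma 1.

```latex
% Illustrative example: d = 2 and smoothness s = (s_1, s_2) = (1, 3).
\[
  \frac{1}{s(2)} = \frac{1}{2}\Big(\frac{1}{1} + \frac{1}{3}\Big) = \frac{2}{3}
  \;\Longrightarrow\; s(2) = \frac{3}{2},
  \qquad
  \tau_1 = \frac{s(2)}{s_1} = \frac{3}{2},\quad
  \tau_2 = \frac{s(2)}{s_2} = \frac{1}{2},\quad
  \tau_1 + \tau_2 = 2 = d.
\]
% The smoother direction (s_2 = 3) receives the smaller tau_2, i.e. a coarser resolution level.
```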

3. Linear Wavelet Estimation

In this section, a linear wavelet estimator for an anisotropic regression function $r(x)$ is constructed, and a convergence rate of this wavelet estimator is proved under some mild assumptions. We first state some assumptions about the regression model given by (1) and (2). Hereafter, $c_1, c_2, \ldots$ denote generic positive constants, which may take different values at different places.
Hypothesis 1.
The density function $h$ of the random variable $U$ has a positive lower bound,
$$\inf_{x \in [0,1]^d} h(x) \ge c_1 > 0.$$
Hypothesis 2.
The function $\rho$ in Equation (1) satisfies $\rho \in L^2(\mathbb{R}) \cap L^{\infty}(\mathbb{R})$.
Hypothesis 3.
For any $(x,y) \in [0,1]^d \times \mathbb{R}$, there exist constants $0 < c_2 < c_3 < \infty$ such that
$$0 < c_2 < \omega(x,y) < c_3 < \infty.$$
Note that Hypotheses 1 and 3 are standard conditions for a nonparametric regression model with biased data ([9,24,25]). Moreover, the condition $y \in [a,b]$ is required in [24,25]; this paper does not need that harsh condition. Instead, a new mild condition, Hypothesis 2, is proposed, which is the same as in [11,12].
Now, a linear wavelet estimator is defined by
$$\hat{r}_{j\tau}(x) = \sum_{k \in \Omega_{j\tau}} \hat{\alpha}_{j\tau,k}\, \phi_{j\tau,k}(x), \tag{4}$$
where the cardinality of $\Omega_{j\tau}$ satisfies $|\Omega_{j\tau}| \sim 2^{\sum_{i=1}^{d} \lfloor j\tau_i \rfloor}$,
$$\hat{\mu}_n = \Big[ \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\omega(X_i, Y_i)} \Big]^{-1}, \qquad
\hat{\alpha}_{j\tau,k} = \frac{\hat{\mu}_n}{n} \sum_{i=1}^{n} \frac{\rho(Y_i)}{\omega(X_i, Y_i)\, h(X_i)}\, \phi_{j\tau,k}(X_i).$$
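The following Python sketch shows how the empirical quantities above could be computed from biased data. It is a simplified illustration only: it replaces the Daubechies scaling function with the Haar one, and it assumes user-supplied vectorized callables rho, omega, and h (the weight function and design density are known in the model); none of these names come from the paper.

```python
import numpy as np

def haar_phi(t):
    # Haar scaling function; used only to keep the sketch short (the paper uses Daubechies wavelets).
    return ((t >= 0.0) & (t < 1.0)).astype(float)

def phi_jtau_k(x, j, tau, k):
    """Tensor-product scaling function phi_{j*tau, k}(x) with per-coordinate levels floor(j * tau_i)."""
    levels = np.floor(j * np.asarray(tau)).astype(int)
    factors = 2.0 ** (levels / 2) * haar_phi(2.0 ** levels * np.asarray(x) - np.asarray(k))
    return np.prod(factors)

def linear_wavelet_estimate(x, X, Y, j, tau, rho, omega, h):
    """Evaluate the linear estimator r_hat_{j*tau}(x) from biased observations (X_i, Y_i)."""
    n, d = X.shape
    mu_hat = 1.0 / np.mean(1.0 / omega(X, Y))              # hat{mu}_n
    weights = rho(Y) / (omega(X, Y) * h(X))                # per-observation weights
    levels = np.floor(j * np.asarray(tau)).astype(int)
    grids = [np.arange(0, 2 ** l) for l in levels]         # translations k covering [0, 1]^d (Haar case)
    estimate = 0.0
    for k in np.array(np.meshgrid(*grids)).T.reshape(-1, d):
        coeffs = np.array([phi_jtau_k(Xi, j, tau, k) for Xi in X])
        alpha_hat = mu_hat / n * np.sum(weights * coeffs)  # empirical coefficient hat{alpha}_{j*tau, k}
        estimate += alpha_hat * phi_jtau_k(x, j, tau, k)
    return estimate
```

For the Daubechies case, haar_phi would be replaced by an interpolated scaling function as in the earlier sketch, and the set of contributing translations $k$ would be enlarged to account for the wider support.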
Similar to the results of Lemma 2 in [11], one can easily obtain
$$E\big( \hat{\mu}_n^{-1} \big) = \mu^{-1}, \qquad E\Big( \frac{\mu\, \rho(Y_i)}{\omega(X_i, Y_i)\, h(X_i)}\, \phi_{j\tau,k}(X_i) \Big) = \alpha_{j\tau,k}.$$
On the other hand, the following lemma is obtained using Rosenthal’s inequality. Its proof is similar to Proposition 4.1 in [12].
Lemma 2.
If Hypotheses 1–3 hold and $2^{|j\tau|} \le n$ with $|j\tau| := \sum_{i=1}^{d} \lfloor j\tau_i \rfloor$, then for $p \in [1, \infty)$,
$$E|\hat{\alpha}_{j\tau,k} - \alpha_{j\tau,k}|^p \lesssim n^{-\frac{p}{2}}.$$
This paper considers nonparametric regression estimation in anisotropic Besov spaces. The following lemma for Besov space is crucial ([14,26]).
Lemma 3.
Let $r \in B^{s}_{\tilde{p},q}(H)$ ($\tilde{p}, q \in [1, \infty)$) and $s(d) > d/\tilde{p}$. Then,
$$\sum_{(\gamma,m) \in I_{j\tau}} \sum_{k \in \Omega_{j\tau}} |\beta_{j\tau,k}^{(\gamma,m)}|\, |\psi_{j\tau,k}^{(\gamma,m)}(x)| \lesssim 2^{-j(s(d) - d/\tilde{p})}.$$
Now, we are in a position to state the convergence rate of the linear wavelet estimator.
Theorem 1.
Consider the problem defined by (1) and (2) with Hypotheses 1–3, $r(x) \in B^{s}_{\tilde{p},q}(H)$ ($\tilde{p}, q \in [1, \infty)$), and $s(d) > d/\tilde{p}$. Let the linear wavelet estimator $\hat{r}_{j\tau}(x)$ be defined by (4) with $2^{\lfloor j\tau_i \rfloor} \sim n^{\frac{s(d)}{2(s(d) - d/\tilde{p}) + d} \cdot \frac{1}{s_i}}$. Then,
$$E|\hat{r}_{j\tau}(x) - r(x)|^p \lesssim n^{-\frac{p(s(d) - d/\tilde{p})}{2(s(d) - d/\tilde{p}) + d}}$$
for every $x \in [0,1]^d$.
Remark 1.
When the $s_i$ are the same for every $i = 1, 2, \ldots, d$, the anisotropic function spaces reduce to isotropic spaces. In that case, our result coincides with the optimal convergence rate of standard nonparametric pointwise estimation ([27,28]).
Proof. 
It is easy to see that
$$E|\hat{r}_{j\tau}(x) - r(x)|^p \lesssim E|\hat{r}_{j\tau}(x) - P_{j\tau} r(x)|^p + |P_{j\tau} r(x) - r(x)|^p.$$
For the upper bound of $|P_{j\tau} r(x) - r(x)|^p$, according to the Hölder inequality and Lemma 3,
$$|P_{j\tau} r(x) - r(x)|^p \le \Big( \sum_{j'=j}^{\infty} \sum_{(\gamma,m) \in I_{j'\tau}} \sum_{k \in \Omega_{j'\tau}} |\beta_{j'\tau,k}^{(\gamma,m)}|\, |\psi_{j'\tau,k}^{(\gamma,m)}(x)| \Big)^p \lesssim 2^{-jp(s(d) - d/\tilde{p})}. \tag{9}$$
For the upper bound of $E|\hat{r}_{j\tau}(x) - P_{j\tau} r(x)|^p$, it follows from the Hölder inequality ($\frac{1}{p} + \frac{1}{p'} = 1$) and Lemma 2 that
$$E|\hat{r}_{j\tau}(x) - P_{j\tau} r(x)|^p \le E\Big\{ \sum_{k \in \Omega_{j\tau}} |\hat{\alpha}_{j\tau,k} - \alpha_{j\tau,k}|\, |\phi_{j\tau,k}(x)| \Big\}^p \le E\Big( \sum_{k \in \Omega_{j\tau}} |\hat{\alpha}_{j\tau,k} - \alpha_{j\tau,k}|^p\, |\phi_{j\tau,k}(x)| \Big) \Big( \sum_{k \in \Omega_{j\tau}} |\phi_{j\tau,k}(x)| \Big)^{\frac{p}{p'}} \lesssim 2^{\frac{|j\tau| p}{2}}\, n^{-\frac{p}{2}}. \tag{10}$$
Note that $|j\tau| = \sum_{i=1}^{d} \lfloor j\tau_i \rfloor \le \sum_{i=1}^{d} j\tau_i = jd$ and $2^{\lfloor j\tau_i \rfloor} \sim n^{\frac{s(d)}{2(s(d)-d/\tilde{p})+d} \cdot \frac{1}{s_i}}$. One thus obtains
$$2^{|j\tau|} \sim 2^{jd}, \qquad 2^{|j\tau|} \sim n^{\frac{d}{2(s(d)-d/\tilde{p})+d}}.$$
These relations, together with (9) and (10), show that
$$E|\hat{r}_{j\tau}(x) - r(x)|^p \lesssim 2^{-\frac{|j\tau|}{d} p(s(d)-d/\tilde{p})} + n^{-\frac{p}{2}}\, \Big( n^{\frac{d}{2(s(d)-d/\tilde{p})+d}} \Big)^{\frac{p}{2}} \lesssim n^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}}.$$

4. Data-Driven Estimation

In conventional wavelet regression estimators ([12,24,25]), and in the anisotropic linear wavelet estimator above, the choice of the wavelet scaling parameter $j$ ($j\tau$) is based only on the sample size $n$ of the observed data $(X_1, Y_1), \ldots, (X_n, Y_n)$. Because the definitions of these wavelet estimators depend on the smoothing parameter of the regression function, those estimators are not adaptive. In this section, we construct an adaptive, data-driven regression estimator. The scaling parameter of this estimator is selected from $(X_1, Y_1), \ldots, (X_n, Y_n)$ themselves. Furthermore, the definition of this data-driven estimator does not depend on any information about the regression function.
Now, we define an auxiliary estimator
$$\hat{r}_{j\tau, j^*\tau^*}(x) = \sum_{k} \hat{\alpha}_{j\tau \wedge j^*\tau^*,\, k}\, \phi_{j\tau \wedge j^*\tau^*,\, k}(x),$$
where $j\tau \wedge j^*\tau^* := j\tau$ for $|j\tau| \le |j^*\tau^*|$; otherwise, $j\tau \wedge j^*\tau^* := j^*\tau^*$. For a constant $\lambda$ (specified near the end of the proof of Theorem 2),
$$\nu_{j\tau} = \sqrt{\lambda\, (1 + p/2)\, 2^{|j\tau|}\, \max\{1, (\ln 2)\, |j\tau|\} / n}, \tag{11}$$
which satisfies
$$\nu_{j\tau} \lesssim \sqrt{n^{-1}\, |j\tau|\, 2^{|j\tau|}}. \tag{12}$$
Define $|t| = t_1 + \cdots + t_d$ and
$$\Theta := \Big\{ t = (t_1, \ldots, t_d) \in \mathbb{R}^d:\ 0 \le t_i \le \log_2 \frac{n}{\ln n},\ |t|\, 2^{|t|} \le n \Big\}.$$
Then, $j_0\tau_0 \in \Theta$ is determined by the following selection rule:
$$(\mathrm{I})\qquad \hat{\xi}_{j\tau}(x) = \max_{j^*\tau^* \in \Theta} \Big[ |\hat{r}_{j\tau, j^*\tau^*}(x) - \hat{r}_{j^*\tau^*}(x)| - \nu_{j^*\tau^*} - \nu_{j\tau} \Big]_+,$$
$$(\mathrm{II})\qquad \hat{\xi}_{j_0\tau_0}(x) + 2\nu_{j_0\tau_0} = \min_{j\tau \in \Theta} \big\{ \hat{\xi}_{j\tau}(x) + 2\nu_{j\tau} \big\}.$$
According to the above selection rule, the parameter $j_0\tau_0$ is determined by the observed data $(X_1, Y_1), \ldots, (X_n, Y_n)$. Now, we give the data-driven estimator $\hat{r}_{j_0\tau_0}(x)$:
$$\hat{r}_{j_0\tau_0}(x) = \sum_{k \in \Omega_{j_0\tau_0}} \hat{\alpha}_{j_0\tau_0, k}\, \phi_{j_0\tau_0, k}(x).$$
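As an illustration of how Selection Rules (I) and (II) could be carried out in practice, the Python sketch below performs a brute-force search over an integer-level grid approximating $\Theta$. It assumes a user-supplied callable r_hat(t, x) that evaluates the linear wavelet estimator at level vector t (for example, the linear-estimator sketch from Section 3), realizes the auxiliary estimator by using the level vector with the smaller $|j\tau|$ as in the definition above, and uses an arbitrary placeholder value for the constant $\lambda$.

```python
import numpy as np
from itertools import product

def build_grid(n, d):
    """Integer level vectors t with 0 <= t_i <= log2(n / ln n) and |t| * 2^{|t|} <= n (a discretized Theta)."""
    jmax = int(np.floor(np.log2(n / np.log(n))))
    return [t for t in product(range(jmax + 1), repeat=d) if sum(t) * 2 ** sum(t) <= n]

def nu(t, n, p, lam):
    """Threshold nu_{j*tau} from (11); lam plays the role of the constant lambda."""
    total = sum(t)
    return np.sqrt(lam * (1 + p / 2) * 2 ** total * max(1.0, np.log(2) * total) / n)

def select_level(x, r_hat, grid, n, p, lam=1.0):
    """Selection rules (I)-(II): return the data-driven level vector j0*tau0."""
    def aux(t, t_star):
        # Auxiliary estimator: evaluate at the level vector with the smaller |j*tau|.
        return r_hat(t if sum(t) <= sum(t_star) else t_star, x)

    def xi(t):
        # Rule (I): largest excess of pairwise discrepancies over the noise thresholds.
        return max(
            max(abs(aux(t, ts) - r_hat(ts, x)) - nu(ts, n, p, lam) - nu(t, n, p, lam), 0.0)
            for ts in grid
        )

    # Rule (II): minimize xi_hat + 2 * nu over the grid.
    return min(grid, key=lambda t: xi(t) + 2 * nu(t, n, p, lam))
```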
In order to discuss the convergence rate of this data-driven estimator, the following Lemma ([29]) and Bernstein’s inequality are essential.
Lemma 4.
Let $(\Omega, \mathcal{F}, \chi)$ be a measurable space and let $f(y) \in L^p(\Omega, \mathcal{F}, \chi)$ with $p \in (0, \infty)$. Then, with $\delta(t) = \chi\{y \in \Omega:\ |f(y)| > t\}$,
$$\int_{\Omega} |f(y)|^p\, d\chi = p \int_0^{\infty} t^{p-1}\, \delta(t)\, dt.$$
Bernstein's inequality. Let $X_1, \ldots, X_n$ be independent random variables such that $E X_i = 0$, $|X_i| \le M$, and $E X_i^2 = \sigma^2$. Then, for each $v \ge 0$,
$$P\Big( \frac{1}{n} \Big| \sum_{i=1}^{n} X_i \Big| \ge v \Big) \le 2 \exp\Big\{ -\frac{n v^2}{2\big(\sigma^2 + \frac{v M}{3}\big)} \Big\}.$$
Theorem 2.
Consider Problems (1) and (2) with Hypotheses 1–3, $r(x) \in B^{s}_{\tilde{p},q}(H)$ ($\tilde{p}, q \in [1, \infty)$), and $s(d) > d/\tilde{p}$. When $j_0\tau_0$ is determined by Selection Rules $(\mathrm{I})$ and $(\mathrm{II})$, the data-driven estimator $\hat{r}_{j_0\tau_0}(x)$ satisfies
$$E|\hat{r}_{j_0\tau_0}(x) - r(x)|^p \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d) - d/\tilde{p})}{2(s(d) - d/\tilde{p}) + d}}$$
for every $x \in [0,1]^d$.
Remark 2.
Note that the convergence rate of this data-driven estimator is the same as that of the linear wavelet estimator above, up to a logarithmic factor $(\ln n)^{\frac{p(s(d) - d/\tilde{p})}{2(s(d) - d/\tilde{p}) + d}}$. Both regression estimators attain the optimal convergence rate under pointwise error ([28]). On the other hand, the scaling parameter $j_0\tau_0$ of the data-driven estimator $\hat{r}_{j_0\tau_0}(x)$ depends on the observed data $(X_1, Y_1), \ldots, (X_n, Y_n)$, not only on the sample size $n$. In addition, the definition of the data-driven estimator $\hat{r}_{j_0\tau_0}(x)$ does not rely on any information about the regression function $r(x)$, which means that this data-driven estimator is completely adaptive.
Proof. 
Let $j_1\tau_1 := (j_1\tau_{11}, \ldots, j_1\tau_{1d})$ and $2^{j_1\tau_{1i}} \sim \big( \frac{n}{\ln n} \big)^{\frac{s(d)}{2(s(d)-d/\tilde{p})+d} \cdot \frac{1}{s_i}}$. Then, one can easily obtain
$$\frac{s(d)}{2(s(d)-d/\tilde{p})+d} \cdot \frac{1}{s_i} \le \sum_{i=1}^{d} \frac{s(d)}{2(s(d)-d/\tilde{p})+d} \cdot \frac{1}{s_i} < 1,$$
so that $j_1\tau_{1i} \le \log_2 \frac{n}{\ln n}$ and $j_1\tau_1 \in \Theta$.
According to Theorem 1 and $2^{j_1\tau_{1i}} \sim \big( \frac{n}{\ln n} \big)^{\frac{s(d)}{2(s(d)-d/\tilde{p})+d} \cdot \frac{1}{s_i}}$,
$$E|\hat{r}_{j_1\tau_1}(x) - r(x)|^p \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}}. \tag{14}$$
Note that
$$E|\hat{r}_{j_0\tau_0}(x) - r(x)|^p \lesssim E|\hat{r}_{j_0\tau_0}(x) - \hat{r}_{j_1\tau_1}(x)|^p + E|\hat{r}_{j_1\tau_1}(x) - r(x)|^p. \tag{15}$$
Hence, we only need to estimate the upper bound of $E|\hat{r}_{j_0\tau_0}(x) - \hat{r}_{j_1\tau_1}(x)|^p$.
It follows from the selection rule and the definition of the auxiliary estimator that
$$|\hat{r}_{j_0\tau_0}(x) - \hat{r}_{j_1\tau_1}(x)| \le |\hat{r}_{j_0\tau_0}(x) - \hat{r}_{j_0\tau_0, j_1\tau_1}(x)| + |\hat{r}_{j_0\tau_0, j_1\tau_1}(x) - \hat{r}_{j_1\tau_1}(x)| \le \big( \hat{\xi}_{j_1\tau_1}(x) + \nu_{j_0\tau_0} + \nu_{j_1\tau_1} \big) + \big( \hat{\xi}_{j_0\tau_0}(x) + \nu_{j_1\tau_1} + \nu_{j_0\tau_0} \big) \lesssim \big( \hat{\xi}_{j_1\tau_1}(x) + 2\nu_{j_1\tau_1} \big).$$
Then, one can easily obtain that
$$E|\hat{r}_{j_0\tau_0}(x) - \hat{r}_{j_1\tau_1}(x)|^p \lesssim E\big( \hat{\xi}_{j_1\tau_1}(x) \big)^p + E\,\nu_{j_1\tau_1}^p. \tag{16}$$
According to (12),
$$E\,\nu_{j_1\tau_1}^p \lesssim \big( n^{-1}\, |j_1\tau_1|\, 2^{|j_1\tau_1|} \big)^{\frac{p}{2}} \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}}. \tag{17}$$
It remains to bound $E\big( \hat{\xi}_{j_1\tau_1}(x) \big)^p$. Note that $\hat{r}_{j_1\tau_1, j^*\tau^*}(x) = \hat{r}_{j^*\tau^*}(x)$ in the case of $|j^*\tau^*| \le |j_1\tau_1|$. Moreover,
$$\hat{\xi}_{j_1\tau_1}(x) \le \max_{|j^*\tau^*| > |j_1\tau_1|} \Big[ |\hat{r}_{j_1\tau_1}(x) - \hat{r}_{j^*\tau^*}(x)| - \nu_{j_1\tau_1} - \nu_{j^*\tau^*} \Big]_+.$$
It is easy to see from the triangle inequality that $|\hat{r}_{j_1\tau_1}(x) - \hat{r}_{j^*\tau^*}(x)| \le |\hat{r}_{j_1\tau_1}(x) - P_{j_1\tau_1} r(x)| + |P_{j_1\tau_1} r(x) - r(x)| + |r(x) - P_{j^*\tau^*} r(x)| + |P_{j^*\tau^*} r(x) - \hat{r}_{j^*\tau^*}(x)|$. Then, using $|j_1\tau_1| \le j_1 d$, $|j^*\tau^*| > |j_1\tau_1|$, and (9), one obtains
$$|P_{j_1\tau_1} r(x) - r(x)|^p \lesssim 2^{-j_1 p(s(d)-d/\tilde{p})} \lesssim 2^{-\frac{|j_1\tau_1|}{d} p(s(d)-d/\tilde{p})} \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}},$$
$$|r(x) - P_{j^*\tau^*} r(x)|^p \lesssim 2^{-\frac{|j^*\tau^*|}{d} p(s(d)-d/\tilde{p})} \lesssim 2^{-\frac{|j_1\tau_1|}{d} p(s(d)-d/\tilde{p})} \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}}.$$
Hence, one knows
$$E\big( \hat{\xi}_{j_1\tau_1}(x) \big)^p \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}} + E\Big[ |\hat{r}_{j_1\tau_1}(x) - P_{j_1\tau_1} r(x)| - \nu_{j_1\tau_1} \Big]_+^p + \max_{|j^*\tau^*| > |j_1\tau_1|} E\Big[ |\hat{r}_{j^*\tau^*}(x) - P_{j^*\tau^*} r(x)| - \nu_{j^*\tau^*} \Big]_+^p \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}} + \sum_{j\tau_1 = 0}^{\log_2 \frac{n}{\ln n}} \cdots \sum_{j\tau_d = 0}^{\log_2 \frac{n}{\ln n}} E\Big[ |\hat{r}_{j\tau}(x) - P_{j\tau} r(x)| - \nu_{j\tau} \Big]_+^p.$$
For $t > 0$, $\big\{ (|\hat{r}_{j\tau}(x) - P_{j\tau} r(x)| - \nu_{j\tau})_+ \ge t \big\} = \big\{ |\hat{r}_{j\tau}(x) - P_{j\tau} r(x)| - \nu_{j\tau} \ge t \big\}$. Then, using Lemma 4,
$$E\Big[ |\hat{r}_{j\tau}(x) - P_{j\tau} r(x)| - \nu_{j\tau} \Big]_+^p = p \int_0^{\infty} t^{p-1}\, P\big( |\hat{r}_{j\tau}(x) - P_{j\tau} r(x)| \ge \nu_{j\tau} + t \big)\, dt = p\, \nu_{j\tau}^p \int_0^{\infty} t^{p-1}\, P\big( |\hat{r}_{j\tau}(x) - P_{j\tau} r(x)| \ge (1+t)\, \nu_{j\tau} \big)\, dt.$$
According to the definition of $\hat{r}_{j\tau}(x)$,
$$|\hat{r}_{j\tau}(x) - P_{j\tau} r(x)| \lesssim \Big| \frac{1}{n} \sum_{i=1}^{n} \Big\{ \sum_{k \in \Omega_{j\tau}} \Big[ \frac{\mu\, \rho(Y_i)}{\omega(X_i, Y_i)\, h(X_i)}\, \phi_{j\tau,k}(X_i) - \alpha_{j\tau,k} \Big] \phi_{j\tau,k}(x) \Big\} \Big| + \Big| \frac{1}{n} \sum_{i=1}^{n} \Big\{ \sum_{k \in \Omega_{j\tau}} \Big[ \frac{1}{\omega(X_i, Y_i)} - \frac{1}{\mu} \Big] \phi_{j\tau,k}(x) \Big\} \Big| := \Big| \frac{1}{n} \sum_{i=1}^{n} \eta_i \Big| + \Big| \frac{1}{n} \sum_{i=1}^{n} \zeta_i \Big|.$$
Furthermore, the following probability inequality is true:
$$P\big( |\hat{r}_{j\tau}(x) - P_{j\tau} r(x)| \ge (1+t)\, \nu_{j\tau} \big) \le P\Big( \Big| \frac{1}{n} \sum_{i=1}^{n} \eta_i \Big| \ge \frac{1}{2} (1+t)\, \nu_{j\tau} \Big) + P\Big( \Big| \frac{1}{n} \sum_{i=1}^{n} \zeta_i \Big| \ge \frac{1}{2} (1+t)\, \nu_{j\tau} \Big). \tag{18}$$
Note that $\eta_1, \ldots, \eta_n$ are independent and identically distributed with $E\eta_i = 0$. Hypotheses 1–3 and the properties of $\phi_{j\tau,k}(x)$ imply $|\eta_i| \le C_1\, 2^{|j\tau|}$ ($C_1 > 0$). Using the Hölder inequality,
$$E|\eta_i|^2 \le E\Big[ \sum_{k \in \Omega_{j\tau}} \Big| \frac{\mu\, \rho(Y_i)}{\omega(X_i, Y_i)\, h(X_i)}\, \phi_{j\tau,k}(X_i) - \alpha_{j\tau,k} \Big|^2 |\phi_{j\tau,k}(x)| \Big] \Big[ \sum_{k \in \Omega_{j\tau}} |\phi_{j\tau,k}(x)| \Big].$$
According to the argument of (6), one has
$$E\Big| \frac{\mu\, \rho(Y_i)}{\omega(X_i, Y_i)\, h(X_i)}\, \phi_{j\tau,k}(X_i) - \alpha_{j\tau,k} \Big|^2 \le E\Big| \frac{\mu\, \rho(Y_i)}{\omega(X_i, Y_i)\, h(X_i)}\, \phi_{j\tau,k}(X_i) \Big|^2 = \int_{[0,1]^d \times \mathbb{R}} \Big| \frac{\mu\, \rho(y)}{\omega(x, y)\, h(x)}\, \phi_{j\tau,k}(x) \Big|^2 f(x, y)\, dx\, dy.$$
This with Hypotheses 1–3 and (2) shows that
$$E\Big| \frac{\mu\, \rho(Y_i)}{\omega(X_i, Y_i)\, h(X_i)}\, \phi_{j\tau,k}(X_i) - \alpha_{j\tau,k} \Big|^2 \lesssim 1.$$
Hence, there exists a constant $C_2 > 0$ such that
$$E|\eta_i|^2 \le C_2\, 2^{|j\tau|}.$$
According to $\nu_{j\tau} \lesssim 1$ and Bernstein's inequality,
$$P\Big( \Big| \frac{1}{n} \sum_{i=1}^{n} \eta_i \Big| \ge \frac{1}{2} (1+t)\, \nu_{j\tau} \Big) \le 2 \exp\Big\{ -\frac{n \big( \frac{1}{2} (1+t)\, \nu_{j\tau} \big)^2}{2 \big( C_2\, 2^{|j\tau|} + \frac{(1+t)\, C_1\, 2^{|j\tau|}\, \nu_{j\tau}}{6} \big)} \Big\} \lesssim \exp\Big\{ -\frac{n (1+t)\, \nu_{j\tau}^2}{C_3\, 2^{|j\tau|}} \Big\} \qquad (C_3 > 0). \tag{19}$$
Similar to the arguments of (19), one can obtain that
$$P\Big( \Big| \frac{1}{n} \sum_{i=1}^{n} \zeta_i \Big| \ge \frac{1}{2} (1+t)\, \nu_{j\tau} \Big) \lesssim \exp\Big\{ -\frac{n (1+t)\, \nu_{j\tau}^2}{C_4\, 2^{|j\tau|}} \Big\} \qquad (C_4 > 0). \tag{20}$$
Choosing $\lambda \ge \max\{C_3, C_4\}$, it follows from (11), (19) and (20) that (18) reduces to
$$P\big( |\hat{r}_{j\tau}(x) - P_{j\tau} r(x)| \ge (1+t)\, \nu_{j\tau} \big) \lesssim \exp\big\{ -(1+t)\, (1 + \tfrac{p}{2})\, \max\{1, (\ln 2)\, |j\tau|\} \big\} \le e^{-t}\, 2^{-(1 + \frac{p}{2})\, |j\tau|}.$$
Furthermore, by the definition of $\nu_{j\tau}$,
$$E\Big[ |\hat{r}_{j\tau}(x) - P_{j\tau} r(x)| - \nu_{j\tau} \Big]_+^p \lesssim p\, \nu_{j\tau}^p \int_0^{\infty} t^{p-1}\, e^{-t}\, 2^{-(1 + \frac{p}{2})\, |j\tau|}\, dt \lesssim 2^{-|j\tau|} \Big( \frac{\ln n}{n} \Big)^{\frac{p}{2}}.$$
Finally, one obtains
$$E\big( \hat{\xi}_{j_1\tau_1}(x) \big)^p \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}} + \sum_{j\tau_1 = 0}^{\log_2 \frac{n}{\ln n}} \cdots \sum_{j\tau_d = 0}^{\log_2 \frac{n}{\ln n}} 2^{-|j\tau|} \Big( \frac{\ln n}{n} \Big)^{\frac{p}{2}} \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}} + \Big( \frac{\ln n}{n} \Big)^{\frac{p}{2}} \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}}. \tag{21}$$
Combining (14)–(17) and (21),
$$E|\hat{r}_{j_0\tau_0}(x) - r(x)|^p \lesssim \Big( \frac{n}{\ln n} \Big)^{-\frac{p(s(d)-d/\tilde{p})}{2(s(d)-d/\tilde{p})+d}}.$$

5. Conclusions

In this paper, we consider nonparametric wavelet estimation of anisotropic regression functions. First, a linear wavelet estimator is constructed via the wavelet projection operator, and its rate of convergence under pointwise error is proved under mild conditions. Second, we construct a data-driven regression estimator with a selection rule for the scaling parameter. It should be pointed out that the definition of this data-driven estimator depends only on the observed data; in other words, it is an adaptive estimator. Finally, the convergence rate of the data-driven estimator is derived from the selection rule. Compared with conventional nonparametric regression estimators, both estimators attain the optimal convergence rate. In addition, the data-driven estimator is expected to be useful in big data processing.

Author Contributions

Conceptualization, J.C. and J.K.; writing—original draft preparation, J.C.; writing—review and editing, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by the National Natural Science Foundation of China (No. 12361016), Guangxi Natural Science Foundation (No. 2023GXNSFAA026042), Center for Applied Mathematics of Guangxi (GUET), and the Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cristóbal, J.A.; Alcalá, J.T. Nonparametric regression estimators for length biased data. J. Stat. Plan. Inference 2000, 89, 145–168.
2. Jewell, N. Least squares regression with data arising from samples of the dependent variable. Biometrika 1985, 72, 11–21.
3. Nair, V.N.; Wang, P.C.C. Maximum likelihood estimation under a successive sampling discovery model. Technometrics 1989, 31, 423–436.
4. Amato, U.; Antoniadis, A.; Feis, I.D.; Gijbels, I. Wavelet-based robust estimation and variable selection in nonparametric additive models. Stat. Comput. 2022, 32, 11.
5. Li, L.Y.; Zhang, B. Nonlinear wavelet-based estimation to spectral density for stationary non-Gaussian linear processes. Appl. Comput. Harmon. Anal. 2022, 60, 176–204.
6. Liu, Y.M.; Zeng, X.C. Asymptotic normality for wavelet deconvolution density estimators. Appl. Comput. Harmon. Anal. 2020, 48, 321–342.
7. Wishart, J.R. Smooth hyperbolic wavelet deconvolution with anisotropic structure. Electron. J. Stat. 2019, 13, 1694–1716.
8. Wu, C.; Zeng, X.C.; Mi, N. Adaptive and optimal pointwise deconvolution density estimations by wavelets. Adv. Comput. Math. 2021, 47, 14.
9. Chesneau, C.; Shirazi, E. Nonparametric wavelet regression based on biased data. Commun. Stat. Theory Methods 2014, 43, 2642–2658.
10. Kou, J.K.; Liu, Y.M. An extension of Chesneau’s theorem. Stat. Probab. Lett. 2016, 108, 23–32.
11. Kou, J.K.; Liu, Y.M. Nonparametric regression estimations over Lp risk based on biased data. Commun. Stat. Theory Methods 2017, 46, 2375–2395.
12. Guo, H.J.; Kou, J.K. Pointwise wavelet estimation of regression function based on biased data. Results Math. 2019, 74, 128.
13. Goldenshluger, A.; Lepski, O. On adaptive minimax density estimation on Rd. Probab. Theory Relat. Fields 2014, 159, 479–543.
14. Triebel, H. Theory of Function Spaces III; Birkhäuser: Berlin, Germany, 2006.
15. Berry, M.V.; Lewis, Z.V.; Nye, J.F. On the Weierstrass–Mandelbrot fractal function. Proc. R. Soc. A 1980, 370, 459–484.
16. Guariglia, E.; Silvestrov, S. Fractional-wavelet analysis of positive definite distributions and wavelets on D′(C). In Engineering Mathematics II: Algebraic, Stochastic and Analysis Structures for Networks, Data Classification and Optimization; Springer International Publishing: New York, NY, USA, 2016.
17. Guariglia, E.; Guido, R.C. Chebyshev wavelet analysis. J. Funct. Spaces 2022.
18. Guariglia, E. Primality, fractality and image analysis. Entropy 2019, 21, 304.
19. Guido, R.C.; Pedroso, F.; Contreras, R.C.; Rodrigues, L.C.; Guariglia, E.; Neto, J.S. Introducing the discrete path transform (DPT) and its applications in signal analysis, artefact removal, and spoken word recognition. Digit. Signal Process. 2021, 117.
20. Jiang, X.X.; Wang, J.W.; Wang, W.; Zhang, H.X. A predictor-corrector compact difference scheme for a nonlinear fractional differential equation. Fractal Fract. 2023, 7, 521.
21. Yang, L.; Su, H.L.; Zhong, C.; Meng, Z.Q.; Luo, H.W.; Li, X.C.; Tang, Y.Y.; Lu, Y. Hyperspectral image classification using wavelet transform-based smooth ordering. Int. J. Wavelets Multiresolution Inf. Process. 2019, 17.
22. Yang, X.H.; Wu, L.J.; Zhang, H.X. A space-time spectral order sinc-collocation method for the fourth-order nonlocal heat model arising in viscoelasticity. Appl. Math. Comput. 2023, 457.
23. Zheng, X.W.; Tang, Y.Y.; Zhou, J.T. A framework of adaptive multiscale wavelet decomposition for signals on undirected graphs. IEEE Trans. Signal Process. 2019, 67, 1696–1711.
24. Chaubey, Y.P.; Chesneau, C.; Shirazi, E. Wavelet-based estimation of regression function for dependent biased data under a given random design. J. Nonparametr. Stat. 2013, 25, 53–71.
25. Chaubey, Y.P.; Shirazi, E. On MISE of a nonlinear wavelet estimator of the regression function based on biased data. Commun. Stat. Theory Methods 2015, 44, 885–899.
26. Liu, Y.M.; Wu, C. Point-wise estimation for anisotropic densities. J. Multivar. Anal. 2019, 171, 112–125.
27. Cai, T.T. Rates of convergence and adaptation over Besov spaces under pointwise risk. Stat. Sin. 2003, 13, 881–902.
28. Rebelles, G. Pointwise adaptive estimation of a multivariate density under independence hypothesis. Bernoulli 2015, 21, 1984–2023.
29. Cohn, D.L. Measure Theory; Birkhäuser: Boston, MA, USA, 1980.