Article

Optimal Almost Sure Rate of Convergence for the Wavelets Estimator in the Partially Linear Additive Models

by Khalid Chokri 1,† and Salim Bouzebda 2,*,†
1 Modeling and Complex Systems Laboratory-M.C.S.L., Cadi Ayyad University, P.B. 549, Marrakech 40001, Morocco
2 Laboratoire de Mathématiques Appliquées de Compiègne-L.M.A.C., Université de Technologie de Compiègne, 60205 Compiègne Cedex, France
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Symmetry 2025, 17(3), 394; https://doi.org/10.3390/sym17030394
Submission received: 10 February 2025 / Revised: 28 February 2025 / Accepted: 4 March 2025 / Published: 5 March 2025
(This article belongs to the Section Mathematics)

Abstract: In this article, we examine a class of partially linear additive models (PLAM) defined via a measurable mapping $\Psi : \mathbb{R}^q \to \mathbb{R}$. More precisely, we consider
$\Psi(Y_i) = Y_i^{\ast} = Z_i^{\top}\beta + \sum_{l=1}^{d} m_l(X_{l,i}) + \varepsilon_i, \quad i = 1, \ldots, n,$
where $Z_i = (Z_{i,1}, \ldots, Z_{i,p})^{\top}$ and $X_i = (X_{1,i}, \ldots, X_{d,i})^{\top}$ denote vectors of explanatory variables. The unknown parameter vector is $\beta = (\beta_1, \ldots, \beta_p)^{\top}$, and $m_1, \ldots, m_d$ are real-valued functions of a single variable whose forms are not specified. The error terms $\varepsilon_1, \ldots, \varepsilon_n$ are identically distributed with mean zero and finite variance $\sigma_{\varepsilon}^{2}$, and they fulfill the condition $\mathbb{E}(\varepsilon \mid X, Z) = 0$ almost surely. These models are broadly applicable in finance, biology, and engineering, where capturing intricate nonlinear effects is essential. We propose an estimation method that leverages marginal integration in conjunction with linear wavelet-based techniques to obtain estimators of the unknown components $m_1, \ldots, m_d$. Under suitable regularity conditions, we establish strong uniform convergence of these estimators and show that they attain convergence rates that are favorable for practical applications, underscoring the adaptability and scope of this partially linear additive model.

1. Introduction

Consider a random vector $(X, Y, Z) \in \mathbb{R}^d \times \mathbb{R}^q \times \mathbb{R}^p$, and let $\Psi : \mathbb{R}^q \to \mathbb{R}$ be a measurable mapping from some function class $\mathcal{F}$. We investigate a modeling structure where the relation between the response variables and the covariates is captured by a partially linear specification of the form
$\Psi(Y) = Y^{\ast} = m_0 + Z^{\top}\beta + m(X) + \varepsilon, \qquad (1)$
where $m_0$ is a constant, $\beta \in \mathbb{R}^p$ is a vector of unknown coefficients, and $m(\cdot)$ is a nonparametric function designed to capture potential nonlinear effects of the variables in $X$. The random error term $\varepsilon$ represents noise. Throughout, the transpose of any vector $V$ will be written as $V^{\top}$. Model (1) is notably flexible, accommodating a variety of established statistical frameworks as particular cases (see, for example, [1]). In practice, it is common to collect a large set of variables, which then requires the elimination of those that are not influential before finalizing the model. Hence, the selection of significant variables is a key consideration in both parametric and nonparametric regression contexts; for a comprehensive discussion, see [2]. One illustrative application concerns the study of factors influencing East Germans’ decisions to migrate to West Germany one year after reunification [3,4], where certain covariates (e.g., discrete variables) may exert linear effects on the response, whereas others act in a nonlinear manner. Similar scenarios arise in various domains, such as civil engineering materials [5], the Pima Indian diabetes dataset [6,7], and investigations related to guidelines for urinary incontinence discussion and evaluation [8].
Introducing a linear component $Z^{\top}\beta$ alongside a nonparametric function $m(X)$ in the model (1) offers an appealing compromise between purely parametric and fully nonparametric approaches. Parametric models are easy to use but can be severely biased if their assumptions are violated, whereas nonparametric techniques can accommodate complex data structures but often need large sample sizes for reliable inference. By incorporating both a linear and a nonparametric term, the partially linear formulation in (1) combines the interpretability and efficiency of parametric methods with the flexibility to capture nonlinear relationships via $m(X)$. A salient special case is isotonic regression, which arises if $\Psi(y) = y$ and the linear term is omitted. This isotonic setting dates back to the pioneering work of [9] and remains a rich area of investigation in statistics.
The partially linear framework, initially proposed by [10] for the classical situation $\Psi(y) = y$, has been extensively used in fields such as economics, epidemiology, engineering, and the life sciences. Foundational contributions by [11,12,13] deepened the theoretical underpinnings for estimation and inference, motivating a large body of ensuing research. In recent decades, numerous efforts have extended (1) beyond the case $\Psi(y) = y$ and refined inference for both the parametric and nonparametric components; see [14,15,16,17,18,19,20,21,22], as well as the comprehensive volume [23]. More contemporary developments appear in [24,25,26,27,28,29,30,31,32,33].
When parametric models are correctly specified, they can yield a strong performance but may introduce significant biases if crucial assumptions are violated. In contrast, fully nonparametric smoothing does not rely on strict parametric constraints, though it can be hindered by the “curse of dimensionality” in moderate or high dimensions. To address this challenge, refs. [34,35] introduced additive models, which decompose multivariate nonparametric relationships into sums of lower-dimensional functions. This decomposition promotes interpretability, reduces computational overhead, and can lead to performance gains over fully nonparametric techniques. Detailed expositions of additive models and their development appear in [36,37], and related sources.
In our specific context, we handle the potentially high dimension of $m(X)$ in (1) by imposing an additive structure. Concretely, we replace $m(X)$ with a sum of univariate components and consider
$\Psi(Y) = m_0 + Z^{\top}\beta + \sum_{l=1}^{d} m_l(X_l) + \varepsilon, \qquad (2)$
where $X_l$ denotes the $l$-th component of $X$ and $m_l(\cdot)$ captures its influence on $\Psi(Y)$. This partially linear additive design retains a transparent linear part $Z^{\top}\beta$ while permitting each covariate $X_l$ to appear in a potentially nonlinear form. Empirical results such as those in [38] suggest that exploiting this additive framework can substantially improve efficiency. In related work, refs. [39,40] illustrate how additional constraints in semiparametric models can also boost estimation precision.
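To fix ideas, the following minimal Python sketch simulates observations from the partially linear additive specification (2) with two nonlinear components; the particular choices of $\beta$, $m_1$, $m_2$, the identity transform $\Psi(y) = y$, and $m_0 = 0$ are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 500, 2, 2

# Covariates: X enters through additive nonlinear components, Z enters linearly.
X = rng.uniform(0.0, 1.0, size=(n, d))
Z = rng.normal(size=(n, p))

# Illustrative additive components, centered so that E[m_l(X_l)] = 0 for X_l ~ U(0, 1).
m1 = lambda x: np.sin(2.0 * np.pi * x)
m2 = lambda x: x**2 - 1.0 / 3.0
beta = np.array([1.0, -0.5])

eps = 0.2 * rng.normal(size=n)                      # mean-zero errors
psi_Y = Z @ beta + m1(X[:, 0]) + m2(X[:, 1]) + eps  # Psi(Y) under Psi = identity, m0 = 0
```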
Researchers have long taken an interest in partially linear additive models across diverse domains. Kernel-based techniques are frequently employed to estimate (2), partly because of their relative ease of implementation and strong theoretical foundations. In-depth reviews of kernel-based methods can be found in [41,42,43]. In parallel, ref. [30] introduced a fusion-penalized inverse probability weighted least squares method designed for missing responses in partially linear additive setups with latent subgroups. Meanwhile, ref. [32] proposed a smooth-threshold generalized method of moments estimator and a variable selection procedure for partially linear additive spatial autoregressive models, utilizing basis expansions for the nonparametric components.
Recent work by [29] examined partially linear additive models in pooled data settings, supporting their approach with simulations and an application based on environmental health information from the National Health and Nutrition Examination Survey. Independently, ref. [26] introduced a weighted quantile regression methodology with inverse probability weighting to address selective data retention, establishing large-sample behavior for both parametric and nonparametric components. Other significant studies include [44], which proposed a scalar-on-function paradigm incorporating nonparametric functional predictors and semiparametric covariates, and [45], which developed generalized partial functional linear additive models (GPFLAM) and employed a quasi-likelihood approach. Additionally, ref. [46] investigated partially linear additive dynamic panels with fixed effects and autocorrelation, and [47] examined transformation models for right-censored survival data under the same partially linear additive framework. Wavelet-based methods have also been explored; for example, ref. [48] advanced a spline-backfitted kernel smoothing algorithm displaying oracle-type properties under stationarity and geometric mixing conditions.
Although kernel-based estimators are effective, they may face complications near boundaries or with discontinuous target functions. Wavelet methods inherently address such localized irregularities by allowing for abrupt changes and flexible boundary treatments. Foundational contributions to wavelet theory—see [49,50,51,52] and the summary in [53]—established a robust mathematical basis for wavelet-based estimation. Important theoretical expansions of this framework can be found in [54,55,56,57].
Wavelet techniques have seen extensive application in density estimation and regression. For instance, ref. [58] analyzed a linear wavelet estimator, proving its strong consistency under L 2 -type smoothing with the help of sophisticated empirical process tools. Subsequently, ref. [59] introduced statistical inference procedures for wavelet estimators of symmetric positive-definite function curves, employing a log-Euclidean metric to maintain key properties such as positive-definiteness and permutation invariance. Wavelet-based methods have also been used to detect and categorize local symmetry in images [60], to characterize refinement masks for symmetric wavelets meeting particular sum rules [61], and to examine DNA sequence symmetries [62]. Further advances include [63], which developed nearly frame-like wavelets with specified approximation properties, and [64], focusing on block thresholding for nonparametric regression. Broader contexts of wavelet transforms involving symmetry groups or tube domains are discussed in [65,66], while [67] investigated perfect reconstruction of two-dimensional signals on hexagonal grids with enhanced regularity. In [68,69], a methodology based on the k-th correlation coefficient was introduced to verify whether the true model error in partial linear settings follows a symmetric distribution.
A notable takeaway from this body of literature is that linear wavelet estimation can asymptotically achieve performance comparable to the well-known nonlinear hard-threshold wavelet estimator developed by [56], reinforcing the practical advantages of wavelet-based approaches. Under α -mixing processes, ref. [70] showed that wavelet estimators for densities on bounded domains in R d attain optimal or near-optimal almost sure convergence rates, mirroring those in the i.i.d. context. Extending this line of inquiry, ref. [71] demonstrated strong uniform convergence for wavelet regression estimators on compact subsets of R d . Discussions of linear wavelet estimators appear in [70,71,72,73,74].
That said, the distributional properties of wavelet estimators are comparatively less well-established. Early progress was made by [75], who proved almost sure uniform convergence and asymptotic normality for wavelet estimators of density and regression functions under ergodic R d -valued processes, later extending these insights to continuous-time ergodic processes in [76]. Subsequent work has refined and broadened these findings; see, for example, [77,78,79,80]. Collectively, this evolving literature underscores the versatility and strength of wavelet-based methods in a wide variety of applications.
  • Contributions of this paper. In this study, we seek to establish the optimal almost sure convergence rate of wavelet estimators for each component function in partially linear additive regression models (PLAM). Our work substantially extends and refines earlier theoretical findings, in particular those reported by [43,80,81,82]. Rather than merely synthesizing existing methodologies, we rigorously examine the dual challenges posed by mixing dependence structures—commonly arising in both time series and spatial data—and the complexities introduced by wavelet expansions. To meet these challenges, we employ advanced large-sample theory, the mixing conditions proposed by [71], and the sophisticated wavelet expansion techniques discussed in [79]. By integrating these elements, we can derive optimal almost sure convergence results that remain valid beyond the standard assumption of independent and identically distributed observations. Crucially, even in the simpler i.i.d. setting, the optimal almost sure convergence rate of wavelet estimators for component functions in additive regression models has not been fully determined. Addressing this unresolved issue constitutes a key contribution to our work and highlights the significant theoretical and technical developments required to advance the field. In previous works [71,75,76,78,79,83], wavelet-based techniques have been employed to address multivariate regression. However, in these studies, the convergence rate was shown to depend on the dimension of the covariates, causing performance to deteriorate as the number of covariates increases. As a result, larger sample sizes become necessary to draw robust conclusions. In contrast, the principal contribution of the present paper lies in achieving a dimension-free convergence rate, which offers a significant advantage in practical applications.
  • Paper organization. The subsequent sections provide the essential background, introduce the primary model, and present the theoretical findings. In Section 2, we offer a concise overview of wavelets, Besov spaces, and wavelet estimators, which collectively serve as the foundational framework for our later developments. Section 3 then elaborates on the partially linear additive model and specifies the estimators that constitute the core of our analysis. Next, Section 4 outlines the principal assumptions and formally states our main theoretical contributions, including detailed assertions regarding the asymptotic normality of each additive component. Section 5 concludes by underscoring the key results and proposing directions for further study. To maintain a coherent narrative, the complete proofs of the central theorems are deferred to Section 6, while the Appendix A consolidates additional lemmas and other technical derivations, thereby preserving the continuity of the main exposition.

2. Wavelets and Besov Space

In this section, we introduce the notation associated with wavelets and Besov spaces. We begin by providing a brief overview of wavelet multiresolution theory as developed in [49], and further elaborated in [70,71,72]. Consider a multiresolution analysis $\{V_j\}_{j \in \mathbb{Z}}$ of the space $L_2(\mathbb{R}^d)$, where $L_p(\mathbb{R}^d)$ (for $1 \le p < \infty$) is the set of all measurable functions $f : \mathbb{R}^d \to \mathbb{R}$ satisfying
$\int_{\mathbb{R}^d} |f(x)|^p \, dx < \infty$
and is equipped with the norm
$\|f\|_{L_p} = \left( \int_{\mathbb{R}^d} |f(x)|^p \, dx \right)^{1/p}.$
A multiresolution analysis is defined as a decomposition of $L_2(\mathbb{R}^d)$ into an increasing sequence of closed subspaces $\{V_j : j \in \mathbb{Z}\}$, satisfying the following properties:
1. For all integers $j$, $V_j \subset V_{j+1}$.
2. $\bigcap_{j} V_j = \{0\}$ and $\overline{\bigcup_{j} V_j} = L_2(\mathbb{R}^d)$.
3. For any $f(\cdot) \in V_j$, $f(2\,\cdot) \in V_{j+1}$, and $f(\cdot + k) \in V_j$ for all $k \in \mathbb{Z}^d$.
4. There exists a scaling function $\phi \in L_2(\mathbb{R}^d)$, normalized such that
$\int_{\mathbb{R}^d} \phi(x)\, dx = 1,$
and the set
$\{\phi_k(x) = \phi(x - k) : k \in \mathbb{Z}^d\}$
forms an orthonormal basis of $V_0$. Hence,
$V_0 = \overline{\mathrm{span}}\{\phi(\cdot - k) : k \in \mathbb{Z}^d\}.$
By scaling, we obtain an orthonormal basis for $V_j$:
$\{\phi_{j,k}(x) = 2^{jd/2}\, \phi(2^{j} x - k) : k \in \mathbb{Z}^d\}.$
Throughout this paper, we assume that $\phi(\cdot)$ is $r$-regular ($r \ge 1$), meaning that $\phi \in C^{r}(\mathbb{R}^d)$ and satisfies, for each $i > 0$ and all multi-indices $\beta = (\beta_1, \ldots, \beta_d)$ with
$|\beta| = \sum_{i=1}^{d} \beta_i \le r,$
the bound
$|D^{\beta}\phi(x)| \le \frac{C_i}{(1 + \|x\|)^{i}},$
where
$D^{\beta}\phi(x) = \frac{\partial^{|\beta|}\phi}{\partial x_1^{\beta_1} \cdots \partial x_d^{\beta_d}}(x),$
and $C_i$ is a constant depending only on $i$.
Define the wavelet space $W_j$ by
$V_j \oplus W_j = V_{j+1}.$
Following the construction in [50], one obtains $N = 2^{d} - 1$ wavelet functions
$\{\psi_i : i = 1, \ldots, N\}$
associated with the scaling function $\phi$, such that the following applies:
1. Orthonormal Basis for $W_0$: $\{\psi_i(x - k) : k \in \mathbb{Z}^d,\ i = 1, \ldots, N\}$ forms an orthonormal basis for $W_0$.
2. Orthonormal Basis for $L_2(\mathbb{R}^d)$: $\{\psi_{i,j,k}(x) = 2^{jd/2}\, \psi_i(2^{j} x - k) : j \in \mathbb{Z},\ k \in \mathbb{Z}^d,\ i = 1, \ldots, N\}$ forms an orthonormal basis for $L_2(\mathbb{R}^d)$.
3. Regularity and Compact Support: Each $\psi_i(\cdot)$ shares the same regularity as $\phi(\cdot)$, and both $\phi$ and the $\psi_i$ have compact support contained within $[-L, L]^d$ for some $L > 0$.
Starting from some integer $j_0$, any $f(\cdot) \in L_2(\mathbb{R}^d)$ can be expanded as
$f(x) = \sum_{k \in \mathbb{Z}^d} a_{j_0,k}\, \phi_{j_0,k}(x) + \sum_{i=1}^{2^{d}-1} \sum_{j=j_0}^{\infty} \sum_{k \in \mathbb{Z}^d} b_{i,j,k}\, \psi_{i,j,k}(x),$
where
$a_{j_0,k} = \int_{\mathbb{R}^d} f(u)\, \phi_{j_0,k}(u)\, du,$
and
$b_{i,j,k} = \int_{\mathbb{R}^d} f(u)\, \psi_{i,j,k}(u)\, du.$
For any $l \ge j_0$, the orthogonal projection of $f$ onto $V_l$ can be written in two equivalent forms:
$(P_{V_l} f)(x) = \sum_{k \in \mathbb{Z}^d} a_{l,k}\, \phi_{l,k}(x) = \sum_{k \in \mathbb{Z}^d} a_{j_0,k}\, \phi_{j_0,k}(x) + \sum_{j=j_0}^{l} \sum_{i=1}^{N} \sum_{k \in \mathbb{Z}^d} b_{i,j,k}\, \psi_{i,j,k}(x).$
Besov spaces $B_{s,p,q}$ are of central importance in statistical estimation and approximation theory as they provide a flexible framework for characterizing smoothness properties; for instance, see [84,85,86,87,88]. They include a wide variety of both homogeneous and inhomogeneous function spaces frequently employed in statistical analysis. Using wavelet coefficients, ref. [49] characterizes the Besov space as follows. Consider a real-valued smoothness parameter $0 < s < r$ and let $f \in L_p(\mathbb{R}^d)$. Then $f \in B_{s,p,q}$ if and only if
(B.1) $J_{s,p,q}(f) = \|P_{V_0} f\|_{L_p} + \Big( \sum_{j \ge 0} \big( 2^{js} \|P_{W_j} f\|_{L_p} \big)^{q} \Big)^{1/q} < \infty,$
or, equivalently,
(B.2) $J_{s,p,q}(f) = \|a_{0\cdot}\|_{l_p} + \Big( \sum_{j \ge 0} \big( 2^{j(s + d(1/2 - 1/p))} \|b_{j\cdot}\|_{l_p} \big)^{q} \Big)^{1/q} < \infty,$
where
$\|a_{0\cdot}\|_{l_p} = \Big( \sum_{k \in \mathbb{Z}^d} |a_{0,k}|^{p} \Big)^{1/p},$
and
$\|b_{j\cdot}\|_{l_p} = \Big( \sum_{i=1}^{N} \sum_{k \in \mathbb{Z}^d} |b_{i,j,k}|^{p} \Big)^{1/p}.$
For $q = \infty$, the usual sup-norm modification applies. Additional equivalent characterizations and advantages of Besov spaces in approximation theory and statistics can be found in [56,89,90,91] and in Appendix A.
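Two standard special cases may help the reader situate this scale (classical facts about Besov spaces, recalled here rather than taken from the text above): $B_{s,2,2}(\mathbb{R}^d)$ coincides with the $L_2$-Sobolev space $H^{s}(\mathbb{R}^d)$, and, for non-integer $s > 0$, $B_{s,\infty,\infty}(\mathbb{R}^d)$ coincides with the classical Hölder space of order $s$.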

Linear Wavelets Regression Estimators

We begin by introducing the notation and definitions used in the following sections, starting with a reminder of the definition of strong mixing.
Let $\mathcal{F}_i^{k}(Z)$ denote the $\sigma$-algebra generated by the random variables $\{Z_j : i \le j \le k\}$. The strong mixing coefficient, as introduced by [92], is defined as follows; for additional details, the reader may consult [93].
Definition 1.
Let $Z = \{Z_i : i = 1, 2, \ldots\}$ be a strictly stationary sequence of random variables. For a positive integer $n$, we set
$\alpha(n) = \sup\big\{ |P(A \cap B) - P(A)P(B)| : A \in \mathcal{F}_1^{k}(Z),\ B \in \mathcal{F}_{k+n}^{\infty}(Z),\ k \in \mathbb{N}^{*} \big\}.$
The sequence $Z$ is said to be $\alpha$-mixing (or strongly mixing) if the mixing coefficient $\alpha(n)$ converges to 0 as $n \to \infty$.
In the study of mixing conditions, α -mixing emerges as a relatively weak yet broadly applicable criterion, encompassing a wide range of stochastic processes, including many time series models. The seminal contributions of [94,95] established sufficient conditions for linear processes to be α -mixing. These works further demonstrated that both linear autoregressive and bilinear models possess strong mixing properties, with mixing coefficients that decay at an exponential rate. In addition, ref. [96] highlighted the critical role of α -mixing in understanding nonlinear time series models, including aspects of geometric ergodicity, a concept further explored and extended in [75]. Along similar lines, ref. [97] showed that functional autoregressive processes can achieve geometric ergodicity under appropriate conditions.
Moreover, the studies [98,99] demonstrated that, even with minimal assumptions, autoregressive conditional heteroscedastic (ARCH) processes and nonlinear additive autoregressive models incorporating exogenous variables remain both stationary and α -mixing.
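As a concrete toy example of such a process (relying only on the cited fact that linear autoregressive models are strongly mixing with exponentially decaying coefficients; the specific parameters below are illustrative), one may simulate a stationary Gaussian AR(1) sequence in Python:

```python
import numpy as np

# Stationary Gaussian AR(1): X_t = rho * X_{t-1} + e_t, |rho| < 1.
# By the results cited above, such linear autoregressive processes are
# alpha-mixing with geometrically decaying mixing coefficients.
rng = np.random.default_rng(0)
rho, n = 0.5, 10_000
x = np.empty(n)
x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - rho**2))  # draw from the stationary law
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()
```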
Next, let $\{(X_i, Y_i)\}$ be a jointly stationary process, and let $\Psi(\cdot)$ be a Borel measurable function on the real line. Define
$m(x, \Psi) = \mathbb{E}\big[ \Psi(Y_1) \mid X_1 = x \big], \qquad (5)$
for $x \in \mathbb{R}^d$, whenever it exists. That is, we assume
$\mathbb{E}\,|\Psi(Y_1)| < \infty.$
Our framework includes several special cases of the regression model (5), depending on the choice of the function $\Psi(\cdot)$. Notable examples include the following:
  • $\Psi(Y) = \mathbb{1}_{\{Y \le y\}}$, which yields the conditional distribution function of $Y_1$ given $X_1 = x$.
  • $\Psi(Y) = Y^{k}$, which provides the $k$-th conditional moment of $Y_1$ given $X_1 = x$.
Throughout, we shall assume
$\mathbb{E}\,|\Psi(Y_1)|^{p} < \infty, \qquad p \ge 2.$
Following the approach of previous works, our estimator of $m(\Psi, x)$ will be obtained as the ratio of wavelet estimators of
$g(\Psi, x) = m(x, \Psi)\, f_X(x)$
and $f_X(x)$, where $f_X(x)$ is the unknown density function of $X$. First, we recall that from (4), the linear wavelet estimator of $f_X(\cdot) \in L_2(\mathbb{R}^d)$ is given by
$\widehat{f}_X(x) = \sum_{k \in \mathbb{Z}^d} \widehat{a}_{\tau,k}\, \phi_{\tau,k}(x),$
or equivalently,
$\widehat{f}_X(x) = \sum_{k \in \mathbb{Z}^d} \widehat{a}_{j_0,k}\, \phi_{j_0,k}(x) + \sum_{j=j_0}^{\tau} \sum_{i=1}^{N} \sum_{k \in \mathbb{Z}^d} \widehat{b}_{ijk}\, \psi_{i,j,k}(x),$
where the estimators of the coefficients $\{a_{\tau,k}\}$ and $\{b_{ijk}\}$ are given by
$\widehat{a}_{\tau,k} = \frac{1}{n} \sum_{i=1}^{n} \phi_{\tau,k}(X_i), \quad \text{and} \quad \widehat{b}_{ijk} = \frac{1}{n} \sum_{i=1}^{n} \psi_{i,j,k}(X_i),$
for any fixed $j_0 \le \tau$, where $\tau = \lambda(n)$ represents the resolution level, a strictly positive integer depending on $n$ and tending to infinity at a specified rate.
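As an illustration of these empirical coefficients, the following Python sketch implements the linear density estimator $\widehat{f}_X$ at a single resolution $\tau$ for $d = 1$ with the Haar scaling function $\phi = \mathbb{1}_{[0,1)}$ (an illustrative choice; the paper's setting is multivariate and $r$-regular):

```python
import numpy as np

def haar_linear_density(x_grid, sample, tau):
    """Linear wavelet density estimator f_hat(x) = sum_k a_hat_{tau,k} phi_{tau,k}(x)
    with the Haar scaling function phi = 1_[0,1), so that
    a_hat_{tau,k} = (1/n) sum_i phi_{tau,k}(X_i) = 2^{tau/2} * (#{X_i in bin k} / n)."""
    n = len(sample)
    scale = 2.0 ** tau
    k_sample = np.floor(scale * np.asarray(sample)).astype(int)
    ks, counts = np.unique(k_sample, return_counts=True)
    a_hat = dict(zip(ks, np.sqrt(scale) * counts / n))      # a_hat_{tau,k}
    k_grid = np.floor(scale * np.asarray(x_grid)).astype(int)
    # f_hat(x) = a_hat_{tau,k(x)} * phi_{tau,k(x)}(x) = a_hat_{tau,k(x)} * 2^{tau/2}
    return np.array([np.sqrt(scale) * a_hat.get(k, 0.0) for k in k_grid])

# Usage: estimate a standard normal density on a coarse grid at resolution tau = 3.
rng = np.random.default_rng(1)
data = rng.normal(size=2000)
print(haar_linear_density(np.linspace(-3, 3, 7), data, tau=3))
```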
By the regularity and compact support conditions on $\phi(\cdot)$ and $\psi(\cdot)$, the above sums are finite for any fixed $x$, which is crucial for practical implementation. The supports of $\phi(\cdot)$ and $\psi_i(\cdot)$ grow only as a function of their degree of differentiability; see [50]. Next, since $g(\Psi, \cdot) \in L_2(\mathbb{R}^d)$, it admits a wavelet expansion as well. Thus, a linear wavelet estimator of $g(\Psi, x)$ can be written as
$\widehat{g}_n(\Psi, x) = \sum_{k \in \mathbb{Z}^d} \widehat{a}_{\tau,k}\, \phi_{\tau,k}(x),$
or equivalently,
$\widehat{g}_n(\Psi, x) = \sum_{k \in \mathbb{Z}^d} \widehat{a}_{j_0,k}\, \phi_{j_0,k}(x) + \sum_{j=j_0}^{\tau} \sum_{i=1}^{N} \sum_{k \in \mathbb{Z}^d} \widehat{b}_{ijk}\, \psi_{i,j,k}(x),$
where the coefficient estimators are
$\widehat{a}_{\tau,k} = \frac{1}{n} \sum_{i=1}^{n} \Psi(Y_i)\, \phi_{\tau,k}(X_i),$
and
$\widehat{b}_{ijk} = \frac{1}{n} \sum_{i=1}^{n} \Psi(Y_i)\, \psi_{i,j,k}(X_i),$
for any $j_0 \le \tau$. From a practical perspective, the multiresolution properties of wavelets lead to computationally and memory-efficient estimation algorithms. In particular, the bandwidth is chosen in the form $2^{-j}$, where $j$ can be selected from a small set of values (e.g., three or four); see [57].
Returning to the wavelet-based estimation, for $u, v \in \mathbb{R}^d$, we define the kernel $K(u, v)$ by
$K(u, v) := \sum_{k \in \mathbb{Z}^d} \phi(u - k)\, \phi(v - k). \qquad (12)$
Since
$|\phi(x)| \le \frac{A_{d+1}}{(1 + \|x\|)^{d+1}},$
the kernel function $K(\cdot, \cdot)$ defined in (12) converges uniformly in $u, v \in \mathbb{R}^d$. In particular,
$\sup_{u, v \in \mathbb{R}^d} \sum_{k \in \mathbb{Z}^d} \big| \phi(u - k)\, \phi(v - k) \big| < \infty.$
Moreover, for any $j \ge 1$, there exists a constant $C_j$ such that ([49], p. 33)
$|K(v, u)| \le \frac{C_j}{(1 + \|v - u\|)^{j}}. \qquad (14)$
From (14), it follows that
$\int_{\mathbb{R}^d} |K(v, u)|^{j}\, dv \le G_j(d),$
where
$G_j(d) = \frac{2\, \pi^{d/2}\, \Gamma(d)\, \Gamma\big( j + d(j-1) \big)}{\Gamma(d/2)\, \Gamma\big( (d+1)\, j \big)}\, C_{d+1}^{\,j},$
and $\Gamma(t)$ denotes the Gamma function,
$\Gamma(t) := \int_{0}^{\infty} y^{t-1} e^{-y}\, dy.$
Furthermore, under the assumptions on $\phi(\cdot)$, for any partial derivative of order $|\beta| = 1$, we have ([49], p. 33)
$\left| \frac{\partial K(u, v)}{\partial u_i} \right| \le \frac{C_2}{(1 + \|u - v\|)^{2}}, \qquad i = 1, \ldots, d.$
Combining (7), (9), and (12), we see that the linear estimate of the regression function $\widehat{m}_n(\Psi, x)$ can be expressed as an extended kernel estimator based on the weights
$K_{h_n, x}(X_i) = \frac{1}{h_n^{d}}\, K\!\left( \frac{x}{h_n},\, \frac{X_i}{h_n} \right), \quad \text{with } h_n = 2^{-\lambda(n)}.$
Thus, we have
$\widehat{m}_n(\Psi, x) = \begin{cases} \dfrac{\widehat{g}_n(\Psi, x)}{\widehat{f}_X(x)} = \dfrac{\widehat{m}_{2,n}(\Psi, x)}{\widehat{m}_{1,n}(x)}, & \text{if } \widehat{f}_X(x) \neq 0, \\[1ex] \dfrac{1}{n} \sum_{i=1}^{n} \Psi(Y_i), & \text{otherwise.} \end{cases}$
Here,
$\widehat{m}_{1,n}(x) = \frac{\sum_{i=1}^{n} K_{h_n, x}(X_i)}{n\, \mathbb{E}\big[ K_{h_n, x}(X_1) \big]},$
and
$\widehat{m}_{2,n}(\Psi, x) = \frac{\sum_{i=1}^{n} \Psi(Y_i)\, K_{h_n, x}(X_i)}{n\, \mathbb{E}\big[ K_{h_n, x}(X_1) \big]}.$
This formulation neatly integrates wavelet-based estimation of regression functions with kernel-like estimation procedures.
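A minimal sketch of this ratio estimator, again for $d = 1$ with the Haar scaling function (for which $K(u, v) = \sum_k \phi(u - k)\phi(v - k) = \mathbb{1}\{\lfloor u \rfloor = \lfloor v \rfloor\}$, so the weights reduce to dyadic-bin indicators), is given below; the transform $\Psi$ and the fallback branch mirror the display above, while all other choices are illustrative assumptions.

```python
import numpy as np

def haar_wavelet_regression(x_grid, X, Y, lam, psi=lambda y: y):
    """Ratio-form linear wavelet regression estimator for d = 1.  With the Haar
    scaling function, K(u, v) = 1{floor(u) == floor(v)}, so the estimator is a
    bin-wise average of psi(Y) at resolution h_n = 2^(-lam).  Illustrative
    sketch only, not the paper's general multivariate construction."""
    h = 2.0 ** (-lam)
    bins_X = np.floor(np.asarray(X) / h).astype(int)
    psi_Y = psi(np.asarray(Y, dtype=float))
    out = np.empty(len(x_grid))
    for j, x in enumerate(x_grid):
        mask = bins_X == int(np.floor(x / h))
        # Fallback branch of the display above: unconditional mean of psi(Y).
        out[j] = psi_Y[mask].mean() if mask.any() else psi_Y.mean()
    return out

# Usage: regress Y = sin(2*pi*X) + noise on X ~ U(0, 1) at resolution lam = 4.
rng = np.random.default_rng(2)
X = rng.uniform(size=3000)
Y = np.sin(2.0 * np.pi * X) + 0.3 * rng.normal(size=3000)
print(haar_wavelet_regression(np.linspace(0.05, 0.95, 5), X, Y, lam=4))
```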

3. Presentation of Estimators

Consider a stationary sequence $\{(X_i, Y_i, Z_i)\}_{i \ge 1}$ of random replicates of the random vector $(X, Y, Z)$. Let $f_{XYZ}(\cdot)$ be the joint density of $(X, Y, Z)$ with respect to the Lebesgue measure, and let $f_X(\cdot)$, $f_Y(\cdot)$, and $f_Z(\cdot)$ denote the corresponding marginal densities. For any $x \in \mathbb{R}^d$, we define the kernel estimator of $f_X(\cdot)$ by
$f_n(x) = \frac{2^{\lambda(n) d}}{n} \sum_{i=1}^{n} K\big( 2^{\lambda(n)} x,\, 2^{\lambda(n)} X_i \big),$
where $K(\cdot, \cdot)$ is a bounded, non-negative kernel function integrating to 1, and $\lambda(n)$ controls the bandwidth growth rate.
For simplicity in what follows, we assume $m_0 = 0$. Observing that model (1) can be recast, for a measurable function $\Psi(\cdot) \in \mathcal{F}$, as
$\Psi(Y) - Z^{\top}\beta = m(\Psi, X) + \varepsilon, \qquad (16)$
we propose an internal estimator of the nonparametric component in (16). For any $x \in \mathbb{R}^d$, this estimator is given by
$\widehat{m}_n^{\beta}(\Psi, x, h) = \sum_{i=1}^{n} \frac{2^{\lambda(n) d}\, \big( \Psi(Y_i) - Z_i^{\top}\beta \big)}{n\, f_n(X_i)}\, K\big( 2^{\lambda(n)} x,\, 2^{\lambda(n)} X_i \big), \qquad (17)$
where $h = 2^{-\lambda(n)}$ is a smoothing parameter tending to zero as specified by Condition (H.2).
Since $\widehat{m}_n^{\beta}(\cdot)$ depends on the unknown $\beta$, we recognize that the true function $m(\cdot)$ in (16) may also naturally depend on $\beta$. In particular, when $m(\cdot)$ has an additive form, we can write
$m_{add}^{\beta}(\Psi, x) = \sum_{l=1}^{d} m_l^{\beta}(\Psi, x_l),$
where $x_l$ is the $l$-th component of $x$. To ensure identifiability of each additive component $m_l(\cdot)$, we impose
$\mathbb{E}\big[ m_l^{\beta}(\Psi, X_l) \big] = 0 \quad \text{for } 1 \le l \le d.$
There are various approaches to estimating the additive components in regression models. Two notable examples are the marginal integration technique (see, for instance, [100,101,102,103,104]) and back-fitting algorithms (cf. [105,106], and references therein). While back-fitting poses more intricate asymptotic challenges (though progress is reported in [107,108]), marginal integration methods tend to exhibit more transparent asymptotic properties, making them particularly appealing in statistical and econometric contexts.
Guided by these advantages, we adopt a marginal integration framework to estimate the linear parameter β through a least squares criterion. Crucially, our estimators require only kernel functions, smoothing parameters, and the density estimators, avoiding the need for complex optimization techniques or weighting schemes that depend on unknown quantities (as is sometimes required in other methods; see, e.g., [109]).

3.1. The Marginal Integration Method

In this work, we rely on marginal integration to build our estimators. To outline this approach, let us first establish some notation. For $1 \le l \le d$, let
$x_{-l} = (x_1, \ldots, x_{l-1}, x_{l+1}, \ldots, x_d).$
We then define the functions
$q_{-l}(x_{-l}) = \prod_{\substack{j=1 \\ j \neq l}}^{d} q_j(x_j) \quad \text{and} \quad q(x) = \prod_{l=1}^{d} q_l(x_l),$
where $q_l$ is a known univariate density for each $l \in \{1, \ldots, d\}$. Throughout, all integrals over continuous variables are taken with respect to the Lebesgue measure. By applying marginal integration, the additive regression function can be represented as
$m_{add}^{\beta}(\Psi, x) = \sum_{l=1}^{d} \eta_l^{\beta}(\Psi, x_l) + \int_{\mathbb{R}^d} m_{add}^{\beta}(\Psi, z)\, q(z)\, dz,$
where each component $\eta_l^{\beta}$ is given by
$\eta_l^{\beta}(\Psi, x_l) = \int_{\mathbb{R}^{d-1}} m_{add}^{\beta}(\Psi, x)\, q_{-l}(x_{-l})\, dx_{-l} - \int_{\mathbb{R}^d} m_{add}^{\beta}(\Psi, x)\, q(x)\, dx.$
For notational convenience, we will often write $m_{add}(\cdot)$ instead of $m_{add}^{\beta}(\cdot)$ and $\eta_l(\cdot)$ instead of $\eta_l^{\beta}(\cdot)$.
Given the parameter $\beta$, we define an estimator of $m_{add}(\cdot)$ via marginal integration as follows:
$\widehat{m}_{add}^{\beta}(\Psi, x) = \sum_{l=1}^{d} \widehat{\eta}_l^{\beta}(\Psi, x_l) + \int_{\mathbb{R}^d} \widehat{m}_n^{\beta}(\Psi, z)\, q(z)\, dz,$
where
$\widehat{\eta}_l^{\beta}(\Psi, x_l) = \int_{\mathbb{R}^{d-1}} \widehat{m}_n^{\beta}(\Psi, x)\, q_{-l}(x_{-l})\, dx_{-l} - \int_{\mathbb{R}^d} \widehat{m}_n^{\beta}(\Psi, x)\, q(x)\, dx.$
Throughout the remainder of the paper, we simplify the notation by writing $\widehat{m}_{add}(\cdot)$ for $\widehat{m}_{add}^{\beta}(\cdot)$, $\widehat{m}_n(\cdot)$ for $\widehat{m}_n^{\beta}(\cdot)$, and $\widehat{\eta}_l(\cdot)$ for $\widehat{\eta}_l^{\beta}(\cdot)$. Consequently, once $\beta$ has been estimated, we obtain a fully specified estimator $\widehat{m}_{add}(\cdot)$ of the additive regression function.
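The following Python sketch shows the marginal-integration step itself for $d = 2$, taking as input any pilot estimate $\widehat{m}_n^{\beta}$ evaluated on a rectangular grid; the grid-based quadrature and the specific integrating densities are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def marginal_integration_components(m_hat, x1, x2, q1, q2):
    """Marginal-integration step for d = 2.  `m_hat` is a pilot estimate of the
    regression surface on the grid x1 (rows) by x2 (columns); q1 and q2 are the
    known integrating densities evaluated on x1 and x2.  Returns the centred
    components eta_1, eta_2 and the integrated constant, mirroring the displays
    above (grid-based quadrature, illustrative only)."""
    # Constant term: integral of m_hat against q(x) = q1(x1) * q2(x2).
    const = np.trapz(np.trapz(m_hat * q2[None, :], x2, axis=1) * q1, x1)
    # eta_1(x1) = int m_hat(x1, z2) q2(z2) dz2 - const, and symmetrically for eta_2.
    eta1 = np.trapz(m_hat * q2[None, :], x2, axis=1) - const
    eta2 = np.trapz(m_hat * q1[:, None], x1, axis=0) - const
    return eta1, eta2, const

# Usage with uniform integrating densities on [0, 1] x [0, 1]:
x1 = np.linspace(0.0, 1.0, 101)
x2 = np.linspace(0.0, 1.0, 101)
m_pilot = np.sin(2.0 * np.pi * x1)[:, None] + (x2**2 - 1.0 / 3.0)[None, :]
eta1, eta2, c = marginal_integration_components(m_pilot, x1, x2,
                                                np.ones_like(x1), np.ones_like(x2))
```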

3.2. Estimation of the Model’s Parameter

Consider the following partially linear additive regression model:
$\Psi(Y) = Z^{\top}\beta + m_{add}(X) + \varepsilon.$
Throughout the subsequent discussion, for $l = 1, \ldots, d$, we define
$K_l(u_l) = \int_{\mathbb{R}^{d-1}} K(u)\, du_{-l} \quad \text{and} \quad K_{-l}(u_{-l}) = \int_{\mathbb{R}} K(u)\, du_l.$
Following the methodology outlined in [42], we estimate $\beta$ using
$\widehat{\beta} = \big[ \widetilde{Z}^{\top} \widetilde{Z} \big]^{-1} \widetilde{Z}^{\top} \widetilde{Y},$
where
$\widetilde{Y} = \Big( \Psi(Y_i) - \sum_{j=1}^{n} W_{nj}(X_i)\, \Psi(Y_j) \Big)_{1 \le i \le n},$
$\widetilde{Z} = \Big( Z_i - \sum_{j=1}^{n} W_{nj}(X_i)\, Z_j \Big)_{1 \le i \le n},$
$W_{nj}(X_i) = \frac{U_{nj}(X_i)}{n\, f_n(X_j)},$
and
$U_{nj}(X_i) = \sum_{l=1}^{d} 2^{\lambda(n)}\, K_l\big( 2^{\lambda(n)} X_{l,i},\, 2^{\lambda(n)} X_{l,j} \big)\, D_l - (d - 1) \int_{\mathbb{R}^d} 2^{\lambda(n) d}\, K\big( 2^{\lambda(n)} z,\, 2^{\lambda(n)} X_j \big)\, q(z)\, dz.$
Here,
$D_l = \int_{\mathbb{R}^{d-1}} 2^{\lambda(n)(d-1)}\, K_{-l}\big( 2^{\lambda(n)} z_{-l},\, 2^{\lambda(n)} X_{-l,j} \big)\, q_{-l}(z_{-l})\, dz_{-l}.$
For additional information regarding the estimation of $\beta$, we refer the reader to [16]. Finally, to estimate the regression function and the additive components, we use
$\widehat{m}_{add}(\Psi, x, h) = \sum_{l=1}^{d} \widehat{\eta}_l(\Psi, x_l, h) + \int_{\mathbb{R}^d} \widehat{m}_n(\Psi, z)\, q(z)\, dz,$
and
$\widehat{\eta}_l(\Psi, x_l, h) = \int_{\mathbb{R}^{d-1}} \widehat{m}_n(\Psi, x)\, q_{-l}(x_{-l})\, dx_{-l} - \int_{\mathbb{R}^d} \widehat{m}_n(\Psi, x)\, q(x)\, dx.$
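In matrix form, the least-squares step above amounts to ordinary least squares after partialling out the nonparametric part with the smoother weights $W_{nj}(X_i)$; a hedged sketch (the weight matrix $W$ is assumed to have already been built from the wavelet kernels as described above) is:

```python
import numpy as np

def profile_beta(Z, psi_Y, W):
    """Least-squares step from the display above:
    beta_hat = (Z_tilde' Z_tilde)^{-1} Z_tilde' Y_tilde,
    where W[i, j] stands in for the smoother weight W_nj(X_i) built from the
    wavelet kernels; the construction of W itself is not reproduced here."""
    Y_tilde = psi_Y - W @ psi_Y   # Psi(Y_i) - sum_j W_nj(X_i) Psi(Y_j)
    Z_tilde = Z - W @ Z           # Z_i     - sum_j W_nj(X_i) Z_j
    return np.linalg.solve(Z_tilde.T @ Z_tilde, Z_tilde.T @ Y_tilde)
```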

4. Main Results

To present our findings, we introduce an additional structural assumption on the model. Specifically, we assume that
$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \widetilde{Z}_i \widetilde{Z}_i^{\top} = B \quad \text{a.s.},$
where $B$ is a $p \times p$ positive-definite matrix. Let
$I^{d} = \prod_{l=1}^{d} I_l = \prod_{l=1}^{d} [a_l, c_l],$
and
$J^{d} = \prod_{l=1}^{d} J_l = \prod_{l=1}^{d} [a_l^{\prime}, c_l^{\prime}],$
be two fixed compact rectangles of $\mathbb{R}^d$ such that
$a_l^{\prime} < a_l < c_l < c_l^{\prime}, \quad \text{for all } 1 \le l \le d.$
Additional assumptions regarding the density of $X$, the regression function $m(\cdot)$, the kernel functions, and the smoothing parameters are collected below for ease of reference and the reader’s convenience. The first set of conditions concerns the density function $f_X(\cdot)$ and the regression function $m(\cdot)$.
(G.1) The density $f_X(\cdot)$ is continuous and bounded away from 0 on $J^{d}$.
(G.2) One of the following holds:
(i) $f_X(\cdot)$ is bounded and uniformly continuous on $\mathbb{R}^d$, or
(ii) $f_X(\cdot)$ belongs to a Besov space $B_{s,p,q}$ for some $0 < s < r$, $1 \le p, q \le \infty$.
(G.3) For all $x, y$, we have
$| f_j(x, y) - f_X(x)\, f_X(y) | \le M < \infty \quad \text{for all } j \ge 1,$
where $f_j(x, y)$ is the joint probability density function of the vector $(X_0, X_j)$, assumed to exist.
(G.4)
(i) For all $1 \le l \le d$, the functions $m_l^{\beta}(\Psi, \cdot)$ are Lipschitz continuous of order $s - \frac{1}{p}$.
(ii) $\sup_{x \in I^{d}} g(\Psi, x) < \infty$.
(iii) The function $H(\cdot)$ belongs to the Besov space $B_{s,p,q}$ for some $0 < s < r$, $1 \le p, q \le \infty$.
(G.5) The process $\{(X_i, Y_i, Z_i)\}$ is $\alpha$-mixing with a mixing coefficient $\alpha(s)$ satisfying
$\alpha(s) = O(s^{-\nu}), \quad \text{where } \nu > 2d + 3.$
In what follows, let $\lambda(n)$ denote a sequence of positive constants satisfying the following conditions:
(H.1) As $n \to \infty$,
$n^{1/2}\lambda(n) \to \infty, \quad 2^{\lambda(n)} \to \infty, \quad n\, 2^{-\lambda(n) a} \to 0,$
and
$\big( n\, 2^{-\lambda(n)} \big)^{1/2} \big( n^{-1} \log n \big)^{\frac{s}{d + 2s}} \to 0.$
(H.2) As $n \to \infty$,
$n^{-1}\, 2^{\lambda(n)}\, \log n\, \log_2(n) \to 0.$
(H.3) As $n \to \infty$,
$\frac{2^{3\lambda(n)}}{n} \left( \frac{n}{\log n} \right)^{a\left(s - \frac{1}{p}\right)} \to 0.$
Next, we state assumptions on the random variables $Z$ and $Y$.
(M.1) $\Psi(Y)$ and $Z$ are bounded.
To present our main results, we impose conditions on the density functions $q_l(\cdot)$ for $1 \le l \le d$.
(Q.1) Each $q_l(\cdot)$ is bounded and continuous for all $1 \le l \le d$. Moreover, $q_l(\cdot)$ is bounded and uniformly continuous on $\mathbb{R}$ for all $1 \le l \le d$.
Alternatively, we assume the following:
(Q.2) The function $q_l(\cdot)$ belongs to a Besov space $B_{s,p,q}$ for some $0 < s < r$, $1 \le p, q \le \infty$, for all $1 \le l \le d$.
(R.1) The multiresolution analysis is $r$-regular.
Our principal finding is encapsulated in the following result.
Theorem 1. 
Suppose that all the conditions (G.1)–(G.5), (H.1)–(H.3), (M.1), (Q.1), (Q.2), and (R.1) are satisfied. Then, for a given measurable function $\Psi(\cdot) \in \mathcal{F}$, it holds, almost surely, that
$\sup_{x \in \mathcal{C}} \Big| \widehat{m}_{add}^{\widehat{\beta}}(\Psi, x) - m(x) \Big| = O\!\left( \left( \frac{\log n}{n} \right)^{a\left(s - \frac{1}{p}\right)} \right).$
Remark 1. 
It is noteworthy that the quantity $2^{-\lambda(n)}$ appearing in the preceding theorem corresponds directly to the bandwidth $h_n$ used in Parzen–Rosenblatt convolution kernel density estimators. From a practical perspective, selecting the multiresolution level $\tau_n$ associated with the wavelets is often more straightforward than choosing the bandwidth $h_n$ itself. Indeed, only a relatively small number of candidate values for $\lambda(n)$ (e.g., three or four) need to be considered in practice, thus rendering the procedure more accessible and computationally feasible.
Remark 2. 
It is important to emphasize that the precise logarithmic convergence rates are influenced by both the chosen resolution level and the underlying smoothness parameters s associated with f ( · ) , as posited in (G.2). The function f ( · ) lies within a Besov-type space B s , p , q , which allows for more general and subtle smoothness characterizations than classical integer-order differentiability conditions. Such complexity in smoothness assumptions is well-known in nonparametric estimation and aligns with established results in the literature. While s may be unknown in practice, these convergence rates are standard and commonly observed across nonparametric procedures. To render the estimator fully data-driven, various adaptive techniques have been proposed for selecting the resolution level τ, including methods inspired by Stein’s unbiased risk estimation, heuristic “rules of thumb,” and cross-validation approaches. For detailed discussions on the design and analysis of asymptotically optimal bandwidth selection rules, the interested reader may refer to [53,57].
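As one concrete (and purely illustrative) way to implement the data-driven selection mentioned above, the following sketch performs $K$-fold cross-validation over a small grid of candidate resolution levels; the estimator `fit` is a hypothetical placeholder for any of the wavelet regression estimators discussed in this paper.

```python
import numpy as np

def select_resolution(X, Y, candidate_lams, fit, n_folds=5, seed=0):
    """Illustrative K-fold cross-validation over a small grid of resolution
    levels lambda(n).  `fit` is a hypothetical placeholder: any routine
    fit(X_train, Y_train, lam) returning a callable prediction function."""
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(X))
    scores = []
    for lam in candidate_lams:
        err = 0.0
        for f in range(n_folds):
            train, test = folds != f, folds == f
            predict = fit(X[train], Y[train], lam)
            err += np.mean((Y[test] - predict(X[test])) ** 2)
        scores.append(err / n_folds)
    return candidate_lams[int(np.argmin(scores))]
```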
Remark 3. 
The moment condition (M.1) can be generalized substantially. Instead of requiring boundedness, one may impose more general assumptions on the moments of $Y$, as outlined in [110]. Specifically, condition (M.1) can be replaced by one of the following weaker but more flexible alternatives:
(M.1)′ There exists a measurable envelope function $F : \mathbb{R}^q \to [0, \infty)$ such that for all $y \in \mathbb{R}^q$,
$F(y) \ge \sup_{\Psi \in \mathcal{F}} |\Psi(y)|,$
and for some $s > 2$,
$\sup_{x \in I^{d}} \mathbb{E}\big[ |F(Y)|^{s} \mid X = x \big] < \infty.$
(M.1)″ Consider a nonnegative, continuous, and nondecreasing function $\{M(x) : x \ge 0\}$ defined on $[0, \infty)$. Assume that for some $s > 2$, as $x \to \infty$:
(i) $x^{-s} M(x)$ is eventually decreasing, and (ii) $x^{-1} \log M(x)$ is eventually increasing.
For any $t \ge M(0)$, define $M^{inv}(t) \ge 0$ by
$M\big( M^{inv}(t) \big) = t.$
We further require
$\sup_{x \in I^{d}} \mathbb{E}\big[ M(|F(Y)|) \mid X = x \big] < \infty.$
Notable examples of functions $M(\cdot)$ that satisfy these criteria include the following:
1. $M(x) = x^{p}$ for some $p > 2$;
2. $M(x) = \exp(s x)$ for some $s > 0$.
In other words, assumption (M.1) may be replaced by weaker, but more intricate, finite-moment-type conditions. This change, however, significantly complicates the technical details of the proofs. For further insights and a more comprehensive discussion, the reader may consult [111]. Finally, the introduction of the function $\Psi(\cdot)$ into our framework is motivated by findings and observations stated in Remark 1.2 of [111].

4.1. The Conditional Distribution Function

By incorporating $\Psi(Y) = \mathbb{1}_{\{Y \le t\}}$, for $t \in \mathbb{R}^q$, into (17) under the assumption $\beta = 0$, we obtain the following kernel estimator of the conditional distribution function
$F(t \mid x) := \mathbb{P}\big( Y \le t \mid X = x \big),$
namely,
$\widetilde{m}_n(t \mid x) := \sum_{i=1}^{n} \frac{2^{\lambda(n) d}\, \mathbb{1}_{\{Y_i \le t\}}}{n\, f_n(X_i)}\, K\big( 2^{\lambda(n)} x,\, 2^{\lambda(n)} X_i \big).$
Ref. [112] refers to this as the conditional empirical distribution function and was the first to establish uniform consistency results for it. Next, consider the following additive structure:
$F_{add}(t \mid x) = \sum_{l=1}^{d} m_l(t, x_l).$
Following the marginal integration procedure, for any $x \in \mathbb{R}^d$, the corresponding additive regression function takes the form
$F_{add}(t \mid x) = \sum_{l=1}^{d} \eta_l(t, x_l) + \int_{\mathbb{R}^d} F_{add}(t \mid z)\, q(z)\, dz,$
where
$\eta_l(t, x_l) = \int_{\mathbb{R}^{d-1}} F_{add}(t \mid x)\, q_{-l}(x_{-l})\, dx_{-l} - \int_{\mathbb{R}^d} F_{add}(t \mid x)\, q(x)\, dx.$
To estimate both the conditional empirical distribution and the additive components, we define
$\widetilde{F}_{add}(t, x, h_n) = \sum_{l=1}^{d} \widetilde{\eta}_l(t, x_l, h_n) + \int_{\mathbb{R}^d} \widetilde{m}_n(t \mid z)\, q(z)\, dz,$
and
$\widetilde{\eta}_l(t, x_l, h_n) = \int_{\mathbb{R}^{d-1}} \widetilde{m}_n(t \mid x)\, q_{-l}(x_{-l})\, dx_{-l} - \int_{\mathbb{R}^d} \widetilde{m}_n(t \mid x)\, q(x)\, dx.$
By applying Theorem 1, we obtain the uniform convergence rate
$\sup_{x \in \mathcal{C}} \Big| \widetilde{F}_{add}(t, x, h) - F_{add}(t \mid x) \Big| = O\!\left( \left( \frac{\log n}{n} \right)^{a\left(s - \frac{1}{p}\right)} \right).$
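A small sketch of this conditional distribution estimator for $d = 1$, using the Haar reduction described earlier (so the estimate is simply a local empirical c.d.f. over the dyadic bin containing $x_0$; an illustrative simplification, not the general construction):

```python
import numpy as np

def cond_ecdf(t_grid, x0, X, Y, lam):
    """Conditional distribution estimate F(t | x0) obtained by plugging
    Psi(Y) = 1{Y <= t} into the wavelet regression estimator; with the Haar
    reduction (d = 1) this is the empirical c.d.f. of Y over the dyadic bin
    of width 2^(-lam) containing x0.  Illustrative simplification only."""
    h = 2.0 ** (-lam)
    mask = np.floor(X / h) == np.floor(x0 / h)
    Y_local = Y[mask] if mask.any() else Y   # fallback: unconditional e.c.d.f.
    return np.array([np.mean(Y_local <= t) for t in t_grid])
```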

4.2. Relative-Error Prediction

Classical regression typically estimates the operator $r(\cdot)$ by minimizing the conditional expectation of the squared error, $\mathbb{E}\big[ (Y - r(X))^2 \mid X \big]$. While widely used, this approach places equal emphasis on all observations, rendering it sensitive to outliers and potentially unsuitable in situations involving large variability in the response values.
To address these drawbacks, we focus on minimizing the mean squared relative error (MSRE) defined by
$\mathbb{E}\!\left[ \left( \frac{Y - r(X)}{Y} \right)^{2} \,\middle|\, X \right], \quad \text{with } Y > 0 \text{ almost surely}.$
This criterion offers a more robust assessment of predictive performance when the range of predicted values is extensive. Moreover, the unique feature of the MSRE-based approach is that its solution can be expressed as a ratio of the first two conditional inverse moments of $Y$. Specifically, letting
$g_{\gamma}(x) = \mathbb{E}\big[ Y^{-\gamma} \mid X = x \big], \quad \gamma = 1, 2,$
and assuming these moments exist and are finite almost surely, one can show, following [113,114,115], that the optimal MSRE predictor of $Y$ given $X$ is
$\breve{r}(x) = \frac{\mathbb{E}\big[ Y^{-1} \mid X = x \big]}{\mathbb{E}\big[ Y^{-2} \mid X = x \big]}, \quad \text{a.s.}$
By incorporating $\Psi(Y) = Y^{-\gamma}$ into (17) under the assumption $\beta = 0$, we obtain the following kernel estimator:
$\breve{r}_n(x) = \frac{\widehat{m}_{add}^{0}(y^{-1}, x)}{\widehat{m}_{add}^{0}(y^{-2}, x)}.$
By leveraging Theorem 1 for the particular cases $\Psi(y) = y^{-1}$ and $\Psi(y) = y^{-2}$, we establish the following uniform convergence rate:
$\sup_{x \in \mathcal{C}} \big| \breve{r}_n(x) - \breve{r}(x) \big| = O\!\left( \left( \frac{\log n}{n} \right)^{a\left(s - \frac{1}{p}\right)} \right).$
This result not only corroborates previous findings but also extends and enhances the analyses presented in [116].
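In the same simplified Haar setting used above ($d = 1$, $\beta = 0$, $Y > 0$), the MSRE-optimal predictor can be sketched by plugging $\Psi(y) = y^{-1}$ and $\Psi(y) = y^{-2}$ into the bin-average estimator; the construction below is illustrative only.

```python
import numpy as np

def msre_predictor(x_grid, X, Y, lam):
    """MSRE-optimal predictor r(x) = E[Y^{-1} | x] / E[Y^{-2} | x], estimated by
    plugging Psi(y) = y^{-1} and Psi(y) = y^{-2} into the same bin-average
    reduction used above (Haar, d = 1, beta = 0, Y > 0).  Illustrative only."""
    h = 2.0 ** (-lam)
    bins = np.floor(X / h).astype(int)
    out = np.empty(len(x_grid))
    for j, x in enumerate(x_grid):
        m = bins == int(np.floor(x / h))
        y = Y[m] if m.any() else Y
        out[j] = np.mean(1.0 / y) / np.mean(1.0 / y**2)
    return out
```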
Remark 4. 
It has long been recognized that kernel estimators become less reliable when the dimensionality of the data increases. This phenomenon, commonly known as the curse of dimensionality [117], emerges because, in high-dimensional settings, local regions typically require exceedingly large sample sizes to accumulate enough observations. As a result, unless sample sizes are enormous, bandwidths must be made so broad that the notion of “local” averaging is compromised. A thorough exploration of these issues, with numerical illustrations, can be found in [118], and more recent references include [119,120]. Despite their popularity, however, there remains significant uncertainty regarding the asymptotic properties of penalized splines, even in standard nonparametric settings, and only a few theoretical investigations address their performance. In parallel, it should be noted that most functional regression techniques rely on minimizing an L 2 -norm and are thus sensitive to outliers. Other less conventional but potentially valuable tools include methods using delta sequences and wavelet-based approaches.
Remark 5. 
In statistical learning for time series, it is often assumed that the underlying data-generating process exhibits some form of asymptotic independence, commonly referred to as “mixing”. While many standard processes are presumed to follow known types of mixing rates, the precise coefficients that govern these rates generally remain unspecified. Moreover, such mixing assumptions are seldom examined empirically, and there are currently no established methods to estimate mixing coefficients directly from observed data. To address this gap, ref. [121] introduces an estimator for beta-mixing coefficients based on a single stationary realization. Subsequently, ref. [122] proposes strongly consistent estimators for the l 1 -norm of both α- and β-mixing coefficients in a stationary ergodic process. These estimators then serve as the foundation for strongly consistent hypothesis tests for goodness-of-fit. In particular, ref. [122] develops tests to determine whether the α-mixing or β-mixing coefficients of a process remain bounded above by a specified rate function, under the same summability assumptions.
Remark 6. 
Nonlinear thresholding approaches yield the following estimators for $g(\Psi, x)$ and $f_X(x)$. Specifically, we define
$\widehat{f}_X(x) = \sum_{k \in \mathbb{Z}^d} \widehat{a}_{j_0,k}\, \phi_{j_0,k}(x) + \sum_{j=j_0}^{\tau} \sum_{i=1}^{N} \sum_{k \in \mathbb{Z}^d} \widehat{b}_{ijk}\, \mathbb{1}_{\{ |\widehat{b}_{ijk}| > \delta_{j,n} \}}\, \psi_{i,j,k}(x), \qquad (35)$
and
$\widehat{g}_n(\Psi, x) = \sum_{k \in \mathbb{Z}^d} \widehat{a}_{j_0,k}\, \phi_{j_0,k}(x) + \sum_{j=j_0}^{\tau} \sum_{i=1}^{N} \sum_{k \in \mathbb{Z}^d} \widehat{b}_{ijk}\, \mathbb{1}_{\{ |\widehat{b}_{ijk}| > \delta_{j,n} \}}\, \psi_{i,j,k}(x), \qquad (36)$
where $\delta_{j,n}$ is an appropriately selected threshold. In the one-dimensional setting ($d = 1$), the estimator $\widehat{f}_X$ was originally proposed by [56]. Exploring the applicability and performance of these estimators in higher-dimensional cases would be a valuable direction for future research.
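For completeness, a minimal sketch of the hard-thresholding step for the empirical detail coefficients, written for $d = 1$ with the Haar mother wavelet $\psi = \mathbb{1}_{[0,1/2)} - \mathbb{1}_{[1/2,1)}$ (the wavelet family and threshold rule are illustrative assumptions):

```python
import numpy as np

def hard_threshold_coeffs(sample, j0, tau, delta):
    """Hard-thresholded empirical detail coefficients for d = 1 with the Haar
    mother wavelet psi = 1_[0,1/2) - 1_[1/2,1):
    b_hat_{j,k} = (1/n) sum_i psi_{j,k}(X_i), retained only if |b_hat| > delta,
    mirroring the thresholded estimators displayed above."""
    n = len(sample)
    kept = {}
    for j in range(j0, tau + 1):
        s = 2.0 ** j
        u = s * np.asarray(sample)                   # 2^j X_i
        k = np.floor(u).astype(int)
        sign = np.where(u - k < 0.5, 1.0, -1.0)      # value of the Haar mother wavelet
        for kk in np.unique(k):
            b = np.sqrt(s) * sign[k == kk].sum() / n
            if abs(b) > delta:
                kept[(j, int(kk))] = b
    return kept
```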
Remark 7. 
Numerous investigations, such as [54,55,56], report that thresholded nonlinear wavelet-based methods frequently outperform linear alternatives when assessed by mean integrated error. However, Theorem 3 and its corollary reveal that, for any fixed function g in the Besov space B s , p , q , linear wavelet estimators already attain the classic optimal convergence rates under the almost sure sup-norm criterion in the standard i.i.d. setting. This implies that, in terms of convergence rates, the nonlinear thresholded estimators in (35)–(36) confer no additional advantage over the linear estimators in (8)–(11) once the almost sure sup-norm is adopted as the performance measure.
This result naturally raises the question of why the sup-norm metric is preferable to the mean integrated error. Many functions in the broad Besov space B s , p , q exhibit significant local oscillations, which can cause large estimation errors in small subsets of R d . Such localized errors have a negligible impact on mean integrated metrics yet are readily detected by the sup-norm. This issue is especially acute for discontinuous functions, as substantial misestimation around a discontinuity may remain hidden under mean integrated error. Empirical findings by [55] corroborate this, illustrating how pointwise bias and variance in wavelet-based estimators can oscillate considerably while appearing largely diminished in the mean integrated squared error.
Moreover, in most practical scenarios, we typically observe only one realization of the underlying process. An almost sure sup-norm analysis reveals how estimators perform for almost every realization, in contrast to metrics such as mean-square error or mean integrated squared error, which capture only average behavior across multiple realizations. Therefore, we propose that the almost sure sup-norm constitutes a more fitting benchmark for density and regression estimation in B s , p , q . Additional insights on this perspective can be found in the introduction of [71].
Remark 8. 
Our approach, which integrates marginal integration with linear wavelet-based techniques, facilitates the effective separation of linear influences from complex nonlinear patterns. This capability is especially valuable in managerial settings, where comprehending both direct and subtle impacts on outcomes is crucial. For instance:
1.
Finance: Managers can utilize this model to enhance risk assessment and optimize portfolio strategies. By capturing nonlinear dependencies among market variables, the model yields more precise forecasts and deeper insights into asset behavior, thereby supporting more informed investment decisions.
2.
Biology: In research and development, the ability to detect and quantify nonlinear effects assists in guiding resource allocation and experimental design, ultimately leading to more efficient innovation and product development strategies.
3.
Engineering: In engineering contexts, our method’s robust estimation of critical performance indicators—despite the presence of complex nonlinear interactions—substantially enhances operational control and system optimization. This increased precision enables managers to implement quality control measures that are both accurate and reliable, ensuring that engineering systems consistently perform within established standards.

5. Concluding Remarks

In this work, we establish strong uniform convergence results for estimators of nonlinear additive components derived from integrating marginal methods with linear wavelet-based estimation strategies. These findings significantly advance the theoretical underpinnings of additive modeling under mixing conditions, which characterizes the dependence structure of the underlying random variable sequences. In particular, wavelet estimators excel in detecting discontinuities, offering a distinct advantage over kernel-based techniques. This benefit is most pronounced when target functions exhibit spatial irregularities, especially for 1 p < 2 , or when function smoothness is not specified. In such scenarios, purely linear smoothing strategies cannot attain the minimax optimal convergence rate, making nonlinear wavelet estimators (e.g., soft and hard thresholding) indispensable. Nevertheless, constructing and analyzing these nonlinear approaches introduces considerable mathematical complexity, which remains an important avenue for future research to broaden the applicability and bolster the performance of wavelet-based estimators.
A promising direction for further exploration involves the application of wavelet techniques to conditional U-statistics. Integrating wavelet-based methods into this setting may open up new theoretical insights and spur the development of enhanced inference procedures. Advancing this line of research is therefore highly encouraged as it has the potential to substantially enrich methodological progress in the analysis of complex stochastic processes.
The advent of high-throughput genomic technologies—covering genome-wide single nucleotide polymorphisms and RNA sequencing—has generated extensive datasets that require robust statistical models to quantify and interpret how genetic variation and gene expression influence quantitative traits [123]. In contrast to conventional data structures, where the number of variables is typically much smaller than the sample size, modern high-dimensional data often have as many or more variables (p) than observations (n). This imbalance poses serious challenges to traditional statistical methods, which generally assume n p .
Consequently, it is critical to examine the viability of partial linear additive models (PLAM) in high-dimensional frameworks, particularly where both linear and nonlinear components can exceed the sample size, as emphasized in [123]. Such investigations promise to refine our analytical toolkit for complex genomic data and deepen our understanding of the genetic mechanisms that underlie quantitative traits.
Finally, to reinforce the practical significance of this work, comprehensive simulation studies and applications to real datasets would be highly advantageous. We also intend to perform a comprehensive sensitivity analysis to assess the robustness of our findings under various conditions. Such empirical investigations will offer more concrete guidelines for the effective use of these methods, which will underscore their potential to advance contemporary statistical analysis.

6. Proofs

This section is dedicated to proving our main result. Throughout the proof, we continue to use the notation introduced earlier; to lighten the notation, $\widehat{m}_{add}^{\widehat{\beta}}(\Psi, \cdot)$ will be denoted by $\widehat{m}_{add}^{\widehat{\beta}}(\cdot)$. We begin by establishing the result under the assumption that the marginal density $f(\cdot)$ is known. Subsequently, we provide a brief outline of the proof in the case where the marginal density is estimated.
Proof of Theorem 1. 
Since $\mathcal{C}$ is compact, we can cover it with a finite collection of balls $B_p(t_p, r_p)$ such that
$\mathcal{C} \subset \bigcup_{p=1}^{r(n)} B_p,$
where $r(n)$ will be specified later. Therefore, for any point $x \in \mathcal{C}$, there exists a ball centered at a point $t(x)$ that contains $x$. Consequently, the approximation error can be decomposed as follows:
$\sup_{x \in \mathcal{C}} \big| \widehat{m}_{add}^{\widehat{\beta}}(x) - m(x) \big| \le \sup_{x \in \mathcal{C}} \big| \widehat{m}_{add}^{\widehat{\beta}}(x) - \widehat{m}_{add}^{\beta}(x) \big| + \sup_{x \in \mathcal{C}} \big| \widehat{m}_{add}^{\beta}(x) - \widehat{m}_{add}^{\beta}(t(x)) \big| + \sup_{x \in \mathcal{C}} \big| \widehat{m}_{add}^{\beta}(t(x)) - \mathbb{E}\,\widehat{m}_{add}^{\beta}(t(x)) \big| + \sup_{x \in \mathcal{C}} \big| \mathbb{E}\,\widehat{m}_{add}^{\beta}(t(x)) - \mathbb{E}\,\widehat{m}_{add}^{\beta}(x) \big| + \sup_{x \in \mathcal{C}} \big| \mathbb{E}\,\widehat{m}_{add}^{\beta}(x) - m(x) \big|.$
This decomposition shows how each term contributes to the overall approximation error. Recall that
$\widehat{m}_{add}^{\beta}(x) = \sum_{l=1}^{d} \widehat{\eta}_l^{\beta}(x_l) + \int_{\mathbb{R}^d} \widehat{m}_n^{\beta}(z)\, q(z)\, dz.$
Hence,
$\mathbb{E}\,\widehat{m}_{add}^{\beta}(x) - m(x) = \mathbb{E}\Big[ \sum_{l=1}^{d} \widehat{\eta}_l^{\beta}(x_l) + \int_{\mathbb{R}^d} \widehat{m}_n^{\beta}(z)\, q(z)\, dz \Big] - \sum_{l=1}^{d} \eta_l(x_l) - \int_{\mathbb{R}^d} m(z)\, q(z)\, dz = \sum_{l=1}^{d} \Big( \mathbb{E}\,\widehat{\eta}_l^{\beta}(x_l) - \eta_l(x_l) \Big) + \int_{\mathbb{R}^d} \Big( \mathbb{E}\,\widehat{m}_n^{\beta}(z) - m(z) \Big)\, q(z)\, dz =: B_1 + B_2. \qquad (38)$
Here, $B_1$ captures the main bias of the additive components $\widehat{\eta}_l^{\beta}(\cdot)$, while $B_2$ corresponds to the integral term involving $\widehat{m}_n^{\beta}(z)$. Next, recall that
$\widehat{\eta}_l^{\beta}(x_l) = \widehat{\zeta}_{n,l}^{\beta}(x_l) - \int_{\mathbb{R}} \widehat{\zeta}_{n,l}^{\beta}(u_l)\, q_l(u_l)\, du_l,$
where
$\widehat{\zeta}_{n,l}^{\beta}(x_l) = \frac{2^{\lambda(n)}}{n} \sum_{i=1}^{n} \frac{\widetilde{Y}_{i,n}^{\beta}}{f_l(X_{l,i})}\, K_l\big( 2^{\lambda(n)} x_l,\, 2^{\lambda(n)} X_{l,i} \big),$
and
$\widetilde{Y}_{i,n}^{\beta} = \big( \Psi(Y_i) - Z_i^{\top}\beta \big) \int_{\mathbb{R}^{d-1}} 2^{(d-1)\lambda(n)}\, K_{-l}\big( 2^{\lambda(n)} x_{-l},\, 2^{\lambda(n)} X_{-l,i} \big)\, \frac{q_{-l}(x_{-l})}{f_{X_{-l,i} \mid X_{l,i}}\big( X_{-l,i} \mid X_{l,i} \big)}\, dx_{-l}.$
The term $\widehat{\zeta}_{n,l}^{\beta}(\cdot)$ can be viewed as a univariate regression estimator of
$\widetilde{m}_n(\cdot) = \mathbb{E}\big[ \widetilde{Y}_{i,n}^{\beta} \mid X_{l,i} = \cdot\, \big].$
In accordance with the approach of [103], let
$C_n := \mu + \int_{\mathbb{R}^{d-1}} \sum_{\substack{j=1 \\ j \neq l}}^{d} m_j(z_j)\, g_n(z_{-l})\, dz_{-l},$
where
$g_n(z_{-l}) := \int_{\mathbb{R}^{d-1}} 2^{(d-1)\lambda(n)}\, K_{-l}\big( 2^{\lambda(n)} u_{-l},\, 2^{\lambda(n)} z_{-l} \big)\, q_{-l}(u_{-l})\, du_{-l}.$
Then
$\widetilde{m}_n(x_l) = m_l(x_l) + C_n. \qquad (39)$
Furthermore,
$\eta_l(x_l) = m_l(x_l) - \int_{\mathbb{R}} m_l(u_l)\, q_l(u_l)\, du_l. \qquad (40)$
Combining (39) and (40) leads to
$\eta_l(x_l) = \widetilde{m}_n(x_l) - C_n - \int_{\mathbb{R}} m_l(u_l)\, q_l(u_l)\, du_l. \qquad (41)$
Additionally, for $1 \le l \le d$, we know
$\widehat{\eta}_l^{\beta}(x_l) = \int_{\mathbb{R}^{d-1}} \widehat{m}_n^{\Psi,\beta}(x)\, q_{-l}(x_{-l})\, dx_{-l} - \int_{\mathbb{R}^d} \widehat{m}_n^{\Psi,\beta}(x)\, q(x)\, dx,$
which implies
$\widehat{\eta}_l^{\beta}(x_l) = \widehat{\zeta}_{n,l}^{\beta}(x_l) - \int_{\mathbb{R}^d} \widehat{m}_n^{\Psi,\beta}(x)\, q(x)\, dx. \qquad (42)$
Therefore, exploiting (41) and (42), we can write the first bias term $B_1$ in (38) as
$B_1(x_l) = \Big( \mathbb{E}\,\widehat{\zeta}_{n,l}^{\beta}(x_l) - \widetilde{m}_n(x_l) \Big) - \Big( \mathbb{E} \int_{\mathbb{R}^d} \widehat{m}_n^{\Psi,\beta}(x)\, q(x)\, dx - \int_{\mathbb{R}} m_l(u_l)\, q_l(u_l)\, du_l - C_n \Big).$
To further estimate $\mathbb{E}\,\widehat{\zeta}_{n,l}^{\beta}(x_l) - \widetilde{m}_n(x_l)$, observe that
$\mathbb{E}\,\widehat{\zeta}_{n,l}^{\beta}(x_l) - \widetilde{m}_n(x_l) = \mathbb{E}\Big[ 2^{\lambda(n)}\, \frac{\widetilde{Y}_{i,n}^{\beta}}{f_l(X_{l,i})}\, K_l\big( 2^{\lambda(n)} x_l,\, 2^{\lambda(n)} X_{l,i} \big) \Big] - \widetilde{m}_n(x_l) = \int_{\mathbb{R}} 2^{\lambda(n)}\, \widetilde{m}_n(u_l)\, K_l\big( 2^{\lambda(n)} x_l,\, 2^{\lambda(n)} u_l \big)\, du_l - \widetilde{m}_n(x_l) = \int_{\mathbb{R}} \big( \widetilde{m}_n(x_l + 2^{-\lambda(n)} v_l) - \widetilde{m}_n(x_l) \big)\, K_l\big( 2^{\lambda(n)} x_l,\, 2^{\lambda(n)} x_l + v_l \big)\, dv_l \le 2^{-\lambda(n)\left(s - \frac{1}{p}\right)}\, C \int_{\mathbb{R}} |v_l|^{s - \frac{1}{p}}\, K_l\big( 2^{\lambda(n)} x_l,\, 2^{\lambda(n)} x_l + v_l \big)\, dv_l,$
where in the final step we use the fact that $\widetilde{m}_n(\cdot)$ is Lipschitz of order $s - \frac{1}{p}$. Note additionally that each kernel $K_l(\cdot, \cdot)$ has compact support, which simplifies bounding the above integrals. Under the assumption
$2^{-\lambda(n)} \le \left( \frac{\log n}{n} \right)^{a},$
it follows that
$\mathbb{E}\,\widehat{\zeta}_{n,l}^{\beta}(x_l) - \widetilde{m}_n(x_l) = O\!\left( \left( \frac{\log n}{n} \right)^{a\left(s - \frac{1}{p}\right)} \right). \qquad (44)$
To proceed, we next examine
$\mathbb{E}\Big[ \int_{\mathbb{R}^d} \widehat{m}_n^{\Psi,\beta}(x)\, q(x)\, dx \Big] - C_n = \int_{\mathbb{R}^d} \mathbb{E}\big[ \widehat{m}_n^{\Psi,\beta}(x) \big]\, q(x)\, dx - \int_{\mathbb{R}^{d-1}} \sum_{\substack{j=1 \\ j \neq l}}^{d} m_j(z_j)\, g_n(z_{-l})\, dz_{-l} - \mu$
$= 2^{\lambda(n) d} \int_{\mathbb{R}^d} \mathbb{E}\bigg[ \frac{\big( \Psi(Y_i) - Z_i^{\top}\beta \big)\, K\big( 2^{\lambda(n)} x,\, 2^{\lambda(n)} X_i \big)}{f(X_i)} \bigg]\, q(x)\, dx - \int_{\mathbb{R}^{d-1}} \sum_{\substack{j=1 \\ j \neq l}}^{d} m_j(z_j)\, g_n(z_{-l})\, dz_{-l} - \mu$
$= 2^{\lambda(n) d} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} m_{add}(u)\, K\big( 2^{\lambda(n)} x,\, 2^{\lambda(n)} u \big)\, q(x)\, dx\, du - \int_{\mathbb{R}^{d-1}} \sum_{\substack{j=1 \\ j \neq l}}^{d} m_j(z_j)\, g_n(z_{-l})\, dz_{-l} - \mu$
$= 2^{\lambda(n) d} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \sum_{j=1}^{d} m_j(u_j)\, K\big( 2^{\lambda(n)} x,\, 2^{\lambda(n)} u \big)\, q(x)\, dx\, du - \int_{\mathbb{R}^{d-1}} \sum_{\substack{j=1 \\ j \neq l}}^{d} m_j(z_j)\, g_n(z_{-l})\, dz_{-l}$
$= 2^{\lambda(n)} \int_{\mathbb{R}} \int_{\mathbb{R}} m_l(u_l)\, K_l\big( 2^{\lambda(n)} x_l,\, 2^{\lambda(n)} u_l \big)\, q_l(x_l)\, dx_l\, du_l$
$= \int_{\mathbb{R}} m_l(x_l)\, q_l(x_l)\, dx_l + \int_{\mathbb{R}} q_l(x_l) \int_{\mathbb{R}} \big( m_l(x_l + 2^{-\lambda(n)} v_l) - m_l(x_l) \big)\, K_l\big( 2^{\lambda(n)} x_l,\, 2^{\lambda(n)} x_l + v_l \big)\, dv_l\, dx_l$
$\le \int_{\mathbb{R}} m_l(x_l)\, q_l(x_l)\, dx_l + \int_{\mathbb{R}} q_l(x_l) \int_{\mathrm{Supp}(K_l)} 2^{-\lambda(n)\left(s - \frac{1}{p}\right)}\, |v_l|^{s - \frac{1}{p}}\, K_l\big( 2^{\lambda(n)} x_l,\, 2^{\lambda(n)} x_l + v_l \big)\, dv_l\, dx_l$
$= \int_{\mathbb{R}} m_l(x_l)\, q_l(x_l)\, dx_l + O\!\left( \left( \frac{\log n}{n} \right)^{a\left(s - \frac{1}{p}\right)} \right). \qquad (45)$
Combining (44) and (45) immediately implies
$B_1 = O\!\left( \left( \frac{\log n}{n} \right)^{a\left(s - \frac{1}{p}\right)} \right). \qquad (46)$
We now consider the additional term:
$\mathbb{E}\,\widehat{m}_n^{\beta}(z) - m(z) = \mathbb{E}\Big[ 2^{\lambda(n) d} \sum_{i=1}^{n} \frac{\Psi(Y_i) - Z_i^{\top}\beta}{n\, f_X(X_i)}\, K\big( 2^{\lambda(n)} z,\, 2^{\lambda(n)} X_i \big) \Big] - m(z) = \int_{\mathbb{R}^d} 2^{\lambda(n) d}\, m(u)\, K\big( 2^{\lambda(n)} z,\, 2^{\lambda(n)} u \big)\, du - m(z) = \int_{\mathbb{R}^d} \big( m(z + 2^{-\lambda(n)} v) - m(z) \big)\, K\big( 2^{\lambda(n)} z,\, 2^{\lambda(n)} z + v \big)\, dv \le 2^{-\lambda(n)\left(s - \frac{1}{p}\right)} \int_{\mathbb{R}^d} \|v\|^{s - \frac{1}{p}}\, K\big( 2^{\lambda(n)} z,\, 2^{\lambda(n)} z + v \big)\, dv.$
Hence, we conclude that
$B_2 = O\!\left( \left( \frac{\log n}{n} \right)^{a\left(s - \frac{1}{p}\right)} \right). \qquad (47)$
From (38), (46), and (47), we deduce the following:
$\sup_{x \in \mathcal{C}} \big| \mathbb{E}\,\widehat{m}_{add}^{\beta}(x) - m(x) \big| = O\!\left( \left( \frac{\log n}{n} \right)^{a\left(s - \frac{1}{p}\right)} \right).$
Finally, we consider the difference
m ^ a d d β ( x ) m ^ a d d β t ( x ) = l = 1 d ζ ^ n , l β x l ζ ^ n , l β t ( x l ) ,
where
ζ ^ n , l β ( x l ) = 2 λ ( n ) n i = 1 n Y ˜ i , n β f l X l , i K l 2 λ ( n ) x l , 2 λ ( n ) X l , i ,
and
Y ˜ i , n β = Ψ ( Y i ) Z i β × R d 1 2 ( d 1 ) λ ( n ) K l 2 λ ( n ) x l , 2 λ ( n ) X l , i q l x l f X l , i X l , i d x l .
Since K l is a Lipschitz function, there exists a suitable Lipschitz constant L; thus,
m ^ a d d β ( x ) m ^ a d d β t ( x ) = O 2 2 λ ( n ) r p ,
where r p is the radius of B p ( t p , r p ) . For
r p = ε ( n ) 2 λ ( n ) γ with γ 2 ,
we obtain
P sup x C m ^ a d d β ( x ) m ^ a d d β t ( x ) ε ( n ) / 4 = 0 ,
provided ε ( n ) is chosen sufficiently small. Similarly, one can establish that
P sup x C E [ m ^ add β ( x ) ] E [ m ^ add β ( t ( x ) ) ] ε ( n ) / 4 = 0 .
Next, let { ( p ( n ) , q ( n ) ) } n N * be sequences of positive integers such that
2 p ( n ) q ( n ) n < 2 p ( n ) q ( n ) + 1 .
We then obtain
m ^ add β t ( x ) E m ^ add β t ( x )       = l = 1 d η ^ l β t ( x l ) + R d m ^ n β ( z ) q ( z ) d z             E l = 1 d η ^ l β t ( x l ) + R d m ^ n β ( z ) q ( z ) d z       = 1 n j = 1 q ( n ) ξ j 1 t ( x ) + 1 n j = 1 q ( n ) ξ j 2 t ( x ) + 1 n δ n t ( x ) ,
where
ξ j 1 = i = 2 ( j 1 ) p ( n ) + 1 ( 2 j 1 ) p ( n ) ζ i E ζ i , ξ j 2 = i = ( 2 j 1 ) p ( n ) + 1 2 j p ( n ) ζ i E ζ i , δ n = i = 2 q ( n ) p ( n ) + 1 n ζ i E ζ i , ζ i = l = 1 d 2 λ ( n ) n Y ˜ i , n β f l ( X l , i ) K l 2 λ ( n ) x l , 2 λ ( n ) X l , i R Y ˜ i , n β f l ( X l , i ) K l 2 λ ( n ) x l , 2 λ ( n ) X l , i + 2 λ ( n ) d Ψ ( Y i ) Z i β f ( X i ) R d K 2 λ ( n ) x , 2 λ ( n ) X i q ( x ) d x .
It is straightforward to show that for 1 j q ( n ) , there exist constants m and M such that
m p ( n ) ξ j i M p ( n ) for i = 1 , 2 .
Consequently, we have
P sup x C m ^ add β t ( x ) E m ^ add β t ( x ) ε ( n )       P sup x C j = 1 q ( n ) ξ j 1 t ( x ) 2 n ε ( n ) 5 + P sup x C j = 1 q ( n ) ξ j 2 t ( x ) 2 n ε ( n ) 5             + P sup x C δ n t ( x ) n ε ( n ) 5 .
We begin by examining the leading term of (51). Specifically, we have
P sup x C j = 1 q ( n ) ξ j 1 t ( x ) 2 n ε ( n ) 5 r ( n ) max p = 1 , , r ( n ) P j = 1 q ( n ) ξ j 1 ( t p ) 2 n ε ( n ) 5 .
Next, by choosing
ζ = min n ε ( n ) 5 q ( n ) , m p ( n ) 2 ,
we obtain
j = 1 q ( n ) ξ j 1 2 n ε ( n ) 5 j = 1 q ( n ) ξ j 1 2 n ε ( n ) 5 and ξ j 1 ξ j 1 * 1 j q ( n ) ζ j = 1 q ( n ) ξ j 1 ξ j 1 * > ζ ,
where ξ j 1 * are defined as in Lemma A1. Consequently,
P j = 1 q ( n ) ξ j 1 2 n ε ( n ) 5 P j = 1 q ( n ) ξ j 1 * 2 n ε ( n ) 5 q ( n ) ζ + j = 1 q ( n ) P ξ j 1 ξ j 1 * ζ .
By applying Hoeffding’s inequality [124], we deduce
P j = 1 q ( n ) ξ j 1 * 2 n ε ( n ) 5 q ( n ) ζ P j = 1 q ( n ) ξ j 1 * n ε ( n ) 5 2 exp 2 n 2 ε 2 ( n ) / 5 2 q ( n ) 4 M 2 p ( n ) 2 2 exp n ε 2 ( n ) 5 2 M 2 p ( n ) .
Next, applying Bradley’s Lemma A1 with γ = + gives
P ξ j ξ j * ζ 11 M p ( n ) min n ε ( n ) 5 p ( n ) , m p ( n ) 2 α ( p ) 11 5 2 1 2 min ε ( n ) M , 5 m 4 M 1 2 α ( p ) ,
since n 2 p ( n ) q ( n ) . If we additionally require ε ( n ) M , then
min ε ( n ) M , 5 m 4 M ε ( n ) M m 2 .
Hence,
P ξ j ξ j * ζ 11 5 2 1 2 M 2 m ε ( n ) α ( p ) .
Combining (53), (54), and (55) then yields
P j = 1 q ( n ) ξ j 1 2 n ε ( n ) 5 2 exp n ε 2 ( n ) 5 2 M 2 p ( n ) + 11 5 2 1 2 M 2 m ε ( n q ( n ) α ( p ) .
Finally, by combining (52) and (56), we obtain
\[
\mathbb{P}\left\{\sup_{x\in\mathcal{C}}\left|\sum_{j=1}^{q(n)}\xi_{j1}\big(t(x)\big)\right|\ge\frac{2n\varepsilon(n)}{5}\right\}
\le r(n)\left\{2\exp\left(-\frac{n\,\varepsilon^{2}(n)}{5^{2}\,M^{2}\,p(n)}\right)
+ 11\left(\frac{5}{2}\right)^{1/2}\left(\frac{2M}{m\,\varepsilon(n)}\right)^{1/2} q(n)\,\alpha\big(p(n)\big)\right\}.
\]
By the same reasoning, we can also prove
\[
\mathbb{P}\left\{\sup_{x\in\mathcal{C}}\left|\sum_{j=1}^{q(n)}\xi_{j2}\big(t(x)\big)\right|\ge\frac{2n\varepsilon(n)}{5}\right\}
\le r(n)\left\{2\exp\left(-\frac{n\,\varepsilon^{2}(n)}{5^{2}\,M^{2}\,p(n)}\right)
+ 11\left(\frac{5}{2}\right)^{1/2}\left(\frac{2M}{m\,\varepsilon(n)}\right)^{1/2} q(n)\,\alpha\big(p(n)\big)\right\}.
\]
Now, let $\nu>0$. Then we have
\[
\mathbb{P}\left\{\left|\delta_{n}\right|\ge\frac{n\varepsilon(n)}{5}\right\}
\le \exp\left(-\frac{\nu\,n\,\varepsilon(n)}{5}\right)\mathbb{E}\!\left[\exp\left(\nu\left|\delta_{n}\right|\right)\right]
\le \exp\left(-\frac{\nu\,n\,\varepsilon(n)}{5}\right)\exp\Big(\nu\,M\big(n-2\,p(n)\,q(n)\big)\Big).
\]
Since $M\,\nu\,\big(n-2\,p(n)\,q(n)\big)=1$ and $n<2\,p(n)\,\big(q(n)+1\big)$, we obtain
\[
\mathbb{P}\left\{\left|\delta_{n}\right|\ge\frac{n\varepsilon(n)}{5}\right\}
\le e\,\exp\left(-\frac{n\,\varepsilon(n)}{5\,M\,\big(n-2\,p(n)\,q(n)\big)}\right)
\le e\,\exp\left(-\frac{\varepsilon(n)}{5\,M}\,q(n)\right).
\]
In addition, if $\varepsilon(n)\le M$, then
\[
\mathbb{P}\left\{\left|\delta_{n}\right|\ge\frac{n\varepsilon(n)}{5}\right\}
\le e\,\exp\left(-\frac{\varepsilon(n)^{2}}{5\,M^{2}}\,q(n)\right).
\]
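The choice of $\nu$ and the two reductions above can be checked directly; everything used here already appears in the displays:
\[
\nu = \frac{1}{M\big(n-2p(n)q(n)\big)}
\ \Longrightarrow\
\exp\Big(\nu M\big(n-2p(n)q(n)\big)\Big)=e,
\qquad
n-2p(n)q(n) < 2p(n)
\ \Longrightarrow\
\frac{n}{n-2p(n)q(n)} > \frac{n}{2p(n)} \ge q(n),
\]
and the requirement $\varepsilon(n)\le M$ gives $\varepsilon(n)/M \ge \varepsilon(n)^{2}/M^{2}$, which converts the bound involving $\varepsilon(n)/(5M)$ into the bound involving $\varepsilon(n)^{2}/(5M^{2})$.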
Hence,
\[
\mathbb{P}\left\{\sup_{x\in\mathcal{C}}\left|\delta_{n}\big(t(x)\big)\right|\ge\frac{n\varepsilon(n)}{5}\right\}
\le r(n)\max_{p=1,\dots,r(n)}\mathbb{P}\left\{\left|\delta_{n}(t_{p})\right|\ge\frac{n\varepsilon(n)}{5}\right\}
\le e\,r(n)\exp\left(-\frac{\varepsilon(n)^{2}}{5\,M^{2}}\,q(n)\right).
\]
Recall that
\[
2\,p(n)\,q(n) \le n < 2\,p(n)\,\big(q(n)+1\big),
\]
which implies
\[
q(n) = \frac{n}{2\,p(n)} - \theta,\qquad 0\le\theta<1.
\]
Then, we have
\[
\mathbb{P}\left\{\sup_{x\in\mathcal{C}}\left|\delta_{n}\big(t(x)\big)\right|\ge\frac{n\varepsilon(n)}{5}\right\}
\le e\,r(n)\exp\left(-\frac{n\,\varepsilon(n)^{2}}{10\,M^{2}\,p(n)}\right)\exp\left(\frac{\varepsilon(n)^{2}}{5\,M^{2}}\right)
\le e^{6/5}\,r(n)\exp\left(-\frac{n\,\varepsilon(n)^{2}}{10\,M^{2}\,p(n)}\right),
\]
since $\varepsilon(n)\le M$.
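The constant $e^{6/5}$ comes from the two exponential factors just displayed; the verification is immediate from $q(n)=n/(2p(n))-\theta$ with $0\le\theta<1$:
\[
\exp\left(-\frac{\varepsilon(n)^{2}\,q(n)}{5M^{2}}\right)
= \exp\left(-\frac{n\,\varepsilon(n)^{2}}{10M^{2}p(n)}\right)\exp\left(\frac{\theta\,\varepsilon(n)^{2}}{5M^{2}}\right)
\le e^{1/5}\exp\left(-\frac{n\,\varepsilon(n)^{2}}{10M^{2}p(n)}\right),
\]
because $\theta<1$ and $\varepsilon(n)\le M$; multiplying by the factor $e$ already present gives $e\cdot e^{1/5}=e^{6/5}$.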
It follows from (51), (57), (58), and (61) that
\[
\begin{aligned}
\mathbb{P}\left\{\sup_{x\in\mathcal{C}}\left|\widehat{m}_{\mathrm{add}}^{\beta}\big(t(x)\big)-\mathbb{E}\,\widehat{m}_{\mathrm{add}}^{\beta}\big(t(x)\big)\right|\ge\varepsilon(n)\right\}
&\le 2\,r(n)\left\{2\exp\left(-\frac{n\,\varepsilon^{2}(n)}{25\,M^{2}\,p(n)}\right)
+11\left(\frac{5}{2}\right)^{1/2}\left(\frac{2M}{m\,\varepsilon(n)}\right)^{1/2}q(n)\,\alpha\big(p(n)\big)\right\}\\
&\qquad+\,e^{6/5}\,r(n)\exp\left(-\frac{n\,\varepsilon(n)^{2}}{10\,M^{2}\,p(n)}\right)\\
&\le r(n)\left(4+e^{6/5}\right)\exp\left(-\frac{n\,\varepsilon^{2}(n)}{25\,M^{2}\,p(n)}\right)
+22\,r(n)\left(\frac{5}{2}\right)^{1/2}\left(\frac{2M}{m\,\varepsilon(n)}\right)^{1/2}q(n)\,\alpha\big(p(n)\big).
\end{aligned}
\]
Therefore,
\[
\sum_{n\ge 1}\mathbb{P}\left\{\sup_{x\in\mathcal{C}}\left|\widehat{m}_{\mathrm{add}}^{\beta}\big(t(x)\big)-\mathbb{E}\,\widehat{m}_{\mathrm{add}}^{\beta}\big(t(x)\big)\right|\ge\varepsilon(n)\right\}<\infty,
\]
and the Borel–Cantelli lemma yields, almost surely, $\sup_{x\in\mathcal{C}}\left|\widehat{m}_{\mathrm{add}}^{\beta}\big(t(x)\big)-\mathbb{E}\,\widehat{m}_{\mathrm{add}}^{\beta}\big(t(x)\big)\right|\le\varepsilon(n)$ for all sufficiently large $n$.
We now turn to the effect of replacing $\beta$ by its estimator $\widehat{\beta}$, and we observe the following relationship:
\[
\begin{aligned}
\widehat{m}_{\mathrm{add}}^{\widehat{\beta}}(x)-\widehat{m}_{\mathrm{add}}^{\beta}(x)
&= \sum_{l=1}^{d}\widehat{\eta}_{l}^{\widehat{\beta}}(x_{l})+\int_{\mathbb{R}^{d}}\widehat{m}_{n}^{\widehat{\beta}}(z)\,q(z)\,dz
-\sum_{l=1}^{d}\widehat{\eta}_{l}^{\beta}(x_{l})-\int_{\mathbb{R}^{d}}\widehat{m}_{n}^{\beta}(z)\,q(z)\,dz\\
&= \sum_{l=1}^{d}\left\{\widehat{\eta}_{l}^{\widehat{\beta}}(x_{l})-\widehat{\eta}_{l}^{\beta}(x_{l})\right\}
+\int_{\mathbb{R}^{d}}\left\{\widehat{m}_{n}^{\widehat{\beta}}(z)-\widehat{m}_{n}^{\beta}(z)\right\}q(z)\,dz.
\end{aligned}
\]
Recall that
\[
\widehat{\eta}_{l}^{\beta}(x_{l}) = \widehat{\zeta}_{n,l}^{\beta}(x_{l})-\int_{\mathbb{R}}\widehat{\zeta}_{n,l}^{\beta}(x_{l})\,q_{l}(x_{l})\,dx_{l},
\]
where
\[
\widehat{\zeta}_{n,l}^{\beta}(x_{l}) = \frac{2^{\lambda(n)}}{n}\sum_{i=1}^{n}\frac{\widetilde{Y}_{i,n}^{\beta}}{f_{l}(X_{l,i})}\,K_{l}\!\left(2^{\lambda(n)}x_{l},\,2^{\lambda(n)}X_{l,i}\right),
\]
and
\[
\widetilde{Y}_{i,n}^{\beta} = \left(\Psi(Y_{i})-Z_{i}^{\top}\beta\right)\int_{\mathbb{R}^{d-1}}2^{(d-1)\lambda(n)}\,K_{\underline{l}}\!\left(2^{\lambda(n)}x_{\underline{l}},\,2^{\lambda(n)}X_{\underline{l},i}\right)\frac{q_{\underline{l}}(x_{\underline{l}})}{f_{X_{\underline{l}}}(X_{\underline{l},i})}\,dx_{\underline{l}}.
\]
Consequently,
\[
\begin{aligned}
\widehat{\eta}_{l}^{\widehat{\beta}}(x_{l})-\widehat{\eta}_{l}^{\beta}(x_{l})
&= \widehat{\zeta}_{n,l}^{\widehat{\beta}}(x_{l})-\widehat{\zeta}_{n,l}^{\beta}(x_{l})
-\int_{\mathbb{R}}\left\{\widehat{\zeta}_{n,l}^{\widehat{\beta}}(x_{l})-\widehat{\zeta}_{n,l}^{\beta}(x_{l})\right\}q_{l}(x_{l})\,dx_{l}\\
&= \sum_{i=1}^{n}\frac{2^{\lambda(n)}\left(\widetilde{Y}_{i,n}^{\widehat{\beta}}-\widetilde{Y}_{i,n}^{\beta}\right)}{n\,f_{l}(X_{l,i})}
\left\{K_{l}\!\left(2^{\lambda(n)}x_{l},\,2^{\lambda(n)}X_{l,i}\right)
-\int_{\mathbb{R}}K_{l}\!\left(2^{\lambda(n)}x_{l},\,2^{\lambda(n)}X_{l,i}\right)q_{l}(x_{l})\,dx_{l}\right\}.
\end{aligned}
\]
It is immediately apparent that
\[
\widetilde{Y}_{i,n}^{\widehat{\beta}}-\widetilde{Y}_{i,n}^{\beta}
= -\,Z_{i}^{\top}\!\left(\widehat{\beta}-\beta\right)\int_{\mathbb{R}^{d-1}}2^{(d-1)\lambda(n)}\,K_{\underline{l}}\!\left(2^{\lambda(n)}x_{\underline{l}},\,2^{\lambda(n)}X_{\underline{l},i}\right)\frac{q_{\underline{l}}(x_{\underline{l}})}{f_{X_{\underline{l}}}(X_{\underline{l},i})}\,dx_{\underline{l}}.
\]
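Combining the last two displays with the Cauchy–Schwarz inequality shows that the difference between the component estimators is driven entirely by $\widehat{\beta}-\beta$. The following bound is only a sketch: it assumes, as a working hypothesis, that the marginal-integration factor and $1/f_{l}$ are bounded on the compact set under the stated conditions, and writes $C$ for such a bound:
\[
\left|\widetilde{Y}_{i,n}^{\widehat{\beta}}-\widetilde{Y}_{i,n}^{\beta}\right|
\le C\,\left\|Z_{i}\right\|\,\left\|\widehat{\beta}-\beta\right\|,
\]
so the rate obtained below for $\widehat{\beta}-\beta$ transfers directly to $\sum_{l=1}^{d}\big|\widehat{\eta}_{l}^{\widehat{\beta}}(x_{l})-\widehat{\eta}_{l}^{\beta}(x_{l})\big|$.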
Observe that
\[
\sqrt{\frac{n}{2^{\lambda(n)}}}\left(\widehat{\beta}-\beta\right)
= \left(n^{-1}\widetilde{\mathbf{Z}}^{\top}\widetilde{\mathbf{Z}}\right)^{-1}\frac{1}{\sqrt{n\,2^{\lambda(n)}}}
\left\{\sum_{i=1}^{n}\widetilde{Z}_{i}\,\widetilde{m}_{\mathrm{add}}(X_{i})
-\sum_{i=1}^{n}\widetilde{Z}_{i}\sum_{j=1}^{n}W_{nj}(X_{i})\,\varepsilon_{j}
+\sum_{i=1}^{n}\widetilde{Z}_{i}\,\varepsilon_{i}\right\}.
\]
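This decomposition is the usual consequence of the least-squares normal equations for the parametric part. The following sketch assumes the profile (partialling-out) form of the estimator, $\widehat{\beta}=\big(\widetilde{\mathbf{Z}}^{\top}\widetilde{\mathbf{Z}}\big)^{-1}\widetilde{\mathbf{Z}}^{\top}\widetilde{\mathbf{Y}}$, with $\widetilde{Z}_{i}=Z_{i}-\sum_{j=1}^{n}W_{nj}(X_{i})Z_{j}$ and $\widetilde{Y}_{i}=\Psi(Y_{i})-\sum_{j=1}^{n}W_{nj}(X_{i})\Psi(Y_{j})$; this form is consistent with the notation used here but is stated as an assumption of the sketch:
\[
\widetilde{Y}_{i} = \widetilde{Z}_{i}^{\top}\beta+\widetilde{m}_{\mathrm{add}}(X_{i})-\sum_{j=1}^{n}W_{nj}(X_{i})\,\varepsilon_{j}+\varepsilon_{i}
\ \Longrightarrow\
\widehat{\beta}-\beta
= \big(\widetilde{\mathbf{Z}}^{\top}\widetilde{\mathbf{Z}}\big)^{-1}
\left\{\sum_{i}\widetilde{Z}_{i}\,\widetilde{m}_{\mathrm{add}}(X_{i})
-\sum_{i}\widetilde{Z}_{i}\sum_{j}W_{nj}(X_{i})\,\varepsilon_{j}
+\sum_{i}\widetilde{Z}_{i}\,\varepsilon_{i}\right\},
\]
and multiplying by $\sqrt{n/2^{\lambda(n)}}$ gives the display above.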
Then,
\[
\sum_{i=1}^{n}\widetilde{Z}_{i}\,\widetilde{m}_{\mathrm{add}}(X_{i})
= \sum_{i=1}^{n}Z_{i}\,\widetilde{m}_{\mathrm{add}}(X_{i})
-\sum_{i=1}^{n}\sum_{j=1}^{n}W_{nj}(X_{i})\,Z_{j}\,\widetilde{m}_{\mathrm{add}}(X_{i}),
\]
where
\[
\max_{1\le i\le n}\left|\widetilde{m}_{\mathrm{add}}(X_{i})\right|
= \max_{1\le i\le n}\left|m_{\mathrm{add}}(X_{i})-\sum_{j=1}^{n}m_{\mathrm{add}}(X_{j})\,W_{nj}(X_{i})\right|
\le \max_{1\le i\le n}\left|m_{\mathrm{add}}(X_{i})-\widehat{m}_{\mathrm{add}}^{\Psi,\beta}(X_{i})\right|
+\max_{1\le i\le n}\left|\sum_{j=1}^{n}W_{nj}(X_{i})\,\varepsilon_{j}\right|.
\]
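The inequality above uses the fact that the smoother applied to the transformed responses reproduces the additive part plus smoothed noise. As a sketch, assuming (as the notation suggests) that the $W_{nj}(\cdot)$ are the linear weights of the pilot estimator $\widehat{m}_{\mathrm{add}}^{\Psi,\beta}$ built from the responses $\Psi(Y_{j})-Z_{j}^{\top}\beta$, and ignoring the centering constant in this sketch:
\[
\sum_{j=1}^{n}m_{\mathrm{add}}(X_{j})\,W_{nj}(X_{i})
= \sum_{j=1}^{n}W_{nj}(X_{i})\big(\Psi(Y_{j})-Z_{j}^{\top}\beta-\varepsilon_{j}\big)
= \widehat{m}_{\mathrm{add}}^{\Psi,\beta}(X_{i})-\sum_{j=1}^{n}W_{nj}(X_{i})\,\varepsilon_{j},
\]
and the triangle inequality then gives the stated bound.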
By Lemma 1 of [125], under hypotheses (G.1)–(G.3), (K.1)–(K.3), (M.1), and (Q.1)–(Q.2), it follows that
\[
\max_{1\le i\le n}\left|m_{\mathrm{add}}(X_{i})-\widehat{m}_{\mathrm{add}}^{\Psi,\beta}(X_{i})\right|
= O\!\left(\sqrt{\frac{2^{\lambda(n)}\log n}{n}}\right).
\]
Moreover, it is straightforward to see that
\[
\max_{1\le i\le n}\left|\sum_{j=1}^{n}W_{nj}(X_{i})\right| = O(1).
\]
By Theorem 1.1 of [126], choosing $c_{n}=\sqrt{2\,n\,\log_{2}(n)}$ yields
\[
\left\|\sum_{i=1}^{n}Z_{i}\right\| = O\!\left(\sqrt{2\,n\,\log_{2}(n)}\right).
\]
Hence, combining (69), (70), and (71) with hypothesis (H.2), we obtain
\[
\left\|\sum_{i=1}^{n}Z_{i}\,\widetilde{m}_{\mathrm{add}}(X_{i})\right\| = o\!\left(\sqrt{n\,2^{\lambda(n)}}\right).
\]
Next, consider
\[
\left\|\sum_{i=1}^{n}\sum_{j=1}^{n}W_{nj}(X_{i})\,Z_{j}\,\widetilde{m}_{\mathrm{add}}(X_{i})\right\|
\le n\,\max_{1\le i\le n}\left|\widetilde{m}_{\mathrm{add}}(X_{i})\right|
\,\max_{1\le i\le n}\max_{1\le j\le n}\left|W_{nj}(X_{i})\right|
\,\left\|\sum_{j=1}^{n}Z_{j}\right\|.
\]
A similar argument, using
\[
\max_{1\le i\le n}\max_{1\le j\le n}\left|W_{nj}(X_{i})\right| = O\!\left(n^{-1}\,2^{\lambda(n)}\right),
\]
implies
\[
\left\|\sum_{i=1}^{n}\sum_{j=1}^{n}W_{nj}(X_{i})\,Z_{j}\,\widetilde{m}_{\mathrm{add}}(X_{i})\right\| = o\!\left(\sqrt{n\,2^{\lambda(n)}}\right).
\]
Therefore, by (68), (72), and (74), it follows that
\[
\left\|\sum_{i=1}^{n}\widetilde{Z}_{i}\,\widetilde{m}_{\mathrm{add}}(X_{i})\right\| = o\!\left(\sqrt{n\,2^{\lambda(n)}}\right).
\]
Applying the same arguments, we also deduce
\[
\left\|\sum_{i=1}^{n}\widetilde{Z}_{i}\sum_{j=1}^{n}W_{nj}(X_{i})\,\varepsilon_{j}\right\| = o\!\left(\sqrt{n\,2^{\lambda(n)}}\right),
\]
and
\[
\left\|\sum_{i=1}^{n}\widetilde{Z}_{i}\,\varepsilon_{i}\right\| = o\!\left(\sqrt{n\,2^{\lambda(n)}}\right).
\]
Finally, combining (67), (75), (76), and (77) with the fact that
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\widetilde{Z}_{i}\,\widetilde{Z}_{i}^{\top} = \mathbf{B},
\]
we conclude that
\[
\sqrt{\frac{n}{2^{\lambda(n)}}}\left(\widehat{\beta}-\beta\right) = o(1).
\]
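For completeness, here is how the three bounds above combine with the limit of $n^{-1}\widetilde{\mathbf{Z}}^{\top}\widetilde{\mathbf{Z}}$; the only extra ingredient is the standard assumption, implicit here, that $\mathbf{B}$ is invertible, so that $\big(n^{-1}\widetilde{\mathbf{Z}}^{\top}\widetilde{\mathbf{Z}}\big)^{-1}$ stays bounded:
\[
\left\|\sqrt{\frac{n}{2^{\lambda(n)}}}\left(\widehat{\beta}-\beta\right)\right\|
\le \left\|\big(n^{-1}\widetilde{\mathbf{Z}}^{\top}\widetilde{\mathbf{Z}}\big)^{-1}\right\|
\cdot\frac{1}{\sqrt{n\,2^{\lambda(n)}}}
\left\{\left\|\sum_{i}\widetilde{Z}_{i}\,\widetilde{m}_{\mathrm{add}}(X_{i})\right\|
+\left\|\sum_{i}\widetilde{Z}_{i}\sum_{j}W_{nj}(X_{i})\,\varepsilon_{j}\right\|
+\left\|\sum_{i}\widetilde{Z}_{i}\,\varepsilon_{i}\right\|\right\}
= O(1)\cdot o(1).
\]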
Hence, from (65), (66), (67), (78), and hypothesis (H.3), we obtain
\[
\sum_{l=1}^{d}\left|\widehat{\eta}_{l}^{\widehat{\beta}}(x_{l})-\widehat{\eta}_{l}^{\beta}(x_{l})\right|
= o\!\left(\sqrt{\frac{\log n}{n}}\right)\quad\text{a.s.}
\]
Next, observe that
\[
\int_{\mathbb{R}^{d}}\widehat{m}_{n}^{\widehat{\beta}}(z)\,q(z)\,dz-\int_{\mathbb{R}^{d}}\widehat{m}_{n}^{\beta}(z)\,q(z)\,dz
= -\,\frac{2^{\lambda(n)d}}{n}\sum_{i=1}^{n}\frac{Z_{i}^{\top}\!\left(\widehat{\beta}-\beta\right)}{f_{X}(X_{i})}\int_{\mathbb{R}^{d}}K\!\left(2^{\lambda(n)}z,\,2^{\lambda(n)}X_{i}\right)q(z)\,dz.
\]
By (78) and hypothesis (H.3), it follows that
\[
\left|\int_{\mathbb{R}^{d}}\widehat{m}_{n}^{\widehat{\beta}}(z)\,q(z)\,dz-\int_{\mathbb{R}^{d}}\widehat{m}_{n}^{\beta}(z)\,q(z)\,dz\right|
= o\!\left(\sqrt{\frac{\log n}{n}}\right)\quad\text{a.s.}
\]
Therefore, by (64), (79), and (81), we conclude that
\[
\sup_{x\in\mathcal{C}}\left|\widehat{m}_{\mathrm{add}}^{\widehat{\beta}}(x)-\widehat{m}_{\mathrm{add}}^{\beta}(x)\right|
= o\!\left(\sqrt{\frac{\log n}{n}}\right)\quad\text{a.s.}
\]
Finally, combining (37), (48), (49), (50), (63), and (82) completes the proof of the theorem.
Observe that, up to this point, we have focused on the scenario in which the density function $f_{X}(\cdot)$ is known. When $f_{X}(\cdot)$ is unknown, it can be replaced by a nonparametric (wavelet) density estimator $f_{n}(\cdot)$. In this case, we use the following decomposition:
\[
\frac{1}{f_{n}} = \frac{1}{f_{X}} + \frac{f_{X}-f_{n}}{f_{X}\,f_{n}}.
\]
According to [70], under conditions (G.1)–(G.3), it holds on every compact subset $I_{d}\subset\mathbb{R}^{d}$ that
\[
\sup_{x\in I_{d}}\left|f_{n}(x)-f_{X}(x)\right| = O\!\left(\left(\frac{\log n}{n}\right)^{\frac{s}{d+2s}}\right).
\]
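A brief indication of how this rate is used, under the additional (and standard) assumption that $f_{X}$ is bounded away from zero on the compact set under consideration: whenever $1/f_{X}$ appears in the estimators above, replacing it by $1/f_{n}$ produces an extra term controlled by the decomposition just displayed,
\[
\left|\frac{1}{f_{n}(x)}-\frac{1}{f_{X}(x)}\right|
= \frac{\left|f_{X}(x)-f_{n}(x)\right|}{f_{X}(x)\,f_{n}(x)}
= O\!\left(\left(\frac{\log n}{n}\right)^{\frac{s}{d+2s}}\right)
\quad\text{uniformly on } I_{d},
\]
since the denominator stays bounded away from zero for all large $n$ on the event where $f_{n}$ is uniformly close to $f_{X}$.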
By combining this result with the preceding findings, we can now complete the proof of the theorem. □

Author Contributions

K.C.: Conceptualization, Methodology, Investigation, Writing—Original Draft, Writing—Review and Editing; S.B.: Conceptualization, Methodology, Investigation, Writing—Original Draft, Writing—Review and Editing. Both authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

The authors wish to extend their sincere gratitude to the Editor-in-Chief, the Associate Editor, and the six referees for their exceptionally thoughtful and constructive feedback. Their rigorous and detailed review process not only illuminated several oversights in our initial submission but also provided insightful guidance that has significantly enhanced the depth, clarity, and focus of our work. We are deeply appreciative of the academic rigor and kindness embodied in their suggestions, which have been instrumental in refining our presentation and elevating the overall quality of this research.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

Lemma A1. 
Let $\zeta$ and $\gamma$ be positive constants, and consider random variables $\xi_{1},\dots,\xi_{q}$ such that
\[
\zeta \le \left\|\xi_{j}\right\|_{\gamma} < \infty
\]
for all $j=1,\dots,q$. Then there exist independent random variables $\xi_{1}^{*},\dots,\xi_{q}^{*}$ such that $\xi_{j}^{*}$ has the same distribution as $\xi_{j}$ for each $j$, and
\[
\mathbb{P}\left\{\left|\xi_{j}^{*}-\xi_{j}\right|\ge\zeta\right\}
\le 11\left(\frac{\left\|\xi_{j}\right\|_{\gamma}}{\zeta}\right)^{\frac{\gamma}{2\gamma+1}}\alpha(p)^{\frac{2\gamma}{2\gamma+1}},
\]
where $\alpha(p)$ denotes the strong mixing coefficient given by
\[
\alpha(p) = \sup_{t}\ \sup_{A\in\mathcal{F}_{-\infty}^{t},\ B\in\mathcal{F}_{t+p}^{+\infty}}\left|\mathbb{P}(A\cap B)-\mathbb{P}(A)\,\mathbb{P}(B)\right|,
\]
for $p=1,2,\dots$, where $\mathcal{F}_{-\infty}^{t}$ and $\mathcal{F}_{t+p}^{+\infty}$ are the $\sigma$-fields generated by the centered real-valued stochastic process $(X_{s})_{s\in\mathbb{Z}}$, namely by $\{X_{s}:s\le t\}$ and $\{X_{s}:s\ge t+p\}$, respectively.
Proof. 
See Theorem 3 of [127] and p. 202 of [128]. □

Besov Spaces

Following [71], suppose $1\le p,q\le\infty$. Define the shift operator
\[
(S_{\tau}f)(x) = f(x-\tau).
\]
For $0<s<1$, set
\[
\gamma_{s,p,q}(f) = \left\{\int_{\mathbb{R}^{d}}\left(\frac{\left\|S_{\tau}f-f\right\|_{L^{p}}}{\left\|\tau\right\|^{s}}\right)^{q}\frac{d\tau}{\left\|\tau\right\|^{d}}\right\}^{1/q},
\]
and
\[
\gamma_{s,p,\infty}(f) = \sup_{\tau\in\mathbb{R}^{d}}\frac{\left\|S_{\tau}f-f\right\|_{L^{p}}}{\left\|\tau\right\|^{s}}.
\]
If $s=1$, define
\[
\gamma_{1,p,q}(f) = \left\{\int_{\mathbb{R}^{d}}\left(\frac{\left\|S_{\tau}f+S_{-\tau}f-2f\right\|_{L^{p}}}{\left\|\tau\right\|}\right)^{q}\frac{d\tau}{\left\|\tau\right\|^{d}}\right\}^{1/q},
\]
and
\[
\gamma_{1,p,\infty}(f) = \sup_{\tau\in\mathbb{R}^{d}}\frac{\left\|S_{\tau}f+S_{-\tau}f-2f\right\|_{L^{p}}}{\left\|\tau\right\|}.
\]
For $0<s\le 1$ and $1\le p,q\le\infty$, define
\[
B_{s,p,q} = \left\{f\in L^{p}(\mathbb{R}^{d}) : \gamma_{s,p,q}(f)<\infty\right\}.
\]
For $s>1$, write
\[
s = [s] + \{s\}^{+},
\]
where $[s]$ is an integer and $0<\{s\}^{+}\le 1$. Then, $B_{s,p,q}$ consists of all functions $f\in L^{p}(\mathbb{R}^{d})$ such that $D^{j}f\in B_{\{s\}^{+},p,q}$ for every multi-index $j$ with $|j|\le[s]$. The norm is given by
\[
\left\|f\right\|_{B_{s,p,q}} = \left\|f\right\|_{L^{p}} + \sum_{|j|\le[s]}\gamma_{\{s\}^{+},p,q}\big(D^{j}f\big).
\]
Notable examples of Besov spaces include the Sobolev space $H_{2}^{s}=B_{s,2,2}$ and the space of bounded $s$-Lipschitz functions $B_{s,\infty,\infty}$.
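As a concrete illustration (a standard fact, stated here for the univariate case and $0<s<1$ only), for a bounded function $f$ membership in $B_{s,\infty,\infty}$ amounts to a Hölder condition:
\[
\sup_{\tau\neq 0}\frac{\left\|S_{\tau}f-f\right\|_{L^{\infty}}}{|\tau|^{s}}<\infty
\quad\Longleftrightarrow\quad
\left|f(x)-f(y)\right|\le C\,|x-y|^{s}\ \text{for some } C<\infty \text{ and all } x,y\in\mathbb{R},
\]
which is the sense in which $B_{s,\infty,\infty}$ collects the bounded $s$-Lipschitz (Hölder) functions mentioned above.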

References

  1. Liu, Y.; Lu, M.; McMahan, C.S. A penalized likelihood approach for efficiently estimating a partially linear additive transformation model with current status data. Electron. J. Stat. 2021, 15, 2247–2287. [Google Scholar] [CrossRef]
  2. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  3. Härdle, W.; Mammen, E.; Müller, M. Testing parametric versus semiparametric modeling in generalized linear models. J. Am. Stat. Assoc. 1998, 93, 1461–1474. [Google Scholar] [CrossRef]
  4. Müller, M.; Rönz, B. Credit scoring using semiparametric methods. In Measuring Risk in Complex Stochastic Systems; Springer: New York, NY, USA, 2000; pp. 83–97. [Google Scholar]
  5. Zhang, J.; Zhou, Y.; Lin, B.; Yu, Y. Estimation and hypothesis test on partial linear models with additive distortion measurement errors. Comput. Stat. Data Anal. 2017, 112, 114–128. [Google Scholar] [CrossRef]
  6. Wang, L.; Liu, X.; Liang, H.; Carroll, R.J. Estimation and variable selection for generalized additive partial linear models. Ann. Stat. 2011, 39, 1827–1851. [Google Scholar] [CrossRef] [PubMed]
  7. Zhang, X.; Liang, H. Focused information criterion and model averaging for generalized additive partial linear models. Ann. Stat. 2011, 39, 174–200. [Google Scholar] [CrossRef]
  8. Tian, Y.; Jiang, B. Equalities for estimators of partial parameters under linear model with restrictions. J. Multivar. Anal. 2016, 143, 299–313. [Google Scholar] [CrossRef]
  9. Brunk, H.D. On the estimation of parameters restricted by inequalities. Ann. Math. Stat. 1958, 29, 437–454. [Google Scholar] [CrossRef]
  10. Engle, R.F.; Granger, C.W.J.; Rice, J.; Weiss, A. Semiparametric Estimates of the Relation Between Weather and Electricity Sales. J. Am. Stat. Assoc. 1986, 81, 310–320. [Google Scholar] [CrossRef]
  11. Speckman, P. Kernel smoothing in partial linear models. J. R. Stat. Soc. Ser. B 1988, 50, 413–436. [Google Scholar] [CrossRef]
  12. Liang, H.; Härdle, W.; Carroll, R.J. Estimation in a semiparametric partially linear errors-in-variables model. Ann. Stat. 1999, 27, 1519–1535. [Google Scholar] [CrossRef]
  13. Severini, T.A.; Staniswalis, J.G. Quasi-likelihood estimation in semiparametric models. J. Am. Stat. Assoc. 1994, 89, 501–511. [Google Scholar] [CrossRef]
  14. Rice, J. Convergence rates for partially splined models. Stat. Probab. Lett. 1986, 4, 203–208. [Google Scholar] [CrossRef]
  15. Chen, H. Convergence rates for parametric components in a partly linear model. Ann. Stat. 1988, 16, 136–146. [Google Scholar] [CrossRef]
  16. Robinson, P.M. Root-N-consistent semiparametric regression. Econometrica 1988, 56, 931–954. [Google Scholar] [CrossRef]
  17. Chen, H.; Shiau, J.J.H. A two-stage spline smoothing method for partially linear models. J. Stat. Plann. Inference 1991, 27, 187–201. [Google Scholar] [CrossRef]
  18. Eubank, R.L.; Speckman, P. Curve fitting by polynomial-trigonometric regression. Biometrika 1990, 77, 1–9. [Google Scholar] [CrossRef]
  19. Donald, S.G.; Newey, W.K. Series estimation of semilinear models. J. Multivar. Anal. 1994, 50, 30–40. [Google Scholar] [CrossRef]
  20. Shi, P.D.; Li, G.Y. Asymptotic normality of the M-estimators for parametric components in partly linear models. Northeast. Math. J. 1995, 11, 127–138. [Google Scholar]
  21. Shi, P.D.; Li, G.Y. A note on the convergent rates of M-estimates for a partly linear model. Statistics 1995, 26, 27–47. [Google Scholar] [CrossRef]
  22. Hamilton, S.A.; Truong, Y.K. Local linear estimation in partly linear models. J. Multivar. Anal. 1997, 60, 1–19. [Google Scholar] [CrossRef]
  23. Härdle, W.; Liang, H.; Gao, J. Partially Linear Models; Contributions to Statistics; Physica-Verlag: Heidelberg, Germany, 2000; p. x+203. [Google Scholar] [CrossRef]
  24. Lee, E.R.; Han, K.; Park, B.U. Estimation of errors-in-variables partially linear additive models. Stat. Sin. 2018, 28, 2353–2373. [Google Scholar]
  25. Boente, G.; Martínez, A.M. Robust variable selection for partially linear additive models. Stat. Comput. 2024, 34, 201. [Google Scholar] [CrossRef]
  26. Maidman, A.; Wang, L.; Zhou, X.H.; Sherwood, B. Quantile partially linear additive model for data with dropouts and an application to modeling cognitive decline. Stat. Med. 2023, 42, 2729–2745. [Google Scholar] [CrossRef]
  27. Yang, X.; Li, L.; Wu, H.; Xu, W. Composite quantile regression for partially linear additive model with censored responses and its application. Chin. J. Appl. Probab. Stat. 2023, 39, 604–622. [Google Scholar]
  28. Li, Z.; Song, Y. Two-stage Walsh-average-based robust estimation and variable selection for partially linear additive spatial autoregressive models. Braz. J. Probab. Stat. 2023, 37, 667–692. [Google Scholar] [CrossRef]
  29. Mou, X.; Wang, D. Additive partially linear model for pooled biomonitoring data. Comput. Stat. Data Anal. 2024, 190, 107862. [Google Scholar] [CrossRef]
  30. Cai, T.; Li, J.; Zhou, Q.; Yin, S.; Zhang, R. Subgroup detection based on partially linear additive individualized model with missing data in response. Comput. Stat. Data Anal. 2024, 192, 107910. [Google Scholar] [CrossRef]
  31. Zhao, W.; Li, R.; Lian, H. Estimation and variable selection of quantile partially linear additive models for correlated data. J. Stat. Comput. Simul. 2024, 94, 315–345. [Google Scholar] [CrossRef]
  32. Lu, F.; Tian, G.; Yang, J. GMM estimation and variable selection of partially linear additive spatial autoregressive model. Stat. Pap. 2024, 65, 2253–2288. [Google Scholar] [CrossRef]
  33. Lu, M.; Li, C.S.; Wagner, K.D. Penalised estimation of partially linear additive zero-inflated Bernoulli regression models. J. Nonparametr. Stat. 2024, 36, 863–890. [Google Scholar] [CrossRef]
  34. Stone, C.J. Additive regression and other nonparametric models. Ann. Stat. 1985, 13, 689–705. [Google Scholar] [CrossRef]
  35. Stone, C.J. The dimensionality reduction principle for generalized additive models. Ann. Stat. 1986, 14, 590–606. [Google Scholar] [CrossRef]
  36. Fan, J.; Gijbels, I. Local polynomial modelling and its applications. In Monographs on Statistics and Applied Probability; Chapman & Hall: London, UK, 1996; Volume 66, p. xvi+341. [Google Scholar]
  37. Härdle, W. Applied nonparametric regression. In Econometric Society Monographs; Cambridge University Press: Cambridge, UK, 1990; Volume 19, p. xvi+333. [Google Scholar] [CrossRef]
  38. Manzan, S.; Zerom, D. Kernel estimation of a partially linear additive model. Stat. Probab. Lett. 2005, 72, 313–322. [Google Scholar] [CrossRef]
  39. Chamberlain, G. Efficiency bounds for semiparametric regression. Econometrica 1992, 60, 567–596. [Google Scholar] [CrossRef]
  40. Yu, K.; Mammen, E.; Park, B.U. Semi-parametric regression: Efficiency gains from modeling the nonparametric part. Bernoulli 2011, 17, 736–748. [Google Scholar] [CrossRef]
  41. Bouzebda, S.; Didi, S. Additive regression model for stationary and ergodic continuous time processes. Comm. Stat. Theory Methods 2017, 46, 2454–2493. [Google Scholar] [CrossRef]
  42. Bouzebda, S.; Chokri, K. Statistical tests in the partially linear additive regression models. Stat. Methodol. 2014, 19, 4–24. [Google Scholar] [CrossRef]
  43. Chokri, K.; Bouzebda, S. Uniform-in-bandwidth consistency results in the partially linear additive model components estimation. Commun. Stat. Theory Methods 2024, 53, 3383–3424. [Google Scholar] [CrossRef]
  44. Tang, Q.; Tu, W.; Kong, L. Estimation for partial functional partially linear additive model. Comput. Stat. Data Anal. 2023, 177, 107584. [Google Scholar] [CrossRef]
  45. Du, J.; Cao, R.; Kwessi, E.; Zhang, Z. Estimation for generalized partially functional linear additive regression model. J. Appl. Stat. 2019, 46, 914–925. [Google Scholar] [CrossRef]
  46. Ding, F.; Chen, J. Penalized quadratic inference estimation for partially linear additive dynamic panel model with fixed effects. Appl. Math. Ser. A (Chin. Ed.) 2018, 33, 21–35. [Google Scholar]
  47. Liu, L.; Li, J.; Zhang, R. General partially linear additive transformation model with right-censored data. J. Appl. Stat. 2014, 41, 2257–2269. [Google Scholar] [CrossRef]
  48. Ma, S.; Yang, L. Spline-backfitted kernel smoothing of partially linear additive model. J. Stat. Plann. Inference 2011, 141, 204–219. [Google Scholar] [CrossRef]
  49. Meyer, Y. Wavelets and operators. In Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 1992; Volume 37, p. xvi+224. [Google Scholar]
  50. Daubechies, I. Ten lectures on wavelets. In CBMS-NSF Regional Conference Series in Applied Mathematics; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1992; Volume 61, p. xx+357. [Google Scholar] [CrossRef]
  51. Mallat, S. A Wavelet Tour of Signal Processing, 3rd ed.; Elsevier/Academic Press: Amsterdam, The Netherlands, 2009; p. xxii+805. [Google Scholar]
  52. Vidakovic, B. Statistical Modeling by Wavelets; Wiley Series in Probability and Statistics: Applied Probability and Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1999; p. xiv+382. [Google Scholar] [CrossRef]
  53. Härdle, W.; Kerkyacharian, G.; Picard, D.; Tsybakov, A. Wavelets, Approximation, and Statistical Applications; Lecture Notes in Statistics; Springer: New York, NY, USA, 1998; Volume 129, p. xviii+265. [Google Scholar] [CrossRef]
  54. Hall, P.; Patil, P. Formulae for mean integrated squared error of nonlinear wavelet-based density estimators. Ann. Stat. 1995, 23, 905–928. [Google Scholar] [CrossRef]
  55. Hall, P.; Patil, P. On wavelet methods for estimating smooth functions. Bernoulli 1995, 1, 41–58. [Google Scholar] [CrossRef]
  56. Donoho, D.L.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Density estimation by wavelet thresholding. Ann. Stat. 1996, 24, 508–539. [Google Scholar] [CrossRef]
  57. Hall, P.; Penev, S. Cross-validation for choosing resolution level for nonlinear wavelet curve estimators. Bernoulli 2001, 7, 317–341. [Google Scholar] [CrossRef]
  58. Giné, E.; Nickl, R. Uniform limit theorems for wavelet density estimators. Ann. Probab. 2009, 37, 1605–1646. [Google Scholar] [CrossRef]
  59. Rademacher, D.; Krebs, J.; von Sachs, R. Statistical inference for wavelet curve estimators of symmetric positive definite matrices. J. Stat. Plann. Inference 2024, 231, 106140. [Google Scholar] [CrossRef]
  60. Püspöki, Z.; Unser, M. Template-free wavelet-based detection of local symmetries. IEEE Trans. Image Process. 2015, 24, 3009–3018. [Google Scholar] [CrossRef] [PubMed]
  61. Krivoshein, A.V. On construction of multivariate symmetric MRA-based wavelets. Appl. Comput. Harmon. Anal. 2014, 36, 215–238. [Google Scholar] [CrossRef]
  62. Cattani, C. On the existence of wavelet symmetries in archaea DNA. Comput. Math. Methods Med. 2012, 2012, 673934. [Google Scholar] [CrossRef] [PubMed]
  63. Krivoshein, A.V. From frame-like wavelets to wavelet frames keeping approximation properties and symmetry. Appl. Math. Comput. 2019, 344/345, 204–218. [Google Scholar] [CrossRef]
  64. Doosti, H.; Iranmanesh, A.; Arashi, M.; Tabatabaey, S.M.M. On minimaxity of block thresholded wavelets under elliptical symmetry. J. Stat. Plann. Inference 2011, 141, 1526–1534. [Google Scholar] [CrossRef]
  65. Rauhut, H. Wavelet transforms associated to group representations and functions invariant under symmetry groups. Int. J. Wavelets Multiresolut. Inf. Process. 2005, 3, 167–187. [Google Scholar] [CrossRef]
  66. Liu, H. Wavelet transforms and symmetric tube domains. J. Lie Theory 1998, 8, 351–366. [Google Scholar]
  67. Cohen, A.; Schlenker, J.M. Compactly supported bidimensional wavelet bases with hexagonal symmetry. Constr. Approx. 1993, 9, 209–236. [Google Scholar] [CrossRef]
  68. Gai, Y.; Zhang, J. Detection the symmetry or asymmetry of model errors in partial linear models. Comm. Stat. Simul. Comput. 2022, 51, 2217–2234. [Google Scholar] [CrossRef]
  69. Gai, Y.; Zhang, J. Detection of the symmetry of model errors for partial linear single-index models. Comm. Stat. Simul. Comput. 2022, 51, 3410–3427. [Google Scholar] [CrossRef]
  70. Masry, E. Multivariate probability density estimation by wavelet methods: Strong consistency and rates for stationary time series. Stoch. Process. Appl. 1997, 67, 177–193. [Google Scholar] [CrossRef]
  71. Masry, E. Wavelet-based estimation of multivariate regression functions in Besov spaces. J. Nonparametr. Stat. 2000, 12, 283–308. [Google Scholar] [CrossRef]
  72. Masry, E. Probability density estimation from dependent observations using wavelets orthonormal bases. Stat. Probab. Lett. 1994, 21, 181–194. [Google Scholar] [CrossRef]
  73. Chesneau, C.; Doosti, H. A note on the adaptive estimation of a conditional continuous-discrete multivariate density by wavelet methods. Chin. J. Math. 2016, 2016, 6204874. [Google Scholar] [CrossRef]
  74. Chesneau, C.; Doosti, H.; Stone, L. Adaptive wavelet estimation of a function from an m-dependent process with possibly unbounded m. Commun. Stat. Theory Methods 2019, 48, 1123–1135. [Google Scholar] [CrossRef]
  75. Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Commun. Stat. Theory Methods 2017, 46, 1367–1406. [Google Scholar] [CrossRef]
  76. Bouzebda, S.; Didi, S.; El Hajj, L. Multivariate wavelet density and regression estimators for stationary and ergodic continuous time processes: Asymptotic results. Math. Methods Stat. 2015, 24, 163–199. [Google Scholar] [CrossRef]
  77. Allaoui, S.; Bouzebda, S.; Chesneau, C.; Liu, J. Uniform almost sure convergence and asymptotic distribution of the wavelet-based estimators of partial derivatives of multivariate density function under weak dependence. J. Nonparametr. Stat. 2021, 33, 170–196. [Google Scholar] [CrossRef]
  78. Allaoui, S.; Bouzebda, S.; Liu, J. Multivariate wavelet estimators for weakly dependent processes: Strong consistency rate. Comm. Stat. Theory Methods 2023, 52, 8317–8350. [Google Scholar] [CrossRef]
  79. Allaoui, S.; Bouzebda, S.; Liu, J. Asymptotic Distribution of the Wavelet-based estimators of multivariate regression functions in Besov spaces under weak dependence. J. Math. Inequal. 2023, 17, 481–515. [Google Scholar] [CrossRef]
  80. Chokri, K.; Bouzebda, S. Asymptotic normality for the wavelet partially linear additive model components estimation. Comm. Stat. Theory Methods 2024, 53, 8376–8411. [Google Scholar] [CrossRef]
  81. Debbarh, M. Normalité asymptotique de l’estimateur par ondelettes des composantes d’un modèle additif de régression. C. R. Math. Acad. Sci. Paris 2006, 343, 601–606. [Google Scholar] [CrossRef]
  82. Bouzebda, S.; Chokri, K.; Louani, D. Some uniform consistency results in the partially linear additive model components estimation. Commun. Stat. Theory Methods 2016, 45, 1278–1310. [Google Scholar] [CrossRef]
  83. Doosti, H.; Niroumand, H.A. Multivariate stochastic regression estimation by wavelet methods for stationary time series. Pak. J. Stat. 2009, 25, 37–46. [Google Scholar]
  84. Donoho, D.L.; Johnstone, I.M. Minimax estimation via wavelet shrinkage. Ann. Stat. 1998, 26, 879–921. [Google Scholar] [CrossRef]
  85. Donoho, D.L.; Vetterli, M.; DeVore, R.A.; Daubechies, I. Data compression and harmonic analysis. IEEE Trans. Inf. Theory 1998, 44, 2435–2476. [Google Scholar] [CrossRef]
  86. Birgé, L.; Massart, P. An adaptive compression algorithm in Besov spaces. Constr. Approx. 2000, 16, 1–36. [Google Scholar] [CrossRef]
  87. Nickl, R.; Pötscher, B.M. Bracketing metric entropy rates and empirical central limit theorems for function classes of Besov- and Sobolev-type. J. Theoret. Probab. 2007, 20, 177–199. [Google Scholar] [CrossRef]
  88. Nickl, R. Empirical and Gaussian processes on Besov classes. In High Dimensional Probability; IMS Lecture Notes Monograph Series; Institute of Mathematical Statistics: Beachwood, OH, USA, 2006; Volume 51, pp. 185–195. [Google Scholar] [CrossRef]
  89. Schneider, C. Beyond Sobolev and Besov—Regularity of Solutions of PDEs and Their Traces in Function Spaces; Lecture Notes in Mathematics; Springer: Cham, Switzerland, 2021; Volume 2291, p. xviii+327. [Google Scholar] [CrossRef]
  90. Sawano, Y. Theory of Besov spaces. In Developments in Mathematics; Springer: Singapore, 2018; Volume 56, p. xxiii+945. [Google Scholar] [CrossRef]
  91. Peetre, J. New Thoughts on Besov Spaces; Duke University Mathematics Series, No. 1; Duke University, Mathematics Department: Durham, NC, USA, 1976; p. vi+305. [Google Scholar]
  92. Rosenblatt, M. A central limit theorem and a strong mixing condition. Proc. Natl. Acad. Sci. USA 1956, 42, 43–47. [Google Scholar] [CrossRef]
  93. Bradley, R.C. Introduction to Strong Mixing Conditions; Kendrick Press: Heber City, UT, USA, 2007; Volume 3, p. xii+597. [Google Scholar]
  94. Gorodeckiĭ, V.V. The strong mixing property for linearly generated sequences. Teor. Verojatnost. Primenen. 1977, 22, 421–423. [Google Scholar]
  95. Withers, C.S. Conditions for linear processes to be strong-mixing. Z. Wahrsch. Verw. Geb. 1981, 57, 477–480. [Google Scholar] [CrossRef]
  96. Auestad, B.; Tjøstheim, D. Identification of nonlinear time series: First order characterization and order determination. Biometrika 1990, 77, 669–687. [Google Scholar] [CrossRef]
  97. Chen, R.; Tsay, R.S. Functional-coefficient autoregressive models. J. Am. Stat. Assoc. 1993, 88, 298–308. [Google Scholar] [CrossRef]
  98. Masry, E.; Tjøstheim, D. Nonparametric estimation and identification of nonlinear ARCH time series. Econom. Theory 1995, 11, 258–289. [Google Scholar] [CrossRef]
  99. Masry, E.; Tjøstheim, D. Additive nonlinear ARX time series and projection estimates. Econom. Theory 1997, 13, 214–252. [Google Scholar] [CrossRef]
  100. Newey, W.K. Kernel estimation of partial means and a general variance estimator. Econom. Theory 1994, 10, 233–253. [Google Scholar] [CrossRef]
  101. Tjøstheim, D.; Auestad, B.r.H. Nonparametric identification of nonlinear time series: Projections. J. Am. Stat. Assoc. 1994, 89, 1398–1409. [Google Scholar]
  102. Linton, O.; Nielsen, J.P. A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 1995, 82, 93–100. [Google Scholar] [CrossRef]
  103. Camlong-Viot, C.; Sarda, P.; Vieu, P. Additive time series: The kernel integration method. Math. Methods Stat. 2000, 9, 358–375. [Google Scholar]
  104. Camlong-Viot, C.; Rodríguez-Póo, J.M.; Vieu, P. Nonparametric and semiparametric estimation of additive models with both discrete and continuous variables under dependence. In The Art of Semiparametrics; Contributions to Statistics; Physica-Verlag/Springer: Heidelberg, Germany, 2006; pp. 155–178. [Google Scholar] [CrossRef]
  105. Hastie, T.; Tibshirani, R. Generalized additive models. Stat. Sci. 1986, 1, 297–318. [Google Scholar] [CrossRef]
  106. Sperlich, S.; Linton, O.B.; Härdle, W. Integration and backfitting methods in additive models–Finite sample properties and comparison. Test 1999, 8, 419–458. [Google Scholar] [CrossRef]
  107. Opsomer, J.D.; Ruppert, D. Fitting a bivariate additive model by local polynomial regression. Ann. Stat. 1997, 25, 186–211. [Google Scholar] [CrossRef]
  108. Mammen, E.; Linton, O.; Nielsen, J. The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. Ann. Stat. 1999, 27, 1443–1490. [Google Scholar] [CrossRef]
  109. Fan, J.; Härdle, W.; Mammen, E. Direct estimation of low-dimensional components in additive models. Ann. Stat. 1998, 26, 943–971. [Google Scholar] [CrossRef]
  110. Einmahl, U.; Mason, D.M. Uniform in bandwidth consistency of kernel-type function estimators. Ann. Stat. 2005, 33, 1380–1403. [Google Scholar] [CrossRef]
  111. Deheuvels, P.; Mason, D.M. General asymptotic confidence bands based on kernel-type function estimators. Stat. Inference Stoch. Process. 2004, 7, 225–277. [Google Scholar] [CrossRef]
  112. Stute, W. Conditional empirical processes. Ann. Stat. 1986, 14, 638–647. [Google Scholar] [CrossRef]
  113. Park, H.; Stefanski, L.A. Relative-error prediction. Stat. Probab. Lett. 1998, 40, 227–236. [Google Scholar] [CrossRef]
  114. Jones, M.C.; Park, H.; Shin, K.I.; Vines, S.K.; Jeong, S.O. Relative error prediction via kernel regression smoothers. J. Stat. Plann. Inference 2008, 138, 2887–2898. [Google Scholar] [CrossRef]
  115. Demongeot, J.; Hamie, A.; Laksaci, A.; Rachdi, M. Relative-error prediction in nonparametric functional statistics: Theory and practice. J. Multivar. Anal. 2016, 146, 261–268. [Google Scholar] [CrossRef]
  116. Bouzebda, S.; Taachouche, N. Oracle inequalities and upper bounds for kernel conditional U-statistics estimators on manifolds and more general metric spaces associated with operators. Stochastics 2024, 96, 2135–2198. [Google Scholar] [CrossRef]
  117. Bellman, R. Adaptive Control Processes: A Guided Tour; Princeton University Press: Princeton, NJ, USA, 1961; p. xvi+255. [Google Scholar]
  118. Scott, D.W.; Wand, M.P. Feasibility of multivariate density estimates. Biometrika 1991, 78, 197–205. [Google Scholar] [CrossRef]
  119. Bouzebda, S. Weak convergence of the conditional single index U-statistics for locally stationary functional time series. AIMS Math. 2024, 9, 14807–14898. [Google Scholar] [CrossRef]
  120. Attaoui, S.; Bentata, B.; Bouzebda, S.; Laksaci, A. The strong consistency and asymptotic normality of the kernel estimator type in functional single index model in presence of censored data. AIMS Math. 2024, 9, 7340–7371. [Google Scholar] [CrossRef]
  121. McDonald, D.J.; Shalizi, C.R.; Schervish, M. Estimating beta-mixing coefficients via histograms. Electron. J. Stat. 2015, 9, 2855–2883. [Google Scholar] [CrossRef]
  122. Khaleghi, A.; Lugosi, G. Inferring the mixing properties of an ergodic process. arXiv 2021, arXiv:2106.07054. [Google Scholar]
  123. Li, X.; Wang, L.; Nettleton, D. Additive partially linear models for ultra-high-dimensional regression. Stat 2019, 8, e223. [Google Scholar] [CrossRef]
  124. Hoeffding, W. Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 1963, 58, 13–30. [Google Scholar] [CrossRef]
  125. Camlong-Viot, C. Vers un test d’additivité en régression non paramétrique sous des conditions de mélange. C. R. Acad. Sci. Paris Sér. I Math. 2001, 333, 877–880. [Google Scholar] [CrossRef]
  126. Fu, K.; Zhang, L. LIL behavior for B-valued strong mixing random variables. Sci. China Math. 2011, 54, 785–792. [Google Scholar] [CrossRef]
  127. Bradley, R.C. Approximation theorems for strongly mixing random variables. Mich. Math. J. 1983, 30, 69–81. [Google Scholar] [CrossRef]
  128. Tran, L.T. The L1 convergence of kernel density estimates under dependence. Canad. J. Stat. 1989, 17, 197–208. [Google Scholar] [CrossRef]