Article

Empirical-Likelihood-Based Inference for Partially Linear Models

Haiyan Su 1,* and Linlin Chen 2

1 School of Computing, Montclair State University, Montclair, NJ 07043, USA
2 Department of Mathematics and Statistics, Rochester Institute of Technology, Rochester, NY 14623, USA
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(1), 162; https://doi.org/10.3390/math12010162
Submission received: 18 November 2023 / Revised: 31 December 2023 / Accepted: 2 January 2024 / Published: 4 January 2024
(This article belongs to the Special Issue Parametric and Nonparametric Statistics: From Theory to Applications)

Abstract

Partially linear models find extensive application in biometrics, econometrics, the social sciences, and various other fields due to their versatility in accommodating both parametric and nonparametric elements. This study establishes statistical inference for the parametric component of these models using a nonparametric empirical likelihood approach. The proposed method involves a projection step to eliminate the nuisance nonparametric component and employs an empirical-likelihood-based technique, together with the Bartlett correction, to improve the coverage probability of the confidence interval for the parameter of interest. The method is robust to both normally and non-normally distributed errors. The proposed empirical likelihood ratio statistic converges to a limiting chi-square distribution under certain regularity conditions. Simulation studies demonstrate that the method provides better inference, in terms of coverage probabilities, than the conventional normal-approximation-based method. The proposed method is illustrated with an analysis of the Boston housing data.

1. Introduction

In many practical situations, linear models are not flexible enough to capture the underlying relationship between the response variable and the associated covariates, especially when the response variable Y is not linearly related to all of them. For example, suppose one is interested in estimating the relationship between an outcome variable Y and vectors of variables X and Z. The researcher is comfortable modeling a linear function of X but hesitates to extend the linearity to Z. One example, given by Engle et al. [1], is the effect of temperature on electricity consumption in four cities. They modeled average monthly electricity consumption as the sum of a smooth function of monthly temperature and a linear function of the monthly price of electricity, income, and 11 monthly dummy variables. It is natural to impose linearity on the part of the regression function involving household characteristics and a nonlinear function on temperature, since electricity consumption tends to be higher at extreme temperatures and lower at moderate ones. A partially linear model provides a good fit for these types of data because it allows a regression function that is linear in some variables while letting the effect of others be nonlinear.
A partially linear regression model is defined as
$$Y_i = X_i \beta + g(Z_i) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{1}$$
where the $Y_i$'s are scalar response variables, $X_i = (x_{i1}, \ldots, x_{ip})$ is a known p-variate covariate vector, $Z_i$ is a scalar explanatory variable, and $g(\cdot)$ is the smooth part of the model, assumed to represent a smooth, unparameterized functional relationship. $\beta = (\beta_1, \ldots, \beta_p)^{\top}$ is a vector of unknown parameters, and $\varepsilon_1, \ldots, \varepsilon_n$ are independent random errors with mean zero and finite variance $\sigma^2$ given the covariates X and Z.
The partially linear model in Equation (1) is a semiparametric model, since it contains both parametric and nonparametric components. It is more flexible for interpreting the effect of each linear covariate and allows one to single out particular variables that can have nonlinear effects. It may also be preferable to a completely nonparametric model because of the well-known "curse of dimensionality". Computationally, partially linear models are considerably easier than additive models, for which iterative approaches such as the backfitting algorithm [2] or marginal integration [3] are necessary.
Partially linear models are widely used in biometrics, econometrics, the social sciences, and other fields (see [1,4]) and have been studied extensively with respect to estimating β and $g(\cdot)$. For example, Wahba [5], Engle et al. [1], and Green et al. [6] described penalized spline estimates of β and $g(\cdot)$. Heckman [7] and Rice [8] proposed polynomial methods. Speckman [9] described the kernel method. Chen and Shiau [10] used a smoothing spline method. Chen [11] proposed the projection method. For further discussion of partially linear models, we refer to the summary in Härdle et al. [12].
In most cases, investigators are mainly interested in the parameter β and treat $g(\cdot)$ as a nuisance parameter [13]. Estimating confidence intervals for the parametric components of partially linear models via a backfitting algorithm or marginal integration can be computationally heavy. Severini and Staniswalis [14] derived the asymptotic properties of their proposed estimators of β and $g(\cdot)$ under mild regularity conditions. These asymptotic properties provide a foundation for constructing asymptotically accurate confidence intervals for the parameters. In practice, however, the finite-sample performance of these confidence intervals may be less satisfactory because of the complex structure of the covariance matrix, which requires plug-in estimates for multiple parameters. The linear components of partially linear models can also be estimated using generalized additive models [15], but the results depend on the distribution family specified in the gam function in R; when the wrong distribution family is chosen, the results can be badly biased. Confidence intervals for the parametric components can also be constructed from the asymptotic normal distribution, but this approach may fail when the normality assumption does not hold or when the sample size is small.
Empirical likelihood provides a good nonparametric alternative for statistical inference when the normality assumption fails or the distribution is unspecified. Its advantages over the bootstrap and the jackknife stem from it being a nonparametric method of inference based on a data-driven likelihood ratio function. As a combination of nonparametric and likelihood methods, it requires no specification of a family of distributions for the data and yet, like parametric likelihood methods, determines the shape of confidence regions automatically [16]. This property makes it a serious competitor to other nonparametric methods such as the bootstrap and the jackknife. Although empirical likelihood is a very useful tool for statistical inference, the conventional empirical likelihood method and the profile empirical likelihood have limitations when constructing confidence intervals for individual elements of a large parameter vector.
Motivated by the above concerns, this paper develops an empirical-likelihood-based procedure for making inferences about a large parameter vector β in the partially linear model of Equation (1) by incorporating the projection method. The proposed method has two main advantages. First, it does not require distributional assumptions. Second, we provide theoretical justification that the method applies to partially linear models, and the computation is relatively straightforward because no asymptotic variance estimation is required. After the Bartlett correction, the coverage probability of the confidence interval is improved and exceeds that of normal-approximation-based methods in most cases.
The structure of the paper is as follows. Section 2 gives the formulation of the empirical likelihood for the parameter of interest and the Bartlett correction procedure for the proposed method. Section 3 studies the performance of the proposed method through simulation studies and illustrates it with a real-data example. Section 4 concludes. All proofs are given in Appendix A.

2. Materials and Methods

2.1. Model Formulation

Since the interest in this paper lies in inference for β alone in the partially linear model, the nuisance parameter $g(\cdot)$ must first be removed. This is implemented via the projection principle [17,18]. Y and X are first regressed on Z using a nonparametric regression method, where $Y = (Y_1, \ldots, Y_n)^{\top}$, $Z = (Z_1, \ldots, Z_n)^{\top}$, and X is an $n \times p$ covariate matrix. Denote the nonparametric regressions of Y on Z and of X on Z by $m_Y(Z)$ and $m_X(Z)$, respectively. Without loss of generality, let X be one-dimensional here (for a multidimensional X, $E(X_i \mid Z)$ can be obtained for each column $X_i$ of X separately); the effect of Z on Y and X can then be removed using the regression residuals of Y and X given Z. For notational simplicity, the matrix form of the partially linear model is used:
$$Y = X\beta + g(Z) + \varepsilon. \tag{2}$$
The first step is to regress Y and X on Z, which gives the following equation:
$$m_Y(Z) = m_X(Z)\beta + g(Z). \tag{3}$$
Then, Equation (3) is subtracted from the original model (2), and the residual model is obtained as follows:
$$Y - m_Y(Z) = \{X - m_X(Z)\}\beta + \varepsilon. \tag{4}$$
Denote $A^{\otimes 2} = AA^{\top}$ and $\tilde{\zeta} = \zeta - m_{\zeta}(Z)$; for example, $\tilde{X} = X - m_X(Z)$. Assuming $\tilde{X}$ has full rank, following Speckman [9], the least squares estimator of β when $m_X(Z)$ and $m_Y(Z)$ are known is given by Equation (5):
$$\hat{\beta} = \left[\sum_{i=1}^{n}\{X_i - m_X(Z_i)\}^{\otimes 2}\right]^{-1}\sum_{i=1}^{n}\{X_i - m_X(Z_i)\}\{Y_i - m_Y(Z_i)\}. \tag{5}$$
This formula cannot be applied directly, since $m_X(Z)$ and $m_Y(Z)$ must be estimated appropriately. Many methods are available for estimating $m_X(Z)$ and $m_Y(Z)$, including local constant smoothers [9], higher-order local polynomial estimators [19], kernel methods with varying bandwidths, and smoothing and regression splines. Fan and Gijbels [19] showed that within the class of linear estimators, which includes kernel and spline estimates, local linear estimates achieve the best possible rates of convergence. Because of these desirable properties, local linear smoothers with fixed bandwidths are used here for the nonparametric regressions of Y and X on Z. Let $\hat{m}_X(Z)$ and $\hat{m}_Y(Z)$ be the local linear nonparametric regression estimators of $m_X(Z)$ and $m_Y(Z)$, let $K(\cdot)$ be a symmetric density function, let h be a suitable bandwidth, and define $K_h(z) = K(z/h)/h$. The estimators then take the form given by Fan and Gijbels [19]:
$$\hat{m}_X(Z) = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i} \quad \text{and} \quad \hat{m}_Y(Z) = \frac{\sum_{i=1}^{n} w_i Y_i}{\sum_{i=1}^{n} w_i},$$
$$w_i = K_h(Z_i - Z)\{S_{n,2} - (Z_i - Z)S_{n,1}\},$$
where $S_{n,j} = \sum_{i=1}^{n} K_h(Z_i - Z)(Z_i - Z)^j$. In the estimating procedure, $m_X(Z)$ and $m_Y(Z)$ are replaced with their estimates $\hat{m}_X(Z)$ and $\hat{m}_Y(Z)$ (a Gaussian kernel is one example of a kernel function that can be used), and the empirical likelihood estimator of β must satisfy the following estimating equation:
$$\sum_{i=1}^{n}\{X_i - \hat{m}_X(Z_i)\}^{\top}\left[Y_i - \hat{m}_Y(Z_i) - \{X_i - \hat{m}_X(Z_i)\}\beta\right] = 0.$$
This implies that the estimator of β can be obtained as
$$\hat{\beta}^{*} = \left[\sum_{i=1}^{n}\{X_i - \hat{m}_X(Z_i)\}^{\otimes 2}\right]^{-1}\sum_{i=1}^{n}\{X_i - \hat{m}_X(Z_i)\}\{Y_i - \hat{m}_Y(Z_i)\}.$$
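To make the procedure concrete, here is a minimal sketch of the local linear smoother with the Fan–Gijbels weights $w_i$ above and the projected least squares estimator $\hat{\beta}^{*}$, assuming a Gaussian kernel and a fixed bandwidth h; the function names and structure are illustrative rather than the authors' code.

```python
import numpy as np

def local_linear(z_grid, Z, V, h):
    """Local linear estimate of m_V(z) at each z in z_grid (Fan-Gijbels weights)."""
    est = np.empty((len(z_grid),) + V.shape[1:])
    for k, z in enumerate(z_grid):
        u = Z - z
        Kh = np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))  # Gaussian K_h
        S1 = np.sum(Kh * u)                  # S_{n,1}
        S2 = np.sum(Kh * u ** 2)             # S_{n,2}
        w = Kh * (S2 - u * S1)               # w_i = K_h(Z_i - z){S_{n,2} - (Z_i - z)S_{n,1}}
        est[k] = np.tensordot(w, V, axes=(0, 0)) / np.sum(w)
    return est

def beta_star(X, Y, Z, h):
    """Least squares on the partial residuals: the projected estimator above."""
    Xt = X - local_linear(Z, Z, X, h)        # X_i - m_hat_X(Z_i)
    Yt = Y - local_linear(Z, Z, Y, h)        # Y_i - m_hat_Y(Z_i)
    return np.linalg.solve(Xt.T @ Xt, Xt.T @ Yt)
```

The smoother is evaluated at the observed $Z_i$ themselves, so the partial residuals can be formed in a single pass.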
Next, the empirical likelihood principle is applied to construct statistical inference for β. Let $p_i$ be the probability assigned to $(X_i, Z_i, Y_i)$. The empirical likelihood ratio function for β can be expressed as:
$$R_n(\beta) = \sup_{p_i}\left\{\prod_{i=1}^{n} n p_i \;\middle|\; \sum_{i=1}^{n} p_i\{X_i - \hat{m}_X(Z_i)\}^{\top}\left[Y_i - \hat{m}_Y(Z_i) - \{X_i - \hat{m}_X(Z_i)\}\beta\right] = 0,\; p_i \geq 0,\; \sum_{i=1}^{n} p_i = 1\right\}.$$
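Computationally, $R_n(\beta)$ is profiled through Owen's Lagrange multiplier dual (see Appendix A): the optimal weights are $p_i = 1/[n(1 + \lambda^{\top}\Omega_i)]$, where λ solves the dual problem. A minimal sketch, assuming the partial residuals have already been formed with the smoother above (helper names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def el_scores(Xt, Yt, beta):
    """Omega_i = {X_i - m_hat_X(Z_i)}^T [Y_i - m_hat_Y(Z_i) - {X_i - m_hat_X(Z_i)} beta]."""
    return Xt * (Yt - Xt @ beta)[:, None]            # n x p matrix of Omega_i

def neg2_log_Rn(Omega):
    """-2 log R_n via the dual: the multiplier maximizes sum_i log(1 + lam' Omega_i)."""
    n, p = Omega.shape

    def neg_dual(lam):
        t = 1.0 + Omega @ lam
        # infeasible lam (some n*p_i <= 0) is rejected with a large penalty
        return np.inf if np.any(t <= 1e-8) else -np.sum(np.log(t))

    lam = minimize(neg_dual, np.zeros(p), method="Nelder-Mead").x
    return 2.0 * np.sum(np.log(1.0 + Omega @ lam))
```

Since $-2\log\{R_n(\beta)\} = 2\sum_i \log(1 + \lambda^{\top}\Omega_i)$ at the solution, no variance estimate is needed anywhere in this computation.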
We establish the asymptotic distribution of $-2\log\{R_n(\beta)\}$ under the following assumptions:
Assumption 1. 
$E(\|X\|^4) < \infty$, $E(\|X\|^2 Y^2) < \infty$, and $E(X^{\top}X)$ is nonsingular; X and Z are correlated.
Assumption 2. 
The bandwidths used in estimating $m_X(Z)$ and $m_Y(Z)$ are of order $n^{-1/5}$.
Assumption 3. 
The function $K(\cdot)$ is a bounded symmetric density function with compact support satisfying $\int K(u)\,du = 1$, $\int uK(u)\,du = 0$, and $\int u^2 K(u)\,du = 1$.
Assumption 4. 
The functions $m_X(Z)$ and $m_Y(Z)$ have bounded and continuous second derivatives.
Assumption 5. 
The density function of Z, $f_Z(z)$, is bounded away from zero and has bounded continuous second derivatives.
Theorem 1. 
$-2\log\{R_n(\beta)\}$ converges in distribution to a chi-squared distribution with p degrees of freedom under Assumptions 1–5.
The proof of Theorem 1 is given in Appendix A. A confidence region for β can be constructed from Theorem 1 and further adjusted using the Bartlett correction [20].
When β is a vector (i.e., when X is an $n \times p$ matrix) and we are interested in a subset of the parameter vector β, say the first element $\beta_1$, we can apply the projection method again: we regress $\hat{X}_1$, the first column of $\hat{X} = X - \hat{m}_X(Z)$, onto the space of $\hat{X}_{-1}$, the remaining columns of $\hat{X}$. Similarly, we apply the same projection principle to $\hat{Y} = Y - \hat{m}_Y(Z)$ against $\hat{X}_{-1}$. This yields a new residual model; that is, $\beta_1$ should satisfy the following estimating equation:
$$\frac{1}{n}\sum_{i=1}^{n}\{\hat{X}_1 - \hat{E}(\hat{X}_1 \mid \hat{X}_{-1})\}_i^{\top}\left[\{\hat{Y} - \hat{E}(\hat{Y} \mid \hat{X}_{-1})\}_i - \{\hat{X}_1 - \hat{E}(\hat{X}_1 \mid \hat{X}_{-1})\}_i\,\beta_1\right] = 0.$$
Let $p_i$ be the probability assigned to $(Y_i, X_i, Z_i)$, where these $p_i$ may differ from the $p_i$ in Theorem 1. The empirical likelihood ratio function for $\beta_1$ can be written as:
$$R_n(\beta_1) = \sup_{p_i}\left\{\prod_{i=1}^{n} n p_i \;\middle|\; \sum_{i=1}^{n} p_i\{\hat{X}_1 - \hat{E}(\hat{X}_1 \mid \hat{X}_{-1})\}_i^{\top}\left[\{\hat{Y} - \hat{E}(\hat{Y} \mid \hat{X}_{-1})\}_i - \{\hat{X}_1 - \hat{E}(\hat{X}_1 \mid \hat{X}_{-1})\}_i\,\beta_1\right] = 0,\; p_i \geq 0,\; \sum_{i=1}^{n} p_i = 1\right\}.$$
Theorem 2. 
$-2\log\{R_n(\beta_1)\}$ converges in distribution to a chi-squared distribution with 1 degree of freedom under Assumptions 1–5.
The proof of Theorem 2 is given in Appendix A. Based on Theorem 2, the $100(1-\alpha)\%$ empirical likelihood confidence interval for $\beta_1$ is obtained as:
$$\{\beta_1 : -2\log\{R_n(\beta_1)\} \leq c_{\alpha}\},$$
where $c_{\alpha}$ is the $(1-\alpha)$ quantile of the $\chi^2_1$ distribution.
The 100 ( 1 α ) % confidence interval for other components of β can be constructed similarly.
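Numerically, the interval can be found by inverting the statistic with a root finder. A sketch, assuming x1 and y1 hold the scalar covariate and response after the second projection and reusing the hypothetical el_scores / neg2_log_Rn helpers sketched above:

```python
import numpy as np
from scipy.stats import chi2
from scipy.optimize import brentq

def el_ci_beta1(x1, y1, alpha=0.05):
    c_alpha = chi2.ppf(1.0 - alpha, df=1)              # chi^2_1 critical value
    b_hat = np.sum(x1 * y1) / np.sum(x1 ** 2)          # point estimate on the residual model
    f = lambda b: neg2_log_Rn(el_scores(x1[:, None], y1, np.array([b]))) - c_alpha
    step = 1.0 / np.sqrt(len(y1))                      # heuristic bracketing step
    lo, hi = b_hat - step, b_hat + step
    while f(lo) < 0:                                   # expand until the statistic crosses c_alpha
        lo -= step
    while f(hi) < 0:
        hi += step
    return brentq(f, lo, b_hat), brentq(f, b_hat, hi)  # endpoints solving -2 log R_n = c_alpha
```

Since the statistic equals zero at the maximum EL estimator b_hat and grows away from it, the two roots bracket the interval; the resulting interval need not be symmetric about b_hat.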

2.2. Bartlett Correction

To further improve the accuracy of the inference, the empirical likelihood ratio may be Bartlett corrected, which reduces the coverage error below the usual order $O(n^{-1})$ [20]. The Bartlett correction effectively controls the coverage error of the confidence interval, providing more accurate inference and reducing the chance of obtaining intervals that do not contain the true parameter value. The basic idea is to multiply the $\chi^2$ threshold by a constant $(1 + B_c/n)$ rather than 1, where $B_c$ is the Bartlett correction constant. Because an exact expression for $B_c$ is very difficult to obtain, we estimate $(1 + B_c/n)$ with a bootstrap procedure, which has been applied successfully in a more complex setting by Chen and Cui [21].
The Bartlett-corrected empirical likelihood confidence interval for a parameter of interest $\beta_1$ in the partially linear model of Equation (1) is constructed by the following procedure; the procedure for another component of β, say $\beta_2$, is similar.
  • First, nonparametric regression is used to regress Y and X on the nonparametric component Z. The resulting partial residuals follow a linear model of the form $Y - m_Y(Z) = \{X - m_X(Z)\}\beta + \varepsilon$. We use $\hat{m}_Y(Z)$ and $\hat{m}_X(Z)$ in place of $m_Y(Z)$ and $m_X(Z)$ in the estimating procedure.
  • Then, the first column of $\hat{X}$ (denoted $\hat{X}_1$) is regressed on the remaining columns (denoted $\hat{X}_{-1}$). The residual serves as the new fixed covariate for $\beta_1$, and the residual from regressing $\hat{Y}$ on $\hat{X}_{-1}$ serves as the new response variable. The residual model is
$$\{\hat{Y} - E(\hat{Y} \mid \hat{X}_{-1})\} = \{\hat{X}_1 - E(\hat{X}_1 \mid \hat{X}_{-1})\}\beta_1 + \varepsilon.$$
  • We treat this residual model as a new linear model. The bootstrap procedure for estimating the Bartlett correction factor in the new linear model is as follows:
    (a) Generate bootstrap resamples of size n by sampling with replacement from $\{\hat{Y} - E(\hat{Y} \mid \hat{X}_{-1})\}_{i=1}^{n}$ and $\{\hat{X}_1 - E(\hat{X}_1 \mid \hat{X}_{-1})\}_{i=1}^{n}$ obtained after the projection; then calculate $-2\log\{R_n^{*}(\hat{\beta}_1)\}$ from the resamples, where $\hat{\beta}_1$ is the global maximum empirical likelihood estimator of $\beta_1$ based on the original sample.
    (b) Repeat (a) B times to obtain $-2\log\{R_n^{*1}(\hat{\beta}_1)\}, -2\log\{R_n^{*2}(\hat{\beta}_1)\}, \ldots, -2\log\{R_n^{*B}(\hat{\beta}_1)\}$ and their average $B^{-1}\sum_{b=1}^{B} -2\log\{R_n^{*b}(\hat{\beta}_1)\}$, which is the bootstrap estimator of $E[-2\log\{R_n(\hat{\beta}_1)\}]$.
The bootstrap estimator $\hat{\tau}$ of $\tau = E[-2\log\{R_n(\beta_1)\}] \approx 1 + B_c/n$ is therefore $B^{-1}\sum_{b=1}^{B} -2\log\{R_n^{*b}(\hat{\beta}_1)\}$. Consequently, the Bartlett-corrected confidence region is constructed as
$$CI_{\alpha} = \{\beta_1 : -2\log\{R_n(\beta_1)\} \leq \hat{\tau}\, c_{\alpha}\}.$$
The Bartlett corrected confidence interval for β 1 is thus constructed.
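A sketch of steps (a) and (b), assuming x1 and y1 hold the doubly projected covariate and response, resampling the (x, y) pairs jointly, and reusing the hypothetical helpers from Section 2.1:

```python
import numpy as np

def bartlett_factor(x1, y1, B=500, seed=None):
    """Bootstrap estimate of tau = E[-2 log R_n(beta_1-hat)] ~ 1 + B_c/n."""
    rng = np.random.default_rng(seed)
    n = len(y1)
    b_hat = np.sum(x1 * y1) / np.sum(x1 ** 2)       # global maximum EL estimator of beta_1
    stats = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)            # resample indices with replacement
        stats[b] = neg2_log_Rn(el_scores(x1[idx][:, None], y1[idx], np.array([b_hat])))
    return stats.mean()
```

The corrected interval then uses the inflated threshold $\hat{\tau} c_{\alpha}$ in place of $c_{\alpha}$ when inverting the statistic.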

3. Results

3.1. Simulation Studies

In the simulation studies, we examined the performance of the proposed method for inference on the parameter of interest β in the partially linear model (1). We first simulated Z from a Unif(0, 1) distribution with sample size n. The true value was set to $\beta = (2, 5, 7, 4)^{\top}$, and we aimed to estimate the first component of β. X was set to be the sum of two matrices, $M_1$ and $M_2$, where $M_1$ was composed of the column vectors $1.5\exp(1.5z)$, $5z$, $5z$, and $3z + z^2$, and $M_2$ was a matrix of error terms consisting of n samples from a scaled multivariate normal distribution with zero mean and a compound symmetry covariance matrix with diagonal 1 and off-diagonal 0.4; the scale parameter was 0.5. The columns of X were functions of Z and thus correlated. The nonparametric component was $g(Z) = \sin(Z)$. Two cases for the distribution of the error term ε were considered (a data-generating sketch follows the two cases):
Case 1: ε follows a normal distribution with mean 0 and variance $\sigma^2 = 1$.
Case 2: ε follows a scaled log-normal distribution such that ε has mean 0 and variance $\sigma^2 = 1$.
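For reference, here is a sketch of this data-generating process; how the 0.5 scale enters the covariance and the standardization of the log-normal draws are our reading of the description above:

```python
import numpy as np

def simulate(n, case=1, seed=None):
    rng = np.random.default_rng(seed)
    beta = np.array([2.0, 5.0, 7.0, 4.0])
    Z = rng.uniform(0.0, 1.0, n)
    M1 = np.column_stack([1.5 * np.exp(1.5 * Z), 5 * Z, 5 * Z, 3 * Z + Z ** 2])
    cov = 0.5 * (0.6 * np.eye(4) + 0.4 * np.ones((4, 4)))   # scaled compound symmetry
    M2 = rng.multivariate_normal(np.zeros(4), cov, n)
    X = M1 + M2                                             # columns correlated through Z
    if case == 1:
        eps = rng.normal(0.0, 1.0, n)                       # Case 1: N(0, 1) errors
    else:
        m, v = np.exp(0.5), (np.e - 1.0) * np.e             # LN(0,1) mean and variance
        eps = (rng.lognormal(0.0, 1.0, n) - m) / np.sqrt(v) # Case 2: mean 0, variance 1
    Y = X @ beta + np.sin(Z) + eps
    return X, Y, Z
```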
Sample sizes of 50, 100, and 200 were considered. In each setting, we generated 1000 independent data sets and constructed the 95% confidence interval for each. In estimating the nonparametric regressions $m_Y(Z)$ and $m_X(Z)$, the direct plug-in method was used to select the bandwidth of a local linear Gaussian kernel regression estimate, as described by Ruppert, Sheather, and Wand [22]. The proposed method was compared with the normal-based method and the generalized additive model (gam) method [15].
Table 1 gives the average results over the 1000 simulations (the endpoints of the confidence intervals were taken to be the medians of the 1000 simulated endpoints, and the interval lengths were computed as the difference between the two endpoints). In Table 1, Est is the estimated $\beta_1$ value; Norm, Gam, EL, and ELb refer to the normal-based method, the gam function in R, the empirical-likelihood-based method without Bartlett correction, and the empirical-likelihood-based method with Bartlett correction, respectively. Length and coverage probability refer to the length and coverage probability of the confidence intervals constructed by the four methods. Note that each confidence interval based on the normal approximation is symmetric, whereas a confidence interval based on empirical likelihood need not be. In the simulation, the Gaussian family was used within the gam function under both error cases.
The simulation results in Table 1 indicate that the Bartlett correction indeed improves the statistical inference. The coverage probability improved after the Bartlett correction, especially when the sample size was small, where the normal approximation may not be appropriate. For small samples (for example, n = 50), the proposed method tends to widen the confidence interval to achieve better coverage of the true parameter; in that case, the length of the Bartlett-corrected interval exceeds those of the normal approximation, the gam method, and the empirical likelihood without Bartlett correction, but its coverage probability is closest to the nominal 95% level. As the sample size grows, the length of the interval from the proposed method becomes close to or shorter than that of the normal approximation, while still attaining slightly better or comparable coverage relative to the normal approximation and gam methods.

3.2. A Real Study Example

The proposed method is illustrated with an application to the Boston housing data set, which was obtained from the StatLib archive and has been used extensively in regression analysis. The data set contains the median value of owner-occupied homes in 506 US census tracts in the Boston area in 1970, together with several variables that might explain the variation in housing values. Based on correlation and multicollinearity analyses, we fit a partially linear model in which the variable of interest, MEDV (median value of owner-occupied homes in USD 1000), is linearly related to the predictors PTRATIO (pupil–teacher ratio by town) and RM (average number of rooms per dwelling) and nonlinearly related to LSTAT (% lower status of the population). The partially linear model has the following form:
$$MEDV = \beta_0 + \beta_1\, PTRATIO + \beta_2\, RM + g(LSTAT) + \varepsilon.$$
The proposed method was used to construct the 95% confidence interval for $\beta_1$. The proposed empirical-likelihood-based, Bartlett-corrected 95% confidence interval for $\beta_1$ was (2.375, 4.656), while the normal-based 95% confidence interval was (2.406, 4.502). Both methods indicated a positive linear relationship between PTRATIO and MEDV, with the proposed method's interval slightly wider than the normal-based one. Given our simulation results on coverage probability at large sample sizes, the confidence interval obtained from the proposed method is comparable with the normal-based interval and is trustworthy.
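For completeness, an end-to-end sketch of this analysis, assuming the hypothetical helpers from Section 2 are in scope; the file name, column names (following the classic StatLib layout), and the rule-of-thumb bandwidth are illustrative choices, not the paper's direct plug-in selector:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("boston.csv")                    # hypothetical local copy of the StatLib data
Y = df["MEDV"].to_numpy()
Z = df["LSTAT"].to_numpy()
X = df[["PTRATIO", "RM"]].to_numpy()

h = 1.06 * Z.std() * len(Z) ** (-0.2)             # rule-of-thumb bandwidth (assumption)
Xt = X - local_linear(Z, Z, X, h)                 # partial out g(LSTAT) from X
Yt = Y - local_linear(Z, Z, Y, h)                 # ... and from Y

# Second projection: remove the RM column from the PTRATIO column and the response.
A = Xt[:, [1]]
x1 = Xt[:, 0] - A @ np.linalg.lstsq(A, Xt[:, 0], rcond=None)[0]
y1 = Yt - A @ np.linalg.lstsq(A, Yt, rcond=None)[0]

tau = bartlett_factor(x1, y1)                     # Bartlett factor tau-hat
print(el_ci_beta1(x1, y1))                        # uncorrected 95% EL interval for beta_1
```

For the Bartlett-corrected interval, the threshold compared against the statistic inside el_ci_beta1 would be scaled to tau * c_alpha.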

4. Discussion

In this paper, an empirical-likelihood-based method for constructing confidence intervals for the linear components of partially linear models was proposed. Simulation studies showed that the confidence interval from the proposed empirical likelihood method with Bartlett correction was longer than the normal-approximation interval when the sample size was small, but its coverage probability was closest to the nominal 95% level. For larger samples, the interval from the proposed method had a slightly shorter length and similar coverage probability compared with the normal-based and gam methods, indicating that the interval constructed by the proposed method is more desirable for estimating the parameter of interest. These findings held under both normally and non-normally distributed error terms, which demonstrates the numerical robustness of the proposed approach and makes it a practically useful tool in real studies, where the distribution of the data is usually unknown. The trade-off is that the proposed method requires more computation than the normal approximation method.
In summary, the proposed method gives better inference, in terms of the length and coverage probability of the confidence intervals, than the normal-approximation-based method. It imposes no restrictions on the data distribution, and the computations for partially linear models are relatively straightforward. The method is recommended for estimating and constructing confidence intervals for the linear components of partially linear models, particularly when the sample size is small.

Author Contributions

Methodology, H.S.; software, H.S. and L.C.; formal analysis, H.S. and L.C.; writing—original draft, revision, H.S.; writing—review and editing, L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The Boston housing data used in the paper were obtained from the StatLib archive (http://lib.stat.cmu.edu/datasets/boston, accessed on 1 November 2023).

Acknowledgments

The authors would like to thank the editor and three referees for their insightful comments that significantly improved an earlier version of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 1. 
First, we state the following fact, which is used later in the proof of the theorem; it follows from Assumptions 2–5:
$$\hat{m}_X(Z) - m_X(Z) = o_p(n^{-1/4}), \quad \hat{m}_Y(Z) - m_Y(Z) = o_p(n^{-1/4}). \tag{A1}$$
From $Y = X\beta + g(Z) + \varepsilon$, we have $Y - m_Y(Z) = \{X - m_X(Z)\}\beta + \varepsilon$. In the estimation process, this is rewritten as $\hat{Y} = \hat{X}\beta + \varepsilon$, with the notation that $\hat{m}_{\xi}(Z)$ is the local linear kernel regression estimator of $m_{\xi}(Z)$ (for example, with the Gaussian kernel, and with the bandwidth h determined by the direct plug-in method of Ruppert, Sheather, and Wand [22] for a local linear Gaussian kernel regression estimate) and $\hat{\xi} = \xi - \hat{m}_{\xi}(Z)$.
Let $\Omega_i = \hat{X}_i^{\top}(\hat{Y}_i - \hat{X}_i\beta)$ and $\tilde{\Omega}_i = \tilde{X}_i^{\top}(\tilde{Y}_i - \tilde{X}_i\beta)$. A standard simplification as in Owen [16] yields
$$p_i = \frac{1}{n(1 + a^{\top}\tilde{\Omega}_i)}, \quad i = 1, \ldots, n,$$
where a is the solution of the equation
$$n^{-1}\sum_{i=1}^{n}\frac{\tilde{\Omega}_i}{1 + a^{\top}\tilde{\Omega}_i} = 0.$$
A direct calculation yields
$$\tilde{\Omega}_i - \Omega_i = \tilde{X}_i^{\top}(\tilde{Y}_i - \tilde{X}_i\beta) - (\hat{X}_i - \tilde{X}_i + \tilde{X}_i)^{\top}(\hat{Y}_i - \tilde{Y}_i + \tilde{Y}_i - \hat{X}_i\beta + \tilde{X}_i\beta - \tilde{X}_i\beta)$$
$$= \tilde{X}_i^{\top}(\tilde{Y}_i - \hat{Y}_i) + \tilde{X}_i^{\top}(\hat{X}_i - \tilde{X}_i)\beta - (\hat{X}_i - \tilde{X}_i)^{\top}(\hat{Y}_i - \tilde{Y}_i) + (\hat{X}_i - \tilde{X}_i)^{\top}(\hat{X}_i - \tilde{X}_i)\beta - (\hat{X}_i - \tilde{X}_i)^{\top}(\tilde{Y}_i - \tilde{X}_i\beta) = o_p(1),$$
where the $o_p(1)$ term does not depend on the index i. The expression above is of order $o_p(1)$ because $\tilde{X}_i$ is $O_p(1)$, while $\tilde{Y}_i - \hat{Y}_i$ and $\tilde{X}_i - \hat{X}_i$ are of order $o_p(n^{-1/4})$ by (A1).
Using arguments similar to those in the proof of Theorem 3.2 of Owen [16], we have
$$a = O_p(n^{-1/2}) \quad \text{and} \quad \max_{1 \le i \le n}\|\tilde{\Omega}_i\| = o_p(n^{1/2}). \tag{A2}$$
With (A2), we have $\max_{1\le i\le n}\|\Omega_i\| \le \max_{1\le i\le n}\|\tilde{\Omega}_i\| + o_p(1) = o_p(n^{1/2})$. Using the same arguments as in the proof of Theorem 4 in Liang et al. [23], we have
$$-2\log\{R_n(\beta)\} = \sum_{i=1}^{n} a^{\top}\tilde{\Omega}_i\tilde{\Omega}_i^{\top}a + o_p(1) = \left(n^{-1/2}\sum_{i=1}^{n}\tilde{\Omega}_i\right)^{\top}\left(n^{-1}\sum_{i=1}^{n}\tilde{\Omega}_i\tilde{\Omega}_i^{\top}\right)^{-1}\left(n^{-1/2}\sum_{i=1}^{n}\tilde{\Omega}_i\right) + o_p(1).$$
Now, we need to show that the above equation still holds with $\tilde{\Omega}_i$ replaced by $\Omega_i$. To this end, we first show that $n^{-1/2}\sum_{i=1}^{n}\tilde{X}_i^{\top}(\tilde{Y}_i - \hat{Y}_i) = o_p(1)$. Given $\epsilon > 0$ and some constant c, let $a_i = \tilde{Y}_i - \hat{Y}_i = \hat{m}_Y(Z_i) - m_Y(Z_i)$. We have
$$P\left(n^{-1/2}\Big|\sum_{i=1}^{n}\tilde{X}_i^{\top}(\tilde{Y}_i - \hat{Y}_i)\Big| > \epsilon\right) \le P\left(n^{-1/2}\Big|\sum_{i=1}^{n}\tilde{X}_i^{\top}a_i\Big| > \epsilon,\ |a_i| \le cn^{-1/4}\right) + P\left(|a_i| > cn^{-1/4}\right)$$
$$\le P\left(n^{-1/2}\,cn^{-1/4}\Big|\sum_{i=1}^{n}\tilde{X}_i^{\top}\Big| > \epsilon\right) + o_p(1) = P\left(n^{-1/2}\Big|\sum_{i=1}^{n}\tilde{X}_i^{\top}\Big| > c^{-1}n^{1/4}\epsilon\right) + o_p(1) = o_p(1) + o_p(1) = o_p(1).$$
In the above equations, $P\left(n^{-1/2}\big|\sum_{i=1}^{n}\tilde{X}_i^{\top}\big| > c^{-1}n^{1/4}\epsilon\right) = o_p(1)$ because $n^{-1/2}\sum_{i=1}^{n}\tilde{X}_i^{\top} \rightarrow N(0, v(X|Z))$ in distribution, where $v(X|Z)$ is the covariance matrix of $X - E(X|Z)$. Similarly, we can show that
$$n^{-1/2}\sum_{i=1}^{n}\tilde{X}_i^{\top}(\tilde{X}_i - \hat{X}_i)\beta = o_p(1), \qquad n^{-1/2}\sum_{i=1}^{n}(\hat{X}_i - \tilde{X}_i)^{\top}(\tilde{Y}_i - \tilde{X}_i\beta) = o_p(1).$$
To show $n^{-1/2}\sum_{i=1}^{n}(\hat{X}_i - \tilde{X}_i)^{\top}(\hat{Y}_i - \tilde{Y}_i) = o_p(1)$, note that
$$n^{-1/2}\sum_{i=1}^{n}(\hat{X}_i - \tilde{X}_i)^{\top}(\hat{Y}_i - \tilde{Y}_i) = n^{-1/2} \cdot o_p(n^{-1/4}) \cdot o_p(n^{-1/4}) \cdot n = o_p(1). \tag{A3}$$
The first equality in (A3) holds because $\sup_{Z_i}\|\hat{m}_w(Z_i) - m_w(Z_i)\| = o_p(n^{-1/4})$ for $w = X$ or $w = Y$, so that the $o_p(n^{-1/4})$ factors can be taken outside the summation. By the same argument, we can also show that
$$n^{-1/2}\sum_{i=1}^{n}(\hat{X}_i - \tilde{X}_i)^{\top}(\hat{X}_i - \tilde{X}_i)\beta = o_p(1).$$
These arguments imply that $n^{-1/2}\sum_{i=1}^{n}\Omega_i$ and $n^{-1/2}\sum_{i=1}^{n}\tilde{\Omega}_i$ have the same limiting normal distribution, and that $n^{-1}\sum_{i=1}^{n}\Omega_i\Omega_i^{\top}$ and $n^{-1}\sum_{i=1}^{n}\tilde{\Omega}_i\tilde{\Omega}_i^{\top}$ have the same limiting value. Since
$$\left(n^{-1/2}\sum_{i=1}^{n}\tilde{\Omega}_i\right)^{\top}\left(n^{-1}\sum_{i=1}^{n}\tilde{\Omega}_i\tilde{\Omega}_i^{\top}\right)^{-1}\left(n^{-1/2}\sum_{i=1}^{n}\tilde{\Omega}_i\right) \rightarrow \chi^2_p,$$
we have
$$\left(n^{-1/2}\sum_{i=1}^{n}\Omega_i\right)^{\top}\left(n^{-1}\sum_{i=1}^{n}\Omega_i\Omega_i^{\top}\right)^{-1}\left(n^{-1/2}\sum_{i=1}^{n}\Omega_i\right) \rightarrow \chi^2_p.$$
The proof is thus complete. □
Proof of Theorem 2. 
We continue to use the notation $\tilde{\xi} = \xi - m_{\xi}(Z)$ and $\hat{\xi} = \xi - \hat{m}_{\xi}(Z)$ for any random vector ξ. First, denote
$$\Omega_i = \{\hat{X}_1 - \hat{E}(\hat{X}_1 \mid \hat{X}_{-1})\}_i^{\top}\left[\{\hat{Y} - \hat{E}(\hat{Y} \mid \hat{X}_{-1})\}_i - \{\hat{X}_1 - \hat{E}(\hat{X}_1 \mid \hat{X}_{-1})\}_i\,\beta_1\right],$$
$$\hat{\Omega}_i = \{\hat{X}_1 - E(\hat{X}_1 \mid \hat{X}_{-1})\}_i^{\top}\left[\{\hat{Y} - E(\hat{Y} \mid \hat{X}_{-1})\}_i - \{\hat{X}_1 - E(\hat{X}_1 \mid \hat{X}_{-1})\}_i\,\beta_1\right],$$
$$\tilde{\Omega}_i = \{\tilde{X}_1 - E(\tilde{X}_1 \mid \tilde{X}_{-1})\}_i^{\top}\left[\{\tilde{Y} - E(\tilde{Y} \mid \tilde{X}_{-1})\}_i - \{\tilde{X}_1 - E(\tilde{X}_1 \mid \tilde{X}_{-1})\}_i\,\beta_1\right].$$
We first need to show that $\Omega_i = \tilde{\Omega}_i + o_p(1)$ and that $n^{-1/2}\sum_{i=1}^{n}\Omega_i$ and $n^{-1/2}\sum_{i=1}^{n}\tilde{\Omega}_i$ asymptotically have the same limiting distribution.
Since, in the linear model case, we have already proved that $\Omega_i - \hat{\Omega}_i = o_p(1)$ and that $n^{-1/2}\sum_{i=1}^{n}\hat{\Omega}_i$ and $n^{-1/2}\sum_{i=1}^{n}\tilde{\Omega}_i$ asymptotically have the same limiting distribution, we only need to show that
$$\hat{\Omega}_i - \tilde{\Omega}_i = o_p(1)$$
and
$$n^{-1/2}\sum_{i=1}^{n}\hat{\Omega}_i - n^{-1/2}\sum_{i=1}^{n}\tilde{\Omega}_i = o_p(1). \tag{A4}$$
Assume
$$E(\tilde{Y} \mid \tilde{X}_{-1}) = \tilde{X}_{-1}\eta,$$
$$E(\tilde{X}_1 \mid \tilde{X}_{-1}) = \tilde{X}_{-1}\gamma.$$
Recall that in the estimating procedure we replaced $\tilde{Y}, \tilde{X}$ with $\hat{Y}, \hat{X}$, so we have
$$E(\hat{Y} \mid \hat{X}_{-1}) = \hat{X}_{-1}\eta,$$
$$E(\hat{X}_1 \mid \hat{X}_{-1}) = \hat{X}_{-1}\gamma.$$
We first show that, with this replacement, $\tilde{\Omega}_i = \hat{\Omega}_i + o_p(1)$ holds. Note that, by Equation (A1), $m_v(Z) - \hat{m}_v(Z) = o_p(n^{-1/4})$ for $v = X$ or $v = Y$, and $\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma$ and $\tilde{Y}_i - \tilde{X}_{-1,i}\eta$ are $O_p(1)$ random variables, so we have
$$\tilde{\Omega}_i - \hat{\Omega}_i = (\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma)^{\top}\{(\tilde{Y}_i - \hat{Y}_i) - (\tilde{X}_{-1,i} - \hat{X}_{-1,i})\eta - (\tilde{X}_{1i} - \hat{X}_{1i})\beta_1 + (\tilde{X}_{-1,i} - \hat{X}_{-1,i})\gamma\beta_1\}$$
$$\quad - \{(\hat{X}_{1i} - \tilde{X}_{1i}) - (\hat{X}_{-1,i} - \tilde{X}_{-1,i})\gamma\}^{\top}\big[(\hat{Y}_i - \tilde{Y}_i) - (\hat{X}_{-1,i} - \tilde{X}_{-1,i})\eta + \tilde{Y}_i - \tilde{X}_{-1,i}\eta - \{(\hat{X}_{1i} - \tilde{X}_{1i}) - (\hat{X}_{-1,i} - \tilde{X}_{-1,i})\gamma + \tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma\}\beta_1\big]$$
$$= O_p(1)\{o_p(n^{-1/4}) - o_p(n^{-1/4})\eta - o_p(n^{-1/4})\beta_1 + o_p(n^{-1/4})\gamma\beta_1\} - \{o_p(n^{-1/4}) - o_p(n^{-1/4})\gamma\}\big[o_p(n^{-1/4}) - o_p(n^{-1/4})\eta + O_p(1) - \{o_p(n^{-1/4}) - o_p(n^{-1/4})\gamma + O_p(1)\}\beta_1\big] = o_p(1).$$
To show Equation (A4), we first need to show
$$n^{-1/2}\sum_{i=1}^{n}(\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma)(\tilde{Y}_i - \hat{Y}_i) = o_p(1).$$
For a given ϵ and a certain constant c, we have
$$P\left(n^{-1/2}\Big|\sum_{i=1}^{n}(\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma)(\tilde{Y}_i - \hat{Y}_i)\Big| > \epsilon\right) \le P\left(n^{-1/2}\Big|\sum_{i=1}^{n}(\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma)(\tilde{Y}_i - \hat{Y}_i)\Big| > \epsilon,\ |\tilde{Y}_i - \hat{Y}_i| \le cn^{-1/4}\right) + P\left(|\tilde{Y}_i - \hat{Y}_i| > cn^{-1/4}\right)$$
$$\le P\left(n^{-1/2}\,cn^{-1/4}\Big|\sum_{i=1}^{n}(\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma)\Big| > \epsilon\right) + o_p(1) = o_p(1). \tag{A5}$$
Equation (A5) holds because $n^{-1/2}\sum_{i=1}^{n}(\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma)$ converges to $N(0, v(\tilde{X}_1 \mid \tilde{X}_{-1}))$, where $v(\tilde{X}_1 \mid \tilde{X}_{-1})$ is the variance of $\tilde{X}_1 - \tilde{X}_{-1}\gamma$. By similar proofs, we have
$$n^{-1/2}\sum_{i=1}^{n}(\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma)(\tilde{X}_{-1,i} - \hat{X}_{-1,i})\eta = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}(\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma)(\tilde{X}_{1i} - \hat{X}_{1i})\beta_1 = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}(\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma)(\tilde{X}_{-1,i} - \hat{X}_{-1,i})\gamma\beta_1 = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}(\hat{X}_{1i} - \tilde{X}_{1i})(\tilde{Y}_i - \tilde{X}_{-1,i}\eta) = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}(\hat{X}_{1i} - \tilde{X}_{1i})(\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma) = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}(\hat{X}_{-1,i} - \tilde{X}_{-1,i})\gamma(\tilde{Y}_i - \tilde{X}_{-1,i}\eta) = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}(\hat{X}_{-1,i} - \tilde{X}_{-1,i})\gamma(\tilde{X}_{1i} - \tilde{X}_{-1,i}\gamma) = o_p(1).$$
With the same proof as in Equation (A3), we have
$$n^{-1/2}\sum_{i=1}^{n}(\hat{X}_{1i} - \tilde{X}_{1i})^{\top}(\hat{Y}_i - \tilde{Y}_i) = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}(\hat{X}_{1i} - \tilde{X}_{1i})^{\top}(\hat{X}_{-1,i} - \tilde{X}_{-1,i})\eta = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}(\hat{X}_{1i} - \tilde{X}_{1i})^{\top}(\hat{X}_{1i} - \tilde{X}_{1i})\beta_1 = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}(\hat{X}_{1i} - \tilde{X}_{1i})^{\top}(\hat{X}_{-1,i} - \tilde{X}_{-1,i})\gamma\beta_1 = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}\{(\hat{X}_{-1,i} - \tilde{X}_{-1,i})\gamma\}^{\top}(\hat{Y}_i - \tilde{Y}_i) = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}\{(\hat{X}_{-1,i} - \tilde{X}_{-1,i})\gamma\}^{\top}(\hat{X}_{-1,i} - \tilde{X}_{-1,i})\eta = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}\{(\hat{X}_{-1,i} - \tilde{X}_{-1,i})\gamma\}^{\top}(\hat{X}_{1i} - \tilde{X}_{1i})\beta_1 = o_p(1),$$
$$n^{-1/2}\sum_{i=1}^{n}\{(\hat{X}_{-1,i} - \tilde{X}_{-1,i})\gamma\}^{\top}(\hat{X}_{-1,i} - \tilde{X}_{-1,i})\gamma\beta_1 = o_p(1).$$
With the above equations, Equation (A4) holds. The proof is completed by following the same procedure as in the proof of Theorem 1. □

References

  1. Engle, R.; Granger, C.; Rice, J.; Weiss, A. Nonparametric estimates of the relation between weather and electricity sales. J. Am. Stat. Assoc. 1986, 81, 310–320.
  2. Hastie, T.; Tibshirani, R. Generalized Additive Models; Chapman and Hall: London, UK, 1990.
  3. Linton, O.; Nielsen, J. A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 1995, 82, 93–100.
  4. Gray, R. Spline-based tests in survival analysis. Biometrics 1994, 50, 640–652.
  5. Wahba, G. Cross validated spline methods for the estimation of multivariate functions from data on functionals. In Proceedings of the Iowa State University Statistical Laboratory 50th Anniversary Conference, Ames, IA, USA, 13–15 June 1984; David, H.A., David, H.T., Eds.; The Iowa State University Press: Ames, IA, USA, 1984; pp. 205–235.
  6. Green, P.; Jennison, C.; Seheult, A. Analysis of field experiments by least squares smoothing. J. R. Stat. Soc. Ser. B 1985, 47, 299–315.
  7. Heckman, N. Spline smoothing in a partly linear model. J. R. Stat. Soc. Ser. B 1986, 48, 244–248.
  8. Rice, J. Convergence rates for partially splined models. Stat. Probab. Lett. 1986, 4, 203–208.
  9. Speckman, P. Kernel smoothing in partial linear models. J. R. Stat. Soc. Ser. B 1988, 50, 413–436.
  10. Chen, H.; Shiau, J.J.H. Data-driven efficient estimators for a partially linear model. Ann. Stat. 1994, 22, 211–237.
  11. Chen, H. Convergence rates for parametric components in a partly linear model. Ann. Stat. 1988, 16, 136–146.
  12. Härdle, W.; Liang, H.; Gao, J. Partially Linear Models; Springer Physica: Heidelberg, Germany, 2000.
  13. Liang, H. Estimation in partially linear models and numerical comparisons. Comput. Stat. Data Anal. 2006, 50, 675–687.
  14. Severini, T.; Staniswalis, J. Quasi-likelihood estimation in semiparametric models. J. Am. Stat. Assoc. 1994, 89, 501–511.
  15. Hastie, T.J. Generalized Additive Models; Wadsworth and Brooks/Cole: Pacific Grove, CA, USA, 1992.
  16. Owen, A. Empirical Likelihood; Chapman and Hall/CRC: London, UK, 2001.
  17. Robinson, P.M. Root-n-consistent semiparametric regression. Econometrica 1988, 56, 931–954.
  18. Su, H.; Liang, H. An empirical likelihood-based method for comparison of treatment effects—Test of equality of coefficients in linear models. Comput. Stat. Data Anal. 2010, 54, 1079–1088.
  19. Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications; Chapman and Hall/CRC: London, UK, 1996.
  20. DiCiccio, T.; Hall, P.; Romano, J. Empirical likelihood is Bartlett-correctable. Ann. Stat. 1991, 19, 1053–1061.
  21. Chen, S.; Cui, H. On Bartlett correction of empirical likelihood in the presence of nuisance parameters. Biometrika 2006, 93, 215–220.
  22. Ruppert, D.; Sheather, S.J.; Wand, M.P. An effective bandwidth selector for local least squares regression. J. Am. Stat. Assoc. 1995, 90, 1257–1270.
  23. Liang, H.; Wang, S.; Carroll, R. Partially linear models with missing response variables and error-prone covariates. Biometrika 2007, 94, 185–198.
Table 1. Confidence interval length and coverage probability for partially linear models; β = (2, 5, 7, 4), σ = 1, and β₁ is the parameter of interest.

Error        n     Est      Length                              Coverage Probability
                            Norm     Gam      EL       ELb      Norm     Gam      EL       ELb
Normal       50    1.957    1.496    1.498    1.437    1.520    0.935    0.936    0.924    0.945
             100   2.047    1.052    1.085    1.016    1.049    0.963    0.963    0.949    0.953
             200   2.026    0.706    0.704    0.689    0.701    0.946    0.950    0.944    0.950
Non-normal   50    1.975    1.368    1.289    1.278    1.386    0.946    0.934    0.944    0.950
             100   2.050    1.012    1.051    0.980    1.008    0.973    0.962    0.954    0.956
             200   2.027    0.681    0.689    0.668    0.677    0.944    0.940    0.930    0.946