Article

Composite Quantile Regression for Varying Coefficient Models with Response Data Missing at Random

1 School of Science, Xi’an Polytechnic University, Xi’an 710048, China
2 School of Economics and Finance, Xi’an Jiaotong University, Xi’an 710061, China
3 School of Economics and Management, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Symmetry 2019, 11(9), 1065; https://doi.org/10.3390/sym11091065
Submission received: 24 July 2019 / Revised: 17 August 2019 / Accepted: 19 August 2019 / Published: 21 August 2019

Abstract

Composite quantile regression (CQR) estimation and inference are studied for varying coefficient models with response data missing at random. Three estimators, the weighted local linear CQR (WLLCQR) estimator, the nonparametric WLLCQR (NWLLCQR) estimator, and the imputed WLLCQR (IWLLCQR) estimator, are proposed for the unknown coefficient functions. Under some mild conditions, the proposed estimators are asymptotically normal. Simulation studies demonstrate that the IWLLCQR estimators of the unknown coefficients are superior to those based on WLLCQR and NWLLCQR. Moreover, a bootstrap test procedure based on the IWLLCQR fittings is developed to test whether the coefficient functions are actually varying. Finally, a real-life dataset is analyzed to illustrate the applications of the proposed method.

1. Introduction

The varying coefficient model, proposed originally by Hastie and Tibshirani [1], is flexible and powerful for examining the dynamic changes of regression coefficients over factors such as time and age, and has gained much popularity during the past few decades (see [2,3,4,5,6]).
A classical varying coefficient model has the following structure:
$$Y = X^{T}\beta(U) + \varepsilon,$$
where $Y \in \mathbb{R}$ is a response variable, $X = (X_1, \ldots, X_p)^{T} \in \mathbb{R}^p$ is a covariate vector, $\beta(\cdot) = (\beta_1(\cdot), \ldots, \beta_p(\cdot))^{T} \in \mathbb{R}^p$ is an unknown coefficient vector function with a smoothing variable U, and $\varepsilon$ is a random error independent of $(X, U)$.
Recently, estimation of $\beta(\cdot)$ in Model (1) by least squares regression has attracted much attention. Hastie and Tibshirani [1] considered $L_2$-penalized least squares estimation and attained some good results. Subsequently, Fan et al. [4] and Fan et al. [7] applied least squares regression to propose, respectively, a two-step local polynomial estimation procedure and a profile estimator for Model (1), and designed suitable statistical inference procedures. However, these least squares procedures can be very sensitive to outliers [8]. To overcome this problem, the quantile regression proposed by Koenker [9] is a natural alternative: whereas traditional least squares regression, as a mean model, only describes the effects of the covariates at the center of the distribution, quantile regression can directly estimate these effects at different quantiles and thereby characterize the entire conditional distribution of the dependent variable [8]. Thus, quantile regression is much more robust when processing outlier observations.
Owing to these significant theoretical advances, some scholars have integrated quantile regression into the varying coefficient model. Kim [10] studied the quantile regression model with varying coefficients. For time series data, Cai and Xu [11] developed nonparametric quantile estimation for dynamic smooth coefficient models. Later, Cai and Xiao [12] applied dynamic models with partially-varying coefficients to investigate semiparametric quantile regression and obtained some useful results. Tang [13] derived a robust quantile regression estimation using a spatial semiparametric partially-linear varying coefficient model. Unfortunately, a single quantile regression procedure may be relatively inefficient compared with least squares regression. To overcome this drawback, it is necessary to obtain a desirably efficient and stable estimator. In recent years, an oracle composite quantile regression (CQR) procedure was proposed by Zou and Yuan [14] to select significant variables, and some important theoretical and applied results were derived. The CQR method is now widely used in many situations. For example, efficient estimators based on the CQR method were proposed by Kai et al. [15] and Guo et al. [16] for semiparametric partially-linear varying coefficient models. In addition, data-driven weighted CQR (WCQR) estimation was studied by Sun et al. [17] and Yang et al. [18] for varying coefficient linear models.
Although the literature on QR with complete data has been growing rapidly owing to its significant theoretical properties, scant attention has been paid to incomplete data, i.e., samples containing missing values, even though this class of data can easily lead to substantially distorted results. In fact, missing data are commonplace in real life. They arise for various reasons, such as the failure of investigators to gather correct information, the unwillingness of some sampled units to supply the desired information, and the loss of information caused by uncontrollable factors. In the early 1970s, advances in computer technology that made many laborious numerical calculations feasible spurred the literature on the statistical analysis of real data containing missing values in applied work; see [19,20,21,22,23,24,25]. Despite this long history of missing data analysis, little work on QR has taken missing data into account. Recently, an iterative imputation procedure was developed by Wei et al. [26] in a linear QR model with non-i.i.d. error terms for covariates with missing values. A smoothed empirical likelihood analysis was discussed by Lv and Li [27] for partially-linear quantile regression with missing responses. An inverse probability weighting QR approach was proposed by Sherwood et al. [28] in the last few years for analyzing healthcare cost data with covariates missing at random. QR for competing risk data was studied by Sun et al. [29] when the failure type was missing. An efficient QR analysis with missing observations was discussed by Chen [30]. Some imputation methods were proposed by Shu [31] for quantile estimation under data missing at random.
In this paper, a coherent inference framework based on CQR estimation and inference is explored for varying coefficient models with response data missing at random. The main contribution of this paper can be summarized as follows:
  • A composite quantile regression estimation (CQRE) method is proposed for the analysis of varying coefficient models with response data missing at random. This method has two advantages: (1) the CQRE method overcomes not only the relative inefficiency of a single quantile regression procedure compared with least squares regression, but also the interference of non-normal errors, and hence improves the estimation efficiency significantly; (2) since different quantiles are used in the imputation instead of the actually observed responses or means, and the robustness of quantile regression is inherited, the CQRE method is less sensitive to outliers; thus, it is more effective and robust than the single quantile regression method and the classical least squares method.
  • Three estimators including the weighted local linear CQR (WLLCQR) estimator, the nonparametric WLLCQR (NWLLCQR) estimator, and the imputed WLLCQR (IWLLCQR) estimator are proposed for an unknown coefficient function in the varying coefficient model to establish the asymptotic normality of these estimators under some mild conditions.
The rest of this paper is organized as follows. In Section 2, the CQR varying coefficient model with missing response data is introduced and a class of estimators for the unknown coefficient functions is constructed. Theoretical results on the asymptotic properties of the proposed estimators are presented in Section 3. In Section 4, a bootstrap-based test procedure is developed. A simulation study in Section 5 demonstrates the finite-sample performance of the proposed method, and an application to a real dataset in Section 6 illustrates its effectiveness. Discussions and concluding remarks are presented in Section 7 and Section 8, respectively. Finally, the proofs of the main results are given in Appendix A.

2. Estimation Based on the CQR Varying Coefficient Model With Missing Response

In this section, the CQR varying coefficient model will be introduced with missing response data to construct a class of estimators for an unknown coefficient function. In particular, as the main estimate methods in this paper, three estimators including the weighted local linear CQR (WLLCQR) estimator, the nonparametric WLLCQR (NWLLCQR) estimator, and the imputed WLLCQR (IWLLCQR) estimator are constructed and emphasized.
Let { ( X i , U i , Y i , δ i ) : i = 1 , 2 , , n } be a random sample coming from Model (1), such that:
Y i = X i T β ( U i ) + ε i , i = 1 , 2 , , n ,
where all the $X_i \in \mathbb{R}^p$ and $U_i \in \mathbb{R}$ are always observed, and $\beta(\cdot) = (\beta_1(\cdot), \ldots, \beta_p(\cdot))^{T} \in \mathbb{R}^p$ is the coefficient vector function. Further, $\delta_i = 0$ if $Y_i$ is missing, and $\delta_i = 1$ otherwise. Throughout this paper, we assume that $Y_i$ is missing at random (MAR). This assumption indicates that $\delta_i$ and $Y_i$ are conditionally independent given $X_i$ and $U_i$, that is,
$$P(\delta_i = 1 \mid X_i, U_i, Y_i) = P(\delta_i = 1 \mid X_i, U_i) = p(X_i, U_i) = \pi(Z_i),$$
where Z i = ( X i , U i ) T . Moreover, we also assume that across different quantile regression models, there is the same coefficient vector function β ( · ) . Thus, we can express the conditional τ-quantile function of Y as:
Q τ ( X , U ) = X T β ( U ) + c τ ,
where c τ is the τ -quantile of ε . If β j ( · ) is differentiable, Taylor’s expansion yields that:
$$\beta_j(U) \approx \beta_j(u) + \beta_j'(u)(U - u) = a_j + b_j(U - u),$$
for j = 1 , 2 , , p , where u is a fixed value of a random variable and U lies in a neighborhood of u. For the case of no missing response data, minimizing the following criterion:
$$\sum_{i=1}^{n}\rho_{\tau}\big(Y_i - c_{\tau} - X_i^{T}[a + b(U_i - u)]\big)K_h(U_i - u),$$
we can attain the local linear quantile regression (LLQR) estimator of $\beta(u)$, where $\rho_\tau(u) = u(\tau - I(u < 0))$ is the quantile loss function of τ-quantile regression, $a = (a_1, \ldots, a_p)^{T}$, $b = (b_1, \ldots, b_p)^{T}$, and $K_h(\cdot) = K(\cdot/h)/h$ with K a Gaussian kernel function and h a bandwidth. To improve the efficiency of the quantile regression estimation, the local linear composite quantile regression (LLCQR) estimation is adopted from Guo et al. [16] for the varying coefficient models. Let q be the number of quantiles and $\tau_k = k/(1+q)$ for $k = 1, \ldots, q$. The loss function of the LLCQR estimation is defined as:
$$\sum_{k=1}^{q}\sum_{i=1}^{n}\rho_{\tau_k}\big(Y_i - c_k - X_i^{T}[a + b(U_i - u)]\big)K_h(U_i - u),$$
where c k is the τ k -quantile of ε .
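The check loss and its composite version above are simple to compute directly. The following is a minimal sketch in Python (function names and the residual layout are illustrative, not the authors' implementation):

```python
import numpy as np

def check_loss(u, tau):
    # Quantile check (pinball) loss: rho_tau(u) = u * (tau - I(u < 0)).
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def cqr_loss(residuals_by_level, taus):
    # Composite quantile loss: check losses summed over all q quantile levels.
    # residuals_by_level[k] holds Y_i - c_k - (local linear part) at level tau_k.
    return sum(check_loss(r, tau).sum()
               for r, tau in zip(residuals_by_level, taus))
```

For example, with equally spaced levels $\tau_k = k/(q+1)$, the same residual vector enters the loss once per level, each time with a different check function.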
In what follows, this technique of LLCQR will be extended to handle the case of response data missing at random.

2.1. WLLCQR Estimation

The inverse probability weighting (IPW) version of local linear CQR estimation will be considered to handle responses missing at random; that is, the complete-case (CC) analysis is adjusted by using the inverse of the selection probability as a weight. However, nonparametric smoothing estimation of $\pi(\cdot)$ encounters the curse of dimensionality when the dimension of Z is high. Motivated by Wang [24], we use the inverse marginal probability weighting approach.
Let P ( δ = 1 | X i = x , U i = u ) = P ( δ = 1 | U i = u ) = Δ ( u ) , i.e., the propensity score just depends on U. When the inverse marginal probability function Δ ( u ) is known, the WLLCQR estimator β ^ ( u ) of β ( u ) is defined as:
$$(\hat c, \hat a, \hat b) = \arg\min_{c, a, b}\sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i}{\Delta(U_i)}\rho_{\tau_k}\big(Y_i - c_k - X_i^{T}[a + b(U_i - u)]\big)K_h(U_i - u),$$
where c = ( c 1 , , c q ) T and Δ ( u ) = P ( δ = 1 | U i = u ) . Here, β ^ ( u ) = a ^ is called the WLLCQR estimator of β ( u ) with Δ ( u ) .
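The weighted minimization above can be sketched numerically for a scalar covariate. The sketch below uses a Gaussian kernel and scipy's Nelder-Mead simplex in place of the linear-programming solvers typically used for quantile regression; `wllcqr_point`, `gauss_kernel`, and all tuning values are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def gauss_kernel(t):
    # Standard normal density used as the kernel K.
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def wllcqr_point(u0, U, X, Y, delta, Delta, taus, h):
    """IPW local linear CQR fit at a point u0, scalar covariate X.
    Returns beta-hat(u0) = a-hat."""
    q = len(taus)
    # Weight delta_i / Delta(U_i) * K_h(U_i - u0), with K_h(t) = K(t/h)/h.
    w = (delta / Delta(U)) * gauss_kernel((U - u0) / h) / h

    def loss(par):
        c, a, b = par[:q], par[q], par[q + 1]
        total = 0.0
        for k, tau in enumerate(taus):
            r = Y - c[k] - X * (a + b * (U - u0))
            total += np.sum(w * r * (tau - (r < 0)))  # check loss rho_tau_k
        return total

    fit = minimize(loss, np.zeros(q + 2), method="Nelder-Mead",
                   options={"maxiter": 20000, "maxfev": 20000,
                            "xatol": 1e-8, "fatol": 1e-10})
    return fit.x[q]
```

Evaluating `wllcqr_point` on a grid of `u0` values traces out the estimated coefficient curve.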

2.2. Nonparametric WLLCQR Estimation

However, the inverse marginal probability function in practical situations is usually unknown, and thus, it needs to be estimated. We often employ nonparametric smoothing estimation approaches to estimate the unknown selection probability Δ ( · ) . The Nadaraya–Watson estimation [32] is one of these nonparametric smoothing estimation approaches. We can define the Nadaraya–Watson estimator of Δ ( u ) as:
$$\hat\Delta(u) = \frac{\sum_{i=1}^{n}\delta_i L_{h_0}(U_i - u)}{\sum_{i=1}^{n}L_{h_0}(U_i - u)},$$
where $L_{h_0}(\cdot) = L(\cdot/h_0)/h_0$ for a density kernel function L and a bandwidth $h_0$. Therefore, the NWLLCQR estimation procedure with $\hat\Delta(u)$ is formally defined as:
$$(\hat c_N, \hat a_N, \hat b_N) = \arg\min_{c, a, b}\sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i}{\hat\Delta(U_i)}\rho_{\tau_k}\big(Y_i - c_k - X_i^{T}[a + b(U_i - u)]\big)K_h(U_i - u),$$
where β ^ N ( u ) = a ^ N is called the NWLLCQR estimator of β ( u ) with Δ ^ ( u ) .
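The Nadaraya-Watson estimator of the selection probability is a kernel-weighted average of the observation indicators $\delta_i$. A minimal sketch, using a Gaussian kernel as an illustrative stand-in for the density kernel L (the function name is hypothetical):

```python
import numpy as np

def nw_selection_prob(u0, U, delta, h0):
    # Nadaraya-Watson estimate of Delta(u0) = P(delta = 1 | U = u0):
    # ratio of kernel-weighted sums of delta_i and of the weights themselves.
    w = np.exp(-0.5 * ((U - u0) / h0) ** 2)
    return np.sum(w * delta) / np.sum(w)
```

Because the same kernel weights appear in the numerator and denominator, the bandwidth factor $1/h_0$ cancels, and the estimate always lies in [0, 1].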

2.3. Imputed WLLCQR Estimation

Although both the WLLCQR estimator and the NWLLCQR estimator handle the inverse marginal probability function well, they do not fully explore the information contained in the data. We therefore use quantile regression imputation to resolve this issue, imputing a missing $Y_i$ by $X_i^{T}\hat\beta_C(U_i)$, where $\hat\beta_C(U) = \hat a$ and $\hat a$ is defined in:
$$(\hat c, \hat a, \hat b) = \arg\min_{c, a, b}\sum_{k=1}^{q}\sum_{i=1}^{n}\delta_i\,\rho_{\tau_k}\big(Y_i - c_k - X_i^{T}[a + b(U_i - u)]\big)K_h(U_i - u).$$
Therefore, the imputed WLLCQR estimation procedure can be defined as:
$$(\hat c_I, \hat a_I, \hat b_I) = \arg\min_{c, a, b}\sum_{k=1}^{q}\sum_{i=1}^{n}\rho_{\tau_k}\big(Y_i^{*} - c_k - X_i^{T}[a + b(U_i - u)]\big)K_h(U_i - u),$$
where $Y_i^{*} = \frac{\delta_i}{\hat\Delta(U_i)}Y_i + \big(1 - \frac{\delta_i}{\hat\Delta(U_i)}\big)X_i^{T}\hat\beta_C(U_i)$, and $\hat\beta_I(u) = \hat a_I$ is called the IWLLCQR estimator of $\beta(u)$.
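The pseudo-response $Y_i^{*}$ can be formed in one vectorized step. A sketch for the scalar-covariate case (the function and argument names are illustrative):

```python
import numpy as np

def pseudo_response(Y, X, U, delta, Delta_hat, beta_hat_C):
    # Y*_i = (delta_i / Delta-hat(U_i)) * Y_i
    #        + (1 - delta_i / Delta-hat(U_i)) * X_i * beta-hat_C(U_i).
    # Missing responses (delta_i = 0) are replaced by the complete-case fit;
    # observed ones are reweighted by the inverse selection probability.
    w = delta / Delta_hat(U)
    fitted = X * beta_hat_C(U)
    return w * np.nan_to_num(Y) + (1.0 - w) * fitted
```

`np.nan_to_num` only guards against NaN placeholders in the missing slots; those entries are multiplied by a zero weight anyway.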
Remark 1.
Since purely local interpolation results can vary greatly and be unstable, the kernel function is used in Equations (4)-(10) as a smoothing device, so that the resulting estimates are much smoother and more stable.

3. Asymptotic Properties

In this section, the asymptotic distribution will be considered for the estimators proposed in Section 2 to establish some theoretical results of these estimators.
Let $f(\cdot)$ and $f_U(\cdot)$ be the density functions of $\varepsilon$ and U, respectively. For simplicity, the following notations will be used in this section: $\mu_j = \int u^j K(u)\,du$ and $\nu_j = \int u^j K^2(u)\,du$ for $j = 0, 1, 2$; $\eta_i = \sum_{k=1}^{q}[I(\varepsilon_i \le c_k) - \tau_k]$ for $i = 1, 2, \ldots, n$; and $D_u = E(X_iX_i^{T} \mid U = u)\sum_{k=1}^{q}f(c_k)$.
Now, the following results are established.
Theorem 1.
Suppose that Conditions C1-C7 in the Appendix hold. If $\Delta(u)$ is known, then:
$$\sqrt{nh}\Big(\hat\beta(u) - \beta(u) - \frac{1}{2}h^2\mu_2\beta''(u)\Big) \xrightarrow{d} N\Big(0,\ \frac{\nu_0}{f_U(u)}D_u^{-1}\Omega_u D_u^{-1}\Big),$$
where $\xrightarrow{d}$ denotes convergence in distribution and $\Omega_u = E\big\{\frac{\eta_i^2}{\Delta(U_i)}X_iX_i^{T} \,\big|\, U = u\big\}$.
Theorem 2.
Suppose that $\Delta(u) > 0$ is a smooth function of u and that Conditions C1-C7 in the Appendix hold. Then:
$$\sqrt{nh}\Big(\hat\beta_N(u) - \beta(u) - \frac{1}{2}h^2\mu_2\beta''(u)\Big) \xrightarrow{d} N\Big(0,\ \frac{\nu_0}{f_U(u)}D_u^{-1}\Omega_u^{*}D_u^{-1}\Big),$$
where $\Omega_u^{*} = E\big\{\frac{\eta_i^2}{\Delta(U_i)}X_iX_i^{T} \,\big|\, U_i = u\big\} - E\big\{\frac{1-\Delta(U_i)}{\Delta(U_i)}E[X_i^{T}\eta_i \mid U_i]^2 \,\big|\, U_i = u\big\}$.
Theorem 3.
Suppose that $\Delta(u) > 0$ is a smooth function of u and that Conditions C1-C7 in the Appendix hold. Then:
$$\sqrt{nh}\Big(\hat\beta_I(u) - \beta(u) - \frac{1}{2}h^2\mu_2\beta''(u)\Big) \xrightarrow{d} N\Big(0,\ \frac{\nu_0}{f_U(u)}D_u^{-1}\Omega_u^{**}D_u^{-1}\Big),$$
where:
$$\Omega_u^{**} = E\Big\{\frac{\eta_i^2}{\pi(U_i)}X_iX_i^{T} \,\Big|\, U_i = u\Big\} - E\Big\{\frac{(1-\pi(U_i))(2\pi(U_i)+1)}{\pi(U_i)}E[X_i\eta_i \mid U_i]^2\Big\}.$$

4. A Bootstrap-Based Goodness-of-Fit Test

In investigating the varying coefficient model, it is important to test whether the unknown coefficient functions are actually varying. In this section, this testing problem is considered for Model (1) with missing responses, and a goodness-of-fit test is proposed based on the difference between the weighted residual sums of quantiles (WRSQ) of the LLCQR fittings under the null and alternative hypotheses.
The following testing problem:
$$H_0: \beta(u) = \beta \quad \text{versus} \quad H_1: \beta(u) \ne \beta,$$
is considered for simplicity, where β is a constant vector. Under the null hypothesis, Model (1) reduces to a classical linear model with missing responses. The WRSQ under $H_0$ is defined as:
$$\mathrm{WRSQ}_0 = \sum_{k=1}^{q}\sum_{i=1}^{n}\rho_{\tau_k}\big(Y_i^{*} - \hat c_k - X_i^{T}\hat\beta_I\big),$$
where c ^ 1 , ⋯, c ^ q and β ^ I are given by the following IWCQR estimation procedure:
$$(\hat c_1, \ldots, \hat c_q, \hat\beta_I) = \arg\min_{c_1,\ldots,c_q,\beta}\sum_{k=1}^{q}\sum_{i=1}^{n}\rho_{\tau_k}\big(Y_i^{*} - c_k - X_i^{T}\beta\big).$$
Similarly, the WRSQ under H 1 can be defined as:
$$\mathrm{WRSQ}_1 = \sum_{k=1}^{q}\sum_{i=1}^{n}\rho_{\tau_k}\big(Y_i^{*} - \hat c_{Ik} - X_i^{T}\hat\beta_I(U_i)\big),$$
where $\hat c_{I1}, \ldots, \hat c_{Iq}$ and $\hat\beta_I(u)$ are given in (10). Then, the following test statistic is given as:
$$T_n = \frac{\mathrm{WRSQ}_0 - \mathrm{WRSQ}_1}{\mathrm{WRSQ}_1} = \frac{\mathrm{WRSQ}_0}{\mathrm{WRSQ}_1} - 1.$$
The null hypothesis in (11) is rejected for large values of $T_n$. In what follows, the p value of the test is evaluated by the bootstrap method along the lines of Wong et al. [33] and Guo et al. [16]:
Step 1.
Let m denote the number of completely observed cases, and compute the IWLLCQR estimator $\hat\beta_I(U_i)$.
Step 2.
The bootstrap residuals $\varepsilon_i^{*}$ are generated from the centered residual series $\{\hat\varepsilon_i - \bar{\hat\varepsilon}\}_{i=1}^{n}$, where:
$$\hat\varepsilon_i = Y_i^{*} - X_i^{T}\hat\beta_I(U_i), \qquad \bar{\hat\varepsilon} = \frac{1}{m}\sum_{i=1}^{m}\hat\varepsilon_i.$$
Step 3.
Step 2 is repeated M times, and the bootstrap samples $E_j = \{Y_{i,j}, X_i, U_i, \delta_i\}_{i=1}^{n}$ are obtained for $j = 1, \ldots, M$. The test statistic is calculated for each bootstrap sample $E_j$ and denoted by $T_{n,j}^{*}$.
Step 4.
The p value is approximated by $\hat p = |S|/M$, where $|S|$ is the cardinality of the set $S = \{\,j : T_{n,j}^{*} \ge T_n,\ j = 1, \ldots, M\,\}$.

5. Simulation Study

A simulation study was carried out to investigate the finite-sample properties of our proposed method by a comparison among the WLLCQR estimation method, the NWLLCQR estimation method, the IWLLCQR estimation method, the INWLLCQR (imputed not weighted LLCQR) estimation method, and the WLLCQR estimation method without data missing, defined in (5).
In numerical studies, the kernel function is commonly taken to be $K(x) = 0.75(1 - x^2)I(|x| \le 1)$, and this choice is adopted here. The cross-validation method is used to select the optimal bandwidth $h_{opt}$. In the subsequent examples, the composite level is set to $q = 8$.
Example 1.
Consider the following model:
Y i = sin ( π U i ) X i + ε i ,
where X i = Z i 1 + Z i 2 + Z i 3 , U i = Z i 1 + Z i 2 , Z i 1 , Z i 2 , Z i 3 are independent, Z i 1 and Z i 2 follow a uniform distribution on [ 1 , 1 ] , Z i 3 N ( 0 , 1 ) , and three error distributions of ε i are considered including ε i N ( 0 , 1 ) , ε i t ( 3 ) , and ε i 0.8 N ( 0 , 1 ) + 0.2 N ( 0 , 3 2 ) .
An analysis of the fitting of five different estimators including WLLCQR Δ , NWLLCQR Δ ^ , IWLLCQR Δ ^ , INWLLCQR, and WLLCQR is done by using the following three selection probability functions:
  • Case 1: $\Delta_1(u) = 0.9 + 0.2u$ if $|u| \le 0.5$, and $0.95$ otherwise.
  • Case 2: $\Delta_2(u) = 0.7 + 0.2u$ if $|u| \le 0.5$, and $0.75$ otherwise.
  • Case 3: $\Delta_3(u) = 0.5 + 0.2u$ if $|u| \le 0.5$, and $0.55$ otherwise.
The average missing rates of Y corresponding to these three selection probability functions are approximately 0.15, 0.36, and 0.45, respectively. For each of the three cases, we generated 500 Monte Carlo random samples of size 200. The performance of the estimators is illustrated via the MSE. The simulation results are given in Table 1.
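The data-generating design of Example 1 under Case 1 can be reproduced as follows; drawing the missingness indicator as Bernoulli with success probability $\Delta_1(U_i)$ is our assumption about how the MAR mechanism is simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Covariates as in Example 1: X = Z1 + Z2 + Z3, U = Z1 + Z2.
Z1 = rng.uniform(-1, 1, n)
Z2 = rng.uniform(-1, 1, n)
Z3 = rng.standard_normal(n)
X, U = Z1 + Z2 + Z3, Z1 + Z2

eps = rng.standard_normal(n)        # or t(3), or 0.8 N(0,1) + 0.2 N(0, 3^2)
Y = np.sin(np.pi * U) * X + eps

# Case 1 selection probability and the MAR missingness indicator.
Delta1 = np.where(np.abs(U) <= 0.5, 0.9 + 0.2 * U, 0.95)
delta = rng.binomial(1, Delta1)     # delta_i = 1 means Y_i is observed
```

Swapping `Delta1` for the Case 2 or Case 3 function changes the average missing rate accordingly.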
From Table 1, we can make the following observations:
Under the same selection probability function Δ(u) and the same sample size n, the MSE of IWLLCQR Δ̂ is slightly smaller than those of WLLCQR Δ and NWLLCQR Δ̂, and the MSE of INWLLCQR is also slightly smaller than those of WLLCQR Δ and NWLLCQR Δ̂, because much more of the information on the missing data is used by IWLLCQR Δ̂ and INWLLCQR. Meanwhile, the MSE of INWLLCQR is slightly greater than that of IWLLCQR Δ̂. Further, the MSE of IWLLCQR Δ̂ is only slightly greater than that of WLLCQR, which further confirms that the IWLLCQR Δ̂ method is a safe alternative to WLLCQR Δ and NWLLCQR Δ̂.
Now, the simulated curves are plotted with the case of ε i N ( 0 , 1 ) under different levels of missing rates. Here, the results are presented only when n = 200 , while the ones for the case of n = 100 are not given since these results were similar. Figure 1, Figure 2 and Figure 3 summarize the finite sample performance of the NWLLCQR Δ ^ , IWLLCQR Δ ^ , INWLLCQR, and WLLCQR methods for β ( u ) under different levels of missing rates. The red dashed curve, the blue dashed curve, the blue dotted curve, and the green dashed-dotted curve represent the results obtained by the NWLLCQR Δ ^ method, the IWLLCQR Δ ^ method, the INWLLCQR method, and the WLLCQR method, respectively. In addition, the red solid curve denotes the real curve of β ( u ) . From Figure 1, Figure 2 and Figure 3, we can see that:
(1) The simulation results based on the IWLLCQR Δ̂ method were similar to those based on the NWLLCQR Δ̂ method, the INWLLCQR method, and the WLLCQR method under a lower level of missing rate. However, the IWLLCQR Δ̂ method outperformed the INWLLCQR method, and the INWLLCQR method outperformed the NWLLCQR Δ̂ method, under a higher level of missing rate.
(2) It can be easily found that the simulated curve obtained by the IWLLCQR Δ ^ method was very close to the true curve. Thus, the imputed estimation was reasonable. However, the bias of the INWLLCQR method was slightly greater than those for the IWLLCQR Δ ^ and the WLLCQR method.
Example 2.
To examine the performance of the proposed test method, we consider the following model:
Y = β ( U ) X + ε ,
where X N ( 0 , 1 ) , ε t ( 5 ) , U follows a uniform distribution on [ 0 , 1 ] , and X , U , and ε are independent.
To illustrate our methods with this model, artificial missing data were created by deleting some of the response values at random; 40% of the response values are assumed missing in this example. Consider the testing problem:
$$H_0: \beta(u) = 1 \quad \text{versus} \quad H_1: \beta(u) = 1 + \lambda(u^2 - 0.5), \qquad 0 \le \lambda \le 1.$$
In what follows, the proposed test procedure is applied in a simulation with 500 replications. For each replication, 500 samples were generated, and the bootstrap sampling was repeated 300 times. The significance level is α = 0.05. Figure 4 shows that the simulated power increased quickly as λ increased. In particular, the simulated size of the test $T_n$ under the null hypothesis was 0.043, which is close to the nominal significance level α = 0.05. This demonstrates that the bootstrap estimate of the null distribution was considerably effective and that our test was very powerful.

6. A Real Data Example

In this section, we apply the methods proposed in this paper to the dataset on air pollution that the Norwegian Public Roads Administration collected. The dataset, which can be found in StatLib, consists of 500 observations. The varying coefficient model based on the CQR method was used by Guo and Tian [16] to fit the relation among the hourly values of the logarithm of the number of cars per hour ( X 1 ) , wind speed ( X 2 ) , the logarithm of the concentration of NO 2 (Y), and the hour of the day ( T ) . We deleted about 35 % of the completely observed Y randomly to illustrate our proposed methods. Now, we investigate the varying coefficient model with the response data missing:
Y = β 1 ( T ) X 1 + β 2 ( T ) X 2 + ε .
To check whether the coefficient functions of Model (15) are really time varying, we consider the following testing problem:
$$H_0: \beta(\cdot) = \beta \quad \text{versus} \quad H_1: \beta(\cdot) \ne \beta,$$
where β = ( β 1 , β 2 ) T is a constant vector. The model (15) is just a classical linear model if the null hypothesis in (16) is true. For the testing problem (16), we should reject the null hypothesis H 0 at a significance level of 0.05 because the p value of test T n was 0.00 based on 500 resampling bootstraps.
In addition, the estimated functions of β 1 ( · ) and β 2 ( · ) are given, and the computational results came from 500 simulation runs. The estimated coefficients and the standard deviations are summarized in Table 2 for WLLCQR Δ , NWLLCQR Δ ^ , IWLLCQR Δ ^ , INWLLCQR, and WLLCQR.
From Table 2, we find that IWLLCQR Δ̂ and WLLCQR had much smaller standard deviations than WLLCQR Δ, NWLLCQR Δ̂, and INWLLCQR, respectively. In what follows, we give the estimated functions of β1(·) and β2(·) along with the 95% bootstrap confidence bands. The results in Figure 5 and Figure 6 show that β1(u) and β2(u) were time varying. Furthermore, the IWLLCQR Δ̂ method had almost the same confidence intervals as the WLLCQR method. Hence, the IWLLCQR Δ̂ method is reasonable.

7. Discussions

In the simulation study and the practical application, we found that the CQRE method was much more stable, efficient, and effective for varying coefficient models with response data missing at random than the single QRE method and the least squares method when the sample size was large enough, across different error distributions; that is, its bias was relatively low and its correct selection rate relatively high.
In the face of high-dimensional data, none of these three methods was ideal, although the CQRE method was relatively better. Such data arise widely in many research fields, such as reliability life testing, genetic data research, medical tracking trials, population censuses, economics and finance, environmental monitoring, and biomedical research. How to modify our method to improve the performance of the CQRE method for high-dimensional varying coefficient models with response data missing at random is an important topic that we will study further.
On the other hand, this paper studied CQR estimation and inference for varying coefficient models only with response data missing at random. It did not consider the case of missing covariate data, nor the more general case in which both response and covariate data are missing. These more challenging topics will be explored in future work.

8. Concluding Remarks

In this paper, a CQRE method was proposed for varying coefficient models with response data missing at random; three estimators, the WLLCQR estimator, the NWLLCQR estimator, and the IWLLCQR estimator, were developed for the unknown coefficient functions, and the asymptotic normality of these estimators was established under some mild conditions. A simulation study demonstrated that the unknown coefficient estimators based on IWLLCQR were superior to those based on WLLCQR and NWLLCQR. Meanwhile, a bootstrap test procedure based on the IWLLCQR fittings was designed to test whether the coefficient functions were actually varying. Finally, a real-life dataset was analyzed to illustrate that the CQRE method is much more stable, efficient, and effective for varying coefficient models with response data missing at random than the single QRE method and the least squares method.

Author Contributions

All the authors inferred the main conclusions and approved of the current version of this manuscript.

Acknowledgments

The authors would like to thank the anonymous referees for their valuable comments and suggestions, which actually stimulated this work. The work was supported by the National Natural Science Foundation of China (11601409, 71501155 and 11201362), the Natural Science Foundation of Shaanxi Province of China (2016JM1009), and the Natural Science Foundation of the Department of Shaanxi Province of China (2017JK0344).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

QR: quantile regression
CQR: composite QR
WCQR: weighted CQR
CQRE: CQR estimation
LLQR: local linear QR
LLCQR: local linear CQR
WLLCQR: weighted LLCQR
NWLLCQR: nonparametric WLLCQR
IWLLCQR: imputed WLLCQR

Appendix A

The following conditions are needed for the results in Section 3.
(C1) { ( Y i , X i ) : i = 1 , 2 , , n } are independent and identically distributed random vectors.
(C2) The density function f ( · ) of ε has a continuous and uniformly-bounded derivative, namely 0 < sup s f ( s ) < B 0 .
(C3) The matrix $E(X_iX_i^{T} \mid U = u)$ is positive definite, and $E(X_i \mid U) = 0$.
(C4) Random variable U has a second-order differentiable density function f U ( u ) > 0 in some neighborhood of u.
(C5) The coefficient function β(u) is second-order differentiable in a neighborhood of the given u, and $\beta''(\cdot)$ is continuous.
(C6) The kernel function K(·) is a symmetric density function with compact support, and the bandwidth satisfies $h \to 0$ and $nh \to \infty$ as $n \to \infty$.
(C7) The bandwidth h 0 0 , h 0 / h 0 , and n h 0 as n .
(C8) The selection probability function Δ ( u ) > 0 has a bounded and continuous second derivative on the support of U.
The following lemma is useful for proving some theorems given in Section 3.
Lemma A1
(See Lemma 2 in [15]). Let $(Y_1, X_1), (Y_2, X_2), \ldots, (Y_n, X_n)$ be independent and identically distributed (i.i.d.) random vectors, where the $Y_i$ are scalar random variables. Suppose that $E|Y_i|^3 < \infty$ and $\sup_x \int |y|^s f(x, y)\,dy < \infty$, where $f(\cdot, \cdot)$ denotes the joint density of $(X, Y)$. Let $K(\cdot)$ be a bounded positive function with bounded support satisfying the Lipschitz condition. Then:
$$\sup_x \Big|\frac{1}{n}\sum_{i=1}^{n}\big\{K_h(X_i - x)Y_i - E[K_h(X_i - x)Y_i]\big\}\Big| = O_p\Big(\frac{\ln^{1/2}(1/h)}{\sqrt{nh}}\Big).$$
In what follows, the main theorems in Section 3 will be proven.
Proof of Theorem 1.
Let $K_i(u) = K\{(U_i - u)/h\}$, $s_i(u) = (U_i - u)/h$, and $\eta_{i,k}(u) = I(\varepsilon_i \le c_k - r_i(u)) - \tau_k$ with $r_i(u) = X_i^{T}[\beta(U_i) - \beta(u) - \beta'(u)(U_i - u)]$. Set $\theta = \sqrt{nh}\,\{\hat c_1 - c_1, \ldots, \hat c_q - c_q, [\hat a - \beta(u)]^{T}, [\hat b - \beta'(u)]^{T}\}^{T}$ and $X_{i,k}(u) = \{e_k^{T}, X_i^{T}, X_i^{T}(U_i - u)/h\}^{T}$, where $e_k$ is a q-dimensional vector with one at the k-th position and zeros elsewhere. Since:
$$(\hat c, \hat a, \hat b) = \arg\min_{c, a, b}\sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i}{\Delta(U_i)}\rho_{\tau_k}\big(Y_i - c_k - X_i^{T}[a + b(U_i - u)]\big)K_h(U_i - u),$$
θ is the minimizer of the criterion:
$$T_n(\Delta(U), \theta) = \sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i K_i(u)}{\Delta(U_i)}\Big[\rho_{\tau_k}\big(\varepsilon_i - c_k + r_i(u) - \Delta_{i,k}\big) - \rho_{\tau_k}\big(\varepsilon_i - c_k + r_i(u)\big)\Big],$$
where $\Delta_{i,k} = X_{i,k}^{T}(u)\theta/\sqrt{nh}$. Applying the following identity (see Knight [34]):
$$\rho_\tau(x - y) - \rho_\tau(x) = y[I(x < 0) - \tau] + \int_0^y [I(x \le s) - I(x \le 0)]\,ds,$$
we can rewrite T n ( Δ ( U i ) , θ ) as:
$$\begin{aligned}T_n(\Delta(U), \theta) &= \sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i K_i(u)}{\Delta(U_i)}\Big\{\Delta_{i,k}\big[I(\varepsilon_i \le c_k - r_i(u)) - \tau_k\big]\\ &\quad + \int_0^{\Delta_{i,k}}\big[I(\varepsilon_i - c_k + r_i(u) \le z) - I(\varepsilon_i - c_k + r_i(u) \le 0)\big]\,dz\Big\}\\ &= [W_n(u)]^{T}\theta + \sum_{k=1}^{q}B_{n,k}(\theta),\end{aligned}$$
where:
$$W_n(u) = \frac{1}{\sqrt{nh}}\sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i K_i(u)}{\Delta(U_i)}\eta_{i,k}(u)X_{i,k}(u), \quad \text{and}$$
$$B_{n,k}(\theta) = \sum_{i=1}^{n}\frac{\delta_i K_i(u)}{\Delta(U_i)}\int_0^{\Delta_{i,k}}\big[I(\varepsilon_i - c_k + r_i(u) \le z) - I(\varepsilon_i - c_k + r_i(u) \le 0)\big]\,dz.$$
Then, it follows from Lemma A1 that B n , k ( θ ) = E ( B n , k ( θ ) ) + o p ( 1 ) . Denote:
$$S_n(u) = \mathrm{Diag}\left\{\begin{pmatrix}S_{n,11}(u) & S_{n,12}(u)\\ S_{n,21}(u) & S_{n,22}(u)\end{pmatrix},\ S_{n,33}(u)\right\},$$
where:
S_{n,11}(u) = (1/(nh)) Diag( f(c_1) Σ_{i=1}^n K_i(u), …, f(c_q) Σ_{i=1}^n K_i(u) ),
S_{n,12}(u) = (1/(nh)) ( f(c_1) Σ_{i=1}^n K_i(u) X_i, …, f(c_q) Σ_{i=1}^n K_i(u) X_i )^T,
S_{n,22}(u) = (1/(nh)) Σ_{k=1}^q f(c_k) Σ_{i=1}^n K_i(u) X_i X_i^T,
S_{n,33}(u) = (1/(nh)) Σ_{k=1}^q f(c_k) Σ_{i=1}^n K_i(u) X_i X_i^T s_i^2(u).
By iterated expectation, we observe that:
Σ_{k=1}^q E[ B_{n,k}(θ) | X, U ] = Σ_{k=1}^q Σ_{i=1}^n K_i(u) ∫_0^{Δ_{i,k}} E[ I( ε_i − c_k + r_i(u) ≤ z ) − I( ε_i − c_k + r_i(u) ≤ 0 ) ] dz = Σ_{k=1}^q Σ_{i=1}^n K_i(u) ∫_0^{Δ_{i,k}} [ F( c_k − r_i(u) + z ) − F( c_k − r_i(u) ) ] dz = (1/2) θ^T { (1/(nh)) Σ_{k=1}^q Σ_{i=1}^n K_i(u) f(c_k)(1 + o(1)) X_{i,k}(u) X_{i,k}^T(u) } θ + o_p(1) = (1/2) θ^T S_n(u) θ + o_p(1).
As in Parzen [35], we have:
S_{n,11}(u) →_P S_{11}(u) = f_U(u) Diag{ f(c_1), …, f(c_q) },
S_{n,12}(u) →_P S_{12}(u) = f_U(u) ( f(c_1) E(X_i | U = u), …, f(c_q) E(X_i | U = u) )^T,
S_{n,22}(u) →_P S_{22}(u) = f_U(u) Σ_{k=1}^q f(c_k) E( X_i X_i^T | U = u ),
S_{n,33}(u) →_P S_{33}(u) = f_U(u) μ_2 Σ_{k=1}^q f(c_k) E( X_i X_i^T | U = u ).
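The kernel constants appearing above and in the asymptotic variance are μ_2 = ∫ u^2 K(u) du and ν_0 = ∫ K^2(u) du. As a quick numerical check (assuming, purely for illustration, the Epanechnikov kernel, for which μ_2 = 1/5 and ν_0 = 3/5):

```python
import numpy as np

# Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]
u = np.linspace(-1.0, 1.0, 100_001)
du = u[1] - u[0]
w = 0.75 * np.clip(1.0 - u**2, 0.0, None)

mu2 = np.sum(u**2 * w) * du   # integral of u^2 K(u) du
nu0 = np.sum(w**2) * du       # integral of K(u)^2 du
```

The Riemann sums coincide with the trapezoid rule here because the kernel vanishes at the endpoints, so both constants are recovered to high accuracy.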
Based on the above results, we can prove that:
T_n( Δ(U), θ ) = (1/2) θ^T S(u) θ + [ W_n(u) ]^T θ + o_p(1),
where:
S(u) = Diag( [ S_{11}(u), S_{12}(u); S_{21}(u), S_{22}(u) ], S_{33}(u) ).
By Corollary 2 of Knight [34], we have θ →_P −[S(u)]^{−1} W_n(u). Assume Condition (C3) is satisfied; then S(u) = Diag{ S_{11}(u), S_{22}(u), S_{33}(u) }. A simple calculation with the block matrix yields:
√(nh) ( β̂(u) − β(u) ) →_P −[ S_{22}(u) ]^{−1} W_{n,2}(u),
where W_{n,2}(u) = (1/√(nh)) Σ_{i=1}^n [ δ_i / Δ(U_i) ] K_i(u) η_i(u) X_i, with η_i(u) = Σ_{k=1}^q η_{i,k}(u). Let W̃_{n,2}(u) = (1/√(nh)) Σ_{i=1}^n [ δ_i / Δ(U_i) ] K_i(u) η_i X_i. It is easy to verify that E( W̃_{n,2}(u) ) = 0. As in Parzen [35], we obtain:
Var( W̃_{n,2}(u) ) = E[ (1/(nh)) Σ_{i=1}^n K_i^2(u) ( η_i^2 / Δ(U_i) ) X_i X_i^T ] = f_U(u) ν_0 E( ( η_i^2 / Δ(U_i) ) X_i X_i^T | U = u ).
By the central limit theorem, we get W̃_{n,2}(u) →_L N( 0, f_U(u) ν_0 Ω_u ). Similar to Kai et al. [15], we can show that:
Var( W̃_{n,2}(u) − W_{n,2}(u) | X, U ) ≤ ( q^2/(nh) ) Σ_{i=1}^n δ_i K_i^2(u) Δ^{−2}(U_i) X_i X_i^T × max_k { F( c_k + |r_i(u)| ) − F( c_k ) } = o_p(1).
By Slutsky’s theorem, we obtain:
W_{n,2}(u) − E[ W_{n,2}(u) ] →_L N( 0, f_U(u) ν_0 Ω_u ).
Since:
(1/√(nh)) E[ W_{n,2}(u) | X, U ] = (1/√(nh)) Σ_{k=1}^q Σ_{i=1}^n K_i(u) [ F( c_k − r_i(u) ) − F( c_k ) ] X_i = −(1/√(nh)) Σ_{k=1}^q Σ_{i=1}^n K_i(u) f(c_k) [ 1 + o(1) ] r_i(u) X_i,
it is easy to obtain:
(1/√(nh)) E[ W_{n,2}(u) ] = −( μ_2 h^2 / 2 ) f_U(u) Σ_{k=1}^q f(c_k) E( X_i X_i^T | U = u ) β″(u) + o(h^2) = −( μ_2 h^2 / 2 ) S_{22}(u) β″(u) + o(h^2).
Together with (A2) and (A4), we have:
√(nh) { β̂(u) − β(u) − (1/2) μ_2 h^2 β″(u) } →_L N( 0, ( ν_0 / f_U(u) ) D_u^{−1} Ω_u D_u^{−1} ).
This completes the proof of Theorem 1. □
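The criterion analyzed in this proof is a kernel-weighted composite check loss. The toy sketch below strips away the kernel weighting, the missing-data weights, and the joint estimation of the c_k (all simplifications assumed for illustration; this is not the paper's estimator) and recovers a scalar coefficient by grid-minimizing the composite loss with known error quantiles:

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta_true = 2000, 2.0
x = rng.uniform(0.5, 1.5, size=n)
eps = rng.uniform(-1.0, 1.0, size=n)   # errors with known quantiles
y = beta_true * x + eps

taus = np.array([0.25, 0.5, 0.75])
c = 2.0 * taus - 1.0                    # exact tau-quantiles of U(-1, 1)

def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def cqr_objective(beta):
    """Composite check loss summed over the quantile levels."""
    return sum(rho(y - ck - beta * x, tk).sum() for ck, tk in zip(c, taus))

grid = np.arange(1.5, 2.5, 0.005)
beta_hat = grid[np.argmin([cqr_objective(b) for b in grid])]
```

With uniform errors the τ_k-quantiles c_k are available in closed form, so the grid minimizer lands near the true coefficient; in the paper the c_k are instead estimated jointly with the local coefficients.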
Proof of Theorem 2.
Let θ* = √(nh) { ĉ_1* − c_1, …, ĉ_q* − c_q, [ â* − β(u) ]^T, h[ b̂* − β′(u) ]^T }^T. Similar to the proof of Theorem 1, we have:
T_n*( Δ̂(U), θ* ) = Σ_{k=1}^q Σ_{i=1}^n [ δ_i / Δ̂(U_i) ] K_i(u) { Δ_{i,k}* [ I( ε_i ≤ c_k − r_i(u) ) − τ_k ] + ∫_0^{Δ_{i,k}*} [ I( ε_i − c_k + r_i(u) ≤ z ) − I( ε_i − c_k + r_i(u) ≤ 0 ) ] dz } = [ W_n*(u) ]^T θ* + Σ_{k=1}^q B_{n,k}*(θ*),
where:
W_n*(u) = (1/√(nh)) Σ_{k=1}^q Σ_{i=1}^n [ δ_i / Δ̂(U_i) ] K_i(u) η_{i,k}(u) X_{i,k}(u),
B_{n,k}*(θ*) = Σ_{i=1}^n [ δ_i / Δ̂(U_i) ] K_i(u) ∫_0^{Δ_{i,k}*} [ I( ε_i − c_k + r_i(u) ≤ z ) − I( ε_i − c_k + r_i(u) ≤ 0 ) ] dz.
Let:
H_{n,k}*(θ*) = Σ_{i=1}^n [ δ_i ( Δ(U_i) − Δ̂(U_i) ) / ( Δ̂(U_i) Δ(U_i) ) ] ∫_0^{Δ_{i,k}*} K_i(u) [ I( ε_i − c_k + r_i(u) ≤ z ) − I( ε_i − c_k + r_i(u) ≤ 0 ) ] dz.
Then, B n , k * ( θ * ) = B n , k ( θ * ) + H n , k * ( θ * ) . It is easy to verify that:
Σ_{i=1}^n [ δ_i / Δ(U_i) ] ∫_0^{Δ_{i,k}*} K_i(u) [ I( ε_i − c_k + r_i(u) ≤ z ) − I( ε_i − c_k + r_i(u) ≤ 0 ) ] dz = O_p(1).
Considering the fact that sup_u | Δ̂(u) − Δ(u) | = o(1), it follows from (A5) that H_{n,k}*(θ*) = o_p(1), and then:
k = 1 q B n , k * ( θ * ) = k = 1 q B n , k ( θ * ) + o p ( 1 ) .
Similar to the proof of Theorem 1, we can prove that:
√(nh) ( β̂_N(u) − β(u) ) →_P −[ S_{22}(u) ]^{−1} W_{n,2}*(u),
where W_{n,2}*(u) = (1/√(nh)) Σ_{i=1}^n [ δ_i / Δ̂(U_i) ] K_i(u) η_i(u) X_i. Let W̃_{n,2}*(u) = (1/√(nh)) Σ_{i=1}^n [ δ_i / Δ̂(U_i) ] K_i(u) η_i X_i. By the proof of Theorem 3 in Wong et al. [33], we can obtain:
W̃_{n,2}*(u) = (1/√(nh)) Σ_{i=1}^n [ δ_i / Δ(U_i) ] K_i(u) η_i X_i − (1/√(nh)) Σ_{i=1}^n [ ( δ_i − Δ(U_i) ) / Δ(U_i) ] E[ K_i(u) η_i X_i | Z_i ] + o_p(h^2) = W̃_{n,21}*(u) + W̃_{n,22}*(u) + o_p(h^2),
where E ( W ˜ n , 21 * ( u ) ) = 0 and E ( W ˜ n , 22 * ( u ) ) = 0 . Furthermore,
Var( W̃_{n,21}*(u) ) = f_U(u) ν_0 E[ ( η_i^2 / Δ(U_i) ) X_i X_i^T | U = u ],
Var( W̃_{n,22}*(u) ) = f_U(u) ν_0 E[ ( ( 1 − Δ(U_i) ) / Δ(U_i) ) ( E{ X_i η_i | U_i } )^{⊗2} ],
Cov( W̃_{n,21}*(u), W̃_{n,22}*(u) ) = −f_U(u) ν_0 E[ ( ( 1 − Δ(U_i) ) / Δ(U_i) ) ( E{ X_i η_i | U_i } )^{⊗2} ],
where a^{⊗2} = a a^T. Completing the calculation, we obtain:
Var( W̃_{n,2}*(u) ) = f_U(u) ν_0 { E[ ( η_i^2 / Δ(U_i) ) X_i X_i^T | U = u ] − E[ ( ( 1 − Δ(U_i) ) / Δ(U_i) ) ( E{ X_i η_i | U_i } )^{⊗2} ] } + o(1).
Based on the above results, it follows that W̃_{n,2}*(u) →_L N( 0, f_U(u) ν_0 Ω_u* ). Similar to the proof of Theorem 1, we have Var( W̃_{n,2}*(u) − W_{n,2}*(u) | X, U ) = o_p(1). Thus:
W_{n,2}*(u) − E[ W_{n,2}*(u) ] →_L N( 0, f_U(u) ν_0 Ω_u* ).
By Lemma A1, we get:
(1/√(nh)) Σ_{i=1}^n δ_i K_i(u) η_i X_i →_P E[ (1/√(nh)) Σ_{i=1}^n δ_i K_i(u) η_i X_i ] = O(h^2).
Since 1/Δ̂(U_i) − 1/Δ(U_i) = o_p(1), then:
W_{n,2}*(u) = (1/√(nh)) Σ_{i=1}^n [ δ_i / Δ(U_i) ] K_i(u) η_i X_i + (1/√(nh)) Σ_{i=1}^n δ_i [ 1/Δ̂(U_i) − 1/Δ(U_i) ] K_i(u) η_i X_i = W_{n,2}(u) + o_p(h^2).
Thus, we can show that:
(1/√(nh)) E[ W_{n,2}*(u) ] = (1/√(nh)) E[ W_{n,2}(u) ] + o(h^2).
Following (A7), (A9), and Theorem 1, we complete the proof of Theorem 2. □
Proof of Theorem 3.
Write θ** = √(nh) { ĉ_1** − c_1, …, ĉ_q** − c_q, [ â** − β(u) ]^T, h[ b̂** − β′(u) ]^T }^T, Δ_{i,k}** = X_{i,k}^T(u) θ** / √(nh), and η_{i,k}*(u) = I( ( δ_i / π̂(U_i) ) ε_i ≤ c_k − r_i(u) ) − τ_k. Then we have:
Y_i* = ( δ_i / π̂(U_i) ) Y_i + ( 1 − δ_i / π̂(U_i) ) X_i^T β̂_C(U_i) = ( δ_i / π̂(U_i) ) ε_i + X_i^T β̂_C(U_i) + o_p(1).
Similar to the proof of Theorem 2, we have:
T_n**( π̂(U), θ** ) = Σ_{k=1}^q Σ_{i=1}^n δ_i K_i(u) { Δ_{i,k}** [ I( (1/π̂(U_i)) ε_i ≤ c_k − r_i(u) ) − τ_k ] + ∫_0^{Δ_{i,k}**} [ I( (1/π̂(U_i)) ε_i − c_k + r_i(u) ≤ z ) − I( (1/π̂(U_i)) ε_i − c_k + r_i(u) ≤ 0 ) ] dz } + Σ_{k=1}^q Σ_{i=1}^n ( 1 − δ_i / π̂(U_i) ) K_i(u) { Δ_{i,k}** [ I( 0 ≤ c_k − r_i(u) ) − τ_k ] + ∫_0^{Δ_{i,k}**} [ I( r_i(u) − c_k ≤ z ) − I( r_i(u) − c_k ≤ 0 ) ] dz } = [ W_n**(u) ]^T θ** + Σ_{k=1}^q B_{n,k}**(θ**),
where:
W_n**(u) = (1/√(nh)) Σ_{k=1}^q Σ_{i=1}^n δ_i K_i(u) η_{i,k}*(u) X_i + (1/√(nh)) Σ_{k=1}^q Σ_{i=1}^n ( 1 − δ_i / π̂(U_i) ) K_i(u) ξ_{i,k}*(u) X_i, where ξ_{i,k}*(u) = I( 0 ≤ c_k − r_i(u) ) − τ_k,
B_{n,k}**(θ**) = Σ_{i=1}^n δ_i K_i(u) ∫_0^{Δ_{i,k}**} [ I( (1/π̂(U_i)) ε_i − c_k + r_i(u) ≤ z ) − I( (1/π̂(U_i)) ε_i − c_k + r_i(u) ≤ 0 ) ] dz + Σ_{i=1}^n ( 1 − δ_i / π̂(U_i) ) K_i(u) ∫_0^{Δ_{i,k}**} [ I( r_i(u) − c_k ≤ z ) − I( r_i(u) − c_k ≤ 0 ) ] dz.
We can prove:
Σ_{i=1}^n ( 1 − δ_i / π̂(U_i) ) K_i(u) ∫_0^{Δ_{i,k}**} [ I( r_i(u) − c_k ≤ z ) − I( r_i(u) − c_k ≤ 0 ) ] dz = o_p(1).
Similar to the proof of Theorem 1, we can complete the proof. □

References

  1. Hastie, T.J.; Tibshirani, R.J. Varying-coefficient models. J. R. Stat. Soc. Ser. B 1993, 55, 757–796.
  2. Chiang, C.T.; Rice, J.A.; Wu, C.O. Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. J. Am. Stat. Assoc. 2001, 96, 605–619.
  3. Eubank, R.L.; Huang, C.; Maldonado, Y.M.; Wang, N.; Wang, S.; Buchanan, R.J. Smoothing spline estimation in varying coefficient models. J. R. Stat. Soc. Ser. B 2004, 66, 653–667.
  4. Fan, J.; Zhang, J.T. Statistical estimation in varying coefficient models. Ann. Stat. 1999, 27, 1491–1518.
  5. Huang, J.; Wu, C.O.; Zhou, L. Varying coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 2002, 89, 111–128.
  6. Wu, C.O.; Yu, K.F.; Chiang, C.T. A two-step smoothing method for varying coefficient models with repeated measurements. Ann. Inst. Stat. Math. 2000, 52, 519–543.
  7. Fan, J.; Huang, T. Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli 2005, 11, 1031–1057.
  8. Whang, Y.J. Smoothed empirical likelihood methods for quantile regression models. Econom. Theory 2006, 22, 173–205.
  9. Koenker, R. Quantile Regression; Cambridge University Press: Cambridge, UK, 2005.
  10. Kim, M.O. Quantile regression with varying coefficients. Ann. Stat. 2007, 35, 92–108.
  11. Cai, Z.; Xu, X. Nonparametric quantile estimations for dynamic smooth coefficient models. J. Am. Stat. Assoc. 2008, 103, 1595–1608.
  12. Cai, Z.; Xiao, Z. Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. J. Econom. 2012, 167, 413–425.
  13. Tang, Q.G. Robust estimation for spatial semiparametric varying coefficient partially linear regression. Stat. Pap. 2015, 56, 1137–1161.
  14. Zou, H.; Yuan, M. Composite quantile regression and the oracle model selection theory. Ann. Stat. 2008, 36, 1108–1126.
  15. Kai, B.; Li, R.; Zou, H. New efficient estimation and variable selection methods for semiparametric varying coefficient partially linear models. Ann. Stat. 2011, 39, 305–332.
  16. Guo, J.; Tian, M.Z. New efficient and robust estimation in varying coefficient models with heteroscedasticity. Stat. Sin. 2012, 22, 1075–1101.
  17. Sun, J.; Gai, Y.; Lin, L. Weighted local linear composite quantile estimation for the case of general error distributions. J. Stat. Plan. Inference 2013, 143, 1049–1063.
  18. Yang, H.; Lv, J.; Guo, C.H. Weighted composite quantile regression estimation and variable selection for varying coefficient models with heteroscedasticity. J. Korean Stat. Soc. 2015, 44, 77–94.
  19. Luo, S.; Zhang, C.-Y. Nonparametric M-type regression estimation under missing response data. Stat. Pap. 2016, 57, 641–664.
  20. Rubin, D.B. Inference and missing data. Biometrika 1976, 63, 581–592.
  21. Sterne, J.; White, I.; Carlin, J.; Spratt, M.; Royston, P.; Kenward, M.; Carpenter, J. Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. BMJ 2009, 338, b2393.
  22. Wang, Q.; Linton, O.; Härdle, W. Semiparametric regression analysis with missing response at random. J. Am. Stat. Assoc. 2004, 99, 334–345.
  23. Wang, Q.; Sun, Z. Estimation in partially linear models with missing responses at random. J. Multivar. Anal. 2007, 98, 1470–1493.
  24. Wang, Q.; Rao, J.N.K. Empirical-likelihood-based inference under imputation for missing response data. Ann. Stat. 2002, 30, 896–924.
  25. Xue, L.G. Empirical likelihood confidence intervals for response mean with data missing at random. Scand. J. Stat. 2009, 36, 671–685.
  26. Wei, Y.; Ma, Y.; Carroll, R. Multiple imputation in quantile regression. Biometrika 2012, 99, 423–438.
  27. Lv, X.; Li, R. Smoothed empirical likelihood analysis of partially linear quantile regression models with missing response variables. Adv. Stat. Anal. 2013, 97, 317–347.
  28. Sherwood, B.; Wang, L.; Zhou, X. Weighted quantile regression for analyzing health care cost data with missing covariates. Stat. Med. 2013, 32, 4967–4979.
  29. Sun, Y.; Wang, Q.; Gilbert, P. Quantile regression for competing risks data with missing cause of failure. Stat. Sin. 2012, 22, 703–728.
  30. Chen, X.; Wan, A.T.K.; Zhou, Y. Efficient quantile regression analysis with missing observations. J. Am. Stat. Assoc. 2015, 110, 723–741.
  31. Kim, S.Y. Imputation methods for quantile estimation under missing at random. Stat. Interface 2013, 6, 369–377.
  32. Rao, N.S.V. Nadaraya–Watson estimator for sensor fusion. Opt. Eng. 1997, 36, 642–647.
  33. Wong, H.; Guo, S.J.; Chen, M.; Ip, W.-C. On locally weighted estimation and hypothesis testing on varying coefficient models with missing covariates. J. Stat. Plan. Inference 2009, 139, 2933–2951.
  34. Knight, K. Limiting distributions for L1 regression estimators under general conditions. Ann. Stat. 1998, 26, 755–770.
  35. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
Figure 1. The comparison between the true curve and the NWLLCQR (Δ̂), IWLLCQR (Δ̂), INWLLCQR, and WLLCQR simulation curves when n = 200 and the selection probability function is Δ_1(u).
Figure 2. The comparison between the true curve and the NWLLCQR (Δ̂), IWLLCQR (Δ̂), INWLLCQR, and WLLCQR simulation curves when n = 200 and the selection probability function is Δ_2(u).
Figure 3. The comparison between the true curve and the NWLLCQR (Δ̂), IWLLCQR (Δ̂), INWLLCQR, and WLLCQR simulation curves when n = 200 and the selection probability function is Δ_3(u).
Figure 4. The comparison between the true curve and the NWLLCQR (Δ̂), IWLLCQR (Δ̂), INWLLCQR, and WLLCQR simulation curves when n = 200 and the selection probability function is Δ_3(u).
Figure 5. The comparison between the true curve and the NWLLCQR (Δ̂), IWLLCQR (Δ̂), INWLLCQR, and WLLCQR simulation curves when n = 200 and the selection probability function is Δ_3(u).
Figure 6. The comparison between the true curve and the NWLLCQR (Δ̂), IWLLCQR (Δ̂), INWLLCQR, and WLLCQR simulation curves when n = 200 and the selection probability function is Δ_3(u).
Table 1. The MSE for estimators in Example 1.

Model Error | Δ(u)    | n   | WLLCQR (Δ) | NWLLCQR (Δ̂) | IWLLCQR (Δ̂) | INWLLCQR | WLLCQR
Error(1)    | Δ_1(u)  | 100 | 0.1219     | 0.1213       | 0.1192       | 0.1201   | 0.1182
            |         | 200 | 0.1017     | 0.1012       | 0.0997       | 0.1003   | 0.0973
            | Δ_2(u)  | 100 | 0.1845     | 0.1792       | 0.1701       | 0.1715   | 0.1645
            |         | 200 | 0.1701     | 0.1698       | 0.1641       | 0.1654   | 0.1583
            | Δ_3(u)  | 100 | 0.2685     | 0.2518       | 0.2346       | 0.2371   | 0.2207
            |         | 200 | 0.1976     | 0.1903       | 0.1826       | 0.1842   | 0.1612
Error(2)    | Δ_1(u)  | 100 | 0.0789     | 0.0775       | 0.0719       | 0.0728   | 0.0696
            |         | 200 | 0.0646     | 0.0612       | 0.0598       | 0.0609   | 0.0559
            | Δ_2(u)  | 100 | 0.1124     | 0.1102       | 0.1019       | 0.1027   | 0.0921
            |         | 200 | 0.0997     | 0.0904       | 0.0898       | 0.0905   | 0.0802
            | Δ_3(u)  | 100 | 0.3954     | 0.3542       | 0.3257       | 0.3302   | 0.3024
            |         | 200 | 0.3356     | 0.3298       | 0.3021       | 0.3075   | 0.2814
Error(3)    | Δ_1(u)  | 100 | 0.0598     | 0.0568       | 0.0514       | 0.0529   | 0.0498
            |         | 200 | 0.0528     | 0.0515       | 0.0498       | 0.0502   | 0.0439
            | Δ_2(u)  | 100 | 0.0687     | 0.0665       | 0.0621       | 0.0632   | 0.0596
            |         | 200 | 0.0579     | 0.0558       | 0.0523       | 0.0545   | 0.0495
            | Δ_3(u)  | 100 | 0.1102     | 0.1017       | 0.0987       | 0.1004   | 0.0812
            |         | 200 | 0.0957     | 0.0922       | 0.0892       | 0.0904   | 0.0759
Table 2. The coefficient estimates and sample standard deviations (in parentheses) for the air pollution data.

         | WLLCQR (Δ)     | NWLLCQR (Δ̂)   | IWLLCQR (Δ̂)   | INWLLCQR      | WLLCQR
β_1(t)   | −0.312 (0.046) | −0.307 (0.045) | −0.315 (0.042) | −0.30 (0.044) | −0.316 (0.041)
β_2(t)   | −0.379 (0.104) | −0.378 (0.102) | −0.375 (0.099) | −0.376 (0.101)| −0.374 (0.098)
