Composite Quantile Regression for Varying Coefficient Models with Response Data Missing at Random

Composite quantile regression (CQR) estimation and inference are studied for varying coefficient models with response data missing at random. Three estimators including the weighted local linear CQR (WLLCQR) estimator, the nonparametric WLLCQR (NWLLCQR) estimator, and the imputed WLLCQR (IWLLCQR) estimator are proposed for unknown coefficient functions. Under some mild conditions, the proposed estimators are asymptotic normal. Simulation studies demonstrate that the unknown coefficient estimators with IWLLCQR are superior to the other two with WLLCQR and NWLLCQR. Moreover, bootstrap test procedures based on the IWLLCQR fittings is developed to test whether the coefficient functions are actually varying. Finally, a type of investigated real-life data is analyzed to illustrated the applications of the proposed method.


Introduction
The varying coefficient model, proposed originally by Hastie and Tibshirani [1], is flexible and powerful to examine the dynamic changes of regression coefficients over some factors such as time and age and has gained much popularity during the past few decades (see [2][3][4][5][6]).
A classical varying coefficient model has the following structure: where Y ∈ R is a response variable, X = (X 1 , · · · , X p ) T ∈ R p is a covariate vector, β(·) = (β 1 (·), · · · , β p (·)) T ∈ R p is an unknown coefficient vector function with a smoothing variable U, and ε is a random error independent of (X, U).
Recently, some estimates of β(·) for Model (1) with the least squares regression have attracted many researchers' attention. Above all, Hastie and Tibshirani [1] considered L 2 penalized least squares estimation and attained some good results. Following, Fan et al. [4] and Fan et al. [7] applied the least squares regression to propose a two-step local polynomial estimation procedure and a profile estimator for Model (1), respectively, and designed some suitable statistical inference procedures. However, there arises a dilemma that these estimation procedures of the least squares could be very sensitive to outliers [8]. In order to overcome this problem, quantile regression proposed by Koenker [9] can be thought of as an alternative because as a mean model, traditional least squares regression only gives the effects of the covariates at the center of the distribution, while quantile regression can not only directly estimate the ones at different quantiles, but also characterize the entire conditional distribution of a dependent variable of the regression [8]. Thus, this regression has a much better robust property when processing outlier observations. Due to its significant theoretical advances, some scholars had integrated the quantile regression into the varying coefficient model. Kim [10] attained the quantile regression model with the varying coefficient. For the processing of time series data, Cai and Xu [11] developed nonparametric quantile estimations with dynamic smooth coefficient models. Later, Cai and Xiao [12] applied dynamic models with partially-varying coefficients to investigate semiparametric quantile regression and obtained some useful results. Tang [13] derived a robust quantile regression estimation using the spatial semiparametric partially-linear regression model with a varying coefficient. Unfortunately, a relative small efficiency may result by a single quantile regression procedure compared with the least squares regression. In order to overcome this drawback, it is very necessary to get a desirable efficient and stable estimator. In recent years, an oracle procedure of composite quantile regression (CQR) was proposed by Zou and Yuan [14] to select the significant variables, and some important theoretical and applied results were derived. So far, the CQR method is widely used in many situations. For example, some efficient estimators based on the CQR method were proposed by Kai et al. [15] and Guo et al. [16] for semi-parametric partially-linear models with a varying coefficient. In addition, a data-driven weighted CQR (WCQR) estimation was studied by Sun et al. [17] and Yang et al. [18] for linear models with a varying coefficient, respectively.
Although the QR has significant theoretical properties such that its literature on the complete data has been rapidly growing, people have paid scant attention to the incomplete data, i.e., the data samples containing missing values since this class of data may lead easily to substantially-distorted results. In fact, as is commonplace, missing data often appear in real life. There are various reasons such as failure on the part of investigators when gathering correct information, the unwillingness of some sampled units when supplying the desired information, loss of information caused by uncontrollable factors, and so forth, resulting in the data missing. In the early 1970s, the advances in computer technology such that many laborious numerical calculations were possible to perform spurred the literature on statistical analysis of real data containing missing values in applied work; see [19][20][21][22][23][24][25]. Despite a long history on missing data analysis, little work on QR has taken missing data into account. Recently, an iterative imputation procedure was developed by Wei et al. [26] in a linear QR model with non-i.i.d. error terms for the covariates with missing values. A smoothed empirical likelihood analysis was discussed by Lv and Li [27] for partially-linear quantile regression with missing response. An inverse probability weighting QR approach was proposed by Sherwood et al. [28] in the last few years for analyzing healthcare cost data with missing covariates at random. The QR for competing risk data was studied by Sun et al. [29] when the failure type was missing. An efficient QR analysis was discussed by Chen [30] with missing observations. Some imputation methods were proposed by Shu [31] for quantile estimation under data missing at random.
In this paper, a coherent inference framework based on CQR estimation and inference is explored for varying coefficient models with response data missing at random. The main contribution of this paper can be summarized as follows: • A composite quantile regression estimation (CQRE) method is proposed for the analysis of varying coefficient models with response data missing at random. This method has the following two advantages: (1) the CQRE method can effectively overcome not only the drawback of a relative small efficiency that may result from a single quantile regression procedure compared with the least-squares regression, but also the interference of non-normal error; hence, it improves its estimation efficiency significantly; (2) since different quantiles are used in the imputation instead of actually observed responses or means and the robustness of quantile regression is inherited, the CQRE method is less sensitive to outliers; thus, the CQRE method is more effective and robust than the single quantile regression method and the classical least squares method.

•
Three estimators including the weighted local linear CQR (WLLCQR) estimator, the nonparametric WLLCQR (NWLLCQR) estimator, and the imputed WLLCQR (IWLLCQR) estimator are proposed for an unknown coefficient function in the varying coefficient model to establish the asymptotic normality of these estimators under some mild conditions.
The rest of this paper is organized as follows. The CQR varying coefficient model will be introduced with missing response data in Section 2 to construct a class of estimators for an unknown coefficient function. Then, some theoretical results on the asymptotic property of the proposed estimators are proposed in Section 3. In Section 4, a bootstrap-based test procedure is developed to perform a simulation study in Section 5 that demonstrates the finite-sample performance of the proposed method. Following, an application to a real dataset illustrates the effectiveness of our approach in Section 6. In addition, some discussions and conclusion remarks are presented in Section 7 and Section 8, respectively. Finally, the proof of the main results is given in Appendix A.

Estimation Based on the CQR Varying Coefficient Model With Missing Response
In this section, the CQR varying coefficient model will be introduced with missing response data to construct a class of estimators for an unknown coefficient function. In particular, as the main estimate methods in this paper, three estimators including the weighted local linear CQR (WLLCQR) estimator, the nonparametric WLLCQR (NWLLCQR) estimator, and the imputed WLLCQR (IWLLCQR) estimator are constructed and emphasized. Let n} be a random sample coming from Model (1), such that: where all the X i ∈ R p and U i ∈ R are always observed, and β(·) = (β 1 (·), · · · , β j (·), · · · , β p (·)) T ∈ R p is the coefficient vector function. Further, δ i = 0 if Y i is missing and δ i = 1 otherwise. We assume that throughout this paper, Y i is missing at random (MAR) for some i. This assumption indicates that δ i and Y i are conditionally independent given X i and U i , that is, where Z i = (X i , U i ) T . Moreover, we also assume that across different quantile regression models, there is the same coefficient vector function β(·). Thus, we can express the conditional τ-quantile function of Y as: where c τ is the τ-quantile of ε. If β j (·) is differentiable, Taylor's expansion yields that: for j = 1, 2, · · · , p, where u is a fixed value of a random variable and U lies in a neighborhood of u.
For the case of no missing response data, minimizing the following criterion: we can attain the local linear quantile regression (LLQR) estimator of β(u), where ρ τ (u) = u(τ − I (u<0) ) is called the quantile loss function of τ-quantile regression, a = (a 1 , · · · , a p ) T , b = (b 1 , · · · , b p ) T , and K h (·) = K h (·/h)/h is a Gaussian kernel function with bandwidth h. In order to improve the quantile regression estimation efficiently, the local linear composite quantile regression (LLCQR) estimation is adopted from Guo et al. [16] for the varying coefficient models. Let q be the number of quantiles and τ k = k/(1 + q) for k = 1, · · · , q. The loss function of the LLCQR estimation is defined as: where c k is the τ k -quantile of ε.
In what follows, this technique of LLCQR will be extended to handle the case of response data missing at random.

WLLCQR Estimation
The inverse probability weighting (IPW) version of local linear CQR estimation will be considered to handle missing responses data at random, that is the CC (complete-case) analysis will be adjusted by using the inverse of the selection probability as the weight. However, the nonparametric smoothing estimation of π(·) will encounter the curse of dimensionality when the dimension of Z is high enough. Motivated by Wang [24], we use the inverse marginal probability weighted approach. Let i.e., the propensity score just depends on U. When the inverse marginal probability function ∆(u) is known, the WLLCQR estimatorβ(u) of β(u) is defined as: where c = (c 1 , · · · , c q ) T and ∆(u) = P(δ = 1|U i = u). Here,β(u) =â is called the WLLCQR estimator of β(u) with ∆(u).

Nonparametric WLLCQR Estimation
However, the inverse marginal probability function in practical situations is usually unknown, and thus, it needs to be estimated. We often employ nonparametric smoothing estimation approaches to estimate the unknown selection probability ∆(·). The Nadaraya-Watson estimation [32] is one of these nonparametric smoothing estimation approaches. We can define the Nadaraya-Watson estimator of ∆(u) as:∆ where L h 0 (·) = L h 0 (·/h 0 )/h 0 is a density kernel function and h 0 is a bandwidth. Therefore, the NWLLCQR estimation procedure with∆(u) is formally defined as: whereβ N (u) =â N is called the NWLLCQR estimator of β(u) with∆(u).

Imputed WLLCQR Estimation
Although both the WLLCQR estimator and NWLLCQR estimator can well estimate the inverse marginal probability function, the information contained in the data is not explored fully. Now, we use quantile regression imputation to resolve the issue by imputing Y i by X Tβ Therefore, the imputed WLLCQR estimation procedure can be defined as: where

Remark 1.
Since local results in interpolation vary greatly and are unstable, as a smoothing method, the kernel function is used in Equations (4)- (10) such that the interpolation results of these equations are much smoother and stabler.

Asymptotic Properties
In this section, the asymptotic distribution will be considered for the estimators proposed in Section 2 to establish some theoretical results of these estimators.
Let f (·) and f U (·) be the density functions of ε and U, respectively. For simplicity, the following notations: will be used in this section. Now, the following results are established.
Theorem 2. Suppose ∆(u) > 0 is a smoothing function of u, based on Conditions C1-C7 in the Appendix holding. Then: :

A Bootstrap-Based Goodness-of-Fit Test
In investigating the varying coefficient model, how to test whether unknown coefficient functions are actually varying is of importance. In this section, the testing problem is considered for Model (1) under response missing, and then, a goodness-of-fit test is proposed based on the difference between the weighted residual sums of the quantile (WRSQ) and the LLCQR fittings under both the null and alternative hypotheses.
The following testing problem: for simplicity, is considered, where β is a constant vector. The model (1) becomes a classical linear model with missing responses under the null hypothesis. The WRSQ under H 0 is defined as: whereĉ 1 , · · · ,ĉ q andβ I are given by the following IWCQR estimation procedure: Similarly, the WRSQ under H 1 can be defined as: where cÎ 1 , · · · , cÎ q andβ I (u) are given in (10). Then, the following test statistic is given as: For a large value of T n , the null hypothesis (11)  Step 1. Assume the number of complete data is m. We get the IWLLCQR estimatorβ I (U i ).
Step 2 is repeated for M times, and then, series sets Step 4. The p value is approximately estimated byp = S M , where S is the cardinality of the set S = {j | T * n,j ≥ T n , j = 1, · · · , M}.

Simulation Study
A simulation study was carried out to investigate the finite-sample properties of our proposed method by a comparison among the WLLCQR estimation method, the NWLLCQR estimation method, the IWLLCQR estimation method, the INWLLCQR (imputed not weighted LLCQR) estimation method, and the WLLCQR estimation method without data missing, defined in (5).
In numerical studies, generally, the kernel function K(x) is taken to be K(x) = 0.75(1 − x 2 )I (|x|≤1) . Here, this function is still adopted. It follows that the cross-validation method is used to select the optimal bandwidths h opt . In the subsequent examples, let the composite level q = 8. Example 1. Consider the following model: , Z i3 are independent, Z i1 and Z i2 follow a uniform distribution on [−1, 1], Z i3 ∼ N(0, 1), and three error distributions of ε i are considered including ε i ∼ N(0, 1), 2N(0, 3 2 ).
The average missing rates of Y corresponding to these three selection probability functions are approximately 0.15, 0.36, and 0.45, respectively. For each of the three cases, we generated 500 Monte Carlo random samples of size 200. The performance of the estimators is illustrated via the MSE. The simulation results are given in Table 1. From Table 1, we can make the following observations: Under the same selection probability function ∆(u) and the same sample size n, the MSE of IWLLCQR∆ is only slightly smaller than the ones of WLLCQR ∆ and NWLLCQR∆, respectively; the MSE of INWLLCQR is also slightly smaller than the ones of WLLCQR ∆ and NWLLCQR∆, because much more information on missing data is considered in IWLLCQR∆ and INWLLCQR, while the MSE of INWLLCQR is slightly greater than the one of IWLLCQR∆. Further the MSE of IWLLCQR∆ is only slightly greater than the one of WLLCQR; this further confirms that the IWLLCQR∆ method is a safe alternative to WLLCQR ∆ and NWLLCQR∆. Now, the simulated curves are plotted with the case of ε i ∼ N(0, 1) under different levels of missing rates. Here, the results are presented only when n = 200, while the ones for the case of n = 100 are not given since these results were similar. Figures 1-3 summarize the finite sample performance of the NWLLCQR∆, IWLLCQR∆, INWLLCQR, and WLLCQR methods for β(u) under different levels of missing rates. The red dashed curve, the blue dashed curve, the blue dotted curve, and the green dashed-dotted curve represent the results obtained by the NWLLCQR∆ method, the IWLLCQR∆ method, the INWLLCQR method, and the WLLCQR method, respectively. In addition, the red solid curve denotes the real curve of β(u). From Figures 1-3, we can see that: (1) The simulation results based on the IWLLCQR∆ method were similar to those based on the NWLLCQR∆ method, the INWLLCQR∆ method, and the WLLCQR method under a lower level of missing rate. However, the IWLLCQR∆ method outperformed the INWLLCQR method, and the INWLLCQR method outperformed the NWLLCQR∆ method, under a higher level of missing rate.
(2) It can be easily found that the simulated curve obtained by the IWLLCQR∆ method was very close to the true curve. Thus, the imputed estimation was reasonable. However, the bias of the INWLLCQR method was slightly greater than those for the IWLLCQR∆ and the WLLCQR method.

Example 2.
To examine the performance of the proposed test method, we consider the following model: where X ∼ N(0, 1), ε ∼ t(5), U follows a uniform distribution on [0, 1], and X, U, and ε are independent.
In order to illustrate our methods by using the dataset, artificial missing data were created by deleting some of the response values in the dataset at random. Assume that 40% of the response values in this data are missed in this example. Consider the testing problem: In what follows, the proposed test procedure is applied in a simulation with 500 replications. For each replication, 500 samples were generated, and the bootstrap sampling was repeated 300 times. Suppose the significance level α = 0.05. Figure 4 shows that the simulated powers increased quickly as λ increased. In particular, the simulated size of the test T n was 0.043, which is close to the true significant level of α = 0.05, when the null hypothesis holds. This demonstrates that the bootstrap estimate of the null distribution was considerably effective, which shows that our test was very powerful.

A Real Data Example
In this section, we apply the methods proposed in this paper to the dataset on air pollution that the Norwegian Public Roads Administration collected. The dataset, which can be found in StatLib, consists of 500 observations. The varying coefficient model based on the CQR method was used by Guo and Tian [16] to fit the relation among the hourly values of the logarithm of the number of cars per hour (X 1 ), wind speed (X 2 ), the logarithm of the concentration of NO 2 (Y), and the hour of the day (T). We deleted about 35% of the completely observed Y randomly to illustrate our proposed methods. Now, we investigate the varying coefficient model with the response data missing: Since the coefficient functions of the model (15) are really time varying, we need to consider the following testing problem: where β = (β 1 , β 2 ) T is a constant vector. The model (15) is just a classical linear model if the null hypothesis in (16) is true. For the testing problem (16), we should reject the null hypothesis H 0 at a significance level of 0.05 because the p value of test T n was 0.00 based on 500 resampling bootstraps.
In addition, the estimated functions of β 1 (·) and β 2 (·) are given, and the computational results came from 500 simulation runs. The estimated coefficients and the standard deviations are summarized in Table 2 for WLLCQR ∆ , NWLLCQR∆, IWLLCQR∆, INWLLCQR, and WLLCQR.   Table 2, we can find that IWLLCQR∆ and WLLCQR had much smaller standard deviations than WLLCQR ∆ , WLLCQR∆, and INWLLCQR, respectively. In what follows, we give estimated functions of β 1 (·) and β 2 (·) along with the 95% bootstrap confidence bands. The results in Figures 5 and 6 show that β 1 (u) and β 2 (u) were time varying. Furthermore, we can also see that the IWLLCQR∆ method had almost equal confidence intervals as the WLLCQR method. Hence, the IWLLCQR∆ method was reasonable.

Discussions
In the simulation study and the practical applications, we mainly found that the CQRE method was much more stable, efficient, and effective for varying coefficient models with response data missing at random than the single QRE method and least squares method when the sample size was large enough and the error faced a different distribution, which means the bias was at a relatively low level and the correct selection rate relatively higher.
In the face of high-dimensional data, these three method were not ideal, but the CQRE method was relatively better. This case arises widely in many research fields such as reliability life testing, genetic data research, medical tracking trials, population census, economics and finance, environment monitoring and biomedical research, etc. How to modify our method to improve the performance of the CQRE method for high-dimensional varying coefficient models with response data missing at random is an important topic that we will study further.
On the other hand, this paper with only response data missing at random studied the CQR estimation and inference for varying coefficient models. However, it did not consider the case of covariant data missing, nor even the more general case of both response and covariant data missing. These problems are more challenging topics that we will explore and study further in the coming year.

Concluding Remarks
In this paper, a CQRE method was proposed for varying coefficient models with response data missing at random to develop three estimators including the WLLCQR estimator, the NWLLCQR estimator, and the IWLLCQ estimator for unknown coefficient functions and establish some results on the asymptotic normality of these proposed estimators under some mild conditions. Following, a bootstrap-based test procedure was designed to perform a simulation study, which demonstrated that the unknown coefficient estimators with IWLLCQR were superior to the other two ones with WLLCQR and NWLLCQR. Meanwhile, based on the IWLLCQR fittings, a bootstrap test procedure was also designed to test whether the coefficient functions were actually varying. Finally, a type of investigated real-life dataset was analyzed to illustrate that the CQRE method was much more stable, efficient, and effective for varying coefficient models with response data missing at random than the single QRE method and least squares method.
Author Contributions: All the authors inferred the main conclusions and approved of the current version of this manuscript.

Acknowledgments:
The authors would like to thank the anonymous referees for their valuable comments and suggestions, which actually stimulated this work. The work was supported by the National Natural Science

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The following conditions are needed for the results in Section 3. (C1) {(Y i , X i ) : i = 1, 2, · · · , n} are independent and identically distributed random vectors. (C2) The density function f (·) of ε has a continuous and uniformly-bounded derivative, namely 0 < sup s f (s) < B 0 .
(C3) Matrix E(X T i X i |T = t) is a positive definite matrix, and E(X i |T) = 0. (C4) Random variable U has a second-order differentiable density function f U (u) > 0 in some neighborhood of u.
(C5) The coefficient function β(u) is second-order differentiable in a neighborhood of a given u, and β = 0 is continuous.
(C6) The kernel function K(·) is a symmetric density function with a compact support, whose bandwidth h → 0, nh → ∞ as n → ∞.
(C8) The selection probability function ∆(u) > 0 has a bounded and continuous second derivative on the support of U.
The following lemma is useful for proving some theorems given in Section 3.
Lemma A1 (See Lemma 2 in [15]). Let (Y 1 , X 1 ), (Y 2 , X 2 ), · · · , (Y n , X n ) be independent and identically distributed (i.i.d) random vectors, where the Y i s are scalar random variables. Suppose that E|Y i | 3 < ∞ and sup x |y| s ϕ(x, y)dy < ∞, where f (·, ·) represents the density of (X, Y). Let K(·) be a bounded positive function with a bounded support, satisfying the Lipschitz condition. Then: In what follows, the main theorems in Section 3 will be proven.

Proof of Theorem 1. Let
/h} T with e k a q-dimensional vector with one at the k th position and zero, elsewhere. Since: θ is the minimizer of the criterion: Applying the following identity (see Knight [34]): we can rewrite T n (∆(U i ), θ) as: Then, it follows from Lemma A1 that B n,k (θ) = E(B n,k (θ)) + o p (1). Denote: S n (u) = Diag S n,11 (u) S n,12 (u) S n,21 (u) S n,22 (u) , S n,33 (u) , where: We observe that by the iterative expectation: As in Parzen [35], we have: Based on the above results, we can prove that: where: By Corollary 2 of Knight [34], we have θ P −→ −[S(u)] −1 W n (u). Assume Condition (C3) is satisfied. Then, S(u) = Diag{S 11 (u), S 22 (u), S 33 (u)}. Simple calculation of the block matrix yields: It is easy to verify that E(W n,2 (u)) = 0. As in Parzen [35], we obtain: By the central limit theorem, we getW n,2 (u) Kai et al. (2011) [15], we can show that: By Slutsky's theorem, we obtain: Since: it is easy to obtain: Together with (A2) and (A4), we have: This completes the proof of Theorem 1.
, then we have: Similar to the proof of Theorem 2, we have: where: We can prove: Similar to the proof of Theorem 1, we can complete the proof.