1. Introduction
Variable screening has proved to be a computationally fast and efficient tool for many problems in ultrahigh dimensions. For example, in many scientific areas, such as biological genetics, finance and econometrics, we may collect ultrahigh-dimensional data sets (e.g., biomarkers, financial factors, assets and stocks), where the number p of predictors greatly exceeds the sample size n. Theoretically, ultrahigh dimension often refers to the case where the dimensionality p and the sample size n satisfy log p = O(n^a) for some constant a > 0. Variable screening can reduce the computational cost, avoid algorithmic instability, and improve estimation accuracy; these issues affect variable selection approaches based on LASSO [1], SCAD [2,3] or MCP [4] when they are applied to ultrahigh-dimensional data. Since the seminal work of [5], which introduced the sure independence screening (SIS) procedure, many variable screening approaches have been proposed over the last fifteen years, including model-based methods (e.g., [6,7,8,9,10,11]) and model-free methods [12,13,14,15,16,17,18,19,20]. These papers have shown that, with probability approaching one, the set of selected predictors contains the set of all truly important predictors.
Most marginal approaches focus only on developing effective and robust measures to characterize the marginal association between the response and an individual predictor. However, these methods do not take into account the influence of conditional variables or confounding factors on the response. A direct application of SIS is relatively crude, since SIS may perform poorly when predictors are highly correlated with each other. Some predictors that are marginally weak or irrelevant, but jointly correlated with the response, may be excluded from the final model after marginal screening, which results in a high false negative rate. To overcome this weakness, an iterated screening algorithm or a penalization-based variable selection is usually offered as a refined follow-up step (e.g., [5,10]).
Conditional variable screening can be viewed as an important extension of marginal screening. It accounts for conditional information when calculating the marginal screening utility. Relatively little work on it exists in the literature. To name a few, Ref. [21] proposed a conditional SIS (CIS) procedure to improve the performance of SIS, because conditioning on correlated variables may boost the rank of a marginally weak predictor and reduce the number of false negatives. The paper [22] proposed a confounder-adjusted screening method for high-dimensional censored data, in which additional environmental confounders are regarded as conditional variables. The researchers in [23] studied variable screening that incorporates within-subject correlation for ultrahigh-dimensional longitudinal data, using some baseline variables as conditional variables. Ref. [24] proposed a conditional distance correlation-based screening via the kernel smoothing method, while [25] further presented a screening procedure based on conditional distance correlation, which is similar to [24] in methodology but differs in theory. Additionally, Ref. [11] developed a conditional quantile correlation-based screening approach using the B-spline smoothing technique. However, in [11,24,25], among others, the conditional variable considered is only univariate. Further, Ref. [21] focuses on generalized linear models and cannot handle heavy-tailed data. In this regard, we aim to develop a screener that is more robust to outliers and heavy-tailed data and that simultaneously accommodates more than one conditional variable. Conditional variables can be chosen using prior knowledge, such as published research or the experience of experts in the relevant field. When no prior knowledge is available, one can apply a marginal screening approach, such as SIS or one of its robust variants, to select several top-ranked predictors as conditional variables.
On the other hand, to the best of our knowledge, several works have considered multiple conditional variables based on distinct partial correlations. For instance, Ref. [26] proposed a thresholded partial correlation approach to select significant variables in linear regression models. Additionally, Ref. [17] presented a screening procedure based on the quantile partial correlation (QPC) of [27], referred to as QPC-SIS. More recently, Ref. [28] proposed a copula partial correlation-based screening approach. It is worth noting that the partial correlation used in both [17,28] removes the effect of conditional variables on the response and on each predictor by fitting two parametric models with a linear structure. However, this may be ineffective, especially when the conditional variables have a nonlinear influence on the response. This motivates us to work out a flexible way to control the impact of conditional variables. We also take into account robustness to outlying or heavy-tailed responses in this paper.
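As a toy illustration of why conditioning matters (a sketch only, not part of any cited method), the Python snippet below builds a small noiseless data set in which a predictor x is marginally uncorrelated with the response y, yet has a perfect linear partial correlation with y once a single conditional variable z is controlled for. The helper names `pearson`, `linear_residuals` and `partial_corr`, the data values, and the restriction to one conditional variable with a simple least-squares adjustment are all assumptions made for illustration.

```python
import math


def mean(v):
    return sum(v) / len(v)


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def linear_residuals(target, z):
    """Residuals from a simple least-squares fit of target on one variable z."""
    mt, mz = mean(target), mean(z)
    slope = (sum((a - mz) * (b - mt) for a, b in zip(z, target))
             / sum((a - mz) ** 2 for a in z))
    return [b - mt - slope * (a - mz) for a, b in zip(z, target)]


def partial_corr(y, x, z):
    """Linear partial correlation of y and x given z (hypothetical helper)."""
    return pearson(linear_residuals(y, z), linear_residuals(x, z))


# Toy data (assumed): y = z - x, with z chosen so that cov(y, x) = 0.
x = [1.0, -1.0, 1.0, -1.0]
z = [2.0, 0.0, 0.0, -2.0]
y = [zi - xi for zi, xi in zip(z, x)]

marginal = pearson(y, x)        # marginal screening utility for x
partial = partial_corr(y, x, z)  # utility after adjusting for z

print("marginal correlation:", marginal)
print("partial correlation given z:", partial)
```

A marginal screener would rank x last here, whereas the conditional utility ranks it first. The adjustment in this sketch is linear; the concern raised above is that such an adjustment can fail when z affects the response nonlinearly.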
This paper contributes a robust and flexible conditional variable screening procedure via a partial correlation coefficient, which is a non-trivial extension of [17]. First, in order to control the conditional variables precisely, we propose a nonparametric definition of QPC, which extends that of [17] and allows for more flexibility. Specifically, we fit two nonparametric additive models to remove the effect of the conditional variables on the response and on an individual predictor, using the B-spline smoothing technique to estimate the nonparametric functions. This can be viewed as a nonparametric adjustment for the conditional variables. It yields two sets of residuals, on which a quantile correlation can be calculated to form a nonparametric QPC. Second, we use this quantity as the screening utility in variable screening. The procedure can be implemented rapidly. We refer to it as nonparametric quantile partial correlation-based screening, denoted NQPC-SIS. Third, we establish the sure screening property for NQPC-SIS under some mild conditions. Compared to [17], our approach is more flexible and our theory on the sure screening property is more difficult to derive. Moreover, our screening idea can be easily transferred to existing screening methods that use other popular partial correlations.
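The last step of this construction, computing a quantile correlation from two residual vectors, can be sketched in a few lines of Python. The sketch assumes that the residuals from the two B-spline additive fits are already available, and that the utility takes the usual quantile-correlation form cov{ψ_τ(e_Y), e_X} / sqrt((τ − τ²) var(e_X)) with ψ_τ(w) = τ − 1{w < 0}. The exact definition is the one given in Section 2, which is not reproduced here; the function names and the toy residuals are hypothetical.

```python
import math


def psi(w, tau):
    """Quantile score function psi_tau(w) = tau - 1{w < 0}."""
    return tau - (1.0 if w < 0 else 0.0)


def quantile_corr_from_residuals(y_resid, x_resid, tau):
    """Sample quantile correlation between two residual vectors.

    y_resid: residuals of the response after a tau-th quantile fit on the
             conditional variables (assumed to be computed elsewhere).
    x_resid: residuals of one predictor after a mean fit on the same
             conditional variables (assumed to be computed elsewhere).
    """
    n = len(y_resid)
    x_bar = sum(x_resid) / n
    cov = sum(psi(e, tau) * (x - x_bar) for e, x in zip(y_resid, x_resid)) / n
    var_x = sum((x - x_bar) ** 2 for x in x_resid) / n
    return cov / math.sqrt((tau - tau ** 2) * var_x)


# Toy residuals (assumed values, for illustration only).
e_y = [-2.0, -1.0, 1.0, 2.0]
aligned = quantile_corr_from_residuals(e_y, [-1.0, -1.0, 1.0, 1.0], tau=0.5)
unrelated = quantile_corr_from_residuals(e_y, [1.0, -1.0, 1.0, -1.0], tau=0.5)

print("sign-aligned residuals:", aligned)
print("unrelated residuals:", unrelated)
```

Because the response residuals enter only through the sign-type score ψ_τ, a few extreme values of the response change the utility very little, which is the source of the robustness to heavy tails claimed for the procedure.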
The remainder of the paper is organized as follows. In Section 2, the NQPC-SIS is introduced. The technical conditions are listed and the asymptotic properties are established in Section 3. Section 4 provides an iterative algorithm for further refinement. Numerical studies and an empirical analysis of a real data set are carried out in Section 5. Concluding remarks are given in Section 6. All the proofs of the main results are relegated to Appendix A.
3. Theoretical Properties
To state our theoretical results, we first introduce some notation. Let . Throughout the rest of the paper, for any matrix , we use , , and  and  to stand for the operator norm, the infinity norm as well as the minimum and maximum eigenvalues for a symmetric matrix , respectively. In addition, for any vector ,  denotes the Euclidean norm.
Denote 
 and 
, where 
 is given in Equation (
4) and 
 is given in Equation (
7). Further, we also denote 
, where
      
      where 
, 
 and 
. Before we establish the uniform convergence of 
 to 
, we first investigate the bound of the gap between 
 and 
, which helps in understanding the marginal signal level after applying the B-spline approximation to the population utility. We need the following conditions:
- (B1) We assume that  and  denotes the support of covariate . There exist some positive constants  and  such that for any ,
           
          where d is defined in condition (C1) below.
- (B2) There exist some positive constants  such that
           
          where  and  are given in (4) and (8), respectively.
- (B3) In a neighborhood of , the conditional density of Y given , , is bounded on the support of  and uniformly in j.
- (B4)  for some  and .
Condition (B1) is the approximation error condition for nonparametric functions commonly imposed in the B-spline smoothing literature (e.g., [11,30,31]). Condition (B2) requires the variances 
 and 
 to be uniformly bounded. Condition (B3) implies that there exists a finite constant 
 such that for a small 
, 
 holds uniformly. Condition (B4) guarantees that the marginal signal of active components in model 
 does not vanish. These conditions are similar to those in [
17].
Proposition 1. Under conditions (B1)–(B3), there exists a positive constant  such that
      
In addition, if condition (B4) further holds, then
      
provided that  for some .
To establish the sure screening property, we make the following assumptions:
- (C1)  and  belong to a class of functions , whose rth derivatives  and  exist and are Lipschitz of order ,
           
          for some positive constant K, where  is the support of , r is a non-negative integer and  such that .
- (C2) The joint density of ,  is bounded by two positive numbers  and  satisfying . The density of ,  is bounded away from zero and infinity uniformly in j, that is, there exist two positive constants  and  such that .
- (C3) There exist two positive constants  and , such that  for every j.
- (C4) The conditional density of Y given , , satisfies the Lipschitz condition of first order and  for some positive constants  and  for any y in a neighborhood of  for .
- (C5) There exist some positive constants  and  such that , . Furthermore, assume that  for some constant .
- (C6) There exists some constant  such that .
Condition (C1) is a smoothness assumption on 
 and 
 in the nonparametric B-spline literature ([7,32]). Condition (C3) is a moment constraint on each of the predictors. Conditions (C2), (C4) and (C5) are similar to those imposed in [17]. Condition (C6) ensures that the marginal signal level of the truly active variables is not too weak after the B-spline approximation. The above conditions are standard in the variable screening literature (e.g., [17,28]).
By the properties of normalized B-splines and under conditions (C1) and (C2) (cf. [33,34]), for each 
 and 
, there exist positive constants 
 and 
 independent of 
 such that
      
      and
      
The following lemma bounds the eigenvalues of the B-spline basis matrix from below and from above. This result extends Lemma 3 of [32] from a fixed dimension to a diverging dimension, which may be of independent interest to some readers.
Lemma 1. Suppose that conditions (C1) and (C2) hold. Then we have
      
where  for some constant .
This result reveals that  plays an important role in bounding the eigenvalues of the B-spline basis matrix. When  goes to infinity rapidly, the minimum eigenvalue of the basis matrix decays to zero at an exponential rate. However, for the following result to hold, the divergence rate of  cannot be of polynomial order in n, but can be of order .
Theorem 1. Suppose that conditions (B1)–(B4) and (C1)–(C5) hold and assume that  and  are satisfied.
- (i) For any , there exist some positive constants  such that, for  and sufficiently large n,
      
where  and  is given in Lemma 1.
- (ii) In addition, if condition (C6) is further satisfied, by choosing  with , we have
      
for sufficiently large n, where .
Theorem 1 establishes the sure screening property: all the relevant variables are retained in the final model with probability tending to one. The probability bound in the property is free of 
, but depends on 
 and the number of basis functions 
. Although this ensures that NQPC-SIS retains all important predictors with high probability, noisy variables may also be selected. Ideally, selection consistency can be achieved through the choice of 
, according to Theorem 1, and by setting 
, i.e.,
      
      when 
n is sufficiently large. This property can also be achieved by Theorem 1 and by assuming that 
 for 
. However, this would be too restrictive to check in practice. Similar to [
17], we may assume that 
 for some 
 to control the false selection rate. With this condition, we can obtain the following property to control the size of the selected model.
Theorem 2. Under the conditions of Theorem 1, by choosing  with  and if  for some , then for some positive constant , there exist some constants  such that
      
for sufficiently large n.
This theorem reveals that, after an application of NQPC-SIS, the dimensionality can be reduced from an exponential order to a polynomial order in n while retaining all the important predictors with probability approaching one.
4. Algorithm for NQPC-SIS
To make the NQPC-SIS practically applicable, for each 
, we need to specify the conditional set 
. We note that a sequential test was developed to identify 
 in [
17] via Fisher’s Z-transformation [35] and partial correlation. In this section, we provide a two-stage procedure based on a nonparametric additive quantile regression model, which can be viewed as complementary to [17].
To reduce the computational burden, we first apply the quantile-adaptive model-free feature screening (Qa-SIS) proposed by [
13] to select a subset from 
, denoted by 
 with 
, where 
 is the number of basis functions used in Qa-SIS and 
 denotes the largest integer not exceeding 
a. Second, for each 
, if 
, we set 
, otherwise 
. Thus, 
. Third, we carry out variable selection with the SCAD penalty [2] based on an additive quantile regression model for the data set 
 and then a small reduced subset is obtained, denoted by 
. Such a two-stage procedure can help to find the conditional subset for the 
jth variable and will be incorporated in the following algorithm. With a slight abuse of notation, we use 
 to denote the screening threshold parameter of the NQPC-SIS, in other words, for the NQPC-SIS, we select 
 covariates that correspond to the first 
 largest NQPCs.
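The final selection rule described above, keeping the covariates with the largest NQPCs, can be sketched as follows. This is a minimal illustration under stated assumptions: the utilities are taken as already computed, ranking is done by absolute value (the text only says "largest"), and the function name `select_top_d`, the threshold `d` and the numerical values are hypothetical.

```python
def select_top_d(utilities, d):
    """Return the indices of the d covariates with the largest |utility|.

    utilities: list of screening utilities, one per covariate
               (assumed to be precomputed NQPC estimates).
    d:         screening threshold, i.e. the number of covariates to keep.
    Ties are broken in favour of the smaller index, since sorted() is stable.
    """
    order = sorted(range(len(utilities)),
                   key=lambda j: abs(utilities[j]),
                   reverse=True)
    return order[:d]


# Toy utilities for five covariates (assumed values).
nqpc = [0.05, -0.40, 0.12, 0.33, -0.02]
selected = select_top_d(nqpc, d=2)
print("selected covariate indices:", selected)
```

Ranking by absolute value treats strong negative and strong positive conditional associations alike; if the utility in use is already non-negative, the `abs` call has no effect.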
Algorithm 1 is in the same spirit as the QPCS algorithm of [17], whose authors demonstrated empirically that the QPCS algorithm outperforms their QTCS and QFR algorithms. In the implementation, we choose 
 and 
, although other choices are possible. In our limited simulation experience, this choice works satisfactorily. The values of 
 and 
 cannot be too large, owing to the use of B-spline basis approximations. Theoretically, we need to specify 
 such that 
, while it is sufficient to require 
 practically.
      
Algorithm 1 The implementation of NQPC-SIS.
1: Given , we set a pre-specified number  and an initial set .
2: For ,
   (2a) update ;
   (2b) update , where the variable index  is defined by
                  
3: For ,
   (3a) update ;
   (3b) update , where the variable index  is such that
                  
4: Repeat Step 3 until . The final selected set is denoted as .