Abstract
Considering the influence of conditional variables is crucial in statistical modeling; ignoring them may lead to misleading results. Recently, Ma, Li and Tsai proposed the quantile partial correlation (QPC)-based screening approach, which takes conditional variables into account for ultrahigh dimensional data. In this paper, we propose a nonparametric version of the quantile partial correlation (NQPC), which is able to describe the influence of conditional variables on other relevant variables more flexibly and precisely. Specifically, the NQPC first removes the effect of conditional variables by fitting two nonparametric additive models, which differs from the conventional partial correlation that fits two parametric models, and then computes the QC of the resulting residuals. This measure is very useful when the conditional variables are highly nonlinearly correlated with both the predictors and the response. We then employ the NQPC as the screening utility and propose a variable screening procedure based on it (NQPC-SIS). Theoretically, we prove that NQPC-SIS enjoys the sure screening property: with probability going to one, the selected subset recruits all the truly important predictors under mild conditions. Finally, extensive simulations and an empirical application are carried out to demonstrate the usefulness of our proposal.
Keywords:
ultrahigh dimensional screening; quantile partial correlation; conditional variables; sure screening property
MSC:
62H30; 62J07
1. Introduction
The variable screening technique has been demonstrated to be a computationally fast and efficient tool for solving many problems in ultrahigh dimensions. For example, in many scientific areas, such as biological genetics, finance and econometrics, we may collect ultrahigh dimensional data sets (e.g., biomarkers, financial factors, assets and stocks), where the number of predictors p greatly exceeds the sample size n. Theoretically, ultrahigh dimensionality often refers to the setting where the dimensionality p and the sample size n satisfy log p = O(n^a) for some constant a ∈ (0, 1). Variable screening is able to reduce the computational cost, to avoid the instability of algorithms, and to improve the estimation accuracy. These issues exist in the variable selection approaches based on LASSO [1], SCAD [2,3] or MCP [4] for ultrahigh dimensional data. Since the seminal work of [5], which pioneered the sure independence screening (SIS) procedure, many variable screening approaches have been documented over the last fifteen years, including model-based methods (e.g., [6,7,8,9,10,11]) and model-free methods [12,13,14,15,16,17,18,19,20]. These papers have shown that, with probability approaching one, the set of selected predictors contains the set of all truly important predictors.
Most marginal approaches focus only on developing effective and robust measures to characterize the marginal association between the response and an individual predictor. However, these methods do not take into consideration the influence of conditional variables or confounding factors on the response. A direct application of SIS can be rather crude, since SIS may perform poorly when predictors are highly correlated with each other. Predictors that are marginally weak or irrelevant, but jointly correlated with the response, may be excluded from the final model after applying marginal screening methods, which results in a high false negative rate. To surmount this weakness, an iterated screening algorithm or a penalization-based variable selection is usually offered as a refined follow-up step (e.g., [5,10]).
Conditional variable screening can be viewed as an important extension of marginal screening. It accounts for conditional information when calculating the marginal screening utility. There is relatively little work on it in the literature. To name a few, Ref. [21] proposed a conditional SIS (CIS) procedure to improve the performance of SIS, because conditioning on correlated variables may boost the rank of a marginally weak predictor and reduce the number of false negatives. The paper [22] proposed a confounder-adjusted screening method for high dimensional censored data, in which additional environmental confounders are regarded as conditional variables. The researchers in [23] studied variable screening by incorporating within-subject correlation for ultrahigh dimensional longitudinal data, where they used some baseline variables as conditional variables. Ref. [24] proposed a conditional distance correlation-based screening via a kernel smoothing method, while [25] further presented a screening procedure based on conditional distance correlation, which is similar to [24] in methodology but differs in theory. Additionally, Ref. [11] developed a conditional quantile correlation-based screening approach using the B-spline smoothing technique. However, in [11,24,25], among others, the conditional variable considered is only univariate. Further, Ref. [21] focuses on generalized linear models and cannot handle heavy-tailed data. In this regard, we aim to develop a screener that behaves more robustly to outliers and heavy-tailed data and simultaneously accommodates more than one conditional variable. As for the choice of conditional variables, one can rely on prior knowledge such as published research or the experience of experts in the relevant subjects. When no prior knowledge is available, one can apply some marginal screening approach, such as SIS or one of its robust variants, to select several top-ranked predictors as conditional variables.
On the other hand, to the best of our knowledge, several works have considered multiple conditional variables based on distinct partial correlations. For instance, Ref. [26] proposed a thresholded partial correlation approach to select significant variables in linear regression models. Additionally, Ref. [17] presented a screening procedure based on the quantile partial correlation of [27], referred to as QPC-SIS. More recently, Ref. [28] proposed a copula partial correlation-based screening approach. It is worth noting that the partial correlation used in both [17,28] removes the effect of conditional variables on the response and each predictor by fitting two parametric models with a linear structure. However, this may be ineffective, especially when the conditional variables have a nonlinear influence on the response. This motivates us to work out a flexible way to control the impact of conditional variables. Meanwhile, we also take into account robustness to outlying or heavy-tailed responses in this paper.
This paper contributes a robust and flexible conditional variable screening procedure via a partial correlation coefficient, which is a non-trivial extension of [17]. First, in order to control conditional variables precisely, we propose a nonparametric definition of the QPC, which extends that of [17] and allows for more flexibility. Specifically, we first fit two nonparametric additive models to remove the effect of conditional variables on the response and on an individual predictor, where we use the B-spline smoothing technique to estimate the nonparametric functions. This can be viewed as a nonparametric adjustment for controlling conditional variables. In this way, we obtain two residuals, on which a quantile correlation can be calculated to formulate a nonparametric QPC. Second, we use this quantity as the screening utility in variable screening. This procedure can be implemented rapidly. We refer to it as nonparametric quantile partial correlation-based screening, denoted as NQPC-SIS. Third, we establish the sure screening property for NQPC-SIS under some mild conditions. Compared with [17], our approach is more flexible and our theory on the sure screening property is more difficult to derive. Moreover, our screening idea can easily be transferred to existing screening methods that use other popular partial correlations.
The remainder of the paper is organized as follows. In Section 2, the NQPC-SIS is introduced. The technical conditions needed are listed and asymptotic properties are established in Section 3. Section 4 provides an iterative algorithm for further refinement. Numerical studies and an empirical analysis of a real data set are carried out in Section 5. Concluding remarks are given in Section 6. All the proofs of the main results are relegated to Appendix A.
2. Methodology
2.1. A Preliminary
In this section, we formally introduce the NQPC-SIS procedure. To begin with, we give some background on the quantile correlation (QC) introduced in [27]. Let X and Y be two random variables, and let E(X) denote the expectation of X. The definition of the QC is formulated as
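follows; written out (following the definition in [27], with notation explained below),

$$ \operatorname{qcov}_{\tau}\{Y, X\} = \operatorname{cov}\{I(Y - Q_{\tau,Y} > 0),\, X\} = E\big[\psi_{\tau}(Y - Q_{\tau,Y})\{X - E(X)\}\big], $$

$$ \operatorname{qcor}_{\tau}\{Y, X\} = \frac{\operatorname{qcov}_{\tau}\{Y, X\}}{\sqrt{\operatorname{var}\{\psi_{\tau}(Y - Q_{\tau,Y})\}\,\operatorname{var}(X)}} = \frac{\operatorname{qcov}_{\tau}\{Y, X\}}{\sqrt{(\tau - \tau^{2})\,\operatorname{var}(X)}}, $$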
where Q_{τ,Y} is the τth quantile of Y and ψ_τ(w) = τ − I(w < 0) for some quantile level τ ∈ (0, 1); here, I(·) denotes the indicator function. This correlation takes values between −1 and 1 and, unlike the conventional correlation coefficient, is asymmetric with respect to Y and X. The QC enjoys two merits: monotone invariance with respect to Y and robustness in Y, owing to the use of the quantile rather than the mean in its definition. Thus, the QC is little affected by outliers in Y. Besides, as shown in [27], the QC is closely related to quantile regression: if (a*, b*) denotes the minimizer of E[ρ_τ(Y − a − bX)] with respect to a and b, where ρ_τ(w) = w{τ − I(w < 0)} is the quantile loss, then qcor_τ{Y, X} is a continuous and increasing function of b*, and it equals zero if and only if b* = 0.
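As an illustration, here is a minimal R sketch of the sample QC; the function name qcor() is ours, not from an existing package.

```r
# Sample quantile correlation qcor_tau(Y, X), following the definition in [27].
qcor <- function(y, x, tau = 0.5) {
  psi  <- function(w) tau - as.numeric(w < 0)    # psi_tau(w) = tau - I(w < 0)
  qy   <- quantile(y, probs = tau)               # sample tau-th quantile of Y
  qcov <- mean(psi(y - qy) * (x - mean(x)))      # sample quantile covariance
  qcov / sqrt((tau - tau^2) * var(x))            # normalized quantile correlation
}

set.seed(1)
x <- rnorm(200); y <- 0.8 * x + rt(200, df = 3)  # toy data with heavy-tailed noise
qcor(y, x, tau = 0.5)
```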
When the QC is used as a marginal screening utility for variable screening, the screening results may be misleading when the predictors are highly correlated. To overcome this problem, Ref. [17] proposed screening based on the quantile partial correlation (QPC) to reduce the effect of conditional predictors. For ease of presentation, write X_{-j} = (X_1, …, X_{j−1}, X_{j+1}, …, X_p)^T for j = 1, …, p. The QPC in [17] is defined as
where the two quantities involved are the coefficient vector from the linear quantile regression of Y on X_{-j} and that from the linear regression of X_j on X_{-j}, respectively. When applying the QPC to variable screening, we must estimate these two coefficient vectors in advance. However, for ultrahigh dimensional data, the dimensionality of X_{-j} is p − 1, which can still be much larger than the sample size n. In this situation, it is difficult to obtain the two estimators. On the other hand, it is usually believed that the truly useful conditional variables are relatively few. Thus, it is reasonable in practice to consider a small subset of X_{-j}, called the conditional set, whose size is smaller than n; it can be specified as the set of previously selected variables and the variables related to the jth predictor if there is no prior knowledge about it. As a result, Ref. [17] suggested using the following measure to perform variable screening:
where , and , in which .
From this definition, one can see that the QPC is just the QC between Y and X_j after removing the confounding effects of the conditional variables. Typically, this is done by fitting two parametric regression models: a linear quantile regression of Y on the conditional variables, and a multivariate linear regression of X_j on the conditional variables. Afterwards, the QPC computes the QC of the two residuals obtained from these regression fits. However, in real applications, the parametric models used to dispel the confounding effects may not be adequate, especially when a nonlinear dependence structure between the response and the predictors is present, which is quite common in high-dimensional data. This motivates us to consider a more flexible and efficient approach to control the influence of the confounding/conditional variables.
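To make this parametric adjustment concrete, here is a hedged R sketch of the QPC described above, reusing the qcor() sketch from Section 2.1; the linear model forms follow the description above, but this is an illustration rather than the authors' implementation.

```r
library(quantreg)  # rq() for linear quantile regression

# QPC of (Y, X_j) given the conditional variables in the matrix x_s:
# remove their linear effects, then take the QC of the two residual vectors.
qpc <- function(y, xj, x_s, tau = 0.5) {
  res_y  <- residuals(rq(y ~ x_s, tau = tau))  # linear quantile regression of Y on X_S
  res_xj <- residuals(lm(xj ~ x_s))            # linear mean regression of X_j on X_S
  qcor(res_y, res_xj, tau = tau)               # qcor() as sketched in Section 2.1
}
```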
2.2. Proposed Method: NQPC-SIS
Without loss of generality, we assume that the predictors are standardized and that the response Y is τ-quantile centered, i.e., its τth quantile is zero, which is analogous to centering the response by its mean. Then, we consider the quantile additive model as
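follows (in our notation, with component functions g_j and error ε as described below):

$$ Y = \sum_{j=1}^{p} g_j(X_j) + \varepsilon, \qquad Q_{\tau}(\varepsilon \mid X_1, \ldots, X_p) = 0, $$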
where the error term ε has conditional τth quantile zero given the covariates. This means that the conditional τ-quantile of Y given the covariates is the additive function above. We denote the active set as the set of indices of the nonzero additive components in the true model, which is often assumed to be sparse.
Let |A| denote the cardinality of a set A, and let the additive component functions be smooth functions satisfying some regularity conditions. For identification, we require that each component function has mean zero. With this notation in place, a nonparametric version of the QPC (denoted as NQPC) is formulated as
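the QC of the two additive-model residuals; schematically, in our notation, with g_k and h_{jk} denoting the additive components of Y and of X_j on the conditional variables,

$$ \operatorname{nqpc}_{\tau}\{Y, X_j \mid \text{conditional set}\} \;=\; \operatorname{qcor}_{\tau}\Big\{\, Y - \sum_{k} g_k(X_k), \;\; X_j - \sum_{k} h_{jk}(X_k) \Big\}, $$

where the sums run over the indices in the conditional set.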
where , and . Suppose we have a dataset consisting of n independent copies of , where the dimensionality of is . Let be the sub-vector of indexed by . Then, a sample estimate for the NQPC can be given as
where , and . Since the component functions are unknown nonparametric functions, the above quantities cannot be used, rendering this sample estimate infeasible. In what follows, we estimate each of the unknown functions by making use of a nonparametric B-spline approximation.
To proceed, we denote with by a sequence of normalized and centered B-spline basis functions, where is the number of basis functions. Then, according to the theory of B-spline approximation ([29]), for a generic smoothing function m, there exists a vector such that , where . Therefore, there exist vectors and such that and . Since and , it naturally implies that for . Write , and . Denote by , and , where
and
where indicates within being replaced by for and . Then, it follows that a feasible sample estimate for NQPC is given by
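For illustration, a minimal R sketch of this feasible estimate under our own simplifications (one common number of basis functions q_n per conditional variable; B-spline bases from splines::bs rather than the normalized, centered bases used in the theory):

```r
library(quantreg)  # rq() for quantile regression on the spline-expanded design
library(splines)   # bs() for B-spline basis functions

# Feasible NQPC of (Y, X_j) given the conditional variables in the matrix x_s.
nqpc <- function(y, xj, x_s, tau = 0.5, q_n = 5) {
  # Additive B-spline expansion: q_n basis functions for each conditional variable.
  B <- do.call(cbind, lapply(seq_len(ncol(x_s)), function(k) bs(x_s[, k], df = q_n)))
  res_y  <- residuals(rq(y ~ B, tau = tau))  # additive quantile fit of Y on X_S
  res_xj <- residuals(lm(xj ~ B))            # additive mean fit of X_j on X_S
  qcor(res_y, res_xj, tau = tau)             # QC of the two residuals (Section 2.1 sketch)
}
```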
Next, we employ the above NQPC estimator as a screening utility for variable screening. To this end, we denote the selected active set via the screening procedure to be the set of variables whose maximal absolute sample NQPC is greater than a user-specified threshold value. In other words, we can select an active set of variables by
We name this procedure NQPC-based variable screening, abbreviated as NQPC-SIS. In the next section, we provide some theoretical justification for this approach.
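Screening then amounts to ranking the predictors by the absolute sample NQPC and keeping the top-ranked ones. A schematic sketch, assuming a list cond_sets of conditional sets chosen as discussed in Section 4:

```r
# Rank predictors by |NQPC| and keep the top d; x is an n x p design matrix and
# cond_sets[[j]] holds the indices of the conditional set for the j-th predictor.
nqpc_sis <- function(y, x, cond_sets, tau = 0.5, d = floor(length(y) / log(length(y)))) {
  util <- sapply(seq_len(ncol(x)), function(j)
    abs(nqpc(y, x[, j], x[, cond_sets[[j]], drop = FALSE], tau = tau)))
  order(util, decreasing = TRUE)[seq_len(d)]  # indices of the selected variables
}
```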
3. Theoretical Properties
To state our theoretical results, we first introduce some notation. Let . Throughout the rest of the paper, for any symmetric matrix , we use , , and and to stand for the operator norm, the infinity norm, and the minimum and maximum eigenvalues, respectively. In addition, for any vector , denotes the Euclidean norm.
Denote and , where is given in Equation (4) and is given in Equation (7). Further, we also denote , where
where , and . Before we establish the uniform convergence of to , we first investigate the bound on the gap between and , which is helpful for understanding the marginal signal level after applying the B-spline approximation to the population utility. We need the following conditions:
- (B1)
- We assume that and denotes the support of covariate . There exist some positive constants and such that for any , where d is defined in condition (C1) below.
- (B2)
- (B3)
- In a neighborhood of , the conditional density of Y given , , is bounded on the support of and uniformly in j.
- (B4)
- for some and .
Condition (B1) is the usual approximation error condition for nonparametric functions in the B-spline smoothing literature (e.g., [11,30,31]). Condition (B2) requires the variances and to be uniformly bounded. Condition (B3) implies that there exists a finite constant such that, for a small , holds uniformly. Condition (B4) guarantees that the marginal signal of the active components in the model does not vanish. These conditions are similar to those in [17].
Proposition 1.
Under conditions (B1)–(B3), there exists a positive constant such that
In addition, if condition (B4) further holds, then
provided that for some .
To establish the sure screening property, we make the following assumptions:
- (C1)
- and belong to a class of functions , whose rth derivatives and exist and are Lipschitz of order , for some positive constant K, where is the support of , r is a non-negative integer, and such that .
- (C2)
- The joint density of , is bounded by two positive numbers and satisfying . The density of , is bounded away from zero and infinity uniformly in j, that is, there exist two positive constants and such that .
- (C3)
- There exist two positive constants and , such that for every j.
- (C4)
- The conditional density of Y given , , satisfies the Lipschitz condition of first order and for some positive constants and for any y in a neighborhood of for .
- (C5)
- There exist some positive constants and such that , . Furthermore, assume that for some constant .
- (C6)
- There exists some constant such that .
Condition (C1) is a standard smoothness assumption on the component functions in the nonparametric B-spline literature ([7,32]). Condition (C3) is a moment constraint on each of the predictors. Conditions (C2), (C4) and (C5) are similar to those imposed in [17]. Condition (C6) is assumed to ensure that the marginal signal level of the truly active variables is not too weak after B-spline approximation. The above conditions are standard in the variable screening literature (e.g., [17,28]).
According to the properties of normalized B-splines and under the conditions (C1) and (C2) (c.f., [33,34]), we can obtain the fact that for each and , there exist positive constants and independent of such that
and
The following lemma bounds the eigenvalues of the B-spline basis matrix from below and above. This result extends Lemma 3 of [32] from a fixed dimension to a diverging dimension, which may be of independent interest to some readers.
Lemma 1.
Suppose that conditions (C1) and (C2) hold, then we have
where for some constant .
This result reveals that plays an important role in bounding the eigenvalues of the B-spline basis matrix. When goes to infinity rapidly, the minimum eigenvalue of the basis matrix will decay to zero very quickly at an exponential rate. Consequently, for the following result to hold, the divergence rate of cannot reach a polynomial order of n, but can be of order .
Theorem 1.
Suppose that conditions (B1)–(B5) and (C1)–(C5) hold and assume that and are satisfied.
- (i)
- For any , there exist some positive constants such that, for and sufficiently large n, where and is given in Lemma 1.
- (ii)
- In addition, if condition (C6) is further satisfied, by choosing with , we have, for sufficiently large n, where .
The above establishes the sure screening property: all the relevant variables are recruited into the final model with probability going to one. The probability bound in the property is free of , but depends on and the number of basis functions . Although this ensures that NQPC-SIS retains all important predictors with high probability, noise variables may also be included by NQPC-SIS. Ideally, one would like to exclude them as well; according to Theorem 1, this can be realized by the choice of and by setting , to achieve selection consistency, i.e.,
when n is sufficiently large. This property can also be obtained from Theorem 1 by assuming that for . However, this would be too restrictive to check in practice. Similar to [17], we may instead assume that for some to control the false selection rate. With this condition, we can obtain the following result to control the size of the selected model.
Theorem 2.
Under the conditions of Theorem 1 and by choosing with and if for some , then for some positive constant , there exist some constants such that
for sufficiently large n.
This theorem reveals that, after an application of the NQPC-SIS, the dimensionality can be reduced from an exponential order to a polynomial order of n, while at the same time all the important predictors are retained with probability approaching one.
4. Algorithm for NQPC-SIS
To make the NQPC-SIS practically applicable, for each , we need to specify the conditional set . We note that a sequential test was developed in [17] to identify it via an application of Fisher’s Z-transformation [35] and partial correlation. In this section, we provide a two-stage procedure based on the nonparametric additive quantile regression model, which can be viewed as a complement to [17].
To reduce the computational burden, we first apply the quantile-adaptive model-free feature screening (Qa-SIS) proposed by [13] to select a subset from , denoted by with , where is the number of basis functions used in Qa-SIS and denotes the largest integer not exceeding a. Second, for each , if , we set , otherwise . Thus, . Third, we carry out variable selection with the SCAD penalty [2] based on an additive quantile regression model for the data set, and a small reduced subset is then obtained, denoted by . This two-stage procedure helps to find the conditional subset for the jth variable and is incorporated in the following algorithm. With a slight abuse of notation, we use to denote the screening threshold parameter of the NQPC-SIS; in other words, for the NQPC-SIS, we select the covariates that correspond to the first largest NQPCs.
Algorithm 1 has the same spirit as the QPCS algorithm of [17], who demonstrated empirically that the QPCS algorithm outperforms their QTCS and QFR algorithms. In the implementation, we choose and , which does not exclude other choices. According to our limited simulation experience, this choice works satisfactorily. The values of and that we use cannot be too large, due to the use of B-spline basis approximations. Theoretically, we need to specify such that , while in practice it is sufficient to require .
Algorithm 1. The implementation of NQPC-SIS.
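A schematic R sketch of the two-stage procedure described above; qa_sis() and scad_additive_rq() are hypothetical placeholders (not functions from an existing package) standing for the Qa-SIS screener of [13] and a SCAD-penalized additive quantile regression, and the working-set size is illustrative.

```r
# Schematic sketch of NQPC-SIS with data-driven conditional sets (Algorithm 1 in spirit).
run_nqpc_sis <- function(y, x, tau = 0.5, d = floor(nrow(x) / log(nrow(x)))) {
  n <- length(y); p <- ncol(x)
  # Stage 1: crude marginal screening (Qa-SIS of [13]) to obtain a working set A0.
  A0 <- qa_sis(y, x, tau = tau, size = floor(n / log(n)))           # placeholder
  # Stage 2: refine each conditional set by SCAD-penalized additive quantile regression.
  cond_sets <- lapply(seq_len(p), function(j) {
    S_j <- setdiff(A0, j)                                           # A0 without the j-th variable
    scad_additive_rq(y, x[, S_j, drop = FALSE], tau = tau,          # placeholder
                     indices = S_j)
  })
  # Final step: NQPC-based screening given the selected conditional sets.
  nqpc_sis(y, x, cond_sets, tau = tau, d = d)
}
```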
5. Numerical Studies
5.1. Simulations
In this subsection, we conduct simulation studies to examine the finite sample performance of the proposed NQPC-SIS. To evaluate the performance, we employ three criteria: the minimum model size (MMS), i.e., the smallest number of covariates that contains all the active variables; its robust standard deviation (RSD); and the proportion that all the active variables are selected () with the screening threshold parameter specified as . Throughout this subsection, we adopt the following simulation settings: the sample size , the number of basis functions , and the dimensionality . We simulate the random error from two distributions: and , respectively. Three quantile levels are considered in all situations. For each simulation scenario, all the results are obtained over replications.
Example 1.
Let be a -dimensional random vector having a multivariate normal distribution with mean zero and covariance matrix , where and except that . Generate the response as:
It is easily observed that the marginal Pearson’s correlation between and Y is zero. We take and set to incorporate the quantile information.
Example 2.
We follow the simulation model of [17] and generate the response as
where and are defined as in Example 1 except that such that is uncorrelated with .
Example 3.
We simulate the response from the following nonlinear model:
where . The covariates are simulated from a random-effects model , where s and U are iid . We consider two cases of and , corresponding to and for , respectively.
Example 4.
We consider the same model as in Example 3, with the exception that and are replaced by and , where and is independent of ε, the error in the model of Example 3.
The simulation results for Examples 1–4 are shown in Table 1, Table 2, Table 3 and Table 4, respectively. The results in Table 1 show that when the true relationship between the response and covariates is linear, the SIS, NIS and Qa-SIS methods fail to work. In contrast, both QPC-SIS and NQPC-SIS with work reasonably well, although QPC-SIS slightly outperforms our NQPC-SIS when . This is expected because the QPC is designed for models with a linear relationship among the covariates. A similar observation can be drawn from Table 2 for Example 2, which is also a linear model, apart from the difference that and are independent in Example 2. The results in Table 3 indicate that when the relationship between Y and the covariates is nonlinear while the relationship among the covariates is linear, our proposed NQPC-SIS performs best, followed by QPC-SIS. From Table 4, we can see that when the relationship between Y and the covariates is nonlinear and there is also a nonlinear relationship among the covariates, NQPC-SIS works most satisfactorily and is much better than Qa-SIS and QPC-SIS in terms of both MMS and the selection rate .
Table 1.
Simulation results for Example 1 when n = 200.
Table 2.
Simulation results for Example 2 when .
Table 3.
Simulation results for Example 3 when .
Table 4.
Simulation results for Example 4 when .
In addition, the simulation results of QPC-SIS and NQPC-SIS for Examples 1–4 with and are reported in Table 5. It can be observed from Table 5 that when the sample size increases from 200 to 400, the performance of QPC-SIS and NQPC-SIS improves considerably; QPC-SIS and NQPC-SIS perform very competitively in Examples 1 and 2, while NQPC-SIS performs significantly better than QPC-SIS in Examples 3 and 4. This evidence indicates the effectiveness and usefulness of our NQPC-SIS.
Table 5.
Simulation results for Examples 1 to 4 when and .
As suggested by one anonymous reviewer, we add one more simulation to compare our NQPC-SIS with the following two approaches: (a) QC-SIS, the screening method based on the quantile correlation that simply ignores the effect of conditional variables on the response, and (b) RFQPC-SIS, a procedure very similar to our NQPC-SIS but which removes the effect of conditional variables by fitting random forest models. We examine the performance of these three approaches under and for Examples 1 to 4, where RFQPC-SIS is a variant of the NQPC method implemented with the function randomForest in the R package “randomForest”. Note that RFQPC-SIS requires random forest regressions in the k-th iteration, which is highly computationally intensive. Here, we evaluate NQPC-SIS, QC-SIS and RFQPC-SIS using the effective model size (EMS) and , where the EMS is the average number of true variables contained in the first variables selected over 200 replications. The results are reported in Table 6, showing that our NQPC-SIS still performs best, followed by RFQPC-SIS. Moreover, the computational cost of NQPC-SIS is much lower than that of RFQPC-SIS.
Table 6.
Simulation results for Examples 1 to 4 when and .
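As an illustration of the random-forest adjustment used by the RFQPC-SIS comparator, here is a hedged R sketch in which the additive B-spline fits in nqpc() are replaced by random forest regressions; this is our rendering of the idea, not the exact code used in the study (in particular, the Y-side fit below is a mean regression rather than a quantile fit).

```r
library(randomForest)

# RF-based variant: remove the effect of the conditional variables in x_s via
# random forest regressions, then take the QC of the resulting residuals.
rfqpc <- function(y, xj, x_s, tau = 0.5) {
  x_s   <- as.data.frame(x_s)
  fit_y <- randomForest(x = x_s, y = y)    # RF fit of Y on X_S (out-of-bag predictions below)
  fit_x <- randomForest(x = x_s, y = xj)   # RF fit of X_j on X_S
  qcor(y - predict(fit_y), xj - predict(fit_x), tau = tau)  # qcor() from Section 2.1
}
```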
5.2. An Application to Breast Cancer Data
In this subsection, we apply the proposed NQPC-SIS to the breast cancer data reported by [36], a cancer with a high lethality rate. The data consist of 19,672 gene expression measurements and 2,149 CGH measurements from 89 cancer patient samples, and are available at https://github.com/bnaras/PMA/blob/master/data/breastdata.rda (accessed on 18 June 2021). Our interest here is to detect the genes that have the greatest impact on the comparative genomic hybridization (CGH) measurements. A similar aim was pursued in [25,37]. Following [37], we consider the first principal component of 136 CGH measurements as the response Y and the remaining 18,672 gene probes as the explanatory variables. We implement a two-stage procedure for the sake of comparison, in which a variable screening method is applied in the first stage and a predictive regression model is fitted in the second stage. To this end, we select variables in the first stage using one of the screening methods mentioned in the simulation study: SIS, NIS, Qa-SIS, QPC-SIS or NQPC-SIS. In the second stage, we randomly select 80% of the sample as the training set and the remaining 20% as the test set. We then apply a machine learning method, the regression tree, to the dimension-reduced data to examine the finite sample performance on the test set, using the function M5P in the R package “RWeka”. We use the mean absolute prediction error (MAPE), defined as
as our evaluation index, where is the number of observations in the training set and is the predicted value of Y at the observation in the test set. We repeat the above procedure 500 times and report the mean and standard deviation of the 500 MAPEs in Table 7. According to the results in Table 7, we observe that the NQPC-SIS outperforms the SIS, NIS and Qa-SIS. In particular, our NQPC-SIS produces the lowest prediction error (MAPE) among these methods when , and . Moreover, we also note that the QPC-SIS performs better than our NQPC-SIS at and , but worse than our method at the other three quantile levels. Qa-SIS performs worst among these methods. This evidence supports that the proposed NQPC-SIS works well for this real data set.
Table 7.
Prediction results for the real data on the test set, where the standard deviation is given in the parenthesis.
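A hedged R sketch of one replication of the second-stage evaluation described above (the 80/20 split, the M5P model tree from the "RWeka" package, and the MAPE); the data frame dat, with response y and the screened gene probes as columns, is an assumed input.

```r
library(RWeka)  # provides the M5P model (regression) tree

# One replication: split the screened data, fit M5P on the training part,
# and compute the mean absolute prediction error (MAPE) on the test part.
evaluate_mape <- function(dat) {
  n    <- nrow(dat)
  idx  <- sample(n, size = floor(0.8 * n))   # 80% training / 20% test split
  fit  <- M5P(y ~ ., data = dat[idx, ])      # regression tree on the screened predictors
  pred <- predict(fit, newdata = dat[-idx, ])
  mean(abs(dat$y[-idx] - pred))              # MAPE on the test set
}

# Repeat 500 times and summarize, as in Table 7:
# mape <- replicate(500, evaluate_mape(dat)); c(mean(mape), sd(mape))
```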
6. Concluding Remarks
In this paper, we proposed a nonparametric quantile partial correlation-based variable screening approach (NQPC-SIS), which can be viewed as an extension of the QPC-SIS proposed in [17] from the parametric framework to a nonparametric setting. The proposed NQPC-SIS enjoys the sure independence screening property under mild technical conditions. Furthermore, an algorithm for implementing NQPC-SIS is provided. Extensive numerical experiments, including simulations and a real-world data analysis, are carried out for illustration. The numerical results show that our NQPC-SIS works fairly well, especially when the relationship between variables is highly nonlinear.
Author Contributions
All the authors contributed to formulating the research idea, methodology, theory, algorithm design, result analysis, writing and reviewing the research. Conceptualization, X.X. and H.M.; methodology, X.X.; software, H.M.; validation, H.M.; formal analysis, H.M.; investigation, X.X.; writing—original draft preparation, H.M.; writing—review and editing, X.X.; supervision, X.X. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Fundamental Research Funds for the Central Universities (Grant No. 2021CDJQY-047) and National Natural Science Foundation of China (Grant No. 11801202).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Technical Proofs
Proof of Proposition 1.
First, recalling definitions of and , we can make a simple algebra decomposition:
Due to condition (B1), we can observe
where the cross product is zero due to by condition (B1). This, in conjunction with condition (B1) and the basic inequality that for , gives
Using Cauchy–Schwarz inequality, (A2) and the fact that , we have
For , we note that
where, by Taylor’s expansion,
where is a number between and . Hence, by condition (B1)–(B3) and Cauchy–Schwarz inequality, we can obtain
for some constant .
For , by a similar argument, we can obtain
Using the basic inequality that , we can immediately conclude
where . Thus, we complete the proof. □
Proof of Lemma 1.
Without loss of generality, suppose that . Then, . Let , where with . On one hand, since by Cauchy–Schwarz inequality, we have
This together with the right hand side of (9) implies that
On the other hand, an application of Lemma S.1 of [38] leads to
where for some positive constant and the last line uses the fact that for any . It follows from the result on the left hand side of (9) that
where the second inequality stems from and . This in turn implies that
Lemma A1.
Suppose that condition (C3) holds, then, for all ,
holds uniformly in j.
Lemma A1 is the same as Lemma 1 of [11]. From this, it is easily seen that is finite and bounded by .
Lemma A2
(Bernstein’s inequality, Lemma 2.2.11, [39]). For independent random variables with mean zero and for every , and some constants . Then, for , we have
for .
Lemma A3
(Bernstein’s inequality, Lemma 2.2.9, [39]). For independent random variables with mean zero and bounded range , then
for .
Lemma A4
(Symmetrization, Lemma 2.3.1, [39]). Let be independent random variables with values in and is a class of real valued functions on . Then,
where is a Rademacher sequence (i.e., independent and identically distributed sequence taking values with probability ) independent of , and and .
Lemma A5.
(Contraction theorem, [40]). Let be nonrandom elements of some space and let be a class of real valued functions on . Denote by a Rademacher sequence. Consider Lipschitz functions , that is
Then, for any function , we have
Lemma A6
(Concentration theorem, [41]). Let be independent random variables with values in and let , a class of real valued functions on . We assume that for some positive constants and , . Define , and , then for any ,
Next, we need several lemmas to establish the consistency inequalities for and . Write , , and . Thus and .
Lemma A7.
Under conditions (C1) and (C2),
- (i)
- there exists a constant such that for any ,
- (ii)
- for some positive constant , there exists some positive constant such thatwhere and is defined in Lemma 1; and
- (iii)
- in addition, for any given constant , there exists some positive constant such that
Proof of Lemma A7.
First, consider the proof of part (i). Denote with and . Recalling that , we have and by the inequality (10). By Lemma A3, we have for any ,
Let . It follows from Lemma 5 of [7] that . Besides, it is easy to derive that for any vector , , which implies that
Next, consider the proof of part (ii). Let , where . Employing the result (A10) and taking , we have
for some positive constant . This implies the part (ii).
Last, consider the proof of part (iii). Let and . Obviously, we know that . Using the same arguments as in [7], we can show that for , implies , where . Thus, implies . Hence, using the fact that for any real symmetric invertible matrix , we have
where . This completes the proof. □
Lemma A8.
Under conditions (C1)–(C3), for every and for any given positive constant , there exist some positive constants such that
Proof of Lemma A8.
By the definitions of and and a simple algebra operation, we have
In the following, we need to find the exponential tail probabilities for and , respectively.
We first deal with the first term . Since , we have
Thus, it follows from the triangle inequality and Lemma 1 that
For , it follows that
where and the last inequality holds by applying Lemma A1 and the result in (10). Using the above result, we have
Let , then for any , we have
Therefore, by Lemma A7, it follows that
For , note that is an vector, whose th component is , where and . Let . Then, for every , we have
where we have used the inequality that for , the fact that as well as Lemma A1. It follows from Lemma A2 that for any ,
where and . Employing the union bound of probability and the inequality (A16), we further have
Next, we deal with the second term . Since , we have by Lemma 1. Then, it follows from (A17) that
Using (A21) with , we have
for some positive constant and sufficiently large n, where . Hence, the desired result follows. □
Lemma A9.
Under conditions (C1)–(C5), for any given constant and for every , there exist some positive constants and such that
Proof of Lemma A9.
Write and . By Lemma A.2 of [13], we have, for any ,
Taking in (A22), where C is any given positive constant, we first show that there exists some positive constant such that
To this end, let with . Invoking the Knight’s identity ([42], p121), i.e., , we have
where we have used the result that by the definition of . Note that the right hand side of (A24) equals
for between and . By condition (C4), it follows that
where and . This proves (A23). Hence, by (A22), it reduces to derive that
In what follows, we first consider . Let and then . Note that using the Knight’s identity, we have . So, by using condition (C5), it follows that
and
According to Lemma A3, we have
for some positive constant , provided .
Next, we consider . Define and so . This leads to
Again, using the Knight’s identity, we obtain
where the last line is because . Thus, it follows that
Lemma A10.
Under conditions (C1)–(C5), for every and for any given constant , there exist some positive constants and such that
Proof of Lemma A10.
Since by definition, so . A simple decomposition gives
The rest is to find exponential bounds for the tail probabilities of and , respectively.
For , since , so it follows from the inequality and Lemma A1 that for each ,
Invoking Lemma A2, for any , we have
where and .
For , note that for each ,
where a direct application of Lemma A9 yields . Let with . Denote
Then,
Furthermore, there exists a with and such that
for some positive constant , where we have used condition (C4) in the third line, Cauchy–Schwarz inequality in the fourth line, Lemmas 1 and A1 in the last line. Analogously to (A31), we have for each ,
and it follows from Lemma A2 that for any ,
where and . Setting in (A34), we obtain
Finally, we consider . Denote and define its subdifferential as with
and . Recalling the definition of , there exists such that . This yields
Thus, by condition (C5), it follows that
Using Lemma A8, we obtain
Note that for any . Letting , we thus have
Lemma A11.
Under conditions (C1)–(C5), for every and for any given constant , there exist some positive constants and such that
when n is sufficiently large. In addition, for some ,
Proof of Lemma A11.
Recalling the definition of and , we have
Let . For every , by the inequality and condition (C5), we have with and . Thus, by Lemma A2, it follows
for some positive constant . In addition, it is easily derived that
where . Similarly, applying the arguments used in deriving Lemma A7(ii), we have that for any constant , there exists some finite positive constant such that
This together with Lemma 1 yields
Moreover, employing (A21) with , we have
for some positive constants and . This in conjunction with (A45) gives
for some positive constant . For , let , and then for every , , where and . Thus, it follows from Lemma A2 that
for some positive constant . Note that . This together with (A47) and the union bound of probability gives
Proof of Theorem 1.
(i) We first show the first assertion. Let , , and . Then,
We first show that for some given constant , there exists a positive constant such that
To this end, using the fact that for positive x and y, we have
where the last line uses condition (C5). This together with Lemma A11 implies (A51). Notice that since , we have, for sufficiently large n, there exists a constant such that . Thus,
Accordingly,
where and the last inequality is due to . Moreover, observe that, by the definition of and Lemma A1,
where . So it follows from condition (C5) and (A51) and (A51) that
This together with the union bound of probability proves the first assertion.
(ii) Next, we show the second assertion. By the choice of with and condition (C6), we have
Thus, this completes the proof. □
Proof of Theorem 2.
By assumption, the size of cannot exceed . Thus, it follows that for any , on the set , the size of cannot exceed the size of , which is bounded by . Then, taking and , we have
Therefore, the desired conclusion follows from part (i) of Theorem 1. □
References
- Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
- Zou, H.; Li, R. One-step sparse estimate in nonconcave penalized likelihood models. Ann. Stat. 2008, 36, 1509–1533. [Google Scholar] [PubMed]
- Zhang, C. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942. [Google Scholar] [CrossRef]
- Fan, J.; Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. B 2008, 70, 849–911. [Google Scholar] [CrossRef]
- Cheng, M.; Honda, T.; Li, J.; Peng, H. Nonparametric independence screening and structure identification for ultra-high dimensional longitudinal data. Ann. Stat. 2014, 42, 1819–1849. [Google Scholar] [CrossRef]
- Fan, J.; Feng, Y.; Song, R. Nonparametric independence screening in sparse ultra-high-dimensional additive models. J. Am. Stat. Assoc. 2011, 106, 544–557. [Google Scholar] [CrossRef]
- Fan, J.; Ma, Y.; Dai, W. Nonparametric independent screening in sparse ultra-high dimensional varying coefficient models. J. Am. Stat. Assoc. 2014, 109, 1270–1284. [Google Scholar] [CrossRef]
- Fan, J.; Song, R. Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 2010, 38, 3567–3604. [Google Scholar] [CrossRef]
- Liu, J.; Li, R.; Wu, R. Feature selection for varying coefficient models with ultrahigh-dimensional covariates. J. Am. Stat. Assoc. 2014, 109, 266–274. [Google Scholar] [CrossRef]
- Xia, X.; Li, J.; Fu, B. Conditional quantile correlation learning for ultrahigh dimensional varying coefficient models and its application in survival analysis. Statist. Sinica 2019, 29, 645–669. [Google Scholar] [CrossRef]
- Chang, J.; Tang, C.Y.; Wu, Y. Local independence feature screening for nonparametric and semiparametric models by marginal empirical likelihood. Ann. Stat. 2016, 44, 515–539. [Google Scholar] [CrossRef] [PubMed]
- He, X.; Wang, L.; Hong, H. Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Stat. 2013, 41, 342–369. [Google Scholar] [CrossRef]
- Liu, W.; Ke, Y.; Liu, J.; Li, R. Model-free feature screening and FDR control with knockoff features. J. Am. Stat. Assoc. 2022, 117, 428–443. [Google Scholar] [CrossRef]
- Li, J.; Zheng, Q.; Peng, L.; Huang, Z. Survival impact index and ultrahigh-dimensional model-free screening with survival outcomes. Biometrics 2016, 72, 1145–1154. [Google Scholar] [CrossRef]
- Li, R.; Zhong, W.; Zhu, L. Feature screening via distance correlation learning. J. Am. Stat. Assoc. 2012, 107, 1129–1139. [Google Scholar] [CrossRef]
- Ma, S.; Li, R.; Tsai, C. Variable screening via quantile partial correlation. J. Am. Stat. Assoc. 2017, 112, 650–663. [Google Scholar] [CrossRef]
- Mai, Q.; Zou, H. The fused Kolmogorov filter: A nonparametric model-free screening method. Ann. Stat. 2015, 43, 1471–1497. [Google Scholar] [CrossRef]
- Wu, Y.; Yin, G. Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika 2015, 102, 65–76. [Google Scholar] [CrossRef]
- Zhou, T.; Zhu, L.; Xu, C.; Li, R. Model-free forward screening via cumulative divergence. J. Am. Stat. Assoc. 2020, 115, 1393–1405. [Google Scholar] [CrossRef]
- Barut, E.; Fan, J.; Verhasselt, A. Conditional sure independence screening. J. Am. Stat. Assoc. 2016, 111, 1266–1277. [Google Scholar] [CrossRef] [PubMed]
- Xia, X.; Jiang, B.; Li, J.; Zhang, W. Low-dimensional confounder adjustment and high-dimensional penalized estimation for survival analysis. Lifetime Data Anal. 2016, 22, 549–569. [Google Scholar] [CrossRef] [PubMed]
- Chu, W.; Li, R.; Reimherr, M. Feature screening for time-varying coefficient models with ultrahigh-dimensional longitudinal data. Ann. Appl. Stat. 2016, 10, 596–617. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.; Wang, Q. Model-free feature screening for ultrahigh-dimensional data conditional on some variables. Ann. I. Stat. Math. 2018, 70, 283–301. [Google Scholar] [CrossRef]
- Wen, C.; Pan, W.; Huang, M.; Wang, X. Sure independence screening adjusted for confounding covariates with ultrahigh-dimensional data. Statist. Sinica 2018, 28, 293–317. [Google Scholar]
- Li, R.; Liu, J.; Lou, L. Variable selection via partial correlation. Statist. Sinica 2017, 27, 983–996. [Google Scholar] [CrossRef]
- Li, G.; Li, Y.; Tsai, C.L. Quantile correlations and quantile autoregressive modeling. J. Am. Stat. Assoc. 2015, 110, 246–261. [Google Scholar] [CrossRef]
- Xia, X.; Li, J. Copula-based partial correlation screening: A joint and robust approach. Statist. Sinica 2021, 31, 421–447. [Google Scholar] [CrossRef]
- De Boor, C. A Practical Guide to Splines; Springer: New York, NY, USA, 2001. [Google Scholar]
- Huang, J.Z.; Wu, C.; Zhou, L. Varying-coefficient models and basis function approximation for the analysis of repeated measurements. Biometrika 2002, 89, 111–128. [Google Scholar] [CrossRef]
- Xia, X. Model averaging prediction for nonparametric varying-coefficient models with B-spline smoothing. Stat. Pap. 2022, 62, 2885–2905. [Google Scholar] [CrossRef]
- Huang, J.; Horowitz, J.; Wei, F. Variable selection in nonparametric additive models. Ann. Stat. 2010, 38, 2282–2313. [Google Scholar] [CrossRef] [PubMed]
- Stone, C. Additive regression and other nonparametric models. Ann. Stat. 1985, 13, 689–705. [Google Scholar] [CrossRef]
- Zhou, S.; Shen, X.; Wolfe, D.A. Local asymptotics for regression splines and confidence regions. Ann. Stat. 1998, 26, 1760–1782. [Google Scholar]
- Kalisch, M.; Bühlmann, P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 2007, 8, 613–636. [Google Scholar]
- Chin, K.; DeVries, S.; Fridlyand, J.; Spellman, P.T.; Roydasgupta, R.; Kuo, W.L.; Lapuk, A.; Neve, R.M.; Qian, Z.; Ryder, T.; et al. Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 2006, 10, 529–541. [Google Scholar] [CrossRef]
- Zhou, Y.; Liu, J.; Hao, Z.; Zhu, L. Model-free conditional feature screening with exposure variables. Stat. Its Interface 2019, 12, 239–251. [Google Scholar] [CrossRef]
- Chen, Z.; Fan, J.; Li, R. Error variance estimation in ultrahigh dimensional additive models. J. Am. Stat. Assoc. 2018, 113, 315–327. [Google Scholar] [CrossRef]
- Van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer: New York, NY, USA, 1996. [Google Scholar]
- Ledoux, M.; Talagrand, M. Probability in Banach Spaces: Isoperimetry and Processes; Springer: Berlin, Germany, 1991. [Google Scholar]
- Massart, P. About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 2000, 28, 863–884. [Google Scholar] [CrossRef]
- Koenker, R. Quantile Regression; Cambridge University Press: Cambridge, UK, 2005. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).