Abstract
The proportional subdistribution hazards (PSH) model is widely used to analyze competing risks data. Censored quantile regression provides an important supplement to this model, and variable selection methods are needed because, in practice, many covariates are irrelevant. In this paper, we study variable selection procedures based on penalized weighted quantile regression for competing risks models, which researchers can apply conveniently. Asymptotic properties of the proposed estimators are established, including the consistency and asymptotic normality of the non-penalized estimator and the consistency of variable selection. Monte Carlo simulation studies show that the proposed methods are stable and efficient. Data from a bone marrow transplant (BMT) study are also analyzed to illustrate the application of the proposed procedure.
Keywords:
competing risks; cumulative incidence function; bone marrow transplant; re-distribution method
MSC:
62N02
1. Introduction
In survival analysis, subjects may fail from a specific cause of interest or from other causes, known as competing risks. Consider, for example, the bone marrow transplant (BMT) dataset in [1], which includes 177 patients who received a stem cell transplant for acute leukemia. Of these, 56 patients relapsed (REL), which is the event of interest, and 75 patients died from causes related to the transplant (transplant-related mortality, TRM), which is a competing risk because it precludes the occurrence of leukemia relapse. The remaining 46 patients were censored at the end of the study. In the analysis of such data, treating the competing events (TRM) as censored observations and fitting the usual Cox model may be misleading, because the competing events are likely to depend on the covariates as well. To deal with competing risks data, Ref. [2] proposed a semiparametric proportional hazards model for the subdistribution (the PSH model), which directly models the effect of covariates on the marginal probability function, or cumulative incidence function (CIF). Competing risks data often arise in clinical trials with large numbers of covariates, only a few of which have a substantial influence on the response. This raises the problem of variable selection, addressed, for example, by the general penalized log-partial likelihood method proposed by [3].
Quantile regression, introduced by [4], is widely known to describe the conditional distribution of a response given covariates more comprehensively. Existing work on competing risks quantile regression includes [5], which transforms the competing risks quantile regression model into an accelerated failure time model and uses an estimating equation procedure for estimation. In addition, Ref. [6] discussed quantile regression for competing risks data with a missing cause of failure. Subsequently, [7,8] developed variable selection procedures for competing risks quantile regression models based on unbiased estimating equations, using group structures and penalization methods.
In this paper, instead of the estimating equation method, we develop a more general approach to competing risks quantile regression by extending the weighted procedures based on the re-distribution method [9] to the PSH model setting. Using transformed responses, we rewrite the competing risks quantile formulation as a general quantile regression objective function and then apply the constructed weights. After proving that the subgradient of this weighted objective function is unbiased at the true cumulative incidence function and coefficient, we establish the consistency and asymptotic normality of the penalty-free estimators under regularity conditions. To perform variable selection, penalization methods such as the least absolute shrinkage and selection operator (LASSO) proposed by [10] and the adaptive LASSO (ALASSO) developed by [11] are applied to the weighted objective function, which can be minimized easily with the R package quantreg. The consistency of the variable selection procedure is also established, and Monte Carlo simulations are performed to illustrate the efficiency and stability of the proposed procedures. Finally, the BMT data are analyzed using our methods.
The paper is organized as follows. The proposed weighted competing risks quantile regression model and its penalized versions are developed in Section 2, and their asymptotic properties are established in Section 3. Simulation studies and the application to the BMT data are presented in Section 4 to illustrate the performance of the proposed methods, and Section 5 concludes.
2. Models
We adopt the formulation of competing risks quantile regression in [5]. In the competing risks setting, assume there exist $K$ causes of failure, denoted by an observable indicator $\varepsilon \in \{1, \ldots, K\}$, following the notation of [2]. Without loss of generality, we can set $K = 2$. Let $T$ and $C$ denote the failure and censoring times, respectively; we observe $X = T \wedge C$ and the censoring indicator $\delta = I(T \le C)$, where $I(\cdot)$ is the indicator function. Denote a bounded, time-independent covariate vector by $Z$. Assume that the observations for subjects $i = 1, \ldots, n$ are independent and identically distributed.
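Ref. [2] modeled the CIF for failure from cause 1 conditional on the covariates, $F_1(t \mid Z) = P(T \le t, \varepsilon = 1 \mid Z)$. They proposed the PSH model based on the subdistribution hazard, which is defined in [12] as
$$\lambda_1(t \mid Z) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P\{t \le T \le t + \Delta t, \varepsilon = 1 \mid T \ge t \cup (T \le t \cap \varepsilon \ne 1), Z\} = -\frac{d}{dt} \log\{1 - F_1(t \mid Z)\}.$$
Analogous to the definition of an ordinary quantile, we define the conditional quantile as $Q_k(\tau \mid Z) = \inf\{t : F_k(t \mid Z) \ge \tau\}$, where $F_k(t \mid Z)$ is the CIF for cause $k$; for more details, refer to [5]. For $\tau \in [\tau_L, \tau_U]$, consider $Q_1(\tau \mid Z)$ to be modeled as
$$g\{Q_1(\tau \mid Z)\} = Z^{\top}\beta_0(\tau), \qquad (1)$$
where $\beta_0(\tau)$ is a coefficient vector and $g(\cdot)$ is a known, monotone increasing, continuously differentiable, bounded link function. Following [2], if we denote $T^* = I(\varepsilon = 1) \times T + \{1 - I(\varepsilon = 1)\} \times \infty$, then $T^*$ has a distribution function equal to $F_1(t \mid Z)$ when $t < \infty$ and a point mass at $t = \infty$. Then, for $\tau \in [\tau_L, \tau_U]$, the $\tau$th quantile of $g(T^*)$ equals $Z^{\top}\beta_0(\tau)$ under model (1).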
Remark 1.
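According to the formulation of $T^*$, when $\tau > F_1(\infty \mid Z)$, the $\tau$th quantile of $T^*$ becomes $\infty$. This is evident from the definition $Q_1(\tau \mid Z) = \inf\{t : F_1(t \mid Z) \ge \tau\}$ and the fact that $F_1(t \mid Z)$ is monotone increasing in $t$, so it never exceeds its limit $F_1(\infty \mid Z)$. This fact guides the choice of $\tau_U$.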
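As a minimal illustration of Remark 1 (our own sketch, not part of the proposed procedure), the Python code below assumes uncensored data, forms the transformed time $T^*$ by keeping $T$ for a cause-1 failure and moving competing failures to $+\infty$, and computes the empirical quantile $\inf\{t : F_n(t) \ge \tau\}$. The helper names `empirical_cif` and `quantile_t_star` are hypothetical. The quantile is finite for $\tau$ below the empirical plateau $F_1(\infty)$ and infinite above it, which is why the upper quantile level must stay below that plateau.

```python
import math

def empirical_cif(times, causes, t, cause=1):
    """Empirical F_k(t) = P(T <= t, eps = k) for uncensored data."""
    n = len(times)
    return sum(1 for ti, ei in zip(times, causes) if ei == cause and ti <= t) / n

def quantile_t_star(times, causes, tau):
    """Empirical tau-th quantile of T* = T if eps == 1, else +infinity.

    Uses Q(tau) = inf{t : F_n(t) >= tau}, i.e. the ceil(n * tau)-th order statistic.
    """
    # Competing failures are moved to +infinity, creating a point mass there.
    t_star = sorted(t if e == 1 else math.inf for t, e in zip(times, causes))
    n = len(t_star)
    # Small offset guards against floating-point error in n * tau (e.g. 0.3 * 10).
    k = max(1, math.ceil(n * tau - 1e-9))
    return t_star[k - 1]

times = [2.0, 5.0, 1.0, 4.0, 3.0]
causes = [1, 2, 1, 2, 1]
print(empirical_cif(times, causes, math.inf))  # plateau F_1(inf) = 0.6
print(quantile_t_star(times, causes, 0.5))     # 3.0: tau below the plateau
print(quantile_t_star(times, causes, 0.7))     # inf: tau above the plateau
```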
With reference to [13], for a proper $\tau$, $\beta_0(\tau)$ is supposed to be the minimizer of the following expected loss function with respect to $\beta$:
where $E$ denotes the expectation and $\rho_\tau(u) = u\{\tau - I(u < 0)\}$ is called the “check” function.
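To make the check function concrete, the sketch below (illustrative Python with hypothetical helper names) evaluates $\rho_\tau(u) = u\{\tau - I(u<0)\}$ and minimizes $\sum_i \rho_\tau(y_i - c)$ over a constant $c$. Because this objective is convex and piecewise linear with kinks at the data points, scanning the observed values suffices, and the minimizer found is a sample $\tau$th quantile.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile_by_check_loss(y, tau):
    """Minimize sum_i rho_tau(y_i - c) over c.

    The objective is convex and piecewise linear with kinks at the data points,
    so it suffices to search over the observed values.
    """
    return min(sorted(y), key=lambda c: sum(check_loss(yi - c, tau) for yi in y))

y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_quantile_by_check_loss(y, 0.5))  # 3.0, the sample median
print(sample_quantile_by_check_loss(y, 0.3))  # 2.0, the ceil(0.3 * 5)-th order statistic
```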
Given a sample, we can obtain the estimator of $\beta_0(\tau)$ by minimizing the following objective function:
2.1. Weighted Competing Risks Quantile Regression
Similarly to [5], we first consider the case with no missing data (i.e., no censoring). As a result, $X_i = T_i$ and $\delta_i = 1$. As mentioned above, we can estimate $\beta_0(\tau)$ via the minimization problem (3). Because $T_i^*$ is not observed, we modify (3) to
where is any value sufficiently large to exceed all . Then, it is not difficult to derive the negative subgradient of (4) with respect to .
For the censored case, we aim to construct a weighted quantile objective function to estimate $\beta_0(\tau)$ as follows:
Analogously to [14], we construct the weight function for competing risks as follows:
Remark 2.
In competing risks quantile regression, each point contributes to the subgradient condition only through the sign of . For data with , we know , and can be observed; thus, we assign a weight of 1 in this case. For data with and , either or ; in the first scenario, ; in the second scenario, , and we assign a weight of 0. The ambiguous situation is and , i.e., . If , then or ; if , i.e., . However, cannot be observed.
Thus, we assign the weight for this case, where given ,
We can show that a subgradient of the weighted quantile objective function (5) with respect to
is an unbiased estimating function of .
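The re-distribution idea of [9] can be sketched as follows. We assume that the weighted objective takes the usual re-distribution form: each observation keeps mass $w_i$ at its transformed observed time and moves the remaining mass $1 - w_i$ to a large surrogate value standing in for $+\infty$. The weights are supplied by the user (1 for an observed cause-1 failure, 0 for an observed competing failure, and a value in $(0, 1)$ for an ambiguous censored case, mirroring Remark 2). The Python sketch fits only an intercept so that the convex, piecewise-linear objective can be minimized by scanning its kinks; with covariates, a weighted linear-programming quantile regression solver would be needed instead. All names (`redistributed_objective`, `fit_intercept`, `y_inf`) are hypothetical.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def redistributed_objective(c, y, w, tau, y_inf):
    """Assumed re-distribution form: each observation keeps mass w_i at y_i
    and moves mass 1 - w_i to the large surrogate value y_inf."""
    total = 0.0
    for yi, wi in zip(y, w):
        total += wi * check_loss(yi - c, tau)
        total += (1.0 - wi) * check_loss(y_inf - c, tau)
    return total

def fit_intercept(y, w, tau, y_inf):
    """Minimize the objective over an intercept c (no covariates).

    With weights in [0, 1] the objective is convex and piecewise linear with
    kinks at the y_i and at y_inf, so a minimizer is found among these points.
    """
    candidates = sorted(set(y) | {y_inf})
    return min(candidates, key=lambda c: redistributed_objective(c, y, w, tau, y_inf))

# y: transformed observed times g(X_i); w: hypothetical weights
# (1 = observed cause-1 failure, 0 = observed competing failure,
#  0.5 = an ambiguous censored case with an illustrative estimated weight).
y = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 0.0, 0.5]
print(fit_intercept(y, w, tau=0.4, y_inf=100.0))  # 2.0
```

When every weight is 0 (only competing failures), all mass sits at the surrogate value and the fitted quantile is `y_inf`, the finite-sample analogue of the infinite quantile in Remark 1.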
Although the unbiasedness of (8) is proved with in , the underlying distribution or is unknown in practice. Here, we use the inverse probability of censoring weighted (IPCW) estimator [15] proposed by [5] to estimate ,
where is the survival function of C given Z, which can be estimated semiparametrically or nonparametrically. Here, for simplicity and as in [2], we assume that C is independent of , so that the Kaplan–Meier estimator [16] can be used. This computationally convenient estimator (9) performs quite well in our simulations, and it could be further improved by combining it with more efficient estimators of .
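For the simple case described above, in which the censoring time is assumed independent and the Kaplan–Meier estimator is used, the following Python sketch computes the Kaplan–Meier estimate of the censoring survival function $G(t) = P(C > t)$ (returned as its left limit $G(t-)$) and a covariate-free IPCW estimate of the cause-$k$ CIF, $\hat F_k(t) = n^{-1}\sum_i I(X_i \le t, \delta_i = 1, \varepsilon_i = k)/\hat G(X_i-)$. This marginal estimator is a simplified stand-in for illustration only; estimator (9) of [5] conditions on $Z$. Function names are hypothetical, and tied failure and censoring times follow the convention noted in the comments.

```python
def km_censoring_left_limit(times, delta):
    """Kaplan-Meier estimate of G(t) = P(C > t), returned as the function t -> G(t-).

    Censorings (delta == 0) play the role of events; failures of either cause
    only enter through the risk set (failures tied with censorings stay at risk).
    """
    data = sorted(zip(times, delta))
    n = len(data)
    jumps = []  # (time, value of G right after that time)
    surv, i = 1.0, 0
    while i < n:
        t, j, n_cens = data[i][0], i, 0
        while j < n and data[j][0] == t:  # process all subjects tied at time t
            n_cens += 1 - data[j][1]
            j += 1
        if n_cens:
            surv *= 1.0 - n_cens / (n - i)  # n - i subjects are still at risk at t
            jumps.append((t, surv))
        i = j

    def g_left(t):
        value = 1.0
        for s, v in jumps:
            if s >= t:  # only jumps strictly before t count for the left limit
                break
            value = v
        return value

    return g_left

def ipcw_cif(times, delta, cause, t, k=1):
    """Covariate-free IPCW estimate of F_k(t) = P(T <= t, eps = k)."""
    g_left = km_censoring_left_limit(times, delta)
    total = sum(1.0 / g_left(ti)
                for ti, di, ei in zip(times, delta, cause)
                if di == 1 and ei == k and ti <= t)
    return total / len(times)

times = [1.0, 2.0, 3.0, 4.0, 5.0]
delta = [1, 0, 1, 1, 0]  # 0 = censored
cause = [1, 0, 2, 1, 0]  # cause is meaningful only when delta == 1
print(ipcw_cif(times, delta, cause, t=4.0))  # 7/15, about 0.4667
```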
2.2. Variable Selection Procedure
To select important variables, a penalty function is added to the weighted objective function (11) to obtain the penalized estimator :
where can be LASSO, adaptive LASSO, and so on.
For the LASSO and ALASSO penalties, we can write , where is the jth element of an initial consistent unpenalized estimator. We choose for the LASSO and for the ALASSO. The minimizations of (11) and (12) can be carried out directly with the R package quantreg, without implementing a linear programming solver by hand, which makes the proposed methods convenient to apply.
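One standard way to see why an $L_1$-penalized weighted quantile objective can be handed to an ordinary weighted quantile regression solver (such as the one in quantreg) is the identity $\rho_\tau(u) + \rho_\tau(-u) = |u|$: adding, for each coefficient $j$, two pseudo-observations with response 0, design rows $\pm\lambda_j e_j$, and weight 1 reproduces the penalty $\lambda_j|\beta_j|$. The Python sketch below checks this numerically. The penalty weights follow the usual convention $\lambda_j = \lambda$ for the LASSO and $\lambda_j = \lambda/|\tilde\beta_j|^{\gamma}$ for the adaptive LASSO, with $\gamma = 1$ as an assumed default; the function names are hypothetical, and this is an illustration rather than necessarily how (11) and (12) were implemented.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def penalty_weights(beta_init, lam, adaptive=True, gamma=1.0):
    """LASSO: lambda_j = lam.  Adaptive LASSO: lambda_j = lam / |beta_init_j|**gamma.
    In the adaptive case beta_init must contain no exact zeros."""
    if not adaptive:
        return [lam] * len(beta_init)
    return [lam / abs(b) ** gamma for b in beta_init]

def penalized_objective(beta, X, y, w, tau, lambdas):
    """Weighted check loss plus a weighted L1 penalty."""
    loss = sum(wi * check_loss(yi - sum(x * b for x, b in zip(xi, beta)), tau)
               for xi, yi, wi in zip(X, y, w))
    return loss + sum(lj * abs(bj) for lj, bj in zip(lambdas, beta))

def augmented_objective(beta, X, y, w, tau, lambdas):
    """Unpenalized weighted check loss on data with 2p pseudo-observations.

    Rows +/- lambda_j * e_j with response 0 and weight 1 contribute
    rho_tau(-lambda_j beta_j) + rho_tau(lambda_j beta_j) = lambda_j |beta_j|.
    """
    p = len(beta)
    X_aug, y_aug, w_aug = list(X), list(y), list(w)
    for j, lj in enumerate(lambdas):
        for sign in (1.0, -1.0):
            row = [0.0] * p
            row[j] = sign * lj
            X_aug.append(row)
            y_aug.append(0.0)
            w_aug.append(1.0)
    return penalized_objective(beta, X_aug, y_aug, w_aug, tau, [0.0] * p)

X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [1.0, 0.0, 3.0]
w = [1.0, 0.5, 1.0]
lambdas = penalty_weights([0.5, -2.0], lam=0.4)  # adaptive weights 0.4/0.5 and 0.4/2.0
beta = [0.2, 1.1]
print(penalized_objective(beta, X, y, w, 0.3, lambdas))
print(augmented_objective(beta, X, y, w, 0.3, lambdas))
```

To leave an intercept unpenalized, its entry in `lambdas` can be set to 0; the corresponding pseudo-rows are then all zeros and contribute nothing.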
3. Theoretical Property
To establish the asymptotic results in this paper, we require the following assumptions:
- A1
- The covariate Z is bounded in probability. There exists a constant such that , and is a positive definite matrix.
- A2
- The functions and have first derivatives with respect to t, denoted as and , which are uniformly bounded away from infinity. Additionally, and have bounded (uniformly in t) second-order partial derivatives with respect to Z.
- A3
- For in the neighborhood of , and are positive definite.
Assumption A1 states tail and moment conditions on the covariate Z, which are standard in quantile regression. Assumption A2 is needed for the local Kaplan–Meier estimator; it allows us to obtain local expansions of and in the neighborhood of , and hence the uniform consistency and the linear representation of . Assumption A3 ensures that the expectation of the estimating function has a unique zero at , and it is needed to establish the asymptotic distribution of .
- C1
- There exists such that and =0.
- C2
- is Lipschitz continuous for
- C3
- a.s.
Assumptions C1 and C2 are regularity conditions for competing risks quantile regression. Assumption C3 is easily satisfied in the competing risks setting; otherwise, the model would reduce to a standard Cox model.
Theorem 1.
Assume that triples constitute an i.i.d. multivariate random sample and that the censoring variable is independent of conditionally on the covariate . Under model (1) and assumptions A1–A3, C1–C3,
in probability as $n \to \infty$.
Theorem 2.
Theorems 1 and 2 establish the consistency and asymptotic normality of the unpenalized estimator . We next establish the selection consistency of the proposed penalized estimator . Let and .
Theorem 3.
If A1–A3, C1–C3 hold, and if and , then
Theorem 3 states that the proposed procedure selects the correct model with probability approaching one. By the remark following Theorem 2 of [14], the proposed estimators also satisfy the oracle properties.
The proofs are presented in Appendix A.
4. Numerical Studies
4.1. Monte Carlo Simulation
We conduct Monte Carlo simulations to evaluate the performance of the proposed methods, generating data as in [5] but with a higher-dimensional covariate vector.
We generate satisfying , , and , where denotes the standard normal distribution function, , and are true parameters in the model above. Set while Then
where is the jth component of covariate Z, and is the jth component of . Then the corresponding true coefficient in model (1) is
Thus the true number of non-zero coefficients is 5 for and 4 for due to .
In the simulations, we set the number of irrelevant predictors to and the sample size to . For the covariance matrix of the covariates, we consider , where .
We generate the covariate vector as follows: Unif(0,1) and Bernoulli(0.5), , . For each scenario, the simulation is repeated 500 times. The average censoring rate is 36%.
We use the following criteria to evaluate performance: the true positive rate (TPr), defined as the ratio of the number of relevant variables correctly selected to the true number of relevant variables; the false positive rate (FPr), defined as the ratio of the number of irrelevant variables incorrectly selected to the true number of irrelevant variables; the absolute error P1; and the squared error P2. The closer TPr is to 1 and FPr is to 0, the better. Because both TPr and FPr range from 0 to 1, we present them together in Figure 1 for comparison.
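The criteria above can be computed per replication as in the following Python sketch. We assume that P1 and P2 are the $L_1$ and squared $L_2$ distances between the estimated and true coefficient vectors, and that a coefficient counts as selected when its absolute value exceeds a small tolerance `tol`; both are assumptions, as is the hypothetical name `selection_metrics`.

```python
def selection_metrics(beta_hat, beta_true, tol=1e-8):
    """Return (TPr, FPr, P1, P2) for one replication.

    A coefficient counts as selected when |beta_hat_j| > tol.
    P1 = sum_j |beta_hat_j - beta_j|, P2 = sum_j (beta_hat_j - beta_j)^2 (assumed).
    """
    selected = {j for j, b in enumerate(beta_hat) if abs(b) > tol}
    relevant = [j for j, b in enumerate(beta_true) if b != 0]
    irrelevant = [j for j, b in enumerate(beta_true) if b == 0]
    tpr = sum(j in selected for j in relevant) / len(relevant)
    fpr = sum(j in selected for j in irrelevant) / len(irrelevant) if irrelevant else 0.0
    p1 = sum(abs(a - b) for a, b in zip(beta_hat, beta_true))
    p2 = sum((a - b) ** 2 for a, b in zip(beta_hat, beta_true))
    return tpr, fpr, p1, p2

# One relevant variable kept, one missed, one irrelevant variable wrongly selected.
print(selection_metrics([0.9, 0.0, 0.2, 0.0], [1.0, -0.5, 0.0, 0.0]))
```

Averaging these tuples over the replications gives the curves summarized in the figures.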
Figure 1.
Case . Comparison of TPr and FPr for four levels of . In each subplot, the Y axis reports the TPr and FPr values at different . The solid line is TPr and the dashed line is FPr. Four colors are used to represent the methods: red for cqr, green for wcqr0, blue for wcqr1, and purple for wcqr2. Light colors represent the LASSO penalty and dark colors the ALASSO penalty for all methods.
We compare our proposed weighted estimators, denoted wcqr, with the estimator of the competing risks quantile regression model proposed in [8], denoted cqr. In the simulation tables, we use cqr.l and cqr.a to denote the cqr estimators with the LASSO and ALASSO penalties, respectively. Similarly, our estimators are denoted wcqri.l and wcqri.a, where i = 0 corresponds to administrative censoring, in which C is known, and i = 1 to random right censoring, in which X is used in place of C; wcqr2 uses a different weight:
Because the weight above involves the estimation of , which is likely to be complicated in practice, we use it only for comparison in the simulations. Here, in wcqr2, we apply an estimation method similar to that of to .
Although our theoretical results are not based on these two estimators of , most simulation results show that wcqr0 and wcqr1 are very close, since their weights differ only at , suggesting good estimation even at high censoring rates. Competing risks data with very large numbers of censored observations will be studied in future work.
Before variable selection, we also conduct simulations for the unpenalized estimators. In this case, we use , , and . We repeat the simulation 1000 times and compare the empirical bias (EmpBias) and the average coverage probabilities of 95% confidence intervals computed with the empirical variance. The results are summarized in Table 1: at lower quantiles, the cqr method performs very well, whereas at high quantiles it displays some instability. The weighted methods, although inferior to cqr at lower quantiles, still perform well in most settings, especially wcqr1 and wcqr2. The average coverage probabilities display similar patterns; cqr performs well up to . At relatively high quantiles such as , wcqr2 performs best for most coefficients.
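The EmpBias and coverage summaries can be sketched for a single coefficient across replications as follows. We assume that each replication's 95% interval is the estimate plus or minus $1.96\,\widehat{\mathrm{sd}}$, where $\widehat{\mathrm{sd}}$ is the empirical standard deviation of the estimates over all replications; this reading of "confidence intervals computed with empirical variance" and the function name `bias_and_coverage` are assumptions.

```python
import statistics

def bias_and_coverage(estimates, true_value, z=1.96):
    """Empirical bias and coverage of nominal 95% Wald intervals.

    Each replication's interval is estimate +/- z * SD, where SD is the empirical
    standard deviation of the estimates across all replications.
    """
    sd = statistics.stdev(estimates)
    bias = statistics.mean(estimates) - true_value
    coverage = sum(abs(e - true_value) <= z * sd for e in estimates) / len(estimates)
    return bias, coverage

# Toy replications: four on-target estimates and one outlier.
print(bias_and_coverage([1.0, 1.0, 1.0, 1.0, 5.0], true_value=1.0))
```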
Table 1.
Bias and empirical coverage; .
With moderate dimensions of covariates (), Figure 1 presents the TPr and FPr values evaluated for and and four s for , respectively. Generally speaking, almost all selection performances decline as increases, and, with higher TPr and lower FPr, the ALASSO penalized methods are overall superior to the LASSO methods. Specifically, at quantiles lower than 0.4, with the ALASSO penalty, both cqr and wcqr identify the important variables well, with TPr close to 1. Compared with the ALASSO methods, the LASSO methods have higher FPr values and tend to select many more irrelevant variables. In Figure 1, the wcqr estimators perform comparably to the cqr estimators, with high TPr and low FPr at moderate . At higher quantiles, although slightly inferior to the cqr estimators in TPr, the wcqr estimators keep very low FPr values while the FPr of the cqr estimators increases rapidly; that is, the wcqr estimators retain a strong ability to drop irrelevant variables and select the correct ones even when the cqr estimators almost fail at particularly high quantiles. In all simulations, the wcqr estimators perform quite stably at higher quantiles and in higher dimensions. The decline in performance with increasing can be explained by a higher approaching the probability , which induces larger biases. In addition, TPr decreases only slightly as increases, except for cqr.l, which decreases substantially, because identification becomes more difficult as the correlation among covariates increases. Even when , the wcqr estimators with ALASSO perform quite well in the simulations.
Figure 2 shows P1 and P2 for the eight methods; the values for the cqr estimators are too large to be displayed in the plot. In contrast, the errors of the wcqr estimators decrease steadily from 0.1 to about 0.27 and increase from 0.3 to 0.6. This can be explained as follows: at low quantiles, few ambiguous cases are used for estimation, so the available information is used insufficiently, whereas at high quantiles, where more ambiguous observations are weighted, the accuracy of the weights affects the estimation performance. Improving the estimation of is left for future investigation.
Figure 2.
Case . Comparison of P1 and P2 for four levels of . In each subplot, the Y axis reports the P1 and P2 values at different . The solid line is P1 and the dashed line is P2. Four colors are used to represent the methods: red for cqr, green for wcqr0, blue for wcqr1, and purple for wcqr2. Light colors represent the LASSO penalty and dark colors are the ALASSO penalty for all methods.
We also present other simulation results in Figures S1–S11 in the supplementary material. Figures S1 and S2 show TPr, FPr, P1, and P2 for and 50, respectively. In Figure S1, the TPr of the wcqr estimators with ALASSO is above 0.9 except at very high quantiles, indicating stability, with lower FPr than the cqr estimators; in Figure S2, the same tendency remains but the selection performance is inferior, although TPr still stays above 0.8 at . We also consider the case , in which the number of predictors exceeds the sample size. In this case, the cqr estimators fail because of a singular design matrix, as does the ALASSO estimator. Surprisingly, our wcqr estimators still work and perform quite well, as illustrated in Figure S3. Numerical studies for and are also discussed in the supplementary material and illustrated in Figures S4–S11. For the covariance matrix, we consider another setup: . We also set and to 0.6 and 0.45, respectively, to test the performance under a different probability of . In addition, we simulate heavy-tailed distributions instead of the Gaussian distribution for . Figures S4–S7 show that the ALASSO penalty significantly decreases the FP for both estimators, which suggests the superiority of ALASSO. Our estimators perform fairly close to the cqr estimator in most cases. Although the TP of our estimators is slightly worse, their FP is relatively better. Not only does wcqr show smaller deviations in the estimated coefficients, but it is also very stable, especially with the ALASSO penalty, at higher quantiles . This indicates that our estimators are useful at high quantiles. Figure S8 presents the case , where the performance on all criteria improves greatly.
Figure S9 shows the performance for , which is slightly better than that for . Figure S10 is for a different pair of , with ranging from 0 to 0.4, and turns out to be the quantile with 3 nonzero coefficients, which agrees with our simulation results. Figure S11 uses the t(3) distribution in place of the standard normal distribution and shows that our estimators also perform well for heavy-tailed distributions.
To conclude, the wcqr estimators perform comparably to the cqr estimator, with slightly worse TP but better FP. Notably, the superior performance of the wcqr estimators under higher correlations, at higher quantiles, and with heavy-tailed distributions suggests good potential for application to more complex data and higher quantiles.
4.2. Real Data Analysis
In this subsection, we use the BMT dataset in [1] for a practical application. As the simulations illustrate, the wcqr estimators are more stable than the existing cqr estimators with complex data and at high quantiles, which motivates us to analyze these data with our methods.
In this dataset, a total of 177 patients received a stem cell transplant for acute leukemia. The failure event of interest is relapse (REL, 56 patients), and death from causes related to the transplant (transplant-related mortality, TRM, 75 patients) is the competing risk. Forty-six patients are censored; thus, the censoring rate is 26%. Covariates that affect REL and TRM include sex, disease (lymphoblastic or myeloblastic leukemia), phase at transplant (Relapse, CR1, CR2, CR3), source of stem cells (bone marrow and peripheral blood, coded as BM+PB, or peripheral blood, coded as PB), and age. The link function is assumed to be exponential.
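A minimal Python sketch of the covariate coding and the censoring-rate arithmetic is given below. The dictionary keys (`age`, `sex`, `disease`, `phase`, `source`) are hypothetical field names, and the reference levels (male, ALL, Relapse, BM+PB) are inferred from the coefficient labels sex:F, D:AML, phase:CR1 to phase:CR3, and source:PB used in Figures 3–5 rather than stated in the text.

```python
def encode_bmt_row(row):
    """Dummy-code one subject's covariates (reference levels are assumptions)."""
    return {
        "age": float(row["age"]),
        "sex:F": float(row["sex"] == "F"),
        "D:AML": float(row["disease"] == "AML"),
        "phase:CR1": float(row["phase"] == "CR1"),
        "phase:CR2": float(row["phase"] == "CR2"),
        "phase:CR3": float(row["phase"] == "CR3"),
        "source:PB": float(row["source"] == "PB"),
    }

n_total, n_rel, n_trm, n_cens = 177, 56, 75, 46
assert n_rel + n_trm + n_cens == n_total  # every patient is REL, TRM, or censored
print(round(n_cens / n_total, 2))  # 0.26, the 26% censoring rate

example = {"age": 40, "sex": "F", "disease": "AML", "phase": "CR2", "source": "PB"}
print(encode_bmt_row(example))
```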
Figure 3, Figure 4, and Figure 5 report the numbers of selected variables and the coefficient estimates from our weighted estimators, compared with the penalized quantile estimating equations proposed by [8] and the penalty-free methods, with ranging from 0 to .
Figure 3.
Variable selection and estimation results for intercept and . The Y axis reports the coefficient values at different . Various colors of lines represent eight methods.
Figure 4.
Estimation results for , , and . The Y axis reports the coefficient values at different . Various colors of lines represent eight methods.
Figure 5.
Estimation results for , and . The Y axis reports the coefficient values at different . Various colors of lines represent eight methods.
From the figures, we can see that our estimators mainly select numbers of variables similar to those of the cqr estimators at lower quantiles, whereas at higher quantiles the wcqr estimators lie between cqr-LASSO and cqr-ALASSO. For the intercept, at lower quantiles, five estimators appear to coincide with one another, although the cqr-ALASSO estimators tend to be unstable, whereas the wcqr estimators are stable here. For age, all estimators regard this variable as unimportant, except the two LASSO estimators, which probably overestimate its importance. For sex:F, almost all estimators shrink the corresponding coefficients to zero. The ALASSO estimators tend to treat D:AML as an unimportant variable, except at quantiles around 0.1. For phase:CR1 and phase:CR2, all estimators tend to select both at lower quantiles, but wcqr tends to select phase:CR1 at quantiles larger than 0.21 while dropping phase:CR2 between 0.22 and 0.27. For phase:CR3, all estimators behave similarly, with slight shifts. For source:PB, the wcqr estimators are more stable than cqr at all quantiles. The estimates of based on the five methods are shown in Figure 6.
Figure 6.
Estimation of . The Y axis reports the estimated values of at different t. Various colors of lines represent eight methods.
To conclude, our wcqr estimators are stable and perform similarly to the cqr estimators. More importantly, our weighted approach provides a relatively general objective function that researchers can minimize directly with existing R packages.
5. Conclusions
In this paper, we proposed a weighted method for the competing risks quantile regression model that transforms the estimating equation into a common weighted objective function, and we applied LASSO and ALASSO penalization for variable selection. We established the consistency and asymptotic normality of the penalty-free estimators as well as the consistency of variable selection. Monte Carlo simulations were conducted for several scenarios and showed good variable selection performance and stability. Finally, a real dataset was used to illustrate the application of our methods, with results comparable to those of other methods.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math11061295/s1, Figure S1: Case . Reports of TPr, FPr, and at different . Four colors are used to represent the methods: red for cqr, green for wcqr0, blue for wcqr1, and purple for wcqr2. Light colors represent the LASSO penalty and dark colors the ALASSO penalty. Figure S2: Case . Reports of TPr, FPr, and at different . Four colors are used to represent the methods: red for cqr, green for wcqr0, blue for wcqr1, and purple for wcqr2. Light colors represent the LASSO penalty and dark colors the ALASSO penalty. Figure S3: Case . Reports of TPr, FPr, and at different . Line colors represent the eight methods. Figure S4: Case . Comparison of TP and FP for four levels of . In each subplot, the Y axis reports the TP values at different . Line colors represent the eight methods. Figure S5: Case . Plots of FP for four levels of . In each subplot, the Y axis reports the FP values at different . Line colors represent the eight methods. Figure S6: Plots of for four levels of . In each subplot, the Y axis reports the values at different . Line colors represent the eight methods. Figure S7: Plots of for four levels of . In each subplot, the Y axis reports the values at different . Line colors represent the eight methods. Figure S8: Case . Reports of TP, FP, and at different . Line colors represent the eight methods. Figure S9: Case . Reports of TP, FP, and at different . Line colors represent the eight methods. Figure S10: Case . Reports of TP, FP, and at different . Line colors represent the eight methods. Figure S11: Case . Reports of TP, FP, and at different . Line colors represent the eight methods.
Author Contributions
Conceptualization, E.L., J.P., M.T. (Manlai Tang), K.Y., W.K.H., X.D. and M.T. (Maozai Tian); Methodology, E.L.; Software, E.L.; Validation, E.L.; Formal analysis, E.L.; Investigation, E.L.; Resources, E.L.; Data curation, E.L., J.P., M.T. (Manlai Tang), K.Y., W.K.H., X.D. and M.T. (Maozai Tian); Writing—original draft, E.L.; Writing—review & editing, E.L., J.P., M.T. (Manlai Tang), K.Y., W.K.H., X.D. and M.T. (Maozai Tian); Visualization, E.L.; Supervision, J.P., M.T. (Manlai Tang), K.Y., W.K.H. and M.T. (Maozai Tian); Project administration, E.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by National Natural Science Funds of China (Grant No. 12101015), Scientific Research Foundation of North China University of Technology (No. 110051360002), the Fundamental Research Funds for Beijing Universities, NCUT (NO.110052971921/007), National Natural Science Foundation of China (No. 11861042), and the China Statistical Research Project (No. 2020LZ25).
Data Availability Statement
Publicly available datasets were analyzed in this study. These data can be found here: 10.1038/bmt.2009.359.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Technical Details of Proofs
To simplify the presentation, we omit in such expressions as .
Since the weights depend on , we take as . Additionally, we define as the subgradient of the weighted quantile objective function (11), where
Let , where
where is the density of censoring variable C conditionally on Z, and . It is noteworthy that , and it is easy to derive that .
Lemma A1.
Assume assumptions A1–A3, C1–C3 hold. Then
for every .
Remark A1.
Lemma A1 directly guarantees the consistency of our weight estimates for , which is the in Equation (6).
Proof.
By conditions C1 and A1 and [17], Ref. [5] showed that for every , , a.s. This, coupled with C2, implies that
Simultaneously, for , is uniformly bounded away from 0, thus by Chebyshev’s inequality, for every ,
which holds for any x, that is
□
Lemma A2.
For all positive values , we have
Proof.
Let and denote the jth coordinates of and , respectively. For notational simplicity, in the following we omit the subscript i in various expressions such as . Let be some positive constants. Note that for
where
It is easy to verify that
or multiplied by a constant, by Assumption C3. Therefore, by Assumptions A1 and A2,
Following similar arguments, we can show that
Note that
Similarly to , it is easy to verify that . Then
Since
Then by Assumption A1, we have Consequently,
Arguing as in the proof of , by adding and subtracting , we obtain
By the proof of and , we can easily get that and Thus
Therefore, condition (3.2) of [18] holds with and , and condition (3.3) is satisfied by Remark 3(ii) of their paper. Thus, Lemma A2 holds by applying Theorem 3 of [18]. □
Proof of Theorem 1.
Note that is equivalent to and . Therefore, when plugging in the true and into M, we get
Note that is the solution of , where is a continuous function of in a compact parameter neighborhood .
Therefore, the consistency of is a direct consequence of Theorem 1 of [18], and we only need to verify conditions (1.1), (1.2), and (1.5’) in their paper, as (1.3) is trivially satisfied and (1.4) follows from Lemma A1.
- (1.1)
- By the subgradient condition of quantile regression [13], there exists a vector v with coordinates such that by Assumption A1, where denotes a element subset of .
- (1.2)
- For any and ,which is strictly positive under Assumptions A1 and A3. Here is some value between and .
- (1.5’)
- Let be a sequence of positive numbers approaching zero as . Note that , under Assumption A1. It then follows from Chebyshev’s inequality that
Then the proof of Theorem 1 is complete with the conclusion of Theorem 1 of [18]. □
Proof of Theorem 2.
where ≈ denotes asymptotic equivalence uniformly in , and . Derived similarly to [5], is Lipschitz in x, and converges weakly to a mean-zero Gaussian process with covariance matrix . Then, by (A7),
where
is a random vector with mean 0 and by Assumptions A1–A3.
The asymptotic normality of relies on the results of Theorem 2 in [18]. We need to verify conditions (2.1)–(2.4), (2.5’), and (2.6’) in their paper. Conditions (2.1), (2.4), and (2.5’) hold directly by (A5), Lemma A1, and Lemma A2, respectively.
Note that for any lying above the th conditional quantile , the quantile fit will not be affected if we assign the entire weight to either or . Then we obtain
which is continuous at and of full rank under Assumption A3. For all , we define the functional derivative of at in the direction as
where . Since
where
Similarly, we can derive
where
For such that , . For sufficiently small , , then
where .
For such that , . For sufficiently small , , then
where and
For , note that for , then
and for , then
By expanding (treated as a function of ) around , and using the fact that (example 20.5 in [19]), we obtain
Similarly, we have
Therefore, for such that ,
for such that ,
That is
Using a Taylor expansion, we can verify condition (2.3) of [18] under Assumptions A1 and A2.
Then, we verify condition (2.6’). Combining (A6) and the analysis above, we have
Denote , , , , , and . Following the proof in [5], , from [17], and converges to uniformly in both and , where . Then
Recall that are independent mean-zero random vectors.
Since , and for , it is easy to verify
Then applying the central limit theorem gives
where
The proof of (14) is then complete by Theorem 2 of [18]. □
Proof of Theorem 3.
Let . We first show that for any , as . Suppose there exists a such that . Let be a vector constructed by replacing with 0 in . For simplicity, we write . Note that . Therefore, for large enough n,
By Theorem 1, and , thus . As and , which yields
where is any positive constant. This contradicts the fact that .
We next show that for any , . We write for any vector , and as the sub-matrix of a matrix B with both row and column indices in . By Taylor expansion
uniformly over such that and . Let , we have
where . Therefore, with probability tending to 1,
for some positive and . However, the subgradient condition (A5) requires that
When and Assumption A1 holds, (A13) and (A14) imply that the subgradient condition cannot hold if for some positive K. Using the monotonicity argument in [20], we can show that the subgradient condition also cannot hold if . Therefore, with probability tending to 1. Equivalently, for all , or . The proof of Theorem 3 is thus complete. □
References
- Scrucca, L.; Santucci, A.; Aversa, F. Regression modeling of competing risk using R: An in depth guide for clinicians. Bone Marrow Transpl. 2010, 45, 1388–1395.
- Fine, J.P.; Gray, R.J. A proportional hazards model for the subdistribution of a competing risk. J. Am. Stat. Assoc. 1999, 94, 496–509.
- Fu, Z.; Parikh, C.R.; Zhou, B.J. Penalized variable selection in competing risks regression. Lifetime Data Anal. 2017, 23, 353–376.
- Koenker, R.W.; Bassett, G. Regression quantiles. Econometrica 1978, 46, 33–50.
- Peng, L.; Fine, J.P. Competing risks quantile regression. J. Am. Stat. Assoc. 2009, 104, 1440–1453.
- Sun, Y.Q.; Wang, H.; Gilbert, P. Quantile regression for competing risks data with missing cause of failure. Stat. Sin. 2012, 22, 703–728.
- Ahn, K.W.; Kim, S. Variable selection with group structure in competing risks quantile regression. Stat. Med. 2018, 37, 1577–1586.
- Li, E.; Tian, M.; Tang, M. Variable selection in competing risks models based on quantile regression. Stat. Med. 2019, 38, 4670–4685.
- Wang, H.J.; Wang, L. Locally weighted censored quantile regression. J. Am. Stat. Assoc. 2009, 104, 1117–1128.
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B. 1996, 58, 267–288.
- Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
- Gray, R.J. A class of k-sample tests for comparing the cumulative incidence of a competing risk. Ann. Stat. 1988, 16, 1141–1154.
- Koenker, R. Quantile Regression; Cambridge University Press: New York, NY, USA, 2005.
- Wang, H.J.; Zhou, J.; Li, Y. Variable selection for censored quantile regression. Stat. Sin. 2013, 23, 145–167.
- Robins, J.M.; Rotnitzky, A. Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS Epidemiology: Methodological Issues; Jewell, N., Dietz, K., Farewell, V., Eds.; Birkhäuser: Boston, MA, USA, 1992; pp. 24–33.
- Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958, 53, 457–481.
- Pepe, M.S. Inference for events with dependent risks in multiple endpoint studies. J. Am. Stat. Assoc. 1991, 86, 770–778.
- Chen, X.; Linton, O.; Van Keilegom, I. Estimation of semiparametric models when the criterion function is not smooth. Econometrica 2003, 71, 1591–1608.
- van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 1998.
- Jureckova, J. Asymptotic relations of M-estimates and R-estimates in linear regression. Ann. Statist. 1977, 5, 464–472.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).