Nonparametric Sieve Maximum Likelihood Estimation of Semi-Competing Risks Data

: In biomedical studies involving time-to-event data, a subject may experience distinct types of events. We consider the problem of estimating the transition functions for a semi-competing risks model under illness-death model framework. We propose to estimate the intensity functions by maximizing a B-spline based sieve likelihood. The method yields smooth estimates without parametric assumptions. Our proposed approach facilitates easy computation of the covariance of the model parameters and yields direct interpretation. Compared with existing approaches, our proposed method requires neither the subjective speciﬁcation of the frailty distribution nor the Markov or semi-Markov assumption which may be unmet in real applications. We establish the consistency, the convergence rate, and the asymptotic normality of the proposed estimators under some regularity conditions. We also provide simulation studies to assess the ﬁnite-sample performance of the proposed modeling and estimation strategy. A real data application is further used to illustrate the proposed methodology.


Introduction
In survival analysis, a subject may experience several distinct types of failures.If apart from censoring, the follow up period ends upon the occurrence of the first event, such data are often referred to as competing risks data.This framework consists of survival data where failure may be due to one of a number of competing causes.In some application, with additional information, this notion can be extended to accommodate that of semi-competing risks ( [1,2]), where one type of event (terminal event, e.g., death) may censor the other events (non-terminal event, e.g., relapse of the disease), but not vice versa.The framework of semi-competing risk data have been previously discussed in [1,3].Furthermore, competing risks data can also be regarded as a special type of multitask prediction problem, which simultaneously predicts multiple outcomes from the same set of predictors.A stacking algorithm borrowing information among multiple prediction tasks to improve multivariate prediction performance (MTPS) is recently proposed by [4].The MTPS is shown to outperform existing multivariate prediction methods.
Recently [5] suggests that semicompeting risks data can also be analyzed using the conventional illness-death compartment model by a subjective specification of the frailty distribution and postulating the Markov or semi-Markov assumption for the conditional transition functions given the covariates and the frailty ( [6,7]).However, the subjective specification of the frailty distribution or the Markov or semi-Markov assumption may be unmet in some practical applications, leading to inconsistent estimators.In such cases, alternative (non-Markov) estimators are needed.Furhthemore, their nonparametric maximum likelihood estimation approach may be computational demanding when the sample size is large.
To address the theoretical and numerical challenges in the semiparametric estimation of semi-competing risks model, we employ the B-spline based sieve maximum likelihood approach to simultaneously estimate the regression parameters and transition functions.Covariates are incorporated naturally via proportional hazards assumptions.This approach facilitates easy calculation of the covariance of the model parameters.The proposed spline estimation algorithm requires much less computation than the isotonic type algorithm used in [5] since the size of the step function is much larger than the number of parameters in our proposed B-spline based approach.Under certain regularity conditions, we are able to prove that the estimators of regression parameters is root-n consistent, asymptotically normal and semiparametric efficient.
The rest of the paper is organized as follows.In Section 2, we will introduce our proposed model and estimating approach.In Section 3, we study the asymptotic properties of the proposed estimators.In Section 4, we provide simulation results.An application to colon cancer data is given in Section 5. We then conclude with some discussion in Section 6.All proofs are relegated to the Appendix A.

Model and Likelihood Function
For the ith subject, let C i , X i , T i1 , and T i2 denote the censoring, covariate vector, non-terminal event time, and terminal event time, respectively.Define The hazard functions are defined as below.
For the case with q dimension covariates X, the conditional transition rate functions are defined as follows: Note that both x and X refer to the covariates where X denote the random variable and x refers to its observed values.The Equations ( 5)- (7) are the conditional transition functions of T 1 and T 2 (given X = x) while the Equations (1)-(3) are the unconditional transition functions of T 1 and T 2 .
To simplify the notation, denote Note that in our modeling approach, λ 30 depends on two parameters t and s.

Sieve Space
We propose a sieve space consisting of B-splines for λ j0 (j = 1, 2, 3) in maximizing (4).We suppose that Y 1 and Y 2 have compact supports (say [0, 1]) and that β ≤ M for a known constant M. Rewrite λ 10 (t) = exp (g 10 (t)), λ 20 (s) = exp (g 20 (s)), λ 30 (t, s) = exp (g 30 (t, s)).Let ψ = (g 1 , g 2 , g 3 ) and ψ 0 = (g 10 , g 20 , g 30 ).A sieve space consisting of B-splines is defined for these new parameters as follows: First, we obtain an extended partition with equal length 1/K n for the interval [0, 1] : where m (independent of the sample size n) and K n = O(n ν )(0 < ν < 1/2) are two integers to be chosen later.Note that m and K n are two parameters often used in B-spline modeling where m indicates the smoothness of the basis function.Let be a normalized B-spline basis associated with ∆ (see [8]).Then the sieve space for the parameters θ = (β, ψ(t, s)) is defined as where M n ≤ (2m − 1)/(2m (2m + 1)) with a constant m arbitrarily close to m.For any Remark 1.Here we assume that the transition intensity λ 30 (•) depends on both t 1 and t 2 .A semi-Markov process specifies that λ 30 (t 1 , t 2 ) = h 2 (t 2 − t 1 ).However, it is important to note that in either Markov or semi-Markov approaches, λ 30 depends on only one parameter, corresponding to the special cases of our modeling approach where λ 30 can flexibly depend on two parameters.

Maximization
Let P n , P denote the empirical measure and the true probability measure of (δ 1 , δ 2 , Y 1 , Y 2 , X), respectively.We maximize the function over the sieve space Θ n .For the knot selection, we let m = 3 and use the Bayesian information criterion to choose K n which minimizes the criterion function.

Theoretical Properties
In this section, we establish the theoretical properties of our spline-based modeling strategy under the following regularity conditions.

Assumptions
(A1) Y 1 and Y 2 have compact supports (say [0, 1]) and X has bounded support in R q where q is the dimension of X.Moreover, if there exists a constant c 0 and a constant vector γ such that γ X = c 0 almost surely, then c 0 = 0 and γ = 0. (A2) β 0 ∈ B, where B is a compact set of R 3q with nonempty interior.λ 10 and λ 20 ∈ H r , and λ 30 ∈ C r .(A3) K n = O(n ν ) where ν satisfies the restrictions 0.25/r < ν < 0.5.(A4) r ≥ 2 where r is the measure of smoothness of λ j in definitions of H r and C r .
We first establish the strong consistency for the estimated model parameters.
Next, we obtain the convergence rates for the proposed estimators.
Theorem 2. Under Assumptions A1-A3, it holds that ), which is the optimal convergence rate in the non-parametric regression setting for bivariate function estimation by [9].
To derive the limiting distribution of the proposed estimators, establish the asymptotic normality, we calculate the directional derivative of the log-likelihood in the associate functional spaces as follows.
Remark 2. Inference about β.Theorem 3 offers ease of inference procedure, especially for the regression parameter β.Set φ j (•) = 0(j = 1, 2, 3), then Theorem 3 yields that n by Gramer-Wold device, one can establish semiparametricefficiency of β.where Σ ββ can be consistently estimated using the inverse of the Hessian matrix.

Remark 3. Inference about λ
where σ 2 λ j (j = 1, 2) can be consistently estimated by using the delta method or some resampling methods.Similarly inference can be done for λ 3 (t, s): Let b = 0, φ 1 (•) = 0, φ 2 (•) = 0, then Theorem 3 yields that where σ 2 λ 3 can be consistently estimated by using the delta method or some resampling methods.The above results can be used to check the linear (quadratic) effect of t j (j = 1, 2), or to check whether λ 3 (t 1 , t 2 ) is an additive form of t 1 and t 2 .

Simulation Study
We conducted simulations to investigate finite sample performance of the proposed estimator.In the simulation, we let By calculation, it is clear that the stipulated transition functions do not follow the transition functions from the models involving the frailty distribution and Markov or semi-Markov modells ( [1,5]).It is therefore of interest to examine whether the proposed splinebased estimation procedure still yields reliable and accurate estimates for this scenario which cannot be tackled by existing approaches.We report results with one covariate, X, having a uniform.distribution between 0 and 0.5.We consider β j = 1, −1, 0.5, j = 1, 2, 3, and n = 200 and 400.The censoring time was simulated from from a uniform distribution on (0, τ) with τ = 50.We compute the spline based semiparametric maximum likelihood estimate using the cubic B-spline and estimate the standard error of the estimated regression parameter using the inverse of the Hessian matrix.For the B-spline, the number of knots K n or equivalently N n = (K n + m) is chosen using BIC defined in Section 2.3.Tables 1-3 presents the estimation bias (BIAS), standard deviations (STD), the mean of the estimated standard error of the estimated regression parameter(ESE) and the coverage proportion of the 95 percent confidence intervals (CP) based on 500 replicates.From Tables 1-3, we can see (a) the proposed estimates have very small biases; (b) standard deviations of the estimates shrink at approximately the √ n rate; (c) the estimated standard deviations are very close to those of the original estimates; the 95 percent confidence intervals provide adequate coverage probabilities.It can be seen that the proposed modeling strategy and estimation procedure can yield reliable and accurate estimates and exhibit direct and good interpretation in practice.

A Real Data Example
As our proposed B-spline based modeling strategy does not involve the subjective specification of the frailty distribution and do not require the Markov or semi-Markov assumption which may be unmet in real applications, it is hence more flexible than existing approaches in practice.To illustrate this point, we now apply the illness-death model presented in Section 2 to the colon cancer data.It is of interest to examine whether the time spent in state 1 (past) is related to the transition function from state 2 into state 3.For answering this question, we consider a working model λ 3 (t, s) = exp(ξt)λ(s).It translates to test H 0 : ξ = 0.This can be done using the usual likelihood ratio statistic.The results obtained for the colon cancer study show that the effect of time spent in state 1 is significant (p-value < 0.05).This allows us to conclude that the Markov assumption may be unsatisfactory for the colon cancer data set.This further demonstrate the stringent assumptions required by existing approaches may be unmet in practice which calls for the need of our proposed methodology.
For illustrative purposes, we only consider one covariates: Lev+5-FU treatment.Our interest centers on understanding the effect of Lev+5-FU treatment and nonparametricall modelling transition functions in different states.Table 4 reports the estimates of the regression coefficients along with standard errors and p-values.From Figures 1 and 2, we can see our proposed model and estimation procedure yield the estimated transition functions with direct and good interpretation.It stipulates quantitatively how the hazard functions of the time to terminal event and the time to non-terminal event evolves over time and shed lights on the disease progression and death risks for colon cancer patients with and without relapse of the cancer.We plot the estimated the transition functions in Figure 2.
Furthermore, to illustrate the computational advantage of our proposed approach, for the real data application, the existing frailty-model approach will require the number of parameters (3 + 413 + 1 = 417).However, our proposed B-spline approach only require (m + K n ) * 3 + 3 = (4 + 8) * 3 + 3 = 39 parameters.Hence, the computational cost is substantially reduced while our approach is more flexible than existing approaches because it does not require the subjective specification of the frailty distribution and the Markov or semi-Markov assumption.

Concluding Remarks
In this paper, we proposed an spline-based sieve semiparametric maximum likelihood method for semi-competing risks data.This method reduces the dimensionality of the estimation problem using the splines and therefore releases the numerical burden of the computation.This approach allow essily infer for both regression parameters and transition functions.It should be a straightforward task to apply the method presented here to allow for non-linear relationships between continuous predictors and survival in the multi-state framework ([6,10] and others).Simulations showed that the new estimator may behave very good.For illustration purposes we used a real dataset from a clinical trail for colon cancer.Competing risks data can also be regarded as a special type of multitask prediction problem.In such a field, the most state-of-the-art method is MTPS [4], which currently does not support predicting survival outcomes.Following their approaches, it would be worthwhile studying the stacked algorithm for prediction with multivariate survival outcomes including competing risks and semi-competing risks data.

Figure 2 .
Figure 2.Estimated transition functions for the colon cancer data.

Table 4 .
Estimated regression coefficients and their standard errors for the colon data.