An MM Algorithm for the Frailty-Based Illness Death Model with Semi-Competing Risks Data

: For analyzing multiple events data, the illness death model is often used to investigate the covariate–response association for its easy and direct interpretation as well as the ﬂexibility to accommodate the within-subject dependence. The resulting estimation and inferential procedures often depend on the subjective speciﬁcation of the parametric frailty distribution. For certain frailty distributions, the computation can be challenging as the estimation involves both the nonparametric component and the parametric component. In this paper, we develop efﬁcient computational methods for analyzing semi-competing risks data in the illness death model with the general frailty, where the Minorization–Maximization (MM) principle is employed for yielding accurate estimation and inferential procedures. Simulation studies are conducted to assess the ﬁnite-sample performance of the proposed method. An application to a real data is also provided for illustration


Introduction
In biomedical studies, one subject may experience multiple types of event times.For example, a subject can experience a terminal event such as death and a non-terminal event such as disease recurrence where the terminal event can censor the non-terminal event but not vice versa.The two event times are usually dependent.This gives rise to semi-competing risks data in which the follow-up of a non-terminal event could be stopped by a terminal event [1].Such dependency can be modeled by copula; for instance, Ref. [1] investigated the degree of association between the two events under the Clayton copula model, and [2] extended their works to a class of more general copula models and established the asymptotic normality of this estimator.In addition, the Archimedean copula was applied by [3] to model such dependency.
In addition to the copula approach, the semi-competing risk data can be modeled by illness-death model by using the shared frailty to describe the dependence of two events' time.Many researchers ( [4][5][6][7][8]) have developed semi-parametric frailty models to analyze the semi-competing risks data, and semi-parametric regression analyses ( [9,10]) were also discussed by many researchers.In particular, Ref. [4] proposed a marginal-likelihood approach under the semi-parametric gamma frailty model.Ref. [11] proposed a joint frailty-copula model in the meta-analysis of individual patient data with semi-competing risks.Moreover, Refs.[5,6] proposed semi-parametric Bayesian approaches for the semicompeting risks data.Unlike the classical likelihood, which only involves fixed parameters, the hierarchical likelihood is constructed for both fixed parameters and unobserved frailties at the same time for the semi-parametric shared frailty model by [7].In addition, Ref. [12] proposed a broad class of semi-parametric transformation models with random effects for the joint analysis of recurrent events and a terminal event.Ref. [8] developed a novel semiparametric transformation model for the analysis of semi-competing risks data.In general, the computation involved in the semi-parametric frailty models is usually intensive, and the essential cause comes from the calculation of multiple non-parametric baseline cumulative hazard functions and intractable integrals over the frailty distributions.
Inferences for such semi-competing risks data are generally focused on the covariate treatment effects on the rates of terminal and non-terminal events and the association between the two types of events ([9,13-15]).Some recent research also focused on the parameter estimation procedure of semi-competing risk models.
Ref. [16] provided R packages with many algorithms to deal with independent and cluster-correlated semi-competing risk data.Ref. [17] applied the Bayesian approach to their proposed algorithm to conduct variable selection along with the parameter estimation, but their method is computationally intensive.Similarly, a penalized approach has been applied to this model by [18], and they applied a more efficient proximal gradient method which requires a proper warm start.In addition, Ref. [19] proposed an MCEM scheme to update the parameter estimates.The drawback of this method is that the performance of estimation is highly reliant on the MCMC sample approximates in the E-step, which also requires an accurate starting value.Different from their methods, for the parameter estimation part, we propose applying the MM algorithm in order to achieve better computational efficiency.
The MM algorithm is an important and powerful tool for optimization problems because it can simplify a difficult optimization problem by decomposing a high-dimensional objective function into separable low-dimensional functions.So far, the MM algorithm has a broad range of applications in the field of statistics, such as proportional odds model ( [20]), the shared frailty model ( [21]), the quantile regression ( [22]), and so on.In this paper, we used the MM principle and proposed a profile MM algorithm for the semi-competing risk shared frailty model, which facilitates its pertinent use in high-dimensional situations.
The rest of the paper is organized as follows.In Section 2, we introduce the semicompeting risk shared frailty model.Section 3 presents the estimation procedures based on the MM method.In Section 4, we provide two simulation studies to assess their practical performance.Section 5 illustrates the method by a real data analysis.Some concluding remarks and discussions are given in Section 6.

The Semi-Competing Risk Model with Gamma Frailty
Let C i , T i1 , and T i2 be the censoring, non-terminal, and terminal event times for the i-th subject, respectively, i = 1, . . ., n. Denote by x i a p-dimensional vector of covariates of the i-th subject.Assume that the censoring time C i is independent of T i1 and T i2 given x i .The observations can be summarized as If the subject fails before the non-terminal event occurs, then we define T i1 = ∞.The illness-death multi-state model for semi-competing risks data has three states, which are characterized by three intensity or hazard functions: (2.3) Equations (2.1) and (2.2) are the usual rough hazard functions for the competing risk portion of the model where a non-terminal or terminal event occurs first.Then, Equation (2.3) defines the hazard rate of the terminal event following the occurrence of the non-terminal event.Usually, λ 12 (t 2 |t 1 ) depend on both t 1 and t 2 .Followed by [4], since the dependence between non-terminal and terminal event times will be modeled by shared frailty later, here we assume a Markov process where the transition probability from the non-terminal event to the terminal event does not depend on the duration of the non-terminal event, i.e., λ 12 (t 2 |t 1 ) = λ 12 (t 2 ) only depends on t 2 .
In practice, Ref. [4] extended the classical semi-competing risk model by incorporating a common frailty (random effect) to model the dependency between non-terminal and terminal event times.That is, given the random effect ω and covariates x, the Cox-type multiplicative models can be expressed in the following form: (2.4) ) where λ 01 (t 1 ), λ 02 (t 2 ), and λ 03 (t 2 ) are three baseline hazard functions and ω is a subjectspecific random effect or frailty.Usually, the frailties ω i (i = 1, . . ., n) are assumed to be independent and identically distributed with a density function with a frailty parameter θ.The common distributions assumed for ω are Gamma(1/θ, 1/θ), Inverse Gaussian (θ, θ 2 ), and Log-normal(0, θ).Since the likelihood of semi-competing risk model with Gamma frailty has the explicit form of expression, we assume that the frailty ω has a Gamma distribution with mean 1, variance θ in the following section, and only show the estimation procedures for the semi-competing risk models (2.4)-(2.6)with ω ∼ Gamma(1/θ, 1/θ), i.e., Note that θ measures the dependence between non-terminal and terminal event times and a larger θ indicates a stronger dependence.

Philosophy of the MM Principle
Assume arg max α∈Θ (α|Y obs ) is our maximization problem, where (α|Y obs ) is the objective log-likelihood function, α = (α 1 , . . ., α q ) T ∈ Θ are the vector of parameters to be estimated, and Θ is the parameter space.
For such maximization problems, the MM principle provides a general frame for constructing iterative algorithms with monotone convergence which involves two M steps.The first M step aims to construct a surrogate function Q(α|α (k) ) by a series of algebraic inequalities under the following conditions: where α (k) denotes the current estimate of α in the k-th iteration.Once the surrogate function is constructed, the second M step is to maximize the surrogate function Q(•|α (k) ) instead of the objective log-likelihood function (α|Y obs ).Then, we update α (k) by α (k+1) as follows:

The Estimation Procedure
From Equations (2.4)-(2.6), the observed likelihood function of the semi-competing risk Gamma frailty model can be written as the form of then, we have the objective log-likelihood function as follows: Based on the MM principle, it is necessary to find a surrogate function for the objective log-likelihood function l(θ, β, Λ 0 |Y obs ) in (3.1).We first denote: and utilize the supporting hyperplane inequality − log(x) ≥ − log(x 0 ) − x−x 0 x 0 to deal with the last term of (3.1), then we have the temporary surrogate function as follows: where c 1 is a constant.Following [23,24], we use the profile estimation method and first profile out Λ 01 , Λ 02 and 0 ) for any given θ and β, this provides the estimate of Λ 01 , Λ 02 , and Λ 03 given θ and β as: 0 ), we have: where c 2 is a constant.For ease of expression, we further denote: and also use the supporting hyperplane inequality to deal with the first three terms of 0 ) in Equation (3.5), we obtain the surrogate function: To separate the parameters θ with each β 1 , β 2 , and β 3 , we further minorize the components θ exp(β 1 x j ), θ exp(β 2 x j ) and θ exp( 0 ) by the inequality: , then we obtain the final surrogate function: 2 exp(β 2 exp(β where . From (3.6), it can be seen that the frailty parameter θ and the regression parameters β 1 , β 2 , and β 3 are separated from each other.Accordingly, the resulting MM algorithm only involves a series of separated univariate optimizations in the next maximization step and matrix inversion is not needed.The Algorithm 1 is stated as follows.

Algorithm 1
The estimation procedures via MM method.
The variance for the estimates in the model involves three non-parametric baseline transition functions, frailty variance, and regression coefficients as shown in [4].Hence, there is no readily available plug-in formula for estimating it.However, the resampling method such as the bootstrap can be employed to calibrate it and construct confidence interval and the associate inferential procedures.

Simulation Study
To evaluate the finite sample performance of the proposed methods, we conducted the following simulation studies.As emphasized in the Section 2, the purpose of incorporating subject specific frailty terms is to account for dependence, which is not taken into account by the measured covariates.
Based on 500 replications, the average values of the biases (Bias) for estimated frailty and regression parameters, their empirical standard deviations (SD), and the average computation times (T) are summarized in Tables 1 and 2. From the results of Tables 1 and 2, we find that the SD of each parameter is relatively small which indicating the effectiveness of MM algorithm for the illness-death model with gamma frailty.It can also be found that most the SDs of θ are larger than that of β 1 , β 2 , β 3 and Λ 01 , Λ 02 , Λ 03 .With the increase in the true values of θ, the SD of θ will increase accordingly, while the SD of θ will decrease with the increase in sample size.When the censoring proportion increases from 30% to 50%, the SD of θ is increased accordingly, but the SDs of other parameters, i.e., β 1 , β 2 , β 3 and Λ 01 , Λ 02 , Λ 03 , do not show a clear trend of change.A larger value of theta requires longer computation time.
Furthermore, we plot the estimated cumulative baseline hazard function Λ 01 (t) and Λ 03 (t) by the proposed method based on 20 replications (with red color) together with the true cumulative baseline hazard function Λ 01 (t) = t and Λ 03 (t) = 2t (with black color) under the model with gamma frailty in Figures 1 and 2 with sample size 500.Since both cumulative baseline hazard functions Λ 01 (t) and Λ 02 (t) have the same expression, here, we only plot the estimated cumulative baseline hazard function Λ 01 (t).
Based on 500 replications, the average values of the biases (Bias) of estimated frailty and regression parameters, their empirical standard deviations (SD), and the average computation times (T) are summarized in Table 3. From the results of Table 3, we find that the SDs of parameters β 1 , β 2 , β 3 and Λ 01 , Λ 02 , Λ 03 are relatively small, especially when the sample size is large.It can also be found that the SDs of θ are larger than that of other parameters.With the increase in the true values of θ, the SD of θ will increase accordingly, while the SD of θ will decrease with the increase in sample size.Compared to the model in Scenario 1, some parameters such as the frailty parameter in the semi-competing risk model with log-normal frailty are more difficult to estimate.Similar with Scenario 1, a larger value of θ requires longer computation time.

Real Data Analysis
Colon cancer is a serious type of cancer and has a high mortality rate.This type of cancer is not easy to cure since even if all the apparent diseased tissue can be surgically removed, there still exists other residual tumor parts which are not observable.Therefore, the cancer-recurrence event is commonly observed in clinical trials.For those patients who experience a recurrence, they will be subject to a higher risk of mortality.Thus, it is crucial to find the therapy which can significantly reduce the cancer-recurrence rate.In the following illustration, the colon cancer data provided by [25] are applied to our model.A total of 929 patients with Stage C disease are included for the modeling.Among them, 304 patients received levamisole plus fluorouracil, which is the covariate where the effect will be tested.A total of 468 patients developed recurrence, and 414 of them died.There are three stages recorded: State 1 stands for the stage that the patient is alive and disease-free.Stage 2 represents the condition that the patient is alive but with a recurrent diagnosis of cancer, while stage 3 denotes death.A general therapy for this type of cancer is the combination of levamisole and fluorouracil, which is denoted by the treatment Lev+5-FU [26].Previous research has proven the effectiveness of this therapy on reducing the cancer-recurrence rate.We will show the significance of this covariate on increasing the state transition hazard rate under our model framework.
Similar to the simulation study, we report the estimate of β 1 , β 2 , and β 3 along with the corresponding baseline hazard.The observations (X i , C i , T i ) from the original dataset are re-sampled with replacement in non-parametric way, where α 3 ) is the parameter estimation result under k th bootstrap.Let ᾱ be the mean of estimated parameters with the number of resamples equals to r.The estimated standard error is: In addition, the 95% bootstrap CI from Table 4 is constructed using 2.5% and 97.5% quantiles of {α (k) } r k=1 with r = 1000.As shown in Table 4, the treatment with therapy Lev+5-FU can significantly reduce the probability of occurrence of recurrent event.However, we can observe that this therapy has little effect to the decrease in mortality.Both β2 and β3 have a p-value around 0.4, which indicates the insignificance of these two parameters.The cumulative hazard rate for different transitions are presented by Figure 3 and it indicates that a patient who experiences the recurrence of this disease will have very high mortality rate in the following years.In addition, we also observe the high hazard rate for the recurrence of colon cancer, which shows the importance of therapies such as Lev+5-FU, which can significantly reduce the occurrence of such event.

Discussion
In this paper, we proposed an efficient MM algorithm for the non-parametric maximum likelihood estimation of the illness-death model with a gamma frailty.The paper is motivated by the feature of colon cancer data while the existing method proposed by [4] has a high computational cost on the parameter estimation process.The proposed MM method can decompose the high-dimensional optimization problem into a sum of univariate optimization problems by the construction of a simple surrogate function which provides accurate and efficient simulation results.This MM approach avoids matrix inversion and can provide a toolkit for developing more efficient algorithms in a broad range of statistical optimization problems.Therefore, the proposed MM algorithm can help to estimate the parameters from the semi-competing risk model more efficiently.
In this paper, the explicit form of the marginal likelihood for the illness-death model can be derived.Therefore, the iteration steps do not involve integration calculation.However, in general, the marginal likelihood of the illness-death model with shared frailty is usually hard to obtain as an explicit form of the marginal likelihood is not available.For example, if the illness-death model involves a log-normal frailty or with correlated frailties, the intractable integral of the shared frailty will lead to a more complicated marginal likelihood.Even in the case of gamma frailty, the complicated model setup with three covariates and three non-parametric baseline hazard causes difficulties in parameter estimation.The inclusion of intractable integrals given general shared frailty will lead to a much higher computational cost and lower estimation accuracy.Therefore, even though our method can be extended to handling the illness-death model with general frailty, further modifications should be made to the algorithm to improve the estimation efficiency.

Figure 1 .
Figure 1.Simulation results for estimated cumulative baseline hazard function Λ 01 (t) by the proposed method based on 20 replications (with red color) together with the true cumulative baseline hazard function Λ 01 (t) = t (with black color) under the model with gamma frailty in Scenario 1 with sample size 500.

Figure 2 .
Figure 2. Simulation results for estimated cumulative baseline hazard function Λ 03 (t) by the proposed method based on 20 replications (with red color) together with the true cumulative baseline hazard function Λ 03 (t) = 2t (with black color) under the model with gamma frailty in Scenario 1 with sample size 500.

Figure 3 .
Figure 3. Cumulative hazard for different transitions in terms of days.

Table 1 .
The simulation results of the illness-death model with gamma frailty at 30% censoring proportion.

Table 2 .
The simulation results of the illness-death model with gamma frailty at 50% censoring proportion.

Table 3 .
The simulation results of the illness-death model with log-normal frailty.

Table 4 .
The fitting results of colon cancer data.