Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data

Abstract: Motivated by the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, we aim to integrate important biomarkers for the early detection of the conversion from Mild Cognitive Impairment (MCI) to Alzheimer's disease (AD), since therapeutic intervention is most likely to be beneficial in the early stages of disease progression. Developing predictors for the MCI-to-AD conversion involves genotype variables, so the dimension of the predictors grows as the sample becomes large. We therefore exploit sparsity of the coefficients in a high-dimensional regression model with clustered failure time data such as ADNI, which enhances predictive performance and facilitates the model's interpretability. In this study, we first propose two MM algorithms (profile and non-profile) for the shared frailty survival model and then extend both algorithms to regularized estimation in the sparse high-dimensional regression model. The convergence properties of our proposed estimators are also established. Furthermore, simulation studies and an analysis of the ADNI data illustrate our proposed methods.


Introduction
In biomedical research, we often encounter clustered failure time data in which individuals from the same cluster (e.g., family) share common genetic and/or environmental factors. An illustrative example comes from the Alzheimer's Disease Neuroimaging Initiative study, where each participant may develop Mild Cognitive Impairment (MCI) and/or Alzheimer's disease (AD). The times to these two events for each individual are expected to be highly associated.
In order to account for the correlation of associated failure times, shared frailty or random effect models are commonly used by researchers ([1-4]). In particular, the gamma frailty model ([5-8]) is numerically convenient for such analysis since the likelihood function exhibits a closed form. For example, ref. [2] proposed a frailty model where individuals within the same cluster share a common random effect. Refs. [9-11] provided comprehensive reviews of the applications of different frailty models for clustered survival data. The authors of [12] compared several commonly used methods and demonstrated the advantages of the gamma frailty model. Refs. [13,14] also explored the nonparametric frailty model and the between-within frailty model for correlated survival data.
For high-dimensional regression analysis, an important and useful strategy is to exploit sparsity and assume that the true parameters lie in a low-dimensional subspace. In recent years, sparsity-restricted estimation has attracted a great deal of attention in high-dimensional regression models; that is, only a few regression coefficients are assumed to be nonzero ([15,16]). In addition to the widely used LASSO penalty, there exist other types of regularization methods such as ridge regression ([17]), bridge regression ([18]), and the elastic net ([19]). In order to improve the performance of the LASSO, many modifications such as the adaptive LASSO, the smoothly clipped absolute deviation penalty (SCAD, [20]), and the minimax concave penalty (MCP, [21]) have also been proposed.
In this paper, we develop fast and efficient algorithms for regularized estimation in the general frailty model for clustered failure time data. As the model parameters consist of both the regression coefficients and the unknown nonparametric baseline hazard function, computation in the frailty model with survival data is usually intensive, especially when the frailty does not yield a closed-form likelihood function. Regularized estimation poses even more challenges and further increases the computational burden. Most existing approaches for the general frailty model rely on the EM algorithm, which adopts Newton's method in its maximization step; this involves matrix inversion and may not perform well in a high-dimensional environment. The minorization-maximization (MM) algorithm possesses an ascent property, which drives the target likelihood function to increase and reliably converges to the maximum from well-chosen initial values ([22-24]). In particular, ref. [25] proposed three different MM algorithms for the gamma frailty model and further demonstrated their utility for estimation in high-dimensional situations by decomposing a high-dimensional objective function into separable low-dimensional functions. In this paper, we first develop a pair of MM algorithms, profile and non-profile, for the general frailty model. As demonstrated by the numerical studies of [25], this decomposition meshes well with regularized estimation in sparse high-dimensional settings. For illustration, we use concave penalties such as the smoothly clipped absolute deviation penalty (SCAD, [20]) and the minimax concave penalty (MCP, [21]) to explore the sparsity because both possess the good property of unbiasedness.
The rest of the paper is organized as follows. In Section 2, we provide the model description and an overview of the MM principle and then propose a pair of MM algorithms. In Section 3, we derive regularized estimation methods via the profile and non-profile MM algorithms in a sparse high-dimensional regression setting. The convergence properties of the proposed algorithms are provided in Section 4. In Section 5, a series of simulation studies is conducted to assess the finite-sample performance of the proposed methods. Section 6 provides a real application to the ADNI data.

The Model
Consider datasets from some population that contains M i ≥ 1 individuals in the i-th subgroup of the population, i = 1, . . ., B. Individuals within the i-th subgroup have dependent event times due to some unobserved covariate information summarized in a frailty, ω i . Let Y ij be the event time, let C ij be the censoring time, and let X ij = (X ij1 , . . ., X ijp ) denote the potential covariates for the j-th individual in the i-th subgroup. The censoring time C ij is assumed to be independent of Y ij given the covariates X ij and frailty ω i . Define t ij = min(Y ij , C ij ) and δ ij = I(Y ij ≤ C ij ), where I(•) denotes the indicator function. Suppose that censorship is noninformative. Conditional on the frailty ω i , the hazard rate function takes the form λ ij (t|ω i ) = ω i λ 0 (t) exp(X ij β), where λ 0 (•) is an arbitrary baseline hazard rate and β is a vector of unknown parameters. Assume that ω i , i = 1, . . ., B, are independent and identically distributed with density function f (ω i |θ) on the domain W. Denote α = (θ, β, Λ 0 ); we then propose profile MM and non-profile MM methods to estimate the parameters based on the minorization-maximization (MM) principle.

An Overview of MM Principle
Before introducing the two proposed methods, let us review the MM principle. Assume that Y obs is the observed data, ℓ(α | Y obs ) is the log-likelihood function with unknown parameter α = (α 1 , . . ., α q ) T , and the maximum likelihood estimate of α is α̂ = argmax ℓ(α | Y obs ). The MM principle involves two M steps: one is a minorization step, and the other is a maximization step. In a maximization problem, the first step is minorization, which constructs a minorization/surrogate function Q(α | α (k) ) for the objective log-likelihood ℓ(α | Y obs ) through a series of inequalities satisfying the following two conditions: Q(α | α (k) ) ≤ ℓ(α | Y obs ) for all α, and Q(α (k) | α (k) ) = ℓ(α (k) | Y obs ), where α (k) is the k-th approximation of α. Once the minorization function Q(α | α (k) ) is successfully constructed, the maximization step maximizes the surrogate function Q(α | α (k) ), rather than the objective function, to obtain the (k + 1)-th approximation α (k+1) = argmax α Q(α | α (k) ).
By the MM principle, we have ℓ(α (k+1) | Y obs ) ≥ Q(α (k+1) | α (k) ) ≥ Q(α (k) | α (k) ) = ℓ(α (k) | Y obs ), and the values of the objective function continue to increase until convergence.
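As a concrete illustration of this ascent property (not part of the original derivation), the following sketch maximizes the toy objective ℓ(θ) = −∑ i |x i − θ| by minorizing each term with −|r| ≥ −r²/(2|r k |) − |r k |/2 (equality at r = r k ), which turns every update into a weighted least-squares step; the data, starting value, and guard constant are hypothetical.

```python
def mm_median(x, theta0=0.0, iters=200, eps=1e-8):
    """MM iteration for maximizing l(theta) = -sum_i |x_i - theta|.

    The quadratic minorizer -|r| >= -r**2 / (2*|r_k|) - |r_k|/2 makes each
    M-step a weighted average; the maximizer of l is the sample median.
    """
    theta = theta0
    # track the objective to observe the monotone ascent property
    path = [sum(-abs(xi - theta) for xi in x)]
    for _ in range(iters):
        w = [1.0 / max(abs(xi - theta), eps) for xi in x]  # eps guards division by zero
        theta = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        path.append(sum(-abs(xi - theta) for xi in x))
    return theta, path

theta_hat, path = mm_median([1.0, 2.0, 3.0, 4.0, 100.0])
```

The iterates drive ℓ upward monotonically and converge to the median, 3, illustrating why no Newton step or matrix inversion is needed.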

Profile MM Estimation Procedure
For general shared frailty models, we can write the observed log-likelihood function as ℓ(α|Y obs ) = ∑ i log ∫ W ∏ j [ω i λ 0 (t ij ) exp(X ij β)] δ ij exp{−ω i Λ 0 (t ij ) exp(X ij β)} f (ω i |θ) dω i , where Λ 0 (t) = ∫ 0 t λ 0 (s) ds is the cumulative baseline hazard.
Generally, the Laplace transform of the frailty distribution is theoretically intractable; hence, an explicit form of the marginal hazard is not available. However, this does not prevent us from developing the profile MM approach for estimation in the general shared frailty model. Define τ i (ω i |α) as the integrand of the i-th cluster's marginal likelihood and the density v i (ω i |α (k) ) = τ i (ω i |α (k) ) / ∫ W τ i (ω|α (k) ) dω, and rewrite the objective function accordingly.
By Jensen's inequality, ϕ( ∫ X h(x)g(x) dx ) ≥ ∫ X ϕ(h(x))g(x) dx, where X is a subset of the real line R, ϕ(•) is a concave function, h(•) is an arbitrary real-valued function defined on X, and g(•) is a density function defined on X. In Equation (4), v i (ω i |α (k) ) is a density function, and choosing h(x) as τ i (ω i |α)/v i (ω i |α (k) ), we can apply Jensen's inequality to construct a surrogate function for ℓ(α|Y obs ) that separates the parameters θ and (β, Λ 0 ) into the two terms Q 11 (θ|α (k) ) in (6) and Q 12 (β, Λ 0 |α (k) ) in (7), respectively; (6) only involves the parameter θ. In the second M-step, the updated estimate of θ is obtained by maximizing (6) numerically. Due to the presence of the nonparametric component Λ 0 , updating (β, Λ 0 ) remains challenging. Following [7], we apply the profile estimation method to profile out Λ 0 in Q 12 (β, Λ 0 |α (k) ), which yields the estimate of Λ 0 given β.
Substituting the resulting profile estimator of dΛ 0 in (8) back into Q 12 (β, Λ 0 |α (k) ), we obtain Q 13 (β|α (k) ), which involves β only. In Equation (9), Newton's method and large matrix inversion would be required to update β when there is a large number of covariates.
Here, we further construct minorizing functions for Q 13 (β|α (k) ) to separate the regression parameters β 1 , . . ., β q from each other under the MM principle. We first use the supporting hyperplane inequality −log(x) ≥ −log(x 0 ) − (x − x 0 )/x 0 to minorize Q 13 (β|α (k) ); we then obtain the surrogate function Q 14 (β|α (k) ), where c is a constant not depending on β. We apply Jensen's inequality to the concave function −exp(•) in Q 14 (β|α (k) ) by rewriting X rs β as a convex combination with weights π prs = |X prs |/ ∑ q p=1 |X prs |. Finally, the minorizing function for Q 14 (β|α (k) ) is given in (11), which separates the β p from each other.
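This coordinate-separating step can be checked numerically. The sketch below (with hypothetical x and current iterate β (k) ) verifies that the Jensen minorizer of −exp(X β), built with the weights π p = |x p |/∑ p |x p |, lies below the objective everywhere and touches it at the current iterate.

```python
import math
import random

def neg_exp(x, beta):
    """Objective term -exp(x'beta)."""
    return -math.exp(sum(xp * bp for xp, bp in zip(x, beta)))

def jensen_minorizer(x, beta, beta_k):
    """Jensen surrogate of -exp(x'beta): a sum of univariate terms in the beta_p."""
    s = sum(abs(xp) for xp in x)  # assumes at least one nonzero covariate
    eta_k = sum(xp * bp for xp, bp in zip(x, beta_k))
    total = 0.0
    for xp, bp, bkp in zip(x, beta, beta_k):
        if xp == 0.0:
            continue  # zero covariates contribute nothing to the bound
        pi = abs(xp) / s
        total -= pi * math.exp(xp * (bp - bkp) / pi + eta_k)
    return total

random.seed(1)
x = [0.3, -1.2, 0.7]
beta_k = [0.5, -0.4, 1.0]
# tangency at the current iterate
gap_at_iterate = jensen_minorizer(x, beta_k, beta_k) - neg_exp(x, beta_k)
# minorization at random points around the iterate
minorizes = True
for _ in range(1000):
    beta = [bk + random.uniform(-2, 2) for bk in beta_k]
    if jensen_minorizer(x, beta, beta_k) > neg_exp(x, beta) + 1e-12:
        minorizes = False
```

Because the surrogate is a sum over p of functions of β p alone, each coordinate can be updated by a separate univariate maximization.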
In general, the minorizing function for the objective log-likelihood ℓ(α|Y obs ) in (3) has the structure given in (12), with an explicit-form update of dΛ 0 by (8). Observe that the objective function to be maximized is decomposed into a sum of q + 1 univariate functions in (12), as Q 11 (θ|α (k) ) usually involves only one parameter. The next maximization step of this MM algorithm therefore involves only q + 1 separate univariate optimizations, and matrix inversion is unnecessary. Note that the success of the above profile MM algorithm requires the convergence of two improper integrals over the frailty domain W. Moreover, the convergence of these improper integrals is obvious when the distribution of the random effect comes from an exponential distribution family. The profile MM estimation procedure is summarized as follows:
Step 1. Give initial values for θ, β, and Λ 0 ;
Step 2. Update θ via (6), and update each β p via (11) for p = 1, . . ., q;
Step 3. Based on the update of β, compute the estimates of Λ 0 (t ij ) via (8);
Step 4. Iterate Steps 2 and 3 until convergence.
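Equation (8) is a Breslow-type update. To show its flavor, here is a minimal sketch of the no-frailty special case (frailty weights suppressed, a flat data layout assumed), where the jump of Λ 0 at each observed time is the event indicator divided by the risk-set total of exp(X β):

```python
import math

def breslow_increments(times, delta, eta):
    """Breslow-type jumps dLambda0(t_i) = delta_i / sum_{r: t_r >= t_i} exp(eta_r).

    A sketch of the profile update (8) with frailty weights suppressed;
    eta holds the linear predictors X_r beta.
    """
    jumps = []
    for ti, di in zip(times, delta):
        # risk set at t_i: all subjects whose observed time is at least t_i
        risk = sum(math.exp(er) for tr, er in zip(times, eta) if tr >= ti)
        jumps.append(di / risk)
    return jumps

# with beta = 0 and no ties, the jump at the i-th smallest time is 1/(n - i + 1)
times = [1.0, 2.0, 3.0, 4.0]
delta = [1, 1, 1, 1]
jumps = breslow_increments(times, delta, [0.0] * 4)
```

With β = 0 and no censoring, the jumps reduce to the familiar Nelson-Aalen increments 1/4, 1/3, 1/2, 1.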

Non-Profile MM Estimation Procedure
In this subsection, we bypass the profile estimation procedure of the previous subsection and develop a new MM procedure for Equation (7) that separates the parameter β and the nuisance baseline hazard Λ 0 . Separating β and Λ 0 in (7) amounts to handling the last term −Λ 0 (t ij ) exp(X ij β). As in [26], we use the arithmetic-geometric mean inequality x 1 x 2 ≤ (a 2 /(2a 1 )) x 1 ² + (a 1 /(2a 2 )) x 2 ², where the x i and a i are non-negative and equality holds when x 1 /a 1 = x 2 /a 2 . Choosing the a i as the current iterates in (13), we obtain the corresponding surrogate function for (7).
In order to obtain the nonparametric estimate of Λ 0 , the maximization of (15) is required. For ease of computation, a one-step-late technique is applied to the first-order derivative of Equation (15); we then obtain the estimate of λ 0 by (17), which is the same as (8) in the profile estimation method. To update β, a technique similar to that used for Q 14 (β|α (k) ) is employed: we apply Jensen's inequality to the concave function −exp(•) with the weights defined as before. In the end, the minorizing function for Q 22 (β|α (k) ) is given by (19) for p = 1, . . ., q. As a result, we construct the surrogate function for the objective log-likelihood via the non-profile MM principle as in (20), with an explicit-form update of dΛ 0 by (17). From (20), we find the same nice feature as for Q pro (θ, β|α (k) ): Q nonpro (θ, β|α (k) ) is a sum of q + 1 univariate functions, which means that the next maximization (second M) step only involves q + 1 simple univariate optimizations. It is worth noting that the parameter-separated structure of the proposed profile and non-profile MM algorithms incorporates well with existing simple off-the-shelf accelerators and substantially reduces computation time, as discussed in [25]. The non-profile MM estimation procedure is summarized as follows:
Step 1. Give initial values of θ, β, and Λ 0 ;
Step 2. Update the estimate of θ via (6), and update the estimate of each β p via (19) for p = 1, . . ., q;
Step 3. Using the updated estimate of β, compute the estimates of Λ 0 (t ij ) via (17);
Step 4. Iterate Steps 2 and 3 until convergence.
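The two-factor arithmetic-geometric mean bound used above to split −Λ 0 (t ij ) exp(X ij β) can also be verified numerically. The sketch below (with arbitrary current iterates u k , v k standing in for the current Λ 0 (t ij ) and exp(X ij β (k) )) checks both the minorization and the tangency condition.

```python
import random

def neg_prod(u, v):
    """Objective term -u*v, e.g. u = Lambda0(t) and v = exp(x'beta)."""
    return -u * v

def amgm_surrogate(u, v, uk, vk):
    """AM-GM minorizer of -u*v for u, v > 0:
    -u*v >= -0.5*((vk/uk)*u**2 + (uk/vk)*v**2), with equality at (uk, vk).
    The bound is a separate function of u and of v, decoupling Lambda0 from beta."""
    return -0.5 * ((vk / uk) * u ** 2 + (uk / vk) * v ** 2)

random.seed(2)
uk, vk = 0.8, 1.5  # hypothetical current iterates
checks = []
for _ in range(1000):
    u, v = random.uniform(0.01, 5.0), random.uniform(0.01, 5.0)
    checks.append(amgm_surrogate(u, v, uk, vk) <= neg_prod(u, v) + 1e-12)
tangency_gap = amgm_surrogate(uk, vk, uk, vk) - neg_prod(uk, vk)
```

The decoupled quadratic terms are what allow the closed-form one-step-late update of λ 0 and the separate univariate updates of the β p .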

Regularized Estimation Methods via MM Methods
Following Section 3, both proposed estimation approaches yield nicely parameter-separated surrogate functions to be maximized, which incorporate regularization naturally. In this section, we propose regularized estimation approaches based on the MM algorithms for regression analysis under the general shared frailty model. Many variable selection criteria arise as special cases of the general formulation discussed in [20], where the penalized likelihood function takes the form (21), in which ℓ(θ, β, Λ 0 |Y obs ) is the log-likelihood function for the shared frailty model, q is the dimension of β, P(•, λ) is a given nonnegative penalty function, and λ ≥ 0 is a tuning parameter, which may be allowed to vary as λ p in more general cases. This penalty shrinks some of the coefficients to zero. Within the general frailty framework, the computation of MLEs is extremely complicated, and accuracy is hard to achieve, as the parameters of interest involve three blocks θ, β, and Λ 0 ; the problem becomes even harder when there is a large number of coefficients. It is worth mentioning that the proposed profile and non-profile MM algorithms decompose the coefficient vector β from the other two blocks θ and Λ 0 , and the different coefficient parameters are separated from each other. This nice feature may mesh well with various regularization problems in (21) to produce sparser and more accurate estimates. Under the profile MM algorithmic technique in (12), we obtain the corresponding minorization function for (21) as follows.
Under the non-profile MM algorithmic technique in (20), the minorization function for (21) is as follows.
When P(•, λ) is piecewise differentiable, nondecreasing, and concave on (0, ∞), such as the L 1 , MCP, and SCAD penalties, ref. [27] explored a connection between the local quadratic approximation and the MM algorithm. The penalty term −P(•, λ) can be minorized by a local quadratic approximation of the form (24), which is actually a one-step minorizing process. By combining function (24) with (12) or (20), respectively, we obtain the final surrogate functions (25) and (26) for the penalized likelihood function (21). Both Equations (25) and (26) are written as sums of univariate functions, so maximizing (25) and (26) is easier than directly maximizing (21). Moreover, some simple off-the-shelf accelerators may also be used here to make the optimization problems simpler and more efficient. The regularized estimation procedure via the (profile/non-profile) MM algorithms is summarized as follows:
Step 1. Give initial values of θ, β, and Λ 0 ;
Step 2. Update the estimate of θ via (6);
Step 3. For the profile MM method, update β by maximizing (25); for the non-profile MM method, update β by maximizing (26);
Step 4. Using the updated estimate of β in Step 3, compute the estimates of Λ 0 (t ij ) via (8) for the profile MM method and via (17) for the non-profile MM method, respectively;
Step 5. Iterate Steps 2 to 4 until convergence.
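As an illustration of (24), the following sketch implements the SCAD penalty with the conventional a = 3.7 and its local quadratic approximation at a nonzero β (k) , and checks that the quadratic majorizes the penalty (so −P is minorized) with equality at the current iterate; the values of λ and β (k) are hypothetical.

```python
def scad(t, lam, a=3.7):
    """SCAD penalty P(t, lam) for t >= 0."""
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty for t >= 0."""
    if t <= lam:
        return lam
    if t <= a * lam:
        return (a * lam - t) / (a - 1)
    return 0.0

def lqa(beta, beta_k, lam, a=3.7):
    """Local quadratic approximation of P(|beta|, lam) at beta_k != 0.

    Since P(sqrt(u), lam) is concave in u = beta**2, this tangent quadratic
    majorizes the penalty, giving a one-step minorizer of -P."""
    bk = abs(beta_k)
    return scad(bk, lam, a) + scad_deriv(bk, lam, a) / (2 * bk) * (beta ** 2 - bk ** 2)

lam, beta_k = 0.5, 0.8  # hypothetical tuning parameter and current iterate
majorizes = all(
    lqa(b / 100.0, beta_k, lam) >= scad(abs(b) / 100.0, lam) - 1e-9
    for b in range(-300, 301)
)
gap = lqa(beta_k, beta_k, lam) - scad(beta_k, lam)
```

Because the approximation is quadratic in β, it combines with the separable surrogates (12) or (20) while keeping each coordinate update univariate.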

Model Selection
Following the recent literature, the tuning parameter λ may be selected by multiple data-driven model selection criteria, such as the Bayesian information criterion (BIC, [28]) and generalized cross-validation (GCV, [29]). In this paper, we consider a widely used BIC-type criterion, BIC(λ) = −2ℓ(α̂ λ |Y obs ) + C n log(n) Ŝ, to select the tuning parameter λ, where C n = max{1, log[log(q + 1)]}, q is the dimension of β, and the degrees of freedom Ŝ are defined as the number of nonzero parameters in β.
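A small sketch of computing such a criterion over a path of fitted models follows; the modified form −2ℓ + C n log(n) Ŝ with n the sample size is assumed here, and the log-likelihood values in the usage lines are hypothetical.

```python
import math

def bic_lambda(loglik, beta_hat, n, tol=1e-8):
    """BIC-type criterion, assuming the common modified form
    BIC(lambda) = -2*loglik + C_n * log(n) * S_hat, where
    C_n = max(1, log(log(q + 1))) and S_hat counts nonzero coefficients."""
    q = len(beta_hat)
    s_hat = sum(1 for b in beta_hat if abs(b) > tol)
    c_n = max(1.0, math.log(math.log(q + 1)))
    return -2.0 * loglik + c_n * math.log(n) * s_hat

# evaluate two candidate fits (hypothetical log-likelihoods and coefficients)
b_sparse = bic_lambda(-120.0, [1.2, 0.0, 0.0, 0.7], 300)
b_dense = bic_lambda(-110.0, [1.2, 0.4, -0.3, 0.7], 300)
```

The λ minimizing this criterion is selected; the C n factor inflates the usual log(n) complexity cost as the dimension q grows.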

Numerical Examples
Example 1. We independently simulate data from three different frailty models (Log-normal, Inverse Gaussian, and Gamma) with three different sample sizes (B, M) = {(15, 20), (30, 13), (50, 10)}. The true value of the regression vector β is set to (−2 6 , −1 6 , 1 6 , 2 6 , 3 6 ) with dimension q = 30, and all X i 's are generated from independent uniform distributions between 0 and 0.5. The censoring times are generated from independent uniform distributions to yield censoring proportions of around 15% or 30%. In this example, we numerically illustrate the efficiency of the two proposed profile and non-profile MM algorithms under the three frailty models with three different sample sizes in two censoring situations. Furthermore, we compare the performance of the two proposed algorithms with the existing estimation approach implemented in the coxph function of the survival R package under the gamma and log-normal frailty models, since the coxph function only allows gamma and log-normal frailty. The computation time of an MM algorithm can be improved by using simple off-the-shelf accelerators ([31,32]); here, we implement the accelerated profile MM and non-profile MM algorithms with the squared iterative method (SqS1). We set the stopping criterion of the iteration as follows.
ℓ(α (k+1) |Y obs ) − ℓ(α (k) |Y obs ) ≤ ε for a small tolerance ε. Based on 500 replications, the average values of the estimated frailty and regression parameters (MLE), their biases (Bias), their empirical standard deviations (SD), and run times (T) for the three estimation methods are summarized in Tables 1-6. In general, as the sample size increases, the biases and empirical standard deviations of almost all parameters become smaller. The small sample size (B, M) = (15, 20) always causes obvious biases for the frailty parameters in the Log-normal and Inverse Gaussian frailty models because the number of clusters is too small. Moreover, a larger censoring proportion usually results in greater empirical standard deviations in almost all cases. From Tables 1-4, we also observe that the existing estimation approach using the coxph function is the fastest among the three methods since the survival R package is optimized using the C language. In terms of estimation accuracy, the non-profile MM algorithm performs best for both frailty and regression parameters, almost always exhibiting the smallest biases and empirical standard deviations in all situations. Even with a small sample size such as (B, M) = (15, 20), the non-profile MM estimates of the frailty parameter still perform well, especially for the Log-normal and Gamma frailty models. Example 2. In this example, we simulated 200 realizations consisting of B = 50 clusters and M = 6 subjects in each cluster from a frailty model in which the frailty terms are ω i ∼ Gamma(1/θ, 1/θ) with θ = 0.5 for i = 1, . . 
., B, β = (1, 3, 0 46 , 2, 4), q = 50, and λ 0 (t) = 5. The X i were marginally standard normal, and the correlation between X i and X j is ρ |i−j| with ρ = 0.25, 0.75, respectively. The censoring times were generated from a uniform distribution to yield a censoring proportion of around 30%. In this simulation experiment, the utility of our proposed profile MM and non-profile MM methods for regularized estimation is illustrated in the sparse high-dimensional regression model (21) with three different penalties (L 1 , MCP, and SCAD). The model error (ME) and relative model error (RME), which is the ratio of the model error of the regularized estimator to that of the ordinary maximum likelihood estimator, are calculated to evaluate the estimation accuracy. Based on 200 replications, we report the median of the relative model errors (MRME) and the average numbers of correctly and incorrectly identified zero coefficients in Table 7, in which the column labeled "Correct" presents the average restricted to the true zero coefficients, while the column labeled "Incorrect" depicts the average number of coefficients erroneously set to 0.
From Table 7, we can observe that the proposed profile and non-profile MM algorithms mesh very well with the MCP and SCAD penalties and provide good results in both parameter estimation and variable selection, especially for the MCP penalty. For the L 1 penalty, we observe that it meshes well with the profile and non-profile MM algorithms in the lower-correlation case ρ = 0.25 but tends to yield biased estimates and inaccurate variable selection when the correlation between X i and X j becomes stronger. Furthermore, we report the average values of the estimated parameters (MLE), their biases (Bias), and their empirical standard deviations (SD) under the MCP and SCAD penalties based on 200 repetitions in Table 8. It can be observed that both the profile and non-profile MM methods perform similarly well across the different penalties, always showing small biases and empirical standard deviations.
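For readers who wish to reproduce experiments of this kind, a minimal data-generating sketch for the clustered gamma-frailty setting is given below; the constants (cluster counts, censoring bound, seed) are illustrative, not the paper's exact configuration.

```python
import math
import random

def simulate_cluster_data(B=50, M=6, theta=0.5, beta=(1.0, 3.0),
                          lam0=5.0, cmax=0.4, seed=7):
    """Sketch of clustered gamma-frailty survival data (hypothetical sizes).

    Conditional hazard: lam0 * exp(x'beta) * omega_i, with cluster frailty
    omega_i of shape 1/theta and scale theta (mean 1, variance theta);
    censoring is uniform on (0, cmax). Returns (cluster, t, delta, x) tuples.
    """
    rng = random.Random(seed)
    data = []
    for i in range(B):
        omega = rng.gammavariate(1.0 / theta, theta)
        for _ in range(M):
            x = [rng.uniform(0.0, 0.5) for _ in beta]
            eta = sum(xp * bp for xp, bp in zip(x, beta))
            # constant conditional hazard => event time is Exp(1) / rate
            y = rng.expovariate(1.0) / (lam0 * omega * math.exp(eta))
            c = rng.uniform(0.0, cmax)
            data.append((i, min(y, c), int(y <= c), x))
    return data

data = simulate_cluster_data()
```

Adjusting cmax tunes the censoring proportion, and replacing the gamma draw with a log-normal or inverse Gaussian variate yields the other two frailty settings.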

Real Data Analysis
Alzheimer's Disease (AD) is the most common dementia; it causes progressive memory loss and loss of other bodily functions. According to the Alzheimer's Association, Alzheimer's is the sixth leading cause of death in the United States. Many studies have examined the development of AD from Mild Cognitive Impairment (MCI). The authors of [33] analyzed the AUC score of whether MCI will convert to AD using LASSO-penalized logistic regression. The authors of [34] applied cognitive scores, ApoE genotype, and CSF biomarkers to predict the transition time from MCI to AD. Moreover, there are studies such as [35] that used imaging data to analyze the transition time to dementia. In this paper, we apply clinical data and selected SNP genotypes to conduct feature selection under three different frailty models (Gamma, Inverse Gaussian, and Log-normal) using the SCAD and MCP penalties. The dataset was obtained from the ADNI database (adni.loni.usc.edu). In this dataset, 267 people were recorded as cognitively normal (CN) during the first visit. Among these people, 78 were diagnosed with MCI before the last visit. Eventually, we observed 22 people transition from the MCI stage to dementia. To predict these two conversion times, 19 clinical predictors were applied, mainly based on age, marital status, education level, and test scores such as the Alzheimer's Disease Assessment Scale cognitive subscale (ADAS-Cog) and the Functional Activities Questionnaire (FAQ). In addition to these predictors, we also included genotypes of SNPs from GWAS, such as ApoE-ε4, which are associated with early-onset or late-onset dementia; 132 covariates are selected for model training.
In general, using the same notation as in the simulations, we have (B, M) = (276, 2). Individuals are independent, and the two events of the same individual are grouped into the same cluster and share the same frailty. The dataset contains the following information. For individual i (i = 1, 2, • • • , 276) and event j (j = 1, 2), t ij is the minimum of the event time Y ij and the censoring time C ij . As shown in Table 9, the censoring time C ij in this dataset is the time difference between the first observation date of a certain state (CN or MCI) and the last observation date of the patient. The event time is the state transition time: Y i1 is the time difference between the first observation date of the patient in state CN and the date of transition from CN to MCI. If no transition is observed, censoring occurs, with δ i1 = 0 and t i1 = C i1 ; otherwise, δ i1 = 1 and t i1 = Y i1 . Similarly, Y i2 is the time difference between the first observation date of the patient in state MCI and the date of transition from MCI to dementia. If no transition is observed, δ i2 = 0 and t i2 = C i2 ; otherwise, δ i2 = 1 and t i2 = Y i2 . X ij = (X ij1 , . . ., X ijp ) denotes the potential covariates (such as SNP genotypes and clinical predictors), where p = 132. According to the data description, it is reasonable to assume that the censoring time C ij is independent of the event time Y ij . Moreover, the two events from each individual are considered highly associated due to common genetic factors and/or certain habits. Thus, the frailty model is suitable for analyzing this dataset.
Three frailty models are applied for variable selection using the SCAD and MCP penalties. According to the results presented in Tables 10-12, around 70 covariates were selected for the Gamma frailty and Log-normal frailty models, and 27 covariates were selected by the Inverse Gaussian model. Similar covariates were selected by the SCAD or MCP penalty for both the profile MM and non-profile MM methods under the same model, which is consistent with the simulation results. Among the selected covariates, ApoE-ε4 is selected as having a negative effect on survival time, as expected. In addition to ApoE-ε4, single SNPs such as rs2333227 and rs669 from the MPO and A2M genes are associated with the development of Alzheimer's Disease; similar results can be observed in [36]. However, many selected SNPs are not recognized in other studies, and further analysis is needed to determine whether these SNPs can help to identify the risk of Alzheimer's.
We use the model with the lowest BIC score for prediction by constructing a prognostic index, in order to see whether the selected covariates can help identify individuals with low or high risk. Let X be the collection of the X ij . The full dataset is divided into 10 parts X (k) , k = 1, 2, . . ., 10. Similarly to the cross-validation process, covariates are selected using the BIC criterion without the part X (k) of the dataset, and the corresponding vector of parameters β̂ k is estimated. The risk factor r̂ k = X (k) β̂ k , k = 1, 2, . . ., 10, is then calculated on the dataset X (k) without considering the individuals' latent variables. Individuals with estimated risk factors below the median are classified into the low-risk group, and individuals with estimated risk above the median are classified into the high-risk group. Survival plots for the estimated high-risk and low-risk groups are constructed based on their true survival times. Figure 1 shows that the selected model produces a good estimate of the transition time from stage CN to stage MCI. Due to the lack of data for the second transition, the confidence intervals for that transition overlap. Overall, the prediction result, especially from stage CN to stage MCI, shows good estimation using the corresponding model, and the two risk groups are well distinguished by their predicted risk levels.
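The prognostic-index classification described above can be sketched as follows (hypothetical covariates and coefficients; in this sketch, ties at the median fall into the low-risk group):

```python
import statistics

def risk_groups(X, beta_hat):
    """Classify subjects by the prognostic index r = x'beta_hat:
    at or below the median -> "low", above the median -> "high"."""
    risks = [sum(xp * bp for xp, bp in zip(x, beta_hat)) for x in X]
    med = statistics.median(risks)
    return ["low" if r <= med else "high" for r in risks]

# four hypothetical subjects with two selected covariates each
X = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.9]]
groups = risk_groups(X, [1.0, -1.0])
```

In the analysis, the survival curves of the two resulting groups are then compared against the observed transition times, as in Figure 1.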

Discussion
The profile and non-profile MM algorithms are proposed for high-dimensional regression analysis with clustered failure time data, where a general frailty is used to model within-cluster dependence and penalties such as the SCAD and MCP are used to induce sparsity. The proposed methods separate the high-dimensional minorizing function into a sum of univariate functions after a sequence of minorization steps. These approaches avoid matrix inversion and provide a toolkit for developing more efficient algorithms for a broad range of statistical optimization problems. Meshing well with sparsity penalties such as the SCAD or MCP, the two regularized MM algorithms are further shown to exhibit certain numerical advantages in sparse high-dimensional regression analyses. The shared frailty model is only a special and relatively simple member of the widely used frailty models for multivariate survival data. For example, the standard shared frailty model assumes that all subjects in the same cluster share a common frailty; this assumption can be relaxed to correlated frailty terms among subjects in the same cluster. Correlated frailty models address the limitation that shared frailty models may only be used to fit positively correlated event times. Furthermore, the frailty is assumed to be time-constant; however, unobserved heterogeneity may also be time-dependent, which can be modeled by an unobserved random process that unfolds over time. Based on this idea, several approaches have been proposed, such as modeling the frailty by diffusion processes or Levy processes. Approaches based on birth-death processes, or simpler piecewise-constant frailty models, have also recently been proposed. It would be worthwhile to extend the two proposed MM algorithms to these applications. Lastly, spatially correlated survival data present another important and useful setting that the proposed MM algorithms can be further extended to accommodate.

Figure 1 .
Figure 1. Survival curves of the transition times from CN to MCI and from MCI to AD for the high-risk and low-risk groups, constructed using the prognostic index.

Table 1 .
Simulation results for Log-normal frailty model in Example 1 with different sample sizes at 15% censoring.

Table 2 .
Simulation results for Log-normal frailty model in Example 1 with different sample sizes at 30% censoring.

Table 3 .
Simulation results for Gamma frailty model in Example 1 with different sample sizes at 15% censoring.

Table 4 .
Simulation results for Gamma frailty model in Example 1 with different sample sizes at 30% censoring.

Table 5 .
Simulation results for Inverse Gaussian model in Example 1 with different sample sizes at 15% censoring.

Table 6 .
Simulation results for Inverse Gaussian model in Example 1 with different sample sizes at 30% censoring.

Table 7 .
The median of relative model errors for gamma frailty model by L 1 , MCP, and SCAD penalties with sample size (B, M) = (50, 6) based on 200 realizations in Example 2.

Table 8 .
The average values of estimated parameters (MLE), their biases (Bias), and their empirical standard deviations (SD) under MCP and SCAD penalties in Example 2.

Table 9 .
Illustration of survival years and censoring date for some patients from the ADNI dataset.

Table 10 .
Training results for Gamma Frailty Model.

Table 11 .
Training results for Inverse Gaussian Model.

Table 12 .
Training results for Log-normal Frailty Model.