Abstract
This study addresses the analysis of complex multivariate survival data, where each individual may experience multiple events and a wide range of relevant covariates are available. We propose an advanced modeling approach that extends the classical shared frailty framework to account for within-subject dependence. Our model incorporates a flexible frailty distribution, encompassing well-known distributions, such as gamma, log-normal, and inverse Gaussian. To ensure accurate estimation and effective model selection, we utilize innovative regularization techniques. The proposed methodology exhibits desirable theoretical properties and has been validated through comprehensive simulation studies. Additionally, we apply the approach to real-world data from the Medical Information Mart for Intensive Care (MIMIC-III) dataset, demonstrating its practical utility in analyzing complex survival data structures.
Keywords:
complex data; multivariate survival data; inverse Gaussian; log-normal; general frailty model

MSC:
46N30; 62N01; 62N02
1. Introduction
In recent years, advances in biomedicine, genomics, epidemiology, image processing, and other fields have made high-dimensional data analysis a prominent theme in statistics. A notable example is the MIMIC-III (Medical Information Mart for Intensive Care) study, which aims to identify relevant variables for modeling and predicting two distinct periods: from ICU admission to discharge, and from discharge to death. This study has generated high-dimensional, correlated multivariate failure time data, which are commonly analyzed with shared frailty models [1]. Specifically, the Cox model or relative risk model [2] with frailty is employed to incorporate dependence and assess the effects of covariates [3].
However, the observed likelihood of the Cox model with shared frailty often lacks an analytic form, except for the gamma frailty model. Consequently, parameter estimation for general frailty models is computationally challenging. To perform variable selection for frailty models, Ref. [4] first proposed a theory based on SCAD-penalized regression. Ref. [5] developed an algorithm that utilizes the Laplace transform to handle frailty distributions in their general form. Ref. [6] applied the Laplace approximation of the full likelihood and developed an R package to implement their method. However, the methods mentioned above employ the Newton–Raphson method in their algorithms, resulting in a significant computational burden when dealing with high-dimensional covariates. To avoid high-dimensional matrix inversion in the Newton–Raphson method, we use the MM (minorize-maximization) principle to obtain the nonparametric maximum likelihood estimates for general frailty models; the objective likelihood derived from the MM algorithm increases monotonically and has reliable convergence properties when initialized appropriately [7].
In this paper, we present an MM algorithm for the Cox model with general frailties, highlighting its applicability in high-dimensional scenarios. We introduce a regularized estimation method in which our algorithm decomposes the objective function with high-dimensional parameters into separable functions with low-dimensional parameters. This approach integrates seamlessly into the analysis of multivariate failure time data, effectively handling the challenges posed by its high-dimensional nature. The LASSO is a commonly used penalty, but it can introduce notable bias in the resulting estimator. Therefore, we employ concave penalties such as SCAD [8] and MCP to conduct variable selection that yields consistent estimates.
The structure of the paper is as follows: In Section 2, we introduce the formulation of the Cox model with general frailties for high-dimensional multivariate failure time data. Section 3 presents our proposed MM algorithm for estimating model parameters. In Section 4, we introduce a regularized estimation method using the profile MM algorithm, leveraging sparse regression assumptions with high-dimensional parameters. We establish the convergence properties in Section 5. To assess the finite sample performances, we conducted simulation studies as described in Section 6. In Section 7, we illustrate the proposed methods, using the aforementioned MIMIC-III dataset. Finally, we provide further discussion in Section 8.
2. Data and Model Formulation
Assume there are J different types of events and n patients in the study. Denote the event time for the i-th studied subject and its j-th event type by $T_{ij}$, and denote the corresponding censoring time by $C_{ij}$. We use $Y_{ij} = \min(T_{ij}, C_{ij})$ to denote the observed time, let $\Delta_{ij} = I(T_{ij} \le C_{ij})$ be the censoring indicator ($\Delta_{ij} = 0$ for a right-censored observation; $\Delta_{ij} = 1$ otherwise), and let $X_{ij}$ be the high-dimensional covariates. The observed data consist of $\{(Y_{ij}, \Delta_{ij}, X_{ij}) : i = 1, \dots, n,\ j = 1, \dots, J\}$. Given a noninformative censoring assumption, the Cox model with general frailties assumes that, conditional on the subject-specific frailty $\omega_i$, the event times of subject i are independent of one another, and the conditional hazard function of $T_{ij}$ given $X_{ij}$ and $\omega_i$ takes the form
$$\lambda_j(t \mid X_{ij}, \omega_i) = \omega_i\, \lambda_{0j}(t) \exp(X_{ij}^{\top} \beta), \qquad (1)$$
where the $\omega_i$ are the individual-level frailty variables, independently and identically distributed with density function $f(\omega; \gamma)$. The baseline hazard of the j-th event is denoted by $\lambda_{0j}(t)$, and $\beta$ is a q-dimensional parameter vector. We denote by $\Lambda_0 = (\Lambda_{01}, \dots, \Lambda_{0J})$, with $\Lambda_{0j}(t) = \int_0^t \lambda_{0j}(s)\, ds$, the cumulative baseline hazard functions of all events. The model parameters thus comprise three components, i.e., the frailty parameter $\gamma$, the regression coefficients $\beta$, and the nonparametric components $\Lambda_0$. To simplify the wording, we define $\theta = (\gamma, \beta, \Lambda_0)$. The observed-data likelihood function of model (1) can be written as
$$L_n(\theta) = \prod_{i=1}^{n} \int_0^{\infty} \prod_{j=1}^{J} \left[\omega\, \lambda_{0j}(Y_{ij})\, e^{X_{ij}^{\top}\beta}\right]^{\Delta_{ij}} \exp\!\left\{-\omega\, \Lambda_{0j}(Y_{ij})\, e^{X_{ij}^{\top}\beta}\right\} f(\omega; \gamma)\, d\omega. \qquad (2)$$
Generally, the Laplace transform of the frailty distribution has no closed form. Hence, we cannot obtain an explicit expression for the marginal likelihood in (2). In the following, we apply the MM principle to deal with this intractable case and to separate parameters when the number of model parameters is large, yielding fast and accurate estimation. The MM principle is as follows: the minorization step first constructs a surrogate function $g(\theta \mid \theta^{(k)})$, which satisfies
$$g(\theta \mid \theta^{(k)}) \le \ell_n(\theta) \ \text{ for all } \theta, \qquad g(\theta^{(k)} \mid \theta^{(k)}) = \ell_n(\theta^{(k)}), \qquad (3)$$
where $\theta^{(k)}$ denotes the current estimate of $\theta$ in the k-th iteration. The function $g(\cdot \mid \theta^{(k)})$ always lies under $\ell_n(\cdot)$ and is tangent to it at the point $\theta = \theta^{(k)}$. The maximization step then updates $\theta^{(k)}$ by $\theta^{(k+1)} = \arg\max_{\theta}\, g(\theta \mid \theta^{(k)})$, which maximizes the surrogate function instead of $\ell_n$; the domination and tangency conditions in (3) then guarantee the ascent property $\ell_n(\theta^{(k+1)}) \ge g(\theta^{(k+1)} \mid \theta^{(k)}) \ge g(\theta^{(k)} \mid \theta^{(k)}) = \ell_n(\theta^{(k)})$.
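To make the MM recipe concrete, the following is a minimal toy sketch in R (our own illustration, not the paper's surrogate): the sample median minimizes $\sum_i |x_i - \theta|$, and the quadratic majorizer $|r| \le r^2/(2|r_0|) + |r_0|/2$, tangent at the current iterate, turns each update into a weighted mean. The paper applies the mirror image of this idea, minorize-maximize, to the log-likelihood.

```r
# Toy MM iteration (majorize-minimize variant): compute the sample
# median by repeatedly minimizing a quadratic surrogate tangent at the
# current iterate. Purely illustrative, not the paper's surrogate.
mm_median <- function(x, theta = mean(x), tol = 1e-10) {
  repeat {
    w <- 1 / pmax(abs(x - theta), 1e-12)  # majorizer weights, guarded near 0
    theta_new <- sum(w * x) / sum(w)      # closed-form surrogate minimizer
    if (abs(theta_new - theta) < tol) return(theta_new)
    theta <- theta_new
  }
}
mm_median(c(2.1, 3.5, 0.7, 8.2, 4.4))     # approaches the median, 3.5
```

Each step improves the objective because the surrogate touches it at the current iterate and dominates it elsewhere; the minorize-maximize version used below enjoys the symmetric ascent property.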
3. Likelihood Function and the Proposed Methodologies
From (2), we then formulate the log-likelihood function as
where
Given the following weight function,
then we can rewrite the objective function,
According to the measure-theoretic form of Jensen's inequality,
$$\varphi\!\left(\int_{\mathbb{R}} h(x)\, f(x)\, dx\right) \ \ge\ \int_{\mathbb{R}} \varphi\big(h(x)\big)\, f(x)\, dx, \qquad (6)$$
where $\varphi$ is concave, $f$ is the corresponding density function on the real line $\mathbb{R}$, and $h$ is an arbitrary function on $\mathbb{R}$. Then, applying inequality (6) to Equation (5), we let $f$ be the corresponding density given by the weight function (4), with $h$ defined accordingly, and $\varphi$ takes the concave log function. Thus, we have
where the remaining term is a constant. Then, substituting the weight function in Equation (4), we can construct the following surrogate function for the log-likelihood:
where
with parameter $\gamma$ only. And
where the weight terms are evaluated at the current iterate. Therefore, the minorization successfully separates the parameters $\gamma$, $\beta$, and $\Lambda_0$ into (8) and (9), correspondingly. Then, in the second M-step, $\gamma$ will be updated by maximizing (8) numerically. However, updating $\beta$ by maximizing (9) is challenging, due to the presence of the nonparametric components $\Lambda_0$. Therefore, to tackle this issue, following [9], we utilize their profile estimation approach in (9) to profile out $\Lambda_0$ given $\beta$, which results in the estimate of $\Lambda_{0j}$ as
By substituting (10) into (9), we further have
where the remaining term is a constant. Therefore,
which includes the parameter $\beta$ only. It is obvious that (11) is of the same form as the Cox model's log partial likelihood, so the modified Cox regression technique can be applied to obtain updated estimates of $\beta$; however, this procedure applies Newton's method, which involves matrix inversion and is computationally inefficient with a large number of covariates. As the MM algorithm helps to separate the estimation of parameters, here we treat (11) as a new objective function and construct a minorizing function by reformulating the high-dimensional parameter estimation problem into a collection of low-dimensional optimizations. Here, we use the hyperplane inequality, which is generally applied in MM algorithms [10]:
In inequality (12), we take
Then,
with a constant remainder term. Then we have the following surrogate function:
For the finite form of Jensen's inequality:
$$\varphi\!\left(\sum_{l} \pi_l\, x_l\right) \ \ge\ \sum_{l} \pi_l\, \varphi(x_l), \qquad (14)$$
where the $\pi_l$ are positive weights with $\sum_l \pi_l = 1$ and $\varphi$ is concave. As in [11], the concave function can then adopt inequality (14), and a part of the surrogate can be rewritten as
where the weight plays the role of $\pi_l$ in Jensen's inequality. Thus,
Then, substituting (15) into the surrogate, the minorizing function can be obtained as
where
for $r = 1, \dots, q$. In the first M-step of the profile MM algorithm, the final minorizing function, using the profiled method, for the objective log-likelihood is
with the update of $\Lambda_0$ by (10). From (18), maximizing the original objective function can be transformed into maximizing a collection of univariate functions, as each component $\beta_r$ entering (17) is one-dimensional in most cases. Thus, the next M-step is conducted by optimizing univariate objective functions separately, without inefficient matrix inversion. Note that two improper integrals over the frailty distribution appear in these expressions. While our model is designed to address general frailty forms, these improper integrals may not be computationally retrievable in extremely sparse and high-dimensional cases. However, as long as these integrals can be numerically calculated, our MM algorithm will always converge. We therefore propose the following estimation procedure.
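As a rough illustration of the two computational ingredients this section requires, numerical evaluation of the improper frailty integrals and coordinate-wise univariate updates, consider the following R sketch. The log-normal density, the particular moment computed, and the helper names are our assumptions for illustration; the exact integrands follow from the surrogate (17) above.

```r
# Numerical evaluation of a frailty integral of the generic form
# \int_0^Inf w^p exp(-w * H) f(w) dw, for a density f with no
# closed-form Laplace transform (log-normal here, an assumed example);
# H stands for a subject's accumulated hazard.
frailty_integral <- function(dens, p, H) {
  integrate(function(w) w^p * exp(-w * H) * dens(w),
            lower = 0, upper = Inf)$value
}
dens_lognormal <- function(w) dlnorm(w, meanlog = 0, sdlog = 0.5)
frailty_integral(dens_lognormal, p = 1, H = 0.8)  # numeric moment

# Coordinate-wise update of beta: each beta_r is refreshed by a 1-D
# search on its own surrogate, so no matrix inversion is needed.
# surrogate_r(r, b, beta) is a placeholder for the univariate function in (17).
update_beta <- function(beta, surrogate_r) {
  for (r in seq_along(beta)) {
    beta[r] <- optimize(function(b) surrogate_r(r, b, beta),
                        interval = beta[r] + c(-2, 2),
                        maximum = TRUE)$maximum
  }
  beta
}
```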
4. The Algorithms
The parameter-separated surrogate functions proposed in Section 3 cope well with sparsity-inducing penalties such as SCAD and MCP. Therefore, in this section, a regularized estimation method is proposed, using the MM principle discussed in Section 3. Many variable selection penalties are special cases of the general form given in [8], and the likelihood function incorporating the penalty term is written as
where the log-likelihood is as discussed in the previous section, with q-dimensional $\beta$, and $\lambda_n$ is a non-negative tuning parameter, which can also vary across coefficients in more general cases. Variable selection is realized by shrinking some of the coefficients to zero, using a given penalty. With general frailties, the computation of the MLEs is rather complicated, as the parameters involve three parts, $(\gamma, \beta, \Lambda_0)$, and it is even more challenging when the number of parameters is high-dimensional, which is indeed our case. As discussed in the previous section, our proposed profile MM algorithm separates all these parameters, which leads to efficient and accurate estimation. This nice property of our proposed profile MM algorithm meshes well with the different kinds of regularization penalties in (19), producing efficient and accurate sparse estimation. Using the same profile MM strategies as in (18), with $g$ the surrogate function for the log-likelihood, the corresponding minorization function for the penalized likelihood can be expressed as
where the penalty is a concave function that is nondecreasing and piecewise differentiable on $[0, \infty)$. Ref. [12] proposed an approximation approach for the penalty term, using quadratic functions:
Combining function (21) with (18):
Therefore, the final surrogate function based on the MM algorithm for the penalized log-likelihood (19) is written as
Equation (22) decomposes the original maximization problem (19) into a sum of univariate functions, which is more computationally efficient. Moreover, off-the-shelf accelerators can be applied to improve the efficiency of the optimization. We then propose an alternative estimation procedure.
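For concreteness, here is a hedged R sketch of the quadratic penalty approximation of [12] combined with a per-coordinate quadratic surrogate step. The curvature term `hess`, the gradient `grad`, and the guard `eps` are our illustrative assumptions; the SCAD and MCP derivatives are the standard forms (with $a = 3.7$ and $a = 3$, as used in Section 6).

```r
# First derivatives of the SCAD and MCP penalties (standard forms)
scad_deriv <- function(t, lambda, a = 3.7) {
  lambda * (t <= lambda) + pmax(a * lambda - t, 0) / (a - 1) * (t > lambda)
}
mcp_deriv <- function(t, lambda, a = 3) pmax(lambda - t / a, 0)

# One penalized coordinate update: near the current iterate b0 the
# penalty is replaced by the quadratic p'(|b0|) * b^2 / (2 (|b0| + eps)),
# which majorizes it; grad and hess are the gradient and curvature of
# the unpenalized univariate surrogate at b0 (assumed supplied).
penalized_step <- function(b0, grad, hess, lambda,
                           pderiv = scad_deriv, eps = 1e-6) {
  shrink <- pderiv(abs(b0), lambda) / (abs(b0) + eps)
  (hess * b0 + grad) / (hess + shrink)  # maximizer of the penalized quadratic
}
```

In practice, coefficients whose updates collapse toward zero across iterations are reported as zero by a small threshold, which is how sparse solutions arise.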
In the literature, there are different criteria for selecting the tuning parameter $\lambda_n$, such as the BIC (Bayesian information criterion [13]) and GCV (generalized cross-validation [14]). In this paper, a BIC-type criterion is applied, which is defined by
$$\mathrm{BIC}(\lambda_n) = -2\,\ell_n(\hat{\theta}_{\lambda_n}) + e_n \log n,$$
where $\hat{\theta}_{\lambda_n}$ is the penalized estimate with q-dimensional $\beta$, and $e_n$ is the degrees of freedom, i.e., the number of estimates in $\hat{\beta}$ with nonzero values. To determine the optimal $\lambda_n$, we use a method similar to that commonly used in the R package "glmnet" [15]. We first conduct a coarse search using the BIC criterion to determine an appropriate range of $\lambda_n$; next, we select the optimal $\lambda_n$ using a grid search within this determined range.
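A sketch of this tuning search, with `fit_penalized` standing in for one run of the regularized profile MM algorithm (a placeholder, not a function from any released package):

```r
# BIC-type selection of the tuning parameter: fit over a lambda grid
# and keep the value minimizing -2 * loglik + df * log(n), where df is
# the number of nonzero estimated coefficients.
select_lambda <- function(lambdas, n, fit_penalized) {
  bic <- vapply(lambdas, function(l) {
    fit <- fit_penalized(l)              # assumed to return $loglik, $beta
    df  <- sum(fit$beta != 0)
    -2 * fit$loglik + df * log(n)
  }, numeric(1))
  list(lambda = lambdas[which.min(bic)], bic = bic)
}
# Coarse-to-fine use: scan a wide log-spaced grid first, then a finer
# grid around the coarse winner, mirroring the two-stage search above.
```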
5. Theoretical Properties
The convergence properties of both the profile MM algorithm and the regularized MM algorithm are established in this section. Let $g(\theta \mid \theta^{(k)})$ be the minorizing function based on the original objective $\ell(\theta)$, where $\theta$ denotes the parameters and $\theta^{(k)}$ is the estimate at the k-th iteration. The general convergence of the MM algorithm is provided by [16], as stated in Lemma 1 below. We denote by $M(\varphi)$ the maximizer of $g(\theta \mid \varphi)$ over $\theta$, given the following regularity conditions:
- I.
- The parameter set $\Theta$ is open.
- II.
- The objective function is continuously differentiable.
- III.
- The level set $\{\theta \in \Theta : \ell(\theta) \ge \ell(\theta^{(0)})\}$ is compact in $\Theta$.
- IV.
- The surrogate function $g(\theta \mid \varphi)$ is continuously differentiable in $\theta$ and continuous in $\varphi$.
- V.
- The stationary points of the objective function are isolated.
- VI.
- The surrogate function $g(\cdot \mid \varphi)$ has a unique global maximum.
Lemma 1.
Let $\{\theta^{(k)}\}$ denote a sequence generated by the MM algorithm, $\theta^{(k+1)} = M(\theta^{(k)})$:
- (i)
- $M(\varphi)$ is continuous at $\varphi$ if VI is satisfied.
- (ii)
- If I–VI are satisfied, then for any initial value $\theta^{(0)}$, $\theta^{(k)}$ tends to a stationary point $\theta^*$ as $k \to \infty$. Moreover, $M(\theta^*) = \theta^*$, and the likelihood sequence $\{\ell(\theta^{(k)})\}$ strictly increases to $\ell(\theta^*)$ if $\theta^{(k+1)} \neq \theta^{(k)}$ for all k.
Therefore, based on this Lemma, given Condition 1 below, we have the convergence of our MM algorithm.
Condition 1.
Conditions for the convergence of the profile MM algorithm:
- (a)
- The observed log-likelihood is continuously differentiable.
- (b)
- The parameter sets of γ and β are compact.
- (c)
- The stationary points of the objective function are isolated.
Theorem 1.
If Condition 1 holds, then for any initial value of $\theta$, the profile MM algorithm (Algorithm 1) converges to a stationary point of the objective.
Algorithm 1 Estimating Procedure.
- S1. Provide initial values for the parameters γ, β, and $\Lambda_0$.
- S2. Update the estimate of the parameter γ, using (8).
- S3. Update the estimates of the covariate parameters β, using (17), for $r = 1, \dots, q$.
- S4. Compute the estimate of $\Lambda_0$, using (10), with β estimated in S3.
- S5. Repeat S2 to S4 until convergence.
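In code, one sweep of Algorithm 1 looks roughly like the skeleton below; `update_gamma_step`, `update_beta_step`, `profile_lambda_step`, and `obs_loglik` are placeholders for the maximization of (8), the coordinate-wise updates via (17), the plug-in (10), and the observed log-likelihood, not functions from any released package.

```r
# Skeleton of the profile MM loop (S1-S5); monotone ascent of the
# observed log-likelihood gives a natural stopping criterion.
profile_mm <- function(init, data, tol = 1e-7, maxit = 1000) {
  gamma <- init$gamma; beta <- init$beta; Lambda <- init$Lambda   # S1
  ll_old <- -Inf
  for (k in seq_len(maxit)) {
    gamma  <- update_gamma_step(gamma, beta, Lambda, data)        # S2
    beta   <- update_beta_step(beta, gamma, Lambda, data)         # S3
    Lambda <- profile_lambda_step(beta, gamma, data)              # S4
    ll <- obs_loglik(gamma, beta, Lambda, data)
    if (ll - ll_old < tol) break                                  # S5
    ll_old <- ll
  }
  list(gamma = gamma, beta = beta, Lambda = Lambda, loglik = ll)
}
```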
Proof.
The initial minorizing function satisfies the condition that
Then, after profiling out $\Lambda_0$,
from (18), the profiled surrogate is a unimodal function with a unique global maximum, which verifies Condition VI. Condition I follows directly from the form of the parameter set. Condition II follows from Condition 1(a), and hence IV is also satisfied. Condition V follows from Condition 1(c). It remains to verify Condition III. It follows from the continuity of the objective that the level set is closed. If the parameter is unbounded, the objective diverges to $-\infty$; by contrast, it is bounded when the parameter is bounded. Combined with Condition 1(b), III is satisfied. Thus, by Lemma 1, the profile MM algorithm is convergent. □
Theorem 2.
If Condition 1 holds, then for any initial value of $\theta$, the regularized profile MM algorithm (Algorithm 2) converges to a stationary point of the penalized objective.
Algorithm 2 An alternative method.
- S1. Provide initial values for the parameters γ, β, and $\Lambda_0$.
- S2. Update the estimate of the parameter γ, using (8).
- S3. Under the profile MM method, update β by maximizing the penalized surrogate (22).
- S4. Compute the estimate of $\Lambda_0$, using (10), with β estimated in S3.
- S5. Repeat S2 to S4 until convergence.
Proof.
Note that, after profiling out $\Lambda_0$, the surrogate function consists of separable components, each with a unique global maximum, which verifies Condition VI. The other Conditions, I–V, are verified as in the proof of Theorem 1. Thus, by Lemma 1, the regularized profile MM algorithm is convergent. □
6. Simulation Study
Example 1.
We conducted simulations for the frailty models stated below:
with two events ($j = 1, 2$), a fixed sample size n, and a specified true frailty parameter. We set the true coefficient vector $\beta_0$ with dimension q. The covariates $X_{ij}$ were generated independently from a uniform distribution. The generation process ensured that the censoring rate fell within a controlled range.
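As the exact simulation settings are elided above, the following R sketch shows one assumed configuration for generating data of this type under a shared gamma frailty; the true coefficients, frailty variance, baseline hazard, and censoring bound are all illustrative choices, not the paper's values.

```r
set.seed(1)
n <- 400; J <- 2; q <- 5
beta0 <- c(1, -1, 0.5, 0, 0)              # assumed true coefficients
omega <- rgamma(n, shape = 2, rate = 2)   # shared gamma frailty, mean 1
sim <- do.call(rbind, lapply(seq_len(J), function(j) {
  X   <- matrix(runif(n * q), n, q)       # Uniform(0,1) covariates
  Tij <- rexp(n, rate = omega * drop(exp(X %*% beta0)))  # unit baseline hazard
  Cij <- runif(n, 0, 3)                   # tune the bound for the target
                                          # censoring rate
  data.frame(id = seq_len(n), event = j,
             time = pmin(Tij, Cij), status = as.numeric(Tij <= Cij), X)
}))
mean(1 - sim$status)                      # realized censoring proportion
```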
We tested the performance of our proposed profile MM algorithms on the three different frailty models given above. Computationally, our algorithm separates the estimation of all parameters into univariate objectives, making it easily adaptable to off-the-shelf accelerators. Utilizing simple off-the-shelf accelerators [17], we can speed up our MM algorithms. As described in [17], there are several types of accelerators that utilize quasi-Newton approximation; for this article, we adopted the squared iterative (SqS1) approach. The simulation results are presented in Table 1, where (MLE) denotes the average estimates of the regression and frailty parameters, (Bias) denotes their biases, and (SD) denotes their estimated standard deviations, computed empirically across replications (for a parameter $\beta_r$, $\mathrm{SD} = \{\frac{1}{N-1}\sum_{m=1}^{N}(\hat{\beta}_r^{(m)} - \bar{\hat{\beta}}_r)^2\}^{1/2}$, where $\hat{\beta}_r^{(m)}$ is the estimate of $\beta_r$ in the m-th of N replications). (K) is the number of iterations until convergence, (T) is the total run time in seconds, and (L) denotes the final value of the objective function. The results indicate that our algorithms exhibited fast convergence and performed well across all three frailty models. Our algorithms also demonstrated good estimation accuracy, as evidenced by the nearly unbiased results and the small estimated standard deviations in all cases.
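For reference, here is a minimal sketch of one common form of the squared iterative scheme (the steplength labeled SqS1 in the acceleration literature), wrapped around a generic fixed-point map `mm_update` that stands in for one full S2-S4 sweep; the safeguard rule is our simplification.

```r
# Squared iterative acceleration (SqS1 steplength): extrapolate from
# two consecutive MM updates using first and second differences.
squarem_sqs1 <- function(theta, mm_update, tol = 1e-8, maxit = 500) {
  for (k in seq_len(maxit)) {
    theta1 <- mm_update(theta)
    theta2 <- mm_update(theta1)
    r <- theta1 - theta                   # first difference
    v <- theta2 - 2 * theta1 + theta      # second difference
    alpha <- sum(r * v) / sum(v * v)      # SqS1 steplength
    theta_acc <- theta - 2 * alpha * r + alpha^2 * v  # extrapolated point
    if (!all(is.finite(theta_acc))) theta_acc <- theta2  # safeguard
    if (sqrt(sum((theta_acc - theta)^2)) < tol) return(theta_acc)
    theta <- theta_acc
  }
  theta
}
```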
Table 1.
Simulation results of Example 1.
Example 2.
In this example, we conducted a simulation for multivariate frailty models:
for $j = 1, \dots, J$, with sample size $n = 400$. Here, we set the true coefficient vector as $\beta_0$ with dimension q. We generated the covariates from a multivariate normal distribution with mean zero and a covariance matrix of first-order autoregressive form, $\mathrm{Cov}(X_{ij,l}, X_{ij,m}) = \rho^{|l-m|}$. Similar to Example 1, observations were censored, with the censoring proportion controlled within a prespecified range.
In the following simulation study, we tested the correctness of variable selection by our proposed regularized profile MM methods in a high-dimensional regression model, using the MCP and SCAD penalties under high parameter sparsity in (19). The first derivative of the SCAD penalty is $p'_{\lambda}(\theta) = \lambda \{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a-1)\lambda} I(\theta > \lambda) \}$, and that of the MCP is $p'_{\lambda}(\theta) = (\lambda - \theta/a)_+$. Following [4,18], we took $a = 3.7$ for SCAD and $a = 3$ for MCP. To assess the performance of the model selection method, we conducted 200 replications. Based on these replications, we calculated the probability of obtaining the true model, i.e., the model in which all covariates with non-zero parameters were estimated as non-zero, while covariates with zero parameters were not selected. This measure evaluates how well the method selects the model that accurately represents the underlying data. We counted the number of correctly identified zero coefficients and the number of non-zero parameters erroneously estimated as zero in each simulation run; the averages are shown in Table 2. The column titled "Correct" denotes the average count of true zero coefficients with zero estimated value, and the column titled "Incorrect" denotes the average count of parameters whose true value was non-zero but whose estimate was zero. We observe that the method consistently provided correct variable selection results across the three frailty models. Furthermore, we report the average parameter estimates, their biases, and the corresponding standard deviations over 200 replications in Table 3. Our proposed profile MM algorithms effectively handled penalties such as MCP and SCAD, producing accurate estimation results for various frailties. In particular, the inverse Gaussian frailty model with the SCAD penalty demonstrated excellent performance. Moreover, the estimates from the SCAD penalty were consistently more accurate than those from the MCP penalty. The results in Table 2 and Table 3 highlight the efficacy of our approach in dealing with different penalty functions and its ability to yield reliable estimation results for various types of frailty models.
Table 2.
Simulation results based on three frailty models using two types of penalties. The sample size was 400, and 200 replications were conducted in Example 2.
Table 3.
The estimated parameters' MLE, Bias, and SD (standard deviation). The sample size was 400 and the number of replications was 200 in Example 2.
7. Real Data Application
MIMIC-III [19] is a dataset that includes information about patients who have been admitted to a critical care unit for various diseases. In this study, we specifically focused on a subset of 8516 patients who were admitted due to respiratory disease. The dataset included 82 covariates, which consisted of clinical variables such as mean blood pressure, heart rate, height, and other features, like whether the patient was transferred from another hospital. Two events were considered in our analysis. The first event was the time from admission to discharge from the ICU, and the second event was the time from discharge to death.
Three frailty models (gamma, log-normal, and inverse Gaussian) were fitted, using the regularized profile MM estimation method with both the MCP and SCAD penalties. The results are given in Table 4. The results obtained using the MCP and SCAD penalties were similar, as both selected the same set of significant variables under the BIC criterion. For both the log-normal and the inverse Gaussian models, a total of 11 significant covariates were selected. These covariates were primarily clinical variables, as indicated by Table 5 and Table 6. Among the 11 clinical variables considered significant, more than half were numerical variables, such as height; these numerical variables appear more relevant to the two events being studied, whereas other types of variables, such as categorical variables, may have a less direct relationship with the events. In addition to the clinical variables, patients with booked admission (ELECTIVE) showed a significant positive effect on the two events, suggesting that patients with planned admissions may have better outcomes than those admitted in emergency situations. On the other hand, variables related to patients' personal information, such as marital status, had smaller effects on the two events, indicating that these personal factors may have less influence on the outcomes being studied. For the gamma frailty model, as presented in Table 7, more significant covariates were found: for example, ethnicity was considered significant. It is reasonable that different frailty models generate different estimates; however, it is worth noting that the gamma frailty model selected a larger number of weak features, including ethnicity, which may not be preferable in this particular case. Regardless of the choice of frailty, seven key features were consistently selected by all models. This suggests that our model can consistently select the features that play a vital role in reflecting the length of a patient's stay in the ICU.
Table 4.
The minimum BIC scores (BIC) and number of significant variables from the three frailty models.
Table 5.
Parameter estimates for covariates using the inverse Gaussian frailty model.
Table 6.
Parameter estimates for covariates, using the log-normal frailty model.
Table 7.
Parameter estimates for covariates, using the gamma frailty model.
8. Discussion
We have developed an innovative algorithm for parameter estimation in the analysis of complex multivariate failure time data with high-dimensional covariates. This type of data presents challenges, due to its intricate nature and the presence of multiple correlated survival outcomes. Estimating frailty models, which incorporate random effects to capture unobserved heterogeneity, typically requires nonparametric maximum likelihood estimation, due to the unknown baseline hazard function. Most existing research applies the gamma frailty model because it provides an explicit formula for parameter estimation in each iteration, reducing the computational burden. By contrast, our method has the advantage of being applicable to general frailty models and providing efficient estimation results, even in high-dimensional cases, through the incorporation of regularization methods.
Our proposed algorithm efficiently addresses these challenges by accurately estimating the high-dimensional parameters. Our method avoids the Laplace approximation and the traditional Newton–Raphson method, which can result in inaccurate and inefficient estimation when dealing with high-dimensional data. A significant contribution of our method is the utilization of a decomposition approach, which splits the minorizing function into a collection of univariate functions. This approach improves computational efficiency and ensures robust parameter estimation. In high-dimensional cases, the traditional EM algorithm for parameter estimation of frailty models can lead to significant computational costs. In practical applications involving various clinical and genetic covariates, our algorithm enables the estimation procedure to be completed within a reasonable timeframe, thus facilitating effective identification of key features.
Importantly, our algorithm can be applied to various settings and complexities of multivariate survival data without relying on specific regularization techniques. In addition to the multiple event data modeled in this article, our algorithm can be further extended, to handle data with different structures of correlation, such as recurrent event data and clustered survival data with general frailties. The package “frailtyMMpen” can be downloaded from CRAN, providing convenient support for setting up various types of survival data and regularization methods in the model. By leveraging this algorithm, researchers and practitioners can effectively analyze and gain insights from real-world applications of multivariate survival data. Understanding the relationships between covariates and survival outcomes is crucial for informed decision-making and predictive modeling in these scenarios.
Author Contributions
Data curation, X.H. and Y.Z.; Formal analysis, Y.Z. and X.H.; Investigation, X.H. and Y.Z.; Methodology, X.H. and Y.Z.; Project administration, J.X. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Clayton, D.G. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 1978, 65, 141–151. [Google Scholar] [CrossRef]
- Cox, D.R. Regression models and life-tables (with discussion). J. R. Stat. Soc. B 1972, 34, 187–220. [Google Scholar]
- Andersen, P.K.; Klein, J.P.; Knudsen, K.M.; Tabanera y Palacios, R. Estimation of variance in Cox’s regression model with shared gamma frailties. Biometrics 1997, 53, 1475–1484. [Google Scholar] [CrossRef] [PubMed]
- Fan, J.; Li, R. Variable selection for Cox’s proportional hazards model and frailty model. Ann. Stat. 2002, 30, 74–99. [Google Scholar] [CrossRef]
- Androulakis, E.; Koukouvinos, C.; Vonta, F. Estimation and variable selection via frailty models with penalized likelihood. Stat. Med. 2012, 31, 2223–2239. [Google Scholar] [CrossRef] [PubMed]
- Groll, A.; Hastie, T.; Tutz, G. Selection of effects in Cox frailty models by regularization methods. Biometrics 2017, 73, 846–856. [Google Scholar] [CrossRef] [PubMed]
- Becker, M.P.; Yang, I.; Lange, K. EM algorithms without missing data. Stat. Methods Med. Res. 1997, 6, 38–54. [Google Scholar] [CrossRef] [PubMed]
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
- Johansen, S. An extension of Cox’s regression model. Int. Stat. Rev. 1983, 51, 165–174. [Google Scholar] [CrossRef]
- Hunter, D.R.; Lange, K. A tutorial on MM algorithms. Am. Stat. 2004, 58, 30–37. [Google Scholar] [CrossRef]
- Ding, J.; Tian, G.L.; Yuen, K.C. A new MM algorithm for constrained estimation in the proportional hazards model. Comput. Stat. Data Anal. 2015, 84, 135–151. [Google Scholar] [CrossRef]
- Hunter, D.R.; Li, R. Variable selection using MM algorithms. Ann. Stat. 2005, 33, 1617–1642. [Google Scholar] [CrossRef] [PubMed]
- Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
- Craven, P.; Wahba, G. Smoothing noisy data with spline functions. Numer. Math. 1978, 31, 377–403. [Google Scholar] [CrossRef]
- Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1–22. [Google Scholar] [CrossRef] [PubMed]
- Vaida, F. Parameter convergence for EM and MM algorithms. Stat. Sin. 2005, 15, 831–840. [Google Scholar]
- Zhou, H.; Alexander, D.; Lange, K. A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat. Comput. 2011, 21, 261–273. [Google Scholar] [CrossRef] [PubMed]
- Ma, S.; Huang, J. A concave pairwise fusion approach to subgroup analysis. J. Am. Stat. Assoc. 2017, 112, 410–423. [Google Scholar] [CrossRef]
- Johnson, A.E.; Pollard, T.J.; Shen, L.; Lehman, L.w.H.; Feng, M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Anthony Celi, L.; Mark, R.G. MIMIC-III, a freely accessible critical care database. Sci. Data 2016, 3, 160035. [Google Scholar] [CrossRef]