Abstract
Subgroup analysis with survival data is essential for a detailed assessment of the risks of medical products in heterogeneous population subgroups. In this paper, we develop a semiparametric mixture modeling strategy in the proportional odds model for simultaneous subgroup identification and regression analysis of survival data, which flexibly allows the covariate effects to differ across subgroups. Neither the subgroup membership nor the subgroup-specific covariate effects are known a priori. We propose the nonparametric maximum likelihood method together with a pair of MM algorithms with the monotone ascent property to carry out the estimation. Two series of simulation studies are conducted to examine the finite-sample performance of the proposed estimation procedures. An empirical analysis of German breast cancer data is further provided to illustrate the proposed methodology.
Keywords:
heterogeneous covariate effects; mixture of proportional odds model; MM algorithm; nonparametric maximum likelihood
MSC:
62N01; 62N02
1. Introduction
In some clinical trials, a substantial proportion of patients respond favorably to a new treatment while the others may eventually relapse. Subgroup analyses aim to classify patients into a few homogeneous groups and to tailor the treatment to each subgroup so as to optimize the treatment effect. In recent years, subgroup identification has received increasing attention in a wide range of fields such as clinical trials, public management, econometrics, and social science. For example, Refs. [1,2] conducted subgroup analyses in econometrics and marketing, while Refs. [3,4] applied subgroup analysis in epidemiology and biology, respectively.
Statistical methods for subgroup analysis have also developed rapidly in recent years. Among them, the finite mixture model has been recognized as an important tool and has been widely used for analyzing data from heterogeneous populations [5]. For example, there are many studies on the Gaussian mixture model for data clustering and classification [6,7,8]. Ref. [9] introduced a structured logistic-normal mixture model to identify subgroups in randomized clinical trials with differential treatment effects. Refs. [10,11] extended the mixture-model-based approach to generalized linear models. Bayesian approaches for mixture regression models were studied by [12]. Moreover, nonparametric mixture models have also received attention in recent years. Ref. [13] studied a nonparametric mixture model for cure rate estimation. Ref. [14] studied a semiparametric accelerated failure time mixture model for estimating a biological treatment effect on a latent subgroup of interest in randomized clinical trials. Ref. [15] proposed a semiparametric logistic–Cox mixture model for subgroup analysis when the outcome of interest is a right-censored event time.
Mixture models are deeply connected to the expectation–maximization (EM) algorithm. The EM algorithm is a popular approach to maximum likelihood estimation in incomplete data problems, of which finite mixtures are canonical examples because the unobserved labels of the individuals (as in unsupervised clustering) admit a direct interpretation as missing data [16]. In fact, the EM algorithm is a special member of the general family of MM algorithms [17]. The MM algorithm possesses great flexibility in solving optimization problems because its basic idea is to convert a difficult optimization problem into a series of simpler ones. It has been a powerful tool for optimization problems and enjoys its greatest vogue in computational statistics, with applications in a broad range of statistical contexts, including the Bradley–Terry model [18], quantile regression [19], variable selection [20,21], the proportional odds model [22], the shared frailty model [23], distance majorization [24], and so on. A key property of the MM principle is that it can decompose a high-dimensional objective function into separable low-dimensional functions through the construction of a surrogate function. In this paper, we introduce the general MM principle into the semiparametric mixture of proportional odds model for simultaneous subgroup identification and regression analysis.
The rest of the paper is organized as follows. Section 2 reviews the MM algorithm. Section 3 presents the latent proportional odds model and develops a pair of estimation procedures for the proposed model using the MM algorithm. Section 4 provides two sets of simulation studies to select the number of subgroups and assess the finite-sample performance of the proposed methods. Section 5 further illustrates the practical utility of the proposed methods with an application to the German breast cancer study data.
2. MM Principle
The MM algorithm is an important and powerful tool for optimization problems and enjoys its greatest vogue in computational statistics. Let $\ell(\theta)$ be the objective log-likelihood function, where $\theta$ is the vector of parameters to be estimated and $\Theta$ is the parameter space. The maximum likelihood estimate of $\theta$ is $\hat{\theta} = \arg\max_{\theta \in \Theta} \ell(\theta)$. The MM principle provides a general framework for constructing iterative algorithms with monotone convergence, which involves double duty: in maximization problems, the first M stands for minorize and the second M for maximize. The minorization step first constructs a surrogate function $Q(\theta \mid \theta^{(k)})$ such that
$$Q(\theta \mid \theta^{(k)}) \leq \ell(\theta) \quad \text{for all } \theta \in \Theta, \qquad Q(\theta^{(k)} \mid \theta^{(k)}) = \ell(\theta^{(k)}),$$
where $\theta^{(k)}$ denotes the current estimate of $\theta$ in the $k$-th iteration. The maximization step then updates $\theta$ by $\theta^{(k+1)}$, which maximizes the surrogate function $Q(\theta \mid \theta^{(k)})$ instead of $\ell(\theta)$; that is,
$$\theta^{(k+1)} = \arg\max_{\theta \in \Theta} Q(\theta \mid \theta^{(k)}).$$
Since
$$\ell(\theta^{(k+1)}) \geq Q(\theta^{(k+1)} \mid \theta^{(k)}) \geq Q(\theta^{(k)} \mid \theta^{(k)}) = \ell(\theta^{(k)}),$$
the constructed MM algorithm increases the objective function at each iteration and possesses the ascent property, driving the objective function uphill.
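As a minimal illustration of the MM scheme (here in its majorize–minimize form, minimizing rather than maximizing; a toy example of our own, not part of the model developed below), consider computing the sample median, which minimizes $\sum_i |x_i - \theta|$. Majorizing each $|r|$ by the quadratic $r^2/(2|r^{(k)}|) + |r^{(k)}|/2$, tangent at the current residual, turns every iteration into a weighted-mean update with monotone descent:

```python
import numpy as np

def mm_median(x, tol=1e-10, max_iter=500):
    """MM algorithm for the sample median: minimize sum_i |x_i - theta|.

    Each |r| is majorized by r^2 / (2 |r_k|) + |r_k| / 2 (tangent at the
    current residual), so every step is a weighted least-squares update."""
    theta = x.mean()  # starting value
    for _ in range(max_iter):
        w = 1.0 / np.maximum(np.abs(x - theta), 1e-12)  # guard zero residuals
        theta_new = np.sum(w * x) / np.sum(w)
        if abs(theta_new - theta) < tol:
            break
        theta = theta_new
    return theta

x = np.array([1.0, 2.0, 3.0, 10.0, 100.0])
theta_hat = mm_median(x)  # converges to the sample median, 3.0
```

Each update solves a simple weighted least-squares problem, and the objective can only decrease from one iteration to the next, mirroring the ascent property above.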
3. Proportional Odds Model with Individual-Specific Covariate Effects
Let $T$ be the time to event and $X$ a vector of covariates. The proportional odds model postulates that
$$\lambda(t \mid X) = \frac{\lambda(t) \exp(\beta^{\top} X)}{1 + \Lambda(t) \exp(\beta^{\top} X)},$$
where $\lambda(t \mid X)$ is the hazard function of $T$ given the covariates $X$, $\lambda(t)$ is an unspecified baseline hazard, and $\Lambda(t) = \int_0^t \lambda(s)\,ds$. Let the conditional survival function of $T$ be $S(t \mid X)$. We know that $S(t \mid X) = \{1 + \Lambda(t) \exp(\beta^{\top} X)\}^{-1}$. In the proportional odds model, $\beta$ is the vector of regression coefficients, quantifying the effect of the covariates $X$ on the time to event $T$ through the conditional hazard function; it is assumed to be the same for all subjects in the population. In practice, however, subjects may come from different subgroups and the covariate effects may differ, so it is more appropriate to assume the following proportional odds model with individual-specific covariate effects:
$$\lambda(t \mid X_i) = \frac{\lambda(t) \exp(\beta_i^{\top} X_i)}{1 + \Lambda(t) \exp(\beta_i^{\top} X_i)}, \quad i = 1, \ldots, n.$$
In this model, the covariate effects $\beta_i$ may differ across subjects. For both parsimony and better interpretation, it is reasonable to assume that $\beta_i = \alpha_m$ with probability $\pi_m$, $m = 1, \ldots, M$. In other words, there are only $M$ different subgroups for the covariate effects $\beta_i$, where $\alpha_1, \ldots, \alpha_M$ are $M$ different regression coefficient vectors. It is of interest to estimate the number of groups $M$, the mixing proportions $\pi_1, \ldots, \pi_M$, and the coefficients $\alpha_1, \ldots, \alpha_M$. Note that $\sum_{m=1}^{M} \pi_m = 1$.
3.1. Heterogeneity Regression Pursuit via MM Algorithm
The density of the event time given the covariates can be written as the mixture
$$f(t \mid X_i) = \sum_{m=1}^{M} \pi_m f_m(t \mid X_i),$$
where
$$f_m(t \mid X_i) = \frac{\lambda(t) \exp(\alpha_m^{\top} X_i)}{\{1 + \Lambda(t) \exp(\alpha_m^{\top} X_i)\}^{2}}$$
denotes the density function of the $m$-th subgroup, $m = 1, \ldots, M$, and $\alpha_m$ is the corresponding effect parameter of $X$ in the $m$-th subgroup. Given the observed right-censored data $\{(Y_i, \delta_i, X_i), i = 1, \ldots, n\}$, where $Y_i$ is the observed time and $\delta_i$ is the censoring indicator, we have the observed log-likelihood function
$$\ell(\theta) = \sum_{i=1}^{n} \log \left[ \sum_{m=1}^{M} \pi_m f_m(Y_i \mid X_i)^{\delta_i} S_m(Y_i \mid X_i)^{1 - \delta_i} \right], \tag{1}$$
where $\theta = (\pi_1, \ldots, \pi_M, \alpha_1, \ldots, \alpha_M, \Lambda)$ and $S_m(t \mid X_i) = \{1 + \Lambda(t) \exp(\alpha_m^{\top} X_i)\}^{-1}$. Given the parameters $\theta^{(k)}$ in the $k$-th iteration and denoting the posterior probability that subject $i$ belongs to the $m$-th subgroup by
$$w_{im}^{(k)} = \frac{\pi_m^{(k)} f_m^{(k)}(Y_i \mid X_i)^{\delta_i} S_m^{(k)}(Y_i \mid X_i)^{1 - \delta_i}}{\sum_{l=1}^{M} \pi_l^{(k)} f_l^{(k)}(Y_i \mid X_i)^{\delta_i} S_l^{(k)}(Y_i \mid X_i)^{1 - \delta_i}},$$
then we can rewrite $\ell(\theta)$ as
$$\ell(\theta) = \sum_{i=1}^{n} \log \left[ \sum_{m=1}^{M} w_{im}^{(k)} \cdot \frac{\pi_m f_m(Y_i \mid X_i)^{\delta_i} S_m(Y_i \mid X_i)^{1 - \delta_i}}{w_{im}^{(k)}} \right]. \tag{2}$$
By the continuous version of Jensen's inequality,
$$\log \int f(x) g(x)\, dx \geq \int g(x) \log f(x)\, dx,$$
where $g(\cdot)$ is a density function, we can transfer the logarithmic function outside the integral to the inside of the integral. Inspired by this feature, we construct a density function in Equation (2) which plays the role of $g(\cdot)$, while the remaining part plays the role of $f(\cdot)$. Applying the inequality, the logarithmic function on the outside is transferred to the inside of the sum, which breaks the product terms into a summation. Hence, we construct the surrogate function for $\ell(\theta)$, up to a constant not depending on $\theta$, as
$$Q(\theta \mid \theta^{(k)}) = Q_1(\pi \mid \theta^{(k)}) + Q_2(\alpha, \Lambda \mid \theta^{(k)}),$$
where
$$Q_1(\pi \mid \theta^{(k)}) = \sum_{i=1}^{n} \sum_{m=1}^{M} w_{im}^{(k)} \log \pi_m \tag{3}$$
and
$$Q_2(\alpha, \Lambda \mid \theta^{(k)}) = \sum_{i=1}^{n} \sum_{m=1}^{M} w_{im}^{(k)} \log \left[ f_m(Y_i \mid X_i)^{\delta_i} S_m(Y_i \mid X_i)^{1 - \delta_i} \right], \tag{4}$$
with $w_{im}^{(k)}$ the posterior probability, given $\theta^{(k)}$, that subject $i$ belongs to the $m$-th subgroup. The surrogate function separates the parameters $\pi$ and $(\alpha, \Lambda)$ into (3) and (4), respectively. All the parameters in (3) are separated from each other, so that updating $\pi$ is as straightforward as
$$\pi_m^{(k+1)} = \frac{1}{n} \sum_{i=1}^{n} w_{im}^{(k)}, \quad m = 1, \ldots, M.$$
To update $(\alpha, \Lambda)$, we apply the supporting hyperplane inequality to Equation (4) to release the argument $x$ from the logarithmic function. Since
$$\log x \leq \log x^{(k)} + \frac{x - x^{(k)}}{x^{(k)}} \quad \text{for } x, x^{(k)} > 0,$$
we have
$$-\log x \geq -\log x^{(k)} - \frac{x - x^{(k)}}{x^{(k)}},$$
where $x = 1 + \Lambda(Y_i) \exp(\alpha_m^{\top} X_i)$ and $x^{(k)}$ denotes its value at the current iterate. Then, we obtain the following surrogate function for $(\alpha, \Lambda)$,
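In code, the subgroup-probability update takes the familiar EM form. The sketch below is our own illustration under simplifying assumptions: the baseline is fixed at $\Lambda(t) = t$ (so $\lambda(t) = 1$), and the function and variable names are hypothetical rather than taken from the paper:

```python
import numpy as np

def po_density_surv(y, xb, cum_haz):
    """Density f and survival S of the proportional odds model
    S(t | x) = 1 / {1 + Lambda(t) exp(x'beta)}, with Lambda(t) = t assumed,
    so that lambda(t) = 1 and f(t | x) = exp(x'beta) / {1 + t exp(x'beta)}^2."""
    e = np.exp(xb)
    denom = 1.0 + cum_haz(y) * e
    return e / denom**2, 1.0 / denom

def update_weights_and_pi(y, delta, X, alphas, pis, cum_haz=lambda t: t):
    """One minorization step for the mixing proportions:
    w_im is proportional to pi_m * f_m(y_i)^delta_i * S_m(y_i)^(1 - delta_i),
    followed by the closed-form update pi_m = (1/n) sum_i w_im."""
    n, M = len(y), len(pis)
    lik = np.empty((n, M))
    for m in range(M):
        dens, surv = po_density_surv(y, X @ alphas[m], cum_haz)
        lik[:, m] = pis[m] * dens**delta * surv**(1 - delta)
    w = lik / lik.sum(axis=1, keepdims=True)  # posterior subgroup probabilities
    return w, w.mean(axis=0)                  # updated mixing proportions
```

Here `update_weights_and_pi` performs one minorization step (computing the posterior subgroup probabilities) followed by the closed-form update of the mixing proportions; in the full algorithm, $\alpha$ and $\Lambda$ would then be updated from the same weights.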
3.2. Profile MM Method
Following [25,26], we consider the profile estimation approach and first profile out the nonparametric component $\Lambda$ in the surrogate function for any given $\alpha$. This leads to the estimate of $\Lambda$ given $\alpha$ as
Substituting (6) into the surrogate function yields the profile objective function
We use the supporting hyperplane inequality again to deal with the remaining logarithmic terms; we then obtain the following surrogate function, in which all the $\alpha_m$ are separated from each other. Finally, the estimate of each $\alpha_m$ can be obtained by a one-step Newton iteration.
3.3. Non-Profile MM Method
For the above profile MM method, the estimate of $\alpha$ is highly related to the estimate of $\Lambda$, because we treat the nonparametric component $\Lambda$ as a function of $\alpha$ in the profile step. Inspired by the parameter-separable property of the MM principle, we instead separate the nonparametric part $\Lambda$ from $\alpha$ according to the decomposition rules. That is, we apply the inequality of arithmetic and geometric means,
$$ab \leq \frac{1}{2} \left( \frac{b^{(k)}}{a^{(k)}} a^{2} + \frac{a^{(k)}}{b^{(k)}} b^{2} \right), \quad a, b, a^{(k)}, b^{(k)} > 0,$$
which holds with equality when $(a, b) = (a^{(k)}, b^{(k)})$. Here, we let $a = \Lambda(Y_i)$ and $b = \exp(\alpha_m^{\top} X_i)$; the product $\Lambda(Y_i) \exp(\alpha_m^{\top} X_i)$ is then majorized by the sum of a term involving only $\Lambda$ and a term involving only $\alpha_m$.
Substituting the above inequality back into the surrogate function, we obtain a new surrogate function in which the parameters $\alpha$ and $\Lambda$ are completely separated, so the corresponding parameter estimates can be obtained by differentiating the two parts separately. Setting the derivative with respect to $\Lambda$ to zero, we obtain the estimate of $\Lambda$ by
To update $\alpha$, we calculate the first and second derivatives of the surrogate function with respect to each $\alpha_m$. Then, $\alpha_m$ can be estimated by a one-step Newton iteration.
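The separating step relies on the generic arithmetic–geometric mean majorization of a product, $ab \leq \frac{1}{2}(b^{(k)}/a^{(k)}\, a^2 + a^{(k)}/b^{(k)}\, b^2)$, with equality at the anchor $(a^{(k)}, b^{(k)})$. A small numerical check (our own illustration, independent of the paper's exact surrogate):

```python
import numpy as np

def amgm_majorizer(a, b, ak, bk):
    """Separable AM-GM upper bound on the product a*b, anchored at (ak, bk)."""
    return 0.5 * (bk / ak * a**2 + ak / bk * b**2)

rng = np.random.default_rng(0)
ak, bk = 1.7, 0.4
for _ in range(1000):
    a, b = rng.uniform(0.01, 10.0, size=2)
    assert a * b <= amgm_majorizer(a, b, ak, bk) + 1e-12  # majorization holds
# the bound is tight at the anchor point:
assert abs(ak * bk - amgm_majorizer(ak, bk, ak, bk)) < 1e-12
```

The bound is a sum of a term in $a$ alone and a term in $b$ alone, which is exactly what allows $\Lambda$ and $\alpha$ to be updated independently.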
4. Simulation Study
Based on the estimation procedures derived in the previous sections, we simulate data to examine the estimation results at finite sample sizes. The number of subgroups $M$ in the mixture of proportional odds model is unknown and is estimated in a data-driven manner. Here, we use the modified Bayesian information criterion (BIC [19]) to choose the number of components $M$ by minimizing the criterion function
$$\mathrm{BIC}(M) = -2\, \ell(\hat{\theta}_M) + \{M q + (M - 1)\} \log n,$$
where $n$ is the sample size and $q$ is the dimension of $\alpha_m$. Note that this is closely related to the marginal likelihood computation, as can be seen in [27,28,29].
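In code, model selection by this criterion amounts to fitting the model for a few candidate values of $M$ and keeping the minimizer. The sketch below assumes the penalty counts only the $Mq$ regression coefficients and the $M - 1$ free mixing proportions, and the log-likelihood values are hypothetical:

```python
import numpy as np

def modified_bic(loglik, M, q, n):
    """BIC-type criterion for choosing the number of subgroups M.

    loglik : maximized log-likelihood of the M-component model
    q      : dimension of each subgroup-specific coefficient vector alpha_m
    n      : sample size
    """
    n_params = M * q + (M - 1)  # coefficients plus free mixing proportions
    return -2.0 * loglik + n_params * np.log(n)

# hypothetical maximized log-likelihoods for candidate numbers of subgroups
fits = {1: -520.3, 2: -489.1, 3: -487.9}
best_M = min(fits, key=lambda M: modified_bic(fits[M], M, q=2, n=250))
```

Here the small gain in log-likelihood from $M = 2$ to $M = 3$ does not justify the extra parameters, so the criterion selects $M = 2$.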
Scenario 1. We generate clustered right-censored data from a mixture of proportional odds models with two subgroups and two covariates, where the two covariates $X_1$ and $X_2$ are independent and follow the standard normal distribution. We randomly assign the $n$ subjects to the two subgroups with equal probabilities, i.e., we let $\pi_1 = \pi_2 = 0.5$, so that $\beta_i = \alpha_1$ for subjects in the first subgroup and $\beta_i = \alpha_2$ for subjects in the second. We choose different sample sizes and set the censoring proportion at 30% to assess the performance of the proposed estimation procedures.
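A data-generating sketch for this kind of scenario (the coefficient values, the baseline $\Lambda(t) = t$, and the uniform censoring distribution below are our own illustrative assumptions, not the paper's settings): since $S(T \mid x)$ is uniform on $(0, 1)$, inverting $S(t \mid x) = \{1 + \Lambda(t) e^{x^{\top} \beta}\}^{-1}$ gives $T = \Lambda^{-1}\{(1 - U)/U \cdot e^{-x^{\top} \beta}\}$ with $U \sim \mathrm{Unif}(0, 1)$:

```python
import numpy as np

def simulate_mixture_po(n, alphas, pis, cens_upper=8.0, seed=0):
    """Right-censored draws from a mixture of proportional odds models,
    with baseline Lambda(t) = t and uniform censoring on (0, cens_upper)."""
    rng = np.random.default_rng(seed)
    p = alphas.shape[1]
    X = rng.standard_normal((n, p))
    groups = rng.choice(len(pis), size=n, p=pis)   # latent subgroup labels
    xb = np.einsum("ij,ij->i", X, alphas[groups])  # x_i' beta_i
    u = rng.uniform(size=n)
    T = (1.0 - u) / u * np.exp(-xb)                # invert S(t | x) = U
    C = rng.uniform(0.0, cens_upper, size=n)       # censoring times
    return np.minimum(T, C), (T <= C).astype(int), X, groups

# illustrative parameter values only (not the paper's true settings)
alphas = np.array([[1.0, -1.0], [-1.0, 1.0]])
Y, delta, X, groups = simulate_mixture_po(500, alphas, pis=[0.5, 0.5])
```

The censoring upper bound can be tuned to hit a target censoring proportion such as 30%.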
Table 1 reports the mean and median of the estimator $\hat{M}$ and the proportion of $\hat{M}$ equal to the true number of subgroups based on 500 replications. Table 2 reports the empirical bias, mean square error (MSE), and standard error (s.d.) of the parameter estimators based on 500 replications. We find that the mean of $\hat{M}$ gradually approaches the true number of subgroups 2, the median of $\hat{M}$ remains at 2, and the proportion of correctly identifying the true number of subgroups approaches 1 as the sample size increases. Moreover, our methods estimate the parameters well, with small empirical bias, MSE, and standard error, even at small sample sizes.
Table 1.
The mean, median, standard error (s.d.), and the proportion (Pro) of $\hat{M}$ in Scenario 1.
Table 2.
Parameter estimation results in Scenario 1.
Scenario 2. We generate right-censored data from a proportional odds model with three covariates, where the three covariates are independent and follow the standard normal distribution and the same regression coefficient vector is used for all subjects. Note that this model corresponds to the latent proportional odds model with the true number of subgroups $M$ equal to 1. We set the censoring proportion at 30% and choose different sample sizes to assess the performance of the proposed estimation procedures.
Table 3 reports the mean and median of the estimator $\hat{M}$ and the proportion of $\hat{M}$ equal to the true number of subgroups based on 200 replications. Table 4 reports the empirical bias, mean square error (MSE), and standard error (s.d.) of the parameter estimators based on 500 replications. Based on the profile MM method, we observe that the median of $\hat{M}$ equals the true number 1, the mean approaches 1, and the empirical percentage of $\hat{M} = 1$ approaches 1 as the sample size increases. Based on the non-profile MM method, the mean and median of $\hat{M}$ both equal the true number 1, and the proportion of $\hat{M} = 1$ is 1 when the sample sizes are 250 and 500. Furthermore, our methods show excellent performance in parameter estimation, yielding accurate estimates under all sample sizes considered.
Table 3.
The mean, median, and the proportion (Pro) of $\hat{M}$ in Scenario 2.
Table 4.
Parameter estimation results in Scenario 2.
Scenario 3. We generate clustered right-censored data from a mixture of proportional odds models with two subgroups and two correlated covariates, where the two covariates are generated from a multivariate normal distribution with mean zero and a first-order autoregressive covariance structure, $\mathrm{Cov}(X_{ij}, X_{ik}) = \rho^{|j - k|}$. We then randomly assign the $n$ subjects to the two subgroups with equal probabilities, i.e., we let $\pi_1 = \pi_2 = 0.5$, so that $\beta_i = \alpha_1$ for subjects in the first subgroup and $\beta_i = \alpha_2$ for subjects in the second. We choose different values of $\rho$ and set the censoring proportion at 30% to assess the performance of the proposed estimation procedures.
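The AR(1) covariate structure can be generated directly from its covariance matrix; a sketch (the value $\rho = 0.5$ and the seed are illustrative choices of ours):

```python
import numpy as np

def ar1_cov(p, rho):
    """First-order autoregressive covariance: Cov(X_j, X_k) = rho^{|j-k|}."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# two correlated covariates with rho = 0.5 (illustrative value)
rng = np.random.default_rng(2022)
X = rng.multivariate_normal(np.zeros(2), ar1_cov(2, 0.5), size=500)
```

Larger values of $\rho$ make the covariates harder to distinguish, which is consistent with the degraded subgroup recovery reported below.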
Table 5 reports the mean and median of the estimator $\hat{M}$ and the proportion of $\hat{M}$ equal to the true number of subgroups based on 500 replications. Table 6 reports the empirical bias, mean square error (MSE), and standard error (s.d.) of the parameter estimators based on 500 replications. In Table 5, the results of the profile MM method and the non-profile MM method are largely consistent: the proportions of $\hat{M} = 2$ are very close to 1, and the smaller the value of $\rho$, the larger the value of Pro. This shows that our proposed methods can accurately identify the number of subgroups. In Table 6, the estimates at smaller values of $\rho$ are more accurate and more stable than those at larger values of $\rho$ for both the profile MM method and the non-profile MM method.
Table 5.
The mean, median, and the proportion (Pro) of $\hat{M}$ in Scenario 3.
Table 6.
Parameter estimation results in Scenario 3.
5. Real Data Analysis
We now apply the proposed method to the German Breast Cancer Study data, which are available from the R package “pec”. The data contain observations on 686 women, with a censoring rate of 56.41%. To examine whether there is heterogeneity in the data, we consider “tgrade” (I vs. III, II vs. III) and “pnodes” as the explanatory variables of interest, where “tgrade” denotes the tumor grade, an ordered factor with levels I vs. III and II vs. III, and “pnodes” denotes the number of positive lymph nodes. We then use the BIC criterion to determine the number of subgroups $M$. Table 7 reports the maximum log-likelihood values (LL), the BIC values, and the estimated parameters for each candidate number of subgroups. Comparing the BIC values, we find that the optimal $M$ is 1. The estimated regression coefficients are detailed in Table 7.
Table 7.
Estimation results for breast cancer data.
6. Conclusions
In this work, we introduce the MM algorithm into a semiparametric mixture modeling strategy for the proportional odds model, for subgroup analysis of survival data that flexibly allows the covariate effects to differ across subgroups. Both proposed MM methods for the semiparametric mixture of proportional odds models conduct simultaneous subgroup identification and regression analysis, and the MM principle provides a general framework for constructing iterative algorithms with monotone convergence. The main advantage of our MM algorithms is that they separate the nonparametric baseline hazard from the other regression parameters and help avoid matrix inversion in high-dimensional regression analysis, which makes the estimation process more efficient. Furthermore, our algorithms mesh well with existing quasi-Newton acceleration and other simple off-the-shelf accelerators to further speed up the estimation. The estimation procedures derived for the semiparametric mixture of proportional odds models can be easily extended to other semiparametric or nonparametric mixture models. Although our proposed MM algorithms are developed for the mixture of proportional odds models, a parallel approach can essentially be developed for the more general mixture of transformation models; we will investigate this in future work.
Author Contributions
Conceptualization, J.X., X.H.; Data curation, X.H., C.X. and J.H.; Formal analysis, X.H. and C.X.; Investigation, X.H. and J.S.; Methodology, X.H., J.X. and J.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Frühwirth-Schnatter, S. Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J. Am. Stat. Assoc. 2001, 96, 194–209. [Google Scholar] [CrossRef]
- Rossi, P.E.; Allenby, G.M. Bayesian statistics and marketing. Mark. Sci. 2003, 22, 304–328. [Google Scholar] [CrossRef]
- Green, P.J.; Richardson, S. Hidden Markov models and disease mapping. J. Am. Stat. Assoc. 2002, 97, 1055–1070. [Google Scholar] [CrossRef]
- Wang, P.; Puterman, M.L.; Cockburn, I.; Le, N. Mixed Poisson regression models with covariate dependent rates. Biometrics 1996, 52, 381–400. [Google Scholar] [CrossRef] [PubMed]
- Everitt, B. Finite Mixture Distributions; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
- Banfield, J.D.; Raftery, A.E. Model-based Gaussian and non-Gaussian clustering. Biometrics 1993, 49, 803–821. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R. Discriminant analysis by Gaussian mixtures. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 155–176. [Google Scholar] [CrossRef]
- McNicholas, P.D. Model-based classification using latent Gaussian mixture models. J. Stat. Plan. Inference 2010, 140, 1175–1181. [Google Scholar] [CrossRef]
- Shen, J.; He, X. Inference for subgroup analysis with a structured logistic-normal mixture model. J. Am. Stat. Assoc. 2015, 110, 303–312. [Google Scholar] [CrossRef]
- Chaganty, A.T.; Liang, P. Spectral experts for estimating mixtures of linear regressions. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1040–1048. [Google Scholar]
- Hurn, M.; Justel, A.; Robert, C.P. Estimating mixtures of regressions. J. Comput. Graph. Stat. 2003, 12, 55–79. [Google Scholar] [CrossRef]
- Frühwirth-Schnatter, S. Finite Mixture and Markov Switching Models; Springer: Berlin/Heidelberg, Germany, 2006; Volume 425. [Google Scholar]
- Peng, Y.; Dear, K.B. A nonparametric mixture model for cure rate estimation. Biometrics 2000, 56, 237–243. [Google Scholar] [CrossRef]
- Altstein, L.; Li, G. Latent subgroup analysis of a randomized clinical trial through a semiparametric accelerated failure time mixture model. Biometrics 2013, 69, 52–61. [Google Scholar] [CrossRef]
- Wu, R.F.; Zheng, M.; Yu, W. Subgroup analysis with time-to-event data under a logistic-Cox mixture model. Scand. J. Stat. 2016, 43, 863–878. [Google Scholar] [CrossRef]
- Becker, M.P.; Yang, I.; Lange, K. EM algorithms without missing data. Stat. Methods Med. Res. 1997, 6, 38–54. [Google Scholar] [CrossRef]
- Lange, K.; Hunter, D.R.; Yang, I. Optimization transfer using surrogate objective functions. J. Comput. Graph. Stat. 2000, 9, 1–20. [Google Scholar]
- Hunter, D.R. MM algorithms for generalized Bradley-Terry models. Ann. Stat. 2004, 32, 384–406. [Google Scholar] [CrossRef]
- Hunter, D.R.; Lange, K. Quantile regression via an MM algorithm. J. Comput. Graph. Stat. 2000, 9, 60–77. [Google Scholar]
- Hunter, D.R.; Li, R. Variable selection using MM algorithms. Ann. Stat. 2005, 33, 1617–1642. [Google Scholar] [CrossRef]
- Yen, T.J. A majorization–minimization approach to variable selection using spike and slab priors. Ann. Stat. 2011, 39, 1748–1775. [Google Scholar] [CrossRef]
- Hunter, D.R.; Lange, K. Computing estimates in the proportional odds model. Ann. Inst. Stat. Math. 2002, 54, 155–168. [Google Scholar] [CrossRef]
- Huang, X.; Xu, J.; Tian, G. On profile MM algorithms for gamma frailty survival models. Stat. Sin. 2019, 29, 895–916. [Google Scholar] [CrossRef]
- Chi, E.C.; Zhou, H.; Lange, K. Distance majorization and its applications. Math. Program. 2014, 146, 409–436. [Google Scholar] [CrossRef]
- Johansen, S. An extension of Cox’s regression model. Int. Stat. Rev. Int. Stat. 1983, 51, 165–174. [Google Scholar] [CrossRef]
- Klein, J.P. Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics 1992, 48, 795–806. [Google Scholar] [CrossRef]
- Knuth, K.H.; Habeck, M.; Malakar, N.K.; Mubeen, A.M.; Placek, B. Bayesian evidence and model selection. Digit. Signal Process. 2015, 47, 50–67. [Google Scholar] [CrossRef]
- Llorente, F.; Martino, L.; Curbelo, E.; López-Santiago, J.; Delgado, D. On the safe use of prior densities for Bayesian model selection. Wiley Interdiscip. Rev. Comput. Stat. 2022, e1595. [Google Scholar] [CrossRef]
- DiCiccio, T.J.; Kass, R.E.; Raftery, A.; Wasserman, L. Computing Bayes factors by combining simulation and asymptotic approximations. J. Am. Stat. Assoc. 1997, 92, 903–915. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).