Mixture Modeling of Time-to-Event Data in the Proportional Odds Model

Abstract: Subgroup analysis of survival data is essential for a detailed assessment of the risks of medical products in heterogeneous population subgroups. In this paper, we develop a semiparametric mixture modeling strategy in the proportional odds model for simultaneous subgroup identification and regression analysis of survival data, flexibly allowing the covariate effects to differ among several subgroups. Neither the subgroup membership nor the subgroup-specific covariate effects are known a priori. Nonparametric maximum likelihood, together with a pair of MM algorithms with the monotone ascent property, is proposed to carry out the estimation procedures. We then conduct two series of simulation studies to examine the finite-sample performance of the proposed estimation procedures. An empirical analysis of German breast cancer data is further provided to illustrate the proposed methodology.


Introduction
In some clinical trials, a substantial proportion of patients respond favorably to a new treatment while the others may eventually relapse. Subgroup analyses aim to classify the patients into a few homogeneous groups and tailor a disease treatment specifically for each subgroup to optimize the treatment effect. In recent years, subgroup identification has received increasing attention in a wide range of fields, such as clinical trials, public management, econometrics, and social science. For example, Refs. [1,2] conducted subgroup analyses in econometrics and marketing, while Refs. [3,4] implemented subgroup analyses in epidemiology and biology, respectively.
Statistical methods for subgroup analysis have also developed rapidly in recent years. Among them, the finite mixture model has been recognized as an important tool and has been widely used for analyzing data from heterogeneous populations [5]. For example, there are many studies on the Gaussian mixture model for data clustering and classification [6][7][8]. Ref. [9] introduced a structured logistic-normal mixture model to identify subgroups in randomized clinical trials with differential treatment effects. Refs. [10,11] extended the mixture model-based approach to generalized linear models. Bayesian approaches for mixture regression models were studied by [12]. Moreover, nonparametric mixture models have also been studied in recent years. Ref. [13] studied a nonparametric mixture model for cure rate estimation. Ref. [14] studied a semiparametric accelerated failure time mixture model for estimating a biological treatment effect on a latent subgroup of interest in randomized clinical trials. Ref. [15] proposed a semiparametric Logistic-Cox mixture model for subgroup analysis when the outcome of interest is an event time subject to right censoring.
Mixture models are deeply connected to the expectation-maximization (EM) algorithm. The EM algorithm is a popular approach to maximum likelihood estimation in incomplete-data problems, of which finite mixtures are canonical examples because the unobserved labels of the individuals (as in unsupervised clustering) admit a direct interpretation as missing data [16]. In fact, the EM algorithm is a special member of the general family of MM algorithms [17]. The MM algorithm possesses great flexibility in solving optimization problems because its basic idea is to convert a difficult optimization problem into a series of simpler ones. The MM algorithm has become a powerful tool for optimization problems and enjoys its greatest vogue in computational statistics. Thus far, it has been widely used in many statistical optimization problems; applications of the MM principle can be found in a broad range of statistical contexts, including the Bradley-Terry model [18], quantile regression [19], variable selection [20,21], the proportional odds model [22], the shared frailty model [23], distance majorization [24], and so on.
The key property of the MM principle is that it can decompose a high-dimensional objective function into separable low-dimensional functions through the construction of a surrogate function. In this paper, we introduce the general MM principle into the semiparametric mixture of proportional odds models for simultaneous subgroup identification and regression analysis.
The rest of the paper is organized as follows. We first review the MM algorithm in Section 2. In Section 3, we present the latent proportional odds model and develop a pair of estimation procedures for the proposed model using the MM algorithm. In Section 4, we provide two parts of simulation studies to select the number of subgroups and assess the finite-sample performance of the proposed methods. We further provide an application to the German breast cancer study data to illustrate the practical utility of the proposed methods in Section 5.

MM Principle
The MM algorithm is an important and powerful tool for optimization problems and enjoys its greatest vogue in computational statistics. Suppose ℓ(α|Y_obs) is the objective log-likelihood function, α = (α_1, ..., α_q)^T ∈ Θ is the vector of parameters to be estimated, and Θ is the parameter space. The maximum likelihood estimate of α is α̂ = arg max_{α∈Θ} ℓ(α|Y_obs). The MM principle provides a general frame for constructing iterative algorithms with monotone convergence, and the acronym does double duty: in maximization problems, the first M stands for minorize and the second M for maximize. The minorization step first constructs a surrogate function Q(α|α^(k)) such that

Q(α|α^(k)) ≤ ℓ(α|Y_obs) for all α ∈ Θ, with equality Q(α^(k)|α^(k)) = ℓ(α^(k)|Y_obs),

where α^(k) denotes the current estimate of α in the k-th iteration. The maximization step then updates α^(k) by α^(k+1), which maximizes the surrogate function Q(·|α^(k)) instead of ℓ(α|Y_obs), that is,

α^(k+1) = arg max_{α∈Θ} Q(α|α^(k)).

The constructed MM algorithm increases the objective function at each iteration, since ℓ(α^(k+1)|Y_obs) ≥ Q(α^(k+1)|α^(k)) ≥ Q(α^(k)|α^(k)) = ℓ(α^(k)|Y_obs), and thus possesses the ascent property driving the objective function ℓ(α|Y_obs) uphill.
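As a concrete illustration of the minorize-maximize recipe above, the following sketch applies the classical Bohning-Lindsay quadratic minorizer to a logistic log-likelihood (a standard toy example, not the paper's model); the fixed curvature bound XᵀX/4 guarantees the surrogate sits below the log-likelihood with equality at the current iterate, so each update enjoys the ascent property:

```python
import numpy as np

# Toy MM illustration: maximize a logistic log-likelihood with the classical
# Bohning-Lindsay minorizer. The log-likelihood Hessian -X' diag(p(1-p)) X is
# bounded below by -X'X/4, so the quadratic surrogate with fixed curvature
# X'X/4 minorizes the objective; maximizing it exactly gives monotone ascent.
def mm_logistic(X, y, n_iter=200):
    n, q = X.shape
    alpha = np.zeros(q)
    B_inv = np.linalg.inv(X.T @ X / 4.0)          # inverse curvature bound

    def loglik(a):
        eta = X @ a
        return float(y @ eta - np.logaddexp(0.0, eta).sum())

    path = [loglik(alpha)]
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ alpha)))
        alpha = alpha + B_inv @ (X.T @ (y - p))   # maximize the surrogate
        path.append(loglik(alpha))
    return alpha, path

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
truth = np.array([1.0, -1.0, 0.5])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ truth)))).astype(float)
alpha_hat, path = mm_logistic(X, y)
# the ascent property: the log-likelihood never decreases across iterations
assert all(b >= a - 1e-9 for a, b in zip(path, path[1:]))
```

The fixed curvature bound is the design choice here: unlike Newton's method, the matrix inverse is computed once and reused at every iteration.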

Proportional Odds Model with Individual-Specific Covariate Effects
Let T be the time to event. The proportional odds model postulates that

λ_i(t) = λ_0(t) exp(X_i^T β) / [1 + Λ_0(t) exp(X_i^T β)],

where λ_i(t) is the hazard function of T_i given the covariates X_i, λ_0(t) is an unspecified baseline hazard, and Λ_0(t) = ∫_0^t λ_0(s) ds. Let the conditional survival function of T be S(t|X) = P(T > t|X). We know that λ(t|X) = −d log S(t|X)/dt, so the model is equivalent to S(t|X) = [1 + Λ_0(t) exp(X^T β)]^{-1}, under which the odds of failure [1 − S(t|X)]/S(t|X) = Λ_0(t) exp(X^T β) are proportional across covariate values. In the proportional odds model, β is the vector of regression coefficients quantifying the effect of the covariates X on the time to event T through the conditional hazard function; it is assumed to be the same for all subjects in the population. In practice, however, subjects may come from different subgroups and the covariate effects may differ, so it is more appropriate to assume the following proportional odds model with individual-specific covariate effects:

λ_i(t) = λ_0(t) exp(X_i^T β_i) / [1 + Λ_0(t) exp(X_i^T β_i)].

In this model, we allow the covariate effects β_i to differ across subjects. For both parsimony and better interpretation, it is reasonable to assume that β_i takes one of only M distinct values β_1, ..., β_M, so that subjects sharing the same value form a latent subgroup.
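A minimal numerical sketch can verify the two defining identities of the model above: the hazard equals −d log S(t|X)/dt, and the odds of failure are proportional across covariate values. The baseline Λ_0(t) = (t/2)² is the one used later in the simulations; the coefficient values are arbitrary choices for illustration:

```python
import numpy as np

# Proportional odds survival and hazard with baseline odds Lambda0(t) = (t/2)^2
# (the baseline used in the paper's simulations); beta values are arbitrary.
def surv(t, x, beta):
    return 1.0 / (1.0 + (t / 2.0) ** 2 * np.exp(x @ beta))

def hazard(t, x, beta):
    lam0 = t / 2.0                                  # d/dt (t/2)^2 = t/2
    return lam0 * np.exp(x @ beta) / (1.0 + (t / 2.0) ** 2 * np.exp(x @ beta))

beta = np.array([1.0, -0.5])
x1, x2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
t = 1.7

# hazard agrees with -d log S / dt (central finite difference)
h = 1e-6
fd = -(np.log(surv(t + h, x1, beta)) - np.log(surv(t - h, x1, beta))) / (2 * h)
assert np.isclose(hazard(t, x1, beta), fd, rtol=1e-5)

# odds of failure by time t are proportional across covariate values
odds = lambda x: (1 - surv(t, x, beta)) / surv(t, x, beta)
ratio = odds(x1) / odds(x2)
assert np.isclose(ratio, np.exp(x1 @ beta - x2 @ beta))
```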

Heterogeneity Regression Pursuit via MM Algorithm
The joint density function of (T, δ) can be written as

f(t_i, d_i | X_i) = Σ_{m=1}^{M} π_m f_m(t_i, d_i | X_i),

where f_m(t, d | X) = [λ_m(t|X)]^d S_m(t|X) denotes the density function of the m-th subgroup, m = 1, 2, ..., M, with λ_m and S_m the proportional odds hazard and survival functions under β_m, the corresponding effect parameter of X in the m-th subgroup. Given the observed data Y_obs = ({t_i}_{i=1}^n, {d_i}_{i=1}^n, {X_i}_{i=1}^n), we have the observed log-likelihood function

ℓ(Λ_0, β, π | Y_obs) = Σ_{i=1}^{n} log ( Σ_{m=1}^{M} π_m f_mi ),

where f_mi = f_m(t_i, d_i | X_i). Given the parameters in the k-th iteration, denote

υ_mi^(k) = π_m^(k) f_mi^(k) / Σ_{l=1}^{M} π_l^(k) f_li^(k).   (2)

By the continuous version of Jensen's inequality, φ(∫ f(x) g(x) dx) ≥ ∫ φ(f(x)) g(x) dx for a concave function φ and a density function g(x), we can transfer the function φ(·) from outside the integral to inside it. Inspired by this feature, the weights υ_mi in Equation (2) play the role of the density g(x), while the remaining part of each summand plays the role of f(x). By this construction, the logarithm on the outside is transferred to the inside, which breaks the product terms down into a summation. Hence, we construct the surrogate function for ℓ(Λ_0, β, π | Y_obs) as the sum of

Σ_{i=1}^{n} Σ_{m=1}^{M} υ_mi^(k) log π_m   (3)

and

Σ_{i=1}^{n} Σ_{m=1}^{M} υ_mi^(k) log f_mi.   (4)

The surrogate function Q(Λ_0, β, π | Λ_0^(k), β^(k), π^(k)) separates the parameters π and (Λ_0, β) into (3) and (4), respectively. All the parameters {π_m}_{m=1}^{M} in (3) are separated from each other, so that updating π_m is as straightforward as

π_m^(k+1) = (1/n) Σ_{i=1}^{n} υ_mi^(k).

To update (Λ_0, β), we apply the supporting hyperplane inequality, log x ≤ log x^(k) + (x − x^(k))/x^(k), to Equation (4) to release the argument x from the logarithmic function. Then, we obtain a further surrogate function Q_1(Λ_0, β | Λ_0^(k), β^(k), π^(k)) for (Λ_0, β); in the profile step, Λ_0 is profiled out of Q_1 as a function of β.
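The closed-form update of the mixing proportions can be sketched in a few lines: the weights υ_mi from the Jensen minorization are posterior membership probabilities, and π_m^(k+1) averages them over subjects. The component density values f_mi below are hypothetical placeholders, since computing them requires the fitted proportional odds components:

```python
import numpy as np

# One MM update of the mixing proportions. f[m, i] holds the component
# densities f_mi (hypothetical values here); v[m, i] are the Jensen weights
# of Equation (2), and pi_m^{(k+1)} = (1/n) sum_i v_mi maximizes the pi-part
# of the surrogate, Equation (3), in closed form.
def update_pi(pi, f):
    v = pi[:, None] * f                       # pi_m^{(k)} * f_mi^{(k)}
    v /= v.sum(axis=0, keepdims=True)         # normalize over components m
    return v.mean(axis=1), v                  # new pi, posterior weights

rng = np.random.default_rng(1)
f = rng.random((2, 50))                       # 2 components, 50 subjects
pi_new, v = update_pi(np.array([0.5, 0.5]), f)
assert np.isclose(pi_new.sum(), 1.0)          # proportions stay on the simplex
assert np.allclose(v.sum(axis=0), 1.0)        # weights sum to one per subject
```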

Non-Profile MM Method
For the above profile MM method, the estimate of Λ_0 is highly related to the estimate of β because we treat the nonparametric component Λ_0 as a function of β in the profile step. Inspired by the parameter-separation property of the MM principle, we further separate the nonparametric part Λ_0 from β according to the decomposition rules. That is, we apply the inequality of arithmetic and geometric means to product terms of the form xy:

xy ≤ (1/2) [ (y^(k)/x^(k)) x² + (x^(k)/y^(k)) y² ],  x, y > 0,

with equality when (x, y) = (x^(k), y^(k)). Here, we let x = Λ_0(t_i) and y = exp(X_i^T β_m), so that the product Λ_0(t_i) exp(X_i^T β_m) is bounded by the sum of a term involving Λ_0 only and a term involving β_m only. Substituting the above inequality back into Q_1(Λ_0, β | Λ_0^(k), β^(k), π^(k)), we obtain surrogate functions Q_3(Λ_0 | Λ_0^(k), β^(k), π^(k)) and Q_4(β | Λ_0^(k), β^(k), π^(k)). It is observed that the parameters Λ_0 and β_m are now completely separated, so the corresponding parameter estimators can be obtained by differentiating the two surrogates separately. Setting the derivative of Q_3 with respect to the jump sizes of Λ_0 to zero yields a closed-form update of Λ_0. To update β_m, we calculate the first and second derivatives of Q_4(β | Λ_0^(k), β^(k), π^(k)); then, β_m can be updated by one Newton step on Q_4(β | Λ_0^(k), β^(k), π^(k)).
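The Newton update for β_m can be sketched as follows. The surrogate used here has a generic exponential-family shape, Q_4(b) = Σ_i v_i (d_i x_i'b − c_i exp(x_i'b)), with hypothetical weights v, c and event indicators d standing in for the paper's exact derivatives; a step-halving safeguard keeps the surrogate non-decreasing:

```python
import numpy as np

# Sketch of the Newton update for beta_m on a separated surrogate. The shape
# Q4(b) = sum_i v_i * (d_i * x_i'b - c_i * exp(x_i'b)) is a stand-in with the
# exponential-family form the AM-GM decomposition produces; v, c, d are
# assumed weights/indicators, not the paper's exact quantities.
def Q4(beta, X, v, c, d):
    eta = X @ beta
    return float(v @ (d * eta - c * np.exp(eta)))

def newton_update(beta, X, v, c, d):
    e = np.exp(X @ beta)
    grad = X.T @ (v * (d - c * e))                  # first derivative of Q4
    hess = -(X * (v * c * e)[:, None]).T @ X        # negative definite Hessian
    step = np.linalg.solve(hess, grad)
    t = 1.0                                         # step-halving safeguard
    while Q4(beta - t * step, X, v, c, d) < Q4(beta, X, v, c, d):
        t /= 2.0
        if t < 1e-8:
            break
    return beta - t * step

rng = np.random.default_rng(2)
n, q = 100, 2
X = rng.normal(size=(n, q))
v, c = rng.random(n), rng.random(n) + 0.1
d = rng.integers(0, 2, n).astype(float)
beta = np.zeros(q)
for _ in range(50):
    beta = newton_update(beta, X, v, c, d)
e = np.exp(X @ beta)
assert np.linalg.norm(X.T @ (v * (d - c * e))) < 1e-6   # stationary point
```

Because the surrogate separates the subgroups, each β_m solves its own small q-dimensional problem rather than one joint Mq-dimensional one.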

Simulation Study
Following the estimation procedures derived in the previous sections, we simulate data to analyze the estimation results at finite sample sizes. The number of groups M in the mixture of proportional odds models is unknown and is estimated in a data-driven manner. Here, we use the modified Bayesian information criterion (BIC [19]) to choose the number of components M by minimizing the criterion function

BIC(M) = −2 ℓ(Λ̂_0, β̂, π̂ | Y_obs) + (Mq + M − 1) log n,

where n is the sample size and q is the dimension of β_m, so that Mq + M − 1 counts the free parameters (M coefficient vectors plus M − 1 mixing proportions). Note that this criterion is closely related to the marginal likelihood computation, as can be seen in [27][28][29].
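With the penalty Mq + M − 1, model selection reduces to a one-line comparison over candidate values of M; the log-likelihood values below are hypothetical stand-ins for the maximized values an MM fit would return:

```python
import numpy as np

# Choose the number of subgroups M by minimizing
#   BIC(M) = -2 * loglik_M + (M*q + M - 1) * log(n).
# logliks maps each candidate M to its maximized log-likelihood
# (hypothetical values below, standing in for fitted MM results).
def select_M(logliks, n, q):
    bic = {M: -2.0 * ll + (M * q + M - 1) * np.log(n)
           for M, ll in logliks.items()}
    return min(bic, key=bic.get), bic

# e.g. hypothetical fits for n = 250 subjects and q = 2 covariates:
M_opt, bic = select_M({1: -410.3, 2: -380.1, 3: -378.9}, n=250, q=2)
assert M_opt == 2    # the gain from M=3 does not offset its extra penalty
```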
Scenario 1. We generate clustered right-censored data from a mixture of proportional odds models with two subgroups and two covariates, where the two covariates X_i1 and X_i2 are independent and follow the standard normal distribution, and Λ_0(t) = (t/2)². We randomly assign the n subjects to the two subgroups with equal probabilities, i.e., we let P(i ∈ G_1) = P(i ∈ G_2) = 0.5. We choose different sample sizes n = 150, 250, 500 and set the censoring proportion at 30% to assess the performance of the proposed estimation procedures. Table 1 reports the mean and median of the estimator M̂ and the proportion of M̂ equal to the true number of subgroups based on 500 replications. Table 2 reports the empirical bias, mean squared error (MSE), and standard error (s.d.) of the estimators of π, β_1, and β_2 based on 500 replications. We find that the mean of M̂ gradually approaches the true number of subgroups 2, the median of M̂ remains at 2, and the proportion of correctly identifying the true number of subgroups is close to 1 as the sample size increases. Moreover, our methods estimate the parameters well, with small empirical bias, small MSE, and small standard errors, even at small sample sizes.
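The data-generating mechanism of Scenario 1 can be sketched by inverting the proportional odds survival function; the subgroup coefficients and the censoring bound below are illustrative placeholders, not the paper's values:

```python
import numpy as np

# Simulate clustered right-censored data from a two-subgroup mixture of
# proportional odds models with Lambda0(t) = (t/2)^2, as in Scenario 1.
# beta1/beta2 and cmax are illustrative placeholders, not the paper's values.
def simulate(n, beta1, beta2, cmax, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 2))
    g = rng.integers(0, 2, n)                     # P(G1) = P(G2) = 0.5
    eta = np.where(g == 0, X @ beta1, X @ beta2)
    u = rng.random(n)
    # invert S(t|x) = 1/(1 + (t/2)^2 e^eta):  T = 2*sqrt(((1-u)/u) * e^(-eta))
    T = 2.0 * np.sqrt((1 - u) / u * np.exp(-eta))
    C = rng.uniform(0, cmax, n)                   # independent censoring times
    t_obs, d = np.minimum(T, C), (T <= C).astype(int)
    return X, g, t_obs, d

X, g, t_obs, d = simulate(500, np.array([3.0, -1.0]),
                          np.array([-3.0, 1.0]), cmax=8.0)
assert 0.0 < d.mean() < 1.0        # some events observed, some censored
```

In practice, the bound cmax would be tuned so that the empirical censoring proportion matches the target 30%.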
Scenario 2. We generate right-censored data from a single proportional odds model with three covariates, where the three covariates X_i1, X_i2 and X_i3 are independent and follow the standard normal distribution. We set β = (1, −3, 2)^T and Λ_0(t) = (t/2)² for all subjects. Note that this model corresponds to the latent proportional odds model with the true number of subgroups M being 1. We set the censoring proportion at 30% and choose sample sizes n = 250, 500 to assess the performance of the proposed estimation procedures. Table 3 reports the mean and median of the estimator M̂ and the proportion of M̂ equal to the true number of subgroups based on 200 replications. Table 4 reports the empirical bias, mean squared error (MSE), and standard error (s.d.) of the estimator of β based on 500 replications. For the profile MM method, we observe that the median of M̂ equals the true number 1, the mean gets closer to 1, and the empirical proportion of correctly selecting M is close to 1 as the sample size increases. For the non-profile MM method, we find that the mean and median of M̂ both equal the true number 1, and the proportion is 1 for sample sizes 250 and 500. Furthermore, our methods show excellent performance in parameter estimation, yielding accurate estimates of β under the different sample sizes.
Scenario 3. We generate clustered right-censored data from a mixture of proportional odds models with two subgroups and two correlated covariates, where the two covariates are generated from a multivariate normal distribution with mean zero and a first-order autoregressive correlation structure ρ^|r−s| for r, s = 1, 2. We set Λ_0(t) = (t/2)² and sample size n = 200. Then, we randomly assign the n subjects to the two subgroups with equal probabilities, i.e., we let P(i ∈ G_1) = P(i ∈ G_2) = 0.5, so that β_i = (3, −1)^T for i ∈ G_1. We choose different values ρ = 0.2, 0.8 and set the censoring proportion at 30% to assess the performance of the proposed estimation procedures. Table 5 reports the mean and median of the estimator M̂ and the proportion of M̂ equal to the true number of subgroups based on 500 replications. Table 6 reports the empirical bias, mean squared error (MSE), and standard error (s.d.) of the estimators of π, β_1, and β_2 based on 500 replications. In Table 5, the results of the profile MM method and the non-profile MM method are largely consistent: the proportions of correctly identifying M are very close to 1, and the smaller the value of ρ, the larger the value of Pro, which shows that our proposed methods can accurately identify the number of subgroups. In Table 6, the estimates at the smaller value of ρ are better and more stable than those at the larger value of ρ for both the profile MM method and the non-profile MM method.

Real Data Analysis
Now, we apply the proposed method to analyze the German Breast Cancer Study data, which are available from the R package "pec". The data contain observations on 686 women, with a censoring rate of 56.41%. To investigate whether there is heterogeneity in the data, we consider "tgrade (I vs. III, II vs. III)" and "pnodes" as the explanatory variables of interest, where "tgrade" indicates the tumor grade, an ordered factor coded by the contrasts I vs. III and II vs. III, and "pnodes" indicates the number of positive lymph nodes. Then, we use the BIC criterion to determine the number of subgroups M. In Table 7, we report the maximum log-likelihood values (LL), the BIC values (BIC), and the estimated parameters for M = 1, 2, 3. Based on the results in Table 7, we find that the optimal M is 1 by comparing the BIC values. The estimated regression coefficients are detailed in Table 7.
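The covariate coding described above can be reproduced with standard dummy contrasts, taking grade III as the reference level; the small data frame below is illustrative, not the actual study data:

```python
import pandas as pd

# Dummy-code the ordered factor "tgrade" with level III as the reference,
# producing the contrasts "I vs. III" and "II vs. III" used in the analysis.
# The toy frame below is illustrative, not the GBSG2 data from package "pec".
df = pd.DataFrame({"tgrade": ["I", "III", "II", "II", "I"],
                   "pnodes": [3, 1, 8, 2, 5]})
dummies = pd.get_dummies(df["tgrade"]).drop(columns="III").astype(int)
X = pd.concat([dummies.rename(columns={"I": "I_vs_III", "II": "II_vs_III"}),
               df["pnodes"]], axis=1)
assert list(X.columns) == ["I_vs_III", "II_vs_III", "pnodes"]
```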

Conclusions
In this work, we introduce the MM algorithm into a semiparametric mixture modeling strategy in the proportional odds model for subgroup analysis of survival data that flexibly allows the covariate effects to differ among several subgroups. Both proposed MM methods for the semiparametric mixture of proportional odds models conduct simultaneous subgroup identification and regression analysis, providing a general frame for constructing iterative algorithms with monotone convergence. The main advantage of our MM algorithms is that they separate the nonparametric baseline hazard from the other regression parameters and help to avoid matrix inversion in high-dimensional regression analysis, which makes the estimation process more efficient. Furthermore, our algorithms mesh well with existing quasi-Newton acceleration and other simple off-the-shelf accelerators that can further speed up estimation. The estimation procedures derived for the semiparametric mixture proportional odds model can be easily extended to other semiparametric or nonparametric mixture models. Although our proposed MM algorithms are developed for the mixture of proportional odds models, a parallel approach can essentially be developed for the more general mixture of transformation models. We will investigate this in future work.

Table 1 .
The mean, median, standard error (s.d.), and the proportion (Pro) of M̂ in Scenario 1.

Table 2 .
Parameter estimation results in Scenario 1.

Table 3 .
The mean, median, and the proportion (Pro) of M̂ in Scenario 2.

Table 4 .
Parameter estimation results in Scenario 2.

Table 5 .
The mean, median, and the proportion (Pro) of M̂ in Scenario 3.

Table 6 .
Parameter estimation results in Scenario 3.