Article

Mixture Modeling of Time-to-Event Data in the Proportional Odds Model

Xifen Huang, Chaosong Xiong, Jinfeng Xu, Jianhua Shi and Jinhong Huang

1 School of Mathematics, Yunnan Normal University, Kunming 650092, China
2 School of Mathematics, Minnan Normal University, Zhangzhou 363000, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(18), 3375; https://doi.org/10.3390/math10183375
Submission received: 18 August 2022 / Revised: 9 September 2022 / Accepted: 11 September 2022 / Published: 16 September 2022
(This article belongs to the Special Issue Recent Advances in Computational Statistics)

Abstract
Subgroup analysis with survival data is essential for a detailed assessment of the risks of medical products in heterogeneous population subgroups. In this paper, we develop a semiparametric mixture modeling strategy in the proportional odds model for simultaneous subgroup identification and regression analysis of survival data, flexibly allowing the covariate effects to differ among several subgroups. Neither the subgroup membership nor the subgroup-specific covariate effects are known a priori. We propose the nonparametric maximum likelihood method together with a pair of MM algorithms with the monotone ascent property to carry out the estimation. Two series of simulation studies are then conducted to examine the finite-sample performance of the proposed estimation procedures. An empirical analysis of German breast cancer data is further provided to illustrate the proposed methodology.

1. Introduction

In some clinical trials, a substantial proportion of patients respond favorably to a new treatment while the others may eventually relapse. Subgroup analysis aims to classify the patients into a few homogeneous groups and to tailor the treatment to each subgroup so as to optimize the treatment effect. In recent years, subgroup identification has received increasing attention in a wide range of fields such as clinical trials, public management, econometrics, and social science. For example, Refs. [1,2] conducted subgroup analyses in econometrics and marketing, while Refs. [3,4] implemented subgroup analyses in epidemiology and biology, respectively.
Statistical methods for subgroup analysis have also developed greatly in recent years. Among them, the finite mixture model has been recognized as an important tool and has been widely used for analyzing data from heterogeneous populations [5]. For example, there are many studies on the Gaussian mixture model for data clustering and classification [6,7,8]. Ref. [9] introduced a structured logistic-normal mixture model to identify subgroups in randomized clinical trials with differential treatment effects. Refs. [10,11] extended the mixture model-based approach to generalized linear models, and Bayesian approaches for mixture regression models were studied by [12]. Moreover, nonparametric mixture models have also been studied in recent years. Ref. [13] studied a nonparametric mixture model for cure rate estimation. Ref. [14] studied a semiparametric accelerated failure time mixture model for estimating a biological treatment effect on a latent subgroup of interest in randomized clinical trials. Ref. [15] proposed a semiparametric logistic–Cox mixture model for subgroup analysis when the outcome of interest is an event time subject to right censoring.
Mixture models are deeply connected to the expectation–maximization (EM) algorithm. The EM algorithm is a popular approach for maximum likelihood estimation in incomplete data problems, of which finite mixtures are canonical examples because the unobserved labels of the individuals (as in unsupervised clustering) have a direct interpretation as missing data [16]. In fact, the EM algorithm is a special case of the general family of MM algorithms [17]. The MM algorithm possesses great flexibility in solving optimization problems because its basic idea is to convert a difficult optimization problem into a series of simpler ones; it has become a powerful tool for optimization and enjoys its greatest vogue in computational statistics. Applications of the MM principle can be found in a broad range of statistical contexts, including the Bradley–Terry model [18], quantile regression [19], variable selection [20,21], the proportional odds model [22], the shared frailty model [23], distance majorization [24], and so on. A key property of the MM principle is that it can decompose a high-dimensional objective function into separable low-dimensional functions through the construction of a surrogate function. In this paper, we introduce the general MM principle to the semiparametric mixture of proportional odds models for simultaneous subgroup identification and regression analysis.
The rest of the paper is organized as follows. Section 2 reviews the MM algorithm. Section 3 presents the latent proportional odds model and develops a pair of estimation procedures for the proposed model using the MM algorithm. Section 4 provides two sets of simulation studies to select the number of subgroups and assess the finite-sample performance of the proposed methods. Section 5 applies the proposed methods to the German breast cancer study data to illustrate their practical utility.

2. MM Principle

The MM algorithm is an important and powerful tool for optimization problems and enjoys its greatest vogue in computational statistics. Suppose $\ell(\alpha\,|\,Y_{obs})$ is the objective log-likelihood function, $\alpha = (\alpha_1, \ldots, \alpha_q)^\top \in \Theta$ is the vector of parameters to be estimated, and $\Theta$ is the parameter space. The maximum likelihood estimate of $\alpha$ is $\hat{\alpha} = \arg\max_{\alpha \in \Theta} \ell(\alpha\,|\,Y_{obs})$. The MM principle provides a general framework for constructing iterative algorithms with monotone convergence, and the two M's do double duty: in maximization problems, the first M stands for minorize and the second M for maximize. The minorization step first constructs a surrogate function $Q(\alpha\,|\,\alpha^{(k)})$ such that

$$Q(\alpha\,|\,\alpha^{(k)}) \le \ell(\alpha\,|\,Y_{obs}), \quad \forall\, \alpha, \alpha^{(k)} \in \Theta, \qquad Q(\alpha^{(k)}\,|\,\alpha^{(k)}) = \ell(\alpha^{(k)}\,|\,Y_{obs}),$$

where $\alpha^{(k)}$ denotes the current estimate of $\alpha$ in the $k$-th iteration. The maximization step then updates $\alpha^{(k)}$ to $\alpha^{(k+1)}$, which maximizes the surrogate function $Q(\cdot\,|\,\alpha^{(k)})$ instead of $\ell(\alpha\,|\,Y_{obs})$, that is,

$$\alpha^{(k+1)} = \arg\max_{\alpha \in \Theta} Q(\alpha\,|\,\alpha^{(k)}).$$

Since

$$\ell(\alpha^{(k+1)}\,|\,Y_{obs}) \ge Q(\alpha^{(k+1)}\,|\,\alpha^{(k)}) \ge Q(\alpha^{(k)}\,|\,\alpha^{(k)}) = \ell(\alpha^{(k)}\,|\,Y_{obs}),$$

the constructed MM algorithm increases the objective function at each iteration; this ascent property drives the objective function $\ell(\alpha\,|\,Y_{obs})$ uphill.
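To make the minorize–maximize recipe concrete, the following minimal Python sketch applies it to a toy problem: computing the sample median by minimizing $f(\theta) = \sum_i |x_i - \theta|$ with the quadratic majorization of Hunter and Lange [19] (the majorize–minimize dual of the scheme above). All names in the sketch are our own illustration.

```python
import numpy as np

# A minimal sketch of the MM principle on a toy problem: the sample median
# minimizes f(theta) = sum_i |x_i - theta|.  At the current iterate theta_k,
# |x - theta| <= (x - theta)^2 / (2 |x - theta_k|) + |x - theta_k| / 2,
# with equality at theta = theta_k, so minimizing this quadratic majorizer
# drives f downhill at every iteration (the descent analogue of the ascent
# property shown above).
def mm_median(x, n_iter=50, eps=1e-10):
    theta = x.mean()  # any starting value works
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(x - theta), eps)  # majorizer weights
        theta = np.sum(w * x) / np.sum(w)             # weighted-least-squares step
    return theta

x = np.random.default_rng(0).normal(size=101)
print(mm_median(x), np.median(x))  # the two values should nearly coincide
```

Each iteration solves a weighted least-squares problem, which is exactly the "series of simpler problems" that the MM principle trades a difficult objective for.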

3. Proportional Odds Model with Individual-Specific Covariate Effects

Let $T$ be the time to event. The proportional odds model postulates that

$$\lambda_i(t\,|\,X) = \frac{\lambda_0(t)\exp(X_i^\top\beta)}{1 + \Lambda_0(t)\exp(X_i^\top\beta)},$$

where $\lambda_i(t\,|\,X)$ is the hazard function of $T_i$ given the covariates $X_i$, $\lambda_0(t)$ is the baseline hazard, and $\Lambda_0(t)$ is the cumulative baseline hazard. Let the conditional survival function of $T$ be $S(t\,|\,X) = P(T > t\,|\,X)$, so that $\lambda(t\,|\,X) = -\,\mathrm{d}\log S(t\,|\,X)/\mathrm{d}t$; under the model above, $S(t\,|\,X) = [1 + \Lambda_0(t)\exp(X^\top\beta)]^{-1}$. In the proportional odds model, $\beta$ is the vector of regression coefficients quantifying the effect of the covariates $X$ on the time to event $T$ through the conditional hazard function, and it is assumed to be the same for all subjects in the population. In practice, however, subjects may come from different subgroups in which the covariate effects differ, and it is therefore more appropriate to assume the following proportional odds model with individual-specific covariate effects:

$$\lambda_i(t\,|\,X) = \frac{\lambda_0(t)\exp(X_i^\top\beta_i)}{1 + \Lambda_0(t)\exp(X_i^\top\beta_i)}.$$

In this model, the covariate effects $\beta_i$ may differ across subjects. For both parsimony and better interpretation, it is reasonable to assume that $\beta_i = \beta_{0,m}$ with probability $\pi_m$, $m = 1, \ldots, M$. In other words, there are only $M$ different subgroups for the covariate effects $\beta_i$, where $\beta_{0,m}$, $m = 1, \ldots, M$, are $M$ distinct regression coefficient vectors. Our interest is to estimate the number of subgroups $M$, the coefficients $\beta_{0,m}$, $m = 1, \ldots, M$, and the proportions $\pi_m$, $m = 1, \ldots, M$, where $\sum_{m=1}^{M} \pi_m = 1$.

3.1. Heterogeneity Regression Pursuit via MM Algorithm

The joint density function of $(T, \delta)$ can be written as

$$f(t, \delta\,|\,X) = \sum_{m=1}^{M} \pi_m f_m(t, \delta\,|\,X),$$

where

$$f_m(t, \delta\,|\,X) = \left[\frac{\lambda_0(t)\exp(X^\top\beta_m)}{1 + \Lambda_0(t)\exp(X^\top\beta_m)}\right]^{\delta} \cdot \frac{1}{1 + \Lambda_0(t)\exp(X^\top\beta_m)}$$

denotes the density function of the $m$-th subgroup, $m = 1, 2, \ldots, M$, and $\beta_m$ is the corresponding effect parameter of $X$ in the $m$-th subgroup. Given the observed data $Y_{obs} = (\{t_i\}_{i=1}^{n}, \{\delta_i\}_{i=1}^{n}, \{X_i\}_{i=1}^{n})$, the observed log-likelihood function is

$$\ell(\Lambda_0, \beta, \pi\,|\,Y_{obs}) = \sum_{i=1}^{n} \log \sum_{m=1}^{M} \pi_m f_m(t_i, \delta_i\,|\,X_i),$$

where $\Lambda_0(t) = \sum_{i=1}^{n} I(t_i \le t)\,\lambda_0(t_i)$, $\beta = (\beta_1^\top, \ldots, \beta_M^\top)^\top$, and $\pi = (\pi_1, \ldots, \pi_M)^\top$. Given the parameters in the $k$-th iteration, denote

$$\upsilon_{im}^{(k)} = \frac{\pi_m^{(k)} \cdot f_m^{(k)}(t_i, \delta_i\,|\,X_i)}{\sum_{m'=1}^{M} \pi_{m'}^{(k)} \cdot f_{m'}^{(k)}(t_i, \delta_i\,|\,X_i)}; \tag{2}$$
then we can rewrite $\ell(\Lambda_0, \beta, \pi\,|\,Y_{obs})$ as

$$\ell(\Lambda_0, \beta, \pi\,|\,Y_{obs}) = \sum_{i=1}^{n} \log \sum_{m=1}^{M} \upsilon_{im}^{(k)} \cdot \frac{\pi_m \cdot f_m(t_i, \delta_i\,|\,X_i)}{\upsilon_{im}^{(k)}}.$$

The continuous version of Jensen's inequality, $\varphi\big(\int_{\Omega} f(x)\, g(x)\,\mathrm{d}x\big) \ge \int_{\Omega} \varphi(f(x))\, g(x)\,\mathrm{d}x$ for a concave function $\varphi(\cdot)$ and a density $g(x)$, allows the function $\varphi(\cdot)$ to be moved inside the integral. Inspired by this feature, the weights $\upsilon_{im}^{(k)}$ in Equation (2), which form a discrete distribution over $m$ for each $i$, play the role of $g(x)$, while $\pi_m \cdot f_m(t_i, \delta_i\,|\,X_i)/\upsilon_{im}^{(k)}$ plays the role of $f(x)$. Applying the inequality with $\varphi = \log$,

$$\sum_{i=1}^{n} \log \sum_{m=1}^{M} \upsilon_{im}^{(k)} \cdot \frac{\pi_m \cdot f_m(t_i, \delta_i\,|\,X_i)}{\upsilon_{im}^{(k)}} \ge \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \cdot \big[\log \pi_m + \log f_m(t_i, \delta_i\,|\,X_i)\big] + c^{(k)},$$

where $c^{(k)} = -\sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \log \upsilon_{im}^{(k)}$ does not depend on the parameters. The logarithm outside the sum is thus transferred inside, which breaks the product terms into a summation. Hence, dropping the constant, we construct the surrogate function for $\ell(\Lambda_0, \beta, \pi\,|\,Y_{obs})$ as

$$Q(\Lambda_0, \beta, \pi\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) = \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \cdot \big[\log \pi_m + \log f_m(t_i, \delta_i\,|\,X_i)\big] \triangleq Q(\pi\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) + Q(\Lambda_0, \beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}),$$
where

$$Q(\pi\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) = \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \cdot \log \pi_m \tag{3}$$

and

$$Q(\Lambda_0, \beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) = \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \log f_m(t_i, \delta_i\,|\,X_i) = \sum_{i=1}^{n} \delta_i \log \lambda_0(t_i) + \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \delta_i X_i^\top \beta_m - \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} (\delta_i + 1) \log\big(1 + \Lambda_0(t_i) \exp(X_i^\top \beta_m)\big). \tag{4}$$

The surrogate function $Q(\Lambda_0, \beta, \pi\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)})$ separates the parameters $\pi$ and $(\Lambda_0, \beta)$ into (3) and (4), respectively. All the parameters $\{\pi_m\}_{m=1}^{M}$ in (3) are separated from each other, so that maximizing (3) subject to $\sum_{m=1}^{M} \pi_m = 1$ gives the straightforward update

$$\hat{\pi}_m = \frac{\sum_{i=1}^{n} \upsilon_{im}^{(k)}}{n}, \quad m = 1, \ldots, M.$$
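As a sketch of how the weights of Equation (2) and the closed-form update of $\pi_m$ translate into code (the array names and the `dens` input are our own illustration, not the authors' implementation):

```python
import numpy as np

# Hypothetical sketch of one weight/mixing-proportion update.  dens[m, i] is
# assumed to hold f_m(t_i, delta_i | X_i) evaluated at the current iterate.
def update_weights_and_pi(dens, pi):
    num = pi[:, None] * dens                   # pi_m^(k) * f_m^(k)(t_i, delta_i | X_i)
    v = num / num.sum(axis=0, keepdims=True)   # weights v_im^(k) of Equation (2)
    pi_new = v.mean(axis=1)                    # pi_m update: (1/n) sum_i v_im^(k)
    return v, pi_new
```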
To update $(\Lambda_0, \beta)$, we apply the supporting hyperplane inequality

$$\log(x) \le \log(x_0) + \frac{x - x_0}{x_0}$$

to Equation (4) to release its argument from the logarithm. This yields

$$\log\big(1 + \Lambda_0(t_i) \exp(X_i^\top \beta_m)\big) \le \log(A_{im}^{(k)}) + \frac{1 + \Lambda_0(t_i) \exp(X_i^\top \beta_m) - A_{im}^{(k)}}{A_{im}^{(k)}},$$

where $A_{im}^{(k)} = 1 + \Lambda_0^{(k)}(t_i) \exp(X_i^\top \beta_m^{(k)})$. Then, up to an additive constant not depending on $(\Lambda_0, \beta)$, we obtain the following surrogate function for $Q(\Lambda_0, \beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)})$:

$$Q_1(\Lambda_0, \beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) = \sum_{i=1}^{n} \delta_i \log \lambda_0(t_i) + \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \delta_i X_i^\top \beta_m - \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} (\delta_i + 1) \frac{\Lambda_0(t_i) \exp(X_i^\top \beta_m)}{A_{im}^{(k)}}.$$

3.2. Profile MM Method

Following [25,26], we consider the profile estimation approach and first profile out $\Lambda_0$ in $Q_1(\Lambda_0, \beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)})$ for any given $\beta$. This leads to the estimate of $\Lambda_0$ given $\beta$ as

$$\hat{\lambda}_0(t_i) = \frac{\delta_i}{\sum_{j=1}^{n} I(t_j \ge t_i) \sum_{m=1}^{M} \upsilon_{jm}^{(k)} (\delta_j + 1) \exp(X_j^\top \beta_m) / A_{jm}^{(k)}}. \tag{6}$$

Substituting (6) into $Q_1(\Lambda_0, \beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)})$ yields, up to an additive constant, the function

$$Q_2(\beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) = \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \delta_i X_i^\top \beta_m - \sum_{i=1}^{n} \delta_i \log\Big(\sum_{j=1}^{n} I(t_j \ge t_i) \sum_{m=1}^{M} \upsilon_{jm}^{(k)} (\delta_j + 1) \exp(X_j^\top \beta_m) / A_{jm}^{(k)}\Big).$$

We use the supporting hyperplane inequality again on $Q_2(\beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)})$ and obtain the following $Q_3(\beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)})$, in which all $\beta_m$ ($m = 1, \ldots, M$) are separated from each other:

$$\begin{aligned} Q_3(\beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) &= \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \delta_i X_i^\top \beta_m - \sum_{i=1}^{n} \frac{\delta_i}{B_i^{(k)}} \sum_{j=1}^{n} I(t_j \ge t_i) \sum_{m=1}^{M} \upsilon_{jm}^{(k)} (\delta_j + 1) \exp(X_j^\top \beta_m) / A_{jm}^{(k)} \\ &= \sum_{m=1}^{M} \Big[ \sum_{i=1}^{n} \upsilon_{im}^{(k)} \delta_i X_i^\top \beta_m - \sum_{i=1}^{n} \frac{\delta_i}{B_i^{(k)}} \sum_{j=1}^{n} I(t_j \ge t_i)\, \upsilon_{jm}^{(k)} (\delta_j + 1) \exp(X_j^\top \beta_m) / A_{jm}^{(k)} \Big] \\ &\triangleq \sum_{m=1}^{M} Q_3(\beta_m\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}), \end{aligned}$$

where $B_i^{(k)} = \sum_{j=1}^{n} I(t_j \ge t_i) \sum_{m=1}^{M} \upsilon_{jm}^{(k)} (\delta_j + 1) \exp(X_j^\top \beta_m^{(k)}) / A_{jm}^{(k)}$. Finally, the estimate of each $\beta_m$ can be obtained by a one-step Newton iteration.
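A minimal sketch of this one-step Newton–Raphson update, derived by differentiating $Q_3(\beta_m\,|\,\cdot)$; all input names below are our own and the iteration-$k$ quantities are assumed precomputed:

```python
import numpy as np

# Hypothetical sketch of the one-step Newton-Raphson update maximizing
# Q3(beta_m | .).  Inputs, all iteration-k quantities under our own naming:
#   X (n, q); delta (n,); v = v_im^(k) (n,);
#   w = v_im^(k) * (delta_i + 1) / A_im^(k) (n,); B = B_i^(k) (n,);
#   risk[i, j] = I(t_j >= t_i), an (n, n) 0/1 matrix.
def profile_newton_step(beta_m, X, delta, v, w, B, risk):
    e = w * np.exp(X @ beta_m)           # w_j * exp(X_j beta_m)
    c = risk.T @ (delta / B)             # c_j = sum_i I(t_j >= t_i) delta_i / B_i^(k)
    grad = X.T @ (v * delta) - X.T @ (c * e)        # gradient of Q3 w.r.t. beta_m
    hess = -(X * (c * e)[:, None]).T @ X            # Hessian of Q3 (negative definite)
    return beta_m - np.linalg.solve(hess, grad)     # one ascent step per iteration
```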

3.3. Non-Profile MM Method

For the above profile MM method, the estimate of $\Lambda_0$ is highly related to the estimate of $\beta$ because the nonparametric component $\Lambda_0$ is treated as a function of $\beta$ in the profile step. Inspired by the parameter-separation property of the MM principle, we further separate the nonparametric part $\Lambda_0$ from $\beta$. To this end, we apply the following inequality of arithmetic and geometric means to the function $Q_1(\Lambda_0, \beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)})$:

$$\prod_{i=1}^{n} x_i^{a_i} \le \sum_{i=1}^{n} \frac{a_i}{a}\, x_i^{a}, \qquad a = \sum_{i=1}^{n} a_i,$$

for positive $x_i$ and $a_i$. Here, we let $x_1 = \Lambda_0(t_i)/\Lambda_0^{(k)}(t_i)$ and $x_2 = \exp(X_i^\top \beta_m)/\exp(X_i^\top \beta_m^{(k)})$ with $a_1 = a_2 = 1$; then we have

$$\frac{\Lambda_0(t_i) \exp(X_i^\top \beta_m)}{\Lambda_0^{(k)}(t_i) \exp(X_i^\top \beta_m^{(k)})} \le \frac{\Lambda_0^2(t_i)}{2\,\Lambda_0^{(k)\,2}(t_i)} + \frac{\exp(2 X_i^\top \beta_m)}{2 \exp(2 X_i^\top \beta_m^{(k)})}.$$

That is,

$$\Lambda_0(t_i) \exp(X_i^\top \beta_m) \le \frac{\exp(X_i^\top \beta_m^{(k)})}{2\,\Lambda_0^{(k)}(t_i)}\, \Lambda_0^2(t_i) + \frac{\Lambda_0^{(k)}(t_i)}{2 \exp(X_i^\top \beta_m^{(k)})}\, \exp(2 X_i^\top \beta_m).$$

Substituting the above inequality back into $Q_1(\Lambda_0, \beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)})$, we obtain

$$Q_4(\Lambda_0, \beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) = \sum_{i=1}^{n} \delta_i \log \lambda_0(t_i) + \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \delta_i X_i^\top \beta_m - \sum_{i=1}^{n} \sum_{m=1}^{M} \frac{\upsilon_{im}^{(k)} (\delta_i + 1)}{A_{im}^{(k)}} \Big[ \frac{\exp(X_i^\top \beta_m^{(k)})}{2\,\Lambda_0^{(k)}(t_i)}\, \Lambda_0^2(t_i) + \frac{\Lambda_0^{(k)}(t_i)}{2 \exp(X_i^\top \beta_m^{(k)})}\, \exp(2 X_i^\top \beta_m) \Big] \triangleq Q_4(\Lambda_0\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) + Q_4(\beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}),$$
where

$$Q_4(\Lambda_0\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) = \sum_{i=1}^{n} \delta_i \log \lambda_0(t_i) - \sum_{i=1}^{n} \sum_{m=1}^{M} \frac{\upsilon_{im}^{(k)} (\delta_i + 1) \exp(X_i^\top \beta_m^{(k)})}{2\,\Lambda_0^{(k)}(t_i)\, A_{im}^{(k)}}\, \Lambda_0^2(t_i)$$

and

$$Q_4(\beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) = \sum_{i=1}^{n} \sum_{m=1}^{M} \upsilon_{im}^{(k)} \delta_i X_i^\top \beta_m - \sum_{i=1}^{n} \sum_{m=1}^{M} \frac{\upsilon_{im}^{(k)} (\delta_i + 1)\, \Lambda_0^{(k)}(t_i)}{2 \exp(X_i^\top \beta_m^{(k)})\, A_{im}^{(k)}}\, \exp(2 X_i^\top \beta_m).$$

It is observed that the parameters $\Lambda_0$ and $\beta_m$ are completely separated, so the corresponding updates can be obtained by differentiating the two parts separately. Setting $\partial Q_4(\Lambda_0\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)})/\partial \Lambda_0 = 0$, we obtain the estimate of $\Lambda_0$ through

$$\hat{\lambda}_0(t_i) = \frac{\delta_i}{\sum_{j=1}^{n} I(t_j \ge t_i) \sum_{m=1}^{M} \upsilon_{jm}^{(k)} (\delta_j + 1) \exp(X_j^\top \beta_m^{(k)}) / A_{jm}^{(k)}}.$$
To update $\beta_m$, we calculate the first and second derivatives of $Q_4(\beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)})$ with respect to $\beta_m$ as follows:

$$\frac{\partial Q_4}{\partial \beta_m}(\beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) = \sum_{i=1}^{n} \upsilon_{im}^{(k)} \delta_i X_i - \sum_{i=1}^{n} \frac{\upsilon_{im}^{(k)} (\delta_i + 1)\, \Lambda_0^{(k)}(t_i)}{\exp(X_i^\top \beta_m^{(k)})\, A_{im}^{(k)}}\, \exp(2 X_i^\top \beta_m)\, X_i$$

and

$$\frac{\partial^2 Q_4}{\partial \beta_m \partial \beta_m^\top}(\beta\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) = -2 \sum_{i=1}^{n} \frac{\upsilon_{im}^{(k)} (\delta_i + 1)\, \Lambda_0^{(k)}(t_i)}{\exp(X_i^\top \beta_m^{(k)})\, A_{im}^{(k)}}\, \exp(2 X_i^\top \beta_m)\, X_i X_i^\top.$$

Then, $\beta_m$ can be updated by

$$\beta_m^{(k+1)} = \beta_m^{(k)} - \Big[ \frac{\partial^2 Q_4}{\partial \beta_m \partial \beta_m^\top}(\beta^{(k)}\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}) \Big]^{-1} \frac{\partial Q_4}{\partial \beta_m}(\beta^{(k)}\,|\,\Lambda_0^{(k)}, \beta^{(k)}, \pi^{(k)}).$$
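A minimal sketch of the two separated non-profile updates, under our own naming conventions (all iteration-$k$ quantities are assumed precomputed; the $\beta_m$ step is applied once per subgroup $m$):

```python
import numpy as np

# Hypothetical sketch of the separated non-profile updates.  Naming:
#   X (n, q); delta (n,); risk[i, j] = I(t_j >= t_i), an (n, n) 0/1 matrix;
#   V, A: (n, M) arrays of v_im^(k) and A_im^(k); Beta_k: (q, M);
#   for a fixed m, v = V[:, m], a = A[:, m], Lam = Lambda_0^(k)(t_i) (n,).
def nonprofile_lambda0(delta, V, A, X, Beta_k, risk):
    # Denominator: sum_{j: t_j >= t_i} sum_m v_jm (delta_j + 1) exp(X_j b_m^(k)) / A_jm
    u = (V * np.exp(X @ Beta_k) / A).sum(axis=1) * (delta + 1)
    return delta / (risk @ u)            # jump sizes lambda_0(t_i); 0 when censored

def nonprofile_beta_step(beta_k, X, delta, v, a, Lam):
    w = v * (delta + 1) * Lam * np.exp(-(X @ beta_k)) / a  # common weight factor
    e2 = np.exp(2.0 * (X @ beta_k))                        # exp(2 X_i beta_m) at beta^(k)
    grad = X.T @ (v * delta) - X.T @ (w * e2)              # first derivative of Q4
    hess = -2.0 * (X * (w * e2)[:, None]).T @ X            # second derivative of Q4
    return beta_k - np.linalg.solve(hess, grad)            # Newton-Raphson update
```

Because the Hessian here is block-diagonal across subgroups (and diagonal in $\Lambda_0$), each update only inverts a small $q \times q$ matrix, which is the matrix-inversion saving highlighted in the conclusions.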

4. Simulation Study

Based on the estimation procedures derived in the previous sections, we simulate data to examine the estimation results at finite sample sizes. Since the number of groups $M$ in the mixture of proportional odds models is unknown, it is estimated in a data-driven manner. Here, we use the modified Bayesian information criterion (BIC [19]) to choose the number of components $M$ by minimizing the criterion function

$$\mathrm{BIC}_M = -2\,\ell(\hat{\Lambda}_0, \hat{\beta}, \hat{\pi}\,|\,Y_{obs}) + M q \log(n),$$

where $n$ is the sample size and $q$ is the dimension of $\beta_m$. Note that this criterion is closely related to the marginal likelihood computation, as can be seen in [27,28,29].
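In code, the selection rule is a one-line scan over candidate values of $M$; the routine `fit_mixture_po` below is a hypothetical stand-in for either MM algorithm of Section 3, assumed to return the maximized log-likelihood of an $M$-component fit:

```python
import numpy as np

# Sketch of the BIC-based choice of M.  fit_mixture_po is hypothetical and
# stands in for the profile or non-profile MM fit of Section 3.
def select_num_subgroups(data, n, q, M_max=5):
    bic = [-2.0 * fit_mixture_po(data, M) + M * q * np.log(n)
           for M in range(1, M_max + 1)]
    return int(np.argmin(bic)) + 1  # the M minimizing BIC_M
```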
Scenario 1. We generate right-censored data from a mixture of proportional odds models with two subgroups and two covariates,

$$\lambda_i(t\,|\,X) = \frac{\lambda_0(t)\exp(X_i^\top\beta_i)}{1 + \Lambda_0(t)\exp(X_i^\top\beta_i)},$$

where the two covariates $X_{i1}$ and $X_{i2}$ are independent and follow the standard normal distribution, and $\Lambda_0(t) = (t/2)^2$. We randomly assign the $n$ subjects to two subgroups with equal probabilities, i.e., we let $P(i \in G_1) = P(i \in G_2) = 0.5$, so that $\beta_i = (3, 1)$ for $i \in G_1$ and $\beta_i = (-3, 2)$ for $i \in G_2$. We choose different sample sizes $n = 150, 250, 500$ and set the censoring proportion at 30% to assess the performance of the proposed estimation procedures.
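The event times can be generated by inverting the conditional survival function $S(t\,|\,X) = [1 + \Lambda_0(t)\exp(X^\top\beta)]^{-1}$; a sketch follows, in which the exponential censoring mechanism and its scale are our own assumption, since the text only states the 30% censoring proportion:

```python
import numpy as np

# Sketch of the Scenario 1 data generation by inverting
# S(t | X) = 1 / (1 + Lambda0(t) exp(X beta)) with Lambda0(t) = (t / 2)^2:
# S(T) = u  =>  T = 2 * sqrt((1/u - 1) * exp(-X beta)).
rng = np.random.default_rng(2022)
n = 250
X = rng.standard_normal((n, 2))
g = rng.integers(0, 2, size=n)                     # P(i in G1) = P(i in G2) = 0.5
beta = np.where(g[:, None] == 0, (3.0, 1.0), (-3.0, 2.0))
eta = (X * beta).sum(axis=1)
u = rng.uniform(size=n)
T = 2.0 * np.sqrt((1.0 / u - 1.0) * np.exp(-eta))
C = rng.exponential(scale=8.0, size=n)  # assumed mechanism; scale tuned toward 30% censoring
t_obs, delta = np.minimum(T, C), (T <= C).astype(int)
```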
Table 1 reports the mean and median of the estimator $\hat{M}$ and the proportion (Pro) of $\hat{M}$ equal to the true number of subgroups based on 500 replications. Table 2 reports the empirical bias (BIAS), mean squared error (MSE), and standard deviation (s.d.) of the estimators $\hat{\pi}_1$, $\hat{\beta}_1$, and $\hat{\beta}_2$ based on 500 replications. We find that, as the sample size increases, the mean of $\hat{M}$ gradually approaches the true number of subgroups 2, the median of $\hat{M}$ remains at 2, and the proportion of correctly identifying the true number of subgroups is close to 1. Moreover, our methods estimate the parameters well, with small empirical bias, MSE, and standard deviation, even at small sample sizes.
Scenario 2. We generate right-censored data from a proportional odds model with three covariates,

$$\lambda_i(t\,|\,X) = \frac{\lambda_0(t)\exp(X_i^\top\beta_i)}{1 + \Lambda_0(t)\exp(X_i^\top\beta_i)},$$

where the three covariates $X_{i1}$, $X_{i2}$, and $X_{i3}$ are independent and follow the standard normal distribution. We set $\beta = (1, -3, 2)$ and $\Lambda_0(t) = (t/2)^2$ for all subjects. Note that this model corresponds to the latent proportional odds model with the true number of subgroups $M$ being 1. We set the censoring proportion at 30% and choose sample sizes $n = 250, 500$ to assess the performance of the proposed estimation procedures.
Table 3 reports the mean and median of the estimator $\hat{M}$ and the proportion of $\hat{M}$ equal to the true number of subgroups based on 200 replications. Table 4 reports the empirical bias, mean squared error (MSE), and standard deviation (s.d.) of the estimator $\hat{\beta}$ based on 500 replications. For the profile MM method, the median of $\hat{M}$ equals the true number 1, the mean gets closer to 1, and the proportion of correct identification approaches 1 as the sample size increases. For the non-profile MM method, the mean and median of $\hat{M}$ both equal the true number 1, and the proportion of correct identification is 1 for both sample sizes 250 and 500. Furthermore, our methods show excellent performance in parameter estimation, yielding accurate estimates of $\beta$ under the different sample sizes.
Scenario 3. We generate right-censored data from a mixture of proportional odds models with two subgroups and two correlated covariates,

$$\lambda_i(t\,|\,X) = \frac{\lambda_0(t)\exp(X_i^\top\beta_i)}{1 + \Lambda_0(t)\exp(X_i^\top\beta_i)},$$

where the two covariates are generated from a multivariate normal distribution with mean zero and a first-order autoregressive correlation structure $\rho^{|r-s|}$ for $r, s = 1, 2$. We set $\Lambda_0(t) = (t/2)^2$ and the sample size $n = 200$. We then randomly assign the $n$ subjects to two subgroups with equal probabilities, i.e., we let $P(i \in G_1) = P(i \in G_2) = 0.5$, so that $\beta_i = (3, 1)$ for $i \in G_1$ and $\beta_i = (-3, 2)$ for $i \in G_2$. We consider $\rho = 0.2, 0.8$ and set the censoring proportion at 30% to assess the performance of the proposed estimation procedures.
Table 5 reports the mean and median of the estimator $\hat{M}$ and the proportion of $\hat{M}$ equal to the true number of subgroups based on 500 replications. Table 6 reports the empirical bias, mean squared error (MSE), and standard deviation (s.d.) of the estimators $\hat{\pi}_1$, $\hat{\beta}_1$, and $\hat{\beta}_2$ based on 500 replications. In Table 5, the results of the profile MM method and the non-profile MM method are essentially the same: the proportions of correct identification are very close to 1, and the smaller the value of $\rho$, the larger the value of Pro, which shows that our proposed methods can accurately identify the number of subgroups. In Table 6, the estimation results for the smaller value of $\rho$ are better and more stable than those for the larger value of $\rho$, for both the profile MM method and the non-profile MM method.

5. Real Data Analysis

We now apply the proposed methods to the German Breast Cancer Study data, which are available from the R package "pec". The data contain observations on 686 women with a censoring rate of 56.41%. To examine whether there is heterogeneity in the data, we consider "tgrade" (I vs. III and II vs. III) and "pnodes" as the explanatory variables of interest, where "tgrade" indicates the tumor grade, an ordered factor with levels I, II, and III coded by the two contrasts I vs. III and II vs. III, and "pnodes" indicates the number of positive lymph nodes. We then use the BIC criterion to determine the number of subgroups $M$. Table 7 reports the maximized log-likelihood values (LL), the BIC values, and the estimated parameters for $M = 1, 2, 3$. Comparing the BIC values in Table 7, we find that the optimal $M$ is 1; the corresponding estimated regression coefficients are detailed in Table 7.
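For readers working in Python, a sketch of assembling the same covariates follows; it assumes the GBSG2 data shipped with the lifelines package mirrors R's pec dataset (column names and factor labels follow lifelines' copy and are not taken from the paper):

```python
import pandas as pd
from lifelines.datasets import load_gbsg2  # assumed to mirror R's pec::GBSG2

df = load_gbsg2()                 # 686 women; cens = 1 indicates an observed event
print(1.0 - df["cens"].mean())    # censoring rate, roughly 0.5641
# Dummy-code the ordered factor tgrade with grade III as the reference level
# and keep pnodes, the number of positive lymph nodes, as a count covariate.
X = pd.DataFrame({
    "tgrade_I_vs_III": (df["tgrade"] == "I").astype(int),
    "tgrade_II_vs_III": (df["tgrade"] == "II").astype(int),
    "pnodes": df["pnodes"],
})
```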

6. Conclusions

In this work, we introduced the MM algorithm into a semiparametric mixture modeling strategy for the proportional odds model in subgroup analysis of survival data, flexibly allowing the covariate effects to differ among several subgroups. Both proposed MM methods for the semiparametric mixture of proportional odds models conduct simultaneous subgroup identification and regression analysis, and they provide a general framework for constructing iterative algorithms with monotone convergence. The main advantage of our MM algorithms is that they can separate the nonparametric baseline hazard from the other regression parameters and help avoid matrix inversion in high-dimensional regression analysis, which makes the estimation process more efficient. Furthermore, our algorithms mesh well with existing quasi-Newton acceleration and other simple off-the-shelf accelerators that can further speed up the estimation. The estimation procedures derived for the semiparametric mixture of proportional odds models can be easily extended to other semiparametric or nonparametric mixture models. Although our proposed MM algorithms are developed for the mixture of proportional odds models, a parallel approach can essentially be developed for the more general mixture of transformation models. We will investigate this in future work.

Author Contributions

Conceptualization, J.X., X.H.; Data curation, X.H., C.X. and J.H.; Formal analysis, X.H. and C.X.; Investigation, X.H. and J.S.; Methodology, X.H., J.X. and J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Frühwirth-Schnatter, S. Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J. Am. Stat. Assoc. 2001, 96, 194–209. [Google Scholar] [CrossRef]
  2. Rossi, P.E.; Allenby, G.M. Bayesian statistics and marketing. Mark. Sci. 2003, 22, 304–328. [Google Scholar] [CrossRef]
  3. Green, P.J.; Richardson, S. Hidden Markov models and disease mapping. J. Am. Stat. Assoc. 2002, 97, 1055–1070. [Google Scholar] [CrossRef]
  4. Wang, P.; Puterman, M.L.; Cockburn, I.; Le, N. Mixed Poisson regression models with covariate dependent rates. Biometrics 1996, 52, 381–400. [Google Scholar] [CrossRef] [PubMed]
  5. Everitt, B. Finite Mixture Distributions; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  6. Banfield, J.D.; Raftery, A.E. Model-based Gaussian and non-Gaussian clustering. Biometrics 1993, 49, 803–821. [Google Scholar] [CrossRef]
  7. Hastie, T.; Tibshirani, R. Discriminant analysis by Gaussian mixtures. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 155–176. [Google Scholar] [CrossRef]
  8. McNicholas, P.D. Model-based classification using latent Gaussian mixture models. J. Stat. Plan. Inference 2010, 140, 1175–1181. [Google Scholar] [CrossRef]
  9. Shen, J.; He, X. Inference for subgroup analysis with a structured logistic-normal mixture model. J. Am. Stat. Assoc. 2015, 110, 303–312. [Google Scholar] [CrossRef]
  10. Chaganty, A.T.; Liang, P. Spectral experts for estimating mixtures of linear regressions. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1040–1048. [Google Scholar]
  11. Hurn, M.; Justel, A.; Robert, C.P. Estimating mixtures of regressions. J. Comput. Graph. Stat. 2003, 12, 55–79. [Google Scholar] [CrossRef]
  12. Frühwirth-Schnatter, S. Finite Mixture and Markov Switching Models; Springer: Berlin/Heidelberg, Germany, 2006; Volume 425. [Google Scholar]
  13. Peng, Y.; Dear, K.B. A nonparametric mixture model for cure rate estimation. Biometrics 2000, 56, 237–243. [Google Scholar] [CrossRef]
  14. Altstein, L.; Li, G. Latent subgroup analysis of a randomized clinical trial through a semiparametric accelerated failure time mixture model. Biometrics 2013, 69, 52–61. [Google Scholar] [CrossRef]
  15. Wu, R.F.; Zheng, M.; Yu, W. Subgroup analysis with time-to-event data under a logistic-Cox mixture model. Scand. J. Stat. 2016, 43, 863–878. [Google Scholar] [CrossRef]
  16. Becker, M.P.; Yang, I.; Lange, K. EM algorithms without missing data. Stat. Methods Med. Res. 1997, 6, 38–54. [Google Scholar] [CrossRef]
  17. Lange, K.; Hunter, D.R.; Yang, I. Optimization transfer using surrogate objective functions. J. Comput. Graph. Stat. 2000, 9, 1–20. [Google Scholar]
  18. Hunter, D.R. MM algorithms for generalized Bradley-Terry models. Ann. Stat. 2004, 32, 384–406. [Google Scholar] [CrossRef]
  19. Hunter, D.R.; Lange, K. Quantile regression via an MM algorithm. J. Comput. Graph. Stat. 2000, 9, 60–77. [Google Scholar]
  20. Hunter, D.R.; Li, R. Variable selection using MM algorithms. Ann. Stat. 2005, 33, 1617–1642. [Google Scholar] [CrossRef]
  21. Yen, T.J. A majorization–minimization approach to variable selection using spike and slab priors. Ann. Stat. 2011, 39, 1748–1775. [Google Scholar] [CrossRef]
  22. Hunter, D.R.; Lange, K. Computing estimates in the proportional odds model. Ann. Inst. Stat. Math. 2002, 54, 155–168. [Google Scholar] [CrossRef]
  23. Huang, X.; Xu, J.; Tian, G. On profile MM algorithms for gamma frailty survival models. Stat. Sin. 2019, 29, 895–916. [Google Scholar] [CrossRef]
  24. Chi, E.C.; Zhou, H.; Lange, K. Distance majorization and its applications. Math. Program. 2014, 146, 409–436. [Google Scholar] [CrossRef]
  25. Johansen, S. An extension of Cox’s regression model. Int. Stat. Rev. Int. Stat. 1983, 51, 165–174. [Google Scholar] [CrossRef]
  26. Klein, J.P. Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics 1992, 48, 795–806. [Google Scholar] [CrossRef]
  27. Knuth, K.H.; Habeck, M.; Malakar, N.K.; Mubeen, A.M.; Placek, B. Bayesian evidence and model selection. Digit. Signal Process. 2015, 47, 50–67. [Google Scholar] [CrossRef]
  28. Llorente, F.; Martino, L.; Curbelo, E.; López-Santiago, J.; Delgado, D. On the safe use of prior densities for Bayesian model selection. Wiley Interdiscip. Rev. Comput. Stat. 2022, e1595. [Google Scholar] [CrossRef]
  29. DiCiccio, T.J.; Kass, R.E.; Raftery, A.; Wasserman, L. Computing Bayes factors by combining simulation and asymptotic approximations. J. Am. Stat. Assoc. 1997, 92, 903–915. [Google Scholar] [CrossRef]
Table 1. The mean, median, and the proportion (Pro) of $\hat{M}$ in Scenario 1.

Method           n     Mean    Median   Pro
Profile MM       150   2       2        1
                 250   2.03    2        0.97
                 500   2       2        1
Non-profile MM   150   2.03    2        0.97
                 250   2.03    2        0.97
                 500   2.005   2        0.995
Table 2. Parameter estimation results in Scenario 1.

                            Profile MM                   Non-Profile MM
n     Parameter   True   BIAS     MSE     s.d.       BIAS     MSE     s.d.
150   π1          0.5    -0.0014  0.0029  0.0543     -0.0035  0.0032  0.0565
      β11         3       0.0162  0.1731  0.4162      0.0235  0.1812  0.4255
      β12         1      -0.0216  0.0977  0.3122      0.0047  0.0925  0.3045
      β21         -3     -0.0085  0.1785  0.4228     -0.0233  0.1913  0.4372
      β22         2      -0.0011  0.1234  0.3516      0.0332  0.1267  0.23548
250   π1          0.5     0.0013  0.0019  0.0441      0.0018  0.0017  0.0417
      β11         3      -0.0106  0.0911  0.3019     -0.0112  0.1038  0.3222
      β12         1      -0.0119  0.0551  0.2347     -0.0134  0.0487  0.2206
      β21         -3     -0.0100  0.1068  0.3270     -0.0067  0.1030  0.3212
      β22         2       0.0032  0.0744  0.2730     -0.0059  0.0724  0.2693
500   π1          0.5     0.0002  0.0008  0.0287     -0.0014  0.0008  0.0277
      β11         3       0.0089  0.0431  0.2076      0.0085  0.0462  0.2150
      β12         1      -0.0082  0.0240  0.1550      0.0043  0.0220  0.1483
      β21         -3     -0.0096  0.0485  0.2202     -0.0158  0.0434  0.2079
      β22         2       0.0076  0.0318  0.1784     -0.0053  0.0349  0.1870
Table 3. The mean, median, and the proportion (Pro) of $\hat{M}$ in Scenario 2.

Method           n     Mean    Median   Pro
Profile MM       250   1.005   1        0.995
                 500   1       1        1
Non-profile MM   250   1       1        1
                 500   1       1        1
Table 4. Parameter estimation results in Scenario 2.

                            Profile MM                    Non-Profile MM
n     Parameter   True   BIAS     MSE      s.d.       BIAS     MSE     s.d.
250   β1          1       0.0021  0.0185   0.1361     -0.0049  0.0206  0.1436
      β2          -3      0.0194  0.0517   0.2268      0.0078  0.0449  0.2121
      β3          2      -0.0190  0.0319   0.1779      0.0060  0.0309  0.1760
500   β1          1      -0.0012  0.0093   0.0966      0.0004  0.0097  0.0986
      β2          -3     -0.0014  0.02439  0.1563      0.0001  0.0247  0.1574
      β3          2       0.0114  0.0167   0.1288     -0.0013  0.0149  0.1221
Table 5. The mean, median, and the proportion (Pro) of $\hat{M}$ in Scenario 3.

Method           ρ     Mean    Median   Pro
Profile MM       0.2   2.005   2        0.995
                 0.8   2.015   2        0.985
Non-profile MM   0.2   2.005   2        0.995
                 0.8   2.015   2        0.985
Table 6. Parameter estimation results in Scenario 3.

                       Profile MM                   Non-Profile MM
ρ     Parameter   BIAS     MSE     s.d.       BIAS     MSE     s.d.
0.2   π1           0.0003  0.0023  0.0487     -0.0036  0.0023  0.0484
      β11          0.0285  0.1228  0.3502     -0.0146  0.1356  0.3689
      β12          0.0161  0.0625  0.2502      0.0095  0.0822  0.2873
      β21         -0.0356  0.1194  0.3446     -0.0021  0.1351  0.3684
      β22         -0.0149  0.0866  0.2945     -0.0059  0.0918  0.3037
0.8   π1           0.0023  0.0043  0.0661     -0.0022  0.0039  0.0630
      β11          0.0251  0.2466  0.4972     -0.0131  0.2413  0.4923
      β12         -0.0155  0.1753  0.4195     -0.0005  0.1648  0.4070
      β21         -0.0990  0.3442  0.5797     -0.0209  0.2590  0.5098
      β22          0.0965  0.2601  0.5020      0.0108  0.2128  0.4624
Table 7. Estimation results for breast cancer data.

Method           M   LL          BIC        Estimated Parameters
Profile MM       1   -2049.165   4117.923   $\hat{\beta}$ = (1.3489, 0.3919, 0.0937)
                 2   -2046.936   4133.057   $\hat{\pi}_1$ = 0.7012, $\hat{\beta}_1$ = (1.2093, 0.0392, 0.0630)
                                            $\hat{\pi}_2$ = 0.2988, $\hat{\beta}_2$ = (1.6508, 1.2284, 0.2415)
                 3   -2044.792   4148.361   $\hat{\pi}_1$ = 0.2680, $\hat{\beta}_1$ = (1.6517, 1.4040, 0.2596)
                                            $\hat{\pi}_2$ = 0.0757, $\hat{\beta}_2$ = (2.1672, 0.5394, 0.5748)
                                            $\hat{\pi}_3$ = 0.6563, $\hat{\beta}_3$ = (1.1245, 0.0477, 0.0658)
Non-profile MM   1   -2049.165   4117.923   $\hat{\beta}$ = (1.3489, 0.3918, 0.0937)
                 2   -2046.936   4133.057   $\hat{\pi}_1$ = 0.7012, $\hat{\beta}_1$ = (1.2093, 0.0393, 0.0630)
                                            $\hat{\pi}_2$ = 0.2988, $\hat{\beta}_2$ = (1.6508, 1.2282, 0.2415)
                 3   -2044.792   4148.361   $\hat{\pi}_1$ = 0.2680, $\hat{\beta}_1$ = (1.6516, 1.4038, 0.2596)
                                            $\hat{\pi}_2$ = 0.0757, $\hat{\beta}_2$ = (2.1672, 0.5392, 0.5748)
                                            $\hat{\pi}_3$ = 0.6563, $\hat{\beta}_3$ = (1.1245, 0.0477, 0.0658)