1. Introduction
In some clinical trials, a substantial proportion of patients respond favorably to a new treatment while the others may eventually relapse. Subgroup analyses aim to classify patients into a few homogeneous groups and to tailor a disease treatment specifically to each subgroup so as to optimize the treatment effect. In recent years, subgroup identification has received increasing attention in a wide range of fields such as clinical trials, public management, econometrics, and social science. For example, Refs. [1,2] conducted subgroup analyses in econometrics and marketing, while Refs. [3,4] applied subgroup analysis in epidemiology and biology, respectively.
Statistical methods for subgroup analysis have also developed rapidly in recent years. Among them, the finite mixture model has been recognized as an important tool and has been widely used for analyzing data from heterogeneous populations [5]. For example, there are many studies on the Gaussian mixture model for data clustering and classification [6,7,8]. Ref. [9] introduced a structured logistic-normal mixture model to identify subgroups in randomized clinical trials with differential treatment effects. Refs. [10,11] extended the mixture model-based approach to generalized linear models. Bayesian approaches for mixture regression models were studied by Ref. [12]. Moreover, nonparametric mixture models have also been under study in recent years. Ref. [13] studied a nonparametric mixture model for cure rate estimation. Ref. [14] studied a semiparametric accelerated failure time mixture model for estimating a biological treatment effect on a latent subgroup of interest in randomized clinical trials. Ref. [15] proposed a semiparametric Logistic–Cox mixture model for subgroup analysis when the outcome of interest is a right-censored event time.
Mixture models are deeply connected to the expectation–maximization (EM) algorithm. The EM algorithm is a popular approach to maximum likelihood estimation in incomplete data problems, of which finite mixtures are canonical examples because the unobserved labels of the individuals (as in unsupervised clustering) admit a direct interpretation as missing data [16]. In fact, the EM algorithm is a special member of the general family of MM (minorize–maximize or majorize–minimize) algorithms [17]. The MM algorithm possesses great flexibility in solving optimization problems because its basic idea is to convert a difficult optimization problem into a series of simpler ones. It has become a powerful tool for optimization and enjoys wide popularity in computational statistics. Applications of the MM principle can be found in a broad range of statistical contexts, including the Bradley–Terry model [18], quantile regression [19], variable selection [20,21], the proportional odds model [22], the shared frailty model [23], distance majorization [24], and so on. The key property of the MM principle is that it can decompose a high-dimensional objective function into separable low-dimensional functions through the construction of a surrogate function. In this paper, we introduce the general MM principle to the semiparametric mixture of proportional odds models for simultaneous subgroup identification and regression analysis.
The rest of the paper is organized as follows. We first review the MM algorithm in Section 2. In Section 3, we present the latent proportional odds model and develop a pair of estimation procedures for the proposed model using the MM algorithm. In Section 4, we provide two parts of simulation studies to select the number of subgroups and to assess the finite-sample performance of the proposed methods. We further provide an application to the German breast cancer study data to illustrate the practical utility of the proposed methods in Section 5.
3. Proportional Odds Model with Individual-Specific Covariate Effects
Let $T$ be the time to event. The proportional odds model postulates that
$$\lambda(t\mid X)=\frac{\lambda(t)\exp(\beta^{\top}X)}{1+\Lambda(t)\exp(\beta^{\top}X)},$$
where $\lambda(t\mid X)$ is the hazard function of $T$ given the covariates $X$, $\lambda(t)$ is an unspecified baseline function, and $\Lambda(t)=\int_0^t\lambda(s)\,\mathrm{d}s$. Let the conditional survival function of $T$ be $S(t\mid X)$. We know that $S(t\mid X)=\{1+\Lambda(t)\exp(\beta^{\top}X)\}^{-1}$, so that the odds of failure satisfy $\{1-S(t\mid X)\}/S(t\mid X)=\Lambda(t)\exp(\beta^{\top}X)$. In the proportional odds model, $\beta$ is the vector of regression coefficients, quantifying the effect of the covariates $X$ on the time to event $T$ through the conditional hazard function, and it is assumed to be the same for all subjects in the population. In practice, however, subjects may come from different subgroups and the covariate effects may differ; it is therefore more appropriate to assume the following proportional odds model with individual-specific covariate effects:
$$\lambda(t\mid X_i)=\frac{\lambda(t)\exp(\beta_i^{\top}X_i)}{1+\Lambda(t)\exp(\beta_i^{\top}X_i)},\qquad i=1,\ldots,n.$$
In this model, we allow the covariate effects $\beta_i$ to differ across subjects $i$. For both parsimony and better interpretation, it is reasonable to assume that $\beta_i=b_m$ with probability $\pi_m$, $m=1,\ldots,M$. In other words, there are only $M$ different subgroups for the covariate effects $\beta_i$, where $b_1,\ldots,b_M$ are $M$ different regression coefficient vectors. It is of interest to estimate the number of subgroups $M$, the mixing proportions $\pi_1,\ldots,\pi_M$, and the coefficients $b_1,\ldots,b_M$. Note that $\sum_{m=1}^{M}\pi_m=1$.
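To make the model concrete, here is a minimal sketch of the implied conditional survival function; the baseline $\Lambda(t)=t$ is an illustrative assumption, not a choice made in the paper.

```python
import numpy as np

def po_survival(t, x, beta, Lambda=lambda t: t):
    """S(t | X) = 1 / (1 + Lambda(t) * exp(beta' x)) under the proportional
    odds model; `Lambda` is a placeholder baseline (here Lambda(t) = t)."""
    return 1.0 / (1.0 + Lambda(t) * np.exp(x @ beta))
```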
3.1. Heterogeneity Regression Pursuit via MM Algorithm
The joint density function of the observed data $(T_i,\delta_i,X_i)$, where $\delta_i$ is the censoring indicator, can be written as
$$f(t,\delta\mid X)=\sum_{m=1}^{M}\pi_m f_m(t,\delta\mid X;b_m,\Lambda),$$
where $f_m$ denotes the density function of the $m$-th subgroup, $m=1,\ldots,M$, and $b_m$ is the corresponding effect parameter of $X$ in the $m$-th subgroup. Under right censoring, the likelihood contribution of subject $i$ in the $m$-th subgroup is
$$f_m(T_i,\delta_i\mid X_i;b_m,\Lambda)=\frac{\{\lambda(T_i)\exp(b_m^{\top}X_i)\}^{\delta_i}}{\{1+\Lambda(T_i)\exp(b_m^{\top}X_i)\}^{1+\delta_i}}.$$
Given the observed data $\{(T_i,\delta_i,X_i):i=1,\ldots,n\}$, we have the observed log-likelihood function as
$$\ell_n(\theta)=\sum_{i=1}^{n}\log\Big\{\sum_{m=1}^{M}\pi_m f_m(T_i,\delta_i\mid X_i;b_m,\Lambda)\Big\},\tag{1}$$
where $\theta=(\pi_1,\ldots,\pi_M,b_1,\ldots,b_M,\Lambda)$ collects all the unknown parameters. Given the parameters $\theta^{(k)}$ in the $k$-th iteration and denoting
$$w_{im}^{(k)}=\frac{\pi_m^{(k)}f_m(T_i,\delta_i\mid X_i;b_m^{(k)},\Lambda^{(k)})}{\sum_{m'=1}^{M}\pi_{m'}^{(k)}f_{m'}(T_i,\delta_i\mid X_i;b_{m'}^{(k)},\Lambda^{(k)})},$$
then we can rewrite $\ell_n(\theta)$ as
$$\ell_n(\theta)=\sum_{i=1}^{n}\log\Big\{\sum_{m=1}^{M}w_{im}^{(k)}\,\frac{\pi_m f_m(T_i,\delta_i\mid X_i;b_m,\Lambda)}{w_{im}^{(k)}}\Big\}.\tag{2}$$
By the continuous version of Jensen's inequality, $\log\{\int f(x)g(x)\,\mathrm{d}x\}\ge\int\log\{f(x)\}g(x)\,\mathrm{d}x$, we can transfer the logarithmic function outside the integral to the inside of the integral, where $g(x)$ is a density function. Inspired by this feature, the density $\{w_{im}^{(k)}\}_{m=1}^{M}$ constructed in Equation (2) plays the role of the function $g(x)$, while the remaining part $\pi_m f_m(T_i,\delta_i\mid X_i;b_m,\Lambda)/w_{im}^{(k)}$ plays the role of the function $f(x)$. By the following calculation,
$$\ell_n(\theta)\ \ge\ \sum_{i=1}^{n}\sum_{m=1}^{M}w_{im}^{(k)}\log\Big\{\frac{\pi_m f_m(T_i,\delta_i\mid X_i;b_m,\Lambda)}{w_{im}^{(k)}}\Big\},$$
the logarithmic function on the outside is transferred to the inside, which breaks the product terms down into a summation. Hence, up to an additive constant that does not depend on $\theta$, we construct the surrogate function for $\ell_n(\theta)$ as
$$Q(\theta\mid\theta^{(k)})=Q_1(\pi\mid\theta^{(k)})+Q_2(b,\Lambda\mid\theta^{(k)}),$$
where
$$Q_1(\pi\mid\theta^{(k)})=\sum_{i=1}^{n}\sum_{m=1}^{M}w_{im}^{(k)}\log\pi_m\tag{3}$$
and
$$Q_2(b,\Lambda\mid\theta^{(k)})=\sum_{i=1}^{n}\sum_{m=1}^{M}w_{im}^{(k)}\log f_m(T_i,\delta_i\mid X_i;b_m,\Lambda).\tag{4}$$
The surrogate function $Q(\theta\mid\theta^{(k)})$ separates the parameters $\pi$ and $(b,\Lambda)$ into (3) and (4), respectively. All the parameters $\pi_m$ in (3) are separated from each other, so that updating $\pi$ is as straightforward as
$$\pi_m^{(k+1)}=\frac{1}{n}\sum_{i=1}^{n}w_{im}^{(k)},\qquad m=1,\ldots,M.$$
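In code, the weight construction and this $\pi$-update amount to a few array operations. The sketch below assumes the subgroup-wise likelihood contributions $f_m(T_i,\delta_i\mid X_i;b_m^{(k)},\Lambda^{(k)})$ have already been evaluated into an array `lik` (a hypothetical name).

```python
import numpy as np

def update_weights_and_pi(lik, pi):
    """One MM step for the mixing proportions (sketch).

    lik : (n, M) array of f_m(T_i, delta_i | X_i) at the current iterate.
    pi  : (M,) array of current mixing proportions pi^{(k)}.
    """
    num = lik * pi                              # pi_m^{(k)} * f_m(...)
    w = num / num.sum(axis=1, keepdims=True)    # posterior weights w_im^{(k)}
    return w, w.mean(axis=0)                    # (w, pi^{(k+1)})
```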
To update $(b,\Lambda)$, we apply the supporting hyperplane inequality to Equation (4) to release the object $x$ from the logarithmic function,
$$\log x\le\log x^{(k)}+\frac{x-x^{(k)}}{x^{(k)}};$$
applying it to $x_{im}=1+\Lambda(T_i)\exp(b_m^{\top}X_i)$, we have
$$-\log\{1+\Lambda(T_i)\exp(b_m^{\top}X_i)\}\ \ge\ -\log x_{im}^{(k)}-\frac{1+\Lambda(T_i)\exp(b_m^{\top}X_i)-x_{im}^{(k)}}{x_{im}^{(k)}},$$
where $x_{im}^{(k)}=1+\Lambda^{(k)}(T_i)\exp(b_m^{(k)\top}X_i)$. Then, we obtain the following surrogate function for $Q_2(b,\Lambda\mid\theta^{(k)})$,
$$Q_2^{\ast}(b,\Lambda\mid\theta^{(k)})=\sum_{i=1}^{n}\sum_{m=1}^{M}w_{im}^{(k)}\Big[\delta_i\{\log\lambda(T_i)+b_m^{\top}X_i\}-(1+\delta_i)\,\frac{\Lambda(T_i)\exp(b_m^{\top}X_i)}{x_{im}^{(k)}}\Big]+c^{(k)},\tag{5}$$
where $c^{(k)}$ is a constant that does not depend on $(b,\Lambda)$.
3.2. Profile MM Method
Following [25,26], we consider the profile estimation approach and first profile out the nonparametric component $\Lambda$ in $Q_2^{\ast}(b,\Lambda\mid\theta^{(k)})$ for any given $b$. Treating $\Lambda$ as a nondecreasing step function with jumps only at the distinct observed event times $\{t_j\}$ and letting $d_j$ denote the number of events at $t_j$, this leads to the estimate of $\Lambda$ given $b$ as
$$\widehat{\Lambda}(t\mid b)=\sum_{j:t_j\le t}\frac{d_j}{\sum_{i:T_i\ge t_j}\sum_{m=1}^{M}w_{im}^{(k)}(1+\delta_i)\exp(b_m^{\top}X_i)/x_{im}^{(k)}}.\tag{6}$$
Substituting (6) into $Q_2^{\ast}(b,\Lambda\mid\theta^{(k)})$ yields the profiled function $Q_2^{p}(b\mid\theta^{(k)})=Q_2^{\ast}(b,\widehat{\Lambda}(\cdot\mid b)\mid\theta^{(k)})$. We use the supporting hyperplane inequality again to deal with the logarithmic terms induced by (6), and then we obtain the following minorizing function, in which all the $b_m$ are separated from each other,
$$Q_2^{p}(b\mid\theta^{(k)})\ \ge\ \sum_{m=1}^{M}Q_{2,m}^{p}(b_m\mid\theta^{(k)})+\tilde{c}^{(k)},$$
where $\tilde{c}^{(k)}$ is a constant that does not depend on $b$. Finally, the estimate of each $b_m$ can be obtained by a one-step Newton iteration.
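The one-step Newton update itself is generic; a minimal sketch, assuming the caller supplies the gradient and Hessian of the separated surrogate $Q_{2,m}^{p}$ at the current iterate (both hypothetical inputs here).

```python
import numpy as np

def newton_one_step(b_k, grad, hess):
    """One-step Newton update b^{(k+1)} = b^{(k)} - hess^{-1} grad (sketch)."""
    return b_k - np.linalg.solve(hess, grad)
```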
3.3. Non-Profile MM Method
For the above profile MM method, the estimate of $b$ is highly related to the estimate of $\Lambda$, because we treat the nonparametric component $\Lambda$ as a function of $b$ in the profile step. Inspired by the parameter-separable property of the MM principle, we further separate the nonparametric part $\Lambda$ from the parametric part $b$ according to the decomposition rules. That is, we apply the following inequality of arithmetic and geometric means to the function $\Lambda(T_i)\exp(b_m^{\top}X_i)$:
$$xy\le\frac{x^2+y^2}{2}.$$
Here, we let
$$x=\Lambda(T_i)\sqrt{\frac{\exp(b_m^{(k)\top}X_i)}{\Lambda^{(k)}(T_i)}}\quad\text{and}\quad y=\exp(b_m^{\top}X_i)\sqrt{\frac{\Lambda^{(k)}(T_i)}{\exp(b_m^{(k)\top}X_i)}},$$
then we have
$$\Lambda(T_i)\exp(b_m^{\top}X_i)\le\frac{1}{2}\Big\{\Lambda^{2}(T_i)\,\frac{\exp(b_m^{(k)\top}X_i)}{\Lambda^{(k)}(T_i)}+\exp(2b_m^{\top}X_i)\,\frac{\Lambda^{(k)}(T_i)}{\exp(b_m^{(k)\top}X_i)}\Big\}.$$
That is, the product term is bounded by the sum of one term involving only $\Lambda$ and one term involving only $b_m$, with equality at the current iterate $(b_m^{(k)},\Lambda^{(k)})$.
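As a quick sanity check of this majorization (with made-up toy values), the right-hand side dominates $\Lambda(T_i)\exp(b_m^{\top}X_i)$ everywhere and is tangent at the current iterate:

```python
import numpy as np

rng = np.random.default_rng(0)
lam_k, bx_k = 1.3, 0.4          # toy values for Lambda^{(k)}(T_i) and b^{(k)'}X_i
for lam, bx in rng.uniform(0.1, 2.0, size=(5, 2)):
    lhs = lam * np.exp(bx)
    rhs = 0.5 * (lam**2 * np.exp(bx_k) / lam_k
                 + np.exp(2 * bx) * lam_k / np.exp(bx_k))
    assert lhs <= rhs + 1e-12   # domination; equality holds at (lam_k, bx_k)
```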
Substituting the above inequality back into $Q_2^{\ast}(b,\Lambda\mid\theta^{(k)})$, we may obtain
$$Q_2^{\ast}(b,\Lambda\mid\theta^{(k)})\ \ge\ G_1(\Lambda\mid\theta^{(k)})+G_2(b\mid\theta^{(k)})+c^{(k)},$$
where
$$G_1(\Lambda\mid\theta^{(k)})=\sum_{i=1}^{n}\sum_{m=1}^{M}w_{im}^{(k)}\Big\{\delta_i\log\lambda(T_i)-\frac{1+\delta_i}{2x_{im}^{(k)}}\,\Lambda^{2}(T_i)\,\frac{\exp(b_m^{(k)\top}X_i)}{\Lambda^{(k)}(T_i)}\Big\}$$
and
$$G_2(b\mid\theta^{(k)})=\sum_{i=1}^{n}\sum_{m=1}^{M}w_{im}^{(k)}\Big\{\delta_i b_m^{\top}X_i-\frac{1+\delta_i}{2x_{im}^{(k)}}\exp(2b_m^{\top}X_i)\,\frac{\Lambda^{(k)}(T_i)}{\exp(b_m^{(k)\top}X_i)}\Big\}.$$
It is observed that the parameters $\Lambda$ and $b$ are completely separated, so the corresponding parameter estimators can be obtained by differentiating $G_1$ and $G_2$ separately. Letting $\partial G_1(\Lambda\mid\theta^{(k)})/\partial\Lambda=0$, we obtain the estimate of $\Lambda$ in closed form. To update $b_m$, we calculate the first and second derivatives of $G_2(b\mid\theta^{(k)})$ as follows:
$$\frac{\partial G_2}{\partial b_m}=\sum_{i=1}^{n}w_{im}^{(k)}\Big[\delta_i-\frac{(1+\delta_i)\Lambda^{(k)}(T_i)}{x_{im}^{(k)}}\exp\{(2b_m-b_m^{(k)})^{\top}X_i\}\Big]X_i$$
and
$$\frac{\partial^{2}G_2}{\partial b_m\partial b_m^{\top}}=-\sum_{i=1}^{n}w_{im}^{(k)}\,\frac{2(1+\delta_i)\Lambda^{(k)}(T_i)}{x_{im}^{(k)}}\exp\{(2b_m-b_m^{(k)})^{\top}X_i\}\,X_iX_i^{\top}.$$
Then, $b_m$ can be estimated by the one-step Newton iteration
$$b_m^{(k+1)}=b_m^{(k)}-\Big\{\frac{\partial^{2}G_2}{\partial b_m\partial b_m^{\top}}\Big\}^{-1}\frac{\partial G_2}{\partial b_m}\bigg|_{b_m=b_m^{(k)}}.$$
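Putting the two derivatives together, the non-profile update of $b_m$ can be sketched as below; the argument names are illustrative, and the formulas are the displayed derivatives evaluated at $b_m=b_m^{(k)}$.

```python
import numpy as np

def update_b_nonprofile(b_k, X, delta, w_m, Lam_k, x_k):
    """One-step Newton update of b_m from the separated surrogate G2 (sketch).

    b_k   : (p,) current b_m^{(k)};      X   : (n, p) covariates
    delta : (n,) event indicators;       w_m : (n,) weights w_im^{(k)}
    Lam_k : (n,) Lambda^{(k)}(T_i);      x_k : (n,) x_im^{(k)}
    """
    e = np.exp(X @ b_k)                       # exp((2b - b^{(k)})'X) at b = b^{(k)}
    a = w_m * (1 + delta) * Lam_k * e / x_k   # common factor of both derivatives
    grad = X.T @ (w_m * delta - a)
    hess = -2.0 * (X * a[:, None]).T @ X
    return b_k - np.linalg.solve(hess, grad)
```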
4. Simulation Study
Based on the estimating equations derived in the previous sections, we simulate data to analyze the estimation results at finite sample sizes. The number of groups $M$ in the mixture of proportional odds model is unknown and is estimated in a data-driven manner. Here, we use the modified Bayesian information criterion (BIC, [19]) to choose the number of components $M$ by minimizing the criterion function
$$\mathrm{BIC}(M)=-2\,\ell_n(\widehat{\theta}_M)+\log(n)\,(Mq+M-1),$$
where $n$ is the sample size and $q$ is the dimension of the covariates. Note that this criterion is closely related to the marginal likelihood computation, as can be seen in [27,28,29].
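A sketch of the resulting selection loop, assuming the maximized log-likelihood $\ell_n(\widehat{\theta}_M)$ has been computed for each candidate $M$; the penalty counts the $Mq$ regression coefficients and the $M-1$ free mixing proportions, matching the criterion displayed above.

```python
import numpy as np

def select_M(loglik_by_M, n, q):
    """Choose M minimizing BIC(M) = -2 * loglik + log(n) * (M*q + M - 1)."""
    bic = {M: -2.0 * ll + np.log(n) * (M * q + M - 1)
           for M, ll in loglik_by_M.items()}
    return min(bic, key=bic.get)
```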
Scenario 1. We generate clustered right-censored data from a mixture of proportional odds models with two subgroups and two covariates,
$$\lambda(t\mid X_i)=\frac{\lambda(t)\exp(\beta_{i1}X_{i1}+\beta_{i2}X_{i2})}{1+\Lambda(t)\exp(\beta_{i1}X_{i1}+\beta_{i2}X_{i2})},$$
where the two covariates $X_{i1}$ and $X_{i2}$ are independent and follow the standard normal distribution. We randomly assign the $n$ subjects into two subgroups with equal probabilities, i.e., we let $\pi_1=\pi_2=0.5$, so that $\beta_i=b_1$ for subjects in the first subgroup and $\beta_i=b_2$ for subjects in the second subgroup. We choose different sample sizes $n$ and set the censoring proportion at 30% to assess the performance of the proposed estimation procedures.
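For concreteness, a minimal sketch of this data-generating step, assuming the baseline $\Lambda(t)=t$ and an exponential censoring time (both illustrative assumptions; the true coefficient vectors go in `b1` and `b2`). Inverting $S(T\mid x)=U$ with $U\sim\mathrm{Uniform}(0,1)$ gives $T=(1/U-1)\exp(-b^{\top}x)$.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_po_mixture(n, b1, b2, cens_scale=2.0):
    """Two-subgroup mixture of proportional odds data with Lambda(t) = t (sketch)."""
    X = rng.standard_normal((n, 2))
    group = rng.integers(0, 2, size=n)             # pi_1 = pi_2 = 0.5
    b = np.where(group[:, None] == 0, b1, b2)      # beta_i = b_1 or b_2
    u = rng.uniform(size=n)
    t = (1.0 / u - 1.0) * np.exp(-(X * b).sum(axis=1))
    c = rng.exponential(cens_scale, size=n)        # tune cens_scale for ~30% censoring
    return np.minimum(t, c), (t <= c).astype(int), X, group
```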
Table 1 reports the mean and median of the estimator $\widehat{M}$ and the proportion of $\widehat{M}$ equal to the true number of subgroups based on 500 replications. Table 2 reports the empirical bias, mean squared error (MSE), and standard deviation (s.d.) of the estimators $\widehat{\pi}$, $\widehat{b}_1$, and $\widehat{b}_2$ based on 500 replications. We find that the mean of $\widehat{M}$ gradually approaches the true number of subgroups 2, the median of $\widehat{M}$ remains at 2, and the proportion of correctly identifying the true number of subgroups approaches 1 as the sample size increases. Moreover, our methods estimate the parameters well, with small empirical bias, MSE, and standard deviation, even at small sample sizes.
Scenario 2. We generate right-censored data from a proportional odds model with three covariates,
$$\lambda(t\mid X_i)=\frac{\lambda(t)\exp(\beta^{\top}X_i)}{1+\Lambda(t)\exp(\beta^{\top}X_i)},$$
where the three covariates $X_{i1}$, $X_{i2}$, and $X_{i3}$ are independent and follow the standard normal distribution. We set a common coefficient vector $\beta$ and let $\beta_i=\beta$ for all subjects. Note that this model corresponds to the latent proportional odds model with the true number of subgroups $M$ being 1. We set the censoring proportion at 30% and choose different sample sizes $n$ to assess the performance of the proposed estimation procedures.
Table 3 reports the mean and median of the estimator $\widehat{M}$ and the proportion of $\widehat{M}$ equal to the true number of subgroups based on 200 replications. Table 4 reports the empirical bias, mean squared error (MSE), and standard deviation (s.d.) of the estimators based on 500 replications. Based on the profile MM method, we observe that the median of $\widehat{M}$ is equal to the true number 1, the mean gets closer to 1, and the empirical proportion of $\widehat{M}=1$ approaches 1 as the sample size increases. Based on the non-profile MM method, we find that the mean and median of $\widehat{M}$ are both equal to the true number 1, and the proportion of $\widehat{M}=1$ is 1 when the sample sizes are 250 and 500. Furthermore, our methods show excellent performance in parameter estimation, yielding accurate estimates of $\beta$ under different sample sizes.
Scenario 3. We generate clustered right-censored data from a mixture of proportional odds models with two subgroups and two correlated covariates,
$$\lambda(t\mid X_i)=\frac{\lambda(t)\exp(\beta_{i1}X_{i1}+\beta_{i2}X_{i2})}{1+\Lambda(t)\exp(\beta_{i1}X_{i1}+\beta_{i2}X_{i2})},$$
where the two covariates are generated from a multivariate normal distribution with mean zero and a first-order autoregressive covariance structure $\mathrm{Cov}(X_{ij},X_{il})=\rho^{|j-l|}$ for $1\le j,l\le 2$. We fix the sample size $n$ and randomly assign the $n$ subjects into two subgroups with equal probabilities, i.e., we let $\pi_1=\pi_2=0.5$, so that $\beta_i=b_1$ for subjects in the first subgroup and $\beta_i=b_2$ for subjects in the second subgroup. We choose different values of the correlation parameter $\rho$ and set the censoring proportion at 30% to assess the performance of the proposed estimation procedures.
Table 5 reports the mean and median of the estimator $\widehat{M}$ and the proportion of $\widehat{M}$ equal to the true number of subgroups based on 500 replications. Table 6 reports the empirical bias, mean squared error (MSE), and standard deviation (s.d.) of the estimators $\widehat{\pi}$, $\widehat{b}_1$, and $\widehat{b}_2$ based on 500 replications. In Table 5, the results of the profile MM method and the non-profile MM method are largely consistent: the proportions of $\widehat{M}=2$ are very close to 1, and the smaller the value of $\rho$, the larger the value of Pro., which shows that our proposed methods can accurately identify the number of subgroups. In Table 6, the estimation results at smaller values of $\rho$ are better and more stable than those at larger values of $\rho$ for both the profile MM method and the non-profile MM method.