1. Introduction
Survival analysis is a statistical approach used to analyze time-to-event data, which measure the time until a specific event occurs, such as disease progression, treatment response, or death [1]. This approach is widely used in biomedical research, where predicting survival probabilities and time-to-event outcomes is crucial for understanding patient prognosis and evaluating treatments [2].
Traditional survival analysis methods often use techniques such as the Kaplan–Meier estimator to estimate survival functions and the Cox proportional hazards model to assess the relationship between predictor variables and event risk over time [3]. However, these models have limitations when dealing with more complex data, such as censored observations, time-varying covariate effects, and population heterogeneity.
Bayesian survival analysis extends traditional methods by combining prior information with observed data in survival models. This approach is especially useful when sample sizes are small or when there is substantial uncertainty in the data [4]. Bayesian frameworks use prior distributions to combine existing knowledge or expert opinion with observed data, resulting in more flexible and robust models [5]. Additionally, Bayesian methods naturally quantify the uncertainty in parameter estimates, which is crucial for precise predictions in fields such as personalized medicine [6].
Traditional survival models often assume that populations are homogeneous, but in reality, different groups may exhibit distinct survival patterns. Mixture models address this issue by allowing researchers to model diverse populations with varying survival outcomes [7]. These models include finite mixture models, latent class models, and nonparametric mixture models, each serving a specific role in handling data complexity.
Bayesian mixture-based survival modeling has also received increasing attention beyond classical parametric formulations. For instance, Kottas [8] developed a nonparametric Bayesian survival model using mixtures of Weibull distributions, while Iorio et al. [9] proposed a Bayesian nonparametric approach for survival regression under nonproportional hazards. Other related work includes Dirichlet process mixture models for survival outcome data [10], full Bayesian inference for hazard mixture models [11], and Bayesian nonparametric Erlang mixture modeling for survival analysis [12]. Although these studies demonstrate the flexibility of Bayesian mixture methods for modeling heterogeneous survival data, they do not address structural selection among arithmetic, geometric, harmonic, and intermediate mixture forms through a single mixing parameter. This gap motivates the present Bayesian treatment of the $\alpha$-mixture survival model.
Finite mixture models divide a population into a finite number of subgroups, making it easier to identify distinct survival patterns within each group [7]. Latent class models extend this idea by grouping individuals into unobserved categories based on shared characteristics, thereby revealing hidden structures within the data [13]. Nonparametric mixture models, on the other hand, offer greater flexibility by avoiding strict parametric assumptions and enabling a more data-driven approach to identifying underlying survival distributions [14]. Together, these models improve the precision of survival analysis and provide a deeper understanding of the factors that influence survival outcomes.
Among the different types of mixture models, arithmetic, geometric, and harmonic mixtures stand out [15,16]. An arithmetic mixture model takes a weighted average of the individual component distributions and is the most common approach. In contrast, geometric and harmonic mixtures use alternative forms of aggregation, allowing them to capture distinctive data features such as skewness or heavy tails [17,18].
In a recent study, Asadi et al. [19] introduced the $\alpha$-mixture model, which generalizes conventional mixture models in survival analysis using a single mixing power parameter $\alpha$. The framework includes both survival mixtures and failure-rate mixtures as special cases, allowing for a more comprehensive approach to modeling heterogeneous survival data. The main idea is that different values of $\alpha$ correspond to different types of mixtures. Therefore, the $\alpha$-mixture model creates a continuous bridge among the classical mixture types. The detailed properties of the $\alpha$-mixture and the role of the value of $\alpha$ can be found in [19].
While most existing studies on $\alpha$-mixture models have concentrated on their stochastic properties, such as comparisons, aging characteristics, and bending behavior (e.g., Shojaee et al. [20,21], Barmalzan et al. [22], Shojaee and Momeni [23]), the development of statistical inference methods within a Bayesian framework remains relatively underexplored.
A critical component of survival mixture modeling is model selection, which determines the model that best fits the data. This task becomes especially important when comparing different types of mixture models. Traditional criteria such as AIC, BIC, DIC, WAIC, and Bayes factors evaluate model adequacy based on likelihood comparisons and assume fixed model structures. In contrast, the $\alpha$-mixture model extends model selection to the structural level by introducing a parameter that generalizes classical mixture formulations. This parameter allows the model to capture heterogeneity more flexibly and improve overall fit. Through its posterior distribution, the $\alpha$-mixture model identifies the mixture form most consistent with the observed data. This unified formulation removes the need to fit multiple separate models and allows model selection and parameter estimation to occur simultaneously within a coherent Bayesian setting. For model assessment, we use the logarithm of the pseudo-marginal likelihood (LPML), a leave-one-out cross-validated criterion based on the conditional predictive ordinate [24].
The remainder of this paper is organized as follows.
Section 2 presents the mathematical formulation of the $\alpha$-mixture model and the Bayesian estimation procedure.
Section 3 illustrates the approach through three examples involving different mixture combinations.
Section 4 presents the results of the simulation studies.
Section 5 analyzes three data sets with $\alpha$-mixture models.
Section 6 concludes the study and discusses the results, implications, and directions for future research.
2. Model
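Suppose that $S_j(t \mid \theta_j)$ represents the survival function for component $j$, where $j = 1, \ldots, m$. Let $t_i$ denote the survival time for subject $i$, and let $p_j$ be the mixing proportion for component $j$. Let $\alpha$ denote the mixing power, and let $\theta$ denote the set of all parameters involved in the model. Following [19], the survival function of the $\alpha$-mixture model is defined as
$$S_\alpha(t_i \mid \theta) = \left[ \sum_{j=1}^{m} p_j \, S_j(t_i \mid \theta_j)^{\alpha} \right]^{1/\alpha}, \quad \alpha \neq 0, \qquad (1)$$
where $\theta_j$ represents the parameters in the specific component, $j$, of the mixing function. Here, it should be noted that $S_\alpha(t_i \mid \theta) \to \prod_{j=1}^{m} S_j(t_i \mid \theta_j)^{p_j}$ as $\alpha \to 0$, as shown in [19]. Let $h_j(t \mid \theta_j)$ denote the failure rate for component $j$. The corresponding probability density function (pdf) for subject $i$ is given by
$$f_\alpha(t_i \mid \theta) = S_\alpha(t_i \mid \theta) \, \frac{\sum_{j=1}^{m} p_j \, S_j(t_i \mid \theta_j)^{\alpha} \, h_j(t_i \mid \theta_j)}{\sum_{j=1}^{m} p_j \, S_j(t_i \mid \theta_j)^{\alpha}}. \qquad (2)$$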
To avoid label-switching and identifiability issues, following [25], we consider non-scalable survival functions
. Further, by the constraint
and
, we avoid the identifiability issue between
and
,
.
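For right-censored data, let $T_i$ denote the event time and $C_i$ denote the censoring time for subject $i = 1, \ldots, n$. Then, the observed survival time $t_i$ and the censoring indicator $\delta_i$ can, respectively, be expressed as
$$t_i = \min(T_i, C_i), \qquad \delta_i = I(T_i \le C_i),$$
where $I(\cdot)$ denotes the indicator function. Let $\mathbf{t} = (t_1, \ldots, t_n)$ denote the vector of observed survival times, and let $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_n)$ denote the vector of censoring indicators. Then, the observed-data likelihood is given by
$$L(\theta \mid \mathbf{t}, \boldsymbol{\delta}) = \prod_{i=1}^{n} f_\alpha(t_i \mid \theta)^{\delta_i} \, S_\alpha(t_i \mid \theta)^{1 - \delta_i} = \prod_{i=1}^{n} S_\alpha(t_i \mid \theta) \left[ \frac{\sum_{j=1}^{m} p_j \, S_j(t_i \mid \theta_j)^{\alpha} \, h_j(t_i \mid \theta_j)}{\sum_{j=1}^{m} p_j \, S_j(t_i \mid \theta_j)^{\alpha}} \right]^{\delta_i},$$
where $h_j(t_i \mid \theta_j)$ denotes the failure rate for component $j$. Thus, if $\delta_i = 1$, the contribution is the density $f_\alpha(t_i \mid \theta)$, whereas if $\delta_i = 0$, the contribution is the survival function $S_\alpha(t_i \mid \theta)$.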
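The structure of a right-censored likelihood can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this paper: the function name `censored_loglik` and the exponential example (with an arbitrary rate of 0.5) are assumptions introduced here only to show how observed events contribute a log-density and censored observations contribute a log-survival term.

```python
import math

def censored_loglik(times, events, pdf, surv):
    """Right-censored log-likelihood: log f(t_i) if the event is observed,
    log S(t_i) if the observation is censored."""
    total = 0.0
    for t, d in zip(times, events):
        total += math.log(pdf(t)) if d == 1 else math.log(surv(t))
    return total

# Hypothetical example: exponential model with rate 0.5 (not from the paper).
rate = 0.5
exp_pdf = lambda t: rate * math.exp(-rate * t)
exp_surv = lambda t: math.exp(-rate * t)

times = [1.0, 2.0, 0.5]   # observed times t_i
events = [1, 0, 1]        # censoring indicators delta_i
loglik = censored_loglik(times, events, exp_pdf, exp_surv)
```

Any density and survival function pair can be passed in, so the same skeleton applies when `pdf` and `surv` are replaced by mixture versions.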
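Equation (2) presents the pdf of the $\alpha$-mixture model, and different values of $\alpha$ lead to different types of mixtures. For example, in the absence of censoring, it follows from (2) that:
- (a) when $\alpha = 1$, (2) gives the pdf of the arithmetic mixture, such that
$$f_1(t_i \mid \theta) = \sum_{j=1}^{m} p_j \, S_j(t_i \mid \theta_j) \, h_j(t_i \mid \theta_j);$$
- (b) when $\alpha \to 0$, (2) gives the pdf of the geometric mixture, such that
$$f_0(t_i \mid \theta) = \prod_{j=1}^{m} S_j(t_i \mid \theta_j)^{p_j} \, \sum_{j=1}^{m} p_j \, h_j(t_i \mid \theta_j);$$
- (c) and, when $\alpha = -1$, (2) gives the pdf of the harmonic mixture, such that
$$f_{-1}(t_i \mid \theta) = \left[ \sum_{j=1}^{m} \frac{p_j}{S_j(t_i \mid \theta_j)} \right]^{-2} \sum_{j=1}^{m} \frac{p_j \, h_j(t_i \mid \theta_j)}{S_j(t_i \mid \theta_j)}.$$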
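The three special cases can be checked numerically with a short sketch. The code below assumes the power-mean form of the mixture survival function, $[\sum_j p_j S_j^{\alpha}]^{1/\alpha}$, with the geometric limit used near $\alpha = 0$. The function names, the tolerance `eps`, and the Weibull parameterization $S(t) = \exp\{-(t/\lambda)^2\}$ are illustrative assumptions, not taken from the paper or from any package.

```python
import math

def weibull_surv(t, scale, shape=2.0):
    """Weibull survival function exp(-(t/scale)^shape) (assumed parameterization)."""
    return math.exp(-((t / scale) ** shape))

def weibull_haz(t, scale, shape=2.0):
    """Weibull failure rate (shape/scale) * (t/scale)^(shape-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def alpha_mix_surv(t, alpha, p, surv_fns, eps=1e-8):
    """Mixture survival: power mean of the S_j, geometric limit when |alpha| < eps."""
    s = [S(t) for S in surv_fns]
    if abs(alpha) < eps:
        return math.exp(sum(pj * math.log(sj) for pj, sj in zip(p, s)))
    return sum(pj * sj ** alpha for pj, sj in zip(p, s)) ** (1.0 / alpha)

def alpha_mix_pdf(t, alpha, p, surv_fns, haz_fns, eps=1e-8):
    """Mixture density: mixture survival times the weighted failure rate."""
    s = [S(t) for S in surv_fns]
    h = [H(t) for H in haz_fns]
    if abs(alpha) < eps:
        rate = sum(pj * hj for pj, hj in zip(p, h))
    else:
        w = [pj * sj ** alpha for pj, sj in zip(p, s)]
        rate = sum(wj * hj for wj, hj in zip(w, h)) / sum(w)
    return alpha_mix_surv(t, alpha, p, surv_fns, eps) * rate

# Hypothetical two-component example with scales 1 and 2.
p = [0.5, 0.5]
S = [lambda t: weibull_surv(t, 1.0), lambda t: weibull_surv(t, 2.0)]
H = [lambda t: weibull_haz(t, 1.0), lambda t: weibull_haz(t, 2.0)]
t0 = 0.7

arithmetic_pdf = alpha_mix_pdf(t0, 1.0, p, S, H)    # equals sum_j p_j f_j(t0)
geometric_surv = alpha_mix_surv(t0, 0.0, p, S)      # equals prod_j S_j(t0)^p_j
harmonic_surv = alpha_mix_surv(t0, -1.0, p, S)      # equals 1 / sum_j p_j / S_j(t0)
```

Evaluating the survival function at a very small nonzero power (for example `alpha=1e-5`) gives a value close to `geometric_surv`, which is the continuity property that makes a single mixing power usable as a bridge among the three forms.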
Under the Bayesian framework, to obtain a proper joint posterior distribution, we assign independent proper prior distributions to the parameters involved in the model. The mixing power $\alpha$ takes real values, so we assign it a Normal prior distribution. The mixing proportions represent the membership probabilities of the mixture components; hence, we consider a Dirichlet prior distribution for them. For the parameters involved in component $j$, that is, $\theta_j$, we assign appropriate prior distributions depending on the form of its survival function. For example, we assign Normal prior distributions for the Lognormal survival function and a Gamma prior distribution for the Weibull survival function.
Based on Bayes' rule, the joint posterior distribution of all parameters given the survival times is proportional to the product of the likelihood and the prior distributions. For Monte Carlo-based inference, we may collect Markov chain Monte Carlo samples through Gibbs sampling using the full conditional distributions. For example, if we assign prior distributions for the $\alpha$-mixture with
m components and a single positive parameter in each component, such that
for the hyperparameters
,
,
and
,
, then we may use the full conditional distributions that are proportional to
It should be noted that the estimation of parameters in the $\alpha$-mixture model includes not only the parameters in each mixture component and the membership probabilities but also the mixing power $\alpha$. The Bayes estimator of the mixing power, that is, the posterior mean of $\alpha$, reveals the appropriate type of mixture by (1) and (2).
We update
, the mixture weights
, and
or
using Metropolis–Hastings within Gibbs algorithms. At iteration
k, proposal values are generated as follows:
where
is a vector of concentration parameters, and the Gamma proposal distribution is parameterized by
For mixtures of Lognormal components, the location parameters
are updated using a Normal proposal:
rather than the Gamma proposal used for
.
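The general shape of a Metropolis–Hastings-within-Gibbs update can be sketched as follows. This is a simplified stand-in, not the sampler described above: it swaps the Dirichlet and Gamma proposals for symmetric random-walk proposals (proposals outside the support are rejected), omits the identifiability constraints, and uses hypothetical priors ($\alpha \sim N(0, 10^2)$, a flat Beta(1, 1) prior on the mixing proportion, and Gamma(1, 1) priors on the scales), step sizes, and synthetic data. The Weibull parameterization $S_j(t) = \exp\{-(t/\lambda_j)^2\}$ is also an assumption.

```python
import math
import random

def log_post(alpha, p, lam, times, events):
    """Unnormalized log posterior for a two-component Weibull (shape 2) mixture.
    All priors here are illustrative assumptions."""
    if not 0.0 < p < 1.0 or min(lam) <= 0.0:
        return -math.inf
    w = (p, 1.0 - p)
    lp = -0.5 * (alpha / 10.0) ** 2   # alpha ~ Normal(0, 10^2)
    lp -= sum(lam)                    # lam_j ~ Gamma(1, 1); Beta(1, 1) on p is flat
    for t, d in zip(times, events):
        log_s = [-(t / l) ** 2 for l in lam]    # log S_j(t)
        haz = [2.0 * t / l ** 2 for l in lam]   # h_j(t)
        if abs(alpha) < 1e-8:                   # geometric limit near zero
            log_surv = sum(wj * ls for wj, ls in zip(w, log_s))
            rate = sum(wj * hj for wj, hj in zip(w, haz))
        else:                                   # log-sum-exp for stability
            a = [math.log(wj) + alpha * ls for wj, ls in zip(w, log_s)]
            top = max(a)
            e = [math.exp(ai - top) for ai in a]
            log_surv = (top + math.log(sum(e))) / alpha
            rate = sum(ei * hj for ei, hj in zip(e, haz)) / sum(e)
        lp += log_surv + (math.log(rate) if d == 1 else 0.0)
    return lp

def mh_within_gibbs(times, events, n_iter=500, seed=1):
    """Update alpha, p, lam_1, lam_2 one at a time with random-walk proposals."""
    rng = random.Random(seed)
    cur = [0.5, 0.5, 1.0, 2.0]     # alpha, p, lam_1, lam_2 (arbitrary start)
    steps = [0.5, 0.1, 0.2, 0.2]   # proposal standard deviations (tuning guesses)
    cur_lp = log_post(cur[0], cur[1], cur[2:], times, events)
    draws, accepted = [], 0
    for _ in range(n_iter):
        for k in range(4):
            prop = list(cur)
            prop[k] += rng.gauss(0.0, steps[k])
            prop_lp = log_post(prop[0], prop[1], prop[2:], times, events)
            if rng.random() < math.exp(min(0.0, prop_lp - cur_lp)):
                cur, cur_lp = prop, prop_lp
                accepted += 1
        draws.append(tuple(cur))
    return draws, accepted / (4 * n_iter)

# Synthetic data: component sampling (an arithmetic mixture) with uniform censoring.
data_rng = random.Random(0)
times, events = [], []
for _ in range(60):
    scale = 1.0 if data_rng.random() < 0.5 else 2.5
    t = data_rng.weibullvariate(scale, 2.0)
    c = data_rng.uniform(0.0, 8.0)
    times.append(min(t, c))
    events.append(1 if t <= c else 0)

draws, acc_rate = mh_within_gibbs(times, events)
```

The returned acceptance rate is the quantity one would monitor when tuning the proposal scales; a real analysis would also need a burn-in period, longer chains, and convergence checks.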
We present three examples involving different distributions in the next section.
3. Examples
3.1. Example 1: Weibull–Weibull Mixture Model
In this first example, we consider a mixture model with two Weibull distributions with the shape parameters fixed at 2. The survival function for subject
i is
and the corresponding failure rate function is
where
is the scale parameter for component
j. We have considered the three sample sizes with
10,000.
With mixing proportions
, we define the parameter set
which contains all model parameters. Let
denote the survival times for all subjects. Let
if the event time is observed and
if the observation is right-censored. Then, the likelihood function of the $\alpha$-mixture model is
where
.
We assume that the failure times are independently distributed across subjects and use a Normal prior for , specified as , where and . A Beta prior, , is assigned to the mixing proportions , with . For the scale parameters , we assume Gamma priors, , with .
Let $\mathbf{t}$ denote the observed survival times, with $\delta_i = 1$ if the event is observed and $\delta_i = 0$ if the observation is right-censored.
Applying Bayes’ rule, we derive the posterior distributions for the parameters, as follows:
It should be noted that, because the expression for $S_\alpha$ has a limiting form at $\alpha = 0$, we evaluate the geometric-mixture limit whenever $|\alpha| < \epsilon$. This is a numerical convention used for stability and for classification of the near-zero case, rather than a literal point mass in the posterior distribution.
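In the Markov chain Monte Carlo (MCMC) sampling procedure, we consider the full conditional distribution of $\alpha$. In Equation (1), $S_\alpha(t_i \mid \theta) = [\sum_{j} p_j S_j(t_i \mid \theta_j)^{\alpha}]^{1/\alpha}$, so that $S_\alpha(t_i \mid \theta) \to \prod_{j} S_j(t_i \mid \theta_j)^{p_j}$ as $\alpha \to 0$. Hence, in Gibbs sampling, we generate the sample $\alpha^{(k)}$ from its full conditional distribution and take it as zero if it is within $(-\epsilon, \epsilon)$ for a small $\epsilon$ value. Here, we use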
.
3.2. Example 2: Gamma–Weibull Mixture Model
In this example, we consider a mixture of two different distributions, Gamma and Weibull, and fix the shape parameters at 1 and
, respectively. The survival functions for subject
i are
The corresponding failure rate functions are
With mixing proportions
, we define the parameter set
which contains all model parameters. Let
denote the survival times for all subjects. The likelihood function of the $\alpha$-mixture model is
where
.
The same prior information for each parameter is used, following a similar approach to that in Example 1. We obtain the posterior distributions for the parameters:
For the sampling of $\alpha$ in the Gibbs procedure, we use its full conditional distribution. If the generated value lies in $(-\epsilon, \epsilon)$, we evaluate the limiting geometric-mixture form at $\alpha = 0$ and record the value as zero for reporting purposes. This near-zero rule is a numerical convention, not a literal posterior point mass at zero.
3.3. Example 3: Lognormal–Lognormal Mixture Model
In this example, we consider a mixture model with two Lognormal distributions and fix the standard deviations at
and
, respectively. The survival functions for subject
i are
The corresponding failure-rate functions are
where
and
denote the pdf and cumulative distribution function (cdf) of the Normal distribution, respectively.
With mixing proportions
, we define the parameter set
which contains all model parameters. Let
denote the survival times for all subjects. The likelihood function of the $\alpha$-mixture model is
where
.
We use the same prior specification for
and
, and we assign a Normal prior
to each
,
. Following an approach similar to that used in the previous examples, we derive the posterior distributions for the parameters:
Note that, for the sampling of $\alpha$ in the Gibbs procedure, we use its full conditional distribution. If the generated value lies in $(-\epsilon, \epsilon)$, we evaluate the limiting geometric-mixture form at $\alpha = 0$ and record the value as zero for reporting purposes. This near-zero rule is a numerical convention, not a literal posterior point mass at zero.
4. Simulation Study
In this section, we conduct simulation studies to evaluate the performance of the proposed Bayesian estimation method for $\alpha$-mixture models across three specific cases. Estimation accuracy is evaluated using the mean squared error (MSE). Model comparison is performed using the log pseudo-marginal likelihood (LPML), a commonly used Bayesian model selection criterion for which larger values indicate a better model fit. LPML provides an alternative to other criteria, such as DIC and WAIC, and is particularly suitable for mixture models. For each case, we generate 5000 Markov chain Monte Carlo (MCMC) samples after a burn-in period of 5000 iterations and perform 100 simulation runs. The acceptance rates of the Metropolis–Hastings updates are also recorded to assess the mixing performance of the MCMC algorithm.
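One common way to compute LPML from MCMC output is through the harmonic-mean estimate of each conditional predictive ordinate, $\mathrm{CPO}_i = \{ M^{-1} \sum_{s=1}^{M} 1 / L_i(\theta^{(s)}) \}^{-1}$, with $\mathrm{LPML} = \sum_i \log \mathrm{CPO}_i$. The sketch below assumes this estimator and a hypothetical input layout, `loglik[s][i]`, holding the log-likelihood contribution of observation $i$ (log-density if observed, log-survival if censored) under posterior draw $s$. It is an illustration, not the code used for the reported results.

```python
import math

def lpml(loglik):
    """LPML from per-draw, per-observation log-likelihood contributions.
    loglik[s][i] is the log contribution of observation i under draw s."""
    n_draws = len(loglik)
    n_obs = len(loglik[0])
    total = 0.0
    for i in range(n_obs):
        # log CPO_i = log M - logsumexp_s(-loglik[s][i]), computed stably.
        neg = [-loglik[s][i] for s in range(n_draws)]
        top = max(neg)
        lse = top + math.log(sum(math.exp(v - top) for v in neg))
        total += math.log(n_draws) - lse
    return total

# Hypothetical toy input: two draws, one observation with likelihoods 0.5 and 0.25.
toy = [[math.log(0.5)], [math.log(0.25)]]
toy_lpml = lpml(toy)   # harmonic mean of 0.5 and 0.25 is 1/3, so this is log(1/3)
```

Working on the log scale avoids overflow when some draws assign a very small likelihood to an observation, which is common for observations in the tails.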
For each simulated dataset, the sample size is set to . Right censoring is introduced by generating censoring times from a uniform distribution with the upper bound chosen to yield approximately 10% censoring.
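A simple way to choose the upper bound of the uniform censoring distribution is to calibrate it on the simulated event times. The sketch below is one possible approach under stated assumptions, not necessarily the procedure used here: it fixes the uniform draws $u_i$, sets $C_i = c \, u_i$, and bisects on $c$ until the empirical censoring fraction is about 10%. The function name, the sample size of 1000, and the Weibull event times are hypothetical.

```python
import random

def calibrate_uniform_censoring(event_times, target=0.10, seed=0):
    """Find c so that C_i ~ Uniform(0, c) censors about `target` of the sample."""
    rng = random.Random(seed)
    u = [rng.random() for _ in event_times]   # fixed uniforms, so C_i = c * u_i

    def censored_fraction(c):
        return sum(c * ui < ti for ui, ti in zip(u, event_times)) / len(event_times)

    lo, hi = 0.0, max(event_times)
    while censored_fraction(hi) > target:     # widen until censoring is rare enough
        hi *= 2.0
    for _ in range(60):                       # bisection: fraction is nonincreasing in c
        mid = 0.5 * (lo + hi)
        if censored_fraction(mid) > target:
            lo = mid
        else:
            hi = mid
    cens = [hi * ui for ui in u]
    observed = [min(t, c) for t, c in zip(event_times, cens)]
    delta = [1 if t <= c else 0 for t, c in zip(event_times, cens)]
    return hi, observed, delta

# Hypothetical event times: Weibull with scale 1 and shape 2.
sim_rng = random.Random(42)
event_times = [sim_rng.weibullvariate(1.0, 2.0) for _ in range(1000)]
upper, observed, delta = calibrate_uniform_censoring(event_times)
censoring_rate = 1.0 - sum(delta) / len(delta)
```

Holding the uniform draws fixed makes the empirical censoring fraction a monotone step function of the upper bound, which is what allows bisection to work on a single simulated sample.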
We have simulated the survival times under the following three cases:
WW: mixture of two Weibull distributions, and ;
GW: mixture of Gamma and Weibull distributions, and ; and
LL: mixture of two Lognormal distributions, and .
We assume equal mixing proportions, specifically $p_1 = p_2 = 0.5$, for all cases, and evaluate the performance of the estimator for $\alpha$ at various true values. Specifically, for WW, we consider , while for GW and LL, we consider .
For Bayesian inference, we specify independent prior distributions for all parameters. The mixing parameter is assigned a Normal prior, . The mixing proportions follow a Beta prior, , for two–component mixtures. For the parameters in each component of the mixture, we have assigned independent Gamma prior distribution for the scale parameter and Normal prior distribution for the location parameter, respectively, with hyperparameters chosen to be weakly informative.
Posterior samples are obtained using a Metropolis–Hastings-within-Gibbs sampling algorithm. At each iteration, the parameters are updated sequentially, each targeting its full conditional distribution. Because the full conditionals are not available in closed form, every parameter is updated with a Metropolis–Hastings step. For each simulation, 10,000 iterations are generated, with the first 5000 iterations discarded as burn-in, and the remaining 5000 samples are used for inference. The convergence of the MCMC chains is assessed using trace plots, which show good mixing behavior. Representative trace plots are provided in Appendix A. Posterior means based on the retained samples are used as point estimates, and the performance of the proposed method is evaluated using MSE, LPML, and acceptance rates.
Table 1 summarizes the estimation results for $\alpha$ under the three cases. In addition to $\alpha$, we also assess the recovery of the mixture weights and component parameters. The estimates for
,
, and
,
, exhibit small bias and mean squared error across simulation settings, indicating accurate recovery of the underlying parameters. Furthermore, the coverage probabilities of the 95% credible intervals were examined and found to be close to the nominal level, suggesting that the proposed Bayesian procedure provides reliable uncertainty quantification.
To avoid imposing strong prior information, we adopt weakly informative priors for all model parameters. Specifically, the parameter $\alpha$ is assigned a Normal prior with large variance, allowing a wide range of plausible values. The mixing proportions are assigned Beta or Dirichlet priors, which ensure that the weights lie within the unit simplex and are commonly used in mixture modeling. The component parameters are assigned Gamma priors to enforce positivity while remaining weakly informative. To assess robustness to prior specification, we have conducted sensitivity analyses using alternative hyperparameter values. The resulting estimates are qualitatively similar, suggesting that the proposed method is not unduly sensitive to the choice of priors.
Figure 1 displays the estimated survival curves for the three mixture model examples considered in the simulation study.
5. Data Application
We analyze three survival datasets: the Kidney Catheter data [26], the Hospital Infection data [27], and glioma data obtained from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute (seer.cancer.gov). For each dataset, the proposed $\alpha$-mixture model is fitted under multiple mixture specifications to assess the implied mixture structure through the estimated value of $\alpha$.
Table 2 summarizes the posterior results.
Here, WW denotes the two-component Weibull mixture described in Example 1, GW represents the Gamma–Weibull mixture from Example 2, and LL represents the Lognormal–Lognormal mixture from Example 3.
The posterior summaries for the model parameters are reported in Table 2. Across the three datasets, the estimates of $\alpha$ under the WW model are small and positive, indicating mixtures close to the geometric case. For the SEER–Medicare data, the credible interval for $\alpha$ is particularly narrow, suggesting strong evidence that the mixture structure is near-geometric. Although the estimates of $\alpha$ are close to zero in several cases, this does not imply that the geometric mixture necessarily provides the best fit. The value of $\alpha$ is estimated jointly with the mixing proportions and component parameters, and small deviations from zero can still affect the fitted survival function. As a result, the arithmetic case may provide a comparable or slightly better fit, as reflected by the LPML values reported in Table 3. This suggests that the data favor a mixture structure that is close to geometric but not exactly equal to it.
Table 4 compares LPML values for the single Weibull, single Lognormal, and single Gamma models across the three datasets. Since larger LPML values indicate a better fit, the single Lognormal model is preferred for all datasets. The differences are small for the Kidney Catheter data but substantial for the SEER-Medicare and Hospital Infection datasets, indicating that the Lognormal distribution provides the best single-model representation and is, therefore, used in the subsequent plots.
Figure 2 compares the fitted survival curves with the Kaplan–Meier estimates for the three real-data applications.
Across the three datasets, the Lognormal $\alpha$-mixture model (red) generally provides the best overall fit compared with the fixed-$\alpha$ mixtures (blue) and the single Lognormal model (green). The best fixed-$\alpha$ model differs by dataset: the Weibull–Weibull arithmetic mixture is preferred for the kidney data, whereas the Lognormal–Lognormal geometric mixture performs best for the hospital and SEER data. Although the single Lognormal model is the best within the single-model class for all datasets, it is less flexible and tends to deviate more from the Kaplan–Meier curve, especially in the tail regions.
6. Discussion
In this study, we developed and extended mixture models for analyzing heterogeneous survival data within a Bayesian framework. The work began with the Bayesian $\alpha$-mixture model, which estimates the mixture type through the mixing power parameter $\alpha$. Different values of $\alpha$ correspond to distinct types of mixtures, and the simulation studies demonstrated that the Bayesian $\alpha$-mixture accurately estimates $\alpha$ across a range of distributional settings. The method was implemented in the R package alpmixBayes and applied to multiple real datasets, illustrating its practical usefulness for identifying mixture types and guiding applied survival analysis. These results show that the Bayesian $\alpha$-mixture provides a reliable and interpretable approach to modeling heterogeneity when a single mixing power is adequate.
Within the Bayesian framework, model selection aims to identify the model that best captures the underlying data structure while balancing the goodness of fit and model complexity. Common criteria include the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Deviance Information Criterion (DIC) [28], the Widely Applicable Information Criterion (WAIC) [29], and Bayes factors [30]. Although these measures provide systematic tools for comparing models, they rely on fixed model structures and require separate estimation for each candidate model, which can be computationally demanding when model uncertainty is present.
The $\alpha$-mixture model offers an alternative by embedding multiple competing structures within a single formulation. Rather than fitting separate models, the parameter $\alpha$ allows the mixture to adjust its form according to the data, and its posterior distribution provides direct information about the appropriate structure. When $\alpha$ approaches specific values (e.g., 0 or 1), the mixture reduces to the corresponding classical form (geometric or arithmetic, respectively), effectively identifying the structure most consistent with the data. In this way, the $\alpha$-mixture acts as a continuous bridge among competing formulations, rather than relying on discrete comparisons.
By incorporating structural learning into the estimation step, the $\alpha$-mixture model provides a unified Bayesian approach to both parameter estimation and model selection. This reduces computational redundancy and allows for a more interpretable assessment of model adequacy while accounting for uncertainty in the underlying structure.
Future research may extend this work in several directions. One direction is to adapt the modeling framework to settings involving other forms of censoring or truncation beyond the right censoring treated here, which are common in survival analysis and require careful treatment in the likelihood and computation. Another direction is to improve computational performance as the hierarchical structure grows, since deeper models introduce more parameters and require efficient sampling or discretization methods to ensure stable estimation.
Additional applications to real data will help clarify when hierarchical mixtures offer clear benefits and when simpler models may be sufficient. Exploring a broader range of scientific settings can deepen our understanding of how mixture-based approaches represent heterogeneity and support survival modeling in more complex contexts. These extensions offer opportunities for further methodological development and for refining the role of mixture models in applied survival analysis.