1. Introduction
Mixture models and unconditional distribution models have been widely applied in the analysis of lifetime and survival data, particularly for capturing unobserved heterogeneity.
Pickles in [1] reviewed some of the main approaches to the analysis of multivariate censored survival data. They compared the performance of conditional and mixture likelihood approaches in estimating models with frailty effects in censored bivariate survival data and found that the mixture methods were surprisingly robust against misspecification of the frailty distribution. Building on the idea of model flexibility, Ref. [2] proposed a generalized Weibull distribution (the exponentiated Weibull), which offers greater flexibility in modeling hazard functions of various shapes (increasing, decreasing, and bathtub-shaped). Similarly, Ref. [3] developed a mixture model based on the extended exponential–geometric distribution to describe heterogeneous survival data, estimating the model parameters by maximum likelihood. In contrast, Ref. [4] developed survival models derived from stable distributions on the positive numbers (the gamma, the degenerate, and the inverse Gaussian distributions) to describe heterogeneity in populations and to show how these models affect hazard and survival functions.
Lawless [5] provides a comprehensive framework for modeling and analyzing lifetime data. The book covers classical lifetime distributions, such as the Weibull, Exponential, and Gamma, along with methods for parameter estimation. In addition, Ref. [6] addresses censored and truncated data, life testing, hazard models, and diagnostic tools for assessing system performance based on empirical data.
Another important contribution to reliability engineering, emphasizing practical lifetime modeling, was made by [7], while Ref. [8] presented a mathematical and probabilistic treatment of lifetime distributions, including Weibull models and other theoretical aspects.
Kuo and Peng [9] introduced a mixture model approach to analyze beetle data that included both exact observations and interval-censored cases. Building on this line of research, later contributions have sought to develop more flexible lifetime models. For instance, Rubio and Hong [10] proposed the log two-piece model as a flexible class of lifetime distributions. They estimated its parameters via maximum likelihood, evaluated the model using information criteria such as AIC, and demonstrated its applicability with real datasets, including the Mayo primary biliary cirrhosis and lung cancer studies.
In addition to mixture and frailty approaches, considerable attention has been given to extending simple one-parameter models, such as the exponential distribution, with additional parameters, often to provide greater flexibility in the tail behavior. Ref. [11] was among the first to formalize this idea, proposing the Beta-G family of distributions, which uses the Beta distribution to generate new probability models. This framework was subsequently broadened by [12], which developed the mathematical properties of these generated families, including their density and distribution functions, moments, and reliability characteristics.
Cordeiro and de Castro [13], building on the work of Kumaraswamy, introduced a new family of generalized distributions extending classical models such as the Weibull and Gamma. This idea was further developed through the generalized beta-generated family [14], enhancing flexibility in modeling hazard behaviors. Later, Torabi and Montazari [15] proposed the logistic-uniform distribution, adding further adaptability to lifetime modeling. Collectively, these studies advanced the theoretical foundation for developing more flexible and realistic reliability models.
Many flexible lifetime models have been proposed by introducing extra shape parameters, compounding techniques, and hierarchical structures. Study [16] introduced the beta exponential distribution, an extension of the classical exponential model obtained by applying the beta generator, which provides more flexibility in modeling lifetime data with various hazard rate shapes; the study also gave a comprehensive treatment of the distribution's mathematical properties. Similarly, Ref. [17] proposed a four-parameter lifetime model, the gamma-extended Fréchet distribution, which generalizes the traditional Fréchet distribution. Later, Ref. [18] developed a general method for generating families of continuous distributions based on transformations of random variables. Collectively, these contributions have advanced the development of flexible distribution families for lifetime and reliability modeling.
Kundu and Gupta [19] investigated the Marshall–Olkin bivariate Weibull distribution, developed Bayesian estimation methods for its parameters, and provided a comprehensive framework for analyzing dependent lifetime data, enhancing reliability analysis in multicomponent systems. In contrast, Ref. [20] proposed a generalized modified Weibull distribution, extending the classical Weibull model to capture a wide variety of hazard rate shapes. These models can capture various hazard patterns, including bathtub-shaped, unimodal, and other non-monotonic forms often observed in reliability and medical data.
Ghitany, Atieh, and Nadarajah [21] examined the Lindley distribution as an alternative to the exponential model, complementing the exponential–geometric model proposed earlier by Adamidis and Loukas in 1998, and demonstrated its effectiveness for lifetime data with non-constant hazard rates. Study [22] introduced the complementary exponential–geometric distribution, further enhancing flexibility in modeling heterogeneous survival data. Complementing these distributional developments, Ref. [23] presented a comprehensive framework for Bayesian survival analysis, offering powerful inferential tools for lifetime modeling. Meanwhile, the book [24] focused on the analysis of multivariate survival data, addressing dependence structures among correlated lifetimes. Its strong emphasis on conceptual foundations and modeling strategies makes it a valuable reference for applied statisticians and practitioners alike.
Ref. [25] proposed a modified Weibull extension to model bathtub-shaped failure rates, capturing early failures, stable periods, and wear-out phases. Building on methodological developments in survival analysis, Ref. [26] provided tools for analyzing interval-censored failure time data. Extending ideas introduced by Lehmann in 1953, Refs. [16,27] introduced a class of exponentiated generalized distributions and highlighted their properties and real-data applications. Similarly, Ref. [28] developed the beta generalized exponential distribution to flexibly model diverse lifetime behaviors. Ref. [29] presented generalized additive models for location, scale, and shape, offering a flexible framework for modeling univariate response variables. Collectively, these studies enriched the statistical framework for analyzing complex survival and reliability data.
Building on these foundational distributional frameworks, recent studies have used established estimation and analytical techniques to investigate the properties of newly developed lifetime models. Ref. [30] proposed shrinkage-type estimators and compared them with standard maximum likelihood estimation (MLE) in reliability analysis. Ref. [31] examined the mathematical properties of two newly introduced lifetime distributions, deriving their survival and hazard functions, moments, moment-generating functions, mean deviation, Rényi entropy, and quantile functions, and demonstrated the consistency of the MLE through Monte Carlo simulations. Similarly, Refs. [32,33] introduced new families of compound probability distributions and analyzed their statistical characteristics using MLE. Ref. [34] derived analytical expressions for the PDF, CDF, survival and hazard functions, mean residual life, and several entropy measures of an entropy-transformed Weibull model. Ref. [35] focused on cumulative residual entropy and its dynamics for residual lifetimes, while Ref. [36] examined generalized entropy measures to assess information loss in reliability systems. Expanding on these developments, Ref. [37] proposed extended concepts of cumulative residual entropy and formulated expressions for residual and cumulative entropies of continuous distributions. Collectively, these studies show how established statistical tools continue to enhance the analysis, characterization, and understanding of complex modern lifetime distributions.
Mixture models combine two or more probability distributions to represent heterogeneous populations. The mixed distribution describes the overall population, while the mixing distribution assigns weights to each component. Mathematically, it is expressed as
$$f(t) = \int f(t \mid \theta)\, g(\theta)\, d\theta,$$
where $f(t)$ denotes the mixture distribution, $f(t \mid \theta)$ is the conditional (component) density function of $T$ given $\theta$, and $g(\theta)$ is the mixing distribution, determining the relative contribution of each component (for a finite number of components, the integral becomes a weighted sum) [38].
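As a small numerical illustration of this integral, the sketch below assumes (hypothetically; these are not the UG-EM components) an exponential component density $f(t \mid \theta) = \theta e^{-\theta t}$ and a Gamma mixing density $g(\theta)$ with shape $a \ge 1$ and rate $b$. It evaluates the mixture integral with the composite Simpson rule and compares the result with the known closed form $a b^a / (b + t)^{a+1}$, which is a Lomax (Pareto type II) density. The function names are illustrative only.

```python
import math


def gamma_mixed_exponential_pdf(t, a, b, upper=60.0, n=6000):
    """Numerically evaluate f(t) = integral of f(t | theta) g(theta) d theta.

    Hypothetical example: f(t | theta) = theta * exp(-theta * t) is an
    exponential component density and g is a Gamma(shape=a, rate=b)
    mixing density with a >= 1. The integral is truncated at `upper`
    and evaluated with the composite Simpson rule (n must be even).
    """
    def integrand(theta):
        component = theta * math.exp(-theta * t)  # f(t | theta)
        mixing = (b ** a) * theta ** (a - 1) * math.exp(-b * theta) / math.gamma(a)  # g(theta)
        return component * mixing

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def gamma_mixed_exponential_pdf_closed(t, a, b):
    """Closed form of the same mixture: a Lomax (Pareto type II) density."""
    return a * b ** a / (b + t) ** (a + 1)


for t in (0.5, 1.0, 3.0):
    numeric = gamma_mixed_exponential_pdf(t, a=2.0, b=1.0)
    exact = gamma_mixed_exponential_pdf_closed(t, a=2.0, b=1.0)
    print(f"t={t}: numeric={numeric:.8f}, closed form={exact:.8f}")
```

The restriction $a \ge 1$ keeps the integrand finite at $\theta = 0$; for $a < 1$ a quadrature rule that avoids the endpoint would be needed.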
5. Entropy
In information theory, entropy is a measure of the amount of information or uncertainty in a system, and it depends only on the probability distribution of the events or outcomes. For discrete distributions it is non-negative, whereas the differential entropy of a continuous density, used here, can take negative values. High entropy means more randomness and unpredictability in the data, while low entropy implies more predictability, less information content, and less surprise. Entropy finds applications in diverse scientific and engineering contexts. The entropy $H(T)$ of the random variable $T$ with density function $f(t)$ is defined as the expectation of the function $-\log f(T)$:
$$H(T) = E[-\log f(T)] = -\int_0^\infty f(t)\,\log f(t)\,dt.$$
Entropy measures how uncertain or spread out the outcomes are, while the expectation gives the average outcome at the center of the data. Reporting them side by side in a table therefore makes the analysis clearer and more informative. Table 4 and Figure 9 present the expectation and entropy of the UG-EM model for different values of α and λ.
Table 4 and Figure 9 illustrate the time evolution of the expectation E(T) and the entropy H(T) of the UG-EM model for various values of α and λ. Both quantities increase with time, which means the system is developing and becoming more uncertain, although the entropy H(T) grows more slowly. Changes in α affect the system much more strongly than changes in λ, as shown more clearly below. These results are numerical approximations intended to illustrate how the system's behavior changes with the parameters, even in cases where the theoretical integrals do not converge.
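One way such numerical approximations could be computed is sketched below: E(T) and H(T) are obtained by composite Simpson integration of $t f(t)$ and $-f(t)\log f(t)$ over a truncated range $[0, \text{upper}]$. Because the UG-EM density is not restated in this section, an exponential density with rate `lam` (a hypothetical stand-in, not the UG-EM parameter λ) is used, for which $E(T) = 1/\lambda$ and $H(T) = 1 - \log\lambda$ are known exactly. The helper names and the truncation point are assumptions.

```python
import math


def simpson(func, lower, upper, n=20000):
    """Composite Simpson rule on [lower, upper]; n must be even."""
    h = (upper - lower) / n
    total = func(lower) + func(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lower + i * h)
    return total * h / 3


def mean_and_entropy(pdf, upper, n=20000):
    """Approximate E(T) = int t f(t) dt and H(T) = -int f(t) log f(t) dt on [0, upper]."""
    def neg_f_log_f(t):
        f = pdf(t)
        return -f * math.log(f) if f > 0 else 0.0  # convention: 0 * log 0 = 0

    mean = simpson(lambda t: t * pdf(t), 0.0, upper, n)
    entropy = simpson(neg_f_log_f, 0.0, upper, n)
    return mean, entropy


# Hypothetical check with exponential densities of rate lam,
# for which E(T) = 1 / lam and H(T) = 1 - log(lam).
for lam in (0.5, 2.0, 4.0):
    pdf = lambda t, lam=lam: lam * math.exp(-lam * t)
    m, h = mean_and_entropy(pdf, upper=60.0 / lam)
    print(f"lam={lam}: E(T)={m:.4f} (exact {1 / lam:.4f}), "
          f"H(T)={h:.4f} (exact {1 - math.log(lam):.4f})")
```

For `lam = 4` the exact entropy $1 - \log 4 \approx -0.386$ is negative, a reminder that differential entropy is not bounded below by zero. Truncating the range is itself an approximation, so `upper` must be large enough that the neglected tail mass is negligible.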
When a quadratic model is fitted to the E(T) and H(T) values in Table 4, it yields $R^2$ values of …% and 98.9%, respectively. This indicates that α and λ have a strong effect on the system mean and entropy, jointly accounting for nearly all of their variation in Table 4. Such influence reflects a higher degree of disorder and unpredictability within the UG-EM model.
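A quadratic fit of this kind, together with its $R^2$, can be sketched as follows. The sketch assumes that "quadratic model" means a full second-order polynomial in α and λ (intercept, linear, squared, and interaction terms) fitted by ordinary least squares with NumPy's `np.linalg.lstsq`; the section does not specify the model terms, and the grid and response values below are hypothetical, not the entries of Table 4.

```python
import numpy as np


def quadratic_design(alpha, lam):
    """Full quadratic design matrix in (alpha, lam): 1, a, l, a^2, l^2, a*l."""
    alpha = np.asarray(alpha, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return np.column_stack([np.ones_like(alpha), alpha, lam,
                            alpha ** 2, lam ** 2, alpha * lam])


def quadratic_r2(alpha, lam, y):
    """Least-squares fit of y on the quadratic design; returns (coefficients, R^2)."""
    X = quadratic_design(alpha, lam)
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)       # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    return coef, 1.0 - ss_res / ss_tot


# Illustrative 4 x 4 grid only (NOT the values of Table 4).
a_grid, l_grid = np.meshgrid([0.5, 1.0, 1.5, 2.0], [0.5, 1.0, 1.5, 2.0])
a_vals, l_vals = a_grid.ravel(), l_grid.ravel()
y_vals = np.log1p(a_vals) + 0.3 * l_vals  # hypothetical smooth response
coef, r2 = quadratic_r2(a_vals, l_vals, y_vals)
print(f"R^2 = {r2:.4f}")
```

A high $R^2$ says that the quadratic surface reproduces the tabulated values closely; it does not by itself test the statistical significance of individual terms.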
Figure 10 presents these effects.
Clearly, E(T) increases with both α and λ. Conversely, H(T) exhibits a decreasing trend with increasing values of both parameters. These plots capture the behavior at a specific moment in time.
It is important to note from Table 4 and Figure 10 that both the mean E(T) and the entropy H(T) increase over time, indicating system growth and rising uncertainty. At fixed times, however, there are small transient effects: increasing α or λ can produce a temporary concentration (a slight decrease) in H(T) at early times, whereas their net effect over longer time horizons is an increase in H(T). The influence of α is stronger than that of λ.
8. Application
The fit of the UG-EM model is demonstrated by applying it to three well-known survival datasets available in the R survival package. First, the Lung data come from a clinical trial of non-small cell lung cancer patients and record the survival time in days from treatment initiation to death or censoring. Second, the Kidney data record the time to recurrence of infection at the catheter insertion point in kidney patients using portable dialysis equipment. Finally, the Veteran data come from a trial comparing two treatment regimens for lung cancer and record the survival time in days from entry until death or censoring.
Note: The current analysis assumes complete observations. Right-censored data are not explicitly handled in this study, so the reported results should be interpreted accordingly. Future work will extend the model to account properly for censoring in survival datasets.
Data Sources and Ethical Statement: The datasets used in this study (Lung, Veteran, and Kidney) are publicly available through the R survival package (2025.09.0+387). All datasets are fully de-identified and therefore exempt from ethical approval. The Lung dataset originates from the North Central Cancer Treatment Group (NCCTG) study on prognostic variables in advanced lung cancer patients [43]. The Veteran dataset is based on the data described by Kalbfleisch and Prentice [44]. The Kidney dataset corresponds to catheter survival data analyzed using frailty models [45].
According to the AIC and BIC values in Table 6, the Weibull model is the best-fitting model for both the Veteran and Lung datasets, since it attains the minimum of both criteria, while the Log-normal distribution is a close second, particularly for the Veteran data. The Log-normal model gives the best fit for the Kidney data. The UG-EM model did not outperform the other candidates on any of the datasets, although it showed competitive performance on the Kidney data. Notably, the Gamma model did not fit the data well and was therefore excluded from the final comparisons.
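The comparison rests on $\mathrm{AIC} = 2k - 2\log L$ and $\mathrm{BIC} = k\log n - 2\log L$, where $k$ is the number of parameters, $n$ the sample size, and $\log L$ the maximized log-likelihood. The sketch below computes both for the Exponential and Weibull models using complete-data likelihoods, in line with the note that censoring is not handled here. The Weibull MLE is obtained by bisection on the standard profile score equation for the shape; the survival times are hypothetical (not the Lung, Veteran, or Kidney data), and the document does not state which fitting software produced Table 6.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k log n - 2 log L."""
    return k * math.log(n) - 2 * loglik


def exponential_fit(x):
    """Complete-data MLE of the exponential rate and its log-likelihood."""
    n = len(x)
    rate = n / sum(x)
    return rate, n * math.log(rate) - rate * sum(x)


def weibull_fit(x, lo=0.01, hi=50.0, tol=1e-10):
    """Complete-data Weibull MLE (shape k, scale s) by bisection on the profile equation
    1/k + mean(log x) - sum(x^k log x) / sum(x^k) = 0 (assumes the root lies in [lo, hi])."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / n
    xmax = max(x)
    u = [v / xmax for v in x]  # rescale so that u**k cannot overflow

    def score(k):
        w = [ui ** k for ui in u]
        return 1.0 / k + mean_log - sum(wi * li for wi, li in zip(w, logs)) / sum(w)

    while hi - lo > tol:  # score is decreasing in k
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    scale = xmax * (sum(ui ** k for ui in u) / n) ** (1.0 / k)
    loglik = (n * math.log(k) - n * k * math.log(scale)
              + (k - 1) * sum(logs) - sum((v / scale) ** k for v in x))
    return k, scale, loglik


# Hypothetical complete (uncensored) survival times in days, for illustration only.
times = [5, 11, 26, 53, 88, 142, 210, 305]
n = len(times)
rate, ll_exp = exponential_fit(times)
k, s, ll_wei = weibull_fit(times)
print(f"Exponential: AIC={aic(ll_exp, 1):.2f}, BIC={bic(ll_exp, 1, n):.2f}")
print(f"Weibull:     AIC={aic(ll_wei, 2):.2f}, BIC={bic(ll_wei, 2, n):.2f} (k={k:.3f})")
```

Because the Exponential model is the Weibull model with shape fixed at 1, the Weibull log-likelihood at its maximum can never be lower; AIC and BIC then decide whether the extra parameter is worth its penalty.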
The UG-EM model is particularly well suited to data with long survival times whose survival curve decays noticeably more slowly than an exponential curve, that is, data with a heavy (slowly vanishing) right tail.
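To make the heavy-tail statement concrete, the sketch below compares, at equal means, the survival function of an exponential distribution with that of a Lomax distribution, which arises when an exponential rate is mixed over a Gamma(shape $a$, rate $b$) distribution and has $S(t) = \left(b/(b+t)\right)^a$. The Lomax is only a hypothetical stand-in for a heavy-tailed mixture; it is not the UG-EM model, and the parameters $a = 2$, $b = 1$ are arbitrary.

```python
import math


def exponential_sf(t, mean):
    """Survival function of an exponential distribution with the given mean."""
    return math.exp(-t / mean)


def lomax_sf(t, a, b):
    """Survival function (b / (b + t))^a of the gamma-mixed exponential (Lomax)."""
    return (b / (b + t)) ** a


# Hypothetical parameters: a = 2, b = 1 gives a Lomax with mean b / (a - 1) = 1,
# so both curves below have the same mean.
a, b = 2.0, 1.0
mean = b / (a - 1.0)
for t in (1, 5, 10, 20):
    ratio = lomax_sf(t, a, b) / exponential_sf(t, mean)
    print(f"t={t:>2}: S_lomax={lomax_sf(t, a, b):.3e}, "
          f"S_exp={exponential_sf(t, mean):.3e}, ratio={ratio:.1f}")
```

The ratio is below 1 at short times but grows without bound as $t$ increases, because a polynomially decaying survival function eventually dominates an exponentially decaying one.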
A graphical comparison of the Exponential, Weibull, Log-normal, and UG-EM models for the Lung, Veteran, and Kidney data is presented in Figure 13.