A Random Activation Framework for Cure Models with Waring-Distributed Latent Causes

Vasquez, Jonathan K. J.; Tomazella, Vera; Alvares, Danilo; Marinho, Pedro Rafael D.; Martínez-Minaya, Joaquín

doi:10.3390/stats9030064

by Jonathan K. J. Vasquez ¹,*, Vera Tomazella ², Danilo Alvares ¹, Pedro Rafael D. Marinho ³ and Joaquín Martínez-Minaya ⁴

¹ Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil

² Department of Statistics, Federal University of São Carlos, São Carlos 13565-905, Brazil

³ Department of Statistics, Federal University of Paraíba, João Pessoa 58051-900, Brazil

⁴ Department of Applied Statistics and Operational Research and Quality, Universitat Politècnica de València, 46022 Valencia, Spain

* Author to whom correspondence should be addressed.

Stats 2026, 9(3), 64; https://doi.org/10.3390/stats9030064 (registering DOI)

Submission received: 23 April 2026 / Revised: 12 June 2026 / Accepted: 16 June 2026 / Published: 19 June 2026

(This article belongs to the Section Survival Analysis)

Abstract

This paper introduces a random activation framework for cure rate modeling that provides a novel latent mechanistic interpretation of the standard mixture cure model, utilizing a Waring-distributed number of latent causes. The proposed approach represents unobserved heterogeneity through a discrete latent variable interpreted as the number of potential risk factors, providing a flexible and biologically interpretable characterization of individual susceptibility. In contrast to classical competing risks models based on extremal operators or deterministic activation schemes, the event time is assumed to arise from a stochastic selection among latent causes. This random activation mechanism defines a unified probabilistic framework in which the cure fraction emerges naturally as the probability of having zero latent causes. The Waring distribution is adopted to model the latent count structure due to its hierarchical formulation, which accommodates overdispersion and heavy-tailed behavior strictly within the latent parametrization of individual risk factors. Under this framework, while the population survival function mathematically reduces to the classical mixture cure representation, the model provides an alternative structure where covariates directly impact the expected latent burden. Parameter estimation for the identifiable regression structure is performed via maximum likelihood, and the finite-sample performance of the estimators is assessed through Monte Carlo simulations, showing accurate parameter recovery and stable inferential properties. An application to real survival data illustrates the practical relevance and epidemiological interpretability of the proposed framework. Overall, this work extends the understanding of existing cure rate models by integrating latent count structures and stochastic activation within a coherent setting, providing a powerful interpretation tool for heterogeneous survival data with long-term survivors.

Keywords: cure fraction; long-term survival; random activation; Waring distribution

1. Introduction

Survival analysis focuses on modeling the time until the occurrence of an event and has been widely applied across several fields. A central challenge in this context is accounting for unobserved factors that affect the risk of event occurrence. To address this issue, Vaupel et al. [1] introduced frailty models, which incorporate random effects to represent heterogeneity among individuals. Under this framework, individuals with higher frailty tend to experience the event earlier, whereas less frail individuals exhibit greater resistance.

Traditionally, continuous distributions such as Gamma, Inverse Gamma, and Generalized Gamma have been used to model unobserved heterogeneity among individuals [1,2,3,4,5,6,7]. However, recent advances in medical research have highlighted the need to explicitly account for individuals who are not susceptible to the event of interest, commonly referred to as cured individuals. In such settings, discrete latent-effect distributions become particularly appealing, as they naturally allow for the inclusion of a non-susceptible fraction in the population. Several studies have emphasized the usefulness of these approaches in modeling cure rate data [8,9,10,11,12,13].

More generally, the relationship between latent-effect models and cure rate models has been widely explored in recent years, especially through discrete formulations based on power series distributions. These approaches allow the cure fraction to arise naturally as the probability of zero latent causes, providing a unified framework for modeling both long-term survivors and unobserved heterogeneity [8,12]. Despite this flexibility, most existing methods primarily emphasize distributional assumptions and inferential procedures, with comparatively less attention given to the structural interpretation of the different sources of variability underlying heterogeneity.

In biological applications, overdispersion has motivated the use of the Negative Binomial (NB) distribution, originally proposed by Bates & Neyman [14], which can be viewed as a Poisson distribution with a Gamma-distributed mean. Although widely used, the NB distribution may not adequately capture more complex forms of heterogeneity, particularly in situations where certain risk factors may disappear without external intervention. As an alternative, the Waring distribution has been proposed [15], extending the NB framework through an additional Beta mixing layer. This hierarchical structure enables the simultaneous modeling of internal and external sources of variability, allowing a clearer distinction between individual susceptibility and unobserved environmental effects.

The Waring distribution is particularly suitable for scenarios in which both the event occurrence rate and the probability of success exhibit variability. Its hierarchical formulation, combining Poisson, Gamma, and Beta components, has been successfully applied in areas such as reliability and biomedical sciences, where capturing uncertainty and heterogeneity is essential [16].

A notable feature of the Waring distribution, which has received limited attention in the survival analysis literature, is its ability to decompose total variability into distinct and interpretable components associated with different sources of heterogeneity. This property makes it especially attractive for modeling complex systems in which latent risk factors originate from multiple mechanisms, including biological, environmental, and individual-specific processes.

Another important challenge in survival analysis lies in identifying the mechanisms responsible for event occurrence in the presence of multiple latent risk factors. As discussed by Minayo [17], many diseases have multifactorial origins, requiring models capable of accommodating such complexity. In this context, the Waring distribution provides a flexible framework for representing heterogeneity in multifactorial settings.

Classical competing risks models typically assume that the event is triggered by a specific activation rule. For instance, the event may occur upon activation of the first latent cause (minimum activation), a formulation that has been extensively studied in the literature, including recent developments based on discrete frailty models [18], or only after all causes are activated. Between these extremes, random activation schemes offer a more flexible and realistic alternative, allowing the triggering mechanism to be governed by stochastic selection among latent causes. Previous works have explored these schemes as a way to capture uncertainty in the activation process [19,20]. In particular, random activation is well suited for modeling scenarios in which multiple latent factors may contribute to the occurrence of the event in an uncertain and unobservable manner, as commonly observed in clinical applications, such as tumor recurrence.

More recently, activation-based cure models have gained increasing attention due to their ability to incorporate biologically meaningful mechanisms into survival analysis [21,22]. The random activation scheme, in particular, provides an intermediate structure between competing and complementary risks, offering greater flexibility in representing the failure mechanism. Nevertheless, many existing models rely on relatively simple assumptions regarding the distribution of latent causes, which may limit their ability to capture more complex patterns of heterogeneity.

Recent contributions have also investigated early and late activation mechanisms, incorporating additional features such as spatial variability and degradation of risk factors [21,23]. These developments highlight the need for more flexible modeling approaches capable of simultaneously capturing the cure fraction and the underlying heterogeneity structure.

Motivated by these challenges, this paper proposes a cure rate modeling framework that combines a Waring-distributed latent structure with a random activation mechanism within a unified probabilistic setting. Unlike classical mixture cure models, in which the cure fraction is imposed at the population level, the proposed approach derives this structure from an explicit latent mechanism, where the cure probability emerges naturally as the probability of having zero latent causes. This provides a clear structural interpretation of long-term survival, directly linking it to the underlying heterogeneity in the number of risk factors.

From a modeling perspective, the use of the Waring distribution provides a flexible model for the latent number of competing causes, accommodating overdispersion and heavy-tailed count behavior while allowing an interpretable decomposition of variability. Under the assumptions adopted in this work, these characteristics operate at the latent level and influence the observable survival function through the induced cure fraction.

The main contributions of this work can be summarized as follows. First, we introduce a class of cure rate models that integrates discrete latent-effect distributions and stochastic activation mechanisms in a coherent framework. Second, we establish that the classical mixture cure representation arises as a consequence of the proposed construction, rather than being assumed a priori, which provides a mechanistic interpretation of mixture cure models. Third, we develop a full likelihood-based inferential procedure that accommodates right-censored data and allows the inclusion of covariates through a regression structure on the latent mean. Finally, we demonstrate through simulation studies and a real data application that the proposed model provides accurate estimation, meaningful interpretation, and satisfactory inferential performance in the presence of heterogeneous survival data with long-term survivors.

The remainder of this paper is organized as follows. Section 2 introduces the Waring distribution and discusses its main properties. Section 3 presents the proposed model under the random activation scheme. Section 4 describes the inferential framework based on maximum likelihood estimation, including the construction of the likelihood function, parameter estimation procedures, and the treatment of right-censored data. Section 5 reports the results of a simulation study designed to assess the performance of the estimators. In Section 6, the model is applied to melanoma cancer data. Finally, Section 7 provides concluding remarks.

2. The Waring Distribution

This section provides a brief overview of the two-parameter Waring distribution, emphasizing its key probabilistic properties and its relevance for modeling heterogeneity in survival data.

The Waring distribution can be viewed as an extension of the Yule–Simon distribution [24] and has been widely used in applications where overdispersion and heterogeneity are present. In this context, the Waring distribution [25] is derived from the Waring series, defined as:

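\frac{1}{x - a} = \sum_{m = 0}^{\infty} \frac{{(a)}_{m}}{{(x)}_{m + 1}},

(1)

where ${(α)}_{q} = α (α + 1) \dots (α + q - 1)$ is the ascending factorial; for $α > 0$ it follows that ${(α)}_{q} = Γ (α + q) / Γ (α)$. Setting $ρ = x - a$ and multiplying both sides of Equation (1) by $ρ$ shows that the terms $ρ {(a)}_{m} / {(a + ρ)}_{m + 1}$ sum to one. The probability mass function of M is therefore given by:

P [M = m] = p_{m} = ρ \frac{{(a)}_{m}}{{(a + ρ)}_{m + 1}}, \quad m = 0, 1, 2, \dots,

(2)

where $a > 0$ and $ρ > 2$ (the mass function itself is well defined for any $ρ > 0$; the conditions $ρ > 1$ and $ρ > 2$ are needed for a finite mean and a finite variance, respectively). The probability mass function of the Waring distribution can be expressed in terms of the Gamma function as shown below: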

p_{m} = ρ \frac{{(a)}_{m}}{{(a + ρ)}_{m + 1}} = ρ \frac{\frac{Γ (a + m)}{Γ (a)}}{\frac{Γ (a + ρ + m + 1)}{Γ (a + ρ)}} = ρ \frac{Γ (a + m)}{Γ (a)} \frac{Γ (a + ρ)}{Γ (a + ρ + m + 1)} .

(3)
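As a numerical check of Equations (2) and (3), the following minimal sketch evaluates the mass function on the log scale and verifies that it sums to one, that the mass at zero equals ρ/(a + ρ), and that the mean is close to a/(ρ − 1). The parameter values, the truncation point, and the function names are illustrative assumptions, not quantities taken from the paper.

```python
import math

def waring_log_pmf(m: int, a: float, rho: float) -> float:
    """Log of p_m in Equation (3), using log-gamma for numerical stability."""
    return (math.log(rho)
            + math.lgamma(a + m) - math.lgamma(a)
            + math.lgamma(a + rho) - math.lgamma(a + rho + m + 1))

def waring_pmf(m: int, a: float, rho: float) -> float:
    return math.exp(waring_log_pmf(m, a, rho))

# Illustrative values only (assumed for this sketch).
a, rho = 2.0, 3.0
n_terms = 20_000  # truncation point for the infinite sums

probs = [waring_pmf(m, a, rho) for m in range(n_terms)]
total_mass = sum(probs)
mean_approx = sum(m * p for m, p in enumerate(probs))

print(f"P(M = 0)    = {probs[0]:.6f}  vs rho/(a+rho) = {rho / (a + rho):.6f}")
print(f"total mass  = {total_mass:.10f}")
print(f"mean (approx) = {mean_approx:.6f}  vs a/(rho-1) = {a / (rho - 1):.6f}")
```

Because the tail of the Waring distribution decays polynomially, the truncated mean converges slowly when ρ is close to 1; a larger truncation point would then be needed.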

In Figure 1, the behavior of the probability mass function of the Waring distribution $W (a, ρ)$ is illustrated for different values of the parameters $a$ and $ρ$.

The Waring distribution can be written as a mixture of the Poisson distribution with the Gamma and Beta distributions. This approach more comprehensively captures the variability and uncertainty inherent in many real-world phenomena.

Hierarchical Representation of the Waring Distribution

A convenient way to understand the Waring distribution is through its hierarchical construction, which reveals the different sources of variability contributing to the overall dispersion [26]. This hierarchical formulation allows the total variability to be decomposed into distinct components, typically interpreted as random, external, and internal effects. These components capture different aspects of heterogeneity, including individual-level variability, unobserved environmental influences, and structural variability associated with the mean behavior of the process. The first and simplest approach is a two-stage mixture of a Geometric distribution with a Beta distribution:

\left\{ \begin{matrix} M \mid p \sim \mathrm{Geometric} (p), \\ p \sim \mathrm{Beta} (a, ρ) . \end{matrix} \right.

(4)

The second, more complex approach mixes the Poisson distribution with the Gamma and Beta distributions in three stages, which captures the variability and uncertainty inherent in many real phenomena more comprehensively.

To separate internal and external sources of variability, Irwin [26] introduced a three-stage hierarchical model:

\mathrm{Poisson} (λ) \underset{λ}{\land} \mathrm{Gamma} (a, \frac{1 - p}{p}) \underset{p}{\land} \mathrm{Beta} (ρ, 1) .

(5)

Of the three stages that form the hierarchical representation in Equation (5), we can state the following:

1. First stage (random effect): the number of risk factors M (e.g., number of cancerous cells, number of bacteria) follows a Poisson distribution, i.e., $M \sim Poisson (λ)$.
2. Second stage (external effect): the average number of risk factors $λ$ follows a Gamma distribution, i.e., $λ \sim Gamma (a, v)$. Consequently, the number of risk factors M is a discrete variable that follows a Negative Binomial (NB) distribution, $M \sim NB (a, p)$ with $p = 1 / (1 + v)$.
3. Third stage (internal effect): the success probability $p = 1 / (1 + v)$ is itself random, with $p \sim Beta (ρ, 1)$. Thus, the number of risk factors M follows a Waring distribution, in which the variance captures three sources of variation: random effect, external effect, and internal effect. From the stochastic representation above, we have the following:

$M \sim Waring (a, ρ),$

(6)

where the mean of Equation (6) is described by:

$\begin{matrix} E (M) = \frac{a}{ρ - 1} = μ . \end{matrix}$

(7)
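The three stages can be sketched as a sampler. The code below is a minimal illustration, assuming that $Gamma (a, v)$ is read as shape a and scale v (the reading under which $p = 1 / (1 + v)$ gives the NB mixture) and using a small helper, `sample_poisson`, written for this sketch because the Python standard library has no Poisson generator. Parameter values and the seed are arbitrary.

```python
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_waring(a: float, rho: float, rng: random.Random) -> int:
    """One draw of M through the three stages of Equation (5)."""
    p = rng.betavariate(rho, 1.0)           # internal effect: p ~ Beta(rho, 1)
    while not 0.0 < p < 1.0:                # guard against degenerate draws
        p = rng.betavariate(rho, 1.0)
    v = (1.0 - p) / p                       # so that p = 1 / (1 + v)
    lam = rng.gammavariate(a, v)            # external effect: shape a, scale v
    return sample_poisson(lam, rng)         # random effect: M | lam ~ Poisson

# Illustrative values only (assumed for this sketch).
rng = random.Random(2026)
a, rho, n = 2.0, 3.0, 50_000
draws = [sample_waring(a, rho, rng) for _ in range(n)]
frac_zero = sum(d == 0 for d in draws) / n

print(f"empirical P(M = 0) = {frac_zero:.4f}")
print(f"rho / (a + rho)    = {rho / (a + rho):.4f}")
```

The empirical proportion of zeros should be close to ρ/(a + ρ), the mass at zero implied by Equation (2); the sample mean can be compared with Equation (7) in the same way, although it stabilizes more slowly because of the heavy tail.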

The Waring distribution is characterized by overdispersion, meaning that the variance exceeds the mean. The variance can be decomposed into three sources: random effect, external effect, and internal effect, which reflect different aspects of variability and heterogeneity in patient risk. The random effect captures unobserved heterogeneity among individuals, external effects represent unobserved covariates or environmental factors influencing survival, and the internal effect accounts for variation that grows with the mean survival time.

Table 1 presents this decomposition of the total variance of the Waring distribution into the three sources, along with their respective variance rates. It is important to note that this decomposition is a consequence of the adopted Waring latent-count parametrization. Under the random activation framework considered in this work, the resulting components should be interpreted as model-implied summaries of latent heterogeneity rather than as separately identifiable features of the observable survival data.

The variance rate expresses the proportion of total variability attributed to each source. This decomposition helps understand how each component contributes to the observed dispersion in the data, with the random effect predominating when the mean survival is low, the internal effect becoming dominant as the mean increases, and the external effect remaining intermediate and modulated by the parameter $ρ$.

3. Random Activation Mechanism with Waring-Distributed Latent Causes

In this section, we formally introduce the proposed model, which combines a discrete latent-cause structure based on the Waring distribution with a stochastic activation mechanism. The key distinction from classical competing risks formulations lies in the definition of the observed event time: rather than being determined by an extremal operator (e.g., minimum or maximum), the event arises through a random selection among latent activation times. This induces a different probabilistic structure and leads to a specific mixture representation at the population level.

The proposed construction can be interpreted within a hierarchical latent variable framework, in which both the number of latent causes and the activation mechanism are unobserved.

3.1. General Random Activation Mechanism

Let M be a non-negative integer-valued random variable representing the number of latent causes (such as residual tumor cells) within a single individual. Conditional on $M = m$ for $m \geq 1$, consider a collection of latent activation times $\{Z_{1}, \dots, Z_{m}\}$ defined on a common probability space.

To provide a parsimonious framework suitable for our data application, we assume that, given $M = m$, the latent activation times are independent and identically distributed with cumulative distribution function $F_{0} (t)$ and survival function $S_{0} (t) = 1 - F_{0} (t)$. We define a random activation mechanism where the observed event time T arises from a probabilistic selection among these latent times, yielding the following conditional survival function

P (T > t ∣ M = m) = S_{0} (t) .

(8)

This specification establishes an invariance property where the conditional survival distribution of T depends on the baseline trajectory $S_{0} (t)$ rather than the specific number of latent components, provided that at least one cause is present ($m \geq 1$).

Specifically, conditional on $M = m \geq 1$, one latent cause is selected at random through a latent variable K taking values in $\{1, \dots, m\}$, and the observed event time is the selected activation time, $T = Z_{K}$. For simplicity, and consistently with the data application, we assume uniform selection among latent causes.

Under the assumption that the latent activation times are independent and identically distributed, the survival distribution of the selected activation time coincides with the baseline survival function, which yields Equation (8).
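The invariance property can be illustrated by simulation. The sketch below draws m independent baseline times, selects one uniformly at random, and compares the empirical survival proportion at a fixed time with $S_{0} (t)$ for several values of m. The Weibull-type baseline $S_{0} (t) = exp (- λ t^{α})$, the parameter values, and the function names are assumptions made only for this illustration.

```python
import math
import random

def baseline_time(lam: float, alpha: float, rng: random.Random) -> float:
    """Inverse-transform draw from S0(t) = exp(-lam * t**alpha)."""
    e = rng.expovariate(1.0)                 # E ~ Exp(1), i.e. -log(U)
    return (e / lam) ** (1.0 / alpha)

def selected_time(m: int, lam: float, alpha: float, rng: random.Random) -> float:
    """Draw m i.i.d. latent times and return one chosen uniformly at random."""
    z = [baseline_time(lam, alpha, rng) for _ in range(m)]
    k = rng.randrange(m)                     # K uniform on the m latent causes
    return z[k]

# Illustrative values only (assumed for this sketch).
rng = random.Random(1)
lam, alpha, t0, n = 0.5, 1.5, 1.0, 20_000
s0 = math.exp(-lam * t0 ** alpha)

empirical = {}
for m in (1, 3, 10):
    hits = sum(selected_time(m, lam, alpha, rng) > t0 for _ in range(n))
    empirical[m] = hits / n
    print(f"m = {m:2d}: P(T > t0 | M = m) ~ {empirical[m]:.4f}  (S0(t0) = {s0:.4f})")
```

Replacing the uniform selection by the minimum or the maximum of the latent times would make the proportions depend on m, which is the contrast with extremal activation schemes drawn in the text.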

By marginalizing over M, the population survival function is given by the following decomposition

S_{pop} (t) = P (T > t) = \sum_{m = 0}^{\infty} P (M = m) P (T > t ∣ M = m) .

(9)

When $M = 0$, no latent cause is present and we define $T = \infty$ almost surely, which implies $P (T > t ∣ M = 0) = 1$. For $M \geq 1$, substituting the invariance property from Equation (8) into Equation (9) yields

S_{pop} (t) = P (M = 0) + S_{0} (t) \sum_{m = 1}^{\infty} P (M = m) .

(10)

Since $\sum_{m = 1}^{\infty} P (M = m) = 1 - P (M = 0)$, it follows that

S_{pop} (t) = P (M = 0) + (1 - P (M = 0)) S_{0} (t) .

(11)

It is important to note that, under the assumptions of conditional independence and identically distributed latent activation times, the resulting population survival function takes the same form as the classical mixture cure model. Consequently, the positive component of the latent count distribution does not directly appear in the observable survival function. In this setting, the Waring distribution provides a probabilistic mechanism for modeling the latent number of causes and the associated cure fraction.

Therefore, the resulting model takes the form of a mixture cure model at the population level. Importantly, this structure is not imposed a priori, but emerges naturally from the proposed random activation framework combined with a Waring-distributed number of latent causes. Although the observable survival function coincides with that of a standard mixture cure model under the assumptions adopted here, the proposed framework offers a mechanistic interpretation of the cure fraction and establishes a connection between random activation processes and latent competing causes.

3.2. Special Case: Uniform Random Activation Mechanism

This section considers a particular case of the random activation mechanism introduced in Section 3.1 in which the selection rule is uniform and independent of latent activation times. Specifically, for $M = m \geq 1$, we assume that

P (K = k ∣ M = m, Z) = P (K = k ∣ M = m) = \frac{1}{m}, k = 1, \dots, m .

(12)

Under this specification, each latent cause has the same probability of being activated, reflecting an exchangeable structure with no preferential triggering. This corresponds to a baseline version of the random activation mechanism, in which the selection is purely stochastic and does not depend on the latent activation times.

Assuming further that the latent activation times $\{Z_{1}, \dots, Z_{m}\}$ are independent and identically distributed with survival function $S_{0} (t)$, the conditional survival function simplifies to Equation (8). This invariance property indicates that, under uniform random activation, the survival distribution of the observed event time does not depend on the number of latent causes, provided that at least one cause is present. In this case, the role of M is restricted to determining whether the individual is susceptible ($M \geq 1$) or not ($M = 0$).
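The simplification to Equation (8) can be written out in one line. The following derivation is a sketch that uses only Equation (12) and the assumption of independent and identically distributed latent times, with the observed time taken to be the selected activation time $Z_{K}$:

```latex
\begin{aligned}
P (T > t \mid M = m)
  &= \sum_{k = 1}^{m} P (K = k \mid M = m) \, P (Z_{k} > t \mid M = m, K = k) \\
  &= \sum_{k = 1}^{m} \frac{1}{m} \, S_{0} (t) = S_{0} (t), \qquad m \geq 1 .
\end{aligned}
```

The second equality relies on the selection being independent of the latent times, so that conditioning on $K = k$ does not change the distribution of $Z_{k}$.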

Consequently, the population survival function (see Equation (11)) reduces to

S_{pop} (t) = p_{0} + (1 - p_{0}) S_{0} (t),

(13)

where $p_{0} = P (M = 0)$ denotes the cure fraction.

Although this expression coincides with the classical mixture cure model, it is important to emphasize that, within the present framework, it arises as a direct consequence of the random activation mechanism rather than being imposed at the population level. This provides a clear probabilistic interpretation of the cure fraction as the probability of having no latent causes.

3.3. Waring-Distributed Number of Latent Causes

We assume that the latent variable M follows a Waring distribution with parameters $a > 0$ and $ρ > 0$. This choice is motivated by its flexible hierarchical structure, which allows for the modeling of overdispersion and multiple sources of variability. In particular, the Waring distribution is capable of capturing heavy-tailed behavior in the number of latent causes, allowing for the presence of individuals with a large number of potential risks. Moreover, it provides a natural mechanism to induce a cure fraction through its positive probability mass at zero.

Under the random activation mechanism and the assumption of independent and identically distributed latent activation times, the invariance property established in Equation (8) holds. In particular, the conditional survival function does not depend on the number of latent causes for $M \geq 1$, and the population survival function follows the general expression derived in Equation (11).

Specifically, individuals with $M = 0$ have no latent causes and are therefore interpreted as cured, with $T = \infty$ almost surely. In this setting, the influence of the Waring distribution enters the model through the probability mass at zero; for $M \sim Waring (a, ρ)$, this probability is given by:

p_{0} = P (M = 0) = \frac{ρ}{a + ρ} .

(14)

Substituting this expression into the population survival function obtained in Equation (11), we obtain

S_{pop} (t) = \frac{ρ}{a + ρ} + (1 - \frac{ρ}{a + ρ}) S_{0} (t) .

(15)

Therefore, the asymptotic survival level is directly determined by the parameters of the Waring distribution:

\lim_{t \to \infty} S_{pop} (t) = \frac{ρ}{a + ρ} .

(16)

Although this expression coincides with the classical mixture cure model, it is important to emphasize that, in the present framework, this structure is not imposed at the population level. Instead, it arises naturally from the combination of (i) the discrete distribution of the number of latent causes and (ii) the random activation mechanism governing the selection of the triggering cause within the individual.

This provides a clear probabilistic interpretation of the cure fraction as the probability of having zero latent causes, thereby linking long-term survival directly to the underlying heterogeneity structure.

Assuming that the baseline activation times follow a Weibull distribution with parameters $λ > 0$ and $α > 0$, we have

S_{0} (t) = \exp (- λ t^{α}) \quad \text{and} \quad f_{0} (t) = λ α t^{α - 1} \exp (- λ t^{α}) .

(17)

Substituting into the population-level expressions given in Equation (15), we obtain

\begin{matrix} S_{pop} (t) = \frac{ρ}{a + ρ} + (1 - \frac{ρ}{a + ρ}) exp (- λ t^{α}), \end{matrix}

(18)

and

\begin{matrix} f_{pop} (t) = (1 - \frac{ρ}{a + ρ}) λ α t^{α - 1} exp (- λ t^{α}) = \frac{a}{a + ρ} λ α t^{α - 1} exp (- λ t^{α}) . \end{matrix}

(19)
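Equations (18) and (19) translate directly into code. The sketch below is a minimal illustration with assumed parameter values and function names; it checks numerically that the density is the negative derivative of the survival function and that the survival curve levels off at the cure fraction of Equation (16).

```python
import math

def cure_fraction(a: float, rho: float) -> float:
    """P(M = 0) for the Waring distribution, Equation (14)."""
    return rho / (a + rho)

def s_pop(t: float, a: float, rho: float, lam: float, alpha: float) -> float:
    """Population survival function, Equation (18)."""
    p0 = cure_fraction(a, rho)
    return p0 + (1.0 - p0) * math.exp(-lam * t ** alpha)

def f_pop(t: float, a: float, rho: float, lam: float, alpha: float) -> float:
    """Population (improper) density, Equation (19)."""
    p0 = cure_fraction(a, rho)
    return (1.0 - p0) * lam * alpha * t ** (alpha - 1.0) * math.exp(-lam * t ** alpha)

# Illustrative values only (assumed for this sketch).
a, rho, lam, alpha = 2.0, 3.0, 0.5, 1.5
t, h = 1.2, 1e-6

# Central finite difference of -S_pop should match f_pop.
numeric_density = -(s_pop(t + h, a, rho, lam, alpha)
                    - s_pop(t - h, a, rho, lam, alpha)) / (2.0 * h)
plateau = s_pop(1e3, a, rho, lam, alpha)

print(f"f_pop(t)          = {f_pop(t, a, rho, lam, alpha):.8f}")
print(f"-dS_pop/dt (num.) = {numeric_density:.8f}")
print(f"plateau           = {plateau:.6f}  vs rho/(a+rho) = {cure_fraction(a, rho):.6f}")
```

Note that `f_pop` integrates to 1 − p₀ rather than to one, which is the usual feature of cure models: the missing mass corresponds to individuals who never experience the event.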

It is noteworthy that, under the random activation mechanism, the influence of the Waring distribution on the population survival function operates exclusively through the probability mass at zero, $P (M = 0)$. In contrast, the survival behavior of susceptible individuals remains fully governed by the baseline distribution. This separation yields a clear and interpretable decomposition between cure fraction and failure-time dynamics.

In the present framework, the latent variable M represents the number of potential causes rather than a multiplicative effect on the hazard function. The observed event time arises from the random activation mechanism rather than from a scaling of the baseline hazard, leading to a distinct probabilistic interpretation of unobserved heterogeneity suitable for illustrative applications.

4. Inference

We now describe the inferential procedure for parameter estimation under the proposed model, based on the maximum likelihood framework and considering right-censored survival data.

Let $D = \{(t_{i}, δ_{i}), i = 1, \dots, n\}$ denote a random sample, where $t_{i}$ represents the observed survival time for the i-th individual and $δ_{i} \in \{0, 1\}$ is the censoring indicator, with $δ_{i} = 1$ if the event is observed and $δ_{i} = 0$ if the observation is right-censored.

Under the assumption of independent observations, the likelihood function is constructed from the population density and the survival function. Specifically, the likelihood function can be written as

\begin{matrix} L (ϑ ∣ D) = \prod_{i = 1}^{n} {[f_{pop} (t_{i})]}^{δ_{i}} {[S_{pop} (t_{i})]}^{1 - δ_{i}} . \end{matrix}

(20)

Substituting the closed-form expressions under the Waring–Weibull specification (see Equations (18) and (19)), we obtain the following

\begin{matrix} L (ϑ ∣ D) = \prod_{i = 1}^{n} {[\frac{a}{a + ρ} λ α t_{i}^{α - 1} e^{- λ t_{i}^{α}}]}^{δ_{i}} {[\frac{ρ + a e^{- λ t_{i}^{α}}}{a + ρ}]}^{1 - δ_{i}}, \end{matrix}

(21)

where $ϑ = (a, ρ, λ, α)$ denotes the vector of unknown parameters.

The corresponding log-likelihood function is given by:

\begin{aligned} ℓ (ϑ) = {} & \sum_{i = 1}^{n} δ_{i} [\log a - \log (a + ρ) + \log λ + \log α + (α - 1) \log t_{i} - λ t_{i}^{α}] \\ & + \sum_{i = 1}^{n} (1 - δ_{i}) [\log (ρ + a e^{- λ t_{i}^{α}}) - \log (a + ρ)] . \end{aligned}

(22)

It is worth emphasizing that the log-likelihood naturally decomposes into two components: one associated with the contribution of uncensored observations through the density function, and another driven by censored observations through the survival function. The term $\log (ρ + a e^{- λ t_{i}^{α}})$ reflects the latent mixture structure induced by the Waring distribution.
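The log-likelihood in Equation (22) can be coded term by term. The sketch below is an illustration under assumed inputs: the toy data, the parameter values, and the function names are hypothetical, and the product form of Equation (21) is evaluated only as a cross-check.

```python
import math

def log_likelihood(theta, times, events):
    """Equation (22); theta = (a, rho, lam, alpha), events[i] is delta_i."""
    a, rho, lam, alpha = theta
    total = 0.0
    for t, d in zip(times, events):
        if d == 1:
            # Observed event: contribution log f_pop(t).
            total += (math.log(a) - math.log(a + rho)
                      + math.log(lam) + math.log(alpha)
                      + (alpha - 1.0) * math.log(t) - lam * t ** alpha)
        else:
            # Right-censored observation: contribution log S_pop(t).
            total += (math.log(rho + a * math.exp(-lam * t ** alpha))
                      - math.log(a + rho))
    return total

def likelihood_product(theta, times, events):
    """Direct evaluation of Equation (21), used only as a cross-check."""
    a, rho, lam, alpha = theta
    value = 1.0
    for t, d in zip(times, events):
        base = math.exp(-lam * t ** alpha)
        if d == 1:
            value *= a / (a + rho) * lam * alpha * t ** (alpha - 1.0) * base
        else:
            value *= (rho + a * base) / (a + rho)
    return value

# Hypothetical toy data: two events and two censored observations.
times = [0.4, 1.1, 2.5, 3.0]
events = [1, 1, 0, 0]
theta = (2.0, 3.0, 0.5, 1.5)

loglik = log_likelihood(theta, times, events)
print(f"log-likelihood          = {loglik:.6f}")
print(f"log of Equation (21)    = {math.log(likelihood_product(theta, times, events)):.6f}")
print(f"with (a, rho) doubled   = {log_likelihood((4.0, 6.0, 0.5, 1.5), times, events):.6f}")
```

The last line shows that, without covariates, a and ρ enter Equation (22) only through the ratio ρ/(a + ρ), so doubling both leaves the value unchanged. In practice the function would be passed, with its sign reversed, to a numerical optimizer of the reader's choice.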

Incorporation of Covariates

To incorporate covariate information, we model the mean of the Waring distribution through a regression structure. Recall that the mean number of latent causes is given by

μ = \frac{a}{ρ - 1} .

To ensure positivity and facilitate interpretation, we adopt a log-link function:

\begin{matrix} μ (x) = \frac{a (x)}{ρ - 1} = exp (x^{⊤} β), \end{matrix}

(23)

which implies

a (x) = (ρ - 1) exp (x^{⊤} β) .

(24)

It is important to note that $a (x)$ is not treated as an independent parameter. Rather, it is fully determined by $ρ$ and the regression coefficients through the adopted regression structure. Consequently, the inferential procedure is carried out directly on the parameter vector $(β^{⊤}, ρ, λ, α)$, avoiding overparameterization of the model.

Substituting $a (x)$ into the probability mass function of the Waring distribution, the cure fraction becomes

p_{0} (x) = P (M = 0 ∣ x) = \frac{ρ}{ρ + a (x)} = \frac{ρ}{ρ + (ρ - 1) exp (x^{⊤} β)} .

(25)
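A minimal sketch of Equations (23) to (25) follows. The coefficient values, the single binary covariate, and the function names are hypothetical and chosen only to show how a covariate shifts the latent mean and the cure probability; the construction assumes $ρ > 1$ so that $a (x)$ is positive.

```python
import math

def latent_mean(x, beta):
    """mu(x) = exp(x' beta), Equation (23); x includes the leading 1."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def a_of_x(x, beta, rho):
    """a(x) = (rho - 1) * exp(x' beta), Equation (24); requires rho > 1."""
    return (rho - 1.0) * latent_mean(x, beta)

def cure_probability(x, beta, rho):
    """p0(x) = rho / (rho + a(x)), Equation (25)."""
    return rho / (rho + a_of_x(x, beta, rho))

# Hypothetical coefficients: intercept and one binary covariate z.
beta = (-0.5, 0.8)
rho = 3.0

cure = {}
for z in (0, 1):
    x = (1.0, float(z))
    cure[z] = cure_probability(x, beta, rho)
    print(f"z = {z}: mu(x) = {latent_mean(x, beta):.4f}, "
          f"a(x) = {a_of_x(x, beta, rho):.4f}, p0(x) = {cure[z]:.4f}")
```

With a positive coefficient on z, the expected number of latent causes increases and the cure probability decreases when z moves from 0 to 1.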

It is worth noting that the parameter $ρ$ and the intercept jointly contribute to the baseline cure probability through Equation (25). Consequently, inference should be based on the full likelihood rather than on a separate interpretation of these parameters. In the present framework, the primary quantities of interest are the induced cure probabilities and the covariate effects, which are directly linked to the expected latent burden; separate inference on $ρ$ and the intercept is not the primary objective of the proposed framework.
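One way to see how $ρ$ and the intercept act together is to rewrite Equation (25) in logistic form. The rearrangement below is a sketch that assumes $ρ > 1$ and writes $x^{⊤} β = β_{0} + β_{1} x_{1} + \dots + β_{p} x_{p}$; the symbol $β_{0}^{*}$ is introduced here only for illustration.

```latex
\begin{aligned}
p_{0} (x)
  &= \frac{ρ}{ρ + (ρ - 1) \exp (x^{⊤} β)}
   = \frac{1}{1 + \frac{ρ - 1}{ρ} \exp (x^{⊤} β)} \\
  &= \frac{1}{1 + \exp (β_{0}^{*} + β_{1} x_{1} + \dots + β_{p} x_{p})},
  \qquad β_{0}^{*} = β_{0} + \log \frac{ρ - 1}{ρ} .
\end{aligned}
```

Under this reading, the cure probability depends on $ρ$ and $β_{0}$ only through the combination $β_{0}^{*}$, which is consistent with interpreting the model through the induced cure probabilities and the slope coefficients.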

Therefore, the cure fraction depends jointly on the regression structure and the parameter $ρ$. While the regression coefficients determine the covariate-specific variation in the expected number of latent causes, the parameter $ρ$ governs the shape of the latent count distribution and contributes to the induced cure fraction.

Thus, the regression model is formulated directly on the expected number of latent causes. This specification guarantees that $μ (x)$ remains strictly positive and introduces a multiplicative effect of the covariates on the latent count structure. Consequently, the regression coefficients admit a natural interpretation in terms of relative changes in the expected number of latent causes.
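The multiplicative reading follows from the log link in Equation (23). As a short worked illustration, increasing the j-th covariate by one unit while holding the others fixed gives

```latex
\frac{μ (x_{1}, \dots, x_{j} + 1, \dots, x_{p})}{μ (x_{1}, \dots, x_{j}, \dots, x_{p})}
  = \frac{\exp (x^{⊤} β + β_{j})}{\exp (x^{⊤} β)}
  = \exp (β_{j}) .
```

For a hypothetical coefficient $β_{j} = 0.8$, the expected number of latent causes is multiplied by $\exp (0.8) \approx 2.23$.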

Since $a (x)$ is completely determined by $ρ$ and the regression coefficients through Equation (23), inference is performed on the parameter vector

ϑ = (β^{⊤}, ρ, λ, α),

where $x^{⊤} = (1, x_{1}, \dots, x_{p})$ denotes the covariate vector and $β = {(β_{0}, β_{1}, \dots, β_{p})}^{⊤}$ is the corresponding vector of regression parameters.

Under the random activation framework, the covariates affect the observable survival function through their impact on the latent number of causes and the associated cure fraction. Therefore, positive values of a regression coefficient indicate an increase in the expected number of latent causes, whereas negative values indicate a reduction in the latent burden and a corresponding increase in the cure probability.

Finally, we assume that the latent activation times

Z_{1}, \dots, Z_{M}

are independent and follow a Weibull distribution with parameters

λ > 0

and

α > 0

, as defined in Section 3.
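Because the population survival function reduces to the mixture cure form, the log-likelihood for right-censored data can be sketched as below. This is a minimal illustration, assuming the Weibull parametrization S(t) = exp(−λ t^α) that matches the inverse-transform step of Algorithm 1 and the standard mixture cure likelihood; the function name and argument layout are hypothetical, and the authors' estimation code may differ.

```python
import math

def log_likelihood(params, data):
    """Right-censored mixture cure log-likelihood (illustrative sketch).

    params: (rho, lam, alpha, beta), with beta a sequence of coefficients.
    data:   iterable of (t, delta, x), t > 0, x including the leading 1.
    """
    rho, lam, alpha, beta = params
    if rho <= 1 or lam <= 0 or alpha <= 0:
        return -math.inf  # outside the parameter space
    total = 0.0
    for t, delta, x in data:
        mu = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        p0 = rho / (rho + (rho - 1.0) * mu)   # cure probability
        log_surv = -lam * t ** alpha          # log S(t) for susceptibles
        if delta == 1:
            # Weibull log-density: log(lam * alpha) + (alpha - 1) log t + log S(t)
            log_dens = math.log(lam * alpha) + (alpha - 1.0) * math.log(t) + log_surv
            total += math.log1p(-p0) + log_dens
        else:
            # Censored: cured, or susceptible and still event-free at t
            total += math.log(p0 + (1.0 - p0) * math.exp(log_surv))
    return total
```

Maximum likelihood estimates would then be obtained by maximizing this function numerically over ϑ, for instance by handing its negative to a general-purpose optimizer.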

5. Simulation Study

A Monte Carlo simulation study was conducted to evaluate the finite-sample performance of the maximum likelihood estimators under the proposed model. Different parameter configurations were considered to assess the finite-sample performance, accuracy, and stability of the estimators under varying levels of heterogeneity and cure fraction. Data were generated according to the proposed model and a Weibull baseline distribution. The data-generating process is described in Algorithm 1.

Algorithm 1 Generation of survival times and censoring with a cure model

Require: $n, z, β_{0}, β_{1}, ρ, λ, α, t_{\max}$ .
Ensure: A dataset with $(t_{i}, δ_{i}, z_{i})$ for $i = 1, \dots, n$ .
1: for $i = 1$ to $n$ do
2: Compute the cure probability: $p_{i} \leftarrow \frac{ρ}{ρ + (ρ - 1) e^{β_{0} + β_{1} z_{i}}}$
3: Simulate the susceptibility indicator: $M_{i} \sim Bernoulli (1 - p_{i})$
4: if $M_{i} = 1$ then
5: Generate a standard uniform random variable: $U_{i} \sim Uniform (0, 1)$
6: Generate the event time from a Weibull distribution: $t_{i} \leftarrow {(\frac{- \log (U_{i})}{λ})}^{1 / α}$
7: else
8: Assign infinite time (cured): $t_{i} \leftarrow \infty$
9: end if
10: Generate a random censoring time: $C_{i} \sim Uniform (0, t_{\max})$
11: Define the event indicator: $δ_{i} \leftarrow I (t_{i} < C_{i})$
12: Compute the observed time: $t_{i} \leftarrow \min (t_{i}, C_{i})$
13: end for
14: return $(t_{i}, δ_{i}, z_{i})$ for each individual $i$.

This procedure ensures that the simulated data strictly follow the proposed model, preserving both the discrete latent-count structure and the random activation mechanism where, under the uniform selection assumption, each latent cause has an equal probability of triggering the event conditional on

M_{i}

. To evaluate the model’s performance under various conditions, the simulation study pursued two main objectives. First, the parameters were estimated across different censoring proportions to analyze the impact of data loss. Second, a comparative simulation study was conducted to contrast the proposed model against the negative binomial model. The entire study was based on 1000 replicates for each scenario, evaluating three sample sizes (n = 100, 500, and 1000) across three distinct censoring rates to replicate realistic operational conditions. For the negative binomial model, the parameters

α = 2.5

,

λ = 1.5

, and

β_{1} = - 1

remained identical, alongside the same baseline configurations for

β_{0}

, while the only parameter that differs between the two models was set to

ν = 4

. The complete performance metrics of these simulation configurations are summarized in Table 2, while the results regarding the negative binomial simulation are provided in Table 3.
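The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' simulation code: the intercept and slope are named beta0 and beta1, the covariate is taken to be an alternating binary indicator, and the value of t_max is a hypothetical choice, since neither the covariate design nor t_max is specified in the text.

```python
import math
import random

def simulate_cure_data(n, z, beta0, beta1, rho, lam, alpha, t_max, seed=None):
    """Generate (t_i, delta_i, z_i) triples following the steps of Algorithm 1."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        # Cure probability under the log-link (step 2)
        p_i = rho / (rho + (rho - 1.0) * math.exp(beta0 + beta1 * z[i]))
        # Susceptibility indicator, Bernoulli(1 - p_i) (step 3)
        if rng.random() < 1.0 - p_i:
            # Inverse-transform draw from S(t) = exp(-lam * t**alpha) (steps 5-6)
            u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
            latent = (-math.log(u) / lam) ** (1.0 / alpha)
        else:
            latent = math.inf  # cured individual (step 8)
        c = rng.uniform(0.0, t_max)     # censoring time (step 10)
        delta = 1 if latent < c else 0  # event indicator
        data.append((min(latent, c), delta, z[i]))  # observed time
    return data

# Assumed design: alternating binary covariate and t_max = 3.0 (both hypothetical)
z = [i % 2 for i in range(500)]
sample = simulate_cure_data(500, z, beta0=2.0, beta1=-1.0, rho=4.0,
                            lam=1.5, alpha=2.5, t_max=3.0, seed=1)
censoring_rate = 1.0 - sum(delta for _, delta, _ in sample) / len(sample)
print(f"empirical censoring rate: {censoring_rate:.2f}")
```

In a sketch of this kind, the target censoring proportion would be tuned through the intercept and t_max, since cured individuals are always censored and susceptible individuals are censored whenever the uniform censoring time precedes the event.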

As expected, increasing the sample size yields substantial improvements in the properties of the maximum likelihood estimators. Specifically, a consistent reduction in both standard deviations (SD) and mean squared errors (MSE) is observed as n increases from 100 to 1000, underscoring the asymptotic efficiency of the estimators, while empirical biases approach zero, highlighting their consistency. Regarding the censoring effect, the performance of the estimators is highly sensitive to the censoring rate. Under low and moderate censoring regimes (10% and 50%), the estimators exhibit remarkable precision and rapid convergence even for moderate sample sizes. Conversely, under the heavy censoring scenario (80%), the loss of observed information significantly deteriorates the quality of the estimates in smaller samples (

n = 100

), leading to inflated MSE values and pronounced biases, particularly in the structural parameters (

β_{0}, β_{1}

) and the activation probabilities (

p_{0}, p_{1}

). Nevertheless, these deviations are heavily mitigated as the sample size grows to

n = 1000

. Furthermore, the coverage probabilities (CP) of the 95% confidence intervals show adequate calibration across all scenarios, remaining close to the nominal level and stabilizing remarkably around 95% for larger sample sizes (

n \geq 500

).
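The summaries discussed above can be computed from the replicate-level output as in the following sketch. It assumes Wald-type intervals (estimate ± 1.96 × SE) for the coverage probability, which the text does not state explicitly; the function name and the returned dictionary are illustrative.

```python
import math
import statistics

def summarize_replicates(estimates, std_errors, true_value, z_crit=1.96):
    """Monte Carlo summaries for one parameter across replicates.

    estimates, std_errors: per-replicate MLEs and their standard errors.
    Coverage is based on Wald intervals estimate +/- z_crit * SE (assumed).
    """
    mean_est = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)
    bias = mean_est - true_value
    rmse = math.sqrt(statistics.fmean([(e - true_value) ** 2 for e in estimates]))
    covered = sum(
        1 for e, se in zip(estimates, std_errors)
        if e - z_crit * se <= true_value <= e + z_crit * se
    )
    return {"mean": mean_est, "sd": sd, "bias": bias, "rmse": rmse,
            "cp": covered / len(estimates)}

# Toy replicate output for a parameter whose true value is 1.0
summary = summarize_replicates([0.9, 1.1, 1.0, 1.2], [0.1, 0.1, 0.1, 0.1], 1.0)
print(summary)
```

Applying such a function to each parameter and each scenario would produce one block of rows of the kind shown in Table 2.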

Based on these simulation results, a comparative analysis was performed with the negative binomial (NB) cure rate model under identical experimental conditions to evaluate the performance of the proposed Waring framework. This comparison aims to determine whether the structural flexibility of the Waring model offers better finite-sample parameter recovery as the sample size varies. The comparative results under the moderate censoring regime (50%) are detailed in Table 3.

Under the moderate censoring regime, the simulation results in Tables 2 and 3 indicate more favorable estimation properties for the Waring model than for the NB alternative. For small samples (n = 100), the proposed approach recovers the structural parameters with small bias (0.064 for the intercept and −0.054 for the slope), whereas the NB model attenuates the regression slope toward zero (mean estimate −0.748 against a true value of −1). This gap persists as the sample size increases: the Waring estimates converge to the true values, whereas the NB model shows persistent calibration problems in its confidence intervals, with over-coverage in the threshold parameter (0.985 at n = 1000) and under-coverage in the slope (0.901 at n = 1000). These findings support the finite-sample accuracy and flexibility of the proposed Waring cure rate model under the simulation scenarios considered.

To complement these results, the AIC and BIC were calculated for both models in all simulation scenarios. Because both criteria penalize model complexity, they provide a common scale for comparing the two fits. According to the results in Table 4, the Waring model achieves lower mean AIC and BIC values than the NB model for every sample size, with the gap widening from about 11 points at n = 100 to about 178 points at n = 1000. These differences indicate more favorable fit measures for the Waring model under the simulation scenarios considered.
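For reference, the two criteria can be computed from a maximized log-likelihood as in the sketch below. The parameter count k = 5 is an assumption based on the five columns of Tables 2 and 3 (the shape parameter, λ, α, and the two regression coefficients); the function names are illustrative.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k * log(n) - 2 * log-likelihood."""
    return k * math.log(n) - 2.0 * loglik

k = 5  # assumed number of free parameters in each model
for n in (100, 500, 1000):
    # BIC - AIC depends only on k and n, not on the fitted model
    penalty_gap = bic(0.0, k, n) - aic(0.0, k)
    print(n, round(penalty_gap, 2))
```

Under this assumption both models carry the same penalty at a given sample size, so the ranking between them is driven by the maximized log-likelihoods rather than by a difference in complexity.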

The estimated parameter correlation matrix in Table 5 did not reveal evidence of severe practical confounding among the model parameters under the simulated scenarios. In particular,

ρ

exhibited a moderate positive correlation (0.5912) with the intercept

β_{0}

, suggesting that, despite their joint presence in the cure fraction formulation, the two parameters remain distinguishable under the evaluated design, with stable numerical estimation and no evidence of optimization instability across the Monte Carlo replications.

6. Application for Melanoma Data

We present a practical application using data from patients diagnosed with melanoma in the state of São Paulo, Brazil, with the aim of evaluating the performance of the proposed model under different activation schemes. The data were provided by the Fundação Oncocentro de São Paulo (FOSP), a public institution affiliated with the State Department of Health and responsible for coordinating the Hospital Cancer Registry in the state of São Paulo. The study includes patients diagnosed between 2000 and 2014, with follow-up conducted until 2018. The event of interest was death attributed exclusively to melanoma. After excluding 593 patients due to missing information on observed covariates, the final sample consisted of 6741 patients, of whom 71.67% were censored, that is, they did not experience the event of interest during the study period. This dataset was also analyzed in greater depth by Molina et al. [10].

In Table 6, a detailed description of the covariates analyzed in this study is presented.

For the melanoma application, we adopt the modeling framework developed in Section 3. In particular, the analysis is conducted under the uniform random activation mechanism and the assumption that latent activation times are conditionally independent and identically distributed. Within this framework, the latent causes may be interpreted as unobserved residual malignant cells or microscopic metastatic foci associated with disease progression. Because the specific latent cause ultimately responsible for melanoma-related death cannot be identified from the available registry data, the uniform random activation mechanism is adopted as a parsimonious and biologically neutral representation that does not privilege any particular latent cause. Likewise, the assumption of conditional independence provides a tractable framework for modeling latent competing causes and is commonly adopted in cure-rate and competing-risk models. Although these latent processes are not directly observable, the proposed model provides a probabilistic representation of unobserved heterogeneity and long-term survival, two important characteristics commonly encountered in melanoma studies.

For each patient i, the observed survival time

t_{i}

was defined as the time from melanoma diagnosis until death due exclusively to melanoma or the end of follow-up. The event indicator was defined as

δ_{i} = 1

when death due to melanoma was observed and

δ_{i} = 0

otherwise. Patients who remained alive at the end of the study period or who experienced a non-melanoma-related death were treated as right-censored observations.

Adjustment of Models with the Presence of Covariates

In this study, we begin by analyzing the isolated influence of each covariate on the time to death of melanoma patients. The regression parameter was incorporated into parameter a through the link function in Equation (23). The results for the proposed model are presented in Table 7.

The results presented in Table 7 indicate that gender, clinical stage, radiotherapy, and chemotherapy are important covariates associated with the cure probability in the model. Across both model specifications, the parameter estimates are consistent, suggesting stable effects of these covariates.

In particular, females and patients in clinical stage I show higher estimated cure probabilities compared with males and patients in stage II, respectively. This reflects the direction of the estimated effects within the model, rather than causal relationships.

Regarding treatment variables, patients who did not receive radiotherapy or chemotherapy exhibit higher estimated cure probabilities than those who received these treatments. This suggests a negative association between treatment indicators and the cure probability, conditional on the model specification and covariates included.

Finally, the consistency of parameter estimates across model specifications reinforces the robustness of the observed associations between gender, clinical stage, and treatment variables with the outcome.

Next, Figure 2 illustrates the behavior of the estimated survival functions across the different covariate groups. The fitted survival curves (represented by dashed lines) closely follow the corresponding empirical Kaplan–Meier estimates (represented by solid lines), particularly during the periods of highest event density in the early years of follow-up. Although the empirical curves exhibit a marked stabilization in the later stages of follow-up, reflecting the presence of long-term survivors, the proposed model successfully captures the overall survival pattern observed in the melanoma data. The close agreement between the fitted and empirical curves supports the adequacy of the Weibull baseline specification for the latent activation times and indicates that the proposed framework provides a parsimonious and reliable representation of the observed survival experience.
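A comparison of this kind can be sketched with a product-limit (Kaplan–Meier) estimator and the model-based population survival function. The code below is an illustration only: it assumes the mixture form p0 + (1 − p0) exp(−λ t^α) for the fitted curve, uses toy data rather than the registry data, and the function names are hypothetical.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv, curve, i = 1.0, [], 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Group all observations tied at time t
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def fitted_survival(t, p0, lam, alpha):
    """Population survival p0 + (1 - p0) * exp(-lam * t**alpha) (assumed form)."""
    return p0 + (1.0 - p0) * math.exp(-lam * t ** alpha)

# Toy data: times in years, 1 = event observed, 0 = censored
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 0])
# Parameter values taken from the male group of the gender-only fit in Table 7
model_curve = [(t, fitted_survival(t, 0.525, 0.283, 0.982)) for t, _ in km_curve]
```

As t grows, the fitted curve levels off at the cure probability p0, which is the plateau that the empirical curves display in the later years of follow-up.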

Table 8 presents the variance decomposition of the Waring model according to the four covariates under study: gender, clinical stage, and receipt of radiotherapy and chemotherapy.

Under the uniform random activation framework, the proposed Waring variance decomposition reflects the assumed latent-count parametrization; therefore, these components should be interpreted as model-implied summaries rather than separately identified sources of empirical heterogeneity. Based on these model-implied metrics, the internal effect represents the largest proportion of variability across most subgroups, especially for patients in clinical stage II and those receiving radiotherapy or chemotherapy. This suggests that, under the model’s structure, unobserved intrinsic factors play a predominant role in risk heterogeneity under more advanced clinical conditions and intensive therapeutic interventions. Conversely, the random and external effects generally contribute less to the total model-implied variance. However, the random component gains relative relevance in men and in patients who did not receive radiotherapy or chemotherapy, indicating a higher share of unexplained variability within the parameterization in the absence of therapeutic interventions. The external effect exhibits a similar pattern, capturing a higher relative contribution in men and untreated patients compared with those undergoing active therapy. Finally, in women, the model-implied variability is more evenly distributed among the components, although the internal effect remains the primary structural contributor.
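The decomposition can be reproduced from the formulas in Table 1, as in the following sketch. It assumes that the subgroup mean is obtained through the log-link, μ = exp(x⊤β), and uses the gender-only estimates from Table 7 as illustrative inputs; the function name is hypothetical.

```python
import math

def waring_variance_decomposition(rho, mu):
    """Variance components and variance rates following Table 1."""
    if rho <= 2:
        raise ValueError("rho must exceed 2 for a finite variance")
    components = {
        "random": mu,
        "external": 2.0 / (rho - 2.0) * mu,
        "internal": rho / (rho - 2.0) * mu ** 2,
    }
    total = sum(components.values())
    rates = {name: value / total for name, value in components.items()}
    return components, rates

# Male subgroup of the gender-only fit: rho = 6.397, intercept = 0.070
rho, beta0 = 6.397, 0.070
components, rates = waring_variance_decomposition(rho, math.exp(beta0))
for name in components:
    print(name, round(components[name], 2), round(rates[name], 2))
```

For these inputs the components are approximately 1.07, 0.49, and 1.67, with variance rates of about 0.33, 0.15, and 0.52, which agrees with the male column of Table 8. The internal rate equals μ / (1 + μ) and therefore grows with the expected latent burden, which is why it dominates in the subgroups with low cure probabilities.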

Considering the simultaneous inclusion of all covariates, we present the estimates of the corresponding parameters in Table 9.

The confidence intervals associated with

ρ

,

λ

, and

α

indicate satisfactory estimation precision within their strictly positive parameter spaces. In particular, the estimate of

α

is close to 1 (

α \approx 1

), suggesting an approximately exponential susceptible survival distribution.

Regarding the regression coefficients, all covariates have confidence intervals that exclude zero, indicating statistically significant effects after adjusting for the remaining variables in the model. Since the covariates are linked to the mean number of latent competing causes through a log-link function, positive coefficients indicate an increase in the expected number of latent causes, which is associated with a lower cure probability and, consequently, a higher risk of death due to melanoma.

The coefficient associated with age is positive, suggesting that older patients tend to have a higher risk of death from melanoma. The coefficient for gender is negative, indicating that female patients have a lower risk of death compared with male patients. Clinical stage has the largest positive coefficient, showing that patients diagnosed at Stage II have a substantially higher risk of death than those diagnosed at Stage I.

Similarly, radiotherapy and chemotherapy are associated with higher risks of death. This result should be interpreted with caution, since these treatments are typically administered to patients with more severe disease, and therefore the observed association may reflect underlying disease severity rather than a detrimental treatment effect.

Figure 3 presents the behavior of the sources of variability considering all covariates in the model.

Regarding the overall sample, the random component represents approximately 41.9% of the total model-implied variability, capturing a substantial portion of unexplained heterogeneity within the parameterization, likely driven by individual characteristics not captured by the covariates. The internal effect constitutes about 39.7% of this model-implied variance, reflecting how the assumed structure formalizes the role of intrinsic patient factors. Finally, the external effect contributes around 18.4%; although it stands as the smallest component, it remains a relevant factor in the model’s distribution of overall variability. Under this parametric framework, most of the risk heterogeneity is structurally driven by the combination of random and internal effects, while the model assigns a smaller but still meaningful role to environmental or external factors.

Table 10 presents the influence of the covariates age, gender, clinical stage, radiotherapy and chemotherapy on the cure rate (

p_{0}

) and the rate of variation of the internal effect (RCIE).

The columns related to the cure rate (

p_{0}

) and the internal effect (RCIE) show clear and consistent patterns across all covariate combinations. In general, patients with clinical stage II undergoing both radiotherapy and chemotherapy present lower cure rates and higher RCIE values, indicating a stronger internal dynamic component associated with more advanced disease and combined treatments.

This behavior can be illustrated, for instance, by comparing a 30-year-old male patient in stage I who has not received any treatment, who presents a high probability of cure and a low RCIE value (

p_{0} = 0.888

, RCIE = 0.130), versus a patient with the same characteristics who has undergone both radiotherapy and chemotherapy, where the cure rate decreases substantially and the internal effect increases (e.g.,

p_{0} = 0.290

, RCIE = 0.743). A similar pattern is observed across all age and gender groups.

Another relevant finding is that, in almost all comparable scenarios, female patients tend to present slightly higher cure rates and lower RCIE values than male patients, suggesting a marginally better prognosis under identical clinical conditions. For example, a 60-year-old female in stage I without treatment shows a higher cure rate (

p_{0} = 0.877

) and lower RCIE (0.142) compared with a male with the same characteristics (

p_{0} = 0.806

, RCIE = 0.222).

Finally, a consistent inverse relationship is observed between the cure rate and RCIE: as the probability of cure decreases, particularly in older patients, advanced stage II cases, and those receiving combined treatments, the internal effect increases, reflecting a stronger internal variability component associated with more severe clinical conditions.

7. Final Remarks

In this paper, we introduce a random activation framework for cure rate models where the number of latent causes follows a Waring distribution. From a modeling perspective, the use of the Waring distribution provides a flexible formulation for the latent number of competing causes, accommodating overdispersion and heavy-tailed count behavior while allowing an interpretable decomposition of variability. Under the assumptions adopted in this work, these characteristics operate at the latent level and influence the observable survival function exclusively through the induced cure fraction. Thus, our contribution is framed as a novel latent mechanistic interpretation of the standard mixture cure model rather than a new observable survival law.

The use of the Waring distribution introduces additional flexibility by allowing a decomposition of variability into distinct components associated with internal, external, and random sources of heterogeneity. This feature makes the model especially suitable for applications in which the underlying risk structure is complex and partially unobserved. From an inferential perspective, the model admits a tractable likelihood formulation and can be efficiently estimated using maximum likelihood methods while naturally accommodating right-censored data. The inclusion of covariates further enhances its applicability by enabling the assessment of how explanatory variables influence both the cure fraction and the latent risk structure.

Overall, the proposed framework extends existing cure rate models by providing a unified formulation that combines discrete latent-effect structures and stochastic activation mechanisms. This perspective offers a meaningful alternative to traditional competing risks and mixture-based formulations, particularly in settings where interpretability of the cure mechanism and heterogeneity structure is of primary interest.

Author Contributions

Conceptualization, J.K.J.V. and V.T.; Methodology, J.K.J.V. and V.T.; Software, J.K.J.V., P.R.D.M., D.A. and J.M.-M.; Validation, J.K.J.V., V.T., D.A., P.R.D.M. and J.M.-M.; Formal analysis, J.K.J.V., V.T., D.A., P.R.D.M. and J.M.-M.; Investigation, J.K.J.V., V.T., D.A., P.R.D.M. and J.M.-M.; Data curation, J.K.J.V., V.T., D.A., P.R.D.M. and J.M.-M.; Writing original draft preparation, J.K.J.V., V.T., D.A., P.R.D.M. and J.M.-M.; Writing review and editing, J.K.J.V., V.T., D.A., P.R.D.M. and J.M.-M.; Supervision, J.K.J.V., V.T., D.A., P.R.D.M. and J.M.-M.; Project administration, J.K.J.V., V.T., D.A., P.R.D.M. and J.M.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Coordination for the Improvement of Higher Education Personnel (CAPES).

Informed Consent Statement

The data used in this study were obtained from a previously published study (Molina et al. [10]). The dataset is fully anonymized and contains no identifiable personal information. All ethical procedures were conducted in the original study. Therefore, no additional informed consent was required for the present analysis.

Data Availability Statement

The data used in this study were obtained from a previously published study (Molina et al. [10]) and were provided by the Oncocentro Foundation of São Paulo (FOSP). Data are available from the authors upon reasonable request and with permission of the data provider.

Acknowledgments

The authors are very grateful to the Oncocentro Foundation of São Paulo (FOSP) for providing the melanoma dataset.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Vaupel, J.W.; Manton, K.G.; Stallard, E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 1979, 16, 439–454.
2. Balakrishnan, N.; Peng, Y. The generalized Gamma frailty model. Stat. Med. 2006, 25, 2797–2816.
3. Caroni, C.; Crowder, M.; Kimber, A. Proportional hazards models with discrete frailty. Lifetime Data Anal. 2010, 16, 374–384.
4. Chen, P.; Zhang, J.; Zhang, R. Estimation of the accelerated failure time frailty model under generalized Gamma frailty. Comput. Stat. Data Anal. 2013, 62, 171–180.
5. Hougaard, P. Life table methods for heterogeneous populations: Distributions describing the heterogeneity. Biometrika 1984, 71, 75–83.
6. Tomazella, V. Modelagem de Dados de Eventos Recorrentes via Processo de Poisson com Termo de Fragilidade. Ph.D. Thesis, Universidade de São Paulo, São Paulo, Brazil, 2003.
7. Vasquez, J.K.J.; Molina, K.C.; Tomazella, V.; Diniz, C.A.; Suzuki, A.K. Multistate models with nested frailty for lifetime analysis: Application to bone marrow transplantation recovery patients. Commun. Stat.-Theory Methods 2025, 54, 418–436.
8. Cancho, V.G.; Macera, M.A.; Suzuki, A.K.; Louzada, F.; Zavaleta, K.E. A new long-term survival model with dispersion induced by discrete frailty. Lifetime Data Anal. 2020, 26, 221–244.
9. de Souza, D.; Cancho, V.G.; Rodrigues, J.; Balakrishnan, N. Bayesian cure rate models induced by frailty in survival analysis. Stat. Methods Med. Res. 2017, 26, 2011–2028.
10. Molina, K.C.; Calsavara, V.F.; Tomazella, V.; Milani, E.A. Survival models induced by zero-modified power series discrete frailty: Application with a melanoma data set. Stat. Methods Med. Res. 2021, 30, 1874–1889.
11. Mota, A.; Milani, E.A.; Calsavara, V.F.; Tomazella, V.; Leão, J.; Ramos, P.L.; Ferreira, P.H.; Louzada, F. Weighted Lindley frailty model: Estimation and application to lung cancer data. Lifetime Data Anal. 2021, 27, 561–587.
12. Rodrigues, J.; Castro, M.; Cancho, V.G.; Balakrishnan, N. COM-Poisson cure rate survival models and an application to a cutaneous melanoma data. J. Stat. Plan. Inference 2009, 139, 3605–3611.
13. Vasquez, J.K.J.; Rodrigues, J.; Balakrishnan, N. A useful variance decomposition for destructive Waring regression cure model with an application to HIV data. Commun. Stat.-Theory Methods 2022, 51, 6978–6989.
14. Bates, G.E.; Neyman, J. Contributions to the Theory of Accident Proneness. I. An Optimistic Model of the Correlation Between Light and Severe Accidents; Technical Report; University of California: Berkeley, CA, USA, 1952.
15. Irwin, J.O. The generalized Waring distribution. Part I. J. R. Stat. Soc. Ser. A 1975, 138, 18–31.
16. Tang, Y.; Wang, J.; Zhu, Z. On the MLE of the Waring distribution. Stat. Theory Relat. Fields 2023, 7, 144–158.
17. Minayo, M.C.d.S. Saúde-doença: Uma concepção popular da etiologia. Cad. Saúde Pública 1988, 4, 363–381.
18. Vasquez, J.K.J.; Tomazella, V.; Marinho, P.R.D. Decomposing heterogeneity in cure rate models via discrete Waring frailty under minimum activation. Biom. Biostat. Int. J. 2026, 15, 1–11.
19. Cancho, V.G.; Louzada-Neto, F.; Barriga, G.D.C. The Poisson-Exponential lifetime distribution. Comput. Stat. Data Anal. 2011, 55, 677–686.
20. Goetghebeur, E.; Ryan, L. A modified log rank test for competing risks with missing failure type. Biometrika 1990, 77, 207–211.
21. Cancho, V.G.; Bandyopadhyay, D.; Louzada, F.; Yiqi, B. The destructive negative binomial cure rate model with a latent activation scheme. Stat. Methodol. 2013, 13, 48–68.
22. Cooner, F.; Banerjee, S.; Carlin, B.P.; Sinha, D. Flexible cure rate modeling under latent activation schemes. J. Am. Stat. Assoc. 2007, 102, 560–572.
23. Cooner, F.; Banerjee, S.; McBean, A.M. Modelling geographically referenced survival data with a cure fraction. Stat. Methods Med. Res. 2006, 15, 307–324.
24. Yule, G.U. An introduction to the theory of statistics. Bull. Am. Math. Soc. 1924, 30, 465–466.
25. Rodríguez-Avi, J.; Conde-Sánchez, A.; Sáez-Castillo, A.; Olmo-Jiménez, M. A new generalization of the Waring distribution. Comput. Stat. Data Anal. 2007, 51, 6138–6150.
26. Irwin, J.O. The generalized Waring distribution applied to accident theory. J. R. Stat. Soc. Ser. A 1968, 131, 205–225.

Figure 1. Probability mass function of the Waring distribution for (a) different values of parameter a with fixed

ρ

, and (b) for different values of

ρ

with fixed a.

Figure 2. Kaplan–Meier estimation and Waring model survival curves for the melanoma dataset according to gender, clinical stage, radiotherapy, and chemotherapy.

Figure 3. Sources of variability influenced by all covariates in the Waring model for the melanoma data set.

Table 1. Variance decomposition of the Waring distribution.

Source of Variability	Variance	Variance Rate (VR)
Random effect	$μ$	$\frac{ρ - 2}{ρ} \frac{1}{1 + μ}$
External effect	$\frac{2}{(ρ - 2)} μ$	$\frac{2}{ρ} \frac{1}{1 + μ}$
Internal effect	$\frac{ρ}{(ρ - 2)} μ^{2}$	$\frac{μ}{1 + μ}$
Total	$\frac{ρ}{ρ - 2} (μ + μ^{2})$	1

Table 2. Simulation results for the proposed Waring cure rate model under different censoring levels: mean MLEs, standard deviations (SD), bias, mean squared errors (MSE), and coverage probabilities (CP).

Censoring	n	Parameter	$ρ$	$λ$	$α$	$β_{0}$	$β_{1}$
10%	100	Mean	3.398	1.500	2.533	5.630	−2.169
		SD	0.784	0.067	0.202	3.593	4.193
		Bias	−0.602	0.000	0.033	1.130	−1.169
		MSE	0.988	0.067	0.205	3.767	4.353
		CP ( $95 %$ )	0.896	0.958	0.945	0.896	0.890
	500	Mean	3.795	1.500	2.506	4.506	−1.038
		SD	0.060	0.030	0.089	0.334	0.408
		Bias	−0.205	0.000	0.006	0.006	−0.038
		MSE	0.214	0.030	0.089	0.334	0.410
		CP ( $95 %$ )	0.945	0.950	0.953	0.943	0.941
	1000	Mean	3.960	1.501	2.503	4.474	−1.015
		SD	0.038	0.020	0.066	0.216	0.267
		Bias	−0.040	0.001	0.003	−0.026	−0.015
		MSE	0.184	0.020	0.066	0.217	0.267
		CP ( $95 %$ )	0.950	0.956	0.958	0.951	0.955
50%	100	Mean	3.984	1.527	2.538	2.064	−1.054
		SD	0.119	0.135	0.357	0.465	0.543
		Bias	−0.016	0.027	0.038	0.064	−0.054
		MSE	0.120	0.138	0.359	0.470	0.546
		CP ( $95 %$ )	0.983	0.952	0.953	0.983	0.977
	500	Mean	3.999	1.501	2.511	2.005	−1.008
		SD	0.033	0.047	0.141	0.136	0.191
		Bias	−0.001	0.001	0.011	0.005	−0.008
		MSE	0.033	0.047	0.142	0.136	0.191
		CP ( $95 %$ )	0.943	0.949	0.947	0.941	0.953
	1000	Mean	4.001	1.499	2.504	2.002	−1.006
		SD	0.021	0.033	0.106	0.091	0.131
		Bias	0.001	−0.001	0.004	0.002	−0.006
		MSE	0.021	0.033	0.106	0.091	0.131
		CP ( $95 %$ )	0.951	0.948	0.953	0.947	0.950
80%	100	Mean	3.567	1.754	2.736	3.701	−3.468
		SD	1.151	0.696	0.768	4.573	4.675
		Bias	−0.433	0.254	0.236	2.701	−2.468
		MSE	1.230	0.741	0.804	5.311	5.287
		CP ( $95 %$ )	0.938	0.937	0.944	0.976	0.983
	500	Mean	3.901	1.693	2.517	2.383	−2.202
		SD	0.797	0.480	0.334	3.177	3.006
		Bias	−0.099	0.193	0.017	1.383	−1.202
		MSE	0.803	0.518	0.335	3.465	3.237
		CP ( $95 %$ )	0.941	0.939	0.949	0.967	0.979
	1000	Mean	4.074	1.630	2.503	1.698	−1.541
		SD	0.526	0.370	0.256	2.108	1.953
		Bias	0.074	0.130	0.003	0.698	−0.541
		MSE	0.531	0.392	0.256	2.220	2.027
		CP ( $95 %$ )	0.945	0.942	0.951	0.954	0.944

Table 3. Simulation results for the negative binomial cure rate model under a 50% censoring rate: mean MLEs, standard deviations (SD), bias, root mean squared errors (RMSE), and coverage probabilities (CP).

n	Metric	$ν$	$λ$	$α$	$β_{0}$	$β_{1}$
100	Mean	4.045	1.483	2.595	2.365	−0.748
	SD	0.074	0.092	0.291	0.591	0.694
	Bias	0.045	−0.017	0.095	0.365	0.252
	RMSE	0.086	0.093	0.306	0.695	0.738
	CP ( $95 %$ )	0.998	0.920	0.954	0.996	0.963
500	Mean	4.026	1.498	2.516	2.209	−0.852
	SD	0.056	0.037	0.116	0.450	0.506
	Bias	0.026	−0.002	0.016	0.209	0.148
	RMSE	0.062	0.038	0.117	0.496	0.527
	CP ( $95 %$ )	0.990	0.953	0.956	0.989	0.935
1000	Mean	4.012	1.497	2.511	2.095	−0.854
	SD	0.051	0.025	0.078	0.413	0.403
	Bias	0.012	−0.003	0.011	0.095	0.146
	RMSE	0.053	0.026	0.079	0.424	0.429
	CP ( $95 %$ )	0.985	0.955	0.957	0.985	0.901

Table 4. Mean of AIC and BIC values for Waring and NB models by sample size.

Criterion	Waring, $n = 100$	Waring, $n = 500$	Waring, $n = 1000$	NB, $n = 100$	NB, $n = 500$	NB, $n = 1000$
AIC	215.29	1049.13	2092.94	226.49	1131.36	2270.67
BIC	228.31	1070.20	2116.94	239.52	1152.44	2295.21

Table 5. Estimated parameter correlation matrix.

	$ρ$	$λ$	$α$	$β_{0}$	$β_{1}$
$ρ$	1
$λ$	0.4137	1
$α$	−0.0812	−0.1242	1
$β_{0}$	0.5912	0.5836	−0.0934	1
$β_{1}$	−0.0140	−0.1345	0.0210	−0.4568	1

Table 6. Description of the covariates in the melanoma data set.

Covariate	Category	Description	n	%
$X_{1}$: Age	-	$μ = 58.11$, $σ = 16.26$	6741	-
$X_{2}$: Gender	0	Male	3411	50.60
	1	Female	3330	49.40
$X_{3}$: Clinical stage	0	Stage I	4546	67.44
	1	Stage II	2195	32.56
$X_{4}$: Radiotherapy	0	Did not receive	6154	91.29
	1	Received	587	8.71
$X_{5}$: Chemotherapy	0	Did not receive	5638	83.64
	1	Received	1103	16.36

Table 7. Maximum likelihood estimation (MLE), standard error (SE), and 95% confidence interval (95% CI) obtained for the model according to the covariates gender, clinical stage, radiotherapy, and chemotherapy.

Parameter	MLE	SE	95% CI Lower	95% CI Upper	Parameter	MLE	SE	95% CI Lower	95% CI Upper
$ρ$	6.397	0.126	6.149	6.645	$ρ$	4.981	0.002	4.977	4.985
$λ$	0.283	0.011	0.261	0.306	$λ$	0.293	0.011	0.271	0.316
$α$	0.982	0.021	0.940	1.024	$α$	0.986	0.021	0.944	1.028
$β_{01}$ (Intercept)	0.070	0.055	−0.038	0.178	$β_{03}$ (Intercept)	−0.423	0.040	−0.502	−0.345
$β_{11}$ (Gender)	−0.609	0.066	−0.739	−0.478	$β_{13}$ (Radiotherapy)	2.579	0.198	2.191	2.968
$p_{01}$ (Male)	0.525	0.007	0.512	0.538	$p_{03}$ (No)	0.656	0.015	0.627	0.686
$p_{11}$ (Female)	0.670	0.004	0.662	0.678	$p_{13}$ (Yes)	0.127	0.045	0.038	0.215
$ρ$	6.312	0.130	6.057	6.567	$ρ$	6.000	0.001	5.997	6.002
$λ$	0.310	0.011	0.288	0.333	$λ$	0.304	0.011	0.282	0.327
$α$	0.999	0.021	0.957	1.040	$α$	0.995	0.021	0.954	1.037
$β_{02}$ (Intercept)	−1.375	0.048	−1.470	−1.280	$β_{04}$ (Intercept)	−0.725	0.040	−0.804	−0.646
$β_{12}$ (Clinical stage)	2.831	0.086	2.664	2.999	$β_{14}$ (Chemotherapy)	2.571	0.123	2.331	2.811
$p_{02}$ (Stage I)	0.824	0.002	0.821	0.828	$p_{04}$ (No)	0.712	0.003	0.706	0.718
$p_{12}$ (Stage II)	0.217	0.018	0.182	0.251	$p_{14}$ (Yes)	0.159	0.030	0.100	0.218

Table 8. Variance decomposition of the Waring model according to gender, clinical stage, radiotherapy, and chemotherapy.

Source of Variability	Gender				Clinical Stage
	Male		Female		I		II
	Variance	VR	Variance	VR	Variance	VR	Variance	VR
Random effect	1.07	0.33	0.58	0.43	0.25	0.55	4.29	0.13
External effect	0.49	0.15	0.26	0.20	0.12	0.25	1.99	0.06
Internal effect	1.67	0.52	0.50	0.37	0.09	0.20	26.96	0.81
	Radiotherapy				Chemotherapy
	No		Yes		No		Yes
	Variance	VR	Variance	VR	Variance	VR	Variance	VR
Random effect	0.65	0.36	8.64	0.06	0.48	0.45	6.33	0.09
External effect	0.44	0.24	5.79	0.04	0.25	0.23	3.17	0.04
Internal effect	0.72	0.40	124.64	0.90	0.35	0.32	60.19	0.87
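
The entries of Table 8 can be reproduced, up to rounding, from the estimates in Table 7. The sketch below is an illustration and not the authors' implementation. It assumes that the latent count has mean $μ = \exp(β_{0} + β_{1}x)$ and that, for $ρ > 2$, its variance splits into a random part $μ$, an external part $2μ/(ρ − 2)$, and an internal part $μ^{2}ρ/(ρ − 2)$. The mapping of these three terms to the row labels of Table 8 was inferred by matching the tabulated values, and the function name is hypothetical.

```python
import math


def waring_variance_components(mu, rho):
    """Split the variance of a Waring count with mean mu and parameter rho.

    Returns {component: (variance, variance ratio)}.
    The formulas are assumptions checked numerically against Table 8.
    """
    if rho <= 2:
        raise ValueError("the variance is finite only for rho > 2")
    parts = {
        "random": mu,
        "external": 2.0 * mu / (rho - 2.0),
        "internal": mu ** 2 * rho / (rho - 2.0),
    }
    total = sum(parts.values())
    return {name: (value, value / total) for name, value in parts.items()}


# Male group in the gender model of Table 7: rho = 6.397, beta_01 = 0.070.
for name, (variance, ratio) in waring_variance_components(math.exp(0.070), 6.397).items():
    print(name, round(variance, 2), round(ratio, 2))
```

Because the inputs are rounded to three decimals, the recomputed entries may differ from the tabulated ones in the last reported digit.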

Table 9. Maximum likelihood estimation (MLE), standard error (SE), and 95% confidence interval (95% CI) obtained for the model considering all covariates.

Parameter	MLE	SE	95% CI Lower	95% CI Upper
$ρ$	6.543	0.304	5.946	7.139
$λ$	0.311	0.011	0.289	0.333
$α$	0.999	0.020	0.959	1.040
$β_{0}$ (Intercept)	−2.546	0.164	−2.868	−2.223
$β_{1}$ (Age)	0.021	0.002	0.017	0.026
$β_{2}$ (Gender)	−0.545	0.058	−0.659	−0.432
$β_{3}$ (Clinical stage)	2.322	0.088	2.150	2.494
$β_{4}$ (Radiotherapy)	1.391	0.215	0.970	1.813
$β_{5}$ (Chemotherapy)	1.569	0.131	1.312	1.826
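
The interval limits in Table 9 are consistent with Wald-type intervals of the form estimate ± $z_{0.975}$ × SE. The sketch below shows that calculation under this assumption; the function name `wald_ci` is hypothetical.

```python
from statistics import NormalDist


def wald_ci(estimate, se, level=0.95):
    """Wald-type confidence interval: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    return estimate - z * se, estimate + z * se


# rho in Table 9: MLE = 6.543, SE = 0.304.
lower, upper = wald_ci(6.543, 0.304)
print(round(lower, 3), round(upper, 3))
```

Small differences in the third decimal relative to the table are to be expected, since the tabulated MLE and SE are themselves rounded.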

Table 10. $p_{0}$ and RCIE of some patient profiles considering the model fitted with all covariates.

Age	Gender	Stage	Radiotherapy	Chemotherapy	$p_{0}$	RCIE
30	Male	I	No	No	0.888	0.130
			No	Yes	0.622	0.418
			Yes	No	0.663	0.375
			Yes	Yes	0.290	0.743
		II	No	No	0.436	0.604
			No	Yes	0.139	0.880
			Yes	No	0.161	0.860
			Yes	Yes	0.039	0.967
30	Female	I	No	No	0.931	0.080
			No	Yes	0.739	0.294
			Yes	No	0.772	0.258
			Yes	Yes	0.414	0.626
		II	No	No	0.572	0.469
			No	Yes	0.218	0.809
			Yes	No	0.249	0.780
			Yes	Yes	0.065	0.945
60	Male	I	No	No	0.806	0.222
			No	Yes	0.463	0.578
			Yes	No	0.507	0.534
			Yes	Yes	0.177	0.846
		II	No	No	0.289	0.744
			No	Yes	0.078	0.933
			Yes	No	0.092	0.921
			Yes	Yes	0.011	0.982
60	Female	I	No	No	0.877	0.142
			No	Yes	0.598	0.442
			Yes	No	0.640	0.399
			Yes	Yes	0.270	0.761
		II	No	No	0.412	0.628
			No	Yes	0.127	0.890
			Yes	No	0.148	0.871
			Yes	Yes	0.035	0.970
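
The $p_{0}$ column of Table 10 can be approximated from the estimates in Table 9. The sketch below is illustrative only. It assumes a log link for the mean of the latent count, $μ = \exp(x^{\top}β)$, and a cure probability $p_{0} = ρ/\{ρ + (ρ − 1)μ\}$, with the 0/1 coding of Table 6 (0 = male, Stage I, treatment not received). The function and argument names are hypothetical, and RCIE is not computed here.

```python
import math

# Point estimates copied from Table 9 (rounded to three decimals).
RHO = 6.543
BETA = {
    "intercept": -2.546,
    "age": 0.021,
    "female": -0.545,
    "stage_ii": 2.322,
    "radiotherapy": 1.391,
    "chemotherapy": 1.569,
}


def cure_probability(age, female=0, stage_ii=0, radiotherapy=0, chemotherapy=0,
                     rho=RHO, beta=BETA):
    """Probability of zero latent causes under the assumed parametrisation."""
    eta = (beta["intercept"]
           + beta["age"] * age
           + beta["female"] * female
           + beta["stage_ii"] * stage_ii
           + beta["radiotherapy"] * radiotherapy
           + beta["chemotherapy"] * chemotherapy)
    mu = math.exp(eta)  # assumed mean of the latent count
    return rho / (rho + (rho - 1.0) * mu)


# First profile of each gender at age 30 (Stage I, no radiotherapy, no chemotherapy).
print(round(cure_probability(30), 3))
print(round(cure_probability(30, female=1), 3))
```

For the profiles checked here, the recomputed values fall within about 0.005 of the tabulated ones, a gap that is plausibly due to rounding of the coefficients.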
