Residual Analysis for Poisson-Exponentiated Weibull Regression Models with Cure Fraction

: The use of cure-rate survival models has grown in recent years. Even so, proposals to perform the goodness of fit of these models have not been so frequent. However, residual analysis can be used to check the adequacy of a fitted regression model. In this context, we provide Cox–Snell residuals for Poisson-exponentiated Weibull regression with cure fraction. We developed several simulations under different scenarios for studying the distributions of these residuals. They were applied to a melanoma dataset for illustrative purposes.


Introduction
The application of survival analysis models has expanded substantially in many areas in recent years, such as reliability analysis, operational research, oncology, and tourism, among many other areas.A particular feature of the usual survival models is the assumption that as the measure of the time until the observation of the event of interest approaches infinity, all the observations will involve the occurrence of that event.
However, in many cases, there are individuals (observations) that will not experience the event of interest, where the observed time is considered to be infinity.These models are called survival models with cure fractions.They were first proposed in [1][2][3], which described a population as a mixture of individuals susceptible and other individuals immune to the event under study, assuming that at most one cause contributes to the time until the occurrence of the event of interest for the susceptible individuals, while the immune individuals do not present causes contributing to the occurrence of the event modeled based on a Bernoulli distribution, commonly called a mixture model.
Thereafter, other proposals emerged for incorporating the immune individuals into the event in question in the models.For example, those formulated in [4] stand out, which were based on a biological context for studying the progression until the time of occurrence of the event of interest.In these cases, it was assumed that the number of competing causes to determine the time until the occurrence of the event follows the Poisson distribution, which became known as the promotion time model.
Parametric models for survival analysis stand out, among other reasons, since the Weibull distribution can be applied in many areas.There are also many generalizations of the Weibull distribution, but the exponentiated Weibull (EW) [5] has been the most popular one, as discussed in [6].So, the adoption of this distribution in survival analysis has been indicated as a solution for modeling various problems in recent years [6][7][8][9].
Many papers have been published proposing cure fraction models.For example, ref. [10] introduced a unified approach for cure rate models, ref. [11] defined a destructive weighted Poisson cure rate model, ref. [12] described a geometric Birnbaum-Saunders regression model with cure rate, ref. [13] addressed the log-beta Weibull regression model with application to the recurrence of prostate cancer, ref. [14] studied a power series beta Weibull regression model for predicting breast carcinoma, ref. [15] discussed the estimation of nonlinear effects in the presence of cure fraction using a semi-parametric regression model, and ref. [16] presented an extension of the destructive Poisson odd log-logistic generalized half-normal cure rate model.More recently, refs.[17][18][19] introduced various regressions for the cure fraction under different distributions, both for the failure time and the latent variable that counts the number of cancer cells.However, none of these studies analyzed the residuals of these models for the cure fraction.For common survival models, one can use the Cox-Snell residuals, deviance, and Schoenfeld residuals, among others.
Among other important works that model cure fractions, we can cite the following.Ref. [20] introduced the estimation into the Cox proportional hazards cure model, ref. [21] presented the algorithm-based likelihood estimation for some cure rate models.Ref. [22] studied the log-normal lifetimes and likelihood-based inference for flexible cure rate models based on the COM-Poisson family, ref. [23] introduced the estimation of parameters of a flexible cure rate model with a generalized gamma lifetime and discrimination modeling under likelihood-and information-based methods, ref. [24] investigated the likelihood inference for the destructive exponentially weighted Poisson cure rate model for melanoma data, and ref. [25] reported a stochastic version of the EM algorithm for a mixture cure model under the EW family.
The assessment of the fitted model is an important part of data analysis, particularly in regression models, and residual analysis is a helpful tool for validating the fitted model.An examination of residuals can be conducted, for instance, to detect the presence of outliers, the absence of components in the systematic part of the model, and departures from the response distribution and variance assumptions.
Among the cure survival models with Schoenfeld residuals are those discussed in [26] for the semiparametric mixture cure models, which suggests that there is at most one cause contributing to the occurrence of the event modeled by a Bernoulli distribution, and those with Cox-Snell residuals for totally parametric mixture models from the Weibull distribution for susceptible individuals with interval and right censoring [27], as well as some semiparametric models [28].
Based on all these studies, we provided Cox-Snell residuals to assess departures from the response distribution assumption and to detect outlying observations in the Poissonexponentiated Weibull (PEW) regression model with cure fraction.These residuals can also be adopted for the Poisson-exponentiated exponential, Poisson-Weibull, and Poisson exponential regression model with cure fraction.For different parameter settings, sample sizes, and censoring percentages, various simulations were performed, and the empirical distributions of the residuals are reported.
The rest of this paper is structured as follows.Section 2 defines the PEW regression with cure fraction.Section 3 discusses Cox-Snell residuals for this regression and provides some simulations.Section 4 analyzes a melanoma dataset.Section 5 ends with some conclusions.

PEW Regression Model with Cure Fraction
The mixture models [1], promotion time models [29,30], and flexible models with healing fraction [31] are among the most popular approaches for cure rate survival models.A proposal to unify these models was introduced in [31,32], and synthesized in [33].In this work, we adopted the promotion time model.Let M be the unobservable number of causes of the event of interest with a Poisson mass function (pmf): where θ > 0. Let Z j be the time for the jth cause to produce the event of interest (j = 1, . . ., M).Consider that the Z j s are independent and identically distributed (iid) random variables (independent of M) with survival function S(t).The observed time for the event of interest is T = min{Z 1 , • • • , Z M }, and T = ∞ if M = 0.Then, the improper survival function for the entire population has the form [32] The cure fraction is p 0 = exp(−θ), and the corresponding improper density and hazard functions are and respectively.
In order to provide more flexibility, we consider that the time Z to the recurrence of the breast cancer has the EW survival function where τ = (α, γ, λ) ⊤ , and all parameters are positive.It includes several known distributions [6].Let Z ∼EW(α, γ, λ).
The main motivation for using the EW model ( 5) is that it extends some important distributions previously considered in the literature.In particular, it contains, as special cases, the exponentiated exponential for α = 1, Weibull for γ = 1, and exponential for α = γ = 1, among others.Another main characteristic of the EW distribution is that the hazard rate function (hrf) can be constant, increasing, decreasing, unimodal, and bathtub.It is important to emphasize that most EW mathematical properties are manageable using modern computer programs with numerical and analytic capabilities.So, they may turn into adequate tools comprising the arsenal of applied statisticians.
In inserting the previous expression into Equation (2), the PEW model (with long-term survivors) reduces to [34] and the population density function (pdf) becomes

Inference
Consider that the time to event T i is subject to right censoring time The parameter θ in Equation ( 6) is related to the covariates x i by where β = (β 1 , . . ., β p ) ⊤ is the vector of regression coefficients.
The log-likelihood function for φ = (τ, β) ⊤ given (y 1 , δ 1 , x 1 ), • • • , (y n , δ n , x n ) follows from Equations ( 6) and (7) as follows: where r is the number of uncensored observations.The maximum likelihood estimate (MLE) φ of φ can be found from the maximization of (8).We use the BFGS method [35] through the maxLik [36] package of the R software [37].The script can be obtained from the authors upon request.
Under conditions that are fulfilled for the parameter vector φ in the interior of the parameter space but not on the boundary, the asymptotic distribution of , where K(φ) is the expected information matrix.The asymptotic covariance matrix K(φ) −1 of φ can be approximated by the inverse of the (p + 3) × (p + 3) observed information matrix − L(φ).The approximate multivariate normal distribution N p+3 (0, − L(φ) −1 ) for φ can be used in the classical way to construct approximate confidence regions for some parameters in φ.

Cox-Snell Residuals for the PEW Regression Model
The fitting of regression models involves assumptions established regarding the data.Among these is the relationship between the time to the occurrence of the event and explanatory variables, under the assumption that the random variable associated with the failure time follows a specified distribution [38].We define Cox-Snell residuals to measure the adequacy of the fit of the PEW regression with cure fraction.We adopted the idea of the residuals for survival analysis pioneered in [39] using a transformation of the cumulative hazard function (CHF) associated with the PEW regression model.For the regression models normally employed in survival analysis, the Cox-Snell residuals should behave like a censored sample from a standard exponential distribution when the fit is adequate [38].Thus, a plot of the estimated CHF versus the ordered residuals should be near the straight line with slope one if the model chosen is correct.
When faced with a population including individuals who are susceptible and immune to the event of interest, it is necessary to propose residuals that can evaluate the fitted survival function that captures this nuance.Among the various residuals for models with cure fraction, the one formulated in [26] has been of interest, where Schoelfeld residuals were presented for the mixture model.Ref. [28] employed residuals to evaluate mixture models globally and select the latency distribution, and ref. [27] adopted Cox-Snell residuals to evaluate the standard mixture model with right censoring.
The distribution associated with the CHF (see Appendix A) for the PEW regression with cure fraction is a mixture distribution [27,28].Even if this distribution is not a standard exponential, it will have standard exponential behavior in the interval [0, θ).Following the idea of these authors, we propose Cox-Snell residuals for the PEW regression with cure fraction: If the fit is adequate, these residuals will have the behavior of a censored sample from a standard exponential distribution in [0, θ) (see Appendix A).Similar to the findings reported in [27], the CHF follows a mixed distribution, whereas the continuous part has a standard exponential distribution, and the discrete part behaves like a degenerate variable in θ.These facts are shown empirically in the simulation study presented in Section 3.1.
After calculating the residuals, it is common to construct simulated envelopes [40].The algorithm used to generate them is given below: 1.
Fit the new PEW regression to the data and calculate r CSi ; 2.
Generate k data sets from this regression model using the parameter estimates found in the previous fit; 3.
Adjust the PEW model for each generated data point and calculate the residuals r CSi for each of the k generated samples; 4.
Use the residuals as a censored sample from a censored standard exponential in the interval [0, θ i ) and estimate the parametric survival function for each of the k samples using the method in [41]; 5.
The minimum and maximum values of the r CSi s define the envelope; if the current regression model is adequate, the residuals should be distributed randomly inside their bands.
The simulated envelope from the estimated parameters constitutes a tool for verifying the basic assumptions of the proposed regression model.

Simulation Study
The algorithm used to generate each data set is reported below: 1.
Generate m i occurrences from the Poisson distribution with θ i = exp(x ⊤ i β), where the covariates in x are fixed, and the parameters are chosen to preserve the levels of the censorship proportions.

2.
Generate the random censoring c i from the uniform U(0, u) distribution, where u is chosen to preserve the censoring proportions and cure fraction.Table 1 provides the levels of censoring, percentages of censoring, and percentages of the cure fraction.

3.
If m i = 0 ⇒ t i = ∞.Otherwise, t i = min{z i1 , . . ., z im i }, where z im i denotes generated values from the EW distribution with its parameters chosen according to Tables 2 and 3 without and with covariates, respectively.4.
The observed times are determined as Generate a censoring indicator variable as follows: The generated samples without covariates will be composed of the observed times and the censorship indicator variable, i.e., (y,δ).For the case of covariates, the data are (y, δ, x).For the case of covariates, they are specified as follows: x i1 ∼ N(0, 1.5), x i2 ∼ Bernoulli(0.5),and x i3 ∼ Exponential(2.5).Thus, Table 2 provides the parameters without covariates, whereas Table 3 provides the fixed parameters for the covariates.
The three levels of censoring are light, moderate, and high.The proportions of censoring and cure fraction are listed in Table 1.The sample sizes are n = 100, 250, and 500 for both scenarios.The plots below support the model assumptions.Each plot in Figures 1-12 was prepared from 1000 replicates (under each scenario: A 1 , A 2 , B 1 , and B 2 ) based on the fact that the residuals of the replicates behave like a censored sample from an exponential distribution in the interval [0, θ) for susceptible individuals.Thus, we can use the plot of the CHF Ĥ0 (r CSi ) = − log( Ŝ0 (r CSi )) versus the ordered estimated residuals r CSi , where Ŝ0 (r CSi ) is the Kaplan-Meier (KM) survival function.Then, the expected behavior for a corrected model fit to the data follows if these plots are close to a straight line with unity slope [27].The Cox-Snell residuals for the proposed regression with covariates (scenario A 2 ) are displayed in Figures 4-6.It can be noted, due to the presence of covariates, that the expected behavior has a dispersion around the first bisector line, which is a little greater if compared to the cases without covariates.However, this preserves the pattern of a linear behavior around this line, thus evidencing the capacity of the proposed residuals to select the model that best describes the current data.

Misspecification (Scenario B)
We adopted the proposed residuals to select a model that fits well to the generated data sets by adding simulation results for scenarios B 1 and B 2 .Thus, we can note the ability of the residuals to identify the corrected distribution to the data.Similar results are reported in Figures 10-12 for scenario B 2 by generating data from the PEW promotion time model with covariates with cure fraction after fitting the log-logistic promotion time model.Again, we can note that these residuals can capture the model misspecification.
Finally, all plots show the possibility of using new Cox-Snell residuals to select different distributions and find one that is adequate to describe data with cure fraction.

Application to Melanoma Data
We applied the proposed methods to cutaneous melanoma data discussed in [42,43].We emphasize that these previous papers did not conduct a residual analysis for models with cure fractions.For this data set, the response variable y i is the time that elapsed from the diagnosis to death for n = 417 patients who had cutaneous melanoma with approximately 43% of censored observations.The independent variables were as follows: x i1 : treatment (0 = observation, 1 = interferon dose); x i2 , age (in years); x i3 , nodule category (1 to 4); x i4 , sex (0 = male, 1 = female); x i5 , p.s. (performance status patients in terms of activities: 0 = fully active, 1 = other); and x i6 , tumor thickness (in mm).

Marginal Analysis of the Response Variable
First, we analyzed the response variable.Figure 13 displays the Kaplan-Meier (KM) survival function, where an obvious cure possibility occurs after five years [44].
We consider Equation (7) to model these data marginally.Table 4 provides the MLEs and their standard errors (SEs) for the PEW model with cure fraction.The KM and the estimated survival functions in Figure 13 reveal that the PEW model fits well to the data.The estimated proportion of cured patients is θ = e β0 = 0.6819 ⇒ p0 = e − θ = 50.57%,where the confidence interval for the proportion of cured patients is (44.57%,55.37%).Figure 14 shows the plot of the CHF versus Cox-Snell residuals, which supports a good fit of the PEW model to these data.

The PEW-Adjusted Regression
Consider the PEW regression with cure fraction addressed in Section 2: The MLEs and standard errors (SEs) in Table 5 were calculated using the R software [37], and they indicate that x 3 is a significant variable.Hence, the final regression is    16 provides the plots of the empirical and estimated survival functions for the four nodule categories, which reveal that the new regression provides a good fit to explain the time elapsed from diagnosis until death caused by cutaneous melanoma cancer.Fitting the PEW model proved to be effective in describing the data under analysis.This model exhibits results comparable to those reported in [45], which applied the COM-Poisson Weibull model with a cure fraction among other models to cutaneous melanoma data.The findings of our study show a striking similarity with these previous results.Similarly, ref. [46] introduced the Negative Binomial Generalized Gamma model with cure rate to explain the incidence of cutaneous melanoma.The results obtained in [46] and others also support the conclusions obtained from our adjusted PEW model.
However, although both studies mentioned corroborate the applicability of the PEW model in similar contexts, there is still a significant gap in the literature in terms of evaluating the adjusted model through residual analysis.There are no established methods that offer an accurate distribution of waste and are simultaneously easy to implement.Therefore, we emphasize that the residuals developed in our study offer a considerable advantage in evaluating fitted models.In calculating these residuals, it is possible to carry out a more precise and detailed analysis of the adjusted model, thus improving the reliability of the statistical inferences made.

Conclusions
Several simulations performed in this study indicated that the Cox-Snell residuals for the Poisson-exponentiated Weibull (PEW) regression showed a linear behavior around the first bisector, thus evidencing the capacity of these residuals to select the best model for the current data.The proposed methods of residuals and envelopes for survival models with cured fraction can be employed to assess the suitability of the proposed models for the data.This utilization can be similar to what is performed in the case of the normal linear model, evaluating the quality of model fit by comparing the residuals with a reference probability distribution and simulated envelopes.In this way, the contributions of this study can be understood in the common context of model fitting and checking the adequacy of regression models.Similarly, future research may be conducted to study these residuals in other models with cure fraction.Further, note that these residuals for the PEW regression model with cure fraction can be easily extended to other regression sub-models (Weibull, log-logistic, exponentiated exponential, log-normal, and Rayleigh, among others) since the EW distribution has some special cases widely used in survival analysis.Finally, other works can be developed to extend the Cox-Snell residuals for several regression models with cure fractions in the literature, such as the COM-Poisson cure rate survival models, destructive negative binomial cure rate models, Conway-Maxwell-Poisson generalized gamma regression models, a power series beta Weibull regression model, survival models induced by discrete frailty for modeling lifetime data with longterm survivors, power series cure rate models for spatially correlated interval-censored data, and long-term bivariate survival Farlie-Gumbel-Morgenstern copula models (bivariate case), among others.(A2) Finally,

Figures 1 - 500 Figure 1 . 500 Figure 2 .
Figures 1-3 display plots of the CHF versus the ordered Cox-Snell residuals for the new model without covariates (scenario A 1 ).So, we can conclude the following: • The proposed residuals have a linear behavior with unity slope.• The linear behavior of the residuals becomes more evident when the sample size increases.n = 100 n = 250 n = 500

Figure 13 .
Figure 13.The KM and estimated survival function from the PEW model with cure fraction.

Table 1 .
Percentages of censoring and cure fraction for the simulations.
2 : generate data from the PEW regression with cure fraction with covariates and fit the log-logistic regression with cure fraction under the same structure for them.For both scenarios B 1 and B 2 , we show how the proposed residuals can verify the model misspecification.

Table 2 .
Fixed values of β 0 and u for generating data without covariates.

Table 3 .
Fixed parameters values and u and β for generating data with covariates.

Table 4 .
Findings from the fitted PEW model with cure fraction.

Table 5 .
Findings from the fitted new regression.

Table 6 .
Findings from the final fitted model.