A Non-Mixture Cure Model for Right-Censored Data with Fréchet Distribution

This paper considers a non-mixture cure model for right-censored data. The maximum likelihood method is used to estimate the model parameters. A simulation study based on the Fréchet susceptible distribution evaluates the performance of the method. Compared with the Weibull and exponentiated exponential distributions, the non-mixture Fréchet distribution is shown to give the best fit to two real data sets: the allogeneic marrow transplant data from HLA-matched donors and the ECOG phase III clinical trial e1684 data.


Introduction
Cure fraction models are broadly used for analyzing survival data. In the literature, there are two major models for fitting survival data with a cure fraction. The first one is the mixture cure rate model, also known as the standard cure rate model. This model was initially introduced by Boag [1] in 1949, further developed by Berkson and Gage [2] in 1952, and later extensively studied by other authors. In this model, it is assumed that a certain proportion of the population is cured and the remainder is not cured.
To estimate the cure fraction, parametric, semi-parametric and non-parametric methods have been studied by several authors. Farewell [3] in 1982 used a Weibull distribution for uncured subjects and logistic regression for the cure probability. Goldman [4] in 1984 discussed parametric survivorship analyses using maximum likelihood estimation and the likelihood ratio test. Kuk and Chen [7] in 1992 proposed a semi-parametric cure model using logistic regression for the cure probability and the proportional hazards regression model for the failure time. Taylor [5] in 1995 proposed a semi-parametric method for the mixture model using logistic regression for the incidence part and the Kaplan-Meier estimator for the latency part. Peng and Dear [6] in 2000 investigated a non-parametric approach to the mixture model, estimating the parameters of interest using the EM algorithm, a marginal likelihood approach, and multiple imputations. They also extended it to Cox's proportional hazards cure model. Zhang and Peng [8] in 2009 proposed a mixture cure model in which the covariate effects on the proportion cured and on the distribution of the failure time of uncured patients are modeled separately. Moreover, Kim and Jhun [9] in 2008 investigated interval-censored data based on the mixture cure model. They derived the likelihood for interval-censored data using an approximate likelihood approach suggested by Goetghebeur and Ryan [10] in 2000.
The second one is the non-mixture cure rate model, also known as the bounded cumulative hazard model or the promotion time cure model. In cancer studies, this model was developed under the assumption that the number of cancer cells that remain active after cancer treatment, and that may grow slowly and produce a detectable cancer, follows a Poisson distribution. This model was first proposed by Yakovlev et al. [11] in 1993 and was further discussed by Chen et al. [12] in 1999. Semi-parametric approaches to estimation for survival data with a cure fraction were discussed by Chen et al. [13] in 2001. Tsodikov et al. [14] in 2003 provided a review of existing methodology of statistical inference based on the non-mixture model. They highlighted three distinct advantages of the non-mixture cure model: it has a proportional hazards structure; it gives a much more biologically meaningful interpretation of the results of the data analysis; and it is computationally convenient, because the simple structure of its survival function provides a technical advantage when developing maximum likelihood estimation procedures. Herring and Ibrahim [15] in 2002 studied the parametric estimation of random effects for non-ignorable missing covariates in the non-mixture cure model. Uddin et al. [16,17] in 2006 applied both non-parametric and parametric methods to the non-mixture model for uncensored data. Liu et al. [18] in 2009 investigated the semi-parametric non-mixture cure model for interval-censored data using the EM method. Lopes et al. [19] in 2012 studied both Bayesian and classical approaches to long-term survivors and random effects based on the non-mixture cure model.
Survival models with long-term survivors have been studied for decades. Subjects in survival data with long-term survivors fall into two groups: susceptible individuals and long-term survivors. In this paper, we consider the generalized extreme value type II distribution for the susceptible individuals. The generalized extreme value distribution was introduced by Jenkinson [20] in 1955 and its sub-models are widely used for modeling extreme events. The sub-models of the generalized extreme value distribution are the Gumbel, Fréchet and Weibull distributions. Extreme value distributions play an important role in statistics. Ramos et al. [21] in 2017 discussed some results for the long-term Fréchet distribution. We study a long-term survival non-mixture cure model with the Fréchet distribution for the uncured population.
A simulation study is conducted to show the performance of the maximum likelihood estimators.
Moreover, we compare our proposed model with other established models using two real data sets: the leukemia data for 46 allogeneic marrow transplantation patients presented by Kersey et al. [22] in 1987, and the ECOG phase III clinical trial e1684 data available in the R package smcure.

Likelihood Function of a Mixture Cure Model
A mixture cure model, also known as the standard cure rate model, refers to a class of models for survival data with long-term survivors, in which some subjects will never develop the event of interest. In a mixture cure model, the population consists of two types of subjects: uncured (susceptible) subjects, who experience the event of interest, and cured (non-susceptible) subjects, who will never experience it. Those who are not going to develop the event of interest are referred to as "cured subjects" or "long-term survivors". Let T denote the survival time of a subject and let ∆ be the cure indicator, with ∆ = 0 when the subject is cured and ∆ = 1 when the subject is uncured. Let p be the proportion of subjects who are cured and 1 − p the proportion of subjects who are not cured. That is, P(∆ = 1) = 1 − p and P(∆ = 0) = p. Moreover, P(T ≤ t | ∆ = 0) = 0 and P(T ≤ t | ∆ = 1) = F(t), the cumulative distribution function of the uncured subjects. Then the cumulative distribution function of T in the overall population is

$$F_T(t) = (1 - p)\,F(t),$$

and the survival function of T is

$$S_T(t) = 1 - F_T(t) = p + (1 - p)\,S(t),$$

with density $f_T(t) = (1 - p)\,f(t)$, where $S(t) = 1 - F(t)$ is the survival function and f(t) is the probability density function of uncured patients.
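Let $y_i$ be the right-censored survival time for subject i, so that $y_i = \min(T_i, C_i)$, where $T_i$ is the failure time and $C_i$ is the right-censoring time of the ith subject, $i = 1, \dots, n$. The observed data for the ith subject are $y_i$ and the censoring indicator $\delta_i = I(T_i \le C_i)$. Then the likelihood function of the mixture model is

$$L = \prod_{i=1}^{n} \left[(1 - p)\,f(y_i)\right]^{\delta_i} \left[p + (1 - p)\,S(y_i)\right]^{1 - \delta_i},$$

and the log-likelihood function is

$$\ell = \sum_{i=1}^{n} \left\{ \delta_i \left[\ln(1 - p) + \ln f(y_i)\right] + (1 - \delta_i) \ln\left[p + (1 - p)\,S(y_i)\right] \right\}. \qquad (3)$$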
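As an illustration of how the mixture log-likelihood can be evaluated, the following minimal Python sketch sums the contribution of each subject: an observed event contributes ln[(1 − p) f(y_i)] and a censored observation contributes ln[p + (1 − p) S(y_i)]. The function name `mixture_loglik`, the toy data and the unit-rate exponential latency distribution are assumptions made for the example only; they are not taken from the paper, whose computations use R.

```python
import math

def mixture_loglik(p, y, delta, pdf, surv):
    """Log-likelihood of the mixture cure model for right-censored data.

    p         : cure fraction, 0 < p < 1
    y         : observed times y_i = min(T_i, C_i)
    delta     : censoring indicators (1 = event observed, 0 = censored)
    pdf, surv : density f and survival function S of the uncured subjects
    """
    total = 0.0
    for yi, di in zip(y, delta):
        if di == 1:
            # An observed event can only come from an uncured subject.
            total += math.log(1.0 - p) + math.log(pdf(yi))
        else:
            # A censored subject is either cured, or uncured and still event-free.
            total += math.log(p + (1.0 - p) * surv(yi))
    return total

# Hypothetical toy data with a unit-rate exponential latency distribution.
exp_pdf = lambda t: math.exp(-t)
exp_surv = lambda t: math.exp(-t)
value = mixture_loglik(0.3, [0.5, 1.2, 2.0], [1, 0, 1], exp_pdf, exp_surv)
```

Passing the density and survival function as arguments keeps the sketch independent of the choice of susceptible distribution, so the same function could be reused for the Fréchet, Weibull or exponentiated exponential cases.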

Likelihood Function of a Non-mixture Cure Model
In this section, we introduce the non-mixture cure model developed by Yakovlev et al. [11] in 1993 for survival data with long-term survivors in cancer studies. Let N be the number of cancer cells remaining in a patient after cancer treatment. Since these cells may grow rapidly and may later produce a detectable cancer, N is assumed to have a Poisson distribution with mean λ. Let $Z_j$ be the random time for the jth cancer cell to produce a detectable cancer mass. Then the time to relapse of cancer can be defined by the random variable $T = \min\{Z_j, j = 1, 2, \dots, N\}$, with $T = \infty$ when N = 0. Furthermore, we assume that the $Z_j$ are independent and identically distributed with distribution and survival functions F(·) and S(·), respectively. Then the survival function of T is given by

$$S_T(t) = e^{-\lambda F(t)} = p^{F(t)},$$

where $p = e^{-\lambda}$ is the probability of cure, or cure fraction, in this model, since

$$\lim_{t \to \infty} S_T(t) = e^{-\lambda} = p > 0,$$

which implies that $S_T(\cdot)$ is an improper survival function. The cumulative distribution, density and hazard functions of T are, respectively,

$$F_T(t) = 1 - p^{F(t)}, \qquad f_T(t) = -\ln(p)\,f(t)\,p^{F(t)}, \qquad h_T(t) = -\ln(p)\,f(t).$$

Since $S_T(t)$ is not a proper survival function, we can derive the following theorem: Since F(·) is strictly increasing, $F_T(\cdot)$ is also strictly increasing and therefore has an inverse, denoted by $F_T^{-1}(\cdot)$.
Case-1: If p = 0, this theorem is equivalent to the Inverse Transform Method.
Case-2: If p = 1, every subject is cured, which is not realistic in practice.
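For reference, the identity $S_T(t) = p^{F(t)}$ can be checked by conditioning on the number of cells N. The short derivation below is a sketch under the assumptions stated in this section, together with the additional standard assumption that N is independent of the $Z_j$; the explicit form of the inverse of $F_T$ on $(0, 1 - p)$ is obtained by solving $1 - p^{F(t)} = u$ and is given here only as a worked consequence, not as the statement of the theorem.

```latex
\begin{align*}
S_T(t) &= P(N = 0) + \sum_{k=1}^{\infty} P(Z_1 > t, \dots, Z_k > t)\, P(N = k) \\
       &= e^{-\lambda} + \sum_{k=1}^{\infty} S(t)^{k}\, \frac{\lambda^{k} e^{-\lambda}}{k!}
        = e^{-\lambda}\, e^{\lambda S(t)} \\
       &= e^{-\lambda F(t)} = p^{F(t)}, \qquad p = e^{-\lambda}, \\[4pt]
F_T^{-1}(u) &= F^{-1}\!\left(\frac{\ln(1 - u)}{\ln p}\right), \qquad 0 < u < 1 - p .
\end{align*}
```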
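We now consider right-censored lifetime data and assume that there are n patients in a cancer study. For individual i, let $T_i = \min\{Z_{ij}, j = 1, 2, \dots, N_i\}$ be the failure time and let $C_i$ be the right-censoring variable. Then we observe $y_i = \min(T_i, C_i)$ and the censoring indicator $\delta_i = I(T_i \le C_i)$. The likelihood function is

$$L(\theta) = \prod_{i=1}^{n} f_T(y_i)^{\delta_i}\, S_T(y_i)^{1 - \delta_i} = \prod_{i=1}^{n} \left[-\ln(p)\, f(y_i)\right]^{\delta_i} p^{F(y_i)},$$

and the log-likelihood function is

$$\ell(\theta) = \sum_{i=1}^{n} \left\{ \delta_i \left[\ln(-\ln p) + \ln f(y_i)\right] + F(y_i) \ln p \right\}. \qquad (5)$$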
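A minimal Python sketch of the non-mixture log-likelihood may help make the structure concrete: every subject contributes F(y_i) ln p, and subjects with an observed event additionally contribute ln(−ln p) + ln f(y_i). The sketch assumes the Fréchet susceptible distribution in the parameterization F(t) = exp(−(β/t)^α) with shape α and scale β; the function names are hypothetical.

```python
import math

def frechet_cdf(t, alpha, beta):
    """Assumed Fréchet CDF with shape alpha and scale beta, for t > 0."""
    return math.exp(-((beta / t) ** alpha))

def frechet_pdf(t, alpha, beta):
    """Density corresponding to frechet_cdf."""
    return (alpha / beta) * (beta / t) ** (alpha + 1) * frechet_cdf(t, alpha, beta)

def nonmixture_loglik(theta, y, delta):
    """Log-likelihood of the non-mixture cure model, theta = (alpha, beta, p)."""
    alpha, beta, p = theta
    log_p = math.log(p)  # negative for 0 < p < 1
    total = 0.0
    for yi, di in zip(y, delta):
        if di == 1:
            # Extra term for an observed event: ln(-ln p) + ln f(y_i).
            total += math.log(-log_p) + math.log(frechet_pdf(yi, alpha, beta))
        # Term shared by events and censored observations: F(y_i) ln p.
        total += frechet_cdf(yi, alpha, beta) * log_p
    return total

# Hypothetical two-subject example: one event at 1.0, one censored time at 2.0.
ll = nonmixture_loglik((2.0, 1.0, 0.5), [1.0, 2.0], [1, 0])
```

In practice the negative of this function would be handed to a numerical optimizer to obtain the maximum likelihood estimates.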
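The score functions are obtained by taking partial derivatives of the log-likelihood function (5), denoted $\ell(\theta)$, with respect to the components of $\theta = (\alpha, \beta, p)$, where α and β are the parameters of the susceptible distribution:

$$\frac{\partial \ell}{\partial p} = \sum_{i=1}^{n} \left[ \frac{\delta_i}{p \ln p} + \frac{F(y_i)}{p} \right], \qquad \frac{\partial \ell}{\partial \theta_j} = \sum_{i=1}^{n} \left[ \delta_i\, \frac{\partial \ln f(y_i)}{\partial \theta_j} + \ln p\, \frac{\partial F(y_i)}{\partial \theta_j} \right], \quad \theta_j \in \{\alpha, \beta\}.$$

If the log-likelihood function has a global maximizer, then the MLE is a solution of the score equations. Since this is a non-linear system, the solution is computed numerically by the Newton-Raphson method.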
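The second partial derivatives of the log-likelihood function $\ell(\theta)$ are given as follows, for $\theta_j, \theta_k \in \{\alpha, \beta\}$:

$$\frac{\partial^2 \ell}{\partial p^2} = -\sum_{i=1}^{n} \left[ \frac{\delta_i\,(1 + \ln p)}{(p \ln p)^2} + \frac{F(y_i)}{p^2} \right], \qquad \frac{\partial^2 \ell}{\partial p\, \partial \theta_j} = \frac{1}{p} \sum_{i=1}^{n} \frac{\partial F(y_i)}{\partial \theta_j},$$

$$\frac{\partial^2 \ell}{\partial \theta_j\, \partial \theta_k} = \sum_{i=1}^{n} \left[ \delta_i\, \frac{\partial^2 \ln f(y_i)}{\partial \theta_j\, \partial \theta_k} + \ln p\, \frac{\partial^2 F(y_i)}{\partial \theta_j\, \partial \theta_k} \right].$$

The maximum likelihood estimators are asymptotically normal, with asymptotic covariance matrix given by the inverse of the Fisher information matrix. The Fisher information is defined by the relation

$$I(\theta) = -E\left[ \frac{\partial^2 \ell(\theta)}{\partial \theta\, \partial \theta^{\top}} \right].$$

In practical applications, the observed Fisher information matrix is used in place of the expected Fisher information.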
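When analytic second derivatives are inconvenient, the observed information can be approximated numerically. The sketch below is one possible approach and not the paper's procedure: it builds a central finite-difference Hessian of a user-supplied log-likelihood, negates it to get the observed information, inverts it by Gauss-Jordan elimination, and returns the square roots of the diagonal as standard errors. All function names and the step size `h` are assumptions for illustration.

```python
import math

def numerical_hessian(fun, x, h=1e-4):
    """Central finite-difference Hessian of fun at the point x."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                z = list(x)
                z[i] += si * h
                z[j] += sj * h
                return fun(z)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return hess

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def standard_errors(loglik, theta_hat, h=1e-4):
    """Standard errors from the inverse of the observed information at theta_hat."""
    observed_info = [[-v for v in row]
                     for row in numerical_hessian(loglik, theta_hat, h)]
    cov = invert(observed_info)
    return [math.sqrt(cov[i][i]) for i in range(len(theta_hat))]

# Hypothetical quadratic log-likelihood with known curvature, for checking.
def toy_loglik(t):
    return -0.5 * ((t[0] - 1.0) ** 2 / 4.0 + (t[1] - 2.0) ** 2 / 9.0)

se = standard_errors(toy_loglik, [1.0, 2.0])
```

The toy log-likelihood is used only because its information matrix is known exactly; for a fitted cure model, `loglik` would be the log-likelihood (5) evaluated at the maximum likelihood estimate, and the step size may need tuning when p is close to 0.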

Simulation Study
We conducted simulation studies to examine the performance of the maximum likelihood estimator of θ = (α, β, p) in finite samples. The right-censored survival times were generated by applying the inverse transform method to the survival function $S_T(t) = p^{F(t)}$. The following algorithm was used to simulate a sample of size n from the non-mixture Fréchet distribution with right-censored data:
Step 1: Generate a simple random sample $u_1, \dots, u_n$ from U(0, 1).
Step 2: Suppose p is the cure fraction. The random survival time $t_i$ is calculated from $u_i$ by applying the inverse transform method to $S_T(t) = p^{F(t)}$.
Step 3: Generate a simple random sample of censoring times $c_1, \dots, c_n$ from a Fréchet distribution. The parameters of this Fréchet distribution are adjusted to obtain the desired censoring rates.
Step 4: The right-censored data are obtained as the minimum of the censoring time and the survival time. That is, $y_i = \min(t_i, c_i)$.
Step 5: The observed data set is $\{(y_i, \delta_i),\ i = 1, \dots, n\}$, where $\delta_i = I(t_i \le c_i)$.
Step 6: Maximize the likelihood function with respect to θ to obtain $\hat{\theta}$. The standard optimization routine optim() in R is used.
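Steps 1 to 5 can be sketched in a few lines of Python. This is an illustrative reading of the algorithm, not the authors' code: it assumes the Fréchet parameterization F(t) = exp(−(β/t)^α), and it assumes that Step 2 solves $1 - p^{F(t)} = u$ when u < 1 − p and treats the subject as cured (infinite survival time) otherwise. The function names and parameter values are hypothetical.

```python
import math
import random

def open_uniform(rng):
    """Draw from U(0, 1), excluding the endpoint 0 that random() can return."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def frechet_quantile(v, alpha, beta):
    """Inverse of the assumed Fréchet CDF F(t) = exp(-(beta / t) ** alpha)."""
    if v >= 1.0:
        return math.inf
    return beta * (-math.log(v)) ** (-1.0 / alpha)

def simulate(n, alpha, beta, p, cens_alpha, cens_beta, rng):
    """Steps 1-5: return a list of (y_i, delta_i) pairs."""
    data = []
    for _ in range(n):
        u = open_uniform(rng)                                   # Step 1
        if u < 1.0 - p:                                         # Step 2
            # Solve 1 - p ** F(t) = u for t (assumed reading of Step 2).
            v = math.log(1.0 - u) / math.log(p)
            t = frechet_quantile(v, alpha, beta)
        else:
            t = math.inf                                        # cured subject
        c = frechet_quantile(open_uniform(rng), cens_alpha, cens_beta)  # Step 3
        y = min(t, c)                                           # Step 4
        delta = 1 if t <= c else 0
        data.append((y, delta))                                 # Step 5
    return data

sample = simulate(200, alpha=2.0, beta=1.0, p=0.02,
                  cens_alpha=2.0, cens_beta=3.0, rng=random.Random(1))
```

For Step 6, the resulting pairs would be passed to a numerical optimizer of the log-likelihood; changing `cens_alpha` and `cens_beta` changes the censoring rate, which mirrors the tuning described in Step 3.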
We consider various simulation settings with different cure fractions and different censoring rates. The censoring variable follows a Fréchet distribution with parameters α > 0 and β > 0, whose values are adjusted to obtain the desired censoring rate in the right-censored survival data.
In this simulation study, we use the bias, standard error and root mean square error as the performance measures. The bias is the difference between the expected value of an estimator and the true parameter. The standard error measures the dispersion of the estimator's sampling distribution and therefore reflects its precision. The mean square error (MSE) of an estimator is the expected squared deviation of the estimator from the true parameter.
The root mean square error (RMSE) is the square root of the MSE.
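The three performance measures can be computed from the replicated estimates as in the following minimal sketch. The function name `summarize` is hypothetical, and the use of the n − 1 divisor for the standard error is an assumption, since the paper does not state which divisor was used.

```python
import math

def summarize(estimates, true_value):
    """Mean, bias, SE and RMSE of replicated estimates of one parameter."""
    m = len(estimates)
    mean = sum(estimates) / m
    bias = mean - true_value
    # Sample standard deviation of the estimates (assumed n - 1 divisor).
    se = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (m - 1))
    # Root of the average squared deviation from the true value.
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / m)
    return {"mean": mean, "bias": bias, "se": se, "rmse": rmse}

# Hypothetical estimates from three replications, true value 1.5.
summary = summarize([1.0, 2.0, 3.0], true_value=1.5)
```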
The simulation results are based on 500 replications with sample sizes 100, 200, 300, and 500 for each parameter setting. Results are presented in Tables 1-4. Figure 1 shows the relationship between the estimated bias and the censoring rate.

A Real Data Analysis
In this section, we compare non-mixture and mixture models with different susceptible distributions. We use the following assessments: the negative value of the log-likelihood function, the Akaike information criterion (AIC) and the corrected Akaike information criterion (AIC$_c$). The AIC is defined by $\mathrm{AIC} = -2 \ln L(\hat{\theta}) + 2k$, where θ = (α, β, p) is the vector of unknown parameters included in the model, $L(\hat{\theta})$ is the likelihood function evaluated at the maximum likelihood estimate, and k is the number of free parameters in the model. The AIC$_c$ is defined by

$$\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k + 1)}{n - k - 1},$$

where n is the sample size. A lower value of a selection criterion indicates a better fit.
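As a quick arithmetic check of these definitions, the sketch below computes both criteria from a negative log-likelihood. The function names are hypothetical; the inputs are the first Fréchet entry reported for the melanoma data (−logL = 367.546) with k = 3 parameters and n = 284 observations.

```python
def aic(neg_loglik, k):
    """AIC = -2 ln L + 2k, written in terms of the negative log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * k

def aicc(neg_loglik, k, n):
    """Small-sample corrected AIC."""
    return aic(neg_loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

print(round(aic(367.546, 3), 2))        # prints 741.09
print(round(aicc(367.546, 3, 284), 2))  # prints 741.18
```

Both values agree with the corresponding AIC and AIC$_c$ entries reported for that fit.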
We consider the data set of 46 patients who received allogeneic marrow from an HLA-matched donor, presented by Kersey et al. [22] in 1987. Of the 46 patients, 12 with allogeneic transplants died during their observation periods. Those are the censored observations (26%). Using these data, we compare the non-mixture and mixture models with Fréchet, two-parameter Weibull and two-parameter Exponentiated Exponential (EE) susceptible distributions using the maximum likelihood method.
Tables 5 and 6 show the maximum likelihood estimates of the parameters and their standard errors for the above three susceptible distributions under the non-mixture and mixture models, respectively, for the allogeneic marrow transplant (leukemia) data. Table 7 shows the values of −logL, AIC and AIC$_c$ of the three susceptible distributions using both mixture and non-mixture models for the leukemia data. Table 7 indicates that the Fréchet distribution is the best within both the non-mixture and the mixture cure models.
The second data set we consider is the melanoma data without covariates from the Eastern Cooperative Oncology Group (ECOG) phase III clinical trial e1684, available in the R package smcure.
In this trial, a total of 69% of the patients are censored. The trial, e1684, was a two-arm clinical trial comparing a high-dose interferon alpha-2b (IFN) regimen to observation. The parameter estimates for these data are presented in Tables 8 and 9. Table 10 shows the values of −logL, AIC and AIC$_c$ of the different distributions under both mixture and non-mixture models for the melanoma data. From Table 10, we conclude that the Fréchet non-mixture and mixture cure models give a better fit, since their −logL, AIC and AIC$_c$ values are smaller than those of the other models.

In this paper, we use a generalized extreme value distribution as the susceptible distribution, namely the type II (Fréchet) distribution. The class of extreme value distributions essentially consists of three types: type I (Gumbel distribution), type II (Fréchet distribution) and type III (Weibull distribution). Extreme value distributions are useful for modeling and measuring events that occur with very low probability. Moreover, they are widely used in finance, insurance, risk management, economics, material sciences and many other fields that deal with extreme events. More specifically, we consider the Fréchet distribution, named after the French mathematician Maurice Fréchet (1878-1973), to fit the survival time of uncured individuals. The probability density function of the Fréchet distribution is

$$f(y) = \frac{\alpha}{\beta} \left(\frac{\beta}{y}\right)^{\alpha + 1} e^{-(\beta / y)^{\alpha}}, \qquad y > 0,$$

where α > 0 is the shape parameter and β > 0 is the scale parameter, and the survival and hazard functions are

$$S(y) = 1 - e^{-(\beta / y)^{\alpha}}, \qquad h(y) = \frac{f(y)}{S(y)} = \frac{\alpha}{\beta} \left(\frac{\beta}{y}\right)^{\alpha + 1} \frac{e^{-(\beta / y)^{\alpha}}}{1 - e^{-(\beta / y)^{\alpha}}}, \qquad y > 0.$$

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 29 September 2018 | doi:10.20944/preprints201809.0588.v1. Peer-reviewed version available at Stats 2018, 1, 13; doi:10.3390/stats1010013

Figure 1. The relationship between the bias and the censoring rate.

Figure 3.
The top left panel of Figure 3 is the Kaplan-Meier survival curve of the melanoma data. The top right panel is the Kaplan-Meier estimated curve overlaid with the estimated survival curves from the Fréchet non-mixture and mixture models. The bottom left panel is the Kaplan-Meier estimated curve overlaid with the estimated survival curves from the Weibull non-mixture and mixture models. The bottom right panel is the Kaplan-Meier estimated curve overlaid with the survival curves of the non-mixture and mixture models with the Exponentiated Exponential distribution. The curve of the non-mixture Fréchet distribution is closer to the Kaplan-Meier survival curve than those of all other distributions, and the mixture model slightly overestimates the survival rate.

Table 1. Summary statistics of a non-mixture model with 1% cure fraction for right-censored data, showing the values of the mean, bias, standard error (SE) and root mean square error (RMSE) of the MLE.

The simulation results suggest that the proposed method performs well overall. The averages of the estimates are very close to the respective true parameter values in all simulation settings. However, for both cure fractions of 1% and 2%, the results with lower censoring rates are better than those with higher censoring rates. The biases of the estimates are small, and the SE and RMSE of all examined parameters decrease with increasing sample size in all settings. Moreover, the estimates of all examined parameters perform better for a low cure proportion than for a high cure proportion. Finally, we conclude that the MLE works well for the non-mixture cure model with the Fréchet susceptible distribution.

Table 2. Summary statistics of non-mixture model with 2% cure fraction for right-censored data

Table 3. Summary statistics of non-mixture model with 1% cure fraction for right-censored data

Table 4. Summary statistics of non-mixture model with 2% cure fraction for right-censored data

Table 5. Results of non-mixture cure models with leukemia data

Table 6. Results of mixture cure models with leukemia data

There were a total of 286 patients enrolled in the study, collected from 1984 to 1990, and the study was published in 1996. After deleting the missing data, we used a total of 284 observations without covariates for analyzing our models. The summary statistics under the Fréchet, two-parameter Weibull and Exponentiated Exponential susceptible distributions are presented in Tables 8 and 9.

Table 8. Results of a non-mixture cure model for melanoma data

Table 9. Results of a mixture cure model for melanoma data

Table 10. Models comparison by information criteria for melanoma data

Distribution   −logL     AIC      AICc     −logL     AIC      AICc
Fréchet        367.546   741.09   741.18   367.356   740.72   740.80
Weibull        378.405   762.81   762.90   381.396   768.79   768.88
EE             377.970   761.94   762.03   382.549   771.10   771.18

The Kaplan-Meier estimate of the survival curve and the fitted survival curves of the Fréchet, Weibull and Exponentiated Exponential distributions for the melanoma data are given in