On Progressive Censored Competing Risks Data: Real Data Application and Simulation Study

Competing risks are frequently overlooked, and the event of interest is then analyzed with conventional statistical techniques. In this article, we consider the analysis of two causes of failure in the context of competing risks models using the extension of the exponential distribution under progressive Type-II censoring. Maximum likelihood estimates of the unknown parameters are obtained via the expectation-maximization algorithm. Moreover, the Bayes estimates of the unknown parameters are approximated using the Tierney-Kadane and MCMC techniques. Interval estimates using Bayesian and classical techniques are also considered. Two real data sets are investigated to illustrate the different estimation methods and to compare the suggested model with the Weibull distribution. Furthermore, the estimation methods are compared through a comprehensive simulation study.


Introduction
In survival analysis and reliability experiments, the event of interest often occurs as a result of several factors. Neglecting those factors, or reducing them to a single factor, may lead to wrong decisions and unacceptable results. Analyzing the data while taking into account all the factors leading to the event of interest is called the competing risks problem in the literature. For example, in a clinical trial of a group of patients with breast cancer, Boag [1] noted that some patients died from causes other than breast cancer. Hoel [2], in an experiment analyzing the survival of a group of mice, attributed the deaths to three causes, namely reticulum cell sarcoma, thymic lymphoma, or other causes. In a reliability experiment, Doganaksoy et al. [3] attributed the failure of electrodes to one of two modes (causes): the degradation of the organic material (mode D) and an insulation flaw resulting from a processing problem (mode E). More reliability experiments concerned with competing risks were reported by Nelson [4] in Chapter 7 and by Craiu and Lee [5]. Nowadays, patients infected with COVID-19 may be suffering from other diseases, and hence death may occur as a result of one of these diseases. From these examples, we can conclude that competing risks data include the time of the failure event and an indicator of the cause of failure. The causes of failure may or may not be independent of each other. Here, competing risks are modeled via latent failure times, following the point of view proposed by Cox [6]. The literature on competing risks is rich; we mention, for example, Kalbfleisch and Prentice [7], Crowder [8], Kundu et al. [9], and Koley and Kundu [10].
Progressive Type-II censoring is one of the most important and widely used types of censoring, in which the lifetime experiment is terminated after a pre-specified number of failures, say $m$ out of $n$, where $n$ is the sample size. Under this type of censoring, some surviving units can be removed from the test at each observed failure. It was first introduced by Cohen [11]. A progressive Type-II censoring scheme is denoted by $(n, m, R_1, R_2, \ldots, R_m)$, where $R_i$ is the number of units removed from the test at the $i$th failure, $i = 1, 2, \ldots, m$, and $R_1 + R_2 + \cdots + R_m = n - m$. For an extensive description of progressive Type-II censoring, see, for example, Balakrishnan and Aggarwala [12] and Balakrishnan and Cramer [13].
Nadarajah and Haghighi [14] proposed a two-parameter lifetime distribution with an increasing or decreasing failure rate as an alternative to some well-known distributions. This distribution is known as an extension of the exponential distribution. Here, we assume the latent failure times follow the extension of the exponential distribution [14] with a common shape parameter $\gamma > 0$ and distinct scale parameters $\sigma_1 > 0$ and $\sigma_2 > 0$, $\sigma_1 \neq \sigma_2$. Specifically, the latent variable $T_j$, $j = 1, 2$, has probability density function (PDF), cumulative distribution function (CDF) and hazard rate function (HRF) as follows:

$$f_j(t) = \gamma \sigma_j (1 + \sigma_j t)^{\gamma - 1} \exp\{1 - (1 + \sigma_j t)^{\gamma}\}, \quad t > 0, \tag{1}$$

$$F_j(t) = 1 - \exp\{1 - (1 + \sigma_j t)^{\gamma}\}, \quad t > 0, \tag{2}$$

and

$$h_j(t) = \gamma \sigma_j (1 + \sigma_j t)^{\gamma - 1}, \quad t > 0, \quad j = 1, 2. \tag{3}$$

The extension of the exponential distribution (EED) has been used extensively in recent years as a lifetime model. It was introduced in 2011 by Nadarajah and Haghighi, and it is often called the Nadarajah and Haghighi (NH) distribution. This distribution is distinguished by the relationship between its PDF and its HRF [14]: the EED has a decreasing PDF and allows an increasing, constant or decreasing hazard rate. Many researchers have been interested in studying this distribution. For example, in the past ten years, more than a hundred articles were related to it. Some of them dealt with the distribution itself, and some of them introduced generalizations of it. We refer to the references [15][16][17][18][19][20][21][22][23], which dealt with the EED, and to the references [24][25][26][27][28] for extensions and generalizations of the EED.
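As a small illustration of the EED functions above, the following Python sketch evaluates the PDF, CDF and HRF and draws random values by inverting the CDF. The function names (`eed_pdf`, `eed_cdf`, `eed_hazard`, `eed_quantile`, `eed_sample`) are hypothetical and are not taken from the paper; the quantile expression is obtained by solving $F_j(t) = p$ for $t$.

```python
import math
import random


def eed_cdf(t, gamma, sigma):
    """CDF of the EED: F(t) = 1 - exp{1 - (1 + sigma*t)^gamma}, t > 0."""
    return 1.0 - math.exp(1.0 - (1.0 + sigma * t) ** gamma)


def eed_pdf(t, gamma, sigma):
    """PDF of the EED: gamma*sigma*(1 + sigma*t)^(gamma-1) * exp{1 - (1 + sigma*t)^gamma}."""
    u = 1.0 + sigma * t
    return gamma * sigma * u ** (gamma - 1.0) * math.exp(1.0 - u ** gamma)


def eed_hazard(t, gamma, sigma):
    """Hazard rate of the EED: gamma*sigma*(1 + sigma*t)^(gamma-1)."""
    return gamma * sigma * (1.0 + sigma * t) ** (gamma - 1.0)


def eed_quantile(p, gamma, sigma):
    """Inverse CDF: t = ((1 - ln(1 - p))^(1/gamma) - 1) / sigma, for 0 <= p < 1."""
    return ((1.0 - math.log(1.0 - p)) ** (1.0 / gamma) - 1.0) / sigma


def eed_sample(gamma, sigma, rng=random):
    """Draw one EED(gamma, sigma) value by the inverse-CDF method."""
    return eed_quantile(rng.random(), gamma, sigma)
```

Because the hazard is $\gamma\sigma_j(1 + \sigma_j t)^{\gamma - 1}$, it is increasing for $\gamma > 1$, constant for $\gamma = 1$ and decreasing for $\gamma < 1$, which can be checked numerically with `eed_hazard`.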
This paper presents statistical inference for the unknown parameters of the EED based on progressive Type-II censored data in the presence of competing risks. Suppose that the failure time arises from two independent competing risks whose latent failure times follow the EED with parameters $(\gamma, \sigma_1)$ and $(\gamma, \sigma_2)$, respectively. Different estimation methods are used to estimate the parameters, namely the maximum likelihood (ML) method, the expectation-maximization (EM) method and the Bayesian method. The Bayesian estimates (BEs) are approximated using the Markov chain Monte Carlo (MCMC) method and the Tierney-Kadane (TK) approximation. The 95% asymptotic confidence intervals (CIs) of the parameters are constructed from the maximum likelihood estimates (MLEs), while the 95% credible intervals of the parameters are computed from the MCMC-generated values of the parameters. A comprehensive simulation study is performed to compare the various estimation methods. To illustrate the different estimation methods, two real data sets are analyzed. Moreover, these data sets are used to show that the EED can be a possible alternative to the Weibull distribution.
The rest of the paper is organized as follows: the model is formulated in Section 2. In Section 3, the MLEs are provided, while the EM algorithm is provided in Section 4. Bayesian estimation with approximation methods is presented in Section 5. In Section 6, we present two applications to real-life data. The simulation study is provided in Section 7. Finally, the paper is concluded in Section 8.

Model Formulation
This section outlines the competing risks model and its notation. Consider a lifetime experiment in which $n$ units are prone to failure from two known causes. Let $T_i = \min\{T_{i1}, T_{i2}\}$ for $i = 1, 2, \ldots, n$, where $T_{ij}$ is the latent failure time of the $i$th unit under the $j$th cause of failure, $j = 1, 2$. For each $i$, the failure times $T_{i1}$ and $T_{i2}$ are assumed to be independent and to follow EED($\gamma, \sigma_1$) and EED($\gamma, \sigma_2$), respectively. Moreover, we assume that the pairs $(T_{i1}, T_{i2})$, $i = 1, 2, \ldots, n$, are iid.

Remark 1.
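If $X_1 \sim \mathrm{EED}(\gamma, \sigma_1)$ and $X_2 \sim \mathrm{EED}(\gamma, \sigma_2)$ are independent random variables, then the reliability function of $X = \min\{X_1, X_2\}$ is given by

$$\bar{F}_X(x) = \exp\{2 - (1 + \sigma_1 x)^{\gamma} - (1 + \sigma_2 x)^{\gamma}\}, \quad x > 0. \tag{4}$$

Furthermore, the CDF and PDF of $X$ are given, respectively, by

$$F_X(x) = 1 - \exp\{2 - (1 + \sigma_1 x)^{\gamma} - (1 + \sigma_2 x)^{\gamma}\} \tag{5}$$

and

$$f_X(x) = \gamma \left[\sigma_1 (1 + \sigma_1 x)^{\gamma - 1} + \sigma_2 (1 + \sigma_2 x)^{\gamma - 1}\right] \exp\{2 - (1 + \sigma_1 x)^{\gamma} - (1 + \sigma_2 x)^{\gamma}\}. \tag{6}$$

We can describe competing risks data subject to progressive Type-II censoring as follows. Assume that $n$ identical units are monitored to collect their failure times according to a prefixed censoring scheme (CS) $R = (R_1, R_2, \ldots, R_m)$, where each failure may be caused by one of two causes. In this scenario, $m$ is the total number of failures, $n - m$ is the total number of censored units, and $R_i$ is the number of surviving units randomly removed from the test when the $i$th failure occurs. The cause of failure is also recorded: if the $i$th recorded failure was caused by the $j$th risk, we set $\delta_i = j$. Therefore, $m_j = \sum_{i=1}^{m} I(\delta_i = j)$ is the number of failures caused by the $j$th risk, where $m = m_1 + m_2$ and

$$I(\delta_i = j) = \begin{cases} 1, & \delta_i = j, \\ 0, & \text{otherwise}. \end{cases}$$

According to the previous description of the competing risks data, the joint PDF of the data set $T = (T_1, T_2, \ldots, T_m)$ is given by [9]

$$f(t_1, \ldots, t_m) = C \prod_{i=1}^{m} \left[f_1(t_i) \bar{F}_2(t_i)\right]^{I(\delta_i = 1)} \left[f_2(t_i) \bar{F}_1(t_i)\right]^{I(\delta_i = 2)} \left[\bar{F}_1(t_i) \bar{F}_2(t_i)\right]^{R_i}, \tag{7}$$

where $C$ is a constant that does not depend on the parameters, and $f_j(t_i)$ and $\bar{F}_j(t_i) = 1 - F_j(t_i)$ are the PDF and survival function of the latent failure time under the $j$th cause of failure, $i = 1, 2, \ldots, m$, $j = 1, 2$.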
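The sampling mechanism described above can be sketched in Python as follows. This is a direct simulation of the experiment (draw the latent times, record the smallest remaining lifetime, then withdraw $R_i$ survivors at random); it is not the algorithm of Balakrishnan and Aggarwala [12], and the function names are hypothetical.

```python
import math
import random


def eed_draw(gamma, sigma, rng):
    """One EED(gamma, sigma) value via the inverse CDF."""
    p = rng.random()
    return ((1.0 - math.log(1.0 - p)) ** (1.0 / gamma) - 1.0) / sigma


def simulate_progressive_competing(n, scheme, gamma, sigma1, sigma2, seed=None):
    """Return (times, causes) for one progressively Type-II censored sample.

    scheme is the list (R_1, ..., R_m); it must satisfy sum(R_i) = n - m.
    """
    m = len(scheme)
    if any(r < 0 for r in scheme) or sum(scheme) != n - m:
        raise ValueError("scheme must be non-negative with R_1 + ... + R_m = n - m")
    rng = random.Random(seed)

    # Observed lifetime and cause for every unit: the minimum of the two latent times.
    units = []
    for _ in range(n):
        t1 = eed_draw(gamma, sigma1, rng)
        t2 = eed_draw(gamma, sigma2, rng)
        units.append((t1, 1) if t1 <= t2 else (t2, 2))

    times, causes = [], []
    for r in scheme:
        # Next observed failure: smallest lifetime among the units still on test.
        idx = min(range(len(units)), key=lambda k: units[k][0])
        t, cause = units.pop(idx)
        times.append(t)
        causes.append(cause)
        # Withdraw r surviving units, chosen at random, at this failure time.
        for _ in range(r):
            units.pop(rng.randrange(len(units)))
    return times, causes
```

For example, `simulate_progressive_competing(30, [10] + [0] * 19, 0.2, 0.4, 0.5, seed=1)` removes all ten censored units at the first failure and returns 20 ordered failure times with their causes.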

Maximum Likelihood Estimation
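This section is concerned with the classical estimation of the parameters of the EED based on two independent competing risks under progressive Type-II censored data. Substituting (1) and (2) in (7), one obtains the likelihood function of $\gamma$, $\sigma_1$ and $\sigma_2$, after neglecting the constants, as follows:

$$L(\gamma, \sigma_1, \sigma_2) \propto \gamma^{m} \sigma_1^{m_1} \sigma_2^{m_2} \prod_{i=1}^{m} \prod_{j=1}^{2} (1 + \sigma_j t_i)^{(\gamma - 1) I(\delta_i = j)} \exp\left\{-\sum_{i=1}^{m} (1 + R_i) \left[(1 + \sigma_1 t_i)^{\gamma} + (1 + \sigma_2 t_i)^{\gamma}\right]\right\}. \tag{8}$$

From (8), the log-likelihood function can be written as

$$\ell(\gamma, \sigma_1, \sigma_2) = m \ln \gamma + m_1 \ln \sigma_1 + m_2 \ln \sigma_2 + (\gamma - 1) \sum_{i=1}^{m} \sum_{j=1}^{2} I(\delta_i = j) \ln(1 + \sigma_j t_i) - \sum_{i=1}^{m} (1 + R_i) \left[(1 + \sigma_1 t_i)^{\gamma} + (1 + \sigma_2 t_i)^{\gamma}\right]. \tag{9}$$

The likelihood equations of $\gamma$, $\sigma_1$ and $\sigma_2$ are, respectively,

$$\frac{\partial \ell}{\partial \gamma} = \frac{m}{\gamma} + \sum_{i=1}^{m} \sum_{j=1}^{2} I(\delta_i = j) \ln(1 + \sigma_j t_i) - \sum_{i=1}^{m} (1 + R_i) \sum_{j=1}^{2} (1 + \sigma_j t_i)^{\gamma} \ln(1 + \sigma_j t_i), \tag{10}$$

$$\frac{\partial \ell}{\partial \sigma_1} = \frac{m_1}{\sigma_1} + (\gamma - 1) \sum_{i=1}^{m} \frac{I(\delta_i = 1)\, t_i}{1 + \sigma_1 t_i} - \gamma \sum_{i=1}^{m} (1 + R_i)\, t_i (1 + \sigma_1 t_i)^{\gamma - 1}, \tag{11}$$

and

$$\frac{\partial \ell}{\partial \sigma_2} = \frac{m_2}{\sigma_2} + (\gamma - 1) \sum_{i=1}^{m} \frac{I(\delta_i = 2)\, t_i}{1 + \sigma_2 t_i} - \gamma \sum_{i=1}^{m} (1 + R_i)\, t_i (1 + \sigma_2 t_i)^{\gamma - 1}. \tag{12}$$

In order to obtain the estimates $\hat{\gamma}$, $\hat{\sigma}_1$ and $\hat{\sigma}_2$, (10)-(12) are set to zero and then solved numerically.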
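A minimal Python sketch of the log-likelihood and its partial derivatives is given below, assuming the log-likelihood has the form implied by the EED density and the joint PDF (7), up to an additive constant. The function names and argument layout are hypothetical; a numerical root finder or optimizer of the reader's choice would then be applied to these functions to obtain the MLEs.

```python
import math


def log_likelihood(params, times, causes, scheme):
    """Log-likelihood (up to a constant) for params = (gamma, sigma1, sigma2).

    times  : observed failure times t_1 <= ... <= t_m
    causes : cause indicators delta_i in {1, 2}
    scheme : censoring numbers R_1, ..., R_m
    """
    gamma, s1, s2 = params
    sig = (s1, s2)
    m = len(times)
    m1 = sum(1 for d in causes if d == 1)
    ll = m * math.log(gamma) + m1 * math.log(s1) + (m - m1) * math.log(s2)
    for t, d, r in zip(times, causes, scheme):
        # Density part of the observed cause.
        ll += (gamma - 1.0) * math.log(1.0 + sig[d - 1] * t)
        # Survival part of both causes, for the failed unit and the R_i removed units.
        ll -= (1 + r) * ((1.0 + s1 * t) ** gamma + (1.0 + s2 * t) ** gamma)
    return ll


def score(params, times, causes, scheme):
    """Partial derivatives of the log-likelihood w.r.t. (gamma, sigma1, sigma2)."""
    gamma, s1, s2 = params
    sig = (s1, s2)
    d_gamma = len(times) / gamma
    d_sig = [sum(1 for d in causes if d == 1) / s1,
             sum(1 for d in causes if d == 2) / s2]
    for t, d, r in zip(times, causes, scheme):
        d_gamma += math.log(1.0 + sig[d - 1] * t)
        d_sig[d - 1] += (gamma - 1.0) * t / (1.0 + sig[d - 1] * t)
        for j in (0, 1):
            u = 1.0 + sig[j] * t
            d_gamma -= (1 + r) * u ** gamma * math.log(u)
            d_sig[j] -= (1 + r) * gamma * t * u ** (gamma - 1.0)
    return [d_gamma, d_sig[0], d_sig[1]]
```

Setting the three components returned by `score` to zero corresponds to solving the likelihood equations; comparing them with finite differences of `log_likelihood` is a quick sanity check before running any solver.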

Asymptotic Confidence Intervals
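In this subsection, the asymptotic CIs are constructed from the observed Fisher information matrix of the parameters $\gamma$, $\sigma_1$ and $\sigma_2$, which is given by:

$$I(\hat{\gamma}, \hat{\sigma}_1, \hat{\sigma}_2) = - \begin{pmatrix} \dfrac{\partial^2 \ell}{\partial \gamma^2} & \dfrac{\partial^2 \ell}{\partial \gamma \partial \sigma_1} & \dfrac{\partial^2 \ell}{\partial \gamma \partial \sigma_2} \\ \dfrac{\partial^2 \ell}{\partial \sigma_1 \partial \gamma} & \dfrac{\partial^2 \ell}{\partial \sigma_1^2} & \dfrac{\partial^2 \ell}{\partial \sigma_1 \partial \sigma_2} \\ \dfrac{\partial^2 \ell}{\partial \sigma_2 \partial \gamma} & \dfrac{\partial^2 \ell}{\partial \sigma_2 \partial \sigma_1} & \dfrac{\partial^2 \ell}{\partial \sigma_2^2} \end{pmatrix}_{(\gamma, \sigma_1, \sigma_2) = (\hat{\gamma}, \hat{\sigma}_1, \hat{\sigma}_2)}.$$

Now, the $100(1 - \alpha)\%$ two-sided asymptotic CIs for the parameters $\gamma$, $\sigma_1$ and $\sigma_2$ are obtained as follows:

$$\hat{\gamma} \pm z_{\alpha/2} \sqrt{V(\hat{\gamma})}, \qquad \hat{\sigma}_1 \pm z_{\alpha/2} \sqrt{V(\hat{\sigma}_1)}, \qquad \text{and} \qquad \hat{\sigma}_2 \pm z_{\alpha/2} \sqrt{V(\hat{\sigma}_2)},$$

where $V(\hat{\gamma})$, $V(\hat{\sigma}_1)$ and $V(\hat{\sigma}_2)$ are the estimated variances of $\hat{\gamma}$, $\hat{\sigma}_1$ and $\hat{\sigma}_2$, which are the diagonal elements of $I^{-1}(\hat{\gamma}, \hat{\sigma}_1, \hat{\sigma}_2)$, and $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.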
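The construction above can be sketched numerically in Python. The sketch below approximates the observed information by central finite differences of a user-supplied log-likelihood (an assumption made for brevity; analytic second derivatives could be used instead), inverts it, and forms the normal-approximation interval. All function names are hypothetical.

```python
from statistics import NormalDist


def observed_information(loglik, theta, h=1e-4):
    """Negative Hessian of loglik at theta, by central finite differences."""
    k = len(theta)
    info = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            def shifted(da, db):
                x = list(theta)
                x[a] += da
                x[b] += db
                return loglik(x)
            d2 = (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
            info[a][b] = -d2
    return info


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def wald_interval(estimate, variance, alpha=0.05):
    """Two-sided 100(1 - alpha)% interval: estimate +/- z_{alpha/2} * sqrt(variance)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * variance ** 0.5
    return estimate - half, estimate + half
```

In use, `loglik` would be the log-likelihood evaluated at the MLEs, and the $k$th interval would be `wald_interval(theta_hat[k], cov[k][k])` with `cov = invert(observed_information(loglik, theta_hat))`. Since the parameters are positive, a negative lower limit may be truncated at zero in practice.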

Bayesian Estimation
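In this section, the BEs for the parameters $\gamma$, $\sigma_1$ and $\sigma_2$ of the competing risks EED are derived based on progressive Type-II censoring under three different loss functions. The first is the squared error loss (SEL) function, which is symmetric and given by:

$$L_{\mathrm{SEL}}(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2,$$

where $\hat{\theta}$ is the estimate of $\theta$. The second is an asymmetric loss function, called the linear exponential (LINEX) loss function, defined as

$$L_{\mathrm{LINEX}}(\hat{\theta}, \theta) = e^{h(\hat{\theta} - \theta)} - h(\hat{\theta} - \theta) - 1, \quad h \neq 0.$$

The third is the generalized entropy (GE) loss function, which is also asymmetric and given by:

$$L_{\mathrm{GE}}(\hat{\theta}, \theta) \propto \left(\frac{\hat{\theta}}{\theta}\right)^{q} - q \ln\left(\frac{\hat{\theta}}{\theta}\right) - 1, \quad q \neq 0.$$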

Prior and Posterior Distributions
Assume that the parameters $\gamma$, $\sigma_1$ and $\sigma_2$ are independent random variables having gamma prior distributions as follows:

$$\pi_1(\sigma_1) \propto \sigma_1^{a_1 - 1} e^{-b_1 \sigma_1}, \qquad \pi_2(\sigma_2) \propto \sigma_2^{a_2 - 1} e^{-b_2 \sigma_2}, \qquad \pi_3(\gamma) \propto \gamma^{a_3 - 1} e^{-b_3 \gamma}, \tag{26}$$

where $a_i$ and $b_i$, $i = 1, 2, 3$, are the hyper-parameters of the priors. For any parameter $\theta$ with prior density Gamma($a, b$), the values of the hyper-parameters are obtained by solving the following two equations: (1) the mean of the gamma prior distribution equals the MLE of $\theta$, i.e., $a/b = \hat{\theta}_{ML}$; (2) the variance of the gamma prior equals a small value, say 0.05, i.e., $a/b^2 = 0.05$. This method for determining the hyper-parameter values was used by Nagy et al. [30].
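Solving the two moment equations gives $b = \hat{\theta}_{ML}/0.05$ and $a = \hat{\theta}_{ML}^2/0.05$. A short Python sketch, with a hypothetical function name, is:

```python
def gamma_hyperparameters(theta_ml, prior_variance=0.05):
    """Solve a/b = theta_ml and a/b**2 = prior_variance for (a, b)."""
    b = theta_ml / prior_variance   # rate: mean divided by variance
    a = theta_ml * b                # shape: mean squared divided by variance
    return a, b
```

For instance, a prior mean of 0.4 with variance 0.05 yields approximately $(a, b) = (3.2, 8)$.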
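From (8) and (26), the joint posterior distribution of the parameters is given by:

$$\pi^{*}(\gamma, \sigma_1, \sigma_2 \mid \mathbf{t}) \propto L(\gamma, \sigma_1, \sigma_2)\, \sigma_1^{a_1 - 1} \sigma_2^{a_2 - 1} \gamma^{a_3 - 1} \exp\{-(b_1 \sigma_1 + b_2 \sigma_2 + b_3 \gamma)\}. \tag{27}$$

Now, the BE of any function $g(\gamma, \sigma_1, \sigma_2)$ under the SEL function is the posterior mean of $g(\gamma, \sigma_1, \sigma_2)$, which is given by:

$$\hat{g}_{\mathrm{SEL}} = E[g \mid \mathbf{t}] = \int_0^{\infty} \int_0^{\infty} \int_0^{\infty} g(\gamma, \sigma_1, \sigma_2)\, \pi^{*}(\gamma, \sigma_1, \sigma_2 \mid \mathbf{t})\, d\gamma\, d\sigma_1\, d\sigma_2. \tag{28}$$

The BE under the LINEX loss function is given by:

$$\hat{g}_{\mathrm{LINEX}} = -\frac{1}{h} \ln E\left[e^{-h g} \mid \mathbf{t}\right], \quad h \neq 0, \tag{29}$$

while the BE under the GE loss function is given by:

$$\hat{g}_{\mathrm{GE}} = \left(E\left[g^{-q} \mid \mathbf{t}\right]\right)^{-1/q}, \quad q \neq 0. \tag{30}$$

Unfortunately, the BEs in (28)-(30) cannot be obtained in closed form, so approximation methods are required. The MCMC approximation method and the Tierney-Kadane method are used in the following two subsections to approximate the BEs.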
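Once posterior draws of a parameter are available (for example from an MCMC run), the three Bayes estimates can be approximated by sample averages. The following Python sketch assumes the standard forms of the estimators under the SEL, LINEX and GE losses (posterior mean, $-\frac{1}{h}\ln E[e^{-h\theta}]$ and $(E[\theta^{-q}])^{-1/q}$); the function names are hypothetical.

```python
import math


def bayes_sel(draws):
    """Bayes estimate under squared error loss: the posterior mean."""
    return sum(draws) / len(draws)


def bayes_linex(draws, h):
    """Bayes estimate under LINEX loss: -(1/h) * ln E[exp(-h * theta)], h != 0."""
    return -math.log(sum(math.exp(-h * x) for x in draws) / len(draws)) / h


def bayes_ge(draws, q):
    """Bayes estimate under generalized entropy loss: (E[theta^(-q)])^(-1/q), q != 0."""
    return (sum(x ** (-q) for x in draws) / len(draws)) ** (-1.0 / q)
```

With `draws` holding the post-burn-in values of one parameter, a positive `h` pulls the LINEX estimate below the posterior mean and a negative `h` pushes it above, which reflects the asymmetry of the loss.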

Real Data Applications
In this section, two real data sets are analyzed under the competing risks model to apply and illustrate the proposed estimation methods under progressive Type-II censoring. Furthermore, these two real data sets are used to show that the EED can be a possible alternative to well-known distributions such as the Weibull distribution.

Application 1
To illustrate the estimation techniques and to compare the EED with the Weibull distribution, we consider a real data set originally reported by Xia et al. [33] from an experiment on the breaking strength of jute fibres. In this experiment, the breaking strength failures of the jute fibres are attributed to two different gauge lengths, 10 mm and 20 mm. From the original data [33], we generate progressive Type-II censored data with $n = 60$ and $m = 40$; these progressively censored data, together with their causes of failure, are listed in Table 1.
In the remainder of this subsection, we use the data in Table 1 to compare the EED with the Weibull distribution and to illustrate the different estimation methods. First, the modified Kolmogorov-Smirnov (MK-S) test, which was suggested by Pakyari and Balakrishnan [34] for progressive Type-II censored data, is performed to test the fit of the EED and the Weibull distribution to the data set in Table 1. The goodness-of-fit test requires estimating the unknown parameters of the suggested distributions. Therefore, the unknown parameters of the EED and the Weibull distribution are estimated using the ML method, and these estimates are presented in Table 2. Table 2 also contains the value of the MK-S test statistic and the corresponding p-value for both the EED and the Weibull distribution. From Table 2, we can see that the EED provides a reasonable fit to the considered data set under progressive Type-II censoring, while the Weibull distribution does not fit this data set because the corresponding p-value is less than 0.05.

Application 2
In this application, the data under study were introduced by King [35] and then presented in Nelson [36] and Crowder [8]. These data represent the breaking strengths of 23 wire connections. There are two competing causes of failure: breakage along the wire itself and breakage at the bonded end. The complete data consist of 23 failure times, of which 13 failures are due to breakage of the bond and 10 failures are due to breakage of the wire itself. In this example, we generate a progressive Type-II censored sample from the complete data with $m = 18$. These progressive Type-II competing risks data are presented in Table 5. The measurements appear to be rounded to the nearest 50 mg, and the two zeros, as Nelson [36] pointed out, must be faulty bonds.
The MLEs and the MK-S test are used to check whether the EED fits the data in Table 5. The value of the MK-S test statistic for the data in Table 5 is 0.3263 with p-value = 0.087. This means that the EED provides an adequate fit for this set of real-life data under progressive Type-II censoring. The Weibull distribution is not appropriate for the data in Table 5. Moreover, the Weibull distribution cannot be fitted to data sets of the type in Table 5. This is due to the nature of the Weibull log-likelihood function, which contains the term $(\gamma - 1) \sum \ln t_i$ taken over the observed failure times, where $\gamma$ denotes the Weibull shape parameter. For the data in Table 5, the existence of zeros makes this term infinite for every $\gamma < 1$. Thus, the MLE of $\gamma$ is non-unique, and the Weibull distribution cannot be fitted to these data. See Nadarajah and Haghighi [14] for more information on this topic.
Tables 6 and 7 display, respectively, the different point and interval estimates for the parameters $\gamma$, $\sigma_1$ and $\sigma_2$ based on the progressive Type-II censored real data in Table 5. For the EM algorithm, we use the MLEs of $\gamma$, $\sigma_1$ and $\sigma_2$ as starting values. The BEs for the informative priors case are obtained using the following values of the hyper-parameters: $(a_1, b_1) = (2.514 \times 10^{-7}, 0.0022)$, $(a_2, b_2) = (1.608 \times 10^{-7}, 0.0017)$ and $(a_3, b_3) = (155.028, 55.68)$.

Simulation Study
A Monte Carlo simulation study is conducted to assess the performance of the different estimation methods discussed in the previous sections. The point estimates are compared through two accuracy criteria, namely the mean square error (MSE) and the relative absolute bias (RAB). Moreover, the suggested CIs are compared in terms of their average widths (AWs) and coverage probabilities (CPs). We begin the simulation study with the true parameter values $\gamma = 0.2$, $\sigma_1 = 0.4$ and $\sigma_2 = 0.5$. Next, 1000 progressive Type-II censored samples are generated using the algorithm of Balakrishnan and Aggarwala [12] with $(n, m) = (30, 20), (50, 40), (60, 50), (80, 60)$ and the following three censoring schemes:
Scheme I: Scheme II: Scheme III:
The number of failures due to cause 1 ($m_1 < m$) is a random variable that has a binomial distribution with parameters $m$ and $p$, where $p$ is the relative risk rate, i.e., the probability that the failure of the $i$th unit is due to cause 1. It is defined as follows:

$$p = P(T_{i1} < T_{i2}) = \int_0^{\infty} f_1(t)\, \bar{F}_2(t)\, dt.$$

For the EM algorithm, we use $\gamma^{(0)} = 0.09$, $\sigma_1^{(0)} = 0.4$ and $\sigma_2^{(0)} = 0.5$ as starting values. The hyper-parameters of the informative priors are chosen to be $(a_1, b_1) = (3.2, 8)$, $(a_2, b_2) = (5, 10)$ and $(a_3, b_3) = (0.8, 4)$. The hyper-parameter of the LINEX loss function is chosen to be $h = -0.5$ and $0.5$, while the hyper-parameter of the GE loss is chosen to be $q = 1.1$. The MCMC algorithm is run for 11,000 iterations, with the first 1000 discarded as the burn-in period.
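Because the two hazards are not proportional when $\sigma_1 \neq \sigma_2$, the relative risk rate $p = P(T_{i1} < T_{i2})$ has no simple ratio form and can be evaluated numerically. The Python sketch below uses the identity $p = \int_0^1 \bar{F}_2(Q_1(u))\, du$, where $Q_1$ is the quantile function of cause 1, with a midpoint rule; the function name and the grid size are assumptions made for illustration.

```python
import math


def relative_risk(gamma, sigma1, sigma2, grid=20000):
    """Approximate p = P(T1 < T2) for independent EED(gamma, sigma1), EED(gamma, sigma2)."""
    total = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid  # midpoint of the k-th subinterval of (0, 1)
        # Quantile of cause 1 at probability u.
        t = ((1.0 - math.log(1.0 - u)) ** (1.0 / gamma) - 1.0) / sigma1
        # Survival function of cause 2 evaluated at that time.
        total += math.exp(1.0 - (1.0 + sigma2 * t) ** gamma)
    return total / grid
```

When $\sigma_1 = \sigma_2$ the function returns 0.5, as expected by symmetry, and for $\sigma_1 < \sigma_2$ it returns a value below 0.5 because cause 2 then tends to act earlier.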
Table 8 shows the RAB and MSE for the different estimates of the parameters with different sample sizes and different CSs. Table 9 shows the AWs and CPs for the 95% asymptotic CIs and for the credible CIs under informative and non-informative priors. From the simulation study results, we can see the following:
• The accuracy of the different point and interval estimates increases as the sample size increases.
• Using the EM algorithm improves the accuracy of the estimates compared to the ML method.
• BEs under informative priors are more accurate than MLEs according to the two accuracy criteria, RAB and MSE.
• BEs under non-informative priors are less accurate than those using informative priors.
• In some cases, the MLEs are more accurate than the BEs under non-informative priors. This illustrates the importance of choosing the prior distribution of the unknown parameters.
• We do not notice an absolute preference for one of the suggested Bayesian methods over the other, except that the BE of the shape parameter using the MCMC method under the generalized entropy loss function has the lowest MSE and RAB.
• Using a censoring scheme in which the censoring occurs at the beginning of the experiment may increase the accuracy of the estimates.
• The average width of the different CIs decreases as the sample size increases.
• The credible CIs under the informative prior have the smallest average width compared to the asymptotic CIs and the credible CIs under the non-informative prior.
• For the different types of CIs, the average width for the shape parameter, $\gamma$, is always smaller than the corresponding average widths for the scale parameters, $\sigma_1$ and $\sigma_2$.
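The comparison criteria used in this section can be computed from the replicated estimates and intervals as in the following Python sketch. The exact formulas are not stated in the text, so the sketch assumes the common definitions $\mathrm{MSE} = \frac{1}{N}\sum_k (\hat{\theta}_k - \theta)^2$ and $\mathrm{RAB} = \frac{1}{N}\sum_k |\hat{\theta}_k - \theta| / \theta$; the function names are hypothetical.

```python
def mse(estimates, true_value):
    """Mean square error over the simulation replicates."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def rab(estimates, true_value):
    """Relative absolute bias: mean of |estimate - true| / true (assumed definition)."""
    return sum(abs(e - true_value) for e in estimates) / (len(estimates) * true_value)


def average_width(intervals):
    """Average width of a list of (lower, upper) intervals."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


def coverage_probability(intervals, true_value):
    """Proportion of intervals that contain the true parameter value."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    return hits / len(intervals)
```

Each function takes the results of all replicates for one parameter and one estimation method, so one cell of Table 8 or Table 9 corresponds to one call.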

Conclusions
Competing risks are often overlooked in survival and reliability analysis, which can lead to incorrect and inaccurate decisions. In this study, we have investigated progressive Type-II censoring in the presence of competing risks. We have supposed that the latent failure times under the competing risks follow independent EEDs with the same shape parameter and different scale parameters. In this setting, we have estimated the unknown parameters using the ML method and the EM algorithm. The parameters were also estimated using the Bayesian approach via the TK technique and the MCMC algorithm, under non-informative and informative priors. Furthermore, the asymptotic and credible CIs for the unknown parameters were constructed. The suggested estimation methods were compared through a comprehensive simulation study. From the results of the simulation study, we have noted that the BEs under informative priors are more accurate than the corresponding BEs under non-informative priors and the classical estimates in terms of both RAB and MSE. The same observation holds for the interval estimates. Moreover, two real examples were analyzed to demonstrate the suggested methods and to illustrate that the EED can be an appropriate alternative to the Weibull distribution.

Table 2. Results of the MK-S test for the EED and the Weibull distribution fitted to the jute fibre data.

Table 3. MLEs and BEs for the parameters of the EED for Application 1.

Table 6. MLEs and BEs for the parameters of the EED for Application 2.

Table 8. Values of RAB and MSE for the classical and Bayesian estimates of the distribution parameters.

Table 9. Values of AWs and CPs for the asymptotic CI (Asym. CI) and for the credible CIs under the informative prior (Cred. CI (IP)) and the non-informative prior (Cred. CI (NIP)).