Statistical Inference on a Finite Mixture of Exponentiated Kumaraswamy-G Distributions with Progressive Type II Censoring Using Bladder Cancer Data

Abstract: A new family of distributions called the mixture of the exponentiated Kumaraswamy-G (henceforth, in short, ExpKum-G) class is developed. We consider the Weibull distribution as the baseline (G) distribution to propose and study a special sub-model, which we call the exponentiated Kumaraswamy Weibull distribution. Several useful statistical properties of the proposed ExpKum-G distribution are derived. Under the classical paradigm, we consider maximum likelihood estimation of the model parameters under progressive type II censoring. Under the Bayesian paradigm, independent gamma priors are proposed to estimate the model parameters from progressive type II censored samples, assuming several loss functions. A simulation study is carried out to illustrate the efficiency of the proposed estimation strategies under both the classical and Bayesian paradigms, based on progressive type II censoring models. For illustrative purposes, a real data set is analyzed, which shows that the proposed model in the new class provides a better fit than other types of finite mixtures of exponentiated Kumaraswamy-type models.


Introduction
The utility of mixture distributions during the last decade or so has provided a mathematically based strategy to model a wide range of random phenomena effectively. Statistically speaking, mixture distributions are a useful tool and offer greater flexibility to analyze and interpret random events in a possibly heterogeneous population. In modeling real-life data, it is quite common to observe that the data have come from a mixture population consisting of two or more distributions. One may find ample evidence of applications of finite mixture models in medicine, economics, psychology, survival data analysis, censored data analysis and reliability, among others. In this article, we explore such a finite mixture model based on a bounded (on (0,1)) univariate continuous distribution mixed with another baseline (G) continuous distribution, and study its structural properties with some applications. The pdf of the exponentiated Kumaraswamy-G (EKG) distribution with baseline pdf g and cdf G is given by
$$f(x) = a\,b\,c\,g(x)\,G(x)^{a-1}\,[1 - G(x)^{a}]^{b-1}\,\{1 - [1 - G(x)^{a}]^{b}\}^{c-1}, \qquad (1)$$
where a, b, c are all positive parameters and x > 0. The associated cumulative distribution function (cdf) is given by
$$F(x) = \{1 - [1 - G(x)^{a}]^{b}\}^{c}. \qquad (2)$$
If u ∈ (0, 1), the associated quantile function is given by
$$x_u = G^{-1}\big(\{1 - [1 - u^{1/c}]^{1/b}\}^{1/a}\big).$$
In this paper, we consider a finite mixture of two independent EKW distributions with mixing weights and consider an absolutely continuous probability model, namely the two-parameter Weibull, as the baseline model.
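As a numerical companion to the pdf, cdf and quantile function described above, the following minimal Python sketch evaluates them for a Weibull baseline. It assumes the usual EKG form F(x) = {1 − [1 − G(x)^a]^b}^c and the scale-shape Weibull parameterization G(x) = 1 − exp(−(x/s)^r); the function names, the parameter values and this parameterization are illustrative assumptions rather than the paper's own code.

```python
import math

def weibull_cdf(x, s, r):
    """Assumed baseline cdf: G(x) = 1 - exp(-(x/s)^r), x > 0."""
    return 1.0 - math.exp(-((x / s) ** r))

def weibull_pdf(x, s, r):
    """Baseline pdf matching weibull_cdf (scale s, shape r)."""
    return (r / s) * (x / s) ** (r - 1) * math.exp(-((x / s) ** r))

def ekw_cdf(x, a, b, c, s, r):
    """EKG cdf with Weibull baseline: {1 - [1 - G^a]^b}^c."""
    G = weibull_cdf(x, s, r)
    return (1.0 - (1.0 - G ** a) ** b) ** c

def ekw_pdf(x, a, b, c, s, r):
    """Derivative of ekw_cdf with respect to x."""
    G = weibull_cdf(x, s, r)
    inner = 1.0 - G ** a
    return (a * b * c * weibull_pdf(x, s, r) * G ** (a - 1)
            * inner ** (b - 1) * (1.0 - inner ** b) ** (c - 1))

def ekw_quantile(u, a, b, c, s, r):
    """Invert ekw_cdf in closed form, then invert the Weibull cdf."""
    G = (1.0 - (1.0 - u ** (1.0 / c)) ** (1.0 / b)) ** (1.0 / a)
    return s * (-math.log(1.0 - G)) ** (1.0 / r)

# Hypothetical parameter values, chosen only for illustration.
params = dict(a=2.0, b=1.5, c=0.8, s=3.0, r=1.2)
median = ekw_quantile(0.5, **params)
print(median, ekw_cdf(median, **params))  # second value should be close to 0.5
```

The round trip through `ekw_quantile` and `ekw_cdf` is a quick consistency check on the three formulas.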
The rest of this article is organized as follows. In Section 2, we provide the mathematical description of the proposed model. In Section 3, some useful structural properties of the proposed model are discussed. The maximum likelihood function of the mixture exponentiated Kumaraswamy-G distribution based on progressively type II censoring is given in Section 4. Section 5 deals with the specific distribution of the mixture of exponentiated Kumaraswamy-G distribution when the baseline (G) is a two parameter Weibull, henceforth known as EKW distribution. In Section 6, we provide a general framework for the Bayes estimation of the vector of the parameters and the posterior risk under different loss functions of the exponentiated Kumaraswamy-G distribution. In Section 7, we consider the estimation of the EKW distribution under both the classical and Bayesian paradigms via a simulation study and under various censoring schemes. For illustrative purposes, an application of the EKW distribution is shown by applying the model to bladder cancer data in Section 8. Finally, some concluding remarks are presented in Section 9.

Model Description
The density function of a mixture of two EKG component densities with mixing proportions p ∈ [0, 1] and q = 1 − p is given as follows. For x > 0, with $a_j, b_j, c_j > 0$ and j = 1, 2, the pdf of the j-th component is
$$f_j(x) = a_j b_j c_j\, g(x)\, G(x)^{a_j-1}\,[1 - G(x)^{a_j}]^{b_j-1}\,\{1 - [1 - G(x)^{a_j}]^{b_j}\}^{c_j-1},$$
and the pdf of the mixture of the two EKG distributions is given by
$$f(x) = p\,f_1(x) + q\,f_2(x), \qquad (3)$$
meaning the associated cdf of the distribution is
$$F(x) = p\,F_1(x) + q\,F_2(x). \qquad (4)$$
The component-wise cdf can be obtained as
$$F_j(x) = \{1 - [1 - G(x)^{a_j}]^{b_j}\}^{c_j}, \qquad j = 1, 2.$$
For the density in Equation (3), $(a_1, b_1)$ and $(a_2, b_2)$ all play the role of shape parameters. Consequently, for varying choices of $a_1$, $b_1$, $a_2$ and $b_2$, one may obtain various possible shapes of the pdf, as well as of the hrf.

Structural Properties
We begin this section by discussing the asymptotes and shapes of the proposed mixture model in Equation (3).

• Result 1: Shapes. The cdf in Equation (3) can be obtained analytically. The critical points of the pdf are the roots of Equation (5). There may be more than one root of Equation (5). If x = x* is a root of the equation, it corresponds to a local maximum, a local minimum or a point of inflexion, depending on the sign of the second derivative at x*.
A random variable Y is said to have the exponentiated-G distribution with parameter a > 0, written Y ∼ Exp-G(a), if its pdf and cdf are given by $f(y) = a\,g(y)\,G^{a-1}(y)$ and $F(y) = G^{a}(y)$, as shown in [6,7].
Applying the generalized binomial expansion to each component density yields a series representation of the mixture pdf. Note that if $b_1$, $c_1$, $b_2$, $c_2$ are integers, then the respective sums stop at $b_1$, $c_1$, $b_2$ and $c_2$. The resulting expression shows that the pdf of the finite mixture of EKG distributions can be represented as a mixture of infinitely many exponentiated-G densities with parameters $a_1(j_2 + 1)$ and $a_2(j_2 + 1)$, respectively. Therefore, structural properties of this model, such as moments and entropy, can be obtained from the knowledge of the exponentiated-G distribution, and one can refer to [8] for some pertinent details.
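One way to reach such a representation is sketched here for a single component with parameters (a, b, c), under the assumed EKG density $f(x) = abc\,g(x)\,G^{a-1}(1 - G^{a})^{b-1}[1 - (1 - G^{a})^{b}]^{c-1}$. The index names $j_1$, $j_2$ follow the text; the weights $w_{j_1,j_2}$ are notation introduced for this sketch only. The binomial series is applied twice, and each term of the final sum is an Exp-G density with parameter $a(j_2 + 1)$.

```latex
\begin{align*}
\bigl[1-(1-G^{a})^{b}\bigr]^{c-1}
  &= \sum_{j_1=0}^{\infty} (-1)^{j_1}\binom{c-1}{j_1}\,(1-G^{a})^{b j_1},\\
(1-G^{a})^{b(j_1+1)-1}
  &= \sum_{j_2=0}^{\infty} (-1)^{j_2}\binom{b(j_1+1)-1}{j_2}\,G^{a j_2},\\
f(x) &= \sum_{j_1=0}^{\infty}\sum_{j_2=0}^{\infty} w_{j_1,j_2}\;
        a(j_2+1)\,g(x)\,G(x)^{a(j_2+1)-1},\\
w_{j_1,j_2} &= \frac{(-1)^{j_1+j_2}\,b\,c}{j_2+1}
        \binom{c-1}{j_1}\binom{b(j_1+1)-1}{j_2}.
\end{align*}
```

For the two-component mixture, the same expansion is applied to each component and the two series are weighted by p and q.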
Random samples from the target density f can then be generated by an acceptance-rejection scheme, whose final step is as follows: (iii) Accept X = x as a sample from the target density if y < f(x). If y ≥ f(x), return to step (ii).
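Since steps (i) and (ii) are not shown above, the following Python sketch fills them in with one common variant of acceptance-rejection and should be read as an assumption, not as the authors' exact scheme. It uses the Weibull baseline as the proposal, which is valid when a, b, c ≥ 1 because then f(x) ≤ abc·g(x). The Weibull parameterization, the function names and the parameter values are hypothetical.

```python
import math
import random

def weibull_pdf(x, s, r):
    return (r / s) * (x / s) ** (r - 1) * math.exp(-((x / s) ** r))

def weibull_cdf(x, s, r):
    return 1.0 - math.exp(-((x / s) ** r))

def ekw_pdf(x, a, b, c, s, r):
    G = weibull_cdf(x, s, r)
    inner = 1.0 - G ** a
    return (a * b * c * weibull_pdf(x, s, r) * G ** (a - 1)
            * inner ** (b - 1) * (1.0 - inner ** b) ** (c - 1))

def ekw_cdf(x, a, b, c, s, r):
    G = weibull_cdf(x, s, r)
    return (1.0 - (1.0 - G ** a) ** b) ** c

def sample_ekw(a, b, c, s, r, rng):
    """Acceptance-rejection with envelope M*g(x), M = a*b*c (needs a, b, c >= 1)."""
    if min(a, b, c) < 1.0:
        raise ValueError("this envelope is only valid for a, b, c >= 1")
    M = a * b * c
    while True:
        x = rng.weibullvariate(s, r)                    # propose x from the baseline
        y = rng.uniform(0.0, M * weibull_pdf(x, s, r))  # uniform height under the envelope
        if y < ekw_pdf(x, a, b, c, s, r):               # step (iii): accept, else retry
            return x

def sample_mixture(p, comp1, comp2, rng):
    """Pick a component with probability p or 1 - p, then sample from it."""
    comp = comp1 if rng.random() < p else comp2
    return sample_ekw(*comp, rng)

rng = random.Random(3)
comp1 = (2.0, 1.5, 1.2, 3.0, 1.2)   # (a, b, c, s, r), illustrative values
comp2 = (1.0, 2.0, 1.0, 8.0, 2.0)
draws = [sample_mixture(0.4, comp1, comp2, rng) for _ in range(1000)]
print(sum(draws) / len(draws))
```

The expected acceptance rate for a component is 1/(abc), so this envelope becomes inefficient when the shape parameters are large.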
The reliability function of the mixture EKG model takes the form $R(x) = 1 - F(x) = p\,R_1(x) + q\,R_2(x)$, where the component-wise reliability function is $R_j(x) = 1 - F_j(x)$, j = 1, 2. The density in Equation (1) is flexible in the sense that one can obtain different shapes of the hazard rate function (hrf) of the mixture model, which is given by $h(x) = f(x)/R(x)$. The quantile function $x_u$ of the mixture model is the solution of $F(x_u) = u$ for u ∈ (0, 1); for example, the median, $x_m$, of f(x) is obtained for u = 0.5. The various shapes of the pdf and the hrf when the baseline distribution (G) is Weibull are provided in Figure 1. In the next section, we discuss the maximum likelihood estimation strategy for the finite mixture of exponentiated Kumaraswamy-G (EKG) distributions under the progressive type-II censoring scheme. For more details, one can refer to [9]. The necessary and sufficient conditions for identifiability are discussed in Appendix A.
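The reliability, hazard and quantile functions of the mixture can be evaluated numerically. The sketch below assumes a Weibull baseline with G(x) = 1 − exp(−(x/s)^r) and solves F(x) = u by bisection, since the mixture cdf has no closed-form inverse in general. All names and parameter values are illustrative assumptions.

```python
import math

def ekw_cdf(x, a, b, c, s, r):
    G = 1.0 - math.exp(-((x / s) ** r))
    return (1.0 - (1.0 - G ** a) ** b) ** c

def ekw_pdf(x, a, b, c, s, r):
    G = 1.0 - math.exp(-((x / s) ** r))
    g = (r / s) * (x / s) ** (r - 1) * math.exp(-((x / s) ** r))
    inner = 1.0 - G ** a
    return a * b * c * g * G ** (a - 1) * inner ** (b - 1) * (1.0 - inner ** b) ** (c - 1)

def mix_cdf(x, p, th1, th2):
    return p * ekw_cdf(x, *th1) + (1.0 - p) * ekw_cdf(x, *th2)

def mix_pdf(x, p, th1, th2):
    return p * ekw_pdf(x, *th1) + (1.0 - p) * ekw_pdf(x, *th2)

def mix_hazard(x, p, th1, th2):
    """h(x) = f(x) / R(x) with R(x) = 1 - F(x)."""
    return mix_pdf(x, p, th1, th2) / (1.0 - mix_cdf(x, p, th1, th2))

def mix_quantile(u, p, th1, th2):
    """Solve F(x) = u by bisection on an expanding bracket [0, hi]."""
    lo, hi = 0.0, 1.0
    while mix_cdf(hi, p, th1, th2) < u:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mix_cdf(mid, p, th1, th2) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

th1 = (2.0, 1.5, 1.2, 3.0, 1.2)   # (a, b, c, s, r), hypothetical values
th2 = (1.0, 2.0, 1.0, 8.0, 2.0)
x_m = mix_quantile(0.5, 0.4, th1, th2)
print(x_m, mix_hazard(x_m, 0.4, th1, th2))
```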

Maximum Likelihood Estimation of EKG Distribution under Progressive Type-II Censoring
Suppose that n units are put on life test at time zero and the experimenter decides beforehand the quantity m, the number of failures to be observed. At the time of the first failure, $R_1$ units are randomly removed from the remaining n − 1 surviving units. At the second failure, $R_2$ units are randomly removed from the remaining $n - 2 - R_1$ units. The test continues until the m-th failure. At this time, all remaining $R_m = n - m - R_1 - R_2 - \ldots - R_{m-1}$ units are removed. In this censoring scheme, the $R_i$ and m are prefixed. The resulting m ordered values, which are obtained as a consequence of this type of censoring, are appropriately referred to as progressive type II censored order statistics. Note that if $R_1 = R_2 = \ldots = R_{m-1} = 0$, so that $R_m = n - m$, this scheme reduces to the conventional type II one-stage right censoring scheme.
Note also that if $R_1 = R_2 = \ldots = R_m = 0$, so that m = n, the progressive type II censoring scheme reduces to the case of a complete sample (the case of no censoring).
Let $(X_{1:m:n}, X_{2:m:n}, \ldots, X_{m:m:n})$ be a progressively type II censored sample, with $(R_1, R_2, \ldots, R_m)$ being the progressive censoring scheme. The likelihood function based on the progressively censored sample from the mixture of EKG distributions is given by
$$L = C \prod_{i=1}^{m} f(x_{i:m:n})\,[1 - F(x_{i:m:n})]^{R_i}, \qquad C = n(n - 1 - R_1)(n - 2 - R_1 - R_2)\cdots\Big(n - m + 1 - \sum_{i=1}^{m-1} R_i\Big),$$
where f(x) and F(x) are given in Equations (3) and (4). Taking the logarithm of the likelihood function and dropping the constant term, the log-likelihood function is given by
$$\ell = \sum_{i=1}^{m} \ln\big[p\,f_1(x_{i:m:n}) + q\,f_2(x_{i:m:n})\big] + \sum_{i=1}^{m} R_i \ln\big[1 - p\,F_1(x_{i:m:n}) - q\,F_2(x_{i:m:n})\big],$$
where $f_j$ and $F_j$ denote the pdf and cdf of the j-th component. Next, for illustrative purposes, we take the baseline (G) distribution of the EKG distribution to be a two-parameter Weibull distribution and discuss its estimation under both the classical and Bayesian setups.
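The censoring mechanism and the log-likelihood can be sketched in a few lines of Python. The sampler below mimics the physical experiment (observe the smallest survivor, then withdraw R_i survivors at random), and the log-likelihood uses the standard progressive type II form without the constant C. For brevity the demonstration uses a unit exponential lifetime, which is a stand-in and not the EKG model; the function names are illustrative.

```python
import math
import random

def progressive_type2_sample(lifetimes, scheme, rng):
    """Return the m observed failure times under the removal scheme (R_1, ..., R_m)."""
    n, m = len(lifetimes), len(scheme)
    if m + sum(scheme) != n:
        raise ValueError("scheme must satisfy m + sum(R) = n")
    alive = sorted(lifetimes)
    observed = []
    for r in scheme:
        observed.append(alive.pop(0))             # next failure is the smallest survivor
        for _ in range(r):                        # withdraw r survivors at random
            alive.pop(rng.randrange(len(alive)))
    return observed

def progressive_loglik(x, scheme, pdf, cdf):
    """sum_i [ log f(x_i) + R_i * log(1 - F(x_i)) ], constant term omitted."""
    return sum(math.log(pdf(xi)) + ri * math.log(1.0 - cdf(xi))
               for xi, ri in zip(x, scheme))

rng = random.Random(1)
scheme = [2, 0, 1, 0, 2]                          # m = 5 failures, n = 5 + 5 = 10 units
lifetimes = [rng.expovariate(1.0) for _ in range(10)]
x = progressive_type2_sample(lifetimes, scheme, rng)
ll = progressive_loglik(x, scheme,
                        pdf=lambda t: math.exp(-t),
                        cdf=lambda t: 1.0 - math.exp(-t))
print(x, ll)
```

To use this with the mixture model, pass the mixture pdf and cdf as the `pdf` and `cdf` callables and maximize over the parameters with a numerical optimizer.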

Finite Mixture of Exponentiated Kumaraswamy Weibull Distribution
The exponentiated Kumaraswamy Weibull (EKW) distribution is a special case of the exponentiated Kumaraswamy-G family. The EKW distribution is obtained by taking G(x) in Equation (1) to be the cdf of the Weibull distribution. One of the most important advantages of the EKW distribution is its capacity to fit data sets with a variety of shapes, including censored data, compared to the component distributions. Let G be the Weibull distribution with pdf g(x), cdf G(x) and inverse cdf $G^{-1}(u)$. The pdf of a mixture of two component densities of the exponentiated Kumaraswamy Weibull distribution with mixing proportions p and q = 1 − p (henceforth, in short, MEKW) is given by Equation (6). For the pdf in Equation (6), the following is noted: (i) $s_1$ and $s_2$ are the scale parameters and $r_1$ and $r_2$ are the shape parameters of the Weibull components; (ii) $a_1$, $a_2$, $b_1$ and $b_2$ are the shape parameters arising from the finite mixture pdf in Equation (4); (iii) p and q are the mixing proportions, where p + q = 1.
Depending on the values of the parameters, different shapes of the pdf and the hrf of the MEKW distribution are shown in Figure 1. From Figure 1 (left panel), it appears that the MEKW pdf can be symmetric, asymmetric, right-skewed or decreasing, depending on the values of the parameters. From Figure 1 (right panel), one can observe that the hrf may be constant, first decreasing and then increasing, or increasing.
The associated cdf, hazard rate function h(x) and quantile function of the MEKW model follow from the corresponding expressions for the mixture EKG model with the Weibull baseline. The hazard rate function is flexible, as it allows for different shapes. Next, by using a quantile function-based formula for skewness and kurtosis, we plot the coefficients of skewness and kurtosis for the MEKW distribution for different values of the parameters, as shown in Figure 2. From Figure 2, one can observe that the distribution can be positively skewed or negatively skewed, and could also assume platykurtic and mesokurtic shapes. In the next section, we discuss a strategy of estimating parameters for the EKG model under the Bayesian paradigm using independent gamma priors.
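The text does not state which quantile-based measures are used. Two common choices are Bowley's skewness (based on quartiles) and Moors' kurtosis (based on octiles), and the sketch below computes both from any quantile function Q. Using these two measures is an assumption, as are the single-component EKW quantile function with baseline G(x) = 1 − exp(−(x/s)^r) and the parameter values.

```python
import math

def bowley_skewness(Q):
    """(Q3 + Q1 - 2*Q2) / (Q3 - Q1), computed from a quantile function Q(u)."""
    return (Q(0.75) + Q(0.25) - 2.0 * Q(0.5)) / (Q(0.75) - Q(0.25))

def moors_kurtosis(Q):
    """[(E7 - E5) + (E3 - E1)] / (E6 - E2), where E_k = Q(k/8) are octiles."""
    return ((Q(7 / 8) - Q(5 / 8)) + (Q(3 / 8) - Q(1 / 8))) / (Q(6 / 8) - Q(2 / 8))

def ekw_quantile(u, a, b, c, s, r):
    """Closed-form quantile of one EKW component (assumed parameterization)."""
    G = (1.0 - (1.0 - u ** (1.0 / c)) ** (1.0 / b)) ** (1.0 / a)
    return s * (-math.log(1.0 - G)) ** (1.0 / r)

# Hypothetical parameter values for a single component.
Q = lambda u: ekw_quantile(u, 2.0, 1.5, 0.8, 3.0, 1.2)
print(bowley_skewness(Q), moors_kurtosis(Q))
```

For the two-component mixture, Q would be replaced by a numerical inverse of the mixture cdf, since no closed form is available in general.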

Bayesian Estimation Using Gamma Priors for the Finite Mixture of Exponentiated Kumaraswamy-G Family
In this section, we consider the Bayes estimates of the model parameters, obtained under the assumption that the components of the random vector $\Phi = (a_j, b_j, c_j, s_j, r_j, p, q)$, for j = 1, 2, have independent gamma priors with hyperparameters $a_k$ and $∅_k$, k = 1, 2, ..., 7. By multiplying the likelihood function by the joint prior density of the vector Φ, we obtain the joint posterior density of Φ, given the data, up to a normalizing constant. Marginal posterior distributions of the components of Φ can be obtained by integrating out the nuisance parameters. Next, we consider the loss functions that will be used to derive the estimators from the marginal posterior distributions.

Bayes Estimation of the Vector of Parameters and Evaluation of Posterior Risk under Different Loss Functions
This section derives the Bayes estimators (BEs) under different loss functions and their respective posterior risks (PRs). For a detailed study of different loss functions, one can refer to [10]. The Bayes estimators are evaluated using the squared error loss function (SELF), weighted squared error loss function (WSELF), precautionary loss function (PLF), modified (quadratic) squared error loss function (M/Q SELF), logarithmic loss function (LLF), entropy loss function (ELF), and K-loss function (KLF). The KLF proposed by [11] is well suited as a measure of inaccuracy for an estimator of a scale parameter of a distribution defined on $R^{+} = (0, ∞)$. Table 1 shows the Bayes estimators and the associated posterior risks under each of the loss functions considered in this paper.
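When posterior draws of a positive parameter θ are available (for example from MCMC), the Bayes estimators and posterior risks can be approximated by sample averages. Table 1 is not reproduced here, so the sketch below assumes the following common definitions with estimate d: SELF (θ − d)², WSELF (θ − d)²/θ, PLF (θ − d)²/d, M/Q SELF (1 − d/θ)², LLF (ln d − ln θ)², ELF d/θ − ln(d/θ) − 1 and KLF (√(d/θ) − √(θ/d))². The function name and the stand-in gamma draws are illustrative.

```python
import math
import random

def bayes_estimates(draws):
    """Return {loss: (Bayes estimate, posterior risk)} from positive posterior draws."""
    n = len(draws)
    m1 = sum(draws) / n                               # E[theta]
    m2 = sum(t * t for t in draws) / n                # E[theta^2]
    i1 = sum(1.0 / t for t in draws) / n              # E[1/theta]
    i2 = sum(1.0 / (t * t) for t in draws) / n        # E[1/theta^2]
    l1 = sum(math.log(t) for t in draws) / n          # E[ln theta]
    l2 = sum(math.log(t) ** 2 for t in draws) / n     # E[(ln theta)^2]
    return {
        "SELF": (m1, m2 - m1 ** 2),
        "WSELF": (1.0 / i1, m1 - 1.0 / i1),
        "PLF": (math.sqrt(m2), 2.0 * (math.sqrt(m2) - m1)),
        "M/Q SELF": (i1 / i2, 1.0 - i1 ** 2 / i2),
        "LLF": (math.exp(l1), l2 - l1 ** 2),
        "ELF": (1.0 / i1, l1 + math.log(i1)),
        "KLF": (math.sqrt(m1 / i1), 2.0 * (math.sqrt(m1 * i1) - 1.0)),
    }

rng = random.Random(0)
draws = [rng.gammavariate(5.0, 0.5) for _ in range(5000)]  # stand-in posterior sample
for name, (est, risk) in bayes_estimates(draws).items():
    print(f"{name:9s} estimate={est:.4f} risk={risk:.4f}")
```

Each estimate is the minimizer of the expected loss under the stated definition, so the estimates are ordered like the corresponding means: M/Q SELF ≤ WSELF ≤ LLF ≤ SELF ≤ PLF.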
Next, we derive the Bayes estimators of the model parameters under different loss functions. The squared error loss (SEL) function was originally used in estimation problems when the unbiased estimator of Φ was being considered. Another reason for its popularity is its relationship to least squares theory. The SEL function also makes the computations simpler. Under the SEL, WSEL, M/Q SEL, PL, LL, EL and KL functions in Table 1, the Bayes estimates of the random vector $\Phi = (a_j, b_j, c_j, s_j, r_j, p, q)$, for j = 1, 2, are obtained from the corresponding posterior expectations.
It is evident that none of the integrals in the above section has a closed form under the joint posterior distribution given in Equation (9). Therefore, they need to be evaluated numerically. Consequently, the MCMC technique is proposed to generate samples from the posterior distributions, and the Bayes estimates of the parameter vector Φ are then computed under progressively type II censored samples. Next, we provide the general form of the Bayesian credible intervals.

Credible Intervals
In this subsection, symmetric 100(1 − τ)% two-sided Bayes probability interval estimates of the parameter vector Φ, denoted by $[L_Φ, U_Φ]$, are obtained by solving Equation (11). Since it is difficult to find the limits $L_Φ$ and $U_Φ$ analytically, we apply suitable numerical techniques to solve Equation (11).
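In practice, interval limits of this kind are often approximated from posterior draws rather than by solving the defining equation directly. The sketch below is one such approximation under our own assumptions: it returns a central interval built from order statistics and, for comparison, the shortest interval containing the same number of draws (a highest density interval for unimodal posteriors). Function names and the stand-in draws are hypothetical.

```python
import math
import random

def _window_size(n, tau):
    """Number of draws an interval must contain for 100(1 - tau)% coverage."""
    return min(n, max(1, math.ceil((1.0 - tau) * n - 1e-9)))

def central_interval(draws, tau=0.05):
    """Interval covering the central k order statistics of the draws."""
    xs = sorted(draws)
    k = _window_size(len(xs), tau)
    lo = (len(xs) - k) // 2
    return xs[lo], xs[lo + k - 1]

def hdi(draws, tau=0.05):
    """Shortest interval containing k consecutive order statistics."""
    xs = sorted(draws)
    k = _window_size(len(xs), tau)
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

rng = random.Random(5)
draws = [rng.gammavariate(2.0, 1.0) for _ in range(2000)]  # stand-in posterior sample
print(central_interval(draws), hdi(draws))
```

For a skewed posterior the two intervals differ, and the shortest interval is never longer than the central one.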

Bayesian Estimation of the Exponentiated Kumaraswamy Weibull Distribution
G is assumed to be the Weibull distribution with shape parameter r > 0 and scale parameter s > 0. The joint posterior density for the parameter vector Φ, given the data, is given in Equation (12). Marginal posterior distributions of the components of Φ can be obtained by integrating out the nuisance parameters. Next, we consider the loss functions that will be used to derive the estimators from the marginal posterior distributions.

Simulation Study
In this section, we evaluate the performance of the maximum likelihood and the Bayesian estimation methods to estimate the parameters using Monte Carlo simulations. We conduct the simulations using the maxLik package in R, as shown in [12]. The values of the biases and the relative mean square errors (RMSEs) in the results indicate that the maximum likelihood and the Bayesian estimation methods perform quite well in estimating the model parameters.

Simulation Study for MEKW
In this subsection, we evaluate the performance of the maximum likelihood method and the Bayesian estimation method to estimate the parameters of the MEKW model using Monte Carlo simulations. Progressively type II censored samples are drawn from the MEKW pdf in Equation (3) for a total of eight parameter combinations, with sample sizes n = 25, 50, censored at 60% and 80% of the sample size. The process is repeated 1000 times, and the biases (estimate − actual), RMSEs and lengths of the confidence intervals (CIs) of the estimates are reported in Tables 2-7. For the interval lengths, we report the length of the asymptotic CI (LACI) for the likelihood estimators and the length of the credible interval (LCCI) for the Bayesian estimators. In addition, we compared the performance of the estimation by considering the following schemes.
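The structure of such a Monte Carlo study can be illustrated with a deliberately simple stand-in. The sketch below replaces the MEKW model by an exponential lifetime with mean θ, for which the ML estimator under progressive type II censoring has the closed form Σ(1 + R_i)x_i / m, so no optimizer is needed. RMSE is computed here as the root mean squared error, which is an assumption about the intended definition; the scheme and all names are illustrative.

```python
import math
import random

def progressive_type2_sample(lifetimes, scheme, rng):
    """Observe the smallest survivor, then withdraw R_i survivors at random."""
    alive = sorted(lifetimes)
    observed = []
    for r in scheme:
        observed.append(alive.pop(0))
        for _ in range(r):
            alive.pop(rng.randrange(len(alive)))
    return observed

def exp_mean_mle(x, scheme):
    """Closed-form MLE of the exponential mean under progressive type II censoring."""
    return sum((1 + r) * xi for xi, r in zip(x, scheme)) / len(x)

def monte_carlo(theta, n, scheme, reps, seed):
    """Return (bias, RMSE) of the MLE over `reps` simulated censored samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        lifetimes = [rng.expovariate(1.0 / theta) for _ in range(n)]
        x = progressive_type2_sample(lifetimes, scheme, rng)
        estimates.append(exp_mean_mle(x, scheme))
    bias = sum(estimates) / reps - theta
    rmse = math.sqrt(sum((e - theta) ** 2 for e in estimates) / reps)
    return bias, rmse

n, m = 50, 30                            # 60% of the units are observed to fail
scheme = [n - m] + [0] * (m - 1)         # all removals at the first failure
bias, rmse = monte_carlo(theta=1.0, n=n, scheme=scheme, reps=1000, seed=2024)
print(bias, rmse)
```

For this stand-in model the estimator is unbiased with standard deviation θ/√m, which gives a reference value against which the simulated bias and RMSE can be compared.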

Application on Bladder Cancer Data
In this section, we provide a real data analysis to illustrate some practical applications of the proposed distributions. The data are from [13] and correspond to the remission times (in months) of a random sample of n = 128 bladder cancer patients. Before proceeding further, we fitted the mixture EKW distribution to the complete data set. Table 8 reports the ML and Bayesian estimates of the parameters for the complete bladder cancer data. Figure 3 represents the overall fit of EKW for these data.

The validity of the fitted model is assessed by computing the Kolmogorov-Smirnov distance (KSD) statistic with its p-value (PVKS) in Table 8. In addition, we plotted the fitted cdf and the empirical cdf, as shown in Figure 3. This was conducted by replacing the parameters with their ML estimates (in red). The KSD statistic for the ML fit is 0.0443 and the corresponding p-value is 0.9629. Therefore, the KS test, along with Figure 3, indicates that the EKW distribution provides a good fit for this data set.
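The KS distance between an empirical cdf and a fitted cdf, and an approximate p-value, can be computed as sketched below. The p-value uses the asymptotic Kolmogorov series, which is an assumption about how PVKS was obtained and is only approximate when the parameters are estimated from the same data. The function names are illustrative, and the small sample in the example is made up purely to exercise the code.

```python
import math

def ks_distance(sample, cdf):
    """sup_x |F_n(x) - F(x)|, evaluated at the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

def ks_pvalue_asymptotic(d, n, terms=100):
    """Kolmogorov series 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n d^2)."""
    lam2 = n * d * d
    total = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam2)
                      for k in range(1, terms + 1))
    return min(1.0, max(0.0, total))

# Toy check against the uniform(0, 1) cdf; these four values are not real data.
toy = [0.1, 0.4, 0.6, 0.9]
d = ks_distance(toy, lambda x: x)
print(d, ks_pvalue_asymptotic(d, len(toy)))
print(ks_pvalue_asymptotic(0.0443, 128))
```

With D = 0.0443 and n = 128 the series gives a p-value of about 0.96, in line with the value reported in the text.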
Next, we fitted the MEKW distribution to the complete data set. Table 9 reports the ML and Bayesian estimates of the parameters for the complete bladder cancer data. Two different sampling schemes are used to generate the progressively censored samples from the bladder cancer data with m = 100, which are as follows: Strategy 1: (99*0, 28); $R_{k\imath} = n_k - m_k$ for ı = 1 and $R_{k\imath} = 0$ for ı = 2, ..., $m_k$ (type II censoring scheme).
In both cases, we used a numerical optimization algorithm to compute the ML estimates. Table 10 shows the ML estimates for these two schemes. In addition, Bayesian credible interval estimates of the parameters are obtained numerically using Markov chain Monte Carlo (MCMC) techniques. That is, samples are simulated from the joint posterior distribution in Equation (12) using the Metropolis-Hastings algorithm to obtain the posterior means of the parameters. Table 10 reports the Bayesian estimates of the MEKW parameters with the corresponding SEs and credible intervals, computed using the HDI algorithm.
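The Metropolis-Hastings step can be illustrated on a one-parameter toy posterior. The sketch below is not the sampler used for Equation (12): it targets the posterior of an exponential rate θ under a gamma prior, chosen because the exact posterior mean is known and the chain can be checked against it. A Gaussian random walk on log θ keeps the parameter positive. All names, the simulated data and the tuning values are assumptions.

```python
import math
import random

def log_post_log_scale(eta, n, total, shape, rate):
    """Log posterior of eta = log(theta), up to a constant, with Jacobian term eta.

    Model: data ~ Exponential(rate theta), prior theta ~ Gamma(shape, rate).
    """
    theta = math.exp(eta)
    return (shape - 1.0 + n) * eta - (rate + total) * theta + eta

def random_walk_mh(data, shape, rate, iters, step, seed):
    """Random-walk Metropolis-Hastings on log(theta); returns draws of theta."""
    rng = random.Random(seed)
    n, total = len(data), sum(data)
    eta = 0.0                                     # start at theta = 1
    cur = log_post_log_scale(eta, n, total, shape, rate)
    draws = []
    for _ in range(iters):
        prop = eta + rng.gauss(0.0, step)         # symmetric proposal
        new = log_post_log_scale(prop, n, total, shape, rate)
        if rng.random() < math.exp(min(0.0, new - cur)):
            eta, cur = prop, new                  # accept the proposal
        draws.append(math.exp(eta))
    return draws

data_rng = random.Random(7)
data = [data_rng.expovariate(2.0) for _ in range(20)]   # simulated, not real data
draws = random_walk_mh(data, shape=2.0, rate=1.0, iters=20000, step=0.5, seed=11)
kept = draws[2000:]                                     # discard burn-in
post_mean = sum(kept) / len(kept)
exact_mean = (2.0 + len(data)) / (1.0 + sum(data))      # conjugate gamma posterior mean
print(post_mean, exact_mean)
```

For the MEKW model the same loop applies with the log posterior replaced by the log-likelihood under progressive censoring plus the log prior, with one proposal per parameter or a joint proposal.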

Concluding Remarks
Finite mixture models in both the continuous and the discrete domains have received considerable attention over the last decade or so due to their flexibility in modeling an observed phenomenon when no single component can adequately explain the entire nature of the data. In this paper, we have developed and studied a finite mixture of exponentiated Kumaraswamy-G distributions under a progressive type II censored sampling scheme, when the baseline distribution (G) is a two-parameter Weibull. The efficacy of the proposed model has been established by applying it to model data from the healthcare domain. From the simulation study as well as from the application, it has been observed that, depending on the censoring scheme, either of the two estimation methods (i.e., maximum likelihood and Bayesian estimation under independent gamma priors) could be useful. Among the various loss functions assumed for the Bayesian estimation, the results based on the small simulation study are inconclusive as to which loss function is the most suitable for this type of finite mixture model. Most likely, a full-scale simulation study with varying parameter choices and a wide range of censoring schemes would settle this question. This work is in progress and will be reported separately.

Data Availability Statement:
The data used to support the findings of this study are included within the article.