Statistical Inference on the Shannon Entropy of Inverse Weibull Distribution under the Progressive First-Failure Censoring

Entropy is a measure of the uncertainty of a random variable and mathematically represents the expected amount of information. In this paper, we focus on estimating the parameters and the entropy of an Inverse Weibull distribution under progressive first-failure censoring using classical (Maximum Likelihood) and Bayesian methods. The Bayesian estimates are obtained under both asymmetric (General Entropy, Linex) and symmetric (Squared Error) loss functions. Because the Bayes estimates have a complex form, they cannot be obtained explicitly; therefore, the Lindley method and the Importance Sampling procedure are applied. Furthermore, the Highest Posterior Density credible intervals of entropy are constructed using the Importance Sampling method. For comparison, the asymptotic intervals of entropy are also obtained. Finally, a simulation study is carried out and a real data set is analyzed to illustrate the proposed methods.


Introduction
In lifetime experiments, restrictions on time and cost usually mean that exact lifetimes cannot be observed for all products, so the data are censored. The most common censoring schemes are Type-I and Type-II censoring. In the former, N units are placed on a life test and the experiment is terminated after a predetermined time; in the latter, the experiment is terminated after a predetermined number m of units have failed. Progressive censoring is a generalization of Type-II censoring that permits units to be randomly removed at various time points during the experiment rather than only at its end.
Compared to conventional Type-I and Type-II censoring, progressive censoring, i.e., withdrawal of non-failed items, decreases the accuracy of estimation. However, in certain practical circumstances, experimenters are forced to withdraw items from tests. Thus, the application of the progressive censoring methodology allows profiting from information related to withdrawn items.
When the above methods still fail to meet the time and cost constraints, other censoring schemes have been proposed by researchers to further improve efficiency. One successful attempt is first-failure censoring. In this scheme, N = k × n units are randomly assigned to n groups with k identical units in each group. The lifetime experiment is conducted by testing all groups simultaneously until the first failure is observed in each group.
Since progressive censoring and first-failure censoring can both greatly enhance the efficiency of a lifetime experiment, Ref. [1] combined the two and developed a novel censoring scheme called progressive first-failure censoring. In this scheme, N = k × n units are randomly divided into n disjoint groups of k identical units at the beginning of the life experiment, and the experiment is terminated when the mth failure occurs. When the ith failure occurs, the group containing the failed unit is removed together with $R_i$ randomly selected surviving groups, and when the mth failure occurs, all the surviving groups are removed. Here, $R = (R_1, \ldots, R_m)$ and m are set in advance. Note that when k = 1, progressive first-failure censoring reduces to the well-known progressive Type-II censoring.
Because it is more efficient than other censoring schemes, progressive first-failure censoring has been studied by many researchers. Ref. [2] considered both the point and interval estimation of the two parameters of a Burr-XII distribution when both parameters are unknown; Ref. [3] dealt with the reliability function of the GIED (generalized inverted exponential distribution) under progressive first-failure censoring; Ref. [4] established different reliability sampling plans using two criteria for a Lognormal distribution based on progressive first-failure censoring; Ref. [5] chose a competing risks data model under progressive first-failure censoring from a Gompertz distribution and estimated the model using Bayesian and non-Bayesian methods; Ref. [6] considered the lifetime performance index ($C_L$) under progressive first-failure censoring schemes for a Pareto model, addressed the hypothesis testing of $C_L$, and gave a lower specification limit.
The Weibull distribution is widely used in analyzing lifetime data. However, its failure rate function can only be constant, decreasing, or increasing; it cannot be non-monotone, for example unimodal. In practice, if the empirical failure rate function is found to be non-monotone, the Inverse Weibull model is a more suitable choice than the Weibull model. The Inverse Weibull model has a wide variety of applications in pharmacy, economics and chemistry.
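The cumulative distribution function and the probability density function of the Inverse Weibull distribution (IWD) are, respectively,
$$F(x; \alpha, \lambda) = e^{-\lambda x^{-\alpha}}, \tag{1}$$
$$f(x; \alpha, \lambda) = \alpha \lambda x^{-(\alpha+1)} e^{-\lambda x^{-\alpha}}, \tag{2}$$
where x > 0, λ > 0, α > 0, λ is the scale parameter and α is the shape parameter. The failure rate function is
$$h(x; \alpha, \lambda) = \frac{f(x; \alpha, \lambda)}{1 - F(x; \alpha, \lambda)} = \frac{\alpha \lambda x^{-(\alpha+1)} e^{-\lambda x^{-\alpha}}}{1 - e^{-\lambda x^{-\alpha}}}.$$
One of the most important properties of the IWD is that its failure rate function can be unimodal. Figure 1 supports this conclusion, and a distribution whose failure rate function can be unimodal is more flexible in applications.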
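As a quick numerical illustration of the unimodal failure rate, the following Python sketch evaluates the CDF, PDF, and failure rate on a grid. It assumes the parametrization $F(x) = e^{-\lambda x^{-\alpha}}$; the function names, the grid, and the choice α = 2, λ = 1 are illustrative assumptions rather than part of the original analysis.

```python
import math

def iw_cdf(x, alpha, lam):
    """CDF of the Inverse Weibull distribution, F(x) = exp(-lam * x**(-alpha))."""
    return math.exp(-lam * x ** (-alpha))

def iw_pdf(x, alpha, lam):
    """PDF, f(x) = alpha * lam * x**(-(alpha + 1)) * exp(-lam * x**(-alpha))."""
    return alpha * lam * x ** (-(alpha + 1.0)) * math.exp(-lam * x ** (-alpha))

def iw_hazard(x, alpha, lam):
    """Failure rate h(x) = f(x) / (1 - F(x))."""
    return iw_pdf(x, alpha, lam) / (1.0 - iw_cdf(x, alpha, lam))

# Evaluate the failure rate on a grid to see its rise-then-fall shape.
grid = [0.05 * i for i in range(1, 201)]          # x from 0.05 to 10
hazard = [iw_hazard(x, 2.0, 1.0) for x in grid]   # illustrative alpha = 2, lam = 1
peak = max(range(len(grid)), key=lambda i: hazard[i])
print("failure rate peaks near x =", grid[peak])
```

On this grid the failure rate increases up to an interior point and decreases afterwards, which is the unimodal behavior described above.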
Many researchers have studied the Inverse Weibull distribution. Ref. [7] investigated Bayesian inference and prediction for the IWD under the Type-II censoring scheme; Ref. [8] considered not only the Bayesian estimation but also the generalized Bayesian estimation of the IWD parameters; Ref. [9] used three classical methods to estimate the parameters of the IWD; Ref. [10] estimated the unknown parameters of the IWD under progressive Type-I interval censoring and chose the optimal censoring schemes; Ref. [11] adopted two methods to obtain bias corrections for the maximum likelihood estimates of the unknown parameters of the IWD.
Entropy is a quantitative measure of the uncertainty of a probability distribution. For a random variable X with probability density function f(x), the Shannon entropy, denoted by H(X), is defined as
$$H(X) = -\int_{-\infty}^{\infty} f(x) \log f(x) \, dx. \tag{3}$$
Many studies about entropy can be found in the literature. Ref. [12] proposed an indirect method using a decomposition to simplify the calculation of entropy under progressive Type-II censoring; Ref. [13] estimated the entropy for several exponential distributions and extended the results to other circumstances; Ref. [14] estimated the Shannon entropy of a Rayleigh model under doubly generalized Type-II hybrid censoring and compared the performance by two criteria. The Shannon entropy of the IWD is given by
$$H(\alpha, \lambda) = 1 + \gamma \left(1 + \frac{1}{\alpha}\right) - \log \alpha + \frac{1}{\alpha} \log \lambda, \tag{4}$$
where γ is the Euler-Mascheroni constant.
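The closed form of the IWD entropy can be checked numerically. The sketch below assumes the density $f(x) = \alpha\lambda x^{-(\alpha+1)} e^{-\lambda x^{-\alpha}}$ and the closed form $H = 1 + \gamma(1 + 1/\alpha) - \log\alpha + \log(\lambda)/\alpha$; the function names, the midpoint rule, and the integration limits are illustrative choices.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def iw_entropy(alpha, lam):
    """Closed-form Shannon entropy of the Inverse Weibull distribution."""
    return (1.0 + EULER_GAMMA * (1.0 + 1.0 / alpha)
            - math.log(alpha) + math.log(lam) / alpha)

def iw_entropy_numeric(alpha, lam, n=20000, t_lo=-12.0, t_hi=12.0):
    """Midpoint-rule value of -integral f log f dx after substituting x = exp(t).

    The limits suit moderate alpha; very large alpha would overflow exp(-alpha*t).
    """
    step = (t_hi - t_lo) / n
    total = 0.0
    for j in range(n):
        t = t_lo + (j + 0.5) * step
        # log f(x) evaluated at x = exp(t), computed on the log scale
        log_f = (math.log(alpha) + math.log(lam)
                 - (alpha + 1.0) * t - lam * math.exp(-alpha * t))
        total += -math.exp(log_f) * log_f * math.exp(t) * step
    return total

print(round(iw_entropy(2.0, 1.0), 6))          # value used in the simulation study
print(round(iw_entropy_numeric(2.0, 1.0), 4))  # numerical cross-check
```

For α = 2 and λ = 1 the closed form reproduces the entropy value 1.172676 used in the simulation study.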
In this paper, we discuss the maximum likelihood and Bayesian estimation of the parameters (α, λ) and the entropy of the IWD under progressive first-failure censoring. To the best of our knowledge, this topic is new and has received little attention, so it merits in-depth study. The rest of this paper is organized as follows: In Section 2, we derive the maximum likelihood estimation of entropy and parameters. In Section 3, we present the asymptotic intervals for the entropy and parameters. In Section 4, we work out the Bayesian estimation of entropy and parameters using the Lindley and Importance Sampling methods. In Section 5, a simulation study is conducted to compare different estimators. In Section 6, we analyze a real data set to illustrate the proposed methods. Finally, in Section 7, a conclusion is presented.

Maximum Likelihood Estimation
We consider the maximum likelihood estimates (MLEs) of the entropy and parameters of an Inverse Weibull distribution under progressive first-failure censoring. Let $X^{R}_{1:m:n:k} \le X^{R}_{2:m:n:k} \le \cdots \le X^{R}_{m:m:n:k}$ be a sample from the IWD under the progressive first-failure censoring scheme $(k, n, m, R_1, \ldots, R_m)$. For simplicity, we write $x_i$ for $x^{R}_{i:m:n:k}$, i = 1, . . . , m. The joint probability density function is
$$f_{X^{R}_{1:m:n:k}, \ldots, X^{R}_{m:m:n:k}}(x_1, \ldots, x_m) = P k^m \prod_{i=1}^{m} f(x_i) \left[1 - F(x_i)\right]^{k(R_i+1)-1}, \tag{5}$$
where $0 < x_1 < \cdots < x_m < \infty$ and $P = n(n - 1 - R_1)(n - 2 - R_1 - R_2) \cdots (n - m + 1 - R_1 - \cdots - R_{m-1})$.
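Combining (1), (2), and (5), the likelihood function (LF) is
$$L(\alpha, \lambda) = P k^m \alpha^m \lambda^m \prod_{i=1}^{m} x_i^{-(\alpha+1)} e^{-\lambda x_i^{-\alpha}} \left(1 - e^{-\lambda x_i^{-\alpha}}\right)^{k(R_i+1)-1}. \tag{6}$$
Then, the log-likelihood function is
$$\ell(\alpha, \lambda) = \log P + m \log k + m \log \alpha + m \log \lambda - (\alpha + 1) \sum_{i=1}^{m} \log x_i - \lambda \sum_{i=1}^{m} x_i^{-\alpha} + \sum_{i=1}^{m} \left[k(R_i+1) - 1\right] \log\left(1 - e^{-\lambda x_i^{-\alpha}}\right). \tag{7}$$
Taking partial derivatives with respect to α and λ, the corresponding score equations are
$$\frac{\partial \ell}{\partial \alpha} = \frac{m}{\alpha} - \sum_{i=1}^{m} \log x_i + \lambda \sum_{i=1}^{m} x_i^{-\alpha} \log x_i - \lambda \sum_{i=1}^{m} \left[k(R_i+1) - 1\right] \frac{x_i^{-\alpha} \log x_i \, e^{-\lambda x_i^{-\alpha}}}{1 - e^{-\lambda x_i^{-\alpha}}} = 0, \tag{8}$$
$$\frac{\partial \ell}{\partial \lambda} = \frac{m}{\lambda} - \sum_{i=1}^{m} x_i^{-\alpha} + \sum_{i=1}^{m} \left[k(R_i+1) - 1\right] \frac{x_i^{-\alpha} e^{-\lambda x_i^{-\alpha}}}{1 - e^{-\lambda x_i^{-\alpha}}} = 0. \tag{9}$$
The MLEs $\hat{\alpha}$ and $\hat{\lambda}$ are the roots of Equations (8) and (9). These equations do not have an explicit solution, so numerical techniques are needed to approximate the values of the parameters. Furthermore, by the invariance property of the MLE, the ML estimator of entropy is
$$\hat{H} = 1 + \gamma \left(1 + \frac{1}{\hat{\alpha}}\right) - \log \hat{\alpha} + \frac{1}{\hat{\alpha}} \log \hat{\lambda}.$$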
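Because the score equations must be solved numerically, it is useful to have the log-likelihood and its gradient in code. The following Python sketch assumes the IWD parametrization $F(x) = e^{-\lambda x^{-\alpha}}$ and the standard progressive first-failure likelihood, drops the additive constant $\log P + m \log k$, and uses hypothetical function names and toy data. Any general-purpose optimizer could then be applied to `loglik`.

```python
import math

def loglik(alpha, lam, x, R, k):
    """Log-likelihood without the additive constant log(P) + m*log(k)."""
    m = len(x)
    total = m * math.log(alpha) + m * math.log(lam)
    for xi, Ri in zip(x, R):
        u = lam * xi ** (-alpha)
        total += -(alpha + 1.0) * math.log(xi) - u
        # log(1 - exp(-u)), weighted by the censoring exponent k(R_i + 1) - 1
        total += (k * (Ri + 1) - 1) * math.log1p(-math.exp(-u))
    return total

def score(alpha, lam, x, R, k):
    """Partial derivatives of the log-likelihood with respect to alpha and lam."""
    m = len(x)
    d_alpha, d_lam = m / alpha, m / lam
    for xi, Ri in zip(x, R):
        u = lam * xi ** (-alpha)
        c = k * (Ri + 1) - 1
        ratio = 1.0 / math.expm1(u)        # exp(-u) / (1 - exp(-u))
        log_x = math.log(xi)
        d_alpha += -log_x + u * log_x - c * u * log_x * ratio
        d_lam += -xi ** (-alpha) + c * xi ** (-alpha) * ratio
    return d_alpha, d_lam

# Toy data (not from the paper): m = 4 observed failures, groups of size k = 2.
x_toy, R_toy, k_toy = [0.6, 0.9, 1.3, 2.1], [1, 0, 2, 0], 2
print(loglik(1.5, 0.8, x_toy, R_toy, k_toy))
print(score(1.5, 0.8, x_toy, R_toy, k_toy))
```

Comparing `score` with finite differences of `loglik` is a simple way to guard against sign errors before passing the functions to an optimizer.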

Asymptotic Intervals for MLEs
The 100(1 − ξ)% confidence intervals (CIs) for the two parameters α and λ can be constructed from the asymptotic normality of the MLEs, with Var($\hat{\alpha}$) and Var($\hat{\lambda}$) obtained from the inverse of the observed Fisher information matrix.
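From Equation (7), the second-order partial derivatives of the log-likelihood with respect to α and λ form the Fisher information matrix of the two parameters,
$$I(\alpha, \lambda) = -\begin{pmatrix} \dfrac{\partial^2 \ell}{\partial \alpha^2} & \dfrac{\partial^2 \ell}{\partial \alpha \, \partial \lambda} \\ \dfrac{\partial^2 \ell}{\partial \lambda \, \partial \alpha} & \dfrac{\partial^2 \ell}{\partial \lambda^2} \end{pmatrix}.$$
Here, $(\hat{\alpha}, \hat{\lambda})^T$ is approximately a bivariate normal vector with mean $(\alpha, \lambda)^T$ and covariance matrix $I^{-1} = I^{-1}(\alpha, \lambda)$. In practice, we use $I^{-1}(\hat{\alpha}, \hat{\lambda})$ to estimate $I^{-1}(\alpha, \lambda)$. In other words,
$$I^{-1}(\hat{\alpha}, \hat{\lambda}) = \begin{pmatrix} \widehat{Var}(\hat{\alpha}) & \widehat{Cov}(\hat{\alpha}, \hat{\lambda}) \\ \widehat{Cov}(\hat{\lambda}, \hat{\alpha}) & \widehat{Var}(\hat{\lambda}) \end{pmatrix}.$$
Thus, based on the normal approximation, the 100(1 − ξ)% CIs for the two parameters α and λ are
$$\left(\hat{\alpha} - Z_{\xi/2} \sqrt{\widehat{Var}(\hat{\alpha})}, \ \hat{\alpha} + Z_{\xi/2} \sqrt{\widehat{Var}(\hat{\alpha})}\right), \qquad \left(\hat{\lambda} - Z_{\xi/2} \sqrt{\widehat{Var}(\hat{\lambda})}, \ \hat{\lambda} + Z_{\xi/2} \sqrt{\widehat{Var}(\hat{\lambda})}\right).$$
Here, $Z_{\xi/2}$ is the upper ξ/2 percentile of the standard normal distribution. To obtain an approximate estimate of the variance of the entropy estimator, we use the delta method. Let
$$G = \left(\frac{\partial H}{\partial \alpha}, \frac{\partial H}{\partial \lambda}\right)^T,$$
where
$$\frac{\partial H}{\partial \alpha} = -\frac{\gamma + \log \lambda}{\alpha^2} - \frac{1}{\alpha}, \qquad \frac{\partial H}{\partial \lambda} = \frac{1}{\alpha \lambda}.$$
Then, the approximate estimate of Var($\hat{H}$) is obtained by
$$\widehat{Var}(\hat{H}) = \left[G^T I^{-1}(\alpha, \lambda) \, G\right]_{(\alpha, \lambda) = (\hat{\alpha}, \hat{\lambda})}.$$
Therefore, $(\hat{H} - H) / \sqrt{\widehat{Var}(\hat{H})}$ is approximately standard normal, and the asymptotic 100(1 − ξ)% CI for entropy is
$$\left(\hat{H} - Z_{\xi/2} \sqrt{\widehat{Var}(\hat{H})}, \ \hat{H} + Z_{\xi/2} \sqrt{\widehat{Var}(\hat{H})}\right).$$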
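The delta-method step can be sketched in a few lines of Python. The sketch assumes the closed-form IWD entropy $H = 1 + \gamma(1 + 1/\alpha) - \log\alpha + \log(\lambda)/\alpha$ and takes the estimated covariance matrix (the inverse observed information) as an input; the function names and the covariance values in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def iw_entropy(alpha, lam):
    """Closed-form entropy of the Inverse Weibull distribution."""
    return (1.0 + EULER_GAMMA * (1.0 + 1.0 / alpha)
            - math.log(alpha) + math.log(lam) / alpha)

def entropy_gradient(alpha, lam):
    """Gradient G = (dH/dalpha, dH/dlam) used by the delta method."""
    d_alpha = -(EULER_GAMMA + math.log(lam)) / alpha ** 2 - 1.0 / alpha
    d_lam = 1.0 / (alpha * lam)
    return d_alpha, d_lam

def entropy_asymptotic_ci(alpha_hat, lam_hat, cov, xi=0.05):
    """Return (lower, upper, variance) for the entropy.

    cov is the 2x2 estimated covariance matrix of (alpha_hat, lam_hat).
    """
    g1, g2 = entropy_gradient(alpha_hat, lam_hat)
    var_h = g1 * g1 * cov[0][0] + 2.0 * g1 * g2 * cov[0][1] + g2 * g2 * cov[1][1]
    z = NormalDist().inv_cdf(1.0 - xi / 2.0)   # upper xi/2 percentile
    h_hat = iw_entropy(alpha_hat, lam_hat)
    half = z * math.sqrt(var_h)
    return h_hat - half, h_hat + half, var_h

# Hypothetical covariance matrix, for illustration only.
print(entropy_asymptotic_ci(2.0, 1.0, [[0.04, 0.005], [0.005, 0.02]]))
```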

Asymptotic Intervals for Log-Transformed MLE
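Ref. [15] proposed that the asymptotic CI based on the log-transformed MLE has a more accurate coverage probability. The parameters α and λ are positive, and the same transformation can be applied to the entropy whenever its estimate is positive. For θ denoting α, λ, or H, the normal approximation for the log-transformed MLE is
$$\frac{\log \hat{\theta} - \log \theta}{\sqrt{\widehat{Var}(\log \hat{\theta})}} \sim N(0, 1), \qquad \widehat{Var}(\log \hat{\theta}) = \frac{\widehat{Var}(\hat{\theta})}{\hat{\theta}^2}.$$
Thus, based on the normal approximation of the log-transformed MLE, the 100(1 − ξ)% CIs for the two parameters α and λ are
$$\left(\hat{\alpha} \exp\left(-\frac{Z_{\xi/2} \sqrt{\widehat{Var}(\hat{\alpha})}}{\hat{\alpha}}\right), \ \hat{\alpha} \exp\left(\frac{Z_{\xi/2} \sqrt{\widehat{Var}(\hat{\alpha})}}{\hat{\alpha}}\right)\right), \qquad \left(\hat{\lambda} \exp\left(-\frac{Z_{\xi/2} \sqrt{\widehat{Var}(\hat{\lambda})}}{\hat{\lambda}}\right), \ \hat{\lambda} \exp\left(\frac{Z_{\xi/2} \sqrt{\widehat{Var}(\hat{\lambda})}}{\hat{\lambda}}\right)\right).$$
Furthermore, a 100(1 − ξ)% CI for entropy is
$$\left(\hat{H} \exp\left(-\frac{Z_{\xi/2} \sqrt{\widehat{Var}(\hat{H})}}{\hat{H}}\right), \ \hat{H} \exp\left(\frac{Z_{\xi/2} \sqrt{\widehat{Var}(\hat{H})}}{\hat{H}}\right)\right).$$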
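A log-transformed interval is easy to compute once a point estimate and its estimated variance are available. The helper below is a minimal sketch assuming the usual delta-method variance Var(log θ̂) ≈ Var(θ̂)/θ̂² and a positive estimate; the function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def log_transformed_ci(estimate, variance, xi=0.05):
    """100(1 - xi)% CI based on the normal approximation of log(estimate).

    Requires estimate > 0; the interval is (estimate / c, estimate * c).
    """
    if estimate <= 0:
        raise ValueError("log-transformed CI needs a positive estimate")
    z = NormalDist().inv_cdf(1.0 - xi / 2.0)
    c = math.exp(z * math.sqrt(variance) / estimate)
    return estimate / c, estimate * c

# Hypothetical estimate and variance, for illustration only.
print(log_transformed_ci(2.0, 0.04))
```

Unlike the untransformed interval, the lower limit is always positive, and the interval is symmetric around the estimate on the log scale rather than on the original scale.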

Prior and Posterior Distribution
Both α and λ are unknown, so no conjugate joint prior is available. A common choice is to use independent Gamma priors for α and λ. However, for the Inverse Weibull distribution, it is not appropriate to choose Gamma distributions for both priors; the specific reason is explained in detail in the Importance Sampling procedure subsection. Thus, in this case, we consider the following prior distributions: λ has a Gamma prior G(a, b) with probability density function
$$\pi_1(\lambda) = \frac{b^a}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda}, \quad \lambda > 0,$$
and α has a non-informative prior with probability density function $\pi_2(\alpha)$, α > 0, where the hyperparameters a and b are known and positive. The joint prior distribution of the two parameters α and λ is then
$$\pi(\alpha, \lambda) = \pi_1(\lambda) \, \pi_2(\alpha),$$
and the joint posterior PDF of α and λ is
$$\pi(\alpha, \lambda \mid x) = \frac{L(\alpha, \lambda) \, \pi(\alpha, \lambda)}{\int_0^\infty \int_0^\infty L(\alpha, \lambda) \, \pi(\alpha, \lambda) \, d\alpha \, d\lambda}.$$

Symmetric and Asymmetric Loss Functions
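Choosing the loss function is an important part of Bayesian inference. In this subsection, we consider the Bayes estimation of the two parameters α, λ, and the entropy of an IWD under both asymmetric and symmetric loss functions. A widely used symmetric loss function is the squared error loss function (SELF). As asymmetric loss functions, we choose the general entropy loss function (GELF) and the linex loss function (LLF). The SELF, LLF, and GELF are defined as
$$L_S(\hat{\aleph}, \aleph) = (\hat{\aleph} - \aleph)^2,$$
$$L_L(\hat{\aleph}, \aleph) = e^{p(\hat{\aleph} - \aleph)} - p(\hat{\aleph} - \aleph) - 1, \quad p \neq 0,$$
$$L_G(\hat{\aleph}, \aleph) = \left(\frac{\hat{\aleph}}{\aleph}\right)^q - q \log\left(\frac{\hat{\aleph}}{\aleph}\right) - 1, \quad q \neq 0,$$
where $\hat{\aleph}$ is an estimate of ℵ. In the LLF and GELF, the signs of p and q indicate the direction of the asymmetry, and their magnitudes indicate its degree; neither of them is zero. The Bayes estimates of ℵ under the above loss functions are
$$\hat{\aleph}_S = E_{\aleph}(\aleph \mid x), \qquad \hat{\aleph}_L = -\frac{1}{p} \log\left[E_{\aleph}\left(e^{-p\aleph} \mid x\right)\right], \qquad \hat{\aleph}_G = \left[E_{\aleph}\left(\aleph^{-q} \mid x\right)\right]^{-1/q},$$
where $E_{\aleph}$ denotes the expectation with respect to the posterior distribution of ℵ. We can now derive the Bayes estimates of α, λ, and entropy under SELF, LLF, and GELF.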
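First, the Bayes estimate of g(α, λ) under SELF is
$$\hat{g}_S = \frac{\int_0^\infty \int_0^\infty g(\alpha, \lambda) \, L(\alpha, \lambda) \, \pi(\alpha, \lambda) \, d\alpha \, d\lambda}{\int_0^\infty \int_0^\infty L(\alpha, \lambda) \, \pi(\alpha, \lambda) \, d\alpha \, d\lambda}.$$
Letting g(α, λ) equal α, λ, or the entropy gives the corresponding estimate under SELF. Moreover, the Bayes estimate of g(α, λ) under LLF is
$$\hat{g}_L = -\frac{1}{p} \log\left[\frac{\int_0^\infty \int_0^\infty e^{-p \, g(\alpha, \lambda)} L(\alpha, \lambda) \, \pi(\alpha, \lambda) \, d\alpha \, d\lambda}{\int_0^\infty \int_0^\infty L(\alpha, \lambda) \, \pi(\alpha, \lambda) \, d\alpha \, d\lambda}\right].$$
Letting g(α, λ) equal α, λ, or the entropy gives the corresponding estimate under LLF. Finally, the Bayes estimate of g(α, λ) under GELF is
$$\hat{g}_G = \left[\frac{\int_0^\infty \int_0^\infty g(\alpha, \lambda)^{-q} L(\alpha, \lambda) \, \pi(\alpha, \lambda) \, d\alpha \, d\lambda}{\int_0^\infty \int_0^\infty L(\alpha, \lambda) \, \pi(\alpha, \lambda) \, d\alpha \, d\lambda}\right]^{-1/q}.$$
Letting g(α, λ) equal α, λ, or the entropy gives the corresponding estimate under GELF. These Bayes estimates are ratios of integrals that cannot be expressed in closed form. Hence, we use the Lindley method as well as the Importance Sampling procedure to approximate them.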
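When the posterior is represented by a (possibly weighted) set of draws, the three Bayes estimates reduce to weighted averages. The following Python sketch assumes the standard forms of the estimators under SELF, LLF, and GELF; the function name and the toy draws are hypothetical, and the values must be positive for the GELF estimate to be defined.

```python
import math

def bayes_estimates(values, weights, p, q):
    """Bayes estimates of a quantity from weighted posterior draws.

    Returns (SELF, LLF, GELF) estimates:
      SELF: posterior mean E[g]
      LLF : -(1/p) * log E[exp(-p * g)],  p != 0
      GELF: (E[g ** (-q)]) ** (-1/q),     q != 0, g > 0
    """
    total = sum(weights)
    w = [wi / total for wi in weights]          # normalized weights
    est_self = sum(wi * v for wi, v in zip(w, values))
    est_llf = -math.log(sum(wi * math.exp(-p * v) for wi, v in zip(w, values))) / p
    est_gelf = sum(wi * v ** (-q) for wi, v in zip(w, values)) ** (-1.0 / q)
    return est_self, est_llf, est_gelf

# Toy posterior draws with unequal weights (illustration only).
draws, wts = [1.0, 1.2, 1.5, 0.9], [1.0, 2.0, 1.0, 1.0]
print(bayes_estimates(draws, wts, p=0.5, q=1.0))
```

Two sanity checks follow from the definitions: with q = −1 the GELF estimate coincides with the posterior mean, and for p > 0 the LLF estimate never exceeds the posterior mean (by Jensen's inequality).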
Next, we derive the Bayesian estimate of entropy. The required estimate of entropy can be derived in a similar manner.
The approximate Bayes estimator of λ is computed likewise.

Importance Sampling Procedure
Using the Lindley approximation, we can obtain Bayesian point estimates of the unknown parameters and entropy, but the method cannot be used to construct Highest Posterior Density (HPD) credible intervals. We therefore use Importance Sampling to obtain the Bayesian estimates and to derive HPD credible intervals as well.
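We first explain why it is not appropriate to choose two Gamma priors. Suppose that α ∼ G(a, b) and λ ∼ G(c, d) independently. Then, the joint prior distribution is
$$\pi(\alpha, \lambda) \propto \alpha^{a-1} e^{-b\alpha} \lambda^{c-1} e^{-d\lambda}.$$
Correspondingly, the joint posterior distribution is
$$\pi(\alpha, \lambda \mid x) \propto \alpha^{m+a-1} e^{-\alpha \left(b + \sum_{i=1}^{m} \log x_i\right)} \, \lambda^{m+c-1} e^{-\lambda \left(d + \sum_{i=1}^{m} x_i^{-\alpha}\right)} \prod_{i=1}^{m} \left(1 - e^{-\lambda x_i^{-\alpha}}\right)^{k(R_i+1)-1}.$$
The factor in α, denoted by $G_\alpha$, resembles a Gamma density, but its second parameter $b + \sum_{i=1}^{m} \log(x_i)$ cannot be guaranteed to be strictly positive (for example, when many observations are smaller than 1), so $G_\alpha$ cannot in general be regarded as a Gamma distribution. Consequently, its random samples cannot be generated from a Gamma distribution, and generating them by other methods is also difficult. Therefore, it is not appropriate to choose Gamma priors for both parameters.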
We now return to the prior distributions selected above. To implement Importance Sampling, the joint posterior distribution is rewritten as the product of a normalizing constant K, a conditional density $f_1(\lambda \mid \alpha)$, a density $f_2(\alpha)$, and a remaining factor that serves as the importance weight. To obtain the Bayesian estimates of the parameters using Importance Sampling, we need to generate samples from $f_1(\lambda \mid \alpha)$ and $f_2(\alpha)$. Generating samples from $f_1(\lambda \mid \alpha)$ is straightforward because it is a Gamma distribution. For generating samples from $f_2(\alpha)$, we use the following lemma.
Since m is a positive number, the second-order derivative of $\log(f_2(\alpha))$ is always negative. Therefore, $f_2(\alpha)$ is log-concave.
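Then, using the approach originally proposed by [16], we can easily generate samples from $f_2(\alpha)$. The required samples $(\alpha_i, \lambda_i)$, i = 1, . . . , M, are produced by first generating $\alpha_i$ from $f_2(\alpha)$ and then generating $\lambda_i$ from $f_1(\lambda \mid \alpha_i)$.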
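Then, the required Bayesian estimate of g(α, λ) can be represented by
$$\hat{g}_{IS} = \frac{\sum_{i=1}^{M} g(\alpha_i, \lambda_i) \, h(\alpha_i, \lambda_i)}{\sum_{i=1}^{M} h(\alpha_i, \lambda_i)},$$
where $h(\alpha_i, \lambda_i)$ denotes the importance weight of the ith sample, i = 1, . . . , M. Furthermore, the samples produced above can also be used to construct the HPD intervals for the parameters and entropy. Suppose that 0 < p < 1 and that $g_p$ satisfies $P(g(\alpha, \lambda) \le g_p \mid x) = p$. For a given p, we propose an approach to estimate $g_p$ and then to construct the HPD intervals for g(α, λ). First, let
$$w_i = \frac{h(\alpha_i, \lambda_i)}{\sum_{j=1}^{M} h(\alpha_j, \lambda_j)}, \quad i = 1, \ldots, M.$$
For simplicity, we write $g_i = g(\alpha_i, \lambda_i)$ and let $(g_{(1)}, w_{(1)}), \ldots, (g_{(M)}, w_{(M)})$ denote the pairs $(g_i, w_i)$ ordered so that $g_{(1)} < \cdots < g_{(M)}$. Then, the Bayesian estimate of $g_p$ is $\hat{g}_p = g_{(M_p)}$, where $M_p$ is the integer that satisfies
$$\sum_{i=1}^{M_p} w_{(i)} \le p < \sum_{i=1}^{M_p + 1} w_{(i)}.$$
Therefore, a 100(1 − ξ)% credible interval of g(α, λ) can be derived as $(\hat{g}_\delta, \hat{g}_{\delta + 1 - \xi})$ for all admissible δ, and the HPD interval is the one with the shortest length. The next section uses Monte Carlo simulation to compare the proposed estimators numerically and systematically.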
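The construction of the importance-sampling estimate and the HPD interval can be sketched as follows. The sketch assumes that the sampled values $g_i$ and their unnormalized importance weights are already available, uses the weighted-quantile rule described above (in the style of Chen and Shao), and searches over candidate values of δ taken from the cumulative weights; the function names and the toy inputs are hypothetical.

```python
import bisect
from itertools import accumulate

def is_estimate(values, weights):
    """Self-normalized importance-sampling estimate of E[g | data]."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def is_hpd_interval(values, weights, level=0.95):
    """Shortest interval (g_delta, g_{delta+level}) over candidate deltas."""
    total = sum(weights)
    pairs = sorted(zip(values, weights))            # order by sampled value
    vals = [v for v, _ in pairs]
    cum = list(accumulate(w / total for _, w in pairs))

    def quantile(p):
        # M_p = number of leading cumulative weights that are <= p
        m_p = bisect.bisect_right(cum, p)
        return vals[min(max(m_p, 1), len(vals)) - 1]

    best = None
    for delta in cum:
        if delta + level > 1.0:
            break
        lo, hi = quantile(delta), quantile(delta + level)
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    # Fall back to the full range if no candidate delta is admissible.
    return best if best is not None else (vals[0], vals[-1])

# Toy example: right-skewed values with equal weights.
toy_vals = [i * i for i in range(1, 65)]
print(is_estimate([1.0, 2.0, 3.0], [1.0, 1.0, 2.0]))
print(is_hpd_interval(toy_vals, [1.0] * 64, level=0.875))
```

For the right-skewed toy sample the shortest interval starts at the smallest value, which is the behavior expected of an HPD interval, whereas an equal-tailed interval would cut off the lower end.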

Simulation Results
We use Monte Carlo simulation to analyze the behavior of the different estimators derived in the preceding sections in terms of the expected value (EV) and mean squared error (MSE). Progressive first-failure censored samples are generated for different censoring schemes $(k, n, m, R_1, \ldots, R_m)$ and various parameter values of the IWD using the algorithm originally proposed by [17].
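One way to generate such samples is sketched below. It follows the well-known uniform-transformation algorithm of Balakrishnan and Sandhu for progressive Type-II censoring, combined with the fact that a progressive first-failure censored sample from F is a progressive Type-II censored sample from $1 - (1 - F)^k$. Whether this coincides in every detail with the algorithm used for the simulations is an assumption, as are the IWD parametrization $F(x) = e^{-\lambda x^{-\alpha}}$ and the function names.

```python
import math
import random

def iw_quantile(u, alpha, lam):
    """Inverse of F(x) = exp(-lam * x**(-alpha)) for 0 < u < 1."""
    return (-math.log(u) / lam) ** (-1.0 / alpha)

def progressive_first_failure_sample(alpha, lam, k, R, rng):
    """Generate x_1 < ... < x_m for scheme R = (R_1, ..., R_m), group size k."""
    m = len(R)
    w = [rng.random() for _ in range(m)]
    # V_i = W_i ** (1 / (i + R_m + R_{m-1} + ... + R_{m-i+1}))
    v, tail = [], 0
    for i in range(1, m + 1):
        tail += R[m - i]
        v.append(w[i - 1] ** (1.0 / (i + tail)))
    sample, prod = [], 1.0
    for i in range(1, m + 1):
        prod *= v[m - i]                     # V_m * V_{m-1} * ... * V_{m-i+1}
        # U_i = 1 - prod is a progressive Type-II uniform order statistic;
        # the first-failure step maps it to 1 - (1 - U_i) ** (1 / k).
        u_ff = 1.0 - prod ** (1.0 / k)
        sample.append(iw_quantile(u_ff, alpha, lam))
    return sample

rng = random.Random(2019)   # fixed seed so the illustration is reproducible
print(progressive_first_failure_sample(2.0, 1.0, 2, [1, 0, 1, 0, 3], rng))
```

In this usage example m = 5 failures are observed and R sums to 5, so the scheme corresponds to n = 10 groups of k = 2 units.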
In general, we let α = 2 and λ = 1, so that the entropy is 1.172676. We use the 'optim' command in the R software (version 3.6.1, R Foundation for Statistical Computing, Vienna, Austria) to obtain the approximate MLEs of α, λ, and entropy presented in Table 1. The Bayesian estimates under both asymmetric and symmetric loss functions are computed by the Lindley method and Importance Sampling. For the Bayes estimation, we set the hyperparameters to a = 1, b = 1 for Tables 2-7 and a = 0, b = 0 for Tables 8 and 9. Under the LLF, we let p = 0.5 and p = 1. Under the GELF, we choose q = −0.1 and q = 1. We derive 95% asymptotic intervals of the parameters using the MLEs and log-transformed MLEs, as well as 95% HPD intervals. For simplicity, the censoring schemes are abbreviated; for example, (0 * 5) represents (0, 0, 0, 0, 0) and ((1, 0) * 2) represents (1, 0, 1, 0). Tables 2-6 and 8 present the Bayes estimation of α, λ, and entropy using the Lindley method. The Bayes estimation based on Importance Sampling is shown in Tables 7 and 9. In Table 10, the interval estimation of entropy is presented.
As a whole, the EVs and MSEs of the parameters and entropy all decrease significantly as the sample size n increases. In Tables 1-9, with m and n fixed, the EVs and MSEs of the parameters and entropy decrease as the group size k increases. Furthermore, with k and n fixed, the EVs and MSEs of the parameters and entropy decrease as m increases. Bayesian estimates with a = 1, b = 1 are more precise than those with a = 0, b = 0, the so-called non-informative case. The MLE and the Bayes estimates based on the Lindley method perform better than those based on the Importance Sampling procedure. The Bayes estimates based on the Lindley method are slightly more precise than the MLE. For LLF, choosing p = 1 seems to be better than p = 0.5. For GELF, q = −1 performs as well as q = 1.
In Tables 7 and 9, we observe that a few censoring schemes, such as (0 * 24, 25) and (0 * 34, 35), perform poorly.
In Table 10, the average length (AL) decreases as the sample size n increases. Moreover, based on AL, HPD intervals are more precise than confidence intervals. Among the confidence intervals, those based on log-transformed MLEs perform much better than those based on MLEs. In almost all cases, the coverage probabilities (CPs) of the entropy intervals attain their nominal confidence levels.

Real Data Analysis
We now analyze a real data set to apply the approaches put forward in the sections above. The data set was analyzed by [7,18]. The data show the survival times, in days, of guinea pigs injected with various species of tubercle bacilli. The quantity of regimen is the logarithm of the quantity of bacillary units in 0.5
Before analyzing the data, we want to test whether the IWD fits the complete data well. To begin with, from [7], we conclude that the failure rate function of these data is unimodal, so it is reasonable to analyze the data using the IWD. Then, we assess the goodness of fit of the IWD using the MLE. We compute − ln(L) and the Kolmogorov-Smirnov (K-S) statistic with its associated p-value, reported in Table 11. According to the p-value, the IWD fits the complete data well.
Table 11. Summary for model fit using − ln L, K-S statistic and associated p-value.
Now, we consider the censored data to illustrate the previous approaches. To generate the first-failure censored sample, we randomly sort the given data into n = 36 groups with k = 2 identical units in each group, which gives the first-failure censored sample: 12, 15, 22, 24, 32, 32, 33, 34, 38, 38, 43, 44, 48, 52, 54, 55, 56, 58, 58, 60, 60, 61, 63, 65, 65, 68, 70, 70, 73, 76, 84, 91, 109, 110, 129, 143. Then, we produce samples from the above sample with m = 18 using three different progressive first-failure censoring schemes, namely (18, 0 * 17), (1 * 18) and (0 * 17, 18). The results are organized in Table 12. In Table 13, for MLE, we calculate the EVs, MSEs, and confidence intervals of the parameters and entropy; for Bayes estimation, we obtain the EVs, MSEs, and HPD intervals of entropy and the two parameters. The estimates of α, λ, and entropy using the MLE and the Importance Sampling method are relatively close.
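The step from the first-failure censored sample to a progressively censored one can be illustrated in code. The sketch below shows one plausible way to apply a scheme $(R_1, \ldots, R_m)$: the smallest remaining value is recorded as the next failure, and $R_i$ of the remaining groups are then withdrawn at random. Whether the withdrawals behind Table 12 were drawn exactly in this way is an assumption, and the function name is hypothetical.

```python
import random

# First-failure censored sample listed in the text (n = 36 groups, k = 2).
first_failure = [12, 15, 22, 24, 32, 32, 33, 34, 38, 38, 43, 44, 48, 52, 54,
                 55, 56, 58, 58, 60, 60, 61, 63, 65, 65, 68, 70, 70, 73, 76,
                 84, 91, 109, 110, 129, 143]

def apply_progressive_scheme(sample, R, rng):
    """Return the m observed failures under the removal scheme R."""
    if len(sample) != len(R) + sum(R):
        raise ValueError("scheme must satisfy n = m + R_1 + ... + R_m")
    remaining = sorted(sample)
    observed = []
    for r in R:
        observed.append(remaining.pop(0))          # next observed failure
        for _ in range(r):                         # withdraw r surviving groups
            remaining.pop(rng.randrange(len(remaining)))
    return observed

rng = random.Random(0)   # fixed seed; the withdrawals themselves are random
for scheme in ([18] + [0] * 17, [1] * 18, [0] * 17 + [18]):
    print(apply_progressive_scheme(first_failure, scheme, rng))
```

Only the scheme (0 * 17, 18) is deterministic: all withdrawals happen after the 18th failure, so the observed sample consists of the 18 smallest values. For the other two schemes the output depends on which groups are withdrawn.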

Conclusions
In this article, the problem of statistical inference on the parameters and entropy of the IWD under progressive first-failure censoring has been considered. Both maximum likelihood estimation and Bayesian estimation are investigated. For Bayesian estimation, we apply the Lindley and Importance Sampling methods to approximate the Bayesian estimates under both asymmetric and symmetric loss functions. We construct approximate intervals based on the MLEs and log-transformed MLEs. In addition, we use the Importance Sampling method to derive the HPD intervals. Then, we compare the performance of the estimates through EV and MSE. Although this paper considers the estimation of entropy under the progressive first-failure censoring scheme only, similar methods can be extended to other, more efficient and complex censoring schemes. This direction remains promising and deserves further attention and work.