Bayesian Estimation of Entropy for Burr Type XII Distribution under Progressive Type-II Censored Data

Abstract: With the rapid development of statistics, information entropy has been proposed as an important indicator for quantifying information uncertainty. In this paper, maximum likelihood and Bayesian methods are used to obtain estimators of the entropy of a two-parameter Burr type XII distribution under progressive type-II censored data. For maximum likelihood estimation, the asymptotic confidence intervals of entropy are calculated. For Bayesian estimation, we consider non-informative and informative priors respectively, and both asymmetric and symmetric loss functions are adopted. Meanwhile, the posterior risk is also calculated to evaluate the performances of the entropy estimators against the different loss functions. In a numerical simulation, the Lindley approximation and the Markov chain Monte Carlo method were used to obtain the Bayesian estimates, and the highest posterior density credible intervals of the entropy were derived. Finally, average absolute bias and mean square error were used to evaluate the estimators under the different methods, and a real dataset was analyzed to illustrate the feasibility of the above estimation model.


Introduction
The Burr type XII distribution was first proposed by Burr in [1], together with eleven other types of Burr distributions. In recent years, the Burr type XII distribution has been applied widely in industry, physics and survival analysis as a more flexible alternative to the Weibull distribution. It is also called the Singh-Maddala distribution, and is one of the "generalized log-logistic distributions." The cumulative distribution function (cdf) and probability density function (pdf) of this distribution are, respectively,

F(x) = 1 − (1 + x^β)^(−α), x > 0, (1)

and

f(x) = αβ x^(β−1) (1 + x^β)^(−(α+1)), x > 0, (2)

where β and α are both shape parameters. In recent years, many researchers have investigated estimation based on the Burr distribution. The probabilistic and statistical properties of Burr type XII and its relevance to other distributions are discussed in [2]. Jaheen and Okasha (2011) [3] used the E-Bayesian method to calculate estimates of the unknown parameter of a Burr type XII distribution with type-II censored data, and compared it with a classical estimation method. Wu et al. (2010) [4] constructed optimal confidence regions for the two parameters and a confidence interval for one shape parameter (β in (1)) of the Burr type XII distribution using several pivotal quantities under progressive type-II censoring.
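As a quick sanity check, the cdf (1) and pdf (2) can be coded directly (a minimal sketch; the function names are ours, not from the paper):

```python
def burr_cdf(x, alpha, beta):
    """Equation (1): F(x) = 1 - (1 + x^beta)^(-alpha), x > 0."""
    return 1.0 - (1.0 + x ** beta) ** (-alpha)

def burr_pdf(x, alpha, beta):
    """Equation (2): f(x) = alpha*beta*x^(beta-1)*(1 + x^beta)^(-(alpha+1)), x > 0."""
    return alpha * beta * x ** (beta - 1.0) * (1.0 + x ** beta) ** (-(alpha + 1.0))
```

The pdf is the derivative of the cdf, which gives an easy numerical consistency check.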
Censoring is common in many fields, such as pharmacology, social economics and engineering, especially in reliability and survival analysis. In actual production it is often difficult to observe the sample data completely due to time and cost constraints, so censored data are more practical and efficient. Because of the various conditions of life experiments, different censoring types have been suggested. For type-I censoring, the experimental stop time is a pre-determined time T, rather than the time when all items fail. For type-II censoring, the experimental stop time is the occurrence time of a fixed number of failures. At the stop time, the remaining items are removed and the experiment does not continue. However, neither scheme allows items to be removed during the experiment, so progressive censoring has been proposed as a more applicable scheme. Suppose that n independent identically distributed items are tested in a survival analysis, (X_1, X_2, ..., X_m) (1 ≤ m ≤ n) is the observed sample of failures and (R_1, ..., R_{m−1}, R_m) represents the numbers of corresponding items removed from the test. Let x_i represent the observed value of X_i. At the time of the first failure X_1, R_1 surviving items are randomly removed and are no longer part of the experiment. At the second failure time x_2, R_2 surviving items are randomly removed from the experiment. This action is repeated until the mth failure occurs, at which moment R_m = n − m − ∑_{i=1}^{m−1} R_i. Obviously, the censored sample size is n − m. As a general scheme, progressive type-II censoring contains type-II censoring (R_1 = ... = R_{m−1} = 0, R_m = n − m) and the complete sampling case (R_1 = ... = R_m = 0) as special cases.
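The removal mechanism above can be simulated directly. The sketch below uses the standard Balakrishnan-Sandhu algorithm for generating progressive type-II censored uniforms, then transforms them through the Burr XII quantile function; the function names and the algorithm choice are ours, not the paper's:

```python
import math
import random

def burr_quantile(u, alpha, beta):
    """Inverse cdf of Burr XII: F^{-1}(u) = ((1 - u)^(-1/alpha) - 1)^(1/beta)."""
    return ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** (1.0 / beta)

def progressive_type2(n, R, alpha, beta, rng=None):
    """Draw X_1 < ... < X_m under the censoring scheme R = (R_1, ..., R_m)."""
    rng = rng or random.Random()
    m = len(R)
    assert n == m + sum(R), "scheme must satisfy n = m + sum(R_i)"
    w = [rng.random() for _ in range(m)]
    # V_i = W_i^{1/(i + R_m + ... + R_{m-i+1})}, i = 1, ..., m (1-based index)
    v = [w[j] ** (1.0 / ((j + 1) + sum(R[m - 1 - j:]))) for j in range(m)]
    # U_i = 1 - V_m V_{m-1} ... V_{m-i+1} are progressive type-II uniforms
    u = [1.0 - math.prod(v[m - 1 - j:]) for j in range(m)]
    return [burr_quantile(ui, alpha, beta) for ui in u]
```

Setting R = (0, ..., 0, n − m) reproduces ordinary type-II censoring, and R = (0, ..., 0) the complete sample, matching the special cases noted above.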
Up to now, many scholars have employed censored data to study the estimation of parameters of different distributions in different experimental situations. For example, Panahi and Sayyareh (2014) [5] employed a type-II censored sample to compute the maximum likelihood estimators via the expectation-maximization algorithm, and obtained the Bayesian estimators of the Burr type XII parameters. Qin and Gui (2020) [6] derived the maximum likelihood and Bayesian estimates of the Burr type XII parameters based on a competing risks model, and proved the existence and uniqueness of the maximum likelihood estimators. Elsagheer (2016) [7] adopted maximum likelihood, Bayesian, and bootstrap estimation methods for partially accelerated life tests of the power hazard function with progressive type-II censoring schemes. Censored data have been widely studied because of their practicability and commonness. However, the use of censored data inevitably causes a loss of information (compared with complete data), and current research on information entropy with censored data is relatively lacking. Therefore, for applicability and flexibility, we adopt progressive type-II censored data to estimate information entropy. For more discussion on different censoring schemes, refer to [8][9][10].
Information is an abstract concept. Faced with a large amount of data, we can easily measure the quantity of data, but it is not clear how much information the data contain, that is, whether the data are valuable. Shannon put forward information entropy in 1948 as an index to describe the uncertainty of information. Censored samples inevitably lead to a loss of information, which in turn affects the estimates of information entropy, so quantifying the information entropy under different censoring cases is necessary. Additionally, owing to the emergence of incomplete samples, the estimation of information entropy has aroused the interest of several authors in recent years. For example, Patra et al. [11] studied the problem of estimating the entropy of two exponential populations with scale and ordered location parameters under several censoring schemes. With the extensive application of the Burr distribution in reliability, biology, economics, energy, meteorology, and other fields, its entropy is also of great help to many scholars. In [12], two entropy-based methods were used to apply the extended Burr XII distribution to six peak-flow datasets, and quantiles (discharges) corresponding to different return periods were computed. Besides, in spectrum analysis, entropy and the maximum entropy principle (MEP) have many applications as important subjects of signal processing. According to the MEP, it is helpful to select the probability law with maximum entropy before using a probability law in an inference model; Ali et al. [13] presented this process comprehensively. Therefore, the estimation of information entropy is of practical significance to industrial production and experimental design.
Additionally, information entropy is of great research significance in other fields such as physics, communication, economics and so on. Many researchers have contributed to the study of entropy. Sunoj et al. (2012) [14] introduced a Shannon entropy function based on quantiles and studied the properties of the residual entropy function. AboEleneen (2011) [15] simplified the entropy and derived some recurrence relations under progressively type-II censored samples. Lee [16] employed generalized progressive hybrid censored data to study entropy estimation using ML and Bayesian methods under the inverse Weibull distribution. Zhao et al. [17] researched the empirical entropy method under type-II censored data and compared it with the empirical likelihood method using a simulation. The famous Shannon information entropy is defined as

H(f) = −∫ f(x) ln f(x) dx, (3)

where X is a continuous random variable with pdf f(x). Bayesian estimation is a newer and more practical method than classical estimation. Its basic idea is to combine the prior information and the sample information to obtain the posterior information that we will use, which cannot be accomplished by classical estimation. That means the Bayesian method can utilize prior information besides likelihood information, which can make up for the loss of information caused by censoring to some extent. Based on this idea, we adopt the Bayesian method to derive and calculate the information entropy under different prior distributions. Many scholars have used the Bayesian method to make statistical inferences based on the Burr distribution, most of them concerning parameter estimation. Qin and Gui (2020) [6] obtained Bayes estimators and associated credible intervals under different loss functions using the Markov chain Monte Carlo (MCMC) method with progressive type-II censored data; their inference is based on a competing risks model. Maurya et al.
(2017) [18] made use of the Tierney and Kadane (TK) method and an importance sampling procedure to derive Bayes estimators under different loss functions with censored data. To the best of our knowledge, no work has been done on applying the Bayesian method to the entropy estimation of a Burr XII distribution with progressive censored data. Furthermore, in the Bayesian method we chose an approximation called the Lindley method, which is based on a third-order expansion of the Taylor formula. Theoretically, it performs better than the TK method (a second-order expansion).
The remaining structure is as follows: Section 2 investigates progressive type-II censoring and derives the maximum likelihood (ML) estimator of entropy with progressive censored data; the corresponding asymptotic confidence intervals (ACIs) of entropy are also given. In Section 3, we obtain the Bayesian estimators and corresponding posterior risks (PRs) against different loss functions with informative and non-informative priors. Further, we utilize the Lindley method and the Markov chain Monte Carlo (MCMC) algorithm to calculate the Bayesian estimates, and the highest posterior density (HPD) credible intervals of entropy are obtained. In Section 4, the Monte Carlo method's application in the simulation is described, and we compare the results under different prior distributions and loss functions; a set of real data is analyzed to illustrate the applicability of the above methods. In Section 5 we draw some conclusions.

Maximum Likelihood Estimation
Suppose that (X_1, X_2, ..., X_m) is the progressive type-II censored sample observed from a test with n items. Every X_i follows the distribution defined by (1) and x_i represents the observed value of X_i. Thus the likelihood function with m progressive type-II censored observations is given by

L(β, α) = c ∏_{i=1}^m f(x_i) [1 − F(x_i)]^{R_i} = c α^m β^m ∏_{i=1}^m x_i^(β−1) (1 + x_i^β)^(−α(1+R_i)−1), (4)

where c = n(n − R_1 − 1)(n − R_1 − R_2 − 2) ⋯ (n − R_1 − ⋯ − R_{m−1} − m + 1). Additionally, the corresponding log-likelihood function is

l(β, α) = ln c + m ln α + m ln β + (β − 1) ∑_{i=1}^m ln x_i − ∑_{i=1}^m [α(1 + R_i) + 1] ln(1 + x_i^β). (5)

Then the likelihood equations for β and α can be obtained as

∂l/∂β = m/β + ∑_{i=1}^m ln x_i − ∑_{i=1}^m [α(1 + R_i) + 1] x_i^β ln x_i / (1 + x_i^β) = 0 (6)

and

∂l/∂α = m/α − ∑_{i=1}^m (1 + R_i) ln(1 + x_i^β) = 0. (7)

Obviously, the ML estimates β̂ and α̂ of β and α are the solutions of (6) and (7), which can be obtained by a numerical method such as the Newton-Raphson method.
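Numerically, (7) gives α in closed form for fixed β, so the two-dimensional problem reduces to a one-dimensional profile search over β. The sketch below (our code, assuming the profile likelihood is unimodal; the paper itself suggests Newton-Raphson) maximizes the profile log-likelihood by golden-section search:

```python
import math

def profile_loglik(beta, x, R):
    """Profile log-likelihood l(beta, alpha_hat(beta)); the constant ln c is dropped."""
    m = len(x)
    t = sum((1 + r) * math.log1p(xi ** beta) for xi, r in zip(x, R))
    alpha = m / t  # alpha_hat(beta), the closed-form solution of (7)
    return (m * math.log(alpha) + m * math.log(beta)
            + (beta - 1) * sum(math.log(xi) for xi in x)
            - sum((alpha * (1 + r) + 1) * math.log1p(xi ** beta)
                  for xi, r in zip(x, R)))

def mle_burr(x, R, lo=1e-3, hi=50.0, iters=120):
    """Golden-section search for the maximizing beta, then alpha from (7)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if profile_loglik(c, x, R) < profile_loglik(d, x, R):
            a = c
        else:
            b = d
    beta = 0.5 * (a + b)
    alpha = len(x) / sum((1 + r) * math.log1p(xi ** beta) for xi, r in zip(x, R))
    return alpha, beta
```

On a sample whose empirical quantiles closely match a Burr XII(α = 2, β = 3) population, the returned estimates should land near the true values.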
In line with the invariance property of ML estimation, it is not difficult to obtain the estimator of a function of the parameters. If the entropy in (3) under the Burr XII distribution can be simplified into a function of the parameters β and α, the ML estimation of entropy can be carried out.

Theorem 1. Let X be a random variable with cdf (1); then the entropy of X is

H(f) = −ln(αβ) + (α + 1)/α + (1 − 1/β)(ψ(α) − ψ(1)), (8)

where ψ is defined by ψ(z) = (d/dz) ln Γ(z), which is also called the digamma function, and Γ is the gamma function.

Proof. See Appendix A.
On the basis of Theorem 1, the ML estimator of entropy can be obtained as

Ĥ(f) = −ln(α̂β̂) + (α̂ + 1)/α̂ + (1 − 1/β̂)(ψ(α̂) − ψ(1)), (9)

where β̂ and α̂ are the ML estimates from (6) and (7).
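The closed form (8) can be cross-checked against direct numerical integration of −f ln f. The sketch below is ours (a pure-Python digamma via recurrence plus an asymptotic series, and a midpoint integration through the probability transform u = F(x)); it is a verification aid, not part of the paper:

```python
import math

def digamma(z):
    """Digamma psi(z) via downward recurrence plus an asymptotic series."""
    r = 0.0
    while z < 6.0:
        r -= 1.0 / z
        z += 1.0
    inv2 = 1.0 / (z * z)
    return r + math.log(z) - 0.5 / z - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

def burr_entropy(alpha, beta):
    """Closed form (8): H = -ln(ab) + (a+1)/a + (1 - 1/b)(psi(a) - psi(1))."""
    return (-math.log(alpha * beta) + (alpha + 1) / alpha
            + (1 - 1 / beta) * (digamma(alpha) - digamma(1.0)))

def burr_entropy_numeric(alpha, beta, n=200000):
    """Midpoint rule for E[-ln f(X)] = int_0^1 -ln f(F^{-1}(u)) du."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        q = ((1 - u) ** (-1 / alpha) - 1) ** (1 / beta)  # quantile F^{-1}(u)
        # using ln(1 + q^beta) = -(1/alpha) * ln(1 - u)
        total += (-math.log(alpha * beta) - (beta - 1) * math.log(q)
                  - ((alpha + 1) / alpha) * math.log(1 - u))
    return total / n
```

For α = β = 1 the distribution reduces to f(x) = (1 + x)^(−2), whose entropy is exactly 2, which (8) reproduces.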

Asymptotic Confidence Interval for Entropy
In order to obtain the asymptotic confidence interval (ACI) of information entropy, the observed Fisher information matrix of α and β is given first as

I(α, β) = [ −∂²l/∂α²    −∂²l/∂α∂β
            −∂²l/∂β∂α   −∂²l/∂β²  ], (10)

where I^(−1)(α, β), the inverse of I(α, β), is the variance-covariance matrix of β and α. When the sample size is large, the asymptotic distribution of the parameter estimators is

(β̂, α̂) ∼ N((β, α), I^(−1)(β̂, α̂)), (11)

in which the first and second elements on the diagonal of the matrix I^(−1)(β̂, α̂) are the estimated variances of β̂ and α̂, respectively.
Since we want the asymptotic confidence interval of the entropy in (8), which is a function of β and α, we use the delta method to calculate the variance of entropy. The delta method creates a linear approximation of a function so that the variance of the simpler linear function can be calculated. Note that

C = (∂H/∂β, ∂H/∂α) = ( −1/β + (ψ(α) − ψ(1))/β²,  −1/α − 1/α² + (1 − 1/β)ψ′(α) ), (12)

where ψ′(α) is the derivative of ψ(α) with respect to α (also called the trigamma function). Then, using the delta method, the variance of the entropy estimator can be approximated as

Var(Ĥ) = [C I^(−1)(β̂, α̂) C^t], (13)

where C^t is the transpose of C, with C evaluated at (β̂, α̂). Then, the 100(1 − γ)% ACI of entropy is

( Ĥ − z_{γ/2} √Var(Ĥ),  Ĥ + z_{γ/2} √Var(Ĥ) ), (14)

where z_{γ/2} is the upper (γ/2)th quantile of the standard normal distribution.
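The delta-method pipeline can be sketched end to end: approximate the observed information by finite differences of the log-likelihood (5), invert the 2×2 matrix, and combine with the entropy gradient. All helper names here are ours, and the finite-difference Hessian stands in for the analytic second derivatives:

```python
import math

def digamma(z):
    r = 0.0
    while z < 6.0:
        r -= 1.0 / z
        z += 1.0
    inv2 = 1.0 / (z * z)
    return r + math.log(z) - 0.5 / z - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

def trigamma(z):
    r = 0.0
    while z < 6.0:
        r += 1.0 / (z * z)
        z += 1.0
    inv = 1.0 / z
    inv2 = inv * inv
    return r + inv * (1 + inv * (0.5 + inv * (1/6 - inv2 * (1/30 - inv2 / 42))))

def loglik(alpha, beta, x, R):
    """Log-likelihood (5) without the constant ln c."""
    m = len(x)
    return (m * math.log(alpha) + m * math.log(beta)
            + (beta - 1) * sum(math.log(xi) for xi in x)
            - sum((alpha * (1 + r) + 1) * math.log1p(xi ** beta)
                  for xi, r in zip(x, R)))

def entropy_aci(alpha_hat, beta_hat, x, R, z=1.96, h=1e-4):
    """ACI (14) of entropy at the ML estimates (z = z_{gamma/2})."""
    def l(a, b):
        return loglik(a, b, x, R)
    # second derivatives by central finite differences
    laa = (l(alpha_hat + h, beta_hat) - 2 * l(alpha_hat, beta_hat)
           + l(alpha_hat - h, beta_hat)) / h ** 2
    lbb = (l(alpha_hat, beta_hat + h) - 2 * l(alpha_hat, beta_hat)
           + l(alpha_hat, beta_hat - h)) / h ** 2
    lab = (l(alpha_hat + h, beta_hat + h) - l(alpha_hat + h, beta_hat - h)
           - l(alpha_hat - h, beta_hat + h) + l(alpha_hat - h, beta_hat - h)) / (4 * h ** 2)
    # invert the 2x2 observed information I = -Hessian
    det = laa * lbb - lab * lab
    s_aa, s_bb, s_ab = -lbb / det, -laa / det, lab / det
    # gradient C of H(alpha, beta) as in (12)
    dh_da = -1 / alpha_hat - 1 / alpha_hat ** 2 + (1 - 1 / beta_hat) * trigamma(alpha_hat)
    dh_db = -1 / beta_hat + (digamma(alpha_hat) - digamma(1.0)) / beta_hat ** 2
    var = dh_da ** 2 * s_aa + dh_db ** 2 * s_bb + 2 * dh_da * dh_db * s_ab
    H = (-math.log(alpha_hat * beta_hat) + (alpha_hat + 1) / alpha_hat
         + (1 - 1 / beta_hat) * (digamma(alpha_hat) - digamma(1.0)))
    half = z * math.sqrt(var)
    return H - half, H + half
```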

Bayes Estimation
Bayesian estimation is a more practical method than classical estimation methods and has attracted much interest from researchers in recent years. Many authors have adopted the Bayesian method to estimate parameters and related functions for different distributions. Fu et al. (2012) [19] considered Bayesian estimation for the Pareto distribution under three non-informative priors. Musleh and Helu (2014) [20] used ML, least squares, approximate ML and Bayesian estimation to make statistical inferences about the unknown parameters of inverse Weibull distributions under progressive type-II censoring schemes. Rastogi and Merovci (2017) [21] obtained Bayesian estimators of the parameters of the Weibull-Rayleigh distribution under asymmetric and symmetric loss functions. Singh and Tripathi [22] investigated Bayesian estimation of the unknown parameters of an inverse Weibull distribution based on progressive type-I interval censored data, and obtained the optimal censoring scheme. For more studies on Bayesian estimation, refer to [23][24][25][26].
In this paper, we use Bayesian estimation to obtain estimates of the entropy. The parameters β and α in (9) are treated as random variables that follow the prior distributions. The parameters of the prior distributions are called hyper-parameters, to distinguish them from the parameters β and α. The prior information combined with the likelihood function can be used to derive the posterior information. By comparing the posterior risks under different prior distributions and loss functions, the relative validity of the estimators can be evaluated.

Prior Distribution and Corresponding Posterior Distribution
In order to observe the influence of priors on the Bayesian estimators, we adopt an informative prior and a non-informative prior, respectively. For the informative prior, the gamma distribution is adopted for its flexibility. Moreover, it turns out that in the setting of this paper the gamma distribution is conjugate with respect to the parameter α, which is convenient for the implementation of the sampling algorithm later. Assume that α and β are independent random variables that follow Γ(a, b) and Γ(c, d), respectively. The form of their joint prior distribution is therefore

π₁(α, β) ∝ α^(a−1) e^(−bα) β^(c−1) e^(−dβ), α > 0, β > 0, (15)

where the hyper-parameters a, b, c, d are positive. Then the posterior distribution of β and α is (using (4) and (15))

π₁(β, α | x) ∝ α^(m+a−1) β^(m+c−1) e^(−bα−dβ) ∏_{i=1}^m x_i^(β−1) (1 + x_i^β)^(−α(1+R_i)−1), (16)
where x = (x_1, x_2, x_3, ..., x_m). When the prior distributions of β and α are both taken as non-informative priors, the form of their joint prior distribution is

π₂(α, β) ∝ 1/(αβ), α > 0, β > 0, (17)

which can be regarded as the limit of (15) as a, b, c, d → 0. Thus the posterior distribution of β and α is (using (4) and (17))

π₂(β, α | x) ∝ α^(m−1) β^(m−1) ∏_{i=1}^m x_i^(β−1) (1 + x_i^β)^(−α(1+R_i)−1). (18)

Loss Function
In Bayesian estimation, the loss function is used to evaluate the performance of estimators, and a penalty (the posterior risk) is assigned to each estimator. In fact, the loss function measures the gap between the estimates and the true values. Therefore, for each loss function we give in Table 1 the Bayesian estimator of H that minimizes the posterior risk (PR), together with the corresponding posterior risk. Different situations require different loss functions. With the purpose of acquiring Bayesian inference more comprehensively, we adopt asymmetric and symmetric loss functions: besides the squared error loss function (SELF), the weighted squared error loss function (WSELF), the precautionary loss function (PLF) and the K-loss function (KLF) are also included (see [27,28]). The Bayesian estimators of entropy under the different loss functions can be expressed as

Ĥ(f)_S = E(H(f)|x),  Ĥ(f)_W = [E(H(f)^(−1)|x)]^(−1),  Ĥ(f)_P = √(E(H(f)²|x)),  Ĥ(f)_K = √(E(H(f)|x)/E(H(f)^(−1)|x)), (19)

where Ĥ(f) represents an estimator of the entropy H(f) and the subscript indicates the loss function. For example, Ĥ(f)_S stands for the Bayesian estimator of entropy against SELF.
In order to calculate the Bayesian estimators in (19), we need to obtain posterior expectations such as E(H(f)|x). However, one can observe that the entropy estimators we want to calculate take the form of a ratio of two integrals, and it is not easy to simplify them into closed form. Hence, we consider two different methods, the Lindley approximation and the MCMC algorithm, to solve this problem.

Table 1. Bayesian estimators and corresponding posterior risks of different loss functions.

Loss Function | Bayesian Estimator | Posterior Risk
SELF: (d − H)² | E(H|x) | Var(H|x)
WSELF: (d − H)²/H | [E(H^(−1)|x)]^(−1) | E(H|x) − [E(H^(−1)|x)]^(−1)
PLF: (d − H)²/d | √(E(H²|x)) | 2[√(E(H²|x)) − E(H|x)]
KLF: (√(d/H) − √(H/d))² | √(E(H|x)/E(H^(−1)|x)) | 2[√(E(H|x) E(H^(−1)|x)) − 1]
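Once posterior draws of H(f) are available (e.g., from the MCMC algorithm of Section 3), all four estimators in (19) reduce to simple sample moments. A minimal sketch (our function name, assuming positive entropy draws so that E(H^(−1)|x) is well defined):

```python
import math

def bayes_estimates(h_draws):
    """Bayes estimators of H under SELF, WSELF, PLF and KLF from posterior draws."""
    n = len(h_draws)
    m1 = sum(h_draws) / n                     # E(H | x)
    m2 = sum(h * h for h in h_draws) / n      # E(H^2 | x)
    minv = sum(1.0 / h for h in h_draws) / n  # E(H^-1 | x)
    return {
        "SELF": m1,                   # posterior mean
        "WSELF": 1.0 / minv,          # [E(H^-1 | x)]^-1
        "PLF": math.sqrt(m2),         # sqrt(E(H^2 | x))
        "KLF": math.sqrt(m1 / minv),  # sqrt(E(H|x) / E(H^-1|x))
    }
```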

Lindley Approximation
In this subsection, we employ an approximation method proposed by Lindley in [29] to carry out the numerical calculation of the entropy estimators. Following Lindley's method, we can define I(x) as

I(x) = E[u(α, β) | x] = ∫∫ u(α, β) e^(l(α,β)+ρ(α,β)) dα dβ / ∫∫ e^(l(α,β)+ρ(α,β)) dα dβ, (20)

where u(α, β) is a function of β and α, l(α, β) is the log-likelihood (defined by (5)), and ρ(α, β) = ρ_i(α, β) = ln π_i(α, β) is the logarithm of the prior (defined by (15) and (17)), i = 1, 2. Further, the Lindley method approximates (20) by a third-order Taylor expansion about the ML estimates α̂ and β̂ of α and β. We use a hat to denote an estimator and subscripts to indicate partial derivatives; for instance, the second derivative of u(α, β) with respect to α is written u_αα. Similarly, σ_(i,j) represents the (i, j)th element of [−∂²l/∂α∂β]^(−1), the inverse of the observed information matrix, i, j = 1, 2. Using these expressions, each Bayesian estimator of entropy in (19) can be written in a specific approximate form; for example, the Bayesian estimator of entropy under SELF is obtained by taking u(α, β) = H(f) in the expansion of (20).
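For reference, the general two-parameter Lindley expansion can be sketched as follows (our notation for the standard form, with all quantities evaluated at the ML estimates, rather than the paper's exact display):

```latex
E\left[u(\alpha,\beta)\mid x\right] \;\approx\;
u \;+\; \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{2}\left(u_{ij} + 2\,u_{i}\,\rho_{j}\right)\sigma_{ij}
\;+\; \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2}\sum_{l=1}^{2}
l_{ijk}\,\sigma_{ij}\,\sigma_{kl}\,u_{l},
```

with (λ₁, λ₂) = (α, β), u_i = ∂u/∂λ_i, l_{ijk} the third partial derivatives of the log-likelihood, ρ_j = ∂ρ/∂λ_j and σ_{ij} the elements of the inverse observed information matrix.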
Thus the values of the entropy estimators in (19) can be calculated approximately using the Lindley method, and the corresponding posterior risk values for the different loss functions in Table 1 can also be obtained.

MCMC Method with Gibbs Sampling
Although it is easy to calculate Bayesian estimates by the Lindley method, it cannot provide interval estimates. In consideration of this, we adopt the MCMC method to compute the Bayesian estimates of entropy and obtain the corresponding HPD credible intervals. As a special case of the MCMC method, Gibbs sampling requires the full conditional distribution of each parameter to generate a Markov chain of samples. Taking the gamma prior as an example, the conditional posterior densities of α and β can be obtained respectively as

π₁(α | β, x) ∝ α^(m+a−1) exp{−α[b + ∑_{i=1}^m (1 + R_i) ln(1 + x_i^β)]}

and

π₁(β | α, x) ∝ β^(m+c−1) e^(−dβ) ∏_{i=1}^m x_i^(β−1) (1 + x_i^β)^(−α(1+R_i)−1).

It can be seen that π₁(α|β, x) is a gamma distribution, namely Γ(m + a, b + ∑(1 + R_i) ln(1 + x_i^β)). However, π₁(β|α, x) is not a well-known distribution, so we adopt the Metropolis-Hastings method with a normal proposal distribution to generate the sample β^(k). Thus, the process of generating the Markov chain samples can be described as in Algorithm 1.
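The MH-within-Gibbs step can be sketched as follows (our code and tuning constants, not the paper's Algorithm 1 verbatim): α is drawn from its conjugate gamma full conditional, and β is updated by a normal random-walk Metropolis-Hastings step.

```python
import math
import random

def log_beta_cond(beta, alpha, x, R, c, d):
    """Log of the unnormalized full conditional of beta under the Gamma(c, d) prior."""
    if beta <= 0.0:
        return -math.inf
    m = len(x)
    return ((m + c - 1.0) * math.log(beta) - d * beta
            + (beta - 1.0) * sum(math.log(xi) for xi in x)
            - sum((alpha * (1 + r) + 1.0) * math.log1p(xi ** beta)
                  for xi, r in zip(x, R)))

def gibbs_mh(x, R, a, b, c, d, n_iter=2000, beta0=1.0, step=0.3, seed=1):
    rng = random.Random(seed)
    beta = beta0
    draws = []
    for _ in range(n_iter):
        # alpha | beta, x ~ Gamma(m + a, rate = b + sum (1+R_i) ln(1+x_i^beta))
        rate = b + sum((1 + r) * math.log1p(xi ** beta) for xi, r in zip(x, R))
        alpha = rng.gammavariate(len(x) + a, 1.0 / rate)
        # beta | alpha, x via Metropolis-Hastings with a N(beta, step^2) proposal
        prop = rng.gauss(beta, step)
        if (math.log(rng.random())
                < log_beta_cond(prop, alpha, x, R, c, d)
                - log_beta_cond(beta, alpha, x, R, c, d)):
            beta = prop
        draws.append((alpha, beta))
    return draws
```

Entropy draws then follow by applying the closed form of Theorem 1 to each (α, β) pair, after discarding a burn-in portion of the chain.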
In the interest of obtaining the HPD intervals, arrange the K − M entropy samples remaining after a burn-in of size M in ascending order as H_(1) ≤ H_(2) ≤ ... ≤ H_(K−M). The 100(1 − γ)% HPD credible interval of entropy is then the shortest interval among

(H_(j), H_(j+[(K−M)(1−γ)])),  j = 1, 2, ..., (K − M) − [(K − M)(1 − γ)], (29)

where [(K − M)(1 − γ)] represents the largest integer less than or equal to (K − M)(1 − γ). The HPD credible interval is the shortest-length interval in (29).
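The shortest-interval search in (29) is a one-pass scan over the sorted draws; a minimal sketch (our function name):

```python
import math

def hpd_interval(samples, cred=0.95):
    """Shortest interval containing a fraction `cred` of the sorted draws, as in (29)."""
    s = sorted(samples)
    n = len(s)
    k = int(math.floor(cred * n))  # index span of each candidate interval
    widths = [s[i + k] - s[i] for i in range(n - k)]
    j = min(range(n - k), key=widths.__getitem__)
    return s[j], s[j + k]
```

For symmetric posteriors the HPD interval roughly coincides with the equal-tail interval; for skewed posteriors it is strictly shorter.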

Monte Carlo Simulation
In this subsection, the Monte Carlo simulation method is used to calculate the entropy estimates obtained by the different methods. In order to demonstrate the performance of the proposed methods more comprehensively, we chose different sample sizes and progressive censoring schemes, namely, Scheme 1: R_m = n − m, R_1 = R_2 = ... = R_{m−1} = 0; Scheme 2: R_1 = n − m, R_2 = R_3 = ... = R_m = 0.
The performance of the point and interval estimates is evaluated by several quantities. In order to compare the performance of the ML and Bayesian estimates of entropy, the average absolute bias (AB) and mean square error (MSE) are calculated respectively by

AB = (1/N) ∑_{i=1}^N |Ĥ_i − H|  and  MSE = (1/N) ∑_{i=1}^N (Ĥ_i − H)²,

where N is the number of simulation replications, Ĥ_i is the estimate in the ith replication and H is the true entropy. The posterior risks (Table 1) were used to evaluate the performance of the Bayesian estimates. To show the results more visually, Figure 1 was drawn with a censoring ratio of 50% and Scheme 2, where the horizontal axis shows different values of n and the vertical axis shows the ABs under the different methods. Given the censoring scheme (R_1, R_2, ..., R_m) and the values of n and m, the simulation steps are shown in Algorithm 2.
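The two comparison criteria are straightforward sample averages over the N replicates (a trivial sketch, with our function name):

```python
def ab_mse(estimates, true_h):
    """Average absolute bias and mean square error of replicate entropy estimates."""
    n = len(estimates)
    ab = sum(abs(e - true_h) for e in estimates) / n
    mse = sum((e - true_h) ** 2 for e in estimates) / n
    return ab, mse
```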
Here are some observations regarding the performance of the entropy estimators according to Tables 2-5 and Figure 2:
• Comparing the results in Table 2, for fixed n and increasing m, the MSEs and ABs of the entropy estimates decrease as expected, regardless of the censoring scheme. For a fixed censoring ratio (e.g., n = 30, m = 15 and n = 40, m = 20), the MSEs and ABs also decrease distinctly as n increases. The bold values in Table 2 indicate the best performance under each method.
• In Tables 2-4, different censoring schemes do have impacts on the estimated results, among which Schemes 2 and 1 performed best and worst, respectively, in terms of MSEs, ABs and PRs.
• According to Figure 1 and Table 2, the following conclusions can be drawn. In Bayesian estimation, the MSEs and ABs of the entropy estimates with the informative prior are smaller than those with the non-informative prior, for both the Lindley and MCMC methods. Meanwhile, under the same prior, the MSEs and ABs under the Lindley method are close to those under MCMC, and even slightly smaller. In Table 2, the ML estimates are worse than the Bayesian estimates, whether obtained by the Lindley or the MCMC method, in terms of MSEs and ABs. The elementary reason is that Bayesian estimation combines the data information with the priors of the parameters, which ML estimation cannot do.
• As shown in Tables 3 and 4, the estimates performed best against KLF and worst against SELF in terms of PRs, with both the Lindley and MCMC methods, for all censoring schemes (except for some small sample sizes). In Figure 2, it can be observed more intuitively that the posterior risk against SELF is the largest for small parameter values, and that the PRs against WSELF and PLF are very close in most cases. For the convenience of comparison, several posterior risk values are marked in the figure.
It can be seen that the trends of the PRs are very similar under the two kinds of priors, but the PRs under the non-informative prior are always strictly greater than those under the informative prior for the same parameters.
• In Table 5, it can be noted that the coverages (CVs) of the HPD credible intervals are closer to the nominal level (95%) than those of the ACIs. The average widths (AWs) of the ACIs and the HPD intervals under the non-informative prior are very close. However, with the informative prior, the HPD interval performs better than the ACI for each censoring scheme.

Real Data Analysis
In this section, a set of real data is analyzed to illustrate the feasibility of the above model. This dataset, used in [2], concerns the first failure times of small electric trolleys used for transportation and delivery within large manufacturing plants. Lio et al. [30] used a goodness-of-fit test to check whether the dataset is reasonably described by the Burr XII distribution, and gave the ML estimates of the parameters as α̂ = 0.08 and β̂ = 5.47, respectively. Meanwhile, the Kolmogorov-Smirnov test gave p = 0.1008 and AIC = 4.1757. These results indicate that the Burr XII distribution fits this dataset reasonably well. For easy reference, the dataset is reproduced in Table 6. The censoring scheme we chose was R_1 = R_2 = ... = R_15 = 0, R_16 = 4, and the resulting estimates are reported in Table 7. Additionally, the 95% credible intervals using the MCMC0 and MCMC1 methods are, respectively, (3.441, 7.304) and (3.371, 6.971). If an experimenter obtains an extreme value of entropy from real data and is not sure whether it should be discarded or recalculated, the credible interval can serve as a reference. However, the widths of the credible intervals here are relatively wide; the primary reason may be the small sample size of the real data. Through the simulation results in the AW column (first column) of Table 5, it can be concluded that a larger sample size leads to a shorter interval length. Therefore, in practice, a larger sample size is helpful for obtaining a shorter posterior interval.

Conclusions
In this paper, we investigated the statistical inferences for the information entropy of Burr XII distribution using progressive type-II censored data. Based on point and interval estimation, frequentist and Bayesian estimations were developed. In the Bayesian section, we demonstrated the performances of estimators under different loss functions, prior distributions and censoring schemes, which is helpful for the selection of models with entropy, such as those using the maximum entropy principle.
We compared the ML and Bayesian estimators obtained using the Lindley and MCMC methods in terms of MSEs (ABs), and found that the Bayesian estimators performed significantly better than the ML estimators, and that the Bayesian estimators with the informative prior performed better than those with the non-informative prior. Additionally, it was found that different censoring schemes do have impacts on the estimated results, among which Scheme 2 performed best. If one wants to estimate the entropy of the Burr XII distribution using progressive censored data, as in this article, Bayesian estimation with the informative prior and censoring Scheme 2 may be the appropriate choice. Posterior risks were used to evaluate the performance of the estimators against the different loss functions, which provides a variety of comparative references.
Appendix A. Proof of Theorem 1

From (2) and (3), the entropy of X can be written as

H(f) = −E[ln f(X)] = −ln(αβ) − (β − 1)E(ln X) + (α + 1)E[ln(1 + X^β)],

where X is a random variable with pdf f(x). Thus it is necessary to further deduce E(ln X) and E[ln(1 + X^β)]. Taking the derivative with respect to α on both sides of the identity ∫_0^∞ αβ x^(β−1)(1 + x^β)^(−α−1) dx = 1 gives E[ln(1 + X^β)] = 1/α. Making the variable substitution 1 + x^β = 1/t, one can show that E(ln X) = (1/β)[ψ(1) − ψ(α)]. Combining these two expectations yields (8).