Bayesian and Non-Bayesian Inference for the Generalized Pareto Distribution Based on Progressive Type II Censored Sample

Abstract: In this paper, we first derive the maximum likelihood estimators of the two unknown parameters and of the reliability and hazard functions of the generalized Pareto distribution under progressive Type II censoring. Next, we discuss the asymptotic confidence intervals for the two unknown parameters and the reliability and hazard functions by using the delta method. Then, based on the bootstrap algorithm, we obtain two further pairs of approximate confidence intervals. Furthermore, by applying Markov Chain Monte Carlo techniques, we derive the Bayesian estimates of the two unknown parameters and the reliability and hazard functions under various balanced loss functions, together with the corresponding confidence intervals. A simulation study was conducted to compare the performances of the proposed estimators, and a real dataset analysis was carried out to illustrate the proposed methods.


Introduction
In recent years, with the continuous development of statistics, research on the generalized Pareto distribution (GPD) has gradually deepened. The generalized Pareto distribution is an important distribution in statistics, which is widely used in the fields of finance and engineering. Zhang [1] proposed the likelihood moment estimation method for the parameters of the generalized Pareto distribution. By combining maximum likelihood estimation and goodness of fit, Husler et al. [2] proposed a new estimator for the generalized Pareto distribution. Rezaei et al. [3] derived the maximum likelihood estimators, Bayes estimators and some confidence intervals for stress-strength reliability based on progressive Type II censoring schemes.
The cumulative distribution function (cdf) and the probability density function (pdf) of the GPD are, respectively, given by

F(x) = 1 − (1 + λx)^(−α),  (1)

f(x) = αλ(1 + λx)^(−(α+1)),  (2)

where x > 0, α > 0 and λ > 0. Here, α and λ are the shape and scale parameters, respectively. Then, the reliability function r(x) and hazard function h(x) are, respectively, derived as

r(x) = (1 + λx)^(−α),  (3)

h(x) = f(x)/r(x) = αλ/(1 + λx).  (4)

In many cases, censored samples are more popular than complete samples because they require less test time and reduce the associated costs. The progressive Type II censoring scheme is a common sampling method. It has the flexibility to remove units at points other than the end point of the experiment. The following is a brief introduction to the progressive Type II censoring scheme. Assume that there are n independent and identically distributed units in the experiment, and that m failures will be observed before the end of the experiment. At the time of the first failure X_{1:m:n}, R_1 units are randomly removed from the n − 1 surviving units. When the second failure X_{2:m:n} occurs, R_2 units are randomly removed from the surviving n − R_1 − 2 units. Finally, when the mth failure occurs, the remaining R_m = n − ∑_{i=1}^{m−1} R_i − m units are all removed and the experiment ends. Then, we obtain the censoring scheme (R_1, R_2, ..., R_m). There are two special cases worth noting. When R_1 = R_2 = ... = R_{m−1} = 0 and R_m = n − m, the scheme (0, 0, ..., 0, n − m) is the conventional Type II censoring scheme. In addition, when R_1 = R_2 = ... = R_m = 0, the censoring scheme (0, 0, ..., 0) corresponds to the complete sample. Some authors have studied different inference problems based on the progressive Type II censoring scheme. Wang et al. [4] studied inverse estimation for both parameters in a certain family of two-parameter lifetime distributions and some confidence intervals based on the progressive Type II censoring scheme. Kim et al. [5] proposed Bayes and maximum likelihood estimators for the parameters and the reliability function of the exponentiated Weibull lifetime distribution on the basis of progressive Type II censored samples. Ahmed [6] obtained the maximum likelihood and Bayes estimators of both parameters and the reliability and hazard functions of a two-parameter bathtub-shaped lifetime model under the progressive Type II censoring scheme.
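The sampling mechanism above can be sketched numerically. The following Python snippet (the paper's own code is in R; this is an illustrative translation) draws a progressive Type II censored sample from the GPD via the uniform-transformation algorithm of Balakrishnan and Sandhu; the cdf form F(x) = 1 − (1 + λx)^(−α) follows the reconstruction above, and all function names are assumptions of this sketch.

```python
import numpy as np

def gpd_quantile(u, alpha, lam):
    # Inverse of the cdf F(x) = 1 - (1 + lam*x)^(-alpha), x > 0.
    return ((1.0 - u) ** (-1.0 / alpha) - 1.0) / lam

def progressive_type2_sample(alpha, lam, R, seed=None):
    """Draw a progressive Type II censored GPD sample with removal
    scheme R = (R_1, ..., R_m) via the Balakrishnan-Sandhu
    uniform-transformation algorithm."""
    rng = np.random.default_rng(seed)
    m = len(R)
    W = rng.uniform(size=m)
    # V_i = W_i^(1 / (i + R_m + ... + R_{m-i+1})), i = 1, ..., m
    V = np.array([W[i] ** (1.0 / (i + 1 + sum(R[m - i - 1:]))) for i in range(m)])
    # U_i = 1 - V_m V_{m-1} ... V_{m-i+1} are progressively censored U(0,1) order statistics
    U = 1.0 - np.cumprod(V[::-1])
    return gpd_quantile(U, alpha, lam)

# Example: n = 9 units, m = 6 observed failures, scheme (1, 0, 1, 0, 1, 0)
x = progressive_type2_sample(alpha=0.5, lam=1.2, R=[1, 0, 1, 0, 1, 0], seed=42)
```

The returned failure times are ordered by construction, since the uniform progressive order statistics U_1 < U_2 < ... < U_m are passed through the increasing quantile function.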
For Bayesian inference, we need to calculate posterior quantities of interest, which essentially amounts to computing high-dimensional integrals of functions. However, in practice, it is difficult to compute those high-dimensional integrals because they are complex and rarely available in closed form. The Markov Chain Monte Carlo (MCMC) technique solves this problem by approximating high-dimensional integrals through simulation. It provides a convenient and efficient way to draw samples from the target posterior distribution. In recent years, it has become common to estimate unknown parameters using MCMC methods. Meshkanifarahani and EsmaileKhorram [7] derived Bayes estimators of the parameters of the weighted exponential distribution by using MCMC methods. Jaheen and Harbi [8] obtained Bayes estimators of the parameters and the reliability function of the exponentiated Weibull distribution by using MCMC methods. Pandey and Bandyopadhyay [9] obtained Bayes estimators of the parameters of the inverse Gaussian distribution using MCMC methods. Sarikhanbaglu [10] discussed the maximum likelihood and Bayesian estimation of the parameters and reliability functions of the GPD based on a progressive Type II censored sample with random (binomial) removals. Abdallah [11] studied the maximum likelihood, Bayesian and bootstrap estimation of the three unknown parameters of the new Weibull-Pareto distribution based on a progressive Type II censored sample. Based on a certain class of exponential-type distributions including the Pareto distribution, Abdel-Aty et al. [12] proposed Bayesian prediction intervals of generalized order statistics by using a multiply Type II censoring scheme. El-Sagheer [13] dealt with Bayesian point prediction for the GPD based on a general progressive Type II censored sample.
In this article, we derive the maximum likelihood and Bayesian estimators of the two unknown parameters and the reliability and hazard functions of the GPD under progressive Type II censored samples. We use various methods, including the maximum likelihood method, the delta method, the logit transformation, the arc sine transformation and the bootstrap algorithm, to obtain different confidence intervals. More importantly, we discuss Bayesian estimates based on different balanced loss functions using MCMC methods. In addition, we analyze the influence of the relevant parameters on the estimation results in a simulation study. Two examples are analyzed to illustrate the proposed methods.
The rest of this article is organized as follows. Section 2 discusses the maximum likelihood estimators of the two unknown parameters and the reliability and hazard functions. Some asymptotic confidence intervals are obtained in Section 3, and Section 4 derives confidence intervals by using the bootstrap algorithm. We discuss Bayesian estimation under MCMC methods and four different balanced loss functions in Section 5, and Section 6 shows some simulation results. Section 7 presents a numerical example to illustrate the methods proposed above, and Section 8 concludes.

Maximum Likelihood Estimation
In this section, we discuss the maximum likelihood estimators of the two unknown parameters and the reliability and hazard functions of the generalized Pareto distribution based on a progressive Type II censored sample. Let x_{1:m:n} ≤ x_{2:m:n} ≤ ... ≤ x_{m:m:n} be a censored sample from the GPD with the censoring scheme (R_1, R_2, ..., R_m). Then, the likelihood function can be written as

L(α, λ) = c ∏_{i=1}^m f(x_{i:m:n}) [1 − F(x_{i:m:n})]^{R_i} = c α^m λ^m ∏_{i=1}^m (1 + λx_{i:m:n})^(−[α(1+R_i)+1]),  (5)

where c = n(n − R_1 − 1)(n − R_1 − R_2 − 2) ··· (n − R_1 − ··· − R_{m−1} − m + 1). Writing x_i for x_{i:m:n}, we get the log-likelihood function

ℓ(α, λ) = log c + m log α + m log λ − ∑_{i=1}^m [α(1 + R_i) + 1] log(1 + λx_i).  (6)

Thus, the corresponding likelihood equations are obtained as

∂ℓ/∂α = m/α − ∑_{i=1}^m (1 + R_i) log(1 + λx_i) = 0  (7)

and

∂ℓ/∂λ = m/λ − ∑_{i=1}^m [α(1 + R_i) + 1] x_i/(1 + λx_i) = 0,  (8)

respectively. According to Equation (7), the MLE of α can be expressed as

α̂_MLE = m / ∑_{i=1}^m (1 + R_i) log(1 + λx_i).  (9)

Substituting Equation (9) into Equation (8), we have

m/λ − ∑_{i=1}^m [α̂_MLE(λ)(1 + R_i) + 1] x_i/(1 + λx_i) = 0.  (10)

Since it is difficult to find an explicit solution to Equation (10), we solve it numerically in R to find the MLE of λ. Then, we substitute the MLE of λ into Equation (9) to get α̂_MLE.
Utilizing the invariance property, we can get the MLEs of the reliability and hazard functions from Equations (3) and (4) as

r̂_MLE(t) = (1 + λ̂_MLE t)^(−α̂_MLE)  (11)

and

ĥ_MLE(t) = α̂_MLE λ̂_MLE / (1 + λ̂_MLE t),  (12)

respectively. Next, we consider the approximate confidence intervals of the two parameters and the reliability and hazard functions.
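The profile-likelihood computation described above can be sketched as follows (in Python rather than the paper's R; the bracketing interval, the bisection scheme and the illustrative data are assumptions of this sketch, and Equations (9) and (10) refer to the reconstructed likelihood equations):

```python
import numpy as np

def gpd_mle(x, R, lo=1e-6, hi=1e3, iters=200):
    """MLEs of (alpha, lambda): alpha from the closed form in Equation (9),
    lambda by bisecting the profile likelihood equation (10).
    Assumes the root of (10) lies in (lo, hi)."""
    x = np.asarray(x, float); R = np.asarray(R, float); m = len(x)

    def alpha_hat(lam):                      # Equation (9)
        return m / np.sum((1.0 + R) * np.log1p(lam * x))

    def profile_eq(lam):                     # Equation (10)
        return m / lam - np.sum((alpha_hat(lam) * (1.0 + R) + 1.0)
                                * x / (1.0 + lam * x))

    # geometric bisection: profile_eq > 0 near lo and < 0 near hi
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if profile_eq(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam_hat = np.sqrt(lo * hi)
    return alpha_hat(lam_hat), lam_hat

# Illustrative censored sample with scheme (1, 0, 1, 0, 1, 0)
x = [0.2, 0.9, 2.5, 4.0, 8.0, 30.0]
alpha_mle, lam_mle = gpd_mle(x, [1, 0, 1, 0, 1, 0])
```

In practice a library root finder (e.g. `uniroot` in R) would replace the hand-rolled bisection; the geometric midpoint is used here only because λ ranges over several orders of magnitude.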

Asymptotic Confidence Interval
The 100(1 − ξ)% confidence intervals of α and λ can be obtained from the asymptotic normal distribution of the maximum likelihood estimators (α̂_MLE, λ̂_MLE), and their variances (Var(α̂_MLE), Var(λ̂_MLE)) can be obtained from the inverse of the Fisher information matrix.
The Fisher information matrix I = I(α, λ) is given by the expectation of the negative second derivatives of the log-likelihood function. Then, we have

I(α, λ) = −E [ ∂²ℓ/∂α²  ∂²ℓ/∂α∂λ ; ∂²ℓ/∂λ∂α  ∂²ℓ/∂λ² ],

where

∂²ℓ/∂α² = −m/α²,
∂²ℓ/∂α∂λ = ∂²ℓ/∂λ∂α = −∑_{i=1}^m (1 + R_i) x_i/(1 + λx_i),
∂²ℓ/∂λ² = −m/λ² + ∑_{i=1}^m [α(1 + R_i) + 1] x_i²/(1 + λx_i)².

The expectation of the above expression is difficult to obtain. Therefore, we consider the observed Fisher information matrix, which is evaluated at the point (α, λ) = (α̂_MLE, λ̂_MLE). The variance-covariance matrix of (α̂_MLE, λ̂_MLE), which is the inverse of the observed Fisher information matrix, is given by

[ Var(α̂_MLE)  Cov(α̂_MLE, λ̂_MLE) ; Cov(λ̂_MLE, α̂_MLE)  Var(λ̂_MLE) ] = I^(−1)(α̂_MLE, λ̂_MLE).

In general, (α̂_MLE, λ̂_MLE) approximately follows a bivariate normal distribution with mean (α, λ) and the above variance-covariance matrix. Thus, the 100(1 − ξ)% asymptotic confidence intervals of α and λ are

α̂_MLE ± Z_{ξ/2} √Var(α̂_MLE)  and  λ̂_MLE ± Z_{ξ/2} √Var(λ̂_MLE),

and the coverage probabilities can be expressed as

P(|α̂_MLE − α|/√Var(α̂_MLE) ≤ Z_{ξ/2}) = 1 − ξ  and  P(|λ̂_MLE − λ|/√Var(λ̂_MLE) ≤ Z_{ξ/2}) = 1 − ξ,

respectively, where Z_{ξ/2} is the upper ξ/2 percentile of the standard normal distribution. Moreover, to construct the asymptotic confidence intervals for the reliability and hazard functions, we apply the delta method to estimate their variances. Let

G_r = (∂r/∂α, ∂r/∂λ)^T  and  G_h = (∂h/∂α, ∂h/∂λ)^T,

where

∂r/∂α = −(1 + λt)^(−α) log(1 + λt),  ∂r/∂λ = −αt(1 + λt)^(−α−1),
∂h/∂α = λ/(1 + λt),  ∂h/∂λ = α/(1 + λt)².

Then, the asymptotic estimators of Var(r̂) and Var(ĥ) are defined as

V̂ar(r̂) = [G_r^T I^(−1)(α, λ) G_r]_{(α,λ)=(α̂_MLE, λ̂_MLE)}  and  V̂ar(ĥ) = [G_h^T I^(−1)(α, λ) G_h]_{(α,λ)=(α̂_MLE, λ̂_MLE)},

respectively. Therefore, we have the following approximate pivotal quantities:

(r̂_MLE(t) − r(t))/√V̂ar(r̂) ∼ N(0, 1)  and  (ĥ_MLE(t) − h(t))/√V̂ar(ĥ) ∼ N(0, 1).

Furthermore, we can derive the 100(1 − ξ)% asymptotic confidence intervals of r(t) and h(t) as

r̂_MLE(t) ± Z_{ξ/2} √V̂ar(r̂)  and  ĥ_MLE(t) ± Z_{ξ/2} √V̂ar(ĥ),

with coverage probabilities 1 − ξ, respectively.
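The delta-method interval for r(t) can be sketched in Python as follows (the sample, the evaluation point chosen near its MLEs, t = 1 and the function names are illustrative assumptions; the second derivatives are the reconstructed ones above):

```python
import numpy as np

def observed_info(alpha, lam, x, R):
    """Observed Fisher information of (alpha, lambda), i.e. the negative
    Hessian of the log-likelihood, evaluated at the given point."""
    x = np.asarray(x, float); R = np.asarray(R, float); m = len(x)
    u = x / (1.0 + lam * x)
    I_aa = m / alpha**2
    I_al = np.sum((1.0 + R) * u)
    I_ll = m / lam**2 - np.sum((alpha * (1.0 + R) + 1.0) * u**2)
    return np.array([[I_aa, I_al], [I_al, I_ll]])

def reliability_ci(alpha, lam, x, R, t, xi=0.05):
    """Delta-method 100(1-xi)% CI for r(t) = (1 + lam*t)^(-alpha)."""
    r = (1.0 + lam * t) ** (-alpha)
    # gradient of r(t) with respect to (alpha, lambda)
    g = np.array([-r * np.log1p(lam * t),
                  -alpha * t * (1.0 + lam * t) ** (-alpha - 1.0)])
    var_r = g @ np.linalg.inv(observed_info(alpha, lam, x, R)) @ g
    z = 1.959964  # upper 2.5% point of N(0, 1) for xi = 0.05
    half = z * np.sqrt(var_r)
    return r - half, r + half

# Illustrative sample; (alpha, lambda) taken near its MLEs
x = [0.2, 0.9, 2.5, 4.0, 8.0, 30.0]
R = [1, 0, 1, 0, 1, 0]
lo, hi = reliability_ci(1.81, 0.094, x, R, t=1.0)
```

Note that this plain interval can exceed [0, 1], which is one motivation for the logit and arc sine transformed intervals mentioned in the Introduction.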

Bootstrap Confidence Interval
When the sample size is small, it is well known that confidence intervals based on the asymptotic results do not perform well. Thus, we discuss two confidence intervals based on the bootstrap algorithm: the percentile bootstrap algorithm, which we call the bootstrap-p algorithm, and the bootstrap-t algorithm.

Bootstrap-t Algorithm
Compared to the bootstrap-p algorithm, the bootstrap-t algorithm is slightly more complicated, but it is more accurate when the sample size is small. The following algorithm can be used to obtain the bootstrap-t confidence intervals.
(1) Based on the progressive Type II censored sample X, calculate the MLEs α̂_MLE and λ̂_MLE by solving Equations (7) and (8).
(2) Generate a bootstrap progressive Type II censored sample X* from the GPD with parameters (α̂_MLE, λ̂_MLE) under the same censoring scheme, and compute the bootstrap estimates α̂*, λ̂*, r̂* and ĥ*.
(3) Define the following statistics:

T_α = (α̂* − α̂_MLE)/√V̂ar(α̂*),  T_λ = (λ̂* − λ̂_MLE)/√V̂ar(λ̂*),
T_r = (r̂* − r̂_MLE)/√V̂ar(r̂*)  and  T_h = (ĥ* − ĥ_MLE)/√V̂ar(ĥ*),

where V̂ar(α̂*), V̂ar(λ̂*), V̂ar(r̂*) and V̂ar(ĥ*) are obtained using the Fisher information matrix and the delta method.
(4) Repeat Steps (2) and (3) a large number of times, and use the empirical ξ/2 and 1 − ξ/2 quantiles of each statistic to construct the corresponding 100(1 − ξ)% bootstrap-t confidence intervals.
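The procedure can be sketched end to end for λ as follows (Python; the data, the number of resamples B and the restriction to λ are illustrative assumptions, and replicates whose plug-in variance estimate is not positive are simply skipped in this sketch):

```python
import numpy as np

def rgpd_prog(alpha, lam, R, rng):
    # progressive Type II censored GPD sample (Balakrishnan-Sandhu algorithm)
    m = len(R)
    W = rng.uniform(size=m)
    V = np.array([W[i] ** (1.0 / (i + 1 + sum(R[m - i - 1:]))) for i in range(m)])
    U = 1.0 - np.cumprod(V[::-1])
    return ((1.0 - U) ** (-1.0 / alpha) - 1.0) / lam

def mle(x, R, lo=1e-6, hi=1e3):
    # profile-likelihood MLEs, Equations (9)-(10), by geometric bisection
    x = np.asarray(x, float); R = np.asarray(R, float); m = len(x)
    a = lambda l: m / np.sum((1 + R) * np.log1p(l * x))
    f = lambda l: m / l - np.sum((a(l) * (1 + R) + 1) * x / (1 + l * x))
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return a(hi), hi

def var_lam(alpha, lam, x, R):
    # Var(lambda_hat) from the inverse observed information matrix
    x = np.asarray(x, float); R = np.asarray(R, float); m = len(x)
    u = x / (1 + lam * x)
    I = np.array([[m / alpha**2, np.sum((1 + R) * u)],
                  [np.sum((1 + R) * u),
                   m / lam**2 - np.sum((alpha * (1 + R) + 1) * u**2)]])
    return np.linalg.inv(I)[1, 1]

def bootstrap_t_ci(x, R, B=200, xi=0.05, seed=1):
    rng = np.random.default_rng(seed)
    a0, l0 = mle(x, R)
    s0 = np.sqrt(var_lam(a0, l0, x, R))
    T = []
    for _ in range(B):
        xb = rgpd_prog(a0, l0, R, rng)       # Step (2)
        ab, lb = mle(xb, R)
        v = var_lam(ab, lb, xb, R)
        if v > 0:                            # skip degenerate replicates
            T.append((lb - l0) / np.sqrt(v)) # Step (3)
    tlo, thi = np.quantile(T, [xi / 2, 1 - xi / 2])
    return l0 - thi * s0, l0 - tlo * s0      # Step (4)

x = [0.2, 0.9, 2.5, 4.0, 8.0, 30.0]
ci = bootstrap_t_ci(x, [1, 0, 1, 0, 1, 0])
```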

Bayesian Estimation
In this section, we consider the Bayesian estimators of α, λ and the reliability and hazard functions, together with the corresponding confidence intervals, using the Markov Chain Monte Carlo (MCMC) technique. Suppose α and λ independently follow gamma prior distributions. Then, the prior density functions of α and λ can be written as

π_1(α) ∝ α^(a_1−1) e^(−b_1 α), α > 0,  and  π_2(λ) ∝ λ^(a_2−1) e^(−b_2 λ), λ > 0.

Therefore, the joint prior density function of α and λ is obtained as

π(α, λ) ∝ α^(a_1−1) λ^(a_2−1) e^(−b_1 α − b_2 λ).  (33)

By using the likelihood function given by Equation (5) and the joint prior density function given by Equation (33), the joint posterior density function of α and λ is given by

π(α, λ | x) ∝ α^(m+a_1−1) λ^(m+a_2−1) e^(−b_1 α − b_2 λ) ∏_{i=1}^m (1 + λx_i)^(−[α(1+R_i)+1]).  (34)

Obviously, Equation (34) is complicated, and it is difficult to obtain the posterior conditional distribution of each parameter in closed form. Therefore, we consider the Gibbs sampling method, which is the simplest, most intuitive and most widely used MCMC method. The Gibbs sampling method simulates from the posterior conditional distributions. From Equation (34), we can get the posterior conditional density of α as

π(α | λ, x) ∝ α^(m+a_1−1) exp{−α[b_1 + ∑_{i=1}^m (1 + R_i) log(1 + λx_i)]},  (35)

which is a gamma density with shape m + a_1 and rate b_1 + ∑_{i=1}^m (1 + R_i) log(1 + λx_i). Similarly, the posterior conditional density of λ is given by

π(λ | α, x) ∝ λ^(m+a_2−1) e^(−b_2 λ) ∏_{i=1}^m (1 + λx_i)^(−[α(1+R_i)+1]).  (36)

From Equation (35), it is easy to see that samples of α can be generated using any gamma generating routine. By observing Equation (36), we know that the conditional posterior distribution of λ is not a common distribution, so we cannot sample from it directly in the usual way. Thus, we apply the Metropolis-Hastings (M-H) algorithm to obtain random draws from this distribution. The M-H algorithm is a useful method for generating random samples from the posterior distribution using a proposal density. Here, we generate candidate values from the normal proposal distribution N(λ, V(λ)), where V(λ) represents the variance of λ. See [15] for more details.
We set up the M-H algorithm in Gibbs sampling as follows:

Algorithm 1 M-H algorithm in Gibbs sampling
1: Set the initial value (α^(0), λ^(0)) and set j = 1.
2: repeat
3: Generate α^(j) from the gamma distribution in Equation (35) with λ = λ^(j−1).
4: Generate a candidate λ* from the proposal distribution N(λ^(j−1), V(λ)).
5: Compute the acceptance probability η = min{1, π(λ* | α^(j), x)/π(λ^(j−1) | α^(j), x)}.
6: Generate u from the uniform distribution U(0, 1).
7: If u ≤ η, set λ^(j) = λ*; otherwise, set λ^(j) = λ^(j−1).
8: Compute r^(j)(t) = (1 + λ^(j) t)^(−α^(j)).
9: Compute h^(j)(t) = α^(j) λ^(j)/(1 + λ^(j) t).
10: Set j = j + 1.
11: until j = N + 1
To ensure convergence and eliminate the influence of the choice of initial values, we discard the first M pairs of simulated values. Therefore, for a sufficiently large N, we obtain approximate posterior samples α^(j) and λ^(j), j = M + 1, M + 2, ..., N, for Bayesian estimation.
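Algorithm 1 can be sketched as follows (Python rather than the paper's R; the data, the random-walk standard deviation standing in for V(λ), the initial values and the chain lengths are illustrative assumptions):

```python
import numpy as np

def gibbs_mh(x, R, a1=0.0, b1=0.0, a2=0.0, b2=0.0,
             N=3000, M=500, prop_sd=0.05, seed=7):
    """M-H within Gibbs: alpha | lambda, x is Gamma(m + a1,
    rate = b1 + sum (1+R_i) log(1+lambda x_i)), as in Equation (35);
    lambda is updated by a normal random-walk M-H step targeting (36)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float); R = np.asarray(R, float); m = len(x)

    def log_cond_lam(lam, alpha):          # log of Equation (36), up to a constant
        if lam <= 0:
            return -np.inf
        return ((m + a2 - 1) * np.log(lam) - b2 * lam
                - np.sum((alpha * (1 + R) + 1) * np.log1p(lam * x)))

    alpha, lam = 1.0, 0.1                  # initial values (alpha^(0), lambda^(0))
    draws = []
    for _ in range(N):
        rate = b1 + np.sum((1 + R) * np.log1p(lam * x))
        alpha = rng.gamma(m + a1, 1.0 / rate)          # Gibbs step, Equation (35)
        cand = rng.normal(lam, prop_sd)                # M-H proposal for lambda
        if np.log(rng.uniform()) < log_cond_lam(cand, alpha) - log_cond_lam(lam, alpha):
            lam = cand
        draws.append((alpha, lam))
    return np.array(draws[M:])             # discard the first M pairs as burn-in

x = [0.2, 0.9, 2.5, 4.0, 8.0, 30.0]
chain = gibbs_mh(x, [1, 0, 1, 0, 1, 0])
```

The acceptance test is done on the log scale for numerical stability; candidates λ* ≤ 0 get log-density −∞ and are always rejected, which keeps the chain on the support λ > 0.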

Bayesian Estimation under Balanced Loss Functions
In Bayesian estimation, to achieve the best estimate, a suitable loss function has to be chosen. However, there is no specific procedure in the estimation process to determine which loss function we should use. In many cases, authors use a symmetric loss function for convenience. However, when losses are asymmetric, it is not appropriate to choose a symmetric loss function indiscriminately. Thus, we need to consider some asymmetric loss functions. Furthermore, in some cases, when one loss function is the real loss function, the Bayesian estimate under another loss function performs better than that under the real loss function. Therefore, we consider different loss functions to get a better understanding in the Bayesian analysis. In this paper, we discuss several symmetric and asymmetric loss functions, such as the K-Loss function, the modified squared error loss function and the precautionary loss function. In this section, we consider the balanced loss function (see [16]), which is a more general loss function and can be written as

L_{ρ,ω,δ_0}(θ, δ) = ω ρ(δ_0, δ) + (1 − ω) ρ(θ, δ),  (37)

where ρ(θ, δ) represents an arbitrary loss function, and δ_0 represents a prior target estimator of θ, which can be obtained by maximum likelihood, least squares or unbiasedness. ω represents the weight of ρ(δ_0, δ), which ranges from 0 to 1.
ρ(θ, δ) can be chosen from a variety of loss functions. When ρ(θ, δ) is chosen as the symmetric K-Loss function ρ(θ, δ) = (δ − θ)²/(θδ), we obtain the balanced K-Loss function (BKLF), which can be written as

L_ω(θ, δ) = ω (δ − δ_0)²/(δ_0 δ) + (1 − ω) (δ − θ)²/(θδ).  (38)

We obtain the Bayesian estimator of θ under the BKLF by

δ_BK = [ (ω δ_0 + (1 − ω) E(θ | x)) / (ω/δ_0 + (1 − ω) E(θ^(−1) | x)) ]^(1/2).  (39)

When ρ(θ, δ) = (δ − θ)²/θ, we obtain the balanced weighted squared error loss function (BWSELF) by

L_ω(θ, δ) = ω (δ − δ_0)²/δ_0 + (1 − ω) (δ − θ)²/θ.  (40)

The Bayesian estimator of θ under the BWSELF is derived by

δ_BWSE = [ ω/δ_0 + (1 − ω) E(θ^(−1) | x) ]^(−1).  (41)

When ρ(θ, δ) = (1 − δ/θ)², the balanced modified squared error loss function (BMSELF) has the following form:

L_ω(θ, δ) = ω (1 − δ/δ_0)² + (1 − ω) (1 − δ/θ)²,  (42)

and the Bayesian estimator of θ under the BMSELF is given by

δ_BMSE = (ω/δ_0 + (1 − ω) E(θ^(−1) | x)) / (ω/δ_0² + (1 − ω) E(θ^(−2) | x)).  (43)

When ρ(θ, δ) = (δ − θ)²/δ, the balanced precautionary loss function (BPLF) is represented by

L_ω(θ, δ) = ω (δ − δ_0)²/δ + (1 − ω) (δ − θ)²/δ,  (44)

and the Bayesian estimator of θ under the BPLF is obtained by

δ_BP = [ ω δ_0² + (1 − ω) E(θ² | x) ]^(1/2).  (45)

By observing Equation (39), it is easy to see that, when ω = 1 and δ_0 is taken as the maximum likelihood estimate, the Bayesian estimate under the BKLF is equivalent to the maximum likelihood estimate, and, when ω = 0, it is equivalent to the Bayesian estimate under the KLF (symmetric). According to Equations (41), (43) and (45), the Bayesian estimates under the balanced loss functions are likewise equivalent to the maximum likelihood estimates when ω = 1, and are equivalent to the Bayesian estimates under the corresponding asymmetric loss functions (i.e., the WSELF, MSELF and PLF) when ω = 0.
According to Equations (39), (41), (43) and (45), the approximate Bayesian estimates of α, λ and the reliability and hazard functions under the BKLF, BWSELF, BMSELF and BPLF are obtained by taking δ_0 as the corresponding MLE and approximating each posterior expectation by the MCMC average

Ê(g(θ) | x) = (1/(N − M)) ∑_{j=M+1}^N g(θ^(j)),

applied in turn to θ = α, λ, r(t) and h(t). Further, the samples α^(j), λ^(j), r^(j)(t) and h^(j)(t), j = M + 1, ..., N, are arranged in ascending order, and the 100(1 − ξ)% approximate confidence intervals of α, λ, r(t) and h(t) are constructed from the ξ/2 and 1 − ξ/2 empirical quantiles of the ordered samples.
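Once the MCMC samples are available, the four balanced-loss estimates reduce to simple sample averages. A minimal Python sketch (the function name and the toy draws are assumptions; in the paper δ_0 is the corresponding MLE):

```python
import numpy as np

def balanced_estimates(theta, delta0, w):
    """Bayes estimates of a positive quantity under the BKLF, BWSELF,
    BMSELF and BPLF (Equations (39), (41), (43) and (45)), with posterior
    expectations replaced by averages over the MCMC draws `theta`."""
    E1 = theta.mean()                 # E(theta | x)
    Em1 = (1.0 / theta).mean()        # E(theta^-1 | x)
    Em2 = (1.0 / theta**2).mean()     # E(theta^-2 | x)
    E2 = (theta**2).mean()            # E(theta^2 | x)
    bk = np.sqrt((w * delta0 + (1 - w) * E1) / (w / delta0 + (1 - w) * Em1))
    bwse = 1.0 / (w / delta0 + (1 - w) * Em1)
    bmse = (w / delta0 + (1 - w) * Em1) / (w / delta0**2 + (1 - w) * Em2)
    bp = np.sqrt(w * delta0**2 + (1 - w) * E2)
    return bk, bwse, bmse, bp

# Toy draws: at w = 1 every estimator collapses to delta0 (the MLE target),
# and at w = 0 each reduces to its classical unbalanced counterpart.
draws = np.array([1.0, 2.0, 4.0])
est_w1 = balanced_estimates(draws, delta0=2.0, w=1.0)
est_w0 = balanced_estimates(draws, delta0=2.0, w=0.0)
```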

Simulation Study
In this section, we discuss the performance of the estimates and confidence intervals considered in the previous sections. Here, we report the simulation results for the case α = 0.5, λ = 1.2, r(t) = 0.6742 and h(t) = 0.2727. For the point estimation methods, we compared the expected values (EVs) and mean square errors (MSEs) of the estimators of α, λ and the reliability and hazard functions. In addition, we generated progressive Type II censored samples from the generalized Pareto distribution (GPD) under different progressive Type II censoring schemes. For example, ((1, 0) * 3) denotes the censoring scheme (R_1, R_2, R_3, R_4, R_5, R_6) = (1, 0, 1, 0, 1, 0) in the tables. For each censoring scheme, we calculated the MLEs of α, λ, r(t) and h(t) by using Equations (9)-(12), and we computed the Bayesian estimates of α, λ, r(t) and h(t) under the various balanced loss functions. In the Bayesian estimation under the various balanced loss functions, for each given censoring scheme, we compared the EVs and MSEs of the estimators when ω takes the values 0, 0.3, 0.6 and 0.9, in order to obtain an optimal ω. The EVs and MSEs of these estimators were computed over 1000 replications, and the averages are reported in Tables 1-5. For the interval estimation methods, Tables 6 and 7 report the coverage probabilities (CPs) and average lengths (ALs) of the 95% confidence intervals (CIs) using the delta method and Bayesian estimation, respectively, which were based on 1000 simulations. See Appendix A for a selected R code.
For Bayesian estimation, we considered the case where the hyperparameters are 0, i.e., a_1 = a_2 = b_1 = b_2 = 0. Based on the MCMC methods discussed in Section 5 and the conclusions drawn in Section 6, we obtained the estimates for ω = (0, 0.1, 0.2, 0.3) under the various balanced loss functions, as shown in Table 9. In addition, the 95% confidence intervals obtained by using the MCMC techniques are shown in Table 8.

Conclusions
In this article, we discussed the MLEs of α, λ, r(t) and h(t) under progressive Type II censored samples. We also considered the asymptotic confidence intervals for α, λ, r(t) and h(t) by using the maximum likelihood method, the delta method, the logit transformation and the arc sine transformation. For comparison, we established the bootstrap-p and bootstrap-t CIs. More importantly, we used the MCMC methods to derive the Bayesian estimates of α, λ, r(t) and h(t) under various balanced loss functions, together with the corresponding confidence intervals. To compare the performances of the proposed estimators, a simulation study was conducted; it shows that the Bayesian estimation method is more efficient in most cases. We analyzed the influence of the relevant parameters on the estimation results in the simulation study, and found that it is more appropriate to choose the BMSELF and to let ω take a value between 0 and 0.3. Finally, a real dataset analysis was carried out to illustrate the proposed methods.