Estimation and Prediction for Nadarajah-Haghighi Distribution under Progressive Type-II Censoring

The paper discusses estimation and prediction problems for the Nadarajah-Haghighi distribution using progressive type-II censored samples. For the unknown parameters, we first calculate the maximum likelihood estimates through the Expectation-Maximization algorithm. In order to choose the best Bayesian estimator, a loss function must be specified. When the loss is essentially symmetric, it is reasonable to use the squared error loss function. However, for some estimation problems, the actual loss is often asymmetric, so we also need to choose an asymmetric loss function. Under the balanced squared error and symmetric squared error loss functions, the Tierney and Kadane method is used for calculating different kinds of approximate Bayesian estimates. The Metropolis-Hastings algorithm is also provided. In addition, we construct a variety of interval estimates of the unknown parameters, including asymptotic intervals, bootstrap intervals, and highest posterior density intervals based on the sample derived from the Metropolis-Hastings algorithm. Furthermore, we compute point predictions and predictive intervals for a future sample in both the one-sample and two-sample situations. Finally, we compare and appraise the performance of the provided techniques by carrying out a simulation study and analyzing a real rainfall data set.


Introduction
Censoring has received considerable attention in reliability studies and life tests. Several kinds of censoring schemes have been introduced in the literature, among which type-I and type-II censoring are the two most common; plenty of literature has discussed them, such as [1,2]. A life test under type-I censoring ends at a prespecified time, so the number of observations is random. By contrast, a life test under type-II censoring is not finished until a prefixed number of failures occur.
However, type-I and type-II censoring have some constraints: for example, units cannot be withdrawn while the experiment is in progress. A more flexible and more general scheme, named the progressive type-II censoring scheme, was put forward to overcome this disadvantage; it allows the removal of units during the test. The scheme works as follows. Suppose n units are placed on a life test. When the first failure takes place, R_1 of the n − 1 surviving units are withdrawn at random from the experiment. Likewise, after the second failure, R_2 of the n − R_1 − 2 remaining surviving units are randomly removed. The experiment continues until m failures have occurred, at which point the remaining R_m units are withdrawn from the test, where R_m = n − m − ∑_{i=1}^{m−1} R_i. Then R = (R_1, R_2, · · · , R_m) is called the progressive type-II censoring scheme, which must be fixed before the experiment. Many statistical inferences involving progressive censoring have been carried out by researchers such as [3][4][5][6].
An extension of the exponential distribution was suggested by [7] by introducing a shape parameter. Its probability density function (PDF) is defined by

f(x; α, λ) = αλ(1 + λx)^{α−1} exp{1 − (1 + λx)^α}, x > 0; α > 0, λ > 0, (1)

and its cumulative distribution function (CDF) is expressed as

F(x; α, λ) = 1 − exp{1 − (1 + λx)^α}, x > 0, (2)

where α and λ are the shape and scale parameters, respectively; both are positive. We denote this distribution by NH(α, λ). In some other literature, the NH distribution is also named the extended exponential distribution, such as [8,9]. Ref. [7] found that even though some data sets contain the value zero, the NH distribution can still provide a suitable fit to them. In addition, the shape of the hazard rate function depends on α: it is increasing when α > 1 and decreasing when α < 1. When α = 1, the hazard rate function becomes constant and (1) reduces to the one-parameter exponential distribution.
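The PDF, CDF, and inverse-CDF sampler implied by (1) can be sketched in code as follows; the function names are illustrative, not part of the original paper.

```python
import numpy as np

def nh_pdf(x, alpha, lam):
    """PDF of the Nadarajah-Haghighi distribution, Equation (1)."""
    return alpha * lam * (1 + lam * x) ** (alpha - 1) * np.exp(1 - (1 + lam * x) ** alpha)

def nh_cdf(x, alpha, lam):
    """CDF: F(x) = 1 - exp{1 - (1 + lam*x)^alpha}."""
    return 1 - np.exp(1 - (1 + lam * x) ** alpha)

def nh_rvs(size, alpha, lam, rng=None):
    """Draw NH variates by inverting the CDF:
    x = ((1 - ln(1 - u))^(1/alpha) - 1) / lam for u ~ U(0, 1)."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=size)
    return ((1 - np.log(1 - u)) ** (1 / alpha) - 1) / lam
```

The inverse-CDF formula follows directly from setting F(x) = u in (2) and solving for x.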
Several works address statistical inference for the NH distribution. For the estimation of the unknown parameters, Ref. [10] discussed maximum likelihood and Bayes estimation based on progressive type-II censored samples with binomial removals. Ref. [11] then studied progressive type-II censored data, computing the maximum likelihood estimates (MLEs) via the Newton-Raphson method and Bayes estimates via the Markov Chain Monte Carlo method; they also constructed interval estimates such as asymptotic confidence intervals (CIs) and highest posterior density (HPD) intervals. Several references also focus on the order statistics of the NH distribution, such as [8,9]; both worked on recurrence relations for the moments of order statistics, the former under progressive type-II right censoring and the latter for complete samples. Additionally, Ref. [12] discussed MLEs and Bayesian estimates of the unknown parameters via the Lindley approximation, and made point and interval predictions for future records using non-Bayesian and Bayesian methods. Based on progressively first-failure censored NH data, Ref. [13] studied the MLEs and Bayes estimates of the two unknown parameters and of lifetime parameters such as the survival and hazard rate functions; they also suggested optimal censoring schemes through different optimality criteria.
In this article, using progressive type-II censored samples, we discuss estimation and prediction problems for the NH distribution by several methods, and we compare their performance through a simulation study and a real rainfall data set.
In Section 2, we utilize the Expectation-Maximization (EM) algorithm to calculate the MLEs of the unknown parameters. Then, based on the Fisher information matrix, we set up the asymptotic intervals; bootstrap intervals are introduced as well. In Section 3, under two types of loss functions, Bayesian estimators of the unknown parameters are calculated through both the Metropolis-Hastings (MH) algorithm and the Tierney and Kadane (TK) method. Using the sample generated by the MH algorithm, we also construct the HPD credible intervals. In Section 4, Bayesian predictive intervals and point predictions of a future sample are discussed. Finally, we compare and appraise the performance of the provided techniques by carrying out a simulation study and analyzing a real data set in Section 5.

Maximum Likelihood Estimation
We assume that n units are placed on a life test and that their lifetimes independently and identically follow the NH distribution defined in (1). Under a progressive type-II censoring scheme R = (R_1, R_2, · · · , R_m), let X = (X_1:m:n, X_2:m:n, · · · , X_m:m:n) denote the progressive type-II censored sample of size m from the NH distribution. In light of [3], the likelihood function of α and λ is written as

L(α, λ | x) = C ∏_{i=1}^m f(x_i; α, λ)[1 − F(x_i; α, λ)]^{R_i},

where C is a constant free of the parameters and x = (x_1, x_2, · · · , x_m) stands for the observed value of X. Ignoring the constant, the log-likelihood function is

ℓ(α, λ) = m ln α + m ln λ + (α − 1) ∑_{i=1}^m ln(1 + λx_i) + ∑_{i=1}^m (1 + R_i)[1 − (1 + λx_i)^α].

As usual, the MLEs of α and λ solve the likelihood equations

∂ℓ/∂α = m/α + ∑_{i=1}^m ln(1 + λx_i) − ∑_{i=1}^m (1 + R_i)(1 + λx_i)^α ln(1 + λx_i) = 0,

∂ℓ/∂λ = m/λ + (α − 1) ∑_{i=1}^m x_i/(1 + λx_i) − α ∑_{i=1}^m (1 + R_i) x_i (1 + λx_i)^{α−1} = 0.

However, these equations cannot be solved analytically in closed form, so numerical techniques such as the EM algorithm are needed to compute the desired MLEs. The EM algorithm was first introduced by [14] as a solution to missing- or incomplete-data problems.
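The progressive type-II log-likelihood can also be maximized directly with a general-purpose optimizer, which is a useful cross-check on the EM output. The following is a minimal sketch assuming SciPy is available; `nh_loglik` and `nh_mle` are illustrative names, and the closed form used is the likelihood of the censored sample up to an additive constant.

```python
import numpy as np
from scipy.optimize import minimize

def nh_loglik(params, x, R):
    """Progressive type-II NH log-likelihood up to a constant:
    m*ln(alpha*lam) + (alpha-1)*sum ln(1+lam*x_i)
      + sum (1+R_i)*[1 - (1+lam*x_i)^alpha]."""
    alpha, lam = params
    if alpha <= 0 or lam <= 0:
        return -np.inf
    t = 1 + lam * np.asarray(x)
    m = len(x)
    return (m * np.log(alpha * lam) + (alpha - 1) * np.sum(np.log(t))
            + np.sum((1 + np.asarray(R)) * (1 - t ** alpha)))

def nh_mle(x, R, start=(1.0, 1.0)):
    """Maximize the log-likelihood numerically (Nelder-Mead)."""
    res = minimize(lambda p: -nh_loglik(p, x, R), start, method="Nelder-Mead")
    return res.x
```

With R = 0 and m = n this reduces to ordinary maximum likelihood for a complete sample.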
The progressive type-II censored data are not complete, so they can be treated as incomplete data. Let Z = (Z_1, Z_2, · · · , Z_m) represent the censored data and X = (X_1:m:n, X_2:m:n, · · · , X_m:m:n) the observed data, where each Z_i is a 1 × R_i vector (Z_i1, Z_i2, · · · , Z_iR_i), i = 1, 2, · · · , m. The observed data X and the censored data Z together form the complete data set V = (X, Z). Disregarding the additive constant, the complete-data log-likelihood for V is

L_c(V; α, λ) = n ln α + n ln λ + (α − 1)[∑_{i=1}^m ln(1 + λx_i) + ∑_{i=1}^m ∑_{k=1}^{R_i} ln(1 + λz_ik)] + ∑_{i=1}^m [1 − (1 + λx_i)^α] + ∑_{i=1}^m ∑_{k=1}^{R_i} [1 − (1 + λz_ik)^α],

where z_ik denotes a censored value of Z_ik for k = 1, · · · , R_i and i = 1, · · · , m. The EM algorithm is composed of two parts, the E-step and the M-step. In the E-step, the unobserved functions of z_ik in L_c are replaced by their conditional expectations given Z_ik > x_i, evaluated at the current parameter values; this yields the pseudo-log-likelihood

ℓ_s(α, λ) = n ln α + n ln λ + (α − 1) ∑_{i=1}^m ln(1 + λx_i) + ∑_{i=1}^m [1 − (1 + λx_i)^α] + ∑_{i=1}^m R_i E[(α − 1) ln(1 + λZ) + 1 − (1 + λZ)^α | Z > x_i].

In the M-step, ℓ_s is maximized over α and λ. Let (α^(j), λ^(j)) denote the estimate of (α, λ) at the jth iteration; then (α^(j+1), λ^(j+1)) is found by maximizing ℓ_s. With α^(j) and λ^(j) known, λ^(j+1) is first calculated by solving ∂ℓ_s/∂λ = 0, and once λ^(j+1) is obtained, α^(j+1) is solved from ∂ℓ_s/∂α = 0. The updated value (α^(j+1), λ^(j+1)) replaces (α^(j), λ^(j)) in the next iteration, and the two steps are repeated until α and λ converge to their MLEs. According to [15], the log-likelihood is guaranteed to increase at each iteration, so the EM algorithm almost always drives the likelihood to a local maximum regardless of the starting value in the parameter domain. When using the EM algorithm for the MLEs under progressive type-II censoring in Section 5, we take the MLEs derived from complete samples as initial values.
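A single EM iteration can be sketched numerically: the E-step conditional expectations are evaluated by quadrature, and the pseudo-log-likelihood is maximized with a general-purpose optimizer rather than by the paper's two one-dimensional equations (a deliberate simplification). All names here are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

def em_step(x, R, alpha_j, lam_j):
    """One EM iteration for NH under progressive type-II censoring (sketch).
    E-step: condition on Z > x_i at the current (alpha_j, lam_j);
    M-step: maximize the pseudo-log-likelihood Q numerically."""
    def pdf(z, a, l):
        return a * l * (1 + l * z) ** (a - 1) * np.exp(1 - (1 + l * z) ** a)
    def surv(z, a, l):
        return np.exp(1 - (1 + l * z) ** a)
    def cond_exp(g, xi):
        # E[g(Z) | Z > xi] under the CURRENT parameter values
        num, _ = quad(lambda z: g(z) * pdf(z, alpha_j, lam_j), xi, np.inf)
        return num / surv(xi, alpha_j, lam_j)
    def Q(params):
        a, l = params
        if a <= 0 or l <= 0:
            return -np.inf
        n = len(x) + int(np.sum(R))
        val = n * np.log(a * l)
        val += (a - 1) * np.sum(np.log(1 + l * np.asarray(x)))
        val += np.sum(1 - (1 + l * np.asarray(x)) ** a)
        for xi, Ri in zip(x, R):
            if Ri > 0:
                val += Ri * cond_exp(
                    lambda z: (a - 1) * np.log(1 + l * z) + 1 - (1 + l * z) ** a, xi)
        return val
    res = minimize(lambda p: -Q(p), [alpha_j, lam_j], method="Nelder-Mead")
    return res.x
```

Iterating `em_step` until the parameter change is negligible gives the EM estimates.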

Fisher Information Matrix
Let θ denote the unknown parameter vector (α, λ). Let V represent the complete data and X the observed data, and denote the observed, missing, and complete information by I_X(θ), I_V|X(θ), and I_V(θ), respectively. Based on the idea of [16], we can obtain

I_X(θ) = I_V(θ) − I_V|X(θ), (14)

where I_V(θ) = E[−∂²L_c(V; θ)/∂θ∂θ'] is the complete information and I_V|X(θ) is the information in the censored observations given the observed data; both are 2 × 2 matrices. Moreover, I^(j)_V|X(θ) stands for the Fisher information matrix of a single observation censored at the time of the jth failure. Subsequently, we first calculate the elements of I_V(θ).
Then, the total missing information matrix is

I_V|X(θ) = ∑_{j=1}^m R_j I^(j)_V|X(θ).

Based on the two matrices above, the observed information matrix is easily computed from (14), along with its inverse. The MLE θ̂ = (α̂, λ̂) is asymptotically normal, so we can write θ̂ ∼ N(θ, I_X^{−1}(θ̂)). Consequently, the 100(1 − p)% asymptotic CIs are derived as

(α̂ − z_{p/2}√Var(α̂), α̂ + z_{p/2}√Var(α̂)) and (λ̂ − z_{p/2}√Var(λ̂), λ̂ + z_{p/2}√Var(λ̂)),

where 0 < p < 1 and z_{p/2} is the upper (p/2)th percentile of the standard normal distribution. Here, Var(α̂) and Var(λ̂) are the two main-diagonal elements of I_X^{−1}(θ̂).
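As a numerical stand-in for the observed information matrix, one can approximate the Hessian of the negative log-likelihood at the MLE by central differences and read the asymptotic CIs off its inverse. This sketch assumes SciPy; `asymptotic_ci` is an illustrative name.

```python
import numpy as np
from scipy.stats import norm

def asymptotic_ci(theta_hat, neg_loglik, p=0.05, h=1e-4):
    """100(1-p)% asymptotic CIs theta_i +/- z_{p/2}*sqrt(Var(theta_i)),
    with the observed information approximated by a central-difference
    Hessian of the negative log-likelihood at the MLE."""
    th = np.asarray(theta_hat, dtype=float)
    k = len(th)
    I = np.eye(k)
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            H[i, j] = (neg_loglik(th + h * I[i] + h * I[j])
                       - neg_loglik(th + h * I[i] - h * I[j])
                       - neg_loglik(th - h * I[i] + h * I[j])
                       + neg_loglik(th - h * I[i] - h * I[j])) / (4 * h * h)
    var = np.diag(np.linalg.inv(H))   # asymptotic variances of the components
    z = norm.ppf(1 - p / 2)           # upper p/2 percentile of N(0, 1)
    half = z * np.sqrt(var)
    return [(th[i] - half[i], th[i] + half[i]) for i in range(k)]
```

Passing the negative of the progressive type-II log-likelihood evaluated at the EM output yields the intervals described above.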

Bootstrap Confidence Intervals
It is well known that asymptotic CIs may perform poorly for small sample sizes, whereas bootstrap methods often provide more accurate confidence intervals. We therefore introduce two CIs based on bootstrap methods. One is the percentile bootstrap (Boot-p) method, which was put forward by [17]; the other, following [18], is the bootstrap-t (Boot-t) method. The Boot-p and Boot-t methods are presented in Algorithms 1 and 2, respectively.

Algorithm 1 Boot-p method
Require: The number of simulation times N; the censoring scheme R; the confidence level 100(1 − p)%; the progressive type-II censored sample X.

Algorithm 2 Boot-t method
Require: The number of simulation times N; the censoring scheme R; the confidence level 100(1 − p)%; the progressive type-II censored sample X.
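Both bootstrap procedures follow the standard parametric recipe. A minimal Boot-p sketch is given below; `fit` and `gen` are user-supplied callables with hypothetical interfaces (an estimator such as the EM routine, and a generator of progressive type-II censored NH samples under the same scheme R).

```python
import numpy as np

def boot_p_ci(x, R, fit, gen, N=1000, p=0.05, rng=None):
    """Percentile-bootstrap CIs for (alpha, lam).
    fit(x, R) -> (alpha_hat, lam_hat) is the estimator;
    gen(alpha, lam, R, rng) -> a bootstrap progressive type-II censored
    sample generated under the same censoring scheme R."""
    rng = np.random.default_rng(rng)
    a_hat, l_hat = fit(x, R)
    # refit on N parametric bootstrap resamples
    ests = np.array([fit(gen(a_hat, l_hat, R, rng), R) for _ in range(N)])
    lo, hi = 100 * p / 2, 100 * (1 - p / 2)
    return (np.percentile(ests[:, 0], [lo, hi]),
            np.percentile(ests[:, 1], [lo, hi]))
```

The Boot-t method differs only in that each resample produces a studentized statistic whose percentiles are then mapped back to the parameter scale.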

Bayesian Estimation
In order to choose the best Bayesian estimator, a loss function must be specified and used to represent the penalty associated with each possible estimator. Researchers usually use the symmetric squared error loss function. There is no doubt that it is reasonable to use the squared error loss function when the loss is essentially symmetric. However, for some estimation problems, the actual loss is often asymmetric. Therefore, we also need to choose an asymmetric loss function.
We consider Bayesian estimators under symmetric and asymmetric loss functions. Suppose that δ is a Bayesian estimate of θ. First, the symmetric squared error loss (SEL) function is written as

L_S(θ, δ) = (δ − θ)².

Under L_S(θ, δ), the Bayesian estimate of θ is the posterior mean,

δ_S = E(θ | x).

In light of [19], the balanced squared error loss (BSEL) function has the following form:

L_B(θ, δ) = ω(δ − δ_0)² + (1 − ω)(δ − θ)²,

where 0 ≤ ω ≤ 1 and δ_0 is a known estimate of θ. It is worth noting that when ω = 0, the BSEL function reduces to the SEL function. Based on the BSEL function L_B(θ, δ), the Bayes estimator is

δ_B = ωδ_0 + (1 − ω)E(θ | x).

Let X = (X_1:m:n, X_2:m:n, · · · , X_m:m:n) denote a progressive type-II censored sample from the NH distribution. Suppose that the two independent parameters α and λ follow gamma prior distributions,

π_1(α) ∝ α^{a−1} e^{−bα}, α > 0, and π_2(λ) ∝ λ^{c−1} e^{−dλ}, λ > 0.

The prior information is expressed by selecting numerical values for the four hyperparameters a, b, c, and d. Hence, the joint prior density is

π(α, λ) ∝ α^{a−1} λ^{c−1} e^{−bα − dλ},

and the joint posterior density is then

φ(α, λ | x) = L(α, λ | x)π(α, λ)/K,

where K = ∫∫ L(α, λ | x)π(α, λ) dα dλ denotes the normalizing constant. As we can see, the joint posterior distribution cannot be handled analytically, and Bayesian estimators of functions of α and λ involve ratios of two integrals. Therefore, to calculate Bayesian estimators under these two loss functions, two approximation methods for the corresponding ratios of integrals are proposed in the next two subsections.
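Both approximation methods operate on the unnormalized log-posterior, which combines the progressive type-II likelihood with the gamma priors (dropping the constant K). A sketch, with an illustrative function name:

```python
import numpy as np

def log_posterior(alpha, lam, x, R, a, b, c, d):
    """Unnormalized log posterior: progressive type-II NH log-likelihood
    plus the log gamma prior kernels for alpha ~ Gamma(a, b) and
    lam ~ Gamma(c, d), with all additive constants dropped."""
    if alpha <= 0 or lam <= 0:
        return -np.inf
    t = 1 + lam * np.asarray(x)
    m = len(x)
    loglik = (m * np.log(alpha * lam) + (alpha - 1) * np.sum(np.log(t))
              + np.sum((1 + np.asarray(R)) * (1 - t ** alpha)))
    logprior = (a - 1) * np.log(alpha) - b * alpha + (c - 1) * np.log(lam) - d * lam
    return loglik + logprior
```

This is exactly the quantity the MH sampler in the next subsection needs, since MH only requires the target density up to a constant.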

Metropolis-Hastings Algorithm
As noted in the previous subsection, the TK method cannot produce interval estimates for the unknown parameters. Thus, we introduce another method, the MH algorithm, to obtain approximate Bayesian point and interval estimates. The MH algorithm is a very general Markov Chain Monte Carlo technique; it was first suggested by [21] and later extended by [22]. From their studies, we know that the MH algorithm can draw random samples from a target distribution known up to a normalizing constant, however complex that distribution is.
A bivariate normal distribution is used as the proposal distribution for α and λ. Treating the posterior density as the target distribution, we draw samples of the unknown parameters and use them to calculate Bayesian point estimates and HPD interval estimates. The main steps of the MH algorithm are presented in Algorithm 3.
To ensure convergence and reduce the effect of the starting value (α_0, λ_0), we discard the initial N_0 draws as burn-in and compute estimates from the remaining N − N_0 samples. Under the SEL function, the Bayesian estimate is

η̂ = (1/(N − N_0)) ∑_{q=N_0+1}^N η_q,

where η can be α or λ.
Likewise, we calculate the corresponding Bayes estimates based on the BSEL function. In addition, the 100(1 − p)% HPD credible intervals are set up according to [23]. We sort the remaining samples in ascending order as η_(N_0+1), η_(N_0+2), · · · , η_(N). Then the 100(1 − p)% Bayes interval estimates of η are

(η_(j), η_(j+[(1−p)(N−N_0)])), j = N_0 + 1, · · · , N − [(1 − p)(N − N_0)], (39)

where [y] denotes the integer part of y. The shortest interval among (39) is taken as the 100(1 − p)% HPD interval of η.
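The sampler and the shortest-interval search can be sketched as follows. The MH routine is generic (random-walk proposals from a bivariate normal); in practice `log_post` would be the NH joint log-posterior, but the demonstration below uses a standard normal target purely to show the mechanics.

```python
import numpy as np

def mh_sample(log_post, start, cov, N=10000, rng=None):
    """Random-walk Metropolis-Hastings with a bivariate normal proposal.
    log_post(theta) is the log of the (unnormalized) target density."""
    rng = np.random.default_rng(rng)
    chain = np.empty((N, 2))
    cur = np.asarray(start, dtype=float)
    cur_lp = log_post(cur)
    for q in range(N):
        prop = rng.multivariate_normal(cur, cov)
        lp = log_post(prop)
        # accept with probability min(1, target(prop)/target(cur))
        if np.log(rng.uniform()) < lp - cur_lp:
            cur, cur_lp = prop, lp
        chain[q] = cur
    return chain

def hpd_interval(draws, p=0.05):
    """Shortest interval containing 100(1-p)% of the sorted draws."""
    s = np.sort(draws)
    k = int(np.floor((1 - p) * len(s)))
    widths = s[k:] - s[:len(s) - k]
    j = int(np.argmin(widths))
    return s[j], s[j + k]
```

Discarding the first N_0 rows of the chain and averaging the rest gives the SEL Bayes estimates; `hpd_interval` applied to each marginal gives the HPD intervals.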

Bayesian Prediction
We now predict future samples from the observed information and construct the corresponding predictive intervals. Bayesian prediction of future samples through the predictive distribution is an important topic in statistics. Following [24], we discuss Bayesian prediction in the one-sample and two-sample situations.

One-Sample Prediction
Let X = (X_1:m:n, X_2:m:n, · · · , X_m:m:n) denote a set of m progressive type-II censored data obtained from n samples following the NH distribution in (1) under the censoring scheme R = (R_1, R_2, · · · , R_m). Suppose that Z_i = (Z_i1, Z_i2, · · · , Z_iR_i) stands for the ordered statistics of the R_i units removed when the ith failure is observed. Using the observed sample x, we focus on predicting the values z = (z_ik, k = 1, · · · , R_i; i = 1, · · · , m). The conditional density of z given the data can be derived, and from it the corresponding survival function. Based on the posterior φ(α, λ | x), the posterior predictive density and the posterior predictive survival function S*(· | x) follow by integrating these quantities against the posterior. Consequently, the two-sided 100(1 − p)% symmetric predictive interval (L_o, U_o) of z solves the equations

S*(L_o | x) = 1 − p/2 and S*(U_o | x) = p/2.

As for the point prediction of the kth future sample z among the R_i surviving units, under the SEL function it is predicted by the posterior predictive mean

ẑ = E(z | x) = ∫∫ H(α, λ)φ(α, λ | x) dα dλ, where H(α, λ) = E(z | x, α, λ).

These expressions cannot be evaluated analytically, so we adopt the MH algorithm to compute the point predictions. Let (α_q, λ_q), q = 1, 2, · · · , N, denote the samples derived from the posterior distribution. Then the point prediction of z is computed as

ẑ = (1/N) ∑_{q=1}^N H(α_q, λ_q).

Two-Sample Prediction
Prediction in the two-sample situation is generally an extension of that in the one-sample situation. There are two samples in the two-sample prediction problem: one is named the informative sample, and the other the future sample. Suppose that W = (W_1, W_2, · · · , W_M) of size M stands for the ordered statistics of the future observations and is independent of the informative sample X = (X_1:m:n, · · · , X_m:m:n). We focus on predicting the jth failure of a future sample W generated identically from the target distribution. The predictive density of W_j is that of the jth order statistic of M independent NH variates,

f_{W_j}(w_j | α, λ) = j (M choose j) [F(w_j; α, λ)]^{j−1} [1 − F(w_j; α, λ)]^{M−j} f(w_j; α, λ),

where w_j denotes the value of W_j. The survival function can then be computed as S_j(u | α, λ) = P(W_j > u).
Consequently, the posterior predictive survival function is written as

S*_j(u | x) = ∫∫ S_j(u | α, λ)φ(α, λ | x) dα dλ,

where x = (x_1, x_2, · · · , x_m). The two-sided 100(1 − p)% symmetric predictive interval (L_t, U_t) of w_j is the solution to

S*_j(L_t | x) = 1 − p/2 and S*_j(U_t | x) = p/2.

The posterior predictive density is obtained analogously. Then the point prediction of the jth order lifetime is

ŵ_j = E(W_j | x) = ∫∫ H(α, λ)φ(α, λ | x) dα dλ, where H(α, λ) = E(W_j | α, λ).

Using the MH algorithm as in the previous subsection, we can calculate the point prediction of a future sample W_j as

ŵ_j = (1/N) ∑_{q=1}^N H(α_q, λ_q). (54)
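The inner expectation H(α, λ) = E(W_j | α, λ) is the mean of the jth order statistic of M iid NH variates, which has no closed form but is straightforward by quadrature. A sketch with illustrative names, averaging H over posterior draws as in (54):

```python
import numpy as np
from scipy.integrate import quad
from math import comb

def order_stat_mean(j, M, alpha, lam):
    """E[W_(j)] for the jth order statistic of M iid NH(alpha, lam)
    variates, i.e. H(alpha, lam) in the two-sample point prediction."""
    def integrand(w):
        t = (1 + lam * w) ** alpha
        F = 1 - np.exp(1 - t)                               # NH CDF
        pdf = alpha * lam * (1 + lam * w) ** (alpha - 1) * np.exp(1 - t)
        return w * F ** (j - 1) * (1 - F) ** (M - j) * pdf
    c = M * comb(M - 1, j - 1)  # order-statistic normalizing constant
    val, _ = quad(integrand, 0, np.inf)
    return c * val

def two_sample_point_pred(j, M, post_draws):
    """Average H over posterior draws (alpha_q, lam_q), Equation (54)."""
    return float(np.mean([order_stat_mean(j, M, a, l) for a, l in post_draws]))
```

For j = M = 1 and alpha = 1 the NH distribution reduces to the exponential, giving a quick sanity check on the quadrature.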

Data Analysis and Simulation Study
To demonstrate practical applications of the provided techniques for estimation and prediction, the next two subsections compare and appraise their performance by analyzing a real data set and carrying out a simulation study, respectively.

Data Analysis
We consider a set of real data on the total rainfall (unit: inches) in January for the years 1880 to 1916, studied by [12] and listed in Table 1 below. To analyze the real data, we first calculate the MLEs through the EM algorithm and then compute the Akaike information criterion (AIC), Bayesian information criterion (BIC), and Kolmogorov-Smirnov (K-S) statistics for goodness-of-fit tests. For comparison, some other lifetime distributions are also subjected to the goodness-of-fit tests, namely the Generalized Exponential, Chen, and Inverse Weibull distributions. Their PDFs are given below.
(1) The PDF of the Generalized Exponential distribution is f(x) = αλ e^{−λx}(1 − e^{−λx})^{α−1}, x > 0. (2) The PDF of the Chen distribution is f(x) = λβ x^{β−1} exp{x^β + λ(1 − e^{x^β})}, x > 0. (3) The PDF of the Inverse Weibull distribution is f(x) = αβ x^{−(β+1)} exp{−α x^{−β}}, x > 0. Table 2 presents the test results for all of these distributions. Smaller K-S, AIC, and BIC values, together with a larger log-likelihood value, indicate a better-performing model. It can therefore be seen from the table that the NH distribution provides the most suitable fit to this set of real data. For further illustration, two plots based on the MLEs are provided in Figure 1: the first contains the fitted CDFs of the Nadarajah-Haghighi, Generalized Exponential, Chen, and Inverse Weibull distributions for the real data set, and the second shows the histogram of the real data together with the fitted PDFs of these four distributions.
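The goodness-of-fit quantities reported in Table 2 can be computed as follows; `fit_summary` is an illustrative name, and the fitted CDF is passed in as a callable.

```python
import numpy as np
from scipy.stats import kstest

def fit_summary(x, loglik_max, k_params, cdf):
    """AIC, BIC, and K-S statistic for a fitted lifetime model.
    loglik_max is the maximized log-likelihood, k_params the number of
    free parameters, and cdf(x) the fitted CDF evaluated at the data."""
    n = len(x)
    aic = 2 * k_params - 2 * loglik_max
    bic = k_params * np.log(n) - 2 * loglik_max
    ks = kstest(x, cdf).statistic
    return aic, bic, ks
```

Running this once per candidate distribution, with each model's own maximized log-likelihood and fitted CDF, reproduces the comparison described above.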
Then, we obtain statistical inferences for the two sets of data. For the unknown parameters, we report the TK and MH estimates together with the MLEs under different loss functions in Table 5. We then tabulate interval estimates of α and λ in Table 6, including the 90% asymptotic CIs, 95% bootstrap CIs, and 95% HPD intervals. Additionally, in Table 7, for the one-sample situation, we tabulate point predictions and 95% predictive intervals for the kth experimental unit censored at the ith failure. Furthermore, Table 8 gives the point predictions and 95% predictive intervals for the first five future observations of size 25 and 30 under the two-sample framework, respectively.

Simulation Study
We implement a simulation study to compare numerical results and evaluate the proposed methods. Using different censoring schemes, we first generate progressive type-II censored samples from an NH(α, λ) distribution with true values α = 1.4 and λ = 0.2. Next, the EM algorithm is applied to compute the MLEs. Afterwards, we set the hyperparameters (a, b, c, d) to (0.5, 1.7, 0.5, 5.6). The Bayesian estimators under the two types of loss functions are calculated by the TK and MH methods. We use mean-squared error (MSE) values to compare the estimates obtained by the different methods. The MSEs of the estimates of α and λ have the forms

MSE(α̂) = (1/N) ∑_{q=1}^N (α̂_q − α)² and MSE(λ̂) = (1/N) ∑_{q=1}^N (λ̂_q − λ)²,

where α̂_q and λ̂_q denote the estimates of α and λ obtained by a given method at the qth simulation replication and N is the number of replications. Different values of ω in the BSEL function are considered to measure its influence on the Bayes estimates. In addition, all the (n, m) pairs and censoring schemes are shown in the tables, where n denotes the number of complete samples and m the number of observed samples.
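The progressive type-II censored samples used throughout the simulation can be generated with the standard uniform-transformation algorithm of Balakrishnan and Sandhu, then mapped through the NH quantile function. A sketch, with an illustrative function name:

```python
import numpy as np

def progressive_censored_nh(alpha, lam, R, rng=None):
    """Generate a progressive type-II censored sample of size m = len(R)
    from NH(alpha, lam) under scheme R, via the Balakrishnan-Sandhu
    uniform-transformation algorithm."""
    rng = np.random.default_rng(rng)
    R = np.asarray(R)
    m = len(R)
    w = rng.uniform(size=m)
    # gamma_i = i + R_m + R_{m-1} + ... + R_{m-i+1}
    gamma = np.arange(1, m + 1) + np.cumsum(R[::-1])
    v = w ** (1.0 / gamma)
    # U_i = 1 - V_m * V_{m-1} * ... * V_{m-i+1}: ordered uniforms
    u = 1 - np.cumprod(v[::-1])
    # invert the NH CDF: x = ((1 - ln(1-u))^(1/alpha) - 1) / lam
    return ((1 - np.log(1 - u)) ** (1.0 / alpha) - 1) / lam
```

When R is all zeros (and m = n), the output reduces to ordinary order statistics from a complete NH sample, which provides a convenient sanity check.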
In Table 9, we tabulate the average estimates and MSE values of α and λ under diverse censoring schemes. For each (n, m) with each censoring scheme, the first and second rows show the average estimates of α and their MSEs, while the third and fourth rows show the average estimates of λ and their MSEs. The table demonstrates that the estimates computed through the MH algorithm perform better than those obtained by the TK method. Moreover, the Bayes estimates almost always outperform the corresponding MLEs.
In Table 10, under pre-specified sample sizes and censoring schemes, we construct various interval estimates of the unknown parameters together with their coverage probabilities (CPs) and average lengths (ALs). For each (n, m) with each censoring scheme, the first and second rows show the results for α and λ, respectively. For further illustration, four plots are provided in Figures 2 and 3, where the numbers 1 to 12 on the x-axis denote the schemes from top to bottom in the table. Figure 2 shows the ALs of the different intervals of α and λ under the different schemes, and Figure 3 shows the corresponding CPs. As we can see, the ALs of the HPD intervals are the shortest, followed by the bootstrap intervals, while the asymptotic intervals are the longest. As for the CPs, the asymptotic intervals attain the highest values, followed by the HPD intervals, while the CPs of the bootstrap intervals often lie below the nominal level. Therefore, considering both AL and CP, we regard the asymptotic and bootstrap intervals as inferior to the HPD intervals. In Table 11, we compute point predictions and 95% predictive intervals for different censored observations using the MH algorithm. The predictive intervals show that when more than one unit is censored at a time, the AL of the prediction interval for a unit censored at an early stage is shorter than that for one censored at a late stage. However, when only one unit is censored at a time, the later the stage, the smaller the AL of the interval prediction. The AL of the prediction intervals also tends to shorten as the number of observed samples grows. In Table 12, under the two-sample prediction framework, we tabulate the point predictions and 95% predictive intervals of the first, third, and fifth future observations of total size 5.
It is worth noting that these predictive intervals tend to become longer as j increases. Consequently, M = m is used when analyzing the real data.

Conclusions
This article provides classical and Bayesian estimation for the Nadarajah-Haghighi distribution using progressive type-II censored samples. For the unknown parameters, we compute maximum likelihood estimates through an Expectation-Maximization algorithm. Under two kinds of loss functions, the balanced squared error and squared error loss functions, we calculate varieties of approximate Bayes estimates using both the Metropolis-Hastings algorithm and the Tierney and Kadane method. In addition, we construct confidence intervals, including asymptotic intervals and bootstrap intervals; using the sample derived from the Metropolis-Hastings algorithm, we also obtain highest posterior density intervals. Furthermore, point and interval prediction for a future sample is considered in the one-sample and two-sample situations. Finally, we compare and appraise the performance of the provided techniques through a simulation study, and one set of real rainfall data is analyzed for further illustration.
Overall, the numerical studies show that the proposed methods work well. For instance, compared with the maximum likelihood estimates and the Bayes estimates obtained by the Tierney and Kadane method, the Bayes estimates obtained by the Metropolis-Hastings algorithm are closer to the true values of the unknown parameters, with MSEs mostly below 0.1. In addition, Table 7 shows that the true values of the censored samples all fall within the predictive intervals.
Additionally, the methods provided in this article can be extended to other lifetime distributions.