Classical and Bayesian Inference for a Progressive First-Failure Censored Left-Truncated Normal Distribution

Abstract: Point and interval estimation are considered for a progressive first-failure censored left-truncated normal distribution in this paper. First, we derive the maximum likelihood estimators (MLEs) of the parameters. Subsequently, we construct asymptotic confidence intervals based on these estimates and on the log-transformed estimates, using the asymptotic normality of MLEs. Bootstrap methods are also proposed for the construction of confidence intervals. For Bayesian estimation, we implement the Lindley approximation method to obtain the Bayes estimates under a symmetric loss function as well as under asymmetric loss functions. An importance sampling procedure is applied as well, and the highest posterior density (HPD) credible intervals are established within this procedure. The efficiencies of the classical and Bayesian inference methods are evaluated through extensive simulations. We conclude that the Bayes estimates given by the Lindley approximation under the Linex loss function are highly recommended and that the HPD interval possesses the narrowest length among the proposed intervals. Finally, we analyze a real dataset describing the tensile strength of 50 mm carbon fibers as an illustrative example.


Introduction
Presently, owing to increasingly fierce market competition, product reliability has generally improved with advances in production technology. It often takes a long time to observe the failure times of all units in a life-testing experiment, which results in a significant increase in test time and cost. Therefore, censoring naturally appears in reliability studies as a consequence of limits on the duration and cost of experiments. In the literature, numerous authors have investigated traditional type-I censoring, under which the life test terminates when the experimental time reaches a preset limit, as well as type-II censoring, under which the test terminates when the number of observed failures reaches a preset target. Neither scheme allows the removal of test units during the experiment, which is one of their drawbacks. To address this, the concept of progressive censoring was proposed, allowing units to exit the experiment before failure. In some situations, the loss of units is beyond the experimenters' control and may be caused by sudden damage to the experimental equipment. Units may also be removed intentionally, in order to free up experimental facilities and materials for other experiments and to save time and cost. One may refer to [1], which provides an elaborate discussion of progressive censoring. Even so, progressive censoring sometimes cannot meet the restrictions on test time and cost, and various censoring patterns have been proposed successively to improve efficiency.
When experimental materials are relatively cheap, we can use k × n units for the experiment instead of only n units and randomly divide them into n groups, each containing k independent units. Under first-failure censoring, only the first failure within each group is observed; combining this idea with progressive censoring yields the progressive first-failure censoring scheme (PFFCS) considered in this paper.
The truncated normal distribution is used in many fields, including education, insurance, engineering, biology, and medicine. When a threshold is imposed on a normally distributed dataset, the remaining data naturally follow a truncated normal distribution. For instance, suppose all college admission candidates whose SAT scores fall below a screening value are eliminated and we are interested in the scores of the remaining candidates: if the original score population is normally distributed, the problem becomes one of investigating a truncated normal distribution. Generally speaking, a truncated normal distribution is either one-sided or two-sided truncated, according to the number of truncation points. With respect to the truncation range, the one-sided truncated normal distribution can be subdivided into left-truncated and right-truncated, also known as lower-truncated and upper-truncated.
The truncated normal distribution has recently attracted a lot of research interest. The existence of MLEs for the parameters was discussed in [11] when the two truncation points were known, and modified MLEs were further explored to improve the efficiency of the estimation. Ref. [12] developed a handy algorithm to compute the expectation and variance of a truncated normally distributed variable and compared its behavior under both full and censored data. As for the left-truncated normal distribution, Ref. [13] employed three available approaches to investigate the problem of sample size. In addition to MLEs and Bayes estimators, Ref. [14] also proposed a midpoint approximation to derive parameter estimates based on a progressive type-I interval-censored sample, and optimal censoring plans were considered at the same time. To advance research on the standardized truncated normal distribution, Ref. [15] developed a standard truncated normal distribution that retains mean zero and variance one regardless of where the truncation points lie. Ref. [16] proposed a mixed truncated normal distribution to describe the wind speed distribution and verified its validity.
When it comes to the properties of the truncated normal distribution, it is worth noting that its shape and scale parameters are not equal to its expectation and variance but correspond to the parameters of the parent normal distribution before truncation. After the truncation range is determined, the value of the probability density function outside it becomes zero, while the value within it is adjusted uniformly to make the total integral one. Therefore, expectation and variance are adjusted for the truncation.
Assume X is a truncated normally distributed random variable with truncation range (a, b), and let µ and τ denote its shape and scale parameters. Its expectation and variance are

E(X) = µ + √τ [φ(α) − φ(β)] / [Φ(β) − Φ(α)],

Var(X) = τ {1 + [αφ(α) − βφ(β)] / [Φ(β) − Φ(α)] − ([φ(α) − φ(β)] / [Φ(β) − Φ(α)])²},

where α = (a − µ)/√τ, β = (b − µ)/√τ, and φ(·) and Φ(·) denote the probability density function (PDF) and cumulative distribution function (CDF) of the standard normal distribution. Characteristics of some mechanical and electrical products, such as material strength, wear life, gear bending strength, and fatigue strength, can be modeled by a truncated normal life distribution. As a lifetime distribution, the domain of the random variable should be nonnegative, so it is reasonable to assume x > 0. We therefore consider the left-truncated normal distribution with truncation range (0, ∞), denoted TN(µ, τ). The corresponding PDF f(x) and CDF F(x) of the distribution TN(µ, τ) are

f(x) = φ((x − µ)/√τ) / [√τ Φ(µ/√τ)], x > 0, (3)

F(x) = [Φ((x − µ)/√τ) − Φ(−µ/√τ)] / Φ(µ/√τ), x > 0, (4)

where µ is the shape parameter and τ is the scale parameter. The survival function is

S(x) = 1 − F(x) = [1 − Φ((x − µ)/√τ)] / Φ(µ/√τ). (5)

For comparison, Figure 2 visually shows the distinction between the PDFs of three parent normal distributions N(µ, τ), which share the same τ but differ in µ, and the corresponding truncated normal distributions TN(µ, τ). Obviously, the parent normal distribution whose shape parameter is closest to the truncation point changes drastically after truncation, while the one with the shape parameter farthest from the truncation point barely changes and even retains the same pattern as the normal. In particular, it is worth mentioning that when the truncation point T* satisfies |T* − µ|/√τ ≥ 3.5, the truncation essentially loses its effect. In Figure 3, we can observe that the position of the peak of the truncated normal density coincides with the value of µ. As µ gets closer to the truncation point, the peak value of the PDF of TN(µ, τ) with the same scale parameter becomes larger, because the density must still integrate to one.
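The moment formulas above are easy to check numerically. The following Python sketch (illustrative; the parameter values are arbitrary) computes the mean and variance of TN(3, 1) directly from φ and Φ, and confirms the remark that truncation at zero barely affects a parent distribution whose mean is three standard deviations from the truncation point.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tn_moments(mu, tau, a=0.0):
    """Mean and variance of N(mu, tau) left-truncated at a."""
    s = math.sqrt(tau)
    alpha = (a - mu) / s
    lam = phi(alpha) / (1.0 - Phi(alpha))  # normal hazard at alpha
    mean = mu + s * lam
    var = tau * (1.0 + alpha * lam - lam * lam)
    return mean, var

mean, var = tn_moments(3.0, 1.0)
print(mean, var)  # mean barely above 3, variance barely below 1
```

Here |T* − µ|/√τ = 3, so the adjustment is already tiny, in line with the 3.5 rule of thumb stated above.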
However, for the same shape parameter, the PDF becomes flatter as the scale parameter increases, which is consistent with the behavior of the normal distribution. This article is organized as follows. First, the MLEs of the two unknown parameters of the distribution TN(µ, τ) are derived in Section 2, and we establish the corresponding asymptotic confidence intervals (ACIs) associated with the approximate asymptotic variance-covariance matrix. In Section 3, the bootstrap resampling method is applied to develop both bootstrap-p and bootstrap-t intervals. In Section 4, we propose Bayes approaches to estimate the two parameters under the squared error, Linex, and general entropy loss functions using the Lindley approximation. As this approximation method fails to provide credible intervals, the importance sampling procedure is employed to obtain both parameter estimates and the highest posterior density (HPD) credible intervals. The behaviors of the various estimators proposed in the above sections are evaluated and compared through extensive simulations in Section 5. In Section 6, a real dataset is introduced and used to clarify how to make statistical inferences with the presented methods and to demonstrate their effectiveness. Finally, a summary of the whole article is given in Section 7.

Maximum Likelihood Estimation
Suppose that a progressive first-failure type-II censored sample comes from a continuous population with PDF f(·) and CDF F(·). Denote the ith observation by x^R_(i:m:n:k), so that x^R_(1:m:n:k) < x^R_(2:m:n:k) < · · · < x^R_(m:m:n:k), where R = (R_1, R_2, · · · , R_m). For simplicity, let x = (x_1, x_2, · · · , x_m) replace (x^R_(1:m:n:k), x^R_(2:m:n:k), · · · , x^R_(m:m:n:k)). According to [1,2], the joint PDF is

f(x_1, · · · , x_m) = c k^m ∏_{i=1}^m f(x_i)[1 − F(x_i)]^{k(R_i+1)−1}, (6)

where c = n(n − R_1 − 1)(n − R_1 − R_2 − 2) · · · (n − R_1 − · · · − R_{m−1} − m + 1).

For the case where the sample is from TN(µ, τ), combining (3), (4) and (6), the likelihood function turns out to be

L(µ, τ|x) ∝ k^m τ^{−m/2} [Φ(µ/√τ)]^{−N} exp{−∑_{i=1}^m (x_i − µ)²/(2τ)} ∏_{i=1}^m [1 − Φ((x_i − µ)/√τ)]^{k(R_i+1)−1}, (7)

where N = k × n. Hence, up to an additive constant, the log-likelihood function is

l(µ, τ|x) = m ln k − (m/2) ln τ − N ln Φ(µ/√τ) − ∑_{i=1}^m (x_i − µ)²/(2τ) + ∑_{i=1}^m [k(R_i + 1) − 1] ln[1 − Φ((x_i − µ)/√τ)]. (8)

Taking the partial derivatives of (8) with respect to µ and τ and setting them equal to zero, the corresponding equations are

∂l/∂µ = −(N/√τ) φ(µ/√τ)/Φ(µ/√τ) + (1/τ) ∑_{i=1}^m (x_i − µ) + (1/√τ) ∑_{i=1}^m [k(R_i + 1) − 1] φ(z_i)/[1 − Φ(z_i)] = 0 (9)

and

∂l/∂τ = −m/(2τ) + (Nµ/(2τ^{3/2})) φ(µ/√τ)/Φ(µ/√τ) + (1/(2τ²)) ∑_{i=1}^m (x_i − µ)² + (1/(2τ)) ∑_{i=1}^m [k(R_i + 1) − 1] z_i φ(z_i)/[1 − Φ(z_i)] = 0, (10)

where z_i = (x_i − µ)/√τ.
The roots of the non-linear Equations (9) and (10) are the MLEs µ̂ and τ̂, but explicit expressions are clearly unobtainable, so numerical techniques such as the Newton-Raphson method are employed to derive the MLEs.
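As a concrete illustration, the Python sketch below (NumPy/SciPy; the data are simulated, not the paper's) maximizes the PFF censored log-likelihood for the simple case R_i = 0, i.e. pure first-failure censoring of n groups of size k, using a derivative-free optimizer in place of Newton-Raphson. All settings are illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(42)
mu_true, tau_true = 3.0, 1.0
k, n = 2, 150

def rtruncnorm(size):
    """Draw TN(mu_true, tau_true) on (0, inf) by rejection from the parent normal."""
    out = []
    while len(out) < size:
        draws = rng.normal(mu_true, np.sqrt(tau_true), size)
        out.extend(draws[draws > 0])
    return np.array(out[:size])

# first-failure censored sample: minimum of each group of k units
x = rtruncnorm(k * n).reshape(n, k).min(axis=1)
R = np.zeros(n)  # no progressive removals in this sketch

def neg_loglik(theta):
    """Negative PFF censored log-likelihood, parameterized by (mu, log tau)."""
    mu, log_tau = theta
    s = np.exp(0.5 * log_tau)  # sqrt(tau)
    z = (x - mu) / s
    N = k * n
    ll = (n * np.log(k) - N * norm.logcdf(mu / s)
          + np.sum(norm.logpdf(z)) - n * np.log(s)
          + np.sum((k * (R + 1) - 1) * norm.logsf(z)))
    return -ll

res = minimize(neg_loglik, x0=[2.0, 0.0], method="Nelder-Mead")
mu_hat, tau_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, tau_hat)  # close to (3, 1)
```

Optimizing over (µ, ln τ) keeps τ positive without explicit constraints, a common trick when the scale parameter must stay in (0, ∞).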

Asymptotic Confidence Intervals for MLEs
Given that MLEs possess asymptotic normality, the ACIs of µ and τ can be established using Var(µ̂) and Var(τ̂). The asymptotic variances of the MLEs can be obtained from the inverse of the Fisher information matrix. Let θ = (θ_1, θ_2) = (µ, τ). The Fisher information matrix (FIM) I(θ) can be written as

I(θ) = −E[∂²l/∂θ_i ∂θ_j], i, j = 1, 2. (11)

The FIM I(θ) is expressed as an expectation, and obtaining its exact value depends on the distribution of the order statistics X_(j). Ref. [1] provided the PDF of the order statistics X_(j) of progressive type-II censored data in general, given in (12). Since progressive first-failure censoring can be regarded as an extension of progressive type-II censoring, we can derive the PDF of the order statistics X_(j) of the truncated normal distribution TN(µ, τ) under PFFCS after some transformations of (12), given in (13). Then the FIM I(θ) can be calculated directly based on (13). In practice, the expectation in (11) is dropped and the observed FIM is used to approximate the asymptotic variance-covariance matrix, which greatly simplifies the calculation; see [2,17]. The observed FIM corresponds to

I(θ̂) = −[∂²l/∂θ_i ∂θ_j]|_{θ=θ̂},

where θ̂ denotes the MLE of θ, namely θ̂ = (θ̂_1, θ̂_2) = (µ̂, τ̂). Then the approximate asymptotic variance-covariance matrix is

V = I^{−1}(θ̂) = [var(µ̂), cov(µ̂, τ̂); cov(µ̂, τ̂), var(τ̂)].

Therefore, the 100(1 − ξ)% ACI for θ_j is given by

(θ̂_j − z_{ξ/2}√var(θ̂_j), θ̂_j + z_{ξ/2}√var(θ̂_j)),

where z_{ξ/2} is the upper ξ/2 quantile of the standard normal distribution.
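The observed-information route can be sketched as follows (Python; a simulated complete TN sample is used for brevity, so the likelihood is the special case k = 1, R = 0, m = n; all settings illustrative). The Hessian is approximated by central finite differences, inverted, and used to form 95% ACIs.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(7)
mu0, tau0, n = 3.0, 1.0, 300

# complete TN(mu0, tau0) sample via rejection from the parent normal
x = rng.normal(mu0, np.sqrt(tau0), 2 * n)
x = x[x > 0][:n]

def nll(theta):
    """Negative log-likelihood of a complete TN(mu, tau) sample."""
    mu, tau = theta
    if tau <= 0:
        return np.inf
    s = np.sqrt(tau)
    return -(np.sum(norm.logpdf((x - mu) / s)) - n * np.log(s)
             - n * norm.logcdf(mu / s))

mle = minimize(nll, x0=[2.0, 2.0], method="Nelder-Mead").x

def observed_fim(f, theta, h=1e-3):
    """Central finite-difference Hessian of f at theta (observed information)."""
    p = len(theta)
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            t = np.array(theta, dtype=float)
            def shift(di, dj):
                u = t.copy()
                u[i] += di * h
                u[j] += dj * h
                return f(u)
            H[i, j] = (shift(1, 1) - shift(1, -1)
                       - shift(-1, 1) + shift(-1, -1)) / (4 * h * h)
    return H

V = np.linalg.inv(observed_fim(nll, mle))  # asymptotic variance-covariance
z = 1.959964                               # upper 2.5% normal quantile
acis = [(mle[j] - z * np.sqrt(V[j, j]), mle[j] + z * np.sqrt(V[j, j]))
        for j in range(2)]
print(mle, acis)
```

For the full PFF censored likelihood the same recipe applies; only `nll` changes.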

Asymptotic Confidence Intervals for Log-Transformed MLEs
The method just proposed has an obvious defect: the lower bound of the ACI is prone to be negative when the true value of the parameter is small. Since the parameter τ discussed in this paper is strictly positive, the negative part of such a confidence interval is unreasonable. To avoid this issue, we can use the delta method together with the logarithmic transformation proposed in [18]. The same device applies to µ when µ > 0. The asymptotic distribution of ln θ̂_j is

(ln θ̂_j − ln θ_j)/√var(ln θ̂_j) →D N(0, 1),

where →D denotes convergence in distribution and var(ln θ̂_j) = var(θ̂_j)/θ̂_j². Therefore, the asymptotic confidence intervals based on the log-transformed MLEs are

(θ̂_j exp{−z_{ξ/2}√var(ln θ̂_j)}, θ̂_j exp{z_{ξ/2}√var(ln θ̂_j)}).

Both ACIs rest on the premise that the MLEs are asymptotically normally distributed. Hence, if the sample size is not large enough, the accuracy of these two confidence intervals may decline. In the next section, we provide a resampling technique for building confidence intervals for the parameters under a small sample size.
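As a small numerical illustration (the values of θ̂ and its variance are hypothetical), the following Python snippet shows how the log-transformation repairs a negative lower bound:

```python
import math

def aci(theta_hat, var_hat, z=1.959964):
    """Plain ACI: theta_hat +/- z * sqrt(var)."""
    half = z * math.sqrt(var_hat)
    return theta_hat - half, theta_hat + half

def log_aci(theta_hat, var_hat, z=1.959964):
    """Log-transformed ACI via the delta method: var(ln t) = var(t) / t^2."""
    half = z * math.sqrt(var_hat) / theta_hat
    return theta_hat * math.exp(-half), theta_hat * math.exp(half)

# a small positive parameter with a relatively large variance
t, v = 0.3, 0.04
print(aci(t, v))      # lower bound is negative
print(log_aci(t, v))  # both bounds stay positive
```

The log-interval is no longer symmetric about θ̂, which is exactly what keeps it inside (0, ∞).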

Bootstrap Confidence Intervals
Bootstrap methods are particularly useful when the effective sample size m is small, so here we propose two widely used bootstrap methods to establish the intervals; see [19]. One is the percentile bootstrap (boot-p) method; the other is the bootstrap-t (boot-t) method. The specific steps of the two methods are as follows.

Percentile Bootstrap Confidence Intervals
Step 1 For a given PFF censored sample x from TN(µ, τ) with n, m, k and R = (R 1 , R 2 , · · · , R m ), figure the MLEs of parameters µ and τ under the primitive sample x, denoted asμ andτ.
Step 2 In accordance with the identical censoring pattern (n, m, k, R) as x, generate a PFF censored bootstrap sample x * from TN(μ,τ). Similarly to step 1, compute the bootstrap MLEsμ * andτ * based on x * .
Step 3 Repeat step 2 K times to obtain the bootstrap estimates µ̂*_1, · · · , µ̂*_K and τ̂*_1, · · · , τ̂*_K.
Step 4 Arrange the bootstrap estimates of each parameter in ascending order. For θ ∈ {µ, τ}, the 100(1 − ξ)% boot-p confidence interval is then given by (θ̂*_(Kξ/2), θ̂*_(K(1−ξ/2))).
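The percentile scheme can be sketched in Python as follows; for brevity this toy version uses an untruncated normal parent and moment estimates in place of the TN(µ, τ) MLEs, so it only illustrates the resampling logic (all settings illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(3.0, 1.0, 60)  # stand-in for the observed sample

# Step 1: estimate the parameters from the primitive sample
mu_hat, sd_hat = x.mean(), x.std(ddof=1)

# Steps 2-3: regenerate from the fitted model and re-estimate, K times
K = 1000
boot = np.empty(K)
for b in range(K):
    xb = rng.normal(mu_hat, sd_hat, x.size)
    boot[b] = xb.mean()

# Step 4: the 95% boot-p interval is formed by the empirical percentiles
lo, hi = np.percentile(boot, [2.5, 97.5])
print(mu_hat, (lo, hi))
```

In the actual procedure, each bootstrap sample would be a PFF censored sample drawn from TN(µ̂, τ̂) under the same scheme (n, m, k, R), and each `boot[b]` would be a bootstrap MLE.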

Bayesian Estimation
The selection of the prior distribution is a primary problem in Bayesian estimation, because the prior can have a significant impact on the posterior in small-sample cases. A proper prior distribution is therefore worth discussing at the outset.
In general, a conjugate prior is the preferred choice in Bayesian estimation because of its algebraic convenience. However, no such prior exists when both parameters µ and τ are unknown. For tractability, we seek a prior with the same form as (7). Moreover, according to the form of the denominator of the exponential term in the likelihood (7), τ should appear as a parameter of the prior distribution of µ. It is therefore natural to drop the independence assumption: we presume that τ follows an inverse gamma prior IG(α, β) and that µ, given τ, follows a truncated normal prior, namely µ ~ TN(a, τ/b), where all hyper-parameters are positive. The PDFs of the prior distributions can be written as

π_1(τ) = (β^α/Γ(α)) τ^{−(α+1)} exp(−β/τ), τ > 0,

π_2(µ|τ) ∝ τ^{−1/2} exp{−b(µ − a)²/(2τ)}, µ > 0.

The corresponding joint prior distribution is π(µ, τ) = π_2(µ|τ) π_1(τ). Given x, the joint posterior distribution π(µ, τ|x) can be obtained by

π(µ, τ|x) = π(µ, τ) L(µ, τ|x) / ∫₀^∞ ∫₀^∞ π(µ, τ) L(µ, τ|x) dµ dτ. (22)

Symmetric and Asymmetric Loss Functions
The loss function is used to evaluate the degree to which the predicted value or the estimated value of the parameter is different from the real value. In practice, the squared error loss function has been used extensively in the literature and it is preferred in the case where the loss caused by overestimation and underestimation is of equal importance. But sometimes, it is not appropriate to use a symmetric loss function when an overestimate plays a crucial role compared with an underestimate, or vice versa. Thus, in this subsection, we discuss the Bayesian estimations theoretically under one symmetric loss function, namely squared error loss function (SE), and two non-symmetric loss functions, namely Linex loss function (LX) and general entropy loss function (GE).

Squared Error Loss Function
This loss function is defined as

L_SE(ϑ, ϑ̂) = (ϑ̂ − ϑ)²,

where ϑ̂ denotes any estimate of ϑ. The Bayesian estimator of ϑ under SE is

ϑ̂_SE = E(ϑ|x).

Given a function g(µ, τ), its Bayesian posterior expectation is

E[g(µ, τ)|x] = ∫₀^∞ ∫₀^∞ g(µ, τ) π(µ, τ|x) dµ dτ.

Thus, the Bayesian estimate of g(µ, τ) under SE can be given theoretically as ĝ_SE(µ, τ) = E[g(µ, τ)|x].

Linex Loss Function
The Linex loss function is suggested when underestimation is more costly than overestimation, and it is defined as

L(∆) = b[e^{s∆} − s∆ − 1], b > 0, s ≠ 0.

In fact, it is a family L(∆) in which ∆ can be either the usual estimation error ϑ̂ − ϑ or a relative error (ϑ̂ − ϑ)/ϑ, namely ϑ̂/ϑ − 1. In this paper, we take ∆ = ϑ̂ − ϑ and let b = 1, so that LX becomes

L(∆) = e^{s∆} − s∆ − 1.

The sign of s indicates the direction of the asymmetry, while its magnitude represents the degree of asymmetry. For the same difference ϑ̂ − ϑ, the larger the magnitude of s, the larger the loss. For small values of |s|, LX is almost symmetric and very close to SE. One may refer to [20] for details.
The Bayesian estimator of ϑ under LX is

ϑ̂_LX = −(1/s) ln E(e^{−sϑ}|x),

provided that the expectation exists. Thus, the Bayesian estimate of g(µ, τ) under LX can be given theoretically as ĝ_LX(µ, τ) = −(1/s) ln E[e^{−s g(µ,τ)}|x].

General Entropy Loss Function
This loss function is defined as

L(ϑ̂, ϑ) ∝ (ϑ̂/ϑ)^h − h ln(ϑ̂/ϑ) − 1.

The Bayesian estimator of ϑ under GE is

ϑ̂_GE = [E(ϑ^{−h}|x)]^{−1/h},

provided that the expectation exists. When h > 0, a positive error ϑ̂ − ϑ is more costly than a negative error, and vice versa. In particular, when h = −1, the Bayesian estimate under GE coincides with that under SE.
The Bayesian estimate of g(µ, τ) under GE can be given theoretically as ĝ_GE(µ, τ) = [E(g(µ, τ)^{−h}|x)]^{−1/h}. It is noticeable that all these Bayesian estimates are expressed in terms of ratios of two integrals whose closed forms cannot be presented theoretically. We therefore implement the Lindley approximation method to determine such estimates.
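Given draws from the posterior (however obtained), the three estimators have simple Monte Carlo versions. The Python sketch below uses synthetic gamma draws as a stand-in posterior, purely for illustration, and checks two textbook identities: the LX estimate lies below the SE estimate for s > 0 (by Jensen's inequality), and GE with h = −1 reduces to SE.

```python
import numpy as np

rng = np.random.default_rng(3)
draws = rng.gamma(shape=4.0, scale=0.5, size=20000)  # stand-in posterior sample

def bayes_se(t):
    """Posterior mean: the Bayes estimate under squared error loss."""
    return t.mean()

def bayes_lx(t, s):
    """Bayes estimate under Linex loss: -(1/s) ln E[exp(-s * theta)]."""
    return -np.log(np.mean(np.exp(-s * t))) / s

def bayes_ge(t, h):
    """Bayes estimate under general entropy loss: E[theta^(-h)]^(-1/h)."""
    return np.mean(t ** (-h)) ** (-1.0 / h)

se = bayes_se(draws)
lx = bayes_lx(draws, s=1.0)
ge = bayes_ge(draws, h=-1.0)
print(se, lx, ge)  # lx below se; ge equal to se for h = -1
```

For this gamma stand-in, E(ϑ) = 2 and −ln E(e^{−ϑ}) = 4 ln 1.5 ≈ 1.62, so the asymmetry of LX is clearly visible.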

Lindley Approximation Method
In this subsection, we take advantage of the Lindley approximation to obtain the Bayesian parameter estimates. Consider the posterior expectation of ϕ(µ, τ), expressed as the ratio of two integrals:

E[ϕ(µ, τ)|x] = ∫∫ ϕ(µ, τ) e^{l(µ,τ|x)+ρ(µ,τ)} dµ dτ / ∫∫ e^{l(µ,τ|x)+ρ(µ,τ)} dµ dτ, (34)

where l denotes the log-likelihood function and ρ denotes the logarithm of the joint prior distribution.

Linex Loss Function
For the parameter µ, the quantities ϕ and its derivatives take obvious forms, and the Bayesian estimate of µ under LX is derived by combining (29), (35) and (40). For the parameter τ, the corresponding quantities follow in the same way, and the Bayesian estimate of τ under LX is derived by combining (29), (35) and (42).

General Entropy Loss Function
For the parameter µ, the corresponding items are modified accordingly, and the Bayesian estimate of µ under GE is derived by combining (32), (35) and (44):

µ̂_GE = {µ̂^{−h} + 0.5 ϕ₁₁σ₁₁ + 0.5 ϕ₁[σ₁₁² l₃₀ + 3σ₁₁σ₁₂ l₂₁ + σ₁₁σ₂₂ l₁₂ + · · · ]}^{−1/h}.

For the parameter τ, the corresponding items change in the same way, and the Bayesian estimate of τ under GE is derived by combining (32), (35), and (46). When estimating the ratio of two integrals of the form given in (34), the Lindley approximation method is very effective. Nevertheless, one of its drawbacks is that it provides only point estimates, not credible intervals. Therefore, in the upcoming subsection, we propose the importance sampling procedure to obtain both point and interval estimates.

Importance Sampling Procedure
Here, we propose a useful approach called importance sampling procedure to acquire the Bayesian parameter estimates. Meanwhile, the HPD credible intervals for both parameters are constructed in this procedure.
From (22), the joint posterior distribution can be rewritten as the product of an inverse gamma density IG(α̃, β̃) for τ, a truncated normal density TN(ã, τ/b̃) for µ given τ, and a remaining weight function, as given in (48). According to Lemma 1, the parameters of the inverse gamma distribution and the truncated normal distribution in (48) are positive. Thus, it makes sense to sample τ from IG(α̃, β̃) and then sample µ from TN(ã, τ/b̃).
Proof. According to the sum of squares inequality, it suffices to prove the non-negativity of the associated quadratic function. Notably, this quadratic function attains its minimum at (∑ x_i)² = am, where its value is 2β(b + m), which is strictly positive. Thus, the lemma is proved.
Then, the following steps are used to derive the Bayesian estimate of any function Ω(µ, τ) of the parameters µ and τ.
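A minimal version of the sampling and HPD steps can be sketched in Python as follows. The hyper-parameter values α̃, β̃, ã, b̃ below are hypothetical, the importance weights are omitted for brevity, and the HPD interval is found by the standard shortest-window search over the sorted draws.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(11)
M = 20000
a_t, b_t = 5.5, 2.5  # hypothetical IG(alpha~, beta~) hyper-parameters
a_m, b_m = 3.0, 2.0  # hypothetical TN(a~, tau/b~) hyper-parameters

# tau ~ IG(alpha, beta)  <=>  1/tau ~ Gamma(alpha, 1/beta)
tau = 1.0 / rng.gamma(a_t, 1.0 / b_t, M)
# mu | tau ~ TN(a_m, tau / b_m) truncated to (0, inf)
sd = np.sqrt(tau / b_m)
mu = truncnorm.rvs(-a_m / sd, np.inf, loc=a_m, scale=sd, random_state=rng)

def hpd(sample, cred=0.95):
    """Shortest interval containing a `cred` fraction of the draws."""
    s = np.sort(sample)
    w = int(np.ceil(cred * len(s)))
    widths = s[w - 1:] - s[:len(s) - w + 1]
    i = np.argmin(widths)
    return s[i], s[i + w - 1]

lo, hi = hpd(mu)
print(np.mean(mu), (lo, hi))
```

In the full procedure, the draws would be weighted by the importance function before computing the Bayes estimate of Ω(µ, τ) and the HPD bounds.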

Simulation Study
To evaluate the effectiveness of the proposed methods, extensive simulations are carried out in this section. The maximum likelihood and Bayes estimators proposed above are assessed by the mean absolute bias (MAB) and mean squared error (MSE), whereas the interval estimates are assessed by the mean length (ML) and coverage rate (CR). Looking back at the progressive first-failure censoring presented in the first section, observe that when an experimental group is regarded as one unit, the lifetime of this unit becomes the distribution of the minimum of the group. In this way, it is straightforward to generate a PFF censored sample through a simple modification of the algorithm introduced in [22]; Algorithm 1 below gives the specific generation method.
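Algorithm 1 can be sketched in Python as follows: it uses the standard uniform-spacings construction of progressive type-II order statistics from [22] and then maps them through the inverse CDF of the group minimum, whose CDF is 1 − (1 − F(x))^k. The parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def pff_sample(mu, tau, k, R, rng):
    """Progressive first-failure censored sample from TN(mu, tau)."""
    m = len(R)
    W = rng.uniform(size=m)
    # progressive type-II censored uniform order statistics
    V = np.empty(m)
    for i in range(1, m + 1):
        gamma_i = i + np.sum(R[m - i:])  # i + R_m + ... + R_{m-i+1}
        V[i - 1] = W[i - 1] ** (1.0 / gamma_i)
    U = np.array([1.0 - np.prod(V[m - i:]) for i in range(1, m + 1)])
    # invert the CDF of the group minimum: 1 - (1 - F(x))^k = U
    u = 1.0 - (1.0 - U) ** (1.0 / k)
    # inverse CDF of TN(mu, tau) on (0, inf)
    s = np.sqrt(tau)
    alpha = norm.cdf(-mu / s)
    return mu + s * norm.ppf(alpha + u * (1.0 - alpha))

rng = np.random.default_rng(5)
R = np.array([2] * 9 + [0] * 3)  # n = m + sum(R) = 30 groups
x = pff_sample(3.0, 1.0, k=2, R=R, rng=rng)
print(x)
```

By construction the U_i are increasing, so the returned sample is automatically an ordered PFF censored sample.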
Here, we take the true parameter values µ = 3 and τ = 1. For comparison purposes, we consider k = 1, 2, n = 30, 40 and m = 40%n, 70%n, 100%n. Meanwhile, the different censoring schemes (CS) designed for the later simulations are presented in Table 1. For convenience, the censoring schemes are abbreviated; for instance, (0*5) abbreviates (0, 0, 0, 0, 0). In each case, we repeat the simulation at least 2000 times, from which the MABs and MSEs of the point estimates and the MLs and CRs of the interval estimates are obtained. All simulations are performed in R. For maximum likelihood estimation, we use the optim command with method L-BFGS-B to derive the MLEs, and the corresponding results are tabulated in Table 2. For Bayesian estimation, we naturally take the true values as the prior means, but the hyper-parameters are intractable because of the complexity of the prior distribution. Therefore, we use a genetic algorithm via the mcga package in R to search for the optimal hyper-parameters, which turn out to be a = 4, b = 2, α = 5.5, β = 2.5. Two Bayes approaches with informative priors are then implemented to derive the estimates under the loss functions SE, LX and GE. We set the parameter s of LX to 0.5 and 1, and the parameter h of GE to −0.5 and 0.5. These simulation results are listed in Tables 3-6. At the same time, the proposed intervals are established at the 95% confidence/credible level, and Tables 7 and 8 summarize the results. Here, ACI denotes the asymptotic confidence interval based on the MLEs, and Log-ACI denotes the asymptotic confidence interval based on the log-transformed MLEs. In the following simulations, the bootstrap confidence intervals are obtained from K = 1000 resamples, while the HPD credible intervals are obtained from M = 1000 samples.
From Table 2, we can observe some properties of the maximum likelihood estimates: (1) The maximum likelihood estimate of µ performs much better than that of τ with respect to MABs and MSEs. (2) When the effective sample size m, the total number of groups n, or the ratio m/n increases, the MABs and MSEs decrease significantly for all estimators, exactly as expected. With increasing group size k, the MABs and MSEs generally decrease for the shape parameter µ, while the corresponding indicators generally increase for the scale parameter τ.
From Tables 7 and 8, we can observe the following about the interval estimates: (1) In general, the ML of the HPD credible interval is the most satisfactory among all the intervals, while the boot-t confidence interval possesses the widest ML.
With the increase of m/n, the ML tends to narrow, and this pattern holds for both parameters. (2) The boot-p confidence interval is unstable, as its CR decreases significantly when the group size k increases, whereas the boot-t interval is basically unaffected by k and possesses some robustness with respect to µ.
(3) ACI competes well with Log-ACI for µ, and they are similar in terms of ML and CR. However, the CR of Log-ACI is much more accurate than that of ACI for τ. Therefore, Log-ACI seems to be the better choice.

Real Data Analysis
In this section, we analyze a real dataset describing the tensile strength of 50 mm carbon fibers. First, we test whether the distribution TN(µ, τ) fits this dataset well. In particular, Ref. [24] compared the fitting results of many famous reliability distributions, such as the Rayleigh distribution and the Log-Logistic distribution, and concluded that the Log-Logistic distribution has the best fitting effect. Therefore, we compare the fitting effect of the truncated normal distribution with that of the Log-Logistic distribution, whose PDF is

g(x) = (b/a)(x/a)^{b−1}[1 + (x/a)^b]^{−2}, x > 0.

Various criteria are applied for testing the goodness of fit of the model, such as the negative log-likelihood −ln L, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Kolmogorov-Smirnov (K-S) statistic with its p-value. The corresponding definitions are

AIC = 2d − 2 ln L(θ̂), BIC = d ln n − 2 ln L(θ̂), K-S = sup_x |F_n(x) − F(x; θ̂)|,

where θ̂ is the vector of MLEs, d is the number of parameters in the fitted model, ln L(θ̂) is the log-likelihood evaluated at the MLEs, n is the number of observed values, and F_n is the empirical distribution function. Table 10 shows the MLEs of the parameters for each distribution, along with −ln L, AIC, BIC, K-S, and the p-value for the two distributions. Conspicuously, since the truncated normal distribution has the lower statistics and the higher p-value, it fits the complete sample well, and we can use this dataset for analysis. We randomly divide the given data into 50 groups, each containing two independent units, so that the first-failure censored data are obtained, as shown in Table 11. To obtain the PFF censored samples, we set m = 25 and consider three different censoring schemes, namely c1 = (25, 0*24), c2 = (1*25), c3 = (0*24, 25). Table 12 presents the PFF censored samples under left censoring, middle censoring, and right censoring.
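The goodness-of-fit criteria are straightforward to compute. The Python sketch below fits TN(µ, τ) to a synthetic sample (not the carbon-fiber data; all settings illustrative) and evaluates −ln L, AIC, BIC, and the K-S statistic with its p-value.

```python
import numpy as np
from scipy.stats import norm, kstest
from scipy.optimize import minimize

rng = np.random.default_rng(9)
data = rng.normal(2.5, 0.6, 100)
data = data[data > 0]      # synthetic left-truncated sample
n, d = len(data), 2        # sample size, number of fitted parameters

def nll(theta):
    """Negative log-likelihood of a complete TN(mu, tau) sample."""
    mu, tau = theta
    if tau <= 0:
        return np.inf
    s = np.sqrt(tau)
    return -(np.sum(norm.logpdf((data - mu) / s)) - n * np.log(s)
             - n * norm.logcdf(mu / s))

res = minimize(nll, x0=[2.0, 1.0], method="Nelder-Mead")
mu_hat, tau_hat = res.x
neg_lnL = res.fun
aic = 2 * d + 2 * neg_lnL            # AIC = 2d - 2 ln L
bic = d * np.log(n) + 2 * neg_lnL    # BIC = d ln n - 2 ln L

def tn_cdf(x):
    """Fitted TN(mu_hat, tau_hat) CDF for the K-S test."""
    s = np.sqrt(tau_hat)
    a = norm.cdf(-mu_hat / s)
    return (norm.cdf((x - mu_hat) / s) - a) / (1.0 - a)

ks_stat, p_value = kstest(data, tn_cdf)
print(neg_lnL, aic, bic, ks_stat, p_value)
```

The same skeleton applies to the Log-Logistic fit: only `nll` and the CDF passed to `kstest` change.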
In Tables 13 and 14, the point estimates of the parameters µ and τ are shown, respectively. No informative prior is available for the Bayesian estimation, so we apply a non-informative prior with all four hyper-parameters set close to zero, and the three loss functions discussed before are taken into account. For the two asymmetric loss functions, we keep the parameters of the previous simulations, namely s = 0.5 and s = 1, h = −0.5 and h = 0.5. The tables show some differences between the estimates obtained under different censoring schemes and by different methods. The parameter estimates based on censoring scheme c1 are closest to the MLEs under the full sample, while the estimates obtained by importance sampling tend to be smaller than those obtained by the Lindley approximation. At the same time, we construct 95% ACIs, Log-ACIs, bootstrap intervals, and HPD intervals; Tables 15 and 16 display the corresponding results. In Figure 4, we plot four estimated distribution function curves, whose parameters are the MLEs under the complete sample and under the censored samples with schemes c1, c2 and c3. It is of considerable interest that the estimated curve based on censoring scheme c1 = (25, 0*24) is the closest to the curve based on the full data, which indicates that the left-censored data are superior. In the middle part of the graph, the estimated curve based on scheme c2 = (1*25) underestimates the distribution function, while that based on c3 = (0*24, 25) overestimates it.

Conclusions
Throughout the full article, we consider the classical and Bayesian inference for a progressive first-failure censored left-truncated normal distribution. MLEs are derived using an optimization technique and the Bayesian estimation is taken into account under loss functions SE, LX, and GE. At the same time, confidence and credible intervals for the parameters are constructed and compared with each other. In the simulation section, MAB and MSE are taken into account for the point estimation while the ML and CR are considered for the interval estimation.
When it comes to point estimation, the performance of the MLEs is satisfactory, whereas Bayesian estimation with a proper informative prior is superior to the MLEs in all cases. According to the simulation study presented in this paper, the Bayesian estimates with a proper prior under the loss function LX are the best among all estimates, and the Lindley approximation method is highly recommended. Moreover, in terms of interval estimation, ACIs based on log-transformed MLEs have a more accurate coverage rate than ACIs based on MLEs, and HPD credible intervals consistently have the shortest length among the proposed intervals.
The truncated normal distribution is versatile, as it combines the flexibility of truncation with the superior properties of the normal distribution. The research object of this article is a progressive first-failure censored left-truncated normal distribution with a known truncation point. In some cases, the position of the truncation point itself is of interest, and estimating an unknown truncation point then becomes inevitable. Furthermore, this censoring plan combined with binomial removals and competing risks can be explored. In brief, the truncated normal distribution still offers great potential for further research.