A Penalized Likelihood Approach to Parameter Estimation with Integral Reliability Constraints

Stress-strength reliability problems arise frequently in applied statistics and related fields. Often they involve two independent, and possibly small, samples of measurements on strength and breakdown pressures (stress). The researcher's goal is to use the measurements to draw inferences on reliability, the probability that stress will exceed strength. This paper addresses the case where reliability is expressed as an integral with no closed-form solution and where the number of observed values of stress and strength is small. We find that the Lagrange approach to constrained likelihood estimation, which is necessary for inference, often performs poorly. We introduce a penalized likelihood method that, in our experience, works well in every case considered. We use third-order likelihood methods to partially offset the issue of small samples. The proposed method is applied to draw inferences on reliability in stress-strength problems with independent exponentiated exponential distributions. Simulation studies are carried out to assess the accuracy of the proposed method and to compare it with some standard asymptotic methods.


Introduction
We consider a stress-strength reliability problem, R = P(Y < X), where X and Y are independently distributed as exponentiated exponential random variables. In this case, the parameter of interest, R, is an integral with no closed-form solution. Likelihood-based inference for this problem involves, in part, numerical constrained optimization. The likelihood ratio test, for example, uses constrained maximum likelihood parameter estimates, where the log-likelihood function is maximized subject to the constraint that the parameter of interest takes on the null-hypothesized value.
A standard approach to solving (equality) constrained problems is to apply the Lagrange method. Detailed discussions can be found in many reference works in optimization; see, for example, [1]. Using the Lagrange technique, this problem requires that, in each iteration, one parameter be used to guarantee that the integral equality constraint is satisfied up to a given level of numerical accuracy. Satisfying the constraint involves simultaneous numerical integration and the numerical solution of a nonlinear equation for the correct parameter value. Moreover, all of this needs to be repeated for each value of the remaining parameters.
We found that the Lagrange approach led to problems of numerical accuracy and slow convergence. We also discovered that the integral function is homogeneous of degree 0 in two of the parameters of the model. The integral is therefore insensitive to equi-proportionate changes in those two parameters, which complicated optimization.
Penalty function methods are also often used to solve equality- and inequality-constrained optimization problems. See Byrne [2] and the extensive list of references therein. The basic idea is to solve a constrained optimization problem by solving a sequence of unconstrained problems in which constraint violation is penalized in a successively harsher manner. The sequence of unconstrained optima converges to the constrained optimum. Smith et al. (2014) [3] demonstrate the accuracy of the penalty method in a constrained optimization problem. When implementing this approach for the problem considered in this paper, we found it numerically stable and rapid, in the sense that parameter estimates from successively more harshly penalized models quickly converged to optimal values and became effectively insensitive to further penalization.
The remainder of the paper is organized as follows. Section 2 begins with a brief description of our stress-strength reliability problem. We then examine some properties of the unconstrained likelihood function. This is necessary because the likelihood function is not concave and generally cannot be optimized by an algorithm that uses the exact Hessian matrix. We then introduce the integral constraint and some of its properties, together with a numerical example illustrating how the Lagrange approach failed. The section concludes with a discussion of some properties and an implementation of the penalty function approach. Section 3 briefly reviews the likelihood-based third-order method for inference concerning a scalar parameter of interest when samples are small. Section 4 applies the approaches developed in Sections 2 and 3 to study inference for the stress-strength reliability problem with independent exponentiated exponential distributions. Results from simulation studies are presented to illustrate the accuracy of our suggested approach. Some concluding remarks are given in Section 5. Technical details related to some issues raised in Section 2 are presented in the Appendix.

Problem
We consider statistical inference for the reliability, R = P(Y < X), where X and Y are independently distributed with exponentiated exponential distributions. The estimation of R is important in the statistical literature and has been widely studied in other areas, such as radiology, mechanical engineering and materials science. In the context of the stress-strength model, reliability refers to the probability that the unit's strength Y is less than the stress X. Following the notation in [4], the cumulative distribution function of the two-parameter exponentiated exponential distribution, EE(α, β), is

    F(x; α, β) = (1 − e^{−βx})^{α},  x > 0,    (1)

where α is the shape parameter and β is the scale parameter. The EE(α, β) distribution is also known as the generalized exponential distribution. It is equivalent to the exponentiated Weibull (κ = 1, α, σ) distribution introduced in [5], where κ is the first shape parameter, α is the second shape parameter and σ is the scale parameter. If α = 1, the distribution reduces to the one-parameter exponential(β) model. The EE(α, β) distribution is recognized as a useful model for lifetime data or skewed data. Its hazard function is monotone increasing when α > 1, decreasing when α < 1 and constant when α = 1. It also has a unimodal and right-skewed density function. For a fixed scale parameter value, the skewness gradually decreases as the shape parameter increases. Kundu and Gupta [6] and Raqab et al.
[7] considered maximum likelihood estimation of R when X and Y are independent EE(α1, β1) and EE(α2, β2) distributed random variables and β1 and β2 are assumed to be equal. Then R = α1/(α1 + α2), and likelihood-based inference for R can be obtained easily. However, without the assumption that β1 = β2,

    R = P(Y < X) = ∫₀^∞ (1 − e^{−β2 x})^{α2} α1 β1 e^{−β1 x} (1 − e^{−β1 x})^{α1 − 1} dx,    (2)

and this integral does not have a known closed form. To obtain likelihood-based inference for R in this case, it is necessary to solve an optimization problem with an integral constraint. We found that standard macros, functions and subroutines in popular software packages such as R or Matlab experienced problems in converging to constrained maximum likelihood estimates in a Lagrange setting. As noted in the Introduction, a penalized likelihood method is proposed to provide a solution for this constrained maximization problem.
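Because (2) has no closed form, R must be evaluated numerically at every parameter value an optimizer visits. The sketch below is our own illustration, not the authors' code: it evaluates R by Gauss-Legendre quadrature after the substitution t = F_X(x), which maps the integral onto (0, 1); the parameter values are arbitrary.

```python
import numpy as np

def reliability(a1, b1, a2, b2, n_nodes=400):
    """R = P(Y < X) for independent X ~ EE(a1, b1), Y ~ EE(a2, b2).

    Substituting t = F_X(x) gives R = integral over (0, 1) of
    (1 - exp(-b2 * x(t)))**a2 dt, with x(t) = -log(1 - t**(1/a1)) / b1.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    t = 0.5 * (nodes + 1.0)   # map [-1, 1] onto [0, 1]
    w = 0.5 * weights
    x = -np.log(1.0 - t ** (1.0 / a1)) / b1
    return float(np.sum(w * (1.0 - np.exp(-b2 * x)) ** a2))

# Sanity check: when b1 == b2 the closed form R = a1 / (a1 + a2) applies.
print(reliability(2.0, 3.0, 5.0, 3.0))  # close to 2/7
```

The substitution also removes the infinite upper limit, so a fixed-node quadrature rule suffices inside an optimization loop.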
Once the constrained maximum likelihood estimates have been obtained, likelihood methods can be used to draw inferences about R. In this paper, the third-order method discussed in Fraser and Reid [8] is applied. This method is known to be very accurate even when the sample sizes are small. See, for example, Chang and Wong [9] and She et al. [10], where the accuracy of third-order methods has been assessed.

Unconstrained Likelihood and Its Properties
Let x = (x_1, . . ., x_n) and y = (y_1, . . ., y_m) be independent random samples from EE(α1, β1) and EE(α2, β2) respectively. Then the log-likelihood function is

    l(θ; x, y) = n log α1 + n log β1 + (α1 − 1) Σ_{i=1}^{n} log(1 − e^{−β1 x_i}) − β1 Σ_{i=1}^{n} x_i
                 + m log α2 + m log β2 + (α2 − 1) Σ_{j=1}^{m} log(1 − e^{−β2 y_j}) − β2 Σ_{j=1}^{m} y_j,    (3)

where θ = (α1, α2, β1, β2). The log-likelihood function is infinitely differentiable and has some interesting properties. First, it is not concave. Nor is it quasiconcave, pseudoconcave, or a member of any of the other classes of generalized concave functions that appear regularly in the optimization literature. However, the function is strictly concave in each of its parameters individually. For fixed values of any three parameters, the function reaches a unique maximum with respect to the fourth. The determinant of the Hessian matrix changes sign over the domain of the function. But the gradient of the log-likelihood function vanishes at a unique vector of parameter values, and at that point, and in a neighbourhood around it, the Hessian matrix is negative definite. All of these results are derived in the Appendix. So, when the function is restricted to an open region about the global optimum, the function is concave. These results have important implications for how to maximize the log-likelihood for any given sample. In particular, the parameters typically cannot all be estimated simultaneously using an algorithm that relies on exact Hessian information. Quasi-Newton (sometimes called variable metric) techniques work by building an increasingly accurate quadratic approximation to the log-likelihood surface. These will work in this setting as long as exact Hessians are not used in building the surface approximation and updating the parameters. Hessian matrices from the exponentiated exponential model usually will not be negative definite and will lead to inappropriate updating. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm can work because some versions of it start with negative definite, symmetric "approximate" Hessians and update them until they converge to the true Hessian at the optimum. Given the independence of X and Y, it is also possible to concentrate the log-likelihood using the derivative information on α1 and α2 and then apply simple line searches to find the β parameters. The Appendix shows why this latter, perhaps more cumbersome, approach works, but only in the unconstrained case.
In summary, obtaining the unconstrained parameter estimates is possible if done carefully. We also found it often helpful to restrict large changes in the parameters during estimation. We recommend that box constraints be placed on the parameters: preset upper and lower bounds require that the parameters be chosen from a four-dimensional hyper-cube. A version of BFGS that implements constraints of this type is available in the statistical package R.
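A minimal sketch of the unconstrained fit with box constraints follows; it is our Python illustration (the paper uses R's optim), assuming scipy is available, and the simulated data and bounds are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def ee_loglik(p, x, y):
    """Log-likelihood (3) for independent EE(a1, b1) and EE(a2, b2) samples."""
    a1, b1, a2, b2 = p
    return (len(x) * np.log(a1 * b1)
            + (a1 - 1) * np.sum(np.log1p(-np.exp(-b1 * x))) - b1 * np.sum(x)
            + len(y) * np.log(a2 * b2)
            + (a2 - 1) * np.sum(np.log1p(-np.exp(-b2 * y))) - b2 * np.sum(y))

rng = np.random.default_rng(1)
def ee_sample(n, a, b):
    # Inverse-CDF sampling: X = -log(1 - U**(1/a)) / b.
    return -np.log1p(-rng.uniform(size=n) ** (1.0 / a)) / b

x = ee_sample(200, 2.0, 3.0)
y = ee_sample(200, 5.0, 1.6)

# Maximize the log-likelihood with L-BFGS-B under box constraints:
# quasi-Newton updates, no exact Hessian.
res = minimize(lambda p: -ee_loglik(p, x, y),
               x0=np.ones(4),
               method="L-BFGS-B",
               bounds=[(1e-3, 50.0)] * 4)
print(res.x)
```

The bounds keep the search inside a hyper-cube, as recommended above, and L-BFGS-B never forms the exact (indefinite) Hessian.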

The Integral Reliability Constraint
In this paper, the parameter of interest is the stress-strength reliability, R = P(Y < X), where X and Y are independently distributed as EE(α1, β1) and EE(α2, β2) respectively. The constraint function, R, is given in (2) and has no known closed-form solution. Straightforward differentiation shows that R is an increasing function of α1 and β2 and a decreasing function of α2 and β1. Introducing the change of variables z = β1 x, the integral constraint can be expressed equivalently as

    R = ∫₀^∞ (1 − e^{−(β2/β1) z})^{α2} α1 e^{−z} (1 − e^{−z})^{α1 − 1} dz.

This establishes that R is homogeneous of degree 0 in (β1, β2). The contours of R are constant along any line in parameter space satisfying β2 = cβ1. This can complicate the numerical optimization process. It also provides a good reason to introduce the box constraints on the parameters discussed in Section 2.2.
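The degree-0 homogeneity can be verified numerically: scaling both scale parameters by the same constant leaves R unchanged. A quick check (our illustration, arbitrary parameter values):

```python
import numpy as np

def reliability(a1, b1, a2, b2, n_nodes=400):
    """R = P(Y < X) by Gauss-Legendre quadrature over t = F_X(x) in (0, 1)."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    t = 0.5 * (nodes + 1.0)
    w = 0.5 * weights
    x = -np.log(1.0 - t ** (1.0 / a1)) / b1
    return float(np.sum(w * (1.0 - np.exp(-b2 * x)) ** a2))

r_base = reliability(2.0, 3.0, 5.0, 1.6)
r_scaled = reliability(2.0, 30.0, 5.0, 16.0)  # (beta1, beta2) both scaled by 10
print(r_base, r_scaled)  # equal: R depends on (beta1, beta2) only through beta2/beta1
```

This is exactly the flat direction β2 = cβ1 along which an unconstrained search can drift, motivating the box constraints.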

Constrained Optimization: Lagrange
We need to find the constrained maximum likelihood parameter estimates θ̂_ψ = (α̂1, α̂2, β̂1, β̂2) for a given ψ0. This requires that we maximize the log-likelihood function in (3) subject to the constraint R = ψ(θ) = ψ0 in (2). Using the Lagrange approach, for given values of ψ0 and three of the likelihood parameters, the constraint R = ψ(θ) = ψ0 can be numerically integrated and the fourth parameter simultaneously chosen so that the integral constraint holds. The value of the fourth parameter and ψ0 are held fixed while the log-likelihood function is maximized with respect to the three "free" likelihood parameters. This process is iterated until convergence.
Much can go wrong in the above iterative process: the likelihood function is not concave, the equality constraint is highly nonlinear in the parameters, the constraint integral must be evaluated numerically and then solved numerically, and the integral R is homogeneous of degree 0 in two of the parameters.
We discovered in simulation that this iterative Lagrange method sometimes gave relatively satisfactory results for some values of ψ0, but it often went astray. Typically, the estimation process took a long time to converge. Performance degraded quickly when R was constrained to be closer to the boundary values of 0 or 1.
Table 1 records one of these situations for a data set simulated from EE(2, 3) and EE(5, 1.6015) distributions, each with a sample size of 10, and with ψ0 set equal to 0.1. Initial attempts to program the search algorithm in Matlab resulted in floating point overflow and division-by-zero errors. We next tried Matlab's built-in macro, "fminsearch", which uses a simplex search method. This macro provided the correct unconstrained parameter estimates but failed when constraints were introduced. The unconstrained parameter estimates are recorded in Table 2. Not surprisingly, these unconstrained parameter estimates do not solve the constrained optimization problem: they yield a constraint value of 0.0141 as opposed to 0.1. Correct results were found using a penalized likelihood method (appearing as Penalty in the table), which is introduced below. We encountered similar problems with small samples and when sample sizes were unequal. We reprogrammed the Lagrange constrained optimization algorithm in the statistical package R. In addition to slow, inaccurate and sometimes failed convergence, we also encountered cases of singular matrices during the optimization process.

The Penalized Likelihood Approach
As noted in the previous section, we encountered a wide variety of serious numerical problems when we tried to implement the classical Lagrange approach to our integral equality-constrained optimization problem. We now present a penalized likelihood method, also discussed in Smith et al. (2014) [3], that reliably solves the constrained optimization problems addressed in this paper. We begin by defining the penalized likelihood function, PL, as

    PL(θ, ψ0; x, y) = l(θ; x, y) − k(R(θ) − ψ0)²,  k ≥ 0.

The function consists of the log-likelihood function l(θ; x, y) and a positive function, k(R(θ) − ψ0)², called the penalty function, that is subtracted from the likelihood function. Here we consider a penalty function that is the square of the divergence of R from the constraint value ψ0. The greater is k, the more the likelihood is penalized (decreased) for a given divergence of R from ψ0. Next, consider the unconstrained problem of maximizing PL(θ, ψ0; x, y) with respect to θ for a given value of k. Formally, we obtain an optimal vector θ*_k. When k = 0, we obtain the parameter vector that maximizes the unconstrained log-likelihood. As k increases and we successively solve the optimization problems, we obtain a sequence of optimal parameter vectors that converges to the solution of the constrained problem of choosing θ to maximize l(θ; x, y) subject to R(θ) = ψ0. In contrast to the Lagrange approach, we now have a sequence of unconstrained optimization problems instead of one constrained problem. This means the penalty approach may be theoretically more complicated. But, in practice, we found it faster and more reliable than the Lagrange approach for our problem. First, the penalty approach does not require that the equality R(θ) = ψ0 hold at any stage of the optimization process. Instead, divergence of R(θ) from ψ0 is (increasingly) discouraged as k increases. Many of the numerical problems we encountered in applying the Lagrange technique arose from trying to impose R(θ) = ψ0 exactly. Second, in practice, as k increases, we found that the estimated parameters quickly converged to optimal values, and it was not necessary to continue to increase k and solve additional optimization problems. Finally, the Lagrange and penalty approaches are formally dual, so the Lagrange multiplier can be recovered as the limiting value of the partial derivative of PL with respect to the parameter ψ0.
A recent analysis of the rigorous foundations of the penalty (and other) approaches to constrained optimization can be found in [2], and the discussion that follows draws upon Byrne's presentation. Very few technical conditions need to hold for the sequence of PL-maximizing vectors, θ*_k, to converge to the solution of the constrained optimization problem. First, the permissible parameter values must be drawn from a compact set. We satisfy this condition by imposing box constraints on the four likelihood parameters. Second, there is a straightforward restriction that l and R be continuous functions of the parameters. These conditions are satisfied. Finally, we require that the sequence of optimal vectors, θ*_k, correspond to the global maximum of the penalized likelihood for each k. Our experience is that, for each k, there is only ever one optimum, occurring where the gradient of PL vanishes. In all cases, the Hessian matrix at the optimum was negative definite.
When implementing the penalty approach, we found that for 0.1 < ψ0 < 0.9 there was little gain from increasing k beyond 10,000. For the extreme values of ψ0, k did not have to exceed 80,000. As well, short sequences were sufficient: it was not necessary to solve separate optimization problems for all possible values of k.
We obtained parameter estimates from unconstrained optimization of PL using the optim function in the statistical software package R. Within optim, we adopted the L-BFGS-B algorithm. L-BFGS-B was developed by Byrd et al. [11] and is a quasi-Newton variant of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. L-BFGS-B conveniently allows for simple upper and lower search bounds (box constraints) on the parameters. As noted earlier, L-BFGS-B does not use an exact Hessian. Instead, it updates an approximate Hessian that is guaranteed to be negative definite and symmetric, and this approximate Hessian converges to the true one. Our code is available upon request.
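The whole scheme can be sketched compactly. The following is our Python illustration of the penalty continuation (the paper's implementation is in R); the simulated data, starting values and k-sequence are hypothetical, and scipy's L-BFGS-B plays the role of optim's.

```python
import numpy as np
from scipy.optimize import minimize

def ee_loglik(p, x, y):
    """Log-likelihood (3) for independent EE samples."""
    a1, b1, a2, b2 = p
    return (len(x) * np.log(a1 * b1)
            + (a1 - 1) * np.sum(np.log1p(-np.exp(-b1 * x))) - b1 * np.sum(x)
            + len(y) * np.log(a2 * b2)
            + (a2 - 1) * np.sum(np.log1p(-np.exp(-b2 * y))) - b2 * np.sum(y))

def reliability(p, n_nodes=400):
    """R = P(Y < X) by quadrature over t = F_X(x) in (0, 1)."""
    a1, b1, a2, b2 = p
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    t = 0.5 * (nodes + 1.0)
    x = -np.log(1.0 - t ** (1.0 / a1)) / b1
    return float(np.sum(0.5 * weights * (1.0 - np.exp(-b2 * x)) ** a2))

rng = np.random.default_rng(7)
sample = lambda n, a, b: -np.log1p(-rng.uniform(size=n) ** (1.0 / a)) / b
x, y = sample(10, 2.0, 3.0), sample(10, 5.0, 1.6015)

psi0 = 0.1
theta = np.ones(4)
bounds = [(1e-3, 50.0)] * 4
for k in [10.0, 1e2, 1e3, 1e4, 1e5]:
    # Maximize PL = l(theta) - k * (R(theta) - psi0)^2 for increasing k,
    # warm-starting each solve from the previous optimum.
    res = minimize(lambda p: -(ee_loglik(p, x, y)
                               - k * (reliability(p) - psi0) ** 2),
                   theta, method="L-BFGS-B", bounds=bounds)
    theta = res.x

print(theta, reliability(theta))  # R(theta) close to psi0 at the final k
```

Warm-starting is what makes the sequence cheap: each successive problem starts near its solution, so only the first solve does substantial work.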

Likelihood-based Inference for Any Scalar Parameter of Interest
In this section, we provide a brief review of likelihood-based inference for a scalar parameter of interest. Let y = (y_1, . . ., y_n) be a random sample from a distribution with log-likelihood function l(θ) = l(θ; y), where θ is a vector parameter with dim(θ) = p. Also let ψ = ψ(θ) be a scalar parameter of interest. Denote by θ̂ the overall maximum likelihood estimate, which satisfies

    ∂l(θ)/∂θ |_{θ = θ̂} = 0.

Moreover, denote by θ̂_ψ the constrained maximum likelihood estimate of θ for a given ψ(θ) = ψ0. Both θ̂ and θ̂_ψ can be obtained from the penalized likelihood method discussed in Section 2. Moreover, with θ̂_ψ, the corresponding Lagrange multiplier, K, can be obtained. Define the tilted log-likelihood function as

    l̃(θ) = l(θ) + K(ψ(θ) − ψ0),

which has the property l̃(θ̂_ψ) = l(θ̂_ψ).
Two widely used likelihood-based methods for obtaining asymptotic confidence intervals for ψ are based on the maximum likelihood estimator of θ and on the signed log-likelihood ratio statistic. It is well known that (θ̂ − θ)ᵀ[var(θ̂)]⁻¹(θ̂ − θ) is asymptotically χ²-distributed with p degrees of freedom, and var(θ̂) can be estimated by the inverse of either the expected Fisher information matrix or the observed information matrix, j_θθ(θ̂), evaluated at θ̂. In practice, the latter is preferred because of its simplicity of calculation. By applying the delta method,

    (ψ(θ̂) − ψ) / [ψ_θ(θ̂)ᵀ j_θθ⁻¹(θ̂) ψ_θ(θ̂)]^{1/2}

is asymptotically distributed as N(0, 1), where ψ_θ(θ) = ∂ψ(θ)/∂θ. Thus a (1 − γ)100% confidence interval for ψ based on the maximum likelihood estimator is

    ψ(θ̂) ± z_{γ/2} [ψ_θ(θ̂)ᵀ j_θθ⁻¹(θ̂) ψ_θ(θ̂)]^{1/2},

where z_{γ/2} is the (1 − γ/2)100th percentile of the standard normal distribution. Alternatively, under the regularity conditions stated in [12], the signed log-likelihood ratio statistic

    r(ψ) = sgn(ψ̂ − ψ) [2{l(θ̂) − l(θ̂_ψ)}]^{1/2}

is asymptotically distributed as N(0, 1). Therefore, a (1 − γ)100% confidence interval for ψ based on the signed log-likelihood ratio statistic is

    {ψ : |r(ψ)| ≤ z_{γ/2}}.

It should be noted that both methods have rate of convergence only O(n^{−1/2}). While the maximum likelihood estimator based interval is often preferred because of its simplicity of calculation, the signed log-likelihood ratio method is invariant to reparametrization, whereas the maximum likelihood estimator based method is not. Doganaksoy and Schmee [13] showed that the signed log-likelihood ratio statistic based interval has better coverage properties than the maximum likelihood estimator based interval in the cases they considered.
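As a concrete check of the r(ψ) construction, the sketch below (our toy normal-mean example, not the paper's model) computes the signed log-likelihood ratio statistic from the unconstrained and constrained maximized log-likelihoods. In this toy case r reduces exactly to the Wald quantity √n(μ̂ − μ0), so the two interval methods coincide.

```python
import numpy as np

def signed_llr(psi_hat, psi0, l_hat, l_psi0):
    """r(psi0) = sgn(psi_hat - psi0) * sqrt(2 * (l(theta_hat) - l(theta_psi0)))."""
    return np.sign(psi_hat - psi0) * np.sqrt(2.0 * (l_hat - l_psi0))

# Toy model N(mu, 1): l(mu) = -0.5 * sum((y - mu)^2), mu_hat = mean(y).
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=25)
mu_hat = y.mean()
l = lambda mu: -0.5 * np.sum((y - mu) ** 2)

r = signed_llr(mu_hat, 0.8, l(mu_hat), l(0.8))
print(r, np.sqrt(25.0) * (mu_hat - 0.8))  # identical here
```

In curved problems such as the EE stress-strength model the two statistics differ, which is precisely why the coverage comparisons in Section 4 are interesting.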
In recent years, various adjustments have been proposed to improve the accuracy of the signed log-likelihood ratio statistic. Reid [14] and Severini [15] give detailed overviews of this development. In this paper, we consider the modified signed log-likelihood ratio statistic, introduced by [16, 17], which has the form

    r*(ψ) = r(ψ) + r(ψ)⁻¹ log{Q(ψ)/r(ψ)},

where r(ψ) is the signed log-likelihood ratio statistic and Q(ψ) is a statistic based on the log-likelihood function and an ancillary statistic. Barndorff-Nielsen [16, 17] showed that r*(ψ) is asymptotically distributed as N(0, 1) with rate of convergence O(n^{−3/2}). Hence a (1 − γ)100% confidence interval for ψ(θ) is given by

    {ψ : |r*(ψ)| ≤ z_{γ/2}}.    (11)

However, for a general model, an ancillary statistic might not exist, and even when it does, it might not be unique. Fraser and Reid [8] showed that Q(ψ) is a standardized maximum likelihood departure in the canonical parameter scale, and Fraser et al. [18] derived the general formula for obtaining the statistic Q(ψ). More specifically, Fraser et al. [18] obtained the locally defined canonical parameter

    ϕ(θ)ᵀ = (∂/∂V) l(θ; y) = Σ_{i=1}^{n} V_i ∂l(θ; y)/∂y_i,

where

    V = −[∂z(y; θ)/∂y]⁻¹ [∂z(y; θ)/∂θ] |_{θ̂}

is the ancillary direction and z(y; θ) = (z_1(y; θ), . . ., z_n(y; θ)) is a vector of pivotal quantities. Then the standardized maximum likelihood departure in the ϕ(θ) scale takes the form

    Q(ψ) = sgn(ψ̂ − ψ) |χ(θ̂) − χ(θ̂_ψ)| / [var(χ(θ̂) − χ(θ̂_ψ))]^{1/2},

where

    χ(θ) = ψ_ϕ(θ̂_ψ) ϕ(θ) / |ψ_ϕ(θ̂_ψ)|

is the recalibrated parameter of interest in the ϕ(θ) scale, and ψ_ϕ(θ) = ψ_θ(θ) ϕ_θ(θ)⁻¹ with ϕ_θ(θ) = ∂ϕ(θ)/∂θ. Since the exact var(χ(θ̂) − χ(θ̂_ψ)) is difficult to obtain, Fraser et al. [18] showed that it can be approximated by

    v̂ar(χ(θ̂) − χ(θ̂_ψ)) = ψ_ϕ(θ̂_ψ) j̃_{ϕϕ}⁻¹(θ̂_ψ) ψ_ϕᵀ(θ̂_ψ) |j̃_{ϕϕ}(θ̂_ψ)| / |j_{ϕϕ}(θ̂)|,

where

    |j_{ϕϕ}(θ̂)| = |j_θθ(θ̂)| |ϕ_θ(θ̂)|⁻²  and  |j̃_{ϕϕ}(θ̂_ψ)| = |j̃_θθ(θ̂_ψ)| |ϕ_θ(θ̂_ψ)|⁻²,

with j_θθ(θ) and j̃_θθ(θ) being the observed information matrices obtained from the log-likelihood function and the tilted log-likelihood function respectively. Hence, the confidence interval for ψ based on r*(ψ) can be obtained from Equation (11).
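The modified statistic itself is a one-line computation once r(ψ) and Q(ψ) are in hand; the numbers below are arbitrary illustrative values, not output from the paper's model.

```python
import numpy as np

def r_star(r, q):
    """Barndorff-Nielsen's modified signed log-likelihood ratio statistic."""
    return r + np.log(q / r) / r

# Arbitrary illustrative values of r(psi) and Q(psi).
print(r_star(2.0, 1.8))  # equals 2 + 0.5 * log(0.9)
```

In practice r*(ψ) is evaluated away from ψ = ψ̂, where r(ψ) = 0 would make the correction term singular; confidence limits are then found by solving |r*(ψ)| = z_{γ/2} numerically.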

Application to Stress-Strength Reliability with Independent EE Distributions
In Section 2 the EE distribution was introduced as was the reliability constraint.In Section 4.1, we consider the case where the scale parameters are different.The proposed penalized likelihood method is used to obtain the constrained maximum likelihood estimate.The likelihood-based third order method is then applied to obtain inference for R = P (Y < X).Section 4.2 presents results from some numerical studies.The special case where the scale parameters are assumed to be equal, and hence, R = P (Y < X) has a closed-form, is also examined.
The tilted log-likelihood function is

    l̃(θ) = l(θ; x, y) + K(ψ(θ) − ψ0),

where ψ(θ) = R is defined by (2). The constrained maximum likelihood estimate θ̂_ψ = (α̂1, α̂2, β̂1, β̂2) is obtained by the penalized likelihood method, with K the corresponding Lagrange multiplier. The constrained observed information matrix can then be written as

    j̃_θθ(θ̂_ψ) = −∂²l(θ; x, y)/∂θ∂θᵀ |_{θ̂_ψ} − K ∂²ψ(θ)/∂θ∂θᵀ |_{θ̂_ψ}.

Thus r(ψ) can be obtained accordingly.
For the problem we are considering, the natural choice of pivotal quantities is the probability integral transform of each observation,

    z_i = (1 − e^{−β1 x_i})^{α1}, i = 1, . . ., n,  and  z_{n+j} = (1 − e^{−β2 y_j})^{α2}, j = 1, . . ., m.

The ancillary direction V then follows from the general formula of Section 3, V = −[∂z(w; θ)/∂w]⁻¹ [∂z(w; θ)/∂θ] |_{θ̂}, and the locally defined canonical parameter is

    ϕ(θ)ᵀ = Σ_i V_i ∂l(θ; w)/∂w_i,

where w = (x_1, . . ., x_n, y_1, . . ., y_m) is the observed data. Hence we also have ϕ_θ(θ). Therefore, for this unequal scale parameter case, χ(θ), var(χ(θ̂) − χ(θ̂_ψ)), Q(ψ) and r*(ψ) can be obtained accordingly, and the (1 − γ)100% confidence interval can be obtained from the modified signed log-likelihood ratio statistic.
To compare the accuracy of the three methods discussed in this paper, Monte Carlo simulation studies were conducted. The cases of unequal and equal scale parameters are both examined. For each parameter configuration and for each sample size, we generate 10,000 random samples from the EE distributions by using the transformation

    X = −log(1 − U^{1/α}) / β,

where U is a uniform variate between 0 and 1.
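The transformation follows from solving F(X) = U in (1) for X. A quick empirical check of the sampler (our sketch, hypothetical parameter values):

```python
import numpy as np

rng = np.random.default_rng(42)

def ee_sample(n, alpha, beta):
    """X = -log(1 - U**(1/alpha)) / beta has the EE(alpha, beta) distribution."""
    u = rng.uniform(size=n)
    return -np.log1p(-u ** (1.0 / alpha)) / beta

x = ee_sample(100000, 2.0, 3.0)
# Empirical P(X <= 0.3) versus the CDF value (1 - exp(-0.9))**2.
p_emp = (x <= 0.3).mean()
p_theo = (1.0 - np.exp(-0.9)) ** 2
print(p_emp, p_theo)
```

Using log1p avoids loss of precision when U^{1/α} is close to 1, i.e., in the right tail.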
For each simulated setting, we report the proportion of intervals for which ψ falls below the lower confidence bound (lower error), the proportion for which ψ falls above the upper confidence bound (upper error), the proportion of intervals that contain ψ (central coverage), and the average bias, defined as the average absolute deviation of the two tail errors from their nominal level:

    Average Bias = (|lower error − 0.025| + |upper error − 0.025|)/2.

The nominal values for the lower and upper errors, the central coverage and the average bias are 0.025, 0.025, 0.95 and 0 respectively. These values reflect the desired accuracy and symmetry of the interval estimates of ψ.
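These summaries are straightforward to tabulate; a sketch follows (our helper, with the average-bias formula an assumed definition: the mean absolute deviation of the two tail errors from their nominal 0.025 level, consistent with a nominal value of 0).

```python
import numpy as np

def interval_summary(lower, upper, psi_true, nominal=0.025):
    """Tail errors, central coverage and average bias over simulated intervals.

    NOTE: the average-bias formula is our assumed definition (mean absolute
    deviation of the tail errors from the nominal level).
    """
    lower, upper = np.asarray(lower), np.asarray(upper)
    lower_error = np.mean(psi_true < lower)   # intervals lying entirely above psi
    upper_error = np.mean(psi_true > upper)   # intervals lying entirely below psi
    coverage = 1.0 - lower_error - upper_error
    avg_bias = 0.5 * (abs(lower_error - nominal) + abs(upper_error - nominal))
    return lower_error, upper_error, coverage, avg_bias

# Tiny hypothetical example: three intervals, true psi = 0.4.
le, ue, cov, ab = interval_summary([0.0, 0.0, 0.5], [1.0, 1.0, 0.6], 0.4)
print(le, ue, cov, ab)
```

In the actual study each of the 10,000 replicates contributes one interval per method, and the function is applied method by method.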
Tables 4-6 present simulation results for the unequal scale parameter case, i.e., α1 = 2, α2 = 5, β1 = 3, and R = 0.1(0.1)0.9 with (n, m) = (10, 10), (10, 50) and (50, 10). Note that we fixed R, and β2 is then determined uniquely by Equation (2). It is clear that the coverage probabilities for R from the MLE method are poor and the two-tail error probabilities are extremely asymmetric. Results from the signed log-likelihood method are not satisfactory, especially when the two sample sizes are small or unequal, and they also show some evidence of asymmetric two-tail error probabilities. The proposed method, however, gives almost exact coverage probabilities as well as symmetric two-tail error probabilities, even for small or uneven sample sizes.
Note that if we had used the built-in macros and subroutines from Matlab or R, our simulation study could not have been completed because of difficulties in obtaining the constrained maximum likelihood estimates. We did not encounter any such difficulties when the penalized likelihood method was used.
Tables 7-9 present simulation results for the equal scale parameter case, i.e., α1 = 4, β = 8, and R = 0.1(0.1)0.9 with (n, m) = (10, 10), (10, 50) and (50, 10). In estimation, we fixed R, and α2 is then determined uniquely by R = α1/(α1 + α2). Again, the proposed method outperformed the other two methods even when the sample sizes were small. In this equal scale parameter case, when sample sizes were small, or when R was close to the boundary values, we used the penalized likelihood method to obtain the constrained maximum likelihood estimates. Further simulations, not reported here, show the same pattern of results and are available upon request.

Conclusions
A penalized likelihood method is proposed to overcome numerical difficulties that arose in using the Lagrange method to solve an optimization problem with an integral constraint. The proposed method is used to obtain constrained maximum likelihood estimates for the stress-strength reliability with independent exponentiated exponential distributions. In turn, these estimates are used to obtain inference for the stress-strength reliability via a likelihood-based third-order asymptotic method. In our simulation studies, the penalized likelihood method did not encounter any numerical difficulties in obtaining the constrained maximum likelihood estimates. Moreover, the likelihood-based third-order asymptotic method gives very accurate results even when the sample sizes are small. Although the paper is restricted to independent exponentiated exponential distributions, the approach can easily be extended to other distributions.
Appendix
Because X and Y are independent, the log-likelihood in (3) separates into two functions of identical form, and the parameters of one function do not enter the other. For that reason, we limit our analysis to the generic log-likelihood function

    l(α, β; x) = n log α + n log β + (α − 1) Σ_{i=1}^{n} log(1 − e^{−βx_i}) − β Σ_{i=1}^{n} x_i.

The first and second partial derivatives are given, using subscript notation, as:

    l_α = n/α + Σ_i log(1 − e^{−βx_i})    (19)
    l_β = n/β + (α − 1) Σ_i x_i e^{−βx_i}/(1 − e^{−βx_i}) − Σ_i x_i    (20)
    l_αα = −n/α²    (21)
    l_ββ = −n/β² − (α − 1) Σ_i x_i² e^{−βx_i}/(1 − e^{−βx_i})²    (22)
    l_αβ = Σ_i x_i e^{−βx_i}/(1 − e^{−βx_i})    (23)

It is clear from (21) that l_αα < 0, as stated in Section 2. In order to establish that l_ββ < 0, it is convenient to rewrite Equation (22) as

    l_ββ = −n/β² + Σ_i x_i² e^{−βx_i}/(1 − e^{−βx_i})² − α Σ_i x_i² e^{−βx_i}/(1 − e^{−βx_i})².    (24)

The first and third terms in Equation (24) are negative but the second term is positive. However, it is possible to show that the sum of the first two terms is negative. Note first that n = (1 − e^{−βx_1})²/(1 − e^{−βx_1})² + · · · + (1 − e^{−βx_n})²/(1 − e^{−βx_n})². This allows us to rewrite the sum of the first two terms of Equation (24) as

    (1/β²) Σ_i [β²x_i² e^{−βx_i} − (1 − e^{−βx_i})²] / (1 − e^{−βx_i})².    (25)

A representative numerator term from this expression is β²x_i² e^{−βx_i} − (1 − e^{−βx_i})², which is a difference of squares and can be re-expressed as

    β²x_i² e^{−βx_i} − (1 − e^{−βx_i})² = (βx_i e^{−.5βx_i} − (1 − e^{−βx_i}))(βx_i e^{−.5βx_i} + (1 − e^{−βx_i})).    (26)

The second product term on the right-hand side is the sum of two positive terms and is therefore positive. The first product term is always negative when β > 0 and x_i > 0. This is because βx_i e^{−.5βx_i} − (1 − e^{−βx_i}) = e^{−.5βx_i}[βx_i − (e^{.5βx_i} − e^{−.5βx_i})] and, using a Taylor expansion, βx_i − (e^{.5βx_i} − e^{−.5βx_i}) < 0 for β > 0 and x_i > 0, so that βx_i e^{−.5βx_i} − (1 − e^{−βx_i}) < 0. This completes the demonstration that l_ββ < 0 whenever β > 0 and x_i > 0.
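The key inequality in the argument above, βx_i e^{−.5βx_i} < 1 − e^{−βx_i} for β, x_i > 0, can be spot-checked numerically in the single variable t = βx_i (our check; expm1 is used for accuracy near t = 0):

```python
import numpy as np

# g(t) = t * exp(-t/2) - (1 - exp(-t)) should be strictly negative for t > 0;
# near 0 it behaves like -t**3 / 24.
t = np.linspace(1e-3, 50.0, 200001)
g = t * np.exp(-0.5 * t) + np.expm1(-t)   # expm1(-t) = exp(-t) - 1
print(g.max())  # strictly negative over the grid
```

This mirrors the Taylor-expansion argument: e^{t/2} − e^{−t/2} = t + t³/24 + · · · > t for t > 0.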
The log-likelihood function, l(α, β; x), is not a concave function of (α, β) over its domain. However, it is possible to show that when n > 2 there is a unique maximum of l(α, β; x), occurring where l_α = 0 and l_β = 0. We begin by showing that there is a unique solution to l_α = 0 and l_β = 0.
Equation (19) can be solved for α as α = −n / Σ_i log(1 − e^{−βx_i}). Substituting this expression into l_β = 0 gives

    n/β − [n / Σ_j log(1 − e^{−βx_j})] Σ_i x_i e^{−βx_i}/(1 − e^{−βx_i}) − Σ_i x_i e^{−βx_i}/(1 − e^{−βx_i}) − Σ_i x_i = 0.    (27)

The first three terms on the left-hand side define a function of β, call it lhs(β), with the property that lhs(β) is monotone decreasing with lim_{β→0} lhs(β) = +∞ and lim_{β→∞} lhs(β) = 0. The fact that lim_{β→∞} lhs(β) = 0 is clear by inspection. As well, it is clear that the second term on the left-hand side approaches +∞ as β → 0. What is less obvious is that the sum of the first and third terms limits to .5 Σ_i x_i as β → 0. However, this can be established by writing the sum of the first and third terms as

    Σ_i [(1 − e^{−βx_i}) − βx_i e^{−βx_i}] / [β(1 − e^{−βx_i})].

The result follows when the exponential terms are expressed as series and the limit is taken as β → 0. Alternatively, l'Hopital's rule can be applied to each term of the sum. Given that Σ_i x_i > 0, because X is a positive random variable, there is a unique solution to Equation (27).
The final task in this Appendix is to show that the unique point (α, β)*, where the gradient vanishes, is the global maximum of the likelihood function. This can be accomplished by showing that the determinant of the Hessian matrix of l(α, β; x) is strictly positive at (α, β)*. Using l_α = 0, so that n/α = −Σ_i log(1 − e^{−βx_i}), the determinant of the Hessian matrix, call it D, can be expressed equivalently as

    D = −(A_1 + A_2 B)/B,    (28)

where

    A_1 = −n/β² + Σ_i x_i² e^{−βx_i}/(1 − e^{−βx_i})²,    (29)

    A_2 = Σ_j log(1 − e^{−βx_j}) Σ_i x_i² e^{−βx_i}/(1 − e^{−βx_i})² + (Σ_i x_i e^{−βx_i}/(1 − e^{−βx_i}))²,    (30)

and with B = n / (Σ_i log(1 − e^{−βx_i}))² > 0. To establish that D > 0, it is sufficient to show that A_1 < 0 and A_2 B < 0. Note that A_1 is exactly the sum of the first two terms in Equation (24), which was proved negative in the discussion following Equation (24). It remains only to prove that

    A_2 = Σ_j log(1 − e^{−βx_j}) Σ_i x_i² e^{−βx_i}/(1 − e^{−βx_i})² + (Σ_i x_i e^{−βx_i}/(1 − e^{−βx_i}))² < 0.    (31)

Table 3. Interval Estimates of ψ for Example.