Bias Reduction for the Marshall-Olkin Extended Family of Distributions with Application to an Airplane’s Air Conditioning System and Precipitation Data

: The Marshall-Olkin extended family of distributions is an alternative for modeling lifetimes, and considers more or less asymmetry than its parent model, achieved by incorporating just one extra parameter. We investigate the bias of maximum likelihood estimators and use it to develop an estimator with less bias than traditional estimators, by a modiﬁcation of the score function. Unlike other proposals, in this paper, we consider a bias reduction methodology that can be applied to any member of the family and not necessarily to any particular distribution. We conduct a Monte Carlo simulation in order to study the performance of the corrected estimators in ﬁnite samples. This simulation shows that the maximum likelihood estimator is quite biased and the proposed estimator is much less biased; in small sample sizes, the bias is reduced by around 50 percent. Two applications, related to the air conditioning system of an airplane and precipitations, are presented to illustrate the results. In those applications, the bias reduction for the shape parameters is close to 25% and the bias reduction also reduces, among others things, the width of the 95% conﬁdence intervals for quantiles lower than 0.594.


Introduction
Marshall and Olkin [1] proposed a way of introducing a parameter in a family of distributions to compete with such commonly used distributions as the Weibull, gamma and lognormal distributions. Let G and g be the cumulative distribution function (cdf) and the probability density function (pdf) respectively, indexed by the vector of parameter λ ∈ Λ. The pdf of the Marshall-Olkin extended model is given by where X is the sample space defined for g and G, G = 1 − G and α = 1 − α. Henceforth, we use the notation X ∼ MOE g (α, λ) to refer to a random variable with density provided in (1). Marshall and Olkin [2] called α the "tilt" parameter because the hazard rate of MOE g (α, λ) is shifted below (α ≥ 1) or above (0 < α ≤ 1) the baseline hazard rate related to g. Castellares and Lemonte [3] also showed an interpretation of MOE g (α, λ) based on the distribution of order statistics. Specifically, let {X n , n ∈ N} be a sequence of independent, identically distributed (iid) random variables in which each X n has baseline cumulative function G. Also, let N 1 ∼ Geo(α), for 0 < α < 1 and N 2 ∼ Geo α −1 , for α > 1, with Geo(p) denoting the geometric distribution with mean 1/p. We have that Y N 1 = min(X 1 , . . . , X N 1 ) ∼ MOE g (α, λ), if 0 < α ≤ 1, and Z N 2 = max(X 1 , . . . , X N 2 ) ∼ MOE g (α, λ), if α > 1.
From a statistical point of view, interpretations for some of those extensions are as follows: for α ≥ 1, Marshall and Olkin [1] interpreted the Marshall-Olkin extended exponential model as the conditional distribution, given the variable in the positive axis, of a random variable with logistic survival function. Ristic et al. [6] interpreted the Marshall-Olkin extended gamma model as a minification process, useful in a time series context. Ghitany et al. [7] interpreted the Marshall-Olkin extended Lomax model as a compounding process with exponential mixing model. A similar interpretation was presented in Ghitany and Kotz [8] for the Marshall-Olkin extended linear failure-rate. Gómez-Déniz [11] discretize the Marshall-Olkin extended exponential model in Marshall and Olkin [1] to obtain a generalized version of the geometric distribution. This model also can be seen as an infinite mixture of geometric distributions.
Applications to real data sets for some of those extensions include remission times in bladder cancer patients [7] and cancers in general [24], reliability analyses of electronic devices [15] and mechanical components [20], stress-rupture life of kevlar 49/epoxy strands [25], strengths of glass fibers [25], solid epoxy electrical-insulation in an accelerated voltage life test [25], flood peaks in a river [21], lifetimes of front disk brake pads [21], annual salaries of baseball players in Major League Baseball [26], stream flow amounts [20], etc.
We note that in many simulation studies presented in those works, the bias for the estimator of α is greater than the bias for the estimator of λ (the vector for the baseline model). If the estimators are biased, this implies that they are inconsistent; consequently, any function of these estimators will be inconsistent. In particular, any measurement of interest, such as mean, median, quantile, etc. will be inconsistent. For illustrative purposes, we considered the Marshall-Olkin extended exponential (MOEE) with pdf given by and the the Marshall-Olkin extended Rayleigh (MOER), a submodel in the class proposed by Alshangiti et al. [15], for which pdf is Note that in both models, the dimension of λ is 1. Moreover, λ = θ. Figure 1 shows the density plot for different choices of α and θ in MOEE and MOER models, respectively.  Figures 2 and 3 show the estimated bias based on 5000 replicates for the MOEE and MOER models and different sample sizes for the maximum likelihood estimators (MLE). These figures illustrate that the bias for the estimator of α is considerably greater (in relative terms) than the bias of θ in the same models. For this reason, we propose to study a methodology to reduce the bias for the MLE of α in the general class of model defined in (1), which can be applied to any member of the class, i.e., with any considered g and G in the baseline model.
In a frequentist context, in general, the maximum likelihood method is used to estimate the parameters of the model. The inferences depend strongly on asymptotic properties of the MLE, for instance, letα be the MLE of α. Among these properties, we have that the MLE is approximately non-biased, i.e., E(α − α) = 0 and follows a normal distribution when the sample size is large enough. However, likelihood inferences based on asymptotic approximation, when samples are of small or moderate size, may not be reliable.    The study of the behavior of the bias of MLE in small samples is an important area of research. There are several works in the literature related with bias correction; they can be divided into two main approaches: the corrective and the preventive, proposed by Cox and Snell [27] and Firth [28], respectively. In the first method, the bias is corrected after the MLE calculation; in the second, with a modification in the score function, the procedure already computes a less biased estimator than the regular MLE. The two methods are comparable, however Firth's procedure has gained more popularity in recent years. Assuming g free of parameters in MOE g (α, λ), the aim of our paper is to obtain, through the preventive method, a bias-corrected maximum likelihood estimator (BCE) for α that is less biased and shows the useful side-effect for MLE for the components of the vector λ.
The work is organized as follows. In Section 2 we develop the procedure to estimate a bias-corrected parameter in MOE g (α, λ). Monte Carlo simulation experiments are presented in Section 3 to discuss the importance of the expression obtained in the previous section, which produces much less biased estimates than the traditional procedure. In Section 4, we consider two empirical examples. We conclude in Section 5 with some final remarks. In the Appendix A we present details of the quantities needed in our work.

Bias Correction of MLE
The first known method of bias correction was proposed by Cox and Snell [27]. Specifically, for the MOE g (α; λ) model this method requires algebraic manipulations for each specific g adopted. As our main goal is to apply a methodology to reduce the bias in the estimator of α for any considered g, we discarded this approach. The same logic is applied to discard the preventive approach of [28], at least in its original form. We added an extra supposition.

and κ α,αα = E S(α) ∂S(α) ∂α
, where E is the expectation operator. The solution of the modified likelihood equation S M (α) = 0 produces the modified MLE, say α M . Firth shows that the order of the bias of α M is reduced from O(n −1 ) to O(n −2 ) when compared with the ordinary MLE. Moreover, the asymptotic distribution of α M coincides with that of α, i.e., for more details on bias reduction, see Cordeiro and Cribari-Neto [30]. When λ is known, the likelihood function for the MOE g (α) distribution is It can be verified that . Solving S M (α) = 0, we obtain the BCE. For the case where λ is unknown, the log-likelihood function for ψ = (α, λ) is given by Our proposal is to consider a bias correction methodology only for α and not for λ. In the introductory section, we justified this fact in some models, such as MOEE and MOER, because the bias for α is considerable in small and moderate sample sizes and lower for the components of λ, as presented in Figures 2 and 3. The second reason is because with more than one parameter, the form of the Fisher Information matrix, among other cumulants, is not closed. Moreover, these terms required for the application of Firth's methodology need to be computed for each member of the MOE g (α; λ) family. Therefore, an alternative successfully applied in other models such as [29,31], is (i) first compute the constrained MLE λ(α) for a fixed α, and then (ii) apply Firth's method to the profile score function of α, which produces the modified estimator obtained from the non-linear equation In short, the estimation procedure can be described as
Steps 1 and 2 are repeated until a convergence rule is satisfied. For instance, ||ψ (k+1) − ψ (k) || is less than a tolerance value, where ||x|| is the Euclidean norm of x.

Numerical Results
In this section, we present a simulation study to illustrate the performance of our methodology compared with the traditional MLE in the MOEE and MOER models. Additionally, we present a simulated data set to illustrate the gains obtained when our proposal is applied.

Reducing the Bias in the MOEE and MOER Models
We evaluate the performance of the MLE and BCE, described in Section 2, through a Monte Carlo simulation. The sample sizes considered are n = 10, 20, . . . , 100 and the total number of replications was set at 5000. All simulations were performed using the R software [32].
In each of the 5000 replications, for each scenario, the data were drawn from the respective model and we computed the MLE (say α and θ) and the BCE (say α M and θ M ). Additionally, we computed the estimated absolute relative bias (ARB) and the estimated root mean square errors (RMSE) which are defined, respectively, as |E( α) − α| /α and E[( α − α) 2 ] for the MLE of α (for the BCE, we replace α by α M and the terms are analogous for the estimators of θ). In Table 1, we present the results using MOEE for n = 10, . . . , 50. Since the results were the same using MOER, we preferred to present them through Figures 4 and 5 because it is more interesting for large sample sizes.
We can make some observations from Table 1 and Figures 4 and 5. First, the ARB is not negligible for small samples, but the BCE α M has a smaller ARB than α. In small sample sizes especially (say n = 10, 20, 30), the ARB is half the ARB for α. In larger sample sizes, the ARB, hence the bias, of the MLE continues larger than the BCE, but as expected, the two terms become closer as n increases. We also note that the RMSE of α M is smaller than the RMSE of α. On the other hand, the original proposal only considered a bias reduction methodology for α M ; this also produces an improvement in the behavior of θ M in comparison with θ, provided that the bias and RMSE for θ M is less than the bias and RMSE for θ, in both MOEE and MOER. This suggests that in models where g is indexed by a vector λ with one or more parameters, the improvement in terms of bias and RMSE should benefit the BCE of α and λ jointly, as we will illustrate in Section 4. The histograms of the estimates for α M and θ M show that their respective distributions are less asymmetrical than α and θ; these histograms were omitted for the sake of brevity. As there is no theoretical explanation that justifies this behavior, we can interpret it as a side-effect of the BCE procedure.

A Simulated Example with Outliers
In this simulation study, we show the impact of the bias in an estimation of the MOEE distribution in presence of an outlier. From the model with α = 2.5 and θ = 1.5, we sampled 10 observations: x = (0.554, 0.960, 1.099, 1.200, 0.424, 0.223, 2.883, 0.938, 0.276, 1.545). To illustrate the robustness of the method, we replaced the first observation x 1 by x 1 + 1.5 × sd(x) = 1.732. This perturbation scheme is usually used in local influence. In Table 2, we present the estimate for the original data set and the data set with presence of the outlier. Please note that the impact on the estimators of α differs further for the MLE than for the BCE. Additionally, as expected, the bias for the MLE is considerably smaller than the bias for the BCE. On the other hand, the estimations of θ seem not to be much impacted by the presence of the outlier. Figure 6 shows the MOEE pdf of this artificial data set compared with the pdf estimated by MLE and BCE. We also note that the pdf estimated by BCE is closer to the pdf with the original parameters than the pdf estimated by MLE.

Two Applications
In this section, we present two real data applications illustrating the gains in bias reduction in the MOEE and MOER models.

Air Conditioning System of an Airplane Data Set
In this section, we analyze the dataset presented by Linhart and Zucchini [33] (p. 69) to illustrate our method of estimation in MOEE. The data are failure times of the air conditioning system of an airplane: 23,261,87,7,120,14,62,47,225,71,246,21,42,20,5,12,120,11,3,14,71,11,14,11,16,90,1,16,52,95. We use the MOEE distribution to fit this data. Table 3 presents the parameter estimates using the MLE and the preventive method of estimation. In agreement with our simulation study, the MLE overestimated parameter vector λ, especially for parameter α. In this case, besides the benefits related to bias reduction of BCE in comparison with MLE, we observe that the estimated standard error for α is lower with the BCE than with the MLE. Figure 7 shows the empirical cdf for this data set compared with the estimated cdf for MOEE in the MLE and BCE. We remark that the Kolmogorov-Statistic (i.e., the maximum distance between the empirical and estimated cdf) is reduced from 0.1284 to 0.1185 considering MLE versus BCE. Finally, Figure 8 presents the randomized quantile residuals [34] for MLE and BCE. If the model is correct, those models are a random sample from the standard normal distribution. Several normality test are presented in the figure. Please note that all p-values related to those tests are greater for the BCE than for the MLE, suggesting a better fit of the BCE estimates. Table 3. Estimates for MOEE distribution in the air conditioning system of an airplane. Standard errors are presented in parenthesis.

Precipitation Data Set
This data set is obtained from Hinkley [35]. It consists of thirty successive values of March precipitation (measured in inches) in Minneapolis/St Paul. We consider this illustration of our method of estimation in MOER. Table 4 presents the parameter estimates using maximum likelihood estimation and the preventive method of estimation. MLE overestimated parameter vector λ in comparison with BCE. Also, the bias (estimated via bootstrap) is reduced in both parameters, as well the standard errors. Figure 9 shows the histogram for this data set compared with the estimated pdf for MOER in MLE and BCE. Table 5 shows the approximated 95% confidence interval (CI) for the q-th quantile (say, x q ) in this model based on MLE and BCE considering different values for q (see Appendix A for details of the construction of this approximated CI). Finally, Figure 10 shows the width of these CI's based on the two methods. Please note that BCE provides a lower width for q ≤ 0.592 and MLE provides a lower width for q > 0.592.

Concluding Remarks
We derived an expression for the bias of MLE related to the α parameter in the Marshall-Olkin extended family. The expression found allowed us to construct a procedure, using penalized likelihood, which generates a modified MLE with reduced bias. This is a useful development. In many cases, it is very difficult or even impossible to find bias-corrected maximum likelihood estimators for specific distribution families; we were able to find them for the Marshall-Olkin extended family. The reduced-bias estimator presents better performance than the corresponding procedures based on the initial estimator; for sample sizes equal to 10 or 20, the bias reduction was approximately 50%. The mean square error was also reduced. The scheme to use BCE is quite simple to implement in statistical softwares such as R. As mentioned by Firth [28]: "the merits of bias reduction in any particular problem will depend on several factors, including the skewness of the maximum likelihood estimator." Since the MLE of the parameters in the Marshall-Olkin extended family is quite biased, this is an indication that its distribution is asymmetrical. For a future work, we suggest the study of the skewness of the MLE in the MOE g , as, recently performed by Magalhães et al. [36] for the varying beta regression model (BRM) and Magalhães et al. [37] for Weibull censored data (WCD). In the first work, the authors showed that the MLE of the precision parameter is quite asymmetrical, while in the second, the MLE of the parameters is close to symmetry. This agrees with bias literature for the respective models, which showed that estimates of the precision parameters in the varying beta regression model are highly biased while the estimates in WCD are little biased. Since there is no closed-form for the skewness coefficient of the MLE of the MOE g , it will be a greater contribution to this family of distributions.

Appendix A. Details of the Computation of M(α)
In this Section we present details of the computation of I(α), κ α,α,α and κ α,αα , required to compute the term M(α) in (4). First, the survival function of the MOE g model is given by Also, deriving (5) in relation to α we obtain . As U i = S MOE g (X i ; α), by the inverse transform method it follows that U i ∼ U(0, 1) and then, E(U r i ) = (r + 1) −1 , for r = −1. It is also valid to rewrite the last expression as 1). Please note that ∑ n i=1 V i is a symmetric random variable around zero; it is then immediate that κ α,α,α = E (S(α)) 3 = 0. On the other hand, Therefore, Finally, As U 1 , . . . , U n are iid, we have that Reducing terms, we obtain that κ α,αα = −n/(3α 3 ).

Details of the Asymptotic CI Used in Application 2
For the MOER model, the q-th quantile of the distribution, say x q , is given by Thus, ∇(x q ) = ∂x q ∂ψ From asymptotic properties of the MLE (which also is valid for the BCE), we have that where I 2 is the identity matrix with dimension 2 and H(·) is the hessian matrix of the model. Therefore, using the delta method it follows that where σ 2 x q = ∇( x q ) H( ψ)∇( x q ). Therefore, an asymptotic 100(1 − p)% confidence interval based on the delta method for x q is given by IC(x q ; 100(1 − p)%) = x q ∓ z 1−p/2 σ x q , with z q denoting the q-th quantile of the normal standard model.