The Odd Gamma Weibull-Geometric Model: Theory and Applications

In this paper, we study a new four-parameter distribution called the odd gamma Weibull-geometric distribution. As its name suggests, the new distribution is a special member of the odd-gamma-G family of distributions, defined with the Weibull-geometric distribution as baseline, and it benefits from the merits of both. First, we present a comprehensive account of its mathematical properties, including shapes, asymptotes, quantile function, quantile density function, skewness, kurtosis, moments, moment generating function and stochastic ordering. Then, we focus on the statistical inference of the corresponding model. The maximum likelihood method is used to estimate the model parameters, and its performance is assessed by a Monte Carlo simulation study. An empirical illustration of the new distribution is presented through the analysis of two real-life data sets. The proposed model provides better fits than the beta-Weibull, gamma-Weibull and Weibull-geometric models.


Introduction
The parametric models based on standard (probability) distributions are not always suitable to reveal the finer detail of the underlying structure of a data set. This limitation has triggered the creation of new families of distributions, often defined by compounding or weighting existing distributions. The most useful of them can be found in the surveys of [1,2]. In particular, the families of distributions defined with gamma generators have demonstrated a high ability to construct flexible models, showing nice fits for various kinds of real-life data sets. See, for instance, [3][4][5][6]. The odd-gamma-G family introduced by [5] will be at the heart of our study. For this reason, it is briefly described below. Let $G(x)$ be the cumulative distribution function (cdf) of a continuous univariate distribution and $g(x)$ be the corresponding probability density function (pdf). Then, the odd-gamma-G family of distributions is constructed from the gamma distribution and the odd transformation given by $\mathrm{odd}_G(x) = G(x)/\bar{G}(x)$, with $\bar{G}(x) = 1 - G(x)$. The corresponding cdf is given by
$$F(x) = \frac{1}{\Gamma(\alpha)}\,\gamma\!\left(\alpha, \frac{G(x)}{\bar{G}(x)}\right), \quad x \in \mathbb{R}, \tag{1}$$
where $\gamma(\alpha, z) = \int_0^z t^{\alpha-1} e^{-t}\,dt$, $\alpha > 0$, $z \ge 0$, is the lower incomplete gamma function and $\Gamma(\alpha) = \int_0^{+\infty} t^{\alpha-1} e^{-t}\,dt$ is the gamma function. The corresponding pdf is given by
$$f(x) = \frac{g(x)}{\Gamma(\alpha)\,\bar{G}(x)^2}\left[\frac{G(x)}{\bar{G}(x)}\right]^{\alpha-1}\exp\left\{-\frac{G(x)}{\bar{G}(x)}\right\}, \quad x \in \mathbb{R}.$$
This modification can significantly enrich the former model related to $G(x)$. This is supported by [5] with the uniform distribution as baseline and by [6] with the exponentiated version of the uniform distribution and the exponentiated version of the Weibull distribution as baselines. See also more general distributions in [7]. In terms of modelling, these works show that the resulting models can achieve better goodness of fit than useful competitors.
On the other hand, Ref. [8] introduced and studied another generalization of the Weibull distribution called the Weibull-geometric distribution. As indicated by the name, it is obtained by compounding the Weibull and geometric distributions. The corresponding cdf is given by
$$G(x) = \frac{1 - e^{-(\beta x)^c}}{1 - p\,e^{-(\beta x)^c}}, \quad x > 0, \tag{2}$$
where $\beta > 0$, $c > 0$ and $p \in [0, 1)$. The corresponding pdf is given by
$$g(x) = \frac{c\,\beta^c (1-p)\, x^{c-1} e^{-(\beta x)^c}}{\left[1 - p\,e^{-(\beta x)^c}\right]^2}, \quad x > 0.$$
One can remark that the Weibull distribution arises as a special case when $p = 0$. It is shown in [8] that the Weibull-geometric pdf and hazard rate function (hrf) take more general forms than those of the standard Weibull distribution. Among others, thanks to the presence of the parameter $p$, the related model is of interest for modeling unimodal failure rates (contrary to the standard Weibull model).
In this paper, we focus our attention on a new distribution whose cdf is obtained by taking the Weibull-geometric cdf given by (2) as the baseline in the odd-gamma-G cdf given by (1). The obtained distribution is called the odd gamma Weibull-geometric distribution (OGWG for short). We thus aim to benefit from the respective merits of the two underlying distributions to create a new one with great flexibility in modelling. Among others, we show that the OGWG pdf can be reversed-J shaped, right-skewed, left-skewed or approximately symmetric, and that the OGWG hrf can be increasing, decreasing or bathtub shaped. These features are welcome for the construction of new flexible models for a wide variety of data sets.
The rest of the paper is organized as follows. In Section 2, the main functions related to the OGWG distribution are presented, with analytical results and graphical illustrations of their shapes. The mathematical properties of the new distribution are derived in Section 3. In Section 4, estimators for the model parameters are obtained by the method of maximum likelihood estimation. A simulation study is then performed to show the numerical performance of the estimators. In Section 5, two real data sets are considered for analysis, showing the nice fit of the proposed model in comparison to useful competitors. Some concluding remarks are given in Section 6.

Main Probability Functions
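Let us recall that the cdf of the OGWG distribution is defined by (1) with $G(x)$ given by (2). By noticing that $\mathrm{odd}_G(x) = (e^{(\beta x)^c} - 1)/(1 - p)$, the corresponding cdf is given by
$$F(x) = \frac{1}{\Gamma(\alpha)}\,\gamma\!\left(\alpha, \frac{e^{(\beta x)^c} - 1}{1 - p}\right), \quad x > 0, \tag{3}$$
with $c > 0$, $\alpha > 0$, $p \in [0, 1)$ and $\beta > 0$. When needed, the OGWG distribution will be denoted by OGWG$(c, \alpha, p, \beta)$ in order to specify the parameters. By differentiation, the pdf corresponding to (3) is given by
$$f(x) = \frac{c\,\beta^c}{\Gamma(\alpha)(1-p)^{\alpha}}\, x^{c-1} e^{(\beta x)^c}\left(e^{(\beta x)^c} - 1\right)^{\alpha-1}\exp\left\{-\frac{e^{(\beta x)^c} - 1}{1 - p}\right\}, \quad x > 0. \tag{4}$$
The survival function of the OGWG distribution is given by
$$S(x) = 1 - F(x) = 1 - \frac{1}{\Gamma(\alpha)}\,\gamma\!\left(\alpha, \frac{e^{(\beta x)^c} - 1}{1 - p}\right), \quad x > 0.$$
The corresponding hazard rate function (hrf) is given by
$$h(x) = \frac{f(x)}{S(x)} = \frac{c\,\beta^c x^{c-1} e^{(\beta x)^c}\left(e^{(\beta x)^c} - 1\right)^{\alpha-1}\exp\left\{-\dfrac{e^{(\beta x)^c} - 1}{1 - p}\right\}}{(1-p)^{\alpha}\left[\Gamma(\alpha) - \gamma\!\left(\alpha, \dfrac{e^{(\beta x)^c} - 1}{1 - p}\right)\right]}, \quad x > 0. \tag{5}$$
Remark 1. One can remark that the OGWG distribution is a special case of the general gamma-Weibull-Weibull distribution introduced by ([7], Section 6) (with the notations of [7], it corresponds to β = 1, simplifying the complexity of the distribution, α = 1/(1 − p), λ = 1/β and k = c). It is also an extension of the so-called exponentiated exponential power distribution thanks to the presence of the parameter $p$.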
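As a numerical companion to the functions above, the following is a minimal sketch in Python. It assumes NumPy and SciPy are available and that the cdf is $F(x) = \gamma(\alpha, u(x))/\Gamma(\alpha)$ with $u(x) = (e^{(\beta x)^c} - 1)/(1 - p)$, as stated in the text. The function names (`ogwg_cdf`, `ogwg_sf`, `ogwg_pdf`, `ogwg_hrf`) are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.special import gammainc, gammaincc, gammaln


def _odd(x, c, p, beta):
    """Odd ratio u(x) = (exp((beta*x)**c) - 1) / (1 - p)."""
    return np.expm1((beta * np.asarray(x, dtype=float)) ** c) / (1.0 - p)


def ogwg_cdf(x, c, alpha, p, beta):
    """cdf: regularized lower incomplete gamma function evaluated at u(x)."""
    return gammainc(alpha, _odd(x, c, p, beta))


def ogwg_sf(x, c, alpha, p, beta):
    """Survival function: regularized upper incomplete gamma function at u(x)."""
    return gammaincc(alpha, _odd(x, c, p, beta))


def ogwg_pdf(x, c, alpha, p, beta):
    """pdf, assembled on the log scale to limit overflow for large x."""
    x = np.asarray(x, dtype=float)
    t = (beta * x) ** c
    log_f = (np.log(c) + c * np.log(beta) + (c - 1.0) * np.log(x) + t
             + (alpha - 1.0) * np.log(np.expm1(t))
             - np.expm1(t) / (1.0 - p)
             - gammaln(alpha) - alpha * np.log1p(-p))
    return np.exp(log_f)


def ogwg_hrf(x, c, alpha, p, beta):
    """Hazard rate function h(x) = f(x) / S(x)."""
    return ogwg_pdf(x, c, alpha, p, beta) / ogwg_sf(x, c, alpha, p, beta)


if __name__ == "__main__":
    grid = np.array([0.2, 0.7, 1.2])
    print(ogwg_pdf(grid, 1.5, 1.5, 0.5, 0.8))
    print(ogwg_hrf(grid, 1.5, 1.5, 0.5, 0.8))
```

SciPy's `gammainc` and `gammaincc` are already divided by $\Gamma(\alpha)$, so no extra normalization is needed. For very large $x$ the survival function underflows to zero, so the hrf should then be evaluated with care.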

Analytical Properties of the Shapes
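We now investigate the critical points and asymptotes of the OGWG pdf, i.e., $f(x)$ given by (4), and the OGWG hrf, i.e., $h(x)$ given by (5). In order to deal with tractable equations for the critical points, we work with the logarithmic functions $\log[f(x)]$ and $\log[h(x)]$. Thus, the critical points of $f(x)$ are the solutions $x_0$ of the nonlinear equation
$$\frac{c-1}{x} + c\,\beta^c x^{c-1}\left[1 + (\alpha-1)\frac{e^{(\beta x)^c}}{e^{(\beta x)^c} - 1} - \frac{e^{(\beta x)^c}}{1-p}\right] = 0.$$
The nature of $x_0$ can be determined by studying the sign of the value
$$\frac{d^2}{dx^2}\log[f(x)]\,\Big|_{x = x_0}$$
(a negative value indicates a local maximum and a positive value a local minimum). For given parameters $c$, $\alpha$, $p$ and $\beta$, this can be evaluated numerically with standard software (R, Matlab, Mathematica, ...). In a similar way, the critical points of $h(x)$ are the solutions $x_*$ of the nonlinear equation
$$\frac{c-1}{x} + c\,\beta^c x^{c-1}\left[1 + (\alpha-1)\frac{e^{(\beta x)^c}}{e^{(\beta x)^c} - 1} - \frac{e^{(\beta x)^c}}{1-p}\right] + h(x) = 0,$$
where the last term comes from $-\frac{d}{dx}\log[S(x)] = h(x)$, with $S(x) = 1 - F(x)$ the survival function. Again, the nature of $x_*$ can be determined by studying the sign of the value
$$\frac{d^2}{dx^2}\log[h(x)]\,\Big|_{x = x_*}.$$
The asymptotic properties of $F(x)$, $f(x)$ and $h(x)$ are now studied. Since $\gamma(\alpha, u) \sim u^{\alpha}/\alpha$ when $u \to 0$, for $x \to 0$, we have
$$F(x) \sim \frac{(\beta x)^{\alpha c}}{\alpha\,\Gamma(\alpha)(1-p)^{\alpha}}, \qquad f(x) \sim h(x) \sim \frac{c\,\beta^{\alpha c}}{\Gamma(\alpha)(1-p)^{\alpha}}\, x^{\alpha c - 1}.$$
In this case, note that $f(x)$ tends to 0 when $\alpha c > 1$, tends to $\beta/[\alpha\Gamma(\alpha)(1-p)^{\alpha}]$ when $\alpha c = 1$ and tends to $+\infty$ when $\alpha c \in (0, 1)$. The same results hold for $h(x)$.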
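Since $\gamma(\alpha, u) \sim \Gamma(\alpha) - u^{\alpha-1} e^{-u}$ when $u \to +\infty$, for $x \to +\infty$, we have
$$1 - F(x) \sim \frac{1}{\Gamma(\alpha)(1-p)^{\alpha-1}}\, e^{(\alpha-1)(\beta x)^c}\exp\left\{-\frac{e^{(\beta x)^c} - 1}{1-p}\right\},$$
$$f(x) \sim \frac{c\,\beta^c}{\Gamma(\alpha)(1-p)^{\alpha}}\, x^{c-1} e^{\alpha(\beta x)^c}\exp\left\{-\frac{e^{(\beta x)^c} - 1}{1-p}\right\}, \qquad h(x) \sim \frac{c\,\beta^c}{1-p}\, x^{c-1} e^{(\beta x)^c}.$$
Let us remark that, in this case, $f(x)$ tends to 0, whereas $h(x)$ tends to $+\infty$, for all possible values of the parameters. Also, the parameter $\alpha$ plays no role in the asymptotic behavior of $h(x)$.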
Figures 1 and 2 display some plots of $f(x)$ and $h(x)$ when $\beta = 1$ for different values of $c$, $\alpha$ and $p$. The plots in Figure 1 reveal that $f(x)$ can be reversed-J shaped, right-skewed, left-skewed or approximately symmetric. The plots in Figure 2 indicate that $h(x)$ can be increasing, decreasing or bathtub shaped.

Quantile Function
The quantile function of the OGWG distribution, denoted by $Q(y)$, $y \in (0, 1)$, is characterized by the nonlinear equation $F(Q(y)) = y$. After some algebra, we obtain
$$Q(y) = \frac{1}{\beta}\left\{\log\left[1 + (1-p)\,\gamma^{-1}(\alpha, y\Gamma(\alpha))\right]\right\}^{1/c}, \quad y \in (0, 1), \tag{6}$$
where $\gamma^{-1}(\alpha, x)$ is the inverse lower incomplete gamma function, i.e., satisfying $\gamma^{-1}(\alpha, \gamma(\alpha, x)) = x$ for $x > 0$. The median of the OGWG distribution is given by $M^* = Q(0.5)$. For $i \in \{1, 2, 3\}$, the $i$-th quartile is given by $Q_i = Q(i/4)$ and, for $j \in \{1, \ldots, 7\}$, the $j$-th octile is given by $O_j = Q(j/8)$. Among others, we can use the quartiles and the octiles to investigate the effect of the parameters $c$, $\alpha$, $p$ and $\beta$ on the skewness and kurtosis of the OGWG distribution. One of the earliest skewness measures is the Bowley skewness introduced by [9] and defined by
$$B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}.$$
For the kurtosis, one can use the Moors kurtosis introduced by [10] and defined by
$$M = \frac{O_3 - O_1 + O_7 - O_5}{O_6 - O_2}.$$
The sign of $B$ is informative on the skewness of the distribution. Indeed, if $B = 0$ then the distribution is symmetric, if $B > 0$ then the distribution is right-skewed and if $B < 0$ then the distribution is left-skewed. On the other hand, the heaviness of the tail is evaluated numerically by $M$: a large $M$ corresponds to a heavy tail. Also, we can derive the quantile density function by differentiation of $Q(y)$. Setting $w = \gamma^{-1}(\alpha, y\Gamma(\alpha))$, we obtain
$$q(y) = \frac{dQ(y)}{dy} = \frac{(1-p)\,\Gamma(\alpha)}{c\,\beta}\cdot\frac{w^{1-\alpha} e^{w}\left\{\log[1 + (1-p)w]\right\}^{1/c - 1}}{1 + (1-p)w}, \quad y \in (0, 1).$$
This function is useful to define numerous statistical quantities (asymptotic confidence intervals, inference procedures, ...). We refer to [11].
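The quantile-based measures of this subsection can be sketched numerically as follows. This is only an illustration, assuming NumPy and SciPy, and the function names are hypothetical. Note that SciPy's `gammaincinv(alpha, y)` inverts the regularized incomplete gamma function, so it plays the role of $\gamma^{-1}(\alpha, y\Gamma(\alpha))$ in the text.

```python
import numpy as np
from scipy.special import gammainc, gammaincinv


def ogwg_cdf(x, c, alpha, p, beta):
    """cdf F(x), used here only to check that F(Q(y)) = y."""
    return gammainc(alpha, np.expm1((beta * np.asarray(x, dtype=float)) ** c) / (1.0 - p))


def ogwg_quantile(y, c, alpha, p, beta):
    """Q(y) = (1/beta) * {log[1 + (1 - p) * w]}**(1/c), w = y-quantile of Gamma(alpha, 1)."""
    w = gammaincinv(alpha, np.asarray(y, dtype=float))
    return np.log1p((1.0 - p) * w) ** (1.0 / c) / beta


def bowley_skewness(c, alpha, p, beta):
    """B = (Q3 + Q1 - 2*Q2) / (Q3 - Q1), from the three quartiles."""
    q1, q2, q3 = ogwg_quantile([0.25, 0.5, 0.75], c, alpha, p, beta)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def moors_kurtosis(c, alpha, p, beta):
    """M = (O3 - O1 + O7 - O5) / (O6 - O2), from the seven octiles."""
    o = ogwg_quantile(np.arange(1, 8) / 8.0, c, alpha, p, beta)  # o[j-1] is the j-th octile
    return (o[2] - o[0] + o[6] - o[4]) / (o[5] - o[1])


if __name__ == "__main__":
    pars = (1.5, 1.5, 0.5, 0.8)  # (c, alpha, p, beta), chosen for illustration
    print("median:", ogwg_quantile(0.5, *pars))
    print("Bowley:", bowley_skewness(*pars), "Moors:", moors_kurtosis(*pars))
```

Evaluating `bowley_skewness` and `moors_kurtosis` over a grid of parameter values is one way to visualize how $c$, $\alpha$, $p$ and $\beta$ affect skewness and kurtosis.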

Some Characterizations
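Let $U$ be a random variable following the uniform distribution over $(0, 1)$ and $Q(y)$ be the quantile function given by (6). Then, the random variable $X$ defined by
$$X = Q(U) = \frac{1}{\beta}\left\{\log\left[1 + (1-p)\,\gamma^{-1}(\alpha, U\Gamma(\alpha))\right]\right\}^{1/c}$$
follows the OGWG$(c, \alpha, p, \beta)$ distribution. By noticing that $Z = \gamma^{-1}(\alpha, U\Gamma(\alpha))$ follows the gamma distribution with parameters 1 and $\alpha$, i.e., with cdf $R(x) = (1/\Gamma(\alpha))\,\gamma(\alpha, x)$, $x > 0$, we can also write
$$X = \frac{1}{\beta}\left\{\log\left[1 + (1-p)Z\right]\right\}^{1/c}.$$
This characterization is useful to generate data according to the OGWG$(c, \alpha, p, \beta)$ distribution. Furthermore, if $X$ is a random variable following the OGWG$(c, \alpha, p, \beta)$ distribution, then the random variable $Y$ defined by
$$Y = \frac{e^{(\beta X)^c} - 1}{1 - p}$$
follows the gamma distribution with parameters 1 and $\alpha$.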
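The characterization above translates directly into a sampler. The sketch below uses only the Python standard library, where `random.gammavariate(alpha, 1.0)` draws from the gamma distribution with shape $\alpha$ and scale 1. The function names `ogwg_sample` and `to_gamma` are illustrative.

```python
import math
import random


def ogwg_sample(n, c, alpha, p, beta, rng=None):
    """Draw n OGWG(c, alpha, p, beta) variates via
    X = (1/beta) * {log[1 + (1 - p) * Z]}**(1/c), with Z ~ Gamma(alpha, 1)."""
    rng = rng or random.Random()
    out = []
    for _ in range(n):
        z = rng.gammavariate(alpha, 1.0)  # shape alpha, scale 1
        out.append(math.log1p((1.0 - p) * z) ** (1.0 / c) / beta)
    return out


def to_gamma(x, c, p, beta):
    """Inverse map Y = (exp((beta*x)**c) - 1) / (1 - p).
    If X ~ OGWG(c, alpha, p, beta), then Y ~ Gamma(alpha, 1)."""
    return math.expm1((beta * x) ** c) / (1.0 - p)


if __name__ == "__main__":
    xs = ogwg_sample(10_000, 1.5, 1.5, 0.5, 0.8, rng=random.Random(123))
    ys = [to_gamma(x, 1.5, 0.5, 0.8) for x in xs]
    # The mean of the back-transformed values should be close to alpha = 1.5.
    print(sum(ys) / len(ys))
```

Drawing $Z$ directly avoids inverting the incomplete gamma function for every variate, which is the practical advantage of the second form of the characterization.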

Series Expansion of the OGWG pdf
The following result presents a series expansion for the OGWG pdf.
Proposition 1. The OGWG pdf can be expressed as a sum of Weibull pdfs. More precisely, there exists a sequence of real numbers $(v_{r,s})_{(r,s) \in \mathbb{N}^2}$ such that the pdf $f(x)$ given by (4) can be expressed as
$$f(x) = \sum_{r=0}^{+\infty}\sum_{s=0}^{+\infty} v_{r,s}\, f_{r,s}(x),$$
where $f_{r,s}(x) = c(r+s+1)\beta^c x^{c-1} e^{-(r+s+1)(\beta x)^c}$, $x > 0$. One can remark that $f_{r,s}(x)$ is the pdf of the Weibull distribution with parameters $(r+s+1)^{1/c}\beta$ and $c$.
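Proof. Let us consider the expression of $f(x)$ depending on $G(x)$ and $g(x)$, i.e.,
$$f(x) = \frac{1}{\Gamma(\alpha)}\, g(x)\, G(x)^{\alpha-1}\left[1 - G(x)\right]^{-(\alpha+1)}\exp\left\{-\frac{G(x)}{1 - G(x)}\right\}.$$
By virtue of the power series expansion of the exponential function, i.e., $e^{x} = \sum_{i=0}^{+\infty} x^i/i!$, and the generalized binomial series expansion, i.e., $(1 + x)^{\nu} = \sum_{j=0}^{+\infty}\binom{\nu}{j} x^j$ for $|x| < 1$ and $\nu \in \mathbb{R}$, with $\binom{\nu}{j} = \nu(\nu-1)\cdots(\nu-j+1)/j!$, we have
$$f(x) = \frac{1}{\Gamma(\alpha)}\sum_{i=0}^{+\infty}\sum_{j=0}^{+\infty}\frac{(-1)^{i+j}}{i!}\binom{-(\alpha+i+1)}{j}\, g(x)\, G(x)^{\alpha+i+j-1}.$$
By applying the generalized binomial formula twice, with $G(x)$ given by (2) and $g(x)$ the corresponding pdf, we have
$$g(x)\, G(x)^{\alpha+i+j-1} = c\,\beta^c (1-p)\, x^{c-1} e^{-(\beta x)^c}\,\frac{\left[1 - e^{-(\beta x)^c}\right]^{\alpha+i+j-1}}{\left[1 - p\,e^{-(\beta x)^c}\right]^{\alpha+i+j+1}} = (1-p)\sum_{r=0}^{+\infty}\sum_{s=0}^{+\infty}\frac{(-1)^{r+s} p^s}{r+s+1}\binom{\alpha+i+j-1}{r}\binom{-(\alpha+i+j+1)}{s} f_{r,s}(x),$$
where $f_{r,s}(x) = c(r+s+1)\beta^c x^{c-1} e^{-(r+s+1)(\beta x)^c}$. Putting all the above equalities together, we get
$$f(x) = \sum_{r=0}^{+\infty}\sum_{s=0}^{+\infty} v_{r,s}\, f_{r,s}(x), \qquad v_{r,s} = \frac{(1-p)(-1)^{r+s} p^s}{\Gamma(\alpha)(r+s+1)}\sum_{i=0}^{+\infty}\sum_{j=0}^{+\infty}\frac{(-1)^{i+j}}{i!}\binom{-(\alpha+i+1)}{j}\binom{\alpha+i+j-1}{r}\binom{-(\alpha+i+j+1)}{s}.$$
This ends the proof of Proposition 1.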
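The result of Proposition 1 is useful to obtain sum expressions of various probabilistic measures, especially those of the form $\int_0^{+\infty} k(x) f(x)\,dx$, where $k(x)$ denotes a certain function. Indeed, when the dominated convergence theorem can be applied, we have the following series expansion:
$$\int_0^{+\infty} k(x) f(x)\,dx = \sum_{r=0}^{+\infty}\sum_{s=0}^{+\infty} v_{r,s}\int_0^{+\infty} k(x)\, f_{r,s}(x)\,dx, \tag{7}$$
where $f_{r,s}(x)$ is the Weibull pdf appearing in Proposition 1.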
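Let $X$ be a random variable following the OGWG$(c, \alpha, p, \beta)$ distribution. It follows from the asymptotic study of $f(x)$ performed in Section 2.2 that, for any positive integer $m$, the $m$-th moment of $X$ exists (by comparison with Riemann integrals). Furthermore, it is given by
$$\mu_m = E(X^m) = \int_0^{+\infty} x^m f(x)\,dx.$$
Several expressions of this integral are given below. First of all, by applying the natural change of variable $y = (e^{(\beta x)^c} - 1)/(1 - p)$, i.e., $x = (1/\beta)[\log(1 + (1-p)y)]^{1/c}$, we obtain
$$\mu_m = \frac{1}{\beta^m\,\Gamma(\alpha)}\int_0^{+\infty}\left[\log(1 + (1-p)y)\right]^{m/c} y^{\alpha-1} e^{-y}\,dy. \tag{8}$$
To the best of our knowledge, there is no closed form for this integral. However, for given parameters $c$, $\alpha$, $p$ and $\beta$, it can be computed numerically by using scientific software (see Table 1 below).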
The following result proposes bounds for $\mu_m$. This gives an approximate analytical view of the roles of the parameters in the possible values of $\mu_m$.

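Proposition 2. Let us set
$$\mu_m^{*} = \frac{(1-p)^{m/c}\,\Gamma(\alpha + m/c)}{\beta^m\,\Gamma(\alpha)}.$$
Then, we can bound $\mu_m$ as
$$\frac{\mu_m^{*}}{\left[1 + (m/c)(1-p)\right]^{\alpha + m/c}} \le \mu_m \le \mu_m^{*}.$$
Proof. Let us consider the expression (8) for $\mu_m$. The following bounds hold for the logarithmic function: for any $x > 0$, $x/(1 + x) \le \log(1 + x) \le x$. On the other hand, we have $e^{x} \ge 1 + x$ for any $x \in \mathbb{R}$. Therefore, for any $x > 0$, we have $x e^{-x} \le \log(1 + x) \le x$. Hence, by using $x e^{-x} \le \log(1 + x)$ with $x = (1-p)y$ and the change of variable $z = [1 + (m/c)(1-p)]y$, we obtain
$$\mu_m \ge \frac{(1-p)^{m/c}}{\beta^m\,\Gamma(\alpha)}\int_0^{+\infty} y^{\alpha + m/c - 1} e^{-[1 + (m/c)(1-p)]y}\,dy = \frac{\mu_m^{*}}{\left[1 + (m/c)(1-p)\right]^{\alpha + m/c}}.$$
In a similar way, by using $\log(1 + x) \le x$, we have
$$\mu_m \le \frac{(1-p)^{m/c}}{\beta^m\,\Gamma(\alpha)}\int_0^{+\infty} y^{\alpha + m/c - 1} e^{-y}\,dy = \mu_m^{*}.$$
By combining the previous bounds, we end the proof of Proposition 2.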
Alternatively, one can directly use the quantile function given by (6). Indeed, by the change of variable $x = Q(y)$, we have
$$\mu_m = \int_0^1 [Q(y)]^m\,dy.$$
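As a final approach, one can use the result of Proposition 1 and, more specifically, the result (7). Hereafter, let $X_{r,s}$ be a random variable following the Weibull distribution with parameters $(r+s+1)^{1/c}\beta$ and $c$, i.e., with the pdf given by $f_{r,s}(x) = c(r+s+1)\beta^c x^{c-1} e^{-[(r+s+1)^{1/c}\beta x]^c}$, $x > 0$. Then, we have
$$\mu_m = \sum_{r=0}^{+\infty}\sum_{s=0}^{+\infty} v_{r,s}\, E(X_{r,s}^m) = \frac{\Gamma(1 + m/c)}{\beta^m}\sum_{r=0}^{+\infty}\sum_{s=0}^{+\infty}\frac{v_{r,s}}{(r+s+1)^{m/c}}.$$
From any of the above expressions of $\mu_m$, we can derive central measures such as the mean of $X$ given by $E(X) = \mu_1$, the variance of $X$ given by $V(X) = \mu_2 - (\mu_1)^2$ and the $m$-th central moment of $X$ given by
$$\tilde{\mu}_m = E[(X - \mu_1)^m] = \sum_{k=0}^{m}\binom{m}{k}(-1)^k (\mu_1)^k \mu_{m-k}, \quad \text{with } \mu_0 = 1.$$
We can also determine other standard measures such as the coefficient of skewness and the coefficient of kurtosis, respectively given by
$$CS = \frac{\tilde{\mu}_3}{V(X)^{3/2}}, \qquad CK = \frac{\tilde{\mu}_4}{V(X)^{2}}.$$
Table 1 presents the numerical values of $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$, $V(X)$, CS and CK for selected values of the parameters.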
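The moment computations of this subsection can be illustrated with the following standard-library sketch. It is not the numerical integration used for Table 1. Instead, as an assumption made for simplicity, $\mu_m$ is approximated by Monte Carlo from the representation $X = (1/\beta)[\log(1 + (1-p)Z)]^{1/c}$ with $Z$ gamma distributed. The bounds coded in `ogwg_moment_bounds` are the ones obtained by inserting $x e^{-x} \le \log(1+x) \le x$ into the integral expression of $\mu_m$, as in the proof of Proposition 2. All function names are hypothetical.

```python
import math
import random


def ogwg_moment_mc(m, c, alpha, p, beta, n=200_000, seed=0):
    """Monte Carlo estimate of mu_m = E(X**m) for X ~ OGWG(c, alpha, p, beta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gammavariate(alpha, 1.0)  # Z ~ Gamma(alpha, 1)
        total += math.log1p((1.0 - p) * z) ** (m / c)
    return total / (n * beta ** m)


def ogwg_moment_bounds(m, c, alpha, p, beta):
    """Lower and upper bounds on mu_m from x*exp(-x) <= log(1 + x) <= x."""
    k = m / c
    upper = ((1.0 - p) ** k
             * math.exp(math.lgamma(alpha + k) - math.lgamma(alpha)) / beta ** m)
    lower = upper / (1.0 + k * (1.0 - p)) ** (alpha + k)
    return lower, upper


def moment_summary(mu1, mu2, mu3, mu4):
    """Mean, variance, coefficient of skewness and of kurtosis from raw moments."""
    var = mu2 - mu1 ** 2
    cm3 = mu3 - 3.0 * mu1 * mu2 + 2.0 * mu1 ** 3
    cm4 = mu4 - 4.0 * mu1 * mu3 + 6.0 * mu1 ** 2 * mu2 - 3.0 * mu1 ** 4
    return mu1, var, cm3 / var ** 1.5, cm4 / var ** 2


if __name__ == "__main__":
    pars = (1.5, 1.5, 0.5, 0.8)  # (c, alpha, p, beta), chosen for illustration
    raw = [ogwg_moment_mc(m, *pars) for m in (1, 2, 3, 4)]
    print("bounds on mu_1:", ogwg_moment_bounds(1, *pars))
    print("mean, variance, CS, CK:", moment_summary(*raw))
```

A deterministic quadrature of (8) or of $\int_0^1 [Q(y)]^m\,dy$ would be more accurate than Monte Carlo; the simulation route is used here only to keep the sketch dependency-free.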

Moment Generating Function
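The moment generating function of $X$ is given by
$$M(t) = E(e^{tX}) = \int_0^{+\infty} e^{tx} f(x)\,dx.$$
It is well defined for $t \in \mathbb{R}$. By applying the change of variable $y = (e^{(\beta x)^c} - 1)/(1 - p)$, we can express $M(t)$ as
$$M(t) = \frac{1}{\Gamma(\alpha)}\int_0^{+\infty}\exp\left\{\frac{t}{\beta}\left[\log(1 + (1-p)y)\right]^{1/c}\right\} y^{\alpha-1} e^{-y}\,dy.$$
Alternatively, by the change of variable $x = Q(y)$, we have
$$M(t) = \int_0^1 e^{tQ(y)}\,dy.$$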
For given parameters c, α, p, β and t, the integrals above can be computed numerically.
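A series expansion of $M(t)$ can be derived from (7). Indeed, whenever the involved expectations exist, we have
$$M(t) = \sum_{r=0}^{+\infty}\sum_{s=0}^{+\infty} v_{r,s}\, E\left(e^{tX_{r,s}}\right).$$
As always, we have the following relation between the moments and the moment generating function:
$$\mu_m = \frac{d^m}{dt^m} M(t)\,\Big|_{t=0}.$$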

Incomplete Moments
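Let $1_A$ be the indicator function of an event $A$. Then, the $m$-th incomplete moment of $X$ is defined by
$$\theta_m(t) = E\left(X^m 1_{\{X \le t\}}\right) = \int_0^{t} x^m f(x)\,dx, \quad t > 0.$$
By applying the change of variable $y = (e^{(\beta x)^c} - 1)/(1 - p)$, we obtain
$$\theta_m(t) = \frac{1}{\beta^m\,\Gamma(\alpha)}\int_0^{(e^{(\beta t)^c} - 1)/(1-p)}\left[\log(1 + (1-p)y)\right]^{m/c} y^{\alpha-1} e^{-y}\,dy.$$
One can determine bounds for $\theta_m(t)$ by proceeding as in the proof of Proposition 2.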
Alternatively, by the change of variable $x = Q(y)$, we immediately have
$$\theta_m(t) = \int_0^{F(t)} [Q(y)]^m\,dy.$$
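Again, for given parameters $c$, $\alpha$, $p$, $\beta$ and $t$, we can evaluate these integrals numerically. Also, another expression comes from (7). We have
$$\theta_m(t) = \sum_{r=0}^{+\infty}\sum_{s=0}^{+\infty} v_{r,s}\, E\left(X_{r,s}^m 1_{\{X_{r,s} \le t\}}\right).$$
With the first incomplete moment of $X$, one can define several kinds of mean deviations. For instance, there are the mean deviation of $X$ about the mean $\mu_1$ given by
$$\delta_1 = E(|X - \mu_1|) = 2\mu_1 F(\mu_1) - 2\theta_1(\mu_1)$$
and the mean deviation of $X$ about the median $M^*$ given by
$$\delta_2 = E(|X - M^*|) = \mu_1 - 2\theta_1(M^*).$$
We can also express the Bonferroni and Lorenz curves, respectively given by
$$\mathcal{B}(y) = \frac{\theta_1(Q(y))}{y\,\mu_1} \quad \text{and} \quad \mathcal{L}(y) = \frac{\theta_1(Q(y))}{\mu_1}, \quad y \in (0, 1).$$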
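As an example of the use of higher orders of the incomplete moments, the $m$-th moment of the residual life of $X$ is given by
$$E\left[(X - t)^m \mid X > t\right] = \frac{1}{1 - F(t)}\int_t^{+\infty}(x - t)^m f(x)\,dx = \frac{1}{1 - F(t)}\sum_{k=0}^{m}\binom{m}{k}(-t)^{m-k}\left[\mu_k - \theta_k(t)\right], \quad t > 0,$$
with $\mu_0 = 1$ and $\theta_0(t) = F(t)$.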

Stochastic Ordering
Under some assumptions on the parameters, the result below shows that the OGWG$(c, \alpha, p, \beta)$ distribution is ordered with respect to the likelihood ratio ordering. Further details and applications on stochastic ordering can be found in [12].
Proposition 3. Let $X$ be a random variable following the OGWG$(c, \alpha_1, p_1, \beta)$ distribution and $Y$ be a random variable following the OGWG$(c, \alpha_2, p_2, \beta)$ distribution. Suppose that $\alpha_1 \le \alpha_2$ and $p_2 \le p_1$. Then $X$ is smaller than $Y$ in the likelihood ratio order, i.e., the ratio of the pdf of $X$ over the pdf of $Y$ is a decreasing function.
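Proof. Let $f_1(x)$ be the pdf of $X$ and $f_2(x)$ be the pdf of $Y$. Then, we have
$$\frac{f_1(x)}{f_2(x)} = \frac{\Gamma(\alpha_2)(1-p_2)^{\alpha_2}}{\Gamma(\alpha_1)(1-p_1)^{\alpha_1}}\left(e^{(\beta x)^c} - 1\right)^{\alpha_1 - \alpha_2}\exp\left\{\left(\frac{1}{1-p_1} - \frac{1}{1-p_2}\right)\left(1 - e^{(\beta x)^c}\right)\right\}.$$
Let us consider the logarithmic function to have a more tractable expression. We have
$$\log\left[\frac{f_1(x)}{f_2(x)}\right] = \log\left[\frac{\Gamma(\alpha_2)(1-p_2)^{\alpha_2}}{\Gamma(\alpha_1)(1-p_1)^{\alpha_1}}\right] + (\alpha_1 - \alpha_2)\log\left(e^{(\beta x)^c} - 1\right) + \left(\frac{1}{1-p_1} - \frac{1}{1-p_2}\right)\left(1 - e^{(\beta x)^c}\right).$$
Since $\alpha_1 \le \alpha_2$ and $p_2 \le p_1$ (so that $1/(1-p_1) \ge 1/(1-p_2)$), as a sum of two nonpositive functions, we have
$$\frac{d}{dx}\log\left[\frac{f_1(x)}{f_2(x)}\right] = (\alpha_1 - \alpha_2)\frac{c\,\beta^c x^{c-1} e^{(\beta x)^c}}{e^{(\beta x)^c} - 1} - \left(\frac{1}{1-p_1} - \frac{1}{1-p_2}\right) c\,\beta^c x^{c-1} e^{(\beta x)^c} \le 0.$$
Hence $f_1(x)/f_2(x)$ is decreasing. The proof of Proposition 3 is completed.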

Estimation of Parameters
Hereafter, we focus our attention on the applied aspect of the OGWG distribution, considering it as a statistical model. Indeed, motivated by its flexibility discussed in the above sections, the OGWG model is appropriate for the analysis of data sets with a non-trivial structure, such as those frequently encountered in engineering, medicine, hydrology, economics and finance.

Maximum Likelihood Estimation
Several parameter estimation methods are available in the literature. Among them, thanks to its strong theoretical guarantees, the maximum likelihood method remains the most popular. In particular, it can be used to construct confidence intervals for the model parameters as well as test statistics. For these reasons, we consider the estimation of the unknown parameters of the OGWG model from complete samples with this method only. Let $x_1, \ldots, x_n$ be a sample of size $n$ from the OGWG$(c, \alpha, p, \beta)$ distribution. The log-likelihood function for the vector of parameters $\Theta = (c, \alpha, p, \beta)^{\top}$ is given by
$$\ell(\Theta) = n\log c + nc\log\beta - n\log\Gamma(\alpha) - n\alpha\log(1-p) + (c-1)\sum_{i=1}^{n}\log x_i + \sum_{i=1}^{n}(\beta x_i)^c + (\alpha-1)\sum_{i=1}^{n}\log\left(e^{(\beta x_i)^c} - 1\right) - \frac{1}{1-p}\sum_{i=1}^{n}\left(e^{(\beta x_i)^c} - 1\right).$$
The corresponding score vector is given by
$$U(\Theta) = \left(\frac{\partial\ell}{\partial c}, \frac{\partial\ell}{\partial\alpha}, \frac{\partial\ell}{\partial p}, \frac{\partial\ell}{\partial\beta}\right)^{\top}.$$
By solving the system $U(\Theta) = (0, 0, 0, 0)^{\top}$, we obtain a solution denoted by $\hat{\Theta} = (\hat{c}, \hat{\alpha}, \hat{p}, \hat{\beta})^{\top}$ (assuming that it is unique). Hence, $\hat{c}$, $\hat{\alpha}$, $\hat{p}$ and $\hat{\beta}$ are the maximum likelihood estimates (MLEs) of $c$, $\alpha$, $p$ and $\beta$, respectively. These estimates have no closed-form expressions in our case. However, they can be computed numerically by using iterative techniques (quasi-Newton BFGS, Newton-Raphson algorithms, ...). Further details can be found in [13]. Assuming that the parameters are in the interior of the parameter space and not on the boundary, the distribution of $\hat{\Theta}$ can be approximated by a 4-dimensional normal distribution with mean $\Theta$ and covariance matrix given by $J(\hat{\Theta})^{-1}$, where $J(\Theta)$ denotes the $4 \times 4$ symmetric matrix defined by
$$J(\Theta) = -\left\{\frac{\partial^2\ell(\Theta)}{\partial\theta_i\,\partial\theta_j}\right\}_{\theta_i, \theta_j \in \{c, \alpha, p, \beta\}},$$
whose elements are given in Appendix A. From this asymptotic property, one can construct approximate confidence intervals for $c$, $\alpha$, $p$ and $\beta$. More precisely, for $h \in \{c, \alpha, p, \beta\}$, an approximate confidence interval for $h$ at the level $100(1 - \omega)\%$ is given by
$$CI_h = \left[\hat{h} - z_{\omega/2}\, s_{\hat{h}},\ \hat{h} + z_{\omega/2}\, s_{\hat{h}}\right], \tag{10}$$
where $s_{\hat{h}}$ is the square root of the diagonal element of $J(\hat{\Theta})^{-1}$ at the same position as $h$ and $z_{\omega/2}$ is the quantile $100(1 - \omega/2)\%$ of the standard normal distribution. Also, we are able to compute the likelihood ratio (LR) statistics for testing the OGWG model against its sub-models.
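To make the estimation step concrete, here is a hedged sketch of a numerical maximization of the log-likelihood, assuming NumPy and SciPy. It is not the authors' implementation. The derivative-free Nelder-Mead method of `scipy.optimize.minimize` replaces the BFGS or Newton-Raphson schemes mentioned above. The parameters are optimized on an unconstrained scale $(\log c, \log\alpha, \mathrm{logit}\,p, \log\beta)$, which is a choice made here for convenience and which excludes the boundary value $p = 0$. The names `ogwg_negloglik` and `ogwg_fit` and the default starting point are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def ogwg_negloglik(theta, x):
    """Negative log-likelihood; theta = (log c, log alpha, logit p, log beta)."""
    with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
        c, alpha, beta = np.exp(theta[0]), np.exp(theta[1]), np.exp(theta[3])
        p = 1.0 / (1.0 + np.exp(-theta[2]))
        t = (beta * x) ** c
        e = np.expm1(t)  # exp((beta*x)**c) - 1
        ll = (x.size * (np.log(c) + c * np.log(beta) - gammaln(alpha)
                        - alpha * np.log1p(-p))
              + np.sum((c - 1.0) * np.log(x) + t
                       + (alpha - 1.0) * np.log(e) - e / (1.0 - p)))
    # Return a large finite value when the likelihood overflows or is undefined.
    return -ll if np.isfinite(ll) else 1e300


def ogwg_fit(x, start=(1.0, 1.0, 0.5, 1.0)):
    """Numerical MLE of (c, alpha, p, beta) from a positive sample x."""
    x = np.asarray(x, dtype=float)
    c0, a0, p0, b0 = start
    theta0 = np.array([np.log(c0), np.log(a0), np.log(p0 / (1.0 - p0)), np.log(b0)])
    res = minimize(ogwg_negloglik, theta0, args=(x,), method="Nelder-Mead",
                   options={"maxiter": 4000, "xatol": 1e-8, "fatol": 1e-8})
    return {"c": float(np.exp(res.x[0])), "alpha": float(np.exp(res.x[1])),
            "p": float(1.0 / (1.0 + np.exp(-res.x[2]))),
            "beta": float(np.exp(res.x[3])), "loglik": float(-res.fun)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.gamma(shape=1.5, size=200)             # Z ~ Gamma(alpha = 1.5, 1)
    sample = np.log1p(0.5 * z) ** (1 / 1.5) / 0.8  # OGWG(1.5, 1.5, 0.5, 0.8)
    print(ogwg_fit(sample))
```

With four parameters the likelihood surface can be flat, so trying several starting points and comparing the attained log-likelihoods is advisable. The standard errors entering (10) would additionally require the inverse of the observed information matrix, which is not computed in this sketch.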

Monte Carlo Simulation Study
Now we assess the asymptotic properties of the MLEs for the parameters of the OGWG model using Monte Carlo simulations. The simulation study is repeated $N = 5000$ times, each with sample sizes $n = 50, 100, 200$, and with the following parameter scenarios: I: $c = 1.5$, $\alpha = 0.5$, $p = 0.5$ and $\beta = 0.5$; II: $c = 1.5$, $\alpha = 0.5$, $p = 0.1$ and $\beta = 0.5$; and III: $c = 1.5$, $\alpha = 1.5$, $p = 0.5$ and $\beta = 0.8$. We investigate the empirical bias (Bias), mean square error (MSE) and coverage probability (CP) at the nominal level 95%. For $h \in \{c, \alpha, p, \beta\}$, they are respectively defined by
$$\mathrm{Bias}_h(n) = \frac{1}{N}\sum_{i=1}^{N}(\hat{h}_i - h), \qquad \mathrm{MSE}_h(n) = \frac{1}{N}\sum_{i=1}^{N}(\hat{h}_i - h)^2, \qquad \mathrm{CP}_h(n) = \frac{1}{N}\sum_{i=1}^{N} 1_{\{\hat{h}_i - u\, s_{\hat{h}_i} \le h \le \hat{h}_i + u\, s_{\hat{h}_i}\}},$$
where $\hat{h}_i$ denotes the MLE of $h$ obtained at the $i$-th repetition of the simulation, $s_{\hat{h}_i}$ is the corresponding standard error and $u$ is the quantile 97.5% of the standard normal distribution, i.e., $u = z_{0.975} \approx 1.95996$. Table 2 gives the values of these measures under the scenarios and sample sizes indicated above. In most of the cases, we see that the empirical biases tend to zero when $n$ increases, the empirical MSEs decay toward zero as $n$ increases and the empirical CPs are quite close to the level 95%. Thus, based on these simulation results, we conclude that the MLEs perform well in estimating $c$, $\alpha$, $p$ and $\beta$. Therefore, the MLEs and their asymptotic results can be adopted for estimating and constructing approximate confidence intervals for $c$, $\alpha$, $p$ and $\beta$.
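The three performance measures can be computed from the replicated estimates as in the short sketch below, which uses only the Python standard library. It assumes that the $N$ estimates of one parameter and their standard errors have already been collected. The function name `mc_summary` and the toy inputs are illustrative and are not results from Table 2.

```python
def mc_summary(estimates, std_errors, true_value, z=1.95996):
    """Empirical bias, MSE and coverage probability over N replications.

    estimates[i] and std_errors[i] are the MLE and its standard error at the
    i-th replication; z is the standard normal quantile (97.5% by default).
    """
    n_rep = len(estimates)
    bias = sum(e - true_value for e in estimates) / n_rep
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    covered = sum(1 for e, s in zip(estimates, std_errors)
                  if e - z * s <= true_value <= e + z * s)
    return bias, mse, covered / n_rep


if __name__ == "__main__":
    # Toy inputs for illustration only.
    bias, mse, cp = mc_summary([0.9, 1.1, 1.3], [0.1, 0.1, 0.1], true_value=1.0)
    print(bias, mse, cp)
```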

Data Analysis
In this section, the OGWG distribution is used as a model to analyze two real-life data sets. We compare the fits of the OGWG model with those of the beta-Weibull (BW) (see [14]), Weibull-geometric (WG) (see [8]) and gamma-Weibull (GW) models (i.e., the OGWG model with $p = 0$). The WG pdf is the one recalled in Section 1, the GW pdf is (4) with $p = 0$, and the BW pdf can be found in [14]. We estimate the model parameters by using the maximum likelihood method as presented in Section 4. We compare the goodness-of-fit of the models using the Cramér-von Mises ($W^*$) and Anderson-Darling ($A^*$) statistics, which are described in detail by [15]. In addition, we consider the Kolmogorov-Smirnov (K-S) statistic, AIC and BIC. In general, the smaller the values of these statistics, the better the fit to the data. Two analyses are performed on two different data sets, as described below.
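Some of the comparison criteria can be sketched as follows, using the usual definitions $\mathrm{AIC} = 2k - 2\hat{\ell}$ and $\mathrm{BIC} = k\log n - 2\hat{\ell}$, where $k$ is the number of parameters and $\hat{\ell}$ the maximized log-likelihood, together with the K-S distance between the empirical cdf and a fitted cdf. These are the standard formulas, stated here as an assumption since the text does not spell them out. The function names are illustrative, and $W^*$ and $A^*$ are not covered.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * loglik


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical cdf and a fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # Compare the fitted cdf with the empirical cdf just after and just before x.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Toy check against the uniform cdf on (0, 1).
    print(ks_statistic([0.25, 0.5, 0.75], lambda x: x))
    print(aic(-100.0, 4), bic(-100.0, 4, 100))
```

For the OGWG model, `k` would be 4 and `cdf` the fitted version of (3), so the same helpers can be reused for the competing models with their own cdfs and parameter counts.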

Data analysis 1:
The first data set, taken from [16], represents the failure times of the air conditioning system of an airplane. Some descriptive statistics are given in Table 3. The skewness is positive (right-skewed data) and the kurtosis is positive. The boxplot and the TTT plot are given in Figure 3. In particular, the TTT plot shows a possible monotonically increasing or constant hrf, indicating that the OGWG model could be appropriate for fitting this data set. The MLEs (with SEs in parentheses), $A^*$, $W^*$, K-S statistics, AIC and BIC are listed in Table 4. For each criterion, the smallest value is reached by the OGWG model, indicating that it provides the best fit. For a visual approach, the estimated pdf and cdf of the OGWG model are displayed in Figure 4, together with the P-P and Q-Q plots. All the graphics show nice fits for the OGWG model. Finally, the asymptotic confidence intervals of the OGWG parameters given by (10) are presented in Table 5 at the levels 95% and 99%.

Data analysis 2:
The second data set, taken from [17], represents the tensile strength, measured in GPa, of 69 carbon fibers tested under tension at gauge lengths of 20 mm. The summary statistics are given in Table 6. The skewness and kurtosis are positive but close to zero. Figure 5 shows the boxplot and the TTT plot. Since the curve in the TTT plot is concave, it seems to correspond to a monotonically increasing hrf, so the OGWG model could be suitable for fitting data set 2. The MLEs (with SEs in parentheses), $A^*$, $W^*$, K-S statistics, AIC and BIC are listed in Table 7. All are favorable to the OGWG model. Also, the estimated pdf and cdf of the OGWG model are displayed in Figure 6, as well as the P-P and Q-Q plots. All the graphics show nice fits for the OGWG model. Finally, the asymptotic confidence intervals of the OGWG parameters given by (10) are presented in Table 8 at the levels 95% and 99%. Among others, the results obtained for this data set suggest that the OGWG family of distributions can also be utilized in calibration and errors-in-variables modeling (see [18]).

Concluding Remarks
This paper proposed a new distribution called the odd gamma Weibull-geometric distribution (OGWG) and developed its merits from the mathematical and practical points of view. In particular, we studied its shapes, asymptotes, quantile function, quantile density function, skewness, kurtosis, moments, moment generating function and stochastic ordering. Then, the statistical inference of the OGWG model was studied, with the maximum likelihood estimation method as benchmark. A simulation study was performed to show the usefulness of the obtained estimators. Then, two real-life data sets were analyzed, revealing that the OGWG model is better in terms of fit than the useful beta-Weibull, gamma-Weibull and Weibull-geometric models. It is hoped that the new perspectives of applications offered by the OGWG distribution will attract statisticians and practitioners in general.

Table 2. Biases, MSEs and CPs of the simulation study.

Table 3. Descriptive statistics for data set 1.

Table 4. MLEs, their SEs (in parentheses) and goodness-of-fit measures for data set 1.

Table 5. Confidence intervals of OGWG for data set 1.

Table 6. Descriptive statistics for data set 2.

Table 7. MLEs, their SEs (in parentheses) and goodness-of-fit measures for data set 2.

Table 8. Confidence intervals of OGWG for data set 2.