The Modified Beta Gompertz Distribution: Theory and Applications

In this paper, we introduce a new continuous probability distribution with five parameters called the modified beta Gompertz distribution. It is derived from the modified beta generator proposed by Nadarajah, Teimouri and Shih (2014) and the Gompertz distribution. By investigating its mathematical and practical aspects, we show that it is quite flexible and can be used effectively in modeling a wide variety of real phenomena. Among other results, we provide useful expansions of crucial functions, the quantile function, moments, incomplete moments, the moment generating function, entropies and order statistics. We explore the estimation of the model parameters by the maximum likelihood method. We also present a simulation study testing the validity of the maximum likelihood estimators. Finally, we illustrate the flexibility of the distribution by considering two real data sets.


Introduction
The Gompertz distribution was initially introduced by Gompertz (1825) to describe human mortality and provide actuarial tables. The literature on the use of the Gompertz distribution in applied areas is enormous. A nice review can be found in Tjørve and Tjørve (2017) and the references therein. From a mathematical point of view, the cumulative distribution function (cdf) of the Gompertz distribution with parameters λ > 0 and α > 0 is given by
$$G(x) = 1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}, \quad x > 0.$$
The related probability density function (pdf) is given by
$$g(x) = \lambda e^{\alpha x} e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}, \quad x > 0.$$
It can be viewed as a generalization of the exponential distribution (obtained with α → 0) and thus as an alternative to the gamma or Weibull distribution. A feature of the Gompertz distribution is that g(x) is unimodal and has positive skewness, whereas the related hazard rate function (hrf), given by h(x) = g(x)/(1 − G(x)), is increasing. In order to increase the flexibility of the Gompertz distribution, further extensions have been proposed. A natural one is the generalized Gompertz distribution introduced by El-Gohary et al. (2013). By introducing an exponent parameter a > 0, the related cdf is given by
$$F(x) = \left[1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}\right]^{a}, \quad x > 0.$$
The related applications show that a plays an important role in terms of model flexibility. This idea was then extended by Jafari et al. (2014) by using the so-called beta generator introduced by Eugene et al. (2002). The related cdf is given by
$$F(x) = \frac{1}{B(a, b)} \int_0^{1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}} t^{a-1} (1 - t)^{b-1} \, dt = I_{1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}}(a, b), \quad x > 0, \tag{1}$$
where a, b > 0, B(a, b) denotes the beta function defined by $B(a, b) = \int_0^1 t^{a-1} (1 - t)^{b-1} \, dt$ and $I_x(a, b)$ denotes the incomplete beta function ratio defined by $I_x(a, b) = (1/B(a, b)) \int_0^x t^{a-1} (1 - t)^{b-1} \, dt$, x ∈ [0, 1]. This distribution has recently been extended to a five-parameter distribution based on the beta generator and the generalized Gompertz distribution.
Motivated by the emergence of complex data from many applied areas, other extended Gompertz distributions have been proposed in the literature. See, for instance, El-Damcese et al. (2015), who consider the odd generalized exponential generator introduced by Tahir et al. (2015); Roozegar et al. (2017), who use the McDonald generator introduced by Alexander et al. (2012); Moniem and Seham (2015) and Khan et al. (2017), who apply the transmuted generator introduced by Shaw and Buckley (2007); Lima et al. (2015) and Chukwu and Ogunde (2016), who use the Kumaraswamy generator; Benkhelifa (2017) and Yaghoobzadeh (2017), who consider the Marshall-Olkin generator introduced by Marshall and Olkin (1997); and Shadrokh and Yaghoobzadeh (2018), who consider the beta-G and geometric generators.
In this paper, we present and study a distribution with five parameters extending the Gompertz distribution. It is based on the modified beta generator developed by Nadarajah et al. (2014) (which can also be viewed as a modification of the beta Marshall-Olkin generator developed by Alizadeh et al. (2015)). The advantage of this generator is to nicely combine the advantages of the beta generator of Eugene et al. (2002) and the Marshall-Olkin generator of Marshall and Olkin (1997). To the best of our knowledge, its application to the Gompertz distribution has never been considered before. We provide a comprehensive description of its general mathematical properties (expansions of the cdf and pdf, quantile function, various kinds of moments, moment generating function, entropies and order statistics). The estimation of the model parameters by maximum likelihood is then discussed. Finally, we explore applications to real data sets that illustrate the usefulness of the proposed model.
The rest of the paper is organized as follows. Section 2 describes the modified beta Gompertz distribution. Some mathematical properties are investigated in Section 3. Section 4 provides the material needed to estimate the unknown parameters with the maximum likelihood method. A simulation study is performed in order to test the validity of the obtained maximum likelihood estimators. To illustrate the flexibility of the resulting model, applications to two real-life data sets are also given.

The Modified Beta Gompertz Distribution
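Let c > 0, G(x) be a cdf and g(x) be the related pdf. The modified beta generator introduced by Nadarajah et al. (2014) is characterized by the cdf given by
$$F(x) = I_{\frac{cG(x)}{1 - (1 - c)G(x)}}(a, b). \tag{2}$$
By differentiation of F(x), a pdf is given by
$$f(x) = \frac{c^{a} g(x) [G(x)]^{a-1} [1 - G(x)]^{b-1}}{B(a, b) [1 - (1 - c)G(x)]^{a+b}}. \tag{3}$$
The hrf is given by
$$h(x) = \frac{f(x)}{1 - F(x)} = \frac{c^{a} g(x) [G(x)]^{a-1} [1 - G(x)]^{b-1}}{B(a, b) [1 - (1 - c)G(x)]^{a+b} \left[1 - I_{\frac{cG(x)}{1 - (1 - c)G(x)}}(a, b)\right]}.$$
Let us now present our main distribution of interest. Using the cdf G(x) of the Gompertz distribution with parameters λ > 0 and α > 0 as baseline, the cdf given by (2) becomes
$$F(x) = I_{\frac{c\left(1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}\right)}{1 - (1 - c)\left(1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}\right)}}(a, b), \quad x > 0. \tag{4}$$
The related distribution will be called the modified beta Gompertz distribution (MBGz distribution for short), also denoted by MBGz(λ, α, a, b, c). The related pdf, obtained from (3), is given by
$$f(x) = \frac{c^{a} \lambda e^{\alpha x} e^{-\frac{b\lambda}{\alpha}(e^{\alpha x} - 1)} \left[1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}\right]^{a-1}}{B(a, b) \left[1 - (1 - c)\left(1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}\right)\right]^{a+b}}, \quad x > 0. \tag{5}$$
The hrf is given by
$$h(x) = \frac{f(x)}{1 - F(x)}, \quad x > 0, \tag{6}$$
with F(x) given by (4) and f(x) given by (5).

Figure 1 shows the plots of f(x) and h(x) for selected values of the parameters λ, α, a, b, c. We observe that these functions can take various curvature forms depending on the parameter values, showing the increased flexibility compared with the Gompertz distribution. A strong point of the MBGz distribution is that it contains several useful distributions from the literature. The most popular of them are listed below.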
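• When c = 1/(1 − θ) with θ ∈ (0, 1) (θ is a proportion parameter), we obtain the beta Gompertz geometric distribution introduced by Shadrokh and Yaghoobzadeh (2018), i.e. with cdf
$$F(x) = I_{\frac{1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}}{1 - \theta e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}}}(a, b), \quad x > 0.$$
However, this distribution excludes the case c ∈ (0, 1) by construction. Small values of c can nevertheless be decisive in applications (see Section 4).
• When c = 1, we get the beta Gompertz distribution with four parameters introduced by Jafari et al. (2014), i.e. with cdf
$$F(x) = I_{1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}}(a, b), \quad x > 0.$$
• When c = b = 1, we get the generalized Gompertz distribution studied by El-Gohary et al. (2013), i.e. with cdf
$$F(x) = \left[1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}\right]^{a}, \quad x > 0.$$
• When a = b = 1 and c = 1/θ with θ > 1, we get a particular case of the Marshall-Olkin extended generalized Gompertz distribution introduced by Benkhelifa (2017), i.e. with cdf
$$F(x) = \frac{1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}}{1 + (\theta - 1) e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}}, \quad x > 0.$$
• When c = 1 and α → 0, we get the beta exponential distribution studied by Nadarajah and Kotz (2006), i.e. with cdf
$$F(x) = I_{1 - e^{-\lambda x}}(a, b), \quad x > 0.$$
• When b = c = 1 and α → 0, we get the generalized exponential distribution studied by Gupta and Kundu (1999), i.e. with cdf
$$F(x) = \left(1 - e^{-\lambda x}\right)^{a}, \quad x > 0.$$
• When a = b = c = 1 and α → 0, we get the exponential distribution, i.e. with cdf
$$F(x) = 1 - e^{-\lambda x}, \quad x > 0.$$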
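The reductions listed above can be checked numerically. The following is a minimal sketch, assuming that the modified beta generator has the cdf F(x) = I_{cG(x)/(1−(1−c)G(x))}(a, b) with the Gompertz cdf as baseline G(x); the function names `gompertz_cdf`, `mbgz_cdf` and `mbgz_pdf` are hypothetical and the parameter values are arbitrary.

```python
import numpy as np
from scipy.special import betainc, betaln


def gompertz_cdf(x, lam, alpha):
    """Gompertz cdf G(x) = 1 - exp(-(lam/alpha) * (exp(alpha*x) - 1))."""
    return -np.expm1(-lam / alpha * np.expm1(alpha * x))


def mbgz_cdf(x, lam, alpha, a, b, c):
    """Assumed MBGz cdf: incomplete beta ratio at c*G / (1 - (1 - c)*G)."""
    G = gompertz_cdf(x, lam, alpha)
    z = c * G / (1.0 - (1.0 - c) * G)
    return betainc(a, b, z)


def mbgz_pdf(x, lam, alpha, a, b, c):
    """Assumed MBGz pdf, evaluated on the log scale for x > 0."""
    log_surv = -lam / alpha * np.expm1(alpha * x)  # ln(1 - G(x))
    G = -np.expm1(log_surv)
    log_f = (a * np.log(c) + np.log(lam) + alpha * x + b * log_surv
             + (a - 1.0) * np.log(G) - betaln(a, b)
             - (a + b) * np.log(1.0 - (1.0 - c) * G))
    return np.exp(log_f)


x = np.linspace(0.05, 3.0, 60)
lam, alpha, a, b = 0.5, 0.8, 1.7, 2.3

# c = 1: beta Gompertz cdf I_{G(x)}(a, b)
beta_gompertz = betainc(a, b, gompertz_cdf(x, lam, alpha))
# b = c = 1: generalized Gompertz cdf G(x)^a
gen_gompertz = gompertz_cdf(x, lam, alpha) ** a
# a = b = c = 1: plain Gompertz cdf
gompertz = gompertz_cdf(x, lam, alpha)

# Maximum absolute differences between the MBGz cdf and each special case
print(np.max(np.abs(mbgz_cdf(x, lam, alpha, a, b, 1.0) - beta_gompertz)))
print(np.max(np.abs(mbgz_cdf(x, lam, alpha, a, 1.0, 1.0) - gen_gompertz)))
print(np.max(np.abs(mbgz_cdf(x, lam, alpha, 1.0, 1.0, 1.0) - gompertz)))
```

The pdf is computed on the log scale so that large values of x underflow to zero instead of producing overflow in intermediate terms.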

On the shapes of the pdf
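The shapes of f(x) given by (5) can be described analytically. As usual, the critical points x* of the pdf f(x) satisfy $\frac{\partial}{\partial x} \ln(f(x_*)) = 0$, with
$$\frac{\partial}{\partial x} \ln(f(x)) = \alpha - b\lambda e^{\alpha x} + (a - 1) \frac{g(x)}{G(x)} + (a + b)(1 - c) \frac{g(x)}{1 - (1 - c)G(x)},$$
where G(x) and g(x) denote the cdf and pdf of the Gompertz distribution. The point x* is a local maximum if $\frac{\partial^2}{\partial x^2} \ln(f(x_*)) < 0$, a local minimum if $\frac{\partial^2}{\partial x^2} \ln(f(x_*)) > 0$ and a point of inflection if $\frac{\partial^2}{\partial x^2} \ln(f(x_*)) = 0$. Let us now study the asymptotic properties of f(x). We have
$$f(x) \sim \frac{c^{a} \lambda^{a}}{B(a, b)} x^{a-1}, \quad x \to 0.$$
So, for a ∈ (0, 1), we have lim x→0 f(x) = +∞, for a = 1, we have lim x→0 f(x) = bcλ and for a > 1, we have lim x→0 f(x) = 0. We have
$$f(x) \sim \frac{\lambda}{c^{b} B(a, b)} e^{\alpha x} e^{-\frac{b\lambda}{\alpha}(e^{\alpha x} - 1)}, \quad x \to +\infty.$$
Thus lim x→+∞ f(x) = 0 in all cases. Figure 1 (a) illustrates these points for selected parameters.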

On the shapes of the hrf
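Similarly to the pdf, the critical points x* of the hrf h(x) given by (6) satisfy $\frac{\partial}{\partial x} \ln(h(x_*)) = 0$. Let us now study the asymptotic properties of h(x). Since 1 − F(x) → 1 when x → 0, we have
$$h(x) \sim \frac{c^{a} \lambda^{a}}{B(a, b)} x^{a-1}, \quad x \to 0.$$
So, for a ∈ (0, 1), we have lim x→0 h(x) = +∞, for a = 1, we have lim x→0 h(x) = bcλ and for a > 1, we have lim x→0 h(x) = 0. We have
$$h(x) \sim b\lambda e^{\alpha x}, \quad x \to +\infty.$$
Thus lim x→+∞ h(x) = +∞ in all cases. Figure 1 (b) illustrates these points for selected parameters.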

Linear representation
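Let us determine useful linear representations for F(x) given by (4) and f(x) given by (5). In what follows, G(x) denotes the cdf of the Gompertz distribution with parameters λ and α. First of all, let us suppose that c ∈ (0, 1). It follows from the generalized binomial formula that
$$F(x) = \frac{1}{B(a, b)} \sum_{k=0}^{+\infty} \binom{b-1}{k} \frac{(-1)^{k}}{a + k} \left[\frac{cG(x)}{1 - (1 - c)G(x)}\right]^{a+k}. \tag{7}$$
On the other hand, using again the generalized binomial formula, we obtain
$$[1 - (1 - c)G(x)]^{-(a+k)} = \sum_{\ell=0}^{+\infty} \binom{-(a+k)}{\ell} (-1)^{\ell} (1 - c)^{\ell} [G(x)]^{\ell}.$$
In a similar manner, we have
$$[G(x)]^{a+k+\ell} = \sum_{m=0}^{+\infty} \binom{a+k+\ell}{m} (-1)^{m} (1 - H_m(x)),$$
where $H_m(x) = 1 - e^{-\frac{m\lambda}{\alpha}(e^{\alpha x} - 1)}$ is the cdf of a Gompertz distribution with parameters mλ and α. Combining these equalities, we obtain the following series expansion:
$$F(x) = \sum_{m=0}^{+\infty} v_m (1 - H_m(x)), \tag{8}$$
where
$$v_m = \frac{(-1)^{m}}{B(a, b)} \sum_{k=0}^{+\infty} \sum_{\ell=0}^{+\infty} \binom{b-1}{k} \binom{-(a+k)}{\ell} \binom{a+k+\ell}{m} \frac{(-1)^{k+\ell} c^{a+k} (1 - c)^{\ell}}{a + k}.$$
By differentiation of (8), we get
$$f(x) = \sum_{m=0}^{+\infty} w_m h_m(x), \tag{9}$$
where $w_m = -v_m$ and $h_m(x)$ is the pdf of a Gompertz distribution with parameters mλ and α.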
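For the case c > 1, we must transform equation (7) in order to apply the generalized binomial formula. We can write
$$\frac{cG(x)}{1 - (1 - c)G(x)} = \frac{G(x)}{1 - \left(1 - \frac{1}{c}\right)(1 - G(x))},$$
with 1 − 1/c ∈ (0, 1). On the other hand, we have
$$\left[1 - \left(1 - \frac{1}{c}\right)(1 - G(x))\right]^{-(a+k)} = \sum_{\ell=0}^{+\infty} \binom{-(a+k)}{\ell} (-1)^{\ell} \left(1 - \frac{1}{c}\right)^{\ell} [1 - G(x)]^{\ell}$$
and
$$[G(x)]^{a+k} = \sum_{j=0}^{+\infty} \binom{a+k}{j} (-1)^{j} [1 - G(x)]^{j},$$
with $[1 - G(x)]^{j+\ell} = 1 - H_{j+\ell}(x)$. Therefore, collecting the terms with j + ℓ = m, we can write F(x) as (8) with
$$v_m = v^*_m = \frac{(-1)^{m}}{B(a, b)} \sum_{k=0}^{+\infty} \sum_{\ell=0}^{m} \binom{b-1}{k} \binom{-(a+k)}{\ell} \binom{a+k}{m-\ell} \frac{(-1)^{k}}{a + k} \left(1 - \frac{1}{c}\right)^{\ell}$$
and f(x) as (9) with $w_m = -v^*_m$ (and still $h_m(x)$ is the pdf of a Gompertz distribution with parameters mλ and α). For the sake of simplicity, we shall refer to the form (9) for all series representations of f(x), whether c ∈ (0, 1) or c > 1.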
Hereafter, we denote by X a random variable having the cdf F (x) given by (4) (and the pdf f (x) given by (5)) and by Y m a random variable following the Gompertz distribution with parameters mλ and α, i.e. having the cdf H m (x) (and the pdf h m (x)).

Quantile function
The quantile function of X is given by
$$Q(u) = \frac{1}{\alpha} \ln\left\{1 - \frac{\alpha}{\lambda} \ln\left[1 - \frac{I_u^{-1}(a, b)}{c + (1 - c) I_u^{-1}(a, b)}\right]\right\}, \quad u \in (0, 1),$$
where $I_u^{-1}(a, b)$ denotes the inverse of $I_u(a, b)$. It satisfies F(Q(u)) = Q(F(u)) = u. Using Nadarajah et al. (2014), one can show that From Q(u), we can simulate the MBGz distribution. Indeed, let U be a random variable following the uniform distribution over (0, 1). Then the random variable X = Q(U) follows the MBGz distribution.
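The simulation scheme X = Q(U) can be sketched as follows. This is a minimal illustration, assuming the closed form of Q(u) obtained by inverting the cdf F(x) = I_{cG(x)/(1−(1−c)G(x))}(a, b) with the Gompertz baseline; the names `mbgz_cdf`, `mbgz_quantile` and `mbgz_sample` are hypothetical and the parameter values are arbitrary.

```python
import numpy as np
from scipy.special import betainc, betaincinv


def mbgz_cdf(x, lam, alpha, a, b, c):
    """Assumed MBGz cdf with Gompertz baseline G(x)."""
    G = -np.expm1(-lam / alpha * np.expm1(alpha * x))
    return betainc(a, b, c * G / (1.0 - (1.0 - c) * G))


def mbgz_quantile(u, lam, alpha, a, b, c):
    """Quantile function obtained by inverting the assumed cdf."""
    q = betaincinv(a, b, u)          # I_u^{-1}(a, b)
    G = q / (c + (1.0 - c) * q)      # value of the baseline cdf at Q(u)
    return np.log1p(-(alpha / lam) * np.log1p(-G)) / alpha


def mbgz_sample(n, lam, alpha, a, b, c, seed=None):
    """Inverse transform sampling: X = Q(U) with U uniform on (0, 1)."""
    rng = np.random.default_rng(seed)
    return mbgz_quantile(rng.uniform(size=n), lam, alpha, a, b, c)


params = dict(lam=0.5, alpha=0.8, a=1.7, b=2.3, c=0.4)
u = np.linspace(0.01, 0.99, 99)
round_trip = mbgz_cdf(mbgz_quantile(u, **params), **params)
sample = mbgz_sample(5000, seed=1, **params)
median = mbgz_quantile(0.5, **params)

# Largest round-trip error and the share of the sample below the median
print(np.max(np.abs(round_trip - u)), np.mean(sample <= median))
```

The round trip F(Q(u)) should reproduce u up to numerical error, and about half of the simulated values should fall below Q(1/2).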

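The median of X is given by M = Q(1/2). We can also use Q(u) to define skewness and kurtosis measures. Let us just introduce the Bowley skewness based on quartiles and the Moors kurtosis based on octiles, respectively defined by
$$B = \frac{Q(3/4) + Q(1/4) - 2Q(1/2)}{Q(3/4) - Q(1/4)}, \qquad M_o = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)}.$$
Contrary to the moment-based skewness γ1 and kurtosis γ2, these quantities have the advantage of being always defined. We refer to Kenney and Keeping (1962) and Moors (1988).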
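Both measures only require the quantile function, so they are cheap to evaluate. The sketch below assumes the standard quartile-based Bowley skewness and octile-based Moors kurtosis, together with a quantile function obtained by inverting the cdf F(x) = I_{cG(x)/(1−(1−c)G(x))}(a, b); the function names and parameter values are hypothetical.

```python
import numpy as np
from scipy.special import betaincinv


def mbgz_quantile(u, lam, alpha, a, b, c):
    """Quantile function obtained by inverting the assumed MBGz cdf."""
    q = betaincinv(a, b, u)
    G = q / (c + (1.0 - c) * q)
    return np.log1p(-(alpha / lam) * np.log1p(-G)) / alpha


def bowley_skewness(Q):
    """Quartile-based skewness for a quantile function Q."""
    return (Q(0.75) + Q(0.25) - 2.0 * Q(0.5)) / (Q(0.75) - Q(0.25))


def moors_kurtosis(Q):
    """Octile-based kurtosis for a quantile function Q."""
    return (Q(7 / 8) - Q(5 / 8) + Q(3 / 8) - Q(1 / 8)) / (Q(6 / 8) - Q(2 / 8))


def Q(u):
    # Arbitrary illustrative parameter values
    return mbgz_quantile(u, lam=0.5, alpha=0.8, a=1.7, b=2.3, c=0.4)


print(bowley_skewness(Q), moors_kurtosis(Q))
```

Passing the quantile function as an argument makes it easy to compare the effect of the parameters a, b and c on both measures.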

Moments
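Let r be a positive integer. The r-th ordinary moment of X is defined by $\mu'_r = E(X^r) = \int_{-\infty}^{+\infty} x^r f(x) \, dx$. Using the linear representation given by (9), we can express $\mu'_r$ as
$$\mu'_r = \sum_{m=0}^{+\infty} w_m \int_0^{+\infty} x^r h_m(x) \, dx = \sum_{m=0}^{+\infty} w_m E(Y_m^r).$$
By doing the change of variables $u = e^{\alpha x}$, we obtain
$$\int_0^{+\infty} x^r h_m(x) \, dx = \frac{m\lambda}{\alpha^{r+1}} e^{\frac{m\lambda}{\alpha}} \int_1^{+\infty} (\ln u)^r e^{-\frac{m\lambda}{\alpha} u} \, du.$$
This integral has connections with the so-called generalized integro-exponential function. Further developments can be found in Milgram (1985) and Lenart (2014). Therefore we have
$$\mu'_r = \frac{\lambda}{\alpha^{r+1}} \sum_{m=0}^{+\infty} m w_m e^{\frac{m\lambda}{\alpha}} \int_1^{+\infty} (\ln u)^r e^{-\frac{m\lambda}{\alpha} u} \, du.$$
Obviously, the mean of X is given by $E(X) = \mu'_1$ and the variance of X is given by $V(X) = \mu'_2 - (\mu'_1)^2$.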
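As a numerical complement, the moments can also be obtained by direct integration of x^r f(x). The sketch below assumes the MBGz pdf derived from the modified beta generator with a Gompertz baseline, and checks the change of variables u = e^{αx} in the Gompertz special case a = b = c = 1; `mbgz_moment` and `gompertz_moment` are hypothetical helper names.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import betaln


def mbgz_pdf(x, lam, alpha, a, b, c):
    """Assumed MBGz pdf, evaluated on the log scale for x > 0."""
    log_surv = -lam / alpha * np.expm1(alpha * x)  # ln(1 - G(x))
    G = -np.expm1(log_surv)
    log_f = (a * np.log(c) + np.log(lam) + alpha * x + b * log_surv
             + (a - 1.0) * np.log(G) - betaln(a, b)
             - (a + b) * np.log(1.0 - (1.0 - c) * G))
    return np.exp(log_f)


def mbgz_moment(r, lam, alpha, a, b, c):
    """r-th ordinary moment by numerical integration of x^r f(x)."""
    with np.errstate(over="ignore"):  # far tail: exp overflows, pdf is 0
        value, _ = quad(lambda x: x**r * mbgz_pdf(x, lam, alpha, a, b, c),
                        0.0, np.inf)
    return value


def gompertz_moment(r, lam, alpha):
    """Gompertz moment after the change of variables u = exp(alpha * x)."""
    z = lam / alpha
    integral, _ = quad(lambda u: np.log(u)**r * np.exp(-z * u), 1.0, np.inf)
    return lam / alpha**(r + 1) * np.exp(z) * integral


mean = mbgz_moment(1, 0.5, 0.8, 1.7, 2.3, 0.4)
variance = mbgz_moment(2, 0.5, 0.8, 1.7, 2.3, 0.4) - mean**2
print(mean, variance)
# Gompertz special case: both routes should agree
print(mbgz_moment(1, 0.5, 0.8, 1.0, 1.0, 1.0), gompertz_moment(1, 0.5, 0.8))
```

Direct integration avoids truncating the infinite series, at the cost of one quadrature per moment.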

Skewness
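The r-th central moment of X is given by $\mu_r = E[(X - \mu'_1)^r]$. It follows from the binomial formula that
$$\mu_r = \sum_{k=0}^{r} \binom{r}{k} (-1)^{k} (\mu'_1)^{k} \mu'_{r-k}.$$
On the other hand, the r-th cumulant of X can be obtained via the equation
$$\kappa_r = \mu'_r - \sum_{k=1}^{r-1} \binom{r-1}{k-1} \kappa_k \mu'_{r-k},$$
with $\kappa_1 = \mu'_1$. The skewness of X is given by $\gamma_1 = \kappa_3 / \kappa_2^{3/2}$ and the kurtosis of X is given by $\gamma_2 = \kappa_4 / \kappa_2^{2}$. One can also introduce the MacGillivray skewness given by
$$\rho(u) = \frac{Q(1 - u) + Q(u) - 2Q(1/2)}{Q(1 - u) - Q(u)}, \quad u \in (0, 1).$$
It illustrates the effects of the parameters a, b, α and λ on the skewness. Further details can be found in MacGillivray (1986).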

Moment generating function
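The moment generating function of X is given by $M_X(t) = E(e^{tX}) = \int_{-\infty}^{+\infty} e^{tx} f(x) \, dx$. Using (9), we have
$$M_X(t) = \sum_{m=0}^{+\infty} w_m M_{Y_m}(t),$$
where $M_{Y_m}(t) = E(e^{tY_m})$ is the moment generating function of $Y_m$. Doing successively the change of variables $u = e^{\alpha x}$ and the change of variables $v = \frac{m\lambda}{\alpha} u$, we obtain
$$M_{Y_m}(t) = \frac{m\lambda}{\alpha} e^{\frac{m\lambda}{\alpha}} \int_1^{+\infty} u^{\frac{t}{\alpha}} e^{-\frac{m\lambda}{\alpha} u} \, du = e^{\frac{m\lambda}{\alpha}} \left(\frac{\alpha}{m\lambda}\right)^{\frac{t}{\alpha}} \Gamma\left(\frac{t}{\alpha} + 1, \frac{m\lambda}{\alpha}\right),$$
where $\Gamma(s, x) = \int_x^{+\infty} v^{s-1} e^{-v} \, dv$ denotes the upper incomplete gamma function. Alternatively, using the moments of X, one can write
$$M_X(t) = \sum_{r=0}^{+\infty} \frac{t^r}{r!} \mu'_r = \sum_{r=0}^{+\infty} \sum_{m=0}^{+\infty} \frac{t^r}{r!} w_m \frac{m\lambda}{\alpha^{r+1}} e^{\frac{m\lambda}{\alpha}} \int_1^{+\infty} (\ln u)^r e^{-\frac{m\lambda}{\alpha} u} \, du.$$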
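The two changes of variables lead to an upper incomplete gamma function, which can be checked against direct quadrature. The sketch below treats a single Gompertz component (replace `lam` by m·λ for Y_m) and assumes the closed form e^{λ/α}(α/λ)^{t/α} Γ(t/α + 1, λ/α), valid for t/α + 1 > 0; the function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, gammaincc


def gompertz_mgf_closed(t, lam, alpha):
    """Assumed closed form via the upper incomplete gamma function."""
    s, z = t / alpha + 1.0, lam / alpha
    upper_gamma = gammaincc(s, z) * gamma(s)  # unregularized Gamma(s, z)
    return np.exp(z) * z ** (-t / alpha) * upper_gamma


def gompertz_mgf_numeric(t, lam, alpha):
    """Direct numerical integration of exp(t * x) times the Gompertz pdf."""
    def integrand(x):
        log_h = np.log(lam) + alpha * x - lam / alpha * np.expm1(alpha * x)
        return np.exp(t * x + log_h)

    with np.errstate(over="ignore"):  # far tail: integrand underflows to 0
        value, _ = quad(integrand, 0.0, np.inf)
    return value


t, lam, alpha = 0.3, 0.5, 0.8
print(gompertz_mgf_closed(t, lam, alpha), gompertz_mgf_numeric(t, lam, alpha))
```

SciPy's `gammaincc` is the regularized upper incomplete gamma function, hence the multiplication by `gamma(s)`.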

Incomplete moments and mean deviations
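The r-th incomplete moment of X is defined by $m_r(t) = E(X^r 1_{\{X \le t\}}) = \int_{-\infty}^{t} x^r f(x) \, dx$. Using (9), we can express $m_r(t)$, for t > 0, as
$$m_r(t) = \sum_{m=0}^{+\infty} w_m \int_0^{t} x^r h_m(x) \, dx.$$
By doing the change of variables $u = e^{\alpha x}$, we obtain
$$m_r(t) = \frac{\lambda}{\alpha^{r+1}} \sum_{m=0}^{+\infty} m w_m e^{\frac{m\lambda}{\alpha}} \int_1^{e^{\alpha t}} (\ln u)^r e^{-\frac{m\lambda}{\alpha} u} \, du.$$
The mean deviation of X about the mean is given by
$$\delta_1 = E(|X - \mu'_1|) = 2\mu'_1 F(\mu'_1) - 2 m_1(\mu'_1),$$
where $m_1(t)$ denotes the first incomplete moment. The mean deviation of X about the median M = Q(1/2) is given by
$$\delta_2 = E(|X - M|) = \mu'_1 - 2 m_1(M).$$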

Entropies
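Let us now investigate different kinds of entropy measures. The Rényi entropy of X is defined by
$$I_\gamma(X) = \frac{1}{1 - \gamma} \ln\left[\int_{-\infty}^{+\infty} f(x)^{\gamma} \, dx\right],$$
with γ > 0 and γ ≠ 1. It follows from (3) that
$$f(x)^{\gamma} = \frac{c^{a\gamma} \lambda^{\gamma}}{B(a, b)^{\gamma}} e^{\gamma\alpha x} e^{-\frac{\gamma b\lambda}{\alpha}(e^{\alpha x} - 1)} \frac{[G(x)]^{\gamma(a-1)}}{[1 - (1 - c)G(x)]^{\gamma(a+b)}},$$
where G(x) denotes the cdf of the Gompertz distribution. For c ∈ (0, 1), the generalized binomial formula implies that
$$\frac{[G(x)]^{\gamma(a-1)}}{[1 - (1 - c)G(x)]^{\gamma(a+b)}} = \sum_{k=0}^{+\infty} \sum_{\ell=0}^{+\infty} \binom{-\gamma(a+b)}{k} \binom{\gamma(a-1)+k}{\ell} (-1)^{k+\ell} (1 - c)^{k} e^{-\frac{\ell\lambda}{\alpha}(e^{\alpha x} - 1)}.$$
By doing the change of variables $u = e^{\alpha x}$ and the change of variables $v = (\ell + \gamma b)\frac{\lambda}{\alpha} u$, we get
$$\int_0^{+\infty} e^{\gamma\alpha x} e^{-(\ell + \gamma b)\frac{\lambda}{\alpha}(e^{\alpha x} - 1)} \, dx = \frac{1}{\alpha} e^{(\ell + \gamma b)\frac{\lambda}{\alpha}} \left[\frac{\alpha}{(\ell + \gamma b)\lambda}\right]^{\gamma} \Gamma\left(\gamma, (\ell + \gamma b)\frac{\lambda}{\alpha}\right),$$
where $\Gamma(s, x) = \int_x^{+\infty} v^{s-1} e^{-v} \, dv$ denotes the upper incomplete gamma function. By putting the above equalities together, we have
$$I_\gamma(X) = \frac{1}{1 - \gamma} \left\{ a\gamma \ln c + \gamma \ln \lambda - \gamma \ln B(a, b) - \ln \alpha + \ln\left[\sum_{k=0}^{+\infty} \sum_{\ell=0}^{+\infty} \binom{-\gamma(a+b)}{k} \binom{\gamma(a-1)+k}{\ell} (-1)^{k+\ell} (1 - c)^{k} e^{(\ell + \gamma b)\frac{\lambda}{\alpha}} \left[\frac{\alpha}{(\ell + \gamma b)\lambda}\right]^{\gamma} \Gamma\left(\gamma, (\ell + \gamma b)\frac{\lambda}{\alpha}\right)\right] \right\}.$$
The Shannon entropy of X, defined by S(X) = E(− ln[f(X)]), is a particular case of the Rényi entropy when γ tends to 1⁺. The γ-entropy is defined by
$$H_\gamma(X) = \frac{1}{\gamma - 1} \ln\left[1 - \int_{-\infty}^{+\infty} f(x)^{\gamma} \, dx\right].$$
Using the expansion above, we obtain
$$H_\gamma(X) = \frac{1}{\gamma - 1} \ln\left[1 - \frac{c^{a\gamma} \lambda^{\gamma}}{\alpha B(a, b)^{\gamma}} \sum_{k=0}^{+\infty} \sum_{\ell=0}^{+\infty} \binom{-\gamma(a+b)}{k} \binom{\gamma(a-1)+k}{\ell} (-1)^{k+\ell} (1 - c)^{k} e^{(\ell + \gamma b)\frac{\lambda}{\alpha}} \left[\frac{\alpha}{(\ell + \gamma b)\lambda}\right]^{\gamma} \Gamma\left(\gamma, (\ell + \gamma b)\frac{\lambda}{\alpha}\right)\right].$$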

Order statistics
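Let $X_1, \ldots, X_n$ be a random sample from X and $X_{i:n}$ be the i-th order statistic. Then the pdf of $X_{i:n}$ is given by
$$f_{i:n}(x) = \frac{n!}{(i-1)!(n-i)!} f(x) [F(x)]^{i-1} [1 - F(x)]^{n-i} = \frac{n!}{(i-1)!(n-i)!} \sum_{j=0}^{n-i} \binom{n-i}{j} (-1)^{j} f(x) [F(x)]^{j+i-1}.$$
It follows from (8) and (9) that
$$f(x) [F(x)]^{j+i-1} = \sum_{m=0}^{+\infty} w_m h_m(x) \left[\sum_{k=0}^{+\infty} v_k (1 - H_k(x))\right]^{j+i-1}.$$
Using a result from Gradshteyn and Ryzhik (2000), a power series raised to a positive integer power n can be expanded as follows:
$$\left(\sum_{k=0}^{+\infty} a_k x^{k}\right)^{n} = \sum_{k=0}^{+\infty} d_{n,k} x^{k},$$
where the coefficients $(d_{n,k})_{k \in \mathbb{N}}$ are determined from the recurrence equation: $d_{n,0} = a_0^{n}$ and, for any m ≥ 1, $d_{n,m} = (1/(m a_0)) \sum_{k=1}^{m} (k(n + 1) - m) a_k d_{n,m-k}$. Therefore, noticing that $1 - H_k(x) = \left[e^{-\frac{\lambda}{\alpha}(e^{\alpha x} - 1)}\right]^{k}$, we obtain
$$\left[\sum_{k=0}^{+\infty} v_k (1 - H_k(x))\right]^{j+i-1} = \sum_{k=0}^{+\infty} d_{j+i-1,k} (1 - H_k(x)),$$
where the coefficients $d_{j+i-1,k}$ follow the recurrence above with $a_k = v_k$. By combining the equalities above, we obtain
$$f_{i:n}(x) = \frac{n!}{(i-1)!(n-i)!} \sum_{j=0}^{n-i} \sum_{m=1}^{+\infty} \sum_{k=0}^{+\infty} \binom{n-i}{j} (-1)^{j} w_m d_{j+i-1,k} h_m(x) (1 - H_k(x)).$$
Finally, one can observe that $h_m(x)(1 - H_k(x)) = m\lambda e^{\alpha x} e^{-\frac{(m+k)\lambda}{\alpha}(e^{\alpha x} - 1)} = \frac{m}{m+k} u_{m+k}(x)$, where $u_{m+k}(x)$ denotes the pdf of the Gompertz distribution with parameters (m + k)λ and α. So the pdf of the i-th order statistic of the MBGz distribution can be expressed as a linear combination of Gompertz pdfs, i.e.
$$f_{i:n}(x) = \frac{n!}{(i-1)!(n-i)!} \sum_{j=0}^{n-i} \sum_{m=1}^{+\infty} \sum_{k=0}^{+\infty} \binom{n-i}{j} (-1)^{j} \frac{m}{m+k} w_m d_{j+i-1,k} u_{m+k}(x).$$
Let r be a positive integer. Then the r-th ordinary moment of $X_{i:n}$ can be expressed as
$$E(X_{i:n}^{r}) = \frac{n!}{(i-1)!(n-i)!} \sum_{j=0}^{n-i} \sum_{m=1}^{+\infty} \sum_{k=0}^{+\infty} \binom{n-i}{j} (-1)^{j} \frac{m}{m+k} w_m d_{j+i-1,k} \int_0^{+\infty} x^{r} u_{m+k}(x) \, dx.$$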

Maximum likelihood estimation
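We now investigate the estimation of the parameters of the MBGz distribution. Let $x_1, \ldots, x_n$ be n observed values from the MBGz distribution and ξ = (λ, α, a, b, c) be the vector of unknown parameters. The log-likelihood function is given by
$$\ell(\xi) = na \ln c - n \ln B(a, b) + n \ln \lambda + \alpha \sum_{i=1}^{n} x_i - \frac{b\lambda}{\alpha} \sum_{i=1}^{n} (e^{\alpha x_i} - 1) + (a - 1) \sum_{i=1}^{n} \ln G_i - (a + b) \sum_{i=1}^{n} \ln[1 - (1 - c) G_i],$$
where $G_i = 1 - e^{-\frac{\lambda}{\alpha}(e^{\alpha x_i} - 1)}$.
The maximum likelihood estimators of the parameters are obtained by maximizing the log-likelihood function. They can be obtained by solving the non-linear equations:
$$\frac{\partial \ell}{\partial \lambda} = \frac{n}{\lambda} - \frac{b}{\alpha} \sum_{i=1}^{n} (e^{\alpha x_i} - 1) + (a - 1) \sum_{i=1}^{n} \frac{1}{G_i} \frac{\partial G_i}{\partial \lambda} + (a + b)(1 - c) \sum_{i=1}^{n} \frac{1}{1 - (1 - c) G_i} \frac{\partial G_i}{\partial \lambda} = 0,$$
$$\frac{\partial \ell}{\partial \alpha} = \sum_{i=1}^{n} x_i - b\lambda \sum_{i=1}^{n} \left[\frac{x_i e^{\alpha x_i}}{\alpha} - \frac{e^{\alpha x_i} - 1}{\alpha^{2}}\right] + (a - 1) \sum_{i=1}^{n} \frac{1}{G_i} \frac{\partial G_i}{\partial \alpha} + (a + b)(1 - c) \sum_{i=1}^{n} \frac{1}{1 - (1 - c) G_i} \frac{\partial G_i}{\partial \alpha} = 0,$$
$$\frac{\partial \ell}{\partial a} = n \ln c - n[\psi(a) - \psi(a + b)] + \sum_{i=1}^{n} \ln G_i - \sum_{i=1}^{n} \ln[1 - (1 - c) G_i] = 0,$$
$$\frac{\partial \ell}{\partial b} = -n[\psi(b) - \psi(a + b)] - \frac{\lambda}{\alpha} \sum_{i=1}^{n} (e^{\alpha x_i} - 1) - \sum_{i=1}^{n} \ln[1 - (1 - c) G_i] = 0,$$
$$\frac{\partial \ell}{\partial c} = \frac{na}{c} - (a + b) \sum_{i=1}^{n} \frac{G_i}{1 - (1 - c) G_i} = 0,$$
where ψ denotes the digamma function, $\frac{\partial G_i}{\partial \lambda} = \frac{1}{\alpha}(e^{\alpha x_i} - 1)(1 - G_i)$ and $\frac{\partial G_i}{\partial \alpha} = \lambda (1 - G_i) \left[\frac{x_i e^{\alpha x_i}}{\alpha} - \frac{e^{\alpha x_i} - 1}{\alpha^{2}}\right]$.
These non-linear equations have no closed-form solution and must be solved simultaneously by numerical methods; any standard mathematical software package can be used to obtain the maximum likelihood estimators of the unknown parameters. Also, all the second-order derivatives exist. As usual, the normal approximation for the maximum likelihood estimators can be used for constructing approximate confidence intervals and confidence regions, and for testing hypotheses on λ, α, a, b, c.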
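In practice, the likelihood is usually maximized directly rather than by solving the score equations. The following is a minimal sketch, assuming the MBGz log-likelihood implied by the modified beta generator with a Gompertz baseline; the data are simulated, the function names, starting values and bounds are hypothetical choices, and SciPy's L-BFGS-B routine stands in for whichever optimizer is preferred.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaincinv, betaln


def mbgz_negloglik(theta, x):
    """Negative log-likelihood of the assumed MBGz model for data x > 0."""
    lam, alpha, a, b, c = theta
    with np.errstate(all="ignore"):
        log_surv = -lam / alpha * np.expm1(alpha * x)  # ln(1 - G(x_i))
        G = -np.expm1(log_surv)
        loglik = np.sum(a * np.log(c) + np.log(lam) + alpha * x + b * log_surv
                        + (a - 1.0) * np.log(G) - betaln(a, b)
                        - (a + b) * np.log(1.0 - (1.0 - c) * G))
    # Large finite penalty keeps the optimizer away from invalid regions
    return -loglik if np.isfinite(loglik) else 1e10


def mbgz_quantile(u, lam, alpha, a, b, c):
    """Quantile function used here only to simulate illustrative data."""
    q = betaincinv(a, b, u)
    G = q / (c + (1.0 - c) * q)
    return np.log1p(-(alpha / lam) * np.log1p(-G)) / alpha


rng = np.random.default_rng(0)
true_theta = (0.5, 0.8, 1.7, 2.3, 0.4)  # arbitrary values for illustration
x = mbgz_quantile(rng.uniform(size=300), *true_theta)

start = np.ones(5)                      # hypothetical starting point
bounds = [(1e-3, 10.0)] * 5             # keep every parameter positive
fit = minimize(mbgz_negloglik, start, args=(x,), method="L-BFGS-B",
               bounds=bounds)
print(fit.x, -fit.fun)
```

With five parameters the likelihood surface can be flat in some directions, so it is prudent to try several starting points and compare the maximized log-likelihoods.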

Simulation
It is very difficult to compare the theoretical performances of the maximum likelihood estimates (MLEs) of the different parameters of the MBGz distribution. Therefore, a simulation is needed to assess the performances of the MLEs, mainly with respect to their mean squared errors (MSEs), for different sample sizes. The numerical study is performed using the Mathematica 9 software. The sample sizes considered are n = 50, 100 and 150, and each experiment is repeated 3000 times. In each experiment, the parameters are estimated by the maximum likelihood method. The means and MSEs of the estimators are computed from these experiments. We can see from Table 1 that the MSE decreases as n increases.
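The structure of such a Monte Carlo experiment can be sketched in a few lines. To keep the example fast and self-contained, the sketch below does not refit the five-parameter model; it uses the exponential special case (a = b = c = 1, α → 0), whose rate has the closed-form MLE 1/x̄. The function name `simulate_mle_mse` and the rate value are hypothetical, and the code is independent of the Mathematica study described above.

```python
import numpy as np


def simulate_mle_mse(true_rate, sample_sizes, n_rep=3000, seed=0):
    """Mean and MSE of the exponential-rate MLE for each sample size."""
    rng = np.random.default_rng(seed)
    summary = {}
    for n in sample_sizes:
        # One row per replication, one column per observation
        samples = rng.exponential(scale=1.0 / true_rate, size=(n_rep, n))
        estimates = 1.0 / samples.mean(axis=1)  # closed-form MLE of the rate
        mse = np.mean((estimates - true_rate) ** 2)
        summary[n] = (estimates.mean(), mse)
    return summary


summary = simulate_mle_mse(true_rate=0.5, sample_sizes=(50, 100, 150))
for n, (mean_est, mse) in summary.items():
    print(n, mean_est, mse)
```

For the full MBGz model, the line computing `estimates` would be replaced by a numerical maximization of the log-likelihood for each simulated sample.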

Applications
This section provides applications showing how the MBGz distribution can be used in practice. We compare the MBGz distribution with the exponentiated generalized Weibull-Gompertz distribution (EGWGz) of El-Bassiouny et al. (2017) and with other well-known distributions in the literature: the Kumaraswamy-Gompertz (Kw-Gz), beta Gompertz (BGz) and Gompertz (Gz) models. The MLEs are computed using a quasi-Newton code for bound-constrained optimization, and the log-likelihood function is evaluated at the estimates. The goodness-of-fit measures Anderson-Darling (A*), Cramér-von Mises (W*), Akaike information criterion (AIC) and Bayesian information criterion (BIC), as well as the maximized log-likelihood ($\hat{\ell}$), are computed. The lower the values of A*, W*, AIC and BIC, the better the fit. The value of the Kolmogorov-Smirnov (KS) statistic and its p-value are also provided. The required computations are carried out in the R software.
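The information criteria and the KS test are straightforward to compute once a maximized log-likelihood is available. The sketch below is a Python illustration (the computations reported here were done in R), assuming the usual definitions AIC = 2k − 2ℓ̂ and BIC = k ln n − 2ℓ̂ with k parameters and n observations. The data are simulated placeholders rather than one of the data sets of this paper, and the true Gompertz parameters are used in place of fitted ones.

```python
import numpy as np
from scipy.stats import kstest


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * np.log(n) - 2.0 * loglik


def gompertz_cdf(x, lam, alpha):
    return -np.expm1(-lam / alpha * np.expm1(alpha * x))


def gompertz_loglik(x, lam, alpha):
    return np.sum(np.log(lam) + alpha * x - lam / alpha * np.expm1(alpha * x))


# Placeholder data simulated from a Gompertz model (not the paper's data)
rng = np.random.default_rng(0)
lam, alpha = 0.05, 0.5
u = rng.uniform(size=40)
data = np.log1p(-(alpha / lam) * np.log1p(-u)) / alpha

loglik = gompertz_loglik(data, lam, alpha)
ks = kstest(data, lambda t: gompertz_cdf(t, lam, alpha))
print(aic(loglik, 2), bic(loglik, 2, data.size), ks.statistic, ks.pvalue)
```

The same helpers apply to any of the competing models by swapping in its log-likelihood, cdf and number of parameters.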

Data set 1
The first data set represents the time to failure (10^3 h) of the turbocharger of one type of engine, given in Xu et al. (2003). The data set is as follows: 0.

Data set 2
The second data set was given by Badar and Priest (1982). It corresponds to a single fiber with 20 and 101 mm of gauge length, respectively. The data set is as follows: 1.6, 2.0, 2.6, 3.0, 3.5, 3.9, 4.5, 4.6, 4.8, 5.0, 5.1, 5.3, 5.4, 5.6, 5.8, 6.0, 6.0, 6.1, 6.3, 6.5, 6.5, 6.7, 7.0, 7.1, 7.3, 7.3, 7.3, 7.7, 7.7, 7.8, 7.9, 8.0, 8.1, 8.3, 8.4, 8.4, 8.5, 8.7, 8.8, 9.0
Tables 2 and 4 list the maximum likelihood estimates (and the corresponding standard errors in parentheses) of the unknown parameters of the MBGz distribution for Data Set 1 and Data Set 2, respectively. Tables 3 and 5 show the values of the AIC, BIC, W*, A* and KS statistics, together with the KS p-value, for all the considered models. We see that the proposed MBGz model fits these data better than the other models. The MBGz model may thus be an interesting alternative to other models available in the literature for modeling positive real data. To complete this analysis, PP, QQ, epdf and ecdf plots of the MBGz distribution are given in Figures 2 and 3 for Data Set 1 and Data Set 2, respectively.