Discrete Gompertz-G Family of Distributions for Over- and Under-Dispersed Data with Properties, Estimation, and Applications

Alizadeh et al. introduced a flexible family of distributions, the so-called Gompertz-G family. In this article, a discrete analogue of the Gompertz-G family is proposed. We also study some of its distributional properties and reliability characteristics. After introducing the general class, three special models of the new family are discussed in detail. The maximum likelihood method is used for estimating the family parameters. A simulation study is carried out to assess the performance of the estimators of the family parameters. Finally, the flexibility of the new family is illustrated by means of four real datasets, and it is found that the proposed model provides a better fit than the competitive distributions.


Introduction
In probability and statistics, the Gompertz (Gz) distribution is a continuous probability distribution, named after Benjamin Gompertz. This distribution is a generalization of the exponential (Ex) distribution. The random variable T is said to have the Gz distribution with the shape parameter θ > 0 and scale parameter c > 0, if its cumulative distribution function (CDF) is given by The Gz distribution is often applied to describe the distribution of adult lifespans by demographers and actuaries. Related fields of science such as biology and gerontology also consider the Gz distribution for the analysis of survival. More recently, computer scientists have also started to model the failure rates of computer codes using the Gz distribution. In marketing science, it has been used as an individual-level simulation for customer lifetime value modeling. For more details, see Willemse et al. [1], Preston et al. [2], Melnikov and Romaniuk [3], Ohishi et al. [4], Bemmaor et al. [5], Cordeiro et al. [6], El-Bassiouny et al. [7][8][9], Alzaatreh et al. [10], Roozegar et al. [11], Mazucheli et al. [12], Eliwa et al. [13], among others.
Recently, discretizing continuous distributions has received much attention in the statistical literature. The need for discretization generally arises when it becomes impossible or inconvenient to measure the life length of a product or a device on a continuous scale. Such situations may arise when the lifetimes need to be recorded on a discrete scale rather than on a continuous analogue. Therefore, several discrete distributions have been presented in the literature. See, for example, Roy [22], Gómez-Déniz [23], Bebbington et al. [24], Nooghabi et al. [25], Nekoukhou et al. [26], Bakouch et al. [27], Nekoukhou and Bidram [28], Chandrakant et al. [29], Para and Jan [30], Mazucheli et al. [31], El-Morshedy et al. [17,20,32], Eliwa and El-Morshedy [33], among others. Although there are a number of discrete distributions in the statistical literature, there is still ample room to develop new discretized distributions that are suitable under different conditions. Therefore, in this paper, we introduce a flexible discrete generator of distributions, the so-called discrete Gz-G (DGz-G) family. Our reasons for introducing the DGz-G family are the following:

1. To generate models with a negatively skewed, a positively skewed, or a symmetric shape;
2. To define special models with all types of hazard rate function;
3. To propose models which are appropriate for modeling both over- and under-dispersed data;
4. To generate models for modeling both lifetime and counting datasets;
5. To provide consistently better fits than other generated models under the same baseline distribution and other well-known models in the statistical literature.
The paper is organized as follows. In Section 2, the DGz-G family of distributions is defined. Some statistical and reliability properties of the DGz-G family are obtained in Section 3. In Section 4, three special models of the proposed family are discussed in detail. The family parameters are estimated by the maximum likelihood method in Section 5. In Section 6, a simulation study is performed. In Section 7, the usefulness of the DGz-G family is illustrated by means of four real datasets, where we show empirically that the DGz-G family outperforms some well-known distributions. Section 8 offers some concluding remarks.

Quantile Function (QF)
For the DGz-G family, the $q$th QF, say $z_q$, is the solution of $F_Z(z_q) - q = 0$; $z_q > 0$, then where $q \in (0, 1)$ and $G^{-1}$ represents the baseline QF. Setting $q = 0.5$, we get the median of the DGz-G family.
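Because $Z$ is integer-valued, the equation $F_Z(z_q) = q$ generally has no exact integer solution, and in numerical work the $q$th quantile is commonly taken as the smallest integer at which the CDF reaches $q$. The following Python sketch illustrates that convention only. The function name `discrete_quantile` and the geometric stand-in CDF are assumptions made for illustration; they are not the DGz-G CDF of this paper.

```python
def discrete_quantile(cdf, q, z_max=10**6):
    """Smallest non-negative integer z with cdf(z) >= q.

    `cdf` is any non-decreasing function on the non-negative integers.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in (0, 1)")
    if cdf(0) >= q:
        return 0
    # Doubling search for a bracket with cdf(lo) < q <= cdf(hi).
    lo, hi = 0, 1
    while cdf(hi) < q:
        lo, hi = hi, hi * 2
        if hi > z_max:
            raise ValueError("quantile not found below z_max")
    # Integer bisection keeps the bracket invariant until hi = lo + 1.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cdf(mid) >= q:
            hi = mid
        else:
            lo = mid
    return hi


def geometric_cdf(z, p=0.5):
    """Hypothetical stand-in CDF: F(z) = 1 - p^(z + 1), z = 0, 1, 2, ..."""
    return 1.0 - p ** (z + 1)


median = discrete_quantile(geometric_cdf, 0.5)
upper_percentile = discrete_quantile(geometric_cdf, 0.99)
```

Any CDF of the family could be passed in place of `geometric_cdf`, provided it is evaluated at integer arguments.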

Moments, Dispersion Index, Skewness, Kurtosis, and Cumulants
Assume a non-negative random variable $Z \sim$ DGz-G$(z; p, c, \psi)$ family, then the $r$th moment of $Z$ can be expressed as On the other hand, the moment generating function (MGF) can be represented as where The first four derivatives of Equation (16), with respect to $t$ at $t = 0$, yield the first four moments about the origin, i.e., $E(Z^r) = \frac{d^r}{dt^r} M_Z(t)\big|_{t=0}$. Moreover, utilizing Equation (13) or (16), the skewness (Sk) and kurtosis (Ku) can be expressed as $\mathrm{Sk} = (\mu_3 - 3\mu_2\mu_1 + 2\mu_1^3)/(\mathrm{Var})^{3/2}$ and $\mathrm{Ku} = (\mu_4 - 4\mu_3\mu_1 + 6\mu_2\mu_1^2 - 3\mu_1^4)/(\mathrm{Var})^2$, respectively. In probability theory, the cumulants, say $k_n$, of a probability model are a set of quantities that provide an alternative to the moments of the model. In some cases, theoretical treatments of problems in terms of cumulants are simpler than those using moments. The cumulant generating function (CGF) is the logarithm of the MGF. Thus, the $k_n$ can be recovered in terms of moments as follows: $k_1 = \mu_1$, $k_2 = \mu_2 - \mu_1^2$, $k_3 = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3$, and $k_4 = \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 - 6\mu_1^4$. Further, the cumulants are also related to the moments by the following recursion formula: $k_n = \mu_n - \sum_{m=1}^{n-1}\binom{n-1}{m-1}k_m\mu_{n-m}$. The first cumulant is the mean, the second cumulant is the variance, and the third cumulant is the same as the third central moment. However, the fourth and higher-order cumulants are not equal to central moments.
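The measures above can be evaluated numerically once the raw moments are available. The Python sketch below computes the raw moments of a PMF by direct summation, then the dispersion index (Var/mean), Sk, Ku, and the cumulants through the standard moment-cumulant recursion. The Poisson PMF is a stand-in chosen because its cumulants all equal its rate; the function names and the truncation point of the support are assumptions for illustration, not part of the paper.

```python
import math


def raw_moments(pmf, support, max_order=4):
    """Return [E(Z), E(Z^2), ..., E(Z^max_order)] by summing over `support`."""
    return [sum((z ** r) * pmf(z) for z in support)
            for r in range(1, max_order + 1)]


def summary_measures(m):
    """Mean, variance, dispersion index, Sk and Ku from the first four raw moments."""
    m1, m2, m3, m4 = m
    var = m2 - m1 ** 2
    sk = (m3 - 3 * m2 * m1 + 2 * m1 ** 3) / var ** 1.5
    ku = (m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4) / var ** 2
    return {"mean": m1, "var": var, "dispersion_index": var / m1,
            "Sk": sk, "Ku": ku}


def cumulants(m):
    """Cumulants k_1..k_n from raw moments via
    k_n = mu_n - sum_{j=1}^{n-1} C(n-1, j-1) k_j mu_{n-j}."""
    k = []
    for n in range(1, len(m) + 1):
        correction = sum(math.comb(n - 1, j - 1) * k[j - 1] * m[n - j - 1]
                         for j in range(1, n))
        k.append(m[n - 1] - correction)
    return k


def poisson_pmf(z, lam=2.0):
    """Stand-in PMF (Poisson with rate lam), used only to check the formulas."""
    return math.exp(-lam) * lam ** z / math.factorial(z)


moments = raw_moments(poisson_pmf, range(0, 60))  # tail beyond 60 is negligible
stats = summary_measures(moments)
kappa = cumulants(moments)
```

A dispersion index above one indicates over-dispersion and a value below one indicates under-dispersion; the Poisson stand-in sits exactly at one.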

Rényi Entropy
Entropy refers to the amount of uncertainty associated with a random variable $Z$. It has many applications in several fields such as econometrics, quantum information, information theory, survival analysis, and computer science (see Rényi [34]). The measure of variation of the uncertainty of the random variable $Z$ can be expressed as $\frac{1}{1-\eta}\log\sum_{z=0}^{\infty} f^{\eta}(z; p, c, \psi)$, where $\eta \in\, ]0, \infty[$ and $\eta \neq 1$. The Shannon entropy can be defined by $E[-\log f(Z; p, c, \psi)]$. It is observed that the Shannon entropy can be calculated as a special case of the Rényi entropy when $\eta \to 1$.
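The limiting relation between the two entropies can be checked numerically. The following Python sketch evaluates the Rényi entropy of a finite probability vector and compares it with the Shannon entropy for $\eta$ close to one. The function names and the four-point probability vector are illustrative assumptions, not quantities taken from the paper.

```python
import math


def renyi_entropy(probs, eta):
    """Renyi entropy (1 / (1 - eta)) * log(sum_z f(z)^eta), eta > 0, eta != 1."""
    if eta <= 0 or eta == 1:
        raise ValueError("eta must be positive and different from 1")
    return math.log(sum(p ** eta for p in probs if p > 0)) / (1.0 - eta)


def shannon_entropy(probs):
    """Shannon entropy E[-log f(Z)] with the natural logarithm."""
    return -sum(p * math.log(p) for p in probs if p > 0)


probs = [0.5, 0.25, 0.125, 0.125]           # made-up PMF on four points
h_shannon = shannon_entropy(probs)           # equals 1.75 * log(2)
h_renyi_near_one = renyi_entropy(probs, 1.0001)
h_renyi_half = renyi_entropy(probs, 0.5)
```

For a PMF with infinite support, the sum would have to be truncated at a point where the remaining tail is negligible.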

Mean Time to Failure (MTTF), Mean Time between Failure (MTBF), and Availability (Av)
MTTF, MTBF, and Av are reliability terms based on methods and procedures for lifecycle predictions for a product. Customers often must include reliability data when determining what product to buy for their application. MTTF, MTBF, and Av are ways of providing a numeric value based on a compilation of data to quantify a failure rate and the resulting time of expected performance. In addition, in order to design and manufacture a maintainable system, it is necessary to predict the MTTF, MTBF, and Av. If $T \sim$ DGz-G$(t; p_1, c_1, \psi_1)$, then the MTBF is given as Whereas, if $T \sim$ DGz-G$(t; p_2, c_2, \psi_2)$, then the MTTF is given as The Av is considered as being the probability that the component is successful at time $t$, i.e., $\mathrm{Av} = \mathrm{MTTF}/\mathrm{MTBF}$.

Order Statistics (OS)
OS make their appearance in many areas of statistical theory and practice. Let $Z_1, Z_2, \ldots, Z_n$ be a random sample from the DGz-G$(z; p, c, \psi)$ family of distributions and let $Z_{1:n}, Z_{2:n}, \ldots, Z_{n:n}$ be their corresponding OS. Then, the CDF of the $i$th OS $Z_{i:n}$ for an integer value of $z$ can be written as where $\Delta^{(m,j)}_{(n,k)} = (-1)^{j+m}\binom{n}{k}\binom{n-k}{j}\binom{k+j}{m}$. The corresponding PMF of the $i$th OS can be expressed as The $u$th moment of $Z_{i:n}$ can be written as
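For numerical work, the CDF of the $i$th OS can also be evaluated from the well-known binomial-sum form $\sum_{k=i}^{n}\binom{n}{k}F(z)^k[1-F(z)]^{n-k}$, and the PMF obtained as the difference of the CDF at consecutive integers. The Python sketch below uses that textbook form rather than the series expansion with $\Delta^{(m,j)}_{(n,k)}$. The function names and the geometric stand-in CDF are assumptions for illustration.

```python
from math import comb


def order_stat_cdf(cdf, z, i, n):
    """P(Z_{i:n} <= z): at least i of the n observations are <= z."""
    F = cdf(z)
    return sum(comb(n, k) * F ** k * (1.0 - F) ** (n - k)
               for k in range(i, n + 1))


def order_stat_pmf(cdf, z, i, n):
    """P(Z_{i:n} = z) as the difference of the CDF at z and z - 1."""
    lower = order_stat_cdf(cdf, z - 1, i, n) if z > 0 else 0.0
    return order_stat_cdf(cdf, z, i, n) - lower


def geometric_cdf(z, p=0.5):
    """Hypothetical stand-in CDF: F(z) = 1 - p^(z + 1), z = 0, 1, 2, ..."""
    return 1.0 - p ** (z + 1)


cdf_max_at_zero = order_stat_cdf(geometric_cdf, 0, 3, 3)   # F(0)^3
cdf_min_at_zero = order_stat_cdf(geometric_cdf, 0, 1, 3)   # 1 - (1 - F(0))^3
pmf_total = sum(order_stat_pmf(geometric_cdf, z, 2, 3) for z in range(200))
```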

L-Moment (LM) Statistics
L-moments (LMs) obtain their name from their construction as linear combinations of OS. Hosking and Wallis [35] defined LMs as summaries of theoretical distributions and observed samples. Therefore, LM statistics are used for computing sample statistics for data at individual regions or for testing for homogeneity/heterogeneity of proposed groupings of sites. Let Z(i|n) be the $i$th largest observation in a sample of size $n$, then the LMs can take the form From Equation (25), we get the first four LMs, $\lambda^*_1$, $\lambda^*_2$, $\lambda^*_3$, and $\lambda^*_4$. Then, we can define some statistical measures such as the LM of mean, the LM coefficient of variation, the LM coefficient of Sk, and the LM coefficient of Ku in the form $\lambda^*_1$, $\lambda^*_2/\lambda^*_1$, $\lambda^*_3/\lambda^*_2$, and $\lambda^*_4/\lambda^*_2$, respectively.
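For reference, the standard population L-moments of Hosking can be written in terms of expected order statistics as below. This is the textbook definition, stated here in the generic notation $\lambda_r$ with $Z_{j:r}$ denoting the $j$th smallest observation in a sample of size $r$; it is offered as a sketch and may differ in notation from Equation (25) of this paper.

```latex
\lambda_r = \frac{1}{r}\sum_{k=0}^{r-1}(-1)^k\binom{r-1}{k}\,E\left(Z_{r-k:r}\right),
\qquad r = 1, 2, \ldots
\\[4pt]
\lambda_1 = E(Z_{1:1}), \qquad
\lambda_2 = \tfrac{1}{2}\,E\left(Z_{2:2} - Z_{1:2}\right),
\\[4pt]
\lambda_3 = \tfrac{1}{3}\,E\left(Z_{3:3} - 2Z_{2:3} + Z_{1:3}\right), \qquad
\lambda_4 = \tfrac{1}{4}\,E\left(Z_{4:4} - 3Z_{3:4} + 3Z_{2:4} - Z_{1:4}\right).
```

The expectations $E(Z_{j:r})$ correspond to the first moment ($u = 1$) of the OS discussed in the preceding subsection.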

The DGz-Exponential (DGzEx) Distribution
Consider the CDF of the Ex distribution. Then, the PMF of the DGzEx distribution can be expressed as where $a > 0$. The PMF in Equation (26) is log-concave, since $f(z+1; p, c, a)/f(z; p, c, a)$ is a decreasing function in $z$ for all values of the model parameters. Therefore, it is strongly unimodal, it has all its moments, and the HRF is increasing. Figures 1 and 2 show the PMF and HRF of the DGzEx distribution for various values of the parameters. It is not possible to write the $r$th moment of the DGzEx distribution in closed form; therefore, we use Maple software to discuss some of its statistical properties. Other works, such as Para and Jan [30] and Kundu and Nekoukhou [36], likewise did not provide a closed form for the moments. Table 1 lists some descriptive statistics using the DGzEx model for different values of $p$ and $c$ with $a = 0.2$.
2. The mean and Var increase, whereas the Sk and Ku decrease, for fixed values of $a$ and $c$ with $p \to 1$;
3. The mean, Var, and Sk decrease for fixed values of $a$ and $p$ with $c \to \infty$.
Table 2 shows the MTTF and entropy values for fixed values of $a = 0.1$ and $\eta = 0.5$ with $p \to 1$ and $c \to \infty$. According to Table 2, it is clear that the MTTF and entropy increase for fixed values of $a$, $c$, and $\eta$ with $p \to 1$, whereas, for fixed values of $a$, $p$, and $\eta$ with $c \to \infty$, the MTTF and entropy decrease.

The DGz-Weibull (DGzW) Distribution
Consider the CDF of the Weibull (W) distribution. Then, the PMF of the DGzW distribution can be expressed as where $a, b > 0$. Figures 3 and 4 show the PMF in Equation (27) and the corresponding HRF of the DGzW distribution for various values of the parameters. It is immediate that the PMF is unimodal and the HRF can be either increasing, decreasing, or of bathtub shape. Hence, the parameters of the underlying distribution can be adjusted to suit most datasets. As in the case of the DGzEx distribution, it is not possible to write the $r$th moment in closed form, and consequently, Maple is used to explain some of the statistical properties of the DGzW distribution. Table 3 shows some descriptive statistics utilizing the DGzW distribution for various values of $p$ and $c$ with $a = 0.5$ and $b = 1.5$. The mean and Var decrease for fixed values of $a$, $b$, and $p$ with $c \to \infty$. Table 4 shows the MTTF and entropy values for fixed values of $a = b = \eta = 0.5$ with $p \to 1$ and $c \to \infty$. According to Table 4, it is clear that the MTTF and entropy increase for fixed values of $a$, $b$, $c$, and $\eta$ with $p \to 1$, whereas, for fixed values of $a$, $b$, $p$, and $\eta$ with $c \to \infty$, the MTTF and entropy decrease.

The DGz-Inverse Weibull (DGzIW) Distribution
Consider the CDF of the inverse Weibull (IW) distribution. Then, the PMF of the DGzIW distribution can be expressed as where a, b > 0. The PMF in Equation (28) is log-concave for some values of the model parameters. Figures 5 and 6 show the PMF and HRF of the DGzIW distribution for various values of the parameters.   It is immediate that the PMF is decreasing, whereas the HRF can be either increasing, decreasing, or of unimodal shape. Hence, the parameters of the underlying distribution can be adjusted to suit most datasets.

Maximum Likelihood Estimation (MLE)
In this section, we estimate the unknown parameters of the DGz-G family using the maximum likelihood (ML) method. Suppose $Z_1, Z_2, \ldots, Z_n$ is a random sample from the DGz-G family. Then, the log-likelihood function ($L$) can be expressed as The MLEs of the parameters $p$, $c$, and $\psi$ can be derived by solving the nonlinear likelihood equations obtained by differentiating Equation (29). The components of the score vector are given in Equations (30)-(32). Setting these equations to zero and solving them yields the MLEs of the DGz-G family parameters. These equations cannot be solved analytically; therefore, an iterative procedure such as Newton-Raphson is required to solve them numerically.
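The Newton-Raphson idea can be illustrated on a much simpler model than the DGz-G family. The Python sketch below applies it to the one-parameter geometric (discrete exponential) PMF $P(Z = z) = (1 - p)p^z$, whose MLE is also available in closed form as $\bar{z}/(1 + \bar{z})$, so the iteration can be checked. The model choice, the function name `newton_raphson_mle`, and the sample values are assumptions for illustration; the score equations of the DGz-G family themselves are not reproduced here.

```python
def newton_raphson_mle(data, p0=0.5, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE of p for P(Z = z) = (1 - p) * p**z, z = 0, 1, ...

    Log-likelihood: l(p) = n * log(1 - p) + s * log(p), with s = sum(data).
    Requires s > 0 so that the maximizer lies strictly inside (0, 1).
    """
    n, s = len(data), sum(data)
    if s <= 0:
        raise ValueError("at least one observation must be positive")
    p = p0
    for _ in range(max_iter):
        score = s / p - n / (1.0 - p)                     # dl/dp
        hessian = -s / p ** 2 - n / (1.0 - p) ** 2        # d2l/dp2, always negative
        p_new = p - score / hessian                       # Newton-Raphson update
        p_new = min(max(p_new, 1e-8), 1.0 - 1e-8)         # keep iterate inside (0, 1)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


sample = [0, 1, 2, 3, 4]                                  # made-up counts
p_hat = newton_raphson_mle(sample, p0=0.2)
mean = sum(sample) / len(sample)
p_closed_form = mean / (1.0 + mean)
```

For a multi-parameter model such as DGz-G, the scalar score and second derivative would be replaced by the score vector and the Hessian matrix, and the starting values would need more care.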

Simulation Results
In this section, we assess the performance of the MLE with respect to the sample size $n$. The assessment is based on a simulation study, which is described in the following:
Compute the biases and mean-squared errors (MSEs), where $\mathrm{bias}(\alpha) = \frac{1}{1000}\sum_{i=1}^{1000}(\hat{\alpha}_i - \alpha)$ and $\mathrm{MSE}(\alpha) = \frac{1}{1000}\sum_{i=1}^{1000}(\hat{\alpha}_i - \alpha)^2$.
The empirical results are shown in Figures 7-12.
1. The magnitude of bias always decreases to zero as $n \to \infty$;
2. The MSEs always decrease to zero as $n \to \infty$. This shows the consistency of the estimators;
3. Under the MLE method, the estimator of $p$ is slightly negatively biased;
4. The MLE method performs quite well for parameter estimation.
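The structure of such a study can be sketched in a few lines. The Python example below repeats the bias and MSE computation over 1000 replications for several sample sizes, again using the geometric PMF $P(Z = z) = (1 - p)p^z$ as a stand-in because its MLE has a closed form. The true value $p = 0.6$, the sample sizes, the seed, and the function names are assumptions for illustration and do not reproduce the settings behind Figures 7-12.

```python
import math
import random


def sample_geometric(p, n, rng):
    """Inverse-transform sample of size n with P(Z >= z) = p**z, z = 0, 1, ..."""
    # 1 - rng.random() lies in (0, 1], which avoids log(0).
    return [int(math.log(1.0 - rng.random()) / math.log(p)) for _ in range(n)]


def geometric_mle(data):
    """Closed-form MLE of p for P(Z = z) = (1 - p) * p**z."""
    mean = sum(data) / len(data)
    return mean / (1.0 + mean)


def bias_and_mse(p_true, n, replications=1000, seed=0):
    """Empirical bias and MSE of the MLE over independent replications."""
    rng = random.Random(seed)
    estimates = [geometric_mle(sample_geometric(p_true, n, rng))
                 for _ in range(replications)]
    bias = sum(e - p_true for e in estimates) / replications
    mse = sum((e - p_true) ** 2 for e in estimates) / replications
    return bias, mse


results = {n: bias_and_mse(0.6, n) for n in (20, 50, 200)}
```

Plotting the entries of `results` against $n$ would give curves of the same kind as those reported for the DGz-G estimators.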

Data Analysis
In this section, we illustrate the empirical importance of the DGzW, DGzEx, and DGzIW distributions using four applications to real data. The fitted models are compared using some criteria, namely, the log-likelihood ($L$), the Akaike information criterion (AIC), the corrected Akaike information criterion (CAIC), the chi-square ($\chi^2$) statistic with its degrees of freedom (d.f.) and p-value, and the Kolmogorov-Smirnov (K-S) statistic with its p-value. We shall compare the DGzW, DGzEx, and DGzIW distributions with some competitive models described in Table 5.
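The two information criteria can be computed directly from the maximized log-likelihood. The Python sketch below uses the usual definitions $\mathrm{AIC} = 2k - 2L$ and the small-sample correction $\mathrm{AIC} + 2k(k+1)/(n - k - 1)$, where $k$ is the number of estimated parameters and $n$ the sample size. It is an assumption that the CAIC reported in this paper follows this corrected form; the function names and the numerical inputs are made up for illustration and are not taken from the tables.

```python
def aic(loglik, k):
    """Akaike information criterion: 2k - 2L."""
    return 2 * k - 2 * loglik


def corrected_aic(loglik, k, n):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("sample size must exceed k + 1")
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)


# Made-up inputs: log-likelihood -100 with 2 parameters and 50 observations.
example_aic = aic(-100.0, 2)
example_caic = corrected_aic(-100.0, 2, 50)
```

Smaller values of either criterion indicate a better trade-off between fit and the number of parameters.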

Dataset 1
These data represent the failure times (in weeks) of 50 devices put on a life test (see Bebbington et al. [24]). We compare the fits of the DGzW distribution with some competitive models, such as exponentiated discrete Weibull (EDW), discrete Weibull (DW), discrete inverse Weibull (DIW), discrete Lindley type II (DLi-II), exponentiated discrete Lindley (EDLi), discrete log-logistic (DLLc), and discrete Pareto (DPa). The MLEs with their corresponding standard errors (Std-er) and the goodness-of-fit statistics are reported in Tables 6 and 7, respectively. Regarding Table 7, it is clear that the DW and DLi-II models work quite well for analyzing these data, in addition to the DGzW model (p-value > 0.05). However, we always search for the best model to get the best evaluation of the data; concerning the −L, AIC, CAIC, K-S, and p-values, the DGzW model provides the best fit among all the tested models because it has the smallest values of the −L, AIC, CAIC, and K-S statistics, as well as the highest p-value. Figures 13 and 14 support the results of Table 7. It is clear that the dataset plausibly came from the DW and DLi-II models. However, the DGzW model is the best. Table 8 lists some statistics for Dataset 1 based on the DGzW parameters. Regarding Table 8, it is clear that these data suffer from over-dispersion. Moreover, these data are moderately skewed to the right (the right tail is longer and most of the distribution lies to the left) and platykurtic. The MTTF of these data equals 30.4215, whereas the entropy equals 2.3640. Table 9 lists some numerical values of the reliability properties when using Dataset 1. Regarding Table 9, it is clear that the RF decreases as t → ∞. Further, the HRF is bathtub-shaped, whereas the MTBF has a unimodal shape.

Dataset 2
These data are reported in Lawless [48] and give the failure times for a sample of 15 electronic components in an accelerated life test. For this dataset, we compare the fits of the DGzEx distribution with some competitive models such as discrete exponential (DEx), discrete generalized exponential type II (DGEx-II), discrete Rayleigh (DR), discrete inverse Rayleigh (DIR), discrete inverse Weibull (DIW), discrete Lomax (DLo), two-parameter discrete Burr type XII (DB-XII), and DPa. The MLEs with their corresponding Std-er and the goodness-of-fit statistics are reported in Tables 10 and 11, respectively. Regarding Table 11, it is clear that the DEx, DGEx-II, DR, DIW, and DLo models work quite well for analyzing these data, in addition to the DGzEx model. However, the DGzEx distribution is the best model among all the tested models. Figures 15 and 16 support the results of Table 11. It is clear that the dataset plausibly came from the DEx, DGEx-II, DR, DIW, and DLo models. However, the DGzEx model is the best. Table 12 lists some statistics for Dataset 2 using the DGzEx parameters. Regarding Table 12, it is clear that these data suffer from over-dispersion. Moreover, these data are moderately skewed to the right and platykurtic. The MTTF of these data equals 27.160, whereas the entropy equals 4.354. Table 13 lists some numerical values of the reliability properties using Dataset 2. Regarding Table 13, it is clear that the RF and MTBF decrease, whereas the HRF increases, as t → ∞.

Dataset 3
These data represent the counts of cysts of kidneys using steroids. This dataset originated from a study by Chan et al. [49]. For this dataset, we compare the fits of the DGzW distribution with some competitive models such as DW, DIW, DR, DEx, discrete Lindley (DLi), discrete Lindley type II (DLi-II), DLo, and Poisson (Poi). The MLEs with their corresponding Std-er and the goodness-of-fit statistics are reported in Tables 14 and 15, respectively. Regarding Table 15, it is clear that the DW, DIW, and DLo models work quite well for analyzing these data, in addition to the DGzW model. However, the DGzW model provides the best fit among all the tested models. Figures 17 and 18 support the results of Table 15. It is clear that the dataset plausibly came from the DGzW, DW, DIW, and DLo models. However, the DGzW model is the best. Table 16 reports some statistics for Dataset 3 based on the DGzW parameters. According to Table 16, it is observed that these data suffer from over-dispersion. Moreover, these data are moderately skewed to the right and leptokurtic.

Dataset 4
This dataset comes from a biological experiment and represents the number of European corn-borer larvae pyrausta in the field (see Bodhisuwan and Sangpoom [50]). It was an experiment conducted randomly on eight hills in 15 replications, where the experimenter counted the number of borers per hill of corn. We shall compare the fits of the DGzIW distribution with some competitive models such as the DIW, DB-XII, DIR, DR, negative binomial (NvBi), DPa, and Poi distributions. The MLEs with their corresponding Std-er as well as the goodness-of-fit statistics for Dataset 4 are listed in Tables 17 and 18, respectively. It is clear that the dataset plausibly came from the DGzIW model. Moreover, it is considered the best model among all the tested models. Table 19 lists some statistics for Dataset 4 based on the DGzIW parameters. Regarding Table 19, it is observed that the data suffer from over-dispersion. Moreover, these data are moderately skewed to the right and leptokurtic.

Concluding Remarks
In this article, we proposed a new discrete family of distributions, the so-called DGz-G family. Several of its statistical properties were studied. Three special models of the new family were discussed in detail. It is found that the proposed family is capable of modeling a negatively skewed, a positively skewed, or a symmetric shape, and the HRF can take different shapes. Further, it is appropriate for modeling both over- and under-dispersed data. The proposed family can be used for modeling count and lifetime data. The maximum likelihood method was used for estimating the family parameters. A simulation study was carried out to assess the performance of the estimators of the family parameters. It is found that the maximum likelihood method performs quite well in estimating the model parameters. Finally, the flexibility of the proposed family was illustrated by means of four distinct datasets. We hope that the present work will attract wider applications in medicine, engineering, and other fields of research.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: