A New Flexible Generalized Lindley Model: Properties, Estimation and Applications

A new method for generalizing the Lindley distribution by increasing the number of mixed models is presented. The resulting model, called the generalized Lindley of integer order, includes the exponential and the usual Lindley distributions as special cases when the order of the model is fixed at one and two, respectively. The moments, the variance, the moment generating function, and the failure rate function of the proposed model are derived. Estimators of the underlying parameter by the method of moments and by maximum likelihood are obtained. Maximum likelihood estimation for right-censored data is also discussed. The efficiency of the maximum likelihood estimator is explored in a simulation study over various orders and censoring rates. Finally, the model is fitted to two real data sets to illustrate its application.


Introduction
The Lindley [1] distribution has been applied in analyzing lifetime data and stress-strength reliability models; see, e.g., Cakmakyapan and Ozel [2]. Many authors have proposed generalizations of the Lindley distribution in recent years. The known generalizations have so far followed three main approaches: increasing the number of parameters to two or three, using larger values of the shape parameters, and assuming that the parameter of some convenient discrete distribution follows the Lindley distribution. The probability density function (PDF) of the Lindley distribution is

$$f(x) = \frac{\theta^{2}}{1+\theta}\,(1+x)\,e^{-\theta x}, \qquad x > 0,\ \theta > 0.$$

The Lindley PDF is a mixture of two gamma PDFs, G(1, θ), i.e., gamma with shape parameter 1 and rate θ, and G(2, θ), with weights θ/(1+θ) and 1/(1+θ), respectively. The corresponding cumulative distribution function (CDF) is

$$F(x) = 1 - \left(1 + \frac{\theta x}{1+\theta}\right) e^{-\theta x}, \qquad x > 0.$$

This model was studied in some detail by Ghitany et al. [3] and is a cornerstone in Sankaran [4], Ghitany et al. [5], Zamani and Ismail [6], Ghitany et al. [7], Ghitany et al. [8], Al-Mutairi et al. [9], Al-babtain et al. [10], Ghitany et al. [11] and Al-Mutairi et al. [12]. Moreover, Abouammoh et al. [13] introduced two forms of Lindley generalization that keep the usual weights p = θ/(1+θ) and 1 − p and use larger values of the underlying shape parameter. Shanker et al. [14] investigated some mathematical properties of an extension of the Lindley model. Many authors have improved the flexibility of the Lindley model by increasing the number of parameters. Among them, Shanker and Mishra [15] and Shanker and Ghebretsadik [16] introduced different quasi Lindley models, which have one more parameter. Merovci and Sharma [17] introduced the beta-Lindley model, which has two more parameters. Zakerzade and Dolati [18], Ibrahim et al. [19] and Shanker et al. [20] introduced generalizations of Lindley with three parameters. Also, Broderick and Tiantian [21] proposed a generalized Lindley distribution with four parameters.
Clearly, a proper model improves our knowledge of the data, describes their characteristics and makes their asymptotic behavior traceable; see, for example, Wolstenholme [22] and McPherson [23]. Increasing the number of parameters may provide more flexible models, but it makes parameter estimation very complicated, especially for models with three or more parameters. Therefore, in this paper, a different idea is introduced: a two-parameter model that gains flexibility by mixing additional baseline distributions. In this approach, there is one continuous parameter, θ, and one discrete parameter, m, and the model is a mixture of m baseline models. The parameter m is called the order of the mixture.
The rest of this paper is organized as follows. In Section 2, the proposed model is introduced, some of its special cases are pointed out and its statistical and reliability properties are investigated. Maximum likelihood estimation of the underlying parameter and its properties are discussed in Section 3. The simulation results are gathered and discussed in Section 4. In the last section, the proposed model is fitted to real-life data sets that have already been studied in the literature with other models.

Definition and Basic Properties
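Here we extend the construction of the Lindley PDF as a mixture of two gamma models.

Definition 1. The random variable X is said to have the generalized Lindley distribution of order m (GLOm) if its PDF is

$$f(x) = \frac{\theta^{m}}{\sum_{l=0}^{m-1}\theta^{l}}\; e^{-\theta x} \sum_{k=0}^{m-1}\frac{x^{k}}{k!}, \qquad x > 0,\ \theta > 0,\ m = 1, 2, \ldots$$

This model is in fact a mixture of m gamma distributions (see Cheng and Feast [24]),

$$f(x) = \sum_{i=1}^{m} w_i\, g_i(x), \qquad w_i = \frac{\theta^{\,i-1}}{\sum_{l=0}^{m-1}\theta^{l}},$$

where g_i is the gamma distribution with parameters (m − i + 1, θ) and density

$$g_i(x) = \frac{\theta^{\,m-i+1}}{\Gamma(m-i+1)}\; x^{\,m-i}\, e^{-\theta x}, \qquad x > 0.$$

Note that for θ = 1 all mixed PDFs receive the same weight, namely 1/m. As θ decreases below 1, more weight is given to the PDFs with higher shape parameters, whereas as θ increases above 1, more weight is given to the PDFs with lower shape parameters. For general m, the CDF of GLOm is

$$F(x) = \sum_{i=1}^{m} w_i\, \frac{\gamma(m-i+1,\ \theta x)}{\Gamma(m-i+1)},$$

where $\gamma(a, x) = \int_{0}^{x} t^{a-1} e^{-t}\, dt$ is the lower incomplete gamma function. By some algebra, we can write the CDF of GLOm as

$$F(x) = 1 - \frac{e^{-\theta x}}{\sum_{l=0}^{m-1}\theta^{l}} \sum_{j=0}^{m-1} \frac{(\theta x)^{j}}{j!} \sum_{s=0}^{m-j-1}\theta^{s}, \tag{6}$$

which is free of the incomplete gamma function and is therefore more convenient for numerical computation.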
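The definition can be checked numerically. The following is a minimal sketch, assuming mixing weights proportional to θ^(i−1) (the choice that reproduces the Lindley weights for m = 2 and equal weights 1/m at θ = 1) and a gamma rate parameterization; the function names are illustrative and not taken from the paper. It evaluates the density both as a mixture and in closed form, and the CDF without incomplete gamma functions.

```python
import math

def glo_weights(m, theta):
    """Mixing weights w_i proportional to theta**(i-1), i = 1..m (assumed form)."""
    powers = [theta ** j for j in range(m)]
    total = sum(powers)
    return [p / total for p in powers]

def gamma_pdf(x, shape, rate):
    """Gamma density with integer shape and rate parameter."""
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.factorial(shape - 1)

def glo_pdf_mixture(x, m, theta):
    """GLOm density as sum_i w_i * g_i(x), with g_i = Gamma(m - i + 1, theta)."""
    w = glo_weights(m, theta)
    return sum(w[i - 1] * gamma_pdf(x, m - i + 1, theta) for i in range(1, m + 1))

def glo_pdf(x, m, theta):
    """Closed form: theta^m / sum_l theta^l * exp(-theta x) * sum_k x^k / k!."""
    norm = sum(theta ** l for l in range(m))
    poly = sum(x ** k / math.factorial(k) for k in range(m))
    return theta ** m / norm * math.exp(-theta * x) * poly

def glo_sf(x, m, theta):
    """Survival function written with finite sums only (no incomplete gamma)."""
    norm = sum(theta ** l for l in range(m))
    total = 0.0
    for j in range(m):
        tail = sum(theta ** s for s in range(m - j))  # sum_{s=0}^{m-j-1} theta^s
        total += (theta * x) ** j / math.factorial(j) * tail
    return math.exp(-theta * x) * total / norm

def glo_cdf(x, m, theta):
    """CDF of GLOm as 1 - S(x)."""
    return 1.0 - glo_sf(x, m, theta)
```

For m = 1 and m = 2 these functions reduce to the exponential and Lindley forms, which gives a quick sanity check on the assumed weights.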
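To check the accuracy of (6), we can apply induction on m. Clearly, it holds for m = 1. It is straightforward to show that if (6) holds for m = k, then it also holds for m = k + 1. Similarly, the survival function can be written as

$$S(x) = \sum_{i=1}^{m} w_i\, \frac{\gamma_u(m-i+1,\ \theta x)}{\Gamma(m-i+1)},$$

where $w_i = \theta^{\,i-1} / \sum_{l=0}^{m-1}\theta^{l}$ are the mixing weights and $\gamma_u(a, x) = \int_{x}^{\infty} t^{a-1} e^{-t}\, dt$ is the upper incomplete gamma function. Also, the kth moment of GLOm can be derived as

$$E(X^{k}) = \frac{1}{\theta^{k}} \sum_{i=1}^{m} w_i\, \frac{\Gamma(m-i+1+k)}{\Gamma(m-i+1)}.$$

For k = 1 and k = 2, this reduces to

$$E(X) = \frac{1}{\theta} \sum_{i=1}^{m} w_i\,(m-i+1), \qquad E(X^{2}) = \frac{1}{\theta^{2}} \sum_{i=1}^{m} w_i\,(m-i+1)(m-i+2),$$

respectively, which can be used to obtain the variance of GLOm. The moment generating function (refer to Bulmer [25]) is

$$M_X(t) = \sum_{i=1}^{m} w_i \left(\frac{\theta}{\theta - t}\right)^{m-i+1}, \qquad t < \theta.$$

The failure rate function of this model is

$$h(x) = \frac{f(x)}{S(x)} = \frac{\theta^{m} \sum_{k=0}^{m-1} x^{k}/k!}{\sum_{j=0}^{m-1} \frac{(\theta x)^{j}}{j!} \sum_{s=0}^{m-j-1}\theta^{s}}.$$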
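As an illustration of these quantities, the sketch below computes the kth moment, the variance, the moment generating function and the failure rate under the same assumptions as before (weights proportional to θ^(i−1), gamma components with shape m − i + 1 and rate θ). All function names are hypothetical.

```python
import math

def glo_weights(m, theta):
    """Mixing weights w_i proportional to theta**(i-1), i = 1..m (assumed form)."""
    powers = [theta ** j for j in range(m)]
    total = sum(powers)
    return [p / total for p in powers]

def glo_moment(k, m, theta):
    """E[X^k] = sum_i w_i * Gamma(m-i+1+k) / (Gamma(m-i+1) * theta^k)."""
    w = glo_weights(m, theta)
    return sum(
        w[i - 1] * math.gamma(m - i + 1 + k) / (math.gamma(m - i + 1) * theta ** k)
        for i in range(1, m + 1)
    )

def glo_variance(m, theta):
    """Variance from the first two raw moments."""
    return glo_moment(2, m, theta) - glo_moment(1, m, theta) ** 2

def glo_mgf(t, m, theta):
    """Moment generating function, defined for t < theta."""
    if t >= theta:
        raise ValueError("the MGF exists only for t < theta")
    w = glo_weights(m, theta)
    return sum(w[i - 1] * (theta / (theta - t)) ** (m - i + 1) for i in range(1, m + 1))

def glo_hazard(x, m, theta):
    """Failure rate h(x) = f(x) / S(x); the common factors cancel."""
    num = theta ** m * sum(x ** k / math.factorial(k) for k in range(m))
    den = sum(
        (theta * x) ** j / math.factorial(j) * sum(theta ** s for s in range(m - j))
        for j in range(m)
    )
    return num / den
```

For m = 2 the mean reduces to the familiar Lindley value (θ + 2)/(θ(θ + 1)) and the failure rate to θ²(1 + x)/(1 + θ + θx), while m = 1 gives the constant exponential hazard θ.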

Comparison of GLOm for Small m
To elucidate some properties of GLOm, it is instructive to consider its basic statistical properties for small values of m, such as m = 2, 3 and 4; recall that for m = 1 the GLOm reduces to the exponential distribution. Table 1 shows the PDF, CDF, mean and failure rate function for these orders. For a detailed description of the failure rate, we refer to Lai and Xie [26].

Table 1. PDF, CDF, mean and failure rate for generalized Lindley random variables of orders 2, 3 and 4.

Figures 1-3 show the density, survival and failure rate functions for some values of θ and m. One can note that as θ decreases, the mode drifts away from zero in GLOm for m = 2, 3, 4 and 5. Furthermore, the PDF is more skewed to the left as the order m decreases.
Note that the survival function in Figure 2 decreases more sharply as the order m decreases and as θ increases. Consequently, Figure 3 shows that the failure rate increases more sharply for smaller m and larger θ.

Estimation of the Parameters
Consider a sample X_1, X_2, . . . , X_n of size n from GLOm. For known m, an estimate of θ by the method of moments can be obtained from the equation

$$\bar{x} = \frac{\sum_{i=1}^{m} (m-i+1)\,\theta^{\,i-1}}{\theta \sum_{l=0}^{m-1}\theta^{l}},$$

where $\bar{x}$ is the sample mean. After multiplying through by the denominator, this is a polynomial equation of degree m in θ, which gives m potential solutions. Hence, we may have up to m different estimates of θ. These values can be used as initial values for optimizing the likelihood function.
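For outcomes x_1, x_2, . . . , x_n of the GLOm distribution, when m is known, the log-likelihood function is

$$\ell(\theta) = n m \ln\theta - n \ln\!\left(\sum_{l=0}^{m-1}\theta^{l}\right) - \theta \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \ln\!\left(\sum_{k=0}^{m-1} \frac{x_i^{k}}{k!}\right). \tag{10}$$

The last term does not depend on θ and can be ignored in the optimization process. However, it must be taken into account when comparing different models in terms of their likelihood values. When m is unknown, which is usually the case, we optimize the likelihood function for m = 1, 2, 3, . . . and compare the likelihood values to find a proper m. Differentiating (10), the score statistic for m ≥ 2 is

$$U(\theta) = \frac{n m}{\theta} - n\, \frac{\sum_{l=1}^{m-1} l\,\theta^{\,l-1}}{\sum_{l=0}^{m-1}\theta^{l}} - \sum_{i=1}^{n} x_i.$$

Therefore, the Fisher information for θ when m = 2 is

$$I(\theta) = \frac{2n}{\theta^{2}} - \frac{n}{(1+\theta)^{2}},$$

and for m ≥ 3 it is

$$I(\theta) = \frac{n m}{\theta^{2}} + n\, \frac{\left(\sum_{l=2}^{m-1} l(l-1)\,\theta^{\,l-2}\right)\left(\sum_{l=0}^{m-1}\theta^{l}\right) - \left(\sum_{l=1}^{m-1} l\,\theta^{\,l-1}\right)^{2}}{\left(\sum_{l=0}^{m-1}\theta^{l}\right)^{2}}.$$

It is well known that the asymptotic variance of the maximum likelihood estimator is the inverse of the Fisher information. For more details on maximum likelihood estimation, see Shao [27].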
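The estimation procedure can be sketched in code. The block below is an illustrative implementation, not the authors' code. It assumes the GLOm density θ^m e^(−θx) Σ_k x^k/k! divided by Σ_l θ^l, solves the score equation by bisection, and picks the order with the largest log-likelihood. The helper names, the bisection solver and the default search range for m are assumptions made for the example. Under this density the second derivative of the log-likelihood does not involve the data, so the score is decreasing wherever the Fisher information is positive, which is what makes bisection applicable.

```python
import math

def _poly_sums(theta, m):
    """P(theta) = sum_{l<m} theta^l together with its first two derivatives."""
    p = sum(theta ** l for l in range(m))
    dp = sum(l * theta ** (l - 1) for l in range(1, m))
    d2p = sum(l * (l - 1) * theta ** (l - 2) for l in range(2, m))
    return p, dp, d2p

def glo_loglik(theta, xs, m):
    """Complete-data log-likelihood, including the term that is free of theta."""
    p, _, _ = _poly_sums(theta, m)
    const = sum(math.log(sum(x ** k / math.factorial(k) for k in range(m))) for x in xs)
    return len(xs) * (m * math.log(theta) - math.log(p)) - theta * sum(xs) + const

def glo_score(theta, xs, m):
    """Derivative of the log-likelihood with respect to theta."""
    p, dp, _ = _poly_sums(theta, m)
    return len(xs) * (m / theta - dp / p) - sum(xs)

def glo_fisher_info(theta, n, m):
    """Fisher information for theta from a sample of size n."""
    p, dp, d2p = _poly_sums(theta, m)
    return n * m / theta ** 2 + n * (d2p * p - dp ** 2) / p ** 2

def glo_mle(xs, m):
    """Root of the score found by bisection; xs must be positive observations."""
    lo = hi = 1.0
    while glo_score(lo, xs, m) <= 0:   # move left until the score is positive
        lo /= 2
    while glo_score(hi, xs, m) >= 0:   # move right until the score is negative
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if glo_score(mid, xs, m) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def select_order(xs, max_m=6):
    """Fit m = 1..max_m and return (m, theta_hat, loglik) with the largest loglik."""
    fits = []
    for m in range(1, max_m + 1):
        theta_hat = glo_mle(xs, m)
        fits.append((m, theta_hat, glo_loglik(theta_hat, xs, m)))
    return max(fits, key=lambda fit: fit[2])
```

A call such as `select_order([0.5, 1.5, 2.0])` returns the chosen order together with its estimate and log-likelihood. An approximate standard error is `glo_fisher_info(theta_hat, n, m) ** -0.5`.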

Right Censored Data
Let X_i, i = 1, 2, . . . , n, follow the GLOm model and represent independent event times that are subject to random right censoring. The ith event time X_i is said to be censored when it occurs after the corresponding random censoring time C_i, i.e., whenever C_i ≤ X_i. In this situation, the censoring time is assumed to be observed, and the only information about the event time is that it is greater than the observed censoring time. Thus, the observations consist of T_i = min(X_i, C_i) and the event indicator

$$\delta_i = I(X_i < C_i) = \begin{cases} 1, & X_i < C_i, \\ 0, & C_i \le X_i. \end{cases}$$

For more information about right censoring in survival data, we refer the reader to Fleming and Harrington [28].
For data of the form (t_i, δ_i), where t_i = min(x_i, c_i), the log-likelihood function is obtained as

$$\ell(\theta) = \sum_{i=1}^{n} \left[\, \delta_i \ln f(t_i) + (1 - \delta_i) \ln S(t_i) \,\right],$$

in which f and S are, respectively, the density and survival functions of the GLOm.
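A minimal sketch of the censored log-likelihood follows, assuming the usual form in which observed events contribute ln f(t_i) and censored observations contribute ln S(t_i), with the GLOm density and survival function written as finite sums. The maximizer uses a golden-section search on ln θ over a wide bracket. This is a convenience chosen for the example that presumes a single maximum inside the bracket, not a method stated in the paper. The function names and bracket limits are hypothetical.

```python
import math

def glo_log_pdf(t, m, theta):
    """log f(t) for GLOm with weights proportional to theta**(i-1) (assumed form)."""
    norm = sum(theta ** l for l in range(m))
    poly = sum(t ** k / math.factorial(k) for k in range(m))
    return m * math.log(theta) - math.log(norm) - theta * t + math.log(poly)

def glo_log_sf(t, m, theta):
    """log S(t) for GLOm, using finite sums instead of incomplete gamma functions."""
    norm = sum(theta ** l for l in range(m))
    total = sum(
        (theta * t) ** j / math.factorial(j) * sum(theta ** s for s in range(m - j))
        for j in range(m)
    )
    return -theta * t + math.log(total) - math.log(norm)

def censored_loglik(theta, times, events, m):
    """events[i] is 1 for an observed event time and 0 for a censored one."""
    return sum(
        glo_log_pdf(t, m, theta) if d else glo_log_sf(t, m, theta)
        for t, d in zip(times, events)
    )

def censored_mle(times, events, m, lo=1e-6, hi=1e3, iters=100):
    """Golden-section search for the maximizer of the log-likelihood in log(theta)."""
    def f(u):
        return censored_loglik(math.exp(u), times, events, m)
    a, b = math.log(lo), math.log(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:               # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                     # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return math.exp(0.5 * (a + b))
```

For m = 1 the model is exponential and the censored estimate has the closed form (number of events)/(sum of observed times), which is a convenient check on the search.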

Simulation
Since the GLOm is a mixture of gamma distributions, a sample of size n from the GLOm can be generated in the following two steps.

1. Simulate one draw from the multinomial distribution with parameters n and w_i, i = 1, 2, . . . , m, where ∑_{i=1}^{m} w_i = 1. Denote the generated counts by k_1, k_2, . . . , k_m, corresponding respectively to the probabilities w_1, w_2, . . . , w_m. Note that ∑_{i=1}^{m} k_i = n.

2. Simulate samples of sizes k_i, i = 1, 2, . . . , m, from the gamma distribution with parameters (m − i + 1, θ). Merging these samples gives one sample of size n from the GLOm model.
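The two steps above can be sketched with the Python standard library. This is an illustration under the assumptions that w_i is proportional to θ^(i−1) and that θ is the rate of each gamma component. `random.gammavariate` takes a scale parameter, so 1/θ is passed. The multinomial counts are obtained by counting n categorical draws, and the function name `sample_glo` is hypothetical.

```python
import random
from collections import Counter

def glo_weights(m, theta):
    """Mixing weights w_i proportional to theta**(i-1), i = 1..m (assumed form)."""
    powers = [theta ** j for j in range(m)]
    total = sum(powers)
    return [p / total for p in powers]

def sample_glo(n, m, theta, rng=None):
    """Draw n values from GLOm by the two-step mixture scheme."""
    if rng is None:
        rng = random.Random()
    w = glo_weights(m, theta)
    # Step 1: multinomial counts k_1..k_m, obtained by counting n categorical draws.
    labels = rng.choices(range(1, m + 1), weights=w, k=n)
    counts = Counter(labels)
    # Step 2: k_i draws from Gamma(shape = m - i + 1, rate = theta), then merge.
    sample = []
    for i in range(1, m + 1):
        shape = m - i + 1
        sample.extend(rng.gammavariate(shape, 1.0 / theta) for _ in range(counts[i]))
    rng.shuffle(sample)  # remove the ordering by component
    return sample
```

Passing a seeded generator, for example `sample_glo(20, 3, 0.8, random.Random(1))`, makes the draw reproducible.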
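The results of our simulation study are gathered in Table 2. In each setting, values of m and θ are selected and r = 500 replicates of samples of size n = 20 are drawn. For each replicate, the maximum likelihood estimate θ̂ is computed. Then three measures are reported in the table, namely the bias, the mean absolute bias and the mean squared error:

$$B = \frac{1}{r}\sum_{j=1}^{r} (\hat{\theta}_j - \theta), \qquad B^{*} = \frac{1}{r}\sum_{j=1}^{r} |\hat{\theta}_j - \theta|, \qquad \mathrm{MSE} = \frac{1}{r}\sum_{j=1}^{r} (\hat{\theta}_j - \theta)^{2},$$

where $\hat{\theta}_j$ is the estimate from the jth replicate. The simulation results reported in Table 2 indicate two points.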
• As m increases, B, B* and MSE of θ̂ decrease.
• Also, B, B* and MSE increase with θ.
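A simulation of this kind can be sketched as follows. The block is a self-contained illustration, not the code behind Table 2. It assumes B is the average of θ̂ − θ over replicates, B* the average of |θ̂ − θ| and MSE the average of (θ̂ − θ)². It draws from the mixture one observation at a time, which is equivalent to the two-step scheme, and finds θ̂ by bisection on the score. All names and the seed are hypothetical.

```python
import random

def sample_glo(n, m, theta, rng):
    """Draw n GLOm values: pick component i with weight ~ theta**(i-1), then Gamma."""
    weights = [theta ** j for j in range(m)]
    labels = rng.choices(range(1, m + 1), weights=weights, k=n)
    return [rng.gammavariate(m - i + 1, 1.0 / theta) for i in labels]

def glo_score(theta, xs, m):
    """Derivative of the complete-data log-likelihood with respect to theta."""
    p = sum(theta ** l for l in range(m))
    dp = sum(l * theta ** (l - 1) for l in range(1, m))
    return len(xs) * (m / theta - dp / p) - sum(xs)

def glo_mle(xs, m):
    """Maximum likelihood estimate of theta by bisection on the score."""
    lo = hi = 1.0
    while glo_score(lo, xs, m) <= 0:
        lo /= 2
    while glo_score(hi, xs, m) >= 0:
        hi *= 2
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if glo_score(mid, xs, m) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate_measures(m, theta, n=20, r=500, seed=0):
    """Return (B, B*, MSE) of the MLE over r replicates of size n."""
    rng = random.Random(seed)
    errors = [glo_mle(sample_glo(n, m, theta, rng), m) - theta for _ in range(r)]
    bias = sum(errors) / r
    abs_bias = sum(abs(e) for e in errors) / r
    mse = sum(e * e for e in errors) / r
    return bias, abs_bias, mse
```

Calling `simulate_measures(m, theta)` over a grid of m and θ values gives a table with the same layout as the one described above, although the numbers depend on the seed.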

Censored Data
In another simulation study, a proportion p = 0.25 or p = 0.40 of each sample simulated from the GLOm is censored from the right. The random censoring time C is assumed to follow the uniform distribution on [0, u]. Given the value of p, the value of u is the root of the equation

$$\frac{1}{u}\int_{0}^{u} S(t)\, dt = p,$$

in which S(t) is the survival function of the GLOm. We solve this equation for both values of p mentioned above and generate r = 500 replicates of samples of sizes n = 20 and 40. For each replicate, the maximum likelihood method is applied to obtain the estimate of θ. The results of the simulation study are gathered in Table 3 for p = 0.25 and in Table 4 for p = 0.40.
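The value of u can be found numerically. The sketch below assumes that the censoring proportion is P(C ≤ X) = (1/u) ∫₀ᵘ S(t) dt for C uniform on [0, u] and independent of X. It evaluates the integral with Simpson's rule and solves for u by bisection. Both the integration rule and the solver are choices made for this example. Because S is decreasing, its average over [0, u] decreases from 1 towards 0 as u grows, so the root is unique. The function names are hypothetical.

```python
import math

def glo_sf(x, m, theta):
    """GLOm survival function with weights proportional to theta**(i-1) (assumed)."""
    norm = sum(theta ** l for l in range(m))
    total = sum(
        (theta * x) ** j / math.factorial(j) * sum(theta ** s for s in range(m - j))
        for j in range(m)
    )
    return math.exp(-theta * x) * total / norm

def censoring_prob(u, m, theta, steps=2000):
    """P(C <= X) for C ~ Uniform[0, u], via Simpson's rule (steps must be even)."""
    h = u / steps
    total = glo_sf(0.0, m, theta) + glo_sf(u, m, theta)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * glo_sf(k * h, m, theta)
    return total * h / 3.0 / u

def solve_u(p, m, theta):
    """Upper limit u of the uniform censoring time giving censoring proportion p."""
    lo, hi = 1e-6, 1.0
    while censoring_prob(hi, m, theta) > p:  # widen until the proportion drops below p
        hi *= 2
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if censoring_prob(mid, m, theta) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, `solve_u(0.25, 2, 1.0)` gives the upper limit for 25% censoring under GLO2 with θ = 1. A larger p yields a smaller u, since earlier censoring times censor more observations.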
As a summary of the simulation, we observe the following points:
• The mean absolute bias (B*) is slightly greater than that for complete (non-censored) data and increases with the censored proportion of the sample.
• For small θ, in the presence of censoring, the bias fluctuates more around zero and is more skewed to the left. This may lead to smaller values of B and larger values of B*. For large θ, θ̂ is less than the actual value of the parameter, which makes B negative and B* equal to its absolute value.
• As with non-censored data, when m increases (or θ decreases), B, B* and MSE of θ̂ decrease.

Failure of Yarn
Here, we consider a real data set reported by Lawless [29]. The data, which give the number of cycles to failure for 25 yarn specimens, are gathered in Table 5. The results of fitting the GLOm distribution to this data set are presented in Table 6 for m = 1 to 4. Clearly, the likelihood is largest for m = 2. Therefore, we prefer GLO2 among these models. Figure 4 shows the empirical survival function along with the fitted models.

Ovarian Cancer
Edmunson et al. [30] considered survival data for ovarian cancer patients. The data set is also available in the 'survival' package in R and includes the variables 'futime', the time to death or censoring of each patient, and 'fustat', the censoring status. Table 7 shows the results of fitting the GLOm to these data for m = 1, 2, 3 and 4. Based on the log-likelihood, GLO3 describes the data best among these models. The Kaplan-Meier survival function is drawn along with the fitted GLOm survival functions in Figure 5.

Conclusions
The Lindley distribution, which is a mixture of gamma models, has attracted the attention of many researchers in recent decades. It has been shown to be quite useful in describing real data sets, and several extensions of the model have been introduced in the literature. Here, we introduced a new method of generalizing the Lindley distribution in which flexibility is gained by increasing the number of mixed components in the model. The proposed model is called the GLOm, where m, the order of the model, is the number of baseline models. We studied some properties of GLOm for different m. The moments, the variance and the moment generating function were shown to have closed forms for every m. Maximum likelihood estimation of the parameter was discussed for both complete and right-censored data. The simulation results indicated that the MLE is suitable for both censored and non-censored data. Two real data sets were analyzed, showing that generalizing the Lindley distribution as proposed in this paper may help to describe such data more adequately.