Article

A New Discrete Analog of the Continuous Lindley Distribution, with Reliability Applications

by Abdulhakim A. Al-Babtain 1, Abdul Hadi N. Ahmed 2 and Ahmed Z. Afify 3,*

1 Department of Statistics and Operations Research, King Saud University, Riyadh 11362, Saudi Arabia
2 Department of Mathematical Statistics, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12631, Egypt
3 Department of Statistics, Mathematics and Insurance, Benha University, Benha 13511, Egypt
* Author to whom correspondence should be addressed.
Entropy 2020, 22(6), 603; https://doi.org/10.3390/e22060603
Submission received: 21 April 2020 / Revised: 19 May 2020 / Accepted: 26 May 2020 / Published: 28 May 2020
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

In this paper, we propose and study a new probability mass function by creating a natural discrete analog to the continuous Lindley distribution as a mixture of geometric and negative binomial distributions. The new distribution has many interesting properties that make it superior to many other discrete distributions, particularly in analyzing over-dispersed count data. Several statistical properties of the introduced distribution have been established including moments and moment generating function, residual moments, characterization, entropy, estimation of the parameter by the maximum likelihood method. A bias reduction method is applied to the derived estimator; its existence and uniqueness are discussed. Applications of the goodness of fit of the proposed distribution have been examined and compared with other discrete distributions using three real data sets from biological sciences.

1. Introduction

Modeling of count data is found in many fields such as public health, medicine, epidemiology, applied science, sociology, and agriculture. Several distributions have been proposed for this count data, especially the count data with over-dispersion. However, it was found that the traditional discrete distributions (geometric, Poisson, etc.) have limited applicability as models for reliability, failure times, counts, etc. This is so, since many real count data show either over-dispersion, in which the variance is greater than the mean or under-dispersion, in which the variance is smaller than the mean. This has led to the development of some discrete distributions based on popular continuous models for reliability, failure times, etc.
On the other hand, in many real-world situations the underlying variable is continuous in nature but is observed on a discrete scale, e.g., the number of motions of a pendulum before resting, the number of times a device is switched on/off, the number of days a patient stays in a hospital, the number of weeks/months/years a kidney patient survives after treatment, and the number of current fluctuations that an electrical item can withstand before its failure, among many other applications. Therefore, it is reasonable and convenient to model these situations by appropriate discrete distributions generated from the underlying continuous distributions, preserving one or more important characteristics of the continuous distribution, such as the probability density function (pdf), moment generating function (mgf), moments, hazard rate function (hrf), and mean residual life function.
Interest in discrete failure data came relatively late in comparison with its continuous analogue, and the subject has to some extent been neglected. It was only briefly mentioned by Barlow and Proschan [1]. For earlier works on discrete lifetime distributions, see Salvia and Bollinger [2], Xekalaki [3], Padgett and Spurrier [4], and Ebrahimi [5].
In the last few decades, many papers have appeared in the statistical literature on the discretization of continuous distributions. In spite of all the available discrete models, there is still a great need for more flexible discrete lifetime distributions that serve areas such as economics, social sciences, biometrics, and reliability studies, and that suit various types of count data.
The most recent discrete distributions are those due to Krishna and Pundir [6], Jazi et al. [7], and Gómez-Déniz [8]. Krishna and Pundir [6] constructed discrete analogues of the continuous Burr and Pareto distributions. Jazi et al. [7] constructed a discrete analogue of the continuous inverse Weibull distribution. Gómez-Déniz [8] constructed a discrete analogue of the generalized exponential distribution due to Marshall and Olkin [9]. Being the most recent, these three distributions have not yet received any applications. All three distributions have at least two parameters each. All three distributions have moments expressed in terms of either infinite sums or non-standard special functions.
Several methods are available in the statistics literature to derive a discrete distribution from a continuous distribution. The most commonly used technique to generate discrete analogs of continuous distributions is briefly described here. If the underlying continuous non-negative failure time $X$ has the survival function (sf) $S(x) = P[X \ge x]$, and times are grouped into unit intervals, the corresponding probability mass function (pmf) is defined by
$$p(x) = S(x) - S(x+1), \quad x = 0, 1, 2, \ldots$$
Several authors have used this discretization method of a continuous distribution to generate a corresponding discrete analog. Following this approach, the most recent discrete distributions are due to Stein and Dattero [10], Roy [11,12,13], Krishna and Pundir [6], Jazi et al. [7] and Gómez-Déniz [8].
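As a minimal illustration of this discretization rule, the sketch below applies $p(x) = S(x) - S(x+1)$ to an exponential survival function and checks that the result is a geometric pmf. The rate `lam = 0.7` and the helper name `discretize` are assumptions made only for this example.

```python
import math

def discretize(sf, x):
    """pmf at integer x from a continuous survival function: S(x) - S(x + 1)."""
    return sf(x) - sf(x + 1)

lam = 0.7  # illustrative exponential rate (assumption, not from the paper)

def exp_sf(t):
    """Survival function of the Exponential(lam) distribution."""
    return math.exp(-lam * t)

# Discretizing Exponential(lam) gives a geometric pmf with
# success probability 1 - exp(-lam).
q = math.exp(-lam)
pmf = [discretize(exp_sf, x) for x in range(200)]
geom = [(1 - q) * q**x for x in range(200)]

print(max(abs(a - b) for a, b in zip(pmf, geom)))
```

The same helper can be applied to any continuous survival function; only `exp_sf` would change.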
Lindley [14] introduced the following continuous distribution function
$$F(x;\theta) = 1 - \frac{1+\theta+\theta x}{1+\theta}\, e^{-\theta x}, \quad x > 0 \ \text{and} \ \theta > 0,$$
with the pdf
$$f(x;\theta) = \frac{\theta^{2}}{1+\theta}\,(1+x)\, e^{-\theta x},$$
to model various types of continuous lifetime data.
This distribution is derived as a mixture of exponential($\theta$) and gamma($2, \theta$) distributions.
It is worth noting that the pmfs of the two most recent discrete Lindley distributions due to Gómez-Déniz and Calderín-Ojeda [15] and that due to Bakouch et al. [16] are obtained by discretizing the continuous sf of the Lindley distribution and have a quite complex structure in terms of parameter estimation. In order to overcome problems in the estimation process of the parameter of Lindley distribution, we propose our new discrete Lindley distribution. To the best of our knowledge, this is the first article that uses the well-known fact that the geometric and the negative binomial distributions are the natural discrete analogs of the exponential and the gamma distributions, respectively (see, e.g., Nakagawa and Osaki [17] and Roy [13], among others).
To this end, we created a “natural” discrete analog of Lindley’s distribution, by mixing the geometric distribution and the negative binomial distribution.
We note that the model obtained is over-dispersed, which makes it suitable to be applied in the collective risk models and is competitive with the Poisson distribution to fit automobile claim frequency data.
Definition 1.
Let $X$ be a non-negative discrete random variable obtained as a finite mixture of geometric($\theta$) and negative binomial($2, \theta$) distributions with mixing probabilities $\frac{\theta}{\theta+\beta}$ and $\frac{\beta}{\theta+\beta}$, respectively. The new two-parameter natural discrete Lindley (TNDL) distribution is specified by the pmf
$$p(x;\theta,\beta) = \frac{\theta^{2}}{\theta+\beta}\,(1-\theta)^{x}\,[1+\beta(1+x)], \quad x = 0, 1, 2, \ldots, \ \beta > 0 \ \text{and} \ \theta \in (0,1).$$
We note that the TNDL distribution includes the following discrete distributions as particular cases:
(i) The geometric distribution when $\beta = 0$.
(ii) The discrete Lindley distribution of Bakouch et al. [16], when $\theta = 1 - p$.
The corresponding cumulative distribution function (cdf), sf and hrf, denoted by $r(x;\theta,\beta)$, of the TNDL distribution are given for $x = 0, 1, 2, \ldots$, $\beta > 0$ and $\theta \in (0,1)$ by
$$F(x;\theta,\beta) = P(X < x) = 1 - \frac{\theta(1+\beta)+\beta(1-\theta+\theta x)}{\theta+\beta}\,(1-\theta)^{x},$$
$$S(x;\theta,\beta) = P(X \ge x) = \frac{\theta(1+\beta)+\beta(1-\theta+\theta x)}{\theta+\beta}\,(1-\theta)^{x}$$
and
$$r(x;\theta,\beta) = \frac{\theta^{2}\,[1+\beta(1+x)]}{\theta(1+\beta)+\beta(1-\theta+\theta x)}.$$
This new distribution can be considered as an alternative to the negative binomial, Poisson-inverse Gaussian, hyper-Poisson, and generalized Poisson distributions.
Although the two-parameter case has its merits, we decided to focus on the single-parameter case, since the second parameter, $\beta$, appears only in the mixing weights. Moreover, the one-parameter case might be much more useful for practitioners and engineers.
Thus, our derivations focus on the single-parameter natural discrete Lindley (NDL) distribution, i.e., the TNDL distribution with $\beta = 1$. We note that the NDL distribution is the counterpart of the single-parameter continuous Lindley distribution.
The rest of the paper is organized as follows. In Section 2, the discrete analog of the Lindley distribution is developed with some plots of its pmf and hrf. Some reliability characteristics of the NDL distribution, along with some important theorems, are established in Section 3. In Section 4, we develop explicit expressions for its moments. In Section 5, we introduce the entropy of the NDL model. A characterization of the NDL distribution in terms of a relationship between its mean residual life function and its hazard rate function is derived in Section 6. Section 7 provides the distribution of the maximum and the minimum in a random sample selected from the NDL distribution. We derive the asymptotic distribution of extreme order statistics in Section 8. In Section 9, the method of maximum likelihood and the method of moments are used to estimate the parameter $\theta$, a bias reduction method is applied to the derived MLE, its existence and uniqueness are discussed, and simulation results explore the behavior of the maximum likelihood estimator. Three real data sets, used to validate the NDL distribution in fitting lifetime count data, are presented in Section 10. Finally, conclusions are provided in Section 11.

2. The NDL Distribution

Definition 2.
Let $X$ be a non-negative random variable obtained as a finite mixture of geometric($\theta$) and negative binomial($2, \theta$) distributions with mixing probabilities $\frac{\theta}{\theta+1}$ and $\frac{1}{\theta+1}$, respectively. The new distribution is specified by the pmf
$$p(x;\theta) = \frac{\theta^{2}}{1+\theta}\,(2+x)\,(1-\theta)^{x}, \quad x = 0, 1, 2, \ldots \ \text{and} \ \theta \in (0,1). \tag{1}$$
Lindley distribution may not be considered as a flexible model for analyzing different lifetimes and actuarial data. Therefore, to increase the flexibility for modeling purposes, we developed a single parameter NDL distribution. Furthermore, our new formulation provides a tractable model with attractive properties, which makes it suitable for applications not only in insurance settings but also in other fields where over-dispersions are observed. Some of these features include the uni-modality and over-dispersion. Many other properties and a recurrence formula for the probabilities of the new distribution are provided.
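The mixture construction in Definition 2 can be checked numerically. The following sketch compares the closed-form NDL pmf with the weighted sum of a geometric($\theta$) pmf and a negative binomial($2, \theta$) pmf, both counting failures before the required successes. All function names are hypothetical and chosen only for this illustration.

```python
def geometric_pmf(x, theta):
    """Geometric pmf on x = 0, 1, 2, ... with success probability theta."""
    return theta * (1 - theta) ** x

def negbin2_pmf(x, theta):
    """Negative binomial pmf (2 successes), counting failures x = 0, 1, 2, ..."""
    return (x + 1) * theta**2 * (1 - theta) ** x

def ndl_pmf(x, theta):
    """Closed-form NDL pmf: theta^2 / (1 + theta) * (2 + x) * (1 - theta)^x."""
    return theta**2 / (1 + theta) * (2 + x) * (1 - theta) ** x

def ndl_pmf_mixture(x, theta):
    """Same pmf written as a mixture with weights theta/(1+theta) and 1/(1+theta)."""
    w = theta / (1 + theta)
    return w * geometric_pmf(x, theta) + (1 - w) * negbin2_pmf(x, theta)

for theta in (0.1, 0.5, 0.9):
    gap = max(abs(ndl_pmf(x, theta) - ndl_pmf_mixture(x, theta)) for x in range(50))
    total = sum(ndl_pmf(x, theta) for x in range(2000))
    print(theta, gap, total)
```

The printed totals should be numerically indistinguishable from 1, which confirms that the pmf is properly normalized.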
The corresponding sf of the NDL distribution is given by
$$S(x;\theta) = P(X \ge x) = \frac{1+\theta+\theta x}{1+\theta}\,(1-\theta)^{x}, \quad x = 0, 1, 2, \ldots \ \text{and} \ \theta \in (0,1).$$
Its hrf reduces to
$$r(x;\theta) = \frac{p(x)}{P(X \ge x)} = \frac{\theta^{2}(2+x)}{1+\theta+\theta x}, \quad x = 0, 1, 2, \ldots \ \text{and} \ \theta \in (0,1).$$
It is easy to see that $\lim_{x\to\infty} r(x;\theta) = \theta$. Hence, the parameter $\theta$ can be interpreted as a strict upper bound on the failure rate function, an important characteristic for lifetime models, corresponding to Equation (1). Not many discrete distributions have their parameters directly interpretable in terms of their failure rate functions. One exception is the geometric distribution, but in this case the failure rate function is constant. We shall also see later that the NDL distribution always has an increasing failure rate; it does not allow for a constant or decreasing failure rate. The geometric, discrete Weibull and discrete gamma distributions do allow for constant or decreasing failure rates. These features are often unrealistic, because few real-life systems have constant or decreasing failure rates. So, the NDL distribution is more useful than the geometric distribution for modeling the number of rare events. Furthermore, when $\theta$ is close to zero, the pmf of the NDL distribution can take shapes different from that of a geometric distribution. As a result, our distribution has a thinner right tail than a distribution that is compounded with the exponential distribution. Hence, the new NDL distribution can be useful for modeling lifetime data such as the time interval between successive earthquakes, the time period of bacteria spreading, and the recovery period of a certain disease.
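A short numerical sketch of these two functions follows. It evaluates the sf and hrf of the NDL distribution and illustrates that the hrf increases in $x$, stays below $\theta$, and approaches $\theta$ for large $x$. The value `theta = 0.3` and the function names are assumptions for illustration only.

```python
def ndl_sf(x, theta):
    """Survival function P(X >= x) of the NDL distribution."""
    return (1 + theta + theta * x) / (1 + theta) * (1 - theta) ** x

def ndl_hrf(x, theta):
    """Hazard rate p(x) / P(X >= x) of the NDL distribution."""
    return theta**2 * (2 + x) / (1 + theta + theta * x)

theta = 0.3  # illustrative parameter value (assumption)
vals = [ndl_hrf(x, theta) for x in range(200)]

print(vals[0], vals[10], vals[199])   # increasing, all below theta
print(ndl_hrf(10**9, theta))          # close to theta for very large x
```

Because $S(0) = 1$, the first hazard value equals $p(0) = 2\theta^{2}/(1+\theta)$.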
Figure 1, Figure 2 and Figure 3 show some possible shapes for the pmf of the NDL distribution. One can note that the NDL distribution is always uni-modal for any value of θ (see also Theorem 1). Figure 4 and Figure 5 indicate that the hrf of the NDL distribution is always increasing in θ (see also Theorem 1).

3. Reliability Properties of NDL Distribution

3.1. Log-Concavity

Definition 3.
A discrete random variable $X$ with pmf $p(x)$ is said to be increasing failure rate (IFR) if $p(x)$ is log-concave, i.e., if $p(x)\,p(x+2) \le p(x+1)^{2}$, $x = 0, 1, 2, \ldots$ (see, e.g., Keilson and Gerber [18]).
Theorem 1.
The pmf of the NDL distribution in (1) is log-concave for all choices of $\theta \in (0,1)$.
Proof. 
The condition in Definition 3 is easily verified from (1).  □
Generally, it is well-known that a log-concave pmf is strongly uni-modal (see, e.g., Nekoukhou et al. [19]) and accordingly has the discrete IFR property (see, e.g., Barlow and Proschan [1]). It follows from Theorem 1 that the NDL distribution is uni-modal and has the discrete IFR property (see Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5). Thus, we have the following corollary.
Corollary 1.
If the random variable $X \sim \mathrm{NDL}(\theta)$, then the mode of $X$ is located at $w$, where $w$ is a positive integer satisfying $\frac{1-3\theta}{\theta} \le w \le \frac{1-2\theta}{\theta}$. This implies that $p(x+1) \ge p(x)$ for all $x < w$ and $p(x+2) \le p(x+1)$ for all $x \ge w$ (see, e.g., Keilson and Gerber [18], Nekoukhou et al. [19], and Abouammoh and Mashhour [20]). Hence, being IFR, the NDL distribution also belongs to the IFRA, NBU, NBUE and DMRL classes, since IFR $\Rightarrow$ IFRA $\Rightarrow$ NBU $\Rightarrow$ NBUE and IFR $\Rightarrow$ DMRL $\Rightarrow$ NBUE (see Kemp [21]), where IFRA refers to increasing failure rate average, NBU refers to new better than used, NBUE refers to new better than used in expectation and DMRL refers to decreasing mean residual lifetime.
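The mode bounds in Corollary 1 can be turned into a direct computation. The sketch below takes the smallest non-negative integer at or above $(1-3\theta)/\theta$ and compares it with a brute-force search over the pmf. Returning 0 when $\theta \ge 1/3$ is an addition made here (the pmf is then non-increasing, because $p(1)/p(0) = 1.5(1-\theta) \le 1$); the function names are hypothetical.

```python
import math

def ndl_pmf(x, theta):
    """NDL pmf: theta^2 / (1 + theta) * (2 + x) * (1 - theta)^x."""
    return theta**2 / (1 + theta) * (2 + x) * (1 - theta) ** x

def ndl_mode(theta):
    """Smallest integer w >= 0 with w >= (1 - 3*theta) / theta."""
    lower = (1 - 3 * theta) / theta
    return max(0, math.ceil(lower))

for theta in (0.07, 0.15, 0.3, 0.4, 0.83):
    brute = max(range(500), key=lambda x: ndl_pmf(x, theta))
    print(theta, ndl_mode(theta), brute)
```

When $(1-3\theta)/\theta$ is exactly an integer $w$, the pmf takes the same value at $w$ and $w+1$, so the mode is not unique; the function above then returns the smaller of the two.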
Definition 4.
Consider a discrete life distribution $P = \{p_k = P(X = k)\}$, $k \in N$, where $N$ is the set of all non-negative integers. With $A_k = P(X \le k)$, we define the discrete reversed failure rate (DRFR) as follows
$$r_k^{*} = \frac{p_k}{A_k}, \quad k \in N.$$
Definition 5.
(Al-Zahrani and Al-Sobhi [22]): A discrete life distribution $P = \{p_k = P(X = k)\}$, $k \in N$, where $N$ is the set of all non-negative integers, is said to be discrete increasing (decreasing) reversed failure rate, DIRFR (DDRFR), if $r_k^{*}$, $k \in N$, is increasing (decreasing).
Proposition 1.
Let $p_k$ be the sequence defined by the NDL distribution; then the NDL distribution has the DIRFR property.
Proof. 
It is easy to prove that $r_k^{*}$ is increasing in $k$.  □
The reversed hrf of $X$ is
$$r^{*}(x;\theta) = \frac{p(x;\theta)}{1-S(x;\theta)} = \frac{\theta^{2}(2+x)(1-\theta)^{x}}{(1+\theta)-(1+\theta+\theta x)(1-\theta)^{x}}.$$
Remark 1.
The following is a simple recursion formula for $p(x+1)$ in terms of $p(x)$ of the NDL distribution for $x = 0, 1, 2, \ldots$:
$$p(x+1) = \frac{3+x}{2+x}\,(1-\theta)\,p(x),$$
where $p(0) = 2\theta^{2}/(1+\theta)$.
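The recursion in Remark 1 gives a convenient way to tabulate the pmf without repeated exponentiation. The following sketch builds the first probabilities recursively and compares them with the closed-form pmf; the function names and the value `theta = 0.25` are assumptions for illustration.

```python
def ndl_pmf(x, theta):
    """Closed-form NDL pmf."""
    return theta**2 / (1 + theta) * (2 + x) * (1 - theta) ** x

def ndl_pmf_recursive(n_terms, theta):
    """First n_terms NDL probabilities via p(x+1) = (3+x)/(2+x) * (1-theta) * p(x)."""
    probs = [2 * theta**2 / (1 + theta)]          # p(0)
    for x in range(n_terms - 1):
        probs.append((3 + x) / (2 + x) * (1 - theta) * probs[-1])
    return probs

theta = 0.25  # illustrative parameter value (assumption)
table = ndl_pmf_recursive(30, theta)
print(table[:5])
```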
Remark 2.
(i) $r(0) = p(0)$.
(ii) $r(x)$ is an increasing function in $x$ and $\theta$.
(iii) $r(x) \ge r(0)$ for all $x \in N$, and hence the NDL distribution has the new better than used in failure rate (NBUFR) property (see Abouammoh and Ahmed [23]).
Remark 3.
Following Salvia and Bollinger [2], the following bounds hold for the sf, the mean, and the MRL function. For any $k = 1, 2, \ldots$
$$S(k) \le 1 - r(1) = \frac{(3\theta+1)(1-\theta)}{1+2\theta},$$
$$\mu_1' \le \frac{1-r(1)}{r(1)} = \frac{(3\theta+1)(1-\theta)}{3\theta^{2}}$$
and
$$1 - \frac{\theta^{2}(2+k)}{1+\theta+\theta k} \le \mu_1' \le \frac{(3\theta+1)(1-\theta)}{3\theta^{2}}.$$

3.2. Stochastic Interpretations of the Parameter Theta

Stochastic orders are important measures to judge comparative behaviors of random variables. Shaked and Shanthikumar [24] showed that many stochastic orders exist and have various applications.
Definition 6.
Let $X$ and $Y$ be two random variables with cumulative distribution functions $F_X(\cdot)$ and $F_Y(\cdot)$, respectively.
(i) Stochastic order ($X \le_{st} Y$): if $F_X(x) \ge F_Y(x)$ for all $x$.
(ii) Hazard rate order ($X \le_{hr} Y$): if $r_X(x) \ge r_Y(x)$ for all $x$.
(iii) Reversed hazard rate order ($X \le_{rh} Y$): if $r_X^{*}(x) \le r_Y^{*}(x)$ for all $x$.
(iv) Mean residual life order ($X \le_{mrl} Y$): if $m_X(x) \le m_Y(x)$ for all $x$.
(v) Likelihood ratio order ($X \le_{lr} Y$): if $p_X(x)/p_Y(x)$ is non-increasing in $x$.
The following chains of implication (see, e.g., Shaked and Shanthikumar [24]) hold:
$$X \le_{lr} Y \Rightarrow X \le_{hr} Y \Rightarrow X \le_{st} Y, \qquad X \le_{hr} Y \Rightarrow X \le_{mrl} Y \qquad \text{and} \qquad X \le_{lr} Y \Rightarrow X \le_{rh} Y.$$
Theorem 2.
Let $X \sim \mathrm{NDL}(\theta_1)$ and $Y \sim \mathrm{NDL}(\theta_2)$. Then, $X \le_{lr} Y$ for all $\theta_1 > \theta_2$.
Proof. 
We have
$$L(x) = \frac{p_X(x)}{p_Y(x)} = \frac{\theta_1^{2}(1+\theta_2)(1-\theta_1)^{x}}{\theta_2^{2}(1+\theta_1)(1-\theta_2)^{x}}.$$
Clearly, one can see that $L(x+1) \le L(x)$ for all $\theta_1 > \theta_2$.
Theorem 2 shows that the NDL distribution is ordered according to the strongest stochastic order (v).  □
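The monotonicity of the likelihood ratio used in this proof is easy to inspect numerically. The sketch below evaluates $L(x) = p_X(x)/p_Y(x)$ for one illustrative pair $\theta_1 = 0.6 > \theta_2 = 0.3$ (values assumed here, not taken from the paper) and shows that the ratio decreases in $x$.

```python
def ndl_pmf(x, theta):
    """NDL pmf: theta^2 / (1 + theta) * (2 + x) * (1 - theta)^x."""
    return theta**2 / (1 + theta) * (2 + x) * (1 - theta) ** x

def likelihood_ratio(x, theta1, theta2):
    """L(x) = p_X(x) / p_Y(x) for X ~ NDL(theta1) and Y ~ NDL(theta2)."""
    return ndl_pmf(x, theta1) / ndl_pmf(x, theta2)

ratios = [likelihood_ratio(x, 0.6, 0.3) for x in range(30)]
print(ratios[:4])
```

Since the factor $(2+x)$ cancels, the ratio is geometric in $x$ with common ratio $(1-\theta_1)/(1-\theta_2) < 1$.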
Corollary 2.
Based on the chain of stochastic orders following Definition 6, $X \le_{hr} Y$, $X \le_{rh} Y$, $X \le_{mrl} Y$ and $X \le_{st} Y$.
Definition 7.
The discrete random variable $X$ is said to be smaller than $Y$ in weak likelihood ratio ordering (denoted by $X \le_{wlr} Y$) if $\frac{p_X(x+1)}{p_Y(x+1)} \ge \frac{p_X(0)}{p_Y(0)}$ for all $x \ge 0$ (see Khider et al. [25]).
Theorem 3.
Let $X \sim \mathrm{NDL}(\theta_1)$ and $Y \sim \mathrm{NDL}(\theta_2)$. Then, $X$ is smaller than $Y$ in weak likelihood ratio ordering, denoted by $X \le_{wlr} Y$, for all $\theta_1 < \theta_2$.
Proof. 
According to Definition 2.5 of Khider et al. [25], we need to prove that
$$\frac{p_X(x+1)}{p_Y(x+1)} \ge \frac{p_X(0)}{p_Y(0)}.$$
Taking the difference of the two sides, we obtain
$$\frac{\theta_1^{2}(1+\theta_2)\left[(1-\theta_1)^{x+1}-(1-\theta_2)^{x+1}\right]}{\theta_2^{2}(1+\theta_1)(1-\theta_2)^{x+1}} \ge 0, \quad \forall\, \theta_1 < \theta_2.$$
Hence $X \le_{wlr} Y$.  □
The mean residual life (MRL) function of the NDL distribution is defined by
$$m(x) = E(X - x \mid X \ge x) = \frac{1-\theta}{\theta^{2}}\, r(x) + \frac{(1-\theta)(2-\theta)}{\theta(1+\theta+\theta x)},$$
where $r(x)$ is the hrf of the NDL distribution.
Theorem 4.
(Stochastic comparisons of partial random sums): Let $\{X_i,\ i = 1, 2, \ldots\}$ be a sequence of NDL random variables, and let $M$ and $N$ be two NDL random variables independent of the $X_i$'s. Then
$$\sum_{i=0}^{M} X_i \le_{mrl} \sum_{i=0}^{N} Y_i.$$
Proof. 
Follows directly from the Corollary and Theorem 1.B.4 in Shaked and Shanthikumar [26].  □

4. Moments

The first four raw moments of the NDL distribution are, respectively, given by
$$\mu_1' = E(X) = \frac{(1-\theta)(2+\theta)}{\theta(1+\theta)}, \tag{2}$$
$$\mu_2' = \frac{\theta^{3}+\theta^{2}-8\theta+6}{\theta^{2}(1+\theta)},$$
$$\mu_3' = \frac{(1-\theta)(\theta^{3}+2\theta^{2}-24\theta+24)}{\theta^{3}(1+\theta)}$$
and
$$\mu_4' = \frac{(1-\theta)(-\theta^{4}-2\theta^{3}+78\theta^{2}-192\theta+120)}{\theta^{4}(1+\theta)}.$$
The corresponding variance and index of dispersion (ID) are
$$\mathrm{Variance}(X) = \frac{(1-\theta)(4\theta+2)}{\theta^{2}(1+\theta)^{2}}$$
and
$$\mathrm{ID}(X) = \frac{\mathrm{Variance}(X)}{E(X)} = \frac{2(2\theta+1)}{\theta(1+\theta)(2+\theta)}.$$
The ID indicates whether a certain distribution is suitable for under- or over-dispersed data sets and has applications in ecology for measuring clustering (see, e.g., Johnson [27]). If $\mathrm{ID} > 1$, the distribution is over-dispersed. The NDL distribution shows over-dispersion for all values of $\theta$. We note that the ID decreases monotonically in $\theta$: it converges to 1 as $\theta \to 1$, while it tends to infinity as $\theta \to 0$. So, the NDL distribution should only be used in the analysis of count data with over-dispersion. Table 1 shows a set of numerical values for different values of $\theta$ for practical use. It is noted from Table 1 that the mean, variance, and ID are all decreasing functions of $\theta$.
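The closed-form mean, variance and index of dispersion can be cross-checked against direct summation over the pmf. The sketch below is one way to do this; the truncation point `n_terms = 5000` and all function names are assumptions for illustration, and the truncated tail is negligible for the values of $\theta$ used.

```python
def ndl_pmf(x, theta):
    """NDL pmf: theta^2 / (1 + theta) * (2 + x) * (1 - theta)^x."""
    return theta**2 / (1 + theta) * (2 + x) * (1 - theta) ** x

def ndl_mean(theta):
    return (1 - theta) * (2 + theta) / (theta * (1 + theta))

def ndl_variance(theta):
    return (1 - theta) * (4 * theta + 2) / (theta**2 * (1 + theta) ** 2)

def ndl_dispersion_index(theta):
    """Variance-to-mean ratio; values above 1 indicate over-dispersion."""
    return ndl_variance(theta) / ndl_mean(theta)

def numeric_mean_variance(theta, n_terms=5000):
    """Mean and variance by truncated summation over the support."""
    m1 = sum(x * ndl_pmf(x, theta) for x in range(n_terms))
    m2 = sum(x * x * ndl_pmf(x, theta) for x in range(n_terms))
    return m1, m2 - m1**2

for theta in (0.1, 0.5, 0.9):
    print(theta, ndl_mean(theta), ndl_variance(theta), ndl_dispersion_index(theta))
```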
The moment generating function is
$$M_X(t) = E(e^{tX}) = \frac{\theta^{2}\,(2-\bar{\theta}e^{t})}{(1+\theta)(1-\bar{\theta}e^{t})^{2}}, \quad t < -\log(1-\theta) \ \text{and} \ \theta \in (0,1),$$
where $\bar{\theta} = 1-\theta$.
The probability generating function is
$$\Psi_X(s) = E(s^{X}) = \frac{\theta^{2}\,(2-\bar{\theta}s)}{(1+\theta)(1-\bar{\theta}s)^{2}}, \quad |s| < 1/(1-\theta).$$
The $k$th descending factorial moment of $X$ is given (for $k = 0, 1, 2, \ldots$) by
$$\mu_{(k)} = \frac{(k+\theta+1)(1-\theta)^{k}\,k!}{(1+\theta)\,\theta^{k}},$$
where $\mu_{(k)} = E[X(X-1)\cdots(X-k+1)]$. Clearly, for $k = 0$, we obtain $\mu_{(0)} = 1$ and the mean of $X$ in (2) follows as $\mu_{(1)} = E(X)$.
The $k$th ascending factorial moment of $X$ is given (for $k = 0, 1, 2, \ldots$) by
$$\mu_{[k]} = \frac{\left[k(1-\theta)^{2}+1+\theta-2\theta^{2}\right]k!}{(1+\theta)\,\theta^{k}},$$
where $\mu_{[k]} = E[X(X+1)\cdots(X+k-1)]$. Clearly, for $k = 0$, we obtain $\mu_{[0]} = \frac{1+\theta-2\theta^{2}}{1+\theta}$ and the mean of $X$ in (2) follows as $\mu_{[1]} = E(X)$.

5. Entropy

The entropy is a measure of uncertainty of a random variable and it can be defined for a discrete random variable with pmf $p(x)$ by the formula (Gray [28])
$$H(X) = -\sum_{x} p(x)\log p(x).$$
The entropy of the NDL distribution can be calculated with the Mathematica software, version 9, which returns the following formula:
$$H(X) = -\log\!\left(\frac{\theta^{2}}{1+\theta}\right) + \frac{\theta^{2}}{1+\theta}\,\mathrm{LerchPhi}^{(0,1,0)}[1-\theta, -1, 2] + \frac{\theta^{2}+\theta-2}{\theta(1+\theta)}\,\log(1-\theta),$$
where $\mathrm{LerchPhi}^{(0,1,0)}[z, a, s]$ denotes the partial derivative, with respect to the second argument, of the Lerch transcendent $\Phi(z, a, s) = \sum_{k=0}^{\infty} \frac{z^{k}}{(k+s)^{a}}$.
Table 2 presents some numerical values of the entropy of an NDL($\theta$) distribution for different choices of $\theta$. Using the Mathematica software, version 9, Figure 6 relates the entropy, $H(X)$, to the values of the parameter $\theta$. One may note that $H(X)$ is monotonically decreasing in $\theta \in (0,1)$, with its limit tending to zero as $\theta$ tends to 1.
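As an alternative to the symbolic formula, the entropy can be approximated by summing $-p(x)\log p(x)$ directly over a truncated support. The sketch below does this in natural-log units; it is not the Mathematica computation described above, and the truncation point and function names are assumptions for illustration.

```python
import math

def ndl_pmf(x, theta):
    """NDL pmf: theta^2 / (1 + theta) * (2 + x) * (1 - theta)^x."""
    return theta**2 / (1 + theta) * (2 + x) * (1 - theta) ** x

def ndl_entropy(theta, n_terms=5000):
    """Shannon entropy (in nats) by truncated summation of -p log p."""
    h = 0.0
    for x in range(n_terms):
        p = ndl_pmf(x, theta)
        if p > 0.0:               # skip terms that underflow to zero
            h -= p * math.log(p)
    return h

for theta in (0.2, 0.5, 0.9, 0.999):
    print(theta, ndl_entropy(theta))
```

The printed values decrease as $\theta$ grows and become small near $\theta = 1$, in line with the behavior described for Figure 6.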

6. Characterization

In this section, we characterize the NDL distribution in terms of a relationship between its mean residual life function and its hazard rate function. This is given in the following theorem.
Theorem 5.
Let $X$ be a non-negative discrete random variable with pmf $P(X = x)$, $x = 0, 1, 2, \ldots$. Then $X$ follows the NDL distribution with parameter $\theta$ if and only if
$$m(x) = E(X - x \mid X \ge x) = \frac{1-\theta}{\theta^{2}}\, r(x) + \frac{(1-\theta)(2-\theta)}{\theta(1+\theta+\theta x)}, \quad \forall x, \tag{3}$$
where $r(x)$ is the hrf of the NDL distribution.
Proof. 
Necessity. The MRL function is defined as (see Kemp [21])
$$m(x) = \frac{\sum_{k=x+1}^{\infty} S(k)}{S(x)}.$$
This implies that
$$m(x) = \frac{(1+\theta)\sum_{k=x+1}^{\infty}(1-\theta)^{k} + \theta\sum_{k=x+1}^{\infty}k(1-\theta)^{k}}{(1+\theta)\,S(x)} = \frac{\theta(1-\theta)^{x+1}(2+\theta+\theta x)}{\theta^{2}(1+\theta)\,S(x)}.$$
After some simplification, one has
$$m(x) = \frac{(1-\theta)\, r(x)}{\theta^{2}} + \frac{(1-\theta)(2-\theta)}{\theta(1+\theta+\theta x)}.$$
Sufficiency. Suppose that Equation (3) holds; then we can rewrite it as
$$\sum_{k=x+1}^{\infty} S(k) = \frac{(1-\theta)\,p(x)}{\theta^{2}} + \frac{(2-\theta)(1-\theta)^{x+1}}{\theta(1+\theta)},$$
or
$$\sum_{k=x+1}^{\infty} S(k) = \frac{p(x+1)}{\theta^{2}} + \frac{2(1-\theta)^{2}(1-\theta)^{x}}{\theta(1+\theta)}.$$
Comparing the last two equations, we obtain
$$p(x+1) - (1-\theta)\,p(x) = \frac{\theta^{2}(1-\theta)^{x+1}(2+x)}{(1+\theta)(2+x)},$$
that is,
$$(2+x)\,p(x+1) = (3+x)(1-\theta)\,p(x),$$
which gives
$$p(0) = \frac{2\theta^{2}}{1+\theta} \quad \text{and} \quad p(x) = p(0)\,\frac{2+x}{2}\,(1-\theta)^{x}.$$
 □
Remark 4.
(i) $m(0) = \mu$.
(ii) $m(x) = \frac{(1-\theta)(2+\theta+\theta x)}{\theta(1+\theta+\theta x)}$ is a decreasing function in $x$ and in $\theta$.
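The closed form for the mean residual life can be verified against its definition as a ratio of survival sums. The sketch below compares $m(x) = (1-\theta)(2+\theta+\theta x)/[\theta(1+\theta+\theta x)]$ with $\sum_{k>x} S(k)/S(x)$ computed by truncated summation; the truncation length, the value `theta = 0.25` and the function names are assumptions for illustration.

```python
def ndl_sf(x, theta):
    """Survival function P(X >= x) of the NDL distribution."""
    return (1 + theta + theta * x) / (1 + theta) * (1 - theta) ** x

def ndl_mrl(x, theta):
    """Closed-form mean residual life E(X - x | X >= x)."""
    return (1 - theta) * (2 + theta + theta * x) / (theta * (1 + theta + theta * x))

def ndl_mrl_numeric(x, theta, n_terms=5000):
    """MRL from its definition: sum of S(k) for k > x, divided by S(x)."""
    tail = sum(ndl_sf(k, theta) for k in range(x + 1, x + 1 + n_terms))
    return tail / ndl_sf(x, theta)

theta = 0.25  # illustrative parameter value (assumption)
for x in (0, 1, 5, 20):
    print(x, ndl_mrl(x, theta), ndl_mrl_numeric(x, theta))
```

At $x = 0$ both values reduce to the mean $(1-\theta)(2+\theta)/[\theta(1+\theta)]$, as stated in Remark 4 (i).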

7. Distribution of the Maximum and the Minimum in a Random Sample from the NDL Distribution

The maximum and minimum of random variables arise in reliability. Let $X_i$, $i = 1, 2, \ldots, n$, be iid random variables from the NDL distribution with parameter $\theta$. Then, the sf of the minimum, $\mathrm{Min}(X_1, X_2, \ldots, X_n)$, is given by
$$\bar{F}_{\mathrm{Min}}(x) = \left(\frac{1+\theta+\theta x}{1+\theta}\right)^{n}(1-\theta)^{nx}.$$
The cdf of the maximum, $\mathrm{Max}(X_1, X_2, \ldots, X_n)$, is given by
$$F_{\mathrm{Max}}(x) = \left[1 - \frac{1+\theta+\theta x}{1+\theta}\,(1-\theta)^{x}\right]^{n}.$$
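Both expressions follow from independence: the minimum is at least $x$ only if every observation is, and the maximum is below $x$ only if every observation is. The sketch below codes these two powers using the paper's conventions $S(x) = P(X \ge x)$ and $F(x) = P(X < x)$; the function names are hypothetical.

```python
def ndl_pmf(x, theta):
    """NDL pmf: theta^2 / (1 + theta) * (2 + x) * (1 - theta)^x."""
    return theta**2 / (1 + theta) * (2 + x) * (1 - theta) ** x

def ndl_sf(x, theta):
    """Survival function P(X >= x) of the NDL distribution."""
    return (1 + theta + theta * x) / (1 + theta) * (1 - theta) ** x

def min_sf(x, theta, n):
    """P(min of n iid NDL variables >= x) = S(x)^n."""
    return ndl_sf(x, theta) ** n

def max_cdf(x, theta, n):
    """P(max of n iid NDL variables < x) = (1 - S(x))^n."""
    return (1 - ndl_sf(x, theta)) ** n

print(min_sf(3, 0.4, 2), max_cdf(3, 0.4, 2))
```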

8. Asymptotic Distribution of Extreme Order Statistics

Sometimes it is of interest to consider the asymptotic distributions of the extreme order statistics, that is, $X_{1:n}$ and $X_{n:n}$. One can see that
$$\lim_{t\to\infty}\frac{1-F(t+x/\theta)}{1-F(t)} = \lim_{t\to\infty}\frac{(1+\theta+x+\theta t)(1-\theta)^{x/\theta}}{1+\theta+\theta t} = (1-\theta)^{x/\theta}.$$
It can also be shown that
$$\lim_{t\to 0}\frac{F(tx)}{F(t)} = \lim_{t\to 0}\frac{1+\theta-(1+\theta+\theta t x)(1-\theta)^{tx}}{1+\theta-(1+\theta+\theta t)(1-\theta)^{t}} = x.$$
Hence, it follows from Theorem 1.6.2 in Leadbetter et al. [29] that there must be norming constants $a_n > 0$, $b_n$, $c_n > 0$ and $d_n$ such that
$$\Pr\{a_n(X_{n:n}-b_n) \le x\} \to \exp\!\left[-(1-\theta)^{x/\theta}\right]$$
and
$$\Pr\{c_n(X_{1:n}-d_n) \le x\} \to 1-(1-\theta)^{x/\theta}$$
as $n \to \infty$.

9. Estimation and Simulation

In this section, we estimate the parameter θ of the NDL distribution, using both maximum likelihood estimator (MLE) and moment estimator (ME). We show that they result in the same estimator in closed form. We note that the maximum likelihood method is often adopted to estimate the unknown parameters of a statistical model because the maximum likelihood estimators (MLEs) have many appealing properties; for example, they are asymptotically unbiased, consistent, and asymptotically normally distributed, etc.
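Let $x_1, \ldots, x_n$ be a random sample of size $n$ from the NDL distribution; then the log-likelihood function is given by
$$\ell(\theta \mid x) \propto 2n\log(\theta) - n\log(1+\theta) + n\bar{x}\log(1-\theta).$$
The MLE of $\theta$ follows by solving $\frac{d}{d\theta}\ell(\theta \mid x) = 0$, that is
$$\frac{d}{d\theta}\ell(\theta \mid x) = \frac{2n}{\theta} - \frac{n}{1+\theta} - \frac{n\bar{x}}{1-\theta} = 0.$$
After some algebra, the score equation reduces to the quadratic $\theta^{2}+\theta-\frac{2}{1+\bar{x}} = 0$, so the MLE of $\theta$ is given by the following compact formula
$$\hat{\theta} = \frac{1}{2}\sqrt{1+\frac{8}{1+\bar{x}}} - \frac{1}{2}.$$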
In the recent work of Balabdaoui et al. [30], the discrete MLE under the constraint of log-concavity was studied. As opposed to the continuous setting, existence of the uni-modal MLE when the data are discrete is guaranteed. On the other hand, uniqueness is not always true, but this problem is rather marginal, as a rule for selecting from among the finite options is immediate, making our estimator fully automatic and easy to compute.
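The closed-form estimator is simple to compute from the sample mean alone. The sketch below implements it and checks that the score function vanishes at the estimate; the sample values and function names are made up for illustration and are not data from the paper.

```python
import math

def ndl_mle(sample):
    """Closed-form MLE (and moment estimator) of theta from the sample mean."""
    xbar = sum(sample) / len(sample)
    return 0.5 * math.sqrt(1 + 8 / (1 + xbar)) - 0.5

def ndl_score_per_obs(theta, xbar):
    """Derivative of the log-likelihood with respect to theta, divided by n."""
    return 2 / theta - 1 / (1 + theta) - xbar / (1 - theta)

sample = [0, 1, 2, 5, 3]          # hypothetical counts
xbar = sum(sample) / len(sample)
theta_hat = ndl_mle(sample)
print(theta_hat, ndl_score_per_obs(theta_hat, xbar))
```

Note that a sample consisting only of zeros gives $\hat{\theta} = 1$, which lies on the boundary of the parameter space, so such samples need separate handling in practice.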
Following the bias reduction technique presented in Reath et al. [31], one can reach the following formula for the bias correction (BI-C):
$$\text{BI-C}(\hat{\theta}) = \frac{1}{n^{2}}\left[\frac{1}{\theta^{3}} + \frac{1}{(1+\theta)^{3}} + \frac{(1-\theta)(2+\theta)}{\theta(1+\theta)(1-\theta)^{3}}\right].$$
Next, we present the results of a simulation study conducted to explore the behavior of the maximum likelihood estimator with and without bias reduction for different combinations of the parameter $\theta$ and sample sizes, using the measures average estimate (AVE), average mean square error (MSE), average bias (ABI), BI-C and average mean relative error (MRE).
The MSE, ABI, and MRE are defined (for $\theta$) by $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(\hat{\theta}_i-\theta)^{2}$, $\mathrm{ABI} = \frac{1}{N}\sum_{i=1}^{N}|\hat{\theta}_i-\theta|$, and $\mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N}|\hat{\theta}_i-\theta|/\theta$, where $\hat{\theta}_i$ is the estimate from the $i$th simulated sample and $N$ is the number of simulated samples.
We generated $N = 5000$ random samples of sizes $n = (20, 30, 50, 100, 150, 300)$ from the NDL($\theta$) distribution using the inversion method. The numerical results are obtained using the R software, for values $\theta = (0.07, 0.15, 0.25, 0.35, 0.50, 0.65, 0.83, 0.95)$. The AVE, MSE, ABI, BI-C, and MRE for $\theta$ are reported in Table 3. It is clear, from Table 3, that the estimates of $\theta$ are very close to the true values for all values of $\theta$. Furthermore, for illustration, Figure 7 presents comparisons between the estimators with and without bias corrections for $n = 20$ and $n = 300$.
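A small-scale version of such a study can be sketched as follows. This is not the authors' R code: it is a Python illustration that draws NDL variates by inverting the cdf, applies the closed-form MLE without bias correction, and reports AVE, MSE, ABI and MRE. The seed, the number of replications and all function names are assumptions.

```python
import math
import random

def ndl_draw(theta, rng):
    """Draw one NDL(theta) variate by inverting the cdf P(X <= x)."""
    u = rng.random()
    x = 0
    p = 2 * theta**2 / (1 + theta)            # p(0)
    cdf = p
    # Walk up the support until the cumulative probability reaches u;
    # the p > 0 guard ends the loop if p underflows to zero.
    while u > cdf and p > 0.0:
        x += 1
        p *= (2 + x) / (1 + x) * (1 - theta)  # p(x) from p(x - 1)
        cdf += p
    return x

def ndl_mle(sample):
    """Closed-form MLE of theta from the sample mean."""
    xbar = sum(sample) / len(sample)
    return 0.5 * math.sqrt(1 + 8 / (1 + xbar)) - 0.5

def simulate(theta, n, reps, seed=12345):
    """Monte Carlo summary of the MLE over `reps` samples of size n."""
    rng = random.Random(seed)
    est = [ndl_mle([ndl_draw(theta, rng) for _ in range(n)]) for _ in range(reps)]
    ave = sum(est) / reps
    mse = sum((e - theta) ** 2 for e in est) / reps
    abi = sum(abs(e - theta) for e in est) / reps
    return {"AVE": ave, "MSE": mse, "ABI": abi, "MRE": abi / theta}

print(simulate(theta=0.35, n=50, reps=500))
```

Increasing `n` or `reps` should bring AVE closer to the true value and shrink the MSE, which is the pattern the full study reports in Table 3.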
It is clear, from the above figure, that the difference in bias is noticed for values of θ less than 0.3 and almost disappears for values equal to or more than 0.3 when the sample size is 20, while the difference in bias is quite obvious when θ exceeds 0.3 and the sample size is 300.

10. Applications to Count Data

In this section, we use three real data sets to illustrate the importance and superiority of the NDL distribution over existing models, namely the discrete Lindley (DL) [16], discrete Burr (DB), geometric (Gc), discrete Pareto (DP) [6], and discrete Burr-Hatke (DBH) (El-Morshedy et al. [32]) distributions. The first data set consists of remission times in weeks for 20 leukemia patients randomly assigned to a certain treatment (Lawless [33]).
The second data set consists of 123 observations, and it refers to numbers of fires, only fires in forest districts are considered, in Greece for the period from 1 July 1998 to 31 August of the same year (Karlis and Xekalaki [34]). Both data sets have been analyzed and reported by [16].
The third data set represents the numbers of daily deaths in Egypt due to COVID-19 infections from 8 March to 30 April, 2020, and contains 47 observations which are reported on worldometer website through https://www.worldometers.info/coronavirus/country/egypt/. The data are: 1, 1, 2, 2, 1, 1, 2, 4, 5, 1, 1, 3, 6, 6, 4, 1, 5, 6, 6, 8, 5, 7, 7, 9, 9, 15, 17, 11, 13, 5, 14, 5, 13, 9, 19, 15, 11, 14, 12, 11, 7, 13, 10, 20, 22, 21, 12. Some summary statistics for the three datasets are shown in Table 4.
The maximum likelihood estimates, their standard errors, and Kolmogorov–Smirnov (KS) statistics with their associated p-values are reported in Table 5. Table 5 shows that the new NDL distribution provides better fits for the three data sets than the DL, Gc, DB, DP, and DBH models.
Probability–probability (PP) plots for the three data sets are shown in Figure 8, Figure 9 and Figure 10, respectively. These plots support the results in Table 5, that the NDL distribution provides a closer fit for the remission times, numbers of fires, and Coronavirus data compared to the DL, Gc, DB, DP, and DBH distributions.
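To make the fitting step concrete, the sketch below applies the closed-form estimator $\hat{\theta} = \frac{1}{2}\sqrt{1+8/(1+\bar{x})} - \frac{1}{2}$ to the third data set (the 47 daily death counts listed in this section). It only illustrates the point estimate and the implied fitted mean; it does not reproduce the standard errors or KS statistics of Table 5, and the variable names are chosen here for illustration.

```python
import math

# Daily COVID-19 deaths in Egypt, 8 March to 30 April 2020, as listed in the text.
deaths = [1, 1, 2, 2, 1, 1, 2, 4, 5, 1, 1, 3, 6, 6, 4, 1, 5, 6, 6, 8,
          5, 7, 7, 9, 9, 15, 17, 11, 13, 5, 14, 5, 13, 9, 19, 15, 11, 14,
          12, 11, 7, 13, 10, 20, 22, 21, 12]

xbar = sum(deaths) / len(deaths)
theta_hat = 0.5 * math.sqrt(1 + 8 / (1 + xbar)) - 0.5

# Mean of the fitted NDL distribution; it equals the sample mean because
# the MLE coincides with the moment estimator.
fitted_mean = (1 - theta_hat) * (2 + theta_hat) / (theta_hat * (1 + theta_hat))

print(len(deaths), xbar, theta_hat, fitted_mean)
```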

11. Conclusions

In this paper, we propose and study a new natural discrete analog of the continuous Lindley distribution as a mixture of geometric and negative binomial distributions. The new distribution is called natural discrete Lindley (NDL) distribution and it has many interesting properties that make it superior to many other discrete distributions, particularly in analyzing over-dispersed count data. The moments of the NDL distribution and many reliability properties are derived in closed forms. A characterization of the NDL distribution relating its mean residual life function and its hazard rate function is derived and used to characterize the NDL distribution. We also provide the distribution of the maximum and the minimum in a random sample selected from the NDL distribution.
The maximum likelihood and moment estimators of the parameter $\theta$ are derived and the bias reduction technique is applied. A simulation study is conducted to explore the behavior of the MLE and to compare the estimators with and without bias correction. Three real data sets are used to validate the use of the NDL distribution in fitting lifetime count data.
It is worth mentioning that the research in this paper can be extended in many ways. For example, two or three-parameter NDL could be considered together with extensive bias reduction techniques.
Transmuted and/or exponentiated versions may be established, several properties of order statistics from the distribution could be explored and their relations to well-known stochastic orders, and a bivariate discrete NDL may also be studied.

Author Contributions

Conceptualization, A.H.N.A. and A.Z.A.; Methodology, A.H.N.A.; Software, A.Z.A.; Validation, A.H.N.A., A.Z.A. and A.A.A.-B.; Formal Analysis, A.Z.A.; Investigation, A.H.N.A.; Resources, A.A.A.-B.; Data Curation, A.Z.A.; Writing—Original Draft Preparation, A.Z.A.; Writing—Review & Editing, and Visualization, A.Z.A. and A.A.A.-B.; Supervision, A.H.N.A.; Project Administration, A.Z.A.; Funding Acquisition, A.A.A.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This project is supported by Researchers Supporting Project number (RSP-2020/156), King Saud University, Riyadh, Saudi Arabia.

Acknowledgments

The authors would like to thank the Editorial Board, and three referees for their valuable comments and remarks that greatly improved the final version of this paper. This work was supported by King Saud University (KSU). The authors, therefore, gratefully acknowledge the KSU for technical and financial support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Barlow, R.E.; Proschan, F. Statistical Theory of Reliability and Life Testing; To Begin With: New York, NY, USA, 1981. [Google Scholar]
  2. Salvia, A.A.; Bollinger, R.C. On discrete hazard functions. IEEE Trans. Reliab. 1982, 31, 458–459. [Google Scholar] [CrossRef]
  3. Xekalaki, E. Hazard functions and life distributions in discrete time. Commun. Stat. Theory Methods 1983, 12, 2503–2509. [Google Scholar] [CrossRef]
  4. Padgett, W.J.; Spurrier, J.D. On discrete failure models. IEEE Trans. Reliab. 1985, 34, 253–256. [Google Scholar] [CrossRef]
  5. Ebrahimi, N. Classes of discrete decreasing and increasing mean-residual-life distributions. IEEE Trans. Reliab. 1986, 35, 403–405. [Google Scholar] [CrossRef]
  6. Krishna, H.; Pundir, P.S. Discrete Burr and discrete Pareto distributions. Stat. Methodol. 2009, 6, 177–188. [Google Scholar] [CrossRef]
  7. Jazi, M.A.; Lai, C.D.; Alamatsaz, M.H. A discrete inverse Weibull distribution and estimation of its parameters. Stat. Methodol. 2010, 7, 121–132. [Google Scholar] [CrossRef]
  8. Gómez-Déniz, E. Another generalization of the geometric distribution. Test 2010, 19, 399–415. [Google Scholar] [CrossRef]
  9. Marshall, A.W.; Olkin, I. A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika 1997, 84, 641–652. [Google Scholar] [CrossRef]
  10. Stein, W.E.; Dattero, R. A new discrete Weibull distribution. IEEE Trans. Reliab. 1986, 33, 196–197. [Google Scholar] [CrossRef]
  11. Roy, D. Discretization of continuous distributions with an application to stress-strength reliability. Calcutta Stat. Assoc. Bull. 2002, 52, 297–313. [Google Scholar] [CrossRef]
  12. Roy, D. The discrete normal distribution. Commun. Stat. Theory Methods 2003, 32, 1871–1883. [Google Scholar] [CrossRef]
  13. Roy, D. Discrete Rayleigh distribution. IEEE Trans. Reliab. 2004, 53, 255–260. [Google Scholar] [CrossRef]
  14. Lindley, D.V. Fiducial distributions and Bayes’ theorem. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1958, 20, 102–107. [Google Scholar] [CrossRef]
  15. Gómez-Déniz, E.; Calderın-Ojeda, E. The discrete Lindley distribution: Properties and applications. J. Stat. Comput. Simul. 2011, 81, 1405–1416. [Google Scholar] [CrossRef]
  16. Bakouch, H.S.; Jazi, M.A.; Nadarajah, S. A new discrete distribution. Statistics 2014, 48, 200–240. [Google Scholar] [CrossRef]
  17. Nakagawa, T.; Osaki, S. Discrete Weibull distribution. IEEE Trans. Reliab. 1975, 24, 300–301. [Google Scholar] [CrossRef]
  18. Keilson, J.; Gerber, H. Some results for discrete unimodality. J. Am. Stat. Assoc. 1971, 66, 386–389. [Google Scholar] [CrossRef]
  19. Nekoukhou, V.; Alamatsaz, M.; Bidram, H. A discrete analog of the generalized exponential distribution. Commun. Stat. Theory Methods 2012, 41, 2000–2013. [Google Scholar] [CrossRef]
  20. Abouammoh, A.; Mashhour, A. A note on the unimodality of discrete distributions. Commun. Stat. Theory Methods 1981, 10, 1345–1354. [Google Scholar] [CrossRef]
  21. Kemp, A. Classes of discrete lifetime distributions. Commun. Stat. Theory Methods 2004, 33, 3069–3093. [Google Scholar] [CrossRef]
  22. Al-Zahrani, B.; Al-Sobhi, M. On some properties of the reversed variance residual lifetime. Int. J. Stat. Probab. 2015, 4, 24–32. [Google Scholar] [CrossRef]
  23. Abouammoh, A.M.; Ahmed, A.N. The new better than used failure rate class of life distribution. Adv. Appl. Probab. 1988, 20, 237–240. [Google Scholar] [CrossRef]
  24. Shaked, M.; Shanthikumar, J.G. Stochastic Orders; Springer: New York, NY, USA, 2007. [Google Scholar]
  25. Khider, S.E.; Ahmed, A.H.N.; Mohamed, M.K. Preservation of some new partial orderings under Poisson and cumulative damage shock models. J. Jpn. Stat. Soc. 2002, 32, 95–105. [Google Scholar] [CrossRef] [Green Version]
  26. Shaked, M.; Shanthikumar, J.G. Multivariate IFRA Properties of Some Markov Jump Processes with General State Space; Working Paper; Department of Mathematics, University of Arizona: Tucson, AZ, USA, 1984; Volume 35, pp. 241–258. [Google Scholar]
  27. Johnson, N.; Kotz, S.; Kemp, A. Univariate Discrete Distributions, 2nd ed.; John Wiley and Sons: New York, NY, USA, 1992. [Google Scholar]
  28. Gray, R.M. Entropy and Information Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  29. Leadbetter, M.R.; Lindgren, G.; Rootzén, H. Extremes and Related Properties of Random Sequences and Processes; Springer: New York, NY, USA, 1987. [Google Scholar]
  30. Balabdaoui, F.; Jankowski, H.; Rufibach, K.; Pavlides, M. Asymptotics of the discrete log-concave maximum likelihood estimator and related applications. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2013, 75, 769–790. [Google Scholar] [CrossRef] [Green Version]
  31. Reath, J.; Dong, J.; Wang, M. Improved parameter estimation of the log-logistic distribution with applications. Comput. Stat. 2018, 33, 339–356. [Google Scholar] [CrossRef] [Green Version]
  32. El-Morshedy, M.; Eliwa, M.S.; Altun, E. Discrete Burr-Hatke distribution with properties, estimation methods and regression model. IEEE Access 2020, 8, 74359–74370. [Google Scholar] [CrossRef]
  33. Lawless, J.F. Statistical Models and Methods for Lifetime Data, 2nd ed.; John Wiley and Sons: New York, NY, USA, 2003. [Google Scholar]
  34. Karlis, D.; Xekalaki, E. On some discrete-valued time series models based on mixtures and thinning. In Proceedings of the Fifth Hellenic-European Conference on Computer Mathematics and Its Applications, Athens, Greece, 20–22 September 2001; pp. 872–877. [Google Scholar]
Figure 1. Probability mass function (pmf) plots of the natural discrete Lindley (NDL) distribution: θ = 0.1 (left panel) and θ = 0.3 (right panel).
Figure 2. pmf plots of the NDL distribution: θ = 0.5 (left panel) and θ = 0.05 (right panel).
Figure 3. pmf plots of the NDL distribution: θ = 0.75 (left panel) and θ = 0.95 (right panel).
Figure 4. Hazard rate function (hrf) plots of the NDL distribution: θ = 0.1 (left panel) and θ = 0.3 (right panel).
Figure 5. hrf plots of the NDL distribution: θ = 0.5 (left panel) and θ = 0.05 (right panel).
Figure 6. Entropy of X, H(X), versus θ.
Figure 7. Comparison of the bias and bias corrected for n = 30 (left panel) and n = 300 (right panel).
Figure 8. Probability–probability (PP) plots for data set I.
Figure 9. PP plots for data set II.
Figure 10. PP plots for data set III.
Table 1. Numerical values of mean, variance, and index of dispersion (ID).
θ       Mean        Variance     ID
0.01    197.0198    19,798.06    100.4877
0.05    37.0952     758.2766     20.4413
0.09    19.3873     223.1595     11.5105
0.10    17.1818     178.5124     10.3896
0.20    7.3333      38.8888      5.3030
0.30    4.1282      14.7271      3.5674
0.40    2.5714      6.8877       2.6785
0.50    1.6666      3.5555       2.1333
0.60    1.0833      1.9097       1.7628
0.70    0.6806      1.0168       1.4939
0.80    0.3888      0.5015       1.2896
0.90    0.1695      0.1915       1.1292
0.95    0.0796      0.0845       1.0613
0.99    0.0151      0.0153       1.0117
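The entries of Table 1 can be checked numerically. The sketch below is not the authors' code: it assumes that the mixture mentioned in the conclusions uses the same weights as the continuous Lindley distribution, namely θ/(1 + θ) on a geometric(θ) component and 1/(1 + θ) on a negative binomial(2, θ) component, both counting failures on 0, 1, 2, .... The function names `ndl_pmf` and `ndl_summary` and the truncation rule are illustrative choices.

```python
def ndl_pmf(x, theta):
    """Assumed NDL pmf: two-component mixture (the weights are an assumption)."""
    q = 1.0 - theta
    geometric = theta * q**x                   # Geometric(theta) on 0, 1, 2, ...
    neg_binomial = (x + 1) * theta**2 * q**x   # NegBin(2, theta) on 0, 1, 2, ...
    w = theta / (1.0 + theta)                  # assumed weight of the geometric part
    return w * geometric + (1.0 - w) * neg_binomial


def ndl_summary(theta):
    """Return (mean, variance, index of dispersion) by truncated summation."""
    n_terms = int(80 / theta) + 1              # the tail beyond this point is negligible
    mean = sum(x * ndl_pmf(x, theta) for x in range(n_terms))
    second = sum(x * x * ndl_pmf(x, theta) for x in range(n_terms))
    variance = second - mean**2
    return mean, variance, variance / mean


if __name__ == "__main__":
    for theta in (0.01, 0.20, 0.50, 0.90):
        mean, variance, dispersion = ndl_summary(theta)
        print(f"theta={theta:.2f}  mean={mean:.4f}  var={variance:.4f}  ID={dispersion:.4f}")
```

Under this assumed pmf the computed values agree with the tabulated ones to about three decimal places (the table appears to truncate rather than round), and the index of dispersion stays above 1 for every θ, which is the over-dispersion property the paper emphasizes.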
Table 2. Entropy of X versus θ.
θ       H(X)        θ       H(X)        θ       H(X)
0.03    5.06555     0.35    2.29581     0.70    1.13240
0.05    4.54033     0.40    2.10362     0.75    0.97926
0.10    3.80629     0.45    1.92492     0.80    0.82278
0.15    3.35507     0.50    1.75615     0.85    0.65954
0.20    3.01817     0.55    1.59462     0.90    0.48397
0.25    2.74307     0.60    1.43811     0.95    0.28415
0.30    2.50644     0.65    1.28467     0.99    0.07885
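The entropy values in Table 2 are consistent with the Shannon entropy in nats, H(X) = −Σ p(x) ln p(x). The sketch below evaluates that sum directly. It is an illustration under an assumption: the pmf is taken to be p(x) = θ²(x + 2)(1 − θ)^x / (1 + θ), which is the closed form implied by a geometric and negative binomial mixture with Lindley-style weights; the names `ndl_pmf` and `ndl_entropy` are hypothetical.

```python
import math


def ndl_pmf(x, theta):
    """Assumed closed form of the NDL pmf on x = 0, 1, 2, ..."""
    return theta**2 / (1.0 + theta) * (x + 2) * (1.0 - theta) ** x


def ndl_entropy(theta):
    """Shannon entropy in nats, by truncated summation of -p * ln(p)."""
    n_terms = int(80 / theta) + 1      # the tail beyond this point is negligible
    entropy = 0.0
    for x in range(n_terms):
        p = ndl_pmf(x, theta)
        if p > 0.0:                    # skip terms that underflow to zero
            entropy -= p * math.log(p)
    return entropy


if __name__ == "__main__":
    for theta in (0.05, 0.50, 0.99):
        print(f"theta={theta:.2f}  H(X)={ndl_entropy(theta):.5f}")
```

The entropy decreases as θ grows because the mass concentrates at x = 0, which matches the monotone pattern in the table.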
Table 3. Simulation results for the NDL(θ).
θ       Measure   n = 20      n = 30      n = 50      n = 100     n = 150     n = 300
0.07    AVE       0.06971     0.06945     0.06904     0.06901     0.06899     0.06886
        MSE       0.00008     0.00004     0.00002     0.00001     0.00001     0.00001
        ABI       0.00029     0.00055     0.00096     0.00099     0.00101     0.00114
        BI-C      3.63772     −1.27307    −0.31543    −0.13907    −0.03452    −0.01538
        MRE       0.10028     0.07619     0.05407     0.04439     0.03307     0.02872
0.15    AVE       0.14664     0.14546     0.14506     0.14500     0.14477     0.14472
        MSE       0.00031     0.00019     0.00011     0.00008     0.00006     0.00005
        ABI       0.00336     0.00454     0.00494     0.00500     0.00523     0.00528
        BI-C      −0.40219    −0.14380    −0.03537    −0.01562    −0.00390    −0.00173
        MRE       0.09436     0.07546     0.05587     0.04870     0.04126     0.03831
0.25    AVE       0.23795     0.23627     0.23573     0.23565     0.23532     0.23525
        MSE       0.00080     0.00058     0.00039     0.00033     0.00028     0.00026
        ABI       0.01205     0.01373     0.01427     0.01435     0.01468     0.01475
        BI-C      −0.10355    −0.03711    −0.00916    −0.00405    −0.00101    −0.00045
        MRE       0.09293     0.07949     0.06616     0.06207     0.05972     0.05922
0.35    AVE       0.32440     0.32240     0.32179     0.32172     0.32132     0.32124
        MSE       0.00168     0.00137     0.00109     0.00099     0.00092     0.00089
        ABI       0.02560     0.02760     0.02821     0.02828     0.02868     0.02876
        BI-C      −0.04822    −0.01730    −0.00428    −0.00190    −0.00047    −0.00021
        MRE       0.09734     0.08970     0.08309     0.08157     0.08200     0.08216
0.50    AVE       0.44514     0.44415     0.44273     0.44242     0.44234     0.44240
        MSE       0.00438     0.00396     0.00371     0.00360     0.00346     0.00341
        ABI       0.05486     0.05585     0.05727     0.05758     0.05766     0.05760
        BI-C      −0.02734    −0.00978    −0.00244    −0.00108    −0.00027    −0.00012
        MRE       0.11523     0.11327     0.11468     0.11519     0.11532     0.11520
0.65    AVE       0.56005     0.55840     0.55781     0.55710     0.55663     0.55676
        MSE       0.00963     0.00935     0.00895     0.00894     0.00887     0.00879
        ABI       0.08995     0.09160     0.09219     0.09290     0.09337     0.09324
        BI-C      −0.02390    −0.00854    −0.00212    −0.00094    −0.00023    −0.00010
        MRE       0.13896     0.14096     0.14182     0.14292     0.14365     0.14345
0.83    AVE       0.69453     0.69328     0.69296     0.69238     0.69203     0.69217
        MSE       0.01961     0.01948     0.01925     0.01919     0.01916     0.01908
        ABI       0.13547     0.13672     0.13704     0.13762     0.13797     0.13783
        BI-C      −0.03173    −0.01124    −0.00278    −0.00123    −0.00031    −0.00014
        MRE       0.16322     0.16472     0.16511     0.16580     0.16623     0.16606
0.95    AVE       0.80194     0.80115     0.80106     0.80066     0.80044     0.80056
        MSE       0.02268     0.02263     0.02241     0.02246     0.02244     0.02238
        ABI       0.14806     0.14885     0.14894     0.14934     0.14956     0.14944
        BI-C      −0.06013    −0.02111    −0.00519    −0.00229    −0.00057    −0.00025
        MRE       0.15585     0.15668     0.15678     0.15720     0.15743     0.15730
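A simulation of this kind can be sketched as follows. This is an illustration only and does not attempt to reproduce the design or the numbers behind Table 3, which are not described in this section. It assumes the mixture representation with weights θ/(1 + θ) and 1/(1 + θ), and it uses an estimator obtained by solving (1 + x̄)(θ² + θ) = 2, which is what the likelihood equation reduces to under that assumed pmf. The names `sample_ndl`, `estimate_theta`, and `simulate`, and the reading of AVE and MSE as the average estimate and mean squared error, are assumptions.

```python
import math
import random


def sample_geometric(theta, rng):
    """Number of failures before the first success, by inversion."""
    u = 1.0 - rng.random()                     # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - theta))


def sample_ndl(theta, rng):
    """Draw from the assumed mixture: geometric or sum of two geometrics."""
    if rng.random() < theta / (1.0 + theta):
        return sample_geometric(theta, rng)
    # NegBin(2, theta) is the sum of two independent geometric variables
    return sample_geometric(theta, rng) + sample_geometric(theta, rng)


def estimate_theta(sample):
    """Root in (0, 1] of (1 + xbar) * (theta**2 + theta) = 2 (assumed pmf)."""
    xbar = sum(sample) / len(sample)
    return (-1.0 + math.sqrt(1.0 + 8.0 / (1.0 + xbar))) / 2.0


def simulate(theta, n, replications, seed=0):
    """Return (average estimate, mean squared error) over the replications."""
    rng = random.Random(seed)
    estimates = [
        estimate_theta([sample_ndl(theta, rng) for _ in range(n)])
        for _ in range(replications)
    ]
    ave = sum(estimates) / replications
    mse = sum((e - theta) ** 2 for e in estimates) / replications
    return ave, mse


if __name__ == "__main__":
    for n in (20, 50, 300):
        ave, mse = simulate(theta=0.30, n=n, replications=2000)
        print(f"n={n:3d}  AVE={ave:.5f}  MSE={mse:.5f}")
```

Sampling through the mixture avoids tabulating the cumulative distribution function, at the cost of one extra uniform draw per observation.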
Table 4. Some descriptive statistics for remission times, numbers of fires, and COVID-19 data.
Data            Min     1st Qu.   Median   Mean    3rd Qu.   Max
Data Set I      1.00    7.00      16.50    19.55   28.25     49.00
Data Set II     0.00    2.00      4.00     5.40    8.00      43.00
Data Set III    1.00    4.00      7.00     8.34    12.50     22.00
Table 5. Fitted estimates for remission times, numbers of fires, and COVID-19 data.
Data            Model       Estimates                                     KS         p-Value
Data Set I      NDL(θ)      0.089342 (0.013524)                           0.11756    0.94505
                DL(θ)       0.095408 (0.015115)                           0.12546    0.91128
                Gc(θ)       0.048662 (0.010609)                           0.14475    0.79613
                DB(α, θ)    18.627559 (38.92987), 0.979964 (0.041469)     0.34111    0.01904
                DP(θ)       0.695781 (0.056437)                           0.35630    0.01247
                DBH(λ)      0.998365 (0.009288)                           0.751305   0.00000
Data Set II     NDL(θ)      0.250054 (0.014091)                           0.13702    0.01974
                DL(θ)       0.300157 (0.019435)                           0.15155    0.00703
                Gc(θ)       0.156290 (0.012944)                           0.16364    0.00276
                DB(α, θ)    2.502556 (0.486995), 0.761172 (0.042739)      0.19247    0.00022
                DP(θ)       0.546251 (0.029829)                           0.249597   0.00000
                DBH(λ)      0.983652 (0.012697)                           0.54740    0.00000
Data Set III    NDL(θ)      0.181266 (0.017119)                           0.09331    0.80782
                DL(θ)       0.206932 (0.021521)                           0.10142    0.71910
                DB(α, θ)    32.10536 (34.10685), 0.983601 (0.017297)      0.29787    0.00048
                DP(θ)       0.617625 (0.043428)                           0.30551    0.00031
                Gc(θ)       0.107062 (0.014756)                           0.21650    0.02441
                DBH(λ)      0.991326 (0.014370)                           0.67242    0.00000
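The NDL estimates in Table 5 can be approximately recovered from the sample means in Table 4. The sketch below assumes the pmf p(x) = θ²(x + 2)(1 − θ)^x / (1 + θ); under that assumption the likelihood equation and the first-moment equation both reduce to (1 + x̄)(θ² + θ) = 2, whose root in (0, 1] has a closed form. The function name `ndl_theta_from_mean` is hypothetical, and the inputs are the rounded means from Table 4, so small differences from the reported estimates are expected.

```python
import math


def ndl_theta_from_mean(xbar):
    """Root in (0, 1] of (1 + xbar) * (theta**2 + theta) = 2 (assumed pmf)."""
    return (-1.0 + math.sqrt(1.0 + 8.0 / (1.0 + xbar))) / 2.0


# Sample means as reported (rounded) in Table 4.
table4_means = {"Data Set I": 19.55, "Data Set II": 5.40, "Data Set III": 8.34}

# NDL estimates as reported in Table 5, for comparison.
table5_estimates = {"Data Set I": 0.089342, "Data Set II": 0.250054, "Data Set III": 0.181266}

if __name__ == "__main__":
    for name, xbar in table4_means.items():
        theta_hat = ndl_theta_from_mean(xbar)
        print(f"{name}: computed {theta_hat:.6f}, reported {table5_estimates[name]:.6f}")
```

The computed and reported values differ by less than 0.001 for all three data sets, which supports the assumed form of the pmf without confirming that this is how the authors computed their estimates.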

