The Binomial–Natural Discrete Lindley Distribution: Properties and Application to Count Data

In this paper, a new discrete distribution, called the binomial–natural discrete Lindley distribution, is proposed by compounding the binomial and natural discrete Lindley distributions. Some properties of the distribution are discussed, including the moment-generating function, moments and hazard rate function. The parameter of the distribution is estimated by the methods of moments, proportions and maximum likelihood. A simulation study is performed to compare the performance of the different estimates in terms of bias and mean square error. An application to SO2 concentration data is also presented to show that the new distribution is useful in modeling data.


Introduction
Count data modeling is a challenging task in many areas, including, but not limited to, public health, medicine, epidemiology, applied science, sociology, and agriculture. In many situations, the life length of a device cannot be measured on a continuous scale, and the survival function is assumed to be a function of a count random variable rather than of a continuous-time random variable. Discrete distributions are therefore more natural for modeling lifetime data when the outcome is discrete. The traditional discrete distributions have limited applicability as models for reliability, failure times, aggregate loss, etc., especially for over-dispersed count data, in which the variance is greater than the mean. This has led to the development of discrete distributions based on popular continuous models in reliability analysis, actuarial science, survival analysis, etc. Over the last few decades, the discretization of continuous distributions has produced many discrete distributions in the statistical literature. However, the search for a single model that suits the diverse data encountered across scientific fields remains open.
One of the many approaches to defining new models is the discretization of continuous distributions. Until recently, most discrete lifetime distributions in the statistical literature were proposed by discretizing the survival function S(x) of a continuous lifetime distribution (see, for example, references [1][2][3][4][5][6][7][8][9][10][11][12]).
In this approach, the probability mass function (pmf) of the discrete random variable X is defined as

$$P(X = x) = S(x) - S(x + 1), \qquad x = 0, 1, 2, \ldots$$

In this paper, we propose and study a new pmf, denoted by $p_x$, obtained by compounding the binomial and the NDL distributions. The basic principle of this method, discussed by Hu et al. [7], is as follows. Let N (input) and X (output) be two random variables denoting the numbers of particles entering and leaving an attenuator. Their probability functions $P(n)$ and $f(x)$ are then connected by the binomial decay transformation

$$f(x) = \sum_{n=x}^{\infty} \binom{n}{x} p^{x} (1-p)^{n-x} P(n), \qquad x = 0, 1, 2, \ldots, \tag{1}$$

where $0 \le p \le 1$ is the attenuating coefficient. Hu et al. [7] took $P(n)$ to be a Poisson distribution with parameter λ > 0 and showed that X then follows a Poisson distribution with parameter λp. For clarity, attenuators are electronic devices designed to reduce the power of a signal passing through them without appreciably degrading its integrity. They protect systems from signals whose power levels are too high to be decoded.

Déniz [13] introduced the uniform Poisson distribution using the idea of Hu et al. [7]. He replaced the binomial distribution in Equation (1) by the discrete uniform distribution while keeping $P(n)$ as the Poisson distribution. Other discrete distributions have also been proposed using the methodology of [7]: Akdogan et al. [14] proposed the uniform-geometric distribution, and Kuş et al. [15] constructed the binomial-discrete Lindley distribution.

The rest of the paper is organized as follows. Section 2 defines the natural discrete Lindley distribution, proposes the new binomial–natural discrete Lindley distribution and derives its main properties. Section 3 presents parameter estimation methods and a simulation study. Section 4 illustrates the findings with real data, and Section 5 gives some conclusions.
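The binomial decay transformation (1) and the Poisson result of Hu et al. [7] are easy to check numerically. The Python sketch below evaluates (1) and compares the output with Poisson(λp) probabilities. It truncates the infinite sum at a cut-off `n_max`, which is our assumption, chosen so that the omitted tail is negligible. The function names are illustrative and are not taken from the paper.

```python
import math

def poisson_pmf(n: int, lam: float) -> float:
    """Poisson(lam) probability at n, computed in log space to avoid overflow."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def binomial_decay(input_pmf, p: float, x: int, n_max: int = 200) -> float:
    """Equation (1) truncated at n_max:
    f(x) = sum_{n >= x} C(n, x) p^x (1 - p)^(n - x) P(n)."""
    return sum(
        math.comb(n, x) * p**x * (1 - p) ** (n - x) * input_pmf(n)
        for n in range(x, n_max + 1)
    )

if __name__ == "__main__":
    lam, p = 3.0, 0.4
    for x in range(5):
        thinned = binomial_decay(lambda n: poisson_pmf(n, lam), p, x)
        # both columns should agree: thinning Poisson(lam) gives Poisson(lam * p)
        print(x, round(thinned, 8), round(poisson_pmf(x, lam * p), 8))
```

Replacing `poisson_pmf` with any other input pmf gives the corresponding thinned distribution. The BNDL distribution is obtained in the same way from an NDL input.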

Natural Discrete Lindley Distribution
Recently, Al-Babtain et al. [16] proposed and studied a new natural discrete analog of the continuous Lindley distribution as a mixture of geometric and negative binomial distributions. The new distribution is called the natural discrete Lindley (NDL) distribution. It has many interesting properties that make it superior to many other discrete distributions, particularly in analyzing over-dispersed count data. The NDL can be applied in collective risk models and is competitive with the Poisson distribution for fitting automobile claim-frequency data. Let N be a non-negative integer-valued random variable obtained as a finite mixture of the geometric (p) and negative binomial (2, p) distributions, both counting failures before success, with mixing probabilities $p/(p+1)$ and $1/(p+1)$, respectively. The pmf of the NDL distribution is then

$$P(N = n) = \frac{p}{p+1}\,p(1-p)^{n} + \frac{1}{p+1}\,(n+1)p^{2}(1-p)^{n} = \frac{p^{2}(n+2)(1-p)^{n}}{1+p}, \qquad n = 0, 1, 2, \ldots,\; 0 < p < 1. \tag{2}$$

One of the most important features of this distribution is that it combines a single parameter with attractive properties. This makes it suitable for applications not only in insurance but also in other fields where over-dispersion is observed. For more details about this distribution, see Al-Babtain et al. [16]. Given the usefulness of the NDL distribution, its binomial compound, which we call the binomial NDL (BNDL) distribution, is natural to explore.
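The mixture construction behind (2) can be checked directly. This Python sketch codes the geometric and negative binomial (2, p) components, both counting failures before success, and compares their mixture with the closed form (2). The helper names are ours. The mean E(N) = (1 − p)(2 + p)/[p(1 + p)] used for comparison is our own derivation from the mixture weights, not a formula quoted from [16].

```python
def geometric_pmf(n: int, p: float) -> float:
    """Geometric(p): number of failures before the first success."""
    return p * (1 - p) ** n

def negbin2_pmf(n: int, p: float) -> float:
    """Negative binomial(2, p): number of failures before the second success."""
    return (n + 1) * p**2 * (1 - p) ** n

def ndl_mixture_pmf(n: int, p: float) -> float:
    """Finite mixture with weights p/(p+1) and 1/(p+1)."""
    return p / (p + 1) * geometric_pmf(n, p) + 1 / (p + 1) * negbin2_pmf(n, p)

def ndl_pmf(n: int, p: float) -> float:
    """Closed form (2)."""
    return p**2 * (n + 2) * (1 - p) ** n / (1 + p)

if __name__ == "__main__":
    p = 0.3
    support = range(2000)  # truncated support; the omitted tail is negligible here
    print(sum(ndl_pmf(n, p) for n in support))  # total probability, close to 1
    print(sum(n * ndl_pmf(n, p) for n in support), (1 - p) * (2 + p) / (p * (1 + p)))
```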

The Proposed Discrete Analog
The pmf in (1) can be expressed as

$$P(X = x) = \sum_{n=x}^{\infty} P(X = x \mid N = n)\, P(N = n), \qquad x = 0, 1, 2, \ldots,$$

where $X \mid N = n$ has the binomial $b(n, p)$ distribution. Suppose that N is the NDL random variable with parameter p given in (2). Then the pmf of the discrete random variable X is

$$p_x = P(X = x) = \sum_{n=x}^{\infty}\binom{n}{x}p^{x}(1-p)^{n-x}\,\frac{p^{2}(n+2)(1-p)^{n}}{1+p} = \frac{(1-p)^{x}\,(1 + x + 2p - p^{2})}{(1+p)(2-p)^{x+2}}, \qquad x = 0, 1, 2, \ldots \tag{3}$$

If X has the pmf (3), it is called a binomial natural discrete Lindley (BNDL) random variable, denoted by X ∼ BNDL(p). The case n = 0 means that no particles enter the attenuator, and it is termed a failure. The corresponding cumulative distribution function (cdf) of the BNDL distribution is

$$F(x; p) = P(X \le x) = 1 - \frac{(1-p)^{x+1}\,(3 + x + p - p^{2})}{(1+p)(2-p)^{x+2}}, \qquad x = 0, 1, 2, \ldots \tag{4}$$

Figure 1 shows the pmf plots of the proposed distribution for various values of the parameter p. The pmf is always decreasing, since

$$\frac{p_{x+1}}{p_x} = \frac{1-p}{2-p}\cdot\frac{x + 2 + 2p - p^{2}}{x + 1 + 2p - p^{2}} < 1 \quad \text{for } 0 < p < 1.$$

The new discrete random variable also tends to take smaller values as p increases. In other words, as p grows, the modeled event tends to occur earlier, so the BNDL model is a natural discrete substitute for the traditional exponential distribution in characterizing such phenomena. The flexibility of the proposed BNDL distribution can also be tested on count data from various sources. For example, the model may be helpful for modeling aggregate losses in actuarial data. It may also help in maximizing the overall garment fit for a given number of sizes and accommodation rate, which is crucial for assessing the goodness of the scaling system. Furthermore, it may help with over-dispersed data in the social sciences, for instance in anthropology, where civilizations grew near a consistent water source, which is necessary for human survival. Figure 2 complements the results of Figure 1.
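To make the compounding step concrete, the following Python sketch evaluates the sum defining $p_x$ term by term and compares it with the closed forms (3) and (4). The sum is truncated at a cut-off `n_max`, and the function names are hypothetical. The closed forms raise the ratio (1 − p)/(2 − p) to a power. This is algebraically identical to (3) and (4) but avoids overflowing `(2 - p) ** (x + 2)` for large x.

```python
import math

def ndl_pmf(n: int, p: float) -> float:
    """NDL pmf (2)."""
    return p**2 * (n + 2) * (1 - p) ** n / (1 + p)

def bndl_pmf(x: int, p: float) -> float:
    """Closed form (3); (1-p)^x / (2-p)^(x+2) is rewritten to avoid overflow."""
    return ((1 - p) / (2 - p)) ** x * (1 + x + 2 * p - p**2) / ((1 + p) * (2 - p) ** 2)

def bndl_pmf_compound(x: int, p: float, n_max: int = 500) -> float:
    """P(X = x) = sum_{n >= x} C(n, x) p^x (1-p)^(n-x) P(N = n), truncated at n_max."""
    return sum(
        math.comb(n, x) * p**x * (1 - p) ** (n - x) * ndl_pmf(n, p)
        for n in range(x, n_max + 1)
    )

def bndl_cdf(x: int, p: float) -> float:
    """Closed form (4)."""
    return 1 - ((1 - p) / (2 - p)) ** (x + 1) * (3 + x + p - p**2) / ((1 + p) * (2 - p))

if __name__ == "__main__":
    for x in range(5):
        print(x, bndl_pmf(x, 0.5), bndl_pmf_compound(x, 0.5), bndl_cdf(x, 0.5))
```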

Statistical Properties of the BNDL Distribution
In this section, we derive some explicit results on the mathematical properties of the BNDL distribution.

Moment-Generating Function
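If X ∼ BNDL(p), then the moment-generating function of X is

$$M_X(t) = E\left(e^{tX}\right) = \frac{1 + p\left[2 - p - (1-p)e^{t}\right]}{(1+p)\left[2 - p - (1-p)e^{t}\right]^{2}}, \qquad t < \log\frac{2-p}{1-p}.$$

For more on generating functions, see Yalcin and Simsek [17], Yalcin and Simsek [18] and Simsek [19].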

Probability-Generating Function
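The probability-generating function of X ∼ BNDL(p) can be obtained from its moment-generating function, since $G_X(t) = E(t^X) = M_X(\log t)$. Therefore, the probability-generating function of X is

$$G_X(t) = \frac{1 + p\left[2 - p - (1-p)t\right]}{(1+p)\left[2 - p - (1-p)t\right]^{2}}, \qquad |t| < \frac{2-p}{1-p}.$$

Since

$$G_X'(t) = \frac{(1-p)\left\{2 + p\left[2 - p - (1-p)t\right]\right\}}{(1+p)\left[2 - p - (1-p)t\right]^{3}},$$

setting t = 1 gives

$$E(X) = G_X'(1) = \frac{(1-p)(2+p)}{1+p}.$$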
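The generating functions of X can be checked numerically. The closed-form PGF is $G_X(t) = \{1 + p[2 - p - (1-p)t]\}/\{(1+p)[2 - p - (1-p)t]^2\}$. In this Python sketch, it is compared with the truncated series Σ t^x p_x, and a central finite difference at t = 1 is compared with E(X) = (1 − p)(2 + p)/(1 + p). The names, the step size `h` and the truncation point are our own choices.

```python
import math

def bndl_pmf(x: int, p: float) -> float:
    return ((1 - p) / (2 - p)) ** x * (1 + x + 2 * p - p**2) / ((1 + p) * (2 - p) ** 2)

def bndl_pgf(t: float, p: float) -> float:
    """G(t) = E(t^X); the series converges for |t| < (2 - p)/(1 - p)."""
    d = 2 - p - (1 - p) * t
    return (1 + p * d) / ((1 + p) * d**2)

def bndl_mgf(t: float, p: float) -> float:
    """M(t) = G(e^t), finite for t < log((2 - p)/(1 - p))."""
    return bndl_pgf(math.exp(t), p)

def bndl_mean(p: float) -> float:
    """E(X) = G'(1)."""
    return (1 - p) * (2 + p) / (1 + p)

if __name__ == "__main__":
    p, t = 0.5, 1.2
    series = sum(t**x * bndl_pmf(x, p) for x in range(300))
    print(series, bndl_pgf(t, p))  # truncated series vs closed form
    h = 1e-6
    print((bndl_pgf(1 + h, p) - bndl_pgf(1 - h, p)) / (2 * h), bndl_mean(p))
```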

Non-Central Moments and Variance
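If X ∼ BNDL(p), then the kth moment about zero of X is

$$\mu_k' = E(X^{k}) = \sum_{x=0}^{\infty} x^{k}\,\frac{(1-p)^{x}(1 + x + 2p - p^{2})}{(1+p)(2-p)^{x+2}}, \qquad k = 1, 2, \ldots$$

The first four raw moments are

$$\mu_1' = \frac{(1-p)(p+2)}{1+p},$$
$$\mu_2' = \frac{(1-p)\left[2(1-p)(p+3) + (p+2)\right]}{1+p} = \frac{(1-p)(8 - 3p - 2p^{2})}{1+p},$$
$$\mu_3' = \frac{(1-p)\left[6(1-p)^{2}(p+4) + 6(1-p)(p+3) + (p+2)\right]}{1+p},$$
$$\mu_4' = \frac{(1-p)\left[24(1-p)^{3}(p+5) + 36(1-p)^{2}(p+4) + 14(1-p)(p+3) + (p+2)\right]}{1+p}.$$

The variance of X is

$$\operatorname{Var}(X) = \mu_2' - (\mu_1')^{2} = \frac{(1-p)(4 + 5p - 2p^{2} - p^{3})}{(1+p)^{2}}.$$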
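One convenient route to the raw moments goes through factorial moments. We use it here as our own cross-check, not as the authors' derivation. Expanding the PGF of X in powers of (t − 1) gives E[X(X − 1)⋯(X − r + 1)] = r!(1 − p)^r(p + r + 1)/(1 + p). Stirling numbers of the second kind S(k, r) then convert these factorial moments to raw moments. The Python sketch below compares this route with direct summation of x^k p_x. The variance it checks is Var(X) = (1 − p)(4 + 5p − 2p² − p³)/(1 + p)².

```python
from math import factorial

def bndl_pmf(x: int, p: float) -> float:
    return ((1 - p) / (2 - p)) ** x * (1 + x + 2 * p - p**2) / ((1 + p) * (2 - p) ** 2)

def stirling2(k: int, r: int) -> int:
    """Stirling numbers of the second kind, S(k, r), by the usual recurrence."""
    if k == r:
        return 1
    if r == 0 or r > k:
        return 0
    return r * stirling2(k - 1, r) + stirling2(k - 1, r - 1)

def raw_moment(k: int, p: float) -> float:
    """mu'_k = sum_r S(k, r) * E[X(X-1)...(X-r+1)], where the factorial
    moments are r! (1 - p)^r (p + r + 1) / (1 + p)."""
    return sum(
        stirling2(k, r) * factorial(r) * (1 - p) ** r * (p + r + 1)
        for r in range(1, k + 1)
    ) / (1 + p)

def raw_moment_by_sum(k: int, p: float, x_max: int = 500) -> float:
    """Direct (truncated) evaluation of sum_x x^k p_x."""
    return sum(x**k * bndl_pmf(x, p) for x in range(x_max + 1))

def bndl_variance(p: float) -> float:
    return (1 - p) * (4 + 5 * p - 2 * p**2 - p**3) / (1 + p) ** 2

if __name__ == "__main__":
    for k in range(1, 5):
        print(k, raw_moment(k, 0.5), raw_moment_by_sum(k, 0.5))
```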

Central Moments
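The kth moment about the mean of X is

$$\mu_k = E\left[(X - \mu_1')^{k}\right] = \sum_{j=0}^{k}\binom{k}{j}(-\mu_1')^{k-j}\mu_j', \qquad \mu_0' = 1.$$

Therefore, the second, third and fourth central moments of X are

$$\mu_2 = \mu_2' - (\mu_1')^{2}, \qquad \mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^{3}$$

and

$$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^{2}\mu_2' - 3(\mu_1')^{4}.$$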

Skewness and Kurtosis
The coefficients of skewness and kurtosis of the BNDL distribution are, respectively,

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}, \qquad \gamma_2 = \frac{\mu_4}{\mu_2^{2}}.$$
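Numerical values of the skewness and kurtosis follow from the first four raw moments. This Python sketch is our own, with hypothetical names. It builds the raw moments from the factorial moments r!(1 − p)^r(p + r + 1)/(1 + p) and converts them to central moments with the standard identities μ₃ = μ′₃ − 3μ′₁μ′₂ + 2μ′₁³ and μ₄ = μ′₄ − 4μ′₁μ′₃ + 6μ′₁²μ′₂ − 3μ′₁⁴. It then returns γ₁ = μ₃/μ₂^{3/2} and γ₂ = μ₄/μ₂². Here γ₂ is the ordinary kurtosis; subtract 3 if excess kurtosis is wanted.

```python
from math import factorial

def raw_moments(p: float) -> list:
    """mu'_1..mu'_4 from the factorial moments f_r = r! (1-p)^r (p + r + 1) / (1 + p)."""
    f = [factorial(r) * (1 - p) ** r * (p + r + 1) / (1 + p) for r in range(5)]
    return [
        f[1],
        f[2] + f[1],
        f[3] + 3 * f[2] + f[1],
        f[4] + 6 * f[3] + 7 * f[2] + f[1],  # Stirling numbers S(4, r) = 1, 7, 6, 1
    ]

def skewness_kurtosis(p: float) -> tuple:
    """(gamma_1, gamma_2) for BNDL(p), via the central moments mu_2, mu_3, mu_4."""
    m1, m2, m3, m4 = raw_moments(p)
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu3 / mu2**1.5, mu4 / mu2**2

if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        print(p, skewness_kurtosis(p))
```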

Index of Dispersion
The index of dispersion, ID = Var(X)/E(X), indicates whether a distribution is suitable for under- or over-dispersed data sets. For example, ID = 1 for the Poisson distribution, whose variance equals its mean. ID > 1 for the geometric and negative binomial distributions, while ID < 1 for the binomial distribution.

Theorem 1. If X ∼ BNDL(p), then Var(X) > E(X) for all p ∈ (0, 1).
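Theorem 1 is stated without proof. One short way to see it is to write the index of dispersion explicitly. This assumes the closed forms E(X) = (1 − p)(2 + p)/(1 + p) and Var(X) = (1 − p)(4 + 5p − 2p² − p³)/(1 + p)², which we obtain from the probability-generating function of X:

```latex
\mathrm{ID}(p) = \frac{\operatorname{Var}(X)}{E(X)}
  = \frac{4 + 5p - 2p^{2} - p^{3}}{(1+p)(2+p)},
\qquad
\mathrm{ID}(p) - 1 = \frac{2 + 2p - 3p^{2} - p^{3}}{(1+p)(2+p)}
  = \frac{(1-p)\,(p^{2} + 4p + 2)}{(1+p)(2+p)} > 0
  \quad \text{for } 0 < p < 1.
```

In particular, ID(p) tends to 2 as p → 0 and to 1 as p → 1, so the over-dispersion is strongest for small p.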
By Theorem 1, the BNDL distribution is suitable only for count data with over-dispersion. Table 1 presents some numerical values of these measures. A necessary and sufficient condition for $p_x$ to be strongly unimodal is that it be log-concave, i.e., $p_{x+1}^{2} \ge p_x\, p_{x+2}$ for all x (see Keilson and Gerber [20]).

Theorem 2. The pmf of the BNDL distribution in (3) is log-concave.

Proof. From (3), we directly obtain

$$\frac{p_{x+1}}{p_x} = \frac{1-p}{2-p}\cdot\frac{x + 2 + 2p - p^{2}}{x + 1 + 2p - p^{2}} \quad\text{and}\quad \frac{p_{x+2}}{p_{x+1}} = \frac{1-p}{2-p}\cdot\frac{x + 3 + 2p - p^{2}}{x + 2 + 2p - p^{2}}.$$

Writing $y = x + 2 + 2p - p^{2} > 1$, some algebra gives

$$\frac{p_{x+1}^{2}}{p_x\, p_{x+2}} = \frac{y^{2}}{(y-1)(y+1)} = \frac{y^{2}}{y^{2} - 1} > 1$$

for all x and for all choices of p ∈ (0, 1). Theorem 2 confirms that the BNDL distribution is strongly unimodal.

Survival Function
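If X ∼ BNDL(p), then from (4), the survival function of X is

$$S(x; p) = P(X > x) = 1 - F(x; p) = \frac{(1-p)^{x+1}(3 + x + p - p^{2})}{(1+p)(2-p)^{x+2}}, \qquad x = 0, 1, 2, \ldots$$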

Hazard Rate and Mean Residual Life Functions
The hazard (failure) rate function relates the probability of failure at time x to the probability of surviving beyond time x. If X ∼ BNDL(p), then its hazard rate (failure rate) function is

$$h(x; p) = \frac{P(X = x)}{S(x; p)} = \frac{1 + x + 2p - p^{2}}{(1-p)(3 + x + p - p^{2})}, \qquad x = 0, 1, 2, \ldots$$

Clearly, the upper limit of the failure rate function is $1/(1-p)$, i.e., $\lim_{x \to \infty} h(x; p) = \frac{1}{1-p}$.
Graphical illustrations of the hazard rate function are presented in Figure 3, while descriptive measures are presented in Figure 4. The mean residual life function of X is

$$m(x; p) = E(X - x \mid X > x) = \sum_{j=x}^{\infty}\frac{S(j; p)}{S(x; p)} = (2-p)\left[1 + \frac{1-p}{3 + x + p - p^{2}}\right].$$

Corollary 1. If X ∼ BNDL(p), then X has an increasing failure rate (IFR) and a decreasing mean residual life (DMRL).
As shown in Theorem 2, the BNDL distribution is log-concave; therefore, according to Gupta et al. [21], it has the IFR property. According to Kemp [22], IFR implies DMRL, so the BNDL distribution also has a decreasing mean residual life.
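The reliability functions can be cross-checked numerically. The Python sketch below assumes the convention S(x; p) = P(X > x). It defines the hazard as h(x; p) = P(X = x)/S(x; p) and the mean residual life as m(x; p) = E(X − x | X > x). These conventions are our reading of the section rather than statements quoted from it. Under them, the closed forms are h(x; p) = (1 + x + 2p − p²)/[(1 − p)(3 + x + p − p²)] and m(x; p) = (2 − p)[1 + (1 − p)/(3 + x + p − p²)]. The code compares these with the defining ratio and sum. It also illustrates Corollary 1: an increasing hazard that approaches 1/(1 − p), and a decreasing mean residual life.

```python
def bndl_pmf(x: int, p: float) -> float:
    return ((1 - p) / (2 - p)) ** x * (1 + x + 2 * p - p**2) / ((1 + p) * (2 - p) ** 2)

def bndl_sf(x: int, p: float) -> float:
    """S(x; p) = P(X > x)."""
    return ((1 - p) / (2 - p)) ** (x + 1) * (3 + x + p - p**2) / ((1 + p) * (2 - p))

def bndl_hazard(x: int, p: float) -> float:
    """Closed form of h(x; p) = P(X = x) / S(x; p)."""
    return (1 + x + 2 * p - p**2) / ((1 - p) * (3 + x + p - p**2))

def bndl_mrl(x: int, p: float) -> float:
    """Closed form of m(x; p) = E(X - x | X > x)."""
    return (2 - p) * (1 + (1 - p) / (3 + x + p - p**2))

def bndl_mrl_by_sum(x: int, p: float, j_max: int = 2000) -> float:
    """Same quantity from m(x) = sum_{j >= x} S(j) / S(x), truncated at j_max."""
    return sum(bndl_sf(j, p) for j in range(x, j_max)) / bndl_sf(x, p)

if __name__ == "__main__":
    p = 0.5
    for x in range(5):
        print(x, bndl_hazard(x, p), bndl_mrl(x, p))
    print("hazard limit 1/(1-p) =", 1 / (1 - p))
```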

Stochastic Orderings
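Stochastic orders are important tools for comparing the behavior of random variables. Shaked and Shanthikumar [8] showed that many stochastic orders exist and have various applications. Given two random variables X and Y, we say that X is smaller than Y in the:

1. usual stochastic order, denoted by $X \le_{\mathrm{st}} Y$, if $S_X(x) \le S_Y(x)$ for all x;
2. hazard rate order, denoted by $X \le_{\mathrm{hr}} Y$, if $h_X(x) \ge h_Y(x)$ for all x;
3. reversed hazard rate order, denoted by $X \le_{\mathrm{rh}} Y$, if $F_X(x)/F_Y(x)$ decreases in x;
4. mean residual life order, denoted by $X \le_{\mathrm{mrl}} Y$, if $m_X(x) \le m_Y(x)$ for all x;
5. likelihood ratio order, denoted by $X \le_{\mathrm{lr}} Y$, if $p_X(x)/p_Y(x)$ decreases in x.

For all the previous orders, we have the following chains of implications:

$$X \le_{\mathrm{lr}} Y \Rightarrow X \le_{\mathrm{hr}} Y \Rightarrow X \le_{\mathrm{mrl}} Y, \qquad X \le_{\mathrm{lr}} Y \Rightarrow X \le_{\mathrm{rh}} Y \Rightarrow X \le_{\mathrm{st}} Y, \qquad X \le_{\mathrm{hr}} Y \Rightarrow X \le_{\mathrm{st}} Y.$$

Theorem 3. Let X ∼ BNDL($p_1$) and Y ∼ BNDL($p_2$); then $X \le_{\mathrm{lr}} Y$ for all $p_1 > p_2$.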
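Theorem 3 is stated without proof here. A short argument, which is our own sketch based on the pmf (3), uses the likelihood ratio of the two pmfs:

```latex
\frac{p_X(x)}{p_Y(x)}
  = \frac{(1+p_2)(2-p_2)^{2}}{(1+p_1)(2-p_1)^{2}}
    \left[\frac{g(p_1)}{g(p_2)}\right]^{x}
    \frac{x + A(p_1)}{x + A(p_2)},
\qquad
g(p) = \frac{1-p}{2-p}, \quad A(p) = 1 + 2p - p^{2}.
```

Because g′(p) = −1/(2 − p)² < 0, taking p₁ > p₂ gives g(p₁)/g(p₂) < 1, so the power term decreases in x. Because A′(p) = 2 − 2p > 0 on (0, 1), A(p₁) > A(p₂). Hence (x + A(p₁))/(x + A(p₂)) = 1 + [A(p₁) − A(p₂)]/(x + A(p₂)) also decreases in x. The ratio is therefore a product of positive decreasing factors, so X ≤_lr Y. By the chains of implications, X is then also smaller than Y in the hr, rh, mrl and st orders.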

Entropy
Entropy is a measure of the uncertainty of a random variable. The entropy of a discrete random variable X with pmf p(x) and alphabet $\mathcal{X}$ is

$$H(X) = -\sum_{x \in \mathcal{X}} p(x)\log p(x).$$

Entropy can be interpreted as the average uncertainty in X or, when the logarithm is taken to base 2, as the average number of bits needed to describe X. For more details on entropy and information theory, we refer the reader to Gray [23]. If X ∼ BNDL(p), taking logarithms in (3) shows that the entropy of X can be calculated as

$$H(X) = \log(1+p) + \left[E(X) + 2\right]\log(2-p) - E(X)\log(1-p) - E\left[\log\left(1 + X + 2p - p^{2}\right)\right],$$

where $E(X) = (1-p)(2+p)/(1+p)$. Table 2 presents some numerical values of the entropy of X ∼ BNDL(p) for different choices of p. From Table 2, one can observe that H(X) is monotonically decreasing in p ∈ (0, 1). It tends to approximately 1.88 as p → 0 and to zero as p → 1. Figure 5 relates H(X) to the values of the parameter p.
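The entropy values can be reproduced numerically. This Python sketch sums −p_x log p_x with the natural logarithm, so the result is in nats. It then checks the sum against the expectation form H(X) = log(1 + p) + [E(X) + 2] log(2 − p) − E(X) log(1 − p) − E[log(1 + X + 2p − p²)]. The function names and the truncation point `x_max` are assumptions. With natural logarithms, the value as p → 0 comes out near 1.88, in line with the limit reported from Table 2.

```python
import math

def bndl_pmf(x: int, p: float) -> float:
    return ((1 - p) / (2 - p)) ** x * (1 + x + 2 * p - p**2) / ((1 + p) * (2 - p) ** 2)

def bndl_entropy(p: float, x_max: int = 3000) -> float:
    """H(X) = -sum_x p_x log p_x in nats, truncated at x_max."""
    h = 0.0
    for x in range(x_max + 1):
        px = bndl_pmf(x, p)
        if px > 0.0:  # skip far-tail terms that underflow to zero
            h -= px * math.log(px)
    return h

def bndl_entropy_formula(p: float, x_max: int = 3000) -> float:
    """Expectation form obtained by taking logarithms in the pmf."""
    mean = (1 - p) * (2 + p) / (1 + p)
    e_log = sum(bndl_pmf(x, p) * math.log(1 + x + 2 * p - p**2) for x in range(x_max + 1))
    return math.log(1 + p) + (mean + 2) * math.log(2 - p) - mean * math.log(1 - p) - e_log

if __name__ == "__main__":
    for p in (1e-6, 0.1, 0.5, 0.9, 0.999):
        print(p, round(bndl_entropy(p), 4))
```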

Estimation and Simulation
In this section, we estimate the unknown parameter p by the maximum likelihood, moment and proportion methods.

Method of Maximum Likelihood Estimation
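Let $x_1, x_2, \ldots, x_n$ be the observed values from the BNDL distribution with parameter p. The likelihood and log-likelihood functions are, respectively,

$$L(p) = \prod_{i=1}^{n} p_{x_i} = \frac{(1-p)^{\sum_{i=1}^{n} x_i}}{(1+p)^{n}(2-p)^{\sum_{i=1}^{n} x_i + 2n}}\prod_{i=1}^{n}\left(1 + x_i + 2p - p^{2}\right),$$

$$\ell(p) = \log L(p) = \sum_{i=1}^{n}x_i\log(1-p) - n\log(1+p) - \left(\sum_{i=1}^{n}x_i + 2n\right)\log(2-p) + \sum_{i=1}^{n}\log\left(1 + x_i + 2p - p^{2}\right).$$

The maximum likelihood estimate (MLE) $\hat{p}$ of p can be obtained by solving the following equation numerically:

$$\frac{d\ell(p)}{dp} = -\frac{\sum_{i=1}^{n} x_i}{1-p} - \frac{n}{1+p} + \frac{\sum_{i=1}^{n} x_i + 2n}{2-p} + \sum_{i=1}^{n}\frac{2(1-p)}{1 + x_i + 2p - p^{2}} = 0.$$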
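The score equation has no closed-form root, so a numerical method is needed. The Python sketch below is one possible implementation, not the authors' code. It brackets the root of the score dℓ/dp = −Σx_i/(1 − p) − n/(1 + p) + (Σx_i + 2n)/(2 − p) + Σ 2(1 − p)/(1 + x_i + 2p − p²) by bisection, assuming a single sign change on (0, 1). It also includes a sampler that draws N from the NDL mixture and then thins it binomially, mirroring the construction of (3). All function names are hypothetical.

```python
import math
import random

def bndl_loglik(p: float, data: list) -> float:
    n, s = len(data), sum(data)
    return (s * math.log(1 - p) - n * math.log(1 + p) - (s + 2 * n) * math.log(2 - p)
            + sum(math.log(1 + x + 2 * p - p * p) for x in data))

def bndl_score(p: float, data: list) -> float:
    """Derivative of the log-likelihood with respect to p."""
    n, s = len(data), sum(data)
    return (-s / (1 - p) - n / (1 + p) + (s + 2 * n) / (2 - p)
            + sum(2 * (1 - p) / (1 + x + 2 * p - p * p) for x in data))

def bndl_mle(data: list, lo: float = 1e-9, hi: float = 1 - 1e-9, iters: int = 100) -> float:
    """Bisection on the score; assumes at most one sign change in (lo, hi)."""
    if bndl_score(lo, data) <= 0:
        return lo  # likelihood already decreasing at the left end
    if bndl_score(hi, data) >= 0:
        return hi  # e.g. an all-zero sample pushes the MLE to the right end
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bndl_score(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bndl_sample(n: int, p: float, rng: random.Random) -> list:
    """Draw N ~ NDL(p) as a geometric/NB(2) mixture, then X ~ Binomial(N, p)."""
    def geometric() -> int:  # failures before the first success, by inversion
        return math.floor(math.log(1.0 - rng.random()) / math.log(1.0 - p))
    sample = []
    for _ in range(n):
        big_n = geometric() if rng.random() < p / (1 + p) else geometric() + geometric()
        sample.append(sum(rng.random() < p for _ in range(big_n)))
    return sample

if __name__ == "__main__":
    rng = random.Random(2022)
    data = bndl_sample(2000, 0.4, rng)
    print(bndl_mle(data))
```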

Method of Moments Estimation
Let $X_1, X_2, \ldots, X_n$ be a random sample from the BNDL distribution with parameter p. The moment estimate (ME) of p can be obtained by solving the following equation:

$$\bar{X} = \frac{(1-p)(2+p)}{1+p}, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
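Because E(X) = (1 − p)(2 + p)/(1 + p) is quadratic over linear in p, the moment equation can be solved in closed form. Rearranging X̄(1 + p) = (1 − p)(2 + p) gives p² + (1 + X̄)p + (X̄ − 2) = 0. The admissible root, worked out by us rather than quoted from the paper, is p̂ = [−(1 + X̄) + √((X̄ − 1)² + 8)]/2, which lies in (0, 1) exactly when 0 < X̄ < 2. A Python sketch (the sample in the usage block is toy data, not from the paper):

```python
import math

def bndl_moment_estimate(xbar: float) -> float:
    """Root in (0, 1) of xbar = (1 - p)(2 + p)/(1 + p).

    Rearranging gives p^2 + (1 + xbar) p + (xbar - 2) = 0, whose admissible
    root is p = [-(1 + xbar) + sqrt((xbar - 1)^2 + 8)] / 2.
    """
    if not 0.0 < xbar < 2.0:
        raise ValueError("a moment estimate in (0, 1) exists only when 0 < xbar < 2")
    return (-(1.0 + xbar) + math.sqrt((xbar - 1.0) ** 2 + 8.0)) / 2.0

if __name__ == "__main__":
    sample = [0, 1, 0, 2, 0, 0, 3, 1, 0, 1]  # toy data for illustration only
    xbar = sum(sample) / len(sample)
    print(bndl_moment_estimate(xbar))
```

The restriction 0 < X̄ < 2 reflects the fact that E(X) decreases from 2 to 0 as p moves from 0 to 1. Samples with larger means cannot be fitted by this method.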

Method of Proportions Estimation
Let $X_1, X_2, \ldots, X_n$ be a random sample from the BNDL distribution with parameter p. For $i = 1, 2, \ldots, n$, we define the indicator functions

$$I(X_i) = \begin{cases} 1, & X_i = 0, \\ 0, & X_i > 0. \end{cases}$$

The proportion of zeros in the sample is then $\Pi = \frac{1}{n}\sum_{i=1}^{n} I(X_i)$. The proportion estimate (PE) of p can be obtained by solving the following equation with respect to p:

$$\Pi = \frac{1 + 2p - p^{2}}{(p + 1)(2 - p)^{2}}.$$
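The proportion equation is also easy to solve numerically. P(X = 0) = (1 + 2p − p²)/[(1 + p)(2 − p)²] increases from 1/4 at p = 0 to 1 at p = 1, because its logarithmic derivative (2 − 2p)/(1 + 2p − p²) − 1/(1 + p) + 2/(2 − p) is positive. Bisection therefore finds the unique root whenever 1/4 < Π < 1. This Python sketch is our illustration, with hypothetical function names.

```python
def zero_probability(p: float) -> float:
    """P(X = 0) = (1 + 2p - p^2) / ((1 + p)(2 - p)^2), increasing on [0, 1]."""
    return (1 + 2 * p - p * p) / ((1 + p) * (2 - p) ** 2)

def solve_proportion_equation(pi_hat: float, tol: float = 1e-12) -> float:
    """Bisection for zero_probability(p) = pi_hat; a root in (0, 1) needs 1/4 < pi_hat < 1."""
    if not 0.25 < pi_hat < 1.0:
        raise ValueError("the proportion equation has a root in (0, 1) only for 1/4 < pi_hat < 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if zero_probability(mid) < pi_hat:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bndl_proportion_estimate(data: list) -> float:
    """PE of p from the sample proportion of zeros, Pi = (1/n) * sum I(X_i)."""
    pi_hat = sum(1 for x in data if x == 0) / len(data)
    return solve_proportion_equation(pi_hat)

if __name__ == "__main__":
    print(bndl_proportion_estimate([0, 0, 1, 0, 2, 0, 3, 0, 1, 0]))  # toy data
```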

Simulation Study
In this section, we assess the finite-sample behavior of the maximum likelihood estimator through a simulation study based on the BNDL distribution. The study proceeds in three steps. First, generate N = 1000 samples of each size n = 25, 50, ..., 500 from the BNDL distribution. Then, compute the MLE of p for each sample. Finally, compute the MSE

$$\mathrm{MSE}(\hat{p}) = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{p}_i - p\right)^{2},$$

where $\hat{p}_i$ is the estimate obtained from the ith sample. For various parameter values, the results in Figure 6 indicate that the estimated MSEs decrease toward zero as the sample size n increases, which is consistent with the MLE of p being consistent. Asymptotic normality of the MLE is a well-known classical property, defined as follows. In a parametric model, an estimator $\hat{p}$ based on $X_1, X_2, \ldots, X_n$ is consistent if $\hat{p} \to p$ in probability as $n \to \infty$. It is asymptotically normal if $\sqrt{n}(\hat{p} - p)$ converges in distribution to a normal distribution. Under the usual regularity conditions, the MLE $\hat{p}$ above is therefore consistent and asymptotically normal.
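A scaled-down version of the simulation design can be sketched in Python. This is our illustration, not the authors' code. It generates BNDL samples by compounding, drawing N from the NDL mixture and then X ~ Binomial(N, p). It computes the MLE by bisection on the score equation, with the data grouped by value for speed, and reports the bias and MSE for each sample size. The replication count `reps` is a parameter, so it can be set to N = 1000 as in the study or lowered for a quick run.

```python
import math
import random
from collections import Counter

def sample_bndl(n: int, p: float, rng: random.Random) -> list:
    """N ~ NDL(p) as a geometric/NB(2) mixture, then X ~ Binomial(N, p)."""
    def geometric() -> int:
        return math.floor(math.log(1.0 - rng.random()) / math.log(1.0 - p))
    out = []
    for _ in range(n):
        big_n = geometric() if rng.random() < p / (1 + p) else geometric() + geometric()
        out.append(sum(rng.random() < p for _ in range(big_n)))
    return out

def mle(data: list, iters: int = 60) -> float:
    """Bisection on the score equation; assumes a single sign change on (0, 1)."""
    n, s, counts = len(data), sum(data), Counter(data)
    def score(p: float) -> float:
        return (-s / (1 - p) - n / (1 + p) + (s + 2 * n) / (2 - p)
                + sum(c * 2 * (1 - p) / (1 + x + 2 * p - p * p) for x, c in counts.items()))
    lo, hi = 1e-9, 1 - 1e-9
    if score(lo) <= 0:
        return lo
    if score(hi) >= 0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

def simulate(p: float, sizes, reps: int, seed: int = 2022) -> dict:
    """Bias and MSE of the MLE over `reps` replications for each sample size."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        estimates = [mle(sample_bndl(n, p, rng)) for _ in range(reps)]
        bias = sum(e - p for e in estimates) / reps
        mse = sum((e - p) ** 2 for e in estimates) / reps
        results[n] = (bias, mse)
    return results

if __name__ == "__main__":
    for n, (bias, mse) in simulate(0.4, (25, 100, 500), reps=200).items():
        print(n, round(bias, 4), round(mse, 5))
```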

Applications to Count Data
In this section, we use a real-life data set to examine the performance of the BNDL distribution in modeling real data. The data set was recently studied by Balakrishnan et al. [24] and consists of 744 discrete observations. Santiago, Chile, is recognized as one of the most environmentally contaminated cities in the world. To obtain the level of air pollution and its associated adverse effects on humans in Santiago, the National Commission of Environment (CONAMA) of the government of Chile collects data on sulfur dioxide (SO2) concentrations in the air. The data set consists of the hourly SO2 concentrations (in ppm) observed at a monitoring station located in Santiago. We compare the BNDL distribution with the binomial-discrete Lindley distribution (BDLD) of Kuş et al. [15], whose pmf is given in [15], and with the negative binomial distribution. As model-selection criteria, we consider the AIC (Akaike information criterion), CAIC (consistent Akaike information criterion), BIC (Bayesian information criterion) and HQIC (Hannan–Quinn information criterion). The model with the smallest values of these statistics is chosen as the best model for the data. All results in Table 3 were obtained using R. Figure 7 gives the quantile-quantile (Q-Q) plot and box plot of the data. Figure 8 gives the total time on test (TTT) plot together with the empirical hazard rate function (EHRF) for the data set. The TTT plot shows that the data set has an increasing hazard rate shape, which is confirmed by the EHRF. Figures 9 and 10 show the fitted model against the competing distributions. These plots indicate that the BNDL model fits the data better than the well-known BDLD and negative binomial models.
Figure 9. Fitted plots of BNDL and BDLD distribution for given data set.
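For reference, the four criteria can be computed from a fitted model's maximized log-likelihood ℓ̂, its number of parameters k and the sample size n (n = 744 here). The Python sketch uses the common definitions AIC = −2ℓ̂ + 2k, BIC = −2ℓ̂ + k log n and HQIC = −2ℓ̂ + 2k log(log n). For CAIC it assumes Bozdogan's consistent AIC, −2ℓ̂ + k(log n + 1), because the section does not state which variant was used. No fitted values from Table 3 are reproduced, and any numbers passed in are placeholders.

```python
import math

def information_criteria(loglik: float, k: int, n: int) -> dict:
    """AIC, CAIC, BIC and HQIC for a model with k parameters fitted to n observations.

    CAIC is taken here as Bozdogan's consistent AIC, -2*loglik + k*(log n + 1);
    other papers use different variants, so this choice is an assumption.
    """
    m2l = -2.0 * loglik
    return {
        "AIC": m2l + 2 * k,
        "CAIC": m2l + k * (math.log(n) + 1),
        "BIC": m2l + k * math.log(n),
        "HQIC": m2l + 2 * k * math.log(math.log(n)),
    }

def best_model(fits: dict, n: int, criterion: str = "AIC") -> str:
    """Name of the model with the smallest criterion; fits maps name -> (loglik, k)."""
    return min(fits, key=lambda name: information_criteria(*fits[name], n)[criterion])
```

For instance, `best_model({"BNDL": (ll_1, 1), "NB": (ll_2, 2)}, 744)` picks the model with the smallest AIC. Here `ll_1` and `ll_2` stand for maximized log-likelihoods obtained from a separate fit.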

Concluding Remarks
A new one-parameter discrete distribution was proposed, and its important distributional, monotonicity and reliability characteristics were explored. Some statistical and reliability properties of the proposed discrete model were derived. Various estimation approaches were discussed. A simulation study was conducted to assess the accuracy and precision of the MLE. The applicability of the proposed distribution in modeling a real-life discrete data set was demonstrated. The comparison shows that the new distribution gives the best fit to the data set among all the tested distributions, and it should be a useful contribution to the field of count data modeling.
