The Binomial–Natural Discrete Lindley Distribution: Properties and Application to Count Data

Shakaiba Shafiq; Sadaf Khan; Waleed Marzouk; Jiju Gillariose; Farrukh Jamal

doi:10.3390/mca27040062

¹ Department of Statistics, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan

² Faculty of Graduate Studies for Statistical Research, Department of Mathematical Statistics, Cairo University, Giza 12613, Egypt

³ Department of Statistics, CHRIST (Deemed to be University), Hosur Road, Bangalore 560029, India

* Author to whom correspondence should be addressed.

Math. Comput. Appl. 2022, 27(4), 62; https://doi.org/10.3390/mca27040062

This article belongs to the Special Issue Computational Mathematics and Applied Statistics


Abstract

In this paper, a new discrete distribution called the binomial–natural discrete Lindley distribution is proposed by compounding the binomial and natural discrete Lindley distributions. Some properties of the distribution are discussed, including the moment-generating function, moments and hazard rate function. Estimation of the distribution’s parameter is studied by the methods of moments, proportions and maximum likelihood. A simulation study is performed to compare the performance of the different estimates in terms of bias and mean square error. An application to SO₂ data is also presented to show that the new distribution is useful in modeling count data.

Keywords: discretizing; natural discrete Lindley distribution; over-dispersion; maximum likelihood estimation

1. Introduction

Count data modeling is a challenging task in many areas, including, but not limited to, public health, medicine, epidemiology, applied science, sociology, and agriculture. In many situations, the life length of a device cannot be measured on a continuous scale, and the survival function is assumed to be a function of a count random variable instead of a continuous-time random variable. Therefore, discrete distributions are meaningful for modeling lifetime data in situations where the output is of a discrete nature. The traditional discrete distributions have limited applicability as models for reliability, failure times, aggregate loss, etc., especially for count data with over-dispersion, in which the variance is greater than the mean. This has led to the development of some discrete distributions based on popular continuous models in reliability analysis, actuarial sciences, survival analysis, etc. The discretization of continuous distributions has produced many discrete distributions in the statistical literature over the last few decades. However, the quest for a quintessential model remains open across diverse scientific fields.

One of the many approaches to define new models is the discretization of distributions. Until recently, the majority of discrete lifetime distributions have been proposed in the statistical literature by discretizing the survival function $S(x)$ of continuous lifetime distributions (see, for example, references [1,2,3,4,5,6,7,8,9,10,11,12]).

The probability mass function (pmf) $P(X = x)$ is defined as follows

$$P(X = x) = S(x) - S(x + 1), \quad x = 0, 1, 2, \dots$$
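The discretization rule above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the exponential survival function and the value of `rate` are assumptions chosen only because the resulting discrete law is the familiar geometric distribution, which makes the output easy to check.

```python
import math

def discretize_survival(survival, x_max):
    """Return [P(X = x) for x = 0..x_max] using P(X = x) = S(x) - S(x + 1)."""
    return [survival(x) - survival(x + 1) for x in range(x_max + 1)]

# Assumed example: S(x) = exp(-rate * x), the exponential survival function.
rate = 0.7
pmf = discretize_survival(lambda x: math.exp(-rate * x), x_max=200)

# The discrete analog is geometric with success probability 1 - exp(-rate).
theta = 1.0 - math.exp(-rate)
geometric = [theta * (1.0 - theta) ** x for x in range(201)]

largest_gap = max(abs(a - b) for a, b in zip(pmf, geometric))
print(largest_gap)  # expected to be at rounding-error level
```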

Departing from this method, Afify et al. [12] introduced and studied a new discrete Lindley distribution by constructing a mixture of discrete analogs of the continuous components used in creating the continuous Lindley distribution.

In this paper, we propose and study a new probability mass function (pmf), denoted by $p_{x}$, by compounding the binomial and the natural discrete Lindley (NDL) distributions. The basic principle of this method is as follows: if $N$ (input) and $X$ (output) are two random variables denoting the number of particles entering and leaving an attenuator, then the probability functions $p(n)$ and $f(x)$ of these two random variables are connected by the binomial decay transformation

$$P(X = x) = \sum_{n = x}^{\infty} \binom{n}{x} p^{x} (1 - p)^{n - x} p(n); \quad x = 0, 1, 2, \dots$$

(1)

where $0 \leq p \leq 1$ is the attenuating coefficient, as discussed by Hu et al. [7]. They considered $p(n)$ to be a Poisson distribution with parameter $λ > 0$, and then showed that $\Pr(X = x)$ is the Poisson distribution with parameter $λ p$. For clarity, attenuators are electrical devices built to reduce the level of a signal passing through them without severely compromising the signal’s integrity. They serve as a safeguard against systems being exposed to signals with power levels that are too high to be decoded. Déniz [13] introduced the uniform Poisson distribution using the idea of Hu et al. [7], replacing the binomial distribution in Equation (1) with the discrete uniform distribution and keeping $p(n)$ as the Poisson distribution. Other discrete distributions have also been proposed in the literature using the methodology of [7]: Akdoğan et al. [14] proposed the uniform-geometric distribution and Kuş et al. [15] constructed the binomial–discrete Lindley distribution.
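The Poisson result of Hu et al. [7] quoted above can be checked numerically from Equation (1). The following is a sketch only: the truncation point `n_max` and the values of `lam` and `p` are illustrative assumptions, and the helper names are not taken from the paper.

```python
import math

def poisson_pmf(n, lam):
    # Computed on the log scale so that large n does not overflow.
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def attenuated_pmf(x, input_pmf, p, n_max=200):
    """Truncated version of Equation (1): sum over n = x..n_max."""
    return sum(binomial_pmf(x, n, p) * input_pmf(n) for n in range(x, n_max + 1))

lam, p = 4.0, 0.3  # illustrative values
output = [attenuated_pmf(x, lambda n: poisson_pmf(n, lam), p) for x in range(15)]
target = [poisson_pmf(x, lam * p) for x in range(15)]

print(max(abs(a - b) for a, b in zip(output, target)))  # close to zero
```

Passing a different `input_pmf` to `attenuated_pmf` gives the output law for any other input distribution, which is how the compounding in this paper is built.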

The rest of the paper is arranged as follows: Section 2 defines the natural discrete Lindley distribution, proposes the new binomial–natural discrete Lindley distribution, and derives its important properties. Section 3 covers parameter estimation and a simulation study. Section 4 illustrates the findings with real data. Section 5 provides some conclusions.

2. Natural Discrete Lindley Distribution

Recently, Al-Babtain et al. [16] proposed and studied a new natural discrete analog of the continuous Lindley distribution as a mixture of geometric and negative binomial distributions. The new distribution is called the natural discrete Lindley (NDL) distribution, and it has many interesting properties that make it superior to many other discrete distributions, particularly in analyzing over-dispersed count data. The NDL can be applied in collective risk models and is competitive with the Poisson distribution in fitting automobile-claim-frequency data. Let $N$ be a non-negative random variable obtained as a finite mixture of geometric ($p$) and negative binomial (2, $p$) distributions with mixing probabilities $\frac{p}{p + 1}$ and $\frac{1}{p + 1}$, respectively; then the probability mass function of the NDL distribution is defined as

$$P(N = n) = \frac{p^{2}}{p + 1} (2 + n) (1 - p)^{n}; \quad n = 0, 1, 2, \dots \text{ and } p \in (0, 1)$$

(2)

One of the most important features of this distribution is that it has a single parameter and attractive properties, which make it suitable for applications not only in insurance settings but also in other fields where over-dispersion is observed. For more details about this distribution, see Al-Babtain et al. [16]. Given the usefulness of the NDL distribution, its binomial compound, known as the binomial NDL (BNDL) distribution, is naturally interesting to explore.
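The mixture representation of the NDL pmf can be verified directly. The sketch below assumes that both components count failures on $0, 1, 2, \dots$ (geometric pmf $p(1-p)^{n}$ and negative binomial pmf $(n+1)p^{2}(1-p)^{n}$); this parameterization is an assumption made here because it reproduces Equation (2), and the function names are illustrative.

```python
def geometric_pmf(n, p):
    return p * (1.0 - p) ** n

def negative_binomial2_pmf(n, p):
    # Negative binomial with 2 successes, counting failures n = 0, 1, 2, ...
    return (n + 1) * p ** 2 * (1.0 - p) ** n

def ndl_pmf(n, p):
    """Closed form of Equation (2)."""
    return p ** 2 / (p + 1.0) * (2 + n) * (1.0 - p) ** n

def ndl_mixture_pmf(n, p):
    weight = p / (p + 1.0)  # mixing probability of the geometric component
    return weight * geometric_pmf(n, p) + (1.0 - weight) * negative_binomial2_pmf(n, p)

p = 0.35  # illustrative value
gap = max(abs(ndl_pmf(n, p) - ndl_mixture_pmf(n, p)) for n in range(100))
total = sum(ndl_pmf(n, p) for n in range(2000))
print(gap, total)  # gap near zero, total near one
```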

2.1. The Proposed Discrete Analog

The probability mass function (1) can be expressed as

$$P(X = x) = \sum_{n = x}^{\infty} P(X = x \mid N = n) P(N = n),$$

where $X \mid N = n$ has the binomial $b(n, p)$ distribution. Suppose that $N$ is a random variable from the NDL distribution with parameter $p$ given in (2); then, writing $k = n - x$, the probability mass function of the discrete random variable $X$ is obtained as

$$\begin{aligned} p_{x}(x; p) = P(X = x) &= \sum_{n = x}^{\infty} P(X = x \mid N = n) P(N = n) \\ &= \sum_{n = x}^{\infty} \binom{n}{x} p^{x} (1 - p)^{n - x} \frac{p^{2}}{p + 1} (2 + n) (1 - p)^{n} \\ &= \sum_{k = 0}^{\infty} \binom{x + k}{x} p^{x} (1 - p)^{k} \frac{p^{2}}{p + 1} (2 + x + k) (1 - p)^{x + k} \\ &= \frac{p^{2}}{p + 1} \sum_{k = 0}^{\infty} \binom{x + k}{x} p^{x} (2 + x + k) (1 - p)^{x + 2k} \\ &= \frac{(1 - p)^{x} (1 + x + 2p - p^{2})}{(p + 1) (2 - p)^{x + 2}}; \quad x = 0, 1, 2, \dots \text{ and } p \in (0, 1) \end{aligned}$$

(3)

If $X$ has the pmf (3), then it is called a binomial natural discrete Lindley (BNDL) random variable and it is denoted by $X \sim BNDL(p)$.
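As a quick numerical cross-check of the derivation, the closed form (3) can be compared with a truncated version of the compounding sum. This is a sketch under stated assumptions: `n_max` is an arbitrary cut-off, the function names are illustrative, and the pmf is written with the ratio $(1-p)/(2-p)$ purely to avoid floating-point overflow for large $x$.

```python
import math

def ndl_pmf(n, p):
    return p ** 2 / (p + 1.0) * (2 + n) * (1.0 - p) ** n

def bndl_pmf(x, p):
    """Closed form of Equation (3)."""
    ratio = (1.0 - p) / (2.0 - p)
    return ratio ** x * (1 + x + 2 * p - p ** 2) / ((p + 1.0) * (2.0 - p) ** 2)

def bndl_pmf_by_compounding(x, p, n_max=600):
    """Truncated sum over n of the Binomial(n, p) pmf at x times the NDL pmf at n."""
    total = 0.0
    for n in range(x, n_max + 1):
        total += math.comb(n, x) * p ** x * (1.0 - p) ** (n - x) * ndl_pmf(n, p)
    return total

for p in (0.2, 0.5, 0.8):
    worst = max(abs(bndl_pmf(x, p) - bndl_pmf_by_compounding(x, p)) for x in range(10))
    print(p, worst)  # differences at rounding-error level
```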

For $n = 0$, no particles enter the attenuator, and this is termed a failure. The corresponding cumulative distribution function (cdf) of the BNDL distribution is given by

$$F(x; p) = P(X \leq x) = \sum_{t = 0}^{x} p_{x}(t; p) = \sum_{t = 0}^{x} \frac{(1 - p)^{t} (1 + t + 2p - p^{2})}{(p + 1) (2 - p)^{t + 2}} = 1 - \frac{(1 - p)^{x + 1} (3 + x + p - p^{2})}{(p + 1) (2 - p)^{x + 2}} .$$

(4)

Figure 1 shows the probability mass function (pmf) plots of the proposed distribution for various values of the parameter p. The pmf is always a decreasing function, and the new discrete random variable tends to take small values when p increases. The model’s behavior thus implies that the underlying stochastic process tends to terminate very quickly as the parameter value grows. Therefore, the BNDL model is a logical discrete substitute for the traditional exponential distribution in characterizing such phenomena. Additionally, the flexibility of the proposed BNDL can be tested on varied count data sources. For example, this model may be helpful for simulating aggregate losses, which are typically limited to actuarial data, or for maximizing the overall garment fit for a particular number of sizes and accommodation rate, which is crucial to assessing the goodness of the sizing system. Furthermore, it may help to address over-dispersed data in the social sciences, as in anthropology, where civilizations grew near a consistent water source, which is necessary for human survival. Figure 2 complements the results of Figure 1.

Figure 1. Pmf of BNDL distribution for some choices of p.

Figure 2. Histograms of the BNDL model for simulated data.

2.2. Statistical Properties of the BNDL Distribution

Primarily in this section, we provide some explicit results based on the mathematical properties of the BNDL distribution.

2.2.1. Moment-Generating Function

If $X \sim BNDL(p)$, then the moment-generating function of $X$ is given as

$$M_{X}(t) = E(e^{tX}) = \sum_{x = 0}^{\infty} e^{tx} \frac{(1 - p)^{x} (1 + x + 2p - p^{2})}{(p + 1) (2 - p)^{x + 2}} = \frac{1 - p(e^{t} - 2) + p^{2}(e^{t} - 1)}{(2 - e^{t} + p e^{t} - p)^{2} (p + 1)} .$$

For more on generating functions, see Yalcin and Simsek [17], Yalcin and Simsek [18] and Simsek [19].

2.2.2. Probability-Generating Function

The probability-generating function of the random variable $X \sim BNDL(p)$ can be obtained from its moment-generating function, since $E(t^{X}) = M_{X}(\log t)$; therefore, the probability-generating function of $X$ is

$$G_{X}(t) = E(t^{X}) = M_{X}(\log(t)) = \frac{1 - p(t - 2) + p^{2}(t - 1)}{(2 - t + p(t - 1))^{2} (p + 1)} .$$

Since

$$G_{X}^{(k)}(t) = \frac{d^{k} G_{X}(t)}{dt^{k}} = E\{X (X - 1) (X - 2) \dots (X - k + 1) t^{X - k}\},$$

at $t = 1$ we obtain

$$G_{X}^{(k)}(1) = \left.\frac{d^{k} G_{X}(t)}{dt^{k}}\right|_{t = 1} = E\{X (X - 1) (X - 2) \dots (X - k + 1)\},$$

where $μ_{(k)} = E\{X (X - 1) (X - 2) \dots (X - k + 1)\}$ is the $k$th factorial moment of $X$.
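As a worked illustration of how factorial moments follow from the probability-generating function, the derivatives can be written out in closed form. The auxiliary variable $u$ below is introduced here only for convenience and does not appear in the paper; the result is a sketch derived from the expression for $G_{X}(t)$ given above.

```latex
% Substitute u = 2 - p - (1 - p) t, so that u = 1 at t = 1 and du/dt = -(1 - p).
% The numerator of G_X(t) equals 1 + p u and the denominator equals (p + 1) u^2, hence
G_X(t) = \frac{u^{-2} + p\,u^{-1}}{p + 1}.
% Differentiating k times with respect to t:
G_X^{(k)}(t) = \frac{(1 - p)^{k}\left[(k + 1)!\,u^{-(k + 2)} + p\,k!\,u^{-(k + 1)}\right]}{p + 1},
% and evaluating at t = 1 (u = 1) gives the k-th factorial moment
\mu_{(k)} = G_X^{(k)}(1) = \frac{k!\,(1 - p)^{k}\,(k + 1 + p)}{p + 1}.
% For k = 1 this reduces to E(X) = (p + 2)(1 - p)/(p + 1).
```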

2.2.3. Non-Central Moments and Variance

If $X \sim BNDL(p)$, then the $k$th moment about zero of $X$ is given by

$$μ_{k}^{'} = E(X^{k}) = \sum_{x = 0}^{\infty} x^{k} p_{x} = \sum_{x = 0}^{\infty} x^{k} \frac{(1 - p)^{x} (1 + x + 2p - p^{2})}{(p + 1) (2 - p)^{x + 2}} .$$

The first four raw moments can be obtained as follows

$$μ_{1}^{'} = E(X) = \frac{(p + 2) (1 - p)}{p + 1},$$

$$μ_{2}^{'} = E(X^{2}) = \frac{(1 - p) (8 - 3p - 2p^{2})}{p + 1},$$

$$μ_{3}^{'} = E(X^{3}) = \frac{(1 - p) (44 - 53p + 6p^{2} + 6p^{3})}{p + 1},$$

and

$$μ_{4}^{'} = E(X^{4}) = \frac{(1 - p) (308 - 615p + 346p^{2} - 12p^{3} - 24p^{4})}{p + 1} .$$

The variance of the random variable $X$ is

$$\mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2} = \frac{(1 - p) (4 + 5p - 2p^{2} - p^{3})}{(p + 1)^{2}} .$$
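The closed forms for the mean and variance can be checked against moments computed directly from the pmf. The sketch below uses a truncated sum; the cut-off `x_max` and the helper names are assumptions made for illustration.

```python
def bndl_pmf(x, p):
    # Ratio form of the BNDL pmf, used to avoid overflow for large x.
    ratio = (1.0 - p) / (2.0 - p)
    return ratio ** x * (1 + x + 2 * p - p ** 2) / ((p + 1.0) * (2.0 - p) ** 2)

def raw_moment(k, p, x_max=2000):
    """Truncated sum of x**k * P(X = x); x_max is an assumed cut-off."""
    return sum(x ** k * bndl_pmf(x, p) for x in range(x_max + 1))

def mean_closed_form(p):
    return (p + 2.0) * (1.0 - p) / (p + 1.0)

def variance_closed_form(p):
    return (1.0 - p) * (4 + 5 * p - 2 * p ** 2 - p ** 3) / (p + 1.0) ** 2

p = 0.5
mean = raw_moment(1, p)
variance = raw_moment(2, p) - mean ** 2
print(mean, mean_closed_form(p))
print(variance, variance_closed_form(p))
```

The same `raw_moment` helper can be called with `k = 3` or `k = 4` to check the higher-order expressions numerically.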

2.2.4. Central Moments

The $k$th moment about the mean of $X$ is

$$μ_{k} = E[(X - μ_{1}^{'})^{k}] = \sum_{x = 0}^{\infty} (x - μ_{1}^{'})^{k} p_{x}(x) = \sum_{x = 0}^{\infty} (x - μ_{1}^{'})^{k} \frac{(1 - p)^{x} (1 + x + 2p - p^{2})}{(p + 1) (2 - p)^{x + 2}} .$$

Therefore, the second, third and fourth central moments of the random variable $X$ are

$$μ_{2} = \frac{(1 - p) (4 + 5p - 2p^{2} - p^{3})}{(p + 1)^{2}},$$

$$μ_{3} = \frac{(1 - p) (12 + 21p - 7p^{2} - 21p^{3} + 5p^{4} + 2p^{5})}{(p + 1)^{3}},$$

and

$$μ_{4} = \frac{(1 - p) (100 + 181p - 123p^{2} - 285p^{3} + 50p^{4} + 137p^{5} - 27p^{6} - 9p^{7})}{(p + 1)^{4}} .$$

2.2.5. Skewness and Kurtosis

The coefficient of skewness and the coefficient of kurtosis of the BNDL distribution are, respectively,

$$β_{1} = \frac{μ_{3}}{\sqrt{μ_{2}^{3}}} = \frac{(1 - p) (12 + 21p - 7p^{2} - 21p^{3} + 5p^{4} + 2p^{5})}{(4 + p - 7p^{2} + p^{3} + p^{4})^{3/2}},$$

$$β_{2} = \frac{μ_{4}}{μ_{2}^{2}} = \frac{100 + 181p - 123p^{2} - 285p^{3} + 50p^{4} + 137p^{5} - 27p^{6} - 9p^{7}}{(1 - p) (4 + 5p - 2p^{2} - p^{3})^{2}} .$$

2.2.6. Index of Dispersion

The index of dispersion (ID) indicates whether a certain distribution is suitable for under- or over-dispersed datasets. For example, $ID = 1$ for the Poisson distribution, where the variance is equal to the mean; $ID > 1$ for the geometric and negative binomial distributions; and $ID < 1$ for the binomial distribution.

Theorem 1.

If $X \sim BNDL(p)$, then $\mathrm{Var}(X) > E(X)$ for all $p \in (0, 1)$.

Proof.

We have

$$ID(X) = \frac{\mathrm{Var}(X)}{E(X)} = \frac{4 + 5p - 2p^{2} - p^{3}}{p^{2} + 3p + 2} .$$

This function is monotonically decreasing in $p \in (0, 1)$. It converges to 2 when $p \to 0$, while it tends to 1 as $p \to 1$; therefore, $ID(X) \in (1, 2)$, which means that $ID(X) > 1$, and hence $\mathrm{Var}(X) > E(X)$. □

From Theorem 1, the BNDL distribution should only be used in the analysis of count data with over-dispersion. Table 1 reports numerical values of these measures for selected values of p.

Table 1. Mean, Variance, Skewness, kurtosis and ID of the BNDL distribution for different values of the parameter p.

2.2.7. Log-Concavity

A necessary and sufficient condition for $p_{x}$ to be strongly unimodal is that it is log-concave, i.e., $p_{x + 1}^{2} \geq p_{x} p_{x + 2}$ for all $x$ (see Keilson and Gerber [20]).

Theorem 2.

The pmf of the BNDL distribution in (3) is log-concave.

Proof.

From (3), we directly obtain

$$p_{x + 1}^{2} = \frac{(1 - p)^{2x + 2} (2 + x + 2p - p^{2})^{2}}{(p + 1)^{2} (2 - p)^{2x + 6}},$$

and

$$p_{x} p_{x + 2} = \frac{(1 - p)^{2x + 2} (1 + x + 2p - p^{2}) (3 + x + 2p - p^{2})}{(p + 1)^{2} (2 - p)^{2x + 6}} .$$

After some algebraic operations, we find that

$$p_{x + 1}^{2} - p_{x} p_{x + 2} = \frac{(1 - p)^{2x + 2}}{(p + 1)^{2} (2 - p)^{2x + 6}} > 0,$$

for all $x$ and for all choices of $p \in (0, 1)$. □

Theorem 2 confirms that the BNDL distribution is strongly unimodal.

2.3. Reliability Properties of the BNDL Distribution

2.3.1. Survival Function

If $X \sim BNDL(p)$, then from (4), the survival function of $X$ is

$$S(x; p) = P(X > x) = 1 - F(x; p) = \frac{(1 - p)^{x + 1} (3 + x + p - p^{2})}{(p + 1) (2 - p)^{x + 2}} .$$

2.3.2. Hazard Rate and Mean Residual Life Functions

The hazard (failure) rate function used here relates the probability that an item fails at time $x$ to the probability that it survives beyond time $x$. If $X \sim BNDL(p)$, then its hazard rate (failure rate) function is given as

$$r(x; p) = \frac{p_{x}(x; p)}{S(x; p)} = \frac{1 + x + 2p - p^{2}}{(1 - p) (3 + x + p - p^{2})} .$$

The failure rate function is increasing in $x$ and bounded above by $\frac{1}{1 - p}$, i.e., $\lim_{x \to \infty} r(x; p) = \frac{1}{1 - p}$. Graphical illustrations of the hazard rate function are presented in Figure 3, while descriptive measures are presented in Figure 4.

Figure 3. Plots of hazard rate of BNDL distribution for some choices of p.

Figure 4. Plots of the BNDL model for (a) Mean, (b) Variance, (c) Skewness, (d) Kurtosis and (e) ID.

The mean residual life function of $X$ is given by

$$m(x; p) = E(X - x - 1 \mid X > x) = \frac{\sum_{t = x + 1}^{\infty} S(t; p)}{S(x; p)} = \frac{(1 - p) (5 + x - p^{2})}{3 + x + p - p^{2}} .$$

Corollary 1.

If $X \sim BNDL(p)$, then it has an increasing failure rate and decreasing mean residual life.

As shown in Theorem 2, the BNDL distribution is log-concave; therefore, according to Gupta et al. [21], the BNDL distribution has the IFR property. According to Kemp [22], the following chains of implications hold:

$$IFR \Rightarrow IFRA \Rightarrow NBU \Rightarrow NBUE, \qquad IFR \Rightarrow DMRL \Rightarrow NBUE .$$

So, the BNDL distribution is

IFR (increasing failure rate).
IFRA (increasing failure rate average).
NBU (new better than used).
NBUE (new better than used in expectation).
DMRL (decreasing mean residual lifetime).

2.4. Stochastic Orderings

Stochastic orders are important measures to judge the comparative behavior of random variables. Shaked and Shanthikumar [8] showed that many stochastic orders exist and have various applications. Given two random variables $X$ and $Y$, we say that $X$ is smaller than $Y$ in the

Usual stochastic order, denoted by $X \leq_{st} Y$, if $F_{X}(x) \geq F_{Y}(x)$ for all $x$.
Hazard rate order, denoted by $X \leq_{hr} Y$, if $h_{X}(x) \geq h_{Y}(x)$ for all $x$.
Reversed hazard rate order, denoted by $X \leq_{rh} Y$, if $F_{X}(x) / F_{Y}(x)$ decreases in $x$.
Mean residual life order, denoted by $X \leq_{mrl} Y$, if $m_{X}(x) \leq m_{Y}(x)$ for all $x$.
Likelihood ratio order, denoted by $X \leq_{lr} Y$, if $f_{X}(x) / f_{Y}(x)$ decreases in $x$.

For all the previous orders, we have the following chains of implications:

$$X \leq_{lr} Y \Rightarrow X \leq_{hr} Y \Rightarrow X \leq_{st} Y,$$

and

$$X \leq_{lr} Y \Rightarrow X \leq_{rh} Y \Rightarrow X \leq_{st} Y;$$

also,

$$X \leq_{hr} Y \Rightarrow X \leq_{mrl} Y .$$

Theorem 3.

Let $X \sim BNDL(p_{1})$ and $Y \sim BNDL(p_{2})$; then, $X \leq_{lr} Y$ for all $p_{1} > p_{2}$.

Proof.

Let

$$L(x; p_{1}, p_{2}) = \frac{p_{X}(x; p_{1})}{p_{Y}(x; p_{2})} .$$

Now,

$$L(x; p_{1}, p_{2}) = \frac{(p_{2} + 1) (2 - p_{2})^{x + 2} (1 - p_{1})^{x} (1 + x + 2p_{1} - p_{1}^{2})}{(p_{1} + 1) (2 - p_{1})^{x + 2} (1 - p_{2})^{x} (1 + x + 2p_{2} - p_{2}^{2})},$$

and

$$L(x + 1; p_{1}, p_{2}) = \frac{(p_{2} + 1) (2 - p_{2})^{x + 3} (1 - p_{1})^{x + 1} (2 + x + 2p_{1} - p_{1}^{2})}{(p_{1} + 1) (2 - p_{1})^{x + 3} (1 - p_{2})^{x + 1} (2 + x + 2p_{2} - p_{2}^{2})} .$$

Therefore,

$$\frac{L(x + 1; p_{1}, p_{2})}{L(x; p_{1}, p_{2})} = \frac{(2 - p_{2}) (1 - p_{1}) (2 + x + 2p_{1} - p_{1}^{2}) (1 + x + 2p_{2} - p_{2}^{2})}{(2 - p_{1}) (1 - p_{2}) (2 + x + 2p_{2} - p_{2}^{2}) (1 + x + 2p_{1} - p_{1}^{2})}$$

(5)

Let $p_{1} = 1 - δ$ and $p_{2} = 1 - δ - ε$, where $0 < δ < 1$ and $0 < ε < 1 - δ$. After substituting these values of $p_{1}$ and $p_{2}$ in (5), we obtain

$$\frac{L(x + 1; p_{1}, p_{2})}{L(x; p_{1}, p_{2})} = \frac{η_{1} (δ + δ^{2} + δε)}{η_{2} (δ + δε + δ^{2} + ε)},$$

where

$$η_{1} = (3 + x - δ^{2}) (2 + x - (δ + ε)^{2}),$$

and

$$η_{2} = (3 + x - (δ + ε)^{2}) (2 + x - δ^{2}) .$$

After some algebraic operations, we find that

$$η_{1} - η_{2} = -ε (2δ + ε) < 0 \Rightarrow η_{1} < η_{2} .$$

Therefore,

$$η_{1} (δ + δ^{2} + δε) < η_{2} (δ + δε + δ^{2} + ε) .$$

This implies that

$$\frac{L(x + 1; p_{1}, p_{2})}{L(x; p_{1}, p_{2})} < 1 \Rightarrow L(x + 1; p_{1}, p_{2}) < L(x; p_{1}, p_{2}) .$$

Hence, $L(x; p_{1}, p_{2})$ is decreasing in $x$, which gives $X \leq_{lr} Y$. □
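The monotonicity of the likelihood ratio in Theorem 3 can be illustrated numerically for one pair of parameters. The values `p1 = 0.7` and `p2 = 0.4` and the range of `x` are arbitrary choices for this sketch; it illustrates the statement and does not replace the proof.

```python
def bndl_pmf(x, p):
    # Ratio form of the BNDL pmf, used to avoid overflow for large x.
    ratio = (1.0 - p) / (2.0 - p)
    return ratio ** x * (1 + x + 2 * p - p ** 2) / ((p + 1.0) * (2.0 - p) ** 2)

def likelihood_ratio(x, p1, p2):
    """L(x; p1, p2) = pmf of X ~ BNDL(p1) over pmf of Y ~ BNDL(p2) at x."""
    return bndl_pmf(x, p1) / bndl_pmf(x, p2)

p1, p2 = 0.7, 0.4  # illustrative values with p1 > p2
ratios = [likelihood_ratio(x, p1, p2) for x in range(40)]
is_decreasing = all(a > b for a, b in zip(ratios, ratios[1:]))
print(is_decreasing)
```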

2.5. Entropy

Entropy is a measure of the uncertainty of a random variable. The entropy of a discrete random variable $X$ with pmf $p(x)$ and alphabet $\mathcal{X}$ is given by

$$ℍ(X) = -E(\log p(X)) = -\sum_{x \in \mathcal{X}} p(x) \log(p(x)) .$$

Entropy can be interpreted as the measure of average uncertainty in $X$ or, when the logarithm is taken to base 2, the average number of bits needed to describe $X$. For more details on entropy and information theory, we refer the reader to Gray [23].

Now, if $X \sim BNDL(p)$, then the entropy of the random variable $X$ can be calculated by the following formula

$$ℍ(X) = \frac{1}{(2 - p)^{2} (1 + p)} \left\{ (2 - p)^{2} \left[ (-2 + p + p^{2}) \log(1 - p) + (4 + p - p^{2}) \log(2 - p) + (1 + p) \log(1 + p) \right] + \mathrm{LerchPhi}^{(0, 1, 0)} \left[ \frac{1 - p}{2 - p}, -1, 1 + 2p - p^{2} \right] \right\},$$

where $\mathrm{LerchPhi}^{(0, 1, 0)}[z, s, a]$ denotes the partial derivative with respect to $s$ of the Lerch transcendent $Φ(z, s, a) = \sum_{k = 0}^{\infty} \frac{z^{k}}{(a + k)^{s}}$. Table 2 presents some numerical values of the entropy of $X \sim BNDL(p)$ for different choices of $p$. From Table 2, one can observe that $ℍ(X)$ is monotonically decreasing in $p \in (0, 1)$, tending to approximately 1.88 as $p \to 0$ and to 0 as $p \to 1$.
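Because the closed form involves a derivative of the Lerch transcendent, a direct numerical evaluation of the defining sum can serve as a sanity check. The sketch below assumes natural logarithms (entropy in nats) and an arbitrary truncation point `x_max`; both are assumptions made here, not statements from the paper.

```python
import math

def bndl_pmf(x, p):
    # Ratio form of the BNDL pmf, used to avoid overflow for large x.
    ratio = (1.0 - p) / (2.0 - p)
    return ratio ** x * (1 + x + 2 * p - p ** 2) / ((p + 1.0) * (2.0 - p) ** 2)

def bndl_entropy(p, x_max=5000):
    """Shannon entropy in nats via a truncated sum; x_max is an assumed cut-off."""
    total = 0.0
    for x in range(x_max + 1):
        prob = bndl_pmf(x, p)
        if prob > 0.0:  # skip terms that underflow to zero
            total -= prob * math.log(prob)
    return total

for p in (0.2, 0.5, 0.8):
    print(p, round(bndl_entropy(p), 5))
```

The printed values can be compared with the corresponding entries of Table 2, for example the tabulated 1.25943 at p = 0.5.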

Table 2. Numerical results of $ℍ(X)$ for different values of the parameter p.

Figure 5 relates $ℍ(X)$ to the values of the parameter p. One may note that $ℍ(X)$ is monotonically decreasing in p ∈ (0, 1), with its limit tending to zero as p tends to 1.

Figure 5. $ℍ(X)$ of X versus p.

3. Estimation and Simulation

In this section, we consider the estimation of the unknown parameter $p$ by the maximum likelihood, moment and proportion methods.

3.1. Method of Maximum Likelihood Estimation

Let $x_{1}, x_{2}, \dots, x_{n}$ be the observed values from the BNDL distribution with parameter $p$. The likelihood and log-likelihood functions are given, respectively, as

$$L(p) = \prod_{i = 1}^{n} p_{x}(x_{i}; p) = \prod_{i = 1}^{n} \frac{(1 - p)^{x_{i}} (1 + x_{i} + 2p - p^{2})}{(p + 1) (2 - p)^{x_{i} + 2}},$$

and

$$l(p) = \log(1 - p) \sum_{i = 1}^{n} x_{i} + \sum_{i = 1}^{n} \log(1 + x_{i} + 2p - p^{2}) - n \log(p + 1) - 2n \log(2 - p) - \log(2 - p) \sum_{i = 1}^{n} x_{i} .$$

The maximum likelihood estimate (MLE) of the parameter $p$ can be obtained by solving the following equation numerically:

$$\frac{\partial l(p)}{\partial p} = \frac{3pn}{2 + p - p^{2}} - \frac{\sum_{i = 1}^{n} x_{i}}{2 - 3p + p^{2}} + 2 \sum_{i = 1}^{n} \frac{1 - p}{1 + 2p - p^{2} + x_{i}} = 0$$
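One possible numerical procedure for the likelihood equation is bisection on the score function. The sketch below is an illustration rather than the authors' implementation: the toy data are made up, the bracketing interval and tolerance are assumptions, and the routine assumes the score changes sign once on the interval.

```python
import math

def log_likelihood(p, data):
    n, s = len(data), sum(data)
    return (s * math.log(1.0 - p)
            + sum(math.log(1 + x + 2 * p - p ** 2) for x in data)
            - n * math.log(p + 1.0)
            - (2 * n + s) * math.log(2.0 - p))

def score(p, data):
    """Derivative of the log-likelihood, as in the likelihood equation above."""
    n, s = len(data), sum(data)
    return (3 * p * n / (2 + p - p ** 2)
            - s / (2 - 3 * p + p ** 2)
            + 2 * sum((1 - p) / (1 + 2 * p - p ** 2 + x) for x in data))

def mle_by_bisection(data, lo=1e-6, hi=1 - 1e-6, tol=1e-10):
    """Bisection on the score; assumes a single sign change on (lo, hi)."""
    if score(lo, data) <= 0.0:
        return lo  # likelihood already decreasing at the lower end
    if score(hi, data) >= 0.0:
        return hi  # likelihood still increasing at the upper end
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = [0, 0, 1, 0, 2, 1, 0, 3, 0, 1, 0, 0, 4, 1, 2]  # made-up toy counts
p_hat = mle_by_bisection(data)
print(p_hat, log_likelihood(p_hat, data))
```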

3.2. Method of Moments Estimation

Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from the BNDL distribution with parameter $p$. The moment estimate (ME) of the parameter $p$ can be obtained by solving the following equation:

$$\frac{(p + 2) (1 - p)}{p + 1} = \frac{1}{n} \sum_{i = 1}^{n} X_{i} .$$
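The moment equation is a quadratic in $p$: multiplying out gives $p^{2} + (1 + \bar{x}) p + (\bar{x} - 2) = 0$, where $\bar{x}$ denotes the sample mean. The sketch below returns its positive root; the function name is illustrative, and the restriction $0 < \bar{x} < 2$ reflects the range of the BNDL mean for $p \in (0, 1)$.

```python
import math

def moment_estimate(sample_mean):
    """Root in (0, 1) of (p + 2)(1 - p)/(p + 1) = sample_mean, if one exists."""
    if not 0.0 < sample_mean < 2.0:
        raise ValueError("a root in (0, 1) requires 0 < sample mean < 2")
    discriminant = sample_mean ** 2 - 2.0 * sample_mean + 9.0
    return 0.5 * (-(1.0 + sample_mean) + math.sqrt(discriminant))

# Round trip: start from a known p, compute the theoretical mean, recover p.
p_true = 0.3
theoretical_mean = (p_true + 2.0) * (1.0 - p_true) / (p_true + 1.0)
print(moment_estimate(theoretical_mean))
```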

3.3. Method of Proportions Estimation

Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from the BNDL distribution with parameter $p$. For $i = 1, 2, \dots, n$, we define the indicator functions

$$I(X_{i}) = \begin{cases} 1 & \text{if } X_{i} = 0 \\ 0 & \text{if } X_{i} > 0 \end{cases}$$

Therefore, the proportion of 0s in the sample is $Π = \frac{1}{n} \sum_{i = 1}^{n} I(X_{i})$. The proportion estimate (PE) of the parameter $p$ can be obtained by solving the following equation with respect to $p$:

$$Π = \frac{1 + 2p - p^{2}}{(p + 1) (2 - p)^{2}} .$$
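The proportion equation has no obvious closed-form solution, but the right-hand side increases from 1/4 to 1 as $p$ moves over $(0, 1)$, so bisection applies. The following is a sketch with illustrative names and an assumed tolerance.

```python
def zero_probability(p):
    """P(X = 0) under the BNDL pmf."""
    return (1 + 2 * p - p ** 2) / ((p + 1.0) * (2.0 - p) ** 2)

def proportion_estimate(zero_fraction, tol=1e-12):
    """Solve zero_probability(p) = zero_fraction by bisection on (0, 1)."""
    if not 0.25 < zero_fraction < 1.0:
        raise ValueError("a root in (0, 1) requires 1/4 < proportion < 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if zero_probability(mid) < zero_fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [0, 0, 1, 0, 2, 1, 0, 3, 0, 1]  # made-up toy counts
zero_fraction = sum(1 for x in sample if x == 0) / len(sample)
print(proportion_estimate(zero_fraction))
```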

3.4. Simulation Study

In this section, we assess the behavior of the maximum likelihood estimator for a finite sample of size n. The simulation study is based on the following steps: first, generate N = 1000 samples of sizes n = 25, 50, …, 500 from the BNDL distribution. Then, compute the maximum likelihood estimate $\hat{p}_{i}$ of the model parameter for each sample. Lastly, compute the MSE given by

$$MSE(\hat{p}) = \frac{1}{1000} \sum_{i = 1}^{1000} (\hat{p}_{i} - p)^{2}$$

For the various parameter values considered, the simulation results provided in Figure 6 indicate that the estimated MSEs decay toward zero as the sample size n increases. This behavior is what one expects from a consistent estimator. Recall the classical asymptotic properties of the MLE: in a parametric model, we say that an estimator $\hat{p}$ based on $X_{1}, X_{2}, X_{3}, \dots, X_{n}$ is consistent if $\hat{p} \to p$ in probability as $n \to \infty$, and that it is asymptotically normal if $\sqrt{n} (\hat{p} - p)$ converges in distribution to a normal distribution. Under the usual regularity conditions, the MLE $\hat{p}$ above is consistent and asymptotically normal.
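The simulation steps can be sketched as follows. This is an illustration under assumptions, not the authors' code (which the paper does not show): samples are drawn through the compounding construction (an NDL draw followed by binomial thinning), the MLE is found by bisection on the score, and the seed, tolerance and helper names are all choices made here.

```python
import math
import random

def sample_geometric(p, rng):
    """Number of failures before the first success; values 0, 1, 2, ..."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is finite
    return int(math.log(u) / math.log(1.0 - p))

def sample_bndl(p, rng):
    # N ~ NDL(p): geometric with probability p/(p+1), else a sum of two geometrics.
    n = sample_geometric(p, rng)
    if rng.random() >= p / (p + 1.0):
        n += sample_geometric(p, rng)
    # X | N = n ~ Binomial(n, p), drawn as n Bernoulli(p) trials.
    return sum(1 for _ in range(n) if rng.random() < p)

def score(p, data):
    n, s = len(data), sum(data)
    return (3 * p * n / (2 + p - p ** 2)
            - s / (2 - 3 * p + p ** 2)
            + 2 * sum((1 - p) / (1 + 2 * p - p ** 2 + x) for x in data))

def mle(data, lo=1e-6, hi=1 - 1e-6, tol=1e-8):
    """Bisection on the score; assumes a single sign change on (lo, hi)."""
    if score(lo, data) <= 0.0:
        return lo
    if score(hi, data) >= 0.0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate_mse(p, n, replications=1000, seed=0):
    """Average squared error of the MLE over independent samples of size n."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(replications):
        sample = [sample_bndl(p, rng) for _ in range(n)]
        squared_errors.append((mle(sample) - p) ** 2)
    return sum(squared_errors) / replications

for n in (25, 100, 400):
    print(n, simulate_mse(0.5, n, replications=200))
```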

Figure 6. Plots of the estimated parameter and MSEs for various values of p.

4. Applications to Count Data

In this section, we use a real-life data set, recently studied by Balakrishnan et al. [24] and consisting of 744 discrete observations, to examine the efficiency and superiority of the BNDL distribution in modeling real data. Santiago, Chile is recognized as one of the most environmentally contaminated cities in the world. In order to obtain the level of air pollution and its associated adverse effects on humans in Santiago, the National Commission of Environment (CONAMA) of the government of Chile collects data on sulfur dioxide (SO₂) concentrations in the air. The data corresponding to the hourly SO₂ concentrations (in ppm) observed at a monitoring station located in Santiago city are:

x	1	2	3	4	5	6	7	8	9	10 and above
f	86	235	120	119	35	15	11	9	4	10

The descriptive statistics of the data set are: Mean = 2.93, Median = 2, Mode = 3, SD = 2.02, Coefficient of Variation = 0.69, Skewness = 4.32, Kurtosis = 34.57, Range = 24, Min value = 1 and Max value = 25.

We compare the BNDL distribution with the binomial–discrete Lindley distribution (BDLD) of Kuş et al. [15] and the negative binomial distribution. The pmf of the BDLD is given as

$$p_{x}(x; p) = \frac{p^{2x} \left[ \{p^{3} - (1 - p) (1 - p - x)\} \log(p) + (1 - p) \{1 - p(1 - p)\} \right]}{\{1 - \log(p)\} \{1 - p(1 - p)\}^{x + 2}}$$

We considered the AIC (Akaike Information Criterion), CAIC (Consistent Akaike Information Criterion), BIC (Bayesian Information Criterion) and HQIC (Hannan–Quinn Information Criterion). The model with the minimum values of these statistics is chosen as the best model to fit the data. All results in Table 3 were obtained using R.
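The four criteria can be computed from a maximized log-likelihood as sketched below. The paper does not state the formulas it used, so the definitions here are assumptions: they are common textbook forms, with the CAIC written as the AIC plus the small-sample correction $2k(k+1)/(n-k-1)$. The log-likelihood value in the example is back-calculated from the tabulated AIC for the BNDL model and is likewise an assumption.

```python
import math

def information_criteria(log_lik, k, n):
    """AIC, CAIC, BIC and HQIC for k parameters and n observations (assumed forms)."""
    aic = 2 * k - 2 * log_lik
    caic = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_lik
    hqic = 2 * k * math.log(math.log(n)) - 2 * log_lik
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}

# Assumed example: one parameter, n = 744, log-likelihood implied by AIC = 2681.839.
criteria = information_criteria(log_lik=-1339.9195, k=1, n=744)
for name, value in criteria.items():
    print(name, round(value, 3))
```

With these inputs the assumed definitions can be compared against the BNDL row of Table 3.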

Table 3. MLEs and their standard errors (in parentheses) with statistics AIC, BIC, HQIC and CAIC values for given data.

Figure 7 gives the quantile–quantile (Q-Q) plot and box plot, and Figure 8 gives the total time on test (TTT) plot alongside the expected hazard rate function (EHRF) for the given data set. The TTT plot shows that the data set has an increasing hazard rate shape, which is confirmed by the EHRF. Figure 9 and Figure 10 show the fitted model against its comparative distributions. These plots show that the BNDL model fits the data better than the well-known BDLD and negative binomial models.

Figure 7. (a) Q-Q plot and (b) box plot for the given data.

Figure 8. (a) TTT plot and (b) Expected Hazard Rate Function (EHRF) for the BDLD model for the dataset.

Figure 9. Fitted plots of BNDL and BDLD distribution for given data set.

Figure 10. Fitted plot of Negative Binomial distributions for given data set.

5. Concluding Remarks

A new one-parameter discrete distribution was proposed, and its important distributional, monotonic, and reliability characteristics were explored. Some statistical and reliability properties of the proposed discrete model were derived. Various estimation approaches were discussed. A simulation study was conducted to assess the accuracy and precision of the MLE. The applicability of the proposed distribution in modeling a real-life discrete data set was demonstrated. The comparison indicates that the new distribution provides the best fit to the data set among the distributions tested, and it should be a useful contribution to the field of count data modeling.

Author Contributions

Conceptualization, S.S. and S.K.; methodology, W.M.; software, J.G.; validation, S.S. and S.K.; formal analysis, W.M.; investigation, S.S.; resources, F.J.; data curation, W.M.; writing—original draft preparation, S.S. and W.M.; writing—review and editing, S.K.; visualization, J.G.; supervision, S.K.; project administration, F.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Aryuyuen, S.; Bodhisuwan, W.; Volodin, A. Discrete Generalized Odd Lindley–Weibull Distribution with Applications. Lobachevskii J. Math. 2020, 41, 945–955.
2. Chakraborty, S. A New Discrete Distribution Related to Generalized Gamma Distribution and Its Properties. Commun. Stat. Theory Methods 2015, 44, 1691–1705.
3. Chakraborty, S.; Chakravarty, D. Discrete Gamma Distributions: Properties and Parameter Estimations. Commun. Stat. Theory Methods 2012, 41, 3301–3324.
4. Chakraborty, S.; Dhrubajyoti, C. A Discrete Gumbel Distribution. arXiv 2014. Available online: https://arxiv.org/abs/1410.7568 (accessed on 8 June 2022).
5. El-Morshedy, M.; Eliwa, M.S.; Nagy, H. A New Two-Parameter Exponentiated Discrete Lindley Distribution: Properties, Estimation and Applications. J. Appl. Stat. 2018, 47, 354–375.
6. Gómez-Déniz, E.; Calderín-Ojeda, E. The Discrete Lindley Distribution: Properties and Applications. J. Stat. Comput. Simul. 2011, 81, 1405–1416.
7. Hu, Y.; Peng, X.; Li, T.; Guo, H. On the Poisson Approximation to Photon Distribution for Faint Lasers. Phys. Lett. A 2007, 367, 173–176.
8. Shaked, M.; Shanthikumar, J.G. Stochastic Orders; Springer: New York, NY, USA, 2007.
9. Nekoukhou, V.; Alamatsaz, M.H.; Bidram, H. Discrete Generalized Exponential Distribution of a Second Type. Statistics 2013, 47, 876–887.
10. Para, B.A.; Jan, T.R. Discrete Generalized Weibull Distribution: Properties and Applications in Medical Sciences. Pak. J. Stat. 2017, 33, 337–354.
11. Roy, D. The Discrete Normal Distribution. Commun. Stat. Theory Methods 2003, 32, 1871–1883.
12. Afify, A.Z.; Elmorshedy, M.; Eliwa, M.S. A New Skewed Discrete Model: Properties, Inference, and Applications. Pak. J. Stat. Oper. Res. 2021, 17, 799–816.
13. Déniz, E.G. A New Discrete Distribution: Properties and Applications in Medical Care. J. Appl. Stat. 2013, 40, 2760–2770.
14. Akdoğan, Y.; Kuş, C.; Asgharzadeh, A.; Kinaci, I.; Sharafi, F. Uniform-Geometric Distribution. J. Stat. Comput. Simul. 2016, 86, 1754–1770.
15. Kuş, C.; Akdoğan, Y.; Asgharzadeh, A.; Kınacı, I.; Karakaya, K. Binomial-Discrete Lindley Distribution. Commun. Fac. Sci. Univ. Ank. Ser. A1 Math. Stat. 2019, 68, 401–411.
16. Al-Babtain, A.A.; Ahmed, A.H.N.; Afify, A.Z. A New Discrete Analog of the Continuous Lindley Distribution, with Reliability Applications. Entropy 2020, 22, 603.
17. Yalcin, F.; Simsek, Y. Formulas for characteristic function and moment generating functions of beta type distribution. Rev. Real Acad. Cienc. Exactas Físicas Y Naturales. Ser. A Matemáticas 2022, 116, 86.
18. Yalcin, F.; Simsek, Y. A new class of symmetric beta type distributions constructed by means of symmetric Bernstein type basis functions. Symmetry 2020, 12, 779.
19. Simsek, B. Formulas derived from moment generating functions and Bernstein polynomials. Appl. Anal. Discret. Math. 2019, 13, 839–848.
20. Keilson, J.; Gerber, H. Some Results for Discrete Unimodality. J. Am. Stat. Assoc. 1971, 66, 386–389.
21. Gupta, P.L.; Gupta, R.C.; Tripathi, R.C. On the monotonic properties of discrete failure rates. J. Stat. Plan. Inference 1997, 65, 255–268.
22. Kemp, A.W. Classes of discrete lifetime distributions. Commun. Stat. Theory Methods 2004, 33, 3069–3093.
23. Gray, R.M. Entropy and Information Theory; Springer: New York, NY, USA, 2011.
24. Balakrishnan, N.; Leiva, V.; Sanhueza, A.; Cabrera, E. Mixture inverse Gaussian distributions and its transformations, moments and applications. Statistics 2009, 431, 91–104.


Table 1. Mean, Variance, Skewness, kurtosis and ID of the BNDL distribution for different values of the parameter p.

p	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
Mean	1.71818	1.4666	1.2384	1.0285	0.8333	0.6500	0.4764	0.3111	0.1526
Variance	3.3314	2.7288	2.1923	1.7191	1.3055	0.9475	0.6412	0.3832	0.1703
Skewness	1.5578	1.6186	1.6831	1.7542	1.8372	1.9427	2.0935	2.3522	2.9813
Kurtosis	7.7069	9.4991	11.8378	15.0902	19.9488	27.8656	42.3746	74.4447	180.1786
ID	1.9389	1.8606	1.770268	1.6714	1.5666	1.4576	1.3459	1.2317	1.1159

Table 2. Numerical results of $ℍ(X)$ for different values of the parameter p.

p	$ℍ (X)$	p	$ℍ (X)$
0.0001	1.87934	0.5	1.25943
0.01	1.86852	0.55	1.18391
0.03	1.84654	0.6	1.10402
0.05	1.82437	0.65	1.01888
0.07	1.80201	0.7	0.927315
0.09	1.77948	0.75	0.827736
0.11	1.75675	0.8	0.717861
0.14	1.72231	0.85	0.594157
0.17	1.6874	0.9	0.450497
0.2	1.652	0.95	0.273684
0.25	1.59181	0.96	0.231718
0.3	1.52994	0.97	0.186252
0.35	1.46611	0.98	0.135994
0.4	1.40002	0.99	0.078212
0.45	1.33128	0.999	0.0112562

Table 3. MLEs and their standard errors (in parentheses) with statistics AIC, BIC, HQIC and CAIC values for given data.

Distribution	MLE (SE)	AIC	CAIC	BIC	HQIC
BNDL (p)	0.6283 (0.0129)	2681.839	2681.844	2686.451	2683.616
BDLD (p)	0.6922 (0.0055)	3092.3700	3092.3760	3096.9820	3094.1480
Negative Binomial (n, k)	17.2957, 2.9262 (4.7378, 0.0678)	2824.156	2849.44	2833.38	2818.69

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
