1. Introduction
In recent decades, count data analysis has drawn considerable interest. Count data sets arise in many practical and theoretical domains, including medicine, sports, engineering, finance, and insurance (see [1]). However, standard probability distributions and the methodologies built on them are often inadequate for analyzing such data. Statisticians and applied scientists have therefore shown great interest in building flexible models that improve the modeling of count data. It is thus important to develop models that outperform the standard distributions in describing real-world data and their attributes.
Several models for count data have been developed recently. Conventional discrete distributions are of limited use as models for reliability, hazard rates, counts, etc. The most widespread parametric models for analyzing such data are the Poisson, geometric, and negative binomial (NB) models (see [2]). The Poisson regression model is the most common model for count data, but an obstacle arises: count data may exhibit over- or under-dispersion, that is, the conditional variance of a count is greater or less than its conditional mean (see [3]). In these cases, the equality of mean and variance imposed by the Poisson model is a well-known drawback. This has led to the introduction of various generalizations of the Poisson distribution (see [4,5]). A traditional way of overcoming over-dispersion is to let the single parameter of the Poisson distribution be a random variable following a given distribution. This is known as the compounding method, and the idea was first proposed in [6]. The resulting compound distributions are also termed mixture distributions. One famous mixture distribution is the negative binomial distribution, obtained by mixing the Poisson distribution with a gamma distribution. In real-world count modeling applications, the negative binomial distribution, with its additional dispersion parameter, is widely accepted as a solution to the over-dispersion issue.
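The compounding idea can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes the Poisson rate follows a gamma distribution with shape r and scale s, integrates the rate out with a simple midpoint rule, and compares the result with the negative binomial pmf with parameters r and p = 1/(1 + s). All function names and parameter values are illustrative choices.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson pmf, evaluated on the log scale for numerical stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def gamma_pdf(lam: float, shape: float, scale: float) -> float:
    """Gamma density with the given shape and scale (assumed mixing law)."""
    return math.exp((shape - 1) * math.log(lam) - lam / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def mixed_poisson_pmf(k: int, shape: float, scale: float,
                      upper: float = 80.0, steps: int = 40_000) -> float:
    """Midpoint-rule approximation of the integral of Poisson(k | lam) * gamma_pdf(lam)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h  # midpoint of the i-th sub-interval, always > 0
        total += poisson_pmf(k, lam) * gamma_pdf(lam, shape, scale)
    return total * h


def nb_pmf(k: int, r: float, p: float) -> float:
    """Negative binomial pmf: Gamma(k + r) / (k! Gamma(r)) * p**r * (1 - p)**k."""
    return math.exp(math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
                    + r * math.log(p) + k * math.log(1.0 - p))


def mean_and_variance(pmf, upper: int = 400):
    """Mean and variance of a pmf on {0, 1, ..., upper - 1} (tail assumed negligible)."""
    probs = [pmf(k) for k in range(upper)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


shape, scale = 2.0, 1.5          # illustrative values only
p = 1.0 / (1.0 + scale)
for k in range(4):
    print(k, round(mixed_poisson_pmf(k, shape, scale), 6), round(nb_pmf(k, shape, p), 6))

mean, var = mean_and_variance(lambda k: nb_pmf(k, shape, p))
print("mean:", round(mean, 4), "variance:", round(var, 4))
```

Under these assumptions the variance exceeds the mean by mean²/r, which is the extra dispersion that the Poisson model cannot represent.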
As a result, various discrete distributions based on widely used continuous distributions for reliability, hazard rates, etc., have been developed. The discrete Weibull distribution is the most popular of these. It was introduced in [7,8,9] and has since found numerous applications. Many other discrete analogues of continuous distributions have been constructed recently. The author in [10] introduced the discrete gamma distribution, which has received significant attention for applications in molecular biology and evolution. Discrete analogues of the continuous Burr and Pareto distributions were constructed in [11], and the authors in [12] introduced a discrete analogue of the continuous inverse Weibull distribution. The discrete Lindley distribution was proposed in [13].
Many models are available for studying over-dispersion, whereas only a few deal with under-dispersion, because over-dispersion occurs more frequently (see [14]).
Over the last decade, various extensions and generalizations of the Poisson distribution have been developed in the literature for both over-dispersed and under-dispersed count data. The authors in [15] proposed the generalized Poisson (GP) regression model, whereas those of [16] introduced the Conway–Maxwell–Poisson (COM–Poisson) model, for which a regression model was also developed. The authors in [17] introduced the Poisson–Tweedie regression model.
Each of the aforementioned models has some drawbacks. For instance, the range of the GP model must be truncated in order to achieve under-dispersion, with the level of truncation depending on the model parameters. As a consequence of this truncation, the probabilities no longer add up to 1. Convolutions (sums and differences) of two independent random variables offer a simple way of broadening the modeling possibilities of well-known distributions.
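The truncation issue can be made concrete with a small numerical check. The sketch below assumes Consul's parameterization of the generalized Poisson pmf, P(X = x) = θ(θ + λx)^(x−1) e^(−θ−λx) / x!, with the support cut off as soon as θ + λx is no longer positive when λ < 0. The function names and the parameter values are illustrative and do not come from the paper.

```python
import math


def gp_pmf(x: int, theta: float, lam: float) -> float:
    """Generalized Poisson pmf (assumed Consul form); zero once theta + lam * x <= 0."""
    base = theta + lam * x
    if base <= 0:
        return 0.0  # truncated part of the range when lam < 0
    log_p = math.log(theta) + (x - 1) * math.log(base) - base - math.lgamma(x + 1)
    return math.exp(log_p)


def total_mass(theta: float, lam: float, upper: int = 300) -> float:
    """Sum of the pmf over {0, 1, ..., upper - 1}."""
    return sum(gp_pmf(x, theta, lam) for x in range(upper))


# Over-dispersed case (lam > 0): the support is unbounded and the pmf is proper.
print("lam =  0.3:", total_mass(2.0, 0.3))

# Under-dispersed case (lam < 0): the support is truncated and the mass is not exactly 1.
print("lam = -0.5:", total_mass(2.0, -0.5))
```

In the second case the deviation from 1 is small but nonzero, so the truncated model is not a proper probability distribution unless it is renormalized.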
The author in [18] proposed the discrete Poisson–Lindley distribution, a compound Poisson distribution obtained by compounding the Poisson distribution with the Lindley distribution. The authors in [19] introduced an efficient regression model for under-dispersed count data based on the Bernoulli–Poisson convolution (BerPoi). In this model, the response variable follows the BerPoi distribution under a specific parameterization indexed by mean and dispersion parameters.
In this paper, we introduce a distribution generated from the sum of two independent random variables, one following the Bernoulli distribution and the other following the Poisson–Lindley distribution. We call the resulting distribution the Bernoulli–Poisson–Lindley (BPL) distribution. One of its key advantages is that, unlike the Poisson distribution, it is suitable for modeling both under-dispersed and over-dispersed count data. Furthermore, it has only two parameters, which reduces the complexity of the simulation study compared with some Poisson generalizations that have three parameters. Moreover, it has an increasing hazard rate, making it appropriate for modeling equipment wear and tear or ageing processes. Because its moments are available in closed form, the proposed model is also well suited to regression modeling.
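The claim that a Bernoulli plus Poisson–Lindley sum can be either under- or over-dispersed can be checked with simple arithmetic. The sketch below is an illustration under stated assumptions: the Bernoulli component equals 1 with probability α, the Poisson–Lindley component has the standard mean (θ + 2)/(θ(θ + 1)) and variance (θ³ + 4θ² + 6θ + 2)/(θ²(θ + 1)²), and independence lets means and variances add. The function names and the parameter values are hypothetical.

```python
def bpl_mean_var(alpha: float, theta: float):
    """Mean and variance of Bernoulli(alpha) + Poisson-Lindley(theta), assumed independent."""
    pl_mean = (theta + 2) / (theta * (theta + 1))
    pl_var = (theta**3 + 4 * theta**2 + 6 * theta + 2) / (theta**2 * (theta + 1) ** 2)
    mean = alpha + pl_mean                 # means add
    var = alpha * (1 - alpha) + pl_var     # variances add under independence
    return mean, var


def dispersion_index(alpha: float, theta: float) -> float:
    """Variance-to-mean ratio: below 1 is under-dispersion, above 1 is over-dispersion."""
    mean, var = bpl_mean_var(alpha, theta)
    return var / mean


for alpha, theta in [(0.5, 5.0), (0.5, 0.5)]:   # illustrative parameter values
    print(alpha, theta, round(dispersion_index(alpha, theta), 4))
```

With these assumptions, the sum is under-dispersed exactly when the excess of the Poisson–Lindley variance over its mean is smaller than α², which happens for large θ; small θ gives over-dispersion.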
The remaining sections of the paper are organized as follows. Section 2 presents the BPL distribution. Section 3 discusses the statistical properties of this distribution. Section 4 introduces parameter estimation using the maximum likelihood method and assesses its performance via a simulation study. In Section 5, two real data sets are analyzed, and the new model is shown to perform at least as well as other recently proposed two-parameter discrete models and the conventional one-parameter discrete models. In Section 6, a regression model is developed. Finally, several key takeaways are outlined in Section 7.
2. Bernoulli–Poisson–Lindley Distribution
The BPL distribution is defined as the distribution of the sum of two independent random variables, one following the Bernoulli distribution and the other following the Poisson–Lindley distribution.
The result below presents a simple expression of the corresponding probability mass function (pmf).
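Proposition 1. The pmf of the BPL distribution with parameters α and θ can be expressed as
$$P(X = x) = \begin{cases} \dfrac{(1-\alpha)\,\theta^{2}(\theta+2)}{(\theta+1)^{3}}, & x = 0, \\[2ex] \dfrac{\theta^{2}\left[(1-\alpha)(x+\theta+2) + \alpha(\theta+1)(x+\theta+1)\right]}{(\theta+1)^{x+3}}, & x = 1, 2, \ldots \end{cases}$$
Proof. Let $U$ and $V$ be two independent random variables, with $U$ following the Bernoulli distribution with parameter $\alpha$, i.e., $P(U = 1) = \alpha$ and $P(U = 0) = 1 - \alpha$, and $V$ following the Poisson–Lindley distribution with parameter $\theta$, i.e.,
$$P(V = v) = \frac{\theta^{2}(v+\theta+2)}{(\theta+1)^{v+3}}, \quad v = 0, 1, 2, \ldots,$$
with $\theta > 0$. Then, by definition, the BPL distribution is the distribution of $X = U + V$. Let us now determine its pmf. For any $x \in \{0, 1, 2, \ldots\}$, we have
$$P(X = x) = P(U = 0)\,P(V = x) + P(U = 1)\,P(V = x - 1),$$
with the convention $P(V = -1) = 0$. In particular, for $x = 0$, we have
$$P(X = 0) = (1-\alpha)\,P(V = 0) = \frac{(1-\alpha)\,\theta^{2}(\theta+2)}{(\theta+1)^{3}}.$$
For $x \geq 1$, we have
$$P(X = x) = (1-\alpha)\,\frac{\theta^{2}(x+\theta+2)}{(\theta+1)^{x+3}} + \alpha\,\frac{\theta^{2}(x+\theta+1)}{(\theta+1)^{x+2}} = \frac{\theta^{2}\left[(1-\alpha)(x+\theta+2) + \alpha(\theta+1)(x+\theta+1)\right]}{(\theta+1)^{x+3}}.$$
This ends the proof of Proposition 1. □
Remark 1. When $\alpha = 0$, the BPL distribution reduces to the Poisson–Lindley distribution, which is thus included in the BPL distribution as a special case.
Proposition 2. The cumulative distribution function (cdf) of the BPL distribution can be expressed as, for any integer $x \geq 0$,
$$F(x) = 1 - \frac{(1-\alpha)(\theta^{2}+3\theta+1+\theta x) + \alpha(\theta+1)(\theta^{2}+2\theta+1+\theta x)}{(\theta+1)^{x+3}}.$$
Proof. It follows from geometric series expansions and some algebra that
$$F(x) = \sum_{k=0}^{x} P(X = k) = (1-\alpha)\sum_{k=0}^{x} P(V = k) + \alpha\sum_{k=0}^{x-1} P(V = k) = (1-\alpha)\left[1 - \frac{\theta^{2}+3\theta+1+\theta x}{(\theta+1)^{x+3}}\right] + \alpha\left[1 - \frac{\theta^{2}+2\theta+1+\theta x}{(\theta+1)^{x+2}}\right],$$
which simplifies to the stated expression. This ends the proof of Proposition 2. □
The corresponding survival function is given by
$$S(x) = 1 - F(x) = \frac{(1-\alpha)(\theta^{2}+3\theta+1+\theta x) + \alpha(\theta+1)(\theta^{2}+2\theta+1+\theta x)}{(\theta+1)^{x+3}}.$$
The hazard rate function (hrf) of the BPL distribution is obtained as
$$h(x) = \frac{P(X = x)}{P(X \geq x)} = \frac{P(X = x)}{P(X = x) + S(x)}.$$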
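As a numerical sanity check of the convolution construction, the following sketch builds the pmf of a Bernoulli plus Poisson–Lindley sum directly and compares it with closed-form expressions. It rests on two assumptions that should be checked against the paper's own parameterization: the Bernoulli component equals 1 with probability α, and the Poisson–Lindley pmf is θ²(y + θ + 2)/(θ + 1)^(y+3). All function names are illustrative.

```python
def pl_pmf(y: int, theta: float) -> float:
    """Poisson-Lindley pmf (standard form); zero for negative arguments."""
    if y < 0:
        return 0.0
    return theta**2 * (y + theta + 2) / (theta + 1) ** (y + 3)


def bpl_pmf(x: int, alpha: float, theta: float) -> float:
    """pmf of U + V by direct convolution, with P(U = 1) = alpha (assumed convention)."""
    return (1 - alpha) * pl_pmf(x, theta) + alpha * pl_pmf(x - 1, theta)


def bpl_pmf_closed(x: int, alpha: float, theta: float) -> float:
    """Closed-form pmf obtained by simplifying the convolution."""
    if x == 0:
        return (1 - alpha) * theta**2 * (theta + 2) / (theta + 1) ** 3
    bracket = (1 - alpha) * (x + theta + 2) + alpha * (theta + 1) * (x + theta + 1)
    return theta**2 * bracket / (theta + 1) ** (x + 3)


def bpl_cdf_closed(x: int, alpha: float, theta: float) -> float:
    """Closed-form cdf for integer x >= 0 under the same assumptions."""
    num = ((1 - alpha) * (theta**2 + 3 * theta + 1 + theta * x)
           + alpha * (theta + 1) * (theta**2 + 2 * theta + 1 + theta * x))
    return 1 - num / (theta + 1) ** (x + 3)


alpha, theta = 0.3, 1.5   # illustrative parameter values
print("total mass:", sum(bpl_pmf(x, alpha, theta) for x in range(300)))
for x in range(4):
    running = sum(bpl_pmf(k, alpha, theta) for k in range(x + 1))
    print(x, bpl_pmf(x, alpha, theta), bpl_pmf_closed(x, alpha, theta),
          running, bpl_cdf_closed(x, alpha, theta))
```

Setting alpha to 0 in `bpl_pmf` returns the Poisson–Lindley pmf itself, which matches the special case noted in Remark 1 under the assumed convention.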
Figure 1 shows different shapes of the pmf. It indicates that the BPL distribution is positively skewed and unimodal, and that, as θ increases, the mass concentrates more on values close to 0 than on higher values.
Figure 2 also presents different shapes of the cdf.
Figure 3 presents different shapes of the hrf, indicating that the BPL distribution exhibits an increasing hazard rate.
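The qualitative behavior described for the figures can also be probed numerically. The sketch below is illustrative only: it assumes that the Bernoulli component equals 1 with probability α, that the Poisson–Lindley pmf is θ²(y + θ + 2)/(θ + 1)^(y+3), and that the discrete hazard rate is defined as P(X = x)/P(X ≥ x). The function names and the cutoff used for the tail sum are hypothetical choices.

```python
def pl_pmf(y: int, theta: float) -> float:
    """Poisson-Lindley pmf (standard form); zero for negative arguments."""
    if y < 0:
        return 0.0
    return theta**2 * (y + theta + 2) / (theta + 1) ** (y + 3)


def bpl_pmf(x: int, alpha: float, theta: float) -> float:
    """pmf of Bernoulli(alpha) + Poisson-Lindley(theta), with P(U = 1) = alpha assumed."""
    return (1 - alpha) * pl_pmf(x, theta) + alpha * pl_pmf(x - 1, theta)


def bpl_hazard(x: int, alpha: float, theta: float, cutoff: int = 400) -> float:
    """Discrete hazard P(X = x) / P(X >= x); the tail is summed up to a finite cutoff."""
    tail = sum(bpl_pmf(k, alpha, theta) for k in range(x, cutoff))
    return bpl_pmf(x, alpha, theta) / tail


alpha = 0.3   # illustrative value
for theta in (1.0, 2.0):
    hazards = [round(bpl_hazard(x, alpha, theta), 4) for x in range(6)]
    print("theta =", theta, "P(X = 0) =", round(bpl_pmf(0, alpha, theta), 4),
          "hazard:", hazards)
```

Under these assumptions the printed hazard values rise with x for each θ, and both the mass at 0 and the hazard values are larger for the larger θ.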