Article

On the Conflation of Negative Binomial and Logarithmic Distributions

by Anfal A. Alqefari 1, Abdulhamid A. Alzaid 1 and Najla Qarmalah 2,*
1 Department of Statistics and Operations Research, College of Sciences, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia
2 Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
* Author to whom correspondence should be addressed.
Axioms 2024, 13(10), 707; https://doi.org/10.3390/axioms13100707
Submission received: 19 August 2024 / Revised: 7 October 2024 / Accepted: 11 October 2024 / Published: 13 October 2024
(This article belongs to the Special Issue Probability, Statistics and Estimations, 2nd Edition)

Abstract

In recent decades, the study of discrete distributions has received increasing attention in statistics, mainly because discrete distributions can model a wide range of count data. One common distribution for count data is the negative binomial distribution (NBD), which performs well with over-dispersed data. In this paper, a new count distribution is introduced, called the conflation of negative binomial and logarithmic distributions. It is formed by conflating the negative binomial and logarithmic distributions and therefore possesses some of the properties of both. The distribution has two parameters and is supported on the positive integers. Two modifications of the distribution are proposed that include zero as a support point. The new distribution is valuable from a theoretical perspective since it is a member of the weighted negative binomial distribution family. In addition, the distribution differs from the NBD in that the probability of lower counts is inflated. This study discusses the characteristics of the proposed distribution and its modified versions, such as moments, probability generating functions, likelihood ratio stochastic ordering, log-concavity, and unimodality. Real-world data are used to evaluate the performance of the proposed models against other models. All computations shown in this paper were produced using the R programming language.

1. Introduction

The analysis and modeling of count data have received significant attention in recent decades, with particular emphasis on the development of discrete distributions. A widely used model for count data is the Poisson distribution (PD). The key condition for the PD is equal dispersion of the count data, that is, a variance equal to the mean. Equal dispersion can be assessed using a statistical measure known as the index of dispersion (ID), which quantifies the variability of a distribution relative to its mean. The ID is defined as follows:
Definition 1.
The index of dispersion of a distribution, denoted by ID, is defined as
$$\mathrm{ID} = \frac{\mathrm{Var}(Y)}{\mu},$$
where $\mathrm{Var}(Y)$ and $\mu$ are the variance and the mean of $Y$, respectively. The ID is interpreted as follows:
  • If $\mathrm{ID} > 1$, the distribution is over-dispersed.
  • If $\mathrm{ID} < 1$, the distribution is under-dispersed.
  • If $\mathrm{ID} = 1$, the distribution shows equal dispersion.
For more details, see [1].
However, equal dispersion is rarely observed in practice. Count data often show over-dispersion, which calls for modeling alternatives that provide greater flexibility than the PD. The negative binomial distribution (NBD) is a frequently employed alternative for modeling count data, particularly data that show over-dispersion. The NBD is used extensively to model datasets from biology, the medical sciences, accident statistics, the social sciences, economics, quality control, ecology, and other fields. The study [2] describes the NBD as a mixture of PDs in which the mean of the PD is a random variable following a gamma distribution. The probability mass function (pmf) of the NBD is given by
$$f(x) = \frac{\Gamma(x+r)}{\Gamma(x+1)\,\Gamma(r)}\,p^{x}(1-p)^{r}, \quad 0 < p < 1,\ r > 0,\ x = 0, 1, 2, \ldots$$
For more details about the NBD and its properties, see [1].
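As a small numerical illustration of Definition 1 and the NBD pmf above, the following Python sketch evaluates the pmf on a truncated support and computes the index of dispersion. The function names, the parameter values, and the truncation point are illustrative assumptions and are not taken from the paper, whose own computations were done in R.

```python
import math

def nbd_pmf(x, r, p):
    """NBD pmf in the paper's parameterization: p**x * (1 - p)**r."""
    log_coef = math.lgamma(x + r) - math.lgamma(x + 1) - math.lgamma(r)
    return math.exp(log_coef + x * math.log(p) + r * math.log(1 - p))

def index_of_dispersion(pmf, support):
    """ID = Var(Y) / mean, computed over a (truncated) support."""
    probs = [pmf(x) for x in support]
    mean = sum(x * q for x, q in zip(support, probs))
    second = sum(x * x * q for x, q in zip(support, probs))
    return (second - mean ** 2) / mean

r, p = 3.0, 0.4                 # illustrative parameter values
support = range(0, 400)         # truncation point is an assumption
nbd_id = index_of_dispersion(lambda x: nbd_pmf(x, r, p), support)
print(nbd_id, 1 / (1 - p))      # the two numbers should nearly coincide
```

For this parameterization the NBD has mean $rp/(1-p)$ and variance $rp/(1-p)^2$, so its ID is $1/(1-p) > 1$ for every admissible $p$, which is why the NBD is always over-dispersed.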
Furthermore, count data frequently display an excess of zeros and heterogeneity in variance, making traditional statistical distributions insufficient for modeling purposes. Nevertheless, the NBD is frequently favored over the PD because of its ability to provide increased flexibility in modeling data that appear over-dispersed, as previously indicated. Many studies have developed alternative models to deal with the presence of excess zeros and variability in the dataset. For example, zero-inflated models, hurdle models, or finite mixture models have been proposed in order to more efficiently deal with this issue. In addition, the mixing of PD or NBD with a lifetime distribution is frequently employed for the same issue. For example, numerous research studies have demonstrated that the mixed negative binomial distribution offers a superior fit for count data in comparison to the PD and NBD. Moreover, weighted distributions are used to solve the problem by multiplying count distributions with weight functions, as developed in [3,4]. Since then, the concept of weighted distributions has established itself in the literature as a powerful tool for modeling. By allowing us to adjust probabilities based on specific weights assigned to each outcome, weighted distributions provide a flexible framework that enhances our ability to accurately represent and analyze complex real-world phenomena. This adaptability has made weighted distributions invaluable in various fields such as statistics, biostatistics, biomedicine, ecology, survival data analysis, meta-analysis, and intervention data analysis. For discrete distributions, the weighted distribution is defined as follows:
Definition 2.
Let $X$ be a random variable with pmf $f(x)$ and let $w(x)$ be a non-negative weighting function such that $0 < E[w(X)] < \infty$. Then, the pmf
$$f_w(x) = \frac{w(x)\,f(x)}{E[w(X)]}, \quad x \in \mathbb{N},$$
is a weighted distribution of $f(x)$.
For more information on weighted distributions for discrete random variables, refer to the work [1] on the subject. For example, the modified negative binomial distribution in [5] can be viewed as a weighted geometric distribution.
Although the standard distributions possess attractive characteristics, they do not provide the best fit for real-world data that have deviations. The source of deviations can be either inflation in low counts or high dispersion. Hence, there is a need to develop new distributions that demonstrate superior performance. In this study, we employ the idea of conflation as a tool to cope with the presence of excess zeros or generally low counts and over-dispersed data. The concept of the conflation of probability distributions is presented by [6] and defined as follows:
Definition 3.
If $f_1, f_2, \ldots, f_n$ are pmfs, then the corresponding conflated distribution $f_C$ is
$$f_C(x) = \frac{\prod_{i=1}^{n} f_i(x)}{\sum_{y \in A} \prod_{i=1}^{n} f_i(y)}, \quad x \in \mathbb{N}, \tag{1}$$
where $A$ is the intersection of the supports of all the distributions. In terms of random variables, if $X_1, \ldots, X_n$ are independent with pmfs $f_1, \ldots, f_n$, respectively, then
$$f_C(x) = P(X_1 = x \mid X_1 = X_2 = \cdots = X_n).$$
Ref. [6] presented conflation as a method for consolidating data from several independent experiments, all of which were designed to measure the same unknown quantity. In other words, a conflation is a distribution that inherits some properties from each of its components. For $n = 2$, Equation (1) can be viewed as a weighted distribution, where one mass function is the parent distribution and the other is the weight function. In this sense, conflated distributions are weighted distributions. Hence, one may model data with a high excess of low counts and over-dispersion by conflating a distribution that has a decreasing mass function with an over-dispersed distribution.
As a result, we introduce a new distribution by combining the NBD and the logarithmic distribution (LD) into a single distribution that reflects the common information between them with minimal loss of information. The pmf of the LD is given by:
$$f_{\mathrm{LD}}(x) = \frac{-1}{\log(1-p)}\,\frac{p^{x}}{x}, \quad 0 < p < 1,\ x = 1, 2, 3, \ldots$$
For more details about the logarithmic distribution and its properties, see [1].
The new distribution is called the conflation of negative binomial and logarithmic distributions (CNBLD). The LD has a decreasing pmf, hence it is capable of modeling data with a high frequency for low counts while the NBD is over-dispersed; therefore, their conflation is expected to handle data expressing high excesses of low counts and over-dispersion. The LD does not support zero, hence the CNBLD inherits this property, which limits the applications of the CNBLD to positive count data. To overcome this problem, the study also presents two modifications of the CNBLD. The first modification shifts the CNBLD one position to the left, resulting in the shifted CNBLD that is denoted as SCNBLD. The SCNBLD retains the flexibility of the CNBLD but extends its support to zero values. The second modification conflates a shifted logarithmic distribution with the NBD, resulting in the conflation of a negative binomial shift logarithmic distribution (CNBSLD). The CNBSLD also aims to combine the features of both distributions to provide flexibility and the ability to model a wider range of data.
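The conflation of Definition 3 for $n = 2$, and its reading as a weighted distribution in the sense of Definition 2, can be sketched numerically. The Python snippet below is only an illustration: the helper names, the parameter values, and the truncated support are assumptions made here, not part of the paper.

```python
import math

def nbd_pmf(x, r, p):
    """NBD pmf in the paper's parameterization: p**x * (1 - p)**r."""
    log_coef = math.lgamma(x + r) - math.lgamma(x + 1) - math.lgamma(r)
    return math.exp(log_coef + x * math.log(p) + r * math.log(1 - p))

def ld_pmf(x, p):
    """Logarithmic distribution pmf on x = 1, 2, ..."""
    return -(p ** x) / (x * math.log(1 - p))

def conflate(pmfs, support):
    """Definition 3: normalized product of the pmfs over a common support."""
    products = [math.prod(f(x) for f in pmfs) for x in support]
    total = sum(products)
    return [q / total for q in products]

def weighted(pmf, weight, support):
    """Definition 2: w(x) f(x) / E[w(X)], normalized over the same support."""
    raw = [weight(x) * pmf(x) for x in support]
    total = sum(raw)
    return [q / total for q in raw]

r, p1, p2 = 4.0, 0.7, 0.6        # illustrative values
support = range(1, 300)          # truncated common support (assumption)
nbd = lambda x: nbd_pmf(x, r, p1)
ld = lambda x: ld_pmf(x, p2)

conflated = conflate([nbd, ld], support)
as_weighted = weighted(nbd, ld, support)   # NBD weighted by the LD pmf
print(conflated[:5])
```

The two lists agree because, for two components, the normalized product in Equation (1) is exactly the parent pmf multiplied by the other pmf as weight and renormalized.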
The structure of this paper is organized as follows: Section 2 presents the definitions and discusses the graphical representations of the proposed models; Section 3 describes some of the statistical properties of the proposed models, such as moments, log-concavity, index of dispersion, and likelihood ratio stochastic order; Section 4 discusses the estimation of the parameters using the method of moments and the maximum likelihood method and evaluates the accuracy of these estimates by a simulation study; Section 5 outlines the usefulness of the new distribution across several fields, showing its superior performance compared to the existing modified negative binomial distributions employed to fit similar data; finally, Section 6 presents a conclusion.

2. Conflation of Negative Binomial and Logarithmic Distributions

In this section, the conflation of negative binomial and logarithmic distributions (CNBLD) and two developed versions of the CNBLD are introduced. The developed versions are the shifted conflation of negative binomial and logarithmic distributions (SCNBLD) and the conflation of negative binomial weighted by shifted logarithmic distribution (CNBSLD). This section gives the pmfs, the cumulative distribution functions (cdfs), denoted by $F(\cdot)$, the survival functions (sfs), denoted by $\bar{F}(\cdot)$, and the hazard rate functions, denoted by $h$, of the CNBLD, SCNBLD, and CNBSLD.
Definition 4.
The random variable $Y$ is said to follow the CNBLD with parameters $r > 0$ and $0 < \theta < 1$ if its pmf is given as follows:
$$f(y) = P(Y = y) = C_1(r,\theta)\,\frac{\Gamma(y+r)\,\theta^{y}}{\Gamma(y+1)\,\Gamma(r)\,y}, \quad y = 1, 2, \ldots$$
Here, $C_1(r,\theta) = 1/C(r,\theta)$ is the normalizing constant, where
$$\begin{aligned}
C(r,\theta) &= \sum_{k=1}^{\infty} \frac{\Gamma(k+r)}{\Gamma(k+1)\,\Gamma(r)}\,\frac{\theta^{k}}{k}
= \sum_{z=0}^{\infty} \frac{\Gamma(z+r+1)}{\Gamma(z+2)\,\Gamma(r)}\,\frac{\theta^{z+1}}{z+1} \\
&= \theta r \sum_{z=0}^{\infty} \frac{(1)_z\,(1)_z\,(r+1)_z}{(2)_z\,(2)_z}\,\frac{\theta^{z}}{z!}
= \theta r\,{}_3F_2(1, 1, r+1; 2, 2; \theta),
\end{aligned}$$
$(a)_u = \Gamma(a+u)/\Gamma(a)$ is the Pochhammer symbol, and ${}_3F_2$ is a generalized hypergeometric function; for more details, see [1,7].
The generalized hypergeometric function is available in popular programming packages such as R, Mathematica, MATLAB, Python, and others. In this paper, we used the genhypergeo(.) function from the hypergeo package in R.
In comparison with (1), in terms of random variables, if $X$ and $Z$ are independent random variables with $X$ following an NBD and $Z$ following an LD, then $Y \overset{d}{=} X \mid X = Z$. In the special case $r = 1$, the CNBLD reduces to the LD.
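The normalizing constant and the pmf of Definition 4 can be checked with a short Python sketch. The paper evaluates ${}_3F_2$ with genhypergeo from the R package hypergeo; the code below instead sums the defining power series directly, which is an assumption made here to keep the example dependency-free. The names hyp3f2_11r and cnbld_pmf are hypothetical.

```python
import math

def hyp3f2_11r(r, theta, tol=1e-15, max_terms=200000):
    """Power series for 3F2(1, 1, r+1; 2, 2; theta), for 0 < theta < 1."""
    term, total, z = 1.0, 1.0, 0
    while abs(term) > tol * abs(total) and z < max_terms:
        # term_{z+1} / term_z = (1+z)(1+z)(r+1+z) / ((2+z)(2+z)) * theta / (z+1)
        term *= (z + 1) * (r + 1 + z) / ((z + 2) ** 2) * theta
        total += term
        z += 1
    return total

def cnbld_pmf(y, r, theta):
    """CNBLD pmf on y = 1, 2, ... with C(r, theta) = theta * r * 3F2(...)."""
    c = theta * r * hyp3f2_11r(r, theta)
    log_kernel = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                  + y * math.log(theta) - math.log(y))
    return math.exp(log_kernel) / c

# Total mass on a truncated support (truncation point is an assumption).
total_mass = sum(cnbld_pmf(y, 4.0, 0.5) for y in range(1, 400))
print(total_mass)

# Special case r = 1: the CNBLD pmf should match the LD pmf.
ld_value = -(0.3 ** 3) / (3 * math.log(1 - 0.3))
print(cnbld_pmf(3, 1.0, 0.3), ld_value)
```

Summing the series term by term converges slowly when $\theta$ is close to 1, so a library routine is preferable in that regime.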
Remark 1.
It should be noted that according to Definition 4, the CNBLD is a weighted negative binomial distribution with an LD as the weight function. In addition, the CNBLD can be considered as a weighted logarithmic distribution with an NBD as the weighting function.
Next, the pmfs of the CNBLD are visualized in Figure 1 for the parameter values $\theta = 0.2$, $0.5$, and $0.8$ and $r = 1$, $4$, and $8$. The parameter $\theta$ affects the dispersion of the distribution, while the parameter $r$ is mainly responsible for the shape of the CNBLD. In general, the pmf of the CNBLD is skewed to the right; however, as $r$ and $\theta$ increase, the distribution becomes less skewed and more symmetric. For smaller values of $\theta$ and $r$, the pmf of the CNBLD is a decreasing function with high probability at low values of $y$. As $\theta$ and $r$ increase, the pmf instead rises to a peak before decreasing. This change illustrates how larger parameters introduce greater variability into the distribution.
The cdf, sf, and h of the CNBLD are, respectively, as follows:
$$F(y) = 1 - \frac{\theta^{y}\,\Gamma(y+r+1)\,{}_3F_2(1, y+1, y+r+1; y+2, y+2; \theta)}{\Gamma(y+2)\,\Gamma(r+1)\,(y+1)\,{}_3F_2(1, 1, r+1; 2, 2; \theta)},$$
$$\bar{F}(y) = \frac{\theta^{y}\,\Gamma(y+r+1)\,{}_3F_2(1, y+1, y+r+1; y+2, y+2; \theta)}{\Gamma(y+2)\,\Gamma(r+1)\,(y+1)\,{}_3F_2(1, 1, r+1; 2, 2; \theta)},$$
and
$$h(y) = \frac{(y+1)^{2}}{\theta\,y\,(y+r)\,{}_3F_2(1, y+1, y+r+1; y+2, y+2; \theta)}.$$
To derive these expressions, we have
$$\bar{F}(y) = \sum_{t=y+1}^{\infty} C_1(r,\theta)\,\frac{\Gamma(t+r)}{\Gamma(t+1)\,\Gamma(r)}\,\frac{\theta^{t}}{t} = C_1(r,\theta)\,A(y),$$
where
$$\begin{aligned}
A(y) &= \sum_{t=y+1}^{\infty} \frac{\Gamma(t+r)}{\Gamma(t+1)\,\Gamma(r)}\,\frac{\theta^{t}}{t}
= \sum_{z=0}^{\infty} \frac{\Gamma(z+y+r+1)}{\Gamma(z+y+2)\,\Gamma(r)}\,\frac{\theta^{z+y+1}}{z+y+1} \\
&= \theta^{y+1} \sum_{z=0}^{\infty} \frac{(1)_z\,(y+1)_z\,(y+r+1)_z\,\Gamma(y+r+1)}{(y+2)_z\,(y+2)_z\,(y+1)\,\Gamma(y+2)\,\Gamma(r)}\,\frac{\theta^{z}}{z!} \\
&= \frac{\theta^{y+1}\,\Gamma(y+r+1)}{\Gamma(y+2)\,\Gamma(r)\,(y+1)} \sum_{z=0}^{\infty} \frac{(1)_z\,(y+1)_z\,(y+r+1)_z}{(y+2)_z\,(y+2)_z}\,\frac{\theta^{z}}{z!} \\
&= \frac{\theta^{y+1}\,\Gamma(y+r+1)}{\Gamma(y+2)\,\Gamma(r)\,(y+1)}\,{}_3F_2(1, y+1, y+r+1; y+2, y+2; \theta).
\end{aligned}$$
Multiplying $A(y)$ by $C_1(r,\theta) = 1/[\theta r\,{}_3F_2(1, 1, r+1; 2, 2; \theta)]$, the sf is
$$\bar{F}(y) = \frac{\theta^{y}\,\Gamma(y+r+1)\,{}_3F_2(1, y+1, y+r+1; y+2, y+2; \theta)}{\Gamma(y+2)\,\Gamma(r+1)\,(y+1)\,{}_3F_2(1, 1, r+1; 2, 2; \theta)}.$$
Using the definition of the sf, the cdf is
$$F(y) = 1 - \bar{F}(y).$$
Further, the h of the CNBLD is calculated as
$$h(y) = \frac{f(y)}{\bar{F}(y)}.$$
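The closed-form survival and hazard functions of the CNBLD can be cross-checked against direct summation of the pmf. The Python sketch below is a minimal illustration under stated assumptions: hyp_pfq is a plain power-series evaluator written here for the example (the paper itself relies on R's hypergeo package), and all function names and parameter values are hypothetical.

```python
import math

def hyp_pfq(a, b, x, tol=1e-15, max_terms=200000):
    """Power series for the generalized hypergeometric function pFq(a; b; x), |x| < 1."""
    term, total = 1.0, 1.0
    for z in range(max_terms):
        term *= math.prod(ai + z for ai in a) / (math.prod(bi + z for bi in b) * (z + 1)) * x
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def cnbld_pmf(y, r, theta):
    """CNBLD pmf of Definition 4."""
    c = theta * r * hyp_pfq([1, 1, r + 1], [2, 2], theta)
    log_kernel = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                  + y * math.log(theta) - math.log(y))
    return math.exp(log_kernel) / c

def cnbld_sf(y, r, theta):
    """Closed-form survival function P(Y > y) from the text."""
    upper = hyp_pfq([1, y + 1, y + r + 1], [y + 2, y + 2], theta)
    lower = hyp_pfq([1, 1, r + 1], [2, 2], theta)
    log_ratio = (y * math.log(theta) + math.lgamma(y + r + 1)
                 - math.lgamma(y + 2) - math.lgamma(r + 1) - math.log(y + 1))
    return math.exp(log_ratio) * upper / lower

def cnbld_hazard(y, r, theta):
    """Closed-form hazard h(y) = f(y) / sf(y) from the text."""
    upper = hyp_pfq([1, y + 1, y + r + 1], [y + 2, y + 2], theta)
    return (y + 1) ** 2 / (theta * y * (y + r) * upper)

r, theta, y = 3.0, 0.5, 2                     # illustrative values
sf_direct = 1.0 - sum(cnbld_pmf(t, r, theta) for t in range(1, y + 1))
print(cnbld_sf(y, r, theta), sf_direct)
print(cnbld_hazard(y, r, theta), cnbld_pmf(y, r, theta) / sf_direct)
```

Note that the hazard here follows the paper's convention $h(y) = f(y)/\bar{F}(y)$ with $\bar{F}(y) = P(Y > y)$.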
Most real-life count data have zero as a possible value. Therefore, the current study extends the CNBLD to include zero in two ways. The first, and most obvious, shifts the CNBLD by one to the left. The second shifts the LD before conflating it with the NBD. This leads to the following two definitions:
Definition 5.
The random variable $Y$ is said to follow the SCNBLD with parameters $r > 0$ and $0 < \theta < 1$ if its pmf is given by the following:
$$f(y) = \frac{\Gamma(y+r+1)\,\theta^{y}}{\Gamma(y+2)\,\Gamma(r+1)\,{}_3F_2(1, 1, r+1; 2, 2; \theta)\,(y+1)}, \quad y = 0, 1, 2, \ldots$$
Consequently, the cdf, the sf, and the h of the SCNBLD are, respectively, as follows:
$$F(y) = 1 - \frac{\theta^{y+1}\,\Gamma(y+r+2)\,{}_3F_2(1, y+2, y+r+2; y+3, y+3; \theta)}{\Gamma(y+3)\,\Gamma(r+1)\,(y+2)\,{}_3F_2(1, 1, r+1; 2, 2; \theta)},$$
$$\bar{F}(y) = \frac{\theta^{y+1}\,\Gamma(y+r+2)\,{}_3F_2(1, y+2, y+r+2; y+3, y+3; \theta)}{\Gamma(y+3)\,\Gamma(r+1)\,(y+2)\,{}_3F_2(1, 1, r+1; 2, 2; \theta)},$$
and
$$h(y) = \frac{(y+2)^{2}}{\theta\,(y+1)\,(y+r+1)\,{}_3F_2(1, y+2, y+r+2; y+3, y+3; \theta)}.$$
Definition 6.
The random variable $Y$ is said to follow the CNBSLD with parameters $r > 0$ and $0 < \theta < 1$ if its pmf is given as follows:
$$f(y) = \frac{\theta\,(r-1)\,\Gamma(y+r)\,\theta^{y}}{\Gamma(y+1)\,\Gamma(r)\,\left[(1-\theta)^{-(r-1)} - 1\right](y+1)}, \quad y = 0, 1, \ldots$$
Remark 2.
Note that the CNBSLD is a shifted LD for r = 1 and a geometric distribution for r = 2 , and hence the CNBSLD can be considered as an extension of the two distributions.
The following theorem can be used to derive the CNBSLD.
Theorem 1.
If $X$ and $Z$ are independent random variables following an NBD with parameters $r > 0$ and $0 < p_1 < 1$ and a shifted logarithmic distribution with parameter $0 < p_2 < 1$, respectively, then the random variable $Y$ with
$$P(Y = y) = P(X = y \mid X = Z)$$
follows the CNBSLD with parameters $r$ and $\theta = p_1 p_2$.
Proof. 
The proof can be obtained directly by calculating the conditional probability. □
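The conditional-probability calculation behind Theorem 1, and the geometric special case of Remark 2, can be illustrated numerically. In the Python sketch below, the shifted LD is taken to be the LD moved one step to the left, so that it is supported on $0, 1, 2, \ldots$; this reading, the function names, the parameter values, and the truncation point are assumptions made for the illustration.

```python
import math

def nbd_pmf(y, r, p):
    """NBD pmf in the paper's parameterization: p**y * (1 - p)**r."""
    log_coef = math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
    return math.exp(log_coef + y * math.log(p) + r * math.log(1 - p))

def shifted_ld_pmf(z, p):
    """LD shifted one step to the left, supported on z = 0, 1, 2, ..."""
    return -(p ** (z + 1)) / ((z + 1) * math.log(1 - p))

def cnbsld_pmf(y, r, theta):
    """CNBSLD pmf of Definition 6 (requires r != 1; r = 1 is a limiting case)."""
    norm = theta * (r - 1) / ((1 - theta) ** (-(r - 1)) - 1)
    log_kernel = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                  + y * math.log(theta) - math.log(y + 1))
    return norm * math.exp(log_kernel)

r, p1, p2 = 3.5, 0.8, 0.5        # illustrative values
support = range(0, 400)          # truncated support (assumption)

# P(X = y | X = Z) is the normalized product of the two pmfs.
raw = [nbd_pmf(y, r, p1) * shifted_ld_pmf(y, p2) for y in support]
total = sum(raw)
conditional = [q / total for q in raw]
direct = [cnbsld_pmf(y, r, p1 * p2) for y in support]
max_gap = max(abs(a - b) for a, b in zip(conditional, direct))
print(max_gap)

# Remark 2: for r = 2 the CNBSLD is geometric, f(y) = (1 - theta) * theta**y.
print(cnbsld_pmf(3, 2.0, 0.4), 0.6 * 0.4 ** 3)
```

The product of the two pmfs depends on $p_1$ and $p_2$ only through $(p_1 p_2)^y$ up to constants, which is why the single parameter $\theta = p_1 p_2$ suffices.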
The cdf, sf, and h of the CNBSLD are, respectively:
$$F(y) = 1 - \frac{\theta^{y+2}\,(r-1)\,\Gamma(y+r+1)\,{}_2F_1(1, y+r+1; y+3; \theta)}{\Gamma(y+2)\,\Gamma(r)\,(y+2)\,\left[(1-\theta)^{-(r-1)} - 1\right]},$$
$$\bar{F}(y) = \frac{\theta^{y+2}\,(r-1)\,\Gamma(y+r+1)\,{}_2F_1(1, y+r+1; y+3; \theta)}{\Gamma(y+2)\,\Gamma(r)\,(y+2)\,\left[(1-\theta)^{-(r-1)} - 1\right]},$$
and
$$h(y) = \frac{y+2}{\theta\,(y+r)\,{}_2F_1(1, y+r+1; y+3; \theta)}.$$
Here, ${}_2F_1(\cdot, \cdot; \cdot; \cdot)$ is the Gaussian hypergeometric function (see [1,7] for more information). The sf of the CNBSLD is calculated as follows:
$$\bar{F}(y) = \sum_{t=y+1}^{\infty} \frac{\theta\,(r-1)\,\Gamma(t+r)\,\theta^{t}}{\Gamma(t+1)\,\Gamma(r)\,\left[(1-\theta)^{-(r-1)} - 1\right](t+1)} = \frac{\theta\,(r-1)}{(1-\theta)^{-(r-1)} - 1}\,B(y),$$
where
$$\begin{aligned}
B(y) &= \sum_{t=y+1}^{\infty} \frac{\Gamma(t+r)\,\theta^{t}}{\Gamma(t+1)\,\Gamma(r)\,(t+1)}
= \sum_{z=0}^{\infty} \frac{\Gamma(z+y+r+1)\,\theta^{z+y+1}}{\Gamma(z+y+2)\,\Gamma(r)\,(z+y+2)} \\
&= \frac{\theta^{y+1}\,\Gamma(y+r+1)}{\Gamma(y+2)\,\Gamma(r)\,(y+2)} \sum_{z=0}^{\infty} \frac{z!\,\Gamma(y+r+1+z)\,\Gamma(y+3)}{\Gamma(y+r+1)\,\Gamma(y+3+z)}\,\frac{\theta^{z}}{z!} \\
&= \frac{\theta^{y+1}\,\Gamma(y+r+1)}{\Gamma(y+2)\,\Gamma(r)\,(y+2)} \sum_{z=0}^{\infty} \frac{(1)_z\,(y+r+1)_z}{(y+3)_z}\,\frac{\theta^{z}}{z!} \\
&= \frac{\theta^{y+1}\,\Gamma(y+r+1)\,{}_2F_1(1, y+r+1; y+3; \theta)}{\Gamma(y+2)\,\Gamma(r)\,(y+2)}.
\end{aligned}$$
Thus, according to $B(y)$, the sf is
$$\bar{F}(y) = \frac{\theta^{y+2}\,(r-1)\,\Gamma(y+r+1)\,{}_2F_1(1, y+r+1; y+3; \theta)}{\Gamma(y+2)\,\Gamma(r)\,(y+2)\,\left[(1-\theta)^{-(r-1)} - 1\right]}.$$
Using the definition of the sf, the cdf is
$$F(y) = 1 - \bar{F}(y).$$
Further, the h of the CNBSLD is calculated as
$$h(y) = \frac{f(y)}{\bar{F}(y)}.$$
A comparison between the SCNBLD, the CNBSLD, and the NBD can be made by looking at their pmfs for different values of $\theta$ and $r$. Figure 2 shows the pmfs for $r = 1$, $4$, and $8$ with $\theta = 0.2$, $0.5$, and $0.8$. All distributions are skewed to the right but tend toward symmetry for large $\theta$ and $r$. The differences between the distributions decrease significantly as $\theta$ and $r$ increase, and the distributions behave more similarly. In all plots, the pmfs appear identical for relatively large $y$, where all probabilities are small; the point beyond which this happens depends on the value of $r$. For example, the pmfs of all distributions are the same after $y = 4$ when $\theta = 0.25$ and $r = 1$. In general, as $r$ increases, the value of $y$ beyond which the pmfs coincide increases. The plots show that both $r$ and $\theta$ have a clear influence on the behavior of the different distributions.

3. Some Statistical Properties

In this section, we examine several useful statistical properties of the CNBLD, SCNBLD, and CNBSLD. These include deriving the mean, variance, and probability generating functions for each distribution. In addition, we calculate the index of dispersion (ID) for the CNBLD, SCNBLD, and CNBSLD, which provides information about the variability relative to their means. Furthermore, we discuss the likelihood ratio stochastic order and log-concavity property for the new distributions. The likelihood ratio stochastic order study is extended to the NBD to provide a more comprehensive understanding of the relative behaviors and properties of these distributions.

3.1. Moments and Probability Generating Functions

The moments and the probability generating function (pgf) of the CNBLD are given below.
The mean, the variance, and the pgf of the CNBLD are as follows:
$$\mu = C_1(r,\theta)\left[(1-\theta)^{-r} - 1\right],$$
$$\mathrm{Var}(Y) = \frac{r^{2}\theta^{2}(1-\theta)^{-(r+1)}\,{}_3F_2(1, 1, r+1; 2, 2; \theta) - \left[(1-\theta)^{-r} - 1\right]^{2}}{\left[\theta r\,{}_3F_2(1, 1, r+1; 2, 2; \theta)\right]^{2}},$$
and
$$G(s) = \frac{s\,{}_3F_2(1, 1, r+1; 2, 2; \theta s)}{{}_3F_2(1, 1, r+1; 2, 2; \theta)}.$$
These formulas are obtained as follows. For the mean,
$$\mu = \sum_{y=1}^{\infty} y\,C_1(r,\theta)\,\frac{\theta^{y}\,\Gamma(y+r)}{y\,\Gamma(y+1)\,\Gamma(r)} = C_1(r,\theta) \sum_{y=1}^{\infty} \frac{\theta^{y}\,\Gamma(y+r)}{\Gamma(y+1)\,\Gamma(r)} = C_1(r,\theta)\left[(1-\theta)^{-r} - 1\right].$$
For the variance, the second moment can be expressed as follows:
$$\begin{aligned}
E(Y^{2}) &= \sum_{y=1}^{\infty} y^{2}\,C_1(r,\theta)\,\frac{\theta^{y}\,\Gamma(y+r)}{y\,\Gamma(y+1)\,\Gamma(r)} = C_1(r,\theta) \sum_{y=1}^{\infty} \frac{y\,\theta^{y}\,\Gamma(y+r)}{\Gamma(y+1)\,\Gamma(r)} \\
&= C_1(r,\theta)\,r\theta \sum_{z=0}^{\infty} \frac{\Gamma(z+r+1)\,\theta^{z}}{\Gamma(z+1)\,\Gamma(r+1)} = \frac{(1-\theta)^{-(r+1)}}{{}_3F_2(1, 1, r+1; 2, 2; \theta)}.
\end{aligned}$$
Hence, the variance is
$$\begin{aligned}
\mathrm{Var}(Y) &= E(Y^{2}) - \mu^{2} = \frac{(1-\theta)^{-(r+1)}}{{}_3F_2(1, 1, r+1; 2, 2; \theta)} - \left\{C_1(r,\theta)\left[(1-\theta)^{-r} - 1\right]\right\}^{2} \\
&= \frac{r^{2}\theta^{2}(1-\theta)^{-(r+1)}\,{}_3F_2(1, 1, r+1; 2, 2; \theta) - \left[(1-\theta)^{-r} - 1\right]^{2}}{\left[\theta r\,{}_3F_2(1, 1, r+1; 2, 2; \theta)\right]^{2}}.
\end{aligned}$$
The form of the pgf follows directly from the pmf of the CNBLD.
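The closed-form mean and variance of the CNBLD can be compared with moments obtained by summing the pmf directly. The following Python sketch is illustrative only: the series evaluator, the function names, the parameter values, and the truncation point are assumptions made here rather than the authors' R code.

```python
import math

def hyp3f2_11r(r, theta, tol=1e-15, max_terms=200000):
    """Power series for 3F2(1, 1, r+1; 2, 2; theta), for 0 < theta < 1."""
    term, total, z = 1.0, 1.0, 0
    while abs(term) > tol * abs(total) and z < max_terms:
        term *= (z + 1) * (r + 1 + z) / ((z + 2) ** 2) * theta
        total += term
        z += 1
    return total

def cnbld_moments_closed(r, theta):
    """Mean and variance from the closed-form expressions in the text."""
    f32 = hyp3f2_11r(r, theta)
    c1 = 1.0 / (theta * r * f32)
    mean = c1 * ((1 - theta) ** (-r) - 1)
    second = (1 - theta) ** (-(r + 1)) / f32
    return mean, second - mean ** 2

def cnbld_moments_numeric(r, theta, upper=600):
    """Mean and variance by direct summation over y = 1, ..., upper - 1."""
    c1 = 1.0 / (theta * r * hyp3f2_11r(r, theta))
    mean = second = 0.0
    for y in range(1, upper):
        f = c1 * math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                          + y * math.log(theta) - math.log(y))
        mean += y * f
        second += y * y * f
    return mean, second - mean ** 2

closed = cnbld_moments_closed(3.0, 0.6)     # illustrative parameter values
numeric = cnbld_moments_numeric(3.0, 0.6)
print(closed, numeric)
```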
The moments and the pgf of the SCNBLD are as follows:
$$\mu = C_1(r,\theta)\left[(1-\theta)^{-r} - 1\right] - 1,$$
$$\mathrm{Var}(Y) = \frac{r^{2}\theta^{2}(1-\theta)^{-(r+1)}\,{}_3F_2(1, 1, r+1; 2, 2; \theta) - \left[(1-\theta)^{-r} - 1\right]^{2}}{\left[\theta r\,{}_3F_2(1, 1, r+1; 2, 2; \theta)\right]^{2}},$$
and
$$G(s) = \frac{{}_3F_2(1, 1, r+1; 2, 2; \theta s)}{{}_3F_2(1, 1, r+1; 2, 2; \theta)}.$$
These results follow from the fact that $Z = Y - 1$ follows the SCNBLD when $Y$ follows the CNBLD.
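Finally, the mean, the variance, and the pgf of the CNBSLD are as follows:
$$\mu = \frac{(1-\theta)^{r} + r\theta - 1}{1 - (1-\theta)^{r} - \theta},$$
$$\mathrm{Var}(Y) = \frac{\theta\,(r-1)\left\{(1-\theta)^{r}\left[\theta(r-1) + 1\right] + \theta - 1\right\}}{(\theta - 1)\left[(1-\theta)^{r} + \theta - 1\right]^{2}},$$
and
$$G(s) = \frac{(1 - s\theta)^{-(r-1)} - 1}{s\left[(1-\theta)^{-(r-1)} - 1\right]}.$$
They can be obtained as follows:
$$\begin{aligned}
E(Y+1) &= \sum_{y=0}^{\infty} (y+1)\,\frac{\theta\,(r-1)\,\theta^{y}\,\Gamma(y+r)}{\left[(1-\theta)^{-(r-1)} - 1\right](y+1)\,\Gamma(y+1)\,\Gamma(r)} = \frac{\theta\,(r-1)}{(1-\theta)^{-(r-1)} - 1} \sum_{y=0}^{\infty} \frac{\Gamma(y+r)\,\theta^{y}}{\Gamma(y+1)\,\Gamma(r)} \\
&= \frac{\theta\,(r-1)\,(1-\theta)^{-r}}{(1-\theta)^{-(r-1)} - 1} = \frac{\theta\,(r-1)}{1 - (1-\theta)^{r} - \theta}.
\end{aligned}$$
Hence,
$$\mu = E(Y) = E(Y+1) - 1 = \frac{\theta\,(r-1)}{1 - (1-\theta)^{r} - \theta} - 1 = \frac{(1-\theta)^{r} + r\theta - 1}{1 - (1-\theta)^{r} - \theta}.$$
For the variance, we obtain the following:
$$\begin{aligned}
E[Y(Y+1)] &= \sum_{y=0}^{\infty} y\,(y+1)\,\frac{\theta\,(r-1)\,\Gamma(y+r)\,\theta^{y}}{\left[(1-\theta)^{-(r-1)} - 1\right](y+1)\,\Gamma(y+1)\,\Gamma(r)} = \frac{\theta\,(r-1)}{(1-\theta)^{-(r-1)} - 1} \sum_{y=0}^{\infty} \frac{y\,\theta^{y}\,\Gamma(y+r)}{\Gamma(y+1)\,\Gamma(r)} \\
&= \frac{\theta\,(r-1)\,\theta r\,(1-\theta)^{-(r+1)}}{(1-\theta)^{-(r-1)} - 1} = \frac{\theta^{2}\,r\,(r-1)}{(\theta - 1)\left[(1-\theta)^{r} + \theta - 1\right]}.
\end{aligned}$$
Hence, the variance is
$$\begin{aligned}
\mathrm{Var}(Y) &= E[Y(Y+1)] - E(Y) - \mu^{2} \\
&= \frac{\theta^{2}\,r\,(r-1)}{(\theta - 1)\left[(1-\theta)^{r} + \theta - 1\right]} - \frac{(1-\theta)^{r} + r\theta - 1}{1 - (1-\theta)^{r} - \theta} - \left[\frac{(1-\theta)^{r} + r\theta - 1}{1 - (1-\theta)^{r} - \theta}\right]^{2} \\
&= \frac{\theta\,(r-1)\left\{(1-\theta)^{r}\left[\theta(r-1) + 1\right] + \theta - 1\right\}}{(\theta - 1)\left[(1-\theta)^{r} + \theta - 1\right]^{2}}.
\end{aligned}$$
The form of the pgf follows directly from the pmf of the CNBSLD.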
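The CNBSLD moment formulas involve only elementary functions, so they are easy to verify numerically. The Python sketch below compares the closed forms with direct summation and checks the geometric case $r = 2$; the function names, parameter values, and truncation point are hypothetical choices made for this illustration.

```python
import math

def cnbsld_pmf(y, r, theta):
    """CNBSLD pmf of Definition 6 (requires r != 1)."""
    norm = theta * (r - 1) / ((1 - theta) ** (-(r - 1)) - 1)
    log_kernel = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                  + y * math.log(theta) - math.log(y + 1))
    return norm * math.exp(log_kernel)

def cnbsld_moments_closed(r, theta):
    """Mean and variance from the closed-form expressions in the text."""
    d = 1 - (1 - theta) ** r - theta
    mean = ((1 - theta) ** r + r * theta - 1) / d
    var = (theta * (r - 1) * ((1 - theta) ** r * (theta * (r - 1) + 1) + theta - 1)
           / ((theta - 1) * d ** 2))
    return mean, var

def cnbsld_moments_numeric(r, theta, upper=600):
    """Mean and variance by direct summation over y = 0, ..., upper - 1."""
    probs = [cnbsld_pmf(y, r, theta) for y in range(upper)]
    mean = sum(y * q for y, q in enumerate(probs))
    second = sum(y * y * q for y, q in enumerate(probs))
    return mean, second - mean ** 2

closed = cnbsld_moments_closed(3.5, 0.4)     # illustrative parameter values
numeric = cnbsld_moments_numeric(3.5, 0.4)
print(closed, numeric)

# r = 2 gives the geometric distribution: mean theta/(1-theta), variance theta/(1-theta)**2.
print(cnbsld_moments_closed(2.0, 0.5))
```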

3.2. Index of Dispersion

In this subsection, we examine the ID of the NBD, SCNBLD, and CNBSLD, denoted by $\mathrm{ID}_{\mathrm{NBD}}$, $\mathrm{ID}_{\mathrm{SCNBLD}}$, and $\mathrm{ID}_{\mathrm{CNBSLD}}$, respectively. Since $\mathrm{ID}_{\mathrm{SCNBLD}}$ and $\mathrm{ID}_{\mathrm{CNBSLD}}$ have complicated mathematical formulas, the IDs are calculated for selected values of $r$ and $\theta$ in Table 1.
  • $\mathrm{ID}_{\mathrm{NBD}}$ is given by $\frac{1}{1-\theta}$. This implies that as $\theta$ increases, $\mathrm{ID}_{\mathrm{NBD}}$ increases, indicating higher dispersion with higher $\theta$.
  • For the SCNBLD and CNBSLD, as $\theta$ increases, $\mathrm{ID}_{\mathrm{SCNBLD}}$ and $\mathrm{ID}_{\mathrm{CNBSLD}}$ increase. This means that for a fixed $r$, the dispersion increases as $\theta$ increases.
  • For the SCNBLD and CNBSLD, as $r$ increases, $\mathrm{ID}_{\mathrm{SCNBLD}}$ and $\mathrm{ID}_{\mathrm{CNBSLD}}$ also increase. This suggests that for a fixed $\theta$, the dispersion increases as $r$ increases.
  • If $r > 1$, then $\mathrm{ID}_{\mathrm{CNBSLD}} < \mathrm{ID}_{\mathrm{SCNBLD}}$; as a result, the SCNBLD is more suitable for data with greater dispersion. This suggests that the value of $r$ determines the interchangeability of the two distributions.
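A table of IDs of the kind described above can be approximated by normalizing each pmf kernel numerically, which avoids the hypergeometric function altogether. The Python sketch below takes that shortcut; it is an assumption-laden illustration (hypothetical function names, truncated support, arbitrary parameter grid) and does not reproduce the paper's Table 1.

```python
import math

def nb_coef(y, r):
    """Gamma(y + r) / (Gamma(y + 1) Gamma(r))."""
    return math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r))

def id_from_kernel(kernel, support):
    """Normalize a non-negative kernel numerically and return Var / mean."""
    w = [kernel(y) for y in support]
    total = sum(w)
    mean = sum(y * q for y, q in zip(support, w)) / total
    second = sum(y * y * q for y, q in zip(support, w)) / total
    return (second - mean ** 2) / mean

SUPPORT = range(0, 2000)   # truncation point is an assumption

def id_nbd(r, theta):
    return id_from_kernel(lambda y: nb_coef(y, r) * theta ** y, SUPPORT)

def id_scnbld(r, theta):
    # CNBLD kernel evaluated at y + 1 (shift by one to the left)
    return id_from_kernel(lambda y: nb_coef(y + 1, r) * theta ** (y + 1) / (y + 1), SUPPORT)

def id_cnbsld(r, theta):
    return id_from_kernel(lambda y: nb_coef(y, r) * theta ** y / (y + 1), SUPPORT)

for r in (2.0, 4.0, 8.0):
    for theta in (0.2, 0.5, 0.8):
        print(r, theta, id_nbd(r, theta), id_scnbld(r, theta), id_cnbsld(r, theta))
```

For $r = 2$ the CNBSLD is geometric, so its ID equals $1/(1-\theta)$ and coincides with that of the NBD, which gives a convenient sanity check on the output.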

3.3. Log-Concavity Property

Log-concave probability distributions are essential in various areas, including reliability theory, labor economics, monopoly theory, mechanism design theory, political science, and law. Refer to [8] for additional information.
Definition 7.
A discrete random variable $X$ with pmf $f$ is log-concave if $f^{2}(x+1) \ge f(x)\,f(x+2)$ for all $x$.
Theorem 2.
The pmf of the CNBLD is log-concave for $r \ge 7$ and log-convex for $r \le 2$.
Proof. 
For $y = 1, 2, 3, \ldots$, we have
$$A(y, r) = \frac{\left(f(y+1)\right)^{2}}{f(y)\,f(y+2)} = \frac{\left(\dfrac{\Gamma(y+r+1)}{\Gamma(y+2)}\right)^{2}}{\dfrac{\Gamma(y+r)\,\Gamma(y+r+2)}{\Gamma(y+1)\,\Gamma(y+3)}} \cdot \frac{y\,(y+2)}{(y+1)^{2}}.$$
Using the property that $\Gamma(y+1) = y\,\Gamma(y)$, we obtain
$$A(y, r) = \frac{y\,(y+2)^{2}}{(y+1)^{3}} \cdot \frac{y+r}{y+1+r} = \frac{y\,(y+2)^{2}}{(y+1)^{3}} \left(1 - \frac{1}{y+1+r}\right).$$
As a result, $A(y, r)$ is increasing in $r$ for $y = 1, 2, 3, \ldots$. Now, $A(1, r) \ge 1$ if and only if $\frac{9}{8} \cdot \frac{1+r}{2+r} \ge 1$, which is true if and only if $r \ge 7$. Moreover, $A(1, r) \ge 1$ implies $A(y, r) \ge 1$ for any $y$: for $r \ge 7$ we have $A(y, r) \ge A(y, 7)$, and $A(y, 7) \ge 1$ because $y\,(y+2)^{2}(y+7) - (y+1)^{3}(y+8) = 5y^{2} + 3y - 8 \ge 0$ for $y \ge 1$. Thus, $A(y, r) \ge 1$ for all $y$, so that $f(y)$ is log-concave, when $r \ge 7$.
For $r \le 2$, $A(y, r) \le A(y, 2)$, and
$$A(y, 2) = \frac{y\,(y+2)^{3}}{(y+1)^{3}(y+3)} < 1,$$
since $(y+1)^{3}(y+3) - y\,(y+2)^{3} = 2y + 3 > 0$. Hence, $A(y, r) \le 1$ for all $y$, and $f(y)$ is log-convex when $r \le 2$.
This completes the proof. □
The CNBLD offers a flexible alternative to the NBD, with properties that depend on the parameter $r$. While the NBD is log-concave and unimodal for $r \ge 1$ and log-convex for $r \le 1$, the CNBLD has similar properties but with different transition points: it is log-concave and unimodal when $r \ge 7$ and log-convex when $r \le 2$. Moreover, the transition from log-convex to log-concave in the CNBLD is gradual as $r$ increases from 2 to 7, which improves its ability to model more diverse datasets. This flexibility makes the CNBLD well suited for building statistical models that better fit the characteristics of the data and allow a more effective analysis and interpretation than the NBD.
Remark 3.
Using a similar argument, we can conclude that $A(2, r) \ge 1$ if and only if $r \ge 3.4$. Thus, for $3.4 \le r < 7$, the pmf of the CNBLD is log-concave on the set $\{2, 3, 4, \ldots\}$. The transition from log-convexity to log-concavity occurs gradually as $r$ rises from 2 to 7.
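The thresholds in Theorem 2 and Remark 3 depend only on the ratio $A(y, r)$, which is free of $\theta$, so they are simple to check numerically. The short Python sketch below does this over a finite range of $y$; the function name and the range are illustrative assumptions.

```python
def log_concavity_ratio(y, r):
    """A(y, r) = f(y+1)^2 / (f(y) f(y+2)) for the CNBLD; it does not depend on theta."""
    return y * (y + 2) ** 2 * (y + r) / ((y + 1) ** 3 * (y + 1 + r))

ys = range(1, 1001)   # finite check range (assumption)

print(log_concavity_ratio(1, 7.0))                        # boundary case at y = 1
print(min(log_concavity_ratio(y, 7.0) for y in ys))       # log-concave case, r = 7
print(max(log_concavity_ratio(y, 2.0) for y in ys))       # log-convex case, r = 2
print(log_concavity_ratio(1, 5.0), log_concavity_ratio(2, 5.0))  # intermediate r
```

For an intermediate value such as $r = 5$, the ratio is below 1 at $y = 1$ but above 1 at $y = 2$, which matches the gradual transition described in Remark 3.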
Remark 4.
The SCNBLD is log-convex for $r \le 2$ and log-concave for $r \ge 7$, in contrast to the NBD, which is log-convex for $r \le 1$ and log-concave for $r \ge 1$. This is because log-concavity does not change with shifting.
Theorem 3.
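The CNBSLD is log-convex for $r \le 2$ and log-concave for $r \ge 2$.
Proof. 
From Definition 6, $f(y+1)/f(y) = \theta\,(y+r)/(y+2)$, which is non-increasing in $y$ if and only if $r \ge 2$ and non-decreasing in $y$ if and only if $r \le 2$. For $r > 1$, this agrees with the log-concavity of the NBD with parameter $r - 1$, to which the CNBSLD pmf is proportional after truncation at zero and a shift by one; log-concavity remains unchanged by truncation and shifting. □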

3.4. Likelihood Ratio Stochastic Order

The likelihood ratio stochastic ordering provides a powerful method for comparing distributions, regardless of whether they belong to the same family with different parameters or are of completely different types. We can determine the ordering relationship between random variables by analyzing the likelihood ratio, which gives us insights into their probabilistic behavior and trends. In this section, we discuss the likelihood ratio stochastic ordering for our new distributions. We also extend this discussion to compare the likelihood ratio stochastic ordering for our new distributions with the NBD.
First, we introduce the definition of the likelihood ratio stochastic order used in this subsection.
Definition 8.
Let $Y_1$ and $Y_2$ be two discrete random variables with pmfs $f(y)$ and $g(y)$, respectively. We say that $Y_1$ is smaller than $Y_2$ in the likelihood ratio stochastic order (denoted by $Y_1 \le_{lr} Y_2$) if the ratio $\frac{g(y)}{f(y)}$ is non-decreasing in $y$ over the union of the supports of $Y_1$ and $Y_2$.
The likelihood ratio stochastic order is very strong; it implies the hazard rate stochastic order and other stochastic orders. For more details on the implications and applications of stochastic ordering, see [9]. In this subsection, we refer to the CNBLD with parameters $\theta$ and $r$ as CNBLD($\theta$, $r$).
Theorem 4.
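Let $Y_1$ and $Y_2$ be two random variables following CNBLD($\theta_1$, $r$) and CNBLD($\theta_2$, $r$), respectively. If $\theta_1 \le \theta_2$, then $Y_1 \le_{lr} Y_2$.
Proof. 
Let $f_{(\theta, r)}(y)$ be the pmf of the CNBLD($\theta$, $r$). Then, we obtain the following:
$$\frac{f_{(\theta_2, r)}(y)}{f_{(\theta_1, r)}(y)} = \Phi_r(\theta_1, \theta_2) \left(\frac{\theta_2}{\theta_1}\right)^{y}.$$
Here, $\Phi_r(\theta_1, \theta_2) = \frac{C_1(r, \theta_2)}{C_1(r, \theta_1)}$ is a positive constant that does not depend on $y$. Since the ratio $\left(\frac{\theta_2}{\theta_1}\right)^{y}$ is non-decreasing in $y$ if and only if $\theta_1 \le \theta_2$, this implies $Y_1 \le_{lr} Y_2$. □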
Remark 5.
For the SCNBLD and the CNBSLD, the following implications hold:
  • If $Y_1$ and $Y_2$ are two random variables following SCNBLD($\theta_1$, $r$) and SCNBLD($\theta_2$, $r$), respectively, such that $\theta_1 \le \theta_2$, then $Y_1 \le_{lr} Y_2$.
  • If $Y_1$ and $Y_2$ are two random variables following CNBSLD($\theta_1$, $r$) and CNBSLD($\theta_2$, $r$), respectively, such that $\theta_1 \le \theta_2$, then $Y_1 \le_{lr} Y_2$.
Proof. 
The proof is similar to the proof of Theorem 4. □
Theorem 5.
Let $Y_1$ and $Y_2$ be two random variables following CNBLD($\theta$, $r_1$) and CNBLD($\theta$, $r_2$), respectively. If $r_1 \le r_2$, then $Y_1 \le_{lr} Y_2$.
Proof. 
Let $f_{(\theta, r)}(y)$ be the pmf of CNBLD($\theta$, $r$). Then, we obtain the following:
$$\frac{f_{(\theta, r_2)}(y)}{f_{(\theta, r_1)}(y)} = \Psi_\theta(r_1, r_2)\,\frac{\Gamma(y + r_2)}{\Gamma(y + r_1)},$$
which, for integers $r_1 < r_2$, equals $\Psi_\theta(r_1, r_2)\,(y + r_2 - 1)(y + r_2 - 2) \cdots (y + r_1)$.
Here, $\Psi_\theta(r_1, r_2) = \frac{C_1(r_2, \theta)\,\Gamma(r_1)}{C_1(r_1, \theta)\,\Gamma(r_2)}$ is a positive constant that does not depend on $y$. Since the ratio $\frac{f_{(\theta, r_2)}(y)}{f_{(\theta, r_1)}(y)}$ is non-decreasing in $y$ if and only if $r_1 \le r_2$, this implies $Y_1 \le_{lr} Y_2$. □
Remark 6.
For the SCNBLD and the CNBSLD, the following implications hold:
  • If $Y_1$ and $Y_2$ are two random variables following SCNBLD($\theta$, $r_1$) and SCNBLD($\theta$, $r_2$), respectively, such that $r_1 \le r_2$, then $Y_1 \le_{lr} Y_2$.
  • If $Y_1$ and $Y_2$ are two random variables following CNBSLD($\theta$, $r_1$) and CNBSLD($\theta$, $r_2$), respectively, such that $r_1 \le r_2$, then $Y_1 \le_{lr} Y_2$.
Proof. 
The proof is similar to the proof of Theorem 5. □
Corollary 1.
Let $Y(\theta, r)$ denote a random variable following CNBLD($\theta$, $r$). If $r_1 \le r_2$ and $\theta_1 \le \theta_2$, then we conclude from Theorems 4 and 5 the following:
$$Y(\theta_1, r_1) \le_{lr} Y(\theta_2, r_1) \le_{lr} Y(\theta_2, r_2).$$
Hence, the following is given:
$$Y(\theta_1, r_1) \le_{lr} Y(\theta_2, r_2).$$
Theorem 6.
Let $Y_1$, $Y_2$, and $Y_3$ be three random variables following SCNBLD($\theta$, $r$), CNBSLD($\theta$, $r$), and NBD($\theta$, $r$), respectively. Then, $Y_1 \le_{lr} Y_2 \le_{lr} Y_3$.
Proof. 
To prove that $Y_1 \le_{lr} Y_2$, we examine the following ratio:
$$\frac{f_{\mathrm{SCNBLD}}(y)}{f_{\mathrm{CNBSLD}}(y)} = \Lambda_1(r, \theta)\,\frac{y + r}{y + 1},$$
where $\Lambda_1(r, \theta) = \frac{(1-\theta)^{-(r-1)} - 1}{\theta\,r\,(r-1)\,{}_3F_2(1, 1, r+1; 2, 2; \theta)}$. When $r > 1$, the term $\frac{y+r}{y+1} = 1 + \frac{r-1}{y+1}$ is a decreasing function of $y$, so the reciprocal ratio $f_{\mathrm{CNBSLD}}(y)/f_{\mathrm{SCNBLD}}(y)$ is non-decreasing in $y$. Therefore, $Y_1 \le_{lr} Y_2$ by Definition 8.
Similarly, we examine the following ratio:
$$\frac{f_{\mathrm{NBD}}(y)}{f_{\mathrm{SCNBLD}}(y)} = \Lambda_2(r, \theta)\,\frac{(y+1)^{2}}{y + r}.$$
Here, $\Lambda_2(r, \theta) = r\,(1-\theta)^{r}\,{}_3F_2(1, 1, r+1; 2, 2; \theta)$. Multiplying the two ratios gives $f_{\mathrm{NBD}}(y)/f_{\mathrm{CNBSLD}}(y) = \Lambda_1(r, \theta)\,\Lambda_2(r, \theta)\,(y+1)$, which is an increasing function of $y$. Hence, $Y_2 \le_{lr} Y_3$.
Since we have shown that $Y_1 \le_{lr} Y_2$ and $Y_2 \le_{lr} Y_3$, we conclude that $Y_1 \le_{lr} Y_2 \le_{lr} Y_3$. This means that in the likelihood ratio stochastic order, $Y_3$ is stochastically larger than $Y_2$, and $Y_2$ is stochastically larger than $Y_1$.
In various fields such as economics, insurance, and risk management, the stochastic order of the likelihood ratio is concerned with risk analysis and decision-making by identifying which distributions are more or less likely to produce large values. □

4. Estimation and Simulation Study

This section examines the estimation of the CNBLD and CNBSLD parameters using the method of moments (MM) and the maximum likelihood (ML) method. Two scenarios were investigated: one where $r$ was known and another where $r$ was unknown. In both scenarios, it was assumed that $Y_1, Y_2, \ldots, Y_n$ was a random sample drawn from the distribution under study. Furthermore, simulation studies were used to assess the efficiency of the estimates produced by the two methods.

4.1. Parameter Estimation of the CNBLD

4.1.1. Case 1: r Is Known

Here, we had only one parameter $\theta$ to estimate. The MM estimate solves
$$E(Y) = \bar{y},$$
or, equivalently,
$$C^{-1}(r, \theta)\left[(1-\theta)^{-r} - 1\right] - \bar{y} = 0.$$
Using the ML method, the likelihood function can be given as follows:
$$L(\theta, r \mid y_1, y_2, \ldots, y_n) = \frac{\theta^{\sum_{i=1}^{n} y_i - n}}{\left[r!\;{}_3F_2(1, 1, r+1; 2, 2; \theta)\right]^{n}} \prod_{i=1}^{n} \frac{\Gamma(y_i + r)}{y_i\,\Gamma(y_i + 1)}.$$
Further, the log-likelihood function $\ell(\theta)$ from Equation (3) above is given as follows:
$$\ell(\theta) = \left(\sum_{i=1}^{n} y_i - n\right)\ln\theta - n\left[\ln\Gamma(r+1) + \ln {}_3F_2(1, 1, r+1; 2, 2; \theta)\right] + \sum_{i=1}^{n}\left[\ln\Gamma(y_i + r) - \ln\Gamma(y_i + 1) - \ln y_i\right].$$
The ML estimate of $\theta$ is the solution of the equation $\frac{\partial \ell(\theta)}{\partial\theta} = 0$. Now, since $\frac{\partial}{\partial\theta}\,{}_3F_2(1, 1, r+1; 2, 2; \theta) = \frac{r+1}{4}\,{}_3F_2(2, 2, r+2; 3, 3; \theta)$, we have:
$$\left(\sum_{i=1}^{n} y_i - n\right)\frac{1}{\theta} - n\,\frac{r+1}{4}\,\frac{{}_3F_2(2, 2, r+2; 3, 3; \theta)}{{}_3F_2(1, 1, r+1; 2, 2; \theta)} = 0,$$
$$\left(\sum_{i=1}^{n} y_i - n\right)\frac{1}{\theta} - n\,\frac{r+1}{4}\,\frac{4(1-\theta)^{-r} - 4\theta r\;{}_3F_2(1, 1, r+1; 2, 2; \theta) - 4}{\theta^{2} r (1+r)\;{}_3F_2(1, 1, r+1; 2, 2; \theta)} = 0,$$
$$\left(\sum_{i=1}^{n} y_i - n\right)\frac{1}{\theta} - \frac{n}{\theta^{2} r}\,\frac{(1-\theta)^{-r} - \theta r\;{}_3F_2(1, 1, r+1; 2, 2; \theta) - 1}{{}_3F_2(1, 1, r+1; 2, 2; \theta)} = 0,$$
$$\frac{(1-\theta)^{-r} - 1}{\theta\, r\;{}_3F_2(1, 1, r+1; 2, 2; \theta)} - \bar{y} = 0,$$
$$C^{-1}(r, \theta)\left[(1-\theta)^{-r} - 1\right] - \bar{y} = 0,$$
which is the same as Equation (2). Thus, if $r$ is known, the ML estimate of $\theta$ coincides with the MM estimate.
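Because the estimating equation for $\theta$ has no closed-form solution, it has to be solved numerically. A minimal Python sketch under stated assumptions is given below: it uses the mean $E(Y) = \frac{(1-\theta)^{-r}-1}{\theta r\,{}_3F_2(1,1,r+1;2,2;\theta)}$ from the derivation above, relies on the mean being increasing in $\theta$, and applies plain bisection. The function names and the search interval are hypothetical choices and not the routine used in the paper.

```python
import math

def hyp3f2(r, theta, tol=1e-16, max_terms=200_000):
    """Power series of 3F2(1, 1, r+1; 2, 2; theta), for 0 <= theta < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (r + 1 + k) * (k + 1) / (k + 2) ** 2 * theta
        total += term
        if term < tol * total:
            break
    return total

def cnbld_mean(theta, r):
    """E(Y) for the CNBLD, as it appears in the estimating equation."""
    return ((1 - theta) ** (-r) - 1) / (theta * r * hyp3f2(r, theta))

def solve_theta(ybar, r, lo=1e-6, hi=0.999, iters=100):
    """Bisection for E(Y) = ybar; assumes 1 < ybar < cnbld_mean(hi, r)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cnbld_mean(mid, r) < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: recover theta from the population mean it generates.
target = cnbld_mean(0.5, 2.0)
theta_hat = solve_theta(target, 2.0)
print(theta_hat)
```

For $r = 1$ the CNBLD pmf is proportional to $\theta^{y-1}/y$, so the mean should reduce to that of the logarithmic distribution, which gives a convenient check of `cnbld_mean`.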

4.1.2. Case 2: r Is Unknown

To obtain the MM estimates of $\theta$ and $r$, the following equations were used:
$$E(Y) = \bar{y}$$
and
$$E(Y^{2}) = \frac{1}{n}\sum_{i=1}^{n} y_i^{2},$$
or, equivalently,
$$\frac{(1-\theta)^{-r} - 1}{\theta\, r\;{}_3F_2(1, 1, r+1; 2, 2; \theta)} - \bar{y} = 0$$
and
$$\frac{(1-\theta)^{-(r+1)}}{{}_3F_2(1, 1, r+1; 2, 2; \theta)} - \frac{1}{n}\sum_{i=1}^{n} y_i^{2} = 0.$$
Equations (5) and (6) cannot be solved in closed form. Therefore, they were solved numerically using the “nleqslv” function from the “nleqslv” package in the R programming language, which gave the MM estimates of $r$ and $\theta$.
Using the ML method, the likelihood function defined in Equation (3) was used to determine the ML estimates of the unknown parameters $r$ and $\theta$ as the solutions of the following likelihood equations:
$$\frac{\partial \ell(\theta, r)}{\partial\theta} = \frac{\partial}{\partial\theta}\left[\left(\sum_{i=1}^{n} y_i - n\right)\ln\theta - n\ln {}_3F_2(1, 1, r+1; 2, 2; \theta)\right] = 0,$$
$$\frac{\partial \ell(\theta, r)}{\partial r} = \frac{\partial}{\partial r}\left[-n\left(\ln\Gamma(r+1) + \ln {}_3F_2(1, 1, r+1; 2, 2; \theta)\right) + \sum_{i=1}^{n}\ln\Gamma(y_i + r)\right] = 0.$$
These two equations must be solved simultaneously to determine the MLEs of the parameters $\theta$ and $r$. Because the generalized hypergeometric function ${}_3F_2$ rules out a closed-form solution, numerical methods are used.
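The two moment expressions can be verified against the pmf directly. The following Python sketch is an illustration under stated assumptions: it uses the CNBLD pmf implied by the likelihood in Equation (3), $f(y) = \theta^{y-1}\Gamma(y+r) / [\Gamma(r+1)\,{}_3F_2(1,1,r+1;2,2;\theta)\,y\,\Gamma(y+1)]$ for $y = 1, 2, \ldots$, and compares truncated sums with the closed forms. The helper `moment_residuals` is a hypothetical name for the function one would pass to a nonlinear solver.

```python
import math

def hyp3f2(r, theta, tol=1e-16, max_terms=200_000):
    """Power series of 3F2(1, 1, r+1; 2, 2; theta), for 0 <= theta < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (r + 1 + k) * (k + 1) / (k + 2) ** 2 * theta
        total += term
        if term < tol * total:
            break
    return total

def cnbld_pmf(y, theta, r):
    """Assumed CNBLD pmf on y = 1, 2, ..., read off the likelihood."""
    log_kernel = ((y - 1) * math.log(theta) + math.lgamma(y + r)
                  - math.lgamma(r + 1) - math.log(y) - math.lgamma(y + 1))
    return math.exp(log_kernel) / hyp3f2(r, theta)

def cnbld_moments(theta, r):
    """Closed forms for E(Y) and E(Y^2) used in the moment equations."""
    f = hyp3f2(r, theta)
    mean = ((1 - theta) ** (-r) - 1) / (theta * r * f)
    second = (1 - theta) ** (-(r + 1)) / f
    return mean, second

def moment_residuals(theta, r, m1, m2):
    """Left-hand sides of the two moment equations for given sample moments."""
    mean, second = cnbld_moments(theta, r)
    return mean - m1, second - m2

theta, r = 0.5, 5.0
probs = [(y, cnbld_pmf(y, theta, r)) for y in range(1, 600)]
m1 = sum(y * p for y, p in probs)        # E(Y) by direct summation
m2 = sum(y * y * p for y, p in probs)    # E(Y^2) by direct summation
print(moment_residuals(theta, r, m1, m2))
```

Since the SCNBLD is the CNBLD shifted by one, `m1 - 1` and `m2 - m1**2` can be compared with the SCNBLD mean and variance listed in Table 1 for $r = 5$, $\theta = 0.5$.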

4.2. Simulation Study for the CNBLD

To evaluate the estimation methods, we performed the simulations outlined below.

4.2.1. Case 1: r Is Known

In this simulation study, we considered the values $r = 1, 5$, and $10$ and $\theta = 0.2, 0.5$, and $0.8$. The simulation algorithm consisted of the following steps:
Algorithm 1: Simulation algorithm where r is known
  1. Choose the values of $r$ and $\theta$ and the sample size $n = 10, 20, 50, 100$, or $500$.
  2. Generate a total of 1000 random samples of size $n$ from a standard uniform distribution, namely, $\{U_1, U_2, \ldots, U_n\}$ such that $U_i \sim \mathrm{Unif}(0, 1)$ and $i = 1, \ldots, n$.
  3. Divide the unit interval $[0, 1]$ into intervals $I_j = [F(Y_{j-1}), F(Y_j)]$, where $F(Y_j)$ is the cdf of the CNBLD at $Y_j$, $j = 1, 2, \ldots$.
  4. Find $j$ such that $U_i \in I_j$.
  5. Return $Y_j$.
  6. Repeat steps 4 and 5 for $i = 1, 2, \ldots, n$.
  7. Use the likelihood function in Equation (3) to define a function that takes the parameters as input and returns the negative log-likelihood.
  8. Find the ML estimate of $\theta$.
  9. Repeat steps 1 to 8 $N = 1000$ times to calculate the following:
    (a) The standardized bias of the simulated estimates, defined as
    $$\mathrm{SBias}(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\frac{\hat{\theta}_i - \theta}{\theta}.$$
    (b) The mean squared error of the simulated estimates, defined as
    $$\mathrm{MSE}(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N}(\hat{\theta}_i - \theta)^{2}.$$
Simulation and concluding results:
The simulation results are shown in Table 2, Table 3 and Table 4 below.
From Table 2, Table 3 and Table 4, it is possible to conclude the following:
  • The MSE of $\hat{\theta}$ decreases as $n$ increases, which supports the consistency of the estimator of $\theta$.
  • The |SBias| decreases as $\theta$ increases.
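The steps of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration and not the authors' R code: it assumes the CNBLD pmf implied by Equation (3), draws samples by inverse-transform sampling with a pmf recurrence, and obtains the ML estimate of $\theta$ by solving $E(Y) = \bar{y}$ with bisection (valid because the ML and MM equations coincide when $r$ is known). The function names, the number of replications, and the seed are hypothetical choices made to keep the run short.

```python
import math
import random

def hyp3f2(r, theta, tol=1e-16, max_terms=200_000):
    """Power series of 3F2(1, 1, r+1; 2, 2; theta), for 0 <= theta < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (r + 1 + k) * (k + 1) / (k + 2) ** 2 * theta
        total += term
        if term < tol * total:
            break
    return total

def cnbld_mean(theta, r):
    return ((1 - theta) ** (-r) - 1) / (theta * r * hyp3f2(r, theta))

def sample_cnbld(n, theta, r, rng):
    """Inverse-transform sampling (steps 2 to 6 of Algorithm 1)."""
    p1 = 1.0 / hyp3f2(r, theta)              # P(Y = 1)
    sample = []
    for _ in range(n):
        u = rng.random()
        y, p = 1, p1
        cdf = p
        while cdf < u and y < 100_000:       # cap guards against round-off
            p *= theta * (y + r) * y / (y + 1) ** 2   # P(Y = y+1) from P(Y = y)
            y += 1
            cdf += p
        sample.append(y)
    return sample

def ml_theta(sample, r, lo=1e-6, hi=0.999, iters=60):
    """ML estimate of theta for known r via bisection on E(Y) = ybar."""
    ybar = sum(sample) / len(sample)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cnbld_mean(mid, r) < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(theta, r, n, reps, seed=0):
    """Standardized bias and MSE of the estimates over `reps` replications."""
    rng = random.Random(seed)
    est = [ml_theta(sample_cnbld(n, theta, r, rng), r) for _ in range(reps)]
    sbias = sum((t - theta) / theta for t in est) / reps
    mse = sum((t - theta) ** 2 for t in est) / reps
    return sbias, mse

sbias, mse = simulate(theta=0.5, r=5.0, n=100, reps=200)
print(f"SBias = {sbias:.4f}, MSE = {mse:.5f}")
```

With only 200 replications the output is noisier than the tabulated values, which are based on $N = 1000$; increasing `reps` brings the two closer at the cost of run time.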

4.2.2. Case 2: r Is Unknown

Numerical solutions were used to obtain the estimates of $\theta$ and $r$ when $r$ was unknown. In that scenario, we considered $r = 1, 4$, and $8$ and $\theta = 0.2, 0.5$, and $0.8$. The simulation algorithm involved the following steps:
Algorithm 2: Simulation algorithm where r is unknown
  1. Generate 1000 random samples of size $n$ from the CNBLD following steps 1–6 in Algorithm 1.
  2. Solve the system of nonlinear equations in Equations (5) and (6) to find the MM estimates of $\theta$ and $r$.
  3. Find the ML estimates of $\theta$ and $r$ using Equation (3).
  4. Repeat steps 1 to 3 $N = 1000$ times to calculate the Bias, SBias, MSE, and SMSE for both the ML and MM methods.
Simulation and concluding results:
The results are reported in Table 5. From Table 5, one can conclude the following:
  • For both the MM and ML estimates, the MSEs of $\hat{r}$ and $\hat{\theta}$ decrease as $n$ increases.
  • For large $n$, both the ML and MM estimates perform well, but for small $n$, the ML estimate is better than the MM estimate, as indicated by the smaller |SBias| values of the ML estimates.

4.3. Parameter Estimation of the CNBSLD

This subsection presents the MM and ML estimation methods for the CNBSLD; their performance is assessed by simulation in Section 4.4.

4.3.1. Case 1: r Is Known

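Here, we had only one parameter $\theta$ to estimate. The MM estimate was obtained from the following equation:
$$E(Y) = \bar{y},$$
or, equivalently,
$$\frac{(1-\theta)^{r} + r\theta - 1}{1 - (1-\theta)^{r} - \theta} - \bar{y} = 0.$$
Using the ML method, the likelihood function can be given as follows:
$$L(\theta, r \mid y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} \frac{\theta\,(r-1)\,\theta^{y_i}\,\Gamma(y_i + r)}{\left((1-\theta)^{-(r-1)} - 1\right)(y_i + 1)\,\Gamma(y_i + 1)\,\Gamma(r)}.$$
Simplified, the likelihood function becomes:
$$L(\theta, r \mid y_1, y_2, \ldots, y_n) = \left[\frac{\theta\,(r-1)}{(1-\theta)^{-(r-1)} - 1}\right]^{n} \theta^{\sum_{i=1}^{n} y_i} \prod_{i=1}^{n} \frac{\Gamma(y_i + r)}{(y_i + 1)\,\Gamma(y_i + 1)\,\Gamma(r)}.$$
Further, the log-likelihood function $\ell(\theta)$ from Equation (7) above is given as follows:
$$\ell(\theta) = n\log\left[\frac{\theta\,(r-1)}{(1-\theta)^{-(r-1)} - 1}\right] + \sum_{i=1}^{n} y_i \log(\theta) + \sum_{i=1}^{n}\log\left[\frac{\Gamma(y_i + r)}{(y_i + 1)\,\Gamma(y_i + 1)\,\Gamma(r)}\right].$$
The ML estimate of $\theta$ is the solution of the equation $\frac{\partial \ell(\theta)}{\partial\theta} = 0$. Then,
$$\frac{n}{\theta} + \frac{(1-\theta)^{-r}\, n\,(1-r)}{(1-\theta)^{1-r} - 1} + \frac{\sum_{i=1}^{n} y_i}{\theta} = 0,$$
$$\frac{(1-\theta)^{-r}(\theta r - 1) + 1}{(1-\theta)^{1-r} - 1} - \bar{y} = 0.$$
Multiplying the numerator and denominator of the last fraction by $(1-\theta)^{r}$ recovers the MM equation above, so the ML and MM estimates of $\theta$ coincide when $r$ is known.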
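As with the CNBLD, the estimating equation for $\theta$ can be solved by a one-dimensional root search. The Python sketch below is illustrative only; it assumes the CNBSLD mean $E(Y) = \frac{(1-\theta)^{r} + r\theta - 1}{1 - (1-\theta)^{r} - \theta}$ from the MM equation, requires $r \neq 1$, and uses bisection with a hypothetical search interval and function names.

```python
import math

def cnbsld_mean(theta, r):
    """E(Y) for the CNBSLD; undefined at r = 1 (the expression becomes 0/0)."""
    a = (1 - theta) ** r
    return (a + r * theta - 1) / (1 - a - theta)

def cnbsld_pmf(y, theta, r):
    """Assumed CNBSLD pmf on y = 0, 1, 2, ..., read off the likelihood."""
    const = theta * (r - 1) / ((1 - theta) ** (-(r - 1)) - 1)
    log_p = (y * math.log(theta) + math.lgamma(y + r) - math.lgamma(r)
             - math.log(y + 1) - math.lgamma(y + 1))
    return const * math.exp(log_p)

def solve_theta_cnbsld(ybar, r, lo=1e-4, hi=1 - 1e-9, iters=100):
    """Bisection for E(Y) = ybar; relies on the mean increasing in theta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cnbsld_mean(mid, r) < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For r = 2 the mean simplifies to theta / (1 - theta), so ybar = 1 gives 0.5.
theta_hat = solve_theta_cnbsld(1.0, 2.0)
print(theta_hat)
```

The lower bound of the search interval is kept away from zero because the numerator of the mean suffers from cancellation for very small $\theta$.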

4.3.2. Case 2: r Is Unknown

To obtain the MM estimates of $\theta$ and $r$, the following equations were used:
$$E(Y) = \bar{y}$$
and
$$E(Y^{2}) = \frac{1}{n}\sum_{i=1}^{n} y_i^{2},$$
or, equivalently,
$$\frac{(1-\theta)^{r} + r\theta - 1}{1 - (1-\theta)^{r} - \theta} - \bar{y} = 0$$
and
$$\frac{\theta r(\theta r - 1) - (1-\theta)^{r+1} - \theta + 1}{(\theta - 1)\left[(1-\theta)^{r} + \theta - 1\right]} - \frac{1}{n}\sum_{i=1}^{n} y_i^{2} = 0.$$
Equations (8) and (9) cannot be solved in closed form. Therefore, they were solved numerically, which gave the MM estimates of $r$ and $\theta$.
Using the ML method, the likelihood function defined in Equation (7) was used to determine the ML estimates of the unknown parameters $r$ and $\theta$ as the solutions of the following likelihood equations:
$$\frac{\partial \ell(\theta, r)}{\partial\theta} = \frac{\partial}{\partial\theta}\left[n\log\left(\frac{\theta\,(r-1)}{(1-\theta)^{-(r-1)} - 1}\right) + \sum_{i=1}^{n} y_i\log(\theta)\right] = 0,$$
$$\frac{\partial \ell(\theta, r)}{\partial r} = \frac{\partial}{\partial r}\left[n\log\left(\frac{\theta\,(r-1)}{(1-\theta)^{-(r-1)} - 1}\right) + \sum_{i=1}^{n}\log\frac{\Gamma(y_i + r)}{\Gamma(r)}\right] = 0.$$
These two equations must be solved simultaneously to determine the MLEs of the parameters $\theta$ and $r$.

4.4. Simulation Study for the CNBSLD

To evaluate the estimation methods, we performed the simulations outlined below.

4.4.1. Case 1: r Is Known

In this simulation study, Algorithm 1 from Section 4.2.1 was implemented with the following exceptions:
  • In step 3, $F(Y_j)$ was the cdf of the CNBSLD.
  • In step 7, the likelihood function employed was the one given in Equation (7).
Simulation and concluding results:
The simulation results are shown in Table 6, Table 7 and Table 8 below.
From Table 6, Table 7 and Table 8, it is possible to conclude the following:
  • The MSE decreases as $n$ increases.
  • The |SBias| decreases as $n$ increases.
  • Both the |SBias| and the MSE indicate that a sample size of $n = 10$ or more is required to estimate $\theta$ accurately.

4.4.2. Case 2: r Is Unknown

In this case, the steps of Algorithm 2 from Section 4.2.2 were followed, with the following exceptions:
  • In step 1, random samples were generated from the CNBSLD.
  • In step 2, the nonlinear equations in Equations (8) and (9) were solved to find the MM estimates of $\theta$ and $r$.
  • In step 3, the likelihood in Equation (7) was maximized numerically to obtain the ML estimates of $\theta$ and $r$.
Simulation and concluding results:
The simulation results are shown in Table 9 below.
From Table 9, one can conclude the following:
  • For the MM estimates, the MSEs of $\hat{r}$ and $\hat{\theta}$ decrease as $n$ increases.
  • For the ML estimates, the MSEs of $\hat{r}$ and $\hat{\theta}$ decrease as $n$ increases.
  • For large $n$, both the ML and MM estimates perform well, but for small $n$, the ML estimate is better than the MM estimate according to the |SBias|.

5. Applications

In this section, the effectiveness of the CNBLD, SCNBLD, and CNBSLD is evaluated using real datasets and compared with other existing distributions. The parameter estimates for these distributions were obtained using the ML method. The tools of comparison used were the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). In general, the smaller the values of these statistics, the better the fit to the data. The calculations were performed using the R programming language.
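For reference, both criteria are simple functions of the maximized log-likelihood $\ell$, the number of parameters $k$, and the sample size $n$: $\mathrm{AIC} = 2k - 2\ell$ and $\mathrm{BIC} = k\ln n - 2\ell$. The short Python sketch below illustrates them. The log-likelihood value used in the example is not reported in the paper; it is back-computed here from the AIC listed for the NBD in Table 12, purely as a consistency check.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

# NBD fit to the machinist data (Table 12): k = 2 parameters, n = 414.
# Back-computed assumption: AIC = 768.06 implies loglik = (2*2 - 768.06) / 2.
loglik_nbd = (2 * 2 - 768.06) / 2
print(round(aic(loglik_nbd, 2), 2), round(bic(loglik_nbd, 2, 414), 2))
```

Because the BIC penalty $k\ln n$ exceeds the AIC penalty $2k$ whenever $n > e^{2} \approx 7.4$, the BIC favors models with fewer parameters more strongly on all three datasets considered here.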

5.1. The Number of Eggs per Flower Head

According to [10], the zero-truncated Poisson–Lindley distribution (ZTPLD) provides a better fit than the zero-truncated PD (ZTPD) for the data on the number of eggs per flower head shown in Table 10. The ZTPD was applied to this dataset in [11].

Application and Concluding Results

The calculated values of the mean, variance, and dispersion index were $3.0340$, $3.3436$, and $1.1020$, respectively. These values indicated the presence of over-dispersion. In Table 10, when comparing the expected frequencies of the ZTPLD and CNBLD, we noticed that the LD had an impact on the CNBLD by gradually raising the probability of small values of $Y$. On the other hand, the truncated zero effect on the ZTPLD just increased its probability at zero. Table 10 shows that the CNBLD model provided the best fit to the data, with the lowest AIC and BIC values.
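The summary statistics quoted above can be reproduced from the observed frequencies in Table 10. The Python sketch below is a small illustration; it assumes that the reported variance is the sample variance with divisor $n - 1$, which is the choice that matches the reported figures to about three decimal places.

```python
# Observed frequencies from Table 10: number of eggs -> number of flower heads.
counts = {1: 22, 2: 18, 3: 18, 4: 11, 5: 9, 6: 6, 7: 3, 8: 0, 9: 1}

n = sum(counts.values())                                  # total number of flower heads
mean = sum(y * f for y, f in counts.items()) / n
# Sample variance with divisor n - 1 (an assumption about the paper's convention).
var = sum(f * (y - mean) ** 2 for y, f in counts.items()) / (n - 1)
dispersion_index = var / mean                             # > 1 indicates over-dispersion

print(n, mean, var, dispersion_index)
```

A dispersion index above 1 means the variance exceeds the mean, which is the sense in which these data are over-dispersed relative to a Poisson model.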

5.2. The Number of Hospital Stays by United States Residents Aged 66 and Over

These data cover the number of hospitalizations of United States residents aged 66 and over, as reported by [12]. The dataset contains 80.37% zeros and has a sample index of dispersion (ID) of $1.882$. These characteristics indicate over-dispersion and a high proportion of zero counts. The zero-inflated negative binomial-generalized exponential (ZINB-GE) distribution was applied to the same dataset in [13], where it outperformed the zero-inflated Poisson distribution (ZIPD) and the zero-inflated negative binomial distribution (ZINBD) in terms of data fit.

Application and Concluding Results

Table 11 shows that the SCNBLD and CNBSLD models fit the data well. These models outperformed the ZINB-GE model, with lower AIC and BIC values, indicating that they can handle over-dispersion and a large number of zero counts. Among the models considered, the SCNBLD and CNBSLD are therefore the most suitable for modeling hospitalizations of United States residents aged 66 and over.

5.3. Accident Frequency Data among Machinists

The data in this subsection concern the frequency of accidents among 414 machinists and originate from a study conducted towards the end of World War I by the Industrial Fatigue Research Board, documented in the report [14]. The data cover a three-month period and were collected to assess the frequency of industrial accidents, particularly in environments with heavy machinery; they were also used in [2].

Application and Concluding Results

The values for the mean, variance, and dispersion index were 0.4831 , 1.0106 , and 2.0919 , respectively. Based on Table 12, the SCNBLD and CNBSLD models were considered the best fit for these data as they had the lowest AIC and BIC values.

6. Conclusions

This study presented a novel distribution obtained by conflating probability distributions, specifically by combining the NBD and the LD into a single distribution that captures their shared information with minimal loss. The new distribution is referred to as the conflation of negative binomial and logarithmic distributions (CNBLD). Like the LD, the CNBLD models positive count data, but unlike the NBD, it does not accommodate zero values. To overcome this constraint, two modified models were introduced. The first is the SCNBLD, obtained by shifting the CNBLD one position to the left. The second is the conflation of the negative binomial and shifted logarithmic distributions (CNBSLD), which conflates a shifted logarithmic distribution with the NBD to incorporate the characteristics of both. These two distributions provide increased flexibility and can model a wider range of data. The statistical properties of the CNBLD, SCNBLD, and CNBSLD were investigated. In addition, we studied the estimation of the CNBLD and CNBSLD parameters using the MM and ML methods and assessed the efficiency of the resulting estimates through simulation studies. The simulation results indicated that both the ML and MM estimates are consistent. The models were then compared with several competing distributions on real data, where the new distributions provided a better fit. Finally, although the new models were applied here to particular datasets, they can be applied in other domains and offer a useful alternative for modeling count data.

Author Contributions

Conceptualization, A.A.A. (Abdulhamid A. Alzaid) and A.A.A. (Anfal A. Alqefari); methodology, A.A.A. (Abdulhamid A. Alzaid), A.A.A. (Anfal A. Alqefari) and N.Q.; validation, A.A.A. (Abdulhamid A. Alzaid), A.A.A. (Anfal A. Alqefari), and N.Q.; writing—original draft preparation, A.A.A. (Anfal A. Alqefari); writing—review and editing, A.A.A. (Abdulhamid A. Alzaid), A.A.A. (Anfal A. Alqefari), and N.Q.; visualization, A.A.A. (Anfal A. Alqefari) and N.Q.; funding acquisition, N.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R376), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data Availability Statement

We used publicly available data.

Acknowledgments

The authors gratefully acknowledge Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R376), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia for the financial support for this project.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Johnson, N.L.; Kemp, A.W.; Kotz, S. Univariate Discrete Distributions; John Wiley & Sons: Hoboken, NJ, USA, 2005. [Google Scholar]
  2. Greenwood, M.; Yule, G.U. An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. J. R. Stat. Soc. 1920, 83, 255–279. [Google Scholar] [CrossRef]
  3. Fisher, R.A. The effect of methods of ascertainment upon the estimation of frequencies. Ann. Eugen. 1934, 6, 13–25. [Google Scholar] [CrossRef]
  4. Rao, C.R. On discrete distributions arising out of methods of ascertainment. Sankhyā Indian J. Stat. Ser. A 1965, 27, 311–324. [Google Scholar]
  5. Barmalzan, G.; Saboori, H.; Kosari, S. A Modified Negative Binomial Distribution: Properties, Overdispersion and Underdispersion. J. Stat. Theory Appl. 2019, 18, 343–350. [Google Scholar] [CrossRef]
  6. Hill, T. Conflations of probability distributions. Trans. Am. Math. Soc. 2011, 363, 3351–3372. [Google Scholar] [CrossRef]
  7. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables; US Government Printing Office: Washington, DC, USA, 1968.
  8. Bagnoli, M.; Bergstrom, T. Log-concave probability and its applications. In Rationality and Equilibrium: A Symposium in Honor of Marcel K. Richter; Springer: New York, NY, USA, 2006; pp. 217–241. [Google Scholar]
  9. Shaked, M.; Shanthikumar, J.G. Stochastic Orders; Series in Statistics; Springer: New York, NY, USA, 2007. [Google Scholar]
  10. Shanker, R.; Hagos, F.; Sujatha, S.; Abrehe, Y. On zero-truncation of Poisson and Poisson-Lindley distributions and their applications. Biom. Biostat. Int. J. 2015, 2, 168–181. [Google Scholar] [CrossRef]
  11. Finney, D.; Varley, G. An example of the truncated Poisson distribution. Biometrics 1955, 11, 387–394. [Google Scholar] [CrossRef]
  12. Flynn, M.; Francis, L.A. More flexible GLMs zero-inflated models and hybrid models. Casualty Actuar. Soc. 2009, 2009, 148–224. [Google Scholar]
  13. Aryuyuen, S.; Bodhisuwan, W.; Supapakorn, T. Zero inflated negative binomial-generalized exponential distribution and its applications. Songklanakarin J. Sci. Technol. 2014, 36, 483–491. [Google Scholar]
  14. Greenwood, M.; Woods, H.M. The Incidence of Industrial Accidents upon Individuals: With Special Reference to Multiple Accidents; HM Stationery Office: London, UK, 1919. [Google Scholar]
Figure 1. The pmfs of the CNBLD for different values of $\theta$ and $r$.
Figure 2. The pmfs of the SCNBLD, the CNBSLD, and the NBD for different values of $\theta$ and $r$.
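Table 1. The index of dispersion for the NBD, SCNBLD, and CNBSLD for different values of $r$ and $\theta$.
| $r$ | $\theta$ | SCNBLD Mean | SCNBLD Variance | SCNBLD ID | CNBSLD Mean | CNBSLD Variance | CNBSLD ID | NBD Mean | NBD Variance | NBD ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.2 | 0.0883 | 0.1042 | 1.1807 | 0.0590 | 0.0699 | 1.1840 | 0.1250 | 0.1562 | 1.25 |
| 0.5 | 0.5 | 0.3079 | 0.5220 | 1.6955 | 0.2071 | 0.3536 | 1.7071 | 0.5 | 0.5 | 2 |
| 0.5 | 0.8 | 0.9104 | 3.2622 | 3.5833 | 0.6180 | 2.2361 | 3.6180 | 2 | 10 | 5 |
| 2 | 0.2 | 0.1889 | 0.2378 | 1.2592 | 0.25 | 0.3125 | 1.25 | 0.5 | 0.625 | 1.25 |
| 2 | 0.5 | 0.7718 | 1.5855 | 2.0541 | 1 | 2 | 2 | 2 | 4 | 2 |
| 2 | 0.8 | 3.2785 | 17.3486 | 5.2916 | 4 | 20 | 5 | 8 | 40 | 5 |
| 5 | 0.2 | 0.4323 | 0.6115 | 1.4143 | 0.6937 | 0.9421 | 1.3579 | 1.25 | 1.5625 | 1.25 |
| 5 | 0.5 | 2.3418 | 6.0804 | 2.5965 | 3.2667 | 7.3956 | 2.2639 | 5 | 10 | 2 |
| 5 | 0.8 | 13.5341 | 79.5349 | 5.8766 | 15.0256 | 79.7172 | 5.3054 | 20 | 100 | 5 |
| 8 | 0.2 | 0.7403 | 1.1536 | 1.5581 | 1.2143 | 1.7396 | 1.4325 | 2 | 2.5 | 1.25 |
| 8 | 0.5 | 4.7797 | 13.0137 | 2.7226 | 6.0551 | 13.7213 | 2.2660 | 8 | 16 | 2 |
| 8 | 0.8 | 25.7471 | 140.5006 | 5.4569 | 27.0003 | 139.9917 | 5.1848 | 32 | 160 | 5 |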
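Table 2. The ML results of the CNBLD for different values of $\theta$ when $r = 1$.
| $\theta$ | $n$ | $\hat{\theta}$ | $\lvert\mathrm{SBias}\rvert$ | MSE |
|---|---|---|---|---|
| 0.2 | 10 | 0.1874 | 0.0627 | 0.0153 |
| 0.2 | 20 | 0.1912 | 0.0437 | 0.0075 |
| 0.2 | 50 | 0.1931 | 0.0343 | 0.0055 |
| 0.2 | 100 | 0.1954 | 0.0229 | 0.0037 |
| 0.2 | 500 | 0.1996 | 0.0018 | 0.0018 |
| 0.5 | 10 | 0.4789 | 0.0421 | 0.0106 |
| 0.5 | 20 | 0.4883 | 0.0233 | 0.0059 |
| 0.5 | 50 | 0.4962 | 0.0074 | 0.0048 |
| 0.5 | 100 | 0.4971 | 0.0058 | 0.0016 |
| 0.5 | 500 | 0.5006 | 0.0012 | 0.0009 |
| 0.8 | 10 | 0.7841 | 0.0199 | 0.0049 |
| 0.8 | 20 | 0.7919 | 0.0101 | 0.0021 |
| 0.8 | 50 | 0.7974 | 0.0032 | 0.0009 |
| 0.8 | 100 | 0.7984 | 0.0018 | 0.0004 |
| 0.8 | 500 | 0.7987 | 0.0015 | 0.0002 |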
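Table 3. The ML results of the CNBLD for different values of $\theta$ when $r = 5$.
| $\theta$ | $n$ | $\hat{\theta}$ | $\lvert\mathrm{SBias}\rvert$ | MSE |
|---|---|---|---|---|
| 0.2 | 10 | 0.1879 | 0.0604 | 0.0063 |
| 0.2 | 20 | 0.1909 | 0.0452 | 0.0032 |
| 0.2 | 50 | 0.1982 | 0.0091 | 0.0013 |
| 0.2 | 100 | 0.1985 | 0.0077 | 0.0006 |
| 0.2 | 500 | 0.2005 | 0.0025 | 0.0001 |
| 0.5 | 10 | 0.4892 | 0.0215 | 0.0044 |
| 0.5 | 20 | 0.4961 | 0.0078 | 0.0021 |
| 0.5 | 50 | 0.4980 | 0.0039 | 0.0008 |
| 0.5 | 100 | 0.5004 | 0.0008 | 0.0004 |
| 0.5 | 500 | 0.5003 | 0.0006 | 0.0001 |
| 0.8 | 10 | 0.7961 | 0.0049 | 0.0009 |
| 0.8 | 20 | 0.7981 | 0.0024 | 0.0004 |
| 0.8 | 50 | 0.7994 | 0.0008 | 0.0002 |
| 0.8 | 100 | 0.7998 | 0.0003 | 0.0001 |
| 0.8 | 500 | 0.8001 | 0.0002 | 0.00002 |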
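Table 4. The ML results of the CNBLD for different values of $\theta$ when $r = 10$.
| $\theta$ | $n$ | $\hat{\theta}$ | $\lvert\mathrm{SBias}\rvert$ | MSE |
|---|---|---|---|---|
| 0.2 | 10 | 0.1920 | 0.0396 | 0.0025 |
| 0.2 | 20 | 0.1977 | 0.0114 | 0.0013 |
| 0.2 | 50 | 0.1983 | 0.0083 | 0.0005 |
| 0.2 | 100 | 0.1997 | 0.0011 | 0.0002 |
| 0.2 | 500 | 0.1999 | 0.0003 | 0.0001 |
| 0.5 | 10 | 0.4981 | 0.0037 | 0.0014 |
| 0.5 | 20 | 0.4987 | 0.0025 | 0.0007 |
| 0.5 | 50 | 0.4991 | 0.0019 | 0.0002 |
| 0.5 | 100 | 0.4995 | 0.0009 | 0.0001 |
| 0.5 | 500 | 0.4996 | 0.0007 | 0.000004 |
| 0.8 | 10 | 0.7984 | 0.0019 | 0.0004 |
| 0.8 | 20 | 0.7992 | 0.0009 | 0.0002 |
| 0.8 | 50 | 0.7995 | 0.0006 | 0.00007 |
| 0.8 | 100 | 0.8002 | 0.0003 | 0.000004 |
| 0.8 | 500 | 0.8000 | 0.00004 | 0.000002 |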
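Table 5. The MM and ML results for the CNBLD for different values of $\theta$ and $r$.
| $n$ | Method | $\theta$ | $\hat{\theta}$ | $\lvert\mathrm{SBias}\rvert$ of $\hat{\theta}$ | MSE of $\hat{\theta}$ | $r$ | $\hat{r}$ | $\lvert\mathrm{SBias}\rvert$ of $\hat{r}$ | MSE of $\hat{r}$ |
|---|---|---|---|---|---|---|---|---|---|
| 10 | MM | 0.2 | 0.3651 | 0.8258 | 0.0705 | 1 | 1.4973 | 0.4973 | 0.9397 |
| 10 | ML | 0.2 | 0.1220 | 0.3895 | 0.0122 | 1 | 1.4926 | 0.4926 | 0.8077 |
| 20 | MM | 0.2 | 0.2217 | 0.1088 | 0.0090 | 1 | 1.2123 | 0.2123 | 0.7119 |
| 20 | ML | 0.2 | 0.2390 | 0.1951 | 0.0086 | 1 | 1.1533 | 0.1533 | 0.6424 |
| 50 | MM | 0.2 | 0.1869 | 0.0655 | 0.0071 | 1 | 1.1565 | 0.1565 | 0.5158 |
| 50 | ML | 0.2 | 0.2038 | 0.0192 | 0.0038 | 1 | 1.1286 | 0.1286 | 0.4967 |
| 100 | MM | 0.2 | 0.2095 | 0.0479 | 0.0043 | 1 | 0.9843 | 0.0156 | 0.2600 |
| 100 | ML | 0.2 | 0.1969 | 0.0175 | 0.0022 | 1 | 1.0454 | 0.0454 | 0.2554 |
| 500 | MM | 0.2 | 0.2047 | 0.0239 | 0.0039 | 1 | 0.9971 | 0.0028 | 0.0038 |
| 500 | ML | 0.2 | 0.1967 | 0.0161 | 0.0019 | 1 | 0.9935 | 0.0064 | 0.0026 |
| 10 | MM | 0.5 | 0.4710 | 0.0579 | 0.0022 | 4 | 4.3024 | 0.0756 | 0.7953 |
| 10 | ML | 0.5 | 0.5091 | 0.0182 | 0.0015 | 4 | 4.1335 | 0.0333 | 0.1841 |
| 20 | MM | 0.5 | 0.5041 | 0.0083 | 0.0019 | 4 | 4.2085 | 0.0521 | 0.2568 |
| 20 | ML | 0.5 | 0.4932 | 0.0134 | 0.0005 | 4 | 4.0683 | 0.0171 | 0.0733 |
| 50 | MM | 0.5 | 0.4980 | 0.0039 | 0.0003 | 4 | 4.1167 | 0.0292 | 0.2368 |
| 50 | ML | 0.5 | 0.5046 | 0.0093 | 0.0001 | 4 | 3.9921 | 0.0019 | 0.0584 |
| 100 | MM | 0.5 | 0.5009 | 0.0019 | 0.0002 | 4 | 3.9884 | 0.0028 | 0.1885 |
| 100 | ML | 0.5 | 0.4960 | 0.0078 | 0.00009 | 4 | 3.9982 | 0.0004 | 0.0049 |
| 500 | MM | 0.5 | 0.4975 | 0.0048 | 0.00005 | 4 | 3.9954 | 0.0011 | 0.0041 |
| 500 | ML | 0.5 | 0.4994 | 0.0011 | 0.00002 | 4 | 3.9993 | 0.0002 | 0.0007 |
| 10 | MM | 0.8 | 0.8015 | 0.0019 | 0.0001 | 8 | 7.9200 | 0.0799 | 0.2022 |
| 10 | ML | 0.8 | 0.7969 | 0.0038 | 0.00008 | 8 | 8.1446 | 0.0181 | 0.1093 |
| 20 | MM | 0.8 | 0.7992 | 0.0009 | 0.00002 | 8 | 8.0291 | 0.0036 | 0.0668 |
| 20 | ML | 0.8 | 0.7994 | 0.0008 | 0.00001 | 8 | 8.0045 | 0.0006 | 0.0569 |
| 50 | MM | 0.8 | 0.7994 | 0.0007 | 0.00001 | 8 | 8.0232 | 0.0029 | 0.0211 |
| 50 | ML | 0.8 | 0.7999 | 0.0001 | 0.000009 | 8 | 8.0031 | 0.0004 | 0.0199 |
| 100 | MM | 0.8 | 0.7997 | 0.0004 | 0.000001 | 8 | 8.0196 | 0.0024 | 0.0107 |
| 100 | ML | 0.8 | 0.7998 | 0.0003 | 0.0000005 | 8 | 8.0029 | 0.0003 | 0.0014 |
| 500 | MM | 0.8 | 0.7999 | 0.0001 | 0.0000008 | 8 | 8.0122 | 0.0015 | 0.0010 |
| 500 | ML | 0.8 | 0.7998 | 0.00003 | 0.0000002 | 8 | 7.9999 | 0.000009 | 0.0003 |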
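Table 6. The ML estimates of the CNBSLD for different values of $\theta$ when $r = 0.9$.
| $\theta$ | $n$ | $\hat{\theta}$ | $\lvert\mathrm{SBias}\rvert$ | MSE |
|---|---|---|---|---|
| 0.2 | 10 | 0.1805 | 0.0971 | 0.0258 |
| 0.2 | 20 | 0.1859 | 0.0700 | 0.0139 |
| 0.2 | 50 | 0.1951 | 0.0244 | 0.0059 |
| 0.2 | 100 | 0.1966 | 0.0169 | 0.0029 |
| 0.2 | 500 | 0.1997 | 0.0012 | 0.0006 |
| 0.5 | 10 | 0.4696 | 0.0606 | 0.0201 |
| 0.5 | 20 | 0.4769 | 0.0461 | 0.0129 |
| 0.5 | 50 | 0.4895 | 0.0209 | 0.0076 |
| 0.5 | 100 | 0.4964 | 0.0072 | 0.0034 |
| 0.5 | 500 | 0.4998 | 0.0004 | 0.0005 |
| 0.8 | 10 | 0.7821 | 0.0222 | 0.0049 |
| 0.8 | 20 | 0.7919 | 0.0101 | 0.0027 |
| 0.8 | 50 | 0.7922 | 0.0097 | 0.0017 |
| 0.8 | 100 | 0.7964 | 0.0044 | 0.0006 |
| 0.8 | 500 | 0.7998 | 0.00013 | 0.00012 |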
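Table 7. The ML estimates of the CNBSLD for different values of $\theta$ when $r = 5$.
| $\theta$ | $n$ | $\hat{\theta}$ | $\lvert\mathrm{SBias}\rvert$ | MSE |
|---|---|---|---|---|
| 0.2 | 10 | 0.1937 | 0.0314 | 0.0043 |
| 0.2 | 20 | 0.1971 | 0.0145 | 0.0013 |
| 0.2 | 50 | 0.1981 | 0.0095 | 0.0007 |
| 0.2 | 100 | 0.1985 | 0.0071 | 0.0004 |
| 0.2 | 500 | 0.1995 | 0.0025 | 0.00008 |
| 0.5 | 10 | 0.4914 | 0.0173 | 0.0036 |
| 0.5 | 20 | 0.4945 | 0.0109 | 0.0019 |
| 0.5 | 50 | 0.4989 | 0.0022 | 0.0006 |
| 0.5 | 100 | 0.5003 | 0.0005 | 0.0003 |
| 0.5 | 500 | 0.50004 | 0.00007 | 0.00006 |
| 0.8 | 10 | 0.7970 | 0.0037 | 0.0025 |
| 0.8 | 20 | 0.7980 | 0.0024 | 0.0004 |
| 0.8 | 50 | 0.7991 | 0.0012 | 0.0002 |
| 0.8 | 100 | 0.8001 | 0.0002 | 0.00008 |
| 0.8 | 500 | 0.80002 | 0.00003 | 0.00001 |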
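Table 8. The ML estimates of the CNBSLD for different values of $\theta$ when $r = 10$.
| $\theta$ | $n$ | $\hat{\theta}$ | $\lvert\mathrm{SBias}\rvert$ | MSE |
|---|---|---|---|---|
| 0.2 | 10 | 0.1962 | 0.0191 | 0.0019 |
| 0.2 | 20 | 0.1972 | 0.0139 | 0.0009 |
| 0.2 | 50 | 0.1984 | 0.0076 | 0.0004 |
| 0.2 | 100 | 0.1995 | 0.0025 | 0.0002 |
| 0.2 | 500 | 0.2004 | 0.0019 | 0.00003 |
| 0.5 | 10 | 0.4978 | 0.0044 | 0.0014 |
| 0.5 | 20 | 0.4992 | 0.0016 | 0.0007 |
| 0.5 | 50 | 0.4998 | 0.0002 | 0.0003 |
| 0.5 | 100 | 0.4999 | 0.0001 | 0.000006 |
| 0.5 | 500 | 0.49996 | 0.00006 | 0.000002 |
| 0.8 | 10 | 0.7979 | 0.0026 | 0.0004 |
| 0.8 | 20 | 0.7992 | 0.0009 | 0.0002 |
| 0.8 | 50 | 0.7999 | 0.0001 | 0.00006 |
| 0.8 | 100 | 0.7999 | 0.00006 | 0.000007 |
| 0.8 | 500 | 0.800008 | 0.00001 | 0.000003 |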
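Table 9. The MM and ML results for the CNBSLD for different values of $\theta$ and $r$.
| $n$ | Method | $\theta$ | $\hat{\theta}$ | $\lvert\mathrm{SBias}\rvert$ of $\hat{\theta}$ | MSE of $\hat{\theta}$ | $r$ | $\hat{r}$ | $\lvert\mathrm{SBias}\rvert$ of $\hat{r}$ | MSE of $\hat{r}$ |
|---|---|---|---|---|---|---|---|---|---|
| 10 | MM | 0.2 | 0.1804 | 0.0979 | 0.0155 | 0.9 | 1.3265 | 0.4739 | 0.5303 |
| 10 | ML | 0.2 | 0.1881 | 0.0591 | 0.0131 | 0.9 | 1.2100 | 0.3444 | 0.2422 |
| 20 | MM | 0.2 | 0.1936 | 0.0315 | 0.0102 | 0.9 | 1.1537 | 0.2819 | 0.1378 |
| 20 | ML | 0.2 | 0.1912 | 0.0439 | 0.0094 | 0.9 | 1.1431 | 0.2701 | 0.1245 |
| 50 | MM | 0.2 | 0.1967 | 0.0160 | 0.0071 | 0.9 | 1.1131 | 0.2368 | 0.0901 |
| 50 | ML | 0.2 | 0.1922 | 0.0392 | 0.0062 | 0.9 | 1.1037 | 0.2263 | 0.0789 |
| 100 | MM | 0.2 | 0.1984 | 0.0077 | 0.0027 | 0.9 | 1.0856 | 0.2062 | 0.0600 |
| 100 | ML | 0.2 | 0.1980 | 0.0096 | 0.0018 | 0.9 | 1.0726 | 0.1918 | 0.0509 |
| 500 | MM | 0.2 | 0.2004 | 0.0021 | 0.0012 | 0.9 | 0.8898 | 0.0112 | 0.0439 |
| 500 | ML | 0.2 | 0.1995 | 0.0023 | 0.0009 | 0.9 | 0.8989 | 0.0011 | 0.0261 |
| 10 | MM | 0.5 | 0.4532 | 0.0935 | 0.0126 | 4 | 3.7684 | 0.0579 | 0.6007 |
| 10 | ML | 0.5 | 0.5192 | 0.0384 | 0.0031 | 4 | 3.8734 | 0.0316 | 0.5942 |
| 20 | MM | 0.5 | 0.4906 | 0.0187 | 0.0028 | 4 | 4.1546 | 0.0387 | 0.3911 |
| 20 | ML | 0.5 | 0.4933 | 0.0133 | 0.0015 | 4 | 4.0958 | 0.0239 | 0.3027 |
| 50 | MM | 0.5 | 0.4977 | 0.0045 | 0.0019 | 4 | 4.0633 | 0.0158 | 0.2131 |
| 50 | ML | 0.5 | 0.4982 | 0.0034 | 0.0013 | 4 | 4.0153 | 0.0038 | 0.1142 |
| 100 | MM | 0.5 | 0.5011 | 0.0022 | 0.0007 | 4 | 4.0362 | 0.0091 | 0.0179 |
| 100 | ML | 0.5 | 0.5014 | 0.0029 | 0.0002 | 4 | 4.0041 | 0.0010 | 0.0155 |
| 500 | MM | 0.5 | 0.5001 | 0.0003 | 0.00009 | 4 | 4.0212 | 0.0053 | 0.0099 |
| 500 | ML | 0.5 | 0.5009 | 0.0018 | 0.00003 | 4 | 4.0009 | 0.0002 | 0.0033 |
| 10 | MM | 0.8 | 0.7970 | 0.0036 | 0.0001 | 8 | 8.1414 | 0.0177 | 0.2299 |
| 10 | ML | 0.8 | 0.7972 | 0.0034 | 0.00009 | 8 | 8.1306 | 0.0163 | 0.1737 |
| 20 | MM | 0.8 | 0.7989 | 0.0013 | 0.00003 | 8 | 7.9725 | 0.0034 | 0.0471 |
| 20 | ML | 0.8 | 0.7999 | 0.0002 | 0.00002 | 8 | 8.1433 | 0.0179 | 0.0337 |
| 50 | MM | 0.8 | 0.7992 | 0.0009 | 0.00001 | 8 | 8.0094 | 0.0012 | 0.0175 |
| 50 | ML | 0.8 | 0.80009 | 0.0001 | 0.000002 | 8 | 7.9884 | 0.0014 | 0.0033 |
| 100 | MM | 0.8 | 0.7995 | 0.0006 | 0.000007 | 8 | 7.9958 | 0.0005 | 0.0034 |
| 100 | ML | 0.8 | 0.80001 | 0.00002 | 0.000001 | 8 | 7.9992 | 0.00009 | 0.0025 |
| 500 | MM | 0.8 | 0.8002 | 0.0003 | 0.000002 | 8 | 8.0039 | 0.0004 | 0.0029 |
| 500 | ML | 0.8 | 0.800001 | 0.000002 | 0.0000008 | 8 | 8.00001 | 0.000001 | 0.0016 |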
Table 10. Number of counts of flower heads as per the number of fly eggs.
| Y | O.F ¹ | ZTPD | ZTPLD | CNBLD |
|---|---|---|---|---|
| 1 | 22 | 15.28 | 26.78 | 21.42 |
| 2 | 18 | 21.86 | 19.77 | 19.79 |
| 3 | 18 | 20.84 | 13.94 | 16.81 |
| 4 | 11 | 14.90 | 9.53 | 12.45 |
| 5 | 9 | 8.53 | 6.37 | 8.12 |
| 6 | 6 | 4.06 | 4.19 | 4.73 |
| 7 | 3 | 1.66 | 2.72 | 2.51 |
| 8 | 0 | 0.59 | 1.74 | 1.22 |
| 9 | 1 | 0.19 | 1.11 | 0.55 |
| Total | 88 | 88 | 88 | 88 |
| ML estimates | | $\hat{\theta} = 2.8604$ | $\hat{\theta} = 0.7186$ | $\hat{\theta} = 0.1267$, $\hat{r} = 28.1719$ |
| AIC | | 335.09 | 336.76 | 331.93 |
| BIC | | 337.57 | 339.24 | 336.89 |
¹ O.F = observed frequency.
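Table 11. Number of hospital stays by United States residents aged 66 and over.
| Y | O.F ¹ | PD | ZIPD | NBD | ZINBD | ZINB-GE | SCNBLD | CNBSLD |
|---|---|---|---|---|---|---|---|---|
| 0 | 3541 | 3277.3 | 1816.5 | 3544.4 | 3541.1 | 3541.1 | 3543.1 | 3543.7 |
| 1 | 599 | 969.9 | 1609.5 | 583.5 | 533.4 | 601.0 | 595.8 | 591.3 |
| 2 | 176 | 143.5 | 712.9 | 177.4 | 217.7 | 168.7 | 167.7 | 170.9 |
| 3 | 48 | 14.1 | 210.6 | 62.3 | 68.7 | 56.4 | 58.5 | 59.9 |
| 4 | 20 | 1.05 | 46.7 | 23.3 | 18.9 | 21.6 | 22.9 | 23.2 |
| 5 | 12 | 0.06 | 8.4 | 9 | 4.4 | 9.7 | 9.7 | 9.5 |
| 6 | 5 | 0 | 1.3 | 3.6 | 1.3 | 4.4 | 4.3 | 4.1 |
| 7 | 1 | 0 | 0.1 | 1.4 | 0.5 | 1.8 | 2 | 1.8 |
| 8 | 4 | 0 | 0 | 0.6 | 0 | 1.3 | 0.9 | 0.8 |
| Total | 4406 | 4406 | 4406 | 4406 | 4406 | 4406 | 4406 | 4406 |
| ML estimates | | $\hat{\theta} = 0.2959$ | $\hat{\phi} = 0.6659$ | $\hat{\theta} = 0.4437$ | $\hat{\phi} = 0.6040$ | $\hat{\phi} = 0.1645$ | $\hat{\theta} = 0.5935$ | $\hat{\theta} = 0.5337$ |
| | | | $\hat{\theta} = 0.8859$ | $\hat{r} = 0.3709$ | $\hat{r} = 3.9683$ | $\hat{r} = 2.2040$ | $\hat{r} = 0.1333$ | $\hat{r} = 0.6252$ |
| | | | | | $\hat{\theta} = 0.8415$ | $\hat{\alpha} = 1.0331$ | | |
| | | | | | | $\hat{\alpha} = 7.3573$ | | |
| AIC | | 6611.01 | 6122.84 | 6023.24 | 6078 | 6022 | 6019.23 | 6020.16 |
| BIC | | 6617.40 | 6135.62 | 6036 | 6079 | 6048 | 6032.02 | 6032.94 |
¹ O.F = observed frequency.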
Table 12. Accident frequency among 414 machinists: a statistical count over an undefined period.
| Y | O.F ¹ | PD | NBD | SCNBLD | CNBSLD |
|---|---|---|---|---|---|
| 0 | 296 | 255.38 | 296.70 | 296.60 | 296.57 |
| 1 | 74 | 123.37 | 71.01 | 72.42 | 72.18 |
| 2 | 26 | 29.80 | 26.41 | 72.42 | 25.65 |
| 3 | 8 | 4.79 | 10.99 | 10.43 | 10.56 |
| 4 | 4 | 0.58 | 4.81 | 4.66 | 4.69 |
| 5 | 4 | 0.06 | 2.17 | 2.19 | 2.19 |
| 6 | 1 | 0.005 | 1.002 | 1.08 | 1.06 |
| 7 | 0 | 0.0003 | 0.46 | 0.54 | 0.52 |
| 8 | 1 | 0 | 0 | 0.27 | 0.27 |
| Total | 414 | 414 | 414 | 414 | 414 |
| ML estimates | | $\theta = 0.4830$ | $\theta = 0.5046$, $r = 0.4742$ | $\theta = 0.6048$, $r = 0.6148$ | $\theta = 0.5794$, $r = 0.8400$ |
| AIC | | 855.82 | 768.06 | 767.63 | 767.68 |
| BIC | | 859.85 | 776.11 | 775.68 | 775.73 |
¹ O.F = observed frequency.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
