Article

On Stochastic Representations of the Zero–One-Inflated Poisson Lindley Distribution

by Razik Ridzuan Mohd Tajuddin * and Noriszura Ismail
Department of Mathematical Sciences, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(5), 778; https://doi.org/10.3390/math12050778
Submission received: 8 June 2023 / Revised: 4 July 2023 / Accepted: 11 July 2023 / Published: 6 March 2024
(This article belongs to the Special Issue New Advances in Distribution Theory and Its Applications)

Abstract

A zero–one-inflated Poisson Lindley distribution has been introduced recently as an alternative to the zero–one-inflated Poisson distribution for describing count data with a substantial number of zeros and ones. Several stochastic representations of the zero–one-inflated Poisson Lindley distribution, and their equivalence to some well-known distributions under certain conditions, are presented. Using these stochastic representations, distributional properties such as the $n$th moments and the conditional distributions are discussed. These stochastic representations can be used to explain the relationship between two or more distributions. Several likelihood ratio tests are developed and examined for the presence of one-inflation and for fixed rate parameters. The likelihood ratio tests are found to be powerful and able to control the error rates as the sample size increases. A sample size of 1000 is sufficient for the likelihood ratio tests to be useful.

1. Introduction

One common phenomenon in statistics is the presence of excess zeroes only. This phenomenon occurs when there are more zero-valued observations than can be explained by the Poisson distribution. Numerous studies have analysed count data with zero-inflation using zero-inflated models [1,2,3,4], hurdle models [4], zero-altered models [3] and others. Young et al. [5] have provided a comprehensive review of zero-inflated models and their associated regression models. Zero-inflated models explain the excess zeroes by introducing an inflation parameter known as the zero-inflation parameter.
Recently, the presence of excess zeros and ones in count data has been gaining attention from researchers, as it is also common in statistics. This phenomenon occurs when there is an abundance of events that either do not happen or happen only once. It arises quite naturally depending on the questions being asked. Lin and Tsai [6] have provided a list of questions that yield observations inflated at zero and at a non-zero value. For inflation at zero and one, asking about a memorable life event such as the number of marriages [6] will yield results with a large spike at zero and one, because it is common for people either to stay single or to marry only once. The phenomenon of excess zeros and ones can also be found in various fields such as medicine [6] and quantitative criminology [7,8].
Introducing two inflation parameters into an existing distribution to describe excess zeros and ones, respectively, is common and has been extensively researched [6,7,8,9,10]. Although the zero–one-inflated Poisson (ZOIP) distribution was introduced in the late 20th century by Melkersson and Olsson [9], its stochastic representations were not explored until 17 years later by Zhang et al. [10]. The study by Zhang et al. [10] relates the ZOIP distribution to other known Poisson-type distributions such as the zero-inflated Poisson, the zero-truncated Poisson and the one-truncated Poisson distributions. Following the idea of Zhang et al. [10], this paper examines the stochastic representations of the zero–one-inflated Poisson Lindley (ZOIPL) distribution developed by Tajuddin et al. [8]. Likelihood ratio tests are also developed to investigate whether the presence of one-inflation and of fixed parameters is significant.
The probability mass function (pmf) for a random variable Y following the ZOIPL distribution [8] is given as:
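$$
\Pr(Y=y)=\begin{cases}
\omega_0+(1-\omega_0-\omega_1)\dfrac{\theta^2(\theta+2)}{(\theta+1)^3}, & y=0,\\[6pt]
\omega_1+(1-\omega_0-\omega_1)\dfrac{\theta^2(\theta+3)}{(\theta+1)^4}, & y=1,\\[6pt]
(1-\omega_0-\omega_1)\dfrac{\theta^2(\theta+y+2)}{(\theta+1)^{y+3}}, & y\ge 2,
\end{cases}
$$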
where $\omega_0$ and $\omega_1$ explain the excess zeroes and ones, respectively, and $\theta$ is the parameter of the Poisson Lindley (PL) distribution [11]. The PL distribution has been shown to provide a better fit than the Poisson distribution due to its ability to handle overdispersion in the data [11,12]. The parameter $\theta$ plays a crucial role in determining the variation in the PL distribution. As $\theta$ increases, the variance and the mean of the PL distribution approach an identical value, a phenomenon known as equidispersion (see [12] for further explanation). Similarly, the ZOIPL distribution has been shown to fit better than the ZOIP distribution due to its ability to handle extra dispersion, which cannot be described by the inflation parameters of the ZOIP distribution alone [8].
Note that, if $\omega_0=0$, the ZOIPL distribution reduces to a one-inflated Poisson Lindley (OIPL) distribution with parameters $\omega_1$ and $\theta$, which has not been studied yet. This should not be confused with the one-inflated positive Poisson Lindley distribution, which was developed to cater for inflation at one in positive count data [13]. If $\omega_1=0$, the ZOIPL distribution reduces to the zero-inflated Poisson Lindley (ZIPL) distribution with parameters $\omega_0$ and $\theta$ [14]. If $\omega_0=\omega_1=0$, the ZOIPL distribution reduces to the standard PL distribution with parameter $\theta$. These special cases already reveal the relationship between the distributions. Based on this idea, the stochastic representations of the ZOIPL distribution can be studied.
Before proceeding with the stochastic representations, we first adopt the definition of a degenerate distribution from Zhang et al. [10] to obtain an identical but compact representation for the pmf of the ZOIPL distribution. Let $\xi_c\sim\mathrm{Degen}(c)$ be a random variable which follows a degenerate distribution at a single constant point $c$, with $\Pr(\xi_c=c)=1$. Let $\xi_0\sim\mathrm{Degen}(0)$, $\xi_1\sim\mathrm{Degen}(1)$ and $X\sim\mathrm{PL}(\theta)$ be mutually independent. Then the pmf of the ZOIPL distribution can be written as
$$
\begin{aligned}
\Pr(Y=y) &= \omega_0\Pr(\xi_0=y)+\omega_1\Pr(\xi_1=y)+\omega_2\Pr(X=y)\\
&= \left[\omega_0+\omega_2\frac{\theta^2(\theta+2)}{(\theta+1)^3}\right]I(y=0)+\left[\omega_1+\omega_2\frac{\theta^2(\theta+3)}{(\theta+1)^4}\right]I(y=1)+\left[\omega_2\frac{\theta^2(\theta+y+2)}{(\theta+1)^{y+3}}\right]I(y\ge 2),
\end{aligned}
$$
where $I(\cdot)$ refers to the indicator function, $0\le\omega_0,\omega_1,\omega_2<1$, $\omega_2=1-\omega_0-\omega_1$, and $\omega_0$ and $\omega_1$ refer to the inflation parameters for excess zeroes and ones, respectively.
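As a quick numerical check of the pmf above, the following minimal Python sketch evaluates it and confirms that the probabilities sum to one. The function names `pl_pmf` and `zoipl_pmf`, the parameter values, and the truncation of the infinite sum at 200 terms are illustrative assumptions, not part of the original paper.

```python
def pl_pmf(y, theta):
    """Poisson Lindley pmf: theta^2 (theta + y + 2) / (theta + 1)^(y + 3)."""
    return theta**2 * (theta + y + 2) / (theta + 1) ** (y + 3)


def zoipl_pmf(y, w0, w1, theta):
    """ZOIPL pmf written as a mixture of Degen(0), Degen(1) and PL(theta)."""
    w2 = 1.0 - w0 - w1
    return w0 * (y == 0) + w1 * (y == 1) + w2 * pl_pmf(y, theta)


# Illustrative parameter values (assumed, not taken from the paper).
w0, w1, theta = 0.3, 0.2, 1.0

# The tail beyond y = 200 is negligible for theta = 1, so this sum is ~1.
total = sum(zoipl_pmf(y, w0, w1, theta) for y in range(200))
print(zoipl_pmf(0, w0, w1, theta), zoipl_pmf(1, w0, w1, theta), total)
```

Setting `w0 = 0` or `w1 = 0` in `zoipl_pmf` gives the OIPL and ZIPL special cases described earlier.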
The paper is organized as follows: Section 2 describes various stochastic representations of the ZOIPL distribution. Section 3 derives the $n$th moments based on the different stochastic representations. Section 4 derives the conditional distributions for selected stochastic representations. Section 5 presents several likelihood ratio tests to assess the presence of inflation parameters as well as a fixed $\theta$. Section 6 examines the performance of the likelihood ratio tests through a simulation study. Section 7 concludes the study.

2. Stochastic Representation (SR)

Several stochastic representations are discussed to highlight the relationship between the ZOIPL distribution and the zero-inflated Poisson Lindley (ZIPL), the zero-truncated Poisson Lindley (ZTPL) and the Poisson Lindley (PL) distributions. Table 1 provides the probability mass functions for these three distributions.
Before the stochastic representations of the ZOIPL distribution are discussed, we adapt some notation from Zhang et al. [10] and present it in Table 2 to facilitate the understanding of the stochastic representations.

2.1. First Stochastic Representation (SR1)

Let $Z=(Z_0,Z_1,Z_2)^{T}\sim\mathrm{Multinomial}(1;\omega_0,\omega_1,\omega_2)$ and $X\sim\mathrm{PL}(\theta)$, such that $Z\perp X$. The first SR for the random variable $Y\sim\mathrm{ZOIPL}(\omega_0,\omega_1,\theta)$ is given as $Y\overset{d}{=}Z_0\cdot 0+Z_1\cdot 1+Z_2X=Z_1+Z_2X$, or equivalently,
$$
Y=\begin{cases}
0, & \text{with probability } \omega_0,\\
1, & \text{with probability } \omega_1,\\
X, & \text{with probability } \omega_2.
\end{cases}
$$
Since $Z_0+Z_1+Z_2=1$ with $\Pr(Z_i=1)=\omega_i$ for $i=0,1,2$, the pmf of $Y$ can be written as:
$$
\begin{aligned}
\Pr(Y=0) &= \Pr(Z_0=1)+\Pr(Z_2=1,X=0)=\omega_0+\omega_2\frac{\theta^2(\theta+2)}{(\theta+1)^3},\\
\Pr(Y=1) &= \Pr(Z_1=1)+\Pr(Z_2=1,X=1)=\omega_1+\omega_2\frac{\theta^2(\theta+3)}{(\theta+1)^4},\\
\Pr(Y=y) &= \Pr(Z_2=1,X=y)=\omega_2\frac{\theta^2(\theta+y+2)}{(\theta+1)^{y+3}},\quad y\ge 2.
\end{aligned}
$$
The pmf obtained from the first SR is identical to the pmf of the ZOIPL distribution. Therefore, the random variable $Y\sim\mathrm{ZOIPL}(\omega_0,\omega_1,\theta)$ can be viewed as a mixture of the $\xi_0\sim\mathrm{Degen}(0)$, $\xi_1\sim\mathrm{Degen}(1)$ and $X\sim\mathrm{PL}(\theta)$ distributions.

2.2. Second Stochastic Representation (SR2)

Let $Z\sim\mathrm{Bernoulli}(1-w)$, $H\sim\mathrm{Bernoulli}(p)$ and $X\sim\mathrm{PL}(\theta)$, such that $Z$, $H$ and $X$ are mutually independent. The second SR for the random variable $Y\sim\mathrm{ZOIPL}(\omega_0,\omega_1,\theta)$ is given as $Y\overset{d}{=}(1-Z)H+ZX$, or equivalently,
$$
Y=\begin{cases}
H, & \text{with probability } w,\\
X, & \text{with probability } 1-w.
\end{cases}
$$
Thus, the pmf of $Y$ is given as
$$
\begin{aligned}
\Pr(Y=0) &= \Pr(Z=0,H=0)+\Pr(Z=1,X=0)=w(1-p)+(1-w)\frac{\theta^2(\theta+2)}{(\theta+1)^3},\\
\Pr(Y=1) &= \Pr(Z=0,H=1)+\Pr(Z=1,X=1)=wp+(1-w)\frac{\theta^2(\theta+3)}{(\theta+1)^4},\\
\Pr(Y=y) &= \Pr(Z=1,X=y)=(1-w)\frac{\theta^2(\theta+y+2)}{(\theta+1)^{y+3}},\quad y\ge 2.
\end{aligned}
$$
Using the reparameterizations $\omega_0=w(1-p)$ and $\omega_1=wp$, it follows that $w=\omega_0+\omega_1$ and $p=\omega_1/(\omega_0+\omega_1)$. In other words, the random variable $Y\sim\mathrm{ZOIPL}(\omega_0,\omega_1,\theta)$ can be viewed as a mixture of $\mathrm{Bernoulli}(\omega_1/(\omega_0+\omega_1))$ and $\mathrm{PL}(\theta)$.
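The reparameterization in SR2 can be checked numerically. The sketch below, with hypothetical helper names and assumed parameter values, builds the pmf from the Bernoulli/PL mixture and compares it with the ZOIPL pmf of Section 1.

```python
def pl_pmf(y, theta):
    """Poisson Lindley pmf."""
    return theta**2 * (theta + y + 2) / (theta + 1) ** (y + 3)


def zoipl_pmf(y, w0, w1, theta):
    """ZOIPL pmf in its mixture form."""
    return w0 * (y == 0) + w1 * (y == 1) + (1 - w0 - w1) * pl_pmf(y, theta)


def sr2_pmf(y, w, p, theta):
    """pmf of Y = (1 - Z) H + Z X with Z ~ Bernoulli(1 - w), H ~ Bernoulli(p)."""
    h_prob = (1 - p) if y == 0 else (p if y == 1 else 0.0)
    return w * h_prob + (1 - w) * pl_pmf(y, theta)


# Assumed illustrative values.
w0, w1, theta = 0.3, 0.2, 1.5
w = w0 + w1            # probability of taking the Bernoulli component
p = w1 / (w0 + w1)     # success probability of H

max_gap = max(abs(sr2_pmf(y, w, p, theta) - zoipl_pmf(y, w0, w1, theta))
              for y in range(50))
print(max_gap)
```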

2.3. Third Stochastic Representation (SR3)

Let $Z\sim\mathrm{Bernoulli}(1-w)$, $\xi_1\sim\mathrm{Degen}(1)$ and $Y^{*}\sim\mathrm{ZIPL}(w^{*},\theta)$, such that $Z$, $\xi_1$ and $Y^{*}$ are mutually independent. The third SR for the random variable $Y\sim\mathrm{ZOIPL}(\omega_0,\omega_1,\theta)$ is given as $Y\overset{d}{=}(1-Z)\xi_1+ZY^{*}$, or equivalently,
$$
Y=\begin{cases}
1, & \text{with probability } w,\\
Y^{*}, & \text{with probability } 1-w.
\end{cases}
$$
Thus, the pmf of $Y$ is given as
$$
\begin{aligned}
\Pr(Y=0) &= \Pr(Z=1,Y^{*}=0)=(1-w)\left[w^{*}+(1-w^{*})\frac{\theta^2(\theta+2)}{(\theta+1)^3}\right]=w^{*}(1-w)+(1-w)(1-w^{*})\frac{\theta^2(\theta+2)}{(\theta+1)^3},\\
\Pr(Y=1) &= \Pr(Z=0)+\Pr(Z=1,Y^{*}=1)=w+(1-w)(1-w^{*})\frac{\theta^2(\theta+3)}{(\theta+1)^4},\\
\Pr(Y=y) &= \Pr(Z=1,Y^{*}=y)=(1-w)(1-w^{*})\frac{\theta^2(\theta+y+2)}{(\theta+1)^{y+3}},\quad y\ge 2.
\end{aligned}
$$
Using the reparameterizations $\omega_0=w^{*}(1-w)$, $\omega_1=w$ and $\omega_2=(1-w^{*})(1-w)$, one obtains $w=\omega_1$ and $w^{*}=\omega_0/(1-\omega_1)$. In other words, the random variable $Y\sim\mathrm{ZOIPL}(\omega_0,\omega_1,\theta)$ can be viewed as a mixture of $\mathrm{Degen}(1)$ and $\mathrm{ZIPL}(\omega_0/(1-\omega_1),\theta)$.
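A similar check applies to SR3. The following sketch, whose function names and parameter values are assumptions made for illustration, mixes a point mass at one with a ZIPL pmf and compares the result with the ZOIPL pmf.

```python
def pl_pmf(y, theta):
    """Poisson Lindley pmf."""
    return theta**2 * (theta + y + 2) / (theta + 1) ** (y + 3)


def zipl_pmf(y, w_star, theta):
    """Zero-inflated PL pmf with zero-inflation weight w_star."""
    return w_star * (y == 0) + (1 - w_star) * pl_pmf(y, theta)


def sr3_pmf(y, w, w_star, theta):
    """pmf of Y = (1 - Z) * 1 + Z * Y_star with Z ~ Bernoulli(1 - w)."""
    return w * (y == 1) + (1 - w) * zipl_pmf(y, w_star, theta)


def zoipl_pmf(y, w0, w1, theta):
    """ZOIPL pmf in its mixture form."""
    return w0 * (y == 0) + w1 * (y == 1) + (1 - w0 - w1) * pl_pmf(y, theta)


# Assumed illustrative values.
w0, w1, theta = 0.3, 0.2, 1.5
w = w1                   # weight of the point mass at one
w_star = w0 / (1 - w1)   # zero-inflation weight inside the ZIPL component

max_gap = max(abs(sr3_pmf(y, w, w_star, theta) - zoipl_pmf(y, w0, w1, theta))
              for y in range(50))
print(max_gap)
```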

2.4. Fourth Stochastic Representation (SR4)

Let $V\sim\mathrm{ZTPL}(\theta)$ and $Z^{*}=(Z_0^{*},Z_1^{*},Z_2^{*})^{T}\sim\mathrm{Multinomial}(1;\omega_0^{*},\omega_1^{*},\omega_2^{*})$, such that $V\perp Z^{*}$. The fourth SR for the random variable $Y\sim\mathrm{ZOIPL}(\omega_0,\omega_1,\theta)$ is given as $Y\overset{d}{=}Z_0^{*}\cdot 0+Z_1^{*}\cdot 1+Z_2^{*}V=Z_1^{*}+Z_2^{*}V$, or equivalently,
$$
Y=\begin{cases}
0, & \text{with probability } \omega_0^{*},\\
1, & \text{with probability } \omega_1^{*},\\
V, & \text{with probability } \omega_2^{*}.
\end{cases}
$$
The ZTPL pmf is the PL pmf divided by $1-\Pr(X=0)=(\theta^2+3\theta+1)/(\theta+1)^3$. Thus, the pmf of $Y$ is given as
$$
\begin{aligned}
\Pr(Y=0) &= \Pr(Z_0^{*}=1)=\omega_0^{*},\\
\Pr(Y=1) &= \Pr(Z_1^{*}=1)+\Pr(Z_2^{*}=1,V=1)=\omega_1^{*}+\omega_2^{*}\frac{\theta^2(\theta+3)}{(\theta^2+3\theta+1)(\theta+1)},\\
\Pr(Y=y) &= \Pr(Z_2^{*}=1,V=y)=\omega_2^{*}\frac{\theta^2(\theta+y+2)}{(\theta^2+3\theta+1)(\theta+1)^{y}},\quad y\ge 2.
\end{aligned}
$$
Using the reparameterizations $\omega_0^{*}=\omega_0+\omega_2\theta^2(\theta+2)/(\theta+1)^3$, $\omega_1^{*}=\omega_1$ and $\omega_2^{*}=\omega_2(\theta^2+3\theta+1)/(\theta+1)^3$, one obtains $\omega_1=\omega_1^{*}$, $\omega_2=\omega_2^{*}(\theta+1)^3/(\theta^2+3\theta+1)$ and $\omega_0=\omega_0^{*}-\omega_2^{*}\theta^2(\theta+2)/(\theta^2+3\theta+1)$. Therefore, $Y\sim\mathrm{ZOIPL}(\omega_0,\omega_1,\theta)$ can be viewed as a mixture of $\mathrm{Degen}(0)$, $\mathrm{Degen}(1)$ and $\mathrm{ZTPL}(\theta)$.
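The fourth representation can be sanity-checked in the same way. The sketch below rests on one stated assumption: the ZTPL distribution is taken to be the PL distribution conditioned on a positive count, so the weight moved to the zero component is $\omega_2\Pr(X=0)$ and the ZTPL weight is $\omega_2\{1-\Pr(X=0)\}$. All function names and parameter values are hypothetical.

```python
def pl_pmf(y, theta):
    """Poisson Lindley pmf."""
    return theta**2 * (theta + y + 2) / (theta + 1) ** (y + 3)


def ztpl_pmf(y, theta):
    """PL pmf conditioned on y >= 1 (assumed definition of ZTPL)."""
    if y < 1:
        return 0.0
    return pl_pmf(y, theta) / (1.0 - pl_pmf(0, theta))


def sr4_weights(w0, w1, theta):
    """Weights of Degen(0), Degen(1) and ZTPL(theta) implied by (w0, w1, theta)."""
    w2 = 1.0 - w0 - w1
    p0 = pl_pmf(0, theta)
    return w0 + w2 * p0, w1, w2 * (1.0 - p0)


def sr4_pmf(y, w0, w1, theta):
    s0, s1, s2 = sr4_weights(w0, w1, theta)
    return s0 * (y == 0) + s1 * (y == 1) + s2 * ztpl_pmf(y, theta)


def zoipl_pmf(y, w0, w1, theta):
    """ZOIPL pmf in its mixture form."""
    return w0 * (y == 0) + w1 * (y == 1) + (1 - w0 - w1) * pl_pmf(y, theta)


# Assumed illustrative values.
w0, w1, theta = 0.3, 0.2, 1.0
weights = sr4_weights(w0, w1, theta)
max_gap = max(abs(sr4_pmf(y, w0, w1, theta) - zoipl_pmf(y, w0, w1, theta))
              for y in range(50))
print(weights, sum(weights), max_gap)
```

Under this assumption the three weights always sum to one, which is what a multinomial probability vector requires.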

3. The $n$th Moments

In this section, the $n$th moments of the ZOIPL distribution are derived using the four stochastic representations explained in Section 2. Usually, the $n$th moments of any zero–one-inflated distribution are obtained directly as
$$
E(Y^n)=\sum_{y=0}^{\infty}y^n\Pr(Y=y)=\Pr(Y=1)+\sum_{y=2}^{\infty}y^n\Pr(Y=y).
$$
With the help of the four stochastic representations, new forms of the $n$th moments are developed. The $n$th moments are important for obtaining the mean, variance, skewness and kurtosis of the distribution. Here, we only show the derivation of the $n$th moments using the different stochastic representations.

3.1. First Stochastic Representation

Referring to SR1, $Y\overset{d}{=}Z_1+Z_2X$. Therefore, the $n$th moment of $Y$ is derived as follows:
$$
E(Y^n)=E\left[(Z_1+Z_2X)^n\right]=E\left[\sum_{k=0}^{n}\binom{n}{k}Z_1^{k}Z_2^{n-k}X^{n-k}\right].
$$
Zhang et al. [10] noted that, for $i\neq j$ and $0<k<n$, $Z_i^{k}Z_j^{n-k}\sim\mathrm{Degen}(0)$, because at most one component of $Z$ is non-zero. Furthermore, it is trivial to show that $E(Z_i^{n})=E(Z_i)$. Therefore, the $n$th moment of $Y$ simplifies to
$$
\begin{aligned}
E(Y^n) &= \sum_{k=0}^{n}\binom{n}{k}E\left(Z_1^{k}Z_2^{n-k}\right)E\left(X^{n-k}\right)\\
&= E(Z_2^{n})E(X^{n})+\sum_{k=1}^{n-1}\binom{n}{k}E\left(Z_1^{k}Z_2^{n-k}\right)E\left(X^{n-k}\right)+E(Z_1^{n})\\
&= \omega_1+\omega_2E(X^{n}).
\end{aligned}
$$
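The identity $E(Y^n)=\omega_1+\omega_2E(X^n)$ can be verified numerically by comparing it with the direct sum over the pmf. The following is a minimal sketch; the function names, the parameter values and the truncation point of 400 terms are illustrative assumptions.

```python
def pl_pmf(y, theta):
    """Poisson Lindley pmf."""
    return theta**2 * (theta + y + 2) / (theta + 1) ** (y + 3)


def pl_moment(n, theta, upper=400):
    """Truncated sum approximating E(X^n) for X ~ PL(theta)."""
    return sum(y**n * pl_pmf(y, theta) for y in range(upper))


def zoipl_moment_direct(n, w0, w1, theta, upper=400):
    """E(Y^n) from the definition, summing y^n Pr(Y = y)."""
    w2 = 1.0 - w0 - w1
    return sum(
        y**n * (w0 * (y == 0) + w1 * (y == 1) + w2 * pl_pmf(y, theta))
        for y in range(upper)
    )


def zoipl_moment_sr1(n, w0, w1, theta):
    """E(Y^n) from the first stochastic representation."""
    return w1 + (1.0 - w0 - w1) * pl_moment(n, theta)


# Assumed illustrative values.
w0, w1, theta = 0.3, 0.2, 1.0
for n in range(1, 5):
    print(n, zoipl_moment_direct(n, w0, w1, theta), zoipl_moment_sr1(n, w0, w1, theta))
```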

3.2. Second Stochastic Representation

Referring to SR2, $Y\overset{d}{=}(1-Z)H+ZX$. Therefore, the $n$th moment of $Y$ is derived as follows:
$$
E(Y^n)=E\left\{\left[(1-Z)H+ZX\right]^n\right\}=E\left\{\sum_{k=0}^{n}\binom{n}{k}\left[(1-Z)^{k}Z^{n-k}\right]H^{k}X^{n-k}\right\}.
$$
The $n$th moment of $Y$ simplifies to
$$
\begin{aligned}
E(Y^n) &= \sum_{k=0}^{n}\binom{n}{k}E\left[(1-Z)^{k}Z^{n-k}\right]E(H^{k})E(X^{n-k})\\
&= E(Z^{n})E(X^{n})+\sum_{k=1}^{n-1}\binom{n}{k}E\left[(1-Z)^{k}Z^{n-k}\right]E(H^{k})E(X^{n-k})+E\left[(1-Z)^{n}\right]E(H^{n})\\
&= wp+(1-w)E(X^{n}).
\end{aligned}
$$

3.3. Third Stochastic Representation

Referring to SR3, $Y\overset{d}{=}(1-Z)\xi_1+ZY^{*}$. Therefore, the $n$th moment of $Y$ is derived as follows:
$$
E(Y^n)=E\left\{\left[(1-Z)\xi_1+ZY^{*}\right]^n\right\}=E\left\{\sum_{k=0}^{n}\binom{n}{k}\left[(1-Z)^{k}Z^{n-k}\right]\xi_1^{k}\,Y^{*\,n-k}\right\}.
$$
The $n$th moment of $Y$ simplifies to
$$
\begin{aligned}
E(Y^n) &= \sum_{k=0}^{n}\binom{n}{k}E\left[(1-Z)^{k}Z^{n-k}\right]E(\xi_1^{k})E\left(Y^{*\,n-k}\right)\\
&= E(Z^{n})E\left(Y^{*\,n}\right)+\sum_{k=1}^{n-1}\binom{n}{k}E\left[(1-Z)^{k}Z^{n-k}\right]E(\xi_1^{k})E\left(Y^{*\,n-k}\right)+E\left[(1-Z)^{n}\right]E(\xi_1^{n})\\
&= w+(1-w)E\left(Y^{*\,n}\right).
\end{aligned}
$$

3.4. Fourth Stochastic Representation

Referring to SR4, $Y\overset{d}{=}Z_1^{*}+Z_2^{*}V$. Therefore, the $n$th moment of $Y$ is derived as follows:
$$
E(Y^n)=E\left[(Z_1^{*}+Z_2^{*}V)^n\right]=E\left\{\sum_{k=0}^{n}\binom{n}{k}\left[Z_1^{*k}Z_2^{*\,n-k}\right]V^{n-k}\right\}.
$$
The $n$th moment of $Y$ simplifies to
$$
\begin{aligned}
E(Y^n) &= \sum_{k=0}^{n}\binom{n}{k}E\left[Z_1^{*k}Z_2^{*\,n-k}\right]E\left(V^{n-k}\right)\\
&= E\left(Z_2^{*\,n}\right)E(V^{n})+\sum_{k=1}^{n-1}\binom{n}{k}E\left[Z_1^{*k}Z_2^{*\,n-k}\right]E\left(V^{n-k}\right)+E\left(Z_1^{*\,n}\right)\\
&= \omega_1^{*}+\omega_2^{*}E(V^{n}).
\end{aligned}
$$

4. Conditional Distributions

In this section, the conditional distributions based on the first two stochastic representations will be discussed.

4.1. First Stochastic Representation

Recall that in SR1, $Y\overset{d}{=}Z_1+Z_2X$, where $Z=(Z_0,Z_1,Z_2)^{T}\sim\mathrm{Multinomial}(1;\omega_0,\omega_1,\omega_2)$ and $X\sim\mathrm{PL}(\theta)$, such that $Z\perp X$. We would like to find the conditional distributions of $Z\mid Y$ and $X\mid Y$. These are given in the following theorems.
Theorem 1.
The joint conditional distribution of $Z\mid Y$ is given as
$$
Z\mid(Y=y)\sim\begin{cases}
\mathrm{Multinomial}(1;\beta_1,0,1-\beta_1), & \text{if } y=0,\\
\mathrm{Multinomial}(1;0,\beta_2,1-\beta_2), & \text{if } y=1,\\
\mathrm{Multinomial}(1;0,0,1), & \text{if } y\ge 2,
\end{cases}
$$
where
$$
\beta_1=\frac{\omega_0}{\omega_0+\omega_2\dfrac{\theta^2(\theta+2)}{(\theta+1)^3}}\quad\text{and}\quad\beta_2=\frac{\omega_1}{\omega_1+\omega_2\dfrac{\theta^2(\theta+3)}{(\theta+1)^4}}.
$$
Proof. 
Recall that $Z$ can take the values $(1,0,0)^{T}$, $(0,1,0)^{T}$ and $(0,0,1)^{T}$, and that $\Pr(Z=z\mid Y=y)=\Pr(Z_0=z_0,Z_1=z_1,Z_2=z_2,Y=y)/\Pr(Y=y)$. For $Y=0$,
$$
\Pr\left(Z=(1,0,0)^{T}\mid Y=0\right)=\beta_1,\quad\Pr\left(Z=(0,1,0)^{T}\mid Y=0\right)=0,\quad\Pr\left(Z=(0,0,1)^{T}\mid Y=0\right)=1-\beta_1.
$$
Therefore, $Z\mid(Y=0)\sim\mathrm{Multinomial}(1;\beta_1,0,1-\beta_1)$. For $Y=1$,
$$
\Pr\left(Z=(1,0,0)^{T}\mid Y=1\right)=0,\quad\Pr\left(Z=(0,1,0)^{T}\mid Y=1\right)=\beta_2,\quad\Pr\left(Z=(0,0,1)^{T}\mid Y=1\right)=1-\beta_2.
$$
Therefore, $Z\mid(Y=1)\sim\mathrm{Multinomial}(1;0,\beta_2,1-\beta_2)$. Finally, for $Y=y\ge 2$,
$$
\Pr\left(Z=(1,0,0)^{T}\mid Y=y\right)=0,\quad\Pr\left(Z=(0,1,0)^{T}\mid Y=y\right)=0,\quad\Pr\left(Z=(0,0,1)^{T}\mid Y=y\right)=1.
$$
Therefore, $Z\mid(Y=y)\sim\mathrm{Multinomial}(1;0,0,1)$ for $y\ge 2$. □
Corollary 1. 
The marginal conditional distribution of $Z_i\mid Y$ based on SR1 is
$$
Z_0\mid(Y=y)\sim\begin{cases}\mathrm{Bernoulli}(\beta_1), & y=0,\\ \mathrm{Degen}(0), & y\neq 0,\end{cases}\qquad
Z_1\mid(Y=y)\sim\begin{cases}\mathrm{Bernoulli}(\beta_2), & y=1,\\ \mathrm{Degen}(0), & y\neq 1,\end{cases}\qquad
Z_2\mid(Y=y)\sim\begin{cases}\mathrm{Bernoulli}(1-\beta_1), & y=0,\\ \mathrm{Bernoulli}(1-\beta_2), & y=1,\\ \mathrm{Degen}(1), & y\ge 2.\end{cases}
$$
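Theorem 1 and Corollary 1 amount to Bayes' rule applied to the three mixture components of SR1. The sketch below computes the conditional probabilities directly from the joint probabilities and compares them with $\beta_1$ and $\beta_2$; the function names and parameter values are assumptions made for illustration.

```python
def pl_pmf(y, theta):
    """Poisson Lindley pmf."""
    return theta**2 * (theta + y + 2) / (theta + 1) ** (y + 3)


def posterior_z(y, w0, w1, theta):
    """Return (Pr(Z0=1 | Y=y), Pr(Z1=1 | Y=y), Pr(Z2=1 | Y=y)) under SR1."""
    w2 = 1.0 - w0 - w1
    # Joint probabilities Pr(Z_i = 1, Y = y) for i = 0, 1, 2.
    joint = (w0 * (y == 0), w1 * (y == 1), w2 * pl_pmf(y, theta))
    total = sum(joint)  # equals Pr(Y = y)
    return tuple(j / total for j in joint)


# Assumed illustrative values.
w0, w1, theta = 0.3, 0.2, 1.0
w2 = 1.0 - w0 - w1
beta1 = w0 / (w0 + w2 * pl_pmf(0, theta))
beta2 = w1 / (w1 + w2 * pl_pmf(1, theta))

print(posterior_z(0, w0, w1, theta), beta1)
print(posterior_z(1, w0, w1, theta), beta2)
print(posterior_z(5, w0, w1, theta))
```

In applications, `posterior_z` gives the probability that an observed zero or one comes from the inflation component rather than from the PL component.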
Theorem 2. 
The conditional distribution of $X\mid Y$ is given as
$$
X\mid(Y=y)\sim\begin{cases}
\mathrm{ZIPL}(1-\beta_1,\theta), & \text{if } y=0,\\
\mathrm{OIPL}(1-\beta_2,\theta), & \text{if } y=1,\\
\mathrm{Degen}(y), & \text{if } y\ge 2.
\end{cases}
$$
Proof. 
Recall that $X\sim\mathrm{PL}(\theta)$ and $\Pr(X=x\mid Y=y)=\Pr(X=x,Y=y)/\Pr(Y=y)$. For $Y=0$,
$$
\begin{aligned}
\frac{\Pr(X=x,Y=0)}{\Pr(Y=0)} &= \frac{\Pr(X=0,Z_1=0)}{\Pr(Y=0)}I(x=0)+\frac{\Pr(X=x,Z_0=1)}{\Pr(Y=0)}I(x\neq 0)\\
&= \frac{\dfrac{\theta^2(\theta+2)}{(\theta+1)^3}(1-\omega_1)}{\omega_0+\omega_2\dfrac{\theta^2(\theta+2)}{(\theta+1)^3}}I(x=0)+\frac{\dfrac{\theta^2(\theta+x+2)}{(\theta+1)^{x+3}}\,\omega_0}{\omega_0+\omega_2\dfrac{\theta^2(\theta+2)}{(\theta+1)^3}}I(x\neq 0)\\
&= \left[1-\beta_1+\beta_1\frac{\theta^2(\theta+2)}{(\theta+1)^3}\right]I(x=0)+\left[\beta_1\frac{\theta^2(\theta+x+2)}{(\theta+1)^{x+3}}\right]I(x\neq 0).
\end{aligned}
$$
Therefore, $X\mid(Y=0)\sim\mathrm{ZIPL}(1-\beta_1,\theta)$. For $Y=1$,
$$
\begin{aligned}
\frac{\Pr(X=x,Y=1)}{\Pr(Y=1)} &= \frac{\Pr(X=1,Z_0=0)}{\Pr(Y=1)}I(x=1)+\frac{\Pr(X=x,Z_1=1)}{\Pr(Y=1)}I(x\neq 1)\\
&= \frac{\dfrac{\theta^2(\theta+3)}{(\theta+1)^4}(1-\omega_0)}{\omega_1+\omega_2\dfrac{\theta^2(\theta+3)}{(\theta+1)^4}}I(x=1)+\frac{\dfrac{\theta^2(\theta+x+2)}{(\theta+1)^{x+3}}\,\omega_1}{\omega_1+\omega_2\dfrac{\theta^2(\theta+3)}{(\theta+1)^4}}I(x\neq 1)\\
&= \left[1-\beta_2+\beta_2\frac{\theta^2(\theta+3)}{(\theta+1)^4}\right]I(x=1)+\left[\beta_2\frac{\theta^2(\theta+x+2)}{(\theta+1)^{x+3}}\right]I(x\neq 1).
\end{aligned}
$$
Therefore, $X\mid(Y=1)\sim\mathrm{OIPL}(1-\beta_2,\theta)$. Note that the OIPL distribution has not been explored yet. For $Y=y\ge 2$, the only possible value is $x=y$, and
$$
\frac{\Pr(X=y,Y=y)}{\Pr(Y=y)}=\frac{\Pr(X=y,Z_2=1)}{\Pr(Y=y)}=1.
$$
Therefore, $X\mid(Y=y)\sim\mathrm{Degen}(y)$ for $y\ge 2$. □

4.2. Second Stochastic Representation

Recall that in SR2, $Y\overset{d}{=}(1-Z)H+ZX$, where $Z\sim\mathrm{Bernoulli}(1-w)$, $H\sim\mathrm{Bernoulli}(p)$ and $X\sim\mathrm{PL}(\theta)$, such that $Z$, $H$ and $X$ are mutually independent.
Theorem 3. 
The conditional distribution of $Z\mid Y$ is given as
$$
Z\mid(Y=y)\sim\begin{cases}
\mathrm{Bernoulli}(\lambda_1), & \text{if } y=0,\\
\mathrm{Bernoulli}(\lambda_2), & \text{if } y=1,\\
\mathrm{Degen}(1), & \text{if } y\ge 2,
\end{cases}
$$
where
$$
\lambda_1=\frac{(1-w)\dfrac{\theta^2(\theta+2)}{(\theta+1)^3}}{w(1-p)+(1-w)\dfrac{\theta^2(\theta+2)}{(\theta+1)^3}}\quad\text{and}\quad\lambda_2=\frac{(1-w)\dfrac{\theta^2(\theta+3)}{(\theta+1)^4}}{wp+(1-w)\dfrac{\theta^2(\theta+3)}{(\theta+1)^4}},
$$
or equivalently, $\lambda_1=1-\beta_1$ and $\lambda_2=1-\beta_2$.
Proof. 
Recall that $\Pr(Z=z\mid Y=y)=\Pr(Z=z,Y=y)/\Pr(Y=y)$ and that $Z$ can take the values 0 or 1. For $Y=0$,
$$
\Pr(Z=1\mid Y=0)=\frac{\Pr(Z=1,X=0)}{\Pr(Y=0)}=\lambda_1.
$$
Therefore, $Z\mid(Y=0)\sim\mathrm{Bernoulli}(\lambda_1)$. For $Y=1$,
$$
\Pr(Z=1\mid Y=1)=\frac{\Pr(Z=1,X=1)}{\Pr(Y=1)}=\lambda_2.
$$
Therefore, $Z\mid(Y=1)\sim\mathrm{Bernoulli}(\lambda_2)$. For $Y=y\ge 2$,
$$
\Pr(Z=1\mid Y=y)=\frac{\Pr(Z=1,X=y)}{\Pr(Y=y)}=1.
$$
Therefore, $Z\mid(Y=y)\sim\mathrm{Degen}(1)$ for $y\ge 2$. Using the reparameterization in SR2, $\lambda_1=1-\beta_1$ and $\lambda_2=1-\beta_2$, so the conditional distribution of $Z\mid Y$ can equivalently be written as
$$
Z\mid(Y=y)\sim\begin{cases}
\mathrm{Bernoulli}(1-\beta_1), & \text{if } y=0,\\
\mathrm{Bernoulli}(1-\beta_2), & \text{if } y=1,\\
\mathrm{Degen}(1), & \text{if } y\ge 2.
\end{cases}
$$
Theorem 4. 
The conditional distribution of $H\mid Y$ is given as
$$
H\mid(Y=y)\sim\begin{cases}
\mathrm{Bernoulli}(\lambda_3), & \text{if } y=0,\\
\mathrm{Bernoulli}(\lambda_4), & \text{if } y=1,\\
\mathrm{Bernoulli}(p), & \text{if } y\ge 2,
\end{cases}
$$
where
$$
\lambda_3=\frac{p(1-w)\dfrac{\theta^2(\theta+2)}{(\theta+1)^3}}{w(1-p)+(1-w)\dfrac{\theta^2(\theta+2)}{(\theta+1)^3}}\quad\text{and}\quad\lambda_4=\frac{p\left[w+(1-w)\dfrac{\theta^2(\theta+3)}{(\theta+1)^4}\right]}{wp+(1-w)\dfrac{\theta^2(\theta+3)}{(\theta+1)^4}},
$$
or equivalently, $\lambda_3=p\lambda_1=p(1-\beta_1)$ and $\lambda_4=1-(1-p)\lambda_2=1-(1-p)(1-\beta_2)$.
Proof. 
Recall that $\Pr(H=\eta\mid Y=y)=\Pr(H=\eta,Y=y)/\Pr(Y=y)$ and that $H$ can take the values 0 or 1. For $Y=0$,
$$
\Pr(H=1\mid Y=0)=\frac{\Pr(H=1,Z=1,X=0)}{\Pr(Y=0)}=\lambda_3.
$$
Therefore, $H\mid(Y=0)\sim\mathrm{Bernoulli}(\lambda_3)$. For $Y=1$,
$$
\Pr(H=1\mid Y=1)=\frac{\Pr(H=1,Z=1,X=1)+\Pr(H=1,Z=0)}{\Pr(Y=1)}=\lambda_4.
$$
Therefore, $H\mid(Y=1)\sim\mathrm{Bernoulli}(\lambda_4)$. For $Y=y\ge 2$,
$$
\Pr(H=1\mid Y=y)=\frac{\Pr(H=1,Z=1,X=y)}{\Pr(Y=y)}=p.
$$
Therefore, $H\mid(Y=y)\sim\mathrm{Bernoulli}(p)$ for $y\ge 2$. Using the reparameterization in SR2 and Theorem 3, the conditional distribution of $H\mid Y$ can equivalently be written as
$$
H\mid(Y=y)\sim\begin{cases}
\mathrm{Bernoulli}\left(p(1-\beta_1)\right)=\mathrm{Bernoulli}(p\lambda_1), & \text{if } y=0,\\
\mathrm{Bernoulli}\left(1-(1-p)(1-\beta_2)\right)=\mathrm{Bernoulli}\left(1-(1-p)\lambda_2\right), & \text{if } y=1,\\
\mathrm{Bernoulli}(p), & \text{if } y\ge 2.
\end{cases}
$$
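The closed forms for $\lambda_3$ and $\lambda_4$ can be checked by brute-force enumeration of the joint distribution of $(Z,H,X)$ in SR2. The following is a minimal sketch; the function names, parameter values and the truncation of the support of $X$ at 200 are illustrative assumptions.

```python
def pl_pmf(y, theta):
    """Poisson Lindley pmf."""
    return theta**2 * (theta + y + 2) / (theta + 1) ** (y + 3)


def h_given_y(y, w, p, theta, upper=200):
    """Pr(H = 1 | Y = y) for Y = (1 - Z) H + Z X, by enumerating (z, h, x)."""
    num = den = 0.0
    for z, pz in ((0, w), (1, 1.0 - w)):        # Z ~ Bernoulli(1 - w)
        for h, ph in ((0, 1.0 - p), (1, p)):    # H ~ Bernoulli(p)
            for x in range(upper):              # X ~ PL(theta), truncated support
                if (1 - z) * h + z * x == y:
                    prob = pz * ph * pl_pmf(x, theta)
                    den += prob
                    num += prob * h
    return num / den


# Assumed illustrative values.
w, p, theta = 0.5, 0.4, 1.0
pl0, pl1 = pl_pmf(0, theta), pl_pmf(1, theta)
lam1 = (1 - w) * pl0 / (w * (1 - p) + (1 - w) * pl0)
lam2 = (1 - w) * pl1 / (w * p + (1 - w) * pl1)
lam3 = p * lam1
lam4 = 1 - (1 - p) * lam2

print(h_given_y(0, w, p, theta), lam3)
print(h_given_y(1, w, p, theta), lam4)
print(h_given_y(3, w, p, theta), p)
```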
Theorem 5. 
The conditional distribution of $X\mid Y$ is given as
$$
X\mid(Y=y)\sim\begin{cases}
\mathrm{ZIPL}(1-\beta_1,\theta), & \text{if } y=0,\\
\mathrm{OIPL}(1-\beta_2,\theta), & \text{if } y=1,\\
\mathrm{Degen}(y), & \text{if } y\ge 2.
\end{cases}
$$
Proof. 
Similar to the proof for Theorem 2. □

5. Hypotheses Testing

This section presents two hypotheses involving the presence of one-inflation and a fixed $\theta$. The hypothesis about the presence of one-inflation is examined using a likelihood ratio test, while the hypothesis about a fixed $\theta$ involves a two-sided test. The hypothesis about the presence of zero–one-inflation cannot be examined with the likelihood ratio test because the parameter values are situated at the boundary of the confined parameter space [10].

5.1. The Presence of One-Inflation

To investigate the existence of excess ones in the observations, the following null and alternative hypotheses are considered.
$$
H_0:\omega_1=0\quad\text{vs}\quad H_1:\omega_1>0.
$$
The likelihood ratio (LR) test statistic is given as
$$
S_1=-2\left\{\ell\left(\hat{\omega}_{0,H_0},0,\hat{\theta}_{H_0}\right)-\ell\left(\hat{\omega}_0,\hat{\omega}_1,\hat{\theta}\right)\right\},
$$
where $\ell(\cdot)$ refers to the log-likelihood function. This hypothesis tests whether the ZIPL distribution is sufficient to describe the data compared to the ZOIPL distribution. Zhang et al. [10] investigated a similar test for the zero–one-inflated Poisson distribution. The authors noted that $H_0$ places $\omega_1$ on the boundary of the parameter space, so the appropriate null distribution is a mixture of $\mathrm{Degen}(0)$ and $\chi^2(1)$ with equal proportions [10,15]. The same conclusion can be drawn here, since the ZOIPL distribution has the same inflation structure as the zero–one-inflated Poisson distribution. Therefore, $H_0$ is rejected if $\Pr(S_1>s_1)=\tfrac{1}{2}\Pr(\chi^2(1)>s_1)$ is smaller than the significance level, which is set at $\alpha=0.05$. For more information on the asymptotic properties of likelihood ratio tests, see [16].
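The boundary-corrected p-value is simple to compute once the statistic $s_1$ is known. The sketch below uses only the Python standard library and the identity $\Pr(\chi^2(1)>s)=\operatorname{erfc}(\sqrt{s/2})$; the function names are hypothetical, and returning 1 for a non-positive statistic is an assumed convention.

```python
import math


def chi2_1_sf(s):
    """Upper tail probability Pr(chi-square(1) > s) for s >= 0."""
    return math.erfc(math.sqrt(s / 2.0))


def lr_pvalue_boundary(s):
    """p-value under the 50:50 mixture of Degen(0) and chi-square(1)."""
    if s <= 0.0:
        return 1.0  # assumed convention: no evidence against H0
    return 0.5 * chi2_1_sf(s)


# Example with an assumed observed statistic.
s1 = 3.2
print(lr_pvalue_boundary(s1), chi2_1_sf(s1))
```

Because half of the null mass sits at zero, the 5% critical value for $S_1$ is the 90th percentile of $\chi^2(1)$, about 2.706, rather than the usual 3.841.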

5.2. For Fixed θ = θ 0

To investigate whether $\theta$ equals a fixed value $\theta_0$, the following null and alternative hypotheses are considered.
$$
H_0:\theta=\theta_0\quad\text{vs}\quad H_1:\theta\neq\theta_0.
$$
The likelihood ratio (LR) test statistic is given as
$$
S_2=-2\left\{\ell\left(\hat{\omega}_{0,H_0},\hat{\omega}_{1,H_0},\theta_0\right)-\ell\left(\hat{\omega}_0,\hat{\omega}_1,\hat{\theta}\right)\right\}.
$$
This hypothesis investigates whether a fixed $\theta_0$ with varying $\omega_0$ and $\omega_1$ is adequate for describing the data, compared with the ZOIPL distribution with three varying parameters. $H_0$ is rejected if $\Pr(S_2>s_2)=\Pr(\chi^2(1)>s_2)$ is less than the significance level, which is set at $\alpha=0.05$.

6. Simulation Studies

In this section, the hypotheses and their corresponding likelihood ratio tests are investigated via simulation studies. The simulation studies aim to examine the type I error rates under $H_0$ and the powers under $H_1$.

6.1. Data Generation

To generate random data which follow the ZOIPL distribution, first recall SR1. We independently draw $z_1^{(m)},\ldots,z_n^{(m)}\sim\mathrm{Multinomial}(1;\omega_0,\omega_1,\omega_2)$ for $m=1,2,\ldots,M$, where $z_i^{(m)}=\left(Z_{0i}^{(m)},Z_{1i}^{(m)},Z_{2i}^{(m)}\right)^{T}$ for $i=1,2,\ldots,n$. We also draw $X_1^{(m)},\ldots,X_n^{(m)}\sim\mathrm{PL}(\theta)$ independently. Then, we set $Y_i^{(m)}=Z_{1i}^{(m)}+Z_{2i}^{(m)}\times X_i^{(m)}$ for $i=1,2,\ldots,n$ and $m=1,2,\ldots,M$, where $M=1000$.
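The data generation step can be sketched in a few lines of Python using only the standard library. The PL draws here use inversion of the cumulative pmf, which is one possible sampler and not necessarily the one used in the paper; the function names, the seed, the cap `max_y` and the parameter values are all illustrative assumptions.

```python
import random


def pl_pmf(y, theta):
    """Poisson Lindley pmf."""
    return theta**2 * (theta + y + 2) / (theta + 1) ** (y + 3)


def sample_pl(theta, rng, max_y=500):
    """Draw X ~ PL(theta) by inverting the cumulative pmf (capped at max_y)."""
    u = rng.random()
    y, cdf = 0, pl_pmf(0, theta)
    while u > cdf and y < max_y:
        y += 1
        cdf += pl_pmf(y, theta)
    return y


def draw_z(w0, w1, rng):
    """Draw Z ~ Multinomial(1; w0, w1, 1 - w0 - w1) as an indicator triple."""
    u = rng.random()
    if u < w0:
        return (1, 0, 0)
    if u < w0 + w1:
        return (0, 1, 0)
    return (0, 0, 1)


def sample_zoipl(n, w0, w1, theta, rng):
    """Generate n ZOIPL observations through SR1: Y = Z1 + Z2 * X."""
    ys = []
    for _ in range(n):
        _, z1, z2 = draw_z(w0, w1, rng)
        x = sample_pl(theta, rng)
        ys.append(z1 + z2 * x)
    return ys


rng = random.Random(2024)  # fixed seed for reproducibility
ys = sample_zoipl(20000, 0.3, 0.2, 1.0, rng)
freq0 = ys.count(0) / len(ys)
freq1 = ys.count(1) / len(ys)
print(freq0, freq1)
```

With these assumed parameters the theoretical values are $\Pr(Y=0)=0.4875$ and $\Pr(Y=1)=0.325$, so the empirical frequencies should be close to them.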

6.2. General Algorithm for Hypothesis Testing

Let $r$ be the number of rejections of $H_0$. The type I error rate is obtained by computing $r/M$ when $H_0$ is true, whereas the power of the test is obtained by computing $r/M$ when $H_1$ is true. For both the type I error rate and the power of the test, the sample sizes are set to $n=200(200)1000$, that is, $n=200,400,\ldots,1000$. The procedure to determine the type I error rate and the power of the test is repeated 1000 times. The adjusted Wald technique [17] is used to obtain the 95% confidence interval for the type I error rates. Bradley's liberal criterion [18] states that a test is robust if its type I error rate lies in the interval $\alpha\pm 0.5\alpha$. In this case, $\alpha=0.05$, so the test is considered robust when the type I error rate is between 0.025 and 0.075.
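The two summary steps above can be sketched as follows. The interval is computed with the Agresti–Coull form of the adjusted Wald interval, which is an assumption about the exact variant used in [17]; the robustness check uses the range 0.025 to 0.075 stated in the text. The function names and the example count are hypothetical.

```python
import math


def adjusted_wald_ci(r, m, z=1.959963984540054):
    """Adjusted Wald (Agresti-Coull form, assumed) interval for a rate r / m."""
    n_adj = m + z * z
    p_adj = (r + z * z / 2.0) / n_adj
    half = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


def bradley_liberal(rate, alpha=0.05):
    """True if the empirical type I error rate lies in [0.5 alpha, 1.5 alpha]."""
    return 0.5 * alpha <= rate <= 1.5 * alpha


# Example: an assumed 50 rejections out of M = 1000 replications under H0.
r, M = 50, 1000
lo, hi = adjusted_wald_ci(r, M)
print(r / M, (lo, hi), bradley_liberal(r / M))
```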

6.3. The Presence of One-Inflation

Recall that $H_0:\omega_1=0$ and $H_1:\omega_1>0$. For this simulation study, the value of $\theta$ is fixed at 1.0, while the value of $\omega_0$ varies: $\omega_0=0.6,0.7,0.8,0.9$. These values of $\omega_0$ were selected based on a previous study of the ZOIPL distribution [8] and are used to study the type I error rates. The results of the simulation studies are shown in Figure 1, which plots the type I error rates for varying $\omega_0$. When $\omega_0=0.60$, a sample size of 400 is sufficient for the type I error rate to fall below 0.05. On the other hand, when $\omega_0=0.70$ or $0.90$, a sample size of at least 800 is needed for the type I error rate to fall below 0.05. Surprisingly, when $\omega_0=0.80$, even 200 samples are sufficient. For each value of $\omega_0$, the type I error rates decrease with increasing sample size $n$ and fall below 0.05. Zhang et al. [10] mentioned that the smaller the type I error rate, the better the performance of the likelihood ratio test in controlling the error rates.
To assess the power of the test, the values of $\omega_1$ under $H_1$ are set at $0.02,0.04,0.06,0.08,0.10,0.12$ with $\theta=1.0$. Let $r$ be the number of rejections of $H_0$. The power of the test is obtained by calculating $r/M$ when $\omega_1>0$. The results of the simulation studies are shown in Figure 2, which plots the power of the test for varying $\omega_1$. The power of the test increases as $\omega_1$ and the sample size $n$ increase. At least 80% power is achieved for $\omega_1\ge 0.06$ with a sample size of at least 800. For $\omega_1<0.06$, a larger sample size is required for the test to reach 80% power. This means that when $\omega_1$ is small and close to zero, the test cannot reliably detect the existence of excess ones in the data. Generally, the larger the value of $\omega_1$, the faster the power of the test increases as the sample size increases.

6.4. Fixed θ = θ 0

Recall that $H_0:\theta=\theta_0$ and $H_1:\theta\neq\theta_0$. For this simulation study, the value of $\theta_0$ varies: $\theta_0=0.5,1.0,1.5,2.0$ for the study of type I error rates. The values $\omega_0=0.75$ and $\omega_1=0.10$ are fixed. Figure 3 shows the simulation results of the test. From Figure 3, when $\theta=0.5$ or $1.0$, a sample size of 600 is sufficient for the type I error rate to fall below 0.05. When $\theta=1.5$, a sample size of at least 800 is needed for the type I error rate to fall below 0.05. Furthermore, a total of 1000 samples are required when $\theta=2.0$. Generally, the larger the value of $\theta$, the larger the sample size required for the type I error rate to become smaller than 0.05.
To investigate the power of the test, data are generated assuming that $\theta=1.5,2.0,2.5,3.0$, with $\theta_0=1.0$. Figure 4 shows the simulation results of the test. From Figure 4, the power of the test increases as the sample size increases. The larger the distance between the assumed $\theta_0=1.0$ and the true $\theta$, the more powerful the test becomes. To achieve 80% power with 1000 samples, the true $\theta$ must be at least 3.0 when $\theta_0=1.0$.

7. Conclusions

In this paper, various stochastic representations of the zero–one-inflated Poisson Lindley distribution have been studied. The stochastic representations allow us to view the zero–one-inflated Poisson Lindley distribution in different ways by combining several established distributions such as the multinomial, degenerate, Poisson Lindley and other distributions. These stochastic representations can be exploited when handling data with excess zeroes and ones as well as dispersion. For example, suppose we are interested in the distribution of positive (observed) counts but are presented with a full data set containing both observed and unobserved values. Instead of separating the full data set into observed and unobserved values, which may incur unnecessary costs, one may use the full data set with the fourth stochastic representation to estimate the parameter that describes the distribution of the unobserved data.
In addition, hypothesis tests have been developed to investigate the presence of one-inflation and of a fixed rate parameter. The simulation studies investigate the ability of the tests to control the type I error rate and to attain adequate power. All tests, which are based on likelihood ratios, are found to control the type I error rates and to be powerful as the sample size increases; hence, they are found to be useful. A sample size of at least 1000 is suggested as sufficient for the tests to be useful.

Author Contributions

Conceptualization, R.R.M.T. and N.I.; methodology, R.R.M.T. and N.I.; software, R.R.M.T.; validation, N.I.; formal analysis, R.R.M.T.; writing—original draft preparation, R.R.M.T.; writing—review and editing, R.R.M.T. and N.I.; visualization, R.R.M.T.; supervision, N.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Iddi, S.; Doku-Amponsah, K. Statistical Model for Overdispersed Count Outcome with Many Zeros: An Approach for Marginal Inference. S. Afr. Stat. J. 2016, 50, 313–337. [Google Scholar] [CrossRef]
  2. Lambert, D. Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics 1992, 34, 1–14. [Google Scholar] [CrossRef]
  3. Neelon, B.H.; O’Malley, A.J.; Normand, S.-L.T. A Bayesian Model for Repeated Measures Zero-Inflated Count Data with Application to Outpatient Psychiatric Service Use. Stat. Model. 2010, 10, 421–439. [Google Scholar] [CrossRef] [PubMed]
  4. Ridout, M.; Demétrio, C.G.B.; Hinde, J. Models for Count Data with Many Zeros. In Proceedings of the XIXth International Biometric Conference, Cape Town, South Africa, 14–18 December 1998; International Biometric Society Invited Papers: Cape Town, South Africa, 1998; Volume 19, pp. 179–192. [Google Scholar]
  5. Young, D.S.; Roemmele, E.S.; Yeh, P. Zero-inflated Modeling Part I: Traditional Zero-inflated Count Regression Models, Their Applications, and Computational Tools. Wiley Interdiscip. Rev. Comput. Stat. 2020, 14, e1541. [Google Scholar] [CrossRef]
  6. Lin, T.H.; Tsai, M. Modeling Health Survey Data with Excessive Zero and K Responses. Stat. Med. 2012, 32, 1572–1583. [Google Scholar] [CrossRef] [PubMed]
  7. Jornsatian, C.; Bodhisuwan, W. Zero-One Inflated Negative Binomial—Beta Exponential Distribution for Count Data with Many Zeros and Ones. Commun. Stat.-Theory Methods 2022, 51, 8517–8531. [Google Scholar] [CrossRef]
  8. Tajuddin, R.R.M.; Ismail, N.; Ibrahim, K.; Bakar, S.A.A. A New Zero–One-Inflated Poisson–Lindley Distribution for Modelling Overdispersed Count Data. Bull. Malays. Math. Sci. Soc. 2022, 45, 21–35. [Google Scholar] [CrossRef]
  9. Melkersson, M.; Olsson, C. Is Visiting the Dentist a Good Habit?: Analyzing Count Data with Excess Zeros and Excess Ones; University of Umeå: Umeå, Sweden, 1999. [Google Scholar]
  10. Zhang, C.; Tian, G.L.; Ng, K.W. Properties of the Zero-and-One Inflated Poisson Distribution and Likelihood-Based Inference Methods. Stat. Interface 2016, 9, 11–32. [Google Scholar] [CrossRef]
  11. Sankaran, M. 275. Note: The Discrete Poisson-Lindley Distribution. Biometrics 1970, 26, 145–149. [Google Scholar] [CrossRef]
  12. Ghitany, M.E.; Al-Mutairi, D.K. Estimation Methods for the Discrete Poisson–Lindley Distribution. J. Stat. Comput. Simul. 2009, 79, 1–9. [Google Scholar] [CrossRef]
  13. Tajuddin, R.R.M.; Ismail, N.; Ibrahim, K. Estimating Population Size of Criminals: A New Horvitz–Thompson Estimator under One-Inflated Positive Poisson–Lindley Model. Crime Delinq. 2022, 68, 1004–1034. [Google Scholar] [CrossRef]
  14. Borah, M.; Nath, A.D. A Study on the Inflated Poisson Lindley Distribution. J. Indian Soc. Agric. Stat. 2001, 54, 317–323. [Google Scholar]
  15. Joe, H.; Zhu, R. Generalized Poisson Distribution: The Property of Mixture of Poisson and Comparison with Negative Binomial Distribution. Biom. J. 2005, 47, 219–229. [Google Scholar] [CrossRef] [PubMed]
  16. Self, S.G.; Liang, K.-Y. Asymptotic Properties of Maximum Likelihood Estimators and Likelihood Ratio Tests under Nonstandard Conditions. J. Am. Stat. Assoc. 1987, 82, 605–610. [Google Scholar] [CrossRef]
  17. Agresti, A.; Coull, B.A. Approximate Is Better than “Exact” for Interval Estimation of Binomial Proportions. Am. Stat. 1998, 52, 119–126. [Google Scholar] [CrossRef]
  18. Bradley, J.V. Robustness? Br. J. Math. Stat. Psychol. 1978, 31, 144–152. [Google Scholar] [CrossRef]
Figure 1. Type I error rates for different values of $\omega_0$ and $N$.
Figure 2. Power of the likelihood ratio test for different values of $\omega_1$ and $N$.
Figure 3. Type I error rates for different values of $\theta$ and $N$.
Figure 4. Power of the likelihood ratio test for different values of $\theta$ and $N$.
Table 1. The probability mass functions for the ZIPL, PL and ZTPL distributions.

| Distribution | Probability Mass Function |
|---|---|
| ZIPL | $\Pr(Y = y \mid \omega_0, \omega_1, \theta) = \begin{cases} \omega_0 + \omega_1 \dfrac{\theta^2(\theta + 2)}{(\theta + 1)^3}; & y = 0 \\ \omega_1 \dfrac{\theta^2(\theta + y + 2)}{(\theta + 1)^{y + 3}}; & y \geq 1 \end{cases}$ where $\omega_1 = 1 - \omega_0$ and $\omega_0$ refers to the inflation parameter for the excess zeros. |
| PL | $\Pr(X = y \mid \theta) = \dfrac{\theta^2(\theta + y + 2)}{(\theta + 1)^{y + 3}}; \quad y \geq 0$ |
| ZTPL | $\Pr(V = v \mid \theta) = \dfrac{\theta^2(\theta + v + 2)}{(\theta^2 + 3\theta + 1)(\theta + 1)^{v}}; \quad v \geq 1$ |
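The probability mass functions in Table 1 can be checked numerically. The Python sketch below is only an illustration: the function names (`pl_pmf`, `zipl_pmf`, `ztpl_pmf`, `total_mass`) and the parameter values are hypothetical choices. It evaluates the three functions, confirms that each sums to approximately one, and confirms that the ZTPL function equals the PL function renormalised over the positive counts.

```python
def pl_pmf(y, theta):
    """Poisson Lindley pmf: theta^2 (theta + y + 2) / (theta + 1)^(y + 3), y >= 0."""
    return theta**2 * (theta + y + 2) / (theta + 1) ** (y + 3)


def zipl_pmf(y, omega0, theta):
    """Zero-inflated Poisson Lindley pmf with zero-inflation weight omega0."""
    omega1 = 1.0 - omega0
    base = omega1 * pl_pmf(y, theta)
    return omega0 + base if y == 0 else base


def ztpl_pmf(v, theta):
    """Zero-truncated Poisson Lindley pmf, defined for v >= 1."""
    if v < 1:
        return 0.0
    return theta**2 * (theta + v + 2) / ((theta**2 + 3 * theta + 1) * (theta + 1) ** v)


def total_mass(pmf, start, terms=200):
    """Sum a pmf over `terms` consecutive support points beginning at `start`."""
    return sum(pmf(k) for k in range(start, start + terms))


if __name__ == "__main__":
    theta, omega0 = 1.5, 0.3  # illustrative values only
    print(total_mass(lambda k: pl_pmf(k, theta), 0))
    print(total_mass(lambda k: zipl_pmf(k, omega0, theta), 0))
    print(total_mass(lambda k: ztpl_pmf(k, theta), 1))
```

The sums are truncated at 200 terms, which is ample for moderate values of θ because the tail decays geometrically at rate 1/(θ + 1). Very small values of θ would need more terms.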
Table 2. Notations and their descriptions.

| Notation | Description |
|---|---|
| $A \sim Q(\tau)$ | Random variable $A$ follows a $Q$ distribution with parameter $\tau$. |
| $\mathbf{A}$ | Vector $\mathbf{A}$. |
| $\mathbf{A}^{T}$ | Transpose of vector $\mathbf{A}$. |
| $A \perp\!\!\!\perp B \perp\!\!\!\perp C$ | Random variables $A$, $B$ and $C$ are mutually independent. |
| $A \overset{d}{=} B + C$ | Random variables $A$ and $B + C$ have the same distribution. |
