Article

Default Priors in a Zero-Inflated Poisson Distribution: Intrinsic Versus Integral Priors

Department of Mathematical Data Science, Hanyang University, Ansan 15588, Republic of Korea
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(5), 773; https://doi.org/10.3390/math13050773
Submission received: 26 December 2024 / Revised: 15 February 2025 / Accepted: 24 February 2025 / Published: 26 February 2025
(This article belongs to the Section D1: Probability and Statistics)

Abstract

Prior elicitation is an important issue in both subjective and objective Bayesian frameworks, where prior distributions impose certain information on parameters before data are observed. Caution is warranted when utilizing noninformative priors for hypothesis testing or model selection. Since noninformative priors are often improper, the Bayes factor, i.e., the ratio of two marginal distributions, is not properly determined because of the unspecified constants it contains. An adjusted Bayes factor based on a data-splitting idea, called the intrinsic Bayes factor, can often be used as a default measure to circumvent this indeterminacy. On the other hand, if reasonable (possibly proper) priors, called intrinsic priors, are available, the intrinsic Bayes factor can be approximated by calculating the ordinary Bayes factor with intrinsic priors. Additionally, the concept of the integral prior, inspired by the generalized expected posterior prior, often serves to mitigate the uncertainty in traditional Bayes factors, and the Bayes factor derived from this approach can effectively approximate the conventional Bayes factor. In this article, we present default Bayesian procedures for testing the zero inflation parameter in a zero-inflated Poisson distribution. Approximation methods are used to derive intrinsic and integral priors for testing the zero inflation parameter. A Monte Carlo simulation study is carried out to demonstrate the theoretical outcomes, and two real datasets are analyzed to support the results found in this paper.

1. Introduction

Discrete data have been analyzed across various fields, ranging from the social sciences, such as economics and management, to natural sciences, such as ecology. Several distributions may be utilized in dealing with discrete data, among which the Poisson distribution is one of the most commonly used models when the random variable of interest is the number of occurrences observed within a specific time period. For instance, the Poisson distribution is commonly used to model the number of earthquakes per month or the number of daily occurrences of a disease. However, such count datasets often contain a substantial number of zeros, yielding biased estimators and inaccurate results when the conventional Poisson distribution is used.
To handle zero-inflated data, Cohen [1] proposed a model that considers zero inflation patterns, which led to the extensive work of Lambert [2] on zero-inflated Poisson (ZIP) distributions. In the literature, analysis of the ZIP distribution has been conducted using the maximum likelihood estimation and likelihood ratio test methods. Several improvements of the ZIP distribution have been introduced by various researchers since then. For example, Hall [3] and Yau and Lee [4] incorporated random effects into the ZIP model. In the Bayesian framework, Ghosh et al. [5] conducted various simulations to demonstrate the performance of Bayes estimates in the ZIP model, outperforming the frequentist approach. Recently, Yirdraw et al. [6] utilized the ZIP distribution to model pediatric disease-related data in Ethiopia, and Tshishimbi [7] proposed an augmented inverse probability weighting method to address missing data in the context of ZIP distributions.
Prior elicitation plays an important role in Bayesian inference, in which the prior distribution reflects pre-existing beliefs and uncertainties before data are observed. Choosing prior distributions is even more crucial when dealing with hypothesis testing or model selection. In objective Bayesian perspectives, the Jeffreys priors proposed by Jeffreys [8], or the reference priors developed by Berger and Bernardo [9], are commonly employed due to a lack of information and resources. However, these noninformative prior densities often yield non-finite values after integrating over a given support, i.e., they are improper. This implies that these noninformative priors are only defined up to an arbitrary multiplicative constant. Subsequently, the resulting Bayes factor involves a ratio of unspecified constants that are also arbitrary, and it is not well defined.
Berger and Pericchi [10] proposed a new criterion, called the intrinsic Bayes factor (IBF), to resolve the arbitrariness issue, and they ultimately suggested default Bayes procedures when dealing with model selection or hypothesis testing. The concept of the IBF is based on a data-splitting idea, in which a part of the data is used as a training sample. The IBF methodology can facilitate the elimination of arbitrary constants contained in two marginal distributions, resulting in well-defined Bayes factors. The IBF can produce stable results under various settings in the model selection context (see Lingham and Sivaganesan [11] and Sanso et al. [12] for more details).
On the other hand, eliciting suitable priors provides a remedy that justifies the use of the full likelihood. Berger and Pericchi [10] suggested plausible (possibly proper) priors, called intrinsic priors, for avoiding the heavy computations of the IBF. Ideally, the ordinary Bayes factor calculated with intrinsic priors is, at least asymptotically, equivalent to the IBF.
Cano et al. [13] introduced the concept of integral priors to compare two nonnested models by constructing priors based on cross-integrated expressions of the models. Similar to intrinsic priors, integral priors address the issue of arbitrariness on constant terms when noninformative priors are used. By calculating ordinary Bayes factors through the use of integral priors, Cano et al. [13] opened an alternative pathway for deriving well-defined Bayes factors. Subsequent research by Salmeron et al. [14] demonstrated the suitability of integral priors for Bayesian hypothesis testing. Recently, Salmeron et al. [15] further utilized this approach to illustrate the conditions under which integral priors can be applied when comparing multiple models.
There has not been much work on deriving intrinsic priors for many models, mainly due to inherent difficulties in calculating the expected value of the marginal distribution of the data; this is especially the case for discrete distributions. Bayarri and García-Donato [16] conducted a Bayesian analysis to test the zero-inflated parameter in the ZIP distribution, a setting in which intrinsic priors are not readily available. Sivaganesan and Jiang [17] developed default Bayesian procedures for testing the mean of the Poisson distribution and derived several intrinsic priors for the regular Poisson distribution. Recently, Han et al. [18] utilized an approximation method to derive intrinsic priors for testing the Poisson count parameter when the underlying distribution follows a ZIP. In this article, we employ approximation methods similar to those in Han et al. [18] to derive intrinsic priors for testing the zero-inflated parameter of the ZIP. Integral priors are also derived through an approximation method and compared with the intrinsic priors, where the zeta and hypergeometric functions are utilized to circumvent the complexity arising from infinite summations.
The rest of this paper is organized in the following manner. In Section 2, default Bayesian procedures for hypothesis testing and model selection are presented along with a method for deriving intrinsic and integral priors. Section 3 presents detailed procedures for testing the zero-inflated parameter of the ZIP, including an encompassing approach to handle two nonnested hypotheses. An extensive simulation study is performed to evaluate the plausibility of the results in Section 4. A yellow dust storm dataset and a book reading dataset are analyzed for illustration purposes to support our findings in Section 5. A likelihood ratio test is conducted to compare the results with Bayesian approaches for both simulation and real data analysis. Finally, we finish the article with concluding remarks in Section 6.

2. Default Procedures in Bayesian Testing

Consider q models or hypotheses H_1, …, H_q that are contending with each other, any of which may be a plausible model. If model H_j holds, the data x follow a parametric distribution with probability density (or mass) function f_j(x | θ_j), where θ_j is an unknown parameter for j = 1, 2, …, q. We assign the prior model probability p(H_j) that model H_j is the true model before the data are observed. Let Θ_j be the parameter space for θ_j, and let π_j(θ_j) be the prior density for θ_j under H_j. Then, the posterior probability that H_j is the true model can be expressed as
$$\Pr(H_j\,;\,x) = \frac{p(H_j)\, m_j(x)}{\sum_{i=1}^{q} p(H_i)\, m_i(x)},$$
where m_j(x) = ∫_{Θ_j} f_j(x | θ_j) π_j(θ_j) dθ_j is called the marginal or predictive density of X under model H_j, j = 1, 2, …, q. Subsequently, for the given data x, the model with the largest posterior probability in (1) can be regarded as the most plausible. On the other hand, we can define the Bayes factor of model H_j to model H_i as
$$B_{ji}(x) = \frac{m_j(x)}{m_i(x)} = \frac{\int_{\Theta_j} f_j(x \mid \theta_j)\, \pi_j(\theta_j)\, d\theta_j}{\int_{\Theta_i} f_i(x \mid \theta_i)\, \pi_i(\theta_i)\, d\theta_i},$$
that is, the Bayes factor is the ratio of two marginal densities. Kass and Raftery [19] suggested a scale for interpreting Bayes factors as a selection measure.
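As a small numerical illustration of (1), the following R sketch converts marginal densities into posterior model probabilities; the values of m_j(x) below are made up, and equal prior model probabilities are assumed.

```r
# Posterior model probabilities in (1); the marginal values m_j(x) below are
# hypothetical, and equal prior model probabilities p(H_j) are assumed.
post_prob <- function(m, p = rep(1 / length(m), length(m))) {
  p * m / sum(p * m)
}
post_prob(m = c(0.012, 0.031, 0.004))
# 0.2553191 0.6595745 0.0851064
```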
Ideally, one would impose proper or informative priors on each model. However, especially in an objective Bayesian framework, the use of a noninformative (improper) prior is often desirable to match or compare with the frequentist approach (Reid et al. [20]; Datta and Mukerjee [21]). Suppose we have two models, denoted by H_0 and H_1, and we wish to determine which is more plausible. Further, let π_j^N(θ_j) (j = 0, 1) be the improper prior density. Then, the Bayes factor in (2) can be expressed as
$$B_{10}^{N}(x) = \frac{m_1^{N}(x)}{m_0^{N}(x)} = \frac{\int_{\Theta_1} f_1(x \mid \theta_1)\, \pi_1^{N}(\theta_1)\, d\theta_1}{\int_{\Theta_0} f_0(x \mid \theta_0)\, \pi_0^{N}(\theta_0)\, d\theta_0}.$$
Notice that we put an N in the superscript to denote 'noninformative'. A noninformative prior π_j^N(θ_j) is typically improper, and thus the density is defined only up to an arbitrary constant c_j. Consequently, the Bayes factor in (3) is determined only up to the arbitrary ratio c_1/c_0, making it not well defined. This motivated Berger and Pericchi [10] to use a part of the data, called a training sample, to circumvent the indeterminacy. Formally, let x(l) be a minimal training sample in the sense that both marginals m_0(x(l)) and m_1(x(l)) are finite, and no proper subset of x(l) produces finite marginals. Berger and Pericchi [10] proposed using the intrinsic Bayes factor as a default Bayes factor of the following form:
$$B_{10}^{D}(x) = B_{10}^{N}(x) \cdot CF_{01}^{A},$$
where CF_01^A is often called the correction factor. Recall that B_10^N(x) is defined in (3) and is calculated with the full data x. Moreover, the Bayes factor B_10^D(x) is well defined because the arbitrary constants are removed. Note that there are two types of correction factors, based on arithmetic and geometric approaches, respectively; we only focused on the arithmetic approach in our analysis. By Berger and Pericchi [10], the arithmetic intrinsic Bayes factor (AIBF) of H_1 to H_0 is given by (4), where
$$CF_{01}^{A} = \frac{1}{L} \sum_{l=1}^{L} B_{01}^{N}\bigl(x(l)\bigr).$$
Here, L is the number of all possible minimal training samples. In Section 3, we present our main results for testing the zero-inflated parameter when the underlying distribution follows a zero-inflated Poisson distribution.
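In computational terms, (4) and (5) simply multiply the full-data Bayes factor by the average of B_01^N over the training samples. A minimal R sketch is given below; the numerical inputs are made up, and in practice B_10^N(x) and the B_01^N(x(l)) values come from the marginal densities of the competing models.

```r
# Arithmetic IBF of H1 to H0: full-data Bayes factor times the average of
# B01^N over the L minimal training samples (inputs below are hypothetical).
aibf <- function(B10_full, B01_train) {
  B10_full * mean(B01_train)
}
aibf(B10_full = 3.2, B01_train = c(0.41, 0.35, 0.52, 0.47))
# 1.4
```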
Meanwhile, Peréz and Berger [22] defined the expected posterior prior for θ i as
$$\pi_i^{*}(\theta_i) = \int p_i^{N}(\theta_i \mid x^{*})\, m^{*}(x^{*})\, dx^{*},$$
where x* denotes an imaginary minimal training sample and p_i^N(θ_i | x*) is the posterior based on x*. Note that m* could be the marginal or some other function; it is not fixed and can be chosen from various candidates (see Peréz and Berger [22] for selection schemes for m*). As such, another default Bayes factor can be expressed as
$$B_{ji}^{*}(x) = \frac{\int m_j^{N}(x \mid x^{*})\, m^{*}(x^{*})\, dx^{*}}{\int m_i^{N}(x \mid x^{*})\, m^{*}(x^{*})\, dx^{*}}.$$
Notice that the numerator and denominator share the same function m* in (5). The integral prior, however, takes a different approach by constructing the two priors with distinct m* functions. Further details on this methodology are provided in Section 3.3.

3. Testing for a Zero-Inflated Parameter in the ZIP

3.1. Default Bayes Testing for ω

Consider a random variable X having a zero-inflated Poisson distribution with the following probability mass function:
$$f(x \mid \omega, \lambda) = \begin{cases} \omega + (1-\omega)\, e^{-\lambda}, & x = 0, \\[4pt] (1-\omega)\, \dfrac{e^{-\lambda} \lambda^{x}}{x!}, & x = 1, 2, \ldots, \end{cases}$$
where ω is often called the zero-inflated parameter. For convenience, we denote the distribution in (6) by ZIP(ω, λ). Let X = (X_1, X_2, …, X_n) be a random sample from ZIP(ω, λ). We want to test
$$H_0: \omega = \omega_0 \quad \text{versus} \quad H_1: \omega \neq \omega_0,$$
where ω_0 is a given value. Let α denote the number of zero observations, i.e., α = Σ_{i=1}^{n} I(X_i = 0), and let s = Σ_{i=1}^{n} X_i denote the sum of all observations. For the observed data x, the likelihood functions under H_0 and H_1 are given, respectively, by
$$L_0(x \mid \lambda) \propto \left[\omega_0 + (1-\omega_0)\, e^{-\lambda}\right]^{\alpha} (1-\omega_0)^{\,n-\alpha}\, e^{-(n-\alpha)\lambda}\, \lambda^{s}, \qquad L_1(x \mid \omega, \lambda) \propto \left[\omega + (1-\omega)\, e^{-\lambda}\right]^{\alpha} (1-\omega)^{\,n-\alpha}\, e^{-(n-\alpha)\lambda}\, \lambda^{s}.$$
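For reference, the following R sketch encodes the pmf in (6) and evaluates the log-likelihoods corresponding to (8) directly from it; the constant factorial terms omitted from (8) are retained here, which does not affect likelihood ratios. The function names are ours.

```r
# ZIP probability mass function in (6); w is omega, lam is lambda.
dzip <- function(x, w, lam) {
  ifelse(x == 0, w + (1 - w) * exp(-lam), (1 - w) * dpois(x, lam))
}

# Log-likelihoods corresponding to (8), evaluated from the pmf itself
# (the constant factorial terms dropped in (8) are included here).
loglik0 <- function(lam, x, w0) sum(log(dzip(x, w0, lam)))  # under H0
loglik1 <- function(w, lam, x) sum(log(dzip(x, w, lam)))    # under H1

sum(dzip(0:100, w = 0.3, lam = 2))   # ~1: (6) defines a proper pmf
```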
We consider noninformative priors under both H_0 and H_1 as starting priors, with ω and λ taken to be independent a priori under H_1; that is,
$$\pi_0^{N}(\lambda) = \lambda^{-c}, \quad \lambda > 0, \qquad \text{and} \qquad \pi_1^{N}(\omega, \lambda) = \lambda^{-c}, \quad 0 < \omega < 1,\ \lambda > 0,$$
where c can be chosen flexibly. Based on the full data x, the marginal under H_0 is
$$m_0(x) = \Gamma(\alpha+1)\,\Gamma(s-c+1) \sum_{j=0}^{\alpha} \frac{\omega_0^{\,j}\,(1-\omega_0)^{\,\alpha-j}}{(n-j)^{\,s-c+1}\,\Gamma(j+1)\,\Gamma(\alpha-j+1)},$$
and the marginal under H 1 is
$$m_1(x) = \Gamma(s-c+1) \sum_{j=0}^{\alpha} \frac{\Gamma(n-j+1)}{(n-j)^{\,s-c+1}\,\Gamma(\alpha-j+1)}.$$
Thus, the Bayes factor B_10^N(x) based on the full sample x is the ratio of the two marginals given in (9) and (10). To calculate the correction factor, we take one zero observation and one nonzero observation x_l from the ZIP sample as a training sample, denoted by x(l) = (0, x_l). Note that the marginals based on x_l alone are already finite, which implies that x(l) is not minimal. However, if only a nonzero observation were used as the training sample, the whole procedure would rely not on the ZIP distribution but on the regular Poisson distribution. Thus, we utilize x(l) as a training sample even though it is not minimal. Subsequently, the likelihood functions under H_0 and H_1 based on the training sample x(l) are given, respectively, by
$$L_0\bigl(x(l) \mid \lambda\bigr) = \left[\omega_0 + (1-\omega_0)\, e^{-\lambda}\right] (1-\omega_0)\, \frac{e^{-\lambda} \lambda^{x_l}}{x_l!}$$
and
$$L_1\bigl(x(l) \mid \omega, \lambda\bigr) = \left[\omega + (1-\omega)\, e^{-\lambda}\right] (1-\omega)\, \frac{e^{-\lambda} \lambda^{x_l}}{x_l!}.$$
After straightforward algebra, the correction factor for the AIBF is given by
$$\mathrm{CFA}_{01} = \frac{6(1-\omega_0)}{n-\alpha} \sum_{l=1}^{n-\alpha} \frac{\omega_0 + (1-\omega_0)\, 2^{-(x_l - c + 1)}}{1 + 2^{-(x_l - c)}}.$$
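The correction factor displayed above is simply an average over the nonzero observations x_l, so it is inexpensive to evaluate. The R sketch below follows that expression as reconstructed here, with c = 1/2 as the default; the function name and the example data are ours.

```r
# Correction factor CFA_01 as displayed above: an average over the nonzero
# observations x_l of the training-sample Bayes factors B01^N(x(l)).
cfa01 <- function(x, w0, cc = 1/2) {
  xl <- x[x > 0]
  6 * (1 - w0) *
    mean((w0 + (1 - w0) * 2^(-(xl - cc + 1))) / (1 + 2^(-(xl - cc))))
}
cfa01(x = c(0, 0, 1, 3, 2, 0, 5), w0 = 0.2)   # roughly 1.17
```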

3.2. Intrinsic Prior

Recall that the AIBF depends on the training sample and is well defined because unspecified constants in starting improper priors are canceled out. However, the AIBF could require heavy computation in cases where the size of the training sample is large. As such, heavy computation would be relieved if reasonable priors are available to directly utilize the full likelihood. Such a prior is called an intrinsic prior (Berger and Pericchi [10]). We state the required formulae under our setup (see Berger and Pericchi [10] and Moreno and Pericchi [23] for more details) to present the main results regarding intrinsic priors in conjunction with the AIBF. Note that there are two types of approaches for deriving intrinsic priors: direct and conditional approaches. We only focused on the direct approach. A set of intrinsic priors is given by
$$\pi_0^{I}(\lambda) = \pi_0^{N}(\lambda), \qquad \pi_1^{I}(\omega, \lambda) = \pi_1^{N}(\omega, \lambda)\; E_{(\omega,\lambda)}^{H_1}\!\left[ B_{01}^{N}\bigl(x(l)\bigr) \right].$$
Proposition 1.
The intrinsic prior for ( ω , λ ) under H 1 based on the direct approach is
$$\pi_1^{I}(\omega, \lambda) = \frac{6(1-\omega_0)\, \lambda^{-c} e^{-\lambda}}{1 - e^{-\lambda}} \left[ -2^{c} \sum_{x=1}^{k} \frac{\lambda^{x}}{x!} \cdot \frac{1}{2^{x} + 2^{c}} \left( \omega_0 + \frac{1-\omega_0}{2^{\,x-c+1}} \right) + \omega_0 \left( e^{\lambda} - 1 \right) + (1-\omega_0)\, 2^{\,c-1} \left( e^{\lambda/2} - 1 \right) \right],$$
where k is a given constant.
We provide some empirical calculations of the summand in (A1), which appears in the proof of Proposition 1, to justify the use of the approximation. Let c = 1/2 for simplicity, and let
$$g(x) = \frac{\lambda^{x}}{x!} \cdot \frac{\omega_0\, 2^{x} + (1-\omega_0)/\sqrt{2}}{2^{x} + \sqrt{2}}$$
be a function of x that should be manipulated through a reasonable approximation. Further, let
$$g^{*}(x) = \frac{(\lambda/2)^{x}}{x!} \left( \omega_0\, 2^{x} + \frac{1-\omega_0}{\sqrt{2}} \right)$$
be another function used for the approximation, in which the √2 term in the denominator of g(x) is dropped. Figure 1 shows the two functions g(x) and g*(x) when ω_0 = 0.8 and λ = 4 for comparison purposes. The two functions differ noticeably for x < 5, while the difference essentially vanishes for x > 5. We calculated the differences between g*(x) and g(x) to check whether the approximation is plausible. Table 1 provides the values of the differences when ω_0 is 0.2 and 0.8 and λ is 3 and 4 for x = 13, 14, …, 18. We can see that the difference increases as ω_0 increases, and it also increases as λ increases.
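The quantities in Table 1 can be verified numerically. The R sketch below implements g(x) and g*(x) exactly as displayed above (with c = 1/2) and reproduces the first row of the table; the function names are ours.

```r
# Approximation error g*(x) - g(x) for c = 1/2 (compare with Table 1).
g      <- function(x, w0, lam) lam^x / factorial(x) *
  (w0 * 2^x + (1 - w0) / sqrt(2)) / (2^x + sqrt(2))
g_star <- function(x, w0, lam) (lam / 2)^x / factorial(x) *
  (w0 * 2^x + (1 - w0) / sqrt(2))

signif(g_star(13:18, w0 = 0.2, lam = 3) - g(13:18, w0 = 0.2, lam = 3), 3)
# 8.84e-09 9.47e-10 9.47e-11 8.88e-12 7.83e-13 6.53e-14
```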

3.3. Integral Prior

As mentioned in Section 2, Cano et al. [13] drew inspiration from the ideas of Peréz and Berger [22] to develop the concept of the integral prior. The proposed forms of the integral prior of each hypothesis are
$$\pi_0^{T}(\theta_0) = \int_{\chi} p_0^{N}(\theta_0 \mid x^{*})\, m_1(x^{*})\, dx^{*}$$
and
$$\pi_1^{T}(\theta_1) = \int_{\chi} p_1^{N}(\theta_1 \mid x^{*})\, m_0(x^{*})\, dx^{*}.$$
Unlike in Peréz and Berger [22], the function m* in (5) can here be specified as two distinct functions. Specifically, m_0(x*) in (14) is obtained using the default prior, while the resulting π_1^T(θ_1) is subsequently used to derive m_1(x*) in (13).
Proposition 2.
The integral prior for ( ω , λ ) under H 1 is
$$\pi_1^{T}(\omega, \lambda) = 6 \left[ \omega + (1-\omega)\, e^{-\lambda} \right] (1-\omega)\, e^{-\lambda} \lambda^{-c} \left[ e^{\lambda} - 1 + \sum_{y=1}^{k} \frac{\lambda^{y}}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1} + 2}\, \zeta(y-c+1) - 1 \right\} \right],$$
where k is a given constant and ζ ( s ) is called the zeta function, which is defined as
$$\zeta(s) = \sum_{k=1}^{\infty} \frac{1}{k^{s}}.$$
Similar to Proposition 1, an approximation method was utilized to obtain the result. For the purposes of consistency and simplicity in the analysis, we assumed c = 1 / 2 . Let
$$g_1(y) = \frac{\lambda^{y}}{y!} \cdot \frac{2^{\,y+1/2}}{2^{\,y+1/2} + 2}\, \zeta(y + 1/2)$$
be a function of y that should be manipulated through a reasonable approximation. Let
$$g_1^{*}(y) = \frac{\lambda^{y}}{y!}$$
be another function used for the approximation. Figure 2 shows the two functions g_1(y) and g_1*(y) when λ = 4, for the same comparison purposes as in Proposition 1.
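As a computational note, the prior in Proposition 2 is easy to evaluate once the zeta function is available; the pracma library mentioned in Section 4 provides zeta(). The sketch below uses c = 1/2 and the truncation point k as arguments; the function and argument names are ours.

```r
library(pracma)  # zeta()

# Integral prior pi_1^T(omega, lambda) of Proposition 2 (c = 1/2 by default),
# with the series truncated at k terms.
pi1_T <- function(w, lam, cc = 1/2, k = 15) {
  y <- 1:k
  s <- y - cc + 1
  term <- lam^y / factorial(y) * (2^s / (2^s + 2) * sapply(s, zeta) - 1)
  6 * (w + (1 - w) * exp(-lam)) * (1 - w) * exp(-lam) * lam^(-cc) *
    (exp(lam) - 1 + sum(term))
}
pi1_T(w = 0.3, lam = 2)
```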
Proposition 3.
The integral prior for λ under H 0 is
$$\pi_0^{T}(\lambda) = \left[ \omega_0 + (1-\omega_0)\, e^{-\lambda} \right] e^{-\lambda} \lambda^{-c} \left[ \sum_{y=1}^{k} \kappa_1(y) \left\{ \kappa_2(x, y) + \kappa_3(y) \right\} + \sum_{x=1}^{k''} \kappa_4(x) + \frac{1}{\omega_0} \left( e^{\lambda} - 1 \right) \right],$$
where
$$\begin{aligned}
\kappa_1(y) &= \frac{1}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1}+2}\, \zeta(y-c+1) - 1 \right\},\\[4pt]
\kappa_2(x, y) &= \sum_{x=1}^{k'} \frac{\lambda^{x}}{x!}\, \frac{\Gamma(x+y-c+1)}{\Gamma(x-c+1)} \left[ \frac{2^{\,x-c+1}}{\omega_0\, 2^{\,x-c+1} + (1-\omega_0)} \left( 3\zeta(x+y-c+1) - 3 - \frac{1}{2^{\,x+y-c}} \right) - \frac{1}{\omega_0} \left( \frac{1}{2^{\,x+y-c+1}} + \frac{1}{3^{\,x+y-c}} \right) \right],\\[4pt]
\kappa_3(y) &= \frac{1}{\omega_0\, 2^{\,y-c+1}}\, \frac{\Gamma(y-c+1)}{\Gamma(-c+1)} \left\{ {}_1F_1(y-c+1;\, -c+1;\, \lambda/2) - 1 \right\} + \frac{1}{\omega_0\, 3^{\,y-c+1}}\, \frac{\Gamma(y-c+1)}{\Gamma(-c+1)} \left\{ {}_1F_1(y-c+1;\, -c+1;\, \lambda/3) - 1 \right\},\\[4pt]
\kappa_4(x) &= \frac{\lambda^{x}}{x!} \left[ \frac{2^{\,x-c+1} + 2}{\omega_0\, 2^{\,x-c+1} + (1-\omega_0)} - \frac{1}{\omega_0} \right].
\end{aligned}$$
Here, k, k′, and k″ are given constants, and _pF_q(a_1, a_2, …, a_p; b_1, b_2, …, b_q; z) is called a hypergeometric function and is defined as
$${}_pF_q(a_1, a_2, \ldots, a_p;\, b_1, b_2, \ldots, b_q;\, z) = \sum_{n=0}^{\infty} \frac{a_1^{\overline{n}}\, a_2^{\overline{n}} \cdots a_p^{\overline{n}}}{b_1^{\overline{n}}\, b_2^{\overline{n}} \cdots b_q^{\overline{n}}}\, \frac{z^{n}}{n!}.$$
Note that a^(n̄) is called the rising factorial and is defined as $a^{\overline{n}} = \prod_{k=1}^{n} (a + k - 1)$.
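In our computations the ₁F₁ terms in κ_3 were evaluated with the hypergeo library mentioned in Section 4, whose genhypergeo() function evaluates the series above. A quick sanity check against the known identity ₁F₁(1; 2; z) = (e^z − 1)/z:

```r
library(hypergeo)  # genhypergeo() evaluates the pFq series defined above

# Sanity check against the identity 1F1(1; 2; z) = (exp(z) - 1) / z
z <- 2
Re(genhypergeo(U = 1, L = 2, z = z))  # 3.194528
(exp(z) - 1) / z                      # 3.194528
```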
Two approximations are provided in the proof of Proposition 3. To proceed with the first approximation, let
$$g_2(x) = \frac{\lambda^{x}}{x!}\, \frac{\Gamma(x+y+1/2)}{\Gamma(x+1/2)}\, \frac{2^{\,x+1/2}}{\omega_0\, 2^{\,x+1/2} + (1-\omega_0)} \left( 3\zeta(x+y+1/2) - 3 - \frac{1}{2^{\,x+y-1/2}} \right)$$
be a function of x that should be manipulated through a reasonable approximation. Let
$$g_2^{*}(x) = \frac{\lambda^{x}}{x!}\, \frac{\Gamma(x+y+1/2)}{\Gamma(x+1/2)}\, \frac{1}{\omega_0} \left( \frac{1}{2^{\,x+y+1/2}} + \frac{1}{3^{\,x+y-1/2}} \right)$$
be another function used for approximation. Figure 3 shows the two functions g 2 ( x ) and g 2 * ( x ) when ω 0 = 0.8 and λ = 4 for comparison purposes. On the other hand, we noted that the second approximation closely resembled the approximation method used in Proposition 1.
We set k = k′ = k″ = 15 for the intrinsic and integral priors in Propositions 1, 2, and 3 in both the simulations and the real data analyses.

3.4. An Encompassing Approach

Consider two nonnested hypotheses regarding ω based on a random sample from ZIP(ω, λ); that is, we are interested in testing
$$H_{E1}: \omega < \omega_0 \quad \text{versus} \quad H_{E2}: \omega > \omega_0.$$
Since one cannot determine which model is more complex, Berger and Pericchi [10] suggested applying a global model, often called the encompassing model. Kim and Sun [24] conducted multiple tests on the power law process model using the encompassing approach. Sivaganesan and Jiang [17] used the encompassing model for testing the Poisson mean.
Let H_E0: ω ≠ ω_0 be the encompassing model. Then, the encompassing arithmetic Bayes factor of H_E2 to H_E1 is defined as
$$B_{21}^{EI}(x) = B_{21}^{N}(x)\; \frac{\mathrm{CFA}_{10}^{E}}{\mathrm{CFA}_{20}^{E}},$$
where B_21^N(x) is the Bayes factor of H_E2 to H_E1 based on the full data x with improper priors, and the correction factors are expressed as
$$\mathrm{CFA}_{10}^{E} = \frac{1}{n-\alpha} \sum_{l=1}^{n-\alpha} \frac{\int_{0}^{\infty}\!\! \int_{0}^{\omega_0} A(\omega, \lambda)\, d\omega\, d\lambda}{\int_{0}^{\infty}\!\! \int_{0}^{1} A(\omega, \lambda)\, d\omega\, d\lambda}$$
and
$$\mathrm{CFA}_{20}^{E} = \frac{1}{n-\alpha} \sum_{l=1}^{n-\alpha} \frac{\int_{0}^{\infty}\!\! \int_{\omega_0}^{1} A(\omega, \lambda)\, d\omega\, d\lambda}{\int_{0}^{\infty}\!\! \int_{0}^{1} A(\omega, \lambda)\, d\omega\, d\lambda},$$
where
$$A(\omega, \lambda) = \left[ \omega + (1-\omega)\, e^{-\lambda} \right] (1-\omega)\, e^{-\lambda}\, \lambda^{\,x_l - c}.$$
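The correction factors above only involve two-dimensional integrals of A(ω, λ), which can be computed numerically. The sketch below uses nested calls to stats::integrate() rather than the cubature library used in our analysis; the function and argument names are ours, and c = 1/2 is assumed.

```r
# A(omega, lambda) for one nonzero training observation x_l (c = 1/2).
A <- function(w, lam, xl, cc = 1/2) {
  (w + (1 - w) * exp(-lam)) * (1 - w) * exp(-lam) * lam^(xl - cc)
}

# Integral of A over 0 < lambda < Inf and w_lo < omega < w_hi.
int_region <- function(xl, w_lo, w_hi, cc = 1/2) {
  inner <- Vectorize(function(w)
    integrate(function(lam) A(w, lam, xl, cc), 0, Inf)$value)
  integrate(inner, w_lo, w_hi)$value
}

# CFA^E_10 and CFA^E_20: averages over the nonzero observations x_l.
cfa_E <- function(x_nonzero, w0, cc = 1/2) {
  den   <- sapply(x_nonzero, int_region, w_lo = 0,  w_hi = 1,  cc = cc)
  num10 <- sapply(x_nonzero, int_region, w_lo = 0,  w_hi = w0, cc = cc)
  num20 <- sapply(x_nonzero, int_region, w_lo = w0, w_hi = 1,  cc = cc)
  c(CFA10 = mean(num10 / den), CFA20 = mean(num20 / den))
}
cfa_E(x_nonzero = c(1, 2, 2, 4), w0 = 0.6)
```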
Results for both simulations and real data analysis are presented in Section 4 and Section 5 to assess the model adequacy.

4. Simulation Studies

In this section, we conduct Monte Carlo simulations to evaluate the performance of the default Bayes factors in testing the null hypothesis H_0: ω = ω_0 against the alternative H_1: ω ≠ ω_0 when the underlying distribution follows a zero-inflated Poisson distribution with parameters λ and ω. We generate data with four values of ω: 0.2, 0.4, 0.6, and 0.8. We used two values of λ, 3 and 4, to identify any effect of λ. Regarding sample sizes, we used n = 30, 50, and 100 to check asymptotic behavior. We calculated the default Bayes factor B_10^D(x) defined in (4) along with the two Bayes factors calculated with the intrinsic prior in (12) and the integral priors in (15) and (16), which are based only on the full sample x and are denoted by B_10^I and B_10^T, respectively. We utilized R for our analysis, employing the parallel, doParallel, and foreach libraries for parallel processing. The pracma library was used to compute the Riemann zeta function, the hypergeo library for the hypergeometric function, and the cubature library for numerical integration.
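For completeness, a sketch of the data-generating step and of the likelihood ratio test used as the frequentist benchmark is given below. The function names are ours, and the test statistic is compared with a chi-squared distribution with one degree of freedom.

```r
# Simulate ZIP(omega, lambda) data and compute the LRT p-value for
# H0: omega = omega0 (chi-squared reference with 1 degree of freedom).
rzip <- function(n, w, lam) ifelse(rbinom(n, 1, w) == 1, 0L, rpois(n, lam))

nll <- function(w, lam, x) {                 # negative ZIP log-likelihood
  -sum(log(w * (x == 0) + (1 - w) * dpois(x, lam)))
}

lrt_pvalue <- function(x, w0) {
  l0 <- -optimize(function(l) nll(w0, l, x), c(1e-6, 50))$objective
  l1 <- -optim(c(0.5, max(mean(x), 0.1)),
               function(p) nll(p[1], p[2], x),
               method = "L-BFGS-B",
               lower = c(1e-6, 1e-6), upper = c(1 - 1e-6, 50))$value
  pchisq(2 * (l1 - l0), df = 1, lower.tail = FALSE)
}

set.seed(1)
x <- rzip(50, w = 0.4, lam = 3)
lrt_pvalue(x, w0 = 0.2)
```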
Table 2 provides the simulated averages and standard deviations (in parentheses) of the three Bayes factors based on 1000 simulated datasets, after trimming the top and bottom 0.5% as outliers. We also provide the median of the p-values based on the likelihood ratio test (LRT) to compare our Bayesian approach with the frequentist approach. Next, we provide four proportions based on the 1000 replications: the proportion for which B_10^D > 1 (supporting H_1 based on the Bayes factor), the proportion for which p_LR < 0.05 (supporting H_1 based on the LRT), the proportion for which both the Bayesian and frequentist approaches support the true model (denoted by P_1), and the proportion for which the Bayesian and frequentist approaches reach the same conclusion (denoted by P_2). First, we see that the direct approach works almost perfectly, in the sense that the two Bayes factors B_10^D and B_10^I are very close to each other on average; there is a small discrepancy between B_10^D and B_10^T. Second, the results show that the proposed approach based on Bayes factors effectively identifies the true model, and the proportion supporting the true model increases with the sample size in most of the cases considered here. In particular, when the difference between ω_0 and ω is 0.2, the proportion P_1 increases rapidly as the sample size grows, indicating that larger samples allow the true model to be identified even when ω_0 and ω are not far apart. Regarding the comparison with the frequentist approach based on the LRT, the proposed approach with Bayes factors provides comparable results in identifying the true model. Finally, there does not seem to be a large difference between λ = 3 and λ = 4, though the results were slightly better when λ = 4 than when λ = 3 for all of the cases considered here.
Another simulation study was carried out for testing H_E1: ω < ω_0 versus H_E2: ω > ω_0 to assess the effectiveness of the encompassing approach presented in Section 3.4 by calculating the encompassing arithmetic Bayes factor defined in (17). We used three ω_0 values (0.5, 0.6, and 0.7) and generated data by adding and subtracting 0.1 and 0.2 from ω_0. We also used the same λ values and sample sizes that were considered in the previous simulation setup. We report the posterior probability Pr(H_E2 ; x) defined in (1) instead of the Bayes factors so as to promptly check which of the two models identifies the truth. Table 3 provides the results for the posterior probabilities under different configurations of parameters and different sample sizes. Overall, the simulated data effectively support the true model. Specifically, the results are slightly better in terms of the posterior probabilities when ω_0 is smaller. There did not appear to be large differences when λ changed.

5. Real Data Analysis

In this section, we demonstrate our proposed methodologies under the ZIP model with two real datasets possessing significant zero inflation patterns across time intervals. We calculated the default Bayes factors to see if they are close to each other. In addition, p-values based on the likelihood ratio test are reported for comparison purposes.

5.1. Yellow Dust Storm Data

We examined a dataset representing the number of yellow dust storms that occurred in South Korea during six-month intervals from the first half of 2003 to the second half of 2022. The sand and dust particles are mainly derived from the deserts in China and Mongolia, and they are carried by westerly winds from the Yangtze River, particularly in the spring season. Therefore, the frequency of yellow dust storms is dramatically higher during the first half of the year. We selected Incheon, a city in South Korea that is directly affected by yellow dust storms because it is only 360 km (about 224 miles) from mainland China. There are 10 zero observations out of n = 40. The data are publicly accessible from the Korea Meteorological Administration website. Table 4 shows the results of testing H_0: ω = ω_0 with values of ω_0 varying from 0.2 to 0.8, to see whether the Bayes factors and the p-values effectively support the true model. We present the three Bayes factors along with the p-value, as in the simulation study. The empirical proportion of zero observations is 25%, and indeed all three Bayes factors attain their minimum at ω_0 = 0.25. Meanwhile, the p-value attains its maximum of 0.942 at ω_0 = 0.25 and decreases as ω_0 moves away from 0.25.

5.2. Book Reading Data

In this subsection, a reading dataset is employed to demonstrate the performance of the proposed methodologies as an illustration. The Korean Ministry of Culture, Sports, and Tourism has been conducting annual national reading surveys since 1993. The survey targets 6000 adults older than 19 years living in South Korea and asks the following: “How many paper books did you read from September 2020 to August 2021, excluding textbooks, reference books, and test preparation books?”. We extracted the data for male residents of Gyeonggi Province and gathered a total of 338 samples, 55.5% of which were zero observations. Table 5 shows the results for testing ω in a similar fashion to Section 5.1. All three Bayes factors attain their minimum at ω_0 = 0.55. On the other hand, the p-value attains its maximum of 0.959 at ω_0 = 0.55, as expected. Since we used a fairly large sample of 338, a 5% shift away from 0.55 already yields much smaller p-values of 0.057 and 0.067 when ω_0 is 0.5 and 0.6, respectively. The same phenomenon can be observed for the Bayes factors.

6. Concluding Remarks

In this paper, we presented Bayesian testing procedures on the zero-inflated parameter in the zero-inflated Poisson distribution. We utilized a training sample to calculate the arithmetic intrinsic Bayes factor. Two types of priors were derived based on reasonable approximations. It turned out that the results were promising with the intrinsic prior outperforming the integral priors. We also tested nonnested hypotheses using the encompassing approach, and the simulation results showed that the true model was adequately captured. The proposed Bayesian approach and the existing frequentist approach with the likelihood ratio test were compared, and simulation studies yielded comparable results. Finally, two real datasets were analyzed to demonstrate practical applicability.
Since the Poisson distribution has the same mean and variance, discrepancies often arise as the actual data frequently exhibit different mean and variance values; this would be the most notable characteristic and limitation of the Poisson distribution. However, because of its inherent properties, the zero-inflated Poisson distribution naturally allows for a variance greater than the mean. Consequently, even when overdispersion occurs due to the presence of outliers in the data, the ZIP distribution can accommodate such variability to some extent.
There are some drawbacks in our approach that warrant further attention. While the three Bayes factors yielded similar values under the ZIP framework, the difference between the AIBF and the ordinary Bayes factor with the intrinsic prior would be expected to decrease in a consistent pattern as the sample size increases; however, this consistency was not perfectly observed. Analysis with extreme values or outliers is somewhat hindered by overflow in the numerical integration, so there could be some limitations in the calculations when a dataset contains such values. In addition, as seen in the simulation studies, the results are better for larger values of λ, but a detailed explanation of this behavior is not yet available and requires further investigation. For future research, it is of interest to consider a two-sample problem in which both the Poisson count parameter and the zero-inflated parameter are tested, with default Bayesian procedures proposed in conjunction with intrinsic and integral priors. Such research is in progress, and we hope to report the results in a future paper. Furthermore, it might be possible to find intrinsic and integral priors for the encompassing approach under our setup; this may pose considerable challenges for future research.

Author Contributions

Conceptualization, J.H., K.K. and S.W.K.; methodology, J.H.; software, J.H.; validation, J.H., K.K. and S.W.K.; formal analysis, J.H.; investigation, J.H.; resources, J.H.; data curation, K.K.; writing—original draft preparation, J.H. and K.K.; writing—review and editing, J.H., K.K. and S.W.K.; visualization, J.H.; supervision, S.W.K.; project administration, S.W.K.; funding acquisition, K.K. and S.W.K. All authors have read and agreed to the published version of the manuscript.

Funding

K. Kim’s research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), which is funded by the Ministry of Education (NRF-202400000003334). S. W. Kim’s research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), which is funded by the Ministry of Education (NRF-2021R1A2C1005271).

Data Availability Statement

The original data presented in the study are openly available in the Korea Meteorological Administration at http://www.weather.go.kr and the Korean Ministry of Culture, Sports, and Tourism at https://mdis.kostat.go.kr/index.do.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

Proof of Proposition 1.
Note that the nonzero training observation x_l follows a zero-truncated Poisson distribution with parameter λ. As such, we have
$$E\left[ B_{01}^{N}\bigl(x(l)\bigr) \right] = \sum_{x=1}^{\infty} \frac{6(1-\omega_0)\left[ \omega_0 + (1-\omega_0)\, 2^{-(x-c+1)} \right]}{1 + 2^{-(x-c)}} \cdot \frac{e^{-\lambda} \lambda^{x}}{x!\, \left(1 - e^{-\lambda}\right)} = \frac{6(1-\omega_0)\, e^{-\lambda}}{1 - e^{-\lambda}} \sum_{x=1}^{\infty} \frac{\lambda^{x}}{x!} \cdot \frac{\omega_0\, 2^{x} + (1-\omega_0)\, 2^{\,c-1}}{2^{x} + 2^{c}}.$$
Now, let
$$q(\lambda) = \sum_{x=1}^{\infty} \frac{\lambda^{x}}{x!} \cdot \frac{\omega_0\, 2^{x} + (1-\omega_0)\, 2^{\,c-1}}{2^{x} + 2^{c}}.$$
Notice that q(λ) in (A1) does not have a closed form because of the 2^c term in the denominator of the summand. We modify the summand to achieve a closed form through an approximation. Rewrite q(λ) as
$$\begin{aligned}
q(\lambda) &= \sum_{x=1}^{k} \frac{\lambda^{x}}{x!} \cdot \frac{\omega_0 2^{x} + (1-\omega_0) 2^{c-1}}{2^{x} + 2^{c}} + \sum_{x=k+1}^{\infty} \frac{\lambda^{x}}{x!} \cdot \frac{\omega_0 2^{x} + (1-\omega_0) 2^{c-1}}{2^{x} + 2^{c}} \\
&\approx \sum_{x=1}^{k} \frac{\lambda^{x}}{x!} \cdot \frac{\omega_0 2^{x} + (1-\omega_0) 2^{c-1}}{2^{x} + 2^{c}} + \sum_{x=k+1}^{\infty} \frac{(\lambda/2)^{x}}{x!} \left[ \omega_0 2^{x} + (1-\omega_0) 2^{c-1} \right] \\
&= \sum_{x=1}^{k} \frac{\lambda^{x}}{x!} \cdot \frac{\omega_0 2^{x} + (1-\omega_0) 2^{c-1}}{2^{x} + 2^{c}} - \sum_{x=1}^{k} \frac{(\lambda/2)^{x}}{x!} \left[ \omega_0 2^{x} + (1-\omega_0) 2^{c-1} \right] + \sum_{x=1}^{\infty} \frac{(\lambda/2)^{x}}{x!} \left[ \omega_0 2^{x} + (1-\omega_0) 2^{c-1} \right] \\
&= -\sum_{x=1}^{k} \frac{\lambda^{x}}{x!} \cdot \frac{2^{c}}{\left(2^{x} + 2^{c}\right) 2^{x}} \left[ \omega_0 2^{x} + (1-\omega_0) 2^{c-1} \right] + \omega_0 \sum_{x=1}^{\infty} \frac{\lambda^{x}}{x!} + (1-\omega_0)\, 2^{c-1} \sum_{x=1}^{\infty} \frac{(\lambda/2)^{x}}{x!} \\
&= -2^{c} \sum_{x=1}^{k} \frac{\lambda^{x}}{x!} \cdot \frac{1}{2^{x} + 2^{c}} \left( \omega_0 + \frac{1-\omega_0}{2^{\,x-c+1}} \right) + \omega_0 \left( e^{\lambda} - 1 \right) + (1-\omega_0)\, 2^{\,c-1} \left( e^{\lambda/2} - 1 \right).
\end{aligned}$$
Thus, the result is readily obtained from the formula associated with the direct approach. □
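A quick numerical check of the final approximation (with c = 1/2 and k = 15): the truncated-plus-tail expression agrees with the raw series to within the magnitude of the tail terms reported in Table 1. The function names below are ours.

```r
# q(lambda): raw series (truncated far out) versus the approximation above.
q_exact <- function(lam, w0, cc = 1/2, xmax = 150) {
  x <- 1:xmax
  sum(lam^x / factorial(x) *
        (w0 * 2^x + (1 - w0) * 2^(cc - 1)) / (2^x + 2^cc))
}
q_approx <- function(lam, w0, cc = 1/2, k = 15) {
  x <- 1:k
  -2^cc * sum(lam^x / factorial(x) *
                (w0 + (1 - w0) / 2^(x - cc + 1)) / (2^x + 2^cc)) +
    w0 * (exp(lam) - 1) + (1 - w0) * 2^(cc - 1) * (exp(lam / 2) - 1)
}
q_exact(4, 0.8) - q_approx(4, 0.8)   # about -4e-09, i.e., negligible
```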
Proof of Proposition 2.
Note that the nonzero component y of the imaginary training sample follows a zero-truncated Poisson distribution with parameter λ. As such, we have
$$\begin{aligned}
\pi_1^{T}(\omega, \lambda) &= \sum_{y=1}^{\infty} \left[ \omega + (1-\omega) e^{-\lambda} \right] (1-\omega)\, e^{-\lambda}\, \frac{\lambda^{\,y-c}}{y!} \cdot \frac{6\, y!}{\Gamma(y-c+1)} \cdot \frac{2^{\,y-c+1}}{2^{\,y-c+1} + 2} \cdot \frac{1}{y!} \int_{0}^{\infty} \frac{e^{-\lambda}}{1 - e^{-\lambda}}\, \lambda^{\,y-c}\, d\lambda \\
&= 6 \left[ \omega + (1-\omega) e^{-\lambda} \right] (1-\omega)\, e^{-\lambda}\, \lambda^{-c} \sum_{y=1}^{\infty} \frac{\lambda^{y}}{y!} \cdot \frac{2^{\,y-c+1}}{2^{\,y-c+1} + 2} \cdot \frac{1}{\Gamma(y-c+1)} \int_{0}^{\infty} \lambda^{\,y-c} \sum_{i=1}^{\infty} e^{-i\lambda}\, d\lambda \\
&= 6 \left[ \omega + (1-\omega) e^{-\lambda} \right] (1-\omega)\, e^{-\lambda}\, \lambda^{-c} \sum_{y=1}^{\infty} \frac{\lambda^{y}}{y!} \cdot \frac{2^{\,y-c+1}}{2^{\,y-c+1} + 2} \cdot \frac{1}{\Gamma(y-c+1)} \sum_{i=1}^{\infty} \frac{\Gamma(y-c+1)}{i^{\,y-c+1}} \\
&= 6 \left[ \omega + (1-\omega) e^{-\lambda} \right] (1-\omega)\, e^{-\lambda}\, \lambda^{-c} \sum_{y=1}^{\infty} \frac{\lambda^{y}}{y!} \cdot \frac{2^{\,y-c+1}}{2^{\,y-c+1} + 2}\, \zeta(y-c+1) \\
&\approx 6 \left[ \omega + (1-\omega) e^{-\lambda} \right] (1-\omega)\, e^{-\lambda}\, \lambda^{-c} \left[ e^{\lambda} - 1 + \sum_{y=1}^{k} \frac{\lambda^{y}}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1} + 2}\, \zeta(y-c+1) - 1 \right\} \right].
\end{aligned}$$
Proof of Proposition 3.
To eliminate the nuisance parameter ω in π_1^T(ω, λ), we integrate it out with respect to ω. This results in π_1^T(λ) having the following expression:
$$\begin{aligned}
\pi_1^{T}(\lambda) &= \int_{0}^{1} 6 \left[ \omega + (1-\omega) e^{-\lambda} \right] (1-\omega)\, e^{-\lambda} \lambda^{-c} \left[ e^{\lambda} - 1 + \sum_{y=1}^{k} \frac{\lambda^{y}}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1}+2}\, \zeta(y-c+1) - 1 \right\} \right] d\omega \\
&= 6\, e^{-\lambda} \lambda^{-c} \left[ e^{\lambda} - 1 + \sum_{y=1}^{k} \frac{\lambda^{y}}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1}+2}\, \zeta(y-c+1) - 1 \right\} \right] \int_{0}^{1} \left[ \omega + (1-\omega) e^{-\lambda} \right] (1-\omega)\, d\omega \\
&= 6\, e^{-\lambda} \lambda^{-c} \left[ e^{\lambda} - 1 + \sum_{y=1}^{k} \frac{\lambda^{y}}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1}+2}\, \zeta(y-c+1) - 1 \right\} \right] \frac{1 + 2 e^{-\lambda}}{6} \\
&= \left( 1 + 2 e^{-\lambda} \right) e^{-\lambda} \lambda^{-c} \left[ e^{\lambda} - 1 + \sum_{y=1}^{k} \frac{\lambda^{y}}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1}+2}\, \zeta(y-c+1) - 1 \right\} \right].
\end{aligned}$$
Thus, the marginal m 1 ( x ) with π 1 T ( λ ) becomes
$$\begin{aligned}
m_1(x) &= \int_{0}^{\infty} \frac{e^{-\lambda}}{1 - e^{-\lambda}}\, \frac{\lambda^{x}}{x!} \left( 1 + 2 e^{-\lambda} \right) e^{-\lambda} \lambda^{-c} \left[ e^{\lambda} - 1 + \sum_{y=1}^{k} \frac{\lambda^{y}}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1}+2}\, \zeta(y-c+1) - 1 \right\} \right] d\lambda \\
&= \frac{1}{x!} \int_{0}^{\infty} \frac{e^{-2\lambda} \lambda^{\,x-c}}{1 - e^{-\lambda}} \left( 1 + 2 e^{-\lambda} \right) \left( e^{\lambda} - 1 \right) d\lambda + \frac{1}{x!} \sum_{y=1}^{k} \frac{1}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1}+2}\, \zeta(y-c+1) - 1 \right\} \int_{0}^{\infty} \frac{e^{-2\lambda} + 2 e^{-3\lambda}}{1 - e^{-\lambda}}\, \lambda^{\,x+y-c}\, d\lambda \\
&= \frac{1}{x!} \int_{0}^{\infty} \left( 1 + 2 e^{-\lambda} \right) e^{-\lambda} \lambda^{\,x-c}\, d\lambda + \frac{1}{x!} \sum_{y=1}^{k} \frac{1}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1}+2}\, \zeta(y-c+1) - 1 \right\} \int_{0}^{\infty} \left( \sum_{n=1}^{\infty} e^{-(n+1)\lambda} + 2 \sum_{m=1}^{\infty} e^{-(m+2)\lambda} \right) \lambda^{\,x+y-c}\, d\lambda \\
&= \frac{1}{x!}\, \Gamma(x-c+1) \left( 1 + \frac{1}{2^{\,x-c}} \right) + \frac{1}{x!} \sum_{y=1}^{k} \frac{1}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1}+2}\, \zeta(y-c+1) - 1 \right\} \Gamma(x+y-c+1) \left( \sum_{n=1}^{\infty} \frac{1}{(n+1)^{\,x+y-c+1}} + 2 \sum_{m=1}^{\infty} \frac{1}{(m+2)^{\,x+y-c+1}} \right) \\
&= \frac{1}{x!} \left[ \Gamma(x-c+1) \left( 1 + \frac{1}{2^{\,x-c}} \right) + \sum_{y=1}^{k} \frac{1}{y!} \left\{ \frac{2^{\,y-c+1}}{2^{\,y-c+1}+2}\, \zeta(y-c+1) - 1 \right\} \Gamma(x+y-c+1) \left( 3 \zeta(x+y-c+1) - 3 - \frac{1}{2^{\,x+y-c}} \right) \right] \\
&= \frac{1}{x!} \left[ \Gamma(x-c+1) \left( 1 + \frac{1}{2^{\,x-c}} \right) + \sum_{y=1}^{k} \kappa_1(y)\, \Gamma(x+y-c+1) \left( 3 \zeta(x+y-c+1) - 3 - \frac{1}{2^{\,x+y-c}} \right) \right].
\end{aligned}$$
Then, π 0 T ( λ ) is calculated as follows:
$$\begin{aligned}
\pi_0^{T}(\lambda) &= \sum_{x=1}^{\infty} \left[ \omega_0 + (1-\omega_0) e^{-\lambda} \right] (1-\omega_0)\, e^{-\lambda}\, \frac{\lambda^{\,x-c}}{x!} \cdot \frac{x!}{(1-\omega_0)\, \Gamma(x-c+1)} \cdot \frac{2^{\,x-c+1}}{\omega_0\, 2^{\,x-c+1} + (1-\omega_0)} \cdot m_1(x) \\
&= \left[ \omega_0 + (1-\omega_0) e^{-\lambda} \right] e^{-\lambda} \lambda^{-c} \left[ \sum_{y=1}^{k} \kappa_1(y) \sum_{x=1}^{\infty} \frac{\lambda^{x}}{x!}\, \frac{\Gamma(x+y-c+1)}{\Gamma(x-c+1)} \cdot \frac{2^{\,x-c+1}}{\omega_0\, 2^{\,x-c+1} + (1-\omega_0)} \left( 3\zeta(x+y-c+1) - 3 - \frac{1}{2^{\,x+y-c}} \right) \right. \\
&\qquad \left. + \sum_{x=1}^{\infty} \frac{\lambda^{x}}{x!} \cdot \frac{2^{\,x-c+1}}{\omega_0\, 2^{\,x-c+1} + (1-\omega_0)} \left( 1 + \frac{1}{2^{\,x-c}} \right) \right].
\end{aligned}$$
Using the approximation process, the two infinite sums in π 0 T ( λ ) become
$$\begin{aligned}
&\sum_{x=1}^{\infty} \frac{\lambda^{x}}{x!}\, \frac{\Gamma(x+y-c+1)}{\Gamma(x-c+1)} \cdot \frac{2^{\,x-c+1}}{\omega_0\, 2^{\,x-c+1} + (1-\omega_0)} \left( 3\zeta(x+y-c+1) - 3 - \frac{1}{2^{\,x+y-c}} \right) \\
&\quad \approx \sum_{x=1}^{k'} \frac{\lambda^{x}}{x!}\, \frac{\Gamma(x+y-c+1)}{\Gamma(x-c+1)} \cdot \frac{2^{\,x-c+1}}{\omega_0\, 2^{\,x-c+1} + (1-\omega_0)} \left( 3\zeta(x+y-c+1) - 3 - \frac{1}{2^{\,x+y-c}} \right) + \sum_{x=k'+1}^{\infty} \frac{\lambda^{x}}{x!}\, \frac{\Gamma(x+y-c+1)}{\Gamma(x-c+1)} \cdot \frac{1}{\omega_0} \left( \frac{1}{2^{\,x+y-c+1}} + \frac{1}{3^{\,x+y-c}} \right) \\
&\quad = \sum_{x=1}^{k'} \frac{\lambda^{x}}{x!}\, \frac{\Gamma(x+y-c+1)}{\Gamma(x-c+1)} \left[ \frac{2^{\,x-c+1}}{\omega_0\, 2^{\,x-c+1} + (1-\omega_0)} \left( 3\zeta(x+y-c+1) - 3 - \frac{1}{2^{\,x+y-c}} \right) - \frac{1}{\omega_0} \left( \frac{1}{2^{\,x+y-c+1}} + \frac{1}{3^{\,x+y-c}} \right) \right] \\
&\qquad + \frac{1}{\omega_0\, 2^{\,y-c+1}}\, \frac{\Gamma(y-c+1)}{\Gamma(-c+1)} \left\{ {}_1F_1(y-c+1;\, -c+1;\, \lambda/2) - 1 \right\} + \frac{1}{\omega_0\, 3^{\,y-c+1}}\, \frac{\Gamma(y-c+1)}{\Gamma(-c+1)} \left\{ {}_1F_1(y-c+1;\, -c+1;\, \lambda/3) - 1 \right\} \\
&\quad = \kappa_2(x, y) + \kappa_3(y)
\end{aligned}$$
and
$$\begin{aligned}
\sum_{x=1}^{\infty} \frac{\lambda^{x}}{x!} \cdot \frac{2^{\,x-c+1}}{\omega_0\, 2^{\,x-c+1} + (1-\omega_0)} \left( 1 + \frac{1}{2^{\,x-c}} \right) &\approx \sum_{x=1}^{k''} \frac{\lambda^{x}}{x!} \left[ \frac{2^{\,x-c+1} + 2}{\omega_0\, 2^{\,x-c+1} + (1-\omega_0)} - \frac{1}{\omega_0} \right] + \frac{1}{\omega_0} \left( e^{\lambda} - 1 \right) \\
&= \sum_{x=1}^{k''} \kappa_4(x) + \frac{1}{\omega_0} \left( e^{\lambda} - 1 \right).
\end{aligned}$$
Therefore, the integral prior for λ under H 0 is
$$\pi_0^{T}(\lambda) = \left[ \omega_0 + (1-\omega_0)\, e^{-\lambda} \right] e^{-\lambda} \lambda^{-c} \left[ \sum_{y=1}^{k} \kappa_1(y) \left\{ \kappa_2(x, y) + \kappa_3(y) \right\} + \sum_{x=1}^{k''} \kappa_4(x) + \frac{1}{\omega_0} \left( e^{\lambda} - 1 \right) \right]. \qquad \square$$

References

  1. Cohen, A.C. Estimation in mixtures of two normal distributions. Technometrics 1967, 9, 15–28. [Google Scholar] [CrossRef]
  2. Lambert, D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992, 34, 1–14. [Google Scholar] [CrossRef]
  3. Hall, D.B. Zero-inflated Poisson and binomial regression with random effects: A case study. Biometrics 2000, 56, 1030–1039. [Google Scholar] [CrossRef] [PubMed]
  4. Yau, K.K.W.; Lee, A.H. Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Stat. Med. 2001, 20, 2907–2920. [Google Scholar] [CrossRef]
  5. Ghosh, S.K.; Mukhopadhyay, P.; Lu, J. Bayesian analysis of zero-inflated regression models. J. Stat. Plan. Inference 2006, 136, 1360–1375. [Google Scholar] [CrossRef]
  6. Yirdraw, B.S.; Debusho, L.K.; Samuel, A. Application of longitudinal multilevel zero inflated Poisson regression in modeling of infectious diseases among infants in Ethiopia. BMC Infect. Dis. 2024, 24, 927. [Google Scholar]
  7. Tshishimbi, W.M. Double robust semiparametric weighted M-estimators of a zero-inflated Poisson regression with missing data in covariates. Commun.-Stat.-Simul. Comput. 2024, 1–24. [Google Scholar] [CrossRef]
  8. Jeffreys, H. The Theory of Probability, 3rd ed.; Oxford University Press: New York, NY, USA, 1961; ISBN 978-01-9850-368-2. [Google Scholar]
  9. Berger, J.O.; Bernardo, J.M. Estimating a product of means: Bayesian analysis with reference priors. J. Am. Stat. Assoc. 1989, 84, 200–207. [Google Scholar] [CrossRef]
  10. Berger, J.O.; Pericchi, L.R. The intrinsic Bayes factor for model selection and prediction. J. Am. Stat. Assoc. 1996, 91, 109–122. [Google Scholar] [CrossRef]
  11. Lingham, R.T.; Sivaganesan, S. Intrinsic Bayes factor approach to a test for the power law process. J. Stat. Plan. Inference 1999, 77, 195–220. [Google Scholar] [CrossRef]
  12. Sanso, B.; Pericchi, L.R.; Moreno, E.; Racugno, W. On the robustness of the intrinsic Bayes factor for nested models [with discussion and rejoinder]. Lect. Notes-Monogr. Ser. 1996, 29, 155–173. [Google Scholar]
  13. Cano, J.A.; Robert, C.P.; Salmerón, D. Integral equation solutions as prior distributions for model selection. Test 2008, 17, 493–504. [Google Scholar] [CrossRef]
  14. Salmerón, D.; Cano, J.A.; Robert, C.P. Objective Bayesian hypothesis testing in binomial regression models with integral prior distributions. Stat. Sin. 2015, 25, 1009–1023. [Google Scholar]
  15. Salmerón, D.; Cano, J.A.; Robert, C.P. On integral priors for multiple comparison in Bayesian model selection. arXiv 2024, arXiv:2406.14184. [Google Scholar]
  16. Bayarri, M.J.; García-Donato, G. Generalization of Jeffreys divergence-based priors for Bayesian hypothesis testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 2008, 70, 981–1003. [Google Scholar] [CrossRef]
  17. Sivaganesan, S.; Jiang, D. Objective Bayesian testing of a Poisson mean. Commun. Stat.-Theory Methods 2010, 39, 1887–1897. [Google Scholar] [CrossRef]
  18. Han, Y.; Hwang, H.; Ng, H.K.T.; Kim, S.W. Default Bayesian testing for the zero-inflated Poisson distribution. Stat. Its Interface 2024, 17, 623–634. [Google Scholar] [CrossRef]
  19. Kass, R.E.; Raftery, A.E. Bayes factors. J. Am. Stat. Assoc. 1995, 90, 773–795. [Google Scholar] [CrossRef]
  20. Reid, N.; Mukerjee, R.; Fraser, D.A.S. Some aspects of matching priors. Lect. Notes-Monogr. Ser. 2003, 42, 31–43. [Google Scholar]
  21. Datta, G.S.; Mukerjee, R. Probability Matching Priors: Higher Order Asymptotics; Springer: New York, NY, USA, 2004; ISBN 978-0-387-20329-4. [Google Scholar]
  22. Peréz, J.M.; Berger, J.O. Expected-posterior prior distributions for model selection. Biometrika 2002, 89, 491–511. [Google Scholar] [CrossRef]
  23. Moreno, E.; Pericchi, L.R. Intrinsic priors for objective Bayesian model selection. Adv. Econom. 2014, 34, 279–300. [Google Scholar]
  24. Kim, S.W.; Sun, D. Intrinsic priors for model selection using an encompassing model with applications to censored failure time data. Lifetime Data Anal. 2000, 6, 251–269. [Google Scholar] [CrossRef]
Figure 1. Comparison of g*(x) and g(x) when ω_0 = 0.8 and λ = 4.
Figure 2. Comparison of g_1*(y) and g_1(y) when λ = 4.
Figure 3. Comparison of g_2*(x) and g_2(x) when ω_0 = 0.8 and λ = 4.
Table 1. The difference g*(x) − g(x) with different values of ω_0 and λ.

| ω_0 | λ | x = 13 | x = 14 | x = 15 | x = 16 | x = 17 | x = 18 |
|-----|---|--------|--------|--------|--------|--------|--------|
| 0.2 | 3 | 8.84 × 10^-9 | 9.47 × 10^-10 | 9.47 × 10^-11 | 8.88 × 10^-12 | 7.83 × 10^-13 | 6.53 × 10^-14 |
| 0.2 | 4 | 3.72 × 10^-7 | 5.32 × 10^-8 | 7.09 × 10^-9 | 8.86 × 10^-10 | 1.04 × 10^-10 | 1.16 × 10^-11 |
| 0.8 | 3 | 3.54 × 10^-8 | 3.79 × 10^-9 | 3.79 × 10^-10 | 3.55 × 10^-11 | 3.13 × 10^-12 | 2.61 × 10^-13 |
| 0.8 | 4 | 1.49 × 10^-6 | 2.13 × 10^-7 | 2.83 × 10^-8 | 3.54 × 10^-9 | 4.17 × 10^-10 | 4.63 × 10^-11 |
Table 2. The 0.5% trimmed average values and standard deviations (in parentheses) of the three Bayes factors based on 1000 replications. The following proportions (in %) are also provided: the proportion for which B_10^D > 1; the proportion for which p_LR < 0.05; P_1, the proportion for which both B_10^D and the LRT support the true model; and P_2, the proportion for which B_10^D and the LRT yield the same conclusion.

| ω_0 | ω | λ | n | B_10^D | B_10^I | B_10^T | p-value | B_10^D > 1 | p_LR < 0.05 | P_1 | P_2 |
|-----|---|---|---|--------|--------|--------|---------|------------|-------------|-----|-----|
| 0.2 | 0.2 | 3 | 30 | 0.436 (0.443) | 0.437 (0.444) | 0.420 (0.447) | 0.526 | 6.7 | 2.5 | 93.2 | 95.8 |
| 0.2 | 0.2 | 3 | 50 | 0.350 (0.378) | 0.351 (0.379) | 0.339 (0.380) | 0.521 | 5.0 | 3.2 | 95.0 | 98.2 |
| 0.2 | 0.2 | 3 | 100 | 0.263 (0.361) | 0.263 (0.362) | 0.259 (0.369) | 0.540 | 3.3 | 3.7 | 95.3 | 97.6 |
| 0.2 | 0.2 | 4 | 30 | 0.391 (0.431) | 0.392 (0.434) | 0.376 (0.461) | 0.501 | 5.9 | 4.0 | 94.1 | 98.1 |
| 0.2 | 0.2 | 4 | 50 | 0.306 (0.353) | 0.307 (0.354) | 0.296 (0.348) | 0.518 | 4.1 | 3.1 | 95.8 | 98.8 |
| 0.2 | 0.2 | 4 | 100 | 0.231 (0.298) | 0.231 (0.297) | 0.228 (0.331) | 0.519 | 3.1 | 4.1 | 95.9 | 99.0 |
| 0.2 | 0.4 | 3 | 30 | 92.405 (498.004) | 92.726 (500.266) | 113.300 (615.487) | 0.070 | 69.9 | 43.2 | 43.2 | 73.2 |
| 0.2 | 0.4 | 3 | 50 | 1.38 × 10^3 (9.32 × 10^3) | 1.38 × 10^3 (9.25 × 10^3) | 1.72 × 10^3 (1.15 × 10^4) | 0.016 | 82.7 | 66.6 | 66.6 | 83.9 |
| 0.2 | 0.4 | 3 | 100 | 1.77 × 10^6 (1.74 × 10^7) | 1.76 × 10^6 (1.73 × 10^7) | 2.30 × 10^6 (2.29 × 10^7) | 4.27 × 10^-4 | 97.3 | 95.0 | 95.0 | 97.6 |
| 0.2 | 0.4 | 4 | 30 | 332.806 (2.28 × 10^3) | 334.251 (2.28 × 10^3) | 429.511 (2.87 × 10^3) | 0.022 | 73.9 | 61.8 | 61.8 | 87.9 |
| 0.2 | 0.4 | 4 | 50 | 1.87 × 10^4 (3.05 × 10^5) | 1.86 × 10^4 (3.02 × 10^5) | 2.53 × 10^4 (4.11 × 10^5) | 0.003 | 87.0 | 85.0 | 85.0 | 97.9 |
| 0.2 | 0.4 | 4 | 100 | 5.61 × 10^7 (6.92 × 10^8) | 5.62 × 10^7 (6.94 × 10^8) | 7.75 × 10^7 (9.52 × 10^8) | 1.79 × 10^-5 | 99.1 | 99.1 | 99.1 | 100.0 |
| 0.2 | 0.6 | 3 | 30 | 2.62 × 10^5 (2.38 × 10^6) | 2.66 × 10^5 (2.42 × 10^6) | 2.34 × 10^5 (2.06 × 10^6) | 0.004 | 99.7 | 80.2 | 80.2 | 80.5 |
| 0.2 | 0.6 | 3 | 50 | 6.04 × 10^8 (4.92 × 10^9) | 6.07 × 10^8 (4.93 × 10^9) | 5.69 × 10^8 (4.52 × 10^9) | 1.21 × 10^-4 | 100.0 | 93.1 | 93.1 | 93.1 |
| 0.2 | 0.6 | 3 | 100 | 2.56 × 10^17 (5.76 × 10^18) | 2.58 × 10^17 (5.79 × 10^18) | 2.59 × 10^17 (5.78 × 10^18) | 1.83 × 10^-8 | 100.0 | 99.9 | 99.9 | 99.9 |
| 0.2 | 0.6 | 4 | 30 | 3.45 × 10^6 (2.29 × 10^7) | 3.44 × 10^6 (2.27 × 10^7) | 3.32 × 10^6 (2.17 × 10^7) | 5.86 × 10^-5 | 99.9 | 98.8 | 98.7 | 98.7 |
| 0.2 | 0.6 | 4 | 50 | 1.90 × 10^11 (1.53 × 10^12) | 1.91 × 10^11 (1.54 × 10^12) | 1.93 × 10^11 (1.53 × 10^12) | 4.91 × 10^-8 | 100.0 | 100.0 | 100.0 | 100.0 |
| 0.2 | 0.6 | 4 | 100 | 4.30 × 10^21 (7.01 × 10^22) | 4.30 × 10^21 (7.00 × 10^22) | 4.77 × 10^21 (7.80 × 10^22) | 2.11 × 10^-15 | 100.0 | 100.0 | 100.0 | 100.0 |
| 0.8 | 0.4 | 3 | 30 | 3.36 × 10^7 (2.53 × 10^8) | 3.37 × 10^7 (2.56 × 10^8) | 3.82 × 10^7 (2.86 × 10^8) | 4.35 × 10^-6 | 99.5 | 99.3 | 99.3 | 99.8 |
| 0.8 | 0.4 | 3 | 50 | 6.33 × 10^11 (4.93 × 10^12) | 6.33 × 10^11 (4.93 × 10^12) | 7.83 × 10^11 (6.13 × 10^12) | 1.82 × 10^-9 | 100.0 | 100.0 | 100.0 | 100.0 |
| 0.8 | 0.4 | 3 | 100 | 7.97 × 10^21 (1.02 × 10^23) | 7.95 × 10^21 (1.02 × 10^23) | 1.04 × 10^22 (1.32 × 10^23) | <2 × 10^-16 | 100.0 | 100.0 | 100.0 | 100.0 |
| 0.8 | 0.4 | 4 | 30 | 1.10 × 10^8 (1.12 × 10^9) | 1.10 × 10^8 (1.13 × 10^9) | 1.11 × 10^8 (1.10 × 10^9) | 1.36 × 10^-6 | 99.7 | 99.7 | 99.7 | 100.0 |
| 0.8 | 0.4 | 4 | 50 | 5.00 × 10^12 (6.96 × 10^13) | 4.99 × 10^12 (6.95 × 10^13) | 5.48 × 10^12 (7.52 × 10^13) | 1.60 × 10^-9 | 100.0 | 100.0 | 100.0 | 100.0 |
| 0.8 | 0.4 | 4 | 100 | 4.06 × 10^22 (4.70 × 10^23) | 4.05 × 10^22 (4.69 × 10^23) | 5.01 × 10^22 (5.78 × 10^23) | <2 × 10^-16 | 100.0 | 100.0 | 100.0 | 100.0 |
| 0.8 | 0.6 | 3 | 30 | 362.850 (2.40 × 10^3) | 363.706 (2.41 × 10^3) | 527.645 (3.45 × 10^3) | 0.014 | 70.6 | 67.0 | 67.0 | 96.5 |
| 0.8 | 0.6 | 3 | 50 | 1.38 × 10^4 (1.03 × 10^5) | 1.38 × 10^4 (1.03 × 10^5) | 2.06 × 10^4 (1.54 × 10^5) | 0.002 | 85.9 | 86.9 | 85.9 | 99.0 |
| 0.8 | 0.6 | 3 | 100 | 2.87 × 10^7 (2.80 × 10^8) | 2.87 × 10^7 (2.80 × 10^8) | 4.33 × 10^7 (4.23 × 10^8) | 6.59 × 10^-6 | 98.1 | 99.0 | 98.1 | 99.1 |
| 0.8 | 0.6 | 4 | 30 | 444.102 (2.62 × 10^3) | 444.267 (2.62 × 10^3) | 618.955 (3.62 × 10^3) | 0.011 | 69.3 | 68.0 | 68.0 | 98.8 |
| 0.8 | 0.6 | 4 | 50 | 1.98 × 10^4 (2.26 × 10^5) | 1.99 × 10^4 (2.27 × 10^5) | 2.83 × 10^4 (3.21 × 10^5) | 0.001 | 88.5 | 88.5 | 88.5 | 100.0 |
| 0.8 | 0.6 | 4 | 100 | 1.28 × 10^8 (1.47 × 10^9) | 1.29 × 10^8 (1.48 × 10^9) | 1.87 × 10^8 (2.15 × 10^9) | 3.82 × 10^-6 | 98.7 | 99.1 | 98.7 | 99.6 |
| 0.8 | 0.8 | 3 | 30 | 0.316 (0.350) | 0.313 (0.345) | 0.327 (0.439) | 0.497 | 6.6 | 4.3 | 93.4 | 97.7 |
| 0.8 | 0.8 | 3 | 50 | 0.259 (0.361) | 0.258 (0.357) | 0.257 (0.438) | 0.422 | 5.6 | 6.0 | 94.0 | 99.6 |
| 0.8 | 0.8 | 3 | 100 | 0.224 (0.618) | 0.223 (0.610) | 0.201 (0.431) | 0.418 | 3.9 | 7.3 | 92.7 | 96.6 |
| 0.8 | 0.8 | 4 | 30 | 0.315 (0.354) | 0.311 (0.347) | 0.307 (0.398) | 0.596 | 4.4 | 4.1 | 95.6 | 99.6 |
| 0.8 | 0.8 | 4 | 50 | 0.273 (0.399) | 0.271 (0.393) | 0.259 (0.436) | 0.457 | 5.9 | 5.9 | 94.0 | 100.0 |
| 0.8 | 0.8 | 4 | 100 | 0.220 (0.487) | 0.219 (0.486) | 0.201 (0.413) | 0.526 | 4.0 | 5.9 | 94.1 | 98.1 |
Table 3. Simulation results for testing H_E1: ω < ω_0 versus H_E2: ω > ω_0 based on the posterior probability Pr(H_E2 ; x). The results are based on 1000 replications.

| ω_0 | λ | n | ω = ω_0 − 0.2 | ω = ω_0 − 0.1 | ω = ω_0 + 0.1 | ω = ω_0 + 0.2 |
|-----|---|---|----------------|----------------|----------------|----------------|
| 0.5 | 3 | 20 | 0.031 (0.048) | 0.085 (0.130) | 0.451 (0.331) | 0.711 (0.303) |
| 0.5 | 3 | 30 | 0.016 (0.052) | 0.053 (0.098) | 0.471 (0.343) | 0.794 (0.270) |
| 0.5 | 3 | 50 | 0.006 (0.025) | 0.025 (0.064) | 0.512 (0.355) | 0.896 (0.203) |
| 0.5 | 4 | 20 | 0.024 (0.041) | 0.073 (0.127) | 0.441 (0.337) | 0.724 (0.308) |
| 0.5 | 4 | 30 | 0.012 (0.023) | 0.044 (0.091) | 0.468 (0.353) | 0.807 (0.275) |
| 0.5 | 4 | 50 | 0.004 (0.005) | 0.021 (0.063) | 0.514 (0.364) | 0.906 (0.199) |
| 0.6 | 3 | 20 | 0.058 (0.078) | 0.131 (0.162) | 0.548 (0.316) | 0.807 (0.236) |
| 0.6 | 3 | 30 | 0.030 (0.048) | 0.088 (0.131) | 0.566 (0.323) | 0.868 (0.204) |
| 0.6 | 3 | 50 | 0.011 (0.015) | 0.042 (0.072) | 0.604 (0.330) | 0.945 (0.137) |
| 0.6 | 4 | 20 | 0.049 (0.078) | 0.115 (0.158) | 0.544 (0.327) | 0.821 (0.238) |
| 0.6 | 4 | 30 | 0.024 (0.045) | 0.077 (0.123) | 0.564 (0.334) | 0.884 (0.201) |
| 0.6 | 4 | 50 | 0.009 (0.012) | 0.034 (0.057) | 0.608 (0.339) | 0.952 (0.133) |
| 0.7 | 3 | 20 | 0.107 (0.108) | 0.221 (0.201) | 0.679 (0.265) | 0.898 (0.147) |
| 0.7 | 3 | 30 | 0.059 (0.064) | 0.148 (0.160) | 0.696 (0.276) | 0.949 (0.097) |
| 0.7 | 3 | 50 | 0.022 (0.023) | 0.080 (0.116) | 0.743 (0.278) | 0.985 (0.047) |
| 0.7 | 4 | 20 | 0.090 (0.097) | 0.198 (0.197) | 0.676 (0.275) | 0.910 (0.148) |
| 0.7 | 4 | 30 | 0.047 (0.050) | 0.129 (0.152) | 0.697 (0.285) | 0.961 (0.091) |
| 0.7 | 4 | 50 | 0.018 (0.018) | 0.069 (0.108) | 0.744 (0.288) | 0.991 (0.035) |
Table 4. Results of testing H_0: ω = ω_0 versus H_1: ω ≠ ω_0 for the yellow dust storm data. The three Bayes factors and p-values are provided for different ω_0 values.

| ω_0 | B_10^D | B_10^I | B_10^T | p-value |
|-----|--------|--------|--------|---------|
| 0.2 | 0.234 | 0.223 | 0.250 | 0.496 |
| 0.25 | 0.205 | 0.200 | 0.193 | 0.942 |
| 0.3 | 0.295 | 0.292 | 0.252 | 0.444 |
| 0.35 | 0.631 | 0.633 | 0.504 | 0.156 |
| 0.4 | 1.931 | 1.958 | 1.474 | 0.041 |
| 0.45 | 8.332 | 8.517 | 6.207 | 0.008 |
| 0.5 | 51.140 | 52.627 | 37.895 | 0.001 |
| 0.55 | 459.887 | 475.909 | 345.409 | 9.83 × 10^-5 |
| 0.6 | 6.39 × 10^3 | 6.64 × 10^3 | 4.96 × 10^3 | 5.92 × 10^-6 |
| 0.65 | 1.49 × 10^5 | 1.56 × 10^5 | 1.23 × 10^5 | 2.10 × 10^-7 |
| 0.7 | 6.69 × 10^6 | 7.01 × 10^6 | 5.96 × 10^6 | 3.80 × 10^-9 |
| 0.75 | 7.10 × 10^8 | 7.46 × 10^8 | 7.09 × 10^8 | 2.83 × 10^-11 |
| 0.8 | 2.57 × 10^11 | 2.70 × 10^11 | 3.00 × 10^11 | 5.91 × 10^-14 |
Table 5. Results of testing H_0: ω = ω_0 versus H_1: ω ≠ ω_0 for the book reading data. The three Bayes factors and p-values are provided for different ω_0 values.

| ω_0 | B_10^D | B_10^I | B_10^T | p-value |
|-----|--------|--------|--------|---------|
| 0.2 | 8.50 × 10^42 | 8.35 × 10^42 | 1.22 × 10^43 | <2 × 10^-16 |
| 0.25 | 1.86 × 10^29 | 1.84 × 10^29 | 2.31 × 10^29 | <2 × 10^-16 |
| 0.3 | 1.22 × 10^19 | 1.21 × 10^19 | 1.36 × 10^19 | <2 × 10^-16 |
| 0.35 | 2.95 × 10^11 | 2.96 × 10^11 | 3.06 × 10^11 | 3.41 × 10^-14 |
| 0.4 | 7.89 × 10^5 | 7.93 × 10^5 | 7.77 × 10^5 | 1.70 × 10^-8 |
| 0.45 | 113.364 | 114.286 | 108.393 | 1.71 × 10^-4 |
| 0.5 | 0.588 | 0.595 | 0.557 | 0.057 |
| 0.55 | 0.094 | 0.095 | 0.090 | 0.959 |
| 0.6 | 0.477 | 0.484 | 0.470 | 0.067 |
| 0.65 | 101.381 | 102.990 | 105.429 | 1.65 × 10^-4 |
| 0.7 | 1.57 × 10^6 | 1.60 × 10^6 | 1.77 × 10^6 | 6.55 × 10^-9 |
| 0.75 | 4.90 × 10^12 | 4.99 × 10^12 | 6.18 × 10^12 | 1.33 × 10^-15 |
| 0.8 | 1.90 × 10^22 | 1.94 × 10^22 | 2.80 × 10^22 | <2 × 10^-16 |