Estimating the Ratio of Means in a Zero-Inflated Poisson Mixture Model

by Michael Pearce 1,* and Michael D. Perlman 2

1 Department of Mathematics and Statistics, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
2 Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195, USA
* Author to whom correspondence should be addressed.

Stats 2025, 8(3), 55; https://doi.org/10.3390/stats8030055
Submission received: 28 May 2025 / Revised: 2 July 2025 / Accepted: 3 July 2025 / Published: 5 July 2025

Abstract

The problem of estimating the ratio of the means of a two-component Poisson mixture model is considered, when each component is subject to zero-inflation, i.e., excess zero counts. The resulting zero-inflated Poisson mixture (ZIPM) model can be viewed as a three-component Poisson mixture model with one degenerate component. The EM algorithm is applied to obtain frequentist estimators and their standard errors, the latter determined via an explicit expression for the observed information matrix. As an intermediate step, we derive an explicit expression for standard errors in the two-component Poisson mixture model (without zero-inflation), a new result. The ZIPM model is applied to simulated data and real ecological count data of frigatebirds on the Coral Sea Islands off the coast of Northeast Australia.

1. Introduction

Baker and Holdsworth (2013) [1] present data relevant to the determination of the relative abundances of two subspecies of frigatebirds (FB), least (LFB) and greater (GFB), in the Coral Sea Islands off the coast of Northeast Australia. The available data is indirect, consisting only of counts of nests in several standardized sites over several time points, rather than direct observations of individuals. Furthermore, the nests of LFB and GFB usually are indistinguishable, differing (possibly) only in their relative numbers per site. Thus, to infer nest type, one may use techniques from model-based clustering, such as a finite mixture model [2]. For count data (such as ours), a finite mixture of Poisson distributions is a common choice [3,4]. Previous authors have studied identifiability and estimation of this and related models (e.g., [2]). Given inferred nest type, ecologists may be interested in estimating other qualities of the ecosystem: If the expected numbers of nests per site for LFB and GFB are denoted by $\mu$ and $\nu$ respectively, in this paper we study estimation of their ratio, $\theta \equiv \mu/\nu$, where $0 < \theta < \infty$.
Several complications arise. Because no further constraint can be imposed on θ a priori, the problem is unidentifiable as stated, i.e., ( μ , ν ) is indistinguishable from ( ν , μ ) . However, LFB are less prevalent than GFB (based on available labeled data, see [1]), which will render the model identifiable; see the second paragraph below. Furthermore, it is typical of such field studies that zero counts are recorded for reasons other than true absence, such as short study periods, secretive or small species, or other uncontrollable factors. In such cases, it is necessary to model the excessive zero-counts directly to not contaminate other inferences [5]. As is commonly done, we shall adopt the zero-inflated Poisson (ZIP) distribution to represent this feature [6,7,8].
We now state a probability model based on a finite mixture model of zero-inflated Poisson distributions to represent the aforementioned scenario: Let $Y_j$ indicate whether LFB ($Y_j = 1$) or GFB ($Y_j = 0$) are observed in time point $j$. Let $M_{ij}$ be the actual number of FB nests at site $i$ in time point $j$ (which may not be directly observed due to zero-inflation). Let $Z_{ij}$ indicate whether nest counts are directly observed ($Z_{ij} = 1$) or subject to zero-inflation and hence lost ($Z_{ij} = 0$). Finally, let $N_{ij}$ denote the number of FB nests observed at site $i$ at time point $j$. Let $\mathcal I \equiv \{1,\dots,I\}$ and $\mathcal J \equiv \{1,\dots,J\}$ be the corresponding index sets, and set $\mathcal K = \mathcal I \times \mathcal J$, $K = |\mathcal K| = IJ$. For $(i,j) \in \mathcal K$, consider random variables (rvs),
$$Y_j \sim \mathrm{Bernoulli}(\pi), \tag{1}$$
$$M_{ij} \mid Y_j \sim \mathrm{Poisson}\big(t_i\big[(1 - 0^{Y_j})\mu + 0^{Y_j}\nu\big]\big), \tag{2}$$
$$Z_{ij} \sim \mathrm{Bernoulli}(\epsilon), \tag{3}$$
$$N_{ij} = Z_{ij} M_{ij}; \tag{4}$$
where $0^0 = 1$, $\{Y_j\}$ and $\{Z_{ij}\}$ are mutually independent, and $\{M_{ij}\}$ and $\{Z_{ij}\}$ are conditionally mutually independent given $\{Y_j\}$. Thus $M_{ij}$ is a $\pi$-mixture of $\mathrm{Poisson}(t_i\mu)$ and $\mathrm{Poisson}(t_i\nu)$ rvs, where each $t_i > 0$ is known, potentially reflecting a feature of each site common to all time points, and $\mu, \nu \in (0,\infty)$ are unknown. Further, $N_{ij}$ is a zero-inflated Poisson mixture (ZIPM) rv with zero-inflation parameter $1-\epsilon \in (0,1)$. (One may ask whether the conditional ZIPM model is equivalent to the well-studied zero-truncated Poisson model; we demonstrate that this is not the case in Appendix A.)
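To make the data-generating process (1)–(4) concrete, the following sketch simulates one observed array N with NumPy. The function name `simulate_zipm` and all parameter values are our own illustrative choices, not part of the paper's specification.

```python
import numpy as np

def simulate_zipm(t, J, pi, eps, mu, nu, rng):
    """Simulate an I x J observed array N from the model (1)-(4)."""
    I = len(t)
    Y = rng.binomial(1, pi, size=J)            # Y_j: component label per time point
    rate = np.where(Y == 1, mu, nu)            # mu if LFB column, nu if GFB
    M = rng.poisson(np.outer(t, rate))         # M_ij ~ Poisson(t_i * rate_j)
    Z = rng.binomial(1, eps, size=(I, J))      # Z_ij = 0: count lost to zero-inflation
    return Z * M                               # N_ij = Z_ij * M_ij

rng = np.random.default_rng(0)
t = np.ones(11)                                # t_i = 1, as in the frigatebird analysis
N = simulate_zipm(t, J=4, pi=0.25, eps=0.8, mu=10.0, nu=5.0, rng=rng)
```

Here setting `eps=0.8` corresponds to a zero-inflation rate of 1 − ε = 0.2.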
The main objective of this paper is the problem of estimating the ratio $\theta \equiv \mu/\nu$ based solely on the observed data $\{N_{ij}\}$, with $\{Y_j\}$, $\{M_{ij}\}$, and $\{Z_{ij}\}$ unobserved. As noted above, for identifiability of $(\mu,\nu)$, and therefore of $\theta = \mu/\nu$, a restriction must be imposed: we assume that $0 < \pi \leq 1/2$, corresponding to the knowledge that LFB occur no more frequently than GFB. We propose frequentist estimation of $\theta$ via the EM algorithm and approximate standard errors via explicit calculation of the observed information matrix. (A Bayesian analysis of this same problem can be found in a preprint version of this paper: M. D. Perlman (2022). Estimating the ratio of means in a zero-inflated Poisson mixture model. arXiv:2203.13994.)
The rest of this paper is organized as follows: In Section 2, we briefly provide notation. In Section 3, we present a preliminary problem of estimating θ in a standard, two-component Poisson mixture model (i.e., { Y j } are unobserved and { M i j } are observed, without zero-inflation) to serve as a guidepost for the main problem. Therein, we estimate θ in a frequentist context via EM and approximate standard errors via explicit calculation of the observed information matrix, which to our knowledge is a new result (Approximate methods are usually used, such as the SEM algorithm [9] or the bootstrap [4]). In Section 4 we address the main problem of estimating the ratio of Poisson means in a zero-inflated Poisson mixture (ZIPM) model, as described in the previous paragraph. In Section 5 the ZIPM model is applied both to simulated data and real data on frigatebirds in the Coral Sea Islands. The results of this study are summarized in Section 6.

2. Notation

Column vectors and arrays denoted by Roman letters appear in bold type, their components in plain type; caps denote rvs:
$$\mathbf t \equiv (t_1,\dots,t_I) \in \mathbb R^I, \qquad \mathbf y \equiv (y_1,\dots,y_J) \in \{0,1\}^J, \qquad \mathbf Y \equiv (Y_1,\dots,Y_J) \in \{0,1\}^J,$$
$$\mathbf z \equiv (z_{ij}) \in \{0,1\}^{\mathcal K}, \qquad \mathbf Z \equiv (Z_{ij}) \in \{0,1\}^{\mathcal K}, \qquad \mathbf m = (m_{ij}) \in \mathbb Z_+^{\mathcal K}, \qquad \mathbf M = (M_{ij}) \in \mathbb Z_+^{\mathcal K},$$
$$\mathbf n = (n_{ij}) \in \mathbb Z_+^{\mathcal K}, \qquad \mathbf N = (N_{ij}) \in \mathbb Z_+^{\mathcal K},$$
where $\mathbb R$ is the set of real numbers and $\mathbb Z_+$ is the set of nonnegative integers. Note that $Y_j, Z_{ij} \in \{0,1\}$, as each is an indicator variable, and $M_{ij}, N_{ij} \in \mathbb Z_+$, as each represents count data. Sums and products will range over the index sets $\mathcal I$ and $\mathcal J$ unless otherwise specified, e.g.,
$$\sum_i = \sum_{i=1}^I, \qquad \sum_j = \sum_{j=1}^J, \qquad \sum_{i,j} = \sum_{i=1}^I\sum_{j=1}^J,$$
etc. Summation over one or both of the indices i , j involving m i j , n i j , z i j , or their random (capitalized) versions will be indicated by simply dropping the indices that are summed over, e.g.,
$$m_i = \sum_j m_{ij}, \qquad n_j = \sum_i n_{ij}, \qquad m = \sum_{i,j} m_{ij}, \qquad n = \sum_{i,j} n_{ij}.$$
We set $\mathbf m! = \prod_{i,j} m_{ij}!$ and $\mathbf n! = \prod_{i,j} n_{ij}!$. All conditioning events $\mathbf Y = \mathbf y$, $\mathbf N = \mathbf n$, etc., will be abbreviated as $\mathbf y$, $\mathbf n$, etc. Lastly, for $i = 1,\dots,I$ and $j = 1,\dots,J$, we define
$$\mathbf M_j = (M_{ij} \mid i=1,\dots,I), \qquad \mathbf m_j = (m_{ij} \mid i=1,\dots,I),$$
$$\mathbf N_j = (N_{ij} \mid i=1,\dots,I), \qquad \mathbf n_j = (n_{ij} \mid i=1,\dots,I),$$
$$1_{ij} = 1_{ij}(n_{ij}) = 1 - 0^{n_{ij}}, \qquad 1_j = 1_j(\mathbf n_j) = \sum_i 1_{ij},$$
$$1 = 1(\mathbf n) = \sum_j 1_j, \qquad 1^= = 1^=(\mathbf n) = \sum_j 1_j^=,$$
$$1_{ij}^= = 1_{ij}^=(n_{ij}) = 0^{n_{ij}}, \qquad 1_j^= = 1_j^=(\mathbf n_j) = \sum_i 1_{ij}^=,$$
$$t_j = t_j(\mathbf n_j) = \sum_i t_i 1_{ij}, \qquad t_j^= = t_j^=(\mathbf n_j) = \sum_i t_i 1_{ij}^=.$$
Here $1_{ij}$ (resp. $1_{ij}^=$) is the indicator function of the event $\{n_{ij} \neq 0\}$ (resp. $\{n_{ij} = 0\}$), so $1_j$ (resp. $1_j^=$) is the number of nonzero (resp. zero) $n_{ij}$ with $j$ fixed, etc.
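The index-set quantities above reduce to one-line array operations; a small NumPy illustration (the array values are arbitrary, chosen only to exercise the definitions):

```python
import numpy as np

# Illustrative count array n (I = 3 sites, J = 2 time points) and exposures t.
n = np.array([[2, 0],
              [0, 0],
              [5, 1]])
t = np.array([1.0, 2.0, 0.5])

ind = (n != 0).astype(int)                     # 1_ij : indicator of n_ij != 0
ones_j = ind.sum(axis=0)                       # 1_j  : number of nonzero cells in column j
zeros_j = (1 - ind).sum(axis=0)                # 1_j^=: number of zero cells in column j
t_j = (t[:, None] * ind).sum(axis=0)           # t_j  : exposure summed over nonzero cells
t_j_eq = (t[:, None] * (1 - ind)).sum(axis=0)  # t_j^=: exposure summed over zero cells
n_j = n.sum(axis=0)                            # n_j  : column sums
```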

3. A Preliminary Problem

In this section, we address estimation of $\theta \equiv \mu/\nu$ in a standard, two-component Poisson mixture model. Specifically, we derive an EM algorithm for maximum likelihood estimation of the unknown model parameters (Section 3.1) and subsequently provide an explicit formula for standard errors of the maximum likelihood estimators (Section 3.2). We begin with a few preliminaries.
Here, $M_{ij}$ is a $\pi$-mixture of $\mathrm{Poisson}(t_i\mu)$ and $\mathrm{Poisson}(t_i\nu)$ rvs, where $\pi$ is the unknown mixing probability, cf. (2). Thus the probability mass function (pmf) of the observed data array $\mathbf M \equiv (M_{ij})$ is
$$f_{\pi,\mu,\nu}(\mathbf m) = \prod_{i,j}\big[\pi e^{-t_i\mu}(t_i\mu)^{m_{ij}} + (1-\pi)e^{-t_i\nu}(t_i\nu)^{m_{ij}}\big]/m_{ij}! = \prod_{i,j}\big[\pi e^{-t_i\mu}\mu^{m_{ij}} + (1-\pi)e^{-t_i\nu}\nu^{m_{ij}}\big]\cdot\Xi_{\mathbf t}(\mathbf m), \tag{5}$$
where $\Xi_{\mathbf t}(\mathbf m) = \prod_i t_i^{m_i}/\mathbf m!$. The joint pmf of the complete (unobserved and observed) data $(\mathbf Y,\mathbf M)$ is
$$\begin{aligned} f_{\pi,\mu,\nu}(\mathbf y,\mathbf m) &= f_\pi(\mathbf y)\,f_{\mu,\nu}(\mathbf m\mid\mathbf y) \\ &= \prod_j \pi^{y_j}(1-\pi)^{1-y_j}\prod_{i,j}\big(e^{-t_i\mu}\mu^{m_{ij}}\big)^{y_j}\big(e^{-t_i\nu}\nu^{m_{ij}}\big)^{1-y_j}\,\Xi_{\mathbf t}(\mathbf m) \\ &= \big[\pi^{\bar y}(1-\pi)^{1-\bar y}\big]^J\prod_j\big(e^{-\bar t I\mu}\mu^{m_j}\big)^{y_j}\big(e^{-\bar t I\nu}\nu^{m_j}\big)^{1-y_j}\,\Xi_{\mathbf t}(\mathbf m) \\ &= \big[\pi^{\bar y}(1-\pi)^{1-\bar y}\big]^J\big[e^{-\bar t\,\bar y\,\mu}\,\mu^{\overline{my}}\,e^{-\bar t(1-\bar y)\nu}\,\nu^{\overline{m(1-y)}}\big]^K\,\Xi_{\mathbf t}(\mathbf m), \end{aligned} \tag{6}$$
where $\bar t = \frac1I\sum_i t_i$ and
$$\bar y = \frac1J\sum_j y_j, \qquad \bar m = \frac1K\sum_{i,j}m_{ij} = \frac mK, \qquad \overline{my} = \frac1K\sum_j m_j y_j, \qquad \overline{m(1-y)} = \frac1K\sum_j m_j(1-y_j).$$
Thus, $f_{\pi,\mu,\nu}(\mathbf y,\mathbf m)$ determines an exponential family with sufficient statistic $(\bar Y,\, \overline{MY},\, \overline{M(1-Y)})$.

3.1. Estimation via the EM Algorithm

To obtain the MLEs π ^ , μ ^ , ν ^ and thus θ ^ = μ ^ / ν ^ , it is straightforward to apply the EM algorithm [10,11] as follows:
  • E-Step: Because (6) is an exponential family, Bayes formula shows that for l = 0 , 1 , , the ( l + 1 ) -st E-step simply imputes y j to be
$$(\hat y_j)_{l+1} = E_{\hat\pi_l,\hat\mu_l,\hat\nu_l}[Y_j\mid\mathbf m] = P_{\hat\pi_l,\hat\mu_l,\hat\nu_l}[Y_j=1\mid\mathbf m_j] = \frac{\hat\pi_l\prod_i e^{-t_i\hat\mu_l}(t_i\hat\mu_l)^{m_{ij}}}{\hat\pi_l\prod_i e^{-t_i\hat\mu_l}(t_i\hat\mu_l)^{m_{ij}} + (1-\hat\pi_l)\prod_i e^{-t_i\hat\nu_l}(t_i\hat\nu_l)^{m_{ij}}} = \frac{\hat\pi_l}{\hat\pi_l + (1-\hat\pi_l)\,e^{-I\bar t(\hat\nu_l-\hat\mu_l)}\,(\hat\nu_l/\hat\mu_l)^{m_j}}. \tag{7}$$
Observe that in (7), the numerator (equivalently, the first term in the denominator) is the unnormalized probability that $Y_j = 1$ given $\mathbf m_j$, and the second term in the denominator is the unnormalized probability that $Y_j = 0$ given $\mathbf m_j$.
  • M-Step: From (6), the complete-data MLEs are found to be,
$$\tilde\pi = \bar y, \qquad \tilde\mu = \frac{\overline{my}}{\bar t\,\bar y}, \qquad \tilde\nu = \frac{\overline{m(1-y)}}{\bar t\,(1-\bar y)}.$$
Thus, in the $(l+1)$-st iteration, the M-step updates the estimates of the unknown parameters via
$$\hat\pi_{l+1} = \frac1J\sum_j(\hat y_j)_{l+1}, \qquad \hat\mu_{l+1} = \frac{\overline{m(\hat y)_{l+1}}}{\bar t\,\overline{(\hat y)_{l+1}}}, \qquad \hat\nu_{l+1} = \frac{\overline{m(1-\hat y)_{l+1}}}{\bar t\,\big(1-\overline{(\hat y)_{l+1}}\big)},$$
where $(\hat{\mathbf y})_{l+1} = ((\hat y_1)_{l+1},\dots,(\hat y_J)_{l+1})$. Note that the identifiability constraint $\pi \leq \frac12$ is briefly ignored: Aitkin and Rubin [10] (1985, p. 69) note that, assuming convergence of $(\hat\pi_l,\hat\mu_l,\hat\nu_l)$ to an MLE $(\hat\pi,\hat\mu,\hat\nu)$, the same maximum value of the likelihood occurs at $(1-\hat\pi,\hat\nu,\hat\mu)$. Thus, we simply take the MLE to be the one whose first component is $\leq\frac12$ (say $(\hat\pi,\hat\mu,\hat\nu)$, for the sake of specificity). This concludes the EM algorithm.
Using estimates ( π ^ , μ ^ , ν ^ ) from the EM algorithm, we obtain the following estimator of θ :
$$\hat\theta_{l+1} = \frac{\hat\mu_{l+1}}{\hat\nu_{l+1}} = \frac{\overline{m(\hat y)_{l+1}}\,\big(1-\overline{(\hat y)_{l+1}}\big)}{\overline{(\hat y)_{l+1}}\,\big(\bar m - \overline{m(\hat y)_{l+1}}\big)} = \frac{1/\overline{(\hat y)_{l+1}} - 1}{\bar m/\overline{m(\hat y)_{l+1}} - 1}. \tag{8}$$
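The E- and M-steps above can be sketched compactly in NumPy. This is an illustrative single-start implementation with a fixed iteration count (the study in Section 5 uses multiple random starts and the full ZIPM model); `em_poisson_mixture` and the initial values are our own choices.

```python
import numpy as np

def em_poisson_mixture(m, t, n_iter=200):
    """EM for the two-component Poisson mixture of Section 3.1.

    m : I x J array of counts; t : length-I exposure vector.
    """
    pi, mu, nu = 0.3, m.mean() * 1.5 + 1e-9, m.mean() * 0.5 + 1e-9
    for _ in range(n_iter):
        # E-step, cf. (7): posterior probability that Y_j = 1 given column j
        ll1 = np.log(pi) + (-t[:, None] * mu + m * np.log(mu)).sum(axis=0)
        ll0 = np.log(1 - pi) + (-t[:, None] * nu + m * np.log(nu)).sum(axis=0)
        y = 1.0 / (1.0 + np.exp(np.clip(ll0 - ll1, -700, 700)))
        # M-step: weighted Poisson MLEs
        pi = y.mean()
        mu = (m.sum(axis=0) * y).sum() / (t.sum() * y.sum())
        nu = (m.sum(axis=0) * (1 - y)).sum() / (t.sum() * (1 - y).sum())
    if pi > 0.5:                 # enforce the identifiability constraint pi <= 1/2
        pi, mu, nu = 1 - pi, nu, mu
    return pi, mu, nu, mu / nu
```

The final label swap implements the convention discussed above: the component with mixing weight at most 1/2 is reported as the LFB component.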

3.2. Standard Error for the MLE θ ^

We now provide an explicit formula for approximating standard errors of the unknown parameters, $(\pi,\mu,\nu)$, and thereby, of $\theta \equiv \mu/\nu$. For simplicity of notation set $\omega = (\pi,\mu,\nu)$, and assume that the EM iterates $\hat\omega_l$ converge to $\hat\omega \equiv (\hat\pi,\hat\mu,\hat\nu)$, the actual MLE based on the observed data $\mathbf M$.
One method for approximating the standard error of ω ^ uses the total expected information matrix
$$\mathcal I_{\mathbf M}(\omega) \equiv E_\omega\big[-\partial_\omega^2\log f_\omega(\mathbf M)\big]$$
for the observed data $\mathbf M$: If $K \equiv IJ$ is large, it follows from Theorem 2 of [12] that
$$\sqrt K(\hat\omega - \omega) \xrightarrow{d} N_3\big[\mathbf 0,\; K\,\mathcal I_{\mathbf M}^{-1}(\omega)\big].$$
Alternatively, refs. [13,14] note that the observed information matrix $\mathcal I_{\mathbf m}(\omega) \equiv -\partial_\omega^2\log f_\omega(\mathbf m)$ usually yields a better normal approximation and often is more readily computed than expected information.
Theorem 1.
Assume the two-component Poisson mixture model presented in (1) and (2), where $\mathbf M = \mathbf m$ is observed and $\mathbf Y$ is unobserved. If $K \equiv IJ$ is large, then
$$\sqrt K(\hat\omega - \omega) \xrightarrow{d} N_3\big[\mathbf 0,\; K\,\mathcal I_{\mathbf m}^{-1}(\hat\omega)\big].$$
The observed information matrix is given explicitly as follows:
$$\mathcal I_{\mathbf m}(\omega) = D(\omega;\bar t;\mathbf m) - e^{-\bar t I(\mu+\nu)}\,\Delta(\omega;\bar t;\mathbf m)\,\Delta(\omega;\bar t;\mathbf m)^\top;$$
$$D(\omega;\bar t;\mathbf m) = \begin{pmatrix} \dfrac{J[(1-2\pi)\bar p + \pi^2]}{\pi^2(1-\pi)^2} & 0 & 0 \\ 0 & \dfrac{K\,\overline{mp}}{\mu^2} & 0 \\ 0 & 0 & \dfrac{K\,\overline{m(1-p)}}{\nu^2} \end{pmatrix},$$
$$\Delta(\omega;\bar t;\mathbf m) = \left(\frac{(\mu\nu)^{I\bar m_1/2}}{\gamma_1}\,\delta_1,\;\dots,\;\frac{(\mu\nu)^{I\bar m_J/2}}{\gamma_J}\,\delta_J\right),$$
$$\delta_j = \left(\frac{1}{\sqrt{\pi(1-\pi)}},\; I\sqrt{\pi(1-\pi)}\Big(\frac{\bar m_j}{\mu}-\bar t\Big),\; -I\sqrt{\pi(1-\pi)}\Big(\frac{\bar m_j}{\nu}-\bar t\Big)\right)^\top,$$
$$\gamma_j = \pi\big(e^{-\bar t\mu}\mu^{\bar m_j}\big)^I + (1-\pi)\big(e^{-\bar t\nu}\nu^{\bar m_j}\big)^I,$$
where $\bar m_j = m_j/I$, $p_j = P_\omega[Y_j = 1\mid\mathbf m_j]$ as in (A5), $\bar p = \frac1J\sum_j p_j$, $\overline{mp} = \frac1K\sum_j m_j p_j$, and $\overline{m(1-p)} = \bar m - \overline{mp}$.
Elements on the main diagonal of the covariance matrix $K\,\mathcal I_{\mathbf m}^{-1}(\hat\omega)$ are the estimated variances of $\hat\pi,\hat\mu,\hat\nu$ (their square roots are the standard errors), and the off-diagonal elements are their respective covariances. The proof of this theorem appears in Appendix B.
Theorem 1 provides an approximate confidence interval for the parameter of interest, θ = μ / ν :
Proposition 1.
Under the conditions of Theorem 1, an approximate ( 1 α ) confidence interval for θ is given by,
$$\hat\theta \pm \frac{\hat\sigma}{\sqrt K}\,z_{\alpha/2},$$
where $z_{\alpha/2}$ is the $(1-\frac\alpha2)$-quantile of the standard normal distribution,
$$\hat\sigma^2 = K\left(\frac1{\hat\nu},\,-\frac{\hat\mu}{\hat\nu^2}\right)\big(\mathcal I_{22} - \mathcal I_{21}\mathcal I_{11}^{-1}\mathcal I_{12}\big)^{-1}\left(\frac1{\hat\nu},\,-\frac{\hat\mu}{\hat\nu^2}\right)^\top,$$
and I m ( ω ^ ) is partitioned as
$$\mathcal I_{\mathbf m}(\hat\omega) = \begin{pmatrix} \mathcal I_{11} & \mathcal I_{12} \\ \mathcal I_{21} & \mathcal I_{22} \end{pmatrix}$$
with $\mathcal I_{11}: 1\times1$, $\mathcal I_{22}: 2\times2$, $\mathcal I_{12}: 1\times2$, and $\mathcal I_{21} = \mathcal I_{12}^\top$.
Proof. 
An approximate confidence interval for $\theta \equiv \mu/\nu \equiv g(\omega)$ is obtained by propagation of error (the delta method):
$$\begin{aligned} \sqrt K(\hat\theta-\theta) &\xrightarrow{d} N\big[0,\;K\,(\partial_\omega g(\omega)\vert_{\hat\omega})^\top\,\mathcal I_{\mathbf m}^{-1}(\hat\omega)\,\partial_\omega g(\omega)\vert_{\hat\omega}\big] \\ &= N\Big[0,\;K\Big(\frac{\partial g}{\partial\pi},\frac{\partial g}{\partial\mu},\frac{\partial g}{\partial\nu}\Big)\Big\vert_{\hat\omega}\,\mathcal I_{\mathbf m}^{-1}(\hat\omega)\,\Big(\frac{\partial g}{\partial\pi},\frac{\partial g}{\partial\mu},\frac{\partial g}{\partial\nu}\Big)^\top\Big\vert_{\hat\omega}\Big] \\ &= N\Big[0,\;K\Big(0,\frac1{\hat\nu},-\frac{\hat\mu}{\hat\nu^2}\Big)\,\mathcal I_{\mathbf m}^{-1}(\hat\omega)\,\Big(0,\frac1{\hat\nu},-\frac{\hat\mu}{\hat\nu^2}\Big)^\top\Big] \\ &= N\Big[0,\;K\Big(\frac1{\hat\nu},-\frac{\hat\mu}{\hat\nu^2}\Big)\big(\mathcal I_{22}-\mathcal I_{21}\mathcal I_{11}^{-1}\mathcal I_{12}\big)^{-1}\Big(\frac1{\hat\nu},-\frac{\hat\mu}{\hat\nu^2}\Big)^\top\Big] \equiv N(0,\hat\sigma^2), \end{aligned}$$
where the last equality holds because the lower-right $2\times2$ block of $\mathcal I_{\mathbf m}^{-1}(\hat\omega)$ is $(\mathcal I_{22}-\mathcal I_{21}\mathcal I_{11}^{-1}\mathcal I_{12})^{-1}$ and the gradient has zero first coordinate.
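Numerically, the Schur-complement form of the variance in Proposition 1 agrees with computing the full quadratic form directly, since the zero first coordinate of the gradient selects the lower-right block of the inverse information matrix. A short check with hypothetical numbers (the matrix entries below are illustrative, not from any dataset):

```python
import numpy as np

# Hypothetical observed information matrix for omega = (pi, mu, nu).
I_m = np.array([[80.0,  5.0, -3.0],
                [ 5.0, 40.0,  2.0],
                [-3.0,  2.0, 60.0]])
K = 200
mu_hat, nu_hat = 10.0, 5.0
theta_hat = mu_hat / nu_hat

grad = np.array([0.0, 1.0 / nu_hat, -mu_hat / nu_hat**2])  # gradient of g = mu/nu

# Full-inverse form of the asymptotic variance of sqrt(K)(theta_hat - theta)...
var_full = K * grad @ np.linalg.inv(I_m) @ grad

# ...and the Schur-complement form appearing in Proposition 1
I11, I12 = I_m[:1, :1], I_m[:1, 1:]
I21, I22 = I_m[1:, :1], I_m[1:, 1:]
S = I22 - I21 @ np.linalg.inv(I11) @ I12
var_schur = K * grad[1:] @ np.linalg.inv(S) @ grad[1:]

se = np.sqrt(var_schur / K)                    # standard error of theta_hat
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
```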

4. The Main Problem

We now turn to our main problem of estimating the ratio of Poisson means $\theta \equiv \mu/\nu$ in a zero-inflated Poisson mixture (ZIPM) model, cf. (1)–(4). Again, we derive an EM algorithm for maximum likelihood estimation of the unknown model parameters (Section 4.1) and subsequently provide an explicit formula for standard errors of the maximum likelihood estimators (Section 4.2). We begin with a few preliminaries.
Note that $N_{ij}$ is an $\epsilon$-mixture of $M_{ij}$ and $O_{ij}$, where $O_{ij}$ is degenerate at 0, i.e., $O_{ij} \sim \mathrm{Poisson}(\lambda = 0)$, while $M_{ij}$ is a $\pi$-mixture of $\mathrm{Poisson}(t_i\mu)$ and $\mathrm{Poisson}(t_i\nu)$ rvs. Thus this problem can be viewed as a three-component Poisson mixture model with one degenerate component and non-i.i.d. observations. The three weights are $\pi\epsilon$, $(1-\pi)\epsilon$, and $1-\epsilon$, with the identifiability constraint $0 < \pi \leq 1/2$.
For notational simplicity, set $\omega = (\pi,\epsilon,\mu,\nu)$. Under this three-component mixture model, the unconditional pmf of the observed data $\mathbf N \equiv (N_{ij})$ is
$$f_\omega(\mathbf n) = \prod_{i,j}\big[\pi\epsilon\,e^{-t_i\mu}(t_i\mu)^{n_{ij}} + (1-\pi)\epsilon\,e^{-t_i\nu}(t_i\nu)^{n_{ij}} + (1-\epsilon)\,0^{n_{ij}}\big]\big/\mathbf n!,$$
where 0 0 = 1 . The joint pmf of the unobserved and observed data ( Y , Z , N ) is given by
$$\begin{aligned} f_\omega(\mathbf y,\mathbf z,\mathbf n) &= f_\pi(\mathbf y)\,f_\epsilon(\mathbf z)\,f_{\mu,\nu}(\mathbf n\mid\mathbf y,\mathbf z) \\ &= \prod_j\pi^{y_j}(1-\pi)^{1-y_j}\prod_{i,j}\epsilon^{z_{ij}}(1-\epsilon)^{1-z_{ij}}\cdot\prod_{i,j}\big[e^{-t_i\mu}(t_i\mu)^{n_{ij}}\big]^{y_jz_{ij}}\big[e^{-t_i\nu}(t_i\nu)^{n_{ij}}\big]^{(1-y_j)z_{ij}}\,0^{n_{ij}(1-z_{ij})}/\mathbf n! \\ &= \big[\pi^{\bar y}(1-\pi)^{1-\bar y}\big]^J\,\big[\epsilon^{\bar z}(1-\epsilon)^{1-\bar z}\big]^K\cdot\big[e^{-\overline{tyz}\,\mu}\,\mu^{\overline{ny}}\,e^{-\overline{t(1-y)z}\,\nu}\,\nu^{\overline{n(1-y)}}\big]^K\,\Xi_{\mathbf t}(\mathbf z,\mathbf n), \end{aligned} \tag{9}$$
where y = { y j } , z = { z i j } , n = { n i j } ,
$$\bar z = \frac1K\sum_{i,j}z_{ij}, \qquad \overline{tz} = \frac1K\sum_{i,j}t_iz_{ij}, \qquad \overline{tyz} = \frac1K\sum_{i,j}t_iy_jz_{ij}, \qquad \Xi_{\mathbf t}(\mathbf z,\mathbf n) = \prod_{i,j}t_i^{n_{ij}z_{ij}}\,0^{n_{ij}(1-z_{ij})}/\mathbf n!,$$
and similarly with y replaced by 1 y . To obtain (9) we have used,
$$\overline{nyz} = \frac1K\sum_{i,j}n_{ij}y_jz_{ij} = \frac1K\sum_{i,j}n_{ij}y_j = \frac1K\sum_j n_jy_j = \overline{ny},$$
$$\Xi_{\mathbf t}(\mathbf z,\mathbf n) = \prod_{i,j}t_i^{n_{ij}}\,0^{n_{ij}(1-z_{ij})}/\mathbf n! = \prod_it_i^{n_i}\cdot\prod_{i,j}0^{n_{ij}(1-z_{ij})}/\mathbf n!,$$
which hold on the support of the model because $n_{ij} \neq 0$ implies $z_{ij} = 1$,
and similarly with $\mathbf y$ replaced by $1-\mathbf y$. Thus, $f_\omega(\mathbf y,\mathbf z,\mathbf n)$ determines an exponential family with sufficient statistic $(\bar Y,\,\bar Z,\,\overline{tYZ},\,\overline{t(1-Y)Z},\,\overline{nY},\,\overline{n(1-Y)})$.

4.1. Estimation via the EM Algorithm

To obtain the MLEs ϵ ^ , π ^ , μ ^ , ν ^ and then θ ^ = μ ^ / ν ^ , it is again straightforward (albeit, notationally challenging) to apply the EM algorithm, as follows:
  • E-Step: Since (9) is an exponential family, Bayes formula shows that for l = 0 , 1 , , the ( l + 1 ) -st E-step imputes y j , y j z i j , ( 1 y j ) z i j , and z i j , respectively, as,
$$\begin{aligned} (\hat y_j)_{l+1} &= E_{\hat\omega_l}[Y_j\mid\mathbf n] = P_{\hat\omega_l}[Y_j=1]\,P_{\hat\omega_l}[\mathbf N_j=\mathbf n_j\mid Y_j=1]\,/\,P_{\hat\omega_l}[\mathbf N_j=\mathbf n_j] \\ &= \frac{\hat\pi_l\prod_i\big[\hat\epsilon_l e^{-t_i\hat\mu_l}(t_i\hat\mu_l)^{n_{ij}} + (1-\hat\epsilon_l)0^{n_{ij}}\big]}{\hat\pi_l\prod_i\big[\hat\epsilon_l e^{-t_i\hat\mu_l}(t_i\hat\mu_l)^{n_{ij}} + (1-\hat\epsilon_l)0^{n_{ij}}\big] + (1-\hat\pi_l)\prod_i\big[\hat\epsilon_l e^{-t_i\hat\nu_l}(t_i\hat\nu_l)^{n_{ij}} + (1-\hat\epsilon_l)0^{n_{ij}}\big]} \\ &= \frac{\hat\pi_l}{\hat\pi_l + (1-\hat\pi_l)\,e^{-t_j(\hat\nu_l-\hat\mu_l)}\,(\hat\nu_l/\hat\mu_l)^{n_j}\prod_i\Big[\dfrac{\hat\epsilon_l e^{-t_i\hat\nu_l}+(1-\hat\epsilon_l)}{\hat\epsilon_l e^{-t_i\hat\mu_l}+(1-\hat\epsilon_l)}\Big]^{1_{ij}^=}}, \end{aligned}$$
$$(\widehat{y_jz_{ij}})_{l+1} = E_{\hat\omega_l}[Y_jZ_{ij}\mid\mathbf n] = (\hat y_j)_{l+1}\cdot\frac{\hat\epsilon_l e^{-t_i\hat\mu_l}(t_i\hat\mu_l)^{n_{ij}}}{\hat\epsilon_l e^{-t_i\hat\mu_l}(t_i\hat\mu_l)^{n_{ij}} + (1-\hat\epsilon_l)0^{n_{ij}}} = (\hat y_j)_{l+1}\cdot\Big[1 + \frac{1-\hat\epsilon_l}{\hat\epsilon_l}\,e^{t_i\hat\mu_l}\Big]^{-1_{ij}^=},$$
$$(\widehat{(1-y_j)z_{ij}})_{l+1} = \big[1-(\hat y_j)_{l+1}\big]\cdot\frac{\hat\epsilon_l e^{-t_i\hat\nu_l}(t_i\hat\nu_l)^{n_{ij}}}{\hat\epsilon_l e^{-t_i\hat\nu_l}(t_i\hat\nu_l)^{n_{ij}} + (1-\hat\epsilon_l)0^{n_{ij}}} = \big[1-(\hat y_j)_{l+1}\big]\cdot\Big[1 + \frac{1-\hat\epsilon_l}{\hat\epsilon_l}\,e^{t_i\hat\nu_l}\Big]^{-1_{ij}^=};$$
$$(\hat z_{ij})_{l+1} = (\widehat{y_jz_{ij}})_{l+1} + (\widehat{(1-y_j)z_{ij}})_{l+1}.$$
Note that $(\widehat{y_jz_{ij}})_{l+1} \neq (\hat y_j)_{l+1}\cdot(\hat z_{ij})_{l+1}$ in general.
  • M-Step: From (9), the complete-data MLEs are found to be,
$$\tilde\pi = \bar y, \qquad \tilde\epsilon = \bar z, \qquad \tilde\mu = \frac{\overline{ny}}{\overline{tyz}}, \qquad \tilde\nu = \frac{\overline{n(1-y)}}{\overline{t(1-y)z}}.$$
Thus the ( l + 1 ) -st M-step yields the updated estimates
$$\hat\pi_{l+1} = \frac1J\sum_j(\hat y_j)_{l+1}, \qquad \hat\epsilon_{l+1} = \frac1K\sum_{i,j}(\hat z_{ij})_{l+1},$$
$$\hat\mu_{l+1} = \frac{\overline{n(\hat y)_{l+1}}}{\overline{t(\widehat{yz})_{l+1}}} = \frac{\sum_j n_j(\hat y_j)_{l+1}}{\sum_{i,j}t_i(\widehat{y_jz_{ij}})_{l+1}}, \qquad \hat\nu_{l+1} = \frac{\overline{n(1-\hat y)_{l+1}}}{\overline{t(\widehat{(1-y)z})_{l+1}}} = \frac{\sum_j n_j\big[1-(\hat y_j)_{l+1}\big]}{\sum_{i,j}t_i(\widehat{(1-y_j)z_{ij}})_{l+1}}.$$
Again, as in Aitkin and Rubin [10] (1985, p. 69), the constraint $\pi \leq \frac12$ is ignored and, assuming convergence to an MLE $(\hat\pi,\hat\epsilon,\hat\mu,\hat\nu)$, the same maximum value occurs at $(1-\hat\pi,\hat\epsilon,\hat\nu,\hat\mu)$. Thus, the MLE is taken to be the one whose first component is $\leq\frac12$, say $(\hat\pi,\hat\epsilon,\hat\mu,\hat\nu)$. This concludes the EM algorithm.
Using estimates $(\hat\pi,\hat\epsilon,\hat\mu,\hat\nu)$ from the EM algorithm, we obtain the updated estimator $\hat\theta_{l+1} \equiv \hat\mu_{l+1}/\hat\nu_{l+1}$. Note that unlike in (8), $\hat\theta_{l+1}$ depends on $\{t_i\}$.
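The E-step imputations and M-step updates above can be sketched as follows. This is again an illustrative single-start NumPy implementation with a fixed iteration count and no convergence check; `em_zipm` and the initializations are our own choices.

```python
import numpy as np

def em_zipm(n, t, n_iter=300):
    """EM for the ZIPM model of Section 4.1.

    n : I x J array of observed counts; t : length-I exposure vector.
    """
    pi, eps = 0.3, 0.7
    mu = n[n > 0].mean() * 1.5
    nu = n[n > 0].mean() * 0.5
    pos = n > 0
    for _ in range(n_iter):
        # E-step: column log-likelihoods under each component
        def col_loglik(lam):
            rate = t[:, None] * lam
            cell = np.where(pos, np.log(eps) - rate + n * np.log(rate),
                            np.log(eps * np.exp(-rate) + 1.0 - eps))
            return cell.sum(axis=0)
        ll1 = np.log(pi) + col_loglik(mu)        # Y_j = 1 (mu component)
        ll0 = np.log(1.0 - pi) + col_loglik(nu)  # Y_j = 0 (nu component)
        y = 1.0 / (1.0 + np.exp(np.clip(ll0 - ll1, -700, 700)))  # E[Y_j | n]
        # E[Z_ij | Y_j, n]: 1 on nonzero cells, else a zero-cell posterior
        z1 = np.where(pos, 1.0, 1.0 / (1.0 + (1 - eps) / eps * np.exp(t[:, None] * mu)))
        z0 = np.where(pos, 1.0, 1.0 / (1.0 + (1 - eps) / eps * np.exp(t[:, None] * nu)))
        yz = y[None, :] * z1                     # E[Y_j Z_ij | n]
        wz = (1.0 - y)[None, :] * z0             # E[(1 - Y_j) Z_ij | n]
        # M-step
        pi = y.mean()
        eps = (yz + wz).mean()
        mu = (n.sum(axis=0) * y).sum() / (t[:, None] * yz).sum()
        nu = (n.sum(axis=0) * (1.0 - y)).sum() / (t[:, None] * wz).sum()
    if pi > 0.5:                                 # identifiability: pi <= 1/2
        pi, mu, nu = 1.0 - pi, nu, mu
    return pi, eps, mu, nu, mu / nu
```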

4.2. Standard Error for the MLE θ ^

We provide an explicit formula for approximating standard errors of the unknown parameters $(\pi,\epsilon,\mu,\nu)$, and thereby, of $\theta \equiv \mu/\nu$.
Theorem 2.
Assume the zero-inflated Poisson mixture (ZIPM) model presented in (1)–(4), where $\mathbf N = \mathbf n$ is observed and $\mathbf Y$, $\mathbf Z$, and $\mathbf M$ are unobserved. If $K \equiv IJ$ is large, then
$$\sqrt K(\hat\omega - \omega) \xrightarrow{d} N_4\big[\mathbf 0,\; K\,\mathcal I_{\mathbf n}^{-1}(\hat\omega)\big].$$
The observed information matrix $\mathcal I_{\mathbf n}(\omega) \equiv -\partial_\omega^2\log f_\omega(\mathbf n)$ is given explicitly as follows:
$$\mathcal I_{\mathbf n}(\omega) = T_1 + T_2,$$
$$T_1 = \mathrm{diag}\left(\frac{J[(1-2\pi)\bar q + \pi^2]}{\pi^2(1-\pi)^2},\;\frac{K[(1-2\epsilon)\rho + \epsilon^2]}{\epsilon^2(1-\epsilon)^2},\;\frac{K\,\overline{nq}}{\mu^2},\;\frac{K\,\overline{n(1-q)}}{\nu^2}\right),$$
$$\begin{aligned} T_2 = &-\pi(1-\pi)\sum_j\frac{e^{-t_j(\mu+\nu)}(\mu\nu)^{n_j}\prod_i\big\{[\epsilon(e^{-t_i\mu}-1)+1][\epsilon(e^{-t_i\nu}-1)+1]\big\}^{1_{ij}^=}}{\psi_j^2}\,\phi_j\phi_j^\top \\ &-\epsilon(1-\epsilon)\sum_{i,j}1_{ij}^=\left\{\frac{e^{-t_i\mu}}{[\epsilon(e^{-t_i\mu}-1)+1]^2}\,\chi_i^{(1)}\chi_i^{(1)\top}\,q_j + \frac{e^{-t_i\nu}}{[\epsilon(e^{-t_i\nu}-1)+1]^2}\,\chi_i^{(0)}\chi_i^{(0)\top}\,(1-q_j)\right\}, \end{aligned}$$
$$\rho = \frac{1(\mathbf n)}{K} + \frac{\epsilon}{K}\sum_{i,j}1_{ij}^=\left[\frac{q_j}{\epsilon + (1-\epsilon)e^{t_i\mu}} + \frac{1-q_j}{\epsilon + (1-\epsilon)e^{t_i\nu}}\right],$$
$$q_j = \frac{\pi e^{-t_j\mu}\mu^{n_j}\prod_i\big[\epsilon e^{-t_i\mu} + (1-\epsilon)\big]^{1_{ij}^=}}{\pi e^{-t_j\mu}\mu^{n_j}\prod_i\big[\epsilon e^{-t_i\mu} + (1-\epsilon)\big]^{1_{ij}^=} + (1-\pi)e^{-t_j\nu}\nu^{n_j}\prod_i\big[\epsilon e^{-t_i\nu} + (1-\epsilon)\big]^{1_{ij}^=}},$$
$$\psi_j = \pi e^{-t_j\mu}\mu^{n_j}\prod_i\big[\epsilon(e^{-t_i\mu}-1)+1\big]^{1_{ij}^=} + (1-\pi)e^{-t_j\nu}\nu^{n_j}\prod_i\big[\epsilon(e^{-t_i\nu}-1)+1\big]^{1_{ij}^=},$$
$$\phi_j = \begin{pmatrix} \dfrac{1}{\pi(1-\pi)} \\[6pt] \displaystyle\sum_i 1_{ij}^=\,\frac{e^{-t_i\mu}-e^{-t_i\nu}}{[\epsilon(e^{-t_i\mu}-1)+1][\epsilon(e^{-t_i\nu}-1)+1]} \\[6pt] \dfrac{n_j}{\mu} - t_j - \epsilon\displaystyle\sum_i 1_{ij}^=\,\frac{t_ie^{-t_i\mu}}{\epsilon(e^{-t_i\mu}-1)+1} \\[6pt] -\Big[\dfrac{n_j}{\nu} - t_j - \epsilon\displaystyle\sum_i 1_{ij}^=\,\frac{t_ie^{-t_i\nu}}{\epsilon(e^{-t_i\nu}-1)+1}\Big] \end{pmatrix}, \qquad \chi_i^{(1)} = \begin{pmatrix} 0 \\ \dfrac{1}{\epsilon(1-\epsilon)} \\ -t_i \\ 0 \end{pmatrix}, \qquad \chi_i^{(0)} = \begin{pmatrix} 0 \\ \dfrac{1}{\epsilon(1-\epsilon)} \\ 0 \\ -t_i \end{pmatrix},$$
where $q_j = P_\omega[Y_j = 1\mid\mathbf n_j]$, $\bar q = \frac1J\sum_j q_j$, $\overline{nq} = \frac1K\sum_j n_jq_j$, and $\overline{n(1-q)} = \bar n - \overline{nq}$.
The proof of this theorem also appears in Appendix B.
Again, Theorem 2 provides an approximate confidence interval for the parameter of interest, θ = μ / ν :
Proposition 2.
Under the conditions of Theorem 2, an approximate ( 1 α ) confidence interval for θ is given by,
$$\hat\theta \pm \frac{\hat\tau}{\sqrt K}\,z_{\alpha/2},$$
where $z_{\alpha/2}$ is the $(1-\frac\alpha2)$-quantile of the standard normal distribution,
$$\hat\tau^2 = K\left(\frac1{\hat\nu},\,-\frac{\hat\mu}{\hat\nu^2}\right)\big(\mathcal I_{22} - \mathcal I_{21}\mathcal I_{11}^{-1}\mathcal I_{12}\big)^{-1}\left(\frac1{\hat\nu},\,-\frac{\hat\mu}{\hat\nu^2}\right)^\top,$$
and I n ( ω ^ ) is partitioned as
$$\mathcal I_{\mathbf n}(\hat\omega) = \begin{pmatrix} \mathcal I_{11} & \mathcal I_{12} \\ \mathcal I_{21} & \mathcal I_{22} \end{pmatrix},$$
with $\mathcal I_{11}: 2\times2$, $\mathcal I_{22}: 2\times2$, $\mathcal I_{12}: 2\times2$, and $\mathcal I_{21} = \mathcal I_{12}^\top$.
Proof. 
Again, an approximate confidence interval for $\theta \equiv \mu/\nu \equiv g(\omega)$ is obtained by propagation of error. For $\hat\theta = \hat\mu/\hat\nu$,
$$\begin{aligned} \sqrt K(\hat\theta-\theta) &\xrightarrow{d} N\big[0,\;K\,(\partial_\omega g(\omega)\vert_{\hat\omega})^\top\,\mathcal I_{\mathbf n}^{-1}(\hat\omega)\,\partial_\omega g(\omega)\vert_{\hat\omega}\big] \\ &= N\Big[0,\;K\Big(0,\,0,\,\frac1{\hat\nu},\,-\frac{\hat\mu}{\hat\nu^2}\Big)\,\mathcal I_{\mathbf n}^{-1}(\hat\omega)\,\Big(0,\,0,\,\frac1{\hat\nu},\,-\frac{\hat\mu}{\hat\nu^2}\Big)^\top\Big] \\ &= N\Big[0,\;K\Big(\frac1{\hat\nu},\,-\frac{\hat\mu}{\hat\nu^2}\Big)\big(\mathcal I_{22}-\mathcal I_{21}\mathcal I_{11}^{-1}\mathcal I_{12}\big)^{-1}\Big(\frac1{\hat\nu},\,-\frac{\hat\mu}{\hat\nu^2}\Big)^\top\Big] \equiv N(0,\hat\tau^2). \end{aligned}$$

5. Simulation and Data Analysis

The frequentist estimation procedure for ZIPM models in Section 4 is now applied to simulated and real data.

5.1. Simulation Study

Simulated data is used to assess the estimation error and confidence interval coverage in various regimes. Across all simulations, we set $\mu = 10$ and $\nu = 5$ so that $\theta = 2$. Across simulations we vary I and J to assess accuracy as the overall amount of available data changes, vary $\pi$ to assess accuracy as the relative prevalence between the more or less prevalent groups becomes more severe, and vary $\epsilon$ to assess accuracy as zero-inflation becomes more severe. Specifically, for each combination of $I \in \{5, 10, 20, 40, 80\}$, $J \in \{5, 10, 20, 40, 80\}$, $\pi \in \{0.1, 0.25, 0.4\}$, and $\epsilon \in \{0.6, 0.7, 0.8\}$, we generate 200 independent datasets and estimate $\theta$ using the EM algorithm from Section 4.1 (with 20 random starts), as well as a 95% confidence interval using Proposition 2. Estimation error and nominal coverage of confidence intervals are shown in Figure 1 and Figure 2, respectively. (The information shown in these figures is presented in tabular form in Appendix C.)
We observe that the methods derived in Section 4 yield accurate estimates of θ and well-calibrated confidence intervals. Figure 1 shows that mean absolute error decreases in I, J, π , and ϵ , and is generally small. The decrease in error as π and ϵ increase may be attributed to the corresponding increase in sample size for the less prevalent component and the decreasing amount of zero-inflation, respectively.
Figure 2 shows that coverage hovers close to 95% for most combinations of I, J, $\pi$, and $\epsilon$. We observe that error is somewhat larger and coverage is somewhat inaccurate when $I, J \in \{5, 10\}$. However, these inaccuracies are modest given the highly limited data availability in these simulation scenarios.

5.2. Analysis of Frigatebird Nest Counts

We study ecological count data on frigatebirds in the Coral Sea Islands off the coast of Northeast Australia, as described in [1]. (The specific data studied herein was provided via email by author G. Barry Baker.) They obtained counts of frigatebird nests over 11 standardized sites across 4 separate time points.
This data is relevant to our study for two reasons: First, the ecological count data is zero-inflated. Of the 44 unique combinations of sites and time points in which data was collected, 7 had 0 nests (about 15.9%). Second, the frigatebird species has two subspecies, least (LFB) and greater (GFB). In the observed data, some nests were specified as LFB nests or GFB nests, but the majority were unidentified (Table 1). We analyze counts of only unidentified frigatebird nests, N i j , by site i and time point j (Table 2).
We applied our work from Section 4 to the unidentified nest counts in order to estimate the ratio $\theta \equiv \mu/\nu$, where $\mu$ and $\nu$ denote the expected numbers of nests per site for the less prevalent LFB and more prevalent GFB, respectively (based on the numbers of nests that could be identified; see Table 1). We set $t_i = 1$ for each site i in the absence of additional information on each site. (To assess the sensitivity of our results to this assumption, we ran an additional analysis in which $t_i$ is iteratively updated during the EM algorithm. We find estimates of $\theta$ are nearly unchanged; see Appendix C for details.) During estimation, the EM algorithm was run with 1000 random initializers.
Results appear in Table 3. We estimate that 25% of nests belong to LFB ($\hat\pi = 0.25$), 75% belong to GFB, and that 16% of observed nest counts are zero-inflated ($1-\hat\epsilon = 0.16$). The EM algorithm yields the MLE $\hat\theta = 3.65$ for the ratio $\theta$; the 95% confidence interval for $\theta$ is (3.23, 4.08).

6. Conclusions

In this paper, we studied the zero-inflated Poisson mixture (ZIPM) model in the frequentist setting. In addition to deriving an EM algorithm for point-estimation of model parameters, we stated an explicit formula for estimating standard errors of the MLEs. As a preliminary, we derived analogous results for the commonly-used, two-component Poisson mixture model. Although somewhat complex notationally, our formulae are straightforward to apply.
Our results were applied to real data on frigatebirds in the Coral Sea Islands off the coast of Northeast Australia, where the ratio between two subspecies is of interest to ecologists. In this setting, knowledge of which subspecies is more prevalent renders the model identifiable. We then used only unlabeled, zero-inflated nest count data to estimate (i) the relative abundance of each subspecies, (ii) the rate of zero-inflation, (iii) the mean number of nests per site for each subspecies, and (iv) the ratio of mean nests per site between the subspecies. We expect the ZIPM model to be useful in other ecological count data settings. Hence, our work provides straightforward ways for practitioners to estimate key parameters of interest.

Author Contributions

Conceptualization, M.P. and M.D.P.; methodology, M.P. and M.D.P.; software, M.P.; validation, M.P. and M.D.P.; writing—original draft preparation, M.P. and M.D.P.; writing—review and editing, M.P. and M.D.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code and data required to reproduce our analyses can be found at https://github.com/pearce790/ZIPM, accessed on 2 July 2025.

Acknowledgments

We thank the three anonymous referees for their helpful feedback during the peer review process. Furthermore, we are grateful to Barry Baker for providing the frigatebird data used in Section 5.2, and to Jon Wellner for his generous and always-insightful comments.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Conditional ZIPM = ZTP?

Consider the following two subsets of the index set K and two subarrays of the data array N ( N i j ) :
$$\Omega_{\mathbf Z} = \{(i,j)\mid Z_{ij}=1\}, \qquad \Omega_{\mathbf N} = \{(i,j)\mid N_{ij}\neq0\},$$
$$\mathbf N_{\mathbf Z} = (N_{ij}\mid Z_{ij}=1) = (M_{ij}\mid Z_{ij}=1), \qquad \mathbf N_{\neq} = (N_{ij}\mid N_{ij}\neq0) = (M_{ij}\mid M_{ij}\neq0).$$
Both $\Omega_{\mathbf Z}$ and $\Omega_{\mathbf N}$ are random subsets; $\Omega_{\mathbf Z}$ is unobserved, $\Omega_{\mathbf N}$ is observed, and $\Omega_{\mathbf N} \subseteq \Omega_{\mathbf Z}$, so $\mathbf N_{\neq} \subseteq \mathbf N_{\mathbf Z}$. Because $\mathbf M$ is independent of $\mathbf Z$, $\mathbf N_{\mathbf Z}$ is a random subarray of the i.n.i.d. array $(M_{ij})$, where membership in $\mathbf N_{\mathbf Z}$ depends only on $\mathbf Z$. Thus $\mathbf N_{\neq}$ is also a (smaller) random subarray of the i.n.i.d. array $(M_{ij})$, where membership in $\mathbf N_{\neq}$ depends on both $\mathbf Z$ and the events $\{M_{ij} \neq 0\}$.
The latter fact suggests a question: Is the conditional distribution of the two-component ZIPM rv $N_{ij}$ given $N_{ij} \neq 0$ the same as the distribution of the mixture of the conditional distributions of the two Poisson components given that each is non-zero? The latter conditional distribution is the well-known zero-truncated Poisson (ZTP) distribution, also called positive Poisson, which has been thoroughly studied [15]. The ZTP distribution model also is an exponential family, with pmf given by
$$g_\lambda(x) = \frac{\lambda^x}{(e^\lambda - 1)\,x!}, \qquad x = 1, 2, \dots.$$
If the answer to the above question is yes, then estimation of π , μ , ν and thus θ could be based on only the set of non-zero N i j . That is, discard all 0’s and view the remaining N i j as π -mixtures of two ZTP components with parameters t i μ and t i ν . Because this involves only two mixture components rather than three as above, both being exponential families, and neither is degenerate, estimation methods such as the EM algorithm would be easier to carry out.
Unfortunately the answer to the question is no. If we abbreviate N i j by N, M i j by M, and Z i j by Z, then the question can be expressed as follows:
Is
$$P[N = x \mid N \neq 0] = \frac{\pi\,\mu^x}{(e^\mu - 1)\,x!} + \frac{(1-\pi)\,\nu^x}{(e^\nu - 1)\,x!}, \qquad x = 1, 2, \dots?$$
However, for x 1 ,
$$P[N = x \mid N \neq 0] = \frac{P[ZM = x,\, ZM \neq 0]}{P[ZM \neq 0]} = \frac{P[M = x,\, Z = 1,\, M \neq 0]}{P[Z = 1,\, M \neq 0]} = \frac{P[M = x,\, M \neq 0]}{P[M \neq 0]} = \frac{\dfrac{\pi\mu^xe^{-\mu}}{x!} + \dfrac{(1-\pi)\nu^xe^{-\nu}}{x!}}{1 - \pi e^{-\mu} - (1-\pi)e^{-\nu}},$$
since M and Z are independent, so the question becomes:
Is
$$\frac{\dfrac{\pi\mu^xe^{-\mu}}{x!} + \dfrac{(1-\pi)\nu^xe^{-\nu}}{x!}}{1 - \pi e^{-\mu} - (1-\pi)e^{-\nu}} = \frac{\pi\mu^x}{(e^\mu - 1)\,x!} + \frac{(1-\pi)\nu^x}{(e^\nu - 1)\,x!}, \qquad x = 1, 2, \dots?$$
After some algebra, this equation simplifies to
$$\left(\frac{\mu}{\nu}\right)^x = \frac{e^\mu - 1}{e^\nu - 1},$$
which cannot hold for all $x \geq 1$ unless $\mu = \nu$.
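The two pmfs in the displayed question can also be compared numerically. A quick sketch (the parameter values are illustrative) confirms that both are proper pmfs yet differ by a non-negligible amount:

```python
import math

def pois(x, lam):
    """Poisson pmf."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def zipm_cond(x, pi, mu, nu):
    """P[N = x | N != 0] for a single ZIPM cell (t_i = 1); note that the
    zero-inflation parameter epsilon cancels, as in the derivation above."""
    num = pi * pois(x, mu) + (1 - pi) * pois(x, nu)
    den = 1 - pi * math.exp(-mu) - (1 - pi) * math.exp(-nu)
    return num / den

def ztp_mix(x, pi, mu, nu):
    """pi-mixture of two zero-truncated Poisson pmfs g_mu and g_nu."""
    g = lambda x, lam: lam**x / ((math.exp(lam) - 1) * math.factorial(x))
    return pi * g(x, mu) + (1 - pi) * g(x, nu)

pi, mu, nu = 0.25, 2.0, 0.5
diff = max(abs(zipm_cond(x, pi, mu, nu) - ztp_mix(x, pi, mu, nu))
           for x in range(1, 40))
```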

Appendix B. Proofs of Theorems 1 and 2

Proof of Theorem 1.
As previously noted, it follows from [12,13,14] that for large K,
$$\sqrt K(\hat\omega - \omega) \xrightarrow{d} N_3\big[\mathbf 0,\; K\,\mathcal I_{\mathbf m}^{-1}(\omega)\big], \tag{A1}$$
$$\mathcal I_{\mathbf m}(\omega) \equiv -\partial_\omega^2\log f_\omega(\mathbf m) = E_\omega\big[-\partial_\omega^2\log f_\omega(\mathbf m)\mid\mathbf m\big] = E_\omega\big[-\partial_\omega^2\log f_\omega(\mathbf Y,\mathbf m)\mid\mathbf m\big] + E_\omega\big[\partial_\omega^2\log f_\omega(\mathbf Y\mid\mathbf m)\mid\mathbf m\big]. \tag{A2}$$
From (6),
$$\begin{aligned} \log f_\omega(\mathbf Y,\mathbf m) &= J\big[\bar Y\log\pi + (1-\bar Y)\log(1-\pi)\big] + K\big[\overline{mY}\log\mu - \bar t\,\bar Y\mu + \overline{m(1-Y)}\log\nu - \bar t(1-\bar Y)\nu\big] + h_{\mathbf t}(\mathbf m); \\ \partial_\omega\log f_\omega(\mathbf Y,\mathbf m) &= \left(\frac{J(\bar Y-\pi)}{\pi(1-\pi)},\; K\Big(\frac{\overline{mY}}{\mu} - \bar t\,\bar Y\Big),\; K\Big(\frac{\overline{m(1-Y)}}{\nu} - \bar t(1-\bar Y)\Big)\right)^\top; \\ -\partial_\omega^2\log f_\omega(\mathbf Y,\mathbf m) &= \begin{pmatrix} \dfrac{J[(1-2\pi)\bar Y + \pi^2]}{\pi^2(1-\pi)^2} & 0 & 0 \\ 0 & \dfrac{K\,\overline{mY}}{\mu^2} & 0 \\ 0 & 0 & \dfrac{K\,\overline{m(1-Y)}}{\nu^2} \end{pmatrix}; \end{aligned} \tag{A3}$$
where m Y ¯ and m ( 1 Y ) ¯ are defined similarly to m y ¯ and h t ( m ) does not depend on ω . Furthermore by (6), for fixed m ,
$$f_\omega(\mathbf y\mid\mathbf m) = f_\omega(\mathbf y,\mathbf m)/f_\omega(\mathbf m) \propto \prod_j\big[\pi e^{-\bar tI\mu}\mu^{m_j}\big]^{y_j}\big[(1-\pi)e^{-\bar tI\nu}\nu^{m_j}\big]^{1-y_j}, \tag{A4}$$
hence Y 1 , , Y J are conditionally independent given m with
$$[Y_j\mid\mathbf m] \sim \mathrm{Bernoulli}(p_j), \qquad p_j \equiv p_j(\omega;\bar t;\bar m_j) = \frac{\pi e^{-\bar tI\mu}\mu^{m_j}}{\pi e^{-\bar tI\mu}\mu^{m_j} + (1-\pi)e^{-\bar tI\nu}\nu^{m_j}} \tag{A5}$$
$$= \frac{\pi\big(e^{-\bar t\mu}\mu^{\bar m_j}\big)^I}{\pi\big(e^{-\bar t\mu}\mu^{\bar m_j}\big)^I + (1-\pi)\big(e^{-\bar t\nu}\nu^{\bar m_j}\big)^I}, \tag{A6}$$
where $\bar m_j = m_j/I$. From (A5),
$$E_\omega[\bar Y \mid \boldsymbol m] = \bar p; \qquad E_\omega[\overline{mY} \mid \boldsymbol m] = \overline{mp}; \qquad E_\omega[\overline{m(1-Y)} \mid \boldsymbol m] = \overline{m(1-p)};$$
where $\overline{mp}$ and $\overline{m(1-p)}$ are defined similarly to $\overline{my}$ and $\overline{m(1-y)}$, and
$$\bar p = \frac{1}{J}\sum_j p_j, \qquad \overline{p(1-p)} = \frac{1}{J}\sum_j p_j(1-p_j).$$
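In practice the posterior probabilities $p_j$ of (A6) are best evaluated on the log scale to avoid underflow when $I$ is large. A minimal sketch, with all parameter values assumed purely for illustration:

```python
import math

# Hypothetical values of pi, mu, nu, t-bar, and I; mbar_j is the mean count
# at site j. Working on the log scale keeps the ratio in (A6) stable.
pi, mu, nu, tbar, I = 0.3, 2.0, 6.0, 1.0, 10

def p_j(mbar_j):
    """Posterior probability that site j belongs to the mu component."""
    la = math.log(pi) + I * (mbar_j * math.log(mu) - tbar * mu)
    lb = math.log(1 - pi) + I * (mbar_j * math.log(nu) - tbar * nu)
    hi = max(la, lb)                      # log-sum-exp trick
    return math.exp(la - hi) / (math.exp(la - hi) + math.exp(lb - hi))

print(p_j(2.0), p_j(6.0))   # mean counts near mu favor the first component
```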
Furthermore,
$$E_\omega\left[(1-2\pi)\bar Y + \pi^2 \mid \boldsymbol m\right] = (1-2\pi)\bar p + \pi^2.$$
Thus from (A3), the first term in (A2) is given by
$$E_\omega\left[-\partial_\omega^2 \log f_\omega(\boldsymbol Y, \boldsymbol m) \mid \boldsymbol m\right] = \begin{pmatrix} \dfrac{J\left[(1-2\pi)\bar p + \pi^2\right]}{\pi^2(1-\pi)^2} & 0 & 0 \\ 0 & \dfrac{K\,\overline{mp}}{\mu^2} & 0 \\ 0 & 0 & \dfrac{K\,\overline{m(1-p)}}{\nu^2} \end{pmatrix} =: D(\omega; \bar t; \boldsymbol m). \quad \text{(A7)}$$
The second term in (A2) is obtained as follows: From (A5),
$$\begin{aligned}
f_\omega(\boldsymbol Y \mid \boldsymbol m) &= \prod_j p_j^{Y_j} (1-p_j)^{1-Y_j}; \\
\log f_\omega(\boldsymbol Y \mid \boldsymbol m) &= \sum_j \left[ Y_j \log p_j + (1-Y_j)\log(1-p_j) \right]; \\
\partial_\omega \log f_\omega(\boldsymbol Y \mid \boldsymbol m) &= \sum_j \frac{(Y_j - p_j)}{p_j(1-p_j)}\, \partial_\omega p_j; \\
\partial_\omega^2 \log f_\omega(\boldsymbol Y \mid \boldsymbol m) &= \sum_j \left\{ \frac{(Y_j - p_j)\,\partial_\omega^2 p_j - (\partial_\omega p_j)(\partial_\omega p_j)^\top}{p_j(1-p_j)} - \frac{(Y_j - p_j)\,(\partial_\omega p_j)\left[\partial_\omega \left(p_j(1-p_j)\right)\right]^\top}{p_j^2(1-p_j)^2} \right\}; \\
E_\omega\left[\partial_\omega^2 \log f_\omega(\boldsymbol Y \mid \boldsymbol m) \mid \boldsymbol m\right] &= -\sum_j \frac{(\partial_\omega p_j)(\partial_\omega p_j)^\top}{p_j(1-p_j)} = \sum_j (\partial_\omega \log p_j)\left[\partial_\omega \log(1-p_j)\right]^\top.
\end{aligned}$$
From (A6),
$$\log p_j = \log \pi - I\bar t\,\mu + I \bar m_j \log \mu - \log \gamma_j, \qquad \gamma_j \equiv \gamma_j(\omega;\bar t;\bar m_j) := \pi\left(e^{-\bar t\mu}\mu^{\bar m_j}\right)^I + (1-\pi)\left(e^{-\bar t\nu}\nu^{\bar m_j}\right)^I,$$
from which it can be shown that
$$\begin{aligned}
\partial_\omega \log p_j &= \frac{\left(e^{-\bar t\nu}\nu^{\bar m_j}\right)^I}{\gamma_j}\left( \frac{1}{\pi},\;\; (1-\pi)\left(\frac{\bar m_j}{\mu} - \bar t\right) I,\;\; -(1-\pi)\left(\frac{\bar m_j}{\nu} - \bar t\right) I \right)^{\!\top}, \\
\partial_\omega \log(1-p_j) &= -\frac{p_j}{1-p_j}\,\partial_\omega \log p_j = -\frac{\pi}{1-\pi}\left(e^{\bar t(\nu-\mu)}\left(\frac{\mu}{\nu}\right)^{\bar m_j}\right)^{\!I} \partial_\omega \log p_j \\
&= -\frac{\left(e^{-\bar t\mu}\mu^{\bar m_j}\right)^I}{\gamma_j}\left( \frac{1}{1-\pi},\;\; \pi\left(\frac{\bar m_j}{\mu} - \bar t\right) I,\;\; -\pi\left(\frac{\bar m_j}{\nu} - \bar t\right) I \right)^{\!\top}, \\
(\partial_\omega \log p_j)\left[\partial_\omega \log(1-p_j)\right]^\top &= -\frac{\left[e^{-\bar t(\mu+\nu)}(\mu\nu)^{\bar m_j}\right]^I}{\gamma_j^2}\, \delta_j \delta_j^\top, \\
\delta_j \equiv \delta_j(\omega;\bar t;\bar m_j) &:= \left( \frac{1}{\sqrt{\pi(1-\pi)}},\;\; I\sqrt{\pi(1-\pi)}\left(\frac{\bar m_j}{\mu} - \bar t\right),\;\; -I\sqrt{\pi(1-\pi)}\left(\frac{\bar m_j}{\nu} - \bar t\right) \right)^{\!\top}.
\end{aligned}$$
Therefore
$$E_\omega\left[\partial_\omega^2 \log f_\omega(\boldsymbol Y \mid \boldsymbol m) \mid \boldsymbol m\right] = -\sum_j \frac{\left[e^{-\bar t(\mu+\nu)}(\mu\nu)^{\bar m_j}\right]^I}{\gamma_j^2}\,\delta_j\delta_j^\top. \quad \text{(A8)}$$
Thus by (A2), (A7), and (A8), the observed information matrix is
$$I_m(\omega) = D(\omega;\bar t;\boldsymbol m) - e^{-\bar t I(\mu+\nu)}\,\Delta(\omega;\bar t;\boldsymbol m)\,\Delta(\omega;\bar t;\boldsymbol m)^\top; \qquad \Delta(\omega;\bar t;\boldsymbol m) := \left( \frac{(\mu\nu)^{I\bar m_1/2}}{\gamma_1}\,\delta_1, \;\dots,\; \frac{(\mu\nu)^{I\bar m_J/2}}{\gamma_J}\,\delta_J \right).$$
Finally, we may now estimate $I_m(\omega)$ in the normal approximation
$$\sqrt{K}\,(\hat\omega - \omega) \;\approx\; N_3\!\left[0,\; K\, I_m^{-1}(\omega)\right]$$
by replacing $\omega$ in $I_m(\omega)$ by its MLE $\hat\omega \equiv (\hat\pi, \hat\mu, \hat\nu)$ to obtain
$$\sqrt{K}\,(\hat\omega - \omega) \;\approx\; N_3\!\left[0,\; K\, I_m^{-1}(\hat\omega)\right].$$
This requires replacing $\pi, \mu, \nu$ by $\hat\pi, \hat\mu, \hat\nu$ wherever the former three appear in the entries of $I_m(\omega)$, including in $p_j$, $\delta_j$, and $\gamma_j$. For large $K$ the $3\times3$ matrix $I_m(\hat\omega)$ is positive definite, hence invertible. □
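The closed-form observed information of Theorem 1 can be sanity-checked numerically. The sketch below (Python; simulated data with $t_i = 1$, illustrative sizes, and the true parameters standing in for the MLE; the helper names `loglik` and `observed_info` are ours) builds the observed information by finite differences of the observed-data log-likelihood of the two-component Poisson mixture and converts it to standard errors, mirroring the normal approximation above:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
I, J = 20, 40                              # sites x time points (illustrative)
pi0, mu0, nu0 = 0.3, 2.0, 6.0              # assumed parameter values
y = rng.random(J) < pi0                    # latent component labels
m = rng.poisson(np.where(y, mu0, nu0), size=(I, J))
lgf = np.vectorize(math.lgamma)(m + 1.0)   # log(m_ij!) precomputed once

def loglik(w):
    """Observed-data log-likelihood; each column j is mixed over components."""
    pi, mu, nu = w
    a = math.log(pi) + np.sum(m * math.log(mu) - mu - lgf, axis=0)
    b = math.log(1 - pi) + np.sum(m * math.log(nu) - nu - lgf, axis=0)
    hi = np.maximum(a, b)                  # log-sum-exp per column
    return float(np.sum(hi + np.log(np.exp(a - hi) + np.exp(b - hi))))

def observed_info(f, w, h=1e-4):
    """Negative Hessian of f at w via central finite differences."""
    H = np.empty((3, 3))
    for r in range(3):
        for c in range(3):
            dr, dc = h * np.eye(3)[r], h * np.eye(3)[c]
            H[r, c] = (f(w + dr + dc) - f(w + dr - dc)
                       - f(w - dr + dc) + f(w - dr - dc)) / (4 * h * h)
    return -H

w_hat = np.array([pi0, mu0, nu0])          # stand-in for the MLE omega-hat
info = observed_info(loglik, w_hat)
se = np.sqrt(np.diag(np.linalg.inv(info))) # standard errors of (pi, mu, nu)
print(se)
```

With an analytic implementation of $D(\omega;\bar t;\boldsymbol m)$ and $\Delta(\omega;\bar t;\boldsymbol m)$ in hand, agreement with this finite-difference matrix provides a useful unit test.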
Proof of Theorem 2.
It follows from [12,13,14] that for large $K$,
$$\sqrt{K}\,(\hat\omega - \omega) \;\approx\; N_4\!\left[0,\; K\, I_n^{-1}(\omega)\right],$$
where $I_n(\omega)$ is the $4\times4$ observed information matrix. Then,
$$I_n(\omega) \equiv -\partial_\omega^2 \log f_\omega(\boldsymbol n) = E_\omega\!\left[-\partial_\omega^2 \log f_\omega(\boldsymbol n) \mid \boldsymbol n\right] = E_\omega\!\left[-\partial_\omega^2 \log f_\omega(\boldsymbol Y, \boldsymbol Z, \boldsymbol n) \mid \boldsymbol n\right] + E_\omega\!\left[\partial_\omega^2 \log f_\omega(\boldsymbol Y, \boldsymbol Z \mid \boldsymbol n) \mid \boldsymbol n\right] \equiv T_1 + T_2. \quad \text{(A9)}$$
By (9),
$$\begin{aligned}
\log f_\omega(\boldsymbol Y, \boldsymbol Z, \boldsymbol n) &= J\left[\bar Y\log\pi + (1-\bar Y)\log(1-\pi)\right] + K\left[\bar Z\log\epsilon + (1-\bar Z)\log(1-\epsilon)\right] \\
&\quad + K\left[\overline{nY}\log\mu - \overline{tYZ}\,\mu + \overline{n(1-Y)}\log\nu - \overline{t(1-Y)Z}\,\nu\right] + \log\Xi_t(\boldsymbol n, \boldsymbol z); \\
\partial_\omega \log f_\omega(\boldsymbol Y, \boldsymbol Z, \boldsymbol n) &= \left( \frac{J(\bar Y - \pi)}{\pi(1-\pi)},\;\; \frac{K(\bar Z - \epsilon)}{\epsilon(1-\epsilon)},\;\; \frac{K\,\overline{nY}}{\mu} - K\,\overline{tYZ},\;\; \frac{K\,\overline{n(1-Y)}}{\nu} - K\,\overline{t(1-Y)Z} \right)^{\!\top}; \\
-\partial_\omega^2 \log f_\omega(\boldsymbol Y, \boldsymbol Z, \boldsymbol n) &= \mathrm{diag}\!\left( \frac{J\left[(1-2\pi)\bar Y + \pi^2\right]}{\pi^2(1-\pi)^2},\;\; \frac{K\left[(1-2\epsilon)\bar Z + \epsilon^2\right]}{\epsilon^2(1-\epsilon)^2},\;\; \frac{K\,\overline{nY}}{\mu^2},\;\; \frac{K\,\overline{n(1-Y)}}{\nu^2} \right). \quad \text{(A10)}
\end{aligned}$$
Furthermore by (9), with $\boldsymbol n$ fixed,
$$f_\omega(\boldsymbol y, \boldsymbol z \mid \boldsymbol n) = f_\omega(\boldsymbol y, \boldsymbol z, \boldsymbol n)/f_\omega(\boldsymbol n) \;\propto\; \prod_j \pi^{y_j}(1-\pi)^{1-y_j} \prod_{i,j}\left\{\epsilon\left[e^{-t_i\mu}(t_i\mu)^{n_{ij}}\right]^{y_j}\left[e^{-t_i\nu}(t_i\nu)^{n_{ij}}\right]^{1-y_j}\right\}^{z_{ij}}\left[(1-\epsilon)\,0^{n_{ij}}\right]^{1-z_{ij}}.$$
From this, $\{Z_{ij}\}$ are conditionally independent given $\boldsymbol Y$ and $\boldsymbol N$, with
$$[Z_{ij} \mid \boldsymbol y, \boldsymbol n] \sim \mathrm{Bernoulli}(r_{ij}), \qquad r_{ij} \equiv r(t_i; y_j, n_{ij}) := \frac{\epsilon\left(e^{-t_i\mu}\mu^{n_{ij}}\right)^{y_j}\left(e^{-t_i\nu}\nu^{n_{ij}}\right)^{1-y_j} t_i^{n_{ij}}}{\epsilon\left(e^{-t_i\mu}\mu^{n_{ij}}\right)^{y_j}\left(e^{-t_i\nu}\nu^{n_{ij}}\right)^{1-y_j} t_i^{n_{ij}} + (1-\epsilon)\,0^{n_{ij}}} \quad \text{(A11)}$$
$$= 1 - 0^{n_{ij}} + 0^{n_{ij}}\,\frac{\epsilon\left(e^{-t_i\mu}\mu^{n_{ij}}\right)^{y_j}\left(e^{-t_i\nu}\nu^{n_{ij}}\right)^{1-y_j} t_i^{n_{ij}}}{\epsilon\left(e^{-t_i\mu}\mu^{n_{ij}}\right)^{y_j}\left(e^{-t_i\nu}\nu^{n_{ij}}\right)^{1-y_j} t_i^{n_{ij}} + (1-\epsilon)},$$
and
$$\begin{aligned}
f_\omega(\boldsymbol y \mid \boldsymbol n) &\propto \sum_{\boldsymbol z} f_\omega(\boldsymbol y, \boldsymbol z \mid \boldsymbol n) \propto \prod_j \pi^{y_j}(1-\pi)^{1-y_j}\cdot\prod_{i,j}\Big\{\epsilon\big(e^{-t_i\mu}\mu^{n_{ij}}\big)^{y_j}\big(e^{-t_i\nu}\nu^{n_{ij}}\big)^{1-y_j}t_i^{n_{ij}} + (1-\epsilon)\,0^{n_{ij}}\Big\} \\
&= \prod_j \pi^{y_j}(1-\pi)^{1-y_j}\cdot\prod_{\{i,j \mid n_{ij}\neq 0\}}\Big\{\epsilon\big(e^{-t_i\mu}\mu^{n_{ij}}\big)^{y_j}\big(e^{-t_i\nu}\nu^{n_{ij}}\big)^{1-y_j}t_i^{n_{ij}}\Big\}\cdot\prod_{\{i,j \mid n_{ij}=0\}}\Big[\epsilon\, e^{-t_i\mu y_j}e^{-t_i\nu(1-y_j)} + (1-\epsilon)\Big] \\
&\propto \prod_j\big[\pi\,e^{-t_j\mu}\mu^{n_j}\big]^{y_j}\big[(1-\pi)\,e^{-t_j\nu}\nu^{n_j}\big]^{1-y_j}\cdot\prod_j\prod_i\big[\epsilon\,e^{-t_i\mu y_j}e^{-t_i\nu(1-y_j)}+(1-\epsilon)\big]^{1^{=}_{ij}} \\
&= \prod_j q_j^{y_j}(1-q_j)^{1-y_j},
\end{aligned}$$
where $1^{=}_{ij} := 1\{n_{ij}=0\}$, $t_j := \sum_{\{i \mid n_{ij}\neq 0\}} t_i$, $n_j := \sum_i n_{ij}$, and
$$q_j \equiv q_j(\boldsymbol n_j) \equiv q_j(\pi,\epsilon,\mu,\nu;\boldsymbol t;\boldsymbol n_j) := \frac{\pi\,e^{-t_j\mu}\mu^{n_j}\prod_i\big[\epsilon\,e^{-t_i\mu}+(1-\epsilon)\big]^{1^{=}_{ij}}}{\pi\,e^{-t_j\mu}\mu^{n_j}\prod_i\big[\epsilon\,e^{-t_i\mu}+(1-\epsilon)\big]^{1^{=}_{ij}} + (1-\pi)\,e^{-t_j\nu}\nu^{n_j}\prod_i\big[\epsilon\,e^{-t_i\nu}+(1-\epsilon)\big]^{1^{=}_{ij}}}. \quad \text{(A12)}$$
Thus $\{Y_j\}$ are conditionally independent given $\boldsymbol N$, with
$$[Y_j \mid \boldsymbol n] \sim \mathrm{Bernoulli}(q_j).$$
Therefore $E_\omega[\bar Y \mid \boldsymbol n] = \bar q \equiv \bar q(\boldsymbol n)$, while
$$E_\omega[\overline{nY} \mid \boldsymbol n] = \frac{1}{K}\sum_{i,j} n_{ij}\,E_\omega[Y_j \mid \boldsymbol n] = \frac{1}{K}\sum_{i,j} n_{ij}\,q_j = \overline{nq}; \qquad E_\omega[\overline{n(1-Y)} \mid \boldsymbol n] = \frac{1}{K}\sum_{i,j} n_{ij}(1-q_j) = \overline{n(1-q)}.$$
Furthermore,
$$E_\omega\left[(1-2\pi)\bar Y + \pi^2 \mid \boldsymbol n\right] = (1-2\pi)\bar q + \pi^2.$$
Next,
$$E_\omega[\bar Z \mid \boldsymbol n] = E_\omega\left\{E_\omega[\bar Z \mid \boldsymbol y, \boldsymbol n] \mid \boldsymbol n\right\} = \frac{1}{K}\sum_{i,j}E_\omega\left\{r(t_i; Y_j, n_{ij}) \mid \boldsymbol n\right\} = \frac{1}{K}\sum_{i,j}\left\{q_j\,r(t_i;1,n_{ij}) + (1-q_j)\,r(t_i;0,n_{ij})\right\} \equiv \overline{q\,r(1)+(1-q)\,r(0)}. \quad \text{(A13)}$$
From (A11), note that
$$r(t_i;1,n_{ij}) = 1-0^{n_{ij}} + 0^{n_{ij}}\,\frac{\epsilon\,e^{-t_i\mu}\mu^{n_{ij}}t_i^{n_{ij}}}{\epsilon\,e^{-t_i\mu}\mu^{n_{ij}}t_i^{n_{ij}}+(1-\epsilon)}, \qquad r(t_i;0,n_{ij}) = 1-0^{n_{ij}} + 0^{n_{ij}}\,\frac{\epsilon\,e^{-t_i\nu}\nu^{n_{ij}}t_i^{n_{ij}}}{\epsilon\,e^{-t_i\nu}\nu^{n_{ij}}t_i^{n_{ij}}+(1-\epsilon)},$$
and decompose $\sum_{i,j}$ as $\sum_{\{i,j \mid n_{ij}\neq 0\}} + \sum_{\{i,j \mid n_{ij}=0\}}$, so (A13) becomes
$$\begin{aligned}
E_\omega[\bar Z \mid \boldsymbol n] &= \frac{1}{K}\sum_{i,j}\left[1-0^{n_{ij}} + q_j\,0^{n_{ij}}\frac{\epsilon\,e^{-t_i\mu}\mu^{n_{ij}}t_i^{n_{ij}}}{\epsilon\,e^{-t_i\mu}\mu^{n_{ij}}t_i^{n_{ij}}+(1-\epsilon)} + (1-q_j)\,0^{n_{ij}}\frac{\epsilon\,e^{-t_i\nu}\nu^{n_{ij}}t_i^{n_{ij}}}{\epsilon\,e^{-t_i\nu}\nu^{n_{ij}}t_i^{n_{ij}}+(1-\epsilon)}\right] \\
&= \frac{K-K_0}{K} + \frac{\epsilon}{K}\sum_{\{i,j \mid n_{ij}=0\}}\left[\frac{q_j\,e^{-t_i\mu}}{\epsilon\,e^{-t_i\mu}+(1-\epsilon)} + \frac{(1-q_j)\,e^{-t_i\nu}}{\epsilon\,e^{-t_i\nu}+(1-\epsilon)}\right] \\
&= \frac{K-K_0}{K} + \frac{\epsilon}{K}\sum_{i,j}1^{=}_{ij}\left[\frac{q_j}{\epsilon+(1-\epsilon)e^{t_i\mu}} + \frac{1-q_j}{\epsilon+(1-\epsilon)e^{t_i\nu}}\right] =: \rho(\epsilon,\mu,\nu;\boldsymbol t;\boldsymbol n) \equiv \rho,
\end{aligned}$$
where $K_0 := \#\{(i,j) \mid n_{ij}=0\}$.
Thus,
$$E_\omega\left[(1-2\epsilon)\bar Z + \epsilon^2 \mid \boldsymbol n\right] = (1-2\epsilon)\rho + \epsilon^2.$$
Therefore the first term in (A9) is evaluated explicitly as follows:
$$T_1 = E_\omega\left[-\partial_\omega^2 \log f_\omega(\boldsymbol Y, \boldsymbol Z, \boldsymbol n) \mid \boldsymbol n\right] = \mathrm{diag}\!\left( \frac{J\left[(1-2\pi)\bar q + \pi^2\right]}{\pi^2(1-\pi)^2},\;\; \frac{K\left[(1-2\epsilon)\rho + \epsilon^2\right]}{\epsilon^2(1-\epsilon)^2},\;\; \frac{K\,\overline{nq}}{\mu^2},\;\; \frac{K\,\overline{n(1-q)}}{\nu^2} \right). \quad \text{(A14)}$$
For the second term in (A9), it follows from (A10) and (A12) that
$$f_\omega(\boldsymbol Y, \boldsymbol Z \mid \boldsymbol n) = f_\omega(\boldsymbol Y \mid \boldsymbol n)\,f_\omega(\boldsymbol Z \mid \boldsymbol Y, \boldsymbol n) = \prod_j q_j^{Y_j}(1-q_j)^{1-Y_j}\prod_{i,j} r_{ij}^{Z_{ij}}(1-r_{ij})^{1-Z_{ij}} = \prod_j q_j^{Y_j}(1-q_j)^{1-Y_j}\prod_{\{i,j \mid n_{ij}=0\}} r_{ij}^{Z_{ij}}(1-r_{ij})^{1-Z_{ij}},$$
since $n_{ij}\neq 0 \Rightarrow Z_{ij}=1$ and $r_{ij}=1$. Thus,
$$\begin{aligned}
\log f_\omega(\boldsymbol Y, \boldsymbol Z \mid \boldsymbol n) &= \sum_j\left[Y_j\log q_j + (1-Y_j)\log(1-q_j)\right] + \sum_{\{i,j \mid n_{ij}=0\}}\left[Z_{ij}\log r_{ij} + (1-Z_{ij})\log(1-r_{ij})\right]; \\
\partial_\omega \log f_\omega(\boldsymbol Y, \boldsymbol Z \mid \boldsymbol n) &= \sum_j\frac{(Y_j-q_j)}{q_j(1-q_j)}\,\partial_\omega q_j + \sum_{\{i,j \mid n_{ij}=0\}}\frac{(Z_{ij}-r_{ij})}{r_{ij}(1-r_{ij})}\,\partial_\omega r_{ij}; \\
\partial_\omega^2 \log f_\omega(\boldsymbol Y, \boldsymbol Z \mid \boldsymbol n) &= \sum_j\left\{\frac{(Y_j-q_j)\,\partial_\omega^2 q_j - (\partial_\omega q_j)(\partial_\omega q_j)^\top}{q_j(1-q_j)} - \frac{(Y_j-q_j)\,(\partial_\omega q_j)\left[\partial_\omega\left(q_j(1-q_j)\right)\right]^\top}{q_j^2(1-q_j)^2}\right\} \\
&\quad + \sum_{\{i,j \mid n_{ij}=0\}}\left\{\frac{(Z_{ij}-r_{ij})\,\partial_\omega^2 r_{ij} - (\partial_\omega r_{ij})(\partial_\omega r_{ij})^\top}{r_{ij}(1-r_{ij})} - \frac{(Z_{ij}-r_{ij})\,(\partial_\omega r_{ij})\left[\partial_\omega\left(r_{ij}(1-r_{ij})\right)\right]^\top}{r_{ij}^2(1-r_{ij})^2}\right\},
\end{aligned}$$
where $r_{ij} \equiv r(t_i; Y_j, n_{ij})$. Therefore, a preliminary expression for the second term in (A9) is given by
$$T_2 = E_\omega\left[\partial_\omega^2 \log f_\omega(\boldsymbol Y, \boldsymbol Z \mid \boldsymbol n) \mid \boldsymbol n\right] = -\sum_j\frac{(\partial_\omega q_j)(\partial_\omega q_j)^\top}{q_j(1-q_j)} - \sum_{\{i,j \mid n_{ij}=0\}}E_\omega\!\left[\frac{(\partial_\omega r_{ij})(\partial_\omega r_{ij})^\top}{r_{ij}(1-r_{ij})}\,\Big|\,\boldsymbol n\right]$$
$$= \sum_j(\partial_\omega\log q_j)\left[\partial_\omega\log(1-q_j)\right]^\top + \sum_{\{i,j \mid n_{ij}=0\}}E_\omega\!\left[\left(\partial_\omega\log r(t_i;Y_j,n_{ij})\right)\left[\partial_\omega\log\left(1-r(t_i;Y_j,n_{ij})\right)\right]^\top\Big|\,\boldsymbol n\right],$$
where we used the facts that for any functions $h(\boldsymbol n)$ and $h(\boldsymbol y, \boldsymbol n)$,
$$E_\omega\left[(Y_j-q_j)\,h(\boldsymbol n) \mid \boldsymbol n\right] = h(\boldsymbol n)\,E_\omega\left[(Y_j-q_j) \mid \boldsymbol n\right] = 0, \qquad E_\omega\left[(Z_{ij}-r_{ij})\,h(\boldsymbol y, \boldsymbol n) \mid \boldsymbol n\right] = E_\omega\left\{h(\boldsymbol y, \boldsymbol n)\,E_\omega\left[Z_{ij}-r_{ij} \mid \boldsymbol y, \boldsymbol n\right] \mid \boldsymbol n\right\} = 0.$$
Now note that
$$\begin{aligned}
\log q_j &= \log\pi - t_j\mu + n_j\log\mu + \sum_i 1^{=}_{ij}\log\!\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right] - \log\psi_j; \\
\psi_j &:= \pi\,e^{-t_j\mu}\mu^{n_j}\prod_i\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]^{1^{=}_{ij}} + (1-\pi)\,e^{-t_j\nu}\nu^{n_j}\prod_i\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^{1^{=}_{ij}}; \\
\frac{\partial\psi_j}{\partial\pi} &= e^{-t_j\mu}\mu^{n_j}\prod_i\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]^{1^{=}_{ij}} - e^{-t_j\nu}\nu^{n_j}\prod_i\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^{1^{=}_{ij}}, \\
\frac{\partial\psi_j}{\partial\epsilon} &= \pi\,e^{-t_j\mu}\mu^{n_j}\left[\sum_i 1^{=}_{ij}\frac{e^{-t_i\mu}-1}{\epsilon\left(e^{-t_i\mu}-1\right)+1}\right]\prod_i\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]^{1^{=}_{ij}} + (1-\pi)\,e^{-t_j\nu}\nu^{n_j}\left[\sum_i 1^{=}_{ij}\frac{e^{-t_i\nu}-1}{\epsilon\left(e^{-t_i\nu}-1\right)+1}\right]\prod_i\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^{1^{=}_{ij}}, \\
\frac{\partial\psi_j}{\partial\mu} &= \pi\,e^{-t_j\mu}\mu^{n_j}\prod_i\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]^{1^{=}_{ij}}\left[\frac{n_j}{\mu} - t_j - \epsilon\sum_i 1^{=}_{ij}\frac{t_i\,e^{-t_i\mu}}{\epsilon\left(e^{-t_i\mu}-1\right)+1}\right], \\
\frac{\partial\psi_j}{\partial\nu} &= (1-\pi)\,e^{-t_j\nu}\nu^{n_j}\prod_i\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^{1^{=}_{ij}}\left[\frac{n_j}{\nu} - t_j - \epsilon\sum_i 1^{=}_{ij}\frac{t_i\,e^{-t_i\nu}}{\epsilon\left(e^{-t_i\nu}-1\right)+1}\right];
\end{aligned}$$
from which it can be shown that
$$\begin{aligned}
\frac{\partial\log q_j}{\partial\pi} &= \frac{e^{-t_j\nu}\nu^{n_j}}{\pi\,\psi_j}\prod_i\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^{1^{=}_{ij}}, \\
\frac{\partial\log q_j}{\partial\epsilon} &= \frac{(1-\pi)\,e^{-t_j\nu}\nu^{n_j}}{\psi_j}\prod_i\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^{1^{=}_{ij}}\;\sum_i 1^{=}_{ij}\frac{e^{-t_i\mu}-e^{-t_i\nu}}{\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]}, \\
\frac{\partial\log q_j}{\partial\mu} &= \frac{(1-\pi)\,e^{-t_j\nu}\nu^{n_j}}{\psi_j}\prod_i\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^{1^{=}_{ij}}\left[\frac{n_j}{\mu} - t_j - \epsilon\sum_i 1^{=}_{ij}\frac{t_i\,e^{-t_i\mu}}{\epsilon\left(e^{-t_i\mu}-1\right)+1}\right], \\
\frac{\partial\log q_j}{\partial\nu} &= -\frac{(1-\pi)\,e^{-t_j\nu}\nu^{n_j}}{\psi_j}\prod_i\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^{1^{=}_{ij}}\left[\frac{n_j}{\nu} - t_j - \epsilon\sum_i 1^{=}_{ij}\frac{t_i\,e^{-t_i\nu}}{\epsilon\left(e^{-t_i\nu}-1\right)+1}\right].
\end{aligned}$$
These four partial derivatives determine the $4\times1$ column vector $\partial_\omega\log q_j$. Furthermore,
$$\partial_\omega\log(1-q_j) = -\frac{q_j}{1-q_j}\,\partial_\omega\log q_j = -\frac{\pi}{1-\pi}\cdot\frac{e^{-t_j\mu}\mu^{n_j}\prod_i\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]^{1^{=}_{ij}}}{e^{-t_j\nu}\nu^{n_j}\prod_i\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^{1^{=}_{ij}}}\,\partial_\omega\log q_j,$$
hence
$$(\partial_\omega\log q_j)\left[\partial_\omega\log(1-q_j)\right]^\top = -\pi(1-\pi)\,\frac{e^{-t_j(\mu+\nu)}(\mu\nu)^{n_j}\prod_i\left\{\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]\right\}^{1^{=}_{ij}}}{\psi_j^2}\,\phi_j\phi_j^\top, \quad \text{(A15)}$$
where
$$\phi_j \equiv \phi_j(\omega;\boldsymbol t;\boldsymbol n_j) := \left( \frac{1}{\pi(1-\pi)},\;\; \sum_i 1^{=}_{ij}\frac{e^{-t_i\mu}-e^{-t_i\nu}}{\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]},\;\; \frac{n_j}{\mu}-t_j-\epsilon\sum_i 1^{=}_{ij}\frac{t_i\,e^{-t_i\mu}}{\epsilon\left(e^{-t_i\mu}-1\right)+1},\;\; -\left[\frac{n_j}{\nu}-t_j-\epsilon\sum_i 1^{=}_{ij}\frac{t_i\,e^{-t_i\nu}}{\epsilon\left(e^{-t_i\nu}-1\right)+1}\right] \right)^{\!\top}.$$
Next, for $n_{ij}=0$,
$$\begin{aligned}
r(t_i;1,0) &= \frac{\epsilon\,e^{-t_i\mu}}{\epsilon\left(e^{-t_i\mu}-1\right)+1}, \qquad r(t_i;0,0) = \frac{\epsilon\,e^{-t_i\nu}}{\epsilon\left(e^{-t_i\nu}-1\right)+1}; \\
\log r(t_i;1,0) &= \log\epsilon - t_i\mu - \log\!\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right], \qquad \log\left(1-r(t_i;1,0)\right) = \log(1-\epsilon) - \log\!\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]; \\
\log r(t_i;0,0) &= \log\epsilon - t_i\nu - \log\!\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right], \qquad \log\left(1-r(t_i;0,0)\right) = \log(1-\epsilon) - \log\!\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right];
\end{aligned}$$
so with $\omega = (\pi, \epsilon, \mu, \nu)$, we find that
$$\begin{aligned}
\partial_\omega\log r(t_i;1,0) &= \left(0,\;\; \frac{1}{\epsilon\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]},\;\; -\frac{(1-\epsilon)\,t_i}{\epsilon\left(e^{-t_i\mu}-1\right)+1},\;\; 0\right)^{\!\top}, \\
\partial_\omega\log\left(1-r(t_i;1,0)\right) &= \left(0,\;\; -\frac{e^{-t_i\mu}}{(1-\epsilon)\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]},\;\; \frac{\epsilon\,t_i\,e^{-t_i\mu}}{\epsilon\left(e^{-t_i\mu}-1\right)+1},\;\; 0\right)^{\!\top}, \\
\partial_\omega\log r(t_i;0,0) &= \left(0,\;\; \frac{1}{\epsilon\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]},\;\; 0,\;\; -\frac{(1-\epsilon)\,t_i}{\epsilon\left(e^{-t_i\nu}-1\right)+1}\right)^{\!\top}, \\
\partial_\omega\log\left(1-r(t_i;0,0)\right) &= \left(0,\;\; -\frac{e^{-t_i\nu}}{(1-\epsilon)\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]},\;\; 0,\;\; \frac{\epsilon\,t_i\,e^{-t_i\nu}}{\epsilon\left(e^{-t_i\nu}-1\right)+1}\right)^{\!\top}.
\end{aligned}$$
Thus
$$\begin{aligned}
E_\omega&\left[\left(\partial_\omega\log r(t_i;Y_j,0)\right)\left[\partial_\omega\log\left(1-r(t_i;Y_j,0)\right)\right]^\top\Big|\,\boldsymbol n\right] \\
&= \left(\partial_\omega\log r(t_i;1,0)\right)\left[\partial_\omega\log\left(1-r(t_i;1,0)\right)\right]^\top q_j + \left(\partial_\omega\log r(t_i;0,0)\right)\left[\partial_\omega\log\left(1-r(t_i;0,0)\right)\right]^\top(1-q_j) \\
&= -\epsilon(1-\epsilon)\,\frac{e^{-t_i\mu}}{\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]^2}\,\chi_i(1)\chi_i(1)^\top\,q_j \;-\; \epsilon(1-\epsilon)\,\frac{e^{-t_i\nu}}{\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^2}\,\chi_i(0)\chi_i(0)^\top\,(1-q_j),
\end{aligned}$$
where
$$\chi_i(1) \equiv \chi(\epsilon;t_i;1) := \left(0,\;\; \frac{1}{\epsilon(1-\epsilon)},\;\; -t_i,\;\; 0\right)^{\!\top}, \qquad \chi_i(0) \equiv \chi(\epsilon;t_i;0) := \left(0,\;\; \frac{1}{\epsilon(1-\epsilon)},\;\; 0,\;\; -t_i\right)^{\!\top}.$$
Therefore, using (A15), the second term in (A9) is given by
$$\begin{aligned}
T_2 = E_\omega\left[\partial_\omega^2 \log f_\omega(\boldsymbol Y, \boldsymbol Z \mid \boldsymbol n) \mid \boldsymbol n\right] &= -\pi(1-\pi)\sum_j\frac{e^{-t_j(\mu+\nu)}(\mu\nu)^{n_j}\prod_i\left\{\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]\right\}^{1^{=}_{ij}}}{\psi_j^2}\,\phi_j\phi_j^\top \\
&\quad - \epsilon(1-\epsilon)\sum_{i,j}1^{=}_{ij}\left\{\frac{e^{-t_i\mu}}{\left[\epsilon\left(e^{-t_i\mu}-1\right)+1\right]^2}\,\chi_i(1)\chi_i(1)^\top\,q_j + \frac{e^{-t_i\nu}}{\left[\epsilon\left(e^{-t_i\nu}-1\right)+1\right]^2}\,\chi_i(0)\chi_i(0)^\top\,(1-q_j)\right\}. \quad \text{(A16)}
\end{aligned}$$
Thus, (A14) and (A16) explicitly determine the observed information matrix $I_n(\omega)$ in (A9).
Finally, we may estimate $I_n(\omega)$ in the normal approximation
$$\sqrt{K}\,(\hat\omega - \omega) \;\approx\; N_4\!\left[0,\; K\, I_n^{-1}(\omega)\right]$$
by replacing $\omega$ in $I_n(\omega)$ by its MLE, $\hat\omega \equiv (\hat\pi, \hat\epsilon, \hat\mu, \hat\nu)$, thereby obtaining
$$\sqrt{K}\,(\hat\omega - \omega) \;\approx\; N_4\!\left[0,\; K\, I_n^{-1}(\hat\omega)\right].$$
For large $K$ the $4\times4$ matrix $I_n(\hat\omega)$ is positive definite, hence invertible. □
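Given $\hat\omega$ and $I_n^{-1}(\hat\omega)$, a standard error and confidence interval for the quantity of interest $\theta = \mu/\nu$ follow by the delta method. A minimal sketch: the point estimates below echo Table 3, but the covariance entries are made-up placeholders for illustration, not values computed from $I_n(\hat\omega)$.

```python
import math

mu_hat, nu_hat = 66.60, 18.22               # MLEs (cf. Table 3)
cov = [[4.0, 0.5],                          # assumed Var(mu_hat), Cov(mu_hat, nu_hat)
       [0.5, 1.0]]                          # assumed Cov(.),      Var(nu_hat)
theta_hat = mu_hat / nu_hat
g = [1.0 / nu_hat, -mu_hat / nu_hat ** 2]   # gradient of mu/nu at the MLE
var_theta = sum(g[r] * cov[r][c] * g[c] for r in range(2) for c in range(2))
half = 1.96 * math.sqrt(var_theta)          # 95% normal-approximation half-width
print(f"theta = {theta_hat:.2f}, 95% CI = ({theta_hat - half:.2f}, {theta_hat + half:.2f})")
```

In an actual analysis the $2\times2$ block of $K\,I_n^{-1}(\hat\omega)/K$ corresponding to $(\hat\mu, \hat\nu)$ would replace the placeholder covariance.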

Appendix C. Additional Results from Section 5

Appendix C.1. Additional Results from Section 5.1

Table A1 and Table A2 display the results of Figure 1 and Figure 2, respectively, in tabular form.
Table A1. Mean absolute error in estimation of $\theta$ across varying values of $I$, $J$, $\epsilon$, and $\pi$.

I    ϵ     J=5 (π = 0.1, 0.25, 0.4) | J=10 | J=20 | J=40 | J=80
5    0.6   0.82 0.70 0.55 | 0.94 0.39 0.30 | 0.75 0.29 0.19 | 0.39 0.14 0.12 | 0.44 0.09 0.08
5    0.7   0.57 0.43 0.37 | 0.61 0.32 0.28 | 0.41 0.22 0.18 | 0.23 0.12 0.11 | 0.14 0.08 0.07
5    0.8   0.54 0.52 0.42 | 0.47 0.27 0.23 | 0.31 0.21 0.15 | 0.18 0.11 0.10 | 0.15 0.08 0.07
10   0.6   0.55 0.34 0.33 | 0.38 0.24 0.18 | 0.27 0.13 0.12 | 0.15 0.08 0.08 | 0.10 0.06 0.06
10   0.7   0.48 0.32 0.27 | 0.34 0.19 0.17 | 0.23 0.13 0.12 | 0.13 0.08 0.07 | 0.09 0.06 0.05
10   0.8   0.43 0.27 0.26 | 0.32 0.22 0.15 | 0.24 0.11 0.10 | 0.12 0.08 0.07 | 0.07 0.05 0.05
20   0.6   0.49 0.28 0.21 | 0.33 0.15 0.12 | 0.20 0.09 0.08 | 0.10 0.06 0.06 | 0.06 0.05 0.04
20   0.7   0.44 0.26 0.19 | 0.35 0.16 0.13 | 0.18 0.09 0.08 | 0.08 0.05 0.05 | 0.05 0.04 0.04
20   0.8   0.45 0.28 0.18 | 0.30 0.13 0.11 | 0.16 0.08 0.07 | 0.07 0.05 0.05 | 0.05 0.04 0.04
40   0.6   0.47 0.24 0.15 | 0.26 0.12 0.09 | 0.14 0.06 0.06 | 0.07 0.04 0.04 | 0.05 0.03 0.03
40   0.7   0.43 0.24 0.17 | 0.29 0.10 0.09 | 0.15 0.06 0.05 | 0.07 0.04 0.04 | 0.05 0.03 0.03
40   0.8   0.48 0.22 0.14 | 0.23 0.09 0.08 | 0.09 0.05 0.04 | 0.06 0.04 0.04 | 0.04 0.03 0.02
80   0.6   0.45 0.20 0.11 | 0.17 0.11 0.06 | 0.08 0.05 0.04 | 0.04 0.03 0.03 | 0.03 0.02 0.02
80   0.7   0.41 0.19 0.12 | 0.23 0.08 0.05 | 0.11 0.04 0.04 | 0.04 0.03 0.03 | 0.03 0.02 0.02
80   0.8   0.38 0.18 0.10 | 0.23 0.09 0.05 | 0.11 0.04 0.03 | 0.04 0.03 0.03 | 0.03 0.02 0.02
Table A2. Empirical coverage of nominal 95% confidence intervals for $\theta$ across varying values of $I$, $J$, $\epsilon$, and $\pi$.

I    ϵ     J=5 (π = 0.1, 0.25, 0.4) | J=10 | J=20 | J=40 | J=80
5    0.6   0.65 0.81 0.92 | 0.82 0.93 0.97 | 0.90 0.98 0.97 | 0.95 0.98 0.99 | 0.94 0.98 1.00
5    0.7   0.75 0.94 0.93 | 0.92 0.96 0.97 | 0.93 0.97 0.97 | 0.95 0.98 0.99 | 0.98 0.99 0.98
5    0.8   0.79 0.90 0.96 | 0.93 0.96 0.97 | 0.94 0.97 0.96 | 0.97 1.00 0.96 | 0.96 0.98 0.98
10   0.6   0.79 0.91 0.93 | 0.89 0.94 0.96 | 0.91 0.96 0.98 | 0.95 0.98 0.97 | 0.98 0.95 0.98
10   0.7   0.83 0.92 0.95 | 0.90 0.93 0.96 | 0.95 0.96 0.94 | 0.96 0.98 0.98 | 0.95 0.96 0.96
10   0.8   0.81 0.94 0.94 | 0.89 0.94 0.96 | 0.89 0.96 0.95 | 0.96 0.94 0.96 | 0.98 0.96 0.96
20   0.6   0.80 0.92 0.95 | 0.88 0.97 0.98 | 0.92 0.95 0.96 | 0.96 0.94 0.92 | 0.96 0.94 0.94
20   0.7   0.78 0.92 0.91 | 0.80 0.91 0.89 | 0.91 0.94 0.96 | 0.94 0.95 0.95 | 0.96 0.95 0.96
20   0.8   0.74 0.91 0.90 | 0.86 0.93 0.95 | 0.90 0.96 0.93 | 0.97 0.96 0.94 | 0.94 0.94 0.96
40   0.6   0.77 0.90 0.93 | 0.83 0.95 0.96 | 0.91 0.95 0.93 | 0.91 0.95 0.94 | 0.94 0.94 0.98
40   0.7   0.79 0.91 0.95 | 0.84 0.96 0.95 | 0.93 0.96 0.97 | 0.96 0.93 0.95 | 0.95 0.94 0.98
40   0.8   0.73 0.89 0.94 | 0.83 0.96 0.93 | 0.95 0.98 0.96 | 0.95 0.94 0.92 | 0.92 0.94 0.95
80   0.6   0.72 0.90 0.94 | 0.90 0.93 0.95 | 0.97 0.95 0.95 | 0.95 0.97 0.92 | 0.96 0.93 0.94
80   0.7   0.70 0.91 0.94 | 0.84 0.95 0.96 | 0.92 0.94 0.96 | 0.94 0.94 0.94 | 0.93 0.94 0.96
80   0.8   0.75 0.87 0.94 | 0.86 0.96 0.97 | 0.93 0.95 0.96 | 0.95 0.94 0.93 | 0.94 0.94 0.94

Appendix C.2. Additional Results from Section 5.2

To assess the sensitivity of our results to the assumption that $t_i = 1$ for each site $i = 1, \dots, 11$, we ran a sensitivity analysis in which $t_i$ was iteratively updated during the M-step of the proposed EM algorithm. Specifically, we updated $t_i$ by maximizing (9) conditional on the current estimates of $\hat y$, $\hat z$, $\hat\mu$, and $\hat\nu$ at any given step of the algorithm. After estimation, the estimates of $t_i$ were treated as fixed and known.
Results under this sensitivity analysis appear in Table A3. Estimates of $\theta$, $\pi$, and $\epsilon$ are remarkably similar to those in Table 3, while the estimates of $\mu$ and $\nu$ each change by the same scale factor. Thus our results for the parameter of interest, $\theta$, are not sensitive to the choice of $t_i$ in this case.
Table A3. Sensitivity analysis results when estimating $t_i$ in the frigatebird analysis.

Parameter estimates: $\pi$ = 0.25; $\epsilon$ = 0.88; $\mu$ = 50.55; $\nu$ = 13.84; $\theta$ = 3.65 (95% CI: 3.22, 4.08).

Site (i):   1      2      3      4      5      6      7      8      9      10     11
t̂_i:       0.104  0.166  0.065  0.261  0.293  0.366  3.413  2.574  1.434  4.485  0.116
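The scale behavior noted above can be checked directly from the reported estimates: rescaling every $t_i$ by a constant $c$ while dividing $\mu$ and $\nu$ by $c$ leaves each Poisson mean $t_i\mu$ (or $t_i\nu$) unchanged, so $\theta = \mu/\nu$ is unaffected. A quick check against the values in Tables 3 and A3:

```python
mu_a, nu_a = 66.60, 18.22   # Table 3: t_i = 1 held fixed
mu_b, nu_b = 50.55, 13.84   # Table A3: t_i estimated

print(mu_a / nu_a, mu_b / nu_b)   # both ratios are approximately 3.65
print(mu_a / mu_b, nu_a / nu_b)   # a common scale factor of about 1.32
```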

References

1. Baker, G.B.; Holdsworth, M. Seabird Monitoring Study at Coringa Herald National Nature Reserve 2012; Report prepared for the Department of Sustainability, Environment, Water, Population and Communities; Latitude 42 Environmental Consultants Pty Ltd.: Kettering, TAS, Australia, 2013.
2. Bouveyron, C.; Celeux, G.; Murphy, T.B.; Raftery, A.E. Model-Based Clustering and Classification for Data Science: With Applications in R; Cambridge University Press: Cambridge, UK, 2019.
3. Lindsay, B.G. Mixture Models: Theory, Geometry, and Applications; Institute of Mathematical Statistics: Hayward, CA, USA, 1995.
4. McLachlan, G.J.; Peel, D. Finite Mixture Models; Wiley: New York, NY, USA, 2000.
5. Martin, T.G.; Wintle, B.A.; Rhodes, J.R.; Kuhnert, P.M.; Field, S.A.; Low-Choy, S.J.; Tyre, A.J.; Possingham, H.P. Zero tolerance ecology: Improving ecological inference by modelling the source of zero observations. Ecol. Lett. 2005, 8, 1235–1246.
6. Lambert, D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992, 34, 1–14.
7. Lim, H.K.; Li, W.K.; Yu, P.L.H. Zero-inflated Poisson regression mixture model. Comput. Stat. Data Anal. 2014, 71, 151–158.
8. Long, D.L.; Preisser, J.S.; Herring, A.H.; Golin, C.E. A marginalized zero-inflated Poisson regression model with overall exposure effects. Stat. Med. 2014, 33, 5151–5165.
9. Jamshidian, M.; Jennrich, R.I. Standard errors for EM estimation. J. R. Stat. Soc. Ser. B 2000, 62, 257–270.
10. Aitkin, M.; Rubin, D.B. Estimation and hypothesis testing in finite mixture models. J. R. Stat. Soc. Ser. B 1985, 47, 67–75.
11. McLachlan, G.J.; Krishnan, T. The EM Algorithm and Its Extensions, 2nd ed.; Wiley: New York, NY, USA, 2008.
12. Hoadley, B. Asymptotic properties of maximum likelihood estimators for the independent not identically distributed case. Ann. Math. Stat. 1971, 42, 1977–1991.
13. Efron, B.; Hinkley, D.V. Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika 1978, 65, 457–482.
14. Louis, T.A. Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 1982, 44, 226–233.
15. Johnson, N.L.; Kemp, A.W.; Kotz, S. Univariate Discrete Distributions, 3rd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2005.
Figure 1. Estimation error for θ in simulated ZIPM data.
Figure 2. Empirical coverage of nominal 95% confidence intervals for $\theta$ in simulated ZIPM data. Red dashed lines represent the target coverage level of 0.95.
Table 1. Counts of frigatebird nests by subspecies.

Frigatebird Subspecies    Nest Counts    Relative Proportion
Lesser                    46             0.036
Greater                   81             0.063
Unidentified              1158           0.901
Table 2. Total counts of 1158 unidentified frigatebird nests by site and time point.
SiteAugust 2007September 2008October 2009August 2012
10103
22084
31122
444142
512186
600186
75402094
853541273
92438025
101376219618
114240
Table 3. ZIPM model MLEs to study unidentified frigatebird nests.

Parameter estimates: $\pi$ = 0.25; $\epsilon$ = 0.84; $\mu$ = 66.60; $\nu$ = 18.22; $\theta$ = 3.65 (95% CI: 3.23, 4.08).
