1. Introduction
Count data with an excessive number of zeros are widespread in fields such as ecology, health economics, transportation, criminology, and insurance analytics. In practice, observed zeros often arise from two distinct mechanisms: some zeros result from stochastic variability in a Poisson-like process (for example, a rare event that simply does not occur), while other zeros reflect a structural process that deterministically yields zero counts. Standard Poisson regression, which assumes equidispersion and independent event occurrences, cannot distinguish between these mechanisms. As a result, it frequently produces biased parameter estimates, underestimates standard errors, and delivers suboptimal predictive performance when applied to zero-inflated data.
Zero-Inflated Poisson (ZIP) models address part of this complexity by introducing a two-part mixture: one component that governs structural zeros and another that governs the count distribution. However, even this mixture imposes strict constraints on the shape of the count component. In many real-world applications, count distributions exhibit skewness, multimodality, heavy tails, or degrees of zero-inflation that exceed what a single Poisson family can accommodate. Empirical distributions may show multiple peaks due to mixed subpopulations, or have tails that are heavier than what the standard Poisson model allows. This can lead to systematic discrepancies in higher-order moments and tail probabilities.
Semiparametric and nonparametric extensions, such as transformation models, infinite-mixture formulations, and kernel-based likelihoods, have been proposed to capture these complex distributional characteristics. These methods improve flexibility, but often sacrifice interpretability and identifiability, complicate likelihood evaluation, or impose significant computational burdens, especially when working with moderate sample sizes or large-scale datasets. In applied settings, where transparent parameter meanings and efficient estimation are crucial, such complexity may limit practical use.
These considerations motivate the need for a modeling framework that explicitly accounts for zero inflation while offering greater control over skewness, multimodality, and dispersion without relying on high-dimensional latent structures or prohibitive computation. The Zero Inflated Polynomially Adjusted Poisson (zPAP) model is designed to fill this gap: it retains an interpretable two-part structure for structural zeros and employs a polynomial tilt mechanism to enrich the shape of the count distribution. By allowing flexible adjustment of skewness, multimodality, and tail behavior, including excess zero inflation, zPAP aims to improve inferential accuracy and predictive performance in zero-inflated settings, while preserving theoretical rigor and computational tractability.
Comparative studies such as Mullahy [1], Lambert [2], and Ridout et al. [3] emphasize that ZIP and hurdle models encode fundamentally different assumptions and must be selected carefully in empirical contexts. Extensions of ZIP models to address overdispersion have led to the use of alternative kernels such as the negative binomial (Greene [4]), generalized Poisson, and Conway-Maxwell-Poisson (COM-Poisson) distributions (Shmueli et al. [5], Sellers and Shmueli [6]). These offer additional dispersion control and improved empirical fit. However, the COM-Poisson distribution, while flexible, is often computationally burdensome because its normalizing constant lacks a closed form.
The proposed zPAP model addresses this limitation by integrating a polynomial adjustment into the Poisson component of a ZIP model. This idea builds upon a substantial body of prior work on orthogonal polynomial expansions and their applications to distributional approximation and density estimation. Im et al. [7] introduced a least-squares criterion to estimate polynomial coefficients while ensuring non-negativity and robustness by making use of quadratic programming. In addition, Min et al. [8] discussed approximations to discrete distributions of rank statistics. Ha [9] applied Krawtchouk polynomial expansions to discrete distributions, demonstrating their capacity to approximate overdispersed and skewed data with improved fit and tail accuracy. Ha [10] proposed a Charlier-series-based adjustment to model nonhomogeneous Poisson processes, capturing skewness and higher moments.
Collectively, this line of work provides the theoretical and methodological foundation for the zPAP model. By incorporating polynomial weight functions into a standard Poisson kernel within a zero-inflated structure, the zPAP framework retains interpretability, supports efficient estimation, and offers superior distributional flexibility. It enables practitioners to fit count data with complex features such as zero-inflation, overdispersion, multimodality and heavy tails, extending the reach of classical ZIP models while preserving computational feasibility. This novel framework retains the core structure of the standard ZIP model but introduces substantial flexibility into the count component through a parametric polynomial adjustment of the Poisson distribution. The central innovation of the zPAP model is a multiplicative polynomial adjuster, which reweights the Poisson kernel in a tractable and interpretable manner. This adjuster preserves parametric structure and ensures that the resulting distribution remains normalizable, enabling maximum-likelihood estimation. A low-degree nonnegative polynomial—such as cubic or quartic—is typically sufficient to capture key empirical features including overdispersion, skewness, and multimodality. The zPAP model thus provides a powerful yet interpretable framework for analyzing zero-inflated count data with complex distributional characteristics, bridging theoretical structure and practical flexibility for use in diverse applied domains.
2. Zero-Inflated Polynomial-Adjusted Poisson (zPAP) Distribution
Standard count-data models such as the Poisson or the negative binomial often fail to accommodate 'excess zeros', that is, more zero counts than the model predicts. The Zero-Inflated Polynomial-Adjusted Poisson (zPAP) distribution is designed to address this issue by explicitly modeling the two sources of zero observations. The model acknowledges that an observed zero count can originate from the 'structural zero' component, with probability $\pi$, or from the 'sampling zero' component, where the underlying count-generating process (in this case, the polynomially adjusted Poisson (PAP)) happens to produce a zero. This dual nature of zeros is fundamental to the zPAP formulation. The parameter $\pi$ quantifies the probability of a zero that cannot be attributed to the PAP process, even if the PAP process itself has a non-negligible probability of producing a zero, $P_{\mathrm{PAP}}(Y = 0) > 0$.
The count-generating mechanism within the zPAP model is the PAP distribution. This distribution belongs to the broader class of Weighted Poisson Distributions (WPDs), as can be seen in del Castillo and Pérez-Casany [11] and Ridout and Besbeas [12], which offer a flexible framework for modeling count data, especially when overdispersion (variance greater than the mean) or underdispersion (variance less than the mean) is present. A WPD modifies a standard Poisson distribution by introducing a weight function $w(y)$, such that its probability mass function (PMF) is given by
$$P_{W}(Y = y) = \frac{w(y)\, p(y; \lambda)}{E_{\lambda}[w(Y)]}, \qquad y = 0, 1, 2, \ldots,$$
where $p(y; \lambda) = e^{-\lambda} \lambda^{y} / y!$ is the PMF of the standard Poisson distribution and $E_{\lambda}[w(Y)] = \sum_{y=0}^{\infty} w(y)\, p(y; \lambda)$ is the expectation of the weight function under the standard Poisson, serving as a normalizing constant.
Proposition 1 (PMF of the zPAP distribution).
Let $Y$ be a random variable following the zero-inflated Polynomially Adjusted Poisson (zPAP) distribution with parameters $\pi$, $\lambda$, and $\alpha$. Its probability mass function is
$$P(Y = y) = \begin{cases} \pi + (1-\pi)\, P_{\mathrm{PAP}}(0; \lambda, \alpha), & y = 0,\\[2pt] (1-\pi)\, P_{\mathrm{PAP}}(y; \lambda, \alpha), & y = 1, 2, \ldots, \end{cases}$$
where the underlying Polynomially Adjusted Poisson component has
$$P_{\mathrm{PAP}}(y; \lambda, \alpha) = \frac{T(y; \alpha)\, e^{-\lambda} \lambda^{y} / y!}{Z(\lambda, \alpha)},$$
with $Z(\lambda, \alpha) = \sum_{y=0}^{\infty} T(y; \alpha)\, e^{-\lambda} \lambda^{y} / y!$ as the normalizing constant. Here $T(y; \alpha)$ is a nonnegative polynomial weight that adjusts the standard Poisson probabilities, $\pi$ is the structural-zero probability with $0 \le \pi < 1$, $\lambda > 0$ is the base Poisson rate, and $\alpha$ (scalar or vector) governs the form and degree of the polynomial $T$, tuning dispersion, tail behavior, and possible multimodality. By acting as a flexible, nonnegative weight on each count, the polynomial rescales the baseline Poisson probabilities to shape the overall distribution. Abstractly, $T$ defines a family of count laws that interpolate between the classical Poisson ($T \equiv$ constant) and richer alternatives with altered variance, skewness, and tail weight. When $T$ increases in $y$, higher counts receive relatively more mass—enabling overdispersion and heavier tails—whereas a weight that diminishes for larger $y$ can enforce underdispersion or non-monotonic modes. Thus, by choosing an appropriate nonconstant polynomial $T(y; \alpha)$, one gains a unified mechanism to tune dispersion, multimodality, and tail behavior, all while preserving the core Poisson framework and ensuring a valid probability model.
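As an illustration, the following is a minimal sketch (not the authors' implementation) of the PMF in Proposition 1, assuming the polynomial form $T(y; \alpha) = \sum_j \alpha_j y^{j}$ and truncating the normalizing sum at a large bound `y_max`; the parameter values at the end are purely hypothetical.

```python
import numpy as np
from scipy.stats import poisson

def pap_pmf(y, lam, alpha, y_max=200):
    """Polynomially Adjusted Poisson mass: P_PAP(y) proportional to T(y; alpha) * Poisson(y; lam)."""
    grid = np.arange(y_max + 1)
    T_grid = np.polyval(alpha[::-1], grid)          # T(y) = alpha_0 + alpha_1*y + ...
    Z = np.sum(T_grid * poisson.pmf(grid, lam))     # truncated normalizing constant
    return np.polyval(alpha[::-1], np.asarray(y)) * poisson.pmf(y, lam) / Z

def zpap_pmf(y, pi, lam, alpha):
    """zPAP mass: structural-zero probability pi plus (1 - pi) times the PAP mass."""
    p = (1 - pi) * pap_pmf(y, lam, alpha)
    return np.where(np.asarray(y) == 0, pi + p, p)

y = np.arange(8)
print(zpap_pmf(y, pi=0.3, lam=2.5, alpha=np.array([1.0, 0.2, 0.05])))
```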
Proposition 2 (Identifiability and Validity of the Polynomial Adjuster).
Let $T(y; \alpha) = \sum_{j=0}^{d} \alpha_j y^{j}$ be the polynomial weight in the PAP component. To ensure the zPAP model defines a proper, identifiable count distribution with a unique, interpretable fit, one must require that
$$T(y; \alpha) \ge 0 \quad \text{for all } y = 0, 1, 2, \ldots,$$
so that $\alpha$ lies in the convex cone
$$\mathcal{A}_{+} = \left\{\alpha \in \mathbb{R}^{d+1} : T(y; \alpha) \ge 0 \ \text{for all } y \in \mathbb{N}_0\right\}.$$
Equivalently, one may insist on nonnegativity for all real $y \ge 0$, giving the semidefinite-friendly cone
$$\mathcal{A}_{\mathrm{sos}} = \left\{\alpha : T(y; \alpha) \ge 0 \ \text{for all real } y \ge 0\right\},$$
which admits an exact sum-of-squares characterization $T(y; \alpha) = q_1(y)^2 + y\, q_2(y)^2$ for some real polynomials $q_1$ and $q_2$. In practice, enforcing the simpler constraint $\alpha_j \ge 0$ for all $j$ is often sufficient and easy to implement. See Lasserre [13] and Marshall [14] for details. Ensuring these identifiability and nonnegativity conditions is essential. Without them, the zPAP model can be improper or non-identifiable, its moments undefined or non-computable, and any statistical inference drawn may be misleading or uninterpretable.
The normalizing constant
$$Z(\lambda, \alpha) = \sum_{y=0}^{\infty} T(y; \alpha)\, \frac{e^{-\lambda} \lambda^{y}}{y!}$$
ensures that the weighted Poisson terms sum to one. Before using it in the zPAP PMF, we must verify that for every $\lambda > 0$ and admissible $\alpha$, $Z(\lambda, \alpha)$ is both finite and strictly positive. These guarantees imply that $P_{\mathrm{PAP}}(y; \lambda, \alpha)$ is a valid probability mass function. We now state the following result, which establishes that the normalizing constant yields a proper probability distribution.
Theorem 1 (Properties of the Normalizing Constant).
Let $T(y; \alpha) = \sum_{j=0}^{d} \alpha_j y^{j}$ with $T(y; \alpha) \ge 0$ for all $y \in \mathbb{N}_0$, and define
$$Z(\lambda, \alpha) = \sum_{y=0}^{\infty} T(y; \alpha)\, \frac{e^{-\lambda} \lambda^{y}}{y!}.$$
Then the following hold:
- (a) Convergence. For any fixed degree $d$ and any $\lambda > 0$, the series defining $Z(\lambda, \alpha)$ converges absolutely. In particular, since $T(y; \alpha) = O(y^{d})$ as $y \to \infty$ and $e^{-\lambda} \lambda^{y}/y!$ decays super-exponentially, one has $Z(\lambda, \alpha) < \infty$.
- (b) Positivity. Because $T(y; \alpha) \ge 0$, every term in the defining sum is non-negative, with at least one strictly positive term (e.g., at $y = 0$ when $\alpha_0 > 0$). Hence $Z(\lambda, \alpha) > 0$.
- (c) Moment Representation. Writing $\mu_k(\lambda) = E_{\lambda}[Y^{k}]$ for the $k$th raw moment of a Poisson($\lambda$) random variable, one obtains the finite-sum representation
$$Z(\lambda, \alpha) = \sum_{j=0}^{d} \alpha_j\, \mu_j(\lambda).$$
Since $\mu_j(\lambda)$ is a polynomial in $\lambda$ of degree $j$ (e.g., $\mu_0(\lambda) = 1$, $\mu_1(\lambda) = \lambda$, $\mu_2(\lambda) = \lambda^{2} + \lambda$, etc.), this shows $Z(\lambda, \alpha)$ itself is a polynomial in $\lambda$ of degree $d$, which is immediately computable without truncating an infinite sum (a numerical check is given below).
- (d) Identifiability Constraint. Because multiplying all $\alpha_j$ by a common positive constant leaves the ratio $T(y; \alpha)/Z(\lambda, \alpha)$ unchanged, one must fix a scale (e.g., $\alpha_0 = 1$) to ensure the parameters are identifiable.
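As a quick numerical check of part (c), the sketch below (with hypothetical coefficients) compares the finite moment-sum form of $Z(\lambda, \alpha)$ with a brute-force truncated series; the Poisson raw moments are built from Stirling numbers of the second kind.

```python
import numpy as np
from scipy.stats import poisson

def poisson_raw_moment(j, lam):
    """mu_j(lam) = E[Y^j] for Y ~ Poisson(lam), via Stirling numbers of the second kind."""
    S = np.zeros((j + 1, j + 1))
    S[0, 0] = 1.0
    for n in range(1, j + 1):
        for k in range(1, n + 1):
            S[n, k] = k * S[n - 1, k] + S[n - 1, k - 1]   # S(n, k) recurrence
    return float(sum(S[j, k] * lam**k for k in range(j + 1)))

def Z_moment_form(lam, alpha):
    """Z as the finite sum  sum_j alpha_j * mu_j(lam)  (no truncation needed)."""
    return sum(a * poisson_raw_moment(j, lam) for j, a in enumerate(alpha))

def Z_truncated(lam, alpha, y_max=300):
    """Direct series  sum_y T(y; alpha) * P(Y = y), truncated at y_max."""
    y = np.arange(y_max + 1)
    return float(np.sum(np.polyval(alpha[::-1], y) * poisson.pmf(y, lam)))

alpha = [1.0, 0.3, 0.05]   # hypothetical degree-2 adjuster
print(Z_moment_form(4.0, alpha), Z_truncated(4.0, alpha))   # both approximately 3.2
```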
3. Maximum-Likelihood Estimation
Assuming an independent and identically distributed (i.i.d.) sample $y_1, \ldots, y_n$, the joint likelihood function $L(\pi, \lambda, \alpha)$ is the product of the individual likelihood contributions:
$$L(\pi, \lambda, \alpha) = \prod_{i=1}^{n} \left[\pi + (1-\pi)\, P_{\mathrm{PAP}}(0; \lambda, \alpha)\right]^{\mathbb{I}(y_i = 0)} \left[(1-\pi)\, P_{\mathrm{PAP}}(y_i; \lambda, \alpha)\right]^{\mathbb{I}(y_i > 0)},$$
where $\mathbb{I}(A)$ is the indicator function, taking the value 1 if condition $A$ is true, and 0 otherwise. Using the counts of zero ($n_0 = \sum_{i} \mathbb{I}(y_i = 0)$) and nonzero ($n_{+} = n - n_0$) observations, this can be written more compactly:
$$L(\pi, \lambda, \alpha) = \left[\pi + (1-\pi)\, P_{\mathrm{PAP}}(0; \lambda, \alpha)\right]^{n_0} (1-\pi)^{n_{+}} \prod_{i: y_i > 0} P_{\mathrm{PAP}}(y_i; \lambda, \alpha).$$
For simplicity, $P_{\mathrm{PAP}}(y)$ will denote $P_{\mathrm{PAP}}(y; \lambda, \alpha)$ in subsequent expressions where the context is clear. The log-likelihood function $\ell(\pi, \lambda, \alpha) = \log L(\pi, \lambda, \alpha)$ yields
$$\ell(\pi, \lambda, \alpha) = n_0 \log\left[\pi + (1-\pi)\, P_{\mathrm{PAP}}(0)\right] + \sum_{i: y_i > 0} \left[\log(1-\pi) + \log P_{\mathrm{PAP}}(y_i)\right].$$
To further elaborate, substitute the definition
$$P_{\mathrm{PAP}}(y) = \frac{T(y; \alpha)\, e^{-\lambda} \lambda^{y} / y!}{Z(\lambda, \alpha)},$$
so that the log-likelihood becomes
$$\ell(\pi, \lambda, \alpha) = n_0 \log\left[\pi\, Z(\lambda, \alpha) + (1-\pi)\, T(0; \alpha)\, e^{-\lambda}\right] + \sum_{i: y_i > 0} \log(1-\pi) + \sum_{i: y_i > 0} \left[\log T(y_i; \alpha) + y_i \log \lambda - \lambda - \log y_i!\right] - n \log Z(\lambda, \alpha).$$
The log-likelihood of the zPAP model is thus broken down into four parts. First, every zero count contributes
$$\log\left[\pi\, Z(\lambda, \alpha) + (1-\pi)\, T(0; \alpha)\, e^{-\lambda}\right],$$
which depends on $\pi$, $\lambda$, and $\alpha$ and combines structural zeros with stochastic zeros from the PAP component. Next, each positive observation contributes $\log(1-\pi)$, involving only $\pi$ and reflecting that nonzeros must come from the PAP part. Then, the magnitudes of the positive counts enter through
$$\log T(y_i; \alpha) + y_i \log \lambda - \lambda - \log y_i!,$$
which depends on $\lambda$ and $\alpha$ and represents the unnormalized PAP log-mass, i.e., the log of the PAP PMF numerator before division by $Z(\lambda, \alpha)$. Finally, all observations incur the term $-\log Z(\lambda, \alpha)$, again involving $\lambda$ and $\alpha$, to ensure the PAP probabilities sum to one. Together, these four contributions form the full log-likelihood and precisely show how $\pi$, $\lambda$, and $\alpha$ each enter the model.
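A compact sketch of this four-part decomposition, under the assumed polynomial form of $T$ and a truncated computation of $Z$ (names and truncation bound are illustrative), may help clarify how the pieces fit together.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

def zpap_loglik(pi, lam, alpha, y, y_max=200):
    """zPAP log-likelihood assembled from the four contributions described above."""
    grid = np.arange(y_max + 1)
    Z = np.sum(np.polyval(alpha[::-1], grid) * poisson.pmf(grid, lam))    # normalizer
    y = np.asarray(y)
    pos = y[y > 0]
    n0, n = int(np.sum(y == 0)), y.size

    zero_part = n0 * np.log(pi * Z + (1 - pi) * alpha[0] * np.exp(-lam))  # zeros: structural + PAP
    mix_part = (n - n0) * np.log(1 - pi)                                  # nonzeros come from PAP
    pap_part = np.sum(np.log(np.polyval(alpha[::-1], pos))
                      + pos * np.log(lam) - lam - gammaln(pos + 1))       # unnormalized PAP log-mass
    norm_part = -n * np.log(Z)                                            # every observation pays -log Z
    return zero_part + mix_part + pap_part + norm_part
```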
A critical aspect revealed by the structure of the log-likelihood function is the intertwined nature of the estimation of $\lambda$ and $\alpha$. The normalizing constant $Z(\lambda, \alpha)$ is a function of both $\lambda$ and $\alpha$. Since $Z(\lambda, \alpha)$ appears in the log-likelihood for all observations (either directly or implicitly through $P_{\mathrm{PAP}}(0)$), its partial derivatives with respect to $\lambda$ and $\alpha$ appear in the respective score equations. Specifically, $\partial \ell/\partial \lambda$ will involve $\partial Z/\partial \lambda$, and $\partial \ell/\partial \alpha$ will involve $\partial Z/\partial \alpha$. This mathematical linkage means that the score equations for $\lambda$ and $\alpha$, $\partial \ell/\partial \lambda = 0$ and $\partial \ell/\partial \alpha = 0$, form a system of equations that must be solved simultaneously to obtain the MLEs $\hat{\lambda}$ and $\hat{\alpha}$. Consequently, these parameters cannot be estimated independently within the PAP component. This interdependence can also lead to correlations between the estimators $\hat{\lambda}$ and $\hat{\alpha}$, particularly if $Z(\lambda, \alpha)$ exhibits similar sensitivity to changes in both parameters. The fact that $Z(\lambda, \alpha)$ is defined as an infinite series further complicates the analytical derivation and numerical computation of these derivatives.
The MLEs of the parameters $\theta = (\pi, \lambda, \alpha)$ are found by solving the system of score equations $U(\theta) = 0$, where $U(\theta)$ is the score vector, defined as the vector of first-order partial derivatives of the log-likelihood function with respect to each parameter:
$$U(\theta) = \left(\frac{\partial \ell}{\partial \pi},\ \frac{\partial \ell}{\partial \lambda},\ \frac{\partial \ell}{\partial \alpha}\right)^{\!\top}.$$
To obtain the partial derivative with respect to $\pi$, that is, $\partial \ell/\partial \pi$, differentiating $\ell$ with respect to $\pi$ yields:
$$\frac{\partial \ell}{\partial \pi} = n_0\, \frac{1 - P_{\mathrm{PAP}}(0)}{\pi + (1-\pi)\, P_{\mathrm{PAP}}(0)} - \frac{n - n_0}{1 - \pi}.$$
Setting $\partial \ell/\partial \pi = 0$ provides an equation that implicitly defines $\hat{\pi}$ in terms of $P_{\mathrm{PAP}}(0; \hat{\lambda}, \hat{\alpha})$:
$$\hat{\pi} = \frac{n_0 - n\, P_{\mathrm{PAP}}(0; \hat{\lambda}, \hat{\alpha})}{n\left[1 - P_{\mathrm{PAP}}(0; \hat{\lambda}, \hat{\alpha})\right]}.$$
This expression for $\hat{\pi}$ highlights its dependence on the estimated parameters of the PAP component through $P_{\mathrm{PAP}}(0; \hat{\lambda}, \hat{\alpha})$. The structure of $\hat{\pi}$ is typical for mixture proportions in statistical models. It allows for an intuitive update for $\pi$ if $\lambda$ and $\alpha$ (and thus $P_{\mathrm{PAP}}(0)$) were known or estimated from a previous iteration. The equation essentially balances the observed zeros $n_0$ against those expected to arise from the PAP component (if all $n$ observations were from that component, $n\, P_{\mathrm{PAP}}(0)$), and those that are excess to this.
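For a quick numerical illustration of this balancing (with hypothetical counts and a hypothetical PAP zero probability):

```python
def pi_update(n0, n, p0_pap):
    """Closed-form pi-hat given p0_pap = P_PAP(0; lam, alpha): observed zeros n0
    are balanced against the n * p0_pap zeros expected from the PAP component."""
    return (n0 - n * p0_pap) / (n * (1.0 - p0_pap))

print(pi_update(n0=120, n=250, p0_pap=0.15))   # approximately 0.388
```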
The partial derivative with respect to $\lambda$, that is, $\partial \ell/\partial \lambda$, is more involved because $\lambda$ appears in $P_{\mathrm{PAP}}(0)$ (both in its numerator $T(0; \alpha)\, e^{-\lambda}$ and its denominator $Z(\lambda, \alpha)$) and in each $P_{\mathrm{PAP}}(y_i)$ term (through $e^{-\lambda} \lambda^{y_i}$ and the denominator $Z(\lambda, \alpha)$). Using the decomposed form of the log-likelihood, where the zero counts contribute $\log A(\pi, \lambda, \alpha)$ with $A(\pi, \lambda, \alpha) = \pi\, Z(\lambda, \alpha) + (1-\pi)\, T(0; \alpha)\, e^{-\lambda}$, we obtain:
$$\frac{\partial \ell}{\partial \lambda} = n_0\, \frac{\pi\, \partial Z(\lambda, \alpha)/\partial \lambda - (1-\pi)\, T(0; \alpha)\, e^{-\lambda}}{A(\pi, \lambda, \alpha)} + \sum_{i: y_i > 0} \left(\frac{y_i}{\lambda} - 1\right) - n\, \frac{\partial Z(\lambda, \alpha)/\partial \lambda}{Z(\lambda, \alpha)}.$$
Equivalently, one may derive $\partial Z(\lambda, \alpha)/\partial \lambda$ by noting that $\partial p(y; \lambda)/\partial \lambda = p(y-1; \lambda) - p(y; \lambda)$, where $p(y; \lambda) = e^{-\lambda} \lambda^{y}/y!$ and $p(-1; \lambda) \equiv 0$. In particular, for $y \ge 1$,
$$\frac{\partial}{\partial \lambda}\, p(y; \lambda) = p(y; \lambda)\left(\frac{y}{\lambda} - 1\right).$$
Hence, another equivalent form is
$$\frac{\partial Z(\lambda, \alpha)}{\partial \lambda} = \sum_{y=0}^{\infty} T(y; \alpha)\left[p(y-1; \lambda) - p(y; \lambda)\right] = E_{\lambda}\!\left[T(Y+1; \alpha) - T(Y; \alpha)\right].$$
This expression highlights the complex interplay of terms involving the observed counts $y_i$, the parameter $\lambda$, and the derivative of the normalizing constant $\partial Z(\lambda, \alpha)/\partial \lambda$.
We now derive the partial derivative(s) with respect to $\alpha$, that is, $\partial \ell/\partial \alpha$. If $\alpha$ is a vector $(\alpha_0, \alpha_1, \ldots, \alpha_d)$, then $\partial \ell/\partial \alpha$ is a vector of partial derivatives
$$\frac{\partial \ell}{\partial \alpha} = \left(\frac{\partial \ell}{\partial \alpha_0},\ \frac{\partial \ell}{\partial \alpha_1},\ \ldots,\ \frac{\partial \ell}{\partial \alpha_d}\right)^{\!\top}.$$
The key components required are the derivatives of the polynomial adjuster and of the normalizing constant: $\partial T(y; \alpha)/\partial \alpha_j$ depends on the specific polynomial form of $T$; for $T(y; \alpha) = \sum_{j=0}^{d} \alpha_j y^{j}$, it is simply $\partial T(y; \alpha)/\partial \alpha_j = y^{j}$. And
$$\frac{\partial Z(\lambda, \alpha)}{\partial \alpha_j} = \sum_{y=0}^{\infty} \frac{\partial T(y; \alpha)}{\partial \alpha_j}\, \frac{e^{-\lambda} \lambda^{y}}{y!}.$$
Alternatively, using the moment representation $Z(\lambda, \alpha) = \sum_{j=0}^{d} \alpha_j\, \mu_j(\lambda)$, one obtains
$$\frac{\partial Z(\lambda, \alpha)}{\partial \alpha_j} = \mu_j(\lambda),$$
where $\mu_j(\lambda) = E_{\lambda}[Y^{j}]$. The difficulty stems from having to differentiate the polynomial weight function $T(y; \alpha)$ and the normalizing constant $Z(\lambda, \alpha)$ with respect to each component $\alpha_j$.
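The identity $\partial Z/\partial \alpha_j = \mu_j(\lambda)$ is easy to verify numerically; the sketch below uses a finite-difference derivative of a truncated $Z$ with hypothetical values.

```python
import numpy as np
from scipy.stats import poisson

def Z_trunc(lam, alpha, y_max=300):
    y = np.arange(y_max + 1)
    return float(np.sum(np.polyval(alpha[::-1], y) * poisson.pmf(y, lam)))

lam, alpha, j, eps = 3.0, np.array([1.0, 0.4, 0.1]), 2, 1e-6
alpha_eps = alpha.copy()
alpha_eps[j] += eps
dZ_dalpha_j = (Z_trunc(lam, alpha_eps) - Z_trunc(lam, alpha)) / eps   # finite difference
grid = np.arange(301)
mu_j = float(np.sum(grid**j * poisson.pmf(grid, lam)))                # mu_2(3) = 3^2 + 3 = 12
print(dZ_dalpha_j, mu_j)
```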
The maximum-likelihood estimates $(\hat{\pi}, \hat{\lambda}, \hat{\alpha})$ satisfy the score equations for the general case of $T(y; \alpha) = \sum_{j=0}^{d} \alpha_j y^{j}$, that is, the three sets of equations $\partial \ell/\partial \pi = 0$, $\partial \ell/\partial \lambda = 0$, and $\partial \ell/\partial \alpha_j = 0$ for each component $\alpha_j$. Apart from the straightforward derivative of the zero-count term with respect to $\pi$, each score involves the derivative of the PAP probability $P_{\mathrm{PAP}}(y; \lambda, \alpha)$ with respect to $\lambda$ or $\alpha$. In turn, those derivatives require the sensitivity of the normalizing constant to each parameter. Concretely, $\partial \ell/\partial \lambda$ brings in terms such as $\partial Z(\lambda, \alpha)/\partial \lambda$ and $\partial P_{\mathrm{PAP}}(0)/\partial \lambda$, while $\partial \ell/\partial \alpha_j$ depends on $\partial Z(\lambda, \alpha)/\partial \alpha_j$ and $\partial T(y; \alpha)/\partial \alpha_j$. Both $\partial Z/\partial \lambda$ and $\partial Z/\partial \alpha_j$ are defined through infinite series over $y$, so there is no closed-form solution. This complexity, especially the appearance of $Z(\lambda, \alpha)$ and its derivatives in every score, means that the MLE system must be solved numerically, and good initial values and careful computation of those series are essential for reliable estimation.
4. Regression Parameterization
Count regression models have been extensively developed to analyze nonnegative integer-valued outcomes in a wide range of applications [15,16,17,18]. The proposed zPAP model can be extended into a regression framework by incorporating covariates, allowing the parameters $\pi$, $\lambda$, and $\alpha$ to vary across observations. This extension improves the model's capacity to capture heterogeneity in count data by linking distributional parameters to observed characteristics. Such regression-based formulations are particularly useful in settings where different covariates are believed to influence the zero-inflation and count-generating processes separately.
Let $\mathbf{x}_i$, $\mathbf{z}_i$, and $\mathbf{w}_i$ be vectors of covariates for the $i$th observation, associated with parameters $\lambda_i$, $\pi_i$, and $\alpha_i$, respectively. These parameters are related to the covariates through appropriate link functions. For the probability of zero inflation $\pi_i$, we usually use a logit link to ensure $0 < \pi_i < 1$:
$$\operatorname{logit}(\pi_i) = \log\!\left(\frac{\pi_i}{1 - \pi_i}\right) = \mathbf{z}_i^{\top} \boldsymbol{\gamma}.$$
For the PAP rate parameter $\lambda_i$, we use a log link to ensure $\lambda_i > 0$:
$$\log(\lambda_i) = \mathbf{x}_i^{\top} \boldsymbol{\beta}.$$
For the PAP adjustment parameter(s) $\alpha_i$, the choice of link depends on its constraints. If $\alpha_i$ is a scalar and unconstrained, an identity link is natural:
$$\alpha_i = \mathbf{w}_i^{\top} \boldsymbol{\delta}.$$
If $\alpha_i$ must be positive, one may instead use a log link. When $\alpha_i$ is a vector, each component $\alpha_{ij}$ can have its own link $g_j$:
$$g_j(\alpha_{ij}) = \mathbf{w}_i^{\top} \boldsymbol{\delta}_j.$$
The parameters to be estimated under this regression specification are the coefficient vectors $\boldsymbol{\beta}$, $\boldsymbol{\gamma}$, and $\boldsymbol{\delta}$.
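A short sketch of these link mappings, with hypothetical covariate matrices and coefficient values (the symbols $\boldsymbol{\beta}$, $\boldsymbol{\gamma}$, $\boldsymbol{\delta}$ follow the notation above):

```python
import numpy as np

def regression_parameters(X, Zmat, W, beta, gamma, delta):
    """Map covariates to per-observation zPAP parameters:
    log link for lambda_i, logit link for pi_i, identity link for a scalar alpha_i."""
    lam = np.exp(X @ beta)                       # lambda_i > 0
    pi = 1.0 / (1.0 + np.exp(-(Zmat @ gamma)))   # 0 < pi_i < 1
    alpha = W @ delta                            # unconstrained scalar adjuster
    return lam, pi, alpha

rng = np.random.default_rng(0)
n = 5
X = np.column_stack([np.ones(n), rng.integers(1, 5, n)])     # intercept + e.g. party size
Zmat = np.column_stack([np.ones(n), rng.integers(0, 2, n)])  # intercept + e.g. binary indicator
W = np.ones((n, 1))                                          # intercept-only adjuster
print(regression_parameters(X, Zmat, W, np.array([-0.5, 0.8]),
                            np.array([0.3, -1.0]), np.array([0.2])))
```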
With parameters $\pi_i$, $\lambda_i$, and $\alpha_i$ now dependent on covariates, the log-likelihood for the $i$th observation is $\ell_i = \ell_i(\pi_i, \lambda_i, \alpha_i)$, and the total log-likelihood is
$$\ell(\boldsymbol{\beta}, \boldsymbol{\gamma}, \boldsymbol{\delta}) = \sum_{i=1}^{n} \ell_i(\pi_i, \lambda_i, \alpha_i).$$
The score equations are derived with respect to the regression coefficients $(\boldsymbol{\beta}, \boldsymbol{\gamma}, \boldsymbol{\delta})$. For example, for a coefficient $\gamma_j$ (the $j$th element of $\boldsymbol{\gamma}$), the score component is:
$$\frac{\partial \ell}{\partial \gamma_j} = \sum_{i=1}^{n} \frac{\partial \ell_i}{\partial \pi_i}\, \frac{\partial \pi_i}{\partial \eta_i}\, \frac{\partial \eta_i}{\partial \gamma_j},$$
where $\partial \ell_i/\partial \pi_i$ is the derivative of the $i$th observation's log-likelihood contribution with respect to $\pi_i$, analogous to the non-regression case but specific to observation $i$; $\partial \pi_i/\partial \eta_i$ is the derivative of the inverse link function, which for the logit link is $\pi_i (1 - \pi_i)$; and $\partial \eta_i/\partial \gamma_j$ is the $j$th covariate entry for observation $i$, i.e., $z_{ij}$. Thus,
$$\frac{\partial \ell}{\partial \gamma_j} = \sum_{i=1}^{n} \frac{\partial \ell_i}{\partial \pi_i}\, \pi_i (1 - \pi_i)\, z_{ij},$$
where $\eta_i = \mathbf{z}_i^{\top} \boldsymbol{\gamma}$. Similar applications of the chain rule yield the score equations for $\boldsymbol{\beta}$ and $\boldsymbol{\delta}$. The resulting score equations typically involve sums over all observations, with each term weighted by the respective covariates.
5. Numerical Examples
Due to the non-convex nature of the zPAP likelihood, we employ multiple random initializations (a multistart strategy) to reduce the risk of convergence to a suboptimal local maximum, and we accept the solution with the highest achieved log-likelihood value. While all parameters in the zPAP model are estimated jointly, the form of the log-likelihood suggests the utility of iterative numerical procedures; numerical optimization algorithms, such as quasi-Newton methods (e.g., BFGS), are required.
We utilized the Limited-Memory BFGS with bound constraints (L-BFGS-B) algorithm [19] for the computations in the numerical examples. Optimizing $\ell(\pi, \lambda, \alpha)$ over the constrained parameter space ensures that at each iterate, $0 < \pi < 1$, $\lambda > 0$, and $T(y; \alpha) \ge 0$ for all $y$. We compute the maximum-likelihood estimator by directly optimizing the observed-data log-likelihood $\ell(\pi, \lambda, \alpha)$ subject to the constraints
$$0 \le \pi \le 1, \qquad \lambda > 0, \qquad \alpha_j \ge 0 \ \text{for all } j.$$
This direct maximization avoids the slow E-step/M-step oscillations of the EM algorithm and leverages curvature information via the approximate inverse-Hessian updates, resulting in substantially faster convergence to the global, or at least a high-quality local, maximum of the zPAP log-likelihood.
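The following sketch, under the same assumed polynomial form and truncated normalizer used earlier, shows one way to run the constrained multistart L-BFGS-B fit with scipy; the bounds, starting ranges, and number of restarts are illustrative choices, not the authors' settings.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln
from scipy.stats import poisson

def neg_loglik(theta, y, y_max=200):
    """Negative zPAP log-likelihood; theta = (pi, lam, alpha_1, ..., alpha_d), alpha_0 = 1."""
    pi, lam = theta[0], theta[1]
    alpha = np.concatenate(([1.0], theta[2:]))
    grid = np.arange(y_max + 1)
    Z = np.sum(np.polyval(alpha[::-1], grid) * poisson.pmf(grid, lam))
    pos = y[y > 0]
    n0, n = np.sum(y == 0), y.size
    ll = (n0 * np.log(pi * Z + (1 - pi) * np.exp(-lam))
          + (n - n0) * np.log(1 - pi)
          + np.sum(np.log(np.polyval(alpha[::-1], pos)) + pos * np.log(lam)
                   - lam - gammaln(pos + 1))
          - n * np.log(Z))
    return -ll

def fit_zpap(y, degree=3, n_starts=20, seed=0):
    """Multistart L-BFGS-B: keep the run with the highest log-likelihood."""
    rng = np.random.default_rng(seed)
    bounds = [(1e-6, 1 - 1e-6), (1e-6, None)] + [(0.0, None)] * degree  # pi, lam, alpha_j >= 0
    best = None
    for _ in range(n_starts):
        theta0 = np.concatenate([[rng.uniform(0.05, 0.6), rng.uniform(0.5, 5.0)],
                                 rng.uniform(0.0, 0.5, degree)])
        res = minimize(neg_loglik, theta0, args=(np.asarray(y),),
                       method="L-BFGS-B", bounds=bounds)
        if res.success and (best is None or res.fun < best.fun):
            best = res
    return best
```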
5.1. Fish Catch Dataset
To illustrate the practical performance of the proposed zPAP model, we analyze the well-known Fish Catch dataset, which has been used extensively in the zero-inflated modeling literature. The dataset originates from the COUNT data repository compiled by Hilbe [20] and is available through several statistical software libraries. It contains records of recreational fishing trips on the Great Lakes and is characterized by a high frequency of zero counts, reflecting trips where no fish were caught.
All models are estimated by maximum likelihood using numerical optimization. The likelihood functions are derived from the model-specific probability mass functions, with parameter constraints and initialization tailored to each formulation. To accelerate convergence and avoid local maxima, we initialize the parameters using a method-of-moments approach. For the ZIP model, initial estimates of $\lambda$ and $\pi$ are obtained by matching the empirical mean and variance to their theoretical expressions. For the zPAP model, we use the same moment-matched $\lambda$ and $\pi$ as starting values and initialize the polynomial coefficients so that the adjuster is initially constant, i.e., the fit starts from the ZIP model. For the zPAP model, the normalizing constant involves an infinite sum over $y$. In practice, we truncate the sum at a sufficiently large upper bound, chosen to ensure that the omitted tail has a negligible effect on the computed probabilities.
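One standard moment-matching choice consistent with this description (the exact formulas used are not shown above, so this is an assumption) uses $E[Y] = (1-\pi)\lambda$ and $\mathrm{Var}[Y] = (1-\pi)\lambda(1 + \pi\lambda)$ for the ZIP model:

```python
import numpy as np

def zip_moment_init(y):
    """Method-of-moments starting values for ZIP: matches sample mean and variance."""
    m, v = np.mean(y), np.var(y)
    s = max(v / m - 1.0, 1e-6)   # s = pi * lam; guard against underdispersed samples
    lam0 = m + s                 # since m = (1 - pi) * lam = lam - s
    pi0 = min(max(s / lam0, 1e-6), 1 - 1e-6)
    return pi0, lam0
```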
To compare the relative quality of competing count-data models (e.g., standard ZIP versus various degrees of zPAP), we use two widely adopted information-criterion measures: the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These criteria quantify the trade-off between model fit and complexity by penalizing the maximized log-likelihood according to the number of estimated parameters.
Let $\ell(\hat{\theta})$ be the maximized log-likelihood for a model with parameter estimate $\hat{\theta}$, and let $k$ denote the total number of free parameters in that model. Then,
$$\mathrm{AIC} = -2\,\ell(\hat{\theta}) + 2k$$
and
$$\mathrm{BIC} = -2\,\ell(\hat{\theta}) + k \log n.$$
Conceptually, $-2\,\ell(\hat{\theta})$ measures lack of fit (smaller is better), while the penalty term ($2k$ in AIC, $k \log n$ in BIC) discourages over-parameterization. Since $\log n > 2$ whenever $n > e^{2} \approx 7.4$, BIC more strongly penalizes extra parameters in moderate-to-large samples.
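A direct transcription of these formulas (the log-likelihood value is the ZIP fit reported below; $k$ and $n$ are illustrative):

```python
import numpy as np

def aic_bic(loglik, k, n):
    """AIC = -2*loglik + 2k;  BIC = -2*loglik + k*log(n)."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * np.log(n)

print(aic_bic(loglik=-921.62, k=2, n=250))   # k and n assumed for illustration
```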
As can be seen in Table 1, the ZIP model already fits the data very well, with a log-likelihood of −921.62 and moderate AIC/BIC values. Adding a first-degree polynomial adjuster (zPAP(1)) delivers a large boost in fit (the log-likelihood jumps to −671), showing that even a small weight function can capture departures from the basic ZIP. Moving to second and third degrees (zPAP(2) and zPAP(3)) yields further improvements (log-likelihoods of −668 and −661). The estimated $\alpha$-coefficients for the higher-order terms remain near zero, which means the model is only adjusting minor discrepancies with the ZIP baseline. Throughout, the core estimates of $\pi$ and $\lambda$ stay almost unchanged. This stability shows that zPAP is not overreacting or unstable; it simply adapts to small differences between the data and the ZIP fit, improving the model in a controlled and interpretable way. As summarized in Table 2, the estimated structural zero probabilities ($\hat{\pi}$) and Poisson means ($\hat{\lambda}$) for the observed data and each ZIP/zPAP model are reported.
Figure 1 displays the observed count distribution alongside the fitted frequencies for both models. The standard ZIP model already provides an accurate representation of the data, capturing the heavy zero-inflation and overall dispersion effectively. The zPAP model likewise achieves a comparable level of accuracy. Because the ZIP fit is strong to begin with, the polynomial adjustment in zPAP yields only marginal improvements in fit accuracy. Importantly, as the degree of the polynomial adjuster is increased, the zPAP estimates remain stable and do not deteriorate, demonstrating robustness even when higher-order terms are introduced.
We compare two count regression models on the Fish Catch dataset: ZIP and zPAP. Each model is composed of two components: a count component for the number of fish caught, and a zero-inflation component to model structural zeros. The covariate structure is identical across models to ensure a fair comparison. Each observation corresponds to a fishing trip, and the primary response variable is the number of fish caught, denoted by $y_i$. We include one key covariate in each component: persons (the number of individuals participating in the fishing party) is used to model the Poisson mean $\lambda_i$ in all fitted models, while camper (a binary indicator equal to 1 if the party camped overnight before the trip and 0 otherwise) is used to model the probability of structural zeros $\pi_i$. Specifically, for each group $i$, we set $\mathbf{x}_i = (1, \mathrm{persons}_i)^{\top}$ and $\mathbf{z}_i = (1, \mathrm{camper}_i)^{\top}$, so each model includes an intercept term. These selections reflect intuitive hypotheses: a larger party may have higher expected catch rates (affecting $\lambda_i$), while those who camped may differ systematically in their likelihood of catching zero fish due to differing engagement or fishing strategies (affecting $\pi_i$).
The Poisson mean parameter $\lambda_i$ is modeled via a log-linear regression on the number of persons in the fishing party,
$$\log(\lambda_i) = \beta_0 + \beta_1\, \mathrm{persons}_i,$$
and the structural zero probability $\pi_i$ is modeled through a logistic regression on the camper indicator,
$$\operatorname{logit}(\pi_i) = \gamma_0 + \gamma_1\, \mathrm{camper}_i.$$
Then, in the zPAP model, the count distribution is adjusted by a degree-3 polynomial applied to the Poisson kernel—that is, the adjusted probability of count $y$ is proportional to
$$T(y; \alpha)\, \frac{e^{-\lambda_i} \lambda_i^{y}}{y!}, \qquad T(y; \alpha) = \alpha_0 + \alpha_1 y + \alpha_2 y^{2} + \alpha_3 y^{3}.$$
The normalizing constant $Z(\lambda_i, \alpha)$ ensures this defines a proper PMF.
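A sketch of the corresponding observed-data log-likelihood, under the assumed polynomial form, truncated per-observation normalizers, and hypothetical argument names:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

def zpap_reg_loglik(beta, gamma, alpha, persons, camper, y, y_max=200):
    """log lam_i = beta0 + beta1*persons_i,  logit(pi_i) = gamma0 + gamma1*camper_i,
    degree-3 adjuster T(y) = alpha0 + alpha1*y + alpha2*y^2 + alpha3*y^3."""
    lam = np.exp(beta[0] + beta[1] * persons)
    pi = 1.0 / (1.0 + np.exp(-(gamma[0] + gamma[1] * camper)))
    grid = np.arange(y_max + 1)
    T_grid = np.polyval(alpha[::-1], grid)
    Z = poisson.pmf(grid[None, :], lam[:, None]) @ T_grid        # Z(lam_i, alpha) per observation
    log_pap = (np.log(np.polyval(alpha[::-1], y)) + y * np.log(lam)
               - lam - gammaln(y + 1) - np.log(Z))
    ll = np.where(y == 0,
                  np.log(pi + (1 - pi) * np.exp(log_pap)),
                  np.log(1 - pi) + log_pap)
    return float(np.sum(ll))
```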
We summarize the estimated parameters and model fit statistics for the ZIP and zPAP models applied to the Fish Catch dataset. Table 3 reports the maximum-likelihood estimates for all model parameters, including the regression coefficients in the count and zero-inflation components, and the auxiliary polynomial parameters ($\alpha_1$, $\alpha_2$, $\alpha_3$ for zPAP). The coefficient on persons, $\hat{\beta}_1$, is positive across all models, indicating that groups with more persons are associated with higher expected catch counts. The coefficient on camper, $\hat{\gamma}_1$, is negative in all models, suggesting that campers have lower odds of being structural zeros, possibly due to better preparation or more serious fishing intent. The zPAP model introduces additional flexibility via the polynomial adjustment: the signs and magnitudes of the estimated coefficients $\hat{\alpha}_1$, $\hat{\alpha}_2$, and $\hat{\alpha}_3$ reflect the shape distortions applied to the baseline Poisson kernel. In terms of log-likelihood and information criteria, the zPAP model achieves the highest likelihood and the lowest AIC and BIC values, indicating the best overall fit to the data. This confirms that adjusting the count distribution via a polynomial weight offers substantial gains in model flexibility over ZIP.
5.2. Artificial Dataset
To evaluate the ability of the zPAP model to accommodate both zero-inflation and multimodal count behavior, we augmented the original Fish Catch dataset by superimposing an artificial latent subpopulation. Specifically, an additional set of the same number of observations was drawn from a Poisson distribution with a larger mean and appended to the existing counts. As a result, the combined dataset retains the original zero-inflation while also exhibiting a pronounced secondary mode around 10. Creating this bimodal, zero-inflated dataset is crucial for two reasons: first, it provides a controlled setting in which to assess whether the zPAP model can simultaneously capture excess zeros and a second mass of high counts; second, it mimics real-world scenarios in which heterogeneous subpopulations (e.g., light versus heavy fish catchers) coexist, thereby demonstrating the practical utility of zPAP for complex count data.
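A sketch of this augmentation is given below; the original counts are replaced here by a zero-inflated stand-in, and the mean of the added Poisson component is not stated above, so a value of 10 is assumed to match the secondary mode described in the text.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for the original Fish Catch response; replace with the real counts.
counts = rng.poisson(1.0, size=250) * rng.binomial(1, 0.6, size=250)
# Append the same number of draws from a second Poisson component (mean assumed = 10).
extra = rng.poisson(10.0, size=counts.size)
augmented = np.concatenate([counts, extra])
print(np.mean(augmented == 0))   # zero fraction of the combined sample
```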
Figure 2 overlays the data histogram with each model’s fitted frequencies. The ZIP curve is too narrow—it underestimates overall spread and fails to reproduce the second mode. As we increase the polynomial degree, the zPAP fit rapidly converges to the true distribution, accurately matching both the large spike at zero and the bimodal peaks.
Table 4 compares the observed proportion of zero counts in the data with the proportions predicted by each model. As before, the zPAP model provides the closest match, highlighting its capacity to leverage both covariates and the shape of the response distribution. The table presents the estimated Poisson means ($\hat{\lambda}$), zero-inflation parameters ($\hat{\pi}$), and the resulting zero probabilities $P(Y = 0)$ across models of increasing polynomial degree. Although the zero-inflation parameter $\hat{\pi}$ varies with model complexity, the predicted zero probabilities remain close to the empirical value (≈0.2874). This demonstrates that the zPAP model—particularly at higher degrees—can flexibly adjust both $\pi$ and the polynomial weights to accurately reproduce the observed zero mass, even under varying internal parameterizations.
As can be seen in Figure 3, the pointwise differences between the observed PMF and the model predictions—ZIP (red dots) and zPAP(4) (green triangles)—demonstrate that the proposed fourth-degree zPAP model substantially reduces discrepancies across the entire range of count values, yielding a more accurate fit to the empirical distribution.
As illustrated in Figure 4, and consistent with the discussion of the convergence of the normalizing constant in Section 2, the polynomial adjuster $T(y; \alpha)$ grows at most as $O(y^{d})$ as $y \to \infty$. In contrast, Figure 3 shows that the Poisson kernel $e^{-\lambda} \lambda^{y}/y!$ decays super-exponentially, ensuring that the overall model remains stable and capable of capturing heavy-tailed distributions effectively.
5.3. Constructing Bootstrap Confidence Intervals for MLEs
To rigorously assess the uncertainty associated with the MLEs of the proposed zPAP model, one must first establish their large-sample behavior. An analytical derivation of asymptotic properties, namely consistency, asymptotic normality, and efficiency, is needed to validate that the MLEs converge to the true parameter values and attain the Cramér–Rao lower bound. In classical settings, these results follow from regularity conditions on the likelihood and are underpinned by the Fisher information matrix and the observed Hessian of the log-likelihood. For the zPAP framework, however, the expressions for the Hessian matrix, the Fisher information matrix, and hence the asymptotic covariance matrix of the MLEs become complex. Closed-form second derivatives of the log-likelihood involve high-dimensional sums and non-standard functions that preclude straightforward inversion and undermine numerical stability, especially in finite samples.
To overcome these challenges and still provide valid inference on the artificial Fish Catch data characterized by zero inflation and bimodal Poisson counts, we employ a parametric bootstrap approach with $B = 1000$ replications. Specifically, we generate synthetic datasets by drawing from the fitted zPAP distribution at the estimated parameter values, re-estimate the MLEs for each replicate, and then use the empirical distribution of these bootstrap estimates to form confidence intervals. This procedure not only circumvents the analytical intractability of the model's Hessian but also delivers accurate finite-sample inference.
From the 1000 bootstrap samples we compute standard errors, confidence intervals, and the full covariance matrix of the MLEs, providing reliable finite-sample inference without analytic second derivatives. To assess the reliability of normal-theory intervals, we also evaluate their coverage probabilities under the assumption of asymptotic normality.
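A generic sketch of this parametric bootstrap loop follows; `simulate` and `estimate` stand for user-supplied routines (not shown in the paper) that draw a dataset from the fitted zPAP model and recompute the MLEs.

```python
import numpy as np

def parametric_bootstrap(theta_hat, simulate, estimate, B=1000, seed=0):
    """Simulate B datasets from the fitted model, re-estimate on each, and
    summarize: bootstrap means, standard errors, percentile CIs, covariance."""
    rng = np.random.default_rng(seed)
    draws = np.array([estimate(simulate(theta_hat, rng)) for _ in range(B)])
    se = draws.std(axis=0, ddof=1)                     # bootstrap standard errors
    ci = np.percentile(draws, [2.5, 97.5], axis=0).T   # percentile 95% intervals
    cov = np.cov(draws, rowvar=False)                  # empirical covariance matrix
    return {"mean": draws.mean(axis=0), "se": se, "ci": ci, "cov": cov}
```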
Figure 5 illustrates the bootstrap sampling distributions of two parameters: in the left panel, the histogram of the Poisson mean ($\hat{\lambda}$) estimates is markedly right-skewed, with a long upper tail indicating occasional larger values, whereas in the right panel the distribution of the zero-inflation probability ($\hat{\pi}$) estimates is symmetric and bell-shaped, closely matching the Gaussian curve expected under asymptotic normality.
Figure 6 shows the bootstrap sampling distributions of the four degree-4 polynomial adjuster coefficients. While all four distributions are centered around their true values, none are perfectly symmetric: two of the coefficient estimates are noticeably right-skewed, with pronounced upper tails, whereas the other two exhibit left-skewness, reflected in their longer lower tails.
In Table 5, each column summarizes a different aspect of the bootstrap-derived sampling distribution of the MLEs, all based on $B = 1000$ replications. The “Mean” is the average of the $B$ bootstrap estimates for a given parameter. Because the bootstrap draws repeatedly from the fitted model, this empirical average serves not only as a point estimate but also as a simple bias check—if the bootstrap mean differs notably from the original MLE, it suggests small-sample bias. The “Standard Error” is calculated as the sample standard deviation of those $B$ estimates, that is,
$$\widehat{\mathrm{SE}}(\hat{\theta}) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\hat{\theta}^{(b)} - \bar{\theta}^{*}\right)^{2}}, \qquad \bar{\theta}^{*} = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^{(b)},$$
and it quantifies the finite-sample variability of the estimator without relying on analytic Hessians or Fisher information.
The 95% confidence interval in the table is constructed by taking the 2.5th and 97.5th percentiles of the sorted bootstrap estimates, so that for each parameter the interval encloses the central 95% of its bootstrap values. This percentile method implicitly accounts for skewness in the bootstrap distribution, so asymmetric intervals arise naturally when the sampling distribution is skewed.
Under the asymptotic-normality assumption, each 95% interval is formed as
$$\hat{\theta} \pm 1.96\, \widehat{\mathrm{SE}}(\hat{\theta}).$$
Table 5 shows that these intervals achieve coverage probabilities very close to the nominal 95% level, with several parameters at or near the target (0.950, 0.954, and 0.944). The scale parameter exhibits slight under-coverage (0.933), while the highest-order polynomial coefficient is slightly conservative (0.969), reflecting its very small variance and the corresponding tail behavior of its bootstrap distribution.
The bootstrap covariance matrix is the empirical covariance of the bootstrap estimates for all six parameters, with rows and columns corresponding to $\hat{\lambda}$, $\hat{\pi}$, and the four polynomial coefficients. Each diagonal entry gives the bootstrap variance of that parameter, and these variances correspond to the squared standard errors reported in Table 5. The much larger variance for $\hat{\lambda}$ than for $\hat{\pi}$ reflects greater sampling variability in the scale parameter under our model. Off-diagonal entries capture pairwise covariances: a positive covariance indicates that bootstrap draws in which one estimate is higher than its mean tend also to have a larger value of the other, whereas a negative covariance shows that when one estimate is relatively large, the other tends to be smaller. The tiny covariances involving the highest-order coefficient $\hat{\alpha}_4$ are consistent with its very small variance (0.00000007), indicating that $\hat{\alpha}_4$ is estimated with high precision and almost independently of the other coefficients.
6. Concluding Remarks
We introduced the zero-inflated Polynomially Adjusted Poisson (zPAP) model. It extends the usual zero-inflated Poisson to handle extra zeros, overdispersion, skewness, and even bimodal counts. The key idea is to multiply the Poisson kernel by a simple polynomial weight, while still using a logistic link for zero inflation.
In the main text, we gave the full mathematical setup. We wrote down the adjusted likelihood and showed how each parameter—zero-inflation probability $\pi$, Poisson rate $\lambda$, and polynomial coefficients $\alpha$—can depend on covariates. We then derived the maximum-likelihood equations and explained how to compute them in practice. That discussion covers how to pick starting values, enforce identifiability constraints, and evaluate the normalizing constant and its derivatives. By spelling out the log-likelihood and score functions, we give both the theory and a clear recipe for implementation.
We tested the zPAP model on two datasets. First, we used the Fish Catch data, which has many zeros and high dispersion. Then, we created a mixed dataset by adding random Poisson counts to the Fish Catch observations, producing a clear bimodal pattern. In both cases, zPAP fit the data better than the standard ZIP model, handling multiple peaks and skewed shapes with ease. We generated 1000 bootstrap datasets by sampling from the fitted zPAP(4) model, re-estimating the MLEs on each replicate, and used the resulting estimate distributions to compute standard errors, confidence intervals, and the covariance matrix.
In summary, zPAP is a powerful yet interpretable extension of classical count models. It can flexibly capture a wide range of patterns in count data. Future work will first establish the large-sample properties of the zPAP maximum-likelihood estimators—proving consistency, asymptotic normality, and efficiency under standard regularity conditions. We will also investigate using an orthogonal polynomial basis for $T(y; \alpha)$ to improve numerical stability and interpretability. In addition, we plan to compare different fitting methods—such as the method of moments and least squares—against maximum likelihood to see how they perform in practice.