Modeling Under-Dispersed Count Data by the Generalized Poisson Distribution via Two New MM Algorithms

Xun-Jian Li; Guo-Liang Tian; Mingqian Zhang; George To Sum Ho; Shuang Li

doi:10.3390/math11061478

¹

Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen 518055, China

²

Department of Supply Chain and Information Management, The Hang Seng University of Hong Kong, Shatin, N.T., Hong Kong, China

³

Department of Mathematics, Southern University of Science and Technology, Shenzhen 518055, China

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(6), 1478; https://doi.org/10.3390/math11061478

This article belongs to the Special Issue Computational Statistics and Data Analysis

Abstract

Under-dispersed count data often appear in clinical trials, medical studies, demography, actuarial science, ecology, biology, industry and engineering. Although the generalized Poisson (GP) distribution possesses the twin properties of under- and over-dispersion, in the past 50 years many authors have treated the GP distribution only as an alternative to the negative binomial distribution for modeling over-dispersed count data. To the best of our knowledge, the problem of calculating maximum likelihood estimates (MLEs) of parameters in the GP model, without and with covariates, for the case of under-dispersion has not been solved until now. In this paper, we first develop a new minorization–maximization (MM) algorithm to calculate the MLEs of parameters in the GP distribution with under-dispersion, and then develop another new MM algorithm to compute the MLEs of the vector of regression coefficients for the GP mean regression model for the case of under-dispersion. Three hypothesis tests (i.e., the likelihood ratio, Wald and score tests) are provided. Some simulations are conducted. The Bangladesh demographic and health surveys dataset is analyzed to illustrate the proposed methods, and comparisons with the existing Conway–Maxwell–Poisson regression model are also presented.

Keywords:

generalized Poisson distribution; mean regression model; MM algorithms; over-dispersion; under-dispersion

MSC:

62-08

1. Introduction

Under-dispersed count data often appear in clinical trials, medical studies, demography, actuarial science, ecology, biology, industry and engineering. Examples include the number of embryonic deaths in mice in a clinical experiment [1], the number of power outages on each of 648 circuits in a power distribution system in the southeastern United States [2], the number of automotive services purchased on each visit for a customer at a US automotive services firm [3], the species richness that is the simplest measure of species diversity [4], the number of births during a period for women who live in Bangladesh (https://www.dhsprogram.com/data, accessed on 28 January 2022) and so on.

The Poisson distribution is suitable for modeling equally dispersed count data, while the negative binomial distribution is often utilized to model over-dispersed count data. To fit under-dispersed count data, theoretically speaking, researchers should employ the generalized Poisson (GP) distribution because it possesses the twin properties of under- and over-dispersion [5,6,7,8,9,10,11,12,13]. However, in the past 50 years, most authors have treated the GP distribution merely as an alternative to the negative binomial distribution, focusing on its over-dispersion property while seeming to ignore its under-dispersion characteristic [6,8,9,10,11,12,13], although Consul & Famoye [6] proved that unique MLEs of the parameters exist in both the over- and under-dispersion cases. The main obstacle to using the GP distribution with under-dispersion is that calculating the maximum likelihood estimates (MLEs) of parameters in GP models, without or with covariates, by a stable algorithm is not easy. To the best of our knowledge, the problem of calculating MLEs of parameters in the GP model, without or with covariates, for the case of under-dispersion has not been solved until now; in other words, the existing algorithms may not return the correct MLEs of the parameters, see Section 5.

A non-negative integer-valued random variable (r.v.) X is said to follow a generalized Poisson (GP) distribution with parameters

λ > 0

and

ψ

, denoted by

X \sim G P (λ, ψ)

, if its probability mass function (pmf) is given by [5,14]

p (x | λ, ψ) = \begin{cases} \dfrac{λ {(λ + ψ x)}^{x - 1} e^{- λ - ψ x}}{x!}, & x = 0, 1, \dots, \infty, \\ 0, & x > r \text{ when } ψ < 0, \end{cases}

where

max (- 1, - λ / r) < ψ < 1

and

r (⩾ 4)

is the largest positive integer for which

λ + ψ r > 0

when

ψ < 0

. The expectation and variance of X are given by [14]

\begin{matrix} E (X) = \frac{λ}{1 - ψ} and Var (X) = \frac{λ}{{(1 - ψ)}^{3}}, \end{matrix}

respectively. The

G P (λ, ψ)

distribution reduces to the

Poisson (λ)

when

ψ = 0

, and it has the twin properties of over-dispersion when

ψ > 0

and under-dispersion when

ψ < 0

.
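The definition above can be checked numerically. The following is a minimal Python sketch, not code from the paper: the helper names `gp_support_max`, `gp_pmf` and `gp_summary` are hypothetical, and the infinite support is cut off at an assumed upper limit of 400 terms. It evaluates the pmf in log space and compares the numerical mean and variance with the closed forms stated above.

```python
import math

def gp_support_max(lam, psi):
    """Largest integer r with lam + psi * r > 0 when psi < 0 (None if psi >= 0)."""
    if psi >= 0:
        return None
    return math.ceil(-lam / psi) - 1

def gp_pmf(x, lam, psi):
    """pmf of GP(lam, psi) at an integer x; zero outside the support."""
    base = lam + psi * x
    if x < 0 or base <= 0:
        return 0.0
    # log of lam * (lam + psi x)^(x - 1) * exp(-lam - psi x) / x!
    log_p = math.log(lam) + (x - 1) * math.log(base) - base - math.lgamma(x + 1)
    return math.exp(log_p)

def gp_summary(lam, psi, upper=400):
    """Total mass, mean and variance from the pmf summed over 0..upper."""
    probs = [gp_pmf(x, lam, psi) for x in range(upper + 1)]
    total = sum(probs)
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return total, mean, var
```

For ψ < 0 the pmf is truncated at r, so the total mass and the moments agree with the closed forms only approximately; the sketch makes that gap visible rather than correcting for it.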

To formulate the mean regression of the GP distribution, Consul & Famoye [7] introduced a so-called Type I generalized Poisson (

{GP}^{(I)}

) distribution, denoted by

Y \sim {GP}^{(I)} (μ, α)

, through the following reparameterizations:

\begin{matrix} μ = λ {(1 - ψ)}^{- 1} > 0 and α = {(1 - ψ)}^{- 1} . \end{matrix}

(1)

It is easy to show that the pmf of Y is

p (y | μ, α) = \begin{cases} \dfrac{μ {[μ + (α - 1) y]}^{y - 1} \exp \{- [μ + (α - 1) y] / α\}}{α^{y} y!}, & y = 0, 1, \dots, \infty, \\ 0, & y > m \text{ when } α < 1, \end{cases}

(2)

where

μ > 0

,

α ⩾ max (1 / 2, 1 - μ / m)

and

m (⩾ 4)

is the largest positive integer for which

μ + (α - 1) m > 0

when

α < 1

. The mean and variance of Y are given by:

E (Y) = μ and Var (Y) = α^{2} μ,

respectively, where

α

denotes the square root of the index of dispersion. The

{GP}^{(I)} (μ, α)

distribution reduces to the

Poisson (μ)

when

α = 1

, and it has the twin properties of over-dispersion when

α > 1

and under-dispersion when

α < 1

. Thus, the mean regression model for the

{GP}^{(I)}

distribution is [7]

\begin{matrix} {Y_{i}}_{i = 1}^{n} \overset{ind}{\sim} {GP}^{(I)} (μ_{i}, α) and log (μ_{i}) = w_{i}^{⊤} β, i = 1, \dots, n, \end{matrix}

(3)

where the notation “

{Y_{i}}_{i = 1}^{n} \overset{ind}{\sim} {GP}^{(I)} (μ_{i}, α)

” means that

Y_{1}, \dots, Y_{n}

follow the same

{GP}^{(I)}

distribution family but with different mean parameters

μ_{1}, \dots, μ_{n}

, and

Y_{1}, \dots, Y_{n}

are independent;

w_{i} = {(1, w_{i 1}, \dots, w_{i, q - 1})}^{⊤}

is the covariate vector of subject i and

β = {(β_{0}, β_{1}, \dots, β_{q - 1})}^{⊤}

is the vector of regression coefficients.
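As a small illustration of the reparameterization (1), the pmf (2) and the log link in (3), the Python sketch below converts between (λ, ψ) and (μ, α) and evaluates both forms of the pmf. All function names are hypothetical and chosen only for this example.

```python
import math

def to_gp1(lam, psi):
    """(lam, psi) -> (mu, alpha) via the reparameterization (1)."""
    return lam / (1.0 - psi), 1.0 / (1.0 - psi)

def to_gp(mu, alpha):
    """(mu, alpha) -> (lam, psi), the inverse of (1)."""
    return mu / alpha, 1.0 - 1.0 / alpha

def gp_pmf(x, lam, psi):
    """pmf of GP(lam, psi) at an integer x >= 0."""
    base = lam + psi * x
    if x < 0 or base <= 0:
        return 0.0
    return math.exp(math.log(lam) + (x - 1) * math.log(base)
                    - base - math.lgamma(x + 1))

def gp1_pmf(y, mu, alpha):
    """pmf (2) of GP^(I)(mu, alpha) at an integer y >= 0."""
    base = mu + (alpha - 1.0) * y
    if y < 0 or base <= 0:
        return 0.0
    return math.exp(math.log(mu) + (y - 1) * math.log(base) - base / alpha
                    - y * math.log(alpha) - math.lgamma(y + 1))

def mean_from_covariates(w_i, beta):
    """Log link of model (3): mu_i = exp(w_i' beta)."""
    return math.exp(sum(a * b for a, b in zip(w_i, beta)))
```

The two pmf functions should return the same value at matching parameters, which gives a quick sanity check on any implementation of (2).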

This paper mainly focuses on developing two new MM algorithms to stably calculate the MLEs of parameters in the

{GP}^{(I)} (μ, α)

distribution with under-dispersion (i.e.,

α < 1

) and the MLEs of the vector

β

of regression coefficients and the parameter

α

for the mean regression model in (3). In addition, we compare the goodness of fit and the computational efficiency of the

{GP}^{(I)}

mean regression model and the Conway–Maxwell–Poisson regression model in simulations and real data analysis.

2. MLEs of Parameters in Generalized Poisson with Under-Dispersion and Its Mean Regression Model

Let

{Y_{i}}_{i = 1}^{n} \overset{iid}{\sim} {GP}^{(I)} (μ, α)

and

Y_{obs} = {\{y_{i}\}}_{i = 1}^{n}

denote the observed counts. Define

\begin{aligned} I_{0} & ≜ \{i : y_{i} = 0, 1 ⩽ i ⩽ n\}, \\ I_{1} & ≜ \{i : y_{i} = 1, 1 ⩽ i ⩽ n\} \quad \text{and} \\ I_{2} & ≜ \{i : y_{i} ⩾ 2, 1 ⩽ i ⩽ n\} . \end{aligned}

Let

m_{k}

denote the number of elements in

I_{k}

for

k = 0, 1, 2

, then we have

m_{0} + m_{1} + m_{2} = n

. Based on (2), the likelihood function of

{μ, α}

is

\begin{aligned} L (μ, α) & = \left(\prod_{i \in I_{0}} e^{- \frac{μ}{α}}\right) \left(\prod_{i \in I_{1}} \frac{μ}{α} e^{- \frac{μ + α - 1}{α}}\right) \times \prod_{i \in I_{2}} \frac{μ {[μ + (α - 1) y_{i}]}^{y_{i} - 1} \exp \{- [μ + (α - 1) y_{i}] / α\}}{α^{y_{i}} y_{i}!} \\ & \propto \exp \left(- \frac{m_{0} μ}{α}\right) \cdot {\left(\frac{μ}{α}\right)}^{m_{1}} \exp \left[- \frac{m_{1} (μ + α - 1)}{α}\right] \times \frac{μ^{m_{2}}}{α^{\sum_{i \in I_{2}} y_{i}}} \exp \left\{- \frac{m_{2} μ + (α - 1) \sum_{i \in I_{2}} y_{i}}{α}\right\} \cdot \prod_{i \in I_{2}} {[μ + (α - 1) y_{i}]}^{y_{i} - 1}, \end{aligned}

where

\sum_{i \in I_{2}} y_{i} = n \bar{y} - m_{1}

and

\bar{y} = (1 / n) \sum_{i = 1}^{n} y_{i}

. Then, the log-likelihood function of

{μ, α}

is given by

\begin{aligned} ℓ (μ, α) & = - \frac{m_{0} μ}{α} + m_{1} [\log (μ) - \log (α)] - \frac{m_{1} (μ + α - 1)}{α} + m_{2} \log (μ) - (n \bar{y} - m_{1}) \log (α) - \frac{m_{2} μ + (α - 1) (n \bar{y} - m_{1})}{α} + \sum_{i \in I_{2}} (y_{i} - 1) \log [y_{i} (y_{i}^{- 1} μ + α - 1)] \\ & = (m_{1} + m_{2}) \log (μ) - n \bar{y} \log (α) - \frac{n (μ - \bar{y})}{α} - n \bar{y} + \sum_{i \in I_{2}} (y_{i} - 1) \log (y_{i}) + \sum_{i \in I_{2}} (y_{i} - 1) \log (y_{i}^{- 1} μ + α - 1) \\ & = (m_{1} + m_{2}) \log (μ) - n \bar{y} \log (α) - \frac{n (μ - \bar{y})}{α} + \sum_{i \in I_{2}} (y_{i} - 1) \log (y_{i}^{- 1} μ + α - 1) + c_{1}, \end{aligned}

(4)

where

c_{1}

is a constant free from

{μ, α}

.
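The simplification leading to (4) can be verified numerically. The sketch below (hypothetical function names, plain Python) computes the log-likelihood once by summing the log of the pmf (2), and once through the last line of (4) without the constant. The difference between the two should not depend on (μ, α).

```python
import math

def loglik_direct(y, mu, alpha):
    """Sum of log pmf (2) over the sample, including the -log(y_i!) terms."""
    total = 0.0
    for yi in y:
        base = mu + (alpha - 1.0) * yi
        total += (math.log(mu) + (yi - 1) * math.log(base) - base / alpha
                  - yi * math.log(alpha) - math.lgamma(yi + 1))
    return total

def loglik_kernel(y, mu, alpha):
    """Last line of (4) without the constant c_1."""
    n = len(y)
    ybar = sum(y) / n
    m1 = sum(1 for yi in y if yi == 1)
    m2 = sum(1 for yi in y if yi >= 2)
    tail = sum((yi - 1) * math.log(mu / yi + alpha - 1.0) for yi in y if yi >= 2)
    return ((m1 + m2) * math.log(mu) - n * ybar * math.log(alpha)
            - n * (mu - ybar) / alpha + tail)
```

Working with the kernel alone is enough for maximization, since the constant drops out of every comparison between parameter values.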

2.1. MLEs of ${μ, α}$ via a New MM Algorithm

This subsection aims to find the MLEs of

{μ, α}

for the case of

α < 1

. Define

y_{max} ≜ {max}_{i \in I_{2}} y_{i}

. Because

y_{i}^{- 1} μ + α - 1 > 0

for all

i \in I_{2}

, we have

y_{max}^{- 1} μ + α - 1 > 0

. Thus, we obtain

\begin{aligned} \log (y_{i}^{- 1} μ + α - 1) & = \log [(y_{i}^{- 1} - y_{max}^{- 1}) μ + (y_{max}^{- 1} μ + α - 1)] \\ & \overset{(A2)}{⩾} v_{i}^{(t, t)} \log (μ) + (1 - v_{i}^{(t, t)}) \log [μ + (α - 1) y_{max}] + c_{2 i}^{(t)}, \end{aligned}

(5)

for all

i \in I_{2}

, where

v_{i}^{(t, t)} ≜ v_{i} (μ^{(t)}, α^{(t)}) \quad \text{and} \quad v_{i} (μ, α) ≜ \frac{(y_{i}^{- 1} - y_{max}^{- 1}) μ}{y_{i}^{- 1} μ + α - 1}, \quad i \in I_{2},

and

c_{2 i}^{(t)}

is a constant free from

{μ, α}

.

By combining (4) and (5), we have

\begin{aligned} ℓ (μ, α) & ⩾ a_{1}^{(t, t)} \log (μ) - n \bar{y} \log (α) - \frac{n (μ - \bar{y})}{α} + a_{2}^{(t, t)} \log [μ + (α - 1) y_{max}] + c_{3}^{(t)} \\ & ≜ Q (μ, α | μ^{(t)}, α^{(t)}), \end{aligned}

which minorizes

ℓ (μ, α)

at

{(μ, α)}^{⊤} = {(μ^{(t)}, α^{(t)})}^{⊤}

, where

a_{1}^{(t, t)} = m_{1} + m_{2} + \sum_{i \in I_{2}} (y_{i} - 1) v_{i}^{(t, t)}, a_{2}^{(t, t)} = \sum_{i \in I_{2}} (y_{i} - 1) (1 - v_{i}^{(t, t)}),

and

c_{3}^{(t)}

is a constant free from

{μ, α}

. Thus, by maximizing

Q (μ, α | μ^{(t)}, α^{(t)})

, we have the following MM iterates:

\begin{matrix} μ^{(t + 1)} & = & \frac{a_{3} (α^{(t)}) + \sqrt{a_{3}^{2} (α^{(t)}) + 4 n (1 - 1 / α^{(t)}) y_{max} a_{1}^{(t, t)}}}{2 n} \times α^{(t)} and \end{matrix}

(6)

\begin{matrix} α^{(t + 1)} & = & \frac{a_{4} (μ^{(t + 1)}) + \sqrt{a_{4}^{2} (μ^{(t + 1)}) + 4 (n \bar{y} - a_{2}^{(t + 1, t)}) \times a_{5} (μ^{(t + 1)})}}{2 (n \bar{y} - a_{2}^{(t + 1, t)})}, \end{matrix}

(7)

where

\begin{matrix} a_{3} (α) & = & - n (1 - α^{- 1}) y_{max} + n \bar{y}, \\ a_{4} (μ) & = & n μ (1 - \bar{y} y_{max}^{- 1}) and \\ a_{5} (μ) & = & n (μ - \bar{y}) (μ y_{max}^{- 1} - 1) . \end{matrix}

According to the one-to-one transformation (1), we can obtain the MLEs of

{λ, ψ}

as

\hat{ψ} = 1 - {\hat{α}}^{- 1} \quad \text{and} \quad \hat{λ} = {\hat{α}}^{- 1} \hat{μ},

where

{\hat{μ}, \hat{α}}

can be calculated through (6) and (7).
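The iterates (6) and (7) can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the authors' code: the function names, the starting point (μ = ȳ, α = 1), the stopping rule and the iteration cap are all choices made for this example, and the sample is assumed to contain at least one observation with y ⩾ 2.

```python
import math

def gp1_loglik(y, mu, alpha):
    """Log-likelihood of an i.i.d. GP^(I)(mu, alpha) sample, summed from pmf (2)."""
    total = 0.0
    for yi in y:
        base = mu + (alpha - 1.0) * yi
        total += (math.log(mu) + (yi - 1) * math.log(base) - base / alpha
                  - yi * math.log(alpha) - math.lgamma(yi + 1))
    return total

def mm_gp1(y, alpha0=1.0, max_iter=5000, tol=1e-12):
    """Alternate the updates (6) and (7); returns (mu, alpha, log-likelihood path)."""
    n = len(y)
    ybar = sum(y) / n
    big = [yi for yi in y if yi >= 2]        # observations in I_2
    m12 = sum(1 for yi in y if yi >= 1)      # m_1 + m_2
    ymax = max(big)
    mu, alpha = ybar, alpha0                 # assumed starting values

    def v(yi, mu, alpha):
        # weight v_i(mu, alpha) defined below (5)
        return (1.0 / yi - 1.0 / ymax) * mu / (mu / yi + alpha - 1.0)

    path = [gp1_loglik(y, mu, alpha)]
    for _ in range(max_iter):
        # mu-step, Equation (6), with weights evaluated at (mu_t, alpha_t)
        a1 = m12 + sum((yi - 1) * v(yi, mu, alpha) for yi in big)
        k = (1.0 - 1.0 / alpha) * ymax
        a3 = -n * k + n * ybar
        disc = max(a3 * a3 + 4.0 * n * k * a1, 0.0)   # guard against rounding
        mu = alpha * (a3 + math.sqrt(disc)) / (2.0 * n)
        # alpha-step, Equation (7), with weights re-evaluated at (mu_{t+1}, alpha_t)
        a2 = sum((yi - 1) * (1.0 - v(yi, mu, alpha)) for yi in big)
        a4 = n * mu * (1.0 - ybar / ymax)
        a5 = n * (mu - ybar) * (mu / ymax - 1.0)
        denom = n * ybar - a2
        disc = max(a4 * a4 + 4.0 * denom * a5, 0.0)
        alpha = (a4 + math.sqrt(disc)) / (2.0 * denom)
        path.append(gp1_loglik(y, mu, alpha))
        if abs(path[-1] - path[-2]) < tol:
            break
    return mu, alpha, path
```

Because each step maximizes a minorizing function, the recorded log-likelihood path should be non-decreasing; checking that property is a cheap way to catch implementation mistakes.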

2.2. MLEs of ${β, α}$ in the Mean Regression Model

In this subsection, we consider the mean regression model (3) with

α < 1

. Similar to (4), the log-likelihood function of

{β, α}

is given by

\begin{matrix} ℓ (β, α) = \sum_{i = 1}^{n} \{b_{i 1} w_{i}^{⊤} β - \frac{μ_{i} - y_{i}}{α} - y_{i} log (α) + b_{i 2} log (y_{i}^{- 1} μ_{i} + α - 1)\} + c_{4}, \end{matrix}

(8)

where

b_{i 1} ≜ I (y_{i} ⩾ 1)

,

b_{i 2} ≜ (y_{i} - 1) I (y_{i} ⩾ 2)

,

μ_{i} = exp (w_{i}^{⊤} β)

, and

c_{4}

is a constant free from

{β, α}

. The goal is to calculate the MLEs of

{β, α}

.

2.2.1. MLE of $β$ Given ${β^{(t)}, α}$

Since

\partial μ_{i} / \partial β = μ_{i} w_{i}

, we have

\begin{matrix} \frac{\partial log (y_{i}^{- 1} μ_{i} + α - 1)}{\partial β} & = & \frac{y_{i}^{- 1} μ_{i}}{y_{i}^{- 1} μ_{i} + α - 1} w_{i} and \\ \frac{\partial^{2} log (y_{i}^{- 1} μ_{i} + α - 1)}{\partial β \partial β^{⊤}} & = & \frac{(α - 1) y_{i}^{- 1} μ_{i}}{{(y_{i}^{- 1} μ_{i} + α - 1)}^{2}} w_{i} w_{i}^{⊤} . \end{matrix}

(9)

According to (8), we know that

y_{i}^{- 1} μ_{i} + α - 1 > 0

, thus

0 < 1 - α < y_{i}^{- 1} μ_{i}

. Given

β^{(t)}

and

α

, to calculate the

(t + 1)

-th approximation of

\hat{β}

, we first restrict

β

in the following convex set

\begin{matrix} C^{(t)} = \{β : y_{i}^{- 1} μ_{i} ⩾ T_{i}^{(t)} (α) ≜ \frac{1}{2} [1 - α + y_{i}^{- 1} μ_{i}^{(t)}], \forall i \in I_{2}\}, \end{matrix}

(10)

where

T_{i}^{(t)} (α)

is the midpoint of the two endpoints of the open interval

(1 - α, y_{i}^{- 1} μ_{i}^{(t)})

and

μ_{i}^{(t)} ≜ exp (w_{i}^{⊤} β^{(t)})

. Then, for any

i \in I_{2}

, since

α - 1 < 0

, we have

\begin{matrix} \frac{(α - 1) y_{i}^{- 1}}{{(y_{i}^{- 1} μ_{i} + α - 1)}^{2}} \overset{(10)}{⩾} \frac{(α - 1) y_{i}^{- 1}}{{[T_{i}^{(t)} (α) + α - 1]}^{2}} ≜ b_{i 3}^{(t)} (α) . \end{matrix}

(11)

On the other hand, we define

\begin{matrix} h_{i}^{(t)} (β | α) = log [μ_{i} + (α - 1) y_{i}] - b_{i 3}^{(t)} (α) μ_{i}, \forall i \in I_{2} . \end{matrix}

(12)

By combining (9) with (11), we have

\frac{\partial^{2} h_{i}^{(t)} (β | α)}{\partial β \partial β^{⊤}} ⩾ 0;

i.e.,

\partial^{2} h_{i}^{(t)} (β | α) / \partial β \partial β^{⊤}

is a positive semi-definite matrix. By applying the second-order Taylor expansion of

h_{i}^{(t)} (β | α)

around

β^{(t)}

, we have

\begin{matrix} h_{i}^{(t)} (β | α) ⩾ h_{i}^{(t)} (β^{(t)} | α) + b_{i 4}^{(t)} (α) \times {(β - β^{(t)})}^{⊤} w_{i}, \end{matrix}

(13)

where the equality holds iff

β = β^{(t)}

, and

b_{i 4}^{(t)} (α) ≜ \{{[μ_{i}^{(t)} + (α - 1) y_{i}]}^{- 1} - b_{i 3}^{(t)} (α)\} μ_{i}^{(t)}

. Let

ℓ_{1} (β | α)

denote the conditional log-likelihood function of

β

given

α

, we have

\begin{aligned} ℓ_{1} (β | α) & \overset{(8)}{=} ℓ (β, α) \\ & \overset{(12) \& (13)}{⩾} \sum_{i = 1}^{n} \{[b_{i 1} + b_{i 2} b_{i 4}^{(t)} (α)] w_{i}^{⊤} β - [α^{- 1} - b_{i 2} b_{i 3}^{(t)} (α)] \exp (w_{i}^{⊤} β)\} + c_{5}^{(t)} \\ & ≜ Q_{1} (β | β^{(t)}, α), \end{aligned}

which minorizes

ℓ_{1} (β | α)

at

β = β^{(t)}

, where

c_{5}^{(t)}

is a constant free from

β

.

Note that the

Q_{1} (β | β^{(t)}, α)

is a weighted log-likelihood function of

β

for the Poisson regression model with weight vector

{(α^{- 1} - b_{1 2} b_{1 3}^{(t)} (α), \dots, α^{- 1} - b_{n 2} b_{n 3}^{(t)} (α))}^{⊤}

and observations

Y_{obs}^{*} = {\{y_{i}^{*}\}}_{i = 1}^{n}

with

y_{i}^{*} = \frac{b_{i 1} + b_{i 2} b_{i 4}^{(t)} (α)}{α^{- 1} - b_{i 2} b_{i 3}^{(t)} (α)}, i = 1, \dots, n .
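The weights and pseudo-responses of the surrogate Poisson regression can be computed as in the following Python sketch. The function name `working_poisson_data` and its return layout are hypothetical; the quantities b_{i1}, b_{i2}, b_{i3}, b_{i4} and T_i follow (10) to (13), and b_{i3}, b_{i4} are set to zero when y_i < 2 because they are multiplied by b_{i2} = 0 in that case.

```python
import math

def working_poisson_data(y, w, beta_t, alpha):
    """Return (weights, pseudo_responses, mu_t) defining the surrogate Q_1."""
    weights, ystar, mu_t = [], [], []
    for yi, wi in zip(y, w):
        mu = math.exp(sum(a * b for a, b in zip(wi, beta_t)))   # mu_i^(t)
        b1 = 1.0 if yi >= 1 else 0.0
        b2 = float(yi - 1) if yi >= 2 else 0.0
        b3 = b4 = 0.0
        if yi >= 2:
            T = 0.5 * (1.0 - alpha + mu / yi)                   # T_i^(t)(alpha) in (10)
            b3 = (alpha - 1.0) / yi / (T + alpha - 1.0) ** 2    # bound in (11)
            b4 = (1.0 / (mu + (alpha - 1.0) * yi) - b3) * mu    # slope in (13)
        omega = 1.0 / alpha - b2 * b3                           # Poisson weight
        weights.append(omega)
        ystar.append((b1 + b2 * b4) / omega)                    # pseudo-response
        mu_t.append(mu)
    return weights, ystar, mu_t
```

A useful consistency check is that the weighted Poisson score at the current iterate, ω_i (y_i^* − μ_i^{(t)}), equals the per-observation score of (8), which is what makes Q_1 tangent to the log-likelihood there.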

We can calculate the MLEs of

β

, denoted by

β_{*}^{(t + 1)}

, of the weighted Poisson regression model directly through the built-in ‘glm’ function in the VGAM R package. Since the next iterate must lie in the convex set

C^{(t)}

, we project

β_{*}^{(t + 1)}

on the convex set

C^{(t)}

, and calculate the

(t + 1)

-th approximation of

\hat{β}

as

\begin{matrix} β^{(t + 1)} & = & β^{(t)} + s^{(t)} (β_{*}^{(t + 1)} - β^{(t)}), \end{matrix}

(14)

where

\begin{aligned} s^{(t)} & ≜ \min \left(\min_{i \in I_{2}} s_{i}^{(t)}, 1\right) \quad \text{and} \\ s_{i}^{(t)} & ≜ \frac{\log [T_{i}^{(t)} (α) y_{i}] - w_{i}^{⊤} β^{(t)}}{w_{i}^{⊤} (β_{*}^{(t + 1)} - β^{(t)})} I (w_{i}^{⊤} (β_{*}^{(t + 1)} - β^{(t)}) < 0) + I (w_{i}^{⊤} (β_{*}^{(t + 1)} - β^{(t)}) ⩾ 0) . \end{aligned}
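The step size in (14) can be sketched as follows. The helper names `dot` and `damped_update` are hypothetical, and `beta_star` stands for the unconstrained weighted Poisson solution, which this sketch takes as given rather than computing it.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def damped_update(beta_t, beta_star, y, w, alpha):
    """Shrink the move from beta_t toward beta_star so the result stays in C^(t)."""
    direction = [bs - bt for bs, bt in zip(beta_star, beta_t)]
    s = 1.0
    for yi, wi in zip(y, w):
        if yi < 2:
            continue                       # constraints only involve i in I_2
        eta_t = dot(wi, beta_t)
        slope = dot(wi, direction)
        if slope < 0:                      # mu_i decreases along the direction
            T = 0.5 * (1.0 - alpha + math.exp(eta_t) / yi)   # T_i^(t)(alpha)
            s = min(s, (math.log(T * yi) - eta_t) / slope)   # s_i^(t)
    beta_new = [bt + s * d for bt, d in zip(beta_t, direction)]
    return beta_new, s
```

Observations whose linear predictor does not decrease along the direction never bind, so they contribute the value 1 to the minimum, exactly as the indicator terms in the definition of s_i^{(t)} specify.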

2.2.2. MLE of $α$ Given ${β, α^{(t)}}$

Define

T_{min} (β) ≜ {min}_{i \in I_{2}} (y_{i}^{- 1} μ_{i})

. Given

β

, we have

\begin{aligned} \log (y_{i}^{- 1} μ_{i} + α - 1) & = \log \{[y_{i}^{- 1} μ_{i} - T_{min} (β)] + [T_{min} (β) + α - 1]\} \\ & \overset{(A2)}{⩾} u_{i} (β, α^{(t)}) \log [T_{min} (β) + α - 1] + c_{6}^{(t)}, \quad \forall i \in I_{2}, \end{aligned}

(15)

where

c_{6}^{(t)}

is a constant free from

α

and

u_{i} (β, α) ≜ \frac{T_{min} (β) + α - 1}{y_{i}^{- 1} μ_{i} + α - 1} .

Let

ℓ_{2} (α | β)

denote the conditional log-likelihood function of

α

given

β

, we have

\begin{aligned} ℓ_{2} (α | β) & \overset{(8)}{=} ℓ (β, α) \\ & \overset{(15)}{⩾} \sum_{i = 1}^{n} \left\{\frac{y_{i} - μ_{i}}{α} - y_{i} \log (α) + b_{i 2} \cdot u_{i} (β, α^{(t)}) \log [T_{min} (β) + α - 1]\right\} + c_{7}^{(t)} \\ & ≜ Q_{2} (α | β, α^{(t)}), \end{aligned}

which minorizes

ℓ_{2} (α | β)

at

α = α^{(t)}

, where

c_{7}^{(t)}

is a constant free from

α

. By setting

\partial Q_{2} (α | β, α^{(t)}) / \partial α = 0

, we have the following MM iterates:

\begin{matrix} α^{(t + 1)} = min (α_{*}^{(t + 1)}, 1), \end{matrix}

(16)

where

\begin{aligned} α_{*}^{(t + 1)} & ≜ - \frac{d_{2} (β, α^{(t)}) + \sqrt{d_{2}^{2} (β, α^{(t)}) - 4 d_{1} (β, α^{(t)}) d_{3} (β)}}{2 d_{1} (β, α^{(t)})}, \\ d_{1} (β, α) & ≜ \sum_{i = 1}^{n} b_{i 2} u_{i} (β, α) - n \bar{y}, \\ d_{2} (β, α) & ≜ n [\bar{μ} - \bar{y} T_{min} (β)], \\ d_{3} (β) & ≜ n [T_{min} (β) - 1] (\bar{μ} - \bar{y}), \\ \bar{μ} & = \frac{1}{n} \sum_{i = 1}^{n} μ_{i} . \end{aligned}
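Setting the derivative of Q_2 to zero and clearing denominators gives the quadratic d_1 α² + d_2 α + d_3 = 0, of which α_*^{(t+1)} is the larger root (d_1 is negative). The Python sketch below implements one such update for given fitted means; the function names are hypothetical and the means μ_i are passed in directly instead of being computed from β.

```python
import math

def cond_loglik_alpha(y, mu, alpha):
    """alpha-dependent part of (8) for fixed means mu_i."""
    total = 0.0
    for yi, m in zip(y, mu):
        total += (yi - m) / alpha - yi * math.log(alpha)
        if yi >= 2:
            total += (yi - 1) * math.log(m / yi + alpha - 1.0)
    return total

def alpha_update(y, mu, alpha_t):
    """One MM update (16); returns (alpha_next, (d1, d2, d3, alpha_star))."""
    n = len(y)
    ybar = sum(y) / n
    mubar = sum(mu) / n
    t_min = min(m / yi for yi, m in zip(y, mu) if yi >= 2)
    # sum over i of b_i2 * u_i(beta, alpha_t)
    s_u = sum((yi - 1) * (t_min + alpha_t - 1.0) / (m / yi + alpha_t - 1.0)
              for yi, m in zip(y, mu) if yi >= 2)
    d1 = s_u - n * ybar
    d2 = n * (mubar - ybar * t_min)
    d3 = n * (t_min - 1.0) * (mubar - ybar)
    alpha_star = -(d2 + math.sqrt(d2 * d2 - 4.0 * d1 * d3)) / (2.0 * d1)
    return min(alpha_star, 1.0), (d1, d2, d3, alpha_star)
```

The cap at 1 reflects that this algorithm targets the under-dispersed case; a returned value of exactly 1 signals that the data may not be under-dispersed.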

3. Hypothesis Testing

For the

{GP}^{(I)}

mean regression model (3), suppose that we are interested in testing the following general null hypothesis:

H_{0} : C θ = c_{r} against H_{1} : C θ \neq c_{r},

(17)

where

C

is a known

r \times (q + 1)

matrix with

rank (C) = r_{0} < (q + 1)

,

θ = {(β^{⊤}, α)}^{⊤}

is the vector of parameters and

c_{r}

is a known

r \times 1

vector.

3.1. The Likelihood Ratio Test

Let

ℓ (θ) ≜ ℓ (β, α)

be given by (8). The likelihood ratio statistic is given by

T_{L} = 2 [ℓ (\hat{θ}) - ℓ ({\hat{θ}}_{H_{0}})],

(18)

where

\hat{θ}

is the unconstrained MLE of

θ

, which can be calculated by the MM algorithm (14) and (16); while

{\hat{θ}}_{H_{0}}

is the constrained MLE of

θ

under

H_{0}

.

T_{L}

asymptotically follows a chi-squared distribution with

r_{0}

degrees of freedom. The corresponding p-value is

p_{_{L}} = Pr (T_{L} > t_{L} | H_{0}) = Pr (χ^{2} (r_{0}) > t_{L}),

where

t_{L}

is the estimated likelihood ratio statistic.

3.2. The Wald Test

The Wald statistic is given by

T_{W} = {(C \hat{θ} - c_{r})}^{⊤} {[C I^{- 1} (\hat{θ}) C^{⊤}]}^{- 1} (C \hat{θ} - c_{r}),

(19)

where

\hat{θ}

denotes the unconstrained MLEs of

θ

and

I (\hat{θ})

is the Fisher information matrix (see Appendix B) evaluated at

θ = \hat{θ}

.

T_{W}

is asymptotically distributed as a chi-squared distribution with

r_{0}

degrees of freedom. The corresponding p-value is

p_{_{W}} = Pr (T_{W} > t_{W} | H_{0}) = Pr (χ^{2} (r_{0}) > t_{W}),

where

t_{W}

is the estimated Wald statistic.

3.3. The Score Test

The score statistic is given by

T_{S} = {[s ({\hat{θ}}_{H_{0}})]}^{⊤} I^{- 1} ({\hat{θ}}_{H_{0}}) s ({\hat{θ}}_{H_{0}}),

(20)

where

{\hat{θ}}_{H_{0}}

denotes the constrained MLEs of

θ

under

H_{0}

, and

s (θ) ≜ \frac{\partial ℓ (θ)}{\partial θ} = {(\frac{\partial ℓ (θ)}{\partial β_{0}}, \frac{\partial ℓ (θ)}{\partial β_{1}}, \dots, \frac{\partial ℓ (θ)}{\partial β_{q - 1}}, \frac{\partial ℓ (θ)}{\partial α})}^{⊤},

with details being presented in Appendix B.

T_{S}

is asymptotically distributed as a chi-squared distribution with

r_{0}

degrees of freedom. The corresponding p-value is

p_{_{S}} = Pr (T_{S} > t_{S} | H_{0}) = Pr (χ^{2} (r_{0}) > t_{S}),

where

t_{S}

is the estimated score statistic.
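All three p-values are upper-tail probabilities of a chi-squared distribution. The following Python sketch computes that tail for integer degrees of freedom using only the standard library, through the recurrence Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Γ(a + 1) for the regularized upper incomplete gamma function. The function names are hypothetical, and `wald_single` covers only the special case of a single-row C that selects one parameter, where C I^{-1} C^⊤ reduces to one diagonal element of the inverse information matrix.

```python
import math

def chi2_sf(t, df):
    """Pr(chi-squared(df) > t) for an integer df >= 1."""
    if t <= 0:
        return 1.0
    x = 0.5 * t
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(x))   # Q(1/2, x)
    else:
        a, q = 1.0, math.exp(-x)              # Q(1, x)
    while a < 0.5 * df:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q

def lrt_pvalue(loglik_full, loglik_null, df):
    """Likelihood ratio statistic (18) and its asymptotic p-value."""
    t = 2.0 * (loglik_full - loglik_null)
    return t, chi2_sf(t, df)

def wald_single(estimate, null_value, inv_info_jj):
    """Wald statistic (19) when C picks out a single parameter."""
    t = (estimate - null_value) ** 2 / inv_info_jj
    return t, chi2_sf(t, 1)
```

For example, `lrt_pvalue(-100.0, -102.0, 1)` gives a statistic of 4 and a p-value slightly below 0.05.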

4. Simulations

4.1. Accuracy of MLEs of Parameters

To investigate the accuracy of MLEs of parameters, we consider dimensions:

q = 2, 4

. The sample sizes are set to be

n = 100, 200, 400

;

α = 0.6, 0.8, 0.95

and other parameters are set as follows:

(A1): When $q = 2$ , $β = {(1, - 1)}^{⊤}$ ; $w_{i} = {(1, w_{i 1})}^{⊤}$ , ${w_{i 1}}_{i = 1}^{n} \overset{iid}{\sim} N (0.3, σ_{0}^{2})$ with $σ_{0}^{2} = 0.5$ ;
(B1): When $q = 4$ , $β = {(1, - 1, 2, - 2)}^{⊤}$ ; $w_{i} = {(1, w_{i 1}, w_{i 2}, w_{i 3})}^{⊤}$ , ${w_{i 1}}_{i = 1}^{n} \overset{iid}{\sim} N (0.3, 0.5)$ , ${w_{i 2}}_{i = 1}^{n} \overset{iid}{\sim} U (0, 1)$ , ${w_{i 3}}_{i = 1}^{n} \overset{iid}{\sim} Bernoulli (0.5)$ .

For a given

{q, n, β, α}

, we first generate

{w_{i}}_{i = 1}^{n}

, and then generate

{\{Y_{i} = y_{i}\}}_{i = 1}^{n} \overset{ind}{\sim} {GP}^{(I)} (\exp (w_{i}^{⊤} β), α)

by the inversion method [15] based on the pmf given by (2). Then, we can calculate the MLEs

{\hat{β}, \hat{α}}

via the MM algorithm (14) and (16) with the generated

{y_{i}}_{i = 1}^{n}

and corresponding covariate vectors

{w_{i}}_{i = 1}^{n}

. Finally, we independently repeat this process 10,000 times.
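The data-generating step can be sketched with a simple inversion sampler: draw a uniform number and walk up the cumulative pmf (2) until it is exceeded. The code below is an illustrative Python version with hypothetical function names; assigning any leftover mass from the truncation at m to the last support point is a simplification made here, not something stated in the paper.

```python
import math
import random

def gp1_pmf(y, mu, alpha):
    """pmf (2) of GP^(I)(mu, alpha) at an integer y >= 0."""
    base = mu + (alpha - 1.0) * y
    if y < 0 or base <= 0:
        return 0.0
    return math.exp(math.log(mu) + (y - 1) * math.log(base) - base / alpha
                    - y * math.log(alpha) - math.lgamma(y + 1))

def rgp1(mu, alpha, rng, cap=100_000):
    """One draw from GP^(I)(mu, alpha) by inversion of the cumulative pmf."""
    u = rng.random()
    cum = 0.0
    for y in range(cap + 1):
        p = gp1_pmf(y, mu, alpha)
        if p == 0.0:                 # past the support (or numerical underflow)
            return max(y - 1, 0)
        cum += p
        if cum >= u:
            return y
    return cap

def simulate_regression(w, beta, alpha, rng):
    """Draw y_i from GP^(I)(exp(w_i' beta), alpha) for each covariate vector."""
    draws = []
    for wi in w:
        mu_i = math.exp(sum(a * b for a, b in zip(wi, beta)))
        draws.append(rgp1(mu_i, alpha, rng))
    return draws
```

Passing an explicit `random.Random` instance keeps each replication reproducible, which matters when the whole experiment is repeated thousands of times.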

The resultant average bias (denoted by Bias; i.e., the average MLE minus the true value of the parameter) and the mean square error (denoted by MSE; i.e., Bias² + (standard deviation)², where the standard deviation is estimated by the sample standard deviation of the 10,000 MLEs) are reported in Table 1 and Table 2.

Table 1. Parameter estimates based on 10,000 replications for Case (A1).

Table 2. Parameter estimates based on 10,000 replications for Case (B1).

Table 1 and Table 2 show that the absolute values of Bias and the MSE tend to zero as the sample size grows for each parameter in Cases (A1) and (B1). With the other parameters held fixed, the absolute values of Bias and the MSE are small for a small

α

.

4.2. Hypothesis Testing

In this subsection, we explore the performances of the likelihood ratio, Wald and score statistics presented in (18)–(20) for the hypothesis testing in (17) with various parameter configurations. The sample sizes are set to be

n = 50 (50) 400

, where

n_{1} (s) n_{2}

means from

n_{1}

to

n_{2}

with step size s, and other parameters are set as follows:

(A2): When $q = 2$ , $β = {(β_{0}, β_{1})}^{⊤}$ , $α$ is set to be $0.75, 0.85, 0.95$ , $C = (0 1 0)$ , $c_{r} = 0$ and $θ = {(β^{⊤}, α)}^{⊤}$ , so that (17) becomes $H_{0} : β_{1} = 0$ . The true value of $β$ in $H_{0}$ is $β = {(1, 0)}^{⊤}$ , while the value of $β$ in $H_{1}$ is $β = {(1, 0.5)}^{⊤}$ . We generate ${w_{i 1}}_{i = 1}^{n} \overset{iid}{\sim} N (0.1, 0.2)$ and set $w_{i} = {(1, w_{i 1})}^{⊤}$ ;
(B2): When $q = 4$ , $β = {(β_{0}, β_{1}, β_{2}, β_{3})}^{⊤}$ , $α$ is set to be $0.75, 0.85, 0.95$ ,

$C = \begin{pmatrix} 0 & 1 & - 1 & 0 & 0 \\ 0 & 0 & 1 & - 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}, \quad c_{r} = \mathbf{0}_{3} \quad \text{and} \quad θ = {(β^{⊤}, α)}^{⊤},$

so that (17) becomes $H_{0} : β_{1} = β_{2} = β_{3} = 0$ . The true value of $β$ in $H_{0}$ is $β = {(1, 0, 0, 0)}^{⊤}$ and the value of $β$ in $H_{1}$ is $β = {(1, - 1, 1, - 0.5)}^{⊤}$ . We generate ${w_{i 1}}_{i = 1}^{n} \overset{iid}{\sim} N (0.1, 0.05)$ , ${w_{i 2}}_{i = 1}^{n} \overset{iid}{\sim} U (0, 0.1)$ , ${w_{i 3}}_{i = 1}^{n} \overset{iid}{\sim} 0.4 \times Bernoulli (0.5)$ , and set $w_{i} = {(1, w_{i 1}, w_{i 2}, w_{i 3})}^{⊤}$ ;
(A3): When $q = 2$ , $β = {(1, 1)}^{⊤}$ , $C = (0 0 1)$ , $c_{r} = 1$ and $θ = {(β^{⊤}, α)}^{⊤}$ , so that (17) becomes $H_{0} : α = 1$ . The alternative values of $α$ in $H_{1}$ are set as 0.9 and 0.95. We generate ${w_{i 1}}_{i = 1}^{n} \overset{iid}{\sim} N (1, 0.1)$ and set $w_{i} = {(1, w_{i 1})}^{⊤}$ ;
(B3): When $q = 4$ , $β = {(1, - 1, 1, - 0.5)}^{⊤}$ , $C = (0 0 0 0 1)$ , $c_{r} = 1$ and $θ = {(β^{⊤}, α)}^{⊤}$ , so that (17) becomes $H_{0} : α = 1$ . The alternative values of $α$ in $H_{1}$ are set as 0.9 and 0.95. We generate ${w_{i 1}}_{i = 1}^{n} \overset{iid}{\sim} N (0.1, 0.05)$ , ${w_{i 2}}_{i = 1}^{n} \overset{iid}{\sim} U (0, 0.1)$ , ${w_{i 3}}_{i = 1}^{n} \overset{iid}{\sim} 0.4 \times Bernoulli (0.5)$ , and set $w_{i} = {(1, w_{i 1}, w_{i 2}, w_{i 3})}^{⊤}$ .

All hypothesis tests are conducted at a significance level of 0.05. To calculate the empirical levels of the three tests, we first generate

{\{Y_{i} = y_{i}\}}_{i = 1}^{n} \overset{ind}{\sim} {GP}^{(I)} (\exp (w_{i}^{⊤} β), α)

under

H_{0}

. Repeating this process L (= 10,000) times, we obtain

{\{Y_{obs}^{(l)}\}}_{l = 1}^{L}, \quad Y_{obs}^{(l)} = \{y_{1}^{(l)}, \dots, y_{n}^{(l)}\}

. Since our MM algorithm (14) & (16) is designed for

α < 1

, we apply a two-stage method to obtain the MLEs of

θ

for the

{GP}^{(I)}

regression model. In the first stage, we calculate the MLEs

{\hat{β}, \hat{α}}

via the MM algorithm (14) & (16) with the generated

{y_{i}}_{i = 1}^{n}

and corresponding covariate vectors

{w_{i}}_{i = 1}^{n}

. If the estimated

\hat{α} < 1

, implying that the dataset is under-dispersed, we keep the estimates and skip the second stage. If the estimated

\hat{α} = 1

, implying that the dataset may be equi- or over-dispersed, we proceed to the second stage, which recalculates the MLEs

{\hat{β}, \hat{α}}

through the ‘vglm’ function with the family ‘genpoisson1’ in the VGAM R package, because this function can only calculate the MLEs of the parameters when

α ⩾ 1

. Let

{r_{j}}_{j = 1}^{3}

denote the numbers of rejections of the null hypothesis

H_{0}

by the likelihood ratio, Wald and score statistics, respectively. Hence, the actual significance level can be estimated by

r_{j} / L

under

H_{0}

. Similarly, we generate

{\{Y_{i} = y_{i}\}}_{i = 1}^{n} \overset{ind}{\sim} {GP}^{(I)} (\exp (w_{i}^{⊤} β), α)

under

H_{1}

. Repeating this process L (= 10,000) times, we obtain

{\{Y_{obs}^{(l)}\}}_{l = 1}^{L}, \quad Y_{obs}^{(l)} = \{y_{1}^{(l)}, \dots, y_{n}^{(l)}\}

. The empirical power can be estimated similarly to the empirical level. All results are reported in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8.

Table 3. The empirical levels of statistics (

T_{L}

,

T_{W}

,

T_{S}

) for Case (A2).

Table 4. The empirical powers of statistics (

T_{L}

,

T_{W}

,

T_{S}

) for Case (A2).

Table 5. The empirical levels of statistics (

T_{L}

,

T_{W}

,

T_{S}

) for Case (B2).

Table 6. The empirical powers of statistics (

T_{L}

,

T_{W}

,

T_{S}

) for Case (B2).

Table 7. The empirical levels/powers of statistics (

T_{L}

,

T_{W}

,

T_{S}

) for Case (A3).

Table 8. The empirical levels/powers of statistics (

T_{L}

,

T_{W}

,

T_{S}

) for Case (B3).

Table 3 shows that the empirical significance levels of the three statistics are around 0.05 for different sample sizes. Table 4 shows that the Wald statistic outperforms the likelihood ratio statistic, and the likelihood ratio statistic outperforms the score statistic. At the same time, the differences in the empirical powers among the three tests are very small. So, we can use the likelihood ratio, Wald, and score statistics for the regression hypothesis testing for various values of

α

when

q = 2

. The differences in performance among the three statistics are presented in Figure 1 and Figure 2.

Figure 1. The empirical levels of three test statistics (

T_{L}

,

T_{W}

,

T_{S}

) for testing

H_{0} : β_{1} = 0

in Case (A2) for different

α

. (a) The empirical level with

H_{0} : β = {(1, 0)}^{⊤}

for

α = 0.75

; (b) The empirical level with

H_{0} : β = {(1, 0)}^{⊤}

for

α = 0.85

; (c) The empirical level with

H_{0} : β = {(1, 0)}^{⊤}

for

α = 0.95

.

Figure 2. The empirical powers of three test statistics (

T_{L}

,

T_{W}

,

T_{S}

) for testing

H_{0} : β_{1} = 0

in Case (A2) for different

α

. (a) The empirical power with

H_{1} : β = {(1, 0.5)}^{⊤}

for

α = 0.75

; (b) The empirical power with

H_{1} : β = {(1, 0.5)}^{⊤}

for

α = 0.85

; (c) The empirical power with

H_{1} : β = {(1, 0.5)}^{⊤}

for

α = 0.95

.

Table 5 shows that the empirical significance levels of the likelihood ratio and score statistics are around 0.05 for different sample sizes, while the empirical significance levels of the Wald statistic are around 0.07 for different

α

when

n = 50

and quickly decrease to 0.05 with the growth of sample size. Table 6 shows that the Wald statistic outperforms the likelihood ratio statistic, and the likelihood ratio statistic outperforms the score statistic. Unlike Case (A2), where the differences in empirical power among the three tests are small, Table 6 shows that the differences are more considerable in Case (B2). So, we can use the likelihood ratio and score statistics for the regression hypothesis testing for various values of

α

and different sample sizes when

q = 4

. Furthermore, we can use the Wald statistic when the sample size is more than 100. The differences in performance among the three statistics are presented in Figure 3 and Figure 4.

Figure 3. The empirical levels of three test statistics (

T_{L}

,

T_{W}

,

T_{S}

) for testing

H_{0} : β_{1} = β_{2} = β_{3} = 0

in Case (B2) for different

α

. (a) The empirical level with

H_{0} : β = {(1, 0, 0, 0)}^{⊤}

for

α = 0.75

; (b) The empirical level with

H_{0} : β = {(1, 0, 0, 0)}^{⊤}

for

α = 0.85

; (c) The empirical level with

H_{0} : β = {(1, 0, 0, 0)}^{⊤}

for

α = 0.95

.

Figure 4. The empirical powers of three test statistics (

T_{L}

,

T_{W}

,

T_{S}

) for testing

H_{0} : β_{1} = β_{2} = β_{3} = 0

in Case (B2) for different

α

. (a) The empirical power with

H_{1} : β = {(1, - 1, 1, - 0.5)}^{⊤}

for

α = 0.75

; (b) The empirical power with

H_{1} : β = {(1, - 1, 1, - 0.5)}^{⊤}

for

α = 0.85

; (c) The empirical power with

H_{1} : β = {(1, - 1, 1, - 0.5)}^{⊤}

for

α = 0.95

.

According to Table 7 and Table 8, we can see that the Wald statistic outperforms the other two statistics in Cases (A3)–(B3), and the likelihood ratio statistic outperforms the score statistic. Figure 5 and Figure 6 show a significant difference in empirical power among the three statistics. Furthermore, we can see that the empirical significance levels for the likelihood ratio and score statistics are satisfactorily controlled. In contrast, the empirical significance level for the Wald statistic is over 0.08 when

n = 50

and gradually decreases to 0.05 with the growth of the sample size. So, we suggest using the likelihood ratio statistic for the dispersion hypothesis testing when the sample size is less than 200; and using the Wald statistic when the sample size is more than 200.

Figure 5. The empirical powers/level of three test statistics (

T_{L}

,

T_{W}

,

T_{S}

) for testing

H_{0} : α = 1

in Case (A3) for different

α

. (a) The empirical power with

H_{1} : β = {(1, 1)}^{⊤}

for

α = 0.9

; (b) The empirical power with

H_{1} : β = {(1, 1)}^{⊤}

for

α = 0.95

; (c) The empirical significance level with

H_{1} : β = {(1, 1)}^{⊤}

for

α = 1

.

Figure 6. The empirical powers/level of three test statistics (

T_{L}

,

T_{W}

,

T_{S}

) for testing

H_{0} : α = 1

in Case (B3) for different

α

. (a) The empirical power with

H_{1} : β = {(1, - 1, 1, - 0.5)}^{⊤}

for

α = 0.9

; (b) The empirical power with

H_{1} : β = {(1, - 1, 1, - 0.5)}^{⊤}

for

α = 0.95

; (c) The empirical significance level with

H_{1} : β = {(1, - 1, 1, - 0.5)}^{⊤}

for

α = 1

.

4.3. Comparisons of the ${GP}^{(I)}$ Regression Model with the Conway–Maxwell–Poisson Regression Model

To compare the goodness of fit and the computational cost of the

{GP}^{(I)}

regression model (3) and the Conway–Maxwell–Poisson (CMP) regression model, we consider using both models to fit a dataset, which is generated from one of the two models. A discrete r.v. Y is said to follow the CMP distribution with parameters

λ > 0

and

ν > 0

, denoted by

Y \sim CMP (λ, ν)

, if its pmf is [16]:

\begin{matrix} Pr (Y = y) & = & \frac{λ^{y}}{{(y!)}^{ν} Z (λ, ν)}, y = 0, 1, 2, \dots, \infty, \end{matrix}

where

Z (λ, ν) = \sum_{s = 0}^{\infty} λ^{s} / {(s!)}^{ν}

is a normalizing constant and

ν

is the dispersion parameter. The

CMP (λ, ν)

distribution reduces to the

Poisson (λ)

when

ν = 1

, and it has the twin properties of over-dispersion when

ν < 1

and under-dispersion when

ν > 1

. The CMP regression model [3,17] is:

\begin{matrix} {Y_{i}}_{i = 1}^{n} \overset{ind}{\sim} CMP (λ_{i}, ν) and log (λ_{i}) = w_{i}^{⊤} β, i = 1, \dots, n . \end{matrix}
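For illustration, the CMP pmf can be evaluated by truncating the series Z(λ, ν). The Python sketch below uses hypothetical function names and an assumed truncation at 500 terms, which is adequate for moderate λ and ν near or above 1 but not for ν close to 0, where the series converges slowly.

```python
import math

def cmp_log_z(lam, nu, terms=500):
    """log of the normalizing constant Z(lam, nu), series truncated at `terms`."""
    logs = [s * math.log(lam) - nu * math.lgamma(s + 1) for s in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(val - top) for val in logs))

def cmp_table(lam, nu, upper=200, terms=500):
    """CMP(lam, nu) probabilities for y = 0, ..., upper."""
    log_z = cmp_log_z(lam, nu, terms)
    return [math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)
            for y in range(upper + 1)]

def table_moments(probs):
    """Mean and variance implied by a probability table on 0, 1, 2, ..."""
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var
```

With ν = 1 the table reproduces Poisson(λ) probabilities, and with ν > 1 the implied variance falls below the mean, in line with the dispersion properties stated above.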

The sample size is set to $n = 1000$, with $q = 4$, $β = (1, -0.5, 1, 0.5)^{⊤}$, $w_{i} = (1, w_{i1}, w_{i2}, w_{i3})^{⊤}$, $\{w_{i1}\}_{i=1}^{n} \overset{iid}{\sim} N(0.1, 0.5)$, $\{w_{i2}\}_{i=1}^{n} \overset{iid}{\sim} U(0, 1)$ and $\{w_{i3}\}_{i=1}^{n} \overset{iid}{\sim} Bernoulli(0.5)$. The other parameter configurations are set as follows:

(A4): For a fixed $β$, set $α = 0.9$ and generate $X_{i} = x_{i} \overset{ind}{\sim} {GP}^{(I)}(μ_{i}, α)$ with $μ_{i} = \exp(w_{i}^{⊤} β)$ for $i = 1, \dots, n$.
(B4): For a fixed $β$, set $ν = 1.2$ and generate $X_{i} = x_{i} \overset{ind}{\sim} CMP(λ_{i}, ν)$ with $λ_{i} = \exp(w_{i}^{⊤} β)$ for $i = 1, \dots, n$.
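As a rough illustration of configuration (B4), the sketch below draws the covariates and then samples each response by inverting the CMP cdf. It is a sketch under stated assumptions rather than the simulation code behind the reported results: the second argument of $N(0.1, 0.5)$ is read as a variance, the helper names are hypothetical, and inverse-cdf sampling is simply one convenient choice.

```python
import math
import random

BETA = (1.0, -0.5, 1.0, 0.5)   # regression coefficients from the text
NU = 1.2                       # CMP dispersion parameter in Case (B4)


def cmp_log_z(lam, nu, tol=1e-12, max_terms=10_000):
    """Truncated-series approximation of log Z(lam, nu)."""
    log_lam = math.log(lam)
    log_terms, largest = [], -math.inf
    for s in range(max_terms):
        log_term = s * log_lam - nu * math.lgamma(s + 1)
        log_terms.append(log_term)
        largest = max(largest, log_term)
        if log_term < largest + math.log(tol):
            break
    return largest + math.log(sum(math.exp(t - largest) for t in log_terms))


def sample_cmp(lam, nu, rng):
    """Draw one CMP(lam, nu) variate by walking the cdf until it passes a uniform draw."""
    log_z = cmp_log_z(lam, nu)
    u = rng.random()
    y, cdf = 0, 0.0
    while True:
        p = math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)
        cdf += p
        # the second condition guards against rounding when u is extremely close to 1
        if u <= cdf or (p < 1e-15 and cdf > 0.5):
            return y
        y += 1


def simulate_b4(n, seed=0):
    """Return a list of (w_i, x_i) pairs generated under configuration (B4)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w1 = rng.gauss(0.1, math.sqrt(0.5))        # assumption: 0.5 is the variance
        w2 = rng.random()                          # U(0, 1)
        w3 = 1.0 if rng.random() < 0.5 else 0.0    # Bernoulli(0.5)
        w = (1.0, w1, w2, w3)
        lam = math.exp(sum(b * x for b, x in zip(BETA, w)))
        rows.append((w, sample_cmp(lam, NU, rng)))
    return rows
```

Calling `simulate_b4(1000)` yields one dataset of the size used in the text; fixing `seed` makes the draw reproducible.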

To assess the performance of the two models, we use the following three criteria: the average Akaike information criterion (AIC), the average Bayesian information criterion (BIC) and the average Pearson chi-squared statistic $χ_{n-q-1}^{2}$ [18]:

$$χ_{n-q-1}^{2} = \sum_{i=1}^{n} \frac{(x_{i} - \hat{μ}_{i})^{2}}{\hat{σ}_{i}^{2}},$$

where $\hat{μ}_{i}$ and $\hat{σ}_{i}^{2}$ are the estimated mean and variance of $X_{i}$, and $n - q - 1 = 995$ is the degrees of freedom of the Pearson chi-squared statistic, because the MLEs $\{\hat{β}, \hat{α}\}$ or $\{\hat{β}, \hat{ν}\}$ are used to calculate $\{\hat{μ}_{i}, \hat{σ}_{i}^{2}\}_{i=1}^{n}$. To compare the two models on the same data, we first generate a dataset from (A4) and fit both the ${GP}^{(I)}$ and CMP regression models to it, repeating this 1000 times. Specifically, the MLEs of the parameters of the ${GP}^{(I)}$ regression model are calculated with the MM algorithm (14) and (16), and the MLEs of the parameters of the CMP regression model are calculated directly through the built-in ‘glm.cmp’ function in the COMPoissonReg R package. Next, we repeat the procedure with datasets generated from (B4). The averaged log-likelihood, AIC, BIC, $χ_{n-q-1}^{2}$ and the time cost of the system when the algorithm converged (denoted by Sys. Time) are reported in Table 9.
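The three criteria are simple functions of the fitted model. As a hedged sketch (the function names are illustrative, and the AIC and BIC are assumed to take their usual forms $2k - 2ℓ$ and $k \log n - 2ℓ$ with $k = q + 1 = 5$ estimated parameters), they could be computed as follows:

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


def pearson_chi2(x, mu_hat, var_hat):
    """Pearson statistic: sum of squared residuals scaled by the fitted variances."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu_hat, var_hat))
```

For example, with the averaged log-likelihood −2009.89 reported for the ${GP}^{(I)}$ model in Case (A4) of Table 9, `aic(-2009.89, 5)` and `bic(-2009.89, 5, 1000)` give approximately 4029.78 and 4054.32, which agree with the tabulated 4029.77 and 4054.31 up to rounding.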

Table 9. Model comparisons based on 1000 replications for Cases (A4) & (B4).

According to Table 9, in Case (A4) the log-likelihood of the ${GP}^{(I)}$ regression model is larger than that of the CMP regression model, and the AIC and BIC of the ${GP}^{(I)}$ regression model are smaller than those of the CMP regression model. However, the values of the log-likelihood, AIC and BIC show an inverse numerical relationship between the ${GP}^{(I)}$ and CMP regression models in Case (B4). So, the ${GP}^{(I)}$/CMP regression model attains the better log-likelihood, AIC and BIC when the data are generated from the ${GP}^{(I)}$/CMP model. For the Pearson chi-squared statistic, the ${GP}^{(I)}$ regression model outperforms the CMP regression model in Case (A4) because its value (997.07) is closer to the degrees of freedom, 995, than that of the CMP regression model (999.75). In Case (B4), the $χ_{n-q-1}^{2}$ of the ${GP}^{(I)}$ regression model exceeds 995 by around 5, and the $χ_{n-q-1}^{2}$ of the CMP regression model falls below 995 by around 5, implying that the two models perform similarly. In terms of computing time, the proposed ${GP}^{(I)}$ regression model converges faster than the CMP regression model in the simulations: its time cost is about 50 to 60% of that of the CMP regression model.

5. Births in Last Five Years for Women in Bangladesh

The dataset is obtained from the Bangladesh Demographic and Health Surveys (DHS) program (https://www.dhsprogram.com/data, accessed on 28 January 2022). It records several variables, e.g., Age, Education (educational level), Religion and Division, for 9067 women aged between 30 and 35. Our goal is to better understand the relationship between Births (births in the last five years) and its relevant explanatory variables. In this section, we construct a ${GP}^{(I)}$ regression model to link the mean of Births with the values of Age, Education, Religion and Division. The mean regression model is as follows:

$$\begin{aligned} \text{Births}_{i} &\overset{ind}{\sim} {GP}^{(I)}(μ_{i}, α), \quad i = 1, \dots, n, \quad \text{and} \\ \log(μ_{i}) &= β_{0} + \text{Age}_{i} \times β_{1} + \text{Primary}_{i} \times β_{2} + \text{Secondary}_{i} \times β_{3} + \text{Higher}_{i} \times β_{4} \\ &\quad + \text{Islam}_{i} \times β_{5} + \text{Hinduism}_{i} \times β_{6} + \text{Chittagong}_{i} \times β_{7} + \text{Dhaka}_{i} \times β_{8} \\ &\quad + \text{Khulna}_{i} \times β_{9} + \text{Mymensingh}_{i} \times β_{10} + \text{Rajshahi}_{i} \times β_{11} \\ &\quad + \text{Rangpur}_{i} \times β_{12} + \text{Sylhet}_{i} \times β_{13}. \end{aligned}$$

(21)

Meanwhile, for comparison, we also use the CMP regression model to fit the Bangladesh DHS data:

$$\begin{aligned} \text{Births}_{i} &\overset{ind}{\sim} CMP(λ_{i}, ν), \quad i = 1, \dots, n, \quad \text{and} \\ \log(λ_{i}) &= β_{0} + \text{Age}_{i} \times β_{1} + \text{Primary}_{i} \times β_{2} + \text{Secondary}_{i} \times β_{3} + \text{Higher}_{i} \times β_{4} \\ &\quad + \text{Islam}_{i} \times β_{5} + \text{Hinduism}_{i} \times β_{6} + \text{Chittagong}_{i} \times β_{7} + \text{Dhaka}_{i} \times β_{8} \\ &\quad + \text{Khulna}_{i} \times β_{9} + \text{Mymensingh}_{i} \times β_{10} + \text{Rajshahi}_{i} \times β_{11} \\ &\quad + \text{Rangpur}_{i} \times β_{12} + \text{Sylhet}_{i} \times β_{13}. \end{aligned}$$

(22)

The MLEs of the parameters of the ${GP}^{(I)}$ regression model in (21) can be calculated through the proposed MM algorithm (14) and (16), and the MLEs of the parameters of the CMP regression model in (22) can be calculated through the built-in ‘glm.cmp’ function in the COMPoissonReg R package. For a fixed $j$ ($j = 1, \dots, 13$), the Std of $\hat{β}_{j}$ used in the Wald statistic for testing $H_{0}: β_{j} = 0$ is $\sqrt{e_{j}^{⊤} I^{-1}(\hat{θ}) e_{j}}$, where $e_{j}$ denotes the 15-dimensional vector with 1 as the $(j + 1)$-th element and 0’s elsewhere, and $I(\hat{θ})$ is the Fisher information matrix in Appendix B. Thus, the z-values (i.e., MLE/Std) and p-values can be calculated from the MLEs and their Stds. The estimation results for the ${GP}^{(I)}$ and CMP regression models are presented in Table 10.
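To make the z-value and p-value computation concrete, the following Python sketch turns an estimate and its Std into a z-value and a two-sided p-value. It assumes the p-values are two-sided and based on the standard normal reference distribution; the function name `wald_z_and_p` and the `null_value` argument are illustrative.

```python
import math


def wald_z_and_p(estimate, std, null_value=0.0):
    """Return the z-value (estimate - null_value) / std and its two-sided normal p-value."""
    z = (estimate - null_value) / std
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal cdf Phi
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# rounded Table 10 entries for Primary in the GP model (MLE -0.083, Std 0.0401)
z_primary, p_primary = wald_z_and_p(-0.083, 0.0401)

# for the dispersion parameter the reference value is alpha = 1 (MLE 0.913, Std 0.0052)
z_alpha, p_alpha = wald_z_and_p(0.913, 0.0052, null_value=1.0)
```

With these rounded inputs the sketch gives a z-value of about −2.07 and a p-value of about 0.0385 for Primary, close to the tabulated −2.06 and 0.0391, and a z-value of about −16.73 for $α$, close to the tabulated −16.71; the small gaps come from rounding of the tabulated MLEs and Stds.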

Table 10. MLEs and CIs of parameters for the ${GP}^{(I)}$ regression model in (21) and CMP regression model.

Table 10 indicates that the Age coefficient is $-0.147$, implying that Age affects the number of births in the past five years negatively; that is, the willingness to give birth decreases as age increases. The coefficients of Education show that women with Higher education levels have more births than those with Primary and Secondary education levels. For the religious factor, we find no significant difference among women, whether their religion is Islam, Hinduism or Christianity. Finally, we can see that the number of births varies widely depending on the Division where the women live. More specifically, women who live in Chittagong, Mymensingh and Sylhet tend to have more births, while those who live in Dhaka, Khulna, Rajshahi and Rangpur tend to have fewer.

Table 10 shows that there are only minor differences in the coefficients between the ${GP}^{(I)}$ and CMP regression models. For Education, the Primary and Secondary coefficients are not significant in the CMP regression model at the 5% significance level, i.e., the null hypotheses $H_{0}: β_{2} = 0$ and $H_{0}: β_{3} = 0$ are not rejected, because their p-values (0.5271 and 0.9685) are both larger than 0.05. However, the two explanatory factors are significant in the ${GP}^{(I)}$ regression model under the same conditions (p-values 0.0391 and 0.0361). It is worth noting that the ${GP}^{(I)}$ regression links the mean with the covariate vector directly, so its coefficients have a direct statistical interpretation. The CMP regression lacks such an interpretation because it only links the parameter $λ$, rather than the mean, to the subject’s characteristics.

Furthermore, to better understand the advantages of the proposed MM algorithm (14) and (16), we apply the existing ‘vglm’ function in the VGAM R package to calculate the MLEs $\{\hat{β}, \hat{α}\}$ of the ${GP}^{(I)}$ regression model in (3). We use two functions, ‘genpoisson0’ and ‘genpoisson’, in ‘vglm’ to calculate the MLEs of the parameters; the ‘genpoisson0’ function restricts $α ⩾ 1$, while the ‘genpoisson’ function allows $α > \max(1/2, 1 - μ/m)$. The goodness-of-fit criteria, i.e., the AIC, BIC and Pearson chi-squared statistic, can be calculated from the obtained MLEs, the number of parameters, the sample size and the log-likelihoods. The results are presented in Table 11.

Table 11. Comparisons of goodness-of-fit among the ${GP}^{(I)}$ regression model, CMP regression model, log-lambda based GP regression model with constraint $λ ⩾ 0$ and the log-lambda based GP regression model without constraint on $λ$.

Table 11 shows that the ${GP}^{(I)}$ regression model estimated by the proposed MM algorithm and the CMP regression model have similar goodness-of-fit statistics, i.e., AIC, BIC and Pearson chi-squared statistic, implying that both models fit the dataset well. However, our MM algorithm converges to $\{\hat{β}, \hat{α}\}$ nearly five times faster than the ‘glm.cmp’ function fits the CMP regression model (11.04 s versus 52.41 s). We can also see that the log-likelihood, AIC, BIC and $χ_{n-p-1}^{2}$ obtained through the genpoisson0 and genpoisson functions are much worse than those of the ${GP}^{(I)}$ and CMP regression models, although these two functions require less computation time.

To test the dispersion, we use the likelihood ratio, Wald and score statistics, which were shown to perform well for large sample sizes in Cases (A3)–(B3) in Section 4.2. The results in Table 12 show that the p-values of the three tests are all below 0.0001, implying that the null hypothesis $H_{0}: α = 1$ should be rejected.
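As a quick numerical check of these p-values, the sketch below evaluates the upper tail of a chi-squared distribution with one degree of freedom at the three statistics in Table 12. The one-degree-of-freedom reference distribution is an assumption made here (a single restriction, $α = 1$); the statistics themselves are defined in Section 3, and the function name is illustrative.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail probability P(X > stat) for X ~ chi-squared(1), i.e. erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


# test statistics as reported in Table 12
table12 = {"Likelihood ratio": 164.61, "Wald": 279.27, "Score": 128.22}
p_values = {name: chi2_1df_pvalue(value) for name, value in table12.items()}
```

Under this assumption all three p-values are far below 0.0001. As a consistency check, the Wald value 279.27 is close to the square of the z-value −16.71 reported for $α$ in Table 10, since $(-16.71)^{2} ≈ 279.22$.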

Table 12. Dispersion test for testing $H_{0}: α = 1$.

6. Discussion

In the present paper, given $\{β^{(t)}, α\}$, to avoid directly calculating $β^{(t+1)}$ by maximizing the original log-likelihood function $ℓ(β, α)$, we constructed a surrogate function $Q_{1}(β | β^{(t)}, α)$, which is equivalent to the log-likelihood function of a weighted Poisson regression, so that $β_{*}^{(t+1)}$ can be computed directly by using the VGAM R package. By projecting $β_{*}^{(t+1)}$ onto the convex set $C^{(t)}$, we calculated $β^{(t+1)}$ as shown in (14). Similarly, given $\{β, α^{(t)}\}$, we obtained an explicit expression for $α^{(t+1)}$ by maximizing a surrogate function $Q_{2}(α | β, α^{(t)})$. The simulation and real data analysis results showed that the proposed MM algorithms could stably obtain the MLEs of the parameters of the ${GP}^{(I)}$ distribution without/with covariates for various parameter configurations, while the built-in ‘genpoisson1’ function in the VGAM R package may converge to a wrong estimate of the parameters. Moreover, the comparison between the proposed model and the existing CMP regression model indicated that the two models have similar goodness-of-fit. However, the proposed model outperforms the CMP regression model in computational efficiency and statistical interpretability.

Author Contributions

Conceptualization, X.-J.L. and G.-L.T.; Methodology, X.-J.L. and G.-L.T.; Formal analysis, X.-J.L.; Investigation, M.Z., G.T.S.H. and S.L.; Resources, M.Z., G.T.S.H. and S.L.; Data curation, M.Z., G.T.S.H. and S.L.; Writing—original draft, X.-J.L.; Writing—review & editing, X.-J.L. and G.-L.T.; Supervision, G.-L.T.; Funding acquisition, G.-L.T. and G.T.S.H. All authors have read and agreed to the published version of the manuscript.

Funding

National Natural Science Foundation of China: 12171225; Research Grants Council of Hong Kong: UGC/FDS14/P05/20; Big Data Intelligence Centre in The Hang Seng University of Hong Kong.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: [https://www.dhsprogram.com/data].

Acknowledgments

Guo-Liang Tian’s research was partially supported by the National Natural Science Foundation of China (No. 12171225). G.T.S. Ho would like to thank the Research Grants Council of Hong Kong for supporting this research under Grant UGC/FDS14/P05/20. This research was also partially supported by the Big Data Intelligence Centre in The Hang Seng University of Hong Kong.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Discrete Version of Jensen’s Inequality

Let $f(z)$ be a concave function defined on a convex set $C$, i.e., $f''(z) ⩽ 0$ for all $z \in C$. The discrete version of Jensen’s inequality is

$$f\left(\sum_{k=1}^{K} q_{k} z_{k}\right) ⩾ \sum_{k=1}^{K} q_{k} f(z_{k}),$$

(A1)

which is true for any probability weights $\{q_{k}\}_{k=1}^{K}$ satisfying $q_{k} > 0$ and $\sum_{k=1}^{K} q_{k} = 1$. In particular, set $K = 2$ and $f(\cdot) = \log(\cdot)$ in (A1), and suppose that $u_{1}(ϕ) > 0$ and $u_{2}(ϕ) > 0$. Taking $q_{1} = v(ϕ^{(t)})$, $q_{2} = 1 - v(ϕ^{(t)})$, $z_{1} = u_{1}(ϕ)/q_{1}$ and $z_{2} = u_{2}(ϕ)/q_{2}$, we obtain

$$\begin{aligned} \log[u_{1}(ϕ) + u_{2}(ϕ)] &⩾ v(ϕ^{(t)}) \log\left[\frac{u_{1}(ϕ)}{v(ϕ^{(t)})}\right] + [1 - v(ϕ^{(t)})] \log\left[\frac{u_{2}(ϕ)}{1 - v(ϕ^{(t)})}\right] \\ &= v(ϕ^{(t)}) \log[u_{1}(ϕ)] + [1 - v(ϕ^{(t)})] \log[u_{2}(ϕ)] + c_{0}^{(t)}, \end{aligned}$$

(A2)

where the equality holds iff $ϕ = ϕ^{(t)}$, $c_{0}^{(t)}$ is a constant free from $ϕ$, and

$$v(ϕ^{(t)}) ≜ \frac{u_{1}(ϕ^{(t)})}{u_{1}(ϕ^{(t)}) + u_{2}(ϕ^{(t)})}.$$
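The minorization in (A2) can be checked numerically. The Python sketch below evaluates both sides of (A2) for a pair of positive functions; `u1` and `u2` are hypothetical choices made only for illustration and are not the functions used in the MM algorithms of Section 2.

```python
import math


def log_sum(a, b):
    """Left-hand side of (A2): log of the sum of two positive values."""
    return math.log(a + b)


def minorizer(a, b, a_t, b_t):
    """Right-hand side of (A2), with the weight v evaluated at the current iterate."""
    v = a_t / (a_t + b_t)
    return v * math.log(a / v) + (1.0 - v) * math.log(b / (1.0 - v))


def u1(phi):
    # hypothetical positive function of phi
    return math.exp(phi)


def u2(phi):
    # hypothetical positive function of phi
    return 1.0 + phi ** 2


phi_t = 0.3  # current iterate
grid = (-2.0, -0.5, 0.3, 1.0, 2.5)
gaps = [log_sum(u1(p), u2(p)) - minorizer(u1(p), u2(p), u1(phi_t), u2(phi_t))
        for p in grid]
```

Every entry of `gaps` is non-negative, and the entry at `phi = phi_t` is zero up to rounding, which is the tangency property that an MM surrogate requires.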

Appendix B. The Gradient Vector and Fisher Information Matrix

The components of the gradient vector of $ℓ(θ) = ℓ(β, α)$ with respect to $β$ and $α$ are given by

$$\begin{aligned} \frac{\partial ℓ(θ)}{\partial β} &= \sum_{i=1}^{n} \left[1 + \frac{μ_{i}(y_{i} - 1)}{μ_{i} + (α - 1) y_{i}} - \frac{μ_{i}}{α}\right] w_{i}, \\ \frac{\partial ℓ(θ)}{\partial α} &= \sum_{i=1}^{n} \left[\frac{(y_{i} - 1) y_{i}}{μ_{i} + (α - 1) y_{i}} - \frac{y_{i}}{α} + \frac{μ_{i} - y_{i}}{α^{2}}\right]. \end{aligned}$$
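These score formulas can be sanity-checked by finite differences. The sketch below assumes a per-observation log-likelihood of the form $\log μ_{i} + (y_{i} - 1) \log[μ_{i} + (α - 1) y_{i}] - y_{i} \log α - [μ_{i} + (α - 1) y_{i}]/α - \log(y_{i}!)$ together with the log link $\log(μ_{i}) = w_{i}^{⊤} β$. This expression is a reconstruction obtained by integrating the two gradient formulas above, not a quotation, and should be checked against the definition of the ${GP}^{(I)}$ pmf in Section 2; all function names are illustrative.

```python
import math


def loglik_term(y, mu, alpha):
    """One observation's log-likelihood (ASSUMED form, reconstructed from the gradient)."""
    t = mu + (alpha - 1.0) * y
    return (math.log(mu) + (y - 1) * math.log(t) - y * math.log(alpha)
            - t / alpha - math.lgamma(y + 1))


def score_terms(y, mu, alpha):
    """Bracketed summands of the two gradient formulas.

    The first value is the factor multiplying w_i in the derivative with respect
    to beta (the derivative with respect to log(mu_i)); the second is the
    derivative with respect to alpha.
    """
    t = mu + (alpha - 1.0) * y
    d_log_mu = 1.0 + mu * (y - 1) / t - mu / alpha
    d_alpha = (y - 1) * y / t - y / alpha + (mu - y) / alpha ** 2
    return d_log_mu, d_alpha


def finite_difference(y, mu, alpha, h=1e-6):
    """Central differences of loglik_term in log(mu) and in alpha."""
    fd_log_mu = (loglik_term(y, mu * math.exp(h), alpha)
                 - loglik_term(y, mu * math.exp(-h), alpha)) / (2 * h)
    fd_alpha = (loglik_term(y, mu, alpha + h)
                - loglik_term(y, mu, alpha - h)) / (2 * h)
    return fd_log_mu, fd_alpha
```

For an under-dispersed configuration such as `y = 3`, `mu = 2.5`, `alpha = 0.9`, the outputs of `score_terms` and `finite_difference` agree closely, which supports the consistency of the two gradient formulas with the assumed log-likelihood.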

The Hessian matrix is

$$H(θ) = \begin{pmatrix} \frac{\partial^{2} ℓ(θ)}{\partial β \partial β^{⊤}} & \frac{\partial^{2} ℓ(θ)}{\partial β \partial α} \\ * & \frac{\partial^{2} ℓ(θ)}{\partial α^{2}} \end{pmatrix},$$

where

$$\begin{aligned} \frac{\partial^{2} ℓ(θ)}{\partial β \partial β^{⊤}} &= \sum_{i=1}^{n} \left\{\frac{(α - 1) μ_{i} (y_{i} - 1) y_{i}}{[μ_{i} + (α - 1) y_{i}]^{2}} - \frac{μ_{i}}{α}\right\} w_{i} w_{i}^{⊤}, \\ \frac{\partial^{2} ℓ(θ)}{\partial α^{2}} &= \sum_{i=1}^{n} \left\{-\frac{(y_{i} - 1) y_{i}^{2}}{[μ_{i} + (α - 1) y_{i}]^{2}} + \frac{y_{i}}{α^{2}} - \frac{2 (μ_{i} - y_{i})}{α^{3}}\right\}, \\ \frac{\partial^{2} ℓ(θ)}{\partial β \partial α} &= \sum_{i=1}^{n} \left\{-\frac{(y_{i} - 1) y_{i} μ_{i}}{[μ_{i} + (α - 1) y_{i}]^{2}} + \frac{μ_{i}}{α^{2}}\right\} w_{i}. \end{aligned}$$

The Fisher information matrix is given by

$$I(θ) = -E[H(θ)],$$

where

$$\begin{aligned} E\left[\frac{\partial^{2} ℓ(θ)}{\partial β \partial β^{⊤}}\right] &= -\sum_{i=1}^{n} \left\{\frac{μ_{i}^{2} + 2 α (α - 1) μ_{i}}{[μ_{i} + 2 (α - 1)] α^{2}}\right\} w_{i} w_{i}^{⊤}, \\ E\left[\frac{\partial^{2} ℓ(θ)}{\partial α^{2}}\right] &= -\sum_{i=1}^{n} \left\{\frac{2 μ_{i}}{α^{2} [μ_{i} + 2 (α - 1)]}\right\}, \\ E\left[\frac{\partial^{2} ℓ(θ)}{\partial β \partial α}\right] &= \sum_{i=1}^{n} \left\{\frac{2 (α - 1) μ_{i}}{α^{2} [μ_{i} + 2 (α - 1)]}\right\} w_{i}. \end{aligned}$$

References

1. Saha, K.K. Analysis of one-way layout of count data in the presence of over or under dispersion. J. Stat. Plan. Inference 2008, 138, 2067–2081.
2. Guikema, S.D.; Goffelt, J.P. A flexible count data regression model for risk analysis. Risk Anal. Int. J. 2008, 28, 213–223.
3. Sellers, K.F.; Borle, S.; Shmueli, G. The COM-Poisson model for count data: A survey of methods and applications. Appl. Stoch. Model. Bus. Ind. 2012, 28, 104–116.
4. Lynch, H.J.; Thorson, J.T.; Shelton, A.O. Dealing with under- and over-dispersed count data in life history, spatial, and community ecology. Ecology 2014, 95, 3173–3180.
5. Consul, P.C.; Jain, G.C. A generalization of the Poisson distribution. Technometrics 1973, 15, 791–799.
6. Consul, P.C.; Famoye, F. The truncated generalized Poisson distribution and its estimation. Commun. Stat.–Theory Methods 1989, 18, 3635–3648.
7. Consul, P.C.; Famoye, F. Generalized Poisson regression model. Commun. Stat.–Theory Methods 1992, 21, 89–109.
8. Angers, J.F.; Biswas, A. A Bayesian analysis of zero-inflated generalized Poisson model. Comput. Stat. Data Anal. 2003, 42, 37–46.
9. Joe, H.; Zhu, R. Generalized Poisson distribution: The property of mixture of Poisson and comparison with negative binomial distribution. Biom. J. 2005, 47, 219–229.
10. Yang, Z.; Hardin, J.W.; Addy, C.L.; Vuong, Q.H. Testing approaches for over-dispersion in Poisson regression versus the generalized Poisson model. Biom. J. 2007, 49, 565–584.
11. Yang, Z.; Hardin, J.W.; Addy, C.L. A score test for over-dispersion in Poisson regression based on the generalized Poisson-2 model. J. Stat. Plan. Inference 2009, 139, 1514–1521.
12. Sellers, K.F.; Morris, D.S. Underdispersion models: Models that are “under the radar”. Commun. Stat.–Theory Methods 2017, 46, 12075–12086.
13. Toledo, D.; Umetsu, C.A.; Camargo, A.F.M.; de Lara, I.A.R. Flexible models for non-equidispersed count data: Comparative performance of parametric models to deal with under-dispersion. AStA Adv. Stat. Anal. 2022, 106, 473–497.
14. Consul, P.C.; Shoukri, M.M. The generalized Poisson distribution when the sample mean is larger than the sample variance. Commun. Stat.–Theory Methods 1985, 14, 667–681.
15. Seber, G.A.F.; Salehi, M.M. Adaptive Sampling Designs: Inference for Sparse and Clustered Populations, Chapter 5: Inverse sampling methods; Springer: New York, NY, USA, 2012.
16. Shmueli, G.; Minka, T.P.; Kadane, J.B.; Borle, S.; Boatwright, P. A useful distribution for fitting discrete data: Revival of the Conway–Maxwell–Poisson distribution. J. R. Stat. Soc. Ser. C (Appl. Stat.) 2005, 54, 127–142.
17. Sellers, K.F.; Shmueli, G. A flexible regression model for count data. Ann. Appl. Stat. 2010, 4, 943–961.
18. Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data, 2nd ed.; Cambridge University Press: Cambridge, UK, 2013.

Figure 1. The empirical levels of three test statistics ($T_{L}$, $T_{W}$, $T_{S}$) for testing $H_{0}: β_{1} = 0$ in Case (A2) for different $α$. (a) The empirical level with $H_{0}: β = (1, 0)^{⊤}$ for $α = 0.75$; (b) The empirical level with $H_{0}: β = (1, 0)^{⊤}$ for $α = 0.85$; (c) The empirical level with $H_{0}: β = (1, 0)^{⊤}$ for $α = 0.95$.

Figure 2. The empirical powers of three test statistics ($T_{L}$, $T_{W}$, $T_{S}$) for testing $H_{0}: β_{1} = 0$ in Case (A2) for different $α$. (a) The empirical power with $H_{1}: β = (1, 0.5)^{⊤}$ for $α = 0.75$; (b) The empirical power with $H_{1}: β = (1, 0.5)^{⊤}$ for $α = 0.85$; (c) The empirical power with $H_{1}: β = (1, 0.5)^{⊤}$ for $α = 0.95$.

Figure 3. The empirical levels of three test statistics ($T_{L}$, $T_{W}$, $T_{S}$) for testing $H_{0}: β_{1} = β_{2} = β_{3} = 0$ in Case (B2) for different $α$. (a) The empirical level with $H_{0}: β = (1, 0, 0, 0)^{⊤}$ for $α = 0.75$; (b) The empirical level with $H_{0}: β = (1, 0, 0, 0)^{⊤}$ for $α = 0.85$; (c) The empirical level with $H_{0}: β = (1, 0, 0, 0)^{⊤}$ for $α = 0.95$.

Figure 4. The empirical powers of three test statistics ($T_{L}$, $T_{W}$, $T_{S}$) for testing $H_{0}: β_{1} = β_{2} = β_{3} = 0$ in Case (B2) for different $α$. (a) The empirical power with $H_{1}: β = (1, -1, 1, -0.5)^{⊤}$ for $α = 0.75$; (b) The empirical power with $H_{1}: β = (1, -1, 1, -0.5)^{⊤}$ for $α = 0.85$; (c) The empirical power with $H_{1}: β = (1, -1, 1, -0.5)^{⊤}$ for $α = 0.95$.


Table 1. Parameter estimates based on 10,000 replications for Case (A1).

n	Para	$α = 0.6$		$α = 0.8$		$α = 0.95$
n	Para	Bias	MSE	Bias	MSE	Bias	MSE
100	$β_{0}$	−0.0017	0.0371	−0.0034	0.0494	−0.0044	0.0582
	$β_{1}$	−0.0011	0.0703	−0.0001	0.0966	−0.0003	0.1186
	$α$	−0.0085	0.0394	−0.0100	0.0535	−0.0187	0.0556
200	$β_{0}$	−0.0007	0.0260	−0.0007	0.0350	−0.0020	0.0415
	$β_{1}$	0.0003	0.0543	−0.0001	0.0764	−0.0001	0.0916
	$α$	−0.0037	0.0280	−0.0051	0.0373	−0.0084	0.0417
400	$β_{0}$	−0.0004	0.0183	−0.0003	0.0242	−0.0009	0.0292
	$β_{1}$	0.0004	0.0357	0.0000	0.0484	−0.0006	0.0589
	$α$	−0.0014	0.0202	−0.0028	0.0267	−0.0042	0.0314

Table 2. Parameter estimates based on 10,000 replications for Case (B1).

n	Para	$α = 0.6$		$α = 0.8$		$α = 0.95$
n	Para	Bias	MSE	Bias	MSE	Bias	MSE
100	$β_{0}$	0.0011	0.0766	−0.0043	0.1050	−0.0075	0.1248
	$β_{1}$	0.0034	0.0612	0.0001	0.0815	−0.0010	0.0990
	$β_{2}$	−0.0063	0.1172	0.0026	0.1593	0.0057	0.1917
	$β_{3}$	0.0016	0.0622	0.0007	0.0848	0.0003	0.1021
	$α$	−0.0122	0.0421	−0.0189	0.0542	−0.0264	0.0580
200	$β_{0}$	−0.0001	0.0547	−0.0018	0.0743	−0.0018	0.0897
	$β_{1}$	0.0004	0.0474	0.0002	0.0655	0.0000	0.0781
	$β_{2}$	−0.0018	0.0767	0.0014	0.1039	0.0001	0.1256
	$β_{3}$	0.0003	0.0436	−0.0019	0.0606	−0.0005	0.0722
	$α$	−0.0060	0.0291	−0.0100	0.0377	−0.0132	0.0429
400	$β_{0}$	−0.0001	0.0363	−0.0006	0.0502	−0.0016	0.0610
	$β_{1}$	0.0015	0.0297	0.0004	0.0410	0.0001	0.0487
	$β_{2}$	−0.0004	0.0511	0.0000	0.0707	0.0004	0.0861
	$β_{3}$	0.0002	0.0297	0.0004	0.0417	−0.0001	0.0498
	$α$	−0.0029	0.0208	−0.0048	0.0269	−0.0063	0.0313

Table 3. The empirical levels of statistics ($T_{L}$, $T_{W}$, $T_{S}$) for Case (A2).

n	$α = 0.75$			$α = 0.85$			$α = 0.95$
n	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$
50	0.0530	0.0599	0.0485	0.0521	0.0591	0.0486	0.0530	0.0602	0.0482
100	0.0512	0.0548	0.0487	0.0502	0.0531	0.0485	0.0490	0.0522	0.0468
150	0.0490	0.0508	0.0476	0.0525	0.0538	0.0510	0.0522	0.0544	0.0505
200	0.0514	0.0526	0.0505	0.0545	0.0561	0.0538	0.0515	0.0525	0.0505
250	0.0495	0.0504	0.0488	0.0494	0.0501	0.0481	0.0528	0.0538	0.0517
300	0.0504	0.0515	0.0502	0.0451	0.0468	0.0451	0.0499	0.0509	0.0485
350	0.0545	0.0558	0.0540	0.0510	0.0514	0.0504	0.0503	0.0507	0.0500
400	0.0542	0.0547	0.0532	0.0480	0.0491	0.0477	0.0504	0.0517	0.0498

Table 4. The empirical powers of statistics ($T_{L}$, $T_{W}$, $T_{S}$) for Case (A2).

n	$α = 0.75$			$α = 0.85$			$α = 0.95$
n	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$
50	0.2042	0.2252	0.1902	0.1580	0.1768	0.1489	0.1364	0.1505	0.1273
100	0.5457	0.5607	0.5333	0.4236	0.4392	0.4160	0.3387	0.3526	0.3316
150	0.6750	0.6844	0.6669	0.5411	0.5511	0.5340	0.4437	0.4549	0.4368
200	0.7733	0.7805	0.7702	0.6413	0.6519	0.6376	0.5366	0.5438	0.5326
250	0.8995	0.9030	0.8976	0.7986	0.8022	0.7973	0.6789	0.6853	0.6762
300	0.9439	0.9459	0.9420	0.8549	0.8574	0.8523	0.7591	0.7633	0.7564
350	0.9712	0.9719	0.9708	0.9129	0.9147	0.9119	0.8349	0.8371	0.8329
400	0.9903	0.9907	0.9903	0.9557	0.9571	0.9545	0.8975	0.8987	0.8962

Table 5. The empirical levels of statistics ($T_{L}$, $T_{W}$, $T_{S}$) for Case (B2).

n	$α = 0.75$			$α = 0.85$			$α = 0.95$
n	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$
50	0.0588	0.0794	0.0439	0.0594	0.0761	0.0449	0.0539	0.0669	0.0405
100	0.0546	0.0648	0.0475	0.0497	0.0568	0.0435	0.0524	0.0579	0.0459
150	0.0523	0.0588	0.0486	0.0505	0.0563	0.0470	0.0492	0.0522	0.0449
200	0.0470	0.0507	0.0421	0.0541	0.0564	0.0506	0.0535	0.0573	0.0501
250	0.0510	0.0545	0.0485	0.0528	0.0547	0.0502	0.0531	0.0556	0.0506
300	0.0525	0.0550	0.0506	0.0501	0.0526	0.0486	0.0498	0.0515	0.0479
350	0.0490	0.0522	0.0476	0.0518	0.0535	0.0502	0.0527	0.0544	0.0510
400	0.0510	0.0527	0.0487	0.0560	0.0571	0.0540	0.0458	0.0474	0.0453

Table 6. The empirical powers of statistics ($T_{L}$, $T_{W}$, $T_{S}$) for Case (B2).

n	$α = 0.75$			$α = 0.85$			$α = 0.95$
n	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$
50	0.1629	0.2173	0.1193	0.1310	0.1700	0.0938	0.1124	0.1442	0.0850
100	0.3998	0.4409	0.3523	0.2918	0.3232	0.2576	0.2289	0.2502	0.2027
150	0.5320	0.5598	0.5042	0.3957	0.4186	0.3709	0.3089	0.3280	0.2912
200	0.7670	0.7824	0.7409	0.6148	0.6298	0.5879	0.4893	0.4988	0.4669
250	0.8241	0.8379	0.8096	0.6747	0.6867	0.6605	0.5487	0.5588	0.5337
300	0.8814	0.8887	0.8759	0.7551	0.7654	0.7442	0.6292	0.6381	0.6193
350	0.9657	0.9675	0.9621	0.8895	0.8934	0.8823	0.7897	0.7942	0.7792
400	0.9786	0.9800	0.9775	0.9215	0.9249	0.9199	0.8367	0.8409	0.8310

Table 7. The empirical levels/powers of statistics ($T_{L}$, $T_{W}$, $T_{S}$) for Case (A3).

n	$α = 0.9$			$α = 0.95$			$α = 1$
n	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$
50	0.1652	0.2764	0.0737	0.0881	0.1542	0.0445	0.0603	0.0884	0.0494
100	0.2553	0.3641	0.1616	0.0999	0.1548	0.0608	0.0555	0.0672	0.0479
150	0.3562	0.4563	0.2642	0.1203	0.1725	0.0834	0.0525	0.0614	0.0475
200	0.4738	0.5604	0.3814	0.1426	0.1929	0.0993	0.0533	0.0601	0.0498
250	0.5652	0.6446	0.4870	0.1689	0.2152	0.1286	0.0523	0.0587	0.0505
300	0.6560	0.7207	0.5876	0.1901	0.2423	0.1500	0.0516	0.0543	0.0500
350	0.7272	0.7779	0.6710	0.1996	0.2503	0.1623	0.0546	0.0583	0.0520
400	0.7891	0.8332	0.7418	0.2286	0.2765	0.1886	0.0516	0.0573	0.0496

Table 8. The empirical levels/powers of statistics ($T_{L}$, $T_{W}$, $T_{S}$) for Case (B3).

n	$α = 0.9$			$α = 0.95$			$α = 1$
n	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$	$T_{L}$	$T_{W}$	$T_{S}$
50	0.2247	0.3832	0.0878	0.1184	0.2194	0.0486	0.0671	0.1288	0.0417
100	0.3124	0.4417	0.1916	0.1238	0.2025	0.0731	0.0581	0.0846	0.0464
150	0.4119	0.5349	0.2957	0.1479	0.2185	0.0968	0.0553	0.0716	0.0487
200	0.5155	0.6211	0.4160	0.1672	0.2349	0.1165	0.0543	0.0679	0.0485
250	0.6117	0.6991	0.5210	0.1857	0.2510	0.1343	0.0541	0.0648	0.0492
300	0.6962	0.7740	0.6167	0.2040	0.2678	0.1595	0.0527	0.0602	0.0488
350	0.7668	0.8228	0.7022	0.2269	0.2918	0.1772	0.0532	0.0602	0.0502
400	0.8152	0.8606	0.7614	0.2558	0.3197	0.2076	0.0464	0.0545	0.0448

Table 9. Model comparisons based on 1000 replications for Cases (A4) & (B4).

Case	Model	Log-Likelihood	AIC	BIC	$χ_{n - q - 1}^{2}$	Sys. Time
(A4)	${GP}^{(I)}$	−2009.89	4029.77	4054.31	997.07	0.7783 s
(A4)	CMP	−2011.14	4032.28	4056.81	999.75	1.2971 s
(B4)	${GP}^{(I)}$	−2157.29	4324.58	4349.12	999.96	1.0038 s
(B4)	CMP	−2160.11	4330.21	4354.75	990.26	1.9550 s

Sys. Time represents the averaged time cost of the system when the algorithm converged for each repetition.

Table 10. MLEs and CIs of parameters for the ${GP}^{(I)}$ regression model in (21) and CMP regression model.

	${GP}^{(I)}$				CMP
Parameter	MLE	Std	$z$ -Value	$p$ -Value	MLE	Std	$z$ -Value	$p$ -Value
Intercept	4.455	0.3685	12.09	<0.0001	4.843	0.4700	1.31	<0.0001
$α$	0.913	0.0052	−16.71	<0.0001	–	–	–	–
$ν$	–	–	–	–	1.828	0.0666	27.44	<0.0001
Age	−0.147	0.0097	−15.05	<0.0001	−0.157	0.0121	−12.98	<0.0001
Education
Primary	−0.083	0.0401	−2.06	0.0391	−0.032	0.0508	−0.63	0.5271
Secondary	−0.085	0.0405	−2.10	0.0361	0.002	0.0512	0.04	0.9685
Higher	0.223	0.0534	4.18	<0.0001	0.403	0.0671	6.01	<0.0001
No education	0.000				0.000
Religion
Islam	−0.342	0.1797	−1.90	0.0573	−0.247	0.2387	−1.04	0.3002
Hinduism	−0.638	0.1868	−3.42	0.0006	−0.684	0.2478	−2.76	0.0057
Christianity	0.000				0.000
Division
Chittagong	0.064	0.0551	1.16	0.2461	0.103	0.0685	1.50	0.1331
Dhaka	−0.060	0.0572	−1.05	0.2946	−0.067	0.0711	−0.94	0.3492
Khulna	−0.320	0.0640	−5.00	<0.0001	−0.351	0.0794	−4.42	<0.0001
Mymensingh	0.052	0.0584	0.89	0.3759	0.072	0.0727	0.99	0.3210
Rajshahi	−0.319	0.0630	−5.06	<0.0001	−0.359	0.0781	−4.59	<0.0001
Rangpur	−0.093	0.0597	−1.56	0.1198	−0.116	0.0745	−1.55	0.1208
Sylhet	0.433	0.0540	8.03	<0.0001	0.576	0.0682	8.45	<0.0001
Barisal	0.000				0.000

Table 11. Comparisons of goodness-of-fit among the ${GP}^{(I)}$ regression model, CMP regression model, log-lambda based GP regression model with constraint $λ ⩾ 0$ and the log-lambda based GP regression model without constraint on $λ$.

Model	Log-Likelihood	AIC	BIC	$χ_{n - p - 1}^{2}$	Sys. Time
${GP}^{(I)}$	−7624.63	15,279.26	15,385.94	8974.17	11.0384 s
CMP	−7623.66	15,277.32	15,384.01	9017.33	52.4071 s
genpoisson0	−7645.95	15,321.90	15,428.58	7675.93	3.0545 s
genpoisson	−7706.93	15,443.87	15,550.55	7526.34	3.7786 s

genpoisson0 means using the function ‘genpoisson0’ in ‘vglm’; genpoisson means using the function ‘genpoisson’ in ‘vglm’; Sys. Time represents the time cost of the system when the algorithm converged.

Table 12. Dispersion test for testing $H_{0}: α = 1$.

Tests	Value	p-Value
Likelihood ratio	164.61	<0.0001
Wald	279.27	<0.0001
Score	128.22	<0.0001

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
