Article

Bayesian Inference for Zero-Modified Power Series Regression Models

by Katiane S. Conceição 1, Marinho G. Andrade 1, Victor Hugo Lachos 2 and Nalini Ravishanker 2,*
1 Department of Applied Mathematics and Statistics, Institute of Mathematics and Computer Science, University of São Paulo, São Carlos 13566-590, SP, Brazil
2 Department of Statistics, University of Connecticut, Storrs, CT 06269, USA
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(1), 60; https://doi.org/10.3390/math13010060
Submission received: 13 October 2024 / Revised: 24 December 2024 / Accepted: 24 December 2024 / Published: 27 December 2024
(This article belongs to the Section D1: Probability and Statistics)

Abstract

Count data often exhibit discrepancies in the frequencies of zeros, which commonly occur across various application domains. These data may include excess zeros (zero inflation) or, less frequently, a scarcity of zeros (zero deflation). In regression models, both situations can arise at different levels of covariates. The zero-modified power series regression model provides an effective framework for modeling such count data, as it does not require prior knowledge of the type of zero modification, whether zero inflation or zero deflation, and can accommodate overdispersion, equidispersion, or underdispersion present in the data. This paper proposes a Bayesian estimation procedure based on the stochastic gradient Hamiltonian Monte Carlo algorithm, effectively addressing many challenges associated with estimating the model parameters. Additionally, we introduce a measure of Bayesian efficiency to evaluate the impact of prior information on parameter estimation. The practical utility of the proposed method is demonstrated through both simulated and real data across different types of zero modification.

1. Introduction

Applied statistical data analysis of count (enumeration) responses generally requires assumptions about discrete probability distributions generating the response [1,2,3,4]. In many situations, count datasets are analyzed using the Poisson distribution (see [5,6,7,8]) when we can assume that the mean and variance of the counts are the same. When the variance is assumed to be greater than or smaller than the mean, several flexible probability distributions have been proposed as alternatives to the Poisson distribution for modeling overdispersion and underdispersion [9]. They include discrete probability distributions belonging to the class of power series (PS) distributions or their generalizations [1,2,3,4,9,10,11,12].
Another often encountered situation with count data involves discrepancies in the frequency of zeros from that corresponding to the Poisson distribution. While zero inflation is most commonly observed, zero deflation can also occur in some situations. Pioneering work on these topics includes M’Kendrick [13], who studied the number of cancer cases between 1875 and 1898 in two suburbs of Luckau in Hanover, as well as data corresponding to a cholera epidemic in a village in India. David and Johnson [14] analyzed the number of defective teeth in 11-year-old boys and the number of ticks found on sheep. Statistical modeling of zero-inflated count data was proposed by [15,16] via the zero-inflated Poisson (ZIP) distribution, which is a mixture of a degenerate zero distribution and the Poisson distribution. For a review of ZIP models for count data, see [17]. In Dietz and Böhning [18], a zero-modified Poisson (ZMP) regression model was considered for analyzing zero-inflated or zero-deflated data. An application of the ZIP distribution for modeling data from natural calamities was presented by [19].
More recently, ref. [20] discussed zero-modified power series (ZMPS) distributions, which can adequately accommodate the discrepancy in the frequency of zeros without requiring any previous knowledge about the type of modification (zero-inflation or zero-deflation). As a special case, they illustrated the zero-modified negative binomial (ZMNB) distribution on simulated and real data, obtaining maximum likelihood estimates of model parameters. To our knowledge, the flexible and useful class of ZMPS distributions has not been employed in the literature for modeling dispersed count responses as a function of covariates, particularly in the Bayesian framework.
This paper presents a regression model for count data with the flexibility to handle different types of zero modification (inflation or deflation), as well as different dispersion characteristics (overdispersion or underdispersion), within a single dataset. Our framework includes, as special cases, standard regression models and zero-inflated models for count data. We develop the model and its estimation in the Bayesian framework.
The sampling-based Bayesian framework using Markov Chain Monte Carlo (MCMC) methods [21] is attractive because the ZMPS likelihood function is complex, making exact Bayesian estimation and inference cumbersome. Most MCMC algorithms include an acceptance–rejection step of the generated candidate (the Metropolis–Hastings (MH) step) and may result in a high rejection rate in complex models, leading to slow convergence. To remedy this, several computationally efficient algorithms have been proposed in the literature, such as the Hamiltonian Monte Carlo (HMC) algorithm [22], the Metropolis Adjusted Langevin algorithm (MALA) [23,24,25], and an improvement of the HMC algorithm known as the No-U-Turn Sampler (NUTS) [26].
Further, the general class of algorithms using stochastic gradient MCMC (SGMCMC) [27,28,29,30,31] has recently gained popularity in the machine learning area but is still little used in the Bayesian context. The R package sgmcmc for stochastic gradient MCMC [30] calculates the gradients using a numerical computation library called tensorflow [32], which is a machine learning system with an R interface, operating on large, heterogeneous environments. Among the algorithms in the SGMCMC class, the SGHMC algorithm is one of the most computationally efficient [31].
The objective of this paper is to develop a fully Bayesian approach for estimating the parameters of the ZMPS regression models via the computationally efficient stochastic gradient with friction HMC (SGHMC) algorithm proposed by [28]. A comprehensive study of various MCMC simulation methods is given by [33], along with a clear explanation of stochastic gradient methods. Since the required gradient equations can be obtained from the ZMPS models, we implement the SGHMC algorithm with the leapfrog integration method. We also derive and illustrate a relative Bayesian efficiency measure to compare the more general ZMPS distribution with the standard Power Series (PS) distribution.
The rest of the paper is organized as follows. Section 2.1 describes the ZMPS regression model framework for count responses. Section 2.3 presents the Bayesian approach for model parameter estimation, while Section 2.7 describes the SGHMC algorithm. Section 2.9 discusses the Bayesian information used to assess the efficiency loss incurred by fitting a ZMPS model when a PS distribution could have been applied. Section 3.1 presents simulation studies to evaluate the impact of the prior distribution on the quality of the Bayesian estimator. Section 3.2 reports the analysis of real datasets that consider different characteristics of the ZMPS distribution to analyze the number of femicides in 642 São Paulo municipalities in 2019 and 2020; in this application, we want to assess whether there was an increase in the number of femicides during the COVID-19 pandemic period. Section 4 draws some final comments on the relevant results achieved.

2. Materials and Methods

A regression model for count data is developed here, incorporating the flexibility of the ZMPS distribution, along with a Bayesian framework for parameter estimation.

2.1. ZMPS Regression Model

In ZMPS regression, suppose that the observations to be analyzed are represented by the vector $y = (y_1, \ldots, y_n)^\top$. Let $Y_i$, $i = 1, \ldots, n$, be $n$ independent random variables assuming values in the support set $\mathcal{A} = \{0, 1, \ldots\}$ and having probability mass function (p.m.f.) defined by [20]
$$\pi_{\mathrm{ZMPS}}(y_i; \mu_i, \phi, p_i) = (1 - p_i)\, I(y_i) + p_i\, \pi_{\mathrm{PS}}(y_i; \mu_i, \phi), \tag{1}$$
where $\mu_i > 0$, $\phi > 0$, and the indicator function $I(y_i) = 1$ if $y_i = 0$ and $I(y_i) = 0$ if $y_i > 0$. The p.m.f. of the standard PS distribution is given by
$$\pi_{\mathrm{PS}}(y_i; \mu_i, \phi) = \frac{a(y_i, \phi)\, g(\mu_i, \phi)^{y_i}}{f(\mu_i, \phi)},$$
where $a(y_i, \phi)$, $f(\mu_i, \phi)$ and $g(\mu_i, \phi)$ are positive, finite, and twice-differentiable functions, with $f(\mu_i, \phi) = \sum_{y_i \in \mathcal{A}} a(y_i, \phi)\, g(\mu_i, \phi)^{y_i}$, while the parameter $p_i$ is responsible for modifying the probability of zero of the standard PS distribution, such that
$$0 \le p_i \le \frac{1}{1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)},$$
where $\pi_{\mathrm{PS}}(0; \mu_i, \phi)$ is the probability of zero in the PS distribution.
The Poisson, generalized Poisson, COM-Poisson, and negative binomial distributions are well-known members of the PS family. Table 1 presents the functions f ( μ i , ϕ ) , g ( μ i , ϕ ) , and a ( y i , ϕ ) for several distributions within the PS family.
For the COM-Poisson distribution, we consider the parameterization proposed in [34] to provide a clear centering parameter. The conditional mean and variance of $Y_i$ under the standard PS distribution, $E(Y_i) = \mu_i^{PS}$ and $V(Y_i) = \upsilon_i^{PS}$, are given by
$$\mu_i^{PS} = \frac{f_\mu(\mu_i, \phi)\, g(\mu_i, \phi)}{f(\mu_i, \phi)\, g_\mu(\mu_i, \phi)} \quad \text{and} \quad \upsilon_i^{PS} = \frac{g(\mu_i, \phi)}{g_\mu(\mu_i, \phi)},$$
where $f_\mu$ and $g_\mu$ denote derivatives with respect to $\mu$. For the distributions shown in Table 1, the mean $\mu_i^{PS} = \mu_i$, except for the COM-Poisson distribution, for which we can write the asymptotic approximations [34]:
$$\mu_i^{PS} \approx \mu_i + \frac{1 - \phi}{2\phi} \quad \text{and} \quad \upsilon_i^{PS} \approx \frac{\mu_i}{\phi}.$$
This parametrization also allows $\phi$ to keep its role as a shape parameter. That is, if $\phi < 1$, the variance is greater than the mean (overdispersion), while $\phi > 1$ leads to underdispersion. Under the ZMPS distribution, the mean and variance of the random variable $Y_i$ are, respectively,
$$\mu_i^{(ZMPS)} = p_i\, \mu_i^{PS} \quad \text{and} \quad \sigma_i^{2\,(ZMPS)} = p_i \left\{ \upsilon_i^{PS} + (1 - p_i) (\mu_i^{PS})^2 \right\}.$$
Different values of $p_i$ lead to different ZMPS distributions, as can be seen in the evaluation of the proportion of additional or missing zeros, given by
$$\pi_{\mathrm{ZMPS}}(0; \mu_i, \phi, p_i) - \pi_{\mathrm{PS}}(0; \mu_i, \phi) = 1 - p_i + p_i\, \pi_{\mathrm{PS}}(0; \mu_i, \phi) - \pi_{\mathrm{PS}}(0; \mu_i, \phi) = (1 - p_i)\left(1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)\right). \tag{2}$$
According to (2), the parameter $p_i$ also controls the frequency of zeros, as follows:
(i) When $p_i = 0$ in (2), $\pi_{\mathrm{ZMPS}}(0; \mu_i, \phi, p_i) = 1$, and (1) is the degenerate distribution with the entire mass at zero.
(ii) For all $0 < p_i < 1$ in (2), $(1 - p_i)(1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)) > 0$. Then, $\pi_{\mathrm{ZMPS}}(0; \mu_i, \phi, p_i) > \pi_{\mathrm{PS}}(0; \mu_i, \phi)$, and (1) is the zero-inflated PS (ZIPS) distribution, which has a proportion of additional zeros.
(iii) When $p_i = 1$ in (2), $\pi_{\mathrm{ZMPS}}(0; \mu_i, \phi, p_i) - \pi_{\mathrm{PS}}(0; \mu_i, \phi) = 0$. Therefore, $\pi_{\mathrm{ZMPS}}(0; \mu_i, \phi, p_i) = \pi_{\mathrm{PS}}(0; \mu_i, \phi)$, and (1) is the standard PS distribution.
(iv) For all $1 < p_i < \frac{1}{1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)}$ in (2), $(1 - p_i)(1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)) < 0$. Then, $\pi_{\mathrm{ZMPS}}(0; \mu_i, \phi, p_i) < \pi_{\mathrm{PS}}(0; \mu_i, \phi)$, and (1) is the zero-deflated PS (ZDPS) distribution.
(v) When $p_i = \frac{1}{1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)}$ in (2), $\pi_{\mathrm{ZMPS}}(0; \mu_i, \phi, p_i) = 0$. Then, (1) is the zero-truncated PS (ZTPS) distribution, whose p.m.f. is given by
$$\pi_{\mathrm{ZTPS}}(y_i; \mu_i, \phi) = \frac{\pi_{\mathrm{PS}}(y_i; \mu_i, \phi)}{1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)} \left(1 - I(y_i)\right), \quad y_i \in \mathcal{A}. \tag{3}$$
Note that the parameter $p_i$ controls the frequency of zeros and can assume different values at specific levels of covariates. Conditions (i)–(v) indicate that zero inflation, zero deflation, or a standard zero frequency may occur in the same count dataset at specific levels of covariates. The ZMPS model can therefore accommodate these situations by allowing different values of $p_i$. More details about the ZMPS distributions are provided in Conceição et al. [20].
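Conditions (i)–(v) can be illustrated numerically. The following is a minimal sketch, assuming the Poisson distribution as the PS member (so that $\phi$ plays no role); the function names `poisson_pmf`, `zmp_pmf`, and `zero_modification` are hypothetical and are not taken from the paper.

```python
import math


def poisson_pmf(y, mu):
    """Poisson p.m.f., used here as the simplest member of the PS family."""
    return math.exp(-mu) * mu**y / math.factorial(y)


def zmp_pmf(y, mu, p):
    """Zero-modified Poisson p.m.f. following (1): (1 - p) I(y) + p * pi_PS(y)."""
    p_max = 1.0 / (1.0 - poisson_pmf(0, mu))
    if not 0.0 <= p <= p_max:
        raise ValueError("p must lie in [0, 1 / (1 - pi_PS(0))]")
    indicator = 1.0 if y == 0 else 0.0
    return (1.0 - p) * indicator + p * poisson_pmf(y, mu)


def zero_modification(mu, p, tol=1e-12):
    """Label the case (i)-(v) implied by p for a given mu."""
    p_max = 1.0 / (1.0 - poisson_pmf(0, mu))
    if p < -tol or p > p_max + tol:
        raise ValueError("p is outside the admissible range")
    if abs(p) <= tol:
        return "degenerate at zero"      # case (i)
    if abs(p - p_max) <= tol:
        return "zero-truncated"          # case (v)
    if abs(p - 1.0) <= tol:
        return "standard"                # case (iii)
    return "zero-inflated" if p < 1.0 else "zero-deflated"  # cases (ii), (iv)
```

For instance, with $\mu = 2$ the upper bound on $p$ is $1/(1 - e^{-2})$, so `zero_modification(2.0, 1.1)` falls in case (iv), and the probability of zero under `zmp_pmf` is then smaller than the Poisson probability of zero.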

2.2. Hurdle Model Versus ZMPS Model

Note that (1) can be rewritten as
$$\begin{aligned} \pi_{\mathrm{ZMPS}}(y_i; \mu_i, \phi, p_i) &= \pi_{\mathrm{ZMPS}}(0; \mu_i, \phi, p_i)\, I(y_i) + \pi_{\mathrm{ZMPS}}(y_i; \mu_i, \phi, p_i) \left(1 - I(y_i)\right) \\ &= \left\{1 - p_i + p_i\, \pi_{\mathrm{PS}}(0; \mu_i, \phi)\right\} I(y_i) + \left\{p_i\, \pi_{\mathrm{PS}}(y_i; \mu_i, \phi)\right\} \left(1 - I(y_i)\right) \\ &= \left\{1 - p_i \left(1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)\right)\right\} I(y_i) + p_i \left(1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)\right) \frac{\pi_{\mathrm{PS}}(y_i; \mu_i, \phi)}{1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)} \left(1 - I(y_i)\right). \end{aligned} \tag{4}$$
Since $0 \le p_i \le \frac{1}{1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)}$, then $0 \le p_i (1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi)) \le 1$. Thus, setting $\omega_i = p_i (1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi))$, the p.m.f. given in (4) can be written as a mixture distribution, i.e.,
$$\pi_{\mathrm{ZMPS}}(y_i; \mu_i, \phi, \omega_i) = (1 - \omega_i)\, I(y_i) + \omega_i\, \pi_{\mathrm{ZTPS}}(y_i; \mu_i, \phi), \quad y_i \in \mathcal{A}, \tag{5}$$
where $0 \le \omega_i \le 1$.
The ZMPS distribution in (5) is parameterized by $\omega_i$ and can be interpreted as a hurdle distribution [35], where the probability of the event $Y_i = 0$ is $1 - \omega_i$, while the probability of an observation $Y_i = y_i$, $y_i > 0$, is $\omega_i\, \pi_{\mathrm{ZTPS}}(y_i; \mu_i, \phi)$.
Properties (i)–(v) can also be written in terms of $\omega_i$. Therefore, the ZMPS model (1) and the hurdle model (5) are equivalent, and either can be used to model data with zero-frequency modifications (inflation or deflation). Ref. [36] compared the hurdle model to zero-inflated models, highlighting the advantages of hurdle models (or ZMPS) due to the ability of these models to accommodate zero inflation ($0 < p_i < 1$) and zero deflation ($1 < p_i \le 1 / (1 - \pi_{\mathrm{PS}}(0; \mu_i, \phi))$) at different levels of covariates. Henceforth, we will consider the hurdle version of ZMPS given by (5).
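The equivalence between the parameterizations (1) and (5) can be checked numerically. The sketch below again assumes a Poisson PS member; the helper names `zmps_pmf`, `hurdle_pmf`, and `p_to_omega` are hypothetical and introduced only for illustration.

```python
import math


def poisson_pmf(y, mu):
    """Poisson p.m.f. (assumed PS member for this illustration)."""
    return math.exp(-mu) * mu**y / math.factorial(y)


def zmps_pmf(y, mu, p):
    """Parameterization (1): (1 - p) I(y) + p * pi_PS(y)."""
    indicator = 1.0 if y == 0 else 0.0
    return (1.0 - p) * indicator + p * poisson_pmf(y, mu)


def hurdle_pmf(y, mu, omega):
    """Parameterization (5): (1 - omega) I(y) + omega * pi_ZTPS(y)."""
    if y == 0:
        return 1.0 - omega
    return omega * poisson_pmf(y, mu) / (1.0 - poisson_pmf(0, mu))


def p_to_omega(p, mu):
    """Map p to the hurdle parameter omega = p * (1 - pi_PS(0))."""
    return p * (1.0 - poisson_pmf(0, mu))


# Compare both forms for an inflated, a standard, and a deflated case.
mu = 1.5
max_gap = max(
    abs(zmps_pmf(y, mu, p) - hurdle_pmf(y, mu, p_to_omega(p, mu)))
    for p in (0.4, 1.0, 1.2)
    for y in range(6)
)
```

Under these assumptions `max_gap` is zero up to floating-point rounding, which reflects that the two forms describe the same distribution.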
However, ZMPS models may have a drawback when $p_i = 1$ for all $i = 1, \ldots, n$. In that case, a standard PS model could have been used without fitting the parameters $p_i$, and fitting a ZMPS model instead of a simple PS model may result in efficiency losses for the parameter estimators. This paper offers a method to assess this efficiency loss from a Bayesian perspective (see Section 2.9).
Let $x_i = (x_{i,1}, \ldots, x_{i,r})^\top$ with $x_{i,1} = 1$, and $v_i = (v_{i,1}, \ldots, v_{i,s})^\top$ with $v_{i,1} = 1$, be fixed predictors. Let $X = (x_1, \ldots, x_n)^\top$ and $V = (v_1, \ldots, v_n)^\top$ denote the $n \times r$ and $n \times s$ predictor matrices employed for modeling $\mu_i$ and $\omega_i$, respectively, via the logarithmic and logit link functions, i.e.,
$$\log(\mu_i) = \eta_1(\beta_\mu, x_i) = \beta_{\mu,1} + \sum_{j=2}^{r} \beta_{\mu,j}\, x_{i,j}, \qquad \operatorname{logit}(\omega_i) = \eta_2(\beta_\omega, v_i) = \beta_{\omega,1} + \sum_{j=2}^{s} \beta_{\omega,j}\, v_{i,j},$$
where $\beta_\mu = (\beta_{\mu,1}, \ldots, \beta_{\mu,r})^\top$ and $\beta_\omega = (\beta_{\omega,1}, \ldots, \beta_{\omega,s})^\top$. It is also possible to employ alternate link functions for $\omega_i$; see [37].
The parameter $\phi$ accounts for under- or overdispersion in the data, in addition to any dispersion caused by zero inflation or zero deflation. In the case of the zero-modified Poisson distribution, $\phi = 1$, and all the overdispersion is attributed to zero inflation. In other models, some of the under- or overdispersion is explained by the parameters $\omega_i$ and $\mu_i$. Therefore, we interpret $\phi$ as capturing any under- or overdispersion beyond that induced by zero-modification, without the need for additional covariates.

2.3. Bayesian Framework for the ZMPS Regression Model

Let $D = \{y, X, V\}$ denote the observed data, based on which we estimate the parameters $\beta_\mu$, $\beta_\omega$, $\phi$ under the model in (5) using the fully Bayesian framework.

2.4. Likelihood Function

The likelihood function is
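$$L(\beta_\mu, \beta_\omega, \phi \mid D) = \prod_{i=1}^{n} (1 - \omega_i)^{I(y_i)} \left[\omega_i\, \pi_{\mathrm{ZTPS}}(y_i; \mu_i, \phi)\right]^{(1 - I(y_i))},$$
where $\mu_i > 0$ and $0 < \omega_i < 1$. It follows that the log-likelihood function is
$$\begin{aligned} \ell(\beta_\mu, \beta_\omega, \phi \mid D) &= \sum_{i=1}^{n} \left\{ I(y_i) \log(1 - \omega_i) + \left(1 - I(y_i)\right) \log\left[\omega_i\, \pi_{\mathrm{ZTPS}}(y_i \mid \mu_i, \phi)\right] \right\} \\ &= \sum_{i=1}^{n} \left\{ I(y_i) \log\left(\frac{1 - \omega_i}{\omega_i}\right) + \log(\omega_i) \right\} + \sum_{i=1}^{n} \left(1 - I(y_i)\right) \log \pi_{\mathrm{ZTPS}}(y_i \mid \mu_i, \phi) \\ &= \ell_1(\beta_\mu, \phi \mid D) + \ell_2(\beta_\omega \mid D), \end{aligned}$$
where
$$\ell_1(\beta_\mu, \phi \mid D) = \sum_{i=1}^{n} \left(1 - I(y_i)\right) \log \pi_{\mathrm{ZTPS}}(y_i \mid \mu_i, \phi, D) \tag{7}$$
and
$$\ell_2(\beta_\omega \mid D) = \sum_{i=1}^{n} \left\{ I(y_i) \log\left(\frac{1 - \omega_i}{\omega_i}\right) + \log(\omega_i) \right\}. \tag{8}$$
Here, $\ell_1(\beta_\mu, \phi \mid D)$ is the portion of the log-likelihood function from the associated ZTPS distribution, considering only the positive observations of $y$, and $\ell_2(\beta_\omega \mid D)$ is the part associated with the hurdle parameter $\beta_\omega$. Therefore, we can estimate the parameters $\beta_\mu$ and $\phi$ of the ZMPS distribution from its associated ZTPS distribution using only the positive observations of the sample $y$. The following result states a relevant link between the estimation of the parameters of ZMPS and ZTPS distributions.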
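The decomposition $\ell = \ell_1 + \ell_2$ can be sketched in code. The example below is a minimal illustration assuming a zero-truncated Poisson for the positive part (so $\phi$ is absent) together with the log and logit links; the names `log_ztp_pmf` and `zmp_loglik_parts` and the toy data are hypothetical.

```python
from math import lgamma

import numpy as np


def log_ztp_pmf(y, mu):
    """Log p.m.f. of the zero-truncated Poisson, valid for y >= 1."""
    return y * np.log(mu) - mu - lgamma(y + 1) - np.log1p(-np.exp(-mu))


def zmp_loglik_parts(beta_mu, beta_omega, y, X, V):
    """Return (ell_1, ell_2) for the hurdle form of the zero-modified Poisson."""
    mu = np.exp(X @ beta_mu)                           # log link for mu_i
    omega = 1.0 / (1.0 + np.exp(-(V @ beta_omega)))    # logit link for omega_i
    zero = (y == 0)                                    # indicator I(y_i)
    # ell_1 uses only the positive observations and never touches beta_omega.
    ell1 = sum(log_ztp_pmf(yi, mi) for yi, mi in zip(y[~zero], mu[~zero]))
    # ell_2 depends on the data only through the zero indicator.
    ell2 = np.sum(zero * np.log((1.0 - omega) / omega) + np.log(omega))
    return ell1, ell2


# Toy data (made up for illustration only).
y = np.array([0, 2, 0, 1, 3])
X = np.column_stack([np.ones(5), [0.1, 0.5, -0.3, 0.0, 0.8]])
V = np.column_stack([np.ones(5), [1.0, 0.0, 1.0, 0.0, 1.0]])
ell1, ell2 = zmp_loglik_parts(np.array([0.2, 0.4]), np.array([0.3, -0.5]), y, X, V)
```

Because `ell1` is computed without reference to `beta_omega`, changing the hurdle coefficients leaves it unchanged, which is the separation exploited in the estimation.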
Result 1. If $Y_i$ follows the ZMPS distribution, we can estimate the $\beta_\mu$ and $\phi$ parameters by only considering the positive observations of $y$ and the associated ZTPS distribution.
To verify this result, let $y^+ = (y_1^+, \ldots, y_m^+)$ denote the positive observations in $y$, where $m = n - n_0$, and $n_0$ is the number of zero observations in the vector $y$. Let $X^+$ denote the $m \times r$ predictor matrix associated with $y^+$. Assuming that $y^+$ follows the associated ZTPS distribution and $D^+ = (y^+, X^+)$, the log-likelihood function associated with $y^+$ is given by
$$\ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+) = \sum_{i=1}^{m} \log \pi_{\mathrm{ZTPS}}(y_i^+ \mid \mu_i, \phi, D^+). \tag{9}$$
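We can easily show that (9) is obtained directly from (7). We see from (7) and (8) that the estimation of $(\beta_\mu, \phi)$ is independent of the estimation of $\beta_\omega$. Substituting
$$\pi_{\mathrm{ZTPS}}(y_i^+ \mid \mu_i, \phi) = \frac{a(y_i^+, \phi)\, g(\mu_i, \phi)^{y_i^+}}{f(\mu_i, \phi) - 1}, \quad y_i^+ \in \{1, 2, \ldots\},$$
in (9) and differentiating $\ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+)$ with respect to each parameter, $\beta_{\mu,j}$, $j = 1, \ldots, r$, and $\phi$, we obtain the score vectors $U_{\beta_\mu}$ and $U_\phi$ with elements:
$$U_{\beta_{\mu,j}} = \frac{\partial \ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+)}{\partial \beta_{\mu,j}} = \sum_{i=1}^{m} \left\{ y_i^+ \frac{g_\mu(\mu_i, \phi)}{g(\mu_i, \phi)} - \frac{f_\mu(\mu_i, \phi)}{f(\mu_i, \phi) - 1} \right\} \frac{\partial \mu_i}{\partial \beta_{\mu,j}}$$
and
$$U_\phi = \frac{\partial \ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+)}{\partial \phi} = \sum_{i=1}^{m} \left\{ \frac{a_\phi(y_i^+, \phi)}{a(y_i^+, \phi)} + y_i^+ \frac{g_\phi(\mu_i, \phi)}{g(\mu_i, \phi)} - \frac{f_\phi(\mu_i, \phi)}{f(\mu_i, \phi) - 1} \right\},$$
where
$g_\mu(\mu_i, \phi) = \partial g(\mu_i, \phi) / \partial \mu_i$, $f_\mu = \partial f(\mu_i, \phi) / \partial \mu_i$, $a_\phi = \partial a(y_i, \phi) / \partial \phi$, $g_\phi = \partial g(\mu_i, \phi) / \partial \phi$, and $f_\phi = \partial f(\mu_i, \phi) / \partial \phi$. Considering the logit link function $\operatorname{logit}(\omega_i) = \log\{\omega_i / (1 - \omega_i)\} = v_i^\top \beta_\omega$ and differentiating $\ell_2(\beta_\omega \mid D)$ with respect to each parameter, $\beta_{\omega,j}$, $j = 1, \ldots, s$, we obtain the score vector $U_{\beta_\omega}$, whose elements are
$$U_{\beta_{\omega,j}} = \frac{\partial \ell_2(\beta_\omega \mid D)}{\partial \beta_{\omega,j}} = \sum_{i=1}^{n} \left\{ \left(1 - I(y_i)\right) v_{i,j} - \frac{e^{v_i^\top \beta_\omega}}{1 + e^{v_i^\top \beta_\omega}}\, v_{i,j} \right\}.$$
The score vector, denoted by $U(\beta_\mu, \phi, \beta_\omega) = (U_{\beta_\mu}(\beta_\mu, \phi), U_\phi(\beta_\mu, \phi), U_{\beta_\omega}(\beta_\omega))$, will play a role in the MCMC simulation algorithm in the Bayesian approach.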
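As a sanity check on the score for $\beta_\omega$, the analytic expression can be compared with a numerical gradient of $\ell_2$. The following sketch uses hypothetical function names (`ell2`, `score_beta_omega`) and made-up data; it is an illustration under the logit link and not the authors' implementation.

```python
import numpy as np


def ell2(beta_omega, y, V):
    """Hurdle part of the log-likelihood, as in (8)."""
    omega = 1.0 / (1.0 + np.exp(-(V @ beta_omega)))
    zero = (y == 0).astype(float)
    return np.sum(zero * np.log((1.0 - omega) / omega) + np.log(omega))


def score_beta_omega(beta_omega, y, V):
    """Analytic score: sum_i {(1 - I(y_i)) - omega_i} v_i."""
    omega = 1.0 / (1.0 + np.exp(-(V @ beta_omega)))  # equals e^eta / (1 + e^eta)
    positive = (y > 0).astype(float)                 # 1 - I(y_i)
    return V.T @ (positive - omega)


# Central finite differences on toy data (made up for illustration only).
y = np.array([0, 2, 0, 1, 3, 0])
V = np.column_stack([np.ones(6), [0.5, -1.0, 0.3, 1.2, -0.7, 0.0]])
beta = np.array([0.2, -0.4])
h = 1e-6
numeric = np.array(
    [(ell2(beta + h * e, y, V) - ell2(beta - h * e, y, V)) / (2 * h) for e in np.eye(2)]
)
analytic = score_beta_omega(beta, y, V)
```

The two vectors `numeric` and `analytic` should agree to within the finite-difference error, which supports the form of the score used as a gradient in the sampler.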
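Let the parameter vector be denoted by $\theta = (\beta_\mu, \beta_\omega, \phi)$, where $\theta = (\theta_1, \ldots, \theta_m)$ and $m = r + s + 1$. The observed information matrix $J(\theta)$ is an $(r + s + 1) \times (r + s + 1)$ matrix, with entries given by
$$J_{j,k} = -\frac{\partial^2 \log L(\theta \mid D)}{\partial \theta_j\, \partial \theta_k}, \quad j, k = 1, \ldots, (r + s + 1).$$
Due to the orthogonality between the parameters $(\beta_\mu, \phi)$ and $\beta_\omega$, the information matrix $J(\theta)$ is a block diagonal matrix given by
$$J(\theta) = \begin{pmatrix} J_{\beta_\mu, \beta_\mu} & J_{\beta_\mu, \phi} & 0 \\ J_{\beta_\mu, \phi}^\top & J_{\phi, \phi} & 0 \\ 0 & 0 & J_{\beta_\omega, \beta_\omega} \end{pmatrix}. \tag{10}$$
The elements of each block are defined by
$$J_{\beta_{\mu,j}, \beta_{\mu,k}} = -\frac{\partial^2 \ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+)}{\partial \beta_{\mu,j}\, \partial \beta_{\mu,k}} = \sum_{i=1}^{m} \left\{ -y_i^+ h_g(\beta_\mu, \phi) + h_f(\beta_\mu, \phi) \right\} \frac{\partial \mu_i}{\partial \beta_{\mu,j}} \frac{\partial \mu_i}{\partial \beta_{\mu,k}} - \sum_{i=1}^{m} \left\{ y_i^+ \frac{g_\mu(\mu_i, \phi)}{g(\mu_i, \phi)} - \frac{f_\mu(\mu_i, \phi)}{f(\mu_i, \phi) - 1} \right\} \frac{\partial^2 \mu_i}{\partial \beta_{\mu,j}\, \partial \beta_{\mu,k}},$$
$$J_{\beta_{\mu,j}, \phi} = J_{\phi, \beta_{\mu,j}} = -\frac{\partial^2 \ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+)}{\partial \phi\, \partial \beta_{\mu,j}} = \sum_{i=1}^{m} \left\{ -y_i^+ t_g(\beta_\mu, \phi) + t_f(\beta_\mu, \phi) \right\} \frac{\partial \mu_i}{\partial \beta_{\mu,j}},$$
$$J_{\phi, \phi} = -\frac{\partial^2 \ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+)}{\partial \phi^2} = \sum_{i=1}^{m} \left\{ -k_a(y_i^+, \phi) - y_i^+ k_g(\beta_\mu, \phi) + k_f(\beta_\mu, \phi) \right\}, \qquad J_{\beta_{\omega,j}, \beta_{\omega,k}} = -\frac{\partial^2 \ell_2(\beta_\omega \mid D)}{\partial \beta_{\omega,j}\, \partial \beta_{\omega,k}} = \sum_{i=1}^{n} v_{i,j}\, v_{i,k}\, \frac{\omega_i}{1 + e^{v_i^\top \beta_\omega}},$$
where
$$\begin{aligned} h_g(\beta_\mu, \phi) &= \frac{g_{\mu\mu}(\mu_i, \phi)}{g(\mu_i, \phi)} - \frac{g_\mu^2(\mu_i, \phi)}{g(\mu_i, \phi)^2}, & h_f(\beta_\mu, \phi) &= \frac{f_{\mu\mu}(\mu_i, \phi)}{f(\mu_i, \phi) - 1} - \frac{f_\mu^2(\mu_i, \phi)}{(f(\mu_i, \phi) - 1)^2}, \\ k_a(y_i^+, \phi) &= \frac{a_{\phi\phi}(y_i^+, \phi)}{a(y_i^+, \phi)} - \frac{a_\phi^2(y_i^+, \phi)}{a(y_i^+, \phi)^2}, & k_g(\beta_\mu, \phi) &= \frac{g_{\phi\phi}(\mu_i, \phi)}{g(\mu_i, \phi)} - \frac{g_\phi^2(\mu_i, \phi)}{g(\mu_i, \phi)^2}, \\ k_f(\beta_\mu, \phi) &= \frac{f_{\phi\phi}(\mu_i, \phi)}{f(\mu_i, \phi) - 1} - \frac{f_\phi^2(\mu_i, \phi)}{(f(\mu_i, \phi) - 1)^2}, & t_g(\beta_\mu, \phi) &= \frac{g_{\mu\phi}(\mu_i, \phi)}{g(\mu_i, \phi)} - \frac{g_\mu(\mu_i, \phi)\, g_\phi(\mu_i, \phi)}{g(\mu_i, \phi)^2}, \\ t_f(\beta_\mu, \phi) &= \frac{f_{\mu\phi}(\mu_i, \phi)}{f(\mu_i, \phi) - 1} - \frac{f_\mu(\mu_i, \phi)\, f_\phi(\mu_i, \phi)}{(f(\mu_i, \phi) - 1)^2}, \end{aligned}$$
and $g_{\mu\mu}(\mu_i, \phi) = \partial^2 g(\mu_i, \phi) / \partial \mu_i^2$, $f_{\mu\mu} = \partial^2 f(\mu_i, \phi) / \partial \mu_i^2$, $a_{\phi\phi} = \partial^2 a(y_i, \phi) / \partial \phi^2$, $g_{\phi\phi} = \partial^2 g(\mu_i, \phi) / \partial \phi^2$, $f_{\phi\phi} = \partial^2 f(\mu_i, \phi) / \partial \phi^2$, $g_{\mu\phi}(\mu_i, \phi) = \partial^2 g(\mu_i, \phi) / \partial \mu_i\, \partial \phi$, and $f_{\mu\phi} = \partial^2 f(\mu_i, \phi) / \partial \mu_i\, \partial \phi$.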

2.5. Prior Distributions

Assuming prior independence, we have $\pi_0(\beta_\mu, \phi, \beta_\omega) = \pi_0(\beta_\mu)\, \pi_0(\phi)\, \pi_0(\beta_\omega)$. We propose multivariate normal prior distributions for $\beta_\mu$ and $\beta_\omega$, i.e., $\beta_\mu \sim N(\beta_0, \sigma_0^2 I_r)$ and $\beta_\omega \sim N(\beta_1, \sigma_1^2 I_s)$, where $\beta_0$ and $\beta_1$ are hyperparameter vectors of suitable dimensions, and $\sigma_0^2$ and $\sigma_1^2$ are prior variances. We assume that $\phi \sim LN(\phi_0, \sigma_2^2)$, where $LN$ denotes the lognormal prior distribution.
We specify the prior distribution π 0 ( β μ , ϕ , β ω ) so that, even with moderate sample sizes, the information from the data outweighs the non-informative prior due to the “vague” nature of the prior knowledge.

2.6. Posterior Distribution

The likelihood function associated with the observations $D$ can be written in terms of $\ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+)$ and $\ell_2(\beta_\omega \mid D)$ as
$$L(\beta_\mu, \phi, \beta_\omega \mid D) = \exp\left\{ \ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+) + \ell_2(\beta_\omega \mid D) \right\}.$$
The joint posterior distribution is
$$\pi(\beta_\mu, \phi, \beta_\omega \mid D) \propto \exp\left\{ \ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+) + \ell_2(\beta_\omega \mid D) \right\} \pi_0(\beta_\mu)\, \pi_0(\phi)\, \pi_0(\beta_\omega). \tag{11}$$
Inference for each parameter is derived from its marginal posterior distribution, which can be obtained by integrating the remaining parameters out of $\pi(\beta_\mu, \phi, \beta_\omega \mid D)$. For the parameters $(\beta_\mu, \phi)$, the joint posterior distribution is given by
$$\pi_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+) \propto \exp\left\{ \ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+) \right\} \pi_0(\beta_\mu)\, \pi_0(\phi).$$
For the parameter $\beta_\omega$, the posterior distribution is
$$\pi_\omega(\beta_\omega \mid D) \propto \exp\left\{ \ell_2(\beta_\omega \mid D) \right\} \pi_0(\beta_\omega).$$
Under quadratic loss, we calculate the expected value and variance of the posterior densities of these parameters. Since the posterior densities $\pi(\beta_\mu, \phi \mid D)$ and $\pi(\beta_\omega \mid D)$ do not have a standard analytic form, MCMC methods are useful. We employ the highly efficient stochastic gradient HMC (SGHMC) algorithm [31]. While Bayesian estimation based on SGHMC is similar in many aspects to a traditional MCMC approach, the SGHMC algorithm enhances sampling efficiency by eliminating the acceptance–rejection step typically found in other algorithms, such as MH and HMC. This streamlining allows for faster convergence and more effective sampling.

2.7. Stochastic Gradient HMC with Friction (SGHMC) Algorithm

This section reviews the SGHMC algorithm. Let $L(\theta \mid D)$ and $\pi_0(\theta)$ respectively denote the likelihood function and prior distribution. To develop the HMC algorithm, the posterior distribution of $\theta = (\theta_1, \ldots, \theta_m)$ given a set of independent observations $D$ is written as
$$\pi(\theta \mid D) \propto \exp\left\{ -E(\theta) \right\},$$
where $E(\theta)$ is a potential energy function given by
$$E(\theta) = -\log L(\theta \mid D) - \log \pi_0(\theta).$$
The HMC method generates samples from the joint distribution of $(\theta, z)$ defined by
$$\pi(\theta, z \mid D) \propto \exp\left\{ -\left[ E(\theta) + K(z) \right] \right\},$$
where $z$ is an auxiliary latent random variable of the same dimension as $\theta$, with a Gaussian distribution whose kernel $K(z)$ represents a kinetic energy function given by
$$K(z) = \frac{z^\top M^{-1} z}{2},$$
and the mass matrix $M$ is often set to the identity matrix $I$.
Making an analogy with a physical system, the parameter vector $\theta$ denotes the generalized coordinate, and $z$ is the generalized momentum. The Hamiltonian function $H(\theta, z) = E(\theta) + K(z)$ represents the energy of the system, which is the sum of the potential energy $E(\theta)$ and kinetic energy $K(z)$.
Suppose $\tau$ refers to the iteration of the algorithm. The partial derivatives of the Hamiltonian determine how $\theta$ and $z$ change over $\tau$, according to Hamilton’s equations, i.e.,
$$\frac{d\theta}{d\tau} = \frac{\partial H(\theta, z)}{\partial z}, \tag{13}$$
$$\frac{dz}{d\tau} = -\frac{\partial H(\theta, z)}{\partial \theta}. \tag{14}$$
For any time interval of duration $\Delta\tau$, these equations define a transition from the state at time $\tau$ to the state at time $\tau + \Delta\tau$. The HMC algorithm simulates the Hamiltonian dynamics according to (13) and (14) to obtain a sequence of random samples.
Ref. [28] modified Hamilton’s equations by adding a friction term to the momentum update, leading to the equations
$$d\theta = M^{-1} z\, d\tau, \qquad dz = -\nabla E(\theta)\, d\tau - B M^{-1} z\, d\tau + N(0, 2B\, d\tau),$$
where $B$ is a diffusion matrix, and with a slight abuse of notation, the last term in the second equation denotes the introduction of a multivariate Gaussian random variable $N(0, 2B\, d\tau)$. In practice, we need an estimate $\hat{B}$. Ref. [28] introduced a specified friction term $C \succeq B$ and considered the following dynamics:
$$d\theta = M^{-1} z\, d\tau, \tag{15}$$
$$dz = -\nabla E(\theta)\, d\tau - C M^{-1} z\, d\tau + N(0, 2(C - \hat{B})\, d\tau) + N(0, 2\hat{B}\, d\tau). \tag{16}$$
Under a realistic, simple choice $\hat{B} = 0$, the momentum update simplifies to
$$dz = -\nabla E(\theta)\, d\tau - C M^{-1} z\, d\tau + N(0, 2C\, d\tau).$$
Therefore, the dynamics are governed by the controllable injected noise $N(0, 2C\, d\tau)$ and the friction term $C M^{-1}$. In practical applications, the differential Equations (15) and (16) cannot be solved analytically, and numerical methods are required. Considering the integration step size $\epsilon$, we rewrite (15) and (16) in discretized form as
$$\Delta\theta = \epsilon M^{-1} z, \tag{17}$$
$$\Delta z = -\epsilon \nabla E(\theta) - \alpha z + N(0, 2\alpha M), \tag{18}$$
where $\alpha = \epsilon C M^{-1}$. The constant $\alpha$ and the discretization step $\epsilon$ are tuning constants; $\alpha$ tends to be fixed at a small value in practice. In our implementation, we fixed $\alpha = 0.10$ and $\epsilon = 0.01$. To solve (17) and (18), we consider the leapfrog integrator method summarized in the following steps (Algorithm 1):
It should be noted that the SGHMC does not need a Metropolis–Hastings (MH) step to reach the target distribution, which avoids wasting computational processing [28]. However, SGHMC’s performance is highly sensitive to two user-specified parameters: a step size ϵ and a desired number of steps L. In particular, if L is too small, the algorithm exhibits undesirable random walk behavior, while if L is too large, the algorithm wastes computation time. We chose ϵ = 0.01 and L = 10 to guarantee the good performance of the algorithm for the problems addressed in this paper.
Algorithm 1 Simulation of the discretized SGHMC dynamics with friction
  • Starting position: $\theta^{(t)}$, step size $\epsilon$, number of steps $L$, constant $\alpha$, and sample momentum $z^{(t)} \sim N(0, M)$.
  • Initialize:
    $\theta^{(0)} = \theta^{(t)}$, $z^{(0)} = (1 - \alpha) z^{(t)} - \frac{\epsilon}{2} \nabla E(\theta^{(t)}) + N(0, 2\alpha M)$.
  • for $\tau \leftarrow 1$ to $L$ do
    $\theta^{(\tau)} = \theta^{(\tau - 1)} + \epsilon M^{-1} z^{(\tau - 1)}$
    if $\tau \ne L$: $z^{(\tau)} = (1 - \alpha) z^{(\tau - 1)} - \epsilon \nabla E(\theta^{(\tau)}) + N(0, 2\alpha M)$
  • end for
    $z^{(L)} = (1 - \alpha) z^{(L - 1)} - \frac{\epsilon}{2} \nabla E(\theta^{(L)}) + N(0, 2\alpha M)$
    return $(\theta^{(t+1)}, z^{(t+1)}) \leftarrow (\theta^{(L)}, z^{(L)})$.
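The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mass matrix is fixed at $M = I$, the full gradient of $E$ is used in place of a stochastic (mini-batch) gradient, and the target is a toy standard normal with $\nabla E(\theta) = \theta$. The function names `sghmc_transition` and `sghmc_sample` are hypothetical.

```python
import numpy as np


def sghmc_transition(theta, grad_E, rng, eps=0.01, alpha=0.10, L=10):
    """One pass of Algorithm 1 with M = I (an assumption of this sketch)."""
    d = theta.shape[0]

    def noise():
        # Draw from N(0, 2 * alpha * I).
        return rng.normal(0.0, np.sqrt(2.0 * alpha), size=d)

    z = rng.normal(size=d)                                       # z^(t) ~ N(0, I)
    z = (1.0 - alpha) * z - 0.5 * eps * grad_E(theta) + noise()  # initial half step
    for tau in range(1, L + 1):
        theta = theta + eps * z                                  # position update
        if tau != L:
            z = (1.0 - alpha) * z - eps * grad_E(theta) + noise()
    z = (1.0 - alpha) * z - 0.5 * eps * grad_E(theta) + noise()  # final half step
    return theta, z


def sghmc_sample(theta0, grad_E, n_iter, seed=0, **kwargs):
    """Run n_iter transitions and collect the positions theta^(t+1)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    draws = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        theta, _ = sghmc_transition(theta, grad_E, rng, **kwargs)
        draws[t] = theta
    return draws


# Toy target: standard normal, E(theta) = theta^2 / 2, so grad E(theta) = theta.
draws = sghmc_sample([0.0], lambda th: th, n_iter=3000, seed=1, eps=0.1)
```

The defaults follow the values $\alpha = 0.10$, $\epsilon = 0.01$, and $L = 10$ quoted in the text; the toy run uses a larger step size only so that this short chain mixes quickly. No accept/reject step appears anywhere in the transition.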

2.8. Monte Carlo Posterior Estimates

We employ the SGHMC algorithm for the posterior sampling of the parameters $\theta = (\beta_\mu, \beta_\omega, \phi)$ from the ZMPS regression, written componentwise as $\theta = (\theta_1, \ldots, \theta_m)$. Let $\{\theta^{(g)}, g = 1, \ldots, G\}$ denote the samples of $\theta$ generated from the posterior distribution $\pi(\theta \mid D)$ via the SGHMC procedure. The posterior expectation of a function $\psi(\theta)$ can be approximated by
$$E(\psi(\theta) \mid D) \approx \frac{1}{G} \sum_{g=1}^{G} \psi(\theta^{(g)}).$$
In particular, the Monte Carlo estimates for the posterior means and variances of $\theta_j$, $j = 1, \ldots, m$, are computed as
$$E(\theta_j \mid D) \approx \hat{\theta}_j = \frac{1}{G} \sum_{g=1}^{G} \theta_j^{(g)},$$
$$V(\theta_j \mid D) \approx \hat{\sigma}_{\theta_j}^2 = \frac{1}{G} \sum_{g=1}^{G} \left( \theta_j^{(g)} - \hat{\theta}_j \right)^2.$$
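These Monte Carlo estimates are simple column-wise averages over the stored draws. A minimal sketch, assuming the draws are held in a $G \times m$ array (the name `posterior_summaries` is hypothetical):

```python
import numpy as np


def posterior_summaries(draws):
    """Posterior mean and variance estimates from a (G, m) array of draws."""
    draws = np.asarray(draws, dtype=float)
    theta_hat = draws.mean(axis=0)
    # Divisor G (not G - 1), matching the variance estimate in the text.
    var_hat = ((draws - theta_hat) ** 2).mean(axis=0)
    return theta_hat, var_hat


theta_hat, var_hat = posterior_summaries([[1.0, 10.0], [3.0, 14.0]])
```

With the two toy draws above, the estimated means are 2 and 12 and the estimated variances are 1 and 4.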
Bayesian models can be evaluated and compared using various criteria; see [38] for a comprehensive overview. In this study, we employ three information criteria to select the best-fitting ZMPS regression model: the Conditional Predictive Ordinate (CPO) [39], the Expected Bayesian Information Criterion (EBIC) [40], and the Watanabe–Akaike Information Criterion (WAIC) [41]. All of these criteria can be estimated using Monte Carlo output; details are provided in Appendix C.

2.9. Posterior Information and Relative Efficiency

We use Bayesian information to quantify the efficiency loss from fitting a ZMPS model when a PS distribution could have been used. Ref. [42] originally introduced the concept of posterior information based on Fisher’s information matrix. In this work, we adapt this approach by using the observed information matrix, as the expected information matrix is intractable due to the model’s complexity.

2.10. Posterior Information

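The information about the parameters $(\beta_\mu, \beta_\omega, \phi)$ contained in the joint distribution of $y$ and $(\beta_\mu, \beta_\omega, \phi)$ is written as
$$I = E_{\pi_0(\beta_\mu, \beta_\omega, \phi)} \left[ E_{\pi_{\mathrm{ZMPS}}(y \mid \beta_\mu, \beta_\omega, \phi, x)} \left[ H(\beta_\mu, \beta_\omega, \phi, D) \right] \right], \tag{21}$$
where $H(\beta_\mu, \beta_\omega, \phi, D)$ is the observed Bayesian information matrix, given by
$$H(\beta_\mu, \beta_\omega, \phi, D) = -\begin{pmatrix} \dfrac{\partial^2 \log \pi(\beta_\mu, \beta_\omega, \phi \mid D)}{\partial \beta_\mu\, \partial \beta_\mu^\top} & \dfrac{\partial^2 \log \pi(\beta_\mu, \beta_\omega, \phi \mid D)}{\partial \beta_\mu\, \partial \phi} & 0 \\ \dfrac{\partial^2 \log \pi(\beta_\mu, \beta_\omega, \phi \mid D)}{\partial \phi\, \partial \beta_\mu^\top} & \dfrac{\partial^2 \log \pi(\beta_\mu, \beta_\omega, \phi \mid D)}{\partial \phi^2} & 0 \\ 0 & 0 & \dfrac{\partial^2 \log \pi(\beta_\mu, \beta_\omega, \phi \mid D)}{\partial \beta_\omega\, \partial \beta_\omega^\top} \end{pmatrix}. \tag{22}$$
Calculating the expected value in (21) is not always possible, so we consider the observed information matrix $H(\beta_\mu, \beta_\omega, \phi, D)$ in this paper. From (11), we can write
$$\log \pi(\beta_\mu, \beta_\omega, \phi \mid D) = \ell_{\mathrm{ZTPS}}(\beta_\mu, \phi \mid D^+) + \ell_2(\beta_\omega \mid D) + \log \pi_0(\beta_\mu, \beta_\omega, \phi) + \log(C), \tag{23}$$
where $C > 0$ is a normalizing constant. Replacing (23) in (22), we can write the block of the observed Bayesian information matrix for $\beta_\mu$ and $\phi$ as
$$I(\beta_\mu, \phi) = J(\beta_\mu, \phi) + \Pi(\beta_\mu, \phi),$$
where $J(\beta_\mu, \phi)$ is the block of the observed information matrix (10), and $\Pi(\beta_\mu, \phi)$ is the negative Hessian of the log-prior denoted by
$$\Pi(\beta_\mu, \phi) = \begin{pmatrix} \Pi_{\beta_\mu \beta_\mu} & \Pi_{\beta_\mu \phi} \\ \Pi_{\phi \beta_\mu} & \Pi_{\phi \phi} \end{pmatrix} = -\begin{pmatrix} \dfrac{\partial^2 \log \pi_0(\beta_\mu, \phi)}{\partial \beta_\mu\, \partial \beta_\mu^\top} & \dfrac{\partial^2 \log \pi_0(\beta_\mu, \phi)}{\partial \beta_\mu\, \partial \phi} \\ \dfrac{\partial^2 \log \pi_0(\phi, \beta_\mu)}{\partial \phi\, \partial \beta_\mu^\top} & \dfrac{\partial^2 \log \pi_0(\beta_\mu, \phi)}{\partial \phi^2} \end{pmatrix}.$$
The Hessian matrix $\Pi(\beta_\mu, \phi)$ represents the amount of information about $(\beta_\mu, \phi)$ contained in the proper prior distribution. Obviously $\Pi(\beta_\mu, \phi) = 0$ only when the prior is uniform [42]. For simplicity, let us consider the notation
$$I(\beta_\mu, \phi) = \begin{pmatrix} I_{\beta_\mu \beta_\mu} & b_{\beta_\mu \phi} \\ b_{\beta_\mu \phi}^\top & c_{\phi \phi} \end{pmatrix} = \begin{pmatrix} J_{\beta_\mu \beta_\mu} + \Pi_{\beta_\mu \beta_\mu} & J_{\beta_\mu \phi} + \Pi_{\beta_\mu \phi} \\ J_{\beta_\mu \phi}^\top + \Pi_{\beta_\mu \phi}^\top & J_{\phi \phi} + \Pi_{\phi \phi} \end{pmatrix},$$
where $I_{\beta_\mu \beta_\mu}$ is an $r \times r$ matrix, $b_{\beta_\mu \phi}$ is an $r \times 1$ vector, and $c_{\phi \phi}$ is a scalar.
Equation (24) defines the amount of observed information in the posterior distribution (21). We denote by $V = I^{-1}$ the inverse of the posterior observed information matrix, given by
$$V = \begin{pmatrix} V_{\beta_\mu \beta_\mu} & V_{\beta_\mu \phi} \\ V_{\phi \beta_\mu} & V_{\phi \phi} \end{pmatrix} = \begin{pmatrix} \left( I_{\beta_\mu \beta_\mu} - \dfrac{1}{c_{\phi \phi}}\, b_{\beta_\mu \phi}\, b_{\beta_\mu \phi}^\top \right)^{-1} & -\dfrac{1}{k_{\phi \phi}}\, I_{\beta_\mu \beta_\mu}^{-1}\, b_{\beta_\mu \phi} \\ -\dfrac{1}{k_{\phi \phi}}\, b_{\beta_\mu \phi}^\top\, I_{\beta_\mu \beta_\mu}^{-1} & \dfrac{1}{k_{\phi \phi}} \end{pmatrix}, \tag{25}$$
where $k_{\phi \phi} = c_{\phi \phi} - b_{\beta_\mu \phi}^\top\, I_{\beta_\mu \beta_\mu}^{-1}\, b_{\beta_\mu \phi}$. The matrix $V$ in (25) allows us to compute the relative efficiency between the parameter estimators of the ZMPS regression model and those obtained from the PS regression model when $p = 1$.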
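The partitioned inverse in (25) can be checked against a direct matrix inversion. The sketch below uses made-up numbers for the blocks; `block_inverse` is a hypothetical helper, and the symmetry of $I_{\beta_\mu \beta_\mu}$ is assumed.

```python
import numpy as np


def block_inverse(I_bb, b, c):
    """Inverse of [[I_bb, b], [b^T, c]] using the block formula in (25)."""
    I_bb_inv = np.linalg.inv(I_bb)
    k = c - b @ I_bb_inv @ b                         # Schur complement k_phiphi
    V_bb = np.linalg.inv(I_bb - np.outer(b, b) / c)  # block for beta_mu
    V_bphi = -(I_bb_inv @ b) / k                     # off-diagonal block
    V_phiphi = 1.0 / k                               # scalar block for phi
    top = np.hstack([V_bb, V_bphi[:, None]])
    bottom = np.append(V_bphi, V_phiphi)[None, :]
    return np.vstack([top, bottom])


# Illustrative positive definite information matrix (values are made up).
I_bb = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([0.5, 0.2])
c = 2.0
full = np.block([[I_bb, b[:, None]], [b[None, :], np.array([[c]])]])
V = block_inverse(I_bb, b, c)
```

Under these assumptions `V` coincides with `np.linalg.inv(full)` up to rounding, and its diagonal holds the approximate posterior variances used in the efficiency comparison.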
Additionally, ref. [43] introduced a related inequality, demonstrating that the variance of the posterior distribution is bounded. Specifically, $E(V(\theta_j \mid D)) \ge E(V_{j,j} \mid D)$. This inequality will be employed in a simulation study to assess the efficiency of Bayesian estimators derived using the SGHMC algorithm.

2.11. Bayesian Relative Efficiency Between the ZMPS and PS Models

When fitting a ZMPS model to a dataset, we can assess whether the standard PS model associated with the ZMPS model is more appropriate by testing the hypothesis $H_0: p = 1$. Tests such as the FBST [44] can be used to assess the evidence in favor of $H_0$. However, Result 1 ensures that the parameters $\theta = (\mu, \phi)$ can be estimated considering only the positive observations $y^{+}$ present in the dataset and by fitting the ZTPS model associated with the PS model. Using only $y^{+}$ with the ZTPS distribution can result in a loss of efficiency in the estimates $\hat{\theta} = (\hat{\mu}, \hat{\phi})$ when the true distribution of the data is a standard PS distribution. To evaluate this loss of efficiency, we consider the Laplace approximation for the posterior density $\pi_k(\theta_k \mid D^{(k)})$, where $k = 0$ refers to the posterior density $\pi_{PS}(\theta_0 \mid D)$ and $k = 1$ to $\pi_{ZTPS}(\theta_1 \mid D^{+})$. Suppose the Laplace method applies. Ref. [45] proves that, under some conditions on $\pi_k(\theta_k \mid D^{(k)})$ and for sufficiently large $n$, the posterior distribution can be approximated by a multivariate normal distribution $N(\theta_k^{*}, V^{(k)}(\theta_k^{*}))$, where $\theta_k^{*}$ denotes the mode of the posterior distribution, i.e., the value of $\theta_k$ at which $\log \pi_k(\theta_k \mid D^{(k)})$ achieves its maximum, such that
$$\left. \frac{\partial \log \pi_k(\theta_k \mid D^{(k)})}{\partial \theta_k} \right|_{\theta_k = \theta_k^{*}} = 0,$$
and $V^{(k)}(\theta_k^{*})$ is the inverse of the posterior observed information matrix given by (25), evaluated at the mode $\theta_k^{*}$.
For ZTPS and most PS models, the mode $\theta_k^{*}$ can only be obtained numerically. To calculate $\theta_k^{*}$, we consider the Newton–Raphson iterative method, such that
$$\theta_k^{(j+1)} = \theta_k^{(j)} + V(\theta_k^{(j)}) \, \nabla \log \pi_k(\theta_k^{(j)} \mid D^{(k)}), \tag{26}$$
where $\nabla \log \pi_k(\theta_k \mid D^{(k)}) = \partial \log \pi_k(\theta_k \mid D^{(k)}) / \partial \theta_k$. As an initial condition for (26), we take the Monte Carlo estimate $\hat{\theta}_k$, i.e., the componentwise mean of the sample generated from the posterior distribution via MCMC. This initial condition is reasonable since $\theta_k^{*}$ and $\hat{\theta}_k$ should be close.
Let $\theta = (\beta_\mu, \phi) = (\theta_1, \ldots, \theta_d)$ be the parameter vector of the ZTPS model, and let $\theta_0^{*} = (\theta_{0,1}^{*}, \ldots, \theta_{0,d}^{*})$ and $\theta_1^{*} = (\theta_{1,1}^{*}, \ldots, \theta_{1,d}^{*})$ be the two estimates of this vector obtained with the PS and ZTPS models, respectively. To evaluate the loss of efficiency caused by using Result 1, we propose a Bayesian relative efficiency (BRE) measure, defined by
$$\mathrm{BRE}(\hat{\theta}_{0,i}, \hat{\theta}_{1,i}) = \frac{V_{i,i}^{(0)}}{V_{i,i}^{(1)}}. \tag{27}$$
Appendix A presents the computation of the loss of efficiency in estimating μ and ϕ assessed by (27) under two scenarios in which we fit the proposed ZMPS (or ZTPS) models when the observations come from a PS distribution.

3. Results

3.1. Simulation Study

We present a simulation study to evaluate Bayesian estimation of the ZMPS class of models using the SGHMC algorithm. In this study, we considered samples $y = (y_1, \ldots, y_n)$ from the ZMP (Poisson), ZMNB (negative binomial), and ZMGP (generalized Poisson) distributions, with sample size $n = 100$, under the link functions
$$\log(\mu_i) = \beta_{\mu,1} + \beta_{\mu,2} x_i, \qquad \operatorname{logit}(\omega_i) = \beta_{\omega,1} + \beta_{\omega,2} x_i,$$
where $x_i \sim U(0, 1)$, $i = 1, \ldots, n$. We consider independent prior distributions: normal $N(\beta_{\mu 0}, \sigma_0^2)$ with $\beta_{\mu 0} = 0$, using $\sigma_0^2 = 1$ for the informative prior and $\sigma_0^2 = 100$ for the vague prior, and a gamma distribution $G(a_\phi, b_\phi)$ for $\phi$. Let $\theta = (\beta_{\mu,1}, \beta_{\mu,2}, \beta_{\omega,1}, \beta_{\omega,2}, \phi)$.
In this simulation study, we conducted a sensitivity analysis to evaluate the effect of initial conditions on the convergence of the generated chains. We tested three chains with initial conditions based on the prior mean: the prior mean itself, (i) 1.4 times the mean (i.e., plus 40%), and (ii) 0.4 times the mean (i.e., minus 60%). After verifying the burn-in period and ensuring the method’s convergence, we selected the mean of the prior distribution as the reference point for subsequent analysis.
We employed the SGHMC algorithm to generate 50,000 samples of θ from its full posterior distribution, discarding the first 50% as burn-in. This resulted in a chain consisting of 25,000 samples of θ . The final sample was used to calculate the Monte Carlo estimates of each model parameter’s posterior mean and variance, as given by (19) and (20). We repeated this procedure M = 100 times. With M estimates of the model parameters, we evaluated their performance using metrics such as mean bias (B), the ratio of mean squared error (MSE) to variance (Var), and the average Bayesian efficiency (BE), calculated for each model parameter θ j :
$$B(\hat{\theta}_j) = \frac{1}{M} \sum_{k=1}^{M} \left( \hat{\theta}_j^{(k)} - \theta_j \right), \tag{28}$$
$$MSE(\hat{\theta}_j) = \frac{1}{M} \sum_{k=1}^{M} \left( \hat{\theta}_j^{(k)} - \theta_j \right)^2, \tag{29}$$
$$BE(\hat{\theta}_j) = \frac{1}{M} \sum_{k=1}^{M} \frac{V_{jj}^{(k)}}{\hat{\sigma}_{\theta_j}^{2\,(k)}}, \tag{30}$$
where $V_{jj}^{(k)}$ is given by (25) and $\hat{\sigma}_{\theta_j}^{2\,(k)}$ is given by (20) for each $\hat{\theta}^{(k)}$, $k = 1, \ldots, M$. Additionally, we computed the coverage probability ($CP$) of the Bayesian credible intervals, expressed as a percentage:
$$CP(\theta_j) = \frac{100}{M} \sum_{k=1}^{M} \delta_{\theta}^{(k)}, \tag{31}$$
where $\delta_{\theta}^{(k)}$ equals 1 if the $k$th Bayesian credible interval (BCI) contains the true value $\theta_j$ and 0 otherwise.
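The four summary metrics in (28)–(31) are simple averages over the $M$ replications. A minimal sketch follows, with made-up toy numbers standing in for the replicated estimates; the function names and the toy inputs are illustrative assumptions, not the code or results of the study.

```python
def bias(estimates, true_value):
    """Mean bias B: average of (estimate - true value) over replications."""
    return sum(e - true_value for e in estimates) / len(estimates)


def mse(estimates, true_value):
    """Mean squared error over replications."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def bayesian_efficiency(lower_bounds, posterior_variances):
    """Average ratio V_jj / posterior variance over replications."""
    ratios = [v / s2 for v, s2 in zip(lower_bounds, posterior_variances)]
    return sum(ratios) / len(ratios)


def coverage_probability(intervals, true_value):
    """Percentage of credible intervals (lower, upper) containing the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return 100.0 * hits / len(intervals)


# Toy replications for one parameter whose true value is 1.0 (made-up numbers).
estimates = [0.9, 1.1, 1.2, 0.8]
intervals = [(0.5, 1.3), (0.7, 1.5), (1.05, 1.6), (0.4, 1.2)]
toy_bias = bias(estimates, 1.0)
toy_mse = mse(estimates, 1.0)
toy_cp = coverage_probability(intervals, 1.0)
toy_be = bayesian_efficiency([0.8, 0.9], [1.0, 1.0])
```

Because $V_{jj}$ acts as a lower bound for the posterior variance, values of BE close to 1 indicate that the posterior variance is close to that bound.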
A summary of the M = 100 simulations is shown in Table 2. In this summary, we can see the effect of the prior information on the calculation of the standard deviation (s.d.) and the amplitude of the highest posterior density (HPD) credibility interval for each parameter.
As previously discussed, the performance of the estimators using the metrics in (28)–(31) is presented in Table 3. The results show improved efficiency of the estimators when the prior distribution is more informative.

3.2. Application: Number of Femicide Cases

Violence against women encompasses a wide range of physical, psychological, sexual, and property-related attacks, often occurring on a continuum that can tragically culminate in murder, the most extreme manifestation of violence inflicted upon women.
Femicide refers to the murder of a woman specifically because she is a woman, typically driven by hatred, contempt, or a sense of lost control or ownership over women. Brazil’s Feminicide Law (Law 13.104 of 9 March 2015) classifies homicide as femicide when committed against a woman due to her gender. The law defines such crimes as those involving domestic or family violence, contempt, or discrimination against women and also includes femicide in the category of heinous crimes.
Of the five measures to combat gender-based violence recommended by the United Nations, Brazil has implemented only one: online services. This includes the “Red Light” campaign launched by the National Council of Justice and the Brazilian Association of Magistrates, which functions similarly to emergency alert systems. However, there has yet to be an assessment of the campaign’s impact on protecting women in violent situations. Factors contributing to the increase in gender-based violence include victims’ difficulty in reporting, a reduction in crime reporting at police stations, and a decline in the number of emergency protective measures granted to women.
This study aims to determine whether femicides increased during the COVID-19 pandemic. We analyze data from 642 municipalities in São Paulo for the years 2019 and 2020 [46], fitting ZMPS regression models with the Municipal Human Development Index (MHDI) as an explanatory variable. The MHDI, a key measure of municipal development in Brazil, evaluates three dimensions of human development: longevity, education, and income. It ranges from 0 to 1, with higher values indicating greater development. By interpreting the mean μ i Z M P S of the model and the parameter p i (which accounts for zero inflation or deflation) as functions of the MHDI, we gain insights into femicide patterns. A summary of the sample is presented in Table 4.
We apply the proposed Bayesian approach to fit ZMPS models to the femicide data, using the MHDI as a regressor. Specifically, we consider the zero-modified Poisson (ZMP), zero-modified negative binomial (ZMNB), zero-modified generalized Poisson (ZMGP), and zero-modified COM-Poisson (ZMCOMP) distributions. For the parameters $\beta_\mu$ and $\beta_\omega$, we use normal priors $N(0, 10^2)$, while for $\phi$, we apply a gamma prior $G(a_\phi, b_\phi)$ with $a_\phi / b_\phi = 5$ and $a_\phi / b_\phi^2 = 10^2$. We employ three information criteria, LogCPO (A8), EBIC (A9), and WAIC (A10), to identify the distribution that best fits the data, with higher LogCPO values and lower WAIC and EBIC values indicating better model performance. The values of these criteria for each fitted model are presented in Table 5, with the selected model for each sample highlighted in bold.
The three criteria identify the ZMNB model as the best fit for both 2019 and 2020. Posterior summaries and credible intervals for the parameters μ i , ϕ , and p i are presented in Table 6. The results are derived using three Markov chains, each containing 100,000 samples, initialized with distinct starting conditions. To minimize the effect of the burn-in period, we discard 25% of each chain and select every 75th value from the remaining 75% (thinning), which helps reduce autocorrelation within the chains. We use the resulting final sample of 3000 observations to compute the Bayesian Monte Carlo estimates. The convergence details of the SGHMC algorithm are provided in Appendix B. Figure 1 displays the estimated mean μ i ( Z M P S ) and the parameter p i for the ZMNB models based on data from 2019 and 2020.
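The arithmetic behind the final sample size can be made explicit. The sketch below uses placeholder chains (lists of integers rather than SGHMC draws) and a hypothetical helper `burn_and_thin`; it only illustrates how three chains of 100,000 draws, a 25% burn-in, and a thinning interval of 75 lead to 3000 retained values.

```python
def burn_and_thin(chain, burn_fraction=0.25, thin=75):
    """Drop the initial burn-in fraction, then keep every `thin`-th draw."""
    start = int(len(chain) * burn_fraction)
    return chain[start::thin]


# Placeholder chains: integers stand in for posterior draws.
chains = [list(range(100_000)) for _ in range(3)]
per_chain = [burn_and_thin(chain) for chain in chains]
final_sample = [draw for kept in per_chain for draw in kept]
# Each chain keeps 75,000 / 75 = 1000 draws, so the pooled sample has 3000.
```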
In Figure 1a, we examine the mean number of reported femicides in relation to the MHDI. The graphs demonstrate that the mean number of notifications increases with the MHDI and is slightly higher during the COVID-19 pandemic. This trend indicates that municipalities with higher levels of development tend to report a greater mean number of femicides. This pattern is anticipated as more developed municipalities are likely to offer better protective measures for women and more effective crime reporting systems. The increase in reported cases in 2020, compared with the averages in 2019, can be attributed to the impacts of the pandemic.
The $p_i$ parameters are depicted as a function of the MHDI in Figure 1b. These parameters help interpret the frequency of zero observations within the sample. Typically, municipalities with lower mean notification numbers, $\mu_i$, are expected to exhibit zero inflation, meaning $p_i < 1$. However, Figure 1b shows that for municipalities with an MHDI below 0.7 (i.e., those with low average $\mu_i$), the estimates of $p_i$ exceed 1. This indicates a deflation of zero observations at this level of the covariate, suggesting that municipalities with a low MHDI experience less zero inflation than expected. In fact, they report more cases than the standard model would predict ($p_i = 1$). Furthermore, the increase in registrations during 2020, marked by a rise in zero deflation, can be attributed to the impact of the COVID-19 pandemic, as seen in the comparison of $p_i$ estimates between 2019 and 2020.

4. Conclusions

The ZMPS distribution is constructed by modifying the zero probability of standard PS distributions. This flexible model accommodates count data with varying characteristics related to the frequency of zeros, including zero inflation and deflation, as well as different types of dispersion, such as overdispersion, equidispersion, and underdispersion. This property is particularly significant for ZMPS regression models, where zero inflation or deflation can manifest at different levels of covariates. As a result, the ZMPS class can be applied to model count datasets without requiring prior knowledge of zero inflation or deflation. Importantly, the ZMPS distribution includes PS distributions as a specific case.
This paper presents a Bayesian approach using the SGHMC algorithm to fit a ZMPS regression model. We demonstrate that estimates of μ i and ϕ can be obtained by focusing exclusively on the positive observations in a dataset and assuming a ZTPS distribution for these data. Based on this approach, we can evaluate relative efficiency losses by considering a measure of Bayesian information across different levels of prior information.
The estimation of the parameter $p_i$ enables the characterization of zero inflation ($0 < p_i < 1$) or zero deflation ($1 < p_i < 1 / (1 - \pi_{PS}(0; \mu_i, \phi))$). This parameter plays a key role in analyzing and interpreting the data, as discussed in Section 3.2. The ability to estimate $p_i$ is a significant strength of the ZMPS regression models.
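To illustrate how $p_i$ separates the two regimes, the sketch below uses the Poisson member of the family and assumes the usual zero-modified parameterization, $\pi_{ZMPS}(0) = 1 - p\,(1 - \pi_{PS}(0))$ and $\pi_{ZMPS}(y) = p\,\pi_{PS}(y)$ for $y > 0$, which is consistent with the bounds on $p_i$ stated above. The function names are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    """Standard Poisson probability mass function."""
    return math.exp(-mu) * mu**y / math.factorial(y)


def p_upper_bound(mu):
    """Largest admissible p: 1 / (1 - pi_PS(0)), which removes all zeros."""
    return 1.0 / (1.0 - poisson_pmf(0, mu))


def zmp_pmf(y, mu, p):
    """Zero-modified Poisson pmf under the assumed parameterization."""
    if not 0.0 <= p <= p_upper_bound(mu):
        raise ValueError("p must lie in [0, 1 / (1 - pi_PS(0))]")
    if y == 0:
        return 1.0 - p * (1.0 - poisson_pmf(0, mu))
    return p * poisson_pmf(y, mu)


def zero_modification(mu, p):
    """Classify the type of zero modification implied by p."""
    if not 0.0 < p < p_upper_bound(mu):
        raise ValueError("p must lie strictly between 0 and 1 / (1 - pi_PS(0))")
    if p < 1.0:
        return "zero inflation"
    if p > 1.0:
        return "zero deflation"
    return "no modification (standard Poisson)"


total_mass = sum(zmp_pmf(y, 2.0, 1.1) for y in range(60))
```

With $\mu = 2$, for example, the upper bound is $1/(1 - e^{-2}) \approx 1.157$, so $p = 1.1$ is admissible and yields fewer zeros than the Poisson model, while $p = 0.5$ yields more.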
Finally, as a practical application of the proposed ZMPS model, we analyzed the number of femicides in 642 municipalities in São Paulo for the years 2019 and 2020. In this dataset, zero observations are particularly significant, as the counts represent the number of women murdered. We fitted ZMPS regression models to these counts, utilizing the Municipal Human Development Index (MHDI) as an explanatory variable and estimating the p i parameter. This allowed for a more comprehensive analysis of the zero observations.
A limitation of the proposed model is the significant computational time and effort required to assess the impact of different initial conditions for a given dataset and to determine the appropriate chain length to ensure convergence and efficiency of the Monte Carlo estimators for the model parameters. Furthermore, the computational cost of fitting multiple models and applying a model selection criterion to identify the best PS distribution must be considered. In this study, we have used three criteria, LogCPO, EBIC, and WAIC, for Bayesian model selection.
A formulation of ZMPS regression models for longitudinal datasets that consider spatial correlations represents a natural extension for future research on the models presented in this paper.

Author Contributions

Conceptualization, K.S.C., M.G.A., V.H.L. and N.R.; Methodology, K.S.C., M.G.A., V.H.L. and N.R.; Formal analysis, K.S.C., M.G.A., V.H.L. and N.R.; Writing—review & editing, K.S.C., M.G.A., V.H.L. and N.R. All authors have read and agreed to the published version of the manuscript.

Funding

Marinho G. Andrade is supported by the Brazilian organization FAPESP (2019/21766-8); Katiane S. Conceição is supported by the Brazilian organization FAPESP (2019/22412-5).

Data Availability Statement

The data presented in this study are openly available at [statista] https://www.statista.com/statistics/1102041/number-femicides-brazil-state/ (accessed on 4 January 2020).

Acknowledgments

The authors are grateful to the referees for their useful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Bayesian Relative Efficiency: Two Examples

  • Scenario 1: Large sample size: In the case of a large sample size, where the data information outweighs that of the prior densities, this scenario is referred to as the asymptotic case ($n \to \infty$). To assess the loss of efficiency when fitting the proposed ZMPS (or ZTPS) models to observations from a PS distribution, we consider Equation (27) under this assumption.
  • Scenario 2: Small sample size: In this scenario, the goal is to evaluate the impact of prior information about the model parameters on the loss of efficiency. To achieve this, we consider a small sample size (sufficient for fitting the ZMPS models) and two prior densities that differ only in their level of information (prior variance).

Appendix A.1. Poisson Distribution (P) Versus ZMP (Or ZTP) Distribution

In this example, we consider the Poisson distribution $P(y \mid \mu)$ without the regression model for simplicity ($\theta = \mu$). We use (27) to assess the effect of the information level of the prior distribution and the effect of the sample size on the loss of efficiency when fitting a more general ZMP (or ZTP) model to data coming from a Poisson distribution (P).
Let $D = y = (y_1, \ldots, y_n)$ be the observed dataset coming from a Poisson distribution $P(y \mid \mu) = e^{-\mu} \mu^{y} / y!$. Let $D^{+} = y^{+} = (y_1^{+}, \ldots, y_m^{+})$ be the positive observations in $D$ ($D^{+} \subset D$). Considering the Poisson distribution with parameter $\mu$ and a gamma prior density $G(\alpha, \beta)$ for $\mu$, we can write the posterior density as
$$\pi_0(\mu \mid D) \propto \mu^{n \bar{y} + \alpha - 1} e^{-(n + \beta) \mu},$$
where $\bar{y}$ is the sample mean, and the mode of the posterior distribution is given by
$$\mu_0^{*} = \frac{n \bar{y} + \alpha - 1}{n + \beta}.$$
The observed posterior information is given by
$$I^{(0)}(\mu) = -\frac{\partial^2 \log \pi_0(\mu \mid D)}{\partial \mu^2} = \frac{n \bar{y} + \alpha - 1}{\mu^2},$$
and the inverse of the posterior observed information matrix, $V^{(0)}(\mu) = I^{(0)\,-1}(\mu)$, evaluated at $\mu_0^{*}$ is given by
$$V^{(0)}(\mu_0^{*}) = \frac{n \bar{y} + \alpha - 1}{(n + \beta)^2}.$$
Remark A1.
Note that the posterior variance is $\sigma_0^2 = (n \bar{y} + \alpha) / (n + \beta)^2$ and $V^{(0)}(\mu_0^{*}) = \sigma_0^2 - 1 / (n + \beta)^2$. So $V^{(0)}(\mu_0^{*}) \le \sigma_0^2$. Therefore, paralleling the classic Cramér–Rao inequality, we can see that $V^{(0)}(\mu_0^{*})$ provides a posterior Cramér–Rao-type lower bound for the posterior variance.
Let $D^{+}$ be the positive observations in $D$, and let the ZTP model be
$$P(y^{+} \mid \mu) = \frac{e^{-\mu} \mu^{y^{+}}}{y^{+}! \, (1 - e^{-\mu})}.$$
Considering a gamma prior density $G(\alpha, \beta)$ for $\mu$, the posterior density is given by
$$\pi_1(\mu \mid D^{+}) \propto \mu^{m \bar{y}^{+} + \alpha - 1} e^{-(m + \beta) \mu} (1 - e^{-\mu})^{-m},$$
where $\bar{y}^{+}$ is the sample mean of the positive observations, and the mode of the posterior distribution can be found by solving numerically the following equation:
$$\frac{\partial \log \pi_1(\mu \mid D^{+})}{\partial \mu} = \frac{m \bar{y}^{+} + \alpha - 1}{\mu} - (m + \beta) - \frac{m}{e^{\mu} - 1} = 0.$$
The observed posterior information for the ZTP distribution is given by
$$I^{(1)}(\mu) = -\frac{\partial^2 \log \pi_1(\mu \mid D^{+})}{\partial \mu^2} = \frac{m \bar{y}^{+} + \alpha - 1}{\mu^2} - \frac{m e^{\mu}}{(e^{\mu} - 1)^2},$$
while the inverse of the posterior observed information matrix, $V^{(1)}(\mu) = I^{(1)\,-1}(\mu)$, is given by
$$V^{(1)}(\mu) = \frac{\mu^2 (e^{\mu} - 1)^2}{(m \bar{y}^{+} + \alpha - 1)(e^{\mu} - 1)^2 - m \mu^2 e^{\mu}}.$$
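Considering the mode of the posterior distribution $\mu_1^{*}$ calculated via the Newton–Raphson procedure (26), we can evaluate the loss of relative efficiency $\mathrm{BRE}(\hat{\mu}_0, \hat{\mu}_1)$ given in (27) when only the positive dataset $D^{+}$ and the zero-truncated Poisson distribution are considered. Dividing $V^{(0)}(\mu_0^{*}) = \mu_0^{*2} / (n \bar{y} + \alpha - 1)$ by $V^{(1)}(\mu_1^{*})$ gives
$$\mathrm{BRE}(\hat{\mu}_0, \hat{\mu}_1) = \left( \frac{\mu_0^{*}}{\mu_1^{*}} \right)^2 \left[ \frac{m \bar{y}^{+} + \alpha - 1}{n \bar{y} + \alpha - 1} - \frac{m \mu_1^{*2} e^{\mu_1^{*}}}{(n \bar{y} + \alpha - 1)(e^{\mu_1^{*}} - 1)^2} \right]. \tag{A1}$$
Since $n \bar{y} = m \bar{y}^{+}$, we can write (A1) as
$$\mathrm{BRE}(\hat{\mu}_0, \hat{\mu}_1) = \left( \frac{\mu_0^{*}}{\mu_1^{*}} \right)^2 \left[ 1 - \frac{m \mu_1^{*2} e^{\mu_1^{*}}}{(m \bar{y}^{+} + \alpha - 1)(e^{\mu_1^{*}} - 1)^2} \right]. \tag{A2}$$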
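The steps of this example (closed-form Poisson mode, Newton–Raphson for the ZTP mode, and the ratio of inverse informations) can be sketched as follows. This is an illustrative sketch only: the toy counts and the prior values $\alpha = 2$, $\beta = 1$ are made up, the function names are hypothetical, and the Newton–Raphson iteration is started at the Poisson mode rather than at an MCMC posterior mean.

```python
import math


def poisson_posterior_mode(total, n, alpha, beta):
    """Mode of the Poisson-gamma posterior: (n*ybar + alpha - 1) / (n + beta)."""
    return (total + alpha - 1.0) / (n + beta)


def ztp_score(mu, total, m, alpha, beta):
    """Derivative of the ZTP log-posterior with respect to mu."""
    return (total + alpha - 1.0) / mu - (m + beta) - m / math.expm1(mu)


def ztp_information(mu, total, m, alpha):
    """Observed posterior information I^(1)(mu) for the ZTP model."""
    return (total + alpha - 1.0) / mu**2 - m * math.exp(mu) / math.expm1(mu) ** 2


def ztp_posterior_mode(total, m, alpha, beta, start, tol=1e-12, max_iter=100):
    """Newton-Raphson update mu <- mu + V(mu) * score(mu)."""
    mu = start
    for _ in range(max_iter):
        step = ztp_score(mu, total, m, alpha, beta) / ztp_information(mu, total, m, alpha)
        new_mu = mu + step
        if new_mu <= 0.0:      # safeguard: keep the iterate positive
            new_mu = mu / 2.0
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def bre_poisson_vs_ztp(y, alpha, beta):
    """Return (BRE, Poisson mode, ZTP mode) for counts y; requires alpha > 1."""
    n = len(y)
    m = sum(1 for value in y if value > 0)
    total = sum(y)             # n*ybar equals m*ybar_plus: zeros add nothing
    mu0 = poisson_posterior_mode(total, n, alpha, beta)
    mu1 = ztp_posterior_mode(total, m, alpha, beta, start=mu0)
    v0 = mu0**2 / (total + alpha - 1.0)
    v1 = 1.0 / ztp_information(mu1, total, m, alpha)
    return v0 / v1, mu0, mu1


y = [0, 1, 2, 3, 0, 2, 1, 4, 2, 3]   # toy counts, not data from the study
bre, mu0, mu1 = bre_poisson_vs_ztp(y, alpha=2.0, beta=1.0)
```

For these toy counts the ratio falls below 1, reflecting the information lost by discarding the zeros; the same value is obtained from the closed form (A2) evaluated at the two modes.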

Prior Information

In this example, let us consider a gamma prior distribution G ( α , β ) for μ , i.e.,
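$$\pi_0(\mu) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \mu^{\alpha - 1} e^{-\beta \mu}, \qquad \alpha > 1, \ \beta > 0.$$
Let $\mu^{*} = (\alpha - 1) / \beta$ represent the mode of the prior density. The Bayesian information conveyed by the prior density, evaluated at this mode, is
$$I_{\mu \mu}^{(\pi_0)}(\mu^{*}) = \left. -\frac{\partial^2 \log \pi_0(\mu)}{\partial \mu^2} \right|_{\mu = \mu^{*}} = \frac{\alpha - 1}{\mu^{*2}} = \frac{\beta^2}{\alpha - 1}.$$
The prior Bayesian information for $\mu$ is proportional to $\beta^2$, meaning that the rate parameter $\beta$ of the prior density is linked to the accuracy of the prior information about $\mu$.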
A simulation study was conducted to assess the effect of prior information and sample size (n) on the loss of relative efficiency (BRE), as calculated by (A2).
Figure A1a illustrates the asymptotic Bayesian relative efficiency (BRE) as a function of the parameters of the gamma prior distribution G ( α , β ) for β = 5 and β = 50 with α = β μ for μ = 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 .
In Figure A1a, we consider $n = 300$. The figure illustrates that the information provided by the data outweighs that from the prior densities, resulting in BRE curves that are nearly identical for both prior densities. It also shows that the ZTP estimator is nearly fully efficient for $\mu \ge 6$. Figure A1b,c illustrate the sample Bayesian relative efficiency (BRE).
The individual points were derived from simulated Poisson samples of size $n = 30$ for each true parameter $\mu$. The BRE was calculated using (A2). For $\mu \ge 5$, the sample BRE (with $n = 30$) and the asymptotic BRE (with $n = 300$) are nearly identical when $\beta = 5$ and $\beta = 50$.
However, as shown in Figure A1b,c, prior information enhances Bayesian efficiency for $\mu \le 5$.
Figure A1. Bayesian relative efficiency of the ZTP estimation if a Poisson distribution is true: (a) asymptotic BRE ( n = 300 ); (b) Sample BRE for β = 5 ( n = 30 ); (c) Sample BRE for β = 50 ( n = 30 ).

Appendix A.2. Negative Binomial Distribution (NB) Versus ZMNB (Or ZTNB) Distribution

Using (24) and (25), we can assess the loss of efficiency in estimating μ and ϕ when observations are drawn from a traditional negative binomial distribution, but parameter estimates are obtained from the ZTNB distribution. From (25), we can express the lower bounds for the variance of the Bayesian estimates of μ and ϕ , considering only the positive samples and the ZTNB distribution, as follows:
$$V_{\mu \mu}^{(ZTNB)} = \frac{J_{\phi \phi} + \Pi_{\phi \phi}}{(J_{\mu \mu} + \Pi_{\mu \mu})(J_{\phi \phi} + \Pi_{\phi \phi}) - (J_{\mu \phi} + \Pi_{\mu \phi})^2}, \tag{A3}$$
$$V_{\phi \phi}^{(ZTNB)} = \frac{J_{\mu \mu} + \Pi_{\mu \mu}}{(J_{\mu \mu} + \Pi_{\mu \mu})(J_{\phi \phi} + \Pi_{\phi \phi}) - (J_{\mu \phi} + \Pi_{\mu \phi})^2}. \tag{A4}$$
The elements of the matrix $J(\mu, \phi)$ are given by
$$J_{\mu \mu} = \sum_{i=1}^{m} \left\{ -y_i^{+} h_g(\mu, \phi) + h_f(\mu, \phi) \right\}, \quad J_{\mu \phi} = J_{\phi \mu} = \sum_{i=1}^{m} \left\{ -y_i^{+} t_g(\mu, \phi) + t_f(\mu, \phi) \right\}, \quad J_{\phi \phi} = \sum_{i=1}^{m} \left\{ -k_a(y_i^{+}, \phi) - y_i^{+} k_g(\mu, \phi) + k_f(\mu, \phi) \right\}.$$
The evaluation of (A3) and (A4) depends on the choice of the prior density $\pi_0(\mu, \phi)$. Here, we illustrate this assessment by considering the loss in efficiency through the BRE, using independent gamma prior distributions $G(\alpha, \beta)$ for both $\mu$ and $\phi$.
Using a similar procedure, we derived the observed information matrix for the parameters μ and ϕ of the traditional NB distribution. We denote this matrix by
$$J^{(NB)}(\mu, \phi) = \begin{pmatrix} J_{\mu \mu}^{(NB)} & J_{\mu \phi}^{(NB)} \\ J_{\phi \mu}^{(NB)} & J_{\phi \phi}^{(NB)} \end{pmatrix},$$
and its elements are given by
$$J_{\mu \mu}^{(NB)} = \sum_{i=1}^{n} \left\{ -y_i h_g(\mu, \phi) + h_f^{(NB)}(\mu, \phi) \right\},$$
where
$$h_f^{(NB)}(\mu, \phi) = \frac{f_{\mu \mu}(\mu, \phi)}{f(\mu, \phi)} - \frac{f_{\mu}^2(\mu, \phi)}{f^2(\mu, \phi)},$$
$$J_{\mu \phi}^{(NB)} = J_{\phi \mu}^{(NB)} = \sum_{i=1}^{n} \left\{ -y_i t_g(\mu, \phi) + t_f^{(NB)}(\mu, \phi) \right\},$$
where
$$t_f^{(NB)}(\mu, \phi) = \frac{f_{\mu \phi}(\mu, \phi)}{f(\mu, \phi)} - \frac{f_{\mu}(\mu, \phi) f_{\phi}(\mu, \phi)}{f^2(\mu, \phi)},$$
$$J_{\phi \phi}^{(NB)} = \sum_{i=1}^{n} \left\{ -k_a(y_i, \phi) - \mu \, k_g(\mu, \phi) + k_f^{(NB)}(\mu, \phi) \right\},$$
where
$$k_f^{(NB)}(\mu, \phi) = \frac{f_{\phi \phi}(\mu, \phi)}{f(\mu, \phi)} - \frac{f_{\phi}^2(\mu, \phi)}{f^2(\mu, \phi)}.$$
From the inverse Bayesian information matrix V for the NB distribution, we can derive the variability of the posterior distribution for μ and ϕ as follows:
$$V_{\mu \mu}^{(NB)} = \frac{J_{\phi \phi}^{(NB)} + \Pi_{\phi \phi}}{(J_{\mu \mu}^{(NB)} + \Pi_{\mu \mu})(J_{\phi \phi}^{(NB)} + \Pi_{\phi \phi}) - (J_{\mu \phi}^{(NB)} + \Pi_{\mu \phi})^2}, \tag{A5}$$
$$V_{\phi \phi}^{(NB)} = \frac{J_{\mu \mu}^{(NB)} + \Pi_{\mu \mu}}{(J_{\mu \mu}^{(NB)} + \Pi_{\mu \mu})(J_{\phi \phi}^{(NB)} + \Pi_{\phi \phi}) - (J_{\mu \phi}^{(NB)} + \Pi_{\mu \phi})^2}. \tag{A6}$$
By considering the Bayesian relative efficiency (BRE), we can assess the loss of efficiency in estimating $\mu$ and $\phi$ when observations are drawn from a standard NB distribution, but parameter estimates are obtained from the ZTNB distribution. Using (A3)–(A6), the BRE is given by
$$\mathrm{BRE}(\mu) = \frac{V_{\mu \mu}^{(NB)}}{V_{\mu \mu}^{(ZTNB)}} \quad \text{and} \quad \mathrm{BRE}(\phi) = \frac{V_{\phi \phi}^{(NB)}}{V_{\phi \phi}^{(ZTNB)}}.$$
Figure A2a displays the BRE for $\mu$ in the asymptotic case, approximated here by a large sample ($n = 1000$). Figure A2b,c compare the asymptotic BRE ($n \to \infty$) with empirical BRE values calculated for $n = 50$. Figure A2b shows the BRE with a less informative prior density ($\beta = 5$), where the curve closely matches that obtained for the larger sample size ($n = 1000$). In Figure A2c, a more informative prior density ($\beta = 50$) is used, showing that prior information improves the BRE compared with that provided by the large sample size ($n = 1000$).
The plots in Figure A3 demonstrate that prior information enhances the BRE compared with that provided by a large sample size. Figure A3a displays the BRE for ϕ when considering an infinite sample size ( n = 1000 ). Figure A3b,c compare the asymptotic BRE ( n = 1000 ) with empirical BRE values calculated for n = 50 .
As shown in Figure A2 and Figure A3, for both parameters μ and ϕ of the negative binomial distribution, there is virtually no loss of efficiency from using the zero-truncated distribution when μ > 6 and ϕ > 4 . However, the prior densities can still enhance the efficiency of estimating these parameters.
Figure A2. Bayesian relative efficiency of the ZTNB estimation for μ if a negative binomial distribution is true: (a) asymptotic BRE ( μ ); (b) empirical BRE ( μ ) for β = 5 ; (c) empirical BRE ( μ ) for β = 50 .
Figure A3. Bayesian relative efficiency of the ZTNB estimation for $\phi$ if a negative binomial distribution is true: (a) asymptotic BRE($\phi$); (b) empirical BRE($\phi$) for $\beta = 5$; (c) empirical BRE($\phi$) for $\beta = 50$.

Appendix B. Results of the SGHMC Algorithm Applied to the Data in Section 3.2

The results were obtained using the SGHMC algorithm outlined in Section 2.7. Three Markov chains, each consisting of 100,000 samples, were generated with distinct initial conditions. To minimize the effect of the burn-in period, 25% of each chain was discarded. From the remaining 75%, a subsample was taken for every 75th generated value (a process known as thinning) to reduce autocorrelation within the chains. This process yielded a final sample of 3000 observations, which was used to compute Bayesian Monte Carlo estimates. In addition, the Geweke [47] and Gelman–Rubin [48] convergence diagnostics were calculated using the R CODA package [49] (see Table A1 and Table A2). The graphs in Figure A4 and Figure A5 show the generated chains, the final combined chain, the histogram, and the autocorrelation function for the samples from the final chain.
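For readers unfamiliar with the Gelman–Rubin diagnostic, the sketch below computes the basic potential scale reduction factor from several chains of equal length. It is a plain-Python illustration of the textbook formula, not the CODA implementation (which applies additional corrections), and the simulated chains are artificial.

```python
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for chains of equal length."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance; B: between-chain variance times n.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Three chains sampling the same distribution: the factor should be near 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# Three chains stuck around different values: the factor is far above 1.
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 5.0, 10.0)]
r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Values close to 1, such as the point estimates between 1.00 and 1.03 reported in Tables A1 and A2, suggest that the chains started from different initial conditions are sampling the same distribution.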
Table A1. Gelman–Rubin and Geweke convergence diagnostics for the 2019 data.

| Parameter | Gelman & Rubin: point est. | Gelman & Rubin: upper C.I. | Geweke: $\lvert z \rvert$ (< 1.96) |
|---|---|---|---|
| $\beta_{\mu,1}$ | 1.03 | 1.08 | 1.3895 |
| $\beta_{\mu,2}$ | 1.03 | 1.08 | 1.3664 |
| $\beta_{\omega,1}$ | 1.01 | 1.04 | 0.4394 |
| $\beta_{\omega,2}$ | 1.01 | 1.04 | 0.4320 |
| $\phi$ | 1.00 | 1.00 | 1.1438 |

Multivariate potential scale reduction factor: 1.02.
Figure A4. Results of the SGHMC algorithm for estimating the parameters of the ZMNB regression model applied to the 2019 data: (a) $\beta_1 = \beta_{\mu,1}$; (b) $\beta_2 = \beta_{\mu,2}$; (c) $\beta_3 = \beta_{\omega,1}$; (d) $\beta_4 = \beta_{\omega,2}$; (e) $\phi$.
Table A2. Gelman–Rubin and Geweke convergence diagnostics for the 2020 data.

| Parameter | Gelman & Rubin: point est. | Gelman & Rubin: upper C.I. | Geweke: $\lvert z \rvert$ (< 1.96) |
|---|---|---|---|
| $\beta_{\mu,1}$ | 1.03 | 1.08 | 1.3863 |
| $\beta_{\mu,2}$ | 1.03 | 1.08 | 1.3648 |
| $\beta_{\omega,1}$ | 1.01 | 1.03 | 0.4278 |
| $\beta_{\omega,2}$ | 1.01 | 1.03 | 0.4206 |
| $\phi$ | 1.00 | 1.00 | 1.2430 |

Multivariate potential scale reduction factor: 1.02.
A sensitivity analysis was performed to assess the impact of the prior density parameters on the estimation of the model parameters. This analysis varied the prior means within the interval $[-50, 50]$ and considered prior variances of $\sigma^2 = 1$, 10, 100, and 10,000. Under the same SGHMC generation procedure detailed in this paper, the parameter estimates did not change significantly, and the variance of the prior densities had only a minimal effect on the parameter variances and, consequently, on the credible intervals. Given the large dataset ($N = 642$), the prior parameters had little impact on the estimates. For smaller sample sizes, however, more informative prior densities could have a greater influence on the estimates. This is a fundamental aspect of Bayesian inference and highlights one of the key advantages of this approach.
Figure A5. Results of the SGHMC algorithm for estimating the parameters of the ZMNB regression model applied to the 2020 data: (a) $\beta_1 = \beta_{\mu,1}$; (b) $\beta_2 = \beta_{\mu,2}$; (c) $\beta_3 = \beta_{\omega,1}$; (d) $\beta_4 = \beta_{\omega,2}$; (e) $\phi$.

Appendix C. Model Selection

Bayesian models can be evaluated and compared in several ways. Details on various criteria for comparing models in the Bayesian context are presented by [38]. We use three criteria to select the best-fitting ZMPS regression model: the conditional predictive ordinate (CPO) [39], expected Bayesian information criterion (EBIC) [40], and Watanabe–Akaike information criterion (WAIC) [41].
The conditional predictive ordinate (CPO) criterion [39] offers a functional cross-validation approach and serves as a computationally efficient measure of model fit. It is calculated using the predictive probability density function of a future observation $y_i$, given the data excluding the $i$-th observation, $D_{-i}$, as follows:
$$\pi_{ZMPS}(y_i \mid D_{-i}) = \int_{\Omega_\Theta} \pi_{ZMPS}(y_i \mid \Theta, D_{-i}) \, \pi(\Theta \mid D_{-i}) \, d\Theta, \tag{A7}$$
where $\Omega_\Theta$ is the parameter space. From (A7) and the chain generated by the SGHMC procedure, $\{\Theta^{(g)}, g = 1, \ldots, G\}$, we can estimate $\mathrm{CPO}_i$ by
$$\widehat{\mathrm{CPO}}_i = \left\{ \frac{1}{G} \sum_{g=1}^{G} \frac{1}{\pi_{ZMPS}(y_i \mid \Theta^{(g)}, D_{-i})} \right\}^{-1}. \tag{A8}$$
In many cases, it is computationally more convenient to calculate $\log(\mathrm{CPO}_i)$ rather than $\mathrm{CPO}_i$. In such situations, $\log(\widehat{\mathrm{CPO}})$ is obtained by summing the estimates $\log(\widehat{\mathrm{CPO}}_i)$ (see [38]). The model with the largest $\log(\widehat{\mathrm{CPO}})$ is then selected as the best-fitting model for the data.
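The harmonic-mean estimator of the CPO is easiest to evaluate on the log scale. The sketch below assumes that the observations are conditionally independent given $\Theta$, so that the density of $y_i$ given $\Theta^{(g)}$ does not depend on the remaining data, and that a table `loglik_draws[g][i]` of pointwise log-densities is already available; both the table layout and the function names are illustrative assumptions.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def log_cpo(loglik_draws):
    """log CPO_i for each observation.

    loglik_draws[g][i] holds log pi(y_i | Theta^(g)) for draw g and observation i.
    CPO_i is the harmonic mean of the densities over the draws.
    """
    n_obs = len(loglik_draws[0])
    return [-log_mean_exp([-row[i] for row in loglik_draws]) for i in range(n_obs)]


def total_log_cpo(loglik_draws):
    """Sum of log CPO_i over observations (larger is better)."""
    return sum(log_cpo(loglik_draws))


# Toy table: two draws, one observation with densities 0.5 and 0.25.
toy = [[math.log(0.5)], [math.log(0.25)]]
toy_value = total_log_cpo(toy)
```

For the toy table the harmonic mean of 0.5 and 0.25 is 1/3, so the function returns $\log(1/3)$.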
The expected Bayesian information criterion (EBIC) [40] is estimated from the SGHMC output as
$$\widehat{\mathrm{EBIC}} = \frac{1}{G} \sum_{g=1}^{G} \left[ -2 \log\left( L(\Theta^{(g)} \mid D_n) \right) \right] + p \log(n), \tag{A9}$$
where $L(\Theta^{(g)} \mid D_n)$ is the likelihood function defined in (6), evaluated on the chain generated by the SGHMC procedure, $\{\Theta^{(g)}, g = 1, \ldots, G\}$, $p$ is the number of parameters in the model, and $n$ is the total number of observations.
The Watanabe–Akaike information criterion (WAIC) was introduced by [41]. The WAIC can be estimated using the SGHMC output as
$$\widehat{\mathrm{WAIC}} = -2 \left[ \sum_{i=1}^{n} \log\left( \frac{1}{G} \sum_{g=1}^{G} \pi_{ZMPS}(y_i \mid \theta^{(g)}, D) \right) - p_{WAIC} \right], \tag{A10}$$
where $p_{WAIC} = \sum_{i=1}^{n} V_{g=1}^{G} \log \pi_{ZMPS}(y_i \mid \theta^{(g)}, D)$, with $V_{g=1}^{G}$ representing the sample variance, i.e., $V_{g=1}^{G} a_g = \frac{1}{G - 1} \sum_{g=1}^{G} (a_g - \bar{a})^2$.
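Both criteria can be computed from the same table of pointwise log-likelihood values. The sketch below assumes such a table `loglik_draws[g][i]` (draw $g$, observation $i$) has been stored during sampling; the names and the toy values are hypothetical, and the code is an illustration of the formulas rather than the implementation used in the study.

```python
import math
import statistics


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def ebic(loglik_draws, n_params):
    """Posterior mean of -2 log L plus the penalty p * log(n)."""
    n_obs = len(loglik_draws[0])
    mean_deviance = statistics.fmean([-2.0 * sum(row) for row in loglik_draws])
    return mean_deviance + n_params * math.log(n_obs)


def waic(loglik_draws):
    """-2 * (log pointwise predictive density - p_WAIC)."""
    n_obs = len(loglik_draws[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [row[i] for row in loglik_draws]
        lppd += log_mean_exp(column)              # log of the averaged density
        p_waic += statistics.variance(column)     # sample variance with G - 1
    return -2.0 * (lppd - p_waic)


# Toy table with three identical draws and two observations.
toy = [[-1.0, -2.0], [-1.0, -2.0], [-1.0, -2.0]]
toy_waic = waic(toy)
toy_ebic = ebic(toy, n_params=2)
```

When all draws give the same log-likelihood, the variance term vanishes and the WAIC reduces to the deviance, which makes a convenient sanity check.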
To assess the performance of these model selection criteria for ZMPS regression models, we conducted a simulation study. In this study, we generated samples of size N = 100 from each of the ZMP, ZMNB, ZMGP, and ZM-COMP distributions and fitted the four models to each generated dataset. The SGHMC algorithm was applied with three different initial conditions. For each chain, 100,000 samples were generated, with the first 25% discarded as the burn-in period. From the remaining 75,000 samples, one was selected every 75 iterations (thinned sample), resulting in a joint sample of 3000 values. We then used this sample to calculate the Bayesian estimates for the logCPO, EBIC, and WAIC criteria. The results are presented in Table A3, Table A4, Table A5 and Table A6.
As shown in Table A3, for the data generated with the ZMP distribution, only the LogCPO criterion identified ZM-COMP as the best model, while the other criteria correctly identified the ZMP distribution as the best model.
As shown in Table A4, only the LogCPO criterion incorrectly identifies the best model. In contrast, both the WAIC and EBIC criteria correctly select the ZMNB model as the best fit.
Table A3. Data generated from the ZMP distribution.
| Model | LogCPO | WAIC | EBIC |
|---|---|---|---|
| ZMP | 162 | **333** | **347** |
| ZMNB | 164 | 336 | 355 |
| ZMGP | 163 | 334 | 353 |
| ZM-COMP | **178** | 361 | 382 |

The selected model is highlighted in bold.
Table A4. Data generated from the ZMNB distribution.
| Model | LogCPO | WAIC | EBIC |
|---|---|---|---|
| ZMP | 156 | 323 | 335 |
| ZMNB | 150 | **308** | **327** |
| ZMGP | 150 | 309 | 328 |
| ZM-COMP | **157** | 321 | 341 |

The selected model is highlighted in bold.
Table A5. Data generated from the ZMGP distribution.
| Model | LogCPO | WAIC | EBIC |
|---|---|---|---|
| ZMP | **162** | 339 | 349 |
| ZMNB | 152 | 314 | 332 |
| ZMGP | 152 | **312** | **330** |
| ZM-COMP | 159 | 325 | 344 |

The selected model is highlighted in bold.
In Table A5, we observe that the LogCPO criterion incorrectly identifies the best model. However, both the WAIC and the EBIC criteria correctly select the ZMGP model as the best fit.
Table A6. Data generated from the ZM-COMP distribution.
| Model | LogCPO | WAIC | EBIC |
|---|---|---|---|
| ZMP | 116 | 240 | **254** |
| ZMNB | 116 | 240 | 258 |
| ZMGP | 116 | 241 | 260 |
| ZM-COMP | **117** | **239** | 259 |

The selected model is highlighted in bold.
In Table A6, we observe that for data generated with the ZM-COMP distribution, both the LogCPO and WAIC criteria correctly identify the best model. However, the EBIC criterion incorrectly selects the ZMP model as the best fit. This is likely due to the lower complexity of the ZMP model, as the EBIC criterion tends to favor models with greater parsimony.
As a final analysis of this simulation study with artificial data, we conclude that the WAIC criterion is the most robust, identifying the correct model as the best in all four scenarios. However, considering all three criteria together remains a valuable approach to resolving ties and selecting the best model.
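The selections reported in Tables A3 to A6 can be tallied with a short sketch. The values are copied from those tables; the selection rule (largest LogCPO, smallest WAIC and EBIC) follows the text, while the variable and function names are hypothetical.

```python
# (LogCPO, WAIC, EBIC) for each fitted model, keyed by the generating model.
tables = {
    "ZMP": {"ZMP": (162, 333, 347), "ZMNB": (164, 336, 355),
            "ZMGP": (163, 334, 353), "ZM-COMP": (178, 361, 382)},
    "ZMNB": {"ZMP": (156, 323, 335), "ZMNB": (150, 308, 327),
             "ZMGP": (150, 309, 328), "ZM-COMP": (157, 321, 341)},
    "ZMGP": {"ZMP": (162, 339, 349), "ZMNB": (152, 314, 332),
             "ZMGP": (152, 312, 330), "ZM-COMP": (159, 325, 344)},
    "ZM-COMP": {"ZMP": (116, 240, 254), "ZMNB": (116, 240, 258),
                "ZMGP": (116, 241, 260), "ZM-COMP": (117, 239, 259)},
}


def select(scores):
    """Pick the model with the largest LogCPO and the smallest WAIC and EBIC."""
    return {
        "LogCPO": max(scores, key=lambda m: scores[m][0]),
        "WAIC": min(scores, key=lambda m: scores[m][1]),
        "EBIC": min(scores, key=lambda m: scores[m][2]),
    }


picks = {truth: select(scores) for truth, scores in tables.items()}
# Number of scenarios in which each criterion recovers the generating model.
hits = {c: sum(picks[t][c] == t for t in tables) for c in ("LogCPO", "WAIC", "EBIC")}
print(hits)  # {'LogCPO': 1, 'WAIC': 4, 'EBIC': 3}
```

Under this rule, WAIC recovers the generating model in all four scenarios, EBIC in three, and LogCPO in one.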

References

  1. Jain, G.C.; Consul, P.C. A Generalized Negative Binomial Distribution. SIAM J. Appl. Math. 1971, 21, 501–513.
  2. Ng, T. A new class of modified binomial distributions with applications to certain toxicological experiments. Commun. Stat. Theory Methods 1989, 18, 3477–3492.
  3. Johnson, N.L.; Kemp, A.W.; Kotz, S. Univariate Discrete Distributions, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2005.
  4. Consul, P.C.; Famoye, F. Lagrangian Probability Distributions; Birkhäuser: Boston, MA, USA, 2006.
  5. Frome, E.L. The analysis of rates using Poisson regression model. Biometrics 1983, 39, 665–674.
  6. Frome, E.L.; Checkoway, H. Use of Poisson regression models in estimating incidence rates and ratios. Am. J. Epidemiol. 1985, 121, 309–323.
  7. Conigliani, C.; Castro, J.I.; O’Hagan, A. Bayesian Assessment of Goodness of Fit against Nonparametric Alternatives. Can. J. Stat. 2000, 28, 327–342.
  8. Bayarri, M.J.; Berger, J.O.; Datta, G.S. Objective Bayes Testing of Poisson Versus Inflated Poisson Models. Inst. Math. Stat. 2008, 3, 105–121.
  9. Hinde, J.; Demetrio, C.G.B. Overdispersion: Models and estimation. Comput. Stat. Data Anal. 1998, 27, 151–170.
  10. Gupta, R.C. Modified Power Series Distribution and Some of its Applications. Indian J. Stat. 1974, 36, 288–298.
  11. Consul, P.C. New Class of Location-Parameter Discrete Probability Distributions and Their Characterizations. Commun. Stat. Theory Methods 1990, 19, 4653–4666.
  12. Cordeiro, G.M.; Andrade, M.G.; de Castro, M. Power Series Generalized Nonlinear Models. Comput. Stat. Data Anal. 2009, 53, 1155–1166.
  13. M’Kendrick, A.G. Applications of mathematics to medical problems. Proc. Edinb. Math. Soc. 1926, 44, 98–103.
  14. David, F.N.; Johnson, N.I. The truncated Poisson. Biometrics 1952, 8, 275–285.
  15. Cohen, A.C. An extension of a truncated Poisson distribution. Biometrics 1960, 16, 447–450.
  16. Umbach, D. On inference for a mixture of a Poisson and a degenerate distribution. Commun. Stat. Theory Methods 1981, 10, 299–306.
  17. Ridout, M.; Demétrio, C.G.B.; Hinde, J. Models for count data with many zeros. In Proceedings of the XIXth International Biometrics Conference, Cape Town, South Africa, 13–16 December 1998; pp. 179–192.
  18. Dietz, E.; Böhning, D. On Estimation of the Poisson Parameter in Zero-Modified Poisson Models. Comput. Stat. Data Anal. 2000, 34, 441–459.
  19. Beckett, S.; Jee, J.; Ncube, T.; Pompilus, S.; Washington, Q.; Singh, A.; Pal, N. Zero-inflated Poisson (ZIP) distribution: Parameter estimation and applications to model data from natural calamities. Involv. A J. Math. 2014, 7, 751–767.
  20. Conceição, K.S.; Louzada, F.; Andrade, M.G.; Helou, E. Zero-modified power series distribution and its hurdle distribution version. J. Stat. Comput. Simul. 2017, 87, 1842–1862.
  21. Gelfand, A.; Smith, A. Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc. 1990, 85, 398–409.
  22. Neal, R.M. Chapter 5: MCMC Using Hamiltonian Dynamics. In Handbook of Markov Chain Monte Carlo; CRC Press: New York, NY, USA, 2011; pp. 113–162.
  23. Roberts, G.O.; Rosenthal, J.S. Optimal Scaling of Discrete Approximations to Langevin Diffusions. J. R. Stat. Soc. Ser. B Stat. Methodol. 1998, 60, 255–268.
  24. Roberts, G.; Stramer, O. Langevin diffusions and Metropolis-Hastings algorithms. Methodol. Comput. Appl. Probab. 2003, 4, 337–358.
  25. Girolami, M.; Calderhead, B. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B Stat. Methodol. 2011, 73, 123–214.
  26. Hoffman, M.D.; Gelman, A. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 2014, 15, 1593–1623.
  27. Welling, M.; Teh, Y.W. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, 28 June–2 July 2011; pp. 681–688.
  28. Chen, T.; Fox, E.; Guestrin, C. Stochastic Gradient Hamiltonian Monte Carlo. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 22–24 June 2014; Xing, E.P., Jebara, T., Eds.; ACM Digital Library: New York, NY, USA, 2014; pp. 1683–1691.
  29. Ding, N.; Fang, Y.; Babbush, R.; Chen, C.; Skeel, R.D.; Neven, H. Bayesian sampling using stochastic gradient thermostats. Adv. Neural Inf. Process. Syst. 2014, 27, 1–9.
  30. Baker, J.; Fearnhead, P.; Fox, E.B.; Nemeth, C. sgmcmc: An R package for stochastic gradient Markov chain Monte Carlo. J. Stat. Softw. 2019, 91, 1–27.
  31. Nemeth, C.; Fearnhead, P. Stochastic Gradient Markov Chain Monte Carlo. J. Am. Stat. Assoc. 2021, 116, 433–450.
  32. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. arXiv 2016, arXiv:1603.04467.
  33. Luo, R. Approximate Inference on Structured Distributions Using Stochastic Dynamics. Doctoral Thesis, UCL (University College London), London, UK, 2024.
  34. Guikema, S.D.; Coffelt, J.P. A flexible count data regression model for risk analysis. Risk Anal. 2008, 28, 213–223.
  35. Mullahy, J. Specification and testing of some modified count data models. J. Econom. 1986, 33, 341–365.
  36. Feng, C.X. A comparison of zero-inflated and hurdle models for modeling zero-inflated count data. J. Stat. Distrib. Appl. 2021, 8, 1–19.
  37. Huayanay, A.; Bazán, J.; Cancho, V.; Dey, D. Performance of asymmetric links and correction methods for imbalanced data in binary regression. J. Stat. Comput. Simul. 2019, 89, 1694–1714.
  38. Gelman, A.; Hwang, J.; Vehtari, A. Understanding predictive information criteria for Bayesian models. Stat. Comput. 2014, 24, 997–1016.
  39. Gelfand, A.; Dey, K. Bayesian model choice: Asymptotics and exact calculations. J. R. Stat. Soc. 1994, 56, 501–514.
  40. Carlin, B.; Louis, T. Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed.; Chapman & Hall/CRC Texts in Statistical Science; Taylor & Francis: London, UK, 2010.
  41. Watanabe, S. A widely applicable Bayesian information criterion. J. Mach. Learn. Res. 2013, 14, 867–897.
  42. Ferreira, P. Extending Fisher’s measure of information. Biometrika 1981, 68, 695–698.
  43. Schützenberger, M.P. A generalization of the Frechet-Cramer inequality to the case of Bayes estimation. Bull. Amer. Math. Soc. 1957, 63, 142.
  44. Pereira, C.A.B.; Stern, J.M. Evidence and Credibility: Full Bayesian Significance Test for Precise Hypotheses. Entropy 1999, 1, 99–110.
  45. Bernardo, J.; Smith, A. Bayesian Theory; Wiley Series in Probability and Statistics; Wiley: Hoboken, NJ, USA, 2009.
  46. Datasus. Ministério da Saúde. DATASUS. Tabnet. 2022. Available online: https://datasus.saude.gov.br/mortalidade-desde-1996-pela-cid-10 (accessed on 4 January 2020).
  47. Geweke, J. Evaluating the accuracy of sampling-based approaches to calculating posterior moments. J. R. Stat. Soc. 1994, 56, 501–514.
  48. Gelman, A.; Rubin, D. Inference from iterative simulation using multiple sequences. Stat. Sci. 1992, 7, 457–511.
  49. Plummer, M.; Best, N.; Cowles, K.; Vines, K. CODA: Convergence Diagnosis and Output Analysis for MCMC. R News 2006, 6, 7–11.
Figure 1. The mean $\mu_{ZMNB}$ (a) and the parameters $p$ (b) fitted according to the MHDI for the 642 municipalities of São Paulo in 2019 and 2020.
Table 1. Distributions within the PS family.
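| PS | Distribution | $f(\mu_i, \phi)$ | $g(\mu_i, \phi)$ | $a(y_i, \phi)$ |
|---|---|---|---|---|
| P | Poisson | $e^{\mu_i}$ | $\mu_i$ | $\frac{1}{y_i!}$ |
| GP | Generalized Poisson | $e^{\mu_i(1+\mu_i\phi)^{-1}}$ | $\frac{\mu_i e^{-\mu_i\phi(1+\mu_i\phi)^{-1}}}{1+\mu_i\phi}$ | $\frac{(1+\phi y_i)^{y_i-1}}{y_i!}$ |
| COMP | COM-Poisson | $\sum_{s=0}^{\infty}\left(\frac{\mu_i^{s}}{s!}\right)^{\phi}$ | $\mu_i^{\phi}$ | $\left(\frac{1}{y_i!}\right)^{\phi}$ |
| NB | Negative Binomial | $\left(\frac{\phi}{\mu_i+\phi}\right)^{-\phi}$ | $\frac{\mu_i}{\mu_i+\phi}$ | $\frac{\Gamma(\phi+y_i)}{y_i!\,\Gamma(\phi)}$ |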
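As a sanity check on how these three functions combine, the sketch below assumes the usual power series form $P(Y = y) = a(y, \phi)\, g(\mu, \phi)^{y} / f(\mu, \phi)$ and reads the Poisson and Negative Binomial rows as $f = e^{\mu}$, $g = \mu$, $a = 1/y!$ and $f = (\phi/(\mu+\phi))^{-\phi}$, $g = \mu/(\mu+\phi)$, $a = \Gamma(\phi+y)/(y!\,\Gamma(\phi))$. The function and dictionary names are hypothetical.

```python
import math


def ps_pmf(y, mu, phi, family):
    """Power series pmf a(y, phi) * g(mu, phi)**y / f(mu, phi)."""
    return family["a"](y, phi) * family["g"](mu, phi) ** y / family["f"](mu, phi)


# Poisson row (phi is unused).
poisson = {
    "f": lambda mu, phi: math.exp(mu),
    "g": lambda mu, phi: mu,
    "a": lambda y, phi: 1.0 / math.factorial(y),
}

# Negative Binomial row, with a() evaluated through log-gamma for stability.
neg_binomial = {
    "f": lambda mu, phi: (phi / (mu + phi)) ** (-phi),
    "g": lambda mu, phi: mu / (mu + phi),
    "a": lambda y, phi: math.exp(
        math.lgamma(phi + y) - math.lgamma(y + 1) - math.lgamma(phi)
    ),
}

# Both pmfs should sum to about 1, and the NB mean should be about mu.
pois_total = sum(ps_pmf(y, 2.0, None, poisson) for y in range(60))
nb_probs = [ps_pmf(y, 2.0, 3.0, neg_binomial) for y in range(200)]
nb_total = sum(nb_probs)
nb_mean = sum(y * p for y, p in enumerate(nb_probs))
```

Parameterizing each family by its mean $\mu$ is what makes the truncated sums above recover a total mass of one and, for the Negative Binomial case, a mean equal to $\mu$.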
Table 2. Simulation study. Estimation for the ZMPS($\mu$, $\phi$, $p$) distribution using sample size $N = 100$.

| ZMPS | Parameter $\theta$ | $N(0,1)$: $\hat{\theta}$ | $N(0,1)$: s.d. | $N(0,1)$: HPD (95%) | $N(0,100)$: $\hat{\theta}$ | $N(0,100)$: s.d. | $N(0,100)$: HPD (95%) |
|---|---|---|---|---|---|---|---|
| ZMP | $\beta_{\mu,1}$ = −1.0 | −1.061 | 0.306 | (−1.573, −0.427) | −1.111 | 0.426 | (−1.863, −0.299) |
| | $\beta_{\mu,2}$ = 3.0 | 3.078 | 0.371 | (2.341, 3.678) | 3.135 | 0.511 | (2.185, 4.025) |
| | $\beta_{\omega,1}$ = −1.0 | −1.025 | 0.312 | (−1.604, −0.424) | −1.076 | 0.465 | (−1.883, −0.212) |
| | $\beta_{\omega,2}$ = 2.5 | 2.582 | 0.446 | (1.915, 3.393) | 2.686 | 0.768 | (1.654, 4.262) |
| ZMNB | $\beta_{\mu,1}$ = −1.0 | −1.024 | 0.298 | (−1.560, −0.380) | −1.102 | 0.559 | (−2.261, −0.129) |
| | $\beta_{\mu,2}$ = 3.0 | 3.035 | 0.403 | (2.236, 3.811) | 3.127 | 0.729 | (1.668, 4.549) |
| | $\beta_{\omega,1}$ = −1.5 | −1.485 | 0.296 | (−2.095, −0.883) | −1.507 | 0.476 | (−2.443, −0.552) |
| | $\beta_{\omega,2}$ = 2.0 | 1.967 | 0.459 | (1.146, 2.889) | 1.993 | 0.788 | (0.661, 3.666) |
| | $\phi$ = 3.0 | 2.954 | 0.266 | (2.423, 3.388) | 2.939 | 0.265 | (2.423, 3.398) |
| ZMGP | $\beta_{\mu,1}$ = −1.0 | −1.022 | 0.300 | (−1.571, −0.412) | −1.124 | 0.605 | (−2.377, −0.057) |
| | $\beta_{\mu,2}$ = 3.0 | 3.053 | 0.419 | (2.212, 3.829) | 3.186 | 0.816 | (1.519, 4.757) |
| | $\beta_{\omega,1}$ = −1.5 | −1.485 | 0.296 | (−2.095, −0.883) | −1.507 | 0.476 | (−2.443, −0.551) |
| | $\beta_{\omega,2}$ = 2.0 | 1.967 | 0.459 | (1.146, 2.889) | 1.993 | 0.788 | (0.661, 3.666) |
| | $\phi$ = 0.2 | 0.206 | 0.027 | (0.153, 0.256) | 0.208 | 0.027 | (0.156, 0.256) |
| ZMCOMP | $\beta_{\mu,1}$ = −3.5 | −3.500 | 0.011 | (−3.523, −3.478) | −3.523 | 0.098 | (−3.716, −3.334) |
| | $\beta_{\mu,2}$ = 2.0 | 1.999 | 0.012 | (1.976, 2.022) | 1.986 | 0.099 | (1.785, 2.173) |
| | $\beta_{\omega,1}$ = −1.0 | −0.999 | 0.011 | (−1.022, −0.976) | −0.998 | 0.090 | (−1.103, −0.747) |
| | $\beta_{\omega,2}$ = 0.5 | 0.500 | 0.011 | (0.478, 0.523) | 0.499 | 0.096 | (0.345, 0.726) |
| | $\phi$ = 0.2 | 0.198 | 0.027 | (0.125, 0.230) | 0.196 | 0.027 | (0.124, 0.231) |
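Table 3. Simulation study. Results of the Bayesian efficiency for the ZMPS($\mu$, $\phi$, $p$) distribution.

| ZMPS | $\theta$ | $N(0,1)$: B | $N(0,1)$: MSE | $N(0,1)$: CP (%) | $N(0,1)$: MSE/Var | $N(0,1)$: BE | $N(0,100)$: B | $N(0,100)$: MSE | $N(0,100)$: CP (%) | $N(0,100)$: MSE/Var | $N(0,100)$: BE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ZMP | $\beta_{\mu,1}$ | −0.061 | 0.097 | 97 | 1.039 | 1.00 | −0.111 | 0.193 | 91 | 1.068 | 1.00 |
| | $\beta_{\mu,2}$ | 0.078 | 0.144 | 99 | 1.044 | 1.00 | 0.135 | 0.279 | 89 | 1.069 | 1.00 |
| | $\beta_{\omega,1}$ | −0.025 | 0.098 | 98 | 1.006 | 1.00 | −0.076 | 0.221 | 94 | 1.027 | 1.00 |
| | $\beta_{\omega,2}$ | 0.082 | 0.205 | 99 | 1.033 | 1.00 | 0.186 | 0.625 | 94 | 1.059 | 1.00 |
| ZMNB | $\beta_{\mu,1}$ | −0.024 | 0.089 | 99 | 1.006 | 1.00 | −0.102 | 0.323 | 94 | 1.033 | 1.00 |
| | $\beta_{\mu,2}$ | 0.035 | 0.164 | 98 | 1.008 | 1.00 | 0.127 | 0.548 | 94 | 1.030 | 1.00 |
| | $\beta_{\omega,1}$ | 0.014 | 0.088 | 97 | 1.002 | 1.00 | −0.007 | 0.226 | 93 | 1.005 | 1.00 |
| | $\beta_{\omega,2}$ | −0.033 | 0.212 | 98 | 1.005 | 1.00 | −0.007 | 0.621 | 94 | 1.007 | 1.00 |
| | $\phi$ | −0.046 | 0.073 | 99 | 1.029 | 1.00 | −0.061 | 0.074 | 99 | 1.052 | 1.00 |
| ZMGP | $\beta_{\mu,1}$ | −0.022 | 0.091 | 99 | 1.005 | 1.00 | −0.124 | 0.382 | 93 | 1.042 | 0.97 |
| | $\beta_{\mu,2}$ | 0.053 | 0.178 | 98 | 1.016 | 1.00 | 0.186 | 0.700 | 93 | 1.052 | 0.94 |
| | $\beta_{\omega,1}$ | 0.014 | 0.088 | 97 | 1.002 | 1.00 | −0.007 | 0.226 | 93 | 1.000 | 1.00 |
| | $\beta_{\omega,2}$ | −0.033 | 0.212 | 98 | 1.005 | 1.00 | −0.007 | 0.621 | 94 | 1.000 | 1.00 |
| | $\phi$ | 0.006 | 0.001 | 99 | 1.052 | 0.83 | 0.008 | 0.001 | 99 | 1.083 | 0.82 |
| ZMCP | $\beta_{\mu,1}$ | 0.002 | 0.001 | 99 | 1.004 | 1.00 | 0.023 | 0.010 | 99 | 1.056 | 0.76 |
| | $\beta_{\mu,2}$ | 0.001 | 0.001 | 99 | 1.001 | 1.00 | 0.014 | 0.010 | 99 | 1.019 | 0.74 |
| | $\beta_{\omega,1}$ | 0.003 | 0.001 | 99 | 1.000 | 0.93 | 0.002 | 0.008 | 99 | 1.000 | 0.74 |
| | $\beta_{\omega,2}$ | 0.002 | 0.001 | 99 | 1.000 | 0.96 | 0.002 | 0.009 | 99 | 1.000 | 0.75 |
| | $\phi$ | 0.021 | 0.011 | 98 | 1.039 | 0.41 | 0.003 | 0.015 | 98 | 1.094 | 0.42 |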
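The B, MSE, and MSE/Var columns can be related through the standard decomposition of the mean squared error into variance plus squared bias. The sketch below applies it to the rounded Table 2 values for $\beta_{\mu,1}$ in the ZMP model under the $N(0,1)$ prior; treating the reported s.d. as the square root of Var is an assumption, and the function name is hypothetical.

```python
def mse_from_bias_sd(bias, sd):
    """Return (MSE, MSE / Var) using MSE = Var + bias**2 with Var = sd**2."""
    var = sd ** 2
    mse = var + bias ** 2
    return mse, mse / var


# beta_{mu,1}, ZMP, N(0,1) prior: estimate -1.061 against true value -1.0.
bias = -1.061 - (-1.0)
mse, ratio = mse_from_bias_sd(bias, 0.306)
print(round(mse, 3), round(ratio, 2))  # 0.097 1.04
```

The result agrees with the tabulated MSE of 0.097 and, up to rounding of the inputs, with the tabulated ratio of 1.039.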
Table 4. Frequency distribution and descriptive statistics of each sample referring to the number of femicides in 2019 and 2020 in 642 São Paulo municipalities.
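| Year | $y_i$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019 | $f_i$ | 254 | 132 | 56 | 45 | 27 | 23 | 16 | 13 | 7 | 7 | 3 | 1 | 5 | 2 | 2 | 4 |
| 2020 | $f_i$ | 262 | 121 | 67 | 40 | 36 | 17 | 11 | 10 | 8 | 6 | 4 | 3 | 2 | 3 | 5 | - |

| Year | $y_i$ | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | >30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019 | $f_i$ | 2 | 3 | 2 | 3 | 3 | 2 | 3 | 2 | - | 2 | 3 | 1 | 2 | 1 | 1 | 15 |
| 2020 | $f_i$ | 1 | 2 | 3 | 4 | 2 | 1 | 2 | 3 | 2 | 1 | 1 | 1 | 3 | 1 | 3 | 17 |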
Table 5. Criteria for selecting the models that were fitted to the data on the number of femicide cases in 2019 and 2020.
| Criteria | ZMP (2019) | ZMNB (2019) | ZMGP (2019) | ZMCOMP (2019) | ZMP (2020) | ZMNB (2020) | ZMGP (2020) | ZMCOMP (2020) |
|---|---|---|---|---|---|---|---|---|
| LogCPO | 2296 | 1376 | 1372 | 1481 | 2392 | 1372 | 1368 | 1492 |
| WAIC | 4658 | 2758 | 2767 | 2972 | 4866 | 2750 | 2750 | 3032 |
| EBIC | 4648 | 2783 | 2791 | 3000 | 4847 | 2775 | 2783 | 3044 |
Table 6. Posterior summaries and credible intervals of the parameters of ZMNB fitted to the number of femicide cases in 2019 and 2020.
| Parameter | Mean (2019) | Median (2019) | s.d. (2019) | CI 95% (2019) | Mean (2020) | Median (2020) | s.d. (2020) | CI 95% (2020) |
|---|---|---|---|---|---|---|---|---|
| $\beta_{\mu,0}$ | −15.665 | −15.594 | 1.579 | (−18.883, −12.757) | −17.597 | −17.528 | 1.592 | (−20.839, −14.647) |
| $\beta_{\mu,1}$ | 22.629 | 22.558 | 2.098 | (18.497, 26.587) | 25.293 | 25.226 | 2.117 | (21.200, 29.377) |
| $\beta_{\omega,0}$ | −10.441 | −10.443 | 1.808 | (−14.178, −7.086) | −7.602 | −7.613 | 1.751 | (−11.150, −4.259) |
| $\beta_{\omega,1}$ | 14.724 | 14.692 | 2.454 | (10.032, 19.653) | 10.798 | 10.798 | 2.375 | (6.339, 15.658) |
| $\phi$ | 0.556 | 0.554 | 0.085 | (0.396, 0.729) | 0.574 | 0.572 | 0.086 | (0.403, 0.741) |

Share and Cite

MDPI and ACS Style

Conceição, K.S.; Andrade, M.G.; Lachos, V.H.; Ravishanker, N. Bayesian Inference for Zero-Modified Power Series Regression Models. Mathematics 2025, 13, 60. https://doi.org/10.3390/math13010060
