A New Regression Model for the Analysis of Overdispersed and Zero-Modified Count Data

Count datasets are traditionally analyzed using the ordinary Poisson distribution. However, this model has limited applicability, as it can be too restrictive to handle specific data structures. In such cases, the need arises for alternative models that accommodate, for example, overdispersion and zero modification (inflation/deflation of the frequency of zeros). In practical terms, these are the most prevalent structures ruling the nature of discrete phenomena nowadays. Hence, this paper's primary goal is to jointly address these issues by deriving a fixed-effects regression model based on the hurdle version of the Poisson-Sujatha distribution. In this framework, the zero modification is incorporated by considering that a binary probability model determines which outcomes are zero-valued and a zero-truncated process is responsible for generating the positive observations. Posterior inferences for the model parameters were obtained from a fully Bayesian approach based on the g-prior method. Intensive Monte Carlo simulation studies were performed to assess the empirical properties of the Bayesian estimators, and the obtained results are discussed. The proposed model was used to analyze a real dataset, and its competitiveness with some well-established fixed-effects models for count data was evaluated. A sensitivity analysis to detect observations that may impact the parameter estimates was performed based on standard divergence measures. The Bayesian p-value and the randomized quantile residuals were considered for model validation.


Introduction
The ordinary Poisson (P) distribution is often adopted for the analysis of count data, mainly due to its simplicity and the availability of implementations in most standard statistical packages. However, it is well known that such a model is not suitable for describing over/underdispersed counts. Apart from data transformation, the most popular approach to circumvent this issue is based on hierarchical models that can accommodate different overdispersion levels [1].
The negative binomial (NB) distribution (which arises as a Poisson mixture model when a gamma distribution is used for the continuous part) is undoubtedly the most popular alternative for modeling extra-Poisson variability. There is extensive literature on other discrete mixed distributions that can accommodate different levels of overdispersion, for example, the Poisson-Lindley [2], the Poisson-lognormal [3], and the Poisson-inverse Gaussian [4].

Formally, a discrete random variable Y defined on N_0 = {0, 1, ...} is said to follow a ZMPS distribution if its probability mass function (pmf) can be written as

P*(Y = y; µ, p) = (1 − p)δ_y + p P(Y = y; µ), y ∈ N_0, (1)

where p is the zero-modification parameter and δ_y is an indicator function, such that δ_y = 1 if y = 0 and δ_y = 0 otherwise. Additionally, µ ∈ R_+ is the expected value of the ordinary PS distribution, whose reparameterized pmf is given in Equation (2), with

s(µ) = [3µ(21µ^4 + 84µ^3 + 513µ^2 + 96µ + 15) + 2µ(4µ^2 + 33µ + 3) + 1]^(1/3)

and µ = (θ^2 + 2θ + 6)[θ(θ^2 + θ + 2)]^(−1) for θ ∈ R_+ (shape parameter). This parameterization is particularly useful since our primary goal is to derive a regression model in which the influence of fixed effects can be evaluated directly on the mean of a zero-modified response variable. Unlike in zero-inflated models, here the parameter p is defined on the interval [0, P(Y > 0; µ)^(−1)], and so the ZMPS model is not a mixture distribution, since p may assume values greater than 1. The expected value and variance of Y are given, respectively, by E(Y) = λ = µp and V(Y) = ς^2 = p[σ^2 + (1 − p)µ^2], where σ^2 ∈ R_+ is the variance of the PS distribution (see [43], Table 4). The hurdle version of the PS distribution can be obtained by taking ω = p P(Y > 0; µ) and rewriting Equation (1) as

P*(Y = y; µ, ω) = (1 − ω)δ_y + ω P*(Y = y; µ), y ∈ N_0, (3)

for ω ∈ [0, 1], where P*(Y = y; µ) is the pmf of the zero-truncated Poisson-Sujatha (ZTPS) distribution [44].
Noticeably, Equation (3) is only a reparameterization of the standard ZMPS, so one can conclude that these models are interchangeable. For ease of notation and understanding, the acronym ZMPS will be used hereafter to refer to the hurdle version of the PS distribution. The corresponding cumulative distribution function (cdf) of Y is given in Equation (4). Comparatively, the proposed model can be considered more flexible than zero-inflated models, as it allows for zero-deflation, a structure often encountered when handling count data (see, for example, [45,46]). Besides, it can incorporate overdispersion that does not come only from the inflation/deflation of zeros, as one of its parts is dedicated to describing the behavior of the positive values. In the regression framework we have developed, discrepant points (outliers) can be identified, and through a careful sensitivity analysis it is possible to quantify their influence. Moreover, since the PS distribution accounts for different levels of overdispersion, its zero-modified version is naturally a robust alternative, as it may accommodate discrepant points that would significantly impact the parameter estimates of the ZMP model.
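The hurdle construction above is mechanical once a base pmf is available: truncate it at zero and reweight by ω. The following Python sketch illustrates the construction; a Poisson pmf stands in for the PS pmf of Equation (2), so the numbers are purely illustrative.

```python
import math

def poisson_pmf(y, mu):
    # Stand-in base pmf; the paper uses the Poisson-Sujatha (PS) pmf here.
    return math.exp(-mu) * mu**y / math.factorial(y)

def zero_truncated_pmf(y, mu, base=poisson_pmf):
    # P*(y; mu) = P(y; mu) / P(Y > 0; mu), defined for y >= 1.
    if y == 0:
        return 0.0
    return base(y, mu) / (1.0 - base(0, mu))

def hurdle_pmf(y, mu, omega, base=poisson_pmf):
    # Equation (3): (1 - omega) * delta_y + omega * (zero-truncated pmf).
    delta = 1.0 if y == 0 else 0.0
    return (1.0 - omega) * delta + omega * zero_truncated_pmf(y, mu, base)

# Sanity check: the hurdle pmf sums to 1 over a long support.
total = sum(hurdle_pmf(y, mu=2.0, omega=0.7) for y in range(60))
```

By construction, the probability of a zero is exactly 1 − ω, and the remaining mass ω is spread over the positive support.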
In this paper, the inferential procedures are conducted under a fully Bayesian perspective: an adaptation of the g-prior method [47] for the fixed-effects parameters is considered. The random-walk Metropolis algorithm was used to draw pseudo-random samples from the posterior distribution of the model parameters. Local influence measures based on some well-known divergences were considered for the task of detecting influential points. Model validation metrics such as the Bayesian p-value and the randomized quantile residuals are presented. Intensive Monte Carlo simulation studies were performed to assess the empirical properties of the Bayesian estimators; the obtained results are discussed, and the overall performance of the adopted methodology is evaluated. Additionally, an application using a real dataset is presented to assess the proposed model's usefulness and competitiveness.
This paper is organized as follows. In Section 2, we present the fixed-effects regression model based on the hurdle version of the P S distribution. In Section 3, we describe all the Bayesian methodologies and associated numerical procedures considered for inferential purposes. In Section 4, we discuss the results of an intensive simulation study, and in Section 5, a real data application using the proposed model is exhibited. General comments and concluding remarks are addressed in Section 6.

The ZMPS Regression Model
Suppose that a random experiment (designed or observational) is conducted with n subjects. The primary response for such an experiment is described by a discrete random variable Y_i denoting the outcome for the i-th subject. The full response vector is given by Y = (Y_1, ..., Y_n), and we assume that the observed vector y is obtained conditionally on the fixed effects, here denoted by β = (β_1, β_2). Assuming that Y_i|β ~ ZMPS(µ_i, ω_i) holds for all i, a general fixed-effects regression model for count data based on the ZMPS distribution can be derived by rewriting Equation (3) as in (5), where µ_i ≡ µ(x_1i, β_1) and ω_i ≡ ω(x_2i, β_2) are parameterized nonlinear functions. In this framework, we have β_k = (β_k0, ..., β_kq_k) (k = 1, 2) related to x_ki = (1, x_ki^1, ..., x_ki^q_k), where x_ki is a vector of covariates that may include, for example, dummy variables, cross-level interactions, and polynomials. The quantity q_1 (q_2) denotes the number of covariates in the systematic component of the linear predictor for the parameter µ_i (ω_i). The full regression matrices of model (5) can be written as X_k = (1_n, X_k,n×q_k), where 1_n is the intercept column and the submatrix X_k,n×q_k is defined so that its i-th row contains the vector (x_ki^1, ..., x_ki^q_k). The overall dimension of X_k is n × (q_k + 1). Now, we have to specify two monotonic, invertible, and twice-differentiable link functions, say g_1 and g_2, such that µ_i = g_1^(−1)(x_1i β_1) and ω_i = g_2^(−1)(x_2i β_2) are well defined on R_+ and (0, 1), respectively. For this purpose, one may choose any suitable mappings g_1 and g_2 such that g_1^(−1): R → R_+ and g_2^(−1): R → (0, 1). The logarithm link function, log(µ_i) = x_1i β_1, is the natural choice for g_1. For g_2, the popular choice is the logit link function (6); the probit link function (7) is also appropriate for the requested purpose.
Another possible choice for g_2 is the complementary log-log link function (8). One can notice that these link functions exclude the limit cases p_i = 0 and p_i = P(Y > 0; µ_i)^(−1). The link function (8) is usually preferable when the occurrence probability of a specific outcome is considerably high/low, as it accommodates asymmetric behaviors on the unit interval, which is not the case for the link functions (6) and (7). Besides, a more sophisticated approach considering power and reversal power link functions was proposed by [48] and can also be used to add even more flexibility when modeling the parameter ω_i. We may refer to the proposed model as a "semi-compatible" regression model. The term "compatible" alludes to "zero-altered," which defines the class proposed by [49] and extended by [50] to a setting including semiparametric zero-altered models that accommodate over/underdispersion. Zero-altered models are similar to zero-modified ones, but the compatibility arises from the linear predictors of µ_i and ω_i being the same. In our case, specifically, it is worthwhile to mention that identifiability problems may occur if one considers a fixed-effects regression model derived directly from (3), with the parameters µ and p sharing covariates, even if β_2 ≠ β_1. Therefore, the adopted structure allows for more flexibility and robustness, as µ and ω may share covariates (not necessarily with β_2 = β_1), and so the only requirement for ensuring model identifiability is the linear independence between covariates within each linear predictor.
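As an illustration, the three inverse links for ω_i (and the log link for µ_i) can be written in a few lines of Python; the asymmetry of the complementary log-log map shows up at η = 0, where it returns 1 − e^(−1) ≈ 0.632 rather than the 0.5 returned by the logit and probit maps.

```python
import math

def g1_inv(eta):
    # Log link for mu: mu = exp(eta), mapping R onto R_+.
    return math.exp(eta)

def logit_inv(eta):
    # Inverse of the logit link, Equation (6).
    return 1.0 / (1.0 + math.exp(-eta))

def probit_inv(eta):
    # Inverse of the probit link, Equation (7): the standard normal cdf.
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def cloglog_inv(eta):
    # Inverse of the complementary log-log link, Equation (8).
    return 1.0 - math.exp(-math.exp(eta))
```

All three inverse maps send R into (0, 1), as required for ω_i, while g1_inv sends R into R_+, as required for µ_i.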
Given a set of covariates, the probability of a zero-valued count being observed for the i-th subject is given by 1 − ω_i. Under the logistic regression model (6), β_2l (l = 1, ..., q_2) represents the direct change in the log-odds of Y_i being positive per 1-unit change in x_2i^l, while holding the other covariates at fixed values. On the other hand, the same does not apply if one adopts the link function (8), since e^(β_2l) is not the odds ratio for the l-th covariate effect, and so β_2l does not have a straightforward interpretation in terms of contribution to the log-odds. Likewise, it is not possible to interpret the coefficients of the probit model (7) directly, but one can evaluate the marginal effect of β_2l by analyzing how much the conditional probability of Y_i being positive is affected when the value of x_2i^l is changed. The exact interpretation of β_1l (l = 1, ..., q_1) is not direct in terms of the mean of the hurdle model, since the positive counts are modeled by a zero-truncated distribution (ZTPS); therefore, β_1l represents the overall effect of x_1i^l on the expected value µ_i when y_i > 0, while holding the other covariates at fixed values.
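A small numeric sketch (with a hypothetical coefficient value) contrasts the two interpretations: under the logit link, e^(β_2l) is an odds ratio, while under the probit link one instead inspects a finite-difference marginal effect on the probability scale.

```python
import math

# Hypothetical coefficient for the l-th covariate in the omega predictor.
beta_2l = 0.4

# Logit link: exp(beta_2l) is the odds ratio for a 1-unit covariate change.
odds_ratio = math.exp(beta_2l)

def probit_inv(eta):
    # Standard normal cdf (inverse probit link).
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

# Probit link: no odds interpretation; evaluate the change in the
# probability of a positive outcome for a 1-unit covariate change,
# at a hypothetical value of the linear predictor.
eta = 0.2
marginal_effect = probit_inv(eta + beta_2l) - probit_inv(eta)
```

Note that the probit marginal effect depends on where the linear predictor currently sits, whereas the logit odds ratio does not.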
The proposed model has d = dim(β) = q_1 + q_2 + 2 unknown quantities to be estimated. A fully Bayesian approach will be considered for parameter estimation and associated inference. The next section is dedicated to presenting the details of such an approach.

Inference
In this section, we address the problem of estimating and making inferences about the proposed model from a fully Bayesian perspective. Firstly, we derive the model likelihood function, and then, a suitable set of prior distributions is considered to obtain a computationally tractable posterior density for the vector β. Beyond the primary distributional assumption that Y i |β ∼ Z MP S(µ i , ω i ) holds for all i, here we also assume that the outcomes for different subjects are unconditionally independent.
Let Y be a discrete random variable assuming values on N_0. Suppose that a random experiment is carried out n times independently and, subject to x_ki for each i, a vector y = (y_1, ..., y_n) of observed values from Y is obtained. Considering the model formulation (5), the likelihood function of β can be written in product form, and the corresponding log-likelihood function decomposes as

ℓ(β; y) = ℓ_1(β_1; y) + ℓ_2(β_2; y). (9)
In this work, we will consider a log-linear model for the parameter µ_i, that is, g_1(µ_i) = log(µ_i) = x_1i β_1. The choice of g_2 is left open, and the notation ω_i = g_2^(−1)(x_2i β_2) will be used when necessary. From Equation (9), one can easily notice that the vectors β_1 and β_2 are orthogonal and that ℓ_1 depends only on the positive values of y. In this way, the log-likelihood function of β_1 takes the form given in (10), where J_1 = {j : y_j > 0, y_j ∈ y} is the finite set of indexes of the positive observations of y. Adopting this setup is equivalent to assuming that each positive element of y comes from a ZTPS distribution. Here, we extend the known fact that estimating the Poisson parameter θ via the zero-truncated Poisson (ZTP) distribution entails a loss of inferential efficiency when there is no zero modification [35,37]. Now, the log-likelihood function of β_2 can be written as in (11), where J_2 = {j : y_j = 0, y_j ∈ y} is the finite set of indexes of the zero-valued observations of y.
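The orthogonal decomposition ℓ = ℓ_1 + ℓ_2 can be sketched as follows; a zero-truncated Poisson log-pmf stands in for the ZTPS term of (10), since only the index sets J_1 and J_2 matter for the split. All names and values are illustrative.

```python
import math

def loglik_split(y, eta1, eta2):
    """Split the hurdle log-likelihood into l1 (positive part, set J1)
    and l2 (binary part, set J2). eta1, eta2 are the linear predictors;
    a zero-truncated Poisson stands in for the ZTPS pmf."""
    l1 = l2 = 0.0
    for yi, e1, e2 in zip(y, eta1, eta2):
        omega = 1.0 / (1.0 + math.exp(-e2))      # logit link for omega
        if yi == 0:
            l2 += math.log(1.0 - omega)          # contribution of J2
        else:
            mu = math.exp(e1)                    # log link for mu
            # Zero-truncated Poisson log-pmf (ZTPS stand-in):
            l1 += (yi * math.log(mu) - mu
                   - math.lgamma(yi + 1)
                   - math.log(1.0 - math.exp(-mu)))
            l2 += math.log(omega)                # contribution of J1
    return l1, l2

y = [0, 2, 0, 1, 3]
eta1 = [0.1, 0.3, 0.0, 0.2, 0.5]
eta2 = [-0.2, 0.4, 0.1, 0.3, 0.6]
l1, l2 = loglik_split(y, eta1, eta2)
```

Note that ℓ_1 is completely unaffected by the value of the µ-predictor at the zero-valued observations, which is precisely the orthogonality exploited in the text.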

Prior Distributions
The g-prior [47] is a popular choice among Bayesian users of the multiple linear regression model, mainly because it provides a closed-form posterior distribution for the regression coefficients. The g-prior is classified as an objective prior method that uses the inverse of the Fisher information matrix, up to a scalar variance factor, to obtain the prior correlation structure of the multivariate normal distribution. Such a specification is quite attractive since the Fisher information plays a major role in determining large-sample covariance in both Bayesian and classical inference.
The problem of eliciting conjugate priors for a GLM was addressed by [51]. Their approach can be considered a generalization of the original g-prior method; still, its application is restricted to the class of GLMs, since the proposed prior does not have a closed form for non-normal exponential families. Alternatively, ref. [52] proposed the information matrix prior as a way to assess the prior correlation structure between the coefficients, not including the intercept, since the regression matrix is centered so as to ensure that β_0 is orthogonal to the other coefficients. This method uses the Fisher information similarly to a precision matrix whose elements are shrunken by a fixed variance factor. However, the authors pointed out that such a class of priors can only be considered Gaussian if the Fisher information matrix does not depend on the vector β = (β_1, ..., β_q). In this vein, ref. [53] considered a similar approach when proposing a class of hyper-g priors for GLMs, where the precision matrix is evaluated at the prior mode, hence obtaining an information matrix that is β-free.
The formal concept behind the information matrix prior is closely related to the unit information prior [54], whose main idea is that the amount of information provided by a prior distribution should equal the amount of information contained in a single observation. Such an idea can be applied to the previously mentioned approaches by simply considering the total sample size (n) as the variance factor. Ref. [52] also considered fixed values for the scalar variance factor. On the other hand, some works, including [53,55,56], do consider prior elicitation and inference procedures for the variance scale factor. Here, we will adopt a methodology based on the unit information prior idea combined with the "noninformative g-prior" proposed by [57] for binary regression models. Based on such an approach, it is possible to obtain a quite simple prior distribution for the fixed effects of the proposed model. It is worthwhile to mention that, in cases where X_k is rank deficient (n < q_k + 1) or contains collinear covariates, it is highly advisable to compute the generalized inverse of X_k'X_k; otherwise, the prior covariance matrix of β_k may not be defined.
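A minimal sketch of the resulting prior covariance, assuming it takes the unit-information form n(X_k'X_k)^(−1) suggested by the text, with the generalized (Moore-Penrose) inverse guarding against rank deficiency or collinearity:

```python
import numpy as np

def gprior_cov(X):
    """Unit-information g-prior covariance for the fixed effects,
    assumed here to be n * (X'X)^+ as the text suggests. The
    pseudo-inverse keeps the matrix defined when X is rank
    deficient or contains collinear columns."""
    n = X.shape[0]
    return n * np.linalg.pinv(X.T @ X)

rng = np.random.default_rng(0)
# Intercept column plus one uniform covariate, as in the paper's setup.
X = np.column_stack([np.ones(50), rng.uniform(size=50)])
Sigma = gprior_cov(X)
```

With a full-rank X the pseudo-inverse coincides with the ordinary inverse, so nothing is lost in the regular case.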
Analogously to Marin and Robert's approach, we do not consider centered regression matrices in the prior specification. Hence, we are able to include β_10 in the proposed g-prior, but in this case the intercept is a priori correlated with the other coefficients (β_11, ..., β_1q_1). The same applies to β_20 and the vector (β_21, ..., β_2q_2).

Posterior Distributions and Estimation
Considering the outlined structure for the ZMPS regression model, the unnormalized joint posterior distribution of the unknown vector β is given by

π(β; y) ∝ exp{ℓ(β; y)}π(β). (12)
However, since β_1 and β_2 are orthogonal, we have that π_1(β_1; y) ∝ exp{ℓ_1(β_1; y)}π_1(β_1) and π_2(β_2; y) ∝ exp{ℓ_2(β_2; y)}π_2(β_2), where ℓ_1 and ℓ_2 are given by (10) and (11), respectively. Naturally, in the discrete setting, the use of proper (Gaussian) priors prevents π_1 and π_2 from being improper. From the Bayesian point of view, inferences for the elements of β_k can be obtained from their marginal posterior distributions. However, deriving analytical expressions for these densities is infeasible, mainly due to the complexity of the associated log-likelihood function. In this case, to make inferences for β_k, we must resort to a suitable iterative procedure to draw pseudo-random samples from the posterior densities. Hence, aiming to generate chains of size N for β_k, we will adopt the well-known random-walk Metropolis (RwM) algorithm [58,59]. For the posterior densities in (13), we consider multivariate normal distributions as the proposal (candidate-generating) densities in the algorithm. These distributions are used as the main terms in the transition kernels when computing the acceptance probabilities. Hence, at any state t > 0, the MCMC simulation is performed by proposing a candidate ψ_k for β_k whose mean is the current state of the chain shrunken by the factor ν = n(n + 1)^(−1). One can notice that the transitions depend on the acceptance of pseudo-random vectors generated with mean given by the actual state of the chain, shrunken by the factor ν. Besides, at any state t > 0, the covariance matrix of the candidate vector ψ_k can be approximated numerically by evaluating S_k = H_k^(−1), the inverse of the corresponding Hessian matrix. The procedure to generate pseudo-random samples from the approximate posterior distribution of β is summarized in Algorithm A1 (see Appendix A). To run it, one has to specify the size of the chains to be generated (N) and the initial state vectors β_1^(0) and β_2^(0) beforehand.
For a specific asymptotic Gaussian setting, ref. [59] showed that the optimal acceptance rate should be around 45% for 1-dimensional problems and asymptotically approaches 23.40% in higher-dimensional problems. We consider acceptance rates varying between 23.40% and 32% quite reasonable, since the proposed model will generally have at least four parameters to be estimated. Indeed, the higher the value of n, the lower the acceptance rate in the RwM algorithm, which results in lower variability of the estimates.
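A compact Python sketch of the RwM step described above, with the shrunken-mean proposal ψ_k ~ N(νβ_k^(t), S_k). Because the shrunken mean makes the proposal non-symmetric, the sketch includes the Hastings correction in the acceptance ratio; the bivariate normal target below is a toy stand-in for the actual posterior densities in (13).

```python
import numpy as np

def rwm(log_post, beta0, S, N, nu, seed=1):
    """Random-walk Metropolis with a shrunken-mean Gaussian proposal
    psi ~ N(nu * beta_t, S), nu = n/(n + 1), as described in the text."""
    rng = np.random.default_rng(seed)
    Sinv = np.linalg.inv(S)
    L = np.linalg.cholesky(S)

    def log_q(x, mean):                 # log N(x; mean, S), up to a constant
        r = x - mean
        return -0.5 * r @ Sinv @ r

    beta = np.asarray(beta0, dtype=float)
    lp = log_post(beta)
    chain, accepted = [], 0
    for _ in range(N):
        psi = nu * beta + L @ rng.standard_normal(beta.size)
        lp_psi = log_post(psi)
        # Hastings ratio: the shrunken-mean proposal is not symmetric.
        log_alpha = (lp_psi - lp) + log_q(beta, nu * psi) - log_q(psi, nu * beta)
        if np.log(rng.uniform()) < log_alpha:
            beta, lp = psi, lp_psi
            accepted += 1
        chain.append(beta.copy())
    return np.array(chain), accepted / N

# Toy target: a standard bivariate normal "posterior".
n = 200
chain, rate = rwm(lambda b: -0.5 * b @ b, np.zeros(2),
                  0.5 * np.eye(2), N=2000, nu=n / (n + 1.0))
```

With ν close to 1 (large n), the proposal is nearly a symmetric random walk, which is why the classical acceptance-rate guidelines quoted above remain a useful target.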
The convergence of the simulated sequences can be monitored by using trace and autocorrelation plots, the run-length control method with a half-width test [60], the Geweke z-score diagnostic [61], and the Brooks-Gelman-Rubin scale-reduction statistic [62]. After diagnosing convergence, some samples can be discarded as burn-in. The strategy to decrease the correlation between and within the generated chains is based on taking thinned steps, so the final sample is supposed to have size M < N for each parameter. A full descriptive summary of the posterior distribution (12) can be obtained through Monte Carlo (MC) estimators using the sequence {β^(t)}, t = 1, ..., M. We choose the posterior expected value (14) as the Bayesian point estimator, which is also known as the minimum mean squared error estimator.
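A naive version of the Geweke diagnostic can be coded in a few lines; the original diagnostic uses spectral-density variance estimates, while this sketch uses plain sample variances, which is only adequate for a thinned, weakly autocorrelated chain.

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Naive Geweke z-score: compare the mean of the first 10% of the
    chain with the mean of the last 50%. Assumes near-independent draws
    (plain variances instead of spectral-density estimates)."""
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    a = chain[: int(first * n)]
    b = chain[-int(last * n):]
    var = a.var(ddof=1) / a.size + b.var(ddof=1) / b.size
    return (a.mean() - b.mean()) / np.sqrt(var)

# For a stationary chain the score should behave like a standard normal.
rng = np.random.default_rng(42)
z = geweke_z(rng.standard_normal(4000))
```

Large |z| values (say, above 2) suggest that the early and late parts of the chain have not reached the same stationary distribution.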
In the next section, we discuss the results of the Monte Carlo simulation studies performed to assess the performance of the proposed Bayesian methodology. In Section 5, the proposed model's usefulness and competitiveness are illustrated using a real dataset. All computations were performed in the R environment [63]. The executable scripts were made available at the publisher's website.

Posterior Predictive Distribution
In a Bayesian context, the posterior predictive distribution (ppd) is defined as the distribution of possible future (unobserved) values conditioned on the observed ones. Under the ZMPS distribution, the pmf of any observation w ∈ N_0 (subject to the vectors x_1w and x_2w of covariates) is given by the hurdle pmf in (3) evaluated at µ_w and ω_w, where δ_w = 1 if w = 0 and δ_w = 0 otherwise. Noticeably, the ppd has no closed form available, and therefore an MC estimator for this quantity is given in Equation (15). From Equation (15), one can easily estimate, for example, the posterior probability of Y = 0 (subject to x_10 and x_20).
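The MC estimator of the ppd simply averages the hurdle pmf over the M retained posterior draws. A sketch, again with a zero-truncated Poisson standing in for the ZTPS part and hypothetical draws (µ_t, ω_t):

```python
import math

def hurdle_pmf(y, mu, omega):
    # Hurdle pmf; a zero-truncated Poisson stands in for the ZTPS part.
    if y == 0:
        return 1.0 - omega
    p = math.exp(-mu) * mu**y / math.factorial(y)
    return omega * p / (1.0 - math.exp(-mu))

def ppd_estimate(y, draws):
    """MC estimator of the ppd at y: average the pmf over the M
    retained posterior draws (mu_t, omega_t), as in Equation (15)."""
    return sum(hurdle_pmf(y, mu, om) for mu, om in draws) / len(draws)

# Hypothetical posterior draws for a single covariate profile:
draws = [(1.8, 0.60), (2.2, 0.70), (2.0, 0.65)]
p0_hat = ppd_estimate(0, draws)   # estimated posterior probability of Y = 0
```

Since the hurdle pmf at zero is exactly 1 − ω, the estimated zero probability reduces to one minus the posterior mean of ω for that covariate profile.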

Simulation Study
The empirical properties of an estimator can be assessed through Monte Carlo simulations. In this way, we performed an intensive simulation study aiming to validate the Bayesian approach in some specific situations. The simulation process was carried out by generating 500 pseudo-random samples of sizes n = 50, 100, 200, and 500 of a variable Y following a ZMPS distribution under the regression framework presented in Section 2. For the whole process, we considered an n × 2 regression matrix X_1 = (1_n, X_1,n×1), in which X_1,n×1 is a vector containing n values generated from a uniform distribution on the unit interval. Here, we fixed X_2 = X_1. Moreover, we assigned different values to the vectors β_1 = (β_10, β_11) and β_2 = (β_20, β_21) in order to generate both zero-inflated and zero-deflated artificial samples. The logarithm link function was considered for g_1. For g_2, we considered the link functions (6)-(8) as a way to evaluate how these different specifications affect the estimation of β.
Algorithm A2 (see Appendix A) can be used to generate a single pseudo-random realization from the ZMPS distribution in the regression framework with a U(0, 1) covariate for µ and ω. The extension to more covariates is straightforward. The process to generate a pseudo-random sample of size n consists of running the algorithm as often as necessary, say n* times (n* ≥ n). The sequential search is a black-box algorithm and works with any computable probability vector. The main advantage of such a procedure is its simplicity. On the other hand, sequential search algorithms may be slow, as the while-loop may have to be repeated very often. More details about this algorithm can be found in [64].
Under the ZMPS distribution, the expected number of iterations (NI), that is, the expected number of comparisons in the while condition, is available in closed form, where h(µ) is given by Equation (2). We considered four scenarios for each kind of zero modification. Table 1 presents the true parameter values considered in our study. For the zero-inflated (zero-deflated) case, the samples were generated from the ZMPS distribution under the corresponding configurations. Here, the regression coefficients were chosen by taking into account that zero-inflated (zero-deflated) samples naturally have a proportion of zeros greater (lower) than expected under an ordinary count distribution; therefore, the variable Y_i (i = 1, ..., n) was generated with mean far from zero (close to zero). Table 1 also presents the range of the parameters µ_i and p_i in each scenario. The bounds were obtained by evaluating the linear predictors β_10 + β_11 x and β_20 + β_21 x at x = 0 and x = 1 (the limit values of the adopted covariate). Scenarios 1 and 2 of the zero-inflated case were considered to illustrate the behavior of the Bayesian estimators when the proposed model is used to fit (right) long-tailed count data. To apply the proposed Bayesian approach to each scenario, we considered the RwM algorithm for MCMC sampling. For each generated sample, a chain with N = 50,000 values was generated for each parameter, considering a burn-in period of 20% of the chain size. To obtain pseudo-independent samples from the posterior distributions given in (13), one out of every 10 generated values was kept, resulting in chains of size M = 4000 for each parameter. The stationarity of the remaining chains was verified using trace plots and Geweke's z-score diagnostic. When running the simulations, the acceptance rates ranged between 23.40% and 32%.
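The sequential-search generator at the core of Algorithm A2 can be sketched as below: draw u ~ U(0, 1) and accumulate pmf mass until the cdf exceeds u. A zero-truncated Poisson again stands in for the ZTPS part, so the pmf is illustrative.

```python
import math
import random

def hurdle_pmf(y, mu, omega):
    # Hurdle pmf; a zero-truncated Poisson stands in for the ZTPS part.
    if y == 0:
        return 1.0 - omega
    p = math.exp(-mu) * mu**y / math.factorial(y)
    return omega * p / (1.0 - math.exp(-mu))

def sequential_search(pmf, u):
    # Inversion by sequential search: walk the support, accumulating
    # pmf mass, until the cdf first reaches u.
    y, cdf = 0, pmf(0)
    while u > cdf:
        y += 1
        cdf += pmf(y)
    return y

random.seed(7)
pmf = lambda y: hurdle_pmf(y, mu=2.0, omega=0.5)
sample = [sequential_search(pmf, random.random()) for _ in range(2000)]
```

The number of comparisons per draw grows with the generated value, which is why the expected NI mentioned above governs the cost of the simulation.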
The posterior mean (14) was considered as the Bayesian point estimator, and its performance was studied by assessing its bias (B), its mean squared error (MSE), and its mean absolute percentage error (MAPE). Besides, the coverage probability (CP) of the 95% highest posterior density intervals (HPDIs) was also estimated.
Using the generated samples and letting γ = β_10, β_11, β_20, or β_21, the MC estimators of these measures are

B(γ̂) = (1/500) Σ_j (γ̂_j − γ), MSE(γ̂) = (1/500) Σ_j (γ̂_j − γ)^2, MAPE(γ̂) = (100/500) Σ_j |γ̂_j − γ|/|γ|,

where γ̂_j denotes the point estimate obtained from the j-th generated sample (j = 1, ..., 500). The variance of γ̂ was estimated as the difference between the MSE and the square of the bias. Moreover, the CP of the HPDIs was estimated by

CP(γ) = (1/500) Σ_j δ_j(γ),

where δ_j(γ) assumes 1 if the j-th HPDI contains the true value γ and 0 otherwise. We also estimated the below-noncoverage probability (BNCP) and the above-noncoverage probability (ANCP) of the HPDIs. These measures are computed analogously to the CP. The BNCP and ANCP may be useful for detecting asymmetric behaviors, as they provide the probabilities of finding the actual value of γ on the tails of its posterior distribution. Due to their massive amount, the obtained results were made available as supplementary material on the publisher's website. In our study, we noticed that, as expected, the parameter estimates became more accurate with increasing sample sizes, since the estimated biases and mean squared errors decreased considerably as n increased. The ratio between the mean squared error and the estimated variance approaches 1 as n increases. Although high MAPE values were obtained for some parameters (when using small sample sizes), this does not compromise the overall estimation accuracy. For example, when n = 100, we obtained an estimated MAPE of approximately 56% for β_11 (see Table S25, Scenario S1, Supplementary Material). Taking into account the true value of this parameter (1.00), the estimates for β_11 ranged mostly between 0.44 and 1.56, which does not represent a significant impact on the estimated mean (µ). When (right) long-tailed count data are available, the CP of the HPDI for β_11 is considerably lower than the adopted nominal level (for small sample sizes), as its posterior distribution tends to be more asymmetric towards higher values of the parameter space.
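These performance measures are straightforward to compute from the 500 replicated estimates; a Python sketch with toy inputs:

```python
import numpy as np

def mc_metrics(estimates, truth):
    """Bias, MSE, MAPE (in %) and variance of a point estimator from
    replicated simulations, using the decomposition Var = MSE - B^2."""
    e = np.asarray(estimates, dtype=float)
    bias = (e - truth).mean()
    mse = ((e - truth) ** 2).mean()
    mape = 100.0 * (np.abs(e - truth) / abs(truth)).mean()
    var = mse - bias ** 2
    return bias, mse, mape, var

def coverage(lowers, uppers, truth):
    # CP: fraction of interval estimates containing the true value.
    inside = [lo <= truth <= up for lo, up in zip(lowers, uppers)]
    return sum(inside) / len(inside)

# Toy replicated estimates of a parameter whose true value is 1.0:
b, m, mp, v = mc_metrics([1.1, 0.9, 1.0, 1.2, 0.8], 1.0)
```

The BNCP and ANCP would be computed the same way as `coverage`, counting instead the intervals lying entirely above or entirely below the true value.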
However, we observed that the estimated CP of the HPDIs converges to 95% in both zero-modified cases, and the posterior distributions become more symmetric with increasing sample sizes.
Considering the predefined scenarios, we conclude that our simulation study provides favorable indications of the suitability of the adopted Bayesian methodology for estimating the parameters of the proposed model. We believe that, in a similar procedure with a different set of actual values, the overall behavior of the estimators should resemble the results described here. Besides, the adopted methodology would also be reliable if one or more covariates (possibly of a different nature) were included in the linear predictors of µ_i and ω_i.

Chromosomal Aberration Data Analysis
In this section, the ZMPS regression model is considered for analyzing a real dataset obtained from a cytogenetic dosimetry experiment first presented by [65]. In this study, the response variable is the number of cytogenetic chromosomal aberrations after the DNA molecule is treated with induced radiation. The dataset was obtained by irradiating five blood samples from a healthy donor with different doses x_i (i = 1, ..., 5), ranging between 0.1 and 1.0 Gy, of 2.1 MeV neutrons at three different culture times (48 h, 56 h, and 72 h), considering partial-body exposure to densely ionizing radiation. Then, n_i cells were examined in each irradiated sample, and the number of dicentric and centric-ring aberrations y_ij (j = 1, ..., n_i) was recorded.
While [65] used a t-test to analyze whether the averages of the relative frequencies of dicentrics plus centric-ring aberrations differed significantly between the three culture times, we are primarily interested in evaluating whether the average numbers of dicentrics plus centric-ring aberrations differ significantly between doses of ionizing radiation, considering data from the 72 h culture time.
The frequency distribution of the collected data is available in Table 2, along with some descriptive statistics. From the observed dataset, there is evidence that the response variable is slightly overdispersed, since ȳ = 0.131 < s^2 = 0.210 and s^2/ȳ = 1.607. Additionally, the number of aberrations appears to be heavily zero-inflated, as shown in the left panel of Figure 1. On the other hand, one can notice that, as the dose of ionizing radiation increases, the number of observed zeros decreases; still, the distribution becomes more overdispersed, since the number of aberrations naturally increases. According to [32], when considering higher linear energy transfer radiations, the incidence of chromosomal aberrations becomes a linear function of the dose, because the more densely ionizing nature of the radiation leads to a "one track" distribution of damage. Such an aspect can be seen in the right panel of Figure 1, which highlights the linear behavior between the average number of aberrations and the doses. In this way, our assumption is that Y_ij|x_i ~ ZMPS(µ_ij, ω_ij), where the parameters µ_ij and ω_ij are specified as linear dose models, that is, log(µ_ij) = β_10 + β_11 x_i and g_2(ω_ij) = β_20 + β_21 x_i.
To fit the ZMPS regression model with dose as the only covariate, we adopted the same procedure used in the previous section. The link function (7) was chosen to relate ω_ij to the linear predictor β_20 + β_21 x_i, and so we have a probit hurdle regression model. In this framework, the coefficient β_11 represents the effect of the dose of ionizing radiation on the expected count µ_ij when Y_ij > 0, and β_21 indicates the effect of the dose on the probability of aberrations occurring. We considered the RwM for MCMC sampling, generating a chain of size N = 50,000 for each parameter, whereby the first 10,000 values were discarded as burn-in. The stationarity of the chains was verified using the Geweke z-score diagnostic of convergence. To obtain pseudo-independent samples from the posterior distributions given in (13), we kept one value out of every 10 generated, resulting in chains of size M = 4000 for each parameter. Table 3 presents the posterior parameter estimates and 95% HPDIs from the fitted ZMPS model. When obtaining the MCMC samples, the acceptance rate in the RwM algorithm was approximately 32%. Besides, we computed the number of effectively pseudo-independent draws, that is, the effective sample size (ESS), for each parameter. Figures 2 and 3 depict the chains' histories (trace plots) and the marginal posterior distributions of the regression coefficients. The normality assumption for the generated chains is quite reasonable, even with slight tails on the estimated densities. Additionally, there is evidence of symmetry, since the posterior means and medians are very close to each other. For each parameter, the ESS was estimated at approximately half of M, indicating a good mixing of the generated chains without computational waste. A sensitivity analysis to verify the existence of influential points is presented in Figure 4.
We estimated all divergence measures presented in Table A1 but, since the obtained results led to the same conclusions, we report only the KL and H divergences and their calibrations for each observation. Even under the very conservative rule of treating an observation whose distance has a calibration exceeding 0.65 as an influential point, we found no evidence that any observation significantly influenced the estimation of any coefficient of the ZMPS regression model. For comparison purposes, identical Bayesian procedures were adopted to fit the P, NB, PS, ZMP and ZMNB regression models. To estimate the fixed dispersion parameter ($\phi$) of the NB and ZMNB models, we considered a noninformative inverse-gamma prior distribution with hyperparameters a = b = 1.0. For each fitted model, we estimated the measures presented in Appendix C. The model comparison procedure is summarized in Table 4. One can notice that the zero-modified models performed considerably better, with the ZMPS outperforming all others. These results highlight that the proposed model is highly competitive with well-established models in the literature. This feature can be considered one of the most relevant achievements of the ZMPS model, since it handles the positive observations using fewer parameters than, for example, the ZMNB model.
In Table 4, we also report the Bayesian p-values as a way to evaluate the adequacy of the fitted models. As expected, the P model is unsuitable for describing the considered dataset, and the fit provided by the NB regression model is also highly questionable. For the zero-modified models, there is no indication of overall lack-of-fit, since the posterior values of $p_B$ were estimated close to 0.50. Figure 5 depicts additional evidence based on the RQRs for validating the fitted ZMPS regression. This residual metric was computed as discussed in Appendix D, using Equation (4). One can notice that the normality assumption for the residuals is easily verified by the behavior of their frequency distribution (left panel). Additionally, the half-normal probability plot indicates that the fit of the ZMPS model was very satisfactory, since all estimated residuals lie within the simulated envelope (right panel). From Table 3, one can draw some conclusions. Firstly, we observed that the HPDIs of parameters $\beta_{11}$ and $\beta_{21}$ do not contain the value zero, which establishes the dose of ionizing radiation as a relevant covariate for describing the average number of chromosomal aberrations as well as the probability of not observing at least one aberration ($p_0$). For example, the expected number of dicentrics and centric rings in a cell that was exposed to 1.0 Gy is 0.363, and the probability of such aberrations not occurring is $\hat{p}_0 = \Phi(1.790 - 0.319) = 0.929$. Therefore, based on the posterior estimates, the components of the fitted ZMPS model can be expressed as functions of $x_i$, the dose of ionizing radiation. Figure 6 presents the Bayesian estimates, by dose, for the probability of not observing at least one aberration (left panel) and for parameter p (right panel). Noticeably, the inferences about parameter p confirm the initial assumption that the analyzed sample has an excessive amount of zeros.
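Using the point estimates quoted above ($\hat{\beta}_{20} = 1.790$, $\hat{\beta}_{21} = -0.319$), the fitted probit component can be sketched as follows; the standard normal cdf is evaluated via the error function.

```python
import math

def probit_p0(dose, b20=1.790, b21=-0.319):
    """Fitted probability of observing no aberrations:
    p0 = Phi(b20 + b21 * dose), with the coefficients quoted in the text."""
    z = b20 + b21 * dose
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal cdf

# At 1.0 Gy: p0 = Phi(1.790 - 0.319) = Phi(1.471) ~ 0.929, as reported
assert abs(probit_p0(1.0) - 0.929) < 1e-3
# p0 decreases with dose, matching the observed drop in the number of zeros
assert probit_p0(4.0) < probit_p0(1.0)
```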
Table 5 presents a general posterior summary of the models that were fitted to the chromosomal aberration data. Here, parameter $\lambda$ was estimated as $\hat{\lambda} = n^{-1} \sum_{i=1}^{5} \sum_{j=1}^{n_i} \hat{\lambda}_{ij}$, and $\varsigma^2$ was estimated analogously. One can notice that the expected numbers of zeros ($n_0$) obtained by the P, NB and PS models are slightly lower than the observed $n_0$, while those provided by the zero-modified models are very close (or exactly equal) to 5252. Through these measures, one can better understand how the fitted models adhere to the data, since the nature of the observed counts should be well described regarding its overdispersion level and the frequency and average number of nonzero observations.
The goodness-of-fit of the fitted models can be evaluated by the $\chi^2$ statistic obtained from the observed and expected frequencies. To compute this measure, we grouped cells with frequencies lower than or equal to 5, resulting in 4 degrees of freedom. The obtained statistics are also presented in Table 5. Figure 7 depicts the positive expected frequencies (left panel) and the dose-response curves (right panel) that were estimated by the zero-modified models. Noticeably, the zero-modified models describe the data's behavior much better, especially the ZMNB and ZMPS distributions. From the obtained results, one can conclude that, despite the suitable fit provided by the ZMNB regression model, the proposed model adhered better to the chromosomal aberration data. This achievement can be regarded as extremely relevant, since the ZMNB model has an additional (dispersion) parameter to handle the nonzero observations. In contrast, the proposed model proved highly competitive through its ability to accommodate the data's overdispersion and zero modification using fewer parameters.
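The cell-grouping step behind the $\chi^2$ statistic can be sketched as below. The frequencies are hypothetical, and the merging rule (pool consecutive tail cells until the pooled expectation exceeds the cutoff) is one common convention, not necessarily the exact grouping used for Table 5.

```python
def grouped_chi_square(observed, expected, min_expected=5.0):
    """Pearson chi-square after pooling consecutive cells whose expected
    frequencies are too small (illustrative pooling rule)."""
    groups, obs_acc, exp_acc = [], 0.0, 0.0
    for o, e in zip(observed, expected):
        obs_acc += o
        exp_acc += e
        if exp_acc > min_expected:       # pooled cell is now large enough
            groups.append((obs_acc, exp_acc))
            obs_acc = exp_acc = 0.0
    if exp_acc > 0:                      # merge any leftover tail into the last group
        last_o, last_e = groups.pop()
        groups.append((last_o + obs_acc, last_e + exp_acc))
    chi2 = sum((o - e) ** 2 / e for o, e in groups)
    return chi2, len(groups)

# Hypothetical observed/expected frequencies for counts 0, 1, 2, ...
chi2, k = grouped_chi_square([50, 30, 12, 5, 2, 1],
                             [48.0, 31.0, 13.0, 4.5, 2.3, 1.2])
```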

Concluding Remarks
This work aimed to introduce the ZMPS regression model as an alternative for the analysis of overdispersed datasets exhibiting zero modification in the presence of covariates. Intensive Monte Carlo simulation studies were performed, and the obtained results allowed us to assess the empirical properties of the Bayesian estimators and to conclude on the suitability of the adopted methodology for the predefined scenarios. The proposed model was considered for analyzing a real dataset on the number of cytogenetic chromosomal aberrations, with the dose of ionizing radiation as the covariate for both model components. The response variable was identified as overdispersed and heavily zero-inflated, which justified using the ZMPS regression model. The main conclusion one can draw from the fitted models is that the dose is statistically relevant for describing both the probability of occurrence and the average incidence of aberrations. Besides, when looking at the $\chi^2$ statistic and the posterior-based comparison criteria, we noticed that the proposed model presented a better fit than its competitors and, therefore, it can be considered an excellent addition to the set of models that can be used for the analysis of overdispersed and zero-modified count data.

Acknowledgments: We would like to thank the associate editor and the four anonymous referees for their careful reading and thoughtful suggestions, which certainly improved this work's content.

Conflicts of Interest:
The authors declare that they have no conflict of interest.

Appendix B. Influential Points
Identifying influential observations is a crucial step in any statistical analysis. Usually, the presence of influential points considerably impacts the inferential procedures and the subsequent conclusions. In this way, this appendix is dedicated to presenting some case-deletion Bayesian diagnostic measures that can be used to quantify the influence of individual observations in a given dataset.
The computation of divergence measures between posterior distributions is a useful way to quantify influence. According to [66], the $\phi$-divergence measure between two densities f and g for $\theta \in \Theta$ is defined by $d_\phi(f, g) = \int_{\Theta} \phi\left( f(\theta)/g(\theta) \right) g(\theta) \, d\theta$, where $\phi$ is a smooth convex, lower semicontinuous function such that $\phi(1) = 0$. Some popular divergence measures can be obtained by choosing specific functions for $\phi$. The well-known Kullback-Leibler (KL) divergence is obtained by considering $\phi(z) = -\log(z)$. A symmetric version of the KL divergence, the Jeffreys (J) divergence, can be obtained by specifying $\phi(z) = (z - 1)\log(z)$, and the variational divergence ($L_1$ norm) is obtained when $\phi(z) = 0.50|z - 1|$. In addition, the Chi-square (CS) divergence is obtained by considering $\phi(z) = (z - 1)^2$, and the Hellinger (H) distance arises when $\phi(z) = 0.50(\sqrt{z} - 1)^2$. We refer to [67] for a detailed study of several types of $\phi$-divergence.
Let $f(\beta) = \pi(\beta; y)$ be the joint posterior distribution of $\beta$ based on the complete data and let $g(\beta) = \pi(\beta; y_{-i})$, where $y_{-i} = (y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)$ is the response vector without the i-th observation. After some algebra (see [68] for the KL divergence case), one can verify that, for the i-th observation, the $\phi$-divergence can be written in terms of the conditional predictive ordinate ($CPO_i$) [69]. Here, we are again not able to compute the inner expectation over $\beta$ analytically, and so an MC estimator for $CPO_i$ is given by the harmonic mean estimator (A1). According to [70], the harmonic mean estimator (A1) is stable when most of the individual log-likelihood values exceed $-10$. Using the estimated CPO, one can approximate the local influence, $\hat{d}_i$, of a particular $y_i$ on the joint posterior distribution (12). One can notice that, if $\pi(\beta; y_{-i}) = \pi(\beta; y)$, then there is no divergence caused by observation $y_i$. In practice, however, it may not be elementary to define a threshold value for the divergence in order to decide on the magnitude of the influence [71]. A measure of calibration for the KL divergence was proposed by [72]. The idea is based on the typical toy binary experiment of tossing a coin once and observing its upper face. This experiment can be described by $P(Y = y; \rho) = \rho^y (1 - \rho)^{1-y}$, $y \in \{0, 1\}$, where $\rho \in [0, 1]$ is the probability of success. Regardless of what success means, if the coin is unbiased, then $P(Y = y; \rho) = 0.50$. Thus, the $\phi$-divergence between a (possibly) biased and an unbiased coin, denoted $d_\phi(\rho)$, can be computed, from which one can conclude that the divergence between two posterior distributions can be associated with the biasedness of a coin [67]. By analogy, this implies that predicting unobserved responses using $\pi(\beta; y_{-i})$ instead of $\pi(\beta; y)$ is equivalent to describing an unobserved event as having probability $\rho_i$ when the correct probability is 0.50. Considering some specific choices for $\phi$, in Table A1 we present MC estimators that can be used to compute the local influence of each $y_i$.
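The harmonic-mean form of the CPO estimator referred to above as (A1) can be sketched as follows. The per-draw likelihood values $f_i^{(t)} = P(Y_i = y_i; \beta^{(t)})$ would come from the fitted model; the numbers used here are hypothetical.

```python
def cpo_harmonic_mean(lik_draws):
    """CPO_i estimated as the harmonic mean of the likelihood evaluations
    over M posterior draws: CPO_i = [M^{-1} * sum_t 1/f_i^{(t)}]^{-1}."""
    m = len(lik_draws)
    return m / sum(1.0 / f for f in lik_draws)

# Hypothetical per-draw likelihood values for a single observation
cpo = cpo_harmonic_mean([0.2, 0.25, 0.5])
assert abs(cpo - 3 / 11) < 1e-12  # 3 / (1/0.2 + 1/0.25 + 1/0.5)
```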
Besides, we also present the expression of $d_\phi(\rho)$ for each $\phi$. For ease of notation, we assume $f_i^{(t)} = P(Y_i = y_i; \beta^{(t)})$. The function $d_\phi(\rho)$ is symmetric about 0.50 and increases as $\rho$ moves away from 0.50. In addition, $\inf_{\rho \in (0,1)} d_\phi(\rho) = 0$, which is attained at $\rho = 0.50$, since $d_\phi(0.50) = \phi(1) = 0$. Therefore, a general measure of calibration based on the $\phi$-divergence can be obtained by solving $d_\phi(\rho_\phi) = \hat{d}_i$ for $\rho_\phi$. An estimator for the calibration measure ($\rho_\phi$) associated with each type of $\phi$-divergence is also presented in Table A1. Clearly, depending on the form of $\phi$, this equation may not have a closed-form solution, which is the case for the J divergence. Besides, one can notice that $\hat{\rho}_i \in [0.50, 1]$, and so, for $\hat{\rho}_i \gg 0.50$, the i-th observation may be considered an influential point. For example, if $\rho_i > 0.80$ is considered a significant bias, then $y_i$ will be classified as influential if $\hat{d}_i > 0.223$ ($d_\phi(0.80) \approx 0.223$) under the KL divergence, or if $\hat{d}_i > 0.051$ ($d_\phi(0.80) \approx 0.051$) under the H divergence.
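The calibration thresholds quoted above can be reproduced numerically. For the coin example, the KL version takes the form $d_{KL}(\rho) = -0.5\log(4\rho(1-\rho))$, which inverts in closed form to $\rho = 0.5(1 + \sqrt{1 - e^{-2d}})$; the Hellinger version is inverted by bisection. This sketch covers only the calibration step (consistent with the quoted thresholds $d_\phi(0.80) \approx 0.223$ and $0.051$), not the full Table A1 estimators.

```python
import math

def kl_calibration(d):
    """Closed-form inverse of d_KL(rho) = -0.5 * log(4 * rho * (1 - rho))."""
    return 0.5 * (1.0 + math.sqrt(1.0 - math.exp(-2.0 * d)))

def hellinger_calibration(d, tol=1e-10):
    """Invert d_H(rho) = 0.25 * ((sqrt(2*rho) - 1)**2 + (sqrt(2 - 2*rho) - 1)**2)
    by bisection on [0.50, 1), where d_H is increasing."""
    def d_h(rho):
        return 0.25 * ((math.sqrt(2.0 * rho) - 1.0) ** 2
                       + (math.sqrt(2.0 - 2.0 * rho) - 1.0) ** 2)
    lo, hi = 0.5, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_h(mid) < d:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Reproduce the thresholds quoted in the text for rho = 0.80
assert abs(kl_calibration(0.223) - 0.80) < 5e-3
assert abs(hellinger_calibration(0.051) - 0.80) < 5e-3
```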

Appendix C. Model Comparison and Adequacy
There are several techniques for Bayesian model selection that are useful for comparing competing models. The most popular method is the deviance information criterion (DIC), which was proposed to measure model fit and complexity simultaneously. The DIC criterion is defined as $DIC = D(\hat{\beta}) + 2p_D$, where $D(\beta) = -2\ell(\beta; y)$ is the deviance function and $p_D = \overline{D(\beta)} - D(\hat{\beta})$ is the effective number of model parameters, with $\hat{\beta}$ given by (14). A negative value for $p_D$ may suggest that the log-likelihood function is non-concave, that the prior distribution is misspecified, or that the posterior expected value is not a good estimator for $\beta$. On the other hand, when $p_D \gg d$, the actual number of parameters, there is an indication of overfitting with the estimate $\hat{\beta}$.
Noticeably, we are not able to compute the expectation of $D(\beta)$ over $\beta$ analytically. In this case, an MC estimator for this measure is given by $\overline{D(\beta)} = M^{-1} \sum_{t=1}^{M} D(\beta^{(t)})$, and so the DIC can be estimated by $\widehat{DIC} = 2\overline{D(\beta)} - D(\hat{\beta})$.
The expected Akaike (EAIC) and the expected Bayesian (EBIC) information criteria can also be used when comparing Bayesian models [73,74]. Using the approximation for the expected value of $D(\beta)$, these measures can be estimated by $\widehat{EAIC} = \overline{D(\beta)} + 2d$ and $\widehat{EBIC} = \overline{D(\beta)} + d \log(n)$, where d is the number of model parameters and n is the sample size.
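Given a vector of deviance evaluations $D(\beta^{(t)})$ over the MCMC draws and the deviance at the posterior estimate, the three criteria can be assembled as below. This is a bookkeeping sketch only: the deviance values come from the fitted model, and the numbers used here are hypothetical.

```python
import math

def information_criteria(dev_draws, dev_at_mean, n_params, n_obs):
    """DIC = 2*Dbar - D(beta_hat); EAIC = Dbar + 2d; EBIC = Dbar + d*log(n)."""
    d_bar = sum(dev_draws) / len(dev_draws)  # MC estimate of E[D(beta)]
    p_d = d_bar - dev_at_mean                # effective number of parameters
    return {"pD": p_d,
            "DIC": 2.0 * d_bar - dev_at_mean,
            "EAIC": d_bar + 2.0 * n_params,
            "EBIC": d_bar + n_params * math.log(n_obs)}

# Hypothetical deviance evaluations over three MCMC draws (illustration only)
crit = information_criteria([100.0, 102.0, 104.0],
                            dev_at_mean=100.0, n_params=4, n_obs=50)
assert crit["pD"] == 2.0 and crit["DIC"] == 104.0
```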
Another widely used criterion is derived from the CPO statistic, which is based on cross-validation to compare models. For the i-th observation, the CPO can be estimated through Equation (A1). A summary statistic of the estimated CPOs is the log-marginal pseudo-likelihood (LMPL), given by the sum of the logarithms of the $\widehat{CPO}_i$'s. Regarding model comparison, the lower the values of DIC, EAIC, EBIC and NLMPL (the negative LMPL), the better the fit.
In addition to comparing models, researchers are often interested in verifying the adequacy of the fitted models. An effective way to evaluate model suitability is based on measures derived from the ppd. For instance, if any observation is extremely unlikely relative to the ppd, the adequacy of the obtained fit might be questionable. Ref. [75] proposed a widespread discrepancy measure between model and data; in our case, we need a slightly adapted version of this measure, denoted $T(y, \beta)$. The Bayesian p-value (posterior predictive p-value), proposed by [76], is defined as $p_B = P[T(y_r, \beta) \geq T(y, \beta); y]$, where $y_r$ denotes the response vector that might have been observed if the conditions generating y were reproduced. This predictive measure can be empirically estimated as the relative number of times that $T(y_r, \hat{\beta})$ exceeds $T(y, \hat{\beta})$ out of B simulations. In general, the model fit becomes suspect if the discrepancy is of practical relevance and the associated Bayesian p-value is close to either 0 or 1 [75]. A large (small) value of $p_B$, say greater than 0.95 (lower than 0.05), indicates model misspecification (lack-of-fit); that is, the observed behavior would be unlikely to be seen if we replicated the response vector using the fitted model.
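The empirical estimation of $p_B$ can be sketched as follows, using a simple Poisson model and a chi-square-type discrepancy as stand-ins for the paper's adapted measure T; both the model and the discrepancy here are illustrative assumptions.

```python
import random

def bayesian_p_value(y, lam_draws, seed=7):
    """Estimate p_B = P[T(y_rep, beta) >= T(y, beta)] over posterior draws.
    Illustrative Poisson model with a chi-square-type discrepancy."""
    rng = random.Random(seed)

    def rpois(lam):  # simple Poisson sampler (Knuth's multiplication method)
        limit, k, p = pow(2.718281828459045, -lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    def discrepancy(data, lam):  # chi-square-type measure T(data, lam)
        return sum((d - lam) ** 2 / lam for d in data)

    exceed = 0
    for lam in lam_draws:
        y_rep = [rpois(lam) for _ in y]      # replicated response vector
        if discrepancy(y_rep, lam) >= discrepancy(y, lam):
            exceed += 1
    return exceed / len(lam_draws)

# Hypothetical data and posterior draws for the Poisson mean
y = [0, 1, 0, 2, 1, 0, 3, 1, 0, 0]
p_b = bayesian_p_value(y, lam_draws=[0.7 + 0.02 * t for t in range(50)])
assert 0.0 <= p_b <= 1.0  # values near 0 or 1 would flag misspecification
```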

Appendix D. Residual Analysis
The residual analysis plays an essential role in the task of validating the results obtained from a regression model. In general, residual metrics are responsible for indicating departures from the underlying model assumptions by quantifying the portion of data variability that the fitted model is not explaining. Assessing a regression model's adequacy using residual metrics is a common practice nowadays due to the availability of statistical packages providing diagnostic tools for well-established models. However, deriving appropriate residuals is not always an easy task for non-normal models that accommodate overdispersion. In this way, we will consider a popular residual metric proposed by [77], the randomized quantile residuals (RQRs), which can be straightforwardly used in our context to assess the appropriateness of the proposed model when fitted to real data.
For obvious reasons, we focus on the definition of RQRs for discrete random variables. In this case, the RQR associated with the i-th observation is defined as $r_i = \Phi^{-1}(u_i)$, where $\Phi$ denotes the cdf of the standard normal distribution and $u_i$ is a uniform random variable defined on $(a_i, b_i]$, with $a_i = \lim_{y \uparrow y_i} F(y)$ and $b_i = F(y_i)$, where F is the cdf of the current model. In our case, we may obtain an MC estimator for the RQR as $\hat{r}_i = \Phi^{-1}(u_i)$, with $u_i \sim U(\lim_{y \uparrow y_i} \hat{F}^*(y), \hat{F}^*(y_i)]$. Here, $\hat{F}^*(y_i) \equiv F^*(y_i; \hat{\mu}_i, \hat{\omega}_i)$ is an estimate for the probability of $Y_i \leq y_i$ using cdf (4), where $\hat{\mu}_i$ and $\hat{\omega}_i$ depend on the fitted model as $\hat{\mu}_i = \exp(x_{1i} \hat{\beta}_1)$ and $\hat{\omega}_i = g_2^{-1}(x_{2i} \hat{\beta}_2)$.
The primary assumption for this metric is that $\hat{r}_i \sim N(0, 1)$ must hold, regardless of the variability of $\hat{\mu}_i$ and $\hat{\omega}_i$. In this case, after model fitting, one has to evaluate whether these residuals are normally distributed around zero, which can be done through adherence tests and graphical techniques such as histograms and half-normal probability plots. An excellent alternative for checking whether the RQRs are consistent with the fitted model is the inclusion of simulated envelopes in their half-normal plot. Thus, if a significant subset of the estimated residuals falls outside the envelope bands, then the fitted model's adequacy must be questioned, and further investigation of the corresponding observations is necessary. Ref. [78] provides an algorithm for obtaining simulated envelopes for a half-normal plot.
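Since the ZMPS cdf (4) is not reproduced here, the mechanics of the RQR can be sketched with a Poisson cdf as a stand-in for $\hat{F}^*$; under a correctly specified model, the resulting residuals are exactly standard normal.

```python
import math
import random
from statistics import NormalDist, mean

def poisson_cdf(y, lam):
    """Stand-in discrete cdf (the paper uses the ZMPS cdf (4) instead)."""
    total = 0.0
    for k in range(y + 1):
        total += math.exp(-lam) * lam ** k / math.factorial(k)
    return min(total, 1.0)

def rpois(lam, rng):
    """Simple Poisson sampler (Knuth's multiplication method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def rqr(y, lam, rng):
    """Randomized quantile residual: r = Phi^{-1}(u), u ~ U(F(y-1), F(y)]."""
    a = poisson_cdf(y - 1, lam) if y > 0 else 0.0  # left limit of the cdf at y
    b = poisson_cdf(y, lam)
    u = a + (b - a) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # guard against Phi^{-1}(0) or Phi^{-1}(1)
    return NormalDist().inv_cdf(u)

# Correctly specified toy model: data and cdf are both Poisson(0.8)
rng = random.Random(123)
lam = 0.8
residuals = [rqr(rpois(lam, rng), lam, rng) for _ in range(2000)]
# Under the true model the RQRs are exactly N(0, 1)
assert abs(mean(residuals)) < 0.15
```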