Zero-Inﬂated Generalized Linear Mixed Models: A Better Way to Understand Data Relationships

: Our article explores an underused mathematical analytical methodology in the social sciences. In addition to describing the method and its advantages, we extend a previously reported application of mixed models in a well-known database about corruption in 149 countries. The dataset in the mentioned study included a reasonable amount of zeros (13.19%) in the outcome variable, which is typical of this type of research, as well as quite a bit of social sciences research. In our paper, present detailed guidelines regarding the estimation of models where the data for the outcome variable includes an excess number of zeros, and the dataset has a natural nested structure. We believe our research is not likely to reject the hypothesis favoring the adoption of mixed modeling and the inﬂation of zeros over the original simpler framework. Instead, our results demonstrate the importance of considering random effects at country levels and the zero-inﬂated nature of the outcome variable.


Introduction
Count data models have increasingly been applied in management research, but the number of studies is still small and further development of this area is needed. As the name suggests, these models have an outcome variable that is a count variable, i.e., a quantitative variable that assumes non-negative and discrete values. Count data models also include several predictor variables, and the statistical objective is to identify and rank the predictor variables that maximize the ability to predict membership in the values of the outcome variable.
The study of Blevins et al. [1] deserves mention since the authors present an in-depth overview of count data models in management. Examples of applications involving count models are the number of times a business firm issues stocks, the number of registered patents over time, and the quantity of branches a company owns/operates. These counts model applications are usually analyzed using generalized linear modeling (GLM), as described by Almeida et al. [2] and O'Hara and Kotze [3], where estimations focus on understanding and explaining how expected counts of membership in the values of the outcome variable change as a function of predictor variables.
In a seminal work, Lambert [4] was bothered by the biases generated by count data models in the presence of a considerable amount of zeros in the outcome variable. Thus, a proposal of a GLM estimation arises that combines a logistic component, in order to identify and better model the excess of zeros (the so-called structural zeros), with a count component which aims to model count data including the remainder of zeros probabilistically considered in the theoretical distributions of this type of estimation (the so-called sample zeros).
This type of GLM is known as a zero-inflated count model. In this type of count model, as already said, the researcher models the outcome variable as a count variable with an excess number of zeros. For instance, when analyzing a firm's number of patents, one may notice that there are observations when patents were not issued. In such situations, the researcher has two alternatives to obtain model solutions: (1) use methods to deal with the excessive zeros present in the data, or (2) estimate zero-inflated models. Several approaches are available to overcome the first point-excessive zeros in your data (i.e., methods aimed at dealing with outliers). But there are clear alternatives related to the second point-estimating zero-inflated models, since this second approach considers an outcome variable with excess zeros as part of the true data-generating process [5].
However, a limitation of traditional models such as GLM is they don't allow one to study individual heterogeneities, as well as differences among groups or contexts to which these individuals belong to, or for the specification of random components in different levels (such as groups) of the analysis [6,7]. In other words, traditional GLM methods disregard the natural nesting of the observations, based on the premise that such individuals do not share the same observational context [8], despite the fact that some students can be in the same school or some people share same cultural, social or religious aspects, for instance.
On the other hand, the so-called generalized linear mixed models (GLMM)-also called multilevel models, random coefficients models, nested models, or hierarchical models-act precisely to consider those limitations of the GLM techniques. GLMM estimates, therefore, take into account the existence of dependence among observations from the same group and, consequently, generate unbiased standard errors. Additionally, if the intention is to study if predictor group-level variables interact with individual-level variables, GLMM may be the most adequate estimation [9].
In this sense, the sudy published by Hall [10] deserves mention since it expands Lambert's [4] ideas to the field of GLMM estimations, considering not only the modeling of the structural and sample zeros, but also the modeling of the count component, taking into account the latent hierarchical groupings evidenced in the dataset.
With this in mind, the objective of this article is to present detailed guidelines regarding the estimation of models where the data for the outcome variable includes an excess number of zeros and the dataset has a natural nested structure. To do so, we use as a starting point one of the well-known models proposed by Fisman and Miguel [11] and compare it with a multilevel zero-inflated negative binomial model (ZINBM) that uses the same variables, but now taking into account the natural nesting of the observations which were originally in their dataset, but were not considered.
This article fills an important gap in methods development since zero-inflated generalized linear mixed models are not yet well explored in management research (according to the best of our knowledge), as discussed in the next sections. We illustrate the main steps related to the estimation of count data, zero-inflated, and zero-inflated multilevel models, and, at the same time, we present estimates of these distinct count models based on a dataset related to corruption practices in a public organization.
The paper is structured as follows: Section 2 discusses zero-inflated and multilevel models, and presents an overview of the articles related to these topics published in the Top 10 journals in the field of strategic management. Section 3 provides an overview on count data models, such as Poisson, negative binomial (NB) with overdispersion concept and zero-inflated. Section 4 presents the generalized linear mixed models (GLMM), with focus on the estimation of the parameters. In Sections 5 and 6, we offer an illustrative dataset for estimating GLMM through the package glmmTMB available at the Comprehensive R Archive Network (CRAN; http://cran.us.r-project.org (accessed on 28 March 2021)). Section 7 presents our conclusions. Technical details on count data statistical distributions and zero-inflated models, and the R codes, are included in the Appendices A and B.

Background
Zero-inflated regression models are considered a combination of a model for count data with a model for binary data [4]. They are used to identify the reasons why a particular quantity of counts occurs regardless of the number of observed counts. Two types of zero-inflated models are typically estimated, being the first related to the zero-inflated Poisson (ZIP) model estimated from the combination of a Bernoulli distribution with a Poisson distribution, while the second is related to a zero-inflated negative binomial (ZINB) model estimated from the combination of a Bernoulli distribution with a Poisson-Gamma distribution. And one can choose between these two types taking into account existence of overdispersion in the data, i.e., analyzing if the variance of the outcome variable is statistically higher than its mean, as we will discuss in Section 3.
Research in social sciences has dedicated only limited time to the topic of zero-inflated models. A brief review of publications on the topic confirms this point. Table 1 shows the results of searches for terms related to both count models and zero-inflated models in the field of Social Sciences (scholar.google.com.br/ (accessed on 28 March 2021)). Specifically, the table displays each journal's ranking (column (2)), as well as the number of times the term "Zero-Inflated" is found in a search at each website (column (3)). Table 1 also contains the number of times the terms "Poisson", "Negative Binomial" and "Overdispersion" appear (columns (4), (5), and (6), respectively). The seventh column combines columns (4) + (5) - (6). Thus, the last column adds the number of times the terms "Poisson" and "Negative Binomial" (two of the most commonly used models in count data analysis) are found, and then subtracts the term "Overdispersion". The latter term is subtracted to avoid the problem of double counting similar citations, since some of the cited papers may apply both Poisson and NB models when data overdispersion is present. In general, therefore, the seventh column can be considered a proxy for the use of count-based models in social sciences research. The main goal of this search was to obtain initial evidence of the relative importance of zero-inflated models as compared to count-based models.
When analyzing the results displayed in the last column, one notices that most journals present values in the 5-12% range (the average ratio is 9.00%, while the median is 7.60%). At first, this informal evidence reinforces the notion that zero-inflated models are still underused in contemporary social sciences research, a result in line with previous studies [12].
The results in Table 1 demonstrate that, while there is considerable evidence of count models in social sciences research, there is very limited reporting of zero-inflated journal's aims and scope, as well as editorial policy, and a more detailed exploration of these empirical patterns appears justified.
It could also be argued that the existence of inflation of zeros in the outcome count variable is a rare event when compared to the non-existence of inflation of zeros. Although the reasoning may make sense for some fields of knowledge, in the field of social sciences, the excess of zeros in the studied phenomenon is not uncommon [13]. Interesting examples of phenomena whose distribution often contains inflation of zeros are traffic accidents [14], insurance claims [15], the arrests unleashed due to the occurrence of human trafficking crimes [16], the behavior of consumption and dependence on alcohol, cigarettes and other drugs by adolescents [17], unemployment issues and their impacts on mental health [18], social, demographic, and economic factors and their impacts on the current Sars-Cov-2 pandemic [19,20]. And the field considered in this paper should also be mentioned, since cultural aspects influencing practices of corruption have not yet been studied from the perspective of the zero-inflated models.
Blevins et al. [1] previously presented a detailed exposition of count models specifically in management research. The authors started their analysis by reviewing 11 years of management research using count-based outcome variables in 10 leading journals. They found that, while one of four papers in their sample used the basic Poisson regression model to estimate count outcome variables, alternative models might have been more appropriate in such occasions.
In fact, in many of the applications considered by the authors, we believe an alternative zero-inflated model might have provided a better fit for the data. In drawing this conclusion, the authors compared different count models estimated with previously published data and, at the same time, ran simulations based on different parameter values. The results also concluded that, in some cases, there was the possibility of significance level changes in some regressors, as well as sign changes.
In addition, we argue that the consideration of the natural nesting of the observations should also be taken into account in conjunction with the inflation of zeros for the count data models. This is an important qualification since GLM do not take into account a possible grouping of observations within a nested data structure. However, household residences belonging to the same neighborhood, or different students from the same school are often correlated, and these correlations can be accounted for in GLMM with the consideration of random effects [21,22] (As the authors note: "Despite the benefits of using these models, the use of zero-inflated models is only recently gaining traction in management research. Our review showed that only a small percentage of management scholars (11%) used such models" [1]. We view the results presented in Table 1 as robust evidence to these authors' claims.).
In short, in comparison with the classic GLM, GLMM estimations have the advantage of proposing a structure with different levels, and each of these levels is investigated by its own model. According to Woltman et al. [23], each level tries to capture the behavior of the variables and specifies how these variables are related to and influence other levels. As discussed by Courgeau [8], GLMM enables the researcher to relate observations and contexts, such as firms and countries, students and schools, or families and neighborhoods. If one decides to ignore these relationships, incorrect interpretations can arise.
The mixed designation in the GLMM term comes from the fact that predictor variables can be considered in both fixed and random effects components of the regression model. The estimated parameters of fixed effects indicate the relationship between predictor variables and the outcome variable (as well as in the context of GLM estimations), while the random effects component can be represented by the combination of error terms and predictor variables. Recent contributions on the benefits of mixed models are found in Reference [24][25][26][27][28][29][30][31]. All of these studies emphasized that multilevel models are a generalization of regression methods and, thus, can be applied for prediction and also for causal inference based on observational studies.
Mixed models, a term that also commonly appears in the literature as multilevel models, hierarchical models or random coefficients models have acquired considerable importance in the social sciences, and the publication of papers that consider these estimations have been increasingly more frequent, mainly due to studies that take into account the existence of nested data structures. Table 2 displays the search results for the terms "Multilevel Model", "Hierarchical Model", and "Random Coefficients Model" in Social Sciences (scholar.google.com.br/(accessed on 28 March 2021)). While the analysis shows a meaningful number of papers in top social sciences research journals that apply some type of mixed modeling, our research was unable to find in these same journals a single paper with zero-inflated estimations and, simultaneously, with a multilevel perspective, i.e., that takes into account the existence of a nested data structure.
In addition to what is shown in Tables 1 and 2, studies that estimate models taking into account, simultaneously, the inflation of zeros in the outcome variable and the multilevel structure of the dataset are quite rare. Based on our search, zero-inflated generalized linear mixed models are not yet very widely explored in social sciences. "Poisson" "Negative Binomial" "Overdispersion"  Notes: Authors' calculations, based on Google Scholar and journals' data. Search covers all years for all journals. Journals' ranking positions (column 1) were obtained from Google Scholar for the field of "Social Sciences". Searches in specific journals were made for the terms "Zero-Inflated" (column 3), "Poisson" (column 4), "Negative Binomial" (column 5), and "Overdispersion" (column 6). Column (7) was obtained by the following column operation: (4) + (5) - (6). The latter column is a proxy for the use of count models in social sciences research top journals. Finally, column (8) was obtained by the column operation: (3)/(7). This column represents the relative participation of zero-inflated models (ZIM) in published articles employing count models.

GLMM (6) = (3) + (4) + (5)
Journal Notes: Authors' calculations, based on Google Scholar and journals' data. Search covers all years for all journals. Journals' ranking positions (column 1) were obtained from Google Scholar for the field of "Social Sciences". Searches in specific journals were made for the terms "Multilevel Model" (column 3), "Hierarchical Model" (column 4), and "Random Coefficients Model" (column 5). Column (6) brings the sum of columns (3), (4) and (5), i.e., represents a proxy for the total amount of papers that make use of mixed models in social sciences research top journals.

Count Data and Zero-Inflated Models
Understanding the probability distributions and algebraic and econometric criteria for estimating parameters is crucial for the definition of predictive equations that try to capture, in a more reliable way, the real behavior of data.
Nelder and Wedderburn [32], in a relevant paper, presented and discussed through theoretical and conceptual points of view, several works that had previously estimated logistic models involving Bernoulli and binomial distributions, count data models involving Poisson distribution, and polynomial models involving Gamma distribution, among others. All these model estimations were combined in a group referred to as GLM [29]. As part of GLM discussions, therefore, count data regression models can be estimated for cases in which the the otcome variable is quantitative with discrete and non-negative values. Moreover, both Cameron and Trivedi [33] and Hilbe [34,35] note that these estimations are more adequate that traditional linear regression models, which can fail to take into account the presence of discrete and non-negative values in the outcome variables.
Wooldridge [36] indicates that a general regression model for count data can be described using Expression (1) as follows: where represents the expected number of occurrences or incidence rate ratio of the phenomenon studied for a given exposure; β 0 is the intercept; β j (j = 1, 2, 3, ..., k) are the coefficients estimated for each predictor variable X i ; k is the number of predictor variables used in the model; and i indicates each observation of a given sample. While Poisson models assume the existence of equidispersion of the count data for the variable of interest, i.e., µ i = E(Y i ) = Var(Y i ) = λ i , the estimation of a NB model assumes overdispersion of the outcome variable, conditional to predictor variables Hilbe [35], i.e., (2) and (3) represent the mean and the variance, respectively, for a negative binomial model.
where φ = 1 ψ and u i = exp(β 0 + β 1 X 1i + β 2 X 2i + ... + +β k X ki ). As discussed by Cameron and Trivedi [37], the parameter φ in expression (3) represents overdispersion in the count data. According to Fávero et al. [38], in the cases where φ → 0, equidispersion would be detected in the dataset used, indicating that a Poisson regression model should be used. In addition, according to the authors, for the case in which φ is statistically greater than zero, the phenomenon of overdispersion would occur, which suggests the use of the NB model.
Although the most commonly used model for the analysis of count data is the Poisson model, by definition, its distribution has a single free parameter, λ, thus not allowing the variance to be fitted to the mean [33,39], as previously explained for the case of λ = Var(Y). Thus, in the presence of overdispersion, a NB model can provide a better fit for the count data, in which the mean of the Poisson distribution can be used as a random variable that follows a Gamma distribution, together with an additional free parameter φ [40].
Additionally, it is common that some count data variables show a great amount of values equal to zero, which represents a fact that can generate biases in the parameters estimated through traditional Poisson or NB regression models, since these models do not take into account the heightened presence of zero counts. In these cases, the consideration of zero-inflated regression models makes sense [5].
In this perspective, according to Lambert [4], zero-inflated regression models are used for investigating the reasons why a certain number of counts occur, including zeros, and the reasons why this quantity happens, regardless of the number of observed counts. As stated in Ngatchou-Wandji and Paris [41], zero-inflated regression models can be defined as follows: where Y is the count variable, ω is the proportion of the excess of zeros, δ 0 (y) = 1 if y = 0 and δ 0 (y) = 0, otherwise; and f(y) is the density of a count distribution, such as Poisson or NB. In this sense, ZIP models can be estimated through the consideration, simultaneously, of Bernoulli and Poisson distributions. On the other hand, ZINB models are estimated through the simultaneous consideration of Bernoulli and Poisson-Gamma distributions. And one can choose the best estimation depending on the existence of overdispersion in the data [13]. In other words, this decision can be made taking into account the statistical significance of the inverse of the Gamma distribution parameter. And the definition of the existence or not of an excessive amount of zeros in the outcome variable is determined through the analysis of the outputs of the Vuong Test [42]. Table 3 shows the relationship between count data models, overdispersion, and inflation of zeros. According to Table 3, while ZIP and ZINB regression models are more appropriate in the presence of an excessive amount of zeros in the outcome variable, the use of the latter is further recommended if there is overdispersion in the data. Technical details on statistical distributions, as well as on count data and zero-inflated regression models, are included in Appendix A and in the supplementary material.

Generalized Linear Mixed Models (GLMM)
Multilevel models have become quite important in several fields of knowledge, including management. The main reason is that some research constructs are taking into account the existence of nested, or mixed, data structures, in which certain variables vary among groups, or contexts, but do not vary among observations from the same context [21,43]. In addition, computational developments that have increased processing capacities allow the estimation, increasingly, of different types of multilevel models [29,44,45].

The Two-Level Model
According to Fávero [46], there are many situations where data are disposed within a mixed, or nested structure, and the hierarchies refer to fact that observations from the same groups, or contexts, share aspects, or characteristics, that represent a kind of homogeneity.
In this perspective, a two-level zero-inflated count data mixed model, in this sense, can be specified, where the first level refers to observations i(i = 1, ..., n), nested in two-level units j(j = 1, ..., J), as follows: where matrices of the predictor variables Z and X, of levels i (level 1) and j (level 2), appearing in the respective logistic (Bernoulli) and Poisson or Poisson-Gamma components are not necessarily the same (equal conditions hold for matrices of the variables G j and H j at level two), and δ, π, α, and β are the respective matrices of the regression parameters, i.e., δ can be interpreted in terms of the proportion of inflation of zeros, π is related to the mean response in the count data part, and α and β correspond to differences among level two context in structural and sample zeros, respectively, due to the behavior of predictor variables in level two (G j and H j ). Following Lee et al. [47] and Fávero et al. [40], δ j represents the random variations at second level, which means that heterogeneity among higher levels of analysis (groups, for instance) and between individuals is allowed through the random effects δ, with variance equal to σ 2 ν j . Despite the fact that Rabe-Hesketh et al. [48] discuss and estimate multilevel models taking into account discrete outcome variables, papers that consider zero-inflated count data regression models with random effects in the literature are still quite limited.

Model Estimation and Materials
As discussed by Lee et al. [47], one can estimate zero-inflated generalized linear mixed models considering the restricted maximum likelihood approach. The definition the GLMM likelihood function requires the specification of the log-likelihood function of the fixed component (first term), as well as the logarithm of the probability density function of the random effects (second term), which allows one to specify more complex correlation structures in the variance components.
Lee et al. [47] present the log-likelihood function for zero-inflated count data mixed models. The first term, LL 1 , is given by Expressions (A14) or (A19) presented in Appendix A and is related to ZIP and ZINB estimations, respectively, depending on the existence of overdispersion on data. The second term is given by: Random effects ν allow the existence of heterogeneity among clusters and also among individuals. Estimation proceeds by maximizing LL 1 (A14) or (A19) with the consequent update of the values of the variance components through the estimation of a restricted maximum likelihood (REML) function from LL2 [46]. Thus, LL 1 + LL 2 generates the final LL for a zero-inflated multilevel model. In the next section, we will estimate and present the parameters of one of the original models proposed by Fisman and Miguel [11], the NB estimation, which the authors called Model 3, including its analogous model that considers the inflation of zeros (ZINB estimation), as well as the model that considers the inflation of zeros under the multilevel perspective (ZINBM estimation).
With that said, first, one can study the statistical significance of the excessive amount of zeros in the outcome variable Y through the application of the Vuong test. Subsequently, the study can be conducted taking into account an eventual hierarchical structure of the observations in the dataset of Fisman and Miguel [11].
Finally, the log-likelihood values of the estimated models were compared to enable one to verify the best suitability for the considered data, and pointing out the biases generated by not considering the inflation of zeros and/or the nested structure of the data.
In this paper, all the estimations are obtained through the software R version 4.0.4, using the package MASS for the NB models, the package pscl for the ZINB model, and the package glmmTMB for the ZINBM model.

Culture and Corruption among United Nations' Diplomats: An Applied Example
Corruption is an important matter. For instance, when analyzing the effect of corrupt practices on economic outcomes for a cross-section of countries, Mauro [49] finds that corruption lowers private investment and long-run growth. In fact, there is an evergrowing literature aimed at studying and measuring corruption, both in individual and aggregate levels [50][51][52].
In this section, we present an application related to zero-inflated generalized linear mixed models for the study of corruption practices in organizations. We used the paper's code in Stata and respective datasets available at Edward Miguel's website, despite the fact that Fisman and Miguel also made available other versions. We first made extensive use of the materials available there, such as the authors' data codebook, for instance (The original datasets and code are available at the following links: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/28059 (accessed on 28 March 2021) and http://emiguel.econ.berkeley.edu/research/corruptionnorms-and-legal-enforcement-evidence-from-diplomatic-parking-tickets (accessed on 28 March 2021). As an additional note, we did not get in touch with the authors in order to use their public dataset.).
Fisman and Miguel [11] presented an original approach to study corruption in a sample of countries. The authors explore based on a unique point of view: the parking behavior of United Nations' (UN) diplomats. They analyzed parking violations from worldwide diplomats during the 1997-2005 period. Given that, since in part of the considered period diplomatic immunity protects UN diplomats from parking enforcement actions, while in the subsequent period diplomatic immunity for parking enforcement actions no longer occurs, the authors argue that diplomats' behavior regarding parking tickets can be considered a good proxy for cultural aspects which can determine corrupt behavior.
Based on this data, the authors built a "revealed preference" measure of corruption among officials who work for governments of 149 countries. This measure has the advantage of being based on diplomats' observed actions, which is its main feature, given the well documented difficulties in measuring corruption worldwide [53] (The authors also built a ranking of countries based on parking violations. In the first positions of the ranking come Kuwait, Egypt and Chad (as more corrupt countries), while Japan, Norway, and Sweden are among the least corrupt in the sample [11].).
Fisman and Miguel [11] bring an interesting discussion emphasizing that parking violation corruption is strongly (and positively) correlated with some country corruption measures, such as those defined by the World Bank surveys, for instance. In general terms, this result suggests that home country corruption norms may serve as a predictor for the propensity to behave corruptly among government officials. Specifically, the data reveals that diplomats from high-corruption countries (such as Nigeria, for instance), on average, tend to behave remarkably bad in situations in which there are no legal consequences involved, while the opposite is true for diplomats from low-corruption countries (such as Norway), whose diplomats do not tend to commit more parking violations.
We begin by replicating a graphical pattern contained in the data. Figure 1 shows the scatterplot relating unpaid parking violations (vertical axis, in natural logarithmic scale) and country corruption measures (horizontal axis), considering the period of non-existence, as well as the period of existence of legal consequences for unpaid parking violations by diplomats (in Figure 1, the label 'no' means no legal consequences, while the label 'yes' means the opposite). The plot suggests a positive (and possibly nonlinear) relationship between diplomats' traffic violations and each country's corruption measures. As a second step, Figure 2 presents a histogram containing unpaid diplomats' parking violations, i.e., the outcome variable on Fisman and Miguel's Model 3, called by the authors violations_all. We construct this figure to provide a visual representation of zero values contained in the outcome variable, as well as to emphasize the likely overdispersion present in the outcome variable. The graphical patterns described in Figure 2 shows the occurrence of zero values in 13.19% of the sample containing the outcome variable. At first, there is still no way to affirm the need of estimation of zero-inflated models based only on Figure 2, and more formal tests are needed before reaching stronger conclusions. Table 4 presents the variables used in Fisman and Miguel's Model 3, which were considered in the replication model under the multilevel perspective with inflation of zeros. For more information on the variables, see Fisman and Miguel [11].  Table 5 presents the descriptive statistics of the quantitative variables, as well as the frequency tables of the qualitative variables explained in Table 4, which, in fact, were considered in the so-called Model 3 proposed by Fisman and Miguel [11]. In terms of econometric methodology, Fisman and Miguel (2007) estimated a count data NB regression model. In Table 6, in addition to the original NB model proposed by Fisman and Miguel [11], we present a ZINB estimation (necessary for the application of the Vuong test), and a ZINBM estimation using exactly the same variables as the initial NB model. Table 6. Estimations of negative binomial model [11], and correspondent zero-inflated and zeroinflated mixed model.

Results and Discussions
The confirmation of overdispersion of the outcome variable, conditional to predictor variables, comes from the Cameron and Trivedi test [37], whose result is in Figure 3. In Appendix A, we present further details on the diagnosis of overdispersion. The verification of the existence of excess of zeros in the outcome variable unpaid parking violations (violations_all) can be elaborated in sequence, through the Vuong test, which compares likelihood functions between a GLM estimation, such as the NB, with the respective estimation with eventually an inflation of zeros. Table 7 shows the results for the Vuong test when comparing the NB regression model with the corresponding ZINB model. Since the Vuong test might be biased towards supporting models with zero-inflation components, according to Desmarais and Harden [54], we implemented Desmarais and Harden's correction to verify if the former results are somehow affected. While the Vuong test statistic is z = 2.95, the AIC and BIC corrected statistics are z = 2.67 and z = 2.15, respectively, or rather, all present p-values < 0.05. These results indicate the better adequacy of the zero-inflated negative binomial regression model, in comparison with the traditional GLM negative binomial regression model estimated by Fisman and Miguel [11].
Such a diagnosis, which points out the better suitability of the ZINB estimation, by itself, would already suggest that the original NB estimation present biases by not considering the excess of zeros in the outcome variable. Furthermore, the NB estimation fails to identify that the slope of the variable corruption, in the ZINB logistic component, indicates that an increase in one unit leads to a 50.01% decrease, on average, in the chance of structural zeros, ceteris paribus, since exp(−0.695) = 0.499. In other words, the variable corruption has an even greater preponderance in the absence of unpaid parking violations by diplomats in the considered countries.
This fact leads to a second important bias: the NB model overestimates almost all parameters of the other variables that were shown to be statistically significant, when compared to the count component of the ZINB estimation, although there is no inversion of signs-the exception occurs for the variable post.
When the original NB model is compared to the ZINBM estimation, which takes into account the nested structure of data among countries, the biases are more striking. The first point to note is that the consideration of these groupings leads to a smoothing of the slope of the variable corruption in the logistic component in comparison with the ZINB model.
The value of the slope parameter of the ZINB logistic component is −0.695; for the ZINBM estimation, −0.564. In other words, when considering the nested structure among countries, the chance of existence of structural zeros in the dataset is reduced, on average, by 43.10% when increasing the variable corruption in one unit, ceteris paribus, since exp(−0.564) = 0.569. It is also interesting to note that the NB model cannot even capture the ratio of the inflation of zeros in the dataset and, even if the ZINB model succeeds, it reasonably overestimates the studied situation by neglecting the natural nesting in the data.
When considering the existing nested structure in the dataset of Fisman and Miguel [11], we realize that there is an overestimation of the parameters of the NB model, regarding both the general intercept and the slopes of the predictor variables corruption, staff, Europe, and Middle East, when compared to the ZINBM model. On the other hand, the NB estimation underestimates the parameters of the other variables when compared to the ZINBM estimation. There was also no inversion of signals.
Thus, in the NB model, the dummy variable that identifies the Oceania region has a coefficient equal to 1.508, against 1.832 for the ZINBM model. Thus, when adopting the North America region as the reference region, the NB model predicts, on average, an increase in the chance of occurrence of the outcome variable in 351.77%, if the diplomat is from Oceania, keeping the other conditions constant. The ZINBM model, on the other hand, intensifies this situation, since the increase in the chance of occurrence of the outcome variable, on average, is 524.64% if the diplomat is from Oceania, taking the region North America as a reference, ceteris paribus. In short, the NB model underestimates the occurrence of unpaid parking violations among diplomats from Oceania region by 67.05% when compared to the ZINBM estimation. Using this same approach, when compared to the ZINBM estimation, the NB model underestimates the impact of the Asia region by 15.95%.
The parameter related to the Europe region also draws attention. The NB model overestimates the chance of unpaid parking violations among diplomats, on average, by 27.94%, ceteris paribus, when compared to the ZINBM model and being North America as the reference region. Under the same perspective, for the Middle East region, the NB estimate overestimates the impact by 20.20%.
Behind a functional form that is better adherent to the data of Fisman and Miguel [11], the ZINBM estimation has the ability to estimate random effects for the groups that characterizes the hierarchical levels-countries, in this case. Thus, the proposed model presents different error terms related to intercepts and slopes for each country.
Equation (8) presents the NB model proposed by Fisman and Miguel [11]; Equation (9) brings the result of the ZINB estimation, showing both the logistic and the count compo-nents; finally, Equation (10) demonstrates the ZINBM estimation, its logistic and count components, and the random error terms for the intercepts (ν 0j ) and for the slopes due to the corruption variable (ν 1j ). Random effects (ν 0j ) and (ν 1j ) of the ZINBM estimation are presented in Appendix B.  (10) Looking again at Table 6, it is interesting to note that the increase in LL between the models could be said to be modest, but in all cases, according to a likelihood ratio test, it is statistically significant at 1%. More than that, when analyzing, for example, the fitted values of the ZINBM model considering the predictor variable staff, which has not yet been explored, it is difficult to advocate in favor of the NB model, as shown in Figure 4. According to Figure 4, it is evident that the NB estimation is unable to adequately capture the inflation of zeros present in the dataset. It is also interesting to note that, for the ZINB estimation, even though it captures the excess of zeros better than the NB model, there is no ability to adequately segregate, due to natural nesting (countries), the occurrence of structural zeros. In fact, the ZINBM estimation is superior to all others for the case studied.

Final Considerations
After exploring different approaches to a well-known NB model, we propose to divide our final considerations into three main parts: theoretical aspects, managerial implications and future research in count data multilevel modeling.

Theoretical Aspects
In this article, we presented guidelines for the estimation of zero-inflated generalized linear mixed models. That is, models where the outcome variable presents an excess number of zeros and the dataset has a structure where observations are correlated in ways that require random effects.
As an illustration of the importance of zero-inflated generalized linear mixed models, we presented an application based on a dataset related to corruption practices among UN's diplomats [11]. We emphasize that, in no way, we tried to diminish the merit of the study and the authors' findings. Furthermore, at the time of the study, multilevel models were rare and computational capabilities represented a considerable constraint. Our main message is about the superiority of GLMM models, in relation to GLM, when the natural nesting of observations is available to researchers.
We were able to replicate the original study's main results, confirming their authors' claims regarding the role of cultural norms for corrupt behavior. We found a strong effect of corruption norms: diplomats from high-corruption countries (on the basis of existing survey-based indices) accumulated significantly more unpaid parking violations, i.e., cultural norms can be seen particularly as an important determinant of corruption (Table 6 and Equations (8)-(10)).
We also extended their original analysis by estimating zero-inflated generalized linear mixed models. At one hand, the reported results confirm Fisman and Miguel's original findings, in terms of estimated signs and statistical significance ( Table 6). On the other hand, the results allow us to discuss the importance of considering different zero-generating processes for outcome variables, also taking into account the nature and the structure of the considered dataset. In this sense, we are cannot reject the hypothesis favoring the use of ZINBM model over the traditional GLM models.

Managerial Implications
Management research still uses count-based models with parsimony [1,12]. Although there has been an increase in the use of such models, there is still considerable room for improvement, given the many opportunities related to interesting themes, such as patents, product innovations, and issue of shares, for instance. In fact, even when studying the influence of social factors over behavior, such as social ties, researchers might benefit from using count models [13].
While we believe the results presented here provide additional evidence supporting the use of count-based models, we emphasize the importance of zero-inflated generalized linear mixed models for specific applications. In a broader sense, these results are important for emphasizing potential uses of this class of models in distinct areas of management research. We agree with Blevins et al. [1], who discuss that there is still room for improvement in terms of existing research practices. Specifically, both researchers and practitioners could obtain substantial gains and insight by incorporating in their toolkit's models with excess zeros in the outcome variable in a multilevel perspective.
By presenting this guideline and applied example related to zero-inflated linear mixed models, we hope to see more studies of this kind in the near future, as well as to stimulate the development of new ways to analyze and interpret quantitative data in management research, as originally proposed by other authors [55,56]. By the way, Dale and Sirchencko [57] bring an important example in this regard.
In addition, we believe that datasets from organizational research sometimes do not explicitly represent contexts that can be defined as levels of analysis in multilevel modeling. Regardless, researchers and practitioners can create clusters from the observations' own data, allowing the establishment of latent higher-level random effects (in this case, at the level of the cluster itself), even if there are no variables corresponding to these clusters. Due to the fact that cluster analysis is an unsupervised exploratory technique, the predictive power of the estimation will be lost, but even so the model can be more adequately adjusted for diagnostic purposes regarding the behavior of the existing data.

Future Research in Count Data Multilevel Modeling
In our research, we did not find studies that estimate zero-inflated multilevel models using longitudinal data, in which the temporal dimension represents level 1 of the analysis. This can be an interesting prospect for future research.
Another relevant point is that we did not discuss endogeneity problems arising from the fact that entities such as individuals, organizations or countries can vary systematically from one another. As discussed by Antonakis et al. [58], researchers typically estimate multilevel models assuming the random effects are uncorrelated with the regressors. But this problem can be avoided by including cluster means of level 1 explanatory variables as controls.
From our research search, we did not find works that discuss these points and problems in estimations of Count Data Multilevel Models with inflation of zeros. March 2021) and http://emiguel.econ.berkeley.edu/research/corruption-norms-and-legal-enforcementevidence-from-diplomatic-parking-tickets (accessed on 28 March 2021). As an additional note, we did not get in touch with the authors in order to use their public dataset.

Conflicts of Interest:
The authors declare no conflict of interest.

Sample Availability:
The scripts for establishing the images and models of this study are available as supplementary materials.

Abbreviations
The following abbreviations are used in this manuscript: According to Cameron and Trivedi [33], in general Poisson regression models can be applied when the distribution of the occurrence of a given phenomenon being studied follows a Poisson distribution, as shown in Figure A1, where p represents, for a determined observation i(i = 1, 2, . . . , n), the probability of occurrence of a specific count for a determined exposition m (such as geographic region or period of time, for instance), as shown in Expression (A1). Klakattawi et al. [59] argue that Poisson models assume the existence of equidispersion of the count data for the variable of interest; i.e., µ i = E(Y i ) = Var(Y i ) = λ i , in accordance with Expressions (A2) and (A3), as follows: The coefficients of a Poisson regression are estimated by the likelihood function given by Expression (A4) as follows: from which the logarithm of the likelihood function given by Expression (A5) is derived.
which should be iterated until it reaches its maximum value.

Appendix A.2. Negative Binomial Regression Models and Overdispersion
The estimation of a NB model is directly linked to the presence of overdispersion in the count dataset [60], whose distribution is illustrated in Figure A2. In Figure A2, p is the likelihood of a random portion of the number of occurrences of the outcome variable Y i in exposure m, as given by Expression (A6): where ψ(ψ > 0) is the shape parameter. As discussed, a NB model assumes overdispersion of the outcome variable, conditional to predictor variables ( As discussed by Cameron and Trivedi [61], negative binomial models are estimated via the likelihood criterion given in Expression (A7), as follows: which should be iterated until it reaches its maximum value. The test for detecting overdispersion of count data proposed by Cameron and Trivedi [37] is based on Expression (A8), where H 0 is the equidispersion given by Var(Y|X) = E(Y|X), as follows: which is similar to the variance function of the NB model indicated by Expression (A3). For the test in Expression (A8), the significance of parameter φ must be verified, in which H 1 : φ > 0 and H 0 : φ = 0. Cameron and Trivedi [37] postulated, for the detection of overdispersion in the count data and at a certain level of significance, a Poisson model should be estimated a priori. According to the authors, after this, an auxiliary ordinary least squares (OLS) model should be estimated without the intercept, whose outcome variable Y * , given by Expression (A9), should be calculated using the fitted values of λ from the initially established Poisson model.
The auxiliary model given by Expression (A10) should use λ as the sole predictor variable.
After the estimation of the auxiliary model in Expression (A10), Cameron and Trivedi [37] recommend checking the p-value from the Student's t-test for the predictor variable λ. In the cases where P > |t| > sig, equidispersion at a pre-established significance level is indicated; when P > |t| ≤ sig, overdispersion at a pre-established significance level is indicated. In R, the Cameron and Trivedi Test [37] can be quickly done with overdisp() command.

Appendix A.3. Zero-Inflated Poisson Regression Models
In ZIP regression models, the probability p of occurrence of a zero-count for any given observation i(i = 1, 2, . . . , n, where n is the sample size), that is, p(Y i = 0), is calculated taking into account the sum of a dichotomous component with a count component and, thus, one may define the probability p logit i of occurrence of a zero-count due solely to the dichotomous component. Additionally, the probability p of occurrence of a specific count m(m = 1, 2, . . . ), that is, p(Y i = m), follows the expression of the Poisson probability distribution, multiplied by (p logit i ). Thus: where 1(.) stands for an indicator function, being Y ∼ ZIP(λ, p logit ), where λ is the expected number of occurrences of the outcome variable for a given exposure (incidence rate ratio). If p logit i = 0, clearly the probability distribution of (4) boils down to the Poisson distribution, including cases where Y i = 0. In other words, the zero-inflated Poisson regression models consider two generator processes of zeros, being one due to binary distribution (the so-called structural zeros) and the other due to the Poisson distribution (the so-called sample zeros).
While the occurrence of structural zeros can be influenced by certain vector of predictor variables, the occurrence of certain m count can be influenced by another vector of predictor variables. In some cases, the researcher can enter the same variable in two vectors, if the decision is to investigate whether these variable influences, concomitantly, the occurrence of the event and, if so, the amount of events (counts) of that phenomenon. Maximum likelihood method is used for obtaining the estimated values of λ and p logit . From (A11), the likelihood function is given by Expression (A12): Following Mouatassim et al. [62], suppose that N is the number of outcomes and N 0 is the number of zeros in the data. The likelihood becomes: Thus, the log likelihood can be written as follows: As stated in Mouatassim et al. [62], the numerical algorithm can be used to find the estimated value of λ. Thus, the estimated value of p logit can then be determined by replacing λ in (A15) or (A16).

Appendix A.4. Zero-Inflated Negative Binomial Regression Models
In ZINB regression models, the probability p of occurrence of a zero-count for any given observation i, that is, p(Y i = 0), is also calculated taking into account the sum of a dichotomous component with a count component, and the probability p of occurrence of a particular count m(m = 1, 2, . . . ), that is, p(Y i = m), now follows the expression of the Poisson-Gamma probability distribution: where 1(.) stands for an indicator function, being now Y ∼ ZI NB(φ, λ, p logit ) and φ the inverse of the shape parameter of a given Gamma distribution.
Again, if p logit i = 0, the probability distribution of (A18) comes down to the Poisson-Gamma distribution, including cases where Y i = 0. Hence, the ZINB regression models also feature two generator processes of zeros, derived from the binary distribution and the Poisson-Gamma distribution, as stated in Cameron and Trivedi [61].
Therefore, based on (A18), one can define the log-likelihood function to estimate the parameters of a ZINB regression model as follows: The EM algorithm or the Newton-Raphson method can be used to obtain the maximum likelihood estimates. Figure A3 shows a comparison between some Poisson, NB, ZIP and ZINB theoretical distributions.