A Flexible Mixed Model for Clustered Count Data

Clustered count data are commonly modeled using Poisson regression with random effects to account for the correlation induced by clustering. The Poisson mixed model allows for overdispersion via the nature of the within-cluster correlation, however, departures from equi-dispersion may also exist due to the underlying count process mechanism. We study the cross-sectional COM-Poisson regression model—a generalized regression model for count data in light of data dispersion—together with random effects for analysis of clustered count data. We demonstrate model flexibility of the COM-Poisson random intercept model, including choice of the random effect distribution, via simulated and real data examples. We find that COM-Poisson mixed models provide comparable model fit to well-known mixed models for associated special cases of clustered discrete data, and result in improved model fit for data with intermediate levels of over- or underdispersion in the count mechanism. Accordingly, the proposed models are useful for capturing dispersion not consistent with commonly used statistical models, and also serve as a practical diagnostic tool.


Introduction
The most commonly used distribution for count data is the Poisson model. Poisson regression models consider the relationship between count response and explanatory data under the strict assumption of equi-dispersion. Count data often arise in clusters where multiple outcome occasions observed for a cluster are naturally related. Positive correlation inherent to clustered data is a frequent cause of over-dispersion [1]. Poisson random effects regression models are regularly used for clustered count data because they account for additional variability induced by correlation of measurements within a cluster and thus address over-dispersion. Specifically, the Poisson random intercept model allows the intercept to vary by cluster, inducing a compound symmetric correlation structure [2]. This model combines the standard Poisson count model with a cluster-specific term that reflects cluster-level heterogeneity [3,4]: log λ * ij = x T ij β + α i α i ∼ g(α i |θ) for i = 1, . . . , N and j = 1, . . . , J i , where y ij is a scalar count outcome for cluster i at occurrence j, x ij is a p-dimensional vector of explanatory variables, and g(α i |θ) is the density of the random effect α i characterized by parameters θ. That is, the count outcome for cluster i at occurrence j follows a Poisson distribution conditional on a cluster-specific random effect, a set of covariates and a vector of regression parameters (β 1 , . . . , β p ) that are common to all subjects. If an intercept parameter is included in the vector of regression parameters, then count distributions. Section 5 presents a comparison of random intercept count models fit on real data. Section 6 concludes and discusses possible extensions of this work.

COM-Poisson Distribution and Regression Model
The COM-Poisson distribution is a flexible distribution for count data that allows for over-or under-dispersion [15,31,32]. The COM-Poisson probability mass function for a single observation takes the form P(Y = y | λ, ν) = λ y (y!) ν Z(λ, ν) , y = 0, 1, 2, . . . (2) for a random variable Y, where Z(λ, ν) = ∑ ∞ s=0 λ s (s!) ν is a normalizing constant. In this setting, λ = E(Y ν ), where ν ≥ 0 is the dispersion parameter such that ν = 1 denotes equi-dispersion and ν > (<)1 signifies under-(over-)dispersion. The moments of the COM-Poisson distribution are not of closed form, however, Shmueli et al. (2005) note that assuming an asymptotic approximation for Z(λ, ν) leads to a close approximation for the mean: The COM-Poisson distribution includes three well-known distributions as special cases: Poisson with rate parameter λ (ν = 1); geometric with success probability 1 − λ (ν = 0, λ < 1); and Bernoulli with success probability λ 1+λ (ν → ∞). Sellers and Shmueli [18] extend the COM-Poisson distribution to the regression context allowing varying λ for each observation i. This regression formulation relates λ i to explanatory variables using the link log λ i = β 0 + β 1 x i1 + · · · + β p x ip = x T i β, (4) thus specifying an indirect link between the mean and the linear predictor. This formulation links the logarithm of the ν th raw moment of Y i to the linear predictor. The associated log-likelihood is log L i (β, ν) = y i log λ i − ν log y i ! − log Z(λ i , ν). (5) Given this construct, the COM-Poisson regression model has two notable special cases: Poisson regression with ν = 1, and logistic regression with ν → ∞. The dispersion parameter ν may be modeled, however, we assume a constant ν in the development of the COM-Poisson mixed model for clustered data.

COM-Poisson Regression Mixed Model
We further study the extension of the Sellers and Shmueli [18] COM-Poisson regression model to include a random intercept for modeling clustered data. Throughout this paper we use the term clustered data in a general sense to include specific types of correlated data such as longitudinal, repeated measures, as well as spatial and family data [5].

Model Formulation
The COM-Poisson regression random intercept model is defined as where notation is defined as in Section 1. In this formulation, the additive intercept shift of α i can be reparameterized in terms of a multiplicative effect with u i = exp(α i ). This mixed model assumes that the association between the explanatory and response variables are the same across all clusters, while allowing for variability in the intercept value associated with different clusters. The associated conditional probability mass function for observation j of cluster i is Assuming conditional independence, the marginal likelihood for cluster i can be written as where x i is the set of x ij and y i is a vector of the clustered response outcomes (y i1 , . . . , y iJ i ). The random effect parameter θ captures all the dependence between multiple outcomes within a cluster, including the association of outcomes measured on different observation units. Under certain assumptions, this likelihood reduces to familiar models that can be easily estimated with maximum likelihood. For example, for the special case of Poisson (ν = 1), g(u i |θ) is sometimes chosen to be the conjugate gamma distribution such that L i reduces to a tractable form.

COM-Poisson-Lognormal Model
It is a common assumption in the mixed model literature to let the random effects be normal-distributed: α i ∼ N (µ, σ 2 ) or equivalently u i ∼ log N (µ, σ 2 ). For the COM-Poisson mixed model-as well as the standard Poisson mixed model-this assumption does not result in a closed form marginal loglikelihood and requires computational techniques. The marginal likelihood for the COM-Poisson-lognormal (CMP-LN) model for cluster i as per Equation (8) is As described in Morris and Sellers [27], maximum likelihood estimates can be obtained in R using (1) numerical integration (e.g., the integrate function) to obtain an approximation of the marginal loglikelihood: and (2) optimization (e.g., the nlminb function) to maximize the approximate marginal loglikelihood. The integrate function in the base stats package in R implements a variant of Gaussian quadrature based on QUADPACK routines [33]. A variety of alternatives for numerical integration exist, but this standard function suffices for this work. We use the default settings in nlminb to implement unconstrained optimization using PORT routines where positivity constraints for ν and σ 2 are incorporated into the objective function by exponential transformations. Maximum likelihood estimates for the CMP-LN model can similarly be obtained in SAS via a user-defined loglikelihood function in the NLMIXED procedure [26]. Approximate standard errors are obtained from the square root of the diagonal entries of the inverse observed Fisher information associated with the numerically-derived loglikelihood using the hessian function in the numDeriv package in R. The Z function is approximated with a finite summation using a truncation point at which successive terms are small; a truncation point of 100 sufficed for analysis in this paper. The loglikelihood is programmed using Rcpp in R [34] which greatly decreases the computing time.

COM-Poisson-Conjugate Model
Kadane et al. [30] established the probability density function that serves as the joint conjugate prior for a COM-Poisson distribution, for λ > 0 and ν ≥ 0, where κ(a, b, c) is the integration constant. This joint conjugate density is a proper density for finite κ −1 (a, b, c) which occurs for b c > log( a/c !) + (a/c − a/c ) log( a/c + 1). This distribution is conjugate for the COM-Poisson distribution in the Bayesian sense: the posterior distribution has the same form as Equation (11). The associated conditional distribution of λ derived from this "extended bivariate gamma distribution" is where κ(a, c) is the integration constant. This conditional conjugate density is a proper density for finite κ −1 (a, c). Figure 1 depicts the conditional conjugate distribution for select values of a, c and ν. The interested reader can further explore behavior of the conditional conjugate distribution via an R Shiny app developed by Morris [35]. For the Poisson, geometric, and Bernoulli special cases of the COM-Poisson distribution, the conditional conjugate distribution reduces to well-known distributions, respectively: gamma(a, c) for ν = 1; beta(a, c + 1) for ν = 0; and a c−a F(2a, 2(c − a)) with c > a or equivalently λ 1+λ ∼ beta(a, c − a) for ν = ∞ [30]. These are the familiar conjugate relationships between the Poisson-gamma, geometric-beta, and Bernoulli-beta distributions.
Kadane et al. [30] determine precise parameter constraints to induce a proper joint conjugate density. We find empirically that the corresponding conditional conjugate density is not proper when a > c and ν is large. Specifically, is divergent when a > c and ν is not small enough to compensate for the large numerator λ a−1 . Figure 2 presents (a, c, ν) combinations for which numerical integration evaluates respectively with and without error. This empirical assessment indicates a complex boundary that depends on the relative size of the a and c parameters compared with the size of the dispersion parameter ν.  As an alternative to the CMP-LN model, we propose a model that assumes the random effects follow the conditional conjugate distribution so that This assumption leads to the marginal likelihood for the COM-Poisson-conjugate (CMP-C) model for cluster i as per Equation (8): It is assumed that the same ν shapes the data distribution and the random effects distribution. This assumption yields the familiar special case of the Poisson-gamma random intercept model (CMP-C with ν = 1); however, a model with one ν to define the data distribution and another ν to define the random effects distribution would allow further flexibility.
The marginal likelihood for the CMP-C model is not generally of closed form and thus maximum likelihood estimation requires computational techniques. This is true despite the fact that the distribution of Equation (13) is the conditional conjugate for the COM-Poisson distribution because the model violates the notion of "strong conjugacy" [12]. Under the stated distributional assumptions, the posterior distribution of the random effects is of the form which differs from the form of the random effects distribution in Equation (13), and therefore violates the Bayesian notion of conjugacy. For the random effects distribution, the Z function depends on u i alone, whereas the Z function from the data distribution depends on u i λ ij . For the commonly used Poisson-gamma random intercept model, the form of the Z function allows all Z functions to reduce together. This is true because for gamma distributed random effects, one can write λ ij g(u i |θ) = g(λ ij u i |θ) so that the random effect multiplied by a fixed term results in the same random effect distribution with a scaled set of parameters [12]. This property of the random effect distribution defines strong conjugacy [12]. The CMP conditional conjugate, however, does not allow for this multiplicative invariance, and thus the CMP-C model generally does not have closed form. This holds true also for the geometric-beta and Bernoulli-beta special cases because multiplicative invariance is not a property of the beta distribution. The flexibility of the CMP-C model is achieved through the modeling of over-and under-dispersion with the CMP conditional distribution, together with the variety of random effect distributional forms allowed through the extended gamma conjugate distribution. Despite not leading to a closed form marginal likelihood, the flexibility of the conjugate distribution offers a potential advantage over the CMP-LN model, as well as an opportunity to assess sensitivity to the assumed random effect distribution. If a similarly flexible conditional count distribution is found to have an associated conjugate distribution with multiplicative invariance, then the existence of such a combination would certainly be more computationally efficient. Study of the formulation and properties of conjugate distributions for alternative over-and underdispersion count distributions is a topic of future research.
Assuming clusters are independent, the full loglikelihood of the CMP-C model is We obtain maximum likelihood estimates in R using (1) numerical integration (e.g., the integrate function) to obtain an approximation of this marginal loglikelihood and (2) optimization (e.g., the nlminb function) to maximize the approximate marginal loglikelihood. The standard error calculations, Z function approximation and use of Rcpp is as described in Section 3.2.

Analysis: Simulated Data
To illustrate the flexibility and utility of these COM-Poisson mixed models, we present analysis of simulated clustered count data under a variety of data generation specifications. For each specification, count responses are generated for N = 100 clusters observed J i = 5 times for a total of 500 observations in each of 50 replications. The simulation study is designed as where g(u i |θ) = log N (0.5, 0.5) in Scenario I and g(u i |θ) = gamma(1.54, 1.37) in Scenario II. The parameters of the gamma distribution in Scenario II are defined to correspond to the same mean and variance as the lognormal distribution in Scenario I. Five con- ij , 5 , and CMP λ * ij , 0.75 . These distributions correspond to the three special cases of the COM-Poisson distribution and two cases with intermediate levels of under-or over-dispersion in the underlying count mechanism. Simulated data from the special cases are generated using the rpois, rbinom and rgeom functions in R, while the intermediate cases are generated using the rcmp function in the COMPoissonReg package [36].
Four candidate random intercept models are fit to the simulated data: Poissonlognormal (Poi-LN), negative binomial-lognormal (NB-LN), COM-Poisson-lognormal (CMP-LN), and COM-Poisson-conjugate (CMP-C). These four models capture variations in distributional assumptions of the data and the random effects. The Poi-LN and NB-LN models are chosen for comparison because they are the most widely used and easily accessible models. They are fit using the default settings in glmer and glmer.nb from the lme4 function in R [37]. Logistic random intercept models with normal-distributed random effects (L-LN) are also fit using glmer for the Bernoulli special case simulated data. We measure model fit through two measures: the proportion that result in the largest loglikelihood, and the proportion of the 50 replications that result in the smallest AIC within 2. The tolerance for AIC comparisons follows from Burnham and Anderson [38] assessing that models with an AIC value no more than 2 greater than the lowest AIC have favorable evidence comparable to that of the lowest AIC model.
Embedded in this simulation study are two types of misspecification. First, the link between the mean and the linear predictor is misspecified in CMP-LN and CMP-C models for geometric simulated data, and in the Poi-LN and NB-LN models for the CMP simulated data. For the three special case simulated data, the link to the linear predictor is assumed through the mean. In the Poisson and Bernoulli special cases, that coincides with the specification of the CMP-LN and CMP-C models linked through λ * it . That is, the additive random effect on the linear predictor has the same interpretation. For geometric data, however, a specified mean of a geometric random variable is not equivalent to the mean of a COM-Poisson random variable with ν = 0. That is, the additive random effect on the linear predictor relates directly to the mean in the geometric case, but relates to an indirect function of the mean in the COM-Poisson case, making the CMP-LN and CMP-C models misspecified with respect to the link to the linear predictor. Conversely, for the COM-Poisson simulated data, the Poi-LN and NB-LN models are misspecified with respect to the link to the linear predictor. This misspecification affects interpretation of β and σ 2 . Second, the random effect distribution is misspecified for the CMP-C model in Scenario I, and for all models in Scenario II except for the CMP-C model fit to the Poisson simulated data. Table 1 presents results from fitting the four models on Scenario I simulated data. We find that one or both of the CMP models have better or comparable model fit for all simulated data settings except for the geometric special case. For the Poisson special case, the Poi-LN and CMP-LN models achieve the smallest AIC for 96% of the 50 simulated datasets. The CMP-LN model, however, yields the largest loglikelihood in 76% of the Poisson simulated datasets as compared to 0% for the Poi-LN model. For the Bernoulli special case, the L-LN, CMP-LN and CMP-C models achieve the smallest AIC for 100% of the 50 simulated datasets. The CMP-LN model, however, yields the largest loglikelihood in 55% of the Bernoulli simulated datasets as compared to 0% for the L-LN model. The dominant loglikelihood of the CMP-LN over the special case models is mitigated with respect to AIC because the CMP-LN model has an additional dispersion parameter. For the geometric simulated data, the NB-LN model outperforms both of the CMP models. We expect the NB-LN model counting to one success (i.e., a geometric-LN model) to perform similar to the CMP models because it is a special case. The simulation result, however, shows that the additional flexibility of the unrestricted NB-LN provides better fit for a majority of the extreme overdispersed simulated datasets. For both cases of the CMP simulated data, one or both of the CMP models greatly outperform the Poi-LN and NB-LN models.  The CMP-LN model fits better or similarly to the CMP-C model in all cases. The CMP-LN greatly outperforms the CMP-C model in the equi-dispersion and intermediate over-dispersion cases; while the CMP-LN and CMP-C models result in similar model fit for the under-dispersed and extreme over-dispersed data even though the random effect distribution of the CMP-C model is misspecified in this Scenario I.

Simulated Data Scenario I: Normal-Distributed Random Effects
The CMP models recognize true dispersion levels in all cases:ν ≈ 1 for Poisson,ν is large for Bernoulli,ν ≈ 0 for geometric,ν ≈ 5.00 > 1 for COM-Poisson (under-dispersion), andν ≈ 0.75 < 1 for COM-Poisson (over-dispersion). The dispersion parameter for the NB-LN model is appropriatelyk = 0.00 for the equi-and underdispersed simulated data and k ≈ 1.00 for the extreme overdispersed data. For the case of intermediate over-dispersion, however, all of the overdispersion is attributed by NB-LN to the clusteringσ 2 = 0.75 rather than overdispersion in the underlying count processk = 0.00.
The estimated random effect varianceσ 2 indicates the magnitude of the cluster effect, on average, for each model that assumes lognormal-distributed random effects. Because the conjugate distribution does not have closed form moments, we plot in Figure 3 the estimated random effect distribution for each of the simulated datasets based on the CMP-C model parameter estimates. For comparison we also include the density associated with the CMP-LN model and the true distribution. The plot legend includes the mean and standard deviation for the estimated distribution. These quantities are calculated by theoretical moments and moment definitions for the lognormal and conjugate density, respectively. An appropriate level of cluster variability is captured by all models for equi-dispersed data: σ 2 ≈ 0.50 and see Figure 3a. For the under-dispersed simulated data, however, the Poi-LN and NB-LN are inadequate for jointly capturing cluster variability when the data have underlying underdispersion:σ 2 = 0.00 for Bernoulli and CMP with intermediate underdispersion simulated data. For the overdispersed simulated data, the cluster variability is slightly over-estimated with the Poi-LN and NB-LN models, however, in the intermediate overdispersed CMP simulated data, this is likely due to misspecification in the linear predictor. For all simulated data, except for the geometric special case, Figure 3 depicts cluster variability close to the true magnitude for the CMP-LN and CMP-C models. In the geometric special case, the estimated random effect distributions are observed close to the true distribution when the true distribution is adjusted to reflect the differences in the link between the mean and the linear predictor.  Table 2 presents results from fitting the four models on Scenario II simulated data. Similar to the Scenario I findings, we find that one or both of the CMP models have better or comparable model fit for all simulated data settings except for the geometric special case. Interestingly, we find in this Scenario II that the CMP-C greatly outperforms even the Poi-LN model for the Poisson simulated data, indicating sensitivity of the Poi-LN model to random effect misspecification. Contrary to Scenario I findings, we find that the CMP-C model fits better or similarly to the CMP-LN model in all cases. The CMP-C greatly outperforms the CMP-LN model for the equi-and overdispersed data; while the CMP-LN and CMP-C models result in similar model fit for the underdispersed data. Similar to Scenario I, the CMP models recognize true dispersion levels in all cases. The NB-LN jointly captures the two sources of overdispersion for the extreme overdispersion case (i.e., geometric), but not the intermediate over-dispersed CMP case. Both the Poi-LN and NB-LN models over-estimate clustering variability for equi-and overdispersed data, while they cannot capture any clustering variability when the data have underlying underdispersion. Figure 4 depicts cluster variability for the CMP-LN and CMP-C models. Just as in Scenario I, the estimated random effect distributions are observed close to the true distribution except for the geometric special case on the original log link scale. Figure 4 shows that the estimated densities from Scenario II are further from the true distribution than in Scenario I, likely because all models (except CMP-C for the Poisson case) are misspecified with respect to the random effect distribution.

Analysis: Epilepsy Data
To illustrate the practical flexibility of the COM-Poisson mixed models, we study the epilepsy dataset originally analyzed by Thall and Vail [39], subsequently discussed in Diggle et al. [40], and generally often used as an example for longitudinal data analysis of discrete outcomes (e.g., [6,10,11]). This dataset concerns the number of seizures measured for 59 epileptic patients in an initial 8-week baseline period, followed by 4 consecutive 2-week treatment periods. The outcome variable of interest y ij is the number of seizures for subject i in time period j for j = 1, 2, 3, 4, 5. We assume the following form of the linear predictor: where T ij is the length of time period j in weeks, x ij2 indicates the receipt of an anti-epileptic drug Progabide as opposed to a placebo, x ij1 is an indicator of a period after baseline (weeks [8][9][10][11][12][13][14][15][16], and α i is the random intercept. This differs from the specification in Diggle et al. [40] in that we include log(T ij ) as a covariate rather than an offset. This specification change is because an offset in a linear predictor linked indirectly to the mean (e.g., for the COM-Poisson regression model) can no longer be interpreted as a rate as in Poisson and negative binomial regression. Furthermore, all log(T ij ) are the same value except for the baseline measurement making it colinear with x ij1 , so we exclude x ij1 as a main effect. Table 3. Epilepsy data parameter estimates (standard errors), AIC, loglikelihood, and Pearson statistic Expected counts E(y ij ) use empirical Bayes estimates for random intercept in all models, and mean approximation in Equation (3) for CMP models. AIC and X 2 are displayed for models fit with and without Subject 207. Likelihood ratio test p-values for hypothesis testing of H 0 : σ 2 = 0, H 0 : k = 0 and H 0 : ν = 1 are displayed in brackets next to associated estimate.  Table 3 presents the parameter estimates for the Poi-LN, NB-LN, CMP-LN and CMP-C models. All four of the models indicate a cluster/longitudinal effect. This is evident through the variance parameter estimate of the normal random effect distribution in the Poi-LN, NB-LN and CMP-LN, whereσ 2 = 0.608, 0.661 & 0.143 > 0, respectively. Recall from discussion of the analysis of simulated data, that the random effect variance estimated in the CMP-LN is not comparable to the Poi-LN and NB-LN models because the random effect is linked indirectly to the mean. For the CMP-C model, the empirical mean and variance associated with the conditional conjugate distribution at the estimated parameter values are 1.55 and 0.64, respectively. Likelihood ratio tests for all models indicate that the longitudinal effect is statistically significant, recognizing that the test is conservative due to testing on the boundary of the parameter space [41]. Figure 5 shows that the lognormal and conjugate random effects distribution at the estimated parameter values are similar, with both indicating a nonzero variance. The NB-LN and both CMP models can account for variability beyond the longitudinal effect. We find that overdispersion is evident in the negative binomial model (k = 0.148 > 0) and both of the COM-Poisson models (ν = 0.420 & 0.412 < 1). These effects are statistically significant based on the likelihood ratio test comparing NB-LN, CMP-LN and CMP-C to the Poi-LN model: see Table 3. These findings of longitudinal effects and over-dispersion are consistent with previous analyses of this dataset, but further illustrate the utility of a more flexible mixed model for clustered count data.

Parameter
The two COM-Poisson models offer better model fit over the negative binomial model, with the CMP-LN model outperforming the CMP-C model according to AIC. Model fit based on the Pearson statistic of squared Pearson residuals show that the Poi-LN model outperforms the NB-LN and both CMP models: see Table 3. Consistent with the outlier analysis in Diggle et al. [40], we find that the set of observations from Subject 207 contribute greatly to the magnitude of the overall Pearson statistic, particularly for the CMP models. The analysis excluding this set of observations shows a smaller Pearson statistic-substantially smaller for the CMP models-and a smaller AIC for all models except CMP-LN. We find that excluding the observations for i = 207 results in the NB-LN model outperforming the CMP models according to both AIC and the Pearson statistic. This suggests an interesting interpretation and influence of outliers defined with respect to the simpler Poi-LN model. The main advantage of the NB-LN and CMP models is how they account for dispersion in the underlying count process that is inconsistent with the assumption of the Poi-LN model. The estimation of a dispersion parameter incorporates information about all observations, even those deemed outliers via the Poi-LN model. In this sense, an outlier with respect to the Poi-LN model is likely to influence the model fit and the expected counts in the NB-LN and CMP models by design. In this data example, the effect of the influential observations is more pronounced in the Pearson statistic for the CMP models than for the NB-LN model, but less pronounced in the AIC. This highlights the importance of data quality and exploratory investigation into unusual data points. If unusual data points are deemed accurate, then the NB-LN and CMP models can naturally incorporate their influence through the flexibility of a dispersion parameter. In contrast, if the unusual data points are deemed inaccurate, then they may inappropriately influence the model fitting in the NB-LN and CMP models.

Conclusions and Discussion
The COM-Poisson mixed model is a flexible model for count data in light of data dispersion due to clustering and the underlying count process. Analysis of a variety of simulated clustered count datasets with varying degrees of underlying dispersion illustrates the flexibility of the COM-Poisson mixed model to provide a good model fit for special cases of well-known count distributions, as well as cases with intermediate levels of over-or underdispersion. In addition to the flexibility that the COM-Poisson data distribution allows, we illustrate further flexibility through the choice of normal-and conjugate-distributed random effects. Even though conjugacy does not allow generally for computational simplification, we find that it can be useful to assume the nonstandard random effects distribution to achieve better fit and to assess robustness of fixed parameter estimates. In the analysis of the classic epilepsy data, we find that the COM-Poisson mixed models indicate multiple sources of dispersion due the longitudinal nature and the underlying overdispersion in seizure counts. In this example of overdispersed data, the negative binomial mixed model provides similar insight, however the COM-Poisson mixed models would provide unique insight in the case of underdispersed data. The ability of the COM-Poisson mixed models to jointly capture clustering and underlying data under-or over dispersion makes them a practical diagnostic tool. Through comparison to simpler (nested) models, e.g., via likelihood ratio tests, COM-Poisson mixed models allow one to assess if accounting for data dispersion and/or clustering is necessary.
These models could be extended in a straightforward way to allow ν to vary with covariates; for simplicity we have considered only a constant ν. Allowing fixed effects for dispersion parameter modeling is conceptually simple, but potentially leads to identifiability complications because assuming a "common dispersion parameter across the entire data" allows estimable dispersion effects separate from clustering effects [28]. In some data scenarios, it may make sense to consider ν to also be a function of its own random effects. This introduces additional computational complexity due to multiple integrals in the marginal loglikelihood, and in the case of assuming the COM-Poisson conjugate distribution, the violation of strong conjugacy with respect to λ remains. Incorporating random slopes in the λ and/or ν formulation is similarly conceptually straightforward, but adds computational complexity.
Computational challenges and efficiency issues are not unfamiliar in the formulation and fitting of random effects models for discrete outcomes. The random intercept CMP-LN and CMP-C model fit to the 295 observations in the epilepsy dataset required only 2-3 min of computation time. As the number of clusters/observations per cluster/fixed effects/random effects increase, the estimation procedure may become computationally prohibitive. For the simulation and data analysis in this paper, general purpose numerical integration and optimization procedures sufficed. Computational efficiency may be improved by tailoring the numerical integration and optimization procedures to the specific nuances of the CMP-LN and CMP-C model functional form.