Comparing Bayesian Spatial Conditional Overdispersion and the Besag–York–Mollié Models: Application to Infant Mortality Rates

: In this paper, we review overdispersed Bayesian generalized spatial conditional count data models. Their usefulness is illustrated with their application to infant mortality rates from Colombian regions and by comparing them with the widely used Besag–York–Mollié (BYM) models. These overdispersed models assume that excess of dispersion in the data may be partially caused from the possible spatial dependence existing among the different spatial units. Thus, speciﬁc regression structures are then proposed both for the conditional mean and for the dispersion parameter in the models, including covariates, as well as an assumed spatial neighborhood structure. We focus on the case of response variables following a Poisson distribution, speciﬁcally concentrating on the spatial generalized conditional normal overdispersion Poisson model. Models were ﬁtted by making use of the Markov Chain Monte Carlo (MCMC) and Integrated Nested Laplace Approximation (INLA) algorithms in the speciﬁc context of Bayesian estimation methods. methodology, V.N.-A., and M.M.-O.; software, M.M.-O.; writing—original draft, V.N.-A., and M.M.-O.; writing—review and editing, V.N.-A., M.M.-O.


Introduction
Generalized linear models (GLM) are commonly used to model the response variable when working with count data (see [1]). However, count data regression models commonly exhibit overdispersion, a phenomenon generated when the data show a larger variance than the one that would be expected from the specification of the model itself (see [2]). This situation is also known as extra-Poisson or extra-binomial variation when the response variable is assumed to follow either a Poisson or a binomial distribution, respectively. Fitting a model without taking into account the presence of overdispersion may result in standard errors underestimation, among other problems, which can lead to an inferential process that may be incorrect (see [2]). One of the possible causes for overdispersion is the presence of correlation among the values of the variable under study for the different units considered in the specific data set being analyzed (see [3]). This is specially common with spatial data, where observations in locations that are closer in space tend to show similar values, a phenomenon known as spatial autocorrelation (see [4]). Consequently, these issues must be taken into account in order to obtain reliable inference processes for the estimated parameters in the proposed model.
Overdispersion has been extensively studied in the literature, particularly for the case of response variables which follow a Poisson or a binomial distribution. A common way to deal with overdispersion in these cases is to modify the mean variance relation for the model by scaling it with a dispersion parameter larger than one, and then use a quasi-likelihood approach for the estimation (see [1]). For overdispersed count data following a Poisson distribution, perhaps the most popular model is the negative binomial model, initially proposed by Margolin, Kaplan and Zeiger [5], who applied their proposed model to data from a mutagenicity assay for the Salmonella bacteria. They were able to model the variability and, hence, the precision, when replicating plate environments by means of the extra-Poisson variation parameter. The normal or log-normal Poisson models, proposed by Hinde [3], included a random effect following a normal distribution in the linear predictor, with a variance given by the extra parameter, so that it allowed for the possible existing overdispersion in the model specification itself. This model has been used for the analysis of cancer death rates by Breslow [6], where he proposed two estimation methods by weighted least squares in three steps. Results indicated that there was a clear evidence of variability in the data that the Poisson model failed to capture, an issue that was resolved with the fitting of the normal Poisson model with the extra overdispersion parameter included in it.
Williams [7] proposed the beta-binomial model for the case of binomial responses. This model assumed that the probability of success from the binomial distribution is a random variable following a beta distribution, which includes an additional parameter to model the extra-binomial variation. Williams [8] also modified the logistic linear model by incorporating an extra-binomial variation component and proposed a methodology for computing the maximum likelihood estimates of the regression parameters based on iterated re-weighted least squares. He applied his proposal to a data set where it was required to model the proportion of germinating seeds of two different types and root extracts, finding similar results when comparing his model's estimates to those from other models previously proposed in the literature. The normal binomial model can be obtained as an extension of this model by considering that the extra-binomial variation component follows a normal distribution (see [9]).
Most overdispersion models assume the existence of constant dispersion in the data but, in a considerable number of cases, dispersion could behave differently. Therefore, models have been developed to allow for the dispersion to vary as a function of some explanatory variables. Hinde and Demétrio [2] and McCullagh and Nelder [1] mentioned the idea of joint modeling of the mean and dispersion parameters. Double exponential families were introduced and applied to specific settings for the binomial and Poisson cases by Efron [10]. These distributions allow to model the mean of exponential family distributions, as well as to be able to capture the possible existing overdispersion in count data by specifying a regression model for the dispersion parameter. Quintero-Sarmiento, Cepeda-Cuervo and Núñez-Antón [9] proposed Bayesian extensions of overdispersion models, such as the negative binomial and the normal Poisson models for count data, where they assumed regression structures both for the mean and for the dispersion parameters, and, therefore, they were able to specify the so-called generalized overdispersion models.
Two of the most common model specifications to account for spatial dependence in the data are the Conditional Autoregressive (CAR) model [11] and the Simultaneous Autoregressive (SAR) model [12]. These models incorporate the spatial correlation by assuming a conditional covariance structure specified by means of a spatial neighborhood matrix. Wall [13] examined the correlation structures that these models imply in detail by fitting them to SAT scores from students in the USA, which led to finding conterintuitive results given that the assumed CAR and SAR structures for this data implied spatial correlations among some states that did not correspond to reality or to the specific data set characteristics.
Besag [11] introduced autoregressive models for count data, such as the auto-binomial and auto-Poisson models, among others, by following the SAR model's structure. These models, however, present the disadvantage that they can only account for negative spatial autocorrelation in the data (see [14]). The spatial autoregressive Poisson model was proposed by Lambert, Brown and Florax [15], who assumed a SAR model's structure for the mean of the response variable, for which a two step limited information maximum likelihood estimation method was also proposed. They analyzed a data set concerning firm births in the different states in the USA for the period 2000-2004, and concluded that this proposed model and estimation method allowed them to better understand how geographical determinants can affect the firm births under study.
In the context of Bayesian regression models for count data, spatial dependence is commonly addressed by specifying a hierarchical model, including a set of random effects in the linear predictor of a GLM, for which spatially correlated prior distributions are usually assumed. In this way, the CAR model's structure has been extended by the well known Besag-York-Mollié (BYM) model (see [16]), where the spatial dependence in the data is incorporated by means of a spatially correlated prior with intrinsic CAR distribution, and an extra-variability is also included with a set of uncorrelated random effects. This model, in particular, is widely used for estimating relative risks in small areas in disease mapping (see [17]). Comparison between this model and alternative models applied in this area can be found in Best, Richardson and Thomson [18]. Alternative variants of prior distributions that have been proposed can be also consulted in Lee [19].
More recently, a new parameterization of the BYM model has been developed with the proposal of the so-called BYM2 model, which allows for a more comprehensible interpretation of the parameters and offers some other improvements over the BYM model (see [20]). Morris, Wheeler-Martin, Simpson, Mooney, Gelman and DiMaggio [21] applied the BYM2 model to a data set of motor vehicle crashes from 2005 to 2014, where school-age pedestrians resulted injured in the city of New York, and explored sociodemographic factors related to their occurrence. They concluded that the BYM2 model offered a good fit to this data and are able to identify areas with increased risk of pedestrian injuries in crashes, and to establish positive relationships between this risk and the number of people who commute to work via walking, bicycle and public transport.
In order to be able to address the issue of spatial autocorrelation in the data, Cepeda-Cuervo, Córdoba and Núñez-Antón [22] proposed the spatial conditional models for overdispersed spatial count data, where they assumed that part of the overdispersion can be explained by the spatial neighborhood structure in the proposed models. This model accounts for the spatial dependence in the data by incorporating a spatial term in the linear predictor with a parameter that estimates the intensity of the spatial association. By adopting a Bayesian framework, the authors applied these models to infant mortality data from Colombia and to postnatal screening tests, obtaining good fit and being able to model the positive spatial dependence, as well as accounting for data overdispersion. Additionally, to allow for nonconstant overdispersion, extensions of the generalized overdispersion models of Quintero-Sarmiento, Cepeda-Cuervo and Núñez-Antón [9] were also proposed by incorporating spatial neighborhood structures in the regressions for the mean and the variance and, thus, proposing the so-called generalized spatial conditional overdispersion models.
When fitting Bayesian models, it is not always possible to obtain analytically closed form expressions for the posterior distribution (see [23]). Therefore, it is necessary to approximate this distribution by computational methods. One of the most extended examples of these approximation methods is the Markov Chain Monte Carlo (MCMC) approach, an algorithm that generates simulated samples from the posterior distributions of the regression parameters by the Gibbs sampling method, a special case of the Metropolis-Hastings algorithm (see [23][24][25]). An alternative to computational algorithms, such as the MCMC methods in the Bayesian estimation mentioned above, is the Integrated Nested Laplace Approximation (INLA) algorithm (see [26]), which is based on the deterministic numerical computation of posterior distributions approximations.
There are several software packages available for the estimation of Bayesian models, such as, for example, OpenBUGS [27] and the CARBayes R package [19], both using the MCMC approach and, the R-INLA R package [28], which uses the INLA approach. Carroll, Lawson, Faes, Kirby, Aregay and Watjou [29] studied models to account for spatial dependence for Poisson distributed count data in disease mapping, and their performance is compared for the two software packages OpenBUGS and R-INLA. More specifically, the authors fitted some models such as the Poisson, normal Poisson and BYM models, among others, to simulated count data for which spatial correlation structures were implemented in different ways. Reported results concluded that there were substantial differences in the two implementations, especially for the estimation of the random effects, but they managed to find a good fit in both cases by varying some of the prior settings. Moreover, they highlighted the much shorter computation time that R-INLA required for the fitting of a model when compared to OpenBUGS. In Vranckx, Neyens and Faes [30], an extensive comparison of the fitting of the BYM model with OpenBUGS, CARBayes, R-INLA and other available software packages was also reported by fitting data from young people receiving diabetes medication in Belgian municipalities for the year 2014. Although no covariates were used in their analysis, the authors were able to identify locations with increased relative risk by fitting the BYM model to the data set under study.
The main objective of this paper was to review regression models for spatial count data with overdispersion, such as the spatial conditional overdispersion models and the BYM models, and to be able to provide a comparison between the performance of both models, particularly focusing on the case where the response variable follows a Poisson distribution. In Section 2, we describe the models to be fitted and explain the methods that will be used for their estimation within a Bayesian context. In Section 3, we provide an application of these models to infant mortality rates from Colombia and present the most relevant results. Section 4 concludes with a brief discussion on the findings.

Spatial Conditional Overdispersion Models
Let us suppose that the random variables Y i , for i = 1, . . . , n, represent counts with corresponding means E(Y i ) = µ i . The Poisson model generally assumes that Y i ∼ Poi(µ i ), with variance Var(Y i ) = µ i , so that the variance is equal to the mean, a property that is known as equidispersion. In a generalized linear model, the mean of the distribution depends on the explanatory variables through the following regression model, known as linear predictor: where g(.) is a monotonic and differentiable link function, x i is the k × 1 vector of explanatory variables for the i-th observation and β is the k × 1 vector of unknown regression parameters that need to be estimated. For the Poisson regression model, the natural logarithm is often chosen as the link function; that is, g(µ) = log(µ). Under these assumptions, overdispersion would occur when there is extra-Poisson variability in the data, so that Var(Y i ) > µ i . Two of the most common models used to accommodate overdispersion in count data for the case of response variables following a Poisson distribution are the normal Poisson and the negative binomial models. In the normal Poisson model, the overdispersion is corrected by the inclusion of a random effect, assumed to be normally distributed, in the linear predictor. In this way, the normal Poisson can be written as: where x i and β are as before, and ν i ∼ N(0, τ), for i = 1, . . . , n. In this model, (Y i |ν i ), for i = 1, . . . , n, follows a Poisson distribution with conditional mean E(Y i |ν i ) = µ i . Although the distribution of Y i does not have a closed form expression (see [3]), when the variance τ of the random effects is small enough, the random variables Y i , i = 1, . . . , n, can be considered as mixed Poisson variables, with mean and variance that can be approximated by E(Y i ) ≈ µ i and Var(Y i ) ≈ µ i + τµ 2 i (see [31]). The dispersion parameter τ allows for modeling the possible existing overdispersion and, in addition, it also captures the variability unexplained by the covariates. That is, since τ > 0, the variance is larger than that specified by the Poisson model, so that µ i + τµ 2 i > µ i . We believe it is important to mention that the condition that the variance τ of the random effects should be small enough (see [3] or [31]) is satisfied in the relevant cases in the application of this model in Section 3, so that the approximations mentioned above hold. Another frequently used model to fit overdispersed count data is the standard negative binomial or NB2 model (see [32]). One possible way to be able to generate this model is by considering a Poisson-gamma mixture. That is, if we assume that the random variables m i , i = 1, . . . , n, follow a Gamma distribution, such that m i ∼ G(τ, τ), with τ > 0 a parameter that needs to be estimated, and that the random count variables Y i , i = 1, . . . , n, conditioned on µ i and the random variables m i follow a Poisson distribution with mean E(Y i ) = µ i m i , such that Y i ∼ Poi(µ i m i ), then the unconditional distribution of Y i can be derived in the following way [32]: which corresponds to the probability density for a negative binomial distributed count variable, so that The dispersion parameter τ allows for the modeling of the extra-Poisson variability because, since we have that τ > 0, then µ i + τ −1 µ 2 i > µ i . Therefore, and from the above, for the negative binomial regression model, the linear predictor is specified for the mean µ i , so that: where x i and β are as above.
One of the reasons for the existence of overdispersion in spatial data may be the possibly existing spatial correlation between the responses corresponding to the different adjacent locations. Hence, it can be assumed that a portion of the overdispersion can be explained by taking into account this spatial correlation. Thus, the spatial conditional overdispersion regression models proposed by Cepeda-Cuervo, Córdoba and Núñez-Antón [22] assumed a specific spatial structure for the variable under study. That is, they assumed that Y i , for i = 1, . . . , n, conditioned on the values in all of the neighbors of the ith region, except for the i-th region itself (i.e., Y ∼i ), follows a conditional overdispersed distribution denoted by f (y i |y ∼i ), for i = 1, . . . , n. In this distribution, the conditional mean follows a given regression structure that includes some covariates affecting the response variable, as well as its spatial lags, together with a spatial parameter that allows to account for the intensity of the spatial dependence that is present in the data. In the case where the conditional distribution follows one of the aforementioned models, this model leads to the spatial conditional Poisson, negative binomial and normal Poisson regression models, respectively.
The spatial distribution is commonly specified by means of a neighborhood structure, defined, for a sample of n regions, by an n × n spatial weights matrix, denoted by W = [w ij ], where its elements, w ij , are the weights to be specifically used to model the strength of the dependence between the i-th and the j-th regions. These elements are given by the contiguity criteria chosen by the researcher, which can be based on the boundaries of the regions or on the distance from one spatial location to the others, or by any other alternative criteria previously proposed in the literature. It is commonly assumed that, w ij = 1, if region i is adjacent or a neighbor to region j, and w ij = 0, otherwise. First order contiguity can be specified, for example, when we use the criteria that regions i and j are neighbors if they share at least one point in their boundaries. Second order could also be considered if we extend the criteria by considering that i and j are neighbors if they share a common neighbor. This weights matrix is usually standardized by rows, so that, if region i is adjacent to region j, then w ij = 1/n i , where n i is the number of neighbors region i has. Along these lines, if y is the n × 1 vector of observations for a response variable Y, then the spatial lag of Y is defined as the product of the 1 × n vector corresponding to the ith row of the weights matrix W, W i , and the vector y; that is, W i y, a product representing the averaged values of the considered variable in the neighboring locations for the ith region. In this work, we only assume first order adjacency among regions and, in addition, that the spatial weights matrix is standardized by rows.
The spatial conditional Poisson model is specified by assuming that the conditional distribution of the variable under study follows a Poisson distribution, that is (Y i |Y ∼i ) ∼ Poi(µ i ), with conditional mean E(Y i |Y ∼i ) = µ i , so that its corresponding regression model, where the previously described spatial association dependence is incorporated, can be specified as: where x i and β are as above, ρ is the parameter incorporating the first order spatial association, W i is the ith row of the n × n weight matrix W that represents the spatial neighborhood structure assumed in the model, and y is the vector of dimension n × 1 for the observed values for the response variable under study. The spatial conditional normal Poisson model assumes that the distribution of the variable under study, conditioned on its neighbors excluding the i-region itself, Y ∼i , and the normally distributed random effect ν i ∼ N(0, τ), follows a Poisson distribution. That is, so that its corresponding regression model, where the previously described spatial association dependence is incorporated, can be specified as: where x i , β, ρ, W i and y are as above.
In the same way, the spatial conditional negative binomial model can be also specified if we assume that the response variable under study, conditioned on Y ∼i , follows a negative binomial distribution. That is, (Y i |Y ∼i ) ∼ NB(τ/(τ + µ i ), τ), with conditional mean E(Y i |Y ∼i ) = µ i , and with a regression structure given by Equation (5).
We believe it is important to mention that, in the spatial conditional normal Poisson and the spatial conditional negative binomial models, a portion of the overdispersion that may have been generated by the possible existing spatial correlation in the data is considered to be incorporated into the model by using the specified neighborhood spatial structure, given by the product between the spatial weights matrix and the vector of responses (i.e., by incorporating spatial lags of the variable under study), W i y. The remaining unexplained overdispersion in the data will be modeled by means of the dispersion parameter τ. However, we should note that these models assume a constant overdispersion, and there are cases where the dispersion in the data can vary among groups or observations. The generalized overdispersion models proposed by Quintero-Sarmiento, Cepeda-Cuervo and Núñez-Antón [9] introduced Bayesian extensions of the standard overdispersion models, where regression structures are assumed both for the mean and for the dispersion parameter. Their model allowed for the dispersion to vary as a function of some explanatory variables, so that dispersion can vary for the different regions or observations in the study.
In this sense, generalized overdispersion models offer a reasonable and well justified proposal for fitting count data with overdispersion. However, for the case of spatial count data, they do not provide information or incorporate into the model the possible existing spatial dependence in the data set under study, which clearly motivates the inclusion of the spatial dependence structure in these models. Along these lines, the generalized spatial conditional overdispersion regression models [22] assumed that the spatial count variable in their model, Y i , i = 1, . . . , n, conditioned on the values in all of the neighbors of it, except for the ith region itself (i.e., Y ∼i ), follows an overdispersed conditional distribution f (y i |y ∼i ), i = 1, . . . , n, with conditional mean and dispersion parameter following specific regression structures that include some covariates affecting the response variable and the spatial lags of the variable of interest.
If we now consider the case where Y i follows a Poisson distribution with mean µ i , and assume the normal Poisson model with mean structure given by Equation (6), for the generalized spatial conditional normal Poisson model, we will have that the conditional mean and variance components in the random effect distribution will be specified by regression structures given by: where x i , β, W i and y are as above, and ρ 1 and ρ 2 are the parameters that explain the spatial association in the mean and dispersion structures, respectively. In addition, Z i is the q × 1 vector of explanatory variables for the ith observation and γ is a vector of dimension q × 1 containing the unknown regression parameters that need to be estimated. The generalized spatial conditional negative binomial model can be specified in the same way, by assuming that Y i , i = 1, . . . , n, conditioned on Y ∼i , follows a negative binomial distribution, and assuming the following regression structures for both the conditional mean and dispersion parameter: where x i , β, W i , y, ρ 1 , ρ 2 , Z i and γ are as above.

Besag-York-Mollié (BYM) Model
The Besag-York-Mollié (BYM) model [16] is a Bayesian Poisson hierarchical model widely used in the literature for fitting spatial count data, particularly in the field of disease mapping (see [17]). It is an extension of the generalized linear Poisson model that includes both a spatial structured and an unstructured random effect in the regression model structure. If we let Y i , i = 1, . . . , n represent counts for the n different regions, the BYM model is specified by assuming that the variable under study follows a Poisson distribution with mean E(Y i ) = µ i , having a mean regression structure given by: where x i and β are as above, ν i is a normally distributed random effect, so that ν i ∼ N(0, τ), with τ > 0 being an unknown variance parameter that needs to be estimated, and η i is an intrinsic conditional autoregressive (CAR) [16] distributed random effect, so that: where η ∼i represents the set of values of all neighbors of the ith region, except for the ith region itself, W is the spatial weights matrix and τ η is an unknown variance parameter that needs to be estimated. Given that this model is able to account for spatial dependence and also for the extra-variability in the data set not explained by the covariates, it has a considerable potential to motivate and justify its use for the analysis of spatial count data. However, since it is only possible to obtain information from the data from the sum of the two random effects, but not from each of the individual components separately, its use in this specific context has been questioned because of the possible identifiability problems that it may present (see [33]). Some authors (e.g., Riebler, Sørbye, Simpson and Rue [20]) have addressed the issue of identifiability. They proposed the BYM2 model, an extension of the BYM model that scales the spatial component and the unstructured component, so that the mean regression structure can be written as: where the random effects ν i and η i are as in the BYM model, but with a scaled variance approximately equal to one, τ s is an unknown precision parameter that captures the variance contribution from the sum of the two random effects, and φ s is a mixing parameter that controls for the variance contribution of the spatially structured component η = (η 1 , . . . , η n ) , whereas the variance contribution of the unstructured random component ν = (ν 1 , . . . , ν n ) is explained by 1 − φ s . The main advantage of the BYM2 model is precisely the possibility that it offers to be able to separately capture the impact of the spatial dependence and the effect of the variability or the overdispersion present in the data. The priors for these hyperparameters are defined by means of the penalized complexity priors developed by Simpson, Rue, Riebler, Martins and Sørbye [34]. The complexity prior for the parameter τ s can be specified by assuming the probability statement that Prob(1/ √ τ s > U) = α, and, for the parameter φ s , that Prob(φ s < U) = α, with U and α being fixed values that depend on the specific application under consideration. The use of these priors has been proved to be a suitable choice in Bayesian spatial models and, especially, for the BYM2 model, mainly due to the fact that they favor less complex models and allow for a clearer interpretation of the parameters (see [35]).

Bayesian Estimation
As we have already mentioned above, models studied here will be estimated by using a Bayesian approach. That is, we assume that we have a sample of n independent observations, y i , for i = 1, . . . , n from the variable Y, and we wish to estimate a parameter θ, we will consider it as a random variable and express our beliefs about this parameter via a prior distribution p(θ). The information available in the data about θ will be included in the likelihood function L(y|θ), which is the joint distribution of the sample, so that, if y i , i = 1, . . . , n, given the parameter θ, are independent and have a probability density function given by f (y i |θ), then L(y|θ) = ∏ n i=1 f (y i |θ). In Bayesian inference, we use this information to update our knowledge using the Bayes theorem, thus, being able to obtain a posterior distribution for the parameter given the data, p(θ|y) ∝ L(y|θ)p(θ). In the case of spatial conditional regression models, Cepeda-Cuervo, Córdoba and Núñez-Antón [22] considered the variables (Y i |Y ∼i ), conditioned on the assumed spatial neighborhood structure, following an overdispersion distribution such as the ones mentioned in Section 2.1, and the parameter θ, to be estimated, to be independent. Therefore, under these independence assumptions, the likelihood function can be obtained in the usual way and, therefore, the Bayesian inference process is valid.
In Bayesian analysis, vague or noninformative prior distributions for the parameters are usually specified in order to minimize the possible impact of prior information, compared to the likelihood of the data, on the posterior inference. For the regression coefficients of explanatory variables, typically normal prior distributions with zero mean and large variances are considered. In this case, we assume that β j ∼ N(0, 1 × 10 5 ), j = 1, . . . , k. In most software packages available for Bayesian inference approaches, the prior distribution for the variance component in the normal distribution is implemented on its inverse instead, which is usually labeled as the precision parameter (i.e., ψ), so that ψ = 1/τ, if τ is the variance component parameter. For this precision parameter several prior distributions have been proposed in the literature for Bayesian hierarchical models (see [36]), the Gamma distribution being the most commonly used. In this way, it is assumed that ψ ∼ G(α 1 , α 2 ), with α 1 and α 2 being fixed and user-specified parameters. The choice of these values, α 1 and α 2 , is a crucial issue that needs to be addressed in a careful manner, mainly because inference can be sensitive to their selection, especially when the data set does not have a large number of observations available (see [36]). Specific values of α 1 = α 2 = 0.001 for this prior distribution are often employed in many applications (see [17]), so that ψ ∼ G(0.001, 0.001), which, given that its mean is equal to 1 and its variance equal to 1000, a large value, it can be considered as a vague prior. Alternative frequently used values that can be found in the literature are, α 1 = 1 and α 2 = 0.01, in Vranckx, Neyens and Faes [30], α 1 = 0.05, α 2 = 5 × 10 −4 in Best, Richardson and Thomson [18], α 1 = 1 and α 2 = 0.5 in Carroll, Lawson, Faes, Kirby, Aregay and Watjou [29], α 1 = α 2 = 1 × 10 −4 in Cepeda-Cuervo, Córdoba and Núñez-Antón [22], among others. Nevertheless, the choice of these parameters must be based on their adequacy to the specific application considered and its adverse effects on the posterior inference should be appropriately assessed and studied.
Following these guidelines, for our Bayesian analysis, we will assume noninformative prior distributions with zero mean and large variance for the regression parameters, as well as for the spatial lag parameters included in the proposed models. For the inverse of the dispersion parameters, ψ = 1/τ, we will specify Gamma distributions with large variances, so that ψ ∼ G(α, α), with α being a very small value. Estimation will be carried out in OpenBUGS, and also in R-INLA for some specific cases.
Model selection will be performed by using the Deviance Information Criterion (DIC) [37] and the Watanabe-Akaike Information Criterion (WAIC) [38], also known as the widely applicable information criterion, where the models with the lowest values for these criteria would be considered as the best fitting ones. On the one hand, the DIC is based on the posterior distribution of the deviance statistic, a measure of the model's fit, and it is penalized by the effective number of parameters, which is a measure representing the complexity of the model. On the other hand, the WAIC is based on the logarithm of the pointwise posterior predictive density and receives a penalty specified by a different definition of the effective number of parameters. This criterion has become very popular in the last few years, since it is considered as a fully Bayesian approach (see [23,39]). Given that each of these two measures has its own advantages and drawbacks (see [39]), we will include both of them in our analysis, since we believe the information provided by one can be complemented by the other. Moreover, besides these information criteria values, we will also take into account the predictive accuracy of the fitted models to select the best fitting ones by performing posterior predictive checks on each of the fitted models.

Application to the Study of Infant Mortality in Colombia
The data analyzed here have been obtained from the National Statistics Department of Colombia and this specific data set corresponds to 32 departments or geographical units (areas or regions) in this country. For each geographical unit, the available variables are: the number of children under one year of age who died in 2005 (i.e., variable ND), the total number of births in 2005 (i.e., variable NB), an index representing the percentage of the population not having their basic services satisfactorily attended for the same year (i.e., variable NBI), the amount of resources (in thousands of dollars) for academic achievement or education and integral attention for young children provided by the government per household in the year 2005 (i.e., variable Rec), the percentage of women over the age of 18 who had suffered physical violence from their current partners (i.e., variable Viol), the percentage of young people (i.e., between 18 and 24 years) who were able to opt for a higher educational level (i.e., variable HE), and the percentage of children under one year of age who received the third dose of the polio vaccine in the year 2004 (i.e., variable Vac).
A similar version of this data has been previously analyzed by Quintero-Sarmiento, Cepeda-Cuervo and Núñez-Antón [9] and Cepeda-Cuervo, Córdoba and Núñez-Antón [22], where the authors fitted their proposed generalized spatial conditional models to analyze mortality rates for children under five years of age, considering the response variable to be the number of children under five years of age who died in each department in Colombia from 2000 up to 2005. They found evidence of overdispersion and positive spatial autocorrelation in the data set they analyzed, and they were able to capture these specific features in the data set under study with the fitting of the proposed models, where positive significant relations between the variables NBI and the number of deaths for children under five years of age, as well as negative significant relations between the variables Rec and mortality rates. We would like to indicate that the data set here was directly obtained from the National Statistics Department of Colombia because we were unable to have access to the one previously analyzed by other authors and, thus, our analysis is not applied to the same data set.
In this section, we study the mortality rates for children under one year of age in the year 2005, and fit the models previously discussed in Section 2. The explanatory variables that we will include in the study constitute relevant socio-economic indicators that can considerably affect infant mortality (see [22]). In order to specify noninformative prior distributions for the parameters in our Bayesian analysis, we assume independent normal distributions, N(0, 1 × 10 5 ), for all the regression parameters; that is, β j ∼ N(0, 1 × 10 5 ), j = 1, . . . , k, as well as for the spatial association parameter ρ. As for the inverse of the dispersion parameters τ, ψ = 1/τ, based on the sensitivity analysis performed in Section 3.2 for this specific prior distribution, gamma G(1 × 10 −4 , 1 × 10 −4 ) distributions were assumed. When running the implemented software programs in OpenBUGS, and after 10,000 iterations, a burn in period of 5000 samples and considering a thinning parameter of 10, the MCMC chains showed strong signs of convergence for all of the parameters included in the proposed models.
With the available information in the data set under study, we can approximate infant mortality rates as the number of children under one year of age who died in the year 2005 per 1000 born alive in each of the departments in Colombia. In this way, we can obtain a new variable (i.e., the variable Rates) as: In order to better understand the data under study, Table 1 includes a summary of some descriptive statistics for the available variables in the Colombian mortality rates data set.  Figure 1 shows the spatial distribution of the variable Rates, representing an approximation of infant mortality rates in each department of Colombia for the year 2005. In the map, there are clear indicators of spatial association in the data, as regions with similar values of the variable appear to be grouped together in space. Departments located in the east of the country, belonging to the natural region called the Amazonia, which is located in the Amazonian rain forest, show large values of mortality rates. In addition, for departments located in the central part of the country, surrounding the capital, Bogotá, smaller rates can be observed.
It is important to mention that it has seemed reasonable for authors that have previously analyzed similar data sets to assume that, with regard to infant mortality rates, regions closer in space share somehow similar socio-demographic and economic characteristics, i.e., they can present similar values of infant mortality rates. As observed in the regional map shown in Figure 1, we have reasons to believe that this data set may also exhibit spatial dependence, which must be properly assessed and, if required, it should also be appropriately included in the proposed models to be fitted to this specific data set. An additional issue worth mentioning is the possibility that the aforementioned association may also be related to migration, either to other regions or to the capital. This is a topic that needs to be further explored and the appropriate data sets, if available, where infant mortality rates and migration are related, should be used. However, to our knowledge, such data sets are not available at the moment and, therefore, this issue remains to be further investigated. Moreover, we believe it is out of the scope of the research proposals presented here. Therefore, we will specify a spatial neighborhood structure, represented by the n × n first order spatial weights matrix W = [w ij ], standardized by rows, with w ij = 1/n i , if i is adjacent, and hence, a neighbor to region j; that is, if they share at least one point in their boundaries, and w ij = 0, otherwise. Here, n i is the number of neighbors for the i-th region. The spatial lag of the variable Rates is obtained by multiplying the i-th row of the spatial weights matrix W = [w ij ], W i , and the n × 1 vector of observations for the variable Rates, that is W i Rates. In order to be able to assess the possible existence of spatial dependence in the data, we apply the global Moran's I contrast [40] to the variable Rates, obtaining a test statistic value of I = 0.3490, providing a p-value = 0.0017. Hence, for the specific data set under study, there is evidence against the null hypothesis that the values of the variable Rates are randomly distributed across the different regions in Colombia. This result can be better seen in a graphical display in Moran's scatterplot, shown in Figure 2, where the variable Rates is plotted against the spatial lag W i Rates, and where a red line represents the estimated linear regression line fitted for these two variables. The slope of this regression is precisely Moran's I statistic and, as can be seen in Figure 2, the values of the variable for each department appear to have a positive significant correlation with the averaged values of the variable in the adjacent regions. Hence, there is substantial evidence for the possible existence of positive spatial autocorrelation in the data that need to be accounted for in the models to be fitted in the following sections for its analysis. To further confirm the previous conclusions, we have also applied Geary's C test [41] to the variable Rates, obtaining a value for the C statistic of C = 0.5490, with a p-value = 0.0017. Taking into account that significant values for the C statistic of 1 indicate no spatial autocorrelation for this test, and that values between 0 and 1 indicate positive spatial autocorrelation in the variable under study, the results obtained for our case (i.e., C = 0.5490) suggest further evidence that the variable is not randomly distributed across the regions, but positively correlated.

Fitting of the Spatial Conditional Overdispersion Models
We assume that the number of children under one year of age who died in 2005 in the ith region; that is, the variable ND i follows a Poisson distribution with mean µ i . As we are interested in modeling the mortality rates, we will consider that E(ND i ) = µ i , with µ i = NB i λ i , where NB i is the total number of births in 2005 (i.e., the variable NB), and λ i represents the corresponding mortality rate. Therefore, we include the logarithm of the total number of births log(NB i ) in the linear predictor as an offset variable, so that we can write this model as: Table 2 includes the corresponding parameter estimates for the fitting of the Poisson model with regression structure given in Equation (13), together with its standard deviations and 95% credible intervals in parenthesis, when fitted to the Colombia infant mortality rates data set. For this model, the resulting information criteria values where DIC = 491.7 and WAIC = 524.0. Based on the estimations for the regression coefficients for this model, we can observe that all the explanatory variables, except HE, are statistically significant, as none of their 95% credible intervals includes the value zero. However, these reported results should not be taken as an indication of a good model's fit, mainly because it could be a consequence of fitting the model without taking into account the existence of overdispersion. In fact, if we plot the estimated mean and variance obtained by the fitting of this model (see Figure 3), where the blue line represents that its mean equals its variance, we notice that most of the estimated variance values from the Poisson model are larger than the mean, which may be a clear indication that the assumption of equidispersion of the Poisson distribution does not seem to hold for this data set.  Therefore and based on the possible existence of overdispersion, we have fitted the overdispersion models mentioned in the previous section, starting with the normal Poisson model with regression structure given by Equation (14), and the negative binomial model with regression structure as in the Poisson model in Equation (13). These models show considerable improvements in their DIC and WAIC values when compared to those for the Poisson model. That is, resulting values were DIC = 308.0 and WAIC = 302.3 for the normal Poisson model, and DIC = 362.2 and WAIC = 361.8 for the negative binomial model. Moreover, if we carefully observe the results for the 95% credible intervals for the estimated coefficients after fitting the normal Poisson model, we notice that, except for NBI, all the other variables are statistically nonsignificant. In the case of the fitting of the negative binomial model, only the variables NBI and Viol are statistically significant. These results are justified because these models have taken into account the overdispersion in the data set under study and, therefore, these credible intervals become wider than those obtained when fitting the Poisson model.
In addition, to be able to account for the possible spatial dependence present in the data set under study, we have also fitted the spatial conditional Poisson and negative binomial models, with regression structures given by Equation (15), where the spatial lag of the variable Rates is also included in these models. That is, we have that: (15) Finally, we have also fitted the spatial conditional normal Poisson model with regression structure given in Equation (16), with reported information criteria values of DIC = 307.3 and WAIC = 301.3, these being the lowest information criteria values obtained so far when compared to the rest of the models. In addition, the estimated value for the spatial lag parameter, ρ, wasρ = 0.0149(0.0068), with the value zero not included in its 95% credible interval, a clear evidence for the existence of positive spatial autocorrelation in the data. Moreover, the estimated dispersion parameter wasτ = 0.0243(0.0088), also indicating that, besides explaining the spatial dependence, this model also captures the overdispersion, allowing for an extra variation given that a specific random effect has been included for this purpose in this model.
From the results reported in Table 2, the lowest DIC and WAIC values were obtained for the spatial conditional normal Poisson model, which was considered therefore the best fitting model. A variable selection process was performed afterwards for this model by fitting reduced versions and comparing the information criteria values obtained for the different models. The results for some of the fitted models shown in Table 3 indicate that the model with the lowest information criteria values, i.e., DIC = 306.8 and WAIC = 301.2, is the one which includes the variables Viol, NBI and Rec, but in this model, the variables Viol and Rec are not statistically significant, since zero is contained in the 95% credible interval for the estimated coefficients for each of these variables.
The model including the variables NBI and Rec with regression structure given by Equation (17), which provided information criteria values of DIC = 307.4 and WAIC = 302.3, could be a good candidate for fitting the data under study. In this model, according to its 95% credible intervals, all of the estimated coefficients are statistically significant. The estimated coefficient for the variable NBI wasβ 1 = 0.0167(0.0018), which indicates that, according to the model fitted to this data, infant mortality rates have a statistically significant positive association with the percentage of people with unsatisfied basic needs. With regard to the estimated coefficient for the variable Rec, it wasβ 2 = −0.0011(4.982 × 10 −4 ), a fact that could be an indication of a statistically significant negative association between infant mortality rates and the amount of resources provided for academic achievement. In addition, the estimated coefficient for the spatial term has a positive value ofρ = 0.0151(0.0061), again a clear evidence of the presence of positive spatial autocorrelation in the data.
It is worth mentioning that the model providing information criteria values of DIC = 306.9 and WAIC = 301.4, which correspond to the model containing the variables NBI and Viol, with a regression structure given by Equation (18), should also be considered as a good candidate for fitting the data set under study, where, based on their 95% credible intervals, all of the estimated coefficients are statistically significant. The estimated coefficient for the variable NBI has a value ofβ 2 = 0.0159(0.0018), a very similar value to that obtained in the previous model for this specific coefficient. In addition, the estimated coefficient for the variable Viol wasβ 1 = 0.0114(0.0053), implying that infant mortality rates may have a statistically significant positive relation with the percent of women who suffered any type of physical abuse from their current partner. The spatial coefficient estimated value waŝ ρ = 0.0184(0.0059), which represents further evidence of the presence of positive spatial autocorrelation in the data.
In order to provide some information that could be useful for the readers about the size of the effects of the explanatory variables in the models, we have computed estimations of the marginal effects at the means for these variables in the spatial conditional normal Poisson models in Equations (17) and (18), which are reported in Table 4. For the variable NBI in the model in Equation (17), this estimated marginal effect was 3.9981 × 10 −4 , which means that with an increment of 1 percentage point in the variable NBI, the mortality rate would be increased by a 0.039981%, when the other variables are set at their mean value. Although this represents a very small effect on the infant mortality rate, according to the credible interval (3.1469 × 10 −4 , 4.8580 × 10 −4 ), it is indeed significant. Moreover, small effects, but significant according to their credible intervals, were also obtained for the variable Rec in this model and for the variables Viol and NBI for the model in Equation (18).  (17) and (18)  In order to assess the convergence of the MCMC chains for the fitted models, we have computed the effective sample size for estimating the means, (i.e., N eff ), and the potential scale reduction factor (i.e.,R) for each parameter's MCMC chain. For brevity of exposition, the obtained results are reported in Table 5 only for the models in Equations (17) and (18).
The values N eff represent the equivalent number of independent samples in each parameter's MCMC chain. It is often considered that a minimum of 100 independent simulations is a sufficient number for performing reasonable posterior inference (see [23]). Therefore, a desirable value for N eff would be any number of at least 100. We have simulated 3 Markov chains, with 10,000 iterations and discarded the first 5000 samples from each one, which leaves us with a total of 15,000 samples. Taking this into account and considering that the values of N eff for the chains of all the estimated parameters obtained from the fitting of models in Equations (17) and (18), reported in Table 5, are larger than 1000, we can conclude that we have enough simulations to correctly approximate the target distribution and that the correlation in the chains should not affect posterior inference on the parameters. Moreover, the valuesR are an estimation of the potential scale reduction factor (see [42]). Values closer to 1 indicate convergence of the chain, whereas when values larger than 1.1 are obtained, it is considered that further simulations need to be computed in order to improve posterior inference. In this specific application, all of theR values are approximately one, which suggests that the chains for all of the parameters have successfully converged to the target distribution.  (17) and (18) fitted to the Colombian infant mortality rates data set.

Intercept
Viol NBI Rec ρ τ Model in Equation (17)  As a summary of this section, we believe it may be convenient to mention that the results obtained from the fitting of these models in Equations (17) and (18) are consistent with the previous results obtained in Cepeda-Cuervo, Córdoba and Núñez-Antón [22] for a similar data set. This fact is an illustration of the usefulness of the spatial conditional overdispersion models for being able to explain spatial dependence and overdispersion in applications for real data sets.

Sensitivity Analysis for the Variance of the Prior Distributions
The prior distribution G(α, α), assumed for the inverse of the variance parameter τ for the random effects, that is, for the precision parameter ψ = 1/τ, can have a significant effect on the inferential process, so inference may be quite sensitive to the choice of the fixed parameters α in this gamma distribution (see [36]). Therefore, in order to better assess this effect and select prior distributions where it is limited or at least controlled for, we have performed a sensitivity analysis for the best fitting model in the previous section, the spatial conditional normal Poisson model with regression structure given by Equation (17), considering different possible values for α in the gamma prior distribution ψ ∼ G(α, α) for the precision parameter in the random effects, from α = 0.1 to α = 1 × 10 −8 . Results are included in Table 6, where we can observe that for values of α = 1 × 10 −4 , as well as for smaller values, there were only small differences in the estimates in the third decimal place for a few cases. If we also look at the posterior marginal densities for the estimated precision parameter ψ (see right panel in Figure 4), no changes are observed when setting the values for α in the prior distributions for the last five values considered here (i.e., from α = 1 × 10 −4 up to α = 1 × 10 −8 ). Therefore, we believe the choice of the value α = 1 × 10 −4 is well justified and, in addition, it does not appear to represent any undesirable influence on the inferential process considered here.

Fitting of the Generalized Spatial Conditional Normal Poisson Model
For all of the aforementioned fitted models, it is assumed that the dispersion parameter (i.e., τ) is constant, a fact that could not always be a reasonable assumption. Hence, we will allow the dispersion parameter to vary with some explanatory variables by considering the so-called generalized spatial conditional models (see [22]). After fitting the generalized spatial conditional normal Poisson model for the infant mortality data in Colombia and, performing a variable selection process, we have concluded that the best fitting model was the one where the mean model contains the spatial lag and the dispersion model includes the variable NBI, so that: Table 7 reports the results from the fitting of the model in Equation (19). The DIC = 308.0 and WAIC = 299.1 values from the fitting of this model are quite similar to those obtained when fitting the spatial conditional normal Poisson model and, therefore, this does not result in a significant improvement in the model's fitting. However, we believe that there are some very interesting features that can be noticed from these two fittings. On the one hand, the mean of the estimated coefficient for the variable NBI in the dispersion model wasγ 1 = 0.0430(0.0145) and, according to its 95% credible interval, this variable is statistically significant, which could indicate that the dispersion varies according to the variable NBI in such a way that in regions where the percentage of the population not having basic services satisfactorily attended is larger, the dispersion increases. On the other hand, the spatial parameter estimate in the mean model has a mean with a positive value ofρ = 0.0431(0.0093) and, in addition, according to its 95% credible interval, it is also statistically significant. The significance of ρ constitutes a clear evidence for the presence of positive spatial autocorrelation, which is appropriately captured by the fitting of this model. Taking these facts into consideration, the fitting of these generalized spatial conditional models would be well justified. In addition to being able to model the spatial dependence in the data, it also explains their nonconstant dispersion. As part of the posterior predictive checks required to better assess the fit of a model, Figure 5 includes the scatterplots for the observed mortality rates versus the predicted mortality rates for some of the fitted models. From the plots in Figure 5a,b, we can mention that the spatial conditional normal Poisson models in Equations (17) and (18) show high accuracy in the prediction of mortality rates for observed rates under 40 whereas, for some of the values larger than 40, predictions slightly deviate from the observed values. The scatterplot included in Figure 5c for the generalized spatial conditional normal Poisson model in Equation (19) shows a considerable improvement, mainly because values of the mortality rates larger than 40 are now more accurately predicted.

Figure 5.
Scatterplots for the observed versus the predicted rates obtained from some of the fitted models to the Colombia infant mortality rates data set. Figure 6 includes histograms of the posterior predictive distributions for the means over replicated simulations of the mortality rates, estimated from the fitting of the spatial conditional normal Poisson model in Equation (17) (see Figure 6a) and Equation (18) (see Figure 6b), and by the generalized spatial conditional normal Poisson model in Equation (19) (see Figure 6c). As we can see from these figures, they show how the means of the replicated data vary with respect to the actual mean of the observed values, a fact suggesting that the considered models offer an adequate fit to the data set under study.

Comparisons to the BYM Model
In order to be able to compare the performance between the previous models and the BYM model, we have decided to select the best fitting model we have so far, which is the spatial conditional normal Poisson model, and compare its fitting to that of the BYM model. In order to do so, we have fitted the BYM model with regression structure given by: where ν i , i = 1, . . . , n is a set of normally distributed random effects with Var(ν i ) = τ, with τ > 0, and η i , i = 1, . . . , n is a set of spatially structured random effects following an intrinsic CAR prior distribution, with variance parameter τ η . As in the previous models, we assume noninformative normal priors for the regression parameters; that is, N(0, 1 × 10 −5 ) and, for the precision parameters ψ = 1/τ and ψ η = 1/τ η , we also assume noninformative prior gamma G(1 × 10 −4 , 1 × 10 −4 ) distributions. Finally, for the BYM model, we assume the same first order neighborhood structure as before.
The fitting of this model in OpenBUGS is very unstable, since it showed drastic changes if we slightly changed the assumed values for the prior distributions, especially in the estimated values for the variance parameter τ η of the spatially structured random effects. In addition, in most cases, the effective number of parameters resulted in negative values, which could be a clear sign of conflict between the prior distribution and the data (see [37]). For the reasons we have just described here, and in order to be able to compare the performance of both models when fitting them to the data set under study, we have fitted both the spatial conditional normal Poisson model in Equation (21) and the BYM model in Equation (20) in R-INLA, by setting the same values for the prior distributions, so that more stable results were provided. Moreover, we have also fitted reduced versions of these models. (21) where ν i , i = 1, . . . , n is a set of normally distributed random effects with Var(ν i ) = τ. Results for the estimation of the spatial conditional normal Poisson model in Equation (21) are reported in Table 8, and those for the BYM model in Equation (20) are included in Table 9, as well as estimations of reduced versions of these models.
If we compare the results from the fitting of the spatial conditional normal Poisson model in OpenBUGS in Table 3 with the one in INLA in Table 8, we can see that reported results are quite similar. In some cases, the means of the estimated parameters only differ in the third decimal place and, only for very few cases, in the second decimal place. This also occurs for the estimations of the variance parameter for the random effects, where we can only observe differences in the third decimal place for some cases. Even though the estimations are very similar, it must be emphasized that the computation time that R-INLA used for the fitting of each one of the models reported in Table 8 was much smaller than the one used by OpenBUGS for the same model. With regard to the information criteria values for the fitted models (i.e., the DIC and WAIC values) reported in Table 9 for the BYM model, as well as those reported in Table 8 for the spatial conditional normal Poisson model, we can see that the WAIC values indicate a moderately better fit for the spatial conditional normal Poisson model whereas differences in the DIC values are minimal and do not favor any of these models in particular. Comparison between the performance of the spatial conditional normal Poisson and the BYM fitted models can be better explored in Figure 7, where scatterplots of the observed versus the estimated values obtained from the fitting of some of the reduced versions of these models are shown. In addition, there are certain points about the fitting of the BYM model that are worth being mentioned.
For the BYM models reported in Table 9, the estimated variance parameters for both the spatially correlated and the uncorrelated random effects show overall quite large standard deviations. For instance, in the case of the model with regression structure given by Equation (20), the estimated variance parameters for the random effects werê τ η = 0.0260(0.0251) andτ = 0.0224(0.0141), making it difficult to interpret them, especially to explain the spatial dependence or the extra-variability present in the data. This also occurs for the model containing only the explanatory variables NBI and Rec, where the means of the estimated variance parameters wereτ η = 0.0367(0.0301) and τ = 0.0170(0.0113), respectively.
In any case, all of the BYM fitted models resulted in larger means for the variance parameter estimates τ η for the spatially structured random effects than those obtained for the means for the estimates of the variance parameter τ for the uncorrelated effects. These results may be a sign indicating that fits of the BYM model for this data are probably giving more importance to the spatial structure assumed by the intrinsic CAR prior for the spatial effects than to the extra-variability represented by the unstructured effects, a fact that will be closer examined when fitting the BYM2 model in the next section. Nevertheless, we have not been able to obtain information about the strength, or even the type of the spatial association from the estimations in this model.
Another issue that we believe is important to mention relates to the interpretation of significance for the estimated coefficients of the regression parameters obtained when fitting the BYM models. According to the 95% credible intervals, the coefficient for the variable NBI is the only one that is statistically significant for each one of the reported models, whereas in the spatial conditional normal Poisson, the variables Rec and Viol were statistically significant for some cases. Hence, regarding the explanatory variables, a direct interpretation of their statistical significance according to their credible intervals on infant mortality rates can only be made for the variable NBI.
We have also computed the marginal effects at the means for the explanatory variables in some of the fitted BYM models, which are reported in Table 10. All of the values reported there appear to be quite small, consistently with the values for the marginal effects at the means obtained from the fitting of the spatial conditional normal Poisson models in OpenBUGS, reported in Table 4 in Section 3.1. A difference to be noticed in this case is that, according to their 95% credible intervals, the effects for the variables Rec and Viol are not statistically significant for some of the models. For instance, the effect of the variable Viol is not significant in any of the models, and the effect of variable Rec is not significant for the BYM model.
As a simple visual way of comparing the performance of the spatial conditional normal Poisson and the BYM models fitted in INLA, Figure 7 includes the scatterplots of the observed versus the predicted values obtained from the fitting of some of the reduced versions of these models. These plots suggest that there are no major differences in the fitting of these two models to this data in terms of predictive accuracy.
From the issues mentioned above, we believe that, for the application considered here, the spatial conditional normal Poisson model may be a better option to model spatial count data following a Poisson distribution when compared to the BYM model. However, leaving aside the fact that there were some serious issues that emerged when fitting the BYM model with the MCMC approach, when fitting them in INLA the DIC and WAIC values were quite similar for both models. Moreover, the spatial conditional normal Poisson model provided information about the type and strength of the spatial autocorrelation that is present in the data, information that could not be obtained from the fitting of the BYM model. Nevertheless, it is essential that we remember that the true model is not known, nor are the true values of the strength of the spatial association or the real overdispersion parameter.

Comparisons to the BYM2 Model
Finally, we would like to explore the possibilities that the recently developed BYM2 model offers, given that it allows us to overcome the issue of nonidentifiability in the BYM model. In order to do so, we have fitted the BYM2 model with regression structure given by: where ν i , i = 1, . . . , n is assumed to be a set of normally distributed random effects with a scaled variance approximately equal to one, and η i , i = 1, . . . , n is a set of spatially structured random effects following an intrinsic CAR prior distribution, each one also with scaled variance approximately equal to one. The unknown precision parameter τ s captures the variance contribution from the sum of the two random effects, and the mixing parameter φ s controls for the variance contribution of the spatially structured effect η = (η 1 , . . . , η n ) , whereas the variance contribution of the unstructured random effect ν = (ν 1 , . . . , ν m ) is explained by 1 − φ s . As in the previous models, we assume noninformative normal priors for the regression parameters; that is, N(0, 1 × 10 −5 ). For the precision and mixing parameters, complexity priors are specified when following the approach by Simpson, Rue, Riebler, Martins and Sørbye [34]. That is, we assume the probability statement that Prob(1/ √ τ s > U) = α for the parameter τ s . By considering an upper bound for the marginal standard deviation of 0.2, and using the rule of thumb from the aforementioned proposed approach, we can set U = 0.2/31 and α = 0.01. As for the mixing parameter φ s , we can specify that Prob(φ s < 0.5) = 2/3, which would represent the initial assumption that the proportion of the variability captured by the unstructured random effect ν is larger than the one explained by the spatially structured effect η. Table 11 reports the results for the estimation of the BYM2 model in Equation (22), as well as some of its reduced versions. As can be seen, most of the estimations obtained for the regression parameters by fitting the BYM2 model are quite similar to those obtained by fitting the BYM model reported in Table 9 and, in addition, there are no improvements in the information criteria values for any of the fitted models. The parameter φ s , associated to the amount of variability captured by the spatial structure considered in the model, explains more than 30% of the variability in all the fitted models. For instance, for the model in Equation (22), this parameter's estimate wasφ s = 0.3091(0.2219). In some cases, such as for the model including the variables NBI and Vac, and the model only including the variable NBI, the variance explained by the spatial effect was more than 50% of the total variability. Therefore, and, from our point of view, the main advantage that the BYM2 model offers over the BYM model is the possibility of identifying the spatially structured and the overdispersion effects separately. Moreover, the results obtained show the importance of taking into account the spatial dependence in the data in the sense that it may be explaining a large portion of the overdispersion in the data.
In addition, Table 12 includes the estimated marginal effects at the means for some of the fitted BYM2 models. The values obtained for these effects are quite small and greatly resemble those obtained from the BYM models previously fitted and reported in Table 10. Figure 8 shows the scatterplots of the observed versus the predicted values of infant mortality rates obtained from the fitting of some of the reduced versions of the BYM2 model in Equation (22). If we compare these plots to the ones from Figure 7, no significant differences can be seen. Hence, there are no improvements in terms prediction accuracy in the fitting of the BYM2 over the previously fitted BYM models that can be reported.  Finally, in order to illustrate the performance of the fitted models, Figure 9 includes maps of the observed infant mortality rates (i.e., variable Rates, see Figure 9a) and of the estimated mortality rates obtained by fitting some of the models considered here. As can be seen, the maps of the estimated mortality rates obtained by fitting the spatial conditional normal Poisson model in Equations (17) (see Figure 9b) and (18) (see Figure 9c) in Open-BUGS, and for the generalized spatial conditional normal Poisson model in Equation (19) (see Figure 9d) suggest that these three models, presented in Sections 3.1 and 3.3, provided similar estimations of the mortality rates and, in addition, that their fitted values maps are almost identical to the map for the observed values in Figure 9a. This fact is consistent with their observed versus predicted mortality rates scatterplots (see Figure 5 in Section 3.3). Moreover, the maps for the spatial conditional normal Poisson models considered above, fitted in INLA (see Figure 9e,f, respectively) also appear to be very similar to the map for the observed rates in Figure 9a.
The scatterplots of the observed versus the predicted values of infant mortality rates obtained from the fitting of the BYM (see Figure 7c,d) and BYM2 models (see Figure 8a,b) showed no distinguishable differences in the prediction accuracy of infant mortality rates, if we compare them with those obtained from the fitting of the spatial conditional normal Poisson (see Figure 7a,b). However, the maps for the BYM model including the variables NBI and Rec (see Figure 9g), and for the BYM model including the variables Viol and NBI (see Figure 9h) differ from the map of the observed rates in some of the regions. These discrepancies are also displayed between the observed values (see Figure 9a) and the predicted values for infant mortality rates obtained from the fitting of the BYM2 models considered here (see Figure 9i,j) These facts seem to suggest that we have not been able to correctly predict the mortality rates in some cases with the fitted BYM and BYM2 models analyzed here.
(c) Spatial conditional normal Poisson model in Equation (18) fitted in OpenBUGS.
(d) Generalized spatial conditional normal Poisson model in Equation (19)

Discussion
In this paper we have performed a revision of Bayesian spatial conditional overdispersion models [22] for area count data and also provide a comparison with the well known Besag-York-Mollié (BYM) model [16], widely used in the literature for spatial count data modeling. In this context, we would like to emphasize the importance of taking into account the overdispersion, as well as the dependence that can arise from the correlation among the values of the response variable in neighboring locations. We believe that the spatial conditional models offer a good, flexible, reasonable and worth considering alternatives to the BYM model. Moreover, we have shown their usefulness and appropriateness by fitting them to infant mortality data from Colombia.
The spatial conditional normal Poisson and negative binomial models were fitted for the case of Poisson distributed response variables, proving to be good candidates for fitting the data set under study. Moreover, they are also able to account for overdispersion, as well as to explain the intensity of the spatial autocorrelation that was present in the specific data set under study. More specifically, according to the DIC and WAIC information criteria values, the spatial conditional normal Poisson model was selected as the best fitting model. We were able to fit this model and its reduced versions in OpenBUGS and R-INLA and, by setting the same prior distributions, we obtained very close results in both implementations. Given that these two software packages are based on different methodologies, we believe it was convenient and necessary to confirm the consistency of the estimations in both software packages.
Results obtained from the fitting of the aforementioned models were consistent with the results obtained in previous analyses of a similar data set, reported in Cepeda-Cuervo, Córdoba and Núñez-Antón [22], where the authors modeled mortality rates for children younger than five years old of age in Colombia. Their analysis found positive significant relations with the variable explaining the index of unsatisfied basic needs (i.e., NBI) and negative significant relations with the variable representing the resources provided by the government for academic achievement of the population (i.e., Rec) when assessing their effect on the infant mortality rates under study. In our case, for infant mortality rates for children under one year of age, this significant relationship also holds and, in addition, a positive significant relationship was also identified with the percentage of women who had suffered physical violence from their current partners (i.e., Viol). As in Cepeda-Cuervo, Córdoba and Núñez-Antón [22], in this application we have also found evidence of positive spatial autocorrelation in the data under study. In addition, we have also fitted the generalized spatial conditional normal Poisson model, which provided more flexibility by allowing the dispersion to vary as a function of some explanatory variables.
In order to be able to compare the performance of the spatial conditional models, we have also fitted the BYM model to the data under study. There were some difficulties when fitting these models in OpenBUGS, which could be due to a number of problems, such as, for example, a possible existing conflict between the data and the assumed prior distributions (see [37]). In any case, these difficulties did not represent a problem for fitting these model in R-INLA, an issue that was confirmed with the models fitted in previous sections for this particular data set considered. The estimated parameters obtained by fitting the BYM model showed quite large standard deviations, especially for the variance parameters for both the spatially correlated and the uncorrelated random effects. However, the model seemed to favor the spatial structure over the extra-variability, although, because of the way these effects are specified in the BYM model, no more information about the specific spatial dependence could be obtained from it.
In order to be able to provide more information than that obtained from the fitting of the standard BYM model, we have also fitted the BYM2 model, which allows us to identify the spatially structured and the unstructured effects separately. As can be seen in the reported results, this model provided useful findings about the amount of variability in the data that could be explained by the spatial structure assumed in the fitted models. In addition, we were also able to conclude that no significant improvements were suggested by the information criteria values, or in terms of posterior predictive accuracy when the BYM2 model was compared to the previously fitted spatial conditional and to the BYM models.
We have also performed posterior predictive checks on the fitted models, finding that they can provide a reasonable accuracy in the predictions of the mortality rates for most cases, especially for the spatial conditional normal Poisson and the generalized spatial conditional normal Poisson models. In our view, and based on the previously reported results, the performance of the spatial conditional normal Poisson model was considerably better, keeping in mind that the information criteria slightly favored it over the BYM model, and also taking into account the aforementioned issues when fitting the latter. The spatial conditional normal Poisson model allowed for the overdispersion to be taken into account and, unlike the BYM model, it provided information on the type, and also the strength, of the spatial association which was present in the data. In addition, with the results obtained from the fitting of this model, appropriate and well justified inferences could be made about the regression parameters in the model. Finally, with regard to model implementation, we can mention that the models' fitting in the software package OpenBUGS was quite flexible, mainly because it allows the researcher to specify any kind of Bayesian models in a very simple and intuitive way. The same holds and it is also straightforward in R-INLA, since most models are already specified in this package. However, it becomes more complex when the researcher wishes to employ a particular model, different from the ones already available therein, an issue that also occurs when a different prior specification is required. More specifically, the implementation of new prior distributions for Bayesian analysis in R-INLA is one of the current main challenges for the developers of the package (see [43]). However, a point in favor of R-INLA is the much shorter computation time that it requires for the fitting of a model, when compared to OpenBUGS.