Bayesian Spatial Joint Model for Disease Mapping of Zero-Inflated Data with R-INLA: A Simulation Study and an Application to Male Breast Cancer in Iran

Hierarchical Bayesian log-linear models for Poisson-distributed response data, especially Besag, York and Mollié (BYM) model, are widely used for disease mapping. In some cases, due to the high proportion of zero, Bayesian zero-inflated Poisson models are applied for disease mapping. This study proposes a Bayesian spatial joint model of Bernoulli distribution and Poisson distribution to map disease count data with excessive zeros. Here, the spatial random effect is simultaneously considered into both logistic and log-linear models in a Bayesian hierarchical framework. In addition, we focus on the BYM2 model, a re-parameterization of the common BYM model, with penalized complexity priors for the latent level modeling in the joint model and zero-inflated Poisson models with different type of zeros. To avoid model fitting and convergence issues, Bayesian inferences are implemented using the integrated nested Laplace approximation (INLA) method. The models are compared according to the deviance information criterion and the logarithmic scoring. A simulation study with different proportions of zero exhibits INLA ability in running the models and also shows slight differences between the popular BYM and BYM2 models in terms of model choice criteria. In an application, we apply the fitting models on male breast cancer data in Iran at county level in 2014.


Introduction
In recent years, spatial statistical methods and spatial models for smoothing disease rates are widely applied within epidemiology and ecology studies. Bayesian hierarchical models for the class of latent Gaussian models (LGMs) are frequently used in disease mapping. In fact, Bayesian hierarchical models improve the estimates of log risk by neighboring regions information in the spatial structured component as well as region variation in the unstructured component [1]. Besag, York and Mollié (BYM) model using Markov chain Monte Carlo (MCMC) method is a Bayesian hierarchical model that is widely used in disease mapping (see for example Ebrahimipour et al. [2] and Sharafi et al. [3]).
Unfortunately, the MCMC simulation methods for LGMs, in computing complex posterior distributions with large datasets, are time-consuming and problematic in convergence [1,4,5]. As to improving MCMC algorithms, several MCMC sampling strategies have been suggested; for example, methods based on the Metropolis-adjusted Langevin algorithm (MALA), methods based on Hamiltonian mechanics (HMC) and the single-block strategy [6]. However, despite all these developments, MCMC sampling techniques for LGMs suffer from slow convergence and poor mixing for large and complex models [4,5]. In order to solve these issues, Rue et al. [5] proposed an algorithm for full Bayesian inference, so-called integrated nested Laplace approximation (INLA). Carroll et al. [7] compared INLA and MCMC methods for hierarchical Poisson modeling in disease mapping via R and OpenBUGS. With the same prior distributions, they found the equivalent results using both methods, and by simulation they proved that INLA is computationally faster and more precise than MCMC. Simpson et al. [8], by using INLA, proposed a re-parameterization of the BYM model (BYM2), so as to make priors interpretable and transferable, using penalized complexity (PC) priors and scaled spatial component. Blangiardo and Cameletti [1] showed the ability of INLA in running complex models, explained spatial and spatiotemporal Bayesian models in several fields and implemented code of numerous practical examples with spatial structure in R package R-INLA.
Spatial models for count data are often based on Poisson distribution. However, Poisson distribution might be inappropriate when the number of zeros in data is more than expected (zero-inflated data). To overcome this, zero-inflated Poisson (ZIP) distribution can be used for modeling this type of data. ZIP distribution is a proportion of zeros in combination with Poisson distribution [9]. On the other hand, models which are based on ZIP distribution can be combined with any spatial Bayesian hierarchical model for count data. Hence, Agarwal et al. [10] employed the ZIP model with a spatial Bayesian hierarchical model and showed the model fitting on Isopod nest burrows data in a Bayesian framework. Then, Gschlößl and Czado [11] overviewed spatial regression models with Poisson, generalized Poisson and negative Binomial distributions for zero-inflated count data in a hierarchical Bayesian point of view. They also showed the models on the number of invasive meningococcal disease cases in Germany in 2004. Arab [12] reviewed hurdle and ZIP models with spatial and spatiotemporal structures and used the models to analyze five-year counts of the confirmed cases of Lyme disease in Illinois at the county level with the INLA method.
In most published studies regarding spatial modeling zero-inflated count data of disease and mortality, so as to reduce complexity, spatial component was considered just for the occurrence or for the number of occurrences in a single model [10,13]. On the other hand, when responses come from two different distributions (disease number and occurrence) and we have two models for each response that are affected by common factor (spatial), it is better to use joint model. Joint models, as a way to fit models better, to control all uncertainty sources and to provide accurate inference, have been expanded over the past decade (see for example Blangiardo and Cameletti [1], Zhu et al. [14] and Illian et al. [15]). In the same vein, we expand the class of joint models using the shared spatial random effect for mapping zero-inflated count data. In a spatial joint model structure, random spatial effects can be formulated simultaneously for both the occurrence and the number of occurrences. Incorporating spatial structure in the joint model reduces variability, since information may be borrowed across different datasets, but increases the complexity of a model [15]. Such models might become unstable and face convergence issues by MCMC methods. However, the complex posterior distributions of spatial joint models can be defined and run by the INLA approach [1,4,15].
In our study, it is assumed that zero-inflated data come from two distributions: Bernoulli and Poisson (one for the occurrence and one for the number of occurrences). In this study, we account for spatial random effect in a joint model consisting of logistic and log-linear model to model both parameters of distributions simultaneously. In order to solve the mentioned issues, the parameters of the models are estimated by the INLA method via R-INLA. As spatial modeling, the BYM and BYM2 models, which have different ways of assigning priors, are considered. To compare the behavior of the popular BYM and BYM2 models in single and joint models, we perform a simulation study with different types and proportions of zero, which are common in medical studies. As an application, the models are applied on data of male breast cancer in Iran. The models can be compared according to the deviance information criterion (DIC) and the logarithmic scoring (LS) suggested by Spiegelhalter et al. [16] and Gneiting and Raftery [17].
The main framework of the paper is as follows: Section 2 gives a brief explanation on spatial models and ZIP models with different types of excess zeros, and then the spatial joint model is introduced practically. Sections 3 and 4 give explanations about simulation study and real data. Section 5 shows the result of different simulation scenarios and mapping male breast cancer in Iran in 2014, and Sections 6 and 7 are the discussion and conclusions.

Materials and Methods
Disease mapping is used to identify the spatial pattern of diseases and characteristics of areas and to discover unusual areas with high or low relative risk. In spatial studies, it is assumed that adjacent areas (e.g., administrative areas such as county, state, province) are more similar in environmental characteristics. These characteristics have similar effects on disease prevalence and incidence rates. Therefore, in spatial modeling, information obtained from neighboring areas should be used. For this purpose, Bayesian hierarchical models are proposed. In most published studies, Poisson distribution is used because the numbers of diseases or mortality cases are discrete and rare relative to the target population [1].

The Spatial Models
The number of occurrences or deaths for each area i, i = 1, . . . , n, is denoted by o i . The observation o i follows a probability function (likelihood function) as: so that E i and θ i are the expected number (offset) and relative risk. The main goal is modeling θ i , so using canonical link function a linear model is used as: where α, x T i , δ and γ denote intercept or average of incidence rate, region covariates, regression parameters and random effects, respectively. Priors on α and δ are specified non-informative, usually normal distribution with a large variability. Priors for random effects will be described in the next subsection.

BYM Model
At first, Besag et al. [18] proposed a combination of spatially structured and unstructured effects in the BYM model as where v i ∼ N 0, τ −1 v I represents the spatial unstructured random effect and τ −1 v the marginal variance. In contrast, u i ∼ N 0, τ −1 u Q − is the spatial structured random effect and is modeled under the class of intrinsic Gaussian Markov random field models. Q denotes the precision matrix (neighboring matrix), and Q − is the generalized inverse of Q. The marginal variances are τ −1 u [Q − ] ii , which are dependent on the neighboring matrix Q. In this model, covariance matrix is Gamma priors with small rate parameters are commonly assigned to τ v and τ u . Here, Gamma (1, 0.01) is considered.
In the BYM model, components v and u can not be interpreted independently because the independent random effect can be seen in the spatial random effect partially for the case where there is no spatial dependence. To resolve this issue, priors should be heavily dependent [8]. Hence, Dean et al. [19] replaced the precision parameters of the BYM model with a common precision parameter and added a mixing parameter. On the other hand, in the Dean model, priors cannot be transferred from one graph to another because the marginal variance of spatial dependence effect depends on the graph. Then, to remove the effect of graph structure on precision parameter priors, Simpson et al. [8] developed the Dean model by scaling the neighborhood matrix. They also introduced a new method, termed penalized complexity (PC) priors, to assign priors to the hyperparameters. PC priors are obtained from penalizing the complexity caused by deviation of the base model.

BYM2 Model
Simpson et al. [8] re-parameterized the BYM model with the mixing parameter ϕ ∈ [0, 1] and the precision parameter τ γ . They also scaled the spatial structured effect to facilitate interpretation and transition priors. The BYM2 is as: here, u * i is the scaled structured effect. The generalized variance is equal to 1/τ γ . Thus, τ γ shows the marginal deviation from the model without random effects independent of the graph. The covariance matrix is defined as with Q − * the scaled precision matrix. Therefore, using the standardized Q − * , the marginal precision from v and u * are (1 − ϕ)/τ γ and ϕ/τ γ , respectively and are interpretable independently. In the BYM2 model, PC priors can be applied to ϕ and τ γ . We refer to Simpson et al. [8] and Riebler et al. [20] for more details about BYM2 model and PC priors.

Joint Model
The focus is on disease mapping of count data with extra zeros (often with 50-80 percent zeros in the data). The Zero-inflated Poisson distribution is used instead of Poisson distribution when there are extra zeros in data. Extra zeros are divided into two types: Structural and non-structural zeros (sample zero). The structural zeros come from reality (the number of occurrences in area is reported zero based on reality). However, sample zeros come from chance (the number of occurrences in area is reported zero based on chance or mistake). In this section, we explain zero-inflated Poisson models type1, type0 and extend spatial modeling to a joint model.

Zero-Inflated Poisson Model Type1
Here, the number of occurrences or deaths (observations) is assumed to follow a probability function (likelihood function) as for each area i, i = 1, . . . , n. This probability function is a mixture of proportion of structural zeros (1 − p i ) and sample zeros (p i ). The Poisson distribution is applied to count data with sample zeros. The model includes both types of zero and is known as "Type1" in INLA texts and packages.

Zero-Inflated Poisson Model Type0
When only structural zeros may occur in data, observations follow a probability function as The Poisson distribution is applied for non-zero counts (truncated Poisson), and (1 − p i ) is defined as the proportion of structural zeros. The model includes only structural zeros and is known as "Type0" in INLA texts and packages.
The main interest is now in the modeling two latent fields p i and θ i using canonical link function logit(p i ) and log(θ i ). As usual the linear predictor for the Poisson is log transformation, and for the Bernoulli it is logistic transformation. It should be noted that log(θ i ) and logit(p i ) can include a different set of covariates x i , y i and random effects [1]. However, in the spatial joint model the spatial random effect (γ) is shared on both log(θ i ) and logit(p i ). The models for the latent field are specified as the log link function for θ i as, and logit function for p i as In the last two formulas, α O and α z denote average incidence rate by antilog transformation and average probability of events by antilogit transformation. As prior, normal distribution with mean 0 and precision 0.0001 is assigned to intercepts α O and regression parameters δ, and normal distribution with mean −1 and precision 0.2 is assigned to intercepts α z and regression parameters ω. The parameter β is the scaling parameter that can show the similarity of the spatial pattern between the occurrence and number of occurrences. Prior N(1, 10) is used for β because, the more probability of occurrence, the greater the number of occurrences.
The joint model has two likelihoods, one for the occurrence and one for the number of occurrences, so a two-column response matrix is defined. The joint model considers a combination of the Bernoulli distribution and the truncated Poisson distribution. The Bernoulli distribution ( z i ∼ Binomial(p i , n i = 1)) is viewed as the distribution for disease occurrence in area i, and the truncated Poisson distribution ( o i ∼ Poisson(E i θ i )) is considered for the number of occurrences in area i, given that the disease occurred. Since linear predictor for each response is different, the matrix is specified as misaligned. The number of occurrences in location n i can be zero or a positive number, so the disease occurrence z i at location n i is defined as In the following, we display response matrix structure Y and code of the joint model within the R-INLA package with PC priors for BYM2 model. In this code, phi = ϕ and pc.prec = 1/ √ τ γ . See the website http//www.r-inla.org/ that provides documentation and programming commands for INLA package. We implemented the mentioned models under a simulation study and on a real example via the stable version of the package INLA under R3.4.4. For spatial modeling or model process, the BYM and BYM2 models were considered. We can compare results by the DIC and LS based on the conditional predictive ordinate (CPO) as the model selection criteria. A smaller value of the DIC and LS shows that the model is better. Note that when we deal with two likelihoods, DIC and LS are the sum of local deviance information criteria and LSs in joint model [1].

Simulation Study
We simulated 200 datasets under 60 different scenarios to investigate the behavior of mentioned spatial models on the ZIP models type0, type1 and the joint model. Generally, non-contagious diseases, like cancers or contagious diseases with low prevalence like HIV, have low expected numbers. Therefore, we simulated with low values of E = 1 and 5. In contrast, the epidemic wave of some contagious diseases has high prevalence in some areas but may not be observed in other areas like influenza. Therefore, we simulated with higher expected values of E = 15, 60 and 200. Data are generated under the same pattern proposed by Riebler et al. [20] with two types of zero (type0 and type1) and proportions of zeros (P = 0.50 and P = 0.70), five different expected numbers (E = (1, 5, 15, 60, 200)) and three levels of risk; constant risk, spatially unstructured risk and spatially structured risk. Since truncated Poisson is used in joint model, the pattern of data generation for the joint model is similar to the type0 model. Consistent with Riebler et al.'s study, we used the neighborhood structure of Sardinia in simulations. Sardinia is a large Italian island in the Mediterranean Sea with 366 regions, suitable for testing methodology. The posterior mean estimates are considered for probability of zero, the two hyperparameters 1/ √ τ γ and ϕ and β scaling parameter for the joint model as well as DIC and LS.

Case Study: Male Breast Cancer in Iran
Male breast cancer is defined as a lump that can grow out of control in the chest. Breast cancer is almost 1.5% to 2.5% of all cases of malignancies in men [21]. Although this cancer is rare, its incidence rate is increasing [22]. More than 2000 cases of male breast cancer have been reported in the United States in 2012 [23]. Clinical features to breast cancer in males are like those in women [24]. Risk factors, such as positive family history; advanced age; obesity; BRCA2 mutations; conditions that increase estrogen to androgen ratio (such as liver diseases or prostate cancer treated with estrogens) and environmental factors, including high temperature, pollutants, aromatic hydrocarbons, radiation, etc., increase the incidence of male breast cancer [24,25].
Iran is a country in Western Asia and covers 1,648,195 km 2 of land. Cancers are the third highest cause of mortality in this country, so controlling and preventing cancers are important issues in the field of health in this country. According to the report of the Statistical Center of Iran (SCI) in 2015, Iran has almost 78 million inhabitants (51% males) and is subdivided into 429 administrative areas called counties.
The Iranian National Population-based Cancer Registry (INPCR) recorded cancers according to coding system of the International Classification of Diseases for Oncology ICD-O-3. The INPCR covered 98% of the total population in 2014 [26]. The number of reported cases of male breast cancer are based on code C50 and had a low incidence in Iran. Only 303 cases were reported for 429 counties in 2014 with the average of 0.71 per region. Unfortunately, regarding the importance of spatial patterns and environmental risk factors in the diagnosis and prevention of cancers, there is no geographic study on male breast cancer in Iran.
We decided to fit the mentioned models on dataset of the male breast cancer to obtain estimates of incidence rate in Iran, in 2014, over the population at risk aged >20 years. The data on male breast cancer were obtained from the INPCR. The frequency of disease was zero in many counties.

The Simulation Study
We performed a simulation study to assess the behavior of the BYM2 model on single models (the zero-inflated data of type0 and type1) and on the joint model. The results of type0, type1 and the joint model are almost similar for both proportions of zeros P = 0.50 and P = 0.70. Our focus is on the joint model, so we display the result of the joint model as a sample of simulation study results. Table 1 shows the posterior mean estimates for intercepts, hyperparameters (1 ⁄ and ), scaling parameter and average standard deviations (showed in parentheses) of the joint model. The intercepts have non-informative priors; thus, they are estimated closer to the likelihood. We need to apply exp ( ) (1 + exp ( ) ⁄ ) transformation and exp ( ) transformation on the intercepts and to obtain the average probability of occurrence and average incidence rate. Here, we expect exp ( ) to be estimated close to 1, and exp ( ) (1 + exp ( ) ⁄ ) to be estimated close to 0.50 for P = 0.50, and 0.30 for P = 0.70. Intercepts are well-estimated in each three levels. The parameter has informative prior, thus it is estimated closer to the prior distribution. All s confirm the similarity of spatial pattern between occurrence and number of occurrences because the credibility intervals of s do not contain zero. The results of the estimates for 1 ⁄ are almost reasonable for all levels, but there is no change, even by increasing the expected numbers. Datasets at constant level have no information about , so mixing parameter is estimated based on the prior distribution. In other levels, mixing parameters could be well-estimated, especially when expected numbers are bigger. The results for E = 1 give no clear information at each three levels of risk.

The Simulation Study
We performed a simulation study to assess the behavior of the BYM2 model on single models (the zero-inflated data of type0 and type1) and on the joint model. The results of type0, type1 and the joint model are almost similar for both proportions of zeros P = 0.50 and P = 0.70. Our focus is on the joint model, so we display the result of the joint model as a sample of simulation study results. Table 1 shows the posterior mean estimates for intercepts, hyperparameters 1/ √ τ γ and ϕ , scaling parameter β and average standard deviations (showed in parentheses) of the joint model. The intercepts have non-informative priors; thus, they are estimated closer to the likelihood. We need to apply exp(α z )/(1 + exp(α z )) transformation and exp(α o ) transformation on the intercepts α z and α o to obtain the average probability of occurrence and average incidence rate. Here, we expect exp(α o ) to be estimated close to 1, and exp(α z )/(1 + exp(α z )) to be estimated close to 0.50 for P = 0.50, and 0.30 for P = 0.70. Intercepts are well-estimated in each three levels. The parameter β has informative prior, thus it is estimated closer to the prior distribution. All βs confirm the similarity of spatial pattern between occurrence and number of occurrences because the credibility intervals of βs do not contain zero. The results of the estimates for 1/ √ τ γ are almost reasonable for all levels, but there is no change, even by increasing the expected numbers. Datasets at constant level have no information about ϕ, so mixing parameterφ is estimated based on the prior distribution. In other levels, mixing parameters could be well-estimated, especially when expected numbers are bigger. The results for E = 1 give no clear information at each three levels of risk.
In the second step, despite the advantages of the BYM2 model compared to the BYM model, we would like to compare DIC and LS average values of the BYM2 and BYM models. We included spatial effect for the number of occurrences in single models. According to Table 2, deviance information criteria of type0 and type1 of the BYM2 in all scenarios are smaller or almost equal to that of the BYM model. In the joint model, deviance information criteria of the BYM2 are slightly lower than deviance information criteria of BYM, in most cases. However, LSs of type0, type1 and the joint model of BYM2 have a small difference compared to the BYM model.

Analysis of Male Breast Cancer in Iran
We fitted the BYM and BYM2 models on male breast cancer data and reported the DIC and LS values for the joint model (see Table 3). Eighty five percent of areas have the expected number between (0-1) and the average of expected number is 0.70. The output of the joint model shows that ϕ = 0.56, which means 56% tendency to spatial dependence and 44% tendency to over-dispersion or spatial independence.β = 0.95 with credibility interval (CI) = (0.08-0.96) shows that spatial pattern of occurrence and the number of occurrences are similar in the joint model. In addition, 1/ τ γ = 0.20, which shows the marginal deviation from constant level α, independent of the graph. In the joint model, DIC and LS of the BYM2 are slightly smaller than those of the BYM model. In addition, we checked and saw a reduction in DIC and LS in joint model compared to the sum of them in single models. On the other hand, we would like to compare DIC and LS values of the BYM2 and BYM models for type1 and type0 models on the real data. As usual, we included spatial effect for the number of occurrences in type0 and type1 models. According to Table 4, in the type0 and type1 models, the DIC and LS of BYM2 model are slightly different from those of BYM model. According to model selection criteria, DIC and LS, type1 model is more appropriate than type0 model for male breast cancer data. The output of models shows that zero probability is estimated 0.08 using type1 and 0.70 using type0. Type1 gives lower probability owing to some zeros covered by the Poisson distribution. According to the results and advantages of the BYM2, we performed the male breast cancer mapping based on the BYM2 model. Figure 2 shows maps of the estimated relative risk (exp(γ)) of male breast cancer across 429 counties in Iran using the BYM2 model for type0, type1 and joint model.

Discussion
The major aim of this paper was mapping data with extra zeros using the joint model by INLA method. Further, we studied the behavior of the popular BYM model and BYM2 model to analyze count data with different proportions and types of zeros. The BYM2 model has three main advantages: (1) Priors are transferable by scaling the precision matrix-after scaling, a fixed

Discussion
The major aim of this paper was mapping data with extra zeros using the joint model by INLA method. Further, we studied the behavior of the popular BYM model and BYM2 model to analyze count data with different proportions and types of zeros. The BYM2 model has three main advantages: (1) Priors are transferable by scaling the precision matrix-after scaling, a fixed hyperprior for the precision parameter gives the same amount of smoothing if the graph changes; (2) interpretations of the hyperparameters are clear-the precision parameter shows deviation from a constant risk without depending on the graph structure, and the mixing parameter shows variability between the spatial dependence component and the spatial independence component and (3) the model structure gets prepared to assign PC priors-these priors shrink toward simpler models, such as a model with constant risk or a model without spatial correlation, so we focused on scaling and PC priors using the BYM2 model in simulation study and real data.
Riebler et al. [20] investigated on the BYM2 model with details under the simulation study with different scenarios using Poisson distribution and common disease prevalence. Furthermore, they compared the BYM2 model with usual spatial models in disease mapping and proved that the BYM2 model performs well. They used the INLA method for fitting models instead of MCMC.
Our simulation study on excess zeros showed the BYM2 model's ability to shrink toward true values in different risk surfaces. This study also showed that the joint model in combination with the BYM2 model performs well, without convergence problems, by INLA method. According to selection criteria, DIC and LS, there was a slight difference between the BYM model and BYM2 model. Therefore, the BYM2 model using INLA is preferred to BYM model for mapping zero-inflated data.
The selection between single model and the joint model is dependent on the nature of the data. When data come from two distributions and are influenced by common factors such as location and time, joint models are better. On the other hand, the best model can be chosen by model selection procedures. For preference between single model and joint model, we can fit two single models for each response with spatial component and then obtain the sum of DIS and LS of single models and compare with DIC and LS of joint model. In this study, a small number of datasets in the simulation study were selected at random, and in each of these datasets, we saw a large reduction in DIC and LS for joint model compared to separate models.
Finally, Bayesian spatial joint model, including the BYM and BYM2 models, and type0 and type1 models were applied on male breast cancer data in 429 counties in Iran. Regarding the advantages of the BYM2 model compared to the BYM model, maps were displayed based on the BYM2 model. The maps show that most areas with a high incidence rate of male breast cancer are in the central and eastern parts of Iran. The map of joint model shows that West Azerbaijan province (located in the northwest of Iran) and Khorasan Razavi province (located in the east of Iran) had the lowest (0.73) and highest (1.73) posterior means of the relative risk in 2014, respectively.
According to the Natural Center for Climatology of Iran, northwest provinces of Iran are mountainous and have cool weather. However, most areas in east and central Iran have warm and dry weather, and people are exposed to direct sunlight. Many people in these counties have outdoor occupations (such as working on farms), have special diets (such as eating spicy food) and consume unique opiate (Naswar). Unfortunately, there is no study with a focus on spatial modeling that identifies the geographic pattern of male breast cancer in Iran. Hence, this paper was designed to fill this gap. However, due to the lack of an appropriate registration method, we performed a simple model without covariates and environmental factors. We hope that in the near future, the quality of cancer data registration improves, making it possible to investigate environmental causes by statistical analysis.
Further researches, in order to expand models to other fields, can investigate the BYM2 behavior in spatiotemporal models to zero-inflated data by considering other types of zeros and/or other distributions instead of the Poisson distribution.

Conclusions
This study showed that the joint model for mapping of zero-inflated data in combination with the spatial models performs well, without convergence problem, by INLA method. According to selection criteria there are slightly difference between the BYM and BYM2 models. Therefore, the spatial joint model in combination with the BYM2 model using INLA can be used for disease mapping of zero-inflated data.