Integrative Analysis of Spatial Heterogeneity and Overdispersion of Crime with a Geographically Weighted Negative Binomial Model

Abstract: The negative binomial (NB) regression model has been used to analyze crime in previous studies. Its disadvantage is that it cannot deal with spatial effects. Therefore, spatial regression models, such as the geographically weighted Poisson regression (GWPR) model, were introduced to address spatial heterogeneity in crime analysis. However, GWPR cannot account for overdispersion, which is commonly observed in crime data. In this study, the geographically weighted negative binomial regression model (GWNBR) was adopted to address spatial heterogeneity and overdispersion simultaneously in crime analysis, based on a 3-year data set collected from ZG city, China. The count of residential burglaries was used as the dependent variable to calibrate the above models, and the results revealed that the GWPR and GWNBR models performed better than NB in reducing spatial dependency in the model residuals. GWNBR outperformed GWPR by incorporating overdispersion. Therefore, GWNBR proved to be a promising tool for crime modeling.


Introduction
Due to disparities in built environments and socio-demographic factors, the uneven distribution of crime across different neighborhoods has long been confirmed by many studies [1][2][3][4]. Previous researchers have found that the vast majority of crimes are committed in a few specific locations [5]. Among the many theories of crime geography, routine activity theory and crime pattern theory are usually employed to explain the spatial agglomeration of criminal activity.
Routine activity theory, proposed by Cohen and Felson, is a theoretical framework commonly used in crime analysis [6]. It states that the convergence of a motivated offender, a vulnerable victim, and a crime-prone place will lead to criminal offenses. Routine activity theory suggests that certain places, such as bars, schools, and gas stations, are more likely to be sites of criminal victimization. Crime pattern theory, proposed by Brantingham and Brantingham [7], argues that crime is not randomly distributed over space and time but follows specific patterns, with places where offenders and victims intersect being more vulnerable to crime. These two theories effectively explain why crime is spatially concentrated and forms 'hot spots' of criminal offenses.
Social disorganization theory has been widely used to explore the relationship between crime and related neighborhood characteristics [8]. One of the premises of social disorganization theory is that the crime rate of disadvantaged communities is higher than that of others, which has been supported

Study Area
With the development of the economy, the crime situation in China is becoming increasingly serious. Residential burglary is the most common crime type in China, with more than 1 million burglaries each year [34]. This research was carried out in ZG, which is the largest city in the southeast of China. It is one of the most crowded cities in China, with a population of about 14.9 million in 2018, ranking 3rd in the whole country. Since the reform and opening-up policy changes in the 1980s, the economy of ZG city has developed rapidly due to the advantages of its geographical location. In 2018, the per capita GDP (Gross Domestic Product) of ZG was 22,167 US dollars, and it has become one of the five richest cities in China.

Data
The data used in this research were collected from ZG, China. The crime data were provided by the ZG Municipal Public Security Bureau. There were more than 150,000 residential burglary records during the period of 2013-2015. The demographic and socioeconomic data were obtained from the ZG Statistical Yearbook published by the ZG Statistics Bureau. The number of bus stops was collected from the geographical information system of ZG city.
Different zone systems have been used in crime analysis, such as states, counties, cities, block groups, and census tracts. In China, the police station is the most basic (grass-roots) law enforcement agency, and all policing policies are carried out through police stations. The whole city is divided into police station management areas (PSMAs) according to the location of each police station. Compared with other units, PSMAs can be easily integrated with the safety planning process. Therefore, the PSMA was selected as the spatial unit in this study, and all data were aggregated at the PSMA level. There were 215 PSMAs in ZG city. The number of residential burglaries in each PSMA varied from 9 to 3547, as shown in Figure 1.
Following previous studies [24,25,35,36,37], the variables selected for this research are shown in Table 1, together with their descriptive statistics. The number of burglaries was adopted as the dependent variable in this study. The number of household units was selected as the exposure variable. The explanatory variables, which were commonly used in previous crime research, were selected according to the literature [38][39][40][41]. To guard against multicollinearity between the explanatory variables, which could distort the model results, a bivariate correlation test was conducted before further analyses. The results are shown in Table 2. All the correlation coefficients were less than 0.7, which indicated that there was no strong correlation between explanatory variables.
Note: ** indicates significance at the 0.01 level and * indicates significance at the 0.05 level.
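The correlation screening described above can be sketched in a few lines. This is a minimal illustration, assuming Pearson correlation and the 0.7 cut-off mentioned in the text; the variable names and toy values are hypothetical and are not taken from Table 1 or Table 2.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(columns.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical toy values for three explanatory variables in five areas.
columns = {
    "renters": [10.0, 20.0, 30.0, 40.0, 50.0],
    "floating_pop": [12.0, 18.0, 33.0, 41.0, 48.0],
    "over60": [5.0, 3.0, 6.0, 2.0, 4.0],
}
flagged = flag_collinear_pairs(columns)
```

In this toy example only the renters and floating population pair exceeds the threshold; in the study itself no pair did.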

Methodology
Although there are different methodologies to deal with spatial heterogeneity, geographically weighted regression (GWR) has been widely used for its convenience. The methodology adopted in this research was based on GWR. In order to compare model performance, four models were developed in this study and are described briefly in this section.

Negative Binomial Model (NB)
The normal distribution of the dependent variable is one of the basic assumptions of traditional linear regression models; however, this assumption is usually not met in practice. For example, when the number of crimes is adopted as the dependent variable, it no longer follows a normal distribution but rather a Poisson or negative binomial distribution [42,43]. Generalized linear models, such as the Poisson regression model, are then employed instead. The Poisson regression model is usually employed when the dependent variable is count data. However, the Poisson model assumes that the mean is equal to the variance, which is often violated in crime data. Therefore, the negative binomial model is often used instead to account for overdispersion. The NB model can be written as:
$$y_i \sim \mathrm{NB}\left[t_i \exp\left(\sum_{k=0}^{p} \beta_k x_{ik}\right), \alpha\right]$$
where NB stands for negative binomial, $y_i$ is the number of residential burglaries in the $i$th ($i = 1, \ldots, n$) PSMA, $x_{ik}$ is the $k$th explanatory variable for PSMA $i$ (with $x_{i0} = 1$, so that $\beta_0$ is the intercept), $\beta_k$ ($k = 0, 1, \ldots, p$) are the coefficients, $t_i$ is the number of household units in PSMA $i$, which is the offset variable, and $\alpha$ is the overdispersion parameter.
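The mean-variance argument above can be illustrated with a crude check on raw counts. The sketch below assumes the common NB2 parametrization, in which the variance is μ + αμ²; the moment-based estimate of α ignores covariates and the offset, and the counts are hypothetical values chosen only to span the range reported for the PSMAs.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 under a Poisson model."""
    return variance(counts) / mean(counts)


def nb2_variance(mu, alpha):
    """Variance implied by the NB2 parametrization: mu + alpha * mu^2."""
    return mu + alpha * mu ** 2


def moment_alpha(counts):
    """Crude method-of-moments alpha, ignoring covariates and exposure."""
    m = mean(counts)
    v = variance(counts)
    return max((v - m) / (m * m), 0.0)


# Hypothetical burglary counts for six areas (not the study data).
counts = [9, 45, 60, 120, 800, 3547]
ratio = dispersion_ratio(counts)
alpha_mm = moment_alpha(counts)
```

A ratio far above 1, as in this toy example, is the kind of evidence that motivates replacing the Poisson model with a negative binomial one.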

Geographically Weighted Poisson Model (GWPR)
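The Poisson regression model is extended to geographically weighted Poisson regression (GWPR) when the geographical coordinates of the observations are incorporated into the modeling process. GWPR is an extension of GWR in the context of generalized linear models for count-valued dependent variables. The framework of GWPR is described as follows:
$$y_i \sim \mathrm{Poisson}\left[t_i \exp\left(\sum_{k} \beta_k(u_i, v_i)\, x_{ik}\right)\right]$$
where $(u_i, v_i)$ are the geographical coordinates of the centroid of PSMA $i$, and $\beta_k(u_i, v_i)$ is the local coefficient, a function of the centroid of PSMA $i$, which, in the weighted least squares form underlying GWR, could be calculated by:
$$\hat{\beta}(u_i, v_i) = \left(X^{T} W(u_i, v_i) X\right)^{-1} X^{T} W(u_i, v_i) Y$$
where $\hat{\beta}(u_i, v_i)$ is the vector of the local parameters in PSMA $i$, $X$ is the matrix of explanatory variables, $Y$ is the vector of the dependent variable, and $W(u_i, v_i)$ is the spatial weight matrix, which can be presented as:
$$W(u_i, v_i) = \mathrm{diag}\left(w_{i1}, w_{i2}, \ldots, w_{in}\right)$$
where $w_{ij}$ is the weight given to PSMA $j$ during the calibrating procedure for PSMA $i$.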
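To make the role of the weights concrete, consider the intercept-only special case of a geographically weighted Poisson model with an offset. There the weighted likelihood has a closed-form maximum: the local rate is the weighted sum of counts divided by the weighted sum of exposures. The sketch below is only this special case, not the full GWPR calibration, and the weights, counts, and exposures are hypothetical.

```python
import math


def local_poisson_intercept(weights, counts, exposures):
    """Local intercept of an intercept-only weighted Poisson model with offset.

    Maximizing sum_j w_j * [y_j * log(t_j * exp(b0)) - t_j * exp(b0)] over b0
    gives exp(b0) = sum_j(w_j * y_j) / sum_j(w_j * t_j).
    """
    weighted_counts = sum(w * y for w, y in zip(weights, counts))
    weighted_exposure = sum(w * t for w, t in zip(weights, exposures))
    return math.log(weighted_counts / weighted_exposure)


# Hypothetical values: three neighbouring areas seen from one focal area.
weights = [1.0, 0.5, 0.0]          # w_ij: the third area lies outside the bandwidth
counts = [10, 20, 1000]            # y_j: burglaries
exposures = [100, 200, 50]         # t_j: household units
beta0_local = local_poisson_intercept(weights, counts, exposures)
```

Because the third area has zero weight, its very high count does not affect the local estimate, which is how the diagonal weight matrix localizes the fit.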

Geographically Weighted Negative Binomial Model (GWNBR)
GWPR has been employed to explore the relationship between crime and related risk factors when the response variable was the number of crimes [24]. As Xu and Huang indicated, using GWPR to model count data was only a temporary solution, which was mainly restricted by the available software, GWR4 [44]. GWR4 was developed by Nakaya et al. [45] to model spatial heterogeneity, but it did not provide calibration of GWR with a negative binomial structure. In order to overcome this disadvantage, the geographically weighted negative binomial regression model (GWNBR) should be used, which can model spatial heterogeneity and overdispersion simultaneously [33]. The GWNBR model can be described as follows:
$$y_i \sim \mathrm{NB}\left[t_i \exp\left(\sum_{k} \beta_k(u_i, v_i)\, x_{ik}\right), \alpha(u_i, v_i)\right]$$
where $t_i$ is an offset variable, which is the number of household units, $\beta_k(u_i, v_i)$ is the coefficient for the explanatory variable $x_k$, for $k = 1, \ldots, p$, $y_i$ is the number of residential burglaries in the $i$th PSMA, and $\alpha(u_i, v_i)$ is the overdispersion parameter.
A modified iteratively reweighted least squares (IRLS) method and a Newton-Raphson (NR) algorithm were used alternately to estimate $\beta_k(u_i, v_i)$ and $\alpha(u_i, v_i)$, according to Silva and Rodrigues [33].
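The α-step of this alternating scheme maximizes a geographically weighted NB log-likelihood with the fitted means held fixed. The sketch below illustrates that objective under the NB2 parametrization (an assumption), but it replaces the Newton-Raphson update with a plain grid search for simplicity, so it is not the algorithm of Silva and Rodrigues. The counts, fitted means, weights, and grid are hypothetical.

```python
import math


def nb_logpmf(y, mu, alpha):
    """Log-probability of count y under NB2 with mean mu and overdispersion alpha."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + y * math.log(alpha * mu) - (y + r) * math.log1p(alpha * mu))


def weighted_loglik(alpha, counts, mus, weights):
    """Geographically weighted log-likelihood for one focal location."""
    return sum(w * nb_logpmf(y, m, alpha)
               for y, m, w in zip(counts, mus, weights))


def grid_search_alpha(counts, mus, weights, grid):
    """Pick the alpha on the grid with the highest weighted log-likelihood."""
    return max(grid, key=lambda a: weighted_loglik(a, counts, mus, weights))


# Hypothetical inputs for one focal area and its neighbours.
counts = [2, 15, 7, 40, 1]             # observed y_j
mus = [5.0, 10.0, 8.0, 20.0, 3.0]      # current fitted means t_j * exp(x_j' beta)
weights = [1.0, 0.8, 0.5, 0.2, 0.0]    # kernel weights w_ij
grid = [0.05 * k for k in range(1, 41)]
alpha_hat = grid_search_alpha(counts, mus, weights, grid)
```

In a full calibration this step would alternate with an IRLS update of the local coefficients until both stabilize.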
The basic idea of GWR was derived from the first law of geography [46], which indicates that observations near location $i$ have more influence on the estimation of $\beta_k(u_i, v_i)$ than observations located further away. A kernel function can effectively represent the magnitude of this influence. Bi-square is one of the most frequently used kernel functions and is used in this study:
$$w_{ij} = \begin{cases} \left[1 - \left(d_{ij} / b_{i(k)}\right)^{2}\right]^{2}, & d_{ij} < b_{i(k)} \\ 0, & \text{otherwise} \end{cases}$$
where $d_{ij}$ is the distance between PSMA $i$ and PSMA $j$, and $b_{i(k)}$ is the adaptive bandwidth.
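The bi-square kernel with an adaptive bandwidth can be sketched as follows. This is a minimal illustration, assuming that the adaptive bandwidth is the distance from the focal centroid to its k-th nearest centroid and that the focal point itself (distance 0) is counted among the neighbours; the coordinates are hypothetical.

```python
import math


def bisquare_weights(distances, bandwidth):
    """Bi-square kernel: [1 - (d/b)^2]^2 if d < b, otherwise 0."""
    weights = []
    for d in distances:
        if d < bandwidth:
            weights.append((1.0 - (d / bandwidth) ** 2) ** 2)
        else:
            weights.append(0.0)
    return weights


def adaptive_bandwidth(distances, k):
    """Distance to the k-th nearest point among the supplied distances."""
    return sorted(distances)[k - 1]


# Hypothetical centroids; area 0 is the focal PSMA.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
d_i = [math.dist(centroids[0], c) for c in centroids]
b_i = adaptive_bandwidth(d_i, k=3)
w_i = bisquare_weights(d_i, b_i)
```

With an adaptive bandwidth, sparse suburban areas borrow information from a wider radius than dense downtown areas, while the number of contributing neighbours stays roughly constant.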
Bandwidth has an important influence on parameter estimation. The corrected Akaike information criterion (AICc) and cross-validation (CV) are two commonly used methods to determine the optimal bandwidth. The AICc is defined as:
$$\mathrm{AICc} = -2L(\beta, \alpha) + 2k + \frac{2k(k+1)}{n - k - 1}$$
where $L(\beta, \alpha)$ is the log-likelihood of GWNBR and $k$ is the effective number of parameters. The $k$ of GWNBR should be recorded as $k = k_1 + k_2$, where $k_1$ and $k_2$ are the effective numbers of parameters of $\beta$ and $\alpha$, respectively. Depending on whether the overdispersion parameter $\alpha$ varies over space, the GWNBR model takes one of two forms. The one with spatially varying $\alpha$ is called local GWNBR, and the other, with the same $\alpha$ across the whole research area, is called global GWNBR. The $k_2$ for the global GWNBR is 1, whereas the $k_2$ for the local GWNBR remains difficult to estimate. Therefore, the optimal bandwidth of local GWNBR should be estimated by CV:
$$\mathrm{CV} = \sum_{j=1}^{n} \left[y_j - \hat{y}_{\neq j}(b)\right]^{2}$$
where $b$ is the bandwidth and $\hat{y}_{\neq j}(b)$ is the estimate for point $j$ obtained with that point omitted from the calibration.
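The two bandwidth criteria can be written as small helper functions. This is a sketch under stated assumptions: the log-likelihood, the effective number of parameters, and the leave-one-out predictions are taken as given inputs, and `select_bandwidth` with its `score` callback is a hypothetical helper rather than part of any GWR package.

```python
def aicc(loglik, k, n):
    """Corrected AIC: -2L + 2k + 2k(k+1)/(n-k-1)."""
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def cv_score(observed, loo_predictions):
    """Sum of squared differences between counts and leave-one-out predictions."""
    return sum((y - yhat) ** 2 for y, yhat in zip(observed, loo_predictions))


def select_bandwidth(candidates, score):
    """Return the candidate bandwidth with the lowest criterion value."""
    return min(candidates, key=score)


# Hypothetical example: 215 areas, 5 effective parameters, log-likelihood -100.
example_aicc = aicc(-100.0, 5, 215)
example_cv = cv_score([1, 2, 3], [1, 1, 5])
# Toy criterion with its minimum near b = 18, standing in for a model refit per b.
best_b = select_bandwidth([10, 20, 30], lambda b: (b - 18) ** 2)
```

In practice the `score` callback would recalibrate the model at each candidate bandwidth, which is what makes bandwidth selection the most expensive part of the procedure.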
The root mean squared error (RMSE) is another criterion to evaluate the performance of models, which can be presented as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left(y_j - \hat{y}_j\right)^{2}}$$
where $y_j$ is the observed number of residential burglaries, $\hat{y}_j$ is the predicted number of residential burglaries, and $n$ is the number of PSMAs.
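For completeness, the RMSE can be computed as below; the observed and predicted values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted counts."""
    n = len(observed)
    squared_errors = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


# Hypothetical counts: every prediction is off by exactly 2 burglaries.
example_rmse = rmse([10, 20, 30, 40], [12, 18, 32, 38])
```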

Results and Discussion
Count data models were selected in this study since the dependent variable was the number of crimes, which usually follows a skewed distribution. Furthermore, this study incorporates overdispersion into a geographically weighted regression model in order to analyze the effect of overdispersion on the non-stationary modeling of crime. Based on the above-mentioned methodology, four models were developed to investigate the effect of overdispersion on crime analyses: the negative binomial model (NB), the geographically weighted Poisson regression model (GWPR), the geographically weighted negative binomial regression model with local alpha (local GWNBR), and the geographically weighted negative binomial regression model with global alpha (global GWNBR).
The above-mentioned models were calibrated using SAS® software macros developed by Silva and Rodrigues [33]. The optimum bandwidths for GWPR and global GWNBR were obtained by minimizing the AICc. Since it was not possible to estimate the AICc for local GWNBR, CV was chosen to determine its optimum bandwidth.

Model Performance Comparison
Three criteria were adopted to compare the performance of the aforementioned four models: root mean squared error (RMSE), log-likelihood (LL), and corrected Akaike information criterion (AICc). The lower the RMSE and AICc of a model, the better its performance, and models with higher LL values are preferred. The results are shown in Table 3. The NB model had the highest RMSE, followed by the global GWNBR, local GWNBR, and GWPR models. The three spatial models therefore outperform the non-spatial model on this criterion. Among the three spatial models, one possible reason why the GWPR outperforms the two GWNBR models, with lower RMSE and higher LL, is that it had the smallest bandwidth. With regard to the AICc, the GWPR had the worst fit, followed by NB and the global GWNBR model. A possible reason is that the two latter models incorporate overdispersion.
Table 4 presents the Moran's I statistics and the corresponding p-values for the four models' residuals. First, the Moran's I value decreased considerably after incorporating spatial effects and overdispersion. Second, the spatial dependency becomes insignificant in the two GWNBR models, which indicates that the spatial autocorrelation in the models' residuals can be effectively explained by overdispersion and spatial heterogeneity. Combining Tables 3 and 4, we can assess the relationship between model fit and spatial autocorrelation in the model residuals. The two GWNBR models yielded insignificant Moran's I statistics with a moderate RMSE, which was lower than that of NB. While the GWPR had the lowest RMSE, it could not resolve the spatial dependency efficiently. This indicates that spatial effects, especially spatial dependency, may not be directly related to the predictive ability of a model. A model with strong predictive power is not guaranteed to be spatially unbiased, and a spatial model that produces spatially unbiased estimates may do so at the expense of its predictive power.
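The residual diagnostic reported in Table 4 relies on Moran's I. A minimal sketch of the statistic is given below, assuming a binary contiguity weight matrix; the four-area chain and the residual values are hypothetical, and the p-value (usually obtained by permutation or a normal approximation) is not computed here.

```python
def morans_i(values, weights):
    """Moran's I for values observed on areas linked by a spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Hypothetical chain of four areas, each adjacent to its immediate neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, -1.0, -1.0], chain)     # similar residuals adjoin
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)   # opposite residuals adjoin
```

Clustered residuals give a positive value and alternating residuals a negative one; a value near its expectation under spatial randomness is what the two GWNBR models achieved.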

Parameters Estimation
The coefficient estimates are presented in Table 5. The coefficients of the non-spatial model (NB) are provided, together with descriptive statistics of the coefficients estimated by the local models (GWPR, global GWNBR, and local GWNBR), including the minimum, lower quartile, median, upper quartile, and maximum values.
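The descriptive statistics listed for the local models amount to a five-number summary of each coefficient surface. The sketch below assumes the default quartile method of Python's `statistics.quantiles`, which may differ slightly from the method behind Table 5; the local coefficient values are hypothetical.

```python
from statistics import quantiles


def summarize_local_coefficients(values):
    """Five-number summary plus the number of areas with a negative coefficient."""
    lower_q, median, upper_q = quantiles(values, n=4)
    return {
        "min": min(values),
        "lower_quartile": lower_q,
        "median": median,
        "upper_quartile": upper_q,
        "max": max(values),
        "n_negative": sum(1 for v in values if v < 0),
    }


# Hypothetical local coefficients of one variable across seven areas.
local_betas = [-1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
summary = summarize_local_coefficients(local_betas)
```

Counting negative local coefficients alongside the quartiles makes sign changes across the study area easy to spot.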
The coefficients of the GWPR, local GWNBR, and global GWNBR models vary spatially, while the parameters of the NB model are constant across the study area. With regard to the sign of the coefficients' mean value, only one variable, Over60 (percentage of people over 60 years of age), has a negative impact on residential burglary in the NB model as well as in the local GWNBR and global GWNBR models, whereas three variables have a negative impact in GWPR.
With regard to the magnitude of coefficients, the parameters estimated by local GWNBR and global GWNBR models were closer to NB than GWPR. The range of coefficient variation was considerably wider for the GWPR model than for the local GWNBR and global GWNBR models, which may be partly explained by the fact that the GWPR model did not take into account the overdispersion of the data.
Several local parameters vary from negative to positive in the local models, which is counterintuitive. For example, the floating population has been reported to have a significantly positive impact on residential burglaries in previous studies [25,47,48], which means that PSMAs with smaller floating populations were safer. Nevertheless, the coefficients of the floating population in some PSMAs are negative in this research. The counterintuitive sign problem is very common in modeling with local models, such as GWR and GWPR [24,44,49]. One possible reason for this problem is multicollinearity among the explanatory variables. In order to quantify the extent of multicollinearity, a bivariate correlation test was conducted, and the results are presented in Table 2. The maximum correlation coefficient was 0.667, between the floating population and renters, which implied that there were no highly correlated explanatory variables in the models.
On the other hand, overdispersion in the data may be an important explanation for the unexpected parameter signs, as previous researchers reported [32,44]. For instance, bus stop density has been shown to have a positive impact on residential burglaries [50][51][52], as it does in our local GWNBR and global GWNBR models, while the same coefficient estimated by GWPR varied from negative to positive. Not considering overdispersion in GWPR may be the reason for this phenomenon.

Spatial Analyses of the Coefficients
The spatial distributions of all coefficients estimated by the above local models are presented in Figures 2-4, respectively, and the corresponding spatial patterns were investigated subsequently.
There are several spatial patterns that should be noted here. First, given that GWPR was the model with the smallest bandwidth, the coefficient surfaces of local GWNBR and global GWNBR were smoother than those of GWPR. Second, it seems that the magnitudes of the local coefficients estimated in local GWNBR and global GWNBR shrank relative to the range of the coefficients of the same variable in the GWPR.
The spatial distribution of the overdispersion parameter for the local GWNBR model is presented in Figure 5. It can be found that the lower values of α are located in the downtown areas, and these values increased from the urban areas to the suburbs. The overdispersion parameters are significant at a 90% level in more than eighty percent of PSMAs, which indicates the necessity of using the local GWNBR model. Given the fact that the two GWNBR models are similar, and outperform the NB and GWPR model, we selected the global GWNBR model to interpret our results. The developed model can also be effectively justified by a good interpretation of the parameter estimation.
The house area was adopted as a measure of target attractiveness for offenders in this research. A higher frequency of large houses resulted in more targets for criminals to choose from. The house area was identified as a significant positive factor in residential burglaries in previous studies [24]. The coefficient signs of the house area in most PSMAs were positive, which indicated that an increase in large houses increased the residential burglary frequency. There were only 9 PSMAs with negative signs in GWPR, followed by 4 in the local GWNBR and 0 in the global GWNBR. The west of the city is an economic and technological development zone, where the house area has the greatest impact on crime. However, there is a trade-off, as larger houses may have better security and be harder to burgle. According to rational choice theory, burglars may give up stealing from big houses because of the risk of being arrested. Additional variables should be added in future research to capture these variations.
The number of renters was positively related to the number of residential burglaries in the NB model, which suggested that more renters in a PSMA could result in more residential burglaries. The coefficients of the three local models were positive except for a few PSMAs. Renters have been reported as an important risk factor related to crime in previous studies due to high mobility [25]. According to the social disorganization theory, the increase of resident mobility will lead to more crimes [8]. This may be explained by the fact that house owners were more concerned about the security of the community than renters. When there was a potential security risk, house owners were more likely to try to solve the problem, while renters often moved away instead.
Elderly people are well known as an important source of informal guardianship in the crime literature [53], which means that an area with more people over the age of 60 is expected to have fewer residential burglaries. In this research, Over60 was found to be negatively associated with residential burglaries in all but 12 PSMAs. After checking the local t-statistics, we found that none of these 12 coefficients were significant at the 95% confidence level. As shown in Figures 2-4, from a spatial perspective, the impact of Over60 on residential burglaries was greater in the suburbs than in the urban areas. This may be due to the difference between the physical features of urban and rural areas. In the city center, people live in high-rise buildings that are cut off from monitoring activities, which reduces natural surveillance.
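The check on local t-statistics described above can be sketched as follows. This is an illustration only: it assumes a two-sided normal critical value of 1.96 for the 95% level, whereas local models are sometimes assessed with adjusted critical values, and the coefficients and standard errors are hypothetical.

```python
def significant_flags(coefs, std_errors, critical=1.96):
    """True where the local t-statistic |coef / se| exceeds the critical value."""
    return [abs(b / se) > critical for b, se in zip(coefs, std_errors)]


def unexpected_and_significant(coefs, std_errors, expected_sign, critical=1.96):
    """Indices of areas whose coefficient has the unexpected sign and is significant."""
    flags = significant_flags(coefs, std_errors, critical)
    return [i for i, (b, sig) in enumerate(zip(coefs, flags))
            if b * expected_sign < 0 and sig]


# Hypothetical local coefficients for a variable expected to be negative.
coefs = [-0.4, 0.1, -0.9, 0.05]
std_errors = [0.1, 0.2, 0.3, 0.1]
flags = significant_flags(coefs, std_errors)
problem_areas = unexpected_and_significant(coefs, std_errors, expected_sign=-1)
```

An empty result, as in this toy example, corresponds to the situation reported here: the coefficients with the unexpected sign are not statistically distinguishable from zero.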
Bus stop density is positively related to the residential burglary frequency in global GWNBR, as in the NB model, suggesting that more bus stops in a PSMA could lead to more residential burglaries. There is no consensus on the impact of accessibility on burglary. Some studies indicated that accessibility was negatively associated with burglary [54,55], while others found that areas with better accessibility had more burglaries [56,57], which is consistent with this study. As shown in Figures 2-4, the bus stop density had a greater impact in the suburbs. There are many options for public transportation in urban areas, such as subway, bus, taxi, tramcar, and shared bicycle, while buses are almost the only means of public transport in the suburbs. Routine activity theory claims that "illegal activities feed upon the legal activities of everyday life". Public transport is the major means of travel in China, for offenders as for everyone else, and bus stops are thus important nodes of daily activities. Therefore, it is not surprising that bus stops have a positive impact on residential burglary.
Floating populations were special groups in the process of social development in China. Previous studies found that floating populations were positively related to crime [47,48]. The coefficient of the floating population was negative in 7 out of 215 PSMAs. The investigation of the local t-statistics indicated that these 7 PSMAs with negative parameters were not significant. According to the social disorganization theory, informal social control helped to prevent crime, while excessive residential mobility was not conducive to informal social regulation. A high proportion of floating populations would lead to more crimes, which has been confirmed in this study.

Limitations
Although the results of the current study support GWNBR as a promising tool for crime analysis, this methodology is only applicable to modeling spatial count data with significant overdispersion. One limitation of this study is that only residential burglary was examined. However, overdispersion has been found in different types of crime, so this method should be applicable to other crime types. Additionally, only a single Chinese city was investigated, and there are great disparities in geographical context between cities and countries. Therefore, further studies should be conducted in different cities and countries and with multiple types of crime to justify the benefit of the proposed models. Nonetheless, previous studies have confirmed that models based on crime pattern theory and routine activity theory are generally applicable in Chinese cities. Furthermore, any research based on spatial units cannot avoid the modifiable areal unit problem (MAUP), which has also attracted the attention of criminologists [58,59]. Multi-scale analysis is considered to be an effective method to address the MAUP [60,61]. However, limited by the data, this study could not carry out a sensitivity analysis for the scale effect and zoning effect, which should be implemented in the future.

Conclusions
Models for crime analysis have been widely applied. Geographically weighted regression has been proven to be a powerful methodology for crime modeling, as it can capture spatial heterogeneity in crime data. However, many issues remain unresolved to date, one of which is overdispersion. Therefore, this study mainly focused on the possibility of integrating spatial heterogeneity and overdispersion in crime modeling. For this purpose, the geographically weighted negative binomial model (GWNBR) was introduced to accommodate spatial heterogeneity and overdispersion simultaneously. A comparison was conducted between four models, namely the negative binomial model (NB), the geographically weighted Poisson model (GWPR), the local geographically weighted negative binomial model (local GWNBR), and the global geographically weighted negative binomial model (global GWNBR), based on a case study in ZG, China.
In conclusion, the results of this study proved that incorporating overdispersion into spatial heterogeneity could improve the performance of crime modeling. Compared with local GWNBR and global GWNBR, the coefficients of GWPR are more heterogeneous, which may be due to the fact that it does not incorporate the overdispersion of crime data. Another consequence is that the bandwidth of GWPR is the smallest of the three local models, which makes its coefficient surface appear to be sharp. Although GWPR has achieved the best performance for RMSE, it could not eliminate spatial autocorrelation in the model residuals. In addition, the two GWNBR models can resolve spatial heterogeneity and spatial dependence at the same time by incorporating overdispersion.
The coefficients were estimated by the GWNBR model for each PSMA. Then, the crime prediction model could be developed for each PSMA. These crime prediction models can be used to evaluate the daily safety situation and forecast the number of crimes in the future. These models can also be used to assess the effectiveness of current policing policies or countermeasures applied in particular PSMAs.
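A per-PSMA prediction of this kind follows directly from the model mean, the exposure multiplied by the exponential of the local linear predictor. The sketch below assumes that form; the coefficients, covariates, and exposure are hypothetical values, not estimates from this study.

```python
import math


def predict_count(exposure, local_coefs, covariates):
    """Expected count: exposure * exp(sum_k beta_k(u_i, v_i) * x_ik)."""
    linear_predictor = sum(b * x for b, x in zip(local_coefs, covariates))
    return exposure * math.exp(linear_predictor)


# Hypothetical PSMA with 1000 household units, an intercept, and one covariate.
local_coefs = [math.log(0.05), 0.0]   # intercept and one slope (set to zero here)
covariates = [1.0, 3.2]               # leading 1.0 multiplies the intercept
expected_burglaries = predict_count(1000, local_coefs, covariates)
```

Because each PSMA has its own coefficient vector, the same change in a covariate can imply different changes in expected burglaries in different parts of the city.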