Count data models were selected in this study because the dependent variable was the number of crimes, which usually follows a skewed distribution. Furthermore, this study incorporates overdispersion into a geographically weighted regression model in order to analyze the effect of overdispersion on the non-stationary modeling of crime. Based on the above-mentioned methodology, four models were developed to investigate the effect of overdispersion on crime analyses: the negative binomial model (NB), the geographically weighted Poisson regression model (GWPR), the geographically weighted negative binomial regression model with local alpha (local GWNBR), and the geographically weighted negative binomial regression model with global alpha (global GWNBR).
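To illustrate why overdispersion matters for count data, the following minimal sketch compares the sample mean and variance of a count variable and derives a method-of-moments estimate of α. It assumes the common NB2 form Var(y) = μ + αμ², which is an assumption made here for illustration; the function name and the toy counts are hypothetical and are not taken from the study's data.

```python
import numpy as np

def overdispersion_summary(counts):
    """Compare mean and variance of counts and estimate alpha by moments.

    Assumes the NB2 form Var(y) = mu + alpha * mu**2 (an illustrative
    assumption, not a detail confirmed by the text).
    """
    y = np.asarray(counts, dtype=float)
    mean = y.mean()
    var = y.var(ddof=1)  # sample variance
    # Method-of-moments alpha, clipped at 0 when variance <= mean.
    alpha = max((var - mean) / mean ** 2, 0.0)
    return {"mean": mean, "variance": var, "ratio": var / mean, "alpha_mom": alpha}

# Toy counts (hypothetical), not data from the study.
summary = overdispersion_summary([0, 1, 1, 2, 3, 5, 8, 20])
```

A variance-to-mean ratio well above 1 suggests that a Poisson model, which forces the variance to equal the mean, may be too restrictive.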
The above-mentioned models were calibrated using SAS® software macros developed by Silva and Rodrigues [33]. The optimum bandwidths for GWPR and global GWNBR were obtained by minimizing the AICc. Since it was impossible to estimate the AICc for local GWNBR, cross-validation (CV) was chosen to determine the optimum bandwidth.
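The idea of choosing a bandwidth by minimizing a criterion can be sketched as follows. This is a simplified stand-in, not the procedure of the SAS macros: it uses a Gaussian kernel and a kernel-weighted local mean instead of a local count regression, and it scores each candidate bandwidth by leave-one-out CV. All function names, the kernel choice, and the toy data are assumptions made for illustration.

```python
import numpy as np

def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights of all locations relative to location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def cv_score(coords, y, bandwidth):
    """Leave-one-out squared error of a kernel-weighted local mean."""
    sse = 0.0
    for i in range(len(y)):
        w = gaussian_weights(coords, i, bandwidth)
        w[i] = 0.0  # exclude the focal observation
        pred = np.sum(w * y) / np.sum(w)
        sse += (y[i] - pred) ** 2
    return sse

def select_bandwidth(coords, y, candidates):
    """Return the candidate bandwidth with the lowest CV score, plus all scores."""
    scores = [cv_score(coords, y, b) for b in candidates]
    return candidates[int(np.argmin(scores))], scores

# Toy example (hypothetical): 20 locations on a line with a linear trend.
coords = np.column_stack([np.arange(20.0), np.zeros(20)])
y = np.arange(20.0)
best, scores = select_bandwidth(coords, y, [0.5, 1.0, 5.0, 50.0])
```

With a strong spatial trend, a very large bandwidth approaches a global mean and scores poorly, so a small bandwidth is selected; the same search logic applies when the score is an AICc instead of a CV error.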
3.1. Model Performance Comparison
Three criteria were adopted to compare the performance of the aforementioned four models: root mean squared error (RMSE), log-likelihood (LL), and corrected Akaike information criterion (AICc). The lower the RMSE and AICc of a model, the better its performance; models with higher LL values are preferred. The results are shown in Table 3. The NB model had the highest RMSE, followed by the global GWNBR, local GWNBR, and GWPR models, so the three spatial models outperformed the non-spatial model on this criterion. Among the three spatial models, one possible reason why the GWPR outperformed the two GWNBR models, with lower RMSE and higher LL, is that it had the smallest bandwidth. With regard to the AICc, the GWPR had the worst adjustment, followed by the NB and global GWNBR models. A possible reason is that the two latter models incorporate overdispersion.
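For reference, the three criteria can be computed as in the following minimal sketch. It assumes the standard small-sample correction AICc = -2LL + 2k + 2k(k + 1)/(n - k - 1) and a Poisson log-likelihood; geographically weighted models typically use an effective number of parameters for k and may use a different AICc form, so this is an illustration rather than the formula used by the calibration software. The function names are hypothetical.

```python
import numpy as np
from math import lgamma

def rmse(y, mu):
    """Root mean squared error between observed counts and fitted means."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    return float(np.sqrt(np.mean((y - mu) ** 2)))

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    log_fact = np.array([lgamma(v + 1.0) for v in y])
    return float(np.sum(y * np.log(mu) - mu - log_fact))

def aicc(loglik, k, n):
    """Small-sample corrected AIC; k is the (effective) number of parameters."""
    aic = -2.0 * loglik + 2.0 * k
    return aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)
```

Because the penalty term grows with k, a model with a small bandwidth (and therefore many effective parameters) can have a low RMSE and still a poor AICc.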
Table 4 presents the Moran’s I statistics and the corresponding p-values for the four models’ residuals. First, the Moran’s I value decreased considerably after incorporating spatial effects and overdispersion. Second, the spatial dependency became insignificant in the two GWNBR models, which indicates that the spatial autocorrelation in the models’ residuals can be effectively accounted for by overdispersion and spatial heterogeneity.
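A residual check of this kind can be sketched as follows. The code computes the global Moran's I for a vector of residuals and a spatial weights matrix, with a permutation-based pseudo p-value. The weights matrix, the number of permutations, and the two-sided test construction are assumptions made for illustration; the text does not state how the weights or the p-values were obtained.

```python
import numpy as np

def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z mean-centered."""
    z = np.asarray(values, float)
    z = z - z.mean()
    w = np.asarray(w, float)
    return float(len(z) / w.sum() * (z @ w @ z) / (z @ z))

def permutation_p_value(values, w, n_perm=999, seed=0):
    """Two-sided pseudo p-value from random permutations of the values."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    observed = morans_i(values, w)
    sims = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    center = sims.mean()
    extreme = np.sum(np.abs(sims - center) >= abs(observed - center))
    return observed, (extreme + 1) / (n_perm + 1)

# Toy example (hypothetical): four areas in a chain, neighbors share a border.
w_chain = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
i_alternating = morans_i([1, -1, 1, -1], w_chain)  # neighbors dissimilar
i_clustered = morans_i([1, 1, -1, -1], w_chain)    # neighbors similar
```

Values near zero with a large p-value indicate that little spatial structure remains in the residuals.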
Combining Table 3 and Table 4, we can assess the relationship between model fit and spatial autocorrelation in the model residuals. The two GWNBR models yielded insignificant Moran’s I statistics with a moderate RMSE, which was lower than that of NB. While the GWPR had the lowest RMSE, it did not remove the spatial dependency in the residuals. This indicates that spatial effects, especially spatial dependency, may not be directly related to the predictive ability of a model. Strong predictive power does not guarantee that a model is spatially unbiased, and a spatial model that produces spatially unbiased estimates may do so at the expense of predictive power.
3.2. Parameter Estimation
The coefficient estimates are presented in Table 5. The means of the coefficients in the global model (NB) are provided, together with descriptive statistics of the coefficients estimated by the local models (GWPR, global GWNBR, and local GWNBR): the minimum, lower quartile, median, upper quartile, and maximum.
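The five-number summary used for the local coefficients can be reproduced with a few lines of code. The sketch below is illustrative only: the function name and the toy coefficient values are hypothetical, and the quartiles use NumPy's default linear interpolation, which may differ slightly from the convention used by other software.

```python
import numpy as np

def summarize_local_coefficients(beta):
    """Five-number summary of one variable's local coefficients, plus sign count."""
    b = np.asarray(beta, float)
    q1, med, q3 = np.percentile(b, [25, 50, 75])
    return {
        "min": float(b.min()),
        "lower_quartile": float(q1),
        "median": float(med),
        "upper_quartile": float(q3),
        "max": float(b.max()),
        "n_negative": int(np.sum(b < 0)),  # areas with a negative local estimate
    }

# Toy local coefficients (hypothetical), one value per area.
stats = summarize_local_coefficients([-1.0, 0.0, 1.0, 2.0, 3.0])
```

Counting negative local estimates alongside the quartiles makes sign changes across the study area easy to spot.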
The coefficients of the GWPR, local GWNBR, and global GWNBR models vary spatially, while the parameters of the NB model are constant across the study area. With regard to the sign of the coefficients’ mean values, only one variable, Over60 (percentage of people over 60 years of age), has a negative impact on residential burglary in the NB, local GWNBR, and global GWNBR models, whereas three variables have a negative impact in GWPR.
With regard to the magnitude of the coefficients, the parameters estimated by the local GWNBR and global GWNBR models were closer to those of NB than the parameters estimated by GWPR were. The range of coefficient variation was considerably wider for the GWPR model than for the local GWNBR and global GWNBR models, which may be partly explained by the fact that the GWPR model did not take into account the overdispersion of the data.
Several local parameters vary from negative to positive in the local models, which is counterintuitive. For example, the floating population has been reported to have a significantly positive impact on residential burglaries in previous studies [25], which means that PSMAs with smaller floating populations were safer. Nevertheless, the coefficients of the floating population are negative in some PSMAs in this research. This counterintuitive sign problem is common in local models such as GWR and GWPR [24]. One possible reason for this problem is multicollinearity among the explanatory variables. In order to quantify the extent of multicollinearity, a bivariate correlation test was conducted, and the results are presented in Table 2. The maximum correlation coefficient was 0.667, between floating population and renters, which implied that there were no highly correlated explanatory variables in the models.
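A bivariate correlation screen of this kind can be sketched as follows. The code computes the Pearson correlation matrix of the explanatory variables and reports the pair with the largest absolute correlation. The function name, the variable names, and the toy values are hypothetical and do not reproduce Table 2.

```python
import numpy as np

def max_pairwise_correlation(X, names):
    """Return the variable pair with the largest absolute Pearson correlation."""
    r = np.corrcoef(np.asarray(X, float), rowvar=False)  # columns are variables
    upper = np.triu_indices_from(r, k=1)                 # each pair once
    idx = int(np.argmax(np.abs(r[upper])))
    i, j = upper[0][idx], upper[1][idx]
    return (names[i], names[j]), float(r[i, j])

# Toy design matrix (hypothetical values): rows are areas, columns are variables.
X = np.array([[1.0, 1.0,  1.0],
              [2.0, 2.0, -1.0],
              [3.0, 3.0,  1.0],
              [4.0, 5.0, -1.0]])
pair, r_max = max_pairwise_correlation(X, ["a", "b", "c"])
```

Pairwise correlations only detect dependence between two variables at a time, so the result is a screening aid rather than a complete collinearity diagnosis.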
On the other hand, overdispersion in the data may be an important explanation for the unexpected parameter signs, as previous researchers reported [32]. For instance, bus stop density was reported to have a positive impact on residential burglaries [50], as well as in our local GWNBR and global GWNBR models, while the same coefficient estimated by GWPR varied from negative to positive. Ignoring overdispersion in GWPR may be the reason for this phenomenon.
3.3. Spatial Analyses of the Coefficients
The spatial distributions of all coefficients estimated by the above local models are presented in Figure 2, Figure 3 and Figure 4, respectively, and the corresponding spatial patterns were then investigated.
Several spatial patterns should be noted here. First, given that GWPR was the model with the smallest bandwidth, the coefficients of local GWNBR and global GWNBR were smoother than those of GWPR. Second, it seems that the magnitudes of the local coefficients estimated in local GWNBR and global GWNBR shrank towards the range of coefficients of the same variable in the GWPR.
The spatial distribution of the overdispersion parameter for the local GWNBR model is presented in Figure 5. The lower values of α are located in the downtown areas, and the values increase from the urban areas to the suburbs. The overdispersion parameters are significant at the 90% level in more than eighty percent of PSMAs, which indicates the necessity of using the local GWNBR model.
Given that the two GWNBR models are similar and outperform the NB and GWPR models, we selected the global GWNBR model to interpret our results. The developed model can also be justified by a sound interpretation of the parameter estimates.
The house area was adopted as a measure of attractiveness for offenders in this research. A higher frequency of large houses resulted in more targets for criminals to choose from. The house area was identified as a significant positive factor in residential burglaries in previous studies [24]. The coefficient signs of the house area were positive in most PSMAs, which indicated that an increase in large houses increased the residential burglary frequency. There were only 9 PSMAs with negative signs in GWPR, followed by 4 in the local GWNBR and 0 in the global GWNBR. The west of the city is an economic and technological development zone, where the house area has the greatest impact on crime. However, there is a trade-off, as larger houses may have better security and be harder to burgle. According to rational choice theory, burglars may give up stealing from big houses because of the risk of being arrested. Additional variables should be added in future research to capture these variations.
The number of renters was positively related to the number of residential burglaries in the NB model, which suggested that more renters in a PSMA could result in more residential burglaries. The coefficients of the three local models were positive except in a few PSMAs. Renters have been reported as an important crime risk factor in previous studies because of their high mobility [25]. According to social disorganization theory, an increase in resident mobility leads to more crimes [8]. This may be explained by the fact that house owners are more concerned about the security of the community than renters. When there is a potential security risk, house owners are more likely to try to solve the problem, while renters often move away instead.
Elderly people are well known in the crime literature as an important source of informal guardianship [53], which means that an area with more people over the age of 60 is expected to have fewer residential burglaries. In this research, Over60 was found to be negatively associated with residential burglaries in all but 12 PSMAs. After checking the local t-statistics, we found that none of the 12 were significant at the 95% confidence level. As shown in Figure 2, Figure 3 and Figure 4, from a spatial perspective, the impact of Over60 on residential burglaries was greater in the suburbs than in the urban areas. This may be due to the difference between the physical features of urban and rural areas. In the city center, people live in high-rise buildings and are removed from monitoring activities, which reduces natural surveillance.
Bus stop density is positively related to the residential burglary frequency in the global GWNBR, as in the NB model, suggesting that more bus stops in a PSMA could lead to more residential burglaries. There is no consensus on the impact of accessibility on burglary. Some studies indicated that accessibility was negatively associated with burglary [54], while others found that areas with better accessibility had more burglaries [56], which is consistent with this study. As shown in Figure 2, Figure 3 and Figure 4, the bus stop density had a greater impact in the suburbs. Public transit is the major travel mode in China, including for offenders. There are many options for public transportation in urban areas, such as subway, bus, taxi, tramcar, and shared bicycle, while buses are almost the only means of public transport in the suburbs. Routine activity theory claims that “illegal activities feed upon the legal activities of everyday life”. Because public transport is so widely used, bus stops are important nodes of daily activities. Therefore, it is not surprising that bus stops have a positive impact on residential burglary.
Floating populations are a special group in the process of social development in China. Previous studies found that floating populations were positively related to crime [47]. The coefficient of the floating population was negative in 7 out of 215 PSMAs. The local t-statistics indicated that the negative parameters in these 7 PSMAs were not significant. According to social disorganization theory, informal social control helps to prevent crime, while excessive residential mobility is not conducive to informal social regulation. A high proportion of floating populations would therefore lead to more crimes, which has been confirmed in this study.
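The check of local t-statistics for counterintuitive signs can be sketched as follows. The code flags areas whose local coefficient is negative and tests whether those estimates are significant. It assumes a two-sided normal critical value of 1.96 for the 95% level and applies no multiple-testing adjustment; both are simplifying assumptions, and the function name and toy values are hypothetical.

```python
import numpy as np

def negative_and_significant(beta, se, critical=1.96):
    """Count negative local coefficients and how many of them are significant."""
    beta, se = np.asarray(beta, float), np.asarray(se, float)
    t = beta / se                       # local t-statistics
    negative = beta < 0
    significant = np.abs(t) > critical  # assumed 95% two-sided threshold
    return {
        "n_negative": int(negative.sum()),
        "n_negative_significant": int((negative & significant).sum()),
        "t": t,
    }

# Toy local estimates and standard errors (hypothetical), one per area.
check = negative_and_significant([0.8, -0.1, 0.5, -0.2], [0.2, 0.3, 0.1, 0.25])
```

When the count of negative and significant estimates is zero, the negative signs cannot be distinguished from sampling noise at the chosen level.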