# Improving Victimization Risk Estimation: A Geographically Weighted Regression Approach

## Abstract



## 1. Introduction

#### 1.1. Literature on Crime Standardization and the Estimation of Victimization Risk

#### 1.2. Literature on Geographically Weighted Regression

## 2. Materials and Methods

#### 2.1. Problem Specification

- Crime counts are often only a fraction $f$ of the actual victimization $V$, and they are also subject to other types of error ${\mathit{\epsilon}}_{C}$ (e.g., missing data, geocoding errors, multiple reports of the same crime, and other less systematic forms of error affecting the relation between victimization and crime counts):$$C=fV+{\mathit{\epsilon}}_{C}$$
- Population data may not be a perfect measure of the actual pool of potential victims of the crime we are considering:$${P}^{*}-P={\mathit{\epsilon}}_{P}\ne 0$$
- Actual victimization rates may not exactly match the expected ones, but instead fluctuate around them:$$V-E\left[V\right]=\mathit{\epsilon}\ne 0$$
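The three error sources listed above can be made concrete with a small simulation. The following is a minimal sketch, not the procedure used in this study: it assumes that $P^{*}$ denotes the actual pool of potential victims and $P$ the available population data, that $E[V]$ equals a constant risk times $P^{*}$, and that the noise terms are Poisson or Gaussian. The function name `simulate_counts` and all default values are hypothetical.

```python
import numpy as np


def simulate_counts(n_cells=1000, f=0.6, risk=0.05, seed=0):
    """Draw one synthetic data set with the three error sources.

    All distributions and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # P*: assumed here to be the actual pool of potential victims per cell.
    p_star = rng.uniform(100.0, 500.0, size=n_cells)
    # eps_P: mismatch between the actual pool and the population data P.
    eps_p = rng.normal(0.0, 20.0, size=n_cells)
    p = p_star - eps_p  # so that P* - P = eps_P

    # E[V]: assumed to be a constant risk times the actual pool of targets.
    expected_v = risk * p_star
    # V fluctuates around E[V]; Poisson noise is an assumption.
    v = rng.poisson(expected_v).astype(float)
    eps = v - expected_v  # V - E[V] = eps

    # eps_C: non-systematic recording error (Gaussian, assumed).
    # Real counts are non-negative integers; this sketch ignores that.
    eps_c = rng.normal(0.0, 0.15 * f * expected_v.mean(), size=n_cells)
    c = f * v + eps_c  # C = f V + eps_C

    return {
        "P_star": p_star, "P": p, "V": v, "C": c,
        "eps_P": eps_p, "eps": eps, "eps_C": eps_c,
        "naive_risk": c / p,  # crime count divided by population data
    }


sim = simulate_counts()
print("mean naive ratio C/P:", sim["naive_risk"].mean())
```

Under these assumptions the ratio $C/P$ is centred near $f$ times the risk rather than the risk itself, and it inherits all three noise terms, which illustrates why a plain ratio of counts to population can be a poor estimate of victimization risk.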

#### 2.2. Proposed Solution

#### 2.3. Validating the Method via a Simulation Study

#### 2.4. Application: Residential Burglaries in the City of Belo Horizonte, Brazil

## 3. Results

#### 3.1. Results for the Validation Study

#### 3.1.1. Simulation Study with One Reference Population

#### 3.1.2. Simulation Study with Two Reference Populations

#### 3.2. Results for the Application Study

## 4. Discussion

## Appendix A

## Appendix B

**Table A1.** Results of the simulation study with one population, showing the fitness scores for the estimated risk calculated using three different methods and different parameters for generating the maps of true risk R and reference population P. The range parameters are given in number of map cells, while the units of sill and nugget correspond to the square of the unit of risk (i.e., the unit of its variance: crimes^{2}/targets^{2}). Bold is used to highlight the parameters that vary from row to row.

Parameter | | | | | Fit for Estimated R | | |
---|---|---|---|---|---|---|---|
range_{R} | range_{P} | sill_{P} | nugget_{P} | ${\mathbf{\epsilon}}_{\mathbf{C}}$ | Fit (naïve) | Fit (GWRisk) | Fit (Bayes) |
50 | 7 | 16,500 | 1250 | 15% | 0.17 | 0.66 | 0.46 |
50 | 7 | 16,500 | 2500 | 15% | 0.11 | 0.65 | 0.37 |
50 | 7 | 16,500 | 5000 | 15% | 0.13 | 0.71 | 0.43 |
50 | 7 | 16,500 | 10,000 | 15% | 0.17 | 0.71 | 0.48 |
75 | 7 | 16,500 | 1250 | 15% | 0.13 | 0.65 | 0.39 |
25 | 7 | 16,500 | 1250 | 15% | 0.17 | 0.68 | 0.46 |
10 | 7 | 16,500 | 1250 | 15% | 0.17 | 0.57 | 0.50 |
5 | 7 | 16,500 | 1250 | 15% | 0.23 | 0.39 | 0.50 |
50 | 3.5 | 16,500 | 1250 | 15% | 0.12 | 0.73 | 0.42 |
50 | 14 | 16,500 | 1250 | 15% | 0.16 | 0.60 | 0.44 |
50 | 28 | 16,500 | 1250 | 15% | 0.26 | 0.59 | 0.55 |
50 | 56 | 16,500 | 1250 | 15% | 0.40 | 0.57 | 0.60 |
50 | 7 | 10,000 | 1250 | 15% | 0.18 | 0.66 | 0.52 |
50 | 7 | 20,000 | 1250 | 15% | 0.14 | 0.69 | 0.44 |
50 | 7 | 40,000 | 1250 | 15% | 0.12 | 0.71 | 0.34 |
50 | 7 | 80,000 | 1250 | 15% | 0.09 | 0.68 | 0.28 |
50 | 7 | 16,500 | 1250 | 5% | 0.32 | 0.68 | 0.76 |
50 | 7 | 16,500 | 1250 | 25% | 0.08 | 0.63 | 0.20 |
50 | 7 | 16,500 | 1250 | 50% | 0.03 | 0.65 | 0.06 |
50 | 7 | 16,500 | 1250 | 100% | 0.01 | 0.54 | 0.02 |

## Appendix C

**Figure A1.** An example case showing maps of the simulated reference population, (true) victimization risk and crime counts, as well as the victimization risks estimated using each of the three methods considered: GWRisk, naïve estimation, and the Empirical Bayes Estimator method. Parameters for this case are listed in Table A1 (Appendix B), 15th entry. Very high values were removed from the naïve and Empirical Bayes maps to allow comparison between lower values (see Figure 5 for the uncapped maps). Notice that spurious peaks still exist even in this version.

**Figure A2.** Estimated victimization risks of burglary for single-family houses and residential apartments, calculated using each of the three methods tested. Very high values were removed from the naïve and Empirical Bayes maps to allow comparison between lower values (see Figure 6 for the uncapped maps). Notice that (probably spurious) peaks still exist even in this version.

**Figure 2.** Map of residential burglaries in Belo Horizonte, Brazil, from 2008 to 2014. The grid consists of uniform square cells of 278.5 m per side.

**Figure 5.** An example case showing maps of the simulated reference population, (true) victimization risk and crime counts, as well as the victimization risks estimated using each of the three methods considered: GWRisk, naïve estimation, and the Empirical Bayes Estimator method. Parameters for this case are listed in Table A1 (Appendix B), 15th entry.

**Figure 6.** Estimated victimization risks of burglary for single-family houses and residential apartments, calculated using each of the three methods tested. See Figure A2 in Appendix C for naïve and Empirical Bayes maps with very high values removed.

**Table 1.** Summary of the simulation study with one population, showing the mean values of the fitness scores for each method, as well as their standard deviations and coefficients of variation.

Fit for Estimated R | Mean | Std. Dev. | Coef. Var. |
---|---|---|---|
Fit (naïve) | 0.16 | 0.08 | 52% |
Fit (GWR) | 0.61 | 0.08 | 14% |
Fit (Bayes) | 0.42 | 0.16 | 39% |
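As a minimal sketch of how a summary of this kind can be computed, the snippet below applies the usual definition of the coefficient of variation (standard deviation divided by mean) to the 20 rounded GWRisk scores listed in Table A1. The helper name `summarize_fit` is hypothetical, and the use of the sample standard deviation is an assumption. Because the inputs are rounded, and the published summary may have been computed from unrounded scores or a different set of runs, the output is not expected to reproduce Table 1 exactly.

```python
import statistics

# Rounded GWRisk fitness scores, copied row by row from Table A1.
GWRISK_FIT = [
    0.66, 0.65, 0.71, 0.71, 0.65, 0.68, 0.57, 0.39, 0.73, 0.60,
    0.59, 0.57, 0.66, 0.69, 0.71, 0.68, 0.68, 0.63, 0.65, 0.54,
]


def summarize_fit(scores):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample (n - 1) estimator, an assumption
    return mean, std, std / mean


mean, std, cv = summarize_fit(GWRISK_FIT)
print(f"mean={mean:.2f}  std={std:.2f}  coef. var.={cv:.0%}")
```

A low coefficient of variation indicates that a method's fit changes little across the parameter settings tried, which is how the last column of the table can be read.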

**Table 2.** Summary of the simulation study with two populations, showing the mean values of the fitness scores for each method, as well as their standard deviations and coefficients of variation.

Method | Mean | Std. Dev. | Coef. Var. |
---|---|---|---|
**Fit for estimated R1** | | | |
Fit (naïve) | 0.01 | 0.02 | 147% |
Fit (GWR) | 0.67 | 0.07 | 11% |
Fit (Bayes) | 0.02 | 0.03 | 148% |
**Fit for estimated R2** | | | |
Fit (naïve) | 0.01 | 0.00 | 47% |
Fit (GWR) | 0.28 | 0.08 | 29% |
Fit (Bayes) | 0.01 | 0.01 | 48% |

