# Defining a Threshold Value for Maximum Spatial Information Loss of Masked Geo-Data

## Abstract


## 1. Introduction

#### 1.1. Examples of Obfuscated Point Maps for Privacy Protection

^{2} to present the residential locations of breast cancer cases in Cape Cod, Massachusetts. The primary audience of these publications is likely the scientific community (experts on the topic) and, at a later stage, the public.

#### 1.2. Examples of Point Maps Where a Confidential Theme is not Obfuscated

#### 1.3. Calculating the Error of Obfuscated Locations

#### 1.4. The Study’s Objective

## 2. Analytical Strategy

#### Preparation of Original and Masked Maps

**Figure 1.** Original maps. The five maps at the top cover the entire city of Vienna (414.67 km^{2}) and vary in terms of the density of the points. The five maps at the bottom are square areas in Vienna, each 13.65 km^{2} in size, and vary in terms of the distribution of the points and the degree of clustering (NNI, nearest neighbor index).
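The caption describes the maps by their nearest neighbor index (NNI). As a minimal sketch of how such an index is commonly computed, the snippet below uses the Clark-Evans form: the observed mean nearest-neighbor distance divided by the distance expected under complete spatial randomness. The function name, the synthetic point sets, and the absence of any edge correction are assumptions made for illustration; the study's exact NNI procedure is not stated here.

```python
import math
import random


def nearest_neighbor_index(points, area):
    """Clark-Evans nearest neighbor index for 2-D points in a study area.

    Values near 1 suggest a random pattern, values well below 1 suggest
    clustering. No edge correction is applied (an assumption of this sketch).
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    # Observed mean distance from each point to its nearest neighbor (brute force).
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    # Expected mean nearest-neighbor distance under complete spatial randomness.
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Synthetic example: a square of 13.65 km^2 (side length in metres).
side = math.sqrt(13.65) * 1000
rng = random.Random(42)
random_pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(300)]
clustered_pts = [
    (rng.gauss(side / 2, 100), rng.gauss(side / 2, 100)) for _ in range(300)
]

nni_random = nearest_neighbor_index(random_pts, side * side)
nni_clustered = nearest_neighbor_index(clustered_pts, side * side)
print(f"random pattern NNI:    {nni_random:.2f}")
print(f"clustered pattern NNI: {nni_clustered:.2f}")
```

The brute-force search is quadratic in the number of points, which is adequate for a few hundred points; a spatial index would be preferable for larger datasets.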

^{2}and five maps at a larger scale, depicting a portion of Vienna with an area of 13.65 km

^{2}), the same masking degree would affect the larger-scale maps more than the smaller-scale maps. Furthermore, the application of the mask should yield a wide range of “local divergence” results (0–100). In other words, the set of masked maps should include both maps with a small error, which participants could perceive as similar to the original, and maps with a large error, which may be perceived as different. To ensure this variety of results, we masked the original datasets using three radii.
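To make the idea of masking with several radii concrete, the following sketch displaces every point by a random offset of at most a given radius. The displacement scheme (uniform over a disc), the function name `mask_points`, and the synthetic coordinates are assumptions for illustration only; the masking technique actually applied in the study may differ. The radii of 200, 600 and 1000 meters are the masking degrees reported in Table 3.

```python
import math
import random


def mask_points(points, radius, rng):
    """Displace each (x, y) point by a random offset of at most `radius` metres.

    Assumed scheme: random direction, distance drawn so that the masked
    location is uniformly distributed over the disc around the original.
    """
    masked = []
    for x, y in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # The square root makes the result uniform over the disc's area
        # rather than concentrated near the original location.
        dist = radius * math.sqrt(rng.random())
        masked.append((x + dist * math.cos(angle), y + dist * math.sin(angle)))
    return masked


rng = random.Random(0)
# Hypothetical original locations in projected coordinates (metres).
original = [(rng.uniform(0, 5000), rng.uniform(0, 5000)) for _ in range(100)]

# One masked dataset per masking degree.
masked_by_radius = {r: mask_points(original, r, rng) for r in (200, 600, 1000)}

for r, pts in masked_by_radius.items():
    mean_shift = sum(
        math.hypot(mx - ox, my - oy) for (ox, oy), (mx, my) in zip(original, pts)
    ) / len(pts)
    print(f"radius {r:>4} m: mean displacement {mean_shift:.0f} m")
```

Because the displacement is bounded by the radius, the same radius moves points across a much larger share of a 13.65 km^{2} map than of a 414.67 km^{2} map, which is the scale effect described above.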

## 3. Results

#### 3.1. Survey Results and Participants

**Table 1.** Characteristics of participants (No = 398). ^{a} Nationalities in the U.K. are aggregated to “British” citizenship, since participants used different wording to describe their nationality; ^{b} nationalities with fewer than 10 participants per nationality (in total, 36 nationalities).

| Group | No | % |
|---|---|---|
| **Profession** | | |
| non-spatial science | 148 | 37.2% |
| spatial science | 210 | 52.8% |
| No response | 40 | 10.0% |
| **Sex** | | |
| female | 163 | 41.0% |
| male | 195 | 49.0% |
| No response | 40 | 10.0% |
| **Age group** | | |
| <17 | 1 | 0.3% |
| 18–20 | 8 | 2.0% |
| 21–29 | 170 | 42.7% |
| 30–39 | 133 | 33.4% |
| 40–49 | 32 | 8.0% |
| 50–59 | 10 | 2.5% |
| >60 | 4 | 1.0% |
| No response | 40 | 10.1% |
| **Nationality** | | |
| Greek | 104 | 26.1% |
| Austrian | 58 | 14.6% |
| German | 51 | 12.8% |
| Croat | 24 | 6.0% |
| British ^{a} | 16 | 4.0% |
| American | 11 | 2.8% |
| Other ^{b} | 77 | 19.4% |
| No response | 57 | 14.3% |

**Table 2.** Significance of differences in similarity perception between the categories of each group. ^{a} The groups’ categories are statistically different at the 0.05 significance level.

| Group | Categories | Mean | Test | Score | p-Value |
|---|---|---|---|---|---|
| Profession ^{a} | spatial science | 2.83 | Wilcoxon matched-pair test | W = 78 | 0.001 |
| | non-spatial science | 3.23 | | | |
| Sex ^{a} | female | 3.20 | Wilcoxon matched-pair test | W = 60 | 0.007 |
| | male | 2.90 | | | |
| Age group ^{a} | 21–29 | 2.90 | Friedman’s test | χ^{2} = 2.81 | 0.046 |
| | 30–39 | 3.17 | | | |
| | 40–49 | 3.17 | | | |
| Nationality | Austrian | 3.00 | Friedman’s test | χ^{2} = 1.86 | 0.183 |
| | German | 3.23 | | | |
| | Greek | 2.97 | | | |

#### 3.2. Comparing Perceived with Statistical Similarity

^{2}; divergence range: 65.56–83.29) than a larger area (area size: 414.67 km^{2}; divergence range: 52.21–71.75). Similar observations can be made for the perceived similarity. The only exception is that decreasing the masking degree from 1000 meters to 600 meters for the same area does not shift the perception of similarity towards a more “similar” opinion.

**Table 3.** Perceived similarity and local divergences by area size and masking degree. The perception of similarity is compared with the results obtained from three spatial clustering methods. Nnh.di is the index of the hotspot areas’ divergence using the nearest-neighbor hierarchical spatial clustering. Gi*.di is the index of the hotspot areas’ divergence using the Getis-Ord Gi* statistic. Finally, Ans.di is the index of the hotspot areas’ divergence using the Anselin Local Moran’s I statistic. The greater the divergence, the higher the dissimilarity between original and masked hot spots.

| Area and masking degree | Similarity | Nnh.di | Gi*.di | Ans.di |
|---|---|---|---|---|
| Area: 13.65 km^{2}, 1000-meter masking degree | different | 95.78 | 81.16 | 91.19 |
| Area: 13.65 km^{2}, 600-meter masking degree | different | 87.07 | 71.14 | 89.12 |
| Area: 13.65 km^{2}, 200-meter masking degree | slightly similar | 57.32 | 44.39 | 69.57 |
| Area: 414.67 km^{2}, 1000-meter masking degree | slightly similar | 68.95 | 60.85 | 77.09 |
| Area: 414.67 km^{2}, 600-meter masking degree | slightly similar | 59.83 | 57.40 | 74.48 |
| Area: 414.67 km^{2}, 200-meter masking degree | similar | 43.04 | 38.39 | 63.69 |
| Area: 13.65 km^{2} | different | 80.06 | 65.56 | 83.29 |
| Area: 414.67 km^{2} | slightly similar | 57.27 | 52.21 | 71.75 |
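The two per-area rows of Table 3 appear to be the arithmetic means of the three masking-degree rows for that area. The short check below, a sketch with illustrative variable and function names, recomputes those means from the tabulated values.

```python
# Divergence indices from Table 3, keyed by masking degree in metres.
# Each tuple is (Nnh.di, Gi*.di, Ans.di).
small_area = {  # 13.65 km^2
    1000: (95.78, 81.16, 91.19),
    600: (87.07, 71.14, 89.12),
    200: (57.32, 44.39, 69.57),
}
large_area = {  # 414.67 km^2
    1000: (68.95, 60.85, 77.09),
    600: (59.83, 57.40, 74.48),
    200: (43.04, 38.39, 63.69),
}


def area_means(by_radius):
    """Mean of each divergence index across the masking degrees, to 2 decimals."""
    columns = zip(*by_radius.values())
    return tuple(round(sum(col) / len(col), 2) for col in columns)


print("13.65 km^2: ", area_means(small_area))
print("414.67 km^2:", area_means(large_area))
```

The recomputed means match the per-area rows of the table (80.06, 65.56, 83.29 and 57.27, 52.21, 71.75), which supports reading those rows as averages over the three masking degrees.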

**Table 4.** Correlation between perceived similarity and local divergence indices. ^{a} Correlation is significant at the 0.05 level (2-tailed). All other correlations are significant at the 0.01 level (2-tailed).

| Correlation Test | All Participants | | | Non-Experts | | | Experts | | |
|---|---|---|---|---|---|---|---|---|---|
| | Nnh.di | Ans.di | Gi*.di | Nnh.di | Ans.di | Gi*.di | Nnh.di | Ans.di | Gi*.di |
| Kendall’s tau b | 0.765 | 0.467 | 0.614 | 0.710 | 0.397 ^{a} | 0.492 | 0.703 | 0.451 | 0.583 |
| Spearman’s rho | 0.805 | 0.499 | 0.643 | 0.766 | 0.423 ^{a} | 0.523 | 0.755 | 0.494 | 0.631 |
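For readers unfamiliar with the two rank correlations in Table 4, the sketch below implements Kendall's tau-b and Spearman's rho with tie handling in plain Python. It is applied to the six area-by-masking-degree rows of Table 3, with perceived similarity coded as 1 (similar), 2 (slightly similar) and 3 (different). That coding and the use of aggregated rows are assumptions for illustration, so the output is not expected to reproduce the coefficients in Table 4, which were computed from the survey data.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts the denominator for ties in x and y."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Assumed coding of the Table 3 similarity labels (higher = more dissimilar).
dissimilarity = [3, 3, 2, 2, 2, 1]
nnh_di = [95.78, 87.07, 57.32, 68.95, 59.83, 43.04]

tau = kendall_tau_b(dissimilarity, nnh_di)
rho = spearman_rho(dissimilarity, nnh_di)
print(f"Kendall's tau-b: {tau:.3f}")
print(f"Spearman's rho:  {rho:.3f}")
```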

#### 3.3. Estimation Models of Perceived Similarity

| Model | P-Value of Diagnostics | | | |
|---|---|---|---|---|
| | Model Fit (χ^{2}) | Goodness of Fit (Pearson) | Nagelkerke | Test of Parallel Lines |
| All Participants | <0.01 | 0.962 | 0.744 | 0.838 |
| Non-experts | <0.01 | 0.902 | 0.679 | 0.116 |
| Experts | <0.01 | 0.930 | 0.673 | 0.683 |

| Model | Nnh.di Coefficient | | | |
|---|---|---|---|---|
| | Estimate | SE | Wald | p-Value |
| All Participants | 0.157 | 0.041 | 14.876 | <0.01 |
| Non-experts | 0.133 | 0.035 | 14.109 | <0.01 |
| Experts | 0.135 | 0.036 | 13.962 | <0.01 |
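The diagnostics above (test of parallel lines, Nagelkerke) are typical of an ordinal regression with a cumulative logit link. Assuming that model form, logit P(Y ≤ k) = θ_k − β·x, the sketch below shows how a single Nnh.di coefficient turns a divergence value into probabilities for the three ordered categories (very similar/similar, slightly similar, different/very different). The coefficient 0.157 is the estimate reported for all participants; the two threshold values are hypothetical placeholders, since the fitted thresholds are not given in this section.

```python
import math


def category_probabilities(x, beta, thresholds):
    """Category probabilities under a proportional-odds (cumulative logit) model.

    Assumed parameterisation: logit P(Y <= k) = theta_k - beta * x, with
    categories ordered from most similar to most different.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(theta - beta * x))) for theta in thresholds]
    cumulative.append(1.0)  # P(Y <= last category) is always 1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


BETA_ALL = 0.157  # Nnh.di coefficient reported for all participants
HYPOTHETICAL_THRESHOLDS = (7.0, 11.0)  # placeholders, NOT estimates from the study

labels = ("very similar/similar", "slightly similar", "different/very different")
for nnh_di in (43.04, 68.95, 95.78):  # example Nnh.di values taken from Table 3
    probs = category_probabilities(nnh_di, BETA_ALL, HYPOTHETICAL_THRESHOLDS)
    summary = ", ".join(f"{label}: {p:.2f}" for label, p in zip(labels, probs))
    print(f"Nnh.di = {nnh_di:5.2f} -> {summary}")

p_low = category_probabilities(43.04, BETA_ALL, HYPOTHETICAL_THRESHOLDS)
p_high = category_probabilities(95.78, BETA_ALL, HYPOTHETICAL_THRESHOLDS)
```

With a positive coefficient, a higher divergence shifts probability mass from the "similar" end towards the "different" end, whatever the threshold values are; only the divergence at which the categories cross over depends on the thresholds.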

**Figure 4.** Cumulative percentages of Nnh.di by category of perceived similarity (very similar/similar, slightly similar and different/very different) for each group (all participants, non-experts, experts).

**Figure 5.** Nnh.di results and estimated probability of perceived similarity in three ordered categories (very similar/similar, slightly similar and different/very different) for each group: (**A**) all participants; (**B**) non-experts; (**C**) experts.

## 4. Discussion

^{2} to 13.65 km^{2}). This is an approximate representation of regions that range from the city level to the neighborhood level. Although it is common to visualize a distribution of crime incidents at these scales, smaller or larger scales may also be used. For example, the interactive map of the Police.uk website reaches street-level resolution. Therefore, further research is needed to accurately evaluate spatial errors at these resolutions.

## Acknowledgement

## Author Contributions

## Conflicts of Interest


© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kounadi, O.; Leitner, M.
Defining a Threshold Value for Maximum Spatial Information Loss of Masked Geo-Data. *ISPRS Int. J. Geo-Inf.* **2015**, *4*, 572-590.
https://doi.org/10.3390/ijgi4020572
