# Graph-Based Matching of Points-of-Interest from Collaborative Geo-Datasets


## Abstract


## 1. Introduction

#### 1.1. Steps in the Matching of POIs from Different Datasets

#### 1.2. Previous Works on POI Matching

## 2. Methods

#### 2.1. POI Similarity Measures

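The spatial similarity between two POIs ${p}_{i}$ and ${p}_{j}$ was computed based on their Euclidean distance by the following equation:

$$sim_{spatial}({p}_{i},{p}_{j}) = 1 - \frac{d({p}_{i},{p}_{j})}{thr}$$

where $d({p}_{i},{p}_{j})$ is the Euclidean distance between ${p}_{i}$ and ${p}_{j}$ and thr is the distance threshold under which two POIs from the different datasets may be considered to be matched. The ratio of $d({p}_{i},{p}_{j})$ to thr is subtracted from 1, so that closer POIs are assigned a spatial similarity closer to 1.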
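The spatial similarity described above can be sketched in a few lines of Python. This is only an illustration: the function name, the coordinates and the 100 m threshold are assumptions, and the points are assumed to be given in a projected (metric) coordinate system.

```python
import math


def spatial_similarity(p_i, p_j, thr):
    """Return 1 - d(p_i, p_j) / thr, with d the Euclidean distance.

    p_i and p_j are (x, y) tuples in a metric coordinate system
    (an assumption of this sketch); thr is the distance threshold.
    """
    d = math.dist(p_i, p_j)
    return 1.0 - d / thr


# Hypothetical points 50 m apart and a hypothetical threshold of 100 m.
print(spatial_similarity((0.0, 0.0), (30.0, 40.0), thr=100.0))  # 0.5
```

Pairs farther apart than thr would receive a negative value here; since thr is the distance under which two POIs may be considered matching candidates, such pairs would presumably not be compared in the first place.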

${s}_{i}$ and ${s}_{j}$. This measure is therefore based only on the structure of WordNet. The other measure considered is the one proposed by Lin [47], which takes into account the relative positions of the synsets as well as their information content. The Lin measure is computed as follows:

$$sim_{Lin}({s}_{i},{s}_{j}) = \frac{2\cdot IC\left(LCS({s}_{i},{s}_{j})\right)}{IC\left({s}_{i}\right)+IC\left({s}_{j}\right)}$$

where $IC$ is the information content and $LCS({s}_{i},{s}_{j})$ is the least common subsumer of the synsets ${s}_{i}$ and ${s}_{j}$. The LCS is the most specific concept that is an ancestor of both ${s}_{i}$ and ${s}_{j}$ with respect to the “is a” semantic relations in WordNet. For instance, the LCS of ‘moose’ and ‘kangaroo’ would be ‘mammal’. Like the Path Similarity measure, the values of this measure also vary between 0 and 1, as $IC\left(LCS({s}_{i},{s}_{j})\right)\le IC\left({s}_{i}\right)$ and $IC\left(LCS({s}_{i},{s}_{j})\right)\le IC\left({s}_{j}\right)$. The information content of the synsets and of their LCS was computed from the widely used SemCor corpus [48].
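To make the Lin measure concrete, the following sketch computes it on a tiny hand-made taxonomy. The taxonomy, the concept counts and all function names are hypothetical; they stand in for WordNet and the SemCor-derived information content and are not the data used in the experiments.

```python
import math

# Hypothetical "is a" taxonomy (child -> parent) and hypothetical
# cumulative corpus counts; neither comes from WordNet or SemCor.
PARENT = {"moose": "mammal", "kangaroo": "mammal", "mammal": "animal", "animal": None}
COUNT = {"moose": 2, "kangaroo": 2, "mammal": 8, "animal": 16}
TOTAL = COUNT["animal"]


def ic(s):
    """Information content: negative log of the concept's relative frequency."""
    return -math.log(COUNT[s] / TOTAL)


def ancestors(s):
    """Return s followed by its ancestors, from most to least specific."""
    chain = []
    while s is not None:
        chain.append(s)
        s = PARENT[s]
    return chain


def lcs(s_i, s_j):
    """Most specific concept that subsumes both s_i and s_j."""
    anc_j = set(ancestors(s_j))
    for a in ancestors(s_i):
        if a in anc_j:
            return a
    return None


def lin_similarity(s_i, s_j):
    """Lin measure: 2 * IC(LCS) / (IC(s_i) + IC(s_j))."""
    denom = ic(s_i) + ic(s_j)
    if denom == 0:  # both concepts are the root, which carries no information
        return 0.0
    return 2 * ic(lcs(s_i, s_j)) / denom


print(lcs("moose", "kangaroo"))                       # mammal
print(round(lin_similarity("moose", "kangaroo"), 3))  # 0.333
```

Because the LCS is never more informative than either synset, the ratio stays between 0 and 1, which matches the bound stated above.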

#### 2.2. Aggregation of Similarity Measures

_{i}is the set of matching candidates from POI i. The weight of the spatial similarity is thus always proportional to the string similarity between i and j and always lower than 1. The spatial similarity will influence the matching but always to a lower extent than the string similarity. Similarly to the spatial similarity weight, the weight of the semantic similarity is given by

#### 2.3. Graph-Based Matching Strategies

${m}_{i}$ from the graph in Figure 1a is matched with node ${q}_{j}$. Probably because it is very simple to implement and effective for merging two similar datasets, this method has been widely applied for matching POI datasets [26,27,31,37]. However, it has two major drawbacks when applied to the matching of POIs from VGI sources. The first is that it may produce ambiguous matches, i.e., cases in which two nodes from the reference dataset are matched to the same node from the target dataset. For example, nodes ${m}_{j}$ and ${m}_{l}$ from the graph in Figure 1a would both be matched to node ${q}_{l}$, as shown in Figure 1b. Although it is certainly possible in the VGI context that both ${m}_{j}$ and ${m}_{l}$ represent the same real-world feature, ambiguous matches are frequently the result of a mistaken match. They also arise from the second drawback of this strategy, namely the assumption that every node from the reference dataset has one corresponding node in the target dataset. This assumption implies that every node of the reference dataset will be matched, regardless of whether the respective venue is also represented in the target dataset. Consequently, the Naïve strategy cannot cope with one-to-none matching cases. Figure 1b shows the matching result obtained with the Naïve strategy for the graph in Figure 1a.
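The behaviour of the Naïve strategy, including its ambiguous matches, can be illustrated with a minimal sketch. The graph, its weights and the function names below are hypothetical and do not reproduce Figure 1a; the sketch assumes that each reference node is simply assigned the target node of its highest-weight edge.

```python
from collections import defaultdict


def naive_matching(edges):
    """Assign each reference node the target node of its highest-weight edge.

    edges maps (reference_node, target_node) -> similarity weight.
    """
    best = {}
    for (m, q), w in edges.items():
        if m not in best or w > best[m][1]:
            best[m] = (q, w)
    return {m: q for m, (q, _) in best.items()}


def ambiguous_matches(matches):
    """Return the target nodes claimed by more than one reference node."""
    claimed = defaultdict(list)
    for m, q in matches.items():
        claimed[q].append(m)
    return {q: ms for q, ms in claimed.items() if len(ms) > 1}


# Hypothetical candidate graph.
edges = {
    ("m1", "q1"): 0.9,
    ("m1", "q2"): 0.4,
    ("m2", "q2"): 0.8,
    ("m3", "q2"): 0.6,
    ("m3", "q3"): 0.5,
}

matches = naive_matching(edges)
print(matches)                     # {'m1': 'q1', 'm2': 'q2', 'm3': 'q2'}
print(ambiguous_matches(matches))  # {'q2': ['m2', 'm3']}
```

Every reference node with at least one edge receives a match, so a venue that is absent from the target dataset is still matched to something, which is the one-to-none limitation discussed above.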

The Best-best strategy matches ${m}_{i}$ with ${q}_{j}$ if, among the edges from ${m}_{i}$, the one it shares with ${q}_{j}$ has the highest weight and if, among the edges from ${q}_{j}$, the one it shares with ${m}_{i}$ also has the highest weight. This is a conservative method in the sense that it only matches nodes when there is mutual evidence for the match. It thus decreases, theoretically, the risk of false-positive errors. On the other hand, it may leave many nodes from the reference dataset unmatched, thus producing more false-positive one-to-none matching errors, i.e., cases in which the algorithm should have matched the node from the reference dataset but left it unmatched. Figure 1c shows the matching result obtained with the Best-best strategy for the graph in Figure 1a.
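The mutual-evidence condition can be sketched as follows. The graph, weights and function name are hypothetical, and ties are resolved in favour of the edge encountered first, which is an assumption of this sketch rather than something stated in the text.

```python
def best_best_matching(edges):
    """Keep (m, q) only if each node is the other's highest-weight neighbour.

    edges maps (reference_node, target_node) -> similarity weight.
    """
    best_for_m, best_for_q = {}, {}
    for (m, q), w in edges.items():
        if m not in best_for_m or w > best_for_m[m][1]:
            best_for_m[m] = (q, w)
        if q not in best_for_q or w > best_for_q[q][1]:
            best_for_q[q] = (m, w)
    # A match is kept only when the preference is mutual.
    return {m: q for m, (q, _) in best_for_m.items() if best_for_q[q][0] == m}


# Hypothetical candidate graph.
edges = {
    ("m1", "q1"): 0.9,
    ("m1", "q2"): 0.4,
    ("m2", "q2"): 0.8,
    ("m3", "q2"): 0.6,
    ("m3", "q3"): 0.5,
}

print(best_best_matching(edges))  # {'m1': 'q1', 'm2': 'q2'}
```

In this example m3 prefers q2, but q2 prefers m2, so m3 is left unmatched even though q3 is available. This is the conservative behaviour described above.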

^{3}), which made our experiments practically impossible to perform on a regular computer. However, after the edge weight transformation, which, as demonstrated in Figure 3, turns some of the edge weights negative and thus enables the elimination of such edges from the graph, each experiment took on the order of 1 to 2 h.
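One possible reading of the edge weight transformation, based only on the steps listed in the caption of Figure 3, is sketched below: for every edge, the mean weight of the edges at each of its two end nodes is subtracted from the original weight, and the two differences are summed. The graph, the function names and this interpretation itself are assumptions of the sketch.

```python
from collections import defaultdict


def transform_weights(edges):
    """Re-weight each edge relative to the mean weights at its two end nodes.

    edges maps (reference_node, target_node) -> original weight.
    """
    incident = defaultdict(list)
    for (m, q), w in edges.items():
        incident[("ref", m)].append(w)
        incident[("tgt", q)].append(w)
    mean = {node: sum(ws) / len(ws) for node, ws in incident.items()}
    return {
        (m, q): (w - mean[("ref", m)]) + (w - mean[("tgt", q)])
        for (m, q), w in edges.items()
    }


def prune_negative(edges):
    """Drop the edges whose transformed weight is negative."""
    return {e: w for e, w in edges.items() if w >= 0}


# Hypothetical candidate graph.
edges = {("m1", "q1"): 0.9, ("m1", "q2"): 0.3, ("m2", "q2"): 0.7}

new_edges = transform_weights(edges)
print(sorted(prune_negative(new_edges)))  # [('m1', 'q1'), ('m2', 'q2')]
```

Here the edge (m1, q2) lies below the mean at both of its end nodes, becomes negative and is removed, which shrinks the graph handed to the subsequent matching step.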

#### 2.4. Considering Multiple Entries from the Same Venue

#### 2.5. Experiment Design

^{2} located in the central area of London (England) was defined as the test area. The bounding box is delimited by the latitudes 51°29′44″ N and 51°31′18″ N and the longitudes 0°3′54″ E and 0°10′12″ E. This area is one of the most vibrant in London and contains a large variety of commercial and leisure-related venues, such as pubs, restaurants, cafés, shops, cinemas and museums. From this area, POIs were collected from the VGI platform OSM and from the place-review, location-based social media platform Foursquare. These two datasets have complementary strengths: OSM is by most standards reliable regarding the position of POIs, whereas Foursquare mostly contains detailed semantic information about them. POIs from OSM were extracted as node features with a name and at least one of the following tag keys: ‘amenity’, ‘shop’, ‘cuisine’, ‘tourism’, ‘office’, ‘land-use’, ‘leisure’, ‘food’, ‘sport’, ‘use’, ‘memorial’, ‘type’ and ‘brewery’. As not all POIs are represented as points, OSM polygons (i.e., way features) with a name, at least one of these tags and the ‘building:yes’ key/value pair were also collected. These OSM ways were transformed into points by assigning their semantic data to the ways’ centres of mass. POIs from Foursquare were collected when their most detailed use category (see https://developer.foursquare.com/categorytree) was included in the set of categories from our test samples. In total, 8238 POIs from OSM and 13,548 from Foursquare were collected.
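The OSM selection rules above can be expressed as two small predicates. This is a sketch under assumptions: the function names are hypothetical, the tags are assumed to be available as a plain dictionary, and the tag keys are copied as written in the text.

```python
# Tag keys exactly as listed in the text above.
POI_TAG_KEYS = {
    "amenity", "shop", "cuisine", "tourism", "office", "land-use", "leisure",
    "food", "sport", "use", "memorial", "type", "brewery",
}


def is_poi_node(tags):
    """A node qualifies if it has a name and at least one of the listed keys."""
    return "name" in tags and bool(tags.keys() & POI_TAG_KEYS)


def is_poi_way(tags):
    """A way additionally needs the building tag set to 'yes'."""
    return is_poi_node(tags) and tags.get("building") == "yes"


# Hypothetical OSM features.
print(is_poi_node({"name": "The Red Lion", "amenity": "pub"}))   # True
print(is_poi_node({"amenity": "bench"}))                         # False
print(is_poi_way({"name": "City Museum", "tourism": "museum"}))  # False
```

A way that passes `is_poi_way` would then be reduced to a single point at its centre of mass, as described above.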

## 3. Results

## 4. Summary and Discussion

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Jonietz, D.; Zipf, A. Defining fitness-for-use for crowdsourced points of interest (POI). ISPRS Int. J. Geo-Inf. **2016**, 5, 149.
2. Touya, G.; Antoniou, V.; Olteanu-Raimond, A.-M.; Van Damme, M.-D. Assessing crowdsourced POI quality: Combining methods based on reference data, history, and spatial relations. ISPRS Int. J. Geo-Inf. **2017**, 6, 80.
3. Ballatore, A.; Zipf, A. A conceptual quality framework for volunteered geographic information. In Proceedings of the XII Conference on Spatial Information Theory, Santa Fe, NM, USA, 12–16 October 2015.
4. Senaratne, H.; Mobasheri, A.; Ali, A.L.; Capineri, C.; Haklay, M. A review of volunteered geographic information quality assessment methods. Int. J. Geogr. Inf. Sci. **2016**, 31, 139–167.
5. Degrossi, L.C.; Albuquerque, J.P.D.; Rocha, R.D.S.; Zipf, A. A framework of quality assessment methods for crowdsourced geographic information: A systematic literature review. In Proceedings of the 14th International Conference on Information Systems for Crisis Response and Management, Albi, France, 21–24 May 2017.
6. Li, L.; Goodchild, M.F. An optimisation model for linear feature matching in geographical data conflation. Int. J. Image Data Fusion **2011**, 2, 309–328.
7. Abdolmajidi, E.; Mansourian, A.; Will, J.; Harrie, L. Matching authority and VGI road networks using an extended node-based matching algorithm. Geo-Spat. Inf. Sci. **2015**, 18, 65–80.
8. Hecht, R.; Kunze, C.; Hahmann, S. Measuring completeness of building footprints in OpenStreetMap over space and time. ISPRS Int. J. Geo-Inf. **2013**, 2, 1066–1091.
9. Fan, H.; Zipf, A.; Fu, Q.; Neis, P. Quality assessment for building footprints data on OpenStreetMap. Int. J. Geogr. Inf. Sci. **2014**, 28, 700–719.
10. Ruta, M.; Scioscia, F.; De Filippis, D.; Ieva, S.; Binetti, M.; Di Sciascio, E. A semantic-enhanced augmented reality tool for OpenStreetMap POI discovery. Transp. Res. Procedia **2014**, 3, 479–488.
11. Guo, L.; Jiang, H.; Wang, X.; Liu, F. Learning to recommend point-of-interest with the weighted Bayesian personalized ranking method in LBSNs. Information **2017**, 8, 20.
12. Bakillah, M.; Liang, S.; Mobasheri, A.; Arsanjani, J.J.; Zipf, A. Fine-resolution population mapping using OpenStreetMap points-of-interest. Int. J. Geogr. Inf. Sci. **2014**, 28, 1940–1963.
13. Jiang, S.; Alves, A.; Rodrigues, F.; Ferreira, J.; Pereira, F.C. Mining point-of-interest data from social networks for urban land use classification and disaggregation. Comput. Environ. Urban Syst. **2015**, 53, 36–46.
14. Kunze, C.; Hecht, R. Semantic enrichment of building data with volunteered geographic information to improve mappings of dwelling units and population. Comput. Environ. Urban Syst. **2015**, 53, 4–18.
15. Niu, N.; Liu, X.; Jin, H.; Ye, X.; Liu, Y.; Li, X.; Chen, Y.; Li, S. Integrating multi-source big data to infer building functions. Int. J. Geogr. Inf. Sci. **2017**, 31, 1871–1890.
16. Calegari, G.R.; Carlino, E.; Peroni, D.; Celino, I. Extracting urban land use from linked open geospatial data. ISPRS Int. J. Geo-Inf. **2015**, 4, 2109–2130.
17. Arsanjani, J.J.; Helbich, M.; Bakillah, M.; Hagenauer, J.; Zipf, A. Toward mapping land-use patterns from volunteered geographic information. Int. J. Geogr. Inf. Sci. **2013**, 27, 2264–2278.
18. Liu, X.; Long, Y. Automated identification and characterization of parcels with OpenStreetMap and points of interest. Environ. Plan. B Plan. Des. **2016**, 42, 341–360.
19. Yang, B.; Zhang, Y.; Lu, F. Geometric-based approach for integrating VGI POIs and road networks. Int. J. Geogr. Inf. Sci. **2014**, 28, 126–147.
20. Yang, B.; Zhang, Y. Pattern-mining approach for conflating crowdsourcing road networks with POIs. Int. J. Geogr. Inf. Sci. **2015**, 29, 786–805.
21. Pouke, M.; Goncalves, J.; Ferreira, D.; Kostakos, V. Practical simulation of virtual crowds using points of interests. Comput. Environ. Urban Syst. **2015**, 57, 118–129.
22. Sun, Y. Investigating “locality” of intra-urban spatial interactions in New York city using Foursquare data. ISPRS Int. J. Geo-Inf. **2016**, 5, 43.
23. Fang, Z.; Li, Q.; Zhang, X.; Shaw, S.-L. A GIS data model for landmark-based pedestrian navigation. Int. J. Geogr. Inf. Sci. **2012**, 26, 817–838.
24. Rousell, A.; Zipf, A. Toward a landmark-based pedestrian navigation service using OSM data. ISPRS Int. J. Geo-Inf. **2017**, 6, 64.
25. Delgado, F.; Martínez-González, M.M.; Finat, J. An evaluation of ontology matching techniques on geospatial ontologies. Int. J. Geogr. Inf. Sci. **2013**, 27, 2279–2301.
26. McKenzie, G.; Janowicz, K.; Adams, B. Weighted multi-attribute matching of user-generated points of interest. Cartogr. Geogr. Inf. Sci. **2014**, 41, 125–137.
27. Li, L.; Xing, X.; Xia, H.; Huang, X. Entropy-weighted instance matching between different sourcing points of interest. Entropy **2016**, 18, 45.
28. Novack, T.; Peters, R.; Zipf, A. Graph-based strategies for matching points-of-interests from different VGI sources. In Proceedings of the 20th AGILE Conference, Wageningen, The Netherlands, 9–12 May 2017.
29. Vasardani, M.; Winter, S.; Richter, K.F. Locating place names from place descriptions. Int. J. Geogr. Inf. Sci. **2013**, 27, 2509–2532.
30. Kim, J.; Vasardani, M.; Winter, S. Similarity matching for integrating spatial information extracted from place descriptions. Int. J. Geogr. Inf. Sci. **2017**, 31, 56–80.
31. Scheffer, T.; Schirru, R.; Lehmann, P. Matching points of interest from different social networking sites. In KI 2012: Advances in Artificial Intelligence; Glimm, B., Krüger, A., Eds.; Springer: Berlin, Germany, 2012; pp. 245–248. ISBN 978-3-642-33346-0.
32. Cohen, W.W.; Ravikumar, P.; Fienberg, S.E. A comparison of string distance metrics for name-matching tasks. In Proceedings of the 2003 International Joint Conferences on Artificial Intelligence (IJCAI-03), Acapulco, Mexico, 9–10 August 2003.
33. Meltzoff, A.N.; Kuhl, P.K.; Movellan, J.; Sejnowski, T.J. Foundations for a new science of learning. Science **2009**, 325, 284–288.
34. Liu, W.; Cai, M.; Yuan, H.; Shi, X.; Zhang, W.; Liu, J. Phonotactic language recognition based on DNN-HMM acoustic model. In Proceedings of the 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), Singapore, 12–14 September 2014; pp. 153–157.
35. Ballatore, A.; Bertolotto, M.; Wilson, D.C. The semantic similarity ensemble. J. Spat. Inf. Sci. **2016**, 7, 27–44.
36. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet Allocation. J. Mach. Learn. Res. **2003**, 3, 993–1022.
37. Rodrigues, F.; Alves, A.; Polisciuc, E.; Jiang, S.; Ferreira, J.; Pereira, F.C. Estimating disaggregated employment size from points-of-interest and census data: From mining the web to model implementation and visualization. Int. J. Adv. Intell. Syst. **2013**, 6, 41–52.
38. Olteanu-Raimond, A.M.; Mustière, S.; Ruas, A. Knowledge formalization for vector data matching using belief theory. J. Spat. Inf. Sci. **2015**, 10, 21–46.
39. Foursquare. Available online: https://foursquare.com/about (accessed on 11 January 2018).
40. Yelp. Available online: https://www.yelp.com/about (accessed on 11 January 2018).
41. Levenshtein, V.I. Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. **1966**, 10, 707–710.
42. Bonzanini, M. Fuzzy String Matching in Python. Available online: https://marcobonzanini.com/2015/02/25/fuzzy-string-matching-in-python/ (accessed on 12 March 2018).
43. Miller, G.A. WordNet: A lexical database for English. Commun. ACM **1995**, 38, 39–41.
44. Meng, L.; Huang, R.; Gu, J. A review of semantic similarity measures in WordNet. Int. J. Hybrid Inf. Technol. **2013**, 6, 1–12.
45. Sánchez, D.; Batet, M. A semantic similarity method based on information content exploiting multiple ontologies. Expert Syst. Appl. **2013**, 40, 1393–1399.
46. Al-Bakri, M.; Fairbairn, D. Assessing similarity matching for possible integration of feature classifications of geospatial data from official and informal sources. Int. J. Geogr. Inf. Sci. **2012**, 26, 1437–1456.
47. Lin, D. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, Madison, WI, USA, 24–27 July 1998.
48. Landes, S.; Leacock, C.; Fellbaum, C. Building semantic concordances. In WordNet: An Electronic Lexical Database; Fellbaum, C., Ed.; The MIT Press: London, UK, 1998; pp. 199–216.
49. Galil, Z. Efficient algorithms for finding maximum matching in graphs. ACM Comput. Surv. **1986**, 18, 23–38.
50. Kuhn, H.W. The Hungarian method for assignment problems. Nav. Res. Logist. Q. **1955**, 3, 253–258.
51. Zwillinger, D.; Kokosa, S. Standard Probability and Statistics Tables and Formulae; Chapman and Hall: London, UK, 2000; p. 480.

**Figure 1.** (**a**) A hypothetical graph. Nodes represent POIs and edges represent candidate matching pairs; (**b**) Matching result obtained with the Naïve method; (**c**) Matching result obtained with the Best-best method; (**d**) Matching result obtained with the Combinatorial method.

**Figure 2.** Graph representing four existing POIs from OSM and Foursquare. (**a**–**c**) Matching results obtained with the three different strategies investigated in this work. Edge weights were computed with the name similarity measure presented in Section 2.1.

**Figure 3.** Transformation applied to the graph’s edge weights before applying the Combinatorial matching strategy. (**a**) The graph and its original edge weights; (**b**) Mean values of the edges connected to each node; (**c**) Original edge weights minus the mean values computed in the previous step; (**d**) New edge weights resulting from the summation of the values obtained in the previous step.

**Figure 4.** Queries applied for including edges in the subset of edges extracted by the Best-best method. (**a**) Queries and decision applied to the ambiguous edges obtained by applying the Naïve method taking the blue dataset as the reference one; (**b**) Queries and decision applied to the ambiguous edges obtained by applying the Naïve method taking the orange dataset as the reference one.

**Figure 5.** Histogram of the distances between the pairs of matching POIs from OSM and Foursquare comprising our test-sample set.

**Figure 6.** Evaluation of the different matching strategies applied with different similarity measures aggregated by their unweighted and weighted sum. (**a**,**b**) One-to-one and one-to-many matching accuracies obtained with the three different strategies and similarity measures aggregated by their unweighted (**a**) and weighted sum (**b**). (**c**,**d**) One-to-none matching accuracies obtained with the Best-best and Combinatorial strategies and similarity measures aggregated by their unweighted (**c**) and weighted sum (**d**). (**e**,**f**) Overall accuracies with similarity measures aggregated by their unweighted (**e**) and weighted sums (**f**).

**Figure 7.** Matching accuracies obtained before and after applying the procedure for tackling the existence of multiple POIs representing the same place.

**Table 1.** The different types and respective amounts of test samples considered in the performance analysis of the different matching strategies.

Sample Type | Purpose: Evaluate the Model’s Performance in Detecting … | Amount |
---|---|---|
One-to-one | Cases when a POI from OSM should be matched with only one POI from Foursquare and vice versa. | 195 |
One-to-none | Cases when a POI from OSM does not have any match in Foursquare and should therefore be left unmatched. | 42 |
One-to-many | Cases when more than one POI from OSM should be matched to the same Foursquare POI and cases when more than one POI from Foursquare should be matched to the same POI from OSM. | 34 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Novack, T.; Peters, R.; Zipf, A. Graph-Based Matching of Points-of-Interest from Collaborative Geo-Datasets. *ISPRS Int. J. Geo-Inf.* **2018**, *7*, 117.
https://doi.org/10.3390/ijgi7030117
