A Multi-Element Approach to Location Inference of Twitter: A Case for Emergency Response
Abstract
1. Introduction
- Examining Twitter data and the potential elements of a tweet that carry location information, and addressing Twitter data collection and sampling.
- Proposing a hybrid, multi-element approach to location inference on Twitter that significantly improves on the location accuracy of current methods.
2. Existing Approaches to Location Inference
3. Twitter Data and Location-Specific Elements
4. Method Design and Development
4.1. Data Preparation
4.1.1. Data Collection
4.1.2. Data Sampling
4.1.3. Data Cleaning
- Multiple dots (“…”), which people use in a variety of situations (replaced by a single space).
- User mentions (@somebody).
- Hashtag signs (#) from the beginning of all hashtag words.
- All the punctuation marks, numbers and Internet links (starting with “http://”).
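The cleaning steps listed above can be sketched with a few regular expressions. The Python below is a minimal illustration, not the authors' implementation: the patterns, the order in which they are applied, matching `https://` as well as `http://` links, and replacing removed punctuation with a space (so that text such as "Manly,Sydney" is not glued into one word) are all assumptions.

```python
import re

# Assumed patterns; links go first so a URL is removed whole, not in fragments.
LINK_RE = re.compile(r"https?://\S+")     # assumption: https links removed too
ELLIPSIS_RE = re.compile(r"\.{2,}|…")     # multiple dots or the ellipsis character
MENTION_RE = re.compile(r"@\w+")          # user mentions (@somebody)
HASHTAG_SIGN_RE = re.compile(r"#(?=\w)")  # the "#" only; the hashtag word is kept
PUNCT_DIGIT_RE = re.compile(r"[^\w\s]|\d|_")  # remaining punctuation and numbers
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Apply the cleaning steps of Section 4.1.3 to one tweet (a sketch)."""
    text = LINK_RE.sub(" ", text)
    text = ELLIPSIS_RE.sub(" ", text)      # multiple dots -> a single space
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_SIGN_RE.sub("", text)   # "#Manly" -> "Manly"
    text = PUNCT_DIGIT_RE.sub(" ", text)   # assumption: replaced by a space
    return SPACES_RE.sub(" ", text).strip()  # collapse leftover whitespace
```

For example, `clean_tweet("Flooding in #Manly... @bob check http://t.co/abc 2 roads closed!")` returns `"Flooding in Manly check roads closed"`, leaving the suburb name intact for the matching step.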
4.2. Location Inference
4.2.1. Location Name Class
- Suburb level: Suburbs that lie partially or wholly within the data collection zone are selected. To identify them, the suburb polygon shapefile downloaded from the ABS website is intersected with the data collection zone (Figure 6). This yields 1381 suburbs, whose name field forms the suburb-level name class (L1). The geographic centroid of each selected suburb is calculated in a GIS environment, and the coordinates of that centroid are taken as the location of the suburb.
- City level: The main cities within the data collection zone form the city-level name class (L2). Their coordinates are extracted from Google Maps and attached to the corresponding entries of the name class.
- Administrative level: The names of the large administrative areas (state or country) surrounding the data collection zone, in all their common forms (NSW, New South Wales, Australia, Aus and OZ), form the administrative name class (L3). Because these areas are too large to be represented by a single point, no geographic coordinates are calculated at this level.
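One way to hold the three name classes and look up location names in a tweet element is sketched below. `NAME_CLASSES` and `first_match` are hypothetical names, and the dictionary is truncated to a few entries whose coordinates are copied from the results table (the real suburb class holds 1381 names). Case-insensitive whole-word matching, and reading "first instance" as the earliest occurrence in the text, are assumptions.

```python
import re
from typing import Dict, Optional, Tuple

Coord = Optional[Tuple[float, float]]  # (latitude, longitude); None for L3

# Hypothetical, heavily truncated name classes (coordinates from the results table).
NAME_CLASSES: Dict[str, Dict[str, Coord]] = {
    "L1": {"Manly": (-33.8042, 151.2905), "Petersham": (-33.8946, 151.1549)},
    "L2": {"Sydney": (-33.8651, 151.2099), "Newcastle": (-32.9167, 151.7500)},
    "L3": {"New South Wales": None, "NSW": None, "Australia": None},
}


def first_match(text: str, level: str) -> Optional[Tuple[str, Coord]]:
    """Return the earliest name of `level` found in `text`; later ones are ignored."""
    best: Optional[Tuple[int, str, Coord]] = None
    for name, coord in NAME_CLASSES[level].items():
        # Whole-word, case-insensitive search (an assumption of this sketch).
        m = re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE)
        if m and (best is None or m.start() < best[0]):
            best = (m.start(), name, coord)
    if best is None:
        return None
    return best[1], best[2]
```

Returning only one match per class mirrors the limitation noted in Section 6, where additional references of the same class in an element are ignored.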
4.2.2. Location Scoring and Assignment
- be the textual content of a tweet
- be the profile location field of a tweet
- be the place label field of a tweet
- be a location-name class
- The final location of a tweet is the extracted location that belongs to the finest granularity level.
- If more than one element yields a location at the same granularity level, the final location is assigned in the following order of importance:
  - Content-based location
  - Place-label-based location
  - Profile-based location
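The assignment rules above amount to a two-key ordering: granularity first, then the order of importance of the elements. The following is a minimal sketch, assuming each element has already been reduced to at most one (name class, location name, coordinates) candidate; `assign_location` and the rank tables are hypothetical names.

```python
from typing import Dict, Optional, Tuple

Coord = Optional[Tuple[float, float]]  # (latitude, longitude); None for L3
Candidate = Tuple[str, str, Coord]     # (name class, location name, coordinates)

# Lower rank wins: granularity is compared first, then the order of importance.
LEVEL_RANK = {"L1": 0, "L2": 1, "L3": 2}             # suburb, city, administrative
SOURCE_RANK = {"text": 0, "place": 1, "profile": 2}  # content, place label, profile


def assign_location(
    found: Dict[str, Optional[Candidate]],
) -> Optional[Tuple[str, Candidate]]:
    """Pick the final (element, candidate) pair, or None if nothing was found."""
    candidates = [(src, cand) for src, cand in found.items() if cand is not None]
    if not candidates:
        return None  # no location inferred for this tweet
    return min(
        candidates,
        key=lambda sc: (LEVEL_RANK[sc[1][0]], SOURCE_RANK[sc[0]]),
    )
```

A suburb named only in the profile therefore still beats a city given by the place label, while two suburb-level results are resolved in favour of the tweet text.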
5. Results and Evaluation
6. Discussion, Conclusions and Future Work
- When an element (e.g., the tweet text) contains multiple location references belonging to the same location name class, the method detects only the first and ignores the others. A closer investigation of a selected sample of tweets shows that about 1% of tweets may contain multiple location references of the same class (e.g., several suburb names), which are most likely neighbouring. Although this proportion is small enough to be considered negligible for the performance and accuracy of the method, future developments should handle such cases more thoroughly.
- The method cannot handle location references that appear in a location-related element of a tweet but are absent from the location name classes. Resolving this issue could increase the overall success rate of the method.
- The method is designed for English tweets and may not be applicable to tweets in other languages, especially those written in non-ASCII scripts (e.g., Arabic and Chinese).
Acknowledgments
Author Contributions
Conflicts of Interest
Abbreviations
API | Application Programming Interface |
ASCII | American Standard Code for Information Interchange |
CDMPS | Centre for Disaster Management and Public Safety |
GPS | Global Positioning System |
JSON | JavaScript Object Notation |
References
- BBC. How the Paris Attacks Unfolded on Social Media. Available online: http://www.bbc.com/news/blogs-trending-34836214 (accessed on 23 November 2015).
- South, J.A. Interactive Emergency Information and Identification Systems and Methods. U.S. Patent 20,150,111,524, 23 April 2015.
- Steiger, E.; Albuquerque, J.P.; Zipf, A. An advanced systematic literature review on spatiotemporal analyses of twitter data. In Transactions in GIS; Wiley Online Library: Hoboken, NJ, USA, 2015; pp. 809–834.
- Williams, S.A.; Terras, M.M.; Warwick, C. What do people study when they study twitter? Classifying twitter related academic papers. J. Doc. 2013, 69, 384–410.
- Heinzelman, J.; Waters, C. Crowdsourcing Crisis Information in Disaster-Affected Haiti; US Institute of Peace Press: Washington, DC, USA, 2010.
- Mansourian, A.; Rajabifard, A.; Valadan Zoej, M.J.; Williamson, I. Using SDI and web-based system to facilitate disaster management. Comput. Geosci. 2006, 32, 303–315.
- Poser, K.; Dransch, D. Volunteered geographic information for disaster management with application to rapid flood damage estimation. Geomatica 2010, 64, 89–98.
- Twitter. Twitter Blog: Location, Location, Location. Available online: https://blog.twitter.com/2009/location-location-location (accessed on 12 October 2015).
- Cheng, Z.; Caverlee, J.; Lee, K. You are where you tweet: A content-based approach to geo-locating twitter users. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada, 26–30 October 2010; pp. 759–768.
- Morstatter, F.; Pfeffer, J.; Liu, H.; Carley, K.M. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose; Cornell University arXiv: Ithaca, NY, USA, 2013.
- Bureau of Meteorology. Monthly Weather Review Australia April 2015. Available online: http://www.bom.gov.au/climate/mwr/aus/mwr-aus-201504.pdf (accessed on 21 October 2015).
- Paul, M.J.; Dredze, M. You are what you tweet: Analyzing twitter for public health. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, Spain, 17–21 July 2011.
- Ciulla, F.; Mocanu, D.; Baronchelli, A.; Gonçalves, B.; Perra, N.; Vespignani, A. Beating the news using social media: The case study of American Idol. EPJ Data Sci. 2012, 1, 1–11.
- Skoric, M.; Poor, N.; Achananuparp, P.; Lim, E.-P.; Jiang, J. Tweets and votes: A study of the 2011 Singapore general election. In Proceedings of the 45th Hawaii International Conference on System Science (HICSS), Maui, HI, USA, 4–7 January 2012; pp. 2583–2591.
- Oku, K.; Ueno, K.; Hattori, F. Mapping geotagged tweets to tourist spots for recommender systems. In Proceedings of the IIAI 3rd International Conference on Advanced Applied Informatics (IIAIAAI), Kitakyushu, Japan, 31 August–4 September 2014; pp. 789–794.
- Sakaki, T.; Okazaki, M.; Matsuo, Y. Earthquake shakes twitter users: Real-Time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 851–860.
- Ajao, O.; Hong, J.; Liu, W. A survey of location inference techniques on twitter. J. Inf. Sci. 2015, 41, 855–864.
- Eisenstein, J.; O’Connor, B.; Smith, N.A.; Xing, E.P. A latent variable model for geographic lexical variation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Stroudsburg, PA, USA, 27–29 July 2010; pp. 1277–1287.
- Wing, B.P.; Baldridge, J. Simple supervised document geolocation with geodesic grids. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 955–964.
- Watanabe, K.; Ochi, M.; Okabe, M.; Onai, R. Jasmine: A real-time local-event detection system based on geolocation information propagated to microblogs. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, Glasgow, Scotland, 24–28 October 2011; pp. 2541–2544.
- Dalvi, N.; Kumar, R.; Pang, B. Object matching in tweets with spatial models. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, Seattle, WA, USA, 8–12 February 2012; pp. 43–52.
- Han, B.; Cook, P.; Baldwin, T. Text-based twitter user geolocation prediction. J. Artif. Intell. Res. 2014, 49, 451–500.
- Minot, A.S.; Heier, A.; King, D.; Simek, O.; Stanisha, N. Searching for twitter posts by location. In Proceedings of the 2015 International Conference on The Theory of Information Retrieval, Northampton, MA, USA, 27–30 September 2015; pp. 357–360.
- Hecht, B.; Hong, L.; Suh, B.; Chi, E.H. Tweets from Justin Bieber’s heart: The dynamics of the location field in user profiles. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Vancouver, BC, Canada, 7–12 May 2011; pp. 237–246.
- Hiruta, S.; Yonezawa, T.; Jurmu, M.; Tokuda, H. Detection, classification and visualization of place-triggered geotagged tweets. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, Pittsburgh, PA, USA, 5–8 September 2012; pp. 956–963.
- Schulz, A.; Hadjakos, A.; Paulheim, H.; Nachtwey, J.; Mühlhäuser, M. A multi-indicator approach for geolocalization of tweets. In Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media, Cambridge, MA, USA, 8–11 July 2013.
- Twitter. Twitter Developers Documentation. Available online: https://dev.twitter.com/overview/documentation (accessed on 21 October 2015).
- Dataminr. Available online: https://www.dataminr.com/ (accessed on 16 January 2016).
- GNIP. Available online: https://www.gnip.com/ (accessed on 16 January 2016).
- DATASIFT. Available online: http://www.datasift.com/ (accessed on 16 January 2016).
- Australian Bureau of Statistics. Available online: http://www.abs.gov.au/ (accessed on 16 January 2016).
- Zekavat, R.; Buehrer, R.M. Handbook of Position Location: Theory, Practice and Advances; John Wiley & Sons: Hoboken, NJ, USA, 2011.
- Rick, D. Deriving the Haversine Formula. Available online: http://mathforum.org/library/drmath/view/51879.html (accessed on 16 January 2016).
Label | Element | Description |
---|---|---|
A | .\user\location | Nullable. The user-defined location for this account’s profile. Not necessarily a location nor parsable. |
B | .\user\geo_enabled | When true, indicates that the user has enabled the possibility of geotagging their Tweets. This field must be true for the current user to attach geographic data. |
C | .\geo | Deprecated. Nullable. The “coordinates” field can be used instead. |
D | .\coordinates | Nullable. Represents the geographic location of this Tweet as reported by the user or client application. The inner coordinates array is formatted as longitude first, then latitude. |
E | .\place | Nullable. When present, indicates that the tweet is associated with (but not necessarily originated from) a Place. |
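The elements in the table above can be read from a decoded tweet JSON object as sketched below. This assumes the v1.1 tweet object layout described in the Twitter developer documentation; `location_elements` is a hypothetical helper, and using the place's `full_name` attribute as the place label is an assumption, since the attribute used by the method is not stated. The `coordinates` array is swapped from (longitude, latitude) into (latitude, longitude) order.

```python
from typing import Any, Dict, Optional, Tuple


def location_elements(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Collect location-related elements A, B, D and E from one decoded tweet."""
    user = tweet.get("user") or {}
    place = tweet.get("place")
    coords = tweet.get("coordinates")
    latlon: Optional[Tuple[float, float]] = None
    if coords and coords.get("coordinates"):
        lon, lat = coords["coordinates"]  # stored longitude first, then latitude
        latlon = (lat, lon)
    return {
        "profile_location": user.get("location"),      # A: free text, may be null
        "geo_enabled": bool(user.get("geo_enabled")),  # B
        "coordinates": latlon,                         # D (C, "geo", is deprecated)
        # E: assumption, the place's "full_name" serves as the place label
        "place_label": place.get("full_name") if place else None,
    }
```

Element C is skipped because the documentation marks it as deprecated in favour of D.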
No. | Tweet ID | Source | Location Name Class | Inferred Location Name | Inferred Latitude | Inferred Longitude | Actual Latitude | Actual Longitude | Distance Error (km) |
---|---|---|---|---|---|---|---|---|---|
1 | 590334736905572352 | Place | L1 | Brighton-Le-Sands | −33.9583 | 151.1536 | −33.9697 | 151.1367 | 2.0105 |
2 | 590335052610936833 | Text | L1 | Manly | −33.8042 | 151.2905 | −33.7825 | 151.2847 | 2.4746 |
6 | 590338256392323072 | Profile Location | L1 | Sunshine | −33.1121 | 151.5619 | −32.9252 | 151.7733 | 28.6381 |
7 | 590338761805930498 | Place | L2 | Newcastle | −32.9167 | 151.7500 | −32.9242 | 151.7470 | 0.8836 |
8 | 590338765140332544 | - | - | NA | NA | NA | −33.9194 | 151.2526 | und |
9 | 590339563270184962 | Text | L1 | Sydenham | −33.9167 | 151.1680 | −33.9482 | 151.1401 | 4.3454 |
10 | 590341916333629441 | Text | L1 | Broke | −32.7681 | 151.0883 | −33.9174 | 151.2310 | 128.4835 |
11 | 590342183258968064 | Place | L2 | Central Coast | −33.2992 | 151.1922 | −33.3722 | 151.4796 | 27.9043 |
14 | 590351290875518976 | - | - | NA | NA | NA | −31.8964 | 152.4614 | und |
15 | 590351614130528256 | Text | L1 | Bulahdelah | −32.3868 | 152.1530 | −32.0242 | 152.4728 | 50.3128 |
16 | 592169625951014912 | Text | L1 | Petersham | −33.8946 | 151.1549 | −33.8963 | 151.1535 | 0.2267 |
17 | 592172876750409728 | Text | L1 | Wyong | −33.2778 | 151.4374 | −33.2688 | 151.4343 | 1.0432 |
18 | 592184074103566338 | Profile Location | L1 | Manly | −33.8042 | 151.2905 | −33.7744 | 151.2929 | 3.3290 |
20 | 592212076082405377 | Text | L1 | Petersham | −33.8946 | 151.1549 | −33.8964 | 151.1532 | 0.2534 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
2399 | 592217778284814338 | Text | L1 | Rhodes | −33.8292 | 151.0877 | −33.8870 | 151.1791 | 10.6023 |
2400 | 592218784267567104 | Profile Location | L1 | The Entrance | −33.3450 | 151.4957 | −33.3384 | 151.4958 | 0.7343 |
2401 | 592221934223368192 | Text | L1 | Manly | −33.8042 | 151.2905 | −33.7679 | 151.1065 | 17.4738 |
2402 | 592204745135108096 | Place | L3 | New South Wales | NA | NA | −30.8144 | 152.5375 | und |
2403 | 592228305371172864 | Place | L2 | Sydney | −33.8651 | 151.2099 | −33.7191 | 150.8924 | 33.5307 |
2404 | 592228457842544640 | Place | L2 | Newcastle | −32.9167 | 151.7500 | −32.9340 | 151.7250 | 3.0236 |
2405 | 592263688632963072 | Text | L1 | Rooty Hill | −33.7733 | 150.8401 | −33.8580 | 151.0340 | 20.2390 |
2406 | 592267839433637888 | Place | L2 | Sydney | −33.8651 | 151.2099 | −33.8663 | 151.0465 | 15.0855 |
2407 | 592269376172109824 | Text | L1 | Rosebery | −33.9189 | 151.2048 | −33.7890 | 151.0849 | 18.1999 |
2408 | 592281403053764608 | Text | L1 | Rhodes | −33.8292 | 151.0877 | −33.8866 | 151.1787 | 10.5456 |
2409 | 592283790472445952 | Text | L1 | Maroubra | −33.9440 | 151.2443 | −33.9556 | 151.2249 | 2.2034 |
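The Distance Error column can be approximately reproduced with the haversine great-circle formula, a derivation of which appears in the reference list. In this sketch the mean Earth radius of 6371 km is an assumption (the radius used in the evaluation is not stated), and `haversine_km` is a hypothetical name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

For row 10 (Broke), `haversine_km(-32.7681, 151.0883, -33.9174, 151.2310)` gives roughly 128.5 km, close to the reported 128.4835 km; small differences in other rows can come from the four-decimal rounding of the listed coordinates. Rows without inferred coordinates are reported as `und` rather than passed to the function.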
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
Laylavi, F.; Rajabifard, A.; Kalantari, M. A Multi-Element Approach to Location Inference of Twitter: A Case for Emergency Response. ISPRS Int. J. Geo-Inf. 2016, 5, 56. https://doi.org/10.3390/ijgi5050056