This research focuses on detecting tourist flows in the Province of Styria, Austria, based on crowdsourced data. Twitter data were collected for the period from 2008 to August 2018. The extracted tweets were subjected to an extensive filtering process in the non-relational database MongoDB. Hotspot Analysis and Kernel Density Estimation were applied to investigate the spatial distribution of tourism-relevant tweets under temporal variations. In addition, an integrated semantic analysis using the VADER method provided the sentiments of the extracted tweets. The spatial analyses showed that the detected hotspots correspond to typical Styrian tourist areas. The sentiment analysis was largely successful, but it also exposed the difficulty of working with multilingual data. For evaluation purposes, official tourism data from the Province of Styria and the federal Statistical Office of Austria served as ground truth. An evaluation with Pearson's correlation coefficient showed a statistically significant correlation between the Twitter data and the reference data. In particular, the paper shows that crowdsourced data at a regional level can serve as an accurate indicator of the behaviour and movement of users.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.