Technological advances have opened up numerous new data sources, and geo-tagged photo metadata has become a source of large-scale research data for tourism studies. A series of data processing methods centering on the various types of information contained in geo-tagged photo metadata have been proposed, which has advanced tourism research based on such data. However, the data processing methods needed to predict tourist flows from geo-tagged photo metadata have not yet been studied in depth. To obtain accurate proxy data for inbound tourist flows in cities, this paper introduces and designs several methods, including data screening, text similarity calculation, geographical location clustering, and time series modelling, which together form a data preprocessing model for urban inbound tourist flows based on geo-tagged photo metadata. Specifically, an entropy filtering method was introduced to help determine whether the data were posted by inbound tourists; tag text similarity was calculated to judge whether an inbound visitor's activities were related to tourism; an efficient clustering method based on geographic grid partitioning was designed for cases in which the tag values were empty; finally, the time series of inbound tourist flows for a given region and period was obtained through data statistics and normalization. For the empirical study, Beijing, China, was selected as the research case, and the feasibility and accuracy of the proposed methods were verified through correlation analysis between Flickr data and actual statistical yearbook data, as well as through analysis of the prediction results of a machine learning algorithm. The data preprocessing method introduced and designed in this paper provides a reference for the study of geo-tagged photo metadata in the field of tourism flow prediction.
These methods can effectively extract inbound tourist flow data from geo-tagged photo metadata, thus providing a novel, reliable, and low-cost research data source for urban inbound tourism flow forecasting.
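As a rough illustration of the last two preprocessing steps, the following minimal sketch assigns photos to geographic grid cells and builds a normalized monthly series. It is not the implementation used in this paper: the record fields (`user`, `lat`, `lon`, `taken`), the cell size of 0.01 degrees, the counting of distinct users per month, and the choice of min-max normalization are all assumptions made for the example.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to the (row, col) index of a square grid cell.

    cell_deg is a hypothetical cell size in degrees.
    """
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def cluster_by_grid(photos, cell_deg=0.01):
    """Group photo records by grid cell; photos in one cell form one cluster."""
    cells = defaultdict(list)
    for p in photos:
        cells[grid_cell(p["lat"], p["lon"], cell_deg)].append(p)
    return dict(cells)


def monthly_visitor_counts(photos):
    """Count distinct users per month, keyed by 'YYYY-MM'.

    Assumes 'taken' is an ISO date string such as '2015-01-03'.
    """
    users = defaultdict(set)
    for p in photos:
        users[p["taken"][:7]].add(p["user"])
    return {month: len(u) for month, u in sorted(users.items())}


def min_max_normalize(series):
    """Rescale the values of a {key: number} series to the range [0, 1]."""
    lo, hi = min(series.values()), max(series.values())
    if hi == lo:
        # A constant series carries no variation; map every value to 0.0.
        return {k: 0.0 for k in series}
    return {k: (v - lo) / (hi - lo) for k, v in series.items()}
```

In this sketch, grid assignment takes a single pass over the records and needs no pairwise distance computation, which is one reason a grid-based approach can be efficient for large photo collections.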
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.