Social media is a popular source of volunteered geographic information because it offers large volumes of real-time data; however, using social media data for geospatial analysis is challenging because complex semantic filters are required to aggregate geographic messages from the data streams. This article proposes a new query expansion method for social media streams that periodically updates the query keywords with words extracted from the preceding search results. The proposed method optimizes the trade-off between precision and coverage of geographic messages by accounting for the influence of the number of keywords and the refresh cycle on the query process, and it improves on the classic Term Frequency-Inverse Document Frequency (TF-IDF) method for short texts. Furthermore, a number of filters based on relevance to the target topic were established and tested. The method was tested on a Twitter dataset covering the geographic extent of Macau in August 2017, when two consecutive typhoons struck. The results support its effectiveness: precision remains controllable, and the amount of relevant information retrieved increases considerably. Moreover, the query keywords adapt to the local language environment by discovering new keywords. In conclusion, this query expansion method provides a reliable approach to social media-based information retrieval.
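The periodic keyword refresh described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain TF-IDF score rather than the improved short-text variant, omits the relevance filters, and assumes that the seed keywords are always retained. The names `tokenize`, `top_terms`, `expand_queries`, the parameter `k` (number of new keywords per refresh), and the sample messages are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into tokens (a simplistic, hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9#']+", text.lower())


def top_terms(matched, background, k, exclude):
    """Rank terms by TF in the matched messages times IDF over the whole window.

    Plain TF-IDF is an assumption here; the paper's short-text variant is not shown.
    """
    tf = Counter(t for msg in matched for t in tokenize(msg))
    df = Counter(t for msg in background for t in set(tokenize(msg)))
    n = len(background)
    scores = {
        t: count * math.log((1 + n) / (1 + df[t]))
        for t, count in tf.items()
        if t not in exclude
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def expand_queries(windows, seed_keywords, k):
    """Search each window with the current keywords, then refresh the keywords.

    Each element of `windows` holds the messages of one refresh cycle.
    Returns a list of (keywords used, matched messages), one entry per window.
    """
    seeds = set(seed_keywords)
    keywords = set(seeds)
    history = []
    for window in windows:
        matched = [m for m in window if keywords & set(tokenize(m))]
        history.append((sorted(keywords), matched))
        if matched:
            # Assumption: seeds are kept and k new terms are added each cycle.
            keywords = seeds | set(top_terms(matched, window, k, exclude=seeds))
    return history


# Made-up messages, grouped into two refresh cycles.
windows = [
    ["typhoon hato is coming", "typhoon hato flooding downtown",
     "nice lunch today", "lunch with friends today"],
    ["hato knocked out power", "typhoon warning signal raised", "coffee time"],
]
for used, matched in expand_queries(windows, ["typhoon"], k=1):
    print(used, matched)
```

In this sketch, a larger `k` or a shorter window widens coverage at the cost of admitting noisier keywords, which is the precision-coverage trade-off the method is designed to balance.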
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.