Geographic information, such as place names with their latitude and longitude (lat/long), is useful to understand what belongs where. Traditionally, Gazetteers, which are constructed manually by experts, are used as dictionaries containing such geographic information. Recently, since people often post about their current experiences in a short text format to microblogs, their geotagged (tagged with lat/long information) posts are aggregated to automatically construct geographic dictionaries containing more diverse types of information, such as local products and events. Generally, the geotagged posts are collected within a certain time interval. Then, the spatial locality of every word used in the collected geotagged posts is examined to obtain the local words
, representing places, events, etc., which are observed at specific locations by the users. However, focusing on a specific time interval limits the diversity and accuracy of the extracted local words. Further, bot accounts in microblogs can largely affect the spatial locality of the words used in their posts. In order to handle such problems, we propose an online method for continuously update the geographic dictionary by adaptively determining suitable time intervals for examining the spatial locality of each word. The proposed method further filters out the geotagged posts from bot accounts based on the content similarity among their posts to improve the quality of extracted local words. The constructed geographic dictionary is compared with different geographic dictionaries constructed by experts, crowdsourcing, and automatically by focusing on a specific time interval to evaluate its quality.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited