Sample data plays an important role in land cover (LC) map validation. Traditionally, they are collected through field survey or image interpretation, either of which is costly, labor-intensive and time-consuming. In recent years, massive geo-tagged texts are emerging on the web and they contain valuable information for LC map validation. However, this kind of special textual data has seldom been analyzed and used for supporting LC map validation. This paper examines the potential of geo-tagged web texts as a new cost-free sample data source to assist LC map validation and proposes an active data collection approach. The proposed approach uses a customized deep web crawler to search for geo-tagged web texts based on land cover-related keywords and string-based rules matching. A data transformation based on buffer analysis is then performed to convert the collected web texts into LC sample data. Using three provinces and three municipalities directly under the Central Government in China as study areas, geo-tagged web texts were collected to validate artificial surface class of China’s 30-meter global land cover datasets (GlobeLand30-2010). A total of 6283 geo-tagged web texts were collected at a speed of 0.58 texts per second. The collected texts about built-up areas were transformed into sample data. User’s accuracy of 82.2% was achieved, which is close to that derived from formal expert validation. The preliminary results show that geo-tagged web texts are valuable ancillary data for LC map validation and the proposed approach can improve the efficiency of sample data collection.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited