The Internet is increasingly a source of data for geographic information systems, as more data becomes linked, available through application programing interfaces (APIs), and more tools become available for handling unstructured web data. While many web data extraction and structuring methods exist, there are few examples of comprehensive data processing and analysis systems that link together these tools for geographic analyses. This paper develops a general approach to the development of spatial information context from unstructured and informal web data sources through the joint analysis of the data’s thematic, spatial, and temporal properties. We explore the utility of this derived contextual information through a case study into maritime surveillance. Extraction and processing techniques such as toponym extraction, disambiguation, and temporal information extraction methods are used to construct a semi-structured maritime context database supporting global scale analysis. Geographic, temporal, and thematic content were analyzed, extracted and processed from a list of information sources. A geoweb interface is developed to allow user visualization of extracted information, as well as to support space-time database queries. Joint keyword clustering and spatial clustering methods are used to demonstrate extraction of documents that relate to real world events in official vessel information data. The quality of contextual geospatial information sources is evaluated in reference to known maritime anomalies obtained from authoritative sources. The feasibility of automated context extraction using the proposed framework and linkage to external data using standard clustering tools is demonstrated.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.