Sustainability 2014, 6(4), 1896-1912; doi:10.3390/su6041896
Article

Using Web Crawler Technology for Geo-Events Analysis: A Case Study of the Huangyan Island Incident

Hao Hu 1, Yuejing Ge 1,* and Dongyang Hou 2
Received: 21 February 2014; in revised form: 26 March 2014 / Accepted: 26 March 2014 / Published: 9 April 2014
(This article belongs to the Special Issue Borderland Studies and Sustainability)
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract: Social networking and network socialization bring abundant text information and social relationships into our daily lives. Making full use of these data in the big data era is of great significance for better understanding the changing world and the information-based society. Although politics has been integrally involved in hyperlinked-world issues since the 1990s, the text analysis and data visualization of geo-events have been held back by the bottleneck of traditional manual analysis. Although the automatic assembly of geospatial web services and distributed geospatial information systems through service chaining has recently been explored and built, data mining and information collection are still not comprehensive enough because of the sensitivity, complexity, interrelatedness, timeliness, and unexpectedness of political events. Using a web crawler built on the Heritrix framework together with text analysis of web content, we studied the word frequency, sentiment tendency, and dissemination path of the Huangyan Island incident. The results indicate that the tag cloud, frequency map, attitudes pie chart, individual mention ratios, and dissemination flow graph derived from the crawled and processed information not only highlight the characteristics of the geo-event itself but also reveal many interesting phenomena and deep-seated problems behind it, such as related topics, theme vocabularies, subject contents, hot countries, event bodies, opinion leaders, high-frequency vocabularies, information sources, semantic structure, propagation paths, the distribution of different attitudes, and regional differences in net citizens' responses to the Huangyan Island incident. Furthermore, text analysis of network information with the help of a focused web crawler can express both the time-space relationships of the crawled information and the semantic-network characteristics of the information related to geo-events.
Therefore, it is a useful tool for collecting information to understand the formation and diffusion of web-based public opinion in political events.
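The abstract names a focused web crawler built on Heritrix as the collection step but does not describe its crawl strategy. The sketch below illustrates only the general focused-crawling idea under stated assumptions; it is not the authors' Heritrix configuration. It runs a best-first crawl over a small hypothetical in-memory link graph (`WEB`), scores each page by the share of hypothetical topic keywords (`TOPIC`) it contains, and follows outgoing links only from pages that pass a relevance threshold, so off-topic branches are pruned. The names `WEB`, `TOPIC`, `relevance`, and `focused_crawl` and the threshold value are illustrative.

```python
import heapq
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler such as Heritrix would fetch these pages over HTTP instead.
WEB = {
    "news/index": ("World news today", ["news/huangyan", "news/sports"]),
    "news/huangyan": ("Huangyan Island standoff between China and the Philippines",
                      ["news/scarborough", "news/weather"]),
    "news/scarborough": ("Scarborough Shoal dispute: Philippines and China vessels", []),
    "news/sports": ("Football results", []),
    "news/weather": ("Weather forecast", []),
}

# Hypothetical topic vocabulary used to judge relevance.
TOPIC = {"huangyan", "island", "scarborough", "shoal", "philippines", "china"}


def relevance(text):
    """Fraction of topic keywords that occur in the page text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return len(tokens & TOPIC) / len(TOPIC)


def focused_crawl(seed, threshold=0.3, max_pages=100):
    """Best-first crawl: pop the most promising URL first and expand
    links only from the seed or from pages judged on-topic."""
    frontier = [(0.0, seed)]  # (negated priority, url); heapq is a min-heap
    seen = {seed}
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = WEB.get(url, ("", []))
        fetched += 1
        score = relevance(text)
        if score >= threshold:
            relevant.append(url)
        if score >= threshold or url == seed:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    # A child link inherits its parent's score as its priority.
                    heapq.heappush(frontier, (-score, link))
    return relevant


print(focused_crawl("news/index"))
```

On this toy graph, `focused_crawl("news/index")` returns `["news/huangyan", "news/scarborough"]`: the sports and weather pages are fetched but neither kept nor expanded. A production crawler would replace the dictionary lookup with HTTP fetching, politeness delays, and robots.txt handling.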
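The tag cloud, frequency map, and attitudes pie chart mentioned in the abstract rest on two simple aggregates over the crawled texts: term counts and the share of texts expressing each attitude. The abstract does not state which segmentation or sentiment method was used, so the sketch below swaps in a plain lexicon-based approach as an assumption: tokens are counted after dropping stopwords, and each text is labeled positive, negative, or neutral by the balance of hits against tiny hypothetical word lists (`STOPWORDS`, `POSITIVE`, `NEGATIVE`). It assumes English, space-delimited text; Chinese posts would first need word segmentation. The sample posts are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list and sentiment lexicons (assumptions, not the paper's).
STOPWORDS = {"the", "a", "an", "and", "at", "for", "of", "to", "in", "on", "over", "is"}
POSITIVE = {"support", "peaceful", "agree", "welcome"}
NEGATIVE = {"protest", "condemn", "illegal", "angry"}


def tokenize(text):
    """Lowercase and split into alphabetic tokens (English-only assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs, top_n=10):
    """Count non-stopword tokens across all texts, e.g. to size tag-cloud words."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc) if tok not in STOPWORDS)
    return counts.most_common(top_n)


def attitude(doc):
    """Label one text by (positive lexicon hits - negative lexicon hits)."""
    toks = tokenize(doc)
    score = sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def attitude_shares(docs):
    """Fraction of texts per attitude label, i.e. the slices of a pie chart."""
    tally = Counter(attitude(d) for d in docs)
    total = len(docs) or 1  # avoid division by zero on an empty corpus
    return {label: tally[label] / total for label in ("positive", "neutral", "negative")}


posts = [
    "We condemn the illegal intrusion at Huangyan Island",
    "Hope for a peaceful solution over Huangyan Island",
    "Fishermen returned to port on Tuesday",
    "Angry protest over Huangyan Island",
]
print(term_frequencies(posts, top_n=2))
print(attitude_shares(posts))
```

For these four sample posts, the top terms are `huangyan` and `island` (3 occurrences each), and the attitude shares are 25% positive, 25% neutral, and 50% negative, which is the kind of distribution an attitudes pie chart would plot.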
Keywords: web crawler technology; text information; sentiment analysis; Huangyan Island Incident
MDPI and ACS Style

Hu, H.; Ge, Y.; Hou, D. Using Web Crawler Technology for Geo-Events Analysis: A Case Study of the Huangyan Island Incident. Sustainability 2014, 6, 1896-1912.

AMA Style

Hu H, Ge Y, Hou D. Using Web Crawler Technology for Geo-Events Analysis: A Case Study of the Huangyan Island Incident. Sustainability. 2014; 6(4):1896-1912.

Chicago/Turabian Style

Hu, Hao; Ge, Yuejing; Hou, Dongyang. 2014. "Using Web Crawler Technology for Geo-Events Analysis: A Case Study of the Huangyan Island Incident." Sustainability 6, no. 4: 1896-1912.

Sustainability EISSN 2071-1050 Published by MDPI AG, Basel, Switzerland