Abstract: Social networking and network socialization bring abundant text information and social relationships into our daily lives. Making full use of these data in the big data era is of great significance for better understanding the changing world and the information-based society. Although politics has been deeply involved in issues of the hyperlinked world since the 1990s, the text analysis and data visualization of geo-events have been held back by the bottleneck of traditional manual analysis. Automatic assembly of different geospatial Web and distributed geospatial information systems through service chaining has recently been explored and built, but data mining and information collection are still not comprehensive enough because of the sensitivity, complexity, interrelatedness, timeliness, and unexpectedness of political events. Building on the Heritrix framework and the analysis of web-based text, this study used web crawler technology and text analysis to examine the word frequency, sentiment tendency, and dissemination path of the Huangyan Island incident. The results indicate that the tag cloud, frequency map, attitude pie chart, individual mention ratios, and dissemination flow graph, derived from the crawled information and data processing, not only highlight the characteristics of the geo-event itself but also reveal many interesting phenomena and deep-seated problems behind it. These include related topics, theme vocabularies, subject contents, hot countries, event bodies, opinion leaders, high-frequency vocabularies, information sources, semantic structure, propagation paths, the distribution of different attitudes, and regional differences in netizens' responses to the Huangyan Island incident. Furthermore, text analysis of network information with the help of a focused web crawler can express both the time-space relationships of the crawled information and the semantic-network characteristics of the information related to geo-events. It is therefore a useful tool for collecting information to understand the formation and diffusion of web-based public opinion in political events.
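The abstract names a focused web crawler built on Heritrix but does not state its relevance rule. Below is a minimal Python sketch of the general focused-crawling idea followed by word counting. It is not the authors' implementation. It assumes a hypothetical in-memory page graph (`PAGES`) in place of HTTP fetching, a hypothetical topic vocabulary (`TOPIC_TERMS`), and a simple keyword-density relevance score with an illustrative threshold of 0.2.

```python
import re
from collections import Counter, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler (e.g. Heritrix) would fetch pages over HTTP instead.
PAGES = {
    "a": ("Huangyan Island standoff: vessels remain near the island", ["b", "c"]),
    "b": ("Island dispute draws protest and strong criticism", ["d"]),
    "c": ("Celebrity gossip and sports scores", ["e"]),
    "d": ("Diplomats discuss the Huangyan Island dispute", []),
    "e": ("Island dispute mention on an otherwise unrelated site", []),
}

# Hypothetical topic vocabulary used to judge relevance (an assumption).
TOPIC_TERMS = {"huangyan", "island", "dispute", "standoff"}


def tokenize(text):
    """Lower-case ASCII word tokens; other scripts would need a proper segmenter."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(text):
    """Fraction of tokens that belong to the topic vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in TOPIC_TERMS for t in tokens) / len(tokens)


def focused_crawl(seed, threshold=0.2, max_pages=100):
    """Breadth-first crawl that keeps relevant pages and expands only their links."""
    queue, seen, kept = deque([seed]), {seed}, []
    while queue and len(kept) < max_pages:
        url = queue.popleft()
        text, links = PAGES.get(url, ("", []))
        if relevance(text) < threshold:
            continue  # off-topic: neither keep the page nor follow its links
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return kept


def term_frequencies(urls):
    """Word counts over the kept pages, e.g. as raw input for a tag cloud."""
    return Counter(t for u in urls for t in tokenize(PAGES[u][0]))


if __name__ == "__main__":
    kept = focused_crawl("a")
    print(kept)  # ['a', 'b', 'd']
    print(term_frequencies(kept).most_common(5))
```

Because page `c` falls below the threshold, its links are never followed, so page `e` is not reached even though it mentions the topic. This pruning trades recall for a smaller, on-topic crawl. A production pipeline would typically also remove stop words (such as "the") and, for Chinese-language text, perform word segmentation before building a tag cloud.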
Keywords: web crawler technology; text information; sentiment analysis; Huangyan Island Incident
Hu, H.; Ge, Y.; Hou, D. Using Web Crawler Technology for Geo-Events Analysis: A Case Study of the Huangyan Island Incident. Sustainability 2014, 6, 1896-1912.