A Varied Density-based Clustering Approach for Event Detection from Heterogeneous Twitter Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Case-Study and Data Collection
2.2. Text Processing
2.3. VDCT
2.3.1. Text Similarity
2.3.2. Clustering
2.4. Quality Measures
3. Results and Discussion
3.1. Parameter Selection
3.2. Quality Measures’ Results
3.3. Visual Comparison and Discussion
4. Conclusions and Future Works
Author Contributions
Funding
Conflicts of Interest
Clustering Algorithm | Davies–Bouldin Index | Dunn Index | Silhouette Index |
---|---|---|---|
VDCT | 212.893 | 0.721 | 0.643 |
DBSCAN | 242.674 | 0.653 | 0.426 |
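For orientation, a lower Davies–Bouldin value and higher Dunn and Silhouette values indicate more compact, better-separated clusters, so VDCT scores better than DBSCAN on all three indices in the table. The following is a minimal sketch of how these three indices are commonly defined, not the authors' implementation. It assumes plain Euclidean distance on toy 2-D points (the paper's own similarity measure may differ), and the function names `group`, `davies_bouldin`, `dunn`, and `silhouette` are hypothetical. Values produced here are not comparable to the table.

```python
from itertools import combinations
from math import dist


def group(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def davies_bouldin(points, labels):
    """Mean, over clusters, of the worst scatter-to-separation ratio."""
    clusters = group(points, labels)
    centroids = {k: tuple(sum(axis) / len(c) for axis in zip(*c))
                 for k, c in clusters.items()}
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = {k: sum(dist(p, centroids[k]) for p in c) / len(c)
               for k, c in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)


def dunn(points, labels):
    """Smallest between-cluster distance over largest cluster diameter."""
    pairs = list(combinations(zip(points, labels), 2))
    min_between = min(dist(p, q) for (p, lp), (q, lq) in pairs if lp != lq)
    max_within = max(dist(p, q) for (p, lp), (q, lq) in pairs if lp == lq)
    return min_between / max_within


def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    clusters = group(points, labels)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster
        # (dist(p, p) is 0, so dividing by len(own) - 1 excludes p itself).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in c) / len(c)
                for k, c in clusters.items() if k != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (an assumption for illustration): two tight, distant groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels = [0, 0, 0, 1, 1, 1]

db_value = davies_bouldin(toy_points, toy_labels)
dunn_value = dunn(toy_points, toy_labels)
silhouette_value = silhouette(toy_points, toy_labels)
print(db_value, dunn_value, silhouette_value)
```

The sketch requires at least two clusters and at least one cluster with more than one point. On well-separated toy data like this, the Davies–Bouldin value is close to 0 and the silhouette is close to 1, which matches the direction of "better" described above.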
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Ghaemi, Z.; Farnaghi, M. A Varied Density-based Clustering Approach for Event Detection from Heterogeneous Twitter Data. ISPRS Int. J. Geo-Inf. 2019, 8, 82. https://doi.org/10.3390/ijgi8020082