Find You: Multi-View-Based Location Inference for Twitter Users
Abstract
1. Introduction
- We propose a local celebrity discovery algorithm that preserves the location-indicative role of local celebrities in the social relationship network.
- We represent the entire corpus as a heterogeneous graph, leveraging local and global correlations between words to extract user tweet features.
- We introduce MVGeo, a multi-view-based location inference model, which demonstrates superior performance in experimental results on public datasets.
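The second contribution describes a corpus-level heterogeneous graph built from word correlations. As a rough illustration only, the sketch below follows the text-graph construction of Yao et al. [32]: document-word edges weighted by TF-IDF and word-word edges weighted by positive pointwise mutual information (PMI) over sliding windows. Reading TF-IDF as the "local" signal and PMI as the "global" one is an assumption made here; the function name `build_text_graph`, the whitespace tokenizer, and the window size are hypothetical and are not taken from MVGeo.

```python
import math
from collections import Counter
from itertools import combinations


def build_text_graph(docs, window=3):
    """Return {(node_a, node_b): weight} for a document-word graph.

    Hypothetical helper: nodes are named "doc:<index>" and "word:<token>".
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    edges = {}

    # Document-word edges weighted by TF-IDF (per-document signal).
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    for i, toks in enumerate(tokenized):
        for w, count in Counter(toks).items():
            weight = (count / len(toks)) * math.log(n_docs / doc_freq[w])
            if weight > 0:  # words present in every document carry no signal
                edges[(f"doc:{i}", f"word:{w}")] = weight

    # Collect sliding windows over every document (corpus-level signal).
    windows = []
    for toks in tokenized:
        if len(toks) <= window:
            windows.append(toks)
        else:
            windows.extend(toks[j:j + window]
                           for j in range(len(toks) - window + 1))

    # Word-word edges weighted by positive PMI of window co-occurrence.
    n_win = len(windows)
    single = Counter(w for win in windows for w in set(win))
    pair = Counter(p for win in windows
                   for p in combinations(sorted(set(win)), 2))
    for (a, b), count in pair.items():
        pmi = math.log(count * n_win / (single[a] * single[b]))
        if pmi > 0:
            edges[(f"word:{a}", f"word:{b}")] = pmi
    return edges
```

Calling `build_text_graph(["howdy yall austin", "howdy yall dallas"])` returns a dictionary of weighted edges that could then be loaded into a graph library or converted to an adjacency matrix for a graph neural network.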
2. Related Work
3. Proposed Methodology
3.1. Network View
3.2. Tweet-View Construction
3.3. Model Structure
Multi-View Fusion
4. Initialization
4.1. Datasets
4.2. Location Segmentation
4.3. Evaluation Indicators
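The results tables report three indicators: Acc@161, Median, and Mean. In the geolocation literature these conventionally denote the percentage of users predicted within 161 km (about 100 miles) of their true location, and the median and mean of the prediction error distance in kilometres. The following is a minimal sketch under that conventional definition, using the haversine great-circle distance; the helper names and the Earth radius constant are assumptions for illustration.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geolocation_metrics(true_coords, pred_coords, threshold_km=161.0):
    """Compute Acc@161 (percent), median error and mean error (both in km)."""
    errors = [haversine_km(*t, *p) for t, p in zip(true_coords, pred_coords)]
    within = sum(e <= threshold_km for e in errors)
    return {
        "acc@161": 100.0 * within / len(errors),
        "median_km": median(errors),
        "mean_km": mean(errors),
    }
```

A higher Acc@161 is better, while lower median and mean errors are better. The mean is more sensitive than the median to a few very distant mispredictions.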
4.4. Baseline Model
- MADCEL [27] combines text and network data with a logistic regression model to conduct location prediction.
- MLP4Geo [8] is a text-based model that improves prediction performance by employing dialectal terms. A simple MLP network is utilized for location prediction.
- MENET [30] is an architecture that integrates numerous tweet characteristics. To ensure a fair comparison, we use only its text and network inputs and exclude its metadata, such as timestamps.
- GeoAtt [34] models textual contexts using an RNN based on attention. We omit the location description from GeoAtt to ensure an equitable comparison.
- DCCA [12] is a multi-view geolocation model that uses Twitter text and network data.
- BiLSTM-C [35] is a text-view geolocation model that views user-generated content and its associated locations as sequences and infers locations using bidirectional long short-term memory (LSTM) and convolution operations.
- Attn [35] is an attentional memory network for the localization of social media messages. Its attentional message encoder selectively focuses on location-indicative terms to produce a differentiated message representation.
- SGC4Geo [36] is a simplified graph convolutional network that reduces the superfluous complexity of GCNs by removing non-linearities between GCN layers iteratively and collapsing the resulting function into a single linear transformation. It determines a user’s residence based on his/her social posts and connections.
- M-GCN [14] employs graph convolutional networks to extract user text and link information from multiple perspectives to infer the location of the user.
- KB-emb [37] is a location inference technique that relies on entity linking and knowledge base embedding.
- MetaGeo [13] proposes a general framework for identifying user geolocation based on meta-learning to learn the prior distribution of geolocation tasks.
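The SGC4Geo entry above relies on the simplification of Wu et al. [36]: with the non-linearities removed, K stacked graph convolutions collapse into a fixed propagation S^K X followed by a single linear classifier. The sketch below shows only the propagation step on a dense adjacency matrix, in plain Python for self-containment; the function name `sgc_features` and the list-of-lists representation are assumptions for illustration, and a practical implementation would typically use sparse matrices.

```python
def sgc_features(adj, features, k=2):
    """Return S^k X, where S = D^(-1/2) (A + I) D^(-1/2).

    adj: n x n adjacency matrix as a list of lists (0/1 or weights).
    features: n x d node feature matrix as a list of lists.
    """
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetrically normalised adjacency matrix S.
    s = [[a_hat[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(n)]
         for i in range(n)]

    x = [list(row) for row in features]
    dim = len(x[0])
    # k rounds of linear propagation; no activation between rounds.
    for _ in range(k):
        x = [[sum(s[i][m] * x[m][c] for m in range(n)) for c in range(dim)]
             for i in range(n)]
    return x
```

Because the propagation has no trainable parameters, it can be computed once as preprocessing; a linear classifier such as multinomial logistic regression is then trained on the returned features.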
4.5. Experimental Settings
5. Experimental Results
- How well does the MVGeo model proposed in this work perform on location inference compared to current state-of-the-art baseline models?
- How well does the model proposed in this work perform when trained on a small amount of data using only the tweet view?
- How do the tweet view and the network view affect the inference effectiveness of the model?
- What is a good choice for the number of subdistributions for the Gaussian mixture model in the network view?
5.1. Location Inference Model Performance
5.2. Single-View Impact
5.3. Small-Scale Dataset
5.4. Ablation Experiments
5.5. Choice of S-Value
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sakaki, T.; Okazaki, M.; Matsuo, Y. Tweet Analysis for Real-Time Event Detection and Earthquake Reporting System Development. IEEE Trans. Knowl. Data Eng. 2012, 99, 919–931.
- Bao, J.; Zheng, Y.; Wilkie, D.; Mokbel, M. Recommendations in location-based social networks: A survey. GeoInformatica 2015, 19, 525–565.
- Cheng, Z.; Caverlee, J.; Lee, K. A Content-Driven Framework for Geolocating Microblog Users. ACM Trans. Intell. Syst. Technol. (TIST) 2013, 4, 1–27.
- Cheng, Z.; Caverlee, J.; Lee, K. You are where you Tweet: A content-based approach to geo-locating Twitter users. In Proceedings of the International Conference on Information and Knowledge Management, Toronto, ON, Canada, 26–30 October 2010; pp. 759–768.
- Wing, B.; Baldridge, J. Simple supervised document geolocation with geodesic grids. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 21 June 2011; pp. 955–964.
- Roller, S.; Speriosu, M.; Rallapalli, S.; Wing, B.; Baldridge, J. Supervised Text-based Geolocation Using Language Models on an Adaptive Grid. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Republic of Korea, 12–14 July 2012.
- Chi, L.; Lim, K.H.; Alam, N.; Butler, C. Geolocation Prediction in Twitter Using Location Indicative Words and Textual Features. In Proceedings of the 2nd Workshop on Noisy User-Generated Text (WNUT), Osaka, Japan, 11 December 2016.
- Rahimi, A.; Cohn, T.; Baldwin, T. A Neural Model for User Geolocation and Lexical Dialectology. arXiv 2017, arXiv:1704.04008.
- Davis Jr, C.; Pappa, G.; Rennó Rocha de Oliveira, D.; Arcanjo, F. Inferring the Location of Twitter Messages Based on User Relationships. Trans. GIS 2011, 15, 735–751.
- Wang, F.; Lu, C.T.; Qu, Y.; Yu, P. Collective Geographical Embedding for Geolocating Social Network Users. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2017; pp. 599–611.
- Huu, T.; Nguyen, D.; Tsiligianni, E.; Cornelis, B.; Deligiannis, N. Multiview Deep Learning for Predicting Twitter Users’ Location. arXiv 2017, arXiv:1712.08091.
- Rahimi, A.; Cohn, T.; Baldwin, T. Semi-supervised User Geolocation via Graph Convolutional Networks. arXiv 2018, arXiv:1804.08049.
- Zhou, F.; Qi, X.; Zhang, K.; Trajcevski, G.; Zhong, T. MetaGeo: A General Framework for Social User Geolocation Identification With Few-Shot Learning. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 8950–8964.
- Wang, Z.; Ye, C.; Zhou, H. Geolocation using GAT with Multiview Learning. In Proceedings of the 2020 IEEE International Conference on Smart Data Services (SMDS), Beijing, China, 19–23 October 2020; p. 88.
- Han, B.; Cook, P.; Baldwin, T. Text-Based Twitter User Geolocation Prediction. J. Artif. Intell. Res. (JAIR) 2014, 49.
- Eisenstein, J.; O’Connor, B.; Smith, N.; Xing, E. A Latent Variable Model for Geographic Lexical Variation. In Proceedings of the EMNLP 2010—Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, Cambridge, MA, USA, 9–11 October 2010; pp. 1277–1287.
- Rahimi, A.; Baldwin, T.; Cohn, T. Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 167–176.
- Tang, H.; Zhao, X.; Ren, Y. A multilayer recognition model for twitter user geolocation. Wirel. Netw. 2022, 28, 1197–1202.
- Compton, R.; Jurgens, D.; Allen, D. Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization. In Proceedings of the 2014 IEEE International Conference on Big Data, IEEE Big Data 2014, Washington, DC, USA, 27–30 October 2014.
- McGee, J.; Caverlee, J.; Cheng, Z. A geographic study of tie strength in social media. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, Glasgow, Scotland, UK, 24–28 October 2011; pp. 2333–2336.
- McGee, J.; Caverlee, J.; Cheng, Z. Location prediction in social media based on tie strength. In Proceedings of the International Conference on Information and Knowledge Management, Chengdu, China, 20–21 July 2013; pp. 459–468.
- Rout, D.; Preotiuc-Pietro, D.; Bontcheva, K.; Cohn, T. Where’s @wally: A classification approach to Geolocating users based on their social ties. In Proceedings of the 24th ACM Conference on Hypertext and Social Media, Paris, France, 2–4 May 2013.
- Jurgens, D. That’s What Friends Are For: Inferring Location in Online Social Media Platforms Based on Social Relationships. Proc. Int. AAAI Conf. Web Soc. Media 2021, 7, 273–282.
- Kothari, R.; Jain, V. Learning from labeled and unlabeled data. In Proceedings of the 2002 International Joint Conference on Neural Networks, Honolulu, HI, USA, 12–17 May 2002; Volume 3, pp. 2803–2808.
- Ebrahimi, M.; ShafieiBavani, E.; Wong, R.; Chen, F. Twitter user geolocation by filtering of highly mentioned users. J. Assoc. Inf. Sci. Technol. 2018, 69.
- Rahimi, A.; Vu, D.; Cohn, T.; Baldwin, T. Exploiting Text and Network Context for Geolocation of Social Media Users. arXiv 2015, arXiv:1506.04803.
- Rahimi, A.; Cohn, T.; Baldwin, T. Twitter User Geolocation Using a Unified Text and Network Prediction Model. arXiv 2015, arXiv:1506.08259.
- Talukdar, P.; Crammer, K. New Regularized Algorithms for Transductive Learning. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, 7–11 September 2009; pp. 442–457.
- Bakerman, J.; Pazdernik, K.; Wilson, A.; Fairchild, G.; Bahran, R. Twitter Geolocation: A Hybrid Approach. ACM Trans. Knowl. Discov. Data 2018, 12, 1–17.
- Huu, T.; Nguyen, D.; Tsiligianni, E.; Cornelis, B.; Deligiannis, N. Twitter User Geolocation Using Deep Multiview Learning. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 6304–6308.
- Sun, Y.; Han, J. Mining heterogeneous information networks: A structural analysis approach. SIGKDD Explor. 2012, 14, 20–28.
- Yao, L.; Mao, C.; Luo, Y. Graph Convolutional Networks for Text Classification. Proc. AAAI Conf. Artif. Intell. 2019, 33, 7370–7377.
- Srivastava, R.; Greff, K.; Schmidhuber, J. Highway Networks. arXiv 2015, arXiv:1505.00387.
- Miura, Y.; Taniguchi, M.; Taniguchi, T.; Ohkuma, T. Unifying Text, Metadata, and User Network Representations with a Neural Network for Geolocation Prediction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; Association for Computational Linguistics: Vancouver, BC, Canada, 2017; pp. 1260–1272.
- Li, P.; Lu, H.; Kanhabua, N.; Zhao, S.; Pan, G. Location Inference for Non-Geotagged Tweets in User Timelines. IEEE Trans. Knowl. Data Eng. 2018, 31, 1150–1165.
- Wu, F.; de Souza, A.H.S., Jr.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K.Q. Simplifying Graph Convolutional Networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 6861–6871.
- Miyazaki, T.; Rahimi, A.; Cohn, T.; Baldwin, T. Twitter Geolocation using Knowledge-Based Methods. In Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-Generated Text, Brussels, Belgium, 1 November 2018; pp. 7–16.
Dataset | Tweets | Mentions | Users | Train | Test | Dev |
---|---|---|---|---|---|---|
GeoText | 378K | 109K | 9475 | 5685 | 1895 | 1895 |

Model (GeoText) | Acc@161 (%) | Median (km) | Mean (km) |
---|---|---|---|
MADCEL [27] | 58 | 60 | 586 |
MLP4Geo [8] | 38 | 389 | 844 |
MENET [30] | 55 | 125 | 643 |
GeoAtt [34] | 57 | 81 | 612 |
DCCA [12] | 56 | 79 | 627 |
BiLSTM-C [35] | 45 | 363 | 796 |
Attn [35] | 52 | 236 | 657 |
SGC4Geo [36] | 61 | 45 | 543 |
M-GCN [14] | 61.10 | 46.09 | 519.27 |
KB-emb [37] | 43 | 321 | 793 |
MetaGeo [13] | 62 | 42 | 533 |
MVGeo | 63.6 | 41 | 519 |

Variant (GeoText) | Acc@161 (%) | Median (km) | Mean (km) |
---|---|---|---|
MVGeo | 63.6 | 41 | 519 |
(1) MVGeo-TF | 61.3 | 47 | 540 |
(2) MVGeo-LC | 62.5 | 42 | 530 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wu, H.; Li, J.; Li, S.; Li, H.; Ma, J.; Qiao, Y. Find You: Multi-View-Based Location Inference for Twitter Users. Appl. Sci. 2023, 13, 11848. https://doi.org/10.3390/app132111848