This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation
by Lulu Alarfaj 1,*, Jeremy Blackburn 2, Maaz Amjad 3, Jay Patel 2 and Zeynep Ertem 1
1 Department of System Science and Industrial Engineering, Binghamton University, Binghamton, NY 13902, USA
2 School of Computing, Binghamton University, Binghamton, NY 13902, USA
3 Department of Computer Science, Texas Tech University, Lubbock, TX 79409, USA
* Author to whom correspondence should be addressed.
Information 2025, 16(9), 748; https://doi.org/10.3390/info16090748 (registering DOI)
Submission received: 9 July 2025 / Revised: 13 August 2025 / Accepted: 26 August 2025 / Published: 28 August 2025
Abstract
This study tackles the problem of geolocating Reddit users without access to the author information API. Using subreddit data, we inferred each user's location from their interactions within location-specific subreddits. We then used unsupervised learning methods, such as Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF), to examine conversations about COVID-19 and immunization across the U.S., focusing on COVID-19 vaccination. Our topic modeling identifies four themes: humor and sarcasm (e.g., jokes about microchips), conspiracy theories (e.g., tracking devices and microchips in the COVID-19 vaccine), public skepticism (e.g., debates over vaccine safety and freedom), and vaccine brand concerns (e.g., Pfizer, Moderna, and booster shots). Our geolocation analysis shows that regions with lower vaccination rates often exhibit a higher prevalence of misinformation-labeled comments. For example, Ada County (Idaho), Newton County (Missouri), and Flathead County (Montana) showed both low vaccine uptake and a high rate of false information. This study documents the wide variety of misinformation disseminated online and offers a clearer picture of how attitudes toward COVID-19 vaccination vary across different parts of the U.S.
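The abstract describes inferring a user's location from activity in location-specific subreddits and then comparing the share of misinformation-labeled comments across regions. The Python below is a minimal sketch of one such pipeline, assuming a simple majority-vote rule; it is not the authors' published procedure. The subreddit-to-county mapping `SUBREDDIT_TO_COUNTY`, the functions `infer_user_locations` and `misinformation_share`, the `min_comments` threshold, and the tie rule are all hypothetical illustrations, and the misinformation labels are assumed to be supplied by some upstream step.

```python
from collections import Counter, defaultdict

# Hypothetical, illustrative mapping from location-specific subreddits to
# U.S. counties; the study's actual subreddit list is not reproduced here.
SUBREDDIT_TO_COUNTY = {
    "Boise": "Ada County, ID",
    "Neosho": "Newton County, MO",
    "Kalispell": "Flathead County, MT",
}


def infer_user_locations(comments, min_comments=2):
    """Assign each user the county whose subreddits they post in most often.

    comments: iterable of (user, subreddit) pairs.
    Users with fewer than `min_comments` location-specific comments, or whose
    top two counties are tied, are left unassigned (hypothetical rules).
    """
    counts = defaultdict(Counter)
    for user, subreddit in comments:
        county = SUBREDDIT_TO_COUNTY.get(subreddit)
        if county is not None:  # ignore subreddits not tied to a location
            counts[user][county] += 1

    locations = {}
    for user, counter in counts.items():
        if sum(counter.values()) < min_comments:
            continue  # too little local activity to decide
        ranked = counter.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # ambiguous: tie between two counties
        locations[user] = ranked[0][0]
    return locations


def misinformation_share(labeled_comments, locations):
    """Fraction of misinformation-labeled comments per inferred county.

    labeled_comments: iterable of (user, is_misinformation) pairs; the labels
    are assumed to come from an upstream classifier or annotation step.
    """
    totals, flagged = Counter(), Counter()
    for user, is_misinfo in labeled_comments:
        county = locations.get(user)
        if county is None:
            continue  # skip comments from users we could not geolocate
        totals[county] += 1
        flagged[county] += int(is_misinfo)
    return {county: flagged[county] / totals[county] for county in totals}
```

For instance, a user with two comments in r/Boise and one in r/Kalispell would be assigned to Ada County, while a user with one comment in each would be left unassigned as a tie. Requiring a minimum number of local comments trades coverage for precision; the thresholds here are illustrative only, and the resulting per-county shares could then be compared against county vaccination rates.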
Share and Cite
MDPI and ACS Style
Alarfaj, L.; Blackburn, J.; Amjad, M.; Patel, J.; Ertem, Z. Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation. Information 2025, 16, 748. https://doi.org/10.3390/info16090748
AMA Style
Alarfaj L, Blackburn J, Amjad M, Patel J, Ertem Z. Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation. Information. 2025; 16(9):748. https://doi.org/10.3390/info16090748
Chicago/Turabian Style
Alarfaj, Lulu, Jeremy Blackburn, Maaz Amjad, Jay Patel, and Zeynep Ertem. 2025. "Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation." Information 16, no. 9: 748. https://doi.org/10.3390/info16090748
APA Style
Alarfaj, L., Blackburn, J., Amjad, M., Patel, J., & Ertem, Z. (2025). Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation. Information, 16(9), 748. https://doi.org/10.3390/info16090748
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.