This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation

1 Department of System Science and Industrial Engineering, Binghamton University, Binghamton, NY 13902, USA
2 School of Computing, Binghamton University, Binghamton, NY 13902, USA
3 Department of Computer Science, Texas Tech University, Lubbock, TX 79409, USA
* Author to whom correspondence should be addressed.
Information 2025, 16(9), 748; https://doi.org/10.3390/info16090748 (registering DOI)
Submission received: 9 July 2025 / Revised: 13 August 2025 / Accepted: 26 August 2025 / Published: 28 August 2025

Abstract

This study tackles the problem of geolocating Reddit users without access to the author information API. Using subreddit data, we infer each user's location from their interactions within location-specific subreddits. Using unsupervised learning methods such as Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF), we examine conversations about COVID-19 and immunization across the U.S., focusing on COVID-19 vaccination. Our topic modeling identifies four themes: humor and sarcasm (e.g., jokes about microchips), conspiracy theories (e.g., tracking devices and microchips in the COVID-19 vaccine), public skepticism (e.g., debates over vaccine safety and freedom), and vaccine brand concerns (e.g., Pfizer, Moderna, and booster shots). Our geolocation analysis shows that regions with lower vaccination rates often exhibit a higher prevalence of misinformation-labeled comments. For example, Ada County (Idaho), Newton County (Missouri), and Flathead County (Montana) showed both low vaccine uptake and a high rate of false information. This study characterizes the varied forms of misinformation disseminated online and gives a better understanding of how people in different parts of the U.S. think about getting a COVID-19 vaccine.
Keywords: fake news; COVID-19; geolocation; misinformation; unsupervised learning; topic modeling
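
The abstract states that user location is inferred from activity in location-specific subreddits but does not spell out the assignment rule. The following is a minimal sketch of one plausible rule, assuming a simple activity-weighted vote. The mapping `SUBREDDIT_TO_STATE`, the function `infer_state`, and the `min_posts` threshold are hypothetical names introduced for illustration only and are not taken from the article.

```python
from collections import Counter

# Hypothetical mapping from location-specific subreddits to U.S. states.
# A real study would need a much larger, curated list.
SUBREDDIT_TO_STATE = {
    "boise": "Idaho",
    "idaho": "Idaho",
    "montana": "Montana",
    "missouri": "Missouri",
}


def infer_state(subreddit_activity, min_posts=2):
    """Guess a user's state from per-subreddit post counts.

    subreddit_activity maps subreddit name -> number of posts/comments.
    Returns the state with the most activity in mapped subreddits,
    or None when the evidence is below the (assumed) min_posts threshold.
    """
    votes = Counter()
    for subreddit, n_posts in subreddit_activity.items():
        state = SUBREDDIT_TO_STATE.get(subreddit.lower())
        if state is not None:
            # Activity in non-location subreddits is ignored.
            votes[state] += n_posts
    if not votes:
        return None
    state, count = votes.most_common(1)[0]
    return state if count >= min_posts else None
```

For instance, a user with three posts in a Boise subreddit and one in a Montana subreddit would be assigned to Idaho under this rule, while a user active only in general-interest subreddits would be left unlocated. Ties are not handled specially in this sketch.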

Share and Cite

MDPI and ACS Style

Alarfaj, L.; Blackburn, J.; Amjad, M.; Patel, J.; Ertem, Z. Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation. Information 2025, 16, 748. https://doi.org/10.3390/info16090748

AMA Style

Alarfaj L, Blackburn J, Amjad M, Patel J, Ertem Z. Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation. Information. 2025; 16(9):748. https://doi.org/10.3390/info16090748

Chicago/Turabian Style

Alarfaj, Lulu, Jeremy Blackburn, Maaz Amjad, Jay Patel, and Zeynep Ertem. 2025. "Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation" Information 16, no. 9: 748. https://doi.org/10.3390/info16090748

APA Style

Alarfaj, L., Blackburn, J., Amjad, M., Patel, J., & Ertem, Z. (2025). Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation. Information, 16(9), 748. https://doi.org/10.3390/info16090748
