Raising the Flag: Monitoring User Perceived Disinformation on Reddit
Abstract
1. Introduction
2. Previous Research
2.1. Scope and Consequences of Online Misinformation
2.2. Measures against Misinformation
2.3. Capturing User Interaction
3. Data Collection
3.1. Data Collection: Selecting Subreddits
3.2. Data Collection: Selecting Posts
3.3. Data Collection: Selecting Comments
4. NLP Model for Detecting Informal Flags
4.1. Keyword Filtering
- disinformation or misinformation
- fake or false news
- misleading or clickbait
- unreliable
- propaganda
- bullshit/bs
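A minimal sketch of the keyword filter in Python. It assumes case-insensitive, whole-word matching of the keywords listed above; the regex approach and the function name `keyword_hits` are illustrative assumptions, not the authors' implementation.

```python
import re

# Keywords from Section 4.1, with each "or" variant spelled out.
FLAG_KEYWORDS = [
    "disinformation", "misinformation",
    "fake news", "false news",
    "misleading", "clickbait",
    "unreliable",
    "propaganda",
    "bullshit", "bs",
]

# Word boundaries (\b) keep "bs" from matching inside words such as "jobs" or "absurd".
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in FLAG_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def keyword_hits(comment: str) -> list:
    """Return the flag keywords found in a comment, lower-cased, in order of appearance."""
    return [m.group(0).lower() for m in KEYWORD_RE.finditer(comment)]

if __name__ == "__main__":
    print(keyword_hits("This is FAKE NEWS and pure propaganda."))  # ['fake news', 'propaganda']
    print(keyword_hits("Lots of jobs were lost."))  # []
```

A keyword hit alone is a weak signal (a comment may merely mention "fake news"), which is why the pipeline follows it with POS matching.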
4.2. POS Matching on Sentences
4.3. The Vocabulary and Grammar of Flagging
- Every day at midnight, we extract all posts from the subreddits we track.
- Further filtering removes comments that are sarcastic, as well as comments replying to posts that share articles directly referencing false information. Common approaches to sarcasm detection are rule-based, statistical, or deep learning methods [53], and more recently sentiment-based transfer learning with neural models has shown good results [54]. Because sarcasm detection is not the main focus of this paper, we use a relatively simple rule-based method contained in a regular expression (the full regex is given in Appendix B). It is by no means a perfect sarcasm filter, but it does remove the more obvious forms (such as “Oh, this must be ‘fake’ news /s”).
- We extract the top 1000 comments and match each sentence of each comment against our keyword-flag syntax patterns. If a comment contains at least one match, it is marked as a flag (Figure 1).
- Finally, all sentences that match a POS pattern are added to the database of matches. This database is updated daily and serves as the main input to the dashboard.
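The steps above can be sketched as a small Python function. This is a hedged illustration, not the authors' code: the sarcasm regex below is a hypothetical stand-in that only catches a trailing "/s" (the real regex is in Appendix B), the sentence splitter is naive, the filter for replies to posts about false information is not modeled, and `flag_comments`, `MatchStore`, and `matches_pattern` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set, Tuple

# HYPOTHETICAL stand-in for the sarcasm regex: catches only a trailing "/s" marker.
SARCASM_RE = re.compile(r"(?:^|\s)/s\s*$", re.IGNORECASE)

# Naive sentence splitter (assumption): split after ., ! or ? followed by whitespace.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")

@dataclass
class MatchStore:
    """In-memory stand-in for the daily-updated database of matched sentences."""
    rows: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, comment_id: str, sentence: str) -> None:
        self.rows.append((comment_id, sentence))

def flag_comments(
    comments: Iterable[Tuple[str, str]],     # (comment_id, text), best-ranked first
    matches_pattern: Callable[[str], bool],  # HYPOTHETICAL sentence-level matcher
    store: MatchStore,
    top_n: int = 1000,
) -> Set[str]:
    """Return IDs of comments with at least one matching sentence; store each match."""
    flagged: Set[str] = set()
    for rank, (comment_id, text) in enumerate(comments):
        if rank >= top_n:
            break  # only the top_n comments are examined
        if SARCASM_RE.search(text):
            continue  # drop obviously sarcastic comments before matching
        for sentence in SENTENCE_RE.split(text.strip()):
            if matches_pattern(sentence):
                flagged.add(comment_id)  # one match is enough to flag the comment
                store.add(comment_id, sentence)
    return flagged

if __name__ == "__main__":
    store = MatchStore()
    comments = [
        ("c1", "Nice post. This is fake news!"),
        ("c2", "Oh sure, this is fake news /s"),
        ("c3", "Great article."),
    ]
    # Toy matcher used only for illustration, in place of the POS matcher.
    print(flag_comments(comments, lambda s: "fake news" in s.lower(), store))  # {'c1'}
    print(store.rows)  # [('c1', 'This is fake news!')]
```

Storing sentences rather than whole comments mirrors the description above, where matched sentences feed the dashboard while the flag itself is assigned at the comment level.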
4.4. Evaluation of the POS Matcher
4.4.1. Manual Labeling
4.4.2. Comparison with Simple Keyword Filtering
4.4.3. Comparison with Machine Learning Models
5. Results
5.1. Informal Flagging: Trends and Peaks
5.2. Informal Flagging: "Lone Wolf" or "Brigading"?
5.3. Informal Flagging: How Do Redditors Flag When They Flag?
5.4. Informal Flagging: Topic Clustering
- a topic about the U.S. president and the Government’s response to the pandemic,
- a topic about China, and
- the largest topic, covering generic COVID-19 information: reports on deaths, infections, and health concerns.
5.5. Comparison to Fact Checking Websites
6. Discussion
7. Materials and Methods
7.1. Data Analysis Dashboard
7.2. Dashboard: Flags
7.3. Dashboard: Posts (Submissions) and Clustering
7.4. Dashboard: Subreddits, Domains, Authors
8. Data Availability Statement
8.1. Manual Annotation Files
8.2. Descriptive Analyses
8.3. Source Code of Dashboard
8.4. Testing Dashboard
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
ML | Machine Learning |
NLP | Natural Language Processing |
POS matcher | Matcher Using Part-of-Speech Tagging and Dependency Parsing |
MDPI | Multidisciplinary Digital Publishing Institute |
Appendix A. Data Processing Flowchart for Dashboard
Appendix B. Vocabulary, POS, Dependency Classes, and Matching Patterns
References
- Bovet, A.; Makse, H.A. Influence of fake news in Twitter during the 2016 US presidential election. Nat. Commun. 2019, 10, 1–14.
- Bastos, M.T.; Mercea, D. The Brexit Botnet and User-Generated Hyperpartisan News. Soc. Sci. Comput. Rev. 2017, 37, 38–54.
- Bradshaw, S.; Howard, P.N. The Global Disinformation Order 2019 Global Inventory of Organised Social Media Manipulation; Working Paper; Project on Computational Propaganda: Oxford, UK, 2019.
- Woolley, S.C.; Howard, P.N. (Eds.) Computational Propaganda: Political Parties, Politicians, and Political Manipulation on Social Media; Oxford Studies in Digital Politics; Oxford University Press: New York, NY, USA, 2019.
- Bentzen, N. Foreign Influence Operations in the EU. 2018. Available online: http://www.europarl.europa.eu/RegData/etudes/BRIE/2018/625123/EPRS_BRI(2018)625123_EN.pdf (accessed on 29 November 2020).
- Brennen, J.S.; Simon, F.; Howard, P.N.; Nielsen, R.K. Types, Sources, and Claims of COVID-19 Misinformation; Oxford University Press: Oxford, UK, 2020; Volume 7, pp. 1–13.
- Zarocostas, J. How to fight an infodemic. Lancet 2020, 395, 676.
- Lazer, D.M.J.; Baum, M.A.; Benkler, Y.; Berinsky, A.J.; Greenhill, K.M.; Menczer, F.; Metzger, M.J.; Nyhan, B.; Pennycook, G.; Rothschild, D.; et al. The science of fake news. Science 2018, 359, 1094–1096.
- Paul, C.; Matthews, M. The Russian “Firehose of Falsehood” Propaganda Model: Why It Might Work and Options to Counter It; RAND Perspective; RAND: Santa Monica, CA, USA, 2016; Volume 198.
- Lazarsfeld, P.F.; Berelson, B.; Gaudet, H. The People’s Choice. How the Voter Makes Up His mind in a Presidential Campaign; Duell, Sloan & Pearce: New York, NY, USA, 1944.
- Bratich, J.Z. Amassing the Multitude: Revisiting Early Audience Studies. Commun. Theory 2006, 15, 242–265.
- Baly, R.; Karadzhov, G.; Alexandrov, D.; Glass, J.; Nakov, P. Predicting Factuality of Reporting and Bias of News Media Sources. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018.
- Canini, K.R.; Suh, B.; Pirolli, P.L. Finding Credible Information Sources in Social Networks Based on Content and Social Structure. In Proceedings of the 2011 IEEE Third Int’l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int’l Conference on Social Computing, Boston, MA, USA, 9–11 October 2011; pp. 1–8.
- Gupta, A.; Kumaraguru, P. Credibility ranking of tweets during high impact events. In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media—PSOSM ’12, Lyon, France, 17 April 2012; Kumaraguru, P., Almeida, V., Eds.; ACM Press: New York, NY, USA, 2012; pp. 2–8.
- Helmstetter, S.; Paulheim, H. Weakly Supervised Learning for Fake News Detection on Twitter. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Barcelona, Spain, 28–31 August 2018; pp. 274–277.
- Hounsel, A.; Holland, J.; Kaiser, B.; Borgolte, K.; Feamster, N.; Mayer, J. Identifying Disinformation Websites Using Infrastructure Features. Available online: https://www.usenix.org/system/files/foci20-paper-hounsel.pdf (accessed on 22 December 2020).
- Mihaylov, T.; Mihaylova, T.; Nakov, P.; Màrquez, L.; Georgiev, G.D.; Koychev, I.K. The dark side of news community forums: Opinion manipulation trolls. Internet Res. 2018, 28, 1292–1312.
- Shu, K.; Wang, S.; Lee, D.; Liu, H. Disinformation, misinformation, and fake news in social media: Emerging research challenges and opportunities; Springer: Cham, Switzerland, 2020.
- Song, X.; Petrak, J.; Jiang, Y.; Singh, I.; Maynard, D.; Bontcheva, K. Classification Aware Neural Topic Model and its Application on a New COVID-19 Disinformation Corpus. arXiv 2020, arXiv:2006.03354.
- Zannettou, S.; Caulfield, T.; Setzer, W.; Sirivianos, M.; Stringhini, G.; Blackburn, J. Who Let The Trolls Out? In Proceedings of the 10th ACM Conference on Web Science—WebSci ’19, Oxford, UK, 28 June–1 July 2019.
- Lagorio-Chafkin, C. We Are the Nerds: The Birth and Tumultuous Life of Reddit, the Internet’s Culture Laboratory; Hachette Books: New York, NY, USA, 2018.
- Allport, G.W.; Postman, L. An analysis of rumor. Public Opin. Q. 1946, 10, 501–517.
- Vosoughi, S.; Roy, D.; Aral, S. The spread of true and false news online. Science 2018, 359, 1146–1151.
- Giachanou, A.; Rosso, P.; Crestani, F. Leveraging Emotional Signals for Credibility Detection. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval; Association for Computing Machinery, New York, NY, USA, 21–25 July 2019; pp. 877–880.
- Ghanem, B.; Rosso, P.; Rangel, F. An Emotional Analysis of False Information in Social Media and News Articles. ACM Trans. Internet Technol. 2020, 20.
- Shin, J.; Jian, L.; Driscoll, K.; Bar, F. The diffusion of misinformation on social media: Temporal pattern, message, and source. Comput. Hum. Behav. 2018, 83, 278–287.
- Allington, D.; Dhavan, N. The Relationship between Conspiracy Beliefs and Compliance with Public Health Guidance with Regard to COVID-19; Centre for Countering Digital Hate: London, UK, 2020.
- Imhoff, R.; Lamberty, P. A Bioweapon or a Hoax? The Link Between Distinct Conspiracy Beliefs About the Coronavirus Disease (COVID-19) Outbreak and Pandemic Behavior. Soc. Psychol. Personal. Sci. 2020.
- Sultănescu, D.; Achimescu, V.; Sultănescu, D.C. Conspiracy Narratives and Compliance with Public Health Recommendations During the COVID-19 Crisis in Romania. In Proceedings of the 7th ACADEMOS Conference 2020 International Conference, Bucharest, Romania, 7–10 October 2020; pp. 393–401.
- Burstyn, L.; Rao, A.; Roth, C.; Yanagizawa-Drott, D. Misinformation during a Pandemic; Working Paper; University of Chicago, Becker Friedman Institute for Economics: Chicago, IL, USA, 2020.
- Huang, B.; Carley, K.M. Disinformation and Misinformation on Twitter during the Novel Coronavirus Outbreak. Available online: https://arxiv.org/abs/2006.04278 (accessed on 22 December 2020).
- Kouzy, R.; Abi Jaoude, J.; Kraitem, A.; El Alam, M.B.; Karam, B.; Adib, E.; Zarka, J.; Traboulsi, C.; Akl, E.W.; Baddour, K. Coronavirus Goes Viral: Quantifying the COVID-19 Misinformation Epidemic on Twitter. Cureus 2020, 12, e7255.
- Reddit. Update to Our Content Policy. 2020. Available online: https://www.reddit.com/r/announcements/comments/hi3oht/update_to_our_content_policy/ (accessed on 29 November 2020).
- Reddit. Misinformation and COVID-19: What Reddit Is Doing. 2020. Available online: https://www.reddit.com/r/ModSupport/comments/g21ub7/misinformation_and_covid19_what_reddit_is_doing/ (accessed on 29 November 2020).
- Silverman, H. Helping Fact-Checkers Identify False Claims Faster. 2019. Available online: https://about.fb.com/news/2019/12/helping-fact-checkers/ (accessed on 29 November 2020).
- Conroy, N.K.; Rubin, V.L.; Chen, Y. Automatic deception detection: Methods for finding fake news. Proc. Assoc. Inf. Sci. Technol. 2015, 52, 1–4.
- Shu, K.; Cui, L.; Wang, S.; Lee, D.; Liu, H. dEFEND. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; Teredesai, A., Kumar, V., Li, Y., Rosales, R., Terzi, E., Karypis, G., Eds.; ACM: New York, NY, USA, 2019; pp. 395–405.
- Karadzhov, G.; Nakov, P.; Màrquez, L.; Barrón-Cedeño, A.; Koychev, I. Fully Automated Fact Checking Using External Sources. arXiv 2017, arXiv:1710.00341.
- Griffith, M.; Spies, N.C.; Krysiak, K.; McMichael, J.F.; Coffman, A.C.; Danos, A.M.; Ainscough, B.J.; Ramirez, C.A.; Rieke, D.T.; Kujan, L.; et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat. Genet. 2017, 49, 170–174.
- Keuleers, E.; Stevens, M.; Mandera, P.; Brysbaert, M. Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment. Q. J. Exp. Psychol. 2015, 68, 1665–1692.
- Schlagwein, D.; Bjorn-Andersen, N. Organizational Learning with Crowdsourcing: The Revelatory Case of LEGO. J. Assoc. Inf. Syst. 2014, 15, 754–778.
- Pennycook, G.; Rand, D.G. Fighting misinformation on social media using crowdsourced judgments of news source quality. Proc. Natl. Acad. Sci. USA 2019, 116, 2521–2526.
- Pennycook, G.; Bear, A.; Collins, E.T.; Rand, D.G. The Implied Truth Effect: Attaching Warnings to a Subset of Fake News Headlines Increases Perceived Accuracy of Headlines Without Warnings. Manag. Sci. 2020.
- Becker, J.; Porter, E.; Centola, D. The wisdom of partisan crowds. Proc. Natl. Acad. Sci. USA 2019, 116, 10717–10722.
- DeGroot, M.H. Reaching a Consensus. J. Am. Stat. Assoc. 1974, 69, 118–121.
- Achimescu, V.; Sultanescu, D. Feeding the troll detection algorithm. First Monday 2020.
- Singer, J.B. User-generated visibility: Secondary gatekeeping in a shared media space. New Media Soc. 2014, 16, 55–73.
- Wardle, C.; Derakhshan, H. Information Disorder: Toward an Interdisciplinary Framework for Research and Policy Making; Technical Report; Council of Europe: Strasbourg, France, 2017.
- Baumgartner, J.; Zannettou, S.; Keegan, B.; Squire, M.; Blackburn, J. The Pushshift Reddit Dataset. arXiv 2020, arXiv:cs.SI/2001.08435.
- Amaya, A.; Bach, R.; Keusch, F.; Kreuter, F. New Data Sources in Social Science Research: Things to Know Before Working with Reddit Data. Soc. Sci. Comput. Rev. 2019, 6.
- Giachanou, A.; Zhang, G.; Rosso, P. Multimodal Multi-image Fake News Detection. In Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), Sydney, Australia, 6–9 October 2020; pp. 647–654.
- Paletz, S.B.F.; Auxier, B.E.; Golonka, E.M. A Multidisciplinary Framework of Information Propagation Online; Springer Briefs in Complexity; Springer: Cham, Switzerland, 2019.
- Joshi, A.; Bhattacharyya, P.; Carman, M.J. Automatic Sarcasm Detection. ACM Comput. Surv. 2017, 50, 1–22.
- Zhang, S.; Zhang, X.; Chan, J.; Rosso, P. Irony detection via sentiment-based transfer learning. Inf. Process. Manag. 2019, 56, 1633–1644.
- DiMaggio, P.J.; Powell, W.W. The Iron Cage Revisited: Institutional Isomorphism and Collective Rationality in Organizational Fields. Am. Sociol. Rev. 1983, 48, 147–160.
- Zuckerman, E.; Kim, T.; Ukanwa, K.; von Rittmann, J. Robust Identities or Nonentities? Typecasting in the Feature-Film Labor Market. Am. J. Sociol. 2003, 108, 1018–1073.
- Cer, D.; Yang, Y.; Kong, S.y.; Hua, N.; Limtiaco, N.; St John, R.; Constant, N.; Guajardo-Cespedes, M.; Yuan, S.; Tar, C.; et al. Universal Sentence Encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations; Association for Computational Linguistics, Brussels, Belgium, 31 October–4 November 2018; pp. 169–174.
Patterns | Example Match 1 | Example Match 2 | Frequency
---|---|---|---
o, ao | Fake news! | Clickbait! | 35.8%
svo, svao | The whole post is disinformation. | This has to be fake news. | 27.0%
sva, svna, svna2 | Title is misleading. | Source is not reliable. | 15.5%
nao | Not a reliable source. | Not a correct headline! | 12.4%
vo, vao | Looks like false news. | Smells like propaganda. | 3.2%
stvo, stvao | Quit spreading misinformation. | Stop posting fake news. | 2.8%
ivo, ivao | I call bullshit. | I’m calling bs. | 2.0%
yvo, yvao | You are spreading falsehoods. | You’re posting false information. | 0.4%
ivsva, ivsvo, ivsvao | I think this is propaganda. | I know that the title is false. | 0.3%
svsv, svsav | This is how fake news spreads. | This is what fake news looks like. | 0.1%
Total | All patterns | | 100.0%
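The pattern codes in the table above are listed without a key. As a rough, hedged illustration only, the sketch below reads a few of them as sequences of syntactic roles under an ASSUMED legend (s = subject, v = verb, n = negation, a = flag-related adjective, o = flag noun phrase). The legend, the hand-assigned roles, and the names `encode` and `is_flag` are hypothetical; the actual matcher works on part-of-speech tags and dependency parses (Appendix B), not on a string lookup.

```python
from itertools import groupby
from typing import Sequence, Tuple

# ASSUMED legend (not given in the source): s = subject, v = verb, n = negation,
# a = flag-related adjective, o = flag noun phrase. Only a few table codes are used.
PATTERNS = {"o", "svo", "sva", "svna", "vo"}

Token = Tuple[str, str]  # (text, role); role "" means the token is ignored

def encode(tagged: Sequence[Token]) -> str:
    """Join the roles of a sentence, merging runs such as 'fake news' (o, o) into 'o'."""
    roles = (role for _, role in tagged if role)
    return "".join(role for role, _ in groupby(roles))

def is_flag(tagged: Sequence[Token]) -> bool:
    """A sentence counts as a flag if its role string equals one of the patterns."""
    return encode(tagged) in PATTERNS

if __name__ == "__main__":
    # Roles are assigned by hand here; a real system would derive them from a parser.
    sentence = [("Title", "s"), ("is", "v"), ("misleading", "a"), (".", "")]
    print(encode(sentence), is_flag(sentence))  # sva True
```

Requiring a whole-sentence pattern rather than a bare keyword is what lets a matcher of this kind reject sentences that merely mention a flag word.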
Property | Set 1 | Set 2 | Set 3 | Total |
---|---|---|---|---|
Period | January–July 2020 | January–July 2020 | August–October 2020 | |
Extracted from | keyword matches | POS matches | keyword matches | |
N (number of comments) | 600 | 600 | 300 | 1500 |
Valid N | 573 | 591 | 291 | 1455 |
Used for: | Training set for ML models | Test set for ML models | ||
Additional use: | Validation set for POS matcher | Validation set for POS matcher |
Flag Type | Precision (Keyword) | Precision (POS) | Recall (POS) | F1 (POS) | N (Comments) |
---|---|---|---|---|---|
misleading | 0.83 | 0.99 | 0.44 | 0.61 | 233 |
propaganda | 0.51 | 0.87 | 0.48 | 0.62 | 248 |
fake news | 0.27 | 0.73 | 0.54 | 0.62 | 257 |
unreliable | 0.15 | 0.67 | 0.36 | 0.47 | 204 |
disinformation | 0.29 | 0.65 | 0.32 | 0.43 | 216 |
bullshit | 0.23 | 0.52 | 0.50 | 0.51 | 241 |
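F1 is the harmonic mean of precision and recall. The short Python check below, a sketch using the POS-matcher precision and recall values from the table above, recomputes the F1 column; the helper name `f1` is illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# POS-matcher (precision, recall) per flag type, copied from the table above.
ROWS = {
    "misleading": (0.99, 0.44),
    "propaganda": (0.87, 0.48),
    "fake news": (0.73, 0.54),
    "unreliable": (0.67, 0.36),
    "disinformation": (0.65, 0.32),
    "bullshit": (0.52, 0.50),
}

if __name__ == "__main__":
    for flag, (p, r) in ROWS.items():
        print(f"{flag:15s} F1 = {f1(p, r):.2f}")
```

Rounded to two decimals, the results (0.61, 0.62, 0.62, 0.47, 0.43, 0.51) agree with the F1 column, so the table is internally consistent. Because the harmonic mean is pulled toward the smaller value, the matcher's modest recall caps F1 even where precision is near 1.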
Metric | POS Matcher | L1/L2 Logistic Regression | Random Forests |
---|---|---|---|
accuracy | 0.72 | 0.70 | 0.76 |
precision | 0.73 | 0.62 | 0.78 |
recall | 0.45 | 0.60 | 0.54 |
F1 | 0.55 | 0.61 | 0.64 |
AUC-ROC | 0.67 | 0.68 | 0.72 |
N | 291 | 291 | 291 |
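The metrics in the table above are standard binary-classification measures. The following Python sketch shows one conventional way to compute them from confusion-matrix counts; it is not the authors' evaluation code, and the counts in the example are made up. One point worth noting: a matcher that only outputs hard flag/no-flag decisions yields a single ROC operating point, so its AUC reduces to the mean of recall and specificity, which the rank-based (Mann-Whitney) definition confirms.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Confusion:
    """Counts for a binary flag / no-flag classifier on a labeled test set."""
    tp: int
    fp: int
    fn: int
    tn: int

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    def recall(self) -> float:
        """True positive rate."""
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    def specificity(self) -> float:
        """True negative rate."""
        return self.tn / (self.tn + self.fp) if self.tn + self.fp else 0.0

    def auc_hard(self) -> float:
        """ROC AUC of a classifier that outputs only 0/1 (a single operating point)."""
        return (self.recall() + self.specificity()) / 2

def auc_rank(pos: Sequence[float], neg: Sequence[float]) -> float:
    """P(random positive scores above random negative), ties counted as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Toy counts, NOT taken from the paper: 12 true flags, 8 non-flags.
    c = Confusion(tp=8, fp=2, fn=4, tn=6)
    print(round(c.accuracy(), 2), round(c.precision(), 2), round(c.recall(), 2))  # 0.7 0.8 0.67
    print(round(c.auc_hard(), 4), round(auc_rank([1] * 8 + [0] * 4, [1] * 2 + [0] * 6), 4))  # 0.7083 0.7083
```

Models that output a probability, such as logistic regression or random forests, trace a full ROC curve instead, so `auc_rank` applied to their scores is the relevant computation for them.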
Item | N | N/Month | N/Week | N/Day |
---|---|---|---|---|
posts | 3,395,847 | 565,974 | 147,645.52 | 18,658.50 |
comments | 17,254,621 | 2,875,770 | 750,200.91 | 94,805.61 |
keyword matches | 522,320 | 87,053 | 22,709.57 | 2,869.89 |
POS matches | 33,609 | 5,601 | 1,461.26 | 184.66 |
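The per-period columns in the table above are simple averages of the totals. In the Python sketch below, the divisors (6 months, 23 weeks, 182 days) are NOT stated in the source; they are inferred because they reproduce the table's per-week and per-day columns to two decimals. The final line computes the ratio of POS matches to keyword matches.

```python
from typing import Dict, Tuple

# Totals copied from the table above. The divisors are INFERRED, not stated in the source.
TOTALS: Dict[str, int] = {
    "posts": 3_395_847,
    "comments": 17_254_621,
    "keyword matches": 522_320,
    "POS matches": 33_609,
}
MONTHS, WEEKS, DAYS = 6, 23, 182

def rates(total: int) -> Tuple[float, float, float]:
    """Average count per month, per week, and per day over the collection period."""
    return total / MONTHS, total / WEEKS, total / DAYS

if __name__ == "__main__":
    for name, n in TOTALS.items():
        per_month, per_week, per_day = rates(n)
        print(f"{name:16s} {per_month:12,.1f} {per_week:12,.2f} {per_day:10,.2f}")
    # Ratio of POS matches to keyword matches.
    share = TOTALS["POS matches"] / TOTALS["keyword matches"]
    print(f"POS / keyword matches: {share:.1%}")  # 6.4%
```

The ratio shows how much the stricter POS patterns narrow the keyword candidates: roughly 6.4 POS matches per 100 keyword matches.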
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Achimescu, V.; Chachev, P.D. Raising the Flag: Monitoring User Perceived Disinformation on Reddit. Information 2021, 12, 4. https://doi.org/10.3390/info12010004