Robust Benchmark for Propagandist Text Detection and Mining High-Quality Data
Abstract
1. Introduction
- We propose ProText, an annotated repository with multiple labels. Every instance is annotated at the sentence level with a precise propaganda technique, and the annotations are produced both automatically and manually.
- We address propaganda news identification by combining a deep-learning model with natural language processing (NLP) techniques. RoBERTa is fine-tuned to classify multi-label propaganda from news sources by span and by technique.
- We demonstrate the recognition of propaganda news articles using two general approaches, automatic and manual fact-checking, applied to each data sample.
- Our deep-learning model depends on propagandist data supplied by researchers, social media companies, and news media companies. We therefore call for the collection of propagandist text data to help researchers advance their studies.
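To make the multi-label setting in the contributions above concrete, the following minimal sketch shows how the techniques attached to one sentence could be encoded as a multi-hot target and how per-label scores could be decoded back into technique ids. It is an illustration under stated assumptions, not the authors' implementation: the helper names `to_multi_hot` and `decode_logits`, the independent sigmoid per label, and the 0.5 threshold are all hypothetical choices. The only value taken from the paper is the number of technique labels (22).

```python
import math

NUM_TECHNIQUES = 22  # technique ids 0..21, matching the paper's technique list


def to_multi_hot(technique_ids, num_labels=NUM_TECHNIQUES):
    """Encode a collection of technique ids as a 0/1 vector of length num_labels."""
    vec = [0] * num_labels
    for t in technique_ids:
        if not 0 <= t < num_labels:
            raise ValueError(f"technique id out of range: {t}")
        vec[t] = 1
    return vec


def decode_logits(logits, threshold=0.5):
    """Apply an independent sigmoid to each label score and keep ids at or above threshold.

    The 0.5 threshold is an assumption made for illustration only.
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [i for i, p in enumerate(probs) if p >= threshold]


# A sentence annotated with two techniques: Slogans (5) and Loaded_Language (9).
target = to_multi_hot([5, 9])

# Hypothetical per-label scores, standing in for the output of a classifier head.
logits = [-4.0] * NUM_TECHNIQUES
logits[5] = 2.0
logits[9] = 0.3
predicted = decode_logits(logits)
```

Treating each technique as an independent yes/no decision lets a single sentence carry several labels at once, which a single softmax over the techniques would not allow.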
2. Related Work
3. Materials and Methods
3.1. ProText Labeling and Matching Claims
3.1.1. Manual Fact-Checking
3.1.2. Automatic Fact-Checking
3.2. Propaganda Text Detection
3.3. Fact-Checking and Media Bias Topics
4. Results and Discussion
4.1. Dataset Analysis
4.2. Statistics of Available Datasets
4.3. Hyper-Parameters
4.4. Evaluation Performance Metrics
4.5. Results
4.6. Discussion and Research Implications
4.6.1. Theoretical Contribution
4.6.2. Practical Implication
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Sentence Sample | ||
---|---|---|
B-5 Also the Left killed comedy E-5. | ||
B-8 “I hope the American people can see through this sham E-8” Graham warned fellow GOPers about voting against the nomination. | ||
President Donald J. Trump for the area, declaring him a B-10 “bigot” E-10 | ||
B-7 The plague is a lie, E-7 Helene Raveloharisoa told the wire service. | ||
He continued by saying that an FBI would make him feel better at ease. | ||
List of Propaganda Techniques | ||
0 Virtue_words | 7 Exaggeration, minimization | 15 Thought-terminating_Clichés |
1 Beautiful_people | 8 Flag_Waving | 16 Whataboutism |
2 Smears | 9 Loaded_Language | 17 Straw_Men |
3 Cult_of_personality | 10 Name_Calling, Labeling | 18 Red_Herring |
4 Repetition | 11 Bandwagon | 19 Obfuscation, intentional vagueness |
5 Slogans | 12 Reduction_ad-Hitlerum | 20 Appeal_to-Authority |
6 Doubt | 13 Black and White Fallacy | 21 Appeal_to-fear-prejudice |
 | 14 Causal_Oversimplification | |
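The sentence samples above appear to mark each propaganda span with a `B-n` ... `E-n` pair, where `n` is a technique id from the list in the same table. Assuming that reading of the markers is correct, a span extractor could be sketched as follows. The function name `extract_spans` and the regular expression are illustrative assumptions, not part of the ProText tooling.

```python
import re

# Technique names indexed by id, copied from the table above.
TECHNIQUES = [
    "Virtue_words", "Beautiful_people", "Smears", "Cult_of_personality",
    "Repetition", "Slogans", "Doubt", "Exaggeration, minimization",
    "Flag_Waving", "Loaded_Language", "Name_Calling, Labeling", "Bandwagon",
    "Reduction_ad-Hitlerum", "Black and White Fallacy",
    "Causal_Oversimplification", "Thought-terminating_Clichés",
    "Whataboutism", "Straw_Men", "Red_Herring",
    "Obfuscation, intentional vagueness", "Appeal_to-Authority",
    "Appeal_to-fear-prejudice",
]

# Assumed marker format: "B-<id> <span text> E-<id>" with the same id on both ends.
SPAN_RE = re.compile(r"\bB-(\d+)\s+(.*?)\s*E-\1\b", re.DOTALL)


def extract_spans(sentence):
    """Return (technique_id, technique_name, span_text) for each marked span."""
    spans = []
    for match in SPAN_RE.finditer(sentence):
        technique_id = int(match.group(1))
        if technique_id < len(TECHNIQUES):
            name = TECHNIQUES[technique_id]
        else:
            name = "unknown"
        spans.append((technique_id, name, match.group(2)))
    return spans


samples = [
    "B-5 Also the Left killed comedy E-5.",
    "B-7 The plague is a lie, E-7 Helene Raveloharisoa told the wire service.",
    "He continued by saying that an FBI would make him feel better at ease.",
]
results = [extract_spans(s) for s in samples]
```

A sentence without markers, such as the last sample, yields an empty list, which corresponds to a non-propaganda instance under this reading.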
Datasets | Sources | News Articles | Label | Propaganda Text | Level |
---|---|---|---|---|---|
PTC | 49 | 536 | 14 | 7385 | Doc |
TSHP-17 | 11 | 22,580 | 4 | 5330 | Doc |
QProp | 104 | 51,000 | 2 | 5737 | Doc |
ProText | 60 | 1200 | 22 | 11,532 | Span |
Parameters | TSHP-17 | QProp | PTC | ProText | All Data |
---|---|---|---|---|---|
Case1 | Case2 | ||||
Weight decay | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
n_clusters | 0 | 2 | 2 | 2 | 0 |
Output Layer | GLU | Sigmoid | ReLU | PReLU | Softmax |
Batch sizes | 8 | 64 | 32 | 64 | 16 |
Epochs | 25 | 18 | 20 | 12 | 15 |
kernel | 1 | 2 | 1 | 1 | 2 |
Pre-Train model | 24 | 24 | 12 | 12 | 12 |
Optimizer | GD | GD | Adam | RMSprop | AdamW |
Embedding | TF/TF-IDF | TF-IDF | W-V | TF-IDF | GloVe |
Hidden-Layers | 768 | 768 | 768 | 768 | 768 |
Data-Train/test | 0.7/0.3 | 0.9/0.1 | 0.6/0.4 | 0.85/0.15 | 0.8/0.2 |
Learning Rates | 2 × | 2 × | 3 × | 3 × | 5 × |
Dropout | 0.4 | 0.3 | 0.4 | 0.4 | 0.3 |
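The "Data-Train/test" row above lists split ratios such as 0.8/0.2. The table does not say how the split was drawn, so the sketch below assumes a seeded random shuffle followed by a cut at the train fraction. The function name `train_test_split`, the seed, and the placeholder sentences are hypothetical.

```python
import random


def train_test_split(items, train_fraction, seed=0):
    """Shuffle items with a fixed seed and cut them into train and test parts.

    Random shuffling is an assumption; the table only gives the ratios.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for annotated sentences.
sentences = [f"sentence {i}" for i in range(100)]
train, test = train_test_split(sentences, 0.8)
```

Fixing the seed keeps the split reproducible across runs. For a multi-label dataset with rare techniques, a stratified split may be preferable, but that choice is outside what the table states.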
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ahmad, P.N.; Liu, Y.; Ali, G.; Wani, M.A.; ElAffendi, M. Robust Benchmark for Propagandist Text Detection and Mining High-Quality Data. Mathematics 2023, 11, 2668. https://doi.org/10.3390/math11122668