Attention-Enriched Mini-BERT Fake News Analyzer Using the Arabic Language
Abstract
1. Introduction
- An Arabic news analyzer that checks the originality of news text was built using Arabic semantics and a BERT classifier.
- The original, labeled, and augmented Arabic datasets were evaluated with DL and ML classifiers to improve on the performance of Arabic fake news analyzers reported in previous studies.
- ML and DL methods were compared on the Arabic fake news analysis task. The DL method used attention masks to focus on the region of interest in the text and proved to be the more robust fake news analyzer.
2. Related Work
3. Materials and Methods
3.1. Preprocessing Text
3.2. BERT Preprocessing
3.3. ML Classification
3.4. Mini-BERT Classification
4. Results and Discussion
4.1. Evaluation Measures
4.2. Dataset Description
4.3. ML Classification
4.4. Mini-BERT Transfer Learning Classifier Prediction Results
4.5. Comparison
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Harrag, F.; Djahli, M.K. Arabic Fake News Detection: A Fact Checking Based Deep Learning Approach. Trans. Asian Low-Resour. Lang. Inf. Process. 2022, 21, 1–34.
- Pulido, C.M.; Ruiz-Eugenio, L.; Redondo-Sama, G.; Villarejo-Carballido, B. A new application of social impact in social media for overcoming fake news in health. Int. J. Environ. Res. Public Health 2020, 17, 2430.
- Maldonado, M.A. Understanding fake news: Technology, affects, and the politics of the untruth. Hist. Comun. Soc. 2019, 24, 533.
- Ozbay, F.A.; Alatas, B. Fake news detection within online social media using supervised artificial intelligence algorithms. Phys. A Stat. Mech. Its Appl. 2020, 540, 123174.
- Meel, P.; Vishwakarma, D.K. Fake news, rumor, information pollution in social media and web: A contemporary survey of state-of-the-arts, challenges and opportunities. Expert Syst. Appl. 2020, 153, 112986.
- Lewandowsky, S.; Ecker, U.K.; Cook, J. Beyond misinformation: Understanding and coping with the “post-truth” era. J. Appl. Res. Mem. Cogn. 2017, 6, 353–369.
- Davoudi, M.; Moosavi, M.R.; Sadreddini, M.H. DSS: A hybrid deep model for fake news detection using propagation tree and stance network. Expert Syst. Appl. 2022, 198, 116635.
- Auxier, B. 64% of Americans Say Social Media Have a Mostly Negative Effect on the Way Things Are Going in the U.S. Today; Pew Research Center: Washington, DC, USA, 2020.
- Rubin, V.L. On deception and deception detection: Content analysis of computer-mediated stated beliefs. Proc. Am. Soc. Inf. Sci. Technol. 2010, 47, 1–10.
- Soll, J.; White, J.B.; Sitrin, S.S.; Carly.; Gerstein, B.M. The Long and Brutal History of Fake News. Politico Magazine, 18 December 2016.
- Shu, K.; Sliva, A.; Wang, S.; Tang, J.; Liu, H. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explor. Newsl. 2017, 19, 22–36.
- Schonfeld, E. Citizen “Journalist” Hits Apple Stock with False (Steve Jobs) Heart Attack Rumor. 2008. Available online: https://techcrunch.com/2008/10/03/citizen-journalist-hits-apple-stock-with-falsesteve-jobs-heart-attack-rumor (accessed on 15 May 2022).
- Zhou, X.; Zafarani, R. Network-based fake news detection: A pattern-driven approach. ACM SIGKDD Explor. Newsl. 2019, 21, 48–60.
- Nassif, A.B.; Elnagar, A.; Elgendy, O.; Afadar, Y. Arabic fake news detection based on deep contextualized embedding models. Neural Comput. Appl. 2022, 34, 16019–16032.
- Alotaibi, F.L.; Alhammad, M.M. Using a Rule-based Model to Detect Arabic Fake News Propagation during COVID-19. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 112–119.
- Alabrah, A.; Alawadh, H.M.; Okon, O.D.; Meraj, T.; Rauf, H.T. Gulf countries’ citizens’ acceptance of COVID-19 vaccines—A machine learning approach. Mathematics 2022, 10, 467.
- Zhang, X.; Ghorbani, A.A. An overview of online fake news: Characterization, detection, and discussion. Inf. Process. Manag. 2020, 57, 102025.
- Muaad, A.Y.; Jayappa Davanagere, H.; Benifa, J.; Alabrah, A.; Naji Saif, M.A.; Pushpa, D.; Al-Antari, M.A.; Alfakih, T.M. Artificial intelligence-based approach for misogyny and sarcasm detection from Arabic texts. Comput. Intell. Neurosci. 2022, 2022, 7937667.
- Kumar, A.; Singh, J.P.; Singh, A.K. COVID-19 Fake News Detection Using Ensemble-Based Deep Learning Model. IT Prof. 2022, 24, 32–37.
- Mughaid, A.; Al-Zu’bi, S.; Al Arjan, A.; Al-Amrat, R.; Alajmi, R.; Zitar, R.A.; Abualigah, L. An intelligent cybersecurity system for detecting fake news in social media websites. Soft Comput. 2022, 26, 5577–5591.
- Gumaei, A.; Al-Rakhami, M.S.; Hassan, M.M.; De Albuquerque, V.H.C.; Camacho, D. An effective approach for rumor detection of Arabic tweets using extreme gradient boosting method. Trans. Asian Low-Resour. Lang. Inf. Process. 2022, 21, 1–16.
- Amer, E.; Kwak, K.S.; El-Sappagh, S. Context-Based Fake News Detection Model Relying on Deep Learning Models. Electronics 2022, 11, 1255.
- Lasotte, Y.; Garba, E.; Malgwi, Y.; Buhari, M. An Ensemble Machine Learning Approach for Fake News Detection and Classification Using a Soft Voting Classifier. Eur. J. Electr. Eng. Comput. Sci. 2022, 6, 1–7.
- Safaya, A.; Abdullatif, M.; Yuret, D. KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, International Committee for Computational Linguistics, Barcelona, Spain, 12–13 December 2020; pp. 2054–2059.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Casola, S.; Lavelli, A. FBK@ SMM4H2020: RoBERTa for detecting medications on Twitter. In Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task, Barcelona, Spain, 12–13 December 2020; pp. 101–103.
- Staliūnaitė, I.; Iacobacci, I. Compositional and lexical semantics in RoBERTa, BERT and DistilBERT: A case study on CoQA. arXiv 2020, arXiv:2009.08257.
- Abadeer, M. Assessment of DistilBERT performance on named entity recognition task for the detection of protected health information and medical concepts. In Proceedings of the 3rd Clinical Natural Language Processing Workshop, Online, 19 November 2020; pp. 158–167.
- Mozafari, J.; Fatemi, A.; Moradi, P. A method for answer selection using DistilBERT and important words. In Proceedings of the 2020 6th International Conference on Web Research (ICWR), Tehran, Iran, 22–23 April 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 72–76.
- Assaf, R. Arabic Fake News Dataset. Available online: https://github.com/RashaAssaf/fake_news_Dtaset (accessed on 15 May 2022).
- Assaf, R.; Saheb, M. Dataset for Arabic Fake News. In Proceedings of the 2021 IEEE 15th International Conference on Application of Information and Communication Technologies (AICT), Baku, Azerbaijan, 13–15 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–4.
Acronyms | Description |
---|---|
AUC | Area under curve |
BERT | Bidirectional encoder representations from transformers |
CNN | Convolutional neural network |
DL | Deep learning |
GRU | Gated recurrent unit |
LR | Learning rate |
LSTM | Long short-term memory |
ML | Machine learning |
NLP | Natural language processing |
SVM | Support vector machine |
TF-IDF | Term frequency–inverse document frequency |
References | Dataset | Methods | Results |
---|---|---|---|
[18] | Misogyny, sarcasm | BERT and other classical ML methods | Misogyny: binary = 91%, multi = 89%; sarcasm: binary = 88%, multi = 77% |
[19] | Twitter data of COVID-19 tag news | Ensemble DL model | Weighted F1 score = 0.99 |
[20] | Different datasets | Website ranking + TF-IDF scores used to calculate the proposed news’ accuracy | - |
[21] | Arabic Twitter data of rumors and nonrumors | Content-, user-, and topic-based features with machine learning classifiers | Accuracy using XGBoost = 97.18% |
[22] | ISOT | Statistical, contextual features with machine learning classifiers, BERT, GRU, and LSTM | Highest accuracies: GRU = 0.988, LSTM = 0.991 |
[23] | Kaggle dataset | Soft voting classifier, SVM, LR, Naïve Bayesian, and GridSearchCV optimization | Highest accuracy, achieved by the ensemble method = 93% |
Parameters | Value |
---|---|
Batch size | 20 |
Adam optimizer learning rate | 0.00005 |
Adam optimizer Epsilon | 1 × 10 |
Hidden size | 256 |
Maximal length | 280 |
Epochs | 100 |
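To make the role of these settings concrete, the following is a minimal sketch that collects the tabulated values in a configuration object and shows how a maximal length of 280 interacts with an attention mask. The names `MiniBertConfig` and `pad_with_mask`, the padding id of 0, and the example token ids are assumptions made for illustration; they are not taken from the authors' implementation. The Adam epsilon is omitted because its exponent is truncated in the table.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MiniBertConfig:
    """Hyperparameters copied from the table above (names are hypothetical)."""
    batch_size: int = 20
    learning_rate: float = 5e-5  # Adam optimizer learning rate (0.00005)
    hidden_size: int = 256
    max_length: int = 280        # maximal sequence length in tokens
    epochs: int = 100
    # Adam epsilon is intentionally left out: the table value is truncated.


def pad_with_mask(
    token_ids: List[int], max_length: int, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Truncate or pad token ids to max_length.

    The returned mask holds 1 for real tokens and 0 for padding, so an
    attention layer can ignore the padded positions. pad_id=0 is an
    assumption, not a value stated in the source.
    """
    ids = token_ids[:max_length]
    mask = [1] * len(ids)
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


config = MiniBertConfig()
# Hypothetical token ids for a short sentence.
input_ids, attention_mask = pad_with_mask([101, 2054, 2003, 102], config.max_length)
```

Every sequence in a batch then has the same length, and the number of ones in `attention_mask` equals the number of real tokens kept after truncation.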
Dataset | Reliable | Unreliable | Total |
---|---|---|---|
Original | 100 | 222 | 322 |
Augmented | 200 | 444 | 644 |
Splits | Methods | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|---|
70/30 | Decision tree | 84.97 | 86.76 | 91.47 | 89.05 |
70/30 | Random forest | 83.93 | 82.66 | 96.12 | 88.88 |
70/30 | Naïve Bayesian | 73.57 | 72.15 | 98.44 | 83.27 |
70/30 | Linear support vector | 83.93 | 86.56 | 89.92 | 88.21 |
80/20 | Decision tree | 93.02 | 97.59 | 92.04 | 94.73 |
80/20 | Random forest | 96.12 | 97.70 | 96.59 | 97.14 |
80/20 | Naïve Bayesian | 86.04 | 83.01 | 100 | 90.72 |
80/20 | Linear support vector | 94.57 | 97.64 | 94.31 | 95.95 |
90/10 | Decision tree | 98.43 | 100 | 97.5 | 98.73 |
90/10 | Random forest | 98.43 | 100 | 97.5 | 98.73 |
90/10 | Naïve Bayesian | 79.68 | 76.47 | 97.5 | 85.71 |
90/10 | Linear support vector | 96.87 | 100 | 95.0 | 97.43 |
Splits | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
70/30 | 87.04 | 88.23 | 93.02 | 90.56 |
80/20 | 96.12 | 97.70 | 96.59 | 97.14 |
90/10 | 98.43 | 100 | 97.5 | 98.73 |
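The F1 scores reported for the classifiers can be cross-checked against the precision and recall columns, assuming the standard definition of F1 as the harmonic mean of precision and recall. The sketch below is an illustration of that check, not the authors' evaluation code; the function name `f1_from_pr` and the `reported` list are introduced here, and the numbers are copied from the Mini-BERT and ML result tables.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, precision, recall, reported F1), values taken from the tables.
reported = [
    ("Mini-BERT 70/30", 88.23, 93.02, 90.56),
    ("Mini-BERT 80/20", 97.70, 96.59, 97.14),
    ("Mini-BERT 90/10", 100.0, 97.5, 98.73),
    ("Decision tree 70/30", 86.76, 91.47, 89.05),
    ("Naive Bayesian 80/20", 83.01, 100.0, 90.72),
]

for label, precision, recall, f1_reported in reported:
    f1 = f1_from_pr(precision, recall)
    # Differences of a few thousandths are due to rounding in the tables.
    print(f"{label}: computed F1 = {f1:.2f}, reported F1 = {f1_reported:.2f}")
```

Under this assumption, the recomputed values agree with the reported F1 scores to two decimal places for the rows listed.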
References | Purpose | Methods | Results |
---|---|---|---|
[21] | Arabic Twitter data of rumors and nonrumors | Content-, user-, and topic-based features with ML classifiers | Accuracy using XGBoost = 97.18% |
[18] | Misogyny, sarcasm-based classification | BERT and other classical ML methods | Misogyny: binary = 91%, multi = 89%; sarcasm: binary = 88%, multi = 77% |
Proposed study | Arabic-language-based fake news detection | Mini-BERT | Accuracy = 98.43%, F1 score = 98.73% |
Proposed study | Arabic-language-based fake news detection | ML classifiers | Best-performing: random forest and decision tree, accuracy = 98.43%, F1 score = 98.73% |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alawadh, H.M.; Alabrah, A.; Meraj, T.; Rauf, H.T. Attention-Enriched Mini-BERT Fake News Analyzer Using the Arabic Language. Future Internet 2023, 15, 44. https://doi.org/10.3390/fi15020044