Mathematics
  • Article
  • Open Access

21 September 2024

Pre-Trained Language Model Ensemble for Arabic Fake News Detection

Information Technology Department, College of Computer and Information Sciences, King Saud University, P.O. Box 145111, Riyadh 4545, Saudi Arabia
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Data Mining and Machine Learning in the Era of Big Knowledge and Large Models

Abstract

Fake news detection (FND) remains a challenge due to its vast and varied sources, especially on social media platforms. While numerous attempts have been made by academia and industry to develop fake news detection systems, research on Arabic content remains limited. This study investigates transformer-based language models for Arabic FND. While transformer-based models have shown promising performance in various natural language processing tasks, they often struggle with tasks involving complex linguistic patterns and cultural contexts, resulting in unreliable performance and misclassification problems. To overcome these challenges, we investigated an ensemble of transformer-based models. We experimented with five Arabic transformer models: AraBERT, MARBERT, AraELECTRA, AraGPT2, and ARBERT. Various ensemble approaches, including a weighted-average ensemble, hard voting, and soft voting, were evaluated to determine the most effective techniques for boosting learning models and improving prediction accuracy. The results of this study demonstrate the effectiveness of ensemble models in significantly boosting baseline model performance. An important finding is that the ensemble models achieved excellent performance on the Arabic Multisource Fake News Detection (AMFND) dataset, reaching an F1 score of 94% using weighted averaging. Moreover, changing the number of models in the ensemble has only a slight effect on performance. These key findings contribute to the advancement of fake news detection in Arabic, offering valuable insights for both academia and industry.

1. Introduction

With the rise of online communication, the ability to share information has increased tremendously and surpassed authorized entities. Anyone, anywhere, can easily participate in generating and disseminating information. This comes with the significant consequence of the widespread proliferation of fake news. The sheer volume of information constantly streaming online makes manual verification nearly impossible. Traditional methods of addressing fake news involved employing journalists and editorial teams to verify information against reliable sources before publication. This process, however, proved costly and time consuming, especially when dealing with the vast volume of information requiring verification against a limited number of fact-checkers. This challenge has become particularly evident during critical events, such as the COVID-19 pandemic [1].
Recent research [2,3,4,5,6] has explored utilizing machine learning algorithms to address the shortcomings of manual fact-checking. However, these approaches often face challenges like overfitting, scalability limitations, and difficulties in generalizing to new situations. Additionally, due to their statistical nature, machine learning models might struggle to understand the contextual nuances of continuously evolving fake news content. While the development of deep learning models has been effective in predicting the validity of news, training these models from scratch can be time-consuming and requires large quantities of labeled data.
Pre-trained language (transformer-based) models (PLMs) are a type of language model trained using a self-supervised learning (SSL) approach on large-scale corpora [7]. These models can then be fine-tuned for specific downstream tasks such as text classification, named entity recognition, and text summarization. Pre-trained language models like BERT have demonstrated promising performance in various NLP tasks, including text classification [8]. These models require less data for training due to their prior exposure to large unlabeled datasets. The remarkable achievements of transformer-based approaches have encouraged the development of Arabic transformers such as Arabic Bidirectional Encoder Representations from Transformers (AraBERT) [9], the Arabic Generative Pre-trained Transformer version 2 (AraGPT2) [10], Arabic Efficiently Learning an Encoder that Classifies Token Replacements Accurately (AraELECTRA) [11], ARBERT, and Multilingual Arabic Bidirectional Encoder Representations from Transformers (MARBERT) [12], which have enabled significant progress in various NLP tasks. These techniques are widely used in text classification [13], where they have been employed to capture the relationships between labels in multilabel classification through attention. In the field of fake news detection, numerous efforts have been made to address the spread of fake news and enhance the accuracy of detection models. A recent survey [14] reviewed several studies that proposed solutions and developed fake news detection methods. A comparative analysis of these methods was reported in terms of feature-extraction techniques and classification accuracy. The study demonstrated the superior performance of contextually dependent models, particularly transformer models, which consistently outperform other models. Additionally, research suggests that combining multiple feature representation and classification models can yield better performance and reduce computational costs.
In the context of Arabic FND, pre-trained language models have shown significant progress recently, despite challenges like overfitting. Ensemble learning methods can address such issues by leveraging the strengths of multiple transformer models. Ensemble learning is a machine-learning technique that improves the accuracy of poorly performing models by combining several models into one optimal model [15]. This approach emerged with the introduction of machine learning and weak learners and continues to be applied to deep learning networks to handle the complexity of data and to enhance accuracy for tasks like speech processing, image recognition, and text classification. By combining predictions from various models, this approach mitigates overfitting and utilizes the diverse knowledge acquired by each individual model [16]. According to a recent literature review [17], ensemble learning has been widely used in the field of fault diagnosis and classification. In their comparative analysis, the authors of [17] compare fault diagnosis models that used different ensemble methods, including bagging, boosting, stacking, and other ensemble models. The evaluation results demonstrate the promise of ensemble learning strategies for fault diagnosis, outperforming single-model approaches.
In previous works, the focus has not been on utilizing ensemble learning techniques with Arabic transformer models specifically for the task of Arabic fake news detection (FND). While individual transformer models, such as AraBERT and MARBERT, have demonstrated strong performance in various natural language processing (NLP) tasks, their individual weaknesses and limitations—such as domain biases or overfitting—can hinder their ability to generalize effectively in fake news detection. Ensemble learning, which combines multiple models, has the potential to overcome these challenges by leveraging the diverse strengths of different transformers, thus providing a more robust and accurate solution. However, the use of ensemble techniques for Arabic FND has not been thoroughly investigated, particularly in the context of combining multiple transformer architectures.
This study aims to address this gap by developing an ensemble of state-of-the-art Arabic pre-trained transformer models with the goal of advancing Arabic fake news detection. By integrating multiple pre-trained models, such as AraBERT, MARBERT, and other transformer-based models, into an ensemble framework, we aim to enhance model performance by reducing individual model uncertainties and improving generalization across various types of fake news. In addition, we combine several standard Arabic fake news datasets into a unified dataset, representing a wide range of domains, such as politics, health, and social issues, and experiment with various ensemble learning techniques, including weighted averaging, hard voting, and soft voting. In particular, the contributions of this study are as follows:
  • A novel ensemble strategy using state-of-the-art Arabic pre-trained models: We propose and evaluate an ensemble strategy that incorporates state-of-the-art Arabic pre-trained models (e.g., AraBERT and MARBERT) for fake news detection. The ensemble leverages techniques such as weighted averaging, hard voting, and soft voting to enhance the overall performance by combining the strengths of these models. The weights assigned to each model in the ensemble are not fixed or predefined; they are dynamically optimized through a search algorithm that identifies the best combination of models to maximize the performance metrics (e.g., accuracy) on the validation set (a minimal sketch of this weight search follows the list).
  • A Unified Arabic Multisource Fake News Dataset (AMFND): We create a comprehensive dataset called AMFND by merging several Arabic fake news datasets from different sources, enabling the model to generalize better across various types of misinformation and Arabic dialects. The dataset is curated from various online sources across different domains (e.g., politics, health, sports, and social issues). Unlike many fake news detection datasets that focus primarily on political misinformation, our approach ensures that the model is exposed to different types of fake news, each with distinct language patterns and structures.
  • The employment of different fine-tuning strategies: we fine-tune our model using both the traditional 80/20 train–test split and cross-validation techniques, ensuring the model is thoroughly evaluated and tuned for generalizability across different fake news contexts.
  • An extensive baseline comparison and performance evaluation: we rigorously compare the performance of the proposed ensemble models against a strong baseline, demonstrating significant improvements in fake news detection accuracy, precision, recall, and F1 score using the ensemble of pre-trained models.
  • Given the unique linguistic challenges posed by Arabic fake news detection, such as the complexity of morphological structures and the scarcity of relevant datasets, applying ensemble techniques of pre-trained language models offers a novel contribution to the field. The ensemble method proposed in this paper leverages the strengths of multiple models to address the specific challenges within the Arabic context, which has been relatively underexplored compared to other languages.
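As referenced in the first contribution above, the weights are found by a search over candidate combinations. The paper does not specify the exact search algorithm, so the sketch below is only one plausible realization: a coarse grid search over convex weight vectors that keeps the combination maximizing validation accuracy. The function name, array shapes, and grid resolution are illustrative assumptions.

```python
import itertools
import numpy as np

def search_ensemble_weights(val_probs, val_labels, step=0.1):
    """Coarse grid search for ensemble weights (a sketch; the paper's exact
    search algorithm is not specified).

    val_probs:  (n_models, n_samples, n_classes) softmax outputs on validation data
    val_labels: (n_samples,) gold labels
    """
    n_models = val_probs.shape[0]
    grid = np.round(np.arange(0.0, 1.0 + step, step), 10)
    best_weights, best_acc = None, -1.0
    # Enumerate weight vectors on a coarse simplex grid (entries sum to 1).
    for combo in itertools.product(grid, repeat=n_models):
        if not np.isclose(sum(combo), 1.0):
            continue
        weighted = np.tensordot(combo, val_probs, axes=1)  # (n_samples, n_classes)
        acc = (weighted.argmax(axis=1) == val_labels).mean()
        if acc > best_acc:
            best_acc, best_weights = acc, np.array(combo)
    return best_weights, best_acc
```

With five models and a step of 0.1, the grid stays small enough to enumerate exhaustively; a finer step or a randomized search would trade speed for resolution.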

3. Methodology

In this study, our aim was to investigate to what extent an ensemble of transformer models can improve the performance of Arabic FND. The general approach we adopted is a standard machine learning approach (fine-tuning PLMs), as depicted in Figure 1. Here, the datasets were combined, cleaned, and pre-processed. Then, the PLMs were fine-tuned and evaluated to develop the final model. We approached this investigation by building an ensemble of transformer models using standard datasets (details of the datasets are shown in Table 2). Two of the datasets were each collected from a single source written in Modern Standard Arabic (MSA). The third, ArCOV19-Rumors, is the only dataset that contains news from Twitter, which typically contains dialectal Arabic (DA). Research has shown that models trained on MSA datasets do not work well on dialectal datasets [37]. There is growing interest in training models on large, diverse datasets in order to achieve better results and reduce generalization errors. Therefore, in this study we combined all three datasets into one dataset, AMFND, to study its impact on the FND task.
Figure 1. General approach to FND.
Table 2. Details of Arabic FND datasets used in this study.

3.1. Arabic Multisource Fake News Dataset (AMFND)

We combine the three Arabic datasets, AraNews, AFND, and ArCOV19-Rumors, into one large dataset, AMFND. Combining multiple datasets significantly enhances the overall size and diversity of the data, which plays a crucial role in improving generalization ability and enables machine learning models to capture a wide range of patterns and features for detecting fake news. To handle the diversity of labels, we first transformed the labels into one standard schema, in our case, a binary classification (fake or real). For example, the AFND dataset contained multiple labels (credible, not credible, and undecided); we mapped these onto fake and real labels and removed items with the third label, 'undecided', as the objective of our task was binary classification. After normalizing the labels, we removed unnecessary columns and kept only two, one for the text and one for the label. Finally, a unique ID was assigned to each news item. The three datasets were then merged sequentially to construct the final dataset. The resulting AMFND dataset consists of the attributes News_ID, News_Text, and Label, where Label is the normalized category of news.
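To make the merging procedure concrete, here is a minimal pandas sketch. The file paths, column names, and raw label strings are hypothetical, since the paper does not list them; only the logic follows the description above (map each source's labels onto the binary schema, drop 'undecided' items, keep two columns, merge sequentially, and assign a unique ID).

```python
import pandas as pd

def load_and_normalize(path, text_col, label_col, label_map):
    """Load one source dataset and map its labels onto the binary schema.
    Rows whose label is not in label_map (e.g., AFND's 'undecided') are dropped."""
    df = pd.read_csv(path)[[text_col, label_col]]
    df.columns = ["News_Text", "Label"]
    df["Label"] = df["Label"].map(label_map)
    return df.dropna(subset=["Label"])

frames = [
    # File paths, column names, and raw label strings below are hypothetical.
    load_and_normalize("aranews.csv", "text", "label",
                       {"fake": "fake", "real": "real"}),
    load_and_normalize("afnd.csv", "text", "credibility",
                       {"not credible": "fake", "credible": "real"}),
    load_and_normalize("arcov19_rumors.csv", "tweet", "class",
                       {"false": "fake", "true": "real"}),
]
# Merge sequentially and assign a unique ID to each news item.
amfnd = pd.concat(frames, ignore_index=True)
amfnd.insert(0, "News_ID", amfnd.index)
```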
The resulting AMFND dataset is composed of 56,769 items classified as fake or real news. The labels were imbalanced (59% of the dataset was fake and 41% was real). An imbalanced dataset can lead to poor model performance and a bias toward one label. To alleviate this problem, we applied a resampling strategy, specifically undersampling, whereby the majority class was reduced until it was balanced with the distribution of the other class.
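A minimal sketch of this undersampling step, continuing from the merged amfnd frame above; the random seed is an arbitrary choice, not a detail from the paper.

```python
# Randomly undersample the majority ('fake') class until both classes match.
real = amfnd[amfnd["Label"] == "real"]
fake = amfnd[amfnd["Label"] == "fake"].sample(n=len(real), random_state=42)
amfnd_balanced = (pd.concat([real, fake])
                    .sample(frac=1.0, random_state=42)  # shuffle the merged rows
                    .reset_index(drop=True))
```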

3.2. Text Pre-Processing

Text pre-processing is a challenging task for data scientists and NLP researchers, especially when performed on datasets scraped from social networks, which contain a lot of noise and unnecessary elements. The cleaning procedures were implemented using regular expressions (regex) and existing Python libraries such as NLTK. We performed the following pipeline on all three datasets before combining them.
Cleaning pipeline
  • Remove punctuation, diacritics, special characters, and non-Arabic text. Diacritics are vowel marks; since undiacritized Arabic text provides no information about pronunciation, their main purpose is to serve as a phonetic guide.
  • Remove duplicated sentences and drop missing or corrupted values.
  • Normalize the Arabic text (transform it to a unified form and remove elongation, which is used to communicate a long vowel pronunciation) and remove stop words.
  • Remove hyperlinks, emojis, and Twitter handles (a code sketch of this cleaning pipeline follows the list).
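The sketch below implements the cleaning steps with regular expressions and NLTK, as the text describes. The exact patterns used in the study are not published, so the Unicode ranges for diacritics and the NLTK Arabic stop-word list are reasonable assumptions rather than the authors' code.

```python
import re
from nltk.corpus import stopwords  # requires nltk.download("stopwords") once

DIACRITICS = re.compile(r"[\u0610-\u061A\u064B-\u065F\u0670]")  # tashkeel marks
TATWEEL = "\u0640"                                              # elongation character
ARABIC_STOPWORDS = set(stopwords.words("arabic"))

def clean_arabic(text):
    text = re.sub(r"http\S+|www\.\S+", " ", text)    # hyperlinks
    text = re.sub(r"@\w+", " ", text)                # Twitter handles
    text = DIACRITICS.sub("", text)                  # diacritics
    text = text.replace(TATWEEL, "")                 # elongation
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)  # punctuation, emojis, non-Arabic
    words = [w for w in text.split() if w not in ARABIC_STOPWORDS]
    return " ".join(words)
```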
Pre-processing pipeline
  • Apply text pre-segmentation using the Farasa segmenter [38], as required by the official documentation of some of the transformers. Word segmentation involves breaking words into their constituent clitics.
  • Employ the transformer tokenizer. Tokenizers are essential tools in machine learning, especially in natural language processing (NLP). They break down a text into smaller units called tokens. These tokens can be words, sub-words, or characters. A tokenizer is in charge of preparing the inputs for a specific model.
  • Perform padding and truncation to handle sequence-length inconsistencies. To avoid variable sequence lengths, we used padding and truncation strategies to ensure that all sequences had the same length that the model could accept, typically adjusted to a maximum of 512 input tokens. Padding handles short sequences by appending special tokens like [PAD] after the last token in the sentence so that all sentences have equal token lengths. Truncation works differently, removing the end of long sequences to avoid mismatched sequence lengths and to reduce the computational cost of processing long sequences (a tokenization sketch follows this list).
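A tokenization sketch under these settings is shown below. The AraBERT v1 checkpoint name on the Hugging Face hub is an assumption (each model in Table 4 would use its own checkpoint), and Farasa pre-segmentation, for the models that require it, is assumed to have been applied to the texts already.

```python
from transformers import AutoTokenizer

# AraBERT v1 checkpoint (assumed hub name); MARBERT/ARBERT would use their own.
tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabert")

encoded = tokenizer(
    texts,                  # list of cleaned (and, where required, segmented) strings
    padding="max_length",   # append [PAD] tokens so every sequence has equal length
    truncation=True,        # drop the tail of sequences longer than max_length
    max_length=512,
    return_tensors="pt",    # PyTorch tensors: input_ids and attention_mask
)
```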
Table 3 demonstrates the pre-processing steps applied to Arabic fake news using two examples prior to their input into a BERT-based model (e.g., AraBERT). Each text was tokenized into token IDs, with the special tokens [CLS] and [SEP] added by the BERT tokenizer. The table also shows how the text input was normalized, removing diacritics and standardizing certain characters, and shows the text input after removing punctuation and non-alphabetic characters. For padding and truncation, Example 1 required both truncation and padding, whereas Example 2 required only padding. The input representation section shows how the text input was prepared for the model, including the token IDs and attention masks.
Table 3. Text-processing examples for the AMFND dataset.

3.3. Model Fine-Tuning and Ensemble

Five transformer models were used in our study: AraBERT v1, AraGPT2, AraELECTRA, MARBERT, and ARBERT. Details of these models are presented in Table 4. They were fine-tuned for Arabic FND using the three individual datasets, as well as the combined AMFND dataset.
Table 4. Details of Arabic transformer models used in this study.
For model evaluation, we performed two resampling approaches. The first was random sampling, in which we split the dataset into three sets, training–validation–test, with a size ratio of 80:10:10, respectively. Additionally, we used k-fold cross validation (k = 5) to examine model performance. Evaluating a model using cross validation usually helps to achieve a more generalized machine learning classifier and avoids overfitting during the training process [39,40]. We compared the results of each evaluation approach and selected the best results (both setups are sketched below).
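The two resampling setups might look as follows, assuming the balanced frame from Section 3.1; the scikit-learn utilities and the fixed seed are assumptions, not details from the paper.

```python
from sklearn.model_selection import StratifiedKFold, train_test_split

# 80:10:10 train/validation/test split, stratified on the label.
train_df, rest = train_test_split(amfnd_balanced, test_size=0.2,
                                  stratify=amfnd_balanced["Label"], random_state=42)
val_df, test_df = train_test_split(rest, test_size=0.5,
                                   stratify=rest["Label"], random_state=42)

# 5-fold cross validation over the full dataset.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(amfnd_balanced["News_Text"],
                                    amfnd_balanced["Label"]):
    pass  # fine-tune on train_idx, evaluate on val_idx
```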
For ensemble methods, the voting ensemble is the most commonly used method in the literature, due to its superior performance in classification and regression tasks [41]. Essentially, a voting ensemble collects the prediction of each model for a given input and predicts the label that receives the majority vote. There are two voting policies: hard voting and soft voting. Hard voting tallies the class votes cast by each member model and selects the majority class. Soft voting, on the other hand, averages the class probabilities produced by the member models and predicts the class with the highest average probability (both policies are sketched below).
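Given per-model class probabilities, the two policies can be sketched as follows; the array shapes and function names are illustrative.

```python
import numpy as np

def hard_vote(probs):
    """probs: (n_models, n_samples, n_classes). Each model casts one label vote
    per sample; the majority label wins (ties resolve to the lower class index)."""
    votes = probs.argmax(axis=2)                      # (n_models, n_samples)
    n_classes = probs.shape[2]
    return np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_classes).argmax(), 0, votes)

def soft_vote(probs):
    """Average the class probabilities across models, then take the argmax."""
    return probs.mean(axis=0).argmax(axis=1)          # (n_samples,)
```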

4. Experiments

For model fine-tuning, we used the transformers library from the Hugging Face repository to fine-tune the pre-trained models. We ran the training process using the PyTorch library [42], which supports hardware accelerators such as GPUs and efficient code debugging. We implemented segmentation using the Farasa library [38] for all models whose official repositories require it as a pre-processing step, such as AraBERT, AraELECTRA, and AraGPT2. For tokenization, we used the AutoTokenizer class from the transformers library, which covers all the models used for training. Using the Trainer API from the Hugging Face library, we specified the training parameters and the model hyperparameters. For the training setting, we fine-tuned all the models for 3 to 5 epochs with a learning rate of 2 × 10−5 and an Adam optimizer epsilon of 1 × 10−8. The evaluation strategy was set according to the number of iterations, and the best model was loaded at the end of the training process (a configuration sketch is shown below). The models were evaluated using the macro F1 score, precision, recall, and accuracy in both the training and testing procedures. For the ensemble experiment, we loaded the trained model weights and combined the predictions of the considered models on the test data using the ensemble methods.
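The sketch below reproduces the stated settings with the Hugging Face Trainer API. The output directory, the train_ds/val_ds dataset objects, and the compute_metrics helper are hypothetical placeholders, and reading "Adam optimizer equal to 1 × 10−8" as the Adam epsilon is our interpretation.

```python
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained(
    "aubmindlab/bert-base-arabert", num_labels=2)  # binary: fake vs. real

args = TrainingArguments(
    output_dir="arabert-amfnd",    # hypothetical path
    num_train_epochs=3,            # the paper fine-tunes for 3 to 5 epochs
    learning_rate=2e-5,
    adam_epsilon=1e-8,             # the stated Adam optimizer epsilon
    evaluation_strategy="steps",   # evaluate every fixed number of iterations
    save_strategy="steps",
    load_best_model_at_end=True,   # reload the best checkpoint after training
    metric_for_best_model="f1",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds,           # hypothetical tokenized datasets
                  eval_dataset=val_ds,
                  compute_metrics=compute_metrics)  # macro F1, precision, recall, accuracy
trainer.train()
```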
We first developed a baseline using a zero-shot setting. Next, we conducted two major experiments, as shown in Figure 2.
Figure 2. Fine-tuning each of the five Arabic transformer models on the combined AMFND dataset.
  • Experiment 1: Five Arabic transformers were fine-tuned on the combined dataset (AMFND).
  • Experiment 2: Each of the five Arabic transformer models were fine-tuned on the combined AMFND dataset. Then, an ensemble of five, three, and two transformers was created. We experimented with two kinds of ensemble strategies: the weighted-average ensemble and voting ensembles (hard and soft).

5. Results

The baseline results for all five models are shown in Table 5. Table 6 shows the performance results after fine-tuning on the combined AMFND dataset using a 5-fold cross validation. The results of experiment 2, fine-tuning each of the five Arabic transformer models on the combined AMFND dataset and then creating an ensemble, are shown in Table 7. Table 8 shows the performance results when the number of ensemble models was varied, depicting results using an ensemble of five, three, and two models.
Table 5. Performance of five baseline models (zero-shot); highest value in bold.
Table 6. Experiment 1—performance of models on the AMFND dataset using a 5-fold cross validation; highest value in bold.
Table 7. Experiment 2—performance of different ensemble methods using five transformer models on AMFND; highest value in bold.
Table 8. Performance results of different ensemble sizes (5/3/2); highest value in bold.

6. Discussion

From the experiments, we observed that training a model using the common 80–20% train–test ratio produces lower performance for all models compared to 5-fold cross validation. This can be explained by multiple factors, such as the variance in the dataset, which prevents the model from capturing a good representation from a single training sample. A second reason is the sequential integration of the individual datasets into one large dataset, which might bias the model toward a specific dataset. Regarding experiment 1 (Table 6), among all models, AraELECTRA achieved the best F1 score using the stratified 5-fold cross validation, with a score of 0.92, followed by the AraGPT2 model, which reported 0.90 using the same training strategy. On the other hand, AraBERT recorded the lowest accuracy score of 0.84 under various settings compared to MARBERT and ARBERT. Our intuition for the cause of this result is that AraBERT was trained on more structured and formal language corpora from Wikipedia, unlike our dataset, which contains a combination of MSA and dialectal texts.
With regard to experiment 2, we further conducted a comparative analysis of the ensemble model against state-of-the-art models reported in the literature, as shown in Table 9. We can observe that the ensemble model in this study is comparable to those in the literature. It outperformed some of the standalone models, reaching an F1 score of 94% using the weighted-average ensemble. The results show that composing predictions from multiple models yields the best overall performance if we give the well-performing models the largest weights, enabling them to contribute more to the ensemble result and, therefore, to obtain a better prediction. On the other hand, the voting ensembles perform poorly in general, with F1 scores of 88% and 90% for hard and soft voting, respectively. This is likely because voting aggregates the class votes or probabilities with equal contributions from all ensemble members in the final ensemble prediction.
Table 9. Comparison results for SOTA methods against the ensemble.
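In equation form (our notation, not taken from the paper), the weighted-average ensemble predicts

```latex
\hat{y} = \arg\max_{c} \sum_{i=1}^{M} w_i \, p_i(c \mid x),
\qquad w_i \ge 0, \quad \sum_{i=1}^{M} w_i = 1,
```

where p_i(c | x) is model i's probability for class c. Soft voting is the special case w_i = 1/M, and hard voting replaces the probabilities with one-hot label votes before taking the majority, which is why a strong member cannot contribute more than a weak one.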
Our experiments show how the number of ensembled models (two, three, and five) affects the performance of Arabic fake news detection (Table 8). Our results suggest that the number of models has no observed impact on improving the accuracy of the ensemble model. Therefore, increasing the number of models in the ensemble may not be helpful if the individual models produce similar accuracy results. For the first ensemble, we combined the predictions of five transformer models, AraBERT + MARBERT + ARBERT + AraELECTRA + AraGPT2, using the three ensemble methods, namely, the weighted-average ensemble, hard voting, and soft voting. The weighted-average ensemble achieved the best F1 score of 94%, whereas the results of hard and soft voting showed no improvement over the performance of the individual models used in the ensemble, reporting similar F1 scores of 86.6% and 86.7%, against the best standalone F1 result of 0.92 achieved by the AraELECTRA model. This is expected, because the voting outcome is determined by the majority vote of the classes predicted by the ensemble members, unlike the weighted-average ensemble, which aggregates all the predictions from each model, multiplied by weights assigned according to their performance. For this reason, if three out of five models make incorrect predictions, the voting ensemble will probably yield lower performance, since it considers all the models equally, meaning that each model in the ensemble has the same contribution to the prediction.
Similarly, we found that an ensemble of three models, consisting of AraBERT + MARBERT + AraELECTRA, yielded improvements in F1 results using all ensemble methods. It achieved performance similar to that of the five-model ensemble using the weighted-average ensemble, while the hard-voting and soft-voting results showed some improvement, with F1 = 0.88 and F1 = 0.87, respectively. As for the last ensemble variant, which comprised two models, AraELECTRA and MARBERT, the weighted-average ensemble reached 0.91 in terms of the F1 score, whereas for voting, the highest F1 value was obtained by soft voting, with F1 = 0.88, surpassing the hard-voting result of F1 = 0.85.
Although a perfect comparison is impossible given the disparity of models and resources, to enrich the discussion, we considered FND for other languages. For example, the study in [21] on deceptive text classification on an Italian cultural heritage dataset showed an accuracy of 81% and a micro F1 of 85% using the Sentita and Sentix lexicons joined together. This result is comparable to our ensemble-model results, although the latter show a noticeable improvement. In addition, the reported accuracy for end-to-end transformer models in a comparative study [14] reached 25% for the Liar dataset (multi-class), 100% for the ISOT dataset (binary), 94% for the COVID-19 dataset (binary), and 99% for the GM dataset (binary). These results are comparable to our work and other Arabic FND results reported in the literature.

7. Conclusions

In this work, we carried out multiple experiments aimed at enhancing Arabic FND by employing transformer-based ensemble models trained on a diverse dataset compiled by combining three publicly available sources. We developed a unified dataset, AMFND, by integrating multiple Arabic fake news datasets from diverse sources. This comprehensive dataset facilitates improved model generalizations across different forms of misinformation and various Arabic dialects. We further conducted a thorough evaluation of the proposed ensemble models against a baseline, showcasing substantial improvements in fake news detection performance across metrics such as accuracy, precision, recall, and F1 score, which were achieved through the use of the pre-trained model ensemble.
To the best of our knowledge, this is the first study to apply an ensemble of pre-trained language models specifically for Arabic FND. The performance of our ensemble models was assessed on the combined dataset, AMFND. Various ensemble methods, including different voting techniques, such as weighted averaging, hard voting, and soft voting, were implemented to aggregate predictions from several transformer models. Additionally, we investigated factors affecting ensemble performance and provided recommendations for the effective implementation of these techniques.
Our findings indicate that the best ensemble result was achieved by implementing the weighted-average ensemble. However, we do acknowledge that the performance of ensemble models can be influenced by factors related to the structure and implementation of an ensemble model. These factors include the performance of individual models in the ensemble and the number of ensemble models, in addition to the extent of diversity between the ensemble members. We observed that integrating an odd number of models can enhance the performance of an ensemble. Moreover, the ensemble strategy used to combine the prediction of individual models together can play an important role in the model performance.
Although constructing an ensemble from multiple transformer models showed improved performance, it is important to analyze the cost of combining more than one model compared to fine-tuning a single transformer model. The cost can be computed according to different factors, including computational resources, memory usage, development costs, energy consumption, and performance and accuracy. Regarding computational resources, ensembles require significantly more time to fine-tune multiple transformer models and high memory consumption for every transformer model used in the ensemble. However, despite the cost, an ensemble can provide better performance than a single model. Therefore, to leverage the advantages of an ensemble model, it is crucial to weigh cost efficiency against the superior performance.
While the discussed limitations highlight the complexities of Arabic fake news detection, they also unveil exciting opportunities for future research and improvements. Exploring the adaptation of fake news detection models to specific Arabic dialects is both promising and important. This research direction can significantly improve detection accuracy on social media by addressing the linguistic variations across different Arab regions. By accounting for dialectal differences, models can become more adept at identifying fake news content, leading to more robust Arabic FND. Another avenue for future work is addressing the challenge of data scarcity in Arabic; here, investigating cross-lingual transfer-learning techniques emerges as a promising solution. This approach involves leveraging existing, well-trained fake news detection models from resource-rich languages like English and transferring their knowledge to Arabic models. This strategy has the potential to significantly improve the state of the art in Arabic FND research, particularly in settings with limited annotated data.

Author Contributions

Conceptualization, L.A.-Z. and M.A.-Y.; methodology, L.A.-Z. and M.A.-Y.; software, L.A.-Z.; validation, L.A.-Z.; investigation, L.A.-Z. and M.A.-Y.; writing—original draft preparation, L.A.-Z.; writing—review and editing, M.A.-Y.; supervision, M.A.-Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data used in this study are publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Balakrishnan, V.; Ng, W.Z.; Soo, M.C.; Han, G.J.; Lee, C.J. Infodemic and fake news—A comprehensive overview of its global magnitude during the COVID-19 pandemic in 2021: A scoping review. Int. J. Disaster Risk Reduct. 2022, 78, 103144. [Google Scholar] [CrossRef] [PubMed]
  2. Verma, P.K.; Agrawal, P.; Amorim, I.; Prodan, R. WELFake: Word embedding over linguistic features for fake news detection. IEEE Trans. Comput. Soc. Syst. 2021, 8, 881–893. [Google Scholar] [CrossRef]
  3. Ramos, J. Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Citeseer, Los Angeles, CA, USA, 23–24 June 2003; pp. 29–48. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=b3bf6373ff41a115197cb5b30e57830c16130c2c (accessed on 10 June 2024).
  4. Khanam, Z.; Alwasel, B.N.; Sirafi, H.; Rashid, M. Fake news detection using machine learning approaches. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2021; p. 012040. Available online: https://iopscience.iop.org/article/10.1088/1757-899X/1099/1/012040/meta (accessed on 17 February 2024).
  5. Madani, M.; Motameni, H.; Roshani, R. Fake News Detection Using Feature Extraction, Natural Language Processing, Curriculum Learning, and Deep Learning. Int. J. Inf. Technol. Decis. Mak. 2023, 23, 1063–1098. [Google Scholar] [CrossRef]
  6. Hamed, S.K.; Ab Aziz, M.J.; Yaakub, M.R. Fake News Detection Model on Social Media by Leveraging Sentiment Analysis of News Content and Emotion Analysis of Users’ Comments. Sensors 2023, 23, 1748. [Google Scholar] [CrossRef]
  7. Min, B.; Ross, H.; Sulem, E.; Ben Veyseh, A.P.; Nguyen, T.H.; Sainz, O.; Agirre, E.; Heintz, I.; Roth, D. Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey. arXiv 2024, arXiv:2111.01243. [Google Scholar] [CrossRef]
  8. Wotaifi, T.A.; Dhannoon, B.N. Developed Models Based on Transfer Learning for Improving Fake News Predictions. JUCS J. Univers. Comput. Sci. 2023, 29, 491–507. [Google Scholar] [CrossRef]
  9. Antoun, W.; Baly, F.; Hajj, H. AraBERT: Transformer-based Model for Arabic Language Understanding. arXiv 2020, arXiv:2003.00104. [Google Scholar]
  10. Antoun, W.; Baly, F.; Hajj, H. AraGPT2: Pre-Trained Transformer for Arabic Language Generation. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 196–207. Available online: https://aclanthology.org/2021.wanlp-1.21 (accessed on 26 March 2022).
  11. Antoun, W.; Baly, F.; Hajj, H. AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 191–195. Available online: https://aclanthology.org/2021.wanlp-1.20 (accessed on 26 March 2022).
  12. Abdul-Mageed, M.; Elmadany, A.; Nagoudi, E.M.B. ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 7088–7105. [Google Scholar] [CrossRef]
  13. Wei, X.; Huang, J.; Zhao, R.; Yu, H.; Xu, Z. Multi-Label Text Classification Model Based on Multi-Level Constraint Augmentation and Label Association Attention. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2024, 23, 1–20. [Google Scholar] [CrossRef]
  14. Farhangian, F.; Cruz, R.M.; Cavalcanti, G.D. Fake news detection: Taxonomy and comparative study. Inf. Fusion 2024, 103, 102140. [Google Scholar] [CrossRef]
  15. Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 757–774. [Google Scholar] [CrossRef]
  16. Wang, H.; Li, J.; Wu, H.; Hovy, E.; Sun, Y. Pre-Trained Language Models and Their Applications. Engineering 2023, 25, 51–65. [Google Scholar] [CrossRef]
  17. Mian, Z.; Deng, X.; Dong, X.; Tian, Y.; Cao, T.; Chen, K.; Al Jaber, T. A literature review of fault diagnosis based on ensemble learning. Eng. Appl. Artif. Intell. 2024, 127, 107357. [Google Scholar] [CrossRef]
  18. de Beer, D.; Matthee, M. Approaches to Identify Fake News: A Systematic Literature Review. Integr. Sci. Digit. Age 2020, 136, 13–22. [Google Scholar] [CrossRef]
  19. Bovet, A.; Makse, H.A. Influence of fake news in Twitter during the 2016 US presidential election. Nat. Commun. 2019, 10, 7. [Google Scholar] [CrossRef] [PubMed]
  20. Abu Salem, F.K.; Al Feel, R.; Elbassuoni, S.; Ghannam, H.; Jaber, M.; Farah, M. Meta-learning for fake news detection surrounding the Syrian war. Patterns 2021, 2, 100369. [Google Scholar] [CrossRef]
  21. Guarasci, R.; Catelli, R.; Esposito, M. Classifying deceptive reviews for the cultural heritage domain: A lexicon-based approach for the Italian language. Expert Syst. Appl. 2024, 252, 124131. [Google Scholar] [CrossRef]
  22. Abonizio, H.Q.; de Morais, J.I.; Tavares, G.M.; Junior, S.B. Language-Independent Fake News Detection: English, Portuguese, and Spanish Mutual Features. Future Internet 2020, 12, 87. [Google Scholar] [CrossRef]
  23. Blackledge, C.; Atapour-Abarghouei, A. Transforming Fake News: Robust Generalisable News Classification Using Transformers. arXiv 2021, arXiv:2109.09796. [Google Scholar]
  24. Koloski, B.; Stepišnik-Perdih, T.; Pollak, S.; Škrlj, B. Identification of COVID-19 Related Fake News via Neural Stacking. In Communications in Computer and Information Science; Chakraborty, T., Shu, K., Bernard, H.R., Liu, H., Akhtar, M.S., Eds.; Springer International Publishing: Cham, Germany, 2021; Volume 1402, pp. 177–188. [Google Scholar] [CrossRef]
  25. De, A.; Desarkar, M.S. Multi-Context Based Neural Approach for COVID-19 Fake-News Detection. In Companion Proceedings of the Web Conference 2022, Virtual Event; ACML: Lyon, France, 2022; pp. 852–859. [Google Scholar] [CrossRef]
  26. De, A.; Bandyopadhyay, D.; Gain, B.; Ekbal, A. A Transformer-Based Approach to Multilingual Fake News Detection in Low-Resource Languages. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2021, 21, 1–20. [Google Scholar] [CrossRef]
  27. Das, S.D.; Basak, A.; Dutta, S. A Heuristic-driven Ensemble Framework for COVID-19 Fake News Detection. arXiv 2021, arXiv:2101.03545. [Google Scholar]
  28. Nagoudi, E.M.B.; Elmadany, A.; Abdul-Mageed, M.; Alhindi, T.; Cavusoglu, H. Machine Generation and Detection of Arabic Manipulated and Fake News. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain, 12 December 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 69–84. Available online: https://aclanthology.org/2020.wanlp-1.7 (accessed on 5 April 2022).
  29. Al-Yahya, M.; Al-Khalifa, H.; Al-Baity, H.; AlSaeed, D.; Essam, A. Arabic Fake News Detection: Comparative Study of Neural Networks and Transformer-Based Approaches. Complexity 2021, 2021, 5516945. [Google Scholar] [CrossRef]
  30. Harrag, F.; Debbah, M.; Darwish, K.; Abdelali, A. Bert Transformer model for Detecting Arabic GPT2 Auto-Generated Tweets. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain, 12 December 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 207–214. Available online: https://aclanthology.org/2020.wanlp-1.19 (accessed on 5 April 2022).
  31. Hussein, A.; Ghneim, N.; Joukhadar, A. DamascusTeam at NLP4IF2021: Fighting the Arabic COVID-19 Infodemic on Twitter Using AraBERT. In Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda, Online, 6 June 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 93–98. [Google Scholar] [CrossRef]
  32. Mahlous, A.R.; Al-Laith, A. Fake News Detection in Arabic Tweets during the COVID-19 Pandemic. Int. J. Adv. Comput. Sci. Appl. IJACSA 2021, 12, 30. [Google Scholar] [CrossRef]
  33. Ameur, M.S.H.; Aliane, H. AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News & Hate Speech Detection Dataset. Procedia Comput. Sci. 2021, 189, 232–241. [Google Scholar] [CrossRef]
  34. Ali, Z.S.; Mansour, W.; Elsayed, T.; Al-Ali, A. AraFacts: The First Large Arabic Dataset of Naturally Occurring Claims. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 231–236. Available online: https://aclanthology.org/2021.wanlp-1.26 (accessed on 29 March 2022).
  35. Haouari, F.; Ali, Z.S.; Elsayed, T. bigIR at CLEF 2019: Automatic Verification of Arabic Claims over the Web. In Proceedings of the Conference and Labs of the Evaluation Forum, Lugano, Switzerland, 9–12 September 2019. [Google Scholar]
  36. Alhindi, T.; Alabdulkarim, A.; Alshehri, A.; Abdul-Mageed, M.; Nakov, P. AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking. In Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda, Online, 6 June 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 57–65. [Google Scholar] [CrossRef]
  37. Kamr, A.M.; Mohamed, E. akaBERT at SemEval-2022 Task 6: An Ensemble Transformer-based Model for Arabic Sarcasm Detection. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), Online, 14–15 July 2022; Association for Computational Linguistics: Seattle, DC, USA, 2022; pp. 885–890. [Google Scholar] [CrossRef]
  38. Abdelali, A.; Darwish, K.; Durrani, N.; Mubarak, H. Farasa: A Fast and Furious Segmenter for Arabic. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, San Diego, CA, USA, 12–17 June 2016; DeNero, J., Finlayson, M., Reddy, S., Eds.; Association for Computational Linguistics: San Diego, CA, USA, 2016; pp. 11–16. [Google Scholar] [CrossRef]
  39. Li, X.; Xia, Y.; Long, X.; Li, Z.; Li, S. Exploring Text-transformers in AAAI 2021 Shared Task: COVID-19 Fake News Detection in English. arXiv 2021, arXiv:2101.02359. [Google Scholar]
  40. Vijjali, R.; Potluri, P.; Kumar, S.; Teki, S. Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking. arXiv 2020, arXiv:2011.13253. [Google Scholar]
  41. Mienye, I.D.; Sun, Y. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access 2022, 10, 99129–99149. [Google Scholar] [CrossRef]
  42. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Chintala, S. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv 2019, arXiv:1912.01703. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
