Article

BERTGuard: Two-Tiered Multi-Domain Fake News Detection with Class Imbalance Mitigation

by Mohammad Q. Alnabhan * and Paula Branco
School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Ave., Ottawa, ON K1N5N6, Canada
*
Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2024, 8(8), 93; https://doi.org/10.3390/bdcc8080093
Submission received: 5 July 2024 / Revised: 10 August 2024 / Accepted: 13 August 2024 / Published: 16 August 2024

Abstract

In an era where misinformation and fake news undermine social well-being, this work provides a complete approach to multi-domain fake news detection. Multi-domain news refers to diverse content across subject areas such as politics, health, science, crime, and social concerns. Recognizing the lack of systematic research in multi-domain fake news detection, we establish a foundation by combining datasets from several news domains. Our two-tiered detection approach, BERTGuard, starts with domain classification, which uses a BERT-based model trained on a combined multi-domain dataset to determine the domain of a given news piece. Domain-specific BERT models then evaluate the veracity of news within each designated domain, ensuring precision and reliability tailored to each domain’s unique characteristics. Rigorous testing on previously unseen datasets from critical life areas such as politics, health, science, crime, and society demonstrates the system’s performance and generalizability. To address the class imbalance challenges inherent in combining datasets, our study rigorously evaluates their impact on detection accuracy and explores handling alternatives: random oversampling, random undersampling, and class weight adjustment. These provide baselines for comparison, fortifying the detection system against the complexities of imbalanced datasets.

1. Introduction

Social media has become a primary source of news that is readily accessible to people worldwide. Today, sharing news content on social media platforms is as simple as clicking a button, leading to a constant stream of news from various domains being uploaded daily [1]. Unfortunately, the ease of uploading news content coincides with the increased spread of fake news. This poses a major problem in terms of reliability amongst consumers and can have negative impacts, not only on individuals but also on social mobility, potentially inciting widespread social panic [2]. In 2020, it was reported that one in every five U.S. adults relied on social media as their primary source for news about the 2020 U.S. presidential election [3]. Therefore, detecting fake news is essential for assessing the truthfulness of the content being consumed. Weibo, previously known as Sina Weibo (http://www.weibo.com/, accessed on 25 June 2024), and X (formerly Twitter) (http://www.X.com/, accessed on 25 June 2024) are among the primary platforms used by social media users to obtain news. According to Weibo’s annual report, during the year 2020, a total of 76,107 fake news posts were detected and cleared. Similarly, a total of 8,711,000 engagements with Facebook stories were detected regarding the 2016 U.S. presidential election campaign, in comparison with a total of 7,367,000 engagements for the top 20 analyzed election stories on 19 major news websites (https://www.buzzfeednews.com/article/craigsilverman/viral-fake-election-news-outperformed-real-news-on-facebook, accessed on 25 June 2024).
While human judgment remains effective for fake news detection, machine learning (ML) models are increasingly employed by researchers for rapid and efficient fake news detection [4,5]. News gathered from social media has been categorized by topic into multiple domains, including health, social, politics, and entertainment [6]. While fake news detection is important in all domains, some domains have extremely limited fake news [7]. Further, fake news from specific domains has an impact on society that is deemed more significant [6]. For instance, the dissemination of fake news within the political domain during the 2016 U.S. presidential election may have had a significant effect on the election outcome [8,9]. Another example that led to a worldwide social panic [2] is linked to the influx of various information related to the COVID-19 pandemic. The impact of spreading fake news via various social media platforms led to a reduction in pandemic prevention measures [10]. As such, fake news within specific domains like political or health domains tends to be more significant than other fake news domains because of its ability to delve deeper, spread widely and more rapidly, and reach a larger audience.
To address the issue of detecting fake news, a wide range of machine learning detection models have been suggested. Typically, these approaches concentrate on single-domain fake news detection (SFND) in areas such as politics or health, where an algorithm is specifically trained to identify fake news within a particular domain. However, models trained in this manner often produce inadequate results when applied to other domains [11]. Although news pieces from various domains can be interconnected, which can potentially enhance the detection of fake news for a target domain, the reliance on SFND is currently viewed as a disadvantage [1,7]. As such, multi-domain fake news detection has been used to address the limitations of single-domain fake news detection [12]. Multi-domain fake news detection allows several domains to be utilized at once within a given dataset, leading to improved performance across all domains. However, this comes with great complexity: detection performance in some domains may improve at the expense of worse performance in others [13].
In addition, in most available real-world domains, the majority of instances belong to the real news class (the majority or negative class), while a significantly smaller number corresponds to the fake news class (the minority or positive class), which is the most important class [14,15]. This is known as the class imbalance problem, in which the dominant class has an advantage when training predictive models, causing them to disregard the minority class. The imbalance problem leads to inadequate classification performance, and most algorithms perform poorly when datasets are substantially imbalanced [16]. The class imbalance problem also affects fake news detection, as there is typically a lower representation of fake news instances compared to true news instances in the available data, which makes the model predict the true class (true news) almost all the time [15].
A critical gap remains in previous works: the lack of the details needed to train models comprehensively for each domain. Moreover, existing solutions typically train models on one or two datasets at most, which limits the scope of their analyses. Recognizing the lack of systematic and comprehensive work in multi-domain fake news detection, this study presents BERTGuard, a holistic approach to multi-domain fake news detection. BERTGuard combines datasets from a diverse set of news domains to build the groundwork for a complex detection system that functions on two levels. The first level of our detection system performs domain classification, which is a critical phase in which a Bidirectional Encoder Representations from Transformers (BERT) model is trained on a merged multi-domain dataset. This initial classification attempts to determine the exact domain to which a given news story belongs. The second level uses domain-specific BERT models to predict the veracity of news inside each designated domain. This hierarchical structure ensures that the detection process is tailored to the unique characteristics of each domain, increasing the overall system’s precision and reliability.
To assess the performance and generalizability of our technique, we rigorously tested the created detection system in five distinct domains: politics, health, science, crime, and social. We considered as baselines a single-level BERT model (Baseline 1) and the model proposed by [12] (Baseline 2). Finally, we assessed the impact of class imbalance on detection accuracy and explored three different handling approaches: random downsampling, random upsampling, and class weight adjustment.
  • Contributions: The key contributions of this paper are as follows:
    • Introduction of BERTGuard: We propose BERTGuard, a novel two-tier solution for multi-domain fake news detection utilizing BERT-based models, which addresses limitations in existing methods by improving the semantic understanding of textual data.
    • State-of-the-art performance: We carry out an extensive evaluation of our proposed approach by comparing it against existing alternative solutions and demonstrate that BERTGuard outperforms existing methods by achieving a 27% improvement in accuracy and a 29% increase in F1-score across multiple benchmark datasets, establishing a new state-of-the-art in fake news detection.
    • Class imbalance analysis: We analyze the impact of three methods for dealing with the class imbalance inherent to these domains, providing valuable insights for future research and practical applications in fake news detection.
  • Organization: The structure of this paper is as follows. Section 2 provides the background and a review of the literature. In Section 3, we present our proposed multi-stage solution for fake news detection, BERTGuard. In Section 4, we outline the experimental methodology employed in our study, providing details on the datasets and the evaluation metrics utilized. Section 5 presents the experimental results and analyzes the findings. Lastly, Section 6 concludes our paper.

2. Background and Related Work

2.1. Fake News Detection

Fake news consists of false information presented in various formats, including articles, images, or videos, that are spread across social media platforms to mimic genuine news content [17]. Such fake news is deliberately designed to deceive and influence readers into accepting the presented content as true [18]. Fake news can be generated using various tools, including bots, trolls, and social robots [18]. Social robots exploit fake news and reader input; these bots are designed as computer algorithms to engage in online disputes and spread false information. As a result, fake news poses a problem because of its ease of dissemination, its potential to cover a wide variety of topics, and its ability to target an immense number of websites. This directly influences readers, since it deals with topics including democracy, media, health, money, and social concerns, among others [19].
Recent research examined how artificial intelligence (AI) influences user processing of and reactions to fake news in GenAI contexts [20]. The study used a heuristic–systematic model and diagnosticity to develop a cognitive model for processing fake news. It discovered that people with high heuristic processing mechanisms are more adept at distinguishing false information due to improved positive diagnostic perception than those with low heuristic processing. Furthermore, users’ perceived diagnosticity of fake news from GenAI may be anticipated using their heuristic systematic evaluations [20].
Deep learning (DL) is becoming more popular for detecting false news than traditional ML approaches. Manually constructing features in traditional ML can be time-consuming and can introduce biases [21]. However, DL needs a larger dataset to train the model [21]. On the other hand, DL has proven outstanding for the identification of fake news by automatically extracting useful information. Methods based on convolutional neural networks (CNNs) [22], recurrent neural networks (RNNs) [23], BERT [24], and graph neural networks (GNNs) [25] have all been used to detect fake news.
The COVID-19 pandemic in 2020 highlighted the critical need for detecting fake news, as false information about the virus spread extensively on social media. This ’infodemic’, as termed by the World Health Organization (WHO) [26], encouraged researchers to propose a CNN-based model utilizing word embedding to detect COVID-19-related fake news [27]. This architecture employed a grid search to optimize the model hyperparameters, enabling the proposed CNN model to achieve impressive results, including a 96.19% mean accuracy and a 95% mean F1-score, outperforming several state-of-the-art machine learning algorithms.
Nasir et al. [28] developed a hybrid model that integrates a CNN and long short-term memory (LSTM). The CNN extracts the features, while LSTM captures the long-term dependencies. The approach uses Global Vectors for Word Representation (GloVe), which are pre-trained word embeddings that represent words as vectors. This methodology surpassed seven established ML techniques in terms of performance: logistic regression, random forest, multinomial naïve Bayes, k-nearest neighbors, decision tree, CNNs, and RNNs.
Kaliyar et al. [29] developed a deep-learning network based on pre-trained GloVe word embeddings called FNDNet. Three convolutional pooling layers gathered characteristics from word embedding vectors, which were then classified using additional convolutional pooling and dense layers.
Saleh et al. [30] proposed an optimized DL approach, OPCNN-FAKE, based on CNN architecture. They assessed its performance against RNNs, traditional ML approaches, and LSTM. The model includes an embedding layer that generates embedding vectors, a dropout layer for improved regularization, a convolution-pooling layer for extracting and reducing features, a flattened layer that produces a one-dimensional vector, and an output layer that decides whether the input text is fake or real based on the previous layer’s output.
Yang et al. [31] designed another CNN-based approach (TI-CNN) that detects fake news by integrating obvious and hidden features from text and picture data. The authors evaluated their approach against various models such as LSTM, GRU, and CNN. Similarly, Raj and Meel [32] developed a CNN approach that uses text and image data to categorize internet news. However, these methods may fail to capture long-term contextual information. Furthermore, the word embedding does not capture context-specific information inside the text.
Hashmi et al. [33] proposed a thorough approach to detecting fake news utilizing three publicly available datasets: WELFake, FakeNewsNet, and FakeNewsPrediction. They combined FastText word embeddings with a range of ML and DL methods, improving these algorithms through regularization and hyperparameter optimization to reduce overfitting and to boost model generalization. Significantly, a hybrid model that integrated CNNs and LSTMs and was enhanced with FastText embeddings surpassed other methods in terms of classification performance across all datasets, achieving accuracy and F1-scores of 99%, 97%, and 99% on WELFake, FakeNewsPrediction, and FakeNewsNet, respectively.
Transformer-based models have also been explored for fake news detection. Bidirectional Encoder Representations from Transformers (BERT) is a DL model designed to generate text, analyze sentiment, and understand natural language. It leverages transformer encoders to perform various natural language processing (NLP) tasks [34]. BERT has been employed in previous studies to achieve superior results on various sentiment analysis tasks [35]. Despite being pre-trained on a large set of textual data, BERT needs to be modified to perform efficiently on a given task [36]. Several versions of BERT exist in the literature, with each tailored to specific domains or tasks. Some notable BERT versions include:
  • RoBERTa: Robustly Optimized BERT Pretraining Approach (RoBERTa) is a variant of BERT designed to improve the training process. It was developed by extending the training duration, using larger datasets with longer sequences, and employing larger mini-batches. The researchers achieved significant performance enhancements by adjusting several hyperparameters of the original BERT model [37].
  • DistilBERT: DistilBERT is a more compact, faster, and cost-effective version of BERT, which is derived from the original model with certain architectural features removed to enhance efficiency [38]. Similar to BERT, DistilBERT can be fine-tuned to achieve strong performance in various natural language processing tasks [39].
  • BERT-large-uncased: This is a BERT model with 12 more transformer layers, 230 million more parameters, and a larger embedding dimension, allowing the model to encapsulate much more information than BERT-base-uncased [40].
  • BERT-cased: This is a model trained with the same hyperparameters as first published by Google [39].
  • ALBERT: ALBERT is short for ‘A Lite’ BERT and is a streamlined version of BERT designed to reduce memory usage and accelerate training. It employs two primary parameter-reduction techniques: dividing the embedding matrix into two smaller matrices and using shared layers grouped together [41]. ALBERT is recognized for its reduced number of parameters compared to BERT, which enhances its memory efficiency while still delivering strong performance. These variations highlight BERT’s adaptability across various domains and its potential for tailored customization to specific tasks.
Table 1 summarizes the latest key advancements in fake news detection utilizing transformer models.
A comparative study was carried out involving five transformer models: BERT, ALBERT, RoBERTa, XLNet, and DistilBERT [56]. The authors ran the comparison with different hyperparameter combinations, and the models yielded comparable results.
Another comparative study investigated cross-domain models’ effectiveness and found that BERT-based models achieved the best detection accuracy [11].
Subsequently, a deep learning model combining BART and RoBERTa was developed to differentiate between true and fake news articles [55]. The embeddings from both BART and RoBERTa were first processed through LSTM and CNN architectures and were then concatenated and further processed through additional LSTM and CNN layers.
In addition, Kaliyar et al. [12] introduced FakeBERT, a deep learning model for fake news detection that leverages BERT in conjunction with 1D CNNs of various kernel sizes and filter configurations to enhance classification performance. BERT is used to generate word embeddings, which are subsequently processed through three parallel convolutional layers with different kernel sizes. The dataset used encompasses both social and political news, with a focus on the U.S. presidential election of 2016. This BERT-based approach achieved an impressive 98.9% detection accuracy, outperforming other ML techniques.
An attention-based transformer model has also been employed to detect fake news [58]. The authors compared their approach with a hybrid CNN model that integrates both text and metadata. The transformer model demonstrated superior accuracy compared to the hybrid CNN model. However, transformer-based methods often involve significant computational costs and require large amounts of training data.
Using single datasets for fake news detection has several limitations. First, they provide limited coverage: single datasets may focus on specific topics or domains, such as political statements or social media posts, leading to a lack of diversity in the types of fake news covered [59]. Second, there is dataset bias: datasets constructed only with specific types of news, such as political or e-commerce news, can lead to biased models that perform poorly when detecting news related to other topics, resulting in dataset bias [59,60]. Third, there is a lack of labeled data: the shortage of labeled data for training detection models impedes the development of effective fake news detection systems [54]. Fourth, there are problems with overfitting and generalizability: limited and imbalanced datasets can cause machine learning and deep learning models to overfit or underfit, affecting their ability to generalize and perform well [61,62].
To overcome these limitations, it is crucial to utilize a diverse set of publicly available evaluation datasets for fake news detection and incorporate multiple domain-specific datasets. Implementing cross-domain, cross-language, or cross-topic analyses provides a comprehensive approach by incorporating datasets across various domains or topics, thereby enhancing the detection process and improving the generalizability of fake news detection models [63].
Many studies on the automated detection of fake news have relied on datasets confined to a single domain, such as political, social, or health domains, for model training and evaluation. This focus on a single domain is driven by the performance decline observed in these machine and deep learning techniques when they encounter unseen data from other domains. Domain-specific features, particularly style-based attributes, can vary significantly across different domains [63]; consequently, features must be carefully selected to distinguish between fake and real news within the specific domain being examined.
The current state-of-the-art highlights the need for further research across various domains. As a result, comprehensive cross-domain techniques for fake news detection are essential, despite some previous studies attempting to tackle this issue using cross-domain data, such as the recent work done by Alnabhan and Branco [11].
Other works exist that address the multiple-domain problem in detecting fake news. Han et al. [64] proposed a continuous learning approach for domain-agnostic fake news detection that utilized a graph neural network to sequentially learn across multiple domains. However, this method has two drawbacks: (1) it assumes that new domains will arrive in a specific sequence, and (2) it presumes that these domains are already known, which is not the case in real-world data streams. In contrast, Cardoso et al. [65] introduced a method that retains knowledge across different domains by leveraging a robust, optimized BERT model to select informative instances for manual annotation from a large, unlabeled dataset. Earlier studies also explored integrating information from multiple domains to develop a cross-domain fake news detection model. For example, Castelo et al. [66] trained a model on the Celebrity dataset and tested it on the US-Election2016 dataset to determine the generalizability of their approach.
Huang et al. [57] proposed a novel framework for detecting fake news in new domains called DAFD: Domain Adaptation Framework for Fake News Detection. The framework combines domain adaptation and adversarial training strategies to align the data distribution of the source and target domains and enhance the model’s generalization and robustness. The framework consists of a pre-training phase, where the data distribution alignment is performed, and a fine-tuning phase, where adversarial examples are generated to further improve the model’s performance. Experiments conducted on real datasets, including PolitiFact, GossipCop, and COVID, show that DAFD outperforms state-of-the-art methods for detecting fake news in new domains with a small amount of labeled data. The framework’s components were analyzed, showing that both domain adaptation and adversarial training are crucial for improving detection performance.
Mosallanezhad et al. [34] proposed a domain-adaptive model called Reinforced Adaptive Learning Fake News Detection (REAL-FND). REAL-FND leverages generalized and domain-independent features to differentiate between fake and true news. This approach is based on prior findings that suggest domain-invariant features can improve the robustness and generalizability of fake news detection techniques. For example, it has been noted that fake news publishers frequently use clickbait writing styles to capture the attention of targeted audiences, highlighting a domain-invariant characteristic. Moreover, patterns derived from social contexts, such as a user’s comment disputing a news article or interactions between users and identified fake news disseminators, offer critical supplementary information for classifying fake news within a specific domain. In REAL-FND, the approach departs from the conventional method of using adversarial learning to train cross-domain models. Instead, it employs a reinforcement learning (RL) component to transform the learned representation from the source domain to the target domain. Unlike other RL-based methods that modify model parameters, in REAL-FND, the RL agent adjusts the learned representations to obscure domain-specific features while preserving domain-invariant components. This method offers greater flexibility than adversarial training, as the RL agent can directly optimize the confidence values of any classifier without the need for a differentiable objective function.

2.2. Fake News Datasets

Researchers identify the primary challenge in fake news detection as the scarcity of large-scale datasets and the absence of a comprehensive benchmark dataset with reliable ground truth labels [67]. Furthermore, few datasets with varied labels, sizes, and application domains are available online for fake news detection. Certain datasets are sourced exclusively from political statements, while others come from postings on social media and even news articles. This diversity presents a significant obstacle in the field of fake news detection.
Additionally, acquiring datasets for academic research is challenging due to privacy restrictions on extracting data from online sources. One solution is to purchase data from these platforms or crowdsourcing websites. Another approach is to utilize existing datasets from the literature that align with the study’s requirements, such as ISOT [68], PHEME [69], Liar [70], GossipCop [67], FakeNews (https://www.kaggle.com/competitions/fake-news/data, accessed on: 12 May 2024), Fake-OR-Real (https://github.com/joolsa/fake_real_news_dataset, accessed on: 12 May 2024), Snopes (http://fakenews.research.sfu.ca/, accessed on: 12 May 2024), COVID-19 FakeNews (https://data.mendeley.com/datasets/zwfdmp5syg/1, accessed on: 12 May 2024), COVIFN (https://ieee-dataport.org/documents/covifn-fake-news-covid19, accessed on: 12 May 2024), Politifact (https://www.kaggle.com/datasets/rmisra/politifact-fact-check-dataset, accessed on: 12 May 2024), Climate [71], and COVID-Claims (https://ieee-dataport.org/open-access/covid-19-fake-news-infodemic-research-dataset-covid19-fnir-dataset, accessed on: 12 May 2024).

2.3. Dealing with Class Imbalance

The class imbalance problem has received much attention [72]. However, within the particular domain of fake news detection, this issue has still received very little attention. Existing research has shown the prevalence of the imbalance problem, with cases of fake news proliferating and sometimes surpassing those of true news [73]. This imbalance complicates the development of accurate and trustworthy fake news detection methods. Overfitting of neural networks due to class imbalance is a key challenge, highlighting the need for investigation and improvements in this domain [61].
While some studies have explored specific elements of fake news detection models, including dataset division, features, and classifiers, there has been an insufficient examination of the limitations of datasets and features and their impact on detection models, particularly regarding class imbalance [15,61]. Furthermore, the precision of detection models remains unsatisfactory, with low detection rates and lengthy processing times [59,63,73].
In addition, research has been performed to minimize domain biases and increase the accuracy of cross-domain fake news detection, which is connected to the difficulty of imbalanced data across different domains [60]. Furthermore, a comprehensive review highlights the constraints of current fake news detection models due to imbalanced and limited datasets, emphasizing the necessity for comprehensive cross-domain techniques to address these difficulties [59].
Furthermore, learning techniques such as resampling methods (e.g., oversampling and undersampling), data augmentation, and cost-sensitive learning have been utilized to balance the class distribution and increase the accuracy of fake news detection models [74]. These approaches have yielded promising results in addressing the class imbalance problem in fake news detection.
Keya et al. [75] created ‘AugFake-BERT’ to control imbalances by data augmentation to boost the effectiveness of fake news categorization, specifically addressing the influence of imbalanced datasets on detection models. Similarly, [76] used back-translation as data augmentation, applying pre-trained word embeddings (Word2Vec, GloVe, and fastText) in CNN, bidirectional LSTM, and ResNet models for fake news detection.
Mouratidis et al. [77] addressed the class imbalance issue by implementing SMOTE (Synthetic Minority Over-sampling Technique) to oversample the minority class. SMOTE is a widely used method for generating synthetic samples to effectively reduce the class imbalance problem in machine learning models. This approach improved the accuracy and F1-score from 95% and 99% to 98% and 100%, respectively.
Additionally, other methods such as the focal loss function and specific learning techniques have demonstrated the ability to achieve high accuracy and satisfactory recall, further mitigating the effects of class imbalance [78].
The number of research studies conducted highlights the ongoing efforts to tackle class imbalance in fake news detection.

3. BERTGuard: Two-Tiered Multi-Domain Fake News Detection with Class Imbalance Mitigation

In this section, we outline our BERTGuard approach for fake news detection, which utilizes a two-stage detection method based on BERT, and we explore the impact of domain-specific classification. The first stage of our solution entails classifying news domains with BERT to capture the nuances associated with various sources of information. Then, in the second stage, we use domain-specific BERT models to determine the validity of news within their respective areas. Figure 1 presents the BERTGuard detection approach.
We chose to base our approach on BERT due to its proven ability to enhance fake news detection by capturing complex linguistic patterns and contextual nuances. Numerous studies have validated BERT’s effectiveness in this domain [11,22,79,80,81,82]. In addition, BERT’s contextualized word representations enable it to capture the nuanced meaning and context of words and phrases within news articles, which is crucial for distinguishing between real and fake news. Moreover, BERT handles the issue of missing information in fake news detection through its advanced language understanding capabilities and the ability to capture contextual information. By analyzing the language used in news articles and comparing it to a database of known fake news, BERT can identify patterns and inconsistencies that suggest a news article may be fake, even in the presence of missing information [12]. Additionally, BERT’s fine-tuning capability allows it to adapt to the nuances of the task and the dataset, which can help mitigate the impact of missing information on the overall fake news detection accuracy [12].
As depicted in Figure 1, the initial stage of the pipeline employs a BERT-based multi-class classification model. This model ingests diverse text formats, such as news articles, and analyzes them to determine and assign a relevant domain.
Following initial testing, we discovered that the base version of DistilBERT, a distilled variant of BERT, outperformed other BERT models and was notably faster in both the training and testing phases. It was therefore decided to use DistilBERT for classification.
We trained this model on all the training datasets allocated for this purpose. These training datasets were chosen from five different news domains. We merged the datasets FA-KES, COVID-FN, COVID-FNIR, FakeNews, ISOT, LIAR, Climate, and GossipCop into a single dataset for training DistilBERT. The data were preprocessed for classification by encoding the categories into integer values (crime: 0, health: 1, politics: 2, science: 3, and social media: 4).
The DistilBERT model was therefore set up with five labels, one for each category. The training, validation, and testing datasets were encoded with the DistilBERT tokenizer and converted into PyTorch tensors. The motivation for domain classification arises from the fact that a news article often belongs to multiple domains or topics, i.e., news articles may overlap across different topics. In addition, the domain classification model was tuned by adjusting the number of epochs, with multiple training runs performed at each epoch setting to assess performance.
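To make the first stage concrete, the following is a minimal sketch of the domain classification setup, assuming the HuggingFace transformers library and PyTorch; the example texts and hyperparameters are illustrative rather than the exact values used in our experiments.

```python
# Minimal sketch of the stage-1 domain classifier setup (HuggingFace
# transformers and PyTorch assumed; example data are illustrative).
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

DOMAIN2ID = {"crime": 0, "health": 1, "politics": 2, "science": 3, "social media": 4}

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(DOMAIN2ID)
)

texts = ["New study questions vaccine claims", "Senate passes the budget bill"]
domains = ["health", "politics"]

# Encode the texts and convert the labels to PyTorch tensors.
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
labels = torch.tensor([DOMAIN2ID[d] for d in domains])

outputs = model(**encodings, labels=labels)
print(outputs.loss.item(), outputs.logits.argmax(dim=-1))
```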
Following the domain classification stage, the pipeline directs the input data to a domain-specific detection model. We conducted an initial testing phase to determine the most suitable BERT variant for each domain. The models in this stage are trained on datasets belonging to that specific domain, making them specialized at detecting fake content within that domain. The model outputs a boolean value denoting whether the input is detected as fake or not. The models were fine-tuned to maximize the F1-scores; for each category, fine-tuning covered the number of epochs, the optimizer, and the learning rate.
Five domain-specific models were trained on each of the five domains considered (crime, health, politics, science, and social media). The pre-trained model selection consists of the tokenizer, the sequence classifier, and the model itself. Below are all the options used:
  • Tokenizer: BertTokenizer, classifier: BertForSequenceClassification, and model: BERT (base uncased);
  • Tokenizer: BertTokenizer, classifier: BertForSequenceClassification, and model: BERT (base cased);
  • Tokenizer: DistilBertTokenizer, classifier: DistilBertForSequenceClassification, and model: DistilBERT (base uncased, finetuned SST-2 English);
  • Tokenizer: DistilBertTokenizer, classifier: DistilBertForSequenceClassification, and model: DistilBERT (base uncased);
  • Tokenizer: RobertaTokenizer, classifier: RobertaForSequenceClassification, and model: RoBERTa (base);
  • Tokenizer: AlbertTokenizer, classifier: AlbertForSequenceClassification, and model: ALBERT (base v2).
Overall, BERTGuard incorporates an initial BERT model for domain classification, followed by five specialized BERT-based models tailored to the detected domain.
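To illustrate how the two tiers fit together at inference time, the following schematic sketch routes an article through the domain classifier and then to the corresponding domain-specific detector. It assumes HuggingFace pipelines whose label mappings have been configured with the domain names and with 'FAKE'/'REAL' outputs; the checkpoint paths are hypothetical placeholders, not released models.

```python
# Schematic sketch of BERTGuard's two-tier routing (checkpoint paths and
# label names are hypothetical placeholders).
from transformers import pipeline

DOMAINS = ["crime", "health", "politics", "science", "social media"]

# Stage 1: one domain classifier; Stage 2: one fake-news detector per domain.
domain_clf = pipeline("text-classification", model="path/to/domain-classifier")
detectors = {d: pipeline("text-classification", model=f"path/to/{d}-detector")
             for d in DOMAINS}

def bertguard_predict(text: str) -> bool:
    """Return True if the article is predicted to be fake."""
    # Stage 1: assign a domain (assumes id2label maps onto the domain names).
    domain = domain_clf(text)[0]["label"]
    # Stage 2: route to that domain's detector (assumes "FAKE"/"REAL" labels).
    return detectors[domain](text)[0]["label"] == "FAKE"
```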
In addition, BERTGuard addresses the critical issue of class imbalance in fake news detection. Our approach includes balancing the training datasets through oversampling the minority class, undersampling the majority class, and adjusting class weights.

4. Materials and Methods

In our study, we developed BERTGuard: a two-stage fake news detection approach utilizing BERT and focusing on domain-specific classification. The first stage involves classifying news domains with BERT to capture the nuances of various information sources. In the second stage, domain-specific BERT models are used to assess the validity of news within their respective domains. To establish baselines for comparison, we developed a model that disregards news domains to serve as the first baseline. We also used a well-known model using a base version of BERT called FakeBERT [12] as another baseline. Various versions of BERT were utilized after carefully considering their performance and the time they required. This dynamic selection process enhances the adaptability and effectiveness of our solution. Based on these initial experiments, we chose the following BERT versions for the following news domains (restated as a lookup table in the sketch after the list):
  • Crime: distilbert-base-uncased-finetuned-sst-2-english;
  • Health: distilbert-base-uncased;
  • Politics: bert-base-uncased;
  • Science: roberta-base;
  • Social media: distilbert-base-uncased.
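The domain-to-checkpoint selection above can be kept as a simple lookup table and used to load each base model before domain-specific fine-tuning. This is only a sketch assuming the HuggingFace Auto classes; it restates the list rather than adding new choices.

```python
# Lookup table restating the chosen base checkpoints per domain (sketch).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

DOMAIN_CHECKPOINTS = {
    "crime":        "distilbert-base-uncased-finetuned-sst-2-english",
    "health":       "distilbert-base-uncased",
    "politics":     "bert-base-uncased",
    "science":      "roberta-base",
    "social media": "distilbert-base-uncased",
}

domain = "politics"
tokenizer = AutoTokenizer.from_pretrained(DOMAIN_CHECKPOINTS[domain])
detector = AutoModelForSequenceClassification.from_pretrained(
    DOMAIN_CHECKPOINTS[domain], num_labels=2  # binary fake/not-fake head
)
```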
For the first stage of our BERTGuard, we trained the DistilBERT model on a merged dataset comprising the FA-KES, COVID-FN, COVID-FNIR, FakeNews, ISOT, LIAR, Climate, and GossipCop datasets. The dataset was preprocessed by encoding categories into integer values (crime: 0, health: 1, politics: 2, science: 3, and social media: 4). The DistilBERT model was configured with five labels: one for each category. In the second stage of BERTGuard, each domain’s model was trained on the training datasets from the respective domain. For evaluation, we compared the performance of BERTGuard against the baselines.
Our BERTGuard also attempts to mitigate the class imbalance since this is a critical issue in the fake news detection context. These attempts include balancing the training datasets by oversampling the minority class, undersampling the majority class, and adjusting the class weights.
Oversampling entails boosting the number of instances in the minority class by randomly replicating them. This technique helps mitigate bias towards the majority class, enhancing the model’s ability to learn from minority class examples.
Conversely, undersampling involves decreasing the number of instances in the majority class by randomly discarding some samples. This approach prevents the model from being dominated by the majority class, allowing it to concentrate more on learning from the minority class.
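A minimal sketch of both resampling strategies at the dataframe level is shown below; pandas is assumed, and the 'label' column follows the fake (1) / not fake (0) convention used throughout this paper.

```python
# Sketch of random oversampling and undersampling on a labeled dataframe
# (pandas assumed; 'label' == 1 marks the minority fake-news class).
import pandas as pd

def random_oversample(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    counts = df[label_col].value_counts()
    minority = counts.idxmin()
    extra = counts.max() - counts.min()
    # Replicate randomly drawn minority rows until both classes are balanced.
    dup = df[df[label_col] == minority].sample(n=extra, replace=True, random_state=42)
    return pd.concat([df, dup]).sample(frac=1, random_state=42)

def random_undersample(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    n_min = df[label_col].value_counts().min()
    # Keep only n_min randomly drawn rows from each class.
    parts = [g.sample(n=n_min, random_state=42) for _, g in df.groupby(label_col)]
    return pd.concat(parts).sample(frac=1, random_state=42)
```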
In addition, adjusting class weights is another technique used to address class imbalance. The idea is to assign higher weights to the minority class (e.g., fake news) and lower weights to the majority class (e.g., true news) during training to ensure that the model pays more attention to the minority class. The common approach to determining the optimal class weights is manual tuning based on domain knowledge or prior experience with the dataset. The choice of class weights is often a balance between overfitting and underfitting. Assigning excessively high weights to the minority class can lead to overfitting, where the model becomes overly biased towards classifying instances as the minority class, potentially reducing its overall accuracy and effectiveness. Conversely, setting the weights too low can result in underfitting, where the model does not adequately learn from the minority class, leading to poor performance on those examples. This required experimenting with different weight values and evaluating the model’s performance on a validation set.
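A minimal sketch of class-weight adjustment in a PyTorch training loop is given below; the counts and resulting weights are illustrative, and in practice the weights were tuned on a validation set as described above.

```python
# Sketch of class-weight adjustment (PyTorch assumed; counts are illustrative).
import torch
import torch.nn as nn

# Inverse-frequency weighting: the rarer fake class (1) receives the larger weight.
n_real, n_fake = 8000, 2000
total = n_real + n_fake
class_weights = torch.tensor([total / (2 * n_real), total / (2 * n_fake)])  # [0.625, 2.5]

loss_fn = nn.CrossEntropyLoss(weight=class_weights)

# Inside the training loop, the weighted loss replaces the default one:
#   logits = model(**batch).logits
#   loss = loss_fn(logits, batch["labels"])
```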
Regarding the main baseline model, we developed a BERT (base uncased) model for one-stage fake news detection (Baseline 1) that does not take news domains into account. The implementation is based on what we did in the domain classification model, with the following change: the label column is either ‘1’ or ‘0’, corresponding to ‘fake’ or ‘not fake’, and serves as the target for the detection model.
The other baseline (Baseline 2) that we used from the literature, FakeBERT [12], was trained on the same dataset used to train our main baseline.

4.1. Data Collection

We chose the most widely used and publicly available datasets related to the following fake news domains shown in Table 2.
Regarding the training data and the preprocessing tasks, the model in the domain classification stage (the first stage of BERTGuard) is trained on a dataset composed of the FA-KES, COVID-FN, COVID-FNIR, FakeNews, ISOT, LIAR, Climate, and GossipCop datasets. These datasets represent the five news domains we selected. The data are processed into ‘text’ and ‘label’ columns. The ‘text’ column contains all the news article content (title and text) used for detection. The ‘label’ column is either ‘1’ or ‘0’, corresponding to ‘fake’ or ‘not fake’, and serves as the target for the detection model.
Combining these datasets to train the domain classification model required careful consideration as some datasets include more textual columns such as ‘title’ and ‘text’. We concatenated these fields into a single ‘text’ column containing all relevant content from the news articles. All other features were stored in a JSON dictionary and kept as supplementary metadata. By retaining the additional features in a metadata field, they can be readily accessed if needed for future refinements or other aspects of the project.
The datasets (news articles) we used were already labeled by their respective sources. We did not perform additional labeling. Each article was assigned a label in the ‘label’ column, where ‘1’ indicates ‘fake’ and ‘0’ indicates ‘not fake’. Some datasets had more than two possible truth values, so articles that were labeled half-true or mostly-true were labeled as 0 (True), while articles labeled barely-true or pants-fire were labeled 1 (Fake). This label was used as the target variable for our detection models.
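A short sketch of the text merging and label binarization described above is given below; pandas is assumed, and the column names and multi-class rating values (which follow LIAR-style truth ratings) are illustrative.

```python
# Sketch of merging title/text and binarizing multi-class truth ratings
# (pandas assumed; column and rating names are illustrative).
import json
import pandas as pd

BINARY_MAP = {"true": 0, "mostly-true": 0, "half-true": 0,
              "barely-true": 1, "false": 1, "pants-fire": 1}

def standardize(df: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame()
    # Concatenate 'title' and 'text' into a single 'text' column.
    out["text"] = (df["title"].fillna("") + " " + df["text"].fillna("")).str.strip()
    # Map the multi-class ratings onto the binary fake (1) / not fake (0) scheme.
    out["label"] = df["rating"].map(BINARY_MAP)
    # Keep all remaining columns as supplementary JSON metadata.
    extras = [c for c in df.columns if c not in ("title", "text", "rating")]
    out["metadata"] = df[extras].apply(lambda r: json.dumps(r.to_dict()), axis=1)
    return out
```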
For domain labeling and categorization, we added a new ‘domain’ column to each dataset, indicating the specific category (domain) of each article based on its content and source. The domain label is consistent across all news articles within the same dataset. The domains were categorized into five primary areas: Politics, Health, Crime, Social, and Science. These categories were encoded into integer values for consistency and ease of processing as follows: Crime: 0, Health: 1, Politics: 2, Science: 3, and Social: 4. We followed a systematic approach, splitting the data into training, validation, and testing sets with a ratio of 7:1:2. We also included essential preprocessing steps and tracked the training duration to maintain consistency and optimize effectiveness. We tuned the models by adjusting the number of epochs and running multiple training iterations at a given epoch.
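The domain encoding and the 7:1:2 split can be sketched as follows; scikit-learn is assumed, the 7:1:2 ratio is obtained by applying train_test_split twice, and stratification by label is an illustrative choice.

```python
# Sketch of domain encoding and the 7:1:2 train/validation/test split
# (scikit-learn assumed; stratification is an illustrative choice).
from sklearn.model_selection import train_test_split

DOMAIN2ID = {"Crime": 0, "Health": 1, "Politics": 2, "Science": 3, "Social": 4}

# df is a standardized dataframe (e.g., the output of standardize above).
df["domain"] = DOMAIN2ID["Politics"]  # e.g., every article in a political dataset

# First carve out the 20% test set, then split the remainder 7:1 (i.e., 1/8 of
# the remaining 80% goes to validation), yielding 70/10/20 overall.
train_val, test = train_test_split(df, test_size=0.2, random_state=42,
                                   stratify=df["label"])
train, val = train_test_split(train_val, test_size=1/8, random_state=42,
                              stratify=train_val["label"])
```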
To achieve a unified format for all the collected datasets, comprehensive preprocessing was performed. This standardization ensures minimal additional processing in later stages and maintains data consistency across the project. The following steps detail the main preprocessing tasks (a condensed code sketch follows the list):
  • Label standardization: The label values were standardized to ensure uniformity. The datasets that we utilized in this study were already labeled by their sources. In this project, ‘1’ was assigned to indicate ‘fake’ news, and ‘0’ indicates ‘not fake’ news. Some datasets originally had multiple classes representing varying degrees of ‘fakeness’. These were binarized to a fake/not fake format based on criteria that balanced the dataset as effectively as possible.
  • Text processing:
    • Cleaning: raw data were cleaned to remove irrelevant content such as HTML tags, scripts, special characters, and advertisements.
    • Standardization: standardized formats were applied, such as converting dates to a consistent format and removing duplicates.
    • Normalization: text normalization included converting text to lowercase, removing punctuation, and expanding contractions.
    • Tokenization: the text was tokenized to break it down into individual words or tokens.
    • Stop word removal: common stop words were removed to focus on the most meaningful words.
    • Lemmatization: words were reduced to their base or root forms through lemmatization, aiding in the normalization of text data.
  • Handling missing data: entries with missing titles, bodies, or descriptions were removed to ensure the integrity of the dataset.
  • Multi-language data: for datasets containing multi-language data, articles were filtered to include only those written in English, ensuring uniformity in text processing and model training.
  • Metadata management: all other features not directly used in the primary analysis were stored in a JSON dictionary as supplementary metadata. This approach allows these features to be easily utilized if needed for further refinement of the results.
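The text-processing items above can be condensed into a single helper; the sketch below assumes NLTK, and the regular expressions and stop-word list are illustrative simplifications of the full pipeline.

```python
# Condensed sketch of the text-processing steps (NLTK assumed; illustrative).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")
STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(raw: str) -> str:
    text = re.sub(r"<[^>]+>", " ", raw)                 # cleaning: strip HTML tags
    text = text.lower()                                  # normalization: lowercase
    text = re.sub(r"[^a-z\s]", " ", text)                # remove punctuation/special chars
    tokens = word_tokenize(text)                         # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return " ".join(LEMMATIZER.lemmatize(t) for t in tokens)  # lemmatization
```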
Regarding the detection stage (second stage of BERTGuard), each domain-specific model was trained using the training datasets that represent that domain. For example, we used the FakeNews, ISOT, and LIAR datasets to train the political BERT model. The remaining political datasets (Pheme and Politifact) were used later for final complete detection system testing.
Data were handled by loading them as a whole category to be used for training and testing the models. This process included detecting all feather files, loading them as dataframes, and then concatenating all of the dataframes into a single combined dataframe. This combined dataframe was then split into training, validation, and testing subsets using scikit-learn, and the subcategories were used accordingly.
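A minimal sketch of this loading step is shown below; pandas is assumed, and the directory layout is illustrative. The resulting combined dataframe is then split as sketched in the previous subsection.

```python
# Sketch of loading a category's feather files and combining them
# (pandas assumed; directory layout is illustrative).
from pathlib import Path
import pandas as pd

def load_category(category_dir: str) -> pd.DataFrame:
    frames = [pd.read_feather(p) for p in sorted(Path(category_dir).glob("*.feather"))]
    return pd.concat(frames, ignore_index=True)

politics_df = load_category("data/politics")  # hypothetical directory
```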

4.2. Evaluation Metrics

The system’s performance was evaluated using various metrics, as detailed in Table 3. These metrics were chosen to provide a thorough assessment of the model’s effectiveness at detecting fake news. We included accuracy as well as several metrics suitable for imbalanced domains. This allowed us to obtain a better understanding of the performance of the models in an imbalanced setting [83].
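For reference, the metrics used in Section 5 can be computed as in the sketch below (scikit-learn assumed); the G-mean is taken as the geometric mean of the recall on the fake class (sensitivity) and the recall on the real class (specificity).

```python
# Sketch of the evaluation metrics (scikit-learn assumed); G-mean is the
# geometric mean of the per-class recalls (sensitivity and specificity).
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

def evaluate(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)   # recall on the fake (positive) class
    specificity = tn / (tn + fp)   # recall on the real (negative) class
    return {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "g_mean": float(np.sqrt(sensitivity * specificity)),
    }
```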

5. Results and Discussion

This section presents the results of our experimental evaluation. We begin by presenting the ad hoc testing results for selecting the best model for each case and domain. Afterward, we observe the performance of the baseline models considered and BERTGuard. Then we explore the effect of handling the imbalance problem by adjusting the class weights and applying upsampling and downsampling.

5.1. Model Selection

In this section, we present the initial ad hoc testing for picking the model for classification and detection. This is not meant to be an exhaustive test but rather a quick comparison between the models when deciding the best course of action.
To select the domain classification model, we used a balanced dataset, with 850 samples per category in the training/validation data and 160 samples per category in the testing data. The training/validation split was 80% to 20%. Table 4 shows the accuracy for each model used.
While roberta-base achieved a slightly higher average accuracy (98.6%) compared to distilbert-base-uncased (96.8%), the training time for distilbert-base-uncased was significantly shorter. The distilbert-base-uncased model trained in 96.21 s, whereas the roberta-base model required 1545.98 s.
This substantial difference in training time demonstrates that distilbert-base-uncased is much more efficient, making it a more practical choice, especially when computational resources and time are constrained. Moreover, the marginal difference in accuracy (1.8%) does not justify the significantly higher training time and resource consumption of the roberta-base model. Therefore, we opted for the distilbert-base-uncased model to achieve a balance between high accuracy and efficient resource utilization.
For the domain-specialized detection model, another quick testing task took place for each news domain. Table 5 presents the average accuracies of different models tested for each domain.
For deciding the best model for each news domain, we prioritized the highest average accuracy for each specific domain:
  • Crime: The distilbert-base-uncased-finetuned-sst-2-english model performed the best, with an average accuracy of 68.8%. This suggests that the finetuned version of DistilBERT is better at capturing the nuances in the crime domain.
  • Health: The distilbert-base-uncased model achieved the highest accuracy of 88.3%, making it the most suitable for health-related news detection.
  • Politics: The bert-base-uncased model had the highest accuracy at 87.8%, indicating its effectiveness in detecting political news.
  • Science: The roberta-base model excelled in the science domain, with an accuracy of 80.2%, suggesting it handles the specific language and context of science-related news more effectively.
  • Social: The distilbert-base-uncased model stood out, with a remarkable accuracy of 95.6%, making it the best choice for the social domain.
These selections balance accuracy across various domains, ensuring each model is optimized for the specific characteristics of the content it will encounter. While the roberta-base model showed strong performance in certain areas, the distilbert-base-uncased variants, including the finetuned version, offered competitive accuracies with the added benefit of being more resource-efficient, as noted in previous experiments.

5.2. Single-Stage Fake News Detection Results

In this section, we present the results of testing the first baseline model (Baseline 1) that we used in our work. This model is a BERT (base uncased) model trained on the merged datasets from the domains considered and tested on the set of unseen datasets. Domain categories were not taken into consideration when training this model. Table 6 presents the precision, recall, accuracy, F1-score, and G-mean when testing the Baseline 1 model, which is the main baseline model we developed.
The evaluation of Baseline 1 reveals the drawbacks of a non-domain-specific approach. Across several datasets from various domains, distinct patterns emerge. Starting with the precision and recall measures, the model’s performance varies significantly between datasets. Notably, in the ‘Snopes’ dataset, the model shows higher precision but lower recall, demonstrating a tendency to properly identify true news while missing a significant proportion of fake news cases. In contrast, in the ‘GossipCop’ dataset, the model achieves a greater recall, indicating a stronger capacity to detect fake news but with a trade-off in precision. On the other hand, specialized datasets such as ‘COVID-Claims’ pose distinct difficulties for the baseline model, as evident in the reduced precision, recall, and F1-score. This underscores the importance of tailoring models to specific content domains, particularly those with distinct characteristics such as health-related information.
The baseline model has modest precision and recall in the ‘Politifact’ dataset, indicating some adaptation across platforms. However, the findings highlight the need for a more complex, domain-aware strategy for detecting fake and accurate news across several platforms. In addition, an unanticipated anomaly appears in the ‘Pheme’ dataset, where the model achieves significantly higher precision and accuracy than the other performance measures. This anomaly necessitates further analysis to identify potential biases or unique dataset factors that could have influenced the model’s performance. Understanding such anomalies is critical for improving the model’s adaptability to different datasets.
The discovered disparities highlight the limitations of creating a one-size-fits-all fake news detection methodology. While the baseline approach shows promise in some domains, its performance limits emphasize the need for domain-specific modifications.

5.3. Multi-Domain Fake News Approach

In this section, we present the results of testing our BERTGuard, a multi-tier detection system, by using unseen datasets that we kept for this purpose. We compared Baseline 1, the existing Baseline 2 (FakeBERT) solution, and our proposed BERTGuard.
Table 7 shows the overall results of these three solutions averaged across all the testing datasets.
The results demonstrate that our domain-specific strategy significantly improves detection performance by leveraging domain-specific information. Moving into more detailed results of our experiments, Table 8 presents the precision, recall, accuracy, F1-score, and the G-mean when testing the proposed BERTGuard for each testing dataset.
BERTGuard had significant improvements across multiple datasets that reflect multiple news domains, as evidenced by the precision, recall, F1-score, accuracy, and G-mean metrics.
The findings from the ‘Snopes’ and ‘COVID-Claims’ datasets demonstrate the success of our domain-specific approach in these crucial domains. With precision reaching 83% and recall over 97% in ‘Snopes’ and precision and recall both exceeding 95% in ‘COVID-Claims’, the model exhibits an impressive ability to properly classify both true and fake news instances, demonstrating its durability in areas where accuracy is critical.
In the ‘Pheme’ dataset, the model performs strongly in terms of precision, at 84.76%, suggesting a high proportion of accurately identified fake news instances. However, the recall is rather low at 62.47%, indicating possible difficulties in capturing all instances of fake news in this particular domain. This balance requires further investigation to fully comprehend the complexities of information transmission within the ‘Pheme’ dataset.
The model encounters significant obstacles in the ‘Politifact’ and ‘Climate’ domains. ‘Politifact’ has a precision of 80.05% but a very low recall of 8.32%, indicating that most fake news instances go undetected (a high rate of false negatives). The ‘Climate’ domain, despite obtaining moderate precision and recall, reveals the complexity of differentiating between authentic and fake news in the science-related domain. In contrast, the model performs well on the ‘GossipCop’ and ‘ISOT-small’ datasets, with excellent precision and recall values. In ‘ISOT-small’, the model achieves precision and recall levels that exceed 89%, demonstrating its adaptability and effectiveness in dealing with a wide range of information across domains.
Across all datasets, our BERTGuard maintains a balanced F1-score, demonstrating its capacity to perform consistently across domains. Taking domains into consideration improved the detection performance compared to the baseline strategies, which did not consider news domains in their detection approach. Figure 2 shows the F1-score comparison between both baselines and the proposed BERTGuard solution with domain-specific knowledge.
Finally, the domain-specific technique appears to be a promising step toward a system that can detect fake news with greater adaptability and accuracy. The nuanced performance across domains emphasizes the significance of designing models to fit the complexities of unique information sets and domains.

5.4. Overall Impact of Class Imbalance Handling Strategies

This section details the findings from the experiments aimed at assessing the effectiveness of various strategies for addressing class imbalance, including random upsampling, random downsampling, and adjustments to class weights. We tested this impact on Baseline 1, FakeBERT (Baseline 2), and our proposed BERTGuard. We trained each model on balanced datasets by applying each imbalance strategy individually. The performance obtained was compared against the model’s performance without handling the class imbalance. Table 9 shows the results of this experiment averaged across all datasets tested. We observe that modifying class weights consistently outperforms the other strategies, indicating its effectiveness in mitigating the class imbalance in the context of fake news detection.
These findings underscore the importance of treating class imbalances in fake news detection, with class weighting being the most effective strategy among those tested, while downsampling did not consistently improve the model’s performance across all testing datasets.

5.5. Impact of Balancing Datasets by Adjusting Weights

The effort to address class imbalance through adjusting weights in the dataset has yielded distinctive outcomes for the baselines and the proposed domain-specific fake news detection approaches. Figure 3 and Figure 4 show the effect of handling the class imbalance issue by adjusting the class weights on both the main baseline (Baseline 1) we developed and the BERTGuard approaches.
The class weight adjustment approach results in significant improvements in some domains. Notably, ‘Snopes’ and ‘COVID-Claims’ show high precision and recall rates, demonstrating the model’s expertise in these critical domains. The targeted changes in class weights show the potential efficacy of domain-specific techniques for resolving class imbalance. Significant advances in ‘Pheme’ and ‘GossipCop’ were demonstrated by gaining higher precisions, recalls, and F1-scores. The model’s adaptability in capturing complex information transmission within these domains demonstrates the advantages of altering the class weight in the detection task as well as the possibility for domain-specific techniques to efficiently handle class imbalance over the baseline approach with no domains.
Despite efforts to address the imbalance, difficulties exist in some domains, as seen in the ‘Politifact’ and ‘Climate’ datasets. The models struggle to achieve a harmonious balance between precision and recall in several domains, implying that domain-specific complications may necessitate more specialized tactics than simple class weight modifications. Figure 5 presents a comparison between the baselines and the BERTGuard F1-scores after handling the class imbalance by adjusting the class weights.
In conclusion, the findings highlight the approach’s versatility across a variety of datasets while also noting persistent difficulties in specific domains. Handling imbalance by adjusting the class weights is effective but not a cure-all. The domain-specific methodology demonstrates promising advances in the majority of the domains, emphasizing the importance of continued refinement and adaptation. The findings also highlight the dynamic nature of fake news identification and the need for tailored imbalance handling solutions to navigate the intricacies of the various domains.

5.6. Impact of Balancing Datasets by Upsampling

In this section, we investigate handling class imbalance by using the random oversampling technique. Although this technique shows improvements over the unbalanced case, oversampling performed worse than adjusting the class weights. Figure 6 shows the effect of the random oversampling technique on our proposed BERTGuard approach.
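As a point of reference, random oversampling is applied to the training split before tokenization: minority-class articles are duplicated until the class counts match. The pandas sketch below is a minimal illustration; the `text`/`label` column names and the toy data frame are assumptions, not our actual data loading code.

```python
import pandas as pd

def random_oversample(df: pd.DataFrame, label_col: str = "label", seed: int = 42) -> pd.DataFrame:
    """Duplicate minority-class rows (with replacement) until every class matches the majority size."""
    target = df[label_col].value_counts().max()
    parts = []
    for _, grp in df.groupby(label_col):
        if len(grp) < target:                                  # only smaller classes get extra copies
            extra = grp.sample(n=target - len(grp), replace=True, random_state=seed)
            grp = pd.concat([grp, extra])
        parts.append(grp)
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy example: 3 real articles vs. 1 fake article becomes 3 vs. 3.
toy = pd.DataFrame({"text": ["a", "b", "c", "d"], "label": [0, 0, 0, 1]})
print(random_oversample(toy)["label"].value_counts())
```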
The F1-scores improved notably across multiple datasets compared to the baseline. They topped 89% on ‘Snopes’ and ‘COVID-Claims’, showing a significant improvement in both precision and recall. The approach achieved balanced precision and recall on both ‘GossipCop’ and ‘ISOT-small’, indicating an effective trade-off between false positives and false negatives in both the baseline and domain-specific approaches.
While some datasets’ F1-scores improved compared to the imbalanced baseline, others declined. Notably, ‘COVID-Claims’ and ‘Pheme’ had lower F1-scores under the baseline approach, showing the limitations of the baseline model even in a balanced setting. In the domain-specific approach, several datasets, such as ‘Politifact’ and ‘Climate’, still proved difficult, with lower F1-scores. This indicates that upsampling alone might not fully address the complexities of certain domains, and other techniques, such as adjusting the class weights, may perform better.
In conclusion, while both upsampling and class weight adjustment strategies help to improve fake news detection in imbalanced datasets, the class weight adjustment approach displayed superior adaptability and competitive or superior F1-scores across many datasets, as presented in Figure 7.

5.7. Impact of Balancing Datasets by Downsampling

In this section, we investigate handling the class imbalance by using the downsampling technique. Although this technique has shown improvements over the unbalanced case in other classification tasks, it did not help in our context.
Balancing the class distribution using random undersampling had a noticeable effect on the performance of our fake news detection approach. Comparing the F1-scores before and after applying random undersampling, most datasets show a slight decrease when the models are trained on data balanced by randomly removing instances from the majority class. This is likely due to the information lost when majority-class instances are discarded. It is worth noting, however, that while the F1-scores decreased, they remain relatively high, indicating that our model still performs well after balancing the dataset. Figure 8 presents the F1-score results for the proposed BERTGuard after applying the random undersampling technique for handling class imbalance.
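For completeness, the sketch below shows the random undersampling step itself; the rows dropped from the majority class are exactly the information loss discussed above. As with the oversampling sketch, the column names and toy data are assumptions rather than our actual pipeline.

```python
import pandas as pd

def random_undersample(df: pd.DataFrame, label_col: str = "label", seed: int = 42) -> pd.DataFrame:
    """Keep a minority-sized random subset of every class; surplus majority rows are discarded."""
    target = df[label_col].value_counts().min()
    parts = [grp.sample(n=target, replace=False, random_state=seed)
             for _, grp in df.groupby(label_col)]
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy example: 5 real vs. 2 fake articles becomes 2 vs. 2; three real articles are simply lost.
toy = pd.DataFrame({"text": list("abcdefg"), "label": [0, 0, 0, 0, 0, 1, 1]})
print(random_undersample(toy)["label"].value_counts())
```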
Regarding the baselines, downsampling produced mixed results. While there were improvements on ‘Pheme’ and ‘Politifact’, other datasets such as ‘COVID-Claims’ and ‘Snopes’ suffered, leading to decreased precision. Furthermore, ‘GossipCop’ and ‘ISOT-small’ revealed a trade-off between precision and recall, highlighting the difficulty of attaining balanced performance through downsampling alone. Figure 9 presents the F1-scores for the Baseline 1 (single-stage) detection model after applying the downsampling technique for handling the imbalance.
The performance of downsampling was context-dependent, with certain datasets favouring this approach over others. This underscores the importance of understanding the intricacies of each dataset to tailor the strategy accordingly. It performed well in some cases compared to the non-balanced case but had worse performance than the other imbalance handling techniques. Figure 10 compares the F1-score results when utilizing all three class imbalance handling techniques in the proposed domain-specific approach.
Another observation is that the F1-scores for some datasets remained relatively stable after balancing, while others, such as ‘Politifact’ and ‘Climate’, saw a more significant decrease. This variation could be due to the nature of the datasets and the distribution of the classes within them.
Downsampling was not effective at improving the precision–recall balance and was unhelpful on most datasets. Upsampling, while beneficial in some domains, has its own drawbacks, such as the risk of overfitting to duplicated minority instances, although it improved results on more datasets than downsampling did. Class weight modification emerges as the most consistently adaptive method for preserving and improving detection accuracy across diverse datasets and domains. The domain-specific character of the remaining issues, as seen in ‘Politifact’, calls for more nuanced study and future model refinements.

6. Conclusions

In this paper, we propose BERTGuard: a thorough, domain-specific strategy within the multi-domain fake news detection framework. Our methodology builds upon the unique characteristics that exist in various news domains. BERTGuard, a two-tiered fake news detection approach, includes domain categorization and domain-specific analyses using various BERT models. Our proposed approach demonstrates enhanced adaptability and precision across diverse information environments by leveraging multiple datasets from various domains simultaneously. To the best of our knowledge, this setting has not been explored previously. Our research also takes a methodical approach to the challenging obstacle of class imbalance in multi-domain fake news detection. We determine the best-performing strategy by rigorously evaluating handling strategies such as random oversampling, random undersampling, and class weight adjustment. This strengthens the detection system against the difficulties of imbalanced datasets. Our BERTGuard approach outperformed state-of-the-art solutions for fake news detection in a multi-domain setting.
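For readers who wish to reproduce the two-tiered flow at inference time, the sketch below illustrates the routing idea: a shared domain classifier first assigns an incoming article to a domain, and the corresponding domain-specific detector then judges its veracity. The checkpoint paths, label strings, and fallback domain are illustrative placeholders, not released artifacts or the exact implementation.

```python
# Schematic two-tiered inference (domain classification, then domain-specific detection).
# Checkpoint paths and label strings below are illustrative placeholders.
from transformers import pipeline

DOMAINS = ["crime", "health", "politics", "science", "social"]
domain_clf = pipeline("text-classification", model="./checkpoints/domain-classifier")
detectors = {d: pipeline("text-classification", model=f"./checkpoints/fake-news-{d}")
             for d in DOMAINS}

def bertguard_predict(article: str) -> dict:
    """Route the article to its domain, then run that domain's fake-news detector."""
    domain = domain_clf(article, truncation=True)[0]["label"].lower()
    detector = detectors.get(domain, detectors["politics"])   # arbitrary fallback for unknown labels
    verdict = detector(article, truncation=True)[0]
    return {"domain": domain, "label": verdict["label"], "score": verdict["score"]}

print(bertguard_predict("New study claims a common food additive cures the flu."))
```

Keeping the domain classifier and the per-domain detectors as separate checkpoints is what allows each detector to be retrained or replaced independently as new domains are added.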
Our findings provide a solid framework for comprehensive, deep learning-based fake news detection approaches and offer useful insights for practitioners looking for effective ways to handle class imbalance in this critical domain. Future research should focus on refining the domain-specific methodology, exploring new information domains, and investigating solutions for the evolving challenges in the dynamic landscape of fake news. Future studies should also use transfer learning to build stronger models capable of detecting fake news patterns across additional domains. In addition, future work will involve exploring various anomaly detection methods to address unanticipated anomalies and class imbalances more effectively. By integrating these techniques, we aim to refine our model’s ability to manage diverse and imbalanced datasets, thereby improving detection accuracy and reliability across different domains.

Author Contributions

Conceptualization, M.Q.A. and P.B.; Data curation, M.Q.A.; Formal analysis, M.Q.A.; Investigation, M.Q.A.; Methodology, M.Q.A.; Project administration, M.Q.A. and P.B.; Resources, M.Q.A.; Software, M.Q.A.; Supervision, P.B.; Validation, M.Q.A. and P.B.; Visualization, M.Q.A.; Writing—original draft, M.Q.A.; Writing—review and editing, P.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in this article are publicly available at the following links: https://github.com/joolsa/fake_real_news_dataset, accessed on: 12 May 2024; https://www.kaggle.com/competitions/fake-news/data, accessed on: 12 May 2024; http://fakenews.research.sfu.ca/, accessed on: 12 May 2024; https://data.mendeley.com/datasets/zwfdmp5syg/1, accessed on: 12 May 2024; https://ieee-dataport.org/documents/covifn-fake-news-covid19, accessed on: 12 May 2024; https://www.kaggle.com/datasets/rmisra/politifact-fact-check-dataset, accessed on: 12 May 2024; https://ieee-dataport.org/open-access/covid-19-fake-news-infodemic-research-dataset-covid19-fnir-datase, accessed on: 12 May 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Silva, A.; Luo, L.; Karunasekera, S.; Leckie, C. Embracing domain differences in fake news: Cross-domain fake news detection using multi-modal data. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 557–565. [Google Scholar]
  2. Chen, Q. Coronavirus Rumors Trigger Irrational Behaviors among Chinese Netizens. 2020. Available online: https://www.globaltimes.cn/content/1178157.shtml (accessed on 12 May 2024).
  3. Sharma, K.; Qian, F.; Jiang, H.; Ruchansky, N.; Zhang, M.; Liu, Y. Combating fake news: A survey on identification and mitigation techniques. Acm Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–42. [Google Scholar] [CrossRef]
  4. Schuster, T.; Schuster, R.; Shah, D.J.; Barzilay, R. The limitations of stylometry for detecting machine-generated fake news. Comput. Linguist. 2020, 46, 499–510. [Google Scholar] [CrossRef]
  5. Shabani, S.; Sokhn, M. Hybrid machine-crowd approach for fake news detection. In Proceedings of the 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC), Philadelphia, PA, USA, 18–20 October 2018; pp. 299–306. [Google Scholar]
  6. Nan, Q.; Wang, D.; Zhu, Y.; Sheng, Q.; Shi, Y.; Cao, J.; Li, J. Improving Fake News Detection of Influential Domain via Domain- and Instance-Level Transfer. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 2834–2848. [Google Scholar]
  7. Nan, Q.; Cao, J.; Zhu, Y.; Wang, Y.; Li, J. MDFEND: Multi-domain fake news detection. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, QLD, Australia, 1–5 November 2021; pp. 3343–3347. [Google Scholar]
  8. Allcott, H.; Gentzkow, M. Social media and fake news in the 2016 election. J. Econ. Perspect. 2017, 31, 211–236. [Google Scholar] [CrossRef]
  9. Vosoughi, S.; Roy, D.; Aral, S. The spread of true and false news online. Science 2018, 359, 1146–1151. [Google Scholar] [CrossRef] [PubMed]
  10. Bursztyn, L.; Rao, A.; Roth, C.P.; Yanagizawa-Drott, D.H. Misinformation during a Pandemic; Technical Report; National Bureau of Economic Research: Cambridge, MA, USA, 2020. [Google Scholar]
  11. Alnabhan, M.Q.; Branco, P. Evaluating Deep Learning for Cross-Domains Fake News Detection. In Proceedings of the International Symposium on Foundations and Practice of Security, Bordeaux, France, 11–13 December 2023; Springer: Cham, Switzerland, 2024; pp. 40–51. [Google Scholar]
  12. Kaliyar, R.K.; Goswami, A.; Narang, P. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 2021, 80, 11765–11788. [Google Scholar] [CrossRef] [PubMed]
  13. Tang, H.; Liu, J.; Zhao, M.; Gong, X. Progressive layered extraction (ple): A novel multi-task learning (mtl) model for personalized recommendations. In Proceedings of the 14th ACM Conference on Recommender Systems, Rio de Janeiro, Brazil, 22–26 September 2020; pp. 269–278. [Google Scholar]
  14. Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. A comparative study of data sampling and cost sensitive learning. In Proceedings of the 2008 IEEE International Conference on Data Mining Workshops, Pisa, Italy, 15–19 December 2008; pp. 46–52. [Google Scholar]
  15. Alnabhan, M.Q.; Branco, P. Fake News Detection Using Deep Learning: A Systematic Literature Review. IEEE Access 2024, 12, 1. [Google Scholar] [CrossRef]
  16. Longadge, R.; Dongre, S. Class imbalance problem in data mining review. arXiv 2013, arXiv:1305.1707. [Google Scholar]
  17. Alenezi, M.N.; Alqenaei, Z.M. Machine learning in detecting COVID-19 misinformation on twitter. Future Internet 2021, 13, 244. [Google Scholar] [CrossRef]
  18. Moravec, P.; Kim, A.; Dennis, A. Flagging fake news: System 1 vs. System 2. In Proceedings of the 39th International Conference on Information Systems, San Francisco, CA, USA, 13–16 December 2018. [Google Scholar]
  19. Khweiled, R.; Jazzar, M.; Eleyan, D. Cybercrimes during COVID-19 pandemic. Int. J. Inf. Eng. Electron. Bus. 2021, 13, 1–10. [Google Scholar] [CrossRef]
  20. Shin, D.; Koerber, A.; Lim, J.S. Impact of misinformation from generative AI on user information processing: How people understand misinformation from generative AI. New Media Soc. 2024, 14614448241234040. [Google Scholar] [CrossRef]
  21. Qawasmeh, E.; Tawalbeh, M.; Abdullah, M. Automatic identification of fake news using deep learning. In Proceedings of the 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), Granada, Spain, 22–25 October 2019; pp. 383–388. [Google Scholar]
  22. Kozik, R.; Kula, S.; Choraś, M.; Woźniak, M. Technical solution to counter potential crime: Text analysis to detect fake news and disinformation. J. Comput. Sci. 2022, 60, 101576. [Google Scholar] [CrossRef]
  23. Deepak, S.; Chitturi, B. Deep neural approach to Fake-News identification. Procedia Comput. Sci. 2020, 167, 2236–2243. [Google Scholar]
  24. Sharma, S.; Saraswat, M.; Dubey, A.K. Fake News Detection Using Deep Learning. In Proceedings of the Knowledge Graphs and Semantic Web: Third Iberoamerican Conference and Second Indo-American Conference, KGSWC 2021, Kingsville, TX, USA, 22–24 November 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 249–259. [Google Scholar]
  25. Pilkevych, I.; Fedorchuk, D.; Naumchak, O.; Romanchuk, M. Fake news detection in the framework of decision-making system through graph neural network. In Proceedings of the 2021 IEEE 4th International Conference on Advanced Information and Communication Technologies (AICT), Lviv, Ukraine, 21–25 September 2021; pp. 153–157. [Google Scholar]
  26. Manene, S. Mitigating misinformation about the COVID-19 infodemic on social media: A conceptual framework. Jàmbá J. Disaster Risk Stud. 2023, 15, 1416. [Google Scholar] [CrossRef]
  27. Akhter, M.; Hossain, S.M.M.; Nigar, R.S.; Paul, S.; Kamal, K.M.A.; Sen, A.; Sarker, I.H. COVID-19 Fake News Detection using Deep Learning Model. Ann. Data Sci. 2024, 1–32. [Google Scholar] [CrossRef]
  28. Nasir, J.A.; Khan, O.S.; Varlamis, I. Fake news detection: A hybrid CNN-RNN based deep learning approach. Int. J. Inf. Manag. Data Insights 2021, 1, 100007. [Google Scholar] [CrossRef]
  29. Kaliyar, R.K.; Goswami, A.; Narang, P.; Sinha, S. FNDNet—A deep convolutional neural network for fake news detection. Cogn. Syst. Res. 2020, 61, 32–44. [Google Scholar] [CrossRef]
  30. Saleh, H.; Alharbi, A.; Alsamhi, S.H. OPCNN-FAKE: Optimized convolutional neural network for fake news detection. IEEE Access 2021, 9, 129471–129489. [Google Scholar] [CrossRef]
  31. Yang, Y.; Zheng, L.; Zhang, J.; Cui, Q.; Li, Z.; Yu, P.S. TI-CNN: Convolutional neural networks for fake news detection. arXiv 2018, arXiv:1806.00749. [Google Scholar]
  32. Raj, C.; Meel, P. ConvNet frameworks for multi-modal fake news detection. Appl. Intell. 2021, 51, 8132–8148. [Google Scholar] [CrossRef]
  33. Hashmi, E.; Yayilgan, S.Y.; Yamin, M.M.; Ali, S.; Abomhara, M. Advancing fake news detection: Hybrid deep learning with fasttext and explainable AI. IEEE Access 2024, 12, 44462–44480. [Google Scholar] [CrossRef]
  34. Mosallanezhad, A.; Karami, M.; Shu, K.; Mancenido, M.V.; Liu, H. Domain adaptive fake news detection via reinforcement learning. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 3632–3640. [Google Scholar]
  35. Li, X.; Fu, X.; Xu, G.; Yang, Y.; Wang, J.; Jin, L.; Liu, Q.; Xiang, T. Enhancing BERT representation with context-aware embedding for aspect-based sentiment analysis. IEEE Access 2020, 8, 46868–46876. [Google Scholar] [CrossRef]
  36. Xu, H.; Liu, B.; Shu, L.; Yu, P.S. BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis. arXiv 2019, arXiv:1904.02232. [Google Scholar]
  37. Kumar, B. BERT Variants and Their Differences; Technical report; 360DigiTMG: Hyderabad, India, 2023. [Google Scholar]
  38. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2020, arXiv:1910.01108. [Google Scholar]
  39. Lutkevich, B. BERT Language Model; Technical report; TechTarget: Newton, MA, USA, 2020. [Google Scholar]
  40. Tida, V.S.; Hsu, D.S.; Hei, D.X. Unified Fake News Detection using Transfer Learning of BERT Model. IEEE 2020. Available online: https://d1wqtxts1xzle7.cloudfront.net/86079521/2202.01907v1-libre.pdf?1652817185=&response-content-disposition=inline%3B+filename%3DUnified_Fake_News_Detection_using_Transf.pdf&Expires=1723717032&Signature=SlJqui-38VOu3m7EAFYMcfZkoxq23tXKTFkq-wlwLHawKo0ibgs47MWTsCwm~7pRxvt4tl7LYN90t0QkZ7TNA8u30OuhD1JPpvNYhXoF4rYemFei0xLNEpYr4NkaPcsRshcrXcEuN0u1DTA5aR8TD1eZhJcU6x1~AZbl745yKnoIrztd032Gb2EVFS5VW~Gy3xxYIiAWD~HJ3zu5SFhTzdOcHChdGXexeXZ8Dls7N-UU-KGdGMWq4XnwnWXv9A20jpMYks6Dqcho9rutx~f3t3A0UyuCYilNghvcU-o0uGj4J4zGnEN1rhhCvtCUEAl1DMabCr-aCCW73t7Q9URcRg__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA (accessed on 12 May 2024).
  41. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. Albert: A lite bert for self-supervised learning of language representations. arXiv 2019, arXiv:1909.11942. [Google Scholar]
  42. Luo, Y.; Shi, Y.; Li, S. Social media fake news detection algorithm based on multiple feature groups. In Proceedings of the 2023 IEEE 3rd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China, 26–28 May 2023; Volume 3, pp. 91–95. [Google Scholar]
  43. Bounaama, R.; Abderrahim, M.E.A. Classifying COVID-19 Related Tweets for Fake News Detection and Sentiment Analysis with BERT-based Models. arXiv 2023, arXiv:2304.00636. [Google Scholar]
  44. Essa, E.; Omar, K.; Alqahtani, A. Fake news detection based on a hybrid BERT and LightGBM models. Complex Intell. Syst. 2023, 9, 6581–6592. [Google Scholar] [CrossRef] [PubMed]
  45. Shushkevich, E.; Cardiff, J.; Boldyreva, A. Detection of Truthful, Semi-Truthful, False and Other News with Arbitrary Topics Using BERT-Based Models. In Proceedings of the 2023 33rd Conference of Open Innovations Association (FRUCT), Zilina, Slovakia, 24–26 May 2023; pp. 250–256. [Google Scholar]
  46. Sultana, R.; Nishino, T. Fake News Detection System: An implementation of BERT and Boosting Algorithm. In Proceedings of the 38th International Conference on Computers and Their Applications, Virtual, 20–22 March 2023; Volume 91, pp. 124–137. [Google Scholar]
  47. Alghamdi, J.; Lin, Y.; Luo, S. Towards COVID-19 fake news detection using transformer-based models. Knowl.-Based Syst. 2023, 274, 110642. [Google Scholar] [CrossRef] [PubMed]
  48. SATHVIK, M.; Mishra, M.K.; Padhy, S. Fake News Detection by Fine Tuning of Bidirectional Encoder Representations from Transformers. IEEE Trans. Comput. Soc. Syst. 2023, 20, 20. [Google Scholar]
  49. Kitanovski, A.; Toshevska, M.; Mirceva, G. DistilBERT and RoBERTa Models for Identification of Fake News. In Proceedings of the 2023 46th MIPRO ICT and Electronics Convention (MIPRO), Opatija, Croatia, 22–26 May 2023; pp. 1102–1106. [Google Scholar]
  50. Saini, K.; Jain, R. A Hybrid LSTM-BERT and Glove-based Deep Learning Approach for the Detection of Fake News. In Proceedings of the 2023 3rd International Conference on Smart Data Intelligence (ICSMDI), Trichy, India, 30–31 March 2023; pp. 400–406. [Google Scholar]
  51. Fauzy, A.R.I.; Setiawan, E.B. Detecting Fake News on Social Media Combined with the CNN Methods. J. Resti (Rekayasa Sist. Dan Teknol. Informasi) 2023, 7, 271–277. [Google Scholar] [CrossRef]
  52. Nassif, A.B.; Elnagar, A.; Elgendy, O.; Afadar, Y. Arabic fake news detection based on deep contextualized embedding models. Neural Comput. Appl. 2022, 34, 16019–16032. [Google Scholar] [CrossRef]
  53. Ranjan, V.; Agrawal, P. Fake News Detection: GA-Transformer And IG-Transformer Based Approach. In Proceedings of the 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Virtual Conference, 27–28 January 2022; pp. 487–493. [Google Scholar]
  54. Raza, S.; Ding, C. Fake news detection based on news content and social contexts: A transformer-based approach. Int. J. Data Sci. Anal. 2022, 13, 335–362. [Google Scholar] [CrossRef] [PubMed]
  55. Truică, C.O.; Apostol, E.S. MisRoBÆRTa: Transformers versus misinformation. Mathematics 2022, 10, 569. [Google Scholar] [CrossRef]
  56. Schütz, M.; Schindler, A.; Siegel, M.; Nazemi, K. Automatic fake news detection with pre-trained transformer models. In Proceedings of the Pattern Recognition. ICPR International Workshops and Challenges, Virtual Event, 10–15 January 2021; Springer: Berlin/Heidelberg, Germany, 2021. Part VII. pp. 627–641. [Google Scholar]
  57. Huang, Y.; Gao, M.; Wang, J.; Shu, K. Dafd: Domain adaptation framework for fake news detection. In Proceedings of the Neural Information Processing: 28th International Conference, ICONIP 2021, Sanur, Bali, Indonesia, 8–12 December 2021; Springer: Berlin/Heidelberg, Germany, 2021. Part I 28. pp. 305–316. [Google Scholar]
  58. Qazi, M.; Khan, M.U.; Ali, M. Detection of fake news using transformer model. In Proceedings of the 2020 3rd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, 29–30 January 2020; pp. 1–6. [Google Scholar]
  59. Nirav Shah, M.; Ganatra, A. A systematic literature review and existing challenges toward fake news detection models. Soc. Netw. Anal. Min. 2022, 12, 168. [Google Scholar] [CrossRef]
  60. Kato, S.; Yang, L.; Ikeda, D. Domain Bias in Fake News Datasets Consisting of Fake and Real News Pairs. In Proceedings of the 2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI), Kanazawa, Japan, 2–8 July 2022; pp. 101–106. [Google Scholar]
  61. Hamed, S.K.; Ab Aziz, M.J.; Yaakub, M.R. A review of fake news detection approaches: A critical analysis of relevant studies and highlighting key challenges associated with the dataset, feature representation, and data fusion. Heliyon 2023, 9, e20382. [Google Scholar] [CrossRef]
  62. Ghosh, K.; Bellinger, C.; Corizzo, R.; Branco, P.; Krawczyk, B.; Japkowicz, N. The class imbalance problem in deep learning. Mach. Learn. 2024, 113, 4845–4901. [Google Scholar] [CrossRef]
  63. Rastogi, S.; Bansal, D. A review on fake news detection 3T’s: Typology, time of detection, taxonomies. Int. J. Inf. Secur. 2023, 22, 177–212. [Google Scholar] [CrossRef] [PubMed]
  64. Zhou, P.; Han, X.; Morariu, V.I.; Davis, L.S. Two-stream neural networks for tampered face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 19–27. [Google Scholar]
  65. Cardoso, E.F.; Silva, R.M.; Almeida, T.A. Towards automatic filtering of fake reviews. Neurocomputing 2018, 309, 106–116. [Google Scholar] [CrossRef]
  66. Castelo, S.; Almeida, T.; Elghafari, A.; Santos, A.; Pham, K.; Nakamura, E.; Freire, J. A topic-agnostic approach for identifying fake news pages. In Proceedings of the Companion Proceedings of the 2019 World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 975–980. [Google Scholar]
  67. Shu, K.; Mahudeswaran, D.; Wang, S.; Lee, D.; Liu, H. Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big Data 2020, 8, 171–188. [Google Scholar] [CrossRef]
  68. Ahmad, I.; Yousaf, M.; Yousaf, S.; Ahmad, M.O. Fake news detection using machine learning ensemble methods. Complexity 2020, 2020, 1–11. [Google Scholar] [CrossRef]
  69. Zubiaga, A.; Liakata, M.; Procter, R. Learning reporting dynamics during breaking news for rumour detection in social media. arXiv 2016, arXiv:1610.07363. [Google Scholar]
  70. Wang, W.Y. “Liar, liar pants on fire”: A new benchmark dataset for fake news detection. arXiv 2017, arXiv:1705.00648. [Google Scholar]
  71. Diggelmann, T.; Boyd-Graber, J.; Bulian, J.; Ciaramita, M.; Leippold, M. CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims. arXiv 2020, arXiv:2012.00614. [Google Scholar]
  72. Branco, P.; Torgo, L.; Ribeiro, R.P. A survey of predictive modeling on imbalanced domains. ACM Comput. Surv. (CSUR) 2016, 49, 1–50. [Google Scholar] [CrossRef]
  73. Agarwal, I.Y.; Rana, D.P. Fake News and Imbalanced Data Perspective. In Data Preprocessing, Active Learning, and Cost Perceptive Approaches for Resolving Data Imbalance; IGI Global: Hershey, PA, USA, 2021; pp. 195–210. [Google Scholar]
  74. Salah, I.; Jouini, K.; Korbaa, O. On the use of text augmentation for stance and fake news detection. J. Inf. Telecommun. 2023, 7, 359–375. [Google Scholar] [CrossRef]
  75. Keya, A.J.; Wadud, M.A.H.; Mridha, M.; Alatiyyah, M.; Hamid, M.A. AugFake-BERT: Handling imbalance through augmentation of fake news using BERT to enhance the performance of fake news classification. Appl. Sci. 2022, 12, 8398. [Google Scholar] [CrossRef]
  76. Sastrawan, I.K.; Bayupati, I.; Arsa, D.M.S. Detection of fake news using deep learning CNN–RNN based methods. ICT Express 2022, 8, 396–408. [Google Scholar] [CrossRef]
  77. Mouratidis, D.; Nikiforos, M.N.; Kermanidis, K.L. Deep learning for fake news detection in a pairwise textual input schema. Computation 2021, 9, 20. [Google Scholar] [CrossRef]
  78. Al Obaid, A.; Khotanlou, H.; Mansoorizadeh, M.; Zabihzadeh, D. Multimodal fake-news recognition using ensemble of deep learners. Entropy 2022, 24, 1242. [Google Scholar] [CrossRef]
  79. Isa, S.M.; Nico, G.; Permana, M. Indobert for Indonesian fake news detection. ICIC Express Lett. 2022, 16, 289–297. [Google Scholar]
  80. Szczepański, M.; Pawlicki, M.; Kozik, R.; Choraś, M. New explainability method for BERT-based model in fake news detection. Sci. Rep. 2021, 11, 23705. [Google Scholar] [CrossRef] [PubMed]
  81. Palani, B.; Elango, S.; Viswanathan K, V. CB-Fake: A multimodal deep learning framework for automatic fake news detection using capsule neural network and BERT. Multimed. Tools Appl. 2022, 81, 5587–5620. [Google Scholar] [CrossRef] [PubMed]
  82. Rai, N.; Kumar, D.; Kaushik, N.; Raj, C.; Ali, A. Fake News Classification using transformer based enhanced LSTM and BERT. Int. J. Cogn. Comput. Eng. 2022, 3, 98–105. [Google Scholar] [CrossRef]
  83. Gaudreault, J.G.; Branco, P.; Gama, J. An analysis of performance metrics for imbalanced classification. In Proceedings of the International Conference on Discovery Science, Virtual, 11–13 October 2021; pp. 67–77. [Google Scholar]
Figure 1. BERTGuard fake news detection architecture.
Figure 2. F1-scores for both baselines against BERTGuard.
Figure 3. The effect of handling the imbalance on the main baseline (Baseline 1) F1-scores by adjusting the class weights.
Figure 4. The effect of handling the imbalance on the BERTGuard F1-scores by adjusting the class weights.
Figure 5. F1-scores of both baselines and BERTGuard after treating the imbalance by adjusting class weights.
Figure 6. The effect of random oversampling on BERTGuard F1-scores.
Figure 7. F1-score comparison between upsampling and class weight adjustments on BERTGuard.
Figure 8. The effect of random undersampling on BERTGuard F1-scores.
Figure 9. The effect of random undersampling on Baseline 1 F1-scores.
Figure 10. The effect of using upsampling, downsampling, and class weight adjustment vs. no handling technique on BERTGuard.
Table 1. The main transformer-based fake news detection models.
Ref. | Year | Model | Covered Multi-Domain?
[11] | 2023 | BiLSTM, Hybrid CNN+RNN, CNN, C-LSTM, and BERT | No
[42] | 2023 | BERT+LSTM model for text content analysis; GAT for modeling social network features | No
[43] | 2023 | BERT | No
[44] | 2023 | BERT + LightGBM | No
[45] | 2023 | SBERT, RoBERTa, and mBERT | No
[46] | 2023 | Hybrid ensemble learning model: BERT for text classification tasks; ensemble learning models, including Voting Regressor and Boosting Ensemble | No
[47] | 2023 | CT-BERT with BiGRU | No
[48] | 2023 | BERT, LSTM, BiLSTM, and CNN-BiLSTM | No
[49] | 2023 | DistilBERT and RoBERTa | No
[50] | 2023 | Hybrid model combining LSTM and BERT with GloVe embeddings | No
[51] | 2023 | TF-IDF N-gram, BERT, GloVe, and CNN | No
[52] | 2022 | Arabic-BERT, ARBERT, and QaribBert | No
[53] | 2022 | BERT, XLNet, RoBERTa, and Longformer | No
[34] | 2022 | BERT with Reinforcement Learning | Yes: Gossip (Social) and Political
[54] | 2022 | BERT | No
[55] | 2022 | BART and RoBERTa | No
[12] | 2021 | FakeBERT: Combination of BERT and 1d-CNN | Yes: Social and Political
[56] | 2021 | ALBERT, BERT, RoBERTa, XLNet, and DistilBERT | No
[57] | 2021 | BERT | Yes: Political, Social, and Health
Table 2. Characteristics of the datasets used in our experiments.
Dataset | Domain | True | Fake
FA-KES | Crime | 426 | 378
Snopes | Crime | 195 | 120
COVID-FNIR | Health | 3795 | 3793
COVID-Claims | Health | 1591 | 1230
COVID-FN | Health | 2061 | 1058
Pheme | Politics | 5089 | 1335
Liar | Politics | 7176 | 5669
ISOT | Politics | 23,481 | 21,417
Politifact | Politics | 11,760 | 9392
FakeNews | Politics | 10,413 | 10,387
Climate | Science | 654 | 253
ISOT-small (Science) | Science | 1000 | 1000
GossipCop | Social | 16,817 | 5323
ISOT-small (Social) | Social | 1000 | 1000
Table 3. Performance evaluation metrics.
Metric | Formula | Evaluation Focus
Accuracy | (tp + tn) / (tp + fp + tn + fn) | The proportion of correctly predicted positive and negative instances out of the total instances.
Precision | tp / (tp + fp) | Measures the positive patterns that are correctly predicted from the total predicted patterns in a positive class.
Recall | tp / (tp + fn) | The proportion of actual positive instances that were correctly predicted by the model.
F1-Score | 2 × (pre × rec) / (pre + rec) | The harmonic mean of precision and recall, balancing both metrics.
Geometric-mean (G-Mean) | √(pre × rec) | The geometric mean of precision and recall, focusing on the balance between the two.
Note: tp—true positive; tn—true negative; fp—false positive; fn—false negative.
Table 4. Domain classification model results.
Model | Average Accuracy | Training Time
albert-base-v2 | 95.8% | 724.1373889
bert-base-uncased | 92.9% | 1933.898181
distilbert-base-uncased | 96.8% | 96.21049619
distilbert-base-uncased-finetuned-sst-2-english | 91.8% | 220.6201028
roberta-base | 98.6% | 1545.978703
Table 5. The average accuracies of various models on news domains.
Model | Crime | Health | Politics | Science | Social
albert-base-v2 | 61.7% | 84.8% | 67.8% | 75.3% | 84.2%
bert-base-uncased | 61.3% | 87.5% | 87.8% | 74.7% | 84.7%
distilbert-base-uncased | 60.1% | 88.3% | 68.5% | 77.7% | 95.6%
distilbert-base-uncased-finetuned-sst-2-english | 68.8% | 86.2% | 61.3% | 71.6% | 84.5%
roberta-base | 56.9% | 87.6% | 71.8% | 80.2% | 88.7%
bert-base-cased | 61.9% | 87.1% | 69.6% | 76.4% | 84.7%
Table 6. Single-stage fake news detection results—Baseline 1.
Testing Dataset | Precision | Recall | F1-Score | Accuracy | G-Mean
Snope | 0.583011583 | 0.774358974 | 0.665198238 | 0.517460317 | 0.671907919
COVID-Claims | 0.224425887 | 0.135135135 | 0.168693605 | 0.246979389 | 0.174148852
Pheme | 0.313032887 | 0.192509363 | 0.238404453 | 0.744396015 | 0.245482712
Politifact | 0.486368313 | 0.18228263 | 0.265180199 | 0.49520736 | 0.297752406
Climate | 0.349206349 | 0.434782609 | 0.387323944 | 0.61631753 | 0.389652213
GossipCop | 0.378611058 | 0.743565658 | 0.501743044 | 0.644941283 | 0.530586638
ISOT-small | 0.581487556 | 0.789125069 | 0.669578551 | 0.556974212 | 0.677396788
Table 7. Overall results of Baseline 1, Baseline 2 (FakeBERT), and BERTGuard.
Architecture | Precision | Recall | F1-Score | Accuracy
BERTGuard (Domain-Aware) | 82% | 68% | 70% | 82%
Baseline 1 (No Domains) | 42% | 46% | 41% | 55%
Baseline 2 (FakeBERT) | 32% | 34% | 31% | 38%
These results reflect the average testing values for various unseen datasets.
Table 8. The proposed BERTGuard testing results.
Testing Dataset | Precision | Recall | F1-Score | Accuracy | G-Mean
Snopes | 0.837719298 | 0.979487179 | 0.903073286 | 0.86984127 | 0.905834043
COVID-Claims | 0.956869994 | 0.976115651 | 0.966397013 | 0.961620469 | 0.966444916
Pheme | 0.847560976 | 0.624719101 | 0.71927555 | 0.89866127 | 0.727658939
Politifact | 0.800490597 | 0.083248299 | 0.150812601 | 0.478772693 | 0.258146239
Climate | 0.534798535 | 0.577075099 | 0.55513308 | 0.742006615 | 0.555534803
GossipCop | 0.835781991 | 0.66259628 | 0.739180551 | 0.887579042 | 0.744168017
ISOT-small | 0.898968008 | 0.868 | 0.883240223 | 0.8835 | 0.883362106
Table 9. The impact of class imbalance handling in BERTGuard.
Strategy | Precision | Recall | F1-Score | Accuracy
BERTGuard
No balancing | 82% | 68% | 70% | 82%
Random oversampling | 81% | 69% | 71% | 82%
Random undersampling | 79% | 61% | 65% | 68%
Adjusting class weights | 87% | 68% | 73% | 78%
Baseline 1
No balancing | 42% | 46% | 41% | 55%
Random oversampling | 46% | 48% | 45% | 52%
Random undersampling | 36% | 41% | 37% | 46%
Adjusting class weights | 62% | 51% | 55% | 56%
FakeBERT (Baseline 2)
No balancing | 32% | 34% | 31% | 38%
Random oversampling | 46% | 48% | 45% | 52%
Random undersampling | 36% | 41% | 37% | 46%
Adjusting class weights | 62% | 51% | 55% | 56%
Note: These results reflect the average testing values for testing using various unseen datasets to include Snope, COVID-Claims, Pheme, Politifact, Climate, GossipCop, and ISOT-Science.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
