HyproBert: A Fake News Detection Model Based on Deep Hypercontext

Nadeem, Muhammad Imran; Mohsan, Syed Agha Hassnain; Ahmed, Kanwal; Li, Dun; Zheng, Zhiyun; Shafiq, Muhammad; Karim, Faten Khalid; Mostafa, Samih M.

doi:10.3390/sym15020296

Open AccessArticle

HyproBert: A Fake News Detection Model Based on Deep Hypercontext

by

Muhammad Imran Nadeem

¹

,

Syed Agha Hassnain Mohsan

²

,

Kanwal Ahmed

¹

,

Dun Li

^1,*,

Zhiyun Zheng

¹,

Muhammad Shafiq

³,

Faten Khalid Karim

^4,* and

Samih M. Mostafa

⁵

¹

School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China

²

Optical Communications Laboratory, Ocean College, Zhejiang University, Zheda Road 1, Zhoushan 316021, China

³

Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China

⁴

Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia

⁵

Computer Science Department, Faculty of Computers and Information, South Valley University, Qena 83523, Egypt

^*

Authors to whom correspondence should be addressed.

Symmetry 2023, 15(2), 296; https://doi.org/10.3390/sym15020296

Submission received: 10 November 2022 / Revised: 11 January 2023 / Accepted: 13 January 2023 / Published: 21 January 2023

Download

Browse Figures

Versions Notes

Abstract

:

News media agencies are known to publish misinformation, disinformation, and propaganda for the sake of money, higher news propagation, political influence, or other unfair reasons. The exponential increase in the use of social media has also contributed to the frequent spread of fake news. This study extends the concept of symmetry into deep learning approaches for advanced natural language processing, thereby improving the identification of fake news and propaganda. A hybrid HyproBert model for automatic fake news detection is proposed in this paper. To begin, the proposed HyproBert model uses DistilBERT for tokenization and word embeddings. The embeddings are provided as input to the convolution layer to highlight and extract the spatial features. Subsequently, the output is provided to BiGRU to extract the contextual features. The CapsNet, along with the self-attention layer, proceeds to the output of BiGRU to model the hierarchy relationship among the spatial features. Finally, a dense layer is implemented to combine all the features for classification. The proposed HyproBert model is evaluated using two fake news datasets (ISOT and FA-KES). As a result, HyproBert achieved a higher performance compared to other baseline and state-of-the-art models.

Keywords:

fake news detection; deep learning; natural language processing; DistilBERT; capsule network; deep context

1. Introduction

Over the years, the increase in the usage of social media platforms has contributed to facilitating the phenomenon of spreading false information. The spread of fake news can be linked to achieving several goals, which can be targeted toward specific individuals, groups, politics, regions, religions, ethnicity, culture, race, etc. It can also be connected with illegal and fraudulent activities [1,2]. The generation and propagation of fake news has a huge negative impact on the economy, peace, and health all over the world. It is considered a major threat to journalism. It was reported in the year 2013 that the American stock market had lost over USD 130 billion over fake news of its president being injured in an explosion. In 2016, the US elections were packed with fake news glorifying and defaming both presidential candidates. One such fake news claim proclaimed “Pope Francis endorsed President Trump”. This resulted in huge turmoil among the rivals and started riots among the supporters. In recent years, fake news related to the spread and effects of COVID has disturbed the harmony in society, created social instability, and resulted in the economic collapse of several governments.

To overcome the adverse effects of fake news, its propagation needs to be stopped. There is a lot of research in place to moderate social media content with the help of human content moderators. Machine Learning (ML) algorithms, along with previously moderated data, are effective for fake news detection [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]. However, machine learning models do not provide a significant level of efficacy and improvement over time for various datasets. Keeping in view the success of Deep Learning (DL) in various domains [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41], this research highlights the significance of deep learning techniques for Natural Language Processing (NLP) tasks. The proposed model is a hybrid of deep models, adept placements, and hyperparameter tuning for an effective solution for fake news detection. It utilizes the most valuable expertise of deep learning models in a structured manner to provide an automated, efficient, and effective classification solution for identification and detection of fake news. The main contributions of this research in the field of fake news detection include:

i.: The introduction of a novel deep neural network model, $H y p r o B e r t$ , for fake news detection;
ii.: Extraction and evaluation of content attributes at multiple orientations based on deep hypercontext;
iii.: Analysis of the developments in hyperparameter optimization on $H y p r o B e r t$ .

Section 2 describes the literature review of fake news detection along with the limitations in this domain. Section 3 includes the methodology of the proposed HyproBert model. Section 4 includes the experiments and results. Section 5 discusses the evaluation results and comparative analysis. The research along with a description of future work is concluded in Section 6.

2. Literature Review

In this section, a review of various existing machine learning and deep learning techniques being used for detection of fake news is presented. This section also summarizes the status and limitations of fake news detection in the news media.

The research on fake news detection is divided into two types: supervised learning and unsupervised learning. Ozbay and Alatas [3] suggested a two-step method to identify false news using text information; the first stage entails preprocessing, and the second stage involves applying the textual feature vector to 23 clever supervised classifiers for further experimental assessment. By applying multiple word embedding approaches across five datasets in three languages sourced from different social media sites, Faustini Covoes [4] created a model for detecting false news that is independent of language and platform. To this end, Guo et al. [5] thoroughly reviewed the advancements in the field of bogus information detection, detailing the many problems, unique methodologies, and specifics of available datasets.

For the first time in the domain of fake news detection, Ozbay and Alatas [6] employed two metaheuristic algorithms for the identification of fake news. Grey Wolf Optimization (GWO) resulted in higher performance for social media data. Kumar et al. [9] used five different classifiers and thirteen different attributes to categorize tweets; they also used Particle Swarm Optimization (PSO) to extract an ideal feature set from the tweets’ text and improve classifier performance.

Perez-Rosas et al. [7] created two new datasets for automatic identification of false news, each including seven different types of news related content. They laid out an extensive framework and patterns for distinguishing between authentic and fake news. In order to identify fake news, Ahmed et al. [8] developed a novel dataset called ISOT that was compiled from actual news stories around the globe. Linear Support Vector Machine (LSVM) classifier along with Term Frequency–Inverse Document Frequency (TF–IDF) evaluation for feature vector representation were utilized for the classification of fake news. However, these methods require bespoke features, which adds to their production and time requirements. Akyol et al. [10] determined applicability of Random Forest (RF), Gradient Boost Tree (GBT), and Multilayer Perceptron (MLP) for the concept of fake news detection.

With the rise of deep learning theory, an increasing number of researchers employ it to detect fake news. Recent research has been progressively more focused on unsupervised and semi-supervised detection methods. The capacity of deep learning models to automatically extract high-level properties from news articles is a major draw for academics; this makes these methods particularly useful for diagnosing fake news [11,12,13,14,15,16,17,18,19,20]. Goldani et al. [19] recommend the use of embedding models and Convolutional Neural Networks (CNN) to detect bogus news. During model training, static and dynamic word embeddings were compared and gradually updated. Their approach was evaluated using two public datasets. ISOT datasets improved accuracy by 7.9% and 2.1%, respectively.

Ma et al. [14] developed a Recurrent Neural Network (RNN) to represent the textual data sequence for rumor detection. Using text data, Asghar et al. [11] created a deep learning model that recognizes rumors by combining CNN with Bidirectional Long Short-Term Memory (Bi-LSTM). In order to effectively manage the textual contents in a bi-directional method, Kaliyar et al. [13] developed a hybrid of the CNN and BERT (Bidirectional Encoder Representations from Transformers) model, named FakeBERT.

Yu et al. [18] emphasized the shortcomings of RNN-based algorithms for the early identification of disinformation. They suggested a CNN model for extracting significant features and spotting propaganda. To identify and categorize bogus news, Shu et al. [15] took a pragmatic approach by establishing a co-attention method to find the top 1000 most important lines in the material and combine them with the top 1000 most significant user reactions. Autoencoder-based Unsupervised Fake News Detection (UFNDA) was proposed by Li et al. [20]. Their methodology combines the context and content data from news to produce a feature vector that will improve the identification of false news. For early rumor detection, Chen et al. [12] developed an attention-based RNN to accumulate unique language properties over time. Wang [16] offered a new dataset named LIAR. He suggested a deep learning model that combines CNN and Bi-LSTM to extract textual features and meta-data features, respectively. Yin et al. [17] employed Support Vector Machine (SVM) classifiers to determine correlation among data. CNN and Principal Component Analysis (PCA) are utilized as the feature extraction tools. In another study, CNN, Long Short-Term Memory, and Convolutional Long Short-Term Memory (C-LSTM) are all used in the ensemble learning model for fake news detection presented by Toumi and Bouramoul [42]. The hybrid technique of CNN and RNN was proposed by Nasir et al. [43]. At first, CNN was employed for feature extraction close to the point of interest, whereas RNN was used for discovering a correlation. The difficulty of optimizing RNN is exacerbated by the vanishing and exploding gradient problem [44].

Current Status and Limitations

The current literature analysis concludes with the reality of false news and its uncontrolled spread. The extraction of deep semantic and contextual information is crucial for the identification and classification of propaganda. Extracting context from an instance’s several orientations is a key function of the Capsule Network (CapsNet) [45]. CapsNets are also utilized for NLP in various studies, including text-based categorization [46], misleading headlines recognition [47], and opinion categorization [48]. It has also been utilized specifically for fake news detection [49,50], but these studies lack the effective exploration of CapsNet. While DL models have seen extensive use in analyzing news articles, they have yet to work out the following: retaining long-term word dependencies, using a parallelization technique in training, accepting bidirectional input sentences, and maintaining an attention mechanism. The proposed HyproBert model integrates the convolution layer, BiGRU, and CapsNet with self-attention to extract the spatial features along with contextual information and hidden representation from multiple orientations. In the current work, DistilBERT, a lighter and more efficient version of the BERT model has been implemented to extract context-based high-level characteristics from news text. A BERT-based model outperforms alternatives in terms of training loss decay time [13]. Adopting bidirectional pretrained word embeddings accelerates model training and improves classification performance [21]. Additionally, the proposed HyproBert model integrates the convolution layer, Bidirectional- Gated Recurrent Unit (BiGRU), and CapsNet with self-attention to extract the spatial features along with contextual information and hidden representation from multiple orientations. The BiGRU model is superior in its ability to capture the semantic combination of text and to store extended context data [50]. A CapsNet equipped with a self-attention mechanism is employed to simplify the routing of CapsNet at each layer [49]. The goal of the proposed HyproBert model is to improve the performance of classification by offering an interesting, effective, and efficient model to detect fake news.

3. Methodology

In this section, a detailed description of the proposed model HyproBert for fake news detection in news articles is presented.

The proposed HyproBert model is presented in Figure 1. The title of the news article along with the complete text is utilized for the tokenization process. Afterward, embedding vectors are generated to represent features in the solution space for further classification. Initially, CNN is applied along with operations to extract the spatial features from the embeddings. BiGRU is applied sequentially to the output of CNN to gather contextual information. Capsule networks are used along with the attention layer to enhance the spatial features [46]. Although CNN has extracted the spatial features from the contents, the capsule network identifies the spatial relations between the extracted features to model the hierarchical relationships within the data [45]. The proposed HyproBert model is a combination of essential, efficient, and effective DL models. The sequential processing of input extracts, enhances, and correlates highly valuable features semantically and contextually [51]. The introduction of the attention layer between the BiGRU and CapsNet enlightens context awareness [46] and helps CapsNet to identify semantic representation and hidden contextual information [49]. The details of the steps involved, from preprocessing data to the final output of the proposed HyproBert model, are provided in the following section.

3.1. Data Preprocessing

In order to utilize a dataset, it must be cleansed according to the requirements of the model. It will significantly help the model to understand the data in a better manner and increase its performance. Generally, it includes steps, such as stop word removal, tokenization, removal of special characters and spaces, numbers to words, sentence segmentation, etc. The datasets under consideration are real-world news datasets, and they contain several URLs. In this research, we examine text semantics for deep textual context. As a basic preprocessing step, we removed URLs, stop words, and other noise from the given data before feeding it into the proposed model.

3.2. The Proposed HyproBert Model

The components of the proposed HyproBert model along with the function and significance of each component is described as follows.

3.2.1. Input Layer

After preprocessing, the data are sent into the HyproBert input layer. The input layer is responsible for separating the title and text of a news article into smaller tokens. Furthermore, each token is indexed with a unique number using a dictionary. Additionally, padding is used to keep the length of the input text constant. Finally, numerical vectors are created from all the text from the news article N, converted into tokens t, and indexed with a dictionary D, such that

N \in D^{1 \times t}

.

D i s t i l B e r t T o k e n i z e r

is used for end-to-end tokenization.

3.2.2. Embedding Layer

The embedding layer is responsible for learning the word representation based on training over immense real-world data. It also finds that words that are connected conceptually have comparable vector representations. In this research, we use DistilBERT embedding, which is an approximate version of the BERT. It performs around 97% similar to the BERT model and uses the approximation of the posterior distribution generated by the BERT model [21]. It has over 110 million parameters and is still much smaller than BERT. The number of transformer layers is also reduced to six layers. Furthermore, the training and prediction times are significantly smaller. This makes DistilBERT the ultimate choice for the proposed model.

3.2.3. Convolutional Layer

Convolutional layers are the fundamental constituents of convolutional neural networks [18]. The output of neurons is controlled by the convolutional layer. By figuring out the scalar product of their weights and the density region of the input, the neurons are linked to parts of the input that are close to them [19].

The convolution layer is utilized in HyproBert to extract the spatial features from the embedding layer. In experimentation, a one-dimensional convolution layer along with 128 filters of 3 sizes is employed. The local spatial features are combined in a pooling operation to generate high-order features.

M a x P o o l i n g

function is utilized for the pooling operation.

R e L U

is used as an activation function to improve the performance. The feature sequence can be represented as

S_{i} = S ((W_{n} . x_{n}) + b)

), where i is the index of the feature sequence,

W_{n}

is the filter weight,

x_{n}

is the window size, and b is the bias weight.

3.2.4. Bigru Layer

The gated recurrent unit (GRU) is the fundamental building block of the Gated Recurrent Neural Network (GRNN). At each iteration, GRNN takes a textual vector as input and uses the previous iteration’s output vector as a weight in a weighted sum that is used to update the nodes in its hidden layer. The present context is built using a bias vector to selectively retain or discard related information. The operation of a hidden layer is managed by the following equations:

\begin{matrix} I_{t} = σ (W_{I} \cdot [y_{t - 1}, x_{t}]) \\ K_{t} = σ (W_{k} \cdot [y_{t - 1}, x_{t}]) \\ {\tilde{y}}_{t} = tanh (W \cdot [r_{t} * y_{t - 1}, x_{t}]) \\ y_{t} = (1 - I_{t}) * y_{t - 1} + I_{t} * {\tilde{y}}_{t} \end{matrix}

(1)

where

I_{t}

is the update gate,

K_{t}

is the reset gate,

σ

is a sigmoid function,

t a n h

is a hyperbolic tangent function, and

W_{I}

,

W_{k}

, and

r_{t}

are training parameters. A

B i G R U

network is capable of understanding and learning relationships between current, past, and future data, which is effective for extracting deep features from the input sequence [50]. The structure of

B i G R U

is presented in Figure 2.

The proposed model HyproBert utilizes BiGRU for its specialties in processing context and extracting sentence representations with forwarding and backward direction propagation [50]. The BiGRU layer is applied directly to the output of the convolution layer. A forward GRU processes the input sequence of

{(S}_{1}, S_{2}, S_{3}, \dots, S_{128})

, and the backward GRU processes the input sequence of

{(S}_{128}, S_{127}, S_{126}, \dots, S_{1})

. Later, both GRUs are integrated to combine the collected context information. As a result, the output of the BiGRU layer has an improved feature representation of propaganda incorporated in the input news text.

3.2.5. Attention Layer

The contextual information and representations from BiGRU are stored in a fixed-size vector. In the case of large input sequences, the related information cannot be fully stored in the vector, which hinders the understanding of the model, hence decreasing the overall efficiency. The attention mechanisms are known to focus on the most important parts of input while generating dynamic adaptive weights [22,23]. Furthermore, self-attention (SA) includes the interaction of inputs all together to generate an output for a single input, as shown in Figure 3. SA is a powerful tool for identifying self-awareness and dependencies amongst input sequences [46]. SA considers the entire context of vector representations and substitutes the words on the principle of location and associated weights.

In the proposed model, HyproBert, SA is used to highlight the context awareness and relevant features to detect the hidden relations in news data. This will result in a better understanding of the data and increase classification accuracy.

Attention (Q, K, V) = Softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V

(2)

where Q is query, k is key, v is the value, and

d_{k}

is the linear dimension of k.The HyproBert model uses the self-attention process; therefore, it sets Q = K = V. This offers the benefit that the present position’s information and the information of all other places may be computed to capture the interdependence throughout the full sequence. If the input is a sentence, for instance, each word must be attention-computed alongside all other words in the phrase.

3.2.6. Capsule Network Layer

Generally, CNN is known to mislay important information during the classification process. A higher performance of CNN requires hyperparameter tuning, which is a cumbersome and manual task. Hinton et al. [52] presented the CapsNet, a novel neural network architecture. It collects the syntactically enhanced characteristics from the input data while considering the positions and order of other words in a sentence. It has the ability to distinguish between full and partial relations in textual data. It outperforms a CNN in terms of recognizing a representation of the data and underlying relevant information in the input text [46,47,48]. Over the years, it has shown outstanding results for text categorization and information retrieval.

CapsNet is a combination of several capsules connected as neurons to detect semantic and syntactic information from input data. The capsules are presented as vectors of classification probability, and the direction of the capsules represents the position of text in the input. Additionally, the weight of hidden features is adjusted by the dynamic routing algorithm [53]. This improves and controls the limits on attention and connections between capsules to optimize the capsule network.

The proposed

H y p r o B e r t

model employs the capsule network to develop and focus the semantic and syntactic awareness of the news text input [49]. The output from the attention layer is provided as input for CapsNet. Initially, a nonlinear function is used to convert the input to a feature capsule

F_{i}

, which is then used to produce a prediction vector

P_{j ∥ i}

along with a weight matrix

W_{i j} \forall P_{j ∥ i} = {W_{i j} F}_{i}

to forecast the relationship between the layers of the input and the outcome. Furthermore, the coupling coefficients

C_{i j}

are calculated using the dynamic routing algorithm, represented in Equation (5). The vector

T_{i j}

is updated throughout the number of iterations

\exists T_{i j} ∥ T_{i j} + R_{j}^{M} f (F_{i}, θ_{j})

. This process provides attention to the high weights of related propaganda words and overlooks the irrelevant, less effective words, as shown in Figure 4. As a result, the output of the capsule has a higher-level contextual representation considering various orientations. The output of the capsule can be represented as Equation (3), where

B_{i j}

and

B_{i f}

represent the standard exponential function for the input vector and the output vector, respectively. The final output of capsule

R_{j}

contains a local ordering of words considering various orientations; it is represented in Equation (5).

C_{i j} = \frac{exp (B_{i j})}{\sum_{f} exp (B_{i f})}

(3)

O_{j} = \sum_{i = 1}^{n} C_{i j} P_{j ∣ i}

(4)

R_{j} = \frac{{∥O_{j}∥}^{2}}{1 + {∥O_{j}∥}^{2}} \frac{O_{j}}{∥O_{j}∥}

(5)

3.2.7. Dense and Output Layer

The proposed HyproBert model uses the output of the capsule network layer as input to the fully connected layer. This layer classifies the input by applying the sigmoid activation function. The collection of higher probability scores determines the news article to be fake news and vice versa.

The orderly processing of the proposed HyproBert model is presented in Algorithm 1. Firstly, the processing of HyproBert is set to sequential. The news article’s title and the text body are separated for tokenization. The tokens are then used for word embeddings. DistilBERT is used for both tokenization and embedding [21]. A single layer of one-dimensional CNN executes the word embedding vectors to extract the spatial features from the data. An optimized dropout of 0.4 is allotted to the model. Later, two layers of BiGRU are deployed. The self-attention layer, along with the capsule layer with optimized hyperparameters, is deployed. A dense layer with sigmoid activation is deployed to integrate the features required for the classification process. Finally, HyproBert is compiled using the Adam optimizer.

Algorithm 1

H y p r o B e r t

Require:: News articles
Ensure:: Fake news
1:: Model ← Sequential ()
2:: Model ← model.add(DistilBertTokenizer())
3:: Model ← model.add(Distillbert embeddings())
4:: Model ← model.add(Conv1D(128, 3, activation = ReLU))
5:: Model ← model.add(DropOut(0.4))
6:: Model ← model.add(BiGRU(128))
7:: Model ← model.add(BiGRU(128))
8:: Model ← model.add(Self-attention)
9:: Model ← model.add(Capsule(3, 5, 4))
10:: Model ← model.add(Dense(1, activation = Sigmoid))
11:: Model ← Model.compile(binary_crossentropy, Adam,)

4. Experiments and Results

The experimental details are presented in this section. It includes datasets, experiments, hyperparameter settings, evaluation metrics, comparative analysis, and results.

4.1. Dataset

In this research, we validated the proposed HyproBert model using two datasets i.e., FA-KES (https://zenodo.org/record/2607278 (accessed on 9 May 2022)) and ISOT (https://www.uvic.ca/engineering/ece/isot/datasets/fake-news/index.php (accessed on 9 May 2022)). The specifications of the employed datasets are provided in the Table 1.

The FA-KES [54] is a fake news dataset. It contains 804 news articles. These news articles are related to the Syrian war and contain title, date, location, and news text information. All the articles are marked, i.e., fake news is marked withzero, and real news is marked as one. It is considered a well-balanced dataset as 47% of news articles are fake news and the rest (53%) are real news.

The ISOT [55] is also a fake news detection dataset collected by the ISOT lab at the University of Victoria. It consists of 45,000 news articles. The collected news is related to politics and world news between 2016 and 2017, containing the title, date, topic, and news body text. The source for true news is the Reuters website, and fake news is collected from various sources, including PolitiFact and Wikipedia. This is also considered a well-balanced dataset, with almost an equal distribution of fake news and true news.

4.2. Experimental Settings

The proposed HyproBert model is developed and evaluated on

J u p y t e r

(https://jupyter.org/ (accessed on 26 May 2022)) notebook based on

G o o g l e C o l a b

(https://colab.research.google.com/ (accessed on 26 May 2022)) using

P y t h o n

(https://www.python.org/ (accessed on 26 May 2022)) programming language.

C o l a b

is a cloud-based tool that provides the environment to execute code along with the required GPUs and TPU for high performance. The reason for choosing Python as a preferred programming language is its fast execution and vast support libraries for building machine learning models.

We applied

p a n d a s

(https://pandas.pydata.org/ (accessed on 30 May 2022)) and

N u m P y

(https://numpy.org/ (accessed on 30 May 2022)), which are open-source libraries that provide data structures and data analysis tools for data manipulation and analysis. We applied

N L T K

(https://www.nltk.org/ (accessed on 30 May 2022)), a natural language toolkit in the extraction and characterization phase.

T e n s o r F l o w

(https://www.tensorflow.org/ (accessed on 30 May 2022)) is an open-source library package developed by Google to help researchers to build and test deep learning models. We have used the sequential model of the

K e r a s

(https://keras.io/ (accessed on 30 May 2022)) TensorFlow deep learning Python library to build and train the proposed HyproBert model. Additionally,

M a t p l o t l i b

(https://matplotlib.org/ (accessed on 30 May 2022)) a package of python is used for plotting graphs and charts.

4.3. Hyperparameter Settings

The datasets under consideration (FA-KES and ISOT) are divided into training and testing subsets, with 80% and 20%, respectively. The optimized hyperparameters for the convolution layer are set as the number of filters to be 128 and a pool size of 02. Furthermore, the number of neurons employed by BiGRU is 128. An attention layer is implemented to extract the hierarchical context representation using the weights of the keywords [25]. Finally, the capsule network with three routing iterations and five capsules with four dimensions is utilized. Additionally, the proposed HyproBert model uses a dropout of 0:4, a batch size of 128, an epoch of 100, and sigmoid as an activation function and Adam as an optimizer.

4.4. Comparison Models

The performance of the proposed HyproBert model is compared with baseline machine learning and state-of-the-art deep learning models including Logistic Regression (LR), Random Forest (RF), Multinomial Naive Bayes (MNB), K-Nearest Neighbors (KNN’s), CNN, RNN, LSTM, BiLSTM, and BiGRU. Baseline models were tested with their default parameters, while the state-of-the-art models are evaluated using the settings provided by the related authors. The efficacy of the proposed HyproBert model is evaluated using various a hyperparameter setting to optimize the performance for the fake news detection. The proposed HyproBert model is evaluated using standard evaluation matrices, i.e.,

A c c u r a c y

,

P r e c i s i o n

,

R e c a l l

, and

F_{M e a s u r e}

[56].

4.5. Results

The result of the proposed HyproBert model on FA-KES and ISOT datasets along with the comparison is shown in Table 2 and Table 3 below. The results reveal that the HyproBert model surpassed the performance benchmarks for the FA-KES and ISOT dataset in terms of

A c c u r a c y

,

P r e c i s i o n

,

R e c a l l

, and

F_{M e a s u r e}

.

Although, Elhadad et al. [57] reported 100% accuracy with their Decision Tree classifier on the ISOT dataset, the accuracy of their approaches varied from 85% to 100%. Only 25,200 articles from the ISOT dataset were used; therefore it appears they only used a portion of it. The proposed HyproBert method performs better than the Elhadad et al. [57] state-of-the-art method. When it came to accuracy, Random Forests performed significantly better than other baseline approaches on both datasets. For FA-KES, KNNs performed best, while Decision Trees performed better for ISOT among baseline models. Hybrid CNN-RNN [43] used a combination of convolutional and recurrent neural networks which resulted in competitive performance among the state-of-the-art models. The ensemble model [42] presented a decay in performance for both datasets.

The proposed HyproBert model outperformed the baseline models and state-of-the-art models. Various factors contributed to achieving the high performance of HyproBert, including the selection of the hyperparameter values, which played a vital role in higher accuracy [6]. The choice of one-dimensional CNN executed the extraction of spatial features efficiently and effectively [23]. The BiGRU was deployed in a multi-layer manner, which captures and integrates different textual features simultaneously [50]. By including input gates, output gates, and forget gates, it was able to circumvent the issue of long-term reliance. The multi-layer BiGRU provided significantly better feature extraction. However, increasing the number of BiGRU layers will increase the processing cost exponentially. Therefore, the optimal number of layers is obtained from experimentation. To remember the background details that BiGRU has supplied, the attention mechanism computes an adaptive weight that changes constantly [23]. It strengthens the generalization ability and solves the overfitting problem [46]. A self-attention layer along with CapsNet is also added to the ensemble to dynamically highlight the syntactically explicit features considering various orientations [49].

The classification performance of balanced datasets is often superior to that of imbalanced datasets [58]. Conventional classifiers may not be adequate for imbalanced classifications because most classifiers assume the classified dataset is balanced to achieve maximum accuracy. However, due to a lack of data, minority-class samples have low prediction accuracy. Classifiers have a strong tendency to favor samples from the majority class and ignore those from the minority class [59,60]. In this context, the usage of a deep neural network appears promising, as it can comprehend the essential distribution of the dataset and therefore avoid overfitting.

5. Discussions

As seen in the previous section, the proposed HyproBert model outperformed all the baseline and state-of-the-art models for fake news detection. The study of baseline models provided the intuition for the development of HyproBert. The best performing baseline models and advanced models are integrated and tuned in a logical manner to solve a fake news detection problem effectively, efficiently, and correctly. There are various reasons that contribute to the high performance of HyproBert. Various arrangements and hyperparameters were involved to obtain the optimal outcomes from HyproBert. These optimal values were gathered using extensive experimentation for each parameter involved. The details and effects of these parameters are described in the section below.

5.1. Convolutional Layer Optimization

Hyperparameters play a vital role in the performance of deep neural networks. An optimal set of hyperparameters produces a highly optimized output. The filter size is an important parameter in detecting the spatial features from the input. Various configurations of filter size are experimented with to achieve the optimal filter size. The optimal size of filters is determined to be three for both datasets under consideration. The effects of the filter size on the performance of the proposed HyproBert model are presented in Figure 5.

Another important parameter to enhance the performance of the proposed HyproBert model is the activation function. The activation function is responsible for the activation of neurons in a deep neural model. Two activation functions, SoftMax and sigmoid, were considered for experimentation. The results presented in Figure 6 have revealed that the sigmoid activation function enhances the performance of the proposed HyproBert model for both datasets due to the binary nature of the classification solution under consideration.

The number of epochs is also considered an important factor in improving the performance of the proposed HyproBert model. Increasing the number of epochs does not always result in more elevated performance. Therefore, we need to detect the optimal number of epochs to satisfy the training requirements of HyproBert. The optimal number of epochs was determined as 100 and 200 for the FA-KES and ISOT datasets, respectively. The effects of the number of epochs on the proposed HyproBert model are presented in the Figure 7.

5.2. Bigru Optimization

The number of neurons in a BiGRU highly affected the classification performance of the proposed HyproBert model. The experimentation included the number of neuron values from 32, 64, 128, and 256. The optimal number of neurons was determined to be 128 for both datasets. The impact of the number of neurons of BiGRU on HyproBert is represented in Figure 8.

A single BiGRU is very effective in the classification process. The performance of BiGRU can be enhanced by deploying it in a multi-layer manner. Increasing the number of layers of BiGRU will increase resource utilization as well [50]. Therefore, to a certain extent, it is considered a trade-off between available resources and performance. The optimal number of BiGRU layers was determined to be two for both datasets. The results of the increasing number of layers on the

A c c u r a c y

,

P r e c i s i o n

,

R e c a l l

, and

F_{M e a s u r e}

are presented in Table 4 and Table 5 for the FA-KES and ISOT datasets, respectively.

The experimentation also included testing various multi-layers of BIGRU to find the optimal number of layers of iterations and training time for the proposed HyproBert model. The effect of increasing Bi-GRU layers and the number of iterations is represented in Figure 9.

5.3. Capsnet Optimization

Just like with conventional neural networks, the performance of CapsNet heavily depends on its hyperparameters. Generally, the determination of hyperparameters requires experimentation because there is no specific optimal value for each hyperparameter. The optimal values vary for different data and solution types. One such hyperparameter is the number of capsules. The optimal number of capsules was determined to be five for both datasets. The capsule is the set of neurons that highlight a specific property. The effects of the number of capsules for the proposed HyproBert model are presented in Figure 10.

The next important consideration is the dimensions of the capsule. The lower dimension of the capsule will contain fewer features, and it will lose most of the information. A higher-dimensional capsule will undoubtedly contain more important features, but the computation complexities will increase. Therefore, an optimal number of dimensions is needed for optimal output. The dimension of a capsule indicates the length of the output vector. The dimension values, including two, four, and eight were tested for both datasets under consideration. After considering various options during the experimentation, the optimal dimension of the capsule was determined to be four for both datasets. The effects of the dimensions of a capsule on the proposed HyproBert model are presented in Figure 11.

The number of routing iterations is also a hyperparameter of CapsNet. It is responsible for connecting the capsules of the consecutive layers. The experimentation process for evaluating the optimal value of routing iteration was performed on both datasets under consideration. The testing range of values included 2, 3, 4, and 5. The optimal value was determined to be three iterations for both datasets. The effects of the number of routing iterations on the proposed HyproBert model are presented in Figure 12.

5.4. Imbalanced Data Study

The discrete imbalanced data problem affects many real-world applications and hinders the effectiveness of machine learning models when identifying samples from the marginalized group. There have been several investigations on this issue, with a concentration on imbalanced statistical datasets. However, imbalanced textual datasets have received relatively little attention [60]. The ISOT dataset is well-balanced, featuring both fake and real news articles [55]. The focus of this part of study is on skewed data, so we merged the ISOT dataset with 10,000 real news form the RCV1-v2 [61] dataset and named the resulting collection ISOT-R10.

Ranking metrics, threshold metrics, and probability metrics are the three types of performance evaluation measures [62]. In classification evaluation, basic threshold metrics are more frequently used. As the outcomes of evaluation metrics are crucial for making decisions, picking the right metrics is of utmost importance. There is evidence that relying just on accuracy as a criterion for evaluating classification in unbalanced datasets might lead to erroneous conclusions [63,64]. In a comparison of the F-measure and accuracy measures, [65] demonstrated that both can produce overly optimistic findings in the binary unbalanced data classification problem, which can lead to poor conclusions. According to [66], when datasets are not evenly split, the minority class experiences a higher rate of classification errors. The ROC-AUC [67] can be used to demonstrate the proposed hyproBert model’s ability to distinguish between the two categories, as the model performs a binary classification. When using unbalanced datasets, the area under the receiver operating characteristic ROC-AUC, is a good metric for gauging the efficacy of a model. Furthermore, metrics such as accuracy, precision, sensitivity, specificity, and AUC were considered for model evaluation in light of the imbalanced dataset evaluations [67].

As with our prior ISOT evaluations, we classified news samples in the ISOT-R10 dataset using the same baseline and state-of-the-art models. Table 6 demonstrates that accuracy was impacted by the oversampling of real news in the ISOT-R10 dataset, but the proposed HyproBert model is still significantly superior to other models in terms of performance measures.

Furthermore, to monitor the effects of imbalanced data on the performance of HyproBert, we added 10,000 more real news in each step twice to the ISOT-R10 dataset and named the updated dataset as ISOT-R20 and ISOT-R30, respectively. The comparison of the performance on imbalance data is provided in Table 7.

The findings of the unbalanced studies imply that the imbalanced data offers an additional challenge, as the majority of classifiers will demonstrate bias toward the class that comprises the majority of the data. A high-class imbalance is inherently prevalent in many applications that are used in the real world [68]. Implementing feature extraction and oversampling algorithms to address the challenge of binary unbalanced data classification will further improve the classification using imbalanced datasets [60].

5.5. Ablation Study

An ablation study was performed to evaluate the effectiveness of the designed architecture of the proposed model HyproBert. The architecture of the HyproBert was altered with different hybrid models in three combinations to measure the best possible outcome. The ablation architectures were evaluated with the FA-KES and ISOT datasets and the results are presented in the Table 8.

The first ablation architecture is BiGRU + CapsNet; BiGRU collects the contextual information, and CapsNet perceives the hidden relations amongst the identified features. The second ablation architecture is BiLSTM + CapsNet + Conv. BiLSTM processes data in the cell state as well as hidden states, which can be costly. The third ablation architecture is Conv + BiLSTM + Attention + CapsNet; BiGRU is replaced by the BiLSTM. The fourth is the hybrid of the proposed HyproBert model. The three identified models for the said study along with the proposed model and resultant F1 score are presented in the table. The results suggest that the proposed architecture of HyproBert is effective and performs better than its counter-ablation study architectures. The receiver operating characteristics ROC curve determines the usefulness of the binary classifier system, presented in Figure 13. The area under curve AUC determines the performance of the model, i.e., greater area means greater performance.

Throughout this body of work, we have investigated the efficacy of a variety of machine learning models as well as deep learning methods. Following our analysis, we concluded that the accuracy of detecting fake news for the FA-KES and ISOT datasets is not up to par. Consequently, a pretrained model for word embeddings is utilized in this research. DistilBERT is based on bidirectional transformer encoders. The BERT-based encoder is a robust feature extractor that yields good results in NLP tasks. The training loss decays faster with the BERT-based model than with the other word embedding model [21]. It is concluded that adopting bidirectional pretrained word embedding results in faster model training, reduced cross-entropy loss, and improved classification task performance.

Additionally, the BiGRU model can more effectively capture the semantic combination of text and store longer context information [50]. In addition, we stacked BiGRU units layer by layer to create a system with multiple layers. The output of the first layer is used without alteration as the input of the second layer. Therefore, the multi-layer BiGRU feature improvement contributed to the model’s improved classification. Furthermore, the layer-by-layer complexity of CapsNet’s routing is reduced by employing CapsNet with a self-attention mechanism in the proposed hyprobert model [49]. It increases the proportion of computations that could be performed simultaneously while reducing the number of required sequential processes [53]. Self-attention routing is able to capture the capsule’s global information, which results in fewer misclassification errors. A detailed analysis of the proposed hyprobert model has resulted in improved classification performance.

This research is limited to content-based features, which may not be sufficient to improve the performance of shorter news, such as that seen on social media. Future studies should also examine a comprehensive examination of news origins and context. Classification using imbalanced datasets can be enhanced through the use of feature extraction and oversampling algorithms, both of which aim to rectify the issue of binary unbalanced data classification [68]. In addition, we are working to incorporate meta data and supporting information, such as related images and social context, into our future publications.

6. Conclusions

A hybrid model based on deep hypercontext for fake news detection, HyproBert, is presented in this paper. HyproBert uses the distill-Bert for embeddings, the convolution neural network for spatial data extraction, the BiGRU for contextual data extraction, and the CapsNet with self-attention for the hierarchical understanding of full and partial relations among data. The CNN, BiGRU, and CapsNet are equipped with optimized hyperparameters. The proposed HyproBert model is evaluated using FA-KES and ISOT datasets. HyproBert results in high performance; it surpassed the performance of the baseline models. The results indicate that the HyproBert has an accuracy of 61.1% and 99.3% for the FA-KES and ISOT datasets, respectively. In the future, a multimodal approach using text and images simultaneously for better classification and fake news detection can be explored.

Author Contributions

Conceptualization, M.I.N. and S.A.H.M.; methodology, M.I.N. and Z.Z.; software, M.I.N., S.A.H.M., D.L., M.S. and F.K.K.; validation, S.A.H.M., D.L., M.S. and S.M.M.; formal analysis M.I.N., S.A.H.M., Z.Z. and S.M.M.; investigation, M.I.N., S.A.H.M., K.A., D.L. and S.M.M.; resources S.A.H.M., and M.S.; data curation, S.A.H.M., Z.Z. and M.S.; writing—original draft preparation, M.I.N.; writing—review and editing, K.A. and F.K.K.; visualization, M.I.N.; supervision, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R300), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R300), Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MDPI	Multidisciplinary Digital Publishing Institute
API	Application Programming Interface
NLP	Natural Language Processing
ML	Machine Learning
DL	Deep Learning
Conv	Convolutional Layer
GRU	Gated Recurrent Units
BERT	Bidirectional Encoder Representations from Transformers
BiGRU	Bidirectional GRU
BiLSTM	Bidirectional Long Short-Term Memory
CapsNet	Capsule Neural Network

References

Kumar, S.; West, R.; Leskovec, J. Disinformation on the web: Impact, characteristics, and detection of wikipedia hoaxes. In Proceedings of the 25th international conference on world wide web, Montreal, Canada, 11–15 April 2016; pp. 591–602. [Google Scholar]
Ratkiewicz, J.; Conover, M.; Meiss, M.; Gonçalves, B.; Patil, S.; Flammini, A.; Menczer, F. Truthy: Mapping the spread of astroturf in microblog streams. In Proceedings of the 20th international conference companion on world wide web, Hyderabad, India, 28 March–1 April 2011; pp. 249–252. [Google Scholar]
Ozbay, F.A.; Alatas, B. Fake news detection within online social media using supervised artificial intelligence algorithms. Phys. Stat. Mech. Its Appl. 2020, 540, 123174. [Google Scholar] [CrossRef]
Faustini, P.H.A.; Covões, T.F. Fake news detection in multiple platforms and languages. Expert Syst. Appl. 2020, 158, 113503. [Google Scholar] [CrossRef]
Guo, B.; Ding, Y.; Yao, L.; Liang, Y.; Yu, Z. The Future of False Information Detection on Social Media: New Perspectives and Trends. Acm Comput. Surv. (Csur) 2020, 53, 1–36. [Google Scholar] [CrossRef]
Ozbay, F.A.; Alatas, B. A novel approach for detection of fake news on social media using metaheuristic optimization algorithms. Elektron. Elektrotechnika 2019, 25, 62–67. [Google Scholar] [CrossRef] [Green Version]
Pérez-Rosas, V.; Kleinberg, B.; Lefevre, A.; Mihalcea, R. Automatic detection of fake news. arXiv 2017, arXiv:1708.07104. [Google Scholar]
Ahmed, H.; Traore, I.; Saad, S. Detection of online fake news using n-gram analysis and machine learning techniques. In International Conference on Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments; Springer: Cham, Switzerland, 2017; pp. 127–138. [Google Scholar]
Kumar, A.; Sangwan, S.R.; Nayyar, A. Rumour veracity detection on twitter using particle swarm optimized shallow classifers. Multimed. Tools Appl. 2019, 78, 24083–24101. [Google Scholar] [CrossRef]
Akyol, K.; Sen, B. Modeling and predicting of news popularity in social media sources. Cmccomputers Mater. Contin. 2019, 61, 69–80. [Google Scholar] [CrossRef]
Asghar, M.Z.; Habib, A.; Habib, A.; Khan, A.; Ali, R.; Khattak, A. Exploring deep neural networks for rumor detection. J. Ambient Intell Human Comput. 2019, 12, 4315–4333. [Google Scholar] [CrossRef]
Chen, T.; Li, X.; Yin, H.; Zhang, J. Call attention to rumors: Deep attention based recurrent neural networks for early rumor detection. In Proceedings of the Pacifc-Asia Conference on Knowledge Discovery and Data Mining, Melbourne, VIC, Australia, 3–6 June 2018; Springer: Cham, Switzerland, 2018; pp. 40–52. [Google Scholar]
Kaliyar, R.K.; Goswami, A.; Narang, P. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 2021, 80, 11765–11788. [Google Scholar] [CrossRef]
Ma, J.; Gao, W.; Mitra, P.; Kwon, S.; Jansen, B.J.; Wong, K.F.; Cha, M. Detecting rumors from microblogs with recurrent neural networks 2016. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), New York, NY, USA, 9–15 July 2016. [Google Scholar]
Shu, K.; Cui, L.; Wang, S.; Lee, D.; Liu, H. Defend: Explainable fake news detection. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 95–405. [Google Scholar]
Wang, W.Y. Liar, liar pants on fre: A new benchmark dataset for fake news detection. arXiv 2017, arXiv:1705.00648. [Google Scholar]
Yin, L.; Meng, X.; Li, J.; Sun, J. Relation extraction for massive news texts. Comput. Mater Contin. 2019, 58, 275–285. [Google Scholar] [CrossRef]
Yu, F.; Liu, Q.; Wu, S.; Wang, L.; Tan, T. A convolutional approach for misinformation identification. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 3901–3907. [Google Scholar]
Goldani, M.H.; Safabakhsh, R.; Momtazi, S. Convolutional neural network with margin loss for fake news detection. Inf. Process. Manag. 2021, 58, 102418. [Google Scholar] [CrossRef]
Li, D.; Guo, H.; Wang, Z.; Zheng, Z. Unsupervised fake news detection based on autoencoder. IEEE Access 2021, 9, 29356–29365. [Google Scholar] [CrossRef]
Kavatagi, S.; Rachh, R. A Context Aware Embedding for the Detection of Hate Speech in Social Media Networks. In Proceedings of the 2021 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Pune, India, 29–30 October 2021; pp. 1–4. [Google Scholar]
Jain, D.; Kumar, A.; Garg, G. Sarcasm detection in mash-up language using soft-attention based bidirectional LSTM and feature-rich CNN. Appl. Soft Comput. 2020, 91, 106198. [Google Scholar] [CrossRef]
Kumar, A.; Sangwan, S.R.; Arora, A.; Nayyar, A.; Abdel-Basset, M. Sarcasm detection using soft attention-based bidirectional long short-term memory model with convolution network. IEEE Access 2019, 7, 23319–23328. [Google Scholar]
Nadeem, M.I.; Ahmed, K.; Li, D.; Zheng, Z.; Alkahtani, H.K.; Mostafa, S.M.; Mamyrbayev, O.; Abdel Hameed, H. EFND: A Semantic, Visual, and Socially Augmented Deep Framework for Extreme Fake News Detection. Sustainability 2023, 15, 133. [Google Scholar] [CrossRef]
Zheng, W.; Xun, Y.; Wu, X.; Deng, Z.; Chen, X.; Sui, Y. A Comparative Study of Class Rebalancing Methods for Security Bug Report Classification. IEEE Trans. Reliab. 2021, 70, 1–13. [Google Scholar] [CrossRef]
Wang, H.; Gao, Q.; Li, H.; Wang, H.; Yan, L.; Liu, G. A Structural Evolution-Based Anomaly Detection Method for Generalized Evolving Social Networks. Comput. J. 2022, 65, 1189–1199. [Google Scholar] [CrossRef]
Zheng, W.; Yin, L.; Chen, X.; Ma, Z.; Liu, S.; Yang, B. Knowledge base graph embedding module design for Visual question answering model. Pattern Recognit. 2021, 120, 108153. [Google Scholar] [CrossRef]
Li, D.; Ahmed, K.; Zheng, Z.; Mohsan, S.A.H.; Alsharif, M.H.; Hadjouni, M.; Jamjoom, M.M.; Mostafa, S.M. Roman Urdu Sentiment Analysis Using Transfer Learning. Appl. Sci. 2022, 12, 10344. [Google Scholar] [CrossRef]
Zheng, W.; Liu, X.; Ni, X.; Yin, L.; Yang, B. Improving Visual Reasoning Through Semantic Representation. IEEE Access 2021, 9, 91476–91486. [Google Scholar] [CrossRef]
Zheng, W.; Liu, X.; Yin, L. Sentence Representation Method Based on Multi-Layer Semantic Network. Appl. Sci. 2021, 11, 1316. [Google Scholar] [CrossRef]
Liang, X.; Luo, L.; Hu, S.; Li, Y. Mapping the knowledge frontiers and evolution of decision making based on agent-based modeling. Knowl.-Based Syst. 2022, 250, 108982. [Google Scholar] [CrossRef]
Li, J.; Xu, K.; Chaudhuri, S.; Yumer, E.; Zhang, H.; Guibas, L. GRASS: Generative recursive autoencoders for shape structures. ACM Trans. Graph. 2017, 36, 1–14. [Google Scholar] [CrossRef]
Zhang, J.; Zhu, C.; Zheng, L.; Xu, K. ROSEFusion: Random optimization for online dense reconstruction under fast camera motion. ACM Trans. Graph. 2021, 40, 1–17. [Google Scholar] [CrossRef]
Shi, Y.; Xu, X.; Xi, J.; Hu, X.; Hu, D.; Xu, K. Learning to Detect 3D Symmetry From Single-View RGB-D Images With Weak Supervision. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 1–15. [Google Scholar] [CrossRef]
Zhang, J.; Tang, Y.; Wang, H.; Xu, K. ASRO-DIO: Active Subspace Random Optimization Based Depth Inertial Odometry. IEEE Trans. Robot. 2002, 1–13. [Google Scholar] [CrossRef]
She, Q.; Hu, R.; Xu, J.; Liu, M.; Xu, K.; Huang, H. Learning High-DOF Reaching-and-Grasping via Dynamic Representation of Gripper-Object Interaction. ACM Trans. Graph. 2022, 41, 1–14. [Google Scholar] [CrossRef]
Xie, L.; Zhu, Y.; Yin, M.; Wang, Z.; Ou, D.; Zheng, H.; Yin, G. Self-feature-based point cloud registration method with a novel convolutional Siamese point net for optical measurement of blade profile. Mech. Syst. Signal Process. 2022, 178, 109243. [Google Scholar] [CrossRef]
Shen, Y.; Ding, N.; Zheng, H.; Li, Y.; Yang, M. Modeling Relation Paths for Knowledge Graph Completion. IEEE Trans. Knowl. Data Eng. 2021, 33, 3607–3617. [Google Scholar] [CrossRef]
Wahid, J.A.; Shi, L.; Gao, Y.; Yang, B.; Wei, L.; Tao, Y.; Yagoub, I. Topic2Labels: A framework to annotate and classify the social media data through LDA topics and deep learning models for crisis response. Expert Syst. Appl. 2022, 195, 116562. [Google Scholar] [CrossRef]
Wahid, J.A.; Shi, L.; Gao, Y.; Yang, B.; Tao, Y.; Wei, L.; Hussain, S. Identifying and characterizing the propagation scale of COVID-19 situational information on Twitter: A hybrid text analytic approach. Appl. Sci. 2021, 11, 6526. [Google Scholar] [CrossRef]
Nadeem, M.I.; Ahmed, K.; Li, D.; Zheng, Z.; Naheed, H.; Muaad, A.Y.; Alqarafi, A.; Abdel Hameed, H. SHO-CNN: A Metaheuristic Optimization of a Convolutional Neural Network for Multi-Label News Classification. Electronics 2023, 12, 113. [Google Scholar] [CrossRef]
Toumi; Chahrazad; Bouramoul, A. Ensemble learning-based model for fake news detection. In Proceedings of the 2022 4th International Conference on Pattern Analysis and Intelligent Systems (PAIS), Bouaghi, Algeria, 12–13 October 2022; pp. 1–8. [Google Scholar]
Nasir, J.A.; Khan, O.S.; Varlamis, I. Fake news detection: A hybrid CNN-RNN based deep learning approach. Int. Inf. Manag. Data Insights 2021, 1, 100007. [Google Scholar] [CrossRef]
Okunoye, B.O.; Ibor, A.E. Hybrid fake news detection technique with genetic search and deep learning. Comput. Electr. Eng. 2022, 103, 108344. [Google Scholar] [CrossRef]
Hinton, G.E.; Sabour, S.; Frosst, N. Matrix capsules with em routing. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3May 2018; pp. 44–51. [Google Scholar]
Jain, D.K.; Jain, R.; Upadhyay, Y.; Kathuria, A.; Lan, X. Deep refinement: Capsule network with attention mechanism-based system for text classification. Neural Comput. Appl. 2020, 32, 1839–1856. [Google Scholar] [CrossRef]
Bhattacharjee, U. Capsule network on social media text: An application to automatic detection of clickbaits. In Proceedings of the 2019 11th International Conference on Communication Systems & Networks (COMSNETS), Bengaluru, India, 7–11 January 2019; pp. 473–476. [Google Scholar]
Du, Y.; Zhao, X.; He, M.; Guo, W. A novel capsule based hybrid neural network for sentiment classification. IEEE Access 2019, 7, 39321–39328. [Google Scholar] [CrossRef]
Wang, B.; Ding, H. YNU NLP at SemEval-2019 task 5: Attention and capsule ensemble for identifying hate speech. In Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, MN, USA, 6–7 June 2019; pp. 529–534. [Google Scholar]
Ding, Y.; Zhou, X.; Zhang, X. YNU_DYX at SemEval-2019 task 5: A stacked BiGRU model based on capsule network in detection of hate. In Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, MN, USA, 6–7 June 2019; pp. 535–539. [Google Scholar]
Ahmed, K.; Nadeem, M.I.; Li, D.; Zheng, Z.; Ghadi, Y.Y.; Assam, M.; Mohamed, H.G. Exploiting Stacked Autoencoders for Improved Sentiment Analysis. Appl. Sci. 2022, 12, 12380. [Google Scholar] [CrossRef]
Hinton, G.E.; Krizhevsky, A.; Wang, S.D. Transforming autoencoders. In Artificial Neural Networks and Machine Learning; Honkela, T., Duch, W., Girolami, M., Kaski, S., Eds.; Springer: Cham, Switzerland, 2011; pp. 44–51. [Google Scholar]
Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. Adv. Neural Inf. Process. Syst. 2017, 30, 3856–3866. [Google Scholar]
Abu Salem, F.K.; Al Feel, R.; Elbassuoni, S.; Jaber, M.; Farah, M. FA-KES: A Fake News Dataset around the Syrian War. Proc. Int. Aaai Conf. Web Soc. Media 2019, 13, 573–582. [Google Scholar] [CrossRef]
Ahmed, H.; Traore, I.; Saad, S. Detecting opinion spams and fake news using text classification. J. Secur. Priv. 2018, 1, e9. [Google Scholar] [CrossRef] [Green Version]
Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1. [Google Scholar]
Elhadad, K.M.; Li, F.K.; Gebali, F. A novel approach for selecting hybrid features from online news textual metadata for fake news detection. In Proceedings of the International conference on p2p, parallel, grid, cloud and internet computing 2019, Antwerp, Belgium, 7–9 November 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 914–925. [Google Scholar]
Drummond, C.; Holte, R.C. C4. 5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. In Workshop on Learning from Imbalanced Datasets II; ICML: Washington DC, USA, 2003; Volume 11. [Google Scholar]
Li, J.; Wu, Y.; Fong, S.; Tallón-Ballesteros, A.J.; Yang, X.; Wu, F. A binary PSO-based ensemble under-sampling model for rebalancing imbalanced training data. J. Supercomput. 2022, 78, 7428–7463. [Google Scholar] [CrossRef]
Mirmorsal, M.; Motameni, H.; Mohamadi, H. KNNGAN: An oversampling technique for textual imbalanced datasets. J. Supercomput. 2022, 1–36. [Google Scholar] [CrossRef]
Lewis, D.D.; Yang, Y.; Rose, T.G.; Li, F. RCV1: A new benchmark collection for text categorization research. J. Mach. Learn. Res. 2004, 5, 361–397. [Google Scholar]
Ferri, C.; Hernández-Orallo, J.; Modroiu, R. An experimental comparison of performance measures for classifcation. Pattern Recognit. Lett. 2009, 30, 27–38. [Google Scholar] [CrossRef]
Haibo, H.; Yunqian, M. Imbalanced Learning: Foundations, Algorithms, and Applications, 1st ed.; Wiley-IEEE Press: Hoboken, NJ, USA, 2013. [Google Scholar]
García, V.; Mollineda, R.A.; Sánchez, J.S. Index of balanced accuracy: A performance measure for skewed class distributions. In Pattern Recognition and Image Analysis; Araujo, H., Mendonça, A.M., Pinho, A.J., Torres, M.I., Eds.; IbPRIA 2009; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5524. [Google Scholar] [CrossRef] [Green Version]
Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefcient (MCC) over F1 score and accuracy in binary classifcation evaluation. BMC Genom. 2020, 21, 6. [Google Scholar] [CrossRef] [Green Version]
Branco, P.; Torgo, L.; Ribeiro, R. A survey of predictive modelling under imbalanced distributions. Acm Comput. Surv (Csur) 2015, 49, 1–50. [Google Scholar] [CrossRef] [Green Version]
Andrew, P.B. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 1997, 30, 1145–1159. [Google Scholar] [CrossRef] [Green Version]
Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data 2019, 6, 27. [Google Scholar] [CrossRef]

Figure 1. The proposed

H y p r o B e r t

model.

Figure 1. The proposed

H y p r o B e r t

model.

Figure 2. Structure of a BiGRU network.

Figure 3. Structure of a self-attention mechanism.

Figure 4. Structure of a capsule network.

Figure 5. Evaluations based on filter size of CNN.

Figure 6. Evaluations based on activation function.

Figure 7. Evaluations based on number of Epochs.

Figure 8. Evaluations based on number of neurons.

Figure 9. Evaluations based on Bi-GRU layers, number of iterations, and training time.

Figure 10. Evaluations based on number of capsules.

Figure 11. Evaluations based on dimensions of capsules.

Figure 12. Evaluations based on number of routing iterations.

Figure 13. ROC curve.

Table 1. Multi-label text news dataset specifications.

Dataset	Total	Topic	# Of Articles	Class
FA-KES	804	miscellaneous	378	Fake
		miscellaneous	426	True
ISOT	44898	government news	1570	Fake
		Middle East	778	Fake
		US news	783	Fake
		left news	4459	Fake
		politics	6841	Fake
		news	9050	Fake
		world news	10,145	True
		politics news	11,272	True

Table 2. Results of all models on the FA-KES dataset.

Dataset	Accuracy	Precision	Recall	F_Measure
LR	49.82	50.46	49.65	49.27
RF	53.92	56.39	53.42	54.14
MNB	38.88	39.40	38.74	32.86
KNNs	57.62	58.53	57.17	57.30
AB	47.57	49.12	47.27	47.12
DT(Elhadad et al. [57])	58.63	63.35	58.21	50.85
CNN	50.68	55.74	50.35	48.83
RNN	50.75	51.27	50.53	50.43
LSTM	59.66	58.24	59.17	58.92
BiLSTM	59.75	53.42	59.45	58.92
BiGRU	60.85	57.97	60.28	60.44
BiLSTM-CNN+task-specific word embedding [11]	60.9	61.48	58.69	60.19
CALLATRUMORS [12]	60.45	61.52	59.84	60.12
FakeBERT [13]	60.95	61.87	59.89	60.43
Ensemble model [42]	54.74	55.25	54.18	54.27
Hybrid CNN-RNN [43]	60	59	60	59
HyproBert	61.15	60.40	61.07	61.09

Table 3. Results of all models on the ISOT dataset.

Dataset	Accuracy	Precision	Recall	F_Measure
LR	52.38	50.15	52.34	42.92
RF	92.42	92.25	92.04	92.18
MNB	60.25	60.34	60.35	60.47
KNNs	60.14	67.35	61.57	56.39
AB	92.29	91.49	91.34	91.25
DT(Elhadad et al. [57])	100	–	–	–
CNN	99.24	98.45	98.72	98.44
RNN	98.74	98.21	98.42	98.19
LSTM	98.33	98.44	97.25	97.64
BiLSTM	98.43	98.61	97.11	98.75
BiGRU	98.68	98.17	96.73	98.21
BiLSTM-CNN+task-specific word embedding [11]	98.82	98.26	97.28	98.24
CALLATRUMORS [12]	97.47	96.59	97.17	97.28
FakeBERT [13]	99.12	98.8 99	98.93
Ensemble model [42]	87.17	89.47	86.50	86.53
Hybrid CNN-RNN [43]	99	99	99	99
HyproBert	99.30	99.12	99.14	99.20

Table 4. BiGRU layers with HyproBert on the FA-KES dataset.

# of Layers	Accuracy	Precision	Recall	F_Measure
1	59.97	59.98	59.62	59.74
2	61.10	60.89	60.60	59.97
3	61.14	59.86	59.63	59.74
4	61.15	60.40	61.07	61.09

Table 5. BiGRU layers with HyproBert on the ISOT dataset.

# of Layers	Accuracy	Precision	Recall	F_Measure
1	99.07	98.63	97.62	98.74
2	99.10	98.93	97.60	98.96
3	99.11	98.61	97.63	98.99
4	99.30	99.12	99.14	99.20

Table 6. Results of all models on the ISOT-R10 dataset.

Dataset	Accuracy	Precision	Sensitivity	Specificity	AUC
LR	44.72	48.98	44.37	41.62	42.38
RF	82.64	82.91	82.14	77.52	80.91
MNB	48.92	50.41	49.34	46.28	45.22
KNNs	49.25	47.85	49.92	47.56	47.02
AB	72.96	76.61	73.88	70.36	70.07
DT(Elhadad et al. [57])	84.39	85.41	85.21	81.01	82.78
CNN	69.89	68.36	71.58	67.29	67.36
RNN	77	78	77	70	73.5
LSTM	88.34	91.23	90.51	84.52	86.37
BiLSTM	89.23	90.74	89.75	87.37	87.64
BiGRU	93.91	95.35	95.04	93.14	91.98
BiLSTM-CNN+task-specific word embedding [11]	93.83	95.82	95.11	93.27	92.13
CALLATRUMORS [12]	92.26	95.93	93.21	88.26	91.59
FakeBERT [13]	93.28	94.81	94.21	90.36	90.86
Ensemble model [42]	69.37	74.34	72.19	67.59	68.44
Hybrid CNN-RNN [43]	92.96	94.78	92.17	88.40	89.69
HyproBert	95.67	97.12	95.88	92.94	94.14

Table 7. Comparison of results of imbalanced data on HyproBert.

Dataset	Accuracy	Precision	Sensitivity	Specificity	AUC
ISOT	99.30	99.12	99.07	99.03	99.20
ISOT-R10	95.67	97.13	95.88	92.94	94.14
ISOT-R20	94.73	95.98	93.91	91.48	93.58
ISOT-R30	92.18	93.46	90.78	90.12	92.04

Table 8. Ablation study architecture and results.

Datasets	BiGRU + CapsNet	BiLSTM + CapsNet	Conv + BiLSTM	Conv + BiGRU
		+Conv	+Attention + CapsNet	+Attention + CapsNet
FA-KES	52.67	51.83	54.03	61.15
ISOT	72.10	69.35	83.89	99.30

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Nadeem, M.I.; Mohsan, S.A.H.; Ahmed, K.; Li, D.; Zheng, Z.; Shafiq, M.; Karim, F.K.; Mostafa, S.M. HyproBert: A Fake News Detection Model Based on Deep Hypercontext. Symmetry 2023, 15, 296. https://doi.org/10.3390/sym15020296

AMA Style

Nadeem MI, Mohsan SAH, Ahmed K, Li D, Zheng Z, Shafiq M, Karim FK, Mostafa SM. HyproBert: A Fake News Detection Model Based on Deep Hypercontext. Symmetry. 2023; 15(2):296. https://doi.org/10.3390/sym15020296

Chicago/Turabian Style

Nadeem, Muhammad Imran, Syed Agha Hassnain Mohsan, Kanwal Ahmed, Dun Li, Zhiyun Zheng, Muhammad Shafiq, Faten Khalid Karim, and Samih M. Mostafa. 2023. "HyproBert: A Fake News Detection Model Based on Deep Hypercontext" Symmetry 15, no. 2: 296. https://doi.org/10.3390/sym15020296

APA Style

Nadeem, M. I., Mohsan, S. A. H., Ahmed, K., Li, D., Zheng, Z., Shafiq, M., Karim, F. K., & Mostafa, S. M. (2023). HyproBert: A Fake News Detection Model Based on Deep Hypercontext. Symmetry, 15(2), 296. https://doi.org/10.3390/sym15020296

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

HyproBert: A Fake News Detection Model Based on Deep Hypercontext

Abstract

1. Introduction

2. Literature Review

Current Status and Limitations

3. Methodology

3.1. Data Preprocessing

3.2. The Proposed HyproBert Model

3.2.1. Input Layer

3.2.2. Embedding Layer

3.2.3. Convolutional Layer

3.2.4. Bigru Layer

3.2.5. Attention Layer

3.2.6. Capsule Network Layer

3.2.7. Dense and Output Layer

4. Experiments and Results

4.1. Dataset

4.2. Experimental Settings

4.3. Hyperparameter Settings

4.4. Comparison Models

4.5. Results

5. Discussions

5.1. Convolutional Layer Optimization

5.2. Bigru Optimization

5.3. Capsnet Optimization

5.4. Imbalanced Data Study

5.5. Ablation Study

6. Conclusions

Author Contributions

Funding

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI