A Deep Learning Framework for Detection of COVID-19 Fake News on Social Media Platforms

Tashtoush, Yahya; Alrababah, Balqis; Darwish, Omar; Maabreh, Majdi; Alsaedi, Nasser

doi:10.3390/data7050065


by Yahya Tashtoush ¹,*, Balqis Alrababah ¹, Omar Darwish ²,*, Majdi Maabreh ³ and Nasser Alsaedi ⁴

¹ Computer Science Department, Jordan University of Science and Technology, P.O. Box 3030, Irbid 22110, Jordan
² Information Security and Applied Computing Department, Eastern Michigan University, Ypsilanti, MI 48197, USA
³ Department of Information Technology, Faculty of Prince Al-Hussein Bin Abdallah II For Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan
⁴ Computer Science Department, Taibah University, Medina 2003, Saudi Arabia
* Authors to whom correspondence should be addressed.

Data 2022, 7(5), 65; https://doi.org/10.3390/data7050065

Submission received: 18 April 2022 / Revised: 7 May 2022 / Accepted: 9 May 2022 / Published: 13 May 2022

(This article belongs to the Section Information Systems and Data Management)


Abstract:

The fast growth of technology in online communication and social media platforms alleviated numerous difficulties during the COVID-19 epidemic. However, it was utilized to propagate falsehoods and misleading information about the disease and the vaccination. In this study, we investigate the ability of deep neural networks, namely, Long Short-Term Memory (LSTM), Bi-directional LSTM, Convolutional Neural Network (CNN), and a hybrid of CNN and LSTM networks, to automatically classify and identify fake news content related to the COVID-19 pandemic posted on social media platforms. These deep neural networks have been trained and tested using the “COVID-19 Fake News” dataset, which contains 21,379 real and fake news instances for the COVID-19 pandemic and its vaccines. The real news data were collected from independent and internationally reliable institutions on the web, such as the World Health Organization (WHO), the International Committee of the Red Cross (ICRC), the United Nations (UN), the United Nations Children’s Fund (UNICEF), and their official accounts on Twitter. The fake news data were collected from different fact-checking websites (such as Snopes, PolitiFact, and FactCheck). The evaluation results showed that the CNN model outperforms the other deep neural networks with the best accuracy of 94.2%.

Keywords:

text classification; fake news detection; neural networks; deep learning; COVID-19; coronavirus; text mining

1. Introduction

Social media platforms, such as Facebook, TikTok, and Twitter, emphasize news sharing, communication, engagement, and collaboration. That is not only for personal sharing but also for businesses to promote their products and grab customers’ attention. Fortunately, the advancement of mobile applications and their availability have made these platforms accessible and user friendly. However, one of the big challenges is the control over the fake news or misinformation spread across social media, which covers several topics, such as economics, environment, politics, and health. Fake news and misinformation publishers might have a variety of motivations, such as entertainment, misleading public opinion about an issue, increasing the number of visitors to a website, promoting a biased point, etc. In general, they are either fraudulent or illegal.

Coronavirus disease 2019 (COVID-19) is a global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus was identified in the Chinese city of Wuhan in December 2019, and it is considered one of the most infectious and spreadable diseases to have affected our planet over the past few decades [1]. Multiple variants of the coronavirus have emerged and become prevalent in many countries since 2021. More than 485 million cases were confirmed, with 6.2 million deaths [2]. Since the global spread of the coronavirus, social media platforms have played a larger role in informing people and maintaining their safety, productivity, and communication with one another. News agencies and medical organizations have taken advantage of these platforms to spread information about the virus. The Centers for Disease Control and Prevention (CDC), World Health Organization (WHO), healthcare organizations, and medical journals have also contributed to publishing and updating information about the virus, in terms of its spread, prevention, and treatment. However, because this problem is tied to human health, the transmission of false and misleading news about the virus via the Internet and social media platforms has added another challenge to fight. Moreover, this fake news had tragic consequences, such as harming the country’s economy, reducing people’s trust in their governments, promoting certain products to make huge profits, and spreading wrong tips and instructions for preventing and treating the virus [1]. For example, as a result of the misleading news on social media platforms stating that consuming alcohol reduces coronavirus infection, over 480 individuals died and 280 others were poisoned in Iran, according to The Independent [3]. In the United States, according to The Guardian [4], a person died and his wife was injured after taking a drug that Trump declared a treatment for coronavirus.

The term “Fake News” is defined in the literature as disinformation, misinformation, scam, or rumor, which is news that is written to mislead the reader and make them believe that it is factual and reliable [5]. There are two main types of fake news published on social media: “Disinformation”, which is intentionally disseminated by people with bad intentions. In the case of the COVID-19 pandemic, the disinformation blamed ethnic groups, illegal immigrants, and even governments for spreading the virus. It seems that some political groups want to sow chaos for political gain [6,7]. The other type is “Misinformation”, which spreads innocently despite being untrue. Examples of misinformation about the COVID-19 pandemic include misunderstanding of the virus, wishful thinking about false cures, and fictional implications derived from how the virus is spread. This manifests via speculation among well-meaning people who publish their opinion as fact [6,7].

Therefore, the search for a mechanism to detect and identify fake news on social media platforms has attracted great attention. Fake news detection is the task of evaluating the credibility of news and classifying it as “Fake” or “Real” [8]. Fake news detection methods can generally be categorized into (I) content-based detection methods and (II) social-context-based detection methods [1].

Content-based detection methods [7,9] rely on extracting features from the news content (such as headlines, body text, images/videos, and news sources) for the classification of news (fake/real). Generally, they can be categorized into (I) knowledge-based methods and (II) style-based methods. Knowledge-based methods validate the news based on its content, either by relying on human domain experts, such as journalists or scientists [10], or by providing an automated system to evaluate the credibility of news. The style-based methods rely on the writing style to determine and identify fake news because fake news publishers have a certain writing style to influence consumer behavior to spread fake news and convince a wide range of consumers. An example of this is posting a catchy headline without reporting any useful information, such as “You’ll never believe what he did!!!!!!!!!!!!!!!!!!!”. This type of headline arouses the curiosity of the reader who might click to read the news [7,10].

Social-context-based detection methods rely on the interactions between the users and the news [9]. Generally, they can be categorized into (1) stance-based methods and (2) propagation-based methods [5]. Stance-based methods rely on the users’ viewpoints towards the social media posts to evaluate the credibility of the news. A user’s stance can be represented explicitly or implicitly [11]. Explicit stances are outright expressions of opinion or emotion, such as reactions expressed on Facebook (Like, Dislike), while implicit stances must be extracted from the posts themselves [7]. Propagation-based methods rely on the interrelations between relevant social media posts to classify the news as Fake or Real [7].

The main contributions of this paper are: (i) Introducing a novel fake news dataset, the “COVID-19 Fake News” dataset, which contains real and fake news data for the COVID-19 pandemic and its vaccines. The dataset content has been collected from different English resources, containing a wide variety of real and fake COVID-19-related news. The dataset has no missing values. Researchers who want to create or evaluate different fake news detection algorithms may find this dataset valuable. (ii) Proposing a fake news detection system that can automatically identify and detect fake news about the COVID-19 pandemic using several deep neural networks: (1) LSTM, (2) bi-directional LSTM, (3) CNN, and (4) a hybrid model of CNN and LSTM. All four deep learning methods produced models with greater than 90% accuracy in detecting COVID-19-related fake news; nonetheless, the comparison suggests that CNN may be the preferred deep learning algorithm on this dataset for AI developers whose priority is accuracy, because it produced the most accurate model (94% accuracy).

This study reports the performance of the models using a variety of evaluation metrics, allowing users or AI developers to set criteria or assign weights to the metrics based on their importance, and then to choose the model that best fits their requirements or rank the models on more than one metric. Fake news detection could therefore be handled via multi-criteria decision making [12,13,14,15], in which users rank deep learning models depending on their preferences, such as the most accurate model with the highest F1 score and the shortest training time. One might prefer a model that can be quickly updated with reasonable accuracy over the most accurate model that needs a long time to be updated or retrained on a new dataset. We evaluate the models in this study based on their prediction performance, where the CNN model shows the best performance.

The paper is organized as follows: Section 2 discusses the related work. Section 3 presents the experimental dataset. Section 4 presents, in detail, the methodology we followed in this research. Section 5 evaluates the deep learning algorithms’ performance and discusses the results. Finally, Section 6 shows our conclusions and future works.

2. Related Works

2.1. Related Fake News Detection Methods

Recently, several deep neural networks have been proposed to detect fake news on social media platforms.

Kumar et al. [16] evaluated several deep learning models (CNN, LSTM, and hybrid models) to solve the fake news problem on social media platforms. The results showed that the hybrid model, which consists of CNN + Bi-LSTM with an attention mechanism, achieved the best accuracy of 88.78%. Rodríguez and Iglesias [17] applied three different neural network models (LSTM, CNN, and BERT) for detecting and classifying fake news on the Internet using the text features only. The models were evaluated using a dataset consisting of 200,015 news articles classified as fake or real. The dataset contains many features, such as titles, contents, publisher and author details, and image URLs, but only titles and content features were used in their study. The highest accuracy was achieved by the BERT model, which was around 97%. It proved that it is possible to detect and classify fake news using text features only. Jiang et al. [18] proposed a framework using Bi-LSTM deep learning models that classifies the news as fake or real. The proposed model was evaluated using a fact-checking dataset. Moreover, they used various evaluation metrics, such as accuracy, recall, F1 measure, and execution time, to demonstrate the efficiency of the proposed model. The model achieved an accuracy of 99.82%, precision of 100%, and recall of 100%. Umer et al. [19] proposed a hybrid deep neural network architecture to solve the problem of identifying and detecting fake news on social media platforms. The hybrid model combines the capabilities of CNN and LSTM models. They used two approaches to reduce the dimensions of the feature vectors before passing them to the classifier, namely Chi-Square and Principal Component Analysis (PCA). The model was evaluated using the Fake News Challenges (FNC) dataset that contains news articles labeled into four categories (Agree, Discuss, Disagree, and Unrelated). 
Non-linear features are fed to PCA and Chi-Square, which provides more contextual features for the fake news detection task. The results show that PCA outperformed Chi-Square, with an accuracy of 97.8%.

Zhi et al. [20] believed that there are important factors that must be used in fake news detection models, such as (1) important clues in the comments and (2) the sources of news and data. Thus, they proposed a hybrid deep neural network based on CNN-LSTM that has several features (such as news body, comments, news sources, and market data). Moreover, they used the attention mechanism to extract important information from the comments and make a list of the official websites to identify the source of the news. As for the market dimension, for the financial products mentioned in the news, they get the market price and check if the data in the articles are correct. The model was evaluated using a dataset consisting of 8000 news samples collected from many financial sites. Each sample contains a headline, content, source, and comment. The proposed models achieved accuracy = 92.1%, precision = 88.9%, recall = 95.6%, and F1 score = 92.3. Wani et al. [21] evaluated several deep learning models that are used in text classification tasks to detect fake news, such as (1) BERT pre-trained model, (2) deep learning models based on CNNs, and (3) LSTMs. These models rely on automatically capturing the linguistic and stylistic features from news content, which can be used to determine the credibility of news. The proposed models were evaluated using the “AAAI 2021 COVID-19 fake news dataset”. The dataset contains 10,700 tweets and is labeled as (Fake/Real). The fake tweets were collected from fact-checking websites, such as NewsChecker, PolitiFact, and Boom live, while the verified Twitter handles were used to collect the real tweets. The results showed that the BERT model outperforms other models, with a difference in the accuracy of about 3–4%. The BERT models achieved an accuracy of 98.41%, while the baseline model achieved an accuracy of 93.32%.

Abdelminaam et al. [22] proposed a modified LSTM (from one to three layers) and a modified GRU (from one to three layers) for detecting fake news related to coronavirus. Moreover, they compared the proposed models’ performance with that of traditional machine learning models, such as Decision Tree, Random Forest, Logistic Regression, K Nearest Neighbor, Naïve Bayes, and Support Vector Machine. Two feature extraction approaches were used: N-gram and TF-IDF with the classical machine learning models, and word embedding with the proposed deep neural networks. The results indicated a significant improvement over the results of classical machine learning models and the ability to detect fake news related to the coronavirus pandemic.

Similarly, Ajao et al. [23] evaluated and compared three deep learning algorithms for detecting fake news messages published on Twitter, namely LSTM, LSTMdrop, and a hybrid model of LSTM and CNN. The findings showed that LSTM achieved the best prediction performance (82% accuracy). Another hybrid model of CNN and RNN (Recurrent Neural Network) was evaluated in [24]. While the CNN or RNN models alone could perform well on fake news datasets, the hybrid model achieved better performance in fake news prediction. Using a deep neural network and word embedding representation, the model in [25] achieved 93.92% accuracy on a dataset of 10,700 records. The comparison in that study showed that deep learning outperforms classical machine learning models in fake news detection.

2.2. Related Fake News Detection Datasets

We reviewed the most common datasets used for fake news detection tasks in terms of news domain, sources, size, and language. This allowed us to demonstrate that: (1) most of the current fake news datasets are collected in the English language, (2) most fake news datasets are small, (3) the news in the datasets is mainly categorized as Fake or Real, and (4) most datasets rely on manually labeled data. Wang [26] presented a novel fake news detection dataset called “Liar”. Liar consisted of 12,800 manually labeled statements in different contexts collected from Politifact.com. The Liar dataset is larger than the other publicly available fake news datasets of a similar nature. Shu et al. [27] also introduced a novel fake news dataset called “FakeNewsNet”, related to the United States’ entertainment and political news. FakeNewsNet consists of two datasets with various features, such as news content, social context, and spatial data. FakeNewsNet was labeled as true/false by the experts. Moreover, they presented a full explanation of FakeNewsNet from various perspectives and discussed its benefits for fake news detection on a social media platform. Adali [28] introduced a novel fake news dataset called “BuzzFeedNews”. It focused on political news posted on Facebook during the 2016 United States Presidential Election. Full-text news articles were collected from nine Facebook pages and labeled by five BuzzFeed experts. BuzzFeedNews consists of 1380 news articles related to the United States election and candidates. BuzzFeedNews collected and labeled news into four categories: 1090 mostly true, 64 mostly false, 170 mixed true and false, and 56 articles with no factual information.

Riedel et al. [29] presented the Fake News Challenges (FNC-1) fake news dataset. The FNC-1 dataset consists of 75,385 headline and article pairs, labeled as Agree, Discuss, Disagree, or Unrelated. FNC-1 is designed to classify the relationship between the body of the article and the claim in the headline into one of four classes: Agree, if the headline agrees with the body text; Disagree, if the headline disagrees with the body text; Discuss, if the headline and the body text discuss the same topic; Unrelated, if the headline and the body text discuss different topics. Barbado et al. [30] developed a “Yelp” fake news dataset by scraping 18,912 electronic reviews from four important cities: San Francisco, New York, Miami, and Los Angeles. The reviews were labeled by Yelp’s filter as fake or trust. This study mainly focused on detecting fake reviews related to the consumer electronics domain. “Getting Real about Fake News” [31] is a fake news dataset collected by Kaggle. It consists of 12,999 fake and real news articles and their metadata. “Getting Real about Fake News” was scraped from 244 websites and annotated using the BS Detector Chrome extension (BS Detector is a plug-in that uses fake news sources as a reference and flags a webpage with a red banner if it is considered questionable). The BS Detector was developed to validate online news articles and to produce labels in place of human annotators.

Papadopoulou et al. [32] presented “FVC-2018”, a novel annotated fake news dataset. It consists of Real and Fake videos collected from three platforms (YouTube, Twitter, and Facebook), in addition to a set of Twitter posts sharing links with them. FVC-2018 provides an annotated dataset of 380 user-generated videos, 200 debunked, and 180 verified, as well as 5195 semi-duplicates of them. Table 1 summarizes some of the available datasets for fake news detection.

3. Experimental Dataset

To serve the goal of this study, we created the COVID-19 Fake News dataset from scratch, with a subset of the Zenodo dataset [35]. Dataset collection, dataset filtering, dataset preparation and preprocessing are the three primary steps of the procedure used to create the COVID-19 Fake News dataset. The stages utilized to build our dataset are detailed in the subsections that follow.

3.1. Dataset Collection Stage

The accuracy of any fake news detection system is highly dependent on the quality of the dataset used to train the model and on how well the dataset describes the facts relevant to the topic. For this reason, we used reliable resources to collect the “COVID-19 Fake News” dataset, drawing on the official accounts of global health organizations, which make an effort to avoid instilling fear and horror in the public, such as WHO [36], UN [37], UNICEF [38], and ICRC [39]. The real news data were collected by scraping all the news about the COVID-19 pandemic and its vaccines from the websites of the World Health Organization (WHO), International Committee of the Red Cross (ICRC), United Nations (UN), and the United Nations Children’s Fund (UNICEF) and their official accounts on Twitter. Table 2 shows the sources and the keywords used to search the COVID-19 real news data.

The fake news data for the COVID-19 pandemic and its vaccines were collected as follows. We used the Google Fact Check Tools API, which allows users to get news and information from various fact-checking websites, such as Snopes, PolitiFact, and FactCheck. We used the COVID-19 and vaccines keywords to search for and retrieve the COVID-19 news that was labeled as fake by the different fact-checking websites. To address the issue of imbalanced classes, we included some examples from the Zenodo dataset, which is a COVID-19 false news dataset [41]. Table 3 shows the source, keywords, and collection method for the COVID-19 fake news.

3.2. Dataset Filtering Stage

This stage was performed to filter and standardize our collected dataset. The processes applied in this stage were:

Remove Duplicate Data: we verified the collected dataset and removed the redundant news data. We used the “drop_duplicates()” method in the pandas Python library to find and remove duplicate data instances.
Dataset Standardization: because we collected our dataset from different sources, this stage was performed to standardize our dataset by making the data fit into a standard structure containing the following fields:
• ID: each news instance is given a unique id.
• Text (content): news content for COVID-19 and its vaccines.
• Publisher: WHO, ICRC, UNICEF, UN, PolitiFact, Snopes, Factcheck, etc.
• Label: Real or Fake.
• Language: in this paper, we used the English language only.
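
The filtering stage can be illustrated with a minimal sketch. The paper used pandas “drop_duplicates()”; the version below uses only the Python standard library to show the same keep-first behaviour together with the standardized record structure. The class name `NewsRecord`, the function `standardize`, the input tuple shape, and the example texts are hypothetical, and deduplicating on the text field alone is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsRecord:
    """One standardized instance with the fields listed above."""
    id: int
    text: str
    publisher: str
    label: str            # "Real" or "Fake"
    language: str = "en"  # the dataset is English only


def standardize(raw_items):
    """Drop repeated texts (keeping the first) and assign unique IDs.

    raw_items: iterable of (text, publisher, label) tuples (hypothetical shape).
    """
    seen = set()
    records = []
    for text, publisher, label in raw_items:
        if text in seen:
            continue  # duplicate text: keep only the first occurrence
        seen.add(text)
        records.append(NewsRecord(len(records) + 1, text, publisher, label))
    return records


# Made-up example rows, not taken from the dataset.
raw = [
    ("Vaccines are tested in clinical trials.", "WHO", "Real"),
    ("Drinking alcohol prevents infection.", "Snopes", "Fake"),
    ("Vaccines are tested in clinical trials.", "WHO", "Real"),  # duplicate
]
records = standardize(raw)
```

With a pandas DataFrame, the deduplication step would reduce to a single `drop_duplicates()` call.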

3.3. Dataset Preprocessing Stage

The preprocessing stage is one of the most crucial processes in producing a clean dataset, free of errors, redundancy, and missing values [42]. In our collected dataset, the real and fake news texts contained irrelevant information, such as punctuation marks, special characters, links, hashtags, and user mentions. Therefore, pre-processing is required to facilitate the analysis process, reduce the required memory space, and shorten the training/testing time. The data preprocessing was achieved using the Natural Language Toolkit (NLTK) [43] and Python regular expressions, as follows:

Dataset Cleaning: each dataset instance is tokenized, and regular expressions are then applied to remove all punctuation, links, hashtags, symbols, repeating text, and non-English characters.
Stop Words Removal: removing stop words, which are function words that carry little information when analyzing the text, including prepositions, pronouns, and conjunctions.
Word Lemmatization: reducing each word to its base form (lemma) so that inflected variants do not produce redundant patterns. The WordNet Lemmatizer was used to perform the Word Lemmatization process.
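
A rough sketch of the cleaning and stop-word steps is given below, using only Python regular expressions. The regular expressions, the tiny stop-word set, and the function name `clean_text` are illustrative assumptions, not the patterns used in the study, which relied on NLTK's resources. Removing a hashtag together with its word (rather than only the "#" symbol) is also an assumption. Lemmatization is omitted because the WordNet Lemmatizer needs the WordNet corpus to be downloaded first.

```python
import re

# Tiny illustrative stop-word set (assumption); NLTK ships a much fuller English list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "that", "for", "on", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # links
TAG_RE = re.compile(r"[@#]\w+")                # user mentions and hashtags
NON_ALPHA_RE = re.compile(r"[^a-z\s]")         # punctuation, digits, symbols, non-English letters


def clean_text(text):
    """Lowercase, strip links, mentions, hashtags and symbols, then drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return [token for token in text.split() if token not in STOP_WORDS]


# Made-up example post.
tokens = clean_text("BREAKING!!! The vaccine is 100% unsafe, says @user #covid19 https://example.com/x")
```

Here `tokens` keeps only the content words of the post. In the full pipeline each token would then be passed through the lemmatizer.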

4. Methodology

Our approach uses deep neural networks to classify COVID-19 news as Real or Fake. Figure 1 shows the general framework of our proposed fake news detection system. The methodology pipeline consists of four stages. In the first stage, COVID-19 real news data were collected from independent and internationally reliable institutions on the web: the World Health Organization (WHO), the International Committee of the Red Cross (ICRC), the United Nations (UN), the United Nations Children’s Fund (UNICEF), and their official accounts on Twitter. The fake news data were collected from different fact-checking websites (such as Snopes, PolitiFact, and FactCheck). After that, we cleaned the dataset of noise and errors and removed duplicate instances. The second stage is embedding, where the news data are embedded using GloVe pre-trained word embedding. The third stage involves training several deep neural networks (LSTM, BiLSTM, CNN, and a hybrid of CNN-LSTM) to detect and identify fake news related to the COVID-19 pandemic. The last stage is the classification and evaluation stage, in which the models classify COVID-19 news (Real/Fake) and are evaluated on an unseen testing dataset. The following subsections explain our methodology in detail.

4.1. Dataset Preparation and Integration

The COVID-19 real and fake news data were retrieved from the different sources using the keywords in Table 2 and Table 3 above. The data were then indexed, cleaned of non-English characters, numbers, hashtags, emojis, repeating text, user mentions, links, and punctuation, and standardized into a uniform format. This produced 21,379 news instances, of which 80% were used for training and 20% for testing. Table 4 shows the final class distribution and the number of news instances for the COVID-19 Real and Fake News dataset.
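
As a small worked example of the 80/20 partition, the sketch below shuffles instance indices and cuts them at 80%. The random seed, the function name `train_test_split_indices`, and the use of a plain (non-stratified) shuffle are assumptions; the text does not state how the split was drawn.

```python
import random


def train_test_split_indices(n_items, train_fraction=0.8, seed=42):
    """Shuffle indices 0..n_items-1 and cut them into train and test parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility (assumption)
    n_train = int(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(21379)
```

For 21,379 instances this gives 17,103 training and 4,276 testing instances.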

4.2. Converting Text to Vectors Using Pre-Trained Embeddings

In NLP, word embedding is widely regarded as one of the most effective representations of words. It plays an important role in improving overall accuracy, especially with deep neural networks, because it captures the semantic and syntactic relationships between words by representing them as vectors in a vector space. Several pre-trained word embedding models are available; two of the most popular are Word2Vec and GloVe. In this paper, we chose the GloVe pre-trained word embedding to transform the training and testing datasets into a representation that a classifier can process. First, we used the tokenizer function to divide the text into words and encode it as a sequence of integers. Then, we applied pad sequences, which post-padded the sequences so that they all have the same length. Finally, the word indices are mapped to their vectors using the GloVe pre-trained word embedding. The output produced is an “embedding matrix”, which is used for the training of the deep learning models.
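
The three steps (integer encoding, post-padding, embedding matrix) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the study's code: the helper names are hypothetical, the two-dimensional "GloVe" vectors are made up, and words missing from the pre-trained table are assumed to get an all-zero row.

```python
def build_vocab(texts):
    """Map each word to an integer index; index 0 is reserved for padding."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def encode(text, vocab):
    """Turn a text into its sequence of word indices."""
    return [vocab[word] for word in text.split() if word in vocab]


def pad_post(sequence, max_len):
    """Cut to max_len, then right-pad with zeros ("post" padding)."""
    sequence = sequence[:max_len]
    return sequence + [0] * (max_len - len(sequence))


def build_embedding_matrix(vocab, pretrained, dim):
    """Row i holds the pre-trained vector of the word with index i (zeros if unknown)."""
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for word, index in vocab.items():
        if word in pretrained:
            matrix[index] = list(pretrained[word])
    return matrix


texts = ["vaccine is safe", "vaccine hoax"]
toy_glove = {"vaccine": [0.1, 0.2], "safe": [0.3, 0.4]}  # made-up vectors, not real GloVe values
vocab = build_vocab(texts)
padded = [pad_post(encode(text, vocab), 4) for text in texts]
embedding_matrix = build_embedding_matrix(vocab, toy_glove, dim=2)
```

The padded integer sequences are the model input, and the embedding matrix supplies the weights of the embedding layer.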

4.3. Deep Neural Networks for Fake News Detection

To detect the fake news related to the COVID-19 pandemic and its vaccines, several deep learning models were implemented, including LSTM, bi-directional LSTM, CNN, and the hybrid model of CNN and LSTM.

4.3.1. Long Short-Term Memory Networks (LSTM)

LSTM is a type of Recurrent Neural Network (RNN) that overcomes the vanishing gradient problem that occurs in the standard RNN model. LSTM can remember information for long periods using a memory unit called the “cell state”, which is the core of the LSTM network. LSTM can remove or add information to the cell state through carefully designed structures called “gates”. Each LSTM unit has three gates (the input gate, the forget gate, and the output gate). During training, the gates learn which information to keep in the cell state and which to forget. The input gate updates the cell state by deciding what new information from the current input will be written into it. The forget gate decides what information to keep and what unnecessary information to remove from the previous cell state before it is merged with the new input. The output gate determines the output from the memory cell. We implemented the LSTM model to classify the COVID-19 news as follows. We passed the input as a sequence of words through an embedding layer, which converts each word to its respective embedding vector using GloVe pre-trained word embedding. The output produced by the embedding layer is an “embedding matrix”, which is passed to the LSTM unit for computation. The output from the LSTM unit is fed into a single sigmoid neuron, which outputs the class of the given news as Fake or Real [44].
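
To make the role of the three gates concrete, here is a toy scalar LSTM cell (hidden size 1, one input feature) followed by a single sigmoid output neuron. It follows the standard LSTM update equations; the parameter names, the parameter values, and the input sequence are arbitrary assumptions, and a real model would use weight matrices learned during training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate cell values
    c = f * c_prev + i * g   # forget part of the old state, add gated new information
    h = o * math.tanh(c)     # gated output of the memory cell
    return h, c


def classify(sequence, p, w_out=1.0, b_out=0.0):
    """Run the cell over the sequence and pass the last hidden state to a sigmoid neuron."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    return sigmoid(w_out * h + b_out)


PARAM_NAMES = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")
params = {name: 0.5 for name in PARAM_NAMES}   # arbitrary values for illustration
prob = classify([0.2, -0.1, 0.4], params)      # probability-like score between 0 and 1
```

The returned score would be thresholded (commonly at 0.5) to obtain the Fake or Real label; which class is mapped to 1 is a labeling choice.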

4.3.2. Bi-Directional Long Short-Term Memory Networks (Bi-LSTM)

Bi-LSTM is an improvement on the traditional LSTM that can improve the efficiency of the model by training two separate LSTMs: a front-to-back LSTM, which processes the input sequence from front to back, and a back-to-front LSTM, which processes the input sequence from back to front [45]. Thus, the input sequence is processed in both directions at the same time. We implemented the bi-LSTM model to classify the COVID-19 news as follows. First, we passed the input as a sequence of words through an embedding layer. The embedding layer converts each word to its respective embedding vector using GloVe pre-trained word embedding. The output produced is an “embedding matrix”, which is passed to two LSTM units: (1) the front-to-back LSTM and (2) the back-to-front LSTM. Then, the outputs from the two LSTM units are concatenated. Finally, the concatenated output is fed into a single sigmoid neuron, which outputs the class of the given news as Fake or Real. Figure 2 shows our implemented architecture of the bi-LSTM model.
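
The bidirectional idea itself is independent of the cell type, which the short sketch below tries to show. A deliberately simple recurrence (`toy_step`, a decaying running sum) stands in for the LSTM step; it and the other names are hypothetical and only illustrate reading the sequence in both directions and concatenating the two final states.

```python
def run_recurrent(sequence, step, state=0.0):
    """Feed the sequence through a recurrent step function and return the final state."""
    for x in sequence:
        state = step(x, state)
    return state


def toy_step(x, state):
    """Stand-in for an LSTM step (assumption): a decaying running sum."""
    return 0.5 * state + x


def bidirectional(sequence, step):
    """Concatenate the final states of a front-to-back and a back-to-front pass."""
    forward = run_recurrent(sequence, step)
    backward = run_recurrent(list(reversed(sequence)), step)
    return [forward, backward]


features = bidirectional([1.0, 2.0, 4.0], toy_step)
```

The two entries of `features` differ because each pass weights the most recently read elements more heavily, which is why combining both directions gives the classifier more context than either alone.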

4.3.3. Convolutional Neural Networks (CNN)

A convolutional neural network (CNN) is a type of feed-forward artificial neural network [46]. CNN achieved amazing results in the computer vision fields and many natural language processing (NLP) tasks, such as sentiment analysis [47], fake news detection [48], spam detection [49], sentence modeling [50], and many other related tasks. A CNN has one or more hidden layers, which can extract features from the input (image, audio, video, text) and a fully connected layer to produce the desired output. The CNN mainly consists of three types of layers: The convolution layer (CONV) applies a set of filters that can identify and recognize features or characteristics in the input (image, audio, video, text). The output from this layer is a feature map, which determines the positions and depths of the features in the input by multiplying the filter “set of weights” with the input matrix. The most important parameters used by the convolutional layer are (1) the number of kernels and (2) the size of the kernels. The Pooling layer (downsampling layer) reduces the dimension of the feature map by applying max-pooling or average-pooling functions. The fully connected layer (classification layer) is the last layer added at the end of the model. It is used to show and predict the final output by using the sigmoid activation function (for binary classification) or the softmax activation function (for multi-class classification) to compute the class scores. Finally, the output layer presents the class label [51].
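
The convolution and pooling operations described above can be sketched on a one-dimensional, single-channel signal. This is a simplification: in the text models the filter also spans the embedding dimension, and the kernel values are learned. The function names, the signal, and the kernel are made up for illustration.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (no padding, stride 1) and sum the products."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(values, size):
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 6.0]
kernel = [1.0, -1.0]                 # one filter of size 2 (made-up weights)
feature_map = conv1d(signal, kernel)
pooled = max_pool1d(feature_map, 2)
```

The feature map has one value per kernel position, and pooling halves its length while keeping the strongest response in each window.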

Our proposed CNN model is composed of two convolution blocks, each consisting of a single Conv1D layer and a max-pooling layer. We implemented the CNN model as follows. First, we passed the input as a sequence of words through an embedding layer that converts each word to its respective embedding vector using GloVe pre-trained word embedding. The output produced is an “embedding matrix”, which is passed to the two convolution blocks. In each block, (1) the convolution layer extracts the local features and (2) the max-pooling layer reduces the feature vectors extracted by the convolution layer. Finally, the output is fed into the fully connected layer, with a dropout rate of 0.3 to reduce overfitting, which classifies the news as Fake or Real. Figure 3 shows the implemented architecture of the CNN model.

4.3.4. Hybrid Model (CNN and LSTM)

The hybrid model combines a CNN, which extracts the local features, with an LSTM, which learns the long-term dependencies. First, we passed the input as a sequence of words through an embedding layer that converts each word to its respective embedding vector, using GloVe pre-trained word embedding. The output produced is an “embedding matrix”, which is passed to the convolutional layer (Conv1D) to extract the local features. The resulting “large feature vectors” are fed into the max-pooling layer, which downsamples them and thereby reduces the dimensionality of the CNN output and the number of parameters. The feature maps generated by the CNN are then fed as input to the LSTM layer, which uses them to learn the long-term dependencies in the news text. Finally, the “trained feature vectors” generated by the LSTM layer are fed into a sigmoid neuron, which classifies the given news as Fake or Real. The implemented CNN-LSTM hybrid architecture is shown in Figure 4.
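The same pipeline can be sketched in Keras as follows. This is an illustrative outline under stated assumptions rather than our exact code: the function name `build_cnn_lstm`, the embedding dimension, the pool size, the number of LSTM units, and the position of the dropout layer are assumptions. The filter count, kernel size, dropout rate, optimizer, learning rate, and loss follow the hybrid column of Table 6.

```python
def build_cnn_lstm(vocab_size, seq_len, embedding_dim=100, lstm_units=64):
    """Build a Conv1D -> max-pooling -> LSTM -> sigmoid text classifier."""
    # Imported inside the function so this file can be loaded on machines
    # without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(seq_len,)),
        # In the paper this layer holds GloVe vectors; loading them is omitted.
        layers.Embedding(vocab_size, embedding_dim),
        # CNN part: local features, then downsampling.
        layers.Conv1D(filters=32, kernel_size=2, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        # LSTM part: reads the pooled feature maps as a shorter sequence.
        layers.LSTM(lstm_units),   # number of units is an assumption
        layers.Dropout(0.4),       # rate from Table 6; placement is an assumption
        layers.Dense(1, activation="sigmoid"),  # Fake vs. Real
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-5),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Because the max-pooling layer halves the sequence length before the LSTM, the recurrent layer processes fewer time steps than it would on the raw embedding matrix.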

4.4. Hyperparameter Tuning

Different hyperparameters affect the performance of deep learning models, such as activation functions, optimizers, number of epochs, batch size, learning rates, and dropout rates. Because the space of possible hyperparameter values is large, we conducted many experiments to choose the optimal hyperparameters for our models using a simple grid search. Table 5 and Table 6 show the hyperparameter values that achieved the best classification results for our deep learning models.
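A grid search of this kind can be sketched in a few lines of Python. The candidate values below are copied from the "Values Examined by Grid Search" column of Table 6; the function names and the scoring function `toy_evaluate` are hypothetical stand-ins, since in practice each configuration would be scored by training a model and measuring its validation performance.

```python
from itertools import product

# Candidate values from the grid-search column of Table 6.
GRID = {
    "filters": [32, 64, 128],
    "batch_size": [32, 64, 128],
    "learning_rate": [0.01, 0.001, 0.0001, 0.00001],
    "epochs": [10, 15, 20, 25, 30, 35, 40, 45, 50],
    "dropout": [0.1, 0.2, 0.3, 0.4],
}


def grid_search(grid, evaluate):
    """Score every combination in `grid` and return (best_config, best_score)."""
    names = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Hypothetical stand-in for 'train the model, return validation accuracy'."""
    return -abs(config["dropout"] - 0.3) - abs(config["batch_size"] - 32) / 1000


# Search a two-parameter sub-grid so the example runs instantly.
small_grid = {"dropout": GRID["dropout"], "batch_size": GRID["batch_size"]}
best, best_score = grid_search(small_grid, toy_evaluate)
```

An exhaustive search over all five parameters listed here would cover 3 × 3 × 4 × 9 × 4 = 1296 configurations, which is why the cost of a grid search grows quickly with each added hyperparameter.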

5. Evaluation and Result Analysis

5.1. Evaluation Metrics

We evaluated the performance of our fake news detection models from various perspectives, using a set of evaluation metrics derived from the confusion matrix.
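As an illustration of how such metrics follow from a confusion matrix, the sketch below computes a subset of them from the CNN counts in Table 9. It uses the standard definitions and makes two assumptions that should be read as such: the rows of Table 9 are taken as actual classes and the columns as predicted classes, and Fake is treated as the positive class. The function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "error_rate": 1 - accuracy,
    }


# Table 9 (CNN), with Fake as the positive class (assumption):
# TP = fake predicted fake, FN = fake predicted real,
# FP = real predicted fake, TN = real predicted real.
cnn = classification_metrics(tp=1860, fp=127, fn=121, tn=2168)

for name, value in cnn.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these assumptions, the five values agree with the CNN row of Table 11 to within rounding.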

5.2. Results Analysis

To classify COVID-19 news as Fake/Real, various experiments were conducted to build an accurate fake news detector. We trained a set of deep learning models that are capable of detecting COVID-19 fake news using word embeddings on the COVID-19 Fake News dataset. Each implemented model uses different sets of hyperparameters, such as batch size, loss function, learning rate, and the number of hidden units.

We used the confusion matrices for each implemented deep learning model shown in Table 7, Table 8, Table 9 and Table 10 to evaluate the performance of our models.

The performance of the deep learning models on the COVID-19 Fake News dataset is shown in Table 11. The results show that the CNN model outperforms the other models, with the best accuracy of 94.2%, precision of 93.6%, recall of 93.9%, F1-Score of 93.7%, specificity of 93.9%, error rate of 5.8%, miss rate of 5.5%, and FPR of 6.1%.

To avoid overfitting and underfitting, we used the early stopping technique [52] and a dropout layer in the deep learning models during training. Overfitting occurs when a model is trained for too many epochs and fits the training data so closely that it fails to predict correctly on data that differ from it. Underfitting occurs when a model is trained for too few epochs and is therefore unable to learn enough from the features obtained from the data. Figure 5, Figure 6, Figure 7 and Figure 8 show the training and validation loss values at the end of the training process for each model. As shown in the figures, all models were trained without overfitting or underfitting problems.
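The logic of early stopping can be sketched as follows. The paper does not report a patience value, so the `patience` and `min_delta` parameters, the function name, and the validation losses below are all hypothetical and serve only to show the rule: stop once the validation loss has not improved for a fixed number of consecutive epochs.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops at the first epoch where the validation loss has failed
    to improve on the best value seen so far for `patience` epochs in a row.
    If that never happens, the last epoch is returned as the stop epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch  # new best: reset the counter
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch             # no improvement for too long
    return len(val_losses), best_epoch


# Made-up validation losses: improvement until epoch 4, then a slow rise.
losses = [0.60, 0.45, 0.38, 0.35, 0.36, 0.37, 0.39, 0.41]
stop, best = early_stopping_epoch(losses, patience=3)
```

In this toy run the loss is lowest at epoch 4 and training stops at epoch 7, so the weights from epoch 4 would be the ones to keep.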

6. Conclusions and Future Works

Fake news detection from text is a challenging task with many applications. In this paper, a set of deep learning models (LSTM, bi-directional LSTM, CNN, and a hybrid model of CNN and LSTM) were developed to detect and identify fake news about the COVID-19 pandemic from the Internet and social media platforms. The deep learning models were developed at the word level using GloVe pre-trained word embedding features, with different sets of hyperparameters, such as batch size, loss function, learning rate, and the number of hidden units, to classify the COVID-19 news as Fake/Real. The “COVID-19 Fake News” dataset, which consists of real and fake news data about the COVID-19 pandemic, was used to evaluate the deep learning models for this task. The real news data were collected from internationally reliable and independent institutions on the web, while the fake news data were collected from different fact-checking websites. The evaluation results showed that the CNN model outperforms the other deep neural networks with the best accuracy of 94.2%, precision of 93.6%, recall of 93.9%, F1-Score of 93.7%, specificity of 93.9%, error rate of 5.8%, miss rate of 5.5%, and FPR of 6.1%.

For future work, we plan to use pre-trained language models, such as BERT, XLNet, XLM, and ULMFiT, to build our detection models. Further, we plan to extend the “COVID-19 Fake News” dataset to include more COVID-19 news data in different languages. In addition, we will explore additional methods to extract other features that may be useful in improving the overall performance.

Author Contributions

Conceptualization, Y.T., O.D. and B.A.; methodology, Y.T., O.D., B.A. and M.M.; software, B.A., Y.T., O.D. and M.M.; validation, Y.T., O.D., B.A., M.M. and N.A.; formal analysis, Y.T., O.D., B.A., M.M. and N.A.; investigation, Y.T., O.D., B.A., M.M. and N.A.; resources, Y.T., O.D. and B.A.; data curation, B.A., Y.T., O.D., M.M. and N.A.; writing—original draft preparation, B.A., Y.T., M.M., O.D. and N.A.; writing—review and editing, N.A., M.M., Y.T., O.D. and B.A.; visualization, Y.T., O.D., B.A., M.M. and N.A.; supervision, Y.T. and O.D.; project administration, Y.T. and O.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Access to the data used in this article must be approved by the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Al-Ahmad, B.; Al-Zoubi, A.M.; Abu Khurma, R.; Aljarah, I. An Evolutionary Fake News Detection Method for COVID-19 Pandemic Information. Symmetry 2021, 13, 1091.
2. COVID-19 Pandemic—Wikipedia. Available online: https://en.wikipedia.org/wiki/COVID-19_pandemic (accessed on 20 December 2021).
3. Coronavirus: Hundreds Dead in Iran from Drinking Methanol Amid Fake Reports It Cures Disease. Available online: https://www.independent.co.uk/news/world/middle-east/iran-coronavirus-methanol-drink-cure-deaths-fake-a9429956.html (accessed on 15 April 2022).
4. Arizona Man Dies after Attempting to Take Trump Coronavirus ‘cure’. Available online: https://www.theguardian.com/world/2020/mar/24/coronavirus-cure-kills-man-after-trump-touts-chloroquine-phosphate (accessed on 15 April 2022).
5. Kaliyar, R.K. Fake news detection using a deep neural network. In Proceedings of the 2018 4th International Conference on Computing Communication and Automation (ICCCA), Greater Noida, India, 14–15 December 2018; pp. 1–7.
6. Gupta, A.; Sukumaran, R.; John, K.; Teki, S. Hostility detection and COVID-19 fake news detection in social media. arXiv 2021, arXiv:2101.05953.
7. Kaliyar, R.K.; Goswami, A.; Narang, P. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 2021, 80, 11765–11788.
8. Elhadad, M.K.; Li, K.F.; Gebali, F. Detecting misleading information on COVID-19. IEEE Access 2020, 8, 165201–165215.
9. Raza, S. Automatic Fake News Detection in Political Platforms-A Transformer-based Approach. In Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text (CASE 2021), Online, 5–6 August 2021; pp. 68–78.
10. Zhang, X.; Ghorbani, A.A. An overview of online fake news: Characterization, detection, and discussion. Inf. Process. Manag. 2020, 57, 102025.
11. Shu, K.; Sliva, A.; Wang, S.; Tang, J.; Liu, H. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explor. Newsl. 2017, 19, 22–36.
12. Yazıcı, A.; Keser, S.B.; Günal, S.; Yayan, U. A Multi-Criteria Decision Strategy to Select a Machine Learning Method for Indoor Positioning System. Int. J. Artif. Intell. Tools 2018, 27, 1850018.
13. Ali, R.; Lee, S.; Chung, T.C. Accurate multi-criteria decision making methodology for recommending machine learning algorithm. Expert Syst. Appl. 2017, 71, 257–278.
14. Chowdhury, N.K.; Kabir, M.A.; Rahman, M. An Ensemble-based Multi-Criteria Decision Making Method for COVID-19 Cough Classification. arXiv 2021, arXiv:2110.00508.
15. Pirouz, B.; Ferrante, A.P.; Pirouz, B.; Piro, P. Machine Learning and Geo-Based Multi-Criteria Decision Support Systems in Analysis of Complex Problems. ISPRS Int. J. Geo-Inf. 2021, 10, 424.
16. Kumar, S.; Asthana, R.; Upadhyay, S.; Upreti, N.; Akbar, M. Fake news detection using deep learning models: A novel approach. Trans. Emerg. Telecommun. Technol. 2020, 31, e3767.
17. Rodríguez, Á.I.; Iglesias, L.L. Fake news detection using Deep Learning. arXiv 2019, arXiv:1910.03496.
18. Jiang, T.; Li, J.P.; Haq, A.U.; Saboor, A. Fake News Detection using Deep Recurrent Neural Networks. In Proceedings of the 2020 17th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 18–20 December 2020; pp. 205–208.
19. Umer, M.; Imtiaz, Z.; Ullah, S.; Mehmood, A.; Choi, G.S.; On, B.W. Fake news stance detection using deep learning architecture (CNN-LSTM). IEEE Access 2020, 8, 156695–156706.
20. Zhi, X.; Xue, L.; Zhi, W.; Li, Z.; Zhao, B.; Wang, Y.; Shen, Z. Financial Fake News Detection with Multi fact CNN-LSTM Model. In Proceedings of the 2021 IEEE 4th International Conference on Electronics Technology (ICET), Chengdu, China, 7–10 May 2021; pp. 1338–1341.
21. Wani, A.; Joshi, I.; Khandve, S.; Wagh, V.; Joshi, R. Evaluating deep learning approaches for COVID-19 fake news detection. In International Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation; Springer: Cham, Switzerland, 2021; pp. 153–163.
22. Abdelminaam, D.S.; Ismail, F.H.; Taha, M.; Taha, A.; Houssein, E.H.; Nabil, A. Coaiddeep: An optimized intelligent framework for automated detecting COVID-19 misleading information on twitter. IEEE Access 2021, 9, 27840–27867.
23. Ajao, O.; Bhowmik, D.; Zargari, S. Fake news identification on twitter with hybrid cnn and rnn models. In Proceedings of the 9th International Conference on Social Media and Society, Copenhagen, Denmark, 18–20 July 2018; pp. 226–230.
24. Nasir, J.A.; Khan, O.S.; Varlamis, I. Fake news detection: A hybrid CNN-RNN based deep learning approach. Int. J. Inf. Manag. Data Insights 2021, 1, 100007.
25. Pathwar, P.; Gill, S. Tackling COVID-19 infodemic using deep learning. In Lecture Notes on Data Engineering and Communications Technologies; Springer: Singapore, 2022; Volume 99, pp. 319–335.
26. Wang, W.Y. “liar, liar pants on fire”: A new benchmark dataset for fake news detection. arXiv 2017, arXiv:1705.00648.
27. Shu, K.; Mahudeswaran, D.; Wang, S.; Lee, D.; Liu, H. Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big Data 2020, 8, 171–188.
28. Horne, B.; Adali, S. This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In Proceedings of the International AAAI Conference on Web and Social Media, Montreal, QC, Canada, 15–18 May 2017; Volume 11.
29. Riedel, B.; Augenstein, I.; Spithourakis, G.P.; Riedel, S. A simple but tough-to-beat baseline for the Fake News Challenge stance detection task. arXiv 2017, arXiv:1707.03264.
30. Barbado, R.; Araque, O.; Iglesias, C.A. A framework for fake review detection in online consumer electronics retailers. Inf. Process. Manag. 2019, 56, 1234–1244.
31. Anoop, K.; Gangan, M.P.; Deepak, P.; Lajish, V.L. Leveraging heterogeneous data for fake news detection. In Linking and Mining Heterogeneous and Multi-View Data; Springer: Cham, Switzerland, 2019; pp. 229–264.
32. Papadopoulou, O.; Zampoglou, M.; Papadopoulos, S.; Kompatsiaris, I. A corpus of debunked and verified user-generated videos. Online Inf. Rev. 2019, 43, 72–88.
33. Ahmed, H.; Traore, I.; Saad, S. Detecting opinion spams and fake news using text classification. Secur. Priv. 2018, 1, e9.
34. Posadas-Durán, J.P.; Gómez-Adorno, H.; Sidorov, G.; Escobar, J.J.M. Detection of fake news in a new corpus for the Spanish language. J. Intell. Fuzzy Syst. 2019, 36, 4869–4876.
35. Banik, S. COVID Fake News Dataset. Zenodo. 2021. Available online: https://zenodo.org/record/4282522#.YcEjUWhBzIV (accessed on 21 December 2021).
36. Who.int. Coronavirus Disease (COVID-19)—World Health Organization. 2021. Available online: https://www.who.int/ (accessed on 21 December 2021).
37. Nations, U.N. Coronavirus | United Nations. 2021. Available online: https://www.un.org (accessed on 21 December 2021).
38. Unicef.org. Coronavirus Disease (COVID-19) Information Centre. 2021. Available online: https://www.unicef.org (accessed on 21 December 2021).
39. International Committee of the Red Cross. Coronavirus: COVID-19 Pandemic. 2021. Available online: https://www.icrc.org (accessed on 21 December 2021).
40. Makice, K. Twitter API: Up and Running: Learn How to Build Applications with the Twitter API; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2009.
41. Nakov, P.; Da San Martino, G.; Elsayed, T.; Barrón-Cedeño, A.; Míguez, R.; Shaar, S.; Alam, F.; Haouari, F.; Hasanain, M.; Mansour, W.; et al. Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages; Springer: Cham, Switzerland, 2021; pp. 264–291.
42. Alasadi, S.A.; Bhaya, W.S. Review of data preprocessing techniques in data mining. J. Eng. Appl. Sci. 2017, 12, 4102–4107.
43. Hardeniya, N.; Perkins, J.; Chopra, D.; Joshi, N.; Mathur, I. Natural Language Processing: Python and NLTK; Packt Publishing Ltd.: Birmingham, UK, 2016.
44. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
45. Nandanwar, A.K.; Choudhary, J. Semantic Features with Contextual Knowledge-Based Web Page Categorization Using the GloVe Model and Stacked BiLSTM. Symmetry 2021, 13, 1772.
46. Nisha, S.S.; Meeral, M.N. Applications of deep learning in biomedical engineering. In Handbook of Deep Learning in Biomedical Engineering; Academic Press: Cambridge, MA, USA, 2021; pp. 245–270.
47. Rani, S.; Bashir, A.K.; Alhudhaif, A.; Koundal, D.; Gündüz, E.S. An efficient CNN-LSTM model for sentiment detection in #BlackLivesMatter. Expert Syst. Appl. 2022, 193, 116256.
48. Srivastava, S.; Raj, R.; Saumya, S. COVID-19 Fake News Identification Using Multi-layer Convolutional Neural Network. In Advanced Computational Paradigms and Hybrid Intelligent Computing; Springer: Singapore, 2022; pp. 149–157.
49. Shaaban, M.A.; Hassan, Y.F.; Guirguis, S.K. Deep convolutional forest: A dynamic deep ensemble approach for spam detection in text. Complex Intell. Syst. 2022.
50. Kalchbrenner, N.; Grefenstette, E.; Blunsom, P. A convolutional neural network for modelling sentences. arXiv 2014, arXiv:1404.2188.
51. Zhou, C.; Sun, C.; Liu, Z.; Lau, F. A C-LSTM neural network for text classification. arXiv 2015, arXiv:1511.08630.
52. Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Commun. ACM 2021, 64, 107–115.

Figure 1. Diagram of data collection and deep learning framework for fake news detection.

Figure 2. The implemented architecture of the bidirectional LSTM Model.

Figure 3. The implemented architecture of the CNN model.

Figure 4. The implemented architecture of the hybrid model (CNN and LSTM).

Figure 5. LSTM model loss.

Figure 6. Bidirectional LSTM model loss.

Figure 7. CNN model loss.

Figure 8. Hybrid model loss.

Table 1. Summary of some datasets used for fake news detection.

Reference	Dataset	Content	Domain	Size	Source	Classes
Wang [26]	Liar	Text	Politics	12,836 short statements	PolitiFact (Fact-Checking website)	Mostly true, True, Barely True, Half True, False, Pants Fire
Shu et al. [27]	FakeNewsNet	Text, images	Politics, Society	422 news	PolitiFact and GossipCop (Fact-Checking website)	True, Fake
Horne and Adali [28]	BuzzFeed News dataset	Text	Politics	2283 news samples	Facebook	Mostly true, Mixed True, False, Mixed False
Riedel et al. [29]	Fake News Challenges (FNC-1 dataset)	Text	Politics, Society, Technology	75,385 articles	-	Agree, Disagree, Discuss, Unrelated
Barbado et al. [30]	Yelp dataset	Text	Technology	18,912 reviews	-	Trust, Fake
Anoop et al. [31]	Getting Real about Fake News	Text	Politics, Arts, Entertainment	12,999 posts	244 different websites	Fake, Real
Papadopoulou et al. [32]	FVC-2018	Videos, Text	Society	380 videos and 77,258 tweets	YouTube, Facebook, Twitter	Fake, Real
Ahmed et al. [33]	Fake and real news dataset	Text	Society	25,200 news articles	News website and Kaggle	Fake, Truthful
Posadas-Durán et al. [34]	Spanish fake news corpus	Text	Politics, Health, Education, Economy, Science, Security, Sport, Entertainment, Society	971 news	Different news websites	Fake, Real

Table 2. COVID-19 Real News Dataset.

Sources	COVID-19 Pandemic and Vaccine Keywords	Data Collection Method
WHO website, UNICEF website, UN website, ICRC website, WHO on Twitter, UNICEF on Twitter, UN on Twitter, ICRC on Twitter	COVID-19, Coronavirus, Novel Coronavirus, 2019-nCoV, nCoV, SARS-CoV-2, Pfizer, Sinopharm, AstraZeneca, Moderna, Covaxin, Janssen, CoronaVac, ZyCoV-D, Convidecia, ZF2001, Sputnik V, Sputnik Light, Abdala, EpiVacCorona, Medigen, Soberana 02	(1) Our own text extraction program and (2) the Twitter API [40]

Table 3. COVID-19 Fake News Dataset.

Sources	COVID-19 Pandemic and Vaccine Keywords	Data Collection Method
Fact-checking websites, Zenodo dataset	COVID-19, Coronavirus, Novel Coronavirus, 2019-nCoV, nCoV, SARS-CoV-2, Pfizer, Sinopharm, AstraZeneca, Moderna, Covaxin, Janssen, CoronaVac, ZyCoV-D, Convidecia, ZF2001, Sputnik V, Sputnik Light, Abdala, EpiVacCorona, Medigen, Soberana 02	Google Fact-Check tool API

Table 4. News instances and class distribution used in the training and testing set.

Dataset	Real	Fake	Total
Training dataset	9179	7924	17,103
Testing dataset	2186	2090	4276
Total	11,365	10,014	21,379

Table 5. Hyperparameters for LSTM and bidirectional LSTM models.

Hyperparameter	Value (LSTM)	Value (Bidirectional LSTM)	Values Examined by Grid Search
Learning rate	0.00001	0.00001	0.01, 0.001, 0.0001, 0.00001
Batch size	64	64	32, 64, 128
Loss function	Binary cross-entropy	Binary cross-entropy	-
Activation function	Sigmoid	Sigmoid	-
Optimizer	Adam	Adam	-
Number of epochs	25	25	10, 15, 20, 25, 30, 35, 40, 45, 50
Dropout rates	0.2	0.1	0.1, 0.2, 0.3, 0.4, 0.5

Table 6. Hyperparameters for CNN model and Hybrid (CNN and LSTM) model.

Hyperparameter	Value (CNN)	Value (Hybrid CNN and LSTM)	Values Examined by Grid Search
Number of filters	64, 128	32	32, 64, 128
Kernel size	2	2	-
Batch size	32	64	32, 64, 128
Loss function	Binary cross-entropy	Binary cross-entropy	-
Learning rate	0.00001	0.00001	0.01, 0.001, 0.0001, 0.00001
Activation function	Sigmoid, ReLU	Sigmoid, ReLU	-
Optimizer	Adam	Adam	-
Number of epochs	30	20	10, 15, 20, 25, 30, 35, 40, 45, 50
Dropout rate	0.3	0.4	0.1, 0.2, 0.3, 0.4

Table 7. Confusion matrix for LSTM model.

	Predicted Real News	Predicted Fake News
Actual Real News	2069	198
Actual Fake News	194	1815

Table 8. Confusion matrix for bidirectional LSTM model.

	Predicted Real News	Predicted Fake News
Actual Real News	2129	182
Actual Fake News	197	1768

Table 9. Confusion matrix for CNN model.

	Predicted Real News	Predicted Fake News
Actual Real News	2168	127
Actual Fake News	121	1860

Table 10. Confusion matrix for hybrid model (CNN and LSTM).

	Predicted Real News	Predicted Fake News
Actual Real News	2121	197
Actual Fake News	136	1822

Table 11. Evaluation of deep learning models for COVID-19 Fake News detection.

Deep Learning Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	Specificity (%)	Error Rate (%)	Miss Rate (%)	FPR (%)
LSTM	90.8	90.2	90.3	90.3	90.3	9.2	8.7	9.7
Bidirectional LSTM	91.1	90.7	90.0	90.3	90.0	8.9	7.9	10.0
CNN and LSTM (Hybrid)	92.2	90.2	93.1	91.6	93.1	7.8	8.5	6.9
CNN	94.2	93.6	93.9	93.7	93.9	5.8	5.5	6.1
CNN and LSTM (Hybrid) without preprocessing steps	85.0	82.8	85.3	84.0	84.2	15.0	14.5	14.7

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
