Ternion: An Autonomous Model for Fake News Detection

Islam, Noman; Shaikh, Asadullah; Qaiser, Asma; Asiri, Yousef; Almakdi, Sultan; Sulaiman, Adel; Moazzam, Verdah; Babar, Syeda Aiman

doi:10.3390/app11199292



by Noman Islam ¹, Asadullah Shaikh ², Asma Qaiser ³, Yousef Asiri ²,*, Sultan Almakdi ²,*, Adel Sulaiman ²,*, Verdah Moazzam ³ and Syeda Aiman Babar ³

¹ Department of Computer Science, Iqra University, Karachi 76400, Pakistan
² College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
³ Department of Computer Science, NED University of Engineering and Technology, Karachi 76400, Pakistan
* Authors to whom correspondence should be addressed.

Appl. Sci. 2021, 11(19), 9292; https://doi.org/10.3390/app11199292

Submission received: 22 August 2021 / Revised: 22 September 2021 / Accepted: 27 September 2021 / Published: 6 October 2021

(This article belongs to the Special Issue Current Approaches and Applications in Natural Language Processing)


Abstract

In recent years, keeping up with global news through social media and verifying its authenticity has become a considerable challenge. Social media enables us to access news easily, anywhere and anytime, but it also facilitates the spread of fake news, thereby delivering false information that has a negative impact on society. It is therefore necessary to determine whether news spreading over social media is real. Doing so helps avoid confusion among social media users and is important for positive social development. This paper proposes a novel solution that detects the authenticity of news through natural language processing techniques. Specifically, it proposes a scheme comprising three steps, namely, stance detection, author credibility verification, and machine learning-based classification, to verify the authenticity of news. In the last stage of the proposed pipeline, several machine learning techniques are applied: decision trees, random forest, logistic regression, and support vector machine (SVM) algorithms. For this study, the fake news dataset was taken from Kaggle. The experimental results show an accuracy of 93.15%, precision of 92.65%, recall of 95.71%, and F1-score of 94.15% for the support vector machine algorithm. In terms of accuracy, the SVM outperforms the second-best classifier, logistic regression, by 6.82% (relative improvement).

Keywords:

fake news detection; natural language processing; machine learning; stance detection; social media

1. Introduction

Fake news has long been a problem because of its long-term repercussions and consequences. Its roots can be traced back to 17th-century propaganda, which later took the form of misinformation during the Cold War [1]. In modern times, the problem has become grave with the emergence of social media platforms. In the past few years in particular, social media channels such as Facebook, Twitter, and Instagram have emerged as platforms for the rapid dissemination and retrieval of information. Figure 1 shows a snapshot of some fake news from recent years. According to various studies [2], almost 50% of the population of developed nations depends on social media for news. The importance of social media cannot be denied; for example, it has become an effective medium in times of crisis because of the role it plays in breaking news [3]. However, one drawback of the convenience provided by social media is the rapid dissemination of fake news.

In contrast to conventional media such as print or television, social media content can be modified by users, who may enrich it with their opinions or biases. This can alter the meaning or context of the news altogether [5]. According to various studies, social media is fertile ground for the rapid sharing of information without fact checking [1].

Fake news can be defined as the creation or modification of news content by social media users that, deliberately or not, changes its apparent meaning or context by contaminating it with their opinions or biases, where the intent may be to harm a person, organization, or society, monetarily or morally. Examples of fake news include sarcasm, memes, fake advertisements, fake political statements, and rumors [3]. A fakester is a person responsible for spreading fake news. News can be graded by its degree of credibility, e.g., true, half-true, or false [5]. Fake news can be transmitted in the form of images, video, and text. The life cycle of fake news has been described in [6] as the creation, publication, and propagation of the news.

The impact of fake news spread on social media is immense [7]. It can cause a decline in stock prices, a drop in potential investments, and so on [6]. For instance, the 2016 US election was heavily impacted by fake news [2]. Fake news about the death of President Obama led to a loss of USD 130 billion in the stock market within a very short time. The intent of fake news may be to malign someone for political or personal reasons or to mislead people [6]. Numerous websites, such as FactCheck, Snopes, TruthorFiction, and PolitiFact, are used for detecting fake news. Moreover, Google has launched the Google News Initiative to counter fake news [3]. However, fake news detection is still a cumbersome task, because fake news often mixes misleading information with credible facts [2]. Fake news can be motivated by politics, financial benefit, or ideology [3,5]. In the literature, various approaches based on linguistic features or deep learning techniques, such as recurrent neural networks, convolutional neural networks, transformers, bidirectional encoder representations from transformers (BERT), and their combinations, have been used for fake news detection [8]. Fake news detection can be formulated as a binary or multi-class classification problem or, alternatively, as a regression problem. A number of datasets are also available for fake news classification, such as the Kaggle fake news dataset, ISOT, and LIAR [3].

Despite extensive research, fake news detection remains very challenging, and we believe that it requires a comprehensive, multi-phase approach. To address this problem, this paper proposes a novel approach to validate the authenticity of news. The approach first detects the stance of the news, then identifies the author's credibility, and finally uses machine learning to classify the news as fake or authentic. The objective of the research is to classify news as fake or genuine based on various attributes, such as the text of the news and its author's profile.

The potential implications of the proposed work are manifold. Fake news related to medical symptoms, for example, can have severe consequences if its consumer assumes it to be true. Similarly, fake news can cause irreparable damage in the health, political, social, and economic sectors. The proposed approach can help mitigate these catastrophic effects. This study also serves as a baseline and opens up avenues for future research on fake news detection. Research based on machine learning and deep learning is being carried out extensively to find a novel solution to fake news detection, but there is a scarcity of research on a three-pronged approach to fake news classification. The current paper proposes such a three-step solution; to the best of our knowledge, no previous study has done so. Finally, a commercial tool could be developed based on the proposed work that tags news as fake and also provides ratings of its credibility.

The remainder of this paper is structured as follows: Section 2 presents related work; Section 3 describes the proposed novel approach to detect fake news; the experimental results are discussed in Section 4; and, finally, Section 5 provides the conclusions and future directions.

2. Related Work

In recent years, several approaches have been proposed to address the detection of fake news. They are primarily classified as machine learning approaches, hybrid approaches, topic-agnostic approaches, knowledge-based approaches, and language approaches [1]. The authors of [7] classified the approaches as news content-based learning and social context-based learning. The former is based on the style of the published news, while the latter is based on latent information provided to a user by a news article. Users of social media play an active role in the identification of fake news. For example, Facebook ranks the comments on a post based on the number of replies or user engagement for that post [6]. An analysis of the existing literature revealed that major work has been done in three directions: stance detection, identifying authors' credibility, and using machine learning to classify news as fake or not. Hence, we discuss the work in these three directions below. Interested readers are directed to [9] for a comprehensive survey.

2.1. Stance Detection

Among the many natural language processing tasks, stance detection is a very important one. It can be the very first step in fact checking [10,11]. In 2016, an online contest known as the Fake News Challenge was launched [12]. The objective of this challenge was to encourage the development of tools that help human fact checkers recognize intentional falsehoods in news reports using artificial intelligence (AI), natural language processing, and machine learning. In this challenge, stance detection is regarded as stage 1 in the identification of fake news; the main aim is to determine whether a news article's headline is relevant to its body. Chaudhary et al. [13] discussed numerous deep neural network-based models for stance detection. They found that using pre-trained global vectors for word representation (GloVe) word embeddings along with a long short-term memory (LSTM)-based bidirectional conditional encoding model provided the best performance, with 97% accuracy.

Bhatt et al. [14] presented a novel approach combining neural, external, and statistical features: handcrafted external features designed with feature engineering heuristics, statistical features from an n-gram bag-of-words model, and neural embeddings computed with a deep recurrent model. Bourgonje et al. [15] developed a system that used a lemmatization-based n-gram approach to carry out binary classification of headline and article pairs; the system achieved its best accuracy with logistic regression. In [16], the authors proposed a method to detect spam comments on YouTube using different machine learning algorithms with the n-gram approach, and they showed that this technique is effective in detecting spam comments. García et al. [17] introduced a text classification system that performs embedded feature elimination via an a priori algorithm. The aim of their study was to speed up the construction of word sequences by minimizing the number of explored branches as much as possible.
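Several of the systems above build features from word n-grams. As a minimal illustration (not taken from any of the cited works), the function below turns a token list into a bag of unigrams and bigrams; scikit-learn's `CountVectorizer(ngram_range=(1, 2))` produces the same kind of counts directly from raw text.

```python
from collections import Counter


def ngram_bag(tokens, n_range=(1, 2)):
    """Bag of word n-grams: count every contiguous run of n tokens for n in n_range."""
    lo, hi = n_range
    bag = Counter()
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


bag = ngram_bag(["fake", "news", "spreads", "fast"])
print(sorted(bag))
```

Bigrams such as "fake news" capture short-range word order that a plain bag of unigrams loses, at the cost of a much larger vocabulary.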

In order to classify fake news, Saikh et al. [18] used stance detection with textual entailment (TE). Moreover, they proposed a system that combines deep learning and statistical machine learning approaches. To detect the stance in fake news, Ghanem et al. [19] combined n-gram features, lexical features, and word embeddings. They achieved state-of-the-art results (59.6% macro F1) on the FNC-1 dataset [20]. In [21], a deep neural network architecture was used to predict the stance of a headline and article body.

2.2. Author Credibility


Research suggests that information about the authors of articles helps to identify whether the presented news is fake or not. Hence, another area of research is identifying author credibility. Sitaula et al. [2] discussed different attributes that could help to determine author credibility and its role in news. Based on these attributes, they identified 26 features grouped into different categories. Their results show not only the credibility of a given article but also the credibility of articles published by the same author. Another work related to author profiling, which used a corpus of Twitter data, is presented in [24]. According to [22], author credibility plays a very important role in identifying fake reviews online. However, most users do not consider author credibility before sharing news on social media [23]. Therefore, work on author credibility can be considered to be in its infancy and is regarded as an open research challenge in various fields [25].

2.3. Machine Learning-Based Classification

Machine learning algorithms have been used for fake news detection in a considerable amount of research. Assessing the credibility of news is one of the most important topics in this area, and many detection approaches have evolved over time. To detect fake news in online text, Girgis et al. [26] applied deep learning algorithms, namely LSTM and RNN models (vanilla and GRU), to the LIAR dataset. Among all algorithms, GRU showed the best performance, so, to achieve better accuracy, they developed a hybrid model combining CNN and GRU. Gilda [27] applied different machine learning approaches to the detection of fake news. Further machine learning techniques for the detection of fake news can be found in [28,29,30].

Ajao et al. [31] used long short-term memory recurrent neural networks and hybrid models combining them with convolutional neural networks. They implemented three deep neural networks: (1) LSTM, (2) LSTM with dropout regularization, and (3) LSTM-CNN. Among these approaches, LSTM stood out with 82% accuracy. Sajjad et al. [32] proposed a model of decent accuracy that identifies fake news by combining knowledge engineering and machine learning. In another work, automated detection of fake news on social media is proposed using three feature extraction techniques: a count vectorizer, term frequency–inverse document frequency, and a hashing vectorizer [4]. An ensemble-based technique for fake news detection is presented in [33]. Ensemble-based approaches combine various weak classifiers to achieve better accuracy on the combined classification task. In [34], various machine learning algorithms, such as logistic regression, naive Bayes, and random forest classification, are used.

In [31], a deep learning technique called Fake-BERT was used for the detection of fake news. In [6], a deep learning-based model, EchoFakeD, was proposed that combines content and contextual features; the authors also proposed an effective tensor factorization scheme. A number of studies have used data augmentation, transfer learning, auto-encoders, and other semi-supervised models for fake news detection [8]. A capsule-based neural network was used in [3] to classify fake news. In [35], the authors used geometric deep learning techniques for fake news detection. These techniques extend convolutional neural networks to fuse other information, such as user profiles, news propagation, and the actual content. A hybrid deep learning model combining CNN and RNN was presented in [36]. It uses embedding, CNN, and RNN layers implemented in Keras and was tested on the ISOT and FA-KES datasets. In [37], blockchain technology was used for the detection of fake news.

In recent years, following the spread of COVID-19, a large amount of fake news on this topic has circulated. Therefore, numerous studies have focused on detecting fake news related to COVID-19. For instance, a novel approach to detecting fake tweets related to COVID-19 was proposed in [8]. In a similar direction, public sentiment was analyzed based on tweets related to COVID-19 in [38]. In [36], several supervised learning approaches, such as CNN, LSTM, and BERT, were used for the detection of fake news related to COVID-19. Moreover, unsupervised learning techniques, such as model pre-training and distributed word representations, were used.

After an extensive review of the literature, we found that most studies on this topic have focused on stance detection, author credibility, or news classification individually. Moreover, existing approaches are limited by their lack of awareness of the social or political context underlying the news. Therefore, a multi-stage pipeline is required to correctly classify the credibility of news. This paper presents a novel approach combining stance detection, author credibility, and news classification. The approach is motivated by [34], a study in which several machine learning algorithms are used for classification. The objective of this study is to spot fake news on a social media platform, i.e., Twitter. Similar studies focusing on a specific platform have been conducted [35,39,40].

3. Proposed Approach and Implementation Details

This paper proposes a novel approach to fake news detection. The proposed method comprises the following modules: (1) data collection, (2) pre-processing, (3) feature extraction, and (4) inference engine. The architecture of the fake news detector is depicted in Figure 2.

3.1. Dataset Description

For this study, a dataset called the fake news dataset [14] was selected from Kaggle. The dataset contains five fields, namely “Id”, “Title”, “Text”, “Author”, and “Label”. It has 20,718 entries, of which 10,349 are labeled as fake news and the remainder as real news. A description of the dataset is provided in Table 1, and a few records are displayed in Figure 3. The data extracted from the dataset were passed through the pre-processing module. Using the Natural Language Toolkit (NLTK) library [19], the text was split into sentences and then into tokens. This was followed by part-of-speech (PoS) tagging, lemmatization, stop word elimination, and named entity recognition (NER). In this module, the proposed model recognizes not only traditional named entity types (i.e., person names, locations, and organizations) but also other entity types, such as movies, book titles, and cartoons. This extension of NER is achieved by utilizing DBpedia.
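NLTK provides `sent_tokenize`, `word_tokenize`, `pos_tag`, `WordNetLemmatizer` and `ne_chunk` for the full pre-processing chain, but these depend on separately downloaded models and corpora. The dependency-free sketch below only illustrates the first steps (sentence splitting, tokenization, lowercasing and stop word removal) with regular expressions; the tiny stop word list is a hypothetical stand-in for NLTK's English list, and PoS tagging, lemmatization and the DBpedia-based NER are deliberately omitted.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only;
# NLTK's list (nltk.corpus.stopwords.words("english")) is much larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "in", "on", "and", "to", "for"}

# Naive sentence splitter: break after ".", "!" or "?" followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"[a-z0-9']+")


def preprocess(text):
    """Return one list of lowercase, stop-word-free tokens per sentence."""
    sentences = [s for s in SENTENCE_SPLIT.split(text.strip()) if s]
    return [[w for w in WORD.findall(s.lower()) if w not in STOP_WORDS]
            for s in sentences]


print(preprocess("The senator was seen in Paris. Reports of the visit are false!"))
# [['senator', 'seen', 'paris'], ['reports', 'visit', 'false']]
```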

A word cloud was generated for the headline and body text of the fake and real news in the selected dataset, as shown in Figure 4. A word cloud is a visualization of word frequency: the more frequently a term appears in the text being assessed, the larger it is drawn in the image. For machine learning-based fake news detection, the pre-processed text documents must be represented as vectors. Among the various options for converting text into features, a bag-of-words (BoW) representation weighted with a TF-IDF vectorizer is used. Furthermore, the data were split into training, validation, and test sets.
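The paper does not state which TF-IDF variant or vectorizer settings were used. As an assumption, the plain-Python sketch below follows the formulation that scikit-learn's `TfidfVectorizer` applies by default: raw counts as term frequency, a smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2 normalization of each document vector.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token pattern


def tokenize(doc):
    return TOKEN.findall(doc.lower())


def tfidf(docs):
    """Return (sorted vocabulary, one L2-normalized TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
        vectors.append([v / norm for v in vec])
    return vocab, vectors


docs = ["Obama injured in explosion",
        "Obama signs the budget bill",
        "Stocks recover after the budget bill"]
vocab, X = tfidf(docs)
```

In the first document, "injured" receives a larger weight than "obama": both occur once, but "obama" also appears in a second document and is therefore less discriminative.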

3.2. Proposed Approach: Inference Engine

This section discusses the proposed multi-stage approach, i.e., (1) stance detection, (2) author credibility verification, and (3) machine learning-based classification.

Stance detection is the very first step in the inference engine; it determines whether the headline and the body of a news article are related. Listing 1 outlines stance detection. To measure relatedness, the cosine similarity technique is used, which measures the similarity between two text documents irrespective of their size. If the headline and body text are sufficiently similar (Listing 1 uses a cosine similarity threshold of 0.25), the article proceeds to the next module, i.e., author credibility; otherwise, the model declares the examined news to be fake. Stance detection is a well-established and popular task in NLP. It determines from the text whether the author is in favor of, against, or neutral toward a target [41]. The target could be an individual, an organization, a government policy, a movement, a product, and so forth.

Listing 1. Stance detection.

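from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

total, fake = 0, 0  # running counts updated by stance_detection

def get_vectors(title, text):
    # Fit a shared vocabulary on the headline and body, then vectorize each
    vectorizer = CountVectorizer()
    vectorizer.fit([title, text])
    return (vectorizer.transform([title]).toarray(),
            vectorizer.transform([text]).toarray())

def stance_detection(row):
    global total, fake
    title, text = get_vectors(row['title'], row['text'])
    total += 1
    # cosine_similarity returns a 1x1 matrix for two single-row inputs
    if cosine_similarity(title, text)[0][0] < 0.25:
        fake += 1  # headline unrelated to body: flagged as fake

# frame: pandas DataFrame holding the dataset (one article per row)
frame.apply(stance_detection, axis=1)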
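Listing 1 delegates the comparison to scikit-learn's `cosine_similarity`. To make the measure explicit, the dependency-free sketch below computes cos(θ) = (a · b) / (‖a‖ ‖b‖) over bag-of-words counts and applies the 0.25 threshold from Listing 1; the function names are illustrative, not the authors' code.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # CountVectorizer's default token pattern


def bag_of_words(text):
    return Counter(TOKEN.findall(text.lower()))


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|) for two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as unrelated
    return dot / (norm_a * norm_b)


def headline_matches_body(title, body, threshold=0.25):
    """Stance check of Listing 1: below the threshold, the article is flagged as fake."""
    return cosine_similarity(bag_of_words(title), bag_of_words(body)) >= threshold


body = "Reports that Obama was injured in an explosion are false"
print(headline_matches_body("Obama injured in explosion", body))  # True (similarity about 0.63)
print(headline_matches_body("Stock markets rally", body))         # False (no shared words)
```

Because the vectors are length-normalized, a short headline can still score highly against a long body, which is why the text describes cosine similarity as independent of document size.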

The next step is the verification of author credibility. In this module, the inference engine validates the author's information to judge whether the news is fake or not. The Twitter API [42] is used to obtain the author's Twitter profile. The module first checks how many followers the author has and then checks how many times the news has been retweeted.
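The paper does not report how the follower and retweet counts are turned into a decision. The sketch below assumes a simple rule in which both counts must reach a threshold; the `AuthorProfile` dataclass, the `author_is_credible` function, and both threshold values are hypothetical, and retrieving the counts through the Twitter API is not shown.

```python
from dataclasses import dataclass


@dataclass
class AuthorProfile:
    followers: int  # follower count of the author's Twitter account
    retweets: int   # number of times the news item was retweeted


# Hypothetical thresholds; the paper does not state the values it used.
MIN_FOLLOWERS = 1000
MIN_RETWEETS = 10


def author_is_credible(profile, min_followers=MIN_FOLLOWERS, min_retweets=MIN_RETWEETS):
    """Assumed rule: pass the article on only if both counts reach their thresholds."""
    return profile.followers >= min_followers and profile.retweets >= min_retweets


print(author_is_credible(AuthorProfile(followers=5000, retweets=40)))  # True
print(author_is_credible(AuthorProfile(followers=50, retweets=400)))   # False
```

A weighted score instead of two hard thresholds would be an equally plausible design; the conjunction is used here only because it is the simplest rule consistent with the description.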

Priya Gupta et al. [41] described different features for evaluating the credibility of user-generated content on Twitter and proposed a novel real-time system to assess the trustworthiness of tweets. The system assigns a score or rating to content on Twitter that indicates its credibility. The authors of [43] investigated different clustering strategies to improve scalability and proposed a new solution to the limitations of previously existing approaches.

Finally, four different machine learning algorithms are applied for fake news detection, and their results are compared. The selected algorithms are as follows:

A decision tree is one of the most popular supervised classifiers for prediction and classification. It splits the dataset by recursively selecting features, which can be nominal or continuous. Its most distinctive characteristic is that it breaks a complex decision process down into a sequence of simpler decisions; as a result, its outcome is easy to understand and interpret [44].
Random forest is a supervised machine learning method. A set of decision trees (base classifiers) is generated on the basis of random feature selection, and the class receiving the majority of the trees' votes is selected. The classifier relies on producing trees that are both accurate and diverse [45]. The individual decision trees form an ensemble whose predictions are aggregated to improve the accuracy of the model and to reduce over-fitting. Each tree is trained on a sub-sample drawn with replacement, of the same size as the original input sample.
Logistic regression is a machine learning technique for classification. In this algorithm, the probabilities of the possible outcomes are modeled using a logistic function. It is widely used in situations in which classification must be automated rather than performed by humans [46].
The support vector machine (SVM) is a supervised learning algorithm that is widely used to predict or classify data. An SVM classifier is formally defined by a separating hyperplane: given labeled training data, the algorithm outputs an optimal hyperplane that categorizes new examples. In two-dimensional space, this hyperplane is a line dividing the plane into two parts, with each class on one side. SVMs achieve substantial improvements over other well-performing methods and can be applied to a wide range of learning tasks. Moreover, they are fully automatic, eliminating the need for manual parameter tuning [47].
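The prediction rules of three of the four classifiers can be sketched in a few lines. This is an illustration, not the trained models: the weights, bias, and votes are hypothetical placeholders, the SVM is assumed to use a linear kernel (the paper does not say), and training (log-loss for logistic regression, margin maximization for the SVM, bootstrapped trees for the random forest) is left to a library such as scikit-learn.

```python
import math
from collections import Counter


def logistic(z):
    """Numerically stable logistic function mapping a real score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(weights, bias, x):
    """w . x + b for a feature vector x (e.g., a TF-IDF row)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def logistic_regression_predict(weights, bias, x):
    # P(authentic | x) = logistic(w . x + b); predict authentic (1) when it is >= 0.5
    return 1 if logistic(linear_score(weights, bias, x)) >= 0.5 else 0


def linear_svm_predict(weights, bias, x):
    # The label depends on which side of the hyperplane w . x + b = 0 the point lies
    return 1 if linear_score(weights, bias, x) >= 0 else 0


def random_forest_predict(tree_votes):
    # Majority vote over the labels predicted by the individual trees
    return Counter(tree_votes).most_common(1)[0][0]


w, b = [2.0, -1.0], -0.5  # hypothetical parameters
print(logistic_regression_predict(w, b, [1.0, 0.5]), linear_svm_predict(w, b, [0.0, 1.0]))  # 1 0
print(random_forest_predict([1, 0, 1, 1, 0]))  # 1
```

With identical weights, the logistic regression and linear SVM rules pick the same label, because logistic(z) ≥ 0.5 exactly when z ≥ 0; the two models differ in how the weights are learned, which is what produces their different accuracies.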

Figure 5 shows the complete workflow of the implemented model.
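The three stages form a sequential filter: an article that fails stance detection or author credibility verification is labeled fake, and only the remaining articles reach the classifier. A minimal sketch of that control flow, with each stage passed in as a function; `classify_article` and the stub stages in the usage lines are hypothetical names.

```python
from typing import Callable


def classify_article(
    title: str,
    body: str,
    author: str,
    is_related: Callable[[str, str], bool],
    author_is_credible: Callable[[str], bool],
    classifier: Callable[[str, str], str],
) -> str:
    """Run the three stages in order; a rejection at stage 1 or 2 labels the article fake."""
    # Stage 1: stance detection (is the headline related to the body?)
    if not is_related(title, body):
        return "fake"
    # Stage 2: author credibility verification (e.g., followers and retweets)
    if not author_is_credible(author):
        return "fake"
    # Stage 3: trained machine learning classifier (e.g., SVM on TF-IDF features)
    return classifier(title, body)


# Usage with stub stages standing in for the real modules:
label = classify_article(
    "Obama injured", "Obama was injured", "@reporter",
    is_related=lambda t, b: True,
    author_is_credible=lambda a: True,
    classifier=lambda t, b: "authentic",
)
print(label)  # authentic
```

Passing the stages as functions keeps the pipeline independent of how each stage is implemented, so, for example, the classifier can be swapped between the four algorithms without touching the control flow.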

4. Experimental Results

For the experiments, we implemented the proposed approach in Python. The selected dataset was passed through the proposed pipeline. First, pre-processing was performed using the NLTK library. Stance detection and author credibility verification were then carried out. During these two phases, 28.88% of the news was classified as fake, of which 8% was in fact genuine (Figure 6).

In the last step, the pre-processed text documents were converted into vectors using the TF-IDF vectorizer, and four machine learning algorithms were applied to the selected dataset.

The first model applied for the detection of fake news was a decision tree. Its performance is summarized by a confusion matrix, shown as a heatmap in Figure 7. A confusion matrix arranges the true positive, true negative, false positive, and false negative counts in a matrix. With authentic news treated as the positive class, these terms are defined as follows:

True positive (TP): a classifier prediction is a true positive if the news is authentic and the classifier predicts it as authentic.
False positive (FP): a classifier prediction is a false positive if the news is fake and the classifier predicts it as authentic.
True negative (TN): a classifier prediction is a true negative if the news is fake and the classifier predicts it as fake.
False negative (FN): a classifier prediction is a false negative if the news is authentic and the classifier predicts it as fake.
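The four definitions above can be turned directly into a counting routine. A minimal sketch, assuming binary string labels in which anything other than the positive label counts as fake; `confusion_counts` is a hypothetical helper, not part of the authors' code.

```python
def confusion_counts(actual, predicted, positive="authentic"):
    """Return (TP, FP, TN, FN), treating authentic news as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # authentic news predicted as authentic
            else:
                fp += 1  # fake news predicted as authentic
        else:
            if a == positive:
                fn += 1  # authentic news predicted as fake
            else:
                tn += 1  # fake news predicted as fake
    return tp, fp, tn, fn


actual = ["authentic", "fake", "fake", "authentic", "authentic"]
predicted = ["authentic", "authentic", "fake", "fake", "authentic"]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 1)
```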

For the decision tree, TP = 1916, TN = 1524, FP = 599, and FN = 533, for a total of 4572 test samples. Hence, the overall accuracy is as follows:

\begin{equation} \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \end{equation}

\begin{equation*} \mathrm{Accuracy} = \frac{1916 + 1524}{1916 + 1524 + 599 + 533} = \frac{3440}{4572} = 75.24\% \end{equation*}

Accuracy alone can be misleading, for example, when the classes are imbalanced or when false positives and false negatives have different costs. Hence, it is essential to calculate other measures, such as precision, recall, and F1-score. These terms are defined as follows:

Precision: the ratio of positive examples correctly predicted by the classifier to the total number of examples predicted as positive.
Recall: the ratio of true positives to the total number of actually positive examples.
F1-score: the harmonic mean of precision and recall.

The precision of the classifier is defined mathematically as

\begin{equation} \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \end{equation}

\begin{equation*} \mathrm{Precision} = \frac{1916}{1916 + 599} = 76.18\% \end{equation*}

The recall of the classifier is defined mathematically as

\begin{equation} \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \end{equation}

\begin{equation*} \mathrm{Recall} = \frac{1916}{1916 + 533} = 78.24\% \end{equation*}

Finally, the F1-score balances precision and recall. It is defined as

\begin{equation} F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4} \end{equation}

\begin{equation*} F_1 = 2 \times \frac{76.18 \times 78.24}{76.18 + 78.24} = 77.20\% \end{equation*}
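Equations (1) to (4) can be checked directly against the decision tree counts (TP = 1916, TN = 1524, FP = 599, FN = 533). Computing from the raw counts rather than from rounded percentages gives values that agree with those reported above up to the last rounded digit. A minimal sketch:

```python
def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Equation (1)
    precision = tp / (tp + fp)                          # Equation (2)
    recall = tp / (tp + fn)                             # Equation (3)
    f1 = 2 * precision * recall / (precision + recall)  # Equation (4)
    return accuracy, precision, recall, f1


acc, prec, rec, f1 = metrics(tp=1916, fp=599, tn=1524, fn=533)
print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%} f1={f1:.2%}")
# accuracy=75.24% precision=76.18% recall=78.24% f1=77.20%
```

Equivalently, F1 = 2TP / (2TP + FP + FN) = 3832 / 4964, which avoids computing precision and recall first.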

The confusion matrix for the random forest classifier, illustrated in Figure 8, shows that the classifier achieves an accuracy of 82.23%, a precision of 81.95%, a recall of 84.44%, and an F1-score of 83.17%.

The confusion matrix of the logistic regression classifier, illustrated in Figure 9, shows that the classifier achieves an accuracy of 87.20%, a precision of 87.90%, a recall of 88.88%, and an F1-score of 88.30%.

Lastly, an SVM classifier was applied. The confusion matrix and the accuracy of this classifier are shown in Figure 10, and it can be observed that the accuracy of the classifier is 93.15%, the precision value is 92.65%, the recall value is 95.71%, and the F1-score is 94.15%.

After all of the classifiers were implemented, their results were compared. The support vector machine provided the best performance for the proposed fake news detector, outperforming the other classifiers with an accuracy of 93.15%, precision of 92.65%, recall of 95.71%, and F1-score of 94.15%. Table 2 and Figure 11 compare the classifiers on these metrics. Comparing the SVM with logistic regression, the second-best classifier, the relative improvement of the SVM in terms of accuracy is as follows:

\begin{equation*} \text{Improvement in accuracy} = \frac{93.15 - 87.20}{87.20} \times 100\% = 6.82\% \end{equation*}

5. Conclusions and Future Work

The detection of fake news on social media platforms is an essential topic, considering the wide dissemination of news and the number of people consuming information through these platforms. In this paper, a solution based on natural language processing and machine learning is proposed and evaluated on a fake news dataset obtained from Kaggle. The proposed approach is based on stance detection, author credibility, and machine learning algorithms. Stance detection verifies the relevance between the title and the body of a news article; if they match, the next module checks whether the author is credible in order to determine whether the news should be believed. Finally, machine learning algorithms, i.e., logistic regression, support vector machine, decision tree, and random forest, are applied, and among these, the support vector machine stands out with an accuracy of 93.15%.

Today, access to the internet has become ubiquitous. In just one minute on the internet, 18 million text messages are exchanged over WhatsApp, 2.4 million snaps are created on SnapChat, 38 million SMS messages and 187 million emails are sent, and 0.5 million tweets are posted [48]. Most of the population now depends on the internet for information. Hence, fake news detection has become a major concern. Most of the information flowing on the internet is unverified and generally assumed to be true. This can be exploited to spread misinformation, destabilize a regime, and incite riots. It has been predicted that in the next few years, people will consume more false information than true information [21]. Unfortunately, most content analysis techniques cannot address fake news detection because of its challenges: existing natural language processing techniques are limited by the absence of the political or social context required to understand the content [35]. Therefore, a multi-stage solution that addresses this issue in the form of a pipeline is needed. The proposed approach provides a three-pronged solution to verify the authenticity of any news article. After the stance and the author's credibility have been checked, the problem is formulated as a machine learning task that can be solved with any of the tested algorithms, such as SVM, random forest, and decision trees. The main advantages of using machine learning are that it learns the rules for detecting fake news from data and that the end user does not need to program these rules explicitly.

The proposed approach has several limitations that can be addressed in future work. First, it does not consider the correlation among news items, which could help determine the credibility of a news article. Second, the author credibility check relies on Twitter information; it could be extended to include other attributes that are generally not available on social media. The approach could also be extended with advanced deep learning models based on convolutional neural networks, LSTM, GRU, or BERT. Currently, the approach is a sequential pipeline in which news passes through each stage one by one. Instead, a novel objective function could be developed that combines the scores of stance detection, author credibility, and a machine learning classifier to decide jointly whether news is fake. Moreover, currently available solutions only mark news as authentic or unauthentic, whereas a practical solution requires a score or rating of the credibility of news. Finally, the detection of fake news is only one aspect of a bigger problem: the evolution of fake news, its mitigation, and the subsequent detection and deletion of offending accounts also require further work.
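
One hypothetical way to combine the three stage scores into a single credibility rating is a normalized weighted sum. The sketch below assumes that each stage outputs a score in [0, 1], where higher means more credible (for the classifier, the predicted probability that the article is real). The weights and the 0.5 decision threshold are illustrative assumptions, not values from this study; in practice they would be learned from data.

```python
from typing import Tuple


def joint_credibility_score(
    stance: float,
    author_credibility: float,
    classifier_prob_real: float,
    weights: Tuple[float, float, float] = (0.3, 0.3, 0.4),  # hypothetical weights
) -> float:
    """Combine three per-stage scores in [0, 1] into one credibility score in [0, 1].

    Higher means more credible. The weights are normalized to sum to 1, so the
    result is a convex combination of the inputs.
    """
    scores = (stance, author_credibility, classifier_prob_real)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("all stage scores must lie in [0, 1]")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, scores))


def is_fake(score: float, threshold: float = 0.5) -> bool:
    # Hypothetical decision rule; the score itself can also be reported as a rating.
    return score < threshold
```

Unlike the sequential pipeline, a weak score in one stage can be offset by strong scores in the others, and the continuous score provides the credibility rating mentioned above.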

Author Contributions

Conceptualization, N.I., A.S. (Asadullah Shaikh) and A.Q.; methodology, Y.A., S.A. and A.S. (Adel Sulaiman); software, V.M. and S.A.B.; validation, A.S. (Asadullah Shaikh) and V.M.; formal analysis, S.A. and Y.A.; investigation, A.Q., N.I. and A.S. (Asadullah Shaikh); writing—original draft preparation, N.I. and A.S. (Asadullah Shaikh); writing—review and editing, Y.A. and A.S. (Adel Sulaiman); supervision, V.M. and S.A.B.; funding acquisition, A.S. (Asadullah Shaikh). All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to acknowledge the support of the Deputy for Research and Innovation—Ministry of Education, Kingdom of Saudi Arabia, for this research through a grant (NU/IFC/INT/01/008) under the institutional Funding Committee at Najran University, Kingdom of Saudi Arabia.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

References

1. De Beer, D.; Matthee, M. Approaches to identify fake news: A systematic literature review. In International Conference on Integrated Science, Cambodia; Springer: Basel, Switzerland, 2020; pp. 13–22.
2. Sitaula, N.; Mohan, C.K.; Grygiel, J.; Zhou, X.; Zafarani, R. Credibility-based fake news detection. In Disinformation, Misinformation, and Fake News in Social Media; Springer: Basel, Switzerland, 2020; pp. 163–182.
3. Goldani, M.H.; Momtazi, S.; Safabakhsh, R. Detecting fake news with capsule neural networks. Appl. Soft Comput. 2021, 101, 106991.
4. Kaur, S.; Kumar, P.; Kumaraguru, P. Automating fake news detection system using multi-level voting model. Soft Comput. 2020, 24, 9049–9069.
5. Bühler, J.; Murawski, M.; Darvish, M.; Bick, M. Developing a Model to Measure Fake News Detection Literacy of Social Media Users. In Disinformation, Misinformation, and Fake News in Social Media; Springer: Basel, Switzerland, 2020; pp. 213–227.
6. Kaliyar, R.K.; Goswami, A.; Narang, P. EchoFakeD: Improving fake news detection in social media with an efficient deep neural network. Neural Comput. Appl. 2021, 33, 8597–8613.
7. Kaliyar, R.K.; Goswami, A.; Narang, P. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 2021, 80, 11765–11788.
8. Paka, W.S.; Bansal, R.; Kaushik, A.; Sengupta, S.; Chakraborty, T. Cross-SEAN: A cross-stitch semi-supervised neural attention model for COVID-19 fake news detection. Appl. Soft Comput. 2021, 107, 107393.
9. Saxena, A.; Saxena, P.; Reddy, H. Fake News Detection Techniques for Social Media. In Principles of Social Networking; Springer: Singapore, 2022; pp. 325–354.
10. Mohammad, S.; Kiritchenko, S.; Sobhani, P.; Zhu, X.; Cherry, C. Semeval-2016 task 6: Detecting stance in tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), San Diego, CA, USA, 16–17 June 2016; pp. 31–41.
11. Riedel, B.; Augenstein, I.; Spithourakis, G.P.; Riedel, S. A simple but tough-to-beat baseline for the Fake News Challenge stance detection task. arXiv 2017, arXiv:1707.03264.
12. Pomerleau, D.; Rao, D. Fake News Challenge Stage 1 (fnc-i): Stance Detection. 2017. Available online: www.fakenewschallenge.org (accessed on 10 May 2021).
13. Chaudhry, A.K.; Baker, D.; Thun-Hohenstein, P. Stance detection for the fake news challenge: Identifying textual relationships with deep neural nets. In CS224n: Natural Language Processing with Deep Learning; Lecture Notes; Stanford NLP: Stanford, CA, USA, 2017; pp. 1–117. Available online: http://web.stanford.edu/class/cs224n/ (accessed on 10 May 2021).
14. Bhatt, G.; Sharma, A.; Sharma, S.; Nagpal, A.; Raman, B.; Mittal, A. Combining neural, statistical and external features for fake news stance identification. In Proceedings of the WWW ’18: Companion Proceedings of The Web Conference 2018, Geneva, Switzerland, 23–27 April 2018; pp. 1353–1357.
15. Bourgonje, P.; Schneider, J.M.; Rehm, G. From clickbait to fake news detection: An approach based on detecting the stance of headlines to articles. In Proceedings of the 2017 EMNLP workshop: Natural Language Processing Meets Journalism, Copenhagen, Denmark, 2 May 2017; pp. 84–89.
16. Aiyar, S.; Shetty, N.P. N-gram assisted youtube spam comment detection. Procedia Comput. Sci. 2018, 132, 174–182.
17. García, M.; Maldonado, S.; Vairetti, C. Efficient n-gram construction for text categorization using feature selection techniques. Intell. Data Anal. 2021, 25, 509–525.
18. Saikh, T.; Anand, A.; Ekbal, A.; Bhattacharyya, P. A novel approach towards fake news detection: Deep learning augmented with textual entailment features. In Proceedings of the 24th International Conference on Applications of Natural Language to Information Systems, NLDB 2019, Salford, UK, 26–28 June 2019; pp. 345–358.
19. Ghanem, B.; Rosso, P.; Rangel, F. Stance detection in fake news a combined feature representation. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), Brussels, Belgium, 1 November 2018; pp. 66–71.
20. Ferreira, W.; Vlachos, A. Emergent: A novel data-set for stance classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1163–1168.
21. Thota, A.; Tilak, P.; Ahluwalia, S.; Lohia, N. Fake news detection: A deep learning approach. SMU Data Sci. Rev. 2018, 1, 10.
22. Munzel, A. Assisting consumers in detecting fake reviews: The role of identity information disclosure and consensus. J. Retail. Consum. Serv. 2016, 32, 96–108.
23. Xu, W.W.; Sang, Y.; Kim, C. What drives hyper-partisan news sharing: Exploring the role of source, style, and content. Digit. J. 2020, 8, 486–505.
24. Rangel, F.; Giachanou, A.; Ghanem, B.H.H.; Rosso, P. Overview of the 8th author profiling task at PAN 2020: Profiling fake news spreaders on Twitter. In CEUR Workshop Proceedings; Sun SITE Central Europe: Aachen, Germany, 2020; Volume 2696, pp. 1–18.
25. Parikh, S.B.; Atrey, P.K. Media-rich fake news detection: A survey. In Proceedings of the 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), Miami, FL, USA, 10–12 April 2018; pp. 436–441.
26. Kumar, A.; Upadhyay, M. Rumour Stance Classification using A Hybrid of Capsule Network and Multi-Layer Perceptron. Turk. J. Comput. Math. Educ. (TURCOMAT) 2021, 12, 4110–4120.
27. Ajao, O.; Bhowmik, D.; Zargari, S. Fake news identification on twitter with hybrid cnn and rnn models. In Proceedings of the 9th International Conference on Social Media and Society, Copenhagen, Denmark, 18–20 July 2018; pp. 226–230.
28. Girgis, S.; Amer, E.; Gadallah, M. Deep Learning Algorithms for Detecting Fake News in Online Text. In Proceedings of the 2018 13th International Conference on Computer Engineering and Systems (ICCES), Cairo, Egypt, 18–19 December 2018; pp. 93–97.
29. Gilda, S. Notice of Violation of IEEE Publication Principles: Evaluating machine learning algorithms for fake news detection. In Proceedings of the 2017 IEEE 15th Student Conference on Research and Development (SCOReD), Wilayah Persekutuan Putrajaya, Malaysia, 13–14 December 2017; pp. 110–115.
30. Ahmed, S.; Hinkelmann, K.; Corradini, F. Combining machine learning with knowledge engineering to detect fake news in social networks-a survey. In Proceedings of the AAAI 2019 Spring Symposium, Palo Alto, CA, USA, 25–27 March 2019; Volume 12, p. 8.
31. Library, N. Natural Language Toolkit. 1999. Available online: https://www.nltk.org/ (accessed on 21 August 2021).
32. Kaggle. Fake news Dataset. 2018. Available online: https://www.kaggle.com/c/fake-news/data (accessed on 21 August 2021).
33. Jindal, R.; Dahiya, D.; Sinha, D.; Garg, A. A Study of Machine Learning Techniques for Fake News Detection and Suggestion of an Ensemble Model. In Proceedings of the International Conference on Innovative Computing and Communications, New Delhi, India, 19–20 February 2022; Springer: Berlin/Heidelberg, Germany; pp. 627–637.
34. Shrivastava, S.; Singh, R.; Jain, C.; Kaushal, S. A Research on Fake News Detection Using Machine Learning Algorithm. In Smart Systems: Innovations in Computing; Springer: Singapore, 2022; pp. 273–287.
35. Monti, F.; Frasca, F.; Eynard, D.; Mannion, D.; Bronstein, M.M. Fake news detection on social media using geometric deep learning. arXiv 2019, arXiv:1902.06673.
36. Nasir, J.A.; Khan, O.S.; Varlamis, I. Fake news detection: A hybrid CNN-RNN based deep learning approach. Int. J. Inf. Manag. Data Insights 2021, 1, 100007.
37. Paul, S.; Joy, J.I.; Sarker, S.; Ahmed, S.; Das, A.K. Fake news detection in social media using blockchain. In Proceedings of the 2019 7th International Conference on Smart Computing & Communications (ICSCC), Sarawak, Malaysia, 28–30 June 2019; pp. 1–5.
38. Manguri, K.H.; Ramadhan, R.N.; Amin, P.R.M. Twitter sentiment analysis on worldwide COVID-19 outbreaks. Kurd. J. Appl. Res. 2020, 5, 54–65.
39. Helmstetter, S.; Paulheim, H. Weakly supervised learning for fake news detection on Twitter. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Barcelona, Spain, 28–31 August 2018; pp. 274–277.
40. Buntain, C.; Golbeck, J. Automatically identifying fake news in popular twitter threads. In Proceedings of the 2017 IEEE International Conference on Smart Cloud (SmartCloud), New York, NY, USA, 3–5 November 2017; pp. 208–215.
41. Gupta, P.; Pathak, V.; Goyal, N.; Singh, J.; Varshney, V.; Kumar, S. Content credibility check on Twitter. In Proceedings of the International Conference on Application of Computing and Communication Technologies, New Delhi, India, 9–10 March 2018; Springer: Singapore, 2018; pp. 197–212.
42. Twitter, I. Twitter API. 2021. Available online: https://developer.twitter.com (accessed on 21 August 2021).
43. Gupta, P.; Thakral, R.; Aggarwal, M.; Bhatti, S.; Jain, V. A Proposed Framework to Analyze Abusive Tweets on the Social Networks. Int. J. Mod. Educ. Comput. Sci. 2018, 10, 46–56.
44. Priyanka; Kumar, D. Decision tree classifier: A detailed survey. Int. J. Inf. Decis. Sci. 2020, 12, 246–269.
45. Kulkarni, V.Y.; Sinha, P.K. Pruning of random forest classifiers: A survey and future directions. In Proceedings of the 2012 International Conference on Data Science & Engineering (ICDSE), Cochin, India, 18–20 July 2012; pp. 64–68.
46. De Menezes, F.S.; Liska, G.R.; Cirillo, M.A.; Vivanco, M.J. Data classification with binary response through the Boosting algorithm and logistic regression. Expert Syst. Appl. 2017, 69, 62–73.
47. Joachims, T. Machine Learning: ECML-94. In Proceedings of the European Conference on Machine Learning, Catania, Italy, 6–8 April 1994; Springer Science & Business Media: Singapore, 2005; Volume 784, pp. 627–637.
48. Desjardins, J. What Happens in an Internet Minute in 2018? 2018. Available online: https://www.visualcapitalist.com/internet-minute-2018 (accessed on 22 September 2021).

Figure 1. Examples of some fake news [4].

Figure 2. Fake news detection approach.

Figure 3. A snapshot of the dataset.

Figure 4. Word cloud of the various news articles.

Figure 5. Flow diagram of the architecture.

Figure 6. Result of authors’ credibility and stance detection.

Figure 7. Confusion matrix for decision tree algorithm.

Figure 8. Confusion matrix for random forest algorithm.

Figure 9. Confusion matrix for logistic regression algorithm.

Figure 10. Confusion matrix for SVM.

Figure 11. Accuracy, precision, recall, and F1-score of various classifiers.

Table 1. Description of the dataset.

Column	Description
Id	A unique Id assigned to each piece of news
Title	The title of the news
Text	News text
Label	The label of the news
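
The dataset in Table 1 can be loaded with Python's standard `csv` module. This sketch assumes that the CSV header uses the Table 1 column names in lower case (`id`, `title`, `text`, `label`) and, following our understanding of the Kaggle competition description, that a label of 1 marks unreliable news and 0 marks reliable news; both are assumptions rather than details stated in this paper. `csv.DictReader` ignores any additional columns the downloaded file may contain, and rows with an empty body are skipped.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class NewsItem:
    news_id: int
    title: str
    text: str
    label: int  # assumed encoding: 1 = unreliable (fake), 0 = reliable


def load_news(fp: TextIO) -> List[NewsItem]:
    """Read rows with the Table 1 columns (assumed header: id,title,text,label)."""
    items = []
    for row in csv.DictReader(fp):
        if not (row.get("text") or "").strip():
            continue  # skip articles with an empty body
        items.append(
            NewsItem(
                news_id=int(row["id"]),
                title=row.get("title") or "",
                text=row["text"],
                label=int(row["label"]),
            )
        )
    return items


# Tiny in-memory sample in the assumed format (invented rows).
SAMPLE_CSV = """id,title,text,label
0,Title A,"Body of article A, with a comma",1
1,Title B,,0
2,,Body of article C,0
"""
news = load_news(io.StringIO(SAMPLE_CSV))
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` so that line breaks inside quoted article texts are parsed correctly.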

Table 2. Comparison of classifier performance.

Machine Learning Algorithm	TP	FP	FN	TN	Accuracy	Precision	Recall	F1
Decision Tree	1916	599	533	1524	75.24%	76.18%	78.23%	77.19%
Random Forest	2008	442	370	1752	82.23%	81.95%	84.44%	83.17%
Logistic Regression	2231	306	279	1756	87.20%	87.90%	88.88%	88.30%
Support Vector Machine	2523	200	113	1736	93.15%	92.65%	95.71%	94.15%
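
The metrics in Table 2 follow the standard definitions: accuracy = (TP + TN)/(TP + FP + FN + TN), precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 is the harmonic mean of precision and recall. As a consistency check, the short function below recomputes them from the confusion-matrix counts; for the SVM row, the results agree with the reported percentages to within 0.01 percentage points.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts from the Support Vector Machine row of Table 2.
svm = classification_metrics(tp=2523, fp=200, fn=113, tn=1736)
for name, value in svm.items():
    print(f"{name}: {100 * value:.2f}%")
```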

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Islam, N.; Shaikh, A.; Qaiser, A.; Asiri, Y.; Almakdi, S.; Sulaiman, A.; Moazzam, V.; Babar, S.A. Ternion: An Autonomous Model for Fake News Detection. Appl. Sci. 2021, 11, 9292. https://doi.org/10.3390/app11199292

