Sentiment Analysis for Fake News Detection

: In recent years, we have witnessed a rise in fake news, i


Introduction
People are spending more and more time interacting on social media, as the wide adoption of smartphones makes it accessible almost anytime and anywhere, which is not the case with traditional media. In addition, social media platforms facilitate interaction with friends, family, and even complete strangers, be it through comments, discussions, or simply like and dislike buttons. This has made social media a main channel for the dissemination of news. According to the Pew Research Center's Journalism Project [1], in 2020, 53% of US adults said they obtained news from social media "often" or "sometimes", with 59% of Twitter users and 54% of Facebook users consuming news on the site regularly. Interestingly, 59% of those who obtained news on social media said they expected that news to be largely inaccurate.
Such inaccurate information might result either from a deliberate attempt to deceive or mislead (disinformation) or from an honest mistake (misinformation) [2]. Rumors can fall into either of these two categories, depending on the intent of the source, given that rumors are not necessarily false but may turn out to be true [3]. Unlike rumors, fake news is, by definition, always false and, thus, can be seen as a type of disinformation. Other types of potentially false information that we can find on social media are propaganda, conspiracy theories, hoaxes, biased or one-sided stories, clickbait, and satire news, contributing to information pollution [4]. False information can be propagated by bots, criminal/terrorist organizations, activist or political organizations, governments, hidden paid posters, state-sponsored trolls, journalists, useful idiots, conspiracy theorists, individuals that benefit from false information, and trolls [5]. The motivation behind these actors can be to hurt or discredit, to obtain financial gain by increasing site views, to manipulate public opinion, to create disorder and confusion, to promote ideological biases, or even individual entertainment [6].
Sentiment Analysis (SA) is the branch of Natural Language Processing (NLP) in charge of the design and implementation of models, methods, and techniques to determine whether a text deals with objective or subjective information and, in the latter case, to determine whether such information is expressed in a positive, neutral, or negative way as well as whether it is expressed in a strong or weak way. Since a large part of the subjective content expressed by users on social networks is about opinions (on review sites, forums, message boards, chats, etc.), SA is also known as Opinion Mining (OM).
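As a rough illustration of the distinctions drawn above (objective versus subjective, polarity, and strength), the following minimal sketch scores a text against a toy lexicon. The lexicon entries, the strength threshold, and the function name are assumptions made for this example only; they do not correspond to any of the systems surveyed in this article.

```python
import re

# Toy lexicon (hypothetical values, for illustration only):
# each word maps to a polarity score in [-1, 1].
TOY_LEXICON = {
    "good": 0.5, "great": 0.8, "excellent": 1.0,
    "bad": -0.5, "terrible": -0.9, "awful": -1.0,
}


def analyze_sentiment(text, lexicon=TOY_LEXICON, strong_threshold=0.7):
    """Classify a text as objective or subjective and, if subjective,
    report its polarity (positive/neutral/negative) and strength."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]

    # No sentiment-bearing word found: treat the text as objective.
    if not scores:
        return {"subjective": False, "polarity": "neutral", "strength": None}

    mean = sum(scores) / len(scores)
    if mean > 0:
        polarity = "positive"
    elif mean < 0:
        polarity = "negative"
    else:
        polarity = "neutral"

    # The threshold separating strong from weak is an arbitrary assumption.
    strength = "strong" if abs(mean) >= strong_threshold else "weak"
    return {"subjective": True, "polarity": polarity, "strength": strength}


if __name__ == "__main__":
    print(analyze_sentiment("The service was terrible and awful"))
    print(analyze_sentiment("The meeting is on Monday"))
```

Real SA systems typically also handle negation, intensifiers, and context-dependent polarity, none of which this sketch attempts.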
The expression of sentiment plays an important role in fake news. Social media users tend to comment on posts when there is content that they consider arousing but on which they feel less in control. Conversely, users tend to share a post when they feel more in control [7]. By combining various sentiment variables, Dickerson et al. [8] showed that sentiment-related behavior was sufficient for distinguishing human accounts and social bot accounts. In order to increase the spread of news, headlines should stimulate the reader's curiosity and engage them emotionally. It is not by chance that the spread of fake news is often associated with the presence of clickbait, where the use of the configurations of emotional valence or polarity (positive and negative) and arousal (strong and weak) are knowingly used by publishers to misdirect readers [9] given that a relevant portion of the fake news audience does not read beyond the headlines [10]. Consequently, SA provides crucial information on the content of a news article to determine whether it is trustworthy or should be considered as fake news.
In this article, we discuss the different approaches that have been used to incorporate sentiment in the process of detecting fake news. After providing a description of the related work in Section 2, we define in Section 3 what is meant by fake news and the implications of its dissemination in today's society. We then focus on the process of detecting fake news in Section 4, where we also provide a list of the data sets that have been built to assist the construction and evaluation of this type of system as well as the competitive evaluation campaigns that have emerged in recent years; we end the section with a description of the measures used to determine the performance of a fake news detection system. In Section 5, we describe SA as a tool that has been used effectively in the broad field of text analytics. Section 6 is dedicated to systems that apply SA in the detection of fake news, considering both those in which SA is the basis of the system and those in which the results obtained by an SA system are used as a feature in a machine learning system. We continue in Section 7 with a discussion of the most relevant elements of such systems, showing in tabular form their main components and outlining the challenges to be faced in the near future. We finish with the conclusions in Section 8.

Related Work
A number of articles have reviewed the state of the art in fake news detection at a given moment, but none of them had sentiment analysis or the use of sentiment information as its main focus, as is the case here.
Conroy et al. [11] surveyed two major categories of methods for finding fake news. The first one corresponded to linguistic approaches in which the content of deceptive messages is extracted and analyzed to associate language patterns (usage of words, n-grams and syntactic constructions, semantic similarity, and rhetorical relations between linguistic elements) with deception. The second one corresponded to network approaches in which network information, such as message metadata or structured knowledge network queries, can be harnessed to provide aggregate deception measures. In the specific case of the detection of fake reviews, SA was considered a useful technique not specifically to detect fake texts but to detect fake negative reviewers, as they overproduced negative emotional terms when compared to truthful reviews as a result of exaggerating the sentiment they were trying to convey.
Shu et al. [12] described the psychological and social foundations of fake news in traditional and social media and surveyed the features and models used by detection techniques designed to address this phenomenon, considering both news content features and models as well as social context features and models, which can be based on posts, individual users, or user networks. They considered that SA should play a role in determining post-based features, as people express their emotions or opinions towards fake news through social media posts, such as skeptical opinions or sensational reactions. Years later, Shu et al. [13] revisited the subject by exploring weak social supervision for fake news detection and they stated that user comments that are related to the content of original news pieces are helpful to detect fake news and to explain prediction results. They also considered that machine-generated text created by successful deep generative models can be a new type of fake news that is fluent, readable, and catchy. With respect to sentiment analysis, they still considered sentiment among the features that can be extracted from text for fake news detection given that conflicting sentiments among news spreaders may indicate a high probability of fake news. In [14], Shu and Liu reviewed representative fake news detection methods in a principled way and illustrated challenging issues of fake news detection on social media from a data mining perspective.
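The idea that conflicting sentiments among news spreaders may signal fake news can be turned into a simple numeric feature. The sketch below is one possible formulation, not the method of Shu et al.: it assumes that each spreader's post has already been assigned a polarity score in [-1, 1] by some SA system, and the function name and the two returned fields are hypothetical.

```python
from statistics import pstdev


def sentiment_conflict(polarities):
    """Summarize how much the polarity of user posts about one news
    item disagrees. `polarities` is a list of scores in [-1, 1]."""
    has_positive = any(p > 0 for p in polarities)
    has_negative = any(p < 0 for p in polarities)

    # Population standard deviation as a dispersion measure;
    # with fewer than two posts there is nothing to compare.
    dispersion = pstdev(polarities) if len(polarities) >= 2 else 0.0

    return {
        "dispersion": dispersion,
        "mixed": has_positive and has_negative,
    }


if __name__ == "__main__":
    # Posts that mostly agree versus posts that clash.
    print(sentiment_conflict([0.8, 0.7, 0.9]))
    print(sentiment_conflict([0.9, -0.9, 0.6, -0.7]))
```

Such a value would normally be only one of many features fed to a classifier, alongside content and social context features.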
Hussein [15] classified 41 articles on sentiment analysis according to the challenge they addressed. They found that eight articles addressed negation, seven dealt with domain dependence, six were dedicated to spam and fake detection, two addressed world knowledge, eight dealt with NLP overheads (sarcasm was included in this challenge), three worked on feature extraction, three studied bipolar words (words in which polarity depends on the context in which they are used [16] ), and four dealt with huge lexicons. As can be seen, fake detection is one of the main challenges, although most of the articles analyzed in [15] did not deal with fake news but rather with the detection of fake websites or fake reviews.
Thorne and Vlachos [17] reviewed fact-checking in journalism and listed the resources and methods available to automate such a task as well as the related tasks that could benefit from them. Elhadad et al. [18] differentiated fake news from other forms of disseminating disinformation, misinformation, and malinformation, such as hoaxes, propaganda, satire/parody, rumors, clickbait, and junk news. They added malinformation to the classical categories of disinformation and misinformation. Malinformation was defined as the sharing of genuine information with the intent to cause harm. However, fabricated and junk news, which cannot be considered to contain genuine information, were considered as a possible malinformation realization, which seems contradictory. Sentiment analysis was not mentioned in either [17] or [18].
Bondielli and Marcelloni [19] described the features that have been considered in fake news and rumour detection approaches, provided an analysis on the various techniques used to perform these tasks, and highlighted how the collection of relevant data for performing them is problematic. They considered that the information provided by sentiment analysis techniques can be used to obtain one of the most relevant semantic features of fake news texts.
Sharma et al. [6] addressed fake news detection and mitigation techniques that focus on computational methods to tackle these tasks, compiled a list of available data sets around fake news detection, and proposed a list of challenges and open issues. They found that sentiment analysis was a useful cue for fake news detection, as positive sentiment words tended to be exaggerated in positive fake reviews compared to their true counterparts while replies to fake news on social media tended towards negative sentiment.
Da Silva et al. [20] investigated machine learning approaches and techniques to detect fake news, finding that the preferred methods involved neural networks composed of classical classification algorithms that heavily focus on lexical analysis of the entries as main features for prediction. Sentiment analysis was often used as a content feature in the form of words belonging to sentiment lexicons or as the result of a machine learning-based sentiment analysis system. Klyuev [21] also discussed different approaches to combat fake news and the importance of determining text features by means of natural language processing methods in order to create a profile of the text document. Although he showed the importance of using dictionaries that contain, among other information, the sentiment polarity of words, he did not explicitly mention the task of sentiment analysis.
Collins et al. [22] described various human-based and machine-based fake news detection models. On the human side, professional fact-checkers are experts in various disciplines who are capable of verifying the veracity of certain news items and decide whether such information is fake or authentic. Human reviewers are overwhelmed by the continued rise in fake news, so the "wisdom of the crowds" can be drawn on through crowdsourcing, based on the premise that no matter how smart someone is, the collective effort of individuals or groups supersedes any single individual's intellectual capacity. In practice, crowdsourcing is mainly used for the annotation of data sets that are then used for the training of machine learning models rather than for the detection of fake news itself [23]. On the machine side, automated fake news detection techniques use machine learning in conjunction with natural language processing techniques, including sentiment analysis. Hybrid techniques were also considered, in particular an expert-crowdsource approach that combined two methods of manual fact-checking and a human-machine approach that combined machine learning algorithms and the collective effort of humans.
Meel and Vishwakarma [4] surveyed how the contents on the web are contaminated intentionally or sometimes unintentionally by fake reviews, fake news, and satire, among other sources of information pollution. For this purpose, they studied the false information ecosystem, from the categorization of false information and the motivations to spread it to the social impact and user perception. They also discussed the current state of fact-checking, including source detection, propagation dynamics, methods of detection, and methods for containment and intervention. They considered sentiment analysis as one of the base sources of information needed to detect false information.
Oshikawa et al. [24] studied the technical challenges in fake news detection and how researchers define different tasks and formulate machine learning solutions to tackle them, focusing on how fake news detection was aligned with existing natural language processing tasks.
Zhang and Ghorbani [25] characterized the negative impact of online fake news and studied detection methods for this type of information, finding that many of them rely on identifying features of the users, content, and context that indicate misinformation. They stated that accurate fake news detection is challenging due to the dynamic nature of social media as well as the complexity and diversity of online communication data and that the limited availability of high-quality training data is a big issue for training supervised learning models. They defined each piece of news as consisting of physical news content and non-physical news content, where physical contents are the carriers and formats of the news and where non-physical contents are the opinions, emotions, attitudes, and sentiments that the news creators want to express. As a result, they considered that sentiment analysis is a useful method to illustrate the emotions, attitudes, and opinions that are conveyed by online social media and that sentiment-related factors are key attributes for suspicious account identification.
Zhou and Zafarani [23] surveyed fake news detection from the perspective of knowledge-based methods that detect fake news by verifying if the knowledge within the news text is consistent with facts, style-based methods that are concerned with how fake news is written, propagation-based methods that detect fake news based on how it spreads online, and source-based methods that detect fake news by analyzing the credibility of news sources. They considered sentiment as an important semantic-level feature of news content. They also stated that the implementation of efficient and explainable fake news detection systems needs collaborative efforts involving experts in computer and information sciences, social sciences, political science, and journalism.
De Souza et al. [26] reviewed the different types of features related to fake news detection methods and data sets, and they considered that SA was a useful feature to quickly verify the accuracy of information on social media. Finally, Antonakaki et al. [27] recently presented a survey on current research topics in Twitter, determining that sentiment analysis was one of the four main branches of research involving Twitter and that one of the major threats for this social network is the dissemination of fake news through it. However, they analyzed both topics separately, without making a connection between the two that shows the usefulness of sentiment analysis in detecting fake news.

Fake News
One of the most commonly accepted definitions by the research community is that "Fake news is a news article that is intentionally and verifiably false" [12,28]. Thus, we can identify three key aspects of fake news: its form as a news article, its deceptive intent, and the verifiability of its content as completely or partially false [19]. Wardle [29] deconstructed fake news into seven categories:
• false connection, where headlines, visuals, or captions do not support the content;
• false context, corresponding to genuine content shared with false contextual information;
• manipulated content, i.e., genuine information manipulated to deceive;
• misleading content, which involves misleading use of information to frame an issue or individual;
• imposter content, where genuine sources are impersonated;
• fabricated content, 100% false, designed to deceive and harm; and
• satire/parody, with potential to fool but no intention to cause harm.
Given the nonharmful nature of satire and parody, and because such items are easily identifiable as parodic [30], this type of news is not usually considered for fake news detection, although satire can be used as an excuse to avoid the accusation of spreading false news [31].
Bakir and McStay [43] examined the 2016 US presidential election campaign (Trump vs. Clinton) to try to identify problems with, causes of, and solutions to the fake news phenomenon. Fake news was defined as either wholly false news or news containing deliberately misleading elements incorporated within its content or context. They argued that the fake news phenomenon was a logical outcome of five features of the digital media ecology: (1) the financial decline in legacy news, (2) the news cycle's increasing immediacy, (3) the rapid circulation of misinformation and disinformation via user-generated content and propagandists, (4) the increasingly emotionalized nature of online discourse, and (5) the growing number of people financially capitalizing on algorithms used by social media platforms and internet search engines. Related to (4), they considered sentiment analysis to be an element of great relevance to fake news detection, and they also suggested that the potential to manipulate public sentiment via "empathically optimized" automated fake news was a near-horizon problem. This prediction is particularly worrisome considering that the human ability to detect deception is poor, to the point that it has been shown to be worse than that achieved by machine learning systems. According to [64], human judgments achieve 50-63 percent success rates, depending on what is considered deceptive, while machine learning algorithms reached 65 percent accuracy at the time of the study.
The interest of the research community in disinformation and misinformation has increased in recent years, with a steady growth in the number of publications on rumors since 2006 [19]. In the case of fake news, there were few publications before 2016, but a rapid growth started in 2017 that has led fake news to become the most important research subject on these issues since 2018, surpassing rumors [19,65].

Fake News Detection
The influence of fake news cannot simply be undone by pointing out that the information was incorrect, especially in people with relatively lower cognitive ability [66]. Therefore, it is very important to detect fake news as soon as possible in order to prevent its spread. When the amount of fake news was much less than it is nowadays, the task of checking the veracity of the claims made in news could be carried out by trained journalists. Depending on the complexity of the claim, this process may take from a few minutes to a few days [17]. The enormous growth of social media in the last decade as well as the increasing spread of fake news on the internet makes it impossible to maintain a manual process of checking the news, so it is necessary to resort to automated methods to deal with the veracity of news.
The purpose of automatic fake news detection systems is twofold: first, to reduce human time and effort to detect fake news [24] and, second, to categorize news along a continuum of veracity with an associated measure of certainty, taking into account that veracity is compromised by the occurrence of intentional deceptions [11]. Thus, we can define fake news detection as the process of estimating whether a particular news article of any topic from any domain is intentionally or unintentionally misleading [18]. For Shu et al. [12], fake news is intentionally written to mislead readers to believe false information, which makes it difficult to detect based on news content as it uses diverse linguistic styles while simultaneously mocking true news. Therefore, we need to include auxiliary information, such as user social engagements on social media to detect fake news but this information is big, incomplete, unstructured, and noisy [67].
In recent years, the problem of detecting fake news has been attacked from various angles, leading to multiple different ways of categorizing the different approaches employed for this task [6,12,23,68,69]. One of the most common ways to divide approaches to detecting fake news is to consider the following categories:
• Knowledge-based, fact-checking approaches that try to determine whether the claims made in a news story are supported by facts. For this, knowledge bases, including the semantic web and linked open data, are used [70,71]. We could also include in this category fact-checking approaches based on the use of information retrieval techniques to find documents that support a news piece [72,73].
• Context-based approaches that try to determine the truthfulness of a news story based on its metadata, such as the credibility of its author and publisher as well as the speed and form of dissemination of the news on social networks [27].
• Content-based approaches that try to determine the veracity of a story based on its text, which includes considerations of style (for example, length, variety of words, complexity of vocabulary, and complexity of syntactic constructions) [10,24,68] as well as of the type and strength of sentiments and emotions conveyed by the news, which is the topic covered in this article.
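To make the stylistic side of content-based approaches more concrete, the following minimal sketch computes simple proxies for the style cues mentioned above: length, variety of words, complexity of vocabulary, and complexity of syntactic constructions. The particular proxies (type-token ratio, average word length, average sentence length) and the function name are assumptions chosen for illustration; the cited works may define these features differently.

```python
import re


def style_features(text):
    """Compute crude stylistic proxies for a news text."""
    # Naive sentence split on terminal punctuation (an approximation).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words)

    if n_words == 0:
        return {
            "n_words": 0,
            "type_token_ratio": 0.0,
            "avg_word_length": 0.0,
            "avg_sentence_length": 0.0,
        }

    return {
        # Length of the text in words.
        "n_words": n_words,
        # Variety of words: distinct words over total words.
        "type_token_ratio": len(set(words)) / n_words,
        # Rough proxy for vocabulary complexity.
        "avg_word_length": sum(len(w) for w in words) / n_words,
        # Rough proxy for syntactic complexity.
        "avg_sentence_length": n_words / len(sentences),
    }


if __name__ == "__main__":
    print(style_features("Big news. Big big claims!"))
```

In a content-based detector, features of this kind would be combined with sentiment and emotion features before classification.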
Fake news detection falls within the broader field of misinformation and disinformation detection and mitigation. The approaches to detecting fake news use methods, techniques, and resources that are useful in a variety of other tasks and vice versa. Among those related tasks, we can mention the following ones:
• Stance classification [74,75], the task of determining the opinion behind some text, with the target or topic known in advance (target-specific stance classification) or not (open stance classification). It is different from fake news detection in that it is not used for assessing veracity but consistency [24].
• Rumor detection [3,19,76]. We can define a rumor as a piece of circulating information for which the truth or falsehood has yet to be verified at the time of spread [3,12]. The resolution of this task involves rumor identification, rumor tracking, stance classification, and veracity classification.
• Truth discovery [77], the task of detecting true facts by resolving conflicts among multi-source noisy information, e.g., databases, the web, crowdsourced data, etc. To solve this task, both the credibility of the source and the truthfulness of the objects of interest need to be taken into account.
• Clickbait detection [9], the task of distinguishing headlines that fit the facts described in a news item from those designed for eye-catching and teasing in online media.
• Opinion spam [78,79] and fake reviews [80,81] detection, the task of filtering out fictitious opinions that have been deliberately written in review sites to sound authentic. People may write this type of fake reviews to promote their products or to defame their competitors' products.
• Bot detection [82]. Social media is now populated by small programs designed to exhibit human-like behavior called social bots [83] that automatically spread posts to give the impression that a given piece of information is highly popular and endorsed by many people. The task of detecting bots is closely related to opinion spam and fake review detection.
• Determining source credibility [84,85]. Credibility is a perceived quality associated with believability, trustworthiness, perceived reliability, expertise, and accuracy. As a result, we could say that credibility is a transversal quality for all tasks that deal with misinformation and disinformation.
• Hate speech detection [86][87][88]. Hate speech is a broad umbrella term for insulting and offensive user-created content addressed to specific targets to incite violence or hate toward a minority based on specific characteristics of groups, such as ethnic origin, religion, or other. In hate speech, language is used to attack or diminish these groups. These types of messages are based on content of doubtful credibility, including rumors and fake news.

Resources
Finding the resources and data sets used in fake news research is not an easy task. To find data sets, we followed two processes:
• Papers that discuss experimental results usually provide some reference to the source of the data used in such experiments, which can range from simply indicating the name of the data set to providing a link or, more rarely, a bibliographic reference. In the case of links, a lot of them were broken, so we did our best to obtain the right links.
• A search was performed on Google (https://www.google.com/, accessed on 3 June 2021) using "fake news dataset" and "fake news corpus" as query phrases to find additional resources.
For each resource found, an additional search process was carried out to find a bibliographic reference in which the resource is described.
To find shared tasks relevant to fake news detection, we also followed two processes:
• Manual browsing was performed for the most popular evaluation campaigns in NLP, such as SemEval, CLEF, IberLEF, etc.
• In addition, a Google search for "fake news shared task" and "fake news campaign" was performed.
As a result, lists of data sets and shared tasks were compiled. For each resource, a summary of its most relevant characteristics as well as a working link and a bibliographic reference in which it is described in more detail were provided. We consider this compilation itself to be a valuable resource for the scientific community.

Data Sets
Building data sets to test different detection techniques is a crucial research element in this area. The usefulness of a fake news detection corpus depends on several factors: it should provide both truthful and deceptive instances in text format, the news verification process should be clearly indicated, and there should be homogeneity in length, writing matter, and timeframes [89]. The main issues in constructing such data sets are that the amount of false information is a small fraction of the online content produced every day, even if we restrict our focus to news articles and posts discussing breaking news, and that social media companies nowadays have strict policies concerning the analysis of data produced by their users [19]. Below, we list the data sets that have been built by the scientific community to assess the performance of fake news detection algorithms, techniques, and systems. Since most of the data sets comprise texts in English, we only explicitly specify the language of non-English data sets in the list below. It is also worth noting that we focus only on corpora that are composed mainly of news (or news-related content such as Twitter or Facebook posts), so we do not include those that are based on other types of false information, such as fake reviews [90], rumors [91][92][93], hoaxes [94], or everyday lies [95].
Both websites have large archives of fact-checked statements that cover a wide range of issues involving UK and US public life, and they provide detailed verdicts with fine-grained labels that were aligned to a five-point scale of "True", "MostlyTrue", "HalfTrue", "MostlyFalse", and "False".
• BuzzFeed-Webis Fake News Corpus 2016 (https://zenodo.org/record/1239675, accessed on 3 June 2021) [68]. This corpus encompasses 1627 Facebook posts from 9 publishers on 7 workdays close to the US 2016 presidential election. It contains 256 posts from three left-wing publishers, 545 posts from three right-wing ones, and 826 posts from three mainstream publishers. All publishers earned Facebook's blue checkmark, indicating authenticity and an elevated status within the network. Each post and linked news article was rated "mostly true", "mixture of true and false", "mostly false", or "no factual content" by BuzzFeed journalists.
• BuzzFace (https://github.com/gsantia/BuzzFace, accessed on 3 June 2021) [97] is an extension of the previous corpus enriched with 1 [98], with 12.8 K human labeled short statements from PolitiFact.com evaluated for their truthfulness using six labels: "pants-fire", "false", "barely-true", "half-true", "mostly-true", and "true". A rich set of meta-data for the author of each statement is also provided. The statements are sampled from news releases, TV and radio interviews, campaign speeches, TV ads, Twitter messages, debates, and Facebook posts. The most discussed subjects are economy, healthcare, taxes, federal-budget, education, jobs, state-budget, candidate biographies, elections, and immigration. pages, as well as 1145 posts from three large mainstream political news pages that were manually rated as "mostly true", "mixture of true and false", or "mostly false". Satirical and opinion-driven posts or posts that lacked a factual claim were rated as "no factual content".
• FEVER, Fact Extraction and VERification data set (https://fever.ai/resources.html, accessed on 3 June 2021) [103], consists of 185,445 claims manually verified against the introductory sections of Wikipedia pages and classified as Supported, Refuted, or NotEnoughInfo. For the first two classes, systems and annotators need to also return the combination of sentences forming the necessary evidence supporting or refuting the claim. The claims were generated by human annotators extracting claims from Wikipedia and mutating them in a variety of ways, some of which were meaning-altering. The verification of each claim was conducted in a separate annotation process by annotators who were aware of the page but not the sentence from which the original claim was extracted. Although this data set does not contain news, we consider it relevant and include it in this list because it shows a way to convert a text with objective facts (similar in some way to a true news item) into a false text (equivalent to a fake news item).
• Fake News vs. Satire corpus (https://github.com/jgolbeck/fakenews, accessed on 3 June 2021) [31] contains 283 fake news articles and 203 satirical stories focused on American politics, posted between January 2016 and October 2017. The title, a link, and the full text ID are provided for each article. For fake news stories, a rebutting article is also provided that disproves the premise of the original story.
• Fake.Br Corpus (https://github.com/roneysco/Fake.br-Corpus, accessed on 3 June 2021) [104,105] is composed of 7200 true and fake news items written in Brazilian Portuguese: 3600 fake news items were manually collected from four Brazilian newspapers while 3600 true news items were collected in a semi-automatic way from major news agencies in Brazil, choosing the most similar ones to fake news, with manual verification to guarantee that the fake and true news were in fact subject-related.
• FakeNewsCorpus (https://github.com/architapathak/FakeNewsCorpus, accessed on 3 June 2021) by Pathak and Srihari [106] contains 704 fake and questionable articles from Aug.-Nov., 2016, on the topic of the 2016 US election. Each article was manually checked, and two types of labels were assigned to it: a primary label based on the assertions made ("False", "Partial truth", or "Opinions") and a secondary label ("Fake" or "Questionable"

Shared Tasks
In recent decades, the text analytics research field has been characterized by the development of collaborative resources that materialized mainly through the organization of "shared tasks". In these competitive evaluation campaigns, the organizers provide annotated data sets for training in advance, which are used by participating teams from all over the world to fine-tune their systems. Later, test data sets are released for a limited period of time, within which the participants must provide their official results. After a shared task has finished, these data sets are used to evaluate emerging new systems, thus enabling comparison between systems. The area of automatic fake news detection is relatively recent and, for this reason, this type of shared task did not become popular until very recently. A list of the most relevant shared tasks follows (as in the previous section, the target language of the tasks is English unless specified otherwise):
•  [113] held in 2017 aimed to determine the perspective (or stance) of a news article relative to a given headline. Therefore, despite its name, this task does not detect fake news but determines stance. An article's stance can either agree or disagree with the headline, discuss the same topic, or be completely unrelated. An existing data set for stance classification [91] was enhanced for this purpose, with 50 K labeled claim-article pairs, combining 300 claims with 2582 articles. The claims and the articles were curated and labeled by journalists.
• WSDM-Fake News Classification (https://www.kaggle.com/c/fake-news-pair-classification-challenge/, accessed on 3 June 2021) was a shared task organized within the framework of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM 2019).
Rather than the detection of fake news itself, this task dealt with the detection of news that propagated or refuted other fake news: given the title of a fake news article A and the title of a news article of interest B, participants were asked to classify B into one of three categories: "agreed" (B talks about the same fake news as A), "disagreed" (B refutes the fake news in A), and "unrelated" (B is unrelated to A). The training data set contained 320,767 news pairs in both Chinese and English, while the test data set contained 80,126 news pairs in both languages.
• Fake News Detection Challenge KDD 2020 (https://www.kaggle.com/c/fakenewskdd2020, accessed on 3 June 2021) was a shared task organized in the context of the Second International TrueFact Workshop: Making a Credible Web for Tomorrow, held in conjunction with SIGKDD 2020. The data set was composed of fake and true news.
• Profiling Fake News Spreaders on Twitter (https://pan.webis.de/clef20/pan20-web/author-profiling.html, accessed on 3 June 2021) [114] was a shared task organized within CLEF 2020 in the context of PAN, a series of scientific events and shared tasks on digital text forensics and stylometry. The particularity of this task is that it does not try to detect fake news as such but rather whether a Twitter user is a potential propagator of fake news. The languages considered for this task were English and Spanish. The data set consisted of the last 100 tweets of 500 users (250 fake news spreaders and 250 true news spreaders) for each of the two languages.
• UrduFake@FIRE2020 (https://www.urdufake2020.cicling.org/, accessed on 3 June 2021) [115] was a shared task organized within the framework of the 12th meeting of the Forum for Information Retrieval Evaluation (FIRE 2020).
The data set used for the task was the Bend-The-Truth Urdu fake news data set, composed of 750 true news articles in the domains of technology, education, business, sports, politics, and entertainment, obtained from a variety of mainstream news websites predominantly in Pakistan, India, the United Kingdom, and the USA, and 550 fake news articles intentionally written by a group of professional journalists in the same domains and of approximately the same length as the true news.
• The CONSTRAINT 2021 shared task (https://constraint-shared-task-2021.github.io/, accessed on 3 June 2021) [116] was organized in the context of the First Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation (CONSTRAINT 2021), co-located with AAAI 2021. It encompassed a task for detecting fake news in English over a data set with real and fake news on COVID-19 [117] and a second task on hostile post detection in Hindi over a data set of 8200 hostile and non-hostile texts from various social media platforms such as Twitter, Facebook, or WhatsApp [118]. This latter task is a multi-label, multi-class classification problem where each post can belong to one or more of a set of classes: fake news, hate speech, offensive, defamation, and non-hostile posts.
• FakeDeS (https://sites.google.com/view/fakedes, accessed on 3 June 2021) is the Fake News Detection in Spanish Shared Task established in 2021 under the umbrella of the Iberian Languages Evaluation Forum (IberLEF). The training data set for this task is The Spanish Fake News Corpus [110] described above. The testing corpus contains news related to COVID-19 and news from other Ibero-American countries.

Evaluation Measures
The most widely used measures to determine the performance of fake news detection systems are accuracy (Acc), precision (P), recall (R), F1, and Area Under the Curve (AUC). In the context of fake news, the articles annotated in the test data set are considered the ground truth. To compute the metrics, we must consider the number of true positives (|TP|), true negatives (|TN|), false positives (|FP|), and false negatives (|FN|) with respect to said ground truth, where
• A true positive is counted for each article predicted as fake news that is actually annotated as fake in the test set;
• A false positive is counted for each article predicted as fake news that is actually annotated as true news in the test set;
• A true negative is counted for each article predicted as true news that is actually annotated as true in the test set; and
• A false negative is counted for each article predicted as true news that is actually annotated as fake in the test set.
Accuracy refers to the percentage of articles that have been labeled correctly by the detection system, either as fake or true news, and is computed as indicated below in Equation (1). Precision refers to the percentage of articles predicted as fake news that are actually fake and is computed as indicated in Equation (2). Recall refers to the percentage of all fake news items in the test set that are successfully recognized as fake by the detection system and is computed as indicated in Equation (3). F1 combines precision and recall by means of their harmonic mean, as shown in Equation (4). For all of these measures, the higher the value, the better the performance.
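For reference, Equations (1) to (4) can be written in terms of the counts defined above. These are the standard textbook definitions, stated here under the assumption that fake news is the positive class, as in the description of |TP|, |TN|, |FP|, and |FN|:

```latex
\begin{align}
\mathrm{Acc} &= \frac{|TP| + |TN|}{|TP| + |TN| + |FP| + |FN|} \tag{1}\\
P &= \frac{|TP|}{|TP| + |FP|} \tag{2}\\
R &= \frac{|TP|}{|TP| + |FN|} \tag{3}\\
F1 &= \frac{2 \cdot P \cdot R}{P + R} \tag{4}
\end{align}
```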
The Receiver Operating Characteristic (ROC) curve provides a way of comparing the performance of several detection systems by looking at the trade-off between the False Positive Rate (FPR, also known as fall-out, equivalent to one minus the True Negative Rate or specificity) and the recall, which is also known as the True Positive Rate (TPR) or sensitivity in this context. FPR is computed as indicated in Equation (5). A ROC curve is drawn by plotting the FPR on the horizontal axis and the TPR on the vertical axis.
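Using the same counts as for the previous measures, Equation (5) corresponds to the standard definition of the False Positive Rate, which is also one minus the True Negative Rate, as noted above:

```latex
\begin{equation}
\mathrm{FPR} = \frac{|FP|}{|FP| + |TN|} = 1 - \frac{|TN|}{|TN| + |FP|} \tag{5}
\end{equation}
```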
AUC is precisely the area under the ROC curve. An excellent system has an AUC close to 1 (it is able to distinguish perfectly between fake and true news), while a poor system has an AUC close to 0 (it would be considering all fake news as true and all true news as fake). An AUC of 0.5 corresponds to a system that performs no better than random guessing. AUC is more statistically consistent and more discriminating than accuracy [119], and it is usually applied in imbalanced classification problems, as is the case of fake news detection, where the numbers of ground truth fake news articles and true news articles are very imbalanced [12].
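As an illustration of how these measures can be computed, the following is a minimal Python sketch that uses only the standard library. The function names and the 1 = fake / 0 = true encoding are our own assumptions. AUC is computed through its rank interpretation, i.e., the probability that a randomly chosen fake article receives a higher score than a randomly chosen true article, which is equivalent to the area under the ROC curve.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), assuming 1 = fake news (positive) and 0 = true news."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, F1, and FPR; undefined ratios are reported as 0.0."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (fake, true) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fake and one true article")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise formulation is quadratic in the number of articles, so it is only meant for small illustrative examples; for large test sets, a sort-based computation would be preferable.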

Summary
The 2016 US presidential election marked a turning point in the interest in fake news, both in society as a whole and in the scientific community in particular. This has led to the creation of a large number of resources, mainly in the form of data sets but also in the form of shared tasks. The downside is that the research community has not been able to establish standard elements (data sets and performance measures) that can be used by all authors involved in the field in order to allow an adequate comparison of the various approaches that have been developed over the last five years.

Sentiment Analysis as a Base Component for Text Analytics
In text analytics and NLP, subjective information usually refers to the fragments of natural language utterances that reveal opinions, feelings, points of view, and stances with respect to a given topic of interest. Automatically analyzing such utterances and understanding the sentiment that is expressed there is usually referred to as sentiment analysis, and it is interesting not only from a pure research point of view but also from an industry one. In this way, it is possible to process huge amounts of data to monitor the views of society with respect to public matters, events, or products. SA started to become popular in the late 1990s, when the first practical data sets for subjective classification were introduced [120] and when the first paradigms that made it possible to create models to tackle the problem effectively were proposed.
On the one hand, Turney [121] introduced what are now commonly called lexicon-based methods. His approach consisted of automatically creating semantic orientation dictionaries that mapped words to subjectivity scores by computing the pointwise mutual information of adjectives and adverbs against the seed words "excellent" and "poor". Word scores could then be added up to determine the overall sentiment of a given input. Many authors have followed this angle to create their SA systems. For instance, Esuli and Sebastiani [122] presented SentiWordNet, a lexical resource where every entry in WordNet [123] is assigned a positive, neutral, and negative score, which is generated by a set of ternary semi-supervised classifiers. In contrast, Taboada et al. [124] and Brooke et al. [125] manually created semantic orientation dictionaries for common nouns, verbs, adverbs, adjectives, and intensifiers. They also presented a simple yet effective rule-based system (for English and Spanish) that dealt with relevant phenomena for polarity classification, such as negation or intensification. Thelwall et al. [126] presented SentiStrength, another lexical rule-based system for sentiment classification but focused on short texts while being easily adaptable to different languages. Hutto and Gilbert [127] enhanced a lexicon-based SA system with five generalizable rules that try to embody some writing conventions that humans use when expressing or emphasizing sentiment intensity. Vilares et al. [128] went one step further and proposed a rule-based system that was based on syntactic structures and is applicable to any language for which a dependency parser and a semantic orientation dictionary are available. Related to lexicon-based methods, other authors such as Cambria et al. [129,130] have focused instead on concept-level sentiment analysis, creating semantic graphs that connect concepts in order to capture their semantics.
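To make the lexicon-based idea concrete, the following minimal Python sketch adds up word scores from a semantic orientation dictionary and applies two simple rules for intensification and negation. It is not a reimplementation of any of the systems cited above: the lexicon entries, their scores, the intensifier weights, and the rule that a negator simply flips the sign are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, List

# Toy semantic orientation dictionary (hypothetical scores, for illustration only).
LEXICON: Dict[str, float] = {
    "good": 2.0, "excellent": 4.0, "bad": -2.0, "poor": -3.0, "terrible": -4.0,
}
# Hypothetical multipliers applied to the sentiment word that follows.
INTENSIFIERS: Dict[str, float] = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
NEGATORS = {"not", "never", "no"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str) -> float:
    """Sum the scores of sentiment words, handling a preceding intensifier and negator."""
    tokens = tokenize(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score = LEXICON[tok]
        j = i - 1
        # An intensifier directly before the word scales its score.
        if j >= 0 and tokens[j] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[j]]
            j -= 1
        # A negator before the word (or before its intensifier) flips the polarity.
        if j >= 0 and tokens[j] in NEGATORS:
            score = -score
        total += score
    return total


def polarity(text: str) -> str:
    """Map the aggregated score to a positive, negative, or neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, under these toy values, "not very good" is scored by taking the score of "good", scaling it by the weight of "very", and flipping its sign because of "not". Real systems use much larger dictionaries and more refined treatments of negation scope.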
On the other hand, subjective analysis can also be tackled using machine learning models, an angle that has been especially dominant since the general adoption of deep learning in NLP. Pang et al. [131] were among the first to use machine learning models for SA and studied the use of n-grams of words and part-of-speech tags to train naïve Bayes, maximum entropy, and support vector machine classifiers. Their setup served as the starting point for many authors to explore the usefulness of different types of features for SA, such as lexical [132], morphological [133], and syntactic features [134,135]. Kalchbrenner et al. [136], dos Santos et al. [137], and Severyn and Moschitti [138] were among the first to apply deep learning techniques over raw texts for SA, which has greatly reduced the need to manually engineer features thanks to models such as convolutional neural networks. In the same line, Socher et al. [139] introduced a novel model to compute the sentiment of texts with a tree-recursive neural network. To do so, they first created a sentiment treebank where constituents (phrases) were annotated by Amazon Mechanical Turk workers, which was then used to train a recursive deep model that learned to generate the sentiment of upper constituents by composing the sentiment of their children. In a related line, Tai et al. [140] showed how a tree-LSTM that exploited syntactic properties could obtain slight improvements over raw LSTM models for SA (as well as for other sequential tasks). More recently, distant learning and semi-supervised learning have been used to obtain significant improvements in SA. For instance, Radford et al. [141] showed how pre-training an LSTM on a large data set of Amazon reviews at the character level (byte level) could obtain state-of-the-art results for SA on standard data sets by simply fine-tuning on a few supervised examples. Similarly, Devlin et al. [142] showed how BERT, the well-known bidirectional language model, obtained strong results for SA after the fine-tuning phase. For space reasons, we refer the reader to [143] for a complete review of deep learning methods for sentiment analysis.
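As a small illustration of the classical machine learning setup described above, the following Python sketch trains a multinomial naïve Bayes classifier on word unigrams and bigrams with add-one smoothing. It is a simplified sketch in the spirit of n-gram based classifiers, not the configuration used in [131]: the class name, the whitespace tokenization, the smoothing choice, and the tiny training set are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def ngrams(tokens: List[str], max_n: int = 2) -> List[str]:
    """Return all word n-grams of length 1..max_n as space-joined strings."""
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayesSentiment:
    """Multinomial naive Bayes over word n-grams with add-one (Laplace) smoothing."""

    def __init__(self, max_n: int = 2) -> None:
        self.max_n = max_n
        self.class_docs: Counter = Counter()            # documents per class
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesSentiment":
        for text, label in zip(texts, labels):
            feats = ngrams(text.lower().split(), self.max_n)
            self.class_docs[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text: str) -> Optional[str]:
        feats = ngrams(text.lower().split(), self.max_n)
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label in sorted(self.class_docs):
            total = sum(self.feature_counts[label].values())
            logp = math.log(self.class_docs[label] / n_docs)  # class prior
            for f in feats:
                if f in self.vocab:  # features never seen in training are ignored
                    logp += math.log((self.feature_counts[label][f] + 1) / (total + vocab_size))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Tiny illustrative training set (invented sentences).
train_texts = [
    "a great and moving film",
    "great acting and a wonderful story",
    "a boring and awful film",
    "awful acting and a boring story",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesSentiment(max_n=2).fit(train_texts, train_labels)
```

With a realistic corpus, the same structure applies, although feature selection and better tokenization would usually be needed.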
SA models, methods, and techniques have been successfully applied in the context of text analytics in applications such as the analysis of reviews on products and services [128,144]; the analysis of social media posts on Twitter [145,146], Facebook [147], Instagram [148], etc.; the detection of social spam to prevent normal users from being unfairly overwhelmed with unwanted or fake content via social media [149]; the detection of irony, sarcasm, and satire in formal and informal text [150,151]; the detection of sexism, racism, bullying, harassment, and hate speech [152-156]; influence and reputation analysis [157,158]; political [159,160], social [161,162], and economic analysis [163,164]; security monitoring [165]; and health and well-being analyses [166,167].

Sentiment Analysis for Fake News Detection
As established in previous sections, the sentiment that is expressed and the strength with which it is expressed constitute a substantial element to reliably determine the degree of veracity of a news piece. Among the developments in the last decade, we can distinguish two types of approaches for detecting fake news as far as the use of SA is concerned. On the one hand, there is a set of approaches that take SA as the fundamental basis of their fake news detection strategy, which is usually complemented by the use of other information extracted from both news contents and the context of news spreading on social networks. These are the approaches to which we dedicate Section 6.1. On the other hand, we have a more numerous set of models in which the sentiment expressed in a news item is considered as a feature along with other features obtained from the text and the context of the news piece. Papers that have chosen this approach are the subject of Section 6.2.
To compile the papers discussed in this section, we took into account that, in general, computer science researchers prefer to publish their findings at scientific conferences over journals and that the coverage of these conferences is insufficient in Web of Science (http://webofknowledge.com/, accessed on 3 June 2021) and Scopus (https://www.scopus.com/, accessed on 3 June 2021) but quite good in Google Scholar (https://scholar.google.es/, accessed on 3 June 2021). Consequently, we searched for relevant articles on Google Scholar by using as search terms "fake news sentiment" and "fake news opinion". It is worth noting that, in this review, we prioritize presenting each related paper in detail over organizing the full body of research on the subject of the review. Thus, we follow the approach used in other reviews on fake news published in scientific journals in the last two years [4,6,19,23,25]. The other possibility would have been to follow the approach, less common in computer science than in other disciplines such as those related to healthcare, of carrying out a systematic review according to established guidelines, as was the case in [18,20,26].

Fake News Detection Systems Based on SA
Diakopoulos et al. [168] presented an analytic tool designed to help journalists and media professionals to extract news value from large-scale aggregations of social media content around broadcast events and, in particular, to drive analysts to gather information from identified sources. For this purpose, they leveraged four types of automatic content analysis: relevance, uniqueness, sentiment, and keyword extraction. For sentiment analysis, they followed a two-step approach. As a first step, they ran a simple lexicon-based classifier that determined whether a text carried subjective information. In a second step, they applied a supervised machine learning classifier based on n-grams of length up to four. The combined classifier resulted in a five-fold cross-validated accuracy of 62.4%, which was sufficient for giving an overall impression of the sentiment but still failed on difficult cases involving sarcasm or slang. Their tool provided a visual representation of aggregated sentiment in the form of a sentiment timeline. They evaluated the tool on 101,285 tweets about the State of the Union address by US President Barack Obama in early 2010, and they found that sentiment analysis, in conjunction with the magnitude of the social media response to different quotes, topics, or issues, was the most useful analytic indicator for journalistic inquiry.
AlRubaian et al. [169] also used SA to identify implausible content on Twitter written in Arabic in order to prevent the proliferation of fake or malicious information, since they considered that sentiment provides a measure of user behavior, which led to high precision in credibility analyses. In their system, sentiment carried a weight of 30% among all of the features that were considered. SA, along with topic classification, was also the basis of the system by Chatterjee and Agarwal [170] to determine the credibility of tweets written in English and Hinglish (a hybrid use of English and Hindi).
Dey et al. [171] applied several NLP methods (part-of-speech tagging, named-entity recognition, sentiment analysis) to a set of 200 tweets referring to the 2016 US Presidential Election. They found that credible tweets had mostly positive or neutral polarity, while tweets with fake content had a strong tendency towards negative sentiment. However, their data set was too small to obtain conclusive results. Bhutani et al. [172] based their fake news detector on sentiment analysis under the hypothesis that the sentiment put forth when writing a news article would serve as a pivotal deciding factor in characterizing the news as fake or real. They applied a naïve Bayes classifier to determine the sentiment of texts and then used it as a main feature of multinomial naïve Bayes and random forest classifiers for detecting fake news, with the latter obtaining the best results.
Ajao et al. [173] also exploited the hypothesis that there is a relation between fake news and the sentiment of the text posted online. This hypothesis was statistically tested on a corpus of rumors [174] and was later confirmed by experiments that compared several classic and deep learning classifiers using sentiment against a deep learning baseline considering only text features. The best results for fake news detection were obtained by sentiment-aware classifiers: an SVM in the case of classic models and an LSTM with hierarchical attention in the case of deep learning ones.
Cui et al. [175] found statistical evidence on the FakeNewsNet data set that the sentiment polarity of comments under fake news was greater than under real news. As a consequence, they decided to incorporate users' latent sentiments into an end-to-end deep embedding framework for detecting fake news. They used three neural networks to deal with news images, news text, and user profiles, whereas an adversarial mechanism was introduced to preserve semantic similarity, and enforce representation consistency between the text and image. Finally, they modeled users' sentiment to incorporate it into the proposed framework. A novelty of this work was the use of adversarial learning to find semantic correlations between different modalities in news contents. The resulting fake news detection system outperformed other classical and deep learning-based classifiers. Ablation experiments showed that the component that contributed the most to the performance of the system was sentiment analysis.
Del Vicario et al. [176] introduced a framework for early warning on possible misinformation targets on social media. They compiled a data set with real news from Facebook posts from Italian official newspapers and with fake news from Facebook posts taken from Italian sites known for disseminating misinformation. For each post, they extracted the entities associated with the textual content and the polarity of the sentiment expressed in the text. For each entity, they calculated the "presentation distance" (the absolute difference between the maximum and the minimum values among the sentiment scores of all posts containing the entity) and the "mean response distance" (the absolute difference between the mean sentiment score on the posts containing the entity and the mean sentiment score on their comments). From these two values, they set the controversy and the perception of the entity according to empirically derived thresholds. They noticed that presentation distance was a good indicator of the attention received by an entity in terms of likes and comments and that both controversial and captivating entities were much more present in fake news, thus highlighting the potential of such properties in identifying topics that were likely to be subject to misinformation. Finally, they used these measures to derive sentiment-based features which, together with others based on text properties (e.g., number of characters, words, etc.) and user behavior (e.g., number of comments, likes, etc.), fed several classic machine-learning classifiers to detect fake news. The best performance was attained by a logistic regression classifier.
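The two entity-level measures defined above can be sketched in a few lines of Python. The data layout (a list of posts, each with its entities, a sentiment score, and the sentiment scores of its comments) and the function names are assumptions for illustration. In addition, the mean response distance is computed here by pooling all comments of the posts that mention an entity, which is only one possible reading of the definition. The empirically derived thresholds used in [176] are not reproduced.

```python
from statistics import mean
from typing import Dict, List


def presentation_distance(post_scores: List[float]) -> float:
    """Absolute difference between the max and min sentiment of posts with the entity."""
    return abs(max(post_scores) - min(post_scores))


def mean_response_distance(post_scores: List[float], comment_scores: List[float]) -> float:
    """Absolute difference between the mean post sentiment and the mean comment sentiment."""
    return abs(mean(post_scores) - mean(comment_scores))


def entity_features(posts: List[dict]) -> Dict[str, Dict[str, float]]:
    """Compute both distances per entity.

    Each post is assumed to be a dict with the keys 'entities' (list of str),
    'sentiment' (float), and 'comment_sentiments' (list of float).
    """
    post_scores: Dict[str, List[float]] = {}
    comment_scores: Dict[str, List[float]] = {}
    for post in posts:
        for entity in set(post["entities"]):
            post_scores.setdefault(entity, []).append(post["sentiment"])
            comment_scores.setdefault(entity, []).extend(post["comment_sentiments"])
    features = {}
    for entity, scores in post_scores.items():
        comments = comment_scores[entity]
        features[entity] = {
            "presentation_distance": presentation_distance(scores),
            # Entities whose posts have no comments get a distance of 0.0 by convention.
            "mean_response_distance": (
                mean_response_distance(scores, comments) if comments else 0.0
            ),
        }
    return features
```

These per-entity values could then be thresholded or aggregated into post-level features before being passed to a classifier.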
Fake health news pieces have some characteristics that make them harder to detect than fake news on other domains. For example, they may mislead the reader by stating association as causation or by mixing up absolute risk and relative risk, which only require minor modifications based on the true information. Dai et al. [56] conducted an exploratory analysis to understand the characteristics of the data sets for health fake news detection, to analyze useful patterns, and to validate the quality of the FakeHealth data sets. With respect to social engagement on health news, they found that replies towards real news were more positive. In the same domain, Anoop et al. [55] targeted the detection of fake health news within online media sources that resembled traditional newspapers, given that the information was in the form of articles with some trustworthy information and abundant emotion-oriented narrative. For this reason, they based their approach for fake news detection on the different kinds of affective characteristics that were displayed on fake and true health news articles. Emotion features were extracted from a lexicon to feed classical and deep learning classifiers, and the results showed that emotion information increased performance for all classifiers. They also conducted preliminary experiments on detecting fake news on COVID-19, where they found a significant presence of emotional content in the narratives, indicating the applicability of emotion-oriented detection to identify fake news about this pandemic.
Zhang et al. [112] considered that most existing work on fake news was based on the emotional signals of content conveyed by the publishers but rarely focused on the emotions of comments aroused in the crowd, even though viral spread is fueled by the evocation of high-arousal emotions. Specifically, they explored whether emotions in news comments and their relationship to those in the content itself were helpful for fake news detection. They tested the approach using various deep learning-based classifiers on data sets in English and Chinese. The results on a Chinese fake news data set were good, while the results on the English data set were considerably lower, probably because that data set was originally designed for the detection of rumors, not fake news.

SA as a Feature for Fake News Detection Systems
Before fake news became a first-order issue, there was already some work in the literature that focused on determining the credibility of information circulating on social networks. In this line, Castillo et al. [84] focused on automatic methods for assessing the credibility of sets of tweets that spread information about a news event. They considered information from official and reputable sources as valuable information that other users synthesize and elaborate to produce derived interpretations in a continuous process. For their experiments, they collected 747 sets of tweets spreading news, each of which was manually tagged as "almost certainly true", "likely to be false", "almost certainly false", or "I can't decide". They noted that features based on sentiment analysis were very relevant for assessing credibility, as 3 out of the 10 best-performing features were sentiment-related ones: average sentiment score, number of positive sentiment words, and number of negative sentiment words. They also observed that tweets exhibiting positive sentiment were more related to non-credible information, while those with negative sentiment tended to be more related to credible information.
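The three sentiment-related features mentioned above can be illustrated with a short Python sketch that aggregates them over a set of tweets about one news event. This is only an approximation under stated assumptions: the word lists are toy examples, and the per-tweet sentiment score is taken to be the number of positive words minus the number of negative words, which is not necessarily how the scores were obtained in [84].

```python
import re
from statistics import mean
from typing import Dict, List

# Toy word lists (hypothetical, for illustration only).
POSITIVE_WORDS = {"great", "good", "happy", "love", "amazing"}
NEGATIVE_WORDS = {"bad", "sad", "terrible", "hate", "awful"}


def sentiment_features(tweets: List[str]) -> Dict[str, float]:
    """Aggregate sentiment features over a set of tweets spreading the same news."""
    n_pos = 0
    n_neg = 0
    scores = []
    for tweet in tweets:
        tokens = re.findall(r"[a-z']+", tweet.lower())
        pos = sum(1 for t in tokens if t in POSITIVE_WORDS)
        neg = sum(1 for t in tokens if t in NEGATIVE_WORDS)
        n_pos += pos
        n_neg += neg
        scores.append(pos - neg)  # assumed per-tweet score: positives minus negatives
    return {
        "avg_sentiment_score": mean(scores) if scores else 0.0,
        "num_positive_words": n_pos,
        "num_negative_words": n_neg,
    }
```

The resulting dictionary can be concatenated with other content-based and propagation-based features before training a credibility classifier.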
Ross and Thirunarayan [177] differed from [84] in that, instead of trying to determine the credibility of a tweet, they focused on ranking a collection of tweets by credibility and newsworthiness. In their base set of features, there were sentiment features, "has a happy emoticon" and "has a sad emoticon", and a sentiment score. In addition, they considered two features that aimed to capture when the sentiment of a tweet matched the overall sentiment of the topic hypothesizing that tweets that had similar sentiments to the topic would be credible, while tweets with dissimilar sentiments could mean that the tweet was non-credible or not newsworthy. They obtained features not only from the text of the tweet but also from the description of the authors of the tweet: the number of positive, negative, and curse words in the user description.
Popat et al. [102] addressed the problem of assessing the credibility of arbitrary textual claims that were expressed freely in an open-domain setting by automatically finding sources in news and social media. These sources were then fed into a logistic regression classifier in order to determine whether the claim was true or fake. A key element of their approach was the analysis of the style in which a claim was reported in an article, as it was assumed that a true claim was to be reported in an objective and unbiased language [178]. They captured the linguistic style by means of a set of lexicon-based features such as a list of positive and negative opinionated words; lists of assertive, factive, and report verbs; hedge words; implicative words; and discourse markers. This approach assumed that ample evidence or counter-evidence could be easily retrieved from a static snapshot of the web, but this is not true for newly emerging claims with sparse presence on the web. To overcome this limitation, Popat et al. [179] proposed to enhance the approach by determining the stance, reliability, and trend of retrieved sources of evidence or counter-evidence, which were used with the features already defined in [102] to feed a CRF classifier.
Hassan et al. [180] described a semi-automatic detection system that monitored live discourses, social media, and news to catch factual claims that were translated automatically into queries against a curated repository of fact-checks. For some claims, humans were brought into the loop, and in that case, the automated system assisted them in understanding and verifying the claims. The system was trained on a set of US general election presidential debates where each sentence was annotated as "Non-Factual", "Unimportant Factual", or "Check-worthy Factual". For each sentence, they extracted as features its sentiment polarity, its length, its bag of words, the number of occurrences of every part of speech, and its named entities [181]. All of this resulted in a set of 6615 features, among which sentiment turned out to be the third most relevant one.
Varol et al. [182] studied the development of computational methods for the early detection of information campaigns that can be used to spread fake news, propaganda, or financial market manipulation. In particular, they tried to determine whether a Twitter hashtag was promoted based on information that would be available even in cases where the nature of a trend is unknown. This is a difficult task because a minority of promoted conversations is blended into a majority of organic content. Additionally, promoted hashtags may preexist the moment in which they are given the promoted status and may have originated in an entirely regular form, thus displaying features that are largely indistinguishable from those of other regular hashtags about the same topic until the moment of promotion. They proposed to use 487 features extracted from network structure and dissemination patterns, language, content and sentiment information, timing signals, and user meta-data. Sentiment-based features included sentiment polarity and strength, positive and negative emoticons, happiness score, arousal, valence and dominance scores, and emotion score. From these, only measures related to emoticons were among the top 10 features for experiments performed with machine learning classifiers.
Content differences between fake and real political news were studied by Horne and Adali [10]. They kept track of the sentiment polarity for each sentence, stylistic features (the number of times each part of speech appears in an article, the number of stopwords, punctuation, quotes, negations, informal/swear words, interrogatives, and words that appear in all capital letters), word complexity (number of syllables in words, ratio of unique words, and number of common and specialized words), and sentence complexity (number of words per sentence, depth of the sentence's syntax tree, and depths of the syntax trees for noun and verb phrases). They found that positive and negative sentiments were statistically significant features to differentiate the body text of real and fake news in Silverman's BuzzFeed Political News Data set. This was not the case when only news headlines were considered, probably due to their short length.
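Some of the surface-level features listed above can be extracted with simple string processing, as in the following Python sketch. It covers only a subset (stopwords, punctuation, negations, interrogatives, all-capital words, ratio of unique words, and words per sentence), and leaves out those that require a part-of-speech tagger or a syntactic parser. The word lists are deliberately tiny and hypothetical, and treating interrogatives as question words is our own assumption rather than the definition used in [10].

```python
import re
import string
from typing import Dict

# Tiny illustrative word lists (hypothetical; real systems use full resources).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "it"}
NEGATIONS = {"not", "no", "never"}
INTERROGATIVES = {"who", "what", "when", "where", "why", "how"}


def stylistic_features(text: str) -> Dict[str, float]:
    """Extract a few stylistic and complexity features from raw article text."""
    # Naive sentence splitting on whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    lower = [w.lower() for w in words]
    n_words = len(words)
    return {
        "num_stopwords": sum(1 for w in lower if w in STOPWORDS),
        "num_punctuation": sum(1 for ch in text if ch in string.punctuation),
        "num_negations": sum(1 for w in lower if w in NEGATIONS),
        "num_interrogatives": sum(1 for w in lower if w in INTERROGATIVES),
        # Words of at least two letters written entirely in capitals.
        "num_all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "unique_word_ratio": len(set(lower)) / n_words if n_words else 0.0,
        "words_per_sentence": n_words / len(sentences) if sentences else 0.0,
    }
```

Such features are typically computed separately for the headline and the body text, since, as noted above, the two can behave quite differently.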
Rashkin et al. [99] studied the language of news media in the context of political fact-checking and fake news detection. For this purpose, they estimated the use of strongly and weakly subjective words with a sentiment lexicon under the hypothesis that subjective words were used to dramatize or sensationalize news stories. Other types of lexical cues they considered were hedge words, intensifiers, comparatives, superlatives, action adverbs, manner adverbs, and modal adverbs. They found that words that can be used to exaggerate (subjective words, superlatives, and modal adverbs) were all used more by fake news. However, the additional features were only useful with classical classifiers, since the improvements for classifiers based on deep neural networks were negligible with respect to using the text as the only input feature.
Vosoughi et al. [63] analyzed true and fake news stories disseminated on Twitter from 2006 to 2017. They found that fake stories inspired the emotions of fear, disgust, and surprise in replies, whereas true stories inspired anticipation, sadness, joy, and trust. They concluded that the emotions expressed in reply to fake news may illuminate additional factors that inspire people to share false news. Although emotion analysis is not the same as SA, they are closely related because both analyze the subjective content expressed in a text, and the classification models used are very similar in both cases, with the biggest difference being in the set of classes with which the texts to be processed are annotated. For this reason, we have included this article in the analysis.
Yang et al. [183] analyzed the BS Detector data set and concluded that the polarity of sentiment in true and fake news was different, with fake news tending towards negative sentiment. The standard deviation of fake news on negative sentiment was also larger than that of real news, thus some of the fake news would have very strong negative sentiment. With respect to the use of language in true and fake news, they observed that fake news had fewer words and sentences and that the variance of these values in true news was much smaller than that in fake news; true news had fewer question marks; fake news had a much larger ratio of capital letters; the median of negations in fake news was much smaller; fake news had fewer first-person and second-person and more third-person pronouns; true news had more lexical diversity; and there were differences in the usage of words in titles of fake and true news. Besides these explicit features, the approach of [183] was based on the consideration that there exist hidden patterns in the words and images used in fake news that can be captured with a set of latent features extracted via the multiple convolutional layers in a deep neural network. The novelty of this approach was that it proposed a unified model to analyze both text and images from fake news. In particular, they used two parallel CNNs to extract latent features from both textual and visual information. Then, explicit and latent features were projected into the same feature space.
Reis et al. [184] extracted a large set of features from news content by using language processing techniques as well as from news sources (bias, credibility and trustworthiness, and domain location) and from the environment (number of likes, shares and comments, and the rate at which comments are posted). One of the context features was the subjectivity and sentiment scores of the news text. All of these features were fed to several classic classifiers, with Random Forest and XGBoost obtaining the best performance on the BuzzFace data set. From their results, the authors observed that it was possible to choose a threshold so as to correctly classify almost all of the fake news (TPR close to 1) while misclassifying 40% of the true news (FPR of 0.4), and they considered that this could be useful in assisting fact-checkers in identifying stories worth investigating.
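The trade-off described above amounts to choosing an operating point on the classifier's score: the threshold is lowered until the desired true positive rate is reached, and the false positive rate at that point is the price paid. The following is a minimal sketch of that selection, assuming scores where higher means "more likely fake" and labels where 1 marks fake news. The function name and the toy data are hypothetical, and this is not the procedure of Reis et al. [184].

```python
def threshold_for_tpr(scores, labels, target_tpr=0.99):
    """Pick the highest score threshold whose TPR reaches target_tpr.

    scores: classifier scores, higher meaning "more likely fake".
    labels: 1 for fake news (positive class), 0 for true news.
    Returns (threshold, tpr, fpr); an item is flagged when score >= threshold.
    target_tpr is expected to be at most 1; otherwise None is returned.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one fake and one true item")
    # Try candidate thresholds from the strictest to the most permissive.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(s >= thr and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= thr and y == 0 for s, y in zip(scores, labels))
        tpr, fpr = tp / n_pos, fp / n_neg
        if tpr >= target_tpr:
            return thr, tpr, fpr
    return None


if __name__ == "__main__":
    # Toy data (assumption): eight items, four fake and four true.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
    print(threshold_for_tpr(toy_scores, toy_labels, target_tpr=1.0))
```

On larger data sets, the same operating points can be read from a ROC curve, for instance the one computed by `sklearn.metrics.roc_curve`, instead of recounting at every threshold.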
Shu et al. [111] analyzed the FakeNewsNet data set, finding that people express their emotions or opinions toward fake news through social media posts such as skeptical opinions and sensational reactions, with real news having a larger proportion of neutral replies over positive and negative replies, whereas fake articles have a bigger ratio of negative sentiment. In their preliminary experiments to classify fake news, they used base features from the text only, so they did not provide information about the impact that SA has on the detection of fake news in this data set.

Discussion
We have seen how SA can be used to improve the effectiveness of fake news detection systems in a number of ways. Tables 1-3 show a summary of the most notable characteristics of the systems discussed in the previous section. In these tables, CRF stands for Conditional Random Field, SVM stands for Support Vector Machine, MaxEnt stands for Maximum Entropy, LSTM stands for Long Short-Term Memory, KNN stands for k-Nearest Neighbors, TI-CNN stands for Text and Image information-based Convolutional Neural Network [183], LSTM-HAN stands for Long Short-Term Memory with Hierarchical Attention Networks [185], BiGRU stands for Bidirectional Gated Recurrent Unit, BERT stands for Bidirectional Encoder Representations from Transformers [142], NileTMRG stands for Nile Text-Mining Research Group [186], and HSA-BLSTM stands for Hierarchical Social Attention Bidirectional Long Short-Term Memory [187].
Some of the studies did not show experimental performance results on the fake news detection task for various reasons. Such systems are listed in Table 1: Diakopoulos et al. (2010) [168] performed a qualitative assessment on perceived utility rather than a quantitative analysis; Chatterjee and Agarwal (2016) [170] presented a qualitative assessment of results on a limited set of tweets; the purpose of Ross and Thirunarayan's [177] system was to rank tweets based on their credibility, which means that the performance measures used were not comparable to those commonly used in fake news detection systems; Hassan et al. [180] did not provide comprehensive results on the detection of false claims but provided a comparison of the detected percentages against journalistic news verification media such as CNN and PolitiFact; and Vosoughi et al. [63] did not present measures of news classification between fake and true news but instead presented statistical analyses with respect to the assumptions they made regarding the speed of news spread on Twitter.
Tables 2 and 3 show the systems for which performance measures were provided. We can see how the first models used specially created data sets to carry out the experiments described in each of the papers. Later, as time progressed, more and more data sets were used that could be considered standard in the sense that they are available for use by other researchers to corroborate the results or to test their own approaches to the task. We can also observe how, until 2019, most of the works tested a single machine learning system for the final phase of the detection system, while from that year on, it is common for a given paper to report results for different classifiers. To put these results in context with those obtained with other fake news detection methods not involving SA, Oshikawa et al. [24] reported that the best performing system on the LIAR data set achieved an accuracy of 0.457. In the case of FEVER, the best system achieved an accuracy of 0.647. In contrast, the accuracy of the best-performing systems increases to 0.944 for Buzzfeed Political News Data and 0.938 for the PolitiFact portion of FakeNewsNet. With regard to FakeNewsNet, Zhou and Zafarani [23] provided an analysis of the evolution of performance as lexical, syntactic, semantic, and discourse analysis features were considered.
It is difficult to compare results across systems due to the wide variety of data sets and performance measures used. Many of the systems report the F1 score along with precision P and recall R, which is natural because F1 is an aggregate measure of P and R (their harmonic mean). On the other hand, the systems that provide the accuracy Acc generally do not report F1 and vice versa. Only a couple of systems report both Acc and AUC, one provides AUC and F1, and another one shows Acc and F1. In practice, this diversity of measurements is not so relevant because the only systems that can be compared on the same evaluation data set are those created by the same authors. From a qualitative point of view, it is not uncommon that, although the results of many articles coincide in many of their conclusions, they present discrepancies in certain specific aspects. We consider this to be due to the relative youth of the fake news detection field. With the organization of dedicated shared tasks in 2020 and 2021 and the creation of large data sets in recent years, we expect that, in the near future, it will be increasingly common to see systems that are evaluated on these new standard corpora.
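To make the relation between these measures concrete, the sketch below computes P, R, F1, and Acc from the four cells of a binary confusion matrix, taking fake news as the positive class. The function name and the example counts are hypothetical and do not come from any of the reviewed systems. The example shows why Acc and F1 are not interchangeable: on an imbalanced test set, a system can reach a high accuracy while its F1 remains modest.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives, with "fake" taken as the positive class.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"P": precision, "R": recall, "F1": f1, "Acc": accuracy}


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 16 fake items and 84 true items.
    print(classification_metrics(tp=8, fp=2, fn=8, tn=82))
```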
For the SA methods used in these systems, a measure of the performance of the SA systems that have been applied is not normally given. This prevents us from knowing if an improvement in SA performance induces a significant improvement in final fake news detection performance. Furthermore, most papers use relatively simple, lexicon-based SA systems. The models that constitute the state-of-the-art in SA are based on machine learning. These systems are more expensive to build and require more computational resources than lexicon-based systems. It would be desirable to know if this cost increase is worth it, that is, if it would be reflected in an improvement in the effectiveness of the final fake news detection system.
Table 3. Main characteristics of fake news detection systems using SA: systems that provide quantitative performance results on the task (part 2 of 2).
There are a number of challenges that need to be addressed in the near future in the fields of SA and fake news detection:

• Multilingualism. Most of the work published on detecting fake news has been conducted on documents written in English. However, as this is a global issue, it is imperative to have systems that work in as many languages as possible. It was only recently that approaches that are able to deal with fake news for several languages have appeared [189,190]. For this purpose, effective multilingual SA methods may be applied [128,145].
• Multimedia content, particularly image and video, is becoming increasingly important on social media. Fake news are also increasingly accompanied by this type of content, so it will be necessary to enrich detection systems with multimedia SA methods [191].
• The most difficult fake news to detect are those in which falsehood has been subtly introduced, for example, expanding an authentic news piece with the addition of fake data or slightly modifying an authentic news story [192]. In this case, aspect-based SA [193] and adversarial training [194] can be of great help.
• High-performance AI systems, particularly those based on deep learning, behave like black boxes that provide good results but can hardly justify a given output in a human-understandable way. The creation of explainable AI systems [195] is becoming more and more important, and therefore, it is necessary to add explanation mechanisms both to the SA methods used and to the resulting fake news detection systems.
• It is known that NLP algorithms and resources may have inadvertently introduced biases [196,197]. This also applies to SA systems [198]. Adding algorithmic biases to psychological and sociological biases already present in people [199,200] could call into question the usefulness of automated fake news detection systems in the future. Therefore, we must add bias mitigation mechanisms in SA systems in order to avoid giving more or less credibility to a news item depending on the gender, race, geographic origin, religion, or any other personal circumstance of the writer or the people mentioned in the text.

Conclusions
The recent rise in the spread and social influence of fake news, driven by the popularization of social networks, has motivated a surge of interest in their automated detection. Since fake news tend to be written with the intent of conveying strong sentiments towards a given subject, sentiment analysis has proven to be a useful tool in the fake news detection toolbox, both when applied to news items themselves and to related information such as user comments.
In this article, we reviewed the field of fake news detection from the specific point of view of how sentiment analysis is being used to tackle the problem. We have seen that it has proven useful in a diverse range of systems, either as a core component or as a source of auxiliary features. Direct comparison between systems and approaches is so far difficult due to the wide range of data sets used, many of them ad hoc, but this problem is on track to being solved with the recent appearance of publicly available data sets and shared tasks.
Thus, we can say that the research field of fake news detection (and in turn, the application of sentiment analysis for this purpose) is currently in its transition from infancy to maturity. In this stage, the most pressing challenges in our view involve the need to guarantee the fairness, accountability, and transparency of systems (ensuring that results are explainable and free from harmful biases); the support for multilingualism and multimedia content; and the detection of fake news generated by subtly modifying authentic stories or by using text-generation algorithms.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. The BBVA Foundation accepts no responsibility for the opinions, statements, and contents included in the project and/or the results thereof, which are entirely the responsibility of the authors.