Using Social Media to Detect Fake News Information Related to Product Marketing: The FakeAds Corpus

Abstract: Nowadays, an increasing portion of our lives is spent interacting online through social media platforms, thanks to the widespread adoption of the latest technology and the proliferation of smartphones. Obtaining news from social media platforms is fast, easy, and less expensive compared with other traditional media platforms, e.g., television and newspapers. As a result, social media is now being exploited to disseminate fake news and false information. This research aims to build the FakeAds corpus, which consists of tweets for product advertisements. The aim of the FakeAds corpus is to study the impact of fake news and false information in advertising and marketing materials for specific products, and to identify which types of products (i.e., cosmetics, health, fashion, or electronics) are targeted most on Twitter to draw the attention of consumers. The corpus is unique and novel, in terms of the very specific topic (i.e., the role of Twitter in disseminating fake news related to product promotion and advertisement) and also in terms of its fine-grained annotations. The annotation guidelines were designed under the guidance of a domain expert, and the annotation was performed by two domain experts, resulting in a high-quality annotation, with agreement rate F-scores as high as 0.815.


Introduction
Social media is a very fast and easy-to-access channel that disseminates news and, every second of the day, huge numbers of people are accessing and interacting with online news [1]. Over the last decade, social media channels, including Twitter, Facebook, YouTube, and Instagram, have become an integral part of our daily lives [2].
As an increasing amount of our time is spent interacting online through social media platforms, more and more people tend to seek out and consume news from social media sources, rather than traditional news organizations. For example, 62 percent of U.S. adults got their news from social media in 2016, while in 2012, only 49 percent reported reading the news on social media [3]. Twitter is a very popular social media platform and its number of users has been growing rapidly since its creation in 2006. Today, it represents a very important and widely used source for news dissemination and also for marketing and promoting new products. Twitter provides a space where people can interact with each other and maintain social ties. People use Twitter to share their daily activities, happenings, thoughts and feelings with their contacts, which makes Twitter both a valuable data source and a great target for various areas of research and practice. According to a report published in 2021 [4], Twitter has 340 million users and delivers 500 million tweets a day, or 200 billion tweets a year [4,5].
It is fast and easy to use social media to obtain the latest news or to see advertisements for different products [6]. Any news spreads much faster via social media, no matter where in the world an event takes place [1]. However, despite the advantages provided by social media platforms, the credibility and quality of news on social media is lower than that of traditional news channels, including TV, newspapers, and other trusted news sources, due to the freedom afforded to social media channels in expressing (false) ideas and circulating (fake) news and (misleading) adverts [1]. Therefore, social media enables the wide and rapid dissemination of "fake news", i.e., low-quality news that contains intentionally false information. Although the survey report [7] found that almost 60% of users expect news on social media to be inaccurate, this still leaves millions of people who will spread (retweet) fake news believing it to be true.
Twitter is widely used to spread false information promoting products and brands. For example, in the United States alone, 60 percent of adults who depend on social media for news consumption also share false information [5,8]. Individuals receive advertisements on social media based on their interests and their awareness of the facts and content mentioned in the circulated advertisements. Around 54% of people around the globe have expressed their concerns about fake news [1]. Additionally, the younger generation is more heavily influenced by online-based news than older generations. This, in turn, results in the quick dissemination of news to millions, or even billions, of people [9]. Moreover, online advertisements for products tend to target the younger generations and try to promote products relevant to their lifestyles, such as skincare products and technological gadgets, in eye-catching ways, to reach as many people as possible around the globe [1].
The widespread dissemination of false information has the potential to have an extremely negative impact on individuals and society [10]. For example, in 2008, a false report about the bankruptcy of United Airlines' parent company caused the company's stock price to drop by 76%. Twitter [11,12] has been widely used to spread fake and biased news during the last two U.S. presidential election periods [13]. Following the last presidential election, it was estimated that over 1 million tweets were related to the fake news story "Pizzagate". The term "fake news" was even named the word of the year by the Macquarie dictionary in 2016 [3].
Fake news is created for a variety of reasons, but mainly for financial and political gain [3]. For example, as most of the fake news is spread by propagandists, it usually conveys influential messages and persuades individuals, in different ways, to accept biased or false information [3,14,15]. From marketing perspectives, fake news also presents false information to promote a specific idea or product. If spread with malicious intent, fake news can be used by a competitor to damage the reputation of a specific brand or company.
According to Twitter's policy, a warning tag will be applied to any tweet containing disputed or misleading information related to COVID-19 that goes directly against guidance on COVID-19 from authoritative sources. However, Twitter is still working on the public conversation to make sure that credible and authentic information is available to the user [16].
Therefore, fake news detection on social media in general, and Twitter in particular, has recently become an emerging research topic that is attracting tremendous attention [3]. Although considerable effort has been made towards fake news detection on websites and news articles, very little effort has been put into exploring Twitter. To the best of our knowledge, no prior work has focused solely on Twitter to study the influence of fake news and false information on marketing and promoting products, in order to tackle the rise and spread of fake news and to enhance the automatic detection of fake news and false information on this specific social media platform. To facilitate research into fake news detection on Twitter about misleading advertisements for differing products that target the consumer, and to help mitigate the negative effects caused by fake news (to the benefit of both consumers and the news ecosystem), it is critical that we develop methods to automatically detect fake news on social media. Machine Learning (ML)-based text mining (TM) tools have the potential to automatically detect fake news and false information related to product marketing. However, developing TM tools is reliant upon textual corpora, in which pertinent information is marked up by expert annotators. Such annotated corpora serve both as training datasets for ML-based Named Entity Recognition (NER) methods and as a gold standard for the systematic evaluation of new methodologies.
This research aims to explore how Twitter is used to disseminate false marketing information through deliberately misleading/fake adverts. The contribution of this research is threefold:
1. To use Twitter as a social media resource to explore the use of fake and false news to promote products.
2. To build annotated datasets for fake and real advertisements related to cosmetics, fashion, health and technology products.
3. To make the corpus freely available in order to stimulate the development of ML-based Text Mining (TM) systems for the automatic extraction and classification of details relating to fake news intended to mislead the consumer by promoting false products.
The developed TM systems can ultimately be a useful resource for the research community to further the study of social media credibility in promoting products and circulating fake advertisements.

Related Work
Most of the previous work tackles the problem of detecting fake news using textual sources from news articles. Fake news detection on social media in general, and Twitter in particular, has recently become an emerging research topic that is attracting considerable attention [3]. However, fake news detection is a very challenging task, as the purpose of the distribution of such news and events is to deliberately mislead people [17].
Previous studies on fake news detection in social media focused on different topics, including bot detection [5,18,19], predicting spammer behavior and detecting spammers [20][21][22][23][24][25], classifying tweets related to natural disasters as fake, spam or legitimate [26], and clickbait detection [27]. Although there are many studies focused on spam detection problems related to online reviews for products [28][29][30][31], to the best of our knowledge, there is no prior work devoted to the study of fake advertisements marketing products on social media (i.e., Twitter in particular).
Supervised ML methods are widely used to identify fake news, and most of the existing work formulates fake news detection as a binary classification problem. The main concern of this approach is to find effective features for training and evaluating the ML models [32]. Supervised ML-based approaches require a reliable pre-annotated dataset to train a classification model on a set of features that help the model to recognize and correctly classify the information in the unseen dataset [3,32]. The features that have been used by fake news detection ML algorithms generally fall into two categories: news content and social context [33]. The content-based approaches usually rely on features such as linguistics-based [34] and visual-based features [35,36], while social context approaches incorporate features from users' profiles [37], post content and social networks [3,32]. However, supervised ML algorithms are strongly dependent on domain knowledge for designing features, which makes the method difficult to generalize to new tasks [38]. To the best of our knowledge, no prior work has focused on the influence of fake news on marketing products by giving false information and using false advertisements, which makes it difficult to adapt ML algorithms that are trained in different domains.
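To make the two feature families described above more concrete, the following minimal Python sketch derives a few content-based (linguistic) features from a tweet's text and a few social-context features from its author's profile. The `Tweet` record, its field names and the specific features are illustrative assumptions rather than the feature set of any cited system, and the example tweet is invented.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical tweet record; the field names are assumptions for illustration."""
    text: str
    followers: int
    following: int


def content_features(text: str) -> dict:
    """Content-based features: simple linguistic cues computed from the text alone."""
    tokens = text.split()
    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        "n_tokens": len(tokens),
        "n_exclamations": text.count("!"),
        "n_hashtags": sum(1 for t in tokens if t.startswith("#")),
        "upper_ratio": upper / len(letters) if letters else 0.0,
    }


def context_features(tweet: Tweet) -> dict:
    """Social-context features: cues taken from the author's profile."""
    return {
        "followers": tweet.followers,
        "following": tweet.following,
        # +1 avoids division by zero for accounts that follow nobody
        "follower_ratio": tweet.followers / (tweet.following + 1),
    }


def feature_vector(tweet: Tweet) -> dict:
    """Concatenate both feature families into one record for a classifier."""
    return {**content_features(tweet.text), **context_features(tweet)}


# Invented example tweet, not taken from any corpus
example = Tweet(
    text="AMAZING cream erases wrinkles overnight!!! #skincare",
    followers=12,
    following=3,
)
features = feature_vector(example)
```

A feature dictionary of this kind could then be passed to any binary classifier. The dependence on hand-picked features is the limitation noted above: cues chosen for one domain may not transfer to another.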
Fake news detection on social media presents unique challenges that make existing detection algorithms from traditional news media ineffective or not applicable [3]. The lack of manually labeled fake news datasets for text drawn from social media (i.e., Twitter) limits the advancement of ML-based approaches that could automatically detect fake news in social media. Examples of publicly available datasets are LIAR [3], BuzzFeed News [34,39] and CREDBANK [40]. Of these, only the text in the CREDBANK dataset was drawn from Twitter; the others include text drawn from news websites. Recently, most of the work on fact checking comes in the form of shared tasks, e.g., CheckThat 2021 at the Cross-Language Evaluation Forum (CLEF) [41,42]. The shared task consists of various tasks related to fact checking tweets about COVID-19 and predicting the veracity of a news article and its topic (i.e., health, election, crime, climate, economy and education) [43]. Table 1 shows the comparison of the characteristics of the popular datasets, in terms of size, text genre, topic and annotation level. However, all the mentioned datasets are annotated at sentence level, to either classify the text as fake or real, or to rate its credibility level. Moreover, no prior dataset was dedicated to the impact of fake news and false advertisements on marketing and promoting products. Fake news is false information and facts disseminated with the main intention of deceiving the reader [45]. The term 'fake news' is often described in related literature using different terms, including 'misinformation', 'disinformation', 'hoax', and 'rumor', which are actually different variations of false information. Most of the previous works on fact checking and fake news detection [46] examine the problem from the angle of veracity classification.
However, there is no system that can automatically and completely stop the dissemination of fake news in social media and the consequent negative impact of the fake news on society without the involvement of humans [46]. Classical ML approaches can be applied to automatically extract fake news information, given that they have similar cases in the training dataset [9,17]. However, the development of text-mining tools depends on the availability of an annotated corpus.
In this research, we built a new corpus, named FakeAds. The corpus is collected from tweets for the topic of fake news in the marketing domain. The corpus is unique and novel, in terms of the very specific topic in the fake news domain (i.e., knowing how fake news influences marketing) and the fine-grained annotation provided at word level to classify each product into one of the following classes: fashion, cosmetics, health, and electronics.

Results and Discussion
To ensure that the generated corpus was of high quality, the annotations provided in the corpus closely followed the guidelines set by the annotators, who were English native speakers and experts in the field of annotation. We calculated the Inter Annotator Agreement (IAA) between the two annotators and a high IAA score provided assurance that the corpus annotations were reliable and of high quality.
We followed a number of other related studies [47][48][49] by calculating the IAA in terms of F-score, which is the same whichever set of annotations is used as the gold standard [49,50]. In this study, the annotations produced by the first annotator were considered as the 'gold standard', i.e., the set of correct annotations, and the total number of correct entities was the total number of entities annotated by this annotator. Based on the gold standard, the IAA was calculated by means of precision, recall and F-score.
Precision (P) is the percentage of the entities annotated by the second annotator that agree with the annotations produced by the first annotator, which were assumed to be the gold standard. It is calculated as the ratio between the true positive (TP) entities and the total number of entities annotated by the second annotator, i.e., the sum of true positives (TPs) and false positives (FPs), according to the following formula:
$$P = \frac{TP}{TP + FP}$$
Recall (R) is the percentage of gold-standard entities that were also recognized by the second annotator. It is calculated as the ratio between the TP and the total number of annotations in the gold standard, i.e., the sum of true positives and false negatives (FNs), according to the following formula:
$$R = \frac{TP}{TP + FN}$$
The F-score is the harmonic mean of precision and recall and is calculated according to the following formula:
$$F\text{-score} = \frac{2 \times P \times R}{P + R}$$
Table 2 shows the statistics and the IAA for the annotation of product types in the FakeAds corpus. Overall, the annotators agreed most of the time on annotating cosmetics and health products, and the F-scores for these two classes were the highest, at 0.94 and 0.86, respectively. These scores are high because the mentions and examples of cosmetics and health products were very straightforward, and the annotators could easily recognize and classify them. On the other hand, the F-scores for fashion and electronics products were generally lower than those for cosmetics and health, because tweets about electronics and fashion products were the fewest in the corpus compared with the number of examples of cosmetics and health products. In addition, there were a greater number of disagreements between the annotators with regard to which types of products belonged to these two classes. For example, the second annotator annotated general words, e.g., clothes, bags, jumpers, etc., and this contributed to the low precision, especially for the fashion class, where the second annotator annotated irrelevant items as fashion (e.g., very general descriptions of a fragrance, such as "luxurious scents", rather than mentions of specific products). The low recall for electronics products arose because the second annotator did not annotate every mention of electronic products and did not cover the full range of electronic devices.
For example, the second annotator did not correctly annotate electronic devices related to skincare and healthcare, such as skincare devices and airbrushes.
To show the importance of our generated dataset, we compared the FakeAds corpus with other publicly available datasets in the fake news detection domain, which are reported in Table 1. In particular, we compared our dataset with the CREDBANK [24] and CheckThat sub-Task 1 (check-worthiness) [26] Twitter datasets, because they used Twitter as a textual source and they share some characteristics with the FakeAds corpus, e.g., they are annotated for similar classes related to reporting false or uncertain information. As shown in Table 2, the FakeAds corpus differs from the existing datasets in terms of its very specific domain, which is false advertisement to promote products, and also in its rich annotations at two levels: at tweet level, where the tweet is classified as real or fake, and at mention level, where the product mention is given one of the following classes: health, cosmetics, fashion or electronics. This makes it a valuable resource for training and evaluating ML-based techniques. The results of the annotation are satisfactory, with an F-score of 0.815.
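The agreement calculation described in this section can be sketched in a few lines of Python. The sketch assumes that an entity counts as a true positive only when both annotators mark exactly the same span with the same class; the matching criterion actually used for the corpus is not stated here, so exact matching is an assumption, and the entity tuples below are invented toy data.

```python
from typing import Set, Tuple

# (tweet_id, start_offset, end_offset, product_class): hypothetical entity encoding
Entity = Tuple[int, int, int, str]


def iaa_scores(gold: Set[Entity], other: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) of `other` measured against `gold`."""
    tp = len(gold & other)   # entities both annotators marked identically
    fp = len(other - gold)   # marked only by the second annotator
    fn = len(gold - other)   # marked only by the gold-standard annotator
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy annotations (invented for illustration)
annotator_1 = {
    (1, 0, 8, "cosmetics"),
    (1, 20, 27, "health"),
    (2, 5, 11, "fashion"),
    (3, 0, 6, "electronics"),
}
annotator_2 = {
    (1, 0, 8, "cosmetics"),
    (2, 5, 11, "fashion"),
    (2, 15, 22, "fashion"),
}

p, r, f = iaa_scores(annotator_1, annotator_2)
# Swapping the gold standard exchanges precision and recall but leaves F unchanged
p_swapped, r_swapped, f_swapped = iaa_scores(annotator_2, annotator_1)
```

Swapping the roles of the two annotators turns false positives into false negatives and vice versa, so precision and recall trade places while their harmonic mean stays the same. This is why the F-score does not depend on which annotator is treated as the gold standard.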

Corpus Construction
The FakeAds corpus consists of tweets that were collected from Twitter using the TweetScraper tool [51] for the period between 1 January 2015 and 30 December 2020. We targeted this particular five-year span as product marketing through social media was very common during this period. The following list of keywords was used to collect the relevant tweets: marketing, advertisement, digitalMarketing, socialmediaMarketing and onlinePromotion. We used the hashtagify tool [52] to find highly ranked, trending and popular hashtags, and also to find hashtags highly related to marketing and advertising. We found that the search keywords we used correspond to hashtags that hashtagify ranks as highly related to marketing. The tweets were further filtered by the annotators of this task, who are English instructors, and only tweets that include information directly related to the task in question were retained, resulting in 5000 tweets. Manual inspection of the collected tweets revealed that the products discussed in the tweets generally belong to one of the following broad categories: cosmetics, health, fashion, and electronics. Thus, these categories were used as the classes for the products in the FakeAds corpus.
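As a rough illustration of the collection criteria above, the following Python sketch keeps a tweet only if it was posted inside the collection window and carries at least one of the search keywords as a hashtag. It does not use the TweetScraper API. Treating the keywords as case-insensitive hashtags, and the helper names `hashtags` and `is_candidate`, are assumptions made for illustration.

```python
from datetime import date

# Search keywords from the text, lower-cased for case-insensitive matching (assumption)
KEYWORDS = {
    "marketing",
    "advertisement",
    "digitalmarketing",
    "socialmediamarketing",
    "onlinepromotion",
}
START, END = date(2015, 1, 1), date(2020, 12, 30)  # collection window from the text


def hashtags(text: str) -> set:
    """Extract lower-cased hashtags, dropping the '#' and trailing punctuation."""
    return {
        token.lstrip("#").rstrip(".,!?:;").lower()
        for token in text.split()
        if token.startswith("#")
    }


def is_candidate(text: str, posted: date) -> bool:
    """True if the tweet falls in the window and uses at least one search hashtag."""
    return START <= posted <= END and bool(hashtags(text) & KEYWORDS)


# Invented example tweets
in_window = is_candidate("New serum out now! #DigitalMarketing #skincare", date(2019, 5, 1))
too_late = is_candidate("New serum out now! #DigitalMarketing", date(2021, 1, 1))
off_topic = is_candidate("Lovely weather today", date(2019, 5, 1))
```

A filter of this kind only narrows the pool of candidates; as described above, the final selection of 5000 tweets was made manually by the annotators.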
The tweets were annotated at two levels:
1. At tweet level, so that tweets were annotated as fake or real.
2. At word level, so that for each tweet, the product was classified into one of the following classes: cosmetics, health, fashion, and electronics.
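One possible in-memory representation of these two annotation levels is sketched below in Python. The record layout, the token-index spans and the conversion to BIO tags (a common input format for NER models) are assumptions made for illustration; they are not the release format of the corpus, and the example tweet is invented.

```python
from dataclasses import dataclass, field
from typing import List

PRODUCT_CLASSES = ("cosmetics", "health", "fashion", "electronics")


@dataclass
class Mention:
    """Word-level annotation: a token span and its product class (hypothetical layout)."""
    start: int           # index of the first token of the mention
    end: int             # index one past the last token of the mention
    product_class: str


@dataclass
class AnnotatedTweet:
    """Tweet-level label plus zero or more word-level product mentions."""
    tokens: List[str]
    label: str           # "fake" or "real"
    mentions: List[Mention] = field(default_factory=list)


def to_bio(tweet: AnnotatedTweet) -> List[str]:
    """Convert the mention spans into one BIO tag per token."""
    tags = ["O"] * len(tweet.tokens)
    for mention in tweet.mentions:
        if mention.product_class not in PRODUCT_CLASSES:
            raise ValueError(f"unknown product class: {mention.product_class}")
        tags[mention.start] = f"B-{mention.product_class}"
        for i in range(mention.start + 1, mention.end):
            tags[i] = f"I-{mention.product_class}"
    return tags


# Invented example tweet
example = AnnotatedTweet(
    tokens=["This", "vitamin", "C", "serum", "cures", "acne", "overnight"],
    label="fake",
    mentions=[Mention(start=1, end=4, product_class="cosmetics")],
)
bio_tags = to_bio(example)
```

Keeping the tweet-level label and the word-level mentions in one record lets the same data feed both a fake/real classifier and a product-mention tagger.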
In the tweet-level annotation task, the tweets were annotated as either fake or real. This annotation task is a binary classification task, and we used the Amazon Mechanical Turk (AMT) tool to annotate the tweets. AMT is a crowdsourcing marketplace introduced by Amazon, which is becoming increasingly popular as an annotation tool for NLP research, including word sense disambiguation, word similarity, text entailment, and temporal ordering [53,54]. To ensure the quality of annotations produced by AMT, we applied the country and high acceptance rate crowd filters, so that only annotators who had a 95% success rate on previous AMT Human Intelligence Tasks (HITs) and who were located in the United States were accepted for the task. We chose these two filters because they narrow the pool of workers and have been shown in previous studies to be effective in reducing incidents of spamming and cheating [55,56]. The same set of annotation guidelines was used by all the annotators to ensure high-quality and reliable annotations. As per the guidelines, the annotators needed to consider two factors before deciding if a tweet is fake or real: the account-related features (e.g., profile information such as the number of followers and following users) and the tweet-related features (e.g., lexical and syntactical features of the tweet) [57].
Each of the 5000 tweets in our corpus was annotated by three workers, resulting in 5000 × 3 = 15,000 annotations in total. For each tweet, the class chosen by the majority of workers was assigned as the tweet's label: fake or real. In total, we collected 5000 tweets, out of which 2914 (58.28%) were labeled as real news while 2086 (41.72%) were labeled as fake news. Figure 1 shows the distribution of tweets that contained either fake or accurate content in the FakeAds corpus. While about 41% of the tweets in FakeAds were annotated as fake, tweets distributing real information related to product promotion still represent a higher percentage of product advertisements. This was something we expected, as the Twitter platform is used by many trustworthy organizations to disseminate real adverts and factual information.
For the multi-class annotation task, the tweets were annotated at word level to mark mentions of products belonging to the following classes: cosmetic, fashion, health and electronics. The annotation was done through the COGITO service, where each tweet was annotated by two annotators for the entity types related to the product types, using the same set of annotation guidelines provided in Supplementary Materials File S1. The annotation included marking up all entity mentions in the corpus related to the four semantic types listed in Table 3.
Table 3. Annotated entity classes in the FakeAds corpus.

| Entity Type | Description |
|---|---|
| Cosmetic | A product mention related to skincare, body care or make-up, for example, lipsticks, creams, etc. |
| Electronic | A mention of a product that requires electric currents or electromagnetic fields to work. Examples are electronic devices, phones, cameras, computers, etc. |
| Health | A product mention related to supplement(s) that promote the wellbeing of individuals, e.g., vitamins, herbs, etc. |
| Fashion | A product mention related to accessories such as clothing, shoes, bags, jewelry, fragrances, etc. |

Figure 2 describes the most common product types and their distribution in the FakeAds corpus. As shown in Figure 2, there was considerable emphasis on Twitter on promoting cosmetic products, e.g., skincare, makeup, etc. The cosmetic class represents 83% of the annotations in the FakeAds corpus. Health-related products come next, with 10% of the total annotations. The least-targeted products in advertisements on Twitter, and in the FakeAds corpus in particular, were electronics and fashion. It was also noted that people on Twitter tended to discuss fashion and electronics products less frequently in the context of advertising when compared with cosmetics. Table 4 summarizes the statistics of the corpus, showing the total number of fake and real annotations and the distribution of the different products among fake and real tweets. Figure 3 visualizes the distribution of the products that are targeted most by fake news and false information, which are cosmetic and health products, compared to the real information for these two types of products. On the other hand, it is worth mentioning that the amount of real information related to electronic and fashion products is significantly higher than the amount of fake news that targets these two types of products. This is because online advertisements for products on social media platforms tend to target the younger generations and try to promote products relevant to their lifestyles, such as skincare products and different supplements.
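The tweet-level labelling step described earlier in this section, in which each tweet receives labels from three workers and the majority class is kept, can be sketched as follows. The function names and the toy votes are illustrative assumptions; with an odd number of workers and two classes, a strict majority always exists, so no tie-breaking rule is needed.

```python
from collections import Counter
from typing import Dict, List

LABELS = {"fake", "real"}


def majority_label(worker_labels: List[str]) -> str:
    """Return the label chosen by most workers (binary labels, odd number of votes)."""
    if len(worker_labels) % 2 == 0:
        raise ValueError("an odd number of votes is needed to rule out ties")
    if not set(worker_labels) <= LABELS:
        raise ValueError("labels must be 'fake' or 'real'")
    label, _count = Counter(worker_labels).most_common(1)[0]
    return label


def label_shares(final_labels: List[str]) -> Dict[str, float]:
    """Fraction of tweets assigned to each final label."""
    counts = Counter(final_labels)
    total = len(final_labels)
    return {label: counts[label] / total for label in sorted(LABELS)}


# Toy votes for four tweets (invented for illustration)
votes = {
    "tweet_1": ["fake", "real", "fake"],
    "tweet_2": ["real", "real", "real"],
    "tweet_3": ["real", "fake", "real"],
    "tweet_4": ["fake", "fake", "fake"],
}
final = {tweet_id: majority_label(v) for tweet_id, v in votes.items()}
shares = label_shares(list(final.values()))
```

The same share calculation applied to the corpus counts reported above gives 2914 / 5000 = 58.28% real and 2086 / 5000 = 41.72% fake.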

Conclusions
Our central goal in this paper was to provide the research community with a dataset that could serve the study of fake news detection on Twitter, targeting information that misleads the consumer by falsely promoting products. The corpus consists of 5000 tweets, annotated at two levels: (1) each tweet is annotated as fake or real, and (2) each tweet is annotated at word level, to classify the product into one of the following classes: cosmetics, health, fashion, or electronics. We envision that this will be a useful data resource for the community to further the study of social media credibility in promoting products and circulating fake advertisements. The proposed research could also provide a broader view of fake news related to marketing and help to enhance Twitter's policy to provide more credible and authentic information related to promoting products. It will also help to give an idea about which types of products are targeted more by propagandists to distribute fake news in an attempt to attract more consumers. The generated corpus can serve as a gold standard for the development and evaluation of TM tools that can classify each tweet as real or fake and extract product mentions related to cosmetics, health, fashion and electronics. For example, in the future, we are planning to apply classical ML-based NER methods and compare them with state-of-the-art contextual word embeddings (e.g., BERT) on the FakeAds corpus to automatically classify tweets as fake or real, and also to extract the product type discussed in the tweets.

Limitations and Future Works
Despite the contributions presented earlier, we acknowledge certain limitations. This work's main limitation lies in the product classes. While we tried to make sure that the product categories (i.e., classes) were broad enough to include all the products mentioned in the FakeAds corpus, the categories used may not be broad enough to include products that were not mentioned in the FakeAds corpus; for example, products related to sports equipment, furniture, cars, etc. Another potential limitation is the size of the corpus: due to the cost of manual annotation, in terms of time and money, we were only able to annotate 5000 tweets. However, the size of the corpus is comparable to popular fake news datasets mentioned in Table 1 and exceeds the size of some datasets, e.g., the BuzzFeed and CheckThat datasets. ML-based text mining methods require a large dataset for accurate models to be trained and tested and, hence, increasing the size of the corpus would give more accurate results for training and evaluating ML models. In the future, we are planning to use the generated corpus as a gold standard for the development and evaluation of TM tools, and we are also planning to broaden the range of the corpus by including more product classes, and also by including text from other social media platforms, e.g., Facebook.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/data7040044/s1, File S1: The annotation guidelines.
Data Availability Statement: The dataset will be freely available on the Kaggle website upon the approval of the paper.