1. Introduction
The pervasiveness of online social networks (OSNs) has given people unprecedented power to share information and reach a huge number of peers, both in a short amount of time and at no cost. This has shaped the way people interact and access information: currently, it has become more common to access news on social media, rather than via traditional sources [
1]. A drawback of this trend is that OSNs are very fertile grounds to spread false information [
2]. To avoid being gulled, users therefore need to continuously check facts, and possibly establish reliable news access channels.
Countering unverified information has become an important topic of discussion in our society. In this hugely inter-connected world, we constantly witness attempts at targeting and persuading users, both for commercial [
3] and political purposes [
4]. A single piece of false or misinterpreted information can not only severely damage the reputation of an individual, but also influence public opinion and affect the lives of millions of people [
4]. In particular, social networking is a fundamental interaction channel for people, communities, and interest groups. Not only does social networking offer the chance to exchange opinions, information, and multimedia content: it is also an invaluable source of data [
3]. Unhealthy online social behaviors can damage this source by acting as echo chambers for toxic content, and may reduce users' trust in social media by raising privacy concerns [
5,
6].
The very structure of OSNs, i.e., the underlying topology induced by the connections among the users, is key to amplifying the impact of fake news. The presence of “influencers” on social networks, such as celebrities or public figures with an uncommonly high number of followers, drastically speeds up information spreading [
7]. In this context, another issue is the presence of bots, i.e., autonomous agents that aim to flood communities with malicious messages, in order to manipulate consensus or to influence user behavior [
8].
Epidemic models [
9] are viable and common tools to study the spreading of fake news on social networks. They were first introduced to model the spreading of diseases among interacting agents [
10]. In these models, people are considered as the nodes of a social network graph, whose edges convey the social relationship between the corresponding nodes. We consider a variant to the classical compartmental susceptible, infected, recovered (SIR) model: the susceptible, believer, fact-checker (SBFC) model [
9]. Here, a node is initially susceptible to being gulled by a fake news item; it may then become a believer (i.e., it believes in the fake news item) when getting in contact with another believer; it may also become a fact-checker (i.e., it believes that the news item is fake) if it verifies the news item, or if it comes in contact with another fact-checker node.
The key question driving our work is the following: Can we perform an agent-based simulation of an online social network that reproduces key aspects of fake news injection and spreading and at the same time is realistic enough to enable a reliable analysis of multiple OSN scenarios? In fact, the spreading of fake news has several peculiar characteristics that differentiate it from the spreading of common diseases. These aspects include the different connectedness of agents in an OSN (e.g., bots, influencers, and common agents), the presence of fact-checkers, the spreading of debunking information that reduces the engagement of fake news, and the time-varying aspects of OSN user activity. Such peculiarities need to be taken into account in order to accurately characterize fake news spreading.
To answer the above research question, in this study, we build on the SBFC model to introduce some realistic extensions. In particular, we address the fundamental role of influencers and bots in the spreading of fake news online: this helps understand how fake news spreads over a social network and provides insight into how to limit its impact. Moreover, we consider the effects of dynamical network access patterns, time-varying engagement, and different degrees of trust in the sources of circulating information. These factors concur to make agent-based simulations more realistic, in the perspective that they should become a key tool to study the most effective fake news spreading strategies, and thus inform the most appropriate corrective actions.
In more detail, the contributions of this paper are the following:
We propose an agent-based simulation model, where different agents act independently to share news over a social network. We consider two adversarial spreading processes, one for the fake news item and a second one for its debunking;
We differentiate among agents’ roles and characteristics, by distinguishing among common users, influencers, and bots. Every agent is characterized by its own set of parameters, chosen according to statistical distributions in order to create heterogeneity;
We propose a network construction algorithm that favors the emergence both of geographical user clusters and of clusters of users with common interests; this improves the degree of realism of the final simulated social network connections; moreover, we account for the presence of influencers and bots and for unequal trust among agents;
We propose ways to add time dynamical effects in fake news epidemic models, by modeling the connection time of each agent to the social network and the decay of the population’s interest in a news item over time.
The remainder of this paper is organized as follows. In
Section 2, we survey related work. In
Section 3, we introduce the model of the agents and the dynamics of our epidemic model, as well as our network construction algorithm. In
Section 4, we present the results of our simulations. In
Section 5, we discuss our findings and contextualize them in light of the current literature, before drawing concluding remarks, discussing the limitations of our work, and presenting future directions in
Section 6.
2. Related Work
Detecting fake news and bots on social networks and limiting their impact are active research fields. Numerous approaches tackle this issue by applying techniques from fields as diverse as network science, data science, machine learning, evolutionary algorithms, and simulation and performance evaluation.
For example, the comprehensive survey in [
3] showed that, among other uses, data science can be exploited to identify and prevent fake news in digital marketing, thus ruling out false content about companies and products and helping customers avoid predatory commercial campaigns [
3]. Given the large impact that data and genuine user behavior have on many processes, including data-driven innovation [
5], preserving a healthy online interaction environment is extremely important. Among other consequences, this may reduce concerns about the use of hyper-connected technologies, increasing trust and enabling the widespread deployment of high value-added services, as recently seen in contact tracing applications for pandemic tracking and prevention [
This is especially important, as not only did fake news contribute to spreading uncertainty and confusion with respect to COVID-19, but it was also shown [11] that the majority of the authoritative news regarding scientific studies on COVID-19 tends not to reach and influence people. This points to a lack of effective communication from health authorities and politicians, and to a tendency to spread misleading content that can only be countered by an increased presence of health authorities on social channels.
In the following, we survey key contributions to fake news and bot detection (
Section 2.1), as well as modeling fake news spreading as an epidemic (
Section 2.2). As a systematic literature review falls outside the scope of this paper, we put special focus on those contributions that tackle fake news spreading from a simulation point of view, which forms the main foundation of our work. We refer the reader to the surveys in [
12,
13,
14] and the references therein, for an extensive survey of the research in the field of fake news, as well as of general research directions.
2.1. Fake News and Bot Detection
In recent years, there has been a growing interest in analyzing and classifying online news, as well as in telling apart human-edited content from content produced by bots.
Machine learning techniques have been heavily applied to the problem of classifying news pieces as fake or genuine. In particular, a recent trend is to exploit deep learning approaches [
15,
16,
17,
18]. A breakthrough in fake news detection comes from geometric deep learning techniques [
19,
20], which exploit heterogeneous data (i.e., content, user profile and activity, social graph) to classify online news.
In the same vein, the work in [
21] employed supervised learning techniques to detect fake news at an early stage, when no information exists about the spreading of the news on social media. Thus, the approach relies on the lexicon, syntax, semantics, and argument organization of a news item. This aspect is particularly important: As noted in [
2], acting on sentiment and feelings is key to pushing a group of news consumers to also produce and share news pieces. As feelings arise quickly and lead to quick action, the spreading of fake news develops almost immediately after a fake news item is sown into a community. The process then tends to escape traditional fact-based narratives from authoritative sources.
A common way to extract features from textual content is to perform sentiment analysis, a task that has been made popular by the growing availability of textual data from Twitter posts [
22,
23,
24]. Sentiment analysis is particularly useful in the context of fake news classification, since real news is associated with different sentiments with respect to fake news [
25]. For a more in-depth comparison among different content-based supervised machine learning models to classify fake news from Twitter, we refer the interested reader to [
26].
When analyzing fake news spreading processes, it must be taken into account that OSNs host both real human users and software-controlled agents. The latter, better known as bots, are programmed to send automatic messages so as to influence the behavior of real users and polarize communities [
27]. Machine learning techniques also find significant applications in this field: they have been used to classify whether a piece of content was created by a human or a bot, or to detect bots using clustering approaches [
27,
28,
29,
30,
31,
32].
Other works propose genetic algorithms to study the problem of bot detection: In [
9], the authors designed an evolutionary algorithm that makes bots able to evade detection by mimicking human behavior. The authors proposed an analysis of such evolved bots to understand their characteristics, and thus make bot detectors more robust.
Recently, some services of practical utility have been deployed. For example, in [
33], the authors presented BotOrNot, a tool that evaluates to what extent a Twitter account exhibits similarities to known characteristics of social media bots. For a more in-depth analysis of bot detection approaches and studies about the impact of bots in social networks, we refer the interested reader to [
8,
34,
35] and the references therein.
2.2. Epidemic Modeling
Historically, the spreading of misinformation has been modeled using the same epidemic models employed for common viruses [
10].
In [
36], the authors proposed a fake news spreading model based on differential equations and studied the parameters of the model that keep a fake news item circulating in a social network after its onset. Separately, the authors suggested which parameters would make the spreading die out over time. In [
37], the authors modeled the engagement onset and decrease for fake news items on Twitter as two subsequent cascade events affecting a Poisson point process. After training the parameters of the model through fake news spreading data from Twitter, the authors showed that the model predicts the evolution of fake news engagement better than linear regression, or than models based on reinforced Poisson processes.
In [
9], the authors simulated the spreading of a hoax and its debunking at the same time. They built upon a model for the competitive spreading of two rumors, in order to describe the competition among believers and fact-checkers. Users become fact-checkers through the spreading process, because they already know that the news is not true, or because they decide to verify the news themselves. The authors also took forgetfulness into account by making a user lose interest in the fake news item with a given probability. They studied the existence of thresholds for the fact-checking probability that guarantee the complete removal of the fake news from the network and proved that such a threshold does not depend on the spreading rate, but only on the gullibility and forgetting probability of the users. The same authors extended their previous study assessing the role of network segregation in misinformation spreading [
38] and comparing different fact-checking strategies on different network topologies to limit the spreading of fake news [
39].
In [
40], the authors proposed a mixed-method study: they captured the personality of users on a social network through a questionnaire and then modeled the agents in their simulations according to the questionnaire, in order to understand how the different personalities of the users affect the epidemic spreading. In [
41], the authors studied the influence of online bots on a network through simulations, in an opinion dynamics setting. The clustering of opinions in networks was the focus of [
42], who observed the emergence of echo chambers that amplify the influence of a seeded opinion, using a simplified agent interaction model that does not include time-dynamical settings. In [
43], the authors proposed an agent-based model of two separate, but interacting spreading processes: one for the physical disease and a second one for low-quality information about the disease. The spreading of false information was shown to worsen the spreading of the disease itself. In [
44], the authors studied how the presence of heterogeneous agents affects the competitive spreading of low- and high-quality information. They also proposed methods to mitigate the spreading of false information without affecting a system’s information diversity.
In
Table 1, we summarize the contributions in the literature that are most related to our approach based on the realistic agent-based simulation of fake news and debunking spreading over OSNs. The references appear in order of citation within the manuscript. We observe that while several contributions exist, most of them consider analytical models of the spreading process or simplified agent interactions that may not convey the behavior of a true OSN user. This prompted us to propose an improved simulation model including different types of agents, time dynamical events affecting the agent interactions, and non-uniform node trust. Moreover, some of the works in
Table 1 focus just on the spreading of the fake news itself, and neglect debunking. Instead, we explicitly model both competing processes.
3. Model
We now characterize the agents participating in the simulations (i.e., the users of the social network) by discussing their roles, parameters, and dynamics. We also introduce the problem of constructing a synthetic network that resembles a real OSN and state our assumptions about the time dynamics involved in our fake news epidemic model.
3.1. Agent Modeling
In this study, we consider three types of agents: (i) commons, (ii) influencers, and (iii) bots. The set of commons contains all “average” users of the social network. When we collect statistics to characterize the impact of the fake news epidemic, our target population is the set of common nodes.
Observing real OSNs and their node degree distributions reveals that some nodes exhibit an anomalously high out-degree: these nodes are commonly called influencers [
45]. When an influencer shares some content, that content can usually reach a large fraction of the population. Bots [
46] also play an important role in OSNs. Bots are automated accounts with a specific purpose: in our case, they help the conspirator who created the fake news item in two ways, by echoing the fake news during the initial phase of the spreading and by keeping the spreading process alive. Being automated accounts, bots never change their mind.
Agents of type common and influencer can enter any of the possible states: susceptible, believer, and fact-checker. However, we further make our simulation more realistic by considering the presence of special influencers called eternal fact-checkers [
39]. These influencers constantly participate in the debunking of any fake news item.
With respect to other approaches based on simulations, we model the attributes of each agent via Gaussian distributions having different means and standard deviations. We used Gaussian distributions to model the fact that the population follows an average behavior with high probability, whereas extremely polarized behaviors (e.g., very high or very low vulnerability to fake news) occur with low probability. The values drawn from the distributions are then clipped between 0 and 1, except for the interest attributes, which are clipped to the range reported in Table 2. This simplifies the incorporation of the parameters in the computation of probabilities (e.g., to spread a fake news item or debunking information). Therefore, the specific values of the attributes are different for every OSN agent. We provide a summary of the parameters and of their statistical distributions in Table 2. We remark that agents of type bot remain in the believer state throughout all simulations, have a fixed sharing rate equal to 1 (meaning they keep sharing fake news at any network access), and bear no other attributes.
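As an illustration, a minimal sketch of this attribute-sampling step is given below; the attribute names, means, and standard deviations are placeholders standing in for the values of Table 2, and the interest attributes are simplified to the range [0, 1].

```python
import random

def truncated_gaussian(mu, sigma, low=0.0, high=1.0):
    """Sample from a Gaussian distribution and clip the result to [low, high]."""
    return min(max(random.gauss(mu, sigma), low), high)

def make_common_agent():
    # Placeholder means/standard deviations standing in for the values in Table 2.
    return {
        "vulnerability": truncated_gaussian(0.5, 0.2),   # tendency to follow opinions
        "sharing_rate":  truncated_gaussian(0.5, 0.2),   # probability of sharing a news piece
        "recovery_rate": truncated_gaussian(0.2, 0.2),   # probability of fact-checking
        "interests":     [truncated_gaussian(0.5, 0.2) for _ in range(5)],
        "state": "susceptible",
    }

def make_bot():
    # Bots stay believers, always share, and bear no other attributes.
    return {"is_bot": True, "sharing_rate": 1.0, "state": "believer"}
```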
3.2. Network Construction
We model our social network as a graph. Every node of the graph represents an agent, and edges convey a relationship between its two endpoints. A key problem in this scenario is to generate a synthetic network that resembles real OSNs like Facebook or Twitter. Generating graphs with given properties is a well-known problem: network models are the fundamental pieces to study complex networks [
47,
48,
49]. However, these mathematical models are solely driven by the topology of a network and do not take into account features and similarities among nodes, which instead have a key role in shaping the connections in an OSN. In this work, we build our network model via an approach similar to [
50]. We generate our networks according to the following properties:
Geographical proximity: A user is likely to be connected with another user if they live nearby, e.g., in the same city. The coordinates of a node are assigned by sampling from a uniform distribution over a square region. This procedure follows the random geometric graph formalism [
51] and ensures that geographical clusters appear.
We evaluate the geographical distance between two nodes using the Euclidean distance formula, normalized such that the maximum possible distance is 1. In our simulations, we create an edge between two nodes if their geographical distance is less than a given threshold.
Attributes’ proximity: Each node has a set of five “interest” attributes, distributed according to a truncated Gaussian distribution; we employ these parameters to model connections between agents based on their interests. This helps create connections and clusters in the attribute domain, rather than limiting connections to geographical proximity criteria.
The distance between two sets of interest attributes is evaluated using the Euclidean distance formula, normalized such that the maximum possible distance is 1. In our simulations, an edge between two nodes is created if their attribute distance is less than a given threshold.
Randomness: To introduce some randomness in the network generation process, an edge satisfying the above geographical and attribute proximity criteria is removed from the graph with a given probability.
The above thresholds reproduce the higher likelihood of connections among OSN agents that are geographically close or have similar interests.
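A minimal sketch of this edge-creation rule follows; the threshold values and the removal probability are placeholders, and we assume that satisfying either proximity criterion is enough to create a candidate edge.

```python
import math
import random

GEO_THRESHOLD = 0.1     # placeholder threshold on normalized geographical distance
ATTR_THRESHOLD = 0.1    # placeholder threshold on normalized attribute distance
P_REMOVE = 0.1          # placeholder probability of removing a candidate edge

def normalized_distance(a, b):
    """Euclidean distance between two vectors with components in [0, 1],
    normalized so that the maximum possible distance is 1."""
    max_dist = math.sqrt(len(a))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) / max_dist

def build_edges(nodes):
    """nodes: dict node_id -> {'pos': (x, y), 'interests': [five values]}."""
    edges = set()
    ids = list(nodes)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            close_geo = normalized_distance(nodes[u]["pos"], nodes[v]["pos"]) < GEO_THRESHOLD
            close_attr = normalized_distance(nodes[u]["interests"], nodes[v]["interests"]) < ATTR_THRESHOLD
            if (close_geo or close_attr) and random.random() > P_REMOVE:
                edges.add((u, v))  # symmetric relationship between common users
    return edges
```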
In addition to connectivity properties, every node has three parameters that affect the behavior of the corresponding user in the simulation: the vulnerability (defined as the tendency to follow opinions: if higher, it makes it easier for a user to change its state when receiving fake news or its debunking); the sharing rate (defined as the probability of sharing a fake news or debunking piece); and the recovery rate (defined as the probability of fact-checking: if higher, it makes the node more inclined to verify a fake news item before re-sharing it). We assigned these parameters at random to each node, according to a truncated Gaussian distribution with the parameters reported in
Table 2. The distributions were truncated between 0 and 1.
We build the network starting from common nodes and creating edges according to the above rules. After this process, we enrich the network with influencers, which exhibit a much higher proportion of out-connections to in-connections with respect to a common node. To model this, we assign a higher threshold (twice those of common agents) to create an out-edge from geographical and attribute proximity and a lower threshold (halved with respect to common agents) to create an in-edge. As a result, influencers have many more out-edges than in-edges.
Finally, we deploy bot nodes. As opposed to other agents, edges involving bots do not occur based on geographical or attribute proximity. Rather, we assume that bots have a target population coverage rate, and randomly connect bots to other nodes in order to attain this coverage. For example, for a realistic target coverage rate of 2%, we connect each bot to other nodes with a probability chosen so as to attain this coverage.
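The following sketch illustrates this random bot attachment; the per-pair connection probability is a placeholder loosely tied to the 2% coverage example above.

```python
import random

def connect_bots(bot_ids, node_ids, p_connect=0.02):
    """Connect each (bot, node) pair with probability p_connect; the value 0.02
    is a placeholder loosely tied to the 2% target coverage mentioned above."""
    return [(bot, node) for bot in bot_ids for node in node_ids
            if random.random() < p_connect]
```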
Figure 1 shows a sample network generated through our algorithm, including common nodes (blue), influencers (gold), and bots (red). Our network generation procedure ensures the emergence of realistic degree distributions, and therefore of a realistic web of social connections.
Figure 2 shows the in-degree and out-degree distributions of the nodes, averaged over 50 networks having different configurations. In each network, we deploy 2000 common nodes. The figures’ keys report the number of additional influencers, bots, and eternal fact-checkers.
Adding agents to the networks implies some slight shifts of the in-degree distributions to the right and the appearance of a few nodes with a high in-degree. In the out-degree distributions, instead, there appear large shifts to the right, in particular when adding agents with many followers, such as influencers and eternal fact-checkers. The larger the number of such nodes, the higher the second peak between out-degree values of 35 and 60. Hence, the most connected nodes can directly access up to 3% of the entire population.
3.3. Trust among Agents
A particular trait of human behavior that arises in the spreading of fake news is that people do not trust one another in the same way; for instance, social network accounts belonging to bots tend to replicate and insistently spread content in a way that is uncommon for typical social network users. As a result, common people tend to place less trust in accounts that are known to be bots.
To model this fact, we assigned a weight to every edge [
52]: this weight represents how much a node trusts another. Given a pair of connected nodes, we computed the weight as the arithmetic mean of the geographical distance and of the attribute distance between the two nodes. The resulting network graph thus becomes weighted and directed (we assumed symmetry for common users, but not for influencers). The weight of an edge influences the probability of infection of a node: if a node sends news to another node, but the weight of the linking edge is low (meaning that the recipient puts little trust in the message sender), the probability that the news will change the state of the recipient correspondingly decreases. When considering weighted networks, we set the weights of the edges having a bot on one end to a fixed low, non-zero value. This allows us to evaluate the effects of placing a low (albeit non-zero) trust in bot-like OSN accounts.
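A sketch of this weight assignment is shown below, reusing the normalized_distance helper from the edge-creation sketch; the fixed low weight used for bot edges is a placeholder, as the text only specifies that it is low but non-zero.

```python
BOT_TRUST = 0.1  # placeholder: low, non-zero trust placed in bot accounts

def edge_weight(u, v, nodes, bot_ids):
    """Trust weight of the edge from u to v, following the description above."""
    if u in bot_ids or v in bot_ids:
        return BOT_TRUST
    geo = normalized_distance(nodes[u]["pos"], nodes[v]["pos"])
    attr = normalized_distance(nodes[u]["interests"], nodes[v]["interests"])
    return (geo + attr) / 2.0  # arithmetic mean of the two normalized distances
```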
3.4. Time Dynamics
We considered two time dynamical aspects in our simulation: the connection of an agent to the network and the engagement of an agent.
Several simulation models assume that agents connect to the networks simultaneously at fixed time intervals or “slots”. While this may be a reasonable model as a first approximation, it does not convey realistic OSN interaction patterns.
Conversely, in our model, we let each agent decide independently when to connect to the network. Therefore, agents are not always available to spread contents and do not change state instantly upon the transmission of fake news or debunking messages. We modeled agent inter-access times as exponentially distributed, with a different average value that depends on the agent. This makes it possible to differentiate between common users and bots. In particular, we set the average OSN access rate of bots to four times that of common users, i.e., the average time between subsequent accesses of a bot is one quarter of that of a common user. Conversely, in the baseline case without time dynamics, all nodes access the network at the same time, at a fixed rate of once every 16 minutes.
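The following sketch shows how such non-synchronous access times can be drawn; using the 16-minute baseline value as the mean inter-access time of common users is our assumption.

```python
import random

COMMON_MEAN_MINUTES = 16.0                     # assumed mean inter-access time of common users
BOT_MEAN_MINUTES = COMMON_MEAN_MINUTES / 4.0   # bots access the OSN four times more often

def next_access_time(now_minutes, is_bot):
    """Schedule the next OSN access by drawing an exponential inter-access time."""
    mean = BOT_MEAN_MINUTES if is_bot else COMMON_MEAN_MINUTES
    return now_minutes + random.expovariate(1.0 / mean)
```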
When accessing the OSN, we assume that every agent reads its own feed, or “wall”. In this feed, the agent will find the news received while it was offline. The feed may contain messages that induce a state change in the agent (e.g., from susceptible to believer, or from believer to fact-checker). We also assumed that the first node is infected at the start of the simulation.
To model the typical loss of interest in a news item over time and also to model the different spreading effectiveness (or “virality”) of different types of news, we introduced an engagement coefficient E(t), whose variation over time obeys the differential equation
dE(t)/dt = −β E(t),   (2)
where β is the decay constant. The solution to (2) is
E(t) = E(0) exp(−β t),
and models an exponential interest decay, where the engagement E(0) at time t = 0 controls the initial virality of the news and the constant engagement decay factor β controls how fast the population loses interest in a news item. The factor E(0) allows us to model the different level of initial engagement expected for different types of messages. For example, in our simulations, we set the initial engagement of debunking messages to one tenth of that of fake news. This gives debunking messages a 10-fold lower probability to change the state of a user from believer or susceptible to fact-checker, with respect to the probability of transitioning from susceptible to believer.
In our setting, this means that a user believing in a fake news item will likely be attracted by similar pieces confirming the fake news, rather than by debunking, a behavior known as confirmation bias [
53]. In the same way, a susceptible agent will tend to pay more attention to fake news, rather than to common news items. This models the fact that the majority of news regarding scientific studies tends not to reach and influence people [
11].
In addition, we choose the decay constant β as a function of the total simulation time, so that, at the end of the simulation, fake news still induces some small, non-zero engagement. By setting this value, we can simulate the whole lifetime of the fake news, from its maximum engagement power to its lowest. For a given time t, E(t) is a multiplicative factor that affects the probability of successfully infecting an individual at time t.
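A minimal sketch of this engagement model follows; the initial engagement values, the simulation duration, and the residual engagement at the end of the simulation are placeholders, chosen only to respect the 10:1 ratio between fake news and debunking described above.

```python
import math

SIM_TIME = 10_000.0          # total simulation time (placeholder units)
E0_FAKE = 1.0                # initial engagement of the fake news (placeholder)
E0_DEBUNK = E0_FAKE / 10.0   # debunking starts with a 10-fold lower engagement
E_RESIDUAL = 0.01            # small, non-zero engagement left at the end (assumption)

# Choose the decay constant so that the fake news engagement at the end of the
# simulation equals the residual value: E(SIM_TIME) = E_RESIDUAL.
BETA = math.log(E0_FAKE / E_RESIDUAL) / SIM_TIME

def engagement(t, e0):
    """E(t) = E(0) * exp(-beta * t): multiplicative factor applied to the
    probability of changing a user's state at time t."""
    return e0 * math.exp(-BETA * t)
```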
3.5. Agent-Based Spreading Simulation
With reference to the SBFC model in
Figure 3, two parallel spreading processes take place in our simulations:
Fake news spreading process: a node creates a fake news item and shares it over the network for the first time. The fake news starts to spread within a population of susceptible (S) agents. When a susceptible node gets “infected” in this process, it becomes a believer (B);
Debunking process: a node does fact-checking. It becomes “immune” to the fake news item and rather starts spreading debunking messages within the population. When a node gets “infected” in this process, it becomes a fact-checker (FC).
Based on the content of the messages found in its own feed upon accessing the OSN, an agent has a probability of being infected (i.e., either gulled by fake news from other believers or convinced by debunking from other fact-checkers) and a probability of becoming a fact-checker itself. If the agent changes its state, it has a probability of sharing its opinion with its neighbors. If the agent is a bot, it is limited to sharing the news piece. The algorithmic description of the node infection and OSN interaction process is provided in more detail in Algorithm 1. For better clarity, we also provide a diagrammatic representation of our agent-based fake news epidemic simulation process in
Figure 4. The diagram reproduces the operations of each agent and helps differentiate between the baseline behavior and the improvements we introduce in this work. In the figure, margin notes explain these improvements, showing where enhancements (such as time-varying news engagement, weighted networks, and non-synchronous social network access) act to make the behavior of the nodes more realistic.
Algorithm 1: Spreading of fake news and debunking.
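For illustration, the following simplified Python sketch mirrors the per-agent update described above; the multiplicative combination of vulnerability, trust weight, and engagement into an infection probability is an assumption, as the exact rule is specified by Algorithm 1.

```python
import random

def process_feed(agent, feed, engagement_fake, engagement_debunk):
    """Update one agent upon OSN access.
    feed: list of (message_type, trust_weight) tuples received while offline."""
    if agent.get("is_bot"):
        return ["fake"]  # bots only keep re-sharing the fake news
    old_state = agent["state"]
    for msg_type, trust in feed:
        if msg_type == "fake" and agent["state"] == "susceptible":
            if random.random() < agent["vulnerability"] * trust * engagement_fake:
                # the agent may verify the news instead of believing it
                if random.random() < agent["recovery_rate"]:
                    agent["state"] = "fact_checker"
                else:
                    agent["state"] = "believer"
        elif msg_type == "debunk" and agent["state"] in ("susceptible", "believer"):
            if random.random() < agent["vulnerability"] * trust * engagement_debunk:
                agent["state"] = "fact_checker"
    # an agent that changed its state shares its new opinion with some probability
    if agent["state"] != old_state and random.random() < agent["sharing_rate"]:
        return ["fake" if agent["state"] == "believer" else "debunk"]
    return []
```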
4. Results
4.1. Roadmap
In this section, we present the results of the fake news epidemic from a number of simulations with different starting conditions. We carried out all simulations using custom software written in Python 3.7 and publicly available on GitHub in order to enable the reproducibility of our results (
https://github.com/FraLotito/fakenews_simulator, accessed on 16 March 2021). The section is organized as follows. In
Section 4.2, we describe the baseline configuration of our simulations, which contrasts with the realistic settings we propose to help emphasize the contribution of our simulation model.
Section 4.3 shows the impact of influencers and bots on the propagation of a fake news item.
Section 4.4 describes the effects of time dynamical network access, and finally,
Section 4.5 demonstrates how weighted network links (expressing non-uniform trust levels) change the speed of fake news spreading over the social network.
4.2. Baseline Configuration
As a baseline scenario, we consider a network including no bots and no influencers, with unweighted edges and no time dynamics. The latter means that all agents access the network at the same time and with the same frequency, and the engagement coefficient of a fake news item does not change over time.
As discussed in
Section 3, this is the simplest configuration for an agent-based simulation, but also the least informative. The network is flat, as all nodes are similar in terms of connectivity, vulnerability, and capability to recover from fake news. Fake news always has the same probability to deceive a susceptible OSN user, and OSN activities are synchronized across all nodes. While these aspects are clearly unrealistic, using this baseline configuration helps emphasize the effect of our improvements on fake news propagation. In the results below, we gradually introduce influencers, bots, and eternal fact-checkers, and comment on their effect on the fake news epidemic. Afterwards, we illustrate the impact of time dynamics and non-uniform node trust, which further increase the degree of realism of agent-based OSN simulations.
4.3. Impact of Influencers and Bots
We start with
Figure 5, which shows the results of simulations in two different scenarios through three plots. In the top row, the SBFC plots show the variation of the number of susceptible (blue), believer (orange), and fact-checker (green) nodes throughout the duration of the simulation, expressed in seconds.
In the middle row, we show the spreading of the fake news over the network (drawn without edges for better readability). The color of each node denotes its average infection time, where the fake news always starts spreading from the same node over the same network (i.e., always with the same first believer). A node depicted with a dark color is typically infected shortly after the start of the simulation, whereas a light-colored one is typically infected much later. In the bottom row, the graphs similarly show the average time of recovery: the darker the color, the faster the recovery.
For this first comparison, we considered both our baseline network composed only of common nodes (left column in
Figure 5) and a network that additionally has 30 influencers and 10 bots. We constructed the SBFC plots by averaging the results of 400 simulations. In particular, we generated 50 different network topologies at random using the same set of parameters; these networks are the same for both our set of simulations, in order to obtain comparable results. Then, we executed eight simulations for each topology. Instead, for the infection and recovery time graphs, we computed the corresponding average times over a set of 100 simulations.
In the baseline network, the spreading is slow and critically affects only part of the network. The B curve (orange) reaches a peak of about 750 believer nodes simultaneously present in the network and then decreases, quenched by the debunking and fact-checking processes (FC, green curve). Adding influencers and bots accelerates the infection and helps both the fake news and the debunking spread out to more nodes.
In particular, with 2000 common nodes and 30 influencers, fake news spreading infects (on average) almost half of the common nodes. When 10 bots are further added to the network (right column in
Figure 5), the spreading is faster, and after a short amount of time, the majority of the common nodes believe in the fake news. The SBFC plot in the top-right panel shows that the fake news reaches more than 80% of the network nodes, whereas the middle panel confirms that infections occur typically early in the process, with the exception of a few groups of nodes that get infected later on. Similarly, it takes more time to recover from the fake news and perform fact-checking than in the baseline case (bottom-right graph).
The above simulations already confirm the strong impact influencers and bots may have on fake news spreading: influencers make it possible to convey fake news through their many out-connections, and reach otherwise secluded portions of the population. Bots, by insisting on spreading fake news, increase the infection probability early on in the fake news spreading process. As a result, the effectiveness of the fake news is higher, and more agents become believers before the debunking process starts to create fact-checkers.
4.4. Impact of Time Dynamics
We now analyze the impact of time dynamics on the simulation results. In particular, we show that including time dynamics is key to obtaining realistic OSN interactions. We focus on
Figure 6, where we show the all-time maximum fraction of believers and fact-checkers over time for different network configurations. We present results both with and without bots and eternal fact-checkers, as well as with and without time dynamics.
In the absence of time dynamics, all types of agents access the network synchronously. Such synchronous access patterns make simulated OSN models inaccurate as, e.g., we expect that bots access social media much more often than common users, in order to increase and maintain the momentum of the fake news spreading. Conversely, enabling time dynamics allows us to model more realistic patterns. Specifically, we recall that we allow bots to access the network four times more often than a common user.
In the absence of time dynamics, a larger fraction of the network remains unaffected by the fake news. For example, in the case that the network contains 10 bots, 30 influencers, and no eternal fact-checker, the maximum fraction of believers is 0.81 in the absence of time dynamics (light blue solid curve) and increases to 0.87 (+7%) with time dynamics (brown solid curve), thanks to the higher access rate of the bots. Because time dynamics enable the fake news to spread faster, the number of fact-checkers increases more slowly. In
Figure 6, we observe this by comparing the orange and green dashed lines.
Figure 6 also shows the effect of eternal fact-checkers: their presence does not slow down the spreading process significantly, as negligibly fewer network users become believers. Instead, eternal fact-checkers accelerate the recovery of believers, as seen by comparing, e.g., the red and blue lines (for the case without time dynamics) and the green and brown lines (for the case with time dynamics). In any event, fact-checkers access the networks as often as common nodes; hence, their impact is limited. This is in line with the observation that time dynamic network access patterns slow down epidemics in OSNs [
54].
4.5. Impact of Weighted Networks Expressing Non-Uniform Node Trust
In
Figure 7, we show the result of the same experiments just described, but introduce weights on the network edges, as explained in
Section 3.2. We recall that weights affect the trust that a node puts in the information passed on by another connected node. Therefore, a lower weight on an edge directly maps to lower trust in the corresponding node and to a lower probability of changing state when receiving a message from that node. As the weight of any link connecting a bot to any other node is now
set to a low, non-zero value, the infection slows down due to the resulting reduced trust in bots. We observe that, by increasing the number of eternal fact-checkers while keeping the number of bots constant, the fraction of the network affected by the fake news spreading decreases slightly. However, the speed of the fake news onset does not change significantly, as it takes time to successfully spread the debunking. This is in line with the results in the previous plots.
Figure 7 also confirms that eternal fact-checkers increase the recovery rate of believers: a few hundred seconds into the process, the number of fact-checkers increases by about 1.5% for every 10 eternal fact-checkers added to the network, before saturating towards the end of the simulation, where all nodes (or almost all) have recovered.
We conclude with
Figure 8, where we consider the joint effect of eternal fact-checkers operating on a weighted network graph. The peak of the believers is markedly lower than in
Figure 5, because the weights on the connections tend to scale down the spreading curve. The impact of the eternal fact-checkers can be noticed from the fact-checker curve (green), which increases significantly faster than in
Figure 5. We observe a similar effect from the believer curve (orange), which instead decreases more rapidly after reaching a lower peak. In this case, the maximum number of believers is about one half of the network nodes. Correspondingly, a noticeable portion of the nodes in the infection graph (bottom-left panel in
Figure 8) is yellow, showing that the fake news reaches these nodes very late in the simulation. Instead, most of the nodes in the recovery graph (bottom-right panel in
Figure 8) are dark blue, showing that they stop believing in the fake news and become fact-checkers early on. These results confirm the importance of fact-checking efforts in the debunking of a fake news item, and the need to properly model different classes of fact-checking agents.
5. Discussion
As fake news circulation is characteristic of all OSNs, several studies have attempted to model and simulate the spreading of fake news, both independently of and jointly with the spreading of debunking messages belying the fake news. For example, the simulation and modeling approaches most related to our contribution are listed in
Table 1.
We identify a number of shortcomings in these works, such as exceedingly synchronous dynamics, constant fake news engagement, and uniform trust. We improve over these works by proposing an agent-based simulation system where each agent accesses the network independently and reacts to the messages shared by its contacts. Another improvement concerns the fact that the interest of the population in a fake news item decays over time, which realistically models the loss of engagement of fake news pieces. By considering a directed weighted network model, we convey realistic information flows better than in undirected, unweighted models. A consequence of this network model is that we can explicitly consider influencers and bots [
9].
Several previous studies assumed the emergence of influencers in the network as a result of the preferential attachment process [
48], but did not explicitly model their role, so that the impact of these agents remains largely unexplored. An exception is [
41], where the authors explicitly cited the role of influencers on a network and proposed the characterization and modeling of the role of bots. However, unlike our work, the authors of [
41] operated in an opinion dynamics setting and focused on binary opinions, which evolve throughout an initially evenly distributed population. Instead, we integrate influencers as agents that participate in the news spreading process and may both become believers (thus spreading fake news) and recover to become fact-checkers (thus promoting debunking).
The effects of heterogeneous agents were studied in [
40]. In our case, to model the personalities of the agents and properly assign their attributes, we sampled from statistical distributions instead of relying on questionnaires, in order to provide a simple, but realistic source of variability for the parameters that characterize OSN agents.
We believe that the main advantage of our model is to introduce realistic OSN agent behaviors. As a result, our agent-based simulation model can help assess the impact of fake news over an OSN population in different scenarios. This enables authorities and interested stakeholders alike to also evaluate effective strategies and countermeasures that limit or contrast fake news impact.
6. Conclusions
In this work, we argue that most modeling efforts for fake news epidemics over social networks rely on analytical models that capture the general trends of the epidemics (with some necessary approximations) or on simple simulation approaches that often lack sufficiently realistic settings. We fill this gap by extending the simple susceptible believer fact-checker (SBFC) model to make it more realistic. We propose adding time dynamical features to the simulations (including variable access times to the network and news engagement decay), node-dependent attributes, and a richer context for mutual node trust through weighted network edges. Moreover, we analyze the impact of influencers, bots, and eternal fact-checkers on the spreading of a fake news item.
Our simulations confirm that the above realistic factors are very impactful, and neglecting them (e.g., via a model that considers synchronous OSN access for all agents, or a network where all agents trust one another equally) masks several key outcomes. The importance of a realistic simulation tool lies in the reliability of the analyses it enables. For example, the designers of marketing campaigns could assess how much fake news would damage a company or product by spreading into a population of interest. Similarly, managers and policy-makers may assess opinion spreading and polarization (which follows similar mechanisms as fake news spreading) by means of realistic simulations.
In practical terms, our results suggest that reducing the impact of bots and increasing the presence and impact of authoritative personalities [
11] are key to limiting fake news spreading. This may entail educating OSN users to recognize bot behavior, or implementing forms of automatic recognition that flag suspicious activity patterns as bot-like. The impact of influencers is also very relevant, as they significantly speed up the spreading of news online and amplify the “small-world effect” [
49]; it is therefore important to prevent influencers from falling prey to fake news, and rather to increase their involvement in stopping fake news and/or promoting its debunking. The above methods would further help improve the healthiness of interactions and opinion exchange over social networks, reducing the concerns of online users.
The main limitations of this study concern the test scenarios, which should be extended to comprise a greater number of nodes and to include agents with multiple levels of popularity, besides common users and influencers. We leave this investigation for a future extension of our work. Similarly, our agent-based simulation model recreates network access patterns (hence, the storyboard of social interactions) according to a mathematical model. While this is still better than having all agents interact synchronously, we could further improve the level of realism by testing our model on social network datasets that include temporal annotations. Moreover, we could consider additional dynamics in the network model. For example, we plan to extend our interaction model through an unfollowing functionality (i.e., social connection removal) in order to amplify the circulation of some news pieces in specific communities.