Information is not a Virus, and Other Consequences of Human Cognitive Limits

The many decisions people make about what to pay attention to online shape the spread of information in online social networks. Due to the constraints of available time and cognitive resources, the ease of discovery strongly impacts how people allocate their attention to social media content. As a consequence, the position of information in an individual's social feed, as well as explicit social signals about its popularity, determine whether it will be seen, and the likelihood that it will be shared with followers. Accounting for these cognitive limits simplifies the mechanics of information diffusion in online social networks and explains puzzling empirical observations: (i) information generally fails to spread in social media and (ii) highly connected people are less likely to re-share information. Studies of information diffusion on different social media platforms reviewed here suggest that the interplay between human cognitive limits and network structure differentiates the spread of information from other social contagions, such as the spread of a virus through a population.


I. INTRODUCTION
The spread of information in online social networks is often likened to the spread of a contagious disease. According to this analogy, information, whether a trending topic, a news story, a song, or a video, behaves much like a virus, "infecting" individuals, who then "expose" their naive followers by mentioning the topic, sharing the video, or recommending the news story. These followers may, in turn, become "infected" by sharing the information, "exposing" their own followers, and so on. If each person "infects" at least one other person, information will keep spreading on the network, resulting in a "viral" outbreak, similar to how a spreading virus can create an epidemic that sickens a large portion of the population. The analogy between the spread of a disease and information is the basis of computational methods that attempt to amplify the spread of information in networks by identifying influential "superspreaders" [1-5], and those that make inferences about the network from observations of how information spreads on it [6-10].
One of the simplest and most widely used models of the spread of epidemics in networks is the Independent Cascade Model (ICM) [1,6,11-13]. It describes a process wherein each exposure of a healthy but susceptible individual to a disease by an infected friend results in an independent chance of disease transmission: the more infected friends an individual has, the more likely he or she is to become infected. The model predicts the size of an outbreak (number of individuals infected) in any network for a given value of disease transmissibility (i.e., how easily the disease is transmitted upon exposure). Figure 1 shows the size of outbreaks simulated using the ICM on a social media follower graph (red dots) [14]. The black symbols give the size of simulated outbreaks on a randomly-generated graph with the same degree distribution as the follower graph. The simulated outbreaks are close in size to the theoretically predicted values [15], given by the golden line in Fig. 1. There exists a critical value of transmissibility, the epidemic threshold [16], below which the contagion dies out, but above which it spreads to a finite portion of the network. The epidemic threshold depends only on structural properties of the network, and not the details of the disease or its transmissibility [17]: specifically, the epidemic threshold is given by the inverse of the largest eigenvalue of the adjacency matrix representing the network [16,18]. Note that even above the epidemic threshold, contagions starting in isolated corners of the network may die out. However, in general, the higher the transmissibility, the farther the contagion spreads, reaching a non-negligible fraction of the network (for example, 10% or 20%) above the epidemic threshold.

FIG. 1. Size of simulated outbreaks on real-world and random graphs as a function of transmissibility. Contagions are simulated using the independent cascade model (ICM) on the follower graph of the Digg social news platform (red dots) and a random graph with the same degree distribution (black crosses). The golden line gives theoretically predicted outbreak sizes.
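As a concrete illustration of these dynamics, the sketch below simulates the ICM on a small random graph and computes the epidemic threshold as the inverse of the largest eigenvalue of the adjacency matrix. The graph, transmissibility values, and trial counts are arbitrary choices for illustration, not the Digg data of Fig. 1.

```python
import random
import numpy as np

def simulate_icm(adj, seed, mu, rng):
    """Independent Cascade Model: each newly infected node gets one
    independent chance (probability mu) to infect each susceptible neighbor."""
    infected = {seed}
    frontier = [seed]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in infected and rng.random() < mu:
                    infected.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(infected)

# Toy undirected random graph as an adjacency list (illustrative only).
n = 200
rng = random.Random(42)
adj = {i: set() for i in range(n)}
for _ in range(600):
    u, v = rng.randrange(n), rng.randrange(n)
    if u != v:
        adj[u].add(v)
        adj[v].add(u)

# Epidemic threshold = 1 / largest eigenvalue of the adjacency matrix.
A = np.zeros((n, n))
for u, nbrs in adj.items():
    for v in nbrs:
        A[u, v] = 1.0
threshold = 1.0 / max(np.linalg.eigvalsh(A))

# Mean outbreak sizes below and above the threshold.
below = np.mean([simulate_icm(adj, rng.randrange(n), 0.5 * threshold, rng)
                 for _ in range(100)])
above = np.mean([simulate_icm(adj, rng.randrange(n), 3.0 * threshold, rng)
                 for _ in range(100)])
print(f"threshold mu_c = {threshold:.3f}; "
      f"mean outbreak below: {below:.1f}, above: {above:.1f}")
```

Running this shows the qualitative behavior in Fig. 1: outbreaks stay tiny below the threshold and reach a large fraction of the network above it.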
How well does the social contagion analogy hold for social media? In this review of empirical studies of information diffusion in social media, I first present evidence that information fails to spread widely in online social networks. The vast majority of outbreaks are very small (see Fig. 2), in stark contrast to the predictions of the epidemic model. To explain these findings, I present studies examining the mechanisms of information diffusion, specifically how people respond to multiple exposures to information. The key finding of these studies, that central individuals in an online social network are less susceptible to becoming "infected," is sufficient to explain why social contagions fail to propagate. The studies also link the reduced susceptibility of central individuals to information overload and human reliance on cognitive heuristics to compensate for the brain's limited capacity to process information. Accounting for how people use cognitive heuristics to decide what information to pay attention to in social media dramatically simplifies the dynamics of social contagion and allows for more accurate predictions of how far information will spread online.

II. SIZE OF SOCIAL CONTAGIONS
Empirical studies of information spread in social media have failed to observe outbreaks as large as those predicted by the independent cascade model [14,19,20]. This review focuses on two widely-studied social platforms-Twitter and Digg-although similar behaviors were observed in a variety of other social platforms [19]. Twitter, a popular microblogging platform, allows registered users to broadcast short messages, or "tweets." These messages may contain URLs or descriptive labels, known as hashtags. In addition to composing original tweets, users can re-share, or "retweet," messages posted by others. In contrast to Twitter, Digg focuses solely on news. Digg users submit URLs to news stories they find on the web and vote for, i.e., "digg," stories submitted by others. Both platforms include a social networking component: users can subscribe to the feeds of other users to see the tweets those users posted (on Twitter) or the news stories they submitted or voted for (on Digg). The follow relationship is asymmetric; hence, we refer to the subscribing users as followers, and the users they subscribe to as their friends (or followees).
To measure the size of outbreaks on Twitter, researchers used URLs to external web content embedded in tweets as unique markers of information [21]. They tracked these URLs as users shared or retweeted the messages with their followers. A similar strategy was used to track each news story on Digg. Thus, the number of times a message containing a URL was retweeted or a news story was "dugg" in their respective networks gave an estimate of the outbreak size in that network. Figure 2 shows the distribution of outbreak sizes on the Twitter and Digg social platforms. Note that outbreaks have a long-tailed distribution, except for a bump on Digg that corresponds to promoted stories. When a newly submitted story accumulated enough votes, it was promoted to Digg's front page, where it was visible to everyone, not only to followers of voters [22]. The higher visibility of stories on the front page gave them a popularity boost, resulting in log-normally distributed popularity. However, even the most popular stories did not penetrate very far. Only one story, about Michael Jackson's death, could be said to have reached "viral" proportions, i.e., reaching a non-negligible fraction of active Digg users (in this case, about 5%). The next most popular story reached fewer than 2% of Digg voters, and the vast majority of front page stories reached fewer than 0.1% of the voters. Similarly, very few of the outbreaks on Twitter reached more than 10,000 users, or less than 2% of the active user population. These findings are in line with other studies, including that of Goel et al. [19], who analyzed seven online social networks, ranging from communication platforms to networked games, to reach the same conclusion: the vast majority of outbreaks in online social networks are small and terminate within one step of the source of information.

III. MECHANICS OF CONTAGION: EXPOSURE RESPONSE
The observations above present a puzzle: what stops information from spreading widely on social media? And why is outbreak size so much smaller than predicted by the independent cascade model? A number of hypotheses could potentially explain the empirical findings:

Subcriticality: The vast majority of information spread is sub-critical, with transmissibility below the epidemic threshold. As a result, information is unlikely to spread upon exposure, and can be considered uninteresting. This hypothesis is easy to dismiss, since it is difficult to imagine that all the information shared on many different social media platforms is uninteresting.
Load balancing: Social media users may modulate transmissibility of information to prevent too many pieces of information from spreading and creating information overload. This hypothesis is difficult to evaluate, though it is not very credible, since such wide-scale coordination would be difficult to achieve. Moreover, it would require users to correctly estimate the popularity of different pieces of information in their local neighborhood, a measurement that is easily skewed in networks [23].
Novelty decay: Transmissibility of information could diminish over time as information loses novelty. A study [24] explicitly addressed this hypothesis, and found that the probability to retweet information on Twitter does not depend on its absolute age, but only on the time since it first appeared in a user's social feed.
Network structure: Although it is conceivable that network structure (e.g., clustering or communities) could limit the spread of information, this hypothesis was ruled out [14]. As can be seen in Figure 1, the structure of the actual Digg follower graph somewhat reduces the size of outbreaks, but not nearly enough to explain empirical observations.
Contagion mechanism: The decisions people make to vote for a story on Digg or retweet a URL on Twitter once their friends have shared it could differ substantially from the ICM. These differences could prevent information from spreading [14].
To characterize the mechanisms of contagion, researchers use the "exposure response function." Since a person may be exposed to some information (or disease) by several friends, the exposure response function gives the probability of an infection as a function of the number of exposures. Under the independent cascade model, infection probability rises monotonically with the number of infected friends as p_ICM(infection | k exposures) = 1 − (1 − µ)^k, where µ is the transmissibility. Using social media data, researchers empirically measured the exposure response function for Twitter and Digg users. To do this, they found all users who became "infected" (e.g., retweeted a URL [24] or adopted a hashtag [25] on Twitter, or "dugg" a story on Digg [14]) after k of their friends (i.e., the users they follow) became "infected." The exposure response function is the fraction of users with k "infected" friends who themselves became "infected," computed for different values of k. Figure 3 shows the exposure response functions for Digg and Twitter, averaged over all users. The shape and magnitude of the exposure response functions are fundamentally different from those of the ICM. The form of the exposure response indicates that while initial exposures increase infection probability, additional exposures suppress new infections. According to Romero et al. [25], such a response is suggestive of complex contagion, another popular model for describing social contagions, where "infection" does not occur until exposure by some specified fraction of friends [26-29].

FIG. 3. The figures report the probability (averaged over all users) to respond to information, i.e., (a) to digg a news story or (b) retweet a URL, as a function of the number of friends who previously did so.
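The ICM exposure response, and the empirical estimation procedure described above, can be sketched as follows; the `records` format, a list of (exposure count, infection outcome) pairs, is a hypothetical stand-in for the observational data.

```python
from collections import defaultdict

def icm_response(mu, k):
    """ICM exposure response: p(infection | k exposures) = 1 - (1 - mu)^k."""
    return 1.0 - (1.0 - mu) ** k

def empirical_response(records):
    """Estimate the exposure response from observations. Each record is a
    (k, was_infected) pair: the number of "infected" friends a user had,
    and whether the user became "infected". Returns, for each k, the
    fraction of users with k infected friends who became infected."""
    infected = defaultdict(int)
    total = defaultdict(int)
    for k, was_infected in records:
        total[k] += 1
        infected[k] += int(was_infected)
    return {k: infected[k] / total[k] for k in sorted(total)}
```

Comparing the empirical curve against `icm_response` for a fitted µ reveals the deviation from the independent cascade model discussed in the text.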
Does the suppressed response to multiple exposures inhibit the spread of information online? Ver Steeg et al. [14] simulated the spread of social contagions on the Digg follower graph with the suppressed exposure response function suggested by Fig. 3(a). In the simulation, exposure response was approximated as follows. If a node had any infected friends, it became infected with probability µ. However, if it did not get infected (with probability 1 − µ), it was forever immune to new infections. Figure 4 shows the size of the resulting outbreaks (red dots) as a function of transmissibility µ. The outbreaks are an order of magnitude smaller than those predicted by the independent cascade model, and in line with empirically observed outbreaks (black crosses). This suggests that online contagions fail to spread due to the reduced susceptibility of social media users to multiple exposures.

FIG. 4. Actual outbreaks on Digg are shown as red dots, while the theoretically predicted (gold) line is the same as in Fig. 1. Suppressed response to repeated exposures vastly decreases the size of outbreaks as compared to the prediction of the ICM (Fig. 1).
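The one-shot approximation of the suppressed exposure response can be sketched as follows, assuming an adjacency-list graph; the demo graph and parameter values are illustrative.

```python
import random

def simulate_suppressed(adj, seed, mu, rng):
    """One-shot exposure response: on its FIRST exposure a node becomes
    infected with probability mu; otherwise it becomes forever immune
    and ignores all subsequent exposures."""
    infected = {seed}
    immune = set()
    frontier = [seed]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v in infected or v in immune:
                    continue  # only the first exposure matters
                if rng.random() < mu:
                    infected.add(v)
                    nxt.append(v)
                else:
                    immune.add(v)
        frontier = nxt
    return len(infected)

# Demo on a complete graph: under the plain ICM with mu = 0.3 nearly every
# node would be infected, but the one-shot response keeps outbreaks small.
n = 50
adj = {i: [j for j in range(n) if j != i] for i in range(n)}
rng = random.Random(1)
sizes = [simulate_suppressed(adj, 0, 0.3, rng) for _ in range(200)]
mean_size = sum(sizes) / len(sizes)
print(f"mean outbreak size: {mean_size:.1f} of {n} nodes")
```

On the complete graph every node is exposed in the first wave and gets exactly one Bernoulli trial, so the outbreak size concentrates around 1 + µ(n − 1), far below the near-total infection the ICM would produce.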

IV. LIMITED ATTENTION AND COGNITIVE HEURISTICS
Why don't social media users respond to multiple exposures to information? Unlike infection with a virus, becoming "infected" with information requires social media users to actively seek it out and decide to share it. The enormous flux of information on social media often saturates the human ability to process information [30]. Faced with an over-abundance of stimuli, humans evolved mechanisms to parsimoniously direct their attention to the most salient stimuli. What is salient depends on context: color, contrast, and motion help guide visual attention to important features of the environment, such as a predator. Social stimuli are also salient, as they aid coordination and help people avoid conflict. A variety of other cognitive heuristics are used to quickly (and unconsciously) focus attention on salient information [31,32].
In the context of social media, information that appears at the top of the web page or user's social feed is salient. As a result of this cognitive heuristic, known as "position bias" [33], people pay more attention to items at the top of a list than those in lower positions. Social influence bias, communicated through social signals, helps direct attention to online content that has been liked, shared or approved by many others [34,35]. Cognitive heuristics interact with how a web site displays information to users to alter the dynamics of social contagion. Twitter presents friends' messages in a user's feed as a chronologically ordered queue, with the most recently tweeted messages at the top. (Similarly, Digg orders news stories submitted by friends in a reverse chronological order.) Due to position bias, a user is more likely to see the newest messages at the top of the feed than older messages in lower positions. Researchers conducted controlled experiments on Amazon Mechanical Turk to quantify position bias [36]. They presented study participants with a list of 100 items and asked them to recommend those they found interesting. Figure 5 shows the relative decrease in recommendations received at each list position, compared to what the items shown in those positions are expected to receive. Items in top list positions (0-5) receive three to five times as much attention as those in lower positions, purely by virtue of being in those positions.
However, in the observational data from social media, we do not know the feed position of a message at the time the user responds to it. Instead, we know that its position should be proportional to its age, i.e., the time since its arrival in the user's feed, when the latter is a queue ordered by time of arrival. Figure 6 confirms the effect. It shows the probability to digg (or retweet) an item as a function of the time since the item's arrival. Though Twitter and Digg differ substantially in their functionality and user interface, they behave very similarly. The probability on both sites drops precipitously with time, which suggests that social media users are far less likely to see (and retweet) older messages in lower feed positions than newer messages in top feed positions.

FIG. 6. Digg stories were only followed until promotion (the first 24 hours), during which time they were only visible to the followers. The data are smoothed using progressively wider smoothing windows, as in [24].
Cognitive heuristics also interact with network structure to alter the dynamics of social contagion. The visibility of an item in a feed of a well-connected user (who follows many others) decreases faster in time than the visibility of an item in the feed of a poorly-connected user (who follows few friends). As a result, well-connected users with more friends rarely retweet old content. This is because these users receive many newer messages from their multitude of friends, which quickly push a given item further down the queue, where it is less likely to be seen. In contrast, poorly-connected users receive few messages, so that the visibility of an item does not decay as quickly. This effect is evident in Figure 6, where the probability to retweet (or digg) an item decreases faster for well-connected users (with more than 250 friends) than for the poorly-connected users (with fewer than 10 friends).
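This interaction between connectivity and visibility can be illustrated with a toy model; the per-friend posting rate, the exponential form of position bias, and the attention scale are all assumptions chosen for illustration, not quantities measured in the studies reviewed here.

```python
import math

def expected_position(n_friends, dt, rate_per_friend=0.5):
    """Expected feed position after time dt: every newer message from any
    friend pushes the item one slot down a chronological feed. The posting
    rate per friend is an illustrative assumption."""
    return n_friends * rate_per_friend * dt

def p_seen(position, attention=10.0):
    """Position bias modeled as an exponential falloff with feed position;
    the decay scale `attention` is a hypothetical parameter."""
    return math.exp(-position / attention)

# After one unit of time, an item is effectively invisible to a user who
# follows 250 friends, but still quite visible to one who follows 10.
well_connected = p_seen(expected_position(250, 1.0))
poorly_connected = p_seen(expected_position(10, 1.0))
print(f"poorly connected: {poorly_connected:.3f}, "
      f"well connected: {well_connected:.2e}")
```

The key design point is multiplicative: an item's position grows linearly with the number of friends, so any decreasing visibility function yields much faster decay for well-connected users.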

FIG. 7.
Exposure response function for Twitter. The figure shows the average probability to retweet some message as a function of the number of friends who previously tweeted it for two classes of users, separated according to the number of friends they follow. The well-connected users with many friends are less likely to retweet; hence, they are less susceptible than users with few friends.
The quickly decaying visibility of information in the feeds of well-connected social media users reduces their susceptibility to becoming "infected" by that information. Figure 7 shows the exposure response function for two classes of Twitter users: those with few friends and those with many friends. The response probability of well-connected users with many friends is much lower compared to poorly-connected users, consistent with the argument that they have a harder time finding specific messages in their long feeds. Note also that both response functions increase monotonically, in line with simple contagion models, such as the ICM. The non-monotonic behavior observed for adoption of hashtags on Twitter [25] and in Figure 3 does not represent complex contagion, but is simply an artifact of averaging over heterogeneous populations of users with different cognitive load, i.e., different volumes of information in their feeds. If we average the curves in Figure 7, we observe an exposure response that initially increases, since both classes of users contribute; however, as the number of "infected" friends increases further, only users with many friends contribute to the response, bringing the average response function down. This is an illustration of "heterogeneity's ruses" [37]: averaging over heterogeneous populations, each with its own behavior, can produce nonsensical behavioral patterns. When studying social systems, one needs to isolate the more homogeneous populations and carry out the analysis within each population [38].
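This averaging artifact is easy to reproduce. In the stylized sketch below, two sub-populations each have a monotonically increasing response, but users with few friends cannot accumulate many infected friends, so they drop out of the average at large k; the transmissibilities and the cutoff are arbitrary illustrative values.

```python
def response(mu, k):
    """Monotone ICM-style exposure response for one sub-population."""
    return 1.0 - (1.0 - mu) ** k

# Two sub-populations with illustrative transmissibilities: poorly-connected
# users respond strongly, well-connected users respond weakly.
mu_few, mu_many = 0.3, 0.02

def pooled(k, max_k_few=3):
    """Pooled response: users with few friends can have at most max_k_few
    infected friends, so beyond that only well-connected users contribute."""
    if k <= max_k_few:
        return 0.5 * response(mu_few, k) + 0.5 * response(mu_many, k)
    return response(mu_many, k)

curve = [pooled(k) for k in range(1, 10)]
# Each sub-population is monotone, yet the pooled curve rises and then falls,
# mimicking the spurious "complex contagion" shape of Fig. 3.
```

Both `response` curves increase with k, yet `curve` first rises and then drops, exactly the signature that, when mistaken for a property of individual behavior, suggests complex contagion.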

V. PREDICTING SOCIAL CONTAGIONS
Knowing how cognitive heuristics constrain user behavior enables us to more accurately predict social contagions. To become "infected," a user must first discover at least one message containing the information. We approximate a message's visibility using the time response function, of the kind shown in Fig. 6, that gives the probability that a user with n_f friends retweets or votes at a time ∆t after the exposure [24].
To understand the dynamics of social contagion, we must also specify how users respond to multiple exposures to information. Here, the details of how the web site presents information matter. Twitter puts each newly retweeted item, i.e., the new exposure, at the top of the followers' feeds, creating a new opportunity for the followers to discover the item. Thus, if k friends tweet some information, it will appear in a user's feed k times in different positions. In contrast, Digg does not change the news story's relative position after a friend's digg, but increments the number of recommendations shown next to the story: after k friends digg a story, it still appears only once in the user's feed, but with the number k next to it. This number serves as a social signal that changes a user's response. The effect of social signals on a user's likelihood to become "infected" can be measured experimentally [35] or estimated from observational data [39].
Putting these factors together, [39] proposed a simple model of social contagion where each exposure can independently cause an "infection" (i.e., a retweet). In contrast to the plain ICM, "infection" probability depends on the visibility of the exposures, which is related to the time of the exposures on Twitter or the time of the first exposure on Digg. A social signal, if present, will amplify "infection" probability. To validate this model, the authors used it to forecast "infections" and compared them to observed "infections." Specifically, they calculated on a minute-by-minute basis the observed frequency that a user with some number of exposures in their feed retweeted some specific information on Twitter or dugg a story on Digg in the subsequent 30 seconds. Then they calculated the theoretical probability that the same user would act in those 30 seconds, given the same exposures. Figure 8 shows the observed vs predicted probability of those infections, for different numbers of exposures in the users' feeds. For reference, perfect forecasts lie along the y = x line. The unbiased fidelity of the proposed model suggests that once visibility of the exposures is taken into account, social contagion operates as a simple contagion, i.e., with infection probability increasing monotonically with the number of exposures. Other works incorporated visibility into models of user behavior that account for user interests [40] and sentiment [41] about topics, their limited attention [42], and the multiple channels for finding information, such as on Digg's front page [22].
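A contagion model in this spirit can be sketched as follows: each exposure contributes independently, weighted by its current visibility in a chronological feed, and a social signal multiplies the response. The functional forms and all parameter values are hypothetical stand-ins, not those fitted in [39].

```python
import math

def p_infection(exposure_ages, n_friends, social_signal=0,
                base=0.05, attention=10.0, rate_per_friend=0.5, boost=0.1):
    """Each exposure independently causes "infection" with a probability
    weighted by its visibility: newer posts from the user's n_friends
    friends push it down the feed, and position bias (an assumed
    exponential falloff) reduces the chance it is seen. A social signal
    (e.g., a vote count) multiplies the response."""
    p_not_infected = 1.0
    for age in exposure_ages:
        position = n_friends * rate_per_friend * age  # slots pushed down
        visibility = math.exp(-position / attention)
        p_not_infected *= 1.0 - base * visibility
    amplification = 1.0 + boost * social_signal
    return min(1.0, (1.0 - p_not_infected) * amplification)
```

Under this sketch, Twitter-style repeated exposures enter as multiple entries in `exposure_ages`, while a Digg-style vote counter enters through `social_signal`; either way, infection probability increases monotonically with the number of exposures once visibility is accounted for.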

VI. DISCUSSION
The notion that networks amplify the flow of information has ignited the imagination of researchers and public alike. The few success stories, songs and videos that have spread in a chain reaction from person to person to reach millions, keep marketers searching for formulas for creating viral campaigns. Success, however, is rare. Empirical studies of the spread of information in online social networks revealed that information rarely spreads beyond the source. The search for answers as to why information fails to spread in social media has uncovered the vital role of the brain's cognitive limits in social media interactions.
These cognitive limits are what differentiates the spread of information from the spread of a virus, and they must be accounted for in models of information diffusion. Specifically, in order to spread some information on social media, a person first has to discover it in his or her social feed. Discovery depends sensitively on how the web site arranges the feed, the flux of incoming information, and the effort the person is willing and able to expend on the discovery process. Moreover, as people add more friends, the volume of information they receive may grow superlinearly due to the friendship paradox in social networks [43] and its generalizations [44,45]: a person's friends are, on average, more active and post more messages than the person does. As a result, the volume of information may inevitably exceed an individual's cognitive capacity, creating conditions for information overload [30]. To deal with information overload, people rely on cognitive heuristics to focus only on salient information. In the context of social media, this means paying attention to the most recent messages at the top of their feed, and ignoring the rest. This reduces the probability that highly connected people will see and spread any given piece of information in their feed, making them less susceptible to becoming "infected." The reduced susceptibility of central users suppresses the spread of social contagions in social media. Accounting for these phenomena in models of information diffusion allows us to more accurately predict how far information will spread online.
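The friendship paradox itself is easy to verify numerically. The sketch below builds a random graph with a few hubs and compares the average degree of nodes to the average (over nodes) of the mean degree of their friends; the graph construction is illustrative.

```python
import random

rng = random.Random(7)
n = 500
# Concentrate half of the edge endpoints on the first 20 nodes to create
# hubs; heavy-tailed degrees make the paradox pronounced.
edges = set()
for _ in range(2000):
    u = rng.randrange(n)
    v = rng.choice([rng.randrange(n), rng.randrange(20)])
    if u != v:
        edges.add((min(u, v), max(u, v)))

deg = [0] * n
friends = {i: [] for i in range(n)}
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
    friends[u].append(v)
    friends[v].append(u)

mean_degree = sum(deg) / n
# Average, over nodes that have friends, of the mean degree of their friends.
nodes = [i for i in range(n) if friends[i]]
mean_friend_degree = sum(
    sum(deg[f] for f in friends[i]) / len(friends[i]) for i in nodes
) / len(nodes)
print(f"mean degree: {mean_degree:.1f}, "
      f"mean friend degree: {mean_friend_degree:.1f}")
```

Because most users link to a hub, the average friend is far better connected (and, by the generalizations cited in the text, more active) than the average user, which is why feed volume can outpace a user's own connectivity.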
The interplay between networks and human cognitive limits may have other non-trivial consequences. Potentially, people who have a higher capacity to process information may put themselves in network positions allowing them greater access to information [40,46], which they may then leverage for personal gain [47,48]. Understanding the role of social networks and cognitive heuristics and biases in individual and collective behavior remains an open research area.