Article

Complex Contagion Features without Social Reinforcement in a Model of Social Information Flow

1
Department of Mathematics & Statistics, University of Vermont, Burlington, VT 05405, USA
2
School of Mathematical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
*
Author to whom correspondence should be addressed.
Entropy 2020, 22(3), 265; https://doi.org/10.3390/e22030265
Submission received: 1 February 2020 / Revised: 21 February 2020 / Accepted: 24 February 2020 / Published: 26 February 2020
(This article belongs to the Special Issue Computation in Complex Networks)

Abstract

Contagion models are a primary lens through which we understand the spread of information over social networks. However, simple contagion models cannot reproduce the complex features observed in real-world data, leading to research on more complicated complex contagion models. A noted feature of complex contagion is social reinforcement: individuals require multiple exposures to information before they begin to spread it themselves. Here we show that the quoter model, a model of the social flow of written information over a network, displays features of complex contagion, including the weakness of long ties and the tendency of increased density to inhibit rather than promote information flow. Interestingly, the quoter model exhibits these features despite having no explicit social reinforcement mechanism, unlike complex contagion models. Our results highlight the need to complement contagion models with an information-theoretic view of information spreading to better understand how network properties affect information flow and which ingredients are most necessary when modeling social behavior.

1. Introduction

Social networks mediated through online platforms are an increasingly important way in which individuals send and receive information, and their influence is now felt in economics, politics, and the workplace [1,2,3,4,5,6]. These platforms provide rich opportunities for researchers to collect and study real-world data related to human behavior and the spread of information. In concert with these datasets, considerable research has worked towards better statistical and information-theoretic tools to quantify information flow [7,8,9] and towards more accurate mathematical models to understand and even predict information flow [10,11,12].
A common approach to measuring information flow over a network is to idealize information as a collection of ‘packets,’ and then track the spread of those packets throughout the network. This approach is especially common when studying social media, where keywords such as hashtags or URLs are easily tracked. More complex phenomena, such as the adoption of behaviors, can also be monitored and used as a proxy for information flow [13]. Treating information flow in this way brings to mind the spread of infections, and epidemiologically inspired models are popular. In this context, the social “diffusion” of information is often characterized as either a simple contagion or a complex contagion [14]. Simple contagions are those where each exposure can independently lead to an infection. Complex contagions, in contrast, introduce a social reinforcement mechanism where multiple exposures are needed before the contagion can spread.
However, despite its simplicity and popularity, there can be drawbacks to treating information as the contagion of discrete packets. Within social media, for example, there is a wealth of written information being posted by users that is ignored when focusing only on particular keywords. Likewise, considerable information could be exchanged between individuals without leading to an observable adoption of behavior. Therefore, we argue in this work that a more nuanced approach grounded in information theory can give a better view of information flow in online social networks while more fully using the available data.
The goal of this work is to study how network properties affect information flow when viewed through an information-theoretic lens, and how this view compares to contagion. We study the quoter model [12], a simple model for individuals generating text data within social media, and apply information-theoretic estimators to the model text. Using both network models and real-world network data, we compare the behavior of information flow in this model with traditional simple and complex contagion to identify the similarities and differences between these contrasting viewpoints. Interestingly, we find that the quoter model exhibits several phenomena characteristic of complex contagion, despite lacking an explicit social reinforcement mechanism, the key feature of complex contagion.
The rest of this work is organized as follows. In Section 2 we describe information-theoretic estimators of information flow and mathematical models of information flow and contagion. In Section 3 we describe the materials and methods used in this study, including simulation details, measures of information flow, the network properties we investigate, and the network data we use. Section 4 presents our results comparing contagion models with the information-theoretically motivated quoter model and exploring how various network properties affect information flow in the quoter model. We conclude with a discussion in Section 5.

2. Background

2.1. Measuring Information Flow

Suppose an individual within a social network generates a stream of text representing posts shared online, on Twitter for example. The entropy rate $h$ of this text captures the information present within it. It can be challenging to estimate $h$ for natural language data, as information is present in the ordering of the words, not just in their relative frequencies [15]. To help address this challenge, Kontoyiannis et al. [16] proved that the estimator
$$\hat{h} = \frac{T \log_2 T}{\sum_{t=1}^{T} \Lambda_t}, \tag{1}$$
converges to the true entropy rate $h$ of a text, where $T$ is the length of the sequence of words and $\Lambda_t$ is the match length of the prefix at position $t$: it is the length of the shortest substring (of words) starting at $t$ that has not previously appeared in the text. This estimator has been used to study human dynamics including mobility patterns and social media predictability [11,17].
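To make the match-length definition concrete, the following is a minimal, unoptimized sketch of the estimator in Equation (1). The function names are illustrative, "previously appeared" is taken to mean an occurrence lying entirely within the words before position $t$, and the convention used when the whole remaining suffix has already appeared (remaining length plus one) is an assumption rather than something stated in the text.

```python
import math


def contains(haystack, needle):
    """True if `needle` occurs as a contiguous run of words inside `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def match_length(past, future):
    """Length of the shortest prefix of `future` that never occurs in `past`."""
    for length in range(1, len(future) + 1):
        if not contains(past, future[:length]):
            return length
    # Assumed convention: the entire remaining suffix was seen before.
    return len(future) + 1


def entropy_rate(words):
    """Estimate the entropy rate (bits per word) as T * log2(T) / sum_t Lambda_t."""
    T = len(words)
    total = sum(match_length(words[:t], words[t:]) for t in range(T))
    return T * math.log2(T) / total


# Match lengths for "a b a b" are 1, 1, 3, 2, so the estimate is 4 * 2 / 7.
print(entropy_rate(["a", "b", "a", "b"]))
```

This brute-force search is quadratic or worse in the text length; it is meant to mirror the definition, not to be efficient.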
Equation (1) generalizes to an estimator of the cross-entropy $h_\times$ between two texts $A$ and $B$ [11,18]:
$$\hat{h}_\times(A \mid B) = \frac{T_A \log_2 T_B}{\sum_{t=1}^{T_A} \Lambda_t(A \mid B)}, \tag{2}$$
where $T_A$ and $T_B$ are the lengths of the two texts, and $\Lambda_t(A \mid B)$ is the length of the shortest substring $[A_t, A_{t+1}, \ldots, A_{t+\Lambda_t(A \mid B)-1}]$ starting at position $t$ of text $A$ not previously seen in text $B$. “Previously,” in this case, refers to all the words of $B$ written prior to the time when the $t$-th word of $A$ was written. Specifically, compute $\Lambda_t(A \mid B)$ by searching for each substring $[A_t], [A_t, A_{t+1}], \ldots$ within $B_{:t} \equiv [B_j \mid \mathrm{time}(B_j) < \mathrm{time}(A_t)]$, the ordered sequence of words in $B$ that appear before the time of the $t$-th word in $A$, until reaching the first substring $[A_t, \ldots, A_{t+\Lambda_t(A \mid B)-1}]$ that is not seen in $B_{:t}$. By matching the future text of $A$ (words posted at times $\geq \mathrm{time}(A_t)$) against the past text of $B$ (words posted at times $< \mathrm{time}(A_t)$) at every $t$, only $B$’s past predictive information about $A$’s future is estimated and temporal precedence is satisfied. The cross-entropy can be applied directly to the texts of a pair of individuals by choosing $B$ to be the text stream of one individual and $A$ the text stream of the other, and Equation (2) can be used to measure the information flow between those individuals by asking how much predictive information about one text is contained within the other. This can be a powerful and effective measure of information flow, as it satisfies temporal precedence of the text streams and it uses all of the available (text) data for the pair of users [7,11,12,16,18].
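The temporal-precedence rule can be sketched in code as follows. This is an illustrative brute-force version, assuming each text is stored as a time-sorted list of `(time, word)` pairs; the data layout, the function names, and the convention for a fully matched suffix are assumptions made for the example.

```python
import math
from bisect import bisect_left


def match_length(past, future):
    """Length of the shortest prefix of `future` that never occurs in `past`."""
    def seen(chunk):
        n = len(chunk)
        return any(past[i:i + n] == chunk for i in range(len(past) - n + 1))

    for length in range(1, len(future) + 1):
        if not seen(future[:length]):
            return length
    return len(future) + 1  # assumed convention when everything matches


def cross_entropy(A, B):
    """Estimate h_x(A | B) in bits; A and B are time-sorted lists of (time, word)."""
    words_A = [w for _, w in A]
    words_B = [w for _, w in B]
    times_B = [t for t, _ in B]
    total = 0
    for t, (time_t, _) in enumerate(A):
        # Number of words in B posted strictly before the t-th word of A.
        cutoff = bisect_left(times_B, time_t)
        total += match_length(words_B[:cutoff], words_A[t:])
    return len(A) * math.log2(len(B)) / total


B = [(0, "x"), (1, "y"), (2, "z")]
A = [(3, "x"), (4, "y"), (5, "q")]
print(cross_entropy(A, B))  # match lengths 3, 2, 1
```

Because only words of $B$ with earlier timestamps are searched, text that $B$ posts after $A$ cannot contribute to the estimate.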
We focus on the cross-entropy estimated using Equation (2) as a pairwise measure of information flow, but generalizations can capture information flow from multiple social ties towards a single individual [11,12]. Doing so allows for measures of more complex information flow such as analogs of transfer entropy or causation entropy [7,8,19]. The best extensions of information flow estimators beyond pairwise measures remains an active and fruitful area of research (see also our discussion in Section 5).
Closely associated with the cross-entropy is the predictability $\Pi$. Predictability, given by Fano’s Inequality [20], provides a bound on how accurately an ideal predictive method can perform when working with data of a given entropy: $\Pi$ is the probability that the most accurate possible method will correctly predict the subsequent word, given the uncertainty of the available information (i.e., the cross-entropy):
$$h(\Pi) + (1 - \Pi) \log(z - 1) \geq h_\times, \tag{3}$$
where $h(\Pi) = -\Pi \log(\Pi) - (1 - \Pi) \log(1 - \Pi)$ and $z$ is the cardinality of the sample space; in our problem, this is the vocabulary size or number of unique words for the quoter model (Section 3.1). The predictability is then given by finding numerically the largest $\Pi$ that satisfies Equation (3). Equation (3) demonstrates that $h_\times$ and $\Pi$ are functionally equivalent (and inversely related, with higher $h_\times$ corresponding to lower $\Pi$ and vice versa) as $z$ is a constant for the model we study here (see also discussion in Section 5). Higher values of $\Pi$ (lower $h_\times$) correspond to higher amounts of information flow.
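One way to find the largest $\Pi$ numerically is bisection, since the Fano bound $h(\Pi) + (1 - \Pi)\log(z - 1)$ decreases monotonically from $\log z$ at $\Pi = 1/z$ to $0$ at $\Pi = 1$. The sketch below assumes base-2 logarithms (to match the estimator in bits) and uses hypothetical function names; it is not taken from the authors' code.

```python
import math


def fano_bound(p, z):
    """h(p) + (1 - p) * log2(z - 1), with the binary entropy h in bits."""
    h = 0.0
    for x in (p, 1.0 - p):
        if x > 0.0:
            h -= x * math.log2(x)
    return h + (1.0 - p) * math.log2(z - 1)


def predictability(h_cross, z, tol=1e-10):
    """Largest p in [1/z, 1] for which fano_bound(p, z) >= h_cross."""
    lo, hi = 1.0 / z, 1.0
    if h_cross >= fano_bound(lo, z):
        return lo  # cross-entropy at or above log2(z): no better than guessing
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fano_bound(mid, z) >= h_cross:
            lo = mid  # bound still holds, so the answer is at least mid
        else:
            hi = mid
    return lo


for h_cross in (3.0, 5.0, 7.0):
    print(h_cross, predictability(h_cross, z=1000))
```

The loop output illustrates the inverse relationship: as the cross-entropy rises, the predictability falls.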

2.2. Quoter Model

To study the effects of network properties on information flow, we use the recently proposed quoter model [12]. The quoter model represents an idealized model of social conversations, meant to capture some of the processes by which individuals in an online social network post text while also being analytically tractable. Nodes in a network generate text streams both by sampling from a given vocabulary distribution and by copying (“quoting”) short sub-sequences of text from their neighbors. This model provides a parameter q, the quote probability that tunes the degree of information flow. (Full details of the model and how we simulate it are given in Section 3.1.) After simulating the quoter model for a given number of time steps (Section 3.1), a text stream has been generated by each node in the network, and we can estimate the cross-entropies between these texts to study the social flow of written information. See Bagrow and Mitchell [12] for full details on the quoter model.

2.3. Other Models of Information Flow

Contagion approaches are often used to model information flow [14]. A classic simple contagion approach to information flow is compartment models, taken from models of epidemics. Two simple compartment models are Susceptible-Infected (SI) and Susceptible-Infected-Recovered (SIR) models. On a network, a small number of nodes are initially “infected” while the remaining nodes are susceptible. The contagion then spreads from those infected nodes with a constant transmission rate per link so that each node in the “S” compartment has a constant probability to move to the “I” compartment with any given exposure. For SIR models, an additional “R” compartment is used to model a recovery process where infected nodes cease spreading the contagion while also becoming immune to reinfection. Many variants on these models exist.
Complex contagion phenomena are typically captured with threshold models [21,22]. Here nodes are again labeled as susceptible or infected, but the probability for a node i to become “infected” is a function of the number of neighbors of that node already infected. If too few neighbors are infected there is zero probability that i will be infected. Yet if a sufficient fraction of i’s neighbors become infected, then i has a non-zero probability of becoming infected. This social reinforcement mechanism is intended to capture the cognitive mechanisms underlying opinion change, knowledge acquisition, and other facets of how individuals respond to and adopt information and ideas [23,24].
Complex contagion leads to several phenomena that differ from simple contagion. For one, there is an interesting cascade window where network density has a non-monotonic relationship with the spread of the contagion. Often denser networks lead to less spread, unlike simple contagion, where a contagion spreads more easily because denser networks afford more opportunities (links) for spreading. Another feature of complex contagion is the complicated role of clustering, which can appear to either promote or inhibit contagion [25,26,27,28]. Complex contagion also exhibits a “weakness of long ties” effect, where long ties impede the flow of contagion [29], in contrast with the seminal “strength of weak ties” result [30] that implies long-range ties have an out-sized role in promoting information flow. Our goal here is to compare the information-theoretic view of information flow provided by the quoter model with the effects of complex contagion, which is commonly used as a non-information-theoretic view of information flow.

3. Materials and Methods

In this study, we use the quoter model on networks to elucidate the role of network structure in information flow. Here we describe the procedures used to simulate the quoter model and to measure information flow between nodes, the network features we study in relation to information flow, and the network models (random graphs) and real-world network datasets we use.

3.1. The Quoter Model

We use the following process to simulate the quoter model on a given network. The quoter model requires a directed graph $G = (V, E)$ (where $N = |V|$ is the number of nodes and $M = |E|$ is the number of edges) and, in the most general case, quote probabilities $q_{uv}$ on each directed edge (we say node $v$ (ego) may quote $u$ (alter) if the edge $u \to v$ exists and has $q_{uv} > 0$). We simplify this for our simulations: when an ego generates new text, with probability $q$ (bidirectional quoting) we pick an alter (predecessor) uniformly at random to quote from; otherwise, with probability $1 - q$, the ego generates new content. If an ego quotes an alter (probability $q$), we copy a random segment of the alter’s past text and append it to the ego’s growing text stream. We take the “quote length” (number of words) being copied to be Poisson-distributed (with mean $\lambda$) for all users. Otherwise, if not quoting (probability $1 - q$), we generate new content by sampling with replacement from a vocabulary distribution $W(w)$ and appending those samples to the ego’s growing text stream, where the number of samples is again Poisson-distributed with mean $\lambda$. We assume a common, fixed vocabulary distribution $W(w)$ that follows a Zipf law of word use, as in prior studies and motivated by real-world language usage patterns [12]. Specifically, a Zipf law defines the probability of using word $w$ to be a power law based on the rank $r_w$ of $w$: $W(w) = H_{z,\alpha}^{-1} r_w^{-\alpha}$, where $z$ is the vocabulary size and $H_{z,\alpha} = \sum_{r=1}^{z} r^{-\alpha}$. Here we take $z = 1000$ as in [12] and, unless otherwise stated, focus on the exponent $\alpha = 1.5$, a value typical of social media data. We focus in this work on $q = 1/2$ and $\lambda = 3$, but we explore the robustness of our results to other parameter choices in Appendix A. This process repeats for $T = 1000N$ time steps so that each user has generated approximately $1000\lambda = 3000$ words when complete.
This number of time steps was chosen to ensure the entropy estimator would converge (see [16,18] for convergence proofs). While very short amounts of text will make the estimated entropy too uncertain to be reliable, this length of text is in line with the empirical convergence of h × reported in real data [11].
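The simulation procedure described in this section can be sketched as below. This is an illustration under several assumptions that the text does not spell out: one ego is chosen uniformly at random at each time step, alters that have not yet written anything are skipped, the quoted segment starts at a uniformly random position in the alter's text, and words are stored with the time step at which they were posted. The names `simulate_quoter` and `poisson` are hypothetical.

```python
import math
import random
from itertools import accumulate


def poisson(lam, rng):
    """Draw a Poisson variate with mean lam (Knuth's method, fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_quoter(predecessors, q=0.5, lam=3, z=1000, alpha=1.5,
                    steps_per_node=1000, seed=0):
    """predecessors[v] lists the alters that ego v may quote.

    Returns {node: [(time step, word rank), ...]}.
    """
    rng = random.Random(seed)
    nodes = list(predecessors)
    ranks = range(1, z + 1)
    # Zipf weights r^(-alpha); random.choices normalizes them internally.
    cum_weights = list(accumulate(r ** -alpha for r in ranks))
    texts = {v: [] for v in nodes}
    for step in range(steps_per_node * len(nodes)):
        ego = rng.choice(nodes)            # assumption: one random ego per step
        length = poisson(lam, rng)         # number of words written this step
        alters = [u for u in predecessors[ego] if texts[u]]
        if alters and rng.random() < q:
            # Quote: copy a random contiguous segment of a random alter's past text.
            source = texts[rng.choice(alters)]
            start = rng.randrange(len(source))
            words = [w for _, w in source[start:start + length]]
        else:
            # New content: sample word ranks with replacement from the Zipf law.
            words = rng.choices(ranks, cum_weights=cum_weights, k=length)
        texts[ego].extend((step, w) for w in words)
    return texts


texts = simulate_quoter({0: [1], 1: [0]}, steps_per_node=100, seed=1)
print({v: len(t) for v, t in texts.items()})
```

Storing the time step alongside each word keeps the output compatible with a cross-entropy estimator that needs to know which words of one text precede a given word of another.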

3.2. Measuring Information Flow over the Network

After generating text streams for all nodes in $G$ by iterating the quoter model, the cross-entropy estimator (Equation (2)) is applied as needed to compute $h_\times$. We compute the cross-entropy over all edges, $\{h_\times\} = \{h_\times(u \mid v) \mid (u, v) \in E\}$, and report the mean $\langle h_\times \rangle$ and variance $\mathrm{Var}(h_\times)$ of these values. (We examine the distribution of $h_\times$ in Appendix B to show that $\langle h_\times \rangle$ and $\mathrm{Var}(h_\times)$ are reasonable summaries of the distribution of $h_\times$.) Likewise, the predictability $\Pi$, given by Fano’s Inequality [20], is a functionally equivalent measure of information flow (as we assume the same vocabulary sizes for nodes in the quoter model). We focus on link-based cross-entropies although the cross-entropy estimator can be applied to non-neighboring nodes. Indeed, when studying the role of community structure in modular networks (see Section 3.4), we also consider cross-entropies between nodes in different modules, to assess information flow between and within said modules.

3.3. Simulating Contagion Models

To compare and contrast information flow in the quoter model, we also simulate traditional models of information flow, specifically simple and complex contagion. For simple contagion we simulate a stochastic SIR model on different networks (1000-node Erdős-Rényi and Barabási-Albert networks, as well as a sample of real-world networks) using [31]. For the simulations here we set the transmission rate to 20 and the recovery rate to 1. We initialize with a random 5% of the nodes infected, and run 10 outbreaks on 100 realizations of the network for each choice of average degree $\langle k \rangle$. For complex contagion we use exactly the same parameters, except that we introduce a threshold function for transmission as in [22], where the transmission rate is set to zero if the proportion of infected neighbors is below some threshold $\phi$ (we set $\phi = 0.18$ following [22]). For all simple and complex contagion simulations we measure the peak outbreak size, noting that larger outbreak sizes conventionally correspond to greater information flow.
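The difference between the two contagion types can be illustrated with a small discrete-time SIR sketch. This is not the simulator of [31] and does not use the rates quoted above: the per-step probabilities `beta` and `gamma`, the function name, and the synchronous update are assumptions chosen only to show how a threshold $\phi$ switches transmission off.

```python
import random


def sir_peak(adj, beta=0.3, gamma=0.1, phi=0.0, seed_frac=0.05, seed=0):
    """Peak number of simultaneously infected nodes in one discrete-time SIR run.

    adj maps each node to a list of neighbours. phi = 0 gives simple contagion;
    phi > 0 blocks transmission to a node until at least a fraction phi of its
    neighbours is infected (threshold-style complex contagion).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    n_seed = max(1, round(seed_frac * len(nodes)))
    infected = set(rng.sample(nodes, n_seed))
    recovered = set()
    peak = len(infected)
    while infected:
        newly = set()
        for v in nodes:
            if v in infected or v in recovered or not adj[v]:
                continue
            k_inf = sum(1 for u in adj[v] if u in infected)
            if k_inf == 0 or k_inf / len(adj[v]) < phi:
                continue  # no exposure, or below the reinforcement threshold
            # Each infected neighbour independently transmits with probability beta.
            if rng.random() < 1.0 - (1.0 - beta) ** k_inf:
                newly.add(v)
        healed = {v for v in infected if rng.random() < gamma}
        infected = (infected - healed) | newly
        recovered |= healed
        peak = max(peak, len(infected))
    return peak


# Complete graph on 20 nodes with certain transmission and recovery.
complete = {i: [j for j in range(20) if j != i] for i in range(20)}
print(sir_peak(complete, beta=1.0, gamma=1.0, phi=0.0))  # simple contagion
print(sir_peak(complete, beta=1.0, gamma=1.0, phi=0.5))  # threshold blocks spread
```

In the dense toy graph a single seed infects everyone under simple contagion, but with a threshold the lone seed never makes up a large enough fraction of any node's neighbours, which mirrors the density effect discussed in the text.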

3.4. Assessing the Impact of Structure on Dynamics

In this work we use several network models (random graphs) tailored to control for various network properties such as density, clustering, and modular structure. Here we describe the models and properties we study in relation to information flow in the quoter model.
Density and Average Degree
To explore how network density relates to information flow, we create Erdős-Rényi and Barabási-Albert networks of $N$ nodes with varying average degree $\langle k \rangle$, allowing us to tune their densities. For the Erdős-Rényi networks we add edges independently with probability $p = \langle k \rangle / (N - 1)$. For the Barabási-Albert model we start with $m = \langle k \rangle / 2$ nodes with no edges and add nodes which each form $m$ links with previous nodes according to preferential attachment. Here we measure how cross-entropies vary with the densities of the networks using their average degree $\langle k \rangle$ and edge density $M / \binom{N}{2}$, where $M$ is the total number of edges in the network. To complement the Erdős-Rényi and Barabási-Albert results, we also compare the densities of real networks with their average cross-entropy.
Degree Heterogeneity
To assess the role of degree heterogeneity in information flow, we study the simplest random graph model with tunable degree heterogeneity, termed “dichotomous networks” in [32]. Dichotomous networks are generated via the configuration model. They have only two types of nodes: those with degree $k_1$ and those with degree $k_2$. We assume there are $N/2$ nodes of each degree and fix $k_1 + k_2$ so that the average degree is fixed. The mean and variance of the degree distribution, respectively, are given by $\mu = \frac{1}{2}(k_1 + k_2)$ and $\sigma^2 = (k_1 - k_2)^2 / 4$. We are interested in how the cross-entropy varies with $k_1/k_2$. When $k_1/k_2 = 1$ the network reduces to a random $k$-regular graph ($\sigma^2 = 0$), while $\sigma^2$ grows as $k_1/k_2 \to 0$.
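The mean and variance quoted above follow directly from the two-point degree distribution. The short derivation below is a sketch; the ratio $\rho = k_1/k_2$ is notation introduced here for convenience and does not appear in the source.

```latex
% Half of the nodes have degree k_1 and half have degree k_2.
\begin{align*}
\mu &= \tfrac{1}{2} k_1 + \tfrac{1}{2} k_2 = \tfrac{1}{2}(k_1 + k_2), \\
\sigma^2 &= \tfrac{1}{2}(k_1 - \mu)^2 + \tfrac{1}{2}(k_2 - \mu)^2
          = \tfrac{1}{2}\left(\tfrac{k_1 - k_2}{2}\right)^2
          + \tfrac{1}{2}\left(\tfrac{k_2 - k_1}{2}\right)^2
          = \frac{(k_1 - k_2)^2}{4}.
\end{align*}
% Writing \rho = k_1 / k_2 with k_1 + k_2 = 2\mu held fixed:
\begin{align*}
k_1 = \frac{2\mu\rho}{1 + \rho}, \qquad k_2 = \frac{2\mu}{1 + \rho}
\quad\Longrightarrow\quad
\sigma^2 = \mu^2 \left(\frac{1 - \rho}{1 + \rho}\right)^2 .
\end{align*}
```

Under the fixed-sum constraint, the variance is therefore zero at $\rho = 1$ and increases toward $\mu^2$ as $\rho \to 0$.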
Clustering
Clustering or triadic closure, the tendency towards forming triangles, is a key feature of social networks. We studied clustering using a network model with tunable numbers of triangles and with a randomization procedure that can lower the number of triangles in an existing network. We quantify a network’s clustering using transitivity $T(G)$, the fraction of possible triangles in the network which actually exist: $T(G) = 3 N_{\text{triangles}} / N_{\text{triads}}$, where $N_{\text{triangles}}$ counts the number of triangles in the network and $N_{\text{triads}}$ is the number of triads or paths of length 2.
We constructed “small-world” networks using the Watts–Strogatz (WS) model [33] to tune their clustering. We generated a one-dimensional periodic lattice of N nodes with k nearest-neighbor connections, and randomly rewired lattice edges with a rewiring probability p. Varying the rewiring probability p allows us to tune the network diameter and clustering.
While the Watts–Strogatz model lets us generate networks with different clustering values, a generic challenge when assessing the impact of clustering (and other network properties) on dynamics is generating networks with tunable clustering while controlling for other structural properties, such as density or diameter. To study the relationship between transitivity and information flow, we apply the established degree-preserving stochastic rewiring or “x-swap” method [34,35,36], in which we repeatedly choose two links at random and swap two randomly selected endpoints of those links, as long as the swap does not change the number of links and the network does not become disconnected. These swaps lower transitivity while fixing the number of links and degrees of all nodes in the network. We performed $5M$ swaps for each real network, where $M$ is the number of links. Examining information flow on the randomized network compared with information flow on the original network can then illustrate what effect, if any, transitivity had on information flow.
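A minimal sketch of the transitivity measure and the degree-preserving rewiring is given below. It assumes an undirected graph stored as a dictionary of neighbour sets with integer node labels, rechecks connectivity by breadth-first search after every swap (simple but slow), and uses hypothetical function names; the cited x-swap implementations may differ in detail.

```python
import random
from collections import deque


def transitivity(adj):
    """3 * (# triangles) / (# paths of length 2); adj maps node -> set of neighbours."""
    # Summing common neighbours over ordered adjacent pairs counts each triangle 6 times.
    closed = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u])
    triads = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return (closed / 2) / triads if triads else 0.0


def is_connected(adj):
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        for v in adj[queue.popleft()] - seen:
            seen.add(v)
            queue.append(v)
    return len(seen) == len(adj)


def swap_links(adj, remove, add):
    for u, v in remove:
        adj[u].discard(v)
        adj[v].discard(u)
    for u, v in add:
        adj[u].add(v)
        adj[v].add(u)


def x_swap(adj, n_swaps, seed=0):
    """Rewire a-b, c-d into a-d, c-b in place; returns the number of accepted swaps."""
    rng = random.Random(seed)
    edges = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    done = tries = 0
    while done < n_swaps and tries < 100 * n_swaps:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # randomize which endpoints are exchanged
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue  # would create a self-loop or a duplicate link
        swap_links(adj, [(a, b), (c, d)], [(a, d), (c, b)])
        if is_connected(adj):
            edges[i], edges[j] = (a, d), (c, b)
            done += 1
        else:
            swap_links(adj, [(a, d), (c, b)], [(a, b), (c, d)])  # undo
    return done


# Ring lattice: 20 nodes, each linked to its 4 nearest neighbours.
ring = {i: {(i + d) % 20 for d in (-2, -1, 1, 2)} for i in range(20)}
print(transitivity(ring))
x_swap(ring, 30, seed=1)
print(transitivity(ring))
```

Each accepted swap removes one link from and adds one link to every node involved, so all degrees and the total link count are unchanged.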
Community Structure and Modularity
Community structure is another inherent property of social networks. It is commonly quantified using modularity [37]:
$$Q = \frac{1}{2M} \sum_{i,j} \left( a_{ij} - \frac{k_i k_j}{2M} \right) \delta(c_i, c_j), \tag{4}$$
where $M$ is the total number of links, the sum runs over all pairs of nodes in the network, $A = [a_{ij}]$ is the adjacency matrix of the network, $k_i$ is the degree of node $i$, $\delta$ is the Kronecker delta, and $c_i$ denotes the community containing $i$. The community structure encoded in the $\{c_i\}$ can be found using a community detection algorithm or it may be planted within a network model. To investigate community structure within a network model, we examined instances of the stochastic block model (SBM) [38,39] with $N$ nodes and two planted blocks, or groups of nodes, denoted $A$ and $B$, of equal size $m = N/2$. Here there are two connection probabilities: $p_0$ (the within-block connection probability) and $p_1$ (the between-block connection probability) governing the probability for a link to form between nodes in the same block and in different blocks, respectively. The expected modularity in this two-block stochastic block model is
$$Q = \frac{1}{2} \, \frac{p_0 - p_0 m + p_1 m}{p_0 - p_0 m - p_1 m}. \tag{5}$$
Our main quantities of interest are the average cross-entropy on within-block edges, $h_\times^{(\text{within})}$, the average cross-entropy on between-block edges, $h_\times^{(\text{between})}$, and their difference, $\Delta h_\times \equiv h_\times^{(\text{between})} - h_\times^{(\text{within})}$. These quantities describe to what extent information flows within and between communities.
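As a sanity check on the expected modularity, taken here as $Q = \frac{1}{2}(p_0 - p_0 m + p_1 m)/(p_0 - p_0 m - p_1 m)$, the sketch below compares the closed form with a plug-in calculation from expected link counts. The plug-in argument (each of the two equal blocks carries half the total degree, so $Q$ equals the within-block fraction of links minus $1/2$) is an assumption of this illustration, and the function names are hypothetical.

```python
def expected_modularity(p0, p1, m):
    """Closed form for Q in the two-block SBM with blocks of size m."""
    return 0.5 * (p0 - p0 * m + p1 * m) / (p0 - p0 * m - p1 * m)


def modularity_from_link_counts(p0, p1, m):
    """Plug-in value: within-block fraction of expected links minus 1/2."""
    within = 2 * p0 * m * (m - 1) / 2   # two blocks, each with m(m-1)/2 possible links
    between = p1 * m * m                # m * m possible links across the blocks
    return within / (within + between) - 0.5


for p0, p1 in [(0.10, 0.01), (0.05, 0.05), (0.01, 0.10)]:
    print(p0, p1, expected_modularity(p0, p1, m=500),
          modularity_from_link_counts(p0, p1, m=500))
```

The last parameter pair, with $p_0 < p_1$, gives a negative value, which corresponds to the case where between-block links outnumber within-block links.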
We also computed modularity for real networks using the Louvain method [40]. The Louvain method is a hierarchical community detection algorithm that finds a partition of nodes that maximizes modularity Q. As commonly done, we initialize each node in its own community.
Multiple Vocabulary Distributions
A recent study [41] showed that heterogeneity in the dynamical parameters can be as important as structural heterogeneity. Communities offer an obvious way to implement such heterogeneity: we also investigate a two-block SBM where we distinguish the two groups $A$ and $B$ by giving them different Zipf exponents $\alpha_A$ and $\alpha_B$, respectively, for their vocabulary distributions.

3.5. Network Datasets

To supplement the above graph models, we also studied contagion and quoter model dynamics on real-world networks. We developed a corpus of 10 social networks spanning a range of sizes and densities that were used as the basis for simulation. See Appendix C for details on network sources and processing. Table 1 shows several descriptive statistics for the networks we analyzed.

4. Results

Here we compare information flow in the quoter model with traditional simple and complex contagion (Section 4.1), then investigate how degree heterogeneity (Section 4.1), clustering (Section 4.2) and network modularity (Section 4.3) affect information flow. We also study how heterogeneity in the parameters affects information flow compared to the effects of network structure (Section 4.4).

4.1. Information Flow and Models of Contagion

A distinguishing feature of simple and complex contagion is that denser networks lead to higher spreading for simple contagion and (mostly) lower spreading for complex contagion. We illustrate this difference using simulations in Figure 1A,B. For the simple and complex contagion models we use the average peak size of the outbreak as our measure of information flow in the network, whereas for the quoter model we use the average predictability over links. The decrease in spreading in complex contagion is due to its social reinforcement mechanism: it is more difficult for a contagion to spread when egos have many alters, as more alters must adopt the contagion before the ego does. Yet we see in Figure 1C that the quoter model, which lacks an explicit social reinforcement mechanism, also exhibits lower information flow at higher density. Here we measure information flow using predictability on links (Section 3.2), which is functionally equivalent (Section 2.1) in our simulations to the cross-entropy $h_\times$ (Figure 1C inset). Note that while the curve for $h_\times$ looks visually similar to that of simple contagion’s average peak size, it measures the opposite effect: higher $h_\times$ corresponds to lower information flow. These results also hold on our corpus of real-world networks (Figure 2).
Somewhat surprisingly, in Figure 1C we see that Erdős-Rényi (ER) and Barabási-Albert (BA) networks are qualitatively indistinguishable in terms of information flow, despite the preponderance of hubs in the latter, which we would expect to play an out-sized role in information flow. To better understand this observation, we investigated the variance of $h_\times$ over links in Figure 3A. We see that the cross-entropy varies more from link to link in BA networks than in ER networks, indicating that hubs do not move the average information flow but do create fluctuations in the flow, especially for sparser networks.
To further explore the role of network structure heterogeneity, we investigate dichotomous networks (Section 3.4). Here half the nodes have degree $k_1$ and the other half have degree $k_2$. Varying the degree ratio $k_1/k_2$ allows us to tune the degree variance within this simplified network model. In Figure 3B we see that the total number of nodes and average degree change the average information flow while the degree heterogeneity ($k_1/k_2$) has little effect. Yet degree heterogeneity does affect the variance of information flow (Figure 3C). These simpler dichotomous networks show the same effects as observed previously in BA networks.
The simplified bimodal degree distribution of dichotomous networks also lets us explore the effects of ego and alter degrees by computing expectations of $h_\times$ conditioned on degree. We see from the grouping of curves in Figure 3D that the degree of the ego (the node being predicted) but not the alter (the node predicting) plays a role in the information flow: degree-$k_1$ egos have more information flow than degree-$k_2$ egos regardless of the degree of the alter.

4.2. Interplay of Clustering and Information Flow

Next, we study how clustering (transitivity) affects information flow. Clustering plays a complicated role in both simple and complex contagion [25,27] and we report interesting, if mixed, results in Figure 4 with the quoter model’s information flow.
First, in Figure 4A we study information flow for small-world networks that are randomly rewired to remove clustering [33]. Regardless of network size or average degree, information flow decreases (higher $h_\times$ in top panel of Figure 4A) as clustering decreases (Figure 4A bottom panel). Note that rewiring also changes the diameter of the small-world network, but we see that the main increase in $h_\times$ occurs when clustering begins to drop. In small-world networks, clustering tends to promote information flow.
Next, in Figure 4B we investigate transitivity in the corpus of real-world networks. For each network, we compute information flow on the original network and on a replicate of the network that is randomized by the “x-swap” method. The x-swap method lowers transitivity for all networks, but for half of the networks it also lowers $h_\times$, contradicting the previous results on small-world networks by indicating that transitivity inhibits information flow. However, it is challenging to draw a sharp conclusion from this x-swap procedure as it also affects other network properties simultaneously. We illustrate this in Figure 4C where we compare four network properties in the original and x-swapped networks. X-swapping affects transitivity but also average shortest path length (ASPL), modularity, and assortativity (degree correlations). This means the changes in information flow seen in Figure 4B may be due to changes in a combination of these (and possibly other) network properties. Unfortunately, it remains an open research problem how best to systematically control for network properties to uncover their effects on dynamics.

4.3. Community Structure and the Weakness of Long Ties

The effects of long-range links on information flow have been investigated for some time, from the seminal “strength of weak ties” [30] to the contrasting “weakness of long ties” in complex contagion [29]. Here we investigate long ties in the context of community structure: in networks with densely connected groups of nodes, long ties act to bridge nodes in different groups. How does information flow differ between groups compared to flow within groups?
Using the stochastic block model (Section 3.4) with two groups of equal size as a model for networks with dense modules, we study in Figure 5 information flow between and within groups. The two-group SBM is parameterized by two connection probabilities, the probability for a link within each group ($p_0$) and the probability for a link between the two groups ($p_1$). In Figure 5A we see that information flow decreases as $p_0$ increases and the network becomes denser. Likewise, the difference in information flow $\Delta h_\times$ increases due to between-block links containing less predictive information (Figure 5B). This supports the well-known “weakness of long ties” feature of complex contagion. For larger values of $p_1$, when there are more links connecting the groups, making them less distinct, this difference decreases. The collapse of curves in Figure 5C indicates that $\Delta h_\times$ is entirely predicated on the network modularity $Q$.
Interestingly, we also remark that $\Delta h_\times$ is always positive, even when $p_0 < p_1$ (equivalently, $Q < 0$). In this “anti-community” regime of the SBM, where there are more links between groups than within groups, we would expect more information flow between groups than within them, yet we observe a weak effect in the opposite direction.

4.4. The Role of Dynamic Heterogeneity

In our results so far, we have treated nodes as identical within the quoter model and focused only on their topological differences within the network. Yet recent studies have underlined the importance of comparing dynamic heterogeneity with structural heterogeneity [41]. Here we take an exploratory step in this direction by considering a generalization of the quoter model where nodes have different vocabulary distributions.
We explored how information flow changes in the stochastic block model when the nodes in the two blocks have different vocabulary distributions. This is intended to model a difference in the nodes between the two groups, capturing in the quoter model a social homophily in how egos write. Specifically, we assume that all nodes share the same vocabulary and draw words from Zipf distributions, but that the exponent of the Zipf distribution differs between blocks: nodes in block A have exponent $\alpha_A$ and nodes in block B have exponent $\alpha_B$. A larger $\alpha$ (steeper distribution) corresponds to a less diverse vocabulary, and could capture a group of nodes that is more consistent and repetitive in their dialog. In contrast, a lower $\alpha$ (shallower distribution) may describe a group of nodes that uses more diverse words.
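The link between the Zipf exponent and vocabulary diversity can be checked numerically. The sketch below builds a Zipf distribution over a finite vocabulary and computes its Shannon entropy for a few exponents. The vocabulary size of 1000 and the function names are assumptions made for illustration; they are not taken from the model description in Section 3.1.

```python
import numpy as np


def zipf_pmf(vocab_size, alpha):
    """Zipf distribution over word ranks 1..vocab_size: P(rank r) proportional to r**(-alpha)."""
    ranks = np.arange(1, vocab_size + 1)
    weights = ranks ** (-float(alpha))
    return weights / weights.sum()


def shannon_entropy_bits(pmf):
    """Shannon entropy of a discrete distribution, in bits."""
    pmf = pmf[pmf > 0]
    return float(-(pmf * np.log2(pmf)).sum())


def sample_words(pmf, n_words, rng):
    """Draw word indices (0-based ranks) independently from the vocabulary distribution."""
    return rng.choice(len(pmf), size=n_words, p=pmf)


vocab_size = 1000  # assumed value, for illustration only
for alpha in (1.5, 2.0, 2.5):
    entropy = shannon_entropy_bits(zipf_pmf(vocab_size, alpha))
    print(f"alpha = {alpha}: entropy = {entropy:.3f} bits")
```

The entropy falls as the exponent grows, which matches the reading of a steeper distribution as a more repetitive vocabulary.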
Figure 6 shows how information flow changes when the two blocks have different vocabulary distributions (Figure 6A,C) compared with the same distribution (Figure 6B). For illustration, we show the Zipfian vocabulary distributions for the two groups as insets in Figure 6. We observe a much stronger trend in how cross-entropy changes with modularity when the exponents are unequal than when they are equal. This underscores how structural features (the degree of modularity) greatly magnify the effects of intrinsic dynamic heterogeneity (different vocabulary distributions). While modularity plays a role even when the two groups have identical vocabulary distributions (Figure 5), this difference is challenging to detect in Figure 6B when viewed on the scale of groups with different vocabulary distributions (Figure 6A,C).

5. Discussion

In this work, we have studied how the social flow of written information can be affected by network properties such as the density of links, preponderance of triangles, and modular or community structure. We focused on the quoter model, a toy model in which a network of individuals communicates by generating text sequences, and applied information-theoretic estimators of the information flow to these texts. We compared information flow in the quoter model with that in traditional simple and complex contagion models.
A particularly intriguing facet of the interplay between quoter model dynamics and network topology is how the quoter model exhibits both the density-driven inhibition of information flow and the weakness of long ties that are signatures of complex contagion, despite lacking an explicit mechanism of social reinforcement. Social reinforcement, the idea that individuals adopt a piece of information only after repeated exposure from social ties, is considered one of the characteristics that distinguishes complex contagion from epidemic spreading. Social reinforcement mechanisms better model how people perceive and react to information. Yet we found here that social reinforcement is not strictly necessary when modeling a more nuanced view of information flow. In particular, considering text streams (as generated by the quoter model) and predictive measures of information flow (as quantified using cross-entropy estimators) allows us to capture how information can be “drowned out” by the increased “cross-talk” that occurs in denser networks, showing how increased density can inhibit information flow. Further pursuing this line of investigation may give more insight into information flow and even human behavior within social networks.
We also found mixed results relating clustering to information flow. For small-world (Watts–Strogatz) networks, increasing the clustering leads to a significant increase in information flow (decrease in cross-entropy). At the same time, however, experiments on real-world networks showed the opposite effect: randomizing networks to lower transitivity while preserving connectedness and the degree distribution leads to an increase in information flow (decrease in cross-entropy) for some networks. However, this well-established randomization procedure does not control for other network properties such as modularity or average shortest path length, so it remains an open question whether the interplay of multiple effects may resolve the discrepancy between these results.
Another interesting result related information flow to community structure, with the modularity $Q$ used to measure the strength of the modular divide. When $Q > 0$, meaning there were fewer links between modules than expected, we found in Figure 5 an increase in cross-entropy between modules compared with the cross-entropy between nodes that share a module, as expected from the “weakness of long ties”. However, we found the same increase in cross-entropy when $Q < 0$, where there were more links between modules than expected. We would initially expect this regime of “anti-community” structure to have more information flow between modules, as there exist more links to facilitate this flow. One possible reason for this anti-community result is that nodes in the same group, while having fewer direct links to one another, may have many links to common nodes in the other group, leading to more similar inputs to their texts. This nonlocal interplay of information flow and network structure is an intriguing avenue for future work.
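The common-neighbor intuition above can be given a rough quantitative form. The following back-of-the-envelope calculation is a sketch added for illustration and is not a result from the simulations. It assumes a two-block SBM with $n$ nodes per block and independent links, and it counts the expected number of common neighbors $c$ of a pair of nodes as a crude proxy for shared inputs. The symbols $n$, $c_{\mathrm{same}}$, and $c_{\mathrm{diff}}$ are introduced here only for this purpose.

```latex
% Two blocks of n nodes each; links are independent, with probability
% p_0 within a block and p_1 between blocks.
\begin{align}
  \mathbb{E}[c_{\mathrm{same}}] &= (n-2)\,p_0^2 + n\,p_1^2
      && \text{(both nodes in the same block)} \\
  \mathbb{E}[c_{\mathrm{diff}}] &= 2(n-1)\,p_0\,p_1
      && \text{(nodes in different blocks)} \\
  \mathbb{E}[c_{\mathrm{same}}] - \mathbb{E}[c_{\mathrm{diff}}]
      &= (n-1)\,(p_0 - p_1)^2 + \left(p_1^2 - p_0^2\right)
\end{align}
```

In the anti-community regime $p_1 > p_0$, both terms of the difference are non-negative, so two nodes in the same block share at least as many neighbors in expectation as two nodes in different blocks. This is consistent with the explanation offered above, although counting common neighbors does not by itself establish the observed cross-entropy difference.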
There are some important limitations to discuss regarding this work. We only considered undirected, unweighted networks. In the context of social networks, this implies all relationships are reciprocal and equal in strength. Future work should extend to directed, weighted networks. Furthermore, a more exhaustive study of the robustness of results to parameter choices is necessary (we take a first step towards this in Appendix A). Vocabulary size is another parameter worth exploring; here we assume it is constant across all nodes. Likewise, cross-entropy (Equation (2)) is a somewhat simplistic information-theoretic measure of information flow, and it is important to consider more advanced measures. Measures such as transfer or causation entropy can offer more insight, quantifying non-redundant information and allowing us to identify indirect influences [7,8]. However, in the context of time-ordered social text data, it is challenging to estimate conditional entropies, making it non-obvious how to implement such measures [12]. Finally, while we observed several features that are signatures of complex contagion, not all features of complex contagion are exhibited by the quoter model. For example, there is an optimal modularity that maximizes spreading of complex contagions within the stochastic block model: if $Q$ is either too small or too large then the contagion will not spread [42]. We were unable to observe a corresponding feature within the quoter model. This warrants further investigation, in particular to understand whether this is due to how the quoter model differs from complex contagion models, to the information-theoretic measure of information flow, or to a combination of the two.
In general, contagion models are a successful way to study information flow in social networks, but to gain more insight it is necessary to adopt more nuanced views of information flow. We argue here that information theory can provide a pathway towards these insights, especially when combined with models such as the quoter model that capture features of human behavior while also modeling key aspects of the data being generated by social network platforms.

Author Contributions

Conceptualization, T.P., L.M. and J.P.B.; Funding acquisition, L.M. and J.P.B.; Investigation, T.P., S.M. and L.M.; Methodology, T.P., T.S. and J.P.B.; Project administration, J.P.B.; Software, T.P., T.S. and L.M.; Supervision, L.M. and J.P.B.; Validation, T.P., S.M. and L.M.; Visualization, T.P.; Writing–original draft, T.P. and J.P.B.; Writing–review & editing, T.P. and J.P.B. All authors have read and agreed to the published version of the manuscript.

Funding

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1447634.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ASPL    Average Shortest Path Length
BA      Barabási-Albert
ER      Erdős-Rényi
SBM     Stochastic Block Model
SI      Susceptible-Infected
SIR     Susceptible-Infected-Recovered
WS      Watts–Strogatz

Appendix A. Further Exploring Quoter Model Parameters

To support our results, here we explore other choices of quoter model parameters ($q$ and $\lambda$). The simulations are done on smaller networks to make a wide sweep of the parameter space less computationally expensive. We first simulate the quoter model on ER, BA, and small-world networks for $q \in \{0.1, 0.5, 0.9\}$ and vary $\langle k \rangle$ or the rewiring probability, $p$, to support results from Section 4.1 and Section 4.2. We then simulate the ER, BA, and small-world experiments again for various combinations of the quote probability $q$ and mean quote length $\lambda$. We evaluate the robustness of results for ER networks as follows. For each combination of $(q, \lambda)$, we calculate the difference $\langle h_\times \rangle_{\langle k \rangle = 20} - \langle h_\times \rangle_{\langle k \rangle = 6}$, where by $\langle h_\times \rangle_{\langle k \rangle = 20}$ we mean the average cross-entropy on ER networks of average degree $\langle k \rangle = 20$. This quantity will be positive if density inhibits information flow. This allows us to assess how the magnitude of our results varies with $(q, \lambda)$, although it does not confirm that a monotonic trend holds. We repeat these calculations with the BA networks and extend them to the small-world networks by replacing $\langle k \rangle$ with $p \in \{0, 1\}$. In general, we find in Figure A1 and Figure A2 that our results are qualitatively robust to parameter choices, with the exception of very small values of $q$, as we expect.
Figure A1. Trends in information flow in ER, BA, and small-world networks for $q \in \{0.1, 0.5, 0.9\}$. Except for very low quote probabilities, we see qualitatively similar trends. (A) ER & BA networks of size $N = 100$ with varying average degree. Each point constitutes 200 simulations. (B) Small-world networks of size $N = 200$ and $k = 6$ with varying rewiring probability. Each point constitutes 500 simulations.
Figure A2. Effects of quoter model parameter choices on observed trends. Information flow is lower for denser ER and BA networks across a range of $q$ and $\lambda$, with the effect being more pronounced at higher values of $q$ and $\lambda$. Likewise, for small-world networks, more clustering (lower $p$) exhibits lower $\langle h_\times \rangle$ (higher information flow) than less clustering (higher $p$), consistent with Figure 4A, with the effect being most pronounced at $q > 0.5$ regardless of $\lambda$. Here, ER & BA networks had $N = 100$ and small-world networks had $N = 200$ and $k = 6$. Each cell constitutes 100 simulations.

Appendix B. Summarizing h ×

In this work, we summarized $h_\times$ by the mean $\langle h_\times \rangle$ and variance $\mathrm{Var}(h_\times)$. In Figure A3, we see that this choice was appropriate: examining the distributions of $h_\times$ for various networks shows that they are approximately normal. We also find the mean and median of $h_\times$ to be approximately equal.
Figure A3. The distributions of $h_\times$ for quoter model simulations on various networks. Examining the distributions supports using $\langle h_\times \rangle$ and $\mathrm{Var}(h_\times)$ as summary statistics, although some real networks show a small bimodality (an excess of $h_\times < 3$ bits). We also remark that the mean and median are approximately equal (solid line shows $\langle h_\times \rangle$, dashed line shows median $h_\times$) for all networks. ER & BA networks have $N = 1000$ nodes with $\langle k \rangle = 12$, and 200 simulations as in Figure 1. Small-world networks have $N = 200$ nodes with $k = 6$ and $p = 10^{-4}$, and 500 simulations as in Figure 4A. Real-world networks are from 300 simulations as in Figure 2 and Figure 4B,C. Quoter model parameters are given in Section 3.1.

Appendix C. Network Corpus

All networks studied here can be found through the Index of Complex Networks (ICON) [43]. We converted any directed or weighted networks to undirected (bidirectional) and unweighted. Details for each of the ten networks:
  • Les Miserables co-appearances [44] [Undirected, Weighted].
  • Hollywood film music [45] [Undirected, Weighted]. This is a bipartite network; we converted it to a one-mode projection (nodes are composers and two composers are linked if they worked with the same producer).
  • Freeman’s EIES dataset [46] [Directed, Weighted]. We used the “personal relationships (time 1)” network.
  • Sampson’s monastery [47] [Directed, Weighted]. We used the Pajek dataset. The weight of a directed link represents how an individual rates the other. The rating can be positive (1,2,3 = top 3 ranked) or negative (-1,-2,-3 = worst 3 ranked). We chose to only keep links which were positive.
  • Golden Age of Hollywood [48] [Directed, Weighted]. We used the aggregated network over 1909-2009.
  • 9-11 terrorist network [49] [Undirected, Unweighted].
  • CKM physicians social network [50] (1966) [Directed, Unweighted]. We used “CKM physicians Freeman” networks hosted by Linton Freeman, and chose the “friend” network (i.e., the third adjacency matrix). We took only the giant component.
  • Kapferer tailor shop [51] (1972) [Undirected, Unweighted]. We used the “Kapferer tailor shop 1” Pajek dataset (kapfts1.dat).
  • Dolphin social network [52] (1994-2001) [Undirected, Unweighted].
  • Email network (Uni. R-V, Spain, 2003) [53] [Directed, Unweighted]. We used the “email-uni-rv-spain-arenas” network.
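The preprocessing steps listed above (dropping direction and weights, optionally keeping only positively weighted links, and taking the giant component) can be sketched with networkx as follows. This is an illustrative sketch, not the authors' pipeline: the function names, the `positive_only` flag, the `weight` attribute name, and the removal of self-loops are all assumptions made here.

```python
import networkx as nx


def to_simple_undirected(G, positive_only=False, weight="weight"):
    """Return an undirected, unweighted copy of G.

    A link is kept between u and v if G has an edge in either direction.
    If positive_only is True, edges with a non-positive weight are dropped
    first. Self-loops are discarded (an assumption of this sketch).
    """
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    for u, v, data in G.edges(data=True):
        if u == v:
            continue
        if positive_only and data.get(weight, 1) <= 0:
            continue
        H.add_edge(u, v)  # no attributes copied, so the result is unweighted
    return H


def giant_component(G):
    """Return the largest connected component of an undirected graph as a new graph."""
    largest = max(nx.connected_components(G), key=len)
    return G.subgraph(largest).copy()


# Small made-up directed, weighted example.
D = nx.DiGraph()
D.add_weighted_edges_from(
    [(1, 2, 3), (2, 1, 1), (2, 3, -2), (3, 4, 2), (4, 6, 1), (5, 5, 1)]
)
H = to_simple_undirected(D, positive_only=True)
print(sorted(H.edges()))
print(sorted(giant_component(H).nodes()))
```

Building a fresh `nx.Graph` rather than calling `to_undirected()` keeps the weight filter and the removal of edge attributes in one place.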

References

  1. Lazer, D.; Pentland, A.; Adamic, L.; Aral, S.; Barabasi, A.L.; Brewer, D.; Christakis, N.; Contractor, N.; Fowler, J.; Gutmann, M.; et al. SOCIAL SCIENCE: Computational Social Science. Science 2009, 323, 721–723.
  2. Tumasjan, A.; Sprenger, T.O.; Sandner, P.G.; Welpe, I.M. Predicting elections with twitter: What 140 characters reveal about political sentiment. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, Washington, DC, USA, 23–26 May 2010; pp. 178–185.
  3. Conover, M.D.; Ferrara, E.; Menczer, F.; Flammini, A. The Digital Evolution of Occupy Wall Street. PLoS ONE 2013, 8, e64679.
  4. Castells, M. Networks of Outrage and Hope: Social Movements in the Internet Age; John Wiley & Sons: Hoboken, NJ, USA, 2015.
  5. De Montjoye, Y.A.; Radaelli, L.; Singh, V.; Pentland, A. Unique in the shopping mall: On the reidentifiability of credit card metadata. Science 2015, 347, 536–539.
  6. Garcia, D. Leaking privacy and shadow profiles in online social networks. Sci. Adv. 2017, 3, e1701172.
  7. Schreiber, T. Measuring Information Transfer. Phys. Rev. Lett. 2000, 85, 461–464.
  8. Sun, J.; Bollt, E.M. Causation entropy identifies indirect influences, dominance of neighbors and anticipatory couplings. Phys. D 2014, 267, 49–57.
  9. Borge-Holthoefer, J.; Perra, N.; Gonçalves, B.; González-Bailón, S.; Arenas, A.; Moreno, Y.; Vespignani, A. The dynamics of information-driven coordination phenomena: A transfer entropy analysis. Sci. Adv. 2016, 2, e1501158.
  10. Wang, D.; Wen, Z.; Tong, H.; Lin, C.Y.; Song, C.; Barabási, A.L. Information spreading in context. In Proceedings of the 20th international conference on World wide web (WWW 2011), Hyderabad, India, 28 March–1 April 2011; pp. 735–744.
  11. Bagrow, J.P.; Liu, X.; Mitchell, L. Information flow reveals prediction limits in online social activity. Nat. Hum. Behav. 2019, 3, 122–128.
  12. Bagrow, J.P.; Mitchell, L. The quoter model: A paradigmatic model of the social flow of written information. Chaos 2018, 28, 075304.
  13. Centola, D. The Spread of Behavior in an Online Social Network Experiment. Science 2010, 329, 1194–1197.
  14. Borge-Holthoefer, J.; Banos, R.; Gonzalez-Bailon, S.; Moreno, Y. Cascading behaviour in complex socio-technical networks. J. Complex Netw. 2013, 1, 3–24.
  15. Shannon, C. Prediction and Entropy of Printed English. Bell Labs Tech. J. 1951, 30, 50–64.
  16. Kontoyiannis, I.; Algoet, P.; Suhov, Y.; Wyner, A. Nonparametric entropy estimation for stationary processes and random fields, with applications to English text. IEEE Trans. Inf. Theory 1998, 44, 1319–1327.
  17. Song, C.; Qu, Z.; Blumm, N.; Barabasi, A.L. Limits of Predictability in Human Mobility. Science 2010, 327, 1018–1021.
  18. Ziv, J.; Merhav, N. A Measure of Relative Entropy between Individual Sequences with Application to Universal Classification. In Proceedings of the IEEE International Symposium on Information Theory, San Antonio, TX, USA, 17–22 January 1993; p. 352.
  19. Sun, J.; Taylor, D.; Bollt, E.M. Causal Network Inference by Optimal Causation Entropy. SIAM J. Appl. Dyn. Syst. 2015, 14, 73–106.
  20. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1991.
  21. Granovetter, M. Threshold Models of Collective Behavior. Am. J. Sociol. 1978, 83, 1420–1443.
  22. Watts, D. A simple model of global cascades on random networks. Proc. Natl. Acad. Sci. USA 2002, 99, 5766–5771.
  23. Centola, D.; Eguíluz, V.M.; Macy, M.W. Cascade dynamics of complex propagation. Phys. A 2007, 374, 449–456.
  24. Ugander, J.; Backstrom, L.; Marlow, C.; Kleinberg, J. Structural diversity in social contagion. Proc. Natl. Acad. Sci. USA 2012, 109, 5962–5966.
  25. Miller, J.C. Percolation and epidemics in random clustered networks. Phys. Rev. E 2009, 80, 020901.
  26. Pastor-Satorras, R.; Castellano, C.; Van Mieghem, P.; Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 2015, 87, 925–979.
  27. O’Sullivan, D.J.; O’Keeffe, G.J.; Fennell, P.G.; Gleeson, J.P. Mathematical modeling of complex contagion on clustered networks. Front. Phys. 2015, 3, 71.
  28. Gray, C.; Mitchell, L.; Roughan, M. Super-blockers and the effect of network structure on information cascades. In Companion Proceedings of The Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 1435–1441.
  29. Centola, D.; Macy, M. Complex Contagions and the Weakness of Long Ties. Am. J. Sociol. 2007, 113, 702–734.
  30. Granovetter, M.S. The Strength of Weak Ties. In Social Networks; Elsevier: New York, NY, USA, 1977; pp. 347–367.
  31. Miller, J.; Ting, T. EoN (Epidemics on Networks): A fast, flexible Python package for simulation, analytic approximation, and analysis of epidemics on networks. J. Open Source Softw. 2019, 4, 1731.
  32. Lambiotte, R. How does degree heterogeneity affect an order-disorder transition? Europhys. Lett. 2007, 78, 68002.
  33. Watts, D.J.; Strogatz, S.H. Collective dynamics of ‘small-world’ networks. Nature 1998, 393, 440–442.
  34. Singh, P.; Sreenivasan, S.; Szymanski, B.; Korniss, G. Threshold-limited spreading in social networks with multiple initiators. Sci. Rep. 2013, 3, 2330.
  35. Milo, R.; Kashtan, N.; Itzkovitz, S.; Newman, M.E.; Alon, U. On the uniform generation of random graphs with prescribed degree sequences. arXiv 2003, arXiv:cond-mat/0312028.
  36. Blitzstein, J.; Diaconis, P. A Sequential Importance Sampling Algorithm for Generating Random Graphs with Prescribed Degrees. Internet Math. 2011, 6, 489–522.
  37. Newman, M.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 2004, 69, 026113.
  38. Danon, L.; Díaz-Guilera, A.; Duch, J.; Arenas, A. Comparing community structure identification. J. Stat. Mech. 2005, 2005, P09008.
  39. Karrer, B.; Newman, M. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 016107.
  40. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, 2008, P10008.
  41. De Arruda, G.F.; Petri, G.; Rodrigues, F.A.; Moreno, Y. Impact of the distribution of recovery rates on disease spreading in complex networks. Phys. Rev. Res. 2020, 2, 013046.
  42. Nematzadeh, A.; Ferrara, E.; Flammini, A.; Ahn, Y.Y. Erratum: Optimal Network Modularity for Information Diffusion. Phys. Rev. Lett. 2014, 113, 088701.
  43. Clauset, A.; Tucker, E.; Sainz, M. The Colorado index of complex networks. Retrieved July 2016, 20, 2018. Available online: https://icon.colorado.edu (accessed on 25 February 2020).
  44. Knuth, D.E. Stanford GraphBase: A platform for Combinatorial Computing; Addison-Wesley: Boston, MA, USA, 1993.
  45. Faulkner, R.R. Music on Demand: Composers and Careers in the Hollywood Film Industry; Transaction Books: New Brunswick, NJ, USA, 1983.
  46. Freeman, S.C.; Freeman, L.C. The Networkers Network: A Study of the Impact of a New Communications Medium on Sociometric Structure; University of California: Irvine, CA, USA, 1979.
  47. Sampson, S.F. A novitiate in a Period of Change: An Experimental and Case Study of Social Relationships. Ph.D. Thesis, Cornell University, Ithaca, NY, USA, 1968.
  48. Taylor, D.; Myers, S.A.; Clauset, A.; Porter, M.A.; Mucha, P.J. Eigenvector-Based Centrality Measures for Temporal Networks. Multiscale Model. Simul. 2017, 15, 537–574.
  49. Krebs, V. Uncloaking Terrorist Networks. First Monday 2002, 7, 43–52.
  50. Burt, R.S. Social Contagion and Innovation: Cohesion versus Structural Equivalence. Am. J. Sociol. 1987, 92, 1287–1335.
  51. Kapferer, B. Strategy and Transaction in an African Factory: African Workers and Indian Management in a Zambian Town; Manchester University Press: Manchester, UK, 1972.
  52. Lusseau, D.; Schneider, K.; Boisseau, O.J.; Haase, P.; Slooten, E.; Dawson, S.M. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 2003, 54, 396–405.
  53. Guimerà, R.; Danon, L.; Díaz-Guilera, A.; Giralt, F.; Arenas, A. Self-similar community structure in a network of human interactions. Phys. Rev. E 2003, 68, 065103.
Figure 1. Denser networks are associated with higher information flow for simple contagion but lower information flow for both complex contagion and the quoter model. Here density is measured by average degree $\langle k \rangle$ for Erdős-Rényi (ER) & Barabási-Albert (BA) model networks. (A) Simple contagion. (B) Complex contagion. (C) Quoter model. (Panel C, inset) Average cross-entropy on links; higher cross-entropies correspond to lower predictabilities and lower information flow, unlike for contagions where higher average peak sizes correspond to higher information flow. Networks consisted of $N = 1000$ nodes and each point constitutes 200 simulations; parameters for simulating information flow in these models are described in Section 3.
Figure 2. Information flow on real-world networks. (A) Simple contagion. (B) Complex contagion. (C) Quoter model. Here information flow measures (average peak size, average text predictability) are compared to network density $M / \binom{N}{2}$. The association between information flow and density, either positive (simple contagion) or negative (complex contagion, quoter model), is significant (Wald test on non-zero regression slope, $p < 0.05$). Each point constitutes 300 simulations.
Figure 3. Exploring the variance of information flow. (A) Variance of cross-entropy is higher at low densities for BA than ER networks despite the average $\langle h_\times \rangle$ being similar (Figure 1C). (B–D) Information flow on dichotomous networks (random networks where all nodes have degree $k_1$ or degree $k_2$, allowing tunable degree heterogeneity) of size $N \in \{500, 1000\}$ with $\langle k \rangle \in \{16, 32\}$. Each point constitutes 500 trials. (B) Average cross-entropy versus $k_1 / k_2$. Degree heterogeneity does not affect average cross-entropy, supporting Figure 1C. Network size has a smaller effect on $\langle h_\times \rangle$ compared to the average degree. (C) Variance of cross-entropy versus $k_1 / k_2$. Higher degree heterogeneity (lower $k_1 / k_2$) leads to higher variation in $h_\times$ over links, indicating the existence of highly predictive nodes and nodes that contribute little predictive information within heterogeneous networks. (D) Dichotomous networks of size $N = 1000$ and $\langle k \rangle = 16$. Average cross-entropy over links conditioned on degrees of endpoints (predicting ego from alter). Only the degree of the ego matters, approximately, not the degree of the alter.
Figure 4. Mixed effects of clustering on information flow. (A) Information flow on small-world networks of size $N \in \{200, 400\}$ and average degree $\langle k \rangle \in \{6, 12\}$. As network rewiring increases (and clustering decreases), $\langle h_\times \rangle$ increases. This suggests that clustered networks promote information flow. Rewiring a small-world network changes the diameter ($L$) as well as the clustering (panel A, bottom); however, $\langle h_\times \rangle$ begins to increase primarily when the clustering begins to drop, not when the diameter begins to drop. Each point constitutes 300 trials. (B) Average cross-entropy versus transitivity for real-world networks. By randomizing networks using the standard “x-swap” method (Section 3.4), we can lower the transitivity and investigate how $\langle h_\times \rangle$ changes. Some networks show little change in $\langle h_\times \rangle$ on randomized networks compared with the original networks, while others show a slight decrease in $\langle h_\times \rangle$. This is especially visible in the inset comparing $\langle h_\times \rangle$ directly. Each point constitutes 300 simulations. (C) Several network properties before and after the x-swap method. While the x-swap method lowers transitivity, it also alters other important network properties, making it challenging to isolate the role of clustering from other properties.
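The randomization step in panels B and C can be illustrated with a minimal sketch. It assumes that the “x-swap” is the usual degree-preserving double edge swap (the actual procedure is defined in Section 3.4, which is not reproduced here), and it uses a ring lattice as an illustrative stand-in for a clustered network. The helper names `ring_lattice`, `transitivity`, and `x_swap` are hypothetical and written in plain Python rather than taken from any library.

```python
import random
from itertools import combinations


def ring_lattice(n, k):
    """Undirected ring lattice: each node links to its k/2 nearest neighbours per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def transitivity(adj):
    """Fraction of connected triples that are closed (3 * triangles / triples)."""
    closed = 0   # each triangle is counted once per centre node, i.e. 3 times
    triples = 0
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return closed / triples if triples else 0.0


def x_swap(adj, n_swaps, rng, max_tries=1_000_000):
    """Assumed x-swap: replace edges a-b, c-d with a-d, c-b, keeping all degrees fixed."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:      # pick one of the two possible rewirings
            c, d = d, c
        # Reject swaps that would create a self-loop or a duplicate edge.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        adj[a].remove(b); adj[b].remove(a)
        adj[c].remove(d); adj[d].remove(c)
        adj[a].add(d); adj[d].add(a)
        adj[c].add(b); adj[b].add(c)
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return done


rng = random.Random(0)
graph = ring_lattice(200, 6)        # illustrative network, not one from the paper
degrees_before = {u: len(nbrs) for u, nbrs in graph.items()}
t_before = transitivity(graph)

n_edges = sum(degrees_before.values()) // 2
x_swap(graph, 10 * n_edges, rng)    # 10 swaps per edge is an arbitrary choice

degrees_after = {u: len(nbrs) for u, nbrs in graph.items()}
t_after = transitivity(graph)
print(f"transitivity before: {t_before:.3f}, after: {t_after:.3f}")
print("degree sequence preserved:", degrees_before == degrees_after)
```

The swap fixes every node's degree, but nothing in it constrains path lengths, modularity, or assortativity, which is consistent with the caveat in panel C that randomization changes more than clustering alone.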
Figure 5. Information flow within the stochastic block model (SBM) of $N = 100$ (two blocks of size $N = 50$). Each point constitutes 10k trials. (A) Average cross-entropy on within-block edges and between-block edges as a function of the within-block connection probability $p_0$ for different between-block connection probabilities $p_1$. (B, C) Examining the cross-entropy difference $\Delta h_\times = h_\times(\text{between}) - h_\times(\text{within})$ across (B) connection probabilities and (C) modularity $Q$. Examining $\Delta h_\times$ as a function of modularity $Q$ shows a clear collapse across values of SBM probabilities. Interestingly, anti-community structure ($Q < 0$) still leads to positive $\Delta h_\times$, indicating that information flow is still more prevalent within blocks.
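A rough calculation shows when the anti-community regime $Q < 0$ arises in this two-block setting. The sketch below assumes the standard Newman modularity of the planted two-block partition and replaces edge counts by their expected values; it is an approximation added for illustration and is not stated in the source. The symbols $n$ (block size), $m$ (expected number of edges), $L_c$ (edges inside block $c$), and $d_c$ (total degree of block $c$) are introduced here.

```latex
\begin{align}
  m &\approx p_0\, n(n-1) + p_1\, n^2, \qquad
  L_c \approx \tfrac{1}{2}\, p_0\, n(n-1), \qquad
  d_c \approx m \quad (c = 1, 2), \\
  Q &= \sum_{c=1}^{2} \left[ \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^{2} \right]
     \approx \frac{p_0\, n(n-1)}{p_0\, n(n-1) + p_1\, n^2} - \frac{1}{2}
     \approx \frac{p_0}{p_0 + p_1} - \frac{1}{2}.
\end{align}
```

Under these assumptions, $Q$ becomes negative roughly when $p_0 < p_1$, that is, when nodes are more likely to connect across blocks than within them.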
Figure 6. Effects of dynamic heterogeneity on information flow in the stochastic block model. Nodes in group A have a Zipfian vocabulary distribution with exponent $\alpha_A$, while nodes in group B have exponent $\alpha_B$. The between-block connection probability is fixed ($p_1 = 0.15$) and the within-block connection probability $p_0$ is varied to generate a range of modularities. Since the structure is symmetric (subgraphs A and B have the same size and expected density), we only show the result of fixing $\alpha_A = 2$ and varying $\alpha_B$. Each point constitutes 150 trials. (A) The vocabulary distribution of group A has a lower Shannon entropy than that of B, and this difference is visible when examining links $A \to A$ and $B \to B$. When examining links $A \to B$ and $B \to A$, the cross-entropy depends mainly on the vocabulary distribution of the alter. As modularity increases, differences between the predictabilities of various nodes are exaggerated. (B) In homogeneous communities, the cross-entropy does not vary with modularity on a comparable scale. (C) The vocabulary distribution of group A has a higher Shannon entropy than that of B. The results mirror those in panel A.
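The link between the Zipf exponent and vocabulary entropy in panels A and C can be checked with a short sketch. It assumes a truncated Zipf distribution over a finite vocabulary; the vocabulary size of 1000 and the exponents 1.5 and 2.5 are illustrative choices, not values from the source, and the function names are hypothetical. It computes the Shannon entropy of the distribution itself, not the cross-entropy estimated from generated text.

```python
import math


def zipf_pmf(alpha, vocab_size):
    """Truncated Zipf distribution: P(rank r) proportional to r**(-alpha)."""
    weights = [r ** -alpha for r in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def shannon_entropy(pmf):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


VOCAB_SIZE = 1000  # assumed for illustration only

for alpha in (1.5, 2.0, 2.5):
    h = shannon_entropy(zipf_pmf(alpha, VOCAB_SIZE))
    print(f"alpha = {alpha}: H = {h:.2f} bits")
```

A larger exponent concentrates probability on the most frequent words and lowers the entropy, so with $\alpha_A = 2$ fixed, group A has the lower-entropy vocabulary when $\alpha_B < 2$ and the higher-entropy vocabulary when $\alpha_B > 2$.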
Table 1. Descriptive statistics for real-world networks used in this study. ASPL: Average Shortest Path Length. Modularity computed using the Louvain method [40].
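| Network | $\lvert V \rvert$ | $\lvert E \rvert$ | $\langle k \rangle$ | Density | Transitivity | ASPL | Modularity | Assortativity |
|---|---|---|---|---|---|---|---|---|
| Sampson’s monastery | 18 | 71 | 7.9 | 0.464 | 0.53 | 1.54 | 0.29 | −0.07 |
| Freeman’s EIES | 34 | 415 | 24.4 | 0.740 | 0.82 | 1.26 | 0.07 | −0.15 |
| Kapferer tailor | 39 | 158 | 8.1 | 0.213 | 0.39 | 2.04 | 0.32 | −0.18 |
| Hollywood music | 39 | 219 | 11.2 | 0.296 | 0.56 | 1.86 | 0.20 | −0.08 |
| Golden Age | 55 | 564 | 20.5 | 0.380 | 0.53 | 1.64 | 0.45 | −0.13 |
| Dolphins | 62 | 159 | 5.1 | 0.084 | 0.31 | 3.36 | 0.52 | −0.04 |
| Terrorist | 62 | 152 | 4.9 | 0.080 | 0.36 | 2.95 | 0.52 | −0.08 |
| Les Miserables | 77 | 254 | 6.6 | 0.087 | 0.50 | 2.64 | 0.56 | −0.17 |
| CKM physicians | 110 | 193 | 3.5 | 0.032 | 0.16 | 4.24 | 0.61 | −0.11 |
| Email Spain | 1133 | 5452 | 9.6 | 0.009 | 0.17 | 3.61 | 0.57 | −0.08 |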
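The average degree and density columns follow directly from the node and edge counts. The sketch below assumes the networks are treated as undirected simple graphs, which is an assumption rather than something the table states, although it reproduces the reported values for the rows shown. The function names are hypothetical.

```python
def average_degree(n_nodes, n_edges):
    """Mean degree of an undirected graph: each edge contributes to two nodes."""
    return 2 * n_edges / n_nodes


def density(n_nodes, n_edges):
    """Fraction of the n(n-1)/2 possible undirected edges that are present."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# (name, |V|, |E|) for three rows of Table 1
rows = [
    ("Sampson's monastery", 18, 71),
    ("Freeman's EIES", 34, 415),
    ("Email Spain", 1133, 5452),
]

for name, n_nodes, n_edges in rows:
    k = average_degree(n_nodes, n_edges)
    rho = density(n_nodes, n_edges)
    print(f"{name}: <k> = {k:.1f}, density = {rho:.3f}")
```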

Pond, T.; Magsarjav, S.; South, T.; Mitchell, L.; Bagrow, J.P. Complex Contagion Features without Social Reinforcement in a Model of Social Information Flow. Entropy 2020, 22, 265. https://doi.org/10.3390/e22030265