Article

Revisiting Information Cascades in Online Social Networks

School of Electrical and Computer Engineering, Ben Gurion University of the Negev, Be’er Sheba 84105001, Israel
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(1), 77; https://doi.org/10.3390/math13010077
Submission received: 21 November 2024 / Revised: 22 December 2024 / Accepted: 25 December 2024 / Published: 28 December 2024
(This article belongs to the Special Issue Big Data and Complex Networks)

Abstract:
It is widely believed that a user’s activity pattern in Online Social Networks (OSNs) is strongly influenced by their friends or the users they follow. Building on this intuition, numerous models have been proposed over the years to predict information propagation in OSNs. Many of these models drew inspiration from the process of infectious spread within a population. While this approach is definitely plausible, it relies on knowledge of users’ social connections, which can be challenging to obtain due to privacy concerns. Moreover, while a significant body of work has focused on predicting macro-level features, such as the total cascade size, relatively little attention has been given to the prediction of micro-level features, such as the activity of an individual user. In this study, we aim to address this gap by proposing a method to predict the activity of individual users in an OSN, relying solely on their interactions rather than prior knowledge of their social network. We evaluated our results on four large datasets, each comprising over 14 million tweets, recorded on the X social network across four different topics over several months. Our method achieved a mean F1 score of 0.86, with a best result of 0.983.

1. Introduction

The propagation of ideas and innovation has driven significant political, cultural, and economic changes. Social scientists have observed that ideas often spread through social connections, mirroring the way an epidemic spreads within a population [1,2]. In 2018, it was found that people in the US consume more news from social media than from newspapers [3], so understanding how information propagates on OSN platforms carries substantial real-world implications, ranging from marketing strategies and warning systems to personalized recommendations and even political stability [4,5,6,7].
Extensive efforts have been made to predict various characteristics of information cascades, such as their size [8], temporal growth [9], and virality [10]. Much of this research has focused on transient copy propagation protocols, a concept introduced in [11], where content is replicated as it moves from user to user, like in a retweet. This has led to the tempting assumption that information in OSNs spreads according to an epidemiological model, such as the Bass model [12] or the susceptible-infected-recovered (SIR) model [13]. However, numerous “cautionary tales” have been raised by researchers regarding the limitations of this optimistic perspective.
One critical assumption in epidemiological models is that all newly infected individuals are part of the susceptible group (i.e., neighbors of infected individuals). However, mass media marketing often employs a “broadcast” mechanism, where large audiences receive information directly from the same source, bypassing social links. For instance, the authors of [14], who analyzed information cascades on the Flickr OSN, discovered that 47% of “favorite” markings (out of 10 million) were not transmitted through social links but via other broadcasting mechanisms employed by Flickr. Similarly, the findings of [15] revealed that 33% of retweets in their dataset credited users whom the retweeters did not follow.
Furthermore, unlike epidemiological models, where diseases spread in waves, it has been observed that most cascades in OSNs are remarkably shallow and wide. For example, Ref. [16] found that the average size of diffusion trees on X is only 1.3, with the vast majority consisting of a single node. This pattern has also been observed across other platforms, such as Digg and Flickr [8,14,17,18]. A study on email forwarding and recommendation chains conducted by [19] revealed that fewer than 10% of forwarded emails reach more than one recipient, an unexpectedly low figure. Similarly, the authors of [20] conducted a large-scale study on the effectiveness of word-of-mouth product recommendations and found that most recommendation chains terminate after just one or two steps.
For clarity, we can categorize research in cascade prediction into two main categories: macro-level and micro-level analysis. Research focusing on macro-level predictions aims to infer global features of cascades, such as final cascade size, width, depth, total number of reshares or clicks, overall adoption rate, or the mean shortest path between users. Conversely, micro-level prediction research focuses on individual users and their specific activities within the network. In this context, micro-level prediction tasks can be seen as a finer-grained version of macro-level tasks. For instance, knowing the activity of each individual user in the network transforms the task of determining macro-level features into a straightforward aggregation problem. Furthermore, insights into individual user activity have additional practical applications, including ad targeting, content personalization, and the prevention of harmful events. While macro-level prediction tasks have been extensively studied, relatively little attention has been devoted to micro-level prediction. In this study, we aim to bridge this gap by focusing on predicting the activity of a single user within a large set of users interacting on an OSN. X is often presented as an information network [21,22] and remains one of the most widely used OSNs today. In addition, its API, accessed through the Tweepy library, makes the collection of large volumes of data fairly easy, which makes it an attractive OSN for analysis. Given the availability of relevant data at the time of collection, the chosen OSN for this study was X.
The data-driven approach has become the predominant method for predicting user interactions and interests in the era of “Big Data”, as collecting raw activity data from individual users and modeling their behavior have become generally more straightforward than manually crafting descriptive features. Previous studies in this domain [15,23,24] have highlighted social links as the strongest predictor for this task. While plausible, relying on social information has significant drawbacks. For instance, extracting social links can be highly time-consuming; on the platform X, the process of extracting social connections is on average six times slower than retrieving other information, making such approaches less suitable for real-time applications. Additionally, access to social link data may be restricted by limitations imposed by the OSN platform. We hypothesize that a carefully designed model can leverage social link information implicitly, thereby eliminating the need for the explicit extraction of social data altogether.
In this work, we address several fundamental questions about the nature of information propagation in OSNs: (1) Can the prediction task be performed using only the time series of user interactions, without relying on additional features such as linguistic information? (2) Is there a model-agnostic algorithm for this purpose? (3) Is the performance of the algorithm influenced by the information on the social links between the users, or can this information be retrieved implicitly from the interactions?
Our contribution may be summarized as follows:
  • We collected four extensive datasets, each comprising over 14 million tweets recorded from X over a period of at least two months. These datasets were centered around four distinct topics that sparked significant discussion during their respective collection periods.
  • We thoroughly analyzed the datasets and proposed a novel criterion, which we term the “broadcasticity of the network” (B). Intuitively, B quantifies the extent to which interactions in an OSN resemble a broadcast process—where a single source disseminates information to many recipients.
  • We propose three simple approaches and one framework based on a deep convolutional neural network, which we call the Tweet Residual Convolutional Network (TWRCN), to address the micro-level prediction problem. Notably, three out of the four proposed models, including the TWRCN, do not require explicit social link information. The TWRCN achieved a mean F1 score of 0.86 across all datasets, with a highest F1 score of 0.983 on one specific dataset.
  • Our final contribution is the release of all the datasets, along with the accompanying code, to the research community, ensuring open access and promoting further achievements in the field.
The following work is organized as follows: In Section 2, we review some of the most influential studies connected to the current research. Section 3 describes the proposed method, outlining its key features and approach. Section 4 presents details of the datasets used for evaluation, while Section 5 describes the experimental setup. The results are presented in Section 6 and discussed in Section 7. Finally, in Section 8 we conclude with a summary of contributions and potential future directions.

2. Related Work

The first line of research in cascade prediction focuses on the problem of identifying the most influential users in the OSN, with the ultimate goal of maximizing the reach of information. In this context, Matsubara et al. [25] modeled user influence using differential equations derived from the susceptible-infected (SI) model. Tang and Liu [26] incorporated both network structural and user-specific topic distribution features. Kempe et al. [27], followed by Chen et al. [28], developed greedy algorithms to address this problem. Feng et al. [29] further extended this approach by introducing a “novelty decay” parameter, which dynamically reduced the transmission probability of posts over time, effectively modeling an “old news” mechanism. Recently, Wu et al. [30] tried to identify key retweeters on the X OSN by incorporating features that relate the author of the initial tweet to its potential retweeters, which included temporal and locational features, as well as relations based on the status and the interactions of the users. Later, Zola et al. [31] proposed a novel measure for the influence of users in the X network based on a grid search algorithm that finds optimal weights to combine custom features created for each user in the dataset.
The second line of research focuses on developing models that best fit cascades observed in the “wild”, with the goal of predicting their behavior in advance. Within this category, Zhao et al. [32] and Mishra et al. [33] designed statistical models based on the self-exciting point process to predict the final cascade size. Moreno et al. [34] utilized the Susceptible-Infected-Recovered (SIR) model to demonstrate that the epidemiological threshold—the minimum number of individuals required to contract the disease for the epidemic to commence—disappears in heterogeneous networks with a scale-free property. These networks are characterized by the connectivity probability P(k) ∼ k^(−2−γ), where k represents the number of connections of an individual in the network. Ghosh and Lerman [35] proposed a custom cascade generating function to predict macroscopic features such as the final cascade size, spread factor, diameter, number of shortest paths, and their mean length. Lastly, Li et al. [9] proposed a custom neural network (NN) architecture to estimate the final size of cascades. Recently, Kong et al. [36] established a relation to a general stochastic SIR epidemiological model in a finite population.
Next, some studies focus on gathering descriptive information about cascades rather than predicting their behavior. Notable works in this category include those by Lerman and Ghosh [8], who studied the dynamics of information propagation on the X and Digg OSNs. They concluded that social links play a critical role in the spread of information. Cha et al. [14] analyzed large-scale traces from the Flickr OSN to uncover the laws governing the speed and depth of information propagation and the influence of social links between users. The authors found that despite the high popularity of some content, it did not necessarily spread quickly or widely, and overall communication between friends accounted for only around 50% of the information flow. Goel et al. [16] analyzed a large dataset collected on the X OSN containing over a billion posts. They proposed a novel measure of post popularity called “structural virality”, which distinguishes between a post becoming popular through a single large broadcast event and one gaining traction via many small reshares. Their findings showed that popular posts exhibit a wide variety of structural virality values, suggesting independence from network size. However, their model, based on an epidemiological infection process with a low transmission rate, failed to replicate the diversity in structural virality observed in the real-world data. Lastly, Ver Steeg et al. [37] examined popular events on the Digg OSN, finding that many posts that initially spread rapidly—exceeding the viral threshold—ultimately influenced only a small portion of the social network (around 0.1%). The authors attributed this outcome to the phenomenon where people are less likely to share posts they have already encountered through other information channels.
Fewer studies have attempted to predict microscopic events at the post or user level. For instance, Galuba et al. [15] predicted whether a popular URL that had been retweeted multiple times would be retweeted again within a given time window. Their model-based approach extended the linear threshold (LT) and the at-least-one (ALO) models. In the LT model, a user reshares information if the sum of influences from all their neighbors exceeds a predefined threshold, whereas, in the ALO model, a user reshares a post if even a single one of their neighbors reshared it. Using these models, the authors achieved an F1 score of 0.7. Later, Petrovic et al. [23] attempted to predict retweets of an original tweet using a machine learning technique based on a passive aggressive algorithm, first introduced in [38]. This algorithm maintains a linear boundary and, for each new example, aims to classify it correctly while minimizing the changes to the boundary’s location. However, their approach yielded significantly lower results, achieving an F1 score of just 0.46. Recently, other studies have explored the use of deep neural networks (DNNs), albeit for the prediction of macro-level features. For instance, Horawalavithana et al. [39] and Li et al. [9] focused on predicting cascade structure, while Qiu et al. predicted the influence of a given user based on its activity. Kefato et al. [40] represented the interactions between users as a linguistic structure and predicted whether a post would become viral by employing a convolutional neural network (CNN). Zhong et al. [24] proposed a framework based on attention mechanisms and user influence to predict the incremental cascade growth during a future time period. Although it achieved impressive results on close time window prediction tasks, the error of their method increased with larger time windows.
Recently, significant research leveraging neural networks has focused on the problem of sentiment analysis in OSNs. In this context, Saquia et al. [41] proposed a custom BERT [42] model combined with Word2Vec [43] and GloVe [44] techniques to enhance the detection of immoral conversations on the X platform. Similarly, Jose and Simritha [45] introduced an innovative approach utilizing natural language processing (NLP) and Long Short-Term Memory (LSTM) [46] networks for sentiment classification. Their method categorized sentiment into three classes—positive, negative, and neutral—achieving a notable accuracy of 86% on a dataset from the X OSN.
Our work builds upon this line of research but differs in two major aspects: (1) Unlike Galuba et al. [15] or Petrovic et al. [23], our model does not rely on model-based assumptions or handcrafted features, such as topic virality or number of followers, and (2) our method does not require explicit knowledge of the social links between users but learns them implicitly from the interactions. To the best of our knowledge, we are the first to use a DNN-based framework to perform a prediction task at the micro-level, achieving an F1 score exceeding 0.8.

3. Materials and Methods

The problem is formulated as follows: for each user u ∈ V, where V is a fixed set of users, and a time interval [t_0, t_0 + ΔT], the task is to predict whether user u will react (e.g., retweet, reply, or mention) to an existing post by another user u′ ∈ V (whom u does not necessarily follow). The input to the prediction algorithm consists of a snapshot of activity for all users in V during the interval [t_0 − ΔT, t_0]. This snapshot is represented as a binary vector, indicating whether each individual user u ∈ V was active at any point within the specified timeframe.
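As a concrete illustration of this formulation, the following sketch (our own toy code, not part of the original pipeline; the user ids and timestamps are hypothetical) builds the binary snapshot vector for a fixed user set from a list of timestamped interactions:

```python
# Illustrative sketch: building the binary activity snapshot for a fixed
# user set V over the interval [t0 - dT, t0]. User ids and timestamps
# here are invented for the example.

def activity_snapshot(events, users, t0, dT):
    """events: list of (user_id, timestamp) interactions.
    Returns a 0/1 vector indexed like `users`, where entry i is 1
    if users[i] was active at any point in [t0 - dT, t0]."""
    active = {u for u, t in events if t0 - dT <= t <= t0}
    return [1 if u in active else 0 for u in users]

users = ["u1", "u2", "u3"]
events = [("u1", 5.0), ("u3", 9.5), ("u2", 12.0)]  # u2 acts only after t0
print(activity_snapshot(events, users, t0=10.0, dT=6.0))  # -> [1, 0, 1]
```

Stacking one such vector per time window yields the input sequence that all four models below consume.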
To this end, we consider four models that incorporate information about the user’s neighbors to varying degrees. The first, and simplest, model is the Maximum Likelihood Estimator (MLE). This model calculates probabilities for user activity in the future based on users’ activity at previous timestamps.
The second model, which we call the Tweet Prior Network (TWPN), tries to mimic the functionality of the MLE by a neural network (NN) with just two layers of equal size—input and output—as illustrated in Figure 1. Each neuron in this model represents a user in a network in two consecutive timestamps. The simplicity of this network reflects its focus on the temporal activity patterns without explicitly modeling social connections.
The third model, the Tweet Mask Network (TWMN), builds on the structure of the TWPN but incorporates additional constraints (Figure 2). Like in the TWPN, each neuron represents the same user across two consecutive time windows. Initially, every neuron in the input layer is connected to every neuron in the output layer. During training, however, the adjacency matrix of the network is used to zero out connections between non-adjacent users, simulating the influence imposed by a user’s neighbors. This approach aligns with the concept of influence in the Linear Threshold (LT) model.
Finally, we introduce a composite framework that combines an Autoencoder (AE) [47] neural network, used for dimensionality reduction, with a Residual Convolutional Neural Network (RCNN) [48] as a predictor. We term this framework the Tweet Residual Convolutional Network (TWRCN) (Figure 3). This model uses a novel transformation process to convert input data that are natively a 1D time series into 2D matrices suitable for convolutional neural network (CNN) processing. The TWRCN identifies latent connections between users and condenses their activity within the network. Notably, this approach does not require explicit social connections as input.
The learning process for all the models, except the MLE (which simply summarizes historical activity and directly calculates predictions without a learning phase), involves feeding the activity vector from time window [τ − 1, τ] into the model and minimizing the error in predictions for the subsequent time window [τ, τ + 1].

3.1. Models Definition

3.1.1. Maximum Likelihood Estimation (MLE)

This probabilistic model learns the conditional probabilities of a user’s activity at a future timestamp (τ + 1) based solely on their activity at the previous timestamp (τ). Essentially, it predicts user activity by computing transition probabilities from one state (active/inactive) to another over consecutive time intervals. Mathematically, it may be expressed as follows:
p_{a,b}^u = Pr( I_u^τ = a | I_u^{τ−1} = b )
where a, b ∈ {0, 1} denote the activity states of the user, and p_{a,b}^u captures how likely user u is to transition between these states. In this context, it is a simple yet effective model, as demonstrated by our experiments, that assumes that user activity follows a consistent temporal pattern without explicitly incorporating social connections or more complex interactions.
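The estimator above can be sketched in a few lines of Python. This is our illustrative reading of the model: count transitions between consecutive binary activity windows for each user and normalize by the occurrences of the conditioning state; the toy history is invented.

```python
# Minimal sketch of the MLE transition estimate p^u_{a,b}: for each user,
# count transitions (state b at tau-1 -> state a at tau) over the training
# windows and normalize per conditioning state b.

def mle_transitions(history):
    """history: list of consecutive binary activity vectors (one per window).
    Returns, per user, a dict p[(a, b)] = Pr(active=a at tau | active=b at tau-1)."""
    n_users = len(history[0])
    probs = []
    for u in range(n_users):
        counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
        for prev, curr in zip(history, history[1:]):
            counts[(curr[u], prev[u])] += 1
        p = {}
        for b in (0, 1):
            total = counts[(0, b)] + counts[(1, b)]
            for a in (0, 1):
                p[(a, b)] = counts[(a, b)] / total if total else 0.0
        probs.append(p)
    return probs

history = [[1, 0], [1, 1], [0, 1], [1, 1]]   # 4 windows, 2 users
p = mle_transitions(history)
# user 0: from active state it went 1->1 once and 1->0 once
print(p[0][(1, 1)])  # -> 0.5
```

At prediction time, the estimated probability is simply rounded to 0 or 1, as described in Section 3.2.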

3.1.2. Tweet Prior Network (TWPN)

The TWPN includes two layers, input and output (as presented in Figure 1), and is designed to predict user activity at timestamp τ + 1 based solely on their activity at timestamp τ. This method partially resembles the MLE model discussed previously, as it also tries to infer a user’s probability to react based on history, without relating to the user’s social connections. Each input and output neuron corresponds to a user’s activity state, and the network employs a tanh(w_u · x_u) activation function to account for both positive and negative correlations between consecutive timestamps, where w_u represents the connection weight and x_u is the input signal. This structure allows the TWPN to mimic the behavior of the MLE model while leveraging the neural network’s adaptability to model temporal patterns in user activity, without considering social links or external influences.
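A minimal sketch of this per-user forward pass, assuming our reading that each output neuron applies tanh to its single weighted input; the weights shown are arbitrary illustrative values, not learned ones:

```python
import math

# Sketch of the TWPN forward pass: one weight per user,
# output_u = tanh(w_u * x_u). Weight values are illustrative only.

def twpn_forward(weights, x):
    return [math.tanh(w * xi) for w, xi in zip(weights, x)]

w = [2.0, -1.5, 0.5]    # hypothetical per-user weights
x = [1.0, 1.0, -1.0]    # activity signals at window tau
out = twpn_forward(w, x)
pred = [1 if o > 0 else 0 for o in out]   # thresholded prediction for tau+1
print(pred)  # -> [1, 0, 0]
```

The sign of the weight lets the model capture either persistence (active stays active) or alternation between consecutive windows.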

3.1.3. Tweet Mask Network (TWMN)

The TWMN is an extension of the TWPN, with the main difference being that it has a fully connected architecture, i.e., each neuron in the input layer is connected to each neuron in the output layer. The key feature of the TWMN is the inclusion of the influence of the neighbors on the activity of a particular user in the OSN. This is achieved using a masking technique during training, where the weights between non-adjacent users are set to zero, effectively excluding them from contributing to the prediction of the user’s activity (Figure 2). In terms of functionality, the TWMN calculates the output for each user by summing the products of the connection weights and the inputs from the active nodes in the input layer. Specifically, the activity status (active or inactive) for a user is determined by the following:
x_u = tanh( Σ_{n∈N(u)} w_{n,u} x_n ) = ( e^{Σ_{n∈N(u)} w_{n,u} x_n} − e^{−Σ_{n∈N(u)} w_{n,u} x_n} ) / ( e^{Σ_{n∈N(u)} w_{n,u} x_n} + e^{−Σ_{n∈N(u)} w_{n,u} x_n} )
where x_n represents the activity of neighboring user n, and w_{n,u} is the weight representing its influence on user u, which allows the model to account for the influence that the user’s neighborhood exerts on its activity.
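The masked computation can be sketched as follows. The adjacency and weight matrices below are small hypothetical examples; in the actual model, the mask is applied to the weights during training rather than recomputed at inference:

```python
import math

# Sketch of the TWMN masking idea: only neighbors (per the adjacency
# matrix) contribute to the weighted sum inside tanh. All matrices are
# illustrative toy values.

def twmn_forward(W, adj, x):
    n = len(x)
    out = []
    for u in range(n):
        s = sum(W[v][u] * x[v] for v in range(n) if adj[v][u])
        out.append(math.tanh(s))
    return out

adj = [[0, 1, 0],            # hypothetical who-influences-whom matrix
       [1, 0, 1],
       [0, 1, 0]]
W = [[0.3, 2.0, 0.7],        # full weight matrix; masked entries are ignored
     [1.0, 0.1, -2.0],
     [0.5, 0.4, 0.2]]
x = [1.0, 1.0, 1.0]
out = twmn_forward(W, adj, x)
print([1 if o > 0 else 0 for o in out])  # -> [1, 1, 0]
```

User 2 is predicted inactive here because its single neighbor exerts a strongly negative influence weight.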

3.1.4. Tweet Residual Convolutional Network (TWRCN)

This model introduces a composite framework designed to predict user interactions in OSNs by leveraging hidden patterns within temporal activity data. The key premise is that user dynamics exhibit spatiotemporal invariance similar to images, i.e., that localized patterns in user behavior remain consistent and can reveal latent influences between users. Drawing inspiration from computer vision, where CNNs like ResNet excel at extracting localized features, this framework adapts such techniques to OSN data analysis.
The model comprises four components: Encoder (E), Inflator (I), Predictor (P), and Decoder (D). First, the Encoder reduces the dimensionality of the 1D input vector, projecting it into a compact latent feature space to minimize computational overhead while enhancing feature representation. The Inflator then transforms the encoded 1D vectors into a 2D matrix by stacking multiple vectors, enabling the application of CNN architectures. The Predictor, utilizing ResNet-18 or ResNet-34, extracts meaningful features and predicts future user interactions. Finally, the Decoder reconstructs the output to match the original input length, providing predictions for user activity (see Figure 3).
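A shape-level sketch of the four-stage pipeline is given below. All stages are crude stand-ins (the real model uses a trained autoencoder and a pretrained ResNet); the sketch only illustrates how a 1D activity vector flows through Encoder, Inflator, Predictor, and Decoder, with the window count K, latent size d, and user count chosen arbitrarily:

```python
# Shape-level sketch of the TWRCN pipeline: Encoder -> Inflator ->
# Predictor -> Decoder. Every stage here is a placeholder used only to
# show the tensor shapes; none of it is the trained model.

def encode(x, d):            # dimensionality reduction: |V| -> d
    return x[:d]             # stand-in for the trained encoder

def inflate(latents):        # stack K latent vectors into a K x d "image"
    return [list(v) for v in latents]

def predict(image):          # stand-in for ResNet: returns one latent row
    d = len(image[0])
    return [sum(row[j] for row in image) / len(image) for j in range(d)]

def decode(z, n):            # map latent prediction back to |V| entries
    return (z * ((n // len(z)) + 1))[:n]

n_users, d, K = 6, 3, 4
windows = [[1, 0, 1, 0, 0, 1]] * K                 # K recent activity vectors
image = inflate([encode(w, d) for w in windows])   # K x d matrix
out = decode(predict(image), n_users)              # back to |V| entries
print(len(image), len(image[0]), len(out))  # -> 4 3 6
```

The point of the Inflator is that stacking consecutive latent vectors produces a 2D input on which standard image-oriented CNNs can operate without architectural changes.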

3.2. Training and Testing the Models

The training process for all the models, except the MLE (which has no training phase; instead, the probabilities for each user were calculated from the training data), was performed as follows:
  • For each dataset, we chose only users that exhibited significant activity during the collection period, as described in Section 4. The number of users for each dataset is summarized in Table 1.
  • Data were segmented into equal intervals of 12 h each, and user activity in each interval was recorded. For every time window, a binary vector of the same size as the number of users in the dataset was created, with 1 placed at the index of each user who was active in that window and 0 otherwise.
  • To enhance the gradient flow, we replaced the 1 s in the training vectors with values sampled from a normal distribution x ∼ N(1, 0.01), and the 0 s with values sampled from x ∼ N(−1, 0.01).
  • To make the results statistically valid, we employed a 5-fold cross-validation method, i.e., choosing 20% of the data for unbiased final testing each time and repeating the process 5 times, each time choosing a distinct test set.
  • For the MLE model, summary statistics were calculated on the training data and tested on the test set. To compute the conditional probabilities denoted by p_{a,b}^u in Equation (1), we calculated for each user u the fraction of occurrences of each combination (a, b) within the pairs (τ_i, τ_{i+1}) for i = 1, …, K, where K is the number of samples in the training set. During the testing phase, we compared the predicted MLE model output vector τ̃_{i+1} with the ground truth and computed the evaluation metrics.
  • For the models based on NNs, we used the standard method employed in this domain, i.e., we split the data into train and test sets in the proportion 80:20 (5-fold cross-validation). Each time, we chose a different subset of the data as the test set. From the data designated for training, we chose 20% for validation. We halted the training procedure if the validation loss did not decrease over the course of 50 consecutive epochs. The models were trained for up to 1000 epochs.
  • For the TWPN and TWMN, the weights were randomly initialized, while for the TWRCN, we initialized ResNet-18/34 with the weights of a model pretrained on the ImageNet task.
  • We used an Adam [49] optimizer with standard settings and a learning rate of 0.001 that was halved each time the loss on the validation set did not decrease for 20 consecutive epochs.
  • We used a mean squared error (MSE) loss for our models.
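The target-softening step from the list above can be sketched as follows; this assumes our reading that the 0 s map to draws around −1, matching the range of the tanh outputs used by the models:

```python
import random

# Sketch of the gradient-smoothing trick: training targets of 1 are
# replaced by draws from N(1, 0.01), and (under our reading of the text)
# targets of 0 by draws from N(-1, 0.01).

def soften(vector, rng):
    return [rng.gauss(1.0, 0.01) if v == 1 else rng.gauss(-1.0, 0.01)
            for v in vector]

rng = random.Random(42)      # seeded only to make the example repeatable
soft = soften([1, 0, 1], rng)
print([round(s) for s in soft])  # -> [1, -1, 1]
```

The small spread keeps the targets effectively binary while avoiding saturated, zero-gradient regions of tanh during training.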
At the test step, we fed each model with τ_i, for i = K + 1, …, N, to receive the predicted vector τ̃_{i+1}. Each entry in τ̃_{i+1} (either the output of tanh or of the decoder, depending on the model) was set to 1 if positive and 0 if negative. The MLE probabilities were rounded to 0 or 1. This vector was then compared with the real τ_{i+1} to compute the precision, recall, and F1 score, defined as follows:
precision = true positives / (true positives + false positives)
recall = true positives / (true positives + false negatives)
F1 = 2 · precision · recall / (precision + recall)
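These metrics can be computed directly from the binary prediction and ground-truth vectors; the sketch below is a straightforward implementation with toy vectors:

```python
# Sketch of the evaluation step: comparing a thresholded prediction vector
# with the ground truth and computing precision, recall, and F1.

def prf1(pred, truth):
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

pred  = [1, 1, 0, 1, 0]   # toy prediction vector
truth = [1, 0, 0, 1, 1]   # toy ground truth
print(tuple(round(v, 3) for v in prf1(pred, truth)))  # -> (0.667, 0.667, 0.667)
```

The zero-denominator guards matter in practice, since sparse activity windows can contain no positive predictions at all.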

4. Data

We collected four datasets from X using the Tweepy API. Each dataset concerns a different major event at that time: Volcano (the eruption of the Taal volcano in the Philippines on 12 January 2020), Kobe (the death of Kobe Bryant in a helicopter crash on 26 January 2020), Princess (the stepping down of the Duke and Duchess of the British royal family in January 2020), and Beirut (the explosion in Beirut on 4 August 2020). Table 2 describes the keywords that were used and the size of each dataset, and Table 3 shows the breakdown according to different types of posts.
There are several possible ways of posting and interacting on X. The simplest post is a tweet, i.e., a post that the user writes themselves and that is not related to any other post. Tweets appear on the home timeline of the sender and on the home timelines of all their followers, but the timeline algorithm of X keeps changing frequently. Features like “in case you forgot” break the synchronous timeline structure, and suggestions of tweets from users one does not follow break the social link structure.
There are three possible reactions to a tweet: a mention is a post that contains the “@username” syntax in the body of the text, a reply is a direct response to another tweet, and a retweet is a propagation of someone else’s tweet (a retweet can also be performed automatically using the 1-click option). All three reactions are visible on the home timeline of the reacting user and on the timeline feeds of all their followers.
The next key component in the data pre-processing was to recover the social links between the users. We can infer two types of graphs. The first is the mention graph, in which a link exists between user u and v if u reacted to v’s post (but u need not follow v). Another graph, the followers graph, is the one recovered using the API of X where a link between u and v exists if user u follows v. Note that there need not be an inclusion relationship between the two sets of links, but the way our dataset was collected (using the mention graph) entails that the followers graph is a subgraph of the mention graph.
We recovered both social networks, and as we shall see, the accuracy of the classification task differs according to which network we use. Let us note that querying the API of X is computationally more expensive, as X limits the number of requests. Therefore, it is interesting to know what the gain is, if any, when using the more costly information of the follower graph of X.
As is customary in many works, users that have low activity are filtered out from the dataset. We created two sets: A, the active users who post at least a posts (either tweet or reshare of some sort), and P, the popular users that were reacted to at least p times. We chose a and p so that the number of users in the dataset would be roughly 10,000 (due to the constraint of memory resources).

Broadcasticity

Another parameter that is of interest is the depth of the cascade. As noted, most cascades are shallow. Various measures have been used to compute depth, including the Wiener index in [16], which calculates the average distance between all pairs of nodes in a diffusion tree. We suggest a much simpler criterion, which does not require the laborious task of computing the diffusion tree. We call it the broadcasticity measure, and it captures how much the diffusion resembles a broadcast rather than an epidemic-like diffusion. It is computed using the Jaccard index of the two sets A and P:
B(A, P) = 1 − J(A, P) = 1 − |A ∩ P| / |A ∪ P|
High broadcasticity means that A and P are almost disjoint, namely, users either tweet or react, but very few do both. This is exactly the case for a shallow 1-hop diffusion. If a diffusion, on the other hand, is deep and wide (like an epidemic spreading in the population), then many users belong to both A and P, and the broadcasticity is low. Note that a shallow diffusion with few long paths still attains large broadcasticity.
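The measure is cheap to compute once the sets A and P are known; a minimal sketch with a toy example (the user ids are invented):

```python
# Sketch of the broadcasticity computation: one minus the Jaccard index
# of the active-user set A and the popular-user set P.

def broadcasticity(A, P):
    A, P = set(A), set(P)
    return 1 - len(A & P) / len(A | P)

# Toy example: only one user both posts and gets reacted to.
A = {"u1", "u2", "u3", "u4"}   # active users
P = {"u4", "u5", "u6"}         # popular users
print(round(broadcasticity(A, P), 3))  # -> 0.833
```

A value near 1 indicates a broadcast-like network where posters and reactors are largely disjoint populations.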
Table 4 describes the various features of each dataset with respect to A, P, and the broadcasticity.

5. The Experiments

Recall the classification task described in Section 3. Given a vector $\tau_i \in \{0,1\}^{|V|}$, in which the $j$-th entry is 1 if user $j$ tweeted in time window $i$, our goal is to predict the vector $\tau_{i+1}$, in which the $j$-th entry is 1 if user $j$ reacted to some post in time interval $i+1$.
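The construction of these activity vectors from a time-stamped log can be sketched as follows; the `(user, timestamp, is_reaction)` log format and the variable names are illustrative, not taken from our implementation.

```python
def activity_vectors(log, users, t0, dt):
    """Turn a time-stamped tweet log into binary activity vectors.

    `log` is a list of (user, timestamp, is_reaction) triples, `users`
    a fixed ordering of the user set V, `t0` the start time, and `dt`
    the window length (12 h in our experiments).  Returns tau, where
    tau[i][j] = 1 if user j posted anything in window i (the model
    input), and rho, where rho[i][j] = 1 if user j *reacted* in
    window i (the label used for the preceding window's prediction).
    """
    idx = {u: j for j, u in enumerate(users)}
    n_windows = max(int((t - t0) // dt) for _, t, _ in log) + 1
    tau = [[0] * len(users) for _ in range(n_windows)]
    rho = [[0] * len(users) for _ in range(n_windows)]
    for u, t, is_reaction in log:
        i = int((t - t0) // dt)
        tau[i][idx[u]] = 1
        if is_reaction:
            rho[i][idx[u]] = 1
    return tau, rho
```

Training pairs for the classifier are then `(tau[i], rho[i + 1])` for consecutive windows.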

5.1. Other Baselines

We benchmarked our model against four other popular models. The first two are random guess models $RND_{p=0.5}$ and $RND_{p=\pi}$, which predict $\tau_{i+1}[j]$ using a fair random coin flip or the fraction $\pi$ of users that were active in the previous time window, respectively. The last two models are variations of the SI model: at-least-one (ALO) and linear threshold (LT). We follow the LT and ALO models that were developed in [15], as we found them to be the most general. The intuition behind both is that if the neighbors of a user were active in the preceding time window, this may sway the user's decision to become active in the next time window. Each algorithm estimates differently the probability that user u will become active in the next time window; Ref. [15] uses the following expression for the LT:
$p_u^{LT} = A\left(\alpha_{v_1,u},\ldots,\alpha_{v_k,u},\beta_u,\gamma\right)\cdot T\left(\mu_u,\sigma_u^2,t_u^{post}\right)$ (7)
The $\alpha_{v_i,u}\in[0,1]$ are the influence of a neighbor $v_i$ of $u$ on $u$. The $\beta_u\in[0,1]$ is a prior probability of user $u$ to become active, and $\gamma\in[0,1]$ is the virality of the topic being discussed. $A(\cdot)$ is an atemporal component, i.e., the probability that the user will respond to some post due to the influence exerted on him by his social circle. Concretely, this component is given by
$A(\cdot) = \sigma_{a,b}\left(\gamma\beta_u + \sum_{v:\,v\to u}\gamma\,\alpha_{v,u}\,p_v\right)$
where $\sigma_{a,b}(x) = \frac{1}{1+e^{a(b-x)}}$ is a sigmoid function.
The second component of Equation (7), $T(\cdot)$, is a temporal component of the predicted activity probability. It represents an empirical observation regarding the time that passes from the moment when a user in the social network initiates a new post until the first of his followers responds to it ($\mu_u$ and $\sigma_u^2$ are the mean and variance of the latter). This component is unique for each social network and is given by
$T\left(\mu_u,\sigma_u^2,t_u^{post}\right)=\frac{1}{2}\,\mathrm{erfc}\left(\frac{\ln(t_u^{post})-\mu_u}{\sigma_u\sqrt{2}}\right)$
The parameter $t_u^{post}$ represents the time of the post's initiation relative to the start of the time window, and $\mathrm{erfc}(x)=1-\frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$ is the complementary Gauss error function.
For A L O , the following is used in [15]:
$p_u^{ALO}=1-\left(1-\gamma\beta_u\right)\prod_{v:\,v\to u}\left(1-\gamma\,\alpha_{v,u}\,p_v\right)$ (10)
In both cases, we followed the rule in [15], where a user is predicted to react if $p_u > 0.5$.
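Putting the pieces together, the two baseline activation probabilities can be sketched as follows. This is an illustrative Python sketch, not the reference implementation of [15]; the default sigmoid parameters a and b and the argument conventions are ours, and in our setting γ = 1 and T(·) = 1.

```python
import math

def sigmoid(x, a=1.0, b=0.0):
    # sigma_{a,b}(x) = 1 / (1 + e^{a(b - x)})
    return 1.0 / (1.0 + math.exp(a * (b - x)))

def p_lt(alphas, probs, beta, gamma=1.0, a=1.0, b=0.0, temporal=1.0):
    """Linear-threshold (LT) probability: the atemporal component A(.)
    times the temporal factor T(.), which we fix to 1 in this setting."""
    influence = gamma * beta + sum(gamma * al * pv
                                   for al, pv in zip(alphas, probs))
    return sigmoid(influence, a, b) * temporal

def p_alo(alphas, probs, beta, gamma=1.0):
    """At-least-one (ALO): 1 minus the probability that neither the
    prior nor any neighbor activates the user."""
    q = 1.0 - gamma * beta
    for al, pv in zip(alphas, probs):
        q *= 1.0 - gamma * al * pv
    return 1.0 - q

def temporal_factor(mu, sigma, t_post):
    # T = 0.5 * erfc((ln(t_post) - mu) / (sigma * sqrt(2)))
    return 0.5 * math.erfc((math.log(t_post) - mu) / (sigma * math.sqrt(2.0)))
```

A user is then predicted to react when the returned probability exceeds 0.5, following the rule of [15].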

5.2. Training the Baseline Models

The baseline models were trained similarly to our models, with a 5-fold cross validation using 80% of the time intervals for training and 20% for testing. Figure 4 shows the total number of time intervals per dataset.
$RND_{p=0.5}$ and $RND_{p=\pi}$ have no tunable parameters. The parameters of ALO and LT were set as follows. Optimization was performed with Python's Hyperopt library (https://hyperopt.github.io/hyperopt/ (accessed on 20 November 2024)), with the F1 score serving as the function to maximize. This library uses the Tree of Parzen Estimators hyperparameter optimization algorithm from [50]. The parameter γ, topic virality, was set to 1, since all tweets come from the same topic. The temporal part T(·) was also set to 1, since we are not interested in whether a tweet is reshared sometime in the future but in whether user u reshares some content within a very limited time interval. Due to computational constraints, the optimization was performed only on the prior vector of the $p_u$'s. The dataset in [15] included 100 URLs for which retweeting was predicted, whereas ours includes 10,000 events, and the number of $\alpha_{u,v}$'s ended up being even larger. Therefore, the influence matrix $\alpha_{u,v}$ was supplied from the output of the TWMN model, where the weight of the edge in the NN that connects v and u may be thought of as the influence of v on u.
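In our experiments, this tuning used Hyperopt's TPE search; the following self-contained stand-in illustrates the idea with a hand-rolled F1 score and a plain random search over a single scalar prior (both simplifications are ours, made only for illustration).

```python
import random

def f1_score(y_true, y_pred):
    """Binary F1 computed from scratch (returns 0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def tune_prior(predict, y_true, n_trials=50, seed=0):
    """Search a scalar prior beta in [0, 1] maximizing F1.

    `predict(beta)` should return a binary prediction vector.  The
    actual optimization in our experiments used Hyperopt's
    Tree-of-Parzen-Estimators search over the whole prior vector;
    this random scalar search only illustrates the objective."""
    rng = random.Random(seed)
    best_beta, best_f1 = 0.0, -1.0
    for _ in range(n_trials):
        beta = rng.random()
        score = f1_score(y_true, predict(beta))
        if score > best_f1:
            best_beta, best_f1 = beta, score
    return best_beta, best_f1
```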

6. Results

We now describe the results of evaluating both the algorithms described in Section 3 and the baseline algorithms of Section 5.1 on the data that we collected (Section 4).
Our raw dataset spans the mention graph; namely, the social links are derived from users resharing posts of other users. We further removed low-activity users from the dataset, leaving us with roughly 10^4 users, as described in Table 4. In our experiments, we also ran the algorithms on the followers graph dataset, which we obtained from the mention graph dataset by querying the API of X and removing users whom no one follows or who follow no one.
To assess the impact of using the adjacency matrix A on the performance of TWMN, we trained a second variant, $TWMN_{all1}$, in which the masking is removed, namely, the NN becomes fully connected (equivalent to replacing A with the all-ones matrix).
The convolutional network TWCRN does not explicitly receive the social links. To understand whether it learns them implicitly, we performed the following permutation test, which resulted in a model we call $TWCRN_{SHUF}$. At training time, the input vectors $\tau_i$ fed into the model are randomly shuffled, anew for every $i$, while the label vectors $\tau_{i+1}$ are kept unchanged.
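The shuffling step of this permutation test can be sketched as follows (an illustrative Python sketch; the function name and data layout are ours):

```python
import random

def shuffled_training_pairs(taus, labels, seed=0):
    """Permutation test used for TWCRN_SHUF: each input vector tau_i is
    independently shuffled (a fresh permutation per window), while the
    label vector tau_{i+1} is left untouched.  If a model's accuracy
    collapses under this shuffling, it must have been exploiting the
    per-user identity of the entries, i.e., implicit structure."""
    rng = random.Random(seed)
    pairs = []
    for tau, label in zip(taus, labels):
        tau = list(tau)
        rng.shuffle(tau)          # new permutation for every i
        pairs.append((tau, list(label)))
    return pairs
```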
Table 5 describes the hardware and the running time it took to train each model. Table 6 presents the results for the mention graph dataset, while Table 7 presents the results for the followers graph dataset. Since the training of the algorithms involves random choices, we repeated each execution five times. The standard deviations in both tables were measured over these five executions.
Note that our dataset is imbalanced. There are more zeros than ones in each τ i . The average density of τ i is given by the precision of the R N D p = 0.5 algorithm.
We summarize our main findings that we read from Table 6 and Table 7:
  • Our first observation is that the results obtained from the followers graph dataset are significantly better for all four datasets, with the F1 score being nearly twice as good in three out of the four (for the leading method). This means that the signal embedded in the followers graph is stronger than the one embedded in the mention graph. To understand why, we ran Botometer [51] on the users that were removed from the mention graph and found that their bot score was on average much lower than that of the remaining users (e.g., for the Princess dataset, 0.97 vs. 1.62 out of 5).
  • The best results across all datasets were achieved by the TWCRN (both for the followers graph and the mention graph), with an average F1 score of 0.86. This NN does not take the social links into account, at least not explicitly (perhaps it learns them implicitly). The network is also completely agnostic to epidemiological models. A clue that TWCRN learns the network structure in some implicit way lies in the fact that $TWCRN_{SHUF}$ failed miserably on all datasets. As mentioned in the introduction, many works have pointed out that information spreads differently in social networks than epidemics do in a population. Thus, it may be that the TWCRN is able to learn this more complicated model of infection, which eludes simple human intuition.
  • Having said that, we see that the simple M L E algorithm performed surprisingly well on all datasets, achieving an average F1 score of 0.78 on the followers graph dataset. The M L E follows a very simple intuition—it learns the user’s “trends”, such as whether the user takes a break between tweets or is a chain-tweeter. Note that the M L E algorithm completely ignored the social links.
  • The TWMN is the only one of our algorithms that takes the social links into account. Indeed, it outperformed the MLE, with an average F1 score of 0.81, but its performance was only slightly better than that of $TWMN_{all1}$ (an average F1 score of 0.8), where again, the social links are not explicitly provided. Hence, we may conclude that the NN is able to learn the important social links by itself and does not need them as explicit input.
  • The T W P N performed better than the MLE, and the same as the T W M N , at an average F1 score of 0.81. The intuition behind the T W P N is similar to the M L E , with the difference being that the T W P N is basically learning a logistic regression per user rather than estimating the probabilities by averaging (although the tanh function is used and not the logit).
  • The two baseline models A L O and L T performed poorly, slightly improving over the random guess models. There are several ways to explain these poor results. This may be attributed to the way we have constructed the classification task, which may not be suitable for such models. Another possibility is that the influence weights α u , v that are used in Equations (7) and (10), which we derived from the weights of T W P N , were not a good choice.
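The "user trends" intuition behind the MLE discussed above can be sketched as follows. This is only an illustrative stand-in that estimates each user's activity probability as their empirical activity rate; the exact estimator is defined in Section 3.

```python
def mle_activity_preds(tau_history, threshold=0.5):
    """Per-user activity frequencies as a stand-in for the MLE model.

    `tau_history` is a list of binary activity vectors (one per time
    window).  Each user's probability of being active in the next
    window is estimated as their empirical activity rate over the
    observed windows, ignoring social links entirely; the prediction
    is 1 when that rate exceeds `threshold`.
    """
    n = len(tau_history)
    dim = len(tau_history[0])
    counts = [sum(tau[j] for tau in tau_history) for j in range(dim)]
    return [int(c / n > threshold) for c in counts]
```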

7. Discussion

In this work, we addressed two main premises:
  • Prediction Model Agnosticism: Can a prediction model be developed without explicit consideration of the motivational aspects of information cascades or an epidemiological viewpoint?
  • Role of Social Link Structure: To what extent is social link structure essential as an input to such prediction algorithms, or can it be learned implicitly?
We answered both questions affirmatively. The TWCRN framework achieved a high mean F 1 score of 0.86 without an explicit provision of social link structure, as demonstrated by the random shuffle test, which indicates that the structure was implicitly utilized. Additionally, simpler algorithms such as the MLE and TWPN achieved slightly lower F 1 scores while completely ignoring the social links. Further research is needed to understand the role of social links in the prediction of user activity, especially in other OSNs and settings.
When excluding the “Beirut” dataset (the smallest in our collection), a notable pattern emerged: the ranking of F1 scores became inversely proportional to the ranking of broadcasticity scores, as shown in Figure 5. The anomalous behavior of the “Beirut” dataset in this relationship may be explained by the nature of the events being analyzed. Specifically, events that spread organically among the population, such as rumors, may differ significantly from events disseminated through mass media that only later appear on OSNs. For instance, the incident in Beirut in August 2020 may reflect a situation where official statements carried more weight than users' opinions. In contrast, the rumors surrounding the Princess in January 2020 plausibly represent a scenario where user-driven narratives dominated, likely because that topic was rooted more in personal opinions than in the factual reports that dominated the tweets related to the event in Beirut. We leave further exploration of this subject for future research.
Initially, we hypothesized that the topic orientation of the datasets would introduce a significant bias into the model, leading to degraded performance on datasets not centered around a specific topic. Surprisingly, this assumption was proven incorrect, as demonstrated by the results on the “Unfiltered” dataset. Not only did the performance on this dataset not decline significantly, but it achieved an F1 score of 0.83, surpassing the performance observed on the “Beirut” dataset. This result underscores the robustness of our method and its ability to generalize across diverse datasets.
Furthermore, the broadcasticity metric for the unfiltered tweets aligns with the trends discussed previously, reinforcing our conclusion that a deeper investigation into the influence of the broadcasticity on the network dynamics is necessary to fully understand its impact.
We found that the results are significantly better on the followers graph dataset, obtained by removing users that interact only with users who are not in their followers or friends list. Our first guess was that these users are bots, but this turned out to be false; we leave this point as well for future work.
Finally, let us discuss the limitations of our approach:
  • Our prediction task is time-constrained, namely, the input to the prediction algorithm is the activity in the last 12 h, and the prediction is made for the following 12 h. While we experimented with these time slices and noted only a minor effect on the results, we did not try other configurations, such as training on a week of activity and predicting the next day.
  • We did not evaluate our models on platforms other than X. Nevertheless, we expect them to generalize well on any platform that provides the same mechanisms for communication and social interaction, since we did not use any domain-specific features such as linguistic features or other meta-data features.
  • Although the TWCRN framework does not require the adjacency network as an input, the data preparation phase, as described in Section 4, still requires the extraction of the social links from the network, leaving the problem of designing a truly social-link-agnostic model for future work.

8. Conclusions

In this work, we addressed the problem of micro-level prediction of user activity in online social networks (OSNs). Our research involved the collection of four large datasets from the X OSN, each exceeding 14 million tweets, focusing on distinct topics that generated significant discussion during their collection periods. These datasets, which can be considered informational cascades, were collected over at least two months, with tweets filtered to include only those containing topic-related words.
To better understand OSN communication dynamics, we introduced a novel criterion called “broadcasticity”, designed to measure the extent to which communication on the OSN resembles a broadcast process. This was achieved by calculating the Jaccard distance between the set of users who produce new content (i.e., tweet) and the set of users who only disseminate it (i.e., retweet).
We proposed four models, including a comprehensive framework, that consider social connections within the OSN to varying degrees. The first model, the MLE, is a simple stochastic approach that calculates a user's probability of becoming active based on their past behavior. Following this, we introduced two basic neural network models, the TWMN and the TWPN, both of which predict user activity; the former incorporates social connection information, while the latter does so without relying on it. Lastly, we presented a composite framework, the TWCRN, which integrates convolutional neural network (CNN) and Autoencoder (AE) architectures. In this framework, the CNN serves as the primary predictor, leveraging its ability to detect latent features within user interactions, akin to its function in visual data analysis. To adapt the 1D interaction vector for the CNN input, we developed an “inflation” procedure, transforming the 1D vector into a 2D matrix by row-wise stacking. The AE model was used to reduce the dimensionality of the input vector, ensuring computational efficiency and enriching its features by projecting it into the latent space. This framework demonstrated superior prediction performance across the evaluated datasets, efficiently inferring social connection information directly from user interactions.
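The inflation procedure described above can be sketched in a few lines; this is an illustrative sketch, and since the number of rows used in the actual framework is not restated here, a square matrix is assumed by default.

```python
def inflate(vec, rows=None):
    """'Inflation' step of the TWCRN: turn a 1D (encoded) activity
    vector into a 2D matrix by stacking copies of it row-wise, so a
    CNN built for 2D image-like input can consume it."""
    rows = len(vec) if rows is None else rows
    return [list(vec) for _ in range(rows)]
```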
For future work, we propose investigating the impact of semantic features on the algorithms by integrating natural language processing (NLP) tools to extract features related to tweet content. These features could serve as valuable predictors for retweet probabilities, enhancing the accuracy of our models. Additionally, we plan to explore the broadcastisity criterion further, applying it to other OSNs to analyze influence on news dissemination patterns and uncover its potential role in understanding network dynamics.
In conclusion, we believe the models presented in this work have significant potential for practical applications, including targeted advertising, preventing the spread of false information, and optimizing strategies for political campaigns. To facilitate further research, we are making the code and dataset used in this study freely available upon request.

Author Contributions

Conceptualization, M.S. and D.V.; methodology, M.S. and D.V.; software, M.S.; validation, M.S.; formal analysis, M.S.; investigation, M.S.; resources, M.S., D.V. and O.H.; data curation, M.S.; writing—original draft preparation, M.S. and D.V.; writing—review and editing, M.S., D.V. and O.H.; visualization, M.S.; supervision, D.V. and O.H.; project administration, D.V. and O.H.; funding acquisition, D.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Due to the size of the data, the authors are unable to store them on a free access server, but they will be handed personally upon request. Please contact the corresponding author at sidorov@post.bgu.ac.il for further details.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Delre, S.A.; Jager, W.; Bijmolt, T.H.; Janssen, M.A. Will it spread or not? The effects of social influences and network topology on innovation diffusion. J. Prod. Innov. Manag. 2010, 27, 267–282. [Google Scholar] [CrossRef]
  2. García-Avilés, J. Diffusion of Innovation. In The International Encyclopedia of Media Psychology; Wiley-Blackwell: Oxford, UK, 2020; pp. 1–8. [Google Scholar] [CrossRef]
  3. Haselton, T. More Americans Now Get News from Social Media Than from Newspapers, Says Survey. 2018. Available online: https://www.cnbc.com/2018/12/10/social-media-more-popular-than-newspapers-for-news-pew.html (accessed on 20 November 2024).
  4. Avvenuti, M.; Cresci, S.; Marchetti, A.; Meletti, C.; Tesconi, M. Predictability or early warning: Using social media in modern emergency response. IEEE Internet Comput. 2016, 20, 4–6. [Google Scholar] [CrossRef]
  5. Zola, P.; Ragno, C.; Cortez, P. A Google Trends spatial clustering approach for a worldwide Twitter user geolocation. Inf. Process. Manag. 2020, 57, 102312. [Google Scholar] [CrossRef]
  6. Zola, P.; Cortez, P.; Brentari, E. Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers. Neural Comput. Appl. 2021, 33, 1245–1260. [Google Scholar] [CrossRef]
  7. Attia, A.M.; Aziz, N.; Friedman, B.; Elhusseiny, M.F. Commentary: The impact of social networking tools on political change in Egypt’s “Revolution 2.0”. Electron. Commer. Res. Appl. 2011, 10, 369–374. [Google Scholar] [CrossRef]
  8. Lerman, K.; Ghosh, R. Information contagion: An empirical study of the spread of news on digg and twitter social networks. In Proceedings of the International AAAI Conference on Web and Social Media, Washington, DC, USA, 23–26 May 2010; Volume 4. [Google Scholar]
  9. Li, C.; Ma, J.; Guo, X.; Mei, Q. Deepcas: An end-to-end predictor of information cascades. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 577–586. [Google Scholar]
  10. Cheng, J.; Adamic, L.; Dow, A.; Kleinberg, J.; Leskovec, J. Can Cascades be Predicted? In Proceedings of the WWW 2014, 23rd International Conference on World Wide Web, Seoul, Republic of Korea, 7–11 April 2014. [Google Scholar] [CrossRef]
  11. Cheng, J.; Kleinberg, J.M.; Leskovec, J.; Liben-Nowell, D.; State, B.; Subbian, K.; Adamic, L.A. Do Diffusion Protocols Govern Cascade Growth? In Proceedings of the Twelfth International Conference on Web and Social Media, ICWSM 2018, Stanford, CA, USA, 25–28 June 2018; pp. 32–41. [Google Scholar]
  12. Bass, F.M. A new product growth for model consumer durables. Manag. Sci. 1969, 15, 215–227. [Google Scholar] [CrossRef]
  13. Kermack, W.O.; McKendrick, A.G. A contribution to the mathematical theory of epidemics. Proc. R. Soc. London. Ser. A, Contain. Pap. A Math. Phys. Character 1927, 115, 700–721. [Google Scholar]
  14. Cha, M.; Mislove, A.; Gummadi, K.P. A measurement-driven analysis of information propagation in the flickr social network. In Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain, 20–24 April 2009; pp. 721–730. [Google Scholar]
  15. Galuba, W.; Aberer, K.; Chakraborty, D.; Despotovic, Z.; Kellerer, W. Outtweeting the twitterers-predicting information cascades in microblogs. WOSN 2010, 10, 3–11. [Google Scholar]
  16. Goel, S.; Anderson, A.; Hofman, J.; Watts, D.J. The structural virality of online diffusion. Manag. Sci. 2016, 62, 180–196. [Google Scholar] [CrossRef]
  17. Bakshy, E.; Hofman, J.; Mason, W.; Watts, D. Everyone’s an Influencer: Quantifying Influence on Twitter. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, Hong Kong, China, 9–12 February 2011; pp. 65–74. [Google Scholar] [CrossRef]
  18. Goel, S.; Watts, D.; Goldstein, D. The structure of online diffusion networks. In Proceedings of the ACM Conference on Electronic Commerce, Valencia, Spain, 4–8 June 2012. [Google Scholar] [CrossRef]
  19. Wu, F.; Huberman, B.A.; Adamic, L.A.; Tyler, J.R. Information flow in social groups. Phys. A Stat. Mech. Its Appl. 2004, 337, 327–335. [Google Scholar] [CrossRef]
  20. Leskovec, J.; Adamic, L.; Huberman, B. The Dynamics of Viral Marketing. ACM Trans. Web 2005, 1, 5. [Google Scholar] [CrossRef]
  21. Myers, S.A.; Sharma, A.; Gupta, P.; Lin, J. Information network or social network? the structure of the twitter follow graph. In Proceedings of the WWW ’14 Companion, 23rd International Conference on World Wide Web, Seoul, Republic of Korea, 7–11 April 2014; pp. 493–498. [Google Scholar] [CrossRef]
  22. Kwak, H.; Lee, C.; Park, H.; Moon, S. What is Twitter, a social network or a news media? In Proceedings of the WWW ’10, 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 591–600. [Google Scholar] [CrossRef]
  23. Petrovic, S.; Osborne, M.; Lavrenko, V. Rt to win! predicting message propagation in twitter. In Proceedings of the International AAAI Conference on Web and Social Media, Barcelona, Spain, 17–21 July 2011; Volume 5. [Google Scholar]
  24. Zhong, C.; Xiong, F.; Pan, S.; Wang, L.; Xiong, X. Hierarchical attention neural network for information cascade prediction. Inf. Sci. 2023, 622, 1109–1127. [Google Scholar] [CrossRef]
  25. Matsubara, Y.; Sakurai, Y.; Prakash, B.A.; Li, L.; Faloutsos, C. Rise and fall patterns of information diffusion: Model and implications. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 6–14. [Google Scholar]
  26. Tang, L.; Liu, H. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 817–826. [Google Scholar]
  27. Kempe, D.; Kleinberg, J.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146. [Google Scholar]
  28. Chen, W.; Wang, C.; Wang, Y. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–28 July 2010; pp. 1029–1038. [Google Scholar]
  29. Feng, S.; Chen, X.; Cong, G.; Zeng, Y.; Chee, Y.M.; Xiang, Y. Influence maximization with novelty decay in social networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; Volume 28. [Google Scholar]
  30. Wu, B.; Cheng, W.H.; Zhang, Y.; Cao, J.; Li, J.; Mei, T. Unlocking Author Power: On the Exploitation of Auxiliary Author-Retweeter Relations for Predicting Key Retweeters. IEEE Trans. Knowl. Data Eng. 2020, 32, 547–559. [Google Scholar] [CrossRef]
  31. Zola, P.; Cola, G.; Mazza, M.; Tesconi, M. Interaction strength analysis to model retweet cascade graphs. Appl. Sci. 2020, 10, 8394. [Google Scholar] [CrossRef]
  32. Zhao, Q.; Erdogdu, M.A.; He, H.Y.; Rajaraman, A.; Leskovec, J. Seismic: A self-exciting point process model for predicting tweet popularity. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 1513–1522. [Google Scholar]
  33. Mishra, S.; Rizoiu, M.A.; Xie, L. Feature driven and point process approaches for popularity prediction. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indianapolis, IN, USA, 24–28 October 2016; pp. 1069–1078. [Google Scholar]
  34. Moreno, Y.; Pastor-Satorras, R.; Vespignani, A. Epidemic outbreaks in complex heterogeneous networks. Eur. Phys. J. B-Condens. Matter Complex Syst. 2002, 26, 521–529. [Google Scholar] [CrossRef]
  35. Ghosh, R.; Lerman, K. A Framework for Quantitative Analysis of Cascades on Networks. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, Hong Kong, China, 9–12 February 2011; pp. 665–674. [Google Scholar]
  36. Kong, Q.; Rizoiu, M.A.; Xie, L. Modeling information cascades with self-exciting processes via generalized epidemic models. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 286–294. [Google Scholar]
  37. Ver Steeg, G.; Ghosh, R.; Lerman, K. What stops social epidemics? In Proceedings of the International AAAI Conference on Web and Social Media, Barcelona, Spain, 17–21 July 2011; Volume 5. [Google Scholar]
  38. Crammer, K.; Dekel, O.; Keshet, J.; Shalev-Shwartz, S.; Singer, Y. Online Passive-Aggressive Algorithms. J. Mach. Learn. Res. 2006, 7, 551–585. [Google Scholar]
  39. Horawalavithana, S.; Skvoretz, J.; Iamnitchi, A. Cascade-LSTM: Predicting Information Cascades using Deep Neural Networks. arXiv 2020, arXiv:2004.12373. [Google Scholar]
  40. Kefato, Z.T.; Sheikh, N.; Bahri, L.; Soliman, A.; Montresor, A.; Girdzijauskas, S. Cas2vec: Network-agnostic cascade prediction in online social networks. In Proceedings of the 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), Valencia, Spain, 15–18 October 2018; pp. 72–79. [Google Scholar]
  41. Saqia, B.; Khan, K.; Rahman, A.U.; Khan, W. Deep Learning-Based Identification of Immoral Posts on Social Media Using Fine-tuned Bert Model. Int. J. Data Inform. Intell. Comput. 2024, 3, 26–39. [Google Scholar]
  42. Devlin, J. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  43. Mikolov, T. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
  44. Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
  45. Jose, J.; Simritha, R. Sentiment Analysis and Topic Classification with LSTM Networks and TextRazor. Int. J. Data Inform. Intell. Comput. 2024, 3, 42–51. [Google Scholar]
  46. Hochreiter, S. Long Short-term Memory. In Neural Computation; MIT-Press: Cambridge, MA, USA, 1997. [Google Scholar]
  47. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
  48. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  49. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  50. Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for hyper-parameter optimization. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS 2011), Granada, Spain, 12–15 December 2011; Volume 24. [Google Scholar]
  51. Davis, C.A.; Varol, O.; Ferrara, E.; Flammini, A.; Menczer, F. BotOrNot: A System to Evaluate Social Bots. In Proceedings of the WWW ’16 Companion, 25th International Conference Companion on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; pp. 273–274. [Google Scholar] [CrossRef]
Figure 1. Tweet Prior Network ( T W P N ) schematic representation. The vector τ i is the activity vector in the time window [ t i Δ T , t i ] , and τ ˜ i + 1 is the predicted activity in time window [ t i , t i + Δ T ] .
Figure 2. Tweet Mask Network ( T W M N ) schematic representation. The vector τ i is the activity vector in the time window [ t i Δ T , t i ] , and τ ˜ i + 1 is the predicted activity in time window [ t i , t i + Δ T ] .
Figure 3. Tweet Convolutional Residual Network ( T W C R N ) schematic representation, where τ i is the activity vector in the time window [ t i Δ T , t i ] , τ ˜ i + 1 is the predicted activity in time window [ t i , t i + Δ T ] . (a) the Encoder part of the Autoencoder (AE) network with K layers [47], which is responsible for compressing the original activity vector. (b) The inflation step that receives a 1D vector and transforms it into a 2D matrix by copying it row-wise. (c) The ConvResNet with L = 18 or L = 34 layers, as described in [48]. (d) The Decoder network, which constitutes the second part of the AE network with M layers, which is responsible for the final decompression of the prediction vector.
Figure 4. Histogram of the average activity of users in a single time slice of Δ T = 12 h for the Princess dataset. Top: statistics for the mention graph dataset. Bottom: statistics for the followers graph dataset. Evidently, most users tweet or reshare only one post per time slice.
Figure 4. Histogram of the average activity of users in a single time slice of Δ T = 12 h for the Princess dataset. Top: statistics for the mention graph dataset. Bottom: statistics for the followers graph dataset. Evidently, most users tweet or reshare only one post per time slice.
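The per-user statistic shown in Figure 4 can be reproduced directly from a raw post log. The sketch below (function and variable names are ours, not the paper's) buckets Unix timestamps into ΔT = 12 h slices and averages each user's per-slice post count over the slices in which that user was active:

```python
from collections import defaultdict

SLICE_SECONDS = 12 * 3600  # ΔT = 12 h, the slice width used in Figure 4

def mean_activity_per_slice(posts):
    """posts: iterable of (user_id, unix_timestamp) pairs.
    Returns {user_id: mean number of posts per time slice in which
    the user was active at least once}."""
    per_slice = defaultdict(lambda: defaultdict(int))
    for user, ts in posts:
        per_slice[user][ts // SLICE_SECONDS] += 1
    return {u: sum(c.values()) / len(c) for u, c in per_slice.items()}

# toy log: user "a" posts twice in slice 0 and once in slice 1
log = [("a", 100), ("a", 200), ("a", SLICE_SECONDS + 1), ("b", 50)]
print(mean_activity_per_slice(log))  # {'a': 1.5, 'b': 1.0}
```

Averaging only over a user's active slices (rather than all slices) matches the figure's observation that most users post about once per slice.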
Figure 5. Relation between the broadcastisity (B) parameter and the corresponding F1 score for all the datasets. As the plot shows, once the “Beirut” dataset is excluded, a clear negative relation between B and the F1 score emerges.
Table 1. Mean shortest path E[L] between any two randomly chosen nodes in the social networks. E_c[L] = log(N), where N is the number of valid users in the social networks and E_c[L] is the critical mean shortest path. Networks that satisfy E[L] ≤ E_c[L] (i.e., have the small-world property) are presented in bold.
| Name | N (crawl) | E_c[L] (crawl) | E[L] (crawl) | N (log) | E_c[L] (log) | E[L] (log) |
|---|---|---|---|---|---|---|
| Volcano | 8990 | 3.953 | **2.943** | 10,440 | 4.019 | 4.648 |
| Kobe | 8339 | 3.921 | **2.368** | 9225 | 3.965 | **2.593** |
| Princess | 9793 | 3.991 | **2.869** | 11,891 | 4.075 | **3.579** |
| Beirut | 7943 | 3.9 | **2.498** | 8876 | 3.948 | **3.103** |
| Unfiltered | 7208 | 3.86 | 6.23 | 7363 | 3.87 | 6.23 |
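Table 1's small-world criterion compares E[L] against E_c[L] = log(N); the reported E_c[L] values match a base-10 logarithm (e.g., log10(8990) ≈ 3.953). A minimal sketch of the check for an undirected adjacency-list graph (function names are ours):

```python
import math
from collections import deque

def mean_shortest_path(adj):
    """E[L]: mean shortest-path length over all ordered pairs of
    mutually reachable nodes, computed by BFS from every node."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())  # dist[src] = 0 adds nothing
        pairs += len(dist) - 1       # exclude the source itself
    return total / pairs

def is_small_world(adj):
    # Table 1's criterion: E[L] <= E_c[L] = log10(N)
    return mean_shortest_path(adj) <= math.log10(len(adj))

# toy star graph on 4 nodes: E[L] = 18 / 12 = 1.5
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(mean_shortest_path(star))  # 1.5
```

The criterion is only meaningful at scale; tiny toy graphs like the one above fail it because log10(N) is small.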
Table 2. Description of the five datasets that we collected. The first four were collected during different “viral” events that took place during 2020 and were filtered to include only tweets related to those events. The fifth dataset is an unfiltered tweet stream recorded during August 2021.
| Name | Keyword | Posts |
|---|---|---|
| Volcano | taal, volcano, philippines | 17,994,283 |
| Kobe | kobe, bryant | 24,562,000 |
| Princess | meghan, markle, harry, prince, princess | 22,353,429 |
| Beirut | beirut, lebanon, explosion | 14,106,557 |
| Unfiltered | - | 56,338,915 |
Table 3. Breakdown of the post types for each dataset, where PO is a general post (i.e., tweet, retweet, or mention), TW is the initial tweet, RT is a retweet created either with the “RT @username” syntax or by clicking the dedicated button, and MT is a mention.
| Name | PO | TW | RT | MT |
|---|---|---|---|---|
| Volcano | 17,994,283 | 3,400,415 | 11,810,708 | 2,783,160 |
| Kobe | 24,562,000 | 2,962,387 | 20,768,520 | 831,093 |
| Princess | 22,353,429 | 4,694,745 | 15,457,234 | 2,201,450 |
| Beirut | 14,106,557 | 1,951,752 | 11,418,457 | 736,348 |
| Unfiltered | 56,338,915 | 35,728,226 | 813,013 | 19,797,676 |
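Table 3's TW/RT/MT breakdown can be approximated from post text alone with a simple heuristic (function name and regexes are ours). Note that button-based retweets carry no "RT @" prefix in the text, so a real pipeline would also inspect the API's retweet metadata:

```python
import re

def post_type(text):
    """Heuristic classification into Table 3's categories.
    Text-only; button-based retweets need API metadata instead."""
    if re.match(r"RT @\w+", text):
        return "RT"          # classic "RT @username" syntax
    if re.search(r"@\w+", text):
        return "MT"          # mentions another user
    return "TW"              # original tweet with no mention

print(post_type("RT @alice: big news"))  # RT
print(post_type("thanks @bob!"))         # MT
print(post_type("just landed in Taal"))  # TW
```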
Table 4. Summary of the datasets after filtering out low-activity users, i.e., users with fewer than a retweets, which constitute the A set, and the ones whose tweets were retweeted at least p times—the P set. The broadcastisity parameter is presented in the last column.
Name Posts slicesa | A | p | P | | A P | | A P | B
Volcano2,710,9151081033189207775732110,6250.970
Kobe1,380,543871674674167483518693230.980
Princess2,710,915872345151234786699312,0240.917
Beirut960,228117105383021057786618,9470.926
Unfiltered721,30128100394320152388882530.989
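The filtering step behind Table 4 can be sketched as two threshold tests over per-user counts (function and variable names are ours; the per-dataset threshold values a and p appear in the table):

```python
def split_user_sets(retweets_made, times_retweeted, a, p):
    """A: users who made at least `a` retweets (active users);
    P: users whose tweets were retweeted at least `p` times."""
    A = {u for u, n in retweets_made.items() if n >= a}
    P = {u for u, n in times_retweeted.items() if n >= p}
    return A, P

# toy counts only; real counts come from the post logs of Table 3
retweets_made   = {"u1": 12, "u2": 3, "u3": 25}
times_retweeted = {"u1": 1, "u3": 40, "u4": 9}
A, P = split_user_sets(retweets_made, times_retweeted, a=10, p=5)
print(sorted(A), sorted(P), sorted(A | P), sorted(A & P))
```

The union A ∪ P and intersection A ∩ P then give the combined-population columns of the table.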
Table 5. Hardware specs with corresponding mean run times, where CPU_low is an Intel i7-8550U @ 1.80 GHz, CPU_high is an Intel Xeon @ 2.00 GHz, and GPU is a Tesla P100-PCIE-16GB. [H], [m], and [s] stand for hours, minutes, and seconds, respectively.
| Model | CPU | RAM | Accelerator | Run Time |
|---|---|---|---|---|
| RND (p=0.5) | CPU_low | 32 GB | None | 31.75 ± 1.48 [s] |
| RND (p=prop) | CPU_low | 32 GB | None | 30.00 ± 1.22 [s] |
| MLE | CPU_low | 32 GB | None | 28.38 ± 1.41 [s] |
| ALO | CPU_high | 16 GB | None | 1.45 ± 0.38 [H] |
| LT | CPU_high | 16 GB | None | 1.29 ± 0.31 [H] |
| TWMN-SHUF | CPU_high | 16 GB | GPU | 1.18 ± 0.75 [H] |
| TWMN-all1 | CPU_high | 16 GB | GPU | 54.35 ± 22 [m] |
| TWMN | CPU_high | 16 GB | GPU | 1.12 ± 0.72 [H] |
| TWPN | CPU_high | 16 GB | GPU | 1.29 ± 0.54 [H] |
| TWCRNx18-SHUF | CPU_high | 16 GB | GPU | 1.1 ± 0.28 [H] |
| TWCRNx18 | CPU_high | 16 GB | GPU | 54 ± 11.5 [m] |
| TWCRNx34-SHUF | CPU_high | 16 GB | GPU | 1.23 ± 0.32 [H] |
| TWCRNx34 | CPU_high | 16 GB | GPU | 1 ± 0.23 [H] |
Table 6. Summary of 5-fold cross-validation results on the test set for the datasets acquired through the mention-graph technique, i.e., each cell is the average of five train–test executions (most mean values were rounded to two decimal places, unless higher precision was needed to decide the best value; standard deviations smaller than 0.01 were omitted). The best values for each metric appear in bold.
| Model | Volcano (P / F1 / R) | Kobe (P / F1 / R) | Princess (P / F1 / R) | Beirut (P / F1 / R) | Unfiltered (P / F1 / R) |
|---|---|---|---|---|---|
| RND (p=0.5) | 0.19 / 0.28 / 0.5 | 0.29 / 0.34 / 0.5 | 0.34 / 0.4 / 0.5 | 0.17 / 0.23 / 0.5 | 0.33 / 0.4 / 0.5 |
| RND (p=π) | 0.19 / 0.21 / 0.24 | 0.29 / 0.29 / 0.34 | 0.34 / 0.38 / 0.43 | 0.17 / 0.18 / 0.22 | 0.63±0.02 / 0.64±0.01 / 0.64±0.01 |
| MLE | 0.42 / 0.42 / 0.46 | 0.31±0.02 / 0.38±0.02 / 0.56 | 0.58±0.02 / 0.6 / 0.68 | 0.22 / 0.29 / 0.44 | 0.63±0.01 / 0.63±0.01 / 0.64±0.01 |
| ALO | 0.18±0.02 / 0.26±0.03 / 0.52 | 0.18±0.06 / 0.26±0.07 / 0.51 | 0.31±0.03 / 0.38±0.04 / 0.53 | 0.09±0.02 / 0.15±0.03 / 0.51±0.02 | 0.4 / 0.54 / 0.83 |
| LT | 0.18±0.03 / 0.26±0.04 / 0.52 | 0.17±0.04 / 0.24±0.05 / 0.5 | 0.31±0.05 / 0.37±0.06 / 0.53±0.01 | 0.11±0.03 / 0.17±0.05 / 0.53±0.01 | 0.17 / 0.17 / 0.17 |
| TWMN-all1 | 0.49 / 0.45 / 0.49 | 0.31±0.02 / 0.38±0.01 / 0.59 | 0.57±0.01 / 0.57±0.01 / 0.66 | 0.21 / 0.29 / 0.53 | 0.39 / 0.46±0.02 / 0.56±0.02 |
| TWMN | 0.47 / 0.48 / **0.52** | 0.32±0.02 / 0.44±0.02 / **0.83** | 0.6±0.03 / 0.63±0.03 / 0.75±0.01 | 0.26 / 0.32 / 0.46 | 0.64 / 0.64 / 0.65 |
| TWPN | 0.51 / 0.49 / 0.5 | 0.33±0.01 / 0.45±0.01 / 0.8 | 0.63 / 0.66 / **0.78** | **0.327** / 0.36 / 0.45 | 0.68 / 0.68 / 0.68 |
| TWCRNx18-SHUF | 0.23 / 0.2 / 0.18 | 0.22 / 0.3 / 0.48±0.02 | 0.41 / 0.42 / 0.43±0.01 | 0.15 / 0.19 / 0.25±0.02 | 0.59 / 0.58 / 0.58 |
| TWCRNx18 | 0.54±0.02 / 0.51±0.01 / 0.49 | **0.38** / **0.5±0.02** / 0.75±0.07 | 0.7 / 0.73 / 0.77 | 0.32±0.02 / **0.42±0.02** / **0.62±0.02** | 0.837 / **0.836** / **0.835** |
| TWCRNx34-SHUF | 0.23 / 0.2 / 0.18 | 0.21 / 0.28±0.02 / 0.45±0.03 | 0.41 / 0.42 / 0.44 | 0.15 / 0.19 / 0.25±0.02 | 0.59±0.01 / 0.58±0.01 / 0.58±0.01 |
| TWCRNx34 | **0.55±0.01** / **0.52** / 0.49±0.01 | 0.36 / 0.49±0.01 / 0.79±0.03 | **0.71** / **0.74** / 0.777±0.01 | 0.32±0.02 / 0.419±0.01 / 0.61±0.04 | **0.838±0.01** / 0.835±0.01 / 0.832±0.02 |
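Tables 6 and 7 report precision (P), recall (R), and F1 per dataset. As a reference, a minimal implementation of these metrics for a binary prediction task (function names are ours; reading the positive class as "the user is active in the next time slice" is our interpretation):

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 over 0/1 label lists."""
    tp = sum(t == 1 and y == 1 for t, y in zip(y_true, y_pred))
    fp = sum(t == 0 and y == 1 for t, y in zip(y_true, y_pred))
    fn = sum(t == 1 and y == 0 for t, y in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

prec, rec, f1 = precision_recall_f1([1, 0, 1, 1], [1, 1, 1, 0])
print(round(prec, 3), round(rec, 3), round(f1, 3))  # 0.667 0.667 0.667
```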
Table 7. Summary of results on the test set for the datasets acquired through the follower-graph technique, i.e., each cell is the average of five train–test executions (most mean values were rounded to two decimal places, unless higher precision was needed to decide the best value; standard deviations smaller than 0.01 were omitted). The best values for each metric appear in bold.
| Model | Volcano (P / F1 / R) | Kobe (P / F1 / R) | Princess (P / F1 / R) | Beirut (P / F1 / R) | Unfiltered (P / F1 / R) |
|---|---|---|---|---|---|
| RND (p=0.5) | 0.1 / 0.16 / 0.5 | 0.1 / 0.17 / 0.5 | 0.11 / 0.18 / 0.5 | 0.09 / 0.15 / 0.5 | 0.31 / 0.39 / 0.5 |
| RND (p=π) | 0.1 / 0.1 / 0.1 | 0.1 / 0.1 / 0.11 | 0.11 / 0.12 / 0.12 | 0.09 / 0.09 / 0.1 | 0.32 / 0.4 / 0.55 |
| MLE | 0.8 / 0.81 / 0.86 | 0.73±0.03 / 0.78±0.02 / 0.88 | 0.88 / 0.89 / 0.97 | 0.59±0.01 / 0.65 / 0.79 | 0.63 / 0.64 / 0.64 |
| ALO | 0.09 / 0.16±0.01 / 0.51 | 0.08 / 0.14±0.03 / 0.5±0.01 | 0.11±0.1 / 0.18±0.02 / 0.53±0.02 | 0.07±0.01 / 0.12±0.02 / 0.54±0.01 | 0.32 / 0.47 / 0.87 |
| LT | 0.1±0.01 / 0.17±0.02 / 0.56 | 0.09±0.02 / 0.15±0.03 / 0.56 | 0.12±0.01 / 0.19±0.02 / 0.55 | 0.07±0.01 / 0.12±0.02 / 0.56±0.01 | 0.31 / 0.2 / 0.15 |
| TWMN-all1 | 0.81 / 0.81 / 0.85 | 0.74±0.02 / 0.8±0.01 / 0.93 | 0.86 / 0.86 / 0.97 | 0.6±0.02 / 0.66±0.17 / 0.79 | 0.37±0.03 / 0.44±0.02 / 0.55±0.01 |
| TWMN | 0.8 / 0.86 / 0.97 | 0.74±0.02 / 0.83±0.02 / **0.997** | 0.86 / 0.87 / 0.99 | 0.56 / 0.69 / 0.96 | 0.47 / 0.54 / 0.62 |
| TWPN | 0.8 / 0.86 / 0.97 | 0.73±0.03 / 0.82±0.02 / 0.98 | 0.86 / 0.87 / 0.99 | 0.58±0.02 / 0.71±0.02 / 0.96 | 0.69 / 0.68 / 0.68 |
| TWCRNx18-SHUF | 0.09 / 0.01 / 0.1 | 0.09 / 0.1±0.01 / 0.12±0.01 | 0.1 / 0.1 / 0.1 | 0.08 / 0.09±0.01 / 0.12±0.01 | 0.56 / 0.55±0.01 / 0.55 |
| TWCRNx18 | **0.892** / 0.927 / 0.966 | **0.785** / **0.877** / 0.995 | **0.975** / 0.983 / 0.992 | **0.71±0.01** / **0.82** / 0.97 | **0.842** / **0.83** / **0.82** |
| TWCRNx34-SHUF | 0.11 / 0.11 / 0.12 | 0.1 / 0.11 / 0.12 | 0.11 / 0.11 / 0.11 | 0.1 / 0.1±0.01 / 0.12±0.01 | 0.56 / 0.56 / 0.551 |
| TWCRNx34 | 0.891 / **0.929** / **0.97** | 0.78 / 0.876 / 0.996 | 0.974 / **0.983** / **0.993** | 0.68±0.02 / 0.8±0.01 / **0.977±0.01** | 0.84 / 0.827 / 0.81±0.01 |
Sidorov, M.; Hadar, O.; Vilenchik, D. Revisiting Information Cascades in Online Social Networks. Mathematics 2025, 13, 77. https://doi.org/10.3390/math13010077
