Predicting the Popularity of Information on Social Platforms without Underlying Network Structure

The ability to predict the size of information cascades in online social networks is crucial for various applications, including decision-making and viral marketing. However, traditional methods either rely on complicated time-varying features that are challenging to extract from multilingual and cross-platform content, or on network structures and properties that are often difficult to obtain. To address these issues, we conducted empirical research using data from two well-known social networking platforms, WeChat and Weibo. Our findings suggest that the information-cascading process is best described as an activate–decay dynamic process. Building on these insights, we developed an activate–decay (AD)-based algorithm that can accurately predict the long-term popularity of online content based solely on its early repost amount. We tested our algorithm using data from WeChat and Weibo, demonstrating that we could fit the evolution trend of content propagation and predict the longer-term dynamics of message forwarding from earlier data. We also discovered a close correlation between the peak forwarding amount of information and the total amount of dissemination. Finding the peak of the amount of information dissemination can significantly improve the prediction accuracy of our model. Our method also outperformed existing baseline methods for predicting the popularity of information.


Introduction
With the booming development of communication technologies and mobile services, online social networks enable billions of users worldwide to create and share information freely. Reading and reposting online content has become a significant way for individuals to communicate and express opinions [1,2]. Consequently, information dissemination plays a fundamental role in daily life and is of great economic value and practical significance [3,4]. The capacity to collect, clean, and analyze large-scale data has transformed social-network analysis and enabled scientists to conduct large-scale studies more conveniently and effectively [5][6][7][8]. The study of information spreading in social networks has become one of the core topics in computational social science [3,9,10] and network science [11,12], attracting increasing attention from fields such as sociology, physics, and computer science.
Among these topics, predicting the popularity of information on social platforms is a crucial issue that has attracted wide attention from both academic and industrial researchers in recent years [13][14][15][16][17][18][19][20]. By "popularity", we usually mean the final number of views, collections, forwards, or shares of a piece of information in a network [13], depending on the setting of each study.
First, we briefly review research progress on predicting information popularity. In one of the most classic studies, Szabo and Huberman [13] analyzed the popularity of content submitted to Digg and YouTube, where popularity means the number of votes on Digg and the number of views on YouTube, respectively. They discovered a strong linear correlation between the logarithmically transformed popularity of content in early and later time periods and proposed a log-linear model-based linear regression (LR) method to predict popularity. See more details of this method in Section 2.4. Inspired by this approach, the linear regression with degree (LR-D) model [21] was proposed to predict information popularity in a greater variety of data sets by considering the cumulative degree of the users who reshare the content. Furthermore, Bao et al. [22] found a close relationship between information popularity and the structural diversity of the social network: over time, final popularity shows a strong near-linear negative correlation with link density and a strong near-linear positive correlation with diffusion depth. Thus, the final popularity of information can be computed by linear regression with structural characteristics (LR-S).
From another viewpoint, a user who has forwarded a message may trigger another user to forward it with some probability. By considering the underlying arrival process of information, as well as the aging and reinforcement effects in the spreading process, Gao et al. [23] proposed a model named the Exponential reinforcement and Time Mapping process (PETM), which combines the reinforced Poisson process model with a power-law relaxation. Based on the theory of self-exciting point processes, Zhao et al. [21] developed the Self-Exciting Model of Information Cascades (SEISMIC) to predict the future sharing volumes of given posts on Twitter. SEISMIC requires only the timestamps of reposts and the follower counts of the reposting users.
Empirical analyses show that a handful of vital users [24] dominate the spreading of information on social networks. Taking this phenomenon into account, Gao et al. [25] proposed a mixture process to predict information popularity.
Beyond the above algorithms, a large body of recent research has addressed popularity prediction on social networks [26][27][28][29][30]. These advances shed light on applications ranging from communication, decision-making, cooperation, viral marketing, and advertising to promoting user-generated content such as blogs and scientific papers, as well as on understanding how information cascades evolve online.
However, these methods rely heavily either on complicated, time-varying features that cannot easily be extracted from multilingual and cross-platform content, or on underlying network structures or properties that are often difficult to obtain. In this article, we analyze several empirical data sets and find that the information-cascading process is best characterized as an activate-decay dynamic process. Based on these findings, we propose an activate-decay (AD)-based algorithm that predicts the long-term popularity of online content solely from its early repost amount, without requiring knowledge of the social-network structure or content properties. The results show that, using only the forwarding amount of a piece of information on WeChat within the first two hours, our method forecasts its popularity over seven days with remarkable accuracy. Furthermore, we identify a close correlation between the peak of the information-dissemination amount and the total amount of dissemination: once the peak is found, prediction accuracy improves significantly. Our method also outperforms existing baseline methods for predicting information popularity.
Following this brief introduction, the rest of this paper is structured as follows. First, we conduct empirical analyses of two data sets of information-forwarding processes on the Weibo and WeChat platforms. Our analysis describes the rise and fall of information as an activate-decay dynamic process, which provides insight for modeling and predicting information transmission. Second, we propose a model based on the (Bi)Hill equation from biochemistry, which has few parameters and can predict the popularity of information without requiring knowledge of the underlying social-network structure or content features. Finally, we perform experiments to demonstrate the effectiveness of the proposed method.

Materials and Methods
In this section, we begin with an empirical analysis of two prominent social-network platforms, WeChat and Weibo. We then use the spreading patterns observed on these platforms to develop a dynamic process that describes the rise and fall of information over time, and we use this process to predict information popularity.

Empirical Data Analysis
To begin with, we briefly introduce the datasets used. The WeChat dataset was created in a collaborative project with Tencent's WeChat department. It comprises over 90,000 news articles (political, economic, legal, military, scientific and technological, cultural and educational, sports, and social news, etc.) and their forwarding records among individuals on the WeChat social platform from 2 June to 8 June 2016. The forwarding records were collected from individuals sharing in timelines, in group chats, and through one-to-one forwarding. The data include the message id and the time t at which a message was forwarded. All forwarding records in this dataset were anonymized.
The Weibo dataset, obtained from a competition hosted by Wolong Big Data on DataCastle (https://challenge.datacastle.cn, accessed on 1 May 2023), consists of roughly 30,000 microblogs with over 17,840,000 forwarding records. Weibo is commonly referred to as the "Twitter of China". The messages in the Weibo dataset are mainly short paragraphs of at most 140 Chinese characters, with or without pictures. The dataset includes the content of the microblogs, the users who published or forwarded them, the publish and forward times t, and the follower relationships between users. In this research, we use only the ids and the publish/forward times of the microblogs.
To analyze the collective forwarding pattern of different messages, we standardize the timestamps of all forwarding records in both data sets, setting the release time of each message as t = 0. Figure 1 shows how the average amounts of information forwarded on WeChat and Weibo vary over time. The top row of the figure depicts the relationship between the average forwarding amount and time, with horizontal-axis units of (a) 1 min for WeChat and (b) 10 s for Weibo. On average, a WeChat message takes less than 30 min (1800 s) after release to reach its peak forwarding amount per unit time, while a Weibo message takes only 200 s. After the peak, the forwarding volume of all messages gradually decreases over time. Figure 1 indicates that the entire process can be divided into two stages, activation and decay: the activation stage reaches the peak very quickly, while the decay stage lasts a very long time. To capture the entire process, the x-axis of the top row of Figure 1 is plotted on a logarithmic scale. To show the rate of change in the forwarding number before and after it reaches its maximum value (i.e., the maximum forwarding volume per unit time after release), the bottom row of Figure 1 is plotted in log-log coordinates. The shapes of the curves indicate that the change in dissemination rates roughly follows a power law. News dissemination takes a little time to reach the average peak, and the dissemination rates on the two platforms differ somewhat; notably, Weibo shows faster transmission than WeChat. See Section 2.2.2 for further analysis.
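The preprocessing just described can be sketched in a few lines of Python. It shifts each message's release to t = 0, counts forwards per time unit, averages across messages, and locates the peak. This is a minimal illustration under our own assumptions: records are (message id, forward time in seconds) pairs, the release time of every message is known, and the helper names `forwarding_series`, `average_series`, and `peak_bin` are hypothetical, not part of the authors' pipeline.

```python
from collections import defaultdict


def forwarding_series(records, release_times, granularity, horizon):
    """Count forwards per time bin for each message (hypothetical helper).

    records: iterable of (message_id, forward_time) pairs, times in seconds.
    release_times: dict mapping message_id -> release time in seconds.
    granularity: bin width in seconds (e.g. 60 for WeChat, 10 for Weibo).
    horizon: number of bins to keep after release.
    Messages with no forwards inside the window do not appear in the result.
    """
    series = defaultdict(lambda: [0] * horizon)
    for mid, t in records:
        offset = t - release_times[mid]          # release time becomes t = 0
        b = int(offset // granularity)
        if 0 <= b < horizon:                     # drop records outside the window
            series[mid][b] += 1
    return dict(series)


def average_series(series):
    """Average forwarding amount q(t) per bin over all messages in `series`."""
    horizon = len(next(iter(series.values())))
    n = len(series)
    return [sum(s[b] for s in series.values()) / n for b in range(horizon)]


def peak_bin(q):
    """Index of the first bin with the maximal forwarding amount."""
    return max(range(len(q)), key=q.__getitem__)


records = [("a", 100), ("a", 130), ("a", 200), ("b", 50), ("b", 65)]
release = {"a": 100, "b": 50}
s = forwarding_series(records, release, granularity=60, horizon=5)
q = average_series(s)
print(q, peak_bin(q))
```

Changing `granularity` re-bins the same records. This is the knob that the experiments later vary (for example, 5 min bins on WeChat and 120 s bins on Weibo).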
In this research, our goal is to predict the final number of forwards of a given message. Building on the empirical analysis above, we formulate a mathematical model that captures the rise and fall of the dissemination process depicted in Figure 1. The model lets us predict the future shares of a piece of information from its sharing history. It also indicates whether a sharing cascade has passed its initial stage of rapid expansion and identifies the messages most likely to be shared extensively in the future. After cleaning and filtering the records, we divided each data set into a training set and a test set containing 75% and 25% of the messages, respectively, according to their actual release time.

Figure 1. The average forwarding amounts of information on WeChat and Weibo display similar statistical trends over time. The upper row depicts the relationship between the average forwarding amount and time, with the horizontal axis in units of (a) 1 min for WeChat and (b) 10 s for Weibo. The lower row shows the trend of the average forwarding volume from its peak value over time. News dissemination takes some time to reach the average peak, and the dissemination rates differ considerably between the social platforms: transmission on Weibo is faster than on WeChat. On average, a WeChat message takes less than 30 min (1800 s) after release to reach its peak per-unit-time forwarding amount, while a Weibo message takes only 200 s.

The Hill Equation and BiHill Equation
The Hill equation, introduced by A.V. Hill in 1910 [31], is a biochemical equation that has been widely used to analyze nonlinear quantitative drug-receptor relationships [32]. The Hill equation and its variant, the BiHill equation, can also be used to describe nonlinear transmission mathematically [33]. The Hill equation can be expressed as follows [34]:

$$\theta = \frac{[L]^{n}}{K_A^{n} + [L]^{n}}, \tag{1}$$

where θ is the fraction of the receptor protein's binding sites occupied by the ligand, [L] is the free (unbound) ligand concentration, $K_A$ is the ligand concentration at which half of the sites are occupied, and n is the Hill coefficient. The Hill coefficient describes cooperativity and measures ultrasensitivity (i.e., the steepness of the response curve). In general, n determines the cooperativity of ligand binding as follows. n > 1 indicates positively cooperative binding: once a ligand molecule is bound to the enzyme, the enzyme's affinity for other ligand molecules increases. n < 1 indicates negatively cooperative binding: once a ligand molecule is bound to the enzyme, its affinity for other ligand molecules decreases. n = 1 indicates noncooperative (completely independent) binding: the enzyme's affinity for a ligand molecule does not depend on whether another ligand molecule is already bound. We apply the Hill function to information propagation by taking it as a function of time t:

$$\mathrm{Hill}(t) = \frac{p\,t^{h}}{k^{h} + t^{h}}, \tag{2}$$

where p > 0, k > 0, and h ≠ 0 are three parameters. When h > 0, the system is in the activation regime and the curve rises; when h < 0, the system is in the inhibition regime and the curve decays. The biphasic Hill equation, abbreviated as the BiHill equation, describes a system in which activation and inhibition act at the same time.
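A small numerical check shows how the sign of h switches the time-dependent Hill function between a rising (activation) curve and a decaying (inhibition) curve. The sketch assumes the form Hill(t) = p·t^h/(k^h + t^h) for t > 0. That is our reading of the function described above; `hill` is a hypothetical helper name.

```python
def hill(t, p, k, h):
    """Assumed time-dependent Hill function p * t**h / (k**h + t**h), t > 0.

    p > 0 scales the curve and k > 0 is the time at which it equals p / 2.
    h > 0: activation, the curve rises with t.
    h < 0: inhibition, the curve decays with t.
    """
    return p * t ** h / (k ** h + t ** h)


for h in (2, -2):
    values = [round(hill(t, p=1.0, k=10.0, h=h), 3) for t in (1, 10, 100)]
    print(h, values)
```

With h = 2 the printed values increase through 0.5 at t = k, and with h = -2 they decrease through the same point. Only the sign of h changes.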
The BiHill equation is expressed as follows:

$$\mathrm{BiHill}(t) = \frac{P_m}{\left[1 + \left(K_a/t\right)^{H_a}\right]\left[1 + \left(t/K_i\right)^{H_i}\right]}, \tag{3}$$

where $P_m > 0$, $K_a > 0$, $K_i > 0$, $H_a > 0$, and $H_i > 0$ are, respectively, the maximum value, the half-maximal activating value, the half-maximal inhibitory value, the activation Hill coefficient, and the inhibitory Hill coefficient of BiHill(t). See the upper row of Figure 1. When this function is applied to information dissemination, the activation and inhibition mechanisms of the spreading process correspond directly to the mathematical meaning of the two factors in the formula.
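The BiHill equation can be evaluated directly to confirm that it rises, peaks, and then decays. The sketch below assumes the biphasic form P_m / ([1 + (K_a/t)^{H_a}][1 + (t/K_i)^{H_i}]). We believe this matches OriginLab's BiHill function, but the parameter values and the helper name `bihill` are our own illustrative choices.

```python
def bihill(t, pm, ka, ki, ha, hi):
    """Assumed biphasic Hill curve for t > 0: a rising activation factor
    times a decaying inhibition factor, scaled by pm (all parameters > 0)."""
    activation = 1.0 / (1.0 + (ka / t) ** ha)   # tends to 1 as t grows
    inhibition = 1.0 / (1.0 + (t / ki) ** hi)   # tends to 0 as t grows
    return pm * activation * inhibition


# Coarse scan for the time of the maximum on a log-spaced grid from 1 to 1e5.
grid = [10 ** (i / 20) for i in range(0, 101)]
t_best = max(grid, key=lambda t: bihill(t, pm=1.0, ka=10.0, ki=1000.0, ha=2.0, hi=1.0))
print(round(t_best, 1))
```

For these parameters, the maximum lies between the half-maximal activating value (10) and the half-maximal inhibitory value (1000). Early activation dominates before the maximum, and inhibition dominates after it.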

The Activation-Decay Model
According to the empirical analysis, the average forwarding amount of messages per unit time changes over time. At first it increases rapidly; after reaching the peak, i.e., the maximal amount, it decays slowly until it approaches 0. We define an index r(t) to measure how close information dissemination is to its peak value:

$$r(t) = \frac{q(t)}{q_{\max}}, \tag{4}$$

where $q_{\max} = \max[q(t)]$. The relationship between r(t) and t can then be written as

$$\frac{r(t)}{1 - r(t)} = \left(\frac{t}{K}\right)^{H}, \tag{5}$$

where K and H are two parameters. It follows that

$$r(t) = \frac{t^{H}}{K^{H} + t^{H}}, \tag{6}$$

which is exactly a form of the Hill equation. r(t) is a quantitative index: the greater its value, the closer the amount of propagation per unit granularity is to the peak value. Using the real social-network data, we verified that r(t) is a piecewise function whose log-log plot has a "V" shape. When H < 0, r(t) corresponds to the decaying part of the "V" in log-log coordinates; when H > 0, it corresponds to the rising part. See Figure 1.
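Because the relation r/(1 − r) = (t/K)^H becomes a straight line after taking logarithms (log(r/(1 − r)) = H·log t − H·log K), K and H can be estimated on each side of the peak by ordinary least squares. The sketch below is a minimal illustration under that assumption. The helper names are hypothetical, the synthetic series is made up, and the paper's own fitting procedure may differ.

```python
import math


def fit_k_h(ts, rs):
    """Least-squares fit of log(r / (1 - r)) = H * log(t) - H * log(K).

    Returns (K, H). Points with r <= 0 or r >= 1 (such as the peak itself,
    where r = 1) cannot be used in this form and are skipped.
    """
    xs, ys = [], []
    for t, r in zip(ts, rs):
        if t > 0 and 0.0 < r < 1.0:
            xs.append(math.log(t))
            ys.append(math.log(r / (1.0 - r)))
    if len(xs) < 2:
        raise ValueError("need at least two points with 0 < r < 1")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    h = sxy / sxx                       # slope of the line = H
    if h == 0:
        raise ValueError("flat segment: H = 0, K undefined")
    k = math.exp(mx - my / h)           # intercept my - h*mx equals -H * log(K)
    return k, h


def fit_rise_and_decay(ts, q):
    """Normalize q by its maximum (r = q / q_max), split at the peak, and fit
    (K, H) separately on the rising side and on the decaying side."""
    q_max = max(q)
    i_peak = q.index(q_max)
    rs = [v / q_max for v in q]
    rise = fit_k_h(ts[: i_peak + 1], rs[: i_peak + 1])
    decay = fit_k_h(ts[i_peak:], rs[i_peak:])
    return rise, decay


# Synthetic series: rising side follows K = 5, H = 2; decaying side K = 6, H = -2.
ts = [1, 2, 3, 4, 5, 6, 7, 8]
q = [t ** 2 / (25 + t ** 2) for t in ts[:4]] + [1.0] + [36 / (36 + t ** 2) for t in ts[5:]]
(k_rise, h_rise), (k_decay, h_decay) = fit_rise_and_decay(ts, q)
print(k_rise, h_rise, k_decay, h_decay)
```

In this convention, the decaying side yields a negative H. A decay factor written as 1/(1 + (t/K_d)^{H_d}) describes the same curve with H_d = −H > 0.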
In the process of disseminating information to a broader audience, there are often two opposing forces at play: activation and decay. Activation refers to factors that contribute to the spread or promotion of information, while decay refers to factors that inhibit or slow the spread of information. These two forces interact with each other in a dynamic and gamelike manner, influencing the ultimate outcome of the information dissemination process. This interaction between activation and decay factors can be complex and multifaceted, as various factors may contribute to the spread or inhibition of information.
We regard information dissemination as the interaction of activation and decay factors, with a game between them. Before the peak of propagation per unit granularity, activation plays the leading role; after the peak, the decay factor begins to dominate. Hence q(t) rises and then decays over time. Therefore, we define

$$F(t) = \frac{1}{1 + \left(t/K\right)^{H}}. \tag{7}$$

When H < 0, F is the activation (motivation) factor, and when H > 0, F is the decay factor. Based on the analysis above, and treating random fluctuations as an additive noise term, we construct a prediction function named the AD function, q(t) = α · q_max · (activation factor) · (decay factor) + (error term), i.e.,

$$q(t) = \alpha\, q_{\max}\, \frac{1}{1 + \left(t/K_a\right)^{-H_a}} \cdot \frac{1}{1 + \left(t/K_d\right)^{H_d}} + \beta, \tag{8}$$

where $H_a, H_d > 0$ and α and β are tuning parameters learned from historical data, with β playing the role of the additive error term. Equivalently,

$$q(t) = \frac{P_m}{\left[1 + \left(K_a/t\right)^{H_a}\right]\left[1 + \left(t/K_d\right)^{H_d}\right]} + \beta, \qquad P_m = \alpha\, q_{\max}. \tag{9}$$

Therefore, we can directly use the BiHill equation in OriginLab to fit the parameters $K_a$, $K_d$, $H_a$, $H_d$. Moving from the average propagation over all selected messages to the forwarding of each individual message, the prediction function is

$$Q(t)_{id} = \alpha\, Q_{\max}\, \frac{1}{1 + \left(K_a/t\right)^{H_a}} \cdot \frac{1}{1 + \left(t/K_d\right)^{H_d}} + \beta, \tag{10}$$

where $Q_{\max} = \max_{0 \le t \le T_{\text{known}}} Q(t)_{id}$. The total propagation amount of each message within T days is then

$$Q_{id}^{\text{total}} = \sum_{t=1}^{T} Q(t)_{id}. \tag{11}$$

Except for $Q_{\max}$, all parameters can be obtained by training on historical data; that is, we only need to know the peak value of information dissemination to predict its spread. In practice, social-network information dissemination reaches its peak quickly: within 30 min on WeChat and within 5 min on Weibo (see Figure 1).
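A minimal sketch of the per-message prediction follows. It assumes that the AD rate takes the BiHill form α·Q_max·[1 + (K_a/t)^{H_a}]^{-1}·[1 + (t/K_d)^{H_d}]^{-1} + β, and that the total over the prediction horizon is the sum of this rate over all time bins. These are our reading of the text, not the authors' code. The function names, the convention of evaluating bin b at t = b + 1 (which avoids t = 0), and the example parameter values are illustrative assumptions.

```python
def ad_rate(t, q_max, alpha, beta, ka, ha, kd, hd):
    """Assumed AD forwarding rate at time t > 0 (t in units of the granularity):
    alpha * q_max * activation(t) * decay(t) + beta."""
    activation = 1.0 / (1.0 + (ka / t) ** ha)   # rises towards 1 (ha > 0)
    decay = 1.0 / (1.0 + (t / kd) ** hd)        # falls towards 0 (hd > 0)
    return alpha * q_max * activation * decay + beta


def ad_total(known_counts, horizon, alpha, beta, ka, ha, kd, hd):
    """Predicted total forwards over `horizon` bins for one message.

    known_counts: forwards per bin observed during T_known; only its maximum
    (Q_max) enters the prediction. Bin b is evaluated at t = b + 1.
    """
    q_max = max(known_counts)
    return sum(ad_rate(t, q_max, alpha, beta, ka, ha, kd, hd)
               for t in range(1, horizon + 1))


# Illustrative only: 5 min bins, 7 days = 2016 bins, made-up parameters.
print(ad_total([3, 12, 9, 7], horizon=7 * 24 * 12,
               alpha=1.0, beta=0.0, ka=2.0, ha=2.0, kd=6.0, hd=1.2))
```

Only the maximum of the observed counts enters the prediction. For this reason, the quality of Q_max as an estimate of the true peak matters so much in the later experiments.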

The Algorithm for Popularity Prediction Based on Activation-Decay Model
Assume that we have propagation data of N messages up to time $T_{\text{known}}$ and want to predict the total propagation up to time T (T > $T_{\text{known}}$). Let id denote a message, $Q(t)_{id}$ the number of times message id is forwarded at time t, and q(t) the average amount over the N messages.
Step 1. Obtain the model parameters $K_a$, $H_a$, $K_d$, $H_d$ from historical data sets (steps 1–3 in Figure 2).
(1) Taking the generation time of each message as time zero, obtain the forwarding amount in every unit time (the unit granularity is adjustable), and organize the forwarding amounts of the N messages in the period T into the data sequence (t, id, $Q(t)_{id}$).
(2) Calculate the average amount of these N messages in the period T, which yields the data sequence (t, q(t)).
(3) Estimate the parameters $K_a$, $H_a$, $K_d$, $H_d$ from Equations (4) and (5), or obtain them directly by fitting the BiHill form in Equation (9); see Figure 1.
Step 2. Obtain the best parameters α and β using the training and test sets (step 4 in Figure 2).
(1) Divide the training-set data into two parts at the known maximum time $T_{\text{known}}$, which can be chosen freely: the part from 0 to $T_{\text{known}}$ is the known information set, and the part from $T_{\text{known}}$ to T is the information set to be predicted. For example, if 10 min of propagation data are known, the data within 0–10 min are available and the rest is held out for evaluation.
(2) Find $Q_{\max} = \max_{0 \le t \le T_{\text{known}}} Q(t)$ and calculate the total propagation amount of each message from Equation (11). Compare the calculated propagation amount of each message with the actual amount and compute the mean absolute percentage error (MAPE). The values of α and β that minimize the MAPE are the optimal parameters.
Step 3. Substitute the parameters (α, β, $K_a$, $H_a$, $K_d$, $H_d$) into the AD algorithm to predict the propagation quantity of the target information (steps 5–7 in Figure 2).
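Step 2 can be sketched as a search over candidate values of α and β that keeps the pair with the lowest MAPE on the training messages. The text does not say how the candidates are generated, so the plain Cartesian grid here is our assumption. The prediction function repeats the assumed BiHill-form AD rate (summed over bins t = 1..horizon), and the training data in the example are made up.

```python
import itertools


def predict_total(known_counts, horizon, alpha, beta, params):
    """Assumed AD prediction: sum over bins t = 1..horizon of
    alpha * Q_max / ((1 + (Ka/t)**Ha) * (1 + (t/Kd)**Hd)) + beta."""
    ka, ha, kd, hd = params
    q_max = max(known_counts)                       # Q_max over 0..T_known
    return sum(alpha * q_max / ((1 + (ka / t) ** ha) * (1 + (t / kd) ** hd)) + beta
               for t in range(1, horizon + 1))


def mape(preds, reals):
    """Mean absolute percentage error (reals must be > 0)."""
    return sum(abs(p - r) / r for p, r in zip(preds, reals)) / len(reals)


def grid_search_alpha_beta(train, params, horizon, alphas, betas):
    """train: list of (known_counts, real_total) pairs with real_total > 0.
    Returns (best_mape, alpha, beta) over the grid alphas x betas."""
    reals = [real for _, real in train]
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        preds = [predict_total(k, horizon, alpha, beta, params) for k, _ in train]
        score = mape(preds, reals)
        if best is None or score < best[0]:
            best = (score, alpha, beta)
    return best


params = (1.0, 1.0, 5.0, 1.0)       # (Ka, Ha, Kd, Hd), e.g. taken from Step 1
train = [([3, 7, 4], 60.0), ([1, 2], 20.0), ([10, 25, 18], 210.0)]
print(grid_search_alpha_beta(train, params, horizon=10,
                             alphas=[0.5, 1.0, 2.0, 4.0], betas=[0.0, 0.5, 1.0]))
```

A finer grid or a numerical optimizer could replace the grid. The grid keeps the sketch transparent and matches the "choose the α, β with minimum MAPE" rule stated in Step 2.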

Evaluation Metrics for the Prediction Algorithm
In this subsection, we briefly introduce the evaluation metrics used for the prediction algorithms.

APE and MAPE
APE (absolute percentage error) measures the relative error between the predicted value and the real value for a message in the experimental dataset. APE is defined as

$$\mathrm{APE} = \frac{\left|\hat{Q} - Q\right|}{Q}, \tag{12}$$

where $\hat{Q}$ and $Q$ are the predicted and real total forwarding amounts of a message. The lower the APE, the more accurate the prediction. MAPE (mean absolute percentage error) is the average APE over the test set and measures the overall relative error of the predictions. MAPE is defined as

$$\mathrm{MAPE} = \frac{1}{N}\sum_{id=1}^{N} \frac{\left|\hat{Q}_{id} - Q_{id}\right|}{Q_{id}}. \tag{13}$$

Likewise, the lower the MAPE, the more accurate the prediction model.
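For concreteness, both metrics can be computed directly from paired predicted and real totals. The helper names below are our own, and the example numbers are made up for illustration.

```python
def ape(pred, real):
    """Absolute percentage error of one prediction (real > 0)."""
    return abs(pred - real) / real


def mape(preds, reals):
    """Mean APE over paired predictions and real values."""
    if len(preds) != len(reals) or not reals:
        raise ValueError("need equally long, non-empty sequences")
    return sum(ape(p, r) for p, r in zip(preds, reals)) / len(reals)


print(ape(120, 100), mape([120, 90, 300], [100, 100, 250]))
```

An APE of 0.2 means the prediction is off by 20% of the real value, in either direction.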

TIC
The TIC (Theil inequality coefficient) measures the predictive ability of a model; in general, the smaller its value, the better the model predicts. The TIC is defined as

$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{N}\sum_{id=1}^{N}\left(\hat{Q}_{id} - Q_{id}\right)^{2}}}{\sqrt{\frac{1}{N}\sum_{id=1}^{N}\hat{Q}_{id}^{2}} + \sqrt{\frac{1}{N}\sum_{id=1}^{N}Q_{id}^{2}}}. \tag{14}$$

Therefore, this coefficient ranges from 0 to 1. The closer it is to 0, the smaller the root-mean-square error relative to the scale of the values. In other words, the predicted values are closer to the actual values and the model fits better.
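The sketch below computes the TIC in its common "U1" form: the root-mean-square error divided by the sum of the root-mean-square predicted and real values. We assume this is the form used, since it is the one bounded between 0 and 1. The helper name `tic` is hypothetical.

```python
import math


def tic(preds, reals):
    """Theil inequality coefficient (U1 form): RMSE divided by the sum of the
    root-mean-square predicted and real values; lies in [0, 1].
    Undefined (division by zero) if all predictions and real values are 0."""
    n = len(reals)
    rmse = math.sqrt(sum((p - r) ** 2 for p, r in zip(preds, reals)) / n)
    scale = (math.sqrt(sum(p * p for p in preds) / n)
             + math.sqrt(sum(r * r for r in reals) / n))
    return rmse / scale


print(tic([110, 95, 240], [100, 100, 250]))
```

The upper bound of 1 follows from the triangle inequality. A perfect forecast gives 0, and a forecast of all zeros against nonzero real values gives exactly 1.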

Baseline Algorithm
As discussed in the introduction, there are currently numerous ways to predict popularity, falling into three main categories: methods based on early popularity [13], on influence factors [35,36], and on cascade propagation [22,37,38]. To validate the accuracy of our prediction method, we chose a typical popularity prediction algorithm [13] as the baseline method. Its authors applied a logarithmic transformation to the popularity of online content submitted to two content-sharing portals, YouTube and Digg. They found a strong correlation between early and later popularity and used this relationship to predict the future popularity of messages.
where $N_s(t)$ is the popularity of message s at time t, $t_1$ and $t_2$ are two arbitrary points in time with $t_2 > t_1$, and η(τ) denotes independent values drawn from a fixed probability distribution.
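The core idea of the baseline can be sketched as follows. On the log scale, later popularity is earlier popularity plus an offset, and that offset is estimated from training messages. This is a simplified sketch, not a reproduction of the estimator in [13]: we fix the slope at 1 and omit any refinements the original method may include. The helper names are hypothetical and the training pairs are made up.

```python
import math


def fit_log_linear(pairs):
    """pairs: (N(t1), N(t2)) with both > 0 from training messages.

    Fits ln N(t2) = ln N(t1) + b by least squares with the slope fixed at 1,
    which gives b = mean of (ln N(t2) - ln N(t1)).
    """
    diffs = [math.log(n2) - math.log(n1) for n1, n2 in pairs]
    return sum(diffs) / len(diffs)


def predict_log_linear(n1, b):
    """Predicted popularity at t2 from the popularity n1 observed at t1."""
    return n1 * math.exp(b)


b = fit_log_linear([(10, 20), (5, 10), (40, 80)])
print(predict_log_linear(7, b))
```

Because the offset is additive in log space, the baseline scales early popularity by a constant factor. Unlike the AD algorithm, it ignores the shape of the activation-decay curve.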

Experimental Results
This section presents the performance of the prediction models. Using three error metrics, APE, MAPE, and TIC, we evaluate both the AD algorithm and the baseline algorithm on the WeChat and Weibo data.

Prediction of the Popularity of Information
In Figure 3, we compare the performance of the AD algorithm and the baseline algorithm (denoted the BS algorithm) on the WeChat (N = 31,247 messages) and Weibo (N = 25,467 messages) social networks. We can draw the following conclusions. (1) For the AD algorithm, accuracy increases with coarser granularity within a certain range, but it does not keep improving as the granularity grows further. As the figure shows, the best value on the WeChat data is obtained at a granularity of 5 min, and on Weibo at 120 s. (2) As the known time series ($T_{\text{known}}$) grows, both algorithms perform better. On the WeChat data, the AD algorithm outperforms the BS algorithm in both MAPE and TIC. On the Weibo data, AD outperforms BS in MAPE at every granularity. In TIC, the AD algorithm does not outperform the BS algorithm at granularities of 30 s or 60 s, but it begins to show its advantage at 120 s. (3) At every granularity, accuracy improves as the known propagation time increases. A likely reason is that some messages reach their peak only after a long time. If the known period is short, the true peak has not yet appeared when the statistics are computed, which reduces accuracy.
In Figure 4, we compare the predictive performance of the AD algorithm and the baseline algorithm. The AD algorithm maintains high prediction accuracy over a wider range. The red area represents the smallest error (less than 0.2). The AD algorithm predicts the future forwarding amount accurately when the known forwarding amount ranges from about 1 to 10,000. In contrast, the BS algorithm reaches this standard only when the known forwarding amount lies in a narrower range (about 50–3000). Whether or not a message later becomes popular, the AD algorithm gives more accurate predictions. This indicates that the AD algorithm is more flexible and robust, and that its prediction performance depends less on the amount of known information.

We run the AD and BS algorithms on the test set and compute the APE as a function of time. Figure 5 plots quantiles of the APE distribution of the AD algorithm. The AD method shows a clear improvement over the baseline. Taking the upper panel (WeChat data) as an example, the APE of both algorithms stabilizes after 30 min. After the cascade has been observed for 20 min, the 90th, 70th, and 50th percentiles of the AD algorithm's APE are below 75.6%, 54.2%, and 37.8%, respectively. That is, after 20 min the APE is below 37.8% for 50% of the messages and below 75.6% for 90% of the messages. After 30 min, the error stabilizes: the APE for 90%, 70%, and 50% of the messages drops to 73.8%, 53%, and 36.8%, respectively. Moreover, the positions of the shaded bands in the figure indicate that the AD algorithm is more accurate than the BS algorithm.

To present the errors more comprehensively, Figure 6 plots the MAPE, the TIC, and the distribution of APE as functions of the known information-forwarding time. The larger the blue area, the better the algorithm's predictions. Again, the AD algorithm gives considerably more accurate predictions than the baseline algorithm on every measure.

Determining the Peak $Q_{\text{peak}}$
In our AD algorithm, $Q_{\max}$ is a key variable. During implementation, we found that the prediction accuracy of the AD algorithm improves greatly when $Q_{\max}$ equals the peak value $Q_{\text{peak}}$ of the information-forwarding process ($Q_{\max} = Q_{\text{peak}}$), as shown in Figure 7. $Q_{\text{peak}}$ is the maximum of the information-forwarding time series over the whole life cycle, whereas $Q_{\max}$ is the maximum of that series within the known period $T_{\text{known}}$. We use the amount forwarded within $T_{\text{known}}$ to predict the total amount forwarded over the life cycle (7 days in this paper). The experimental results show that whether $Q_{\text{peak}}$ occurs within the known time $T_{\text{known}}$ directly affects prediction accuracy.

Peak Time $t_{\text{peak}}$
By peak time $t_{\text{peak}}$ we mean the time at which the popularity per unit time reaches its highest value $Q_{\text{peak}}$ after the popularity evolution starts. Thus $Q_{\max} = Q_{\text{peak}}$ whenever $t_{\text{peak}} < T_{\text{known}}$. The longer the known time $T_{\text{known}}$, the more likely it is that the real peak $Q_{\text{peak}}$ has already appeared, and the more accurate the prediction. In Figure 7, MAPE_realpeak refers to messages whose $Q_{\text{peak}}$ emerged within the known period ($t_{\text{peak}} < T_{\text{known}} = 120$ min), i.e., $Q_{\max} = Q_{\text{peak}}$; we term this the real peak (red dots in Figure 7). MAPE_fakepeak refers to messages with $t_{\text{peak}} > T_{\text{known}} = 120$ min. For these messages, $Q_{\text{peak}}$ did not emerge within the known 120 min, so $Q_{\max} < Q_{\text{peak}}$. Predicting with this smaller $Q_{\max}$ is evidently less accurate than when $Q_{\max} = Q_{\text{peak}}$ (blue dots in Figure 7). The actual forecast result combines these two cases, as shown by the green dots in Figure 7. Consequently, the most crucial issue for our future work is how to determine or forecast $Q_{\text{peak}}$. Using the first 120 min of spread data, predicting the final counts with the real peak $Q_{\text{peak}}$ achieves a MAPE of 0.27, compared with 0.35 for the fake peak.

Figure 7. MAPE of the messages as a function of the known information time in the AD algorithm on the WeChat dataset. The X-axis is the time of the known information set, and the Y-axis is the MAPE for predicting the final forwarding number of messages. The red line represents messages that have reached their $Q_{\text{peak}}$ by $T_{\text{known}}$, while the blue line represents messages that have not. The inset shows the ratio of true to fake peaks in information propagation over the first 120 min. The AD algorithm predicts more accurately when the $Q_{\text{peak}}$ of a message is known.
To assess the impact of $Q_{\text{peak}}$ on the prediction outcomes more directly, we partition the dataset into two portions, $t_{\text{peak}} < T_{\text{known}}$ and $t_{\text{peak}} > T_{\text{known}}$, with $T_{\text{known}} = 40$ min. This value was chosen because, on WeChat Official Accounts, a message takes less than 30 min on average after release to reach its per-unit-time peak (see Figure 1). In Figure 8, the left panels show messages that have reached their peak, i.e., $Q_{\max} = Q_{\text{peak}}$ ($t_{\text{peak}} < T_{\text{known}}$). Points with APE < 0.4 account for 70.7% of the total, and their final retweet counts range from $10^3$ to $10^5$ (Y-axis). In contrast, the right panels show messages that have not reached their peak, i.e., $Q_{\max} < Q_{\text{peak}}$ ($t_{\text{peak}} > T_{\text{known}}$). Points with APE < 0.4 account for 65.1% of the total, and the final forwarding volume ranges only from $10^3$ to $10^4$ (Y-axis). This demonstrates that $Q_{\text{peak}}$ has a considerable influence on the range of final forwarding amounts. Determining the peak $Q_{\text{peak}}$ may therefore not only broaden the range over which information popularity can be forecast but also considerably improve its predictability.

Figure 8. APE distribution of the messages in the AD algorithm on the WeChat dataset when the peak forwarding amount $Q_{\text{peak}}$ is known (left panels) and not known (right panels). The X-axis represents the number of forwards within the known time $T_{\text{known}}$, and the Y-axis represents the total number of forwards over 7 days.
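The real-peak versus fake-peak partition can be reproduced offline once each message's full lifetime series is available. A message has a real peak when its first lifetime maximum falls inside the known window (t_peak < T_known). In that case Q_max = Q_peak. The sketch below groups per-message APE values accordingly and reports the share below 0.4. The helper names and the toy data are hypothetical. Because it uses the full lifetime series, the check is only possible at evaluation time, not during prediction.

```python
def t_peak(series):
    """First time bin at which the lifetime forwarding series reaches Q_peak."""
    return max(range(len(series)), key=series.__getitem__)


def split_by_peak(messages, t_known):
    """messages: list of (lifetime_series, ape). Messages whose peak falls
    inside the known window (t_peak < t_known, so Q_max = Q_peak) go to
    `real`; the others go to `fake`."""
    real, fake = [], []
    for series, err in messages:
        (real if t_peak(series) < t_known else fake).append(err)
    return real, fake


def share_below(errors, threshold=0.4):
    """Fraction of messages with APE below `threshold`."""
    return sum(e < threshold for e in errors) / len(errors)


msgs = [([1, 5, 2, 1], 0.2), ([1, 2, 3, 9], 0.5), ([4, 1, 1, 1], 0.3)]
real, fake = split_by_peak(msgs, t_known=2)
print(share_below(real), share_below(fake))
```

With ties, `max` returns the first maximal bin. This is the case in which the known-window maximum already equals the lifetime peak.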

Conclusions
The spread of information, ideas, innovations, influence, behaviors, and styles within social networks is ubiquitous [7,8]. Predicting the popularity of information on social platforms has recently become a popular research topic [28,30]. Nonetheless, most current methods either depend heavily on intricate, time-varying features that are difficult to extract from multilingual and cross-platform content, or rely on network structures or properties that are often challenging to obtain. In this paper, we analyzed several empirical data sets and found that the information-cascading process is best characterized as an activate-decay dynamic process. We then introduced the activate-decay (AD)-based algorithm, which predicts the long-term forwarding amount of information without requiring knowledge of the social-network structure or content features. Instead, the AD algorithm uses only limited information, namely the amount of information forwarded within a short initial interval (e.g., 30 min for WeChat and 3 min for Weibo), to accurately predict the total forwarding amount over several days.
The AD algorithm is a straightforward and practical approach for predicting information popularity, and it outperforms the baseline algorithm in accuracy. However, a challenge remains in determining the actual peak forwarding amount. To address this challenge, we take the maximum propagation amount per unit time observed in the known data, denoted $Q_{\max}$, as the peak value. Nonetheless, we find that identifying the genuine peak forwarding value can further improve prediction accuracy, as illustrated in Figure 7. We therefore plan to focus on this issue in future research.

Funding: This research was funded by the STI 2030-Major Projects (2022ZD0211400), the China Postdoctoral Science Foundation (2022M710620), the Sichuan Science and Technology Program (2023NSFSC1353), the Project of Huzhou Science and Technology Bureau (2021YZ12), and the UESTCYDRI research start-up (U032200117). This work has been partially supported by the New Cornerstone Science Foundation through XPLORER PRIZE.