# Predicting the Popularity of Information on Social Platforms without Underlying Network Structure

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Empirical Data Analysis

#### 2.2. The Activation-Decay Model

#### 2.2.1. The Hill Equation and BiHill Equation

#### 2.2.2. The Activation-Decay Model

#### 2.2.3. The Algorithm for Popularity Prediction Based on Activation-Decay Model

**Step 1.** Obtain the model parameters ${K}_{a},{H}_{a},{K}_{d},{H}_{d}$ from historical data sets, as shown in Figure 2 ①–③:
- (1) Taking the generation time of each message as time zero, obtain the forward amount in every unit of time (the unit granularity is adjustable). Process the forward amounts of N messages over the period T into the data sequence t, $id$, $Q{\left(t\right)}_{id}$.
- (2) Calculate the average forward amount of these N messages over the period T, $q\left(t\right)=\frac{\sum Q{\left(t\right)}_{id}}{N}$, which yields the data sequence t, $q\left(t\right)$.
- (3)
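
Items (1) and (2) above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `bin_forwards` and `average_curve`, the use of raw timestamps as input, and the toy data are all assumptions made for the example.

```python
def bin_forwards(created_at, forward_times, unit, n_steps):
    """Count forwards per time unit, with the message's creation as time zero.

    created_at and forward_times are timestamps in seconds (assumed input
    format); unit is the granularity in seconds; n_steps covers the period T.
    """
    counts = [0] * n_steps
    for ts in forward_times:
        idx = int((ts - created_at) // unit)
        if 0 <= idx < n_steps:  # ignore forwards outside the period T
            counts[idx] += 1
    return counts


def average_curve(curves):
    """Compute q(t) = sum over messages of Q(t)_id, divided by N.

    curves maps a message id to its per-unit forward counts Q(t)_id;
    all curves are assumed to have the same length.
    """
    n_messages = len(curves)
    n_steps = len(next(iter(curves.values())))
    return [
        sum(curve[t] for curve in curves.values()) / n_messages
        for t in range(n_steps)
    ]


# Toy data (hypothetical): two messages, 60 s granularity, three time units.
curves = {
    "a": bin_forwards(100, [100, 105, 161, 220, 999], unit=60, n_steps=3),
    "b": bin_forwards(0, [10, 20, 30, 70, 130], unit=60, n_steps=3),
}
q = average_curve(curves)
```

Changing `unit` regenerates the sequence at a different granularity, which matters because the extracted granularity affects prediction accuracy.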

**Step 2.** Obtain the best parameters, $\alpha $ and $\beta $, using the training set and test set, as shown in Figure 2 ④.
- (1) The training set data are divided into two parts at the known maximum time ${T}_{known}$ (which the user can set): the $0-{T}_{known}$ part is the known information set, and the ${T}_{known}-T$ part is the information set for prediction. For example, if ${T}_{known}$ is 10 min, the data within 0–10 min are treated as known, and the rest is a test set.
- (2) Find ${Q}_{max}=\max_{0\le t\le {T}_{known}}Q\left(t\right)$ and calculate the total propagation amount of each message from Equation (11). Compare the calculated propagation amount of each message with the actual propagation amount and compute the mean absolute percentage error, $MAPE$. The values of $\alpha $ and $\beta $ that minimize $MAPE$ are the optimal parameters.

**Step 3.** Substitute the related parameters ($\alpha ,\beta ,{K}_{a},{H}_{a},{K}_{d},{H}_{d}$) into the AD algorithm to predict the propagation quantity of the information to be predicted, as shown in Figure 2 ⑤–⑦.

#### 2.3. Evaluation Metrics for the Prediction Algorithm

#### 2.3.1. APE and MAPE

#### 2.3.2. TIC

#### 2.4. Baseline Algorithm

## 3. Experimental Results

#### 3.1. Prediction of the Popularity of Information

#### 3.2. Determine the Peak ${Q}_{peak}$

#### Peak Time ${t}_{peak}$

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Data Availability Statement

## Conflicts of Interest

**Figure 1.** The average forwarding amounts of information on WeChat and Weibo display similar statistical trends over time. The upper row depicts the relationship between the average forwarding amount and time, with the horizontal axis in units of (**a**) 1 min for WeChat and (**b**) 10 s for Weibo. The lower row shows the trend of the average forwarding volume over time, starting from its peak value. The forwarding amount takes time to reach its average peak, and the rate of dissemination differs widely between platforms: information spreads faster on Weibo than on WeChat. On average, the forwarding amount per unit time of a WeChat message reaches its peak in less than 30 min (1800 s) after the message is generated, while on Weibo it takes only 200 s.

**Figure 3.** Prediction of the final forward number of messages after seven days, given the information from the known period ${T}_{known}$. The upper row shows the results on the WeChat dataset and the lower row the results on the Weibo dataset. The X-axis represents the known propagation time. The Y-axis shows the prediction accuracy, which varies with the known propagation time. The granularity of the extracted data affects the accuracy of the AD algorithm's prediction. On the WeChat dataset (upper part), the prediction reaches a relatively optimal level at a unit time of 10 min; on the Weibo dataset (lower part), it does so at a unit time of 120 s. These results indicate that the proposed AD algorithm outperforms the baseline (BS) algorithm.

**Figure 4.** APE distribution when using the initial 120 min of data to predict the number of messages forwarded in the next 7 days. The X-axis represents the number of messages forwarded in the first 120 min, and the Y-axis represents the total number of messages forwarded in 7 days. The colored bars indicate the magnitude of the APE. The upper part of the figure shows the results on the WeChat data, and the lower part shows the results on the Weibo data.

**Figure 5.** Absolute Percentage Error (APE) distribution of the algorithms on the test set. We show the median and the middle 50th, 70th, and 90th percentiles of the distribution of APE across the forwarded messages. The upper part of the figure shows the results on the WeChat data, and the lower part shows the results on the Weibo data.

**Figure 6.** The APE distribution and the MAPE and TIC indices as functions of the known period ${T}_{known}$ when predicting the final forward amount after seven days. The X-axis is the duration of the known information set, and the Y-axis is the ratio of the APE for predicting the final forward number of messages. Compared with the BS method of predicting the popularity of information, the AD method performs better on all of the measures shown (APE, MAPE, and TIC). The upper part of the figure shows the results on the WeChat data, and the lower part shows the results on the Weibo data.

**Figure 7.** MAPE of the messages as a function of the known period for the AD algorithm on the WeChat dataset. The X-axis is the duration of the known information set, and the Y-axis is the MAPE for predicting the final forward number of messages. The red line represents the messages that have reached their ${Q}_{peak}$ by ${T}_{known}$, while the blue line represents the messages that have not reached their ${Q}_{peak}$ by ${T}_{known}$. The inset shows the ratio of true and fake peaks in information propagation over the first known 120 min. The AD algorithm predicts more accurately when the ${Q}_{peak}$ of the message is known.

**Figure 8.** APE distribution of the messages for the AD algorithm on the WeChat dataset when the peak forward amount ${Q}_{peak}$ is known (left panels) and not known (right panels). The X-axis represents the number of messages forwarded in the known time ${T}_{known}$, and the Y-axis represents the total number of messages forwarded in 7 days.

