Influence Maximization Based on Snapshot Prediction in Dynamic Online Social Networks

With the vigorous development of the mobile Internet, online social networks have greatly changed the way people live. As an important branch of online social network research, influence maximization refers to finding K nodes in the network that form the most influential seed set, which is an abstract model of viral marketing. Most current research is based on static network structures and ignores the fact that network structures change over time, which reduces the effectiveness of the selected seed nodes in dynamic online social networks. To address this problem, we propose a novel framework called Influence Maximization based on Prediction and Replacement (IMPR). The framework first uses historical network snapshots to predict the upcoming network snapshot and then mines seed nodes suitable for the dynamic network from the predicted result. To improve computational efficiency, the framework also adopts a fast replacement algorithm to update the seed nodes between snapshots. Our scheme has four advantages. First, we extend the classic influence maximization problem to dynamic online social networks and give a formal definition of the problem. Second, we propose a new framework for this problem and give a theoretical proof of the solution. Third, other classical influence maximization algorithms can be embedded into our framework to improve accuracy. Fourth, to reveal the performance of the scheme, we carried out a series of experiments under different settings on real dynamic online social network datasets, and the experimental results are very promising.


Introduction
With the popularization of the mobile Internet and the vigorous development of new media, online social networks have changed many aspects of human daily life. People can carry out a series of activities in online social networks, such as sharing ideas, communicating, receiving news, establishing friendships, and so on. Mass users and real-time information spreading make online social networks a new carrier of information diffusion. More and more companies are beginning to use online social networks to market their products. This trend has attracted the interest of researchers in many different areas. Understanding the information diffusion process in social networks is beneficial to reveal the structure of human society and influence the strategies for marketing products.
Viral marketing based on the word-of-mouth effect is an important application of online social networks. This marketing pattern can be abstractly described as an influence maximization problem, which is an indispensable branch of social network analysis [1]. The problem of influence maximization is to select a small group of seed nodes in an online social network to maximize their influence on other nodes in the network. It has been proved that influence maximization is NP-hard under the linear threshold and independent cascade models.
There has been a lot of research around influence maximization. In these studies, an online social network is usually regarded as a graph in which nodes represent users and edges represent the relationships between users. The researchers analyze the process of influence diffusion based on the graphs and then use the greedy algorithm or heuristic algorithm to find the most influential seed set. However, we found that the online social network structure generally remained unchanged in their study. In reality, the structure of online social networks is constantly changing over time, which is an important characteristic of online social networks. For example, in Twitter, a user follows a singer, and after a while, he may no longer like the singer, unfollows the singer, and then follows another singer. Once the network structure changes, the influence of users also changes, and individuals are more inclined to be influenced by people who are closely related to them. Therefore, using static social networks to study influence maximization in dynamic social networks eventually leads to finding suboptimal seeds. Some studies have considered the dynamic characteristics of networks, but none of them have perfectly solved the problems of performance and efficiency.
In dynamic networks, snapshots can be used to record the topology of the network at different times. To select the most influential seed nodes over the whole process, we need to find the optimal solution in each snapshot, because the influence of seed nodes changes as the network structure changes. To facilitate understanding of the problem in dynamic social networks, we illustrate this concept with an example. Figure 1 shows snapshots of an online social network at different timestamps.
$G_t$ represents the snapshot at time $t$. This network contains 4 users, the connections between users are represented by edges, and two connected users can influence each other. It is easy to see that the network structure changes over time. At $t = 0$, the most influential user is user 1; as the network structure changes, the most influential user becomes user 4 at $t = 2$ and user 2 at $t = 3$. This shows the importance of dynamic changes to the network. Therefore, to select the most influential seed set in a dynamic social network, we need to mine seed nodes from each snapshot. Ignoring changes between network snapshots may lead to poor results. In response to this problem, we propose a new framework called Influence Maximization based on Prediction and Replacement (IMPR). The framework first predicts the upcoming network topology from the previous network snapshots and then uses the prediction result to mine the seed nodes. For example, in Figure 1, $G_0$ and $G_1$ are used to predict $G_2$, the prediction result is $G'_2$, and the seed nodes at time $t = 2$ are calculated from $G'_2$. In addition, to improve computational efficiency, we adopt a fast replacement algorithm to mine the seed set under the new snapshot.
In short, the contributions of this paper are fourfold. First, we extend the classic influence maximization problem to dynamic online social networks and give a formal definition of the problem. Second, we propose a new framework for this problem and give a theoretical proof of the solution. Third, the accuracy of traditional methods can be improved by embedding them in our proposed framework. Finally, a series of experiments with different specifications and settings were conducted on real dynamic online social network datasets to examine the advantages of the framework, and the results are very promising.
The organization of this paper is as follows. We summarize the literature related to influence maximization in dynamic online social networks in Section 2. In Section 3, we give a formal definition of the problem and introduce the proposed framework in detail. In Section 4, we conduct a series of experiments based on real online social network data sets to reveal the performance of our framework. Finally, conclusions and future work are presented in Section 5.

Related Work
The study of influence maximization was first proposed in 2001 by Domingos and Richardson [2]. Building on this research, Kempe et al. [3] formulated the problem as a discrete optimization problem, which was a milestone for influence maximization research. They defined the problem as mining K seed nodes that maximize the spread of influence in an online social network under a given diffusion model. They also proved that the influence maximization problem is NP-hard when the information propagation model is the Independent Cascade or the Linear Threshold model. To solve the problem, they proposed a greedy algorithm that guarantees an approximation ratio of 1 − 1/e − ε [4]. Sviridenko [5] extended this greedy framework with a non-uniform cost function. Because the greedy algorithm involves a large number of Monte Carlo simulations, many researchers have improved it to reduce the computational complexity. Leskovec et al. [6] and Goyal et al. [7] proposed the Cost-Effective Lazy Forward scheme (CELF) and CELF++, respectively, which use the submodularity of the influence function to reduce the number of Monte Carlo simulations for each seed node selection. Estevez et al. [8] discarded the overlapping part with the neighbors of the seed node when selecting the seed node, and this method is called the Set Covering Greedy algorithm (SCG). Chen et al. [9] removed those edges that could not successfully propagate information in the iterative process and proposed a new algorithm called New Greedy-IC. Following these, Zhou et al. [10] found an upper bound on the marginal gain of node influence diffusion in the influence function and proposed an Upper Bound based Lazy Forward algorithm (UBLF). UBLF shortens the computation time and achieves similar accuracy to the greedy algorithm.
In addition, the improved methods based on greedy algorithm include cascade discount algorithm (CD), influence maximization based on learning automata algorithm (IMLA), hybrid potential-influence greedy algorithm (HPG), and so on [11][12][13].
Although the greedy algorithm is very accurate, computational complexity remains a huge challenge when the network scales up, because Monte Carlo simulation is time-consuming. Therefore, some researchers have turned to heuristic algorithms to solve this problem. Chen et al. [9] studied the relationship between node influence and node degree and proposed the Degree-Discount algorithm. This algorithm greatly reduces the computational time complexity but sacrifices some accuracy. Khomanmi et al. [14] considered the influence of community structure on propagation and proposed a fast and scalable algorithm called Community Finding Influential Node (CFIN). Kundu et al. [15] proposed the diffusion degree of a node, which represents the influence of a node on other nodes, and used this centrality measure to select seed nodes. Kim et al. [16] proposed the Independent Path Algorithm (IPA), which uses independent influence paths to evaluate the influence of nodes. Apart from this, there are other heuristics based on influence paths, such as Influence Maximization Shortest Path (IMSP) [17], SIMPATH [18], and LDAG [19]. To support computation on large-scale networks, Tang et al. [20] proposed the TIM algorithm, which can significantly increase computation speed without compromising performance. Furthermore, there are many other heuristics, such as CGA [21], ACO-IM [22], IRIE [23], and others [24,25]. In summary, classical influence maximization algorithms are mainly divided into two categories: greedy algorithms and heuristic algorithms. The greedy algorithm has high precision but is computationally time-consuming, while the heuristic algorithm is efficient but sacrifices some precision. Most importantly, these studies are based on static networks.
Recently, some researchers have begun to study influence maximization in dynamic networks. Currently, these studies can be divided into two categories. The first mainly considers the dynamics of the information dissemination process, such as dynamic activation probability, dynamic threshold, and dynamic perception. The second considers the dynamics of the network topology, where edges are added or removed over time. Hao et al. [26] considered the dynamic changes in the propagation process and proposed two models to solve the influence maximization problem in dynamic networks. In the first model, the activation probability between two individuals depends on previous activation trials. The second is the dynamic variable threshold model, which argues that an individual's activation threshold can change according to the individual's attitude toward information. Considering user preferences and social influence, Teng et al. [27] used the knowledge graph to capture the dynamic perception of users, proposed a new problem of maximizing influence based on dynamic personal perception, and gave an approximate solution. Ge et al. [28] considered the dynamic changes of user interests in online social networks. Additionally, Li et al. [29] explored the dynamics of propagation and the influence of local aggregation factors on influence diffusion, and proposed a dynamic influence maximization algorithm based on cohesive entropy. This type of research [26][27][28][29][30][31] focuses on the dynamics of propagation. The influence between individuals in the propagation process is dynamically variable, but the network topology remains fixed.
This paper focuses on the influence maximization problem when the network topology changes dynamically. In response to this situation, to quantify the influence between two nodes in a dynamic network, Wang et al. [32] proposed a dynamic factor graph model (DFG) to calculate the dynamic influence of nodes. Agarwal et al. [33] studied the interaction patterns of users in dynamic social networks and proposed a globally optimized forward trace approach to mine key nodes in the propagation. Considering the situation that the influence between users changes with time and the network topology remains unchanged, Rodriguez et al. [34] proposed the continuous-time influence maximization problem and gave an approximate solution method. Moreover, Peng et al. [35] studied the influence maximization problem when social networks expand over time, and they proposed an adaptive sampling method to transform the influence maximization problem into a MAX − K coverage problem.
In addition, there are other studies based on dynamic networks, some of which are similar to ours. Meng et al. [36] studied the diffusion mode of information in multiple networks and proposed the influence maximization problem of dynamic multisocial networks based on common friends. They combined multiple social networks into a dynamic network to study the influence maximization problem. Song et al. [37] studied the problem of tracking the most influential node sets in dynamic social networks and proposed an Upper Bound Interchange Greedy algorithm (UBIG). UBIG updates the seed set under different snapshots by calculating the difference between network snapshots with different timestamps. Wang et al. [38] defined a stream influence maximization (SIM) problem and proposed a sliding window model to maintain a set of k seeds that have the largest influence over the most recent social behaviors. Jia et al. [39] proposed a community-based influence maximization (CIM) algorithm to solve the problem in dynamic networks. CIM first divides the network into communities, then calculates the candidate seed nodes in each community after updating the network structure, and finally selects the k most influential nodes from the candidate seed nodes. However, these studies ignore that the network topology is updated in real time in dynamic online social networks. When seeds are mined from existing network snapshots or update operations, the resulting seed set may not be optimal for the current network, and there is a lag between the seed set and the current network changes. Therefore, this issue still leaves considerable room for research.
In this paper, since the dynamic evolution of online social networks is continuous, we used historical network snapshots to predict the network topology at the next moment and then mined the seed nodes on the prediction result. Our goal was to maximize the influence of the seed set on the current network and weaken the impact of network changes on the results. To predict the structural changes in online social networks, we employed the link prediction technique in this paper. Methods for link prediction can be divided into three categories, including learning-based, probabilistic models, and similarity-based models [40]. There are three types of measures commonly used in similarity-based methods, including local, global, and quasi-local similarity measures. The local similarity index mainly utilizes local neighborhood information. The global similarity index is calculated based on the topology information of the entire network. Global similarity indices contain more information about the entire structure, but they are more complex to compute than local similarity indices. The quasi-local similarity index combines these two similarity measures and aims to find a balance between local and global. In this paper, we fused three similarity indexes to construct the feature vectors of edges in the network and then used a neural network to construct a prediction model.

Methodology
This section is mainly divided into three parts. First, we give a formal definition of the influence maximization problem in dynamic online social networks. Next, we introduce the computational framework proposed in this paper. Finally, we give a theoretical proof of the solution.

Preliminaries
An online social network is usually represented by a graph G = (V, E), where the node set V represents the user set, |V| = N indicates that there are N users, and the edge set E represents the relationship between different users. Information propagates along the edges in the network.
The classic influence maximization problem can be defined as an optimization problem in which the network topology is static. That is, given an online social network $G = (V, E)$ and an information diffusion model $M$ that simulates how information spreads in the network, the problem is to select $K$ nodes from $V$ as seed nodes such that the number of affected nodes is maximized when the propagation process under $M$ in $G$ ends. Let $S$ denote the set of seed nodes and $R(S)$ the number of nodes affected by the seed nodes. Formally, the classical influence maximization problem can be defined as follows:
$$S^{*} = \arg\max_{S \subseteq V,\ |S| = K} R(S)$$
In a dynamic social network, the network topology is constantly changing, and network snapshots can be used to record the updates. In this study, we only consider changes of edges over time while the nodes remain unchanged, so we denote the network snapshot at time $t$ by $G_t(V, E_t)$, where $V$ is the set of nodes and $E_t$ is the set of edges in the network at timestamp $t$. Since the network topology is constantly changing, the seed set $S_t$ at time $t$ will also change constantly. Referring to the classical definition, influence maximization in dynamic online social networks can be defined as follows: given the snapshot $G_t$, a diffusion model $M$, and a seed set size $K$, find the seed set $S_t$ whose influence in $G_t$ is the largest. Let $R_t(S_t)$ be the number of nodes affected by the seed nodes in the network based on $M$ at time $t$. The formal expression is as follows:
$$S_t^{*} = \arg\max_{S_t \subseteq V,\ |S_t| = K} R_t(S_t)$$
In this paper, we adopt the Independent Cascade model as the information diffusion model. In the Independent Cascade model, each edge in the network is assigned an independent probability $p$, which represents the strength of the influence between adjacent nodes. If a node is activated, it has only one chance to activate each of its inactive neighbor nodes. Additionally, once a node is activated, it remains activated throughout the process.
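The Independent Cascade process and the influence function $R(S)$ can be illustrated with a minimal Python sketch. It assumes a uniform activation probability `p` on every edge and an adjacency-list dictionary; the function names (`simulate_ic`, `estimate_spread`, `greedy_seeds`) and the toy graph are hypothetical and chosen only for illustration. `greedy_seeds` is a plain hill-climbing loop in the spirit of the greedy algorithm discussed in Related Work, not the implementation used in this paper.

```python
import random

def simulate_ic(adj, seeds, p, rng):
    """Run one Independent Cascade process and return the activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                # A newly activated node gets a single chance per inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_spread(adj, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of R(S): mean number of activated nodes."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(adj, seeds, p, rng)) for _ in range(runs)) / runs

def greedy_seeds(adj, k, p, runs=200):
    """Hill-climbing: repeatedly add the node with the largest estimated spread."""
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    chosen = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen))
        best = max(candidates,
                   key=lambda v: estimate_spread(adj, chosen + [v], p, runs))
        chosen.append(best)
    return chosen

# Toy undirected network with two components (illustrative data only).
toy_adj = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3], 5: [6], 6: [5]}
print(estimate_spread(toy_adj, [1], p=1.0, runs=10))  # p = 1 makes the cascade deterministic
print(greedy_seeds(toy_adj, k=2, p=1.0, runs=5))
```

With `p = 1.0` the cascade reduces to reachability, which makes the sketch easy to check by hand; realistic settings use small probabilities and many more simulation runs, which is exactly why the Monte Carlo step dominates the running time.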

Proposed Method
Analyzing the evolution of a dynamic online social network, we find that if the most influential seed set $S_t$ is mined based on $G_t$, then $S_t$ may become less effective in practice. This is because the network is constantly updated and it takes time to calculate $S_t$. When the computation of $S_t$ is done, the network may have evolved to $G_{t+\sigma}$, where we assume that the computation time of $S_t$ is $f(S_t)$, the time interval between adjacent snapshots is $d$, and $f(S_t) \le \sigma < d$. To avoid this problem, we propose a novel framework, Influence Maximization based on Prediction and Replacement (IMPR), which first predicts the upcoming network snapshot based on historical snapshots and then mines seed nodes on the predicted result. The obtained seed nodes are applied to the latest network as the most influential nodes. This is a near real-time scheme that improves the matching between seed nodes and the dynamic network.

Predict Upcoming Network Snapshot
Predicting the upcoming network topology becomes a link prediction problem when only considering the dynamic changes of links in dynamic online social networks. We can solve this problem with machine learning methods. IMPR uses a neural network algorithm (NN) for link prediction. The structure of the neural network is shown in Figure 2. This algorithm uses non-linear activation functions and multiple hidden layers to model complex patterns of edges in dynamic online social networks.
[Figure 2: structure of the neural network, with input layer, hidden layers, and output layer]
The IMPR framework uses a feature fusion algorithm that fuses different similarity measures together to generate a feature vector, which is then passed to the input layer of the neural network.
The local similarity indices used in the feature vector generation process include the Adamic-Adar index (AA), Common Neighbors (CN), Preferential Attachment (PA), and the Jaccard Coefficient (JC). The AA index $S_{AA}$ measures the similarity between two entities based on their shared features. Let $N(\alpha)$ and $N(\beta)$ denote the neighbor sets of nodes $\alpha$ and $\beta$, respectively, and let $d_\gamma$ denote the degree of node $\gamma$. The Adamic-Adar index can be expressed as:
$$S_{AA}(\alpha, \beta) = \sum_{\gamma \in N(\alpha) \cap N(\beta)} \frac{1}{\log d_\gamma}$$
The CN index $S_{CN}$ between two nodes is the size of the intersection of their neighbor sets, which is defined as follows:
$$S_{CN}(\alpha, \beta) = |N(\alpha) \cap N(\beta)|$$
The JC index $S_{JC}$ is similar to common neighbors. It normalizes the number of common neighbors and can be defined as:
$$S_{JC}(\alpha, \beta) = \frac{|N(\alpha) \cap N(\beta)|}{|N(\alpha) \cup N(\beta)|}$$
The preferential attachment property was first used in network generation models. The PA index $S_{PA}$ between nodes $\alpha$ and $\beta$ is defined as:
$$S_{PA}(\alpha, \beta) = d_\alpha \cdot d_\beta$$
The global similarity indices usually contain more complete topological information about the network. The global similarity indices adopted in IMPR include cosine based on $L^{+}$ (Cos$^{+}$), Shortest Path (SP), Average Commute Time (ACT), and Matrix Forest index (MF).
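The four local indices (AA, CN, JC, PA) introduced above can be computed directly from neighbor sets. The following is a minimal sketch for an undirected, unweighted graph; the helper names `build_neighbors` and `local_indices` and the example edge list are hypothetical, and the natural logarithm is assumed for the AA index.

```python
import math

def build_neighbors(edges):
    """Build neighbor sets N(.) from an undirected edge list."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def local_indices(nbrs, a, b):
    """Return the CN, JC, PA and AA scores for the node pair (a, b)."""
    na, nb = nbrs.get(a, set()), nbrs.get(b, set())
    common = na & nb
    union = na | nb
    return {
        "CN": len(common),
        "JC": len(common) / len(union) if union else 0.0,
        "PA": len(na) * len(nb),
        # Skip degree-1 nodes so that log(d) is never zero.
        "AA": sum(1.0 / math.log(len(nbrs[g]))
                  for g in common if len(nbrs[g]) > 1),
    }

# Illustrative data only: a small graph on four nodes.
example_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (2, 4)]
example_nbrs = build_neighbors(example_edges)
print(local_indices(example_nbrs, 1, 4))
```

Nodes 1 and 4 are not connected but share both of their neighbors (2 and 3), so all four scores are comparatively high for this pair, which is the signal a link predictor looks for.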
Let $L$ denote the Laplacian matrix of the network, which is widely used in graph theory as an alternative representation for graphs. $L^{+}$ denotes the Moore-Penrose pseudo-inverse of $L$. Each entry of $L^{+}$ can be used to represent the similarity score between the two corresponding nodes. Therefore, the Cos$^{+}$ index $S_{Cos^{+}}$ between nodes $\alpha$ and $\beta$ can be expressed as follows:
$$S_{Cos^{+}}(\alpha, \beta) = \frac{L^{+}_{\alpha\beta}}{\sqrt{L^{+}_{\alpha\alpha}\, L^{+}_{\beta\beta}}}$$
The SP index $S_{SP}$ represents the shortest distance from one node to another in the network. It is defined in terms of $D(\alpha, \beta)$, the shortest distance between nodes $\alpha$ and $\beta$ calculated using the Dijkstra algorithm [41].
The ACT index is based on the concept of random walk. The ACT similarity index $S_{ACT}$ between nodes $\alpha$ and $\beta$ is defined as the average number of steps required by a random walker to go from the start node $\alpha$ to the target node $\beta$ and back to $\alpha$. If $s(\alpha, \beta)$ is the average number of steps required to get from $\alpha$ to $\beta$, the following formula captures this concept:
$$S_{ACT}(\alpha, \beta) = s(\alpha, \beta) + s(\beta, \alpha)$$
Mathematics 2022, 10, 1341
The MF index $S_{MF}$ is based on the concept of spanning trees. The similarity between nodes $\alpha$ and $\beta$ can be calculated with the following formula:
$$S_{MF}(\alpha, \beta) = \left[(I + L)^{-1}\right]_{\alpha\beta}$$
where $(I + L)_{(\alpha,\beta)}$ represents the number of spanning trees rooted at node $\alpha$ and containing both nodes $\alpha$ and $\beta$.
The quasi-local indices are a trade-off between global and local metrics. These metrics are computationally more efficient than global metrics. The quasi-local indices used by IMPR are Path of Length 3 (L3) and Local Path Index (LP).
The L3 index was first used in protein-protein interaction networks. The L3 similarity index $S_{L3}(\alpha, \beta)$ between nodes $\alpha$ and $\beta$ is defined as:
$$S_{L3}(\alpha, \beta) = \sum_{\mu, \nu} \frac{a_{\alpha,\mu}\, a_{\mu,\nu}\, a_{\nu,\beta}}{\sqrt{d_\mu d_\nu}}$$
where $a_{\alpha,\mu}$ represents the interaction strength between node $\alpha$ and node $\mu$, and $d_\mu$ is the degree of node $\mu$.
The LP index $S_{LP}$ is a local path-based metric that trades off accuracy and computational complexity. This metric can be expressed as follows, where $A$ represents the adjacency matrix of the network and $\rho$ represents a free parameter:
$$S_{LP} = A^{2} + \rho A^{3}$$
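The two quasi-local indices can be sketched with plain nested lists. The code below assumes the usual forms of the indices, namely LP as $A^2 + \rho A^3$ and L3 as a degree-normalized count of paths of length 3; the function names and the 4-node path graph are hypothetical and serve only as an illustration.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as nested lists."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]

def local_path_index(A, rho):
    """LP matrix A^2 + rho * A^3 for adjacency matrix A."""
    A2 = matmul(A, A)
    A3 = matmul(A2, A)
    n = len(A)
    return [[A2[i][j] + rho * A3[i][j] for j in range(n)] for i in range(n)]

def l3_index(A, a, b):
    """Degree-normalized sum over length-3 paths a -> mu -> nu -> b."""
    n = len(A)
    deg = [sum(row) for row in A]
    total = 0.0
    for mu in range(n):
        for nu in range(n):
            weight = A[a][mu] * A[mu][nu] * A[nu][b]
            if weight:
                total += weight / math.sqrt(deg[mu] * deg[nu])
    return total

# Illustrative data only: the path graph 0 - 1 - 2 - 3.
path_A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
lp = local_path_index(path_A, rho=0.5)
print(lp[0][2], lp[0][3], l3_index(path_A, 0, 3))
```

In this toy graph, nodes 0 and 2 are joined by one path of length 2, while nodes 0 and 3 are joined only by one path of length 3, so the LP score of the second pair is scaled down by `rho`.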
The local similarity index has high computational efficiency, the global index has more comprehensive information, and the quasi-local index ignores the information with lower correlation. To extract more comprehensive feature information and improve the performance of prediction, we employ a feature fusion scheme. The edge feature vector of dynamic online social networks is generated by the fusion of local similarity indices, global similarity indices, and quasi-local similarity indices, as in Algorithm 1. To obtain the best-performing feature vector, we fused these similarities in different combinations. The optimal feature vector is eventually used as input to the neural network algorithm to predict the structure of the upcoming network.

Mining Seed Nodes for Influence Maximization
In a dynamic online social network, the network topology changes over time but is unlikely to change drastically in a short period of time. Therefore, the network structures in two adjacent snapshots are similar, and the most influential seed nodes are likely to be similar as well. Based on this idea, the IMPR framework adopts a fast replacement algorithm to solve the influence maximization problem in dynamic networks. In this algorithm, if the seed set $S_t$ in the network snapshot $G_t$ at time $t$ has been obtained, then the seed set $S_{t+1}$ at time $t + 1$ can be obtained by directly replacing and updating the nodes in $S_t$. This avoids building the seed set from scratch and greatly saves computing time.
We adopt the Interchange Heuristic proposed by Fisher et al. [42] as our strategy for replacing nodes in $S_t$. The Interchange Heuristic changes only one element of the set at a time, and Fisher et al. proved that when the objective function is a monotone submodular function, it is possible to quickly find a set that can no longer be improved. The influence function is a monotone submodular function and therefore satisfies this condition.
The purpose of updating $S_t$ to $S_{t+1}$ according to the Interchange Heuristic strategy is to obtain the maximum gain. Let $\delta_{v,v_s}(S_t) = R\big((S_t \setminus \{v_s\}) \cup \{v\}\big) - R(S_t)$ denote the gain brought by replacing node $v_s \in S_t$ with node $v \in V \setminus S_t$. Then the replacement rule can be expressed as:
$$(v^{*}, v_s^{*}) = \arg\max_{v \in V \setminus S_t,\ v_s \in S_t} \delta_{v,v_s}(S_t)$$
where $V$ represents the set of nodes in the network.
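The replacement idea can be illustrated with a small sketch: starting from the previous seed set, repeatedly apply the single swap with the largest positive gain until no swap improves the objective. To keep the example deterministic, a one-hop coverage count is used as a stand-in for the Monte Carlo influence function; this stand-in, the function names, and the toy snapshot are assumptions made for illustration and are not the paper's implementation.

```python
def coverage(adj, seeds):
    """Stand-in influence function: seeds plus their direct neighbors."""
    covered = set(seeds)
    for s in seeds:
        covered |= set(adj.get(s, ()))
    return len(covered)

def best_swap(adj, seeds, influence):
    """Return (gain, v, v_s) for the best single replacement, or (0, None, None)."""
    base = influence(adj, seeds)
    best = (0, None, None)
    for v_s in sorted(seeds):
        rest = seeds - {v_s}
        for v in sorted(set(adj) - seeds):
            gain = influence(adj, rest | {v}) - base
            if gain > best[0]:
                best = (gain, v, v_s)
    return best

def interchange(adj, seeds, influence=coverage):
    """Swap one seed at a time until no replacement yields a positive gain."""
    seeds = set(seeds)
    while True:
        gain, v, v_s = best_swap(adj, seeds, influence)
        if v is None:
            return seeds
        seeds = (seeds - {v_s}) | {v}

# Illustrative next snapshot: node 4 has become a hub.
adj_next = {1: [2], 2: [1], 3: [4], 4: [3, 6, 7], 5: [6], 6: [4, 5], 7: [4]}
old_seeds = {1, 5}  # seed set carried over from the previous snapshot
new_seeds = interchange(adj_next, old_seeds)
print(new_seeds, coverage(adj_next, new_seeds))
```

Because each accepted swap strictly increases the objective and the objective is bounded by the number of nodes, the loop always terminates. Each call to `best_swap` evaluates every (seed, non-seed) pair, so the cost of an exhaustive search grows with the product of the seed set size and the number of nodes.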

This strategy involves a large number of Monte Carlo simulations, which are time-consuming. To improve efficiency, we use an upper bound on the gain to avoid a large number of these computations. Algorithm 2 describes the process of selecting a node to replace a fixed node $v_s$ ($v_s \in S_t$) in the seed set. If the maximum replacement gain $\delta_{v,v_s}$ is less than a given threshold $\lambda$, the search is abandoned and we can then reselect a node from the seed set for replacement. This loses some improvements but speeds up the update process, since an improvement below the threshold is negligible and wastes computation time. In our framework, to calculate the seed set $S_{t+1}$ at time $t + 1$, we only need to select the node with the greatest possible replacement gain from the seed set $S_t$ at time $t$ and use the above algorithm to exchange it.

Algorithm 1 Generate the input feature vector
Input: Snapshots of a dynamic online social network $\{G_t\}_0^t$
Output: Feature set for edges Edge_fs
1: for snapshot in $G_0, G_1, G_2, G_3, \cdots, G_t$ do
2:   for each edge_curr in snapshot do
3:     node1, node2 ← edge_curr
4:     cn …

With the above two parts, the problem of influence maximization in a dynamic social network can be solved by IMPR. At the beginning of the algorithm, we use the greedy algorithm to obtain the seed set $S_1$ on the initial snapshot $G_1$. The subsequent process of the whole framework is shown in Figure 3. We first use the historical snapshots $\{G_t\}_1^t$ to predict the upcoming network snapshot $G_{t+1}$, then use the fast replacement algorithm to update the seed set on the predicted network snapshot, and finally obtain the fresh seed set $S_{t+1}$ for the network at time $t + 1$. The complete prediction and fast replacement process is described in Algorithm 3. This seed set has the highest matching degree with the dynamic network and has the largest influence on the network at time $t + 1$.

Algorithm 2 Select a candidate seed node
Input: Snapshot $G_t(V, E)$, seed set $S_t$ at time $t$, $v_s$ ($v_s \in S_t$), the upper bound on the replacement gain $\delta_{v,v_s}(S_t)$
Output: A candidate seed node $v^{*}$
1: […] if $cur_{v^{*}}$ then
10:   break
11: else
12:   […] $cur_{v^{*}}$ ← true
14: end if
15: end while

Algorithm 3 Influence maximization based on prediction and fast replacement
Input: Snapshots $\{G_t\}_1^t$, the size of the seed set $K$
Output: Seed node set $S_{t+1}$
1: $S_1$ = greedy($G_1$, $K$)
2: $\hat{G}_{t+1}$ ← predict the upcoming network snapshot $G_{t+1}$

Theory Proof
In this section, we give a theoretical proof of the scheme proposed in this paper.
Theorem 1. The higher the accuracy of the prediction result, the closer the seed set $S_{t+1}$ obtained according to the prediction result is to the expected seed set $\bar{S}_{t+1}$, and the greater the influence.
Proof of Theorem 1. Suppose the set of edges in the network at time $t + 1$ is $E_{t+1}$, the prediction result is $\hat{E}_{t+1}$, the probability of information spreading in the network is $P_{uv}$, and the accuracy of structure prediction is $\eta$, then: […] Assuming that the influence function is denoted by $R(S_t)$, the following inequality is satisfied for any dynamic online social network, where $\varepsilon > 0$: arg max […] Combining these two formulas, we can obtain […] Combining the above equations, we can obtain […] This proves that the higher the prediction accuracy $\eta$, the more closely the seed set obtained from the prediction matches the expected result.

Experiments and Discussion
In this section, the performance and efficiency of the proposed scheme are verified through a series of experiments. The experiments are mainly divided into two parts. The first part verifies the accuracy of our prediction module, and the second part compares the classical methods and other similar algorithms with our framework.

Datasets
To evaluate the performance of the proposed framework, we conduct experiments on four different dynamic network datasets, all of which are real dynamic online social networks. Table 1 shows the information of the datasets. The second column of the table specifies the name of the dataset, the third column indicates the total number of temporal edges included in each dataset, and the last column shows the time span. As can be seen from the table, in order to make the experiments more convincing, we use datasets of different scales.

Evaluate the Prediction Module
To evaluate the prediction module, we adopt AUC, a widely used metric in link prediction. This metric can be interpreted as the probability that the score of a randomly chosen edge in the test set is higher than the score of a randomly chosen nonexistent edge. The larger the AUC value, the higher the prediction accuracy. AUC is calculated as

$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n},$$

where $n$ is the number of comparisons, $n'$ is the number of times the edge in the test set has the larger score, and $n''$ is the number of cases where the two scores are the same.
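The sampled AUC computation can be illustrated with a short Python sketch, assuming the usual form AUC = (n' + 0.5 n'') / n. The function name `link_prediction_auc` and its parameters are hypothetical, and the scores are assumed to be plain floats produced by some prediction model.

```python
import random
from typing import Sequence


def link_prediction_auc(
    test_scores: Sequence[float],
    nonexistent_scores: Sequence[float],
    n: int = 10000,
    seed: int = 0,
) -> float:
    """Estimate AUC from n random comparisons between a test edge and a nonexistent edge."""
    rng = random.Random(seed)
    higher = 0  # n': comparisons where the test edge scores higher
    equal = 0   # n'': comparisons where both scores are the same
    for _ in range(n):
        s_test = rng.choice(test_scores)
        s_none = rng.choice(nonexistent_scores)
        if s_test > s_none:
            higher += 1
        elif s_test == s_none:
            equal += 1
    return (higher + 0.5 * equal) / n
```

A predictor that scores edges at random yields a value near 0.5, so the margin above 0.5 indicates how much better than chance the prediction is.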
In the experiment, TensorFlow was used to build our prediction model. The neural network has two hidden layers, each with 1024 neurons. The activation functions used in the model are the ReLU function and the sigmoid function. The learning rate during training was 0.001, the batch size was 32, and the model was trained for 5 epochs. The model used the Adam optimizer to minimize the cross-entropy loss. All datasets are divided into 20 equally spaced snapshots by time interval, and the first 19 snapshots are used to train the model. After the model is trained, it is used to predict the edges in the last snapshot.
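The snapshot construction can be sketched as below. This is a minimal illustration under stated assumptions: temporal edges are given as `(u, v, timestamp)` triples, each snapshot holds only the edges whose timestamps fall in its own interval (whether the original snapshots are cumulative is not stated), and the function name `split_into_snapshots` is hypothetical.

```python
from typing import Hashable, Iterable, List, Tuple

TemporalEdge = Tuple[Hashable, Hashable, float]


def split_into_snapshots(
    temporal_edges: Iterable[TemporalEdge],
    num_snapshots: int = 20,
) -> List[List[Tuple[Hashable, Hashable]]]:
    """Bucket temporal edges into equally wide time intervals."""
    edges = list(temporal_edges)
    snapshots: List[List[Tuple[Hashable, Hashable]]] = [[] for _ in range(num_snapshots)]
    if not edges:
        return snapshots
    t_min = min(t for _, _, t in edges)
    t_max = max(t for _, _, t in edges)
    width = (t_max - t_min) / num_snapshots
    for u, v, t in edges:
        if width == 0:
            index = 0  # all edges share one timestamp
        else:
            # Clamp so that the edge at t_max lands in the last snapshot.
            index = min(int((t - t_min) / width), num_snapshots - 1)
        snapshots[index].append((u, v))
    return snapshots
```

With 20 snapshots, `snapshots[:-1]` would serve as training data and `snapshots[-1]` as the snapshot to predict, matching the 19/1 split described above.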
During the experiment, we tested the effect of different feature fusion methods for constructing the model input vector. The AUC values of four prediction methods are shown in Table 2, where NNLG (neural network based on local and global similarity indices) fuses local similarity indices and global similarity indices to generate feature vectors, and NNLQ fuses local similarity indices and quasi-local similarity indices. Similarly, NNGQ fuses global and quasi-local similarity indices, and NNLGQ fuses all three similarity measures. The experimental results show that NNLQ, which fuses local features and quasi-local features, exhibits the best performance. Although the input vector of the NNLGQ algorithm contains local features, global features, and quasi-local features, it does not perform as well as NNLQ. Further analysis revealed that NNLQ uses local information and quasi-local information, but not global information, and this combination captures the most accurate features for link prediction, while the redundant information in NNLGQ may interfere with the prediction results. Therefore, in our IMPR framework, the local similarity indices and quasi-local similarity indices were fused to construct the feature vector.

Evaluation of the Proposed Framework
In order to reveal the performance of our framework, we first embedded classical influence maximization algorithms into our framework for experiments. Moreover, we also compared the proposed framework with some existing algorithms on dynamic networks.

Baseline Algorithms
To demonstrate the benefit of our framework, we compared each classical influence maximization algorithm in its original form with the same algorithm embedded in the framework. The algorithms used for comparison in the experiments are summarized as follows.
Upper Bound based Lazy Forward (UBLF) [10]: A typical representative of greedy-based influence maximization algorithms, which uses an upper bound on the gain of the influence function to speed up the computation. Compared with other greedy algorithms, UBLF is more efficient.
Prediction-based Upper Bound based Lazy Forward (PUBLF): UBLF embedded into our IMPR framework, which adds the prediction part to the original algorithm.
Degree-Discount (DD) [9]: The most typical algorithm based on heuristic information, which selects seed nodes according to node degree.
Prediction-based Degree-Discount (PDD): The Degree-Discount algorithm embedded into our IMPR framework, which adds the prediction part to the original algorithm.
Community Finding Influential Node (CFIN) [14]: A recently proposed algorithm based on community structure. The network is first divided into communities, and the seed nodes are then found within the communities by a dynamic programming algorithm.
Prediction-based Community Finding Influential Node (PCFIN): The CFIN algorithm embedded into our IMPR framework.
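As an illustration of the degree-based heuristic among these baselines, the following Python sketch follows the commonly cited degree-discount rule for the independent cascade model, in which a node's degree is discounted as its neighbors become seeds: dd(v) = d(v) - 2 t(v) - (d(v) - t(v)) t(v) p. The rule is stated here from the general literature and should be checked against [9]. The function name `degree_discount` is hypothetical, and the graph is assumed to be an undirected adjacency dictionary in which every neighbor also appears as a key.

```python
from typing import Dict, Hashable, List, Set


def degree_discount(
    adj: Dict[Hashable, Set[Hashable]],
    k: int,
    p: float = 0.06,
) -> List[Hashable]:
    """Pick k seeds by repeatedly taking the node with the largest discounted degree."""
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    discounted = dict(degree)             # dd(v), initialised to the plain degree
    seed_neighbors = {v: 0 for v in adj}  # t(v): number of neighbors already chosen
    seeds: List[Hashable] = []
    chosen: Set[Hashable] = set()
    for _ in range(min(k, len(adj))):
        # Ties are broken by dictionary order, since max returns the first maximum.
        u = max((v for v in adj if v not in chosen), key=lambda v: discounted[v])
        seeds.append(u)
        chosen.add(u)
        for v in adj[u]:
            if v in chosen:
                continue
            seed_neighbors[v] += 1
            t_v, d_v = seed_neighbors[v], degree[v]
            discounted[v] = d_v - 2 * t_v - (d_v - t_v) * t_v * p
    return seeds
```

Because the heuristic never simulates diffusion, it is fast, which is consistent with the trade-off between accuracy and efficiency noted for Degree-Discount in the results.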
Furthermore, a series of experiments are conducted to compare our framework with some algorithms in dynamic networks to demonstrate the advantages of our framework. A brief description of these algorithms is given below.
Upper Bound Interchange Greedy algorithm (UBIG) [31]: This algorithm tracks the influential nodes in a dynamic network and continuously updates the result set by comparing the changes in the network structure.
Community-based influence maximization (CIM) [33]: This algorithm uses the community structure to mine the seed nodes in each community and then decides whether to update the seed set according to the changes in the community structure.
Influence Maximization based Common Users (IMCU) [30]: This algorithm is based on common users and studies the influence maximization problem in dynamic networks from the perspective of users.
Influence Maximization based Prediction and fast Replacement (IMPR): This is the computational framework proposed in this paper. It first predicts changes in the network structure based on historical snapshots and then dynamically updates the seed set based on the differences between snapshots.

Evaluation Metric
As analyzed above, the purpose of influence maximization in dynamic online social networks is to find, at each moment, the K most influential nodes in the network as the seed set. To evaluate the performance of our proposed framework, we first assume that the network changes continuously and then obtain the seed set for each time window according to the different algorithms. Once the seed set has been computed, the seed nodes are used as information sources to simulate the information diffusion process on the network structure at the current moment under a given information diffusion model. The number of affected nodes when the propagation ends is taken as the influence spread of the seed set. To reduce the randomness of the results, each propagation process is repeated 100 times.
To compare different models, we took the average influence spread of all snapshots as the evaluation metric for different models.
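The evaluation procedure can be sketched in Python as follows, assuming the independent cascade model with a uniform propagation probability and interpreting the 100 iterations as 100 independent Monte Carlo runs whose results are averaged. The function names and the adjacency-dictionary representation are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterable, List, Sequence, Set


def simulate_ic(adj: Dict[Hashable, Iterable[Hashable]], seeds: Iterable[Hashable],
                p: float, rng: random.Random) -> int:
    """Run one independent cascade and return the number of activated nodes."""
    active: Set[Hashable] = set(seeds)
    frontier: List[Hashable] = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in adj.get(u, ()):
                # Each newly activated node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def influence_spread(adj, seeds, p: float = 0.06, runs: int = 100, seed: int = 0) -> float:
    """Average the cascade size over several simulations of the same seed set."""
    rng = random.Random(seed)
    return sum(simulate_ic(adj, seeds, p, rng) for _ in range(runs)) / runs


def average_spread(snapshot_adjs: Sequence[dict], seed_sets: Sequence[Iterable[Hashable]],
                   p: float = 0.06, runs: int = 100) -> float:
    """Evaluation metric: mean influence spread over all snapshots."""
    spreads = [influence_spread(a, s, p, runs) for a, s in zip(snapshot_adjs, seed_sets)]
    return sum(spreads) / len(spreads)
```

Each seed set is evaluated on the actual network structure of its own time step, so a seed set chosen on an outdated or badly predicted snapshot is penalized by the real topology.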

Result and Discussion
To facilitate comparison with other methods, we adopted the widely used independent cascade model as the information diffusion model and set the probability of information propagation between adjacent nodes to p = 0.06. We split each dataset into 20 equally spaced snapshots and trained our predictive model with the first 10 snapshots. After the model training was completed, we calculated the seed nodes according to the different algorithms and used the influence spread as the metric to evaluate the seed nodes.
To demonstrate the importance of the prediction module in our framework, we first compared the classical influence maximization algorithms embedded in our framework with their original, non-embedded versions. The results on different datasets are shown in Figure 4, where the abscissa K represents the size of the seed set. During the experiment, the value of K ranges from 10 to 100 in steps of 10.
These figures show that as the size of the seed set increases, the influence spread of the seed nodes gradually becomes larger on all datasets. The greedy algorithm UBLF exhibited the best performance, and the Degree-Discount algorithm performed worst. This is because the Degree-Discount algorithm only considers node degree information, sacrificing accuracy in exchange for efficiency. Most importantly, we found that the prediction technique helps each algorithm improve accuracy and achieve better results.
Next, we compared our algorithm with some existing influence maximization algorithms for dynamic networks on different datasets. Figure 5 shows the experimental results. Comparing these figures, it can be found that our proposed scheme outperformed the other algorithms. This is because our framework can predict the upcoming network snapshot better than the other algorithms, and mining seed nodes on the predicted network maximizes the fit between the seed nodes and the dynamic network. The other algorithms, in contrast, use outdated network snapshots, so by the time the seed nodes have been calculated, the network structure has already changed.
Finally, we compared the running time of different algorithms on four datasets, where we fixed the size of the seed set to 50. The experimental results are shown in Figure 6. It can be seen intuitively from the figure that the UBIG algorithm has the shortest running time, followed by the algorithm proposed in this paper. This is because the UBIG algorithm only calculates the seed node based on the existing historical snapshot every time, and there is no network update operation. Other algorithms include the operation of updating the network structure.
Combining the experimental results, we can conclude that our proposed computational framework is more suitable for solving the influence maximization problem in dynamic networks, especially for those that change continuously. The limitation of our scheme is that it requires a training process; however, training can improve the accuracy of the results.

Conclusions
With the continuous development of the mobile Internet, online social networks have changed many aspects of our lives. Many researchers are devoted to the study of online social networks. Influence maximization is one of the important issues of research in this field. Most of the existing research is based on static network structure, but in fact the network structure changes dynamically with time. To this end, we delved into the problem of influence maximization in dynamic online social networks.
In this paper, we propose a novel computational framework for solving the influence maximization problem in dynamic online social networks. Our framework first predicts the upcoming network snapshot based on historical network snapshots and then mines the most influential seed nodes on the predicted result. We provide a theoretical justification of the proposed scheme. Moreover, a series of experiments on four real dynamic online social network datasets were conducted to reveal the advantages of our scheme, and the experimental results show that our algorithm can improve the accuracy of the results and the computational efficiency.
In the future, we will continue to study issues related to online social networks. There are two potential research directions: one is to study influence maximization when the network topology is unavailable, and the other is to study information diffusion on multilayer networks and extend our model to multilayer networks.