1. Introduction
The complex network model is an abstract approach to analyzing real-world interactions between objects, represented as points (nodes) and lines (edges) [1]. Any individual in nature can be connected to another through a specific relation. Depending on the observation scale, complex networks can be applied to different fields. Taking molecules and cells as individuals, complex networks can express chemical structures [2] or microscopic biological structures [3]. Shrinking the scale to neurons, complex networks can also express the structure of the human brain [4]. By abstracting the sensors used in autonomous driving as nodes, a complex network can represent the road topology within a specific area [5]. A range of complex network techniques can then be used to analyze the features of the target system, such as node importance evaluation [6], fractal analysis of complex networks [7], and community detection [8]. Link prediction is one of the most representative problems in complex networks. It aims to estimate the likelihood of unknown or missing links between nodes from the current connection information of the target network.
Link prediction can help characterize the interaction principles of a target network and help researchers efficiently understand its evolutionary trends and mechanisms [9]. Link prediction results also have many practical applications. Using link prediction to match user demand with hotel conditions can improve hotel revenue [10]. In wireless sensor networks (WSNs), link prediction can improve information transmission efficiency while keeping re-transmission rates low to save energy [11]. In disease–gene interaction networks, link prediction analysis can help the pharmaceutical industry develop targeted drugs [12].
Link prediction is now applied to many kinds of networks: static networks, dynamic networks, temporal networks, heterogeneous networks, heterogeneous temporal networks, and hypergraphs. Temporal network link prediction methods fall into five primary categories: matrix factorization, probability, spectral clustering, machine learning, and time series [13]. Most matrix factorization methods are based on nonnegative matrix factorization (NMF). Sheng et al. [14] proposed an NMF-based temporal link prediction method that considers three aspects: the global structure of nodes, the local information of nodes, and node attributes. The method combines these sources of information to predict the probability that a link occurs. Probability-based methods typically quantify the uncertainty and variability of links between nodes using maximum likelihood approaches or probability distributions. For example, the extended temporal exponential random graph model (etERGM) [15] builds on Markov chains and predicts the future attributes and connections of nodes from historical data. Among spectral clustering methods, Fang et al. proposed a time regression model for temporal link prediction. It integrates spectral graph theory and low-rank approximation into a time series model, which lets the model capture more graph information and improves prediction accuracy. Among machine learning methods, network representation learning (NRL) uses graph embedding algorithms to represent the properties of a temporal network in a low-dimensional vector space. This avoids the difficulty of extracting features from individual snapshots, and temporal link prediction can then be performed on the low-dimensional feature vectors. Zhou et al. [16] proposed an NRL method called DynamicTriad, which simulates triadic closure processes among nodes. This enables the model to capture temporal features and obtain a vector representation of each node in each period. Time series-based methods improve temporal link prediction accuracy by computing the local similarity of nodes in different ways. Huang et al. [17] used a specific time granularity to divide temporal network datasets into snapshots. They proposed an improved gravity model with second-order neighbors, denoted gravity (GR), to compute a score matrix for each static network snapshot. All score matrices are then combined with a time attenuation factor to estimate the connection probability for the next period. Yang et al. [18] proposed tensor-based node similarity (TBNS). TBNS treats the time-ordered collection of snapshots as a three-dimensional tensor. It then uses the exponential smoothing model to compress this tensor into a two-dimensional node similarity matrix, which serves as the link prediction scores for the next period. This addresses a weakness of snapshots: when the time granularity is large, they cannot capture how strongly nodes are connected. However, capturing every microscopic connection in this way is likely to produce many sparse matrices, which wastes storage space. Güneş et al. [19] first calculated node similarity indices for each period using common neighbors, preferential attachment, Adamic–Adar, and the Jaccard index. They then used the autoregressive integrated moving average (ARIMA) model to predict future similarity between nodes.
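The TBNS compression step can be illustrated with a minimal sketch. It assumes that the node similarity of each snapshot is already available as a dense n × n list of lists. It applies the textbook single exponential smoothing recursion along the time axis and is not claimed to match the exact TBNS formulation. The function name `exponential_smoothing` and the initialization with the first snapshot are our own choices.

```python
from typing import List

Matrix = List[List[float]]


def exponential_smoothing(snapshots: List[Matrix], alpha: float) -> Matrix:
    """Compress a time-ordered list of n x n similarity matrices (a 3-D tensor)
    into a single n x n matrix using simple exponential smoothing:
        S_1 = X_1
        S_t = alpha * X_t + (1 - alpha) * S_{t-1}
    The final S is read as the score matrix for the next period."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not snapshots:
        raise ValueError("need at least one snapshot")
    smoothed = [row[:] for row in snapshots[0]]  # initialize with the first slice
    for current in snapshots[1:]:
        smoothed = [
            [alpha * x + (1.0 - alpha) * s for x, s in zip(x_row, s_row)]
            for x_row, s_row in zip(current, smoothed)
        ]
    return smoothed
```

A larger `alpha` lets recent snapshots dominate the forecast, and a smaller one spreads the weight over older snapshots. The same fixed value is used for every period.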
All of the above time series-based temporal link prediction methods [17,18,19] follow the framework shown in Figure 1. First, the temporal network is represented as a sequence of snapshots, one per period. Second, a node similarity matrix is constructed for each snapshot layer; methods differ in how they build it. Finally, a numerical prediction model forecasts the snapshot similarity matrix for a future time. Constructing a reasonable node similarity is therefore crucial for improving the accuracy of time series-based temporal link prediction methods.
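The first two stages of this framework can be sketched under simplifying assumptions. Interactions arrive as (u, v, t) triplets with integer timestamps. Snapshots are undirected and unweighted. The common-neighbors index stands in for whatever similarity a particular method uses. The function names are hypothetical. The third stage can be any numerical forecasting model applied to the resulting sequence of matrices.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triplet = Tuple[int, int, int]  # (node u, node v, timestamp t)


def build_snapshots(events: List[Triplet], granularity: int) -> List[Dict[int, Set[int]]]:
    """Slice a non-empty list of time-stamped interactions into consecutive
    windows of length `granularity` (the time granularity L). Each window
    becomes an undirected, unweighted snapshot stored as adjacency sets."""
    t0 = min(t for _, _, t in events)
    n_slices = (max(t for _, _, t in events) - t0) // granularity + 1
    snapshots = [defaultdict(set) for _ in range(n_slices)]
    for u, v, t in events:
        if u == v:
            continue  # ignore self-loops
        adj = snapshots[(t - t0) // granularity]
        adj[u].add(v)  # repeated interactions collapse into one binary edge
        adj[v].add(u)
    return [dict(adj) for adj in snapshots]


def common_neighbor_matrix(adj: Dict[int, Set[int]], nodes: List[int]) -> List[List[int]]:
    """Node similarity of one snapshot using the common-neighbors index."""
    return [
        [len(adj.get(u, set()) & adj.get(v, set())) if u != v else 0 for v in nodes]
        for u in nodes
    ]
```

For example, with L = 5 the interactions at t = 0 and t = 1 fall into the first snapshot, and those at t = 5 and t = 7 fall into the second.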
Table 1 briefly highlights some existing node similarity-based link prediction methods.
However, treating network snapshots as a time series for link prediction has room for improvement. A network snapshot is stored as an adjacency matrix, so multiple interactions between the same pair of nodes within a time granularity L are recorded only in binary form. In temporal networks, interaction information is represented as triplets consisting of two nodes and a timestamp. The time granularity L divides the timestamps into slices, and each slice forms a snapshot containing the interaction events of the corresponding period. During temporal prediction, the reference scores of the node similarity matrix within each snapshot may be misjudged. For example, some nodes may interact only weakly during a snapshot period, so their similarity should not receive much weight in the time series forecasting model. A large time granularity L provides more edge information for each snapshot, but it can also reduce the accuracy of the reference scores for node similarity during prediction. The time series-based methods [17,18,19] mentioned above enhance the expressive capacity of network snapshots and speed up computation by considering the local information of nodes. However, a large time granularity may not guarantee the algorithm's accuracy [31]. To address these problems, we propose a temporal network link prediction method based on an optimized exponential smoothing model and node interaction entropy (OESMNIE). The OESMNIE method considers the fine-grained interaction information among nodes within a snapshot and the impact of interaction intensity on node similarity during time series prediction. It combines the wide-ranging, low-frequency interactions of nodes within a snapshot with information entropy to further differentiate node popularity within the snapshot structure. This improves the node similarity constructed with the gravity model. The OESMNIE method also normalizes the sum of the interaction entropy of the nodes within each snapshot and combines the result with the smoothing coefficient of the exponential smoothing model. This lets the predicted node similarity be weighted dynamically according to the overall trend of network interactions. Finally, the three-dimensional snapshot similarity tensor is compressed into a two-dimensional node similarity matrix (i.e., the future link prediction scores). This offers a new way to model time series-based temporal link prediction.
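Section 3 gives the exact definition of node interaction entropy. The sketch below is only a hypothetical illustration of the intuition. It counts repeated interactions between each node pair within one snapshot period, which is the information a binary adjacency matrix discards. It then computes the standard Shannon entropy of each node's interaction distribution over its neighbors. This is not the OESMNIE formula. The function names and the choice of log base 2 are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def interaction_counts(events: List[Tuple[int, int, int]]) -> Counter:
    """Count how often each unordered node pair interacts within one snapshot
    period; a binary adjacency matrix keeps only whether a count is > 0."""
    return Counter(frozenset((u, v)) for u, v, _ in events if u != v)


def node_entropy(counts: Counter) -> Dict[int, float]:
    """Illustrative Shannon entropy of each node's interactions over its
    neighbors: H(u) = -sum_v p_uv * log2(p_uv), with p_uv = c_uv / sum_w c_uw.
    Many neighbors contacted a few times each (wide-ranging, low-frequency
    contacts) give a high value; a single partner gives 0."""
    per_node: Dict[int, Counter] = {}
    for pair, c in counts.items():
        u, v = tuple(pair)
        per_node.setdefault(u, Counter())[v] += c
        per_node.setdefault(v, Counter())[u] += c
    entropy = {}
    for u, nbrs in per_node.items():
        total = sum(nbrs.values())
        entropy[u] = -sum((c / total) * math.log2(c / total) for c in nbrs.values())
    return entropy
```

As an example, a node that interacts four times with one partner scores 0. A node that interacts once each with four different partners scores 2 bits, although both would have a similar footprint in a binary snapshot.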
The major contributions are summarized as follows:
We record the fine-grained interaction information among nodes within each snapshot period and use the concepts of information entropy and weak ties to construct the node interaction entropy. This value differentiates the popularity of nodes within the snapshot structure at a finer level of detail.
We combine node interaction entropy and eigenvector centrality to construct an enhanced node similarity that accounts for the structural distance between nodes, weak-tie characteristics, and centrality.
We normalize the sum of node interaction entropy. The normalized result reflects the share of the current snapshot's weak ties over the entire period, and the higher this share, the more weight the snapshot's node similarity matrix receives in time series prediction. We combine the smoothing coefficient and this ratio in the exponential smoothing model. This addresses the shortcoming of relying on a single reference score during prediction.
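The second contribution builds on a gravity model with shortest-path distance and eigenvector centrality. Below is a hypothetical sketch of that combination only. It uses eigenvector centrality as the node "mass" and the shortest-path hop count as the distance in the Newton-style form s(u, v) = m_u · m_v / d(u, v)². It omits the interaction-entropy term, and the exact OESMNIE similarity is defined in Section 3. The mass choice, the squared distance, and all function names are assumptions. The adjacency dictionary is assumed to be symmetric and to list every node as a key.

```python
from collections import deque
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]


def eigenvector_centrality(adj: Graph, iters: int = 200, tol: float = 1e-10) -> Dict[int, float]:
    """Power iteration on (A + I). The identity shift avoids oscillation on
    bipartite graphs and does not change the leading eigenvector."""
    x = {u: 1.0 for u in adj}
    for _ in range(iters):
        nxt = {u: x[u] + sum(x[v] for v in adj[u]) for u in adj}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        nxt = {u: val / norm for u, val in nxt.items()}
        if sum(abs(nxt[u] - x[u]) for u in adj) < tol:
            return nxt
        x = nxt
    return x


def shortest_path_lengths(adj: Graph, source: int) -> Dict[int, int]:
    """Hop distances from `source` via breadth-first search (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def gravity_scores(adj: Graph, mass: Dict[int, float]) -> Dict[Tuple[int, int], float]:
    """Gravity-style similarity m_u * m_v / d(u, v)**2 for every connected pair
    u < v; disconnected pairs are omitted (i.e., score 0)."""
    scores = {}
    for u in adj:
        for v, d in shortest_path_lengths(adj, u).items():
            if u < v:
                scores[(u, v)] = mass[u] * mass[v] / d ** 2
    return scores
```

Because d(u, v) = d(v, u) on an undirected graph, the resulting scores are symmetric. This matches the limitation noted in Section 4.6.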
The rest of this paper is organized as follows. Section 2 reviews related work and introduces the concepts of temporal networks, network snapshots, and the multi-layer network model. It also explains weak ties theory, the gravity model, information entropy, the eigenvector centrality of nodes, temporal network link prediction methods, and the role of the exponential smoothing model in link prediction. Section 3 describes the proposed OESMNIE method in detail. Section 4 presents the comparative experiments and discussion. Finally, Section 5 concludes the paper.
4. Experiments and Discussion
4.1. Experimental Environment
The experimental environment for this study is as follows: Processor: 12th Gen Intel(R) Core(TM) i5-12400F, 2.50 GHz; Memory: 16.0 GB (DDR4); Operating system: Windows 11 Professional (22H2); Interpreter: Python 3.7; and Libraries: NetworkX 2.3, NumPy 1.21.6, and Multinetx.
4.2. Data Selection
We selected three real temporal network datasets, Emaildept3 [44], Workspace [45], and Email-EU-core [46]. We use them to verify the efficiency and accuracy of the OESMNIE method and to compare it with TBNS, GR, and traditional node similarity indicators.
Table 2 shows the statistical characteristics of the selected temporal network datasets.
Figure 6 illustrates the evolution of snapshot interactions in each temporal network dataset.
To validate the effectiveness of the OESMNIE method under different time granularities, we use several time granularities to obtain snapshots. Table 3 lists the number of network snapshots generated for each time granularity L.
4.3. Evaluation Method
We validate the algorithm from two perspectives. First, we predict links within individual snapshots of the temporal network. This verifies whether fine-grained interaction behavior among nodes improves node similarity. Second, we predict the future links of the temporal network. This verifies whether adjusting the smoothing coefficient dynamically, according to changes in the network's interactions, improves link prediction. For the first task, we randomly select a snapshot from the snapshot set and treat it as a static network. We then divide its edges into training data and testing data, with 20% of the edges used for testing. The set of non-existent edges consists of node pairs that are not connected in the snapshot, i.e., all possible node pairs minus the observed edges. For the second task, we divide the snapshot set itself into training and test data. The graph data represented by the snapshot set consist of snapshot subgraphs from different periods, and each subgraph has its own node set and edge set. All snapshots except the latest one form the training data. To predict the link status of the next snapshot, we use the edge set of the latest snapshot as the test dataset. We use the AUC to evaluate prediction performance both on individual snapshots and on the future state of the temporal network. The definition of the AUC is given in Equation (19).
The AUC is computed by repeatedly selecting a random pair of edges and comparing their predicted scores. One edge comes from the set of actually existing edges (the test set) and the other from the set of actually non-existent edges. This is repeated n times:
$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n} \qquad (19)$$
where n' is the number of comparisons in which the predicted score of the existing edge is greater than that of the non-existent edge, n'' is the number of comparisons in which the two scores are equal, and n is the total number of comparisons. A method that assigns scores at random yields an AUC of about 0.5, and the closer the value is to 1, the more accurate the method's prediction.
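A minimal sketch of the single-snapshot protocol and the sampled AUC of Equation (19) follows. It assumes the snapshot is an undirected edge list and holds out 20% of the edges for testing. All unconnected node pairs form the non-existent set. A common-neighbors scorer built on the training edges is a placeholder for any of the compared methods. All function names are hypothetical.

```python
import random
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]


def split_edges(edges: List[Edge], test_ratio: float, rng: random.Random):
    """Shuffle the observed edges of one snapshot and hold out `test_ratio` of them."""
    shuffled = edges[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]  # (training edges, test edges)


def non_existent_edges(nodes: List[int], edges: List[Edge]) -> List[Edge]:
    """All node pairs (the universal set) minus the observed edges."""
    observed = {frozenset(e) for e in edges}
    return [p for p in combinations(sorted(nodes), 2) if frozenset(p) not in observed]


def common_neighbor_scorer(train: List[Edge]) -> Callable[[int, int], float]:
    """Score a node pair by its number of common neighbors in the training graph."""
    adj: Dict[int, Set[int]] = {}
    for u, v in train:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return lambda u, v: float(len(adj.get(u, set()) & adj.get(v, set())))


def sampled_auc(score: Callable[[int, int], float], test: List[Edge],
                missing: List[Edge], n: int, rng: random.Random) -> float:
    """AUC = (n' + 0.5 * n'') / n over n random comparisons between a test
    edge and a non-existent edge."""
    greater = equal = 0
    for _ in range(n):
        s_pos = score(*rng.choice(test))
        s_neg = score(*rng.choice(missing))
        if s_pos > s_neg:
            greater += 1
        elif s_pos == s_neg:
            equal += 1
    return (greater + 0.5 * equal) / n
```

The same `sampled_auc` applies to the future-link task. In that case the test edges come from the latest snapshot and the scores from the similarity matrix predicted from the earlier snapshots. A constant scorer always returns 0.5, which is why values near 0.5 indicate no predictive power.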
4.4. Performance Comparison
First, we compared the AUC of the proposed method with the existing GR, CN (common neighbors), AA (Adamic–Adar), JC (Jaccard), PA (preferential attachment), and RA (resource allocation) indicators on single snapshots at different time granularities. These indicators construct node similarity from local information and are widely used in link prediction. Like the OESMNIE method, they rely on local topological structure, which makes them representative baselines. The results are presented in Table 4.
As shown in Table 4, the proposed OESMNIE method performs competitively on single-layer snapshots of temporal networks at all time granularities. As the time granularity L increases, the accuracy of all local-information-based link prediction methods improves. A larger L captures more interaction information within each snapshot and thus makes the snapshot richer. For temporal network link prediction, however, increasing the time granularity disrupts the temporal attributes of node interactions and hinders future link prediction. In contrast, the OESMNIE method exploits the fine-grained interactions and weak ties between nodes within the current snapshot. This significantly improves accuracy under the same snapshot conditions. The approach preserves the temporal features of the snapshots and their ability to characterize node similarity accurately across snapshots.
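For reference, the five local baseline indices can be computed from adjacency sets as in the sketch below. The sketch uses the textbook definitions, with the natural logarithm for AA. The dictionary representation and the function name are our assumptions, and GR and OESMNIE are not reproduced here.

```python
import math
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def local_indices(adj: Graph, u: int, v: int) -> Dict[str, float]:
    """Classic local similarity indices for a node pair (u, v), u != v:
    CN = |N(u) & N(v)|, JC = CN / |N(u) | N(v)|, PA = k_u * k_v,
    AA = sum_z 1 / ln(k_z), RA = sum_z 1 / k_z, where z ranges over
    the common neighbors of u and v and k is the node degree."""
    if u == v:
        raise ValueError("indices are defined for distinct nodes")
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "CN": float(len(common)),
        "JC": len(common) / len(union) if union else 0.0,
        "PA": float(len(nu) * len(nv)),
        # a common neighbor of two distinct nodes has degree >= 2, so ln(k_z) > 0
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common),
        "RA": sum(1.0 / len(adj[z]) for z in common),
    }
```

All five indices depend only on a pair's immediate neighborhood in one snapshot. Repeated interactions within the snapshot therefore have no effect on them.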
Secondly, we conducted 100 experiments and averaged the results to compare the predictive performance of the OESMNIE method with selected time series-based temporal link prediction methods.
Figure 7 illustrates the AUC results of these eight methods on the temporal network.
In the temporal networks with weekly and monthly time granularities, the OESMNIE method outperforms the other methods, achieving the highest scores of 0.9181 and 0.9384, respectively. The other methods lose considerable accuracy at different time granularities, for two reasons. First, their node similarity matrices are built only from the adjacency matrix information of the network snapshots, which limits the accuracy of node similarity within a single snapshot. Second, relying on the fixed attenuation constant of GR or on the standard exponential smoothing model can cause the prediction to deviate from the actual network interactions. As a result, these methods assign incorrect reference scores to the wrong snapshot periods.
4.5. Parameters Analysis
The time granularity L used to construct network snapshots is a critical factor in the performance of temporal network link prediction methods. The smoothing coefficient of the improved exponential smoothing model also plays an important role in the algorithm's accuracy. We therefore explored the impact of the smoothing coefficient on the algorithm's effectiveness by varying it incrementally. Using network snapshots at different time granularities for each network, we observe how the number of interactions within each snapshot relates to the reference ratio of the smoothing coefficient, as shown in Figure 8, Figure 9, and Figure 10.
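The idea that the smoothing coefficient should follow each snapshot's interaction volume can be illustrated with a hypothetical sketch. Here r_t is a snapshot's share of total activity, and the coefficient is scaled by the rule alpha_t = alpha · r_t / max(r). This rule is our assumption for illustration. It is not the OESMNIE update, which combines the smoothing coefficient with normalized node interaction entropy as described in Section 3. The per-snapshot activity measure (for example, an interaction count) and the function name are also assumptions.

```python
from typing import List

Matrix = List[List[float]]


def adaptive_smoothing(snapshots: List[Matrix], activity: List[float], alpha: float) -> Matrix:
    """Exponential smoothing with a per-snapshot coefficient.
    Hypothetical rule: alpha_t = alpha * r_t / max(r), where r_t is the
    snapshot's share of total activity, so quieter snapshots move the
    forecast less and the busiest snapshot uses the full alpha."""
    if not snapshots or len(snapshots) != len(activity):
        raise ValueError("need one activity value per snapshot")
    total = sum(activity)
    if total <= 0:
        raise ValueError("total activity must be positive")
    shares = [a / total for a in activity]  # normalized share over the whole period
    peak = max(shares)
    smoothed = [row[:] for row in snapshots[0]]
    for current, share in zip(snapshots[1:], shares[1:]):
        a_t = alpha * share / peak  # lies in [0, alpha]
        smoothed = [
            [a_t * x + (1.0 - a_t) * s for x, s in zip(x_row, s_row)]
            for x_row, s_row in zip(current, smoothed)
        ]
    return smoothed
```

With constant activity the rule reduces to ordinary exponential smoothing with coefficient `alpha`. A snapshot with no activity leaves the forecast unchanged.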
Figure 8, Figure 9, and Figure 10 show that the smoothing coefficient of the exponential smoothing model adjusts to the interaction patterns within the snapshots. This gives more moderate weights to the predicted node similarity values. These experimental results are consistent with our initial hypothesis.
To further determine the impact of the smoothing coefficient on the prediction accuracy of the OESMNIE method, we compare it with similar methods across a range of smoothing coefficient values. Figure 11, Figure 12, and Figure 13 show the detailed comparison results.
As shown in Figure 11 and Figure 12, the prediction accuracy of all algorithms fluctuates as the smoothing coefficient and the snapshot time granularity L increase. However, the OESMNIE method remains stable overall, indicating better stability than the other algorithms. This further confirms that the improved prediction weights adapt effectively to changes in network structure.
As shown in Figure 13a, the accuracy of the OESMNIE method is lower than that of the other indicators, but only when the snapshot period is small. Overall, the experimental results show that the proposed method is more robust and accurate than the other methods across different time granularities and smoothing coefficients.
4.6. Discussion
Although our proposed method has achieved promising results, it has several limitations. First, our approach extracts more link information from each snapshot to establish the similarity between nodes within each period for link prediction, so the structure of each snapshot is represented as a matrix. For large networks, however, processing the adjacency matrices of network snapshots produces large sparse matrices and increases the storage burden. The method is therefore better suited to small and medium-sized networks. Second, when constructing node similarity, we use the shortest path length between nodes as the distance in the gravity model. The node similarity matrices are therefore symmetric, which means the current method applies only to unweighted, undirected temporal networks. Finally, the OESMNIE algorithm requires snapshots to contain fine-grained interaction information among nodes to distinguish the similarity between nodes. For real-time link prediction, the OESMNIE method must therefore first record the incoming data and can only make near-real-time predictions. In future research, we plan to explore graph embedding techniques to alleviate the sparsity issue in large networks. We also plan to improve the measurement of node similarity so that the method applies to a wider range of network types and application scenarios.