
MSEN: A Multi-Scale Evolutionary Network for Modeling the Evolution of Temporal Knowledge Graphs

1 Institute of Microelectronics of the Chinese Academy of Sciences, Beijing 100191, China
2 University of Chinese Academy of Sciences, Beijing 100191, China
* Author to whom correspondence should be addressed.
Future Internet 2023, 15(10), 327; https://doi.org/10.3390/fi15100327
Submission received: 1 September 2023 / Revised: 28 September 2023 / Accepted: 28 September 2023 / Published: 30 September 2023

Abstract: Temporal knowledge graphs play an increasingly prominent role in scenarios such as social networks, finance, and smart cities. As such, research on temporal knowledge graphs continues to deepen. In particular, research on temporal knowledge graph reasoning holds great significance, as it can provide abundant knowledge for downstream tasks such as question answering and recommendation systems. Current reasoning research focuses primarily on interpolation and extrapolation. Extrapolation research aims to predict the likelihood of events occurring at future timestamps. Historical events are crucial for predicting future events, yet existing models struggle to fully capture the evolutionary characteristics of historical knowledge graphs. This paper proposes a multi-scale evolutionary network (MSEN) model that leverages a Hierarchical Transfer-aware Graph Neural Network (HT-GNN) in a local memory encoder to aggregate rich structural semantics from each timestamp's knowledge graph, and a Time-Related Graph Neural Network (TR-GNN) in a global memory encoder to model the temporal-semantic dependencies of entities across the global knowledge graph, mining global evolutionary patterns. The model integrates information from both encoders to generate entity embeddings for predicting future events. The proposed MSEN model demonstrates strong performance compared to several baselines on typical benchmark datasets, achieving the highest prediction accuracy.

1. Introduction

Temporal knowledge graphs (TKGs) are a type of dynamic knowledge structure in which knowledge may change over time. Compared with traditional static knowledge graphs, TKGs take into account the influence of the time dimension. TKGs with multiple timestamps can be represented as a series of knowledge graph (KG) snapshots, where each snapshot contains all facts within the same timestamp.
For static knowledge graph completion, methods such as DistMult [1], ComplEx [2], RGCN [3], ConvE [4], and RotatE [5] have been proposed. TKG reasoning tasks perform temporal reasoning and evolution analysis of knowledge and are mainly divided into interpolation and extrapolation [6]. Interpolation completes missing events based on existing facts; extrapolation predicts future events based on past events. Predicting future events from the evolution of historical knowledge graphs is important and challenging: obtaining likely future events in a timely manner from temporal relations can support applications such as financial risk control and disaster relief.
This paper focuses on extrapolation research. The challenge of this task is how to better obtain relevant historical information that reflects future behaviors. Currently, two kinds of methods can be used for TKG extrapolation: query-specific and entire graph-based methods [7]. The former includes RE-NET [6], CyGNet [8], xERTE [9], TITer [10], CENET [11], etc., which use high-frequency historical facts involving a query's entities and relations to predict future trends but ignore structural dependencies within snapshots; the latter includes RE-GCN [12], CEN [7], TANGO [13], TiRGN [14], L2TKG [15], etc., which take the historical knowledge graph as input to learn the evolutionary patterns of the TKG.
Neither of these two method families considers historical features from multiple perspectives, ignoring potentially useful information. Relying solely on either local or global knowledge graphs may result in information loss, affecting prediction results, whereas future events may depend on both local structural dependencies and global temporal patterns. Historical knowledge from varying perspectives likely influences future occurrences to differing degrees.
In this work, we propose a model to capture multi-scale evolutionary features of historical information, called Multi-Scale Evolutionary Network (MSEN). Typically, different features manifest at different scales, providing diverse semantic information. The local memory focuses on contextual knowledge, while the global memory recalls long-range patterns. By integrating both scales, MSEN can better model TKG evolution for reasoning.
The main contributions are:
(1) In the local memory encoder, a hierarchical transfer-aware graph neural network (HT-GNN) is proposed to enrich the structural semantics within each timestamp;
(2) In the global memory encoder, a time-related graph neural network (TR-GNN) is proposed to extract periodic and non-periodic temporal patterns globally across timestamps;
(3) Through experiments on typical event-based datasets, the paper demonstrates the effectiveness of the proposed MSEN model.

2. Related Work

In this section, the paper reviews existing methods for TKG reasoning under the interpolation and extrapolation settings and summarizes methods for event prediction.

2.1. TKG Reasoning under the Interpolation Setting

For the interpolation setting, models try to complete missing facts at past timestamps. TTransE [16], TA-TransE [17], and TA-DistMult [17] embed temporal information into the scoring functions and relation representations to capture the evolution of facts; HyTE [18] proposes a time-aware model based on hyperplanes and projects entities and subgraphs onto time-specific hyperplanes; TNTComplEx [19] introduces complex numbers to factorize a fourth-order tensor and generate timestamp embeddings; DE-SimplE [20] accomplishes knowledge graph completion by incorporating temporal information into diachronic entity representations; ChronoR [21] captures the rich interactions between time and multi-relational features by learning embeddings of entities, relations, and time; TempCaps [22] is a capsule network model that constructs entity embeddings from information retrieved through dynamic routing; TKGC [23] maps entities and relations in a TKG to multivariate Gaussian processes, thereby modeling overall and local trends. However, these models cannot obtain embeddings of future events.

2.2. TKG Reasoning under the Extrapolation Setting

Recently, several studies have attempted to infer future events. Existing methods can be divided into two categories: query-specific and entire graph-based methods [7]. Both use historical information to predict future trends.

2.2.1. Query-Specific Models

These methods focus on obtaining historical information relevant to a specific query. GHNN [24] predicts the occurrence of future events by modeling the dynamic sequence of evolving graphs; xERTE [9] uses iterative sampling and attention propagation to extract subgraphs around specific queries. Both CluSTeR [25] and TITer [10] use reinforcement learning to discover paths for specific queries in the historical KG. RE-NET [6] effectively extracts the sequential and structural information related to a specific query in the TKG to solve entity prediction, but cannot model long-term KG information; CyGNet [8] combines two reasoning modes, a copy mode and a generation mode, making predictions from repeated entities in the historical vocabulary or the entire entity vocabulary, but ignores higher-order semantic dependencies between co-occurring entities; CENET [11] proposes a historical contrastive learning approach that exploits temporal reasoning patterns mined from historical versions of the knowledge graph to make more accurate predictions.

2.2.2. Entire Graph-Based Models

These methods use a fixed-length history of recent KG snapshots. Glean [26] introduces unstructured event descriptions to enrich entity representations, but such descriptions are not available for all events in real applications; TANGO [13] extends neural ordinary differential equations to multi-relational graph convolutional networks to model structural information, but its utilization of long-distance information is still limited; RE-GCN [12] and CEN [7] both model the evolution of the representations of all entities by capturing fixed-length histories, but neither considers the evolutionary features of historical information from different perspectives; TiRGN [14] introduces a recurrent graph network that incorporates local and global historical patterns in temporal knowledge graphs to improve the accuracy of link prediction over time.
Some recent works have incorporated both local and global modeling for TKG reasoning. However, MSEN proposes new designs for each module and their integration, achieving better results. The hierarchical design in the local module enriches semantic learning, while the periodic and aperiodic temporal modeling in the global module improves temporal encoding.

3. The Proposed Method

In this section, the paper introduces Temporal Knowledge Graph (TKG) reasoning under the extrapolation setting and the proposed method. Table 1 lists the important symbols used in the paper and their explanations.

3.1. Problem Formulation

A TKG $G = \{ G_0, G_1, G_2, \ldots, G_n \}$ is a sequence of KGs. Each KG $G_t = \{ E, R, \Gamma_t \}$ is a directed multi-relational graph, where $E$ is the set of entities, $R$ is the set of relations, and $\Gamma_t$ is the set of facts at timestamp $t$. A fact in $\Gamma_t$ can be expressed as a quadruple $(s, r, o, t)$, where $s, o \in E$ and $r \in R$. The goal of the TKG entity reasoning task is to complete tail entity prediction $(s, r, ?, t_m)$ or head entity prediction $(?, r, o, t_m)$ based on the historical knowledge graph sequence $\{ G_0, G_1, G_2, \ldots, G_{t_m - 1} \}$. For each quadruple $(s, r, o, t)$, an inverse quadruple $(o, r^{-1}, s, t)$ is added to the KG; therefore, head entity prediction $(?, r, o, t)$ can be transformed into tail entity prediction $(o, r^{-1}, ?, t)$.
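For concreteness, the following is a minimal Python sketch of this data model; the function names and toy fact tuples are illustrative assumptions, not part of any released implementation.

```python
# A minimal sketch of the TKG data model described above (illustrative only).
from collections import defaultdict

def add_inverse_quadruples(quads, num_relations):
    """Augment each fact (s, r, o, t) with an inverse fact (o, r^{-1}, s, t),
    encoding r^{-1} as r + num_relations, so that head-entity prediction
    (?, r, o, t) reduces to tail-entity prediction."""
    return quads + [(o, r + num_relations, s, t) for (s, r, o, t) in quads]

def group_by_timestamp(quads):
    """Split a list of facts into per-timestamp snapshots G_t."""
    snapshots = defaultdict(list)
    for s, r, o, t in quads:
        snapshots[t].append((s, r, o))
    return dict(sorted(snapshots.items()))

# Usage: two toy facts at t = 0 and t = 1
quads = [(0, 2, 1, 0), (1, 3, 2, 1)]
snapshots = group_by_timestamp(add_inverse_quadruples(quads, num_relations=4))
```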

3.2. Model Overview

In this section, a Multi-Scale Evolutionary Network (MSEN) is proposed to solve the problem of feature extraction at different scales. The overall framework of MSEN is illustrated in Figure 1. The model has three parts: (1) a local memory encoder that applies a hierarchical transfer-aware graph neural network (HT-GNN) to obtain structural features within each timestamp's knowledge graph snapshot, and performs temporal encoding of entities and relations using a gated recurrent unit (GRU); (2) a global memory encoder that employs a time-related graph neural network (TR-GNN) to mine temporal-semantic dependencies across the historical knowledge graph, yielding entity and relation embeddings; (3) a decoder that integrates the local and global encoder outputs and uses a scoring function to predict future entities.
Specifically, the input is a series of temporal knowledge graph snapshots. The local encoder extracts per-timestamp features using HT-GNN and temporal dynamics via the GRU. The global encoder identifies long-range patterns using TR-GNN. Finally, the decoder combines local and global embeddings to predict future entities using the scoring function. This multi-scale approach allows MSEN to capture both contextual knowledge and global evolutions for enhanced temporal reasoning.

3.3. Local Memory Encoder

The local memory encoder focuses on obtaining the memories of the m historical knowledge graphs adjacent to the query time. It aggregates each knowledge graph snapshot to capture structural dependencies within timestamps. Additionally, sequence neural networks are utilized to learn temporal features of entities and relations, enabling the model to better represent temporal dynamics across knowledge graphs.

3.3.1. HT-GNN

Obtaining informative embedding representations for each temporal knowledge graph is critical for effective local encoding. To better capture semantic information, we propose a novel graph neural network-based knowledge graph embedding method called Hierarchical Transfer-Aware Graph Neural Network (HT-GNN). As illustrated in Figure 2, HT-GNN performs sequential aggregation at each layer: entity aggregation, relation aggregation, convolution aggregation, and sum aggregation. The blue nodes serve as aggregation centers. Different aggregation results are merged as input to the next layer. Finally, the outputs from the multi-layer aggregations are combined as the overall model output.
Specifically, entity aggregation allows nodes to incorporate semantics from adjacent entities based on co-occurrence, capturing semantic dependencies. The entity aggregation function and weight calculation are:
$$ h_{o,t}^{ent,l} = \sigma \Big( \sum_{(s,r) \in N_t} \alpha_{ent}^{l} \, w_{ent}^{l} \, h_{s,t}^{l} \Big), $$

where $h_{o,t}^{ent,l}$ is the embedding representation of entity $o$ after entity aggregation; $h_{s,t}^{l}$ is the embedding representation of a neighboring entity $s$ of entity $o$; $w_{ent}^{l}$ is a learnable weight matrix; $N_t$ denotes the neighboring entity-relation pairs of entity $o$; and $\sigma$ is a nonlinear activation function. The aggregation weight $\alpha_{ent}^{l}$ is calculated as follows:

$$ \alpha_{ent}^{l} = \frac{\exp \big( h( L( h_{s,t}^{l} \oplus h_{o,t}^{l} )) \big)}{\sum_{(s,r) \in N_t} \exp \big( h( L( h_{s,t}^{l} \oplus h_{o,t}^{l} )) \big)}, $$

where $h(\cdot)$ is the LeakyReLU activation function, $L(\cdot)$ is a fully connected layer, and $\oplus$ denotes concatenation.
The result of entity aggregation is used as input to relation aggregation. The relation aggregation infuses relation semantics into the entity representations, thereby obtaining the co-occurrence of each node and relation in the knowledge graph. The relation aggregation function and corresponding weight calculation method used in the paper are as follows:
$$ h_{o,t}^{rel,l} = \sigma \Big( \sum_{(s,r) \in N_t} \alpha_{rel}^{l} \, w_{rel}^{l} \, h_{r,t}^{l} \Big), $$

where $h_{r,t}^{l}$ is the embedding representation of the neighboring relation $r$ of entity $o$ and $w_{rel}^{l}$ is a learnable weight matrix. The relation aggregation weight $\alpha_{rel}^{l}$ is calculated as follows:

$$ \alpha_{rel}^{l} = \frac{\exp \big( h( L( h_{r,t}^{l} \oplus h_{o,t}^{l} )) \big)}{\sum_{(s,r) \in N_t} \exp \big( h( L( h_{r,t}^{l} \oplus h_{o,t}^{l} )) \big)}. $$
The result of relation aggregation is used as input to convolution aggregation. Convolution aggregation captures the hidden relationship among entity $s$, relation $r$, and entity $o$, which enriches the semantic information of entities. The convolution aggregation function and weight calculation are as follows:

$$ h_{o,t}^{con,l} = \sigma \Big( \sum_{(s,r) \in N_t} \alpha_{con}^{l} \, w_{con}^{l} \, con(s, r, o) \Big), $$

where $con(\cdot)$ is the convolution operation and $w_{con}^{l}$ is a learnable weight matrix. The convolution aggregation weight $\alpha_{con}^{l}$ is calculated as follows:

$$ \alpha_{con}^{l} = \frac{\exp \big( h( L( h_{s,t}^{l} \oplus h_{r,t}^{l} \oplus h_{o,t}^{l} )) \big)}{\sum_{(s,r) \in N_t} \exp \big( h( L( h_{s,t}^{l} \oplus h_{r,t}^{l} \oplus h_{o,t}^{l} )) \big)}. $$
The result of convolution aggregation is used as input to sum aggregation. Sum aggregation gathers information along relational paths between entities; the information obtained by summation is analogous to the translation model TransE [27]. The sum aggregation function and corresponding weight calculation are as follows:

$$ h_{o,t}^{sum,l} = \sigma \Big( \sum_{(s,r) \in N_t} \alpha_{sum}^{l} \, w_{sum}^{l} \, ( h_{s,t}^{l} + h_{r,t}^{l} ) \Big), $$

where $h_{s,t}^{l}$ and $h_{r,t}^{l}$ are the embedding representations of the neighboring entity $s$ and relation $r$ of entity $o$, and $w_{sum}^{l}$ is a learnable weight matrix. The sum aggregation weight $\alpha_{sum}^{l}$ is calculated as follows:

$$ \alpha_{sum}^{l} = \frac{\exp \big( h( L( ( h_{s,t}^{l} + h_{r,t}^{l} ) \oplus h_{o,t}^{l} )) \big)}{\sum_{(s,r) \in N_t} \exp \big( h( L( ( h_{s,t}^{l} + h_{r,t}^{l} ) \oplus h_{o,t}^{l} )) \big)}. $$
Finally, the entity embedding output by layer $l$ is the sum of the four aggregation results:

$$ h_{o,t}^{l+1} = h_{o,t}^{ent,l} + h_{o,t}^{rel,l} + h_{o,t}^{con,l} + h_{o,t}^{sum,l}. $$
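To make the four aggregations concrete, the following PyTorch sketch implements one HT-GNN layer under stated assumptions: edges are given as (s, r, o) index triples for a single snapshot, the per-neighborhood softmax is written as an explicit loop for clarity, and the convolution $con(s, r, o)$ is realized as a 1D convolution over the stacked triple. This is an illustrative reading of the equations above, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HTGNNLayer(nn.Module):
    """Sketch of one HT-GNN layer: four attention-weighted aggregations
    (entity, relation, convolution, sum) summed into h_o^{l+1}."""
    def __init__(self, dim):
        super().__init__()
        self.w_ent = nn.Linear(dim, dim, bias=False)
        self.w_rel = nn.Linear(dim, dim, bias=False)
        self.w_con = nn.Linear(dim, dim, bias=False)
        self.w_sum = nn.Linear(dim, dim, bias=False)
        # L(.): fully connected scoring layers used inside the softmax weights
        self.a_ent = nn.Linear(2 * dim, 1)
        self.a_rel = nn.Linear(2 * dim, 1)
        self.a_con = nn.Linear(3 * dim, 1)
        self.a_sum = nn.Linear(2 * dim, 1)
        self.conv = nn.Conv1d(3, 1, kernel_size=1)  # con(s, r, o): fuse the triple

    def attend(self, score_layer, feats, dst, num_nodes):
        # alpha: softmax of LeakyReLU-scored features over each destination's neighborhood
        logits = F.leaky_relu(score_layer(feats)).squeeze(-1)
        alpha = torch.zeros_like(logits)
        for n in range(num_nodes):  # explicit per-neighborhood softmax, for clarity
            mask = dst == n
            if mask.any():
                alpha[mask] = F.softmax(logits[mask], dim=0)
        return alpha.unsqueeze(-1)

    def forward(self, h_ent, h_rel, edges):
        # edges: LongTensor [E, 3] of (s, r, o) indices within one snapshot
        s, r, o = edges[:, 0], edges[:, 1], edges[:, 2]
        hs, hr, ho = h_ent[s], h_rel[r], h_ent[o]
        n = h_ent.size(0)
        msg = torch.zeros_like(h_ent)

        a = self.attend(self.a_ent, torch.cat([hs, ho], -1), o, n)
        msg = msg.index_add(0, o, a * self.w_ent(hs))           # entity aggregation
        a = self.attend(self.a_rel, torch.cat([hr, ho], -1), o, n)
        msg = msg.index_add(0, o, a * self.w_rel(hr))           # relation aggregation
        con = self.conv(torch.stack([hs, hr, ho], 1)).squeeze(1)
        a = self.attend(self.a_con, torch.cat([hs, hr, ho], -1), o, n)
        msg = msg.index_add(0, o, a * self.w_con(con))          # convolution aggregation
        a = self.attend(self.a_sum, torch.cat([hs + hr, ho], -1), o, n)
        msg = msg.index_add(0, o, a * self.w_sum(hs + hr))      # sum aggregation
        return torch.relu(msg)                                  # h^{l+1}
```

In practice, the per-neighborhood softmax would use a scatter-based implementation for efficiency; the loop form above is chosen only to mirror the equations directly.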

3.3.2. Local Memory Representation

In order to obtain the temporal features of entities and relations in adjacent historical knowledge graphs, following RE-GCN [12], GRU is adopted to update the representations of entities and relations. The update of entity embedding representations is as follows:
$$ H_{t+1} = \mathrm{GRU}\big( H_t, H_t^{HT\text{-}GNN} \big), $$

where $H_{t+1}$ is the entity embedding matrix at time $t+1$ and $H_t^{HT\text{-}GNN}$ is the entity embedding matrix obtained after aggregating the knowledge graph at time $t$ with HT-GNN.
Similarly, the temporal embedding representations of relations are computed as:

$$ R_{t+1} = \mathrm{GRU}\big( R_t, R' \big), \qquad R' = \mathrm{pooling}( H_t ), $$

where $R_{t+1}$ is the relation embedding matrix at time $t+1$ and $\mathrm{pooling}(\cdot)$ is a mean pooling operation that collects the embeddings of all entities related to relation $r$ and averages them.
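A brief sketch of this local memory update may help; note that the relation input $R'$ is simplified here to a global mean over all entity embeddings (an assumption — the paper pools only the entities related to each relation).

```python
import torch
import torch.nn as nn

class LocalMemory(nn.Module):
    """Sketch of the GRU-based local memory update: one GRU cell evolves the
    entity embedding matrix, another evolves the relation embedding matrix."""
    def __init__(self, dim):
        super().__init__()
        self.ent_gru = nn.GRUCell(dim, dim)
        self.rel_gru = nn.GRUCell(dim, dim)

    def forward(self, H_t, H_htgnn, R_t):
        # H_{t+1} = GRU(H_t, H_t^{HT-GNN})
        H_next = self.ent_gru(H_htgnn, H_t)
        # R' = pooling(H_t): simplified here to a global mean over all entities
        R_prime = H_t.mean(dim=0, keepdim=True).expand_as(R_t).contiguous()
        # R_{t+1} = GRU(R_t, R')
        R_next = self.rel_gru(R_prime, R_t)
        return H_next, R_next
```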

3.4. Global Memory Encoder

To extract cross-timestamp entity dependencies, we construct a global knowledge graph by associating historical snapshots containing the same entities [28]. We introduce a Time-Related Graph Neural Network (TR-GNN) to effectively encode this global graph.
The global graph aggregates facts across timestamps, including periodic and non-periodic occurrences. To fully utilize this temporal information, we design separate periodic and aperiodic time vector representations:
$$ T_p = f( w_1 t + \varphi_1 ), $$
$$ T_{ap} = w_2 t + \varphi_2, $$

where $T_p$ is the periodic time vector and $T_{ap}$ is the non-periodic time vector; $w_1$, $w_2$ and $\varphi_1$, $\varphi_2$ are learnable vectors representing frequency and phase, respectively; $f(\cdot)$ is a periodic activation function, instantiated as the sine function in this paper; and $t$ is the time interval $T_i - T_j$ between two connected entities in the global knowledge graph.
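The two time vectors are straightforward to implement; a minimal PyTorch sketch follows, where the initialization of the learnable frequency and phase vectors is an assumption.

```python
import torch
import torch.nn as nn

class TimeEncoding(nn.Module):
    """Sketch of the periodic/non-periodic time vectors:
    T_p = sin(w1 * t + phi1) and T_ap = w2 * t + phi2."""
    def __init__(self, dim):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(dim))    # frequency (periodic)
        self.phi1 = nn.Parameter(torch.zeros(dim))  # phase (periodic)
        self.w2 = nn.Parameter(torch.randn(dim))    # linear coefficient (non-periodic)
        self.phi2 = nn.Parameter(torch.zeros(dim))  # offset (non-periodic)

    def forward(self, t):
        # t: tensor of time intervals T_i - T_j, one per edge; broadcast over dim
        t = t.float().unsqueeze(-1)
        T_p = torch.sin(self.w1 * t + self.phi1)    # periodic time vector
        T_ap = self.w2 * t + self.phi2              # non-periodic time vector
        return T_p, T_ap
```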
To obtain the time-semantic dependencies between entities in the global knowledge graph, the paper designs TR-GNN to encode the knowledge graph:
$$ h_{o,t}^{l+1} = \sigma \Big( \sum_{(s,r) \in N_t} \big( \alpha_1^{l} h_{s,t}^{l} + \alpha_2^{l} h_{r,t}^{l} \big) + w_2 h_{o,t}^{l} \Big), $$
$$ h_{r,t}^{l} = h_{r,t}^{l} + T_p, $$

where $h_{o,t}^{l+1}$ is the encoded entity embedding and $w_2$ is a learnable weight parameter. The weights $\alpha_1^{l}$ and $\alpha_2^{l}$ for entities and relations, respectively, are calculated as follows:

$$ \alpha_1^{l} = \frac{\exp \big( h( L( h_{s,t}^{l} \oplus h_{o,t}^{l} \oplus T_{ap} )) \big)}{\sum_{(s,r) \in N_t} \exp \big( h( L( h_{s,t}^{l} \oplus h_{o,t}^{l} \oplus T_{ap} )) \big)}, $$
$$ \alpha_2^{l} = \frac{\exp \big( h( L( h_{r,t}^{l} \oplus h_{o,t}^{l} \oplus T_{ap} )) \big)}{\sum_{(s,r) \in N_t} \exp \big( h( L( h_{r,t}^{l} \oplus h_{o,t}^{l} \oplus T_{ap} )) \big)}. $$
Using the non-periodic time vector to compute weights for entities and relations better captures the temporal-semantic relationships between entities. Finally, the updated entity embeddings are nonlinearly transformed to obtain the embedding representations of each entity in the global knowledge graph.
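The TR-GNN message passing can then be sketched as follows in PyTorch; as with the HT-GNN sketch above, the edge format and the explicit per-neighborhood softmax loop are simplifying assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TRGNNLayer(nn.Module):
    """Sketch of one TR-GNN layer: relation embeddings are shifted by the
    periodic vector T_p, and the attention weights over entities and relations
    are conditioned on the non-periodic vector T_ap."""
    def __init__(self, dim):
        super().__init__()
        self.a1 = nn.Linear(3 * dim, 1)          # scores [h_s || h_o || T_ap]
        self.a2 = nn.Linear(3 * dim, 1)          # scores [h_r || h_o || T_ap]
        self.w_self = nn.Linear(dim, dim, bias=False)

    def softmax_per_dst(self, logits, dst, num_nodes):
        out = torch.zeros_like(logits)
        for n in range(num_nodes):               # per-neighborhood softmax
            mask = dst == n
            if mask.any():
                out[mask] = F.softmax(logits[mask], dim=0)
        return out.unsqueeze(-1)

    def forward(self, h_ent, h_rel, edges, T_p, T_ap):
        # edges: LongTensor [E, 3] of (s, r, o); T_p, T_ap: [E, dim], one row per edge
        s, r, o = edges[:, 0], edges[:, 1], edges[:, 2]
        hs, ho = h_ent[s], h_ent[o]
        hr = h_rel[r] + T_p                      # inject periodic time into relations
        n = h_ent.size(0)
        a1 = self.softmax_per_dst(
            F.leaky_relu(self.a1(torch.cat([hs, ho, T_ap], -1))).squeeze(-1), o, n)
        a2 = self.softmax_per_dst(
            F.leaky_relu(self.a2(torch.cat([hr, ho, T_ap], -1))).squeeze(-1), o, n)
        msg = torch.zeros_like(h_ent).index_add(0, o, a1 * hs + a2 * hr)
        return torch.relu(msg + self.w_self(h_ent))  # neighborhood message plus self-loop
```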

3.5. Gating Integration

In order to better integrate the embedding vectors produced by the local memory encoder and the global memory encoder for reasoning, the paper applies a learnable gating function [29] to adaptively adjust the weights of local and global memory embeddings, finally obtaining the fused vectors:

$$ h_{o,t} = \lambda(g) \odot h_{o,t}^{global} + \big( 1 - \lambda(g) \big) \odot h_{o,t}^{local}, $$
$$ h_{r,t} = \lambda(g) \odot h_{r,t}^{global} + \big( 1 - \lambda(g) \big) \odot h_{r,t}^{local}, $$

where $\lambda(\cdot)$ is the sigmoid function, which restricts each element to the range [0, 1]; $g$ is the gate parameter that adjusts the weights; and $\odot$ is element-wise multiplication.
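Because the gate is a single learnable vector squashed by a sigmoid, the integration takes only a few lines; the sketch below assumes a zero-initialized gate, so local and global embeddings start equally weighted.

```python
import torch
import torch.nn as nn

class GateIntegration(nn.Module):
    """Sketch of the gating integration: a learnable gate g, squashed by a
    sigmoid, mixes global and local memory embeddings element-wise."""
    def __init__(self, dim):
        super().__init__()
        self.g = nn.Parameter(torch.zeros(dim))  # sigmoid(0) = 0.5: equal mix at init

    def forward(self, h_global, h_local):
        lam = torch.sigmoid(self.g)              # each element restricted to [0, 1]
        return lam * h_global + (1.0 - lam) * h_local
```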

3.6. Scoring Function and Loss Function

In this section, the paper introduces the details of the scoring function and the loss function. To obtain the probability of an event occurring at future timestamp t + 1, ConvTransE [12] is utilized as the decoder to calculate the probability that a fact holds between entities at timestamp t + 1 under relation r:
$$ P_{t+1}(o \mid s, r) = \lambda \big( h_{o,t+1} \cdot \mathrm{ConvTransE}( h_{s,t+1}, h_{r,t+1} ) \big), $$

where $\lambda(\cdot)$ is the sigmoid function; $h_{s,t+1}$ and $h_{r,t+1}$ are the embedding representations of entities and relations at timestamp $t+1$; and $h_{o,t+1}$ is the embedding representation of entities after global-local fusion.
Following RE-GCN [12], the loss function during training is defined as:
$$ L_e = - \sum_{t=0}^{T} \sum_{(s,r,o,t+1) \in G_{t+1}} \log P_{t+1}(o \mid s, r), $$
$$ L_r = - \sum_{t=0}^{T} \sum_{(s,r,o,t+1) \in G_{t+1}} \log P_{t+1}(r \mid s, o). $$

The final loss is $L = \lambda L_e + (1 - \lambda) L_r + \mu \lVert \Theta \rVert_2$, where $L_e$ is the loss for predicting entities, $L_r$ is the loss for predicting relations, and $\lambda$ is the hyperparameter balancing the two tasks; $\lVert \Theta \rVert_2$ is the $L_2$ norm of the model parameters, and $\mu$ controls the penalty term. The overall training algorithm is shown in Algorithm 1.
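As a hedged illustration, the combined objective can be written as below; the decoder logits, target indices, and the value of mu are assumptions made for the sketch, not values reported by the paper.

```python
import torch.nn.functional as F

def msen_loss(entity_logits, entity_targets, rel_logits, rel_targets,
              params, lam=0.7, mu=1e-5):
    """Sketch of the training objective: cross-entropy terms for entity and
    relation prediction balanced by lambda, plus an L2 penalty weighted by mu."""
    L_e = F.cross_entropy(entity_logits, entity_targets)  # -log P(o | s, r)
    L_r = F.cross_entropy(rel_logits, rel_targets)        # -log P(r | s, o)
    l2 = sum(p.pow(2).sum() for p in params)              # ||Theta|| penalty term
    return lam * L_e + (1.0 - lam) * L_r + mu * l2
```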
Algorithm 1. Training of MSEN
Input: the training set $\{ G_0, G_1, G_2, \ldots, G_{t_m - 1} \}$; the number of epochs $n$.
Output: the minimum loss on the training set.
1: Initialize the model parameters for $E$, $R$, $\Gamma_t$; $loss_{min} = 0$
2: for each $t \in [1, T]$ do
3:   /* Local Memory Encoder */
4:   $h_{o,t}^{l+1} = h_{o,t}^{ent,l} + h_{o,t}^{rel,l} + h_{o,t}^{con,l} + h_{o,t}^{sum,l}$  // obtain semantic information using HT-GNN
5:   $H_{t+1} = \mathrm{GRU}( H_t, H_t^{HT\text{-}GNN} )$  // local memory representation
6:   /* Global Memory Encoder */
7:   Construct the global graph $G$
8:   $h_{o,t}^{l+1} = \sigma \big( \sum_{(s,r) \in N_t} ( \alpha_1^{l} h_{s,t}^{l} + \alpha_2^{l} h_{r,t}^{l} ) + w_2 h_{o,t}^{l} \big)$  // obtain time-semantic dependencies using TR-GNN
9:   $h_{o,t} = \lambda(g) \odot h_{o,t}^{global} + (1 - \lambda(g)) \odot h_{o,t}^{local}$  // gating integration
10:  $P_{t+1}(o \mid s, r) \leftarrow \mathrm{ConvTransE}(\cdot)$
11:  Loss $\leftarrow$ the weighted sum of $L_e$ and $L_r$
12:  Update parameters
13: end for
14: Return Loss

4. Experiments

In this section, the paper compares MSEN with several baselines on two datasets. Meanwhile, ablation experiments analyze the ability to acquire semantic information at different scales, and comparison experiments evaluate the effects of different GCNs. In addition, the paper studies the parameter settings of the model.

4.1. Datasets

To evaluate the MSEN model, experiments are conducted on two typical event-based datasets: ICEWS14 [17] and ICEWS18 [6]. Both are derived from the Integrated Crisis Early Warning System event dataset [30], with a time interval of 24 h. The statistical details of the datasets are listed in Table 2.

4.2. Baselines

The paper compares MSEN with two types of methods: static KG reasoning and TKG reasoning methods.
For static KG methods: DistMult [1], ComplEx [2], RGCN [3], ConvE [4], and RotatE [5] are selected.
For TKG reasoning under the extrapolation setting: CyGNet [8], RE-NET [6], xERTE [9], RE-GCN [12], TITer [10], CENET [11], and TiRGN [14] are selected.

4.3. Evaluation Metrics

This paper adopts mean reciprocal rank (MRR) and Hit@k (k = 1 and 10) to evaluate the performance of models on entity prediction. For fairness of comparison, the ground truth is used when performing multi-step reasoning, and the experimental results are reported under the time-aware filtered setting. MRR is the mean of the reciprocal ranks of the ground-truth entities over all test queries. Hit@k is the proportion of test queries for which the true target entity appears among the top k predicted candidates, reflecting the model's ability to rank the target entity within the top k.
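Given the filtered rank of the ground-truth entity for each test query, both metrics reduce to a few lines; the following sketch is purely illustrative.

```python
import numpy as np

def mrr_and_hits(ranks, ks=(1, 10)):
    """Compute MRR and Hit@k from the filtered ranks of the true entities."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))                 # mean reciprocal rank
    hits = {k: float(np.mean(ranks <= k)) for k in ks}  # fraction ranked in top k
    return mrr, hits

# Usage: three queries whose true entities ranked 1st, 4th, and 12th
mrr, hits = mrr_and_hits([1, 4, 12])  # MRR ≈ 0.444, Hit@1 ≈ 0.333, Hit@10 ≈ 0.667
```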

4.4. Implementation Details

In the experiments, the embedding dimension was fixed at 200 for all methods. The Adam optimizer was used for parameter training with a learning rate of 0.001. The hyperparameter balancing the two tasks was set to 0.7. To ensure fairness of comparison with other models, the number of adjacent historical knowledge graphs used by the local memory encoder was 3 and 6 on the ICEWS14 and ICEWS18 datasets, respectively; for the global memory encoder, it was 10 on both datasets. The number of layers in HT-GNN was 2 on both datasets; the number of layers in TR-GNN was 2 and 3 on ICEWS14 and ICEWS18, respectively. For the ConvTransE decoder, the number of channels was 50 and the kernel size was 2 × 3.

5. Results

5.1. Performance Comparison

As shown in Table 3 and Table 4, entity prediction experiments are conducted on facts at future timestamps. The tables list the predictive performance of MSEN and the baseline models on ICEWS14 and ICEWS18 datasets. The results of baseline models in the table are from [15].
In the experimental results, the MSEN model outperforms other baseline models. The results show the effectiveness of the MSEN model in predicting entities at future timestamps. Compared with TiRGN*, MSEN improves the MRR by 3.5 and 1.83 percentage points on ICEWS14 and ICEWS18, respectively.
Probing deeper, we compared the ranking quality against TiRGN* on ICEWS14. At Hit@1, MSEN achieves 35.29% versus 33.12% for TiRGN*; at Hit@10, MSEN reaches 68.96% versus 62.24% for TiRGN*. The gains at both cutoffs indicate that the improvement in MRR comes from consistently better rankings rather than from a small number of fortunate top-1 predictions.

5.2. Ablation Study

The encoder of the MSEN model is mainly composed of two submodules: Local Memory Encoder (LME) and Global Memory Encoder (GME). To further analyze the effect of each module on the final entity prediction results, the paper reports the results of LME (HT-GNN) and GME (TR-GNN) on the ICEWS14 dataset using MRR and Hit@k (k = 1, 3, 10) metrics. LME (HT-GNN) represents removing the GME module from the MSEN model; GME (TR-GNN) represents removing the LME module from the MSEN model. As can be seen from Table 5, the performance of LME (HT-GNN) is worse than MSEN, because this part only extracts local features and the obtained embedding representations have less semantic information. The performance of GME (TR-GNN) has greatly improved compared to LME (HT-GNN), but integrating both using a gating function can further improve the accuracy of entity prediction.
To study the effects of HT-GNN and TR-GNN in LME and GME, respectively, the paper compares these two models with different types of GNN-based models on the ICEWS14 dataset using MRR and Hit@k (k = 1, 3, 10) metrics. In LME, RGCN [3], KBGAT [31], and CompGCN (add) [32] are used to replace the HT-GNN model; in GME, RGCN [3], KBGAT [31], CompGCN (add) [32], and HRGNN [28] are used to replace the TR-GNN model. The experimental results in Table 5 show that the MRR of HT-GNN is 0.83 percentage points higher than CompGCN (add); in GME, the MRR of TR-GNN is 0.31 percentage points higher than HRGNN. Overall, both can effectively improve the accuracy of entity prediction in LME and GME respectively. The improvement of HT-GNN may be due to the hierarchical transfer method that enriches the semantic information of entities in the final knowledge graph. The improvement of TR-GNN may be due to its full utilization of both periodic and non-periodic time in the global knowledge graph.

5.3. Sensitivity Analysis

The paper further studied the effect of the number of layers in TR-GNN and HT-GNN on entity prediction performance for the GME and LME modules using MRR and Hit@k (k = 1, 3, 10). Figure 3 shows the experimental results with different numbers of layers on the ICEWS14 dataset. With just one layer, the TR-GNN module can already effectively extract temporal-semantic information between entities; as the number of layers increases, the extracted features may interfere with the entity embeddings. The HT-GNN model extracts sufficient semantic information with two layers, thus achieving higher entity prediction accuracy. These experiments show that the number of layers in TR-GNN and HT-GNN affects the extraction of semantic information from the knowledge graph and thus influences entity prediction.

5.4. Training Process Analysis

This paper studied the evolution of the loss and evaluation metrics of the MSEN model during training. Figure 4 shows how the results change with increasing epoch on the ICEWS14 validation set. As the number of training epochs increases, the loss first gradually decreases, reaching its lowest point at the 10th epoch, and then slowly increases again; the evaluation metrics MRR, Hit@1, Hit@3, and Hit@10 gradually increase, reaching a plateau after the 10th epoch. The results indicate that after 10 epochs of training on ICEWS14, the model starts overfitting, as evidenced by the increasing loss and stabilizing evaluation metrics. Therefore, 10 epochs is an appropriate training budget for MSEN on the ICEWS14 dataset, achieving good performance without overfitting.

5.5. Statistical Analysis

In order to demonstrate the reliability of the results, both permutation tests and multiple repeated experiments were conducted in this study.
For the permutation test, we first trained the proposed MSEN model on the original training set and recorded the model’s performance (MRR) on the test set as the true observation. We then created perturbed training sets by replacing different proportions of head or tail entities in the original training set. The MSEN model was re-trained on these perturbed sets and achieved varying MRRs on the test set. The p-value was calculated using the following formula:
P = M / N
where M is the number of times the MRR obtained using the perturbed datasets was greater than the true observation, and N is the number of experiments conducted with the perturbed datasets.
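The p-value computation itself is a one-liner; below is a minimal sketch with hypothetical MRR values used purely for illustration.

```python
def permutation_p_value(true_mrr, perturbed_mrrs):
    """P = M / N: the fraction of perturbed-training runs whose test MRR
    exceeds the true observation."""
    M = sum(1 for mrr in perturbed_mrrs if mrr > true_mrr)
    return M / len(perturbed_mrrs)

# Usage with hypothetical values: no perturbed run beats the original, so P = 0
p = permutation_p_value(0.4668, [0.41, 0.39, 0.35, 0.30])
```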
As illustrated in Figure 5, the MSEN model achieved lower MRRs when trained on more heavily perturbed sets, with p-values below the 0.05 significance level. This suggests the model has genuinely captured meaningful patterns in the training data.
Additionally, the experiment was repeated three times using the original training set, yielding consistent MRRs of 46.68%, 46.75%, and 46.72%, with an average of 46.72%. This further demonstrates the reliability and reproducibility of the proposed model.

6. Discussion

The experimental results demonstrate that the proposed MSEN model achieves superior performance compared to several baselines on typical benchmark datasets. The better results validate the effectiveness of modeling multi-scale evolutionary information for predicting future events in TKGs.
The ablation studies in Section 5.2 provide insights into the contribution of each module in MSEN. Removing the global memory encoder leads to a significant performance drop, showing the importance of capturing long-distance temporal dependencies. The local memory encoder also contributes steady improvements by aggregating rich semantics within each timestamp. Using HT-GNN and TR-GNN as the graph neural network encoders also leads to better results compared to alternative GNN models, demonstrating their capabilities in encoding structural and temporal patterns from TKGs.
The multi-scale modeling in MSEN aligns with findings from previous works such as TiRGN [14] which show combining local and global patterns is beneficial. However, MSEN proposes new designs for each module, leading to better results. The hierarchical design in HT-GNN enriches semantic learning, while the periodic and aperiodic temporal modeling in TR-GNN improves temporal encoding.

7. Conclusions and Future Work

In this work, we propose a new method, the Multi-Scale Evolutionary Network (MSEN), for temporal knowledge graph (TKG) reasoning under the extrapolation setting. First, a hierarchical transfer-aware graph neural network (HT-GNN) is designed in the local memory encoder to acquire rich semantic information from the local knowledge graph, and sequence neural networks are used to learn the temporal features of entities and relations. Second, in the global memory encoder, a time-related GNN (TR-GNN) is designed to make full use of the temporal information in the global knowledge graph: it captures periodic and aperiodic temporal-semantic dependencies and assigns different weights to entities and relations, improving performance on downstream tasks. Finally, embeddings from different scales are integrated to complete entity prediction. Experimental results on typical benchmarks demonstrate the effectiveness of MSEN in solving TKG reasoning problems.
While MSEN achieves good results, several promising directions remain for future investigation: (1) Model compression: the current MSEN model entails relatively high computational complexity. To enable real-time inference and deployment under resource constraints, model compression techniques such as pruning and knowledge distillation can be explored; simplifying the model while retaining predictive power would enhance its applicability to real-world systems. (2) Robustness to noisy data: historical knowledge graphs often suffer from incomplete, biased, or noisy data, and the robustness of MSEN's predictions relies on the quality of the historical inputs. Advanced data filtering, denoising, and imputation methods can be studied to handle imperfect historical graphs; techniques such as outlier detection and data estimation can improve the model's resilience against noise and missing facts. (3) Advanced GNN architectures: further improving the local and global graph neural network encoders with more sophisticated architectures, such as graph attention networks, is a promising direction that could capture more complex structural, semantic, and temporal patterns.
In the future, we will aim to enhance MSEN along these directions to make broader impacts in real-world applications of temporal knowledge graph reasoning.

Author Contributions

Conceptualization, S.C. (Shudong Chen) and Y.Y.; methodology, Y.Y. and R.D.; software, Y.Y. and D.T.; validation, Y.Y., H.X. and D.T.; formal analysis, Y.Y.; investigation, H.X.; writing—original draft preparation, Y.Y.; writing—review and editing, S.C. (Shuai Chen). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The ICEWS14 and ICEWS18 datasets used in this study are derived from the Integrated Crisis Early Warning System event dataset, which is available from the ICEWS project website (https://www.andybeger.com/icews/ (accessed on 27 September 2023)). The specific dataset versions utilized can be accessed through the reference links provided (references [6,17]).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yang, B.; Yih, W.T.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  2. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex Embeddings for Simple Link Prediction. In Proceedings of the 33rd International Conference on Machine Learning, New York City, NY, USA, 19–24 June 2016. [Google Scholar]
  3. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional Networks. In Proceedings of the Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, 3–7 June 2018, Proceedings 15; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 593–607. [Google Scholar]
  4. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D Knowledge Graph Embeddings. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA, 2–7 February 2018; pp. 1811–1818. [Google Scholar]
  5. Sun, Z.; Deng, Z.H.; Nie, J.Y.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  6. Jin, W.; Qu, M.; Jin, X.; Ren, X. Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 6669–6683. [Google Scholar]
  7. Li, Z.; Guan, S.; Jin, X.; Peng, W.; Lyu, Y.; Zhu, Y.; Bai, L.; Li, W.; Guo, J.; Cheng, X. Complex Evolutional Pattern Learning for Temporal Knowledge Graph Reasoning. arXiv 2022, arXiv:2203.07782. [Google Scholar] [CrossRef]
  8. Zhu, C.; Chen, M.; Fan, C.; Cheng, G.; Zhang, Y. Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Networks. arXiv 2020, arXiv:2012.08492. [Google Scholar] [CrossRef]
  9. Han, Z.; Chen, P.; Ma, Y.; Tresp, V. Explainable Subgraph Reasoning for Forecasting on Temporal Knowledge Graphs. In Proceedings of the International Conference on Learning Representations, Virtual Event, Austria, 3–7 May 2021. [Google Scholar]
  10. Sun, H.; Zhong, J.; Ma, Y.; Han, Z.; He, K. TimeTraveler: Reinforcement Learning for Temporal Knowledge Graph Forecasting. arXiv 2021, arXiv:2109.04101. [Google Scholar] [CrossRef]
  11. Xu, Y.; Ou, J.; Xu, H.; Fu, L. Temporal knowledge graph reasoning with historical contrastive learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023. [Google Scholar]
  12. Li, Z.; Jin, X.; Li, W.; Guan, S.; Guo, J.; Shen, H.; Wang, Y.; Cheng, X. Temporal Knowledge Graph Reasoning Based on Evolutional Representation Learning. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Montreal, QC, Canada, 11–15 July 2021. [Google Scholar]
  13. Han, Z.; Ding, Z.; Ma, Y.; Gu, Y.; Tresp, V. Learning Neural Ordinary Equations for Forecasting Future Links on Temporal Knowledge Graphs. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021. [Google Scholar]
  14. Li, Y.; Sun, S.; Zhao, J. TiRGN: Time-guided recurrent graph network with local-global historical patterns for temporal knowledge graph reasoning. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23–29 July 2022; pp. 2152–2158. [Google Scholar]
  15. Zhang, M.; Xia, Y.; Liu, Q.; Wu, S.; Wang, L. Learning Latent Relations for Temporal Knowledge Graph Reasoning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023. [Google Scholar]
  16. Leblay, J.; Chekol, M.W. Deriving Validity Time in Knowledge Graph. In Proceedings of the Companion of the Web Conference, Lyon, France, 23–27 April 2018; pp. 1771–1776. [Google Scholar]
  17. García-Durán, A.; Dumančić, S.; Niepert, M. Learning Sequence Encoders for Temporal Knowledge Graph Completion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4816–4821. [Google Scholar]
  18. Dasgupta, S.S.; Ray, S.N.; Talukdar, P. HyTE: Hyperplane-based Temporally aware Knowledge Graph Embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2001–2011. [Google Scholar]
  19. Lacroix, T.; Obozinski, G.; Usunier, N. Tensor Decompositions for Temporal Knowledge Base Completion. arXiv 2020, arXiv:2004.04926. [Google Scholar] [CrossRef]
  20. Goel, R.; Kazemi, S.M.; Brubaker, M.; Poupart, P. Diachronic Embedding for Temporal Knowledge Graph Completion. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, New York, NY, USA, 7–12 February 2020; pp. 3988–3995. [Google Scholar]
  21. Sadeghian, A.; Armandpour, M.; Colas, A.; Wang, D.Z. ChronoR: Rotation Based Temporal Knowledge Graph Embedding. arXiv 2021, arXiv:2103.10379. [Google Scholar] [CrossRef]
  22. Fu, G.; Meng, Z.; Han, Z.; Ding, Z.; Ma, Y.; Schubert, M.; Tresp, V.; Wattenhofer, R. TempCaps: A Capsule Network-based Embedding Model for Temporal Knowledge Graph Completion. In Proceedings of the Sixth Workshop on Structured Prediction for NLP, SPNLP@ACL 2022, Dublin, Ireland, 27 May 2022; pp. 22–31. [Google Scholar]
  23. Linhai, Z.; Deyu, Z. Temporal Knowledge Graph Completion with Approximated Gaussian Process Embedding. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 12–17. [Google Scholar]
  24. Han, Z.; Wang, Y.; Ma, Y.; Günnemann, S.; Tresp, V. Graph Hawkes Network for Reasoning on Temporal Knowledge Graphs. arXiv 2020, arXiv:2003.13432. [Google Scholar] [CrossRef]
  25. Li, Z.; Jin, X.; Guan, S.; Li, W.; Guo, J.; Wang, Y.; Cheng, X. Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs. arXiv 2021, arXiv:2106.00327. [Google Scholar] [CrossRef]
  26. Deng, S.; Rangwala, H.; Ning, Y. Dynamic Knowledge Graph based Multi-Event Forecasting. In Proceedings of the KDD ’20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, Virtual Event, CA, USA, 6–10 July 2020. [Google Scholar] [CrossRef]
  27. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of the Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 2787–2795. [Google Scholar]
  28. Zhang, M.; Xia, Y.; Liu, Q.; Wu, S.; Wang, L. Learning long-and short-term representations for temporal knowledge graph reasoning. In Proceedings of the ACM Web Conference, Austin, TX, USA, 30 April–4 May 2023; pp. 2412–2422. [Google Scholar]
  29. Hu, L.; Yang, T.; Zhang, L.; Zhong, W.; Tang, D.; Shi, C.; Duan, N.; Zhou, M. Compare to The Knowledge: Graph Neural Fake News Detection with External Knowledge. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual, 1–6 August 2021; pp. 754–763. [Google Scholar]
  30. Boschee, E.; Lautenschlager, J.; O’Brien, S.; Shellman, S.; Starz, J.; Ward, M.; Natarajan, D.; Bagozzi, B.; Berger, D.; Carley, K.M.; et al. ICEWS Coded Event Data. Harvard Dataverse 2015. Available online: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/28075 (accessed on 27 September 2023).
  31. Nathani, D.; Chauhan, J.; Sharma, C.; Kaul, M. Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs. arXiv 2019, arXiv:1906.01195. [Google Scholar] [CrossRef]
  32. Vashishth, S.; Sanyal, S.; Nitin, V.; Talukdar, P. Composition-based Multi-Relational Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
Figure 1. Illustration of the proposed MSEN model.
Figure 2. Architecture of the proposed HT-GNN model.
Figure 3. (a,b) are the effects of the number of layers on GME (TR-GNN) and LME (HT-GNN), respectively, in terms of MRR (%), Hit@1 (%), Hit@3 (%), and Hit@10 (%).
Figure 4. (a) is the change in loss as the training epoch increases; (b) is the change in MRR (%), Hit@1 (%), Hit@3 (%), and Hit@10 (%) as the training epoch increases.
Figure 5. The change in MRR (%) as the perturbation ratio increases.
Table 1. Important Symbols.

Symbol | Explanation
$G_n$ | KG at timestamp n
$E$, $R$ | The sets of entities and relations
$\Gamma_t$ | The set of facts at timestamp t
$h_{s,t}$, $h_{r,t}$ | Embedding representations of entity s and relation r at timestamp t
$h_{o,t}^{ent,l}$ | Embedding representation of entity o at the l-th layer
$H_t$, $R_t$ | The entity and relation embedding matrices at time t
$T_p$, $T_{ap}$ | The periodic and non-periodic time vectors
$h_{o,t}^{global}$, $h_{o,t}^{local}$ | Global and local memory embeddings
Table 2. The statistical details of the datasets.

Datasets | Train | Valid | Test | Ent | Rel | Time Gap
ICEWS14 | 74,845 | 8,514 | 7,371 | 6,869 | 230 | 24 h
ICEWS18 | 373,018 | 45,995 | 49,545 | 23,033 | 256 | 24 h
Table 3. Performance on the ICEWS14 dataset in terms of MRR (%), Hit@1 (%), and Hit@10 (%).

Model | MRR | Hit@1 | Hit@10
DistMult | 25.31 | 17.93 | 42.22
ComplEx | 32.33 | 23.21 | 52.37
RGCN | 28.14 | 19.43 | 46.02
ConvE | 30.93 | 21.74 | 50.18
RotatE | 27.53 | 18.60 | 47.62
CyGNet | 37.65 | 27.43 | 57.90
RE-NET | 39.86 | 30.11 | 58.21
xERTE | 40.79 | 32.70 | 57.30
TITer | 41.73 | 32.74 | 58.44
RE-GCN* | 41.99 | 32.93 | 61.92
CENET | 41.30 | 32.58 | 58.22
TiRGN* | 43.18 | 33.12 | 62.24
Our Method | 46.68 | 35.29 | 68.96

The mark * in the table represents removing static information from models for fair comparison [15]. The best results are in bold.
Table 4. Performance on the ICEWS18 dataset in terms of MRR (%), Hit@1 (%), and Hit@10 (%).

Model | MRR | Hit@1 | Hit@10
DistMult | 16.59 | 10.01 | 31.69
ComplEx | 18.84 | 11.41 | 25.78
RGCN | 18.04 | 8.57 | 35.68
ConvE | 24.28 | 15.61 | 44.59
RotatE | 15.35 | 7.10 | 33.09
CyGNet | 27.12 | 17.21 | 46.85
RE-NET | 29.78 | 19.73 | 48.46
xERTE | 29.31 | 21.03 | 46.48
TITer | 29.98 | 22.05 | 44.83
RE-GCN* | 30.13 | 19.11 | 48.86
CENET | 29.65 | 19.98 | 48.23
TiRGN* | 32.22 | 22.24 | 51.88
Our Method | 34.05 | 22.83 | 56.80

The mark * in the table represents removing static information from models for fair comparison [15]. The best results are in bold.
Table 5. Performance of GNN-based models in LME and GME on the ICEWS14 dataset in terms of MRR (%), Hit@1 (%), Hit@3 (%), and Hit@10 (%).

Model | MRR | Hit@1 | Hit@3 | Hit@10
LME (RGCN) | 38.69 | 29.34 | 43.11 | 56.68
LME (KBGAT) | 38.74 | 29.50 | 43.25 | 56.26
LME (CompGCN(add)) | 39.08 | 29.78 | 43.75 | 56.77
LME (HT-GNN) | 39.91 | 30.45 | 44.48 | 57.91
GME (RGCN) | 43.84 | 31.29 | 49.90 | 69.06
GME (KBGAT) | 43.41 | 31.53 | 48.83 | 67.08
GME (CompGCN(add)) | 43.72 | 31.11 | 49.86 | 68.98
GME (HRGNN) | 44.93 | 33.02 | 50.58 | 68.97
GME (TR-GNN) | 45.24 | 33.36 | 50.88 | 69.22

Citation: Yu, Y.; Chen, S.; Du, R.; Tong, D.; Xu, H.; Chen, S. MSEN: A Multi-Scale Evolutionary Network for Modeling the Evolution of Temporal Knowledge Graphs. Future Internet 2023, 15, 327. https://doi.org/10.3390/fi15100327
