1. Introduction
Temporal knowledge graphs (TKGs) are dynamic knowledge structures in which facts may change over time. Compared with traditional static knowledge graphs, TKGs take the time dimension into account. A TKG with multiple timestamps can be represented as a series of knowledge graph (KG) snapshots, where each snapshot contains all facts within the same timestamp.
For static knowledge graph completion, methods such as DistMult [1], ComplEx [2], RGCN [3], ConvE [4], and RotatE [5] have been proposed. TKG reasoning performs temporal reasoning and evolution analysis of knowledge and is mainly divided into interpolation and extrapolation [6]. Interpolation completes missing events based on existing facts; extrapolation predicts future events based on past events. Predicting future events from the evolution of historical knowledge graphs is important and challenging: obtaining likely future events in a timely manner from temporal relations can support applications such as financial risk control and disaster relief.
This paper focuses on extrapolation research. The challenge of this task is how to better obtain relevant historical information that reflects future behaviors. Currently, two types of methods are used for TKG extrapolation: query-specific and entire graph-based methods [7]. The former include RE-NET [6], CyGNet [8], xERTE [9], TITer [10], CENET [11], etc., which use high-frequency historical facts related to a query's entities and relations to predict future trends, ignoring structural dependencies within snapshots; the latter include RE-GCN [12], CEN [7], TANGO [13], TiRGN [14], L2TKG [15], etc., which take historical knowledge graphs as input to capture the evolutionary patterns of the TKG.
Neither type of method considers historical features from different perspectives, so potentially useful information is ignored. Relying solely on either local or global knowledge graphs may cause information loss and hurt prediction results, since future events may depend on both local structural dependencies and global temporal patterns, and historical knowledge from different perspectives likely influences future occurrences to differing degrees.
In this work, we propose a model that captures multi-scale evolutionary features of historical information, called the Multi-Scale Evolutionary Network (MSEN). Typically, different features manifest at different scales, providing diverse semantic information: the local memory focuses on contextual knowledge, while the global memory recalls long-range patterns. By integrating both scales, MSEN can better model TKG evolution for reasoning.
The main contributions are:
(1) In the local memory encoder, a hierarchical transfer-aware graph neural network (HTGNN) is proposed to enrich the structural semantics within each timestamp;
(2) In the global memory encoder, a time-related graph neural network (TRGNN) is proposed to extract periodic and non-periodic temporal patterns globally across timestamps;
(3) Experiments on typical event-based datasets demonstrate the effectiveness of the proposed model, MSEN.
2. Related Work
In this section, this paper reviews existing methods for TKG reasoning under the interpolation and extrapolation settings and summarizes methods for event prediction.
2.1. TKG Reasoning under the Interpolation Setting
For the interpolation setting, models try to complete missing facts at past timestamps. TTransE [16], TA-TransE [17], and TA-DistMult [17] embed temporal information into scoring functions and relations, respectively, to obtain the evolution information of facts; HyTE [18] proposes a time-aware model based on hyperplanes and projects entities and subgraphs onto specific hyperplanes; TNTComplEx [19] introduces complex numbers to factorize a fourth-order tensor and generate timestamp embeddings; DE-SimplE [20] performs knowledge graph completion by incorporating temporal information into feature representations; ChronoR [21] captures the rich interactions between time and multi-relational features by learning embeddings of entities, relations, and time; TempCaps [22] is a capsule network model that constructs entity embeddings from information retrieved through dynamic routing; TKGC [23] maps entities and relations in the TKG to multivariate Gaussian processes, thereby modeling overall and local trends. However, these models cannot obtain embeddings of future events.
2.2. TKG Reasoning under the Extrapolation Setting
Recently, several studies have attempted to infer future events. Existing methods can be divided into two categories, query-specific and entire graph-based methods [7], both of which use historical information to predict future trends.
2.2.1. QuerySpecific Models
These methods focus on obtaining historical information for a specific query. GHNN [24] predicts future events by modeling the dynamic sequence of evolving graphs; xERTE [9] uses iterative sampling and attention propagation to extract closed subgraphs around specific queries; both CluSTeR [25] and TITer [10] use reinforcement learning to discover query-specific paths in the historical KG; RE-NET [6] effectively extracts sequential and structural information related to a specific query in the TKG to solve entity prediction, but cannot model long-term KG information; CyGNet [8] combines two reasoning modes, copy and generation, making predictions from repeated entities in the historical vocabulary or the entire entity vocabulary, but ignores higher-order semantic dependencies between co-occurring entities; CENET [11] proposes a novel historical contrastive learning approach that uses temporal reasoning patterns mined from historical versions of the knowledge graph to make more accurate predictions.
2.2.2. Entire GraphBased Models
These methods focus on the most recent fixed-length history of KGs. Glean [26] introduces unstructured event descriptions to enrich entity representations, but descriptions are not available for all events in real applications; TANGO [13] extends neural ordinary differential equations to multi-relational graph convolutional networks to model structural information, but its use of long-distance information is still limited; RE-GCN [12] and CEN [7] both model the evolution of all entity representations by capturing fixed-length histories, but neither considers the evolutionary features of historical information from different perspectives; TiRGN [14] introduces a recurrent graph network that incorporates local and global historical patterns in temporal knowledge graphs to improve link prediction over time.
Some recent works have incorporated both local and global modeling for TKG reasoning. However, MSEN proposes new designs for each module and their integration, achieving better results. The hierarchical design in the local module enriches semantic learning, while the periodic and aperiodic temporal modeling in the global module improves temporal encoding.
3. The Proposed Method
In this section, this paper mainly introduces Temporal Knowledge Graph (TKG) Reasoning under the extrapolation setting and the proposed method.
Table 1 shows some of the important symbols of the paper and the corresponding explanations.
3.1. Problem Formulation
A TKG $\mathrm{G}=\{{\mathrm{G}}_{0},{\mathrm{G}}_{1},{\mathrm{G}}_{2},{\mathrm{G}}_{3},\cdots,{\mathrm{G}}_{\mathrm{n}}\}$ is a KG sequence. Each KG ${\mathrm{G}}_{\mathrm{t}}=\{\mathrm{E},\mathrm{R},{\mathsf{\Gamma}}_{\mathrm{t}}\}$ is a directed multi-relational graph, where $\mathrm{E}$ is the set of entities, $\mathrm{R}$ is the set of relations, and ${\mathsf{\Gamma}}_{\mathrm{t}}$ is the set of facts at timestamp t. A fact in ${\mathsf{\Gamma}}_{\mathrm{t}}$ is a quadruple (s,r,o,t), where $\mathrm{s},\mathrm{o}\in \mathrm{E}$ and $\mathrm{r}\in \mathrm{R}$. The goal of the TKG entity reasoning task is tail entity prediction (s,r,?,${\mathrm{t}}_{\mathrm{m}}$) or head entity prediction (?,r,o,${\mathrm{t}}_{\mathrm{m}}$) based on the historical knowledge graph sequence $\{{\mathrm{G}}_{0},{\mathrm{G}}_{1},{\mathrm{G}}_{2},{\mathrm{G}}_{3},\cdots,{\mathrm{G}}_{{\mathrm{t}}_{\mathrm{m}}-1}\}$. For each quadruple (s,r,o,t), an inverse quadruple $(\mathrm{o},{\mathrm{r}}^{-1},\mathrm{s},\mathrm{t})$ is added to the KG. Therefore, head entity prediction (?,r,o,$\mathrm{t}$) can be transformed into tail entity prediction ($\mathrm{o},{\mathrm{r}}^{-1},?,\mathrm{t}$).
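As an illustrative sketch (the function name and the relation-id convention are ours, not taken from the paper's code), the inverse-quadruple augmentation can be written as follows, encoding the inverse of relation r as r + |R|:

```python
def add_inverse_quadruples(facts, num_relations):
    """For each (s, r, o, t), append the inverse quadruple (o, r^-1, s, t).

    Relation ids in [0, num_relations) are originals; the inverse of
    relation r is encoded as r + num_relations (a common convention).
    """
    augmented = list(facts)
    for s, r, o, t in facts:
        augmented.append((o, r + num_relations, s, t))
    return augmented

# two facts at timestamp 5, with 4 original relation types
facts = [(0, 1, 2, 5), (2, 0, 3, 5)]
aug = add_inverse_quadruples(facts, num_relations=4)
```

With this augmentation, head entity prediction is answered by querying the inverse relation in the tail position.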
3.2. Model Overview
In this section, a Multi-Scale Evolutionary Network (MSEN) is proposed to solve the problem of feature extraction at different scales. The overall framework of MSEN is illustrated in Figure 1. The model has three parts: (1) a local memory encoder that applies a hierarchical transfer-aware graph neural network (HTGNN) to obtain structural features within each timestamp's knowledge graph snapshot, and performs temporal encoding of entities and relations using a gated recurrent unit (GRU); (2) a global memory encoder that employs a time-related graph neural network (TRGNN) to mine temporal-semantic dependencies across the historical knowledge graph, yielding entity and relation embeddings; (3) a decoder that integrates the local and global encoder outputs and uses a scoring function to predict future entities.
Specifically, the input is a series of temporal knowledge graph snapshots. The local encoder extracts per-timestamp features using HTGNN and temporal dynamics via the GRU. The global encoder identifies long-range patterns using TRGNN. Finally, the decoder combines local and global embeddings to predict future entities using the scoring function. This multi-scale approach allows MSEN to capture both contextual knowledge and global evolution for enhanced temporal reasoning.
3.3. Local Memory Encoder
The local memory encoder focuses on obtaining the memories of the m historical knowledge graphs adjacent to the query time. It aggregates each knowledge graph snapshot to capture structural dependencies within timestamps. Additionally, sequence neural networks are utilized to learn temporal features of entities and relations, enabling the model to better represent temporal dynamics across knowledge graphs.
3.3.1. HTGNN
Obtaining informative embedding representations for each temporal knowledge graph is critical for effective local encoding. To better capture semantic information, we propose a novel graph neural network-based knowledge graph embedding method called the Hierarchical Transfer-Aware Graph Neural Network (HTGNN). As illustrated in Figure 2, HTGNN performs sequential aggregation at each layer: entity aggregation, relation aggregation, convolution aggregation, and sum aggregation. The blue nodes serve as aggregation centers. The different aggregation results are merged as input to the next layer, and the outputs of the multi-layer aggregations are combined as the overall model output.
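A minimal NumPy sketch of one such layer is given below, under stated simplifications: a dot product with the center embedding stands in for the paper's L(·) attention scoring, and an element-wise product stands in for the convolution operation; the real HTGNN uses learnable weight matrices throughout.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def htgnn_layer(h_o, neighbors):
    """One simplified HTGNN layer around center entity o.

    neighbors: list of (h_s, h_r) embedding pairs for (s, r) in N_t.
    Returns the merged output of the four aggregations.
    """
    def aggregate(messages):
        # attention weights: softmax over LeakyReLU-scored affinities
        scores = leaky_relu(np.array([m @ h_o for m in messages]))
        alpha = softmax(scores)
        return np.tanh((alpha[:, None] * np.stack(messages)).sum(axis=0))

    h_ent = aggregate([h_s for h_s, _ in neighbors])          # entity aggregation
    h_rel = aggregate([h_r for _, h_r in neighbors])          # relation aggregation
    h_con = aggregate([h_s * h_r for h_s, h_r in neighbors])  # convolution stand-in
    h_sum = aggregate([h_s + h_r for h_s, h_r in neighbors])  # TransE-style sum
    return h_ent + h_rel + h_con + h_sum                      # merged layer output

rng = np.random.default_rng(1)
h_o = rng.normal(size=4)
neighbors = [(rng.normal(size=4), rng.normal(size=4)) for _ in range(3)]
out = htgnn_layer(h_o, neighbors)
```

Each aggregation result is bounded by the tanh nonlinearity, so merging by summation keeps the layer output on a comparable scale across the four branches.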
Specifically, entity aggregation allows nodes to incorporate semantics from adjacent entities based on co-occurrence, capturing semantic dependencies. The entity aggregation function and weight calculation are:
where
${\mathrm{h}}_{\mathrm{o},\mathrm{t}}^{\mathrm{e}\mathrm{n}\mathrm{t}}$ is the embedding representation of entity o.
${\mathrm{h}}_{\mathrm{s},\mathrm{t}}^{\mathrm{l}}$ is the embedding representation of entity s which is a neighboring node of entity o.
${\mathrm{w}}_{\mathrm{e}\mathrm{n}\mathrm{t}}^{\mathrm{l}}$ is a learnable weight matrix.
${\mathrm{N}}_{\mathrm{t}}$ represents the neighboring entities and relations of entity o.
$\mathsf{\sigma}$ is a nonlinear activation function.
${\mathsf{\alpha}}_{\mathrm{e}\mathrm{n}\mathrm{t}}^{\mathrm{l}}$ is the aggregation weight, calculated as follows:
where
$\mathrm{h}\left(\cdot\right)$ is the LeakyReLU activation function.
$\mathrm{L}\left(\cdot\right)$ represents the fully connected (linear) function.
$\parallel $ represents the concatenation operation.
The result of entity aggregation is used as input to relation aggregation. The relation aggregation infuses relation semantics into the entity representations, thereby capturing the co-occurrence of each node and relation in the knowledge graph. The relation aggregation function and corresponding weight calculation are as follows:
where
${\mathrm{h}}_{\mathrm{r},\mathrm{t}}^{\mathrm{l}}$ is the embedding representation of the neighboring relation r of entity o.
${\mathrm{w}}_{\mathrm{r}\mathrm{e}\mathrm{l}}^{\mathrm{l}}$ is a learnable weight matrix.
${\mathsf{\alpha}}_{\mathrm{r}\mathrm{e}\mathrm{l}}^{\mathrm{l}}$ is the relation aggregation weight, calculated as follows:
The result of relation aggregation is used as input to convolution aggregation. Convolution aggregation captures the hidden relations among entity s, relation r, and entity o, which enriches the semantic information of entities. The convolution aggregation function and weight calculation are as follows:
where
$\mathrm{c}\mathrm{o}\mathrm{n}\left(\cdot\right)$ is the convolution operation.
${\mathrm{w}}_{\mathrm{c}\mathrm{o}\mathrm{n}}^{\mathrm{l}}$ is a learnable weight matrix.
${\mathsf{\alpha}}_{\mathrm{c}\mathrm{o}\mathrm{n}}^{\mathrm{l}}$ is the convolution aggregation weight, calculated as follows:
The result of convolution aggregation is used as input to sum aggregation. Sum aggregation obtains information through relational paths between entities; the information obtained by summation is similar to the translation model TransE [27]. The sum aggregation function and corresponding weight calculation are as follows:
where
${\mathrm{h}}_{\mathrm{s},\mathrm{t}}^{\mathrm{l}}$ and
${\mathrm{h}}_{\mathrm{r},\mathrm{t}}^{\mathrm{l}}$ are the embedding representations of neighboring entity s and relation r of entity o.
${\mathrm{w}}_{\mathrm{s}\mathrm{u}\mathrm{m}}^{\mathrm{l}}$ is a learnable weight matrix.
${\mathsf{\alpha}}_{\mathrm{s}\mathrm{u}\mathrm{m}}^{\mathrm{l}}$ is the sum aggregation weight, calculated as follows:
Finally, the overall entity embedding can be expressed as:
3.3.2. Local Memory Representation
To obtain the temporal features of entities and relations in adjacent historical knowledge graphs, following RE-GCN [12], a GRU is adopted to update the representations of entities and relations. The entity embeddings are updated as follows:
where
${\mathrm{H}}_{\mathrm{t}+1}$ is the entity embedding at time t + 1.
${\mathrm{H}}_{\mathrm{t}}^{\mathrm{H}\mathrm{T}\mathrm{G}\mathrm{N}\mathrm{N}}$ is the entity embedding obtained after aggregating the knowledge graph using HTGNN.
Similarly, in order to obtain the temporal embedding representation of the relationship, the following method is used for calculation:
where
${\mathrm{R}}_{\mathrm{t}+1}$ is the relation embedding at time t + 1.
$\mathrm{p}\mathrm{o}\mathrm{o}\mathrm{l}\mathrm{i}\mathrm{n}\mathrm{g}\left(\cdot\right)$ represents the mean pooling operation, which gathers all entity embeddings related to relation r and averages them.
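A minimal sketch of these updates follows, with random, untrained placeholder weights; the full relation update in RE-GCN also concatenates the previous relation embedding into the GRU input, which is omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell with random (untrained) placeholder weights."""
    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        self.Wz, self.Uz = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))
        self.Wr, self.Ur = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))
        self.Wh, self.Uh = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))

    def step(self, h_prev, x):
        z = sigmoid(x @ self.Wz + h_prev @ self.Uz)              # update gate
        r = sigmoid(x @ self.Wr + h_prev @ self.Ur)              # reset gate
        h_tilde = np.tanh(x @ self.Wh + (r * h_prev) @ self.Uh)  # candidate state
        return (1 - z) * h_prev + z * h_tilde

d, num_entities = 8, 5
gru = GRUCell(d)
H_t = np.zeros((num_entities, d))            # entity embeddings at time t
H_t_htgnn = np.full((num_entities, d), 0.1)  # stand-in for the HTGNN output
H_next = gru.step(H_t, H_t_htgnn)            # H_{t+1} = GRU(H_t, H_t^HTGNN)

# pooling(.): mean of the embeddings of entities linked to each relation
rel_entities = {0: [1, 2], 1: [0, 3, 4]}
pooled = {r: H_next[idx].mean(axis=0) for r, idx in rel_entities.items()}
```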
3.4. Global Memory Encoder
To extract cross-timestamp entity dependencies, we construct a global knowledge graph by associating historical snapshots containing the same entities [28]. We introduce a Time-Related Graph Neural Network (TRGNN) to effectively encode this global graph.
The global graph aggregates facts across timestamps, including periodic and non-periodic occurrences. To fully utilize this temporal information, we design separate periodic and non-periodic time vector representations:
where
${\mathrm{T}}^{\mathrm{p}}$ is the periodic time vector;
${\mathrm{T}}^{\mathrm{a}\mathrm{p}}$ is the non-periodic time vector;
$\mathrm{w}$ and $\mathsf{\phi}$ are learnable vectors, representing frequency and phase, respectively;
$\mathrm{f}\left(\cdot\right)$ is a periodic activation function, which is the sin function in this paper; t represents the time interval $\left|{\mathrm{T}}_{\mathrm{i}}-{\mathrm{T}}_{\mathrm{j}}\right|$ between two connected entities in the global knowledge graph.
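A sketch of the two encodings is given below; the linear form of the non-periodic vector is our assumption, since the paper only specifies the sin activation for the periodic one.

```python
import numpy as np

def time_vectors(dt, w_p, phi_p, w_ap, phi_ap):
    """Encode a time interval dt = |T_i - T_j| into two vectors.

    T^p  = sin(w_p * dt + phi_p)   periodic component (Time2Vec-style)
    T^ap = w_ap * dt + phi_ap      non-periodic trend, assumed linear here
    w_* are learnable frequencies, phi_* learnable phases.
    """
    t_p = np.sin(w_p * dt + phi_p)
    t_ap = w_ap * dt + phi_ap
    return t_p, t_ap

# two dimensions with frequencies pi/2 and pi, zero phase, interval dt = 1
w_p = np.array([np.pi / 2, np.pi])
phi = np.zeros(2)
t_p, t_ap = time_vectors(1.0, w_p, phi, np.ones(2), phi)
```

Dimensions with different learned frequencies respond to different recurrence periods, while the non-periodic vector grows monotonically with the interval.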
To obtain the time-semantic dependencies between entities in the global knowledge graph, the paper designs TRGNN to encode it:
where
${\left({\mathrm{h}}_{\mathrm{o},\mathrm{t}}\right)}^{\mathrm{l}+1}$ is the encoded entity embedding.
${\mathrm{w}}_{2}$ is a learnable weight parameter.
${\mathsf{\alpha}}_{1}^{\mathrm{l}}$ and
${\mathsf{\alpha}}_{2}^{\mathrm{l}}$ are weights for entities and relations respectively, calculated as follows:
Using the non-periodic time vector to compute weights for entities and relations better captures the temporal-semantic relationships between entities. Finally, the updated entity embeddings are nonlinearly transformed to obtain the embedding representation of each entity in the global knowledge graph.
3.5. Gating Integration
To better integrate the embedding vectors produced by the local and global memory encoders for reasoning, the paper applies a learnable gating function [29] to adaptively adjust the weights of the local and global memory embeddings, finally obtaining a single vector:
where
$\mathsf{\lambda}\left(\cdot\right)$ is the sigmoid function, restricting each element to [0, 1]; g is the gate parameter adjusting the weights;
$\odot$ is element-wise multiplication.
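A minimal sketch of this gate (variable names assumed):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_integrate(h_global, h_local, g):
    """h = sigmoid(g) * h_global + (1 - sigmoid(g)) * h_local, element-wise.

    g is a learnable per-dimension gate parameter; sigmoid keeps each
    mixing weight in (0, 1).
    """
    lam = sigmoid(g)
    return lam * h_global + (1.0 - lam) * h_local

h_g = np.array([1.0, 2.0, 3.0])   # global memory embedding
h_l = np.array([3.0, 2.0, 1.0])   # local memory embedding
h = gate_integrate(h_g, h_l, g=np.zeros(3))  # sigmoid(0) = 0.5: plain average
```

Because the gate is per-dimension, the model can prefer global memory on some embedding dimensions and local memory on others.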
3.6. Scoring Function and Loss Function
In this section, the paper introduces the scoring function and the loss function. To obtain the probability of an event occurring at future timestamp t + 1, ConvTransE [12] is utilized as the decoder to calculate the probability that entities are connected under relation r at timestamp t + 1:
where
$\mathsf{\lambda}\left(\cdot\right)$ is the sigmoid function.
${\mathrm{h}}_{\mathrm{s},\mathrm{t}+1}$ and
${\mathrm{h}}_{\mathrm{r},\mathrm{t}+1}$ are the embedding representations of entities and relations at timestamp t + 1.
${\mathrm{h}}_{\mathrm{o},\mathrm{t}+1}$ is the embedding representation of entity o after global and local fusion.
Following RE-GCN [12], the loss function during training is defined as follows. The final loss is $\mathrm{L}=\mathsf{\lambda}{\mathrm{L}}_{\mathrm{e}}+\left(1-\mathsf{\lambda}\right){\mathrm{L}}_{\mathrm{r}}+\mathsf{\mu}{\Vert \mathsf{\Theta}\Vert}_{2}$, where ${\mathrm{L}}_{\mathrm{e}}$ is the loss for predicting entities, ${\mathrm{L}}_{\mathrm{r}}$ is the loss for predicting relations, $\mathsf{\lambda}$ is the hyperparameter balancing the two tasks, ${\Vert \mathsf{\Theta}\Vert}_{2}$ is the ${\mathrm{L}}_{2}$ norm of the model parameters, and $\mathsf{\mu}$ controls the penalty term. The overall training algorithm is shown in Algorithm 1.
Algorithm 1. Training of MSEN
Input: the training set $\{{\mathbf{G}}_{0},{\mathbf{G}}_{1},{\mathbf{G}}_{2},{\mathbf{G}}_{3},\cdots,{\mathbf{G}}_{{\mathbf{t}}_{\mathbf{m}}-1}\}$; the number of epochs n.
Output: the minimum loss on the training set.
$\mathrm{E},\mathrm{R},{\mathsf{\Gamma}}_{\mathrm{t}}\longleftarrow$ Init.normal_(), ${\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}_{\mathrm{m}\mathrm{i}\mathrm{n}}=0$ // initialize the model's parameters
For each $\mathrm{i}\in [1,\mathrm{T}]$ do
 ${\left({\mathrm{h}}_{\mathrm{o},\mathrm{t}}\right)}^{\mathrm{l}+1}={\left({\mathrm{h}}_{\mathrm{o},\mathrm{t}}^{\mathrm{e}\mathrm{n}\mathrm{t}}\right)}^{\mathrm{l}}+{\left({\mathrm{h}}_{\mathrm{o},\mathrm{t}}^{\mathrm{r}\mathrm{e}\mathrm{l}}\right)}^{\mathrm{l}}+{\left({\mathrm{h}}_{\mathrm{o},\mathrm{t}}^{\mathrm{c}\mathrm{o}\mathrm{n}}\right)}^{\mathrm{l}}+{\left({\mathrm{h}}_{\mathrm{o},\mathrm{t}}^{\mathrm{s}\mathrm{u}\mathrm{m}}\right)}^{\mathrm{l}}$ // obtain semantic information using HTGNN
 ${\mathrm{H}}_{\mathrm{t}+1}=\mathrm{G}\mathrm{R}\mathrm{U}\left({\mathrm{H}}_{\mathrm{t}},{\mathrm{H}}_{\mathrm{t}}^{\mathrm{H}\mathrm{T}\mathrm{G}\mathrm{N}\mathrm{N}}\right)$ // local memory representation
 Construct a global graph G
 ${\left({\mathrm{h}}_{\mathrm{o},\mathrm{t}}\right)}^{\mathrm{l}+1}=\mathsf{\sigma}\left({\sum}_{\left(\mathrm{s},\mathrm{r}\right)\in {\mathrm{N}}_{\mathrm{t}}}({\mathsf{\alpha}}_{1}^{\mathrm{l}}{\mathrm{h}}_{\mathrm{s},\mathrm{t}}^{\mathrm{l}}+{\mathsf{\alpha}}_{2}^{\mathrm{l}}{\left({\mathrm{h}}_{\mathrm{r},\mathrm{t}}^{\mathrm{l}}\right)}^{\prime})+{\mathrm{w}}_{2}{\left({\mathrm{h}}_{\mathrm{o},\mathrm{t}}\right)}^{\mathrm{l}}\right)$ // obtain time-semantic dependencies using TRGNN
 ${\mathrm{h}}_{\mathrm{o},\mathrm{t}}=\mathsf{\lambda}\left(\mathrm{g}\right)\odot {\mathrm{h}}_{\mathrm{o},\mathrm{t}}^{\mathrm{g}\mathrm{l}\mathrm{o}\mathrm{b}\mathrm{a}\mathrm{l}}+\left(1-\mathsf{\lambda}\left(\mathrm{g}\right)\right)\odot {\mathrm{h}}_{\mathrm{o},\mathrm{t}}^{\mathrm{l}\mathrm{o}\mathrm{c}\mathrm{a}\mathrm{l}}$ // gating integration
 ${\mathrm{P}}_{\mathrm{t}+1}\left(\mathrm{o}\mid \mathrm{s},\mathrm{r}\right)\longleftarrow$ ConvTransE()
 Loss $\longleftarrow$ the sum of ${\mathrm{L}}_{\mathrm{e}}$ and ${\mathrm{L}}_{\mathrm{r}}$; update parameters
End for
Return Loss

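The combined loss can be sketched as follows (the value of μ is illustrative; λ = 0.7 matches the task-balancing hyperparameter reported in Section 4.4):

```python
def total_loss(loss_entity, loss_relation, theta_norm, lam=0.7, mu=1e-4):
    """L = lam * L_e + (1 - lam) * L_r + mu * ||Theta||_2.

    lam balances the entity and relation prediction tasks; mu (an assumed
    value here) scales the L2 penalty on the model parameters Theta.
    """
    return lam * loss_entity + (1.0 - lam) * loss_relation + mu * theta_norm

# with no regularization, the loss is a convex combination of the two tasks
L = total_loss(1.0, 2.0, 0.0, lam=0.7, mu=0.0)
```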
4. Experiments
In this section, the paper compares MSEN with several baselines on two datasets. Meanwhile, ablation experiments analyze the ability to acquire semantic information at different scales, and comparison experiments evaluate the effects of different GCNs. In addition, the paper also studies the parameter settings of the model.
4.1. Datasets
To evaluate the MSEN model, experiments are conducted on two typical event-based datasets: ICEWS14 [17] and ICEWS18 [6]. Both datasets are from the event-based Integrated Crisis Early Warning System dataset [30], with a time interval of 24 h. The statistics of the datasets are listed in Table 2.
4.2. Baselines
The paper compares MSEN with two types of methods: static KG reasoning and TKG reasoning methods.
For static KG methods: DistMult [1], ComplEx [2], RGCN [3], ConvE [4], and RotatE [5] are selected.
For TKG reasoning under the extrapolation setting: CyGNet [8], RE-NET [6], xERTE [9], RE-GCN [12], TITer [10], CENET [11], and TiRGN [14] are selected.
4.3. Evaluation Metrics
This paper adopts mean reciprocal rank (MRR) and Hit@k (k = 1 and 10) to evaluate entity prediction performance. For fairness of comparison, the ground truth is used when performing multi-step reasoning, and the experimental results are reported under the time-aware filtered setting. MRR is the average of the reciprocal ranks of the true entities, measuring how highly the correct entity is ranked. Hit@k is the proportion of cases where the true target entity appears in the top k predicted candidates, reflecting the model's ability to rank the target entity within the top k.
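Both metrics can be computed directly from the (time-aware filtered) 1-based rank of each ground-truth entity; the helper below is our own illustration, not the paper's evaluation code:

```python
def mrr_and_hits(ranks, ks=(1, 10)):
    """ranks: filtered 1-based rank of the true entity for each test query.

    Returns (MRR, {k: Hit@k}) as proportions in [0, 1].
    """
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}
    return mrr, hits

# four queries whose true entities ranked 1st, 2nd, 5th, and 20th
mrr, hits = mrr_and_hits([1, 2, 5, 20])
```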
4.4. Implementation Details
In the experiments, the embedding dimension was fixed at 200 for all methods. The Adam optimizer was used with a learning rate of 0.001, and the hyperparameter balancing the different tasks was set to 0.7. To ensure fair comparison with other models, the local memory encoder used 3 and 6 adjacent historical knowledge graphs on ICEWS14 and ICEWS18, respectively, and the global memory encoder used 10 on both datasets. The number of layers in HTGNN was 2 on both datasets; the number of layers in TRGNN was 2 and 3 on ICEWS14 and ICEWS18, respectively. For the ConvTransE decoder, the number of channels was 50 and the kernel size was 2 × 3.
5. Results
5.1. Performance Comparison
As shown in Table 3 and Table 4, entity prediction experiments are conducted on facts at future timestamps. The tables list the predictive performance of MSEN and the baseline models on the ICEWS14 and ICEWS18 datasets. The baseline results are taken from [15].
In the experimental results, the MSEN model outperforms other baseline models. The results show the effectiveness of the MSEN model in predicting entities at future timestamps. Compared with TiRGN*, MSEN improves the MRR by 3.5 and 1.83 percentage points on ICEWS14 and ICEWS18, respectively.
Probing deeper, we evaluated recall capabilities against TiRGN* on ICEWS14. At recall@1, MSEN achieved 32.78%, while TiRGN* achieved 34.88%. At recall@10, the two models achieved nearly identical results: 61.98% for MSEN versus 62.21% for TiRGN*. While marginally behind on recall@k, MSEN still demonstrates competitive retrieval ability.
5.2. Ablation Study
The encoder of the MSEN model consists of two sub-modules: the Local Memory Encoder (LME) and the Global Memory Encoder (GME). To further analyze the effect of each module on the final entity prediction results, the paper reports the results of LME (HTGNN) and GME (TRGNN) on the ICEWS14 dataset using MRR and Hit@k (k = 1, 3, 10). LME (HTGNN) denotes MSEN with the GME module removed; GME (TRGNN) denotes MSEN with the LME module removed. As can be seen from Table 5, LME (HTGNN) performs worse than MSEN, because it only extracts local features and its embedding representations carry less semantic information. GME (TRGNN) improves greatly over LME (HTGNN), but integrating both through the gating function further improves entity prediction accuracy.
To study the effects of HTGNN and TRGNN in LME and GME, respectively, the paper compares them with different GNN-based models on the ICEWS14 dataset using MRR and Hit@k (k = 1, 3, 10). In LME, RGCN [3], KBGAT [31], and CompGCN (add) [32] replace the HTGNN model; in GME, RGCN [3], KBGAT [31], CompGCN (add) [32], and HRGNN [28] replace the TRGNN model. The experimental results in Table 5 show that the MRR of HTGNN is 0.83 percentage points higher than that of CompGCN (add); in GME, the MRR of TRGNN is 0.31 percentage points higher than that of HRGNN. Overall, both effectively improve entity prediction accuracy in their respective modules. The improvement of HTGNN may stem from the hierarchical transfer method enriching the semantic information of entities, and that of TRGNN from its full use of periodic and non-periodic time in the global knowledge graph.
5.3. Sensitivity Analysis
The paper further studied the effect of the number of layers in TRGNN and HTGNN on entity prediction performance for the GME and LME modules, respectively, using MRR and Hit@k (k = 1, 3, 10). Figure 3 shows the experimental results with different numbers of layers on the ICEWS14 dataset. With just 1 layer, the TRGNN module can already effectively extract temporal-semantic information between entities; as the number of layers increases, the extracted features may interfere with entity embeddings. The HTGNN model extracts sufficient semantic information with 2 layers, thus achieving higher entity prediction accuracy. These experiments confirm that the number of layers in TRGNN and HTGNN affects the extraction of semantic information from the knowledge graph and thus the entity prediction results.
5.4. Training Process Analysis
This paper studied the evolution of the loss and evaluation metrics of the MSEN model during training. Figure 4 shows how the results change with increasing epochs on the ICEWS14 validation set. As training progresses, the loss first gradually decreases, reaching its lowest point at the 10th epoch, and then slowly increases; the evaluation metrics MRR, Hit@1, Hit@3, and Hit@10 gradually increase, reaching a plateau after the 10th epoch. This indicates that after 10 epochs on ICEWS14 the model starts overfitting, as evidenced by the increasing loss and stabilizing metrics. Therefore, 10 epochs is an appropriate training budget for MSEN on ICEWS14 to achieve good performance without overfitting.
5.5. Statistical Analysis
In order to demonstrate the reliability of the results, both permutation tests and multiple repeated experiments were conducted in this study.
For the permutation test, we first trained the proposed MSEN model on the original training set and recorded the model's performance (MRR) on the test set as the true observation. We then created perturbed training sets by replacing different proportions of head or tail entities in the original training set. The MSEN model was retrained on these perturbed sets, yielding varying MRRs on the test set. The p-value was calculated using the following formula:
where M is the number of times the MRR obtained using the perturbed datasets was greater than the true observation, and N is the number of experiments conducted with the perturbed datasets.
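The p-value computation reduces to a counting ratio, sketched below with illustrative numbers:

```python
def permutation_p_value(true_mrr, perturbed_mrrs):
    """p = M / N, where M counts perturbed runs whose test MRR exceeds the
    true observation and N is the total number of perturbed runs."""
    m = sum(mrr > true_mrr for mrr in perturbed_mrrs)
    return m / len(perturbed_mrrs)

# illustrative values: no perturbed run beats the true observation
p = permutation_p_value(0.4668, [0.40, 0.41, 0.39, 0.38, 0.42])
```

A small p indicates that the model's performance is unlikely to arise from training on corrupted supervision, i.e., it genuinely depends on the true historical facts.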
As illustrated in Figure 5, the MSEN model achieved lower MRRs when trained on more heavily perturbed sets, with p-values below the 0.05 significance level. This suggests the model has genuinely captured meaningful patterns in the training data.
Additionally, the experiment was repeated three times using the original training set, yielding consistent MRRs of 46.68%, 46.75%, and 46.72%, with an average of 46.72%. This further demonstrates the reliability and reproducibility of the proposed model.
6. Discussion
The experimental results demonstrate that the proposed MSEN model achieves superior performance compared to several baselines on typical benchmark datasets. The better results validate the effectiveness of modeling multiscale evolutionary information for predicting future events in TKGs.
The ablation studies in Section 5.2 provide insights into the contribution of each module in MSEN. Removing the global memory encoder leads to a significant performance drop, showing the importance of capturing long-distance temporal dependencies. The local memory encoder also contributes steady improvements by aggregating rich semantics within each timestamp. Using HTGNN and TRGNN as the graph neural network encoders likewise yields better results than alternative GNN models, demonstrating their capability to encode structural and temporal patterns from TKGs.
The multi-scale modeling in MSEN aligns with findings from previous works such as TiRGN [14], which show that combining local and global patterns is beneficial. However, MSEN proposes new designs for each module, leading to better results. The hierarchical design in HTGNN enriches semantic learning, while the periodic and aperiodic temporal modeling in TRGNN improves temporal encoding.
7. Conclusions and Future Work
In this work, we propose a new method, the Multi-Scale Evolutionary Network (MSEN), for temporal knowledge graph (TKG) reasoning under the extrapolation setting. First, a hierarchical transfer-aware graph neural network (HTGNN) is designed in the local memory encoder to acquire rich semantic information from the local knowledge graph, and sequence neural networks learn the temporal features of entities and relations. Second, in the global memory encoder, a time-related GNN (TRGNN) is designed to make full use of temporal information in the global knowledge graph, capturing periodic and non-periodic temporal-semantic dependencies and assigning different weights to entities and relations, which improves downstream performance. Finally, embeddings from the different scales are integrated to complete entity prediction. Experimental results on typical benchmarks demonstrate the effectiveness of MSEN for TKG reasoning.
While MSEN achieves good results, several promising directions remain for future investigation: (1) Model compression: the current MSEN model entails relatively high computational complexity. To enable real-time inference and deployment under resource constraints, model compression techniques such as pruning and knowledge distillation can be explored; simplifying the model while retaining predictive power would enhance its applicability to real-world systems. (2) Robustness to noisy data: historical knowledge graphs often suffer from incomplete, biased, or noisy data, and the robustness of MSEN's predictions relies on the quality of the historical inputs. Advanced data filtering, denoising, and imputation methods can be studied to handle imperfect historical graphs; techniques such as outlier detection and data estimation can improve resilience against noise and missing facts. (3) Advanced GNN architectures: further improving the local and global graph neural network encoders with more sophisticated architectures such as graph attention networks is a promising direction, potentially capturing more complex structural, semantic, and temporal patterns.
In the future, we will aim to enhance MSEN along these directions to make broader impacts in realworld applications of temporal knowledge graph reasoning.