Article

Multi-Granularity Temporal Knowledge Graph Question Answering Based on Data Augmentation and Convolutional Networks

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(6), 2958; https://doi.org/10.3390/app15062958
Submission received: 31 January 2025 / Revised: 5 March 2025 / Accepted: 6 March 2025 / Published: 10 March 2025

Abstract

The multi-granularity temporal knowledge graph question-answering model consists of two core tasks: question information extraction and knowledge graph embedding representation. Existing studies typically compute a relevance score between the question and the associated temporal knowledge graph to identify the answer. However, multi-granularity temporal knowledge graph datasets are relatively scarce, and most research has not fully exploited the potential of these limited datasets, limiting the model’s ability to capture complex temporal relationships. In addition, existing methods often overlook the rich semantic relationships within questions, especially when dealing with diverse and dynamically changing temporal information. To make full use of the limited datasets and help the model better capture the complex relationships in the question, this paper proposes a multi-granularity temporal knowledge graph question-answering model based on data augmentation and convolutional networks. The questions are expanded through semantic paraphrasing to obtain more comprehensive question representations, and convolutional networks are incorporated in the feature learning phase to capture more diverse and richer question features. Experimental results show that the proposed method outperforms baseline models on publicly available datasets.

1. Introduction

In recent years, question-answering (QA) systems have gradually developed into important tools for intelligent information interaction, especially in the field of natural language processing. With their advantages in providing automated and accurate answers, they have gradually permeated various industries. QA systems not only enhance the efficiency of user–computer interaction but also play a crucial role in improving the accuracy and speed of knowledge acquisition. Among them, knowledge graph question answering uses knowledge graphs as data sources to answer natural language questions. This approach not only improves the accuracy of answers but also leverages the entities, relationships, and attributes in the knowledge graph to provide more comprehensive and in-depth answers. Currently, knowledge graph question answering is widely applied in intelligent dialogue [1], personalized recommendations [2], intelligent search and question answering, and medical consultations. Temporal knowledge graphs [3], unlike traditional static knowledge graphs, add timestamps to represent changes over time, reflecting the dynamic nature of relationships. Previous temporal knowledge graph datasets such as TempQuestions [4], CRONQUESTIONS [5], and TempoQR [6] typically use yearly granularity, neglecting finer temporal details and lacking time sensitivity when addressing real-world issues. Some researchers have therefore proposed multi-granularity temporal knowledge graph datasets, such as MULTITQ [7], which enhance event timeliness by incorporating richer temporal granularities such as year, month, and day. Although MULTITQ [7] is a large dataset, multi-granularity temporal datasets are still scarce, and it is necessary to explore how to better learn data features to improve performance.
Current temporal knowledge graph question-answering (KGQA) methods can be divided into two categories: those that use semantic frameworks and those that use embeddings. The decomposition method using semantic frameworks breaks the original question into a non-temporal question and a temporal constraint question, and then utilizes traditional KGQA methods to find candidate answers for the sub-questions. Finally, temporal constraints are applied to filter the candidates and produce the final answer. For example, TEQUILA [8] decomposes the question into simpler sub-questions with some form of rewriting, while SF-TQA [9] establishes constraints and their possible interpretations, exploring relevant knowledge graph (KG) facts to build these constraints. Methods using semantic frameworks decompose the question using predefined templates, which limits their generalization ability. This approach requires human experts to create rules and is not well suited to complex questions. Additionally, it necessitates an extensive template library, and expanding the templates is time-consuming and labor-intensive. Embedding-based methods, on the other hand, use vector similarity and relevance scoring functions to find answers related to the question. As shown in Figure 1, these methods map entities and relationships into low-dimensional vector spaces, allowing computers to better understand and manipulate the information within the knowledge graph. Many knowledge graph embedding methods are currently available. TransE [10] can model one-to-one relationships but struggles with one-to-many, many-to-one, and many-to-many relationships. TransH [11] can partially address many-to-many relationships, while ComplEx [12] introduces complex numbers into knowledge graph embeddings (KGEs) for the first time, enabling it to model both symmetric and antisymmetric relationships.
The main contributions of this paper include the following:
(1) A temporal knowledge graph question-answering framework is constructed, utilizing data augmentation through question paraphrasing to enhance the model’s ability to understand different ways of phrasing. The diverse set of paraphrased questions helps the system capture the complex dynamic relationships between entities and events in the temporal knowledge graph, improving the performance of the question-answering system in complex temporal reasoning tasks.
(2) Leveraging a deep learning framework, convolutional neural networks (CNNs) are introduced during the question vector representation phase. Through convolution operations, the model extracts richer temporal features from the raw data, capturing the dynamic relationships between entities at different time points and enhancing the understanding of complex relationships.
(3) Based on the aforementioned data augmentation and convolution operations, a multi-granularity temporal knowledge graph question-answering system is developed. Experiments on the multi-granularity temporal knowledge graph dataset MULTITQ show that the proposed method effectively understands complex questions as well as time-related multi-granularity questions.

2. Related Work

This paper proposes a multi-granularity temporal knowledge graph question-answering method based on data augmentation and convolutional networks, which deeply enhances the richness of questions and uncovers hidden information to perform temporal knowledge graph question answering. The following briefly introduces the research progress of temporal knowledge graph question answering in existing work.

2.1. Knowledge Graph Embedding Models

Knowledge graph embedding methods include RESCAL [13], which allows deep interaction between the information of entities and relationships. However, RESCAL is prone to overfitting, and its complexity increases significantly as the dimensionality of the relationship matrix grows. DistMult [14] relaxes the constraints on the relationship matrix, but it overly simplifies the RESCAL model, making it capable of modeling only symmetric relationships in the knowledge base and not other types of relationships. Additionally, due to the presence of noise in the data and the difficulty in calculating confidence, DistMult is challenging to apply directly to non-deterministic knowledge graphs. To address these issues, ComplEx [12] extends DistMult to the complex space. After this extension, ComplEx can handle both symmetric and asymmetric relationships. Due to these advantages, ComplEx is widely used in knowledge graph question answering. For temporal knowledge graphs, which include timestamps, additional processing of temporal information is required. TimePlex [15] represents temporal sequences by decomposing them into multiple components, emphasizing trends, seasonality, and residuals. This approach helps analyze the intrinsic structure of time series and is suitable for various time series analysis tasks. TComplEx [16] builds upon ComplEx by introducing temporal information, using complex numbers to represent time series embeddings, and emphasizing periodicity and phase information. This method can represent both the amplitude and phase of a time series, making it particularly useful for capturing periodic patterns.
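To make the scoring functions behind these embedding models concrete, the following is a minimal sketch (using PyTorch complex tensors; it is illustrative, not taken from the cited implementations) of the ComplEx score and a TComplEx-style temporal extension: a fact is scored by the real part of a product of complex-valued entity, relation, and timestamp embeddings.

```python
import torch

def complex_score(e_s, r, e_o):
    """ComplEx: Re(<e_s, r, conj(e_o)>) over complex-valued embeddings."""
    return torch.real(torch.sum(e_s * r * torch.conj(e_o), dim=-1))

def tcomplex_score(e_s, r, e_o, t):
    """TComplEx-style extension: the relation is modulated by a timestamp embedding."""
    return torch.real(torch.sum(e_s * (r * t) * torch.conj(e_o), dim=-1))

# Toy usage with random complex embeddings of dimension 8.
dim = 8
e_s, r, e_o, t = (torch.randn(dim, dtype=torch.cfloat) for _ in range(4))
print(complex_score(e_s, r, e_o).item(), tcomplex_score(e_s, r, e_o, t).item())
```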

2.2. Temporal Knowledge Graph Question Answering

The core issue of temporal knowledge graph question answering lies in how to accurately understand the natural language questions posed by users, which is a key task in the study of question-answering systems. TEQUILA [8] detects whether the question is temporal, decomposes the question into simpler sub-questions, and then retrieves time-constrained candidate answers and dates from the knowledge base question-answering (KBQA) system. It then applies constraint-based reasoning to the candidates to produce the final answer. SF-TQA [9] evokes constraints and their possible interpretations, establishing constraints by exploring relevant knowledge graph (KG) facts. CRONKGQA [5] proposes a transformer-based solution that uses time-sensitive embedding methods for temporal knowledge graphs (TKGs) to obtain embeddings of TKGs. TempoQR [6] infers question-specific temporal information based on question representations, and it recovers missing temporal facts by manipulating the embedding space. TSQA [17] introduces a time-sensitive question-answering (TSQA) framework to address these issues, with a timestamp estimation module that can infer unrecorded timestamps from the question. Exaqt [18] uses a dense subgraph algorithm and a fine-tuned BERT model to identify the answer subgraph required to answer the question and applies a relational graph convolutional network (R-GCN) to infer the answer from the graph. CTRN [19] extracts implicit temporal features from the semantic information of the question representation using the T-GCN module and generates relational representations from the instruction-combined matrix using the Transformer encoder. MultiQA [7] parses the question text to identify corresponding entities and times and generates temporal semantic embeddings at different granularities.

2.3. Deep Learning and Time Series

Deep learning-based methods use deep neural networks to flexibly model complex nonlinear relationships and effectively capture shared information in time series data. For example, recurrent neural networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks [20] and Gated Recurrent Units (GRUs) [21], have been widely applied in time series forecasting due to their ability to automatically extract features, recognize complex patterns, and model long-term dependencies. To improve prediction accuracy, researchers have proposed more sophisticated structures, such as recurrent-skip layers (LSTNet-S), time attention layers (LSTNet-A) [22], and the innovative Time Pattern Attention (TPA) mechanism [23]. At the same time, Transformer models based on self-attention mechanisms [24,25,26] have been introduced for sequence modeling. Traditional research typically focuses on a single time granularity, leading to the loss of multi-granularity information. “Multi-granularity” methods aim to address the problem of coexisting data with different granularities or aggregation levels, meaning they can simultaneously process statistical information or feature sets with different time scales [27].
However, the aforementioned deep learning-based methods conduct question answering by merely calculating the correlation between the question and the temporal knowledge graph, neglecting the diverse relationships and complex information within the question. Additionally, they fail to deeply capture the relational vectors within the question.

3. MTQADC Model

This section introduces the multi-granularity temporal knowledge graph question-answering model based on data augmentation and convolutional networks, with the MTQADC model structure framework shown in Figure 2. The model includes data processing, multi-granularity time aggregation, convolutional feature learning, and a scoring module.

3.1. Data Processing Module

The data augmentation module primarily expands the original dataset through synonym replacement. For each question’s text, each word is processed individually. For each word in the text, its synonym set is retrieved using WordNet. If synonyms are found, the first synonym replaces the original word; if no synonym is found, the original word is retained. This can be represented as follows:
\mathrm{augmented}_{q_i} = \mathrm{synonym}(q_i)
Here, \mathrm{synonym}(q_i) represents the process of performing synonym replacement for each word in question q_i, which generates the augmented question.
After synonym replacement, the original data and the augmented data are merged to create a new dataset containing both the original and augmented questions. The merging operation is as follows:
\mathrm{augmented\_data} = \bigcup_{i=1}^{C} \mathrm{synonym}(q_i)
\mathrm{all\_data} = \mathrm{original\_data} + \mathrm{augmented\_data}
Here, C represents the number of questions. After merging the data, further processing is performed on each question’s text, as shown in Figure 3. First, temporal information is extracted from the question sentence using a temporal parsing tool. Then, named entity recognition (NER) is applied to identify entities in the question, and these entities are added to the entity field. Finally, the character positions of each entity in the sentence are recorded. This processing provides structured data support for subsequent tasks, such as knowledge graph matching, temporal parsing, and entity analysis.
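The following is a minimal sketch of the synonym-replacement augmentation described above, assuming NLTK's WordNet interface; the function name and the choice of taking the first lemma that differs from the original word are illustrative rather than the exact implementation.

```python
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)  # WordNet data is needed once

def augment_question(question: str) -> str:
    """Replace each word with its first WordNet synonym; keep the word if none is found."""
    augmented = []
    for word in question.split():
        lemmas = [l.name().replace("_", " ")
                  for syn in wordnet.synsets(word) for l in syn.lemmas()]
        synonym = next((l for l in lemmas if l.lower() != word.lower()), word)
        augmented.append(synonym)
    return " ".join(augmented)

original_data = ["Who first visited China in 2008?"]
augmented_data = [augment_question(q) for q in original_data]
all_data = original_data + augmented_data  # merged original and augmented questions
print(all_data)
```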

3.2. Feature Extraction Module

Given a question, the language model ALBERT is used to obtain the state of the last hidden layer. The question context is then transformed into a semantic matrix using ALBERT.
Q_A = W_A \cdot \mathrm{ALBERT}(q_{\mathrm{text}})
Through a learnable projection matrix W_A, the ALBERT encoding of the initial text q_text is transformed into the target embedding matrix Q_A; the subscript A indicates the use of the ALBERT model. Here, Q_A = [q_CLS, q_1, …, q_N] is a D × N embedding matrix, where D denotes the embedding dimension of each token and N the number of tokens in the input sequence. W_A is a learnable projection matrix of size D × D_A, where D_A is the output dimension, corresponding to the dimension of the ALBERT hidden layers. q_CLS is the embedding of the CLS token, which summarizes the context of the entire input sequence; q_1 to q_N are the embeddings of the remaining tokens, with q_1 corresponding to the first token in the input text, q_2 to the second token, and so on.
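As an illustration of this step, the sketch below encodes a question with albert-base-v2 from the Hugging Face transformers library and applies a learnable projection; keeping the projection at the hidden size of 768 is an assumption made here for consistency with Section 3.3, and the example question is hypothetical.

```python
import torch
from transformers import AutoTokenizer, AlbertModel

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
albert = AlbertModel.from_pretrained("albert-base-v2")
D_A = albert.config.hidden_size                 # 768 for albert-base-v2
W_A = torch.nn.Linear(D_A, D_A, bias=False)     # learnable projection matrix

question = "With whom did Bangladesh intend to cooperate before October 2007?"
inputs = tokenizer(question, return_tensors="pt")
with torch.no_grad():
    hidden = albert(**inputs).last_hidden_state  # (1, N, D_A) token embeddings
Q_A = W_A(hidden)                                # (1, N, D_A); Q_A[:, 0] is the CLS embedding
print(Q_A.shape)
```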
Multi-granularity time data are typically presented at a daily time granularity, such as “19 March 2008”. Embeddings of temporal knowledge graphs at the daily granularity contain only the time information for each day. However, certain tasks may require handling a larger time granularity, such as year or month, and this information cannot be directly obtained from pre-trained daily granularity embeddings. To address this issue, a multi-granularity time aggregation module is used, as shown in Figure 4. This module aims to extract coarser-grained temporal semantic information, such as monthly granularity, from the daily granularity embeddings. For example, in the case of annual time aggregation, suppose we need to aggregate all related monthly granularity time information from a specific year to obtain the time semantics related to that year.
In this process, the first step is to extract all the monthly granularity timestamps m_1, m_2, …, m_Z related to the specific year, along with the temporal embeddings t_1, t_2, …, t_Z corresponding to those timestamps. Here, Z is the number of months related to the year, i.e., Z = 12. To obtain the yearly granularity temporal representation, these monthly embeddings are assembled into a temporal semantic matrix T_m.
T_m = [t_1, t_2, \ldots, t_Z]
In this process, T_m ∈ R^{Z × S}, where Z is the number of aggregated timestamps (the months of the target year or, analogously, the days of a target month) and S is the dimension of each timestamp embedding.
To enhance the sequentiality of the temporal information, sinusoidal positional encoding is introduced [18]. Specifically, for the h-th position in T m , the positional information is encoded as follows:
\mathrm{PE}(h, a) =
\begin{cases}
\sin\left(h / 10000^{2l/D}\right), & \text{if } a = 2l \\
\cos\left(h / 10000^{2l/D}\right), & \text{if } a = 2l + 1
\end{cases}
Here, h is the position index in T_m, a is the dimension index of the positional encoding, l indexes the sine/cosine pair (so that a = 2l or a = 2l + 1), and D is the dimension of the positional encoding. Using this method, the positional encoding PE(h, a) is added to T_m, resulting in an enhanced time matrix that contains positional information and preserves the sequential relationship of the timestamps.
Finally, the time information of different granularities is fused [28] to generate a unified multi-granularity temporal representation, which facilitates its use in subsequent tasks. This layer combines the time information of multiple granularities into a comprehensive representation, aiding in better temporal reasoning and semantic analysis:
T_y = \mathrm{Transformer}(T_m)
Here, T_y = [t_CLS, t_{y_1}, …, t_{y_J}], where t_CLS is the global temporal representation of the corresponding year. Similarly, daily granularity time can be aggregated into monthly granularity.
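A minimal sketch of the yearly aggregation step is given below, assuming PyTorch: twelve monthly embeddings receive sinusoidal positional encodings and are fed, together with a CLS-style slot, through a Transformer encoder whose layer and head counts follow Section 4.3. The random inputs stand in for pre-trained TComplEx timestamp embeddings, and the exact wiring is illustrative rather than the authors' implementation.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_pe(length: int, dim: int) -> torch.Tensor:
    """Sinusoidal positional encoding matching the PE(h, a) formula above."""
    pe = torch.zeros(length, dim)
    pos = torch.arange(length, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float) * (-math.log(10000.0) / dim))
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

S, Z = 1024, 12                                   # embedding dim; 12 months per year
T_m = torch.randn(Z, S)                           # placeholder monthly embeddings
T_m = T_m + sinusoidal_pe(Z, S)                   # inject order information

encoder_layer = nn.TransformerEncoderLayer(d_model=S, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
cls_slot = torch.zeros(1, 1, S)                   # slot that will hold the yearly representation
T_y = encoder(torch.cat([cls_slot, T_m.unsqueeze(0)], dim=1))
t_cls = T_y[:, 0]                                 # aggregated yearly (t_CLS) embedding
print(t_cls.shape)                                # torch.Size([1, 1024])
```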

3.3. Convolutional Feature Learning Module

The convolutional feature learning module is shown in Figure 5. First, the relational representation of the question text is transformed into the embedding dimension, facilitating subsequent scoring.
R = Q_A \cdot W_{\mathrm{linear}}
Q_A is the input question embedding obtained through ALBERT. W_linear ∈ R^{G × F} is the weight matrix of the linear layer, where G is the hidden dimension (768) and F is the embedding dimension (512). To facilitate the use of the convolutional network, the representation is expanded to R ∈ R^{G × F × 1}. Next, a convolution operation is performed.
R_{\mathrm{CNN1}}[i] = \sum_{j=0}^{p-1} W_1[j] \cdot R[i + j]
W_1 is the weight corresponding to each position in the convolutional kernel. p represents the size of the convolutional kernel, which is 3. i is the index of the target position. For each convolutional kernel, j indexes the elements at different positions within the kernel, and R[i + j] is the value at position i + j in the input sequence. The convolution operation performs a sliding-window operation on the input tensor to extract local features. Then, the convolutional output is passed through the ReLU activation function for a nonlinear transformation.
R_{\mathrm{CNN1}} = \mathrm{ReLU}(R_{\mathrm{CNN1}})
The first convolutional layer and the second convolutional layer follow similar steps.
R_{\mathrm{CNN2}}[i] = \mathrm{ReLU}\left(\sum_{j=0}^{p-1} W_2[j] \cdot R_{\mathrm{CNN1}}[i + j]\right)
Finally, batch normalization and regularization are applied to obtain the final relationship representation.
R = \mathrm{Dropout}(\mathrm{BatchNorm}(R_{\mathrm{CNN2}}))
Through two layers of convolution operations, the model can automatically extract multi-level local features from the relationship embeddings. Each convolution layer uses convolution kernels to perform weighted summation on the input features, extracting information-rich local representations. The use of activation functions introduces nonlinearity, enabling the model to learn more complex features. This series of convolution operations not only enhances the model’s ability to understand complex relationships but also provides more effective input features for the subsequent scoring function.
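The following is a minimal PyTorch sketch of the convolutional feature learning module as described above (linear projection, two kernel-size-3 convolutions with ReLU, then batch normalization and dropout); the tensor layout and the treatment of the token axis are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ConvFeatureLearning(nn.Module):
    """Project the question representation and refine it with two 1D convolutions."""
    def __init__(self, hidden_dim=768, embed_dim=512, kernel_size=3):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, embed_dim)                  # R = Q_A · W_linear
        self.conv1 = nn.Conv1d(embed_dim, embed_dim, kernel_size, stride=1, padding=1)
        self.conv2 = nn.Conv1d(embed_dim, embed_dim, kernel_size, stride=1, padding=1)
        self.norm = nn.BatchNorm1d(embed_dim)
        self.dropout = nn.Dropout(0.3)

    def forward(self, q_a):                      # q_a: (batch, tokens, hidden_dim)
        r = self.linear(q_a).transpose(1, 2)     # (batch, embed_dim, tokens) for Conv1d
        r = torch.relu(self.conv1(r))            # first convolution + ReLU
        r = torch.relu(self.conv2(r))            # second convolution + ReLU
        r = self.dropout(self.norm(r))           # batch normalization and regularization
        return r.transpose(1, 2)                 # back to (batch, tokens, embed_dim)

module = ConvFeatureLearning()
print(module(torch.randn(2, 16, 768)).shape)     # torch.Size([2, 16, 512])
```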

3.4. Scoring Module

First, the score for the entities is calculated, where s and o represent the subject and object. The following formula is used to handle the exchange of the subject and object, and the max(·) function ensures that the score can be ignored when the subject or object is a virtual entity.
\mathrm{score}_{\mathrm{entity}} = \max\left(\phi(e_s, R, e_\varepsilon, t_\tau),\ \phi(e_o, R, e_\varepsilon, t_\tau)\right)
Next, the score for the time is calculated:
\mathrm{score}_{\mathrm{time}} = \phi(e_s, R, e_o, t_\tau)
Finally, the entity score and the time score are concatenated:
\mathrm{score} = \mathrm{score}_{\mathrm{entity}} \oplus \mathrm{score}_{\mathrm{time}}
In the training process, the entity and time scores are concatenated and converted into probabilities through the softmax function. The model’s parameters are updated to assign higher probabilities to the correct answers by minimizing the cross-entropy loss.
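A minimal sketch of the scoring step is shown below, assuming TComplEx-style complex embeddings in PyTorch; the shapes, the placeholder φ, and the way the max over subject/object variants is taken are illustrative rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def phi(e_s, r, e_o, t):
    """TComplEx-style scoring function over complex embeddings."""
    return torch.real(torch.sum(e_s * (r * t) * torch.conj(e_o), dim=-1))

d, num_entities, num_times = 8, 5, 4
e_s = torch.randn(d, dtype=torch.cfloat)              # annotated subject entity
e_o = torch.randn(d, dtype=torch.cfloat)              # annotated object entity
R = torch.randn(d, dtype=torch.cfloat)                # relation derived from the question
t = torch.randn(d, dtype=torch.cfloat)                # annotated timestamp
E = torch.randn(num_entities, d, dtype=torch.cfloat)  # all candidate entities
T = torch.randn(num_times, d, dtype=torch.cfloat)     # all candidate timestamps

score_entity = torch.maximum(phi(e_s, R, E, t), phi(e_o, R, E, t))  # (num_entities,)
score_time = phi(e_s, R, e_o, T)                                    # (num_times,)
score = torch.cat([score_entity, score_time])                       # concatenated scores

target = torch.tensor([2])                             # index of the correct answer
loss = F.cross_entropy(score.unsqueeze(0), target)     # softmax + cross-entropy
print(loss.item())
```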

4. Experiments and Analysis

4.1. Dataset

4.1.1. Multi-Granularity Time Dataset

As shown in Table 1, for the MULTITQ [7] dataset, questions can be classified into simple and complex questions based on whether a single tuple or multiple tuples are required to answer the question. Additionally, questions can be divided into categories such as “Equal”, “Before/After”, “First/Last”, etc., depending on whether the event times in the question are consistent. Alternatively, questions can be classified based on the time granularity, such as year-level, month-level, or day-level questions.

4.1.2. Single-Granularity Time Dataset

CRONQUESTIONS [5] is derived from WikiData [16] and consists of a knowledge graph with temporal annotations and a set of natural language questions requiring temporal reasoning. Time in CRONQUESTIONS is measured in years, unlike the MULTITQ dataset, which features multi-granularity time. The specific types of questions in the dataset and the number of each type are shown in Table 2.

4.2. Evaluation Metrics

Hits@k is an evaluation metric commonly used to assess the performance of ranking-based models. It measures whether the correct answer appears within the top k positions in the predicted results. This metric calculates the average proportion of quadruples whose ranks are less than or equal to k across all quadruple predictions. A higher value of this metric indicates better predictive accuracy of the model.
\mathrm{Hits@}k = \frac{1}{|S|} \sum_{i=1}^{|S|} \mathbb{1}(\mathrm{rank}_i \leq k)
S represents the set of quadruples, |S| the total number of quadruples in the set, and rank_i the predicted rank of the i-th quadruple. 1(rank_i ≤ k) is the indicator function, which returns 1 when the rank of the i-th quadruple is less than or equal to k and 0 otherwise. For each quadruple, if its rank is less than or equal to k, it is considered a hit, contributing a value of 1. The sum of hits across all quadruples is then divided by the total number of quadruples to obtain Hits@k. In the experiments, k values of 1 and 10 are chosen as evaluation metrics, which not only reflect the precision of the answers but also capture the range of correct answers.
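The metric can be computed directly from the predicted scores; the sketch below is an illustrative PyTorch implementation in which the rank of the gold answer is one plus the number of candidates scored strictly higher.

```python
import torch

def hits_at_k(scores: torch.Tensor, targets: torch.Tensor, k: int) -> float:
    """Fraction of questions whose gold answer ranks within the top k predictions."""
    gold = scores.gather(1, targets.unsqueeze(1))        # (batch, 1) score of the gold answer
    ranks = 1 + (scores > gold).sum(dim=1)               # (batch,) rank of the gold answer
    return (ranks <= k).float().mean().item()

scores = torch.tensor([[0.1, 0.9, 0.3],                  # question 1: gold answer is index 1
                       [0.8, 0.2, 0.7]])                 # question 2: gold answer is index 2
targets = torch.tensor([1, 2])
print(hits_at_k(scores, targets, k=1), hits_at_k(scores, targets, k=10))  # 0.5 1.0
```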

4.3. Experiment Parameters

To obtain the TKG (temporal knowledge graph) embeddings, the TComplEx embedding method was used, with the embedding dimension set to 1024. The question text representation is generated using the ALBERT pre-trained language model; the specific version used is albert-base-v2, which can be downloaded from Hugging Face at https://huggingface.co/ (accessed on 5 March 2025). The number of encoder layers was set to two, with four heads per layer. The sentence embedding dimension was set to 768, the dropout rate in the forward pass was 0.3, and the learning rate was set to 0.0002. A two-layer convolutional network was used for feature processing, with both convolution layers having a kernel size of 3, a stride of 1, and a padding size of 1. The model was trained for 10 epochs, with the final parameters determined by the best validation performance. All experiments were conducted on a rented cloud server equipped with a 24 GB NVIDIA RTX 3090 GPU (NVIDIA, Santa Clara, CA, USA) and 43 GB of memory.
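For reference, the settings above can be summarized in a single configuration object; the dictionary below merely restates the reported hyperparameters, and its keys are illustrative rather than the authors' configuration format.

```python
# Hyperparameters reported in Section 4.3, collected in one place (keys are illustrative).
config = {
    "tkg_embedding": {"method": "TComplEx", "dim": 1024},
    "language_model": "albert-base-v2",
    "encoder": {"num_layers": 2, "num_heads": 4},
    "sentence_embedding_dim": 768,
    "dropout": 0.3,
    "learning_rate": 2e-4,
    "cnn": {"num_layers": 2, "kernel_size": 3, "stride": 1, "padding": 1},
    "epochs": 10,
}
```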

4.4. Comparison Models

This paper compares the proposed model with the following models. Table 3 and Table 4 indicate whether each model includes modules such as a large language model, embedding operations, and temporal aggregation.
(1) ALBERT [29]: An improved version of BERT that reduces model parameters through parameter sharing and factorized embedding matrices, significantly improving training efficiency.
(2) EmbedKGQA [30]: Effective in performing multi-hop KGQA on sparse knowledge graphs, it relaxes the requirement to select answers from a pre-specified local neighborhood and uses KG entity embeddings to learn question embeddings. During evaluation, all entities are re-scored, and the entity with the highest score is selected as the answer.
(3) T5 [31]: T5 (Text-To-Text Transfer Transformer) is a general text generation model proposed by Google in 2019, with the core idea of unifying all NLP tasks into a “text-to-text” transformation task. In comparative experiments, the T5-Base model is used, which is a medium-sized version within the T5 series, offering a good balance between performance and efficiency.
(4) GPT-2 [32]: GPT-2 (Generative Pre-trained Transformer 2) is a natural language processing model based on the Transformer architecture, primarily used for text generation tasks. The comparative experiments utilize the GPT-2 Medium version, which is a medium-sized model within the GPT-2 series developed by OpenAI. Compared to GPT-2 Small, it has more parameters and stronger generative capabilities.
(5) CronKGQA [5]: Applies a time-sensitive KG embedding algorithm (TCOMPLEX) to obtain embeddings for TKG entities, relations, and timestamps, and uses pre-trained BERT for question processing. CronKGQA uses two scoring functions: one for predicting entities and another for predicting time.
(6) CTRN [19]: Proposes an improved temporal reasoning method, the Complex Temporal Reasoning Network (CTRN). For each question, it captures implicit temporal features and relational representations and then integrates them to generate implicit temporal relationship representations.
(7) MultiQA [7]: Introduces the concept of multi-granularity temporal question answering based on knowledge graphs and presents a large-scale dataset for multi-granularity temporal QA. It utilizes TKG embeddings to obtain semantic information and generates different granularity temporal semantic embeddings, using scoring functions to compute candidate answers.

4.5. Experimental Results and Analysis

4.5.1. Comparative Experimental Results and Analysis

Table 5 presents the experimental results categorized by question type and answer type, with Hits@1 and Hits@10 as evaluation metrics. Table 6 shows the experimental results at different time granularities using Hits@1 as the evaluation metric. In the table, the bolded data represents the best performance, while the underlined data indicates the second-best performance. Based on the experimental results of the proposed model and six comparison models, the following conclusions can be drawn:
(1) Among the knowledge graph question-answering models, ALBERT performs worse than MTQADC on both the Hits@1 and Hits@10 evaluation metrics. In terms of Hits@1, its accuracy is 0.099 lower than MTQADC for the Multiple question type, 0.290 lower for the Single question type, 0.291 lower for the Entity answer type, and 0.096 lower for the Time answer type. Regarding time granularity, ALBERT also performs worse than the models that use embedding operations. For example, on the Equal Multi type of daily granularity questions, ALBERT is approximately 0.057 lower than MTQADC. This is because ALBERT does not use embedding operations and ignores the representation of the knowledge graph; without an embedding-based scoring mechanism, the model’s answer retrieval is less accurate.
(2) Among the models that combine language models and embedding operations, EmbedKGQA performs worse than CronKGQA, MultiQA, and MTQADC. Overall, EmbedKGQA’s Hits@1 is about 0.136 lower than that of MTQADC, and its Hits@10 is about 0.174 lower than that of MTQADC. For Equal-type questions, it is lower by 0.419, 0.040, and 0.096 in daily, monthly, and yearly granularities, respectively. This is because EmbedKGQA uses the ComplEx embedding method, which is an algorithm that does not include time embeddings and is more suitable for traditional knowledge graphs. In contrast, CronKGQA, MultiQA, and MTQADC all use time-sensitive embedding methods like TComplEx, which incorporate time embeddings, allowing for better processing and extraction of time-related information and making them more suitable for embedding temporal knowledge graphs.
(3) The T5 model employs a multi-granularity temporal aggregation module. Compared to the MTQADC model, the T5 model is lower by 0.063 and 0.073 in the Hits@1 and Hits@10 metrics, respectively. The T5 model performs well on Multiple-type questions, being only 0.01 lower than the MTQADC model in the Hits@1 metric. However, for Time-type questions, the T5 model’s performance is 0.092 lower than that of the MTQADC model in the Hits@1 metric, indicating that the T5 model performs only moderately on time-related questions. From the perspective of temporal multi-granularity, the T5 model performs well on Before/After questions, achieving Hits@1 values of 0.611, 0.637, and 0.637 for day, month, and year granularity questions, respectively, second only to the MTQADC model.
(4) In terms of overall question-answering performance, the GPT-2 model is lower than the MTQADC model by 0.059 in the Hits@1 metric and by 0.047 in the Hits@10 metric. For Single-type questions, the GPT-2 model is lower than the MTQADC model by 0.073 in the Hits@1 metric. In the Entity question type, the GPT-2 model is lower than the MTQADC model by 0.068 in the Hits@1 metric. From the perspective of temporal multi-granularity, the GPT-2 model has a Hits@1 of 0.33 for the Equal-type month granularity, which is 0.063 lower than that of the MultiQA model.
(5) CronKGQA, MultiQA, and MTQADC all use time-sensitive embedding algorithms, but as shown in the table, CronKGQA performs worse than MultiQA and MTQADC. CronKGQA is 0.063 and 0.025 lower than MTQADC in Hits@1 and Hits@10, respectively. In terms of time granularity, for the Before/After question type, CronKGQA is lower than MTQADC by 0.265, 0.233, and 0.273 in the daily, monthly, and yearly granularities, respectively. This is because CronKGQA was initially designed to handle questions with a single time granularity, such as temporal knowledge graph questions answered only at yearly granularity. Therefore, it performs poorly on multi-granularity temporal knowledge graph question answering. In contrast, MultiQA and MTQADC use the multi-granularity time aggregation module, which allows the aggregation of different time granularities, resulting in better performance on multi-granularity questions.
(6) Both CTRN and MTQADC use time-sensitive embedding operations and multi-granularity time aggregation, but MTQADC is 0.035 and 0.022 higher on the overall Hits@1 and Hits@10 metrics, respectively. Compared to CTRN, MTQADC benefits from the data augmentation and convolutional feature learning modules, which CTRN lacks.
(7) Both MultiQA and MTQADC use time-sensitive embedding operations and multi-granularity time aggregation, but in terms of overall Hits@1 and Hits@10 evaluation metrics, MTQADC is 0.049 higher and 0.002 lower, respectively. In terms of time granularity, for the Before/After question type, MTQADC outperforms MultiQA by 0.261, 0.159, and 0.198 in daily, monthly, and yearly granularities, respectively. Compared to MultiQA, MTQADC benefits from data augmentation and convolutional feature learning modules. The data augmentation module enriches the problem representation by paraphrasing the questions, thereby expanding the limited question set and enhancing the problem representation. Furthermore, by applying convolutional processing to the questions, MTQADC gradually strengthens the understanding of complex relationships, thereby improving the accuracy and efficiency of the question-answering system.
Overall, MTQADC outperforms other models, but there is still room for improvement in certain problem types.

4.5.2. Impact of Dropout Rate on Experiments

As shown in Figure 6, multiple dropout rates of 0.1, 0.2, 0.3, 0.4, and 0.5 were selected for the experiments to evaluate their effects on training loss, validation loss, and test loss, as well as to measure the model’s performance using the Hits@1 and Hits@10 metrics.
As the dropout rate increased, the training loss gradually decreased, indicating an enhanced fitting ability of the model on the training set. However, when the dropout rate exceeded 0.3, the rate of decrease in training loss slowed, suggesting that the model might begin to lose important features. On the validation set, a dropout rate of 0.3 yielded the lowest validation loss, demonstrating that this rate effectively balanced the model’s complexity and generalization capability. Lower dropout rates, such as 0.1 and 0.2, failed to effectively prevent overfitting, while higher dropout rates, such as 0.4 and 0.5, led to an increase in validation loss, indicating a decline in the model’s performance on the validation set. On the test set, the model with a dropout rate of 0.3 also performed exceptionally well, achieving the lowest test loss, which further validated the effectiveness of this dropout rate. In terms of practical performance evaluation, the model with a dropout rate of 0.3 achieved the best performance on the Hits@1 and Hits@10 metrics, with values of 0.342 and 0.633, respectively. This result indicates that selecting a dropout rate of 0.3 not only improved the model’s accuracy but also enhanced its effectiveness in practical applications.
In summary, the experimental results demonstrate that choosing a dropout rate of 0.3 is based on a comprehensive consideration of model performance, effectively preventing overfitting, improving the model’s generalization ability, and achieving optimal results on both the validation and test sets.

4.5.3. Ablation Experiment Results and Analysis

This section presents the ablation experiment results of the MTQADC model, as shown in Table 7 and Figure 7. The effectiveness of the model’s data augmentation, convolutional feature learning, and multi-granularity time aggregation on temporal knowledge graph question answering is explored.
First, removing the data augmentation module shows a noticeable decline in model performance on the Hits@1 and Hits@10 evaluation metrics, with decreases of 0.021 and 0.018, respectively. In terms of multi-granularity time, the performance also drops for the “Equal” question type, with reductions of 0.031, 0.007, and 0.039 at the daily, monthly, and yearly granularities, respectively. These results indicate that data augmentation enriches the question representation and improves the question-answering performance in temporal knowledge graph tasks.
Next, removing the convolutional feature learning module causes the most significant drop in model performance compared to MTQADC. The Hits@1 and Hits@10 metrics decreased by 0.039 and 0.026, respectively. For multi-granularity time, the “Before/After” question type shows declines of 0.081, 0.084, and 0.103 at the daily, monthly, and yearly granularities, respectively. This further demonstrates that the CNN can effectively capture latent relational features in time series data. In temporal knowledge graph question-answering tasks, CNNs extract richer temporal features through convolution operations, capturing the dynamic relationships between entities at different time points.
Finally, removing the multi-granularity time aggregation module leads to decreases of 0.022 and 0.026 in the Hits@1 and Hits@10 metrics, respectively. As shown in the table, the multi-granularity time aggregation module has the most significant impact on time-related question types, with reductions of 0.035 in Hits@1 and 0.071 in Hits@10. This highlights that the time aggregation module improves the learning of time information, thereby enhancing the experiment’s performance.

4.5.4. Justification and Interpretability

We conducted a weight analysis on the question “Before 7 October 2007, with whom did the Prime Minister of Bangladesh declare his intention to cooperate?” First, we performed a normalization operation on the weights, converting the original weight values into weight coefficients ranging from 0 to 1. The variations in weight coefficients are illustrated in Figure 8 and Figure 9. Before using the convolutional network, some words that are less relevant to the question, such as “the”, “of”, and “to”, had relatively high weight coefficients. After employing the convolutional network, the weight coefficients of these words decreased, indicating that the model became more focused on information closely related to the question, such as “before”, “7”, “declare”, and “intention”. This demonstrates that the use of convolutional networks can better capture key information in the question, thereby enhancing the effectiveness of question answering.

4.5.5. Error Analysis

As shown in Table 8, in multi-granularity temporal knowledge graph question answering, the errors in the model’s answer predictions reflect its shortcomings in understanding questions and reasoning abilities. For instance, when addressing the question “When did Adji Otheth Ayassor first visit China?”, the model predicted multiple dates, including “27 April 2009” and “8 April 2009”, while the actual answer was “8 April 2009”. This situation indicates that the model failed to effectively identify the key temporal information in the question, leading it to choose among multiple candidate answers without accurately pinpointing the unique correct answer. Such a lack of understanding of temporal context may be a primary reason for the model’s poor performance on time-related questions. Moreover, the model also exhibits significant limitations in handling entity recognition and temporal reasoning. For example, in predictions involving multiple countries and individuals, the model failed to accurately identify the specific entity related to “Businessperson of Uzbekistan”, reflecting its confusion in processing multiple entities. Similarly, the model struggled to grasp the granularity of time when dealing with month information, resulting in incorrect date predictions.
Overall, these errors indicate that the model needs to enhance its contextual understanding and temporal reasoning capabilities in multi-granularity temporal knowledge graph question answering to improve overall answer accuracy. Future research could focus on improving the model’s sensitivity to temporal information and reasoning abilities, thereby enhancing its performance in complex question-answering scenarios.

4.5.6. Scalability and Computational Complexity

When the data are used for the first time, preprocessing is required; preprocessing the raw data takes approximately 13 h on average. After preprocessing, the MTQADC model together with the data occupies approximately 3.5 GB, and training the model takes an average of 2.5 h.
From Figure 10, it can be seen that as the size of the dataset increases, the total Hits@1 value shows a gradual upward trend, increasing from 0.24 at 20% to 0.342 at 100%. This indicates that the scale of the dataset has a significant impact on the model’s performance, as larger datasets can provide more information, thereby enhancing the model’s accuracy. Among the different data types, the “Entity” and “Single” types perform relatively well, particularly when the dataset reaches 100%, with Hits@1 values of 0.43 and 0.406, respectively. This demonstrates the model’s effectiveness in handling entity and single questions. In contrast, the improvement for the “Multiple” type is minimal, increasing from 0.156 at 20% to 0.185 at 100%, suggesting that the model’s performance gains are limited when addressing multiple questions. Although the Hits@1 value for the “Time” type also increases with the dataset size, it remains at a relatively low level.
Overall, the size of the dataset positively influences the model’s performance, especially in handling entity and single questions. Future research could further explore ways to enhance the model’s performance on multiple questions and time-related issues.

4.5.7. Broader Applicability and Generalization

TempoQR [6] is a temporal knowledge reasoning model that extracts textual features of questions, associates entity information from knowledge graphs, and parses time ranges through three core modules. It ultimately integrates these three types of information within a Transformer architecture for joint reasoning, enabling precise answers to complex questions involving temporal changes. TempoQR has developed two methods to recover time-related information specific to a given question. The first method, referred to as hard supervision, retrieves relevant entities and facts from the knowledge graph (TKG) to obtain temporal information. The second method, known as soft supervision, infers temporal information by analyzing the representation of the question, utilizing operations in the embedding space to recover missing temporal facts. Since the time granularity processed is in years, the multi-granularity time aggregation module in the MTQADC model was not utilized when handling the CronQuestions dataset. However, our model employs both soft and hard supervision from TempoQR to recover time-related information specific to the question. Table 9 presents the experimental results on the CronQuestions dataset.
The experimental results on the CronQuestions dataset indicate that the hard supervision method of MTQADC achieved a Hits@1 score of 0.875 and a Hits@10 score of 0.977, demonstrating its effectiveness in retrieving relevant answers. While the performances of both TempoQR and MTQADC’s hard supervision methods are commendable, they show limitations in handling complex questions, with Hits@1 scores of 0.864 and 0.789, respectively, suggesting that they may not fully meet the demands of certain complex queries. The soft supervision method of MTQADC achieved a Hits@1 score of 0.771 and a Hits@10 score of 0.958, which, although slightly lower, still showcases the potential of soft supervision in recovering time-related information.
In comparison with other models, MTQADC’s hard supervision method demonstrated strong performance in retrieving entity answers, with Hits@1 at 0.903 and Hits@10 at 0.980, reflecting its capability in entity recognition. Other models, such as TempoQR, also warrant attention for their performance in handling temporal answers; TempoQR’s hard supervision method achieved a Hits@1 score of 0.918 and a Hits@10 score of 0.978. Overall, the MTQADC model, by combining the soft and hard supervision methods from TempoQR, effectively recovers information related to specific time ranges. However, further optimization is needed to enhance its overall performance and adaptability, particularly in addressing complex questions and certain types of answers.

5. Conclusions and Future Work

By introducing temporal information, temporal knowledge graphs can accurately represent the occurrence time, duration, and interrelationships of events, thereby providing users with more accurate and relevant answers based on time series. This paper improves question-answering performance through data augmentation and convolutional feature learning. First, by generating different expressions for the same question, the robustness and generalization ability of the question-answering system are enhanced. Paraphrasing helps the system understand temporal semantic variations, enabling it to better handle diverse queries. Additionally, in the task of temporal knowledge graph question answering, convolution operations are used to mine deeper temporal features from raw data, capturing the changing relationships between entities at different time points. Convolution processing of questions gradually enhances the system’s understanding of complex temporal relationships, thereby improving the accuracy of the question-answering system. Experiments on public datasets show that the proposed model significantly outperforms existing models, and further experimental analysis demonstrates the functionality and effectiveness of each module in the model.
Future work will further optimize the temporal knowledge graph question-answering system: Firstly, the representation capability of temporal information will be enhanced to better capture complex time dependencies. Secondly, advanced model architectures will be utilized to improve the system’s reasoning ability and accuracy. Furthermore, the system’s application scope will be expanded to cover more dynamic scenarios, such as financial markets, healthcare, etc. Finally, the system’s ability to handle large-scale temporal data will be strengthened to improve its efficiency and response speed in real-world applications.

Author Contributions

Conceptualization, Y.L. and L.S.; methodology, Y.L.; software, L.W.; validation, D.J., L.S. and L.W.; formal analysis, L.S.; investigation, Y.L.; resources, Y.L.; data curation, L.S.; writing—original draft preparation, Y.L.; writing—review and editing, L.S.; visualization, L.W.; supervision, D.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 62166021), project “Research on Question Answering System Based on Deep Transfer Learning”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article, and further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to thank the anonymous reviewers for their valuable and helpful comments, which substantially improved this paper. We would also like to thank all of the editors for their professional advice and help.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhou, M.; Peng, S.; Yang, M.; Li, N.; Wang, H.; Qiao, L.; Mi, H.; Wen, Z.; Xu, T.; Liu, L. IIAS: An Intelligent Insurance Assessment System through Online Real-time Conversation Analysis. In Proceedings of the IJCAI, Montreal, QC, Canada, 19–27 August 2021; pp. 5036–5039. [Google Scholar]
  2. Tai, C.Y.; Huang, L.Y.; Huang, C.K.; Ku, L.W. User-centric path reasoning towards explainable recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 11–15 July 2021; pp. 879–889. [Google Scholar]
  3. Leblay, J.; Chekol, M.W. Deriving validity time in knowledge graph. In Proceedings of the Companion Proceedings of the Web Conference, Lyon, France, 23–27 April 2018; pp. 1771–1776. [Google Scholar]
  4. Jia, Z.; Abujabal, A.; Saha Roy, R.; Strötgen, J.; Weikum, G. Tempquestions: A benchmark for temporal question answering. In Proceedings of the Companion Proceedings of the Web Conference, Lyon, France, 23–27 April 2018; pp. 1057–1062. [Google Scholar]
  5. Saxena, A.; Chakrabarti, S.; Talukdar, P. Question Answering Over Temporal Knowledge Graphs. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual Event, 1–6 August 2021; pp. 6663–6676. [Google Scholar]
  6. Mavromatis, C.; Subramanyam, P.L.; Ioannidis, V.N.; Adeshina, A.; Howard, P.R.; Grinberg, T.; Hakim, N.; Karypis, G. Tempoqr: Temporal question reasoning over knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 22 February–1 March 2022; Volume 36, pp. 5825–5833. [Google Scholar]
  7. Chen, Z.; Liao, J.; Zhao, X. Multi-granularity temporal question answering over knowledge graphs. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, QC, Canada, 9–14 July 2023; pp. 11378–11392. [Google Scholar]
  8. Jia, Z.; Abujabal, A.; Saha Roy, R.; Strötgen, J.; Weikum, G. Tequila: Temporal question answering over knowledge bases. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 1807–1810. [Google Scholar]
  9. Ding, W.; Chen, H.; Li, H.; Qu, Y. Semantic Framework based Query Generation for Temporal Question Answering over Knowledge Graphs. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 1867–1877. [Google Scholar]
  10. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the NIPS’13: 27th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 26. [Google Scholar]
  11. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; Volume 28. [Google Scholar]
  12. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 20–22 June 2016; pp. 2071–2080. [Google Scholar]
  13. Nickel, M.; Tresp, V.; Kriegel, H.P. A three-way model for collective learning on multi-relational data. In Proceedings of the ICML, Bellevue, WA, USA, 28 June–2 July 2011; Volume 11, pp. 3104482–3104584. [Google Scholar]
  14. Yang, B.; Yih, S.W.t.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  15. Jain, P.; Rathi, S.; Mausam; Chakrabarti, S. Temporal Knowledge Base Completion: New Algorithms and Evaluation Protocols. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Virtual, 16–20 November 2020; pp. 3733–3747. [Google Scholar]
  16. Lacroix, T.; Obozinski, G.; Usunier, N. Tensor Decompositions for Temporal Knowledge Base Completion. arXiv 2020, arXiv:2004.04926. [Google Scholar]
  17. Shang, C.; Wang, G.; Qi, P.; Huang, J. Improving Time Sensitivity for Question Answering over Temporal Knowledge Graphs. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 8017–8026. [Google Scholar]
  18. Jia, Z.; Pramanik, S.; Saha Roy, R.; Weikum, G. Complex temporal question answering on knowledge graphs. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, New York, NY, USA, 1–5 November 2021; pp. 792–802. [Google Scholar]
  19. Jiao, S.; Zhu, Z.; Wu, W.; Zuo, Z.; Qi, J.; Wang, W.; Zhang, G.; Liu, P. An improving reasoning network for complex question answering over temporal knowledge graphs. Appl. Intell. 2023, 53, 8195–8208. [Google Scholar] [CrossRef]
  20. Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45. [Google Scholar]
  21. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. In Proceedings of the NIPS 2014 Workshop on Deep Learning, Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  22. Lai, G.; Chang, W.C.; Yang, Y.; Liu, H. Modeling long- and short-term temporal patterns with deep neural networks. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 95–104. [Google Scholar]
  23. Shih, S.Y.; Sun, F.K.; Lee, H.y. Temporal pattern attention for multivariate time series forecasting. Mach. Learn. 2019, 108, 1421–1441. [Google Scholar] [CrossRef]
  24. Kitaev, N.; Kaiser, L.; Levskaya, A. Reformer: The Efficient Transformer. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  25. Zhang, L.; Aggarwal, C.; Qi, G.J. Stock price prediction via discovering multi-frequency trading patterns. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 2141–2149. [Google Scholar]
  26. Vaswani, A. Attention is all you need. In Proceedings of the NIPS’17: 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  27. Hou, M.; Xu, C.; Li, Z.; Liu, Y.; Liu, W.; Chen, E.; Bian, J. Multi-granularity residual learning with confidence estimation for time series prediction. In Proceedings of the ACM Web Conference, Lyon, France, 25–29 April 2022; pp. 112–121. [Google Scholar]
  28. Févry, T.; Soares, L.B.; Fitzgerald, N.; Choi, E.; Kwiatkowski, T. Entities as Experts: Sparse Memory Access with Entity Supervision. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Virtual, 16–20 November 2020; pp. 4937–4951. [Google Scholar]
  29. Lan, Z. Albert: A lite bert for self-supervised learning of language representations. arXiv 2019, arXiv:1909.11942. [Google Scholar]
  30. Saxena, A.; Tripathi, A.; Talukdar, P. Improving multi-hop question answering over knowledge graphs using knowledge base embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual, 5–10 July 2020; pp. 4498–4507. [Google Scholar]
  31. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  32. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
Figure 1. Basic process of temporal knowledge graph question answering.
Figure 2. MTQADC model framework.
Figure 3. Data processing.
Figure 4. Multi-granularity temporal aggregation.
Figure 5. Convolutional feature learning module.
Figure 6. The impact of different dropout rates on the experiments: (a) Comparison of training loss. (b) Comparison of validation loss. (c) Comparison of test loss. (d) Comparison of experimental results.
Figure 7. Ablation study of multi-granularity time for different types of questions under the Hits@1 metric: (a) Ablation study for Equal types of questions. (b) Ablation study for Before/After types of questions. (c) Ablation study for Equal Multi types of questions. (d) Ablation study for overall questions.
Figure 8. The weight coefficients of the tokenized question processed by ALBERT.
Figure 9. The weight coefficients of the tokenized question processed by the CNN after ALBERT.
Figure 10. The impact of dataset size on model performance.
Table 1. MULTITQ dataset.

| Category | Subcategory | Train | Dev | Test |
|---|---|---|---|---|
| Single | Equal | 135,890 | 18,983 | 17,311 |
| | Before/After | 75,340 | 11,665 | 11,073 |
| | First/Last | 72,252 | 11,097 | 10,480 |
| Multiple | Equal Multi | 16,893 | 3213 | 3207 |
| | After First | 43,305 | 6499 | 6266 |
| | Before Last | 43,107 | 6532 | 6247 |
| Total | | 386,787 | 57,979 | 54,584 |
Table 2. CRONQUESTIONS dataset.

| | Train | Dev | Test |
|---|---|---|---|
| Entity Answer | 225,672 | 19,362 | 19,524 |
| Time Answer | 124,328 | 10,638 | 10,476 |
| Simple Entity | 90,651 | 7745 | 7812 |
| Simple Time | 61,471 | 5197 | 5046 |
| Before/After | 23,869 | 1982 | 2151 |
| First/Last | 118,556 | 11,198 | 11,159 |
| Time Join | 55,453 | 3878 | 3832 |
| Total | 350,000 | 30,000 | 30,000 |
Table 3. Characteristics of different methods.

| | ALBERT | EmbedKGQA | T5 | GPT-2 |
|---|---|---|---|---|
| Large Language Model | yes | yes | yes | yes |
| Embedding Operations | no | yes | yes | yes |
| Temporal Aggregation | no | no | yes | yes |
| Convolutional Network | no | no | no | no |
| Data Augmentation | no | no | no | no |
Table 4. Characteristics of different methods.

| | CronKGQA | CTRN | MultiQA | MTQADC |
|---|---|---|---|---|
| Large Language Model | yes | yes | yes | yes |
| Embedding Operations | yes | yes | yes | yes |
| Temporal Aggregation | yes | yes | yes | yes |
| Convolutional Network | no | no | no | yes |
| Data Augmentation | no | no | no | yes |
Table 5. Comparative experimental results on the MULTITQ dataset. (Multiple/Single: question type; Entity/Time: answer type.)

| Model | Hits@1 Overall | Hits@1 Multiple | Hits@1 Single | Hits@1 Entity | Hits@1 Time | Hits@10 Overall | Hits@10 Multiple | Hits@10 Single | Hits@10 Entity | Hits@10 Time |
|---|---|---|---|---|---|---|---|---|---|---|
| ALBERT | 0.108 | 0.086 | 0.116 | 0.139 | 0.032 | 0.484 | 0.415 | 0.512 | 0.589 | 0.228 |
| EmbedKGQA | 0.206 | 0.134 | 0.235 | 0.290 | 0.001 | 0.459 | 0.439 | 0.467 | 0.648 | 0.001 |
| T5 | 0.279 | 0.175 | 0.322 | 0.379 | 0.036 | 0.560 | 0.497 | 0.585 | 0.707 | 0.201 |
| GPT-2 | 0.283 | 0.161 | 0.333 | 0.362 | 0.091 | 0.586 | 0.489 | 0.625 | 0.702 | 0.304 |
| CronKGQA | 0.279 | 0.134 | 0.337 | 0.328 | 0.156 | 0.608 | 0.453 | 0.671 | 0.696 | 0.392 |
| CTRN | 0.307 | 0.177 | 0.359 | 0.387 | 0.110 | 0.611 | 0.507 | 0.653 | 0.723 | 0.338 |
| MultiQA | 0.293 | 0.159 | 0.347 | 0.349 | 0.157 | 0.635 | 0.519 | 0.682 | 0.733 | 0.396 |
| MTQADC | 0.342 | 0.185 | 0.406 | 0.430 | 0.128 | 0.633 | 0.532 | 0.674 | 0.742 | 0.369 |
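The Hits@1 and Hits@10 scores reported in Tables 5–7 and 9 follow the usual top-k evaluation convention: for each question, the candidate answers are ranked by model score, and a question counts as a hit if a gold answer appears among the top k candidates. The following is a minimal sketch of this metric, not the authors' evaluation code; the function name hits_at_k and the toy candidate lists are illustrative only.

```python
def hits_at_k(ranked_candidates, gold_answers, k):
    """Fraction of questions whose gold answer appears in the top-k ranked candidates.

    ranked_candidates: list of answer lists, one per question, best first.
    gold_answers: list of sets of acceptable answers, one per question.
    """
    hits = 0
    for candidates, gold in zip(ranked_candidates, gold_answers):
        if any(c in gold for c in candidates[:k]):
            hits += 1
    return hits / len(gold_answers)


# Toy example with two questions; the first is answered correctly at rank 1,
# the second only at rank 2, so Hits@1 = 0.5 and Hits@10 = 1.0.
preds = [["8 April 2009", "27 April 2009"], ["Iran", "Saudi Arabia"]]
golds = [{"8 April 2009"}, {"Saudi Arabia"}]
print(hits_at_k(preds, golds, 1))   # 0.5
print(hits_at_k(preds, golds, 10))  # 1.0
```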
Table 6. Comparison of multi-granularity time under the Hits@1 metric.

| Model | Equal Day | Equal Month | Equal Year | Before/After Day | Before/After Month | Before/After Year | Equal Multi Day | Equal Multi Month | Equal Multi Year |
|---|---|---|---|---|---|---|---|---|---|
| ALBERT | 0.069 | 0.082 | 0.132 | 0.221 | 0.277 | 0.308 | 0.103 | 0.144 | 0.144 |
| EmbedKGQA | 0.200 | 0.336 | 0.218 | 0.392 | 0.518 | 0.511 | 0.145 | 0.321 | 0.263 |
| T5 | 0.418 | 0.365 | 0.239 | 0.611 | 0.637 | 0.637 | 0.201 | 0.319 | 0.284 |
| GPT-2 | 0.49 | 0.33 | 0.247 | 0.556 | 0.572 | 0.599 | 0.173 | 0.289 | 0.28 |
| CronKGQA | 0.425 | 0.389 | 0.331 | 0.375 | 0.474 | 0.450 | 0.295 | 0.333 | 0.251 |
| CTRN | 0.522 | 0.337 | 0.288 | 0.592 | 0.611 | 0.604 | 0.192 | 0.293 | 0.296 |
| MultiQA | 0.445 | 0.393 | 0.350 | 0.379 | 0.548 | 0.525 | 0.308 | 0.321 | 0.283 |
| MTQADC | 0.619 | 0.376 | 0.314 | 0.640 | 0.707 | 0.723 | 0.160 | 0.308 | 0.313 |
Table 7. Ablation experiment results. (Multiple/Single: question type; Entity/Time: answer type.)

| Model | Hits@1 Overall | Hits@1 Multiple | Hits@1 Single | Hits@1 Entity | Hits@1 Time | Hits@10 Overall | Hits@10 Multiple | Hits@10 Single | Hits@10 Entity | Hits@10 Time |
|---|---|---|---|---|---|---|---|---|---|---|
| MTQADC | 0.342 | 0.185 | 0.406 | 0.430 | 0.128 | 0.633 | 0.532 | 0.674 | 0.742 | 0.369 |
| w/o Data Augmentation | 0.321 | 0.160 | 0.386 | 0.409 | 0.105 | 0.615 | 0.527 | 0.615 | 0.740 | 0.311 |
| w/o Convolutional Networks | 0.303 | 0.164 | 0.359 | 0.381 | 0.113 | 0.607 | 0.503 | 0.650 | 0.721 | 0.331 |
| w/o Temporal Aggregation | 0.320 | 0.175 | 0.378 | 0.417 | 0.083 | 0.607 | 0.518 | 0.644 | 0.735 | 0.298 |
Table 8. Error analysis.

| No. | Question | Answer | Predicted Answer | Answer Type | Time | qtype | qlabel |
|---|---|---|---|---|---|---|---|
| 1 | When did Adji Otheth Ayassor first visit China? | ‘8 April 2009’ | ‘27 April 2009’, ‘8 April 2009’, ‘Laos’ | time | day | first_last | Single |
| 2 | Who would wish to visit Oman on the same month of the Businessperson of Uzbekistan? | ‘Saudi Arabia’ | ‘Iran’, ‘Mahmoud Ahmadinejad’, ‘United Arab Emirates’ | entity | month | equal_multi | Multiple |
| 3 | In which month did Don McKinnon visit Swaziland? | ‘December 2006’ | ‘14 December 2006’, ‘7 November 2005’, ‘17 August 2009’ | time | month | equal | Single |
| 4 | Before China, with whom did the UN Security Council last express its willingness to negotiate? | ‘France’ | ‘Japan’, ‘Iran’, ‘China’ | entity | day | before_last | Multiple |
| 5 | Who was the first to praise China after Lawrence Cannon? | ‘Ma Ying Jeou’ | ‘Japan’, ‘South Korea’, ‘Vietnam’ | entity | day | after_first | Multiple |
Table 9. Comparative experimental results on the CRONQUESTIONS dataset. (Complex/Simple: question type; Entity/Time: answer type.)

| Model | Hits@1 Overall | Hits@1 Complex | Hits@1 Simple | Hits@1 Entity | Hits@1 Time | Hits@10 Overall | Hits@10 Complex | Hits@10 Simple | Hits@10 Entity | Hits@10 Time |
|---|---|---|---|---|---|---|---|---|---|---|
| EmbedKGQA | 0.288 | 0.286 | 0.290 | 0.411 | 0.057 | 0.672 | 0.632 | 0.725 | 0.850 | 0.341 |
| CronKGQA | 0.647 | 0.392 | 0.987 | 0.699 | 0.549 | 0.884 | 0.802 | 0.992 | 0.898 | 0.857 |
| TempoQR Soft | 0.799 | 0.655 | 0.990 | 0.876 | 0.653 | 0.957 | 0.930 | 0.993 | 0.972 | 0.929 |
| TempoQR Hard | 0.918 | 0.864 | 0.990 | 0.926 | 0.903 | 0.978 | 0.967 | 0.993 | 0.980 | 0.974 |
| MTQADC Soft | 0.771 | 0.607 | 0.989 | 0.847 | 0.628 | 0.958 | 0.931 | 0.990 | 0.974 | 0.926 |
| MTQADC Hard | 0.875 | 0.789 | 0.990 | 0.903 | 0.822 | 0.977 | 0.964 | 0.991 | 0.980 | 0.971 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
