Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering

Sha, Yuchen; Feng, Yujian; He, Miao; Liu, Shangdong; Ji, Yimu

doi:10.3390/math11153269

Open AccessArticle

Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering

by

Yuchen Sha

¹,

Yujian Feng

¹,

Miao He

²,

Shangdong Liu

²

and

Yimu Ji

^2,*

¹

School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

²

School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(15), 3269; https://doi.org/10.3390/math11153269

Submission received: 3 June 2023 / Revised: 14 July 2023 / Accepted: 19 July 2023 / Published: 25 July 2023

(This article belongs to the Special Issue Applications of Big Data Analysis and Modeling)

Download

Browse Figures

Versions Notes

Abstract

:

Existing knowledge graph (KG) models for commonsense question answering present two challenges: (i) existing methods retrieve entities related to questions from the knowledge graph, which may extract noise and irrelevant nodes, and (ii) there is a lack of interaction representation between questions and graph entities. However, current methods mainly focus on retrieving relevant entities with some noisy and irrelevant nodes. In this paper, we propose a novel retrieval-augmented knowledge graph (RAKG) model, which solves the above issues using two key innovations. First, we leverage the density matrix to make the model reason along the corrected knowledge path and extract an enhanced subgraph of the knowledge graph. Second, we fuse representations of questions and graph entities through a bidirectional attention strategy, in which two representations fuse and update using a graph convolutional network (GCN). To evaluate the performance of our method, we conducted experiments on two widely used benchmark datasets: CommonsenseQA and OpenBookQA. The case study gives insight into the finding that the augmented subgraph provides reasoning along the corrected knowledge path for question answering.

Keywords:

commonsense question answering; knowledge graph; graph convolutional network

MSC:

68T50; 03B65; 91F20

1. Introduction

Question answering (QA) has become one of the most popular downstream tasks in natural language processing (NLP) in recent years. QA tasks utilize large-scale pre-trained language models (LMs) to obtain token representations, exemplified by BERT [1], GPT [2], ELMo [3], and RoBERTa [4], which have all achieved remarkable success. Meanwhile, commonsense as external knowledge is essential for QA systems to predict the correct answer, which is natural knowledge for humans [5]. External knowledge is often incorporated in the form of ConceptNet [6] and Freebase [7], where nodes represent entities and edges represent the relationships between two entities [8,9]. Adding a knowledge graph can enhance the interpretability and credibility of prediction answers. For example, Figure 1 shows an example of commonsense question answering based on KG. In this example, given the question “Which is a good source of nutrients for a mushroom?”, the correct answer entity “a cut peony” is given along with some incorrect choice nodes. In order to use the KG to answer a question about “nutrients for a mushroom”, we need to look for entities that involve the following concepts: “nutrients” and “mushroom”. Moreover, we use entity linking to identify these entities and match them with the concepts in the knowledge graph (KG).

However, most existing models use the KG to obtain information, which increases the risk of retrieving irrelevant or noisy nodes and leads to difficulties in interaction representation between questions and graph entities. Therefore, extracting an enhanced knowledge graph subgraph has been shown to be more effective for obtaining an accurate reasoning path for question answering [10]. On the one hand, when extracting a relevant subgraph, some noisy nodes are often contained by simple semantic matching [11,12]. As the reasoning paths become more complex, the noisy nodes also change continuously. When noisy nodes are not discarded in a timely manner, the performance of the model in predicting answers will be negatively impacted [13,14]. On the other hand, in previous works [15,16], language models and knowledge graph models have existed as independent components, which resulted in a missing relationship between the question and graph entities. The limited interaction between language models and knowledge graph models is a major issue, which causes models to struggle with understanding complex question–knowledge relations [17,18,19].

To address the above two issues, we propose a novel retrieval-augmented knowledge graph (RAKG) architecture, which makes the model reason using the corrected knowledge graph for question answering. First, the RAKG model extracts the most relevant subgraphs by utilizing the density matrix, which affects the weights of neighborhood nodes during training. Second, our proposed model utilizes the bidirectional attention strategy to fuse the representations of questions and knowledge graph entities.

Specifically, the RAKG model has two major steps: (i) We concatenate the given QA pair to obtain the representations, and we extract the KG in the form of ConceptNet in order to obtain the graph embeddings, which removes any irrelevant nodes to ensure an appropriate reasoning path. We then compute the inner product of node representations and build direct neighborhoods based on the density matrix. This can be seen as a way to capture the importance of question–entity pairs in the subgraph. (ii) Given the question and the retrieved subgraphs, RAKG obtains initialized representations of both the question and graph entities by using a graph convolutional network and language models, respectively. In addition, to make the model fuse the representations of the question and graph entities, we further incorporate a bidirectional attention strategy between the language model and the knowledge graph model to bridge the gap between the question and graph entity representations.

In summary, the contributions of this work are as follows:

(1): We propose a novel RAKG model with a retrieval-augmented KG subgraph for question answering. The augmented subgraph is extracted using the density matrix, which removes any irrelevant nodes at each layer of the RAKG.
(2): Our model utilizes a bidirectional attention strategy to effectively integrate the representations of both language models and knowledge graphs. Moreover, we use R-dropout to prevent overfitting and improve model generalization.
(3): The experimental results show that the proposed RAKG achieves better performance than several baselines on the CommonsenseQA and OpenbookQA benchmarks.

2. Related Work

2.1. Knowledge-Graph-Based Question Answering

Knowledge-graph-based question answering (KG-QA) is a natural language processing task that predicts the answer by retrieving relevant nodes from the knowledge graph. The task has been studied in recent years [20,21]. In order to utilize external knowledge, KGs are widely used in QA tasks [22,23]. Since their implementation, researchers have made significant progress in developing KG-QA systems. The method has shown significant improvement over previous methods in terms of both accuracy and interpretability.

However, given that KGs are usually large and complex, they contain many nodes that are irrelevant to the question [24]. Moreover, with the large LMs, the relationship complexity of reasoning will increase. To address this issue, retrieving a KG subgraph based on a variety of methods has been proposed [25,26]. Some works [14,27] leverage KG subgraphs for commonsense question answering, obtaining multi-hop relationships for reasoning paths. Furthermore, the additional node contributes to the construction of relationships between the language model and knowledge graph model, while [13] adds the question node to the KG subgraph structure. Therefore, the graph representations can learn more information from question nodes and graph entities. However, the above methods do not integrate the semantic information between questions and knowledge graph representations well, so we mainly focus on this point.

2.2. Graph Convolutional Network

The graph convolutional network (GCN) [28] is a type of neural network that operates on graphs and has become increasingly popular in recent years. The use of GCN allows for the modeling of complex relationships and dependencies between nodes in a graph [29]. For QA tasks, GCN have been utilized for KG complex relationships between entities and provide more accurate answers, which are achieved by using GCN to encode text and construct a graph based on the entities mentioned in the KG [30]. Moreover, R-GCN is an extension of GCN that deals with a large number of relationships with multiple graphs [18].

Recent research on GCN has focused on expanding its ability to handle different types of graph data, such as directed GCN [31] and dynamic GCN [17]. For example, the dynamic GCN is capable of handling graphs with varying structures and sizes, as well as temporal data and time-varying graphs. Moreover, the dynamic GCN can adapt to changing graphs by learning the graph structure from the KG, enabling it to identify the most relevant nodes and edges for the QA task. Meanwhile, there have been other works that focus on combining GCN with other methods, such as graph attention network (GAT) [32] and graph adversarial training [33], which use the self-attention and multi-head attention strategies to learn the node representations. In contrast to previous research, our approach utilizes a static R-GCN model, but leverages a density matrix and a bidirectional attention strategy to weight the neighborhood nodes in the KG.

3. Methodology

The retrieval-augmented KG model is composed of four main parts, as shown in Figure 2: the language context encoder, knowledge graph subgraph extraction, RAKG module, and answer prediction.

3.1. Task Formulation

Given a question q and a set of candidate answers

\{a_{1}, a_{2}, \dots, a_{n}\}

, the goal of the RAKG model is to identify the most plausible answer among a set of N candidate answers

a_{n}

. Our proposed method provides the question q and an external knowledge graph (KG) as inputs to the model. A KG is denoted as

K G = (V, E)

, where V represents entities in the knowledge graph, and E represents edges between entity1 and entity2.

3.2. Language Context Encoder

We have chosen RoBERTa [14] as the backbone for our model. Our proposed end-to-end question answering system utilizes RoBERTa to perform token representations on concatenated question–answer pairs, allowing for a streamlined one-pass approach.

Given an input question q of length

|L|

, we concatenate the question q and the answer

a_{n}

together in the format of

[[C L S]; q; [S E P]; a_{n}]

to form the language context L, where

[C L S]

and

[S E P]

are special tokens utilized by large-scale pre-trained language models. The input L is provided to the encoder

f_{e n c}

of the pre-trained LMs to generate a list of token representations. The representations of tokens Q sent to the RAKG module, in order to be further integrated with the graph entity representations, are calculated as follows:

Q = σ (f_{s} (f_{e n c} (L)))

(1)

where

σ

is the activation function,

f_{s}

represents a linear transformation, and

f_{e n c}

encodes L into vectors.

3.3. KG Subgraph Extraction

We develop an end-to-end entity-linking system, which takes a question–answer pair as input and links entities in the question to the knowledge graph. For the KG, we utilize ConceptNet [6], a general-domain knowledge graph that has multiple semantic relational edges.

To construct the subgraph for each example, we follow a previous approach [13] by selecting entities with an exact match between n-gram tokens and ConceptNet corpus using some normalization rules. We expand the subgraph by adding another set of entities based on the two-hop [14] reasoning path in the KG from the current entities in the subgraph. Additionally, we include the question as a separate node in the subgraph

K G_{s u b}

and connect it to entities in the question that are relevant to the answer.

3.4. Retrieval-Augmented Knowledge Graph Module

After obtaining token representations through the LM encoder, we further use the model to obtain entity representations in the subgraph. First, we utilize the RoBERTa model to obtain an initialization representation for each entity

X^{0} = {\{x_{n}^{0}\}}_{n = 0}^{|V|}

, and then we use an R-GCN [18] network to update the entity node representation through an iterative message passing update, calculated as follows:

\{{\tilde{x}}_{1}^{l}, \dots, {\tilde{x}}_{|V|}^{l}\} = R - G C N (\{{\tilde{x}}_{1}^{l - 1}, \dots, x_{|V|}^{l - 1}\})

(2)

where

|V|

represents the graph entity node sizes, subtracting 1 due to the question node.

To provide a more explicit representation of the reasoning path, we utilize the density matrix to re-evaluate the significance of neighboring nodes. Specifically, we propose the RAKG model, in which we use the neighbor density matrix to score the relevance of neighbor nodes. For each node, we use the inner product to compute the density matrix

M_{m a t}

, and the density matrix changes while the representations are computed at each graph layer. In our proposed RAKG model, the forward-pass message passes updates for the graph entity nodes and question nodes.

Finally, we use

M_{m a t}^{(l + 1)}

to multiply the node representation to learn the weight of edges based on the density matrix between the nodes during the message passing. The updated node representations are computed as follows:

h^{(l + 1)} = σ (M_{m a t}^{(l + 1)} \cdot h^{^{'} (l + 1)} \cdot W_{g})

(3)

where

σ

represents activation function, and

W_{g}

represents a hyperparameter. Furthermore, we update the question node representation in a similar way to that above.

In addition, we employ a bidirectional attention strategy to facilitate interaction between the pre-trained language model and the knowledge graph, computing a pair-wise matching matrix

M_{p a i r}

. The computation of the attention strategy from the language model to the knowledge graph is shown as

S_{q_{i}}^{l} = s o f t m a x (M_{p a i r})

(4)

and the computation of the attention strategy knowledge graph to the language model is shown as

S_{x_{j}}^{l} = s o f t m a x (M_{p a i r}^{T})

(5)

We obtain vector representations for both entities and questions, and fuse them using a bidirectional attention strategy with a concatenated matrix. The attended features are then compressed into a low-dimensional space. This approach helps to further clarify the reasoning path and minimizes the number of irrelevant entity nodes for the given question. Specifically, we use a bidirectional attention strategy to choose the top K relevance nodes on the above basis. Specifically, we define a retention ratio

⌈K \cdot |V|⌉ K \in (0, 1]

, which represents the relevance

⌈K \cdot |V|⌉

nodes to be retained. We choose the top-ranking nodes according to the value of the bidirectional attention strategy.

3.5. Answer Prediction

In this system, the query representation and the graph representation are combined to compute a score for a given answer. The query representation is obtained after N layers of iteration, where the query information and knowledge graph information are fused together. Similarly, the graph representation is obtained from the KG subgraph representation pooled from the last graph layer. The score of a candidate answer is computed as the dot product of the mean pooled query representation and the pooled KG subgraph representation:

p = (a | L, K G_{s u b}) = M L P ([s; g])

(6)

where s is the mean pooling of query representation, including token representations

h_{L}

and question representations

h_{Q}

, and g is the KG subgraph representation pooled from the last graph layer. The scoring function is used to determine the similarity between the query and the candidate answer, based on their representations in the knowledge graph. We obtain the final probability by normalizing all question–choice pairs with softmax and use R-Dropout [34] to regularize the end-to-end model. R-Dropout decreases the randomness introduced by the original dropout by minimizing the KL-divergence loss between the output distributions of the two sub-models sampled by the dropout.

4. Experimental Setup

4.1. Datasets

We evaluated our model on two different benchmarks for commonsense reasoning, CommonsenseQA [35] and OpenbookQA [36], which are widely used benchmarks. We used the same data split for both benchmarks and applied ConceptNet as an external knowledge graph for both benchmarks.

CommonsenseQA [35] is a benchmark for evaluating the ability of computer systems to understand and reason about commonsense knowledge. With over 32,000 questions and answers covering everyday scenarios and events, it requires systems to possess commonsense reasoning capabilities to accurately answer the questions. The dataset is unique in its focus on testing the ability of systems to reason through multiple choice questions.

OpenbookQA [36] is based on open text sources, with questions sourced from foundational science textbooks covering subjects such as physics, biology, and chemistry. The dataset contains over 5000 questions and is unique in that it focuses on testing systems’ ability to answer questions that require reasoning beyond simple fact retrieval. Each question is accompanied by a multiple-choice set of answers, making it a useful benchmark for evaluating natural language understanding and reasoning models.

The goal of the task for these two datasets is to select the most appropriate answer for a given question. During our training process, the data cleaning was performed to ensure that each question had at least one correct answer. Table 1 gives the experimental data scale and split of the two datasets. Two commonly used rank-based metrics, IHdev accuracy and IHtest accuracy, are used to evaluate the performance of the model on the same task and dataset.

4.2. Implementation Details

The RAKG model is implemented by Pytorch. We utilized the pre-trained RoBERTa [14] model to encode our information. Our end-to-end architecture was trained using R-Dropout [34] and the RAdam optimizer [37]. The batch size was set to 16 and the maximum text input sequence length was set to 128. We employed a five-layer graph neural module in our experiments, and its impact is described in Section 5.2. The learning rate for the language model was set to

10^{- 5}

and the learning rate for the graph module was set to

10^{- 3}

. We trained our model for 100 epochs and the best model obtained in the dev set was used to evaluate the test set. Each model was trained on a single GPU (3090Ti).

4.3. Baselines

To assess the effectiveness of the proposed approach, we conducted experiments using three baseline methods that rely on multiple-choice question answering tasks. A comprehensive comparison with a wide range of models was made. The following baselines methods were included:

Roberta-large [38] is a common baseline for multi-choice question answering tasks. It does not rely on any extra knowledge, such as ConceptNet.

RN [39] is designed to capture the core common properties for relation reasoning.

KagNET [27] is a path-based model that encodes multi-hop relationships extracted from knowledge graphs using an LSTM sequence model.

MHGRN [14] is a powerful baseline. The MHGRN model proposes a scalable method for knowledge-aware question answering that uses multi-hop relational reasoning. The model uses graph neural networks to perform multi-hop reasoning on a knowledge graph and integrates language models to handle NLP tasks.

QA-GNN [13] is a novel approach, which integrates language models and knowledge graphs for multi-hop question answering. The method leverages the strengths of both models, using the language model to generate the initial representations of the query and the knowledge graph to refine the representations through graph neural networks.

5. Results and Analysis

5.1. Main Results

In this section, we compare our model with the baseline models and methods, as well as the ranking results of CommonsenseQA and OpenbookQA. We also provide an analysis of the components and features of the model. Table 2 and Table 3 illustrate the main comparison results with baselines, respectively.

Table 2 shows the performance of different models on the CommonsenseQA benchmark. KagNet and RN are strong baselines. We can observe that RAKG outperforms other methods. Specifically, our method outperforms RoBERTa-large by 3.7% and RN by 2.2% on the CommonsenseQA benchmark. The boost over RN suggests that RAKG performs a better extraction of the KG subgraph for the language model and knowledge graph model. We use the same RoBERTa-large model as the backbone for training our model to ensure a fair comparison with other models. This result shows the effectiveness of our RAKG model architecture. Additionally, Table 3 shows the performance on the OpenbookQA. Our method performance improves by 8% over R-GCN and 4.1% over QA-GNN.

5.2. Ablation Studies

Impacts of RAKG components. We conducted an ablation study on the CommonsenseQA development benchmark to evaluate the effectiveness of different components of RAKG, as presented in Table 4. First, we removed the KG subgraph, and our model without the KG subgraph obtained 69.6% on the CommonsenseQA. This result indicates that while the language model can provide answers to the questions, the accuracy is relatively low. Our model, which incorporates the KG subgraph and density matrix, achieved an accuracy of 75.4%. This result demonstrates the usefulness of the density matrix in helping the model to identify more relevant neighbor nodes, improving the learning of node representations, and ultimately achieving better performance. Moreover, by incorporating a bidirectional attention strategy module, our method accuracy improved to 76.2%. We further enhanced the model’s generalization capability by incorporating R-Dropout, resulting in an accuracy rate of 76.7%. These results show the effectiveness of our approach and suggest that it has the potential for broader application.

Impact of number of RAKG Layers. In our RAKG model, each layer of the graph is composed of numerous nodes that represent various features or patterns of the input data. However, not all nodes are equally important for predicting the output. The number of graph layers has an impact on the RAKG architecture because the density matrix is based on GCN. As shown in Figure 3, we investigate the impact of the number of RAKG layers. The results indicate that as the number of graph layers increases, the accuracy rate also increases, peaking at five layers. However, after the fifth layer, the accuracy gradually decreases due to model overfitting. Therefore, our model sets the number of graph layers to five.

Impact of KL-divergence loss weight in R-Dropout. We evaluate the influence of KL-divergence loss weight. As shown in Table 5, we conducted an experiment where we set the weights of KL to take the values of 0.5, 0.7, 0.9, 1, and 5. Through experimentation, we found that as the weights become larger, the peak value of 76.7% is reached at the weight value of 0.7, but then the accuracy starts to decrease. By adjusting the KL-divergence loss weight, we can effectively control the complexity of our model and prevent model overfitting. It is worth noting that the optimal weight value for kl-divergence loss may vary depending on the specific task, dataset size, and model architecture. Therefore, it is important to choose the KL-divergence loss weight in R-Dropout carefully to avoid model overfitting and achieve better performance. Therefore, we use

α

= 0.7 to ensure the best performance of our model.

Model Interpretability. In order to better understand how our model arrives at its answers, we can analyze the reasoning path it takes. Figure 4 provides two examples from the CommonsenseQA, where our model provides the correct reasoning path and answers the question accurately. Each example consists of a question and five candidate answers. Our method links the entities in the question to those in the answer options, and uses these entities to find the most likely reasoning path between the question and the correct answer. Our model is able to capture the potential relationships between the candidate answers and the question. By analyzing the relationships between entities and making connections across them, our model can effectively reason and make accurate predictions. This is particularly important in question answering tasks where the answer may not be immediately obvious and requires the ability to reason and draw connections between different pieces of information. In summary, our model’s ability to identify and analyze reasoning paths between entities is a critical component of its success in answering questions. By leveraging this capability, our model can accurately reason and arrive at the correct answers even when faced with complex and ambiguous questions.

6. Conclusions

In this paper, we propose a novel retrieval-augmented knowledge graph (RAKG) model, which retrieves entities from the external source of knowledge in the form of a knowledge graph. Our key innovations are as follows: (i) the use of a density matrix, where we compute the relevance of KG neighborhood relationships to remove irrelevant nodes at each layer of RAKG, and (ii) the use of a bidirectional attention strategy, where we integrate the representations of both questions and knowledge graphs to obtain semantic information. Moreover, R-dropout can prevent model overfitting and improve model generalization. Through both quantitative and qualitative analyses, our results on CommonsenseQA and OpenBookQA demonstrate the superiority of RAKG over baseline methods using the KG, as well as its strong performance in carrying out complex reasoning paths.

Author Contributions

Ideas, writing—review and editing, Y.S. and Y.F.; derivation and simulation experiments, writing—original draft, Y.S. and M.H.; funding acquisition, Y.J. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 62176264), the Natural Science Foundation of Jiangsu Province (Higher Education Institutions, No. 20KJA520001), the Open Research Project of Zhejiang Lab (No. 2021KF0AB05), and the Postgraduate Research & Practice Innovation Program of the Jiangsu Province (KYCX23_1049).

Data Availability Statement

The data supporting the findings of this study are available from github.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

KG	Knowledge graph.
QA	Question answering.
GCN	Graph convolutional network.
KG-QA	Knowledge graph-based question answering.

References

Li, J.; Niu, L.; Zhang, L. From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 21273–21282. [Google Scholar]
Liu, J.; Hallinan, S.; Lu, X.; He, P.; Welleck, S.; Hajishirzi, H.; Choi, Y. Rainier: Reinforced knowledge introspector for commonsense question answering. arXiv 2022, arXiv:2210.03078. [Google Scholar]
Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep Contextualized Word Representations. arXiv 2018, arXiv:1802.05365v2. [Google Scholar]
Zhan, X.; Huang, Y.; Dong, X.; Cao, Q.; Liang, X. PathReasoner: Explainable reasoning paths for commonsense question answering. Knowl.-Based Syst. 2022, 235, 107612. [Google Scholar] [CrossRef]
Seo, J.; Oh, D.; Eo, S.; Park, C.; Yang, K.; Moon, H.; Park, K.; Lim, H. PU-GEN: Enhancing generative commonsense reasoning for language models with human-centered knowledge. Knowl.-Based Syst. 2022, 256, 109861. [Google Scholar] [CrossRef]
Speer, R.; Chin, J.; Havasi, C. Conceptnet 5.5: An open multilingual graph of general knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31, pp. 1–8. [Google Scholar]
Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 9–12 June 2008; pp. 1247–1250. [Google Scholar]
Bhargava, P.; Ng, V. Commonsense knowledge reasoning and generation with pre-trained language models: A survey. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 22 February–1 March 2022; Volune 36, pp. 12317–12325. [Google Scholar]
Qiao, Z.; Ye, W.; Zhang, T.; Mo, T.; Li, W.; Zhang, S. Exploiting Hybrid Semantics of Relation Paths for Multi-hop Question Answering over Knowledge Graphs. arXiv 2022, arXiv:2209.00870. [Google Scholar]
Zhang, Q.; Chen, S.; Fang, M.; Chen, X. Joint reasoning with knowledge subgraphs for Multiple Choice Question Answering. Inf. Process. Manag. 2023, 60, 103297. [Google Scholar] [CrossRef]
Zhang, J.; Zhang, X.; Yu, J.; Tang, J.; Tang, J.; Li, C.; Chen, H. Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering. Assoc. Comput. Linguist. 2022, 5773–5784. [Google Scholar]
Cui, L.; Zhu, W.; Tao, S.; Case, J.T.; Bodenreider, O.; Zhang, G.Q. Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT. J. Am. Med. Inform. Assoc. 2017, 24, 788–798. [Google Scholar] [CrossRef] [Green Version]
Yasunaga, M.; Ren, H.; Bosselut, A.; Liang, P.; Leskovec, J. QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering. arXiv 2021, arXiv:2104.06378. [Google Scholar]
Feng, Y.; Chen, X.; Lin, B.Y.; Wang, P.; Yan, J.; Ren, X. Scalable multi-hop relational reasoning for knowledge-aware question answering. arXiv 2020, arXiv:2005.00646. [Google Scholar]
Zhang, Y.; Jin, L.; Li, X.; Wang, H. Edge-Aware Graph Neural Network for Multi-Hop Path Reasoning over Knowledge Base. Comput. Intell. Neurosci. 2022, 2022, 4734179. [Google Scholar] [CrossRef] [PubMed]
Ren, H.; Dai, H.; Dai, B.; Chen, X.; Zhou, D.; Leskovec, J.; Schuurmans, D. Smore: Knowledge graph completion and multi-hop reasoning in massive knowledge graphs. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 1472–1482. [Google Scholar]
Zheng, C.; Kordjamshidi, P. Dynamic Relevance Graph Network for Knowledge-Aware Question Answering. arXiv 2022, arXiv:2209.09947. [Google Scholar]
Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, 3–7 June 2018, Proceedings 15; Springer International Publishing: Cham, Switzerland, 2018; pp. 593–607. [Google Scholar]
Wang, X.; Kapanipathi, P.; Musa, R.; Yu, M.; Talamadupula, K.; Abdelaziz, I.; Chang, M.; Fokoue, A.; Makni, B.; Mattei, N.; et al. Improving natural language inference using external knowledge in the science questions domain. In Proceedings of the Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volune 33, pp. 7208–7215. [Google Scholar]
Zhang, Q.; Chen, S.; Xu, D.; Cao, Q.; Chen, X.; Cohn, T.; Fang, M. A survey for efficient open domain question answering. arXiv 2022, arXiv:2211.07886. [Google Scholar]
Zhang, Y.; Dai, H.; Kozareva, Z.; Smola, A.; Song, L. Variational reasoning for question answering with knowledge graph. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orlean, LA, USA, 2–7 February 2018; Volune 32. [Google Scholar]
Abacha, A.B.; Zweigenbaum, P. MEANS: A medical question-answering system combining NLP techniques and semantic Web technologies. Inf. Process. Manag. 2015, 51, 570–594. [Google Scholar] [CrossRef]
Li, Z.; Zhong, Q.; Yang, J.; Duan, Y.; Wang, W.; Wu, C.; He, K. DeepKG: An end-to-end deep learning-based workflow for biomedical knowledge graph extraction, optimization and applications. Bioinformatics 2022, 38, 1477–1479. [Google Scholar] [CrossRef] [PubMed]
Tiddi, I.; Schlobach, S. Knowledge graphs as tools for explainable machine learning: A survey. Artif. Intell. 2022, 302, 103627. [Google Scholar] [CrossRef]
Chen, M.; Zhang, W.; Zhu, Y.; Zhou, H.; Yuan, Z.; Xu, C.; Chen, H. Meta-knowledge transfer for inductive knowledge graph embedding. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 927–937. [Google Scholar]
Howard, P.; Ma, A.; Lal, V.; Simoes, A.P.; Korat, D.; Pereg, O.; Wasserblat, M.; Singer, G. Cross-Domain Aspect Extraction using Transformers Augmented with Knowledge Graphs. In Proceedings of the ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 780–790. [Google Scholar]
Lin, B.Y.; Chen, X.; Chen, J.; Ren, X. Kagnet: Knowledge-aware graph networks for commonsense reasoning. arXiv 2019, arXiv:1909.02151. [Google Scholar]
Gao, H.; Wang, Z.; Ji, S. Large-Scale Learnable Graph Convolutional Networks. arXiv 2018, arXiv:1808.03965. [Google Scholar]
Parisot, S.; Ktena, S.I.; Ferrante, E.; Lee, M.; Moreno, R.G.; Glocker, B.; Rueckert, D. Spectral graph convolutions for population-based disease prediction. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2017: 20th International Conference, Quebec City, QC, Canada, 11–13 September 2017, Proceedings, Part III; Springer International Publishing: Cham, Switzerland, 2017; pp. 177–185. [Google Scholar]
Therasa, M.; Mathivanan, G. ARNN-QA: Adaptive Recurrent Neural Network with feature optimization for incremental learning-based Question Answering system. Appl. Soft Comput. 2022, 124, 109029. [Google Scholar] [CrossRef]
Ma, L.; Zhang, P.; Luo, D.; Zhu, X.; Zhou, M.; Liang, Q.; Wang, B. Syntax-based graph matching for knowledge base question answering. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, 23–27 May 2022; pp. 8227–8231. [Google Scholar]
Shan, Y.; Che, C.; Wei, X.; Wang, X.; Zhu, Y.; Jin, B. Bi-graph attention network for aspect category sentiment classification. Knowl.-Based Syst. 2022, 258, 109972. [Google Scholar] [CrossRef]
Li, J.; Peng, J.; Chen, L.; Zheng, Z.; Liang, T.; Ling, Q. Spectral Adversarial Training for Robust Graph Neural Network. IEEE Trans. Knowl. Data Eng. 2022, 1, 1–14. [Google Scholar] [CrossRef]
Wu, L.; Li, J.; Wang, Y.; Meng, Q.; Qin, T.; Chen, W.; Zhang, M.; Liu, T.Y. R-drop: Regularized dropout for neural networks. Adv. Neural Inf. Process. Syst. 2021, 34, 10890–10905. [Google Scholar]
Talmor, A.; Herzig, J.; Lourie, N.; Berant, J. CommonsenseQA: A question answering challenge targeting commonsense knowledge. arXiv 2018, arXiv:1811.00937. [Google Scholar]
Mihaylov, T.; Clark, P.; Khot, T.; Sabharwal, A. Can a suit of armor conduct electricity? a new dataset for open book question answering. arXiv 2018, arXiv:1809.02789. [Google Scholar]
Liu, L.; Jiang, H.; He, P.; Chen, W.; Liu, X.; Gao, J.; Han, J. On the variance of the adaptive learning rate and beyond. arXiv 2019, arXiv:1908.03265. [Google Scholar]
Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
Santoro, A.; Raposo, D.; Barrett, D.G.; Malinowski, M.; Pascanu, R.; Battaglia, P.; Lillicrap, T. A simple neural network module for relational reasoning. arXiv 2017, arXiv:1706.01427. [Google Scholar]

Figure 1. An example of QA tasks (question and five candidate choices). * represents the correct answer. Our approach involves analyzing the question entity and incorporating KGs to predict the answer.

Figure 2. Our proposed RAKG model, composed of the language context encoder module, KG subgraph extraction module, retrieval-augmented KG module and answer prediction module. The

[C L S]

token is placed at the beginning of the first sentence, while the

[S E P]

token is used to separate the two input sentences. The red circle represents the question node. The yellow circle represents entity nodes mentioned in the question. The blue circle represents the answer node.

Figure 2. Our proposed RAKG model, composed of the language context encoder module, KG subgraph extraction module, retrieval-augmented KG module and answer prediction module. The

[C L S]

token is placed at the beginning of the first sentence, while the

[S E P]

token is used to separate the two input sentences. The red circle represents the question node. The yellow circle represents entity nodes mentioned in the question. The blue circle represents the answer node.

Figure 3. Impact of number of RAKG layers. We show IHdev Accuracy of RAKG on CommonsenseQA.

Figure 4. Case study on model interpretability. * represents the correct answer. We present two examples from CommonsenseQA with the reasoning paths in RAKG.

Table 1. Statics of CommonsenseQA (CSQA) and OpenBookQA (OBQA).

Dataset	Train	Dev	Test	Choices
CSQA	9741	1221	1140	5
OBQA	4957	500	500	4

Table 2. Dev accuracy and test accuracy of various models on the CommonsenseQA benchmark.

Methods	IHdev-Acc. (%)	IHtest-Acc. (%)
RoBERTa-large [38]	73.07	68.69
+R-GCN [18]	72.69	68.41
+GconAttn [19]	72.61	68.59
+KagNet [27]	73.47	69.01
+RN [39]	74.57	69.08
+MHGRN [14]	74.45	71.11
+QA-GNN [13]	76.50	73.40
+RAKG (Ours)	76.74	73.51

Table 3. Dev accuracy and test accuracy of various models on the OpenbookQA benchmark.

Methods	IHdev-Acc. (%)	IHtest-Acc. (%)
RoBERTa-large [38]	66.7	64.8
+R-GCN [18]	65.0	62.4
+GconAttn [19]	64.5	61.9
+RN [39]	66.8	65.2
+MHGRN [14]	68.1	66.8
+QA-GNN [13]	68.9	67.8
+DRGN [17]	70.1	69.6
+RAKG (Ours)	73.0	77.0

Table 4. Ablation study on model components using RoBERTa-large as the text encoder. We report the IHdev accuracy on CommonsenseQA.

Methods	IHdev-Acc. (%)
RAKG (N = 5) w/o KG subgraph	69.6
+KG subgraph	72.6
+density matrix	75.4
+bidirectional attention	76.2
+R-Dropout	76.7

Table 5. Impact of the KL-divergence loss weight in R-Dropout.

$α$	RAKG + R-Dropout (%)
$α$ = 0.5	74.5
$α$ = 0.7	76.7
$α$ = 0.9	76.3
$α$ = 1.0	75.6
$α$ = 5.0	75.3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sha, Y.; Feng, Y.; He, M.; Liu, S.; Ji, Y. Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering. Mathematics 2023, 11, 3269. https://doi.org/10.3390/math11153269

AMA Style

Sha Y, Feng Y, He M, Liu S, Ji Y. Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering. Mathematics. 2023; 11(15):3269. https://doi.org/10.3390/math11153269

Chicago/Turabian Style

Sha, Yuchen, Yujian Feng, Miao He, Shangdong Liu, and Yimu Ji. 2023. "Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering" Mathematics 11, no. 15: 3269. https://doi.org/10.3390/math11153269

APA Style

Sha, Y., Feng, Y., He, M., Liu, S., & Ji, Y. (2023). Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering. Mathematics, 11(15), 3269. https://doi.org/10.3390/math11153269

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering

Abstract

1. Introduction

2. Related Work

2.1. Knowledge-Graph-Based Question Answering

2.2. Graph Convolutional Network

3. Methodology

3.1. Task Formulation

3.2. Language Context Encoder

3.3. KG Subgraph Extraction

3.4. Retrieval-Augmented Knowledge Graph Module

3.5. Answer Prediction

4. Experimental Setup

4.1. Datasets

4.2. Implementation Details

4.3. Baselines

5. Results and Analysis

5.1. Main Results

5.2. Ablation Studies

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI