1. Introduction
Question answering (QA) has become one of the most popular downstream tasks in natural language processing (NLP) in recent years. QA systems utilize large-scale pre-trained language models (LMs) to obtain token representations, exemplified by BERT [1], GPT [2], ELMo [3], and RoBERTa [4], which have all achieved remarkable success. Meanwhile, commonsense, which comes naturally to humans, is essential external knowledge for QA systems to predict the correct answer [5]. External knowledge is often incorporated in the form of ConceptNet [6] and Freebase [7], where nodes represent entities and edges represent the relationships between pairs of entities [8,9]. Adding a knowledge graph (KG) can enhance the interpretability and credibility of predicted answers. For example, Figure 1 shows an example of commonsense question answering based on a KG. Given the question "Which is a good source of nutrients for a mushroom?", the correct answer entity "a cut peony" is given along with some incorrect choice nodes. To use the KG to answer a question about "nutrients for a mushroom", we need to look for entities that involve the concepts "nutrients" and "mushroom". We then use entity linking to identify these entities and match them with the concepts in the KG.
However, most existing models retrieve information from the full KG, which increases the risk of retrieving irrelevant or noisy nodes and makes it difficult to represent the interaction between questions and graph entities. Extracting an enhanced KG subgraph has therefore been shown to be more effective for obtaining an accurate reasoning path for question answering [10]. On the one hand, when a relevant subgraph is extracted by simple semantic matching, it often contains noisy nodes [11,12]. As the reasoning paths become more complex, the noisy nodes also change continuously; when they are not discarded in a timely manner, the model's answer-prediction performance suffers [13,14]. On the other hand, in previous works [15,16], the language model and the knowledge graph model have existed as independent components, so the relationship between the question and graph entities is lost. This limited interaction is a major issue that causes models to struggle with understanding complex question–knowledge relations [17,18,19].
To address the above two issues, we propose a novel retrieval-augmented knowledge graph (RAKG) architecture, which enables the model to reason over a refined knowledge graph for question answering. First, the RAKG model extracts the most relevant subgraphs by utilizing a density matrix, which adjusts the weights of neighborhood nodes during training. Second, our proposed model utilizes a bidirectional attention strategy to fuse the representations of questions and knowledge graph entities.
Specifically, the RAKG model has two major steps: (i) We concatenate the given QA pair to obtain its representation, and we extract a subgraph from the KG (in the form of ConceptNet) to obtain graph embeddings, removing irrelevant nodes to ensure an appropriate reasoning path. We then compute the inner product of node representations and build direct neighborhoods based on the density matrix, which can be seen as a way of capturing the importance of question–entity pairs in the subgraph. (ii) Given the question and the retrieved subgraphs, RAKG obtains initialized representations of the question and the graph entities by using a language model and a graph convolutional network, respectively. In addition, to fuse the representations of the question and graph entities, we incorporate a bidirectional attention strategy between the language model and the knowledge graph model to bridge the gap between the two sets of representations.
In summary, the contributions of this work are as follows:
- (1)
We propose a novel RAKG model with a retrieval-augmented KG subgraph for question answering. The augmented subgraph is extracted using the density matrix, which removes any irrelevant nodes at each layer of the RAKG.
- (2)
We utilize a bidirectional attention strategy to effectively integrate the representations of both language models and knowledge graphs. Moreover, we use R-Dropout to prevent overfitting and improve model generalization.
- (3)
The experimental results show that the proposed RAKG achieves better performance than several baselines on the CommonsenseQA and OpenBookQA benchmarks.
3. Methodology
The retrieval-augmented KG model is composed of four main parts, as shown in Figure 2: the language context encoder, KG subgraph extraction, the RAKG module, and answer prediction.
3.1. Task Formulation
Given a question $q$ and a set of candidate answers $\mathcal{A} = \{a_1, a_2, \dots, a_N\}$, the goal of the RAKG model is to identify the most plausible answer among the $N$ candidates. Our proposed method provides the question $q$ and an external knowledge graph (KG) as inputs to the model. A KG is denoted as $\mathcal{G} = (V, E)$, where $V$ represents the entities in the knowledge graph and $E$ represents the edges between pairs of entities.
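Under this notation, and assuming the model induces a conditional distribution over the candidates, the task reduces to the following selection problem:

$$\hat{a} = \operatorname*{arg\,max}_{a \in \mathcal{A}} \; p\left(a \mid q, \mathcal{G}\right).$$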
3.2. Language Context Encoder
We choose RoBERTa [14] as the backbone of our model. Our proposed end-to-end question answering system utilizes RoBERTa to compute token representations of concatenated question–answer pairs, allowing for a streamlined one-pass approach.
Given an input question $q$ of length $n$, we concatenate the question $q$ and the answer $a$ in the format $[\mathrm{CLS}]\, q \,[\mathrm{SEP}]\, a \,[\mathrm{SEP}]$ to form the language context $L$, where $[\mathrm{CLS}]$ and $[\mathrm{SEP}]$ are special tokens utilized by large-scale pre-trained language models. The input $L$ is provided to the encoder $f_{\mathrm{enc}}$ of the pre-trained LM to generate a list of token representations. The token representations $Q$ sent to the RAKG module, where they are further integrated with the graph entity representations, are calculated as follows:

$$Q = \sigma\left(W \cdot f_{\mathrm{enc}}(L)\right),$$

where $\sigma$ is the activation function, $W$ represents a linear transformation, and $f_{\mathrm{enc}}$ encodes $L$ into vectors.
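As a concrete illustration, the following is a minimal sketch of this encoder, assuming the HuggingFace transformers library; the names `ContextEncoder` and `hidden_dim` are ours for illustration, not the authors' implementation:

```python
# A minimal sketch of the language context encoder (Section 3.2), assuming the
# HuggingFace `transformers` library. Class and variable names are illustrative.
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class ContextEncoder(nn.Module):
    def __init__(self, hidden_dim: int = 200):
        super().__init__()
        self.lm = RobertaModel.from_pretrained("roberta-base")        # f_enc
        self.proj = nn.Linear(self.lm.config.hidden_size, hidden_dim)  # W
        self.act = nn.GELU()                                           # sigma

    def forward(self, input_ids, attention_mask):
        # Q = sigma(W . f_enc(L)): token representations sent to the RAKG module
        hidden = self.lm(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        return self.act(self.proj(hidden))

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
# For a question-answer pair, RoBERTa's tokenizer inserts <s> ... </s></s> ... </s>,
# which play the role of the [CLS]/[SEP] tokens in the text.
batch = tokenizer("Which is a good source of nutrients for a mushroom?",
                  "a cut peony", return_tensors="pt")
Q = ContextEncoder()(batch["input_ids"], batch["attention_mask"])
```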
3.3. KG Subgraph Extraction
We develop an end-to-end entity-linking system, which takes a question–answer pair as input and links entities in the question to the knowledge graph. For the KG, we utilize ConceptNet [6], a general-domain knowledge graph with multiple types of semantic relational edges.
To construct the subgraph for each example, we follow a previous approach [13] by selecting entities whose n-gram tokens exactly match the ConceptNet corpus after applying some normalization rules. We expand the subgraph by adding a further set of entities reached along two-hop [14] reasoning paths in the KG from the current entities in the subgraph. Additionally, we include the question as a separate node in the subgraph and connect it to the entities in the question that are relevant to the answer, as sketched below.
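A hedged sketch of this extraction step follows; `concept_vocab` and `kg` stand in for the real ConceptNet vocabulary and graph, and the helper names are ours, not the paper's:

```python
# Illustrative sketch of the subgraph extraction in Section 3.3: exact n-gram
# matching against ConceptNet concepts, then two-hop expansion, then adding the
# question as a separate node connected to the linked entities.
import networkx as nx

def link_entities(tokens, concept_vocab, max_n=3):
    """Return concepts whose normalized form exactly matches a question n-gram."""
    matched = set()
    for n in range(max_n, 0, -1):
        for i in range(len(tokens) - n + 1):
            gram = "_".join(tokens[i:i + n]).lower()
            if gram in concept_vocab:
                matched.add(gram)
    return matched

def extract_subgraph(kg: nx.Graph, seed_entities, hops=2):
    """Expand the seed set along reasoning paths of up to `hops` edges."""
    nodes = set(seed_entities)
    frontier = set(seed_entities)
    for _ in range(hops):
        frontier = {nbr for v in frontier for nbr in kg.neighbors(v)} - nodes
        nodes |= frontier
    sub = kg.subgraph(nodes).copy()
    sub.add_node("QUESTION")            # the question as a separate node
    for e in seed_entities:             # connect it to the linked entities
        sub.add_edge("QUESTION", e)
    return sub
```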
3.4. Retrieval-Augmented Knowledge Graph Module
After obtaining token representations through the LM encoder, we further use the model to obtain entity representations in the subgraph. First, we utilize the RoBERTa model to obtain an initial representation $h_i^{(0)}$ for each entity $e_i$, and then we use an R-GCN [18] network to update the entity node representations through iterative message passing, calculated as follows:

$$h_i^{(l+1)} = \sigma\left(\sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^{r}} \frac{1}{|\mathcal{V}| - 1}\, W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right),$$

where $|\mathcal{V}|$ represents the number of graph nodes, subtracting 1 due to the question node, $\mathcal{R}$ is the set of relation types, $\mathcal{N}_i^{r}$ is the set of neighbors of node $i$ under relation $r$, and $W_r^{(l)}$ and $W_0^{(l)}$ are learnable weight matrices.
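The following is a minimal, dense-matrix sketch of one such message-passing step; production implementations (e.g., in DGL or PyTorch Geometric) use sparse operations, but the update rule is the same, and the class and argument names are ours:

```python
# A minimal sketch of one R-GCN message-passing step (Section 3.4), using dense
# per-relation adjacency matrices for clarity.
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.w_rel = nn.Parameter(torch.randn(num_relations, dim, dim) * 0.01)  # W_r
        self.w_self = nn.Linear(dim, dim, bias=False)                            # W_0

    def forward(self, h, adj):
        """h: (N, d) node states; adj: (R, N, N), one adjacency matrix per relation."""
        norm = 1.0 / max(h.size(0) - 1, 1)  # 1 / (|V| - 1), question node excluded
        msgs = h @ self.w_rel               # (R, N, d): h W_r for every relation
        agg = torch.einsum("rij,rjd->id", adj, msgs)  # sum over relations and neighbors
        return torch.relu(norm * agg + self.w_self(h))
```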
To provide a more explicit representation of the reasoning path, we utilize a density matrix to re-evaluate the significance of neighboring nodes. Specifically, in the RAKG model we use the neighbor density matrix to score the relevance of neighbor nodes: for each node, we use the inner product of node representations to compute the density matrix $D$, with $D_{ij} = h_i^{\top} h_j$, and the density matrix changes as the representations are recomputed at each graph layer. In our proposed RAKG model, the forward pass performs message-passing updates for both the graph entity nodes and the question node.
Finally, we multiply the node representations by $D$ to learn the weights of the edges between nodes during message passing. The updated node representations are computed as follows:

$$h_i^{(l+1)} = \sigma\left(\beta \sum_{j \in \mathcal{N}_i} D_{ij}\, h_j^{(l)}\right),$$

where $\sigma$ represents the activation function and $\beta$ represents a hyperparameter. Furthermore, we update the question node representation in a similar way.
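A hedged sketch of this reweighting is given below; the exact normalization of $D$ is not spelled out in the text, so this shows only the core idea, with `beta` standing in for the hyperparameter in the update rule:

```python
# Illustrative sketch of the density-matrix reweighting in Section 3.4: neighbors
# are scored by inner products of the current node representations, and each
# neighbor's message is scaled by that score. Function and argument names are ours.
import torch
import torch.nn.functional as F

def density_reweighted_update(h, adj, beta=0.5):
    """h: (N, d) node states; adj: (N, N) 0/1 adjacency matrix."""
    D = h @ h.t()                   # density matrix: D_ij = <h_i, h_j>
    D = D * adj                     # keep scores only for direct neighbors
    return F.relu(beta * (D @ h))   # h_i' = sigma(beta * sum_j D_ij h_j)
```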
In addition, we employ a bidirectional attention strategy to facilitate interaction between the pre-trained language model and the knowledge graph, computing a pair-wise matching matrix $M = Q G^{\top}$ between the token representations $Q$ and the graph entity representations $G$. The attention from the language model to the knowledge graph is computed as

$$\mathrm{Att}_{L \to G} = \operatorname{softmax}(M)\, G,$$

and the attention from the knowledge graph to the language model is computed as

$$\mathrm{Att}_{G \to L} = \operatorname{softmax}(M^{\top})\, Q.$$
We obtain vector representations for both entities and the question and fuse them using the bidirectional attention strategy with a concatenated matrix. The attended features are then compressed into a low-dimensional space. This approach helps to further clarify the reasoning path and minimizes the number of entity nodes that are irrelevant to the given question. On this basis, we use the bidirectional attention scores to choose the top-$K$ most relevant nodes: we define a retention ratio $\rho$, which determines the fraction of relevant nodes to be retained, and we keep the top-ranking nodes according to their bidirectional attention values, as sketched below.
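The sketch below illustrates the matching matrix and the retention-ratio pruning under the row/column-softmax formulation given above; pooling attention mass over tokens into per-node relevance scores is an assumption of this sketch, not a detail stated in the paper:

```python
# Sketch of the bidirectional attention and top-K node pruning of Section 3.4.
# The row/column softmaxes mirror the equations above.
import torch

def bidirectional_attention(Q, G):
    """Q: (n_tokens, d) token reps; G: (n_nodes, d) graph entity reps."""
    M = Q @ G.t()                             # pair-wise matching matrix
    lm2kg = torch.softmax(M, dim=-1) @ G      # LM-to-KG attended features
    kg2lm = torch.softmax(M.t(), dim=-1) @ Q  # KG-to-LM attended features
    return M, lm2kg, kg2lm

def prune_nodes(M, G, rho=0.7):
    """Keep the top ceil(rho * n_nodes) nodes ranked by attention mass."""
    scores = M.sum(dim=0)                     # relevance score per KG node
    k = max(1, int(rho * G.size(0)))
    keep = scores.topk(k).indices
    return G[keep], keep
```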
3.5. Answer Prediction
In this system, the query representation and the graph representation are combined to compute a score for a given answer. The query representation is obtained after $N$ layers of iteration, in which the query information and knowledge graph information are fused together. Similarly, the graph representation is obtained from the KG subgraph representation pooled from the last graph layer. The score of a candidate answer is computed as the dot product of the mean-pooled query representation and the pooled KG subgraph representation:

$$\mathrm{score}(q, a) = s^{\top} g,$$

where $s$ is the mean pooling of the query representation, including the token representations $Q$ and the question node representation $h_q$, and $g$ is the KG subgraph representation pooled from the last graph layer. The scoring function measures the similarity between the query and the candidate answer based on their representations in the knowledge graph. We obtain the final probability by normalizing the scores of all question–choice pairs with softmax, and we use R-Dropout [34] to regularize the end-to-end model. R-Dropout reduces the randomness introduced by standard dropout by minimizing the KL-divergence between the output distributions of two sub-models sampled by dropout.
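For concreteness, the following is a minimal sketch of this training objective, assuming a `model` that returns per-choice logits (the $s^{\top} g$ scores above); `alpha` is an assumed weight on the KL term:

```python
# A minimal sketch of answer scoring regularized with R-Dropout (Section 3.5).
# R-Drop runs two stochastic forward passes and penalizes the KL divergence
# between their output distributions. `model` and `alpha` are illustrative.
import torch.nn.functional as F

def rakg_loss(model, batch, labels, alpha=1.0):
    logits1 = model(batch)   # first forward pass (one dropout mask)
    logits2 = model(batch)   # second forward pass (a different dropout mask)
    ce = 0.5 * (F.cross_entropy(logits1, labels) +
                F.cross_entropy(logits2, labels))
    p = F.log_softmax(logits1, dim=-1)
    q = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean") +
                F.kl_div(q, p, log_target=True, reduction="batchmean"))
    return ce + alpha * kl
```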
6. Conclusions
In this paper, we propose a novel retrieval-augmented knowledge graph (RAKG) model, which retrieves entities from an external source of knowledge in the form of a knowledge graph. Our key innovations are as follows: (i) the use of a density matrix, with which we compute the relevance of KG neighborhood relationships to remove irrelevant nodes at each layer of RAKG, and (ii) the use of a bidirectional attention strategy, with which we integrate the representations of both questions and knowledge graphs to obtain semantic information. Moreover, R-Dropout prevents model overfitting and improves model generalization. Through both quantitative and qualitative analyses, our results on CommonsenseQA and OpenBookQA demonstrate the superiority of RAKG over baseline methods that use a KG, as well as its strong performance on complex reasoning paths.