Research on Automatic Question Answering of Generative Knowledge Graph Based on Pointer Network

Abstract: Question answering over knowledge graphs is an extremely challenging task in natural language processing. Most existing Chinese Knowledge Base Question Answering (KBQA) systems can only return the knowledge stored in the knowledge base through extractive methods. However, such answers do not match users' reading habits, and extractive methods cannot solve the out-of-vocabulary (OOV) problem. In this paper, a new generative question answering method based on a knowledge graph is proposed, consisting of three parts: knowledge vocabulary construction, data preprocessing, and answer generation. In vocabulary construction, BiLSTM-CRF is used to identify the entities in the source text; the triples containing each entity are retrieved, their word frequencies are counted, and the vocabulary is built from them. In data preprocessing, the pre-trained language model BERT, combined with word-frequency semantic features, is used to obtain word vectors. In answer generation, a vocabulary constructed from the knowledge graph is combined with a pointer-generator network (PGN) that points to the corresponding entity when generating the answer. Experimental results show that the proposed method outperforms the compared methods on the WebQA dataset.


Introduction
Natural language processing is an active research field in artificial intelligence. Question answering, a sub-field of natural language processing, studies how humans can interact with machines in natural language. In 1950, Turing [1] published the article "Computing Machinery and Intelligence" in the journal Mind, in which he proposed the classic Turing Test [2]: a machine is tested through human-machine question answering to see whether it can think and speak like a human. If, after a certain number of rounds of conversation, it is impossible to tell whether the respondent is a machine or a human, the machine can be considered intelligent. Since then, scholars have carried out a great deal of related research. Traditional search engines draw mainly on web documents and other unstructured data, but with the advent of big data [3], this approach can no longer keep up with the rapid growth in the amount of information, and a new way of processing data is needed. Knowledge graph technology was proposed to meet this need [4].
The knowledge graph is essentially a semantic network whose nodes represent entities or concepts and whose edges represent the semantic relationships between them. The concept of the knowledge graph was proposed by Google [4] in May 2012, with the original aim of improving the capabilities and results of search engines and enhancing the user experience. Large-scale, high-quality knowledge graphs were subsequently developed rapidly in China and abroad, and knowledge graphs in many

• The BERT pre-trained language model is used to obtain word vectors for the question sentence, and these are combined with the word-frequency semantic features of the entity words in the question to form the input sequence.
• A vocabulary constructed from the knowledge graph together with the source text information serves as the soft link of the pointer-generator network, pointing to the corresponding entities, which are fused to generate the answer.

The rest of this paper is organized as follows. Section 2 introduces related work on generative question answering based on knowledge graphs. Section 3 describes the overall implementation of the proposed generative question answering method. Section 4 presents the experiments and analysis. Section 5 summarizes the paper and outlines future research.

Related Work
At present, research on question answering over knowledge graphs has gradually shifted from methods based on semantic parsing [17,18] to deep learning methods derived from information retrieval [19,20,21]. Semantic parsing methods usually require lexical mapping and the construction of a syntax tree to transform natural language into a semantic representation that a machine can understand, which is then used for reasoning or querying to obtain the answer. However, such methods require researchers to be familiar with linguistic knowledge and to label a large amount of data, so they are unsuitable for large-scale knowledge base question answering and generalize poorly. Information retrieval methods start from the entities and relations in the question, search the knowledge base for relevant paths as candidate answers, and finally select the best answer with a classification or ranking model. In recent years, deep learning methods have shown significant advantages over traditional methods in answer selection and have become a research hot spot. These three kinds of methods are the traditional mainstream approaches, and many researchers have recently improved on them or proposed new ones. For example, Bordes et al. [22] applied memory networks to knowledge graph question answering, making full use of the scalability and generalization ability of memory networks in combination with knowledge bases such as Freebase [23]. Sorokin et al. [24] introduced the gated graph neural network (GGNN) to encode the graph structure of semantic parses, addressing the structure of complex questions, which earlier work tended to ignore.
Automatic question answering has since advanced to end-to-end methods. Lukovnikov et al. [25] proposed an end-to-end approach that trains an encoder able to handle out-of-vocabulary (OOV) and rare words, using character-level encoders to capture different kinds of semantic information, and achieved good results. The OOV problem, common in language models, arises when words outside the vocabulary appear in the dataset; rare words are uncommon, often domain-specific terms, such as an unusual surname. Xiong et al. [14] studied an end-to-end question answering model that combines an incomplete knowledge base with unstructured text retrieval results, achieving clear improvements on both simple and complex questions. Wang et al. [26] added a verification mechanism to an existing KBQA framework to evaluate and check the reliability of the predicted entities and relations, which strengthened relation prediction, and applied the Sequence-to-Sequence (Seq2Seq) [27] framework to multi-relation KBQA. Seq2Seq, also called the Encoder-Decoder model, uses an encoder to encode an input sequence of any length into a vector; after receiving this context vector, the decoder decodes it and outputs a sequence. The above methods all return retrieved answers: they extract entities or relations from the knowledge base, so the answers are precise and concise. To give users a more comfortable experience while keeping the answers accurate, this paper adopts generative question answering.
Unlike extractive question answering, generative question answering is often used in reading-comprehension question answering systems: its main feature is that, after reading a specified article or paragraph, it generates an answer to the user's question. However, answers generated in this way often suffer from repetition. Therefore, this paper combines a knowledge graph with generative question answering.
At present, there is relatively little research on generative question answering over knowledge graphs. In this paper, the PGN is introduced into knowledge graph question answering to form an "extraction-generation" question answering framework.

Construction of Generative QA System
The overall implementation process of this paper is shown in Figure 1. It consists of three modules: knowledge vocabulary construction, word vector acquisition, and generative model building. The knowledge vocabulary is used by the generative model, which selects high-probability entities in the vocabulary as the next word to generate. In the knowledge vocabulary construction module, the BiLSTM-CRF [28] named entity recognition model identifies the entities that appear in the source text and question sentences; the relations and tail entities that each entity points to are then retrieved from the knowledge base, the entities are counted, and the frequency of each part of every triple associated with the entity is counted separately. In the word vector acquisition module, the pre-trained language model BERT [29] produces word vectors for the question sentence, which are concatenated with the word-frequency semantic features of the entities in the question to form the input sequence. In the generative model building module, a PGN model decides whether to generate a word from the vocabulary or copy a word from the question sentence when producing the answer. The construction of the knowledge vocabulary is described in Section 3.1, and the answer generation model is described in Section 3.2.

Knowledge Vocabulary Construction
Baidu's public WebQA dataset [30] is used as the question answering dataset, and CN-DBPedia [31] is used as the knowledge base. The triples in the knowledge base are imported into the Neo4j graph database with neo4j-admin-import, the bulk import tool provided by Neo4j [32].
Each WebQA sample consists of a source text, a question, and an answer. To build a knowledge vocabulary covering the entire dataset, a named entity recognition model is applied to the source texts. First, the jieba [33] word segmentation tool segments the text to be recognized, and stop words are removed, among other preprocessing steps, to reduce errors. The preprocessed data are then manually filtered to create a custom dictionary, and the BiLSTM-CRF [28] neural network model is used to identify the entities in the original text. For each identified entity, its corresponding head or tail entities are queried with a constructed Cypher statement. Entities that do not exist in the knowledge base are added to the vocabulary after manual inspection. Finally, the TF algorithm is used to count word frequencies.

Entity Recognition
Chinese named entity recognition is an important task in Chinese natural language processing. However, Chinese entity names depend heavily on context, which makes the task very challenging. To match the gold entities, a deep learning model is used for named entity recognition. Specifically, the BiLSTM-CRF [28] model is selected; it consists of two parts, a Bidirectional Long Short-Term Memory (BiLSTM) [34] module and a Conditional Random Field (CRF) [35] module. The model framework is shown in Figure 2. First, a bidirectional recurrent neural network with LSTM [36] units extracts features from the input sequence, and the LSTM outputs in the two directions are concatenated and fed into the CRF layer. The CRF then serves as the output layer of the model and produces the sequence labeling result.
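The CRF layer emits one tag per character, and the recognized entities must then be read off that tag sequence. A minimal sketch of this last step, assuming a BIO tagging scheme with tags such as B-PER, I-PER, and O (the paper does not state its tag set, so the scheme, the example tags, and the function name `bio_to_spans` are hypothetical):

```python
from typing import List, Tuple

def bio_to_spans(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from a BIO-tagged character sequence.

    Assumed scheme: "B-TYPE" starts an entity, "I-TYPE" continues it, "O" is outside.
    An I- tag whose type differs from the open entity starts a new entity.
    """
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type = ""
    for ch, tag in zip(chars, tags):
        if tag == "O":
            if current:
                spans.append(("".join(current), current_type))
            current, current_type = [], ""
            continue
        prefix, ent_type = tag.split("-", 1)
        if prefix == "B" or ent_type != current_type:
            if current:  # close the entity that was open
                spans.append(("".join(current), current_type))
            current, current_type = [ch], ent_type
        else:  # an I- tag continuing the open entity
            current.append(ch)
    if current:
        spans.append(("".join(current), current_type))
    return spans

# Hypothetical tagger output for "姚明出生于上海" ("Yao Ming was born in Shanghai").
chars = list("姚明出生于上海")
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "I-LOC"]
print(bio_to_spans(chars, tags))  # [('姚明', 'PER'), ('上海', 'LOC')]
```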

BiLSTM Module
LSTM is a variant of the RNN [37]. Its gated design alleviates the vanishing and exploding gradients that arise when training an RNN, and the gates allow it to capture information over long sequences effectively. The formulas of an LSTM unit are as follows:
Information 2021, 12, 136 6 of 17
Here, $x_t$ is the unit input, $\sigma$ is the activation function, $i_t$, $f_t$, and $o_t$ are the input gate, forget gate, and output gate at time t, W and b are the weight matrices and bias vectors, $c_t$ is the cell state at time t, and $h_t$ is the output at time t.
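For reference, a minimal pure-Python sketch of one step of a standard LSTM cell, assuming the common formulation in which every gate reads the concatenation [h_{t-1}, x_t] and a tanh candidate state is blended into the cell state. This may differ in notation or detail from the paper's own equations, and the names `lstm_step` and `affine` and the dictionary weight layout are illustrative choices:

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """W v + b for a row-major weight matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              W: Dict[str, Matrix], b: Dict[str, Vector]) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell (assumed formulation).

    Every gate reads the concatenation [h_{t-1}, x_t]; W and b hold one weight
    matrix and bias vector per gate under the keys "i", "f", "o", and "c".
    """
    z = h_prev + x_t                                          # [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(W["i"], z, b["i"])]     # input gate
    f_t = [sigmoid(a) for a in affine(W["f"], z, b["f"])]     # forget gate
    o_t = [sigmoid(a) for a in affine(W["o"], z, b["o"])]     # output gate
    g_t = [math.tanh(a) for a in affine(W["c"], z, b["c"])]   # candidate cell state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]  # new cell state
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                   # output
    return h_t, c_t

# Toy example: hidden size 1, input size 1, all weights and biases zero.
W = {gate: [[0.0, 0.0]] for gate in "ifoc"}
b = {gate: [0.0] for gate in "ifoc"}
h, c = lstm_step([1.0], [0.0], [2.0], W, b)
# Every gate is sigmoid(0) = 0.5 and the candidate is tanh(0) = 0,
# so c = 0.5 * 2.0 = 1.0 and h = 0.5 * tanh(1.0).
```

The zero weights make the arithmetic easy to check by hand: with every gate at 0.5, the forget gate simply halves the previous cell state.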
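Compared with LSTM, BiLSTM runs a forward LSTM and a backward LSTM over each word sequence and obtains the final hidden representation by concatenating the two hidden vectors. BiLSTM can therefore use the context on both sides of each word at the same time, which improves named entity recognition performance. The specific expression is as follows:
$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$
where $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the forward and backward hidden states at time t.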
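A minimal sketch of the bidirectional pass, assuming a generic recurrent step function: the sequence is processed left to right and right to left, the backward states are re-aligned to the original order, and the two states at each position are concatenated. The running-sum `toy_step` is a hypothetical stand-in for an LSTM cell, chosen so the result is easy to verify:

```python
from typing import Callable, List

Vector = List[float]
Step = Callable[[Vector, Vector], Vector]  # (x_t, h_prev) -> h_t

def run_direction(xs: List[Vector], step: Step, hidden_size: int) -> List[Vector]:
    """Run a recurrent step function over xs and return every hidden state."""
    h = [0.0] * hidden_size
    states = []
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states

def bidirectional(xs: List[Vector], fwd: Step, bwd: Step, hidden_size: int) -> List[Vector]:
    """Concatenate forward and backward hidden states position by position."""
    forward = run_direction(xs, fwd, hidden_size)
    backward = run_direction(xs[::-1], bwd, hidden_size)[::-1]  # re-align to original order
    return [hf + hb for hf, hb in zip(forward, backward)]

def toy_step(x: Vector, h: Vector) -> Vector:
    # Hypothetical stand-in for an LSTM cell: a running sum of the inputs.
    return [hi + xi for hi, xi in zip(h, x)]

xs = [[1.0], [2.0], [3.0]]
print(bidirectional(xs, toy_step, toy_step, hidden_size=1))
# [[1.0, 6.0], [3.0, 5.0], [6.0, 3.0]]
```

Each concatenated vector therefore summarizes both the prefix up to position t (forward half) and the suffix starting at position t (backward half).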

CRF Module
CRF combines characteristics of the maximum entropy model and the hidden Markov model and takes the dependencies between tags into account to obtain the globally optimal tag sequence. This compensates for a shortcoming of BiLSTM, whose outputs score each position's tag independently. For a given input sequence $X = (x_1, x_2, \ldots, x_n)$ and a corresponding predicted tag sequence $Y = (y_1, y_2, \ldots, y_n)$, the CRF score can be expressed as:
Here, A is the transition score matrix, and $A_{i,j}$ is the score of transitioning from label i to label j.
The probability of the tag sequence Y is computed as follows:
The log-likelihood of the gold tag sequence during training is:
where $Y_X$ denotes the set of all possible tag sequences. At decoding time, the output is the tag sequence with the highest score:
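For reference, a minimal pure-Python sketch of the standard linear-chain CRF used in BiLSTM-CRF taggers. It assumes the path score is the sum of per-position emission scores P[i][j] (the BiLSTM outputs) and the transition scores A[k][j] defined above, with start and end transitions omitted; the emission matrix P, the toy numbers, and the function names are assumptions, since the text defines only A. It covers the path score, the training log-likelihood (normalized over all tag sequences with the forward algorithm), and Viterbi decoding:

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def path_score(P: Matrix, A: Matrix, y: List[int]) -> float:
    """s(X, y): emission scores P[i][y_i] plus transition scores A[y_i][y_{i+1}]."""
    emit = sum(P[i][tag] for i, tag in enumerate(y))
    trans = sum(A[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return emit + trans

def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_partition(P: Matrix, A: Matrix) -> float:
    """log of the sum of exp(s(X, y')) over all tag paths y', via the forward algorithm."""
    n_tags = len(A)
    alpha = list(P[0])  # log-scores of all length-1 prefixes ending in each tag
    for i in range(1, len(P)):
        alpha = [logsumexp([alpha[k] + A[k][j] for k in range(n_tags)]) + P[i][j]
                 for j in range(n_tags)]
    return logsumexp(alpha)

def log_likelihood(P: Matrix, A: Matrix, gold: List[int]) -> float:
    """Training objective: log p(gold | X) = s(X, gold) - log Z(X)."""
    return path_score(P, A, gold) - log_partition(P, A)

def viterbi(P: Matrix, A: Matrix) -> Tuple[List[int], float]:
    """Highest-scoring tag path and its score (the decoding step)."""
    n_tags = len(A)
    best = list(P[0])
    backpointers: List[List[int]] = []
    for i in range(1, len(P)):
        pointers = [max(range(n_tags), key=lambda k: best[k] + A[k][j]) for j in range(n_tags)]
        best = [best[k] + A[k][j] + P[i][j] for j, k in enumerate(pointers)]
        backpointers.append(pointers)
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], best[last]

P = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]  # toy emission scores: 3 tokens, 2 tags
A = [[0.2, -0.1], [0.3, 0.0]]             # toy transition scores
path, score = viterbi(P, A)
print(path)                                # [0, 1, 0]
print(log_likelihood(P, A, path) < 0)      # True: the best path still has probability below 1
```

Both the forward algorithm and Viterbi run in O(n T^2) time for n tokens and T tags, instead of enumerating all T^n tag paths.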

Vocabulary Construction
Inspecting the recognition results shows that some entity types, such as movie titles, are difficult to recognize. To address this, this paper defines filtering rules for specific entity types, for example matching person names against a dictionary of common surnames and identifying dates by their formats. Entities whose types cannot be determined by these rules are all labeled as mixed entities. A Cypher statement is then used to query the knowledge graph for the relations and tail entities corresponding to each entity, and the TF algorithm is used to count how often each part of the resulting triples (head entity, relation, tail entity) occurs in the dataset. The TF count measures the frequency of each entity or relation and thereby emphasizes its importance in the source text. Entities or relations that do not exist in the knowledge graph are also stored in the vocabulary, which reduces the OOV problem.
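A minimal sketch of the rule-based typing and frequency counting described above, assuming the Cypher lookup has already returned (head, relation, tail) triples. The tiny surname set, the date pattern, the example triples, and all helper names are hypothetical; the final step keeps the most frequent items, mirroring the frequency-sorted knowledge vocabulary (50,000 entries in the experiments):

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical rule resources; real dictionaries and patterns would be far larger.
SURNAMES = {"王", "李", "张", "刘", "姚"}
DATE_RE = re.compile(r"^\d{4}年(\d{1,2}月)?(\d{1,2}日)?$")  # e.g. 1980年9月12日

def entity_type(entity: str) -> str:
    """Assign a coarse type by rule; anything the rules cannot type is 'MIXED'."""
    if DATE_RE.match(entity):
        return "DATE"
    if 2 <= len(entity) <= 3 and entity[0] in SURNAMES:
        return "PERSON"
    return "MIXED"

def count_triple_parts(triples: Iterable[Tuple[str, str, str]], source_text: str) -> Counter:
    """TF-style counts: occurrences of each head, relation, and tail in the source text."""
    parts = dict.fromkeys(p for triple in triples for p in triple)  # dedupe, keep order
    return Counter({part: source_text.count(part) for part in parts})

def build_vocab(counts: Counter, size: int) -> List[str]:
    """Keep the `size` most frequent items, in descending order of frequency."""
    return [word for word, _ in counts.most_common(size)]

# Hypothetical triples returned by the Cypher lookup for the entity 姚明 (Yao Ming).
text = "姚明出生于上海。姚明的身高是2.26米。"
triples = [("姚明", "出生地", "上海"), ("姚明", "身高", "2.26米")]
counts = count_triple_parts(triples, text)
print(entity_type("姚明"), entity_type("1980年9月12日"), entity_type("上海"))  # PERSON DATE MIXED
print(build_vocab(counts, 2))  # ['姚明', '上海']
```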

Answer Generation Model
Current question answering tasks are mostly handled extractively: the answer is either a span of the text or comes from the triples corresponding to the entities in the question sentence. This can produce repeated answers or OOV problems [38,39]. To address this, this paper jointly trains extraction and generation to obtain the answer. The model consists of word vector acquisition and a PGN model. The answer generation model is shown in Figure 3.

Word Vectors Acquisition
To better capture the semantic information of Chinese questions, the word vector acquisition stage uses the BERT pre-trained language model to obtain word vectors for the question and uses the word-frequency feature to score the entities in the question sentence; the two are then concatenated to form the input sequence.

BERT Module
Word representations and language models have evolved from one-hot vectors through Word2Vec [40], ELMo [41], and GPT [42] to BERT [29]. The earlier models have certain defects. For example, Word2Vec produces static word vectors that cannot represent a word with multiple meanings, and GPT is a unidirectional (left-to-right) language model that cannot use the context on both sides of a word. BERT has strong semantic representation capabilities. It is a multilayer bidirectional Transformer encoder, whose structure is shown in Figure 4; the encoder attends to contextual information on both sides of each token. The input of BERT is the sum of the position embedding, the segment embedding, and the token (word) embedding. In addition, special tokens are added to each question: [CLS] at the beginning and [SEP] at the end, where [SEP] is also used to separate two sentences in the corpus. For each input token, the output of the model is a semantic representation that fuses contextual information after passing through the Transformer encoder.
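A minimal sketch of how a question becomes BERT input, assuming toy 4-dimensional random embedding tables in place of BERT's learned 768-dimensional ones: the question is wrapped as [CLS] ... [SEP], split into one token per Chinese character (Chinese BERT vocabularies are largely character-based), and each position's input vector is the element-wise sum of its token, segment, and position embeddings. The tables and helper names are illustrative only:

```python
import random
from typing import Dict, List

DIM = 4                      # toy size; BERT-Base uses 768
MAX_LEN = 128                # maximum sequence length used in the experiments
rng = random.Random(0)

def random_vector() -> List[float]:
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

# Hypothetical stand-ins for BERT's learned embedding tables.
token_table: Dict[str, List[float]] = {}
segment_table = [random_vector(), random_vector()]          # sentence A, sentence B
position_table = [random_vector() for _ in range(MAX_LEN)]

def token_vector(token: str) -> List[float]:
    """Look up (or lazily create) the toy token embedding."""
    if token not in token_table:
        token_table[token] = random_vector()
    return token_table[token]

def bert_inputs(question: str) -> List[List[float]]:
    """Wrap a question as [CLS] c1 ... cn [SEP] and sum token, segment, and position embeddings."""
    tokens = ["[CLS]"] + list(question) + ["[SEP]"]  # one token per Chinese character
    if len(tokens) > MAX_LEN:
        raise ValueError("question longer than the maximum sequence length")
    inputs = []
    for pos, tok in enumerate(tokens):
        t, s, p = token_vector(tok), segment_table[0], position_table[pos]
        inputs.append([a + b + c for a, b, c in zip(t, s, p)])
    return inputs

x = bert_inputs("姚明的身高是多少")   # "How tall is Yao Ming?"
print(len(x), len(x[0]))              # 10 4
```

Because the position embedding differs at each index, the same character at two positions receives two different input vectors even before any attention is applied.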

The most important part of BERT is its bidirectional Transformer encoder. The Transformer abandons the recurrent structure of the RNN and models text entirely with the attention mechanism. The key component of the encoder is self-attention:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$
where Q, K, and V are the query, key, and value matrices obtained from the input word vectors, and $d_k$ is the dimension of the input (key) vectors. To extend the model's ability to focus on different positions, the Transformer uses multiple attention heads and finally concatenates their results:
$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O$, with $\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$
where the W matrices are learned projections. The fully connected feed-forward network (FFN) in the Transformer has two dense layers: the first uses the ReLU activation function, and the second uses a linear activation. If the output of the multi-head attention mechanism is denoted Z, the FFN can be expressed as:
$\mathrm{FFN}(Z) = \max(0, ZW_1 + b_1)W_2 + b_2$
where $W_1$ and $W_2$ are weight matrices and $b_1$ and $b_2$ are bias vectors. Using the BERT pre-trained language model gives the word vectors contextual semantic information and better represents the content of the question.
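A minimal pure-Python sketch of single-head scaled dot-product attention and the two-layer FFN described above, assuming Q, K, and V are already given (the learned projections and the multi-head split are omitted); matrices are lists of rows and the helper names are illustrative:

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]

def softmax(row: List[float]) -> List[float]:
    m = max(row)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V, with the softmax applied to each row of scores."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)

def ffn(Z: Matrix, W1: Matrix, b1: List[float], W2: Matrix, b2: List[float]) -> Matrix:
    """max(0, Z W1 + b1) W2 + b2: a ReLU layer followed by a linear layer."""
    hidden = [[max(0.0, v + b) for v, b in zip(row, b1)] for row in matmul(Z, W1)]
    return [[v + b for v, b in zip(row, b2)] for row in matmul(hidden, W2)]

Q = K = V = [[1.0, 0.0], [0.0, 1.0]]   # two toy positions, d_k = 2
out = attention(Q, K, V)                # each row leans toward its own position
print(ffn([[1.0, -2.0]], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0], [1.0]], [0.5]))  # [[1.5]]
```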

Frequency Feature
Because keywords tend to appear frequently in a text, the traditional word-frequency feature is used to describe the text and highlight the important entities or relations in the question. The word-frequency feature is the number of times a word appears in a text, and it reflects the theme of the text. The basic TF algorithm counts how often each entity or relation in the current input question appears in the source text:
$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$
where $n_{i,j}$ is the number of occurrences of entity i in text j, and the denominator is the total number of occurrences of all words in text j.
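A minimal sketch of the TF feature and of one possible way to splice it onto the word vectors, assuming the source text has already been segmented into words (for example by jieba) and that the question's vectors are aligned one-to-one with its words. Appending the score as a single extra dimension is an assumption, since the paper does not specify the splicing layout, and the toy vectors stand in for BERT outputs:

```python
from typing import Dict, List

def term_frequency(term: str, words: List[str]) -> float:
    """tf_{i,j} = n_{i,j} / sum_k n_{k,j} over a segmented text (list of words)."""
    return words.count(term) / len(words) if words else 0.0

def splice_tf_feature(tokens: List[str], vectors: List[List[float]],
                      entity_tf: Dict[str, float]) -> List[List[float]]:
    """Append each token's TF score (0.0 for non-entities) as one extra vector dimension."""
    return [vec + [entity_tf.get(tok, 0.0)] for tok, vec in zip(tokens, vectors)]

# Hypothetical segmented source text and question; the vectors stand in for BERT outputs.
source_words = ["姚明", "出生", "于", "上海", "姚明", "是", "篮球", "运动员"]
tf = {e: term_frequency(e, source_words) for e in ["姚明", "上海"]}
print(tf)  # {'姚明': 0.25, '上海': 0.125}

question_words = ["姚明", "出生", "在", "哪里"]
question_vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
print(splice_tf_feature(question_words, question_vectors, tf)[0])  # [0.1, 0.2, 0.25]
```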

Pointer-Generator Network Model
Pointer-Generator Networks [16] combine the pointer network [43] with the attention-based Encoder-Decoder model. They can copy words by pointing to them in the input and can also generate words from a fixed vocabulary. In this paper, the question sentence is first represented by its BERT vectors combined with the word-frequency feature, and the result is fed into the attention-based Encoder-Decoder model. At each decoder time step, the PGN computes a generation probability that determines whether to generate a word from the knowledge vocabulary or copy a word from the question sentence.
In the encoder-decoder model, the encoder compresses the complete input sentence into a fixed-dimensional vector, which is passed to the decoder to predict the output. However, when the input sentence is long, a fixed-dimensional vector cannot store all the information. To address this, Bahdanau et al. [44] proposed the attention mechanism in 2015, which lets the decoder look at any word or fragment of the encoded input sentence at every step and removes the need to store all the information in a single intermediate vector. To better understand the semantics of the user's question and avoid forgetting earlier key information in longer texts, the encoder in this paper uses a BiLSTM, which captures long-distance dependencies and positional information in the source text and thus better understands the intent of the question. The decoder uses an LSTM. The model is described in detail below.
After the BERT vectors of the question are concatenated with the word-frequency semantic features, the resulting input sequence is fed into the BiLSTM encoder, and a single-layer BiLSTM produces the encoder hidden states $h_i$. At time t, the LSTM decoder receives the word vector of the previously generated word and produces the decoder state $s_t$. The encoder and decoder states are then used to compute the attention distribution $a^t$, which determines the characters to attend to at this time step:
$e_i^t = v^T \tanh(W_h h_i + W_s s_t)$ (17)
$a^t = \mathrm{softmax}(e^t)$ (18)
where $v^T$ is the coefficient vector of the attention mechanism, and $W_h$ and $W_s$ are parameters obtained through training. The context vector $h_t^*$ is the attention-weighted sum of the encoder hidden states $h_i$:
$h_t^* = \sum_i a_i^t h_i$ (19)
The context vector represents the question information read from the encoder at the current time step.
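A minimal pure-Python sketch of the attention distribution $a^t$ and the context vector $h_t^*$ for a single decoder step, with toy parameter values. It follows the symbols listed above; See et al. [16] also add a bias term inside the tanh, which is omitted here. The function names and numbers are illustrative:

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(enc_states: List[Vector], s_t: Vector, v: Vector,
           W_h: Matrix, W_s: Matrix) -> Tuple[Vector, Vector]:
    """Return the attention distribution a^t and the context vector h*_t for one decoder step."""
    ws = matvec(W_s, s_t)                       # W_s s_t is shared by all positions
    scores = []
    for h_i in enc_states:
        feat = [math.tanh(a + b) for a, b in zip(matvec(W_h, h_i), ws)]
        scores.append(sum(vj * fj for vj, fj in zip(v, feat)))  # e_i^t = v^T tanh(W_h h_i + W_s s_t)
    a_t = softmax(scores)                                       # a^t = softmax(e^t), Eq. (18)
    context = [sum(a * h[d] for a, h in zip(a_t, enc_states))  # h*_t = sum_i a_i^t h_i
               for d in range(len(enc_states[0]))]
    return a_t, context

I2 = [[1.0, 0.0], [0.0, 1.0]]                    # toy identity parameters
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # three toy encoder states h_i
a_t, ctx = attend(enc, s_t=[0.5, -0.5], v=[1.0, 1.0], W_h=I2, W_s=I2)
print([round(a, 3) for a in a_t], [round(c, 3) for c in ctx])
```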
When generating, the model produces a word from the knowledge vocabulary. The decoder state $s_t$ and the context vector $h_t^*$ are concatenated and passed through two fully connected layers to produce the vocabulary distribution $P_{vocab}$ at the current step:
$P_{vocab} = \mathrm{softmax}(V'(V[s_t; h_t^*] + b) + b')$ (20)
$P(w) = P_{vocab}(w)$ (21)
where $V'$, $V$, $b$, and $b'$ are learned parameters, and $P(w)$ is the probability that the word generated at the current step is the word w in the knowledge vocabulary.
When copying, the probability of pointing to a word w in the input sequence is determined by the attention distribution $a^t$ at time t:
$P_a(w) = \sum_{i: w_i = w} a_i^t$ (22)
where $w_i$ is the i-th word of the input question. The model uses the generation probability $P_{gen}$ to decide whether to copy a word from the question sentence or generate one from the knowledge vocabulary:
$P_{gen} = \sigma(w_{h^*}^T h_t^* + w_s^T s_t + w_x^T x_t + b_{ptr})$ (23)
where the vectors $w_{h^*}$, $w_s$, $w_x$ and the scalar $b_{ptr}$ are parameters obtained through training, $x_t$ is the decoder input at time t, and $\sigma$ is the sigmoid function.
Finally, the vocabulary distribution and the attention distribution are combined, weighted by $P_{gen}$, to obtain the final probability distribution of the generated word w:
$P(w) = P_{gen} P_{vocab}(w) + (1 - P_{gen}) P_a(w)$ (24)
The formula shows that $P_{vocab}(w) = 0$ when the word w does not appear in the knowledge vocabulary, and $P_a(w) = 0$ when w does not appear in the question sentence.
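A minimal sketch of the copy distribution $P_a(w)$ and the final mixture in Eq. (24), assuming the vocabulary distribution is given as a dict over the knowledge vocabulary and the question as a list of words. Words outside the knowledge vocabulary can still receive probability through copying, which is how the PGN addresses OOV words. Names and numbers are illustrative:

```python
from collections import defaultdict
from typing import Dict, List

def final_distribution(p_gen: float, p_vocab: Dict[str, float],
                       source_words: List[str], a_t: List[float]) -> Dict[str, float]:
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * P_a(w) over the extended vocabulary."""
    p_copy: Dict[str, float] = defaultdict(float)
    for word, weight in zip(source_words, a_t):
        p_copy[word] += weight  # P_a(w): attention summed over positions where w_i == w
    extended_vocab = set(p_vocab) | set(p_copy)
    return {w: p_gen * p_vocab.get(w, 0.0) + (1.0 - p_gen) * p_copy.get(w, 0.0)
            for w in extended_vocab}

p_vocab = {"上海": 0.6, "北京": 0.4}     # toy knowledge-vocabulary distribution
question = ["姚明", "出生", "姚明"]        # 姚明 is OOV for this toy vocabulary
attn = [0.5, 0.2, 0.3]                    # toy attention distribution a^t
P = final_distribution(0.8, p_vocab, question, attn)
print(round(P["上海"], 2), round(P["姚明"], 2))  # 0.48 0.16
```

Because both input distributions sum to 1, the mixture over the extended vocabulary also sums to 1 for any $P_{gen}$ between 0 and 1.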

Datasets
The dataset used in this experiment is Baidu's open-source WebQA dataset (http://idl.baidu.com/WebQA.html, accessed on 21 March 2021). Each sample consists of a source text, a question, and an answer, where the question is based on the source text. The training set contains 36,145 samples, the test set 3024, and the validation set 3018. The questions in this dataset come from "Baidu Know".
When constructing the knowledge vocabulary, this paper uses the CN-DBPedia knowledge base (http://openkg.cn/dataset/cndbpedia, accessed on 21 March 2021), a large-scale, general-domain structured encyclopedia developed and maintained by the Knowledge Works Research Laboratory of Fudan University. It contains 9 million encyclopedia entities and 67 million triples, including 1.1 million mention2entity records, 4 million summaries, 19.98 million labels, and 41 million infobox records.

Evaluation Metrics
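To evaluate the model, the standard metrics precision (P), recall (R), and F1-score (F1) are used for the entity recognition module, and accuracy is used for the answer generation module. P, R, and F1 are defined as follows:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2PR}{P + R}$$

where TP, FP, and FN denote the numbers of correctly recognized entities, incorrectly recognized entities, and missed entities, respectively.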
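Entity-level P, R, and F1 can be computed as below. This sketch assumes BIO tags and exact matching of span boundaries and entity type, a common convention that the paper does not state explicitly; `bio_spans` and `entity_prf1` are hypothetical helpers.

```python
def bio_spans(tags):
    """Return entity spans (start, end, type) from a BIO tag sequence; end is exclusive.
    An I- tag that does not continue an entity of the same type starts a new entity."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # trailing "O" closes a final entity
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if continues:
            continue
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag != "O":
            start, etype = i, tag[2:]
    return spans

def entity_prf1(gold_seqs, pred_seqs):
    """Exact-match entity-level precision, recall and F1 over a corpus of tag sequences."""
    gold = {(k, *span) for k, tags in enumerate(gold_seqs) for span in bio_spans(tags)}
    pred = {(k, *span) for k, tags in enumerate(pred_seqs) for span in bio_spans(tags)}
    tp = len(gold & pred)                     # correctly recognised entities
    p = tp / len(pred) if pred else 0.0       # TP / (TP + FP)
    r = tp / len(gold) if gold else 0.0       # TP / (TP + FN)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(entity_prf1(gold, pred))
```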

Experimental Environment
The experimental environment of this paper is shown in Table 1.

Parameter Setting
The model is built with TensorFlow, a deep learning framework developed by Google's artificial intelligence team and widely used to implement machine learning algorithms.
The parameter settings of the entity recognition model are shown in Table 2. In the answer generation model, BERT-base is used to obtain the question vectors; it has 12 Transformer layers, 768 hidden units, and 12 attention heads. The maximum sequence length is set to 128 and the batch size to 16. The batch size of the PGN model is also 16, and the custom knowledge vocabulary contains 50,000 words sorted in descending order of word frequency, so that the most common words are included. The parameter settings used by this model are shown in Table 3.
The BiLSTM-CRF named entity recognition model is evaluated on the source texts of the WebQA dataset, and the results are shown in Table 4. To verify the recognition performance of the BiLSTM-CRF model used in this paper, it is compared with CRF, CNN, BiLSTM, and LSTM-CRF, as shown in Table 5. Table 5 shows that our model outperforms the other models, for two reasons:
1. BiLSTM-CRF captures sequence features better than a CRF alone.
2. Although CNN extracts features effectively, the features it extracts are static; for dynamic sequences, BiLSTM-CRF captures the contextual semantic information of the question answering text more comprehensively.
The model comparison histogram in Figure 5 clearly shows that BiLSTM-CRF outperforms the other models.
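The knowledge vocabulary setting described above (50,000 words in descending order of frequency) can be sketched with `collections.Counter`. The input layout (a dict mapping each recognized entity to its triples) and the choice to count each entity or triple field as one word are assumptions for illustration; the example triples are the ones discussed with Table 7.

```python
from collections import Counter

def build_knowledge_vocab(entities, triples, max_size=50_000, tokenize=lambda s: [s]):
    """Hypothetical sketch: count the recognised entities and the fields of their
    triples, then keep the max_size most frequent words, most frequent first."""
    counts = Counter()
    for entity in entities:
        counts.update(tokenize(entity))
        for triple in triples.get(entity, []):   # (subject, relation, object)
            for field in triple:
                counts.update(tokenize(field))
    # most_common sorts by descending count; ties keep first-seen order
    return [word for word, _ in counts.most_common(max_size)]

triples = {
    "Huo Xiaolin": [("Huo Xiaolin", "father", "Huo Shaochang"),
                    ("Huo Xiaolin", "starring", "Yang Zhigang")],
}
vocab = build_knowledge_vocab(["Huo Xiaolin"], triples, max_size=3)
print(vocab)
```

Truncating to `max_size` after sorting is what makes the vocabulary-size experiments possible: a smaller size simply keeps a shorter prefix of the same frequency-ordered list.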

Answer Generation Module
At present, there is little research on generative question answering based on knowledge graphs, and there is no unified standard for evaluating the generation of a complete sentence that contains the answer. Researchers therefore usually define evaluation criteria according to their research question. In this experiment, the criterion is whether the specified knowledge graph entity is included in the generated answer; that is, the evaluation standard is the coverage of knowledge graph entities in the output answers.
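The entity coverage criterion can be expressed as a short function. Treating coverage as plain substring containment of the target entity in the generated answer is an assumption, and the second answer/entity pair below is hypothetical; the first answer reuses the Table 7 output of our method.

```python
def entity_coverage_accuracy(generated_answers, target_entities):
    """Share of generated answers that contain their specified knowledge-graph entity.
    Matching is plain substring containment (an assumption, not stated in the paper)."""
    if len(generated_answers) != len(target_entities):
        raise ValueError("each answer needs exactly one target entity")
    if not generated_answers:
        return 0.0
    hits = sum(entity in answer for answer, entity in zip(generated_answers, target_entities))
    return hits / len(generated_answers)

answers = ["Kou Zhenhai is Huo Xiaolin's father.", "The answer is unknown."]
entities = ["Kou Zhenhai", "EntityY"]
print(entity_coverage_accuracy(answers, entities))
```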
To demonstrate the effectiveness of our method, comparative experiments are conducted with the following models, using the CN-DBPedia knowledge base and the (question, answer) pairs of the WebQA dataset. The results are shown in Table 6.
One of the comparison models is a relation detection model based on a multi-angle attention mechanism, which uses attention to extract the correlation between question patterns and candidate relations at both the word level and the relation level.
Since few generative methods based on knowledge graphs currently exist, extractive methods are chosen as the comparison models. As Table 6 shows, our method achieves good results. Error analysis shows that the errors mainly arise from entities missing from the knowledge vocabulary, incorrect entities, and mismatches between the words describing relations in the knowledge base and those in the answers. These problems stem from the recognition quality of the entity recognition module and from the vocabulary size setting, which in turn affect the performance of our model. Table 7 shows example answers returned by the three comparison models and by our method. In this example, CN-DBPedia links the entity "Huo Xiaolin" in the question to triples such as (Huo Xiaolin, father, Huo Shaochang) and (Huo Xiaolin, starring, Yang Zhigang), but contains no triple for the entity "Huo Shaochang", so the answers returned by the extractive models are wrong. In contrast, our method first reads the entities in the source text, ensuring that they are stored in the knowledge vocabulary. These experiments therefore show that the proposed method can return the correct answer even when the knowledge base is incomplete, and can generate a natural, complete-sentence answer while preserving answer accuracy.
Based on the above analysis, we also examine how the size of the knowledge vocabulary affects answer generation. Experiments are run with three vocabulary sizes, and the comparison models use the same triples as those underlying the knowledge vocabulary. As shown in Figure 6, the accuracy of the extractive models increases as the amount of knowledge base data grows.
However, the accuracy of our method decreases once the vocabulary grows beyond a certain size. The main reason is that, as the vocabulary expands, it includes more useless nouns, which leads to inaccurate answers.

Table 7 (excerpt). Our approach: "Kou Zhenhai is Huo Xiaolin's father."
In the answer generation process, the model that combines BERT-word frequency semantic features with the pointer generator network is used as the answer generator. In the word vector acquisition stage, the BERT semantic features and the word frequency features are concatenated to form the input sequence of the answer generator. The effect of using Term Frequency-Inverse Document Frequency (TF-IDF) [48] in this stage is also considered. To verify the effectiveness of our method, comparative experiments were carried out; the results are shown in Table 8. According to Table 8, the accuracy of PGN alone is 3.19% lower than that of our method, which splices word frequency semantic features onto the BERT vectors, and our method is 1.25% higher than the variant that uses BERT without word frequency semantic features. These experiments show that our method further improves the model: the word vector representations obtained from the pre-trained language model and the word frequency semantic features capture the semantics of the question more accurately, which improves the accuracy of the generated answers. Finally, our method is compared with a further improved variant that uses TF-IDF.
This variant increases accuracy by a further 0.32%: by combining the pre-trained language model with TF-IDF, it expresses the importance of the question entities within the dataset more clearly.
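For reference, a minimal TF-IDF sketch over tokenized questions follows. It uses tf = count / document length and idf = log(N / df), one common variant; libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf by default, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weight of every term in every tokenised document.

    tf = count / document length, idf = log(N / df), where df is the number
    of documents containing the term (one common, unsmoothed variant).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return weights

questions = [["Huo", "Xiaolin", "father"], ["Huo", "Xiaolin", "actor"], ["weather", "today"]]
w = tfidf(questions)
print(max(w[0], key=w[0].get))
```

Terms shared by many questions ("Huo", "Xiaolin") are down-weighted relative to terms specific to one question ("father"), which is the kind of importance signal the TF-IDF variant adds.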
For ease of comparison, the experimental results of each model variant are shown as a histogram in Figure 7.

Conclusions
This paper proposes a generative question answering method based on a knowledge graph. It first describes how to construct a knowledge vocabulary from which answer words are generated, then presents a method for obtaining word vectors from BERT and word frequency semantic features, and finally proposes combining the knowledge vocabulary with a pointer generator network, whose soft switch between generating and copying words is used to produce answers.
In future work, we will consider fusing other features to improve the understanding of question semantics, and adding BERT to the entity recognition module to improve the accuracy of entity identification. To ensure the quality of the expanded knowledge vocabulary, deep learning methods will be considered for further extracting entities and relations from the original part of the dataset. More datasets will also be used to verify the performance of the method. Furthermore, knowledge graph technology will be combined with a variety of generative models to further improve answer generation.