Applied Sciences
  • Article
  • Open Access

30 July 2022

KGNER: Improving Chinese Named Entity Recognition by BERT Infused with the Knowledge Graph

1 College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
3 State Grid Xinjiang Electric Power Co., Ltd., Urumqi 830000, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Natural Language Processing (NLP) and Applications

Abstract

Recently, the lexicon method has been proven to be effective for named entity recognition (NER). However, most existing lexicon-based methods cannot fully utilize the common-sense knowledge in a knowledge graph; for example, word embeddings pretrained by Word2vec or GloVe cannot fully exploit contextual semantic information. Hence, how to make the best use of knowledge for the NER task has become a challenging and hot research topic. We propose knowledge graph-inspired named-entity recognition (KGNER), featuring a masking and encoding method to incorporate common sense into bidirectional encoder representations from transformers (BERT). The proposed method not only preserves the original sentence semantic information but also takes advantage of the knowledge information in a more reasonable way. Subsequently, we model the temporal dependencies by taking a conditional random field (CRF) as the backend, improving the overall performance. Experiments on four dominant datasets demonstrate that KGNER outperforms other lexicon-based models in terms of performance.

1. Introduction

Named-entity recognition (NER) is devoted to locating and classifying certain occurrences of words or expressions in unstructured text into predefined semantic categories, such as person names, locations, organizations, etc. This is not only an important upstream task of natural language processing (NLP), but also an essential prerequisite for other related tasks, such as information retrieval [1], relation extraction [2,3], question-and-answer (Q&A) systems [4] and other applications. It has also drawn the attention of the academic community in recent decades.
The Chinese language has the characteristic that text consists of characters rather than words, so Chinese sentences lack clear word boundaries. This adds new opportunities and challenges to the task of Chinese named-entity recognition. Previous work has shown that character-based approaches perform better than word-based approaches in Chinese NER, because they are not affected by Chinese word segmentation errors [5,6]. The application of lexical features enables external lexical information to enhance the training of NER [7,8,9,10]. However, due to the flexibility of named entities, there may be a large number of out-of-vocabulary (OOV) named entities in the open domain, which poses a great challenge. In addition, named entities may be ambiguous. For example, consider the sentence “Blue Moon tops Premier League with 12-game winning streak.” The term “blue moon” literally means that the moon is blue in color; however, in soccer news, it often stands for the English Premier League team Manchester City F.C. Knowledge graphs containing domain knowledge may be helpful in this regard [11,12].
The rise of pre-trained models in recent years has brought new solutions to named-entity recognition. Unsupervised pre-trained language representation models, e.g., BERT [13] and ELMo [14], have shown promise on many NLP tasks. After being pre-trained on large amounts of unlabeled data to obtain generic representations, the pre-trained models are often equipped with back-end models to suit downstream tasks. Although successful results have been achieved with limited labeled data in specific domains, these models often perform poorly on knowledge-driven tasks.
Knowledge graphs (KGs) represent entities and relationships in a graph and contain a wealth of world knowledge. Therefore, they are an important complement to existing pre-trained language models and have the potential to address the sparsity problem present in most NLP tasks. Recall the previous example: the term “blue moon” also implies the Manchester City team in the soccer domain. To make the model aware of this hidden meaning, we should empower it with the ability to query relevant information from reliable knowledge sources. The knowledge graph opens the door to this information, allowing details about entities or relations that may never have been encountered in the training data to be learned.
There are two lines of recent methods for enhancing knowledge-based neural Chinese named-entity recognition. The first directly considers integrating knowledge when we begin to train a model, such as ERNIE [15]. Although domain-specific knowledge graphs can be injected in the pre-training phase, this training process can be expensive and time-consuming. The second considers integrating domain-specific knowledge into a pre-trained model, such as K-BERT [12]. However, there are two challenges on the road to knowledge integration:
  • Knowledge noise (KN): disturbing knowledge is often incorporated into the modeling, which may confuse the semantic information.
  • Heterogeneous information fusion (HIF): word embeddings in text and entity embeddings in a knowledge graph are obtained in different ways and lie in two independent vector spaces.
To cope with the above problems and challenges, this paper proposes KGNER, based on BERT and KG, to extract entity information and enable the language model to obtain detailed information beyond the training data. The three main contributions of this paper are summarized as follows:
  • This paper proposes a new position coding method that can make good use of the detailed information of the knowledge graph and also preserve the original sentence semantic information.
  • Our method avoids the KN and HIF problems during the process of injecting structural information from the knowledge graph.
  • We adopt a conditional random field model for better modeling of sequential information.
To verify our proposed method, we conducted elaborate experiments on four publicly available datasets, and the results demonstrate the effectiveness of KGNER.
The rest of the paper is organized as follows: We first present related work in Section 2. We then describe the proposed methods and formulations in detail in Section 3, followed by experiments and results in Section 4 and Section 5. We conclude the paper with a discussion of future work in Section 6.

3. Model

3.1. Model Architecture

Formally, we are given a knowledge graph and a Chinese sentence with $n$ characters $s_c = \{c_1, c_2, \ldots, c_n\}$, where $c_i$ denotes the $i$-th character in the sentence and $n$ is the length of the sentence. The KG, denoted as $\mathcal{K}$, is a collection of triples $\varepsilon = (w_i, r_j, w_k)$, where $w_i$ and $w_k$ are names of entities, each token $w_i$ is included in the vocabulary $\mathbb{V}$ ($w_i \in \mathbb{V}$), and $r_j$ is the relation between $w_i$ and $w_k$. All triples are included in the KG.
As illustrated in Figure 1, the model architecture is made up of three modules, i.e., the knowledge layer, embedding layer and output layer.
Figure 1. The model structure.

3.2. Knowledge Layer

The knowledge layer (KL) is designed to inject the knowledge of the knowledge graph into the sentence. Specifically, given an input sentence $S = \{w_0, w_1, w_2, \ldots, w_n\}$ and a KG, we can obtain a sentence tree. This process can be broken down into two steps: knowledge query (K-Query) and knowledge injection (K-Inject).
In K-Query, the sentence is first segmented, and the knowledge graph is organized as a lookup table. Each token in the sentence is matched against the lookup table to select the relevant triples. K-Query can be formulated as (1):
$$E = \text{K-Query}(S, \mathcal{K}) \qquad (1)$$
Next, K-Inject creates a sentence tree $T$ rich in knowledge by placing the triples in $E$ at their corresponding positions, so that the queried triples are injected into the original sentence $S$. The structure of $T$ is illustrated in Figure 2, where $E = \{(w_i, r_{i0}, w_{i0}), \ldots, (w_i, r_{ik}, w_{ik})\}$ is the collection of the corresponding triples. K-Inject can be formulated as (2):
$$T = \text{K-Inject}(S, E) \qquad (2)$$
Figure 2. Structure of the sentence tree.
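As a rough illustration of the two steps, the following Python sketch assumes the KG is stored as a dictionary mapping an entity name to a list of (relation, tail entity) pairs; the function names `k_query` and `k_inject` and the per-token triple limit are illustrative assumptions, not taken from the released code.

```python
from typing import Dict, List, Tuple

# Assumed KG layout: entity surface form -> list of (relation, tail entity) pairs
KG = Dict[str, List[Tuple[str, str]]]

def k_query(tokens: List[str], kg: KG, max_triples: int = 2) -> Dict[int, List[Tuple[str, str]]]:
    """K-Query: look up each token in the KG lookup table and keep at most
    `max_triples` triples per matched token (in the spirit of Equation (1))."""
    matched = {}
    for i, tok in enumerate(tokens):
        if tok in kg:
            matched[i] = kg[tok][:max_triples]
    return matched

def k_inject(tokens: List[str], matched: Dict[int, List[Tuple[str, str]]]):
    """K-Inject: attach the queried triples as branches hanging off their head
    tokens, producing a sentence tree as a list of (token, branches) pairs
    (in the spirit of Equation (2))."""
    tree = []
    for i, tok in enumerate(tokens):
        branches = [[rel, tail] for rel, tail in matched.get(i, [])]
        tree.append((tok, branches))
    return tree

# Toy usage mirroring the running example of Figures 2 and 3:
tokens = ["Jack", "is", "in", "Beijing", "Zhongshan", "park", "now"]
toy_kg = {"Beijing": [("capital", "China"), ("is_a", "city")]}
tree = k_inject(tokens, k_query(tokens, toy_kg))
```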

3.3. Embedding Layer

In the embedding layer, the BERT encoder encodes the sentence tree into an embedding representation. The embedding representation of BERT consists of token embedding, position embedding, and segment embedding. There is one significant difference between our model input and that of the general BERT: our model input is a sentence tree instead of a token sequence. Therefore, it is crucial for BERT to convert the sentence tree into a sequence while preserving the original information of the sentence.
Token embedding: For this work, our token embedding is in line with BERT, and we adopt the vocabulary provided by Google BERT. Each token in the sentence tree is turned into an embedding vector of dimension H with the help of a trainable lookup table. Moreover, [CLS] is considered as a category tag and [MASK] is regarded as a masked token in BERT. Nevertheless, the tokens in the sentence tree are expected to be rearranged before the embedding operation is performed.
Seg embedding: When multiple sentences are entered, BERT uses segment embedding to distinguish between the different sentences. In this work, we only use a single sentence, so a sequence of segment tags {A, A, A, A, ..., A} is adopted to mark it.
Position embedding: For BERT, the position embedding contains all the structural information of the input sentence. Without position embedding, the model degenerates into a bag-of-words model, lacking structural information (i.e., the order of tokens). Position embedding allows us to add the missing structural information back to the unreadable rearranged sentence.
Mask-Self-Attention: The risk involved in using knowledge is that the meaning of the original sentence can change. To avoid affecting the meaning of the original sentence, some measures need to be taken. First, a sentence tree is constructed from the knowledge, and each word in the sentence tree is encoded by its absolute position. Then, the tokens in the sentence tree are flattened into a token sequence by means of their absolute-position index; in other words, the original sentence is followed by the tokens of the branches. As shown in Figure 3, the sentence tree is rearranged as “Jack is in Beijing Zhongshan park now capital China is_a city locate_in Tian’an’men west”. The advantage of this is that the original semantic information of the sentence is preserved, but the sentence is still unreadable. For example, there is no connection between [now] and [capital].
Figure 3. Flat of the sentence tree.
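The sketch below illustrates one plausible flattening of the sentence tree produced above, under the assumption described in this subsection: the original sentence comes first, followed by the branch tokens, and each token records which branch (if any) it belongs to and which original token it hangs from. The position indices here are simply the flattened order; the exact position-coding scheme of the paper may differ.

```python
def flatten_tree(tree):
    """Flatten a sentence tree (list of (token, branches) pairs) into a token
    sequence: the original sentence first, then the branch tokens. Also return
    per-token bookkeeping used later to build the visible matrix."""
    tokens, branch_ids, head_idx = [], [], []
    # 1) Trunk: the original sentence, in its original order.
    for i, (tok, _) in enumerate(tree):
        tokens.append(tok)
        branch_ids.append(None)   # None marks an original-sentence token
        head_idx.append(i)        # a trunk token is its own head
    # 2) Branches: appended after the original sentence.
    next_branch = 0
    for i, (_, branches) in enumerate(tree):
        for branch in branches:          # e.g., ["capital", "China"]
            for tok in branch:
                tokens.append(tok)
                branch_ids.append(next_branch)
                head_idx.append(i)       # the trunk token this branch hangs from
            next_branch += 1
    positions = list(range(len(tokens))) # illustrative absolute positions
    return tokens, positions, branch_ids, head_idx

# tokens, positions, branch_ids, head_idx = flatten_tree(tree)
# tokens -> ["Jack", "is", "in", "Beijing", "Zhongshan", "park", "now",
#            "capital", "China", "is_a", "city"]
```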
Fortunately, this can be solved by limiting the visible area of each token with a mask matrix. We introduce the visible mask matrix in Figure 4. Tokens of different branches are invisible to each other. Here is an example: [China] and [city] are not visible to each other; however, [Beijing], [capital] and [China] are visible to each other. In this way, the words in the sentence not only acquire the corresponding knowledge information but are also protected from the influence of irrelevant words. The visible matrix is defined as (3).
$$M_{ij} = \begin{cases} 0 & \text{if } w_i \ominus w_j \\ -\infty & \text{if } w_i \oslash w_j \end{cases} \qquad (3)$$
where $w_i \ominus w_j$ indicates that $w_i$ and $w_j$ are in the same branch, while $w_i \oslash w_j$ indicates that they are not; $i$ and $j$ are the absolute position indices. To prevent semantic changes, we take advantage of the sentence structure information in $M$ by utilizing mask-self-attention, which is an extension of self-attention. Formally, mask-self-attention is defined as (4).
$$Q^{i+1}, K^{i+1}, V^{i+1} = h^{i} W_q, \; h^{i} W_k, \; h^{i} W_v,$$
$$S^{i+1} = \operatorname{softmax}\!\left(\frac{Q^{i+1} {K^{i+1}}^{\top} + M}{\sqrt{d_k}}\right),$$
$$h^{i+1} = S^{i+1} V^{i+1} \qquad (4)$$
where $W_q$, $W_k$ and $W_v$ are trainable model parameters, $h^{i}$ is the hidden state of the $i$-th mask-self-attention block, and $d_k$ is the scaling factor.
Figure 4. Mask matrix.
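The following sketch shows how the bookkeeping from the flattening step above could be turned into the visible matrix of Equation (3) and used inside mask-self-attention (Equation (4)). It encodes one plausible reading of the visibility rule described in this subsection (trunk tokens see each other, a branch token sees its own branch and the trunk token it hangs from), approximates minus infinity with a large negative constant, and uses illustrative names throughout.

```python
import numpy as np

def build_visible_matrix(branch_ids, head_idx, neg=-1e9):
    """Visible matrix M of Equation (3): 0 where two tokens may attend to each
    other, a large negative value (standing in for -inf) otherwise."""
    n = len(branch_ids)
    M = np.full((n, n), neg)
    for i in range(n):
        for j in range(n):
            both_trunk = branch_ids[i] is None and branch_ids[j] is None
            same_branch = branch_ids[i] is not None and branch_ids[i] == branch_ids[j]
            head_link = ((branch_ids[i] is None and head_idx[j] == i) or
                         (branch_ids[j] is None and head_idx[i] == j))
            if both_trunk or same_branch or head_link:
                M[i, j] = 0.0
    return M

def mask_self_attention(h, W_q, W_k, W_v, M):
    """One mask-self-attention block of Equation (4):
    softmax((Q K^T + M) / sqrt(d_k)) V, computed with NumPy."""
    Q, K, V = h @ W_q, h @ W_k, h @ W_v
    d_k = K.shape[-1]
    scores = (Q @ K.T + M) / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    S = np.exp(scores)
    S /= S.sum(axis=-1, keepdims=True)
    return S @ V
```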

3.4. Output Layer

For sequence labeling tasks, there is a strong connection between labels. For example, compared to a verb, a noun is more likely to follow an adjective. In NER, using the standard BIO annotation [46], I-PER cannot be followed by I-ORG. Therefore, considering the connection between adjacent tags can be of great benefit to NER. As a conditional random field (CRF) [47,48] can make full use of neighboring tag information when predicting the current tag, it can jointly decode the best chain of labels for a given input sentence. Consequently, we use a CRF to model the label sequence instead of decoding each label independently.
Formally, we denote a generic input sequence by $z = \{z_1, \ldots, z_n\}$, where $z_i$ represents the input vector of the $i$-th word, and a generic sequence of labels for $z$ by $y = \{y_1, \ldots, y_n\}$. $\mathcal{Y}(z)$ represents the set of possible label sequences for $z$. For all possible label sequences $y$ given $z$, the sequence CRF model defines a family of conditional probabilities $p(y \mid z; W, b)$, which takes the following form:
$$p(y \mid z; W, b) = \frac{\prod_{i=1}^{n} \psi_i\left(y_{i-1}, y_i, z\right)}{\sum_{y' \in \mathcal{Y}(z)} \prod_{i=1}^{n} \psi_i\left(y'_{i-1}, y'_i, z\right)}$$
$\psi_i(y', y, z) = \exp\left(W_{y', y}^{\top} z_i + b_{y', y}\right)$ are potential functions, where $W_{y', y}$ denotes the weight vector and $b_{y', y}$ denotes the bias corresponding to the label pair $(y', y)$.
We employ maximum conditional likelihood estimation to train the CRF. Given a training set $\{(z^{(i)}, y^{(i)})\}$, the logarithm of the likelihood (a.k.a. the log-likelihood) is defined by the following equation:
$$L(W, b) = \sum_i \log p\left(y^{(i)} \mid z^{(i)}; W, b\right)$$
The parameters maximizing the log-likelihood $L(W, b)$ are chosen by maximum likelihood training. Decoding aims to find the label sequence $y^*$ with the highest conditional probability:
$$y^* = \operatorname*{argmax}_{y \in \mathcal{Y}(z)} p(y \mid z; W, b)$$
For a sequence CRF model, the Viterbi algorithm is adopted for training and decoding.
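As a concrete reference for the decoding step, here is a minimal NumPy sketch of Viterbi decoding for a linear-chain CRF over per-position emission scores (e.g., produced by BERT plus a linear layer) and a learned transition matrix; in practice an off-the-shelf CRF layer would be used, and the names below are illustrative.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (n, num_tags) unary scores; transitions: (num_tags, num_tags)
    where transitions[a, b] scores tag a followed by tag b.
    Returns the highest-scoring tag path as a list of tag indices."""
    n, num_tags = emissions.shape
    score = emissions[0].copy()                 # best score ending in each tag at step 0
    backptr = np.zeros((n, num_tags), dtype=int)
    for t in range(1, n):
        # total[a, b] = score[a] + transitions[a, b] + emissions[t, b]
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)       # best previous tag for each current tag
        score = total.max(axis=0)
    # Follow the back-pointers from the best final tag
    best_last = int(score.argmax())
    path = [best_last]
    for t in range(n - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```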

4. Experiments Setup

An extensive set of experiments was carried out to investigate the effectiveness of knowledge graphs across different domains. In addition, we aimed to empirically compare lexicon-based and knowledge-based Chinese NER in different settings.

4.1. Data

Four Chinese NER datasets were used to evaluate our model: (1) OntoNotes 4.0 [49], (2) MSRA [50], (3) Resume [8], and (4) Weibo [51,52]. The OntoNotes and MSRA datasets belong to the news domain. The Weibo NER dataset was obtained from the social media website Sina Weibo. The Resume NER dataset comprises resumes of senior executives, annotated by [8]. Dataset statistics are shown in Table 1.
Table 1. Statistics of datasets.

4.2. Metrics

Precision P, recall R and F-measure were used as measures, defined as follows:
$$P = \frac{N_m}{N_p}, \qquad R = \frac{N_m}{N_r}, \qquad F_1 = \frac{2 \times P \times R}{P + R}$$
$N_m$, $N_p$ and $N_r$ denote the total number of matched entities, predicted entities and real entities, respectively. $F_1$ is the harmonic mean of precision and recall, a composite metric that balances the impact of both.
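For clarity, these metrics can be computed at the span level as in the sketch below, assuming (as is conventional, not stated explicitly above) that an entity counts as matched only when both its boundaries and its type agree between prediction and gold.

```python
def prf1(pred_entities, gold_entities):
    """pred_entities, gold_entities: sets of (start, end, type) tuples."""
    n_m = len(pred_entities & gold_entities)   # matched entities
    n_p = len(pred_entities)                   # predicted entities
    n_r = len(gold_entities)                   # real (gold) entities
    p = n_m / n_p if n_p else 0.0
    r = n_m / n_r if n_r else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1
```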

4.3. Hyperparameters

Our BERT parameters are consistent with Google BERT [13]. $L$ is the number of mask-self-attention layers, $A$ is the number of attention heads, and $H$ is the hidden dimension of the embedding vectors. The model is configured as follows: $L$, $A$ and $H$ are 12, 12 and 768, respectively. The total number of trainable parameters is the same as that of BERT (110M). The Adam optimizer with an initial learning rate of $2 \times 10^{-5}$ is adopted. The maximum number of epochs is 5 for training on all datasets. We set the maximum sequence length to 256 and the training batch size to 16 for all datasets.
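For reference, the reported hyperparameters can be collected into a small configuration, as in the sketch below; the dictionary keys and the commented optimizer call are illustrative, not taken from the released code.

```python
# Hyperparameters reported above, gathered into a config dict (sketch only).
config = {
    "num_layers": 12,          # L: mask-self-attention layers
    "num_heads": 12,           # A: attention heads
    "hidden_size": 768,        # H: embedding dimension
    "learning_rate": 2e-5,     # Adam initial learning rate
    "max_epochs": 5,
    "max_seq_length": 256,
    "batch_size": 16,
}

# Optimizer setup, assuming a PyTorch model `model` built with the sizes above:
# import torch
# optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])
```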

4.4. Baselines

To verify the validity of the proposed model, we conducted a comparison with K-BERT [12] in the experiments.
The experimental results on the Chinese NER datasets are given in Table 2. In the first block of the table, the first four rows [8,53,54,55] show the performance of lexicon-enhanced, character-based Chinese NER models, which build dictionaries from pre-trained word vectors. The last two rows [18,21] in the same block are state-of-the-art models that integrate lexicon information and BERT using a shallow fusion layer. The middle five rows employ pre-trained language models: BERT directly fine-tunes a pre-trained Chinese BERT on Chinese sequence labeling tasks; ERNIE [15] extends BERT by using an entity-level mask to guide pre-training; ZEN [56] explicitly injects N-gram information into BERT through extra multi-layer N-gram Transformer encoders and pre-training; and LEBERT integrates lexicon features into BERT with a Lexicon Adapter placed between Transformer layers, achieving a deep integration of lexicons and BERT. The last two rows improve Chinese NER with BERT infused with the knowledge graph. As we can see, the knowledge-based models achieve better performance than the lexicon-enhanced models.
Table 2. Four datasets results (F1).

5. Overall Results

5.1. Ablation Studies on the Four Datasets

To study the role of each part of KGNER, we performed ablation experiments on the four datasets and display the results in Figure 5. The results demonstrate that the model's performance declines if the mask matrix is dropped. For example, the $F_1$ score on Weibo drops sharply, by 4.4, without the mask matrix. In this situation all tokens are visible, and some tokens can be interfered with by irrelevant tokens, indicating that the mask matrix plays an irreplaceable role in the graph structure.
Figure 5. Ablation study on the four datasets.
To better show the advantage of our model, we dropped the position coding and mask matrix and simplified the structure; KGNER without the position coding and mask matrix is equivalent to BERT. The results show that KGNER achieves an $F_1$ score 0.9 higher, on average, than BERT on the four datasets. From this, we deduce that the knowledge graph plays an essential role in KGNER. It also shows that KGNER has a more robust ability to model sentences.

5.2. Performance against Adding Different Knowledge

The knowledge graph contains a large amount of structured data, and there are different ways to make full use of the existing knowledge. Table 3 shows the results obtained by adding different knowledge to the model. Adding only the relational knowledge contained in the triples provides a modest boost on the Ontonotes dataset; however, on the Weibo dataset, the $F_1$ value drops by more than 5.0. When adding both relations and tail entities, the $F_1$ value gains a little on the MSRA dataset. This shows that adding more knowledge to the sentence is not always better. We infer that this may be attributable to the different characteristics of the datasets.
Table 3. Adding different knowledge to the sentence (F1).

5.3. Performance against Using Different Knowledge Graph

  • CN-DBpedia [34] is a large open-field encyclopedic KG developed by the Knowledge Work Lab of Fudan University, which involves a large number of entities and relationships. CN-DBpedia has been refined by eliminating those triples with entity names of less than 2 in length or containing special characters. There are 5.17 million triples in the improved CN-DBpedia.
  • HowNet [35] is a large-scale language knowledge base for Chinese vocabulary and concepts, in which each Chinese word is annotated with semantic units called sememes. If we take (word, contain, sememes) as a triple, HowNet is a language KG. Similarly, the official HowNet is refined by eliminating those triples with entity names of less than 2 in length or containing special characters. There are 52,576 triples in the improved HowNet.
  • MedicalKG is a Chinese medical concept KG developed by [12]. There are four types of hypernyms (symptoms, diseases, parts, and treatments) and 13,864 triples in it.
  • Medicine_NER is the Clinical Named Entity Recognition (CNER) task released in CCKS 2017.
We used the three different knowledge graphs to perform extensive experiments on five datasets. Table 4 shows the experimental results. Compared with HowNet and MedicalKG, Weibo and Resume achieve excellent $F_1$ scores with CN-DBpedia, but Ontonotes obtains the highest $F_1$ score with HowNet. We speculate on the following reasons: the Weibo and Resume datasets are obtained from the Internet and better match the knowledge in CN-DBpedia, while Ontonotes is in the news domain and can acquire more knowledge from HowNet. Finally, Medicine_NER achieves the best results only with MedicalKG. From the above results, we conclude that the correct selection of the KG is of great benefit to domain-specific tasks.
Table 4. Comparison of different knowledge graphs (F1).

5.4. F1 Score against Sentence Length

Based on sentence length, we divided the test dataset into six parts. In Table 5, the second column gives the number of sentences in each length range, the third column counts the number of sentences that can be matched to knowledge, and the fourth column is the ratio of the two. Table 5 reveals the following. First, the performance on both short and long sentences is not very good. Part of the reason is that the number of short sentences that can be matched to knowledge is too low, while long sentences, although they match more knowledge, become too long after the knowledge is added, increasing the semantic complexity of the sentence. In contrast, sentences of 40 to 100 characters in length not only match a lot of knowledge but also lead to a higher $F_1$ score.
Table 5. F 1 score against sentence length on the OntoNotes dateset.
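The bucketing procedure described above can be sketched as follows; the bucket boundaries and the notion of "matched to knowledge" (at least one token found in the KG lookup table) are our assumptions for illustration.

```python
def bucket_stats(sentences, kg,
                 buckets=((0, 20), (20, 40), (40, 60), (60, 80), (80, 100), (100, 10_000))):
    """sentences: list of token lists; kg: dict-like KG lookup table.
    Counts, per length bucket, how many sentences match at least one triple."""
    stats = {b: {"total": 0, "with_knowledge": 0} for b in buckets}
    for tokens in sentences:
        length = len(tokens)
        for lo, hi in buckets:
            if lo <= length < hi:
                stats[(lo, hi)]["total"] += 1
                if any(tok in kg for tok in tokens):
                    stats[(lo, hi)]["with_knowledge"] += 1
                break
    for (lo, hi), s in stats.items():
        ratio = s["with_knowledge"] / s["total"] if s["total"] else 0.0
        print(f"[{lo}, {hi}) total={s['total']} matched={s['with_knowledge']} ratio={ratio:.2%}")
    return stats
```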

5.5. Efficiency Comparison

As shown in Figure 6, we compared the inference speed of the three models on the four datasets. As we can see, the inference time grows as the dataset size increases, with the exception of the Ontonotes dataset. On the same data, the inference time of both KBERT and KGNER increases due to the introduction of external knowledge; this is likely because the introduced knowledge increases the sentence length. Compared with KBERT, and except for the Ontonotes dataset, our model's inference is slightly slower, which we attribute to the overhead of the CRF.
Figure 6. Inference speed on the four datasets.

5.6. Case Study

Table 6 illustrates examples of Chinese NER tagging results on the Ontonotes and Weibo datasets, respectively. Consider the sentence “The old woman won twelve games in the match”. “老妇人 (The old woman)” is the nickname of the football club Juventus F.C.; according to our normal understanding, if matched knowledge is introduced, “老妇人 (The old woman)” should be classified as “ORG”. However, the results were contrary to expectations, and it is not even correctly labeled on the given datasets. Nevertheless, this is not difficult to understand, since we do not introduce domain-specific knowledge in the training phase. Consequently, it is indispensable to introduce domain-specific knowledge according to different tasks.
Table 6. Labeling comparison on different datasets.

6. Conclusions

In this paper, we proposed KGNER, a knowledge graph-inspired named-entity recognition model, aiming to incorporate knowledge into NER. To prevent the introduced knowledge from diverting the sentence from its correct meaning, we conceived a new means of position encoding for raw sentences and knowledge and constrained the knowledge's impact with the masking matrix. This not only preserves the original information of the sentence but also avoids the divergence caused by the use of different vector spaces. Compared to the baseline, our model incurs a slight time overhead, since we use a CRF and build the sentence tree. Nevertheless, the experimental results show that our model outperforms other lexicon-based models on four Chinese datasets. They also show that introducing knowledge into NER is a promising endeavor. We will continue to explore the potential of KGNER on other NLP tasks in future work.

Author Contributions

Conceptualization, W.H. and L.H.; methodology, W.H.; formal analysis, H.M.; investigation, H.M.; resources, K.W.; data curation, J.X.; writing—original draft preparation, W.H.; writing—review and editing, L.H.; All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The Weibo, Resume, MSRA and Ontonotes datasets are publicly available at: https://github.com/liuwei1206/LEBERT. The HowNet, CN-DBpedia, MedicaKG, and Medicine_NER are publicly available at: https://github.com/autoliuweijie/K-BERT.

Acknowledgments

We sincerely appreciate the anonymous reviewers for their precious comments and valuable suggestions. Moreover, we genuinely thank our teacher for his constructive guidance during the preparation of this paper and our friends Xuebin Jing and Zhenping Kang for their help in polishing the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NER: Named entity recognition
BERT: Bidirectional Encoder Representations from Transformers
CRF: Conditional random field
NLP: Natural language processing
Q&A: Question and answer
ELMo: Embeddings from Language Models
KG: Knowledge graph
KN: Knowledge noise
HIF: Heterogeneous information fusion
LSTM: Long short-term memory
RNN: Recurrent neural network
PTMs: Pre-trained models
GPT: Generative Pre-Training
BART: Bidirectional and Auto-Regressive Transformers
KB: Knowledge base
GRU: Gated recurrent unit
KGE: Knowledge graph embedding
KL: Knowledge layer

References

  1. Chen, Y.; Xu, L.; Liu, K.; Zeng, D.; Zhao, J. Event extraction via dynamic multi-pooling convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; pp. 167–176. [Google Scholar]
  2. Mao, N.; Huang, W.; Zhong, H. KGGCN: Knowledge-Guided Graph Convolutional Networks for Distantly Supervised Relation Extraction. Appl. Sci. 2021, 11, 7734. [Google Scholar] [CrossRef]
  3. Kim, K.; Hur, Y.; Kim, G.; Lim, H. GREG: A Global Level Relation Extraction with Knowledge Graph Embedding. Appl. Sci. 2020, 10, 1181. [Google Scholar] [CrossRef] [Green Version]
  4. Diefenbach, D.; Lopez, V.; Singh, K.; Maret, P. Core techniques of question answering systems over knowledge bases: A survey. Knowl. Inf. Syst. 2018, 55, 529–569. [Google Scholar] [CrossRef]
  5. He, J.; Wang, H. Chinese named entity recognition and word segmentation based on character. In Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing, Hyderabad, India, 11–12 January 2008. [Google Scholar]
  6. Liu, Z.; Zhu, C.; Zhao, T. Chinese named entity recognition with a sequence labeling approach: Based on characters, or based on words? In Proceedings of the International Conference on Intelligent Computing, Washington, DC, USA, 22–23 June 2010; pp. 634–640. [Google Scholar]
  7. Luo, G.; Huang, X.; Lin, C.Y.; Nie, Z. Joint entity recognition and disambiguation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 879–888. [Google Scholar]
  8. Zhang, Y.; Yang, J. Chinese NER using lattice LSTM. arXiv 2018, arXiv:1805.02023. [Google Scholar]
  9. Gui, T.; Ma, R.; Zhang, Q.; Zhao, L.; Jiang, Y.G.; Huang, X. CNN-Based Chinese NER with Lexicon Rethinking. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; pp. 4982–4988. [Google Scholar]
  10. Gui, T.; Zou, Y.; Zhang, Q.; Peng, M.; Fu, J.; Wei, Z.; Huang, X. A Lexicon-Based Graph Neural Network for Chinese NER. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 1039–1049. [Google Scholar]
  11. He, Q.; Wu, L.; Yin, Y.; Cai, H. Knowledge-graph augmented word representations for named entity recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 7919–7926. [Google Scholar]
  12. Liu, W.; Zhou, P.; Zhao, Z.; Wang, Z.; Ju, Q.; Deng, H.; Wang, P. K-bert: Enabling language representation with knowledge graph. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 2901–2908. [Google Scholar]
  13. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  14. Peters, M.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep contextualized word representations. arXiv 2018, arXiv:1802.05365. [Google Scholar]
  15. Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; Liu, Q. ERNIE: Enhanced language representation with informative entities. arXiv 2019, arXiv:1905.07129. [Google Scholar]
  16. Huang, S.; Sun, X.; Wang, H. Addressing domain adaptation for chinese word segmentation with global recurrent structure. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Taipei, Taiwan, 27 November–1 December 2017; pp. 184–193. [Google Scholar]
  17. Chen, X.; Shi, Z.; Qiu, X.; Huang, X. Adversarial multi-criteria learning for chinese word segmentation. arXiv 2017, arXiv:1704.07556. [Google Scholar]
  18. Ma, R.; Peng, M.; Zhang, Q.; Huang, X. Simplify the usage of lexicon in Chinese NER. arXiv 2019, arXiv:1908.05969. [Google Scholar]
  19. Mengge, X.; Bowen, Y.; Tingwen, L.; Yue, Z.; Erli, M.; Bin, W. Porous lattice-based transformer encoder for chinese NER. arXiv 2019, arXiv:1911.02733. [Google Scholar]
  20. Sui, D.; Chen, Y.; Liu, K.; Zhao, J.; Liu, S. Leverage lexical knowledge for Chinese named entity recognition via collaborative graph network. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3830–3840. [Google Scholar]
  21. Li, X.; Yan, H.; Qiu, X.; Huang, X. FLAT: Chinese NER using flat-lattice transformer. arXiv 2020, arXiv:2004.11795. [Google Scholar]
  22. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
  23. Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
  24. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-training. Available online: https://www.semanticscholar.org/paper/Improving-Language-Understanding-by-Generative-Radford-Narasimhan/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035 (accessed on 22 July 2022).
  25. Yang, H. BERT meets chinese word segmentation. arXiv 2019, arXiv:1909.09292. [Google Scholar]
  26. Meng, Y.; Wu, W.; Wang, F.; Li, X.; Nie, P.; Yin, F.; Li, M.; Han, Q.; Sun, X.; Li, J. Glyce: Glyph-vectors for chinese character representations. arXiv 2019, arXiv:1901.10125. [Google Scholar]
  27. Hu, Y.; Verberne, S.; Scott, D.; Bel, N.; Zong, C. Named entity recognition for Chinese biomedical patents. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 627–637. [Google Scholar]
  28. Liu, W.; Fu, X.; Zhang, Y.; Xiao, W. Lexicon enhanced chinese sequence labeling using bert adapter. arXiv 2021, arXiv:2105.07148. [Google Scholar]
  29. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv 2019, arXiv:1910.13461. [Google Scholar]
  30. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  31. Du, Z.; Qian, Y.; Liu, X.; Ding, M.; Qiu, J.; Yang, Z.; Tang, J. GLM: General Language Model Pretraining with Autoregressive Blank Infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 320–335. [Google Scholar]
  32. Yan, H.; Gui, T.; Dai, J.; Guo, Q.; Zhang, Z.; Qiu, X. A unified generative framework for various NER subtasks. arXiv 2021, arXiv:2106.01223. [Google Scholar]
  33. Cui, L.; Wu, Y.; Liu, J.; Yang, S.; Zhang, Y. Template-based named entity recognition using BART. arXiv 2021, arXiv:2106.01760. [Google Scholar]
  34. Xu, B.; Xu, Y.; Liang, J.; Xie, C.; Liang, B.; Cui, W.; Xiao, Y. CN-DBpedia: A never-ending Chinese knowledge extraction system. In Proceedings of the International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Arras, France, 27–30 June 2017; pp. 428–438. [Google Scholar]
  35. Dong, Z.; Dong, Q. HowNet-a hybrid language and knowledge resource. In Proceedings of the International Conference on Natural Language Processing and Knowledge Engineering, Beijing, China, 26–29 October 2003; pp. 820–824. [Google Scholar]
  36. Bordes, A.; Usunier, N.; Garcia-Durán, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-Relational Data. In Proceedings of the 26th International Conference on Neural Information Processing Systems—Volume 2, Lake Tahoe, NV, USA, 5–10 December 2013; Curran Associates Inc.: Red Hook, NY, USA, 2013; pp. 2787–2795. [Google Scholar]
  37. Xie, R.; Liu, Z.; Jia, J.; Luan, H.; Sun, M. Representation learning of knowledge graphs with entity descriptions. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar]
  38. Zhong, H.; Zhang, J.; Wang, Z.; Wan, H.; Chen, Z. Aligning knowledge and text embeddings by entity descriptions. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 267–272. [Google Scholar]
  39. Xiao, H.; Huang, M.; Hao, Y.; Zhu, X. Transg: A generative mixture model for knowledge graph embedding. arXiv 2015, arXiv:1509.05488. [Google Scholar]
  40. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; pp. 687–696. [Google Scholar]
  41. Shi, B.; Weninger, T. Proje: Embedding projection for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
  42. Annervaz, K.; Chowdhury, S.B.R.; Dukkipati, A. Learning beyond datasets: Knowledge graph augmented neural networks for natural language processing. arXiv 2018, arXiv:1802.05930. [Google Scholar]
  43. Xin, J.; Lin, Y.; Liu, Z.; Sun, M. Improving neural fine-grained entity typing with knowledge attention. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  44. Wang, Y.; Sun, Y.; Ma, Z.; Gao, L.; Xu, Y. An ERNIE-Based Joint Model for Chinese Named Entity Recognition. Appl. Sci. 2020, 10, 5711. [Google Scholar] [CrossRef]
  45. Wang, R.; Tang, D.; Duan, N.; Wei, Z.; Huang, X.; Cao, G.; Jiang, D.; Zhou, M. K-adapter: Infusing knowledge into pre-trained models with adapters. arXiv 2020, arXiv:2002.01808. [Google Scholar]
  46. Sang, E.F.; Veenstra, J. Representing text chunks. arXiv 1999, arXiv:cs/9907006v1. [Google Scholar]
  47. Khan, W.; Daud, A.; Shahzad, K.; Amjad, T.; Banjar, A.; Fasihuddin, H. Named Entity Recognition Using Conditional Random Fields. Appl. Sci. 2022, 12, 6391. [Google Scholar] [CrossRef]
  48. Lafferty, J.D.; McCallum, A.; Pereira, F.C.N. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML `01, San Diego, CA, USA, 28 June–1 July 2001; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2001; pp. 282–289. [Google Scholar]
  49. Weischedel, R.; Pradhan, S.; Ramshaw, L.; Palmer, M.; Xue, N.; Marcus, M.; Taylor, A.; Greenberg, C.; Hovy, E.; Belvin, R.; et al. Ontonotes Release 4.0; LDC2011T03; Linguistic Data Consortium: Philadelphia, PA, USA, 2011. [Google Scholar]
  50. Levow, G.A. The third international Chinese language processing bakeoff: Word segmentation and named entity recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, Sydney, Australia, 22–23 July 2006; pp. 108–117. [Google Scholar]
  51. Peng, N.; Dredze, M. Named entity recognition for chinese social media with jointly trained embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 548–554. [Google Scholar]
  52. He, H.; Sun, X. F-score driven max margin neural network for named entity recognition in chinese social media. arXiv 2016, arXiv:1611.04234. [Google Scholar]
  53. Zhu, Y.; Wang, G.; Karlsson, B.F. CAN-NER: Convolutional attention network for Chinese named entity recognition. arXiv 2019, arXiv:1904.02141. [Google Scholar]
  54. Liu, W.; Xu, T.; Xu, Q.; Song, J.; Zu, Y. An encoding strategy based word-character LSTM for Chinese NER. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2 June–7 June 2019; pp. 2379–2389. [Google Scholar]
  55. Ding, R.; Xie, P.; Zhang, X.; Lu, W.; Li, L.; Si, L. A neural multi-digraph model for Chinese NER with gazetteers. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1462–1467. [Google Scholar]
  56. Diao, S.; Bai, J.; Song, Y.; Zhang, T.; Wang, Y. ZEN: Pre-training Chinese text encoder enhanced by n-gram representations. arXiv 2019, arXiv:1911.00720. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
