Article

A Chinese Few-Shot Named-Entity Recognition Model Based on Multi-Label Prompts and Boundary Information

School of Computer, Electronic and Information, Guangxi University, Nanning 530004, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(11), 5801; https://doi.org/10.3390/app15115801
Submission received: 3 April 2025 / Revised: 15 May 2025 / Accepted: 18 May 2025 / Published: 22 May 2025

Abstract

Currently, the few-shot setting and entity nesting are two major challenges in named-entity recognition (NER). Compared to English, Chinese NER not only involves complex grammatical structures, polysemy, and entity nesting but also faces low-resource scenarios in specific domains, where sample annotation is difficult. To address these two issues, we propose a Chinese few-shot named-entity recognition model that integrates multi-label prompts and boundary information (MPBCNER). The model builds on a pre-trained language model (PLM) combined with a pointer network. First, it uses multiple entity-label words and position slots as prompt information in the entity-recognition training task; activating the PLM parameters associated with the corresponding entity labels through this prompt information improves entity recognition on small-sample data. Second, a Graph Attention Network (GAT) is used to build a boundary-information extraction module that integrates boundary information with text features, allowing the model to attend more closely to features near the boundaries when recognizing entities and thereby improving the accuracy of entity-boundary recognition. Experiments on multiple public small-sample datasets and on our own annotated dataset in the field of government auditing demonstrate the effectiveness of the model.

1. Introduction

Named-entity recognition (NER) holds a crucial position in the field of natural language processing [1]. In information-extraction tasks, it can accurately identify key information such as the names of people, locations, and organizations from vast amounts of text, transforming unstructured text into structured data, thereby providing strong support for subsequent data analysis and decision making [2].
Traditional named-entity recognition methods based on sequence labeling depend heavily on large amounts of labeled data [3]. These data help the model learn the relationships between entities and context, as well as the characteristics of entity categories, so that it performs well when annotating new texts. Lafferty et al. [4] proposed the conditional random field (CRF) model for sequence-labeling tasks; by learning the mapping from input sequences to output label sequences, a CRF can automatically identify and capture features and patterns in sequences. Huang et al. [5] proposed a model based on bidirectional long short-term memory networks (BiLSTM) and CRF for sequence labeling; using a BiLSTM network as the context-encoding layer, it better captures long-term dependencies in the sequence. Ma et al. [6] added CNN units to the BiLSTM-CRF model to extract morphological information from word characters, further improving performance.
In recent years, with the successive emergence of large-scale pre-trained models (e.g., BERT [7], GPT [8,9,10], RoBERTa [11], T5 [12], ERNIE [13,14,15]), large parameter scales have given these models strong knowledge representation and transfer-learning capabilities, achieving state-of-the-art results on many natural language processing tasks. Souza et al. [16] proposed a BERT-CRF model for NER, combining BERT's transfer capabilities with CRF's structured prediction and outperforming the aforementioned baseline methods. However, both traditional sequence-labeling methods and the fine-tuning of large models for downstream tasks require large amounts of sample data during training. In practical applications, obtaining large-scale labeled data can be difficult, requiring industry experts to spend considerable time and effort on manual annotation [17]. In small-sample scenarios, the limited labeled data makes it hard for a model to learn the complex features and constraints of different entity types, leading to unsatisfactory entity-recognition results.
To harness the potential of large models in small-sample scenarios, researchers have adopted a range of methods, including transfer learning [18], contrastive learning [19], knowledge-graph injection [20], and prompt learning [21], all of which aim to enhance a model's generalization ability and thereby improve its performance on small samples. Among these, prompt learning, an emerging model-adaptation paradigm, has proven a particularly efficient and flexible solution in small-sample scenarios through the embedding of natural-language prompts, garnering widespread attention. Cui et al. [22] proposed TemplateNER, the first to apply the prompt method to NER, predicting entity types by manually constructing discrete templates and enumerating all possible entity spans; it significantly outperforms traditional sequence-labeling methods in cross-domain and few-shot settings, but its predictions depend heavily on the quality of the template design, and the large number of negative samples it introduces leads to severe efficiency issues. Chen et al. [23] introduced a soft-template prompt-learning method (LightNER), incorporating a small number of learnable parameters as prompt guidance in the self-attention layers of pre-trained models, compensating for the shortcomings of manually designed templates. Shen et al. [24] designed a dual-slot multi-prompt template for entity types and positions, extracting all entities by predicting each prompt slot; this avoids the efficiency issues of enumerating spans and predicts all entities in a single pass. The method offers a new way to use prompts in NER, but its model design is relatively simple: it utilizes the positional information of entities without considering the boundary information of entity heads and tails, leaving room for improvement on Chinese NER tasks.
Compared to English, Chinese NER faces many unique challenges. Chinese has a complex structure, and Chinese characters have no natural delimiters, making word boundaries ambiguous [25]. This makes both character-based and word-based sequence labeling difficult. Moreover, Chinese text frequently contains nested entities; if a model cannot handle nested relationships effectively, it is prone to recognition errors. These factors make Chinese named-entity recognition considerably harder than English NER, placing higher demands on model performance. A series of experiments in [26,27,28,29,30] has demonstrated that incorporating boundary information into the encoding and decoding layers of a model can effectively enhance its accuracy on Chinese natural-language tasks.
To address the issues identified above in Chinese NER, and after weighing the advantages and disadvantages of the various methods, we propose a Chinese few-shot named-entity recognition model that integrates multi-label prompts and boundary information (MPBCNER). First, MPBCNER uses a pre-trained large model as the encoding layer and introduces, at the input layer, a template composed of entity-label prompt words and entity-position prompt slots. This makes better use of the prior knowledge of the pre-trained model during training and prediction, enhancing performance in small-sample scenarios. Second, we build a boundary-information extraction module with a graph attention network, allowing the model to focus on features near the boundaries when recognizing entities and thereby improving the accuracy of entity-boundary recognition. Finally, we use a pointer network as the decoding layer, avoiding the complexity and inefficiency of enumerating candidate spans. Moreover, the structure of the pointer network shares a degree of feature similarity with the boundary-information network, which helps it learn the boundary features of Chinese entities, giving it clear advantages in handling Chinese samples and nested entities.

2. Related Works

2.1. Pre-Trained Language Model

Early language models were primarily based on statistical methods and neural networks, such as recurrent neural networks (RNNs) [31] and long short-term memory (LSTM) networks [32]. However, these models had limitations in handling long-distance dependencies and semantic understanding [33]. In 2017, Google introduced the Transformer architecture, which addressed the long-distance dependency problem through a self-attention mechanism, laying the foundation for the development of pre-trained language models (PLMs) [34].
In general, there are two main types of PLMs: autoencoding language models, represented by BERT [7], and autoregressive language models, represented by GPT-1 [8]. BERT uses the Transformer's encoder structure, is pre-trained with the masked language model (MLM) and next-sentence prediction (NSP) tasks, and performs exceptionally well on a range of natural language processing (NLP) tasks. GPT, on the other hand, uses the Transformer's decoder structure and is pre-trained as a unidirectional language model that predicts the next word. What both categories share is their extensive parameter scale: trained on large corpora, these models acquire intricate language structure and semantics, and on numerous NLP tasks fine-tuning alone is sufficient to match and even surpass traditional baseline methods.

2.2. Fine-Tuning and Prompt Learning

A large language model (LLM) undergoes self-supervised learning on substantial text corpora to acquire extensive linguistic knowledge. Through fine-tuning, these models can achieve optimal performance on associated downstream tasks. However, the fine-tuning paradigm currently faces two major challenges [35]:
  • Fine-tuning requires a large amount of manually annotated data to achieve good model performance on large corpora. However, in practice, many fields have limited sample data and low-quality manually annotated data. Consequently, the efficacy of the model is diminished in a few-shot setting;
  • The fine-tuned model is usually only suitable for specific tasks and datasets and cannot be directly used for new tasks. The capabilities of the pre-trained language model cannot be fully utilized.
Researchers have found that through prompt learning, large models can perform better on the same downstream tasks [21]. Currently, there are two main methods of prompt learning:
  • Hard prompt. The prompt template consists of a discrete sequence of text [22,36];
  • Soft prompt. The prompt template is represented by a continuous sequence of learnable features [37,38].
In most cases, a soft prompt performs better than a hard prompt because of its ability to learn and adapt dynamically, as the sketch below illustrates.
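As a small illustration (the names and shapes here are hypothetical, not taken from any cited work), a hard prompt is a fixed token sequence concatenated with the input, while a soft prompt is a bank of learnable vectors prepended at the embedding level:

```python
import torch
import torch.nn as nn

# Hard prompt: a fixed, human-readable instruction concatenated with the input.
hard_prompt = "找出句子中的组织机构："  # "Find the organizations in the sentence:"

class SoftPrompt(nn.Module):
    """Soft prompt: learnable vectors with no fixed surface form, trained
    jointly with the downstream task while the PLM itself can stay frozen."""

    def __init__(self, prompt_len: int, hidden_size: int):
        super().__init__()
        self.embeddings = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the learned prompt vectors to the token embeddings.
        batch = input_embeds.size(0)
        prompt = self.embeddings.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```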

2.3. Chinese Named-Entity Recognition

Chinese named-entity recognition is an important part of the NER field and has significant research value. Chinese NER suffers from fuzzy boundary information, a lack of labeled data, and semantic complexity; researchers have mainly addressed these issues through lexical-information enhancement and the fusion of Chinese character features. Zhang et al. [39] proposed a dynamic embedding method that uses an attention mechanism to combine character and word vector features in the embedding layer. Zhang et al. [40] proposed the Lattice LSTM model, which incorporated lexical information into a character-based model for the first time: compared with character-based methods it explicitly utilizes word and word-sequence information, and compared with word-based approaches it is not affected by segmentation errors. Wang et al. [41] proposed a hierarchical label-enhanced contrastive learning method for Chinese NER, which reduces word-segmentation errors and adds word-boundary information to character sequences.

3. MPBCNER Model

In this section, we first describe the overall framework of the model and then introduce the implementation details of each module, including the construction of the prompt templates, the encoding and decoding layers, the auxiliary training with the position prompt slots, and the entity-boundary feature extraction implemented with a graph attention network. The overall structure of the model is shown in Figure 1.

3.1. Prompt Template

The prompt template affects the quality of the model's feature output. Hard prompts, designed entirely by humans in natural language, are highly interpretable but may not fully align with the PLM's internal representations. Soft prompts, on the other hand, optimize automatically by introducing learnable parameters that adapt to different tasks, but their rationale cannot be interpreted. We combine the characteristics of both soft and hard prompts to design a new NER prompt template.
By default, each prompt consists of three parts: a start-position slot [S], an end-position slot [E], and learnable contextual tokens <L>. Specifically, our model fills in one prompt for each entity type and then concatenates all the prompts with the original text as input. For the sentence X = “北京银行在广西设立分行 (Bank of Beijing opens branch in Guangxi)”, we set two entity types, “地点 (Location)” and “组织 (Organization)”. The input sequence T with prompts is represented as:
T = {[S_0] <L> [E_0], [S_1] <L> [E_1]} [CLS] 北京银行在广西设立分行
Generally, for a sequence composed of n tokens with M entity types, the input sequence T with a prompt can be represented as:
T = {[S_i] <L> [E_i]}_{i=1,2,…,M} [CLS] <x_1, x_2, …, x_n>
where “[S_i] <L> [E_i]” represents the i-th prompt and <L> is the soft prompt consisting of learnable contextual tokens. Specifically, we add a set of trainable embedding matrices {P_1, P_2, …, P_N} to the PLM encoder as prompt parameters, where N is the number of transformer layers. In the model-training phase, the PLM parameters are frozen and the prompt parameters are the only trainable parameters. Meanwhile, we use label-word embeddings as hard prompts to initialize the soft prompts, as sketched below.
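The following sketch shows, under our own assumptions about shapes and naming (this is not the authors' released code), how the per-layer prompt parameters {P_1, …, P_N} could be declared and initialized from label-word embeddings while the PLM stays frozen:

```python
import torch
import torch.nn as nn

class LayerwisePrompts(nn.Module):
    """Per-layer soft-prompt matrices {P_1, ..., P_N}; shapes and names
    are our assumptions, shown only to illustrate the training setup."""

    def __init__(self, n_layers: int, n_types: int, slot_len: int, hidden: int):
        super().__init__()
        # One trainable matrix per transformer layer; only these are updated.
        self.prompts = nn.ParameterList(
            [nn.Parameter(torch.randn(n_types * slot_len, hidden) * 0.02)
             for _ in range(n_layers)]
        )

    def init_from_label_words(self, label_embeds: torch.Tensor) -> None:
        # Hard-prompt initialization: copy the embeddings of the label words
        # (e.g., "地点", "组织") into every layer's soft prompt.
        with torch.no_grad():
            for p in self.prompts:
                p.copy_(label_embeds)

# Freezing the PLM so the prompt parameters are the only trainable ones:
# for param in plm.parameters():
#     param.requires_grad = False
```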

3.2. Encoder Layer

For a given input sentence T, formed by concatenating the sentence X with the M prompts, we use the PLM to encode it and obtain the hidden representation H_T:
H_T = PLM(T)
Then, by indexing the corresponding positions of H_T, we obtain the encoding of the sentence X and the encodings of the two prompt slots, denoted H_X, H_S, and H_E, where H_X ∈ ℝ^{n×h} and H_S, H_E ∈ ℝ^{M×h}.
In the subsequent experiments, we use different PLMs for encoding to explore their impact on the model; a minimal sketch of the encoding step follows.
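A sketch of this step, assuming a HuggingFace BERT checkpoint as the PLM; in the real model, T would contain the M prompts, so the slot indices below are hypothetical placeholders for illustration:

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
plm = BertModel.from_pretrained("bert-base-chinese")

enc = tokenizer("北京银行在广西设立分行", return_tensors="pt")
H_T = plm(**enc).last_hidden_state              # (1, seq_len, hidden)

slot_start_idx = torch.tensor([0])              # hypothetical [S_i] positions in T
slot_end_idx = torch.tensor([1])                # hypothetical [E_i] positions in T
H_S = H_T[:, slot_start_idx, :]                 # start-slot encodings
H_E = H_T[:, slot_end_idx, :]                   # end-slot encodings
H_X = H_T[:, 1:-1, :]                           # sentence tokens, minus [CLS]/[SEP]
```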

3.3. GAT-Based Boundary Information Fusion Layer

The graph attention network (GAT) is a neural network model that has been specifically designed to process graph-structured data. It introduces an attention mechanism that enables the model to assign different weights to the neighboring nodes of each node during graph data processing. This mechanism facilitates the enhanced capture of the complex relationships between nodes and the local structures within the graph data, thereby improving the model’s ability to understand and represent the intricate patterns present in the data.
While PLMs are capable of effectively extracting contextual features, their ability to capture boundary information within the input text is limited. To address this, we employ a GAT to extract features based on the internal dependency relationships of the input text. By mining the word- and character-level dependency relationships within the text, we extract the boundary features of entities, which enhances the accuracy of entity recognition.
In this study, a sequence of Chinese text is treated as a graph-structured network, with each character in the sequence considered a node in the graph. In particular, we use DDParser [42] to obtain the dependencies of the input sequence and then construct a directed graph from them, where each directed edge points from a dependent word to its head word and is labeled with the specific dependency type. For the Chinese text sequence “北京银行在广西设立分行 (Bank of Beijing opens branch in Guangxi)”, the syntactic dependency graph is shown in Figure 2.
In this sentence, “北京” is a qualifying modifier of “银行”, “北京银行” and “设立” are in a subject–predicate relationship, “在” and “广西” are in a verb–object relationship, and so on. Since our model is character-based, each character in a word shares that word's dependency relationships. Meanwhile, to retain each node's own feature information, every word is assumed to be related to itself, so each node in the graph carries a self-loop. According to the dependency relationships between nodes, the elements at the corresponding positions in the dependency matrix are set to 1 and the remaining elements to 0; following these rules, the dependency matrix of the input sequence is constructed.
In a directed graph, nodes that have a dependency relationship with entities reflect the existence of the entities to a certain extent. Meanwhile, directed edges containing dependency information can assist in distinguishing entity boundaries and identifying the categories of entities.
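A small sketch of the dependency-matrix construction under the rules above; the segmentation and head indices below are a hypothetical DDParser-style output shown only for illustration:

```python
import numpy as np

def build_dependency_matrix(words, heads):
    """Build a character-level dependency (adjacency) matrix.

    words: segmented words, e.g. ["北京银行", "在", "广西", "设立", "分行"]
    heads: 1-based index of each word's head word (0 = root).
    Every character inherits its word's dependencies, and each node
    receives a self-loop to retain its own features.
    """
    # Map each word to the character offsets it covers.
    spans, pos = [], 0
    for w in words:
        spans.append(range(pos, pos + len(w)))
        pos += len(w)

    A = np.zeros((pos, pos), dtype=np.int64)
    np.fill_diagonal(A, 1)                   # self-loops
    for dep_idx, head in enumerate(heads):
        if head == 0:                        # the root word has no head
            continue
        for i in spans[dep_idx]:
            for j in spans[head - 1]:
                A[i, j] = 1                  # edge from dependent char to head char
    return A

# Hypothetical parse of "北京银行在广西设立分行" (head indices for illustration only):
A = build_dependency_matrix(["北京银行", "在", "广西", "设立", "分行"], [4, 4, 2, 0, 4])
```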
To extract features related to boundary information between entities more clearly, we built a GAT to capture dependencies between words. First, a linear transformation is applied to the input features of each node to map the feature dimension from F to F′:
h_i′ = W h_i, h_i′ ∈ ℝ^{F′}
where W ∈ ℝ^{F′×F} is a shared weight matrix and h_i is the original feature vector of node i.
Then, the attention coefficient e i j indicates the importance of node j to node i :
e_ij = att(W h_i, W h_j) = LeakyReLU(aᵀ [W h_i ‖ W h_j])
where att(·, ·) is the attention function, implemented by a single-layer feedforward neural network, a is a learnable weight vector, and ‖ denotes concatenation.
To compare the importance of different neighboring nodes, the softmax function is used to normalize all attention coefficients:
α_ij = softmax(e_ij) = exp(e_ij) / Σ_{k∈N_i} exp(e_ik)
where N_i is the set of neighboring nodes of node i.
To enhance the model’s representational capacity, GAT introduces a multi-head attention mechanism. A GAT operation with K independent attention heads can be expressed as:
h_i′ = σ( (1/K) Σ_{k=1}^{K} Σ_{j∈N_i} α_ij^k W^k h_j )
where σ is the activation function.
The GAT layer takes the node dependency matrix and the character vector representation that integrates lexical information as input and applies the attention mechanism at each node to obtain its feature representation. The text dependency features extracted by the GAT layer can effectively distinguish entity boundaries, significantly improving the accuracy of named-entity recognition. A minimal single-head sketch follows.
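A minimal single-head GAT layer consistent with the equations above might look as follows; this is a sketch under our assumptions, not the authors' implementation, and the multi-head average would wrap K such heads:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head GAT: e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), masked by the
    dependency matrix, softmax-normalized, then weighted aggregation."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Parameter(torch.empty(2 * out_dim))
        nn.init.uniform_(self.a, -0.1, 0.1)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (n, F) node features; adj: (n, n) 0/1 dependency matrix with self-loops
        Wh = self.W(h)                                       # (n, F')
        n = Wh.size(0)
        pairs = torch.cat(
            [Wh.unsqueeze(1).expand(n, n, -1),
             Wh.unsqueeze(0).expand(n, n, -1)], dim=-1)      # (n, n, 2F')
        e = F.leaky_relu(pairs @ self.a)                     # attention coefficients
        e = e.masked_fill(adj == 0, float("-inf"))           # attend to graph neighbors only
        alpha = torch.softmax(e, dim=-1)                     # normalized α_ij
        return F.elu(alpha @ Wh)                             # aggregated node features

# Usage with the dependency matrix A built earlier (hypothetical sizes):
# out = GATLayer(768, 128)(torch.randn(11, 768), torch.tensor(A))
```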

3.4. Feature Fusion Layer

Although the GAT is effective at capturing dependencies within entities, there is still room for improvement in clearly defining entity boundaries. To address this shortcoming, we treat entity boundary detection as a binary classification task and train it jointly with the NER task. This provides clearer and more accurate entity-boundary information to the NER model, enhancing its overall performance.
During the training phase, we use two separate MLP networks to predict the start and end positions of entities:
H_S′ = MLP_Start(H_S)
H_E′ = MLP_End(H_E)
where H_S and H_E are the hidden feature representations of the two prompt slot positions.
Then, we add these hidden boundary features to the output of the GAT layer:
H = W_S H_S′ + W_E H_E′ + W_G H_GAT
where H is the final input to the pointer network, and W_S, W_E, and W_G are learnable weight matrices.
At the feature fusion layer, fusing boundary features with entity-location features yields a hidden vector representation after multi-feature fusion, which further improves the model's recognition of Chinese entities. A minimal sketch of this step follows.
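A sketch of the fusion step; the MLP layer sizes, the use of nn.Linear for W_S/W_E/W_G, and the assumption that all three feature tensors are already aligned to the same (seq_len, hidden) shape are ours:

```python
import torch
import torch.nn as nn

class BoundaryFusion(nn.Module):
    """Two MLP heads produce boundary features from the prompt-slot
    encodings; the results are fused with the GAT output by learnable
    linear maps, following H = W_S H_S' + W_E H_E' + W_G H_GAT."""

    def __init__(self, hidden: int):
        super().__init__()
        self.mlp_start = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
        self.mlp_end = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
        self.W_S = nn.Linear(hidden, hidden, bias=False)
        self.W_E = nn.Linear(hidden, hidden, bias=False)
        self.W_G = nn.Linear(hidden, hidden, bias=False)

    def forward(self, H_S, H_E, H_GAT):
        H_S_prime = self.mlp_start(H_S)   # H_S' = MLP_Start(H_S)
        H_E_prime = self.mlp_end(H_E)     # H_E' = MLP_End(H_E)
        return self.W_S(H_S_prime) + self.W_E(H_E_prime) + self.W_G(H_GAT)
```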

3.5. Decode Layer

We chose a pointer network as the model’s decoding layer. On one hand, the pointer network primarily focuses on the start and end positions of entities, which is consistent with the two auxiliary tasks of our method, making it well-suited for our work. On the other hand, the pointer network addresses the issue of entity nesting, which sequence labeling struggles to resolve, making it more practical for Chinese NER tasks.
The core of the pointer network is to select entity boundaries in the input sequence through the attention mechanism. For a feature sequence H = {h_1, h_2, …, h_n}, the attention weights of the pointer network are computed as:
α_ij = exp(score(h_i, h_j)) / Σ_{k=1}^{n} exp(score(h_i, h_k))
score(h_i, h_j) = h_iᵀ W_s h_j
The pointer network uses binary cross-entropy loss to predict whether each position is the start or end of an entity:
p_i^Start = sigmoid(h_i^Start),  p_j^End = sigmoid(h_j^End)
L_Start = −Σ_{i=1}^{n} [ y_i^Start log p_i^Start + (1 − y_i^Start) log(1 − p_i^Start) ]
L_End = −Σ_{i=1}^{n} [ y_i^End log p_i^End + (1 − y_i^End) log(1 − p_i^End) ]
The overall loss function L combines L_Start and L_End:
L = λ L_Start + (1 − λ) L_End
where λ is a weight parameter, which we set to 0.5.
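Putting the decoding layer together, the following sketch shows the start/end pointer heads and the combined loss; the single-linear-layer heads are our assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerDecoder(nn.Module):
    """Per-position start/end probabilities with binary cross-entropy,
    combined as L = λ·L_Start + (1 − λ)·L_End (λ = 0.5 in the paper)."""

    def __init__(self, hidden: int, lam: float = 0.5):
        super().__init__()
        self.start_head = nn.Linear(hidden, 1)
        self.end_head = nn.Linear(hidden, 1)
        self.lam = lam

    def forward(self, H, y_start=None, y_end=None):
        # H: (n, hidden) fused features; y_*: float 0/1 targets of shape (n,)
        p_start = torch.sigmoid(self.start_head(H)).squeeze(-1)
        p_end = torch.sigmoid(self.end_head(H)).squeeze(-1)
        if y_start is None:
            return p_start, p_end                 # inference: boundary probabilities
        loss_s = F.binary_cross_entropy(p_start, y_start)
        loss_e = F.binary_cross_entropy(p_end, y_end)
        return self.lam * loss_s + (1 - self.lam) * loss_e
```

Because each position is scored independently for "start" and "end", overlapping start/end pairs can coexist, which is what lets the pointer formulation handle nested entities.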

4. Experiment

4.1. Datasets

This study conducted experiments on a manually collected dataset in the field of Chinese government auditing and three public datasets, OntoNotes V5.0 [43], Weibo NER [44], and MSRA [45].

4.1.1. The Government Auditing Dataset

Due to the particularity of the government-auditing domain, there is a lack of relevant publicly available datasets. We manually collected a small amount of sample data from publicly available information on various government websites and, after data cleaning, obtained 1000 samples involving four entity types: time, unit, project, and amount. In subsequent experiments we refer to this dataset as GA.

4.1.2. OntoNotes V5.0 Dataset

This is a Chinese dataset consisting of texts from the news domain.

4.1.3. Weibo NER Dataset

This dataset contains annotated NER messages drawn from the social media platform Sina Weibo.

4.1.4. MSRA Dataset

This dataset was released by Microsoft Research Asia and contains data with multiple types of nested entities.

4.2. Experiment Environment

The experimental code is developed based on the PyTorch 2.3.0 framework, and the cloud GPU server is used as the experimental runtime environment. The experimental environment specifics are shown in Table 1.
The experimental parameter settings are shown in Table 2.
In configuring the pre-trained language model, we set the maximum input length of the text sequence to 512 tokens and used 12 transformer layers for semantic encoding and feature extraction. The learning rate was set to 1 × 10−5 to balance learning speed and convergence. Additionally, to improve training efficiency and the model's generalization capacity, we set the number of epochs to 10 and the batch size to 8.

4.3. Evaluation Indicators

This paper uses the F1 score as the evaluation metric. Precision (P) is the ratio of the number of correctly identified named entities to the total number of identified named entities. Recall (R) is the ratio of the number of correctly identified named entities to the total number of gold entities. Accuracy is the ratio of correctly predicted samples to the total number of samples, and F1 is the harmonic mean of precision and recall. The calculation formulas are:
P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2TP / (2TP + FP + FN)
Here, TP denotes the number of samples predicted as positive that are actually positive, FP the number predicted as positive but actually negative, FN the number predicted as negative but actually positive, and TN the number predicted as negative that are actually negative. Accuracy and the F1 score together provide a comprehensive evaluation of the NER model.
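For concreteness, the entity-level metrics can be computed as follows; as a worked example, 80 correctly recognized entities, 20 spurious ones, and 10 missed ones give P = 0.80, R ≈ 0.889, F1 ≈ 0.842:

```python
def ner_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Entity-level precision, recall, and F1 from TP/FP/FN counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return p, r, f1

print(ner_f1(80, 20, 10))  # (0.8, 0.888..., 0.842...)
```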

4.4. Baselines

In this work, we compared against the following baselines to verify the model's performance on the small-sample NER task: a sequence-labeling method (BERT-CRF [16]), a span-based method (SpERT [46]), a machine-reading-comprehension method (BERT-MRC [47]), a sequence-generation method (BARTNER [48]), and prompt-learning methods (TemplateNER [22], PromptNER [24]).
These baselines employ different pre-trained language models as encoders. Consequently, we also report the performance of MPBCNER with various PLMs, including BERT, RoBERTa, and ERNIE. BERT is the classic pre-trained language model. RoBERTa keeps BERT's architecture with key modifications and optimizations: it is trained on larger datasets (e.g., a combination of BookCorpus and OpenWebText), which provides richer linguistic context and improves generalization, and it tunes hyperparameters such as the learning rate and batch size to optimize learning efficiency. ERNIE is a knowledge-enhanced pre-trained language model that models not only word representations but also phrase, entity, and sentence representations; this multi-granularity knowledge modeling helps capture the rich structure of the language and improves semantic comprehension.

5. Results and Discussion

5.1. Results in Standard Chinese Dataset

We first conducted experiments on the standard Chinese datasets, where OntoNotes and MSRA are flat datasets, while Weibo and GA contain a large number of nested entities. The experimental results are shown in Table 3.
From the results in Table 3, the MPBCNER model outperformed the baselines on both the standard Chinese flat and nested datasets, leading by 1–2% on average on the flat datasets and by 3–5% on the nested datasets. The results indicate the efficacy of incorporating entity boundary information, which enables the model to handle nested entity data with greater precision.
In particular, different PLMs affected the results to some extent. RoBERTa uses more than ten times the training data and a longer training schedule than BERT, which improves robustness and performance. ERNIE additionally introduces a knowledge-enhanced pre-training task and strengthens the model's understanding of words and entities by integrating external knowledge such as knowledge graphs, giving it a clear advantage in Chinese NLP tasks.

5.2. Results in Few-Shot Setting

To assess the performance of the model in low-resource scenarios, we randomly sampled a fixed number of instances from each dataset as the K-shot training set, with K = 10, 20, 50, 100, 200, 500; the few-shot setting follows [10]. The experimental results on the OntoNotes, MSRA, Weibo, and GA few-shot datasets are shown in Table 4, Table 5, Table 6 and Table 7, respectively.
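A sketch of the K-shot sampling procedure; simple uniform sampling with a fixed seed is our assumption, while the overall few-shot protocol follows [10]:

```python
import random

def sample_k_shot(train_set, k: int, seed: int = 42):
    """Draw K training instances uniformly at random from a dataset."""
    return random.Random(seed).sample(train_set, k)

# e.g., a 50-shot training subset drawn from the OntoNotes training split:
# train_subset = sample_k_shot(ontonotes_train, k=50)
```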
From Table 4, Table 5, Table 6 and Table 7, we can see that the prompt-based models generally outperformed the others in the few-shot setting. The prompt template designed in this paper better stimulates the prior knowledge of the PLM and achieves better results than the other models. After fusing the boundary-information features, the MPBCNER model achieved the best results under limited resources.
From the above experimental results, we can draw three conclusions:
  • Comparing the sequence-labeling and span-based approaches, the prompt-based models had a significant advantage: the prompt template provides the necessary context for the large-scale pre-trained language model, helping it understand the downstream task requirements and therefore perform better in a few-shot setting;
  • The machine-reading-comprehension and sequence-generation approaches outperformed the sequence-labeling method in the few-shot setting but fell short of the prompt-based methods, because they do not mine the prior knowledge of the pre-trained model deeply enough;
  • Among the prompt-based approaches, prompting was clearly advantageous in the few-shot setting, but without feature learning on Chinese entity boundaries these methods were less effective on Chinese datasets, especially nested ones. MPBCNER, which incorporates boundary features and entity-location features, handles the Chinese small-sample NER task better.

5.3. Ablation Study

In this section, we conducted ablation experiments on the GA dataset to analyze the effect of the different modules of MPBCNER. We selected BERT as the default PLM. The experimental results are shown in Table 8.
The results in Table 8 show that the F1 score of the model decreased after removing each submodule. Removing the prompt module affected performance the most, which indicates that prompt learning can effectively improve performance in a few-shot setting. Meanwhile, for Chinese NER, incorporating effective entity boundary features made the model's entity recognition more accurate.
To explore the effect of different prompt templates on model performance, we set up comparison experiments. We denote the variant that uses only the prompt slots as MPBCNER (Slot), the variant that uses only the hard prompt template as MPBCNER (Hard), and the variant that uses only the soft prompt template as MPBCNER (Soft).
The results in Table 9 show that different prompt templates affected the model differently. The soft prompt and the prompt slots produced the largest improvements, while the hard prompt produced the smallest. MPBCNER, which combines all three prompt features, performed best.

6. Conclusions

In this paper, we designed a dual-position-slot template combining soft and hard prompts, using entity label words as hard prompts to initialize the prompt slots; the position prompt slots, jointly trained with the NER task, learn the start- and end-position features of the labeled entities. We also used graph attention networks to extract entity boundary features. Compared with the baseline approaches, after fusing positional and boundary features our model achieved better performance on Chinese flat and nested NER and in low-resource scenarios.
The model still has some shortcomings. First, using a GAT and a pointer network for feature extraction raises efficiency concerns; sparse matrices could be used to improve the model's operational efficiency. Second, each entity type requires its own pointer network for decoding; a global pointer network could be adopted to avoid this problem.

Author Contributions

Conceptualization, C.Z.; methodology, C.Z.; software, C.Z.; validation, C.Z. and B.H.; formal analysis, C.Z. and Y.L.; investigation, C.Z. and Y.L.; resources, B.H.; data curation, C.Z.; writing—original draft preparation, C.Z.; writing—review and editing, B.H.; visualization, C.Z.; supervision, B.H.; project administration, C.Z.; funding acquisition, B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Open Project Program of Guangxi Key Laboratory of Digital Infrastructure, project number GXDINBC202406.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data underlying this article will be shared upon reasonable request to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cheng, J.; Liu, J.; Xu, X.; Xia, D.; Liu, L.; Sheng, V.S. A Review of Chinese Named Entity Recognition. KSII Trans. Internet Inf. Syst. (TIIS) 2021, 15, 2012–2030. [Google Scholar]
  2. Nadeau, D.; Sekine, S. A Survey of Named Entity Recognition and Classification. Lingvisticae Investig. 2007, 30, 3–26. [Google Scholar] [CrossRef]
  3. Li, J.; Sun, A.; Han, J.; Li, C. A Survey on Deep Learning for Named Entity Recognition. IEEE Trans. Knowl. Data Eng. 2020, 34, 50–70. [Google Scholar] [CrossRef]
  4. Lafferty, J.; Mccallum, A.; Pereira, F. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the 18th International Conference on Machine Learning, Williamstown, MA, USA, 28 June–1 July 2001. [Google Scholar]
  5. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar]
  6. Ma, X.; Hovy, E. End-to-End Sequence Labeling via Bi-Directional LSTM-CNNs-CRF. arXiv 2016, arXiv:1603.01354. [Google Scholar]
  7. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  8. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training; OpenAI: San Francisco, CA, USA, 2018. [Google Scholar]
  9. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models Are Unsupervised Multitask Learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
  10. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A. Language Models Are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
  11. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  12. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  13. Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; Liu, Q. ERNIE: Enhanced Language Representation with Informative Entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019. [Google Scholar]
  14. Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Tian, H.; Wu, H.; Wang, H. Ernie 2.0: A Continual Pre-Training Framework for Language Understanding. Proc. AAAI Conf. Artif. Intell. 2020, 34, 8968–8975. [Google Scholar] [CrossRef]
  15. Sun, Y.; Wang, S.; Feng, S.; Ding, S.; Pang, C.; Shang, J.; Liu, J.; Chen, X.; Zhao, Y.; Lu, Y.; et al. ERNIE 3.0: Large-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation. arXiv 2021, arXiv:2107.02137. [Google Scholar]
  16. Souza, F.; Nogueira, R.; Lotufo, R. Portuguese Named Entity Recognition Using BERT-CRF. arXiv 2019, arXiv:1909.10649. [Google Scholar]
  17. Hu, Z.; Hou, W.; Liu, X. Deep Learning for Named Entity Recognition: A Survey. Neural Comput. Appl. 2024, 36, 8995–9022. [Google Scholar] [CrossRef]
  18. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
  19. Das, S.S.S.; Katiyar, A.; Passonneau, R.J.; Zhang, R. CONTaiNER: Few-Shot Named Entity Recognition via Contrastive Learning. arXiv 2022, arXiv:2109.07589. [Google Scholar]
  20. Al-Moslmi, T.; Ocaña, M.G.; Opdahl, A.L.; Veres, C. Named Entity Extraction for Knowledge Graphs: A Literature Overview. IEEE Access 2020, 8, 32862–32881. [Google Scholar] [CrossRef]
  21. Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-Train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv. 2023, 55, 195. [Google Scholar] [CrossRef]
  22. Cui, L.; Wu, Y.; Liu, J.; Yang, S.; Zhang, Y. Template-Based Named Entity Recognition Using BART; ACL Anthology: Boca Raton, FL, USA, 2021. [Google Scholar]
  23. Chen, X.; Li, L.; Deng, S.; Tan, C.; Xu, C.; Huang, F.; Si, L.; Chen, H.; Zhang, N. LightNER: A Lightweight Tuning Paradigm for Low-Resource NER via Pluggable Prompting. arXiv 2022, arXiv:2109.00720. [Google Scholar]
  24. Shen, Y.; Tan, Z.; Wu, S.; Zhang, W.; Zhang, R.; Xi, Y.; Lu, W.; Zhuang, Y. PromptNER: Prompt Locating and Typing for Named Entity Recognition. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Toronto, ON, Canada, 2023; Volume 1: Long Papers, pp. 12492–12507. [Google Scholar]
  25. Liu, P.; Guo, Y.; Wang, F.; Li, G. Chinese Named Entity Recognition: The State of the Art. Neurocomputing 2022, 473, 37–53. [Google Scholar] [CrossRef]
  26. Zhang, S.; Li, G.; Shi, X.; Liu, L.; Chen, M. Chinese Named Entity Recognition with Integrated Boundary Features and Location Information. In Proceedings of the 2024 IEEE 7th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 20–22 September 2024; Volume 7, pp. 565–570. [Google Scholar]
  27. Chen, Y.; Wu, L.; Zheng, Q.; Huang, R.; Liu, J.; Deng, L.; Yu, J.; Qing, Y.; Dong, B.; Chen, P. A Boundary Regression Model for Nested Named Entity Recognition. Cogn. Comput. 2023, 15, 534–551. [Google Scholar] [CrossRef]
  28. Li, Z.; Song, M.; Zhu, Y.; Zhang, L. Chinese Nested Named Entity Recognition Based on Boundary Prompt. In Proceedings of the Web Information Systems and Applications, Chengdu, China, 15–17 September 2023; Yuan, L., Yang, S., Li, R., Kanoulas, E., Zhao, X., Eds.; Springer Nature: Singapore, 2023; pp. 331–343. [Google Scholar]
  29. Wang, Y.; Tong, H.; Zhu, Z.; Hou, F.; Li, Y. Enhancing Biomedical Named Entity Recognition with Parallel Boundary Detection and Category Classification. BMC Bioinform. 2025, 26, 63. [Google Scholar] [CrossRef] [PubMed]
  30. Chen, C.; Kong, F. Enhancing Entity Boundary Detection for Better Chinese Named Entity Recognition. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual Event, 1–6 August 2021; Volume 2: Short Papers, pp. 20–25. [Google Scholar]
  31. Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. Res. 2011, 12, 2493–2537. [Google Scholar]
  32. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  33. Van Houdt, G.; Mosquera, C.; Nápoles, G. A Review on the Long Short-Term Memory Model. Artif. Intell. Rev. 2020, 53, 5929–5955. [Google Scholar] [CrossRef]
  34. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  35. Han, X.; Zhang, Z.; Ding, N.; Gu, Y.; Liu, X.; Huo, Y.; Qiu, J.; Yao, Y.; Zhang, A.; Zhang, L.; et al. Pre-Trained Models: Past, Present and Future. AI Open 2021, 2, 225–250. [Google Scholar] [CrossRef]
  36. Ma, R.; Zhou, X.; Gui, T.; Tan, Y.; Zhang, Q.; Huang, X. Template-Free Prompt Tuning for Few-Shot NER. arXiv 2021, arXiv:2109.13532. [Google Scholar]
  37. Gu, Y.; Han, X.; Liu, Z.; Huang, M. PPT: Pre-Trained Prompt Tuning for Few-Shot Learning. arXiv 2022, arXiv:2109.04332. [Google Scholar]
  38. Han, X.; Zhao, W.; Ding, N.; Liu, Z.; Sun, M. PTR: Prompt Tuning with Rules for Text Classification. arXiv 2021, arXiv:2105.11259. [Google Scholar] [CrossRef]
  39. Zhang, N.; Li, F.; Xu, G.; Zhang, W.; Yu, H. Chinese NER Using Dynamic Meta-Embeddings. IEEE Access 2019, 7, 64450–64459. [Google Scholar] [CrossRef]
  40. Zhang, Y.; Yang, J. Chinese NER Using Lattice LSTM. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018. [Google Scholar]
  41. Wang, C.; Zhao, S.; Yan, T.; Song, S.; Ma, W.; Liu, K.; Wang, M. Hierarchical Label-Enhanced Contrastive Learning for Chinese NER. In IEEE Transactions on Neural Networks and Learning Systems; IEEE: Piscataway, NJ, USA, 2025. [Google Scholar]
  42. Zhang, S.; Wang, L.; Sun, K.; Xiao, X. A Practical Chinese Dependency Parser Based on A Large-Scale Dataset. arXiv 2020, arXiv:2009.00901. [Google Scholar]
  43. Pradhan, S.; Moschitti, A.; Xue, N.; Ng, H.T.; Björkelund, A.; Uryupina, O.; Zhang, Y.; Zhong, Z. Towards Robust Linguistic Analysis Using Ontonotes. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, Sofia, Bulgaria, 8–9 August 2013; pp. 143–152. [Google Scholar]
  44. Peng, N.; Dredze, M. Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 548–554. [Google Scholar]
  45. Levow, G.-A. The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, Sydney, Australia, 22–23 July 2006; Ng, H.T., Kwong, O.O.Y., Eds.; Association for Computational Linguistics: Sydney, Australia, 2006; pp. 108–117. [Google Scholar]
  46. Eberts, M.; Ulges, A. Span-Based Joint Entity and Relation Extraction with Transformer Pre-Training. In Proceedings of the ECAI 2020, Santiago de Compostela, Spain, 29 August–8 September 2020; IOS Press: Amsterdam, The Netherlands, 2020; pp. 2006–2013. [Google Scholar]
  47. Li, X.; Feng, J.; Meng, Y.; Han, Q.; Wu, F.; Li, J. A Unified MRC Framework for Named Entity Recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 5849–5859. [Google Scholar]
  48. Yan, H.; Gui, T.; Dai, J.; Guo, Q.; Zhang, Z.; Qiu, X. A Unified Generative Framework for Various NER Subtasks. arXiv 2021, arXiv:2106.01223. [Google Scholar]
Figure 1. An overall framework of the MPBCNER model. The MPBCNER integrates multi-label prompts and boundary information features.
Figure 2. Example of a syntactic dependency diagram.
Table 1. Experiment environment.

Name | Parameters
Operating System | CentOS 7.7.1908 (Core)
CPU | Intel(R) Xeon(R) Gold 6271C
GPU | Tesla V100
Memory | 128 G
Table 2. Experiment parameters.

Parameter | Value
batch_size | 8
epoch | 10
max_seq_length | 512
learning_rate | 1 × 10−5
dropout | 0.1
Table 3. Results on the standard Chinese flat and nested datasets (F1%).

Model | OntoNotes | MSRA | Weibo | GA
BERT-CRF | 87.62 | 89.56 | 70.13 | 78.23
SpERT | 88.47 | 92.26 | 79.03 | 83.14
BERT-MRC | 84.61 | 90.11 | 75.27 | 82.42
BARTNER | 88.24 | 91.16 | 79.75 | 83.86
TemplateNER | 84.65 | 90.32 | 76.32 | 79.65
PromptNER | 88.71 | 92.86 | 78.26 | 82.16
MPBCNER (BERT) | 89.12 | 92.14 | 81.35 | 85.27
MPBCNER (RoBERTa) | 90.23 | 92.75 | 82.16 | 85.92
MPBCNER (ERNIE) | 92.16 | 93.01 | 83.74 | 88.16
Table 4. Results on the OntoNotes few-shot dataset (F1%).

Model | K = 10 | K = 20 | K = 50 | K = 100 | K = 200 | K = 500
BERT-CRF | 29.1 | 47.9 | 53.1 | 56.4 | 61.1 | 74.1
SpERT | 33.1 | 41.7 | 50.0 | 57.9 | 65.2 | 75.4
BERT-MRC | 42.9 | 50.3 | 57.4 | 61.6 | 66.1 | 78.8
BARTNER | 41.0 | 49.1 | 55.2 | 60.1 | 66.9 | 78.2
TemplateNER | 42.4 | 51.4 | 59.1 | 64.1 | 69.0 | 79.3
PromptNER | 49.1 | 58.2 | 64.5 | 68.4 | 71.3 | 78.5
MPBCNER (BERT) | 49.3 | 58.8 | 64.9 | 68.7 | 72.1 | 79.0
MPBCNER (RoBERTa) | 50.2 | 60.1 | 65.7 | 69.2 | 73.3 | 79.1
MPBCNER (ERNIE) | 51.1 | 62.4 | 68.1 | 71.2 | 75.2 | 80.3
Table 5. Results on the MSRA few-shot dataset (F1%).

Model | K = 10 | K = 20 | K = 50 | K = 100 | K = 200 | K = 500
BERT-CRF | 30.5 | 48.4 | 53.8 | 57.2 | 62.4 | 74.9
SpERT | 34.6 | 42.5 | 50.6 | 58.5 | 66.1 | 75.8
BERT-MRC | 43.1 | 51.0 | 58.2 | 62.3 | 67.3 | 79.1
BARTNER | 41.6 | 50.2 | 55.9 | 60.2 | 67.0 | 78.8
TemplateNER | 43.1 | 52.8 | 60.8 | 65.4 | 70.3 | 80.1
PromptNER | 52.7 | 61.4 | 65.3 | 68.5 | 71.1 | 81.6
MPBCNER (BERT) | 54.3 | 65.1 | 71.3 | 77.2 | 80.4 | 82.9
MPBCNER (RoBERTa) | 55.8 | 66.2 | 72.0 | 78.6 | 80.9 | 83.5
MPBCNER (ERNIE) | 58.3 | 69.1 | 74.7 | 80.0 | 82.3 | 84.9
Table 6. Results on the Weibo few-shot dataset (F1%).

Model | K = 10 | K = 20 | K = 50 | K = 100 | K = 200 | K = 500
BERT-CRF | 22.6 | 31.5 | 40.8 | 51.3 | 58.6 | 63.1
SpERT | 25.4 | 38.2 | 44.5 | 52.1 | 60.3 | 65.2
BERT-MRC | 30.1 | 39.3 | 45.1 | 53.2 | 61.9 | 66.3
BARTNER | 32.2 | 40.2 | 47.5 | 55.1 | 62.3 | 66.8
TemplateNER | 38.1 | 44.2 | 51.9 | 60.7 | 65.4 | 70.5
PromptNER | 40.2 | 46.3 | 55.7 | 62.9 | 68.5 | 73.7
MPBCNER (BERT) | 43.5 | 50.2 | 59.8 | 65.1 | 70.2 | 75.8
MPBCNER (RoBERTa) | 43.8 | 50.8 | 61.0 | 65.5 | 70.6 | 76.3
MPBCNER (ERNIE) | 45.1 | 52.3 | 62.5 | 66.1 | 72.3 | 76.9
Table 7. Results on the GA few-shot dataset (F1%).

Model | K = 10 | K = 20 | K = 50 | K = 100 | K = 200 | K = 500
BERT-CRF | 20.1 | 29.5 | 37.8 | 45.6 | 54.8 | 60.1
SpERT | 24.3 | 36.5 | 42.9 | 50.4 | 58.5 | 63.4
BERT-MRC | 30.5 | 38.9 | 44.1 | 51.2 | 60.3 | 67.9
BARTNER | 33.1 | 41.6 | 46.3 | 53.0 | 61.8 | 68.2
TemplateNER | 40.2 | 44.3 | 52.6 | 60.8 | 66.1 | 72.3
PromptNER | 42.8 | 47.5 | 58.3 | 65.1 | 70.2 | 74.8
MPBCNER (BERT) | 45.1 | 52.6 | 61.5 | 67.3 | 72.1 | 78.6
MPBCNER (RoBERTa) | 45.9 | 53.5 | 62.1 | 67.9 | 73.0 | 79.3
MPBCNER (ERNIE) | 46.3 | 55.4 | 64.8 | 69.7 | 75.1 | 81.5
Table 8. Ablation experiment (F1%).

Model | K = 10 | K = 20 | K = 50 | K = 100 | K = 200 | K = 500
MPBCNER (BERT) | 45.1 | 52.6 | 61.5 | 67.3 | 72.1 | 78.6
− GAT layer | 43.2 | 48.9 | 59.4 | 65.7 | 70.1 | 75.4
− Prompt | 37.5 | 42.5 | 52.6 | 59.7 | 61.8 | 68.6
− All | 30.1 | 36.6 | 42.7 | 50.2 | 58.2 | 65.1
Table 9. Experimental results for different prompts (F1%).

Model | K = 10 | K = 20 | K = 50 | K = 100 | K = 200 | K = 500
MPBCNER (BERT) | 45.1 | 52.6 | 61.5 | 67.3 | 72.1 | 78.6
MPBCNER (Slot) | 44.8 | 51.5 | 60.3 | 65.8 | 71.2 | 77.5
MPBCNER (Hard) | 40.3 | 47.3 | 55.1 | 61.9 | 66.5 | 71.0
MPBCNER (Soft) | 44.2 | 50.9 | 59.7 | 65.1 | 70.6 | 76.3
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
