Applied Sciences
  • Article
  • Open Access

30 May 2025

Named Entity Recognition Based on Multi-Class Label Prompt Selection and Core Entity Replacement

School of Information and Electrical Engineering, Hebei University of Engineering, Handan 056038, China
* Author to whom correspondence should be addressed.

Abstract

At present, researchers are showing a marked interest in the topic of few-shot named entity recognition (NER). Previous studies have demonstrated that prompt-based learning methods can effectively improve the performance of few-shot NER models and can reduce the need for annotated data. However, the contextual information of the relationship between core entities and a given prompt may not have been considered in these studies; moreover, research in this field continues to suffer from the negative impact of a limited amount of annotated data. A multi-class label prompt selection and core entity replacement-based named entity recognition (MPSCER-NER) model is proposed in this study. A multi-class label prompt selection strategy is presented, which can assist in the task of sentence–word representation. A long-distance dependency is formed between the sentence and the multi-class label prompt. A core entity replacement strategy is presented, which can enrich the word vectors of training data. In addition, a weighted random algorithm is used to retrieve the core entities that are to be replaced from the multi-class label prompt. The experimental results show that, when implemented on the CoNLL-2003, OntoNotes 5.0, OntoNotes 4.0, and BC5CDR datasets under 5-Way k-Shot (k = 5, 10), the MPSCER-NER model achieves minimum F1-score improvements of 1.32%, 2.14%, 1.05%, 1.32%, 0.84%, 1.46%, 1.43%, and 1.11% across these dataset and shot settings over the strongest of the baselines NNshot, StructShot, MatchingCNN, ProtoBERT, DNER, and SRNER.

1. Introduction

Human–computer dialogue systems have been a hot topic in the research domain of natural language comprehension [1]. Dialogue state tracking, as one of the key tasks of human–computer dialogue systems, helps the system to accurately understand a user’s words and execute their commands. To accomplish this goal, access to structured information provided by NER is required for dialogue state tracking [2].
NER tasks identify and classify entities in text into predefined categories, such as the names of people, places, and organizations. Structured information is formed to provide data support for downstream tasks [3]. As the field of natural language processing has evolved, named entities in downstream tasks have been refined. The naming rules for these entities vary widely, resulting in significant labor costs for the annotation of data. Therefore, the development of methods that can be used to accurately identify named entities in scenarios with limited data has become an urgent requirement in this field.
In recent years, pretrained large language models (LLMs) have gradually emerged. LLMs have also been used to implement NER tasks in few-shot scenarios [4]. Prompt learning is introduced into the LLM to recognize named entities and reduce the need for a large amount of labeled data. Good prompts can prevent the need for many annotated examples, and can thus contribute to improved performance [5]. Additionally, data augmentation methods can also increase sample diversity, enhancing performance in the context of few-shot NER. These methods play a crucial role in the further development of named entity recognition. The main contributions of this study are as follows:
  • A model is designed to address the NER task in few-shot scenarios. A multi-class label prompt selection strategy is designed to select an annotated instance with a clear sentence structure for demonstration. The entity context information between the sentence and the multi-class label prompts is enhanced to improve the accuracy of core entity recognition. The optimization effect of multi-class label prompt demonstrations on word vector representations for entities in target sentences is empirically validated. Experiments with low-density core entity demonstrations empirically show that prompts with clearer sentence structures can effectively enhance the accuracy of core entity recognition.
  • A core entity replacement strategy is designed to increase the diversity of input word vectors during training. A weighted random algorithm is employed to retrieve the core entities that are to be replaced in the prompt. The core entities selected in the multi-class label prompt are updated during each training epoch, and the vector of each token in the training data is updated accordingly. The core entity replacement method dynamically updates word vector labels in demonstration prompts. A novel approach to enrich input data in few-shot learning scenarios is thus proposed.
  • Experiments on the CoNLL-2003, OntoNotes 5.0, OntoNotes 4.0, and BC5CDR datasets demonstrated the superiority of our model in few-shot NER.
The rest of the study is organized as follows: Section 2 reviews the literature related to the research conducted in the present study. Section 3 details the structure of the MPSCER-NER (multi-class label prompt selecting with core entity replacement for named entity recognition) model, and the working principle of the model is described in detail. Section 4 presents the results and analysis of the modeling experiments. Section 5 summarizes the conclusions of the study and presents future research directions.

3. The MPSCER-NER Model

The goal of the present study was to address the problems that arise when the context window of a demonstration-based few-shot NER model is limited to specific entities, and to resolve the problems that arise when the NER model learns word vectors carrying little information. A new model, MPSCER-NER, for multi-class label prompt selection with core entity replacement is proposed. Multi-class label prompts are retrieved as candidates from the few-shot instance dataset. Among the candidates, we identify instances with low core entity density to serve as the demonstration. The sentence to be identified, together with the multi-class label prompt demonstration, is input into BERT to obtain better token representations. A weighted random algorithm is used to select a core entity from the demonstration, and the entity is replaced with its corresponding label. The training data word vectors change after the demonstration update; the new word vectors help BERT to recognize similar entities in different situations. BiLSTM handles the token representations of the multi-class label prompt demonstration. CRF processes the information from BiLSTM to resolve mismatches between labels and finally outputs the entity labels. The MPSCER-NER framework is shown in Figure 1.
Figure 1. The MPSCER-NER framework.

3.1. Multi-Class Label Prompt Selecting

The task of NER can be described as follows: Given a sentence S consisting of n words, where each word is denoted as x_i, the individual entities in the text are identified and annotated according to a pre-specified set of label types Y = {PER, LOC, ORG, O, …}; each word is mapped to a label in this set according to a computed probability. A core entity is an entity whose label is not O.
Prior knowledge in the model can be effectively leveraged to enhance NER performance through prompt learning. Sentences from the instance dataset are selected as demonstration prompts to enable the model to accurately understand the entity information of the downstream task. In a demonstration sentence, the core entity is the key information of the sentence that effectively represents the features of its corresponding label. A contextual link between the entities is formed using the sentence that is to be recognized. Sentences that have more than one type of entity in the instance set are called multi-class label prompts. Using a multi-class label prompt as a demonstration can help models in learning information about various types of entities. The MPSCER-NER model learns the distribution of the input text and the overall format of the sequence from the multi-class label prompts. The process of multi-class label prompt selection is shown in Figure 2.
Figure 2. Multi-class label prompt selection.
In Figure 2, PER, LOC, and ORG denote the names of persons, places, and organizations, respectively. MISC represents the other core entity types. In the sentence ‘Sarah visited the NVIDIA in California’, ‘Sarah’ is PER, ‘NVIDIA’ is ORG, and ‘California’ is LOC, so the sentence contains three core entity types. Similarly, in the sentence ‘Obama returned to Washington’, ‘Obama’ is PER and ‘Washington’ is LOC. In the sentence ‘Sophia took her dog Bella for a walk’, ‘Sophia’ is PER and ‘Bella’ is MISC. The three sentences therefore contain 3, 2, and 2 distinct label types, respectively. The sentence ‘Sarah visited the NVIDIA in California’, which has the most distinct kinds of core entity, is selected as a candidate.
Among the candidates, the core entity density of each sentence is calculated. In a sentence with low core entity density, the core entities are highlighted and the sentence structure is clear, providing clear contextual information about entity boundaries and helping the LLM to identify core entities in both the annotated and the unlabeled corpus. The process of low-density core entity prompt selection is shown in Figure 3.
Figure 3. Low-density core entity prompt selection.
In Figure 3, there are five core entities in the sentence: PER (Sarah, John), ORG (Apple Inc., Google), and LOC (London). There are 12 non-core words, so the calculation yields a core entity density of 0.29. The core entity density is computed sequentially for the different candidate sentences in order to identify the sentence with the lowest core entity density.
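The selection logic described above can be summarized in a short sketch, assuming each candidate sentence is represented as a list of (token, label) pairs with ‘O’ marking non-core tokens; the helper names are illustrative, not the authors' released code.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (token, label) pairs; "O" marks non-core tokens


def label_type_count(sentence: Tagged) -> int:
    """Number of distinct core entity label types in a candidate sentence."""
    return len({label for _, label in sentence if label != "O"})


def core_entity_density(sentence: Tagged) -> float:
    """Ratio of core entity tokens to all tokens (0.29 in the Figure 3 example)."""
    core = sum(1 for _, label in sentence if label != "O")
    return core / len(sentence) if sentence else 0.0


def select_multi_class_label_prompt(instances: List[Tagged]) -> Tagged:
    """Keep the sentences with the most label types, then take the lowest-density one."""
    max_types = max(label_type_count(s) for s in instances)
    candidates = [s for s in instances if label_type_count(s) == max_types]
    return min(candidates, key=core_entity_density)
```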
Let {p̂_1, p_2, …, p_m} represent the multi-class label prompt; here, p_j represents the jth word in the prompt, p̂_j marks a core entity word, and y_j represents the label corresponding to the jth word in the prompt. Let {x_1, x_2, …, x_n} represent the input corpus, with x_i the ith word in the corpus. The input corpus is combined with the multi-class label prompt and merged into the format {x_1, x_2, …, x_n [SEP] p̂_1, p_2, …, p_m}. A demonstration of the multi-class label prompt process is shown in Figure 4.
Figure 4. Multi-class label prompt demonstration.
As shown in Figure 4, in the NER method without demonstration, the core entity is only associated with the other words in the sentence that is to be recognized; only the linkage of entities within that sentence is constructed. In the NER method with the multi-class label prompt demonstration, the word vector of ‘Peter’ is additionally related to the word vector of ‘Sarah’ in the multi-class label prompt. The word vector of ‘Peter’ is influenced by the demonstration entity and moves closer to its corresponding label.
Multi-class label prompts are selected for the demonstrations of the sentences that are to be recognized. The entity contextual links between the sentences are constructed to obtain better token representation. The entity word vectors in the sentence to be recognized are close to their corresponding labels.
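One way to realize this concatenation with a standard BERT tokenizer is sketched below; the Hugging Face transformers API is an assumption on our part, since the paper only states that BERT-base-cased is used as the encoder.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

sentence = "Peter got drunk alone at the bar"
prompt = "Sarah visited the NVIDIA in California"

# Passing the prompt as the second segment produces
# [CLS] sentence [SEP] prompt [SEP], so self-attention can link entities in the
# sentence to core entities in the multi-class label prompt demonstration.
encoding = tokenizer(sentence, prompt, return_tensors="pt")
print(tokenizer.decode(encoding["input_ids"][0]))
```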

3.2. Core Entity Replacement

To enable the language model to learn information about entities of the same category, entity labels are explicitly added to the demonstration. A core entity in the demonstration sentence is dynamically selected, and the entity is replaced with its corresponding label. A dynamic demonstration implies different entity word vectors, which exposes the model to word vectors in different scenarios and enhances its ability to recognize entities of the same type. An example of the core entity replacement process is shown in Figure 5.
Figure 5. An example of the core entity replacement process.
In Figure 5, the training sentence is ‘Peter got drunk alone at the bar’. The multi-class label prompt is ‘Sarah visited the NVIDIA in California’. Here, ‘NVIDIA’ is replaced in the prompt with its corresponding label, ‘ORG’. At the next training stage, ‘California’—selected by the weighted random algorithm—will be replaced with its corresponding label, ‘LOC’. Thus, the multi-class label prompt has been updated, and the word vectors in the training data have changed. The details of the core entity replacement method are illustrated in Figure 6.
Figure 6. The details of the core entity replacement method.
As shown in Figure 6, the word embeddings of the training data are updated with changes in the multi-class label prompts. The model is exposed to more word vectors, which helps to improve the ability of the MPSCER-NER model to recognize entities.
There are multiple kinds of core entities in the multi-class label prompt. For the MPSCER-NER model to adequately learn named entity vector information in different situations, the weighted random algorithm is used to ensure that each core entity can be replaced, achieving balanced learning of all entity labels. To prevent completely random label selection, each core entity is assigned a weight that adjusts its selection probability during training so that the most suitable word is selected. The new weight of the selected word, W_update, is calculated as shown in the following equation:
W_{update} = W_{origin} \cdot a \cdot r
where W_origin is the weight of the selected core entity before updating, a is a penalty factor specified in advance, and r is a random variable. After the weight has been updated, balanced replacement across the entity labels is ensured.
After identifying the selected core entity, the entity is replaced by its corresponding label. The original prompt {p̂_1, p_2, …, p_m} is transformed into {y_1, p_2, …, p_m}, and the prompt for the demonstration is obtained. The training data can therefore be formatted as {[CLS] X_input [SEP] y_1, p_2, …, p_m [SEP]}.
For example, the sentence selected using the multi-class label prompt selection method is ‘Yesterday, Sarah from Apple Inc. met John in London and attended the conference at Google headquarters’. At each round of training, the updated prompts with the training data are input into the model. The candidates for replacement from the core entities are shown in Figure 7.
Figure 7. Candidates for replacement from the core entities.
In Figure 7, X_input represents the input training data, Candidate represents the core entities to be replaced in the demonstration, Labels represents the core entity labels in the multi-class label prompt, and Prompt represents the multi-class label prompt. In each round of training, one core entity is replaced with its corresponding label. For example, ‘Sarah’, an entity labeled ‘PER’, is replaced to obtain a new sentence: ‘Yesterday, PER from Apple Inc. met John in London and attended the conference at Google headquarters’. The updated multi-class label prompt with the training corpus is input into the model.
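A minimal sketch of the selection-and-replacement step is given below. The reading of the weight update as the product W_origin · a · r follows the equation above, and the helper names and data structures are illustrative assumptions rather than the authors' implementation.

```python
import random


def pick_core_entity(weights: dict) -> str:
    """Weighted random choice among the candidate core entities."""
    entities = list(weights)
    return random.choices(entities, weights=[weights[e] for e in entities], k=1)[0]


def update_weight(weights: dict, chosen: str, a: float = 0.5) -> None:
    """Penalize the chosen entity (W_update = W_origin * a * r) so that other
    entities become more likely to be selected in later epochs."""
    weights[chosen] *= a * random.random()


def replace_core_entity(prompt_tokens: list, entity_labels: dict, chosen: str) -> list:
    """Swap the chosen core entity for its label in the demonstration prompt."""
    return [entity_labels[tok] if tok == chosen else tok for tok in prompt_tokens]


# Example with the Figure 5 prompt: "NVIDIA" may be replaced by its label "ORG".
prompt = "Sarah visited the NVIDIA in California".split()
labels = {"Sarah": "PER", "NVIDIA": "ORG", "California": "LOC"}
weights = {entity: 1.0 for entity in labels}
chosen = pick_core_entity(weights)
update_weight(weights, chosen)
print(replace_core_entity(prompt, labels, chosen))
```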
In the MPSCER-NER model, each word is labeled with a specific entity category. The classification loss function is used to measure the difference between the predicted label and the actual label. The classification loss is then minimized so that the model output moves closer to the probability distribution of the true labels, making the model more accurate in predicting the named entity category of each word. The classification loss is given by the following equation:
Loss = -\frac{1}{N} \sum_{n=0}^{N-1} \sum_{c=0}^{C-1} y_{n,c} \log p_{n,c}
where N denotes the number of samples; y_{n,c} indicates whether the word at position n truly belongs to category c, taking the value 1 if its true category equals c and 0 otherwise; and p_{n,c} denotes the predicted probability that the word at position n belongs to category c.
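In PyTorch, this token-level objective corresponds to the standard cross-entropy loss; the tensor shapes in the sketch below are invented for illustration.

```python
import torch
import torch.nn.functional as F

num_tokens, num_classes = 8, 5                   # e.g. 8 tokens and the labels PER, LOC, ORG, MISC, O
logits = torch.randn(num_tokens, num_classes)    # per-token scores from the classifier
targets = torch.randint(0, num_classes, (num_tokens,))  # gold class index per token

# F.cross_entropy applies log-softmax internally and averages over tokens,
# matching Loss = -(1/N) * sum_n sum_c y_{n,c} * log p_{n,c} with one-hot y.
loss = F.cross_entropy(logits, targets)
```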
The MPSCER-NER model selects core entities in the multi-class label prompts as candidates during training. The weighted random algorithm is used to determine the next word to be replaced; the word vectors of the input corpus change with each demonstration update, which enhances the MPSCER-NER model's ability to conduct entity segmentation.

3.3. BiLSTM-CRF Classification

After acquiring word vectors with the contextual information of multi-class label prompts, BiLSTM-CRF is used to process semantic information and output label sequences. Multi-class label prompts increase the length of input sentences. The sequence context information can be effectively captured by BiLSTM. The information processed using BiLSTM is passed on to the CRF. An example of how BiLSTM-CRF processes word embeddings in MPSCER-NER is shown in Figure 8.
Figure 8. An example of how BiLSTM-CRF processes word embeddings in MPSCER-NER.
In Figure 8, the acquired word vectors affected by the multi-class label prompts are processed by BiLSTM. Finally, the word vectors with entity context information are output to CRF.
The generated word vectors still suffer from the problem of mismatched labels: the classifier can only discern the most likely label for a word vector, not whether the resulting label sequence makes logical sense. Dependencies between labels can be established by the CRF, and the output is adjusted globally based on these dependencies. This makes the NER model more compatible with the actual labeling constraints and also improves the robustness of the model with respect to named entity boundaries.
The observation sequences and state sequences of the CRF interact with each other through feature functions defined on the input sequence, which can capture different features depending on the needs of the modeling task. Therefore, the CRF is chosen to resolve mismatches between adjacent labels. The label probability inferred by the CRF is given by the following equations:
P(y \mid p) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, p, i) + \sum_{i,l} u_l s_l(y_i, p, i) \right)
Z(x) = \sum_{y} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, p, i) + \sum_{i,l} u_l s_l(y_i, p, i) \right)
where Z(x) is the normalization factor; λ_k and u_l are the weight values corresponding to the feature functions; p is the input sequence and y is the corresponding label sequence; t_k is the kth transition feature function; s_l is the lth state feature function; and y_i is the predicted label of the ith word.
The function t_k(·) depends on the labels at the current and previous positions, while s_l(·) depends only on the label at the current position. The transition feature function t_k(·) and the state feature function s_l(·) are calculated as shown in the following equations:
t_k(y_{i-1}, y_i, x, i) = \begin{cases} 1, & y_{i-1} = B,\ y_i = I \\ 0, & \text{otherwise} \end{cases}
s_l(y_i, x, i) = \begin{cases} 1, & y_i = B \\ 0, & \text{otherwise} \end{cases}
where y i 1 is the word label of the previous position, and y i is the word label of the current position. The letter B denotes the beginning of the label (Begin), and I denotes the position inside the label (Inside). We obtain 1 for B and I, and we obtain 0 in all the other cases.
After BiLSTM-CRF processing, the contextual information in the sentence is clearer and the illogical label pairings are reduced. The labels of the identified entities are closer to their real labels.
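For illustration, a compact BiLSTM-CRF head over the prompt-aware BERT embeddings could be written as follows; the pytorch-crf package (torchcrf) is an assumed choice, as the paper does not name the CRF implementation it uses.

```python
import torch.nn as nn
from torchcrf import CRF  # pytorch-crf package; an assumed choice, not named in the paper


class BiLSTMCRFHead(nn.Module):
    """BiLSTM-CRF classification head over prompt-aware BERT token embeddings."""

    def __init__(self, num_tags: int, bert_dim: int = 768, hidden_dim: int = 256, dropout: float = 0.5):
        super().__init__()
        self.lstm = nn.LSTM(bert_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.emissions = nn.Linear(2 * hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, bert_output, tags=None, mask=None):
        feats, _ = self.lstm(bert_output)                # capture sequence context
        scores = self.emissions(self.dropout(feats))     # per-token tag scores
        if tags is not None:
            # training: negative log-likelihood of the gold tag sequence
            return -self.crf(scores, tags, mask=mask, reduction="mean")
        # inference: globally consistent best label sequence
        return self.crf.decode(scores, mask=mask)
```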
In summary, the MPSCER-NER model training processes are described as follows:
The MPSCER-NER model selects multi-class label prompts as demonstrations. The distribution of the text and the overall input format of the sequence are learned through prompt learning. The weighted random algorithm is used to replace core entities, helping the MPSCER-NER model to identify entities in different situations. Finally, the word vectors are input into BiLSTM-CRF to deal with mismatches between adjacent labels in the output sequence. The training algorithm of the MPSCER-NER model is shown in Algorithm 1.
Algorithm 1 MPSCER-NER model training
Require: Training dataset D; prompt dataset P; batch size B_s; epochs E_p; learning rate L_r; dropout D_r; θ, the initial MPSCER-NER model parameters.
Ensure: The trained MPSCER-NER model (θ̃).
 1: Θ ← ∅
 2: for p_t in P do
 3:     if CoreEntityLabel(p_t) > Label_num then
 4:         Clear Θ
 5:         Θ ← {p_t}
 6:         Label_num ← CoreEntityLabel(p_t)
 7:     end if
 8:     if CoreEntityLabel(p_t) = Label_num then
 9:         Θ ← Θ ∪ {p_t}
10:     end if
11: end for
12: X̃ ← None
13: δ ← 0
14: for c in Θ do
15:     if CoreEntityDense(c) < δ then
16:         X̃ ← c
17:     end if
18: end for
19: Ω ← CoreEntityRetrive(X̃)
20: for n in E_p do
21:     Ω ← CoreEntityRetrive(X̃)
22:     X̃ ← CoreEntityReplace(X̃, y_i)
23:     Loss(θ) = -(1/N) Σ_{n=0}^{N-1} Σ_{c=0}^{C-1} y_{n,c} log p(θ)
24:     θ̃ ← θ − η ∇Loss(θ)
25: end for
26: return the MPSCER-NER model (θ̃)
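The algorithm can also be rendered as an ordinary PyTorch training loop. The sketch below assumes the demonstration prompt has already been selected (Section 3.1) and that entity_labels maps each core entity token to its label; the model(sentences, demo, tags) interface is hypothetical, standing in for the BERT + BiLSTM-CRF pipeline.

```python
import random
import torch


def train_mpscer(model, train_loader, prompt_tokens, entity_labels, epochs=50, lr=2e-5, a=0.5):
    """Sketch of Algorithm 1: in every epoch one core entity of the demonstration
    prompt is replaced by its label, so the encoded training inputs keep changing."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    weights = {entity: 1.0 for entity in entity_labels}             # replacement weights
    for _ in range(epochs):
        chosen = random.choices(list(weights), weights=list(weights.values()), k=1)[0]
        weights[chosen] *= a * random.random()                      # W_update = W_origin * a * r
        demo = [entity_labels[chosen] if tok == chosen else tok for tok in prompt_tokens]
        for sentences, tags in train_loader:
            loss = model(sentences, demo, tags)                     # hypothetical model interface
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```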

4. Experimental Results and Analysis

4.1. Datasets and Experimental Settings

In this study, BERT-base-cased is used as the encoder, AdamW is used as an optimizer, the PyTorch 1.7 library is used for implementing the code, and an Nvidia GeForce RTX2080Ti GPU is used for training.
The CoNLL-2003 [23] dataset is a news corpus containing English and German texts; it is used to train and evaluate NER systems and is widely used to evaluate the ability of natural language processing models to recognize named entities, such as the names of people, places, and organizations. The OntoNotes 5.0 [24] dataset contains textual data from a wide range of sources, such as news, conversations, web, and radio; it covers a large number of linguistic tasks such as named entity recognition, semantic role annotation, and denotational disambiguation. OntoNotes 4.0 [25] is a dataset covering Chinese–English named entity annotation. BC5CDR [26] is a dataset for named entity recognition tasks in the biomedical field. These datasets are often used to validate the performances of various NER models. The dataset statistics are shown in Table 1.
Table 1. Dataset statistics.
Note the following hyper-parameters: ‘Learning rate’ denotes the learning rate, which is set to 2 × 10⁻⁵; ‘Batch size’ denotes the batch size, which is set to 64; ‘Epoch’ denotes the number of training rounds, which is set to 50; and ‘Dropout’ denotes the dropout rate, which is set to 0.5. Training stops if the F1-score does not improve after Maxnoincre epochs. The experimental hyper-parameter settings are shown in Table 2.
Table 2. Experimental hyper-parameter settings.
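A minimal setup mirroring these reported settings might look as follows; only the values listed above are taken from the paper, and the Maxnoincre early-stopping logic is omitted.

```python
import torch
from transformers import BertModel

config = {
    "encoder": "bert-base-cased",   # encoder named in Section 4.1
    "learning_rate": 2e-5,          # values from Table 2
    "batch_size": 64,
    "epochs": 50,
    "dropout": 0.5,
}

encoder = BertModel.from_pretrained(config["encoder"])
optimizer = torch.optim.AdamW(encoder.parameters(), lr=config["learning_rate"])
```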

4.2. Evaluation Indicators

In this experiment, Precision, Recall, and F1 are used to evaluate the performances of the MPSCER-NER model and the comparison models.
Precision is used to measure the accuracy of the MPSCER-NER model in cases where the prediction is a positive case. Its value ranges from 0 to 1; higher values indicate that the model is more accurate in predicting positive cases. Precision is calculated as shown in the following equation:
Precision = \frac{TP}{TP + FP}
where TP denotes the number of texts where the actual label is true and the entity category result is also true, and FP denotes the number of entities whose actual label is false but which are identified as true.
Recall is used to measure the proportion of positive examples that can be captured by the MPSCER-NER model. Higher values indicate higher coverage of positive examples by the model. Recall is calculated as shown in the following equation:
Recall = \frac{TP}{TP + FN}
where FN denotes the number of entities whose actual label is true but whose entity identification result is false.
The F1-score is the harmonic mean of Precision and Recall, combining the performance of both. The F1-score is calculated as shown in the following equation:
F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}
Overall, Precision measures how accurate the model's positive predictions are, Recall measures the model's coverage of the true positive examples, and F1 is a composite metric that strikes a balance between Precision and Recall.
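For reference, the three metrics can be computed directly from the raw counts; the counts in the example call are invented for illustration.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 42 correctly recognized entities, 8 spurious predictions, 10 missed entities
print(precision_recall_f1(42, 8, 10))   # (0.84, 0.807..., 0.823...)
```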

4.3. Effectiveness on K-Shot

The number of shots is a critical parameter for further improving model performance. Within certain limits, increasing the number of shots improves the stability of convergence but may limit the generalization ability of the model. To verify the effect of the number of shots on the MPSCER-NER model, the effect of k-shot (k = 25, 50, 100) on the Precision, Recall, and F1 of the MPSCER-NER model on the CoNLL-2003 dataset is shown in Figure 9.
Figure 9. The effect of k-shot on the Precision, Recall, and F1 of the MPSCER-NER model.
As can be seen from the figure, the Precision, Recall, and F1 of the model gradually increase as the number of shots grows, and are maximized when the number of shots is 100. With more samples, the model learns more entity feature information and is exposed to different contexts and different forms of domain knowledge, which enhances its ability to recognize entity classes.

4.4. Confusion Matrices

As shown in Table 3, Table 4 and Table 5, the use of the multi-class label prompt selection method resulted in an increase in the TP + TN values for certain categories. We speculate that this outcome is related to the types of core entities presented in the multi-class label demonstration. The multi-class label prompt optimizes the vector representations of specific categories of entities in the sentences being recognized. In the case of solely using the core entity replacement method, the number of TP + TN values for multiple categories increased, which is due to the fact that each prompt update enriched the input word vectors and enhanced the model’s ability to identify entities. In the MPSCER-NER model, the recognition accuracy of TP + TN has effectively improved; however, the number of correctly identified MISC entities has not seen significant enhancement. We speculate that this is due to the presence of multiple different fine-grained labels within the MISC category.
Table 3. Confusion matrix on CoNLL-2003 (multi-class label prompt selection).
Table 4. Confusion matrix on CoNLL-2003 (core entity replacement).
Table 5. Confusion matrix on CoNLL-2003 (MPSCER-NER).

4.5. Ablation Studies

The MPSCER-NER model is mainly divided into modules for multi-class label prompt selection and core entity replacement. Different combinations of the stages are used in the CoNLL-2003 dataset. The MPSCER-NER model is experimentally analyzed using the ablation method. The ablation experiments verified the validity of the multi-class label prompt selection and core entity replacement methods. The approach for selecting the ablation experiment modules is shown in Table 6.
Table 6. MPSCER-NER model ablation experiment module selection.
As can be seen from Table 6, Module 1 denotes the strategy comprising multi-class label prompt demonstration only. Module 2 denotes the strategy comprising multi-class label prompt demonstration with low-density core entity selection. Module 3 denotes the strategy comprising core entity replacement with a weighted random algorithm. Module 4 denotes the strategy comprising multi-class label prompt demonstration with core entity replacing. Module 5 is the model presented in this study, MPSCER-NER.
From the results achieved using Module 1 and Module 2 in Table 7, it can be seen that the multi-class label prompt with low core entity density improves performance in the 25-shot, 50-shot, and 100-shot settings compared to the normal multi-class label prompts. Multi-class label prompts with low-density core entities have a clearer sentence structure, so the entity context information between the sentence to be recognized and the demonstration sentence is clearer. The results of the core entity replacement module (Module 3) show that the Recall of entity recognition is effectively improved: as the demonstration prompt changes, the word vectors in the training sentences also change, so the model is exposed to more word vectors and becomes more capable of recognizing entities. In the 25-shot, 50-shot, and 100-shot settings, the MPSCER-NER model performs best in terms of Precision, Recall, and F1-score, while core entity replacement applied to normal multi-class label prompts shows a sub-optimal performance. These results show that the multi-class label prompt selection and core entity replacement methods can effectively improve the model's ability to recognize entities and that their effects mutually reinforce each other.
Table 7. Results of ablation experiments with the MPSCER-NER model.

4.6. Baselines

To validate the performance of the MPSCER-NER model proposed in this study, comparative experiments were conducted on CoNLL-2003, Ontonotes 5.0, and Ontonotes 4.0 with benchmark models, including the following:
  • NNshot and StructShot [27] are simple few-shot NER methods based on nearest neighbor learning and structured inference. A supervised NER model trained on the source domain is used for feature extraction; a nearest neighbor classifier then learns in the feature space, and structured decoding captures the dependencies between entity labels.
  • MatchingCNN [28] is a network that maps a small labeled support set and an unlabeled example to its label. It calculates the similarity between query instances and support instances, adapting to the recognition of new class types.
  • ProtoBERT [29] uses a token-level prototypical network that represents each class by averaging token representations with the same label; then, the label of each token in the query set is decided by its nearest class prototype.
  • DemonstrationNER [11] is a prompt learning NER method based on demonstration. The sentences marked in the dataset are selected as prompts to be input into the BERT model. The authors presented a demonstration of the relationship between the entities and the labels after the example sentences were constructed. This process helps the model to learn the contextual information from the task demonstration, contextualizing the task before the input, and enabling the model to recognize more entities through a good demonstration.
  • SR-Demonstration [30] is an NER method that was proposed for marking the relevance of demonstrations; it removes useless information from demonstration prompts, creates a relevance vocabulary consisting of tokens that appear in the annotated datasets, samples the tokens from the relevance vocabulary to replace the tokens in the demonstration, and calculates the most suitable demonstration sentence length required to achieve a demonstration of NER.
As can be seen from Table 8, the MPSCER-NER model performs best in the 25-shot and 50-shot scenarios on the CoNLL-2003, OntoNotes 5.0, and OntoNotes 4.0 datasets. Compared with DemonstrationNER, the strongest baseline on CoNLL-2003 among NNshot, StructShot, MatchingCNN, ProtoBERT, DemonstrationNER, and SR-Demonstration, the MPSCER-NER model improved the F1-score in the 25-shot and 50-shot scenarios by 1.32% and 2.14%, respectively. Compared to SR-Demonstration, which performed best on OntoNotes 4.0 and OntoNotes 5.0, MPSCER-NER improved the F1-score in the 25-shot and 50-shot scenarios by 1.05%, 1.32%, 0.84%, and 1.46%. Compared to SR-Demonstration, which also performed best on BC5CDR, MPSCER-NER improved the F1-score in the 25-shot and 50-shot scenarios by 1.43% and 1.11%, respectively. The MPSCER-NER model identifies multi-class label prompts in the set of examples and uses them as demonstrations. Entity context links between the multi-class label prompts and the sentences to be recognized can thus be formed, and the word vector representations in the sentences are optimized to move closer to their corresponding labels. During training, the core entities in the multi-class label prompts are replaced with their corresponding labels; the word vectors in the sentences are thereby altered, improving the model's ability to recognize named entities in different situations.
Table 8. Performance of the model with different datasets: F1-score.

5. Conclusions

Intelligent dialogue systems are gradually being integrated into people's lives. Dialogue state tracking is a fundamental task that intelligent dialogue systems must perform, and NER provides entity information that helps dialogue state tracking models to accurately understand users' words. In this study, a multi-class label prompt selection and core entity replacement-based named entity recognition (MPSCER-NER) model was proposed. In the multi-class label prompt selection phase, multi-class label prompts from the instance dataset are selected as candidates; the candidate prompts with low core entity density are used as the demonstration and are input into the model together with the sentences that are to be recognized. Contextual links between entities are established, and the word vector representation is optimized to improve the model's ability to identify positive samples. In the core entity replacement phase, the weighted random algorithm is used to select the core entities that will be replaced; the word vectors of the training data are thereby enriched, and the ability of the model to recognize entities in different scenarios is enhanced. In the entity classification stage, BiLSTM is used to process word vectors carrying information from the multi-class label prompts, and CRF is used to deal with the problem of mismatches between adjacent labels. The experimental results show that, on CoNLL-2003, OntoNotes 5.0, OntoNotes 4.0, and BC5CDR under 5-Way k-Shot (k = 5, 10), the MPSCER-NER model achieves minimum F1-score improvements of 1.32%, 2.14%, 1.05%, 1.32%, 0.84%, 1.46%, 1.43%, and 1.11% across these dataset and shot settings over the strongest of the baselines NNshot, StructShot, MatchingCNN, ProtoBERT, DNER, and SRNER. These results demonstrate the superior NER accuracy of the MPSCER-NER model. The multi-class label prompt selection method can effectively improve the accuracy of positive sample recognition, and the core entity replacement method can effectively improve the recall rate. These results are promising for the recognition of proprietary named entities. However, the methods cannot achieve the expected effect in a zero-shot scenario; methods for addressing this problem must be researched further.

6. Limitations

In the experimental results, we can see that the model’s accuracy is limited; this can be attributed to the following factors: (1) Limitations in annotated sample size—in few-shot settings, 5-way–5-shot and 5-way–10-shot annotated datasets were used to train models (excluding BC5CDR); here, the limited nature of the training data means that it is difficult to cover the full diversity of named entity expressions and to account for the complexity of contextual environments. Additionally, the small number of annotated samples leads to unstable parameter updates, causing the optimization process to fall into local optima. (2) Limitations in model parameter size—the BERT-base model, comprising 12 attention layers and approximately 110 million parameters, was used for named entity recognition; the relatively smaller parameter size of this model led to a limitation in the achievable F1-score. (3) The granularity of the data—among the CoNLL03, OntoNotes 5.0, OntoNotes 4.0, and BC5CDR datasets, the CoNLL03 dataset achieved the highest accuracy, which is primarily because it is a clean and general-purpose dataset; in OntoNotes 5.0 and OntoNotes 4.0, named entities were similarly divided into five categories (i.e., PER, LOC, ORG, MISC, and O); however, within the MISC category, the experimental data exhibit finer granularity and a larger number of subtypes, resulting in lower F1-scores on the OntoNotes datasets compared to on the CoNLL03 dataset; BC5CDR, as a biomedical domain dataset, contains highly specialized terminology, and compared to the general-domain pretraining corpus used by BERT, its specialized nature leads to lower accuracy relative to general-domain datasets.

7. Future Work

In this model, annotated data are required, providing prompts for establishing the connections between the training data and the multi-class label prompts for the entities. As a result, the model cannot independently perform the zero-shot named entity recognition (NER) task. Domain adaptation and zero-shot data generation are identified as key approaches for enabling zero-shot NER.

7.1. Domain Transfer

(1) The annotated source domain dataset is used as the training set, where multi-class label templates are selected as demonstration prompts. In this approach, the BERT model serves solely as a fixed feature extractor: all of its parameters are frozen and are not updated during training. Only the linear classification layer is fine-tuned, so that the model can learn the mapping from the BERT-derived representations to the specific label templates. (2) Subsequently, the constructed label templates can be utilized in a prompt-based or text-matching manner to predict entities in the target domain. This framework ultimately enables the implementation of zero-shot named entity recognition.
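The frozen-encoder setup described in step (1) could be realized as in the sketch below; the label set and the single linear classification layer are illustrative assumptions.

```python
import torch.nn as nn
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-cased")
for param in bert.parameters():
    param.requires_grad = False        # BERT acts as a fixed feature extractor

num_labels = 5                         # e.g. PER, LOC, ORG, MISC, O (assumed label set)
classifier = nn.Linear(bert.config.hidden_size, num_labels)  # only this layer is fine-tuned
```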

7.2. Zero-Shot Data Generation

(1) Annotated data are selected to train the generative model; a text-generation task can be constructed based on these annotations, transforming the original named entity recognition (NER) task into a text-to-text generation task. This allows the model to generate corresponding zero-shot NER data, which can subsequently be used to train an NER model. (2) The generated data are then converted into the standard NER BIO tagging format, and these standardized data are used to train the target-domain NER model.
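Step (2) amounts to converting generated entity annotations into BIO tags. The sketch below assumes a (start, end, label) span format with inclusive token indices, which is our own illustrative convention.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans, inclusive indices, into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"
    return list(zip(tokens, tags))


tokens = "Sarah visited the NVIDIA in California".split()
print(spans_to_bio(tokens, [(0, 0, "PER"), (3, 3, "ORG"), (5, 5, "LOC")]))
```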

Author Contributions

D.W. and Y.C. wrote the main manuscript and M.Y. conducted the experiments. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Research Projects of the Nature Science Foundation of Hebei Province grant number F2020402003.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The datasets used in this study were derived from public resources and made available within the article. We have published our code at https://github.com/CMSLDL/MPSCERmodel (accessed on 25 May 2025).

Acknowledgments

The authors look forward to the insightful comments and suggestions of the anonymous reviewers and editors, which will go a long way towards improving the quality of this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Jiang, M.; Chen, H. Label-Guided Data Augmentation for Chinese Named Entity Recognition. Appl. Sci. 2025, 15, 2521. [Google Scholar] [CrossRef]
  2. Jehangir, B.; Radhakrishnan, S.; Agarwal, R. A survey on Named Entity Recognition—datasets, tools, and methodologies. Nat. Lang. Process. J. 2023, 3, 100017. [Google Scholar] [CrossRef]
  3. Gong, F.; Tong, S.; Du, C.; Wan, Z.; Qiu, S. Named Entity Recognition in the Field of Small Sample Electric Submersible Pump Based on FLAT. Appl. Sci. 2025, 15, 2359. [Google Scholar] [CrossRef]
  4. Hu, Y.; Chen, Q.; Du, J.; Peng, X.; Keloth, V.K.; Zuo, X.; Zhou, Y.; Li, Z.; Jiang, X.; Lu, Z.; et al. Improving large language models for clinical named entity recognition via prompt engineering. J. Am. Med. Inform. Assoc. 2024, 31, 1812–1820. [Google Scholar] [CrossRef]
  5. Chen, Y.; Zheng, Y.; Yang, Z. Prompt-Based Metric Learning for Few-Shot NER. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 7199–7212. [Google Scholar]
  6. Petroni, F.; Rocktäschel, T.; Riedel, S.; Lewis, P.; Bakhtin, A.; Wu, Y.; Miller, A. Language Models as Knowledge Bases? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 2463–2473. [Google Scholar]
  7. Ding, N.; Chen, Y.; Han, X.; Xu, G.; Wang, X.; Xie, P.; Zheng, H.; Liu, Z.; Li, J.; Kim, H.G. Prompt-learning for Fine-grained Entity Typing. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 6888–6901. [Google Scholar]
  8. He, K.; Mao, R.; Huang, Y.; Gong, T.; Li, C.; Cambria, E. Template-free prompting for few-shot named entity recognition via semantic-enhanced contrastive learning. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 18357–18369. [Google Scholar] [CrossRef] [PubMed]
  9. Hu, S.; Ding, N.; Wang, H.; Liu, Z.; Wang, J.; Li, J.; Wu, W.; Sun, M. Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Volume 1, pp. 2225–2240. [Google Scholar]
  10. Gao, T.; Fisch, A.; Chen, D. Making Pre-trained Language Models Better Few-shot Learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual, 5–6 August 2021; Volume 1, pp. 3816–3830. [Google Scholar]
  11. Lee, D.H.; Kadakia, A.; Tan, K.; Agarwal, M.; Feng, X.; Shibuya, T.; Mitani, R.; Sekiya, T.; Pujara, J.; Ren, X. Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Volume 1, pp. 2687–2700. [Google Scholar]
  12. Dong, G.; Wang, Z.; Zhao, J.; Zhao, G.; Guo, D.; Fu, D.; Hui, T.; Zeng, C.; He, K.; Li, X.; et al. A multi-task semantic decomposition framework with task-specific pre-training for few-shot ner. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 430–440. [Google Scholar]
  13. Huang, Y.; He, K.; Wang, Y.; Zhang, X.; Gong, T.; Mao, R.; Li, C. Copner: Contrastive learning with prompt guiding for few-shot named entity recognition. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 2515–2527. [Google Scholar]
  14. Su, L.; Chen, J.; Peng, Y.; Sun, C. Based Learning for Few-Shot Biomedical Named Entity Recognition Under Machine Reading Comprehension. J. Biomed. Inform. 2024, 159, 104739. [Google Scholar] [CrossRef] [PubMed]
  15. Wang, T.; Chen, J.; Ma, L. Chinese Named Entity Recognition by Fusing Dictionary Information and Sentence Semantics. Comput. Mod. 2024, 3, 24–28. [Google Scholar]
  16. Lu, X.; Sun, L.; Ling, C.; Tong, Z.; Liu, J.; Tang, Q. Named entity recognition of Chinese electronic medical records incorporating pinyin and lexical features. J. Chin. Mini-Micro Comput. Syst. 2025. [Google Scholar] [CrossRef]
  17. Mengge, X.; Yu, B.; Zhang, Z.; Liu, T.; Zhang, Y.; Wang, B. Coarse-to-Fine Pre-training for Named Entity Recognition. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Virtual, 16–20 November 2020; pp. 6345–6354. [Google Scholar]
  18. Chen, J.; Liu, Q.; Lin, H.; Han, X.; Sun, L. Few-shot Named Entity Recognition with Self-describing Networks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Volume 1, pp. 5711–5722. [Google Scholar]
  19. Bartolini, I.; Moscato, V.; Postiglione, M.; Sperlì, G.; Vignali, A. Data augmentation via context similarity: An application to biomedical Named Entity Recognition. Inf. Syst. 2023, 119, 102291. [Google Scholar] [CrossRef]
  20. Liu, W.; Cui, X. Improving named entity recognition for social media with data augmentation. Appl. Sci. 2023, 13, 5360. [Google Scholar] [CrossRef]
  21. Zhou, R.; Li, X.; He, R.; Bing, L.; Cambria, E.; Si, L.; Miao, C. MELM: Data Augmentation with Masked Entity Language Modeling for Low-Resource NER. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Volume 1, pp. 2251–2262. [Google Scholar]
  22. Ghosh, S.; Tyagi, U.; Kumar, S.; Manocha, D. Bioaug: Conditional generation based data augmentation for low-resource biomedical ner. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 1853–1858. [Google Scholar]
  23. Chang, J.; Han, X. Character-to-word representation and global contextual representation for named entity recognition. Neural Process. Lett. 2023, 55, 8551–8567. [Google Scholar] [CrossRef]
  24. Fang, J.; Wang, X.; Meng, Z.; Xie, P.; Huang, F.; Jiang, Y. MANNER: A variational memory-augmented model for cross domain few-shot named entity recognition. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; Volume 1, pp. 4261–4276. [Google Scholar]
  25. Sajun, A.R.; Zualkernan, I.; Sankalpa, D. A Historical Survey of Advances in Transformer Architectures. Appl. Sci. 2024, 14, 4316. [Google Scholar] [CrossRef]
  26. Li, J.; Sun, Y.; Johnson, R.J.; Sciaky, D.; Wei, C.H.; Leaman, R.; Davis, A.P.; Mattingly, C.J.; Wiegers, T.C.; Lu, Z. BioCreative V CDR task corpus: A resource for chemical disease relation extraction. Database 2016, 2016, baw068. [Google Scholar] [CrossRef] [PubMed]
  27. Yang, Y.; Katiyar, A. Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Virtual, 16–20 November 2020; pp. 6365–6375. [Google Scholar]
  28. Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching Networks for One Shot Learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3630–3638. [Google Scholar]
  29. Fritzler, A.; Logacheva, V.; Kretov, M. Few-shot classification in named entity recognition task. In Proceedings of the ACM Symposium on Applied Computing, Limassol, Cyprus, 8–12 April 2019; pp. 993–1000. [Google Scholar]
  30. Zhang, H.; Zhang, Y.; Zhang, R.; Yang, D. Robustness of Demonstration-based Learning Under Limited Data Scenario. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 1769–1782. [Google Scholar]
