Article

A Nested Named Entity Recognition Model Robust in Few-Shot Learning Environments Using Label Description Information

Hyunsun Hwang, Youngjun Jung, Changki Lee and Wooyoung Go

1 Department of Computer Science and Engineering, Kangwon National University, Chuncheon 24341, Republic of Korea
2 Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University, Chuncheon 24341, Republic of Korea
3 National Security Research Institute, Daejeon 34044, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8255; https://doi.org/10.3390/app15158255
Submission received: 29 June 2025 / Revised: 18 July 2025 / Accepted: 22 July 2025 / Published: 24 July 2025
(This article belongs to the Special Issue Applications of Natural Language Processing to Data Science)

Abstract

Nested named entity recognition (NER) is a task that identifies hierarchically structured entities, where one entity can contain other entities within its span. This study introduces a nested NER model for few-shot learning environments, addressing the difficulty of building extensive datasets for general named entities. We enhance the Biaffine nested NER model by modifying its output layer to incorporate label semantic information through a novel label description embedding (LDE) approach, improving performance with limited training data. Our method replaces the traditional biaffine classifier with a label attention mechanism that leverages comprehensive natural language descriptions of entity types, encoded using BERT to capture rich semantic relationships between labels and input spans. We conducted comprehensive experiments on four benchmark datasets: GENIA (nested NER), ACE 2004 (nested NER), ACE 2005 (nested NER), and CoNLL 2003 English (flat NER). Performance was evaluated across multiple few-shot scenarios (1-shot, 5-shot, 10-shot, and 20-shot) using F1-measure as the primary metric, with five different random seeds to ensure robust evaluation. We compared our approach against strong baselines including BERT-LSTM-CRF with nested tags, the original Biaffine model, and recent few-shot NER methods (FewNER, FIT, LPNER, SpanNER). Results demonstrate significant improvements across all few-shot scenarios. On GENIA, our LDE model achieves 45.07% F1 in five-shot learning compared to 30.74% for the baseline Biaffine model (46.4% relative improvement). On ACE 2005, we obtain 44.24% vs. 32.38% F1 in five-shot scenarios (36.6% relative improvement). The model shows consistent gains in 10-shot (57.19% vs. 49.50% on ACE 2005) and 20-shot settings (64.50% vs. 58.21% on ACE 2005). Ablation studies confirm that semantic information from label descriptions is the key factor enabling robust few-shot performance. Transfer learning experiments demonstrate the model’s ability to leverage knowledge from related domains. Our findings suggest that incorporating label semantic information can substantially enhance NER models in low-resource settings, opening new possibilities for applying NER in specialized domains or languages with limited annotated data.

1. Introduction

Named entity recognition (NER) is a fundamental step in information extraction that recognizes and classifies predefined linguistic expressions, mainly proper nouns such as person, place, and organization names, in unstructured text; it determines the span of each entity mention and classifies the entity type of that span [1]. Named entities often have a nested structure in which one mention contains another. Traditional NER techniques are limited in their ability to capture these hierarchical entity relationships, particularly from an information extraction perspective, and for technical reasons research has mainly focused on flat NER.
Nested NER overcomes traditional flat NER by analyzing the nested structure in which one named entity expression contains another [2]. However, NER techniques have mainly been studied for flat NER, and little training data has been built for nested NER.
To address these issues, this study proposes a label description embedding (LDE) model, a nested NER model that is robust in few-shot learning environments. The LDE model replaces the output layer of the existing biaffine nested NER model with a label attention layer that can use the semantic information of labels by encoding their descriptions.

2. Related Works

Traditionally, NER has been approached as a sequential labeling task. In this approach, tokens are annotated with BIO (Beginning, Inside, Outside) tags to identify entity boundaries and types; for example, "Kangwon National University" would be tagged B-ORG, I-ORG, I-ORG. Various models based on conditional random fields (CRFs) have been developed to address this task [1]. In recent research, a machine reading comprehension model was utilized to recognize named entities, casting NER as finding the specific span that answers a type-specific query [2].
Nested NER is challenging for existing CRF-based models because all nested named entities must be recognized. In [3], building on the dependency parsing model of [4], which uses a biaffine classifier to classify the dependency relationship between two words in a sentence, a biaffine span classifier was applied to recognize nested named entities. In [5], a Triaffine nested NER model was proposed by adding cross-span representation information to the biaffine nested NER model. In [6], researchers handled nested NER by using nested tags in traditional BIO-tag-based CRF models. The study in [7] proposed a novel method called focusing, bridging, and prompting (FIT) for few-shot nested NER. This approach consists of three stages: a focusing stage that identifies entity-concentrated parts using pre-trained language models, a bridging stage for span extraction, and a prompting stage that utilizes contextual information and relationships between nested entities. It demonstrated superior performance compared to existing methods on four benchmark datasets: ACE 2004, ACE 2005, GENIA, and KBP2017. Ref. [8] introduced LPNER, a label-prompt-based method for few-shot nested NER. This approach first extracts all possible spans from the text and represents label information through an intuitive prompt template of the form '[entity|type]'. The extracted spans are then classified by computing similarity with prototype vectors for each entity type, enabling effective recognition of nested entities. Unlike existing multi-round prompt methods, it achieves computational efficiency through single-round processing and showed excellent performance on five nested NER datasets including ACE 2004, ACE 2005, and GENIA. Ref. [9] proposed a novel approach to nested NER using large language models (LLMs). The researchers found that decomposed-QA techniques, which were effective in flat NER, showed degraded performance in nested NER. They introduced inference techniques specialized for nested structures and instruction tuning utilizing detailed label information. While experiments on the ACE 2004, ACE 2005, and GENIA datasets demonstrated the effectiveness of this approach, it fell short of BERT-based supervised learning models, suggesting the need for further research.
Classifiers in traditional deep learning models classify input information by representing it in a vector space. Label embedding additionally learns information about the labels to be classified, represents it in the same vector space, and uses it for classification. The advantage of this label-embedding technique is that label information is learned simultaneously, making it possible to build a model that can handle labels that were never seen during training [10]. In [11,12], label-embedding technology was applied to CRF-based models for sequential labeling problems: a label attention layer was used instead of a CRF layer, and label information was encoded into an attention network. Ref. [13] proposed FewNER, a meta-learning-based model for few-shot NER. This method separates the network into task-independent and task-specific parts, where the task-independent part learns shared knowledge across multiple tasks through meta-learning, while the task-specific part is designed to adapt in a low-dimensional space for each task. During testing, only the task-specific part is updated while the task-independent part is kept fixed, enabling efficient adaptation to new tasks and achieving superior performance in intra-/cross-domain adaptation experiments compared to existing methods. The research in [14,15] explored few-shot named entity recognition using label embedding. Ref. [14] proposed the SpanNER framework, which can perform effective named entity recognition with minimal training data. This method learns from natural language descriptions of entity classes and processes entity span detection and class inference in separate modules, recognizing named entities via word similarity between spans and natural language descriptions. It was specifically designed to enable zero-shot learning and achieved performance improvements of 10% in few-shot learning, 23% in domain transfer, and 26% in zero-shot learning across five datasets compared to existing methods. Ref. [15] attempted few-shot named entity recognition for transfer learning by creating embedding vectors from simple label words and using them for BIO-tag-based named entity classification.
The model proposed in this paper uses label descriptions, similar to the work of [14,15], but differs in two respects. First, we use a complete long sentence describing each label as the label information, rather than simple label words. Second, the models in [14,15] are flat NER models, not nested NER models, and they combine label information with the input information; in this study, we use the Biaffine model as the base model for nested NER and apply label information only in the label attention of the output layer. Like [3], the proposed model is a span-based biaffine nested NER model, but it utilizes label information in the output layer to achieve high performance in few-shot learning environments.
Our work makes several key contributions that differentiate it from previous approaches to nested NER in few-shot learning environments:
  • Unlike existing few-shot nested NER methods such as FIT [7], LPNER [8], and FewNER [13] that require complex multi-stage frameworks, our approach achieves comparable or superior performance through a more elegant solution—modifying only the classification layer with label semantic information.
  • While previous label-aware approaches like [14,15] focus primarily on flat NER and combine label information with input features, our model specifically addresses nested NER challenges by integrating rich label descriptions directly into the output layer of a span-based architecture.
  • By utilizing comprehensive label descriptions rather than single words, our model captures deeper semantic relationships between entities, making it more robust in few-shot scenarios where limited examples are available.
  • Our method enables effective transfer learning across different domains and datasets, including between nested and flat NER tasks, demonstrating greater versatility than traditional classification-based approaches.
  • Extensive experiments across multiple datasets (GENIA, ACE 2004, ACE 2005, and CoNLL 2003) demonstrate that our approach maintains high performance across various entity types, including the challenging nested relationships, where previous methods often struggle in few-shot environments.

3. Span-Based Nested Named Entity Recognition Model

This section describes the two span-based nested NER models considered in this work: the baseline biaffine model and the proposed LDE model, which replaces the biaffine classifier with a label attention layer.

3.1. Biaffine-Based Nested Named Entity Recognition Model

The biaffine-based nested NER model classifies all span candidates in a sentence using the following formulas:

$h = \mathrm{BERT}(tokens_{seq})$ (1)

$h^{start} = \mathrm{FFNN}_{start}(h)$ (2)

$h^{end} = \mathrm{FFNN}_{end}(h)$ (3)

$s_{i,j} = \mathrm{biaffine}(h_i^{start}, h_j^{end}) = (h_i^{start})^{\top} U h_j^{end} + W [h_i^{start}; h_j^{end}] + b$ (4)

$y_{i,j} = \arg\max(s_{i,j})$ (5)
Here, $tokens_{seq}$ denotes the words of the input sentence, which are encoded using Bidirectional Encoder Representations from Transformers (BERT) [16]. The encoded representation is then processed by $\mathrm{FFNN}_{start}$ and $\mathrm{FFNN}_{end}$ to obtain $h^{start}$ and $h^{end}$, which represent the start and end word information of all candidate spans in the sentence. For every candidate span, named entity classification is attempted using a biaffine classifier, detailed in Equation (4), where $U$, $W$, and $b$ are trainable weights. Finally, the named entity tag is classified for each span $(i, j)$, where $i$ and $j$ range over the indices of all words in the sentence.
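To make the scoring in Equations (1)-(5) concrete, the following is a minimal PyTorch sketch of the biaffine span classifier. The hidden size, FFNN size, label count, FFNN depth, and initialization are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of the biaffine span classifier (Equations (1)-(5)).
import torch
import torch.nn as nn

class BiaffineSpanClassifier(nn.Module):
    def __init__(self, hidden_size: int = 768, ffnn_size: int = 400, num_labels: int = 8):
        super().__init__()
        self.ffnn_start = nn.Sequential(nn.Linear(hidden_size, ffnn_size), nn.ReLU())
        self.ffnn_end = nn.Sequential(nn.Linear(hidden_size, ffnn_size), nn.ReLU())
        # U: bilinear term; W, b: linear term over the concatenated span endpoints.
        self.U = nn.Parameter(torch.randn(num_labels, ffnn_size, ffnn_size) * 0.01)
        self.W = nn.Linear(2 * ffnn_size, num_labels, bias=True)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (seq_len, hidden_size), the BERT encoding of one sentence.
        h_start = self.ffnn_start(h)             # (seq_len, ffnn_size)
        h_end = self.ffnn_end(h)                 # (seq_len, ffnn_size)
        # Bilinear term: bilinear[i, j, k] = h_start[i]^T U_k h_end[j]
        bilinear = torch.einsum("id,kde,je->ijk", h_start, self.U, h_end)
        # Linear term over the concatenation [h_start_i; h_end_j]
        n = h.size(0)
        pairs = torch.cat(
            [h_start.unsqueeze(1).expand(n, n, -1),
             h_end.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = bilinear + self.W(pairs)        # (seq_len, seq_len, num_labels)
        return scores                            # y_{i,j} = scores[i, j].argmax(-1)
```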

3.2. Label Description, Embedding Model Using Label Information

The LDE model replaces the biaffine classifier of the biaffine nested NER model with a label attention score, which is expressed as follows:
$h = \mathrm{BERT}(tokens_{seq})$ (6)

$h2_k = \mathrm{BERT}(tokens_k^{Label\ Description})$ (7)

$h2_k^{label} = \mathrm{FFNN}_{label}(h2_k[CLS])$ (8)

$h3_{i,j}^{span} = \mathrm{FFNN}_{span}([h_i; h_j])$ (9)

$s_{i,j}^{k} = \mathrm{attention\_score}(h2_k^{label}, h3_{i,j}^{span}) = (h2_k^{label})^{\top} h3_{i,j}^{span}$ (10)

$y_{i,j} = \arg\max_k(s_{i,j}^{k})$ (11)
The input sentence is encoded in the same way as in the Biaffine model of Section 3.1, with the addition of encoded label descriptions. A label description is written for each label, where $k$ is the index over all labels. The label description encoder is the same BERT, with the same weights, as the input sentence encoder; it produces the label representation $h2_k$, from which the vector at the $[CLS]$ position (the first token of a BERT input) is passed through $\mathrm{FFNN}_{label}$ to obtain $h2_k^{label}$, which serves as the label embedding. The input sentence representation $h$ encoded by BERT is concatenated at the start and end indices of each candidate span (indices $i, j$), and the span representation $h3_{i,j}^{span}$ is obtained using $\mathrm{FFNN}_{span}$. An attention score between the span representation $h3_{i,j}^{span}$ and each label embedding $h2_k^{label}$ is then computed to classify the span. The attention score function uses the dot formula from [17]. The overall architecture of the model is shown in Figure 1.
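The following is a minimal PyTorch sketch of the LDE output layer (Equations (6)-(11)), assuming the same BERT encoder (shared weights) encodes both the sentence and the label descriptions; the model name and FFNN size are illustrative placeholders.

```python
# Minimal sketch of the LDE classifier with a shared BERT encoder.
import torch
import torch.nn as nn
from transformers import AutoModel

class LDEClassifier(nn.Module):
    def __init__(self, model_name: str = "bert-base-cased", ffnn_size: int = 400):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)   # shared encoder
        hidden = self.bert.config.hidden_size
        self.ffnn_label = nn.Linear(hidden, ffnn_size)
        self.ffnn_span = nn.Linear(2 * hidden, ffnn_size)

    def encode_labels(self, desc_input_ids, desc_attention_mask):
        # h2_k: encode each label description; take the [CLS] vector (index 0).
        out = self.bert(input_ids=desc_input_ids, attention_mask=desc_attention_mask)
        return self.ffnn_label(out.last_hidden_state[:, 0])  # (num_labels, ffnn_size)

    def forward(self, input_ids, attention_mask, label_emb):
        # h: encoding of one sentence, shape (seq_len, hidden).
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state[0]
        n = h.size(0)
        # h3_{i,j}: span representation from concatenated start/end token vectors.
        spans = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        h3 = self.ffnn_span(spans)                            # (n, n, ffnn_size)
        # Dot-product attention score between each span and each label embedding.
        scores = torch.einsum("ijd,kd->ijk", h3, label_emb)   # (n, n, num_labels)
        return scores                                         # y_{i,j} = argmax over k
```

Because the label embeddings are produced by the encoder rather than stored as a fixed classification matrix, swapping in a new label set only requires new descriptions, which is what enables the transfer learning experiments in Section 4.4.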
The label descriptions for the LDE model were generated using ChatGPT (https://chatgpt.com/, accessed on 18 August 2023) with reference to annotation rule documents and Wikipedia for each dataset, with detailed information provided in Table A1, Table A2, Table A3 and Table A4 of Appendix A.

3.3. Methodological Overview and Key Differences

Our LDE model differs from existing few-shot nested NER approaches in several key aspects:
  • Architecture Simplicity: Unlike multi-stage frameworks such as FIT [7] (focusing-bridging-prompting) and LPNER [8] (span extraction + prototype classification), our approach modifies only the output layer of the biaffine model, replacing the traditional classifier with a label attention mechanism.
  • Label Information Integration: While previous label-aware methods [14,15] combine label information with input features throughout the network, our model incorporates rich semantic descriptions exclusively in the classification layer, enabling more focused utilization of label semantics.
  • Semantic Depth: Compared to approaches using single label words [15] or simple prompt templates [8], our model leverages comprehensive natural language descriptions (average 43.13 tokens per label) to capture deeper semantic relationships between entity types.
  • Nested NER Focus: Unlike SpanNER [14] and similar approaches designed primarily for flat NER, our span-based architecture inherently handles hierarchical entity relationships while maintaining the benefits of label semantic information.
This design enables effective few-shot learning through minimal architectural changes while maintaining computational efficiency compared to complex multi-stage approaches.

4. Experiments and Results

To evaluate the performance of the proposed LDE model, we used the GENIA [18], ACE 2004 [19], and ACE 2005 [20] nested NER datasets and the CoNLL 2003 English [21] flat NER dataset (details in Appendix A, Table A2, Table A3, Table A4, Table A5 and Table A6). These datasets were selected as they represent the most widely adopted benchmarks in the nested and flat NER research communities. For nested NER, GENIA, ACE 2004, and ACE 2005 are considered the de facto standard evaluation datasets, as nested NER research has received significantly less attention compared to flat NER, and the available comparative studies predominantly utilize these specific datasets for evaluation. For flat NER evaluation, CoNLL 2003 English serves as the most established benchmark dataset in the field, ensuring fair comparison with existing methods across both nested and flat NER paradigms. The label descriptions for the LDE model were written as long sentences containing various words, referring to the named entity tags of each dataset.

4.1. Detailed Experimental Settings

The named entity recognition tags used in each experiment were identical to those in previous studies, and for the GENIA data, unified tags were used in accordance with [3]. For experiments with the GENIA data, BioBERT-v1.1 [22] was used as the backbone model due to domain characteristics, while the BERT-base-cased model [16] was used as the backbone model for the ACE 2004, ACE 2005, and CoNLL 2003 English experiments. The hyperparameters of the models are detailed in Table A7 of Appendix A. We applied identical settings across all datasets and model variants to ensure fair comparison. All experiments were implemented using the PyTorch v2.4.0 framework and conducted on an NVIDIA RTX A6000 GPU. For the FFNN size, we conducted a parameter sweep across {200, 400, 600, 800, 1200, 1600} and selected the optimal value based on performance for each experimental configuration. The experiments were optimized using each development set to measure performance on the test set. For the GENIA data, which lack a development set, experiments were designed identically to [3] to measure performance on the evaluation data. For the few-shot environment setup, k-shot training data were created by randomly extracting k samples for each label, yielding five different k-shot learning datasets. For comparison models, experiments were conducted using the BERT-LSTM-CRF model utilizing nested tags from [6] and the biaffine model from [5].

4.2. Results

Table 1 shows the named entity recognition experimental results for each dataset, with baseline results from FewNER [13], FIT [7], LPNER [8], and SpanNER [14] cited from their original publications, while BERT-LSTM-CRF, Biaffine, and our LDE models were directly implemented by us. To provide context for comparison, we include several state-of-the-art few-shot nested NER approaches: FewNER [13], a meta-learning based model that separates task-independent and task-specific components; FIT [7], which employs a three-stage process (focusing, bridging, and prompting) to handle nested entities without source domain data; LPNER [8], a label-prompt method that integrates prompt learning with prototype networks in a single round; and SpanNER [14], which decomposes NER into span detection and class inference using natural language descriptions.
In experiments with nested named entity recognition data (GENIA, ACE 2004, ACE 2005), when trained on the full dataset, both biaffine and LDE models showed higher performance compared to the BERT-LSTM-CRF model using nested tags, though the performance difference between biaffine and LDE models was minimal. However, in 5-shot, 10-shot, and 20-shot experiments, the LDE model demonstrated higher performance than the biaffine model, indicating that the proposed LDE model is effective in few-shot environments. Notably, while existing few-shot named entity recognition methods like FewNER [13], FIT [7], and LPNER [8] required complex methods dividing tasks into multiple stages to handle nested NER, the LDE model achieved similar or better performance than existing research in few-shot environments by simply modifying the classification layer, demonstrating the effectiveness of the label description approach for few-shot nested NER. However, in extreme one-shot environments, the LDE model showed lower performance compared to existing research and the BERT-LSTM-CRF model, indicating that the relatively complex LDE model is difficult to train adequately in one-shot environments.
This limitation in extreme few-shot scenarios can be attributed to several factors. First, the architectural complexity of the LDE model requires more training examples to properly optimize its parameters compared to simpler models like BERT-LSTM-CRF. Unlike generative large language models that rely heavily on pre-training, our approach depends on learning the relationships between label descriptions and input spans within a specialized architecture. The high standard deviations observed in one-shot experiments (e.g., 5.54 ± 3.27% for ACE 2005) further suggest that performance is highly sensitive to the specific examples chosen for training, creating potential training instability. As demonstrated in Section 4.4, this limitation can be effectively addressed through transfer learning from related domains, which provides the model with a more robust foundation even when target domain examples are extremely limited.
In experiments with CoNLL 2003 English, a flat named entity recognition dataset, there was a minimal performance difference between BERT-LSTM-CRF, biaffine, and LDE models when trained on the full dataset, indicating that this is a relatively easier dataset compared to other nested NER datasets (GENIA, ACE 2004, ACE 2005). However, in few-shot environments, the BERT-LSTM-CRF model showed higher performance, which appears to be due to this being a relatively easier task, while the LDE model, specialized for nested NER, showed relatively lower performance. It is worth noting that SpanNER [14], which was specifically designed for low-resource flat NER, achieves significantly higher performance (71.1 ± 0.4) on the CoNLL 2003 English dataset in the five-shot setting compared to our LDE model (42.53 ± 7.04). This performance difference can be attributed to SpanNER’s approach of directly learning from natural language descriptions of entity classes and its specialized architecture that separates entity span detection from class inference. While SpanNER excels at flat NER tasks, our LDE model offers advantages in nested NER scenarios, where the hierarchical relationships between entities present additional challenges. This comparison highlights the trade-offs between models optimized for specific NER architectures (flat vs. nested) and suggests that incorporating some of SpanNER’s techniques for handling label semantics could further improve our model’s performance on flat NER tasks.

4.3. Ablation Study on Label Description Components

To systematically isolate and evaluate the impact of label description embedding in our proposed LDE model, we conducted comprehensive ablation experiments on the ACE 2005 dataset. We designed variations of our model by systematically removing or replacing key components, allowing us to quantify the individual contribution of each element to the overall performance. Table 2 shows the experimental results across different model configurations in few-shot learning scenarios. We evaluated four distinct model variants (a sketch of how each variant constructs its label representation follows the list):
  • Biaffine: Our baseline model without any label embedding components.
  • LDE with Trainable Label Features (LDEw Label Feature): Replaces BERT-encoded label information with simple trainable embedding vectors.
  • LDE with Label Words Only (LDEw Label Word): Uses only the entity type words (e.g., “person”, “organization”) without descriptive context.
  • LDE with Full Label Descriptions (LDEw Label Description): Our complete proposed model using detailed semantic descriptions of entity types.
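For concreteness, the following is a hypothetical sketch of how the three label-representation variants could be constructed; `bert` and `tokenizer` follow the Hugging Face interface, and the projection size and function name are illustrative assumptions.

```python
# Illustrative construction of the ablation variants' label representations.
import torch
import torch.nn as nn

def build_label_representations(variant, labels, descriptions,
                                bert, tokenizer, dim=400):
    if variant == "feature":
        # LDEw Label Feature: trainable vectors carrying no prior semantics.
        return nn.Parameter(torch.randn(len(labels), dim) * 0.01)
    # LDEw Label Word feeds only the bare type word (e.g., "person") to BERT;
    # LDEw Label Description feeds the full description (43.13 tokens on
    # average for ACE 2005).
    texts = labels if variant == "word" else descriptions
    enc = tokenizer(texts, padding=True, return_tensors="pt")
    cls = bert(**enc).last_hidden_state[:, 0]             # [CLS] vector per label
    return nn.Linear(bert.config.hidden_size, dim)(cls)   # FFNN_label projection
```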
Table 2. Impact of different label description types on ACE 2005 performance.

| ACE 2005 (Nested) | Avg. Tokens per Label Description | 1-Shot | 5-Shot | 10-Shot | 20-Shot |
|---|---|---|---|---|---|
| Biaffine | - | 3.17 ± 1.84 | 32.38 ± 2.96 | 49.50 ± 2.34 | 58.21 ± 0.97 |
| LDEw Label Feature | - | 0.00 | 15.71 ± 10.01 | 49.01 ± 1.76 | 58.10 ± 0.63 |
| LDEw Label Word | 4.0 | 1.96 ± 1.62 | 37.75 ± 3.28 | 54.27 ± 2.27 | 63.42 ± 1.09 |
| LDEw Label Description | 43.13 | 5.54 ± 3.27 | 44.24 ± 2.01 | 57.19 ± 1.27 | 64.50 ± 1.02 |
The experimental results clearly demonstrate the incremental benefits of each component. In the 10-shot and 20-shot settings, the LDEw Label Feature model performed similarly to the baseline Biaffine model (49.01% vs. 49.50% F1 in 10-shot, and 58.10% vs. 58.21% F1 in 20-shot), indicating that merely replacing the classification layer with trainable label features provides minimal advantage. However, incorporating semantic information through label words (LDEw Label Word) produced substantial improvements across all few-shot settings compared to both the baseline and trainable feature models. This confirms our hypothesis that leveraging semantic knowledge from the pre-trained BERT model for labels significantly enhances few-shot learning capability.

Most notably, the complete LDEw Label Description model, which utilizes comprehensive entity type descriptions, consistently achieved the best performance across all few-shot scenarios, with particularly substantial gains in extremely low-resource settings (five-shot: 44.24% vs. 32.38% for Biaffine, a relative improvement of 36.6%). This systematic progression in performance demonstrates that the rich semantic context provided by full label descriptions is the critical component driving the success of our approach. We observed that the performance gain from label descriptions becomes more pronounced as the training data become scarcer, with the largest relative improvements occurring in the five-shot setting. This finding confirms that incorporating label semantic information is especially valuable in extremely low-resource scenarios, where the model has insufficient examples to learn the conceptual boundaries between entity types.

In the one-shot scenario, all models struggle significantly, with even our best model achieving only 5.54% F1. This indicates that while label descriptions provide meaningful improvements, they cannot completely overcome the fundamental challenge of learning from a single example per class. However, as shown in Section 4.4, this limitation can be addressed through transfer learning. These ablation results conclusively demonstrate that the semantic information captured in our label description embeddings, rather than merely architectural changes to the output layer, is the key factor enabling robust few-shot learning performance.

4.4. Transfer Learning

Named entity recognition faces the challenge that tags differ by domain, making it difficult for simple classification-based NER models to utilize transfer learning with training data from different domains. However, the proposed LDE model can apply transfer learning to any NER domain as it learns label descriptions through the BERT model.
Table 3 shows comparison experiments of transfer learning by source domain data on both the ACE 2005 and GENIA datasets. The experimental results show that in few-shot experiments, all performances improved compared to the basic LDE model, demonstrating the potential of models using transfer learning. When attempting transfer learning after training on CoNLL 2003 English data, which is a flat NER dataset from a similar domain and is most easily obtainable, the model achieved a performance of 33.22 in one-shot scenarios on ACE 2005. This shows that transfer learning can overcome the disadvantage of poor performance in extreme few-shot environments, such as one-shot, which was a limitation of the LDE model proposed in this paper.
Similarly, for the GENIA dataset, we conducted transfer learning experiments using models trained on CoNLL 2003 English and ACE 2005 as source domains. These results highlight an important characteristic of our approach: while the LDE model leverages label descriptions, it requires minimal examples from the target domain to adapt effectively. This is expected, as our model fundamentally depends on fine-tuning with domain-specific examples to learn the relationships between label descriptions and domain contexts.
The challenges in cross-domain adaptation can be primarily attributed to the non-overlapping entity tag sets between different domains. For instance, the biomedical domain (GENIA) uses highly specialized entity types like G#DNA, G#protein, and G#RNA that have no direct correspondence in general domain datasets like CoNLL 2003 or ACE 2005.
However, it is worth noting that with even minimal target domain examples (one-shot), transfer learning significantly improves performance compared to training from scratch. For instance, when transferring from CoNLL 2003 to GENIA, the one-shot performance jumps to 35.14% (compared to 11.60% without transfer), and five-shot performance reaches 49.98% (compared to 45.07% without transfer). Similar improvements are observed when transferring from ACE 2005 to GENIA. This demonstrates that the model can rapidly adapt with minimal target domain data, making this approach viable for real-world scenarios where obtaining a few labeled examples is typically feasible.
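Because labels enter the LDE model only as encoded descriptions, source and target domains need no shared tag set, which is what makes this transfer possible. The following is a hedged sketch of the procedure; `set_label_descriptions`, `compute_loss`, the checkpoint path, and the loop details are hypothetical illustrations, while the optimizer settings follow Table A7.

```python
# Hedged sketch of transfer learning with the LDE model.
import torch

def transfer_and_finetune(model, source_ckpt, target_descriptions,
                          target_kshot_loader, epochs=20, lr=5e-5):
    # 1. Load weights trained on the source domain (e.g., CoNLL 2003 English).
    model.load_state_dict(torch.load(source_ckpt))
    # 2. Swap in the target domain's label descriptions (e.g., ACE 2005 tags);
    #    no output-layer surgery is needed, since labels are encoded rather
    #    than tied to fixed classifier indices.
    model.set_label_descriptions(target_descriptions)  # hypothetical helper
    # 3. Fine-tune on the k-shot target data.
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.1)
    for _ in range(epochs):
        for batch in target_kshot_loader:
            loss = model.compute_loss(**batch)         # hypothetical loss helper
            loss.backward()
            opt.step()
            opt.zero_grad()
    return model
```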

4.5. Additional Analysis

Additionally, to confirm that the proposed model is robust in nested named entity recognition tasks, we examined the performance by entity type in one five-shot case during ACE 2005 nested NER experiments.
Table 4 shows the performance by entity type for one five-shot case in the ACE 2005 dataset. Performance evaluation utilized the nested type-specific named entity recognition evaluation from [23]. Here, 'standard' is the basic method that evaluates performance for each named entity; 'flat' evaluates only flat named entities; 'nested' evaluates only nested named entities, without judging nested relationships; and 'nesting' assesses whether all entity tags inside and outside are correct, including nested relationships.

Table 4 also includes results from [9], who explored nested NER using GPT-4 through in-context learning in a five-shot setting. The authors found that decomposed-QA techniques, which were effective for flat NER, showed decreased performance when applied to nested NER; to address this limitation, they introduced specialized inference techniques designed for nested entity structures and incorporated detailed label information in their prompts. Their experiments on the ACE 2005 dataset followed the same evaluation criteria (flat, nested, and nesting) that we use in this study, allowing for direct comparison with our proposed methods.

The BERT-LSTM-CRF model, despite using the nested tag approach from [6], completely failed to handle named entity recognition considering nested relationships in the five-shot environment. The biaffine model and the LDEw Label Feature model can handle nested named entity recognition but showed lower performance than the BERT-LSTM-CRF model. The models using BERT-embedded vectors for labels, LDEw Label Word and LDEw Label Description, both showed high performance; the LDEw Label Description model proposed in this paper demonstrated high performance on nested entities, indicating its strength in nested named entity recognition in few-shot environments.

Notably, GPT-4, despite being one of the most advanced and largest language models with billions of parameters, achieves 34.75%, 38.29%, and 6.63% F1 on the flat, nested, and nesting evaluations, respectively, in the five-shot setting. Our LDEw Label Description model, which is based on a much smaller BERT architecture, achieves competitive or superior performance with 54.48%, 28.31%, and 2.62% on the same metrics. This comparison demonstrates that our proposed approach can achieve comparable or even better performance on nested entity recognition tasks using significantly fewer parameters and computational resources than large language models like GPT-4.
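As an illustration of how the evaluation criteria above can be operationalized, the following is a minimal sketch under our reading of [23]; the entity tuples and all helper names are our own assumptions, not the official scorer.

```python
# Span-level F1 under the 'flat' and 'nested' criteria (our reading of [23]).
# Entities are (start, end, label) tuples grouped per sentence.
def is_nested(e, entities):
    """True if entity e contains, or is contained in, another gold entity."""
    return any(o != e and ((o[0] <= e[0] and e[1] <= o[1]) or
                           (e[0] <= o[0] and o[1] <= e[1]))
               for o in entities)

def span_f1(gold_sents, pred_sents, keep=lambda e, ents: True):
    tp = fp = fn = 0
    for g_ents, p_ents in zip(gold_sents, pred_sents):
        # Nestedness is judged against the gold entities of each sentence.
        g = {e for e in g_ents if keep(e, g_ents)}
        p = {e for e in p_ents if keep(e, g_ents)}
        tp += len(g & p); fp += len(p - g); fn += len(g - p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Tiny example: one sentence with a nested pair, one correct prediction.
gold = [[(0, 0, "PER"), (0, 1, "FAC")]]
pred = [[(0, 0, "PER")]]
standard_f1 = span_f1(gold, pred)
flat_f1 = span_f1(gold, pred, keep=lambda e, ents: not is_nested(e, ents))
nested_f1 = span_f1(gold, pred, keep=lambda e, ents: is_nested(e, ents))
# The stricter 'nesting' criterion would compare whole inner/outer entity
# groups jointly rather than individual spans.
```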
To provide deeper insights into how label descriptions influence entity span classification, we conducted a qualitative analysis of specific cases from the ACE 2005 dataset. Table 5 presents four representative examples comparing the predictions of different model variants against gold standard annotations. These cases highlight the enhanced capabilities of our label description approach in handling nested entity recognition challenges.
As shown in Table 5 and Figure 2, the LDE model with label descriptions (LDEw Label Description) consistently captures more complex entity relationships compared to other variants. In Case 1, while all models correctly identify some of the person entities, only the LDEw Label Description model recognizes the pronoun “our” as a potential entity, although it misclassifies it as GPE rather than PER. This suggests that the label descriptions provide richer semantic context that helps the model identify less obvious entity mentions.
Case 2 demonstrates the model’s ability to recognize nested facility and location entities. The sentence contains a complex nested structure where “the lane to the farm” is a facility with “the farm” nested within it. The baseline Biaffine model only identifies person entities and misses this nested structure entirely. In contrast, the LDEw Label Description model correctly identifies both the outer facility entity and the inner location entity, showing how semantic information in label descriptions helps distinguish between closely related entity types in hierarchical structures.
Case 3 provides a compelling example of the model’s capability to handle rare entity types. The gold standard identifies “that barge down there on the river” as a vehicle entity with “the river” nested within it as a location. While most model variants fail to recognize anything beyond basic location entities, the LDEw Label Description model successfully identifies both the outer entity (though misclassified as a facility) and the nested location entity. This demonstrates how label descriptions provide contextual clues that assist in recognizing uncommon entity types, even with minimal training examples.
Finally, Case 4 shows a perfect nested entity recognition scenario where “several OS ##U players” contains the organization “OS ##U” within it. Only the LDEw Label Description model correctly identifies both the outer person entity and the nested organization entity, capturing the hierarchical relationship exactly as in the gold standard. This precision in detecting nested structures highlights how the semantic richness of label descriptions enables the model to better understand entity boundaries and their relationships.
These qualitative examples illustrate that the integration of label descriptions significantly enhances the model’s capability to
  • Recognize subtle entity mentions that might be overlooked by other approaches;
  • Correctly identify hierarchical relationships between nested entities;
  • Distinguish between semantically related entity types (e.g., facility vs. location);
  • Handle rare entity types with minimal training data;
  • Maintain precision in complex nested structures.
The analysis confirms that semantic information from comprehensive label descriptions provides valuable contextual knowledge, enabling more robust entity recognition in few-shot learning scenarios. This advantage is particularly pronounced when dealing with complex nested entities, which require a deeper understanding of entity type characteristics and their potential hierarchical relationships.

5. Conclusions

In this study, we modified the output layer of the Biaffine model, a span-based nested named entity recognition model, to utilize label semantic information, yielding a nested NER model that is robust in few-shot learning environments. We used deeper semantic information in the form of label descriptions as the label information, and the experimental results showed that the proposed model outperforms existing models in environments with very little training data.
The proposed LDE model is characterized by its ability to handle data from new domains with new labels to some extent, because it jointly uses information about the labels themselves. In future research, we plan to explore the model's adaptability to new domains and entity types by developing more generalizable label descriptions. This approach could potentially enhance cross-domain transfer capabilities and improve performance when encountering previously unseen entity categories with minimal training data.

Author Contributions

Conceptualization, H.H. and C.L.; methodology, H.H.; software, H.H.; investigation, H.H., Y.J. and W.G.; data curation, W.G.; writing—original draft preparation, H.H.; writing—review and editing, C.L.; supervision, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is the result of a commissioned research project supported by the affiliated institute of ETRI [2023-033].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in publicly accessible repositories. These data were derived from the following resources available in the public domain: GENIA dataset [18], ACE 2004 dataset [19], ACE 2005 dataset [20], and CoNLL 2003 English dataset [21]. All datasets are well-established benchmark datasets in the named entity recognition research community.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1 shows the prompt used for automatically generating label descriptions with ChatGPT. In this prompt, "{Document}" is replaced with documents about the dataset (e.g., [18,19,20,21]), and the tags used for the task are specified.
Table A1. Prompt for generating label descriptions.

Prompt:
{Document}
Refer to the above document and create descriptions for each object recognition tag {TAG1, TAG2, TAG3, …}. Write a relatively long sentence for each description.
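The descriptions in this work were generated through the ChatGPT web interface; purely as an illustration, an equivalent scripted version using the OpenAI Python client might look as follows (the model name, function name, and document variable are assumptions).

```python
# Hypothetical scripted equivalent of the Table A1 prompt; the paper used the
# ChatGPT web interface, so this is an assumption-laden illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_label_descriptions(document: str, tags: list[str]) -> str:
    prompt = (
        f"{document}\n"
        f"Refer to the above document and create descriptions for each object "
        f"recognition tag {{{', '.join(tags)}}}. "
        f"Write a relatively long sentence for each description."
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Example (hypothetical inputs):
# descriptions = generate_label_descriptions(ace_guidelines_text,
#                                            ["PER", "ORG", "GPE", "LOC"])
```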
Table A2, Table A3 and Table A4 show the label descriptions for each named entity recognition dataset.
Table A2. Named entity tag descriptions and definitions for the GENIA dataset.

| NER Tag | Description |
|---|---|
| O | O: Outside of named entities. |
| G#DNA | G#DNA: DNA is a fundamental molecule in the biomedical domain, serving as the genetic blueprint of all living organisms, carrying hereditary information, enabling genomics research, facilitating personalized medicine, aiding in diagnostics and forensics, and offering insights into evolutionary biology and gene editing for disease treatments. |
| G#protein | G#protein: Proteins are fundamental biomolecules in the biomedical domain, serving as essential building blocks of cells and tissues, catalysts for biochemical reactions, and key regulators of biological processes, playing crucial roles in health and disease. |
| G#cell_type | G#cell_type: In the biomedical domain, cell type refers to a specific class or category of cells sharing similar morphological, functional, and genetic characteristics within a particular organism or tissue. |
| G#cell_line | G#cell_line: A cell line in the biomedical domain refers to a population of cells derived from a single source and cultured in a laboratory setting, providing a valuable tool for studying various biological processes and testing experimental treatments. |
| G#RNA | G#RNA: RNA (Ribonucleic acid) in the biomedical domain is a versatile molecule responsible for translating genetic information from DNA to proteins, regulating gene expression, and serving as a potential therapeutic target in various diseases. |
Table A3. Named entity tag descriptions and definitions for the CoNLL 2003 English dataset.

| NER Tag | Description |
|---|---|
| O | O: Outside of named entities. |
| ORG | ORG: Organizations indicate companies, subdivisions of companies, brands, political movements, government bodies, publications, musical companies, public organizations, and other collections of people within the text data. |
| MISC | MISC: Miscellaneous includes adjectives and derivations from terms associated with locations, organizations, individuals, or general concepts, as well as encompassing entities indicating religions, political ideologies, nationalities, languages, events, wars, sports-related names, titles, slogans, eras, or object types within the text data. |
| PER | PER: Persons indicate the first, middle, and last names of people, animals and fictional characters, and aliases within the text data. |
| LOC | LOC: Locations are entities that indicate specific places, such as roads, trajectories, regions, structures, natural locations, public places, commercial places, assorted buildings, countries, or landmarks, within the text data. |
Table A4. Named entity tag descriptions and definitions for the ACE 2004/2005 datasets.

| NER Tag | Description |
|---|---|
| O | O: Outside of named entities. |
| ORG | ORG: Organizations are entities that indicate government agencies, commercial companies, educational institutions, non-profit organizations, and other structured groups of people, encompassing various subtypes like government, commercial, educational, non-profit entities within the text data. |
| GPE | GPE: Geo-Political Entities are complex entities that represent geographical regions, political entities, or their combinations, including nations, states, cities, and other politically defined locations that have both a physical and administrative aspect within the text data. |
| PER | PER: Persons are entities that indicate human beings through named mentions (proper names), nominal mentions (descriptions), or pronominal mentions (pronouns), including individual names, titles, roles, and references to people as individuals or groups within the text data. |
| LOC | LOC: Locations are entities that indicate purely geographical or physical places without political significance, such as mountains, rivers, oceans, regions, continents, and other natural or artificial geographical features within the text data. |
| FAC | FAC: Facilities are entities that indicate human-made structures, buildings, architectural features, and infrastructure elements like bridges, airports, highways, and other constructed spaces within the text data. |
| VEH | VEH: Vehicles are entities that indicate any means of transportation, including cars, planes, ships, spacecraft, and other mobile machines designed for carrying and transporting within the text data. |
| WEA | WEA: Weapons are entities that indicate instruments designed for combat or defense, including conventional weapons, military equipment, and other tools specifically designed for warfare or combat within the text data. |
Table A5 shows the nested entity ratio for each nested named entity recognition dataset. Table A6 shows the average number of spans per sentence for each dataset.
Our experiments utilized the following datasets:
  • GENIA: 16,691 sentences in the training set and 1855 sentences in the test set.
  • ACE 2004: 6198 sentences in the training set and 809 sentences in the test set.
  • ACE 2005: 7294 sentences in the training set and 1057 sentences in the test set.
  • CoNLL 2003 English: 14,041 sentences in the training set and 3453 sentences in the test set.
For our k-shot setups (where k = 1, 5, 10, or 20), we randomly selected k sentences per entity tag from the training set, excluding the ‘O’ (Outside) tag. The selection process only verified that a sentence contained at least one instance of the target tag, without considering entity overlap in nested scenarios. To ensure robust evaluation, we created five different few-shot data groups using five distinct random seeds. This approach allowed us to account for potential variance in performance due to the specific examples selected in the few-shot setting.
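The sampling procedure above can be summarized in a short sketch; the sentence representation, with entities given as (start, end, tag) tuples, and the function name are our own assumptions.

```python
# Sketch of the k-shot sampling described above. Sentences are assumed to be
# (tokens, entities) pairs with entities as (start, end, tag) tuples.
import random

def sample_k_shot(train_sents, tags, k, seed):
    rng = random.Random(seed)
    selected = []
    for tag in tags:  # the 'O' tag is excluded from `tags`
        # Candidates contain at least one entity of this tag; overlap between
        # nested entities is deliberately ignored, as in our setup.
        cands = [s for s in train_sents
                 if any(e[2] == tag for e in s[1]) and s not in selected]
        selected.extend(rng.sample(cands, k))
    return selected

# Example: five few-shot groups from five distinct seeds, as in our experiments.
# groups = [sample_k_shot(train_sents, tags, k=5, seed=s) for s in range(5)]
```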
Table A5. Nested ratio (%) of each dataset. Train (100%) and Test values apply to the whole dataset and are listed once per dataset.

GENIA

| Groups | 1-Shot | 5-Shot | 10-Shot | 20-Shot | Train (100%) | Test |
|---|---|---|---|---|---|---|
| #1 | 0.00% | 22.83% | 18.54% | 22.92% | 17.97% | 21.73% |
| #2 | 12.50% | 21.62% | 21.50% | 18.50% | - | - |
| #3 | 53.33% | 27.72% | 12.26% | 26.51% | - | - |
| #4 | 21.05% | 24.44% | 28.57% | 18.53% | - | - |
| #5 | 42.86% | 24.74% | 26.84% | 24.12% | - | - |

ACE 2004

| Groups | 1-Shot | 5-Shot | 10-Shot | 20-Shot | Train (100%) | Test |
|---|---|---|---|---|---|---|
| #1 | 62.50% | 58.25% | 54.50% | 53.53% | 45.81% | 46.75% |
| #2 | 48.65% | 46.07% | 57.39% | 58.49% | - | - |
| #3 | 57.89% | 53.85% | 52.78% | 57.16% | - | - |
| #4 | 60.61% | 65.48% | 52.27% | 58.22% | - | - |
| #5 | 56.25% | 58.99% | 60.10% | 53.00% | - | - |

ACE 2005

| Groups | 1-Shot | 5-Shot | 10-Shot | 20-Shot | Train (100%) | Test |
|---|---|---|---|---|---|---|
| #1 | 70.21% | 50.78% | 46.90% | 49.67% | 40.66% | 39.56% |
| #2 | 53.49% | 50.28% | 55.65% | 46.66% | - | - |
| #3 | 48.08% | 46.06% | 45.66% | 47.98% | - | - |
| #4 | 40.00% | 47.67% | 50.15% | 49.30% | - | - |
| #5 | 66.67% | 53.99% | 50.40% | 45.30% | - | - |
Table A6. Average number of spans per sentence for each dataset. Train (100%) and Test values apply to the whole dataset and are listed once per dataset.

GENIA

| Groups | 1-Shot | 5-Shot | 10-Shot | 20-Shot | Train (100%) | Test |
|---|---|---|---|---|---|---|
| #1 | 3.00 | 3.68 | 3.56 | 3.36 | 3.08 | 3.02 |
| #2 | 3.20 | 4.44 | 4.00 | 4.00 | - | - |
| #3 | 3.00 | 4.04 | 4.24 | 4.15 | - | - |
| #4 | 3.80 | 3.60 | 3.64 | 4.21 | - | - |
| #5 | 4.20 | 3.88 | 3.80 | 3.98 | - | - |

ACE 2004

| Groups | 1-Shot | 5-Shot | 10-Shot | 20-Shot | Train (100%) | Test |
|---|---|---|---|---|---|---|
| #1 | 5.71 | 5.89 | 6.03 | 5.26 | 3.58 | 3.75 |
| #2 | 5.29 | 5.46 | 5.70 | 5.30 | - | - |
| #3 | 5.43 | 5.57 | 5.14 | 5.09 | - | - |
| #4 | 4.71 | 4.80 | 5.36 | 5.78 | - | - |
| #5 | 4.57 | 5.09 | 5.51 | 5.35 | - | - |

ACE 2005

| Groups | 1-Shot | 5-Shot | 10-Shot | 20-Shot | Train (100%) | Test |
|---|---|---|---|---|---|---|
| #1 | 6.71 | 5.51 | 4.84 | 5.48 | 3.40 | 2.88 |
| #2 | 6.14 | 5.17 | 5.31 | 5.02 | - | - |
| #3 | 7.43 | 4.71 | 4.44 | 4.96 | - | - |
| #4 | 4.29 | 4.91 | 4.61 | 5.11 | - | - |
| #5 | 6.86 | 4.66 | 5.41 | 4.94 | - | - |

CoNLL 2003 English

| Groups | 1-Shot | 5-Shot | 10-Shot | 20-Shot | Train (100%) | Test |
|---|---|---|---|---|---|---|
| #1 | 3.00 | 2.45 | 2.28 | 2.59 | 1.67 | 1.64 |
| #2 | 3.25 | 2.65 | 2.90 | 2.48 | - | - |
| #3 | 1.50 | 2.40 | 2.43 | 2.18 | - | - |
| #4 | 1.50 | 3.15 | 2.25 | 2.61 | - | - |
| #5 | 3.25 | 1.85 | 2.45 | 2.59 | - | - |
Table A7. Hyperparameters of the models used in the experiments.

| Parameter | Value |
|---|---|
| BiLSTM size (BERT-LSTM-CRF only) | 256 |
| FFNN size | {200, 400, 600, 800, 1200, 1600} |
| Dropout | 0.1 |
| Optimizer | AdamW |
| Learning rate | 5 × 10−5 |
| Weight decay | 0.1 |

References

  1. Yang, J.; Zhang, T.; Tsai, C.-Y.; Lu, Y.; Yao, L. Evolution and emerging trends of named entity recognition: Bibliometric analysis from 2000 to 2023. Heliyon 2024, 10, e30053.
  2. Li, X.; Feng, J.; Meng, Y.; Han, Q.; Wu, F.; Li, J. A Unified MRC Framework for Named Entity Recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020.
  3. Yu, J.; Bohnet, B.; Poesio, M. Named Entity Recognition as Dependency Parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020.
  4. Dozat, T.; Manning, C.D. Deep Biaffine Attention for Neural Dependency Parsing. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016.
  5. Yuan, Z.; Tan, C.; Huang, S.; Huang, F. Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition. In Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 22–27 May 2022; Association for Computational Linguistics: Dublin, Ireland, 2022.
  6. Straková, J.; Straka, M.; Hajič, J. Neural architectures for nested NER through linearization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 5326–5331.
  7. Xu, Y.; Yang, Z.; Zhang, L.; Zhou, D.; Wu, T.; Zhou, R. Focusing, bridging and prompting for few-shot nested named entity recognition. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; Association for Computational Linguistics: Toronto, ON, Canada, 2023; pp. 2621–2637.
  8. Yang, J.; Zhu, Z.; Ming, H.; Jiang, L.; An, N. LPNER: Label Prompt for Few-shot Nested Named Entity Recognition. In Proceedings of the 16th Asian Conference on Machine Learning, Hanoi, Vietnam, 11–14 November 2024.
  9. Kim, H.; Kim, J.-E.; Kim, H. Exploring Nested Named Entity Recognition with Large Language Models: Methods, Challenges, and Insights. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 15–20 November 2024.
  10. Akata, Z.; Perronnin, F.; Harchaoui, Z.; Schmid, C. Label-embedding for image classification. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 1425–1438.
  11. Cui, L.; Zhang, Y. Hierarchically-Refined Label Attention Network for Sequence Labeling. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019.
  12. Kim, H.; Kim, H. Integrated Model for Morphological Analysis and Named Entity Recognition Based on Label Attention Networks in Korean. Appl. Sci. 2020, 10, 3740.
  13. Li, J.; Chiu, B.; Feng, S.; Wang, H. Few-Shot Named Entity Recognition via Meta-Learning. IEEE Trans. Knowl. Data Eng. 2022, 34, 4245–4256.
  14. Wang, Y.; Chu, H.; Zhang, C.; Gao, J. Learning from Language Description: Low-shot Named Entity Recognition via Decomposed Framework. In Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 7–11 November 2021; Association for Computational Linguistics: Punta Cana, Dominican Republic, 2021; pp. 1618–1630.
  15. Ma, J.; Ballesteros, M.; Doss, S.; Anubhai, R.; Mallya, S.; Al-Onaizan, Y.; Roth, D. Label Semantics for Few Shot Named Entity Recognition. In Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 22–27 May 2022; Association for Computational Linguistics: Dublin, Ireland, 2022.
  16. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019.
  17. Luong, T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015.
  18. Kim, J.D.; Ohta, T.; Tateisi, Y.; Tsujii, J. GENIA corpus—A semantically annotated corpus for bio-textmining. Bioinformatics 2003, 19, i180–i182.
  19. Doddington, G.; Mitchell, A.; Przybocki, M.; Ramshaw, L.; Strassel, S.; Weischedel, R. The Automatic Content Extraction (ACE) program-tasks, data, and evaluation. In Proceedings of the 4th International Conference on Language Resources and Evaluation, Lisbon, Portugal, 26–28 May 2004; pp. 837–840.
  20. Walker, C.; Strassel, S.; Medero, J.; Maeda, K. ACE 2005 Multilingual Training Corpus; Linguistic Data Consortium: Philadelphia, PA, USA, 2006; Available online: https://catalog.ldc.upenn.edu/LDC2006T06 (accessed on 16 January 2025).
  21. Tjong Kim Sang, E.F.; De Meulder, F. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, Edmonton, AB, Canada, 31 May–1 June 2003; pp. 142–147.
  22. Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240.
  23. Rojas, M.; Bravo-Marquez, F.; Dunstan, J. Simple Yet Powerful: An Overlooked Architecture for Nested Named Entity Recognition. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 2108–2117.
Figure 1. Overview of the label description embedding (LDE) model.
Figure 2. Sentence F1-measure performance by model for the examples in Table 5.
Table 1. Performance comparison of NER models across different datasets. Cells are F1 (%) with standard deviation over five seeds; '-' indicates a setting not reported by the cited work.

GENIA (Nested)

| Model | 1-Shot | 5-Shot | 10-Shot | 20-Shot | 100% |
|---|---|---|---|---|---|
| FewNER [13] | - | 23.24 ± 0.73 | 29.19 ± 0.64 | - | - |
| FIT [7] | - | 34.43 ± 9.06 | 44.98 ± 3.38 | 51.26 ± 3.96 | - |
| LPNER [8] | - | 26.32 ± 3.88 | 44.99 ± 2.20 | - | - |
| BERT-LSTM-CRF (our) | 17.71 ± 8.33 | 17.91 ± 8.39 | 29.16 ± 1.79 | 42.12 ± 8.46 | 76.82 |
| Biaffine (our) | 5.75 ± 2.58 | 30.74 ± 2.95 | 31.93 ± 1.93 | 50.65 ± 2.39 | 78.20 |
| LDE (our) | 11.60 ± 2.51 | 45.07 ± 3.57 | 47.90 ± 2.27 | 61.46 ± 1.62 | 79.01 |

ACE 2004 (Nested)

| Model | 1-Shot | 5-Shot | 10-Shot | 20-Shot | 100% |
|---|---|---|---|---|---|
| FIT [7] | - | 35.87 ± 4.92 | 44.88 ± 4.82 | 53.92 ± 2.99 | - |
| LPNER [8] | - | 25.67 ± 7.05 | 42.67 ± 7.55 | - | - |
| BERT-LSTM-CRF (our) | 10.84 ± 7.17 | 18.94 ± 9.28 | 34.19 ± 4.30 | 49.70 ± 2.96 | 82.38 |
| Biaffine (our) | 3.52 ± 1.95 | 28.10 ± 1.91 | 47.25 ± 1.10 | 55.98 ± 0.88 | 85.77 |
| LDE (our) | 4.06 ± 3.04 | 42.23 ± 2.30 | 57.04 ± 1.19 | 62.86 ± 0.56 | 85.82 |

ACE 2005 (Nested)

| Model | 1-Shot | 5-Shot | 10-Shot | 20-Shot | 100% |
|---|---|---|---|---|---|
| FIT [7] | - | 37.74 ± 5.33 | 42.25 ± 10.65 | 52.71 ± 2.55 | - |
| LPNER [8] | - | 25.01 ± 10.83 | 46.62 ± 5.82 | - | - |
| BERT-LSTM-CRF (our) | 5.74 ± 6.49 | 30.80 ± 5.89 | 41.46 ± 1.49 | 51.88 ± 1.24 | 80.94 |
| Biaffine (our) | 3.17 ± 1.84 | 32.38 ± 2.96 | 49.50 ± 2.34 | 58.21 ± 0.97 | 83.95 |
| LDE (our) | 5.54 ± 3.27 | 44.24 ± 2.01 | 57.19 ± 1.27 | 64.50 ± 1.02 | 84.23 |

CoNLL 2003 English (Flat)

| Model | 1-Shot | 5-Shot | 10-Shot | 20-Shot | 100% |
|---|---|---|---|---|---|
| SpanNER [14] | - | 71.1 ± 0.4 | - | - | - |
| BERT-LSTM-CRF (our) | 26.55 ± 8.55 | 46.64 ± 1.67 | 49.74 ± 3.82 | 63.61 ± 1.85 | 89.86 |
| Biaffine (our) | 9.22 ± 9.30 | 36.95 ± 4.00 | 23.15 ± 6.85 | 50.82 ± 2.56 | 91.81 |
| LDE (our) | 6.61 ± 7.50 | 42.53 ± 7.04 | 45.05 ± 2.38 | 63.85 ± 1.85 | 92.06 |
Table 3. Transfer learning effects on ACE 2005 and GENIA dataset performance.

ACE 2005 (Nested)

| Model | Source Domain | 1-Shot | 5-Shot | 10-Shot | 20-Shot |
|---|---|---|---|---|---|
| LDE | - | 5.54 ± 3.27 | 44.24 ± 2.01 | 57.19 ± 1.27 | 64.50 ± 1.02 |
| LDE | GENIA | 14.28 ± 7.26 | 50.94 ± 1.59 | 59.92 ± 1.96 | 66.99 ± 0.75 |
| LDE | CoNLL 2003 English | 33.22 ± 7.33 | 58.06 ± 0.57 | 62.44 ± 1.21 | 68.13 ± 0.80 |

GENIA (Nested)

| Model | Source Domain | 1-Shot | 5-Shot | 10-Shot | 20-Shot |
|---|---|---|---|---|---|
| LDE | - | 11.60 ± 2.51 | 45.07 ± 3.57 | 47.90 ± 2.27 | 61.46 ± 1.62 |
| LDE | CoNLL 2003 English | 35.14 ± 0.72 | 49.98 ± 1.42 | 57.51 ± 0.63 | 64.75 ± 0.56 |
| LDE | ACE 2005 | 31.67 ± 1.95 | 48.52 ± 1.69 | 58.88 ± 1.56 | 65.35 ± 0.35 |
Table 4. Entity type-specific performance analysis in 5-shot ACE 2005.

ACE 2005 (Nested) 5-Shot

| Model | Standard | Flat | Nested | Nesting |
|---|---|---|---|---|
| In-context Learning (GPT-4) [9] | - | 34.75 | 38.29 | 6.63 |
| BERT-LSTM-CRF | 33.85 | 36.56 | 25.74 | 0.00 |
| Biaffine | 31.65 | 40.48 | 17.11 | 0.33 |
| LDEw Label Feature | 27.31 | 37.02 | 15.11 | 0.54 |
| LDEw Label Word | 39.92 | 47.59 | 27.69 | 3.51 |
| LDEw Label Description | 43.98 | 54.48 | 28.31 | 2.62 |
Table 5. Sample cases showing entity recognition results across different model variants for the ACE 2005 dataset. ## symbols indicate token separation marks from BERT's WordPiece tokenizer; an empty result means the model predicted no entities.

Case 1
Sentence: [CLS] We have been so damned busy with the holidays (that's what we call December at our house) that I just haven't had time. [SEP]
Gold: (1, 1, 'We', 'PER') (15, 15, 'we', 'PER') (19, 19, 'our', 'PER') (19, 20, 'our house', 'FAC') (23, 23, 'I', 'PER')
Biaffine: (1, 1, 'We', 'ORG') (15, 15, 'we', 'PER') (15, 23, 'we call December at our house) that I', 'PER') (23, 23, 'I', 'PER')
LDEw Label Feature: (1, 1, 'We', 'PER') (1, 15, "We have been so damned busy with the holidays (that ' s what we", 'PER') (1, 23, "We have been so damned busy with the holidays (that ' s what we call December at our house) that I", 'PER') (15, 15, 'we', 'PER') (15, 23, 'we call December at our house) that I', 'PER') (23, 23, 'I', 'PER')
LDEw Label Word: (1, 1, 'We', 'ORG') (15, 15, 'we', 'PER')
LDEw Label Description: (1, 1, 'We', 'ORG') (15, 15, 'we', 'PER') (19, 19, 'our', 'GPE') (23, 23, 'I', 'PER')

Case 2
Sentence: [CLS] He found the lane to the farm and drove up into the farm ##yard, where he was met by the farmer [SEP]
Gold: (1, 1, 'He', 'PER') (3, 7, 'the lane to the farm', 'FAC') (6, 7, 'the farm', 'FAC') (12, 22, 'the farm ##yard, where he was met by the farmer', 'FAC') (16, 16, 'where', 'FAC') (17, 17, 'he', 'PER') (21, 22, 'the farmer', 'PER')
Biaffine: (1, 1, 'He', 'PER') (16, 16, 'where', 'LOC') (17, 17, 'he', 'PER')
LDEw Label Feature: (1, 1, 'He', 'PER') (1, 17, 'He found the lane to the farm and drove up into the farm ##yard, where he', 'PER')
LDEw Label Word: (1, 1, 'He', 'PER') (16, 16, 'where', 'LOC') (17, 17, 'he', 'PER') (21, 22, 'the farmer', 'PER')
LDEw Label Description: (1, 1, 'He', 'PER') (3, 7, 'the lane to the farm', 'FAC') (6, 7, 'the farm', 'LOC') (16, 16, 'where', 'LOC') (17, 17, 'he', 'PER') (21, 22, 'the farmer', 'PER')

Case 3
Sentence: [CLS] You see that barge down there on the river ? [SEP]
Gold: (1, 1, 'You', 'PER') (3, 9, 'that barge down there on the river', 'VEH') (8, 9, 'the river', 'LOC')
Biaffine: (6, 6, 'there', 'LOC')
LDEw Label Feature: (no entities recognized)
LDEw Label Word: (no entities recognized)
LDEw Label Description: (3, 9, 'that barge down there on the river', 'FAC') (6, 6, 'there', 'LOC') (8, 9, 'the river', 'LOC')

Case 4
Sentence: [CLS] All ##egation ##s have come to light that several OS ##U players received illegal benefits including cash, access to cars, etc. [SEP]
Gold: (9, 12, 'several OS ##U players', 'PER') (10, 11, 'OS ##U', 'ORG') (21, 21, 'cars', 'VEH')
Biaffine: (no entities recognized)
LDEw Label Feature: (9, 11, 'several OS ##U', 'PER') (9, 12, 'several OS ##U players', 'PER')
LDEw Label Word: (9, 12, 'several OS ##U players', 'PER')
LDEw Label Description: (9, 12, 'several OS ##U players', 'PER') (10, 11, 'OS ##U', 'ORG')