Article

Entity Span Suffix Classification for Nested Chinese Named Entity Recognition

1 School of Electrical Engineering, Guangzhou Railway Polytechnic, Guangzhou 511300, China
2 School of Biomedical Engineering, Guangdong Medical University, Dongguan 523109, China
* Author to whom correspondence should be addressed.
Information 2025, 16(10), 822; https://doi.org/10.3390/info16100822
Submission received: 18 August 2025 / Revised: 11 September 2025 / Accepted: 18 September 2025 / Published: 23 September 2025
(This article belongs to the Section Artificial Intelligence)

Abstract

Named entity recognition (NER) is one of the fundamental tasks in building knowledge graphs. In some domain-specific corpora, the text descriptions exhibit limited standardization and some entity structures are nested. Existing entity recognition methods suffer from problems such as noise introduced by incorrect word matching and difficulty in distinguishing different entity labels for the same character during sequence label prediction. This paper proposes a span-based feature reuse stacked bidirectional long short-term memory (BiLSTM) nested named entity recognition (SFRSN) model, which transforms sequence-prediction entity recognition into the problem of entity span suffix category classification. First, character feature embeddings are generated through bidirectional encoder representations from transformers (BERT). Second, a feature reuse stacked BiLSTM is proposed to obtain deep context features while alleviating the problem of deep network degradation. Third, span features are obtained through a dilated convolutional neural network (DCNN), and a single-tail selection function is introduced to obtain the classification features of entity span suffixes, with the aim of reducing training parameters. Fourth, a global feature gated attention mechanism is proposed, integrating span features and span suffix classification features to achieve span suffix classification. The experimental results on four Chinese domain-specific datasets demonstrate the effectiveness of our approach: SFRSN achieves micro-F1 scores of 83.34% on Ontonotes, 73.27% on Weibo, 96.90% on resume, and 86.77% on the supply chain management dataset. This represents a maximum improvement of 1.55%, 4.94%, 2.48%, and 3.47% over state-of-the-art baselines, respectively. These results demonstrate the effectiveness of the model in addressing nested entities and entity label ambiguity.


1. Introduction

Named entity recognition (NER) is one of the fundamental tasks in building knowledge graphs [1,2], mainly focusing on identifying named entities within unstructured text. Traditional NER tasks mainly recognize entities such as personal names, locations, and organizations, which involve large-scale training datasets [3]. With the development of deep learning technology, NER has transcended the boundaries of traditional general domains and demonstrated corresponding application value in multiple specific fields, such as social media [4], manufacturing [5], etc. Taking manufacturing as an example, performing NER in supply chain management case texts enables the effective identification of key information, including management domains, issues, and solutions covered in these texts. These key pieces of information can not only significantly reduce users’ information retrieval costs and enhance supply chain management capabilities, but also promote the rapid development of management knowledge graph construction and intelligent management optimization [6].
NER research has evolved from traditional rule-based and dictionary-based methods to machine learning approaches, and has advanced further with deep learning techniques [7]. Because deep learning methods eliminate the need to manually design features or rules, they have become one of the main approaches to NER. Recently, the emergence of large-scale pre-trained language models, such as bidirectional encoder representations from transformers (BERT) [8] and ELMo [9], combined with deep learning techniques, has further enhanced the recognition performance of NER tasks [10].
Compared to unstructured text in general domains, domain-specific corpora typically exhibit variable text lengths and complex nested relationships among entities. For example, the Weibo dataset is characterized by a high degree of colloquial language, short sentences, and insufficient context information [11]. Manufacturing supply chain management cases, by contrast, comprehensively record knowledge of supply chain business process management and are long texts with relatively complete context, but there are complex nested relationships among entities, meaning that one entity may contain other entities within its structure. For example, “supplier management” and “supplier” are both entities, while “supplier management analytic hierarchy process” is a larger nested entity, and each may belong to a different category. These challenges make NER tasks in specific domains more complex. Some existing research enhances character-level feature representations through potential word matching; however, incorrect potential words easily interfere with entity label prediction, and lexical databases typically do not cover all potential words in specific fields [12]. Other work improves neural network structures, mainly by increasing network depth, which yields more abstract text features and helps recognize entities in sentences of limited length. However, deep neural networks are prone to network degradation [13], which harms the generalization ability of the model. When dealing with nested or complex entities, recognizing entities through sequence labeling may fail to recognize or distinguish hierarchical entities. For example, among supply chain management entities, the same character may carry different entity label categories, and the sequence labeling method may suffer from error propagation.
Motivated by the above observations, this paper proposes the span-based feature reuse stacked bidirectional long short-term memory (BiLSTM) nested named entity recognition (SFRSN) model, which transforms the sequence labeling mode into a span-based entity suffix category classification mode. First, the BERT pre-trained model is used to generate character feature representations. Second, a feature reuse stacked BiLSTM is proposed to obtain context features. Third, this paper constructs a character adjacency matrix based on entity boundaries, a dilated convolutional neural network (DCNN) is used to obtain the span representation features of character pairs from the context features, and a single-tail selection function is introduced to obtain the entity suffix category classification features between characters. On this basis, a global feature gated attention module is proposed to fuse the two features. Finally, the optimal entity label prediction result of the span classification is obtained. In summary, the contributions of this paper are as follows:
  • We propose a new Chinese NER model for domain-specific entity recognition, in which a feature reuse stacked BiLSTM is introduced. The feature reuse stacked BiLSTM involves shallow features in the learning process of the deep model, alleviating the problem of deep network degradation while still acquiring abstract semantic features.
  • We propose a single-tail selection function to predict entity suffix categories, which transforms entity label sequence prediction into label suffix classification. This method enables the model to recognize nested entities and complex entities with identical characters corresponding to different entity categories. Additionally, a global feature gated attention mechanism is proposed to fuse the span representation feature of character pairs and the entity suffix category classification feature of character pairs in the form of a weight distribution.
  • This paper evaluates the performance of the SFRSN model on Chinese NER datasets from four different fields, including social media, news, finance, and manufacturing supply chain management. Experimental results show that SFRSN outperforms recent state-of-the-art methods.

2. Related Work

At present, research on NER for specific fields mainly focuses on optimizing character feature representations and improving neural network structures. This work forms the research basis of SFRSN and is therefore described in detail.

2.1. Character Feature Representation Optimization

Due to the semantic complexity of the Chinese language and the lack of large-scale labeled samples in specific fields, it is difficult for NER models to obtain sufficient semantic features during training [14]. Therefore, some research works enrich character features by matching lexical, glyph structure, and radical features from external data sources to improve recognition performance. Wu et al. [15] proposed a multi-metadata embedding-based cross-transformer (MECT) model for NER, which integrates multi-metadata embeddings into a bidirectional transformer architecture. This approach combines Chinese character features with radical-level embeddings to better capture the semantic information of Chinese characters. Zhao et al. [16] proposed a multi-granularity contrastive learning framework (MCL) that optimizes inter-granularity distribution distances in lexicons, emphasizing the important key matching words in the dictionary. By combining cross-granularity contrastive learning, it can effectively utilize lexical information and improve recognition performance. Wang et al. [17] proposed an interactive fusion technique that utilizes graph attention networks to fuse character and vocabulary information and connects character and word features to achieve secondary fusion. Its experimental results on multiple datasets are superior to those of other models that fuse vocabulary information. Meng et al. [18] proposed a glyph feature-based NER method for medical entity recognition. This approach enhances model accuracy and generalization capability by fusing glyph vectors into text representations and leveraging negative samples from the dataset. Du et al. [19] proposed the multi-medical entity recognition method MF-MNER, which uses a bidirectional auto-regressive transformer model to dynamically fuse semantic character representations, requiring only a small number of labeled samples and performing well in terms of recall.
In addition to word, radical, and glyph fusion methods, some research works generate character feature vectors through natural language processing pre-trained models, such as RoBERTa [20], ERNIE [21], and ALBERT [22]. The character features generated by these models can be adjusted according to the actual character context, enabling them to learn character vector representations and capture the corresponding semantic and syntactic information of sentences. They have achieved excellent results in the task of entity recognition.

2.2. Neural Network Architecture Enhancement

In recent years, researchers have enhanced the generalization ability of models for flat or nested entity recognition by improving the structure of neural networks. Qi et al. [23] proposed a few-shot NER model based on fine-grained prototype networks. This method dynamically constructs non-entity prototypes to capture negative sample characteristics and designs an inconsistency metric module to minimize intra-class variation between entities and non-entities. Deng et al. [24] proposed the binocular attention-based stacked BiLSTM with a CNN (BACSBN) model, which utilizes the binocular attention mechanism to focus on potential word information and the key information of entities of different lengths. Its experimental results are superior to those of previous advanced models. Sun et al. [25] proposed the global span semantic dependency awareness and filtering network (GSSDAF) model, which generates a span matrix through multi-head biaffine attention, introduces a global span dependency awareness (GSDA) module to capture global semantic dependencies, designs a local span dependency enhancement (LSDE) module to enhance local dependencies, and implements nested NER classification through binocular biaffine decoding.
To address the issue where identical characters with different entity labels interfere with sequence prediction, some research works have transformed sequence label prediction into an entity boundary classification problem. Li et al. [26] proposed the W2NER model, which transforms the sequence labeling entity recognition task into a character-pair relationship classification task. The model captures relationships between characters through a character-pair grid representation, thereby enhancing its ability to recognize discontinuous entities. Chen et al. [27] proposed a unified global feature-aware framework (GFNER) for NER, which introduces a global feature learning module that focuses on obtaining important global relationships for entity boundaries, capturing the associations between entities and enhancing flat and nested NER tasks. Li et al. [28] proposed a multi-level semantic enhancement method based on a self-distillation BERT framework (MSE), which optimizes complex entity training through data augmentation, designs a boundary smoothing module to enhance boundary robustness, and adopts distillation reweighting to balance entity and context knowledge, significantly improving recognition performance. Mo et al. [29] proposed a multi-task transformer framework that integrates entity boundary detection tasks through labeled relation classification and optimizes type mapping using prior distributions from external knowledge bases. The experiments demonstrated its effectiveness for flat, nested, and discontinuous NER tasks.
In summary, the above research mainly focuses on enriching character feature representations by matching word, glyph, radical, and other types of information, using improved neural network structures to enhance the generalization ability of the model, or constructing spans based on entity boundaries to solve nested NER problems. The work reported in this paper builds on these studies. This paper introduces a feature reuse stacked BiLSTM, in which the output of each BiLSTM layer is concatenated with the original features as the input of the next BiLSTM layer, rather than using simple feature stacking. The feature reuse stacked BiLSTM is combined with a DCNN and a single-tail selection function to transform the problem of entity label sequence prediction into the problem of entity span category classification. The global feature gated attention mechanism further enhances entity recognition performance by selecting, through weight assignment, the features that are more beneficial for entity span suffix category classification.

3. Methods

The structure of SFRSN is shown in Figure 1; it consists of five parts: the character embedding layer, the feature reuse stacked BiLSTM layer, the convolution layer, the single-tail selection function layer, and the global feature gated attention mechanism layer. In the character embedding layer, the character embedding vectors are generated by the BERT pre-trained model. The feature reuse stacked BiLSTM layer concatenates the output features of each BiLSTM layer with the original features as the input of the next BiLSTM layer, thereby obtaining deep context features of the text. The convolution layer is used to obtain spatial relations and structural patterns, such as nested and overlapping entity spans, in the sentence span matrix. The single-tail selection function layer is used to obtain all possible entity span suffix category classification features, and the global feature gated attention mechanism layer combines the entity span suffix category classification features and the span structure representation through weight selection, ultimately achieving entity span suffix category classification.

3.1. Character Embedding Layer

This paper uses a BERT pre-trained model to generate character feature vectors. BERT provides 12 hidden layer outputs. Unlike the traditional practice of using only the final layer, this paper extracts the hidden representations of the last four layers of BERT and stacks and averages them to generate more accurate character features. Given a sentence, the character feature sequence output by BERT is represented as $x = (x_1, x_2, \ldots, x_n)$, where $n$ is the length of the sentence.
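As a minimal illustration of this embedding step (assuming PyTorch and the Hugging Face transformers library with the bert-base-chinese checkpoint, neither of which is specified in the paper), the last-four-layer averaging could be implemented as follows:

```python
# Hedged sketch: average the last four BERT hidden layers to form character embeddings.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese", output_hidden_states=True)

sentence = "我爱北京天安门"  # illustrative input sentence
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

# outputs.hidden_states is a tuple: embedding layer + 12 encoder layers
last_four = torch.stack(outputs.hidden_states[-4:], dim=0)  # (4, batch, seq_len, 768)
char_embeddings = last_four.mean(dim=0)                      # (batch, seq_len, 768)
```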

3.2. Feature Reuse Stacked BiLSTM Layer

Unlike a traditional stacked BiLSTM, this paper improves the stacking scheme. Given the input sequence $x$ generated by BERT, for the character feature vector $x_t$ at time step $t$, the output $h_t^1$ of the first BiLSTM layer is the concatenation of the forward and backward LSTM hidden states. The calculation of $h_t^1$ is shown in Formula (1).

$$h_t^1 = [\overrightarrow{\mathrm{LSTM}}(x_t); \overleftarrow{\mathrm{LSTM}}(x_t)], \quad t \in [1, n]$$

Then, the output of the first layer is concatenated with the original input feature vector $x_t$ to achieve feature reuse for the first BiLSTM layer, forming the enhanced feature $\tilde{h}_t^1$ that serves as the input of the second BiLSTM layer. The feature reuse formula of the first layer is shown in Formula (2).

$$\tilde{h}_t^1 = [x_t, h_t^1]$$

By analogy, starting from the second BiLSTM layer, the input of each layer is the enhanced feature produced by reusing the features of the previous layer. Taking layer $L$ ($L > 1$) as an example, the output of the $L$-th BiLSTM layer is given by Formula (3).

$$h_t^L = [\overrightarrow{\mathrm{LSTM}}^L(\tilde{h}_t^{L-1}); \overleftarrow{\mathrm{LSTM}}^L(\tilde{h}_t^{L-1})], \quad t \in [1, n], \; L > 1$$

The output of this layer is concatenated with the reused feature of the previous layer to form the reused feature of layer $L$, as shown in Formula (4).

$$\tilde{h}_t^L = [\tilde{h}_t^{L-1}, h_t^L], \quad L > 1$$

After stacking $L$ layers, the final context feature sequence of the model is given by Formula (5).

$$H^L = (\tilde{h}_1^L, \tilde{h}_2^L, \ldots, \tilde{h}_n^L)$$
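The following is a minimal PyTorch sketch of the feature reuse stacking described by Formulas (1)-(5); the class and variable names are illustrative assumptions, not the authors' released code:

```python
# Hedged sketch: each BiLSTM layer's output is concatenated with its own input
# before being passed to the next layer (feature reuse), per Formulas (2) and (4).
import torch
import torch.nn as nn

class FeatureReuseStackedBiLSTM(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = input_dim
        for _ in range(num_layers):
            self.layers.append(
                nn.LSTM(in_dim, hidden_dim, batch_first=True, bidirectional=True)
            )
            in_dim = in_dim + 2 * hidden_dim  # next layer sees [input; BiLSTM output]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x  # (batch, seq_len, input_dim), e.g., BERT character embeddings
        for lstm in self.layers:
            out, _ = lstm(h)                 # h_t^L, shape (batch, seq_len, 2 * hidden_dim)
            h = torch.cat([h, out], dim=-1)  # reused feature of this layer
        return h                             # final context feature sequence H^L
```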

3.3. Convolution Layer

Because the main objective of the entity recognition task in this paper is to predict the entity label suffix between character pairs, it is necessary to generate high-quality character-pair span representations. In traditional entity label sequence labeling, one character corresponds to a complete label, such as B-PER or I-LOC. Entity labels consist of two parts: entity boundary identifiers, such as B (Begin) and I (Inside), and category identifiers, such as PER (Person) and LOC (Location). In this paper, category identifiers are defined as entity suffixes, and non-entity characters, whose labels are O, are mapped to a special suffix category N. Therefore, this paper adds three columns to the annotation content of the dataset, as shown in Figure 2a. In Figure 2a, the character position represents the position of the character in the sentence, and the entity label includes the entity boundary and the entity suffix category. Within an entity, the entity category of the entity head character is marked with the corresponding entity suffix, and the position of the corresponding tail character is marked with the position of the entity tail character. The entity category of the other characters is marked as [‘N’], and their tail character position is set to their own position in the sentence. For a nested entity, all entity suffixes are marked on the entity category of the head character of the nested entity, and the positions of the corresponding tail characters are marked.
Figure 2b shows the span representation matrix corresponding to the sample in Figure 2a. In the sample shown in Figure 2a, the text “I love Beijing tian’anmen, the scenery in Beijing is really beautiful” has a sentence length of 15. In this paper, a 15 × 15 span representation matrix is constructed, and each element (i, j) in the matrix represents the span formed from the i-th character to the j-th character in the sentence. When mapping entity labels, if an entity starts with character i and ends with character j, the element (i, j) in the matrix is labeled with the suffix category of the entity. The sentence sample includes the entities “Beijing tian’anmen” and “Beijing”. In the entity “Beijing tian’anmen”, the connection between “bei” and “men” carries the entity category, and their positions in the sentence are 2 and 6, respectively; therefore, the “POI” entity suffix is marked at position (2, 6) in the representation matrix. Similarly, “bei” and “jing” in “Beijing” are marked with the “LOC” entity suffix at position (8, 9) in the representation matrix. Additionally, if a character has no entity category relation with any other character, only its self-connection position is assigned the entity suffix category N. In the case of a nested entity, if the nested entity also starts with character i but ends with character k, where k is not equal to j, the element (i, k) in the matrix is labeled with the suffix category of that entity. Characters inside an entity (i.e., non-head and non-tail characters) form a span with their own position (i, i), and their category is marked as N. The spans corresponding to all character pairs (i, j) that do not form any entity are also marked with the category N. In this way, the entity recognition task is transformed into a classification task that predicts the corresponding entity suffix category for each element (i, j) of the span matrix.
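As a hedged illustration of this labeling scheme (the function name, the zero-based offsets, and the category strings below are illustrative and not taken from the released annotations), the span suffix label matrix of Figure 2b could be constructed as follows:

```python
# Hedged sketch: build the span suffix label matrix for one sentence.
def build_span_label_matrix(sent_len, entities, none_label="N"):
    """entities: list of (head_idx, tail_idx, suffix_category), indices inclusive."""
    matrix = [[none_label] * sent_len for _ in range(sent_len)]
    for head, tail, suffix in entities:
        matrix[head][tail] = suffix  # span (head, tail) carries the entity suffix
    return matrix

# 15-character example with "Beijing tian'anmen" (POI) and "Beijing" (LOC)
labels = build_span_label_matrix(15, [(2, 6, "POI"), (8, 9, "LOC")])
```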
Based on the span representation matrix and referring to the architecture of the convolution layer of the W2NER model in [26], the convolution layer in this paper includes conditional layer normalization, span representation construction, and the DCNN. In conditional layer normalization, the normalized gain parameter $\gamma_{ij}$ for the character pair $(i, j)$, conditioned on character $i$, is first calculated, as shown in Formula (6).

$$\gamma_{ij} = W_1 \tilde{h}_i^L + b_1$$

Among them, $\tilde{h}_i^L$ is the context feature of character $i$ in the sequence $H^L$ output by the feature reuse stacked BiLSTM in Section 3.2, and $W_1$ and $b_1$ are a training parameter matrix and a bias, respectively. Then, the normalization bias $\lambda_{ij}$ is calculated as shown in Formula (7), where $W_2$ and $b_2$ are a training parameter matrix and a bias, respectively.

$$\lambda_{ij} = W_2 \tilde{h}_i^L + b_2$$

Finally, the normalized representation feature $V_{ij}$ corresponding to character $i$ and character $j$ is calculated as shown in Formula (8).

$$V_{ij} = \gamma_{ij} \odot \frac{\tilde{h}_j^L - \mu_j}{\sigma_j} + \lambda_{ij}, \qquad \mu_j = \frac{1}{d_h} \sum_{k=1}^{d_h} \tilde{h}_{jk}^L, \qquad \sigma_j = \sqrt{\frac{1}{d_h} \sum_{k=1}^{d_h} \left( \tilde{h}_{jk}^L - \mu_j \right)^2}$$

Among them, $d_h$ is the dimension of the context feature vector, $\tilde{h}_{jk}^L$ is the $k$-th dimension of $\tilde{h}_j^L$, and $\mu_j$ and $\sigma_j$ are the mean and standard deviation of $\tilde{h}_j^L$, respectively.
The gain and bias of conditional layer normalization are dynamically generated parameters based on character i, which can adjust the entity suffix category according to specific characters, enhancing the model’s ability to express context structures.
In span representation construction, the relative position information between each pair of characters is mapped to an embedding $E^d$, and the regional information of the lower-triangle and upper-triangle regions in the character adjacency span matrix is mapped to an embedding $E^r$. The two vectors are concatenated with the character pair representation feature $V$ to obtain the span representation feature carrying the relative position and regional information of character pairs, denoted as $V' = [V, E^d, E^r]$.
In the character adjacency span matrix, there are inclusion and intersection relationships among nested entities. Therefore, a multi-scale DCNN is used to capture the spatial structural relationships of the span matrix. Given the span representation feature $V'$, let the $j$-th dilated convolution with dilation rate $\theta$ be denoted as $D_j^{\theta}(\cdot)$; dilation rates $\theta = 1, 2, 3$ and a 3 × 3 convolution kernel are used. Among them, $\theta = 1$ is the standard convolution, which captures locally continuous grid features and helps identify short entities. When $\theta = 2$, the receptive field is expanded to capture grid features separated by one character, which helps identify medium-length entities. When $\theta = 3$, the largest receptive field is obtained, which helps identify long-distance entities. Additionally, the 3 × 3 convolution kernel is a standard trade-off between capturing sufficient spatial context and maintaining computational efficiency. The first DCNN layer $D_1^1(\cdot)$ is applied to the input $V'$, and the first dilated convolution result $F_1$ is obtained through the GeLU activation function, as shown in Formula (9).

$$F_1 = \mathrm{GeLU}(D_1^1(V'))$$

$F_1$ is then fed into the second dilated convolution, as shown in Formula (10).

$$F_2 = \mathrm{GeLU}(D_2^2(F_1))$$

Repeating the above calculation yields the third dilated convolution result $F_3$. The DCNN outputs with different dilation rates are concatenated to obtain the final span representation feature, as shown in Formula (11).

$$F = [F_1, F_2, F_3]$$
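A compact PyTorch sketch of this multi-scale dilated convolution (channel sizes and module names are assumptions for illustration) is shown below:

```python
# Hedged sketch: three 3x3 convolutions with dilation rates 1, 2, and 3 applied in
# sequence, each followed by GeLU, with their outputs concatenated (Formulas (9)-(11)).
import torch
import torch.nn as nn

class MultiDilationDCNN(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(in_channels if d == 1 else out_channels, out_channels,
                      kernel_size=3, dilation=d, padding=d)  # padding=d keeps the n x n size
            for d in (1, 2, 3)
        ])
        self.act = nn.GELU()

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (batch, in_channels, n, n) span representation grid V'
        outputs, x = [], v
        for conv in self.convs:
            x = self.act(conv(x))   # F1, F2, F3
            outputs.append(x)
        return torch.cat(outputs, dim=1)  # F = [F1, F2, F3] along the channel axis
```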

3.4. Single-Tail Selection Function Layer

Although the character pair span representation feature $F$ output by the convolution layer can be used to predict entity suffix categories between character pairs, the study in [26] indicates that introducing a biaffine predictor in combination with the character pair span representation feature can enhance the prediction effect. In order to reduce training parameters, this paper proposes a single-tail selection function, which improves on the biaffine predictor mainly by reducing the linear transformation matrices. The idea of single-tail selection is to consider every character in a sentence as potentially having an entity suffix category relation with other characters. The single-tail selection function acquires entity suffix category classification features between character pairs and ultimately converts these features into output probability values. To avoid information redundancy, the single-tail selection function only determines the entity suffix category between the first and last characters of an entity. The last character of an entity is called the tail, as illustrated by the entity “Beijing” in Figure 2a: the single-tail selection function determines the entity suffix category between the characters “bei” and “jing”. Therefore, the goal of the single-tail selection function is to identify the entity tail character corresponding to each entity head character and the entity suffix category that exists between them.
Given the character context feature sequence $H^L$, the representations $s_i$ and $o_j$ of the span head character context feature $\tilde{h}_i^L$ and the span tail character context feature $\tilde{h}_j^L$ are calculated using two linear functions, respectively, as shown in Formulas (12) and (13).

$$s_i = w_f \tilde{h}_i^L$$

$$o_j = w_b \tilde{h}_j^L$$

Then, the entity suffix category classification feature between the span head and span tail characters is calculated using the broadcasting function, as shown in Formula (14).

$$y(s_i, o_j) = W_v \left( (s_i \oplus o_j) + b \right) + b_s$$

Among them, $w_f$, $w_b$, and $W_v$ are training parameters; $b$ and $b_s$ are biases; and $\oplus$ is the broadcasting function. The broadcasting function expands $s_i \in \mathbb{R}^{b \times s \times h}$ with an additional dimension to obtain $s_i \in \mathbb{R}^{b \times s \times 1 \times h}$ and expands $o_j \in \mathbb{R}^{b \times s \times h}$ to obtain $o_j \in \mathbb{R}^{b \times 1 \times s \times h}$. Element-wise addition through the broadcasting mechanism, in which $s_i$ is broadcast along the tail-character dimension and $o_j$ is broadcast along the head-character dimension, generates a four-dimensional feature $y(s_i, o_j)$. Finally, the output is adjusted to $y(s_i, o_j) \in \mathbb{R}^{b \times s \times s \times h}$ through dimension transformation, forming the classification feature of each suffix category in the predefined set of entity suffix categories for the character pair $c_i$ and $c_j$.
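A minimal sketch of the single-tail selection function (with the bias terms $b$ and $b_s$ of Formula (14) folded into the linear layers, and with assumed tensor shapes) is as follows:

```python
# Hedged sketch: head/tail projections combined by broadcast addition over all
# character pairs, yielding a (batch, seq, seq, num_categories) score tensor.
import torch
import torch.nn as nn

class SingleTailSelection(nn.Module):
    def __init__(self, context_dim: int, hidden_dim: int, num_categories: int):
        super().__init__()
        self.head_proj = nn.Linear(context_dim, hidden_dim, bias=False)  # s_i = w_f h_i
        self.tail_proj = nn.Linear(context_dim, hidden_dim, bias=False)  # o_j = w_b h_j
        self.classifier = nn.Linear(hidden_dim, num_categories)          # W_v (+ biases)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, context_dim) context features from the stacked BiLSTM
        s = self.head_proj(h).unsqueeze(2)   # (batch, seq, 1, hidden)
        o = self.tail_proj(h).unsqueeze(1)   # (batch, 1, seq, hidden)
        pair = s + o                         # broadcast addition over all (i, j) pairs
        return self.classifier(pair)         # (batch, seq, seq, num_categories)
```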

3.5. Global Feature Gated Attention Mechanism Layer

After obtaining the classification feature of the entity suffix category for character pairs and the final representation of the character pair span feature, a further difficulty remains: some entities in specific fields contain the same characters but belong to different entity categories. For example, in “Peking University” and “Beijing”, the character “bei” belongs to the “organization” category and the “place” category, respectively, so similar entity spans correspond to different entity suffixes. Simple feature addition may therefore fail to adapt to such context differences. In response to the different degrees to which each sample is affected by context differences, this paper introduces a global feature gated attention mechanism. Through joint training of the model, the classification feature of the entity suffix category for character pairs and the final representation feature of the character pair span are adaptively weighted, and their weighted combination further improves the classification accuracy of entity suffix categories.
The calculation process of the global feature gated attention mechanism is as follows. First, the final representation of the character pair span feature is subjected to dimensionality reduction, with the aim of obtaining, from this span feature, a classification feature for each suffix category in the predefined entity suffix category set for the character pair $c_i$ and $c_j$. The dimension reduction formula is as follows:

$$y'_{ij} = W_v \left( \mathrm{GeLU}(W F_{ij} + b) \right) + b_v$$

Among them, $W$ and $W_v$ are trainable parameter matrices, $b$ and $b_v$ are biases, and $F_{ij}$ is the final representation of the span feature of the character pair $c_i$ and $c_j$.

Let the classification feature of the entity suffix category of character pairs (from the single-tail selection function) be matrix $A$, and let the classification feature obtained from the final representation of the character pair span feature be matrix $B$. Taking matrix $A$ as an example, the average value over the matrix is first calculated to obtain the global average feature, the main purpose of which is to preserve the overall features of the sample and suppress noise interference. Then, the maximum value is taken along the span-head character direction of the matrix and the average value along the span-tail character direction; this captures salient features in the span-head direction, which can be regarded as the key information of the entity’s span-head characters. Similarly, the maximum value is taken along the span-tail character direction and the average value along the span-head character direction, with the aim of capturing salient features in the span-tail direction, which can be regarded as the key information of the entity’s span-tail characters. The three types of features are concatenated to form the combined global feature, denoted as $Y_A$. The same operations are applied to matrix $B$ to obtain the corresponding combined global feature, denoted as $Y_B$. The two features are concatenated and reduced in dimension to obtain the weight of matrix $A$, denoted as $\alpha_1$, and the weight of matrix $B$, denoted as $\alpha_2$, with the constraint $\alpha_1 + \alpha_2 = 1$. The calculation of $\alpha_1$ is shown in Formula (16), and the feature fusion is shown in Formula (17).

$$\alpha_1 = w_1 [Y_A, Y_B] + b_1$$

$$z = \alpha_1 y + \alpha_2 y'$$

Among them, $w_1$ and $b_1$ are the weight matrix and bias term, and $z$ is the predicted feature of the entity suffix category obtained by combining the two types of character pair features. These weights adaptively determine which type of feature (semantic relationship or structural feature) the final classification relies upon more. For example, when classifying a long entity, the head and tail characters themselves may not be directly related, but the DCNN can capture long-distance entity span features, so the weight $\alpha_2$ will be relatively high. When resolving cases in which the same character belongs to different entity categories, the gating mechanism can dynamically adjust the weights based on the context structure or semantics.
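The gating described above could be sketched as follows (a sigmoid is used here to keep $\alpha_1$ in $[0, 1]$ and enforce $\alpha_1 + \alpha_2 = 1$, which the paper states only as a constraint; shapes and names are assumptions):

```python
# Hedged sketch: pooled global summaries of the two score tensors predict a scalar
# gate that weights their combination, z = alpha1 * y + alpha2 * y'.
import torch
import torch.nn as nn

class GlobalFeatureGate(nn.Module):
    def __init__(self, num_categories: int):
        super().__init__()
        # three pooled summaries per tensor, concatenated over both tensors
        self.gate = nn.Linear(6 * num_categories, 1)

    @staticmethod
    def _summarize(t: torch.Tensor) -> torch.Tensor:
        # t: (batch, seq, seq, categories)
        global_avg = t.mean(dim=(1, 2))              # overall (noise-suppressed) feature
        head_key = t.max(dim=1).values.mean(dim=1)   # max over head direction, mean over tail
        tail_key = t.max(dim=2).values.mean(dim=1)   # max over tail direction, mean over head
        return torch.cat([global_avg, head_key, tail_key], dim=-1)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        summary = torch.cat([self._summarize(a), self._summarize(b)], dim=-1)
        alpha1 = torch.sigmoid(self.gate(summary)).view(-1, 1, 1, 1)
        return alpha1 * a + (1.0 - alpha1) * b       # fused prediction feature z
```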
The softmax function is used to calculate the prediction probability $p_{ijk}$ of the character pair $c_i$ and $c_j$ for each entity suffix category $k$.
During training, this study uses cross entropy as the model loss function. The calculation formula of the loss function is as follows:

$$L = - \sum_{1 \le i \le n} \sum_{1 \le j \le n} \sum_{1 \le k \le m} q_{ijk} \log(p_{ijk})$$

Among them, $n$ is the number of characters in the sentence, $m$ is the number of entity suffix categories, and $q_{ijk}$ represents the probability of marking the entity suffix category $k$ between characters $c_i$ and $c_j$ in Figure 2b. When performing entity label suffix classification, the input sentence is processed through the BERT and feature reuse stacked BiLSTM modules to obtain the final classification score matrix $z$, where $z(i, j, k)$ represents the score of the character pair $(i, j)$ belonging to entity suffix category $k$. This study first applies the softmax function along the category dimension to obtain the probability matrix $P$. The probability matrix $P$ is then traversed, and all spans whose highest-probability category is not $N$ are extracted as entity outputs. If overlapping or nested spans occur, the one with the highest predicted probability score is retained for output, while the rest are filtered out.
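A hedged sketch of this decoding procedure for a single sentence (variable names are illustrative) follows:

```python
# Hedged sketch: softmax over categories, keep the best non-"N" category per span,
# then resolve overlapping or nested spans by keeping the higher-probability one.
import torch

def decode_spans(z: torch.Tensor, id2label: dict, none_id: int):
    """z: (seq_len, seq_len, num_categories) score tensor for one sentence."""
    probs = torch.softmax(z, dim=-1)
    seq_len = probs.size(0)
    candidates = []
    for i in range(seq_len):
        for j in range(i, seq_len):                    # spans with head <= tail
            k = int(probs[i, j].argmax())
            if k != none_id:
                candidates.append((float(probs[i, j, k]), i, j, id2label[k]))
    candidates.sort(reverse=True)                      # highest probability first
    selected = []
    for score, i, j, label in candidates:
        if all(j < si or i > sj for _, si, sj, _ in selected):
            selected.append((score, i, j, label))
    return [(i, j, label) for _, i, j, label in selected]
```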

4. Experiments

4.1. Experimental Settings

4.1.1. Datasets

In order to evaluate the performance of the proposed SFRSN model on domain-specific entity recognition tasks, this study conducted experiments on the Weibo, Ontonotes, and resume datasets, which are public datasets from the fields of social media, news, and human resources, respectively. The Weibo dataset includes four types of entities (PER, LOC, ORG, and GPE), each of which covers both named entities and nominal mentions; PER and LOC entities are dominant, so there is a certain degree of category imbalance. The Ontonotes dataset contains 18 fine-grained entity types (such as PERSON, ORG, and DATE), and because of its relatively large scale, the distribution of samples across categories is relatively balanced. The resume dataset contains 8 entity types, including NAME, ORG, RACE, and PRO; NAME and ORG entities occur more frequently than RACE, PRO, and the other categories, indicating class imbalance. The statistics of the three datasets are shown in Table 1. In addition, to test the generalization ability of the model, this study also used the supply chain management dataset from [24]. The supply chain management dataset is a self-labeled corpus of manufacturing enterprise supply chain management cases collected by the authors’ team. The dataset includes 12 named entity types: object, attribute, attribute_value, company, field, index, index_system, industry, method, model, problem, and trigger_word. The distribution of each category is shown in Table 2. The object and trigger_word entities account for about 64% of all entities, indicating class imbalance; the main reason is that this dataset mainly describes the event logic knowledge of management events, so the numbers of event trigger-word and management-object entities are relatively large. All four datasets are used for character-level entity span construction and character feature modeling. The supply chain management dataset contains 1200 sentences. Since this dataset is not divided into training and test sets, all experiments on it were conducted with 5-fold cross-validation, while the other three datasets were divided into training and test sets. To ensure the reliability of the experimental results, three independent experiments were conducted on each of these three datasets, with a different random seed used for initialization each time. The reported performance indicators are the averages of the three experimental results.

4.1.2. Model Setting

To provide a comprehensive view of the model’s efficiency, this paper first analyzes the computational complexity of the SFRSN model. Let the sentence length be $n$, the character feature dimension be $d$, the BiLSTM hidden layer dimension be $h$, the number of stacked layers be $l$, the number of entity suffix categories be $c$, and the convolution kernel size be $k$. The complexity of the BERT encoder is $O(n^2 d)$, the complexity of the feature reuse stacked BiLSTM is $O(l n h^2)$, the complexity of the DCNN is $O(n^2 k^2)$, and the complexity of the single-tail selection function and gated attention is $O(n^2 c)$. The computational complexity of the entire model is therefore $O(n^2 d) + O(l n h^2) + O(n^2 k^2) + O(n^2 c)$. The hardware configuration for the experiments included an Intel(R) Core(TM) i9-13900K CPU @ 3.00 GHz (Intel Corporation, Santa Clara, CA, USA), two NVIDIA GeForce RTX 4090 GPUs (NVIDIA Corporation, Santa Clara, CA, USA), and 128 GB of DDR4 RAM (Samsung Electronics Co., Ltd., Suwon, Republic of Korea). When processing data, this study did not impose length restrictions or pruning; the longest sentence in each batch was used as the sample length for training to ensure the overall performance of the model. The number of LSTM hidden units was set to 384 for the Ontonotes dataset and 256 for the other three datasets. The number of feature reuse stacked BiLSTM layers was 2; the number of dilated convolution kernels was 96 for the Weibo and resume datasets, 128 for the Ontonotes dataset, and 64 for the supply chain management dataset. The dropout rate was set to 0.5, and other parameters were randomly initialized. All experiments were evaluated using micro- and macro-averaged precision (P), recall (R), and F1 score. The primary success criterion for the proposed SFRSN model is a higher overall micro- and macro-F1 score than all compared baseline models on each of the four datasets. For the ablation experiments, the success criterion for each proposed component is that its removal results in a decrease in micro- and macro-F1 scores across all four datasets. In addition, to ensure that the observed performance changes were not caused by random fluctuations, a random seed was used in each experiment to shuffle the order of the training set, and the results reported for the Weibo, Ontonotes, and resume datasets are the averages of three independent runs. All experimental results include their 95% confidence intervals.

4.2. Overall Comparison Experimental Results

In this section, this paper presents the experimental results of SFRSN on the Ontonotes, Weibo, resume, and supply chain management datasets, and entity recognition methods from recent years are selected for comparison. The model was trained on GPUs; one training iteration over the Ontonotes, Weibo, resume, and supply chain management datasets took 158 s, 16 s, 27 s, and 17 s, respectively. The comparison results are shown in Table 3 and Table 4. The comparison methods include the following:
  • Lattice-based models, including MCL and Wang et al. [17].
  • BERT-based models with conditional random field (CRF) improvements, including BACSBN.
  • Span-based models with biaffine predictors, including W2NER.
  • Other state-of-the-art models, including MSE and GFNER.
To ensure fairness, all compared models use the same pre-trained BERT model as SFRSN for character feature encoding.
It can be seen from Table 3 and Table 4 that SFRSN achieves better recognition performance than the comparison methods on all four domain-specific datasets. For the Ontonotes, Weibo, resume, and supply chain management datasets, the micro-F1 scores of SFRSN are 83.34%, 73.27%, 96.90%, and 86.77%, respectively. The macro-F1 of SFRSN exceeds its micro-F1 on both the resume and supply chain management datasets, indicating that its performance in identifying different types of entities on these two datasets is more balanced. On the Weibo dataset, whose categories are imbalanced, and on the Ontonotes dataset, whose entity categories are relatively balanced, the micro-F1 score is higher than the macro-F1 score. Compared with character feature enhancement methods, SFRSN brings relative improvements in micro-F1 scores of 0.9–1.28%, 0.3–4.94%, 0.28–1.1%, and 1.09–2.68%, respectively. This is because Wang et al. [17] used graph attention networks to fuse word and character information, but incorrect words and long-distance words may affect information interaction, resulting in errors in feature iteration and affecting recognition results. MCL adopts cross-granularity contrastive learning on dictionaries and interaction of important word information, emphasizing key matching words, which improves recognition; however, MCL uses lexical information for dense cross-granularity interaction on the initial lattice structure, which increases training time and computing resources. The SFRSN model does not require word segmentation or matching of potential words for characters, which reduces the interference of incorrect words. The feature reuse stacked BiLSTM can alleviate the problem of deep network degradation while obtaining deep context features. The DCNN and the single-tail selection function obtain entity suffix category classification features from different perspectives and address interference from nested entities and from the same character carrying different entity labels, and the fusion of information by the global feature gated attention mechanism further improves recognition.
Compared with the BERT-based model with CRF improvements, classification models based on span entity boundaries, and other models, SFRSN achieves improvements on all four datasets. Although BACSBN obtains deep context features and focuses on important sentence information through its stacked BiLSTM and binocular attention mechanism, respectively, its ability to recognize nested entities, long-distance entities, and entity boundaries is relatively weak. The MSE method improves training quality through data augmentation and distillation reweighting, but its boundary smoothing module weakens the model’s classification of boundaries, and its way of balancing context leads to insufficient generalization on the Weibo dataset, which consists primarily of short texts. GFNER introduces a global feature learning module to obtain the associations between entities, but global features may interfere with the recognition of key entity boundary signals. W2NER obtains context features through a shallow BiLSTM and combines the features output by the convolution layer and the biaffine predictor by simple feature addition, resulting in insufficient feature interaction. The SFRSN model overcomes the shortcomings of the above comparison methods. It uses the feature reuse stacked BiLSTM to obtain deep context features of sequences, enhancing semantic understanding while alleviating deep network degradation. The single-tail selection function reduces training parameters, and the convolution layer enhances the acquisition of entity boundary information. The global feature gated attention mechanism improves feature interaction and recognition accuracy. This indicates that SFRSN is well suited to entity recognition in small and medium-sized domain-specific corpora.

4.3. Effect of Each Component of SFRSN

SFRSN mainly improves three parts, including feature reuse stacked BiLSTM, single-tail selection function, and global feature gated attention mechanism. This section of the experiment mainly studied the influence of each part on the recognition performance of SFRSN. In the experiment, SFRSN was divided into the following four situations, and the experimental results are shown in Table 5 and Table 6.
  • w/o DCNN indicates that the SFRSN model removes DCNN while other structures remain unchanged;
  • w/o feature reuse stacked BiLSTM indicates that SFRSN only uses traditional 2-layer stacked BiLSTM to obtain text context features;
  • w/o single-tail selection function indicates that SFRSN uses the biaffine function instead of the single-tail selection function, while other structures remain unchanged;
  • w/o global feature gated attention mechanism indicates that SFRSN uses feature addition instead of the global feature gated attention mechanism.
It can be seen from Table 5 and Table 6 that when the DCNN is removed, the micro-F1 and macro-F1 scores decrease on all four datasets. This indicates that the DCNN is crucial for obtaining span feature representations of entity structures, whether on the Weibo dataset with unbalanced entity categories, the supply chain management dataset, the resume dataset, or the Ontonotes dataset with relatively balanced entity categories. When the feature reuse stacked BiLSTM is removed, the micro-F1 values of SFRSN on the Ontonotes, Weibo, resume, and supply chain management datasets decrease by 0.39%, 0.30%, 0.36%, and 0.17%, respectively. This indicates that feature reuse enables the model to learn more sufficient features, alleviating the problem of deep network degradation. When the biaffine function is used instead of the single-tail selection function, the micro-F1 and macro-F1 values of SFRSN decrease, which indicates that the single-tail selection function not only removes the extra linear mapping matrices and reduces training parameters, but also improves the final recognition performance. When the global feature gated attention mechanism is removed and feature addition is adopted, the micro-F1 values of SFRSN on the Ontonotes, Weibo, resume, and supply chain management datasets decrease by 0.71%, 0.76%, 0.51%, and 0.35%, respectively, degrading the recognition performance of the SFRSN model. This indicates that the global feature gated attention mechanism can obtain salient features in both the span-head and span-tail character directions from the character pair span representations and the entity suffix classification features; by weighting their contributions, it selectively enhances the features more beneficial for entity recognition, outperforming simple feature addition. For the Weibo dataset, which consists mainly of short texts, the global feature gated attention mechanism is helpful in improving recognition. On the supply chain management dataset, the performance only decreases slightly, because the descriptions in the supply chain management corpus are relatively specialized and the words in some entities occur frequently, so simple feature addition already achieves a good recognition effect; nevertheless, the global feature gated attention mechanism is still conducive to improving recognition performance. In summary, this experiment demonstrates that each component of SFRSN contributes effectively to recognition performance.

4.4. Tuning of Hyperparameters in SFRSN

In SFRSN, the feature reuse stacked BiLSTM is used to obtain deep context features of text. Feature reuse with different stacked layers has different effects on the classification of the span of entity suffixes. In order to test the effect of different stacked layers, a set of experiments was conducted in this section, with the feature reuse BiLSTM stacked layers set to 1, 2, and 3. Other parameters were consistent with the above parameter settings. The experimental results are shown in Table 7 and Table 8.
As shown in Table 7 and Table 8, as the number of feature reuse stacked BiLSTM layers increases from 1 to 2, the micro-F1 and macro-F1 scores of SFRSN improve on all four datasets. This indicates that increasing the number of stacked BiLSTM layers can enrich deep context features to a certain extent, which is important for understanding the semantic structure of sentences. However, when the number of stacked BiLSTM layers is set to 3, the micro-F1 and macro-F1 scores of SFRSN decrease. This indicates that overly deep networks involve more shallow features in model training, which can lead to overfitting and a decrease in recognition performance. Figure 3 compares the micro-F1 values on the four datasets for different numbers of stacked layers. It can be seen from Figure 3 that the optimal number of feature reuse stacked BiLSTM layers is 2, which confirms the observations in Table 7 and Table 8.
The number of convolution kernels in the DCNN of SFRSN will affect the model’s learning of the span structure information of character pairs. In this section, this paper studies the impact of different numbers of convolution kernels on entity recognition, with the number of convolution kernels set at 64, 96, and 128. Similar to the above experiments, other parameters remain unchanged. The experimental results are shown in Table 9 and Table 10.
As shown in Table 9 and Table 10, on the three datasets other than the supply chain management dataset, appropriately increasing the number of convolution kernels enhances the representation ability of the DCNN and helps identify nested and complex entity structures. On the Weibo and resume datasets, an excessive number of convolution kernels leads to a decrease in the recognition performance of the SFRSN model, indicating that on small and medium-scale datasets, too many training parameters increase the risk of overfitting and reduce the training effect. On the supply chain management dataset, which is smaller than the other three and in which some professional terms are relatively fixed, a smaller number of convolution kernels achieves better generalization. Figure 4 compares the micro-F1 values on the four datasets for different numbers of dilated convolution kernels. It can be seen from Figure 4 that appropriately increasing the number of convolution kernels helps improve entity recognition, which confirms the observations in Table 9 and Table 10.

4.5. Qualitative Case Analysis

In addition to providing quantitative indicators of the recognition performance of the SFRSN model, this paper also conducts a qualitative analysis of the prediction results of the SFRSN, MCL, and W2NER models on Chinese cases. This analysis focuses on three challenging areas: span length, nested entity structure depth, and entity type disambiguation, covering both successful and failed cases.
For span length analysis, consider the sentence “The company is committed to the hierarchical supplier optimization analysis method”, which includes the entity “hierarchical supplier optimization analysis method”. The MCL model incorrectly segments this long entity into short entities, identifying “supplier optimization” and “analysis method”, while W2NER recognizes “hierarchical suppliers” and “optimization analysis method”. SFRSN correctly recognizes the entire entity. This success demonstrates the DCNN module’s ability to capture long-distance structural dependencies; combined with the single-tail selection function, it coherently obtains long-distance entity information and better identifies long entities.
For nested entity structure depth analysis, consider the sentence “Beijing municipal people’s government issues new notice”, whose nested entities include the inner entity “Beijing municipality” (LOC) and the outer entity “Beijing municipal people’s government” (ORG). SFRSN successfully predicts both entities. In contrast, MCL recognizes the entities “Beijing” and “people’s government”, while the W2NER model only recognizes the entity “Beijing people’s government”. This indicates that span-based entity suffix classification can naturally accommodate nested entities, unlike label sequence prediction models, which face entity label conflicts for the same character. However, for the sentence “China banking and insurance regulatory commission”, the SFRSN, MCL, and W2NER models all recognize only the entity “China” and fail to recognize the outer entity “China banking and insurance regulatory commission”. This may be because such nested entities occur relatively rarely, making correct inference difficult.
For entity type disambiguation, consider the sentence “Apple incorporated has released a new apple mobile phone”, which contains two “apple” entities, namely the company (ORG) and the product (product). SFRSN accurately distinguishes both instances based on context, while the MCL and W2NER models recognize both as the company entity. This indicates that the feature reuse stacked BiLSTM enhances the understanding of sentence semantic structure and that the global feature gated attention mechanism is effective in integrating span structure features and span classification features to achieve accurate disambiguation. However, for the sentence “This apple is very sweet, and the stock price of apple is falling”, the SFRSN, MCL, and W2NER models all recognize the “apple” entities only as “company”, which indicates that, in the absence of guiding context keywords, the model may rely on vocabulary bias and misidentify entities.

5. Conclusions

Named entity recognition in Chinese texts in specific domains presents significant challenges due to its lack of standardization, complex entity nesting, and the potential for the same characters to represent different entity categories. To address these issues, this paper proposes the SFRSN model, which transforms the sequence labeling prediction problem into an entity suffix span classification problem. The SFRSN model first generates character vector features through a BERT pre-trained model and proposes a feature reuse stacked BiLSTM to obtain deep context features from the text. On this basis, the span representation features of character pairs are obtained through DCNN. Meanwhile, a single-tail selection function is introduced to obtain the classification features of character pairs for entity suffix categories. Finally, a global feature gated attention mechanism is proposed to fuse these two types of features and achieve the classification of entity suffix spans. Extensive experiments on four Chinese NER datasets, including the Ontonotes, Weibo, resume, and supply chain management datasets, demonstrated that the SFRSN model achieves state-of-the-art performance, outperforming recent advanced methods.
Despite these promising results, this study has several limitations that open avenues for future research. First, the model’s robustness to extreme noise and highly colloquial language, and its generalization capability across vastly different domains, were not fully evaluated. Second, the current model relies on substantial computational resources, which may hinder its deployment in resource-constrained environments. Third, the use of sensitive real-world data, such as the supply chain management corpus, raises concerns regarding privacy and the potential for model bias due to inherent class imbalance. Finally, the current implementation lacks integration with external Chinese lexical resources, which could enhance the recognition of domain-specific terms. To address these limitations and advance this research line, this paper outlines the following future directions. First, for efficiency and deployment optimization, we will explore model compression techniques, such as knowledge distillation and pruning, to develop a lightweight version of SFRSN. Additionally, we will implement improved span pruning strategies to automatically filter out character pairs unlikely to form entities, reducing computational overhead during both training and inference. Second, for robustness and generalization, we plan to rigorously evaluate the model’s cross-domain transfer capabilities and its innate robustness by testing it on noisy social media datasets and other domain-specific corpora. We will also investigate adaptive suffix modeling schemes, including dynamic or hierarchical labeling approaches, to better handle complex and nested entity structures. Third, for privacy and bias mitigation, we will research privacy-preserving technologies such as differential privacy and synthetic data generation. To address class imbalance and potential bias, we will employ techniques including balanced sampling and loss reweighting, and use large language models to mitigate this bias. In addition, for knowledge integration and cross-language expansion, we will explore methods to effectively incorporate external Chinese lexical resources and knowledge bases to enhance the recognition of out-of-vocabulary and domain-specific terms. Furthermore, given the language-agnostic nature of SFRSN’s core architecture, we will adapt and evaluate its performance on cross-lingual and low-resource NER tasks, aiming to provide a unified solution beyond Chinese.

Author Contributions

J.D.: Methodology, Investigation, Formal analysis, Writing—original draft. R.Z.: Validation, Resources, Investigation, Writing—review and editing. W.Y.: Supervision, Formal analysis. S.Z.: Supervision, Software. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the 2024 University Research Project of the Guangzhou Education Bureau under Grant No. 2024312480 and the Research Start-up Fund for Newly Recruited Talents at Guangzhou Railway Polytechnic under Grant No. GTXYR2314.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, Z.; Chen, H.; Xu, G.; Ren, M. A novel large-language-model-driven framework for named entity recognition. Inf. Process. Manag. 2025, 62, 104054. [Google Scholar] [CrossRef]
  2. Qian, L.; Cui, Y.; Lian, L.; Chen, Y.; Huang, L. Survey of Named Entity Recognition for Low-Resource Scenarios. J. Comput. Eng. Appl. 2025, 1–26. Available online: https://link.cnki.net/urlid/11.2127.TP.20250224.1408.002 (accessed on 24 February 2025).
  3. Wu, B.; Deng, C.; Guang, B.; Chen, X.; Zan, D.; Chang, Z.; Xiao, Z.; Qu, D.; Wang, Y. Dynamically Transfer Entity Span Information for Cross-domain Chinese Named Entity Recognition. J. Softw. 2022, 33, 3776–3792. [Google Scholar]
  4. Liu, P.; Wang, G.; Li, H.; Liu, J.; Ren, Y.; Zhu, H.; Sun, L. Multi-granularity cross-modal representation learning for named entity recognition on social media. Inf. Process. Manag. 2024, 61, 103546. [Google Scholar] [CrossRef]
  5. Lyu, P.; Yue, Y.; Yu, W.; Xiao, L.; Liu, C.; Zheng, P. An adaptive multi-neural network model for named entity recognition of Chinese mechanical equipment corpus. J. Eng. Des. 2024, 1–26. [Google Scholar] [CrossRef]
  6. Zhu, Y.; Jia, R.; Wang, G.; Xie, C. A review of supply chain finance risk assessment research: Based on knowledge graph technology. Syst. Eng. Theory Pract. 2023, 43, 795–812. [Google Scholar]
  7. Li, L.; Xi, X.; Sheng, S.; Cui, Z.; Xu, J. Research Progress on Named Entity Recognition in Chinese Deep Learning. J. Comput. Eng. Appl. 2023, 59, 46–69. [Google Scholar]
  8. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  9. Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA, 1–6 June 2018; pp. 2227–2237. [Google Scholar]
  10. Shang, J.; Cheng, C.; Lu, Y.; Xi, L.; Cheng, J.; Liu, H. Fine-grained named entity recognition of invasive alien plants using multi-feature fusion. Trans. Chin. Soc. Agric. Eng. 2025, 41, 230. [Google Scholar]
  11. Yang, W.; Li, D.; Liang, F. Sina weibo bursty event detection method. IEEE Access 2019, 7, 163160–163171. [Google Scholar] [CrossRef]
  12. Zhang, Z.; Sun, S.; Xu, S.; Xu, F.; Liu, J. Named entity recognition in motor field based on BERT and multi-window gated CNN. Appl. Res. Comput. 2023, 40, 107–114. [Google Scholar]
  13. Zhang, X.; Wang, Z.; Gong, Q.; Wang, Y. State of Health Estimation of Lithium-Ion Batteries Based on Hybrid Neural Networks with Residual Connections. J. Electrochem. Soc. 2025, 172, 020503. [Google Scholar] [CrossRef]
  14. Li, R.; Wang, P.; Fu, X.; Wang, L. Morphology-Driven Entity Recognizer for Process Specification Text. Available online: https://link.cnki.net/doi/10.13196/j.cims.2024.0020 (accessed on 16 April 2024).
  15. Wu, S.; Song, X.; Feng, Z. MECT: Multi-metadata embedding based cross-transformer for Chinese named entity recognition. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual, 2–5 August 2021; Volume 1, pp. 1529–1539. [Google Scholar]
  16. Zhao, S.; Wang, C.; Hu, M.; Yan, T.; Wang, M. MCL: Multi-granularity contrastive learning framework for Chinese NER. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 14011–14019. [Google Scholar]
  17. Wang, Y.; Wang, Z.; Yu, H.; Wang, G.; Lei, D. The interactive fusion of characters and lexical information for Chinese named entity recognition. Artif. Intell. Rev. 2024, 57, 258. [Google Scholar] [CrossRef]
  18. Meng, W.; Guo, J.; Xing, K.; Wei, N.; Wang, Q.; Liu, B. A Chinese Medical Named Entity Recognition Method Based on Glyph Features. Acta Electron. Sin. 2024, 52, 1945–1954. [Google Scholar]
  19. Du, H.; Xu, J.; Du, Z.; Chen, L.; Ma, S.; Wei, D.; Wang, X. Mf-mner: Multi-models fusion for mner in Chinese clinical electronic medical records. Interdiscip. Sci. Comput. Life Sci. 2024, 16, 489–502. [Google Scholar]
  20. Gao, F.; Zhang, L.; Wang, W.; Zhang, B.; Liu, W.; Zhang, J.; Xie, L. Named Entity Recognition for Equipment Fault Diagnosis Based on RoBERTa-wwm-ext and Deep Learning Integration. Electronics 2024, 13, 3935. [Google Scholar] [CrossRef]
  21. Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; Liu, Q. ERNIE: Enhanced language representation with informative entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1441–1451. [Google Scholar]
  22. Liu, H.; Zhang, D.; Xiong, S.; Ma, X.; Xi, L. Named Entity Recognition of Wheat Diseases and Pests Fusing ALBERT and Rules. J. Front. Comput. Sci. Technol. 2023, 17, 1395–1404. [Google Scholar]
  23. Qi, R.; Zhou, J.; Li, S.; Mao, Y. Few-shot Named Entity Recognition Based on Fine-grained Prototypical Network. J. Softw. 2023, 35, 4751–4765. [Google Scholar]
  24. Deng, J.; Chen, C.; Huang, X.; Chen, W.; Cheng, L. Research on the construction of event logic knowledge graph of supply chain management. Adv. Eng. Inform. 2023, 56, 101921. [Google Scholar] [CrossRef]
  25. Sun, Y.; Wang, X.; Wu, H.; Hu, M. Global Span Semantic Dependency Awareness and Filtering Network for nested named entity recognition. Neurocomputing 2025, 617, 129035. [Google Scholar] [CrossRef]
  26. Li, J.; Fei, H.; Liu, J.; Wu, S.; Zhang, M.; Teng, C.; Ji, D.; Li, F. Unified named entity recognition as word-word relation classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 10965–10973. [Google Scholar]
  27. Chen, J.; Chen, X.; Pan, S.; Zhang, W. GFNER: A Unified Global Feature-Aware Framework for Flat and Nested Named Entity Recognition. IEEE Access 2023, 11, 55139–55148. [Google Scholar] [CrossRef]
  28. Li, Z.; Cao, S.; Zhai, M.; Ding, N.; Zhang, Z.; Hu, B. Multi-level semantic enhancement based on self-distillation BERT for Chinese named entity recognition. Neurocomputing 2024, 586, 127637. [Google Scholar] [CrossRef]
  29. Mo, Y.; Liu, J.; Tang, H.; Wang, Q.; Xu, Z.; Wang, J.; Quan, X.; Wu, W.; Li, Z. Multi-Task Multi-Attention Transformer for Generative Named Entity Recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2024, 32, 4171–4183. [Google Scholar] [CrossRef]
Figure 1. The overall structure of our proposed SFRSN model. Note: The color of each module (for example, BiLSTM) distinguishes different architectural components. In addition, the colors of the feature vectors denote their origin and the model’s feature reuse strategy. Specifically, the concatenation of features is visually represented by the combination of their feature colors (for example, the blue feature is concatenated with the dark green feature, and then concatenated with the red feature output by the next layer of BiLSTM). In the convolution layer and single-tail selection function layer, colors are used primarily for visual distinction of different features and do not carry any specific quantitative meaning. * represents the multiplication sign. batch_size represents the number of training samples in the batch, seq_len represents the sentence length, h_BiLSTM represents the number of BiLSTM hidden layer cells, conv_size is the number of DCNN convolution kernels, h is the dimension of the fully connected layer, and c is the number of entity suffix categories.
Figure 2. Subfigure (a) shows an example of text tagging. Note: the Chinese example text in the subfigure translates as “I love Beijing Tian’anmen; the scenery in Beijing is really beautiful.” Subfigure (b) shows the span representation matrix corresponding to the sample in subfigure (a).
Figure 3. Comparison of micro-F1 values of SFRSN with different feature reuse stacked BiLSTM layers. (a) Ontonotes dataset. (b) Weibo dataset. (c) Resume dataset. (d) Supply chain management dataset.
Figure 4. Comparison of micro-F1 values of SFRSN with different numbers of dilated convolution kernels. (a) Ontonotes dataset. (b) Weibo dataset. (c) Resume dataset. (d) Supply chain management dataset.
Table 1. Statistics of datasets.
Datasets | Type | Entity Types | Train | Dev | Test
Ontonotes | Sentence | 5 | 15.7 k | 4.3 k | 4.3 k
Ontonotes | Char | | 491.9 k | 200.5 k | 208.1 k
Weibo | Sentence | 9 | 1.4 k | 0.27 k | 0.27 k
Weibo | Char | | 73.8 k | 14.5 k | 14.8 k
Resume | Sentence | 9 | 3.8 k | 0.46 k | 0.48 k
Resume | Char | | 124.1 k | 13.9 k | 15.1 k
Table 2. Statistics of supply chain management dataset.
Category | Total
object | 5138
attribute | 2552
attribute_value | 135
company | 224
field | 954
index | 290
index_system | 166
industry | 212
method | 643
model | 241
problem | 357
trigger_word | 5011
Table 3. Comparison of our method with the state-of-the-art methods on Ontonotes and Weibo datasets.
Model | Ontonotes Dataset (P / R / F1) | Weibo Dataset (P / R / F1)
MCL | 81.57 ± 1.37% / 83.33 ± 0.68% / 82.44 ± 0.66% | 76.69 ± 1.62% / 69.58 ± 1.69% / 72.96 ± 0.30%
Wang et al. [17] | 80.14 ± 0.90% / 84.21 ± 0.46% / 82.12 ± 0.64% | 68.01 ± 0.95% / 68.66 ± 0.85% / 68.33 ± 0.22%
BACSBN | 81.35 ± 0.78% / 83.74 ± 0.95% / 82.52 ± 0.76% | 74.18 ± 0.51% / 69.80 ± 0.63% / 71.92 ± 0.44%
W2NER | 80.02 ± 0.97% / 85.06 ± 0.67% / 82.47 ± 0.60% | 75.60 ± 1.51% / 69.02 ± 1.06% / 72.16 ± 0.39%
MSE | 81.07 ± 0.50% / 83.59 ± 0.67% / 82.31 ± 0.50% | 71.59 ± 0.90% / 72.80 ± 0.93% / 72.19 ± 0.46%
GFNER | 79.65 ± 1.45% / 84.05 ± 0.58% / 81.79 ± 1.03% | 68.85 ± 0.65% / 74.80 ± 0.81% / 71.70 ± 0.69%
SFRSN (Micro) | 84.96 ± 0.61% / 81.78 ± 0.22% / 83.34 ± 0.41% | 73.10 ± 1.24% / 73.44 ± 1.32% / 73.27 ± 0.31%
SFRSN (Macro) | 82.76 ± 1.46% / 80.61 ± 0.68% / 81.42 ± 0.36% | 68.49 ± 0.86% / 58.73 ± 0.85% / 61.47 ± 0.42%
Table 4. Comparison of our method with the state-of-the-art methods on resume and supply chain management datasets.
Model | Resume Dataset (P / R / F1) | Supply Chain Management Dataset (P / R / F1)
MCL | 96.68 ± 0.35% / 96.56 ± 0.19% / 96.62 ± 0.25% | 84.94 ± 0.83% / 86.43 ± 0.84% / 85.68 ± 0.73%
Wang et al. [17] | 95.66 ± 0.25% / 95.95 ± 0.45% / 95.80 ± 0.32% | 83.06 ± 0.83% / 85.15 ± 0.69% / 84.09 ± 0.67%
BACSBN | 96.58 ± 0.37% / 96.46 ± 0.24% / 96.52 ± 0.27% | 84.81 ± 0.92% / 87.03 ± 1.15% / 85.91 ± 0.98%
W2NER | 96.34 ± 0.20% / 96.79 ± 0.26% / 96.56 ± 0.15% | 84.68 ± 0.83% / 86.66 ± 0.87% / 85.65 ± 0.80%
MSE | 96.88 ± 0.28% / 96.59 ± 0.47% / 96.74 ± 0.23% | 83.59 ± 0.28% / 85.10 ± 0.84% / 84.34 ± 0.66%
GFNER | 94.36 ± 1.44% / 94.48 ± 1.68% / 94.42 ± 1.34% | 78.67 ± 0.28% / 88.54 ± 2.16% / 83.30 ± 1.09%
SFRSN (Micro) | 96.81 ± 0.33% / 96.99 ± 0.62% / 96.90 ± 0.28% | 86.24 ± 0.85% / 87.11 ± 1.16% / 86.77 ± 0.91%
SFRSN (Macro) | 98.38 ± 0.41% / 99.03 ± 0.44% / 98.69 ± 0.32% | 87.58 ± 1.24% / 87.68 ± 0.98% / 87.46 ± 1.02%
Table 5. Comparison of each part’s contribution in the SFRSN model on the Ontonotes dataset and Weibo dataset.
Model | Ontonotes Micro (P / R / F1) | Ontonotes Macro (P / R / F1) | Weibo Micro (P / R / F1) | Weibo Macro (P / R / F1)
w/o DCNN | 82.21 ± 0.72% / 82.19 ± 0.87% / 82.20 ± 0.76% | 80.13 ± 0.52% / 81.30 ± 0.92% / 80.71 ± 0.50% | 70.67 ± 1.77% / 73.21 ± 0.95% / 71.92 ± 0.64% | 61.22 ± 0.88% / 60.32 ± 0.55% / 60.77 ± 0.47%
w/o feature reuse stacked BiLSTM | 81.08 ± 0.77% / 84.90 ± 1.01% / 82.95 ± 0.75% | 80.97 ± 0.73% / 81.41 ± 0.92% / 81.01 ± 0.78% | 72.76 ± 1.27% / 73.18 ± 1.24% / 72.97 ± 0.55% | 61.09 ± 0.33% / 59.69 ± 0.94% / 60.03 ± 0.62%
w/o single-tail selection function | 80.96 ± 0.50% / 84.98 ± 0.99% / 82.92 ± 0.70% | 79.71 ± 0.77% / 81.32 ± 0.72% / 80.48 ± 0.70% | 72.66 ± 0.83% / 73.18 ± 1.14% / 72.92 ± 0.47% | 64.79 ± 0.58% / 59.78 ± 0.53% / 61.23 ± 0.69%
w/o global feature gated attention mechanism | 81.64 ± 0.72% / 83.65 ± 0.87% / 82.63 ± 0.80% | 80.84 ± 0.75% / 81.13 ± 0.79% / 80.94 ± 0.72% | 72.22 ± 2.21% / 72.98 ± 2.09% / 72.51 ± 0.27% | 56.97 ± 0.78% / 66.91 ± 0.59% / 60.86 ± 0.49%
SFRSN (Ours) | 84.96 ± 0.61% / 81.78 ± 0.22% / 83.34 ± 0.41% | 82.76 ± 1.46% / 80.61 ± 0.68% / 81.42 ± 0.36% | 73.10 ± 1.24% / 73.44 ± 1.32% / 73.27 ± 0.31% | 68.49 ± 0.86% / 58.73 ± 0.85% / 61.47 ± 0.42%
Table 6. Comparison of each part’s contribution in the SFRSN model on the resume dataset and supply chain management dataset.
Model | Resume Micro (P / R / F1) | Resume Macro (P / R / F1) | Supply Chain Management Micro (P / R / F1) | Supply Chain Management Macro (P / R / F1)
w/o DCNN | 96.05 ± 0.51% / 96.83 ± 0.36% / 96.43 ± 0.23% | 97.60 ± 0.33% / 98.63 ± 0.53% / 98.09 ± 0.37% | 85.61 ± 1.13% / 87.42 ± 0.83% / 86.51 ± 0.94% | 86.54 ± 1.78% / 87.69 ± 1.01% / 87.04 ± 1.35%
w/o feature reuse stacked BiLSTM | 96.21 ± 0.36% / 96.87 ± 0.28% / 96.54 ± 0.29% | 97.55 ± 0.32% / 99.13 ± 0.42% / 98.31 ± 0.33% | 86.25 ± 0.99% / 86.96 ± 1.14% / 86.60 ± 1.04% | 86.82 ± 1.82% / 87.25 ± 1.23% / 86.98 ± 1.46%
w/o single-tail selection function | 96.33 ± 0.44% / 96.97 ± 0.32% / 96.65 ± 0.30% | 97.80 ± 0.36% / 98.58 ± 0.66% / 98.18 ± 0.21% | 85.73 ± 1.22% / 87.38 ± 0.76% / 86.54 ± 1.13% | 85.70 ± 1.20% / 87.80 ± 0.83% / 86.63 ± 0.95%
w/o global feature gated attention mechanism | 95.98 ± 0.23% / 96.81 ± 0.18% / 96.39 ± 0.33% | 97.32 ± 0.44% / 98.99 ± 0.70% / 98.10 ± 0.40% | 85.87 ± 1.08% / 86.98 ± 1.00% / 86.42 ± 1.03% | 86.87 ± 1.33% / 87.62 ± 0.67% / 87.20 ± 0.92%
SFRSN (Ours) | 96.81 ± 0.33% / 96.99 ± 0.62% / 96.90 ± 0.28% | 98.38 ± 0.41% / 99.03 ± 0.44% / 98.69 ± 0.32% | 86.24 ± 0.85% / 87.11 ± 1.16% / 86.77 ± 0.91% | 87.58 ± 1.24% / 87.68 ± 0.98% / 87.46 ± 1.02%
Table 7. Impact of feature reuse stacked BiLSTM layers on Ontonotes dataset and Weibo dataset.
Stacked BiLSTM Layers | Ontonotes Micro (P / R / F1) | Ontonotes Macro (P / R / F1) | Weibo Micro (P / R / F1) | Weibo Macro (P / R / F1)
1 | 82.30 ± 0.95% / 82.62 ± 0.96% / 82.46 ± 0.93% | 80.89 ± 0.26% / 81.01 ± 0.33% / 80.81 ± 0.46% | 72.99 ± 1.02% / 72.42 ± 1.16% / 72.70 ± 0.42% | 58.48 ± 0.58% / 60.64 ± 0.71% / 59.16 ± 0.78%
2 | 84.96 ± 0.61% / 81.78 ± 0.22% / 83.34 ± 0.41% | 82.76 ± 1.46% / 80.61 ± 0.68% / 81.42 ± 0.36% | 73.10 ± 1.24% / 73.44 ± 1.32% / 73.27 ± 0.31% | 68.49 ± 0.86% / 58.73 ± 0.85% / 61.47 ± 0.42%
3 | 82.92 ± 0.31% / 81.84 ± 0.55% / 82.38 ± 0.39% | 80.13 ± 0.69% / 81.39 ± 0.45% / 80.64 ± 0.76% | 72.52 ± 0.93% / 72.73 ± 1.09% / 72.61 ± 0.41% | 62.95 ± 1.76% / 60.72 ± 1.38% / 60.86 ± 0.19%
Table 8. Impact of feature reuse stacked BiLSTM layers on resume dataset and supply chain management dataset.
Stacked BiLSTM Layers | Resume Micro (P / R / F1) | Resume Macro (P / R / F1) | Supply Chain Management Micro (P / R / F1) | Supply Chain Management Macro (P / R / F1)
1 | 96.05 ± 0.27% / 97.17 ± 0.15% / 96.61 ± 0.10% | 97.30 ± 0.27% / 99.11 ± 0.34% / 98.15 ± 0.40% | 85.58 ± 0.82% / 87.17 ± 1.13% / 86.37 ± 0.93% | 86.84 ± 0.94% / 87.74 ± 0.96% / 87.16 ± 0.99%
2 | 96.81 ± 0.33% / 96.99 ± 0.62% / 96.90 ± 0.28% | 98.38 ± 0.41% / 99.03 ± 0.44% / 98.69 ± 0.32% | 86.24 ± 0.85% / 87.11 ± 1.16% / 86.77 ± 0.91% | 87.58 ± 1.24% / 87.68 ± 0.98% / 87.46 ± 1.02%
3 | 96.04 ± 0.54% / 96.99 ± 0.47% / 96.51 ± 0.19% | 98.02 ± 0.28% / 98.98 ± 0.36% / 98.49 ± 0.12% | 85.78 ± 0.85% / 87.73 ± 0.95% / 86.74 ± 0.88% | 87.34 ± 1.35% / 87.77 ± 1.07% / 87.45 ± 0.99%
Table 9. Impact of the number of convolution kernels on the Ontonotes and Weibo datasets.
Number of Convolution Kernels | Ontonotes Micro (P / R / F1) | Ontonotes Macro (P / R / F1) | Weibo Micro (P / R / F1) | Weibo Macro (P / R / F1)
64 | 82.87 ± 0.81% / 82.57 ± 0.73% / 82.72 ± 0.28% | 81.95 ± 0.82% / 80.12 ± 1.48% / 80.82 ± 0.61% | 71.30 ± 0.72% / 73.68 ± 0.98% / 72.47 ± 0.44% | 62.33 ± 0.66% / 61.75 ± 0.84% / 61.22 ± 0.55%
96 | 83.22 ± 0.58% / 82.66 ± 0.55% / 82.94 ± 0.54% | 82.72 ± 0.77% / 80.69 ± 1.04% / 81.42 ± 0.45% | 73.10 ± 1.24% / 73.44 ± 1.32% / 73.27 ± 0.31% | 68.49 ± 0.86% / 58.73 ± 0.85% / 61.47 ± 0.42%
128 | 84.96 ± 0.61% / 81.78 ± 0.22% / 83.34 ± 0.41% | 82.76 ± 1.46% / 80.61 ± 0.68% / 81.42 ± 0.36% | 73.00 ± 1.07% / 72.55 ± 1.16% / 72.77 ± 0.26% | 61.10 ± 0.49% / 62.71 ± 0.63% / 61.32 ± 0.40%
Table 10. Impact of the number of convolution kernels on the resume and supply chain management datasets.
Number of Convolution Kernels | Resume Micro (P / R / F1) | Resume Macro (P / R / F1) | Supply Chain Management Micro (P / R / F1) | Supply Chain Management Macro (P / R / F1)
64 | 96.18 ± 0.22% / 96.99 ± 0.36% / 96.58 ± 0.13% | 97.94 ± 0.15% / 99.14 ± 0.06% / 98.52 ± 0.06% | 86.24 ± 0.85% / 87.11 ± 1.16% / 86.77 ± 0.91% | 87.58 ± 1.24% / 87.68 ± 0.98% / 87.46 ± 1.02%
96 | 96.81 ± 0.33% / 96.99 ± 0.62% / 96.90 ± 0.28% | 98.38 ± 0.41% / 99.03 ± 0.44% / 98.69 ± 0.32% | 86.07 ± 1.00% / 87.15 ± 1.11% / 86.60 ± 1.01% | 86.96 ± 1.21% / 87.39 ± 1.21% / 87.06 ± 1.19%
128 | 96.86 ± 0.22% / 96.74 ± 0.53% / 96.80 ± 0.51% | 98.48 ± 0.13% / 98.86 ± 0.24% / 98.66 ± 0.20% | 85.63 ± 1.26% / 86.91 ± 1.22% / 86.26 ± 1.21% | 86.55 ± 1.79% / 87.17 ± 1.12% / 86.76 ± 1.39%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
