Article

Local Feature Enhancement for Nested Entity Recognition Using a Convolutional Block Attention Module

1 College of Information Science and Engineering, Xinjiang University, Urumqi 830049, China
2 Xinjiang Signal Detection and Processing Key Laboratory, Urumqi 830049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(16), 9200; https://doi.org/10.3390/app13169200
Submission received: 5 July 2023 / Revised: 3 August 2023 / Accepted: 11 August 2023 / Published: 13 August 2023

Abstract

Named entity recognition involves two main types: nested named entity recognition and flat named entity recognition. The span-based approach treats nested and flat entities uniformly by classifying entities over span representations. However, it ignores the local features within entities and the relative position features between head and tail tokens, which hurts recognition performance. To address these issues, we propose a nested entity recognition model that uses a convolutional block attention module and rotary position embedding to enhance local features and relative position features. Specifically, we apply rotary position embedding to the sentence representation and capture the semantic information between head and tail tokens with a biaffine attention mechanism. Meanwhile, a convolution module captures the local features within the entity to generate the span representation. Finally, the two representations are fused for entity classification. Extensive experiments on five widely used benchmark datasets demonstrate the effectiveness of the proposed model.

1. Introduction

Named entity recognition (NER) is a fundamental task in natural language processing that aims to identify words with specific meanings in text, such as persons, organizations, and locations, and to label their types [1]. It supports downstream tasks such as syntactic analysis [2], machine translation [3], automated question answering [4], and knowledge graphs [5].
The NER task was first proposed by Rau et al. [6] and has been widely used in areas such as information extraction. Initially, lexicon-based and rule-based approaches, which mainly rely on string and pattern matching, were commonly used. Petasis et al. [7] proposed a method to better maintain rule-based NER and classification systems and thus alleviate the shortcomings of lexicon- and rule-based approaches, but it still has limitations. With the development of machine learning, statistical methods gained attention. Li et al. [8] were the first to apply Conditional Random Fields (CRFs) to NER and obtained good results, but training was slow. Mikheev et al. [9] were the first to apply the maximum entropy model to NER, combining a rule-based grammar with a maximum entropy model, but the time complexity of training was high. In recent years, deep-learning-based methods have been applied to NER, eliminating the extensive manual feature engineering and expert domain knowledge required by traditional approaches. Peng et al. [10] combined Long Short-Term Memory (LSTM) with CRFs for the joint training of NER and Chinese word segmentation, yielding clear improvements. In addition, methods based on convolutional neural networks [11] and hybrid neural networks [12] have also been widely applied to NER with good results.
The above methods are all oriented toward flat entities, i.e., entities that do not contain other entities. However, human languages are complex, and not every token can be represented by a single label, so nested entities are prevalent in all areas of expertise [13]. A nested entity occurs when one or more short entities are contained within a longer entity. As shown in Figure 1, the above methods cannot identify nested entities, resulting in a loss of information.
To solve the nested entity problem, Ju et al. [14] proposed a dynamically stacked entity decoding model, yet inner entities cannot exploit the information of outer entities. Some researchers have tried hypergraph-based approaches [15,16], but the hypergraph structure becomes very complex when sentences are long or there are many entity classes. Most researchers have adopted a span-based approach [17,18] to solve the nested entity problem. The span-based approach divides the sentence into a two-dimensional grid, where each span in the grid is represented by the semantic information of its head and tail tokens; classifying entities for each span naturally solves the nested entity problem. Zheng et al. [19] improved the span-based approach by locating entity boundaries and jointly learning the boundary detection and entity classification tasks. Yuan et al. [20] devised a span-based biaffine attention mechanism incorporating boundary information and used the biaffine mechanism to compute entity scores for NER. However, this approach considers neither the entity's local features nor the relative position features between head and tail tokens, which affects the performance of entity recognition.
To address these issues, we propose a nested entity recognition model that achieves local feature enhancement and relative position feature fusion using convolutional modules and rotary position embedding. Specifically, since the same token is represented differently at the beginning and end of an entity, we use two feedforward networks to differentiate between the head and tail sequences. Subsequently, rotary position embedding [21] is applied to the head and tail sequences, and a biaffine attention mechanism [22] produces a span representation with relative position information. At the same time, a convolutional module captures a span representation composed of local features within the entity. Finally, the two parts of the span representation are fused for entity classification. Our main contributions are as follows:
1. A relative position feature between head and tail tokens was added to the span representation using rotary position embedding to improve the precision of entity recognition.
2. The local feature extraction was performed using a convolutional attention module to achieve local feature enhancement and improve model performance.
3. Channel and spatial attention were applied to differentiate the importance of different dimensions and tokens.

2. Related Work

NER is one of the critical steps in constructing a knowledge graph. In early methods, NER was treated as a sequence labeling task, often using Bidirectional Long Short-Term Memory (BiLSTM) [23] as the encoder and a Hidden Markov Model (HMM) [24] and CRFs as the decoders. Ratinov et al. [25] proposed an improved HMM architecture, which includes using different observation features and introducing transition features to enhance the performance of NER. Konkol et al. [26] constructed a Czech NER system based on CRFs. Before the emergence of pre-trained language models, the BiLSTM-CRF model had achieved excellent performance in NER tasks. With the advent of pre-trained language models, combining them with traditional sequence labeling models can further enhance performance.
However, the phenomenon of nested entities naturally exists in the text, in which a long entity contains one or more short entities. Kim et al. [27] first proposed the concept of nested entities and constructed the biomedical nested entity recognition dataset, GENIA. Traditional sequence labeling methods cannot solve the problem of complex nested entities. Therefore, many researchers have explored ways for nested entity recognition.
In the early stages, the layered-based model was mainly used for nested entity recognition. This model decodes nested entities by stacking multiple sequence labeling modules in a layered manner. For example, Ju et al. [14] recognized nested entities from the inside out by stacking BiLSTM-CRF modules. The model first identifies the inner layer of entities. If there are entities in the current layer, the model stacks another BiLSTM-CRF module on top of the current one until all nested entities are recognized. By stacking CRF models, Jiang et al. [28] recognized nested entities in electronic medical records. While this approach is intuitive and easy to implement, the problem of error propagation becomes more pronounced as the number of nested entity layers increases. A transition-based model performs the sequential parsing of characters in the entire sentence to achieve nested entity recognition. Wang et al. [29] introduced three components: Buffer, Stack, and Action. Buffer stores the unprocessed sentence, Stack stores the processed sentence, and Action processes the tokens in the sentence. Although transition-based methods do not suffer from error propagation, they have notable limitations. These methods can only handle entities consisting of two tokens; however, entities are often longer than that. The approach based on reading comprehension [30] cleverly transforms the named entity recognition task into a reading comprehension task. This approach converts each entity label into a question. For example, a question like “What is the organization entity in this sentence?” can be constructed for an organization entity. Then, the question and the target sentence are concatenated using a special token [SEP]. Finally, the entity is obtained based on the output position. However, this method lacks a fixed method for constructing questions, and the quality of the prediction heavily relies on the construction of the questions.
To address the shortcomings of the previous methods, span-based methods [31] are now mainly used for nested entity recognition. The span-based approach classifies each sub-span in a sentence. For example, Yu et al. [32] use a biaffine attention mechanism to generate a span representation of head and tail tokens for entity classification. However, this approach mainly considers only the head and tail tokens’ information of entities to compose entities, resulting in the unsatisfactory performance of entity classification.
Some work improves the performance of entity recognition by enhancing boundary information. For example, Tan et al. [33] enhance the span representation with additional boundary supervision. Gao et al. [34] refine the representation for entity classification through multi-task learning of boundary recognition. Xu et al. [35] concatenate the boundary recognition representation with the sentence representation for entity recognition. Although adding boundary information can improve the precision of entity recognition, the span-based model already uses entity boundary information to form the classification representation, so the internal information of entities deserves more attention. Moreover, the length of an entity is also an important basis for entity judgment.
Therefore, we propose a local feature-enhanced nested entity recognition model, which uses a convolution module to capture local features for entity classification and rotary position embedding to obtain relative position features between head and tail characters. In contrast to previous work, we consider multiple features for entity classification to improve the model’s performance.

3. Methods

The structure of our model is shown in Figure 2 and mainly consists of four parts. First, a pre-trained language model and BiLSTM are used as the encoder to generate contextual semantic representations of the input sentences. Then, the biaffine attention mechanism constructs the representation of head–tail pairs in preparation for the subsequent entity classification. Meanwhile, the convolutional module captures and refines the representation within the entity to achieve local feature enhancement. Finally, the representations from the biaffine attention mechanism and the convolution module are fused to perform entity classification. In this section, we elaborate on each component in detail.

3.1. Encoder

We use the pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) [36] together with a BiLSTM as the encoder of the model. BERT constructs strong sentence representations and is widely used in NLP tasks. Given a tokenized sentence $W = (w_1, w_2, \dots, w_N)$ with $N$ tokens, we feed the sentence into BERT to obtain the sentence representation $X = (x_1, x_2, \dots, x_N)$. Subsequently, to further model the contextual information, we use the BiLSTM to yield the final sentence representation:
$$\overrightarrow{h_i} = \overrightarrow{\mathrm{LSTM}}(x_i, \overrightarrow{h_{i-1}})$$
$$\overleftarrow{h_i} = \overleftarrow{\mathrm{LSTM}}(x_i, \overleftarrow{h_{i+1}})$$
$$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$$
BERT generates the word embedding $x_i \in \mathbb{R}^{d_{word}}$, and the BiLSTM produces the contextual embedding $h_i \in \mathbb{R}^{2d_h}$, where $d_h$ is the hidden-state dimension of each direction.
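The bidirectional recurrence in the equations above can be sketched in plain NumPy. This is a minimal illustration with hypothetical shapes and randomly initialized parameters, not the authors' implementation (which runs on top of BERT embeddings):

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,).
    Gate order assumed here: input, forget, cell candidate, output."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = 1 / (1 + np.exp(-z[:H]))          # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))       # forget gate
    g = np.tanh(z[2*H:3*H])               # cell candidate
    o = 1 / (1 + np.exp(-z[3*H:]))        # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def bilstm(X, params_f, params_b):
    """X: (N, D). Returns (N, 2H): forward and backward states concatenated."""
    N = len(X)
    H = params_f[2].shape[0] // 4
    fwd, bwd = [None] * N, [None] * N
    h, c = np.zeros(H), np.zeros(H)
    for t in range(N):                     # left-to-right pass
        h, c = lstm_step(X[t], h, c, *params_f)
        fwd[t] = h
    h, c = np.zeros(H), np.zeros(H)
    for t in reversed(range(N)):           # right-to-left pass
        h, c = lstm_step(X[t], h, c, *params_b)
        bwd[t] = h
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])
```

For a sentence of N tokens with D-dimensional embeddings, the output has shape (N, 2H), matching $h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$.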

3.2. Head–Tail Pair Representation Module

The biaffine attention mechanism and rotary position embedding are used to construct head–tail pair representations with relative position features, which form an essential part of the subsequent entity classification. Since the same token is represented differently when it is located at the beginning or the end of an entity, we first use two feedforward networks to obtain the head-sequence and tail-sequence representations:
$$s_i = W_s h_i + b_s$$
$$t_i = W_t h_i + b_t$$
where $W_s, W_t \in \mathbb{R}^{2d_h \times d_h}$ and $b_s, b_t \in \mathbb{R}^{d_h}$ are trainable parameters. Since the span-based model uses only the semantic information of head and tail tokens to determine whether the current span is an entity, it does not consider the relative position between the head and tail tokens, ignoring the fact that entities have a specific length limit. Introducing the relative position feature between the head and tail tokens into the span representation can therefore improve the ability to recognize entities. Rotary position embedding combined with linear attention captures relative position features through absolute position embedding, which makes it better suited to span-based models than sinusoidal position embedding. We therefore apply rotary position embedding to both the head and tail sequences:
$$s_i = \mathrm{RoPE}(s_i)$$
$$t_i = \mathrm{RoPE}(t_i)$$
where the dimensions of $s_i$ and $t_i$ are unchanged, and $\mathrm{RoPE}(\cdot)$ denotes the addition of rotary position embedding information to its argument. Taking $\mathrm{RoPE}(s_i)$ as an example:
$$\mathrm{RoPE}(s_i) = \begin{pmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \\ \vdots \\ s_{d_h-2} \\ s_{d_h-1} \end{pmatrix} \odot \begin{pmatrix} \cos m\theta_0 \\ \cos m\theta_0 \\ \cos m\theta_1 \\ \cos m\theta_1 \\ \vdots \\ \cos m\theta_{d_h/2-1} \\ \cos m\theta_{d_h/2-1} \end{pmatrix} + \begin{pmatrix} -s_1 \\ s_0 \\ -s_3 \\ s_2 \\ \vdots \\ -s_{d_h-1} \\ s_{d_h-2} \end{pmatrix} \odot \begin{pmatrix} \sin m\theta_0 \\ \sin m\theta_0 \\ \sin m\theta_1 \\ \sin m\theta_1 \\ \vdots \\ \sin m\theta_{d_h/2-1} \\ \sin m\theta_{d_h/2-1} \end{pmatrix}$$
where $m \in [0, N)$ is the position of the token in the sentence and $\theta_i = 10000^{-2i/d_h}$, with $i \in [0, d_h/2)$ indexing the embedding dimensions of the input sequence. Our model uses the biaffine attention mechanism to generate the span representation of head–tail pairs, as it captures the correlation between head and tail information more effectively than directly concatenating the semantic information of the head and tail tokens. After rotary position embedding, the head and tail sequences are fed into the biaffine attention decoder to obtain the scoring tensor for entity classification:
$$v_{ij} = s_i^{\top} U_m t_j + W_m [s_i; t_j] + b_m$$
where $U_m$, $W_m$, and $b_m$ are trainable parameters. The scores for all spans form a tensor $V$ of size $N \times N \times M$, where $M$ is the number of entity types; the entry $v_{ij}$ scores the span that starts with head token $s_i$ and ends with tail token $t_j$.
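As a rough sketch of the RoPE and biaffine scoring steps above, the following NumPy code rotates each head/tail vector by its position and then produces an N × N × M scoring tensor. The shapes and parameter layouts (U as a d × M × d tensor, W applied to the concatenated pair) are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

def rope(x, m, theta_base=10000.0):
    """Rotary position embedding for one token vector x (even dimension)
    at position m: each pair (x[2i], x[2i+1]) is rotated by m * theta_i."""
    d = x.shape[0]
    theta = theta_base ** (-2 * np.arange(d // 2) / d)
    cos = np.repeat(np.cos(m * theta), 2)
    sin = np.repeat(np.sin(m * theta), 2)
    x_rot = np.empty_like(x)
    x_rot[0::2] = -x[1::2]                 # (-x1, x0, -x3, x2, ...)
    x_rot[1::2] = x[0::2]
    return x * cos + x_rot * sin

def biaffine_scores(S, T, U, W, b):
    """S, T: (N, d) head/tail sequences; U: (d, M, d); W: (M, 2d); b: (M,).
    Returns V: (N, N, M), where V[i, j] scores the span from token i to j."""
    N, d = S.shape
    Sr = np.stack([rope(S[i], i) for i in range(N)])   # position-aware heads
    Tr = np.stack([rope(T[j], j) for j in range(N)])   # position-aware tails
    bilinear = np.einsum('id,dme,je->ijm', Sr, U, Tr)  # s_i^T U_m t_j
    pair = np.concatenate([np.broadcast_to(Sr[:, None], (N, N, d)),
                           np.broadcast_to(Tr[None, :], (N, N, d))], axis=-1)
    return bilinear + np.einsum('mk,ijk->ijm', W, pair) + b
```

A useful property to note: the rotation preserves vector norms, so RoPE injects position without distorting the magnitude of the representation.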

3.3. Local Feature Representation Module

The pre-trained language model and BiLSTM focus on constructing the global information of sentences rather than local information. However, local features within an entity are essential information for judging the entity. Therefore, we use a Convolutional Block Attention Module (CBAM) [37] for local feature capture. First, we perform a dimension extension on the sentence representation and capture local features using convolution for the sentence representation:
$$F = \mathrm{GeLU}(f_{3\times3}(X))$$
where $F \in \mathbb{R}^{C \times H \times W}$, in which $C$ denotes the embedding dimension of BERT, and $H$ and $W$ equal the sentence length. Gaussian Error Linear Units (GeLU) is the activation function, and $f_{3\times3}$ denotes a convolution with a $3 \times 3$ kernel. Subsequently, since different channels vary in importance, we use channel attention to assign a weight to each dimension of the representation:
$$F' = M_c(F) \times F$$
where $F' \in \mathbb{R}^{C' \times H \times W}$ denotes the output after channel attention, with $C'$ indicating that channel attention weights have been applied along $C$. $M_c(F)$ computes the channel attention weights for $F$:
$$M_c(F) = \sigma(W_1(W_0 \mathrm{AvgPool}(F)) + W_1(W_0 \mathrm{MaxPool}(F)))$$
where $\sigma$ denotes the sigmoid activation function, and $\mathrm{AvgPool}$ and $\mathrm{MaxPool}$ denote the average and maximum pooling operations, respectively. $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$, where $r$ is a reduction ratio; this dimensional transformation reduces the parameter overhead. The spatial attention map is then generated from the internal spatial relations of the features:
$$F'' = M_s(F') \times F'$$
where $F'' \in \mathbb{R}^{C \times H'' \times W''}$, in which $H''$ and $W''$ indicate that spatial attention weights have been applied along $H$ and $W$. $M_s(F')$ computes the spatial attention weights for $F'$:
$$M_s(F') = \sigma(f^{n \times n}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')]))$$
where $n$ denotes the size of the convolution kernel, and $[\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')]$ denotes the concatenation of the average-pooling and maximum-pooling results. $F''$ is the generated local feature representation, which will be used for the subsequent entity classification.
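The CBAM computation described above (channel attention followed by spatial attention) can be sketched in NumPy as follows. The ReLU between $W_0$ and $W_1$ follows the original CBAM design; the shapes and the naive convolution loop are illustrative simplifications, not the authors' code:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def channel_attention(F, W0, W1):
    """F: (C, H, W). W0: (C//r, C), W1: (C, C//r). Returns (C, 1, 1) weights."""
    avg = F.mean(axis=(1, 2))                            # AvgPool over space
    mx = F.max(axis=(1, 2))                              # MaxPool over space
    mc = sigmoid(W1 @ np.maximum(W0 @ avg, 0) + W1 @ np.maximum(W0 @ mx, 0))
    return mc[:, None, None]

def spatial_attention(F, kernel):
    """F: (C, H, W). kernel: (2, n, n) convolving the two pooled maps.
    Returns (1, H, W) weights."""
    pooled = np.stack([F.mean(axis=0), F.max(axis=0)])   # pool over channels
    _, n, _ = kernel.shape
    p = n // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))    # zero padding
    H, W = F.shape[1:]
    out = np.zeros((H, W))
    for i in range(H):                                   # naive n x n conv
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i+n, j:j+n] * kernel)
    return sigmoid(out)[None]

def cbam(F, W0, W1, kernel):
    Fc = channel_attention(F, W0, W1) * F                # F' = Mc(F) * F
    return spatial_attention(Fc, kernel) * Fc            # F'' = Ms(F') * F'
```

Both attention maps lie in (0, 1) because of the sigmoid, so CBAM reweights the feature map without changing its shape.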

3.4. Entity Classification

Finally, the features generated by the biaffine attention mechanism and those generated by the convolution module are fused for the entity classification task. The fused score vector $m_{ij}$ contains the scores of all entity types for the span starting with token $i$ and ending with token $j$; after softmax normalization, the type with the highest probability is taken as the predicted entity type for the span. In training, we use a cross-entropy loss function to optimize the model:
$$M = V + F$$
$$y_{ij}(r) = \frac{\exp(m_{ij}(r))}{\sum_{\hat{r} \in M} \exp(m_{ij}(\hat{r}))}$$
$$L_{span} = -\frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{r=1}^{M} \hat{y}_{ij}(r) \log y_{ij}(r)$$
where $\hat{y}_{ij}(r)$ and $y_{ij}(r)$ denote the gold and predicted entity type distributions, respectively, and $L_{span}$ is the entity classification loss of our model.
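A compact sketch of this classification step: fusing the two score tensors, normalizing each span's scores with a softmax, and averaging the cross-entropy over all N² spans. Tensor shapes are assumptions for illustration:

```python
import numpy as np

def span_loss(V, F, gold):
    """V, F: (N, N, M) span score tensors from the biaffine and convolution
    branches; gold: (N, N) integer gold type index per span.
    Returns (probs, loss)."""
    scores = V + F                                         # M = V + F
    z = scores - scores.max(axis=-1, keepdims=True)        # stabilized softmax
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    N = V.shape[0]
    # cross-entropy of the gold type, averaged over all N*N spans
    loss = -np.log(probs[np.arange(N)[:, None],
                         np.arange(N)[None, :], gold]).mean()
    return probs, loss
```

At inference time, `probs.argmax(axis=-1)` gives the predicted type per span, matching the highest-probability decision rule above.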

4. Experiments

In this section, we conduct extensive experiments on three nested entity recognition and two flat entity recognition datasets.

4.1. Datasets

We begin with an introduction to the five datasets used in the experiment:
ACE2004 [38] is a dataset issued by the Linguistic Data Consortium (LDC) and is available in English, Arabic, and Chinese; we use the English portion. The dataset was built to develop automatic content extraction techniques that support the automated processing of human language in textual form, and it can be used for nested entity recognition.
ACE2005 [39] is also published by the LDC and likewise includes English, Arabic, and Chinese training data; we again use the English portion. It supports nested NER and relation extraction tasks. The dataset contains 145,000 English words with six entity types: person name, organization name, location name, time, currency, and percentage.
The GENIA [27] dataset was extracted from the biomedical literature and informs the development of bio-text mining systems. It contains 1999 abstracts collected from the MEDLINE database using three medical subject terms: human, blood cell, and transcription factor. It can be used to develop and evaluate natural language processing (NLP) algorithms and tools, such as text classification and named entity recognition.
MSRA [40] is a dataset on Chinese NER from Microsoft Research Asia (MSRA), which contains over 50,000 Chinese entity recognition annotations in the following entity categories: place name, institution name, and person name.
Resume [41] is a CV-oriented Chinese NER dataset published by Purdue University, which contains eight entity types, including country (CONT), educational background (EDU), location name (LOC), person name (NAME), organization name (ORG), profession (PRO), race (RACE), and title (TITLE).

4.2. Baselines

Xie et al. [42]: detection and identification of entities on multi-granularity feature information.
Luan et al. [43]: NER via dynamic span diagrams.
Straková et al. [44]: linearization of multiple labels of nested entities into one label followed by sequence labeling methods.
Tan et al. [33]: a span-based nested entity recognition model using boundary enhancement.
Fu et al. [45]: treats nested NER as constituency parsing with partially observed trees.
Wang et al. [13]: nested entity recognition using a stacked model in the shape of a pyramid.
Xu et al. [35]: construction of nested entity recognition models for span representation using additive attention mechanisms.
Gao et al. [34]: training using biaffine attention mechanisms and multi-task learning for boundary recognition.
Zhang and Yang [41]: a model for Chinese entity recognition tasks that explicitly utilizes word information and word order information is proposed.
Yan et al. [46]: an NER framework using an adaptive transformer encoder is proposed for modeling character-level features and word-level features.
Gui et al. [47]: a dictionary-based graph neural network model is used to deal with the Chinese entity recognition problem.
Kong et al. [48]: a multi-level CNN was constructed to capture both short-term and long-term contextual information for entity recognition.
Wu et al. [49]: a new lexicon enhancement method is proposed that effectively mitigates the excessive memory and computational costs caused by previous uses of lexicons.

4.3. Hyperparameters and Evaluation Indicators

All our experiments were performed on a machine with an Intel(R) Xeon(R) Silver 4116 CPU @ 2.10 GHz and two NVIDIA Tesla T4 GPU cards with 16 GB of video memory each, and the models were implemented in the PyTorch 1.12.0 framework.
In our model, we use the BERT-Large-Cased model for the ACE2004 and ACE2005 datasets, the dmis-lab/biobert-v1.1 model for the GENIA dataset, and the Chinese_Roberta_wwm_ext model for the Resume and MSRA datasets. The hyperparameters for the five datasets are shown in Table 1, and we keep them consistent with the baseline models. During tuning, we mainly adjust the batch size and learning rate to find the most suitable values for the model, and we test the results with the best model on the validation set.
We evaluate the performance of the model based on the precision (P), recall (R), and F1 values, and these three values are calculated as follows:
$$P = \frac{\text{number of cases predicted positive that are actually positive}}{\text{number of cases predicted positive}}$$
$$R = \frac{\text{number of cases predicted positive that are actually positive}}{\text{number of cases that are actually positive}}$$
$$F1 = \frac{2PR}{P + R}$$
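In span-based NER, these metrics are typically computed over (start, end, type) triples, counting a prediction as correct only when both boundaries and the type match. A minimal sketch (exact-match evaluation is an assumption consistent with common practice, not a detail stated in the paper):

```python
def span_prf(pred, gold):
    """pred, gold: sets of (start, end, type) triples.
    A span counts as correct only if boundaries and type all match."""
    tp = len(pred & gold)                      # true positives
    p = tp / len(pred) if pred else 0.0        # precision
    r = tp / len(gold) if gold else 0.0        # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For example, predicting {(0, 1, 'PER'), (2, 4, 'ORG')} against gold {(0, 1, 'PER'), (3, 4, 'ORG')} yields P = R = F1 = 0.5, since the ORG span's boundary is wrong.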

4.4. Results

The performance of our model on the nested entity recognition task is shown in Table 2. The results demonstrate that our model effectively improves nested entity recognition: it achieves F1 values of 86.66%, 85.83%, and 81.33% on the three datasets, which are 0.06%, 0.43%, and 1.32% higher than the state-of-the-art baselines, respectively. In particular, Gao et al. [34] used a multi-task learning model with a biaffine attention mechanism and boundary recognition for nested entity recognition, yet our model performs notably better on all three datasets, demonstrating that relative position features and local features within the entity are more effective for nested entity recognition.
Table 3 shows the results of our model and the baseline model on the flat entity recognition task. Our model achieves the best performance on the MSRA and Resume datasets with F1 values of 95.79% and 96.47%, precision of 95.83% and 96.62%, and recall of 95.76% and 96.32%, respectively. The comparison models used are all based on the sequence labeling method, but our model not only improves in precision but also increases in recall. This proves that our method is applicable not only to nested entities but also to flat entities.
To demonstrate the importance of relative position features and local features in the entity recognition task, ablation experiments were carried out according to entity type. The results of the ablation experiments on the ACE2004 dataset are shown in Table 4, where BAM indicates that only the biaffine attention mechanism is used; RoPE-BAM indicates that the capture of relative position features is achieved by rotary position embedding; and CBAM-RoPE-BAM indicates that the local features are enhanced using the convolution module on top of RoPE-BAM. The results show that the F1 values of CBAM-RoPE-BAM and RoPE-BAM are 1.67% and 0.49% higher than those of BAM, respectively, and the performance is improved for all seven types of entity recognition. We believe this is because the relative position feature can effectively improve the precision of the model. At the same time, the local features within the entity can help the model to perform entity recognition more comprehensively. As a result, the recall of the model has also improved.
In addition, we also conducted ablation experiments on the flat entity recognition dataset Resume, with the results shown in Table 5. The CBAM-RoPE-BAM model improved the F1 values on several entity categories, and precision was significantly enhanced on several entity types, with improvements of 3.3% and 2.72% on the TITLE and ORG types, respectively. However, the recall of the RoPE-BAM model dropped dramatically, to only 50%, on the LOC entity type. We believe this is because the number of LOC entities in the training set is too small: their average length is about 5, but only half of the LOC entities in the test set have length 5, which leads to overfitting. The ablation results on all datasets show that our proposed method is effective for nested NER, flat NER, Chinese NER, and English NER tasks, demonstrating the generalization ability of the CBAM-RoPE-BAM model.
We explored the effect of the number of convolutional modules on the results for the GENIA dataset, as shown in Figure 3, where N denotes the number of convolution modules. The best performance is achieved with N = 5 convolution modules, and even a single convolution module outperforms the model without one, demonstrating that convolution modules effectively capture local entity features for entity classification.

5. Conclusions

In this paper, we propose a novel nested entity recognition model to address the single-feature design and poor generalization that lead to low recall in current methods. We regard the length of an entity as an important factor in entity classification and therefore use rotary position embedding to capture the relative position features between the head and tail tokens. Since the span-based method ignores the internal features of an entity, we use a convolution module to enhance the local features. Extensive experiments on five benchmark datasets show that the relative position features improve the precision of entity recognition, while the local features inside the entity effectively improve the generalization ability of the span-based model and thus increase its recall.

Author Contributions

Conceptualization and methodology, J.D.; software, J.D.; validation, J.D., J.L. and X.Q.; formal analysis, J.D. and J.L.; investigation, J.D. and X.M.; resources, J.D.; data curation, X.Q.; writing—original draft preparation, J.D.; writing—review and editing, J.D., X.Q. and Z.J.; visualization, J.L. and X.M.; supervision, X.Q. and Z.J.; project administration, X.Q. and Z.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the research and application of food safety risk prevention and control and big data smart supervision technology in Xinjiang Uygur Autonomous Region (2020A03001).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, X.Y.; Ting, W.; Chen, H.W. Research on named entity recognition. Comput. Sci. 2005, 32, 44–48. [Google Scholar]
  2. Hahne, A.; Angela, D.F. Electrophysiological evidence for two steps in syntactic analysis: Early automatic and late controlled processes. J. Cogn. Neurosci. 1999, 11, 194–205. [Google Scholar] [CrossRef] [PubMed]
  3. Babych, B.; Anthony, H. Improving machine translation quality with automatic named entity recognition. In Proceedings of the 7th International EAMT Workshop on MT and Other Language Technology Tools, Improving MT through Other Language Technology Tools, Resource and Tools for Building MT at EACL 2003, Budapest, Hungary, 13 April 2003. [Google Scholar]
  4. Soricut, R.; Eric, B. Automatic question answering: Beyond the factoid. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, Boston, MA, USA, 2–7 May 2004. [Google Scholar]
  5. Zhang, J.; Xie, J.; Hou, W.; Tu, X.; Xu, J.; Song, F.; Lu, Z. Mapping the knowledge structure of research on patient adherence: Knowledge domain visualization based co-word analysis and social network analysis. PLoS ONE 2012, 7, e34497. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Rau, L.F. Extracting company names from text. In Proceedings of the Seventh IEEE Conference on Artificial Intelligence Applications, Miami Beach, FL, USA, 24–28 February 1991; IEEE Computer Society: Washington, DC, USA, 1991. [Google Scholar]
  7. Petasis, G.; Vichot, F.; Wolinski, F.; Paliouras, G.; Karkaletsis, V.; Spyropoulos, C.D. Using machine learning to maintain rule-based named-entity recognition and classification systems. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France, 6–11 July 2001; pp. 426–433. [Google Scholar]
  8. Li, W.; McCallum, A. Rapid development of Hindi named entity recognition using conditional random fields and feature induction. ACM Trans. Asian Lang. Inf. Process. (TALIP) 2004, 2, 290–294. [Google Scholar] [CrossRef] [Green Version]
  9. Mikheev, A.; Moens, M.; Grover, C. Named entity recognition without gazetteers. In Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics, Bergen, Norway, 8–12 June 1999; pp. 1–8. [Google Scholar]
10. Peng, N.; Dredze, M. Improving named entity recognition for Chinese social media with word segmentation representation learning. arXiv 2016, arXiv:1603.00786. [Google Scholar]
  11. Dong, X.; Qian, L.; Guan, Y.; Huang, L.; Yu, Q.; Yang, J. A multiclass classification method based on deep learning for named entity recognition in electronic medical records. In Proceedings of the 2016 New York Scientific Data Summit (NYSDS), New York, NY, USA, 14–17 August 2016; pp. 1–10. [Google Scholar]
  12. Shao, Y.; Hardmeier, C.; Nivre, J. Multilingual named entity recognition using hybrid neural networks. In Proceedings of the Sixth Swedish Language Technology Conference (SLTC), Umeå, Sweden, 17–18 November 2016. [Google Scholar]
  13. Wang, J.; Shou, L.; Chen, K.; Chen, G. Pyramid: A layered model for nested named entity recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5918–5928. [Google Scholar]
  14. Ju, M.; Miwa, M.; Ananiadou, S. A neural layered model for nested named entity recognition. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Volume 1, pp. 1446–1459. [Google Scholar]
  15. Lu, W.; Roth, D. Joint mention extraction and classification with mention hypergraphs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 857–867. [Google Scholar]
  16. Katiyar, A.; Cardie, C. Nested named entity recognition revisited. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Volume 1. [Google Scholar]
  17. Sohrab, M.G.; Miwa, M. Deep exhaustive model for nested named entity recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2843–2849. [Google Scholar]
  18. Shen, Y.; Ma, X.; Tan, Z.; Zhang, S.; Wang, W.; Lu, W. Locate and label: A two-stage identifier for nested named entity recognition. arXiv 2021, arXiv:2105.06804. [Google Scholar]
19. Zheng, C.; Cai, Y.; Xu, J.; Leung, H.F.; Xu, G. A boundary-aware neural model for nested named entity recognition. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019. [Google Scholar]
  20. Yuan, Z.; Tan, C.; Huang, S.; Huang, F. Fusing heterogeneous factors with triaffine mechanism for nested named entity recognition. arXiv 2021, arXiv:2110.07480. [Google Scholar]
21. Su, J.; Lu, Y.; Pan, S.; Murtadha, A.; Wen, B.; Liu, Y. RoFormer: Enhanced transformer with rotary position embedding. arXiv 2021, arXiv:2104.09864. [Google Scholar]
  22. Dozat, T.; Manning, C.D. Deep biaffine attention for neural dependency parsing. arXiv 2016, arXiv:1611.01734. [Google Scholar]
  23. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar]
  24. Krogh, A.; Larsson, B.; Von Heijne, G.; Sonnhammer, E.L. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 2001, 305, 567–580. [Google Scholar] [CrossRef] [PubMed] [Green Version]
25. Ratinov, L.; Roth, D. Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009), Boulder, CO, USA, 4–5 June 2009; pp. 147–155. [Google Scholar]
  26. Konkol, M.; Konopík, M. CRF-based Czech named entity recognizer and consolidation of Czech NER research. In Text, Speech, and Dialogue: 16th International Conference, TSD 2013, Pilsen, Czech Republic, 1–5 September 2013; Proceedings 16; Springer: Berlin/Heidelberg, Germany, 2013; pp. 153–160. [Google Scholar]
27. Kim, J.D.; Ohta, T.; Tateisi, Y.; Tsujii, J.I. GENIA corpus—A semantically annotated corpus for bio-textmining. Bioinformatics 2003, 19 (Suppl. 1), i180–i182. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Jiang, J.; Cheng, M.; Liu, Q.; Li, Z.; Chen, E. Nested Named Entity Recognition from Medical Texts: An Adaptive Shared Network Architecture with Attentive CRF. In CAAI International Conference on Artificial Intelligence, Beijing, China, 27–28 August 2022; Springer Nature: Cham, Switzerland, 2022; pp. 248–259. [Google Scholar]
  29. Wang, B.; Lu, W.; Wang, Y.; Jin, H. A neural transition-based model for nested mention recognition. arXiv 2018, arXiv:1810.01808. [Google Scholar]
  30. Li, X.; Feng, J.; Meng, Y.; Han, Q.; Wu, F.; Li, J. A unified MRC framework for named entity recognition. arXiv 2019, arXiv:1910.11476. [Google Scholar]
  31. Zhu, E.; Li, J. Boundary smoothing for named entity recognition. arXiv 2022, arXiv:2204.12031. [Google Scholar]
  32. Yu, J.; Bohnet, B.; Poesio, M. Named entity recognition as dependency parsing. arXiv 2020, arXiv:2005.07150. [Google Scholar]
  33. Tan, C.; Qiu, W.; Chen, M.; Wang, R.; Huang, F. Boundary enhanced neural span classification for nested named entity recognition. Proc. AAAI Conf. Artif. Intell. 2020, 34, 9016–9023. [Google Scholar] [CrossRef]
  34. Gao, W.; Li, Y.; Guan, X.; Chen, S.; Zhao, S. Research on Named Entity Recognition Based on Multi-Task Learning and Biaffine Mechanism. Comput. Intell. Neurosci. 2022, 2022, 2687615. [Google Scholar] [CrossRef]
  35. Xu, Y.; Huang, H.; Feng, C.; Hu, Y. A supervised multi-head self-attention network for nested named entity recognition. Proc. AAAI Conf. Artif. Intell. 2021, 35, 14185–14193. [Google Scholar] [CrossRef]
36. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  37. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
38. Mitchell, A.; Strassel, S.; Huang, S.; Zakhary, R. ACE 2004 Multilingual Training Corpus LDC2005T09; Web Download; Linguistic Data Consortium: Philadelphia, PA, USA, 2005. [Google Scholar]
  39. Walker, C.; Strassel, S.; Medero, J.; Maeda, K. ACE 2005 Multilingual Training Corpus LDC2006T06; Web Download; Linguistic Data Consortium: Philadelphia, PA, USA, 2006. [Google Scholar]
  40. Levow, G.A. The third international Chinese language processing bakeoff: Word segmentation and named entity recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, Sydney, Australia, 22–23 July 2006; pp. 108–117. [Google Scholar]
  41. Zhang, Y.; Yang, J. Chinese NER using lattice LSTM. arXiv 2018, arXiv:1805.02023. [Google Scholar]
  42. Xia, C.; Zhang, C.; Yang, T.; Li, Y.; Du, N.; Wu, X.; Yu, P. Multi-grained named entity recognition. arXiv 2019, arXiv:1906.08449. [Google Scholar]
  43. Luan, Y.; Wadden, D.; He, L.; Shah, A.; Ostendorf, M.; Hajishirzi, H. A general framework for information extraction using dynamic span graphs. arXiv 2019, arXiv:1904.03296. [Google Scholar]
  44. Straková, J.; Straka, M.; Hajič, J. Neural architectures for nested NER through linearization. arXiv 2019, arXiv:1908.06926. [Google Scholar]
45. Fu, Y.; Tan, C.; Chen, M.; Huang, S.; Huang, F. Nested named entity recognition with partially-observed TreeCRFs. Proc. AAAI Conf. Artif. Intell. 2021, 35, 12839–12847. [Google Scholar] [CrossRef]
  46. Yan, H.; Deng, B.; Li, X.; Qiu, X. TENER: Adapting transformer encoder for named entity recognition. arXiv 2019, arXiv:1911.04474. [Google Scholar]
  47. Gui, T.; Zou, Y.; Zhang, Q.; Peng, M.; Fu, J.; Wei, Z.; Huang, X.J. A lexicon-based graph neural network for Chinese NER. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 1040–1050. [Google Scholar]
  48. Kong, J.; Zhang, L.; Jiang, M.; Liu, T. Incorporating multi-level CNN and attention mechanism for Chinese clinical named entity recognition. J. Biomed. Inform. 2021, 116, 103737. [Google Scholar] [CrossRef]
49. Wu, S.; Song, X.; Feng, Z.; Wu, X. NFLAT: Non-flat-lattice transformer for Chinese named entity recognition. arXiv 2022, arXiv:2205.05832. [Google Scholar]
Figure 1. Sentences with nested entities.
Figure 2. The structure of our model. The model mainly consists of four parts: encoder, head–tail pair representation module, local feature representation module, and entity classification. Label 2 in the figure implies that the span (mouse, alpha) is a protein entity. Similarly, label 1 implies that the span (mouse, gene) is a DNA entity.
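The head–tail pair representation module in Figure 2 combines rotary position embedding [21] with biaffine attention [22]. As a rough numpy illustration of these two building blocks (not the authors' code; dimensions and weights here are arbitrary), the sketch below shows that RoPE makes a head–tail dot product depend only on the relative offset between the two positions, and how a biaffine map scores a single (head, tail) pair:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate consecutive dimension pairs of x by position-dependent
    angles (rotary position embedding, Su et al. [21])."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair frequencies
    ang = pos * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def biaffine(head, tail, U, W, b):
    """Score one (head, tail) token pair: head^T U tail + W [head; tail] + b."""
    return head @ U @ tail + W @ np.concatenate([head, tail]) + b

rng = np.random.default_rng(0)
d = 8
q, k = rng.normal(size=d), rng.normal(size=d)
# The rotated inner product depends only on the relative offset (here 3),
# not on the absolute positions:
s1 = rope(q, 2) @ rope(k, 5)
s2 = rope(q, 10) @ rope(k, 13)
assert np.isclose(s1, s2)
```

Because the rotations cancel up to the offset, s1 equals s2 even though the absolute positions differ, which is what lets the biaffine scorer in the model see the relative distance between head and tail tokens.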
Figure 3. Effect of the number of convolution modules on the results for the GENIA dataset.
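The convolution modules whose depth is varied in Figure 3 follow the convolutional block attention design of Woo et al. [37]: channel attention computed from pooled descriptors, then spatial attention over the reweighted feature map. The numpy sketch below is illustrative only; the paper's module operates on span feature grids, and CBAM's 7 × 7 spatial convolution is replaced here by a simple learned mixing of the two pooled maps:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam(x, w1, w2, w_sp):
    """CBAM-style attention (Woo et al. [37]) on x of shape (C, H, W):
    channel attention from avg- and max-pooled descriptors through a
    shared two-layer MLP, then spatial attention from channel-wise
    avg/max maps (1x1 mixing stands in for CBAM's 7x7 convolution)."""
    avg_c = x.mean(axis=(1, 2))                    # (C,) avg-pooled descriptor
    max_c = x.max(axis=(1, 2))                     # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # shared MLP with ReLU
    ch_att = sigmoid(mlp(avg_c) + mlp(max_c))      # (C,) channel weights
    x = x * ch_att[:, None, None]
    avg_s = x.mean(axis=0)                         # (H, W) channel-avg map
    max_s = x.max(axis=0)                          # (H, W) channel-max map
    sp_att = sigmoid(w_sp[0] * avg_s + w_sp[1] * max_s)
    return x * sp_att[None, :, :]

rng = np.random.default_rng(1)
C, H, W = 4, 3, 3
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // 2, C))   # channel-MLP bottleneck weights
w2 = rng.normal(size=(C, C // 2))
w_sp = rng.normal(size=2)           # spatial mixing weights
y = cbam(x, w1, w2, w_sp)
assert y.shape == x.shape
```

Since both attention maps lie in (0, 1), the block only reweights features; stacking several such modules, as Figure 3 explores, deepens the local feature extraction without changing the span representation's shape.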
Table 1. Hyperparameter settings.
Dataset | Epochs | Batch Size | Learning Rate | BERT Learning Rate | LSTM Layers | LSTM Size | Start/Tail FFNN Size | Warm Factor | Gradient Clipping
ACE2004 | 15 | 8 | 5 × 10⁻⁴ | 5 × 10⁻⁶ | 1 | 768 | 384 | 0.1 | 1.0
ACE2005 | 15 | 12 | 5 × 10⁻⁴ | 5 × 10⁻⁶ | 1 | 768 | 384 | 0.1 | 1.0
GENIA | 10 | 8 | 5 × 10⁻⁴ | 5 × 10⁻⁶ | 1 | 512 | 256 | 0.1 | 5.0
MSRA | 10 | 6 | 1 × 10⁻³ | 5 × 10⁻⁶ | 1 | 512 | 256 | 0.1 | 5.0
Resume | 10 | 12 | 5 × 10⁻³ | 5 × 10⁻⁶ | 1 | 512 | 256 | 0.1 | 5.0
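Laid out as a configuration mapping (a hypothetical layout: the values come from Table 1, but the key names and structure are ours, not released code), the per-dataset settings are:

```python
# Per-dataset training hyperparameters, transcribed from Table 1.
# Key names are illustrative; the paper does not publish a config file.
HYPERPARAMS = {
    "ACE2004": dict(epochs=15, batch_size=8, lr=5e-4, bert_lr=5e-6,
                    lstm_layers=1, lstm_size=768, ffnn_size=384,
                    warm_factor=0.1, grad_clip=1.0),
    "ACE2005": dict(epochs=15, batch_size=12, lr=5e-4, bert_lr=5e-6,
                    lstm_layers=1, lstm_size=768, ffnn_size=384,
                    warm_factor=0.1, grad_clip=1.0),
    "GENIA":   dict(epochs=10, batch_size=8, lr=5e-4, bert_lr=5e-6,
                    lstm_layers=1, lstm_size=512, ffnn_size=256,
                    warm_factor=0.1, grad_clip=5.0),
    "MSRA":    dict(epochs=10, batch_size=6, lr=1e-3, bert_lr=5e-6,
                    lstm_layers=1, lstm_size=512, ffnn_size=256,
                    warm_factor=0.1, grad_clip=5.0),
    "Resume":  dict(epochs=10, batch_size=12, lr=5e-3, bert_lr=5e-6,
                    lstm_layers=1, lstm_size=512, ffnn_size=256,
                    warm_factor=0.1, grad_clip=5.0),
}
```

The nested datasets (ACE2004, ACE2005) use the larger LSTM and FFNN sizes with tighter gradient clipping, while the BERT learning rate is held fixed across all five datasets.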
Table 2. Comparison of our model with the baseline model on the nested entity recognition dataset. The highest score is marked in bold.
Model | ACE2004 P / R / F1 (%) | ACE2005 P / R / F1 (%) | GENIA P / R / F1 (%)
Xia et al. [42] | 81.70 / 77.40 / 79.50 | 79.00 / 77.30 / 78.20 | – / – / –
Luan et al. [43] | – / – / 84.70 | – / – / 82.90 | – / – / 72.60
Straková et al. [44] | – / – / 84.33 | – / – / 83.42 | – / – / 78.20
Tan et al. [33] | 85.80 / 84.80 / 85.30 | 83.80 / 83.90 / 83.90 | 79.20 / 77.40 / 78.30
Wang et al. [13] | 86.08 / 86.48 / 86.20 | 83.95 / 85.93 / 84.60 | 79.45 / 78.94 / 79.10
Fu et al. [45] | 86.70 / 86.50 / 86.60 | 84.50 / 86.40 / 85.40 | 78.20 / 78.20 / 78.20
Xu et al. [35] | 86.90 / 85.80 / 86.30 | 85.70 / 85.20 / 85.40 | 80.30 / 78.90 / 79.60
Gao et al. [34] | 84.88 / 85.78 / 85.33 | 84.23 / 86.15 / 85.18 | 80.62 / 80.68 / 80.65
Ours | 86.96 / 86.36 / 86.66 | 84.94 / 86.73 / 85.83 | 82.35 / 80.33 / 81.33
Table 3. Comparison of our model and the baseline model on the flat entity recognition dataset. The highest score is marked in bold.
Model | MSRA P / R / F1 (%) | Resume P / R / F1 (%)
Zhang and Yang [41] | 93.57 / 92.79 / 93.18 | 94.81 / 94.11 / 94.46
Yan et al. [46] | – / – / 92.74 | – / – / 95.00
Gui et al. [47] | 94.50 / 92.93 / 93.71 | 95.37 / 94.84 / 95.11
Kong et al. [48] | 93.51 / 92.51 / 93.01 | 94.69 / 95.21 / 94.95
Wu et al. [49] | 94.92 / 94.19 / 94.55 | 95.63 / 95.52 / 95.58
Ours | 95.83 / 95.76 / 95.79 | 96.62 / 96.32 / 96.47
Table 4. Results of the ablation experiments on the ACE2004 dataset. The highest score is marked in bold.
Entity Type | BAM P / R / F1 (%) | RoPE-BAM P / R / F1 (%) | CBAM-RoPE-BAM P / R / F1 (%)
GPE | 85.27 / 86.93 / 86.09 | 89.33 / 89.33 / 86.51 | 88.40 / 86.93 / 87.66
ORG | 77.12 / 80.62 / 78.83 | 80.94 / 81.52 / 81.23 | 81.14 / 82.61 / 81.87
PER | 89.61 / 90.39 / 90.00 | 90.82 / 90.52 / 90.67 | 90.48 / 90.12 / 90.30
LOC | 65.49 / 70.48 / 67.89 | 72.73 / 76.19 / 74.42 | 68.91 / 78.10 / 73.21
FAC | 70.93 / 54.46 / 61.62 | 69.07 / 59.82 / 64.11 | 77.08 / 66.07 / 71.15
VEH | 83.33 / 88.24 / 85.71 | 82.35 / 82.35 / 82.35 | 85.00 / 100.0 / 91.89
WEA | 65.52 / 59.38 / 62.30 | 84.21 / 50.00 / 62.72 | 94.44 / 53.12 / 68.00
Total | 84.55 / 85.44 / 84.99 | 87.16 / 85.21 / 86.17 | 86.96 / 86.36 / 86.66
Table 5. Results of the ablation experiments on the Resume dataset. The highest score is marked in bold.
Entity Type | BAM P / R / F1 (%) | RoPE-BAM P / R / F1 (%) | CBAM-RoPE-BAM P / R / F1 (%)
NAME | 100.0 / 99.12 / 99.56 | 99.11 / 99.11 / 99.11 | 99.12 / 100.0 / 99.56
CONT | 100.0 / 100.0 / 100.0 | 100.0 / 96.43 / 98.18 | 96.43 / 96.43 / 96.43
RACE | 100.0 / 100.0 / 100.0 | 93.33 / 100.0 / 96.55 | 100.0 / 100.0 / 100.0
TITLE | 93.69 / 96.11 / 94.88 | 97.34 / 94.95 / 96.16 | 96.99 / 96.11 / 96.55
EDU | 95.69 / 99.11 / 97.37 | 99.10 / 98.21 / 98.65 | 95.69 / 99.11 / 97.37
ORG | 94.33 / 96.20 / 95.26 | 95.15 / 95.84 / 95.50 | 97.05 / 95.30 / 96.17
PRO | 84.62 / 100.0 / 91.67 | 93.33 / 84.85 / 88.89 | 78.59 / 100.0 / 88.00
LOC | 75.00 / 100.0 / 85.71 | 100.0 / 50.00 / 66.67 | 100.0 / 66.67 / 80.00
Total | 94.27 / 96.81 / 95.52 | 96.77 / 95.46 / 96.11 | 96.62 / 96.32 / 96.47
Deng, J.; Liu, J.; Ma, X.; Qin, X.; Jia, Z. Local Feature Enhancement for Nested Entity Recognition Using a Convolutional Block Attention Module. Appl. Sci. 2023, 13, 9200. https://doi.org/10.3390/app13169200