Article

FSN: Joint Entity and Relation Extraction Based on Filter Separator Network

Qicai Dai, Wenzhong Yang, Fuyuan Wei, Liang He and Yuanyuan Liao
1 School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China
2 Xinjiang Key Laboratory of Multilingual Information Technology, Xinjiang University, Urumqi 830017, China
3 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
* Author to whom correspondence should be addressed.
Entropy 2024, 26(2), 162; https://doi.org/10.3390/e26020162
Submission received: 31 December 2023 / Revised: 1 February 2024 / Accepted: 6 February 2024 / Published: 12 February 2024

Abstract

Joint entity and relation extraction methods have attracted increasing attention recently due to their capacity to extract relational triples from intricate texts. However, most existing methods ignore the association and difference between the Named Entity Recognition (NER) subtask features and the Relation Extraction (RE) subtask features, which leads to an imbalance in the interaction between these two subtasks. To solve this problem, we propose a new joint entity and relation extraction method, FSN. It contains a Filter Separator Network (FSN) module that employs a bidirectional LSTM to filter and separate the information contained in a sentence and merges similar features through a splicing operation, thus solving the problem of interaction imbalance between subtasks. To better extract the local feature information for each subtask, we designed a Named Entity Recognition Generation (NERG) module and a Relation Extraction Generation (REG) module by adopting the design idea of the decoder in Transformer together with pooling operations, so as to better capture the entity boundary information in the sentence and the entity pair boundary information for each relation in the relational triple, respectively. Additionally, we propose a dynamic loss function that dynamically adjusts the learning weight of each subtask in each epoch according to the proportion of each subtask's loss, thus narrowing the gap between the ideal and actual results. We thoroughly evaluated our model on the SciERC dataset and the ACE2005 dataset. The experimental results demonstrate that our model achieves satisfactory results compared to the baseline models.

1. Introduction

Joint entity and relation extraction aims to extract both entities and relations from a given text and to link entities semantically through relations, presenting the relational triples in the text in the form (s, r, o). As a subtask of information extraction, joint entity and relation extraction provides theoretical and technical support for many research areas, such as knowledge graph construction [1], text summarization [2], and question answering [3].
The majority of the early research on Named Entity Recognition (NER) and Relation Extraction (RE) was carried out with pipeline-based methods, such as the models proposed by Zelenko et al. [4] in 2002, Zhou et al. [5] in 2005, and Chan and Roth [6] in 2011. However, this approach has two fatal drawbacks. First, it separates the two subtasks of NER and RE without taking into account the interaction between them. Second, it generally performs the NER task before the RE task, so it is susceptible to error propagation [7].
In order to address problems that are difficult to solve with conventional pipeline-based methods, researchers have begun to explore joint entity and relation extraction methods, such as the models proposed by Yan et al. [8] in 2021, Ma et al. [9] in 2022, and Ma et al. [10] in 2022. Although these methods have made much progress in joint entity and relation extraction, they ignore the association and difference between the NER subtask features and the RE subtask features, which leads to an imbalance in the interaction between these two subtasks. As shown in Figure 1, the NER subtask features and RE subtask features have partial overlap in the input features. If these two features are not effectively separated, it can lead to the over-training of one subtask and the inadequate extraction of features for the other subtask.
Therefore, to address the above issues, we propose a new joint entity and relation extraction method, FSN. To balance the subtask interactions, we design a Filter Separator Network (FSN) module, which first filters the hidden state information and the memory state information in the sentence through LSTMs running in both directions, and then separates the fused state information of the sentence into features related only to NER, features shared by the two subtasks, and features related only to RE. Finally, the features relevant to the NER task and those relevant to the RE task are obtained through a splicing operation. To better extract the local feature information of the two subtasks separately, we adopt the design idea of the decoder in Transformer together with pooling operations and design a Named Entity Recognition Generation (NERG) module to capture the boundary information of all entities in a sentence, as well as a Relation Extraction Generation (REG) module to capture the entity pair boundary information corresponding to each relation in a sentence. We evaluated our model on the ACE2005 and SciERC datasets. Extensive experiments demonstrate that our model outperforms the baseline models.
In summary, our contributions are as follows:
(1)
We propose an FSN module that employs a bidirectional LSTM to filter and separate the information contained in sentences, as well as a splicing operation to merge similar features, thus solving the problem of interaction imbalance between subtasks in joint entity and relation extraction.
(2)
We propose a NERG module and a REG module, which adopt pooling operations and the design idea of the decoder in Transformer to better capture the boundary information of all entities in a sentence and the entity pair boundary information corresponding to each relation in a sentence, respectively, thus enabling better extraction of local feature information for each subtask.
(3)
We propose a dynamic loss function that dynamically adjusts the learning weight of each subtask in each epoch according to the proportion of each subtask's loss, thus narrowing the gap between the ideal and actual results.
(4)
We conducted extensive experiments on the ACE2005 and SciERC datasets, which demonstrate that our method achieves better results than the baseline models. Further ablation studies and analyses confirm the validity of each module in our model.

2. Related Work

Before joint entity and relation extraction methods were explored, the majority of early research used pipeline-based methods, such as those of Zelenko et al. [4] in 2002, Zhou et al. [5] in 2005, and Chan and Roth [6] in 2011. This approach splits the problem into two separate tasks, NER and RE: it first extracts every entity from the input text and then predicts the relations between every pair of entities. Nevertheless, it suffers from two significant flaws. First, it divides the two tasks of NER and RE without taking into account their interaction, and second, it is susceptible to error propagation [7].
In order to address the issues of conventional pipeline-based methods, researchers have begun to explore joint entity and relation extraction methods. These can be divided into two main categories: feature engineering-based methods and neural network-based methods.
The feature engineering-based method first transforms the raw data into features that express the essence of the problem and then applies these features to the model to improve the model performance, such as in the models proposed by Kate et al. [11] in 2010, Yu et al. [12] in 2010, Miwa et al. [13] in 2014, and so on. However, this method relies heavily on Natural Language Processing (NLP) tools in the process of acquiring features, requires a large amount of manpower and specialized domain knowledge, and suffers from the same problem of error propagation, which ultimately affects the results of joint extraction.
Due to the excellent feature learning ability of neural networks [14], neural network-based methods have gradually been applied to joint entity and relation extraction. We group these methods into two primary categories according to the research lines adopted by current neural network-based methods.
Shared parameters-based methods. These methods allow each subtask to have an independent decoder, and information interaction is achieved by letting subtasks share sequence-encoding information among themselves, such as the models proposed by Miwa et al. [15] in 2016, Dai et al. [16] in 2019, Yuan et al. [17] in 2020, Shen et al. [18] in 2021, Xiong et al. [19] in 2022, and so on. However, it is exceptionally difficult for such methods to explore the interaction between two subtasks in depth.
Joint decoding-based methods. These methods usually superimpose a unified decoder on the sequence encoding layer, which is directly decoded to obtain the relational triple information. Examples include the models proposed by Wang et al. [20] in 2020, Ren et al. [21] in 2021, Yan et al. [8] in 2021, Ma et al. [9] in 2022, Ma et al. [10] in 2022, and so on. However, these methods require the design of complex decoding architectures, which prevents each subtask from adequately extracting local features.
It can be seen that both of the above methods have fatal flaws and cannot effectively solve the problem raised in this paper. Therefore, we designed a Filter Separator Network. It first filters the hidden state information and the memory state information passed from each word to the next in both the forward and reverse directions of the sentence; it then adopts the idea of partitioning to classify the fusion state information into features related only to NER, features shared by the two subtasks, and features related only to RE; and it finally achieves balanced interaction between the two subtasks through a splicing operation. In addition, we designed the NERG and REG modules to further capture the local feature information in the NER and RE tasks, respectively. We conducted extensive experiments on the ACE2005 and SciERC datasets, and the experimental results demonstrate the validity of our model design.

3. Methodology

We describe our model design in this section. The general structure of our model is shown in Figure 2; it consists of an Encoder module, a Filter Separator Network (FSN) module, a Named Entity Recognition Generation (NERG) module, and a Relation Extraction Generation (REG) module. For each given sentence $S = \omega_1 \omega_2 \cdots \omega_n$, we first generate the sentence representation through the Encoder module, then feed it to the FSN module to obtain the information related to NER and the information related to RE, and finally feed these two kinds of information into the NERG module and the REG module, respectively, so as to extract the entities and the relational triples in the sentence.

3.1. Encoder Module

Here, we use the pre-trained model BERT-Base-Cased [22] as the encoder of our model. For each given sentence, the module first encodes the sentence into a sequence of token representations (denoted $H \in \mathbb{R}^{n \times d_h}$). For the NERG module, we pass the token representation sequence $H$ generated by the encoder through two independent FFNs (Feed-Forward Networks) to generate the feature $H_{e1}$ representing the start boundary of the entity and the feature $H_{e2}$ representing the end boundary of the entity, respectively, as expressed in Equation (1).
$$H_{e1} = W_{e1} H + b_{e1}, \qquad H_{e2} = W_{e2} H + b_{e2} \quad (1)$$
where $W_{e1/e2} \in \mathbb{R}^{d_h \times d_h}$ and $b_{e1/e2} \in \mathbb{R}^{d_h}$ are trainable weights and biases, respectively.
For the REG module, we pass the token representation sequence $H$ generated by the encoder through two independent FFNs to generate the feature $H_{r1}$ representing the start boundary of the entity pair and the feature $H_{r2}$ representing the end boundary of the entity pair, respectively, as expressed in Equation (2).
$$H_{r1} = W_{r1} H + b_{r1}, \qquad H_{r2} = W_{r2} H + b_{r2} \quad (2)$$
where $W_{r1/r2} \in \mathbb{R}^{d_h \times d_h}$ and $b_{r1/r2} \in \mathbb{R}^{d_h}$ are trainable weights and biases, respectively.
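To make the encoding step concrete, the following is a minimal PyTorch sketch (not the authors' released code). The class name, the use of the HuggingFace transformers library, the hidden size $d_h = 768$, and single linear layers standing in for the FFNs are our assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BoundaryEncoder(nn.Module):
    """Sketch of the Encoder module: BERT token representations H, plus
    independent FFNs for entity start/end boundaries (Eq. 1) and entity-pair
    start/end boundaries (Eq. 2). Names and sizes are illustrative."""

    def __init__(self, d_h: int = 768, bert_name: str = "bert-base-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.ffn_e1 = nn.Linear(d_h, d_h)  # entity start boundary  H_e1
        self.ffn_e2 = nn.Linear(d_h, d_h)  # entity end boundary    H_e2
        self.ffn_r1 = nn.Linear(d_h, d_h)  # entity-pair start      H_r1
        self.ffn_r2 = nn.Linear(d_h, d_h)  # entity-pair end        H_r2

    def forward(self, input_ids, attention_mask):
        # H: (batch, n, d_h) token representation sequence.
        H = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        return H, self.ffn_e1(H), self.ffn_e2(H), self.ffn_r1(H), self.ffn_r2(H)
```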

3.2. Filter Separator Network (FSN) Module

The structure of the FSN module is shown in Figure 3. The FSN module first exploits the properties of the LSTM to extract the hidden state information and memory state information passed from each word to the next, running the LSTM over the sentence in both directions. The hidden state information and memory state information obtained by feeding the same word into the LSTMs in the two directions are then fused, yielding the fused-state feature $X = x_1 x_2 \cdots x_n$ of the sentence. A separation operation is then used to split the fused state into features related only to NER, shared features, and features related only to RE. Finally, we splice the shared features with the NER-only features and the RE-only features to obtain the features related to NER and the features related to RE in the sentence, respectively.

3.2.1. Filter

The hidden state information in an LSTM captures the information of the current time step and passes it to the next time step, enabling continuous modeling of sequence data; the memory state information controls the flow and retention of information, allowing the model to selectively forget and retain information and thus to capture long-term dependencies and better predict future sequence elements. Therefore, we use LSTMs in both directions to capture the bidirectional hidden state information and memory state information of the sentence. The specific formula is shown in Equation (3).
$$H_1^{t+1}, C_1^{t+1} = \mathrm{LSTM}(\omega_t, H_1^{t}, C_1^{t}), \qquad H_2^{n-t}, C_2^{n-t} = \mathrm{LSTM}(\omega_{n-t+1}, H_2^{n-t+1}, C_2^{n-t+1}) \quad (3)$$
where $\omega_t$ denotes the t-th word in the sentence S. $H_1^{t+1}$ and $C_1^{t+1}$ denote the hidden state information and memory state information passed from $\omega_t$ to $\omega_{t+1}$, respectively. $H_2^{n-t}$ and $C_2^{n-t}$ denote the hidden state information and memory state information passed from $\omega_{n-t+1}$ to $\omega_{n-t}$, respectively.
In order to extract all the information in the sentence related to the NER task and the RE task, we fuse $H_1^{t+1}$, $C_1^{t+1}$, $H_2^{n-t}$, and $C_2^{n-t}$, thus obtaining the fusion state information $x_t$ of the sentence. The specific formula is shown in Equation (4).
$$x_t = H_1^{t+1} + C_1^{t+1} + H_2^{n-t} + C_2^{n-t} \quad (4)$$
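The filter step can be sketched as follows. This is an illustrative implementation rather than the authors' code: the use of nn.LSTMCell to expose the hidden and cell states after every word, and feeding the encoder's token representations as the word inputs, are our assumptions.

```python
import torch
import torch.nn as nn

class FilterFusion(nn.Module):
    """Sketch of the Filter step (Eqs. 3-4): run an LSTM over the sentence in
    both directions, keep the per-word hidden and cell states, and fuse the
    four tensors by element-wise summation. Dimensions are illustrative."""

    def __init__(self, d_h: int = 768):
        super().__init__()
        self.fwd_cell = nn.LSTMCell(d_h, d_h)  # forward-direction LSTM
        self.bwd_cell = nn.LSTMCell(d_h, d_h)  # backward-direction LSTM

    @staticmethod
    def run(cell, tokens):
        # tokens: (B, n, d_h); collect hidden/cell state after every word.
        B, n, d = tokens.shape
        h = tokens.new_zeros(B, d)
        c = tokens.new_zeros(B, d)
        hs, cs = [], []
        for t in range(n):
            h, c = cell(tokens[:, t], (h, c))
            hs.append(h)
            cs.append(c)
        return torch.stack(hs, dim=1), torch.stack(cs, dim=1)

    def forward(self, tokens):
        H1, C1 = self.run(self.fwd_cell, tokens)                 # left to right
        H2, C2 = self.run(self.bwd_cell, tokens.flip(dims=[1]))  # right to left
        H2, C2 = H2.flip(dims=[1]), C2.flip(dims=[1])            # realign to word order
        return H1 + C1 + H2 + C2                                 # fusion state X (Eq. 4)
```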

3.2.2. Separator

Since the fusion state information $X = x_1 x_2 \cdots x_n$ contains information related to both the NER task and the RE task, and these two kinds of information largely overlap, it is difficult to extract them independently. Therefore, we adopt the idea of separation and split the fusion state information into three types of features: features related only to NER, $\mu_e$; shared features, $\mu_s$; and features related only to RE, $\mu_r$. The exact formula is shown in Equation (5).
$$\mu_e = X[0, 1/3], \qquad \mu_s = X[1/3, 2/3], \qquad \mu_r = X[2/3, n] \quad (5)$$
where $X[0, 1/3]$ denotes the features in the first one-third of the fused state information, $X[1/3, 2/3]$ denotes the features in the middle one-third, and $X[2/3, n]$ denotes the features in the last one-third.
Since the features associated with the NER task comprise both $\mu_e$ and $\mu_s$, and the features associated with the RE task comprise both $\mu_r$ and $\mu_s$, we apply a splicing operation for each subtask: the two relevant features are spliced together, finally yielding the features $H_{ner}$ related to the NER task and $H_{re}$ related to the RE task. The exact formula is shown in Equation (6).
$$H_{ner} = W_{es}(\mu_e : \mu_s) + b_{es}, \qquad H_{re} = W_{sr}(\mu_s : \mu_r) + b_{sr} \quad (6)$$
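A minimal sketch of the separator follows, assuming that the split in Equation (5) is an equal three-way split along the feature dimension and that ":" in Equation (6) denotes concatenation; class names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class Separator(nn.Module):
    """Sketch of the Separator step (Eqs. 5-6): split the fusion state X into
    three equal feature slices (NER-only, shared, RE-only), then splice the
    shared slice onto each task-specific slice and project back to d_h.
    Assumes d_h is divisible by 3 (e.g. 768)."""

    def __init__(self, d_h: int = 768):
        super().__init__()
        third = d_h // 3
        self.proj_ner = nn.Linear(2 * third, d_h)  # plays the role of W_es, b_es
        self.proj_re = nn.Linear(2 * third, d_h)   # plays the role of W_sr, b_sr

    def forward(self, X):                             # X: (B, n, d_h)
        mu_e, mu_s, mu_r = torch.chunk(X, 3, dim=-1)  # NER-only, shared, RE-only
        H_ner = self.proj_ner(torch.cat([mu_e, mu_s], dim=-1))
        H_re = self.proj_re(torch.cat([mu_s, mu_r], dim=-1))
        return H_ner, H_re
```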

3.3. Named Entity Recognition Generation (NERG) Module

The NERG module is shown in Figure 4. In order to better extract all entities in a sentence, we use a feature $H_{e1}$ associated with the start boundaries of entities and a feature $H_{e2}$ associated with the end boundaries of entities to represent the boundary information of all entities in the sentence. In order to locate the entity boundary information more accurately, we adopt the design idea of the decoder in the Transformer [23] model to capture the maximal entity boundary features $\bar{H}_{e1}$ and $\bar{H}_{e2}$ and to associate them with the features $H_{ner}$ that are relevant to the NER task. The specific flow of the module is as follows.
First, in order to let the start-boundary feature $H_{e1}$ and the end-boundary feature $H_{e2}$ interact, we apply the Hadamard product to $H_{e1}$ and $H_{e2}$ to generate a unified table feature of entity boundary information, $UF_{ner}$. The exact formula is shown in Equation (7).
$$UF_{ner}(i, j) = W_{ner}(H_{e1,i} \odot H_{e2,j}) + b_{ner} \quad (7)$$
where $\odot$ represents the Hadamard product, and $H_{e1,i}$ and $H_{e2,j}$ are the feature representations of the tokens $\omega_i$ and $\omega_j$, respectively.
Then, in order to capture the maximum association of the entity boundary information with the NER features, we use the maximum pooling operation to extract the maximum entity boundary features from the unified table features. The exact formula is shown in Equation (8).
$$\bar{H}_{e1} = \bar{W}_{e1}\,\mathrm{maxpool}_{e1}(UF_{ner}) + \bar{b}_{e1}, \qquad \bar{H}_{e2} = \bar{W}_{e2}\,\mathrm{maxpool}_{e2}(UF_{ner}) + \bar{b}_{e2} \quad (8)$$
Next, we adopt a design based on the decoder in Transformer [23]. Multi-head self-attention is first used to capture the strongest intrinsic associations of entity boundaries between entities in a sentence. Then, multi-head attention is used to let the maximal entity boundary information in the sentence fully interact with the features $H_{ner}$ that are relevant to the NER task, mining the information that can locate entity boundaries in the NER task. Finally, we fuse the obtained information with the original entity boundary information to obtain the new entity boundary information. The specific formula is shown in Equation (9).
$$\begin{aligned} \bar{H}_{e1/e2} &= \mathrm{MultiHeadSelfAtt}(UF_{ner}) \\ \bar{H}_{e1/e2} &= \mathrm{MultiHeadAtt}(\bar{H}_{e1/e2}, H_{ner}, H_{ner}) \\ H_{e1/e2} &= H_{e1/e2} + \bar{H}_{e1/e2} \end{aligned} \quad (9)$$
Finally, we again use the Hadamard product to obtain the unified table features $\bar{UF}_{ner}$ of the entity boundary information and perform table filling to complete the NER task. The specific formula is shown in Equation (10).
$$\begin{aligned} \bar{UF}_{ner}(i, j) &= \bar{W}_{ner}(H_{e1,i} \odot H_{e2,j}) + \bar{b}_{ner} \\ \hat{table}_{ner}(i, j) &= \mathrm{softmax}(\mathrm{ReLU}(\bar{UF}_{ner})) \\ table_{ner}(i, j) &= \operatorname*{argmax}_{l \in L_{ner}} \hat{table}_{ner}(i, j) \end{aligned} \quad (10)$$
where $\hat{table}_{ner}(i, j)$ denotes the initial table features for the named entity recognition task, and $table_{ner}(i, j)$ denotes the labeling result for the token pair $(\omega_i, \omega_j)$.
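The following condensed PyTorch sketch illustrates the flow of Equations (7)-(10). It is not the authors' implementation: the head count, the label count, and the choice to apply attention to the max-pooled boundary features rather than to the full n × n table are simplifying assumptions of this sketch.

```python
import torch
import torch.nn as nn

class NERG(nn.Module):
    """Condensed sketch of the NERG module (Eqs. 7-10)."""

    def __init__(self, d_h: int = 768, n_labels: int = 8, n_heads: int = 8):
        super().__init__()
        self.w_uf = nn.Linear(d_h, d_h)                          # Eq. 7
        self.w_e1 = nn.Linear(d_h, d_h)                          # Eq. 8
        self.w_e2 = nn.Linear(d_h, d_h)
        self.self_att = nn.MultiheadAttention(d_h, n_heads, batch_first=True)
        self.cross_att = nn.MultiheadAttention(d_h, n_heads, batch_first=True)
        self.w_table = nn.Linear(d_h, n_labels)                  # Eq. 10 scoring

    def forward(self, H_e1, H_e2, H_ner):                        # all (B, n, d_h)
        # Eq. 7: pairwise Hadamard-product table UF_ner, shape (B, n, n, d_h).
        UF = self.w_uf(H_e1.unsqueeze(2) * H_e2.unsqueeze(1))
        # Eq. 8: max-pool the table along each axis to summarise start/end boundaries.
        He1_bar = self.w_e1(UF.max(dim=2).values)                # (B, n, d_h)
        He2_bar = self.w_e2(UF.max(dim=1).values)
        # Eq. 9: self-attention, cross-attention to the NER-related features H_ner,
        # then a residual connection onto the original boundary features.
        He1_bar, _ = self.self_att(He1_bar, He1_bar, He1_bar)
        He2_bar, _ = self.self_att(He2_bar, He2_bar, He2_bar)
        He1_bar, _ = self.cross_att(He1_bar, H_ner, H_ner)
        He2_bar, _ = self.cross_att(He2_bar, H_ner, H_ner)
        H_e1, H_e2 = H_e1 + He1_bar, H_e2 + He2_bar
        # Eq. 10: rebuild the table and score every (i, j) cell over the NER labels.
        logits = self.w_table(torch.relu(H_e1.unsqueeze(2) * H_e2.unsqueeze(1)))
        return logits                                            # (B, n, n, n_labels)
```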

3.4. Relation Extraction Generation (REG) Module

The REG module is shown in Figure 5. We use the features $H_{r1}$ and $H_{r2}$ to represent the entity pair start boundary and the entity pair end boundary of each relation in the sentence. In order to locate the entity pair boundary information corresponding to each relation more accurately, we adopt the design idea of the decoder in the Transformer [23] model to capture the association between the maximal entity pair boundary features $\bar{H}_{r1}$ and $\bar{H}_{r2}$ of each relation and the sequence of sentence token representations $H$. In addition, in order to fuse the features $H_{re}$ associated with the RE task, we use an average pooling operation to fuse $H_{re}$ into each table entry in the RE task. The specific flow of the module is as follows.
First, in order to correlate the entity pair start boundary information and the entity pair end boundary information with each other, we perform a Hadamard product operation on $H_{r1}$ and $H_{r2}$ to generate a table of entity pair boundary information (i.e., a unified table feature) $UF_{re}$ for each relation in the sentence. The specific formula is shown in Equation (11).
$$UF_{re}(i, j, r) = W_{re}(H_{r1,i,r} \odot H_{r2,j,r}) + b_{re} \quad (11)$$
where $\odot$ represents the Hadamard product, and $H_{r1,i,r}$ and $H_{r2,j,r}$ denote the feature representations of the tokens $\omega_i$ and $\omega_j$ for relation r, respectively.
Then, since determining the boundary information of the subject and object in a relational triple is closely related to the semantic information of the sentence, we adopt the maximum pooling operation here to extract the maximal entity pair boundary information hidden in $UF_{re}$. The specific formula is shown in Equation (12).
$$\bar{H}_{r1} = \bar{W}_{r1}\,\mathrm{maxpool}_{r1}(UF_{re}) + \bar{b}_{r1}, \qquad \bar{H}_{r2} = \bar{W}_{r2}\,\mathrm{maxpool}_{r2}(UF_{re}) + \bar{b}_{r2} \quad (12)$$
Next, we use the same design based on the decoder in Transformer [23]. Multi-head self-attention is first used to capture the interconnection of entity pair boundary information between relational triples in a sentence. Then, multi-head attention is used to let the maximal entity pair boundary information in the sentence interact sufficiently with the sequence of sentence token representations $H$, so as to locate the entity pair boundaries of each relational triple in the sentence more accurately. Finally, we fuse the obtained information with the original entity pair boundary information to obtain the new entity pair boundary information. The specific formula is shown in Equation (13).
$$\begin{aligned} \bar{H}_{r1/r2} &= \mathrm{MultiHeadSelfAtt}(UF_{re}) \\ \bar{H}_{r1/r2} &= \mathrm{MultiHeadAtt}(\bar{H}_{r1/r2}, H, H) \\ H_{r1/r2} &= H_{r1/r2} + \bar{H}_{r1/r2} \end{aligned} \quad (13)$$
Finally, since the new entity pair boundary information does not yet incorporate the feature $H_{re}$ that is relevant to the RE task, we apply an average pooling operation to $H_{re}$ to compress its feature information into a single vector. This vector is then fused into each entry of the new unified table feature $\bar{UF}_{re}$, and table filling is performed on each table entry to complete the RE task. The specific formula is shown in Equation (14).
$$\begin{aligned} H_{avg} &= W_{avg}\,\mathrm{avgpool}_{re}(H_{re}) + b_{avg} \\ \bar{UF}_{avg}(i, j, r) &= \bar{W}_{avg}(\bar{H}_{avg,i,r} \odot \bar{H}_{avg,j,r}) + \bar{b}_{avg} \\ \bar{UF}_{re}(i, j, r) &= \bar{W}_{re}(\bar{H}_{r1,i,r} \odot \bar{H}_{r2,j,r}) + \bar{b}_{re} \\ \hat{table}_{re}(i, j, r) &= \mathrm{softmax}(\mathrm{ReLU}(\bar{UF}_{re} + \bar{UF}_{avg})) \\ table_{re}(i, j, r) &= \operatorname*{argmax}_{l \in L_{re}} \hat{table}_{re}(i, j, r) \end{aligned} \quad (14)$$
where $\hat{table}_{re}(i, j, r)$ denotes the initial table features for the relation extraction task, and $table_{re}(i, j, r)$ denotes the labeling result of the token pair $(\omega_i, \omega_j)$ for relation r.
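A condensed sketch of the REG flow (Equations (11)-(14)) is shown below, focusing on what differs from NERG: the per-relation table and the average-pooled $H_{re}$ feature broadcast into every table entry. The per-relation projections used to obtain relation-specific boundary features, as well as all dimensions, are our assumptions rather than details given in the paper.

```python
import torch
import torch.nn as nn

class REG(nn.Module):
    """Condensed sketch of the REG module (Eqs. 11-14)."""

    def __init__(self, d_h: int = 768, n_rel: int = 6, n_labels: int = 4):
        super().__init__()
        self.n_rel = n_rel
        self.rel_r1 = nn.Linear(d_h, d_h * n_rel)  # relation-specific H_r1
        self.rel_r2 = nn.Linear(d_h, d_h * n_rel)  # relation-specific H_r2
        self.w_avg = nn.Linear(d_h, d_h)           # Eq. 14, avg-pooled H_re
        self.w_table = nn.Linear(d_h, n_labels)    # score each (i, j, r) cell

    def forward(self, H_r1, H_r2, H_re):            # all (B, n, d_h)
        B, n, d = H_r1.shape
        # Per-relation boundary features, shape (B, n, R, d_h).
        R1 = self.rel_r1(H_r1).view(B, n, self.n_rel, d)
        R2 = self.rel_r2(H_r2).view(B, n, self.n_rel, d)
        # Eq. 11: Hadamard-product table UF_re, shape (B, n, n, R, d_h).
        UF = R1.unsqueeze(2) * R2.unsqueeze(1)
        # Eq. 14: average-pool the RE-related sentence feature H_re into one
        # vector and broadcast it into every table entry before scoring.
        H_avg = self.w_avg(H_re.mean(dim=1))         # (B, d_h)
        UF = UF + H_avg.view(B, 1, 1, 1, d)
        logits = self.w_table(torch.relu(UF))        # (B, n, n, R, n_labels)
        return logits
```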

3.5. Loss Function

The loss function of our model is as follows. For each given training set, the loss function $L$ that guides the model during training consists of two parts: $L_{ner}$ denotes the loss function for the NER task and $L_{re}$ denotes the loss function for the RE task. In addition, we apply a Sigmoid-style weighting to the values of $L_{ner}$ and $L_{re}$ to dynamically control the learning weights of the NER task and the RE task.
$$\begin{aligned} L_{ner} &= -\sum_{i=1}^{n}\sum_{j=1}^{n} y_{i,j}\,\log P_{i,j} \\ L_{re} &= -\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{R} y_{i,j,r}\,\log P_{i,j,r} \\ L &= \frac{\exp(L_{ner})}{\exp(L_{ner}) + \exp(L_{re})}\,L_{ner} + \frac{\exp(L_{re})}{\exp(L_{ner}) + \exp(L_{re})}\,L_{re} \end{aligned}$$
where (i, j) denotes the index of the $(\omega_i, \omega_j)$ label in the NER task; (i, j, r) denotes the index of the $(\omega_i, \omega_j)$ label with relation r in the RE task; and both $L_{ner}$ and $L_{re}$ use the cross-entropy loss function.
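A minimal sketch of the dynamic weighting, assuming both task losses are already reduced to scalar tensors; whether the weights should be detached from the computation graph is not specified in the paper, so that choice is left to the reader.

```python
import torch

def dynamic_loss(loss_ner: torch.Tensor, loss_re: torch.Tensor) -> torch.Tensor:
    """Weight each task's cross-entropy loss by a softmax over the two loss
    values, so the subtask with the larger current loss receives the larger
    learning weight in this epoch."""
    weights = torch.softmax(torch.stack([loss_ner, loss_re]), dim=0)
    return weights[0] * loss_ner + weights[1] * loss_re
```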

4. Experiments

4.1. Experimental Settings

4.1.1. Datasets

We evaluated our model on the ACE2005 [24] dataset as well as the SciERC [25] dataset. The ACE2005 dataset was collected from a variety of sources, including news articles and online forums. This dataset was built on top of the ACE2004 dataset and is often used as a benchmark test for NER and RE methods. In the ACE2005 dataset, seven entity categories were defined and six relation categories were defined for each pair of entities. The SciERC dataset is derived from 500 abstracts taken from papers in the field of artificial intelligence, which include annotations on scientific entities, their relations, and co-reference clusters. The dataset is predefined with six scientific entity types and seven relation types. The purpose of this dataset is to provide a benchmark test dataset for evaluating the performance of NER and RE tasks. The specific content distribution of these two datasets is shown in Table 1.

4.1.2. Evaluation Metrics

We use precision, recall, and micro-F1 as our evaluation metrics. For NER, a prediction is considered correct only if the predicted entity boundaries as well as types match the ground truth exactly; for RE, a prediction is considered correct only if the predicted entity boundaries as well as relation types in the relational triple match the ground truth exactly; and for RE+, a prediction is considered correct only if the predicted entity boundaries and entity types as well as relation types in the relational triple match the ground truth exactly. In addition, for a fair model comparison, we discuss only the case where the encoder is BERT-Base-Cased [22] on the ACE2005 dataset, and only the case where the encoder is SciBERT [26] on the SciERC dataset.
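The strict matching criterion can be sketched as a simple set comparison; the tuple layouts below (e.g., (start, end, type) for NER) are illustrative, not the exact data structures used in our code.

```python
def micro_prf(pred: set, gold: set):
    """Strict matching: a prediction counts as correct only if the whole tuple
    matches a gold tuple exactly, e.g. (start, end, type) for NER or
    (head_span, tail_span, relation) for RE."""
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: one correct entity out of two predicted, two gold entities in total.
print(micro_prf({(0, 2, "PER"), (5, 6, "ORG")}, {(0, 2, "PER"), (8, 9, "LOC")}))
```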

4.1.3. Baselines

We compare the model with the following joint entity and relation extraction models: SPE [27], MRC4ERE++ [28], TRIMF [18], UNIRE [29], PURE [30], PFN [8], TablERT [9], TablERT-CNN [10], MGE [19], and PL-Marker [31].
Most of the experimental results of these baseline models were copied directly from their original papers.

4.1.4. Implementation Details

Our experiments were carried out on an Ubuntu 18.04.6 LTS workstation with a single A40 GPU. We used the Adam [32] optimizer for model training. The learning rate was $1 \times 10^{-5}$ on the ACE2005 dataset and $3 \times 10^{-5}$ on the SciERC dataset. The number of training epochs was 100. The batch size of the training set was set to 4, and the batch size of the validation and test sets was set to 6. We set the maximum length of the input sentence to 100. The other parameters were randomly initialized.
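For reference, the reported hyperparameters can be collected into a configuration sketch such as the following; the variable names and the optimizer construction are our own, not taken from the released code.

```python
import torch

# Hyperparameters as reported above (per-dataset learning rates).
config = {
    "ace2005_lr": 1e-5,
    "scierc_lr": 3e-5,
    "epochs": 100,
    "train_batch_size": 4,
    "eval_batch_size": 6,
    "max_sentence_length": 100,
}

def build_optimizer(model: torch.nn.Module, lr: float) -> torch.optim.Optimizer:
    """Adam optimizer with the dataset-specific learning rate."""
    return torch.optim.Adam(model.parameters(), lr=lr)
```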

4.2. Main Experimental Results

Table 2 shows the performance comparison of our model with the other benchmark models. From Table 2, it can be seen that our model's NER F1 scores on the ACE2005 dataset and the SciERC dataset are both 0.4% lower than the F1 scores of the best model. However, our model achieves the best results on RE and RE+. This is because previous models have focused more on improving the performance of the NER task and have not fully explored the effect of a balanced subtask interaction between entities and relations on relational triple extraction. Our FSN module separates the features related only to the NER task from the features related only to the RE task, so that the NER task and the RE task can fully interact with each other and the intrinsic correlation between entities and relations can be exploited. This is a testament to the strength of our FSN module design.
Compared to the joint entity and relation extraction model PFN, which is also based on table filling, our model achieves absolute performance gains on both the ACE2005 dataset and the SciERC dataset. We attribute this improvement to the NERG and REG modules, which can more accurately locate all entity boundaries contained in a sentence and the entity pair boundaries in a relational triple, respectively. In addition, we explored the performance differences between the pipeline-based method and the joint entity and relation extraction method. Compared to the PURE model, which uses the pipeline-based method, our model achieves improvements of 2.9%, 6.0%, and 6.7% in the F1 scores of NER, RE, and RE+ on the SciERC dataset, respectively. Besides the fact that joint entity and relation extraction avoids the subtask independence problem as well as the error propagation problem, we attribute this improvement to the FSN module. The FSN module lets the NER task and the RE task interact through a shared partition, which alleviates the problems of difficult subtask interaction and error propagation in conventional pipeline-based methods.

4.3. Ablation Study

In this section, we explore the impact of each part of our model on RE+. Some of the parts of the model explored include the FSN module (Forward_LSTM, Backward_LSTM), NERG module (NER_MFE), and REG module (RE_MFE, RE_AvgPooling).
We mainly explore the effect of the forward-LSTM-filtered and backward-LSTM-filtered sentence information on balancing subtask interactions in the FSN module. As shown in Table 3, removing the forward LSTM and the backward LSTM in the FSN module reduces the RE+ score by 3.5% and 3.8%, respectively. The hidden state information in the LSTM captures the information of the current time step and passes it to the next time step, enabling continuous modeling of sequence data, whereas the memory state information controls the flow and retention of information, allowing the model to selectively forget and retain information and thus to capture long-term dependencies and better predict future sequence elements. Removing the forward or backward LSTM means that the corresponding word-to-word hidden state information and memory state information is no longer captured.
When we removed the local feature extraction part of the NERG module, the RE+ score dropped by 3.6%. This large gap is attributed to the NERG module's ability to capture the maximal correlation between the sentence semantic information and the entity boundary information by fully exploiting the Hadamard product operations and attention mechanisms, which enables better extraction of entity boundary information.
Similarly, we explored the impact of removing the local feature extraction part of the REG module on RE task performance. As can be seen in Table 3, the performance decreases by 4.1%. This indicates that the local feature extraction part of the REG module helps capture the entity pair boundary information corresponding to each relation in a sentence. In addition, we performed an ablation study on the average pooling operation in the REG module. Table 3 shows that removing the average pooling operation in the REG module decreases the RE+ performance by 3.3%. The main reason is that the features related to the RE task contain the associations between the entities and the relation in a relational triple; this association is incorporated into each table entry of the RE task through the average pooling operation, which improves the performance of RE+.

4.4. Robustness Test on Named Entity Recognition

We use robustness tests to evaluate the stability of our model in various special cases. Table 4 shows the performance of our model and the baseline models under the NER-oriented input transformation methods proposed by Wang et al. [33]; the specific transformations are described at https://www.textflint.io/. We compare our model with several NER models, including the BiLSTM-CRF model [34], the BERT model [22], the TENER model [35], and the Flair Embeddings model [36].
From Table 4, we can see that our model exhibits greater robustness to input perturbations than the other baseline models, especially in the cross-category case. This increase in robustness may be attributed to the fact that we use the relational signals of type-constrained entities during training. In our model, reasoning about entity types is influenced not only by the semantic meaning of the target entity itself, but also by the (relational) context around the entity. That is, our model takes into account the contextual information around an entity when reasoning about its type, rather than relying only on the characteristics of the entity itself. This type-constrained training allows our model to better understand the relationship between an entity and its surroundings, which improves its robustness to input perturbations. When confronted with cross-category situations, where the type of an entity does not exactly match the types of the other entities in its surroundings, our model is better able to adapt to and deal with this complexity.

4.5. Model Efficiency

We evaluate the efficiency of our model in terms of training time and inference time, mainly against PFN, a joint entity and relation extraction model that also employs a table-filling method. The results in Table 5 show that, although both our model and the PFN model have a theoretical complexity of $O(NL^2)$, our model took less time to train on both the ACE2005 dataset and the SciERC dataset. We attribute this improvement in training efficiency to the FSN module. Compared to previous joint entity and relation extraction models, the FSN module makes it simpler to perform feature extraction for all subtasks, and simpler to accomplish subtask interaction through the partitioning operation, by extracting the hidden state information and memory state information passed from each word to the next in a sentence. Although the two models required similar inference times on both datasets, our model achieved performance improvements of 7.1% and 3.9% over the PFN model on the ACE2005 dataset and the SciERC dataset, respectively, which is sufficient to demonstrate the advantages of our model design.

5. Conclusions

In this paper, we analyze the advantages and disadvantages of joint learning methods based on shared parameters and joint learning methods based on joint decoding, and we propose a new joint entity and relation extraction method. It uses an FSN module, with filter, separator, and splicing operations, to solve the problem of interaction imbalance among subtasks. We also design a NERG module and a REG module, which adopt the design idea of the decoder in Transformer together with pooling operations, to address the insufficient extraction of local features in the subtasks. In addition, we propose a dynamic loss function for model optimization. We conducted comprehensive experiments on two public datasets, demonstrating that our model yields more desirable outcomes than the baseline models. Further analyses and ablation studies validate the significance of every module in our model.

Author Contributions

Conceptualization, Q.D.; methodology, Q.D.; validation, Q.D.; data curation, Q.D., L.H. and Y.L.; writing—original draft preparation, Q.D.; writing—review and editing, Q.D., W.Y. and F.W.; supervision, Q.D.; funding acquisition, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

We thank all anonymous reviewers for their constructive comments. This work is supported by the “Tianshan Talent” Research Project of Xinjiang (No.2022TSYCLJ0037), the National Natural Science Foundation of China (No.62262065), the Autonomous Region Science and Technology Program (No.2022B01008-2), the Autonomous Region Science and Technology Program (No.2020A02001-1), and the Autonomous Region Science and Technology Program (No.2021D01C080).

Data Availability Statement

The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to copyright.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Riedel, S.; Yao, L.; McCallum, A.; Marlin, B.M. Relation Extraction with Matrix Factorization and Universal Schemas. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, USA, 9–14 June 2013; pp. 74–84. [Google Scholar]
  2. Dalal, V.; Malik, L. A Survey of Extractive and Abstractive Text Summarization Techniques. In Proceedings of the 2013 6th International Conference on Emerging Trends in Engineering and Technology, Nagpur, India, 16–18 December 2013; pp. 109–110. [Google Scholar] [CrossRef]
  3. Diefenbach, D.; Lopez, V.; Singh, K.; Maret, P. Core Techniques of Question Answering Systems over Knowledge Bases: A Survey. Knowl. Inf. Syst. 2018, 55, 529–569. [Google Scholar] [CrossRef]
  4. Zelenko, D.; Aone, C.; Richardella, A. Kernel Methods for Relation Extraction. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), Philadelphia, PA, USA, 6 July 2002; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 71–78. [Google Scholar] [CrossRef]
  5. Zhou, G.; Su, J.; Zhang, J.; Zhang, M. Exploring Various Knowledge in Relation Extraction. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), Ann Arbor, MI, USA, 25–30 June 2005; pp. 427–434. [Google Scholar] [CrossRef]
  6. Chan, Y.S.; Roth, D. Exploiting Syntactico-Semantic Structures for Relation Extraction. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 551–560. [Google Scholar]
  7. Ren, F.; Zhang, L.; Zhao, X.; Yin, S.; Liu, S.; Li, B. A Simple but Effective Bidirectional Framework for Relational Triple Extraction. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, WSDM ’22, New York, NY, USA, 21–25 February 2022; pp. 824–832. [Google Scholar] [CrossRef]
  8. Yan, Z.; Zhang, C.; Fu, J.; Zhang, Q.; Wei, Z. A Partition Filter Network for Joint Entity and Relation Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Virtual Event/Punta Cana, Dominican Republic, 7–11 November 2021; pp. 185–197. [Google Scholar] [CrossRef]
  9. Ma, Y.; Hiraoka, T.; Okazaki, N. Named Entity Recognition and Relation Extraction Using Enhanced Table Filling by Contextualized Representations. J. Nat. Lang. Process. 2022, 29, 187–223. [Google Scholar] [CrossRef]
  10. Ma, Y.; Hiraoka, T.; Okazaki, N. Joint Entity and Relation Extraction Based on Table Labeling Using Convolutional Neural Networks. In Proceedings of the Sixth Workshop on Structured Prediction for NLP, Dublin, Ireland, 27 May 2022; pp. 11–21. [Google Scholar] [CrossRef]
  11. Kate, R.J.; Mooney, R. Joint Entity and Relation Extraction Using Card-Pyramid Parsing. In Proceedings of the Conference on Computational Natural Language Learning, Uppsala, Sweden, 15–16 July 2010. [Google Scholar]
  12. Yu, X.; Lam, W. Jointly Identifying Entities and Extracting Relations in Encyclopedia Text via A Graphical Model Approach. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, Beijing, China, 23–27 August 2010; Volume 2, pp. 1399–1407. [Google Scholar]
  13. Miwa, M.; Sasaki, Y. Modeling Joint Entity and Relation Extraction with Table Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014. [Google Scholar]
  14. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
  15. Miwa, M.; Bansal, M. End-to-End Relation Extraction Using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 1105–1116. [Google Scholar] [CrossRef]
  16. Dai, D.; Xiao, X.; Lyu, Y.; Dou, S.; She, Q.; Wang, H. Joint Extraction of Entities and Overlapping Relations Using Position-Attentive Sequence Labeling. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6300–6308. [Google Scholar] [CrossRef]
  17. Yuan, Y.; Zhou, X.; Pan, S.; Zhu, Q.; Song, Z.; Guo, L. A Relation-Specific Attention Network for Joint Entity and Relation Extraction. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 7–15 January 2021; Volume 4, pp. 4054–4060. [Google Scholar] [CrossRef]
  18. Shen, Y.; Ma, X.; Tang, Y.; Lu, W. A Trigger-Sense Memory Flow Framework for Joint Entity and Relation Extraction. In Proceedings of the Web Conference 2021, WWW’21, Ljubljana, Slovenia, 19–23 April 2021; pp. 1704–1715. [Google Scholar] [CrossRef]
  19. Xiong, X.; Liu, Y.; Liu, A.; Gong, S.; Li, S. A Multi-Gate Encoder for Joint Entity and Relation Extraction. In Proceedings of the Chinese Computational Linguistics; Lecture Notes in Computer Science; Sun, M., Liu, Y., Che, W., Feng, Y., Qiu, X., Rao, G., Chen, Y., Eds.; Springer: Cham, Switzerland, 2022; pp. 163–179. [Google Scholar] [CrossRef]
  20. Wang, Y.; Yu, B.; Zhang, Y.; Liu, T.; Zhu, H.; Sun, L. TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking. In Proceedings of the International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 1572–1582. [Google Scholar] [CrossRef]
  21. Ren, F.; Zhang, L.; Yin, S.; Zhao, X.; Liu, S.; Li, B.; Liu, Y. A Novel Global Feature-Oriented Relational Triple Extraction Model Based on Table Filling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Virtual Event/Punta Cana, Dominican Republic, 7–11 November 2021; pp. 2646–2656. [Google Scholar] [CrossRef]
  22. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
  23. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  24. Walker, C.; Strassel, S.; Medero, J.; Maeda, K. ACE 2005 Multilingual Training Corpus; Linguistic Data Consortium: Philadelphia, PA, USA, 2006. [Google Scholar] [CrossRef]
  25. Luan, Y.; He, L.; Ostendorf, M.; Hajishirzi, H. Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 3219–3232. [Google Scholar] [CrossRef]
  26. Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A Pretrained Language Model for Scientific Text. arXiv 2019, arXiv:1903.10676. [Google Scholar]
  27. Wang, Y.; Sun, C.; Wu, Y.; Yan, J.; Gao, P.; Xie, G. Pre-Training Entity Relation Encoder with Intra-span and Inter-span Information. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 1692–1705. [Google Scholar] [CrossRef]
  28. Zhao, T.; Yan, Z.; Cao, Y.; Li, Z. Asking Effective and Diverse Questions: A Machine Reading Comprehension Based Framework for Joint Entity-Relation Extraction. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 7–15 January 2021; Volume 4, pp. 3948–3954. [Google Scholar] [CrossRef]
  29. Wang, Y.; Sun, C.; Wu, Y.; Zhou, H.; Li, L.; Yan, J. UniRE: A Unified Label Space for Entity Relation Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 220–231. [Google Scholar] [CrossRef]
  30. Zhong, Z.; Chen, D. A Frustratingly Easy Approach for Entity and Relation Extraction. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 50–61. [Google Scholar] [CrossRef]
  31. Ye, D.; Lin, Y.; Li, P.; Sun, M. Packed Levitated Marker for Entity and Relation Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 4904–4917. [Google Scholar] [CrossRef]
  32. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar] [CrossRef]
  33. Wang, X.; Liu, Q.; Gui, T.; Zhang, Q.; Zou, Y.; Zhou, X.; Ye, J.; Zhang, Y.; Zheng, R.; Pang, Z.; et al. TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, Online, 1–6 August 2021; pp. 347–355. [Google Scholar] [CrossRef]
  34. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar] [CrossRef]
  35. Yan, H.; Deng, B.; Li, X.; Qiu, X. TENER: Adapting Transformer Encoder for Named Entity Recognition. arXiv 2019, arXiv:1911.04474. [Google Scholar] [CrossRef]
  36. Akbik, A.; Bergmann, T.; Blythe, D.; Rasul, K.; Schweter, S.; Vollgraf, R. FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Minneapolis, MN, USA, 2–7 June 2019; pp. 54–59. [Google Scholar] [CrossRef]
Figure 1. Subtask feature distribution. Pink represents the distribution of Named Entity Recognition (NER) features in the input features. Green represents the distribution of Relation Extraction (RE) features in the input features.
Figure 2. General model architecture.
Figure 3. Filter Separator Network (FSN) module.
Figure 4. Named Entity Recognition Generation (NERG) module.
Figure 5. Relation Extraction Generation (REG) module.
Table 1. Statistics of the datasets, including the number of sentences in the train/dev/test splits, the number of entities and relations, and the number of entity types and relation types.

| Dataset | Train | Dev | Test | Entities | Relations | Entity Types | Relation Types |
|---|---|---|---|---|---|---|---|
| ACE2005 | 10,051 | 2424 | 2050 | 38,287 | 7070 | 7 | 6 |
| SciERC | 1861 | 275 | 551 | 8094 | 4684 | 6 | 7 |
Table 2. Results of the main experiments on the ACE2005 and SciERC datasets. * denotes results generated from source code. † denotes that the model leverages cross-sentence information. The best results are shown in bold and the second-best results are underlined in the original publication. BERT-Base-Cased [22] and SciBERT [26] were used on the ACE2005 and SciERC datasets, respectively.

| Dataset | Model | NER | RE | RE+ |
|---|---|---|---|---|
| ACE2005 | SPE | 87.2 | 66.7 | 63.2 |
| | MRC4ERE++ | 85.5 | - | 62.1 |
| | TriMF | 87.6 | 66.5 | 62.8 |
| | PFN | 85.5 * | - | 58.6 * |
| | PURE | 88.7 | 66.7 | 63.9 |
| | TablERT | 87.6 | 66.2 | 62.6 |
| | TablERT-CNN | 87.8 | 65.0 | 61.8 |
| | FSN | 88.3 | 68.7 | 65.7 |
| SciERC | SPE | 68.0 | 47.6 | 34.6 |
| | UniRE | 68.4 | - | 36.9 |
| | PURE | 66.6 | 48.2 | 35.6 |
| | PFN | 66.8 | - | 38.4 |
| | MGE | 68.4 | - | 39.4 |
| | PL-Marker | 69.9 | 53.2 | 41.6 |
| | FSN | 69.5 | 54.2 | 42.3 |
Table 3. Ablation study of FSN on ACE2005.

| Ablation | Pre. | Rec. | F1 |
|---|---|---|---|
| FSN | 70.0 | 61.9 | 65.7 |
| w/o Forward_LSTM | 67.8 | 57.5 | 62.2 |
| w/o Backward_LSTM | 66.0 | 58.2 | 61.9 |
| w/o NER_MFE | 66.4 | 58.3 | 62.1 |
| w/o RE_MFE | 65.3 | 58.2 | 61.6 |
| w/o RE_AvgPooling | 66.7 | 58.7 | 62.4 |
Table 4. Robustness test of NER against input perturbations on ACE2005; baseline results and test files are taken from https://www.textflint.io/ (accessed on 30 December 2023). Each perturbation column gives Ori→Aug (Decline).

| Model | ConcatSent | CrossCategory | EntTypos | OOV | SwapLonger | Average Decline |
|---|---|---|---|---|---|---|
| BiLSTM-CRF | 83.0→82.2 (0.8) | 82.9→43.5 (39.4) | 82.5→73.5 (9.0) | 82.9→64.2 (18.7) | 82.9→67.7 (15.2) | 16.6 |
| BERT-based (cased) | 87.3→86.2 (1.1) | 87.4→48.1 (39.3) | 87.5→83.1 (4.1) | 87.4→79.0 (8.4) | 87.4→82.1 (5.3) | 11.6 |
| BERT-based (uncased) | 88.8→88.7 (0.1) | 88.7→46.0 (42.7) | 89.1→83.0 (6.1) | 88.7→74.6 (14.1) | 88.7→78.5 (10.2) | 14.6 |
| TENER | 84.2→83.4 (0.8) | 84.7→39.6 (45.1) | 84.5→76.6 (7.9) | 84.7→51.5 (33.2) | 84.7→31.1 (53.6) | 28.1 |
| Flair | 85.5→85.2 (0.3) | 84.6→44.9 (39.7) | 86.1→81.5 (4.6) | 84.6→81.3 (3.3) | 84.6→73.1 (11.5) | 11.9 |
| PFN | 89.1→87.9 (1.2) | 89.0→80.5 (8.5) | 89.6→86.9 (2.7) | 89.0→80.4 (8.6) | 89.0→84.3 (4.7) | 5.1 |
| FSN | 88.3→86.4 (1.9) | 88.3→82.7 (5.6) | 88.8→86.2 (2.7) | 88.3→81.1 (7.2) | 88.3→85.6 (2.7) | 4.0 |
Table 5. Comparison of model efficiency. Training time (s) refers to the time needed to train one epoch; inference time (ms) is the time taken to predict the relational triples of a single sentence. * denotes results acquired from the source code.

| Dataset | Model | Training Time | Inference Time | F1 |
|---|---|---|---|---|
| ACE2005 | PFN | 361 | 17 | 58.6 * |
| | FSN | 319 | 16 | 65.7 |
| SciERC | PFN | 74 | 6 | 38.4 |
| | FSN | 65 | 3 | 42.3 |
