Article

An Easy Partition Approach for Joint Entity and Relation Extraction

1 Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
2 University of Chinese Academy of Sciences, Beijing 100094, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(13), 7585; https://doi.org/10.3390/app13137585
Submission received: 6 March 2023 / Revised: 4 May 2023 / Accepted: 23 May 2023 / Published: 27 June 2023

Abstract

The triplet extraction (TE) task aims to identify the entities and relations mentioned in a given text. TE consists of two subtasks: named entity recognition (NER) and relation classification (RC). Previous work has either treated TE as two separate tasks with independent encoders, or as a single task with a unified encoder. However, both approaches have limitations in capturing the interaction and independence of the features for the different subtasks. In this paper, we propose a simple and direct feature selection and interaction scheme. Specifically, we use a pretraining language model (e.g., BERT) to extract three kinds of features: entity recognition, shared, and relation classification features. To capture the interaction, the shared features carry the common semantic information used by the two tasks simultaneously, and a gate module produces the task-specific features. Experimental results on various public benchmarks show that our proposed method achieves competitive performance, while our model computes roughly seven times faster than CasRel and two times faster than PFN.

1. Introduction

Triplet extraction (TE) aims at extracting the entities and the relations between entity pairs to form triples from a given raw text. It is useful for many applications, especially those that require extracting critical information from text. It is related to two other tasks: named entity recognition (NER) [1] and relation classification (RC) [2].
Triplet extraction has been widely studied from two perspectives. One is the multitask learning perspective: entity recognition and relation classification are regarded as two separate tasks [3], and two independent encoders are used to learn the features of the NER [4,5,6] and RC tasks [7,8,9]. Despite the effectiveness of these approaches, the correlation between the features of the two tasks is absent, which may lead to error propagation between tasks. To mitigate these issues, a single-task perspective has been proposed, which typically designs a unique structure to represent the two tasks [10,11,12], such as TPLinker [13] and UniRE [14]. These approaches integrate NER and RC into one task and generally use a unified encoder to learn a unified feature representation. They avoid error propagation, but they may not capture the specific information needed for each task. In addition, a unified feature representation has been empirically shown to cause feature interference [3].
To obtain improved task-specific features, the authors of [15] proposed a neuron-level partition filter network that learns three feature representations at the neuron level; the structure has proven effective in multiple tasks, but the method is somewhat complex. The pretraining language model (PLM) has achieved state-of-the-art performance in various NLP tasks, including machine translation [16,17] and sentiment analysis [18]. To leverage the latent contextual space of the PLM, we propose a simple but effective approach, called the easy partition approach for relation extraction (EPRE). Unlike the state-of-the-art alternative PFN [15], our proposed EPRE differs in the following aspects: (I) We do not construct a dedicated partition network at the neuron level, but directly and simply partition the output of the PLM at the token level. Specifically, our feature partition leverages the contextual representations of the PLM and performs feature partitioning at the token level to capture task-specific features. (II) We add a gate module to control the information flow between the NER and RC tasks and enhance the model's flexibility. We discard the recursive structure and speed up the training process. As a result, our EPRE is simpler and more time-saving. To validate the effectiveness of EPRE, we conduct extensive experiments on various benchmark datasets. Our experimental results show that EPRE is promising. In brief, we summarize our contributions as follows:
  • We propose a simple but effective joint coding relation extraction approach, namely, the easy partition approach for relation extraction (EPRE). In particular, EPRE models the features shared between the NER and RC tasks to ensure both the independence and the interaction of the task-specific features.
  • We consider different gate modules to fuse the task features and increase the flexibility of the model.
  • Our method is superior in speed and yields state-of-the-art results on the NER and RC tasks on some benchmark datasets.

2. Related Work

The task of TE aims to extract entities and their relations from unstructured text and represent them as relational triples. The traditional TE method is usually divided into two subtasks: NER and RC. NER identifies entity boundaries and types from text, while RC determines the semantic relation between entity pairs. Various methods have been proposed for both subtasks. Table 1 shows the main methods. For NER, there are span-based methods [19,20] that predict entity spans by scoring all possible text segments, and tag-based methods [21] that assign a tag to each token indicating its entity type and position. For RC, there are methods that use independent encoders for each subtask [3,22], treating them as span classification and span pair classification problems, respectively. However, these methods suffer from weak interaction between the two subtasks, which may lead to inconsistent predictions.
To address this limitation, recent works have explored joint extraction methods that combine NER and RC in a single model. These methods can be roughly categorized into three paradigms. 1. Table-filling-based methods [15,24,25], which represent entities and relations in tables; for example, Yan et al. [15] used one table for each type of entity and relation, and relational triples are extracted by combining the results of NER and RE. 2. Sequence-tagging-based methods, in which the tag scheme encodes both the entity position and the relation type for each token [21], and a triplet is then formed according to the tags of the tokens; for example, Zheng et al. [21] proposed a novel tagging scheme that turns the extraction problem into a sequence labeling problem, and, to handle overlapping relations, Wang et al. [13] designed a handshaking tagging scheme that transforms the task into a token pair linking problem. 3. Graph-based methods [26,27], which build graphs over entities and relations and apply graph neural network operations to them; for example, the authors of [27] constructed an entity-relation graph based on dependency parsing results and applied graph convolutional networks to capture global semantic information. In joint extraction methods, the two subtasks usually share the encoding layer, with weak independence between them. Therefore, how to obtain a feature representation suited to each subtask is a key challenge.
To alleviate this issue, the classical LSTM [28] used a gating mechanism to control how much information is forgotten or remembered in the hidden state. Another solution is the attention mechanism, which learns to focus on the relevant parts of the input sequence and ignore the irrelevant ones. Attention mechanisms have been widely adopted in most pretrained language models [29,30], large-scale neural networks that are trained on massive amounts of text data and can be fine-tuned for various NLP tasks. The closest work to ours is that of Yan et al. [15], who proposed a partition filter network at the neuron level, which needs a separate network layer to learn the feature representation of each task. Their method thus requires additional parameters and computations for each task, which increases the training cost and complexity. Our approach is much more straightforward, with a lower training cost.

3. Method

3.1. Triple Extraction Task

Following Yan et al. [15] and Zhong et al. [3], our framework treats entity recognition and relation classification as two subtasks. To maintain both the correlation and the independence between the two subtasks, we adopt a feature partitioning method and a gated feature fusion strategy to generate the feature representation for each task. Given a sequence $s = [w_1, w_2, w_3, \ldots, w_n]$ of length $n$, our goal is to identify all entities of each entity type, and to judge whether a relation holds between the start tokens of an entity pair for each relation type. Formally, let $w_i$ and $w_j$ denote the start and end tokens of an entity of type $k \in K$, and let $w_p$ and $w_q$ denote the start tokens of the two entities of an entity pair under relation type $l \in L$; EPRE is required to identify all token pairs $(w_i, w_j)$ for every $k \in K$ and all token pairs $(w_p, w_q)$ for every $l \in L$, where $K$ and $L$ denote the entity type set and the relation type set, respectively. Finally, we combine the results of the two tasks to obtain triples.

3.2. EPRE Architecture

An overview of EPRE is depicted in Figure 1. EPRE consists of an encoder module, a fused module, and a task module. The encoder and fused modules generate the task-specific features for the NER and RC tasks, and we then use table filling to solve both tasks. Next, we describe each component in detail.

3.2.1. Encoder Module

We leverage the PLM to partition the semantic representation information. Additionally, we assume that the contextual representation encourages learning suitable information for specific tasks during training [31,32]. Specifically, according to the number of tasks, we divide the semantic representation of each token into three partitions: an entity partition, a shared partition, and a relation partition. The shared partition can be used by both tasks.
We first use the PLM to model the semantic representation $h_i$ of each token $w_i$. Each partition is obtained by splitting $h_i$ along the feature dimension: if the embedding dimension of a token is $d$, we split it into three parts of dimension $d/3$ each, one per task. We use the chunk function to denote this operation. The three parts are the NER feature representation $h_{\mathrm{ner}}$, the shared feature representation $h_{\mathrm{share}}$, and the RC feature representation $h_{\mathrm{rel}}$:
$[h_1, h_2, \ldots, h_n] = \mathrm{PLM}([w_1, w_2, w_3, \ldots, w_n])$,
$h_{\mathrm{ner}}, h_{\mathrm{share}}, h_{\mathrm{rel}} = \mathrm{chunk}(h_i)$.
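To make the partition concrete, the following is a minimal PyTorch sketch of the encoder module, assuming a Hugging Face bert-base-cased encoder; the variable names mirror the notation above and are illustrative rather than the authors' exact implementation.

```python
# Minimal sketch of the encoder module (assumes PyTorch + Hugging Face transformers).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")

sentence = "The current CEO of Apple is Cook"
inputs = tokenizer(sentence, return_tensors="pt")
h = encoder(**inputs).last_hidden_state            # (1, n, d), d = 768 for BERT-base

# Token-level partition: split the hidden dimension d into three equal parts,
# one for NER, one shared by both tasks, and one for relation classification.
h_ner, h_share, h_rel = torch.chunk(h, chunks=3, dim=-1)   # each (1, n, d/3)
```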

3.2.2. Fused Module

We observe that the semantic information of each part may contribute differently to each task. Therefore, a gate module is used to weigh this information in different contexts. For the entity recognition part, we compute the feature weight from $h_{\mathrm{ner}}$ and $h_{\mathrm{share}}$ as follows:
$g = \mathrm{sigmoid}(W_1 \cdot h_{\mathrm{ner}} + W_2 \cdot h_{\mathrm{share}} + b)$,
where $W_1$ and $W_2$ are trainable matrices and $b$ is the corresponding offset term. Then, we merge the features to form the NER-specific feature $h_{\mathrm{ge}}$, as follows:
$h_{\mathrm{ge}} = [\, g \circ h_{\mathrm{ner}} \,;\, (1 - g) \circ h_{\mathrm{share}} \,]$,
where $\circ$ denotes element-wise multiplication, $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, and $1$ is a vector of ones.
The relation classification part is symmetric with the NER process. We first use a gate module to compute the weight from $h_{\mathrm{rel}}$ and $h_{\mathrm{share}}$, as follows:
$g = \mathrm{sigmoid}(W_1 \cdot h_{\mathrm{rel}} + W_2 \cdot h_{\mathrm{share}} + b)$,
where $W_1$ and $W_2$ are trainable matrices and $b$ is the corresponding offset term. Then, the global feature $h_{\mathrm{gr}}$ used for the RC task is formed as follows:
$h_{\mathrm{gr}} = [\, g \circ h_{\mathrm{rel}} \,;\, (1 - g) \circ h_{\mathrm{share}} \,]$.
To illustrate how our model works, we take the sentence “The current CEO of Apple is Cook” as an example. After feeding this sentence to the encoder module, we obtain the semantic representation h i of the corresponding tokens (‘the’, ‘current’, ‘ceo’, ‘of’, ‘apple’, ‘is’, ‘cook’), and the embedding dimension of h i is d. Then, we use the chunk function to split it into three parts according to the three tasks, and each part’s dimension is d/3.
In the fused module, taking the entity extraction task as an example, the two parts of features may contribute differently to the NER task, so we use a gating mechanism to control the weights of the $h_{\mathrm{ner}}$ and $h_{\mathrm{share}}$ features. Specifically, we use a sigmoid function as the gating function and learn trainable parameters to obtain $g$ and $1 - g$, which determine the weights of the two feature parts. Then, we concatenate the weighted features to obtain a new feature vector $h_{\mathrm{ge}}$, which is used for the NER task in the subsequent task module.
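The gated fusion described above can be sketched as a small PyTorch module; the class name and layer shapes below are our own illustrative choices, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of the fused module: g = sigmoid(W1·h_task + W2·h_share + b),
    output = [g ∘ h_task ; (1 - g) ∘ h_share]."""
    def __init__(self, part_dim: int):
        super().__init__()
        self.w1 = nn.Linear(part_dim, part_dim, bias=False)
        self.w2 = nn.Linear(part_dim, part_dim, bias=True)   # the bias plays the role of b

    def forward(self, h_task: torch.Tensor, h_share: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.w1(h_task) + self.w2(h_share))          # gate weights
        return torch.cat([g * h_task, (1.0 - g) * h_share], dim=-1)    # (..., 2d/3)

# One gate per task, e.g. for NER (with d = 768, each part has dimension 256):
# gate_ner = GatedFusion(part_dim=256)
# h_ge = gate_ner(h_ner, h_share)
```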

3.2.3. Task Module

In the task module, we apply the features obtained from the previous module to obtain triples by table filling. Specifically, in the NER unit, we need to find the index of the tail token of the entity in the upper triangle part of the table, and mark the corresponding index position as 1. In the RE unit, we need to find the index of the head token pairs of the entity pair in the whole table range, and mark the index position as 1. The specific calculation process is as follows.
Entity recognition unit. To make the model sufficiently flexible, we feed the entity-level feature $h^{\mathrm{ge}}$ to a fully-connected layer to obtain the token-level entity representation $h^e$ used to identify each entity type. Taking the $i$-th token as an example:
$h_i^e = \mathrm{Linear}(h_i^{\mathrm{ge}})$,
where $h_i^{\mathrm{ge}}$ is the entity-level global feature of the $i$-th token and $h_i^e$ is the feature of the $i$-th token used for entity type classification.
Assume that token $x$ at position $i$ is represented as $h_i^e$ and token $y$ at position $j$ is represented as $h_j^e$. We obtain the entity score $e_{ij}^k$ as follows:
$e_{ij}^k = (h_i^e)^T h_j^e$.
The corresponding loss function is as follows:
$\mathcal{L}_{\mathrm{ner}} = \sum_{k \in K} \sum_{(i,j) \in P_k} \mathrm{BCELoss}\big(s_k(i,j), \hat{s}_k(i,j)\big)$,
where $s_k(i,j)$ denotes the predicted entity score of the span of type $k$ composed of tokens $[i:j]$, $\hat{s}_k(i,j)$ denotes the corresponding ground-truth score, $P_k$ is the set of entity spans of type $k$, and $K$ is the set of entity types.
Relation extraction unit. In this unit, we only predict the relation score between the starting indices of each entity pair; the computation is symmetric to that of the NER unit. Let the $i$-th and $p$-th tokens denote the starting indices of the two entities in a pair. We feed the global representation $h_i^{\mathrm{gr}}$ (and, likewise, $h_p^{\mathrm{gr}}$) to a fully-connected layer to obtain the token representation $h_i^r$ (and $h_p^r$), and then use the dot product to compute the relation score $r_{ip}^l$:
$h_p^r = \mathrm{Linear}(h_p^{\mathrm{gr}})$,
$r_{ip}^l = (h_i^r)^T h_p^r$,
where $r_{ip}^l$ is the relation score, i.e., the probability of relation type $l$ holding between the entity pair. When the relation score exceeds a threshold $\beta_r$, the entities with start token indices $i$ and $p$ are predicted to have a relation of type $l$. The loss function $\mathcal{L}_{\mathrm{rel}}$ is as follows:
$\mathcal{L}_{\mathrm{rel}} = \sum_{l \in L} \sum_{(i,p) \in L_l} \mathrm{BCELoss}\big(s_l(i,p), \hat{s}_l(i,p)\big)$,
where $l$ is a relation type, $L$ is the set of relation types, $s_l(i,p)$ is the predicted relation score, $\hat{s}_l(i,p)$ is the ground-truth relation score, and $L_l$ is the set of entity pairs that hold relation $l$.
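The two task units can be sketched as a single table-filling scorer plus a binary cross-entropy loss over the gold tables; the per-type projection and tensor shapes below are our assumptions for illustration, and the loss is applied to the raw dot-product scores through a sigmoid.

```python
import torch
import torch.nn as nn

class TableFillingUnit(nn.Module):
    """Sketch of a task unit: project the fused features and score every token
    pair with a dot product, producing one n-by-n table per entity/relation type."""
    def __init__(self, in_dim: int, out_dim: int, num_types: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim * num_types)
        self.num_types, self.out_dim = num_types, out_dim

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n, in_dim) -> per-type token features (batch, types, n, out_dim)
        b, n, _ = h.shape
        z = self.proj(h).view(b, n, self.num_types, self.out_dim).transpose(1, 2)
        # Dot-product score for every token pair: (batch, types, n, n)
        return torch.einsum("btid,btjd->btij", z, z)

def table_loss(scores: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    """BCE over the score tables, as in L_ner and L_rel (sigmoid applied to the scores)."""
    return nn.functional.binary_cross_entropy_with_logits(scores, gold.float())
```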

3.3. Training and Inference

The training objective is to minimize the loss $\mathcal{L} = \mathcal{L}_{\mathrm{ner}} + \mathcal{L}_{\mathrm{rel}}$. During inference, we predict a triple $(e_{ij}^{k}, r_{ip}^{l}, e_{pq}^{k'})$ by combining the results of the NER unit and the RC unit ($k, k' \in K$, $l \in L$). A triple is output only if $e_{ij}^{k} \geq \alpha_e$, $e_{pq}^{k'} \geq \alpha_e$, and $r_{ip}^{l} \geq \beta_r$, where the thresholds $\alpha_e$ and $\beta_r$ are set to 0.
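The inference step can be sketched as the following decoding routine, which assumes the entity score tables (one per entity type) and relation score tables (one per relation type) produced by the task module; the loop structure and names are illustrative.

```python
import torch

def decode(ent_scores: torch.Tensor, rel_scores: torch.Tensor,
           alpha_e: float = 0.0, beta_r: float = 0.0):
    """ent_scores: (K, n, n) entity tables; rel_scores: (L, n, n) relation tables.
    Thresholds of 0 on the raw scores correspond to 0.5 after the sigmoid."""
    entities = {}                                   # start index -> [(start, end, type)]
    num_ent_types, n, _ = ent_scores.shape
    for k in range(num_ent_types):
        for i in range(n):
            for j in range(i, n):                   # NER only uses the upper triangle
                if ent_scores[k, i, j] >= alpha_e:
                    entities.setdefault(i, []).append((i, j, k))

    triples = []
    for l in range(rel_scores.shape[0]):
        # Relation scores are predicted between the start tokens of an entity pair.
        for i, p in (rel_scores[l] >= beta_r).nonzero(as_tuple=False).tolist():
            for head in entities.get(i, []):
                for tail in entities.get(p, []):
                    triples.append((head, l, tail))
    return triples
```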

4. Experiments

4.1. Setup

We evaluate our approach on three publicly available datasets: ADE [33], SCIERC [34], and WebNLG [35]. The datasets differ in target domain, entity types, and relation types. Table 2 shows the statistics of each dataset. Since the WebNLG dataset does not contain entity type information, we uniformly set its entity label to “None”. A brief introduction to the three datasets follows:
  • ADE [33] is a dataset for adverse drug event (ADE) extraction, containing 4272 sentences collected from the medical literature; each sentence is labeled with entities such as drugs, diseases, and adverse events. The difficulty of the ADE dataset lies in the diversity and ambiguity of adverse events, and in the complexity and implicitness of the cause–effect relationships.
  • SCIERC [34] is a dataset for information extraction in the scientific domain, which contains 2687 sentences collected from computer science papers. The difficulties of the SCIERC dataset are the fine granularity of entities and relations, and the presence of a large number of abbreviations, symbols, and formulas in the text.
  • WebNLG [35] is a dataset originally built for knowledge-graph-to-text generation, containing 23,767 triples extracted from the DBpedia knowledge graph. The difficulty of the WebNLG dataset lies in the size and diversity of the triple collection, and in the fluency and diversity of the natural language descriptions.

4.1.1. Evaluation Metrics

We use the standard micro-F1 score to evaluate model performance, following the evaluation metrics of previous work. In the NER unit, an entity is counted as correct if both its entity type and its span are correct. In the RC unit, a triple is counted as correct if the relation type and the two corresponding entities are correct.
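As a concrete reading of this criterion, the sketch below computes micro-F1 from sets of predicted and gold items (entities or triples encoded as tuples); it is an illustrative helper, not the official evaluation script.

```python
def micro_f1(pred: set, gold: set) -> float:
    """Micro-F1 over exact matches: an item counts as correct only if it is
    identical to a gold item (span + type for entities; both entities and the
    relation type for triples)."""
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Example items: (start, end, entity_type) tuples for NER, or
# (head_entity, relation, tail_entity) tuples for triples.
```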

4.1.2. Implementation Details

We use bert-base-cased as the base encoder for WebNLG and ADE, and scibert-scivocab-uncased as the base encoder for SCIERC. We reserve 10% of ADE for validation. The initial learning rate is 5 × 10−5 for all datasets, with a batch size of 12. We use Adam as the optimizer and train the model for 100 epochs.

4.2. Comparison with Previous Work

4.2.1. Main Results

Table 3 reports the comparative evaluation of our EPRE against existing methods; we report the average F1 scores of three runs on each dataset. Compared with the previous methods that use fully independent (PURE) or fully shared (CasRel/TPLinker) encoders for the two tasks, our EPRE performs better. These results are expected, owing to the interaction between the entity and relation units. Compared with PFN, EPRE performs slightly worse on the NER and RC tasks of WebNLG. We see two reasons for this: (1) The model's performance on WebNLG tends to saturate with only 5019 training sentences; the F1 scores of both models on both tasks are above 90%, well above the manual level, and we believe it is difficult to improve substantially with this amount of training data. (2) Comparing the predictions of the two models, we find that our model tends to miss entities when the entities in a sentence are connected in parallel, or when several entities of the same type appear in a sentence, which we believe is also related to the quality of the dataset. After examining the data, we find that the triples labeled in the dataset are incomplete, which leads to entity confusion during recognition; the typical error is that the model extracts entities that also satisfy the relation type, or that are juxtaposed with the labeled entities. In relation classification, the model is prone to errors when dealing with synonymous or similar relations (such as country and birthplace), because only one relation type is labeled in the dataset; when our model predicts a relation that is similar or synonymous with the labeled type, the prediction is judged as an error, lowering the measured accuracy of our model.
Our model also performs relatively poorly on the NER task of the SCIERC dataset. Analyzing the main error types, we observe that both EPRE and PFN perform poorly on short entities (entities with only one word), with F1 values of 54.9% and 56.31%, respectively, compared with 67.17% and 68.39% on the remaining entities; the models have difficulty accurately identifying short entities. In addition, we analyze the impact on NER performance of whether an entity is included in a triple, dividing entities into In_triple and Out_triple classes. The Out_triple F1 values of EPRE and PFN (46.5% and 48.6%, respectively) are significantly lower than their In_triple values (75.4% and 76.2%, respectively), which may be due to the weaker contextual information around such entities. Nevertheless, we believe that our model's speed advantage makes up for the small gap in NER performance; at the same time, our model performs slightly better than PFN on the RE task.

4.2.2. Computation Efficiency

Table 4 compares the computational efficiency of CasRel, TPLinker, PFN, and our EPRE. Computing time denotes the average time (ms) the model takes to process one sample. For this comparison, we use the WebNLG dataset. The results of CasRel and TPLinker are produced with the official implementations and default configurations; the results of PFN and EPRE are produced on an RTX A5000. CasRel is limited to processing one sentence at a time, making it seriously inefficient and difficult to deploy. PFN uses a recursive structure to obtain cell-level semantic information, which reduces parallelism. Overall, the speed of our EPRE is competitive.

4.2.3. Detailed Results

Detailed results on overlapping patterns and triple numbers. We compare the ability of our model to extract triples from sentences with different numbers of triples and with different overlapping patterns on the WebNLG dataset. Following previous work [13,36], we divide the sentences into three classes according to the overlapping pattern: normal, SEO (single entity overlap), and EPO (entity pair overlap). The sentences are also divided into five subcategories according to the number of triples: 1, 2, 3, 4, and ≥5. The results are shown in Figure 2 and Figure 3. We observe that, in both cases, the improvement in the F1 score of our model comes from the most challenging subclasses (EPO and ≥5); the best results are obtained under the EPO and N ≥ 5 conditions, where the entities and relations are more complex. This demonstrates that our model handles complex scenarios better than the other baselines.
Detailed results on entity density. We compare the ability of EPRE to extract entities under different entity densities. Note that the WebNLG dataset adopts a partial matching evaluation; thus, for a fair comparison, we use the ratio of the number of entities to the number of words in the text as the entity density. For example, in the sentence “The current CEO of Apple is Cook”, there are three entities (CEO, Apple, Cook) and seven words, so the density of the sentence is 3/7. The sentences in the dataset are divided into three categories: dense ≥ 10%, 10% > dense ≥ 5%, and dense < 5%. The results are shown in Figure 4. We observe that the model performs best when the entity density is highest; the lower the density of entities in a sentence, the worse the model performs.
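For clarity, the entity-density formula above amounts to the following small helper (purely illustrative):

```python
def entity_density(num_entities: int, sentence: str) -> float:
    """Entity density = number of entities / number of words in the sentence."""
    return num_entities / len(sentence.split())

# "The current CEO of Apple is Cook": 3 entities, 7 words -> 3/7 ≈ 0.43
print(entity_density(3, "The current CEO of Apple is Cook"))
```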

4.3. Ablation Studies

We perform an ablation study on the ADE dataset to check the contribution of each element to the final performance of EPRE in NER and RC tasks. It involves three aspects: gate module, feature interaction strategy, and partition granularity. Table 5 shows the results.
Gate module. We compare the results without the gate module, with the add gate module, and with the cat gate module. Without the gate module, we directly feed the feature blocks obtained from the PLM to the NER and RC units. In the add gate module, we replace the concatenation (cat) operation with an addition; taking the NER unit as an example, $h_{\mathrm{ge}} = g \circ h_{\mathrm{ner}} + (1 - g) \circ h_{\mathrm{share}}$. Compared with the case without a gate, the experiments show that the gate module is effective, and the cat gate module is more effective than the add gate module. This may be because concatenation preserves the independence of each feature part better than addition.
Feature interaction strategy. We compare three feature partition proportions. In Table 5, the proportion denotes the feature share of the NER, shared, and RE parts. For example, 1:1:1 means that each of the three parts has dimension d/3, where d is the hidden dimension of the PLM. The effect is best when the proportions are equal. When we assign a larger proportion of the features to a specific task, the proportion of shared features shrinks, and the model performance decreases by 0.8% and 1.7% on the NER and RE tasks, respectively. Among the three splits, performance is best when the three parts have equal dimensions, which indicates the importance of both shared and task-specific features for the two tasks.
Partition granularity. We try three partition granularities. One partition means directly using the pretrained features for both subtasks, so the features of the two specific tasks are identical. Two partitions divide the output of the pretraining model into two parts, one for the NER task and the other for the RC task, so the two tasks share no features. Our model uses three partitions. As can be seen, using independent task-specific features is better than using only shared features: the features required by each task differ, and full sharing cannot separate them. Performance is significantly better when both independent and shared features are considered for each specific task simultaneously.

5. Conclusions

In this paper, we propose a simple and effective feature partition method to ensure the feature independence and interaction of NER and RC tasks. We simply divide the results of the pretraining model into the RC-specific feature, NER-specific feature, and shared feature. This accelerates the whole process. Then, we use the gate module to obtain the NER feature and RC feature, ensuring feature interaction and independence between tasks. We conduct extensive experiments on three datasets to verify the effectiveness of our EPRE. We will further investigate what the PLM has learned so as to make better use of feature information. Furthermore, an adaptive threshold for each relation type and entity type can alleviate the long-tail problem. However, it needs further investigation.

Author Contributions

Conceptualization, J.H.; methodology, J.H.; validation, P.H. and X.D.; supervision, P.H.; funding acquisition, P.H. and X.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Research data are available at https://github.com/Coopercoppers/PFN/tree/main/data, accessed on 10 February 2023.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ekbal, A.; Bandyopadhyay, S. Bengali Named Entity Recognition Using Classifier Combination. In Proceedings of the 2009 Seventh International Conference on Advances in Pattern Recognition, Washington, DC, USA, 4–6 February 2009; pp. 259–262. [Google Scholar] [CrossRef]
  2. Zhou, G.; Su, J.; Zhang, J.; Zhang, M. Exploring Various Knowledge in Relation Extraction. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), Ann Arbor, MI, USA, 25–30 June 2005; pp. 427–434. [Google Scholar] [CrossRef] [Green Version]
  3. Zhong, Z.; Chen, D. A Frustratingly Easy Approach for Entity and Relation Extraction. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 50–61. [Google Scholar] [CrossRef]
  4. Patil, N.; Patil, A.; Pawar, B. Named Entity Recognition using Conditional Random Fields. Procedia Computer Science. In Proceedings of the International Conference on Computational Intelligence and Data Science, Las Vegas, NV, USA, 16–18 December 2020; Volume 167, pp. 1181–1188. [Google Scholar] [CrossRef]
  5. Yang, L.; Fu, Y.; Dai, Y. BIBC: A Chinese Named Entity Recognition Model for Diabetes Research. Appl. Sci. 2021, 11, 9653. [Google Scholar] [CrossRef]
  6. Wang, Y.; Sun, Y.; Ma, Z.; Gao, L.; Xu, Y. An ERNIE-Based Joint Model for Chinese Named Entity Recognition. Appl. Sci. 2020, 10, 5711. [Google Scholar] [CrossRef]
  7. Peng, T.; Han, R.; Cui, H.; Yue, L.; Han, J.; Liu, L. Distantly Supervised Relation Extraction using Global Hierarchy Embeddings and Local Probability Constraints. Knowl. -Based Syst. 2022, 235, 107637. [Google Scholar] [CrossRef]
  8. Li, Q.; Li, L.; Wang, W.; Li, Q.; Zhong, J. A comprehensive exploration of semantic relation extraction via pre-trained CNNs. Knowl. -Based Syst. 2020, 194, 105488. [Google Scholar] [CrossRef]
  9. Zheng, S.; Xu, J.; Zhou, P.; Bao, H.; Qi, Z.; Xu, B. A neural network framework for relation extraction: Learning entity semantic and relation pattern. Knowl. -Based Syst. 2016, 114, 12–23. [Google Scholar] [CrossRef]
  10. Wan, Q.; Wei, L.; Chen, X.; Liu, J. A region-based hypergraph network for joint entity-relation extraction. Knowl. -Based Syst. 2021, 228, 107298. [Google Scholar] [CrossRef]
  11. Tang, R.; Chen, Y.; Qin, Y.; Huang, R.; Dong, B.; Zheng, Q. Boundary assembling method for joint entity and relation extraction. Knowl. -Based Syst. 2022, 250, 109129. [Google Scholar] [CrossRef]
  12. Zhao, K.; Xu, H.; Cheng, Y.; Li, X.; Gao, K. Representation iterative fusion based on heterogeneous graph neural network for joint entity and relation extraction. Knowl. -Based Syst. 2021, 219, 106888. [Google Scholar] [CrossRef]
  13. Wang, Y.; Yu, B.; Zhang, Y.; Liu, T.; Zhu, H.; Sun, L. TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 1572–1582. [Google Scholar] [CrossRef]
  14. Wang, Y.; Sun, C.; Wu, Y.; Zhou, H.; Li, L.; Yan, J. UniRE: A Unified Label Space for Entity Relation Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 220–231. [Google Scholar] [CrossRef]
  15. Yan, Z.; Zhang, C.; Fu, J.; Zhang, Q.; Wei, Z. A Partition Filter Network for Joint Entity and Relation Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 185–197. [Google Scholar] [CrossRef]
  16. Zhao, L.; Gao, W.; Fang, J. High-Performance English–Chinese Machine Translation Based on GPU-Enabled Deep Neural Networks with Domain Corpus. Appl. Sci. 2021, 11, 10915. [Google Scholar] [CrossRef]
  17. Tanoli, I.K.; Amin, I.; Junejo, F.; Yusoff, N. Systematic Machine Translation of Social Network Data Privacy Policies. Appl. Sci. 2022, 12, 10499. [Google Scholar] [CrossRef]
  18. AlBadani, B.; Shi, R.; Dong, J.; Al-Sabri, R.; Moctard, O.B. Transformer-Based Graph Convolutional Network for Sentiment Analysis. Appl. Sci. 2022, 12, 1316. [Google Scholar] [CrossRef]
  19. Li, F.; Lin, Z.; Zhang, M.; Ji, D. A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 4814–4828. [Google Scholar] [CrossRef]
  20. Wang, B.; Lu, W. Combining Spans into Entities: A Neural Two-Stage Approach for Recognizing Discontiguous Entities. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6216–6224. [Google Scholar] [CrossRef]
  21. Zheng, S.; Wang, F.; Bao, H.; Hao, Y.; Zhou, P.; Xu, B. Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1227–1236. [Google Scholar] [CrossRef] [Green Version]
  22. Ye, D.; Lin, Y.; Li, P.; Sun, M. Packed Levitated Marker for Entity and Relation Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 4904–4917. [Google Scholar] [CrossRef]
  23. Wang, H.; Qin, K.; Lu, G.; Luo, G.; Liu, G. Direction-sensitive relation extraction using Bi-SDP attention model. Knowl. -Based Syst. 2020, 198, 105928. [Google Scholar] [CrossRef]
  24. Zheng, H.; Wen, R.; Chen, X.; Yang, Y.; Zhang, Y.; Zhang, Z.; Zhang, N.; Qin, B.; Ming, X.; Zheng, Y. PRGC: Potential Relation and Global Correspondence Based Joint Relational Triple Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–9 August 2021; pp. 6225–6235. [Google Scholar] [CrossRef]
  25. Ren, F.; Zhang, L.; Yin, S.; Zhao, X.; Liu, S.; Li, B.; Liu, Y. A Novel Global Feature-Oriented Relational Triple Extraction Model based on Table Filling. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 2646–2656. [Google Scholar] [CrossRef]
  26. Xue, F.; Sun, A.; Zhang, H.; Chng, E.S. GDPNet: Refining Latent Multi-View Graph for Relation Extraction. arXiv 2020, arXiv:2012.06780. [Google Scholar] [CrossRef]
  27. Liang, Z.; Du, J. Sequence to sequence learning for joint extraction of entities and relations. Neurocomputing 2022, 501, 480–488. [Google Scholar] [CrossRef]
  28. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  29. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7871–7880. [Google Scholar] [CrossRef]
  30. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Under-standing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
  31. Alt, C.; Gabryszak, A.; Hennig, L. Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1534–1545. [Google Scholar] [CrossRef]
  32. Conneau, A.; Kruszewski, G.; Lample, G.; Barrault, L.; Baroni, M. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 2126–2136. [Google Scholar] [CrossRef] [Green Version]
  33. Gurulingappa, H.; Rajput, A.M.; Roberts, A.; Fluck, J.; Hofmann-Apitius, M.; Toldo, L. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. J. Biomed. Inform. 2012, 45, 885–892. [Google Scholar] [CrossRef] [PubMed]
  34. Luan, Y.; He, L.; Ostendorf, M.; Hajishirzi, H. Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018. [Google Scholar]
  35. Gardent, C.; Shimorina, A.; Narayan, S.; Perez-Beltrachini, L. Creating Training Corpora for NLG Micro-Planners. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 179–188. [Google Scholar] [CrossRef] [Green Version]
  36. Wei, Z.; Su, J.; Wang, Y.; Tian, Y.; Chang, Y. A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1476–1488. [Google Scholar] [CrossRef]
  37. Eberts, M.; Ulges, A. Span-based Joint Entity and Relation Extraction with Transformer Pre-training. arXiv 2019, arXiv:1909.07755. [Google Scholar]
Figure 1. The main architecture of EPRE, which includes an encoder module, a fused module, and a task module. We use the PLM to generate embedding of tokens in the encoder module and then obtain the final subtask features in the fused module. For both tasks, we use the approach of table filling, but note that we only need to consider the upper triangular region in the NER unit, and we need to consider the whole table region in the RC unit.
Figure 2. Partial matching F1 score of relational triples extracted from sentences with three different overlapping patterns in the WebNLG dataset.
Figure 3. Partial matching F1 score of relational triples extracted from sentences with different numbers of triples in the WebNLG dataset. We divide sentences into five categories. Each category contains sentences that have 1, 2, 3, 4, and ≥ 5 triples.
Figure 4. F1 scores for entity extraction under different entity densities. According to the entity density, sentences are divided into three categories: dense ≥ 10%, 10% > dense ≥ 5%, and dense < 5%.
Table 1. Common technology paradigm of entity recognition and relation extraction.
Method Type | Meaning of Methods | Related Paper
Span-based method | In NER tasks, entity spans are identified in the text and then classified. In RE tasks, pairs of entity spans are classified. | [19,20]
Tag-based method | In NER, a label is assigned to each token, often using “BIO”/“BIOES” tags for token positions and combining the token position with the entity type (e.g., B-location). In RE tasks, a combination of token position, relation type, and entity position is often used as the label (e.g., B-CEO-1). | [21,23]
Table filling method | A T × N × N matrix is used to represent the sentence, where T is the entity or relation type and N is the sentence length. In NER, the diagonal of the matrix gives the starting position of an entity, the corresponding end position is found in the upper triangle and marked as 1, and the entities are determined from the matrix. In RE, the position corresponding to the head (or tail) token pair of an entity pair is marked as 1, and the related entity pairs are determined from the matrix. | [15,24,25]
Graph-based method | Entities and relations are represented as nodes and edges of a graph, which is then processed with graph algorithms. | [26,27]
Table 2. Statistics of the three datasets. We use ADE, SCIERC, and WebNLG to evaluate our model. In WebNLG, the entity type has no annotation.
Dataset | Entity Types | Relation Types | Train | Test | Dev
SCIERC | 6 | 7 | 1861 | 275 | 551
WebNLG | None | 170 | 5019 | 500 | 703
ADE | 2 | 1 | 3845 | - | 427
Table 3. F1 scores on the WebNLG, ADE, and SCIERC datasets. The SCIERC results use the SciBERT model; the others use the BERT model. Values marked with ★ are results we reproduced.
Method | WebNLG NER | WebNLG RE | ADE NER | ADE RE | SCIERC NER | SCIERC RE
CasRel [36] | 95.5 | 91.8 | - | - | - | -
TPLinker [13] | - | 91.9 | - | - | - | -
PURE [3] | - | - | - | - | 66.6 | 35.6
SpERT [37] | - | - | 89.3 | 79.2 | |
PFN [15] | 98.0 ★ | 93.5 ★ | 89.7 ★ | 80.3 ★ | 68.4 ★ | 37.5 ★
EPRE | 97.5 | 92.9 | 90.0 | 81.9 | 67.6 | 38.3
Table 4. Comparison of computational efficiency. Computing time represents the model’s average time (ms) to process a sample.
Model | Computing Time (ms)
CasRel [36] | 76.8
TPLinker [13] | 25.6
PFN [15] | 23.8
EPRE | 10.2
Table 5. Results of ablation study on the ADE dataset.
Ablation | Settings | Entity Prec. | Entity Rec. | Entity F1 | Triple Prec. | Triple Rec. | Triple F1
Gate Module | No gate | 89.10 | 90.30 | 89.34 | 81.30 | 81.30 | 81.30
Gate Module | Add Gate | 87.71 | 91.29 | 89.40 | 80.61 | 82.99 | 81.80
Gate Module | Cat Gate | 88.71 | 91.39 | 90.00 | 80.90 | 82.99 | 81.90
Partition Proportion | 1:1:1 | 88.71 | 91.39 | 90.00 | 80.90 | 82.99 | 81.90
Partition Proportion | 2:1:2 | 87.81 | 90.70 | 89.20 | 78.52 | 81.90 | 80.21
Partition Proportion | 1:2:1 | 88.11 | 90.40 | 89.30 | 81.70 | 81.80 | 81.70
Partition Granularity | One Part | 88.40 | 90.00 | 89.26 | 81.70 | 80.41 | 81.20
Partition Granularity | Two Part | 87.51 | 90.50 | 89.00 | 80.90 | 82.70 | 81.12
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hou, J.; Deng, X.; Han, P. An Easy Partition Approach for Joint Entity and Relation Extraction. Appl. Sci. 2023, 13, 7585. https://doi.org/10.3390/app13137585

