1. Introduction
Relation extraction, an important research direction [1,2] in the field of information extraction [3], aims to automatically extract entities and their relations from massive text data, providing support for downstream tasks such as intelligent recommendation, semantic search, and deep question answering [4,5]. Relation extraction is usually divided into two subtasks: entity identification [6,7] and relation identification [8,9]. The main goal of entity identification is to identify entities with specific meanings from text, such as names of people, places, dates, organizations, and so on. Entity identification technology has a wide range of applications in the field of natural language processing, including question-answering systems, public opinion analysis, entity linking, and so on. The main goal of relation identification is to identify the relations between entities in text. By automatically extracting relations between entities from large amounts of textual data, relation identification can help build applications such as knowledge graphs, recommendation systems, and sentiment analysis.
With the increasing complexity of information, the relation overlapping problem has begun to emerge [10]. The relation overlapping problem refers to the situation where entity-relation triples in a text share entities. According to the type of overlap, the problem can be further divided into three subcategories: Normal, Single Entity Overlap (SEO), and Entity Pair Overlap (EPO), as illustrated in the examples in Figure 1.
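For concreteness, the three categories can be distinguished mechanically from the set of triples in a sentence. The sketch below is only illustrative: in practice a sentence can fall into more than one category, whereas here EPO is given priority over SEO.

```python
def overlap_type(triples):
    """Classify a sentence's (head, relation, tail) triples.
    EPO:    some (head, tail) pair occurs with more than one relation.
    SEO:    triples share an entity, but no full pair is repeated.
    Normal: no entity is shared between triples."""
    pairs = [(h, t) for h, _, t in triples]
    # EPO: a duplicated (head, tail) pair means multiple relations
    # hold between the same two entities
    if len(pairs) != len(set(pairs)):
        return "EPO"
    # SEO: a duplicated entity (but not a duplicated pair) means a
    # single entity participates in several triples
    entities = [e for h, _, t in triples for e in (h, t)]
    if len(entities) != len(set(entities)):
        return "SEO"
    return "Normal"
```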
Most existing relation extraction methods cannot effectively deal with the relation overlapping problem for two main reasons: (1) The model design is flawed and does not consider the situation where one entity in the text may have relations with multiple entities (Zheng et al. [4]), or the situation where multiple relations exist between a particular head and tail entity (Miwa et al. [11]). (2) When the relation overlapping problem arises, there are always numerous entities in the text, and most of these entities have no relations with each other. The noise from these irrelevant entities may mislead the model into incorrectly identifying them as part of a tail entity or relation. Additionally, the model needs to accurately understand the context around the head entity in order to correctly identify the tail entity and relation, and the presence of irrelevant entities makes this understanding more difficult. As shown in Figure 2, the head entity “Zhang Fansheng” in the text forms only the entity-relation triple “{Zhang Fansheng, Founder, Jinghai Enterprises}” with the tail entity “Jinghai Enterprises” and has no relations with other entities. Due to the interference of entities unrelated to “Zhang Fansheng”, it is difficult for the model to identify the tail entities related to this head entity and the relations between them.
In order to solve the above problems, we design the Entity Attention network and propose the Relation extraction method based on the Entity Attention network and Cascade binary Tagging framework (REACT). To address problem (1), we divide the relation extraction task into two subtasks: head entity identification, and tail entity and relation identification. REACT first identifies the head entities present in the text and then identifies the tail entities that can be paired with each head entity, together with all possible relations between them. With this architecture, REACT is able to handle the relation overlapping problem. Please note that in this paper, entity pairs specifically refer to combinations of head and tail entities that have at least one type of relation. To address problem (2), we introduce the Entity Attention network, which consists of two parts: an Entity Attention Mechanism and an Entity Gated Mechanism. Words in the text have different degrees of importance for different head entities; by introducing head entity information into the Entity Attention Mechanism, REACT assigns attention weights to words according to the head entity and reduces word noise accordingly. After reducing word noise according to the head entity, it is also necessary to consider the relevance of the words to the relation extraction task itself. The Entity Gated Mechanism calculates the degree of association of each word with the relation extraction task and diminishes the noise from words less associated with it. By reducing word noise through these operations, REACT is able to focus on words with higher relevance to the head entity and the relation extraction task, increasing the accuracy of tail entity and relation identification and improving the overall performance of relation extraction.
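The two noise-reduction steps can be pictured numerically. The following is a minimal sketch, not the exact REACT formulation: the bilinear attention form, the gate parameterization (`W_att`, `w_gate`), and the multiplicative combination of the two weights are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reweight_words(H, h_ent, W_att, w_gate):
    """Suppress word noise in two steps (illustrative sketch only).
    H      : (n, d) contextual word vectors for the sentence
    h_ent  : (d,)   head-entity vector
    W_att  : (d, d) bilinear attention parameters (hypothetical)
    w_gate : (d,)   task-relevance parameters     (hypothetical)"""
    # Entity Attention Mechanism: weight each word by its relevance
    # to this particular head entity
    alpha = softmax(H @ (W_att @ h_ent))      # (n,)
    # Entity Gated Mechanism: per-word relevance to the relation
    # extraction task itself
    gate = sigmoid(H @ w_gate)                # (n,)
    # Words scoring low on either criterion are down-weighted
    return (alpha * gate)[:, None] * H        # (n, d)
```

In this sketch, a word irrelevant to the current head entity receives a small attention weight, and a word irrelevant to the extraction task receives a small gate value, so its representation contributes little when identifying tail entities and relations.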
Our main contributions are as follows:
Due to the current lack of Chinese datasets, we constructed a high-quality Chinese dataset containing a large number of instances with relation overlapping problems by optimizing the public DuIE 2.0 entity-relation dataset;
For the relation overlapping problem, we propose the Relation extraction method based on the Entity Attention network and Cascade binary Tagging framework;
We conducted extensive experiments on a high-quality Chinese dataset to evaluate REACT and compared it with other baselines. The results demonstrate that REACT outperforms other baselines in handling relation overlapping problems.
2. Related Works
Early works on relation extraction adopted a pipeline approach (Zelenko et al. [8], Zhou et al. [12], Chan et al. [13], Mintz et al. [14], and Gormley et al. [15]). It first identifies all the entities in a sentence and then performs relation classification for each entity pair. This approach often faces the problem of error propagation, because errors in the early stages cannot be corrected in later stages. To address this problem, subsequent works proposed joint learning of entities and relations, which includes feature-based models (Yu et al. [16]; Li et al. [17]; and Ren et al. [18]) as well as more recent neural network models (Gupta et al. [19]; Katiyar et al. [20]; Zhou et al. [21]; and Fu et al. [22]). By replacing manually constructed features with learned representations, neural network-based models have achieved considerable success in the relation extraction task.
As research has progressed, the relation overlapping problem has received increasing emphasis. The relation overlapping problem refers to the situation where entity-relation triples in a text share entities. To address this problem, many neural network-based joint models have been proposed [23].
Zeng et al. [24] were among the first to consider the relation overlapping problem. They first divided the relation overlapping problem into Normal, SEO, and EPO, and proposed a sequence-to-sequence (Seq2Seq) model with a copy mechanism. To address the relation overlapping problem, they allowed entities to participate freely in multiple triples. Building upon the Seq2Seq model, Zeng et al. [25] further investigated the impact of triple extraction order on relation extraction performance, transforming the relation extraction task into a reinforcement learning process and achieving significant improvements.
Yuan et al. [26] redesigned the relation extraction method and successfully enabled the model to handle the relation overlapping problem. They first identified all entity pairs in the text and then individually determined whether each relation existed, rather than extracting only the most probable relation. Additionally, they pointed out that previous relation extraction methods ignored the connections between relations and independently predicted each possible relation. For example, if an entity pair has the “liveIn” relation, the “dieIn” relation is almost impossible to establish. Therefore, when determining the existence of a certain relation between entities, it is necessary to consider not only the target relation but also the probability scores with respect to other relations. Inspired by the aforementioned research, Yuan et al. [27] assumed that the importance of words in text varies across different relations. They constructed a joint model based on a relation-specific attention mechanism, which first identifies the relations existing in the text, then incorporates each relation into the attention mechanism, and finally identifies entity pairs containing the relation based on attention scores.
Although the above research was able to address the relation overlapping problem, it still regarded the identification of head and tail entities as independent processes, ignoring the semantic and logical connections between them. For example, organizational entities usually have relations with human entities (leaders, founders, members, etc.), but rarely with entities such as songs, animals, or movies. Yu et al. [28] redesigned the method to first identify the head entity, then identify the potential tail entities based on the head entity, and finally determine all possible relations that may exist between the head entity and the tail entities. This approach not only addressed the relation overlapping problem but also achieved outstanding performance by utilizing head entity information in the tail entity identification process. Building upon this research, Li et al. [29] proposed an HBT framework in which the relation extraction task is regarded as a multi-turn question-answering task, incorporating external knowledge to introduce entities and relations. However, generating appropriate questions remains a challenge, and this approach is not suitable for most complex scenarios [30]. Inspired by the work of Yuan et al. [26] and Yu et al. [28], Fu et al. [22] introduced graph convolutional networks, which are widely used in the field of logical reasoning [31], into the task of relation extraction. After identifying the relation between an entity pair, graph convolutional networks are used to infer the possibility of the existence of other relations. For example, “{Trump, LiveIn, United States}” can be inferred from “{Trump, LiveIn, Florida}”. Prior to this, Zheng et al. [4] proposed a strong neural end-to-end joint model based on an LSTM sequence tagger for entities and relations, which helped infer unidentified relations based on identified relations; however, it was unable to address the relation overlapping problem.
Although current research has achieved joint extraction of entities and relations and developed models to address the relation overlapping problem, relations are still treated as discrete labels for entity pairs. This makes it difficult for models to correctly extract overlapping triples. Wei et al. [32] proposed a new cascaded binary tagging framework (CasRel), which uses BERT [33] as the feature extractor and maps head entities to tail entities conditioned on the relations. As shown in the following equation, where r stands for a relation, S stands for a head entity, and O stands for a tail entity, each relation is modeled as a function that maps a head entity to a tail entity:

f_r(S) → O
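Under this view, tail entities are found by relation-specific taggers rather than by classifying entity pairs. The sketch below shows the general shape of one such tagger; the additive conditioning on the head entity and the single-vector scoring parameters are simplifying assumptions for illustration, not CasRel's exact architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tag_tails(H, h_subj, w_start, w_end, threshold=0.5):
    """One relation-specific tagger f_r: given a head entity S, mark
    start/end positions of its tail entities O with binary tags.
    H       : (n, d) contextual word vectors
    h_subj  : (d,)   head-entity vector
    w_start, w_end : (d,) scoring vectors for relation r (hypothetical)"""
    # Condition every word on the head entity: the relation acts as
    # a function mapping S to O, i.e. f_r(S) -> O
    X = H + h_subj
    # Two independent binary taggers mark span starts and span ends
    start_tags = (sigmoid(X @ w_start) > threshold).astype(int)
    end_tags = (sigmoid(X @ w_end) > threshold).astype(int)
    return start_tags, end_tags
```

A tail span is then read off between a predicted start tag and the nearest following end tag, and the triple is emitted as {S, r, O}; running one tagger per relation is what lets the same head entity yield multiple overlapping triples.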
In natural language, the relations between entities are often closely related to their context. By introducing information about the head entity, the model can better understand the semantic structure of the sentence and thus predict the tail entity and relations more accurately. Additionally, entities in a sentence may have multiple roles or meanings, but when paired with a specific head entity, their roles or meanings become clearer, reducing ambiguity in the tail entity and relations and improving identification accuracy. Therefore, CasRel achieved good results. However, Wei et al. [32] were still unable to effectively utilize head entity information. Specifically, they only used head entity information as a parametric feature when identifying tail entities and relations, ignoring the potential role of head entity information in reducing word noise. When the relation overlapping problem occurs, especially in cases of multiple overlapping relations, numerous entities typically appear in the text, and most of them do not participate in forming valid relations. This situation leads to a large number of potentially invalid entity pairs, making it difficult for the model to accurately identify the truly valid entity pairs among the many possibilities and ultimately affecting the overall accuracy of relation extraction.
In order to utilize head entity information to reduce text noise and minimize interference from irrelevant entities, we propose the Relation extraction method based on the Entity Attention network and Cascade binary Tagging framework (REACT). We utilize the Entity Attention network to help the model focus on words that are highly relevant to the head entity and the relation extraction task, thereby reducing errors in identifying tail entities and relations and improving the accuracy of extracting entity-relation triples.