
Entity Relation Extraction Based on Entity Indicators

College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an 710049, China
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Symmetry 2021, 13(4), 539;
Submission received: 10 March 2021 / Revised: 22 March 2021 / Accepted: 23 March 2021 / Published: 25 March 2021
(This article belongs to the Section Computer)


Relation extraction aims to extract semantic relationships between two specified named entities in a sentence. Because a sentence often contains several named entity pairs, a neural network is easily confused when learning a relation representation without position and semantic information about the considered entity pair. In this paper, instead of learning an abstract representation from raw inputs, task-related entity indicators are designed to enable a deep neural network to concentrate on task-relevant information. By implanting entity indicators into a relation instance, the neural network effectively encodes the syntactic and semantic information of the instance. Organized, structured and unified entity indicators make the similarity between sentences that share the same or similar entity pairs, as well as the internal symmetry of a single sentence, more obvious. In the experiments, a systematic analysis was conducted to evaluate the impact of entity indicators on relation extraction. The method achieves state-of-the-art performance, exceeding the compared methods by more than 3.7%, 5.0% and 11.2% in F1 score on the ACE Chinese corpus, ACE English corpus and Chinese literature text corpus, respectively.

1. Introduction

Relation extraction is one of the fundamental information extraction (IE) tasks that aims to identify the semantic relationship between two named entities in a sentence [1]. For example, given the sentence “Steve Jobs was the co-founder of Apple Inc.”, an employee relation instance can be identified between “Steve Jobs” and “Apple Inc.”. Finding the relation between two named entities has a wide range of applications, e.g., clinical decision support [2], drug discovery [3] and economic management [4]. This task is seen as foundational to support other natural language processing (NLP) tasks, such as knowledge graph construction [5], question answering [6] and natural language understanding [7]. Therefore, relation extraction has received extensive research attention [8]. Currently, the neural network is the most popular method to support relation extraction, where a multilayer stacked architecture is adopted to support the designated feature transformation, e.g., the convolutional neural network (CNN) [9], recurrent neural network (RNN) [10] and attention mechanism [11]. This approach has the advantage of extracting high-order abstract features from raw inputs, avoiding the effort required for the manual generation of designed features. The main problem for relation extraction is that a sentence often contains several named entities. Because relation types are asymmetrical, every entity pair in a sentence should be considered a possible relation instance. This consideration leads to a serious data imbalance problem. Furthermore, all entity pairs share the same context, weakening the discriminability of the features used to predict a relation instance. Therefore, obtaining entity position information is highly important for a neural network to concentrate on the considered entity pair.
In related works, several neural-network-based strategies have been proposed to encode entity position information. Position embedding is a traditional method [12] in which, for every word in a sentence, its coordinates relative to the two relation arguments (named entities) are embedded into a vector; this vector is concatenated with the word embedding to encode the position information. The multichannel method is another technique [13] to attend to the entity pair in relation extraction. In this method, a sentence is divided into five channels. Each channel adopts an independent lookup table, enabling the neural network to learn different representations for the same word. The shortest dependency path between the named entities is also widely adopted for encoding the position information of relation instances [14]. In other methods, Zhang et al. [15] used the semantic information of patterns and a flexible matching method to constrain the structure of relation instances. Zheng et al. [16] exploited the subsentences between two named entities.
A relation is mentioned in a sentence with respect to two named entities. The entity pair plays a central role in relation representations. Entity pairs have three characteristics. First, the structural information of a relation mention is determined by the entity pair, which precisely divides a sentence into five parts, each with specific expression patterns. Second, the semantic expression of a relation mention depends on the entity pair; every word in a relation mention semantically depends on the named entities. Third, the semantic information of the entity pair is important for relation extraction; for example, some relation types only appear with specific named entity types.
Motivated by these characteristics, in this paper, rather than directly implementing a neural network on a sentence, task-related entity indicators are designed for a neural network to capture the position information of named entities. This approach enables a deep neural network to concentrate on the task-relevant information and ignore the effect of the irrelevant named entities. This strategy has also been discussed by Soares et al. [17], Zhang et al. [18] and Zhong et al. [19]. In this paper, we extend this notion further by encoding positional, syntactic and semantic information into entity indicators, making a deep neural network concentrate on the task-relevant information. This method is also effective for capturing syntactic and semantic information of a relation instance. The contributions of this paper include the following:
Entity indicators are designed to support relation extraction. Several types of entity indicators are proposed in this paper. These indicators are effective for capturing the semantic and structural information of a relation instance.
The entity indicators are evaluated based on three public corpora, providing a systematic analysis of these indicators in supporting relation extraction. A performance comparison showed that our method considerably outperforms all compared works.
The rest of this paper is organized as follows. Section 2 discusses the related works on relation extraction. In Section 3, entity indicators are introduced. Section 4 evaluates the entity indicators based on three public corpora. The conclusion is given in Section 5.

2. Related Works

Most early research methods on relation extraction can be categorized into feature-based methods [20] and kernel-based methods [21]. In feature-based models, a shallow architecture (e.g., a Support Vector Machine (SVM) [22] or Maximum Entropy (ME) [20]) is adopted to make a prediction based on categorical features. For example, Kambhatla et al. [20] combined lexical, syntactic and semantic features for relation extraction. Dashtipour et al. [23] proposed a scalable system for Persian Named Entity Recognition (PNER) that combines a Persian grammatical rule-based approach with an SVM. Minard et al. [24] used external resources, e.g., stems of the words and VerbNet’s classes. Because the structural information of relation instances is important to support relation extraction, Chen et al. [25] proposed a feature assembly method to generate combined features. In addition, Liu et al. [26] proposed a convolutional tree kernel-based method that incorporates semantic similarity measures. Panyam et al. [27] employed a graph kernel to encode edge labels and edge directions. The extendibility of kernels is limited because manually designed distance functions (kernels) are used to compute the similarities between relation instances. Furthermore, the generation of dependency trees heavily depends on external toolkits, which are also error-prone.
Because neural networks have the advantage of automatically extracting high-order representation from relation instances, they are widely used to support relation extraction. For example, Leng et al. [28] designed a deep architecture fusing both word level and sentence level information. Liu et al. [9] introduced an early model based on a neural network, where a convolutional neural network (CNN) was designed to support relation extraction. In relation extraction, the entity position is important information to point to the entity pair. Zeng et al. [29] utilized relative distance between the current word and the target entity pair to encode position features under a CNN model. To utilize contextual features between the named entities, Zeng et al. [12] proposed a piecewise convolutional neural network (PCNN). Li et al. [30] combined the PCNN and attention mechanism for relation extraction. Zhang et al. [31] applied CNN model to shape classification in image processing. Wang et al. [32] used a bidirectional shortest dependency path (Bi-SDP) attention mechanism to capture the dependency information of words. To capture semantic dependencies in a relation instance, a long short-term memory (LSTM) model with attention mechanism was also applied [33,34]. Based on a graph neural network, Zhao et al. [35] proposed an N-gram Graph LSTM (NGG LSTM) model to capture sentence structure information. Chen et al. [13] used a multichannel deep neural network to partition a whole sentence into five parts, enabling the neural network to learn different representations for the same word.
Instead of investigating position embedding, Zhang et al. [18] utilized entity indicators to identify the start and end of the entity pair under a recurrent neural network. In this approach, indicators are placed on both sides of the entity pair to point to entity positions. An attention-based bidirectional long short-term memory (Att-BLSTM) model was proposed for obtaining contextual semantic dependencies from long texts [33]. Entity indicators were also used as the position indicators in this method. Based on the pretrained language model BERT [36], Soares et al. [17] placed four entity indicators on both sides of two named entities. The outputs of these four entity marker representations are concatenated as the relation representation. Zhong et al. [19] used entity type information to mark the entity pair and to learn the relation representation. Huang et al. [37] proposed a knowledge graph enhanced transformer encoder architecture to handle the semantic gap problem between word embeddings and entity embeddings.
The study of named entity recognition (NER) is important for relation extraction. Because relation extraction relies on the correct positions of entity pairs in sentences, the study of NER in multiple languages is becoming increasingly popular. McDonough et al. [38] studied geographic NER within a specified scope for modern French. Isozaki et al. [39] studied Japanese named entity recognition based on a rule generator and decision tree learning. Medical NER in Swedish and Spanish has also been studied [40]. In summary, the ability to capture position information about named entities is highly important for supporting relation extraction, and the study of relation extraction can also extend to multiple languages. In feature-based models, features are combined to capture the position information. The shortcoming of feature-based models is that manually designed rules must be used to generate combined features. Furthermore, compared with distributed representations of words, categorical features are not effective for encoding the semantic information of words. In kernel-based models, dependency trees are used to model the sentence structure. The main problem of kernel-based models is that generating dependency trees depends strongly on external toolkits, which are error-prone. This process also depends on manually designed distance functions for computing the similarities between relation instances. The neural network is the most popular technique for relation extraction. Position embedding, multichannel, PCNN and entity markers have been proposed to capture position information. The shortcoming of these methods is that they cannot simultaneously capture both position information and semantic information.

3. Methodology

A relation instance is defined as a triad $I = \langle r, a_1, a_2 \rangle$ that consists of a relation mention ($r$) and two arguments ($a_1$, $a_2$). A relation mention is a sentence or a clause that mentions a relation, e.g., $r = [w_1, w_2, \ldots, w_N]$. Arguments refer to named entities in the relation mention.
In a deep neural network, word embedding is implemented to transform the one-hot representation of a word into a dense semantic space. The embedding operation is represented as:
$[H_1^e, H_2^e, \ldots, H_N^e] = \mathrm{Embedding}(r)$
The output is referred to as $H^e$. The superscript $e$ indicates that it is a hidden layer output by an embedding layer.
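As an illustration of the embedding operation above, the following sketch (a toy NumPy version, not the authors' code; the lookup table and word ids are invented) maps a sequence of word ids to the matrix $H^e$:

```python
import numpy as np

def embedding(r, lookup):
    """Map each word id in the relation mention r to its dense vector.

    lookup is a (vocabulary, L) matrix; the output H^e has one row
    per word, i.e., shape (N, L)."""
    return lookup[np.asarray(r)]

rng = np.random.default_rng(0)
lookup = rng.normal(size=(100, 8))    # toy vocabulary of 100 words, L = 8
H_e = embedding([3, 17, 42], lookup)  # a mention of N = 3 words
```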
Let $W^c \in \mathbb{R}^{L \times K}$ be a filter. Then, the convolutional operation is defined as $H_i^c = f_c(W^c \cdot [H_i^e, \ldots, H_{i+K}^e] + b)$, where $f_c$ denotes a nonlinear function and $b$ is a bias term. The implementation of the convolutional operation over $H^e$ is represented as:

$[H_1^c, H_2^c, \ldots, H_{N-K+1}^c] = \mathrm{Conv}(H^e)$

The output of the convolutional layer is referred to as $H^c \in \mathbb{R}^{(N-K+1) \times N}$, representing a high-order abstract representation generated from the local patches of the input. It is often followed by a pooling layer to collect the salient features. A padding operation can be designed to generate a matrix $H^c$ with the same column size as $H^e$.
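A minimal sketch of this convolutional operation (window size $K = 3$ and tanh as $f_c$; the shapes and weights are toy values, not the paper's settings):

```python
import numpy as np

def conv(H_e, W_c, b):
    """Slide a window of K consecutive rows over H^e, flatten it and
    project: H_i^c = tanh(W^c · [H_i^e, ..., H_{i+K-1}^e] + b)."""
    N, L = H_e.shape
    K = W_c.shape[0] // L                        # filter spans K words
    windows = [H_e[i:i + K].reshape(-1) for i in range(N - K + 1)]
    return np.tanh(np.stack(windows) @ W_c + b)  # shape (N - K + 1, filters)

rng = np.random.default_rng(1)
H_e = rng.normal(size=(10, 8))     # N = 10 words, L = 8
W_c = rng.normal(size=(3 * 8, 5))  # K = 3 window, 5 filters
H_c = conv(H_e, W_c, np.zeros(5))
```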
If the dependency between the inputs is introduced, the recurrent operation can be defined as $H_i^r = f_r(U^r \cdot H_i^e + V^r \cdot H_{i-1}^r + b)$, where $f_r$ is a nonlinear function, and $U^r$, $V^r$ and $b$ are the parameters of the recurrent operation.
The recurrent operation indicates that the output $H_i^r$ depends on both the input $H_i^e$ and the previous state $H_{i-1}^r$. Implementing a recurrent operation on $H^e$, the output $H^r$ can be formalized as:

$[H_1^r, H_2^r, \ldots, H_N^r] = \mathrm{Recu}(H^e)$
The recurrent network has the advantage of capturing the dependency information in a sentence.
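The recurrent operation can be sketched in the same toy style (tanh as $f_r$, a zero initial state; the state dimension and weights are illustrative):

```python
import numpy as np

def recu(H_e, U, V, b):
    """Recurrent pass: H_i^r = tanh(U · H_i^e + V · H_{i-1}^r + b),
    starting from a zero initial state."""
    states, prev = [], np.zeros(U.shape[0])
    for h in H_e:                       # left-to-right over the mention
        prev = np.tanh(U @ h + V @ prev + b)
        states.append(prev)
    return np.stack(states)             # shape (N, state dim)

rng = np.random.default_rng(2)
H_e = rng.normal(size=(10, 8))          # N = 10 inputs of dimension 8
U = 0.1 * rng.normal(size=(6, 8))       # input-to-state weights
V = 0.1 * rng.normal(size=(6, 6))       # state-to-state weights
H_r = recu(H_e, U, V, np.zeros(6))
```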
In a deep neural network, the CNN and RNN are mainly used to support the designed abstract feature transformation. They are usually followed by a fully connected layer ($\mathrm{Conn}(\cdot)$) and an output layer ($\mathrm{Softmax}(\cdot)$), which carry out a global adjustment and output normalized results.
The abilities of the CNN and the RNN to capture structural information are different. In the CNN, the parameters $W^c$ and $b$ are shared across the whole input matrix $H^e$. Due to the vanishing gradient problem, the recurrent network only weakly captures long-distance dependencies in a sentence. Let $w_i$ and $w_j$ ($i + K < j$) be two named entities in a relation mention $r$, where $K$ is the size of the convolutional filter. Because the distance between $w_i$ and $w_j$ is larger than $K$, the convolutional operation cannot capture the dependency between them. In the recurrent operation, the semantic information of $w_i$ is propagated as $H_j^r = f_r(U^r \cdot H_j^e + V^r \cdot (\cdots (f_r(U^r \cdot H_{i+1}^e + V^r \cdot H_i^r + b)) \cdots) + b)$, where the influence of $w_i$ vanishes considerably. In this condition, both the CNN and the RNN are unable to capture the dependency between the entity pair. Furthermore, named entities can be composed of arbitrary words. A neural network can easily be confused when learning a relation representation without position information for the named entities.
In this paper, entity indicators are proposed for capturing the position information of a relation instance. Prior to introducing the entity indicator method, two traditional strategies for encoding entity positions are presented, which will be compared with the entity indicator strategy.
Position Embedding: Let $i$ and $j$ denote the start positions of two named entities $e_1$ and $e_2$ in a sentence, respectively. Then, the coordinate of a word at position $k$ is computed as $[k-i, k-j]$. This vector is embedded into a higher-dimensional vector, concatenated with the word embedding and fed into a neural network. This process is formalized as follows:

$H^e = [\ldots, \mathrm{Embedding}(w_k) \oplus \mathrm{Embedding}([k-i, k-j]), \ldots]\big|_{k=1}^{N}$

Let $N$, $L$ and $D$ represent the length of a relation mention, the dimension of word embedding and the dimension of position embedding, respectively. Then, $H^e \in \mathbb{R}^{N \times (L+D)}$ is the output of word embedding concatenated with position embedding.
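The relative coordinates used by position embedding are straightforward to compute; the sketch below (with hypothetical 0-based indices) lists $[k-i, k-j]$ for every word position:

```python
def position_coordinates(N, i, j):
    """Coordinates [k - i, k - j] of each word w_k (0-based) relative to
    the start positions i and j of the two named entities."""
    return [[k - i, k - j] for k in range(N)]

coords = position_coordinates(6, 1, 4)  # e1 starts at 1, e2 starts at 4
```

Each pair is then mapped through its own lookup table and concatenated with the word embedding, as in the formula above.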
Multichannel: In the multichannel method, a relation instance $r = [w_1, \ldots, w_i, \ldots, w_{i+s}, \ldots, w_j, \ldots, w_{j+t}, \ldots, w_N]$ can be partitioned into five parts by the two named entities:

$r_1 = [w_1, \ldots, w_{i-1}]$
$r_2 = [w_i, \ldots, w_{i+s}]$
$r_3 = [w_{i+s+1}, \ldots, w_{j-1}]$
$r_4 = [w_j, \ldots, w_{j+t}]$
$r_5 = [w_{j+t+1}, \ldots, w_N]$
Every part is viewed as an independent channel that uses a nonshared lookup table to transform every word in each channel into a vector representation. This approach can be formalized as follows:
$y = \mathrm{Softmax}\big(\mathrm{Conn}\big(\oplus_{k=1}^{5} \mathrm{Conv}(\mathrm{Embedding}_k(r_k))\big)\big)$
The multiple channels enable the same word to express different semantic meanings in different channels. In the training process, these channels do not interact during backpropagation, enabling the neural network to learn different representations for the same word.
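The five-part split can be sketched as follows (the sentence and spans are illustrative; $s$ and $t$ are the lengths of the two entities beyond their first token):

```python
def five_channels(words, i, s, j, t):
    """Partition r into r1..r5 around e1 = words[i..i+s] and
    e2 = words[j..j+t] (inclusive indices, e1 before e2)."""
    return [words[:i],              # r1: before e1
            words[i:i + s + 1],     # r2: e1 itself
            words[i + s + 1:j],     # r3: between the entities
            words[j:j + t + 1],     # r4: e2 itself
            words[j + t + 1:]]      # r5: after e2

words = "Steve Jobs was the co-founder of Apple Inc .".split()
parts = five_channels(words, i=0, s=1, j=6, t=1)
```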

3.1. Entity Indicators

In a traditional relation extraction task, named entities are manually labeled and given as inputs; they have precise positions in a sentence. Inserting specific tokens at the boundaries of named entities enables a deep neural network to concentrate on the considered entity pair. Each indicator is encoded into the same representation, which can be seen as an “anchor” of a relation instance. Therefore, in addition to pointing to the positions of the arguments, the indicators are beneficial for learning the dependencies between entity indicators and words.
Let $r = [w_1, \ldots, w_i, \ldots, w_{i+s}, \ldots, w_j, \ldots, w_{j+t}, \ldots, w_N]$ be a relation mention, where $e_1 = [w_i, \ldots, w_{i+s}]$ and $e_2 = [w_j, \ldots, w_{j+t}]$ denote two named entities in $r$. Because relation types are asymmetrical, the relation mention $r$ can generate two relation instances: $I_1 = \langle r, a_1 = e_1, a_2 = e_2 \rangle$ and $I_2 = \langle r, a_1 = e_2, a_2 = e_1 \rangle$. In a relation instance, four indicators are inserted on the two sides of arguments $a_1$ and $a_2$, denoted as $l_{11}$, $l_{12}$, $l_{21}$ and $l_{22}$, respectively. Then, the relation mentions for the relation instances $I_1$ and $I_2$ are revised as:

$[w_1, \ldots, l_{11}, w_i, \ldots, w_{i+s}, l_{12}, \ldots, l_{21}, w_j, \ldots, w_{j+t}, l_{22}, \ldots, w_N]$
$[w_1, \ldots, l_{21}, w_i, \ldots, w_{i+s}, l_{22}, \ldots, l_{11}, w_j, \ldots, w_{j+t}, l_{12}, \ldots, w_N]$

where $l_{11}$, $l_{12}$, $l_{21}$ and $l_{22}$ denote predefined tokens. When a relation mention is implanted with these indicators and fed into a neural network, the network will “know” information about the positions of the arguments. Furthermore, entity indicators can encode more information, such as syntactic or semantic information. In this paper, three types of entity indicators are proposed: position indicators, semantic indicators and compound indicators. To simplify our discussion, the entity indicators of arguments $a_1$ and $a_2$ are denoted as a quadruple $\langle l_{11}, l_{12}, l_{21}, l_{22} \rangle$.
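Implanting the four indicators is a simple token-level operation; the sketch below (a hypothetical helper, assuming non-overlapping argument spans) inserts $l_{11}$, $l_{12}$, $l_{21}$ and $l_{22}$ around the two arguments:

```python
def insert_indicators(words, a1_span, a2_span, quad):
    """Insert l11/l12 around argument a1 and l21/l22 around a2.
    Spans are (start, end) inclusive token indices; because a1 may
    follow a2 in surface order (as in instance I2), inserts are applied
    from right to left so earlier positions stay valid."""
    l11, l12, l21, l22 = quad
    inserts = [(a1_span[0], l11), (a1_span[1] + 1, l12),
               (a2_span[0], l21), (a2_span[1] + 1, l22)]
    out = list(words)
    for pos, tok in sorted(inserts, key=lambda x: -x[0]):
        out.insert(pos, tok)
    return out

words = "Steve Jobs was the co-founder of Apple Inc .".split()
marked = insert_indicators(words, (0, 1), (6, 7),
                           ("[PER_1]", "[/PER_1]", "[ORG_2]", "[/ORG_2]"))
```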
Position Indicators: These indicators point to the positions of the two arguments in a relation mention. To show the influence of position indicators, they are divided into three subcategories. In the first category, all indicators use the same mark, for example, ⟨[P], [P], [P], [P]⟩, where “[P]” is a token indicating the boundary of an argument. In the second category, the beginning and end boundaries of a named entity are distinguished, e.g., ⟨[P], [/P], [P], [/P]⟩, where [P] and [/P] represent the beginning and end boundaries of an argument, respectively. In the third category, entity indicators are modified with marks for the attached arguments, e.g., ⟨[P_1], [/P_1], [P_2], [/P_2]⟩, representing the beginning and end boundaries of $a_1$ and $a_2$, respectively.
Semantic Indicators: The entity type and subtype contain important semantic information about named entities. Therefore, using the entity type and subtype as entity indicators can capture both entity position information and entity semantic information. Entity indicators can be combined with entity types, subtypes and argument positions and are divided into four subcategories: entity type indicator, entity subtype indicator, entity type with position and entity subtype with position. For example, let “PER” and “ORG” represent the entity types person and organization, and let “IND” and “GOV” represent the entity subtypes individual and government, respectively. Two indicator quadruples can be generated as ⟨[PER], [/PER], [ORG], [/ORG]⟩ or ⟨[IND], [/IND], [GOV], [/GOV]⟩. To distinguish between arguments $a_1$ and $a_2$, subscripts can also be added, e.g., ⟨[PER_1], [/PER_1], [ORG_2], [/ORG_2]⟩, where [/PER_1] means that it is the end boundary of argument $a_1$ with entity type person.
Compound Indicators: The above semantic indicators have shown the ability to combine semantic and position information (e.g., [/PER_1]). This strategy can be extended further to generate more complex indicators, referred to as compound indicators. Compound indicators have a two-sided effect. On the one hand, they encode more syntactic or semantic information, which is beneficial for enhancing the discriminability of a neural network. On the other hand, they also lead to a sparse representation, which disperses the significance of the indicators. In this paper, two types of compound indicators are evaluated to demonstrate their influence on performance. The first utilizes the entity type and subtype simultaneously. For example, [/PER_IND_1] combines three types of information: the entity type person (PER), the subtype individual (IND) and the argument position ($a_1$). In the second type, syntactic features are introduced to indicate the part-of-speech (POS) tag of the left word (e.g., [V_IND_1]) or the POS tag of the right word (e.g., [/N_IND_1]), where V and N represent verb and noun, respectively.
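Generating an indicator quadruple from entity-type marks can be sketched as follows (the helper name and interface are invented for illustration):

```python
def make_quad(t1, t2, with_position=True):
    """Build the quadruple <l11, l12, l21, l22> from two entity-type
    (or subtype) marks, optionally tagging the argument position."""
    s1 = f"{t1}_1" if with_position else t1
    s2 = f"{t2}_2" if with_position else t2
    return (f"[{s1}]", f"[/{s1}]", f"[{s2}]", f"[/{s2}]")
```

With `with_position=False` and a single mark such as "P", this reproduces the position indicators; with type marks and positions it yields the semantic indicators described above.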
In summary, three types and nine subtypes of entity indicators are proposed in this paper. They are listed in Table 1.
In all the indicators, the entity type and subtype are manually annotated in the employed corpus. They are widely used in the field of information extraction. To generate the POS tags of words, two popular POS tools, JIEBA and NLTK, were adopted for Chinese and English, respectively. In addition to the above three indicator types, syntactic indicators can be used. For example, ⟨[V_1], [/N_1]⟩ indicates that, for the first argument, the left word is a verb and the right word is a noun. However, because POS tags are generated by external toolkits that are error-prone, they are not used independently.

3.2. Model

After entity indicators are inserted into relation mentions, the mentions are ready for processing by a deep neural network. In this paper, we designed a simple but effective architecture to evaluate the effectiveness of entity indicators. This model is composed of an input layer, an embedding layer, a convolutional layer and an output layer. The architecture of this model is shown in Figure 1.
In the input layer, instead of directly inputting the original sentence, entity indicators are implanted into relation mentions to encode the position, syntactic and semantic information of the arguments.
In the embedding layer, three approaches are adopted to support word embedding. In the first approach, a randomly initialized lookup table is adopted. In the second approach, wiki-100 [41] and GoogleNews-vectors-negative-300 are adopted for Chinese and English word embeddings, respectively. The third model is based on BERT [36], which is pretrained on external resources by an unsupervised method. This approach is effective for capturing the semantic information of words. Furthermore, BERT is based on the Transformer [11], which can learn the dependencies between words.
The convolutional layer performs four one-dimensional convolutional operations with a kernel shape of 3 × 1 on the output of the embedding layer. The output of this operation is a vector of size 50. The convolutional layer automatically learns an abstract representation from local features. The output of the convolutional layer is fed into max-pooling and fully connected operations. The pooling layer collects salient features from the input, which reduces the parameters of the model and improves its generalizability. The fully connected layer assembles the features for global regulation. In the output layer, a cross-entropy loss function is adopted to calculate the loss during the training process. (Our code implementing the deep learning model with entity indicators for relation extraction is available at:, accessed on 20 March 2021.)
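Putting the layers together, one forward pass of the described pipeline can be sketched in plain NumPy (toy dimensions with random, untrained weights; the kernel width 3 and 50 filters follow the text, but the class count and all parameter values are illustrative, not the authors' released implementation):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(word_ids, E, W_c, b_c, W_o, b_o):
    """Embedding -> 1-D convolution (width 3) -> max-pooling over time
    -> fully connected output layer with softmax."""
    H_e = E[np.asarray(word_ids)]                 # embedding layer
    N = len(word_ids)
    windows = np.stack([H_e[i:i + 3].reshape(-1) for i in range(N - 2)])
    H_c = np.tanh(windows @ W_c + b_c)            # convolutional layer
    pooled = H_c.max(axis=0)                      # max-pooling
    return softmax(W_o @ pooled + b_o)            # class distribution

rng = np.random.default_rng(3)
V, L, F, C = 50, 8, 50, 7          # vocab, emb dim, filters, classes (toy)
E = rng.normal(size=(V, L))
W_c, b_c = rng.normal(size=(3 * L, F)), np.zeros(F)
W_o, b_o = rng.normal(size=(C, F)), np.zeros(C)
probs = forward([1, 5, 9, 2, 4], E, W_c, b_c, W_o, b_o)
```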

4. Experiments

The experiments used an NVIDIA Tesla P40 GPU for training and testing under a Linux environment. In this section, three datasets, the ACE 2005 Chinese corpus, the ACE 2005 English corpus [42] and the Chinese literature text corpus (CLTC) [43], were adopted to evaluate the performance of entity indicators.
The ACE 2005 corpus is a classic dataset for automatic content extraction. It is collected from weblogs, broadcast news, newsgroups and broadcast conversation. The corpus was annotated with 7 entity types, 44 entity subtypes, 6 relation types and 18 relation subtypes. The Chinese corpus contains 628 documents, containing 9244 relation mentions that are used as positive instances. If two named entities have no predefined relation type, the instance should be considered as negative. To generate negative instances for training, the method in Chen et al. [25] is adopted, which generates 98,140 negative instances. The ACE English corpus generates 6583 positive instances and 97,534 negative instances.
The Chinese literature text corpus [43] is a discourse-level corpus whose articles are collected from Chinese literature. In total, seven entity types and nine relation types are manually annotated according to a gold standard. In this corpus, the entity relation only exists in four entity types: Thing, Person, Location and Organization. The corpus contains a total of 695 articles for training, 58 for validation and 84 for testing.
The experimental performance is measured by the evaluation indexes commonly used in the NLP field, namely, precision (P), recall (R) and F1 score (F1). The total performance (referred to as “Total”) is the macro-average over all positive relation types. The ACE 2005 corpus was divided 8:1:1 for training, validation and testing in our experiments. The fixed length of a sentence is set to 100.
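The macro-averaged F1 used for the “Total” score can be computed as in this sketch (toy labels; per-type precision and recall are combined into F1 and then averaged over the positive relation types):

```python
def macro_f1(gold, pred, labels):
    """Average the per-type F1 over the positive relation types."""
    f1s = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```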

4.1. Performance of Entity Indicators

In this section, based on the ACE Chinese corpus, ACE English corpus and CLTC corpus, all the entity indicators in Table 1 were evaluated to demonstrate their influence on the performance. In this experiment, word embeddings were initialized by two pretrained embedding models, wiki-100 [41] for Chinese and GoogleNews-vectors-negative-300 for English. The results are shown in Table 2, where the “None” model was implemented on the relation mentions directly without any entity indicators and was implemented for comparison. “×” means that the performance is not supported because no entity subtype was annotated in the CLTC corpus.
A comparison of the ACE Chinese corpus and the CLTC corpus shows that the performance is higher on the ACE Chinese corpus. This is because in the CLTC corpus, sentence semantics are often expressed in a subtle and special manner [44]; the authors' intuitions and feelings are usually expressed through very complex and flexible sentence structures.
The results show that distinguishing the beginning and end boundaries of named entities is beneficial for improving the performance (e.g., “P_D” and “P_TS”). When entity types are encoded into the entity indicators, the performance improves considerably. All the entity indicators considerably outperform the “None” strategy. As Table 2 shows, the performance improves as the entity indicators encode more relevant information.
When POS tags are used in the indicator “C_PTSA”, the ACE Chinese corpus shows an impressive improvement, outperforming “C_DTSA” by more than 12% in F1 score. An improvement was also observed on the CLTC corpus. On the other hand, on the English corpus, the performance unexpectedly decreases with POS tags. The reason for the difference may be that English is an alphabetic language in which many adjacent words are function words (e.g., articles, auxiliary words), which are more ambiguous and have little lexical meaning, possibly affecting the performance.

4.2. Comparison with Other Strategies

In this experiment, the entity indicator method was compared with the position embedding and multichannel methods to demonstrate their ability to capture the entity positions of a relation instance. The performance on the three corpora is shown in Table 3.
The “None” model is the same as in Table 2. In the position embedding model, the position coordinates of every word are embedded into a 25-dimensional vector. In the multichannel model [13], each channel employs an independent lookup table for word embedding, and the channels do not interact during backpropagation. In the entity indicator model, the “C_PTSA” encoding is used for the ACE Chinese and CLTC corpora, while the “C_DTSA” encoding is adopted for the ACE English corpus. To avoid the influence of external resources, all the word embeddings are initialized by randomly initialized lookup tables.
Position embedding and multichannel are two traditional strategies to support relation extraction [13,18]. A comparison of Table 2 and Table 3 shows that on the CLTC corpus, the performance displays impressive improvements. However, position embedding and multichannel have little effect on the ACE corpus. This is because ACE suffers from a serious data imbalance problem. Furthermore, because the relation types of ACE are asymmetrical, every positive instance (e.g., $\langle s, e_1, e_2 \rangle$) has a corresponding negative instance (e.g., $\langle s, e_2, e_1 \rangle$). In this condition, the position information of named entities is more complex, worsening the final performance.
Compared with the other strategies, entity indicators show remarkable improvements. On both the ACE corpus and the CLTC corpus, the performance increases stably. The reason for the improvement is that entity indicators encode position information (e.g., entity boundary positions), syntactic information (e.g., POS tags) and semantic information (e.g., entity types and subtypes). Thus, they provide powerful support for relation extraction.

4.3. Evaluation on the Chinese Corpus

In the ACE Chinese corpus, the related works can be divided into those using shallow architecture and those using deep architecture. Some of them are listed in Table 4. Among the shallow architecture models, Yu et al. [45] proposed a convolutional tree kernel-based approach for relation extraction. Liu et al. [26] adopted a tree-kernel model that combines external semantic resources (HowNet) to support relation extraction. Chen [46] proposed a feature calculus strategy. In deep architecture models, Li et al. [47] proposed a lattice LSTM model that combines multigrained information for relation extraction. Chen et al. [48] designed a CNN-attention neural network model.
For comparison, the same settings as in Chen et al. [48] were adopted in this experiment. Shallow architecture models split the evaluation corpus 8:2 into training and test sets; deep architecture models split it 8:1:1 into training, development and test sets. In this experiment, two strategies are adopted to initialize the lookup table for word embedding. The first, “Random-CNN”, uses a randomly initialized lookup table. In the second (“BERT-CNN”), BERT [36] is adopted to support word embedding, and the embedding layer outputs 768-dimensional vectors.
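The “Random-CNN” initialization can be sketched as follows. This is an illustrative NumPy example (the vocabulary, embedding dimension and initialization scale are assumptions, not the paper's exact settings): each token index, including indicator tokens, maps to a row of a randomly initialized lookup table. “BERT-CNN” would instead feed the tokens through a pretrained BERT encoder and take its 768-dimensional contextual vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy vocabulary that includes indicator tokens alongside ordinary words.
vocab = {"[PAD]": 0, "[V_PER_1]": 1, "[/V_PER_1]": 2, "he": 3, "visited": 4}

emb_dim = 128  # assumed dimension; BERT-CNN would use 768 instead
# Randomly initialized lookup table: one trainable row per vocabulary entry.
lookup = rng.uniform(-0.1, 0.1, size=(len(vocab), emb_dim))

def embed(tokens):
    """Map a token sequence to a (len, emb_dim) matrix by table lookup."""
    ids = [vocab.get(t, 0) for t in tokens]
    return lookup[ids]

x = embed(["[V_PER_1]", "he", "[/V_PER_1]", "visited"])
print(x.shape)  # (4, 128)
```

In training, the rows of the lookup table would be updated by backpropagation like any other network parameter.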
Because kernel-based models rely heavily on error-prone parsing trees, the methods reported by Yu et al. [45] and Liu et al. [26] show lower performance. Chen et al. [46] designed a feature calculus strategy to generate combined features, achieving state-of-the-art performance. In the works of Li et al. [47] and Chen et al. [48], performance suffers because their neural networks are implemented directly on raw inputs and cannot utilize the positional and semantic information of named entities. Compared with these related works, entity indicators achieve the highest performance, outperforming the state of the art by 3% in F1 score for relation types and by 7% in F1 score for relation subtypes.
Next, the entity indicator method was evaluated on the CLTC corpus, which was published by Peking University in 2017 [43]. It contains 695 articles for training, 58 for validation and 84 for testing. Because the training, validation and test articles are manually divided, comparison between different systems is convenient. Wen et al. [44] tested several models on this corpus, and the resulting performance figures are listed in Table 5, where the column “Arch.” denotes the neural network architecture of the corresponding model.
Among the compared models, Zhang et al. [54] proposed a multifeature fusion model that integrates multilevel features into deep neural networks; a convolutional layer is built on the Att-BLSTM model to capture word-level features and obtain more structural information. Wen et al. [44] obtained the highest performance by implementing two bidirectional LSTMs on the shortest dependency path of relation instances. Because Chinese sentences are structurally loose, parsing them is error-prone and depends strongly on external toolkits; the parsing process therefore degrades performance. Entity indicators, on the other hand, are based on manually annotated named entities and contain precise position information about relation arguments. Combined with POS tags and entity types, entity indicators are effective for utilizing entity positional and semantic information. Compared with the SR-BRCNN model, the performance of this approach is improved by more than 11% in F1 score.

4.4. Evaluation on the English Corpus

In this experiment, the entity indicator is compared with several related works implemented on the ACE English corpus.
Kambhatla et al. [20] proposed a feature-based maximum entropy (ME) model. Zheng et al. [16] used multiple CNN convolution kernels of different sizes to extract features from raw inputs. Gormley et al. [55] presented a feature-rich compositional embedding model (FCM) that combines handcrafted features and word embeddings. Zhou et al. [56] adopted an SVM model with diverse lexical, syntactic and semantic knowledge. Zhong et al. [19] used entity marker encodings to represent relation mentions based on the BERT model. Chen et al. [46] proposed a feature calculus method, in which combined features are generated to capture the structural information of sentences. The results are presented in Table 6.
The results show that entity indicators outperform the compared works by approximately 5% in F1 score for relation types and approximately 18.5% in F1 score for relation subtypes.

5. Conclusions

Unlike sentence classification, which makes a prediction based on a sentence representation, relation extraction must consider the semantic information between two named entities. Because a sentence often contains several named entities that share the same context, directly making a decision based on a sentence representation learned from raw inputs is not effective for supporting relation extraction. In this paper, entity indicators were proposed to capture the position information of a relation instance. Instead of operating on raw inputs, task-related entity indicators are inserted into each relation instance. This strategy lets the neural network “know” the positional, syntactic and semantic information of the named entities in a relation instance. The uniformly structured indicators make the similarity between sentences and the internal symmetry within a sentence more obvious, and they help neural networks learn semantic dependencies in a relation instance. Experiments have shown that this is a powerful approach for supporting relation extraction. In future work, the notion of entity indicators can be extended to other NLP tasks (e.g., named entity recognition) to help a neural network capture the structural and semantic information of a sentence.

Author Contributions

Y.Q.: formal analysis, funding acquisition, project administration and supervision; W.Y.: data curation, investigation, methodology, resources and writing—original draft; K.W.: investigation and writing—review and editing; R.H.: funding acquisition and supervision; F.T.: supervision and writing—review and editing; S.A.: data curation; Y.C.: conceptualization, formal analysis, methodology, supervision and writing—review and editing. All authors have read and agreed to the final version of the manuscript.


Funding

This work is supported by the Joint Funds of the National Natural Science Foundation of China under Grant No. U1836205, the Major Research Program of National Natural Science Foundation of China under Grant No. 91746116, National Natural Science Foundation of China under Grant No. 62066007 and No. 62066008, the Major Special Science and Technology Projects of Guizhou Province under Grant No. [2017]3002, the Key Projects of Science and Technology of Guizhou Province under Grant No. [2020] 1Z055 and Project of Guizhou Province Graduate Research Fund (Qianjiaohe YJSCXJH[2019]102).


Acknowledgments

Thanks to the editors and anonymous reviewers for their valuable suggestions and comments, which improved the final version of this paper.

Conflicts of Interest

The authors declare no conflict of interest.


References

  1. Hendrickx, I.; Kim, S.N.; Kozareva, Z.; Nakov, P.; Séaghdha, D.O.; Padó, S.; Pennacchiotti, M.; Romano, L.; Szpakowicz, S. Semeval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of the 5th International Workshop on Semantic Evaluation, ACL, Uppsala, Sweden, 15–16 July 2010; pp. 33–38. [Google Scholar]
  2. Agosti, M.; Nunzio, G.M.D.; Marchesin, S.; Silvello, G. A relation extraction approach for clinical decision support. arXiv 2019, arXiv:1905.01257. [Google Scholar]
  3. Zheng, S.; Dharssi, S.; Wu, M.; Li, J.; Lu, Z. Text mining for drug discovery. In Bioinformatics and Drug Discovery; Springer: Berlin/Heidelberg, Germany, 2019; pp. 231–252. [Google Scholar]
  4. Jabbari, A.; Sauvage, O.; Zeine, H.; Chergui, H. A french corpus and annotation schema for named entity recognition and relation extraction of financial news. In Proceedings of the LREC ’20, Marseille, France, 11–16 May 2020; pp. 2293–2299. [Google Scholar]
  5. Macdonald, E.; Barbosa, D. Neural relation extraction on wikipedia tables for augmenting knowledge graphs. In Proceedings of the CIKM ’20, Galway, Ireland, 17–20 August 2020; pp. 2133–2136. [Google Scholar]
  6. Li, X.; Yin, F.; Sun, Z.; Li, X.; Yuan, A.; Chai, D.; Zhou, M.; Li, J. Entity-relation extraction as multi-turn question answering. arXiv 2019, arXiv:1905.05529. [Google Scholar]
  7. Han, R.; Liang, M.; Alhafni, B.; Peng, N. Contextualized word embeddings enhanced event temporal relation extraction for story understanding. arXiv 2019, arXiv:1904.11942. [Google Scholar]
  8. Liu, K. A survey on neural relation extraction. Sci. China Technol. Sci. 2020, 63, 1971–1989. [Google Scholar] [CrossRef]
  9. Liu, C.Y.; Sun, W.B.; Chao, W.H.; Che, W.X. Convolution neural network for relation extraction. In Proceedings of the ADMA 2013: Advanced Data Mining and Applications, Hangzhou, China, 14–16 December 2013. [Google Scholar]
  10. Li, Z.; Yang, J.; Gou, X.; Qi, X. Recurrent neural networks with segment attention and entity description for relation extraction from clinical texts. Artif. Intell. Med. 2019, 97, 9–18. [Google Scholar] [CrossRef] [PubMed]
  11. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  12. Zeng, D.; Liu, K.; Chen, Y.; Zhao, J. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015. [Google Scholar]
  13. Chen, Y.; Wang, K.; Yang, W.; Qing, Y.; Huang, R.; Chen, P. A multi-channel deep neural network for relation extraction. IEEE Access 2020, 8, 13195–13203. [Google Scholar] [CrossRef]
  14. Xu, Y.; Mou, L.; Li, G.; Chen, Y.; Peng, H.; Jin, Z. Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of the EMNLP 2015, Lisbon, Portugal, 17–21 September 2015; pp. 1785–1794. [Google Scholar]
  15. Zhang, C.; Xu, W.; Ma, Z.; Gao, S.; Guo, J. Construction of semantic bootstrapping models for relation extraction. Knowl. Based Syst. 2015, 83, 128–137. [Google Scholar] [CrossRef]
  16. Zheng, S.; Xu, J.; Zhou, P.; Bao, H.; Qi, Z.; Xu, B. A neural network framework for relation extraction: Learning entity semantic and relation pattern. Knowl. Based Syst. 2016, 114, 12–23. [Google Scholar] [CrossRef]
  17. Soares, L.B.; FitzGerald, N.; Ling, J.; Kwiatkowski, T. Matching the blanks: Distributional similarity for relation learning. arXiv 2019, arXiv:1906.03158. [Google Scholar]
  18. Zhang, D.; Wang, D. Relation classification via recurrent neural network. arXiv 2015, arXiv:1508.01006. [Google Scholar]
  19. Zhong, Z.; Chen, D. A frustratingly easy approach for joint entity and relation extraction. arXiv 2020, arXiv:2010.12812. [Google Scholar]
  20. Kambhatla, N. Combining lexical, syntactic and semantic features with maximum entropy models for extracting relations. In Proceedings of the ACL 2004, Barcelona, Spain, 21–26 July 2004. [Google Scholar] [CrossRef]
  21. Zelenko, D.; Aone, C.; Richardella, A. Kernel methods for relation extraction. J. Mach. Learn. Res. 2003, 3, 1083–1106. [Google Scholar]
  22. Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567. [Google Scholar] [CrossRef] [PubMed]
  23. Dashtipour, K.; Gogate, M.; Adeel, A.; Algarafi, A.; Howard, N.; Hussain, A. Persian named entity recognition. In Proceedings of the 2017 IEEE 16th International Conference on Cognitive Informatics & Cognitive Computing (ICCI* CC), Oxford, UK, 26–28 July 2017; pp. 79–83. [Google Scholar]
  24. Minard, A.-L.; Ligozat, A.-L.; Grau, B. Multi-class svm for relation extraction from clinical reports. In Proceedings of the International Conference Recent Advances in Natural Language Processing, Varna, Bulgaria, 2–4 September 2011. [Google Scholar]
  25. Chen, Y.; Zheng, Q.; Chen, P. Feature assembly method for extracting relations in chinese. Artif. Intell. 2015, 228, 179–194. [Google Scholar] [CrossRef]
  26. Liu, D.; Hu, Y.; Qian, L. Exploiting lexical semantic resource for tree kernel-based chinese relation extraction. In Proceedings of the NLPCC, Beijing, China, 31 October–5 November 2012; pp. 213–224. [Google Scholar]
  27. Panyam, N.C.; Verspoor, K.; Cohn, T.; Kotagiri, R. Asm kernel: Graph kernel using approximate subgraph matching for relation extraction. In Proceedings of the ALTA 2016, Perth, Australia, 21–28 May 2016; pp. 65–73. [Google Scholar]
  28. Leng, J.; Jiang, P. A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm. Knowl. Based Syst. 2016, 100, 188–199. [Google Scholar] [CrossRef]
  29. Zeng, D.; Liu, K.; Lai, S.; Zhou, G.; Zhao, J. Relation classification via convolutional deep neural network. In Proceedings of the COLING’14, Dublin, Ireland, 23–29 August 2014. [Google Scholar]
  30. Li, Y.; Nee, M.; Li, G.; Chang, V. Effective piecewise cnn with attention mechanism for distant supervision on relation extraction task. In Proceedings of the 5th International Conference on Complexity, Future Information Systems and Risk 2020 (COMPLEXIS 2020), Prague, Czech Republic, 8–9 May 2020. [Google Scholar]
  31. Zhang, C.; Zheng, Y.; Guo, B.; Li, C.; Liao, N. Scn: A novel shape classification algorithm based on convolutional neural network. Symmetry 2021, 13, 499. [Google Scholar] [CrossRef]
  32. Wang, H.; Qin, K.; Lu, G.; Luo, G.; Liu, G. Direction-sensitive relation extraction using bi-sdp attention model—Sciencedirect. Knowl. Based Syst. 2020, 198, 105928. [Google Scholar] [CrossRef]
  33. Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 2, pp. 207–212. [Google Scholar]
  34. Lee, J.; Seo, S.; Choi, Y.S. Semantic relation classification via bidirectional lstm networks with entity-aware attention using latent entity typing. Symmetry 2019, 11, 785. [Google Scholar] [CrossRef] [Green Version]
  35. Zhao, L.; Xu, W.; Gao, S.; Guo, J. Cross-sentence n-ary relation classification using lstms on graph and sequence structures. Knowl. Based Syst. 2020, 207, 106266. [Google Scholar] [CrossRef]
  36. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  37. Huang, W.; Mao, Y.; Yang, Z.; Zhu, L.; Long, J. Relation classification via knowledge graph enhanced transformer encoder. Knowl. Based Syst. 2020, 206, 106321. [Google Scholar] [CrossRef]
  38. McDonough, K.; Moncla, L.; van de Camp, M. Named entity recognition goes to old regime france: geographic text analysis for early modern french corpora. Int. J. Geogr. Inf. Sci. 2019, 33, 2498–2522. [Google Scholar] [CrossRef]
  39. Isozaki, H. Japanese named entity recognition based on a simple rule generator and decision tree learning. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France, 6–11 July 2001; pp. 314–321. [Google Scholar]
  40. Weegar, R.; Pérez, A.; Casillas, A.; Oronoz, M. Deep medical entity recognition for swedish and spanish. In Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, 3–6 December 2018; pp. 1595–1601. [Google Scholar]
  41. Dong, C.; Zhang, J.; Zong, C.; Hattori, M.; Di, H. Character-based lstm-crf with radical-level features for chinese named entity recognition. In Natural Language Understanding and Intelligent Applications; Springer: Berlin/Heidelberg, Germany, 2016; pp. 239–250. [Google Scholar]
  42. Walker, C.; Strassel, S.; Medero, J.; Maeda, K. Ace 2005 multilingual training corpus. Linguist. Data Consort. Phila. 2006, 57, 45. [Google Scholar]
  43. Xu, J.; Wen, J.; Sun, X.; Su, Q. A discourse-level named entity recognition and relation extraction dataset for chinese literature text. arXiv 2017, arXiv:1711.07010. [Google Scholar]
  44. Wen, J.; Sun, X.; Ren, X.; Su, Q. Structure regularized neural network for entity relation classification for chinese literature text. arXiv 2018, arXiv:1803.05662. [Google Scholar]
  45. Liu, D.; Zhao, Z.; Hu, Y.; Qian, L. Chinese semantic relation extraction based on syntax and entity semantic tree. J. Chin. Inf. Process. 2010, 24, 11–21. [Google Scholar]
  46. Chen, Y.; Zheng, Q.; Chen, P. A set space model for feature calculus. IEEE Intell. Syst. 2017, 32, 36–42. [Google Scholar] [CrossRef]
  47. Li, Z.; Ding, N.; Liu, Z.; Zheng, H.; Shen, Y. Chinese relation extraction with multi-grained information and external linguistic knowledge. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 4377–4386. [Google Scholar]
  48. Chen, Y.; Wang, G.; Zheng, Q.; Qin, Y.; Huang, R.; Chen, P. A set space model to capture structural information of a sentence. IEEE Access 2019, 7, 142515–142530. [Google Scholar] [CrossRef]
  49. Zhang, P.; Li, W.; Hou, Y.; Song, D. Developing position structure-based framework for chinese entity relation extraction. ACM Trans. Asian Lang. Inf. Process. 2011, 10. [Google Scholar] [CrossRef]
  50. Socher, R.; Pennington, J.; Huang, E.H.; Ng, A.Y.; Manning, C.D. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Scotland, UK, 27–31 July 2011; pp. 151–161. [Google Scholar]
  51. Santos, C.N.d.; Xiang, B.; Zhou, B. Classifying relations by ranking with convolutional neural networks. arXiv 2015, arXiv:1504.06580. [Google Scholar]
  52. Liu, Y.; Wei, F.; Li, S.; Ji, H.; Zhou, M.; Wang, H. A dependency-based neural network for relation classification. arXiv 2015, arXiv:1507.04646. [Google Scholar]
  53. Cai, R.; Zhang, X.; Wang, H. Bidirectional recurrent convolutional neural network for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 1, pp. 756–765. [Google Scholar]
  54. Zhang, J.; Hao, K.; Tang, X.s.; Cai, X.; Xiao, Y.; Wang, T. A multi-feature fusion model for chinese relation extraction with entity sense. Knowl. Based Syst. 2020, 206, 106348. [Google Scholar] [CrossRef]
  55. Gormley, M.R.; Yu, M.; Dredze, M. Improved relation extraction with feature-rich compositional embedding models. arXiv 2015, arXiv:1505.02419. [Google Scholar]
  56. Zhou, G.; Su, J.; Zhang, J.; Zhang, M. Exploring various knowledge in relation extraction. In Proceedings of the 43rd annual meeting of the association for computational linguistics (acl’05), Ann Arbor, MI, USA, 25–30 June 2005; pp. 427–434. [Google Scholar]
Figure 1. Framework of neural network.
Table 1. Types of entity indicators.
Indicator Type | Code | Example
Dull Positions | P_D | [P], [P]
Two-side Positions | P_TS | [P], [/P]
Two-side-ARG Positions | P_TSA | [P_1], [/P_1]
Two-side Types | S_TS_T | [PER], [/PER]
Two-side Subtypes | S_TS_S | [IND], [/IND]
Two-side-ARG Types | S_TSA_T | [PER_1], [/PER_1]
Two-side-ARG Subtypes | S_TSA_S | [IND_1], [/IND_1]
Dual-types-side-ARG | C_DTSA | [PER_IND_1], [/PER_IND_1]
Type-POS-side-ARG | C_PTSA | [V_PER_1], [/V_PER_1]
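The encodings in Table 1 differ only in how much information the tag string carries. The sketch below (an illustration written for this article's tag formats; the POS tag, type, subtype and argument values are invented defaults) builds the open/close indicator pair for each encoding:

```python
def make_tags(encoding, pos="V", etype="PER", subtype="IND", arg=1):
    """Build the open/close indicator pair for one encoding from Table 1."""
    formats = {
        "P_D":     ("[P]", "[P]"),                    # position only, no side
        "P_TS":    ("[P]", "[/P]"),                   # two-sided position
        "P_TSA":   (f"[P_{arg}]", f"[/P_{arg}]"),     # plus argument index
        "S_TS_T":  (f"[{etype}]", f"[/{etype}]"),     # entity type
        "S_TS_S":  (f"[{subtype}]", f"[/{subtype}]"), # entity subtype
        "S_TSA_T": (f"[{etype}_{arg}]", f"[/{etype}_{arg}]"),
        "S_TSA_S": (f"[{subtype}_{arg}]", f"[/{subtype}_{arg}]"),
        "C_DTSA":  (f"[{etype}_{subtype}_{arg}]", f"[/{etype}_{subtype}_{arg}]"),
        "C_PTSA":  (f"[{pos}_{etype}_{arg}]", f"[/{pos}_{etype}_{arg}]"),
    }
    return formats[encoding]

print(make_tags("C_PTSA"))  # ('[V_PER_1]', '[/V_PER_1]')
print(make_tags("C_DTSA"))  # ('[PER_IND_1]', '[/PER_IND_1]')
```

Richer encodings such as C_PTSA give the network more syntactic and semantic cues per entity, at the cost of a larger indicator vocabulary.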
Table 2. Performance of the entity indicators.
Entity Indicators | ACE Chinese | ACE English | CLTC
Table 3. Comparison with traditional strategies.
Data | TYPE | None (P/R/F1) | Position Emb. (P/R/F1) | Multichannel (P/R/F1) | Entity Indicator (P/R/F1)
ACE Chinese | PHYS | 39.70/45.40/42.36 | 53.21/33.33/40.99 | 55.00/25.29/34.65 | 85.88/83.91/84.88
ACE English | PHYS | 62.96/31.29/41.80 | 75.00/34.94/47.70 | 82.61/34.97/49.14 | 72.86/62.58/67.33
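The scores in Table 3 are precision/recall/F1 triples, and each F1 entry is the harmonic mean of the listed precision and recall. As a quick check (values taken from the entity indicator column for ACE Chinese PHYS):

```python
def f1(p, r):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * p * r / (p + r)

# Entity indicator on ACE Chinese PHYS: P = 85.88, R = 83.91
print(round(f1(85.88, 83.91), 2))  # 84.88
```

The same check reproduces the other F1 entries in the table, e.g. f1(62.96, 31.29) ≈ 41.80 for the "None" strategy on ACE English PHYS.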
Table 4. Comparison on the ACE 2005 Chinese corpus.
Model | Description | F1
Yu et al. [45] | Convolutional kernel based on syntax and entity semantic tree. | 75.30
Liu et al. [26] | Tree kernel with lexical semantic resources. | 81.10
Zhang et al. [49] | Position structures between named entities. | 80.71
Chen et al. [46] | Combined features for capturing structural information. | 93.01
Li et al. [47] | Lattice LSTM with multigrained information. | ×
Chen et al. [48] | A CNN and attention architecture. | 82.35
Ours | Random-CNN with the “C_PTSA” encoding. | 91.18
Ours | BERT-CNN with the “C_PTSA” encoding. | 95.32
Table 5. Comparison on Chinese literature text corpus.
Model | Arch. | Features | F1
Hendrickx et al. [1] | SVM | Word embeddings, NER, WordNet, dependency parse, HowNet, POS, Google n-gram | 48.9
Socher et al. [50] | RNN | Word embeddings, POS, NER, WordNet | 49.1
Zeng et al. [29] | CNN | Word embeddings, position embeddings, NER, WordNet | 52.4
Santos et al. [51] | CR-CNN | Word embeddings, position embeddings | 54.1
Xu et al. [14] | SDP-LSTM | Word embeddings, POS, NER, WordNet | 55.3
Liu et al. [52] | DepNN | Word embeddings, WordNet | 55.2
Cai et al. [53] | BRCNN | Word embeddings, POS, NER, WordNet | 55.6
Zhang et al. [54] | C-ATT-BLSTM | Character embedding, position embedding, entity sense | 56.2
Wen et al. [44] | SR-BRCNN | Word embeddings, POS, NER, WordNet | 65.9
Ours | Random-CNN | Entity indicators with the S_TSA_T encoding. | 74.72
Ours | BERT-CNN | Entity indicators with the S_TSA_T encoding. | 77.14
Table 6. Comparison on the ACE 2005 English corpus.
Model | Arch. | Description | F1
Kambhatla et al. [20] | ME | A feature-based model. | 63.50
Zheng et al. [16] | MIX-CNN | Automatically extracted features based on multiple CNNs. | 60.00
Gormley et al. [55] | FCM | Combined features and word embeddings. | 71.52
Zhou et al. [56] | SVM | Phrase chunking information | 77.20
( 63.10 ) ( 49.50 ) ( 55.50 )
Zhong et al. [19] | BERT | Entity marker embedding | ×
Chen et al. [46] | SSM | Feature calculus | 84.50
Ours | BERT-CNN | Entity indicators with the C_PTSA encoding. | 88.83