Article

A Novel Chinese Overlapping Entity Relation Extraction Model Using Word-Label Based on Cascade Binary Tagging

1 School of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
2 Xinjiang Key Laboratory of Multilingual Information Technology, Xinjiang University, Urumqi 830017, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(4), 1013; https://doi.org/10.3390/electronics12041013
Submission received: 18 January 2023 / Revised: 11 February 2023 / Accepted: 12 February 2023 / Published: 17 February 2023

Abstract:
In recent years, overlapping entity relation extraction has received a great deal of attention and has made good progress in English. However, research on overlapping entity relation extraction in Chinese still faces two key problems: the lack of datasets with overlapping entity instances, and the lack of a neural network model that can effectively handle overlapping entities. To address these problems, this paper constructs an interpersonal relationship dataset for news texts, NewsPer, and proposes a Chinese overlapping entity relation extraction model, DepCasRel. The model first uses a “Word-label” to incorporate the character features of Chinese text into the dependency analysis graph, then uses the same binary tagging method to mark the head and tail entities embedded in the text, and finally extracts the text’s triples. DepCasRel overcomes the difficulty that traditional methods have in extracting triples with overlapping entities. Experiments on our manually annotated dataset NewsPer show that DepCasRel can effectively encode the semantic and structural information of text and improve the performance of overlapping entity relation extraction.


1. Introduction

With the booming development of information technology, information resources on the Internet have shown explosive growth, and retrieving and extracting data at such a scale, in such diverse forms, and at such speed has become a new challenge. Against this background, relation extraction (RE) technology has emerged. The purpose of RE is to extract the semantic relations between entity pairs in a sentence and form structured data for storage and retrieval [1]. For example, in “After a preliminary investigation by the public security authorities, Yuan Mou, a student of the school, stabbed his classmate Ye Mou due to a dispute. (经公安机关初步调查, 该校学生袁某因纠纷, 将同学叶某捅伤)”, the entities “Yuan Mou (袁某)” and “Ye Mou (叶某)” have the relation “classmate (同学)”, which can be recorded as the relation triple 〈Yuan Mou, classmate, Ye Mou〉 (〈袁某, 同学, 叶某〉). Extracting the interpersonal relations in text provides basic data for natural language processing tasks such as retrieving information about people and constructing social knowledge graphs, which is of great value in many social security fields.
After decades of development, the theory and technology of RE have matured, from early models based on pattern matching and rule extraction [2,3] to later models based on machine learning [4] and deep learning [5,6]. However, language is extremely complex, and the relational facts in sentences are often complex as well: different triples in a sentence may share overlapping entities, as shown in Table 1.
At present, research on overlapping entity relation extraction in English has achieved good results, but existing Chinese models cannot yet solve this problem effectively. Meanwhile, Chinese has its own unique syntactic structure and complex semantic relations, which makes the phenomenon of overlapping entities even more prevalent in practical language use. Therefore, it is of great significance to propose a Chinese overlapping entity relation extraction model.
The technical issues concerning the extraction of overlapping entity relations in Chinese can be summarized as follows. The first is the Chinese corpus problem. Most current research on relation extraction is based on English corpora, whereas existing Chinese datasets only contain normal relations in text and lack relation instances with overlapping entities; constructing a high-quality Chinese overlapping entity relation extraction dataset is therefore an important task. The second is the problem of word differences between Chinese and English. In Chinese, characters form words and words form sentences, and each character and word has its own semantic meaning; in contrast, the smallest unit of English carrying semantic information is the word. When an English overlapping entity relation extraction model is directly migrated to a Chinese dataset, it does not learn Chinese semantic features as well as it learned English ones, so the migrated model’s extraction results are mediocre. It is therefore critical to propose a model that solves the entity overlapping problem specifically for Chinese relation extraction. The third is the problem of feature fusion. Due to the linguistic and textual differences between English and Chinese, many methods cannot perform as well on Chinese corpora as they do on English ones. At the same time, the linguistic structure of Chinese is more complex, which makes extracting overlapping entity relations in Chinese difficult, and there is relatively little research on this task. How to effectively fuse the word features and other features of Chinese text, and how to adequately represent the semantic information of the text, are crucial to this research.
In order to solve the above research problems, this paper proposes a dependency-based Chinese overlapping entity relation extraction model, DepCasRel. Inspired by the CasRel [8] framework, we transformed the problem of overlapping entity relation extraction into the problem of sequence labeling and proposed the “Word-label” method. By using the pretraining language model BERT and the graph convolution neural network (GCN), we fully and effectively integrated the character features and dependency features of the Chinese text. Compared with other overlapping entity relation extraction models, DepCasRel has achieved a performance improvement on our annotated interpersonal relationship dataset, NewsPer.
This paper has the following main contributions:
  • We have constructed an interpersonal relationship dataset for news texts, NewsPer, which is fully manually annotated, reducing the impact of noisy data and supplying a high-quality dataset for Chinese overlapping entity relation extraction research.
  • We propose the Word-label method, which integrates the character features of Chinese text into the dependency graph of text segmentation so that the character features and dependency features of Chinese text can be effectively combined. This solves the problem of embedding differences between Chinese characters and words and the problem of feature differences caused by different grammatical structures in Chinese and English.
  • We successfully integrated text context semantic information and dependency structure information by using GCN. We also conducted experimental comparisons with other overlapping entity relation extraction models on the dataset we constructed. The results show that our proposed model has greatly improved the performance of Chinese overlapping entity relation extraction.

2. Related Work

2.1. Relation Extraction

The concept of RE was first proposed at the Message Understanding Conference (MUC) and was supported by the Defense Advanced Research Projects Agency (DARPA) in the late 1980s [9]. With the rise of deep learning technology, researchers have also applied this technology to RE and achieved good results. At present, entity relation extraction methods based on deep learning can be divided into the pipeline method and the joint learning method.
The pipeline method divides RE into two subtasks: named entity recognition (NER) and relation classification. It mainly uses recurrent neural networks (RNN) and convolutional neural networks (CNN). Socher et al. [10] were the first to propose using an RNN for relation extraction. Their model combines matrix-vector representations with an RNN, which can learn not only the meaning of words themselves but also how they modify other words. Experiments show that this model effectively solves the problem that word vector space models cannot capture the meaning of long phrases. Sun et al. [11] proposed a method combining an SVM and a CNN; experiments on the COAE2016 dataset show that an effective combination of the two achieves good performance on Chinese entity relation extraction. Gao et al. [12] proposed a technique called KMCN, a CNN improved by a kernel function, which enables the CNN to mine relations by computing the similarity of the effective subtrees of phrases and has achieved good extraction results on real judicial documents.
The pipeline method has achieved good results, but it easily causes error propagation and ignores the connection between the two subtasks. In contrast, the joint learning method closely links the interactive information between NER and RE; that is, it combines the models of the two subtasks to obtain relation triples directly. Miwa et al. [13] proposed a new end-to-end neural model for relation extraction, which shares the coding layer’s LSTM sequence representation between the two subtasks and extracts relations along the shortest dependency path between target entities; its F1 value on the SemEval 2010 Task 8 dataset reached 84.4%. Li et al. [14] made two improvements to Miwa et al.’s model to address error propagation: using beam search in the NER subtask and introducing a new relation, “Invalid_Entity”, in the RE subtask. The experimental results show that these two improvements significantly improve relation extraction performance. Zheng et al. [15] proposed a relation extraction method based on a new annotation strategy, which combines the two subtasks into a single sequence labeling problem; an end-to-end network model extracts the relation triples, improving both the recall and the precision of relation extraction.
Because of the complexity of Chinese text, Chinese relation extraction is more challenging than English relation extraction. The majority of existing Chinese relation extraction methods rely on characters or words [16,17,18,19]. However, these methods focus only on improving the model itself, ignoring the fact that different input granularities have a significant impact on the RE model. Generally speaking, purely word-based models are affected by the quality of word segmentation, while purely character-based models cannot use word information and therefore capture fewer features. Compared with single-granularity information at the character or word level, multi-granularity information can weaken the influence of Chinese word segmentation and retain more semantic information. For example, Zhang et al. [20] use a character-word lattice-structured LSTM model to obtain sentence representations. Li et al. [21] further addressed the polysemy of Chinese words and proposed the MG Lattice model. Furthermore, studies [22,23,24] found that graph-based models achieve good performance in Chinese relation extraction, thereby mitigating the shortcomings of single-granularity Chinese text features.

2.2. Entity Overlapping

According to the degree of entity overlap, sentences can be divided into three types: Normal, SingleEntityOverlap (SEO), and EntityPairOverlap (EPO). Researchers have mainly used sequence-to-sequence (Seq2Seq) models, graph-based models, and pretrained language model-based approaches for overlapping entity relation extraction, as illustrated by the sketch below.
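To make the taxonomy concrete, here is a minimal illustration (ours, not code from the paper) that assigns one of the three overlap types to a sentence's gold triples:

```python
# Classify a sentence's gold triples as Normal / EPO / SEO,
# following the taxonomy introduced by Zeng et al. [7].
def overlap_type(triples):
    """triples: list of (subject, relation, object) tuples for one sentence."""
    pairs = [(s, o) for s, _, o in triples]
    if len(set(pairs)) < len(pairs):
        return "EPO"  # the same entity pair occurs in more than one triple
    entities = [e for s, _, o in triples for e in (s, o)]
    if len(set(entities)) < len(entities):
        return "SEO"  # one entity is shared across different triples
    return "Normal"   # no entity participates in more than one triple

# 〈袁某, 同学, 叶某〉 alone is Normal; adding 〈袁某, 同学, 王某〉 makes it SEO.
print(overlap_type([("袁某", "同学", "叶某"), ("袁某", "同学", "王某")]))  # SEO
```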
In 2018, Zeng et al. [7] proposed CopyRe, an end-to-end model based on a copy mechanism, which extracts relations before entities, lets entities participate in different triples by copying them, and employs different decoding strategies for different cases. In 2019, Zeng et al. [25] introduced reinforcement learning into their model, comparing the triples generated during decoding with the existing labeled triples, setting a reward mechanism, and iterating the model many times, which improved its performance. Subsequently, Fu et al. [26] used dependency syntactic analysis to transform the input sentence into a dependency tree and fed the tree’s adjacency matrix into a Bi-GCN to obtain local features for extracting entities and relations, respectively; they also introduced a weighted GCN to compute the weights of the edges (relations) between any entity pair for each extracted relation, thereby handling overlapping entity relations. Wang et al. [27] proposed a novel handshaking tagging strategy that mitigates the impact of complex entities on overlapping entity relation extraction. Sui et al. [28] turned joint entity relation extraction into a set prediction problem, relieving the model of predicting triple order, and pioneered a nonautoregressive decoder combined with a bipartite matching loss function so that the model outputs triples directly. In 2020, Wei et al. [8] designed the cascade binary tagging framework (CasRel), which lets the model learn the mapping function from head entities to tail entities for a given relation, thereby modeling the triple as a whole. This framework introduced a new solution for overlapping entity relation extraction, but its extraction performance is poor after migration to a Chinese dataset. Therefore, this paper targets the problems in Chinese overlapping entity relation extraction and improves the framework by combining it with the characteristics of Chinese text.

2.3. Datasets for Relation Extraction

Common datasets for RE include ACE [29,30], SemEval 2010 Task 8 [31], TACRED [32], NYT [33], etc. The dataset is a key component of relation extraction, since it determines whether a model trained on it can be applied to the real world. Some researchers have created their own datasets for Chinese relation extraction research. For example, Chen et al. [16] constructed a dataset containing three types of relations and used it to test multi-instance learning. Wen et al. [19] constructed an RE dataset based on Chinese literature texts and proposed a structure for a regularized neural network. Baidu released the dataset DuIE [34] in its public information extraction competition, and the organizers of the 2019 China Conference on Knowledge Graph and Semantic Computing (CCKS) released the dataset IPRE [35]. However, current Chinese relation extraction datasets either contain noise caused by distant supervision or lack relation instances with overlapping entities. In contrast, NewsPer, a dataset constructed by reliable manual annotation, enriches the Chinese relation extraction corpus and is suitable for training and evaluating Chinese overlapping entity relation extraction.

3. Proposed Method

In this paper, we propose DepCasRel, a dependency-based cascade tagging model for Chinese overlapping entity relation extraction, which extracts the entity and relation information contained in the text. First, BERT is used to construct text character features, and the text dependency relations are extended to characters through the Word-label. Then, GCN is used to encode the text dependency features, and the positions of the head entities in the text are marked. Finally, the corresponding tail entities are marked according to the predefined relations, so as to extract the entity relation triples contained in the text. The overall framework of the model is shown in Figure 1.

3.1. Task Formulation

The RE task can be formally defined as follows: suppose $S = \{c_k\}_{k=1}^{n}$ represents a sentence containing $n$ characters in the test set, where $c_k$ is the $k$-th character of the sentence. The goal of the RE task is to identify all entity pairs $(sub, obj)$ in the sentence and predict the relation $r \in R$ between them, where $R$ is a predefined set of relation types. A relation extraction result is correct if and only if the predicted $(sub, r, obj)$ is consistent with the gold value.
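To make this matching criterion concrete, here is a minimal sketch (our illustration, not code from the paper) of the exact-match evaluation: a predicted triple counts only if subject, relation, and object all agree with a gold triple.

```python
# Exact-match triple evaluation: a prediction is correct only when
# (sub, r, obj) all coincide with a gold triple.
def correct_triples(predicted, gold):
    return set(predicted) & set(gold)

pred = [("袁某", "同学", "叶某"), ("袁某", "朋友", "叶某")]
gold = [("袁某", "同学", "叶某")]
print(correct_triples(pred, gold))  # only the exact match survives
```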

3.2. Encoder

The encoder consists of two parts: the BERT Encoder, which encodes the character features of the text, and the GCN Encoder, which encodes the dependency features of the segmented text.

3.2.1. BERT

The coding layer uses BERT [36], a deep bidirectional language representation model that uses a bidirectional multilayer Transformer structure to jointly condition on context in every layer to learn language representations. We use the Chinese version of BERT published by the Joint Laboratory of Harbin Institute of Technology and iFlytek (HFL) to encode the input Chinese text.
The Chinese version of BERT processes Chinese text character by character. However, words in Chinese text carry richer semantic information than individual characters, and most dependency relations in a sentence hold between words. To address this difference between Chinese character and word embeddings, this paper proposes the Word-label method. Specifically, before the text is input into the BERT encoder, we first process the word segmentation: on the basis of a Chinese word segmentation tool, we add a Word-label to words containing more than two characters (“unused1” is used in this paper). As shown in Figure 2, the first line is the original sentence, the second line is the sentence processed by the Chinese word segmentation tool, and the third line is the sentence with the Word-labels inserted.
Through the introduction of the Word-label, we can link the BERT-encoded character features with the dependency features of the segmented text, so that the GCN encoder can encode the dependency features of the text at the character level. After the text has been processed, the sentence with the Word-labels inserted is fed into the BERT coding layer, and the resulting feature vector is denoted as $H_{char}$, where $n$ is the original number of characters in the sentence and $b$ is the number of Word-labels inserted:

$H_{char} = \mathrm{BERT}([c_1, c_2, \ldots, c_{n+b}])$
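As a sketch of this preprocessing step (the function and segmentation below are our illustration; the paper uses a Chinese word segmentation tool and the reserved token “unused1”):

```python
# Hypothetical sketch of Word-label insertion before BERT encoding. The paper
# labels words "containing more than two characters"; min_len is kept
# configurable in case the intended threshold is two or more.
def insert_word_labels(words, label="[unused1]", min_len=3):
    tokens = []
    for w in words:
        if len(w) >= min_len:   # long word: prepend a Word-label marker
            tokens.append(label)
        tokens.extend(list(w))  # BERT still consumes the text character by character
    return tokens

print(insert_word_labels(["公安机关", "初步", "调查"]))
# ['[unused1]', '公', '安', '机', '关', '初', '步', '调', '查']
```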

3.2.2. GCN

Research has shown that text dependency features can effectively improve the performance of entity relation extraction models. GCN can capture the dependencies between data through message passing between graph nodes, so it is often used to process data with rich relationships and interdependencies between objects [37].
In the NLP field, GCN generally encodes text based on a dependency analysis graph. The dependency analysis graph shows the dependency between text word segmentations, where head is a virtual root node and there is only one node that depends on the root node, while the edges in the graph represent the dependency between word segmentations. Figure 3 shows the dependency analysis diagram for “After a preliminary investigation by the public security authorities, Yuan Mou, a student of the school, stabbed his classmate Ye Mou due to a dispute. (公安机关初步调查,该校学生袁某因纠纷,将同学叶某捅伤).”
Unlike previous GCN studies, in which the text is directly divided into words that serve as the nodes of the graph, this paper uses the results of the text dependency analysis to set character nodes, word nodes, character-word edges, and dependency edges, forming the basic graph structure required by the GCN, in order to integrate the character features and dependency features of the text. The character-word edges represent character-to-word associations, and the dependency edges represent word-to-word associations. The proposed Word-labels are representations of words containing more than two characters and have no real meaning. Figure 4 shows the dependency analysis graph of Figure 3 with the Word-labels added, where “unused1” is the word node. Orange edges are character-word edges, indicating that a word is composed of the characters it points to. The black edges are dependency edges, and the labels beside them are dependency types.
For a sentence with $n$ characters, we use an $(n+b) \times (n+b)$ adjacency matrix $A$ to denote its graph structure, where $b$ is the number of words with more than two characters (the number of inserted Word-labels). $A_{ij} = 1$ denotes that there is an edge between node $i$ and node $j$.
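A sketch of how such an adjacency matrix could be assembled from the segmentation and the dependency arcs (the node indexing and input format are our assumptions, as is letting a short word's first character stand in as its word node; undirected edges match the best setting found in Section 5.4):

```python
import numpy as np

def build_graph(words, heads, min_len=3):
    """words: segmented words; heads[i]: 1-based index of the head word of
    word i (0 = root), e.g. from a dependency parser. Returns an undirected
    (n+b) x (n+b) adjacency matrix over character and word nodes."""
    nodes, anchor = [], []            # anchor[i]: node that represents word i
    for w in words:
        anchor.append(len(nodes))
        if len(w) >= min_len:         # inserted Word-label acts as the word node
            nodes.append("[unused1]")
        nodes.extend(list(w))         # every character is a node of its own
    A = np.zeros((len(nodes), len(nodes)), dtype=np.float32)
    for i, w in enumerate(words):     # character-word edges
        if len(w) >= min_len:
            for k in range(1, len(w) + 1):
                A[anchor[i], anchor[i] + k] = A[anchor[i] + k, anchor[i]] = 1.0
    for i, h in enumerate(heads):     # dependency edges between word nodes
        if h > 0:
            A[anchor[i], anchor[h - 1]] = A[anchor[h - 1], anchor[i]] = 1.0
    return A, nodes
```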
In the $L$-layer GCN, $h_i^{(l-1)}$ denotes the input vector and $h_i^{(l)}$ the output vector of node $i$ at the $l$-th layer. A graph convolution operation is computed as follows:

$h_i^{(l)} = \sigma \Big( \sum_{j=1}^{n} \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} W^{(l)} h_j^{(l-1)} + b^{(l)} \Big)$

$\tilde{A} = A + I$

$\tilde{D}_{ii} = \sum_j \tilde{A}_{ij},$

where $W^{(l)}$ is a linear transformation, $b^{(l)}$ is the bias, $\sigma$ is a nonlinear function (e.g., ReLU), $I$ is the identity matrix, $\tilde{D}$ is the degree matrix, $\tilde{D}_{ii}$ is the number of nodes connected to node $i$, and $A$ is the adjacency matrix of the graph after the Word-labels have been inserted. In each graph convolution, every node aggregates information from its adjacent nodes in the graph.
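A compact PyTorch sketch of one such layer (a standard normalized-adjacency graph convolution matching the formulas above; the dimensions are our assumptions, with 768 for BERT-base output and 300 for the paper's GCN hidden size):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: h' = sigma(D~^(-1/2) A~ D~^(-1/2) h W + b)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)       # holds W^(l) and b^(l)

    def forward(self, h, adj):
        # h: (num_nodes, in_dim); adj: (num_nodes, num_nodes) 0/1 matrix
        a_tilde = adj + torch.eye(adj.size(0))         # A~ = A + I
        d_inv_sqrt = torch.diag(a_tilde.sum(dim=1).pow(-0.5))
        norm_adj = d_inv_sqrt @ a_tilde @ d_inv_sqrt   # D~^(-1/2) A~ D~^(-1/2)
        return torch.relu(self.linear(norm_adj @ h))   # aggregate, transform, sigma

# Two stacked layers, the best configuration found in Section 5.4:
gcn1, gcn2 = GCNLayer(768, 300), GCNLayer(300, 300)
# h_dep = gcn2(gcn1(h_char, A), A)  # yields H_dep for the decoders
```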
Based on the dependency analysis graph, the GCN encoder takes the feature vector representation generated by BERT and encodes the information in each node's neighborhood into a new representation vector $H_{dep} = \{x_i\}_{i=1}^{N}$, where $N$ is the dimension of the feature vector ($N$ = 300).

3.3. Decoder

The decoder consists of two parts: the head entity decoder and the joint decoder of relations and tail entities. First, all possible head entities in the text are decoded and their start and end positions are marked. Then, each predicted head entity is selected in turn, and its corresponding tail entities are marked under each predefined relation.

3.3.1. Head Entity Decoder

The head entity decoder recognizes all possible head entities in the input sentence $S$ by directly decoding the vector $H_{dep}$ generated by the $L$-layer GCN encoder. Specifically, it uses two identical binary classifiers to detect the start and end positions of head entities, respectively. If the current position is the start or end position of a head entity, it is marked as 1; otherwise, it is marked as 0. The specific operations are as follows:

$p_i^{sub\_begin} = \sigma ( W_{begin} x_i + b_{begin} )$

$p_i^{sub\_end} = \sigma ( W_{end} x_i + b_{end} ),$

where $p_i^{sub\_begin}$ and $p_i^{sub\_end}$ represent the probabilities that the current position is the start and end position of a head entity, respectively. If the probability exceeds a threshold (0.5 in this paper), the corresponding position is tagged 1; otherwise, it is tagged 0. $x_i$ is the encoded representation of the $i$-th position in the input sequence, i.e., $x_i = H_{dep}[i]$; $W_{(\cdot)}$ denotes the trainable weights, $b_{(\cdot)}$ the bias, and $\sigma$ the sigmoid activation function.
The head entity decoder optimizes the likelihood function $p_\theta(sub \mid H_{dep})$ to identify the span of a head entity $sub$ given the sentence representation $H_{dep}$. $I(z) = 1$ if $z$ is true and 0 otherwise. $y_i^{sub\_begin}$ and $y_i^{sub\_end}$ are the binary tags of the start and end positions of the head entity, respectively, and the parameters are $\theta = \{W_{begin}, b_{begin}, W_{end}, b_{end}\}$. For multiple entity recognition, we use the same matching principle as CasRel [8] to obtain the span of each head entity from the tagged start and end positions:

$p_\theta(sub \mid H_{dep}) = \prod_{t \in \{sub\_begin, sub\_end\}} \prod_{i=1}^{n} (p_i^t)^{I\{y_i^t = 1\}} (1 - p_i^t)^{I\{y_i^t = 0\}}.$
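A sketch of the two binary taggers (the module shape and names are our assumptions; the 0.5 threshold follows the paper):

```python
import torch
import torch.nn as nn

class HeadEntityTagger(nn.Module):
    """Position-wise binary classifiers over H_dep: one marks head entity
    start positions, the other end positions (sketch of Section 3.3.1)."""
    def __init__(self, dim=300):
        super().__init__()
        self.begin = nn.Linear(dim, 1)   # W_begin, b_begin
        self.end = nn.Linear(dim, 1)     # W_end, b_end

    def forward(self, h_dep):
        # h_dep: (seq_len, dim) -> two (seq_len,) 0/1 tag vectors
        p_begin = torch.sigmoid(self.begin(h_dep)).squeeze(-1)
        p_end = torch.sigmoid(self.end(h_dep)).squeeze(-1)
        return (p_begin > 0.5).long(), (p_end > 0.5).long()
```

Each start tag is then paired with the nearest following end tag to form a head entity span, as in CasRel.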

3.3.2. Joint Decoder of Relation and Tail Entity

The joint decoder of relations and tail entities simultaneously recognizes the tail entities and their relations with the predicted head entity. It has the same structure as the head entity decoder, but unlike the head entity decoder, which directly decodes the vector $H_{dep}$, the joint decoder for each specific relation also takes the features of the head entity into account. The specific operations are as follows:

$p_i^{obj\_begin} = \sigma ( W_{begin}^{r} ( x_i + v_{sub}^{k} ) + b_{begin}^{r} )$

$p_i^{obj\_end} = \sigma ( W_{end}^{r} ( x_i + v_{sub}^{k} ) + b_{end}^{r} ),$

where $p_i^{obj\_begin}$ and $p_i^{obj\_end}$ represent the probabilities that the current position is the start and end position of a tail entity, respectively, and $v_{sub}^{k}$ is the vector of the $k$-th head entity obtained by the head entity decoder. To keep the dimensions of $x_i$ and $v_{sub}^{k}$ consistent, we use the average of the representation vectors of the $k$-th head entity's start and end positions as $v_{sub}^{k}$.

The joint decoder optimizes the likelihood function $p_{\phi_r}(obj \mid sub, H_{dep})$ to identify the span of the tail entity $obj$ given the sentence representation $H_{dep}$ and the head entity $sub$. $y_i^{obj\_begin}$ and $y_i^{obj\_end}$ are the binary tags of the start and end positions of the tail entity in $H_{dep}$. If the head entity $sub$ is related to the tail entity $obj$, the start and end positions of $obj$ are tagged 1; otherwise, they are tagged 0. If there is no corresponding tail entity, then $y_i^{\varnothing\_begin} = y_i^{\varnothing\_end} = 0$ for all $i$. The parameters are $\phi_r = \{W_{begin}^{r}, b_{begin}^{r}, W_{end}^{r}, b_{end}^{r}\}$:

$p_{\phi_r}(obj \mid sub, H_{dep}) = \prod_{t \in \{obj\_begin, obj\_end\}} \prod_{i=1}^{n} (p_i^t)^{I\{y_i^t = 1\}} (1 - p_i^t)^{I\{y_i^t = 0\}}.$
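A sketch of the relation-specific tail taggers (the shapes and conditioning are our reading of the equations above):

```python
import torch
import torch.nn as nn

class RelationTailTagger(nn.Module):
    """For each of the |R| relations, a start and an end classifier over
    positions, conditioned on the head entity (sketch of Section 3.3.2)."""
    def __init__(self, num_relations, dim=300):
        super().__init__()
        self.begin = nn.Linear(dim, num_relations)  # all W^r_begin, b^r_begin
        self.end = nn.Linear(dim, num_relations)    # all W^r_end, b^r_end

    def forward(self, h_dep, sub_start, sub_end):
        # v_sub: average of the head entity's start/end position vectors
        v_sub = (h_dep[sub_start] + h_dep[sub_end]) / 2
        x = h_dep + v_sub                        # x_i + v_sub at every position
        p_begin = torch.sigmoid(self.begin(x))   # (seq_len, |R|)
        p_end = torch.sigmoid(self.end(x))       # (seq_len, |R|)
        return (p_begin > 0.5).long(), (p_end > 0.5).long()
```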
In this case, the joint decoder of relations and tail entities can simultaneously identify the tail entities related to the head entity obtained from the head entity decoder and the relations between them.
Following the design of Wei et al. [8], we set the training objective as follows:

$J(\Theta) = \sum_{j=1}^{|D|} \Big[ \sum_{sub \in T_j} \log p_\theta (sub \mid H_{dep}^{j}) + \sum_{r \in T_j | sub} \log p_{\phi_r} (obj \mid sub, H_{dep}^{j}) + \sum_{r \in R \setminus T_j | sub} \log p_{\phi_r} (obj_{\varnothing} \mid sub, H_{dep}^{j}) \Big],$

where $\Theta = \{\theta, \{\phi_r\}_{r \in R}\}$, $D$ is the training set, $H_{dep}^{j}$ is the representation of sentence $S_j$ in $D$, and $T_j = \{(sub, obj, r)\}$ is the set of possible overlapping entity relation triples in sentence $S_j$. We train the model by maximizing $J(\Theta)$ with the Adam stochastic gradient algorithm.
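In practice, maximizing $J(\Theta)$ amounts to minimizing a sum of per-position binary cross-entropies over all taggers; a minimal sketch (the names are ours):

```python
import torch.nn.functional as F

def tagging_loss(p_begin, p_end, y_begin, y_end):
    """Negative log-likelihood of one tagger's start/end predictions, i.e.
    the binary cross-entropy that maximizing J(Theta) corresponds to.
    p_* are probability tensors; y_* are float 0/1 label tensors."""
    return (F.binary_cross_entropy(p_begin, y_begin) +
            F.binary_cross_entropy(p_end, y_end))

# Total loss: the head entity tagger's loss plus, for each gold head entity,
# the relation-specific tail tagger losses (with all-zero labels for relations
# that have no matching tail entity), optimized with Adam.
```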

4. Proposed Dataset

4.1. Data Collection

In order to obtain high-quality corpus data, we collected a variety of news reports from People’s Daily (http://paper.people.com.cn/; accessed on 10 February 2022), Tencent News (https://news.qq.com/; accessed on 10 February 2022), China News (https://www.chinanews.com.cn/; accessed on 10 February 2022), and Toutiao (https://www.toutiao.com; accessed on 10 February 2022). We selected sentences that contain interpersonal relations and manually annotated all entity and relation information, avoiding the noise problem caused by distant supervision. In total, we collected 10,082 sentences, each containing at least one relation instance. These sentences were divided into a training set, a validation set, and a test set at a 7:2:1 ratio. Table 2 shows the statistics of the proposed dataset, NewsPer.

4.2. Data Processing

After collecting the original news texts, we split the corpus into sentences and used the Baidu lexical analysis tool LAC (https://gitee.com/baidu/lac; accessed on 10 February 2018) (Lexical Analysis of Chinese) to pick out sentences that contain person names (PER), place names (LOC), organization names (ORG), or work names (nw) (see Appendix A.1 for the entity type descriptions). We then manually annotated them according to the interpersonal relation types we defined. To ensure the reliability and accuracy of the annotated data, we arranged for two annotators and one reviewer: the annotators were responsible for data annotation, and the reviewer checked the annotation quality. In case of any discrepancy, the final annotation was determined by the reviewer.
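The filtering step might look like the following sketch (LAC's `run` interface returns word and tag lists; the sample sentences are illustrative and error handling is omitted):

```python
# Keep only sentences that mention a person, place, organization, or work.
from LAC import LAC

lac = LAC(mode="lac")
KEEP_TAGS = {"PER", "LOC", "ORG", "nw"}

def has_target_entity(sentence):
    words, tags = lac.run(sentence)  # segmentation plus lexical tags
    return any(tag in KEEP_TAGS for tag in tags)

raw_sentences = ["该校学生袁某因纠纷将同学叶某捅伤", "今天天气晴朗"]
sentences = [s for s in raw_sentences if has_target_entity(s)]
```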
According to common interpersonal relationships and important attributes of people, we have defined three major categories (relatives, social relationships, and other relationships) and 17 subcategories (not including unknown) of interpersonal relationships, which cover a wide range of types of interpersonal relationships, as shown in Figure 5. (See Appendix A.2 for specific type descriptions.)
In addition, Figure 6 shows the sentence-length statistics in NewsPer, and Table 3 shows the number of relations in a sentence. As can be seen from the statistics, NewsPer reflects the complexity of the context in which relations occur in the real text, and the manual annotation also improves the accuracy of the dataset.

4.3. Data Features

TACRED [32] enriches its English text with POS, NER, DEPREL, and other features generated by Stanford CoreNLP. Because Chinese text cannot be segmented by spaces, inspired by TACRED, we used DDParser (https://gitee.com/yeahking/DDParser; accessed on 10 February 2020) (Baidu Dependency Parser), a Chinese dependency parsing tool released by Baidu NLP, to obtain the tokens, POS tags, and dependencies. Moreover, to address the entity mention problem in RE, we annotate with an “entMention” label to ensure that the entity information in each sentence can be used effectively.
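A sketch of this feature extraction step (DDParser's interface as publicly documented; output keys may vary across versions):

```python
# Obtain tokens, POS tags, and dependency arcs for one sentence.
from ddparser import DDParser

ddp = DDParser(use_pos=True)
result = ddp.parse("该校学生袁某因纠纷将同学叶某捅伤")[0]
tokens = result["word"]      # segmented tokens
pos = result["postag"]       # POS tag of each token
heads = result["head"]       # 1-based index of each token's head (0 = root)
deprels = result["deprel"]   # dependency relation label of each arc
```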

4.4. Discussion

Compared with other Chinese relation extraction datasets, our NewsPer dataset has the following advantages. First, we provide a corpus with rich features (POS, dependency, entity type, and entity mention). Second, during data processing we manually reviewed and corrected errors produced by third-party word segmentation tools, reducing the negative impact of error propagation on model training. Third, we created a reasonable classification of interpersonal relationships that includes both social relationships between people and important attributes of people, which can provide sufficient data support for downstream tasks (such as building a knowledge graph of people). Fourth, we annotated multiple entity relations per sentence, reflecting the complexity of the real-world texts in which relations occur.

5. Experiments

5.1. Experimental Setting

The experimental data is our manually annotated corpus of interpersonal relationships, divided into a training set, a validation set, and a test set at a 7:2:1 ratio. The experimental environment and hyperparameters are given in Appendix A.3. The parameter settings, including the network structure parameters and the experimental hyperparameters, are the best values tuned according to the experimental conditions and training behavior. The experimental environment and parameter settings of the comparison models are consistent with DepCasRel. For the extraction of relation triples, we use precision, recall, and the F1 value as evaluation metrics.
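For clarity, a minimal sketch of these metrics computed over exact-match triples (a standard computation, shown here for reference):

```python
# Micro precision/recall/F1 over exact-match (sub, r, obj) triples.
def prf1(pred_triples, gold_triples):
    correct = len(set(pred_triples) & set(gold_triples))
    p = correct / len(pred_triples) if pred_triples else 0.0
    r = correct / len(gold_triples) if gold_triples else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```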

5.2. Comparison Experiment and Result Analysis

For comparison, we use the following models as comparison models:
  • AttBLSTM [38] is a bidirectional LSTM network based on an attention mechanism that can automatically extract important words in text without using additional knowledge or NLP systems.
  • CNN adapts the method proposed by Zeng et al. [39]: text sentence features are instead extracted with BERT and concatenated with the position features of the entity pair, and the CNN output is fed into a softmax classifier to predict the relationship between the two tagged entities.
  • GCN [40] is a deep learning model based on graph structure data; this experiment uses the publicly available Chinese word embedding (https://github.com/Embedding/Chinese-Word-Vectors; (accessed on 10 February 2018)), inputting the text feature representation and adjacency matrix into the GCN, and finally inputting the output of the GCN into a softmax classifier to predict the relationship between two tagged entities.
  • CopyMTL [41] is a multitask learning framework that uses conditional random fields (CRF) to identify entities and the Seq2Seq model to extract relation triples. OneDecoder uses shared parameters to predict all triples, while MultiDecoder uses unshared decoders, and each decoder predicts a triplet.
  • CasRel [8] is a novel cascading binary tagging framework. In the first stage, all possible head entities are identified, and then for each identified head entity, all possible relations and the corresponding tail entities are identified by a relation-specific tagger at the same time.
  • SPN [28] represents the joint relation extraction task as a set prediction problem. The model employs a nonautoregressive decoder based on a transformer as the set generator, and when combined with the bipartite match loss function, all relation triples can be output directly at the same time.
On the basis of the experimental setup in Section 5.1, we trained our cascade annotation model, DepCasRel, for Chinese overlapping entity relation extraction. At the same time, DepCasRel is compared with other classical relation extraction models under the same experimental conditions. We analyzed the experimental results.
Table 4 shows the results of the different models for relation triple extraction on the NewsPer test set. As can be seen in Table 4, DepCasRel outperforms the other models on all three evaluation metrics. Compared to the multitask learning framework CopyMTL, DepCasRel exceeds its F1 score by 34.27%. Compared to the CasRel model, DepCasRel improves precision, recall, and F1 by 3.24%, 8.85%, and 6.14%, respectively, with a particularly large improvement in recall. Compared to the joint extraction model SPN, DepCasRel also improves on all three metrics, with comparable precision and recall values.
To further analyze the performance of DepCasRel, we also conducted experiments on the NER subtask; the results are shown in Figure 7. As can be seen, DepCasRel outperforms the other models not only on relation extraction but also on the NER subtask. Compared to the CasRel model, DepCasRel improves the F1 of the NER subtask by 3.06%. We attribute this to the fact that the Word-label allows information to interact between the character features and the dependency features of the text, which in turn improves the model's ability to recognize entities.
We also compared the models on sentences containing different numbers of triples; the results are shown in Figure 8. As can be seen, DepCasRel achieves better extraction results even for sentences containing two or more triples, which demonstrates that introducing dependency features of the text in the feature learning phase improves the semantic learning of the text and benefits extraction.
The above performance improvements indicate that the framework of extracting head entities first and then tail entities under specific relations is more effective at handling overlapping entities in complex scenarios. In addition, GCN is a neural network that acts directly on the graph structure; through the word nodes and edge relations, it can fully learn the complex syntactic information contained in Chinese text and represent richer semantic information. Finally, GCN can encode local features and word segmentation dependencies; adding sentence dependency information encoded by GCN to the joint extraction model improves the probability that both entities of a triple are extracted correctly, thereby improving network performance.

5.3. Ablation Experiment and Result Analysis

In order to verify the effectiveness of the Word-label, we conducted an ablation experiment testing the model's performance with and without its components. The experimental results are shown in Table 5, where
  • without the Word-label, the character feature vector encoded by BERT is concatenated with the dependency feature vector encoded by GCN as the text feature vector for the decoder to tag;
  • without the GCN Encoder, the model reduces to CasRel, whose experimental results on the NewsPer test set are given;
  • without the BERT Encoder, the publicly available Chinese word embeddings are used and the text features are encoded with a BiLSTM for the decoder to tag entity positions.
As can be seen from Table 5, directly concatenating the text dependency features with the character features does not improve the extraction performance of the model but rather impairs it, especially precision. We suspect that this is because the concatenation of feature vectors does not amount to real feature fusion; instead, it adds unnecessary interference and prevents the model from learning the textual feature information. In contrast, our proposed Word-label incorporates the character features into the word segmentation-based dependency graph, effectively solving the fusion of character features and dependency features and improving the extraction performance of the model. At the same time, adding GCN to CasRel extracts the dependency features of the text and has a positive impact on recognition. Compared with pretrained word embeddings, the BERT encoder can fully learn the character features of the text, which helps the model learn.

5.4. Effects of Graph Setting

To further analyze the role of the GCN Encoder in DepCasRel, we compared the effects of different feature graphs and different numbers of GCN layers on the performance of the model. The experimental results are shown in Table 6 and Figure 9, respectively, where
  • in the directed graph, character-word edges point from word node to character node, and dependency edges point from head to dependent word;
  • in the mixed graph, character-word edges point from word node to character node, and dependency edges are undirected; and
  • in the undirected graph, both character-word edges and dependency edges are undirected.
From the experimental results in Table 6, we can see that the orientation of the edges in the feature graph also affects the performance of the model; in particular, undirected graphs yield better extraction results than directed and mixed graphs. We suspect there are several main reasons for this. First, most sentences in the NewsPer dataset are longer than 40 characters (as shown in Figure 6), which may produce sparse graphs; undirected graphs have more edges than directed graphs, and more edges mean more supervision signals, which helps the model learn features. Second, because of the complex sentence structure of Chinese, the order of head and tail entities does not help much in interpersonal relationship extraction, so an undirected graph is more suitable. Third, there is no directionality between character nodes and word nodes, so undirected character-word edges are more helpful for the model to identify entities.
To determine the optimal number of GCN layers, we experimented with 1, 2, 3, 4, and 5 layers, using undirected feature graphs throughout. From Figure 9, we can see that DepCasRel achieves the best extraction performance with two GCN layers. As the number of layers increases further, the extraction performance gradually decreases. We conjecture that more GCN layers hinder the model from learning the features of the nodes themselves. Specifically, the GCN updates node features by aggregating information from neighboring nodes, and each update reaches one more order of neighbors. If we call the order of the farthest aggregated neighbor a node's "aggregation radius", then as the number of GCN layers increases, the aggregation radius grows until each node effectively covers the whole graph. This makes the local network structure of each node much less diverse, which harms the learning of the node's own features. Therefore, the more GCN layers there are, the harder it is for DepCasRel to identify entity information, which in turn reduces the extraction performance of the model.
In addition, we found that the one-layer GCN model performed worse than the model without the GCN encoder. We conjecture that the reason for this is that dependencies do not all exist between adjacent words, and some may span several words. Therefore, some of the structural information introduced by the one-layer GCN may interfere with the model and reduce its precision compared to the absence of GCN.

6. Conclusions

In this paper, we built a Chinese interpersonal relationship dataset, NewsPer, which contains overlapping entities, and proposed DepCasRel, a dependency-based cascade tagging model for Chinese overlapping entity relation extraction. The model jointly extracts relation triples with overlapping entities from Chinese text. Combining the linguistic characteristics of Chinese, we proposed the Word-label method, which combines character features and dependency features so that both can be fed to the GCN through a dependency analysis graph; this not only helps the model identify entities but also makes full use of structural information to help the model extract entity relations. The experimental results show that DepCasRel can encode local features and text dependencies; adding GCN-encoded structural information to the joint extraction model improves the probability that relation triples are extracted correctly, thus improving network performance, especially for triples with overlapping entities. In future work, we hope to integrate external knowledge bases (such as thesauri, WordNet, and HowNet) into the current model; we believe that introducing external knowledge will greatly improve the accuracy of overlapping entity relation extraction.

Author Contributions

Conceptualization, M.T. and W.Y.; methodology, M.T. and F.W.; validation, M.T.; data curation, M.T. and Q.D.; writing—original draft preparation, M.T.; writing—review and editing, M.T., W.Y. and F.W.; supervision, M.T. and W.Y.; funding acquisition, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of China grant number 202204120017, the Autonomous Region Science and Technology Program grant number 2022B01008-2, and Autonomous Region Science and Technology Program grant number 2020A02001-1.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to copyright.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Types of Named Entities

Because the dataset NewsPer mainly focuses on interpersonal relationships, we selected four entity types as the named entities in the dataset based on the word segmentation of LAC (Baidu Lexical Analysis of Chinese), namely PER, LOC, ORG, and nw. The named entity types in NewsPer and the content they cover are shown in Table A1.
Table A1. Types of named entities.
PER: Personal names, including nicknames and made-up names.
LOC: Geographic locations, including countries, cities, etc.
ORG: Generally refers to businesses, schools, government departments, and so on, excluding bands, groups, and so on.
nw: Works such as books or articles written, songs sung, programs performed, etc.

Appendix A.2. Types of Named Relations

According to common interpersonal relationships and important attributes of people, we defined three major categories (relatives, social relationships, and other relationships) and 17 subcategories, plus "Unknown", covering a wide range of interpersonal relationships. The relation list for NewsPer is shown in Table A2, including each relation's ID, name, and description.
Table A2. Types of relations.
0. Unknown: Relationships that do not belong to any other relationship type.
1. 亲属关系/夫妻 (relatives/spouse): Established by marriage, including husband, wife, fiance, and fiancee.
2. 亲属关系/祖孙 (relatives/grandparent-grandchild): Refers to generational kinship, such as grandfather and grandmother.
3. 亲属关系/父母 (relatives/parent): Refers to the person who plays the role of father or mother, including father, mother, a spouse's parents, and adoptive parents.
4. 亲属关系/兄弟姐妹 (relatives/sibling): Refers to brothers and sisters, both blood-related and non-blood-related.
5. 亲属关系/亲戚 (relatives/relative): Refers to internal and external relatives, including matrilineal and paternal relatives, such as uncles, aunts, etc.
6. 社交关系/情侣 (social/lover): Refers to two people who attract and love each other, including boyfriend, girlfriend, lover, and cohabitation.
7. 社交关系/前任 (social/ex-partner): Refers to a person who previously held a certain position or status, including an ex-boyfriend, ex-girlfriend, ex-husband, and ex-wife.
8. 社交关系/朋友 (social/friend): Refers to people with deep friendships, including good friends, girlfriends, confidants, etc.
9. 社交关系/同学 (social/fellow student): Refers to people who attend the same school, including classmates and senior or junior schoolmates.
10. 社交关系/师生 (social/teacher-student): The collective name for teacher and student; here it covers teacher, coach, and master.
11. 社交关系/合作 (social/cooperation): Refers to people who work together, including customers, teammates, colleagues, associates, team members, etc.
12. 社交关系/竞争 (social/competition): Refers to the relationship of competing with others for one's own interests, such as competitors in contests and at work.
13. 其他关系/工作于 (other/works at): Refers to a place or organization where a person works or once worked.
14. 其他关系/学习于 (other/studies at): Refers to a person studying at a certain place or organization, such as the college they graduated from, the college they currently attend, or a place of study abroad.
15. 其他关系/出生于 (other/born in): Refers to a place of birth, such as a country, city, etc.
16. 其他关系/作品 (other/work): Refers to original intellectual achievements in literature, art, or science produced by a person through creative activities and expressible in a certain form.
17. 其他关系/国籍 (other/nationality): Refers to the identity of an individual as belonging to a certain country.

Appendix A.3. Implementation Details

We used an NVIDIA GeForce RTX 2080 Ti graphics card for model training and testing. Table A3 shows the parameter settings of DepCasRel, including the network structure parameters and the experimental hyperparameters. The parameter settings are the best values tuned according to the experimental conditions and training behavior. The experimental environment and parameter settings of the comparison models are consistent with DepCasRel.
Table A3. Hyperparameter settings.
Batch size: 8
Epochs: 200
Number of GCN layers: 2
GCN hidden layer dimension: 300
GCN learning rate: 0.0001
BERT learning rate: 0.00001

References

  1. Liu, S.; Li, B.; Guo, Z.; Wang, B.; Chen, G. Review of Entity Relation Extraction. J. Inf. Eng. Univ. 2016, 17, 541–547. [Google Scholar]
  2. Aone, C.; Ramos-Santacruz, M. REES: A large-scale relation and event extraction system. In Proceedings of the Sixth Applied Natural Language Processing Conference, Seattle, WA, USA, 29 April–4 May 2000; pp. 76–83. [Google Scholar]
  3. Aitken, J.S. Learning Information Extraction Rules: An Inductive Logic Programming Approach. In Proceedings of the 15th Eureopean Conference on Artificial Intelligence, ECAI’2002, Lyon, France, 21–26 July 2002; pp. 355–359. [Google Scholar]
  4. Schutz, A.; Buitelaar, P. Relext: A tool for relation extraction from text in ontology extension. In Proceedings of the International Semantic Web Conference, Galway, Ireland, 6–10 November 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 593–606. [Google Scholar]
  5. Rink, B.; Harabagiu, S.M. A generative model for unsupervised discovery of relations and argument classes from clinical texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing 2011, Edinburgh, UK, 27–31 July 2011. [Google Scholar]
  6. Thattinaphanich, S.; Prom-On, S. Thai Named Entity Recognition Using Bi-LSTM-CRF with Word and Character Representation. In Proceedings of the 4th International Conference on Information Technology 2019, Bali, Indonesia, 24–27 October 2019. [Google Scholar]
  7. Zeng, X.; Zeng, D.; He, S.; Kang, L.; Zhao, J. Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics 2018, Melbourne, Australia, 15–20 July 2018. [Google Scholar]
  8. Wei, Z.; Su, J.; Wang, Y.; Tian, Y.; Chang, Y. A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020, Online, 5–10 July 2020. [Google Scholar]
  9. Yang, X.; Zhang, S.; Ou-Yang, C. A Comprehensive Review on Relation Extraction. J. Univ. S. China (Sci. Technol.) 2018, 1. [Google Scholar]
  10. Socher, R.; Huval, B.; Manning, C.D.; Ng, A.Y. Semantic Compositionality through Recursive Matrix-Vector Spaces. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing & Computational Natural Language Learning 2012, Jeju, Republic of Korea, 12–14 July 2012. [Google Scholar]
  11. Sun, J.D.; Gu, X.S.; Li, Y.; Xu, W.R. Chinese entity relation extraction algorithms based on COAE2016 datasets. J. Shandong Univ. 2017, 52, 7. [Google Scholar]
  12. Gao, D.; Peng, D.L.; Liu, C. Entity Relation Extraction Based on CNN in Large-scale Text Data. J. Chin. Comput. Syst. 2018, 39, 5. [Google Scholar]
  13. Miwa, M.; Bansal, M. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics 2016, Berlin, Germany, 7–12 August 2016. [Google Scholar]
  14. Li, F.; Zhang, M.; Fu, G.; Ji, D. A Neural Joint Model for Extracting Bacteria and Their Locations. Advances in Knowledge Discovery and Data Mining. 2017, 10235, 15–26. [Google Scholar]
  15. Zheng, S.; Hao, Y.; Lu, D.; Bao, H.; Xu, J.; Hao, H.; Xu, B. Joint entity and relation extraction based on a hybrid neural network. Neurocomputing 2017, 257, 59–66. [Google Scholar] [CrossRef]
  16. Chen, Y.J.; Hsu, Y.J. Chinese Relation Extraction by Multiple Instance Learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence 2016, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
  17. Rönnqvist, S.; Schenk, N.; Chiarcos, C. A recurrent neural model with attention for the recognition of Chinese implicit discourse relations. arXiv 2017, arXiv:1704.08092. [Google Scholar]
  18. Zhang, Q.Q.; Chen, M.D.; Liu, L.Z. An effective gated recurrent unit network model for chinese relation extraction. In Proceedings of the 2017 2nd International Conference on Wireless Communication and Network Engineering, WCNE 2017, Xiamen, China, 24–25 December 2017; pp. 275–280. [Google Scholar]
  19. Xu, J.; Wen, J.; Sun, X.; Su, Q. A discourse-level named entity recognition and relation extraction dataset for Chinese literature text. arXiv 2017, arXiv:1711.07010. [Google Scholar]
  20. Zhang, Y.; Yang, J. Chinese NER using lattice LSTM. arXiv 2018, arXiv:1805.02023. [Google Scholar]
  21. Li, Z.; Ding, N.; Liu, Z.; Zheng, H.; Shen, Y. Chinese Relation Extraction with Multi-Grained Information and External Linguistic Knowledge. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019, Florence, Italy, 28 July–2 August 2019; pp. 4377–4386. [Google Scholar]
  22. Wan, H.; Moens, M.F.; Luyten, W.; Zhou, X.; Mei, Q.; Liu, L.; Tang, J. Extracting relations from traditional Chinese medicine literature via heterogeneous entity networks. J. Am. Med. Inform. Assoc. 2016, 23, 356–365. [Google Scholar] [CrossRef] [PubMed]
  23. Jin, Y.; Zhang, W.; He, X.; Wang, X.; Wang, X. Syndrome-aware herb recommendation with multi-graph convolution network. In Proceedings of the 2020 IEEE 36th International Conference on Data Engineering 2020, Dallas, TX, USA, 20–24 April 2020; pp. 145–156. [Google Scholar]
  24. Ruan, C.; Ma, J.; Wang, Y.; Zhang, Y.; Yang, Y. Discovering regularities from traditional Chinese medicine prescriptions via bipartite embedding model. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019, Macao, China, 10–16 August 2019. [Google Scholar]
  25. Zeng, X.; He, S.; Zeng, D.; Liu, K.; Liu, S.; Zhao, J. Learning the Extraction Order of Multiple Relational Facts in a Sentence with Reinforcement Learning. In Proceedings of the Empirical Methods in Natural Language Processing 2019, Hong Kong, China, 3–7 November 2019. [Google Scholar]
  26. Fu, T.J.; Li, P.H.; Ma, W.Y. GraphRel: Modeling Text as Relational Graphs for Joint Entity and Relation Extraction. In Proceedings of the Meeting of the Association for Computational Linguistics 2019, Florence, Italy, 28 July–2 August 2019. [Google Scholar]
  27. Wang, Y.; Yu, B.; Zhang, Y.; Liu, T.; Sun, L. TPLinker: Single-stage joint extraction of entities and relations through token pair linking. arXiv 2020, arXiv:2010.13415. [Google Scholar]
  28. Sui, D.; Chen, Y.; Liu, K.; Zhao, J.; Liu, S. Joint entity and relation extraction with set prediction networks. arXiv 2020, arXiv:2011.01675. [Google Scholar]
  29. Doddington, G.R.; Mitchell, A.; Przybocki, M.A.; Ramshaw, L.A.; Strassel, S.M.; Weischedel, R.M. The automatic content extraction (ace) program-tasks, data, and evaluation. In Proceedings of the Lrec, Lisbon, Portugal, 26–28 May 2004; pp. 837–840. [Google Scholar]
  30. Song, Z.; Maeda, K.; Walker, C.; Strassel, S. Ace 2007 Multilingual Training Corpus. Available online: https://catalog.ldc.upenn.edu/LDC2014T18 (accessed on 15 September 2014).
  31. Hendrickx, I.; Su, N.K.; Kozareva, Z.; Nakov, P.; Szpakowicz, S. SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations between Pairs of Nominals. In Proceedings of the Association for Computational Linguistics 2010, Uppsala, Sweden, 11–16 July 2010. [Google Scholar]
  32. Zhang, Y.; Zhong, V.; Chen, D.; Angeli, G.; Manning, C.D. Position-aware attention and supervised data improve slot filling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017. [Google Scholar]
  33. Riedel, S.; Yao, L.; McCallum, A. Modeling relations and their mentions without labeled text. In Proceedings of Machine Learning and Knowledge Discovery in Databases, Barcelona, Spain, 20–24 September 2010; pp. 148–163. [Google Scholar]
  34. Li, S.; He, W.; Shi, Y.; Jiang, W.; Liang, H.; Jiang, Y.; Zhu, Y. DuIE: A large-scale Chinese dataset for information extraction. In Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing, Dunhuang, China, 9–14 October 2019; pp. 791–800. [Google Scholar]
  35. Wang, H.; He, Z.; Ma, J.; Chen, W.; Zhang, M. IPRE: A dataset for inter-personal relationship extraction. In Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing, Dunhuang, China, 9–14 October 2019; pp. 103–115. [Google Scholar]
  36. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. Available online: https://www.cs.ubc.ca/amuham01/LING530/papers/radford2018improving.pdf (accessed on 10 February 2018).
  37. Zhang, J.L.; Zhang, Y.F.; Wang, M.Q.; Huang, Y.J. Joint extraction of Chinese entity relations based on graph convolutional neural network. Comput. Eng. 2021, 47, 103–111. [Google Scholar]
  38. Peng, Z.; Wei, S.; Tian, J.; Qi, Z.; Bo, X. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics 2016, Berlin, Germany, 7–12 August 2016. [Google Scholar]
  39. Zeng, D.; Liu, K.; Lai, S.; Zhou, G.; Zhao, J. Relation classification via convolutional deep neural network. In Proceedings of the 25th International Conference on Computational Linguistics 2014, Dublin, Ireland, 23–29 August 2014; pp. 2335–2344. [Google Scholar]
  40. Zhang, Y.; Qi, P.; Manning, C.D. Graph convolution over pruned dependency trees improves relation extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2205–2215. [Google Scholar]
  41. Zeng, D.; Zhang, H.; Liu, Q. CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning. In Proceedings of the AAAI Conference on Artificial Intelligence 2020, New York, NY, USA, 7–12 February 2020; pp. 9507–9514. [Google Scholar]
Figure 1. Overview of the proposed method—DepCasRel.
Figure 2. Text processing example. The original sentence is “After a preliminary investigation by the public security authorities, Yuan Mou, a student of the school, stabbed his classmate Ye Mou due to a dispute.” The words in red and in green are entities.
Figure 3. Example of a dependency analysis graph.
Figure 4. Dependency analysis graph with Word-label. The sentence is “After a preliminary investigation by the public security authorities, Yuan Mou, a student of the school, stabbed his classmate Ye Mou due to a dispute.”
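To make the Word-label construction concrete, here is a minimal sketch, not the authors' released code: the function name build_char_graph, the all-pairs intra-word connection, and the toy segmentation are illustrative assumptions. It projects word-level dependency arcs onto character nodes and adds intra-word edges linking the characters of each word, using the undirected setting that performs best in Table 6.

```python
import numpy as np

def build_char_graph(words, heads):
    """words: segmented words of one sentence; heads: per-word dependency
    head index (-1 for the root). Returns an undirected character-level
    adjacency matrix with self-loops."""
    spans, start = [], 0
    for w in words:                       # character span covered by each word
        spans.append(range(start, start + len(w)))
        start += len(w)
    adj = np.eye(start)                   # self-loops, common for GCN inputs
    for i, h in enumerate(heads):
        for a in spans[i]:                # "Word-label" edges: connect all
            for b in spans[i]:            # characters inside the same word
                adj[a, b] = 1.0
        if h >= 0:                        # dependency arc, projected onto
            for a in spans[i]:            # the characters of the two words
                for b in spans[h]:
                    adj[a, b] = adj[b, a] = 1.0   # undirected variant
    return adj

# Toy example: 袁某 / 捅伤 / 叶某, with the verb as root.
print(build_char_graph(["袁某", "捅伤", "叶某"], [1, -1, 1]).astype(int))
```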
Figure 5. Types and proportions of relations. The relation categories from top to bottom are: other relationships/nationality, other relationships/composition, other relationships/born in, other relationships/study in, other relationships/working on, social relationships/compete, social relationships/cooperation, social relationships/teacher-student relationship, social relationships/fellow student, social relationships/friend, social relationships/ex-lover, social relationships/lovers, relatives/directly-related members of one’s family, relatives/brothers and sisters, relatives/parent, relatives/grandchild relationship, relatives/spouse, unknown.
Figure 6. Sentence length statistics.
Figure 7. Results of different models on the NER subtask.
Figure 8. F1 score of extracting relational triples from sentences with different numbers of triples.
Figure 9. Results with different numbers of GCN layers.
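What the layer number varies here is how many GCN propagation steps are stacked: each additional layer lets a character node aggregate features from dependency-graph neighbors one hop further away. The paper's exact GCN variant is not reproduced on this page, so the update below is only the generic Kipf–Welling form, given for orientation.

```latex
% Generic GCN layer update (the model's exact variant may differ):
% \tilde{A} = adjacency matrix with self-loops, \tilde{D} = its degree matrix,
% H^{(l)} = node representations at layer l, W^{(l)} = learned weights.
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-\frac{1}{2}} \tilde{A}\, \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right)
```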
Table 1. Examples of overlapping entity types. Normal denotes sentences without overlap, SEO denotes single entity overlap, and EPO denotes entity pair overlap.
| Types | Text | Triples |
| --- | --- | --- |
| Normal [7] | 周星驰中学就读于香港圣玛利奥英文书院。(Zhou Xingchi studied at St. Mary’s English College in Hong Kong as a secondary school student.) | 〈周星驰, 就读于, 香港圣玛利奥英文书院〉 (〈Zhou Xingchi, studied at, St. Mary’s English College in Hong Kong〉) |
| SEO [7] | 周星驰主演了《喜剧之王》和《大话西游》。(Zhou Xingchi starred in “King of Comedy” and “A Chinese Odyssey”.) | 〈周星驰, 演员, 喜剧之王〉 (〈Zhou Xingchi, actor, King of Comedy〉); 〈周星驰, 演员, 大话西游〉 (〈Zhou Xingchi, actor, A Chinese Odyssey〉) |
| EPO [7] | 由周星驰导演并主演的《功夫》于近期上映。(Directed by and starring Zhou Xingchi, “Kung Fu Hustle” was recently released.) | 〈周星驰, 演员, 功夫〉 (〈Zhou Xingchi, actor, Kung Fu Hustle〉); 〈周星驰, 导演, 功夫〉 (〈Zhou Xingchi, director, Kung Fu Hustle〉) |
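As a concrete reading of these categories, the following helper is a hypothetical sketch (the function name and logic are ours, not the paper's): it labels one sentence's triple set EPO when two triples share the same entity pair, SEO when two triples share at least one entity, and Normal otherwise.

```python
from itertools import combinations

def overlap_type(triples):
    """triples: list of (head, relation, tail) tuples for one sentence."""
    for (h1, _, t1), (h2, _, t2) in combinations(triples, 2):
        if {h1, t1} == {h2, t2}:
            return "EPO"     # same entity pair, different relations
    for (h1, _, t1), (h2, _, t2) in combinations(triples, 2):
        if {h1, t1} & {h2, t2}:
            return "SEO"     # at least one shared entity
    return "Normal"          # no entities shared between triples

print(overlap_type([("周星驰", "演员", "功夫"), ("周星驰", "导演", "功夫")]))        # EPO
print(overlap_type([("周星驰", "演员", "喜剧之王"), ("周星驰", "演员", "大话西游")]))  # SEO
```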
Table 2. Statistics of the proposed dataset—NewsPer.
| Split | Sentences | Single Relation Sentences | Overlapping Entity Sentences |
| --- | --- | --- | --- |
| Train | 7057 | 5400 | 1657 |
| Dev | 2016 | 1526 | 490 |
| Test | 1009 | 734 | 275 |
| All | 10,082 | 7660 | 2422 |
Table 3. Statistics on the number of triples in a sentence.
| Triple Number | Train | Dev | Test | All |
| --- | --- | --- | --- | --- |
| 1 | 5504 | 1555 | 746 | 7805 |
| 2 | 989 | 275 | 172 | 1436 |
| 3 | 377 | 124 | 56 | 557 |
| 4 | 97 | 23 | 16 | 136 |
| 5 | 39 | 16 | 11 | 66 |
| 6 | 32 | 13 | 7 | 52 |
| ≥7 | 19 | 10 | 1 | 30 |
Table 4. Results of different models.
| Model | Precision | Recall | F1 |
| --- | --- | --- | --- |
| AttBLSTM | 0.4412 | 0.4920 | 0.4652 |
| CNN | 0.5595 | 0.5480 | 0.5537 |
| GCN | 0.5819 | 0.5832 | 0.5825 |
| CopyMTL (one decoder) | 0.4296 | 0.3792 | 0.4028 |
| CopyMTL (multi decoder) | 0.4673 | 0.4017 | 0.4320 |
| CasRel | 0.7105 | 0.6596 | 0.6841 |
| SPN | 0.7168 | 0.7153 | 0.7160 |
| DepCasRel | 0.7429 | 0.7481 | 0.7455 |
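For readers checking the numbers: F1 is the harmonic mean of precision (P) and recall (R), presumably computed here under the usual exact-match criterion for relational triples. For example, for DepCasRel:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.7429 \times 0.7481}{0.7429 + 0.7481} \approx 0.7455
```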
Table 5. Results of ablation experiment.
| Condition | Precision | Recall | F1 |
| --- | --- | --- | --- |
| DepCasRel | 0.7429 | 0.7481 | 0.7455 |
| -Word-label | 0.4684 | 0.6660 | 0.5500 |
| -GCN Encoder | 0.7105 | 0.6596 | 0.6841 |
| -BERT Encoder | 0.6144 | 0.4484 | 0.5185 |
Table 6. Results of different graphs.
| Condition | Precision | Recall | F1 |
| --- | --- | --- | --- |
| Directed graph & Word-label | 0.7244 | 0.7193 | 0.7218 |
| Mixed graph & Word-label | 0.7327 | 0.7291 | 0.7309 |
| Undirected graph & Word-label | 0.7429 | 0.7481 | 0.7455 |