BTDM: A Bi-Directional Translating Decoding Model-Based Relational Triple Extraction

The goal of relational triple extraction is to extract knowledge-rich relational triples from unstructured text. Although previous methods achieve considerable performance, some problems remain, such as error propagation, the overlapping triple problem, and suboptimal subject-object alignment. To address these shortcomings, in this paper we decompose the task into three subtasks from a fresh perspective: entity extraction, subject-object alignment, and relation judgement, and propose a novel bi-directional translating decoding model (BTDM). Specifically, a bidirectional translating decoding structure is designed to perform entity extraction and subject-object alignment, decoding entity pairs by both forward and backward extraction. The bidirectional structure effectively mitigates the error propagation problem and aligns subject-object pairs, while the translating decoding approach handles the overlapping triple problem. Finally, an (entity pair, relation) bipartite graph is designed to achieve effective relation judgement. Experiments show that our model outperforms previous methods and achieves state-of-the-art performance on NYT and WebNLG, with F1-scores of 92.7% and 93.8% on the two datasets. Moreover, in complementary experiments on complex scenarios, our model demonstrates consistent performance gains.


Introduction
One of the most important information extraction tasks in natural language processing is extracting entities and relations from unstructured text to generate structured relational triples. Typically, a structured triple takes the form (subject, relation, object), where both the subject and the object are entities tied together by a relation. Extracting triples of structured knowledge from unstructured text can serve downstream tasks such as knowledge graphs [1,2], question answering [3,4], and biomedical tasks [5,6].
Early studies [7-10] on relational triple extraction split the work into two steps: identifying entities [11,12] and predicting their relations [7,13]. Specifically, the first step identifies all entities in the text, and the second enumerates entity pairs for relation judgement. This is known as the pipeline-based approach. Its advantage is that it can leverage existing techniques for named entity recognition and relation classification. However, it has two drawbacks: (1) the link between entity identification and relation prediction is disregarded, and (2) errors in one subtask easily propagate to the other.
To address these shortcomings, joint extraction [14-16], i.e., the simultaneous extraction of entities and relations, has been studied. CasRel [17] identifies all subjects and then identifies the relations and objects related to each subject. TDEER [18] extracts the possible relations and subjects, respectively, then decodes the objects using an attention mechanism. PRGC [19] first identifies the relations and entity pairs and then aligns them.
These sequence labeling approaches make use of multiple tagging sequence networks to determine the beginning and ending positions of entities.
Although the above methods achieve a promising performance, they share a common feature: they extract subjects before objects and relations, or relations before entities. This has a fatal drawback: if subject extraction fails, the entire triple becomes impossible to extract. Specifically, as shown in Figure 1, for the triple (Stephen Chow, Nationality, China), failure to extract the subject Stephen Chow leads to failure to extract the object China and the relation Nationality. The extraction of triples is thus sensitive to whether the subject is extracted, a problem that exists in most current methods. It is detrimental to the identification of entity pairs and significantly impairs the performance of the whole relational triple-extraction task.
In this paper, to mitigate error propagation, we designed a bidirectional framework to identify entity pairs. We gave the subject and object equal importance in this framework, which extracts entity pairs using forward extraction (subject → entity pair, denoted as s2p) and backward extraction (object → entity pair, denoted as o2p), respectively. Meanwhile, the features in both directions flow into each other, which improves the convergence speed and recognition accuracy. Secondly, to achieve efficient and accurate alignment of entity pairs, a translating decoding approach is presented based on the attention mechanism. This approach decodes entity pairs by translating relational features and the results of entity recognition. Finally, to fully mine the relations between entity pairs, we add relational features to the bidirectional decoding to strengthen the interaction between entity pairs and relations. We also design an (entity pair, relation) bipartite graph, which maintains a relation matrix for each relation, thus effectively implementing a mapping from entity pairs to relations.

We implemented the above approach in BTDM, which consists of an encoder module, a bidirectional entity extraction and subject-object alignment module, and a relation judgement module. According to the experimental results on two widely used benchmark datasets, the proposed BTDM outperforms earlier techniques and reaches state-of-the-art performance. The main contributions of this paper are as follows:
1. We propose a new bidirectional perspective to decompose the relational triple-extraction task into three subtasks: entity extraction, subject-object alignment, and relation judgement.
2. Following our perspective, we propose a novel end-to-end model, BTDM, which greatly mitigates error propagation, handles the overlapping triple problem, and efficiently aligns subjects and objects.
3. We conduct extensive experiments on the public NYT and WebNLG datasets. We compare the proposed method with 12 baselines, demonstrating that the proposed model achieves state-of-the-art performance.

Problem Definition
In this section, we first provide a principled problem definition and present our view of the joint relational triple-extraction task.

Problem Formulation
Given a sentence S = {x_1, x_2, x_3, . . . , x_n} with n tokens, x_i denotes the i-th token in S. The goal of the relation extraction model is to identify all structured relational triples T(S) = {(s, r, o) | s, o ∈ E, r ∈ R}, where E = {e_1, e_2, . . . , e_m} is the set of entities and R = {r_1, r_2, . . . , r_k} is a predefined set of relations.
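To make the formulation concrete, here is a small Python sketch using the sentence from Figure 1; the relation label "Born_In" is illustrative and not necessarily the dataset's actual label name:

```python
# A hypothetical instance of the formulation, using the sentence from
# Figure 1; the relation label "Born_In" is illustrative, not necessarily
# the dataset's actual label name.
S = "Stephen Chow, the best comedian in China, was born in Hong Kong."
R = {"Nationality", "Contains", "Born_In"}             # predefined relation set
T = {("Stephen Chow", "Nationality", "China"),         # (subject, relation, object)
     ("Stephen Chow", "Born_In", "Hong Kong"),
     ("China", "Contains", "Hong Kong")}
E = {s for (s, _, _) in T} | {o for (_, _, o) in T}    # entities induced by T
assert all(r in R for (_, r, _) in T)
assert E == {"Stephen Chow", "China", "Hong Kong"}
```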

Our View of the Problem
In this paper, the task of joint relational triple extraction is divided into three subtasks:
Entity Extraction. For the given sentence S, this subtask identifies all subjects and objects separately. The outputs of this subtask are the subject set Y_s(S) = {s_i} and the object set Y_o(S) = {o_i}.
Subject-Object Alignment. For the given sentence S and the identified subject set Y_s(S) and object set Y_o(S), this subtask identifies all objects that correspond to the subjects in Y_s(S) and all subjects that correspond to the objects in Y_o(S).
Relation Judgement. For the given sentence S and the aligned entity pairs, this subtask assigns to each entity pair the relations in R that hold between them, yielding the final triple set T(S).

Method
In this section, each component of the BTDM model is elaborated. Figure 2 depicts an overview of BTDM.

BTDM Encoder
The BTDM encoder takes as input a sentence S with n tokens. To obtain a contextual representation h_i for each token, we generated an initial representation with a large-scale pre-trained language model (PLM). We utilized the BERT [20] encoder so that our comparisons were consistent, but alternative encoders, such as GloVe [21] or RoBERTa [22], would work just as well. The detailed operations are as follows:

h = {h_1, h_2, . . . , h_n} = BERT({x_1, x_2, . . . , x_n}),

where d is the embedding dimension, h_i ∈ R^d is the representation of the i-th token in sentence S, and h is the contextual representation. Previous excellent works, such as CasRel [17], SPN [23], and PRGC [19], all used the single text feature h to identify triples. However, we believe that each of the three elements of the triple should have its own set of features.
We obtained a relational embedding and added it to the subject-object alignment stage, which also helps to strengthen the interaction between entity pairs and relations. For this reason, we further extracted subject features, object features, and relation features on the basis of the text features h. Notably, h is fed into three different feed-forward networks (FFNs) to generate the subject, object, and relation features (denoted as h_s, h_o, and h_r, respectively). The detailed operations are as follows:

h_s = FFN_s(h),  h_o = FFN_o(h),  h_r = FFN_r(h).
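The feature-extraction step above can be sketched as follows. This is a minimal NumPy illustration with random values standing in for the PLM output and learned parameters; the actual model uses BERT and trained PyTorch layers, and the ReLU non-linearity is our assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def ffn(x, W, b):
    """One feed-forward projection with a ReLU non-linearity (assumed)."""
    return np.maximum(W @ x + b, 0.0)

d, n = 8, 5   # embedding dimension (BERT-base would use 768) and sentence length

# h: contextual token representations from the PLM; random stand-ins here.
h = rng.standard_normal((d, n))

# Three separate FFNs derive subject, object, and relation features from h.
W_s, b_s = rng.standard_normal((d, d)), rng.standard_normal((d, 1))
W_o, b_o = rng.standard_normal((d, d)), rng.standard_normal((d, 1))
W_r, b_r = rng.standard_normal((d, d)), rng.standard_normal((d, 1))
h_s = ffn(h, W_s, b_s)   # subject features
h_o = ffn(h, W_o, b_o)   # object features
h_r = ffn(h, W_r, b_r)   # relation features

assert h_s.shape == h_o.shape == h_r.shape == (d, n)
```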

BTDM Decoder
In this section, we introduce the BTDM decoder, which consists of three parts corresponding to the three subtasks: entity extraction, subject-object alignment, and relation judgement.

Entity Extraction
Our proposed BTDM is a bidirectional extraction framework. Due to space limitations, we only introduce the details of BTDM in the forward (s2p) direction; the details of the model in the backward (o2p) direction are similar.

The subject tagger module extracts subjects for the s2p direction. Following prior research [17,18], we employed the span-based tagging strategy to effectively identify each subject and its position. In particular, the entity extraction module employs two binary classifiers to identify the start and end positions of subjects by assigning a binary (0/1) tag to each token, indicating whether the current token is the start or the end of a subject. The detailed operations of entity extraction on each token in the sentence are as follows:

p_i^{s_start} = σ(W_start h_i^s + b_start),  p_i^{s_end} = σ(W_end h_i^s + b_end),

where p_i^{s_start} and p_i^{s_end} represent the probability that the i-th token in the sentence is the start or end position of a subject, respectively, and σ denotes the sigmoid activation function. If the probability exceeds a predefined threshold, the current token is assigned Tag 1; otherwise, it is assigned Tag 0.
The process for the o2p direction is similar and is not repeated here. At this stage, we obtain the sets of all possible subjects Y_s(S) and objects Y_o(S), respectively.
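The span-based tagging scheme described above can be sketched as follows in NumPy; the nearest-end pairing heuristic in `decode_spans` is a common convention and an assumption here, not necessarily the paper's exact decoding rule:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tag_entities(h, W_start, b_start, W_end, b_end, threshold=0.5):
    """Span-based tagging: two binary classifiers assign 0/1 start/end tags.

    h is an (n, d) matrix of token features; returns two length-n tag vectors.
    """
    start_tags = (sigmoid(h @ W_start + b_start) > threshold).astype(int)
    end_tags = (sigmoid(h @ W_end + b_end) > threshold).astype(int)
    return start_tags, end_tags

def decode_spans(start_tags, end_tags):
    """Pair each tagged start with the nearest following tagged end."""
    ends = [i for i, t in enumerate(end_tags) if t]
    spans = []
    for s in (i for i, t in enumerate(start_tags) if t):
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans

# Hand-set tags: entities span tokens 0-1 and 3-4.
assert decode_spans([1, 0, 0, 1, 0], [0, 1, 0, 0, 1]) == [(0, 1), (3, 4)]
```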

Subject-Object Alignment
This section only introduces entity pair extraction in the s2p direction. The subtask of subject-object alignment is to identify all possible objects corresponding to the subjects extracted in the previous phase. To do this, we iterated over the detected subject set Y_s(S). We used the attention mechanism to mine the relations among the identified subject, the object features, and the relational features, translating and decoding all possible objects corresponding to each detected subject.
Specifically, we obtained a fused representation of a specific subject, the relation features, and the object features, then used the self-attention mechanism to obtain a selective representation. The detailed operations are as follows:

Attention(Q, K, V) = softmax(QK^T / √d_k) V,

where d_k is the dimension of the attention network, and ∘ denotes the Hadamard product operation used in the fusion. Note that a subject is frequently composed of more than one token; we take the max-pooled vector representation of the k-th subject as v_sub_k.
Next, a binary classifier is used to identify the object, including its start and end positions, in the same tagging form as the entity extraction module, where p_i^{s2p_start} and p_i^{s2p_end} represent the probability that the i-th token in the sentence is the start or end position of the object corresponding to the given subject, respectively.
Finally, the set of entity pairs Y_(s,o)(S) is formed by fusing the two sets of entity pairs identified in the two directions.
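The alignment step can be sketched as follows. This NumPy toy assumes a Hadamard-product fusion of the subject vector with the relation and object features, which is our reading of the operator mentioned above, followed by scaled dot-product self-attention:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, d_k):
    """Scaled dot-product self-attention with x serving as Q, K, and V."""
    return softmax(x @ x.T / np.sqrt(d_k)) @ x

n, d = 6, 8
h_o = rng.standard_normal((n, d))   # object features
h_r = rng.standard_normal((n, d))   # relation features
v_sub = h_o[1:3].max(axis=0)        # max-pooled vector of a multi-token subject

# Fuse the subject vector with relation and object features via the Hadamard
# product (an assumption about the exact fusion), then apply self-attention to
# obtain a selective representation used for object start/end tagging.
fused = v_sub * h_r * h_o           # v_sub broadcasts over all n tokens
selective = self_attention(fused, d_k=d)
assert selective.shape == (n, d)
```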

Relation Judgement
After subject-object alignment, we obtained all candidate entity pairs. As shown in Figure 2, we defined an (entity pair, relation) bipartite graph for relation judgement, which takes as input the projected entity-pair representations E_head and E_tail:

E_head = W_h v_sub_k + b_h,  E_tail = W_t v_obj_k + b_t,
where E_head and E_tail are the d-dimensional representations obtained from v_sub_k and v_obj_k; W_h, W_t ∈ R^{d×d} are two matrices allowing the model to extract the subject and object characteristics of entity pairs; and b_h, b_t ∈ R^{d×1} are two biases. Lastly, for each relation r_k, we predicted whether a pair of entities can form a valid triple:

p_ij^k = σ([E_head; 1]^T U_k [E_tail; 1]),

where U_k ∈ R^{(d+1)×(d+1)} is a relation-specific matrix. If the probability p_ij^k exceeds the predefined threshold, the relational triple (s_i, r_k, o_j) is considered to exist; otherwise, it is not.
Linking a bipartite graph to judge relations can better encode directional information during decoding and has an excellent classification ability; these two benefits help to increase the precision of extraction. On the one hand, the model keeps a matrix for every type of relation, which faithfully represents the traits of that relation; on the other hand, it accurately mines the relations between subjects and objects thanks to its probabilistic scoring architecture.

Joint Training Strategy
Our proposed approach can be seen as three steps with five subtasks. We trained the model jointly, optimizing the combined objective function during training and sharing the BTDM encoder parameters. As described above, our model is a bidirectional extraction model in which all modules in both directions are trained via multi-task learning. The loss of the entity extraction module in the s2p direction is the binary cross-entropy over start and end tags:

L_s = −(1/L) Σ_{t ∈ {start, end}} Σ_{i=1}^{L} [ I{y_i^t = 1} log p_i(t) + I{y_i^t = 0} log(1 − p_i(t)) ],

where L is the length of the sentence, I{z} = 1 if z is true and 0 otherwise, and p_i(t) is the predicted probability of the start or end position label for the i-th token in the sentence; the alignment loss L_{s2p} takes the same form. Similarly, the loss functions L_o and L_{o2p} are defined in the o2p direction.
The subtask of relation judgement minimizes a binary cross-entropy loss L_rel over all entity pairs and relations, where K denotes the number of predefined relation types and y_t is the gold label of the relational triple (s_i, r_k, o_j).
Finally, the overall loss function of BTDM is defined as

L_total = αL_s + βL_{s2p} + γL_o + δL_{o2p} + εL_rel.

The weight of each sub-loss could be carefully tuned to improve performance; however, for simplicity, we simply used identical weights (i.e., α = β = γ = δ = ε = 1).
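The joint objective can be illustrated as follows; the sub-loss names and the random stand-in probabilities and labels are for illustration only, not the model's actual outputs:

```python
import numpy as np

def bce(p, y, eps=1e-12):
    """Binary cross-entropy averaged over tag positions."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

rng = np.random.default_rng(0)
L = 10  # sentence length

# Five sub-losses: entity tagging and alignment in both directions, plus
# relation judgement; probabilities and labels are random stand-ins.
sub_losses = {}
for name in ("s", "s2p", "o", "o2p", "rel"):
    p = rng.uniform(0.01, 0.99, size=L)   # stand-in predicted probabilities
    y = rng.integers(0, 2, size=L)        # stand-in gold 0/1 labels
    sub_losses[name] = bce(p, y)

# Identical weights, i.e. alpha = beta = gamma = delta = epsilon = 1.
total_loss = sum(sub_losses.values())
assert total_loss > 0.0
```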
In addition, to alleviate the exposure bias issue, we used a negative sampling strategy, randomly generating negative samples and adding them to the training sets. At inference time, a data distribution containing negative samples better simulates the true scenario. This method improves the robustness and recognition accuracy of the model in natural settings, so the exposure bias problem is greatly alleviated.

Experiments
We conducted numerous experiments to evaluate and characterize the proposed BTDM. This section first describes the experimental setup; the evaluation results and discussion then follow.

Datasets and Evaluation Metrics
For a fair and comprehensive comparison, we followed [24,25] to evaluate our model on two widely used public datasets: The New York Times (NYT) [26] and WebNLG [27]. The NYT dataset was developed using distant supervision, which automatically aligns relational facts in Freebase with text; it consists of 56 k training sentences and 5 k test sentences. The WebNLG dataset was initially created for the natural language generation (NLG) task, which tries to generate descriptions from given triples; it contains 5 k training sentences and 703 test sentences. There are two versions of each dataset, denoted NYT*, NYT and WebNLG*, WebNLG, respectively. It is worth noting that NYT* and WebNLG* annotate only the last word of entities, whereas NYT and WebNLG annotate the entire entity span. Table 1 describes the statistics of the datasets. Following [17], we further characterized the test sets with respect to the overlapping patterns of relational triples and the number of relational triples per sentence. In keeping with previous studies, a predicted relational triple (s, r, o) is regarded as valid only if it precisely matches the ground truth, i.e., the subject s, object o, and relation r are all identical to the ground truth. We report the standard micro-precision (denoted as Prec.), recall (denoted as Rec.), and F1-score (denoted as F1) for each baseline.
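The exact-match evaluation protocol described above can be implemented directly; the triples below are illustrative:

```python
def prf1(pred, gold):
    """Micro precision/recall/F1 under the exact-match criterion."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)                       # exact (s, r, o) matches
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

gold = [("Stephen Chow", "Nationality", "China"),
        ("China", "Contains", "Hong Kong")]
pred = [("Stephen Chow", "Nationality", "China")]
assert prf1(pred, gold) == (1.0, 0.5, 2 / 3)
```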

Implementation Details
We implemented our model in PyTorch. During training, we adopted the Adam [28] optimizer. The hyperparameters were tuned by grid search on the validation set. The learning rate was set to 1 × 10^−3 / 5 × 10^−5. The batch size was set to eight on NYT* and NYT and six on WebNLG* and WebNLG. We trained the model for 200 epochs and chose the best checkpoint. We used the pre-trained cased BERT-base model (https://huggingface.co/bert-base-cased) as the sentence encoder and set the max sentence length to 100. Our experiments were conducted on a workstation with an Intel Xeon E5 2.20 GHz CPU, 256 GB memory, an NVIDIA Tesla V100 GPU, and Ubuntu 20.04.

Baselines
We compared our model with the following baselines:
• NovelTagging [29] treats the extraction problem as a sequence labeling problem by merging entity and relational roles;
• CopyRE [30] applies a sequence-to-sequence architecture;
• MultiHead [31] uses a multi-head selection technique to identify entities and relations;
• GraphRel [32] uses graph convolutional neural networks to extract entities and their relations;
• OrderCopyRE [33] uses reinforcement learning in a seq-to-seq model to extract triples;
• ETL-span [24] proposes a decomposition-based tagging scheme;
• RSAN [34] proposes a relation-specific attention network to extract entities and relations;
• CasRel [17] applies a cascade binary tagging framework to extract triples;
• TPLinker [25] first finds all token pairs and then tags token links to recognize relations between token pairs;
• TDEER [18] proposes a translating decoding network for relational triple extraction;
• PRGC [19] proposes a framework based on potential relations and global correspondence;
• R-BPtrNet [35] proposes a unified network to extract explicit and implicit relational triples.
For a fair comparison, all baseline results are taken directly from the original literature.

Experimental Results
In this section, we provide the overall results, reported in Table 2, as well as the results in other difficult scenarios.
In addition, we noticed two phenomena. CasRel is a typical one-way extraction framework. Our model BTDM shows a superior performance compared to CasRel, supporting our idea that bidirectional extraction avoids the propagation of errors caused by subject identification failure. Additionally, PRGC is the most recent SOTA model, which first predicts the possible relations in the text; this leads to the omission of some possible relations and thus poorer generalization. Our model BTDM performs better than PRGC, especially on datasets with more relation types. This supports our idea that adding relation features strengthens the interaction between entity pairs and relations, so the model can find the relations between entity pairs more accurately and effectively. Moreover, our bipartite graph-linking approach maintains a relation matrix for each relation when making relation judgements, providing powerful classification and relation-mining capabilities.

Detailed Results on Complex Scenarios
To validate the performance of BTDM in complex scenarios, we used subsets of the NYT* and WebNLG* datasets for complementary experiments, testing the capacity to handle different types of overlapping triples and sentences with varying numbers of triples. Following previous works [17,19,25,34], we divided the overlapping patterns into four categories: Normal, Single Entity Overlap (SEO), Entity Pair Overlap (EPO), and Subject Object Overlap (SOO). The datasets can also be divided into five subsets: N = 1, N = 2, N = 3, N = 4, and N ≥ 5, where N is the number of relational triples in a sentence. We chose five strong models as our baselines, and Table 3 provides the overall outcomes. As shown in Table 3, BTDM has the highest F1-score on thirteen of the eighteen subsets and ranks second on four of the remaining five. In addition, when there are multiple triples in a sentence, BTDM obtains larger performance gains.
The excellent performance of BTDM could be due to two factors: on the one hand, it effectively alleviates the error propagation problem, ensuring the recall and precision of the extraction of entity pairs and further ensures the precision of the extracted relational triples; on the other hand, the model maintains a relation matrix for each relation and applies a specific relation link between each entity pair, which guarantees the recall of the extracted relational triples.This ultimately means that the model has an excellent composite F1 metric in various complex scenarios.
In general, the aforementioned experimental results show that BTDM outperforms the baseline and is more robust in dealing with a variety of complicated scenarios.

Detailed Results on Different Sub-Tasks
The bidirectional translating decoding structure of our model can effectively mitigate error propagation and align entity pairs, and the bipartite graph linking can effectively identify the relations between entity pairs. To further investigate the performance of the model on entity pair alignment and relation identification, we chose PRGC, one of the current state-of-the-art triple-extraction models with an outstanding performance in relation judgement and subject-object alignment, for further comparison experiments on subtasks. The results are shown in Table 4.
In general, BTDM outperforms PRGC on thirteen of the eighteen metrics and achieves a similar performance on the remaining ones. Moreover, BTDM exceeds PRGC in the F1 metric on all subtasks, indicating the overall superiority of the model and the stability of its performance on each subtask. On the relation extraction subtask, we perform about the same as, or even slightly better than, PRGC on NYT*. However, on WebNLG, which has more relation types, the recall is similar to that of PRGC, while we achieve a 3.4% improvement in precision, indicating the superior performance of the relation embedding and bipartite graph-linking approaches in relation extraction.
Lastly, on the entity pair and relation triplet tasks, BTDM leads PRGC overall.The experiments also demonstrate that capturing the dependencies between entity pairs and relations is crucial, which points us in the direction of designing stronger models in future work.

Related Work
The work associated with joint entity and relationship extraction can be divided into three categories.
The first category is tagging-based methods. This type of approach [17-19,24,29,34,36-38] casts the relational triple-extraction task as multiple interrelated sequence labeling problems: it first marks the start and end positions of entities and then identifies the relations between them. For instance, NovelTagging [29] is a typical approach, designing a complex labeling scheme that encodes the start position of the entity, the end position of the entity, and the relation. Several studies employ a sequence-tagging network to identify the positions of entities, then classify the relations using multiple classification networks. The following recent efforts have gained much attention and achieved competitive performance. CasRel [17] treats the relational triple-extraction task as a subject-to-object mapping, first identifying all possible entity heads and then applying a relation-specific sequence tagging network to each entity head to identify the corresponding entity tails. PRGC [19] proposes an extraction method based on relation prediction and global correspondence, which constrains relation classification to a subset of predicted relations, greatly reducing the complexity of considering all relations.
The second category is table-filling-based methods. This type of approach [25,32,39-41] forms a table over the input sentence and identifies the relations between entities by filling the table. Some works keep a table for each relation, with entries representing the start and end positions of the two entities holding that relation. GraphRel [32] uses words as nodes and identifies the relations between word pairs by discriminating between nodes. TPLinker [25] converts the relational triple-extraction task into a linking problem between entity pairs and introduces a handshake-tagging scheme to align the boundaries of entity pairs.
The third category is seq2seq-based methods. This type of approach [2,30,33,35,42] treats a triple as a sequence of tokens. It transforms the triple-extraction task into a generation task, using an encoder-decoder framework to generate the elements of the triple. For instance, CopyRE [30] uses a copy mechanism to extract the hidden relations behind entities and can resolve the triple overlapping issue. R-BPtrNet [35] proposes a unified framework with a binary pointer network, which can extract both explicit and implicit relational triples.

Conclusions and Future Work
In this study, we revisited the relational triple-extraction task from a fresh bidirectional perspective and proposed a new joint relational triple-extraction framework, the bidirectional translating decoding model (BTDM). The bidirectional framework greatly mitigates error propagation; the translating decoding approach handles the overlapping triple problem and efficiently aligns subjects and objects; and the bipartite graph relation judgement fully mines the relations between entity pairs to achieve accurate and efficient relation judgement. The experimental results show that BTDM exhibits SOTA performance on several public datasets and performs equally well in different complex scenarios.
For future work, we hope to investigate more effective approaches to the error propagation problem.We would like to investigate the use of triple classification in additional types of information-extraction challenges, such as document-level relation extraction and event extraction.

Figure 1 .
Figure 1. An example of single-directional and bidirectional extraction, as well as the bipartite graph that links entity pairs and relations.

Figure 2 .
Figure 2. The structure of our proposed approach, displaying the processing of one sentence that contains three relational triples (Jackie Chan, born in, Beijing), (Jackie Chan, nationality, China), and (Beijing, capital of, China). In this example, BTDM first identifies entity pairs by translating and decoding in both directions and then aligns them. Then, the relations are judged by linking an (entity pair, relation) bipartite graph. Note that all invalid links have been removed for the sake of clarity.
The outputs of this subtask are two entity pair sets Y_s2p(S, s_i | s_i ∈ Y_s(S)) = {(s_i, o_j)} and Y_o2p(S, o_i | o_i ∈ Y_o(S)) = {(s_j, o_i)}. Then, the model obtains the final entity pair set Y_(s,o)(S) = {(s, o)} by aligning the two sets.

Table 1 .
Statistics of four datasets used in our experiments, where N is the number of relational triples in a sentence.We divided the overlapping patterns into four categories: Normal, Single Entity Overlap (SEO), Entity Pair Overlap (EPO), and Subject Object Overlap (SOO).

Table 2 .
Comparison (%) of the proposed BTDM method with the baselines. Bold marks the highest score and underline marks the second-best score.

Table 3 .
F1-score (%) of relational triple-extraction with different overlapping patterns.Note that † marks results reported by PRGC, and Bold marks the highest score in the table.

Table 4 .
Results on different subtasks.(s, o) denotes the subtask of the entity pair, r means the subtask of relation, and (s, r, o) means the task of relational triples.