A Study on Double-Headed Entities and Relations Prediction Framework for Joint Triple Extraction

Relational triple extraction is a fundamental step in natural language processing and knowledge graph construction and plays a central role in information extraction research. In this paper, we propose a Double-Headed Entities and Relations Prediction (DERP) framework, which divides entity recognition into two stages, head entity recognition and tail entity recognition, and uses the recognized head and tail entities as inputs to the subsequent stages. Using the corresponding relation and the corresponding entity, the DERP framework further incorporates a triple prediction module to improve the accuracy and completeness of joint relational triple extraction. We conducted experiments on two English datasets, NYT and WebNLG, and two Chinese datasets, DuIE2.0 and CMeIE-V2, and compared the results on the English datasets with those of ten baseline models. The experimental results demonstrate the effectiveness of the proposed DERP framework for triple extraction.


Introduction
With the development of natural language processing and knowledge graphs, methods for storing and presenting structured text have matured, but many problems remain unsolved in the processing of unstructured and semi-structured text [1]. Triple extraction is crucial in natural language processing and knowledge graph construction. When a knowledge graph is constructed, entities are usually extracted from unstructured text and linked to one another as (head entity, relation, tail entity) triples.
Existing triple extraction methods fall into two major kinds: pipeline extraction methods and joint extraction methods. Traditional pipeline extraction methods divide knowledge extraction into two subtasks [2]: named entity recognition and relation extraction. However, this approach ignores potential information interactions between entities and relations, which leads to incorrect relation extraction or failure to recognize relations between entities. Many previous experiments have demonstrated that joint learning greatly improves entity and relation extraction because it considers the information interactions between the two subtasks, so most current research on entity and relation extraction adopts the joint learning approach.
Recent research has paid increasing attention to overlapping triples, as shown in Figure 1. A single sentence may contain both entity pair overlap (EPO) triples and single entity overlap (SEO) triples. Previous research has revealed several shortcomings in extracting multiple relations (overlapping triples) that involve the same entity. For example, the NovelTagging method treats joint entity and relation extraction as a sequence tagging problem [3]; however, it assigns only a single label to each token, so it cannot handle overlapping triples. In contrast, the CasRel framework models relations as functions that map subjects to objects [4], which overcomes the poor handling of overlapping triples in previous models. Nevertheless, when the CasRel framework identifies a head entity incorrectly, it also fails to identify the relation and the tail entity. An overview of the CasRel framework structure is shown in Figure 2.
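The distinction between Normal, EPO, and SEO triples can be made concrete with a small sketch. The function name `overlap_type` and the example triples are hypothetical and chosen only for illustration; the rule follows the definitions in the text (EPO: another triple shares the same entity pair; SEO: another triple shares exactly one entity).

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def overlap_type(triple: Triple, others: List[Triple]) -> str:
    """Classify one triple as 'EPO', 'SEO', or 'Normal' within its sentence."""
    head, _, tail = triple
    pair = {head, tail}
    shares_entity = False
    for other in others:
        if other == triple:
            continue  # do not compare the triple with itself
        other_pair = {other[0], other[2]}
        if other_pair == pair:
            return "EPO"  # same entity pair, another relation
        if pair & other_pair:
            shares_entity = True  # one shared entity only
    return "SEO" if shares_entity else "Normal"


# Hypothetical triples extracted from one sentence.
triples = [
    ("A", "born_in", "B"),
    ("A", "lives_in", "B"),
    ("A", "works_for", "C"),
    ("D", "located_in", "E"),
]
labels = [overlap_type(t, triples) for t in triples]
```

A triple can satisfy both definitions at once; this sketch reports EPO in that case, which is a design choice rather than something the paper specifies.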
In this study, a head entity recognition module is used to predict the triples related to the head entity, and a tail entity recognition module is added to predict the triples related to the tail entity. Combining the information from the two modules yields triples of higher accuracy. Experimental results show that the performance of the framework is improved by incorporating the BERT encoder. This work contributes as follows:
1. A double-headed entities and relations prediction framework for joint triple extraction based on the BERT encoder is proposed. The named entity recognition task is decomposed into head entity recognition and tail entity recognition.
2. To ensure recognition accuracy, a triple prediction module is set up that gives different weights to the triples derived from head entity recognition and the triples derived from tail entity recognition, which improves the accuracy of triple extraction.
3. To validate the method, experiments were conducted on two public English datasets, NYT and WebNLG, and two Chinese datasets, DuIE2.0 and CMeIE-V2, and the proposed framework was compared with ten baselines.

Related Work
In recent years, many knowledge extraction methods have been proposed. Based on the learning process, they can be categorized into pipeline extraction methods and joint learning methods.

Pipeline Extraction Methods
Pipeline extraction methods usually consist of an entity recognition stage and a relation extraction stage, where the output of the previous stage becomes the input of the next stage. This approach has the advantage that a dedicated model handles each task, but errors may accumulate from stage to stage.
The primary objective of named entity recognition (NER) is to identify and classify named entities with specific meanings within text, such as people, places, times, and purposes. It is mainly responsible for automatically extracting the entities that form the basic elements of the knowledge graph from unstructured and semi-structured text. To maintain the quality of the knowledge graph, the extracted entities must be both precise and comprehensive. Li et al. proposed a meta-learning method that integrates distributed systems with a meta-learning approach to extract relations among Chinese entities [5]. Using machine learning and neural network methods, in particular the attention mechanism from natural language processing, Li et al. proposed a combination of conditional random fields (CRF) and bidirectional long short-term memory (BiLSTM) for extracting information in a mathematical language [6]. Luo et al. introduced an attention-based bidirectional long short-term memory network with a conditional random field layer (Att-BiLSTM-CRF) for document-level chemical entity recognition [7]. Li et al. advocated the use of distinct layers, specifically long short-term memory (LSTM) for text feature extraction and a conditional random field (CRF) for label prediction decoding [8]. Ren proposed a method that enhances entity recognition by transforming text into a vector representation that combines contextual and global features through a pretrained model and a graph convolutional network (GCN) [9].
Relation extraction refers to extracting the relations that connect entities from unstructured and semi-structured text. The mesh structure of the knowledge graph is similar to the structure in which the brain stores knowledge. Neurons represent entities and record basic information, and the process of extracting relations activates some of the neurons (entities) and adds them to the brain structure (knowledge graph), using relations to connect the entities to the whole knowledge graph. Zeng

Joint Learning Methods
In pipeline methods of relation extraction, the intrinsic connection between entities and relations is often overlooked, and the joint model is an effective solution to this problem. Huang et al. suggested using soft label embedding as an effective means to facilitate information exchange between entity recognition and relation extraction [15]. Wei et al. proposed a novel cascade binary tagging framework (CasRel), which models relations as functions that map subjects to objects [4]. Liu et al. introduced an attention-based joint model, consisting mainly of an entity extraction module and a relation detection module, to address the prevailing challenges [16]. Yu et al. decomposed the extraction task into two interrelated subtasks: one subtask handles the head entities, and the other handles the tail entities related to the head entities and their respective relations [17]. Guo et al. introduced a joint model for extracting entities and relations of cybersecurity concepts (CyberRel) [18], in which a triple is regarded as a sequence of entity relations. Subsequently, Lv et  relational tags into sentence embeddings, which are used to distinguish the importance of relational tags for each word [21]. Huang et al. proposed a novel translation-based unified framework that addresses redundant predictions, overlapping triples, and relation connection problems [22]. Liu et al. presented a model referred to as the bidirectional encoder representation from transformers-multiple convolutional neural network (BERT-MCNN), which has demonstrated a high level of accuracy and stability [23].

CasRel Framework
The goal of triple extraction is to identify all possible (head entity, relation, tail entity) triples in a sentence, some of which may overlap and share entities. The structure of the CasRel framework is shown in Figure 2. CasRel is a cascade binary tagging framework that addresses overlapping relations by systematically establishing subject-object mappings within sentences [4]. The framework consists of a set of functions, implemented as an entity tagger and relation-specific object taggers, that identify entities and their related relations. CasRel thereby handles the case in which multiple triples share the same entity, providing multiple related relations and corresponding entities for each entity. However, if the subject tagger in the CasRel framework does not recognize an entity, the associated triple is missed.
To address the triple omissions that occur in the CasRel framework, we propose an improved framework, DERP, based on CasRel. It improves entity recognition accuracy by adding a tail entity recognition module to the entity tagger and adds a triple prediction module after the relation-specific object taggers. The framework combines head entities, tail entities, and relations to make predictions and produce more accurate triples.

The DERP Framework
Entity recognition and relation extraction are the design priorities for triple extraction. The primary objective of the DERP framework is to find the complete set of possible triples in a given sentence, including cases in which triples share entities.
The final (head entity, relation, tail entity) triples are predicted in the triple prediction layer from the triples obtained in the preceding layers. The DERP framework is shown in Figure 3. In the DERP framework, we model relations as functions that map subjects to objects. Instead of learning relation classifiers $f(s, o) \rightarrow r$, as was common previously, we learn relation-specific taggers $f_r(s) \rightarrow o$. Each tagger identifies the entities that may exist under a specific relation, or returns no entity. If no entity is returned, there is no triple for the current entity and relation.
When dealing with overlapping triples, the DERP framework uses an entity tagger for entity recognition and allows multiple relations to be represented in the relation-specific entity taggers. Within the relation-specific entity taggers, multiple relations and their corresponding entities can be obtained. The DERP framework can therefore handle different types of triples effectively, including EPO triples and SEO triples.
At the beginning of framework development, we used an entity tagger to identify head entities only and used the identified head entities to find the related relations and tail entities. The experiments showed that if the entity tagger misses a head entity, the corresponding triple is missed in the final triple prediction, especially for overlapping triples in which one head entity corresponds to more than one tail entity. In other cases, only some of the tail entities related to a head entity are missed during triple extraction; these missing tail entities can be recovered by adding a tail entity recognition module to the entity tagger. Thus, by adding a tail entity recognition module to the entity tagger and by looking up the corresponding relation and the other matching entity in the relation-specific entity taggers, two matching entities and an accurate relation between them are obtained.
During the experiments, we added the tail entity recognition module as an improvement over the previous model. If the probability of recognizing the correct triple with the head entity module alone is $P_h$ and the probability of recognizing it with the tail entity module alone is $P_t$, combining the two entity modules increases the probability of finally recognizing the correct triple according to the following equation:

$$P = P_h + P_t - P_{h \cap t}$$

where $P$ is the probability of obtaining the correct triple, $P_h$ is the probability of obtaining the correct triple using only the head entity recognition module, $P_t$ is the probability of obtaining the correct triple using only the tail entity recognition module, and $P_{h \cap t}$ is the probability of duplicate triples obtained by both the head entity recognition module and the tail entity recognition module.
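The combination rule above is the inclusion-exclusion principle for two events. A minimal numeric sketch follows; the function name and the probability values are hypothetical and are not measurements from the paper.

```python
def combined_probability(p_head: float, p_tail: float, p_both: float) -> float:
    """Probability that at least one module finds the correct triple.

    p_both is the probability that both modules find it (the duplicates),
    so it cannot exceed either single-module probability.
    """
    if not 0.0 <= p_both <= min(p_head, p_tail):
        raise ValueError("p_both must lie between 0 and min(p_head, p_tail)")
    return p_head + p_tail - p_both


# Hypothetical values: the head module alone finds 85% of the correct triples,
# the tail module alone finds 80%, and 75% are found by both.
p = combined_probability(0.85, 0.80, 0.75)
```

Because the duplicate term is bounded by the smaller single-module probability, the combined value is never lower than the better of the two modules, which is the motivation for adding the tail entity module.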

BERT Encoder
BERT mainly consists of N layers of Transformer blocks. The BERT encoder extracts feature information from sentence S and feeds it into the entity tagger.

$$h_0 = O W_s + W_p$$

where $O$ is the matrix of one-hot vectors indexed over the input sentence, $W_s$ is the word embedding matrix, $W_p$ is the positional embedding matrix, $p$ in $W_p$ denotes the positional index in the input sequence, and  is the i-th relation type embedding.
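The input representation described here (a one-hot matrix multiplied by a word embedding matrix, plus a positional embedding) can be sketched in plain Python. The helper names and the tiny matrices are hypothetical; a real BERT encoder uses learned embeddings of much higher dimension and then applies the N Transformer layers.

```python
from typing import List

Matrix = List[List[float]]


def one_hot(token_ids: List[int], vocab_size: int) -> Matrix:
    """Build one one-hot row per token."""
    return [[1.0 if j == t else 0.0 for j in range(vocab_size)] for t in token_ids]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def input_representation(token_ids: List[int], w_s: Matrix, w_p: Matrix) -> Matrix:
    """Compute O * W_s + W_p, where row i of W_p is the embedding of position i."""
    token_part = matmul(one_hot(token_ids, len(w_s)), w_s)
    return [
        [t + p for t, p in zip(token_row, w_p[pos])]
        for pos, token_row in enumerate(token_part)
    ]


# Hypothetical vocabulary of 3 tokens with 2-dimensional embeddings.
w_s = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
w_p = [[0.1, 0.1], [0.2, 0.2]]
h0 = input_representation([2, 0], w_s, w_p)
```

Multiplying a one-hot row by the embedding matrix simply selects one row of that matrix, which is why practical implementations use a table lookup instead of a matrix product.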

Entity Tagger
Unlike the CasRel framework, the entity tagger divides entity recognition into head entity recognition and tail entity recognition, which reduces the number of triples missed because of omissions in the first stage of entity recognition and also improves the accuracy of extracting overlapping triples [24].
The BERT-encoded sentence is fed into the entity tagger, which extracts head and tail entities by binary tagging.
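Within the entity tagger, entity positions are identified in the sentence encoded by the BERT encoder. Two binary classifiers are designed to detect the start and end positions of entity words: if the predicted probability of a token exceeds the designated threshold, the token is marked as 1; otherwise, it is marked as 0. The head entity recognizer and tail entity recognizer are defined as follows:

$$p_i^{\mathrm{HE\_start}} = \sigma\left(W_{\mathrm{HE\_start}}\, x_i + b_{\mathrm{HE\_start}}\right)$$

$$p_i^{\mathrm{HE\_end}} = \sigma\left(W_{\mathrm{HE\_end}}\, x_i + b_{\mathrm{HE\_end}}\right)$$

$$p_i^{\mathrm{TE\_start}} = \sigma\left(W_{\mathrm{TE\_start}}\, x_i + b_{\mathrm{TE\_start}}\right)$$

$$p_i^{\mathrm{TE\_end}} = \sigma\left(W_{\mathrm{TE\_end}}\, x_i + b_{\mathrm{TE\_end}}\right)$$

where $p_i^{\mathrm{HE\_start}}$, $p_i^{\mathrm{HE\_end}}$, $p_i^{\mathrm{TE\_start}}$, and $p_i^{\mathrm{TE\_end}}$ are the probabilities that the i-th token is predicted to be the start or end position of a head entity or a tail entity, $x_i$ denotes the i-th token in sentence S, $W_{\mathrm{HE\_start}}$, $W_{\mathrm{HE\_end}}$, $W_{\mathrm{TE\_start}}$, and $W_{\mathrm{TE\_end}}$ denote the trainable weights of the head entities and tail entities, $b_{\mathrm{HE\_start}}$, $b_{\mathrm{HE\_end}}$, $b_{\mathrm{TE\_start}}$, and $b_{\mathrm{TE\_end}}$ denote the biases of the head entities and tail entities, and $\sigma$ is the sigmoid function. When the model is used, the start binary classifier and the end binary classifier must have the same dimensions.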
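The binary tagging step can be sketched as follows. This is a minimal illustration under stated assumptions: the classifier is taken to be a sigmoid over a linear score, the threshold defaults to 0.5, and starts are paired with the nearest following end, which is the heuristic used by CasRel and is assumed here rather than confirmed for DERP. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def tag_positions(token_vectors: List[List[float]], weights: List[float],
                  bias: float, threshold: float = 0.5) -> List[int]:
    """Mark a token as 1 if its predicted probability exceeds the threshold."""
    tags = []
    for x in token_vectors:
        score = sum(w * v for w, v in zip(weights, x)) + bias
        tags.append(1 if sigmoid(score) > threshold else 0)
    return tags


def decode_spans(start_tags: List[int], end_tags: List[int]) -> List[Tuple[int, int]]:
    """Pair every start position with the nearest end position at or after it."""
    spans = []
    for i, start in enumerate(start_tags):
        if start != 1:
            continue
        for j in range(i, len(end_tags)):
            if end_tags[j] == 1:
                spans.append((i, j))
                break
    return spans


# Hypothetical 1-dimensional token vectors and classifier parameters.
start_tags = tag_positions([[2.0], [-2.0]], weights=[1.0], bias=0.0)
spans = decode_spans([1, 0, 0, 1, 0], [0, 1, 0, 1, 0])
```

In DERP the same procedure would run twice over the encoded sentence, once with the head entity classifiers and once with the tail entity classifiers, each with its own weights and bias.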
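The entity recognition module uses the following likelihood function to identify entity spans in the sentence representation $x$ produced by the encoder:

$$p_{\theta}(e \mid x) = \prod_{t \in \mathcal{T}} \prod_{i=1}^{L} \left(p_i^{t}\right)^{I\{y_i^{t}=1\}} \left(1 - p_i^{t}\right)^{I\{y_i^{t}=0\}}$$

where $\mathcal{T} = \{\mathrm{HE\_start}, \mathrm{HE\_end}\}$ for head entities and $\mathcal{T} = \{\mathrm{TE\_start}, \mathrm{TE\_end}\}$ for tail entities, $L$ is the length of the sentence, $I\{z\} = 1$ if $z$ is true and 0 otherwise, and $y_i^{\mathrm{HE\_start}}$, $y_i^{\mathrm{HE\_end}}$, $y_i^{\mathrm{TE\_start}}$, and $y_i^{\mathrm{TE\_end}}$ are the i-th tags in the sequences that mark the start position and the end position.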
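A likelihood of this kind is usually evaluated in log space, where the product over tokens becomes a sum. The sketch below assumes the standard Bernoulli form for binary tags (probability p for tag 1 and 1 - p for tag 0); the function name and the example values are hypothetical.

```python
import math
from typing import List


def tagging_log_likelihood(probs: List[float], tags: List[int]) -> float:
    """Sum log p_i where the gold tag is 1 and log(1 - p_i) where it is 0."""
    total = 0.0
    for p, y in zip(probs, tags):
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Hypothetical start-position probabilities for a two-token sentence
# whose first token is a gold entity start.
log_likelihood = tagging_log_likelihood([0.9, 0.2], [1, 0])
```

Maximizing this quantity over the training set is equivalent to minimizing binary cross-entropy, summed over the start and end taggers.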

Relation-Specific Entity Taggers
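In the relation-specific entity taggers, one entity tagger is assigned to each relation. Each relation is paired with the head entity or tail entity extracted in the previous layer in order to extract the entity that satisfies the relation. The calculations are as follows:

$$p_i^{\mathrm{HR\_start}} = \sigma\left(W_{\mathrm{HR\_start}}\left(x_i + v^{k}\right) + b_{\mathrm{HR\_start}}\right)$$

$$p_i^{\mathrm{HR\_end}} = \sigma\left(W_{\mathrm{HR\_end}}\left(x_i + v^{k}\right) + b_{\mathrm{HR\_end}}\right)$$

$$p_i^{\mathrm{TR\_start}} = \sigma\left(W_{\mathrm{TR\_start}}\left(x_i + v^{k}\right) + b_{\mathrm{TR\_start}}\right)$$

$$p_i^{\mathrm{TR\_end}} = \sigma\left(W_{\mathrm{TR\_end}}\left(x_i + v^{k}\right) + b_{\mathrm{TR\_end}}\right)$$

where $p_i^{\mathrm{HR\_start}}$, $p_i^{\mathrm{HR\_end}}$, $p_i^{\mathrm{TR\_start}}$, and $p_i^{\mathrm{TR\_end}}$ are the probabilities that the token at the labeled position is predicted to be the start position or end position of an entity, $v^{k}$ is the encoded representation vector of the k-th subject detected in the entity tagger, $W_{\mathrm{HR\_start}}$, $W_{\mathrm{HR\_end}}$, $W_{\mathrm{TR\_start}}$, and $W_{\mathrm{TR\_end}}$ denote the trainable weights of the head entities and tail entities, and $b_{\mathrm{HR\_start}}$, $b_{\mathrm{HR\_end}}$, $b_{\mathrm{TR\_start}}$, and $b_{\mathrm{TR\_end}}$ denote the biases of the head entities and tail entities.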
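The idea of one tagger per relation, conditioned on an already detected entity, can be sketched as below. The fusion of token and entity information by vector addition follows CasRel and is an assumption for DERP; the function names, relation names, and parameter values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def relation_specific_tags(
    token_vectors: List[List[float]],
    entity_vector: List[float],
    relation_params: Dict[str, Tuple[List[float], float]],
    threshold: float = 0.5,
) -> Dict[str, List[int]]:
    """For each relation, tag tokens after adding the detected entity's vector."""
    result = {}
    for relation, (weights, bias) in relation_params.items():
        tags = []
        for x in token_vectors:
            fused = [xi + vi for xi, vi in zip(x, entity_vector)]
            score = sum(w * f for w, f in zip(weights, fused)) + bias
            tags.append(1 if sigmoid(score) > threshold else 0)
        result[relation] = tags
    return result


# Hypothetical parameters for the start-position tagger of two relations.
params = {"born_in": ([1.0], 0.0), "capital_of": ([1.0], -5.0)}
tags = relation_specific_tags([[1.0], [-3.0]], [1.0], params)
```

A relation whose tag sequence is all zeros returns no entity, which corresponds to the case in which no triple exists for the current entity and relation.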
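The relation-specific entity taggers use the following likelihood function to identify entity spans in the sentence representation $x$ produced by the encoder, given a detected entity $e$ and a relation $r$:

$$p_{\phi_r}(o \mid e, x) = \prod_{t \in \mathcal{T}} \prod_{i=1}^{L} \left(p_i^{t}\right)^{I\{y_i^{t}=1\}} \left(1 - p_i^{t}\right)^{I\{y_i^{t}=0\}}$$

where $\mathcal{T} = \{\mathrm{HR\_start}, \mathrm{HR\_end}\}$ or $\mathcal{T} = \{\mathrm{TR\_start}, \mathrm{TR\_end}\}$, $L$ is the length of the sentence, $I\{z\} = 1$ if $z$ is true and 0 otherwise, and $y_i^{\mathrm{HR\_start}}$, $y_i^{\mathrm{HR\_end}}$, $y_i^{\mathrm{TR\_start}}$, and $y_i^{\mathrm{TR\_end}}$ are the i-th tags in the sequences that mark the start position and the end position.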

Triple Prediction
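The relation-specific entity taggers identify the head entity, the tail entity, and the corresponding relations, and entity relation prediction is then used to match them with the head entities and tail entities identified in the entity tagger. Each predicted probability is compared with its threshold:

$$y_i^{t} = \begin{cases} 1, & p_i^{t} > \theta_{t} \\ 0, & \text{otherwise} \end{cases} \qquad t \in \{\mathrm{HR\_start}, \mathrm{HR\_end}, \mathrm{TR\_start}, \mathrm{TR\_end}\}$$

When $y_i^{\mathrm{HR\_start}}$, $y_i^{\mathrm{HR\_end}}$, $y_i^{\mathrm{TR\_start}}$, and $y_i^{\mathrm{TR\_end}}$ are equal to 1, the head entity or tail entity that corresponds to the entity extracted in the entity tagger is obtained together with the corresponding relation; if the value is equal to 0, the triple is excluded. $\theta_{\mathrm{HR\_start}}$, $\theta_{\mathrm{HR\_end}}$, $\theta_{\mathrm{TR\_start}}$, and $\theta_{\mathrm{TR\_end}}$ are the set thresholds.
The triples obtained in the two directions are then combined into the final prediction, where $T_{\mathrm{HE}}$ represents the triples formed from the tail entity and the relation obtained based on the head entity, $T_{\mathrm{TE}}$ represents the triples formed from the head entity and the relation obtained based on the tail entity, and $T$ denotes the final predicted triples.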
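One simple way to combine the two directions is a union that drops duplicates, which matches the inclusion-exclusion argument given earlier in the paper. The sketch below makes that assumption explicit; the weighting of head-based and tail-based triples that the paper mentions is not reproduced here, and the function name and example triples are hypothetical.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def merge_triples(head_based: Iterable[Triple], tail_based: Iterable[Triple]) -> List[Triple]:
    """Union of the two triple sets, keeping first-seen order and no duplicates."""
    merged: List[Triple] = []
    seen = set()
    for triple in list(head_based) + list(tail_based):
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged


# Hypothetical outputs of the two directions for one sentence.
from_head = [("A", "born_in", "B"), ("A", "works_for", "C")]
from_tail = [("A", "born_in", "B"), ("D", "located_in", "B")]
final_triples = merge_triples(from_head, from_tail)
```

In this example the tail-based direction recovers a triple whose head entity was missed by the head-based direction, while the shared triple is counted only once.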

Experiments
The effectiveness of the proposed framework is validated with experiments. The datasets and evaluation metrics are introduced first, and the model is then compared with different baseline models.

Experiment Setup and Experiment Description
As most previous studies conducted experiments on English datasets, this study used two publicly available English datasets, NYT [25] and WebNLG [26], and compared the results with 10 baseline models. Owing to the characteristics of the Chinese language, Chinese triple extraction is considerably more complex and difficult than English triple extraction [27]. We therefore also used two Chinese datasets, DuIE2.0 [28] and CMeIE-V2 [29]. DuIE2.0 is the most comprehensive Chinese relation extraction dataset in the industry [30]. CMeIE-V2 is a Chinese medical information extraction dataset designed for pediatrics that covers more than a hundred common diseases.

This model performs head entity recognition and tail entity recognition in the entity recognition part and performs the corresponding triple extraction based on the recognition results. In the experiments, the head entity recognition model and the tail entity recognition model are also used individually in comparison experiments to verify the reliability and validity of the results. The schematic diagram of the head entity recognition module and the tail entity recognition module is shown in Figure 4.
The DERP framework is implemented in TensorFlow. For the BERT encoder, the cased_L-12_H-768_A-12 model is used on the English datasets and the RoBERTa model on the Chinese datasets. Dropout with a rate of 0.1 is applied to word embeddings and hidden states. Network weights are optimized with Adam. The learning rate is set to 1e-5, the maximum length of the input sentence to 100, and the batch size to 6. We train for 100 epochs and select the model that performs best on the validation set to produce the results on the test set.
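The training setup described above can be collected in a small configuration object. This is only an illustrative sketch: the class and field names are hypothetical, and the learning rate of 1e-5 is an assumed reading of the reported value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters reported for DERP; field names are illustrative."""
    english_encoder: str = "cased_L-12_H-768_A-12"
    chinese_encoder: str = "RoBERTa"  # the text names no specific checkpoint
    dropout_rate: float = 0.1         # applied to word embeddings and hidden states
    optimizer: str = "Adam"
    learning_rate: float = 1e-5       # assumption: read as 1e-5
    max_sentence_length: int = 100
    batch_size: int = 6
    epochs: int = 100                 # best validation model is kept


config = TrainingConfig()
```

Freezing the dataclass keeps one run's settings immutable, which makes it easier to reproduce the comparison with CasRel under identical conditions.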
To remain consistent with prior research, an extracted triple is regarded as correct only if the head entity, the relation, and the tail entity are all correct. In line with the established baselines, we report the standard metrics micro-precision (Prec.), recall (Rec.), and F1 score (F1).
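The exact-match criterion and the micro-averaged metrics can be sketched as follows. The function name and the toy predictions are hypothetical; the definitions of precision, recall, and F1 are the standard ones.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def micro_prf(predicted: List[Set[Triple]], gold: List[Set[Triple]]) -> Tuple[float, float, float]:
    """Micro precision, recall, and F1 over per-sentence triple sets.

    A predicted triple counts as correct only if head, relation, and tail all match.
    """
    correct = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical predictions and gold triples for two sentences.
predicted = [{("A", "r", "B"), ("A", "r", "C")}, {("D", "r", "E")}]
gold = [{("A", "r", "B")}, {("D", "r", "E"), ("D", "q", "F")}]
precision, recall, f1 = micro_prf(predicted, gold)
```

Micro-averaging pools the counts over all sentences before dividing, so sentences with many triples contribute more than sentences with few.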
Unless otherwise noted, the results of these baseline models were taken from the original papers.

Results
Table 1 compares our model with the baselines on entity and relation extraction for the two English datasets. On the WebNLG dataset, DERP outperformed all baselines in both recall and F1 score, and on the NYT dataset, DERP achieved the second highest F1 score. These results directly validate the utility of the proposed DERP framework. Table 2 shows the experimental results of DERP on the DuIE2.0 and CMeIE-V2 datasets, where DERP improves on CasRel in terms of F1 score. The F1 score of DERP_HeadEntity is also higher than that of CasRel. We ran CasRel under the same experimental conditions as the DERP framework and denote this replication as CasRel*. On the NYT dataset, CasRel* achieved a precision of 88.87%, a recall of 90.34%, and an F1 score of 89.60%; on the WebNLG dataset, CasRel* achieved a precision of 91.92%, a recall of 91.39%, and an F1 score of 91.65%. Compared with the replicated CasRel* framework, DERP improves the F1 score by 1.38 percentage points on the NYT dataset, 1.21 percentage points on the WebNLG dataset, 0.6 percentage points on the DuIE2.0 dataset, and 1.98 percentage points on the CMeIE-V2 dataset. In the experiments on the four datasets NYT, WebNLG, DuIE2.0, and CMeIE-V2 that use head entity recognition or tail entity recognition alone for triple prediction, DERP_HeadEntity has higher precision, recall, and F1 score than the original CasRel model. In the tail-entity-only experiment, the features of the tail entity are not as easy to recognize as those of the head entity, which results in lower F1 scores than DERP_HeadEntity on the four datasets.
Table 1 also shows a clear performance gap between the compared models on the two English datasets, which indicates that DERP performs better in dealing with redundant entities and overlapping triples. The comparison experiments on the four datasets NYT, WebNLG, DuIE2.0, and CMeIE-V2 demonstrate that dividing entity recognition into head entity recognition and tail entity recognition, as in the DERP framework, effectively improves the accuracy of entity recognition and produces more accurate results in relation extraction and triple prediction.
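The reported CasRel* figures can be checked for internal consistency, since the F1 score is the harmonic mean of precision and recall. The helper name below is hypothetical; the input values are the ones stated in the text.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, here %)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the replicated CasRel* as reported in the text.
casrel_nyt_f1 = f1_from_pr(88.87, 90.34)     # reported F1: 89.60
casrel_webnlg_f1 = f1_from_pr(91.92, 91.39)  # reported F1: 91.65
```

Both values agree with the reported F1 scores after rounding to two decimal places.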

Conclusions
In this study, a double-headed entities and relations prediction framework for joint triple extraction is proposed. The entity recognition part is decomposed into head entity recognition and tail entity recognition. Specifically, relation prediction and tail entity recognition are executed for the head entities, and in parallel, relation prediction and head entity recognition are performed for the tail entities. In addition, a triple prediction module is designed to address the entity overlapping problem in previous joint triple extraction methods. We conducted experiments on four datasets and compared the framework with ten baseline models. Joint triple extraction provides a good foundation for subsequent natural language processing and knowledge graph construction work. The experimental results show that the framework introduced in this paper achieves improvements over prior models.
In the DERP framework, we have only addressed the case of missing triples, and in future work, we will study the case of erroneous triples. We will also conduct research on

Figure 1. Normal, entity pair overlap (EPO) triple, and single entity overlap (SEO) triple cases. In each example, overlapping entities are marked with the same color.

Figure 2. Overview of the CasRel framework structure.
et al. analyzed the pivotal role played by the order of relation extraction and employed reinforcement learning techniques to improve the efficiency of relation extraction [10]. Han et al. proposed a one-pass model based on BERT that predicts entity relations by processing the text in a single pass [11]. Chen et al. utilized a neutralized feature engineering approach for entity relation extraction, namely, enhancing neural networks with manually designed features [12]. Yuan et al. proposed a relation-aware attention network to construct relation-specific sentence representations [13]. Wan et al. proposed a span-based multi-modal attention network (SMAN) for joint entity and relation extraction [14].
al. constructed a model for the joint extraction of entity mentions and relations based on bidirectional long short-term memory and the maximum entropy Markov model (Bi-MEMM) [19]. Zheng et al. introduced a joint framework for extracting relational triples based on potential relation and global correspondence (PRGC) [20]. Li et al. proposed a relation-aware embedding mechanism (RA) for relation extraction, with attention mechanisms being used to merge

Figure 3. The architecture of the proposed DERP framework. In the framework, the start and end positions of predicted entities and relations are color-marked, with entities belonging to the same group marked with the same color.

Figure 4. (a) Schematic diagram of the head entity recognition module. (b) Schematic diagram of the tail entity recognition module.
Chinese text triple extraction to study the special characteristics of Chinese text triple extraction and improve the accuracy and effectiveness of Chinese text triple extraction.

Author Contributions: Conceptualization, Y.X. and G.C.; methodology, Y.X.; software, Y.X. and C.D.; validation, Y.Y., L.L., and J.Z.; formal analysis, J.L.; investigation, Y.X.; resources, Y.X.; data curation, L.L.; writing-original draft preparation, Y.X.; writing-review and editing, G.C.; visualization, C.D.; supervision, C.D.; project administration, Y.Y.; funding acquisition, G.C. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by cooperative projects between universities in Chongqing and the Chinese Academy of Sciences, grant number HZ2021015; the Chongqing Technolo-General Project of the Chongqing Municipal Science and Technology Commission, grant number cstc2021jcyjmsxm3332; the Sichuan Science and Technology Program 2023JDRC0033; the Young Project of Science and Technology Research Program of the Chongqing Education Commission of China, number KJQN202001513 and number KJQN202101501; the Luzhou Science and Technology Program 2021-JYJ-92; the Chongqing Postgraduate Scientific Research Innovation Project, grant number CYS23752; and the Chongqing University of Science and Technology Master and Doctoral Student Innovation Project, grant number YKJCX2120811.

Table 1. Precision (%), recall (%) and F1 score (%) of the compared models on the NYT and WebNLG datasets. * marks results quoted directly from the original papers.