LAI: Label Annotation Interaction-Based Representation Enhancement for End-to-End Relation Extraction

1 Information Support Force Engineering University, Wuhan 430001, China
2 State Key Laboratory of Complex & Critical Software Environment, Wuhan 430001, China
* Authors to whom correspondence should be addressed.
Big Data Cogn. Comput. 2025, 9(8), 198; https://doi.org/10.3390/bdcc9080198
Submission received: 22 May 2025 / Revised: 14 July 2025 / Accepted: 24 July 2025 / Published: 29 July 2025

Abstract

End-to-end relation extraction (E2ERE) generally performs named entity recognition and relation extraction either simultaneously or sequentially. While numerous studies on E2ERE have centered on enhancing span representations to improve model performance, challenges remain due to the gaps between the subtasks (named entity recognition and relation extraction) and the modeling discrepancies between entities and relations. In this paper, we propose a novel Label Annotation Interaction-based representation enhancement method for E2ERE, which institutes a two-phase semantic interaction to augment representations. Specifically, we first feed label annotations, which are easy to annotate manually, into a language model and conduct the first-round interaction among three types of tokens with a partial attention mechanism; we then construct a latent multi-view graph to capture various possible links between label and entity (pair) nodes, facilitating the second-round interaction between entities and labels. A series of comparative experiments against current methods built on various transformer-based architectures shows that LAI-Net maintains performance on par with the current SOTA on the NER task and achieves significant improvements over existing SOTA models on the RE task.

1. Introduction

As a core information extraction (IE) task, end-to-end relation extraction (E2ERE) can be split into a named entity recognition (NER) subtask for entity identification and a relation extraction (RE) subtask for capturing inter-entity relations from plain text. As postulated by [1], E2ERE is challenging due to the difficulty of capturing the rich correlations between entities and relations. IE research has traditionally converted NER and RE into span-based tasks [1,2,3,4,5]. Though these methodologies have incrementally advanced model performance from various perspectives, they are still impeded by two pivotal limitations: overly detached subtasks lead to insufficient information exchange between entities and relations, and the disparity in modeling strategies between entities and relations results in semantic gaps. In this paper, we mainly focus on enhancing semantic interaction during the modeling process to achieve enhanced span representations for E2ERE.
To address the challenges above, prevailing research mainly focuses on reorganizing the input or intermediate network layers of pre-trained language models (PLMs), attempting to enhance the semantic information of representations through the integration of specialized symbols or extrinsic prior knowledge. We roughly divide this into three types (as shown in Figure 1): The vanilla-based method is a straightforward approach that acquires a given span representation by feeding the raw text token series into a pre-trained encoder. The marker-based method inserts independent entity markers, such as [M] and [/M], amidst the text tokens to highlight the presence of entities, aiming to attract more model attention. The enumeration-based method enumerates all possible entity candidates from the plain text and concatenates them after the text tokens, with the appended candidate tokens sharing position IDs with the corresponding text tokens. Among these methods, the distinguished LSTM-CRF [6] is a typical vanilla-based sequence labeling method, and PURE [7] is a combination of the marker-based and enumeration-based methods, adopting the marker-based method during the NER phase and the enumeration-based method for the RE phase, achieving SOTA performance. PL-Marker [8] is a typical enumeration-based method that advanced the SOTA further.
In addition to the above three methods, what we develop in this paper can be classified as a fourth class, named the annotation & enumeration-based method. It is a novel semantic enhancement approach with external knowledge, inspired by external knowledge-based approaches [3,9,10]. We argue that a thorough comprehension of label semantics will significantly enhance IE model abilities; this serves as the premise for our work.
As shown in Figure 1, our principal improvement over preceding methods lies in the insertion of external prior knowledge (i.e., label annotations) into the PLM input sequence, aiming to leverage PLM’s internal layers to enhance semantic interaction between label and text. This represents the first-round semantic interaction in the LAI-Net framework.
Unlike the lexicon-adapter-based methods [11,12] and lattice-based methods [13,14,15], we manually expand the label information and embed it between text tokens and enumerated candidates then feed the series into a pre-trained model. We further enhance the representation by combining word vectors using downstream neural networks. Formally, the augmented representations derived from the aforementioned methods can be summarized as follows:
$h_{\mathrm{span}}^{V} = f\left(h_{\mathrm{span}}^{s};\, h_{\mathrm{span}}^{e}\right)$
$h_{\mathrm{span}}^{M} = f\left(h_{M}^{s};\, h_{M}^{e}\right)$
$h_{\mathrm{span}}^{E} = f\left(h_{\mathrm{span}}^{s};\, h_{\mathrm{span}}^{e};\, h_{M}^{s};\, h_{M}^{e}\right)$
$h_{\mathrm{span}}^{A\&E} = f\left(h_{\mathrm{span}}^{s};\, h_{\mathrm{span}}^{e};\, h_{M}^{s};\, h_{M}^{e};\, h_{a}^{s};\, h_{a}^{e}\right)$
where $f(\cdot)$ is a tensor operator that can execute a series of operations including tensor addition, tensor multiplication, tensor concatenation, etc., or could even be a neural network. The superscripts $s$, $e$ denote the start and end tokens of a span, marker, or annotation, while the subscripts $\mathrm{span}$, $M$, and $a$ represent the span, marker, and annotation token types, respectively.
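As a concrete illustration, the four augmentation schemes can be sketched with $f(\cdot)$ instantiated as plain concatenation; the hidden size `d` and all embedding values below are toy assumptions, not the paper's actual configuration.

```python
import numpy as np

d = 4  # toy hidden size, illustration only

# Toy contextualized embeddings; names mirror the symbols in the equations.
h_span_s, h_span_e = np.full(d, 1.0), np.full(d, 2.0)  # span start/end tokens
h_M_s, h_M_e = np.full(d, 3.0), np.full(d, 4.0)        # start/end marker tokens
h_a_s, h_a_e = np.full(d, 5.0), np.full(d, 6.0)        # annotation start/end

def f(*vecs):
    """f(.) instantiated as concatenation; the paper also allows addition,
    multiplication, or a neural network."""
    return np.concatenate(vecs)

h_vanilla   = f(h_span_s, h_span_e)                              # vanilla:   2d
h_marker    = f(h_M_s, h_M_e)                                    # marker:    2d
h_enumerate = f(h_span_s, h_span_e, h_M_s, h_M_e)                # enumerate: 4d
h_anno_enum = f(h_span_s, h_span_e, h_M_s, h_M_e, h_a_s, h_a_e)  # A&E:       6d
```

The annotation & enumeration variant simply widens the representation with the annotation boundary embeddings before any downstream projection.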
Building on the first-round semantic interaction, we further refine representations with selectively designed downstream network layers. The main innovations and improvements lie in the following aspects:
  • We explore a novel two-round semantic interaction approach for enhancing span representations, wherein the first-round interaction reorganizes the PLM input with annotated information in what we term an annotation & enumeration-based method, and the second-round interaction employs a graph convolutional network (GCN) built atop Gaussian graph generator modules to facilitate label semantic fusion.
  • We conduct coarse screening with an entity candidate filter to eliminate spans that are clearly not real entities, which also saves computing resources.
  • Experiments demonstrate that our method, while slightly lagging behind current SOTA in NER performance, takes the lead in the downstream RE task, surpassing the current SOTA performance.

2. Related Work

Graph Convolutional Networks. Recently, academic interest in span representation enhancement has surged, providing a substantial impetus to E2ERE. Traditional neural-network-based methods often ignore non-local and non-sequential context information from input text [16], which is precisely the area where GCNs [17,18] excel. GCNs, which our discussion centers on, have been widely used to model the interaction between entities and relations in the text, and have been demonstrated as a typical and effective approach. GCN-based approaches typically leverage a predefined graph structure, constructed from plain text, to facilitate information propagation among nodes, thus capturing text’s non-linear structure and enhancing NER and RE models’ capabilities to capture both global graph structures and representations of nodes and edges. GCN-based methods [2,5,19,20,21,22] mostly utilize different approaches to define nodes (sentences, words, tokens, spans, labels, etc.), and these nodes can be connected by syntactic dependency edges [23,24,25,26], re-occurrence edges [16], co-occurrence edges [27,28], adjacent word edges [16,26,29], adjacent sentence edges [26], etc. Convolution operations are performed on the graph to facilitate the flow of information between nodes, which enables nodes to efficiently acquire both local and global information. This further refines node representations and downstream network performance.
Traditional Entities and Relation Extraction Methods. Research on traditional entity and relation extraction methods has undergone several stages of evolution, progressing from early rule-based approaches [30,31] to classical machine learning techniques [32,33], neural-network-based approaches [34,35,36,37], and, more recently, to PLM-based methods [7,8,38,39,40,41,42,43,44,45,46,47] that have gained popularity in the past few years. Throughout these developments, system performance has continuously improved. Broadly, existing methods can be categorized into two main paradigms: joint extraction and pipeline extraction techniques. Joint extraction approaches integrate the two subtasks (NER and RE) within a unified framework, often adopting a multi-task learning strategy, aiming to perform NER and RE in a single model, thereby mitigating the issue of error propagation from the upstream NER task to the downstream RE task and ultimately enhancing overall performance. In contrast, pipeline extraction methods treat NER and RE as two separate stages. Specifically, a dedicated NER model is first trained to identify and extract relevant entities, and the resulting entity information is then passed to a separate RE model, which infers the relationships between entity pairs.
Prompt-based Entities and Relation Extraction Methods. The advent of GPT-2 [48] catalyzed the emergence of prompt-based algorithms, wherein early approaches primarily concentrated on identifying optimal prompts to be fed into PLMs [49,50]. With the subsequent development of chat-based generative models, the focus of RE research has increasingly shifted toward leveraging LLMs for classification and information extraction tasks. LLMs trained on massive corpora demonstrate the capability to make effective inferences from only a limited number of examples, thereby rendering them promising candidates for few-shot and low-resource scenarios. Consequently, the potential of LLMs in such settings has attracted significant research attention. Some studies report that LLMs can perform relation extraction effectively under few-shot conditions [51,52,53,54,55], which is something that traditional methods struggle to achieve.
Based on these previous studies, we integrate the powerful capabilities of PLM and GCNs to design and develop a hybrid architecture, aiming to enhance the performance of E2ERE through two-phase semantic interaction.

3. Datasets and Preprocessing

We selected three standard corpora (ACE05, SciERC, ADE) for the E2ERE task.
(1) ACE05 (https://catalog.ldc.upenn.edu/LDC2006T06) (accessed on 21 May 2025) is collected from a variety of domains (such as newswire and online forums). It includes 7 entity types and 6 relation types between entities. For data processing, we use the same entity and relation types, data splits (https://github.com/tticoin/LSTM-ER/tree/master/data/ace2005/split) (accessed on 21 May 2025), and pre-processing as [56,57] (351 training, 80 development and 80 testing).
(2) SciERC [58] is a scientific-oriented dataset that is built from 12 AI conference/workshop proceedings in four AI communities, and includes 7 entity types and 7 relation types.
(3) ADE [59] consists of 4272 sentences and 6821 relations extracted from medical reports.
In terms of the experiments, for ACE05 and SciERC, we run our model 10 times with different random seeds and report the averaged results of all runs. For ADE, we adopt 10-fold cross-validation, run each fold 10 times, and report the averaged results of all runs.

4. Methodology

4.1. Task Definition

Given a sentence formulated as $S = \{w_1, w_2, \ldots, w_m\} = \{t_1, t_2, \ldots, t_n\}$ with $m$ words (or $n$ tokens, $n \ge m$ naturally), the goal of the E2ERE task is to automatically recognize a set of entity spans and the relationships of entity pairs, which can be written as $(e_i, r_{i,j}, e_j) \in T$. Every entity $e$, which attaches a specific type (e.g., person (PER), organization (ORG)), is a sequence of tokens. Every relation $r_{i,j}$ represents the relationship between $e_i$ and $e_j$, and also attaches a specific type (e.g., organization affiliation relation (ORG-AFF)); both the entity-type and relation-type samples mentioned above are quoted from ACE05. Formally, we define the sets of possible entity types and relation types as $\mathcal{E}$ and $\mathcal{R}$, respectively.
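The output structure $(e_i, r_{i,j}, e_j) \in T$ can be sketched as a small data model; the spans, entity names, and types below are illustrative assumptions based on the ACE05 examples mentioned above, not the paper's actual data schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Entity:
    span: Tuple[int, int]  # (start, end) token positions of the entity
    etype: str             # entity type from E, e.g. "PER", "ORG"

@dataclass(frozen=True)
class Relation:
    head: Entity
    rtype: str             # relation type from R, e.g. "ORG-AFF"
    tail: Entity

# Toy triple for "chalabi ... the iraqi national congress" (spans assumed).
chalabi = Entity(span=(0, 0), etype="PER")
congress = Entity(span=(8, 10), etype="ORG")
triples: List[Relation] = [Relation(head=chalabi, rtype="ORG-AFF", tail=congress)]
```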

4.2. First-Round Semantic Interaction

During data processing, we concatenate text tokens, annotation tokens, and marker tokens sequentially to formulate a unified input sequence (the bottom element of Figure 2). The PLM encoder conducts the first-round semantic interaction, then outputs the encoded representation, which is a semantic amalgamation of the three types of tokens.
  • Text Tokens. Our approach breaks down the words from raw text into text token sequences as part of the model input.
  • Annotation Tokens. Inspired by [3,60], we augment semantic information by manually annotating the entity (or relation) abbreviated label both in the NER and the RE phase. For example, the abbreviated entity type GPE can be annotated as “geography political entity”, a fully-semantic unbroken phrase. Correspondingly, the abbreviated relation type ORG-AFF can be annotated as “organization affiliation”. Each label is manually expanded to enrich semantic content and then tokenized into annotation tokens (highlighted by red rectangle in bottom of Figure 2), which are appended to the text tokens sequentially.
  • Marker Tokens. We enumerate all potential consecutive token sequences (i.e., entity candidates) not exceeding a predefined length limit $c$ (with $c \le n$) within a sentence, labeling each with an entity type. If $c = 2$, as shown in Figure 2, the set of possible spans from the sentence “chalabi is the founder and leader of the iraqi national congress.” can be written as $\Psi = \{$“chalabi”, “chalabi is”, “is”, “is the”, “the”, “the founder”, “founder”, “founder and”, “and”, “and leader”, $\ldots\}$. The $i$-th span can be written as $\mathrm{span}_i = [\mathrm{span}_i^s, \mathrm{span}_i^e]$, where $\mathrm{span}_i^s$ and $\mathrm{span}_i^e$ indicate the start and end position IDs of the entity span, respectively. Therefore, from the position-ID perspective, the entity candidate series can also be written as $\Psi = \{[0,0], [0,1], [1,1], [1,2], [2,2], [2,3], [3,3], [3,4], [4,4], [4,5], \ldots\}$. Thus, the number of candidates for a sentence with $m$ words is $|\Psi| = m \cdot c + (c - c^2)/2$.
In the model input, we define a start marker ($M$) and an end marker ($/M$), which form a pair of marker tokens representing the start and end of an entity span and are appended after the annotation tokens. The start and end markers share the same position embeddings as the corresponding span's start and end tokens, respectively, while the position IDs of the original text tokens remain unchanged. From a PLM encoder input perspective, every marker is a token element of the input series, called a marker token. As shown in Figure 2, the entity chalabi is highlighted by the light-yellow bordered square, and its corresponding markers are denoted by non-bordered colored squares whose line frames differ across entity labels (white means non-entity). In conclusion, the complete input sequence can be represented as Equation (5), where $a_i$ is a token broken from the label annotation, and $M_i^s$, $M_i^e$ represent the start and end marker tokens, respectively.
$\tilde{S} = \{[\mathrm{CLS}], t_0, \ldots, t_{n-1}, [\mathrm{SEP}], a_0, \ldots, a_{N-1}, [\mathrm{SEP}], M_0^s, \ldots, M_{|\Psi|-1}^s, M_0^e, \ldots, M_{|\Psi|-1}^e\}$
  • Partial Attention. Although special tokens such as [CLS] and [SEP] serve to isolate different types of tokens, semantic interference still exists among them. A straightforward blend of annotation tokens and marker tokens with text tokens may disrupt the semantic consistency of the raw text. To mitigate this, we devise a partial attention mechanism, allowing selective semantic influence among the different types of tokens. This mechanism effectively controls the information flow (which can be regarded as a kind of visibility) between different tokens by adjusting the values of elements in the attention mask matrix. It suppresses information interaction among tokens that are mutually invisible while enhancing interaction among tokens that are mutually visible. Experimental results show that partial attention effectively improves model performance. See Appendix B for more detailed information about partial attention.
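The span enumeration and candidate count described in the Marker Tokens paragraph can be sketched as follows; `enumerate_spans` is a hypothetical helper name, and the closed-form count $|\Psi| = m \cdot c + (c - c^2)/2$ is checked against brute-force enumeration.

```python
def enumerate_spans(m, c):
    """All candidate spans of length <= c over m words, as (start, end)
    position IDs. (enumerate_spans is a hypothetical helper, not from the
    paper.)"""
    return [(s, s + length - 1)
            for s in range(m)
            for length in range(1, c + 1)
            if s + length - 1 < m]

m, c = 11, 2  # the example sentence has 11 words; max span length c = 2
spans = enumerate_spans(m, c)
closed_form = m * c + (c - c * c) // 2  # |Psi| = m*c + (c - c^2)/2
```

For $m = 11$ and $c = 2$ this yields 21 candidates (11 single-word spans plus 10 two-word spans), matching the closed form.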

4.3. Second-Round Semantic Interaction

To refine semantic integration, we introduce the second-round semantic interaction, employing a semantic integrator that explicitly models interactions between entity candidates and label annotations. The semantic integrator consists of multiple GCN layers with generated adjacency matrices; it treats both entity spans and label annotations as nodes and establishes connections between them through the construction of a graph $\mathcal{G}$, so that inter-node interactions can be explicitly modeled. A GCN typically requires a manually predefined, fixed adjacency matrix to depict inter-node connections, which fixes the perspective from which the model understands the semantics. However, the inter-node connections cannot be predetermined accurately for our task; otherwise, the task would be meaningless. Therefore, inspired by [22,61], we forgo a static adjacency matrix in favor of a multi-view graph, called Gaussian Graph Generator-based Graph Convolutional Networks ($\mathrm{G^4CN}$). Unlike a vanilla GCN, which takes a fixed adjacency matrix, $\mathrm{G^4CN}$ gives up fixed edge weights and sets them via trainable neural networks at the network initialization stage, which allows the model to assimilate semantic contexts from multiple perspectives.
First, we attach a Gaussian distribution $\mathcal{N}(\mu, \sigma)$ to every node, where both $\mu$ and $\sigma$ are generated by trainable neural networks, formulated as in Equations (6) and (7), and we set the activation function $\phi$ to the SoftPlus function, as the standard deviation of a Gaussian distribution is bounded on $(0, +\infty)$. Then, we simulate the edge weight by computing the KL divergence $w_{ij}^{e}$ between the Gaussian distributions of two nodes, where $w_{ij}^{e} = \mathrm{KL}\left(\mathcal{N}(\mu_i, \sigma_i) \,\|\, \mathcal{N}(\mu_j, \sigma_j)\right)$. We thereby obtain a number of Gaussian distributions for the multi-view graph; each $\mathcal{N}_i$ corresponds to a node representation $v_i$. Thus, a vanilla GCN's value-fixed adjacency matrix $A = (a_{ij}^{e})_{k \times k}$ can be modified into a $\mathrm{G^4CN}$'s value-varied adjacency matrix $A = (w_{ij}^{e})_{k \times k}$, where $k$ is the number of nodes. The whole process can be formulated as Equation (8), where $H_{\mathrm{span}}$ and $H_{\mathrm{anno}}^{\mathrm{ent}}$ are matrices formed by concatenating multiple nodes (span or annotation) in the constructed heterogeneous graph $\mathcal{G}$, and $\mathrm{GCN}(\cdot)$ is the vanilla GCN (see detailed formulas in Appendix A).
$\{\mu_i^1, \mu_i^2, \mu_i^3, \ldots, \mu_i^N\} = g_{\theta}(v_i)$
$\{\sigma_i^1, \sigma_i^2, \sigma_i^3, \ldots, \sigma_i^N\} = \phi\left(g_{\theta}(v_i)\right)$
$\tilde{H}_{\mathrm{span}} = \frac{1}{2}\left(H_{\mathrm{span}} + \mathrm{GCN}\left(A_{\mathrm{ner}}, \left[H_{\mathrm{span}}; H_{\mathrm{anno}}^{\mathrm{ent}}\right]\right)\right)$
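A minimal sketch of the Gaussian-graph-generator idea behind $\mathrm{G^4CN}$, assuming diagonal Gaussians per node: the means and SoftPlus-positive standard deviations below are random stand-ins for the outputs of the trainable networks $g_\theta$, and edge weights are KL divergences between node distributions.

```python
import numpy as np

def kl_gauss(mu_p, sig_p, mu_q, sig_q):
    """KL( N(mu_p, diag sig_p^2) || N(mu_q, diag sig_q^2) ) for diagonal
    Gaussians, summed over dimensions."""
    return float(np.sum(np.log(sig_q / sig_p)
                        + (sig_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sig_q ** 2)
                        - 0.5))

rng = np.random.default_rng(0)
k, N = 3, 4                                        # k nodes, N-dim Gaussians
mu = rng.normal(size=(k, N))                       # stand-in for g_theta(v_i)
sigma = np.log1p(np.exp(rng.normal(size=(k, N))))  # SoftPlus keeps sigma > 0

# Edge weights w_ij = KL(N_i || N_j) replace the fixed adjacency matrix.
A = np.array([[kl_gauss(mu[i], sigma[i], mu[j], sigma[j]) for j in range(k)]
              for i in range(k)])
```

Note that KL divergence is asymmetric, so the resulting adjacency matrix is directed; its diagonal is zero since the KL divergence of a distribution with itself vanishes.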

4.4. Named Entity Recognition

Span and Annotation Representation. We extract the contextualized representation $h$ for each individual token from the PLM output, and naturally obtain the mathematical formulas for spans and annotations as Equations (9) and (10), where $h_{\mathrm{anno}} \in \mathbb{R}^d$ and $h_{\mathrm{span}} \in \mathbb{R}^d$. $h_a^s$, $h_a^e$ are the embeddings of the first and last tokens of a certain type of label annotation, respectively; $h_{\mathrm{span}}^s$, $h_{\mathrm{span}}^e$ are the embeddings of the first and last tokens of an entity candidate, respectively; and $h_M^s$, $h_M^e$ indicate the embeddings of the start and end marker tokens, respectively. The linear layer FC is used to harmonize the dimensional space.
$h_{\mathrm{anno}} = \mathrm{FC}_a\left(\left[h_a^s; h_a^e\right]\right)$
$h_{\mathrm{span}} = \mathrm{FC}_{\mathrm{span}}\left(\left[h_{\mathrm{span}}^s; h_{\mathrm{span}}^e; h_M^s; h_M^e\right]\right)$
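Equations (9) and (10) amount to concatenating boundary embeddings and projecting them back to the hidden size through a linear layer; a numpy sketch with toy dimensions, where the random weights stand in for the learned FC layers:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy PLM hidden size

def fc(x, W, b):
    """A linear layer harmonizing the concatenated input back to dimension d."""
    return W @ x + b

# FC_a maps [h_a^s; h_a^e] (2d) -> d; FC_span maps the 4-way concat (4d) -> d.
W_a, b_a = rng.normal(size=(d, 2 * d)), np.zeros(d)
W_span, b_span = rng.normal(size=(d, 4 * d)), np.zeros(d)

h_a_s, h_a_e = rng.normal(size=d), rng.normal(size=d)
h_anno = fc(np.concatenate([h_a_s, h_a_e]), W_a, b_a)

h_span_s, h_span_e, h_M_s, h_M_e = (rng.normal(size=d) for _ in range(4))
h_span = fc(np.concatenate([h_span_s, h_span_e, h_M_s, h_M_e]), W_span, b_span)
```

Both outputs land in the same $d$-dimensional space, which is what allows spans and annotations to be mixed as nodes of one graph later on.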
Entity Candidates Filter. Before developing the entity candidates filter module, we attempted to train the model without it, but training failed. The reasons can be summarized as follows: (1) During the initialization phase of training, model parameters are randomly assigned, leading to many candidate entities being randomly predicted as real entities in the early phase. This causes the adjacency matrix of the graph neural network to become excessively large, resulting in extremely slow training and significant resource consumption. (2) Among the numerous enumerated candidate entities, positive samples are extremely few while negative samples are abundant. This easily induces the model to classify all entities as non-entities to minimize the loss, which is not the desired outcome. To overcome these issues, we devise a binary classifier that acts as an entity filter, performing coarse screening over all enumerated entities by discarding non-genuine ones, thus optimizing subsequent predictions.
As for the loss function, the primary aim of the entity filter is to maximize the likelihood function, which drives us to follow [2] in adopting the likelihood loss function of Equation (11), where $\Psi_g \subseteq \Psi$ indicates the set of real entity spans. In addition to the intuitive savings in time consumption, experimental results indicate that the entity filter successfully alleviates the model weakening engendered by negative samples and enhances overall performance.
  • Span Classifier. We classify span representations through a linear classifier, utilizing cross-entropy loss to direct the learning process. The combined loss function $\mathcal{L}_{\mathrm{ner}} = \mathcal{L}_{\mathrm{filter}} + \mathcal{L}_{\mathrm{span}}$ is optimized during training, using $\mathcal{L}_{\mathrm{filter}}$ and $\mathcal{L}_{\mathrm{span}}$ from Equations (11) and (12), with dropout layers for regularization.
In addition, Ye et al. [8] proved that packing a series of related spans into one training instance can improve NER model performance, which naturally prompted us to adopt this effective measure when reorganizing the input.
$\mathcal{L}_{\mathrm{filter}} = -\frac{1}{|\Psi|} \sum_{i=1}^{|\Psi|} \log P\left(\mathrm{span}_i \in \Psi_g \mid \mathrm{span}_i \in \Psi\right)$
$\mathcal{L}_{\mathrm{span}} = -\frac{1}{|\Psi_g|} \sum_{\mathrm{span} \in \Psi_g} \log P_{\mathrm{span}}$
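One plausible reading of Equation (11) as a negative log-likelihood averaged over all candidates can be sketched as follows; `filter_loss` is a hypothetical helper, and scoring non-gold spans via their complementary probability is an assumption of this sketch, not stated in the paper.

```python
import numpy as np

def filter_loss(p_entity, is_gold):
    """NLL over all |Psi| candidates: -(1/|Psi|) * sum_i log P_i, where P_i is
    the probability the filter assigns to the correct decision for span_i.
    (Using 1 - p for non-gold spans is this sketch's assumption.)"""
    p_correct = np.where(is_gold, p_entity, 1.0 - p_entity)
    return float(-np.mean(np.log(p_correct)))

p = np.array([0.9, 0.2, 0.8, 0.1])           # predicted P(span_i in Psi_g)
gold = np.array([True, False, True, False])  # span_i in Psi_g ?
good = filter_loss(p, gold)
bad = filter_loss(1.0 - p, gold)             # same scores, inverted
```

Confident, correct filter decisions drive the loss toward zero, while inverted decisions inflate it, which is the behavior the filter is trained to exploit.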

4.5. Relation Extraction

Subject marker. In the RE phase, we design our annotation & enumeration-based method (as demonstrated in Figure 1) to acquire the enhanced representation. Concretely, we adopt the marker-based approach (shown in Figure 1) and insert a pair of subject markers, called solid markers in [8], to the left and right of the subject entity, with the enumerated object candidate spans following immediately after the annotation tokens, to extract relations involving the subject entity.
  • Entity pair representation. We match subject and object representations pairwise to obtain a series of entity pairs ($h_{\mathrm{pair}} = [h_{\mathrm{subj}}; h_{\mathrm{obj}}]$). The label-semantic-fused pair representation formula can be written as Equation (13), where $H_{\mathrm{pair}}$ and $H_{\mathrm{anno}}^{\mathrm{rel}}$ are matrices formed by concatenating entity pair or relation label annotation representations.
$\tilde{H}_{\mathrm{pair}} = \frac{1}{2}\left(H_{\mathrm{pair}} + \mathrm{GCN}\left(A_{\mathrm{re}}, \left[H_{\mathrm{pair}}; H_{\mathrm{anno}}^{\mathrm{rel}}\right]\right)\right)$
The loss function of the RE phase is the cross-entropy loss.

5. Experiments

Datasets. We utilize three widely used standard corpora: (1) ACE05 spans various domains (newswire, online forums), and contains seven entity types and six relation types between entities. (2) SciERC [58] is a scientific dataset built from AI conference/workshop proceedings across four communities. It includes seven entity types and seven relation types. (3) ADE [59] consists of 4272 sentences and 6821 relations extracted from medical reports.
Metrics. The model with the best F1 performance on the test set is selected over a fixed number of epochs. Both micro- and macro-averaged metrics are used to evaluate model performance, the former for ACE05/SciERC and the latter for ADE. For the NER task, an entity prediction is correct if and only if its type and boundaries both match those of a gold entity. For the RE task, a relation prediction is considered correct if its relation type and the boundaries of the two entities match those in the gold data. We also report the strict relation F1 (denoted RE+), where a relation prediction is considered correct if its relation type as well as the boundaries and types of the two entities all match those in the gold data. We show detailed experimental settings in Appendix B.
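The matching criteria above reduce to set intersection over typed tuples; a minimal micro-averaged P/R/F1 sketch (the tuple layouts are illustrative assumptions):

```python
def micro_prf(pred, gold):
    """Micro-averaged precision/recall/F1 over sets of typed tuples."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# NER items: (start, end, entity_type). A prediction counts only if
# boundaries and type both match a gold entity.
gold_ner = {(0, 0, "PER"), (8, 10, "ORG")}
pred_ner = {(0, 0, "PER"), (8, 10, "GPE")}  # one entity-type error

# RE items: (head_span, tail_span, relation_type); RE+ would additionally
# include the two entity types in each tuple.
gold_re = {((0, 0), (8, 10), "ORG-AFF")}
pred_re = {((0, 0), (8, 10), "ORG-AFF")}
```

Under this scheme the NER type error above costs both one false positive and one false negative, which is exactly how a boundary-and-type match is scored.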

5.1. Main Results

To ensure the credibility of the experimental results, we conducted multiple experiments with different random seeds on all datasets; the stability of the runs is presented in Appendix B.9.

5.1.1. Results Against Horizontal Comparison

Table 1 presents a horizontal comparison with baselines, focusing on excellent models developed in recent years, including several previous SOTA models (details are in Appendix B.1). In terms of the NER task, our method is on par with the current SOTA, with the discrepancy across the three datasets ranging from 0.1% to 0.39%, achieving suboptimal performance. Despite starting from slightly lower NER performance than the SOTA, our method achieves a performance gain of 2–10% on relation F1 and strict relation F1, consistently outperforming all selected baselines even with error propagation between the NER and RE tasks. For example, on ACE05, our method exhibits a 0.24% disadvantage in the NER phase but takes the lead in the downstream RE phase, surpassing the current SOTA by 0.41% and setting a new SOTA record. Similarly, despite 0.39% and 0.1% disadvantages in NER performance on SciERC and ADE, respectively, our method achieves performance improvements of 10% and 0.52%. All these improvements demonstrate that the two rounds of semantic interaction indeed further utilize the predicted entities from the NER phase, significantly improving relation recognition while only slightly compromising NER performance.

5.1.2. Results Against Significant Hyperparameters

Table 2 delineates the impact of varying significant hyperparameters of G 4 CN on the performance of LAI-Net. For the number of GCN layers, we consider a range from 0 to 5, with 0 indicating the absence of GCN for semantic interaction and more layers corresponding to increased communication times among nodes within the graph. As for the number of attention heads, we opt for values of 1, 2, 3, 4, and 6. The deliberate exclusion of the value 5 is attributed to the fact that the encoder’s hidden dimension is not divisible by 5, a constraint inherent to the multi-head attention mechanism.
The table permits two intuitive observations: (1) an increase in the number of GCN layers does not translate linearly into enhanced performance; an excessive number of layers can exert a deleterious effect on the model, with the optimal number identified as either 1 or 2; (2) concerning the number of attention heads, the optimal choice varies somewhat across datasets, yet it is unequivocally clear that neither an excessively high (e.g., 6) nor a disproportionately low (e.g., 1) number of attention heads can fully capitalize on the GCN's capabilities.

5.2. Inference Speed

It is justified to ask whether adopting GCNs to enhance F1 performance trades off inference efficiency, since GCNs typically carry efficiency disadvantages. Hence, we compared PL-Marker and LAI-Net in terms of inference speed. The experiment was evaluated on an NVIDIA GeForce RTX 3090 24 GB GPU with an evaluation batch size of 16. We used the BERT-base model for ACE05 and SciBERT for SciERC.
As shown in Table 3, we indeed sacrificed varying degrees of inference efficiency in exchange for varying degrees of improvement in F1 performance. Overall, the greater the sacrifice, the greater the benefit.

5.3. Ablation Study

5.3.1. Ablations Against Entity Filter

Considering the combinatorial surge in candidate entity quantity that accompanies increased entity length in the NER phase, the relatively few positive examples can easily be overwhelmed by a vast array of negative samples, instilling in the network a strong propensity to categorize all samples as negative and leading to difficulty in accurately identifying genuine entities. To mitigate this, LAI-Net incorporates a deliberately inserted filter prior to the entity classifier to preliminarily sieve out spans that are clearly non-entities. To validate whether this filter genuinely facilitates the NER process, we devised associated ablation experiments. As depicted in the two rightmost columns of Table 2, the presence of the filter effectively improves NER performance, with the advantage most conspicuous on the ACE05 dataset (surpassing no-filter models by 1.74% in F1-score), and improvements of varying degrees are also observed on the other datasets.

5.3.2. Ablation Against Two Rounds of Interaction

We conducted an ablation experiment specifically targeting the two-round semantic interaction. Drawing upon its outcomes, we can directly evaluate the extent to which our dual semantic interaction genuinely augments model performance. Note that the second-round interaction is built upon the annotated label information (which is what the first-round interaction provides), so the second-round interaction ceases to exist once the label annotation information is no longer injected; the first-round interaction, however, is unaffected by the existence of the second. As shown in Table 4, eliminating either round of semantic interaction invariably leads to adverse effects of varying magnitudes on the new SOTA that we have developed, across the precision, recall, and F1-score metrics. From the perspective of performance degradation, the detrimental effect of eliminating the second-round interaction from LAI-Net is significantly more pronounced than the impact of further removing the first-round interaction from the model already lacking the second round, with the discrepancy peaking at over 21-fold ($4.67\% / (4.89\% - 4.67\%) > 21$ for RE+ on ACE05).

5.3.3. Ablations Against Attention Mask Matrix

In order to evaluate the efficacy of partial attention matrices, we selected three distinct attention mask matrices for ablation (details explained in Appendix B.10): the full attention matrix, wherein all tokens are mutually visible; the anno-token visible attention matrix, where annotation tokens and text tokens are intervisible; and the anno-token invisible attention matrix, where annotation tokens and text tokens are not intervisible. Irrespective of the attention mask matrix employed, tokens of the same type remain mutually visible during the computation of attention scores.
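A sketch of the three mask variants, assuming a block layout of text, annotation, and marker tokens; which further cross-type pairs LAI-Net actually makes visible is detailed in its Appendix B, so only the annotation-text blocks vary here.

```python
import numpy as np

def build_mask(n_text, n_anno, n_marker, variant):
    """0/1 attention mask; row i may attend to column j iff mask[i, j] == 1.
    Same-type tokens are always mutually visible; only the annotation-text
    blocks differ across the three ablated variants (a simplification)."""
    n = n_text + n_anno + n_marker
    text = slice(0, n_text)
    anno = slice(n_text, n_text + n_anno)
    mark = slice(n_text + n_anno, n)
    M = np.zeros((n, n), dtype=int)
    for blk in (text, anno, mark):
        M[blk, blk] = 1                  # same-type visibility
    if variant == "full":
        M[:] = 1                         # all tokens mutually visible
    elif variant == "anno_visible":
        M[text, anno] = 1                # text and annotation tokens
        M[anno, text] = 1                # are intervisible
    # "anno_invisible": keep only same-type visibility
    return M

M_vis = build_mask(3, 2, 2, "anno_visible")
M_inv = build_mask(3, 2, 2, "anno_invisible")
M_all = build_mask(3, 2, 2, "full")
```

Such a mask is typically added (as a large negative bias on the zero entries) to the attention logits before the softmax, which suppresses information flow between mutually invisible tokens.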
As Table 5 elucidates, across all three datasets the anno-token invisible attention matrix exhibits markedly superior performance on the strict F1 metrics compared to the other attention matrix types. The anno-token visible attention matrix comes in second place, with its largest deficit relative to the top-performing technique capping out at 4.7% across the varied tasks within the trio of benchmarks. Meanwhile, the full attention matrix lags behind the peak score on each respective dataset (by up to 6.28%), indicating appreciably inferior capability across the range of workloads tested.
We attempt to analyze the underlying reasons, which may lie in the following points: (1) The mutual visibility mechanism between annotation tokens and text tokens establishes a conduit for semantic communication, thereby enhancing the semantic richness of the embeddings for both annotations and text. On the one hand, this enables annotation tokens to fully comprehend background information of text, which directly impacts their high-dimensional semantic representation. On the other hand, text tokens can also fully integrate the information from annotations, which directly facilitates the screening and categorization of candidate entities. (2) Conversely, the full attention matrix allows for the intermingling of semantic information among annotation tokens, text tokens, and entity candidate tokens, which may lead to an excessive infusion of additional semantics, resulting in semantic redundancy, and ultimately causing a decline in model performance.

5.4. Case Study

We conduct a qualitative analysis on concrete examples to help understand our model. We take PL-Marker, the model closest to ours, as the baseline. For a fair comparison, we use BERT-base as the common encoder. The gold data marked with entities and relations is shown in Figure 3, along with the outputs of LAI-Net and PL-Marker. For the given text, PL-Marker mistakenly predicts “coalition” as an ORG entity and consequently misses the PART-WHOLE relation between “coalition” and “force”, while LAI-Net correctly predicts both the entity and the relation. In addition, PL-Marker captures “They” and “Iraqi” as “ORG” and “GPE”, but misses the “GEN-AFF” relation between them.
The disparity in NER performance between LAI-Net and PL-Marker is relatively marginal and is, therefore, not the primary focus of our analysis. Accordingly, we compare their performance on the RE task in greater depth. To illustrate the performance gap, we analyze a representative case, drawn from numerous instances, in which LAI-Net yields correct predictions while PL-Marker performs suboptimally. Given that both models correctly recognize the head and tail entities, LAI-Net benefits from the two phases of interaction between the label and the entity pair, which facilitates a more accurate understanding of the “GEN-AFF” relation. Consequently, it correctly predicts the relation between “They” (ORG) and “Iraqi” (GPE).

6. Conclusions

Faced with the challenges posed by excessive subtask separation and the modeling gap between entities and relations, we propose LAI-Net, which leverages a two-phase semantic interaction and achieves performance on par with the SOTA on the NER task and beyond the SOTA on the RE task. The key novelty is the multi-phase semantic interaction framework, which effectively injects external knowledge and unifies the representations of entities and relations. We manually expand label abbreviations into complete, semantically intact phrases to enrich lexical semantics, and then leverage G4CN to fuse information from a latent multi-view graph. An entity candidate filter is embedded for coarse screening of candidate spans in NER. Experiments on several datasets show that LAI-Net enables high-level semantic information flow and facilitates the E2ERE task. While the approach is effective, it may be influenced by factors such as annotation quality and the limited use of external knowledge. Extending the model to better handle unseen entities and more open-domain settings remains a promising direction for future work.

Author Contributions

R.L. participated in and led all aspects of the research, including but not limited to data analysis and preprocessing, algorithm design, code implementation, experimental procedure formulation, analysis of experimental results, and writing and reviewing of the paper. Z.W. and H.M. contributed to the algorithm design, conducted the data analysis, and critically reviewed the manuscript for important intellectual content. W.W. and L.Z. were responsible for the statistical analysis and interpretation of data, and also helped in revising the manuscript for better clarity and coherence. F.L. participated in the data collection and contributed to the writing of the sections related to methodology and results. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (Grant No. 2024ZD01NL00102).

Data Availability Statement

The datasets used in this study are publicly available, as described in Section 3. Access is open to all interested researchers, and any restrictions on data availability, such as those related to participant privacy or confidentiality, are stated on the websites mentioned in Section 3. The data processing pipeline and the associated code will be made fully available after the paper is accepted.

Acknowledgments

This work was supported by the National Key R&D Program of China (Grant No. 2024ZD01NL00102). The authors declare no other relevant financial or non-financial interests related to the content of this manuscript that could be perceived as influencing the work reported.

Conflicts of Interest

This research was funded by the project with the grant number 2024ZD01NL00100. The authors declare no other relevant financial or non-financial interests related to the content of this manuscript that could be perceived as influencing the work reported. This statement is provided to ensure a transparent and unbiased evaluation of the research.

Appendix A. Vanilla Graph Convolutional Network

Typically, a good text representation is a prerequisite for superior model efficacy, which has motivated numerous researchers to leverage contextual features through diverse model architectures to better comprehend textual semantics and thereby enhance text representations. Among these architectural choices, the graph convolutional network (GCN), whose goal is to learn structure-aware node representations, is a widely used framework for encoding information within graphs: in each GCN layer, every node exchanges information with its neighboring nodes through their connections (called edges). The efficacy of GCNs in encoding contextual information on graphs built from input sentences has been demonstrated by numerous prior studies.
Now, we briefly describe the vanilla graph convolutional network in mathematical form. The first step in utilizing a GCN is to build a graph $G = (V, E)$ from plain text, where $v_i \in V$ is a node of $G$ and $e_{ij} = (v_i, v_j) \in E$ is the edge between nodes $v_i$ and $v_j$. The structural information a GCN can exploit manifests in sentences as the dependencies between words, represented by an adjacency matrix $A = (a_{i,j})_{n \times n}$. In the vanilla GCN, $a_{i,j} = 1$ if $i = j$ or there is a dependency connection (arc) between the two words/tokens $x_i$ and $x_j$, and $a_{i,j} = 0$ otherwise. In our work, however, we adopt a new method that initializes $a_{i,j}$ with neural networks.
Based on the graph with adjacency matrix $A$, for each node $v_i$, the $l$-th GCN layer gathers the information carried by its neighboring nodes and computes the output representation $h_i^{(l)}$ for $v_i$ by

$$h_i^{(l)} = \sigma\left(\sum_{j=1}^{n} a_{i,j}\, W^{(l)} h_j^{(l-1)} + b^{(l)}\right)$$

where $\sigma$ is the activation function, $h_j^{(l-1)}$ denotes the output representation of $v_j$ from the $(l-1)$-th GCN layer, and $W^{(l)}$ and $b^{(l)}$ are the trainable weight matrix and bias for the $l$-th GCN layer, respectively.
In other words, the $l$-th layer of a vanilla GCN can be written in matrix form as follows:

$$\hat{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$$

$$H^{(l+1)} = \sigma(\hat{A} H^{(l)} W^{(l)})$$

$$H^{(l)} = \{h_1^{(l)}, h_2^{(l)}, h_3^{(l)}, \ldots, h_{|V|}^{(l)}\}$$

where $H^{(0)}$ is the stack of all initial node representations, $\hat{A}$ is the symmetrically normalized adjacency matrix, $W^{(l)}$ is the parameter matrix for the $l$-th GCN layer, $D$ is the diagonal node degree matrix with $D_{ii} = \sum_j A_{ij}$, and $\sigma$ is a non-linear activation function such as ReLU.

Finally, the entire $L$-layer vanilla GCN modeling process can be formulated as

$$\mathrm{GCN}\left(A, H^{(0)}\right) = H^{(L)} = \sigma(\hat{A} H^{(L-1)} W^{(L-1)})$$
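As an illustrative sketch (not the paper's implementation; dimensions, the activation, and all values are arbitrary), the matrix-form propagation above can be written in NumPy, and its first row checked against the per-node summation form:

```python
import numpy as np

def normalize(A):
    """Symmetric normalization A_hat = D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)                     # diagonal of the degree matrix D
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

def gcn_layer(A_hat, H, W, activation=np.tanh):
    """One vanilla GCN layer: H' = sigma(A_hat @ H @ W), bias omitted."""
    return activation(A_hat @ H @ W)

rng = np.random.default_rng(0)
n, d_in, d_hid = 4, 8, 8
A = np.eye(n)                             # self-loops (a_ii = 1)
A[0, 1] = A[1, 0] = 1.0                   # one dependency arc between tokens 0 and 1
A_hat = normalize(A)
H0 = rng.standard_normal((n, d_in))       # initial node representations
W1 = rng.standard_normal((d_in, d_hid))
H1 = gcn_layer(A_hat, H0, W1)

# The matrix form agrees row-by-row with the per-node sum over neighbors.
h0 = np.tanh(sum(A_hat[0, j] * (H0[j] @ W1) for j in range(n)))
assert np.allclose(h0, H1[0])
```

Stacking `gcn_layer` $L$ times reproduces the $\mathrm{GCN}(A, H^{(0)}) = H^{(L)}$ formulation above.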
Figure A1. Statistical analysis diagram of algorithm stability, in which each data point represents the averaged F1 performance with various random seeds, and the shaded area surrounding the data line indicates the error range under the specified hyperparameters (GCN layer number or attention head number), which we measure with standard deviation.

Appendix B. Implementation Details

Appendix B.1. Choice of Baselines

To ensure that the performance comparison is as fair and comprehensive as possible, we established several principles for selecting baselines. First, we select well-recognized baselines from those cited in SOTA works published in recent years. Second, we cover encoder-only, decoder-only, and encoder–decoder architectures as comprehensively as possible, giving priority among encoder-only baselines to models with parameter scales similar or comparable to our PLM. Third, the baselines must share at least one dataset with our work. Lastly, we prefer, though do not require, methods formally published in relatively authoritative conferences or journals.

Appendix B.2. PLMs and Hardware Devices

For a fair comparison with previous works, we employ bert-base-uncased [69] as the encoder for ACE05 and the in-domain scibert-scivocab-uncased [70] as the encoder for SciERC. (SciBERT is a BERT model trained on scientific text; its corpus comprises the full text of 1.14 million scientific papers, 82% biomedical and 12% computer science, and may therefore be better suited to natural language processing tasks on the SciERC dataset.) All experiments are executed on three GeForce RTX 3090 24 GB GPUs.

Appendix B.3. Optimizer and Learning Rate Settings

We use the AdamW optimizer during training, with a learning rate of 4 × 10−4 for both the NER and RE tasks. We also tried setting different learning rates for different layers, but experimental results showed no benefit.
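For reference, a single AdamW update (decoupled weight decay) can be sketched in NumPy as follows; the learning rate matches ours, while the remaining hyperparameters are the common defaults and are illustrative:

```python
import numpy as np

def adamw_step(theta, grad, m, v, t, lr=4e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW step: Adam moment updates plus decoupled weight decay."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)                  # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

theta = np.ones(3)
m, v = np.zeros(3), np.zeros(3)
theta, m, v = adamw_step(theta, np.array([0.1, -0.2, 0.3]), m, v, t=1)
```

In practice one would use a framework implementation (e.g., PyTorch's `torch.optim.AdamW`); the sketch only makes the update rule explicit.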

Appendix B.4. Maximum Length Settings

We set the maximum length of the reorganized sentence C to 150 on both ACE05 and SciERC. For enumerating possible spans, we set the maximum span length L to 8 for all datasets and limit the number of entity candidates to 220 per training/evaluation sample.
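The span enumeration described above can be sketched as follows (a simplified illustration using our limits, L = 8 and a cap of 220 candidates; the actual implementation may differ):

```python
def enumerate_spans(tokens, max_span_len=8, max_candidates=220):
    """Enumerate all (start, end) spans of up to max_span_len tokens,
    with inclusive indices, capped at max_candidates per sample."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start, min(start + max_span_len, len(tokens))):
            spans.append((start, end))
            if len(spans) == max_candidates:
                return spans
    return spans

spans = enumerate_spans(["They", "are", "Iraqi", "citizens"])
```

For a 4-token sentence this yields all 10 possible spans; on realistic sentence lengths the 220-candidate cap bounds the cost of downstream span classification.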

Appendix B.5. Batch Size and Epoch Settings

In the NER phase, we set the batch size to 16 per GPU for SciERC and 20 per GPU for ACE05; in the RE phase, to 40 per GPU for SciERC and 50 per GPU for ACE05. We train for 80 epochs on all datasets in the NER phase and 60 epochs in the RE phase.

Appendix B.6. Avoiding the Negative Influence of Annotations

Considering that our method increases the length of the training instances, which could hurt the model’s performance, we strive to mitigate this negative impact as much as possible. First, we avoid overly long annotations and simply restore label abbreviations (e.g., restoring the PER tag to “person” and the PER-ORG tag to “person-organization”); the annotations used in the experiments never exceed 10 words. Second, we use adjusted attention mask matrices to minimize the semantic confusion caused by the insertion of annotations.
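The restoration of label abbreviations can be illustrated with a simple lookup; the PER and PER-ORG entries come from the text above, while the GPE entry is a hypothetical example:

```python
# Mapping from label abbreviations to full annotation phrases.
# PER and PER-ORG are from the paper; GPE is an illustrative entry.
LABEL_ANNOTATIONS = {
    "PER": "person",
    "PER-ORG": "person-organization",
    "GPE": "geo-political entity",  # hypothetical
}

def restore_label(abbrev):
    """Return the full annotation phrase for a label abbreviation,
    keeping it short (annotations are capped at 10 words)."""
    phrase = LABEL_ANNOTATIONS.get(abbrev, abbrev.lower())
    assert len(phrase.split()) <= 10
    return phrase
```

The restored phrases are what get appended to the input as annotation tokens, so keeping them short limits the added sequence length.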

Appendix B.7. Cold Start Settings for NER

It should be noted that enumerating a large number of non-entity spans as negative samples in the NER phase is highly likely to cause non-convergent training, or the model may simply predict all spans as non-entities. Therefore, we set up a specific cold-start procedure for training. In the first part of training, we use the gold labels as the filter result for the subsequent entity classification when computing the loss used to update the parameters. This guides the model during early learning before exposing it to negatively labeled non-entity spans. In practice, we set the number of cold-start epochs to 15 for ACE05 and 40 for SciERC. Moreover, according to unpublished experimental results, runs without the cold-start configuration failed to learn anything at all, with all performance metrics remaining zero throughout training.
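The cold-start schedule amounts to a simple switch on the epoch number; a minimal sketch using our epoch settings (function and variable names are illustrative):

```python
COLD_START_EPOCHS = {"ACE05": 15, "SciERC": 40}

def filter_result(epoch, dataset, gold_spans, predicted_spans):
    """During the cold-start phase, feed gold entity spans to the
    downstream entity classifier; afterwards, use the filter's own
    predictions (which include negative non-entity spans)."""
    if epoch < COLD_START_EPOCHS[dataset]:
        return gold_spans
    return predicted_spans
```

This is the same teacher-forcing idea used in sequence models: the classifier first learns from clean inputs before facing its own noisy candidates.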

Appendix B.8. Symmetry of Relation for RE

We formulate the directed relation as $r_{i,j}$, with the subject entity $e_i$ always pointing to the object entity $e_j$. A triplet with a positive relation can therefore be written as $e_i \xrightarrow{r_{i,j}} e_j$, and its reverse form is $e_j \xrightarrow{r_{j,i}} e_i$; $r_{i,j}$ and $r_{j,i}$ may refer to different relation types. In the RE phase, we take the subject to be on the left and the object on the right by default. For a real relation, both directions $(e_i \xrightarrow{r_{i,j}} e_j)$ and $(e_j \xrightarrow{r_{j,i}} e_i)$ are scored; only when both are predicted to be true (i.e., their probabilities exceed the threshold) are the triplets $(e_i, r_{i,j}, e_j)$ and $(e_j, r_{j,i}, e_i)$ established.
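The decision rule above can be sketched as follows (function name, inputs, and the threshold value are illustrative):

```python
def establish_triplets(p_forward, p_backward, rel_fwd, rel_bwd,
                       head, tail, threshold=0.5):
    """Keep the triplet pair (e_i, r_ij, e_j) and (e_j, r_ji, e_i)
    only when both directional probabilities exceed the threshold."""
    if p_forward > threshold and p_backward > threshold:
        return [(head, rel_fwd, tail), (tail, rel_bwd, head)]
    return []

triplets = establish_triplets(0.9, 0.8, "GEN-AFF", "GEN-AFF-rev",
                              "They", "Iraqi")
```

Requiring agreement between the two directions acts as a consistency check and suppresses spurious one-sided predictions.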

Appendix B.9. Stability of Training

To ensure the reliability of the experimental data, we repeated the experiments multiple times for each hyperparameter combination, each time initializing the network with a different random seed, and then we calculated the mean and standard deviation of the experimental results, as shown in Figure A1. The first row of the chart in Figure A1 represents the fluctuations in NER and RE performance across three datasets for different numbers of GCN layers. The second row of the chart in Figure A1 represents the fluctuations in NER and RE performance across three datasets for different numbers of attention heads.
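The per-configuration statistics reported in Figure A1 are the plain mean and (sample) standard deviation across seeds; for example, with hypothetical F1 scores from four runs:

```python
import numpy as np

# Illustrative F1 scores from runs with different random seeds.
f1_runs = np.array([68.2, 67.9, 68.5, 68.1])
mean_f1 = f1_runs.mean()
std_f1 = f1_runs.std(ddof=1)  # sample standard deviation across seeds
```

The mean gives each data point in Figure A1, and the standard deviation gives the width of the shaded error band.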
Figure A2. Diagrams of the three types of attention mask matrices, assuming there are two entity candidates.

Appendix B.10. Partial Attention Mask

Obviously, the series of annotation tokens and marker tokens appended after the text tokens does not form a coherent, semantically complete sentence. This inevitably affects the semantic construction of the text tokens and impairs the representational quality of the word vectors during PLM encoding.
To address this potential issue, we adopt a specialized partial attention mechanism to selectively suppress or enhance the semantic influence between tokens of different types. Specifically, by adjusting the values of the attention mask matrix, we control the visibility among the three types of tokens.
Partial attention can effectively control the information flow between different tokens. It suppresses the information interaction between text and annotations (i.e., invisible) while enhancing the information interaction between text and candidates (i.e., visible).
We show three different masking matrices in Figure A2, for which we conduct some ablation experiments. In conjunction with Figure A2, we further elucidate the meaning of the discrete elements within the mask matrix.
The first row of Figure A2a delineates the attention pattern associated with the first text token. As shown in the figure, both the rows and columns of the attention matrix are composed of three distinct segments: the original sentence tokens (text tokens), the annotation tokens, and the special marker tokens. The matrix itself is binary, where each element takes the value of 0 or 1, indicating the absence or presence of attention, respectively, from the row token to the column token.
In the first row of the matrix (corresponding to the first text token), the colored entries highlight which tokens can be attended to. The pale green regions indicate accessible tokens, the white regions indicate tokens outside the attention scope, and the faint yellow squares denote marker tokens within the receptive field. Notably, the two yellow squares on the left and right sides represent the start and end marker tokens, respectively.
For example, the element at the intersection of the first text token and the first annotation token is white, indicating a value of 0 in the matrix, which means that the model does not compute attention between them. In contrast, the colored entries (value 1) indicate that attention is permitted and that the semantic content of the corresponding token can be integrated during model computation.
In summary, by configuring the attention matrix with binary values, we can precisely control which tokens are allowed to attend to others. This mechanism enables selective semantic interaction across different types of tokens, allowing the model to integrate information from the designated tokens, facilitating effective information exchange.
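As a minimal sketch, the three mask variants can be constructed as binary NumPy matrices over the [text | annotation | marker] segments of Figure A2; the exact marker–annotation visibility in the partial variants is an assumption here, since the full block layout of Figure A2 is not reproduced:

```python
import numpy as np

def build_mask(n_text, n_anno, n_marker, variant):
    """Binary attention mask over [text | annotation | marker] tokens.
    Same-type tokens are always mutually visible; `variant` is one of
    'full', 'anno_visible', or 'anno_invisible'."""
    n = n_text + n_anno + n_marker
    text = slice(0, n_text)
    anno = slice(n_text, n_text + n_anno)
    marker = slice(n_text + n_anno, n)
    mask = np.ones((n, n), dtype=int)
    if variant == "full":
        return mask  # every token attends to every other token
    # Assumption: both partial variants hide annotations from markers.
    mask[marker, anno] = 0
    mask[anno, marker] = 0
    if variant == "anno_invisible":
        mask[text, anno] = 0  # text tokens cannot attend to annotations
        mask[anno, text] = 0  # and vice versa
    return mask

mask = build_mask(n_text=3, n_anno=2, n_marker=2, variant="anno_invisible")
```

A 0 entry zeroes out the corresponding attention score (in practice via an additive −∞ before the softmax), so the row token never integrates the column token's semantics.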

References

  1. Tang, W.; Xu, B.; Zhao, Y.; Mao, Z.; Liu, Y.; Liao, Y.; Xie, H. UniRel: Unified Representation and Interaction for Joint Relational Triple Extraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing; Goldberg, Y., Kozareva, Z., Zhang, Y., Eds.; Association for Computational Linguistics: Abu Dhabi, United Arab Emirates, 2022; pp. 7087–7099. [Google Scholar] [CrossRef]
  2. Sun, C.; Gong, Y.; Wu, Y.; Gong, M.; Jiang, D.; Lan, M.; Sun, S.; Duan, N. Joint Type Inference on Entities and Relations via Graph Convolutional Networks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; Korhonen, A., Traum, D., Màrquez, L., Eds.; Association for Computational Linguistics: Florence, Italy, 2019; pp. 1361–1370. [Google Scholar] [CrossRef]
  3. Yang, P.; Cong, X.; Sun, Z.; Liu, X. Enhanced Language Representation with Label Knowledge for Span Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing; Moens, M.F., Huang, X., Specia, L., Yih, S.W.T., Eds.; Online; Association for Computational Linguistics: Punta Cana, Dominican Republic, 2021; pp. 4623–4635. [Google Scholar] [CrossRef]
  4. Ji, B.; Li, S.; Xu, H.; Yu, J.; Ma, J.; Liu, H.; Yang, J. Span-based joint entity and relation extraction augmented with sequence tagging mechanism. arXiv 2022, arXiv:2210.12720. [Google Scholar] [CrossRef]
  5. Shang, Y.M.; Huang, H.; Sun, X.; Wei, W.; Mao, X.L. Relational Triple Extraction: One Step is Enough. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22; Raedt, L.D., Ed.; International Joint Conferences on Artificial Intelligence Organization; Main Track: Forest Lake, QLD, Australia; Volume 7, pp. 4360–4366. [CrossRef]
  6. Dai, Z.; Wang, X.; Ni, P.; Li, Y.; Li, G.; Bai, X. Named entity recognition using BERT BiLSTM CRF for Chinese electronic health records. In Proceedings of the 2019 12th International Congress on Image and Signal Processing, Biomedical Engineering and Informatics (CISP-BMEI), Suzhou, China, 19–21 October 2019; pp. 1–5. [Google Scholar]
  7. Zhong, Z.; Chen, D. A Frustratingly Easy Approach for Entity and Relation Extraction. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tur, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y., Eds.; Online; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 50–61. [Google Scholar] [CrossRef]
  8. Ye, D.; Lin, Y.; Li, P.; Sun, M. Packed Levitated Marker for Entity and Relation Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Muresan, S., Nakov, P., Villavicencio, A., Eds.; Association for Computational Linguistics: Dublin, Ireland, 2022; pp. 4904–4917. [Google Scholar] [CrossRef]
  9. Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; Liu, Q. ERNIE: Enhanced Language Representation with Informative Entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; Korhonen, A., Traum, D., Màrquez, L., Eds.; Association for Computational Linguistics: Florence, Italy, 2019; pp. 1441–1451. [Google Scholar] [CrossRef]
  10. Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Tian, H.; Wu, H.; Wang, H. Ernie 2.0: A continual pre-training framework for language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8968–8975. [Google Scholar]
  11. Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-Efficient Transfer Learning for NLP. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 2790–2799. [Google Scholar]
  12. Liu, W.; Fu, X.; Zhang, Y.; Xiao, W. Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers); Zong, C., Xia, F., Li, W., Navigli, R., Eds.; Online; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 5847–5858. [Google Scholar] [CrossRef]
  13. Zhang, Y.; Yang, J. Chinese NER Using Lattice LSTM. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Gurevych, I., Miyao, Y., Eds.; Association for Computational Linguistics: Melbourne, Australia, 2018; pp. 1554–1564. [Google Scholar] [CrossRef]
  14. Sui, D.; Chen, Y.; Liu, K.; Zhao, J.; Liu, S. Leverage Lexical Knowledge for Chinese Named Entity Recognition via Collaborative Graph Network. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Hong Kong, China, 2019; pp. 3830–3840. [Google Scholar] [CrossRef]
  15. Li, X.; Yan, H.; Qiu, X.; Huang, X. FLAT: Chinese NER Using Flat-Lattice Transformer. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; Jurafsky, D., Chai, J., Schluter, N., Tetreault, J., Eds.; Online; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 6836–6842. [Google Scholar] [CrossRef]
  16. Qian, Y.; Santus, E.; Jin, Z.; Guo, J.; Barzilay, R. GraphIE: A Graph-Based Framework for Information Extraction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 751–761. [Google Scholar] [CrossRef]
  17. Daigavane, A.; Ravindran, B.; Aggarwal, G. Understanding Convolutions on Graphs, Understanding the Building Blocks and Design Choices of Graph Neural Networks. 2021. Available online: https://distill.pub/2021/understanding-gnns/ (accessed on 2 September 2021).
  18. Wu, L.; Chen, Y.; Shen, K.; Guo, X.; Gao, H.; Li, S.; Pei, J.; Long, B. Graph neural networks for natural language processing: A survey. Found. Trends® Mach. Learn. 2023, 16, 119–328. [Google Scholar] [CrossRef]
  19. Quirk, C.; Poon, H. Distant Supervision for Relation Extraction beyond the Sentence Boundary. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers; Lapata, M., Blunsom, P., Koller, A., Eds.; Association for Computational Linguistics: Valencia, Spain, 2017; pp. 1171–1182. [Google Scholar]
  20. Luo, Y.; Zhao, H. Bipartite Flat-Graph Network for Nested Named Entity Recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; Jurafsky, D., Chai, J., Schluter, N., Tetreault, J., Eds.; Online; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 6408–6418. [Google Scholar] [CrossRef]
  21. Sun, K.; Zhang, R.; Mao, Y.; Mensah, S.; Liu, X. Relation extraction with convolutional network over learnable syntax-transport graph. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8928–8935. [Google Scholar]
  22. Xue, F.; Sun, A.; Zhang, H.; Chng, E.S. Gdpnet: Refining latent multi-view graph for relation extraction. In Proceedings of the AAAI Conference on Artificial Intelligence, Electr Network, 2–9 February 2021; Volume 35, pp. 14194–14202. [Google Scholar]
  23. Zhang, Y.; Qi, P.; Manning, C.D. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing; Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J., Eds.; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 2205–2215. [Google Scholar] [CrossRef]
  24. Fu, T.J.; Li, P.H.; Ma, W.Y. GraphRel: Modeling Text as Relational Graphs for Joint Entity and Relation Extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; Korhonen, A., Traum, D., Màrquez, L., Eds.; Association for Computational Linguistics: Florence, Italy, 2019; pp. 1409–1418. [Google Scholar] [CrossRef]
  25. Guo, Z.; Zhang, Y.; Lu, W. Attention Guided Graph Convolutional Networks for Relation Extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; Korhonen, A., Traum, D., Màrquez, L., Eds.; Association for Computational Linguistics: Florence, Italy, 2019; pp. 241–251. [Google Scholar] [CrossRef]
  26. Sahu, S.K.; Christopoulou, F.; Miwa, M.; Ananiadou, S. Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; Korhonen, A., Traum, D., Màrquez, L., Eds.; Association for Computational Linguistics: Florence, Italy, 2019; pp. 4309–4316. [Google Scholar] [CrossRef]
  27. Christopoulou, F.; Miwa, M.; Ananiadou, S. Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Hong Kong, China, 2019; pp. 4925–4936. [Google Scholar] [CrossRef]
  28. Zeng, S.; Xu, R.; Chang, B.; Li, L. Double Graph Based Reasoning for Document-level Relation Extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP); Webber, B., Cohn, T., He, Y., Liu, Y., Eds.; Online; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 1630–1640. [Google Scholar] [CrossRef]
  29. Luan, Y.; Wadden, D.; He, L.; Shah, A.; Ostendorf, M.; Hajishirzi, H. A general framework for information extraction using dynamic span graphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 3036–3046. [Google Scholar] [CrossRef]
  30. McDonald, R.; Pereira, F.; Kulick, S.; Winters, S.; Jin, Y.; White, P. Simple algorithms for complex relation extraction with applications to biomedical IE. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), Stroudsburg, PA, USA, 25–30 June 2005; pp. 491–498. [Google Scholar]
  31. Iria, J. T-rex: A flexible relation extraction framework. In Proceedings of the 8th Annual Colloquium for the UK Special Interest Group for Computational Linguistics (CLUK’05), Manchester, UK, 2005; Volume 6, p. 9. [Google Scholar]
  32. Culotta, A.; Sorensen, J. Dependency tree kernels for relation extraction. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), Barcelona, Spain, 21–26 July 2004; pp. 423–429. [Google Scholar]
  33. Jiang, J.; Zhai, C. A systematic exploration of the feature space for relation extraction. In Proceedings of the Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Rochester, NY, USA, 22–27 April 2007; Proceedings of the Main Conference. pp. 113–120. [Google Scholar]
  34. Zeng, D.; Liu, K.; Lai, S.; Zhou, G.; Zhao, J. Relation classification via convolutional deep neural network. In Proceedings of the COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland, 23–29 August 2014; pp. 2335–2344. [Google Scholar]
  35. Bekoulis, G.; Deleu, J.; Demeester, T.; Develder, C. Joint entity recognition and relation extraction as a multi-head selection problem. Expert Syst. Appl. 2018, 114, 34–45. [Google Scholar] [CrossRef]
  36. Wei, Z.; Su, J.; Wang, Y.; Tian, Y.; Chang, Y. A novel cascade binary tagging framework for relational triple extraction. arXiv 2019, arXiv:1909.03227. [Google Scholar]
  37. Zheng, S.; Wang, F.; Bao, H.; Hao, Y.; Zhou, P.; Xu, B. Joint extraction of entities and relations based on a novel tagging scheme. arXiv 2017, arXiv:1706.05075. [Google Scholar] [CrossRef]
  38. Ji, B.; Yu, J.; Li, S.; Ma, J.; Wu, Q.; Tan, Y.; Liu, H. Span-based Joint Entity and Relation Extraction with Attention-based Span-specific and Contextual Semantic Representations. In Proceedings of the 28th International Conference on Computational Linguistics; Scott, D., Bel, N., Zong, C., Eds.; Online; Association for Computational Linguistics: Barcelona, Spain, 2020; pp. 88–99. [Google Scholar] [CrossRef]
  39. Wang, Y.; Sun, C.; Wu, Y.; Zhou, H.; Li, L.; Yan, J. UniRE: A Unified Label Space for Entity Relation Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers); Zong, C., Xia, F., Li, W., Navigli, R., Eds.; Online; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 220–231. [Google Scholar] [CrossRef]
40. Wang, Y.; Sun, C.; Wu, Y.; Li, L.; Yan, J.; Zhou, H. HIORE: Leveraging High-order Interactions for Unified Entity Relation Extraction. arXiv 2023, arXiv:2305.04297.
41. Yan, Z.; Yang, S.; Liu, W.; Tu, K. Joint Entity and Relation Extraction with Span Pruning and Hypergraph Neural Networks. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Singapore, 2023; pp. 7512–7526.
42. Zhu, T.; Ren, J.; Yu, Z.; Wu, M.; Zhang, G.; Qu, X.; Chen, W.; Wang, Z.; Huai, B.; Zhang, M. Mirror: A Universal Framework for Various Information Extraction Tasks. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Singapore, 2023; pp. 8861–8876.
43. Li, J.; Zhang, Y.; Liang, B.; Wong, K.F.; Xu, R. Set Learning for Generative Information Extraction. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Singapore, 2023; pp. 13043–13052.
44. Tang, R.; Chen, Y.; Qin, Y.; Huang, R.; Zheng, Q. Boundary regression model for joint entity and relation extraction. Expert Syst. Appl. 2023, 229, 120441.
45. Zaratiana, U.; Tomeh, N.; Holat, P.; Charnois, T. An Autoregressive Text-to-Graph Framework for Joint Entity and Relation Extraction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 19477–19487.
46. Wang, Y.; Liu, X.; Kong, W.; Yu, H.T.; Racharak, T.; Kim, K.S.; Le Nguyen, M. A Decoupling and Aggregating Framework for Joint Extraction of Entities and Relations. IEEE Access 2024, 12, 103313–103328.
47. Wang, X.; Zhou, W.; Zu, C.; Xia, H.; Chen, T.; Zhang, Y.; Zheng, R.; Ye, J.; Zhang, Q.; Gui, T.; et al. InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction. arXiv 2023, arXiv:2304.08085.
48. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9.
49. Han, J.; Zhao, S.; Cheng, B.; Ma, S.; Lu, W. Generative prompt tuning for relation classification. arXiv 2022, arXiv:2210.12435.
50. Chen, X.; Zhang, N.; Xie, X.; Deng, S.; Yao, Y.; Tan, C.; Huang, F.; Si, L.; Chen, H. Knowprompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction. In Proceedings of the ACM Web Conference 2022, Online, 25–29 April 2022; pp. 2778–2788.
51. Wei, X.; Cui, X.; Cheng, N.; Wang, X.; Zhang, X.; Huang, S.; Xie, P.; Xu, J.; Chen, Y.; Zhang, M.; et al. Zero-shot information extraction via chatting with ChatGPT. arXiv 2023, arXiv:2302.10205.
52. Zhang, Z.; Yang, Y.; Chen, B. A prompt tuning method based on relation graphs for few-shot relation extraction. Neural Netw. 2025, 185, 107214.
53. Cui, X.; Yang, Y.; Li, D.; Cui, J.; Qu, X.; Song, C.; Liu, H.; Ke, S. PURE: A Prompt-based framework with dynamic Update mechanism for educational Relation Extraction. Complex Intell. Syst. 2025, 11, 1–14.
54. Han, P.; Liang, G.; Wang, Y. A Zero-Shot Framework for Low-Resource Relation Extraction via Distant Supervision and Large Language Models. Electronics 2025, 14, 593.
55. Duan, J.; Lu, F.; Liu, J. CPTuning: Contrastive Prompt Tuning for Generative Relation Extraction. arXiv 2025, arXiv:2501.02196.
56. Li, Q.; Ji, H. Incremental Joint Extraction of Entity Mentions and Relations. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Toutanova, K., Wu, H., Eds.; Association for Computational Linguistics: Baltimore, MD, USA, 2014; pp. 402–412.
57. Miwa, M.; Bansal, M. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 1105–1116.
58. Luan, Y.; He, L.; Ostendorf, M.; Hajishirzi, H. Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing; Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J., Eds.; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 3219–3232.
59. Gurulingappa, H.; Rajput, A.M.; Roberts, A.; Fluck, J.; Hofmann-Apitius, M.; Toldo, L. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. J. Biomed. Inform. 2012, 45, 885–892.
60. Ma, J.; Ballesteros, M.; Doss, S.; Anubhai, R.; Mallya, S.; Al-Onaizan, Y.; Roth, D. Label Semantics for Few Shot Named Entity Recognition. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022; Muresan, S., Nakov, P., Villavicencio, A., Eds.; Association for Computational Linguistics: Dublin, Ireland, 2022; pp. 1956–1971.
61. He, S.; Liu, K.; Ji, G.; Zhao, J. Learning to Represent Knowledge Graphs with Gaussian Embedding. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (CIKM ’15), New York, NY, USA, 18–23 October 2015; pp. 623–632.
62. Liu, T.; Jiang, Y.E.; Monath, N.; Cotterell, R.; Sachan, M. Autoregressive Structured Prediction with Language Models. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022; Goldberg, Y., Kozareva, Z., Zhang, Y., Eds.; Association for Computational Linguistics: Abu Dhabi, United Arab Emirates, 2022; pp. 993–1005.
63. Wang, S.; Sun, X.; Li, X.; Ouyang, R.; Wu, F.; Zhang, T.; Li, J.; Wang, G. GPT-NER: Named Entity Recognition via Large Language Models. arXiv 2023, arXiv:2304.10428.
64. Wan, Z.; Cheng, F.; Mao, Z.; Liu, Q.; Song, H.; Li, J.; Kurohashi, S. GPT-RE: In-context Learning for Relation Extraction using Large Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Singapore, 2023; pp. 3534–3547.
65. Li, B.; Fang, G.; Yang, Y.; Wang, Q.; Ye, W.; Zhao, W.; Zhang, S. Evaluating ChatGPT’s Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness. arXiv 2023, arXiv:2304.11633.
66. Wadden, D.; Wennberg, U.; Luan, Y.; Hajishirzi, H. Entity, Relation, and Event Extraction with Contextualized Span Representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Hong Kong, China, 2019; pp. 5784–5789.
67. Eberts, M.; Ulges, A. Span-based Joint Entity and Relation Extraction with Transformer Pre-training. arXiv 2019, arXiv:1909.07755.
68. Wang, J.; Lu, W. Two are Better than One: Joint Entity and Relation Extraction with Table-Sequence Encoders. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP); Webber, B., Cohn, T., He, Y., Liu, Y., Eds.; Association for Computational Linguistics: Online, 2020; pp. 1706–1721.
69. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186.
70. Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Hong Kong, China, 2019; pp. 3615–3620.
Figure 1. An illustration of different representation enhancement methods. t denotes an individual token from the text. The bordered rectangles highlighted in assorted colors signify discrete elements: blue, light yellow (or green), and pink indicate entity markers, groups of entity tokens with their trailing markers, and groups of label annotation tokens, respectively. Bidirectionally connected squares sharing the same color refer to elements with identical position IDs.
Figure 2. Illustration of LAI-Net, where annotation interaction is highlighted in red. The left and right parts of the architecture represent the RE and NER phases, respectively. In addition, ▴ indicates the span representation, and different types of tokens are marked with the same color as their corresponding embeddings.
Figure 3. Examples of using LAI-Net and PL-Marker on the RE subtask.
Table 1. The main overall results. ⋆, ∘, and •, respectively, indicate decoder-only, encoder–decoder, and encoder-only frameworks. Bold font indicates the optimal performance, while underline indicates the suboptimal performance.
| Dataset | Model | Backbone | NER (P / R / F1) | RE (P / R / F1) | RE+ (P / R / F1) |
|---|---|---|---|---|---|
| ACE05 | SPAN (2020) [38] | • Bert-base | 89.32 / 89.86 / 89.59 | - | 71.22 / 60.19 / 65.24 |
| ACE05 | UniRE (2021) [39] | • Bert-base | 88.80 / 88.90 / 88.80 | - | 67.10 / 61.80 / 64.30 |
| ACE05 | PURE (2021) [7] | • Bert-base | - / - / 90.20 | - / - / 67.70 | - / - / 64.60 |
| ACE05 | PL-Marker (2022) [8] | • Bert-base | - / - / 89.70 | - / - / 68.80 | - / - / 66.30 |
| ACE05 | ASP (2022) [62] | ∘ T5-base | - / - / 90.70 | - / - / 71.10 | - / - / 68.60 |
| ACE05 | HIORE (2023) [40] | • Bert-base | - / - / 89.60 | - | - / - / 65.80 |
| ACE05 | HGERE (2023) [41] | • Bert-base | - / - / 89.60 | - | - / - / 65.80 |
| ACE05 | Mirror (2023) [42] | ∘ DeBERTa-v3 | - / - / 86.72 | - | - / - / 64.88 |
| ACE05 | GPT-NER (2023) [63] | ⋆ GPT3 | 72.77 / 75.51 / 73.59 | - | - |
| ACE05 | GPT-RE (2023) [64] | ⋆ GPT3 | - | - | - / - / 68.73 |
| ACE05 | ChatGPT (2023) [65] | ⋆ ChatGPT | - | - | - / - / 40.50 |
| ACE05 | SET (2023) [43] | ∘ T5-large | - | - | - / - / 65.90 |
| ACE05 | BR (2023) [44] | • Albert | - / - / 90.80 | - | - / - / 66.00 |
| ACE05 | ATG (2024) [45] | ∘ DeBERTa-v3 | - / - / 90.10 | - / - / 68.70 | - / - / 66.20 |
| ACE05 | BiDArtER (2024) [46] | • Albert | - / - / 89.80 | - | - / - / 68.40 |
| ACE05 | LAI-Net (Ours) | • Bert-base | 90.28 / 90.60 / 90.44 | 73.80 / 70.42 / 72.06 | 71.96 / 68.67 / 70.27 |
| SciERC | DyGIE++ (2019) [66] | • SciBert | - / - / 67.50 | - | - / - / 48.40 |
| SciERC | Spert (2019) [67] | • SciBert | 70.87 / 69.79 / 70.33 | - | 53.40 / 48.54 / 50.84 |
| SciERC | UniRE (2021) [39] | • SciBert | 65.80 / 71.10 / 68.40 | - | 37.30 / 36.60 / 36.90 |
| SciERC | PURE (2021) [7] | • SciBert | - / - / 68.20 | - / - / 50.10 | - / - / 36.70 |
| SciERC | PL-Marker (2022) [8] | • SciBert | - / - / 69.90 | - / - / 52.00 | - / - / 40.60 |
| SciERC | HIORE (2023) [40] | • SciBert | - / - / 68.20 | - | - / - / 38.30 |
| SciERC | Mirror (2023) [42] | ∘ DeBERTa-v3 | - | - | - / - / 36.66 |
| SciERC | ChatGPT (2023) [65] | ⋆ ChatGPT | - | - | - / - / 25.90 |
| SciERC | InstructUIE (2023) [47] | ⋆ FlanT5-11B | - | - | - / - / 45.15 |
| SciERC | GPT-RE (2023) [64] | ⋆ GPT3 | - | - | - / - / 69.00 |
| SciERC | SET (2023) [43] | ∘ T5-large | - | - | - / - / 35.90 |
| SciERC | ATG (2024) [45] | • SciBert | - / - / 69.70 | - / - / 51.10 | - / - / 38.60 |
| SciERC | BiDArtER (2024) [46] | • SciBert | - / - / 69.40 | - | - / - / 39.90 |
| SciERC | LAI-Net (Ours) | • SciBert | 70.04 / 69.89 / 69.94 | 65.56 / 68.48 / 66.99 | 59.84 / 62.01 / 60.88 |
| ADE | Spert (2019) [67] | • Bert-base | 89.02 / 88.87 / 88.94 | - | 78.09 / 80.43 / 79.24 |
| ADE | Table-Sequence (2020) [68] | • Bert-base | - / - / 89.70 | - | - / - / 80.10 |
| ADE | SPAN (2020) [38] | • Bert-base | 89.88 / 91.32 / 90.59 | - | 79.56 / 81.93 / 80.73 |
| ADE | LAI-Net (Ours) | • Bert-base | 89.78 / 91.24 / 90.49 | 80.48 / 83.79 / 82.09 | 79.37 / 83.28 / 81.25 |
Table 2. The ablation F1 results with respect to the number of GCN layers, the number of attention heads, and whether the entity filter is embedded. Bold font indicates the optimal performance.
| Dataset | Task | GCN layers (0 / 1 / 2 / 3 / 4 / 5) | Attention heads (1 / 2 / 3 / 4 / 6) | Entity filter (w / w/o) |
|---|---|---|---|---|
| ACE05 | NER | 90.23 / 89.95 / 90.44 / 89.92 / 90.03 / 89.94 | 90.16 / 90.21 / 90.30 / 90.44 / 90.24 | 90.44 / 88.70 |
| ACE05 | RE | 68.05 / 68.38 / 72.06 / 69.37 / 69.26 / 69.53 | 71.75 / 72.06 / 72.16 / 71.84 / 71.37 | - |
| ACE05 | RE+ | 65.61 / 65.75 / 70.27 / 66.93 / 66.70 / 66.88 | 69.54 / 70.27 / 70.03 / 69.86 / 69.21 | - |
| ADE | NER | 90.49 / 90.18 / 90.23 / 90.32 / 90.28 / 90.17 | - | 90.49 / 89.92 |
| ADE | RE | 80.99 / 82.09 / 81.64 / 80.71 / 81.25 / 81.04 | 81.42 / 81.79 / 82.09 / 81.21 / 80.94 | - |
| ADE | RE+ | 80.99 / 81.25 / 80.81 / 80.39 / 80.95 / 80.63 | 80.83 / 81.02 / 81.25 / 80.88 / 80.55 | - |
| SciERC | NER | 69.40 / 69.94 / 69.47 / 69.17 / 69.31 / 69.40 | 69.42 / 69.76 / 69.32 / 69.94 / 69.23 | 69.94 / 69.76 |
| SciERC | RE | 66.08 / 66.27 / 66.99 / 66.43 / 66.35 / 66.28 | 65.80 / 65.66 / 65.62 / 66.99 / 64.48 | - |
| SciERC | RE+ | 60.49 / 60.57 / 60.88 / 59.91 / 60.82 / 60.06 | 60.24 / 60.15 / 60.61 / 60.88 / 59.56 | - |
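As a rough intuition for the GCN-layer ablation above, the sketch below stacks a configurable number of propagation steps over a toy graph. The adjacency matrix, feature vectors, and mean-aggregation rule are illustrative placeholders only, not LAI-Net's actual latent multi-view graph construction or its learned weight matrices.

```python
def gcn_layer(adj, feats):
    """One simplified propagation step: each node averages its neighbors'
    features (self-loop included) -- a stand-in for A_hat @ H @ W without
    the learned projection."""
    n = len(feats)
    dim = len(feats[0])
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j] or i == j]
        out.append([sum(feats[j][d] for j in neigh) / len(neigh) for d in range(dim)])
    return out

def gcn(adj, feats, num_layers):
    """Stack `num_layers` propagation steps, as varied in the ablation (0-5)."""
    for _ in range(num_layers):
        feats = gcn_layer(adj, feats)
    return feats

# Toy graph: nodes 0 and 1 connected, node 2 isolated.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
feats = [[1.0], [3.0], [5.0]]
print(gcn(adj, feats, 1))  # connected nodes mix to 2.0; the isolated node keeps 5.0
```

With zero layers the features pass through untouched, which is why the 0-layer column in the table amounts to disabling the graph interaction entirely.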
Table 3. The inference speed comparison results. Bold font indicates the optimal performance.
| Dataset | Task | Metric | PL-Marker | LAI-Net |
|---|---|---|---|---|
| ACE05 | NER | F1 | 89.70 | 90.44 |
| ACE05 | NER | Speed (sent/s) | 62.94 | 35.10 (−44.23%) |
| ACE05 | RE | F1 | 66.30 | 70.27 |
| ACE05 | RE | Speed (sent/s) | 93.14 | 43.00 (−53.83%) |
| SciERC | NER | F1 | 69.90 | 69.94 |
| SciERC | NER | Speed (sent/s) | 54.13 | 52.17 (−3.62%) |
| SciERC | RE | F1 | 40.60 | 60.88 |
| SciERC | RE | Speed (sent/s) | 93.29 | 39.57 (−57.58%) |
Table 4. The ablation results for the two rounds of interaction. Bold font indicates the optimal performance.
| Dataset | Task | Method | P | R | F1 |
|---|---|---|---|---|---|
| ACE05 | NER | LAI-Net | 90.28 | 90.60 | 90.44 |
| ACE05 | NER | w/o 2nd | 89.97 (−0.31) | 90.50 (−0.10) | 90.23 (−0.21) |
| ACE05 | NER | w/o 1st | 89.72 (−0.55) | 90.68 (+0.08) | 90.20 (−0.24) |
| ACE05 | RE | LAI-Net | 73.80 | 70.42 | 72.06 |
| ACE05 | RE | w/o 2nd | 69.70 (−4.09) | 66.49 (−3.94) | 68.05 (−4.01) |
| ACE05 | RE | w/o 1st | 68.64 (−5.16) | 67.16 (−3.26) | 67.89 (−4.17) |
| ACE05 | RE+ | LAI-Net | 71.96 | 68.67 | 70.27 |
| ACE05 | RE+ | w/o 2nd | 67.20 (−4.77) | 64.10 (−4.58) | 65.61 (−4.67) |
| ACE05 | RE+ | w/o 1st | 66.47 (−5.49) | 64.35 (−4.32) | 65.39 (−4.89) |
| ADE | NER | LAI-Net | 89.78 | 91.24 | 90.49 |
| ADE | NER | w/o 2nd | - | - | - |
| ADE | NER | w/o 1st | 88.94 (−0.84) | 91.07 (−0.17) | 89.99 (−0.51) |
| ADE | RE | LAI-Net | 79.38 | 83.29 | 81.26 |
| ADE | RE | w/o 2nd | 79.01 (−0.37) | 83.17 (−0.12) | 81.04 (−0.22) |
| ADE | RE | w/o 1st | 79.08 (−0.30) | 82.99 (−0.30) | 80.99 (−0.27) |
| ADE | RE+ | LAI-Net | 79.37 | 83.28 | 81.25 |
| ADE | RE+ | w/o 2nd | 78.73 (−0.64) | 82.64 (−0.64) | 80.63 (−0.62) |
| ADE | RE+ | w/o 1st | 78.47 (−0.91) | 82.44 (−0.84) | 80.41 (−0.85) |
| SciERC | NER | LAI-Net | 70.04 | 69.89 | 69.94 |
| SciERC | NER | w/o 2nd | 69.82 (−0.22) | 68.98 (−0.91) | 69.40 (−0.54) |
| SciERC | NER | w/o 1st | 69.58 (−0.46) | 69.12 (−0.77) | 69.35 (−0.59) |
| SciERC | RE | LAI-Net | 65.56 | 68.48 | 66.99 |
| SciERC | RE | w/o 2nd | 64.21 (−1.35) | 68.07 (−0.41) | 66.08 (−0.91) |
| SciERC | RE | w/o 1st | 63.96 (−1.60) | 67.82 (−0.66) | 65.83 (−1.16) |
| SciERC | RE+ | LAI-Net | 59.84 | 62.01 | 60.88 |
| SciERC | RE+ | w/o 2nd | 59.79 (−0.05) | 61.22 (−0.80) | 60.49 (−0.39) |
| SciERC | RE+ | w/o 1st | 59.81 (−0.03) | 60.47 (−1.54) | 60.14 (−0.74) |
Table 5. The ablation results for the attention mask matrix, where Vis. indicates visible and Inv. indicates invisible.
| Dataset | Mask | NER | RE | RE+ |
|---|---|---|---|---|
| ACE05 | Inv. | 90.44 | 72.06 | 70.27 |
| ACE05 | Vis. | 90.36 (−0.08) | 68.01 (−4.05) | 65.57 (−4.70) |
| ACE05 | Full | 88.29 (−2.14) | 65.79 (−6.28) | 64.28 (−5.99) |
| ADE | Inv. | 90.49 | 82.09 | 81.25 |
| ADE | Vis. | 90.09 (−0.40) | 81.26 (−0.83) | 80.09 (−1.16) |
| ADE | Full | 89.40 (−1.10) | 79.99 (−2.10) | 78.92 (−2.33) |
| SciERC | Inv. | 69.94 | 66.99 | 60.88 |
| SciERC | Vis. | 69.13 (−0.81) | 66.44 (−0.55) | 60.61 (−0.27) |
| SciERC | Full | 66.21 (−3.73) | 66.06 (−0.93) | 60.30 (−0.59) |
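The three mask settings ablated above can be pictured as block-structured attention masks over a sequence of text tokens followed by label annotation tokens. The sketch below is a minimal illustration under assumed visibility rules (one-way information flow in the "Inv." and "Vis." variants); the exact layout and visibility rules of LAI-Net's partial attention mechanism may differ.

```python
def build_mask(n_text, n_label, variant):
    """Return an (n, n) 0/1 matrix where mask[i][j] = 1 means token i may
    attend to token j. Tokens 0..n_text-1 are text; the rest are labels.

    Assumed semantics (illustrative only):
      "invisible": text tokens cannot see label tokens; labels see everything.
      "visible":   text tokens see everything; labels see only labels.
      "full":      no restriction.
    """
    n = n_text + n_label
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            i_text, j_text = i < n_text, j < n_text
            if variant == "full":
                mask[i][j] = 1
            elif variant == "invisible":
                # text -> label blocked; all other directions allowed
                mask[i][j] = 1 if (not i_text or j_text) else 0
            elif variant == "visible":
                # label -> text blocked; all other directions allowed
                mask[i][j] = 1 if (i_text or not j_text) else 0
    return mask

m = build_mask(3, 2, "invisible")
print(m[0][3], m[3][0])  # a text token is blocked from a label; not vice versa
```

Under this reading, "Inv." keeps label semantics from leaking directly into text-token representations, which is consistent with it being the strongest setting in the table.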

Share and Cite

MDPI and ACS Style

Lai, R.; Wu, W.; Zou, L.; Liao, F.; Wang, Z.; Mi, H. LAI: Label Annotation Interaction-Based Representation Enhancement for End to End Relation Extraction. Big Data Cogn. Comput. 2025, 9, 198. https://doi.org/10.3390/bdcc9080198
