Modeling Semantic-Aware Prompt-Based Argument Extractor in Documents

Zhou, Yipeng; Fan, Jiaxin; Zhang, Qingchuan; Zhu, Lin; Sun, Xingchen

doi:10.3390/app15105279

by Yipeng Zhou ^1,2,*, Jiaxin Fan ^1,2, Qingchuan Zhang ^1,2,*, Lin Zhu ^2 and Xingchen Sun ^2

^1 Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China

^2 School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China

^* Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(10), 5279; https://doi.org/10.3390/app15105279

Submission received: 9 March 2025 / Revised: 5 May 2025 / Accepted: 6 May 2025 / Published: 9 May 2025

Abstract

Event extraction aims to identify and structure event information from unstructured text, playing a critical role in real-world applications such as news analysis, public opinion discovery, and intelligence gathering. Traditional approaches, however, struggle with event co-occurrence and long-distance dependencies. To address these challenges, we introduce the Semantic-aware Prompt-based Argument Extractor (SPARE) model, which integrates entity extraction, heterogeneous graph construction, event type detection, and argument filling. By constructing a document–sentence–entity heterogeneous graph and employing graph convolutional networks (GCNs), the model effectively captures global semantic associations and interactions between cross-sentence triggers and arguments. Additionally, a position-aware semantic role (SRL) attention mechanism is proposed to enhance the association between semantic and positional information, improving argument extraction accuracy in the context of event co-occurrence. Experiments on the Roles Across Multiple Sentences (RAMS) and WikiEvents datasets show considerable F1 score improvements, which confirm the model’s effectiveness.

Keywords: event extraction; document-level event argument extraction; graph neural networks

1. Introduction

Event extraction (EE) [1] focuses on extracting events of interest to users from unstructured text, often including identifying event trigger words, event types, and arguments such as participants, time, and place in the event, and presenting them in a structured form. Event extraction involves two core subtasks: event detection and event argument extraction (EAE). The former aims to identify the events contained in the text, while the latter focuses on identifying the arguments involved in these events and their roles.

In event argument extraction, arguments refer to both entities and non-entity components of an event, such as time and location, while argument roles define their semantic relationship with event triggers. In the example in Figure 1, in a “life.die” event, the trigger word “killed” links to role–argument pairs such as “killer”–“a man” and “victims”–“people”. Similarly, in a “life.injure” event, the trigger word “injured” corresponds to the role–argument pairs “victim”–“girl” and “injurer”–“a man”. Event argument extraction has a wide range of applications, from recommendation systems [2] and financial intelligence [3,4] to question-answering systems [5,6,7]. At the sentence level [8,9], the task focuses on identifying arguments within individual sentences. At the document level [10,11], however, it requires analyzing multiple sentences to uncover argument relationships and shared contexts. One of the major challenges in document-level extraction is event co-occurrence, in which multiple related or independent events appear in the same text. Effectively addressing this phenomenon is crucial for improving extraction accuracy and efficiency, as it not only enhances the generalization ability of the model but also helps uncover deeper semantic connections between events. Consider a scenario where Event 1 (“killed”) and Event 2 (“injured”) share the same subject, “a man”, suggesting a temporal link between them. Additionally, Event 3 (“apprehended”) might refer to the same individual, weaving a complex web of interconnected events. Capturing such patterns requires models capable of recognizing argument sharing and underlying semantic structures. However, traditional event argument extraction methods often treat each event as an isolated case, breaking the text into separate samples. This approach not only weakens the ability to detect event relationships but also makes it harder to capture deeper meanings, such as causal links and chronological order.

In summary, numerous real-world documents depict multiple, interconnected events that span across sentences or even paragraphs. This document-level complexity presents two significant challenges for current event argument extraction. First, long-distance dependencies: the trigger words and arguments of events may be distributed in different sentences or even different paragraphs, making it difficult for traditional single-sentence modeling methods to capture these dispersed pieces of information. This long-distance dependency requires the model to possess cross-sentence reasoning and global modeling capabilities. Second, event co-occurrence: a document may contain multiple events, and there may be complex relationships between events, such as argument sharing, temporal order, or causal associations. Existing methods find it challenging to comprehensively capture the semantic dependencies in event co-occurrence, which affects the accuracy of extraction and overall performance.

To address these challenges, we present the Semantic-Aware Prompt-based Argument Extractor (SPARE) model, which also enhances the practical utility of event extraction in large-scale applications such as news analysis, public opinion discovery, and intelligence gathering. The SPARE model integrates entity extraction, heterogeneous graph construction, event type detection, and argument filling based on table generation, aiming to achieve efficient event argument extraction. First, it encodes the context and identifies entities in the text with Conditional Random Fields (CRFs), providing the basis for subsequent modules. CRFs capture contextual dependencies by modeling global features, thereby enhancing the accuracy and coherence of entity recognition. The model then constructs a heterogeneous graph with documents, sentences, and entities as nodes, and captures long-distance dependencies and event co-occurrence information through graph convolutional networks (GCNs), generating a document-level global representation. Using this global representation, the model identifies event types, addressing the problem of event co-occurrence. Moreover, the model introduces a position-aware semantic role attention mechanism that strengthens the semantic association between trigger words and arguments. Finally, through span selection, the model extracts event arguments from the text and fills them into predefined argument slots, achieving end-to-end structured extraction of event information. This design can efficiently handle long-distance dependencies and complex event co-occurrence situations. Our experimental evaluations on the RAMS and WikiEvents datasets reveal a statistically significant improvement in F1 scores. These results validate the performance of the proposed method in the event extraction task.

Our contributions can be summarized as follows:

(1): To address the challenges of long-distance dependencies, a document–sentence–entity heterogeneous graph is constructed and graph convolutional networks (GCNs) are employed to model global semantic associations. This approach effectively captures interactions between cross-sentence triggers and arguments, enabling the model to better handle information dispersed across different sentences and paragraphs.
(2): To tackle the challenges of event co-occurrence, a position-aware semantic role (SRL) attention mechanism is proposed. This mechanism strengthens the association between semantic and positional information, thereby improving the accuracy of argument extraction and allowing the model to more effectively handle the complex relationships between multiple events in a document.
(3): We conducted comprehensive evaluations of SPARE on two widely recognized benchmark datasets in the field of event argument extraction: RAMS and WikiEvents. The experimental results show that SPARE surpasses the latest baseline methods.

The remainder of this paper is structured as follows: Section 2 presents a comprehensive literature review of event type recognition, document-level event argument extraction, and joint event extraction. Section 3 outlines the task definition and introduces two proposed models: a model for dynamic recognition of event types based on graph neural networks and an event argument extraction model based on table generation. Section 4 describes the datasets and evaluation metrics utilized in the study, details the experimental parameterization, compares the proposed models with baseline approaches, presents the main experimental results, and includes ablation experiments to analyze the contributions of different components. Section 5 provides a detailed cross-event correlation analysis and evaluates the model’s semantic capture capabilities in both inter-event and intra-event correlation perspectives. Section 6 presents case studies to further illustrate the practical applications and effectiveness of the proposed models. Finally, Section 7 summarizes the key findings of this study and offers concluding remarks.

2. Related Work

2.1. Pre-Trained Language Models for Event Extraction

In recent years, pre-trained language models (PLMs) have achieved remarkable success across a wide range of natural language processing (NLP) tasks, including event extraction. Among these, BERT (Bidirectional Encoder Representations from Transformers) [12] and GPT (Generative Pre-trained Transformer) [13,14] represent two influential paradigms: bidirectional masked language modeling and unidirectional autoregressive generation, respectively.

BERT-based models have significantly advanced event extraction tasks by leveraging deep bidirectional contextual representations. For instance, Shi et al. [15] proposed a hybrid framework that employs separate encoders for event detection and argument extraction, effectively mitigating feature interference between tasks and enhancing overall performance. Additionally, Wan et al. [16] introduced a joint document-level event extraction approach using a Token–Token Bidirectional Event Completed Graph (TT-BECG), which captures intricate token-level relationships and improves the extraction of complex event structures. These studies demonstrate the robustness and adaptability of BERT-based models in handling diverse event extraction scenarios.

On the other hand, the GPT series—particularly GPT-3 [14], GPT-3.5, and GPT-4—have shown remarkable performance in generative and reasoning-heavy tasks via few-shot or zero-shot prompting. GPT-3.5 serves as an improved intermediate step between GPT-3 and GPT-4, with enhanced reasoning ability and reduced hallucinations. Gao et al. [17] explored the feasibility of using ChatGPT-3.5 for event extraction and found that, while ChatGPT can perform the task, its performance is approximately 51% of that of specialized models in complex scenarios, highlighting challenges in robustness and prompt sensitivity. Conversely, Wei et al. [18] proposed ChatIE, a two-stage framework that transforms zero-shot information extraction into a multi-turn question-answering problem, enabling ChatGPT to achieve impressive results and even surpass some fully supervised models on certain datasets. These findings suggest that, while GPT-based models show promise in event extraction, especially in low-resource settings, careful prompt design and task decomposition are crucial for optimal performance.

From a mathematical perspective, both BERT and GPT are based on the Transformer architecture [19], but they differ fundamentally in their pre-training objectives and attention mechanisms. BERT is trained using a masked language modeling (MLM) objective. Given an input sequence $X = [x_{1}, x_{2}, \dots, x_{n}]$, the model randomly masks tokens and learns to predict them based on bidirectional context:

$$L_{MLM} = - \sum_{i \in M} \log P(x_{i} \mid X_{\setminus i}) \tag{1}$$

where $M$ is the set of masked positions and $X_{\setminus i}$ denotes the masked sequence. This bidirectional encoding enables better reasoning over global context. Its attention mechanism can be expressed as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_{k}}}\right) V \tag{2}$$

In contrast, GPT uses a causal language modeling (CLM) objective:

$$L_{CLM} = - \sum_{t = 1}^{n} \log P(x_{t} \mid x_{1}, x_{2}, \dots, x_{t - 1}) \tag{3}$$

and employs a causal mask to block forward attention:

$$\mathrm{Attention}_{GPT}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_{k}}} + M\right) V \tag{4}$$

where $M$ is a mask matrix enforcing the left-to-right constraint. This unidirectional nature is advantageous for generative tasks but limits GPT’s performance in extraction settings where bidirectional context is essential.
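The difference between Equations (2) and (4) can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: it uses NumPy, random toy inputs, and a single attention head with no learned projections; the function names `softmax` and `attention` are chosen here and do not come from the paper. The additive mask plays the role of the matrix M in Equation (4), which is a different object from the masked-position set M in Equation (1).

```python
import numpy as np

def softmax(scores):
    # Subtract the row maximum for numerical stability.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention; causal=True adds the mask of Equation (4)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        n = scores.shape[0]
        # Additive mask: 0 on and below the diagonal, -inf above it,
        # so position t cannot attend to positions after t.
        mask = np.triu(np.full((n, n), -np.inf), k=1)
        scores = scores + mask
    weights = softmax(scores)
    return weights @ V, weights

# Toy inputs (assumption): 4 tokens, dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

_, bidirectional = attention(Q, K, V)          # Equation (2)
_, causal = attention(Q, K, V, causal=True)    # Equation (4)
```

In the bidirectional case every token receives non-zero weight from every other token, whereas the causal weights are lower triangular, which is the property the text identifies as limiting for span extraction.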

Beyond BERT and GPT, other advanced PLMs have been developed with stronger long-document modeling capabilities, which are also relevant for document-level event argument extraction (D-EAE). Models such as Longformer [20] and BigBird [21] extend the Transformer to handle longer sequences efficiently, while T5 [22] and PaLM [23] offer a unified text-to-text framework across NLP tasks. Although their direct application to event argument extraction remains limited, these models provide potential directions for enhancing document-level extraction in future work.

Overall, while both BERT and GPT-based models demonstrate unique strengths, BERT-style architectures remain more advantageous for structured span-level tasks like D-EAE due to their bidirectional encoding and fine-tuning flexibility. This study builds upon the BERT framework with targeted improvements and systematically compares its performance with GPT models to better understand their respective strengths and limitations in document-level argument extraction.

2.2. Event Type Recognition

Event type recognition plays a pivotal role in information extraction, with substantial progress achieved through various methodologies designed to address the complexities of open-domain text, the diversity of event expressions, and the scarcity of event-related data. Traditional approaches, such as bottom-up concept linking [24] and top-down clustering [25,26], faced limitations due to their dependence on external knowledge resources or predefined schema templates. In contrast, recent innovations have focused on harnessing the capabilities of large pre-trained language models (PLMs) to generate high-quality and comprehensive event schemas. For example, Shen et al. [27] proposed ETYPECLUS, a method that models events as (predicate, object) pairs and employs a reconstruction-based clustering technique to identify event types. Huang and Ji [28] introduced SS-VQ-VAE, a semi-supervised framework that utilizes a vector-quantized variational autoencoder to learn discrete event representations. Li et al. [29] developed TABS, a framework that integrates type abstraction with co-training to enhance the recognition of relation and event types by capitalizing on the complementary strengths of token-based and mask-based representations. Moreover, Tang et al. [30] developed ESHer, which uses PLMs to generate event schemas by structuring them based on confidence and aggregating them with graph techniques, effectively addressing the challenges of sparse and diverse event data.

However, despite these advancements, event type recognition still faces limitations that exacerbate existing challenges. The long-distance dependency issue hinders the model’s ability to capture coherent event structures, as critical event-related information (e.g., triggers and arguments) is often scattered across sentences or paragraphs, leading to incomplete or fragmented event representations. To address this, we propose a novel approach that constructs a document–sentence–entity heterogeneous graph and employs graph convolutional networks (GCNs) to model global semantic associations. This method effectively captures cross-sentence interactions, enabling the model to integrate dispersed event information and significantly improve event type recognition in long-text scenarios.

2.3. Document-Level Event Argument Extraction

Currently, there are four main strategies in the research methodology for document-level event argument extraction (DocEAE).

(1): Traditional classification-based approaches: these generate candidate arguments and then classify, for each role, whether a candidate fills that role. For example, Xu et al. [31] modeled document semantics using Abstract Meaning Representation (AMR) graphs, and Liu et al. [32] used the STCP methodology to introduce role correlations to enhance accuracy. In addition, Tan et al. [33] incorporated knowledge distillation with association modeling to improve the capture of argument role dependencies in event structures, thereby enhancing overall model performance.
(2): Span selection-based approaches: these avoid the complexity of candidate generation by selecting the text span of an argument directly in the document. For example, Ma et al. [34] designed a prompt template (PAIE) that generates two span selectors for each role, which capture the start and end boundary positions of arguments. Nguyen et al. [35] extended the PAIE approach by introducing soft prompts to more flexibly utilize contextual information. Li et al. [36] went further by constructing a network of dependency-aware graphs within and between events to model the role dependencies in events. Zhang et al. [37] captured long-distance dependencies through a sparse attention mechanism, which effectively improved the extraction accuracy and efficiency. Zhang et al. [38] proposed a hyperspherical multi-prototype model with optimal transport to assign arguments to prototypes, guiding the learning of argument representations for event argument extraction.
(3): Machine Reading Comprehension (MRC)-based approaches: the task is converted to machine reading comprehension, and argument extraction is achieved by asking questions and identifying the answers in the text. For example, Wei et al. [39] enhanced the inference ability of the model by using other arguments of the same event and their roles as clues, thereby capturing the semantic relationships among arguments.
(4): Text generation-based approaches: the task is formulated as text generation to realize event argument extraction. Du et al. [40] extended the generative model to capture the association semantics between multiple events. Ren et al. [41] incorporated retrieval enhancement techniques into the generative model for better generation of argument information, which provides diversified solution ideas for event argument extraction.

Although these strategies have made progress, they still face certain limitations. Traditional generative frameworks, while effective in modeling complex event semantic relationships, often employ autoregressive mechanisms that rely on sequence order, limiting the efficiency of parallel processing of multiple events. This sequential dependency hampers the model’s ability to handle long-distance dependencies and complicates the extraction of arguments spanning multiple sentences. To address these issues, a document–sentence–entity heterogeneous graph is constructed and graph convolutional networks (GCNs) are employed to model global semantic associations, enabling more efficient handling of dispersed information. Additionally, a position-aware semantic role (SRL) attention mechanism is proposed to strengthen the association between semantic and positional information, improving argument extraction accuracy and the model’s ability to manage complex event relationships in documents.

2.4. Joint Event Extraction

Joint event extraction, a pivotal focus area within natural language processing, is dedicated to pinpointing events and their associated arguments within text. Initial research endeavors were chiefly concentrated on sentence-level extraction, with the primary aim of identifying event triggers and corresponding arguments confined to individual sentences. However, contemporary research has underscored the significance of document-level event extraction, which entails extracting events and their arguments from entire documents while accounting for the intricate relationships and dependencies between events. For instance, Wang et al. [42] proposed a Neural Gibbs Sampling (NGS) method that combines neural networks with Gibbs sampling to model the joint distribution of event arguments, thereby improving the correlation modeling among event arguments. Sheng et al. [43] introduced CasEE, a joint learning framework with cascade decoding, which sequentially performs type detection, trigger extraction, and argument extraction to handle overlapping events. Xu et al. [44] developed a Heterogeneous Graph-based Interaction Model with a Tracker (GIT), which constructs a heterogeneous graph to capture global interactions among sentences and entity mentions, and incorporates a tracker to model the interdependencies among events. Wan et al. [16] proposed a Token–Token Bidirectional Event Completed Graph (TT-BECG) method, which designs a novel graph structure to accurately decode event arguments and event types, transforming the document-level event extraction into a prediction and decoding task of the token–token adjacency matrix. Another approach, also by Wan et al. [45], proposed a novel method for document-level event extraction that introduces a token-event-role data structure and a multi-channel argument role prediction module, effectively merging entity extraction and multi-event extraction into a unified task.

These studies collectively advance the field of joint event extraction by addressing the challenges of argument scattering, event overlapping, and interdependencies among events. However, challenges such as long-distance dependencies and event co-occurrence remain. Existing methods often struggle to capture the complex interactions between events and their arguments, particularly when these elements are dispersed across multiple sentences or paragraphs. Additionally, some approaches face limitations in computational efficiency and the ability to handle overlapping events accurately. To address these challenges, our work introduces a novel approach, the Semantic-aware Prompt-based Argument Extractor (SPARE) model, which enhances both the efficiency and accuracy of joint event extraction.

3. Approach

3.1. Task Definition

This paper targets document-level event argument extraction by formalizing the task as a span extraction problem through a prompt-based framework, which extracts arguments directly from text via flexible prompt templates without relying on trigger words, leveraging global contextual information to model long-distance semantic dependencies. A document $X$ is a collection of $L$ sentences; the $i$-th sentence sequence is denoted $S_{i}$, and the whole document sequence is $X = S_{1}, S_{2}, \dots, S_{L}$. The event type $e \in E$ is recognized by the model. For each recognized event type, predefined ontological roles $R_{e} = \{r_{1}, r_{2}, \dots, r_{m}\}$ guide the extraction process through prompt templates adapted from Ma et al. [34], where each prompt contains all the ontological roles of the corresponding event. Given document $X$, the goal is to extract the argument set $A$ for all events within the document in a single pass. This allows the model to automatically identify each event type and obtain the corresponding argument role $r$ as well as the specific text span of the argument in the document through the prompt template.
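As a rough illustration of the inputs and outputs in this task definition, the sketch below represents a document as a list of tokenized sentences and each extracted argument as an (event type, role, span) record. Everything in it is hypothetical: the `Argument` class, the `build_prompt` template, the toy document, and the two-role ontology (borrowed from the “life.die” example in the Introduction) are assumptions made for illustration and are not the templates or data structures used by SPARE.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Argument:
    event_type: str  # e in E
    role: str        # r in R_e
    start: int       # document-level token offset of the span start (inclusive)
    end: int         # document-level token offset of the span end (inclusive)

# Hypothetical ontology R_e for a single event type.
ROLES = {"life.die": ["killer", "victim"]}

def build_prompt(event_type):
    # Hypothetical template: the prompt simply lists every role of the event type.
    return f"{event_type}: " + " ; ".join(ROLES[event_type])

def span_text(sentences, arg):
    # Flatten X = S_1, ..., S_L into one token sequence and read the span.
    tokens = [tok for sent in sentences for tok in sent]
    return " ".join(tokens[arg.start:arg.end + 1])

# Toy document (assumption) with two sentences.
document = [
    ["A", "man", "opened", "fire", "."],
    ["Three", "people", "were", "killed", "."],
]
# A possible argument set A; note that the "killer" span lies in a different
# sentence from the word "killed".
arguments = [
    Argument("life.die", "killer", 0, 1),
    Argument("life.die", "victim", 5, 6),
]
```

Storing spans as document-level offsets rather than sentence-level ones is one simple way to let an argument sit in a different sentence from the rest of its event.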

To address these challenges, the SPARE model employs two synergistic modules. First, the complex relationships between documents, sentences, and entities are modeled by a graph neural network (GNN) to accurately identify event types. Next, a slotted table is constructed using table generation, and the model’s understanding is further refined through the semantic role attention mechanism to extract and populate the correct event arguments. Together, these two modules reinforce each other and improve the precision and reliability of event argument extraction.

3.2. Model for Dynamic Recognition of Event Types Based on Graph Neural Networks

The architecture of the event type detection model, illustrated in Figure 2, consists of three sequential modules: entity extraction, heterogeneous graph construction, and event type detection. The first module employs Conditional Random Fields (CRFs) for entity recognition. In the second module, a heterogeneous graph is constructed to capture the dynamic interactions and associations among documents, sentences, and entities, and graph convolutional networks (GCNs) are then used to extract document-level representations that encode the structural dependencies in the data. The last module performs event type detection. Through this step-by-step design, complex semantic interactions and event associations in documents are modeled progressively from local to global, which improves the effectiveness and generalization ability of document-level event extraction.

3.2.1. Entity Extraction

Entity recognition is approached as a sequence-labeling task in this study. Each sentence is processed using the Bidirectional Encoder Representations from Transformers (BERT) pre-trained model, which transforms the input sequence into a series of dense, continuous vector representations. The representation vector of each token, denoted as $E_{tab}$, contains two parts: $E_{token}$ (word embedding) and $E_{position}$ (position embedding). To model interdependencies among entity labels effectively, the sequence-labeling framework integrates Conditional Random Fields (CRFs) for structured prediction. The sequence-labeling scheme uses the traditional BIO labeling system: B (begin, marking the beginning of an entity), I (inside, indicating the middle part of an entity), and O (outside, signifying non-entity regions). During training, the entity extraction loss is the negative log-likelihood of the ground-truth label sequence, so minimizing it maximizes the probability assigned to the true labels:

$$L_{ner} = - \sum_{s \in X} \log P(y_{s} \mid s) \tag{5}$$

where $L_{ner}$ represents the loss function for named entity recognition, $y_{s}$ represents the ground-truth label sequence for input sequence $s$, and $P$ denotes the likelihood assigned to this correct sequence. At inference, the label sequence with the maximum probability is decoded using the Viterbi algorithm [46]. This architecture synergizes the contextual knowledge from the pre-trained language model with the CRF’s ability to enforce label transition constraints, ensuring consistent and contextually coherent predictions. Consequently, the approach improves entity extraction accuracy, particularly in linguistically intricate texts.
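The CRF training loss and Viterbi decoding described above can be sketched for a single sentence as follows. This is a minimal linear-chain CRF in plain Python, not the authors' implementation: the emission scores stand in for the BERT outputs, the transition scores are hand-picked toy values, and the names `nll` and `viterbi` are chosen here for illustration. Summing `nll` over all sentences of a document would correspond to Equation (5).

```python
import math

LABELS = ["B", "I", "O"]  # indices 0, 1, 2

def path_score(emissions, transitions, path):
    # Unnormalized log-score of one label sequence.
    score = emissions[0][path[0]]
    for t in range(1, len(path)):
        score += transitions[path[t - 1]][path[t]] + emissions[t][path[t]]
    return score

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(emissions, transitions):
    # Forward algorithm: alpha[j] is the log-sum of all prefix scores ending in label j.
    k = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j] + log_sum_exp([alpha[i] + transitions[i][j] for i in range(k)])
            for j in range(k)
        ]
    return log_sum_exp(alpha)

def nll(emissions, transitions, gold):
    # -log P(y_s | s) for one sentence.
    return log_partition(emissions, transitions) - path_score(emissions, transitions, gold)

def viterbi(emissions, transitions):
    # Highest-scoring label sequence by dynamic programming with backpointers.
    k = len(emissions[0])
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        step, pointers = [], []
        for j in range(k):
            i_best = max(range(k), key=lambda i: best[i] + transitions[i][j])
            pointers.append(i_best)
            step.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
        best = step
        backpointers.append(pointers)
    path = [max(range(k), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy scores (assumption): 3 tokens; transitions[i][j] scores label i followed by label j.
emissions = [[2.0, 0.0, 0.5], [0.2, 1.5, 0.3], [0.1, 0.2, 2.0]]
transitions = [[0.0, 1.0, 0.0], [0.0, 0.5, 0.0], [0.0, -10.0, 0.0]]  # O -> I is penalized
gold = [0, 1, 2]  # B I O

loss = nll(emissions, transitions, gold)
decoded = [LABELS[i] for i in viterbi(emissions, transitions)]
```

The large negative O → I transition score is how a CRF can discourage label sequences that are invalid under the BIO scheme.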

3.2.2. Heterogeneous Graph Construction

We present a method for constructing a heterogeneous graph $G$, which includes three types of nodes: entities, sentences, and documents. The purpose is to model the interaction relationships between sentences and entities. In the graph structure $G = (V, E)$, the node set $V$ consists of entities, sentences, and documents, while the edge set $E$ is used to describe the associations between entities, between entities and sentences, and between documents and sentences.

The initial embeddings of entity, sentence, and document nodes are constructed using different strategies. Entity node embeddings are obtained by averaging the word vectors within the entity, i.e., $h_{e}^{(0)} = \mathrm{Mean}_{j \in e} \{g_{j}\}$, while sentence node embeddings combine the maximum-pooled word vectors with positional embeddings, expressed as $h_{s}^{(0)} = \mathrm{Max}_{j \in s} \{g_{j}\} + \mathrm{SentPos}(s)$. Document node embeddings are derived by aggregating sentence and entity embeddings within the document. To enhance representation, a multihead attention mechanism is applied. Sentence and entity embeddings are mapped to query, key, and value matrices, and then split into multiple attention heads. Each head computes attention scores, applies softmax normalization, and updates representations. The final embedding is generated by concatenating the results from all attention heads and passing them through a fully connected layer. The multihead attention mechanism can be described as follows:

$$\mathrm{MultiHeadAttention}(x) = \mathrm{FC}\left(\mathrm{Attention}(Q, K, V)_{1}, \dots, \mathrm{Attention}(Q, K, V)_{m}\right) \tag{6}$$

where $m$ indicates the number of attention heads.
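The two pooling rules for the initial entity and sentence node embeddings can be written in a few lines. The sketch below assumes NumPy, uses toy two-dimensional word vectors g_j, and treats SentPos as a lookup table indexed by sentence position; the function names and the table are assumptions for illustration and are not taken from the paper. The multihead attention step of Equation (6) is not included.

```python
import numpy as np

def entity_embedding(word_vectors, span):
    """h_e^(0): mean of the word vectors inside the entity span (inclusive indices)."""
    start, end = span
    return word_vectors[start:end + 1].mean(axis=0)

def sentence_embedding(word_vectors, sent_pos_table, sentence_index):
    """h_s^(0): element-wise max over the sentence's word vectors plus SentPos(s)."""
    return word_vectors.max(axis=0) + sent_pos_table[sentence_index]

# Toy word vectors g_j for a three-token sentence (assumption).
g = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]])
# Hypothetical sentence-position embedding table, one row per sentence index.
sent_pos = np.array([[0.0, 0.0], [0.1, 0.2]])

h_e = entity_embedding(g, (0, 1))           # entity covering tokens 0..1
h_s = sentence_embedding(g, sent_pos, 1)    # this is the second sentence of the document
```

Mean pooling keeps an entity vector independent of the mention length, while max pooling lets the most salient feature of any token in the sentence dominate each dimension.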

After computing the initial embedding vectors of sentences using multihead attention, we apply weighted pooling to aggregate these vectors along the document dimension, resulting in sentence-level document vectors. For entity embeddings, the word vectors within each entity are averaged to form a word embedding matrix of consistent size. Multihead attention is applied to the matrix, with outputs weighted and aggregated along the document dimension to produce word-level document vectors. These vectors are concatenated to form the document node’s feature representation. The feature vector is then formally defined:

$$\mathrm{DocEmbedding}(X_{sent}, X_{token}) = [\mathrm{Attention}(X_{sent}), \mathrm{Attention}(X_{token})] \tag{7}$$

This model incorporates five types of edges to capture diverse interactions within the document:

(1): Sentence–sentence edges (S-S): These edges connect sentence nodes to model long-distance dependencies between sentences in a document.
(2): Sentence–entity edges (S-E): These edges link sentences to all entities mentioned within them, capturing the contextual information of entities in the sentence.
(3): Intra-sentence entity–entity edges (E-E intra): These edges connect different entities within the same sentence, indicating potential relationships between entities related to the same event.
(4): Inter-sentence entity–entity edges (E-E inter): These edges link occurrences of the same entity across multiple sentences, facilitating the tracking and continuity of entities throughout the document.
(5): Document–sentence edges (doc-S): These edges connect document nodes to sentence nodes, facilitating interactions between documents and sentences.
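The five edge types listed above can be sketched as a small graph-construction routine. The code is a plain-Python illustration under several assumptions that the text does not specify: sentence nodes are fully connected to each other, entity mentions are matched across sentences by exact string equality, and nodes are identified by hypothetical `(kind, index)` tuples. The toy mentions are loosely based on the example in the Introduction.

```python
from itertools import combinations

def build_edges(sentence_mentions):
    """sentence_mentions[i] lists the entity mention strings found in sentence i."""
    sentences = [("sent", i) for i in range(len(sentence_mentions))]
    mentions = []  # (node id, sentence index, surface string)
    for i, names in enumerate(sentence_mentions):
        for name in names:
            mentions.append((("ent", len(mentions)), i, name))

    edges = {
        # (1) S-S: assumed fully connected sentence nodes.
        "S-S": list(combinations(sentences, 2)),
        # (2) S-E: each sentence to the mentions it contains.
        "S-E": [(("sent", i), node) for node, i, _ in mentions],
        "E-E intra": [],
        "E-E inter": [],
        # (5) doc-S: the single document node to every sentence.
        "doc-S": [(("doc", 0), s) for s in sentences],
    }
    for (a, i, name_a), (b, j, name_b) in combinations(mentions, 2):
        if i == j:
            # (3) different mentions inside the same sentence.
            edges["E-E intra"].append((a, b))
        elif name_a == name_b:
            # (4) the same entity mentioned in different sentences.
            edges["E-E inter"].append((a, b))
    return edges

# Toy document (assumption): three sentences with their entity mentions.
edges = build_edges([["a man", "people"], ["a man", "girl"], ["police"]])
counts = {kind: len(pairs) for kind, pairs in edges.items()}
```

In this toy case the only inter-sentence entity edge links the two mentions of "a man", which is the kind of connection that lets information about a shared argument flow between co-occurring events.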

By integrating these edge types, the model can effectively aggregate information from different nodes, improving the representation of long-distance dependencies between sentences and entities. This heterogeneous graph structure provides a holistic view for modeling interactions between entities and sentences, strengthening the document’s overall structure. Furthermore, it enhances event-related information extraction by improving connections between documents and sentences. A multi-layer graph convolutional network (GCN) is utilized to capture global interactions within the graph, iteratively updating each node’s representation across layers. For a given node i with an initial feature representation, its representation at layer l is calculated using the following formula:

h_{i}^{l} = Relu (\sum_{r \in R} \sum_{j \in N_{i}^{r}} \frac{1}{C_{i, r}} W_{r}^{l} h_{j}^{l - 1} + b_{r}^{l})

(8)

where R represents all edge relation types, $N_{i}^{r}$ represents the set of neighboring nodes connected to node i via relation type r, and $C_{i, r}$ is a normalization constant. $W_{r}^{l}$ denotes the weight matrix corresponding to edge relation type r at layer l, $b_{r}^{l}$ is the bias term corresponding to $W_{r}^{l}$, and Relu represents the activation function.
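
As a minimal sketch of one layer of Equation (8), the following pure-Python function aggregates neighbor messages per relation type. It is not the authors' code. It assumes that $C_{i, r}$ equals the number of neighbors of node i under relation r, and that the bias $b_{r}^{l}$ is added once for each relation under which the node has neighbors; the equation leaves both choices open. The names `rgcn_layer` and `matvec` are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(h_prev, neighbors, W, b):
    """One relational GCN layer.

    h_prev:    dict node -> feature vector from the previous layer
    neighbors: dict relation -> dict node -> list of neighbor nodes
    W, b:      dict relation -> weight matrix / bias vector
    """
    out_dim = len(next(iter(b.values())))
    h_next = {}
    for i in h_prev:
        total = [0.0] * out_dim
        for r, nbrs in neighbors.items():
            n_ir = nbrs.get(i, [])
            if not n_ir:
                continue
            c_ir = len(n_ir)  # assumed normalization constant C_{i,r}
            for j in n_ir:
                msg = matvec(W[r], h_prev[j])
                total = [t + m / c_ir for t, m in zip(total, msg)]
            total = [t + bias for t, bias in zip(total, b[r])]
        h_next[i] = [max(0.0, t) for t in total]  # ReLU
    return h_next


h0 = {"s0": [1.0, 0.0], "s1": [0.0, 1.0], "e0": [1.0, 2.0]}
neighbors = {
    "S-S": {"s0": ["s1"], "s1": ["s0"]},
    "S-E": {"s0": ["e0"], "e0": ["s0"]},
}
W = {"S-S": [[1.0, 0.0], [0.0, 1.0]], "S-E": [[1.0, 0.0], [0.0, -1.0]]}
b = {"S-S": [0.0, 0.0], "S-E": [0.0, 0.0]}
h1 = rgcn_layer(h0, neighbors, W, b)
```

In this toy graph, sentence node `s0` receives one message over an S-S edge and one over an S-E edge, and the negative second component of their sum is clipped by the activation.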

The final hidden state of node i, denoted as $h_{i}$, is computed by concatenating the hidden representations from all GCN layers, from the initial input feature $h_{i}^{(0)}$ up to the final layer L. This concatenated vector, $[h_{i}^{(0)}; h_{i}^{(1)}; \dots; h_{i}^{(L)}]$, integrates the increasingly abstract representations of the node captured through multi-relational message passing across each layer. To generate the final representation $h_{i}$, this concatenated vector is then linearly transformed using a learnable weight matrix $W_{a}$. This process can be formally expressed as follows:

h_{i} = W_{a} [h_{i}^{(0)}; h_{i}^{(1)}; \dots; h_{i}^{(L)}]

(9)

where $h_{i}$ is the final hidden state of node i, $W_{a}$ is the learnable weight matrix for linear transformation, and $h_{i}^{(l)}$ represents the hidden representation of node i at the l-th layer. Through this process, the final sentence and entity embedding vectors are acquired. This approach enables context-aware interactions between sentences and entities, capturing their relationships effectively. Moreover, this design enhances the information fusion among sentences, entities, and documents from a global perspective, which is especially suitable for capturing cross-sentence-dependent event information in documents.
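
Equation (9) amounts to a concatenation followed by a matrix-vector product. The short sketch below works through it on toy numbers; the function name `final_state` and the values of `W_a` are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def final_state(layer_states, W_a):
    """Concatenate [h^(0); h^(1); ...; h^(L)] and apply the linear map W_a."""
    concat = [x for h in layer_states for x in h]
    return [sum(w * x for w, x in zip(row, concat)) for row in W_a]


# Two GCN layers (L = 2) with hidden size 2 give a concatenated vector of size 6.
layer_states = [[1.0, 2.0], [0.5, 0.0], [1.0, 1.0]]
# This toy W_a sums the first components and the second components separately.
W_a = [
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
h_i = final_state(layer_states, W_a)
```

Here `W_a` has shape 2 × 6, so the final state returns to the original hidden size while still drawing on every layer.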

3.2.3. Event Type Detection

In real-world documents, multiple event types and their related arguments may be distributed across different sentences. To address this challenge, we frame event type detection as a multi-label classification task. The implementation involves two key steps. First, a sentence feature matrix S is constructed by concatenating all sentence embedding vectors. Then, a context-aware classifier applies multihead attention over S followed by a LogSoftmax layer, which computes the probability distribution over all candidate event types while capturing cross-sentence contextual relationships:

Att = MultiHead (Q, S, S) \in R^{d \times J}

(10)

R = LogSoftmax (Att^{\top} W_{j}) \in R^{J}

(11)

The attention mechanism computes the relevance between queries and keys and uses it to weight the values. Here, J denotes the number of possible event types. The attention output $Att$ is calculated using the multihead attention mechanism, in which Q serves as the query matrix and S serves as both the key and the value matrix. Specifically, $Q \in R^{d \times J}$ and $W_{j} \in R^{d}$ are trainable parameters. Finally, the loss for event type detection is calculated as follows:

L_{detect} = - \sum_{j = 1}^{J} [\mathbb{I} ({\hat{R}}_{j} = 1) log P (R_{j} | D) + \mathbb{I} ({\hat{R}}_{j} = 0) log (1 - P (R_{j} | D))]

(12)

where $\mathbb{I} (\cdot)$ is the indicator function, ${\hat{R}}_{j}$ is the reference label of the j-th event type for document D, and R holds the scores of all candidate event types computed with the trained parameters. The predicted event type is the index corresponding to the maximum score. This design can effectively integrate the semantic information of different sentences to more accurately detect multiple event types in complex documents.
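
The detection loss in Equation (12) has the form of a binary cross-entropy summed over event types. The sketch below is an illustration, assuming that `probs[j]` already holds $P (R_{j} | D)$ and that `labels[j]` is the 0/1 reference label; the function name and the example numbers are hypothetical.

```python
import math


def detection_loss(probs, labels):
    """Sum over event types of the negative log-likelihood of the 0/1 label."""
    loss = 0.0
    for p, y in zip(probs, labels):
        # The indicator terms select log p when the label is 1, log(1 - p) otherwise.
        loss -= math.log(p) if y == 1 else math.log(1.0 - p)
    return loss


# Three candidate event types; the first and third occur in the document.
labels = [1, 0, 1]
confident = detection_loss([0.9, 0.2, 0.6], labels)
uncertain = detection_loss([0.5, 0.5, 0.5], labels)
```

A prediction that places more probability on the correct labels yields a smaller loss, so `confident` is lower than `uncertain`.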

3.3. Event Argument Extraction Model Based on Table Generation

The overall architecture of the event argument extraction model is depicted in Figure 3. It is composed of four modules: context encoding, prompt for extraction, feature fusion, and span selection. The first module encodes the context. The second module encodes the prompt templates corresponding to the detected event types and constructs a slotted table whose column headers are the prompts of those event types and whose row headers are the event types themselves. The third module performs feature fusion, in particular through a position-aware semantic role attention mechanism that supplies the model with positional information about semantic roles and thereby improves its accuracy. The last module selects the spans from the text that fill the argument slots, thus realizing event argument extraction.

3.3.1. Context Encoding

Given a context $X = S_{1}, S_{2}, \dots, S_{L}$, we tokenize it as follows:

\tilde{X} = [<s>, S_{1}, S_{2}, \dots, S_{L}, </s>]

(13)

In this paper, we choose RoBERTa-large as the pre-trained language model $\mathcal{L}$, which is split into an encoder and a decoder: $\mathcal{L} = [\mathcal{L}_{enc}, \mathcal{L}_{dec}]$. The encoding of the text is then obtained as $E_{\tilde{X}} = Encoder (\tilde{X})$.

3.3.2. Prompt for Extraction

We encode the prompt templates corresponding to the detected event types and construct a slotted table as input to the decoder, with event types as row headers and the concatenated prompts of those event types as column headers. The slotted table enables fine-grained event modeling by processing multiple event types in parallel. Taking the example in Figure 3, two events, “life.die” and “life.injure”, are identified through event type detection. The column header is constructed as follows: Prompt for “life.die” [34]: “Victim (and Victim) died at Place (and Place) killed by Killer (and Killer)”. Prompt for “life.injure”: “Victim (and Victim) injured by Injurer (and Injurer)”. Each role name in a prompt (e.g., Victim, Place, Killer) corresponds to an argument role. Multiple columns share the same argument role so that multiple arguments playing the same role in an event can be extracted. The column header representation is initialized by concatenating the encoded prompts:

E_{PR_{j}} = Encoder (PR_{j})

(14)

E_{CH} = [E_{PR_{1}} : \dots : E_{PR_{j}} : \dots : E_{PR_{J}}]

(15)

where $PR_{j}$ denotes the j-th prompt. In the slotted table, row i starts with the event type $E_{t_{i}}$, followed by the corresponding argument slots $S_{i}$. The set of argument slots is denoted as $S = \{S_{i}\}_{i = 1}^{N}$. The initial representation of the slotted table is obtained by row-wise concatenation:

E_{Tab} = [E_{CH} : E_{t_{1}} : E_{S_{1}} : \dots : E_{t_{N}} : E_{S_{N}}]

(16)
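
The layout of the slotted table can be illustrated at the string level, before any encoding takes place. The sketch below is hypothetical: it works on raw prompt text instead of the encoder outputs used in Equations (14)–(16), and the helper names `prompt_slots` and `build_table` are assumptions introduced for this example.

```python
import re


def prompt_slots(prompt, roles):
    """Return the role names in the order in which they occur in the prompt."""
    pattern = r"\b(" + "|".join(map(re.escape, roles)) + r")\b"
    return re.findall(pattern, prompt)


def build_table(detected):
    """detected: list of (event_type, prompt, roles) for the detected events."""
    # Column header: the prompts of all detected event types, concatenated.
    column_header = " ".join(prompt for _, prompt, _ in detected)
    # Each row: the event type followed by its argument slots.
    rows = [(etype, prompt_slots(prompt, roles)) for etype, prompt, roles in detected]
    return column_header, rows


detected = [
    ("life.die",
     "Victim (and Victim) died at Place (and Place) killed by Killer (and Killer)",
     ["Victim", "Place", "Killer"]),
    ("life.injure",
     "Victim (and Victim) injured by Injurer (and Injurer)",
     ["Victim", "Injurer"]),
]
column_header, rows = build_table(detected)
```

Because every role name appears twice in a prompt, each row ends up with two slots per role, which is what allows two arguments with the same role to be extracted for one event.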

3.3.3. Feature Fusion

When decoding the slotted table, the structure-aware attention mechanism and the cross-attention mechanism are used to iteratively update its representation, and semantic role labels (SRL) are used for the remaining decoding. Semantic role labeling (SRL) is a natural language processing task whose core is to assign semantic roles (e.g., agent, patient, temporal adjunct, locative adjunct) to the phrases associated with each predicate in a sentence. In this study, SRL is used to delineate the different components of a sentence so that positional attention can be passed between the different semantic roles.

Take the sentence “A man injured a girl” as an example: Figure 4 illustrates the annotation of its semantic roles and dependencies. To further enhance model performance, after determining the core subject and predicate, the model needs to learn additional position-related semantic information. To this end, this paper makes the following assumption: within a specific distance range, the influence of semantic roles on the hidden layer dimensions follows a Gaussian distribution.

$Kernel (u)$ is a position-aware Gaussian kernel function used to quantify the propagation of semantic role influence. Its mathematical definition is as follows:

Kernel (u) = exp (\frac{- u^{2}}{2 σ^{2}})

(17)

We construct an influence basis matrix K based on this assumption. The matrix K is defined mathematically as follows:

K (i, u) \sim N (Kernel (u), σ)

(18)

where i represents the dimension, u represents the semantic role distance, $K (i, u)$ represents the influence value, and N denotes a normal distribution.

By analyzing the positional relationships of semantic roles, we construct an influence matrix for each role and derive context-aware impact vectors through cumulative distance-based statistics. The cumulative impact vector $S_{r_{j}}$ of semantic roles at position j is finally obtained:

S_{r_{j}} = K C_{j}

(19)

where $C_{j}$ represents the distance statistic vector for core semantic roles at position j. The entry $C_{j} (u)$ is computed by summing the indicator functions for positions $j - u$ and $j + u$ across all words w in the sentence:

C_{j} (u) = \sum_{w \in S} (\mathbb{I} [(j - u) \in pos (w)] + \mathbb{I} [(j + u) \in pos (w)])

(20)

where $\mathbb{I} [\cdot]$ equals 1 if the condition holds and 0 otherwise, and $pos (w)$ denotes the positions of w in core semantic roles.

Finally, the obtained influence vectors are combined and multiplied with the hidden layer vectors of RoBERTa-large, which effectively improves the model’s understanding of the relationships between semantic roles.
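
Equations (17)–(20) can be traced on the example sentence “A man injured a girl”. The sketch below is illustrative only and makes several assumptions: σ is treated as the standard deviation of the normal distribution in Equation (18), each core semantic role is represented by the set of token positions it covers, and all function names are hypothetical.

```python
import math
import random


def kernel(u, sigma):
    """Eq. (17): Gaussian kernel over the semantic role distance u."""
    return math.exp(-(u ** 2) / (2 * sigma ** 2))


def influence_matrix(dim, max_dist, sigma, rng):
    """Eq. (18): K[i][u] is sampled around kernel(u); sigma is used as std (assumption)."""
    return [[rng.gauss(kernel(u, sigma), sigma) for u in range(max_dist + 1)]
            for _ in range(dim)]


def distance_counts(j, role_positions, max_dist):
    """Eq. (20): C_j(u) counts core-role positions found at j - u and at j + u.

    As in the equation, u = 0 checks the same position twice.
    """
    return [sum((j - u in pos) + (j + u in pos) for pos in role_positions)
            for u in range(max_dist + 1)]


def influence_vector(K, C):
    """Eq. (19): S_{r_j} = K C_j."""
    return [sum(k * c for k, c in zip(row, C)) for row in K]


# "A man injured a girl": agent = tokens 0-1, predicate = token 2, patient = tokens 3-4.
roles = [{0, 1}, {2}, {3, 4}]
C_0 = distance_counts(0, roles, max_dist=2)
K = influence_matrix(dim=4, max_dist=2, sigma=1.0, rng=random.Random(0))
S_r0 = influence_vector(K, C_0)
```

The resulting vector has one entry per hidden dimension, so it can be multiplied element-wise with the hidden state at position j, as described above.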

3.3.4. Span Selection

The span selector $\{Φ_{S_{k}}^{start}, Φ_{S_{k}}^{end}\}$ is represented as follows:

Φ_{S_{k}}^{start} = h_{S_{k}} ⊙ W^{start}

(21)

Φ_{S_{k}}^{end} = h_{S_{k}} ⊙ W^{end}

(22)

where $H_{S} \subset H_{tab}$ denotes the representations of the argument slots within the slotted table $H_{tab}$, $h_{S_{k}} \in H_{S}$ is the representation vector of the k-th argument slot, and $W^{start}$ and $W^{end}$ are learnable weight parameters applied through element-wise multiplication ⊙. The span selector $\{Φ_{S_{k}}^{start}, Φ_{S_{k}}^{end}\}$ operates on the text to identify and extract the most relevant span $({\hat{start}}_{k}, {\hat{end}}_{k})$, thereby filling the target argument slot $S_{k}$:

{logit}_{k}^{start} = H_{\tilde{X}} Φ_{S_{k}}^{start} \in R^{L}

(23)

{logit}_{k}^{end} = H_{\tilde{X}} Φ_{S_{k}}^{end} \in R^{L}

(24)

{score}_{k} (l, m) = {logit}_{k}^{start} [l] + {logit}_{k}^{end} [m]

(25)

({\hat{start}}_{k}, {\hat{end}}_{k}) = \underset{(l, m) : 0 < m - l < L}{argmax} {score}_{k} (l, m)

(26)

where l and m denote the indexes of arbitrary tokens. Since an event may involve multiple arguments fulfilling the same semantic role, further refinement of the golden argument span allocation is necessary during model training. To achieve this, we fine-tune the model by incorporating the bipartite graph matching loss, ensuring a more precise alignment between predictions and ground-truth labels. The loss function for the training example is defined in the following way:

P_{k}^{start} = Softmax ({logit}_{k}^{start})

(27)

P_{k}^{end} = Softmax ({logit}_{k}^{end})

(28)

L = - \sum_{i = 1}^{N} \sum_{({start}_{k}, {end}_{k}) \in δ (A_{i})} (log P_{k}^{start} [{start}_{k}] + log P_{k}^{end} [{end}_{k}])

(29)

where $δ (\cdot)$ denotes the optimal assignment computed using the Hungarian algorithm, and $({start}_{k}, {end}_{k})$ is the ground-truth span assigned to the k-th argument slot. If an argument slot is not associated with any argument, it is assigned the span $(0, 0)$. By combining positional and semantic information, span selection picks the argument spans precisely, which supports the efficiency and accuracy of event extraction.
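
The span search of Equations (25)–(26) and the matching loss of Equations (27)–(29) can be sketched together. This is a simplified illustration, not the authors' implementation. The optimal assignment is found here by brute force over permutations, which gives the same minimum cost as the Hungarian algorithm on small inputs but does not scale. The span (0, 0) is added as a candidate for empty slots, and the number of gold spans is assumed not to exceed the number of slots. All function names are hypothetical.

```python
import math
from itertools import permutations


def best_span(start_logits, end_logits, max_len):
    """Eqs. (25)-(26): argmax of start + end score over spans with 0 < m - l < max_len."""
    n = len(start_logits)
    candidates = [(0, 0)] + [  # (0, 0) stands for "no argument" (assumption)
        (l, m) for l in range(n) for m in range(l + 1, min(n, l + max_len))
    ]
    return max(candidates, key=lambda span: start_logits[span[0]] + end_logits[span[1]])


def log_softmax(logits):
    """Numerically stable log of Eqs. (27)-(28)."""
    top = max(logits)
    lse = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - lse for x in logits]


def matching_loss(slot_logits, gold_spans):
    """Eq. (29) for the slots of one role: minimum cost over slot-to-gold assignments."""
    logp = [(log_softmax(s), log_softmax(e)) for s, e in slot_logits]
    # Pad with (0, 0) so that every slot receives a target.
    gold = list(gold_spans) + [(0, 0)] * (len(slot_logits) - len(gold_spans))

    def cost(assignment):
        return -sum(logp[k][0][s] + logp[k][1][e] for k, (s, e) in enumerate(assignment))

    return min(cost(p) for p in permutations(gold))


# Two slots sharing one role, scored over a five-token text.
slots = [
    ([0.0, 0.0, 0.0, 4.0, 0.0], [0.0, 0.0, 0.0, 0.0, 4.0]),
    ([0.0, 4.0, 0.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0, 0.0]),
]
predicted = [best_span(s, e, max_len=3) for s, e in slots]
loss = matching_loss(slots, [(1, 2), (3, 4)])
```

Because the loss takes the minimum over assignments, the order in which the gold spans are listed does not matter. This is the purpose of bipartite matching when several arguments share a role.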

4. Experiment

4.1. Datasets and Evaluation Metrics

Experiments in this paper are performed on two widely used public datasets, RAMS [47] and WikiEvents [48]. The RAMS dataset contains 9124 annotated events from news articles, whereas the WikiEvents dataset consists of about 4000 event annotations from English-language Wikipedia news reports. In addition to event annotations, both datasets also incorporate information related to public opinion, providing a more comprehensive resource for event analysis and understanding. A detailed breakdown of the datasets is provided in Table 1.

By using these diverse datasets, we demonstrate that SPARE is well suited for event extraction tasks across multiple document types, including formal, informal, and multilingual texts, further proving the versatility and applicability of our approach in real-world applications.

This paper uses two metrics to measure performance:

(1): Strict Argument Identification F1 (Arg-I): A predicted event argument is regarded as valid if its range coincides precisely with the limits of any reference argument in the event.
(2): Strict Argument Classification F1 (Arg-C): A predicted event argument is validated as correct solely when its range and designated role category align with those of the reference argument.

The formulas are as follows:

Arg-I = 2 \times \frac{{Precision}_{Arg-I} \times {Recall}_{Arg-I}}{{Precision}_{Arg-I} + {Recall}_{Arg-I}}

(30)

Arg-C = 2 \times \frac{{Precision}_{Arg-C} \times {Recall}_{Arg-C}}{{Precision}_{Arg-C} + {Recall}_{Arg-C}}

(31)

where Precision and Recall represent the ratios of correctly predicted arguments to all predicted arguments and all golden arguments, respectively.
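
The two metrics can be computed with a few set operations. The sketch below is a minimal illustration, assuming each argument is a tuple `(event_id, start, end, role)`; the tuple layout, the function names, and the toy predictions are hypothetical.

```python
def f1(pred, gold):
    """Strict F1 between two collections of hashable items."""
    pred, gold = set(pred), set(gold)
    correct = len(pred & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def arg_scores(pred, gold):
    """Return (Arg-I, Arg-C) for arguments given as (event_id, start, end, role)."""
    # Arg-I ignores the role: only the event and the span boundaries must match.
    arg_i = f1([a[:3] for a in pred], [a[:3] for a in gold])
    # Arg-C additionally requires the role to match.
    arg_c = f1(pred, gold)
    return arg_i, arg_c


gold = [("e1", 3, 4, "Victim"), ("e1", 7, 7, "Place"), ("e1", 10, 11, "Killer")]
pred = [("e1", 3, 4, "Victim"), ("e1", 7, 7, "Killer"), ("e1", 0, 1, "Place")]
arg_i, arg_c = arg_scores(pred, gold)
```

In this example two predicted spans have the right boundaries but only one also has the right role, so Arg-C is lower than Arg-I.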

4.2. Experimental Parameterization

This study implements the event extraction model using PyTorch 1.13.1+cu117. For event type detection, the model sets the sentence length to 64–128 tokens and employs an eight-layer encoder module and a four-layer decoder module with four attention heads per decoder layer. Key parameter configurations include a hidden layer dimension of 768, a feedforward neural network dimension of 1024, and a dropout rate of 0.1. In the event argument extraction model based on table generation, the encoder module is initialized using the first seventeen layers of the RoBERTa-large pre-trained model, while the decoder module uses the remaining seven layers. Detailed hyperparameter settings are shown in Table 2. The optimization process uses the AdamW algorithm [49] combined with a linear learning rate scheduler.
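
The layer split and the learning rate schedule described above can be written down as a small configuration sketch. It is not taken from the authors' code. The 24-layer depth of RoBERTa-large is a known property of that model, while the constant names, the function `linear_lr`, the optional warmup, and the step counts are assumptions made for illustration. The base learning rate of 2 × 10⁻⁵ is the best value reported in the sensitivity analysis of this paper.

```python
NUM_LAYERS = 24  # RoBERTa-large has 24 Transformer layers
ENCODER_LAYERS = list(range(0, 17))           # first seventeen layers
DECODER_LAYERS = list(range(17, NUM_LAYERS))  # remaining seven layers


def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear schedule: optional linear warmup, then linear decay to zero."""
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, middle, and end of a hypothetical 100-step run.
schedule = [linear_lr(s, total_steps=100, base_lr=2e-5) for s in (0, 50, 100)]
```

With no warmup the rate starts at its base value, is halved at the midpoint, and reaches zero at the last step.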

4.3. Baseline Comparison

We compare our model with the following widely used baseline models for event argument extraction on the same datasets.

(a): BART-Gen [48]: a generation-based approach that relies on input text and templates.
(b): PAIE [34]: a model for efficiently extracting sentence-level and document-level event arguments using pre-trained language models by facilitating inter-argument interactions through prompt learning.
(c): TSRA [31]: a method that utilizes dual-stream encoding and AMR-enhanced semantic graphs for extracting arguments.
(d): TARA [50]: a method that constructs customized AMR graphs and uses graph neural networks as link prediction models.
(e): TabEAE [51]: extends prompt-based EAE modeling to a non-autoregressive generative framework to extract arguments from multiple events in parallel.

4.4. Main Results

The experimental results in Table 3 highlight the superior performance of the proposed method on the RAMS and WikiEvents datasets. In terms of key metrics, namely argument identification (Arg-I) and argument classification (Arg-C), the F1 scores achieved by this method exceed those of the current SOTA approach. On the RAMS dataset, the method shows improvements of 1.6% in Arg-I and 2.7% in Arg-C. Similarly, on the WikiEvents dataset, it achieves a 5.4% increase in Arg-I F1 and a 5.3% increase in Arg-C F1.

4.5. Sensitivity Analysis

To assess the robustness of our model, we conduct a sensitivity analysis on key hyperparameters during fine-tuning, including the learning rate, batch size, and number of training epochs. Most other hyperparameters remain consistent with pre-trained defaults. This approach aligns with best practices in prior work [12], where only a small set of critical parameters are varied while others are kept fixed.

The following ranges are explored:

Learning Rate (Adam): 5 × 10⁻⁵, 3 × 10⁻⁵, 2 × 10⁻⁵
Batch Size: 4, 8
Number of Epochs: 2, 3, 4
Dropout Rate: 0.05, 0.1, 0.2

Performance is evaluated on the development sets of both RAMS and WikiEvents. Across all hyperparameter combinations, the F1 score varies within ±1%, indicating that the model is relatively insensitive to moderate changes in these settings. The optimal performance is observed when the learning rate is set to 2 × 10⁻⁵, batch size to 4, the number of epochs to 2, and dropout rate to 0.1, demonstrating the effectiveness of this configuration.

These results confirm that the proposed method maintains stable performance under typical hyperparameter fluctuations, supporting its robustness and applicability across domains.
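
The search space of the sensitivity analysis can be enumerated directly from the ranges listed above. The following sketch only generates the configurations and does not train anything; the dictionary keys and the function name `configurations` are hypothetical.

```python
from itertools import product

GRID = {
    "learning_rate": [5e-5, 3e-5, 2e-5],
    "batch_size": [4, 8],
    "epochs": [2, 3, 4],
    "dropout": [0.05, 0.1, 0.2],
}


def configurations(grid):
    """Return every combination of the listed hyperparameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


configs = configurations(GRID)
# The configuration reported as best in this section.
best = {"learning_rate": 2e-5, "batch_size": 4, "epochs": 2, "dropout": 0.1}
```

The grid contains 3 × 2 × 3 × 3 = 54 combinations, and the reported best setting is one of them.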

4.6. Ablation Experiments

To validate the contribution of the different components of the SPARE model, an ablation study was conducted on the two datasets, with the results shown in Table 4. After removing the event type detection model, the Arg-I and Arg-C scores on RAMS decreased by 1.3% and 1.3%, respectively, and those on WikiEvents decreased by 0.9% and 0.8%, respectively. This indicates that the detection of event types is crucial for enhancing the precision of event argument extraction. After removing the position-aware semantic role attention mechanism, the Arg-I and Arg-C scores on RAMS decreased by 2.0% and 2.4%, respectively, and those on WikiEvents decreased by 2.0% and 2.0%, respectively. This demonstrates that reinforcing the positional information of the arguments can boost the model’s capacity to perceive the semantic context within a specific area. When the table column headers were constructed from the concatenation of argument roles instead of prompts, the Arg-I and Arg-C scores decreased by 2.6% and 3.0% on RAMS and by 2.4% and 2.4% on WikiEvents, respectively. This is consistent with the finding of Ma et al. [34] that hand-crafted prompts are very helpful for EAE tasks. When the pre-computed encodings of the input table were replaced with the token embeddings of RoBERTa, the Arg-I and Arg-C scores decreased by 5.5% and 7.1% on RAMS and by 5.4% and 6.0% on WikiEvents, respectively. This demonstrates the necessity of initializing the input table embedding with encodings computed by the encoder.

5. Analysis

5.1. Comparative Analysis of BERT and GPT Performance on D-EAE Tasks

To assess the capabilities of different pre-trained language models on D-EAE, we compare the performance of GPT-series models with that of our proposed BERT-based model, SPARE. The results for GPT-3.5 are adopted from Shuang et al. [52], while the GPT-4 results are based on findings reported by Liu et al. [53]. Table 5 presents experimental results on the RAMS dataset, illustrating how these models perform on two key metrics: argument identification (Arg-I) and argument classification (Arg-C).

As shown, SPARE significantly outperforms both GPT-3.5 and GPT-4. Specifically, it achieves an Arg-I score of 57.3 and an Arg-C score of 52.9, while GPT-4 records 50.4 and 42.8, respectively. GPT-3.5 exhibits an even lower performance. These results underscore the advantages of integrating BERT’s architectural characteristics with targeted enhancements tailored for D-EAE tasks.

SPARE builds upon the BERT framework, which inherently benefits from a bidirectional encoding mechanism. Unlike GPT models—which employ a unidirectional, autoregressive architecture that processes input from left to right—BERT is trained through masked language modeling, allowing simultaneous attention to both preceding and succeeding tokens. This bidirectional context modeling is especially advantageous in D-EAE, where identifying arguments often relies on long-range dependencies and semantic cues that span multiple sentences.

D-EAE inherently involves multi-sentence reasoning and discourse-level understanding. Argument spans may appear far from their corresponding event triggers, requiring the model to resolve coreference, infer implicit relations, or track entity roles across a document. SPARE leverages BERT’s holistic context representation to better capture these complex inter-sentential relationships, providing it with a clear edge over GPT models that struggle to incorporate future context during inference.

In addition, SPARE is explicitly fine-tuned on the argument extraction task, enabling it to learn structural regularities and semantic roles specific to event arguments. In contrast, GPT models are typically applied in zero-shot or few-shot paradigms without specialized task-specific training. This reliance on generic pre-training makes them less effective in structured prediction tasks like D-EAE, where precision in boundary detection and role classification is critical. Consequently, the performance gap observed reflects both architectural differences and task adaptation strategies.

Furthermore, we provide a brief analysis of the computational complexity of SPARE to complement the empirical evaluation. The inference complexity of SPARE, which is built upon a BERT-based encoder, is approximately $O (n^{2} \cdot d)$, where n denotes the sequence length and d is the hidden dimensionality. This reflects the quadratic complexity characteristic of the self-attention mechanism [19]. However, due to its fully parallelizable architecture, SPARE achieves efficient inference in practice, particularly when leveraging GPU acceleration. In contrast, GPT’s autoregressive decoding introduces sequential constraints, leading to a linear time dependency on the number of output tokens during generation. This can significantly slow down inference for long sequences [14]. These observations further underscore the advantages of encoder-only architectures like SPARE for structured extraction tasks that demand efficient and precise boundary-level predictions.

5.2. Cross-Event Correlation Analysis

To evaluate the model’s capability in cross-event correlation analysis, we break down its performance by the number of events per instance. As illustrated in Figure 5a, on RAMS documents containing either a single event or multiple events, the SPARE model achieves better performance than TabEAE, with respective increases of 0.5%, 2.7%, and 3.0%. Similarly, on the WikiEvents dataset, the increases are 0.8%, 5.0%, and 1.6%, respectively, as shown in Figure 5b. The findings indicate that the model remains effective on documents that contain varying numbers of events, validating the capability of SPARE in cross-event correlation analysis.

5.3. Model Semantic Capture Capability: Inter-Event Correlation

To assess the model’s capability to understand the semantics of inter-event associations, we evaluate its performance by the number of arguments associated with each event, as shown in Figure 6. We compare our approach with the TabEAE model, which performs well on the relevant tasks and closely resembles the present approach in task definition and framework design, so the comparison more accurately reflects the improvement in capturing the semantic boundaries between events. Arg-C improves by 1.0%, 0.9%, and 0.6% on the RAMS dataset and by 4.3%, 1.9%, and 2.1% on the WikiEvents dataset. The results show that the present model captures the inter-event semantic boundaries better.

5.4. Model Semantic Capture Capability: Intra-Event Correlation

We evaluate the ability to capture the semantics of intra-event associations by analyzing the distance between an argument and its trigger word. The distance is defined as the trigger word index minus the index of the argument’s head word. The experiments were carried out on both datasets, as illustrated in Figure 7. Overall, the model is better able to capture intra-event semantic boundaries.

6. Case Studies

As illustrated in Figure 8 (WikiEvents case study), the event type detection model correctly identifies two events: “conflict.attack.n/a” and “contact.commitmentpromiseexpressintent”. However, the TabEAE model fails to extract the argument “Turkey’s air force” for the first event and misassigns the “Communicator” role of the second event to “Ankara” (ground truth: “Damascus”). This error likely stems from TabEAE’s over-reliance on local sentence tokens and trigger proximity, whereas SPARE successfully captures cross-sentence arguments through its position-aware semantic role attention mechanism, which explicitly models document-level argument distribution.

As shown in Figure 9 (Campus Public Opinion case study), the SPARE model demonstrates strong real-world applicability in detecting emerging public opinion events. In this case, it accurately identifies two interrelated events—“conflict.demonstrate.n/a” and “contact.requestadvise.n/a”—from a short news-style report about a student protest. While the TabEAE model correctly extracts some arguments (e.g., “Students” as demonstrators and communicators), it fails to recognize the university administration as the recipient of the students’ demands and overlooks the location “Greenfield University” for the advice request event. This shortcoming likely results from TabEAE’s limited ability to associate roles beyond the sentence scope.

In contrast, SPARE’s position-aware role attention mechanism allows it to trace arguments across sentence boundaries, enabling precise extraction of multi-role arguments. This showcases SPARE’s advantage in public opinion discovery, particularly in campus settings where protests and administrative responses are distributed across multiple utterances or documents.

7. Conclusions

This study focuses on critical challenges in event argument extraction, notably long-range dependencies and event co-occurrence, which inherently constrain the precision and scalability of existing approaches. Our framework introduces a novel heterogeneous graph architecture designed to model interactions among entities and event structures. Complementing this, a position-awareness semantic role attention mechanism is integrated to enhance argument identification. A rigorous evaluation of the RAMS and WikiEvents benchmarks reveals that the proposed method surpasses existing baselines, with empirical results highlighting its enhanced capability in capturing cross-event dependencies and semantic role correlations. These advantages suggest that our method holds strong potential for real-world applications such as news analysis, public opinion discovery, and intelligence gathering.

Limitations persist in domain generalization and multi-role entity disambiguation. The model struggles to adapt to unseen domains and resolve contextual role conflicts when entities participate in overlapping events. Future research should prioritize domain-agnostic learning and dynamic role-boundary detection mechanisms to improve robustness.

Author Contributions

Conceptualization, Y.Z. and Q.Z.; methodology, J.F., Y.Z. and Q.Z.; software, J.F. and Y.Z.; validation, J.F., L.Z. and X.S.; formal analysis, J.F.; investigation, Q.Z.; resources, Y.Z.; data curation, J.F.; writing—original draft preparation, J.F.; writing—review and editing, Y.Z.; visualization, J.F.; supervision, Y.Z.; project administration, Y.Z. and Q.Z.; funding acquisition, Y.Z. and Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under (Grant No. 62433002), the National Science and Technology Major Project (No. 2021ZD0113703), the Beijing Municipal Social Science Foundation Annual Planning Project (No. 23GLB014), the Project of Construction and Support for high-level Innovative Teams of Beijing Municipal Institutions (No. BPHR20220104), and Beijing Scholars Program (No. 099).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We thank all the anonymous reviewers for their thoughtful comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Xiang, W.; Wang, B. A Survey of Event Extraction From Text. IEEE Access 2019, 7, 173111–173137. [Google Scholar] [CrossRef]
Saxena, A. A Survey of Session-Based Recommender Systems. In Proceedings of the 2023 5th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), Greater Noida, India, 15–16 December 2023; pp. 47–50. [Google Scholar] [CrossRef]
Jia, R.; Zhang, Z.; Jia, Y.; Papadopoulou, M.; Roche, C. Improved GPT2 Event Extraction Method Based on Mixed Attention Collaborative Layer Vector. IEEE Access 2024, 12, 160074–160082. [Google Scholar] [CrossRef]
Zheng, S.; Cao, W.; Xu, W.; Bian, J. Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 337–346. [Google Scholar] [CrossRef]
Zhou, Y.; Chen, Y.; Zhao, J.; Wu, Y.; Xu, J.; Li, J. What the Role is vs. What Plays the Role: Semi-Supervised Event Argument Extraction via Dual Question Answering. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI, Virtual Event, 2–9 February 2021; pp. 14638–14646. [Google Scholar] [CrossRef]
Wang, J.; Jatowt, A.; Färber, M.; Yoshikawa, M. Improving question answering for event-focused questions in temporal collections of news articles. Inf. Retr. J. 2021, 24, 29–54. [Google Scholar] [CrossRef]
Hong, Z.; Liu, J. Towards Better Question Generation in QA-based Event Extraction. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; pp. 9025–9038. [Google Scholar] [CrossRef]
Jin, Y.; Jiang, W.; Yang, Y.; Mu, Y. Zero-Shot Video Event Detection With High-Order Semantic Concept Discovery and Matching. IEEE Trans. Multimed. 2022, 24, 1896–1908. [Google Scholar] [CrossRef]
Li, P.; Zhou, G. Joint Argument Inference in Chinese Event Extraction with Argument Consistency and Event Relevance. IEEE/ACM Trans. Audio Speech Lang. Process. 2016, 24, 612–622.
You, T.; Li, Z.; Fan, Z.; Yin, C.; He, Y.; Cai, J.; Fu, J.; Wei, Z. An Iterative Framework for Document-Level Event Argument Extraction Assisted by Long Short-Term Memory. In Proceedings of the Natural Language Processing and Chinese Computing, Hangzhou, China, 1–3 November 2024.
Chen, J.; Long, K.; Li, S.; Tang, J.; Wang, T. FineCSDA: Boosting Document-Level Event Argument Extraction with Fine-Grained Data Augmentation. In Proceedings of the Natural Language Processing and Chinese Computing—13th National CCF Conference, NLPCC 2024, Proceedings, Part II, Hangzhou, China, 1–3 November 2024; Wong, D.F., Wei, Z., Yang, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2024; Volume 15360, pp. 3–15.
Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
Radford, A.; Narasimhan, K. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://api.semanticscholar.org/CorpusID:49313245 (accessed on 8 March 2025).
Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
Shi, G.; Su, Y.; Ma, Y.; Zhou, M. A Hybrid Detection and Generation Framework with Separate Encoders for Event Extraction. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, 2–4 May 2023; Vlachos, A., Augenstein, I., Eds.; 2023; pp. 3163–3180.
Wan, Q.; Wan, C.; Xiao, K.; Liu, D.; Li, C.; Zheng, B.; Liu, X.; Hu, R. Joint Document-Level Event Extraction via Token-Token Bidirectional Event Completed Graph. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 10481–10492.
Gao, J.; Zhao, H.; Yu, C.; Xu, R. Exploring the Feasibility of ChatGPT for Event Extraction. arXiv 2023, arXiv:2303.03836.
Wei, X.; Cui, X.; Cheng, N.; Wang, X.; Zhang, X.; Huang, S.; Xie, P.; Xu, J.; Chen, Y.; Zhang, M.; et al. Zero-Shot Information Extraction Via Chatting with ChatGPT. arXiv 2023, arXiv:2302.10205.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The Long-Document Transformer. arXiv 2020, arXiv:2004.05150.
Zaheer, M.; Guruganesh, G.; Dubey, A.; Ainslie, J.; Alberti, C.; Ontanon, S.; Pham, P.; Ravula, A.; Wang, Q.; Yang, L.; et al. Big bird: Transformers for longer sequences. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS ’20), Online, 6–12 December 2020.
Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–61.
Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. PaLM: Scaling Language Modeling with Pathways. J. Mach. Learn. Res. 2023, 24, 1–113.
Huang, L.; Cassidy, T.; Feng, X.; Ji, H.; Voss, C.R.; Han, J.; Sil, A. Liberal Event Extraction and Event Schema Induction. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 258–268.
Chambers, N. Event Schema Induction with a Probabilistic Entity-Driven Model. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1797–1807.
Cheung, J.C.K.; Poon, H.; Vanderwende, L. Probabilistic Frame Induction. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, USA, 9–14 June 2013; pp. 837–846.
Shen, J.; Zhang, Y.; Ji, H.; Han, J. Corpus-based Open-Domain Event Type Induction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 5427–5440.
Huang, L.; Ji, H. Semi-supervised New Event Type Induction and Event Detection. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 718–724.
Li, S.; Ji, H.; Han, J. Open Relation and Event Type Discovery with Type Abstraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 6864–6877.
Tang, J.; Lin, H.; Li, Z.; Lu, Y.; Han, X.; Sun, L. Harvesting Event Schemas from Large Language Models. arXiv 2023, arXiv:2305.07280.
Xu, R.; Wang, P.; Liu, T.; Zeng, S.; Chang, B.; Sui, Z. A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 5025–5036.
Liu, W.; Cheng, S.; Zeng, D.; Hong, Q. Enhancing Document-level Event Argument Extraction with Contextual Clues and Role Relevance. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 12908–12922.
Tan, L.; Hu, Y.; Cao, J.; Tan, Z. AssocKD: An Association-Aware Knowledge Distillation Method for Document-Level Event Argument Extraction. Mathematics 2024, 12, 2901.
Ma, Y.; Wang, Z.; Cao, Y.; Li, M.; Chen, M.; Wang, K.; Shao, J. Prompt for Extraction? PAIE: Prompting Argument Interaction for Event Argument Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 6759–6774.
Nguyen, C.; Man, H.; Nguyen, T. Contextualized Soft Prompts for Extraction of Event Arguments. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 4352–4361.
Li, H.; Cao, Y.; Ren, Y.; Fang, F.; Zhang, L.; Li, Y.; Wang, S. Intra-Event and Inter-Event Dependency-Aware Graph Network for Event Argument Extraction. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; pp. 6362–6372.
Zhang, M.; Chen, H. Document-Level Event Argument Extraction with Sparse Representation Attention. Mathematics 2024, 12, 2636.
Zhang, G.; Zhang, H.; Wang, Y.; Li, R.; Tan, H.; Liang, J. Hyperspherical Multi-Prototype with Optimal Transport for Event Argument Extraction. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; pp. 9271–9284.
Wei, K.; Sun, X.; Zhang, Z.; Zhang, J.; Zhi, G.; Jin, L. Trigger is Not Sufficient: Exploiting Frame-aware Knowledge for Implicit Event Argument Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 4672–4682.
Du, X.; Li, S.; Ji, H. Dynamic Global Memory for Document-level Argument Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 5264–5275.
Ren, Y.; Cao, Y.; Guo, P.; Fang, F.; Ma, W.; Lin, Z. Retrieve-and-Sample: Document-level Event Argument Extraction via Hybrid Retrieval Augmentation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 293–306.
Wang, X.; Jia, S.; Han, X.; Liu, Z.; Li, J.; Li, P.; Zhou, J. Neural Gibbs Sampling for Joint Event Argument Extraction. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Suzhou, China, 4–7 December 2020; pp. 169–180.
Sheng, J.; Guo, S.; Yu, B.; Li, Q.; Hei, Y.; Wang, L.; Liu, T.; Xu, H. CasEE: A Joint Learning Framework with Cascade Decoding for Overlapping Event Extraction. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; pp. 164–174.
Xu, R.; Liu, T.; Li, L.; Chang, B. Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 3533–3546.
Wan, Q.; Wan, C.; Xiao, K.; Xiong, H.; Liu, D.; Liu, X.; Hu, R. Token-Event-Role Structure-Based Multi-Channel Document-Level Event Extraction. ACM Trans. Inf. Syst. 2024, 42, 1–27.
Forney, G. The Viterbi algorithm. Proc. IEEE 1973, 61, 268–278.
Ebner, S.; Xia, P.; Culkin, R.; Rawlins, K.; Van Durme, B. Multi-Sentence Argument Linking. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 8057–8077.
Li, S.; Ji, H.; Han, J. Document-Level Event Argument Extraction by Conditional Generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 894–908.
Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
Yang, Y.; Guo, Q.; Hu, X.; Zhang, Y.; Qiu, X.; Zhang, Z. An AMR-based Link Prediction Approach for Document-level Event Argument Extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 12876–12889.
He, Y.; Hu, J.; Tang, B. Revisiting Event Argument Extraction: Can EAE Models Learn Better When Being Aware of Event Co-occurrences? In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 12542–12556.
Shuang, K.; Zhouji, Z.; Wang, Q.; Guo, J. Energizing LLMs’ Emergence Capabilities for Document-Level Event Argument Extraction. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; pp. 5520–5532.
Liu, W.; Zhou, L.; Zeng, D.; Xiao, Y.; Cheng, S.; Zhang, C.; Lee, G.; Zhang, M.; Chen, W. Beyond Single-Event Extraction: Towards Efficient Document-Level Multi-Event Argument Extraction. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; pp. 9470–9487.

Figure 1. Schematic diagram of event extraction (EE) and event argument extraction (EAE). Arguments are indicated with underlines. The EE model aims to extract all events simultaneously, while mainstream EAE models extract the arguments of one event through a single training session.

Figure 2. Diagram of the event type detection model, which comprises entity extraction (Section 3.2.1), heterogeneous graph construction (Section 3.2.2), and event type detection (Section 3.2.3).

Figure 3. Event argument extraction model, mainly divided into context encoding (Section 3.3.1), prompt for extraction (Section 3.3.2), feature fusion (Section 3.3.3), and span selection (Section 3.3.4).

Figure 4. Semantic role labeling and dependencies: ROOT (main verb or core predicate), HED (action-expressing predicate), SBV (action performer), VOB (action target), ATT (modifier providing additional information).

Figure 5. Comparison of the performance of SPARE and TabEAE as the number of events in a document varies. (a) shows the F1 scores for the RAMS dataset, and (b) shows the F1 scores for the WikiEvents dataset.

Figure 6. Comparison of the performance of SPARE and TabEAE as the number of arguments per event varies. (a) illustrates the F1-score gap between the SPARE and TabEAE models on the RAMS dataset. (b) presents the F1-score gap between the two models on the WikiEvents dataset.

Figure 7. Performance as a function of the distance between arguments and triggers in events. (a) shows the F1 scores of the SPARE and TabEAE models on the RAMS dataset across different distance ranges. (b) shows the F1 scores of the two models on the WikiEvents dataset across different distance ranges.

Figure 8. Case study from WikiEvents comparing the SPARE and TabEAE models.

Figure 9. Case study on Campus Public Opinion comparing the SPARE and TabEAE models.

Table 1. RAMS and WikiEvents datasets.

Dataset	RAMS		WikiEvents
Event types	139		50
Args per event	2.33		1.40
Events per text	1.25		1.78
Roles	65		80
Split	#Doc	#Event	#Doc	#Event
Train	3194	7329	206	3241
Dev	399	924	20	345
Test	400	871	20	365

Table 2. Hyperparameter setting.

Parameters	Values
Training Steps	10,000
Warmup Ratio	0.1
Learning Rate	2 × 10⁻⁵
Dropout Rate	0.1
Epoch	2
Batch size	4
Context Window Size	250
Max Span Length	10
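
As a rough illustration of how the step count, warmup ratio, and learning rate in Table 2 interact, the sketch below computes a per-step learning rate. The schedule shape (linear warmup followed by linear decay to zero) is an assumption, since Table 2 gives only the ratio and the peak value, and the names `learning_rate`, `TRAINING_STEPS`, `WARMUP_RATIO`, and `PEAK_LR` are hypothetical.

```python
# Values copied from Table 2.
TRAINING_STEPS = 10_000
WARMUP_RATIO = 0.1
PEAK_LR = 2e-5


def learning_rate(step: int) -> float:
    """Learning rate at a given step.

    Assumption (not stated in Table 2): linear warmup to PEAK_LR,
    then linear decay to zero at TRAINING_STEPS.
    """
    warmup_steps = round(TRAINING_STEPS * WARMUP_RATIO)
    if step < warmup_steps:
        # Ramp up linearly from 0 to the peak value.
        return PEAK_LR * step / warmup_steps
    # Decay linearly over the remaining steps, never going below zero.
    remaining = max(TRAINING_STEPS - step, 0)
    return PEAK_LR * remaining / (TRAINING_STEPS - warmup_steps)


if __name__ == "__main__":
    for step in (0, 500, 1_000, 5_500, 10_000):
        print(step, learning_rate(step))
```

Under these assumptions, warmup covers the first 1,000 steps (0.1 × 10,000), after which the rate falls from 2 × 10⁻⁵ to zero at step 10,000.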

Table 3. Overall performance of the SPARE model versus the baseline models on the RAMS and WikiEvents datasets for argument identification (Arg-I) and argument classification (Arg-C). The highest scores are in bold and the next highest scores are underlined.

Model	RAMS Arg-I	RAMS Arg-C	WikiEvents Arg-I	WikiEvents Arg-C
BART-Gen	48.64	51.2	67.62	61.17
PAIE	53.2	48.0	69.3	63.4
TSRA	53.01	48.06	67.52	60.11
TARA	52.34	48.06	68.76	62.18
TabEAE	56.4	51.5	70.0	65.6
SPARE	57.3	52.9	73.8	69.1
Gain over TabEAE	+0.9	+1.4	+3.8	+3.5

Table 4. Results of ablation experiment. PET: Pre-computed Encodings of the input Table.

Model	RAMS Arg-I	RAMS Arg-C	WikiEvents Arg-I	WikiEvents Arg-C
w/o Event type	56.5	52.2	73.1	68.5
w/o SRL Attention	56.1	51.6	72.3	67.7
w/o Prompts	55.8	51.3	72.0	67.4
w/o PET	54.1	49.1	69.8	64.9
SPARE	57.3	52.9	73.8	69.1
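
To make the relative contribution of each component easier to read off Table 4, the following sketch ranks the ablated variants by their drop in Arg-C F1 relative to the full SPARE model. The scores are copied from the table; the names `FULL_ARG_C`, `ABLATED_ARG_C`, and `rank_drops` are illustrative only.

```python
# Arg-C F1 scores copied from Table 4.
FULL_ARG_C = {"RAMS": 52.9, "WikiEvents": 69.1}
ABLATED_ARG_C = {
    "w/o Event type": {"RAMS": 52.2, "WikiEvents": 68.5},
    "w/o SRL Attention": {"RAMS": 51.6, "WikiEvents": 67.7},
    "w/o Prompts": {"RAMS": 51.3, "WikiEvents": 67.4},
    "w/o PET": {"RAMS": 49.1, "WikiEvents": 64.9},
}


def rank_drops(dataset: str) -> list[tuple[str, float]]:
    """Return (variant, Arg-C drop) pairs, largest drop first."""
    drops = {
        variant: round(FULL_ARG_C[dataset] - scores[dataset], 1)
        for variant, scores in ABLATED_ARG_C.items()
    }
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for dataset in FULL_ARG_C:
        print(dataset, rank_drops(dataset))
```

On both datasets the ordering is the same: removing PET costs the most (3.8 points on RAMS, 4.2 on WikiEvents), and removing event type information costs the least (0.7 and 0.6).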

Table 5. Comparison of SPARE with GPT-3.5 and GPT-4 on the RAMS dataset.

Method	RAMS Arg-I	RAMS Arg-C
GPT3.5	46.2	40.4
GPT4	50.4	42.8
SPARE	57.3	52.9

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
