Article

IECI: A Pipeline Framework for Iterative Event Causal Identification with Dynamic Inference Chains

Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(13), 7348; https://doi.org/10.3390/app15137348
Submission received: 19 May 2025 / Revised: 17 June 2025 / Accepted: 28 June 2025 / Published: 30 June 2025

Abstract

Event Causality Identification (ECI) is a crucial task in Information Extraction (IE). However, information about events described in documents is often distributed across sentences, which makes it difficult for existing studies to capture long-distance causal relations between events. To address these issues, this paper proposes Iterative Event Causal Identification (IECI), a pipelined framework for event causality identification that integrates two modules. The first module introduces Prompt-Based Event Detection (PRED), which integrates semantic role awareness with prompt templates to provide foundational input for the next module. The second module proposes the Semantic-Role Guided Causal Inference Graph (SRCIG), which identifies causal relations between events by constructing a causal graph and applying a dynamic threshold adjustment mechanism during the iterative process. Our experiments show that PRED and IECI consistently outperform the state-of-the-art baseline model. Specifically, on the EventStoryLine dataset, they achieve F1 improvements of 3.7–9.8% and 4.2–18.8%, respectively, while on MAVEN-ERE the gains are 4.2–10.3% and 1.0–40.3%. This demonstrates the effectiveness and robustness of the proposed framework in both event detection and event causality identification.

1. Introduction

Event Causality Identification (ECI) plays a vital role in Natural Language Processing (NLP): it both deepens the understanding of potential relationships between events in text and supports practical applications such as Intelligent Recommendation [1] and Intelligent Education [2]. However, in real-world corpora, event causality often spans multiple sentences or even paragraphs, with strong long-range dependencies. These cross-sentence causal relations are difficult to capture through local windows or static contexts, which further increases the difficulty of modeling [3]. Traditional methods, limited to surface or local context features, struggle to capture deeper semantic or causal structures. In recent years, some studies have introduced Graph Neural Networks (GNNs) to construct causal graphs between events [4] and to infer causal relations through graph structures. However, these methods still face challenges, such as inadequate control of information propagation in causal graphs, insensitivity to edge weight updates, and the introduction of potentially redundant edges. These issues can easily lead to the over-propagation of erroneous causal paths, harming both the accuracy of identification and the interpretability of causal chains. More recently, studies have explored sequence decoders that directly generate causal event pairs or causal chains, using pre-trained encoder-decoder and decoder-only models (such as T5 and GPT) to enhance the flexibility and generation capability of document-level causal inference [5,6].
Event Detection (ED) aims to identify the presence of predefined events in a text and locate the corresponding event-related content. Most traditional approaches treat event detection as a classification or sequence labeling problem. However, events are expressed in diverse forms with fuzzy semantic boundaries, and some lack obvious lexical clues. As a result, models tend to rely on large amounts of fine-grained annotation, increasing the annotation cost [7]. In recent years, Pretrained Language Models (PLMs) have performed well in event detection tasks [8]. These models capture contextual semantic associations by being pre-trained on large-scale corpora. They also transfer well and can even be applied to weakly supervised and unsupervised scenarios [9]. In addition, natural language prompting approaches have gradually been introduced into event detection, transforming the original structured task into a language understanding problem and providing a new perspective on event detection. Although these methods have led to notable advancements, their application to event causality reasoning remains limited.
To address the challenges above, this paper proposes a pipeline framework for iterative event causality identification with dynamic inference chains (IECI). IECI consists of two modules, where ED is both an independent detection task and a critical step providing ECI input. Specifically, the Prompt-Based Event Detection (PRED) module effectively extracts events from the raw text by integrating semantic understanding with prompt-based reasoning. These events provide clear inputs for causal recognition. On this basis, the Semantic-Role Guided Causal Inference Graph (SRCIG) module performs iterative reasoning to discover new causal relations. It also dynamically optimizes the causal graph structure, enhancing the overall accuracy and robustness of causal inference. The experimental results show that the proposed IECI framework performs strongly in both modules, demonstrating a wide range of adaptability and application potential.
The main contributions of this paper are as follows:
(1)
We propose a pipeline-based event causality identification framework, IECI, in which the SRCIG module performs causal inference by incorporating semantic role guidance, dynamic threshold control, and iterative optimization mechanisms.
(2)
We introduce the PRED module, which leverages prompts to supervise the downstream causal inference task of the IECI framework.
(3)
The performance of IECI is systematically evaluated using both the EventStoryLine and MAVEN-ERE datasets. Experimental results show that IECI significantly outperforms state-of-the-art methods across several key metrics, demonstrating the accuracy and robustness of the proposed model.

2. Related Work

2.1. Event Causality Identification

In ECI tasks, sentence-level ECI focuses mainly on the causal relationships between events within a single sentence, aiming to provide precise information for more complex reasoning. Hashimoto designed a weakly supervised method that exploits the multilingual redundancy and causal relationship descriptions of Wikipedia [10], efficiently extracting entity-level causal relationships through cross-lingual context aggregation and linear classifiers. Liu et al. enhanced causal inference over event pairs by combining the semantic relationships of events in an external knowledge base with sentence-level text [11]. They replaced events with generic mask tokens, forcing the model to focus on contextual patterns independent of specific events. Zuo et al. used external resources to address data scarcity [12], first learning context-specific causal patterns from unlabeled external resources and then using contrastive alignment strategies to transfer these patterns to the target ECI model. Hu et al. focused on both the events themselves and the structure between events [13]. They employed a GNN-based aggregator to enrich event representations with neighboring semantic elements and an LSTM-based path aggregator to capture multi-hop relational paths between events. Liu et al. proposed the KEPT model [14], which combines prompt tuning with external Knowledge Bases (KBs), integrates relational information and event background knowledge through an interactive attention mechanism, and learns knowledge representations to better identify causal relationships. Wang et al. introduced ensemble learning [15]. They combined the Mamba architecture with Temporal Convolutional Networks (TCNs) to overcome the quadratic time complexity limitation of traditional Transformers and employed a dual-graph structure to capture document-level event associations.
Compared to sentence-level ECI, document-level ECI requires modeling causal relationships between events across a broader context. It faces challenges like large information spans and multi-hop causal chains, making the research more complex and in-depth. Gao et al. considered both global and local causality [16], separating intra-sentence and inter-sentence causalities for prediction and introducing lexical similarity features to assist in understanding the meaning between events. Tran Phu and Nguyen modeled inter-sentence event relationships by constructing multi-source interaction graphs and introduced a regularization mechanism to reduce noise interference [17]. Chen et al. proposed transforming the identification of document-level event causal relationships into a node classification problem and employed a graph transformer to construct high-order causal chains [18]. Liu et al. proposed the PPAT model [19], which achieves hierarchical reasoning by constructing a Sentence boundary Event Relational Graph (SERG). It forms causal chains from intra-sentence to inter-sentence levels through a pairwise attention mechanism, gradually uncovering causal relationships hidden across sentence boundaries. Chen et al. proposed the CHEER model [20], which considers the existence of central events in each document and introduces central event features. It constructs an interaction graph using event nodes and event pair nodes, supporting high-order causal propagation and coreferential consistency reasoning. Yuan et al. proposed SENDIR [21], which designed a sparse attention mechanism to capture long-range dependencies and reduce interference from low-information-density regions in documents. Zeng et al. proposed the EiGC model [22], which innovatively integrates multiple types of event relationship graphs and event aware constraint mechanisms. It encodes high-order structural features through a Relational Graph Convolutional Network (R-GCN) and constrains them using Integer Linear Programming (ILP).
Although the introduction of graph neural networks, causal graph modeling, and other methods has led to some progress, most of the existing causal inference methods rely on static graph structures or rule definitions. They lack dynamic adjustment mechanisms and struggle to adapt to semantically shifting causal distributions in real-world texts, thereby limiting the inference depth and robustness of the models.

2.2. Event Detection

In early research on event detection, feature engineering [23,24] was the main development direction. With the rise of deep learning, neural-network-based methods gradually replaced traditional models that rely on manually crafted features. Nguyen et al. were the first to apply neural networks to joint event extraction [25], treating event extraction as a sequence labeling task and introducing memory vectors to store information, which better addressed the drawbacks of pipelined frameworks. Sha et al. proposed the dbRNN model [26], which effectively connects trigger words with their associated arguments via syntactic paths and enhances semantic modeling capabilities. Orr et al. introduced the DAG-GRU model [27] to address the performance differences and stability of neural networks in event detection tasks. The model improves trigger word classification by combining contextual and syntactic information. Yan et al. proposed the MOGANED model [28], which constructs dependency syntactic graphs and uses attention mechanisms to focus on higher-order syntactic information, performing multi-class classification to enhance the model's ability to distinguish key information.
In recent years, Transformer-based pre-trained models have demonstrated stronger transferability and semantic understanding capabilities in event detection. Huang and Ji proposed the SS-VQ-VAE model [29] to address the high cost of traditional event extraction methods relying on predefined event types. This model uses a pre-trained BERT to obtain candidate triggers and predict their types. It then uses a variational autoencoder as a regularizer to reconstruct the trigger words and alleviate overfitting to labeled types. Chen et al. constructed a Structural Causal Model (SCM) that introduces backdoor adjustment for causal intervention in the context [30], blocking the confounding effects between trigger words, context, and predicted results. This provides new ideas for improving the generalization of event detection models. Sheng et al. proposed the CorED model [31], which utilizes the literal semantics of type names to construct dynamic graph structures and employs graph neural networks to aggregate relevant type information, improving representation learning for low resource types. Wu et al. proposed SMGED [32], which designs a multi-layer graph attention network combined with a skip connection mechanism to capture syntactic dependencies of different orders effectively. Bi-directional Long Short-Term Memory (BiLSTM) is introduced to encode contextual information. Its output is fused with that of the graph attention networks to compensate for the information loss caused by external dependency analysis tools in traditional methods. In low-resource event detection tasks, Fu et al. proposed a syntactic-embedding-based model, SynED [33]. It extracts syntactic information using SpaCy and introduces a syntactic gating matrix in the Transformer’s self-attention layer to improve the accuracy of trigger word recognition. Liu et al. proposed the DE3TC model [34], which concatenates event type clues, type-specific clues, original sentences, and similar type descriptions into a semantic modeling sequence. This sequence is then encoded using a pre-trained language model to achieve end-to-end event detection.
Although the above methods have achieved significant improvements in event detection performance, most of them rely on contextual representations or structural information to enhance trigger word recognition. However, they lack sufficient semantic prior modeling of event types and do not incorporate effective semantic guidance mechanisms. As a result, the models often exhibit unstable recognition performance when dealing with events that involve complex semantics or blurred boundaries.

3. Methodology

To address the challenges of accurately recognizing events and the low visibility of causal relationships, this paper proposes a pipelined event causality identification framework, IECI, which contains two modules. The PRED module is constructed by combining semantic role labeling (SRL) with a prompt-based mechanism. The SRCIG module is constructed by combining graph neural networks with an iterative inference mechanism. Together, the two modules achieve the dual goals of accurately identifying events from raw text and reasoning about their causal dependencies. Section 3.1 introduces the design and implementation of the event detection module, and Section 3.2 details the causal graph construction and inference process.

3.1. Prompt-Based Event Detection

The PRED module integrates semantic and prompt-based reasoning for the ED task. Figure 1 illustrates the overall structure of the PRED, which consists of four key sub-modules: Semantic-Based Context Embedding, Prompt-Based Type-Aware Embedding, Attention-Based Feature Fusion, and Event Prediction.

3.1.1. Semantic-Based Context Embedding

Raw input is modeled with a subword mapping mechanism. The input $C = \{c_1, c_2, \ldots, c_n\}$ is a character sequence of length $n$, where $c_i$ denotes the $i$-th character. After being tokenized by the BERT tokenizer based on the WordPiece algorithm [35], the character sequence is converted into a subword sequence:
$$S = \{\hat{w}_1, \hat{w}_2, \ldots, \hat{w}_m\}, \quad n \geq m,$$
where $\hat{w}_j$ denotes the $j$-th subword.
The WordPiece algorithm divides characters into subwords by analyzing statistical patterns in the training corpus. The resulting character-to-subword mapping is
$$M_{C \rightarrow S} = \{(s_j, e_j) \mid s_j, e_j \in \{1, \ldots, n\}\},$$
where $(s_j, e_j)$ represents the start and end of the character span corresponding to the $j$-th subword in $S$.
Context embedding. The subword sequence $S = \{\hat{w}_1, \hat{w}_2, \ldots, \hat{w}_m\}$ is first converted into input embeddings $X \in \mathbb{R}^{m \times d_{\mathrm{BERT}}}$ through token, positional, and segment embeddings:
$$X = E_{\mathrm{tok}} + E_{\mathrm{pos}} + E_{\mathrm{seg}},$$
where $E_{\mathrm{tok}}$, $E_{\mathrm{pos}}$, and $E_{\mathrm{seg}}$ represent the token embeddings, positional embeddings, and segment embeddings, respectively.
Then, $X$ is fed into a pre-trained BERT encoder to extract contextual representations:
$$H_{\mathrm{BERT}} = \mathrm{Encoder}(X) = \{h_1, h_2, \ldots, h_m\} \in \mathbb{R}^{m \times d_{\mathrm{BERT}}},$$
where $h_j$ is the hidden representation of the $j$-th subword in context and $d_{\mathrm{BERT}}$ is the hidden dimension of the BERT output.
BiLSTM is a recurrent neural network that captures both forward and backward contextual information. To strengthen the modeling of sequential structure, this paper applies a BiLSTM on top of the BERT output to obtain deep hidden states:
$$H_{\mathrm{BiLSTM}} = \mathrm{BiLSTM}(H_{\mathrm{BERT}}) = \{[\overrightarrow{h}_1; \overleftarrow{h}_1], \ldots, [\overrightarrow{h}_m; \overleftarrow{h}_m]\} \in \mathbb{R}^{m \times 2d_{\mathrm{lstm}}},$$
where $\overrightarrow{h}_i$ and $\overleftarrow{h}_i$ denote the forward and backward hidden states of the $i$-th token and $2d_{\mathrm{lstm}}$ is the hidden dimension of the BiLSTM output.
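For concreteness, a minimal PyTorch sketch of this BERT + BiLSTM context encoder is shown below; it assumes a Hugging Face `bert-base-uncased` checkpoint and an illustrative BiLSTM hidden size, not the exact configuration used in the paper.

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class ContextEncoder(nn.Module):
    """BERT subword encoding followed by a BiLSTM (a minimal sketch)."""
    def __init__(self, d_lstm: int = 256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, d_lstm,
                              batch_first=True, bidirectional=True)

    def forward(self, input_ids, attention_mask):
        # H_BERT: contextual subword representations from the encoder
        h_bert = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # H_BiLSTM: concatenated forward/backward states, (batch, m, 2 * d_lstm)
        h_bilstm, _ = self.bilstm(h_bert)
        return h_bilstm

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
batch = tokenizer(["A bomb exploded near the market."], return_tensors="pt")
H = ContextEncoder()(batch["input_ids"], batch["attention_mask"])
```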
Positional alignment and semantic role attention mechanism. For each word representation, its start position $s_i$ and end position $e_i$ are recovered from the original subword position mapping $M_{C \rightarrow S}$. In addition, a semantic role attention mechanism is introduced to guide the module to assign higher attention weights to important semantic slots:
$$\alpha_{ij} = \frac{\exp(h_i^{\top} W h_j + \mathrm{SRL}_{ij})}{\sum_{k=1}^{n} \exp(h_i^{\top} W h_k + \mathrm{SRL}_{ik})},$$
where $h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i] \in \mathbb{R}^{2d_{\mathrm{lstm}}}$ is the BiLSTM hidden representation of the $i$-th token, $\mathrm{SRL}_{ij} = \lambda$ if tokens $i$ and $j$ appear within the same predicate-argument structure and $0$ otherwise (with $\lambda$ a trainable scalar), and $W \in \mathbb{R}^{2d_{\mathrm{lstm}} \times 2d_{\mathrm{lstm}}}$ is a learnable bilinear attention weight matrix [19].
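A minimal sketch of this bilinear attention with an SRL bias is given below; the module name, toy tensor shapes, and the trainable scalar `lam` are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class SemanticRoleAttention(nn.Module):
    """Bilinear token-pair attention with a trainable SRL bias (a sketch)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.empty(hidden_dim, hidden_dim))
        nn.init.xavier_uniform_(self.W)
        self.lam = nn.Parameter(torch.tensor(1.0))  # trainable scalar lambda

    def forward(self, h: torch.Tensor, srl: torch.Tensor) -> torch.Tensor:
        # h: (n, hidden_dim) BiLSTM states; srl: (n, n), 1 where tokens i and j
        # share a predicate-argument structure, 0 otherwise.
        scores = h @ self.W @ h.T + self.lam * srl
        return torch.softmax(scores, dim=-1)  # row-normalized alpha_ij

h = torch.randn(6, 512)                      # 6 tokens, 2 * d_lstm = 512
srl = torch.zeros(6, 6); srl[1, 3] = srl[3, 1] = 1.0
alpha = SemanticRoleAttention(512)(h, srl)
```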

3.1.2. Prompt-Based Type-Aware Embedding

This paper designs a text-independent prompt template encoding module to provide the PRED with more explicit semantic guidance for event types. This module uses a set of predefined natural language prompt statements to introduce structured language templates. These templates serve as the guidance information for the feature fusion module to complete the modeling and discrimination of potential event types in the context.
Each event type is associated with a uniform format, such as “The Event in this sentence is related to law, profession, place, society, etc.”. The template length is set based on the actual maximum length. The “Event” is the slot name. This decision is based on two reasons: (1) ensuring structural consistency between different event types; (2) simplifying the decoding process by reducing the variation of prompt formats. Although each event type template contains only one slot, the slot will connect multiple event spans, as multiple events may be matched in the text.
We constructed a structured table of initial prompt templates based on this set of prompt templates. Each column of the table corresponds to an event type, and the columns contain only one slot for populating information about one event under that type. This table structure provides explicit types and fill slots for subsequent event decoding, allowing the PRED to clearly distinguish between various types of events and their components structurally.
Specifically, let there be $k$ event types in total; the set of prompt templates is then $P = \{p_1, p_2, \ldots, p_k\}$, where each $p_k$ is an instruction template sentence for the $k$-th event type. After encoding, the corresponding prompt representation is obtained:
$$H_k^{p} = \mathrm{Encoder}(p_k) = \{h_1^{p}, h_2^{p}, \ldots, h_{m_k}^{p}\}, \quad H_k^{p} \in \mathbb{R}^{m_k \times d},$$
where $m_k$ denotes the length of the $k$-th prompt template and $d$ is the hidden layer dimension.
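The sketch below illustrates how such templates might be encoded with a Hugging Face BERT encoder; the two prompt sentences are hypothetical examples in the paper's stated format, and the encoder choice is an assumption.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

# Hypothetical prompt templates, one per event type, in the stated format.
prompts = [
    "The Event in this sentence is related to law.",
    "The Event in this sentence is related to place.",
]

with torch.no_grad():
    batch = tokenizer(prompts, padding=True, return_tensors="pt")
    out = encoder(**batch)

# H_k^p: one (m_k x d) representation per event-type template
prompt_reprs = [out.last_hidden_state[i, : int(batch["attention_mask"][i].sum())]
                for i in range(len(prompts))]
```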

3.1.3. Attention-Based Feature Fusion

After obtaining the contextual embeddings and prompt template embeddings of the input text, the PRED enters the feature fusion phase. This phase first applies structured self-attention processing to the encoded prompt template part. Then, it performs semantic alignment between the event type and the text through the cross-attention mechanism, combined with the contextual embeddings.
In the self-attention layer of prompt encoding, the attention mask is specially designed to control the attention relationships between different tokens. As illustrated in Figure 1, light colors indicate permitted attention and dark colors prohibit it. The mask follows two core rules:
  • All slots across event type templates are free to attend to each other, enabling shared semantic modeling.
  • Each slot and the event type it belongs to can attend to each other.
This mechanism enhances slot representation capability and captures the shared semantics among different event types.
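A possible construction of this additive self-attention mask is sketched below, assuming one slot per event-type template as described above; the position indices and helper name are illustrative.

```python
import torch

def build_prompt_attention_mask(type_positions, slot_positions, length):
    """Additive self-attention mask over the prompt table (a sketch).
    0 means attention is permitted; -inf means it is prohibited."""
    mask = torch.full((length, length), float("-inf"))
    mask.fill_diagonal_(0.0)
    # Rule 1: all slots may attend to each other across event-type templates.
    for i in slot_positions:
        for j in slot_positions:
            mask[i, j] = 0.0
    # Rule 2: each slot and its own event type may attend to each other
    # (assuming one slot per event-type column, as described above).
    for t, s in zip(type_positions, slot_positions):
        mask[t, s] = 0.0
        mask[s, t] = 0.0
    return mask

mask = build_prompt_attention_mask(type_positions=[0, 3], slot_positions=[1, 4], length=6)
```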
Cross-attention fusion. This section constructs standard cross-attention by taking the table encoding as the decoder input and the contextual encoding as the encoder hidden states. The slots after cross-attention fusion are represented as
$$H_{\mathrm{dec}} = \mathrm{Decoder}(H_{\mathrm{tab}}, H_{\mathrm{ctx}}),$$
where $H_{\mathrm{ctx}} \in \mathbb{R}^{L \times d}$ denotes the contextual embedding output, $L$ is the text length, and $d$ is the hidden layer dimension. $H_{\mathrm{tab}} \in \mathbb{R}^{T \times d}$ denotes the structured representation of the template input, and $T$ is the length of the structure table.
The core formula of the cross-attention module is
$$\mathrm{Attn}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}} + \mathrm{mask}\right)V,$$
where $Q = H_{\mathrm{tab}}W_Q$, $K = H_{\mathrm{ctx}}W_K$, and $V = H_{\mathrm{ctx}}W_V$. The table vectors are used as $Q$ and the context as $K$ and $V$ for perceptual fusion of contextual representations. $d_k$ denotes the dimensionality of each attention head and is used for scaling.
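The following sketch shows this masked cross-attention using PyTorch's `nn.MultiheadAttention`, which already implements the scaled dot-product form above; the head count and toy shapes are assumptions, and the projection weights are freshly initialized rather than trained.

```python
import torch
import torch.nn as nn

def masked_cross_attention(H_tab, H_ctx, mask, num_heads=8):
    """Fuse the prompt-table encoding (queries) with the context encoding
    (keys/values): Attn(Q, K, V) = Softmax(QK^T / sqrt(d_k) + mask) V.
    A sketch only; the projection weights here are freshly initialized."""
    attn = nn.MultiheadAttention(embed_dim=H_tab.size(-1),
                                 num_heads=num_heads, batch_first=True)
    # mask is additive: 0 for permitted positions, -inf for prohibited ones.
    out, weights = attn(query=H_tab, key=H_ctx, value=H_ctx, attn_mask=mask)
    return out, weights

# toy shapes: T = 10 table tokens, L = 40 context tokens, d = 768
H_tab = torch.randn(1, 10, 768)
H_ctx = torch.randn(1, 40, 768)
mask = torch.zeros(10, 40)                  # allow all attention in this toy case
H_dec, _ = masked_cross_attention(H_tab, H_ctx, mask)
```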

3.1.4. Event Prediction

The output of the table after cross-attention is $H_{\mathrm{dec}}$, and each slot vector $h^{(\mathrm{slot})} \in \mathbb{R}^{d}$ is used to predict the start and end positions of events. Two learnable vectors $w_{\mathrm{start}}, w_{\mathrm{end}} \in \mathbb{R}^{d}$ are introduced and combined with the slot vector to form start and end query vectors:
$$q_{\mathrm{start}} = h^{(\mathrm{slot})} \odot w_{\mathrm{start}},$$
$$q_{\mathrm{end}} = h^{(\mathrm{slot})} \odot w_{\mathrm{end}},$$
where $\odot$ denotes element-wise multiplication. These queries are then correlated with the context sequence $H_{\mathrm{ctx}} = [h_1^{\mathrm{ctx}}, \ldots, h_L^{\mathrm{ctx}}]$ to generate position scores:
$$\mathrm{score}_i^{\mathrm{start}} = \langle h_i^{\mathrm{ctx}}, q_{\mathrm{start}} \rangle,$$
$$\mathrm{score}_i^{\mathrm{end}} = \langle h_i^{\mathrm{ctx}}, q_{\mathrm{end}} \rangle,$$
where $h_i^{\mathrm{ctx}}$ denotes the hidden vector of the $i$-th context token and $\langle \cdot, \cdot \rangle$ denotes the inner product, which measures the matching degree. The score $\mathrm{score}_i^{\mathrm{start/end}}$ indicates how strongly the $i$-th position is scored as a potential start or end point of the event span for the slot.
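A minimal sketch of this scoring step is shown below; it assumes the slot vector is combined with the learnable start/end vectors element-wise, as in the equations above, and the toy dimensions are illustrative.

```python
import torch

def span_scores(h_slot, w_start, w_end, H_ctx):
    """Score each context position as a potential start/end of the event span
    for one slot (a sketch of q_start/q_end and score_i above)."""
    q_start = h_slot * w_start              # element-wise combination (assumed)
    q_end = h_slot * w_end
    start_scores = H_ctx @ q_start          # <h_i^ctx, q_start> for every i
    end_scores = H_ctx @ q_end
    return start_scores, end_scores

d, L = 768, 40
h_slot, w_start, w_end = torch.randn(d), torch.randn(d), torch.randn(d)
H_ctx = torch.randn(L, d)
s, e = span_scores(h_slot, w_start, w_end, H_ctx)
pred_start, pred_end = int(s.argmax()), int(e.argmax())
```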
Because a single slot may correspond to multiple candidate spans, a structural selector is introduced to determine whether each span should be retained.
For a candidate span $(s_j, e_j)$, a fusion vector is formed by concatenating its contextual representations:
$$g_j = [z_{s_j}; z_{e_j}],$$
A linear classifier then determines its confidence:
$$\hat{p}_j = \sigma(w^{\top} g_j + b),$$
where $\sigma$ is the Sigmoid function and $\hat{p}_j \in [0, 1]$ is the prediction confidence.
In the training phase, the model may predict multiple candidate events, while the annotated data typically provide only one or a few ground-truth labels. The Hungarian matching algorithm is introduced to optimally match the predictions with the true labels and address this mismatch.
Each predicted span $(\hat{s}_i, \hat{e}_i)$ is matched with a corresponding true span $(s_i, e_i)$. If the number of predictions exceeds the number of true spans, zero-padding is applied to maintain alignment. After matching, cross-entropy losses are computed for the start and end positions, respectively:
$$L_{\mathrm{start}} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{CE}(\mathrm{score}_i^{\mathrm{start}}, s_i),$$
$$L_{\mathrm{end}} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{CE}(\mathrm{score}_i^{\mathrm{end}}, e_i),$$
The final span loss is
$$L_{\mathrm{span}} = \frac{1}{2}(L_{\mathrm{start}} + L_{\mathrm{end}}),$$
After averaging over all samples and event slots within the batch, the overall loss is
$$L_{\mathrm{total}} = \frac{1}{B} \sum_{b=1}^{B} \frac{1}{M_b} \sum_{j=1}^{M_b} L_{\mathrm{span}}^{(b, j)},$$
where $B$ denotes the batch size, $M_b$ denotes the number of event slots in the $b$-th sample, and $L_{\mathrm{span}}^{(b, j)}$ denotes the loss value of the $j$-th slot.
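A small sketch of the matching step is given below, using SciPy's Hungarian solver with a simple boundary-offset cost; the cost definition and padding convention are illustrative assumptions, not the exact training code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_spans(pred_spans, gold_spans):
    """Hungarian matching between predicted and gold spans (a sketch).
    The cost is the sum of absolute boundary offsets, and the gold list is
    zero-padded when there are more predictions than labels."""
    padded_gold = gold_spans + [(0, 0)] * max(0, len(pred_spans) - len(gold_spans))
    cost = np.zeros((len(pred_spans), len(padded_gold)))
    for i, (ps, pe) in enumerate(pred_spans):
        for j, (gs, ge) in enumerate(padded_gold):
            cost[i, j] = abs(ps - gs) + abs(pe - ge)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

# three predicted spans matched against two annotated spans
pairs = match_spans([(2, 4), (10, 12), (20, 21)], [(3, 4), (10, 12)])
```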

3.2. Semantic-Role Guided Causal Inference Graph

The second module of IECI is the Semantic-Role Guided Causal Inference Graph (SRCIG), as shown in Figure 2. SRCIG consists of four key components: Contextual Representation with Semantic Role Attention, Event Causal Predictor (ECP), Causal Relation Graph Construction (CRGC), and Causal Graph Refinement (CGR). Except for the initial stage, the SRCIG is iterated in the order of ECP, CRGC, and CGR to optimize the causal representation and structure of events.

3.2.1. Contextual Representation with Semantic Role Attention

The SRCIG module is jointly modeled by BERT and BiLSTM. At the text input stage, it incorporates a semantic role attention mechanism to enhance the expressiveness of the semantic representation of events.
Specifically, the input document is first represented as a sequence of n words x 1 , x 2 , , x n . This sequence is then fed into the pre-trained language model BERT to obtain a sequence of semantic representation vectors E 1 , E 2 , , E n corresponding to each word. Subsequently, BiLSTM is introduced to capture global semantic dependency information further. The BERT output is used as the input of BiLSTM to obtain the hidden state representations H 1 , H 2 , , H n .
We employ semantic role labeling (SRL) as a guiding signal to incorporate prior syntactic–semantic knowledge, increasing the weight of event-detection-related words. A Semantic Role Attention module is proposed to weight the BiLSTM output, computed as follows.
First, the semantic role embedding vectors $S_1, S_2, \ldots, S_n$ are obtained for each word; these are generated by SRL and correspond to different role categories. For each time step $t$, the corresponding semantic attention score $\alpha_t$ is calculated as
$$\alpha_t = \frac{\exp(H_t W_a S_t)}{\sum_{j=1}^{n} \exp(H_j W_a S_j)},$$
where $W_a$ is a learnable attention weight matrix.
Finally, the contextual representation of each position is obtained through attention weighting:
$$y_t = \alpha_t \cdot H_t,$$
The contextual representation of the whole text can then be denoted as $y_1, y_2, \ldots, y_n$, which serves as the input to the subsequent modules.
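A compact sketch of this semantic-role-weighted context representation is given below; the module name and embedding dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SRLWeightedContext(nn.Module):
    """Weight BiLSTM states by a semantic role attention score (a sketch)."""
    def __init__(self, hidden_dim: int, role_dim: int):
        super().__init__()
        self.W_a = nn.Parameter(torch.empty(hidden_dim, role_dim))
        nn.init.xavier_uniform_(self.W_a)

    def forward(self, H: torch.Tensor, S: torch.Tensor) -> torch.Tensor:
        # H: (n, hidden_dim) BiLSTM states; S: (n, role_dim) SRL embeddings
        scores = torch.einsum("td,dr,tr->t", H, self.W_a, S)  # H_t W_a S_t
        alpha = torch.softmax(scores, dim=0)                  # normalize over positions
        return alpha.unsqueeze(-1) * H                        # y_t = alpha_t * H_t

y = SRLWeightedContext(512, 64)(torch.randn(30, 512), torch.randn(30, 64))
```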

3.2.2. Event Causal Predictor

First, SRCIG uses the context representations $y_1, y_2, \ldots, y_n$ obtained in Section 3.2.1 as the initial event node representations $z_i$ in the event causal graph. In each iteration, for every pair of candidate events $(e_i, e_j)$, SRCIG concatenates the context representations $h_i, h_j$ and the difference $z_i - z_j$ between their node representations. The concatenated vector is then fed to the causal classifier for inference:
$$v_{ij} = [h_i; h_j; z_i - z_j],$$
Next, a Multilayer Perceptron (MLP) is used as the causal judgment module to predict the causal category of the event pair $(e_i, e_j)$:
$$p_{ij} = \mathrm{softmax}(\mathrm{MLP}(v_{ij})) \in \mathbb{R}^{3},$$
where $p_{ij}$ denotes the probability distribution over the causal relation categories of the event pair. This probability is subsequently used as the basis for the causal graph construction module.
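The pair classifier can be sketched as follows; the hidden width, the three relation classes, and the module name are assumptions for illustration.

```python
import torch
import torch.nn as nn

class EventCausalPredictor(nn.Module):
    """Classify a candidate event pair from [h_i; h_j; z_i - z_j] (a sketch).
    The hidden width and the three relation classes are assumptions."""
    def __init__(self, ctx_dim: int, node_dim: int, num_classes: int = 3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * ctx_dim + node_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, h_i, h_j, z_i, z_j):
        v_ij = torch.cat([h_i, h_j, z_i - z_j], dim=-1)
        return torch.softmax(self.mlp(v_ij), dim=-1)   # p_ij over relation classes

ecp = EventCausalPredictor(ctx_dim=512, node_dim=512)
p = ecp(torch.randn(512), torch.randn(512), torch.randn(512), torch.randn(512))
```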

3.2.3. Causal Relation Graph Construction

Based on the above causal predictions for event pairs, SRCIG constructs a document-level event causal graph. The graph's vertices are the events, and the edges represent the causal relationships between event pairs. Edges are constructed with the following strategy: for each pair of events $(e_i, e_j)$, the model computes the causal prediction probability, and if the score $p_{ij}(r)$ of a relation type $r$ exceeds a threshold $\theta$, a directed edge is added to indicate the existence of that relation. Edge types include intra-sentence edges and inter-sentence edges.
To avoid missed detections caused by fixed thresholds, SRCIG introduces a dynamic threshold adjustment mechanism based on event density. This mechanism automatically adjusts the threshold $\theta$ according to the number $N$ of recognized events:
$$\theta = \theta_{\mathrm{base}} - \lambda N,$$
where $\theta_{\mathrm{base}}$ is the initial threshold and $\lambda$ is the decay factor.
This strategy reduces the threshold in long documents with many events, avoiding the omission of true causal relationships due to overly strict standards. In contrast, for short texts, a higher threshold is maintained to ensure the accuracy of predicting relationships.
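A minimal sketch of the density-aware thresholding and edge construction is shown below, assuming the linear form $\theta = \theta_{\mathrm{base}} - \lambda N$ reconstructed above; the numeric values of $\theta_{\mathrm{base}}$ and $\lambda$, and the edge data structure, are illustrative rather than the tuned settings.

```python
def dynamic_threshold(num_events, theta_base=0.6, lam=0.002):
    """Density-aware threshold theta = theta_base - lam * N (a sketch; the
    values of theta_base and lam are illustrative, not the tuned ones)."""
    return max(theta_base - lam * num_events, 0.0)   # floor added only as a guard

def build_edges(pair_probs, num_events):
    """Add a directed edge for every relation whose probability exceeds theta.
    pair_probs maps (i, j) -> {relation_name: probability}."""
    theta = dynamic_threshold(num_events)
    edges = []
    for (i, j), probs in pair_probs.items():
        for rel, p in probs.items():
            if rel != "none" and p > theta:
                edges.append((i, j, rel))
    return edges

edges = build_edges({(0, 1): {"cause": 0.72, "none": 0.28}}, num_events=25)
```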

3.2.4. Causal Graph Refinement

To effectively model the complex causal structure between events and capture latent inference patterns in causal chains, this module introduces a graph-structure-aware mechanism for causal information updating. The node representation z i in the causal graph is progressively refined through a process of causal inference complementation.
Causality completion based on inference chains. First, based on the causal relationships already identified in the current graph, we find all transitive chains of the form $(e_i \rightarrow e_j,\ e_j \rightarrow e_k)$, which represent potential conduction paths from $e_i$ to $e_k$. However, to avoid erroneous completion, we do not add the transitive link directly; instead, the semantic similarity of event embeddings is introduced as a decision criterion. Specifically, if the cosine similarity between the embeddings of events $e_i$ and $e_k$ exceeds a predefined threshold, the causal edge $(e_i \rightarrow e_k)$ is added:
$$\cos(h_i, h_k) = \frac{h_i \cdot h_k}{\lVert h_i \rVert \, \lVert h_k \rVert} > \delta,$$
where $\delta$ is the similarity threshold.
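A small sketch of this chain-completion step is given below; the similarity threshold value and the edge data structure are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def complete_transitive_edges(edges, embeddings, delta=0.8):
    """For every chain e_i -> e_j -> e_k, add e_i -> e_k only when the cosine
    similarity of the event embeddings exceeds delta (a sketch; the value of
    delta is illustrative)."""
    adjacency = {}
    for i, j in edges:
        adjacency.setdefault(i, set()).add(j)
    completed = set(edges)
    for i, successors in adjacency.items():
        for j in successors:
            for k in adjacency.get(j, set()):
                if k != i and (i, k) not in completed:
                    sim = F.cosine_similarity(embeddings[i], embeddings[k], dim=0)
                    if sim > delta:
                        completed.add((i, k))
    return completed

emb = torch.randn(4, 512)
graph = complete_transitive_edges({(0, 1), (1, 2)}, emb)
```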
Attention-based edge weight calculation. To improve the model's inference accuracy, the causal graph is divided into intra-sentence and inter-sentence parts for attention calculation. The more important the source and target nodes, the more critical the edge is to inference. The attention scores of the source and target nodes are calculated from the contextual event representations $h_i, h_j$:
$$\alpha_{ij}^{(s)} = \sigma(W_s h_i + b_s), \quad \alpha_{ij}^{(t)} = \sigma(W_t h_j + b_t),$$
where $W_s$ and $b_s$ denote the learnable weight matrix and bias term of the source node, $\alpha_{ij}^{(s)}$ denotes the attention weight of event $e_i$ within the event pair $(e_i, e_j)$, $\alpha_{ij}^{(t)}$ denotes the attention weight of event $e_j$ within the event pair $(e_i, e_j)$, and $\sigma$ is the activation function.
The combined attention weight $w_{ij}$ of the edge connecting the event pair $(e_i, e_j)$ is calculated as
$$w_{ij} = \alpha_{ij}^{(s)} + \alpha_{ij}^{(t)},$$
Multi-Head Attention aggregation and node representation update. A Multi-Head Attention mechanism is introduced to update the representation of each event node $e_i$ in the graph, based on the representations and edge weights of its neighboring nodes. In the $(l+1)$-th iteration, the representation $z_i^{(l+1)}$ of node $e_i$ in the causal graph is calculated as
$$z_i^{(l+1)} = \big\Vert_{m=1}^{M} \, \sigma\!\left(\sum_{j \in N(i)} w_{ij}^{(m)} W^{(m)} z_j^{(l)}\right),$$
where $\Vert_{m=1}^{M}$ denotes the concatenation of the outputs of all $M$ attention heads along the feature dimension, $\sigma$ is the activation function, $N(i)$ is the set of neighbors of event node $e_i$, $w_{ij}^{(m)}$ is the edge attention weight of the node pair $(e_i, e_j)$ in the $m$-th attention head, $W^{(m)}$ is the corresponding learnable linear transformation of the $m$-th head, and $z_j^{(l)}$ is the causal representation of node $e_j$ in the $l$-th iteration.
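A compact sketch of this multi-head node update is shown below; it uses a dense per-head weighted adjacency matrix and ReLU as the activation, which are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CausalGraphUpdate(nn.Module):
    """One multi-head update z_i^(l+1) = ||_m sigma(sum_j w_ij^(m) W^(m) z_j^(l)),
    sketched with a dense weighted adjacency per head and ReLU activation."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.heads = nn.ModuleList(
            nn.Linear(dim, dim // num_heads, bias=False) for _ in range(num_heads)
        )

    def forward(self, z: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # z: (N, dim) node states; w: (num_heads, N, N) edge attention weights
        outs = [torch.relu(w[m] @ head(z)) for m, head in enumerate(self.heads)]
        return torch.cat(outs, dim=-1)      # concatenate the M head outputs

z_new = CausalGraphUpdate(dim=512, num_heads=4)(torch.randn(8, 512), torch.rand(4, 8, 8))
```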
When merging the intra-sentence and inter-sentence causal graphs, we introduce a parameter $\gamma$ to perform a weighted fusion of the updated information $z_i^{(l+1)}$ across the two edge types. Specifically, intra-sentence edges are assigned a weight of $\gamma$, while inter-sentence edges are weighted by $1 - \gamma$.
If the number of sentences exceeds a pre-set threshold, the maximum number of iterations is set to the number of sentences. For all node representations $z_i$ of the causal graph, the causal representations of the current iteration are compared with those of the previous iteration:
$$\Delta^{(l)} = \frac{1}{N} \sum_{i=1}^{N} \lVert z_i^{(l)} - z_i^{(l-1)} \rVert_2,$$
where $N$ is the total number of events and $\lVert \cdot \rVert_2$ denotes the Euclidean distance. $\Delta^{(l)}$ reflects the semantic fluctuation of the whole causal graph, and the iteration stops when $\Delta^{(l)}$ falls below a set threshold.
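A minimal convergence check corresponding to this stopping criterion could look as follows; the tolerance value is an illustrative assumption.

```python
import torch

def graph_converged(z_curr, z_prev, eps=1e-3):
    """Stop iterating when the mean L2 change of node representations falls
    below eps (a sketch; the tolerance eps is illustrative)."""
    delta = torch.norm(z_curr - z_prev, p=2, dim=-1).mean()
    return bool(delta < eps)

stop = graph_converged(torch.randn(8, 512), torch.randn(8, 512))
```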

4. Experiments

4.1. Datasets and Evaluation Metrics

This paper evaluates the two modules of the IECI framework on the EventStoryLine [36] and MAVEN-ERE [37] datasets. The EventStoryLine dataset contains 258 documents, 7275 event mentions, 1770 intra-sentence causal event pairs, and 3885 inter-sentence causal event pairs. For the first module, events are divided into six categories related to different contexts, such as law, place, person, and transportation. The first 20 topics of the dataset are used for five-fold cross-validation, while the last 2 topics are used as the development set in the second module. The MAVEN-ERE dataset consists of 4480 documents, 112,676 event mentions, and 57,992 causal relation pairs, including 10,617 CAUSE relations and 47,375 PRECONDITION relations. This paper uses the validation set of the original dataset for partitioning and experimentation. Three commonly used metrics, namely, precision (P), recall (R), and F1 score, are used to evaluate the performance of IECI on the two modules.

4.2. Parameter Settings

The experimental software environment for the IECI framework is based on Python 3.7 and PyTorch 1.13.1. The hardware environment includes an NVIDIA A800-SXM4-80GB GPU, providing a strong computational foundation for model training. To ensure the stability and reproducibility of the experiments, we set random seeds uniformly in all experiments. The specific hyper-parameter settings are as follows. In PRED, batch_size is set to 16, learning_rate is 2 × 10−5, warmup_steps is 0.1, max_steps is 10,000, keep_ratio is 1.0, weight_decay is 0.01, and seed is 42. In IECI, batch_size is set to 1, learning_rate is 2 × 10−5, num_heads is 4, epoch is 30, and seed is 209.

4.3. Experimental Results and Discussion

4.3.1. Experiments for PRED

To validate the effectiveness of PRED for event detection, comparative experiments are conducted on the EventStoryLine and MAVEN-ERE datasets. The comparison includes several strong named entity recognition methods based on pre-trained language models: BERT-NER [38], BERT-CRF-NER [39], which integrates Conditional Random Fields (CRFs), and BERT-BiLSTM-NER [40], which combines BiLSTM with BERT.
Table 1 and Table 2 report the results on the EventStoryLine and MAVEN-ERE datasets, where "Runtime (min)" refers to the time in minutes required to complete training and evaluation. The traditional BERT-NER method performs moderately in ED, with F1 scores of 85.6% and 86.1%, respectively. The BERT-CRF-NER model, which introduces Conditional Random Fields, shows a significant improvement in accuracy, reaching F1 scores of 88.2% and 89.3%, respectively. This demonstrates the advantages of CRFs in dealing with label dependencies and sequence consistency. Further improvement is observed with the BERT-BiLSTM-NER model, which incorporates a sequence modeling mechanism and achieves F1 scores of 91.7% and 92.2%, respectively, indicating that BiLSTM effectively enhances the model's ability to capture contextual information.
Compared to the above methods, the proposed PRED module achieves the best performance, with a precision of 95.9% and a recall of 95.2% on the EventStoryLine dataset and a precision of 96.5% and a recall of 96.2% on the MAVEN-ERE dataset. The F1 score improves by 3.7% and 4.2% over BERT-BiLSTM-NER on the EventStoryLine and MAVEN-ERE datasets, respectively, and by 9.8% and 10.3% over BERT-NER. These gains mainly result from the introduction of the BiLSTM-based semantic role attention mechanism and the prompt-guided strategy, which enhance the model's capacity for event detection and its adaptability to complex event expressions. In addition, PRED maintains high recall, verifying its ability to capture events in their entirety and providing more accurate and structured event inputs for subsequent causal inference.
Compared to the traditional BERT-NER family of models, PRED shows an increase in runtime. However, this extra cost results in better event recognition performance, suggesting an effective trade-off between accuracy and efficiency.

4.3.2. Experiments for IECI

Comparison Experiments. To validate the effectiveness of IECI for event causality identification, we conduct comparative experiments on the EventStoryLine and MAVEN-ERE datasets. Specifically, IECI is compared with several state-of-the-art models on EventStoryLine, including LIP [16], BERT [41], RichGCN [17], ERGO [18], PPAT [19], and HOTECI [42]. Furthermore, IECI is compared with BERT [41], ERGO [18], and PPAT [19] on MAVEN-ERE. The comparison is carried out under three settings: intra-sentence, inter-sentence, and combined intra-sentence and inter-sentence scenarios. The experimental results under these settings are presented in Table 3 and Table 4, where "Runtime (min/fold)" denotes the average time for training and evaluation during five-fold cross-validation.
In the intra-sentence setting, IECI achieves superior performance on both datasets. On the EventStoryLine dataset, IECI achieves an F1 score of 70.8%, a 1.7% improvement over the 69.1% of the previously best-performing HOTECI model. Moreover, IECI maintains a high precision of 76.5% and recall of 67.5%, demonstrating strong fine-grained semantic modeling. On the more challenging MAVEN-ERE dataset, IECI still performs well, achieving an F1 score of 64.9%. Compared with BERT, ERGO, and PPAT, the F1 score of IECI increases by 54.4%, 0.7%, and 20.2%, respectively, indicating stronger causal identification ability.
In the inter-sentence setting, IECI also shows strong inference ability. This task is more difficult, requiring the model to cross sentence boundaries, integrate contextual information, and capture latent semantic cues. On the EventStoryLine dataset, despite ranking second in precision among all models, IECI achieves an F1 score of 58.5%, the highest of all models. Compared to HOTECI's 55.1% and PPAT's 52.0%, IECI demonstrates stronger cross-sentence reasoning and context integration. On MAVEN-ERE, IECI again leads with an F1 score of 56.5%, a significant improvement over BERT and PPAT. The advantage mainly comes from the model's causal graph structure and iterative update mechanism, which dynamically adjusts the threshold based on event density. Ultimately, the updating and expansion of causal edges are effectively controlled, enhancing the generalization ability and robustness of the model in complex contexts.
In the overall causal identification scenario (Intra + Inter), IECI also achieves leading comprehensive performance. On the EventStoryLine dataset, precision, recall, and F1 score reach 59.5%, 64.0%, and 60.7%, respectively. The F1 score improves by 4.2% over HOTECI, the strongest baseline, and by 18.8% over LIP, the weakest. On MAVEN-ERE, IECI achieves a 56.2% F1 score, an improvement of 1.0% over the best baseline, ERGO, and 40.3% over the lowest-performing model, BERT. This demonstrates that IECI maintains stable and efficient performance when dealing with mixed types of event relationships.
We further report the runtime of each model to measure computational efficiency alongside performance. On the EventStoryLine dataset, traditional methods such as LIP and BERT have shorter runtimes of 100.4 min and 117.8 min, respectively, owing to their simpler structures, but they perform poorly in causal relation identification. The runtimes of RichGCN and ERGO are at a moderate level, 141.8 min and 139.5 min, respectively, while those of PPAT and HOTECI are relatively high, at 156.6 min and 167.0 min. On the MAVEN-ERE dataset, BERT, ERGO, and PPAT show similar trends. Overall, the proposed IECI trades a moderate increase in runtime for higher performance.
Case Study. We further validate the effectiveness of the IECI model in identifying causal relationships between events through a case study. As shown in Figure 3, BERT demonstrates the ability to identify causal event pairs within a single sentence but still exhibits limitations in handling implicit causal relationships, cross-sentence reasoning, and other aspects. In contrast, IECI correctly identifies causal relationships between events by constructing a global inference chain. (1) IECI can identify succession-based causal relationships. For example, the "pulling off" in the event pair (transfer, pulling off) occurs during the "transfer" process, which constitutes a causal relationship. (2) IECI can identify causal relationships requiring indirect reasoning. For example, in the event pair (pulling off, received), there is an "escape" step between "pulling off" and "received", and the model can avoid this interference and make a correct identification. (3) IECI can correctly identify causal chains involving multiple events. For example, the event pairs (transfer, pulling off) and (pulling off, received) are connected through the intermediate event "pulling off"; IECI forms the complete inference path (transfer, pulling off, received) and identifies that (transfer, received) has a causal relationship. (4) IECI can correctly identify long-distance causal relationships across sentences. For example, the events in the pair (run, received) are located far apart in different sentences, with actions such as "pulling off", "transfer", and "escape" between them, yet IECI can still capture the deep semantic connections between the events. At the same time, it correctly recognizes that there is no causal relationship in (run, transfer).
Ablation Experiments. To verify the effectiveness of the key modules in the IECI proposed in this paper, we designed three ablation experiments to remove the semantic role labeling, the dynamic threshold adjustment mechanism, and the semantic similarity judgment mechanism (SimMatch), respectively, and analyze their impacts on the performance of ECI. The ablation experiment results on two datasets are presented in Table 5 and Table 6.
The model's performance drops significantly when the SRL module is removed. Specifically, on the EventStoryLine dataset, the F1 score for intra-sentence causal recognition decreases from 70.8% to 69.2%, and the F1 score for inter-sentence causal recognition decreases from 58.5% to 54.9%. On the MAVEN-ERE dataset, the intra-sentence and inter-sentence F1 scores decrease by 2.1% and 4.2%, respectively. This indicates that semantic role information plays a crucial role in modeling the semantic structure of events and improving the accuracy of causality recognition, especially for sentences with complex structures or semantic ambiguity.
We then remove the dynamic threshold adjustment mechanism and uniformly use a fixed threshold (e.g., $\theta = 0.6$) for relation discrimination across all samples. On the EventStoryLine dataset, the overall F1 score of the model decreases from 60.7% to 58.0%, a 2.7% drop, and the intra-sentence and inter-sentence F1 scores decrease by 0.7% and 2.4%, respectively. On the MAVEN-ERE dataset, the overall F1 score also decreases by 1.9%. This indicates that the dynamic threshold adjustment mechanism makes a significant contribution to overall performance. The decline in the intra-sentence and inter-sentence scenarios is mainly due to the inability of a fixed threshold to capture low-confidence but true causal relationships in event-dense long texts.
Without using the similarity judgment mechanism, the model introduces more noisy edges in the causal graph inference complementation phase, which leads to a drop in performance for intra-sentence, inter-sentence, and overall causal identification on the EventStoryLine and MAVEN-ERE datasets. This suggests that the semantic similarity judgment mechanism helps prevent both false connections and missed connections in the inference process and plays an important role in improving the robustness of ECI.
On both datasets, the runtime analysis shows that there is a trade-off between overall model performance and runtime efficiency. The full model of IECI has a relatively high runtime while maintaining optimal recognition performance. This suggests that the added components, while increasing the computational cost, result in significant performance gains and maintain good scalability across documents of varying lengths and event densities. The increased runtime of IECI is primarily attributed to the similarity calculations performed between events.

5. Conclusions

This paper proposes the IECI, a pipelined framework for event causality identification that integrates two modules. The PRED module achieves event detection through structural semantic fusion with a prompt-based mechanism and provides high-quality semantic input for subsequent causal recognition modeling. The SRCIG module builds and iteratively optimizes document-level event causal graphs through an iterative mechanism based on event detection results, fusing semantic role labeling, adaptive thresholding, and causal inference complementation to capture implicit causal relationships across sentences. Experimental results on the EventStoryLine and MAVEN-ERE datasets validate the superior performance of both modules and demonstrate the combined strengths of the IECI framework in terms of structural modeling, semantic understanding, and interpretability. However, while the model performs well within documents, its ability to generalize to cross-document scenarios with sparse or ambiguous contextual links remains limited. Future work can further explore the joint optimization mechanism of event detection and event causality identification, as well as the generalization ability of cross-document causal inference to provide stronger support for causal inference in complex semantic scenarios.

Author Contributions

Conceptualization, Y.C.; software, H.C., Z.S., Y.Z. and H.Z.; validation, H.C., Z.S., Y.Z. and H.Z.; resources, Y.C.; data curation, H.C.; writing—original draft preparation, H.C.; writing—review and editing, H.C. and Y.C.; supervision, Y.C.; funding acquisition, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Technology R&D Program of China, grant number 2019YFC1606401, the National Natural Science Foundation of China, grant number 72301010, the Project of Cultivation for Young Top-Notch Talents of Beijing Municipal Institutions, grant number BPHR202203061, the Project of Construction and Support for High-Level Innovative Teams of Beijing Municipal Institutions, grant number BPHR20220104, and the Beijing Scholars Program, grant number 099.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and code used in this study are publicly available at https://github.com/qwe185/IECI (accessed on 17 May 2025).

Acknowledgments

We thank all the anonymous reviewers for their thoughtful comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Moreira, F.; Velez-Bedoya, J.I.; Arango-López, J. Integration of Causal Models and Deep Neural Networks for Recommendation Systems in Dynamic Environments: A Case Study in StarCraft II. Appl. Sci. 2025, 15, 4263. [Google Scholar] [CrossRef]
  2. Lu, F.; Li, Y.; Bao, Y. Deep Knowledge Tracing Integrating Temporal Causal Inference and PINN. Appl. Sci. 2025, 15, 1504. [Google Scholar] [CrossRef]
  3. Razouk, H.; Benischke, L.; Gärber, D.; Kern, R. Increasing the Accessibility of Causal Domain Knowledge via Causal Information Extraction Methods: A Case Study in the Semiconductor Manufacturing Industry. Appl. Sci. 2025, 15, 2573. [Google Scholar] [CrossRef]
  4. Zhao, K.; Ji, D.; He, F.; Liu, Y.; Ren, Y. Document-Level Event Causality Identification via Graph Inference Mechanism. Inf. Sci. 2021, 561, 115–129. [Google Scholar] [CrossRef]
  5. Chan, C.; Cheng, J.; Wang, W.; Jiang, Y.; Fang, T.; Liu, X.; Song, Y. ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations. arXiv 2024, arXiv:2304.14827. [Google Scholar]
  6. Cai, R.; Yu, S.; Zhang, J.; Chen, W.; Xu, B.; Zhang, K. Dr.ECI: Infusing Large Language Models with Causal Knowledge for Decomposed Reasoning in Event Causality Identification. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates, 19–24 January 2025; Rambow, O., Wanner, L., Apidianaki, M., Al-Khalifa, H., Eugenio, B.D., Schockaert, S., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 9346–9375. [Google Scholar]
  7. Yan, Y.; Liu, Z.; Gao, F.; Gu, J. Type Hierarchy Enhanced Event Detection without Triggers. Appl. Sci. 2023, 13, 2296. [Google Scholar] [CrossRef]
  8. Chen, J.; Chen, P.; Wu, X. Generating Chinese Event Extraction Method Based on ChatGPT and Prompt Learning. Appl. Sci. 2023, 13, 9500. [Google Scholar] [CrossRef]
  9. Zhang, S.; Ji, T.; Ji, W.; Wang, X. Zero-Shot Event Detection Based on Ordered Contrastive Learning and Prompt-Based Prediction. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, WA, USA, 10–15 July 2022; Carpuat, M., de Marneffe, M.-C., Meza Ruiz, I.V., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 2572–2580. [Google Scholar]
  10. Hashimoto, C. Weakly Supervised Multilingual Causality Extraction from Wikipedia. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 2988–2999. [Google Scholar]
  11. Liu, J.; Chen, Y.; Zhao, J. Knowledge Enhanced Event Causality Identification with Mention Masking Generalizations. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 7–15 January 2021. [Google Scholar]
  12. Zuo, X.; Cao, P.; Chen, Y.; Liu, K.; Zhao, J.; Peng, W.; Chen, Y. Improving Event Causality Identification via Self-Supervised Representation Learning on External Causal Statement. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 2–4 August 2021; Zong, C., Xia, F., Li, W., Navigli, R., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 2162–2172. [Google Scholar]
  13. Hu, Z.; Li, Z.; Jin, X.; Bai, L.; Guan, S.; Guo, J.; Cheng, X. Semantic Structure Enhanced Event Causality Identification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 10901–10913. [Google Scholar]
  14. Liu, J.; Zhang, Z.; Guo, Z.; Jin, L.; Li, X.; Wei, K.; Sun, X. KEPT: Knowledge Enhanced Prompt Tuning for Event Causality Identification. Knowl.-Based Syst. 2023, 259, 110064. [Google Scholar] [CrossRef]
  15. Wang, X.; Luo, W.; Yang, X. An Event Causality Identification Framework Using Ensemble Learning. Information 2025, 16, 32. [Google Scholar] [CrossRef]
  16. Gao, L.; Choubey, P.K.; Huang, R. Modeling Document-Level Causal Structures for Event Causal Relation Identification. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 1808–1817. [Google Scholar]
  17. Tran Phu, M.; Nguyen, T.H. Graph Convolutional Networks for Event Causality Identification with Rich Document-Level Structures. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tur, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 3480–3490. [Google Scholar]
  18. Chen, M.; Cao, Y.; Deng, K.; Li, M.; Wang, K.; Shao, J.; Zhang, Y. ERGO: Event Relational Graph Transformer for Document-Level Event Causality Identification. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; Calzolari, N., Huang, C.-R., Kim, H., Pustejovsky, J., Wanner, L., Choi, K.-S., Ryu, P.-M., Chen, H.-H., Donatelli, L., Ji, H., et al., Eds.; International Committee on Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 2118–2128. [Google Scholar]
  19. Liu, Z.; Hu, B.; Xu, Z.; Zhang, M. PPAT: Progressive Graph Pairwise Attention Network for Event Causality Identification. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, Macao, China, 19–25 August 2023. [Google Scholar]
  20. Chen, M.; Cao, Y.; Zhang, Y.; Liu, Z. CHEER: Centrality-Aware High-Order Event Reasoning Network for Document-Level Event Causality Identification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 10804–10816. [Google Scholar]
  21. Yuan, C.; Huang, H.; Cao, Y.; Wen, Y. Discriminative Reasoning with Sparse Event Representation for Document-Level Event-Event Relation Extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada, 9–14 July 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 16222–16234. [Google Scholar]
  22. Zeng, X.; Bai, Z.; Qin, K.; Luo, G. EiGC: An Event-Induced Graph with Constraints for Event Causality Identification. Electronics 2024, 13, 4608. [Google Scholar] [CrossRef]
  23. Li, X.; Nguyen, T.H.; Cao, K.; Grishman, R. Improving Event Detection with Abstract Meaning Representation. In Proceedings of the First Workshop on Computing News Storylines, Beijing, China, 26–31 July 2015; Caselli, T., van Erp, M., Minard, A.-L., Finlayson, M., Miller, B., Atserias, J., Balahur, A., Vossen, P., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2015; pp. 11–15. [Google Scholar]
  24. Liu, S.; Liu, K.; He, S.; Zhao, J. A Probabilistic Soft Logic Based Approach to Exploiting Latent and Global Information in Event Classification. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; AAAI Press: Washington, DC, USA, 2016; pp. 2993–2999. [Google Scholar]
  25. Nguyen, T.H.; Cho, K.; Grishman, R. Joint Event Extraction via Recurrent Neural Networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; Knight, K., Nenkova, A., Rambow, O., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2016; pp. 300–309. [Google Scholar]
  26. Sha, L.; Qian, F.; Chang, B.; Sui, Z. Jointly Extracting Event Triggers and Arguments by Dependency-Bridge RNN and Tensor-Based Argument Interaction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; AAAI Press: Washington, DC, USA, 2018. [Google Scholar]
  27. Orr, W.; Tadepalli, P.; Fern, X. Event Detection with Neural Networks: A Rigorous Empirical Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 999–1004. [Google Scholar]
  28. Yan, H.; Jin, X.; Meng, X.; Guo, J.; Cheng, X. Event Detection with Multi-Order Graph Convolution and Aggregated Attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 5766–5770. [Google Scholar]
  29. Huang, L.; Ji, H. Semi-Supervised New Event Type Induction and Event Detection. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; Webber, B., Cohn, T., He, Y., Liu, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 718–724. [Google Scholar]
  30. Chen, J.; Lin, H.; Han, X.; Sun, L. Honey or Poison? Solving the Trigger Curse in Few-Shot Event Detection via Causal Intervention. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; Moens, M.-F., Huang, X., Specia, L., Yih, S.W., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 8078–8088. [Google Scholar]
  31. Sheng, J.; Sun, R.; Guo, S.; Cui, S.; Cao, J.; Wang, L.; Liu, T.; Xu, H. CorED: Incorporating Type-Level and Instance-Level Correlations for Fine-Grained Event Detection. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1122–1132. [Google Scholar]
  32. Wu, G.; Lu, Z.; Zhuo, X.; Bao, X.; Wu, X. Semantic Fusion Enhanced Event Detection via Multi-Graph Attention Network with Skip Connection. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 7, 931–941. [Google Scholar] [CrossRef]
  33. Fu, R.; Wang, H.; Zhang, X.; Zhou, J.; Yan, Y. SynED: A Syntax-Based Low-Resource Event Detection Method for New Event Types. Int. J. Innov. Comput. Inf. Control 2023, 19, 47–60. [Google Scholar]
  34. Liu, B.; Rao, G.; Wang, X.; Zhang, L.; Cong, Q. DE3TC: Detecting Events with Effective Event Type Information and Context. Neural Process. Lett. 2024, 56, 89. [Google Scholar] [CrossRef]
  35. Hershowitz, B.; Hodkiewicz, M.; Bikaun, T.; Stewart, M.; Liu, W. Causal Knowledge Extraction from Long Text Maintenance Documents. Comput. Ind. 2024, 161, 104110. [Google Scholar] [CrossRef]
  36. Caselli, T.; Vossen, P. The Event StoryLine Corpus: A New Benchmark for Causal and Temporal Relation Extraction. In Proceedings of the Events and Stories in the News Workshop, Vancouver, BC, Canada, 4 August 2017; Caselli, T., Miller, B., van Erp, M., Vossen, P., Palmer, M., Hovy, E., Mitamura, T., Caswell, D., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 77–86. [Google Scholar]
  37. Wang, X.; Chen, Y.; Ding, N.; Peng, H.; Wang, Z.; Lin, Y.; Han, X.; Hou, L.; Li, J.; Liu, Z.; et al. MAVEN-ERE: A Unified Large-Scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; Goldberg, Y., Kozareva, Z., Zhang, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 926–941. [Google Scholar]
  38. Lai, P.; Ye, F.; Zhang, L.; Chen, Z.; Fu, Y.; Wu, Y.; Wang, Y. PCBERT: Parent and Child BERT for Chinese Few-Shot NER. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; Calzolari, N., Huang, C.-R., Kim, H., Pustejovsky, J., Wanner, L., Choi, K.-S., Ryu, P.-M., Chen, H.-H., Donatelli, L., Ji, H., et al., Eds.; International Committee on Computational Linguistics: New York, NY, USA, 2022; pp. 2199–2209. [Google Scholar]
  39. Hu, S.; Zhang, H.; Hu, X.; Du, J. Chinese Named Entity Recognition Based on BERT-CRF Model. In Proceedings of the 2022 IEEE/ACIS 22nd International Conference on Computer and Information Science (ICIS), Zhuhai, China, 26–28 June 2022; pp. 105–108. [Google Scholar]
  40. He, W.; Xu, Y.; Yu, Q. BERT-BiLSTM-CRF Chinese Resume Named Entity Recognition Combining Attention Mechanisms. In Proceedings of the 4th International Conference on Artificial Intelligence and Computer Engineering, Dalian, China, 17–19 November 2023; Association for Computing Machinery: New York, NY, USA, 2024; pp. 542–547. [Google Scholar]
  41. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186. [Google Scholar]
  42. Man, H.; Nguyen, C.V.; Ngo, N.T.; Ngo, L.; Dernoncourt, F.; Nguyen, T.H. Hierarchical Selection of Important Context for Generative Event Causality Identification with Optimal Transports. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20–25 May 2024; Calzolari, N., Kan, M.-Y., Hoste, V., Lenci, A., Sakti, S., Xue, N., Eds.; ELRA and ICCL: Paris, France, 2024; pp. 8122–8132. [Google Scholar]
Figure 1. PRED module diagram.
Figure 2. SRCIG module diagram.
Figure 3. A case study.
Table 1. Comparative experimental results of PRED on EventStoryLine.

Methods | Event Detection P (%) | R (%) | F1 (%) | Runtime (min)
BERT-NER [38] | 86.4 | 84.9 | 85.6 | 31.5
BERT-CRF-NER [39] | 89.1 | 87.7 | 88.2 | 38.6
BERT-BiLSTM-NER [40] | 90.1 | 89.9 | 91.7 | 50.2
PRED (Ours) | 95.9 | 95.2 | 95.4 | 73.1
Table 2. Comparative experimental results of PRED on MAVEN-ERE.

Methods | Event Detection P (%) | R (%) | F1 (%) | Runtime (min)
BERT-NER [38] | 86.1 | 83.0 | 86.1 | 24.2
BERT-CRF-NER [39] | 89.7 | 86.1 | 89.3 | 30.0
BERT-BiLSTM-NER [40] | 91.4 | 89.5 | 92.2 | 36.3
PRED (Ours) | 96.5 | 96.2 | 96.4 | 56.0
Table 3. Comparative experimental results of IECI on EventStoryLine.

Methods | Intra-Sentence P/R/F1 (%) | Inter-Sentence P/R/F1 (%) | Intra + Inter P/R/F1 (%) | Runtime (min/fold)
LIP [28] | 38.8/52.4/44.6 | 35.1/48.2/40.6 | 36.2/49.5/41.9 | 100.4
BERT [41] | 47.8/57.2/52.1 | 36.8/29.2/32.6 | 41.3/38.3/39.7 | 117.8
RichGCN [29] | 49.2/63.0/55.2 | 39.2/45.7/42.2 | 42.6/51.3/46.6 | 141.8
ERGO [30] | 49.7/72.6/59.0 | 43.2/48.8/45.8 | 46.3/50.1/48.1 | 139.5
PPAT [31] | 62.1/68.8/65.3 | 54.0/50.2/52.0 | 56.8/56.0/56.4 | 156.6
HOTECI [42] | 66.1/72.3/69.1 | 81.4/40.6/55.1 | 63.1/51.2/56.5 | 167.0
IECI (Ours) | 76.5/67.5/70.8 | 54.8/65.4/58.5 | 59.5/64.0/60.7 | 175.8
Table 4. Comparative experimental results of IECI on MAVEN-ERE.

Methods | Intra-Sentence P/R/F1 (%) | Inter-Sentence P/R/F1 (%) | Intra + Inter P/R/F1 (%) | Runtime (min/fold)
BERT [41] | 43.7/5.9/10.5 | 29.4/12.3/17.4 | 30.8/10.7/15.9 | 217.5
ERGO [30] | 63.1/65.3/64.2 | 48.7/62.0/54.6 | 49.6/62.3/55.2 | 256.1
PPAT [31] | 37.9/66.7/47.7 | 28.2/40.8/33.6 | 31.3/45.1/37.0 | 289.7
IECI (Ours) | 71.7/50.2/64.9 | 63.7/48.3/56.5 | 66.0/48.1/56.2 | 326.6
Table 5. Ablation experiment results of IECI on EventStoryLine.

Methods | Intra-Sentence P/R/F1 (%) | Inter-Sentence P/R/F1 (%) | Intra + Inter P/R/F1 (%) | Runtime (min/fold)
IECI (Ours) | 76.5/67.5/70.8 | 54.8/65.4/58.5 | 59.5/64.0/60.7 | 175.8
w/o SRL | 74.7/62.3/69.2 | 49.7/62.1/54.9 | 55.6/60.2/57.1 | 170.6
w/o Dynamic Threshold | 75.4/65.6/70.1 | 53.5/62.9/56.1 | 57.3/61.4/58.0 | 171.2
w/o SimFilter | 74.4/63.7/68.6 | 50.1/64.5/56.7 | 56.3/60.5/57.9 | 167.8
Table 6. Ablation experiment results of IECI on MAVEN-ERE.

Methods | Intra-Sentence P/R/F1 (%) | Inter-Sentence P/R/F1 (%) | Intra + Inter P/R/F1 (%) | Runtime (min/fold)
IECI (Ours) | 71.7/50.2/64.9 | 63.7/48.3/56.5 | 66.0/48.1/56.2 | 326.6
w/o SRL | 69.5/44.9/62.8 | 57.4/43.8/52.3 | 61.8/42.9/53.4 | 310.7
w/o Dynamic Threshold | 70.2/48.7/63.4 | 62.4/45.1/53.6 | 63.9/46.3/54.3 | 319.5
w/o SimFilter | 68.8/47.1/63.6 | 59.1/47.2/55.1 | 61.9/43.8/54.5 | 304.3
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
