Article

A Joint Extraction Model of Multiple Chinese Emergency Event–Event Relations Based on Weighted Double Consistency Constraint Learning

1 School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
2 School of Computer Science and Technology, Shandong Technology and Business University, Yantai 264005, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(11), 1910; https://doi.org/10.3390/sym17111910
Submission received: 29 September 2025 / Revised: 30 October 2025 / Accepted: 4 November 2025 / Published: 7 November 2025
(This article belongs to the Section Computer)

Abstract

Event–event relation extraction (ERE) is an important and challenging task in natural language processing. State-of-the-art ERE methods mainly adopt supervised learning, especially deep learning, which requires a large amount of high-quality labeled event corpora. However, these methods face a few-shot learning challenge when extracting multiple Chinese event–event relations: complex deep learning models often cannot converge effectively on small Chinese event corpora, and manual event relation labeling is time-consuming and uncertain. This paper proposes a joint extraction model for multiple Chinese event–event relations based on weighted double consistency constraint learning, named the Chinese event–event relations miner (CERMiner), to jointly extract multiple types of Chinese emergency event–event relations. After encoding event pairs from their contexts, a group of weighted double consistency constraints, including common sense constraints and domain constraints, is designed and integrated into model learning to accelerate model convergence on few-shot corpora. To evaluate the effectiveness of CERMiner, we conduct experiments on the CEC dataset, which contains three relation types (CE, EC, and AC) with 697, 200, and 242 instances, respectively, and report Precision, Recall, and F1-score as evaluation metrics. Our method achieves 84.8% Precision, 72.7% Recall, and 78.2% F1-score, outperforming the SGT baseline by 1.7% in F1-score. These results demonstrate that the proposed model better realizes joint extraction of multiple Chinese emergency event–event relations in low-resource environments than existing state-of-the-art methods.

1. Introduction

Over the past few years, a vast amount of emergency information [1], especially online news, has been generated on various Internet and social media platforms. This information pertains to both natural emergencies, such as fires and earthquakes, and man-made emergencies, such as traffic accidents, epidemics, and terrorist attacks, which often prompt intense public discussion. For example, for the keyword “Myanmar 7.9 magnitude earthquake”, over 200 related reports appeared on Tencent News within a week after the earthquake, and on Weibo the number of discussion threads exceeded 10,000.
Timely mining of emergency information is crucial for public safety, and event–event relation extraction (ERE) is a core component of this process [2]. Emergency ERE involves various types of event–event relations, including causal relations [3], temporal relations [4], coreference relations [5], and hierarchical relations [6]. These relations provide critical information for tasks such as event tracing and detection. Emergency ERE is also a challenging task. Compared with traditional ERE data sources such as scientific literature and financial reports, emergency news and public opinion exhibit non-standardized expression, cross-source information entanglement, and blurred domain boundaries due to their timeliness requirements [7]. Existing studies on emergency ERE have typically employed sophisticated heterogeneous neural networks, such as RoBERTa-MCT (a RoBERTa-based temporal relation extraction model) [8] and the Multi-Graph Attention Network [9], to model document-level contextual features for improving relation recognition or classification. These approaches depend on a large amount of high-quality labeled event corpora. Large-scale open English event corpora, such as Richer Event Descriptions, CausalBank, and the Penn Discourse TreeBank (PDTB), provide a solid foundation. For example, the PDTB-2.0 dataset [10] contains four types of event–event relations, including 18,459 instances of the “Explicit” relation, 16,224 instances of the “Implicit” relation, 624 instances of the “AltLex” relation, and 5210 instances of the “EntRel” relation, totaling 40,600 annotated instances. The dataset employs discourse relation annotation, semantic role annotation, and attribution annotation to enrich corpus features at the syntactic, semantic, and discourse structural levels.
However, ERE on corpora in other languages, such as Chinese, faces a major challenge: it is difficult to obtain large-scale, high-quality labeled corpora for such tasks. For example, the numbers of the top three types of event–event relations in the open Chinese Emergency Corpus (CEC), namely “Cause-Effect”, “Effect-Cause”, and “Accompany”, are only 697, 200, and 242, respectively. The complex neural network models of existing ERE methods often cannot converge effectively on such a small corpus. Moreover, the structures of events and event–event relations are far more complex than those of entities and entity relations. Manual annotation of large-scale Chinese ERE corpora is not only prohibitively time-consuming but also highly susceptible to subjective bias, rendering it impractical for real-world deployment. Event relation types often exhibit high semantic similarity: the same relation can be expressed through diverse linguistic patterns, while different but similar relations may share nearly identical syntactic forms. This ambiguity in semantic boundaries makes it difficult for models to capture fine-grained distinctions, leading to poor generalization and an increased risk of overfitting in low-data regimes. Few-shot learning also faces the challenge of data imbalance, where significant disparities in the number of instances across relation types result in biased model performance and uneven predictive accuracy across categories. Therefore, few-shot learning has become an urgent research demand in non-English ERE, particularly in dynamic and high-stakes domains such as emergency management.
Based on the above observations, this work investigates how to effectively extract multiple event–event relations in the Chinese emergency domain under few-shot settings by integrating domain constraints and common sense constraints into a weighted double consistency constraint learning framework, with the aims of improving generalization, accelerating convergence, and reducing label noise. To this end, we propose CERMiner, a joint extraction model for Chinese emergency event–event relations based on this framework. The model incorporates prior knowledge through two types of consistency constraints to effectively address the few-shot learning challenges in Chinese emergency ERE. The main contributions are as follows:
  • Firstly, this paper proposes a joint extraction model for multiple Chinese event–event relations based on heterogeneous deep neural networks. By concatenating contextualized embeddings with POS embeddings and encoding them via Bi-LSTM, we aim to capture richer syntactic and semantic features of event pairs; we expect this design to yield more accurate relation predictions, especially when labeled data is limited.
  • Secondly, this paper proposes a weighted double consistency constraint learning framework to integrate prior knowledge into neural network models. Both common sense and domain consistency constraints are defined and transformed into differentiable learning objectives with dynamic weights. We hypothesize that this regularization mechanism will guide the model toward logically consistent predictions, thereby improving its generalization capability in few-shot settings.
  • Finally, a group of experiments was performed on the open CEC corpus. The experimental results show that the proposed CERMiner achieves significant improvement in the joint extraction of multiple Chinese emergency event–event relations over existing state-of-the-art models under low-resource conditions.
The rest of this paper is organized as follows. Section 2 reviews related work on event–event relation extraction and few-shot learning for information extraction. Section 3 presents the proposed framework and dataset, focusing on the feature extraction module and the integration of the two types of logical constraints; the CEC dataset is used to validate the effectiveness of the approach. Section 4 describes the experimental setup. Section 5 reports the results and provides in-depth analysis. Section 6 presents an ablation study. Finally, Section 7 concludes the paper and discusses future work.

2. Related Work

2.1. Event–Event Relation Extraction

The purpose of ERE is to extract diverse semantic relations between different events. In early research, pattern matching methods were mainly used to extract event–event relations. Khoo et al. performed pattern matching on newspaper texts to extract causal knowledge through a set of language patterns that usually indicate the existence of causal relations [11]. Nichols et al. improved the convex closure algorithm for pattern matching in partial-order event data to better identify relations between complex event sets [12].
However, pattern matching often leads to serious semantic drift, which affects the accuracy of extraction [13]. In order to overcome these shortcomings, researchers began to use machine learning methods to develop ERE models. Chang et al. used probabilistic and naive Bayes classifiers to extract event–event relations based on noun phrases or sentence representations of event–event relations [14]. Girju et al. integrated core features, contextual features, and special features extracted from various knowledge sources for identifying semantic relations between events [15]. These event extraction models based on machine learning are essentially feature-based classifiers, which rely on explicit and implicit features of events and their relations. Due to the advantages of deep neural networks in text feature learning, the event extraction models based on deep learning have become a major focus of current research. Liu et al. adopted a convolutional neural network (CNN) to learn lexical features and integrated semantic knowledge into the neural network for ERE [16]. Xu et al. used a multilayer bidirectional long short-term memory (Bi-LSTM) and the attention mechanism to learn hidden semantic context representations, combining these representations with edge attributes for event temporal relation extraction [17]. Li et al. added contrastive learning to the pre-trained language model BERT for long-tail distantly supervised relation extraction in text [18].
In addition to extracting a specific type of event–event relation, researchers have begun to focus on the joint extraction of multiple types of event–event relations. Man et al. proposed the SCS-EERE model for the joint extraction of different types of sub-event relations [19]. To better model document-level context with important context sentences, the model adopts RoBERTa and a long short-term memory network to identify the most important context sentences for a given event mention pair in a document. El-Allaly et al. proposed an attentive joint model to extract complex relations between adverse drug events, developing a transformer-based weighted graph convolutional network (GCN) to effectively combine the advantages of contextual and syntactic information [20]. Huang et al. proposed LogicERE to extract multiple types of temporal and sub-event relations [21]. They constructed a high-order reasoning network on a logic-constraint-induced graph to capture diversified high-order interactions within events and event pairs. However, their rules are relatively simplistic, as they only consider the inherent properties of event relation types in a unidirectional manner. Moreover, the construction of the graph structure depends heavily on high-quality datasets, which may lead to suboptimal performance in few-shot scenarios. Chen et al. proposed CHEER, a joint event causality identification model for document-level reasoning. It constructs an Event Interaction Graph (EIG) to unify event and event pair representations, enabling joint inference over event triggers and their causal relations. By incorporating event centrality and high-order logic, CHEER performs effective end-to-end reasoning, capturing complex dependencies across events [22]. Nevertheless, the approach suffers from high computational overhead due to large graph construction and does not account for conflicts among logical constraints. Most of these models employ complex neural networks, such as transformer-based weighted GCNs, to capture diverse linguistic features; consequently, they often rely on large-scale corpora. For example, Man et al.’s experiments were performed on HiEve, with 29,956 pairs of event mentions [19].
Chinese ERE has also received research attention. Li et al. proposed a discourse-level global reasoning model to identify the temporal relations between Chinese events at the document level [23]. The model provides various discourse-level constraints, derived from event semantics, to enhance its performance. Zhu et al. proposed a BiLSTM-Attention model to extract emergency relations from text records of urban rail transit operation emergencies [24]. The model combines Chinese word vectors with Chinese character vectors to capture textual information at different levels, and feeds the combined information into a bidirectional long short-term memory network with an attention mechanism to simultaneously extract the overall and local features of the text. Wan et al. proposed a multi-type Chinese event–event relation extraction framework in the finance domain [25]. By constructing a syntactic–semantic dependency graph and an augmented BERT embedding layer, richer sentence semantics are captured to improve ERE. Similar to English ERE, existing studies on Chinese ERE also focus on complex deep learning networks and rely on large labeled datasets. For example, Li et al.’s study used the ACE2005 Chinese corpus, which contains 21,132 annotated event temporal relations [23]. Due to the complexity of events and event–event relations, manually annotating large-scale event–event relation corpora is extremely time-consuming. Moreover, emergency texts often exhibit low structural regularity, and the diversity of event–event relation types further complicates the annotation process. Large-scale annotation efforts struggle to ensure data quality and are difficult to implement in practical applications. Therefore, developing models with strong few-shot learning capabilities has become an urgent research demand in Chinese ERE.

2.2. Few-Shot Learning on Information Extraction

Few-shot learning, which trains effective models from small amounts of labeled data, is a current research hotspot and has been widely studied in medical imaging [26], remote sensing imaging [27], natural language processing [28], and other areas. Information extraction is also an important application field of few-shot learning. Current few-shot learning for information extraction mainly focuses on relatively simple tasks, such as entity recognition and relation extraction. Related methods are usually divided into three categories: data augmentation, transfer learning, and knowledge enhancement.
Data augmentation is one of the earliest methods adopted for few-shot learning; it directly generates a large target dataset from external data and knowledge sources. For example, Hou et al. took distantly supervised relation extraction as the starting point to overcome the limitations of manual data annotation. Their model incorporates a bag-level attention mechanism that labels sentences in an external knowledge base using a small-scale knowledge graph (KG), and addresses the noise introduced by distant supervision through a selective attention mechanism [29]. Since data augmentation methods tend to introduce noise, transfer learning-based methods have gained more attention. These methods leverage pre-trained knowledge from large-scale source tasks to rapidly adapt to target tasks with limited labeled data. For example, Yang et al. introduced a prompt-based fine-tuning task to better capture the semantics of relation labels under low-resource conditions [30]. Chen proposed an improved multi-source domain neural network transfer learning architecture for the biomedical trigger detection task, which shares knowledge between the multi-source and target domains more comprehensively [31]. These methods depend on large-scale pre-trained models or similar datasets for source task learning. In contrast, knowledge enhancement methods utilize external knowledge bases to enrich data representations and improve model learning on small datasets. For example, Zhang et al. proposed a biomedical information extraction method that constructs sentence-level knowledge graphs from the Comparative Toxicogenomics Database and uses them to enrich AMR graphs, improving the model’s understanding of complex scientific concepts [32]. Yuan et al. proposed a knowledge-enhanced cross-modal prompt model that uses dynamic prompts and GPT-3.5-turbo for knowledge acquisition to jointly extract entities and relations from text–image pairs in social media posts [33]. Li et al. proposed a deep knowledge fusion approach for biomedical event causal relation extraction, integrating structural event representations and knowledge-enhanced entity paths via attention, and augmenting imbalanced data using RoBERTa-based text generation [34].
These few-shot learning methods have achieved significant results in various information extraction tasks on small datasets, but some shortcomings remain. Data augmentation methods often introduce noisy data. Transfer learning-based methods require high-quality, large-scale pre-trained models or source datasets. Knowledge enhancement methods rely on a well-defined knowledge base closely related to the target task.
Compared with named entity recognition and entity relation extraction, ERE is a relatively complex and challenging task. To our knowledge, studies of few-shot learning for ERE tasks, especially Chinese ERE tasks, are still lacking. Although Chinese ERE has received research attention, relevant corpus resources, including pre-trained models, similar source datasets, and domain knowledge bases, remain insufficient. Therefore, existing few-shot learning methods for information extraction are difficult to apply directly to Chinese ERE.
Recently, constrained learning has emerged as a novel knowledge enhancement approach for few-shot learning [35]. Hoernle et al. proposed MultiplexNet, which represents domain knowledge as a quantifier-free logical formula in disjunctive normal form (DNF) to incorporate expert knowledge into the training of deep neural networks; experimental results on image labeling showed that MultiplexNet improves data efficiency, reducing the data burden of training [36]. Daniele et al. developed a neural-symbolic architecture that injects prior logical knowledge into a neural network by adding on top of it a residual layer that modifies the initial predictions according to the knowledge; experimental results on citation network classification showed that adding the knowledge has the same effect as doubling the training data [37]. Huang et al. incorporated logic constraint information via a joint logic learning module for logic-guided ERE [21].
These studies have demonstrated the effectiveness of constrained learning for few-shot learning across multiple tasks. It requires no additional corpus resources, such as pre-trained models, source datasets, or formal domain knowledge bases. However, applying constrained learning to few-shot Chinese ERE remains challenging. Although Wang et al. adopted constrained learning for English ERE [38], their study used trigger words to represent events, which is more similar to entity relation extraction; moreover, the method has not been tested on Chinese domain data, and it is difficult to achieve effective event relation extraction in few-shot scenarios. Furthermore, because the description of events in emergency information is highly rigorous, a series of domain rules can be applied to Chinese ERE. Therefore, this paper develops a weighted double consistency constraint learning framework for few-shot learning on Chinese ERE. Both common sense and domain consistency constraints are defined and transformed into differentiable learning objectives, and an adaptive weight learning mechanism is introduced to dynamically adjust the influence of each logical rule. This framework integrates domain-specific knowledge of events with prior relational constraints, offering an innovative approach to event–event relation extraction in the emergency domain.

3. Methods

3.1. Dataset

The CEC corpus is a Chinese emergency event corpus constructed by the Shanghai University Semantic Intelligence Laboratory [39]. According to the classification system of the Overall Emergency Response Plan, emergency events in the CEC corpus are divided into five categories: earthquakes, fires, traffic accidents, food poisoning, and terrorist attacks. Each event contains five elements, namely Denoter, Time, Location, Participant, and Object. There are seven types of event triggers: “emergency”, “movement”, “stateChange”, “statement”, “perception”, “action”, and “operation”. There are seven types of event–event relations: “Cause-Effect”, “Effect-Cause”, “Accompany”, “Follow”, “Thought content”, “Composite”, and “Concurrency”. The three most frequent relation types are “Cause-Effect (CE)”, “Effect-Cause (EC)”, and “Accompany (AC)”, with 697, 200, and 242 instances, respectively. Therefore, this paper chose these three relations for joint extraction.
Figure 1 gives an example in the CEC corpus. Red words are trigger words, and green or yellow words are arguments. This example contains three “Terrorist attack” events and two “CE” event relations. By using the “trigger + parameter” structure [40], these three events can be represented as follows:
E1: Terrorist attack{Trigger: 发生(rocked), <Argument1: 连环爆炸(a series of explosions), Role: Emergency>, <Argument2: 沙姆沙伊赫(Sharm el-Sheikh), Role: Location>}
E2: Terrorist attack{Trigger: 死亡(killed), <Argument: 至少90人(At least 90 people), Role: Object>}
E3: Terrorist attack {Trigger: 受伤(injured), <Argument: 240多人(more than 240 people), Role: Object>}
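To make this structure concrete, the following minimal sketch shows how an event such as E1 could be represented programmatically; the dictionary field names are illustrative assumptions, not the CEC annotation schema.

```python
# Hypothetical rendering of event E1 in the "trigger + parameter" structure [40];
# field names are illustrative only.
E1 = {
    "type": "Terrorist attack",
    "trigger": "发生",  # "rocked"
    "arguments": [
        {"text": "连环爆炸", "role": "Emergency"},   # "a series of explosions"
        {"text": "沙姆沙伊赫", "role": "Location"},  # "Sharm el-Sheikh"
    ],
}
```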
Figure 1. An example of event–event relations in the CEC Corpus.

3.2. The Overall Framework

In this paper, we propose CERMiner for jointly extracting multi-type relations between Chinese emergency events. As shown in Figure 2, the overall framework consists of four layers: input vectorization, feature extraction, domain knowledge fusion, and relation extraction. The model adopts a hybrid RoBERTa + Bi-LSTM architecture, combining strong semantic representation with sequential modeling capabilities, making it well suited for fine-grained token-level prediction tasks. This architecture has been widely validated in information extraction. To address the challenges of few-shot learning, constraint learning is adopted: both common sense and domain consistency constraints are defined and implemented in the relation extraction layer to accelerate the convergence of the model on few-shot Chinese event corpora. Figure 3 presents the data flow diagram, illustrating the overall data processing pipeline.

3.2.1. Input Vectorization Layer

The input vectorization layer transforms a document into a sequence of input vectors. This paper adopts contextualized embeddings and POS embeddings to construct input vectors. For a given document, it can be represented as a sequence of tokens:
$$D = \{ w_1^P, w_2^P, \ldots, \langle P \rangle, \ldots, w_n^P, w_1^H, w_2^H, \ldots, \langle H \rangle, \ldots, w_m^H \}$$

where the tokens $\langle P \rangle$ and $\langle H \rangle$ belong to the marked event trigger set $E_D = \{ \langle P \rangle, \langle H \rangle \}$ and the rest of the tokens belong to the word set $W_D = \{ w_1^P, w_2^P, \ldots, w_n^P, w_1^H, \ldots, w_m^H \}$.

Contextualized embeddings of tokens are produced using the pre-trained language model RoBERTa, which demonstrates strong semantic representation capabilities in natural language processing tasks, effectively captures long-range contextual dependencies, understands complex semantic relationships, and achieves good performance in event relation extraction from long documents [41]. Feeding $D$ into RoBERTa produces a sequence of contextualized embeddings $CE_D = \{ CE_{w_1^P}, \ldots, CE_{\langle P \rangle}, \ldots, CE_{w_n^P}, CE_{w_1^H}, \ldots, CE_{\langle H \rangle}, \ldots, CE_{w_m^H} \}$. POS embeddings are produced using one-hot vectors of POS tags; for the given document $D$, the POS embedding sequence is $PE_D = \{ PE_{w_1^P}, \ldots, PE_{\langle P \rangle}, \ldots, PE_{w_n^P}, PE_{w_1^H}, \ldots, PE_{\langle H \rangle}, \ldots, PE_{w_m^H} \}$. Concatenating the contextualized embeddings with the POS embeddings yields the input vectors for the feature extraction layer:

$$IV_D = \{ IV_{w_1^P}, IV_{w_2^P}, \ldots, IV_{\langle P \rangle}, \ldots, IV_{w_n^P}, IV_{w_1^H}, \ldots, IV_{\langle H \rangle}, \ldots, IV_{w_m^H} \}$$

where $IV_i = [ CE_i, PE_i ]$.
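To illustrate this layer, the following minimal PyTorch sketch concatenates frozen RoBERTa embeddings with one-hot POS embeddings. The checkpoint name follows Section 4.3; the POS inventory size, the `vectorize` helper, and its token-aligned `pos_tag_ids` argument are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
roberta = AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext")
roberta.eval()  # all transformer layers are frozen in our setup (Section 4.3)

NUM_POS_TAGS = 30  # hypothetical size of the POS tag inventory

def vectorize(document, pos_tag_ids):
    """Build IV_i = [CE_i, PE_i] for each token of the document.

    pos_tag_ids must be aligned with the subword tokenization,
    one tag id per token produced by the tokenizer.
    """
    enc = tokenizer(document, return_tensors="pt", truncation=True)
    with torch.no_grad():
        ce = roberta(**enc).last_hidden_state.squeeze(0)   # (seq_len, 768)
    pe = torch.nn.functional.one_hot(
        torch.tensor(pos_tag_ids), num_classes=NUM_POS_TAGS
    ).float()                                              # (seq_len, NUM_POS_TAGS)
    return torch.cat([ce, pe], dim=-1)                     # (seq_len, 768 + NUM_POS_TAGS)
```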

3.2.2. Feature Extraction Layer

The feature extraction layer encodes contextual features of event pairs based on the input vector. Bi-LSTM is capable of effectively capturing bidirectional semantic dependencies and has proven to be powerful in sequence prediction tasks. Therefore, this paper employs Bi-LSTM, combined with a series of embedding operations, to perform feature extraction.
Firstly, the input vector sequence $IV_D$ is fed into the Bi-LSTM. The last Bi-LSTM layer is stacked on top of each event trigger, and its hidden state is treated as the embedding representation of the event. For any event $P$ in $D$, its embedding can be expressed as follows:

$$h_P = [ \overrightarrow{LSTM}(IV_S), \overleftarrow{LSTM}(IV_S) ]$$

where $IV_S = \{ IV_{w_1^P}, IV_{w_2^P}, \ldots, IV_{\langle P \rangle}, \ldots, IV_{w_n^P} \}$ denotes the input vectors of the event mention of $P$, $\overrightarrow{LSTM}(IV_S)$ denotes the output of the forward hidden layer, and $\overleftarrow{LSTM}(IV_S)$ denotes the output of the backward hidden layer.
Secondly, the element-wise Hadamard product and subtraction are performed on the embeddings of each event pair to obtain its contextual features, comprehensively modeling embedding interactions. For any event pair $(P, H)$, its contextual feature can be expressed as follows:

$$h_{(P,H)} = [ h_P, h_H, (h_P - h_H), h_P \circ h_H ]$$

where $\circ$ denotes the Hadamard (element-wise) product.
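A minimal PyTorch sketch of this layer, assuming a single-layer Bi-LSTM with hidden size 256 (the configuration stated in Section 4.3); the trigger-position arguments are illustrative.

```python
import torch
import torch.nn as nn

class PairFeatureExtractor(nn.Module):
    """Bi-LSTM encoding followed by the event-pair interaction features above."""

    def __init__(self, input_dim, hidden=256):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden, num_layers=1,
                              batch_first=True, bidirectional=True)

    def forward(self, iv, p_idx, h_idx):
        out, _ = self.bilstm(iv)          # (batch, seq_len, 2 * hidden)
        h_p = out[:, p_idx]               # hidden state at the trigger of event P
        h_h = out[:, h_idx]               # hidden state at the trigger of event H
        # h_(P,H) = [h_P, h_H, h_P - h_H, h_P ∘ h_H]
        return torch.cat([h_p, h_h, h_p - h_h, h_p * h_h], dim=-1)
```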

3.2.3. Relation Extraction Layer

The relation extraction layer predicts event–event relations based on the joint features of event pairs. This paper feeds these joint features into an MLP decoder and trains the decoder to estimate the probability of the relations between event pairs:

$$H_e = \mathrm{ReLU}(\hat{h}_{(P,H)} W_h + b_h)$$

$$O_e = \mathrm{Softmax}(H_e W_y + b_y)$$

where $H_e$ denotes the hidden layer of the MLP decoder, $W_h$ and $b_h$ denote the weight and bias of the hidden layer, $O_e$ denotes the output layer of the MLP decoder, and $W_y$ and $b_y$ denote the weight and bias of the output layer.
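A sketch of the decoder under the assumption of four output labels (CE, EC, AC, and NR); the dimension arguments are hypothetical.

```python
import torch.nn as nn
import torch.nn.functional as F

class RelationDecoder(nn.Module):
    """Two-layer MLP: a ReLU hidden layer and a softmax output, as defined above."""

    def __init__(self, feat_dim, hidden_dim, num_relations=4):
        super().__init__()
        self.hidden = nn.Linear(feat_dim, hidden_dim)    # W_h, b_h
        self.out = nn.Linear(hidden_dim, num_relations)  # W_y, b_y

    def forward(self, pair_feat):
        h_e = F.relu(self.hidden(pair_feat))             # H_e
        return F.softmax(self.out(h_e), dim=-1)          # O_e
```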
There are various logic constraints among events. For example, if there is a “CE” relation between an event pair $(P, H)$, there is an “EC” relation between the event pair $(H, P)$. Obviously, these constraints are useful for relation prediction. This paper develops a weighted double consistency constraint learning framework to integrate these logic constraints into the learning objective of the decoder. The details are introduced in the following section.

3.3. Weighted Double Consistency Constraints

This paper specifies two types of logic constraints among events: common sense constraints and domain constraints. The domain constraint rules are manually defined based on expert assessment, while the common sense constraints are defined with reference to the work of Wang et al. [38]. We define these constraints as declarative logic rules and transform them into differentiable loss functions. When the model’s predictions violate the constraint rules, the loss value increases, which guides the model to learn in the correct direction.

3.3.1. Common Sense Constraints

Common sense constraints are domain-independent logic constraints on event–event relation prediction. Following Wang et al. [38], this paper defines three common sense constraints: annotation consistency, symmetry consistency, and conjunction consistency.
Annotation consistency. If an event pair ( P , H ) is annotated with α , the model should predict this relation α . This logical formula can be represented as follows:
$$\forall P, H \in E:\ \alpha(P, H) \leftrightarrow \top$$

where $E = E_1 \cup E_2 \cup \cdots \cup E_m$ is the event set of the corpora, $m$ is the number of documents, and $\top$ is the boolean true.
Symmetry consistency. If the model predicts a relation $\alpha(P, H)$ to hold between $(P, H)$, the converse relation $\alpha^{-1}(H, P)$ holds. For example, if the model predicts the “CE” relation between $(P, H)$, the converse relation “EC” holds between $(H, P)$. This logical formula can be represented as follows:

$$\forall P, H \in E,\ \forall \alpha \in R_s:\ \alpha(P, H) \leftrightarrow \alpha^{-1}(H, P)$$

where $R_s$ is the set of relations enforcing the symmetry constraint.
Conjunction consistency. This constraint has two rules. The first rule states that if the model predicts two relations $\alpha(P, H)$ and $\beta(H, M)$, the relation $\gamma(P, M)$ can be inferred. For example, the “AC” relation is a transitive relation; if the model predicts the two relations $AC(P, H)$ and $NR(H, M)$, then $NR(P, M)$ holds. This logical formula can be represented as follows:

$$\forall P, H, M \in E,\ \forall \alpha, \beta \in R,\ \forall \gamma \in Dec(\alpha, \beta):\ \alpha(P, H) \wedge \beta(H, M) \rightarrow \gamma(P, M)$$

where $R$ is the set of all relations, and $Dec(\alpha, \beta)$ is the set of all relations in $R$ that do not conflict with $\alpha$ and $\beta$.

The second rule states that if the model predicts two relations $\alpha(P, H)$ and $\beta(H, M)$, the negation of a relation $\delta(P, M)$ can be inferred. For example, $NR(P, H)$ denotes that no relation holds between $P$ and $H$; if the model predicts the two relations $NR(P, H)$ and $CE(H, M)$, then $\neg EC(P, M)$ holds. This logical formula can be represented as follows:

$$\forall P, H, M \in E,\ \forall \alpha, \beta \in R,\ \forall \delta \notin Dec(\alpha, \beta):\ \alpha(P, H) \wedge \beta(H, M) \rightarrow \neg\delta(P, M)$$
Table 1 describes all conjunctive rules for relations in the CEC corpus. Given the relations α ( P , H ) in the first column and the relations β ( H , M ) in the second column, the relations γ ( P , M ) in the third column can be deduced from their conjunction.
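As an illustration of how Table 1 can be consulted during training or inference, the sketch below encodes only the two conjunctions spelled out in the text (AC(P,H) ∧ NR(H,M) → NR(P,M), and NR(P,H) ∧ CE(H,M) → ¬EC(P,M)); the remaining table entries would be added in the same way.

```python
# Conjunction rules keyed by (alpha(P,H), beta(H,M)); only the two examples
# from the text are encoded here -- the full Table 1 has one entry per pair.
CONJUNCTION_RULES = {
    ("AC", "NR"): {"infer": {"NR"}, "negate": set()},
    ("NR", "CE"): {"infer": set(), "negate": {"EC"}},
}

def violates_conjunction(r_ph, r_hm, r_pm):
    """Return True when the predicted triple breaks a conjunction rule."""
    rule = CONJUNCTION_RULES.get((r_ph, r_hm))
    if rule is None:
        return False                      # no rule defined for this pair
    if r_pm in rule["negate"]:
        return True                       # a forbidden relation was predicted
    if rule["infer"] and r_pm not in rule["infer"]:
        return True                       # the inferable relation was missed
    return False
```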

3.3.2. Domain Constraints

  • Type inferring consistency
In specific domains, the possible relations between an event pair can sometimes be inferred from the types of the events in the pair. For example, for a given event pair $(P, H)$ in the CEC corpus, if $P$ is a “statement” event and $H$ is an “emergency” event, the model should predict that the relation between them is “CE” or “NR”. This logical formula can be represented as follows:

$$\forall P, H \in E:\ \bigvee_{\lambda \in Inf(P, H)} \lambda(P, H)$$

where $Inf(P, H)$ is the set of relations that satisfy the inferring consistency.
  • Type excluding consistency
In specific domains, specific event–event relations can sometimes be excluded based on the types of the events in the pair. For example, for a given event pair $(P, H)$ in the CEC corpus, if $P$ is an “action” event and $H$ is an “operation” event, the model should predict that the relation between them is not “EC”. This logical formula can be represented as follows:

$$\forall P, H \in E,\ \forall \lambda \in Outf(P, H):\ \neg\lambda(P, H)$$

where $Outf(P, H)$ is the set of relations that satisfy the excluding consistency.
This paper defines both type inferring consistency and type excluding consistency as domain constraints. Table 2 presents the induction table for these two types of constraints in the CEC corpus. Given the event P in the left-most column and the event H in the top row, each entry in the table is the inferred relation set of ( P , H ) , which includes all the relations that can be inferred between P and H.
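The induction table can be encoded as a simple lookup from event-type pairs to candidate relation sets. The sketch below contains only the two examples discussed above; the full Table 2 would populate both dictionaries for all trigger-type pairs in the CEC corpus.

```python
ALL_RELATIONS = {"CE", "EC", "AC", "NR"}

# Inf(P, H): relations the model may predict for the event-type pair
INFERRING = {
    ("statement", "emergency"): {"CE", "NR"},
}
# Outf(P, H): relations the model must not predict for the event-type pair
EXCLUDING = {
    ("action", "operation"): {"EC"},
}

def allowed_relations(type_p, type_h):
    """Candidate relation set for an event pair after applying both tables."""
    candidates = set(INFERRING.get((type_p, type_h), ALL_RELATIONS))
    return candidates - EXCLUDING.get((type_p, type_h), set())
```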

3.3.3. Weighted Joint Learning Objective Loss

We employ the product t-norm to formulate the learning objective of maximizing the probability of the true labels, and transform the inconsistency with the product t-norm into the negative log space. This enables the conversion of logical constraints into specific differentiable loss functions to guide model learning. Accordingly, the loss functions for common sense constraints and domain constraints are derived as follows.
(1) Common sense constraint loss functions

The annotation consistency constraint is transformed into the following annotation loss:

$$L_{ann} = \sum_{P, H \in E} -\,\omega_R \log \alpha(P, H)$$

where $\omega_R$ is the label weight that balances the training loss across relations.

The symmetry consistency constraint is transformed into the following symmetry loss:

$$L_{sym} = \sum_{P, H \in E,\ \alpha \in R_s} \left| \log \alpha(P, H) - \log \alpha^{-1}(H, P) \right|$$

The conjunction consistency constraint is transformed into the following conjunction loss:

$$L_{con} = L_{coni} + L_{conn}$$

$$L_{coni} = \sum_{\substack{P, H, M \in E,\ \alpha, \beta \in R,\\ \gamma \in Dec(\alpha, \beta)}} \left| \log \alpha(P, H) + \log \beta(H, M) - \log \gamma(P, M) \right|$$

$$L_{conn} = \sum_{\substack{P, H, M \in E,\ \alpha, \beta \in R,\\ \delta \notin Dec(\alpha, \beta)}} \left| \log \alpha(P, H) + \log \beta(H, M) - \log (1 - \delta(P, M)) \right|$$
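A sketch of the three common sense losses in PyTorch, assuming the model outputs one softmax distribution per ordered event pair. The absolute-value form of the symmetry and conjunction terms follows the product t-norm relaxation described above; the tensor layouts are illustrative.

```python
import torch

def annotation_loss(probs, gold_idx, omega):
    """L_ann: weighted negative log-likelihood of the annotated relations.
    probs: (n_pairs, n_relations) softmax outputs; omega: per-relation weights."""
    gold_probs = probs[torch.arange(probs.size(0)), gold_idx]
    return -(omega[gold_idx] * torch.log(gold_probs)).sum()

def symmetry_loss(probs_ph, probs_hp, alpha_idx, alpha_inv_idx):
    """L_sym: |log alpha(P,H) - log alpha^{-1}(H,P)|, e.g. CE versus EC."""
    return (torch.log(probs_ph[:, alpha_idx])
            - torch.log(probs_hp[:, alpha_inv_idx])).abs().sum()

def conjunction_loss(p_ab, p_bc, p_ac_gamma, p_ac_delta):
    """L_con = L_coni + L_conn over event triples (P, H, M); the inputs are the
    predicted probabilities of alpha(P,H), beta(H,M), gamma(P,M), delta(P,M)."""
    l_coni = (torch.log(p_ab) + torch.log(p_bc)
              - torch.log(p_ac_gamma)).abs().sum()
    l_conn = (torch.log(p_ab) + torch.log(p_bc)
              - torch.log(1.0 - p_ac_delta)).abs().sum()
    return l_coni + l_conn
```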
(2) Domain constraint loss functions

Similarly to the annotation constraint, the type inferring consistency is transformed into the following inferring consistency loss:

$$L_{typi} = \sum_{P, H \in E,\ \lambda \in Inf(P, H)} -\log \lambda(P, H)$$

The type excluding consistency is transformed into the following excluding consistency loss:

$$L_{type} = \sum_{P, H \in E,\ \lambda \in Outf(P, H)} -\log (1 - \lambda(P, H))$$
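The two domain losses follow the same negative-log pattern; a short sketch with boolean masks marking Inf and Outf membership per pair (the mask layout is an assumption):

```python
import torch

def inferring_loss(probs, inf_mask):
    """L_typi: -log lambda(P,H) summed over the relations in Inf(P,H).
    inf_mask: (n_pairs, n_relations) boolean mask of Inf membership."""
    return -(torch.log(probs) * inf_mask.float()).sum()

def excluding_loss(probs, outf_mask):
    """L_type: -log(1 - lambda(P,H)) summed over the relations in Outf(P,H)."""
    return -(torch.log(1.0 - probs) * outf_mask.float()).sum()
```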
After expressing the above logical formulas as differentiable loss terms, we merge them into the following joint learning objective:

$$L = \lambda_{ann} L_{ann} + \lambda_{sym} L_{sym} + \lambda_{con} L_{con} + \lambda_{typi} L_{typi} + \lambda_{type} L_{type}$$

where each $\lambda$ is a non-negative coefficient that controls the influence of its loss term and can be learned through the dynamic adjustment method [42]. Each $\lambda$ is initialized to 1.0, updated at every step, and clipped to $[0.1, 10.0]$ for all constraint groups. As the number of training epochs increases, the $\lambda$ values are automatically adjusted from the gradients computed during backpropagation, dynamically tuning their values to minimize the overall training loss and thereby balancing conflicts among the different constraints.
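The weighting can be sketched as follows. The exact update rule of the dynamic adjustment method [42] is not reproduced here; the sketch only shows learnable coefficients with the stated initialization and clipping range.

```python
import torch

class WeightedJointLoss(torch.nn.Module):
    """Combine the five constraint losses with learnable coefficients lambda,
    initialized to 1.0 and clipped to [0.1, 10.0] after each optimizer step."""

    GROUPS = ("ann", "sym", "con", "typi", "type")

    def __init__(self):
        super().__init__()
        self.lam = torch.nn.ParameterDict(
            {g: torch.nn.Parameter(torch.tensor(1.0)) for g in self.GROUPS}
        )

    def forward(self, losses):
        # losses: dict mapping each constraint group to its scalar loss term
        return sum(self.lam[g] * losses[g] for g in self.GROUPS)

    @torch.no_grad()
    def clip(self):
        for p in self.lam.values():
            p.clamp_(0.1, 10.0)  # keep every coefficient in the stated range
```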

4. Experiment

4.1. Baselines

Current ERE studies focus on designing sophisticated models to effectively extract both event-internal and inter-event contextual features. Complex model structures often require large-scale training data, which does not align with the few-shot learning scenario in this study. Since few studies have explored few-shot learning for ERE, this paper selects the following ERE models with low training data requirements as baseline methods. Their source code is available from the original papers.
  • C-GCN (Contextualized Graph Convolutional Networks) [43]: This model extracted entity-centric representations for robust relation prediction. An extended graph convolutional network (GCN) was employed to encode dependency structures of sentences for augmenting input information and enhancing the model’s learning capability with limited data.
  • B-LSTM (Bert-LSTM) [44]: This model employed BERT to encode sentences and their constraint information for relation extraction. A key phrase extraction network was constructed to obtain key phrase features of contexts and a global gating mechanism was used to transfer phrase contextual information to the current phrase representation for enhancing the information representation of the phrase itself. By augmenting input information, the model’s learning capability with limited data can be improved.
  • Capsule (Attention-Based Capsule Networks with Dynamic Routing) [45]: This model used a new capsule network with the attention mechanism for few-shot relation extraction. Unlike traditional networks that rely on implicit statistical learning, capsule networks with dynamic routing can stably learn generalizable feature patterns from limited data due to their explicit component-wise modeling.
  • SGT (Syntax-guided Graph Transformer network) [46]: This model proposed a new syntax-guided Graph Transformer network (SGT) to extract the temporal relations between events. By adding the new syntax-guided attention into Graph Transformer, the prior knowledge-guided feature extraction can be implemented to obtain an enhanced contextual representation of event mentions that considers both local and global dependencies between events. The model’s learning capability with limited data was consequently improved.

4.2. Metric

This paper uses Precision, Recall, and $F_1$-score to measure the experimental results. They are calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where $TP$ represents the number of true positive examples, $FN$ represents the number of false negative examples, and $FP$ represents the number of false positive examples.
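For completeness, a direct implementation of the three metrics from raw counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall, and F1 from true-positive, false-positive,
    and false-negative counts, guarding against empty denominators."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```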

4.3. Implementations

To evaluate model performance under extreme data scarcity, simulating few-shot conditions in real-world emergency response, we adopted a specialized 5-fold evaluation setup on the CEC corpus. The corpus was randomly divided into five folds; in each round, one fold was used for training, while the remaining four folds were used for testing. This low-resource configuration ensures that the model learns from minimal labeled examples. We report the average Precision, Recall, and $F_1$-score.
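A sketch of this inverted 5-fold split, training on one fold and testing on the remaining four; the shuffling seed reuses the value from Section 4.3 as an assumption.

```python
import random

def low_resource_folds(documents, k=5, seed=42):
    """Yield (train, test) pairs where each fold serves once as the small
    training set and the other k-1 folds together form the test set."""
    rng = random.Random(seed)
    docs = list(documents)
    rng.shuffle(docs)
    folds = [docs[i::k] for i in range(k)]
    for i in range(k):
        train = folds[i]
        test = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, test
```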
All methods are implemented in PyTorch and trained on an NVIDIA RTX A6000, using Python version 3.7.12 and PyTorch version 1.11.0. Our models are trained for 80 epochs, taking approximately 2 h.
For the baseline model C-GCN, the dropout was set to 0.3, the seed was set to 1, the hidden layer dimension was set to 100, the learning rate was set to 0.003, the learning rate decay was set to 0.7, and the dimension of the word vector was set to 60.
For the baseline model B-LSTM, the dropout was set to 0.3, the seed was set to 1, the hidden layer dimension was set to 100, the learning rate was set to 0.003, learning rate decay was set to 0.7, the weight decay was set to 0.0001, and the dimension of the word vector was set to 60.
For the baseline model Capsule, the dropout was set to 0.3, the seed was set to 1, the learning rate was set to 0.0003, the influence factor of learning rate was set to 0.7, the weight decay was set to 0.0000001, and the dimension of word vector was set to 60.
For the baseline model SGT, the dropout was set to 0.1, the seed was set to 1, the batch size was set to 32, the weight decay was set to 0.0001, and the dimension of word vector was set to 60.
For our model CERMiner, the random seed is set to 42, the pre-trained language model is hfl/chinese-roberta-wwm-ext with all transformer layers frozen, the tokenizer is the built-in WordPiece tokenizer for Chinese subword segmentation, the number of Bi-LSTM layers is set to 1, the hidden size of the Bi-LSTM is set to 256, the dropout rate is set to 0.5, the learning rate is set to 0.0000001, the batch size is set to 4, and the value of $\omega_R$ is set to 2.

5. Results

The experimental results are shown in Table 3. Compared with C-GCN, B-LSTM, Capsule, and SGT, our proposed method achieves the best results in Precision, Recall, and $F_1$-score. As Table 3 shows, the average Precision, Recall, and $F_1$-score on the CEC corpus reach 84.8%, 72.7%, and 78.2%, respectively.
It can also be seen from Table 3 that the results of C-GCN and B-LSTM are the worst among these models, with average $F_1$-scores of around 45.8% and 64.1%, respectively. This indicates that, even with the introduction of syntactic dependency encoding and phrase encoding, a deep learning model still fails to adequately learn events and their contextual features on a few-shot dataset. Compared with these two baselines, Capsule demonstrates significant improvements in Precision, Recall, and $F_1$-score, which indicates that model optimization may prove more effective than input enhancement for few-shot ERE tasks. The results of SGT confirm this finding: compared with Capsule, SGT achieves an improvement of 6.3% in Precision, 3.8% in Recall, and 4.8% in $F_1$-score. Syntax knowledge-guided model optimization is thus effective for few-shot ERE tasks. Our proposed method CERMiner is also a knowledge-guided model optimization, in which constraint knowledge guides objective learning. This form of optimization is more task-oriented than the knowledge-guided feature learning in SGT. Consequently, our method achieves state-of-the-art performance, with improvements of 4% in Precision and 1.7% in $F_1$-score over SGT.
Figure 4 gives the performance comparison across relation types. Consistent with the results in Table 3, the two model optimization methods (Capsule and SGT) significantly outperform the two input enhancement methods (C-GCN and B-LSTM). Notably, the four baseline methods exhibit substantial variation in extracting the three types of event–event relations: performance on AC relations is markedly inferior to that on the other two relation types. This stems from the inherently more ambiguous expression of accompany relations in text compared with cause–effect or effect–cause relations, which complicates contextual feature extraction and substantially hinders effective AC relation extraction.
Unlike these baseline methods, our method employs constrained knowledge-guided objective learning to directly optimize the model for task-specific objectives, thereby effectively circumventing the challenges of contextual feature extraction. Consequently, compared to the baseline methods, our approach achieves the most balanced extraction performance across all three relation types, ultimately realizing optimal AC relation extraction. This demonstrates that by incorporating both domain constraints and common sense constraints, the model can effectively handle relation types that are difficult to identify, thereby enhancing its learning capability. Moreover, the model performs well in addressing data sparsity: AC and EC relations are significantly underrepresented compared to CE relations in the dataset. Despite this imbalance, our model achieves robust performance across all relation types, whereas the baseline methods tend to overfit to the majority class. This comparison further confirms that our approach effectively alleviates the overfitting caused by sample imbalance, demonstrating its superior generalization in few-shot scenarios.

6. Ablation Study

This section analyzes the effectiveness of weighted double consistency constraint learning through an ablation study. Four CERMiner variants are designed as follows:
  • Bi-LSTM+MLP(BM): This model removes the weighted double consistency constraint learning from CERMiner.
  • Bi-LSTM+MLP+Common sense constraints (BM-CC): This model removes all domain constraints from CERMiner. The dynamic weight adjustment mechanism is also removed. All weights of loss terms are set to 1.
  • Bi-LSTM+MLP+Domain constraints (BM-DC): This model removes all common sense constraints from CERMiner. The dynamic weight adjustment mechanism is also removed. All weights of loss terms are set to 1.
  • Bi-LSTM+MLP+ Common sense constraints+Domain constraints (BM-CC-DC): This model contains both common sense constraints and domain constraints, but the dynamic weight adjustment mechanism is removed.
Table 4 shows the results of the ablation study. As can be seen from the table, BM-DC adds domain constraints to the model BM, and its Precision increases by 1.5%, Recall by 1.3%, and $F_1$-score by 1.3% compared with BM, indicating that domain constraints can effectively guide model learning and alleviate the problem of insufficient training labels. This guidance is implemented by integrating the constraints into the loss function or inference process, where predictions that violate known domain logic are penalized. As a result, the model is less likely to make semantically implausible predictions, thereby reducing false positives and improving the $F_1$-score. BM-CC incorporates common sense constraints into the base model (BM), resulting in a 1.4% increase in Precision, a 1.8% increase in Recall, and a 1.6% improvement in $F_1$-score. These constraints act as a “reasoning prior” that steers the model’s outputs toward semantically plausible configurations by filtering out illogical predictions, thus reducing false positives, and enabling logical propagation to infer missing relations, thereby improving overall performance. The success of both variants (BM-DC and BM-CC) demonstrates the effectiveness of integrating structured knowledge through domain and common sense constraints.
However, the BM-CC-DC model incorporating both types of constraints exhibits poorer performance than the variants with only a single constraint. Specifically, compared to BM-CC, its Precision decreases by 1.4% and its $F_1$-score by 0.4%; compared to BM-DC, its Precision decreases by 1.3% and its $F_1$-score by 0.3%. This is because potential conflicts exist between the two types of constraints. As illustrated in Figure 5, there can be overlapping or contradictory signals between common sense and domain-specific rules, which may confuse the model during training and degrade performance; the majority of observed conflicts arise from this source. To investigate the extent of this impact on model performance, we counted the number of relation types involved in such conflicts. As shown in Figure 6, subfigure (a) reveals that approximately 3.2% of all relation types exhibit conflicts under the two constraint sets. Furthermore, as illustrated in subfigure (b), taking the common sense constraints as the reference, we calculated the proportion of domain rules that conflict with them, finding that about 36.4% of such rules are in conflict. To address this issue, we introduce an adaptive weight adjustment mechanism to build our final model, CERMiner. Compared to BM-CC-DC, CERMiner achieves a 2.2% improvement in Precision, a 1% improvement in Recall, and a 1.4% improvement in $F_1$-score. Through an adaptive mechanism that modulates the influence of each constraint, the model can prioritize more reliable constraints on a per-instance basis, effectively resolving conflicts via gradient-based feedback during training. By harmonizing these competing signals within a unified framework, the model achieves more coherent and semantically plausible predictions, significantly enhancing its generalization capability and performance in few-shot learning scenarios under knowledge-scarce conditions. These results underscore the pivotal role of dynamic constraint harmonization, suggesting that the mechanism for adaptively coordinating knowledge sources is more critical than the constraints themselves in complex, knowledge-guided few-shot learning systems.

7. Conclusions

In few-shot event relation extraction, existing constrained learning methods primarily focus on predefined logical rules for event relations, without considering the inherent attributes of events in specific domains, and fail to account for potential conflicts among different constraints. Addressing these limitations, we propose a novel model named CERMiner for Chinese emergency event–event relation extraction (ERE). To tackle the scarcity of annotated corpora in few-shot learning, the model adopts the classic RoBERTa+Bi-LSTM deep learning framework and introduces domain-specific constraints and common sense constraints to enhance its learning capability. To mitigate constraint conflicts, a dynamic adjustment mechanism is designed to balance the influence of the two types of constraints, which is validated by our experimental results. As a paragraph-level ERE model, CERMiner can effectively extract multiple relations of emergency events from news and reports, thereby providing substantial assistance for Chinese event–event relation extraction under diverse conditions. However, this study employs a pre-trained model for feature encoding, which requires considerable computational resources, and the constraint rules must be defined manually, a process that may introduce potential conflicts. In the future, we plan to optimize the efficiency of CERMiner by replacing the feature extraction component with a lightweight model. More importantly, we aim to incorporate an automated rule summarization module into the framework, enabling the model to dynamically discover and weight constraints from limited labeled or unlabeled data. This would significantly enhance its adaptability and generalizability across diverse and dynamic domains.

Author Contributions

Conceptualization, J.C. and Z.T.; methodology, J.C. and Z.T.; software, L.M. and Z.T.; validation, L.M. and Z.T.; formal analysis, J.C. and Z.T.; investigation, J.C. and Z.T.; resources, J.C.; data curation, Z.Z. and H.Y.; writing—original draft preparation, J.C. and Z.T.; writing—review and editing, J.C., Z.T., Z.Z. and H.Y.; visualization, Z.T.; supervision, J.C.; project administration, J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Beijing Natural Science Foundation under Grant 4222022, in part by the National Key Research and Development Program of China under Grant 2020YFB2104402.

Data Availability Statement

The original data presented in the study are openly available in [the Semantic Intelligence Laboratory at Shanghai University] at [https://github.com/open-nlp/CEC-Corpus (accessed on 6 November 2025)].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shen, H.; Shi, J.; Zhang, Y. CrowdEIM: Crowdsourcing emergency information management tasks to the mobile social media users. Int. J. Disaster Risk Reduct. 2021, 54, 102024. [Google Scholar] [CrossRef]
  2. Kuai, H.; Huang, J.X.; Tao, X.; Pasi, G.; Yao, Y.; Liu, J.; Zhong, N. Web intelligence (wi) 3.0: In search of a better-connected world to create a future intelligent society. Artif. Intell. Rev. 2025, 58, 265. [Google Scholar] [CrossRef]
  3. Zhang, B.; Li, L.; Song, D.; Zhao, Y. Biomedical event causal relation extraction based on a knowledge-guided hierarchical graph network. Soft Comput. 2023, 27, 17369–17386. [Google Scholar]
  4. Yao, H.-R.; Breitfeller, L.; Naik, A.; Zhou, C.; Rose, C. Distilling Multi-Scale Knowledge for Event Temporal Relation Extraction. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM ’24), Boise, ID, USA, 21–25 October 2024. [Google Scholar]
  5. Yong, S.J.; Dong, K.; Sun, A. DOCoR: Document-level OpenIE with Coreference Resolution. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, Kyoto, Japan, 12–16 February 2024. [Google Scholar]
  6. Zhang, M.; Qian, T.; Liu, B. Exploit feature and relation hierarchy for relation extraction. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 917–930. [Google Scholar] [CrossRef]
  7. Han, X.; Wang, J. Earthquake Information Extraction and Comparison from Different Sources Based on Web Text. ISPRS Int. J. Geo-Inf. 2019, 252, 2220–9964. [Google Scholar] [CrossRef]
  8. Xiao, H.; Zheng, S.; Chen, X.Y. Temporal Relationship Extraction of Conflict Events in Open Source Military Journalism. In Proceedings of the 2023 2nd International Conference on Artificial Intelligence and Computer Information Technology (AICIT), Yichang, China, 15–17 September 2023. [Google Scholar]
  9. Qiu, J.; Sun, L. A Joint Graph Neural Model for Chinese Domain Event and Relation Extraction with Character-Word Fusion. In Proceedings of the 2024 10th International Conference on Computer and Communications (ICCC), Chengdu, China, 13–16 December 2024. [Google Scholar]
  10. Prasad, R.; Dinesh, N.; Lee, A.; Miltsakaki, E.; Robaldo, L.; Joshi, A.; Webber, B. The Penn Discourse TreeBank 2.0. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco, 28–30 March 2008. [Google Scholar]
  11. Khoo, C.S.G.; Kornfilt, J.; Oddy, R.N.; Myaeng, S.H. Automatic Extraction of Cause-Effect Information from Newspaper Text Without Knowledge-based Inferencing. Lit. Linguist. Comput. 1998, 13, 177–186. [Google Scholar] [CrossRef]
  12. Nichols, M. Efficient Pattern Search in Large, Partial-Order Data Sets. Ph.D. Thesis, University of Waterloo, Waterloo, ON, Canada, 2008. [Google Scholar]
  13. Shen, J.; Wu, Z.; Lei, D.; Shang, J.; Ren, X.; Han, J. SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble. arXiv 2019, arXiv:1910.08192. [Google Scholar]
  14. Chang, D.-S.; Choi, K.-S. Causal relation extraction using cue phrase and lexical pair probabilities. In Proceedings of the First International Joint Conference on Natural Language Processing, Hainan Island, China, 22–24 March 2004. [Google Scholar]
  15. Girju, R.; Beamer, B.; Rozovskaya, A.; Fister, A.; Bhat, S. A knowledge-rich approach to identifying semantic relations between nominals. Inf. Process. Manag. 2010, 46, 589–610. [Google Scholar] [CrossRef]
  16. Liu, C.; Sun, W.; Chao, W.; Che, W. Convolution Neural Network for Relation Extraction. In Advanced Data Mining and Applications; Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 231–242. [Google Scholar]
  17. Xu, X.; Gao, T.; Wang, Y.; Xuan, X. Event Temporal Relation Extraction with Attention Mechanism and Graph Neural Network. Tsinghua Sci. Technol. 2022, 27, 79–90. [Google Scholar] [CrossRef]
  18. Li, T.; Wang, Z. LDRC: Long-tail Distantly Supervised Relation Extraction via Contrastive Learning. In Proceedings of the 2023 7th International Conference on Machine Learning and Soft Computing, Chongqing, China, 5–7 January 2023. [Google Scholar]
  19. Man, H.; Ngo, N.T.; Van, L.N.; Nguyen, T.H. Selecting Optimal Context Sentences for Event-Event Relation Extraction. Proc. AAAI Conf. Artif. Intell. 2022, 36, 11058–11066. [Google Scholar] [CrossRef]
  20. El-allaly, E.-d.; Sarrouti, M.; En-Nahnahi, N. An attentive joint model with transformer-based weighted graph convolutional network for extracting adverse drug event relation. J. Biomed. Inform. 2022, 125, 103968. [Google Scholar] [CrossRef]
  21. Huang, P.; Zhao, X.; Hu, M.; Tan, Z.; Xiao, W. Logic Induced High-Order Reasoning Network for Event-Event Relation Extraction. Proc. AAAI Conf. Artif. Intell. 2025, 39, 24141–24149. [Google Scholar] [CrossRef]
  22. Chen, M.; Cao, Y.; Zhang, Y.; Liu, Z. CHEER: Centrality-aware High-order Event Reasoning Network for Document-level Event Causality Identification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023. [Google Scholar]
  23. Li, P.; Zhu, Q.; Zhou, G.; Wang, H. Global Inference to Chinese Temporal Relation Extraction. In Proceedings of the COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016. [Google Scholar]
  24. Zhu, G.; Huang, X.; Yang, R.; Sun, R. Relationship Extraction Method for Urban Rail Transit Operation Emergencies Records. IEEE Trans. Intell. Veh. 2023, 8, 520–530. [Google Scholar] [CrossRef]
  25. Wan, Q.; Wan, C.; Xiao, K.; Hu, R.; Liu, D.; Liu, X. CFERE: Multi-type Chinese financial event relation extraction. Inf. Sci. 2023, 630, 119–134. [Google Scholar] [CrossRef]
  26. Pan, X.; Wang, P.; Jia, S.; et al. Multi-contrast learning-guided lightweight few-shot learning scheme for predicting breast cancer molecular subtypes. Med. Biol. Eng. Comput. 2024, 62, 1601–1613. [Google Scholar] [CrossRef]
  27. Miao, W.; Huang, K.; Xu, Z.; Zhang, J.; Geng, J.; Jiang, W. Pseudo-label meta-learner in semi-supervised few-shot learning for remote sensing image scene classification. Appl. Intell. 2024, 54, 9864–9880. [Google Scholar] [CrossRef]
  28. Schwartz, E.; Karlinsky, L.; Feris, R.; Giryes, R.; Bronstein, A. Baby steps towards few-shot learning with multiple semantics. Pattern Recognit. Lett. 2022, 160, 142–147. [Google Scholar] [CrossRef]
  29. Hou, J.; Li, X.; Zhu, R.; Zhu, C.; Wei, Z.; Zhang, C. A Neural Relation Extraction Model for Distant Supervision in Counter-Terrorism Scenario. IEEE Access 2020, 8, 225088–225096. [Google Scholar] [CrossRef]
  30. Yang, S.; Song, D. FPC: Fine-tuning with Prompt Curriculum for Relation Extraction. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, Online Only, 20–23 November 2022. [Google Scholar]
  31. Chen, Y. A transfer learning model with multi-source domains for biomedical event trigger extraction. BMC Genom. 2021, 22, 31. [Google Scholar] [CrossRef] [PubMed]
  32. Zhang, Z.; Parulian, N.; Ji, H.; Elsayed, A.; Myers, S.; Palmer, M. Fine-grained Information Extraction from Biomedical Literature based on Knowledge-enriched Abstract Meaning Representation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021. [Google Scholar]
  33. Yuan, L.; Cai, Y.; Huang, J. Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, Australia, 28 October–1 November 2024. [Google Scholar]
  34. Li, L.; Xiang, Y.; Hao, J. Biomedical event causal relation extraction with deep knowledge fusion and Roberta-based data augmentation. Methods 2024, 231, 8–14. [Google Scholar] [CrossRef]
  35. Giunchiglia, E.; Stoian, M.C.; Lukasiewicz, T. Deep Learning with Logical Constraints. arXiv 2022, arXiv:2205.00523. [Google Scholar] [CrossRef]
  36. Hoernle, N.; Karampatsis, R.M.; Belle, V.; Gal, K. MultiplexNet: Towards Fully Satisfied Logical Constraints in Neural Networks. Proc. AAAI Conf. Artif. Intell. 2022, 36, 5700–5709. [Google Scholar] [CrossRef]
  37. Daniele, A. Knowledge Enhanced Neural Networks for Relational Domains. In PRICAI 2019: Trends in Artificial Intelligence, Proceedings of the Pacific Rim International Conference on Artificial Intelligence, Cuvu, Yanuca Island, Fiji, 26–30 August 2019; Dovier, A., Montanari, A., Orlandini, A., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 542–554. [Google Scholar]
  38. Wang, H.; Chen, M.; Zhang, H.; Roth, D. Joint Constrained Learning for Event-Event Relation Extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; Webber, B., Cohn, T., He, Y., Liu, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 696–706. [Google Scholar]
  39. Liu, Z.; Huang, M.; Zhou, W.; Zhong, Z.; Fu, J.; Shan, J.; Zhi, H. Research on Event-oriented Ontology Model. Comput. Sci. 2009, 36, 189–192+199. [Google Scholar]
  40. Sun, R.; Guo, S.; Ji, D.H. Topic Representation Integrated with Event Knowledge. Chin. J. Comput. 2017, 40, 791–804. [Google Scholar]
  41. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  42. Chen, Z.; Badrinarayanan, V.; Lee, C.-Y.; Rabinovich, A. GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. arXiv 2017, arXiv:1711.02257. [Google Scholar]
  43. Zhang, Y.; Qi, P.; Manning, C.D. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2–4 November 2018; Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 2205–2215. [Google Scholar]
  44. Xu, S.; Sun, S.; Zhang, Z.; Xu, F.; Liu, J. BERT gated multi-window attention network for relation extraction. Neurocomputing 2022, 492, 516–529. [Google Scholar] [CrossRef]
  45. Zhang, N.; Deng, S.; Sun, Z.; Chen, X.; Zhang, W.; Chen, H. Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2–4 November 2018; Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 986–992. [Google Scholar]
  46. Zhang, S.; Ning, Q.; Huang, L. Extracting Temporal Event Relation with Syntax-guided Graph Transformer. In Findings of the Association for Computational Linguistics: NAACL 2022; Carpuat, M., Marneffe, M.-C., Meza Ruiz, I.V., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 379–390. [Google Scholar]
Figure 2. The overall framework of CERMiner.
Figure 3. Data flow diagram of CERMiner.
Figure 4. F1-score comparison across relation types.
Figure 5. An example of conflicting logical constraints in event relation extraction.
Figure 6. Distribution of conflicts among constraint rules.
Table 1. The conjunctive constraints in the CEC corpus.

α(P, H)   β(H, M)   γ(P, M)
CE        CE        −
CE        EC        −
CE        AC        ¬AC
CE        NR        ¬AC
EC        CE        −
EC        EC        −
EC        AC        −
EC        NR        ¬CE
AC        CE        NR
AC        EC        −
AC        AC        −
AC        NR        NR
NR        CE        −
NR        EC        ¬CE
NR        AC        NR
NR        NR        −

−: no constraints; NR: no relations; ¬: excluding relation.
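Each row of Table 1 reads as a rule: if relation α holds between events P and H and relation β holds between H and M, then the relation γ between P and M is either forced (e.g., AC ∧ CE → NR) or excluded (e.g., CE ∧ AC → ¬AC). The following is a minimal sketch of how such a rule table can be encoded for consistency checking; the identifiers CONJUNCTIVE_RULES and violates_conjunction are ours for illustration and do not come from the authors' released code.

```python
# Minimal sketch: encoding the Table 1 conjunctive constraints as a lookup.
# Map (alpha(P,H), beta(H,M)) to a constraint on gamma(P,M):
# ("require", r): gamma must equal r; ("exclude", r): gamma must not equal r.
# Pairs absent from the dict carry no constraint (the "−" cells in Table 1).
CONJUNCTIVE_RULES = {
    ("CE", "AC"): ("exclude", "AC"),
    ("CE", "NR"): ("exclude", "AC"),
    ("EC", "NR"): ("exclude", "CE"),
    ("AC", "CE"): ("require", "NR"),
    ("AC", "NR"): ("require", "NR"),
    ("NR", "EC"): ("exclude", "CE"),
    ("NR", "AC"): ("require", "NR"),
}

def violates_conjunction(alpha: str, beta: str, gamma: str) -> bool:
    """Return True if the predicted triple (alpha, beta, gamma) breaks a Table 1 rule."""
    rule = CONJUNCTIVE_RULES.get((alpha, beta))
    if rule is None:
        return False  # no constraint on this (alpha, beta) pair
    kind, rel = rule
    return gamma != rel if kind == "require" else gamma == rel

# AC(P,H) and CE(H,M) force NR(P,M), so predicting CE(P,M) is inconsistent:
assert violates_conjunction("AC", "CE", "CE")
assert not violates_conjunction("CE", "CE", "CE")  # no constraint applies
```

A typical way to use such a table during training is to convert each violated triple into a penalty term added to the classification loss, with a per-rule weight; this is a simplified view of the weighted consistency constraints in CERMiner, not a reproduction of its objective.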
Table 2. Type inferring consistency constraints and excluding consistency constraints in the CEC corpus.

(P, H)       Emergency     Movement  StateChange   Statement  Perception  Action  Operation
Emergency    −             ¬EC       CE, NR, ¬AC   −          −           −       −
Movement     −             −         −             −          −           −       ¬EC
StateChange  EC, NR, ¬AC   −         −             −          −           EC, NR  −
Statement    CE, NR        CE, NR    −             −          AC, NR      −       AC, NR
Perception   −             −         −             −          −           −       −
Action       −             −         −             −          −           −       ¬EC
Operation    −             −         −             −          −           −       AC, NR, ¬EC

−: no constraints; ¬: excluding relation.
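Table 2 constrains relation labels by the ordered pair of event types: entries without ¬ list the admissible candidate relations, and ¬-marked entries are excluded. For example, an Emergency followed by a StateChange may only stand in a CE or NR relation, never AC. The paper integrates these constraints into the learning objective; the sketch below shows only the simpler inference-time use of the same lookup as a hard mask over relation logits. All identifiers (RELATIONS, TYPE_RULES, mask_logits) are illustrative assumptions rather than the authors' API.

```python
import torch

# Illustrative encoding of a few Table 2 rules. For an ordered event-type pair,
# `allowed` lists admissible relations (None means unrestricted) and
# `forbidden` lists excluded relations. The label order below is an assumption.
RELATIONS = ["CE", "EC", "AC", "NR"]
TYPE_RULES = {
    ("Emergency", "Movement"): (None, {"EC"}),
    ("Emergency", "StateChange"): ({"CE", "NR"}, {"AC"}),
    ("Operation", "Operation"): ({"AC", "NR"}, {"EC"}),
}

def mask_logits(logits: torch.Tensor, head_type: str, tail_type: str) -> torch.Tensor:
    """Set the logits of relations that Table 2 rules out to -inf before softmax."""
    allowed, forbidden = TYPE_RULES.get((head_type, tail_type), (None, set()))
    masked = logits.clone()
    for i, rel in enumerate(RELATIONS):
        if rel in forbidden or (allowed is not None and rel not in allowed):
            masked[i] = float("-inf")
    return masked

# Example: for an (Emergency, StateChange) pair only CE and NR survive.
scores = torch.tensor([0.2, 0.5, 1.1, 0.3])  # logits for CE, EC, AC, NR
print(mask_logits(scores, "Emergency", "StateChange"))
# tensor([0.2000, -inf, -inf, 0.3000])
```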
Table 3. Performance comparison between baselines and proposed CERMiner.

Metric  C-GCN  B-LSTM  Capsule  SGT    CERMiner
P       0.575  0.664   0.745    0.808  0.848
R       0.382  0.631   0.695    0.733  0.727
F1      0.458  0.641   0.717    0.765  0.782
Table 4. Performance comparison between proposed CERMiner and its variants.

Model     P      R      F1
BM        0.825  0.702  0.759
BM-DC     0.840  0.715  0.772
BM-CC     0.839  0.720  0.775
BM-CC-DC  0.826  0.717  0.768
CERMiner  0.848  0.727  0.782
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
