1. Introduction
In the real world, emergency events [1], such as traffic accidents, fires, forest fires, earthquakes, and health hazards, pose a serious threat to human life and property. As a result, they attract widespread concern across various sectors of society. In the Internet information era, the rapid spread and fermentation of information about emergency events through online media can breed public opinion events and affect public safety. Accurate and efficient access to structured information about emergency events can help relevant personnel detect and respond to such events early, mitigate the emergence of public opinion incidents, and uphold societal safety.
The goal of event extraction is to identify pre-specified types of events and their corresponding event arguments from plain text. Numerous previous studies have focused primarily on sentence-level event extraction [2,3,4], with most evaluating on the ACE dataset [5]. These sentence-level approaches make predictions within a single sentence and cannot extract events that span multiple sentences. In the real world, event argument information often cannot be fully obtained from a single sentence. For example, as shown in Figure 1, the arguments of the “Accident” event, “重庆永川吊水洞煤矿” (Chongqing Yongchuan Diaoshuidong coal mine) and “12月4日17时许” (around 17:00 on 4 December), are distributed across two different sentences, s1 and s2, which motivated our study of document-level event extraction (DEE). To extract structured event information from documents, researchers have proposed numerous models and datasets for training and validation. In the following, we review the DEE datasets, models, and methods put forth in previous works.
Researchers have conducted extensive work on DEE datasets to train and validate DEE methods, constructing numerous datasets such as MUC-4 [6], which consists of 1700 documents annotated using an associated role–population template. Additionally, the Twitter dataset was constructed by collecting and annotating English tweets posted in December 2010; it includes 20 event types and 1000 tweets. The WIKIEVENTS dataset, published by Li et al. [7] as a document-level benchmark, uses English Wikipedia articles as its data source. Yang et al. [8] performed experiments on four types of financial events, namely equity freeze, equity pledge, equity repurchase, and equity increase events, tagging a total of 2976 announcements.
Although numerous DEE datasets have been constructed by domestic and foreign researchers in previous works, two issues remain. Firstly, most of these datasets are in English, making them unsuitable for training and validating Chinese DEE methods. Secondly, there is a lack of DEE datasets specifically constructed for the field of emergency events. Hence, the current priority is to develop a Chinese document-level emergency event extraction dataset and address the issue of missing datasets.
Regarding DEE models and methods, numerous scholars have focused on two challenges: argument scattering and multiple events. Argument scattering refers to the situation where event arguments are dispersed across multiple sentences of a document. For instance, as shown in Figure 1, the event arguments “重庆永川吊水洞煤矿” (Chongqing Yongchuan Diaoshuidong coal mine) and “30个小时” (30 h) of the “Rescure” event appear in two different sentences, s1 and s5. Multiple events means that a single document can contain several distinct events; for example, Figure 1 includes three different events: “InjureDead”, “Rescure”, and “Accident”.
In a previous study, Yang et al. [8] proposed the DCFEE model, which extracts trigger words and arguments sentence by sentence. A convolutional neural network then classifies each sentence to determine whether it qualifies as a key sentence, and an argument completion strategy supplements the extracted event with arguments drawn from the sentences surrounding the key sentence. Zheng et al. [9] recast DEE as a table-filling task, using a trigger-word-free approach to populate candidate entities into a predefined event table. Specifically, they modeled DEE as a sequential prediction paradigm in which arguments are predicted in a predefined role order and multiple events are likewise extracted in a predefined event order. While this method accomplishes DEE without trigger words, predicting arguments in a fixed role order introduces error propagation, because earlier argument identification results cannot take later ones into account. Yang et al. [10] proposed an end-to-end model in which multiple events and their arguments are extracted from the document simultaneously in parallel, using a multi-granularity decoder after the overall document representation has been obtained with multiple encoders.

Based on previous work, we divide the DEE task into three subtasks: candidate entity extraction, event type detection, and argument identification. Candidate entity extraction extracts event-related entities from the text; event type detection determines which event types are present in the text; and argument identification determines which candidate entities fill the argument roles of each event. As the first subtask of DEE, candidate entity extraction affects the effectiveness of the two subsequent subtasks. Previous research has focused on scattered arguments and multiple events while overlooking role overlapping in this first subtask, even though it significantly affects the performance of both subsequent subtasks and the overall DEE task. Role overlapping refers to the phenomenon of a candidate entity playing multiple roles in the same event or across multiple events. For example, in Figure 1, the entity “12人” (12 people) plays the “InjureDead” and “DeadPerson” roles in the “InjureDead” event, and the entity “重庆永川吊水洞煤矿” (Chongqing Yongchuan Diaoshuidong coal mine) plays the “RescurePlace” role in the “Rescure” event and the “HappenPlace” role in the “Accident” event.
To address the above-mentioned problems of missing datasets and role overlapping, we make two contributions. On the one hand, to address the lack of datasets, we defined an emergency event extraction framework by analyzing and summarizing information on Chinese emergency events, and constructed a Chinese document-level emergency event extraction dataset, CDEEE. We defined 4 event types and 19 role types in this dataset and provided annotations for three specific issues: argument scattering, multiple events, and role overlapping. The resulting CDEEE dataset consists of 5000 annotated documents and 10,755 events. On the other hand, to address the role overlapping problem, we propose the DEE model RODEE. In this model, we first embed the text with the pre-trained language model RoBERTa-wwm [11] and then encode it with a Transformer to obtain the text representation, giving an overall understanding of the text. We then design two separate modules to represent the start position information and end position information of candidate entities, and combine the two with multiplicative attention to obtain a scoring matrix, which predicts the candidate entities and assists the event extraction task.
Overall, our main contributions are threefold:
We constructed a Chinese document-level emergency event extraction dataset called CDEEE through manual annotation. During the annotation process, we addressed not only the problems of scattered arguments and multiple events but also the issue of role overlapping.
We propose RODEE, a DEE model designed to handle the role overlapping problem. It utilizes two independent matrices to represent the start and end positions of candidate entities. Additionally, it employs multiplicative attention to generate a score matrix for predicting entities with role overlapping, thereby assisting in the event extraction task.
We compared RODEE with existing DEE models on the CDEEE dataset, and the experimental results demonstrate that RODEE surpasses the performance of the existing DEE models.
4. Proposed Method
The model described in this paper comprises three main subtasks: candidate entity extraction, event type detection, and argument identification. First, the text embedding and text representation are obtained with the pre-trained language model RoBERTa-wwm and a Transformer encoder [29]. Then, two independent modules extract the head and tail position information of candidate entities, and a multiplicative attention mechanism lets the two interact, yielding a score matrix for predicting candidate entities. Next, the candidate entity representations are fused with the sentence representations, and the resulting document representation is used for event type detection via the Transformer. Finally, a multi-granularity decoder decodes the event and role information to recognize and predict the arguments. The structure of the proposed model is shown in Figure 7.
The DEE task can be described as extracting one or more structured events, denoted as $\mathcal{E} = \{e_{1}, \dots, e_{k}\}$, from an input document $D = \{s_{1}, \dots, s_{N_s}\}$ consisting of $N_s$ sentences, where $k$ represents the number of events contained in the document. Each event extracted from the document contains an event type and its associated roles. We denote the set of all event types as $\mathcal{T}$ and the set of all role types as $\mathcal{R}$. The structured event information extracted from the document is denoted as $e_{i} = (t_{i}, R_{i}, A_{i})$, where $t_{i} \in \mathcal{T}$ represents the event type, and $R_{i}$ and $A_{i}$ denote, respectively, the role information corresponding to the event type and the arguments used to populate that role information.
4.1. Candidate Entity Extraction
Candidate entity extraction, as the first subtask of DEE, has a large impact on the performance of the two subsequent subtasks, event type detection and argument identification. However, previous work usually treats candidate entities as flat entities and accomplishes candidate entity extraction via sequence labeling. Although sequence labeling works well for flat entity extraction, it cannot address the role overlapping issue and cannot accurately extract entities that play multiple roles. To solve this problem, we first use two different matrices to represent the head position information and tail position information of candidate entities, respectively. We then employ multiplicative attention to let the two matrices interact and extract deeper information. This process yields a score matrix, which is subsequently used to accomplish candidate entity extraction.
Specifically, given a document $D = \{s_{1}, s_{2}, \dots, s_{N_s}\}$, where each sentence is represented as a character sequence $s_{i} = \{w_{1}, w_{2}, \dots, w_{L}\}$, we first apply embedding using the pre-trained language model RoBERTa to obtain an embedded representation $X_{i} \in \mathbb{R}^{L \times d}$, where $L$ represents the sentence length. Then, to obtain the textual representation, we encode the embedded representation of the text using the Transformer encoder, eventually obtaining the text representation $H_{i}$ for each sentence $s_{i}$ as follows:

$$H_{i} = \mathrm{Transformer}\left(X_{i}\right)$$

where $X_{i}, H_{i} \in \mathbb{R}^{L \times d}$, and $d$ is the hidden layer size.
Finally, in order to accurately extract candidate entities with overlapping roles, we use two FFNs (feed-forward neural networks) to generate two different matrices, $H^{start}$ and $H^{end}$. These matrices represent the head position information and tail position information of candidate entities, capturing the contextual information of the target characters. By representing and training the head and tail position information with separate matrices, the start position and end position of the candidate entities can be identified, respectively. Since the contexts of the start and end positions of a candidate entity differ, using separate matrices for the head and tail position information greatly improves task accuracy compared with using the output of the Transformer directly. We employ multiplicative attention, as illustrated in Figure 8, to facilitate the interaction between the start position information and end position information. This interaction produces a score matrix, denoted as $S$, for candidate entity prediction, where the number of score channels is the number of predefined role types plus one (i.e., the non-predefined role type). We obtain the score matrix in the following way:

$$S = H^{start}\left(H^{end}\right)^{\top}$$

where $w$ is a hyperparameter indicating the window size used to obtain the contextual embedding of the target character. In this case, we set its value to 64, meaning that the 64 characters preceding and following the target character are used to form the contextual representation, which serves as the start or end position information for the target character. $H^{start}$ and $H^{end}$ represent the vector representations of the start position information and end position information of a candidate entity with a given role type and entity span. In order to facilitate interaction between the start position information and end position information of the target characters for each role type, we transpose the representation of the end position information by swapping its last two dimensions, denoted as $(H^{end})^{\top}$.
After obtaining the score matrix $S$, we generate the candidate entity prediction matrix using the following formula:

$$P = S W$$

where $W$ is a trainable matrix and $P$ is the candidate entity prediction matrix in Figure 8. Finally, after this transformation, we obtain the prediction results for the start position, end position, and role type of the candidate entities, as shown in Figure 8. For the candidate entities extracted from each sentence, we denote them as triplets $(b, e, r)$, where $b$ represents the start position of the candidate entity, $e$ represents the end position of the candidate entity, and $r$ represents the role type of the candidate entity. For all the candidate entities extracted from the whole document, we denote them as the set $C$, representing each candidate entity with the quadruplet $(i, b, e, r)$, where $i$ denotes the sentence index of the candidate entity.
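The scoring step described above can be sketched in a few lines of numpy. This is a minimal illustration, not the trained model: the dimensions are toy sizes, the per-role head/tail projections stand in for the trained FFNs, the weights are random, and the 0.0 decision threshold is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, hidden, num_roles = 6, 8, 4  # toy sizes; num_roles includes the "none" class

H = rng.normal(size=(seq_len, hidden))        # Transformer output for one sentence

# Two independent projections map each token into head- and tail-position
# spaces, one projection per role type (an assumption for this sketch).
W_head = rng.normal(size=(num_roles, hidden, hidden))
W_tail = rng.normal(size=(num_roles, hidden, hidden))

H_head = np.einsum('td,rdk->rtk', H, W_head)  # (roles, seq, hidden)
H_tail = np.einsum('td,rdk->rtk', H, W_tail)

# Multiplicative attention: scores[r, i, j] couples token i as a start with
# token j as an end for role r (the tail representation is transposed).
scores = np.einsum('rik,rjk->rij', H_head, H_tail) / np.sqrt(hidden)

# A span (i, j) with i <= j is predicted as an entity of role r when its
# score clears a threshold; the same span may fire for several roles, which
# is exactly how overlapping roles are accommodated.
entities = [(r, i, j) for r in range(num_roles)
            for i in range(seq_len) for j in range(i, seq_len)
            if scores[r, i, j] > 0.0]
```

Because each role type has its own score plane, one character span can be extracted several times under different roles, which a single flat sequence-labeling pass cannot express.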
Regarding the loss function in this section, the cross-entropy loss function is used as follows:

$$\mathcal{L}_{ent} = -\sum_{r,i,j} y_{r,i,j} \log \mathrm{softmax}\left(P\right)_{r,i,j}$$
4.2. Event Type Detection
Before performing event type detection, our model needs to understand the document as a whole, that is, to obtain document-level contextual encoding information. To obtain a holistic representation of the document, we employ the Transformer encoder, enabling interaction between all sentence information and candidate entity information. Specifically, we first use MaxPooling to obtain the textual representation $c_{j}$ of each candidate entity and the textual representation $h_{i}$ of each sentence. This ensures that the candidate entity representations have the same dimension as the sentence representations, facilitating interaction between the two. Then, the sentence information and candidate entity information are each enriched with an embedding: the positional information of each sentence is integrated with its max-pooled text representation, and the role type $r$ of each candidate entity is embedded and integrated with its max-pooled representation. For candidate entities with multiple role types, this is applied separately for each role type and the multiple embeddings are fused together. Finally, the fully embedded sentence representations and candidate entity representations are fed to the Transformer encoder, which lets them interact and produces the entire document representation:

$$D^{doc} = \left[H^{doc}; C^{doc}\right] = \mathrm{Transformer}\left(\left[h_{1} + \mathrm{PE}_{1}, \dots, h_{N_s} + \mathrm{PE}_{N_s}; c_{1} + E(r_{1}), \dots, c_{n} + E(r_{n})\right]\right)$$

where $H^{doc}$ represents the sentence representations within the document representation $D^{doc}$, $C^{doc}$ represents the candidate entity representations, $n$ represents the number of candidate entities extracted from the document, PE represents sentence position information, and $E$ represents the feature fusion embedding.
After obtaining the overall document representation, we can use the sentence representations $H^{doc}$ within it for event type detection. Specifically, we perform a binary classification for each event type after applying a MaxPooling operation over $H^{doc}$. That is, for each event type we carry out the following:

$$\hat{p}_{t} = \mathrm{softmax}\left(\mathrm{MaxPooling}\left(H^{doc}\right) W_{t}\right)$$

where $\hat{p}_{t}$ represents the probability that the document contains an event of type $t$, and $T$ represents the number of predefined event types ($t = 1, \dots, T$).
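The detection step, max-pooling the sentence representations into one document feature and running an independent yes/no classifier per event type, can be sketched as follows. All sizes and weights here are toy placeholders, and a sigmoid stands in for the per-type binary decision.

```python
import numpy as np

rng = np.random.default_rng(1)

num_sentences, hidden, num_event_types = 5, 8, 4

S = rng.normal(size=(num_sentences, hidden))  # sentence representations from the document encoder

# Max-pool over the sentence axis to get one document-level feature vector.
doc_feat = S.max(axis=0)                      # shape (hidden,)

# One binary classifier per predefined event type (weights are random here;
# in the model they would be learned).
W = rng.normal(size=(num_event_types, hidden))
b = rng.normal(size=(num_event_types,))

logits = W @ doc_feat + b
probs = 1.0 / (1.0 + np.exp(-logits))         # sigmoid: each type is an independent yes/no

detected = [t for t, p in enumerate(probs) if p > 0.5]
```

Treating each event type independently is what allows a single document to be assigned several event types at once, matching the multiple-events setting.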
Regarding the loss function in this section, the cross-entropy loss function is used as follows:

$$\mathcal{L}_{type} = -\sum_{t=1}^{T} y_{t} \log \hat{p}_{t}$$
4.3. Argument Identification
In this stage, we need to match and fill in the arguments and event roles for the detected events. Following Yang et al. [10], we use a multi-granularity decoder to extract events in a parallel manner. This decoder consists of three parts: an event decoder, a role decoder, and an event-to-role decoder.

The event decoder is designed to support the parallel extraction of all events and to model interactions between events. A learnable query matrix $Q^{evt}$ is generated for event extraction, where $m$ represents the number of events contained in the document and is a hyperparameter. The event query matrix $Q^{evt}$ is then passed through a non-autoregressive decoder, which consists of multiple identical stacked Transformer layers. In each layer, a multi-head self-attention mechanism models the interactions between events, and a multi-head cross-attention mechanism integrates the document-aware representation $D^{doc}$ into the event query $Q^{evt}$:

$$H^{evt} = \mathrm{Decoder}\left(Q^{evt}, D^{doc}\right)$$

where $H^{evt} \in \mathbb{R}^{m \times d}$.
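One layer of this decoding pattern can be illustrated with plain numpy. The sketch below uses a single attention head, no masking, no residuals or layer normalization, and random values in place of the learned queries and document representation, so it only shows the data flow, not the full Transformer layer.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention (single head, no masking)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(2)
m, doc_len, hidden = 3, 10, 8     # m = number of event queries (a hyperparameter)

event_queries = rng.normal(size=(m, hidden))   # learnable query matrix
doc_repr = rng.normal(size=(doc_len, hidden))  # document-aware representation

# Self-attention lets the m event queries interact with one another ...
q = attention(event_queries, event_queries, event_queries)
# ... and cross-attention pulls document information into each event query.
q = attention(q, doc_repr, doc_repr)
```

Because all $m$ queries are decoded in one pass rather than token by token, the decoder is non-autoregressive and all events are produced in parallel.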
The role decoder is designed similarly, to support parallel filling of all roles of an event and to model interactions between roles. A learnable query matrix $Q^{role}$ is generated, where $k$ represents the number of roles associated with the corresponding event. The role query matrix $Q^{role}$ is then fed into a decoder with the same architecture as the event decoder. Specifically, the self-attention mechanism models relationships between roles, and the cross-attention mechanism integrates the candidate entity representations $C^{doc}$ with the document representation:

$$H^{role} = \mathrm{Decoder}\left(Q^{role}, \left[C^{doc}; D^{doc}\right]\right)$$

where $H^{role} \in \mathbb{R}^{k \times d}$.
In order to generate different events and their corresponding roles, we designed an event-to-role decoder to model the interaction between event information and role information:

$$H^{evt2role} = \mathrm{Decoder}\left(H^{role}, H^{evt}\right)$$

where $H^{evt2role} \in \mathbb{R}^{m \times k \times d}$.
Finally, after decoding with the multi-granularity decoder, we transform the $m$ event queries and $k$ role queries into $m$ predicted events and their corresponding $k$ predicted roles. To filter out false events, we assess whether each predicted event is non-empty. Specifically, predicted events are obtained through the following approach:

$$\hat{p}^{evt}_{i} = \mathrm{softmax}\left(H^{evt}_{i} W^{evt}\right)$$

where $W^{evt}$ is a learnable matrix.

Afterwards, for each predicted event with predefined roles, we decode the predicted arguments by filling in candidate entity indices or null values with an $(n+1)$-class classifier:

$$\hat{p}^{role}_{i,k} = \mathrm{softmax}\left(\left(H^{evt2role}_{i,k} W_{1}\right)\left(C^{doc} W_{2}\right)^{\top}\right)$$

where $W_{1}$ and $W_{2}$ are learnable matrices, and $n$ is the number of candidate entities. It is worth noting that some dimensional transformations are carried out in the actual implementation.
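The role-filling classification can be pictured as follows: each decoded role representation is scored against every candidate entity plus one extra "null" option, and the argmax decides which entity (if any) fills the role. This is a toy sketch with random vectors; a dot product replaces the learned classifier, and the null row is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
num_roles, num_entities, hidden = 4, 5, 8

role_repr = rng.normal(size=(num_roles, hidden))          # decoder output, one row per role
entity_repr = rng.normal(size=(num_entities + 1, hidden)) # candidate entities plus a "null" row

# Each role is filled by an (n+1)-way classification over the n candidate
# entities and the null option (index n means the role stays empty).
logits = role_repr @ entity_repr.T                        # shape (roles, entities + 1)
choice = logits.argmax(axis=-1)

filled = {r: (int(c) if c < num_entities else None) for r, c in enumerate(choice)}
```

Classifying over entity indices rather than generating text keeps the argument space closed: an argument can only ever be one of the entities extracted in the first subtask, which is why candidate entity extraction quality bounds the whole pipeline.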
So far, we have obtained the predicted events and the candidate entities filling each role of each event. This completes the event extraction, as well as the identification, matching, and filling of the corresponding arguments.
Regarding the loss function for this section, we first solve an assignment problem from operations research [30] to find the optimal assignment between the predicted events $\hat{y}$ and the ground truth events $y$:

$$\pi^{*} = \arg\min_{\pi \in \Pi(m)} \sum_{i=1}^{m} \mathcal{C}\left(y_{i}, \hat{y}_{\pi(i)}\right)$$

where $\Pi(m)$ represents the permutation space with length $m$, and $\mathcal{C}(y_{i}, \hat{y}_{\pi(i)})$ denotes the pairwise matching cost between the ground truth data $y_{i}$ and the predicted data $\hat{y}_{\pi(i)}$ with the index $\pi(i)$. To account for all the predicted instances of event roles, we define $\mathcal{C}$ as follows:

$$\mathcal{C}\left(y_{i}, \hat{y}_{\pi(i)}\right) = -\mathbb{1}_{\{y_{i} \neq \varnothing\}} \sum_{k} \hat{p}^{role}_{\pi(i),k}\left(a_{i}^{k}\right)$$
where $\mathbb{1}_{\{y_{i} \neq \varnothing\}}$ indicates that the event is not empty. The optimal assignment $\pi^{*}$ can be effectively calculated using the Hungarian algorithm [30]. Then, based on all optimal assignments, we define a loss function with negative log likelihood:

$$\mathcal{L}_{arg} = -\sum_{i=1}^{m} \log \hat{p}^{role}_{\pi^{*}(i)}\left(a_{i}\right)$$
Finally, our overall loss function considers the candidate entity extraction loss $\mathcal{L}_{ent}$, the event type detection loss $\mathcal{L}_{type}$, and the event argument recognition loss $\mathcal{L}_{arg}$, which fills the entity–role pairs, as shown below:

$$\mathcal{L} = \lambda_{1} \mathcal{L}_{ent} + \lambda_{2} \mathcal{L}_{type} + \lambda_{3} \mathcal{L}_{arg}$$

where $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are hyperparameters.
5. Experiments
5.1. Experimental Setting
We used our labeled Chinese document-level emergency event extraction dataset CDEEE as our experimental data. The dataset contains a total of 5000 documents and four event types: “InjureDead”, “Rescure”, “Accident”, and “NaturalHazard”.
For a fair comparison, we adopted the evaluation standard used in Doc2EDAG and DE-PPN. Specifically, for each predicted event, we select the most similar ground truth without replacement to calculate precision (P), recall (R), and the F1 measure (F1 score). Since an event type often includes multiple roles, we calculated micro-averaged role-level scores as the final DEE metric. To better reflect the performance of our model, the hyperparameters in the experiments follow the hyperparameter settings of the DE-PPN model. We provide a detailed description of the experimental environment and hyperparameter information in Appendix A.
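The micro-averaged role-level metric pools counts over all (event type, role, argument) predictions before computing a single P/R/F1, as the following toy sketch shows. The role names and argument strings are invented for illustration and are not taken from the dataset.

```python
# Toy example of the micro-averaged role-level metric: pool true positives,
# false positives, and false negatives over all roles of all events, then
# compute P, R, and F1 once from the pooled counts.
# (Event/role/argument tuples below are illustrative, not real annotations.)
gold = {("Accident", "HappenPlace", "煤矿"),
        ("Accident", "HappenTime", "12月4日"),
        ("Rescure", "RescuePlace", "煤矿")}
pred = {("Accident", "HappenPlace", "煤矿"),
        ("Rescure", "RescuePlace", "井口")}

tp = len(gold & pred)   # correctly predicted role fillers
fp = len(pred - gold)   # spurious predictions
fn = len(gold - pred)   # missed gold fillers

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

Micro-averaging at the role level means frequent roles dominate the score, so a model cannot inflate its F1 by doing well only on rare roles.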
5.2. Comparison Experiment and Result Analysis
Given that our dataset follows the concept of trigger-word-free annotation, we used the following models as baselines for the comparison experiments and as quality validation models for the dataset:
Doc2EDAG: An end-to-end model that converts DEE into a table-filling task, directly populating event tables via entity-based path expansion.
GreedyDec: A baseline within the Doc2EDAG model that fills the event table using a greedy strategy.
DE-PPN: This model uses multi-granularity decoders to extract events in parallel, improving extraction speed while effectively addressing the multiple-events and argument-scattering challenges of document-level events.
In addition, to verify the quality of the dataset, we conducted an experiment involving manual extraction of document-level event information. For human evaluation, we randomly selected 1000 events and invited three individuals to manually extract the event information. The final result is an average of the three individuals’ findings.
Based on the experimental setup in Section 5.1, we extracted event information manually and compared the results with those of the baseline models on the CDEEE dataset. At the same time, we trained our proposed model RODEE and compared its results with those of the baseline models under the same experimental conditions.
It should be noted that in the experimental results below, we highlight in bold the data from the human evaluation (Human), the RODEE model, and other baseline models that outperform the RODEE model.
Table 2 and Table 3 show the results obtained by each model on the CDEEE dataset for each event type, as well as the overall experimental results. We can observe that the scores achieved by humans on the CDEEE dataset were much higher than those of the existing DEE models. On the one hand, this indicates the high quality of our labeled dataset; on the other hand, it indicates that there is still considerable room for improvement on the DEE task.
Considering that existing DEE methods use sequence labeling to complete candidate entity prediction in the candidate entity extraction stage and embed the role types of candidate entities to assist the DEE task, while our CDEEE dataset is annotated for the candidate entity role overlapping problem, we modified the baseline models by removing their role type embedding modules, naming the modified baselines Doc2EDAG*, GreedyDec*, and DE-PPN*.
By observing Table 2 and Table 3, it can be seen that on the CDEEE dataset, which is annotated with overlapping role issues, the models that embed a single role type to assist the DEE task performed worse overall than the DEE models without role type embedding. We therefore believe that embedding incomplete role type information not only fails to enhance the overall performance of the DEE task but can have negative effects, reducing DEE performance to some extent.
We can also observe from Table 2 and Table 3 that the overall performance of our proposed RODEE model was better than that of the existing DEE models: precision P improved by 7 percentage points over DE-PPN, the best-performing baseline on that metric, and the F1 score improved by 3.9 points over Doc2EDAG*, the best-performing baseline in terms of F1. In addition, the recall R and F1 values for the “InjureDead” event were lower than those of the Doc2EDAG* model. We attribute this to the fact that the “InjureDead” event has only four role types and a low role overlapping rate, so the advantage of our model on the candidate entity role overlapping problem is diminished for this class of events. Likewise, the recall R and F1 score for the “Rescure” event in the Doc2EDAG* model were slightly higher than those of our model; although the “Rescure” event has more role types than the “InjureDead” event, its overall role overlap is relatively low, which leads to the same phenomenon.
To further analyze the performance of RODEE, we also conducted experiments on the candidate entity extraction subtask; the results are shown in Figure 9. As can be seen, RODEE outperformed the other models not only on the DEE task but also on the candidate entity extraction subtask, improving the F1 score for candidate entity extraction by at least 11 percentage points over the other models. We attribute this to the model capturing deeper textual information, which improves candidate entity extraction. This demonstrates that the performance of candidate entity extraction, as the first subtask of DEE, has a significant impact on the overall task, and that improvements in this subtask benefit the DEE task as a whole.
5.3. Ablation Experiment
To validate the effectiveness of our improvements, we conducted ablation experiments on certain modules. Firstly, we ablated the feature fusion of candidate entity role information. Earlier, removing the role information features of candidate entities from the baseline models yielded better results than the source models, so we need to verify whether fusing role information features affects our overall task positively or negatively. Secondly, because we use the pre-trained language model RoBERTa-wwm, we need to verify whether the overall performance improvement is attributable solely to the pre-trained language model. Pre-trained language models are well known to significantly improve performance across natural language processing tasks; in our case, addressing the role overlapping problem of argument entities requires more granular information, so we adopted RoBERTa-wwm, which captures richer textual information, as an important component of our model. Accordingly, we also conducted ablation experiments on this part. However, to preserve our model's access to more granular information, we did not remove the pre-trained language model from RODEE; instead, we added it to the baseline model DE-PPN. In summary, we compared the DE-PPN model, which incorporates neither role type embedding nor a pre-trained language model, with our RODEE model and the following two ablation variants:
In Table 4, we present the results of our model after removing the candidate entity role information features (-RoleEmb), as well as the baseline model DE-PPN augmented with the same pre-trained language model we used (+BERT). It is evident that our model maintains its advantage over both. RODEE achieved a 1.1-point improvement in F1 score over the -RoleEmb variant, demonstrating that incorporating correct candidate entity role information benefits our DEE task. Moreover, as shown in Section 5.2, incomplete or incorrect candidate entity role features can negatively impact the overall performance of document-level event extraction. Furthermore, DE-PPN achieved a 6.9-point increase in F1 score after incorporating the pre-trained language model, yet still trailed our RODEE model by 3.2 F1 points. We conclude that although our model leverages the powerful performance of the pre-trained language model, our improvements targeting the candidate entity role overlapping problem still contribute significantly to the performance of the DEE task.