Event Argument Extraction for Rainstorm Disasters Based on Social Media: A Case Study of the 2021 Heavy Rains in Henan

He, Yun; Yang, Banghui; He, Haixia; Fei, Xianyun; Fan, Xiangtao; Liu, Jian

doi:10.3390/w16233535


by Yun He ¹,²,³, Banghui Yang ¹, Haixia He ⁴, Xianyun Fei ³, Xiangtao Fan ¹,² and Jian Liu ¹,²,*

¹ Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

² International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China

³ School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang 222005, China

⁴ National Disaster Reduction Center, Ministry of Emergency Management, Beijing 100124, China

* Author to whom correspondence should be addressed.

Water 2024, 16(23), 3535; https://doi.org/10.3390/w16233535

Submission received: 18 October 2024 / Revised: 5 December 2024 / Accepted: 6 December 2024 / Published: 8 December 2024

(This article belongs to the Section Urban Water Management)


Abstract

Rainstorm disasters have wide-ranging impacts on communities, but traditional information collection methods are often hampered by high labor costs and limited coverage. Social media platforms such as Weibo provide new opportunities for monitoring and analyzing disaster-related information in real-time. In this paper, we present ETEN_BERT_QA, a novel model for extracting event arguments from Weibo rainstorm disaster texts. The model incorporates the event text enhancement network (ETEN) to enhance the extraction process by improving the semantic representation of event information in combination with event trigger words. To support our approach, we constructed RainEE, a dataset dedicated to rainstorm disaster event extraction, and implemented a two-step process, as follows: (1) event detection, which identifies trigger words and classifies them into event types, and (2) event argument extraction, which identifies event arguments and classifies them into argument roles. Our ETEN_BERT_QA model combines ETEN with a BERT-based question-answering mechanism to further improve the understanding of the event text. Experimental evaluations on the RainEE and DuEE datasets show that ETEN_BERT_QA significantly outperforms the baseline model in terms of accuracy and the number of event argument extractions, validating its effectiveness in analyzing rainstorm disaster-related Weibo texts.

Keywords: heavy rain disaster; event detection; event argument extraction; deep learning

1. Introduction

Due to the unique climatic conditions and topography of the country, rainstorm disasters occur frequently and pose a serious threat to human life, infrastructure, and economic stability [1]. The increasing frequency and severity of rainstorm disasters underscore the imperative for effective disaster management. However, traditional approaches encounter considerable obstacles, including the necessity for prompt information, an expedient response, and a unified allocation of resources [2]. It is frequently the case that traditional data sources, such as official reports, are unable to provide real-time, detailed insights that are critical in emergencies [3]. In this context, the extraction of pertinent information from unstructured texts pertaining to rainstorm disasters has become a crucial step in enhancing disaster response and management. The process of identifying and structuring critical event information, otherwise known as event extraction, can provide emergency services with actionable insights that facilitate faster, more targeted interventions.

In recent years, Weibo has become a key source for obtaining data on rainstorm disaster events by virtue of its wide coverage, real-time update capability, and embedded geolocation information [4]. Compared with traditional official report data, Weibo data are instantaneous and can reflect the latest situation of those affected by disasters. In contrast, official reports are usually released late and may lack detailed descriptions of specific disaster areas [5]. The immediacy and extensiveness of Weibo data give it a unique advantage in extracting structured event information from massive unstructured rainstorm disaster texts. Event extraction is a key part of information extraction, which can automatically identify and organize event information in unstructured text and transform it into actionable structured data [6]. This capability is particularly important in rainstorm disaster response because rapid and accurate access to information about the relevant people, content, location, and time of rainstorm disaster events can effectively support decision-making and coordination in emergency management.

Recent research has explored various methods to enhance the effectiveness of event extraction in social media. Panem et al. introduced a method in 2014 to automatically extract structured information (such as attribute–value pairs and fact triplets) from tweets related to natural disasters, which improves the relevance of search results and provides critical real-time information for crisis management [7]. This work highlights the potential of utilizing social media data to quickly and effectively respond to disasters. Similarly, Edouard et al. proposed a graph-based event extraction method in 2017, which detects and clusters tweets describing the same disaster event by constructing a temporal event graph, significantly improving the accuracy of event information extraction [8]. Typically, event extraction can be divided into two subtasks [9]: (1) event detection, where the goal is to identify trigger words in the event text and categorize them into event types. Trigger words are the core words of an event, which can clearly express the occurrence of the event, usually in the form of nouns or verbs. (2) Event argument extraction, the goal of which is to identify event arguments in event texts and classify them into argument roles. An event argument is a participant in an event, while an argument role is the role that an event argument plays in an event [10].

Although these subtasks give the problem a clear structure, current research mainly treats event extraction as a classification problem, focusing on extracting and classifying event trigger words and event argument roles [11,12,13,14]. Despite the progress made, classification-based approaches remain data-hungry: they require large amounts of labeled training data to ensure good performance [15,16]. In addition, such methods are usually unable to handle new event types that were never encountered during training [17].

In this study, we introduce a new event extraction learning paradigm that aims to address the above issues simultaneously. Our main motivation is that event extraction can essentially be viewed as a machine reading comprehension (MRC) problem [18] involving text comprehension and matching to discover information about specific events in a text.

This view suggests a new approach to event extraction, which has two main advantages. First, by treating event extraction as MRC, we can build on recent advances in MRC such as BERT [19], which may significantly enhance the inference process in the model. Second, we can directly utilize the rich MRC datasets to enhance event extraction, which may alleviate the data scarcity problem (this is known as cross-domain data augmentation) [20]. The second advantage also opens the door to zero-shot event extraction for unseen event types: instead of obtaining training data for them in advance, we can list the questions that define their patterns and use the MRC model to retrieve the answers as event extraction results.

To connect MRC and event extraction, the key challenge is to generate relevant questions that describe the event scenario. Note that we cannot use a supervised question-generation approach [21,22,23] because of the lack of aligned question–event pairs. Previous work on connecting MRC and other tasks typically used manually designed templates [24,25,26]. For example, in QA-SRL, the question for the predicate publish is always ‘Who published something?’, regardless of the context. Such a question may not be sufficient to instruct the MRC model to find an answer.

The aforementioned challenges are addressed through the introduction of an unsupervised question generation process, which is designed to produce questions that are both relevant and context-dependent. Specifically, our approach assumes that each question can be decomposed into two parts, reflecting the event type and argument role information, respectively. To illustrate, the question “Q” can be decomposed into two constituent parts: “Personal safety?” and “Number of trapped persons?” In order to generate question expressions, a template-based generation method has been devised which combines event types and argument roles. Subsequently, a BERT-based MRC model is constructed in order to answer each question and synthesize the answers into an event extraction result.

Deep learning methods are data-driven, and well-labeled event extraction datasets are essential for obtaining accurate event extraction results. Currently, representative publicly available event extraction datasets include the DuEE dataset.

In 2020, Li et al. [27] developed the DuEE dataset, a large-scale, high-quality, and diverse event extraction dataset. Available online: https://opendatalab.org.cn/OpenDataLab/DuEE (accessed on 17 October 2024). This publicly available Chinese event extraction dataset provides comprehensive coverage and promotes extensive research and evaluation in the field of event extraction.

The Weibo platform is an important social media platform through which users can publish and share rainstorm disaster-related information in real-time. Collecting and processing rainstorm disaster data can effectively solve the problem of scarcity of data sources for event extraction, thus improving the completeness of the event extraction task. Using Weibo data as a data source, the best deep learning method for event extraction can be identified, which in turn facilitates subsequent research in the field of rainstorm disasters. Therefore, it is crucial to generate event extraction datasets related to rainstorm disasters.

Inspired by the work mentioned above, the main contributions of this work are as follows:

(1) In view of the importance of rainstorm disaster event information extraction for subsequent natural language processing tasks, this paper constructs a rainstorm event extraction (RainEE) dataset based on rainstorm disaster texts on Weibo.

(2) In order to enhance the perception of trigger words in event text and improve the accuracy and completeness of event argument extraction, a BERT-based question-answering network (ETEN_BERT_QA) is proposed, with an event text enhancement network in the encoding stage.

(3) To verify the superiority of our proposed network, we conduct experiments on the RainEE dataset, the DuEE dataset, and the 20 July 2021 Henan rainstorm case. The results show that our proposed network outperforms the baseline model in both accuracy and number of event extractions.

2. Materials and Methods

This section introduces the rainstorm disaster event extraction datasets and the methods used in this paper.

2.1. Materials

This section introduces the process of constructing our own event extraction dataset in the field of rainstorm disasters, as well as the existing event extraction datasets that are currently widely used.

2.1.1. Production of RainEE

Due to the lack of event extraction datasets dedicated to rainstorm disasters, we developed the Rainstorm Event Extraction (RainEE) dataset to fill this research gap. The RainEE dataset is based on a large amount of user-generated content related to rainstorm disasters on Weibo, especially posts that contain information about the events, and is designed to improve the applicability of rainstorm event extraction methods. Compared with traditional data sources, the immediacy and extensiveness of Weibo data enable us to capture rainstorm disaster field information quickly and accurately. This immediacy not only helps to gain an in-depth understanding of the dynamic development of the disaster event but also provides data support for timely emergency response measures and rational allocation of resources, thus enhancing the responsiveness and decision-making efficiency of disaster management.

To construct the RainEE dataset, we designed and deployed a dedicated web crawler to collect Weibo content pertaining to rainstorm-related disasters. The initial step was the construction of a keyword list highly relevant to rainstorm events. Subsequently, between 15 July 2021 and 3 August 2021, more than 70,000 Weibo posts were collected using the keywords ‘Zhengzhou flood’ and ‘Henan rainstorm’. The time period was chosen to encompass the peak of the rainstorm disaster, ensuring that the dataset comprehensively reflects the public’s real-time response to the disaster and the latest developments. The selection of this period is primarily based on the occurrence of the rainstorm disaster, as shown in Figure 1. Statistical analysis of comment volume reveals a significant surge in posts and user engagement around 20 July, coinciding with the peak of the disaster. This sharp increase in activity reflects heightened public attention and ongoing discourse surrounding the event, which offers valuable data for subsequent event extraction and disaster management research.

During the data preprocessing stage, it became evident that retweeting and copying behaviors are pervasive on social media, which presents a significant challenge for text processing. To address this issue, a series of preprocessing techniques were implemented to ensure the high quality and relevance of the data. First, redundant text was identified and removed through deduplication. Second, texts pertaining to rainstorm disasters were selected using a keyword-based template filtering method. At this stage, we assume that the data retained by keyword-based filtering represent real information about the specific disaster event (i.e., the rainstorm disaster). However, as the scope of data application expands, especially in multi-disaster scenarios, a series of validation methods, such as source reliability checks, sentiment analysis, and cross-platform comparisons, will be employed to further screen the data and distinguish true from false information, ensuring accuracy and reliability. Third, the texts were subjected to syntactic analysis to ascertain the soundness of their syntactic structure. These preprocessing techniques notably improved the quality of the dataset, establishing a robust foundation for the subsequent annotation work.
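The deduplication and keyword-filtering steps described above can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the pipeline actually used to build RainEE: the keyword list `KEYWORDS`, the normalization rules, and the function names are hypothetical, and the syntactic-analysis step is omitted.

```python
import re
from typing import Iterable, List

# Hypothetical filter keywords; the paper does not publish its filtering templates.
KEYWORDS = ("rainstorm", "flood", "waterlogging")


def normalize(text: str) -> str:
    """Drop URLs and collapse whitespace so copied posts map to the same key."""
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def preprocess(posts: Iterable[str]) -> List[str]:
    """Remove exact duplicates (after normalization), then keep keyword matches."""
    seen = set()
    kept = []
    for post in posts:
        key = normalize(post)
        if not key or key in seen:
            continue  # empty or already-seen content
        seen.add(key)
        if any(keyword in key for keyword in KEYWORDS):
            kept.append(post)
    return kept


if __name__ == "__main__":
    sample = [
        "Heavy rainstorm in Zhengzhou  http://t.cn/abc",
        "heavy rainstorm in zhengzhou",
        "Nice weather today",
        "Flood on Line 5",
    ]
    print(preprocess(sample))
```

Exact-match deduplication after normalization is the simplest choice; near-duplicate detection (for example, by shingling) would catch edited reposts but is beyond what the text describes.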

In the process of event annotation, we adhered rigorously to the annotation guidelines established for the DuEE dataset to ensure consistency and accuracy. To define the event types, we used the disaster classification framework set out in the Chinese standard SL579-2012 [28], which provides a scientific basis for the systematic classification of rainstorm-related disaster events. In practice, depending on the amount and type of data, some categories were merged and adjusted to ensure reasonable classification and full utilization of the data. The final event schema comprises 7 event types (waterlogging, vegetation damage, traffic congestion, rescue operations, personal safety, landslides, and building collapse/damage) and 28 argument roles. Table 1 lists the argument roles pertinent to each event type, such as the victim, time, and location. Furthermore, a visual representation of the event schema and the outcomes of event detection and event role extraction are presented in Figure 2. The classification system not only made the organization of the dataset more efficient but also provides a more targeted basis for subsequent emergency response and rescue plans for different incident types.

To address the possibility of new disaster events emerging in future research, it is important to expand the dataset to include new disaster types as they arise. This will help optimize model training and improve its adaptability. In particular, the continuous evolution of disaster events may require adjustments to the classification system to ensure its relevance and accuracy. By incorporating emerging disaster types, future research can enhance the robustness of the model, providing more comprehensive support for emergency response efforts in a dynamic environment.

During the annotation process, we carefully analyzed the rainstorm disaster texts, identified and annotated the event trigger words, classified them into event types, annotated the event arguments, and classified them into argument roles. This information was recorded in a predefined format. The final dataset consists of 1432 labeled data entries, of which about 70% (1021 entries) form the training set and about 30% (411 entries) form the validation set. The training and validation sets were selected from data collected between 15 July 2021 and 3 August 2021, covering the period of the heavy rainstorm event in Henan. This specific time frame ensures that the dataset captures a representative range of social media posts related to the disaster event. These labeled events not only provide a comprehensive picture of the public’s response to the disaster but also provide a valuable reference for understanding the evolution of the disaster. To ensure the consistency and accuracy of the data annotations, we regularly reviewed the annotations that had been made. Although we were the sole annotators, the rigorous measures taken ensured that the process of annotating the RainEE dataset met the necessary standards of quality and reliability. The entire process of building the dataset is shown in Figure 3.

2.1.2. Existing Event Extraction Datasets

This study utilized two distinct datasets: the self-constructed RainEE dataset and the publicly available DuEE dataset. Table 2 provides a detailed comparison of the main parameters of these two datasets.

The DuEE dataset was released by Baidu in 2020. It is the largest Chinese event extraction dataset to date, and its influence in this field is considerable. The dataset is partitioned into 11,958 training samples, 35,000 test samples, and 1498 validation samples, thereby establishing a robust foundation for the training and evaluation of event extraction models [27]. DuEE covers the most pertinent topics within Baidu search results, with data sourced from Baijiahao News, thereby ensuring the timeliness and relevance of the events. This topical diversity not only enriches the dataset but also enhances its applicability in different fields and environments, making it an important resource for evaluating the generalization ability of event extraction models.

Furthermore, the extensive coverage of diverse event types and argument roles inherent to the DuEE dataset presents a considerable challenge to researchers and practitioners alike. The dataset is notable for its complexity and comprehensiveness, which render it one of the most representative of Chinese event extraction tasks. This feature provides a robust foundation for benchmarking novel methods and techniques, ensuring valid and meaningful performance comparisons.

The RainEE dataset has been constructed with the specific intention of being highly relevant and serves as a pivotal benchmark for the extraction of information pertaining to disasters. In the context of emergency response, the timely and accurate dissemination of event information can have a significant impact on decision-making and the allocation of resources. The utilization of real-time Weibo social media data enables the reflection of the immediate response and actions of affected communities, as observed in the RainEE dataset.

In contrast to the DuEE dataset, which comprises a distinct test set, the RainEE dataset does not include a separate test set. This decision was made deliberately to concentrate on the principal event, the rainstorm of 20 July 2021 in Henan. The substantial volume of data generated on Weibo, coupled with the diverse array of consequences associated with this event, offers a comprehensive repository for examining a multitude of incidents pertaining to heavy precipitation. Instead of holding out a test set, we elected to conduct a comprehensive analysis of the model’s performance in this real-world scenario.

In the analysis process, we selected 6000 data entries from the crawled data as specific cases for the subsequent experimental part. Through this approach, we can gain a deeper understanding of the actual performance of the model in dealing with the complex challenges posed by the rainstorm disaster in Henan. In addition, it provides valuable insights and references for future research and disaster management strategies. By analyzing this specific case, we can comprehensively assess the adaptability and reliability of the model in responding to actual disaster events, thus providing an important basis for future improvement and application.

In conclusion, the DuEE dataset offers a comprehensive and heterogeneous foundation for event extraction, whereas the RainEE dataset concentrates on the particular domain of rainstorm disasters, providing a specialized resource for event extraction research. The combination of the two not only provides a comprehensive evaluation platform for event extraction methods but also advances the field and enhances the effectiveness of disaster response strategies.

2.2. Methods

This section describes the structure of each part of BERT_QA and the evaluation metrics of the network training results.

2.2.1. Optimizing the BERT_QA Model by Adding Event Text Augmentation Block

Developed by Devlin et al. at Google, BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that set new standards in a wide range of Natural Language Processing (NLP) tasks, especially machine reading comprehension. BERT uses bidirectional transformer encoders, which, by simultaneously attending to both the preceding and following context of each token, can better capture semantic relationships and contextual nuances [29]. This bidirectional approach significantly improves BERT’s ability to model complex linguistic dependencies, enabling it to excel in tasks such as question-answering (QA) [30].

In QA tasks, BERT generates contextual representations of questions and paragraphs through fine-tuning. In the process, the model learns to predict where the answer will start and end in the context, enabling an accurate understanding of the text [31]. As shown in Figure 4, the architecture of the BERT model consists of a multi-layer transformer encoder that dynamically adapts its understanding of the context to the complexity of the input text [32,33].

This architecture makes BERT perform particularly well when dealing with multi-layered linguistic structures and can cope with the challenges in many NLP tasks, providing strong support for the model’s performance enhancement in the fields of text classification, named entity recognition, and machine translation.

Based on the BERT architecture, this study proposes ETEN_BERT_QA, which aims to enhance the model’s ability to process and understand event-related text through event trigger words. Compared to conventional BERT, which directly processes unaltered input text, our approach achieves more efficient information capture by preprocessing and enhancing event-specific text, especially by highlighting event trigger words. These trigger words are identified in the event detection task and strategically enhanced in our approach, enabling the model to capture event-related information more efficiently and thereby improving the accuracy and effectiveness of the output.

ETEN is based on the ability to highlight event trigger words in the appropriate context. This approach enables BERT to more effectively utilize these trigger words as key cues, thereby significantly enhancing the model’s ability to understand and interpret disaster scenarios. Compared to traditional approaches, our approach places more emphasis on the fundamental components of an event, thereby enhancing the model’s ability to parse the text of a specific event.

In addition, our model incorporates an intermediate linear layer after the BERT output that facilitates the connection between the BERT representation output and subsequent task processing. This layer combines a ReLU (rectified linear unit) activation function and a dropout operation to optimize feature extraction while reducing the risk of model overfitting. This design ensures that the model is not dependent on specific features in the training data, thus improving the generalization ability of the model.

The generated enhanced event representations can be used as inputs for subsequent tasks, including the question-answering (QA) task. In the QA task, the output of the event-text augmentation network goes through a final linear layer to predict the likelihood of a particular text span (i.e., answer). This layer helps the model to more accurately identify and extract key information from the event text, thus improving the performance of the machine reading comprehension task.
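To make the span-prediction step concrete, the sketch below shows the decoding commonly used with BERT-style QA heads: given a start score and an end score per token, it selects the pair with the highest combined score subject to start ≤ end. The paper does not describe its decoding procedure, so the function name `best_span`, the additive scoring, and the `max_answer_len` limit are assumptions for illustration.

```python
from typing import Sequence, Tuple


def best_span(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    max_answer_len: int = 30,  # assumed cap on answer length, in tokens
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring span with start <= end."""
    if len(start_scores) != len(end_scores) or not start_scores:
        raise ValueError("score sequences must be non-empty and of equal length")
    best_start, best_end = 0, 0
    best_score = float("-inf")
    for i, start in enumerate(start_scores):
        # Only consider end positions at or after the start, within the length cap.
        last = min(i + max_answer_len, len(end_scores))
        for j in range(i, last):
            score = start + end_scores[j]
            if score > best_score:
                best_start, best_end, best_score = i, j, score
    return best_start, best_end, best_score


if __name__ == "__main__":
    start = [0.1, 2.0, 0.3, 0.0]
    end = [0.0, 0.5, 1.5, 0.2]
    print(best_span(start, end))
```

The extracted argument is then the token sequence between the returned start and end indices, inclusive.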

Figure 5 depicts the ETEN_BERT_QA structure, highlighting the interrelationships between the BERT layer, the intermediate linear layer, and the final output layer. This design optimizes the model’s ability to extract and understand event-related information, providing a powerful tool for event extraction tasks and a solid foundation for meeting complex disaster event extraction requirements.

These improvements markedly enhance the model’s capacity to process and interpret event-related text, thereby optimizing the performance of the event extraction task. By enhancing the representation of event trigger words and optimizing the output of the middle layer, the event text augmentation network can provide more accurate and relevant results. This capability is of particular importance when dealing with complex event-related linguistic phenomena, such as ambiguous wording, nested events, and diverse contextual cues. Consequently, the model not only enhances the precision of event recognition but also guarantees that the extracted information is contextualized and actionable, thereby providing crucial support for more effective disaster management and the development of more efficacious emergency response strategies.

2.2.2. Event Text Enhancement Network

In this study, we propose a machine question-answering approach based on a pre-trained model to effectively utilize event schema information for extracting relevant event arguments from a given event text. The task involves obtaining event argument role information $r$ from an event text $x_{s}$ containing an event type $x_{ts}$ and its corresponding trigger word $x_{tr}$. The objective of the model is to maximize the conditional probability $p(\alpha \mid r, x_{s}, x_{ts}, x_{tr})$, where $\alpha$ represents the most plausible event argument.

The design of question-answering (QA) text plays a crucial role in encoding and decoding processes within the pre-trained model. Therefore, crafting effective QA texts is essential for achieving the methodology’s goals. Machine QA texts are generated by integrating enhanced representations of question templates and event texts to leverage the capabilities of pre-trained models.

To ensure efficient processing and avoid excessive text length that may impair model performance, we use a prompt word template derived from prior knowledge, rather than a traditional natural language template. The question template is formulated as follows:

$Q = \{x_{ts1}, \dots, x_{tsm}, \text{‘,’}, r_{1}, \dots, r_{n}, \text{‘?’}\}$

Here, $x_{ts}$ is determined by the event detection subtask, and $r$ is obtained from the event schema corresponding to $x_{ts}$. The variables $m$ and $n$ represent the lengths of $x_{ts}$ and $r$, respectively.
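The question template can be read as plain string concatenation: the event type, a comma, one argument role, and a question mark. The sketch below illustrates this reading. The schema excerpt `SCHEMA` and the function names are hypothetical; the actual role lists are those of Table 1.

```python
from typing import Dict, List

# Hypothetical schema excerpt for illustration only; see Table 1 for the real roles.
SCHEMA: Dict[str, List[str]] = {
    "personal safety": ["number of trapped persons", "time", "location"],
}


def build_question(event_type: str, role: str) -> str:
    """Build the prompt-style question Q = event type + ',' + role + '?'."""
    return f"{event_type},{role}?"


def questions_for_event(event_type: str, schema: Dict[str, List[str]] = SCHEMA) -> List[str]:
    """One question per argument role defined for the detected event type."""
    return [build_question(event_type, role) for role in schema[event_type]]


if __name__ == "__main__":
    for question in questions_for_event("personal safety"):
        print(question)
```

Because each question is built from the schema alone, a new event type only needs a new schema entry, not new question-writing effort.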

To improve the model’s understanding of event text, the representation of trigger words is enhanced. This involves incorporating trigger words into the event text. The process is as follows:

(1) Event text construction: The original event text $x_{s}$ is vectorized by its content in the following form:

$x_{s} = \{x_{s1}, \dots, x_{sn^{'}}\} \in R^{n^{'}}$

where $n^{'}$ represents the length after vectorization.

(2) Trigger word representation: The trigger word obtained from the event detection subtask has the following form:

$x_{tr} = \{x_{tr1}, \dots, x_{trm^{'}}\} \in R^{m^{'}}$

where $m^{'}$ represents the length of the trigger word.

(3) Enhanced representation construction: To improve the model’s ability to understand event text, we add the special separator ‘[SEP]’ before and after the trigger word $x_{tr}$. Specifically, we construct the enhanced text $X$ in the following form:

$X = \{x_{s1}, x_{s2}, \dots, \text{[SEP]}, x_{tr1}, \dots, x_{trm^{'}}, \text{[SEP]}, \dots, x_{sn^{'}}\}$
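The construction of the enhanced text $X$ amounts to inserting a separator token on each side of the trigger span. A minimal sketch follows, assuming the event text is already tokenized and that the trigger position is known from event detection; the function name and the English example tokens are illustrative only.

```python
from typing import List


def enhance_event_text(
    tokens: List[str],
    trigger_start: int,
    trigger_len: int,
    sep: str = "[SEP]",
) -> List[str]:
    """Wrap the trigger span tokens[trigger_start : trigger_start + trigger_len] in separators."""
    trigger_end = trigger_start + trigger_len
    if trigger_start < 0 or trigger_len <= 0 or trigger_end > len(tokens):
        raise ValueError("trigger span lies outside the event text")
    return (
        tokens[:trigger_start]
        + [sep]
        + tokens[trigger_start:trigger_end]
        + [sep]
        + tokens[trigger_end:]
    )


if __name__ == "__main__":
    text = ["many", "people", "were", "trapped", "in", "the", "subway"]
    print(enhance_event_text(text, trigger_start=3, trigger_len=1))
```

The output has exactly two more tokens than the input, and the tokens outside the trigger span keep their original order.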

The question template and the enhanced event text are integrated as the input $C$ of the machine question-answering model. The integration form is as follows:

$C = \{\text{[CLS]}, Q, \text{[SEP]}, X, \text{[SEP]}, \text{[PAD]}, \dots, \text{[PAD]}\}$

Here, [CLS] marks the beginning, [SEP] separates components, and [PAD] ensures uniform input length.
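The assembly of $C$ can be sketched at the token level as follows. This is an illustration under assumptions: the function name, the returned attention mask, and the choice to raise an error on over-long inputs are not taken from the paper, which does not state how truncation is handled.

```python
from typing import List, Tuple


def build_model_input(
    question_tokens: List[str],
    enhanced_tokens: List[str],
    max_len: int,
) -> Tuple[List[str], List[int]]:
    """Return C = [CLS] Q [SEP] X [SEP] [PAD]... and a mask with 1 for real tokens."""
    sequence = ["[CLS]"] + question_tokens + ["[SEP]"] + enhanced_tokens + ["[SEP]"]
    if len(sequence) > max_len:
        # Assumption: reject rather than truncate, since truncation could cut the trigger.
        raise ValueError("input longer than max_len")
    n_pad = max_len - len(sequence)
    mask = [1] * len(sequence) + [0] * n_pad
    return sequence + ["[PAD]"] * n_pad, mask


if __name__ == "__main__":
    question = ["personal", "safety", ",", "time", "?"]
    enhanced = ["people", "[SEP]", "trapped", "[SEP]", "today"]
    tokens, mask = build_model_input(question, enhanced, max_len=16)
    print(tokens)
    print(mask)
```

In practice a tokenizer for the chosen pre-trained model would map these tokens to vocabulary ids; that step is omitted here to keep the sketch free of external dependencies.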

Next, we use the pre-trained model Chinese-BERT-WWM to encode the integrated question-and-answer text $C$ to obtain its hidden layer representation $H$:

$H = \mathrm{BERT}(C) \in R^{|C| \times d}$

where $|C|$ represents the length of the input $C$ and $d$ represents the dimension of the hidden layer representation.
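Section 2.2.1 describes an intermediate linear layer with ReLU activation and dropout, followed by a final linear layer that scores candidate answer spans. In terms of $H$, one possible formalization is given below. It is a sketch rather than the authors’ stated equations: the parameter names $W_{1} \in R^{d \times d'}$, $b_{1}$, $w_{start}$, $w_{end}$, $b_{start}$, $b_{end}$, the intermediate width $d'$, and the softmax normalization over token positions are assumptions that follow the usual span-prediction head for BERT question answering.

```latex
\begin{aligned}
Z &= \mathrm{Dropout}\bigl(\mathrm{ReLU}(H W_{1} + b_{1})\bigr) \in R^{|C| \times d'} \\
p_{start} &= \mathrm{softmax}(Z w_{start} + b_{start}) \in R^{|C|} \\
p_{end} &= \mathrm{softmax}(Z w_{end} + b_{end}) \in R^{|C|} \\
(i^{*}, j^{*}) &= \arg\max_{i \le j} \; p_{start}(i) \, p_{end}(j)
\end{aligned}
```

Under this reading, the predicted event argument $\alpha$ is the token span of $C$ from position $i^{*}$ to position $j^{*}$.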

2.2.3. Event Extraction Evaluation Metrics

This study employs three widely accepted metrics to evaluate event extraction performance at both the instance and argument levels: precision, recall, and F1 score. They are computed from the numbers of true positives ($TP$), false positives ($FP$), and false negatives ($FN$); true negatives ($TN$, samples correctly classified as negative) do not enter any of the three metrics.

Precision measures the proportion of correctly identified positive samples out of all samples predicted as positive. It is calculated as follows:

$\mathrm{Precision} = \frac{TP}{TP + FP}$

where $TP$ (true positive) is the number of samples correctly classified as positive, and $FP$ (false positive) is the number of samples incorrectly classified as positive.

Recall, also known as sensitivity, indicates the proportion of actual positive samples that are correctly predicted by the model. It is calculated as follows:

$\mathrm{Recall} = \frac{TP}{TP + FN}$

where $FN$ (false negative) is the number of samples that are positive but incorrectly classified as negative.

The F1 score is the harmonic mean of precision and recall, providing a single metric to evaluate the model’s performance. It is calculated as follows:

$\mathrm{F1\ score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$

These metrics offer a comprehensive assessment of the model’s accuracy in event extraction, balancing the trade-offs between precision and recall.
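As a minimal sketch, the three metrics can be computed directly from the counts. The function name `precision_recall_f1` and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the paper specifies.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 80 correct arguments, 20 spurious, 10 missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays close to the lower of the two values, so a model cannot score well by maximizing only precision or only recall.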

3. Experiment and Result

3.1. Experiment

The experiments were run with PyTorch 2.2.1 and CUDA 12.2, using GPUs for model training and evaluation. To validate the accuracy and effectiveness of the proposed model, we conducted experiments on both the custom RainEE dataset and the widely used DuEE dataset, so that the model is evaluated on both domain-specific and general-purpose event extraction data.

In addition, we applied the model to Weibo data from the severe rainstorm that struck Henan Province on 20 July 2021. The widespread impact of this event and the substantial volume of Weibo data it generated make it a valuable real-world scenario for testing the model.

The principal objectives of this experiment are twofold: firstly, to evaluate the accuracy of event argument role extraction; and secondly, to quantify the total number of successful extractions performed by the model.

The structured experimental framework is designed to rigorously validate the performance of the proposed model and demonstrate its suitability for real-time disaster management applications.

3.2. Result

3.2.1. Accuracy Results for Event Argument Extraction

The results of the experiments conducted on the DuEE dataset are shown in Table 3. The F1 score of our proposed model ETEN_BERT_QA is 76.3%, an improvement of 3.0 percentage points over the 73.3% achieved by the baseline model BERT_QA; recall rises from 72.7% to 76.9% and precision from 73.9% to 75.7%. This indicates that ETEN improves the model’s ability to identify and extract event arguments.

This improvement can be attributed to ETEN’s emphasis on event trigger words, which allows the model to better understand complex event narratives and, thus, extract event-related details more accurately. In particular, when faced with complex and context-rich unstructured text, ETEN helps the model to more effectively identify key information that has been obscured.

Experiments conducted on the DuEE dataset not only validate the model’s performance in the event extraction task but also demonstrate its adaptability and robustness in dealing with a wide range of event types. The DuEE dataset covers a wide range of event types and provides a good test benchmark for the model’s generalization ability.

In conclusion, the experimental results on the DuEE dataset validate the effectiveness of our proposed model architecture and provide a solid foundation for subsequent evaluations based on the RainEE dataset.

Next, we evaluated the performance of the model on the RainEE dataset, and the results are shown in Table 4. ETEN_BERT_QA achieves an F1 score of 75.7%, compared with 75.2% for the baseline BERT_QA model. Although its recall of 79.3% is lower than the baseline’s 82.7%, its precision of 72.4% is higher than the baseline’s 69.0%. The gain in precision outweighs the loss in recall, which yields the slightly higher F1 score.

The higher precision suggests that the ETEN_BERT_QA model can more reliably extract event arguments from complex, rainstorm disaster-related Weibo texts. This is particularly important in rainstorm disaster management, where the accuracy of the information can have a significant impact on disaster relief efforts.

In summary, these results demonstrate the effectiveness of our model in extracting event arguments from rainstorm disaster-related Weibo texts. These findings not only affirm the model’s ability to handle complex real-world data but also highlight its potential application in improving situational awareness and decision-making in emergency situations.

Overall, the ETEN_BERT_QA model improves event argument extraction for both general event types (DuEE) and the disaster-specific scenario (RainEE), with the larger F1 gain on DuEE. These results lay a foundation for future research and applications.
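The F1 scores in Tables 3 and 4 can be cross-checked against the precision and recall values in the same tables. The sketch below recomputes F1 as the harmonic mean and reports the difference between models; the dictionary layout and the names `reported` and `f1_from` are illustrative choices, while the numbers are copied from the two tables.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) taken from Tables 3 and 4.
reported = {
    ("DuEE", "BERT_QA"): (73.9, 72.7, 73.3),
    ("DuEE", "ETEN_BERT_QA"): (75.7, 76.9, 76.3),
    ("RainEE", "BERT_QA"): (69.0, 82.7, 75.2),
    ("RainEE", "ETEN_BERT_QA"): (72.4, 79.3, 75.7),
}

for (dataset, model), (p, r, f1) in reported.items():
    print(f"{dataset:7s} {model:13s} recomputed F1 = {f1_from(p, r):.2f} (reported {f1})")

# F1 gain of ETEN_BERT_QA over BERT_QA, in percentage points.
gain = {
    d: round(reported[(d, "ETEN_BERT_QA")][2] - reported[(d, "BERT_QA")][2], 1)
    for d in ("DuEE", "RainEE")
}
print(gain)
```

Each recomputed value agrees with the reported F1 to within rounding, and the gain is 3.0 points on DuEE and 0.5 points on RainEE.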

3.2.2. Quantitative Results and Specific Examples of Event Argument Extraction

In analyzing the DuEE dataset, we applied a GlobalPointer-based event detection method to the 35,000 test texts in the dataset. This step identified 30,475 texts containing relevant event types, from which we then extracted event argument roles.

As shown in Table 5, the baseline model BERT_QA successfully extracts event arguments from 25,369 texts. In contrast, our enhanced model ETEN_BERT_QA, which incorporates the trigger word enhancement method, successfully extracts event arguments from 28,261 texts with significantly improved performance. This result indicates that the model improvement effectively enhances its ability to recognize relevant event roles.

ETEN was particularly effective in providing additional cues to the model to help identify key event roles. By utilizing the contextual information associated with trigger words, ETEN_BERT_QA improves the accuracy and coverage of event argument extraction. This is crucial for comprehensive event analysis.

These results highlight the strengths of our proposed event argument extraction model and show the potential of targeted enhancements to improve the performance of complex natural language processing tasks, especially in areas such as disaster management, where timely and accurate information is critical.

Table 6 shows a specific example from the event argument extraction task, illustrating the advantages of the ETEN_BERT_QA model in real scenarios. Compared to the baseline model BERT_QA, ETEN_BERT_QA performs better in terms of both accuracy and completeness. This is mainly due to its enhanced textual representation capability, which enables it to identify event roles more effectively.

Specifically, the ETEN_BERT_QA model not only performs well in basic event argument extraction (e.g., location) but also handles more complex event details (e.g., time and number of marchers). This suggests that the model can better understand the nuances in text, especially in variable contexts, which are crucial for event extraction.

Overall, the ETEN_BERT_QA model significantly improves the quality of event argument extraction through a deeper understanding of context. These results not only validate the robust performance of the model in multi-scenario tasks but also demonstrate its potential application in complex event analysis.

To further assess the robustness of the proposed ETEN_BERT_QA, we conducted a comprehensive case study of the 20 July 2021 Henan rainstorm. This rainstorm was one of the most destructive rainstorms in modern Chinese history, causing extensive flooding, casualties, and significant property damage. Consequently, it offers an optimal setting for meticulous evaluation of the efficacy of our event extraction model.

As illustrated in Table 7, the baseline BERT_QA model extracts event arguments from 4713 of the 6000 texts, while the enhanced ETEN_BERT_QA model extracts event arguments from 5461 texts, an improvement of approximately 16%. This result demonstrates the efficacy of our model in identifying and extracting relevant event arguments. The enhanced representation of trigger words plays a pivotal role in this gain, enabling the model to more accurately identify and contextualize event roles.
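The extraction counts in Tables 5 and 7 can be turned into coverage rates and relative gains with a few lines of arithmetic. In the sketch below, the counts come from the two tables; the function name `coverage_and_gain` and the dictionary layout are illustrative.

```python
def coverage_and_gain(total, baseline, enhanced):
    """Return (baseline coverage, enhanced coverage, relative gain over baseline)."""
    return baseline / total, enhanced / total, (enhanced - baseline) / baseline


# Counts copied from Table 5 (DuEE) and Table 7 (Henan rainstorm case).
counts = {
    "DuEE": {"total": 30475, "BERT_QA": 25369, "ETEN_BERT_QA": 28261},
    "Henan": {"total": 6000, "BERT_QA": 4713, "ETEN_BERT_QA": 5461},
}

results = {
    name: coverage_and_gain(c["total"], c["BERT_QA"], c["ETEN_BERT_QA"])
    for name, c in counts.items()
}
for name, (base_cov, enh_cov, gain) in results.items():
    print(f"{name}: {base_cov:.1%} -> {enh_cov:.1%} coverage, relative gain {gain:.1%}")
```

On these counts, coverage rises from about 83.2% to 92.7% on DuEE and from about 78.6% to 91.0% in the Henan case, and the relative gain in the Henan case is about 15.9%, consistent with the figure of approximately 16%.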

Furthermore, the case study illustrates the model’s capacity to discern intricate details, such as the precise location of the inundated area, the number of individuals involved in the rescue operation, and the temporal occurrence of the event. This level of detail is of critical importance to emergency management, as it allows emergency responders to make informed decisions based on real-time information.

In conclusion, the results of this case study confirm the effectiveness of our model in intricate real-world environments and emphasize the potential of ETEN to enhance the task of event extraction. The model’s ability to handle complex details and massive amounts of data proves its utility in practical applications in the field of emergency management and disaster response. This will improve the efficiency of disaster relief operations.

Table 8 provides a detailed comparison of the event argument extraction results for rainstorm disaster texts. The results clearly show that our enhanced model significantly outperforms the baseline model, especially in the extraction of argument roles, such as the number of people trapped, which is an important detail often overlooked by traditional models.
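The comparison in Table 8 can be expressed as a difference between two role-to-span mappings. The sketch below is only an illustration of that comparison: the dictionary representation and the function `compare_extractions` are assumptions, not the authors’ output format, while the roles and spans are taken from the Table 8 example.

```python
# Arguments extracted for the event type "Personal Safety" (Table 8 example).
baseline = {
    "Location": "the entrance of Zhengzhou Confucian Temple Station",
}
enhanced = {
    "number of trapped persons": "More than one hundred people",
    "Location": "the entrance of Zhengzhou Confucian Temple Station",
}


def compare_extractions(baseline, enhanced):
    """List roles found only by the enhanced model and roles on which both agree."""
    only_enhanced = sorted(set(enhanced) - set(baseline))
    agreeing = sorted(
        role for role in set(baseline) & set(enhanced)
        if baseline[role] == enhanced[role]
    )
    return {"only_enhanced": only_enhanced, "agreeing": agreeing}


diff = compare_extractions(baseline, enhanced)
print(diff)
```

For this example, both models agree on the location, and only the enhanced model returns the number of trapped persons.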

This improvement can be attributed to the enhanced text representation, which gives the model a deeper understanding of context. By capturing the nuances of language in disaster scenarios, our model recognizes complex event details more accurately. These details are critical in real-time disaster management.

In disaster-related situations, accurate information can be decisive for effective response and assistance. Therefore, the ability to extract these subtle and complex event details is critical. The model’s success in identifying these elements not only underlines its robustness but also supports its practical application in disaster response and resource allocation. This capability helps emergency responders and decision-makers access the most relevant information, thereby increasing the efficiency of disaster response efforts. By providing a structured view of event arguments, our model can serve as a useful tool for responding to and mitigating the effects of natural disasters.

4. Discussion

In this paper, we propose a novel event extraction method, ETEN_BERT_QA, focusing particularly on extracting event arguments from Weibo texts about rainstorm disasters. Our evaluation using the self-built RainEE dataset and the existing DuEE dataset shows that ETEN_BERT_QA handles noisy and unstructured social media data well. The ETEN component improves event extraction in complex contexts by introducing event trigger words and optimizing the textual representations, which improves the accuracy of identifying event arguments.

Compared to existing BERT-based event extraction models, ETEN_BERT_QA performs well in the event argument extraction task and is more robust on noisy and complex Weibo data. Although the BERT-based machine reading comprehension model proposed by Du et al. [34] lays the foundation for event extraction, it lacks explicit modeling of event trigger words. This study introduces the trigger word mechanism, which improves the accuracy of event extraction, especially for complex text. Most current event extraction studies focus on generic events, and fine-grained extraction for natural disasters, especially rainstorm disasters, remains underdeveloped. To address this limitation, we construct a specialized RainEE dataset that enables the model to better cope with social media texts related to heavy rainfall. In addition, this study proposes practical solutions for dealing with social media data noise, thus opening up new avenues for event extraction tasks in complex environments.

The ETEN_BERT_QA model is founded upon the theoretical framework of event trigger words and augmented text representation. By explicitly modeling event trigger words, our study demonstrates the pivotal role of trigger words in event extraction. This result is in accordance with the theory of event semantic analysis, which posits that the accuracy of event argument extraction can be enhanced by the identification of pivotal information that precipitates the occurrence of events. The model demonstrates the successful application of this theory to event extraction for rainstorm disasters, thereby further enhancing the accurate identification of event arguments.

Notwithstanding the impressive outcomes yielded by the ETEN_BERT_QA model, certain constraints remain. Firstly, the domain specificity of the RainEE dataset constrains the model’s capacity to generalize to other categories of disaster events or broader social media datasets. To enhance the model’s capacity for generalization, the construction of more diverse datasets in the future is imperative. Secondly, the presence of noise and unstructured information in social media texts continues to present a challenge to the accuracy of event detection. The potential for misdetection and omission may impact the extraction of event arguments. Moreover, although existing noise reduction methods are effective in mitigating noise interference, the efficacy of these techniques in large-scale or real-time event extraction scenarios requires further verification.

While the ETEN_BERT_QA method is tailored for Weibo data and the specific context of rainstorm disasters, its core event extraction approach can be adapted to other social media platforms. The event extraction techniques, including the handling of noisy and unstructured text and the use of event trigger words, are not inherently platform-specific and could be applied to platforms such as Twitter, Facebook, or Instagram. However, the RainEE dataset, being based on Chinese Weibo data, is not directly applicable to other platforms or languages. The preprocessing steps, such as text cleaning and temporal and spatial information tagging, are transferable and could be used to construct datasets for other platforms. Additionally, the event templates developed in this study, which cover argument roles such as locations and times together with event triggers, are adaptable to broader disaster datasets. Thus, while the RainEE dataset itself may limit the model’s immediate applicability to other platforms, the methodologies developed in this study provide a valuable framework for event extraction tasks on a wider scale.

The ETEN_BERT_QA model demonstrates considerable potential for practical applications, particularly in disaster emergency management and social media monitoring. By effectively extracting data on rainstorm disasters from social media, government agencies and emergency response organizations can gain insights into disaster dynamics and implement appropriate measures more quickly.

Further research could proceed in several directions. Firstly, the construction of diverse datasets encompassing a greater range of natural hazards is essential to enhance the model’s generalizability. Secondly, future research should investigate more sophisticated automatic event detection techniques suited to the nuances of social media texts; multi-task learning strategies may help to jointly optimize event detection and event argument extraction. Thirdly, the incorporation of multimodal data (e.g., images, videos, and sensor data) would enrich the contextual information available for event extraction and could thereby improve the accuracy of the model. Fourthly, real-time event extraction is a crucial avenue of research for disaster response and emergency management, where the objective is to improve computational efficiency and reduce latency to support decision-making in high-pressure situations. In addition, investigating pre-trained models on large and heterogeneous datasets, together with the potential of transfer learning for deployment in other domains, could improve the adaptability and precision of the models. Finally, reinforcement learning in dynamic environments, particularly for optimizing event extraction decisions, may enable further advances in model development.

5. Conclusions

In recent years, social media has played an increasingly critical role in disaster studies, particularly during extreme weather events such as heavy rainstorms. Platforms like Weibo provide vast amounts of real-time data that can inform disaster response and mitigation efforts. However, the unstructured nature of the data presents significant challenges, making it difficult to extract actionable insights effectively. To address these challenges, this paper proposed an innovative event extraction approach utilizing a machine QA model enhanced by event trigger words. This method aims to systematically extract multi-dimensional information related to disasters, including time, location, and specific event details, from social media texts.

Our method demonstrated improvements over the BERT-based baseline in event extraction tasks. Specifically, the ETEN_BERT_QA model was evaluated using both the custom RainEE dataset and the widely recognized DuEE dataset. On the RainEE dataset, which focuses on rainstorm disaster events, our model achieved an F1 score of 75.7%, effectively handling the complexity and noise typical of social media data. When tested on the DuEE dataset, which covers a broader range of event types, our approach improved the F1 score by 3.0 percentage points over the baseline BERT_QA model (76.3% versus 73.3%), underscoring the effectiveness of incorporating event trigger words into the extraction process. The enhanced text representations allowed our model to more accurately capture relevant event triggers and arguments, thereby improving the completeness and accuracy of the extracted information.

A key strength of our approach lies in its ability to address the nuanced challenges posed by disaster-related social media texts, which often contain incomplete, noisy, and context-dependent information. By incorporating trigger word representations, the ETEN_BERT_QA model enhances the identification of relevant event arguments, resulting in more robust and reliable extraction outcomes. The case study of the 20 July 2021 Henan rainstorm further validated the practical utility of our method, demonstrating its effectiveness in real-world disaster scenarios. During this devastating event, which led to extensive flooding, casualties, and property damage, our model extracted event arguments from 5461 of the 6000 Weibo texts examined, highlighting its potential to support emergency management efforts with timely and accurate data.

Author Contributions

Methodology, Y.H.; validation, Y.H. and B.Y.; resources, Y.H. and H.H.; writing—original draft, Y.H.; writing—review and editing, J.L. and X.F. (Xianyun Fei); supervision, J.L. and X.F. (Xiangtao Fan). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (No. 2022YFC3800704).

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lei, H. Climate Change and Extreme Weather Events in China in 2007. In The China Environment Yearbook; Brill: Leiden, The Netherlands, 2009.
2. Khan, S.M.; Shafi, I.; Butt, W.H.; Diez, I.D.L.T.; Flores, M.A.L.; Galán, J.C.; Ashraf, I. A Systematic Review of Disaster Management Systems: Approaches, Challenges, and Future Directions. Land 2023, 12, 1514.
3. Mavrodieva, A.V.; Shaw, R. Social Media in Disaster Management. In Media and Disaster Risk Reduction: Advances, Challenges and Potentials; Shaw, R., Kakuchi, S., Yamaji, M., Eds.; Springer: Singapore, 2021; pp. 55–73. ISBN 9789811602856.
4. Wu, K.; Wu, J.; Ding, W.; Tang, R. Extracting Disaster Information Based on Sina Weibo in China: A Case Study of the 2019 Typhoon Lekima. Int. J. Disaster Risk Reduct. 2021, 60, 102304.
5. Dong, Z.S.; Meng, L.; Christenson, L.; Fulton, L. Social Media Information Sharing for Natural Disaster Response. Nat. Hazards 2021, 107, 2077–2104.
6. Liu, J.; Min, L.; Huang, X. An Overview of Event Extraction and Its Applications. arXiv 2021, arXiv:2111.03212.
7. Panem, S.; Gupta, M.; Varma, V. Structured Information Extraction from Natural Disaster Events on Twitter. In Proceedings of the 5th International Workshop on Web-Scale Knowledge Representation Retrieval & Reasoning, Shanghai, China, 3 November 2014; ACM: New York, NY, USA, 2014; pp. 1–8.
8. Edouard, A.; Cabrio, E.; Tonelli, S.; Le Thanh, N. Graph-Based Event Extraction from Twitter. In Proceedings of the RANLP17-Recent Advances in Natural Language Processing, Varna, Bulgaria, 2–8 September 2017.
9. Liu, K.; Chen, Y.; Liu, J.; Zuo, X.; Zhao, J. Extracting Events and Their Relations from Texts: A Survey on Recent Research Progress and Challenges. AI Open 2020, 1, 22–39.
10. Sardianos, C.; Katakis, I.M.; Petasis, G.; Karkaletsis, V. Argument Extraction from News. In Proceedings of the 2nd Workshop on Argumentation Mining, Denver, CO, USA, 4 June 2015; Cardie, C., Ed.; Association for Computational Linguistics: Denver, CO, USA, 2015; pp. 56–66.
11. Chen, Y.; Xu, L.; Liu, K.; Zeng, D.; Zhao, J. Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; Zong, C., Strube, M., Eds.; Association for Computational Linguistics: Beijing, China, 2015; pp. 167–176.
12. Nguyen, T.H.; Cho, K.; Grishman, R. Joint Event Extraction via Recurrent Neural Networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; Knight, K., Nenkova, A., Rambow, O., Eds.; Association for Computational Linguistics: San Diego, CA, USA, 2016; pp. 300–309.
13. Li, Q.; Ji, H.; Huang, L. Joint Event Extraction via Structured Prediction with Global Features. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Sofia, Bulgaria, 4–9 August 2013; Schuetze, H., Fung, P., Poesio, M., Eds.; Association for Computational Linguistics: Sofia, Bulgaria, 2013; pp. 73–82.
14. Ahn, D. The Stages of Event Extraction. In Proceedings of the Workshop on Annotating and Reasoning about Time and Events, Sydney, Australia, 23 July 2006; Boguraev, B., Muñoz, R., Pustejovsky, J., Eds.; Association for Computational Linguistics: Sydney, Australia, 2006; pp. 1–8.
15. Chen, Y.; Liu, S.; Zhang, X.; Liu, K.; Zhao, J. Automatically Labeled Data Generation for Large Scale Event Extraction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; Barzilay, R., Kan, M.-Y., Eds.; Association for Computational Linguistics: Vancouver, BC, Canada, 2017; pp. 409–419.
16. Liu, J.; Chen, Y.; Liu, K.; Zhao, J. Event Detection via Gated Multilingual Attention Mechanism. Proc. AAAI Conf. Artif. Intell. 2018, 32.
17. Huang, L.; Ji, H.; Cho, K.; Dagan, I.; Riedel, S.; Voss, C. Zero-Shot Transfer Learning for Event Extraction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; Gurevych, I., Miyao, Y., Eds.; Association for Computational Linguistics: Melbourne, Australia, 2018; pp. 2160–2170.
18. Chen, D.; Bolton, J.; Manning, C.D. A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; Erk, K., Smith, N.A., Eds.; Association for Computational Linguistics: Berlin, Germany, 2016; pp. 2358–2367.
19. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186.
20. Feng, S.Y.; Gangal, V.; Wei, J.; Chandar, S.; Vosoughi, S.; Mitamura, T.; Hovy, E. A Survey of Data Augmentation Approaches for NLP. arXiv 2021, arXiv:2105.03075.
21. Yuan, X.; Wang, T.; Gulcehre, C.; Sordoni, A.; Bachman, P.; Zhang, S.; Subramanian, S.; Trischler, A. Machine Comprehension by Text-to-Text Neural Question Generation. In Proceedings of the 2nd Workshop on Representation Learning for NLP, Vancouver, BC, Canada, 3 August 2017; Blunsom, P., Bordes, A., Cho, K., Cohen, S., Dyer, C., Grefenstette, E., Hermann, K.M., Rimell, L., Weston, J., Yih, S., Eds.; Association for Computational Linguistics: Vancouver, BC, Canada, 2017; pp. 15–25.
22. Duan, N.; Tang, D.; Chen, P.; Zhou, M. Question Generation for Question Answering. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; Palmer, M., Hwa, R., Riedel, S., Eds.; Association for Computational Linguistics: Copenhagen, Denmark, 2017; pp. 866–874.
23. Elsahar, H.; Gravier, C.; Laforest, F. Zero-Shot Question Generation from Knowledge Graphs for Unseen Predicates and Entity Types. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA, 1–6 June 2018; Walker, M., Ji, H., Stent, A., Eds.; Association for Computational Linguistics: New Orleans, LA, USA, 2018; pp. 218–228.
24. Gao, S.; Sethi, A.; Agarwal, S.; Chung, T.; Hakkani-Tur, D. Dialog State Tracking: A Neural Reading Comprehension Approach. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, Stockholm, Sweden, 11–13 September 2019; Nakamura, S., Gasic, M., Zukerman, I., Skantze, G., Nakano, M., Papangelis, A., Ultes, S., Yoshino, K., Eds.; Association for Computational Linguistics: Stockholm, Sweden, 2019; pp. 264–273.
25. Li, X.; Yin, F.; Sun, Z.; Li, X.; Yuan, A.; Chai, D.; Zhou, M.; Li, J. Entity-Relation Extraction as Multi-Turn Question Answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Korhonen, A., Traum, D., Màrquez, L., Eds.; Association for Computational Linguistics: Florence, Italy, 2019; pp. 1340–1350.
26. Levy, O.; Seo, M.; Choi, E.; Zettlemoyer, L. Zero-Shot Relation Extraction via Reading Comprehension. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, BC, Canada, 3–4 August 2017; Levy, R., Specia, L., Eds.; Association for Computational Linguistics: Vancouver, BC, Canada, 2017; pp. 333–342.
27. Li, X.; Li, F.; Pan, L.; Chen, Y.; Peng, W.; Wang, Q.; Lyu, Y.; Zhu, Y. DuEE: A Large-Scale Dataset for Chinese Event Extraction in Real-World Scenarios. In Proceedings of the Natural Language Processing and Chinese Computing, Hangzhou, China, 1–3 November 2024; Zhu, X., Zhang, M., Hong, Y., He, R., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 534–545.
28. Water Industry Standard of the People’s Republic of China (SL 579-2012): Flood Disaster Evaluation Criteria (Chinese Edition) by Zhong Hua Ren Min Gong He Guo Shui Li Bu: New Paperback|Liu Xing. Available online: https://www.abebooks.com/Water-industry-standard-Peoples-Republic-China/14934923043/bd (accessed on 11 October 2024).
29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762.
30. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
31. Rajpurkar, P.; Jia, R.; Liang, P. Know What You Don’t Know: Unanswerable Questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, 15–20 July 2018; Gurevych, I., Miyao, Y., Eds.; Association for Computational Linguistics: Melbourne, Australia, 2018; pp. 784–789.
32. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv 2020, arXiv:1906.08237.
33. Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Chen, X.; Zhang, H.; Tian, X.; Zhu, D.; Tian, H.; Wu, H. ERNIE: Enhanced Representation through Knowledge Integration. arXiv 2019, arXiv:1904.09223.
34. Du, X.; Cardie, C. Event Extraction by Answering (Almost) Natural Questions. arXiv 2021, arXiv:2004.13625. Available online: https://arxiv.org/pdf/2004.13625 (accessed on 17 October 2024).

Figure 1. Number of comments changing over time.

Figure 2. Event extraction example.

Figure 3. Overall flowchart of the RainEE dataset production methodology.

Figure 4. The structure of BERT_QA, as well as the baseline of our work.

Figure 5. Event argument extraction structure diagram based on trigger word perception coding event text enhancement.

Table 1. Event schema.

Event Types	Argument Roles
Waterlogging	Time, location, depth of water accumulation
Vegetation damage	Time, location, vegetation name
Transportation	Time, location, transportation name
Rescue operations	Time, location, rescuer, rescued, supplies/equipment
Personal safety	Time, location, victims, number of missing persons, number of dead, number of injured, number of trapped persons, number of disaster victims
Landslides	Time, location, landslide subject
Building collapse/damage	Time, location, subject of collapse/damage

Table 2. Parameters of the datasets used in this experiment.

Dataset	Training Set	Test Set	Validation Set
DuEE	11,958	35,000	1498
RainEE	1021	/	411

Table 3. Accuracy of event argument extraction on the DuEE dataset. The best results are highlighted in bold.

Network	Recall (%)	Precision (%)	F1 (%)
BERT_QA	72.7	73.9	73.3
ETEN_BERT_QA	76.9	75.7	76.3

Table 4. Accuracy of event argument extraction on the RainEE dataset. The best results are highlighted in bold.

Network	Recall (%)	Precision (%)	F1 (%)
BERT_QA	82.7	69.0	75.2
ETEN_BERT_QA	79.3	72.4	75.7

Table 5. Quantitative analysis of event argument extractions on the DuEE dataset. The best results are marked in bold.

Network	Total Number of Items	Number of Extractions
BERT_QA	30,475	25,369
ETEN_BERT_QA	30,475	28,261

Table 6. Detailed event argument extraction results on the DuEE dataset.

Network	Text	Event Argument Extraction
BERT_QA	There were multiple demonstrations in Paris on the 21st, local time, with preliminary statistics showing more than 10,000 people participating.	Organizational Behavior_Parade_Location: Paris
ETEN_BERT_QA	(same text as above)	Organizational Behavior_Parade_Time: the 21st local time; Organizational Behavior_Parade_Location: Paris; Organizational Behavior_Parade_Number of marchers: more than 10,000 people

Table 7. Quantitative analysis of event argument extractions for the Henan rainstorm case. The best results are marked in bold.

Network	Total Number of Items	Number of Extractions
BERT_QA	6000	4713
ETEN_BERT_QA	6000	5461
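
The counts in Tables 5 and 7 are easier to compare as proportions. The sketch below computes, for each network, the share of items for which arguments were extracted, along with the absolute difference between the two networks. It assumes that "Number of Extractions" counts items out of "Total Number of Items"; the function name extraction_rate is illustrative.

```python
def extraction_rate(extracted, total):
    """Share of items with extracted arguments, in percent."""
    return 100.0 * extracted / total


# case -> {network: (total items, number of extractions)}, from Tables 5 and 7.
COUNTS = {
    "DuEE": {
        "BERT_QA": (30475, 25369),
        "ETEN_BERT_QA": (30475, 28261),
    },
    "Henan rainstorm": {
        "BERT_QA": (6000, 4713),
        "ETEN_BERT_QA": (6000, 5461),
    },
}

for case, networks in COUNTS.items():
    for network, (total, extracted) in networks.items():
        rate = extraction_rate(extracted, total)
        print(f"{case:16s} {network:13s} {extracted}/{total} = {rate:.2f}%")
    # Additional extractions obtained with ETEN_BERT_QA over the baseline.
    gain = networks["ETEN_BERT_QA"][1] - networks["BERT_QA"][1]
    print(f"{case:16s} additional extractions: {gain}")
```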

Table 8. Detailed event argument extraction results for the Henan rainstorm case.

Network	Text	Event Argument Extraction
BERT_QA	More than one hundred people were trapped at the entrance of the Zhengzhou Confucian Temple Station.	Personal Safety_Location: the entrance of Zhengzhou Confucian Temple Station
ETEN_BERT_QA	(same text as above)	Personal Safety_Number of trapped persons: More than one hundred people; Personal Safety_Location: the entrance of Zhengzhou Confucian Temple Station


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

