1. Introduction
Storm surge is a catastrophic phenomenon characterized by an abnormal rise in coastal water levels driven by low atmospheric pressure and strong winds [1]. While primarily induced by tropical and extratropical cyclones [2,3], the severity of a surge is further modulated by coastal topography and geomorphology [4,5]. Historically, the most devastating impacts of tropical cyclones have stemmed from the coastal flooding caused by these surges [6,7]; indeed, approximately half of all fatalities from Atlantic tropical cyclones are directly attributed to storm surge [8,9]. The historical record demonstrates the profound vulnerability of coastal populations worldwide to such extreme events. Notable examples include the 1970 Bhola Cyclone, which claimed about 300,000 lives in East Pakistan (present-day Bangladesh) and India [10], Hurricane Katrina (2005) [11] and Hurricane Sandy (2012) [12] in the United States, and Cyclone Nargis (2008) in Myanmar [13]. Surges triggered by extratropical cyclones have also had significant impacts, as evidenced by the 1993 “Storm of the Century” in the eastern United States [14], Cyclone Xynthia in France (2010) [15], and the 2013 North Sea flood in Northern Europe [16].
China is particularly susceptible to these disasters, with an extensive mainland coastline of 18,000 km and an island coastline of 16,000 km [17]. During summer and autumn, frequent typhoons trigger storm surges that cause severe coastal flooding and substantial economic losses [18]. In 2022 alone, China’s coastal areas experienced 13 storm surge events, incurring direct economic losses of 2.38 billion yuan [19]. Given the severity of this threat, accurately extracting key information from multi-source disaster texts is critical: it is a core step in constructing knowledge bases for decision-making support and provides a scientific reference for modeling disaster information.
In disaster emergency response, entity and relation extraction are fundamental technologies, and applying them together can significantly improve information processing efficiency. Current research on entity and relation extraction predominantly follows two paradigms: pipeline methods and joint extraction methods. Among pipeline approaches, Liu et al. [20] proposed an end-to-end framework that integrates contextual semantic representations unavailable in other models; by introducing explicit entity mentions that capture positional and type information, their framework significantly improves performance. Similarly, Chen et al. [21] introduced “Patti,” a pattern-first pipeline designed to mitigate the entity redundancy and error propagation often found in entity-first methods. Liu et al. [22] presented a hybrid machine learning workflow that integrates named entity recognition (NER) to detect location mentions, demonstrating the potential of forensic analysis based on differences between mapped events and social media attention. Furthermore, Lai et al. used natural language processing (NLP) to develop a hybrid NER model combining domain-specific machine learning, linguistic features, and rule-based matching [23]; this was the first model capable of extracting detailed flood information and risk reduction projects across the contiguous United States. Joint extraction methods, by contrast, attempt to unify these tasks. Cheng et al. proposed the Global Table Attention GRU (GL-TGRU) model [24], which encodes both sequential and tabular information to jointly learn entity and relation representations, strengthening their global associations to better identify entity-relation triples. Sui et al. framed the task as a direct set prediction problem, proposing a transformer-based network with non-autoregressive parallel decoding that outputs all relational triples in a single step [25]. Focusing on boundary precision, Tang et al. introduced a boundary regression mechanism that learns offsets relative to true named entities, proving highly effective in refining entity spans [26]. Additionally, Wan et al. proposed the Span-based Multimodal Attention Network (SMAN), which consistently outperformed state-of-the-art models on the SciERC and ADE datasets and achieved an F1-score more than 1.42% higher than competing methods on the CoNLL04 dataset [27].
In summary, existing research has made notable progress in the accuracy and efficiency of entity recognition and relation extraction but exhibits significant limitations when applied to the complex domain of storm surge disasters. Pipeline models execute their subtasks sequentially and therefore suffer from error propagation: a mistake made during entity recognition cannot be corrected during relation classification. Joint extraction models attempt to address this via parameter sharing, but traditional decomposition paradigms often overlook the strong interdependencies among triple components, leading to reasoning biases in complex or data-sparse scenarios. To resolve the cumulative errors of pipeline models and the inherent limitations of joint models, and to handle the multi-temporal expressions and complex cross-entity associations typical of storm surge texts, this paper proposes a disaster information extraction method based on Global Pointer Networks (GPN). GPN is a deep learning model for NER that uses global normalization to identify nested and non-nested entities simultaneously. By treating an entity’s head and tail as an integrated whole, it provides a more comprehensive view of each candidate entity than traditional pointer networks, which predict the two boundaries separately. GPN performs comparably to Conditional Random Fields (CRF) in non-nested scenarios but shows significant advantages in nested entity recognition. This study therefore integrates GPN into a joint extraction architecture, giving entity-relation extraction a stronger global view and thereby mitigating the adverse impact of cumulative errors.
2. Storm Surge Knowledge Extraction Model
To address the complexities of disaster text mining, this study proposes the Storm Surge Knowledge Extraction Model (SSKEM). The framework is built on a heterogeneous corpus of 4000 records aggregated from official reports (e.g., the Annual Guangdong Marine Disaster Bulletin), news outlets (China News Service), and social media (Weibo); after manual filtering, 2372 storm surge-related texts are retained (Section 3.2). The primary objective is to extract structured knowledge entities, focusing on temporal, spatial, status, and impact attributes. The methodology has two stages: construction of a specialized instruction dataset and domain-adaptive fine-tuning. As shown in Figure 1, the annotated corpus is partitioned to support both structured fine-tuning and rigorous testing. The model’s outputs on the test set are analyzed in two complementary ways: a qualitative assessment of knowledge extraction performance and a quantitative, metric-based evaluation of the model’s accuracy.
2.1. Encoding Layer
Bidirectional Encoder Representations from Transformers (BERT) is a language representation model that learns vector representations of text through pre-training on large-scale unannotated corpora [28]. It consists primarily of an embedding module, a Transformer module, and a pre-training–fine-tuning module. When storm surge text is input, each character is encoded into a vector representation W, computed as shown in Equation (1):

$$W = E_{\text{token}} + E_{\text{position}} + E_{\text{segment}} \tag{1}$$

where $E_{\text{token}}$ represents the token embedding, $E_{\text{position}}$ the position embedding, and $E_{\text{segment}}$ the segment (token type) embedding. Although the standard BERT model uses $E_{\text{segment}}$ to distinguish the two sentences of a sentence pair, in our single-sentence extraction task all tokens share one segment ID, so this term mainly serves to maintain structural consistency with the pre-trained BERT architecture.
The Transformer, introduced by Google in 2017, is an architecture that, unlike traditional Recurrent Neural Networks (RNNs), is based entirely on the attention mechanism [29]; BERT is built from its encoder stack. Self-Attention is the most important component of the Transformer, and its computation is shown in Equation (2). The core idea of Self-Attention is to enhance the representation of a target word by attending to the other words in the text, thereby making better use of contextual information [30,31]. Through Self-Attention, each word obtains a new vector representation that combines its own features with contextual information from the other words in the sentence, capturing its relationships with the whole context rather than with the word alone [32]. This mechanism substantially improves the model’s ability to capture complex syntactic and semantic relationships within the text.

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{2}$$

where Q, K, and V are the matrices of query, key, and value vectors for the words in the sequence, and $d_k$ is the dimensionality of the key vectors. These vectors are derived from the original input vectors through learned linear transformations; in BERT’s multi-head attention, each head projects the inputs to a lower dimension $d_k = d_{\text{model}}/h$, where h is the number of heads.
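To make the softmax-weighted averaging in Equation (2) concrete, here is a minimal single-head sketch in plain Python. It assumes Q, K, and V have already been produced by the linear projections and omits batching, masking, and multiple heads, so it illustrates the formula rather than BERT’s actual implementation.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q and K are lists of n vectors of length d_k; V is a list of n vectors of length d_v.
    Returns n output vectors of length d_v.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    outputs = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Each output is a convex combination of the value vectors.
        outputs.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(d_v)])
    return outputs


# Two tokens with identical keys attend uniformly, so each output is the mean of V.
out = scaled_dot_product_attention(Q=[[1.0, 0.0], [0.0, 1.0]],
                                   K=[[1.0, 0.0], [1.0, 0.0]],
                                   V=[[1.0, 2.0], [3.0, 4.0]])
print(out)  # -> [[2.0, 3.0], [2.0, 3.0]]
```

Because the attention weights always sum to 1, each output stays inside the range spanned by the value vectors; the toy case with identical keys makes that averaging visible.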
As shown in Equation (3), we introduce the dynamic multi-source adaptation architecture [33] and design a parameter-sharing gating mechanism in which the BERT encoding layer and the pointer network share 80% of their parameters:

$$h_{\text{fusion}} = \lambda\, h_{\text{BERT}} + (1 - \lambda)\, h_{\text{GP}} \tag{3}$$

where $h_{\text{fusion}}$ denotes the final fused feature representation, $h_{\text{BERT}}$ the context-aware semantic vector output by the BERT encoder, and $h_{\text{GP}}$ the feature representation derived from the GPN. The parameter λ is a learnable gating coefficient (or a hyperparameter) that dynamically balances the contribution of semantic information from the pre-trained model against the structural features from the pointer network.
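A sketch of the gated fusion, under two stated assumptions: the fusion takes the convex-combination form λ·h_BERT + (1 − λ)·h_GP, and λ is kept in (0, 1) by passing a learnable scalar (the hypothetical `gate_logit`) through a sigmoid. The 80% parameter sharing between the encoder and the pointer network is not modeled here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_bert, h_gp, gate_logit):
    """Blend two feature vectors with a gate lambda = sigmoid(gate_logit).

    gate_logit is a hypothetical learnable scalar; squashing it with a sigmoid
    keeps lambda strictly between 0 and 1, so the result is a convex combination.
    """
    lam = sigmoid(gate_logit)
    return [lam * a + (1.0 - lam) * b for a, b in zip(h_bert, h_gp)]


print(gated_fusion([1.0, 2.0], [3.0, 4.0], gate_logit=0.0))  # lambda = 0.5 -> [2.0, 3.0]
```

If λ is treated as a fixed hyperparameter instead, `gate_logit` simply becomes a constant; a large positive value lets the BERT features dominate, and a large negative value favors the pointer-network features.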
2.2. Head Entity Extraction Layer Based on GPN
The head entity extraction process begins by encoding the original text with the BERT encoder, which converts the input into a vector representation denoted as $O_L$. This output is then decoded by the GPN. Traditional pointer networks typically use two separate modules to identify the start and end positions of entities independently; CasRel, for instance, applies two sigmoid classifiers to tag the head and tail separately. Such an approach overlooks the holistic nature of Chinese words and loses contextual features between adjacent characters, so entity recognition performance is suboptimal, particularly for nested entities [34].
Specifically, assume the input text sequence has length n and that each target entity is a continuous span within this sequence, with no fixed length and possibly nested inside another entity. The sequence then contains n(n + 1)/2 candidate spans. For example, as shown in Figure 2, the entity “Yangjiang City” (阳江市, n = 3) yields 3 × 4/2 = 6 candidate spans: “阳” (Yang), “阳江” (Yangjiang), “阳江市” (Yangjiang City), “江” (Jiang), “江市” (Jiang City), and “市” (City). The task thus becomes a multi-label classification problem in which the goal is to select the k correct entities from this set of candidates.
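The n(n + 1)/2 count can be checked by enumerating spans directly. The sketch below (plain Python; the function name is illustrative) lists every (start, end) pair with start ≤ end, using inclusive ends. For the three-character 阳江市 it yields exactly the six strings listed above.

```python
def candidate_spans(text):
    """Enumerate every contiguous span (i, j, substring) with i <= j (inclusive ends)."""
    n = len(text)
    return [(i, j, text[i:j + 1]) for i in range(n) for j in range(i, n)]


spans = candidate_spans("阳江市")
print(len(spans))                 # 6 == 3 * (3 + 1) // 2
print([s for _, _, s in spans])   # ['阳', '阳江', '阳江市', '江', '江市', '市']
```

The quadratic growth of this candidate set is why the class imbalance discussed in Section 2.4 matters: for a 300-token sentence there are 45,150 candidate spans, of which only a handful are entities.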
In this study, we adopt a dot-product attention mechanism. Let the input text of length n be encoded by the BERT encoder into a sequence of vectors $[h_1, h_2, \ldots, h_n]$. These vectors are then processed through feature concatenation and a fully connected layer, as shown in Equations (4) and (5):

$$q_{i,\alpha} = W_{q,\alpha}\, h_i + b_{q,\alpha} \tag{4}$$
$$k_{j,\alpha} = W_{k,\alpha}\, h_j + b_{k,\alpha} \tag{5}$$

where $W_{q,\alpha}$ and $W_{k,\alpha}$ are trainable weight matrices and $b_{q,\alpha}$ and $b_{k,\alpha}$ are the corresponding bias vectors. The vectors $q_{i,\alpha}$ and $k_{j,\alpha}$ are used to identify entities of type α and correspond to the Query vector of the i-th token and the Key vector of the j-th token in the input sentence, respectively. Their dot product gives the attention score in Equation (6):

$$s_\alpha(i,j) = q_{i,\alpha}^{\top}\, k_{j,\alpha} \tag{6}$$

The score $s_\alpha(i,j)$ reflects the likelihood that the continuous span from position i to position j is an entity of type α. If the score is greater than zero, the span from i to j is predicted to be an entity of type α; if it is less than or equal to zero, the span is either not an entity or belongs to another entity type.
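A plain-Python sketch of Equations (4) to (6) and the zero-threshold decision rule for a single entity type α. Assumptions: the projections include bias terms, the vectors are tiny hand-picked toys rather than BERT outputs, and the RoPE step described next is omitted. In a full model one such set of projections would be kept per entity type, giving one n × n score matrix per type.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear(W, b, h):
    # Affine map W h + b; W is a list of rows (out_dim x in_dim).
    return [dot(row, h) + bias for row, bias in zip(W, b)]


def global_pointer_scores(H, Wq, bq, Wk, bk):
    """Score matrix s[i][j] = q_i . k_j for one entity type.

    H holds the n encoder vectors. Entries with j < i are not valid spans and
    are set to -inf so they can never be decoded.
    """
    q = [linear(Wq, bq, h) for h in H]
    k = [linear(Wk, bk, h) for h in H]
    n = len(H)
    return [[dot(q[i], k[j]) if j >= i else float("-inf") for j in range(n)]
            for i in range(n)]


def decode_spans(scores):
    # A span (i, j) is predicted as an entity of this type when its score is > 0.
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i, n) if scores[i][j] > 0]


# Toy example: 2-d vectors, identity projections, and a hypothetical key bias.
H = [[1.0, 0.0], [0.0, -1.0], [1.0, 0.5]]
I2, zero = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
S = global_pointer_scores(H, I2, zero, I2, [-0.5, 0.0])
print(decode_spans(S))  # [(0, 0), (0, 2), (1, 1), (2, 2)]
```

The toy output contains a nested prediction: span (0, 0) lies inside span (0, 2). Because every span is scored independently in the matrix, overlapping entities need no special handling, unlike a per-token tagging scheme.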
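Meanwhile, to enhance the GPN decoding layer, this paper introduces the Rotary Position Encoding (RoPE) mechanism [6,35]. This mechanism injects relative positional information through trigonometric functions, enabling the construction of a global entity matrix that captures position-aware interactions between tokens. The detailed computation is shown in Equation (7):

$$s_\alpha(i,j) = \left(\mathcal{R}_i\, q_{i,\alpha}\right)^{\top}\left(\mathcal{R}_j\, k_{j,\alpha}\right) = q_{i,\alpha}^{\top}\, \mathcal{R}_{j-i}\, k_{j,\alpha} \tag{7}$$

where $\mathcal{R}_i$ is the block-diagonal rotation matrix for position i, whose m-th 2 × 2 block rotates by the angle $i\theta_m$ (with $\theta_m = 10000^{-2m/d}$ in the standard formulation). Because $\mathcal{R}_i^{\top}\mathcal{R}_j = \mathcal{R}_{j-i}$, the score depends on the relative offset between positions i and j.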
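The relative-position property that motivates RoPE can be checked numerically. The sketch below assumes the standard RoPE formulation (consecutive coordinate pairs rotated by pos·θ_m, with θ_m = 10000^(−2m/d)) and uses toy query and key vectors; it shows that the rotated dot product depends only on the offset between the two positions, not on where they sit in the sentence.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotary position embedding for one vector x (even length d) at position pos.

    Each pair (x[2m], x[2m+1]) is rotated by the angle pos * base**(-2m/d).
    """
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for m in range(d // 2):
        theta = pos * base ** (-2.0 * m / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * m], x[2 * m + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
# The same offset j - i = 3 at different absolute positions gives the same score.
s_near = dot(rope(q, 2), rope(k, 5))
s_far = dot(rope(q, 40), rope(k, 43))
print(round(s_near, 6), round(s_far, 6))
```

Since rotations preserve vector length, RoPE changes only the angle between query and key, which is how it adds position awareness to the span scores without adding trainable parameters.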
2.3. Relation and Tail Entity Extraction Layer
To highlight the features of head entities during relation and tail entity extraction, and thereby improve accuracy, this paper proposes a fusion strategy. Specifically, the character-level vectors of a recognized head entity are averaged to obtain a mean representation, which is then added to each token vector produced by the BERT encoder for the original text. In this way, the model incorporates head entity information into the input representation. The computation is shown in Equations (8) and (9):

$$\bar{h}^{\,s}_t = \frac{1}{end - start + 1}\sum_{i=start}^{end} h_i \tag{8}$$
$$\tilde{h}_i = h_i + \bar{h}^{\,s}_t \tag{9}$$

where $\bar{h}^{\,s}_t$ denotes the averaged vector representation of the t-th head entity, $h_i$ represents the vector of the character at index i, and start and end indicate the starting and ending positions of the entity. The fused vector $\tilde{h}_i$ is obtained by adding the head entity vector to the original token vector.

We employ element-wise addition rather than concatenation to fuse the head entity representation $\bar{h}^{\,s}_t$ with the context representation $h_i$. Concatenation doubles the dimensionality and computational overhead, whereas element-wise addition efficiently injects conditional information about the subject while preserving the original semantic space of the BERT embeddings. This lightweight fusion strategy allows the model to focus on relation-specific features without significantly increasing parameter complexity.
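A sketch of the mean pooling and element-wise addition in Equations (8) and (9), using inclusive start/end indices as in the text and toy two-dimensional vectors in place of BERT outputs. The function names are illustrative.

```python
def mean_pool(H, start, end):
    """Average the character vectors H[start..end] (inclusive) of a head entity."""
    span = H[start:end + 1]
    return [sum(vec[c] for vec in span) / len(span) for c in range(len(H[0]))]


def fuse_head_entity(H, start, end):
    """Add the averaged head-entity vector to every token vector (element-wise)."""
    head = mean_pool(H, start, end)
    return [[x + y for x, y in zip(h_i, head)] for h_i in H]


H = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # toy 2-d encoder outputs for 3 characters
print(fuse_head_entity(H, start=0, end=1))  # head mean [2.0, 3.0] -> [[3.0, 5.0], [5.0, 7.0], [7.0, 9.0]]
```

The fused sequence keeps the original per-token dimensionality, which is the property the addition-versus-concatenation argument relies on: the downstream relation-specific projections can keep the same input size.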
This fused representation is then passed into the GPN layer to compute the tail entity score function under each relation, as defined in Equations (10)–(12), where the score function $s_\beta(i,j)$ represents the likelihood that the span from position i to j corresponds to a tail entity under relation β:

$$q_{i,\beta} = w_{q,\beta}\, \tilde{h}_i + b_{q,\beta} \tag{10}$$
$$k_{j,\beta} = w_{k,\beta}\, \tilde{h}_j + b_{k,\beta} \tag{11}$$
$$s_\beta(i,j) = q_{i,\beta}^{\top}\, k_{j,\beta} \tag{12}$$

where $w_{q,\beta}$ and $w_{k,\beta}$ are trainable weight matrices specific to relation type β, and $b_{q,\beta}$ and $b_{k,\beta}$ are the corresponding bias vectors. The vectors $q_{i,\beta}$ and $k_{j,\beta}$ are the Query and Key representations of the tokens at positions i and j, respectively, projected into the relation-specific semantic space. The score $s_\beta(i,j)$ captures the correlation between these positions; a positive score indicates that the span from i to j is predicted to be a valid tail entity under relation β.
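How the head-entity stage and the relation-specific tail stage combine into triples can be sketched as follows. This is a simplified illustration under stated assumptions: the tail score matrices for each (head entity, relation) pair are taken as already computed from that head’s fused sequence, the relation names and scores are hypothetical, and spans are (start, end) index pairs.

```python
def extract_triples(head_spans, tail_scores):
    """Assemble (head, relation, tail) triples.

    head_spans:  list of (start, end) spans predicted by the head-entity layer.
    tail_scores: dict mapping each head span to {relation: n x n score matrix},
                 where each matrix was computed on the sequence fused with that head.
    A tail span (i, j) is kept for a relation when its score is positive.
    """
    triples = []
    for head in head_spans:
        for relation, S in tail_scores[head].items():
            n = len(S)
            for i in range(n):
                for j in range(i, n):
                    if S[i][j] > 0:
                        triples.append((head, relation, (i, j)))
    return triples


NEG = float("-inf")
scores = {
    (0, 1): {
        "surge_level": [[-1.0, -2.0, -3.0], [NEG, -1.0, -0.5], [NEG, NEG, 2.0]],
        "located_in": [[-1.0, -2.0, -3.0], [NEG, -1.0, -0.5], [NEG, NEG, -0.1]],
    }
}
print(extract_triples([(0, 1)], scores))  # [((0, 1), 'surge_level', (2, 2))]
```

Because each relation has its own score matrix, one head entity can link to several tails, and the same tail can appear under different relations, which is how overlapping triples are represented.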
2.4. Training Strategy
Traditional training strategies often treat multi-label classification as multiple independent binary classification problems: a Sigmoid function computes the probability of each label, and predictions are made by applying a fixed threshold. However, this approach suffers from severe class imbalance, since only a handful of the n(n + 1)/2 candidate spans are entities, and it is highly sensitive to the choice of threshold; both problems can degrade the model’s performance.
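Inspired by the ZLPR (zero-bounded log-sum-exp and pairwise rank-based) multi-label loss proposed by Su et al. [6], this study extends the “Softmax + Cross-Entropy” framework to the multi-label setting. The resulting loss functions are defined in Equations (13)–(15):

$$\mathcal{L}_{\text{head}} = \sum_{\alpha}\left[\log\Bigl(1 + \sum_{(i,j)\in P_\alpha} e^{-s_\alpha(i,j)}\Bigr) + \log\Bigl(1 + \sum_{(i,j)\in Q_\alpha} e^{s_\alpha(i,j)}\Bigr)\right] \tag{13}$$
$$\mathcal{L}^{\beta}_{\text{tail}} = \log\Bigl(1 + \sum_{(i,j)\in P_\beta} e^{-s_\beta(i,j)}\Bigr) + \log\Bigl(1 + \sum_{(i,j)\in Q_\beta} e^{s_\beta(i,j)}\Bigr) \tag{14}$$
$$\mathcal{L} = \mathcal{L}_{\text{head}} + \sum_{\beta}\mathcal{L}^{\beta}_{\text{tail}} \tag{15}$$

where $\mathcal{L}_{\text{head}}$ denotes the loss for head entity extraction, $\mathcal{L}^{\beta}_{\text{tail}}$ the loss for tail entity extraction under a specific relation β, and $\mathcal{L}$ the overall loss for jointly extracting head entities, tail entities, and their relations. The set $P_\alpha$ contains the start and end positions of entities of type α, and $Q_\alpha$ contains the start and end positions of non-entities and of entities of types other than α. Similarly, $P_\beta$ denotes the start and end positions of tail entities under relation β, and $Q_\beta$ the start and end positions of non-tail entities under the same relation.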
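A sketch of the per-type (or per-relation) loss term, written with a numerically stable log-sum-exp. It assumes the zero-threshold form log(1 + Σ_neg e^s) + log(1 + Σ_pos e^(−s)), which pushes gold spans above 0 and all other spans below 0; the dictionary-of-spans interface is an illustrative assumption. Summing this term over entity types and relations gives the overall joint loss.

```python
import math


def logsumexp(xs):
    # Stable log(sum(exp(x))) by factoring out the maximum.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def multilabel_ce(scores, positives):
    """Zero-threshold multi-label cross-entropy for one entity type or relation.

    scores:    dict mapping every candidate span (i, j) to its score s(i, j).
    positives: set of gold spans for this type/relation.
    Appending 0.0 inside logsumexp implements the "1 +" term, i.e. the zero threshold.
    """
    pos_terms = [-s for span, s in scores.items() if span in positives]
    neg_terms = [s for span, s in scores.items() if span not in positives]
    return logsumexp([0.0] + neg_terms) + logsumexp([0.0] + pos_terms)


scores = {(0, 0): 2.0, (0, 1): -1.0, (1, 1): -3.0}
loss = multilabel_ce(scores, positives={(0, 0)})
print(round(loss, 4))
```

Unlike per-label sigmoids, the many negative spans share a single log-sum-exp term, so their sheer number does not swamp the few positives, and the decision threshold is fixed at 0 rather than tuned.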
3. Experiments and Results Analysis
3.1. Experimental Environment and Parameter Settings
The experiments were conducted in the PyCharm 2025.3 development environment on a Linux 3.9.0 operating system with 32 GB of RAM and an NVIDIA GeForce GTX 1660 SUPER GPU, using the PyTorch 1.8.1 deep learning framework. The maximum sentence length was set to 300 tokens and the training batch size to 8. The model was optimized with the Adam optimizer, with a learning rate of
. Training was performed for 50 epochs, and the model with the highest F1-score on the validation set was selected as the final model. Because deep learning models are sensitive to initialization [36], we conducted five independent runs with different random seeds and report the mean precision, recall, and F1-score, to ensure that the performance improvements are not coincidental [37]. The standard deviations of all results were below 0.4%, indicating that the proposed model’s performance is reliable and stable.
3.2. Dataset and Evaluation Metrics
The dataset was constructed by crawling storm surge-related text from various sources, including annual marine disaster bulletins, mainstream news websites such as China News Service, and social media platforms, yielding a total of 4000 documents (see Table 1). After manual filtering to remove duplicate and irrelevant content, a high-quality corpus of 2372 storm surge-related texts was obtained. Text length varies considerably, from 120 to 3500 characters, with an average of approximately 750 characters. The dataset consists primarily of Chinese texts, since the storm surge events analyzed (e.g., Typhoon Mangkhut and Typhoon Hato) predominantly affected China’s coastal regions. Consequently, the linguistic features processed by the model, such as character-based tokenization, are specific to the Chinese language context.
To support generalization, the dataset was randomly split into training (70%), testing (15%), and validation (15%) sets. Annotation was performed by a professional annotation team following a detailed guideline. To ensure data diversity and representativeness, samples were drawn from different sources (disaster bulletins, news reports, and social media), and the model’s performance was verified separately on each source type. To ensure annotation quality, we used a double-annotation strategy: each sample was annotated by two independent domain experts, and disagreements were adjudicated by a third senior expert. The inter-annotator agreement (Cohen’s Kappa) was 0.85, indicating high consistency.
To further validate model robustness, a source-stratified cross-validation strategy was employed. Specifically, the dataset was stratified by source (disaster bulletins, news reports, and social media) so that each source’s proportion in the training, testing, and validation sets matched its overall distribution. The model was also trained and tested independently on each source subset to evaluate its performance across text types. This design checks that the model does not merely perform well on one text type but maintains high precision and recall on storm surge texts from diverse sources.
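The source-stratified 70/15/15 split can be sketched as follows: records are grouped by source, shuffled within each group, and cut by the same ratios, so each split preserves the overall source mix. The record format (dicts with `id` and `source` fields), the seed, and the rounding rule are assumptions for illustration, not details reported in this study.

```python
import random
from collections import defaultdict


def stratified_split(records, key="source", ratios=(0.70, 0.15, 0.15), seed=42):
    """Split records into train/test/validation while preserving the mix of record[key]."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test, val = [], [], []
    for source in sorted(groups):
        items = groups[source][:]
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_test = round(len(items) * ratios[1])
        train += items[:n_train]
        test += items[n_train:n_train + n_test]
        val += items[n_train + n_test:]  # remainder goes to validation
    return train, test, val


# Hypothetical corpus: 100 bulletins, 60 news articles, 40 social media posts.
sources = ["bulletin"] * 100 + ["news"] * 60 + ["social"] * 40
corpus = [{"id": i, "source": s} for i, s in enumerate(sources)]
train, test, val = stratified_split(corpus)
print(len(train), len(test), len(val))  # 140 30 30
```

Assigning the remainder to validation guarantees that every record lands in exactly one split even when the ratios do not divide a group evenly.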
In evaluating the triple extraction task, a predicted triple is considered correct if and only if the head entity, the tail entity, and the relation are all correctly identified. This study uses three common evaluation metrics, Precision (P), Recall (R), and F1-score (F), calculated as shown in Equations (16)–(18):

$$P = \frac{TP}{TP + FP} \tag{16}$$
$$R = \frac{TP}{TP + FN} \tag{17}$$
$$F = \frac{2PR}{P + R} \tag{18}$$

where TP denotes the number of true positives (correctly identified triples), FN the false negatives (gold triples that were not identified), and FP the false positives (incorrectly predicted triples).
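Under the strict matching criterion, these metrics reduce to set operations over (head, relation, tail) tuples. In the sketch below, the relation names `surge_level` and `located_in` and the example triples, loosely based on the Dinghai and Sanmen station sentence discussed in Section 3.3.3, are hypothetical.

```python
def triple_prf(predicted, gold):
    """Precision, recall and F1 where a triple counts only if head, relation and tail all match."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)   # correctly identified triples
    fp = len(pred - gold)   # predicted but wrong
    fn = len(gold - pred)   # gold triples that were missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


gold = {("Dinghai Station", "surge_level", "161 cm"),
        ("Sanmen Station", "surge_level", "175 cm"),
        ("Dinghai Station", "located_in", "Zhejiang Province")}
pred = {("Dinghai Station", "surge_level", "161 cm"),
        ("Sanmen Station", "surge_level", "175 cm"),
        ("Sanmen Station", "surge_level", "161 cm")}   # wrong tail -> false positive
print(triple_prf(pred, gold))
```

A triple with the right head and relation but the wrong tail counts as both a false positive and, for the gold triple it failed to match, a false negative, so partial matches earn no credit.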
To validate the effectiveness of the proposed model for triple extraction from storm surge disaster texts, three joint extraction models were used as baselines: NovelTaggingBert, CopyRRL, and CasRel. Comparing their performance on the dataset demonstrates the effectiveness of the proposed method.
3.3. Experimental Results and Analysis
3.3.1. Case Study of Typical Scenarios
To quantitatively assess the specific advances of the proposed framework, comparative experiments were conducted against the baseline CasRel model across three representative scenarios, as detailed in Table 2. The results show substantial performance gains in handling nested entities, cross-sentence dependencies, and multi-source data heterogeneity. First, for nested entity recognition, the model leverages the matrix decoding mechanism of the GPN to mitigate the truncation issues common in traditional head-tail annotation methods; as a result, recognition accuracy for complex geographic entities exceeding 15 characters (e.g., ‘from Pearl River Estuary to Leizhou Peninsula coastline’) improved by 13.7%. Second, for cross-sentence relation modeling, the approach captures long-range dependencies better: for texts spanning more than three paragraphs (with an average spacing of 128 characters), the model achieves a relation recall of 85.6%, an increase of 18.2 percentage points over the baseline. Third, regarding multi-source adaptability, the model performs robustly on unstructured social media text, achieving an F1-score of 87.3% and narrowing the performance gap between formal government bulletins and informal social media data from 14.2% to 5.7%. Finally, the optimized segmented attention mechanism improves computational efficiency, processing long texts (up to 3500 characters) at 1800 characters per second. This is a 2.6-fold speedup over traditional RNN architectures, meeting the real-time analysis demands of disaster emergency response.
3.3.2. Effectiveness of Data Augmentation Strategies
To evaluate the impact of data preprocessing on model performance, three comparative experiments were conducted: (1) Original model: the original BERT-base Chinese model trained without any preprocessing; (2) Lexicon-enhanced model: training with an added, manually curated vocabulary of 23,000 entries extracted from the Dictionary of Storm Surge Geographic Entities and the Glossary of Marine Disasters; and (3) Logic-enhanced model: training with both the vocabulary and a set of logical rules (e.g., “coastal city + tide level value” constitutes a “disaster-stricken area–disaster indicator” relation). Together, these experiments allow a comprehensive assessment of the effectiveness of the enhancement strategies.
The experimental results (Figure 3) indicate that vocabulary augmentation improves the F1-score for nested entity recognition by 7.2%; in particular, recognition accuracy for complex geographic entities such as “the eastern coast of the Leizhou Peninsula” increased from 68.4% to 82.1%. After logical rules were incorporated, the accuracy of cross-sentence relation extraction improved by 9.8%; for instance, the recall of cross-paragraph associations between “economic loss” and “affected regions” increased from 71.3% to 83.5%.
3.3.3. Validation of Training Strategy Effectiveness
To comprehensively evaluate the performance of the proposed model, we conducted assessments not only on the overall dataset but also specifically on samples with varying text lengths.
Table 3 presents representative extraction results across different text lengths, including short texts (e.g., social media) and long texts (e.g., full news articles).
The examples demonstrate that the model is capable of accurately extracting head and tail entities along with their corresponding relations, even under complex conditions such as overlapping entities and relation crossings. For instance, in the sentence “The storm surge levels at Dinghai Station and Sanmen Station in Zhejiang Province reached 161 cm and 175 cm, respectively,” the model successfully identified each station and its associated surge measurement with high precision.
Furthermore, we compared the model’s performance across different text lengths, as shown in Table 4 and Table 5. The results indicate that the proposed model consistently outperforms the baseline models on both short and long texts. In particular, for long-text processing, the GPN effectively captures global semantic information, yielding significant improvements in both precision and recall over the baselines.
Figure 4 further illustrates the performance comparison between the proposed model and the three baseline models on the full storm surge dataset. Our model outperforms all baselines in terms of precision, recall, and F1-score. Specifically, it achieved a precision of 89.5%, a recall of 84.3%, and an F1-score of 88.4%, representing a 5.5% improvement in F1-score over the best-performing baseline, CasRel. These results provide strong evidence of the effectiveness and superiority of the proposed method in storm surge information extraction tasks.
3.3.4. Comparison with Large Language Models (LLMs)
To further validate the necessity and efficiency of the proposed SSKEM in the era of large language models, we conducted a comparative experiment with two representative LLMs, GPT-3.5-Turbo and GPT-4. We used a few-shot prompting strategy, providing the models with three annotated examples and the schema definition (Location, Time, Impact) to perform extraction on the test set.
As shown in Table 6, although GPT-4 achieves a high recall (88.1%) owing to its powerful semantic understanding, its precision (81.2%) is significantly lower than that of SSKEM. Error analysis reveals that LLMs tend to “hallucinate” or over-interpret generalized geographic descriptions (e.g., inferring “Guangdong” from “Zhuhai”) that are not explicitly stated in the text, leading to false positives.
More importantly, in terms of efficiency, SSKEM processes over 125 samples per second, approximately 150 times faster than GPT-4. For disaster emergency response systems that must process massive streams of social media data in real time, or that run on offline edge devices in command centers, this gives SSKEM a decisive advantage in latency and computational cost. While LLMs offer strong generalization, a specialized model such as SSKEM is preferable in this domain because it achieves higher extraction precision by suppressing hallucinations, meets real-time emergency response requirements with roughly 150× faster inference, and allows cost-effective, private deployment on local hardware.
4. Discussion
The experimental results demonstrate that the Storm Surge Knowledge Extraction Model (SSKEM) significantly advances the extraction of structured intelligence from heterogeneous disaster texts, achieving an F1-score of 88.4%. This performance reflects a shift from traditional pipeline paradigms to a unified joint extraction framework. While pipeline methods, such as the end-to-end framework of Liu et al. [20] and the pattern-first approach of Chen et al. [21], improved semantic representation, they remain susceptible to error propagation, in which failures in entity recognition cascade into relation classification. By integrating the Global Pointer Network with BERT, our model avoids this cascading error by treating entity-relation extraction as a unified matrix prediction task, supporting the hypothesis that global normalization yields higher fidelity in complex linguistic environments.
A critical innovation of this study is the resolution of the nested entity boundary problem, a persistent challenge in processing geographic descriptions (e.g., “coastal areas from the Pearl River Estuary to the Leizhou Peninsula”). Traditional sequence labeling models (such as CRFs) assign a single label to each token and therefore cannot represent overlapping spans, while conventional pointer networks predict entity heads and tails independently. In contrast, our model uses a matrix decoding mechanism that treats the entity head and tail as an integrated whole. This architectural shift yielded a 13.7% improvement in identifying complex geographic entities. The finding suggests that scoring all candidate spans in a two-dimensional span matrix via GPN is more effective for high-density information extraction than reducing the text to a linear label sequence, a conclusion that aligns with and extends the findings of Sui et al. [25] regarding set prediction formulations.
Furthermore, the study addresses the trade-off between model complexity and inference speed, a common bottleneck in real-time disaster response systems. Deep learning models often sacrifice latency for accuracy; SSKEM, however, processes 1800 characters per second, 2.6 times faster than traditional RNN architectures. This efficiency is attributed to the optimized segmented attention mechanism, which limits the quadratic computational cost typically associated with long-text processing, together with Rotary Position Encoding (RoPE), which supplies relative positional information so that global semantic dependencies are still captured. The recall of 85.6% on texts exceeding three paragraphs indicates that the model mitigates the “forgetting” issue inherent in long-sequence modeling, providing a viable technical solution for analyzing lengthy government disaster bulletins in real time.
Regarding data adaptability, the model bridges the semantic gap between formal official bulletins and informal social media content, narrowing the performance disparity to just 5.7%. This robustness is likely driven by the ZLPR-inspired multi-label loss, which handles the class imbalance and sparse signals typical of short, noisy social media texts. The finding supports the forensic potential of social media analysis noted by Liu et al. [22] and extends it by demonstrating that a single unified model can generalize across these divergent text sources without separate, source-specific pipelines.
Despite these promising results, several limitations warrant discussion. First, the current dataset is confined to Chinese storm surge texts from 2013 to 2023. While the model demonstrates strong domain adaptability, its generalization to other languages or distinct disaster types (e.g., earthquakes or wildfires) with different linguistic structures remains to be verified. Second, the current framework operates exclusively on textual data. As noted by Wan et al. [27] in the context of SMAN, multimodal information is increasingly vital. Disaster reports frequently contain charts, typhoon path maps, and flood imagery that our current model does not use. Future work will focus on integrating visual features into the embedding layer to construct a multimodal knowledge graph, further enhancing the situational awareness provided to emergency decision-makers.
Although currently trained on Chinese storm surge texts, the SSKEM architecture is largely language-agnostic: the Global Pointer Network treats entity extraction as span detection, which applies equally to English or other languages provided that a corresponding multilingual encoder (e.g., mBERT) is used. Furthermore, the defined schema (location, time, impact) is highly transferable to other natural disasters such as hurricanes, floods, and landslides, which share similar reporting structures in official bulletins and social media. Regarding data conflicts among multi-source inputs (e.g., discrepancies between official reports and social media), the current model extracts all distinct claims. Future iterations will incorporate a Source Credibility Parameter that assigns higher confidence weights to official government bulletins than to social media posts, in order to resolve conflicts automatically during the knowledge fusion stage.
As demonstrated in the comparative experiments in Section 3.3.4, SSKEM exhibits distinct advantages over large language models (LLMs) such as GPT-4 in computational efficiency and domain-specific precision. Although LLMs demonstrate powerful semantic understanding, their low inference throughput (0.8 samples/s) and reliance on cloud-based APIs pose significant risks in disaster response scenarios that require real-time processing or operation in disconnected environments. In contrast, SSKEM achieves a throughput of 125.4 samples/s, roughly 157 times that of GPT-4 (125.4 / 0.8 ≈ 157), which supports its suitability for deployment on edge devices in emergency command centers. Furthermore, our model mitigates the “hallucination” issue common in LLMs when dealing with fine-grained geographic entities. Because it predicts spans over the input text, every extracted entity is, by construction, a substring of the source document rather than freely generated text.
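Section 3.3.4 is not part of this excerpt, so its exact benchmarking protocol is not reproduced here. Samples-per-second figures such as 125.4 and 0.8 are typically obtained by timing batched inference over a fixed evaluation set, as in the sketch below. `measure_throughput` and its parameters are hypothetical. Any model, whether local or API-backed, can be wrapped as the `predict` callable.

```python
import time
from typing import Callable, List, Sequence


def measure_throughput(
    predict: Callable[[Sequence[str]], List],
    samples: Sequence[str],
    batch_size: int = 32,
    warmup_batches: int = 1,
) -> float:
    """Return samples processed per second by a batched predict function."""
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    # Warm-up calls (lazy initialization, caches) are excluded from the timing.
    for batch in batches[:warmup_batches]:
        predict(batch)
    start = time.perf_counter()
    for batch in batches:
        predict(batch)
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed


if __name__ == "__main__":
    texts = ["storm surge warning issued for the coast"] * 1000
    # Stand-in model that only counts characters; replace with a real extractor.
    print(f"{measure_throughput(lambda b: [len(t) for t in b], texts):.1f} samples/s")
```

For a fair comparison, every system should be run on the same evaluation set, batch size, and hardware. For a cloud-hosted LLM, the measured time also includes network round trips and any rate limiting.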
5. Conclusions
This study addressed the critical challenge of automating information extraction from heterogeneous storm surge disaster texts, ranging from official bulletins to unstructured social media streams. By constructing a high-quality, domain-specific corpus and proposing a joint extraction framework based on Global Pointer Networks (GPNs), we aimed to overcome the limitations of traditional pipeline models in handling nested entities, overlapping relations, and the latency requirements of emergency response.
The experimental findings demonstrate that the proposed framework significantly advances the state of the art in disaster text mining. First, the model achieves an overall F1-score of 88.4%, outperforming the strong baseline CasRel by 5.5%. Specifically, the GPN matrix decoding mechanism resolves the boundary ambiguity of complex entities, improving the recognition of nested geographic entities (e.g., long coastal descriptions) by 13.7%. Second, the integration of global semantic features proves highly effective for long-text modeling, raising the recall for cross-sentence relations by 18.2%. Third, the inclusion of domain lexicons and logical rules substantially improves robustness, allowing the model to narrow the performance gap between standardized government reports and fragmented social media texts.

These findings have both theoretical and practical implications. Theoretically, the study supports the advantage of span-based global normalization over sequence labeling in domains characterized by high entity density. Sequence labeling assigns a single tag to each token and therefore cannot represent overlapping entities. Practically, the optimized segmented attention mechanism enables the model to process 1800 characters per second, 2.6 times faster than traditional RNN architectures. This efficiency confirms the model’s viability for real-time applications, where it can provide decision-makers with timely, structured intelligence for rapid damage assessment and resource allocation.
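The simplified sketch below illustrates the matrix decoding idea. It scores every candidate span (i, j) with j ≥ i as a dot product q_i · k_j and keeps each span whose score exceeds 0. It assumes precomputed query and key vectors for a single entity type. It omits the learned projections, per-type heads, and rotary position embeddings of the full Global Pointer. The function names, tokens, and vectors are hypothetical. Because each span has its own cell in the matrix, both a short entity and a longer entity that contains it can be selected.

```python
import math
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start token, end token inclusive, entity type)


def span_scores(q: Sequence[Sequence[float]], k: Sequence[Sequence[float]]) -> List[List[float]]:
    """Score each span (i, j) as q_i . k_j; cells with j < i are masked to -inf."""
    n = len(q)
    return [
        [sum(a * b for a, b in zip(q[i], k[j])) if j >= i else -math.inf for j in range(n)]
        for i in range(n)
    ]


def decode(scores_by_type: Dict[str, List[List[float]]], threshold: float = 0.0) -> List[Span]:
    """Return every span whose score exceeds the threshold, for every entity type."""
    spans: List[Span] = []
    for ent_type, matrix in scores_by_type.items():
        for i, row in enumerate(matrix):
            for j in range(i, len(row)):  # upper triangle only
                if row[j] > threshold:
                    spans.append((i, j, ent_type))
    return spans


tokens = ["Leizhou", "Peninsula", "east", "coast"]
# Toy vectors chosen so that spans (0, 1) and (0, 3) score above 0.
q = [[1.0, 0.0], [0.0, -1.0], [0.0, -1.0], [0.0, -1.0]]
k = [[-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]]
for start, end, ent_type in decode({"LOC": span_scores(q, k)}):
    print(ent_type, " ".join(tokens[start:end + 1]))
# LOC Leizhou Peninsula
# LOC Leizhou Peninsula east coast
```

Both nested locations are returned, whereas a one-tag-per-token scheme would have to choose between them for the shared tokens.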
Despite these achievements, this study has certain limitations. The current dataset, while rigorously annotated, is limited to 4000 documents centered on specific linguistic patterns found in Chinese coastal reporting, which may constrain generalization to other languages or disaster types. Additionally, the current framework relies exclusively on textual modalities, overlooking valuable visual information (such as typhoon path maps or flood photographs) that often accompanies disaster reports. Future research will focus on expanding the model’s capabilities in two key directions. First, we intend to incorporate multimodal learning techniques to align textual extraction with visual features, thereby creating a more holistic perception of disaster situations. Second, we plan to explore the integration of LLMs to enhance the system’s few-shot learning abilities, allowing for rapid adaptation to new disaster scenarios with minimal annotated data. In summary, this research provides a robust, efficient solution for transforming unstructured disaster big data into actionable knowledge, offering a solid technical foundation for the development of intelligent emergency management systems.