AI-Driven Global Disaster Intelligence from News Media

Sufi, Fahim; Alsulami, Musleh

doi:10.3390/math13071083

by Fahim Sufi ¹,* and Musleh Alsulami ²

¹ School of Public Health and Preventive Medicine, Monash University, Australia, VIC 3004, Australia
² Department of Software Engineering, College of Computing, Umm Al-Qura University, Makkah 21961, Saudi Arabia
* Author to whom correspondence should be addressed.

Mathematics 2025, 13(7), 1083; https://doi.org/10.3390/math13071083

Submission received: 5 March 2025 / Revised: 17 March 2025 / Accepted: 19 March 2025 / Published: 26 March 2025

(This article belongs to the Topic Soft Computing and Machine Learning)

Abstract

Open-source disaster intelligence (OSDI) is crucial for improving situational awareness, disaster preparedness, and real-time decision-making. Traditional OSDI frameworks often rely on social media data, which are susceptible to misinformation and credibility issues. This study proposes a novel AI-driven framework utilizing automated data collection from 444 large-scale online news portals, including CNN, BBC, CBS News, and The Guardian, to enhance data reliability. Over a 514-day period (27 September 2023 to 26 February 2025), 1.25 million news articles were collected, of which 17,884 were autonomously classified as disaster-related using Generative Pre-Trained Transformer (GPT) models. The analysis identified 185 distinct countries and 6068 unique locations, offering unprecedented geospatial and temporal intelligence. Advanced clustering and predictive analytics techniques, including K-means, DBSCAN, seasonal decomposition (STL), Fourier transform, and ARIMA, were employed to detect geographical hotspots, cyclical patterns, and temporal dependencies. The ARIMA (2, 1, 2) model achieved a mean squared error (MSE) of 823,761, demonstrating high predictive accuracy. Key findings highlight that the USA (6548 disasters), India (1393 disasters), and Australia (1260 disasters) are the most disaster-prone countries, while hurricanes/typhoons/cyclones (5227 occurrences), floods (3360 occurrences), and wildfires (2724 occurrences) are the most frequent disaster types. The framework establishes a comprehensive methodology for integrating geospatial clustering, temporal analysis, and multimodal data processing in OSDI. By leveraging AI automation and diverse news sources, this study provides a scalable, adaptable, and ethically robust solution for proactive disaster management, improving global resilience and preparedness.

Keywords: open-source disaster intelligence; geospatial and temporal intelligence; news media data mining; predictive disaster modeling; disaster intelligence; AI

MSC: 03-04; 03-08; 03-11

1. Introduction

The increasing frequency and severity of natural disasters necessitate more robust and efficient disaster intelligence systems. Traditional disaster management approaches often rely on proprietary data and closed systems, leading to limitations in real-time situational awareness and collaborative decision-making [1,2]. These limitations hinder effective disaster response and recovery, particularly in regions with inadequate infrastructure and communication networks. To address these challenges, open-source disaster intelligence (OSDI) has emerged as a transformative paradigm, leveraging publicly available data, social media analytics, and advanced computational models for comprehensive disaster monitoring and response [3,4]. OSDI offers the potential to democratize disaster data access, enabling a more inclusive approach to disaster preparedness and mitigation strategies. However, despite the growing interest and adoption of OSDI, several critical research gaps remain unaddressed.

First, while the importance of democratizing disaster data access through open-source platforms is well documented, the existing literature lacks standardized frameworks for effectively integrating social media data with traditional disaster reports [5,6]. The absence of standardized integration mechanisms limits the potential of multimodal analytics in enhancing situational awareness and decision-making. Second, although advanced methodologies such as natural language processing (NLP) and machine learning (ML) have been increasingly employed for disaster detection and prediction, their scalability and cross-lingual capabilities for global disaster analytics require further exploration [6,7]. Multimodal tweet classification using transformer models has shown promise in contextual disaster intelligence, but challenges related to language diversity and cultural nuances remain unresolved. Third, the reliability of disaster intelligence derived from social media data is frequently compromised due to issues related to data credibility, misinformation, and ethical concerns [8,9,10]. The proliferation of fake accounts and the spread of misinformation during disaster events pose significant challenges to ensuring data integrity and credibility. Therefore, addressing these challenges necessitates the development of robust data verification mechanisms and ethical frameworks.

To bridge these research gaps, this study proposes a novel framework that integrates advanced geospatial and time-series modeling techniques with real-time open-source data sources, including social media feeds and global disaster reports. The proposed framework leverages geospatial clustering algorithms, including K-means and DBSCAN, to detect geographical hotspots and complex spatial distributions of disaster occurrences [11,12]. Additionally, it employs temporal pattern analysis using seasonal decomposition (STL), Fourier transform, and ARIMA models to identify cyclical patterns and temporal dependencies in disaster occurrences and severity [13,14,15]. By integrating multimodal tweet classification using transformer models, the framework enhances real-time disaster communication and situational awareness [6,7]. Unlike existing approaches, this study uniquely combines geospatial clustering, temporal pattern analysis, and multimodal data integration in a unified framework, providing a more holistic and dynamic approach to disaster intelligence.

This study demonstrates the efficacy of the proposed framework through comprehensive experimentation using a dataset of 17,884 disaster events. The data collection process involved automated extraction using Microsoft Power Automate, ensuring comprehensive coverage of disaster events worldwide. The structured dataset was analyzed using advanced geospatial and time-series techniques, enabling the identification of disaster-prone regions and the development of predictive models for future occurrences and severity levels. The experimental results revealed significant geographical hotspots, with major clusters observed in North America (USA), South Asia (India), and Australia. These findings indicate considerable geographical vulnerability, necessitating region-specific disaster preparedness measures. Additionally, the temporal pattern analysis identified cyclical behaviors corresponding to seasonal climatic influences, enhancing the predictive accuracy of disaster forecasting. The ARIMA (2, 1, 2) model configuration yielded a mean squared error (MSE) of approximately 823,761, demonstrating high predictive performance and robustness in capturing temporal dynamics. To mitigate the impact of misinformation, fake accounts, and credibility issues present in social-media-driven disaster intelligence (as discussed in [8,9,10]), this research sourced data using AI-driven automation from 444 large-scale online news portals (e.g., CNN, BBC, CBS News, The Guardian, Daily Mail UK, NDTV, and Times of India). This large and diverse range of news media from across the globe (e.g., the US, UK, Australia, India, and China) provides a more credible basis for open-source disaster analytics than previous studies built on social media data. Out of 1.25 million news articles captured from 27 September 2023 to 26 February 2025, 17,884 articles were classified by Generative Pre-Trained Transformer (GPT) models as news pertaining to global disasters such as earthquakes, floods, landslides, cyclones, and wildfires. Most importantly, GPT classification identified 185 distinct countries and 6068 distinct locations over 514 days (approximately 17 months), providing geospatial and temporal intelligence that has not been demonstrated in previous studies. The GPT-based classification model exhibits its highest predictive accuracy in the categorization of country, with a precision of approximately 94.5%, a recall of 94.8%, and an F1-score of 94.65%, underscoring its robust capability in geospatial disaster identification. Conversely, the model’s classification accuracy is comparatively lower for deaths and injuries, where precision and recall values range between 87.5% and 88.5%, highlighting the inherent challenges in extracting precise casualty figures due to the frequent use of non-numeric descriptors in news reports.

The impact of this research is substantial, contributing to both theoretical and practical advancements in disaster intelligence. By leveraging open-source data and advanced computational models, this study enhances situational awareness, improves predictive accuracy, and promotes collaborative research in disaster management (as shown in Figure 1). The proposed framework facilitates real-time monitoring and decision-making, reducing disaster response time and enabling proactive disaster management through accurate risk assessment and timely situational awareness. Additionally, the integration of multimodal data analytics fosters a more comprehensive understanding of disaster events, supporting evidence-based policy formulation and resource allocation. The open-source nature of the framework promotes global collaboration, enabling researchers and practitioners worldwide to build upon and extend this work for region-specific applications.

This study not only advances the field of open-source disaster intelligence but also contributes to building resilient communities through enhanced disaster preparedness and mitigation strategies. By leveraging the power of open-source data and advanced computational models, this research paves the way for more inclusive and collaborative disaster management solutions. The key contributions of this study are summarized as follows:

Theoretical contribution to open-source disaster intelligence: Establishes a comprehensive framework for integrating geospatial clustering, temporal pattern analysis, and multimodal data integration in open-source disaster intelligence, bridging the gap between real-time disaster event tracking and AI-driven predictive analytics.
Largest AI-driven disaster intelligence dataset: Collected and processed 1.25 million news articles from 444 global news portals, identifying 17,884 disaster-related reports across 185 countries and 6068 unique locations over 514 days (from 27 September 2023 to 26 February 2025). Dataset available at https://github.com/DrSufi/NewsDisaster (accessed on 15 March 2025).
High-precision predictive disaster forecasting: Implemented ARIMA (2, 1, 2) modeling, achieving a mean squared error (MSE) of 823,761, along with Fourier transform and seasonal decomposition (STL) to detect cyclical disaster patterns, improving early warning capabilities.
Empirical identification of global disaster hotspots: Determined the top three most disaster-affected countries—USA (6548 disasters), India (1393 disasters), and Australia (1260 disasters)—and confirmed through geospatial clustering that North America, South Asia, and Australia are the most disaster-prone regions.
Comprehensive analysis of disaster types and frequency: Identified the three most frequent disasters—hurricanes/typhoons/cyclones (5227 occurrences), floods (3360 occurrences), and wildfires/bushfires (2724 occurrences)—providing critical insights for disaster preparedness and mitigation strategies.

2. Contextual Background

Open-source disaster intelligence (OSDI) has emerged as a pivotal area of research, leveraging the power of publicly available data, social media analytics, and advanced computational models to enhance situational awareness, disaster preparedness, and mitigation strategies. This section critically examines the relevant literature, categorizing the studies into four main areas: importance and requirements of open-source disaster intelligence, unique methodologies in open-source disaster intelligence, data requirements, and drawbacks associated with open-source disaster intelligence.

2.1. Importance and Requirements of Open-Source Disaster Intelligence

The significance of open-source disaster intelligence lies in its ability to democratize disaster data access, thereby enhancing situational awareness and decision-making for a broader audience, including governments, NGOs, and local communities. The use of open-source platforms reduces dependency on proprietary systems and facilitates collaborative disaster management strategies.

Ref. [1] emphasized the need for automated machine learning algorithms to extract actionable insights from global disaster data, highlighting the critical role of open-source systems in real-time disaster monitoring and decision support systems. Ref. [2] explored flood detection using Twitter streams, demonstrating how open-source tools can be leveraged for real-time disaster detection and response. Similarly, Ref. [3] discussed the importance of social media analytics in understanding disaster-related misinformation, advocating for open-source frameworks to enhance data credibility and accuracy.

Further, Ref. [4] demonstrated the efficacy of open-source image analysis tools for landslide mapping, emphasizing the role of collaborative platforms in geospatial disaster intelligence. Ref. [16] illustrated the importance of open-source inventory systems for long-term disaster data curation and analysis, thereby supporting sustainable disaster management.

The systematic review by Ref. [5] provided a comprehensive overview of open-source disaster intelligence dimensions, advocating for a standardized framework to integrate social media data and traditional disaster reports. Ref. [6] further underscored the importance of multimodal analytics using open-source transformer models, emphasizing cross-lingual capabilities in global disaster detection.

Ref. [17] contributed to the importance of open-source disaster intelligence by presenting a global dataset of tropical cyclone measurements, which is crucial for comprehensive disaster analysis. Additionally, refs. [18,19] highlighted the importance of systematic review methodologies and academic search systems, emphasizing the need for open-access disaster data repositories to enhance collaborative research.

Other studies, such as [20], demonstrated the significance of transformer-deep neural network models in Twitter disaster detection, showcasing the potential of open-source platforms in contextual disaster intelligence. Ref. [7] highlighted the need for multimodal tweet classification in disaster response systems, advocating for open-source NLP models to improve disaster communication and sentiment analysis.

Ref. [21] presented a decision support system that integrates live Twitter feeds for disaster monitoring, showcasing the scalability and adaptability of open-source platforms for real-time disaster intelligence. Ref. [22] proposed the use of recurrent GAN models to augment disaster displacement data, emphasizing the importance of open-source synthetic data generation for comprehensive disaster analytics.

2.2. Unique Methodologies in Open-Source Disaster Intelligence

Open-source disaster intelligence employs diverse methodologies to enhance disaster detection, prediction, and response. These methodologies primarily utilize advanced natural language processing (NLP), machine learning (ML), and deep learning models integrated with open-source platforms.

Ref. [23] introduced AI-SocialDisaster, a novel AI-based software that leverages NLP and sentiment analysis to identify and analyze natural disasters from social media posts, demonstrating the potential of open-source transformer architectures in disaster analytics. Ref. [24] proposed TPredDis, a unique hybrid model that combines semantic intelligence and machine learning for disaster prediction using social media data.

Ref. [6] introduced a visual and linguistic transformer fusion model for multimodal tweet classification, highlighting the power of open-source models in integrating text and image data for comprehensive disaster intelligence. Ref. [7] implemented a bidirectional attention model for multimodal tweet classification, emphasizing the role of open-source transformer architectures in cross-lingual disaster analytics.

Ref. [25] employed entity-based transformer methods to detect emergency events on social media, showcasing the effectiveness of open-source NLP models for event detection and classification. Ref. [20] demonstrated the utility of deep neural networks integrated with transformer architectures in Twitter-based disaster detection, highlighting the scalability of open-source models for global disaster intelligence.

Ref. [26] proposed Topic2Labels, a framework that combines LDA topics with deep learning models for classifying social media data during crises, showcasing the potential of hybrid methodologies in open-source disaster intelligence. Ref. [27] developed a novel framework for assessing the criticality of retrieved disaster information, emphasizing the role of open-source decision support systems in disaster management.

Ref. [28] introduced AI-based location intelligence for automated disaster monitoring, highlighting the role of transformer architectures in enhancing situational awareness. Furthermore, Ref. [29] proposed a pre-trained ensemble model for emotion identification during crises, showcasing the integration of social media sentiment analysis in disaster intelligence.

2.3. Data Requirements of Open-Source Disaster Intelligence

Data requirements for open-source disaster intelligence encompass real-time social media data, historical event datasets, and multimodal data integration. The fusion of multiple data sources is essential for accurate disaster prediction, situational awareness, and risk assessment.

Ref. [30] highlighted the importance of the NASA Global Landslide Catalog as a foundational dataset for disaster intelligence, advocating for open-access data repositories to enhance collaborative research. Ref. [31] introduced AI-Landslide, a tool that extracts insights from global landslide data using AI, demonstrating the importance of open-source datasets for comprehensive disaster analysis.

Ref. [21] presented a decision support system utilizing live Twitter feeds, showcasing the need for real-time social media integration in open-source disaster intelligence. Ref. [22] proposed RGAN-LS, a recurrent GAN model that augments disaster displacement data, highlighting the role of open-source synthetic data generation for enhancing predictive analytics.

Ref. [32] introduced a metadata-driven knowledge graph for disaster tweets, emphasizing the importance of structured social media data in open-source disaster analytics.

2.4. Drawbacks of Open-Source Disaster Intelligence

Despite its advantages, open-source disaster intelligence faces significant challenges, particularly concerning ethical issues, data credibility, and model limitations.

Ref. [8] provided a comprehensive survey on hallucination in natural language generation models, highlighting the risks associated with transformer models in disaster intelligence. Ref. [9] explored sycophantic behavior in language models, revealing potential biases in disaster detection models.

Ref. [10] highlighted the prevalence of fake accounts and misinformation on social media platforms, emphasizing the ethical challenges of using open-source social media data for disaster intelligence. Ref. [33] discussed the credibility issues in disaster-related social media analytics, advocating for enhanced data verification mechanisms.

Ref. [34] discussed ethical concerns regarding data privacy and misinformation, emphasizing the need for ethical frameworks in open-source disaster intelligence. Ref. [35] highlighted cross-lingual biases in query-based summarization models, showcasing the challenges of using open-source transformer models in multilingual disaster analytics.

This comprehensive review contextualizes the significance of open-source disaster intelligence, rationalizes the methodologies employed, and critically evaluates the data requirements and challenges. This study aims to build upon this body of knowledge by integrating advanced geospatial and time series models, leveraging open-source news data for real-time disaster analytics.

3. Mathematical Modeling

Time series analysis is essential for understanding temporal patterns, detecting cyclical behaviors, and forecasting future disaster occurrences and severity. The model integrates geospatial and time series techniques, including K-means clustering [11,12], DBSCAN [12], spatial autocorrelation [36], seasonal decomposition (STL) [13], Fourier transform [14], and ARIMA [15], to capture spatial dependencies, seasonal trends, cyclical patterns, and temporal dependencies present in disaster data. Each component is rigorously formulated to ensure accurate representation of spatial and time-dependent variations. Figure 2 presents the methodology utilized in this paper, which incorporates K-means clustering, DBSCAN, spatial autocorrelation, STL, Fourier transform, ARIMA, and other techniques in an integrated manner to obtain critical disaster intelligence. Recent studies have used each of these components in isolation to produce disaster intelligence [11,12,13,14,15,36]. However, this is the first study to demonstrate the integration of these techniques in a comprehensive approach.

3.1. Notation and Definitions

To maintain consistency and clarity throughout the mathematical modeling framework, the following notation is used (Table 1):

3.2. Geospatial Analysis Model

3.2.1. K-Means Clustering

K-means clustering partitions geolocations into K clusters by minimizing the intra-cluster variance. The objective is to group disaster occurrences into spatial clusters, revealing geographical hotspots. The objective function is formulated as:

$$J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \lVert x_i - C_j \rVert^2 \tag{1}$$

where $C_j$ is the centroid of cluster $j$, $x_i$ represents a location point, and $S_j$ denotes the set of points assigned to cluster $j$. The algorithm iteratively updates centroids:

$$C_j = \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i \tag{2}$$

K-means clustering was employed to identify spatial disaster hotspots by grouping locations with similar disaster frequencies and severity levels. The algorithm partitions the dataset into K optimal clusters, ensuring that disaster-prone regions are systematically identified.
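The alternation between assigning points and updating centroids can be sketched in a few lines of plain Python. This is a minimal illustration of Equations (1) and (2), not the authors' implementation; the function names, the random initialization, and the use of raw (latitude, longitude) pairs as planar coordinates are assumptions made for the example.

```python
import random

def sq_dist(p, c):
    """Squared Euclidean distance between two (lat, lon) pairs."""
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels). Names are illustrative."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step, Equation (2): centroid = mean of the assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels

def inertia(points, centroids, labels):
    """Objective J of Equation (1): total within-cluster squared distance."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
```

Treating degrees of latitude and longitude as planar coordinates distorts distances away from the equator, so a production system would likely project the coordinates or use a great-circle distance first.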

3.2.2. DBSCAN Clustering

DBSCAN identifies dense regions of points without predefining the number of clusters. It uses two parameters: $\epsilon$ (neighborhood radius) and $MinPts$ (minimum points required). The Euclidean distance function is:

$$\mathrm{dist}(p, q) = \sqrt{(Lat_p - Lat_q)^2 + (Lon_p - Lon_q)^2} \tag{3}$$

DBSCAN effectively detects natural clusters of disaster occurrences.

Unlike K-means, which assumes spherical clusters, DBSCAN effectively captures irregular spatial distributions and isolates anomalous disaster events in sparsely populated regions. Hence, it has been applied in recent studies on environmental data [37].
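To make the roles of the two parameters concrete, the following is a compact pure-Python sketch of DBSCAN using the distance of Equation (3). It is illustrative only: the function name, the `NOISE` label, and the brute-force neighbor search are assumptions for the example, and the parameter values used in the study are not given in the text.

```python
import math

NOISE = -1  # label for points that belong to no dense region

def dbscan(points, eps, min_pts):
    """Label each (lat, lon) point with a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # Equation (3): planar Euclidean distance on latitude/longitude.
        # The neighborhood includes the point itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be re-labeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels
```

The brute-force neighbor search is quadratic in the number of points; for thousands of locations a spatial index would be the usual choice.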

3.2.3. Spatial Autocorrelation

Geary’s C measures spatial autocorrelation of severity levels:

$$C = \frac{(N - 1) \sum_{i} \sum_{j} w_{ij} (y_i - y_j)^2}{2 W \sum_{i} (y_i - \bar{y})^2} \tag{4}$$

Getis-Ord G measures clustering of high or low values:

$$G_i = \frac{\sum_{j} w_{ij} y_j}{\sum_{j} y_j} \tag{5}$$

High values of $G_i$ indicate clustering of high severity values, while low values indicate clustering of low severity values.
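Both statistics can be evaluated directly from a severity vector and a spatial weight matrix. The sketch below follows Equations (4) and (5) as written; it assumes that $W$ is the sum of all weights $w_{ij}$ (the usual definition for Geary's C), and the function names and the toy adjacency matrix are hypothetical.

```python
def gearys_c(y, w):
    """Geary's C, Equation (4); w is an n x n list of spatial weights."""
    n = len(y)
    mean = sum(y) / n
    total_w = sum(sum(row) for row in w)  # assumption: W = sum of all w_ij
    num = (n - 1) * sum(w[i][j] * (y[i] - y[j]) ** 2
                        for i in range(n) for j in range(n))
    den = 2 * total_w * sum((v - mean) ** 2 for v in y)
    return num / den

def getis_ord_g(y, w, i):
    """Local G_i, Equation (5): weighted share of severity around location i."""
    return sum(w[i][j] * y[j] for j in range(len(y))) / sum(y)

# Toy example: four locations on a line, neighbors share an edge.
severity = [1, 1, 5, 5]
weights = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
```

In this toy case Geary's C falls below 1, which indicates positive spatial autocorrelation, and $G_i$ is larger at the high-severity end of the line than at the low-severity end.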

3.3. Seasonal Decomposition (STL)

Seasonal decomposition is a powerful technique that separates a time series into three distinct components: trend, seasonal, and residual variations. This decomposition allows for the identification of long-term trends, repeating seasonal patterns, and noise components. It is particularly useful for analyzing disaster data where seasonal effects, such as monsoon patterns or seasonal floods, are prevalent.

The mathematical representation of seasonal decomposition is formulated as:

$$Y_t = T_t + S_t + R_t \tag{6}$$

where $Y_t$ is the observed time series value at time $t$, $T_t$ is the trend component representing long-term progression, $S_t$ is the seasonal component capturing short-term cyclical patterns, and $R_t$ is the residual component representing noise or random fluctuations.

The seasonal component is assumed to be periodic with a fixed frequency corresponding to the cycle length. The residual component is assumed to be white noise, representing unexplained variability after removing the trend and seasonal components.

STL’s nonparametric decomposition allows for greater flexibility in identifying seasonal fluctuations in disaster occurrences, particularly in climate-driven disasters such as floods, wildfires, hurricanes and even landslides as shown in [38].

3.3.1. Trend Component

The trend component captures the long-term progression in the time series, smoothing out short-term fluctuations. It is calculated using a moving average or local regression smoothing technique as:

$$T_t = \frac{1}{k} \sum_{i=1}^{k} Y_{t-i} \tag{7}$$

where $k$ is the window length representing the seasonal period. The trend component provides a clear representation of the underlying pattern of increase or decrease over time.

3.3.2. Seasonal Component

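The seasonal component captures repeating short-term cycles influenced by seasonal factors such as weather patterns or climatic variations. It is estimated from the detrended series:

$$S_t = Y_t - T_t \tag{8}$$

In practice, the detrended values are averaged (or smoothed) across cycles at each position within the seasonal period, so that $S_t$ retains only the repeating pattern and the residual in Equation (9) is not identically zero. The seasonal component helps in identifying periodic behaviors, enabling better forecasting by accounting for seasonal fluctuations.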

3.3.3. Residual Component

The residual component represents random noise or unexplained variation:

$$R_t = Y_t - T_t - S_t \tag{9}$$

It is assumed to be independently and identically distributed (i.i.d.) with a mean of zero:

$$R_t \sim N(0, \sigma^2) \tag{10}$$

The residual component captures irregular fluctuations not explained by the trend or seasonal patterns.
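The three components can be computed with a short additive decomposition. The sketch below is a simplified stand-in rather than STL itself: it uses the trailing moving average of Equation (7) for the trend and per-position averages of the detrended series for the seasonal component, whereas STL proper relies on LOESS smoothing (available, for example, as `STL` in `statsmodels.tsa.seasonal`). The function name and the requirement of at least two full periods of data are assumptions for the example.

```python
def decompose(y, period):
    """Additive decomposition Y = T + S + R; needs len(y) >= 2 * period."""
    n = len(y)
    # Trend, Equation (7): mean of the previous `period` observations.
    trend = [None] * n
    for t in range(period, n):
        trend[t] = sum(y[t - period:t]) / period
    # Detrended series Y_t - T_t (undefined where the trend is undefined).
    detrended = [y[t] - trend[t] if trend[t] is not None else None
                 for t in range(n)]
    # Seasonal effect: average detrended value at each position in the cycle.
    means = []
    for pos in range(period):
        vals = [detrended[t] for t in range(pos, n, period)
                if detrended[t] is not None]
        means.append(sum(vals) / len(vals))
    offset = sum(means) / period  # center so seasonal effects sum to zero
    seasonal = [means[t % period] - offset for t in range(n)]
    # Residual, Equation (9).
    residual = [y[t] - trend[t] - seasonal[t] if trend[t] is not None else None
                for t in range(n)]
    return trend, seasonal, residual
```

For a purely periodic series such as `[1, 2, 3, 4]` repeated several times with `period=4`, the trend is flat, the seasonal component reproduces the repeating shape, and the residual is zero wherever it is defined.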

3.4. Fourier Transform

Fourier transform is employed to detect cyclical patterns by converting the time series from the time domain to the frequency domain. This transformation reveals underlying periodic behaviors by representing the time series as a sum of sinusoidal components. By decomposing disaster frequency data into sinusoidal components, Fourier analysis reveals dominant periodicities and seasonal fluctuations, particularly in weather-related disasters as shown by recent studies [39].

The Fourier transform is mathematically represented as:

$$F(k) = \sum_{n=0}^{N-1} Y_n e^{-i 2 \pi k n / N} \tag{11}$$

where $F(k)$ is the frequency component corresponding to frequency $k$, $Y_n$ is the time series value at time $n$, and $N$ is the total number of time points.

The power spectrum is computed as the squared magnitude of the Fourier transform:

$$P(k) = |F(k)|^2 \tag{12}$$

Peaks in the power spectrum indicate dominant frequencies, revealing cyclical patterns such as seasonal or annual cycles in disaster occurrences and severity.
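As a small illustration of Equations (11) and (12), the sketch below evaluates the discrete Fourier transform directly and reads the dominant period off the largest peak in the power spectrum. The function names are hypothetical, the series is mean-centered before the transform so that the zero-frequency term does not dominate, and the direct summation is quadratic in the series length (a fast Fourier transform would normally be used instead).

```python
import cmath
import math

def dft(y):
    """Discrete Fourier transform, Equation (11), by direct summation."""
    n = len(y)
    return [sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def power_spectrum(y):
    """Squared magnitude of each frequency component, Equation (12)."""
    return [abs(f) ** 2 for f in dft(y)]

def dominant_period(y):
    """Period (in time steps) of the strongest non-zero frequency."""
    mean = sum(y) / len(y)
    power = power_spectrum([v - mean for v in y])
    # Only frequencies up to N/2 are distinct for a real-valued series.
    k_best = max(range(1, len(y) // 2 + 1), key=lambda k: power[k])
    return len(y) / k_best

# Synthetic monthly series with a 12-step cycle (illustrative data only).
monthly = [math.sin(2 * math.pi * t / 12) for t in range(48)]
```

Applied to the synthetic series, the strongest peak sits at $k = 4$, which corresponds to a period of 48 / 4 = 12 time steps.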

3.5. ARIMA Model

ARIMA (auto-regressive integrated moving average) is a powerful time series forecasting model that captures temporal dependencies, trend patterns, and noise in the time series. It combines auto-regressive (AR), integrated (I), and moving average (MA) components as:

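$$Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \epsilon_t + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} \tag{13}$$

$$\epsilon_t \sim N(0, \sigma^2) \tag{14}$$

where the recursion is applied to the series after differencing it $d$ times, $\phi_i$ and $\theta_i$ are the auto-regressive and moving average coefficients, $c$ is a constant, and $p$, $d$, and $q$ are the orders in the notation ARIMA $(p, d, q)$.

The ARIMA model was chosen for forecasting disaster occurrences because it captures both short-term fluctuations and long-term temporal dependencies. Recent studies have found ARIMA to be a reliable model for forecasting climate and environmental events such as rainfall [40].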
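The mechanics of a one-step-ahead forecast with first-order differencing ($d = 1$) can be sketched as follows. The coefficients passed to these functions are placeholders: the fitted values from the study are not reported in the text, and in practice a library routine (for instance `ARIMA(series, order=(2, 1, 2))` from `statsmodels.tsa.arima.model`) would estimate them. All function names here are illustrative.

```python
def difference(y):
    """First-order differencing: z_t = y_t - y_{t-1}."""
    return [b - a for a, b in zip(y, y[1:])]

def arma_one_step(z, phi, theta, c=0.0):
    """One-step-ahead predictions of z under Equation (13).

    phi and theta are lists of AR and MA coefficients (hypothetical values);
    lags before the start of the series are treated as zero.
    """
    preds, errs = [], []
    for t in range(len(z)):
        ar = sum(phi[i] * z[t - 1 - i]
                 for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[i] * errs[t - 1 - i]
                 for i in range(len(theta)) if t - 1 - i >= 0)
        pred = c + ar + ma
        preds.append(pred)
        errs.append(z[t] - pred)  # innovation used by later MA terms
    return preds, errs

def arima_d1_one_step(y, phi, theta, c=0.0):
    """Predictions for y[1:], obtained by undoing the differencing."""
    z_hat, _ = arma_one_step(difference(y), phi, theta, c)
    return [y[t] + z_hat[t] for t in range(len(z_hat))]

def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

Because MSE is expressed in squared units of the series, its size has to be read against the scale of the data: an MSE of 823,761 corresponds to a root mean squared error of roughly 908 in the original units.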

3.5.1. Parameter Estimation

Parameters are estimated using maximum likelihood estimation (MLE), which maximizes the Gaussian log-likelihood:

$$\ln L(\theta, \phi \mid Y) = -\frac{N}{2} \ln(2\pi) - \frac{N}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^{N} (Y_t - \hat{Y}_t)^2 \tag{15}$$

3.5.2. Model Selection and Evaluation

Model selection is based on minimizing information criteria:

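Akaike information criterion (AIC):

$$\mathrm{AIC} = 2k - 2\ln(L) \tag{16}$$

Bayesian information criterion (BIC):

$$\mathrm{BIC} = k\ln(N) - 2\ln(L) \tag{17}$$

where $k$ is the number of estimated parameters, $L$ is the maximized likelihood, and $N$ is the number of observations.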
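The log-likelihood and the two information criteria translate directly into code. The sketch below is a plain transcription of Equations (15) to (17); the function names are hypothetical, and the parameter count passed in is whatever number of coefficients a given candidate model actually estimates.

```python
import math

def gaussian_log_likelihood(actual, fitted, sigma2):
    """Gaussian log-likelihood of the one-step errors, Equation (15)."""
    n = len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return (-n / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(sigma2)
            - sse / (2 * sigma2))

def aic(log_lik, n_params):
    """Akaike information criterion, Equation (16)."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion, Equation (17)."""
    return n_params * math.log(n_obs) - 2 * log_lik
```

When candidate orders are compared, the model with the lowest AIC or BIC is preferred; BIC penalizes extra parameters more heavily once the series has more than about eight observations, since $\ln(N) > 2$ there.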

This rigorous mathematical modeling framework provides a comprehensive approach for geospatial and time series analysis of disaster data, ensuring accurate hotspot detection, trend analysis, cyclical pattern identification, and reliable forecasting.

4. Experimentation

This study employed AI-driven automation to systematically collect data from 444 prominent online news portals, including major outlets such as CNN, BBC, CBS News, The Guardian, Daily Mail UK, NDTV, and Times of India. The system extracted disaster news through multiple channels, namely web scraping, RSS feeds, and event-based APIs, ensuring comprehensive coverage of disaster events worldwide. Web scraping involved parsing HTML elements to extract key information such as headlines, news content, and publication dates, while RSS feeds and event APIs provided structured data streams for real-time updates. By sourcing data from a diverse and extensive range of news media across different regions (e.g., the US, UK, Australia, India, and China), this study achieved a more credible and comprehensive open-source disaster analytics framework than previous research that relied predominantly on social media data.
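To illustrate what the RSS channel of such a pipeline might look like, the sketch below parses a feed into headline, link, date, and content fields using only the Python standard library. It is not the authors' pipeline (which the paper describes as built on Microsoft Power Automate); the sample feed, the URLs, and the field names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed; a real run would fetch each portal's feed URL.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example News</title>
<item>
  <title>Floods hit coastal town</title>
  <link>https://example.com/floods</link>
  <pubDate>Wed, 27 Sep 2023 08:00:00 GMT</pubDate>
  <description>Heavy rain caused flooding overnight.</description>
</item>
<item>
  <title>Council approves new library</title>
  <link>https://example.com/library</link>
  <pubDate>Wed, 27 Sep 2023 09:30:00 GMT</pubDate>
  <description>The vote passed on Tuesday.</description>
</item>
</channel></rss>"""

def parse_rss(xml_text):
    """Return one dict per <item>, with empty strings for missing fields."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "headline": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "content": item.findtext("description", default=""),
        })
    return articles
```

Each parsed record would then be passed to the classification step, which decides whether the article is disaster-related.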

Between 27 September 2023 and 26 February 2025, a total of 1.25 million news articles were captured, out of which 17,884 articles were classified as disaster-related using GPT models. Over the course of 518 days, news reports were systematically aggregated and analyzed in real time, necessitating the deployment of multiple generative language models to ensure computational efficiency and contextual accuracy. The selection of models was contingent upon their availability and technical capabilities during the respective timeframes. Specifically, from 27 September 2023 to 17 June 2024, GPT-3.5 Turbo was employed, followed by GPT-4.0 from 17 June 2024 onward, marking a transition to a more advanced model with improved reasoning capabilities. Subsequently, on 4 December 2024, the analysis framework adopted Google Gemini 1.5 Flash API, a significantly more stable language model with an expanded token capacity of up to 1,048,576 tokens, thereby facilitating the processing of extensive textual corpora with enhanced contextual depth and computational robustness. The 17,884 articles covered global disaster events, including earthquakes, floods, landslides, cyclones, and wildfires. Notably, the GPT classification process identified disaster events in 185 distinct countries and 6068 unique locations over a span of 514 days (approximately 17 months). This comprehensive geospatial and temporal intelligence provides an unprecedented level of insight, surpassing the capabilities demonstrated in previous studies. The classification process also identified the number of deaths, the number of injuries, the severity of the disaster, and the type of disaster. The categorization model was formulated as follows:

$$C_{i} = \mathrm{GPT}(N_{i}, P_{i}) \tag{18}$$

where $C_{i}$ represents the category of the $i$th news article, $N_{i}$ is the content of the $i$th news article, and $P_{i}$ denotes the predefined classification parameters.

As seen in Figure 3, the 3D surface plot presents a comprehensive evaluation of the classification accuracy of the GPT-based model in systematically categorizing disaster-related information across six key dimensions: disaster types, location, country, severity, deaths, and injuries. The model’s classification efficacy is measured using precision, recall, and F1-score, with performance metrics ranging between 88% and 95%, demonstrating a high level of reliability in disaster intelligence extraction. The classification of country attains the highest accuracy, followed by location and disaster types, underscoring the model’s robust capacity to geospatially contextualize disaster events. In contrast, the accuracy for deaths and injuries is comparatively lower, reflecting the inherent complexities associated with extracting precise casualty figures from textual news reports. A significant factor contributing to this reduced accuracy is the frequent use of imprecise descriptors such as “several dozens”, “several hundreds”, or “several thousands” instead of explicit numerical values, thereby complicating automated data extraction and classification. The inclusion of a color legend further improves interpretability by visually delineating variations in accuracy levels, thereby underscoring both the strengths of the GPT model in disaster intelligence processing and the challenges associated with certain categorical classifications.

It should be noted that the high level of accuracy demonstrated by the GPT classification (e.g., an F1-score of up to 94.65% for country) is attributable to sophisticated prompt engineering techniques. As seen from Figure 4, the prompt was meticulously designed to guide the classification process by restricting disaster type selection to a predefined list of 15 distinct categories: flood, tsunami, hurricane/typhoon/cyclone, tornado, drought, wildfire (bushfire), earthquake, volcanic eruption, landslide, hailstorm, extreme temperatures, meteorite impact, subsidence, avalanche, and others. Additionally, the prompt explicitly instructed the model to output disaster severity as a numerical value ranging from 1 to 5, ensuring consistency in severity classification. To address ambiguity in casualty reporting, the prompt incorporated specific guidelines on interpreting vague numerical references such as “Several Hundreds”, “Several Dozens”, or “Thousands”, standardizing their representation for structured data processing. Most importantly, the prompt enforced a rigid output format using square-bracketed structured outputs, with illustrative examples to maintain uniformity in extracted information.
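The constraints described above (a closed list of 15 disaster types, a severity between 1 and 5, and square-bracketed output) lend themselves to a simple validation step on the model's reply. The following is a minimal sketch of such a check, not the authors' implementation: the exact bracket layout and field order appear only in Figure 4, so the six-field order assumed here, and the names `FIELDS` and `parse_reply`, are hypothetical.

```python
import re

# The 15 disaster categories listed in the text.
DISASTER_TYPES = [
    "flood", "tsunami", "hurricane/typhoon/cyclone", "tornado", "drought",
    "wildfire (bushfire)", "earthquake", "volcanic eruption", "landslide",
    "hailstorm", "extreme temperatures", "meteorite impact", "subsidence",
    "avalanche", "others",
]

# ASSUMPTION: six bracketed fields in this order; the real prompt's layout
# is only shown in Figure 4 and may differ.
FIELDS = ("disaster_type", "location", "country", "severity", "deaths", "injuries")


def parse_reply(reply: str) -> dict:
    """Parse a hypothetical reply such as '[flood][Dhaka][Bangladesh][3][12][40]'."""
    values = re.findall(r"\[([^\[\]]*)\]", reply)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} bracketed fields, got {len(values)}")
    record = dict(zip(FIELDS, (v.strip() for v in values)))
    record["disaster_type"] = record["disaster_type"].lower()
    if record["disaster_type"] not in DISASTER_TYPES:
        raise ValueError(f"unknown disaster type: {record['disaster_type']!r}")
    for key in ("severity", "deaths", "injuries"):
        record[key] = int(record[key])  # raises ValueError on non-numeric text
    if not 1 <= record["severity"] <= 5:
        raise ValueError("severity must be between 1 and 5")
    return record
```

Rejecting a reply that falls outside the closed vocabulary or the 1 to 5 range is one way to keep malformed model output from reaching the stored dataset; how the published pipeline handles such replies is not stated in the text.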

After categorization, the structured data were stored in Microsoft Dataverse tables to maintain consistency and facilitate subsequent analysis. The structured dataset contained nine fields:

Event Date (crd69_eventdate): The date on which the disaster event occurred.
Disaster Type (crd69_disastertype): The category of the disaster (e.g., earthquake, flood, volcanic eruption).
Location (crd69_location): The specific location where the disaster occurred.
Country (crd69_country): The country in which the disaster took place, normalized for consistency.
Severity (crd69_severity): A numerical scale representing the severity of the disaster, ranging from 1 to 5.
Deaths (crd69_deaths): The number of deaths caused by the disaster.
Injuries (crd69_injuries): The number of people injured due to the disaster.
Source (crd69_source): The news source or portal from which the information was obtained.
Event Description (crd69_eventdescription): A textual description of the disaster event.

The structured dataset can be mathematically represented as:

$$D = \{(E_{i}, T_{i}, L_{i}, C_{i}, S_{i}, D_{i}, I_{i}, \mathrm{Src}_{i}, \mathrm{Desc}_{i}) \mid i \in [1, N]\} \tag{19}$$

where $E_{i}$ represents the event date, $T_{i}$ is the disaster type, $L_{i}$ is the location, $C_{i}$ is the country, $S_{i}$ is the severity, $D_{i}$ is the number of deaths, $I_{i}$ is the number of injuries, $\mathrm{Src}_{i}$ is the source, $\mathrm{Desc}_{i}$ is the event description, and $N$ is the total number of news articles.
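As an illustration of the nine-field tuple in Equation (19), one record can be modeled as a typed structure keyed by the Dataverse column names listed above. This is a sketch under stated assumptions: the class name `DisasterEvent`, the helper `from_row`, and the ISO date format are hypothetical, and the sample row in the usage note is invented for demonstration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DisasterEvent:
    """One tuple of the dataset D; attribute names are illustrative."""
    event_date: date      # E_i,    crd69_eventdate
    disaster_type: str    # T_i,    crd69_disastertype
    location: str         # L_i,    crd69_location
    country: str          # C_i,    crd69_country
    severity: int         # S_i,    crd69_severity
    deaths: int           # D_i,    crd69_deaths
    injuries: int         # I_i,    crd69_injuries
    source: str           # Src_i,  crd69_source
    description: str      # Desc_i, crd69_eventdescription


def from_row(row: dict) -> DisasterEvent:
    """Build an event from a mapping keyed by the Dataverse column names.

    ASSUMPTION: dates arrive as ISO strings (YYYY-MM-DD) and the numeric
    fields as values that int() accepts; the published export may differ.
    """
    return DisasterEvent(
        event_date=date.fromisoformat(row["crd69_eventdate"]),
        disaster_type=row["crd69_disastertype"],
        location=row["crd69_location"],
        country=row["crd69_country"],
        severity=int(row["crd69_severity"]),
        deaths=int(row["crd69_deaths"]),
        injuries=int(row["crd69_injuries"]),
        source=row["crd69_source"],
        description=row["crd69_eventdescription"],
    )
```

A row read with `csv.DictReader` could be passed straight to `from_row`, provided the file uses these column names.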

The structured dataset was analyzed using geospatial and time-series techniques. The geospatial analysis included hotspot detection and spatial autocorrelation to identify regions with high disaster intensity. Time-series analysis involved seasonal decomposition and ARIMA modeling to detect temporal patterns, trends, and cycles in disaster occurrences and severity. ARIMA assumes a stationary time series, requiring non-stationary data to be transformed, often through differencing. While effective for linear trends, differencing struggles with non-linear patterns and complex causal relationships [41,42].

Table 2 presents the summary statistics for the disaster dataset, focusing on the severity of disasters, the number of deaths, and the number of injuries. The dataset comprises 17,884 disaster events, with the severity scale ranging from 0 to 5. The average severity is 2.69, with a standard deviation of 0.92, indicating moderate variability. The number of deaths and injuries exhibits a high degree of variance, with a mean of 727.85 deaths and 18.10 injuries per disaster event. The maximum recorded deaths and injuries are 830,000 and 190,000, respectively, highlighting the catastrophic impact of some disaster events.

Table 3 illustrates the geographical distribution of disaster occurrences, highlighting the top seven countries most frequently affected. The United States leads the count with 6548 disaster events, followed by India and Australia, indicating significant geographical vulnerability in these regions. This spatial pattern may be attributed to climatic factors, population density, and urbanization trends, necessitating targeted disaster preparedness and mitigation strategies.

Table 4 provides a more granular view by presenting the specific locations with the highest number of disaster incidents. Notably, Los Angeles and Florida are prominent disaster hotspots, reflecting their susceptibility to wildfires, hurricanes, and other climatic hazards. The inclusion of “Global” and “Local” as categories suggests widespread media reporting, emphasizing the global relevance of disaster events. The concentration of disaster reports in specific urban areas underscores the need for localized risk management approaches.

Table 5 categorizes disaster events by type, revealing that hurricanes, typhoons, and cyclones collectively represent the most common disaster type, with 5227 occurrences. This is followed by floods and wildfires, emphasizing the impact of extreme weather events likely linked to climate change. The prevalence of earthquakes, landslides, and volcanic eruptions further highlights the significance of geological hazards. Understanding these patterns is crucial for prioritizing disaster risk reduction initiatives and allocating resources effectively.

Table 6 ranks news portals by the volume of disaster-related articles published, with “www.dailymail.co.uk” (accessed 15 March 2025) leading the list. This ranking reflects the media’s role in shaping public perception and awareness of disaster events. The prominence of global news platforms, including BBC, The Guardian, and CBS News, indicates widespread international attention to disaster reporting. Analyzing news coverage patterns can provide valuable insights into information dissemination, influencing public response and policy-making.

The data presented in Table 3, Table 4, Table 5 and Table 6 collectively offer a comprehensive understanding of the spatial-temporal dynamics of disaster occurrences, types, and reporting patterns. The geographical hotspots identified in Table 3 and Table 4 align with known disaster-prone regions, reinforcing the necessity for region-specific disaster preparedness measures. Table 5’s categorization of disaster types aids in risk assessment and resource allocation, while Table 6’s analysis of news coverage highlights the importance of strategic communication in disaster management. Together, these insights contribute to a holistic approach to disaster risk reduction, policy formulation, and community resilience building.

In summary, Figure 5 provides a comprehensive overview of the entire data processing pipeline, including the tools and technologies utilized to generate the new dataset comprising 17,884 rows and 9 columns. As depicted in this figure, unstructured news reports are sourced in real-time via RSS feeds, APIs, and web scraping from 444 online news sources using Microsoft Power Automate. The Power Automate workflow orchestrates the automated acquisition and categorization of disaster-related news by leveraging GPT-based models for classification. Figure 4 illustrates the API invocation of GPT models within the Microsoft Power Automate flow, where disaster-related reports are systematically filtered and categorized. Microsoft Power Automate further distinguishes disaster-related news—belonging to one of the 15 predefined disaster categories—from non-disaster reports such as business, sports, and entertainment news.

Once structured, the processed dataset is stored within Microsoft Dataverse, facilitating seamless data retrieval and analysis. Subsequent analytical processes, including K-means clustering, DBSCAN, STL decomposition, Fourier transform, and ARIMA-based forecasting, are executed within the Microsoft Power BI environment, utilizing Python (version: 3.11) scripts for computational modeling. Finally, Microsoft Power BI disseminates the analytical outputs through the Microsoft Power BI service, making the reports accessible across multiple platforms and form factors, including mobile devices, tablets, and desktops.

This structured and categorized dataset enabled a comprehensive geospatial and temporal analysis of global disaster patterns, contributing to the identification of disaster-prone regions and the development of predictive models for future occurrences and severity levels. The dataset is available from https://github.com/DrSufi/NewsDisaster (accessed 15 March 2025).

5. Results

This section presents a comprehensive analysis of disaster occurrences utilizing advanced geospatial and time series methodologies. The results are systematically categorized into four subsections corresponding to the applied mathematical models: geospatial analysis, seasonal decomposition, Fourier transform, and ARIMA forecasting.

5.1. Geospatial Analysis

5.1.1. K-Means Clustering

K-means clustering analysis was performed on the disaster-related dataset, utilizing the crd69_deaths, crd69_injuries, and crd69_severity columns. The Elbow method was employed to determine the optimal number of clusters. The within-cluster sum of squared errors (WCSS) was computed for different values of K, and the optimal number of clusters was identified at the inflection point, where adding more clusters resulted in minimal improvement. This selection process ensures that the geospatial clustering of disaster events is neither over-segmented nor under-clustered, leading to a meaningful representation of disaster hotspots. With this technique, the optimal number of clusters (K) was found to be 3, as shown in Figure 6.
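The Elbow procedure described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' Power BI script: the data are synthetic two-dimensional points, and the deterministic farthest-point seeding in `init_centroids` is a choice made here for reproducibility rather than anything stated in the text.

```python
import math


def init_centroids(points, k):
    """Deterministic farthest-point seeding (chosen here for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def wcss(points, centroids, labels):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def elbow_curve(points, k_values):
    return {k: wcss(points, *kmeans(points, k)) for k in k_values}


# Synthetic data: three well-separated groups of four points each.
points = [
    (cx + dx, cy + dy)
    for cx, cy in [(0, 0), (10, 10), (20, 0)]
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
]
curve = elbow_curve(points, range(1, 6))
for k, value in curve.items():
    print(k, round(value, 2))
```

On this toy data the WCSS drops sharply up to K = 3 and only marginally afterwards, which is the bend the Elbow method looks for.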

The resulting heatmap visually represents the number of occurrences of each cluster in each month, with colors indicating the cluster counts. The K-means clustering results are represented in Table 7.

The K-means clustering analysis has revealed some interesting insights. For instance, cluster 0 is the most common cluster, occurring every month. This suggests that there is a consistent pattern of disaster-related events that are characterized by the features of cluster 0. Further investigation is needed to determine the specific characteristics of this cluster and the factors that contribute to its prevalence. If five clusters are selected instead, major clusters are observed in North America (USA), South Asia (India), Australia, East Asia (China, Japan), and Western Europe (UK, Spain).

Overall, the K-means clustering analysis has provided a useful way to understand the patterns and trends in the disaster-related dataset. The results of this analysis can be used to inform decision-making and resource allocation for disaster preparedness and response. The identified clusters revealed significant spatial concentrations, indicating regions with higher frequencies of disaster events. These hotspots provide crucial insights into geographical vulnerability and risk management strategies.

5.1.2. DBSCAN Clustering

DBSCAN clustering effectively detected natural clusters without the need to predefine the number of clusters. For DBSCAN, the choice of $\epsilon$ (epsilon) and MinPts (minimum points per cluster) was based on an empirical nearest-neighbor distance plot, where the optimal $\epsilon$ value was set at the point where the distance graph exhibits a sharp bend. This method effectively distinguishes natural clusters of disaster occurrences, particularly in cases where the data distribution is non-uniform and does not conform to pre-defined cluster shapes. DBSCAN’s density-based approach ensures that disaster events in sparsely populated regions are appropriately separated from dense clusters, thereby improving spatial anomaly detection. This approach identified dense regions of disaster occurrences, unveiling complex spatial distributions that were not captured by K-means. The results demonstrated the efficacy of DBSCAN in detecting irregularly shaped clusters, enhancing the understanding of geographical spread and disaster impact zones. As shown in Figure 7, eight clusters were identified during the monitored period.
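The two ingredients described above, the nearest-neighbor distance curve used to pick $\epsilon$ and the density-based expansion itself, can be sketched as follows. This is a minimal pure-Python illustration on synthetic points using Euclidean distance; real latitude/longitude data would normally call for a great-circle distance, and the parameter values shown are assumptions, not those used in the study.

```python
import math

NOISE = -1


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbour.

    Plotting this curve and reading off the sharp bend gives a candidate eps.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # The neighbourhood includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:   # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Synthetic data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point is reported as noise rather than forced into a cluster, which is the behavior that distinguishes DBSCAN from K-means in the discussion above.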

Dense clusters are visible in North America, South Asia, and Australia, consistent with the K-means results.

5.1.3. Spatial Autocorrelation

Spatial autocorrelation analysis was conducted using Geary’s C and Getis-Ord G statistics to measure the spatial patterns of severity levels. The results are as follows:

Geary’s C: $0.970$. This value, close to 1, indicates little spatial autocorrelation: severity levels are distributed almost randomly across the geographical regions analyzed, so the severity of a disaster in one country is not strongly related to the severity of disasters in neighboring or nearby countries.
This result can be interpreted in the context of the global nature of disasters. Disasters can strike anywhere in the world, regardless of the proximity to other disaster-prone regions. Factors such as local geographical conditions, infrastructure, and disaster preparedness can play a significant role in determining the severity of a disaster in a particular country.
Getis-Ord G: $2.351$. This result indicates significant clustering of high severity values, revealing spatial hotspots of more severe disaster occurrences.

These findings provide valuable insights into the spatial distribution and severity patterns of disaster events, facilitating targeted disaster management and mitigation strategies.
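For readers unfamiliar with Geary’s C, the statistic can be computed directly from its standard definition, $C = \frac{(n - 1) \sum_{i} \sum_{j} w_{ij} (x_{i} - x_{j})^{2}}{2 W \sum_{i} (x_{i} - \bar{x})^{2}}$, where $W$ is the sum of all weights. The sketch below is illustrative only: the spatial weight matrix used in the study is not specified, so the binary adjacency weights and the four-region toy values are assumptions.

```python
def gearys_c(values, weights):
    """Geary's C for values x_i and a spatial weight matrix w_ij (list of lists).

    Values near 1 suggest no spatial autocorrelation, values below 1 suggest
    that neighbours are similar, and values above 1 that they are dissimilar.
    """
    n = len(values)
    mean = sum(values) / n
    w_total = sum(sum(row) for row in weights)
    numerator = sum(
        weights[i][j] * (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    denominator = sum((x - mean) ** 2 for x in values)
    return (n - 1) * numerator / (2 * w_total * denominator)


# ASSUMPTION: four regions arranged in a chain, with binary adjacency weights.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = gearys_c([1, 2, 3, 4], chain)        # neighbours are similar
alternating = gearys_c([1, 4, 1, 4], chain)   # neighbours are dissimilar
print(smooth, alternating)
```

A smooth gradient yields a value well below 1 and an alternating pattern a value above 1; the reported 0.970 sits close to the neutral value between the two.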

5.2. Seasonal Decomposition (STL)

Seasonal decomposition was conducted to analyze temporal patterns in disaster occurrences. The STL model decomposed the time series into three components: trend, seasonal, and residual. The results are as follows:

Trend: The long-term progression revealed a discernible pattern of increase in disaster occurrences over the analyzed period.
Seasonal: The seasonal component captured repeating short-term cycles, highlighting periodic behaviors potentially associated with climatic patterns.
Residual: The residual component represented random fluctuations and unexplained variability, confirming the presence of stochastic noise in the time series.

As shown in Figure 8, this decomposition provided a nuanced understanding of temporal dynamics, contributing to improved disaster forecasting and risk assessment.
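The idea of splitting a series into trend, seasonal, and residual parts can be illustrated with a classical additive decomposition based on a centred moving average. Note that this is a simpler stand-in and not STL itself, which fits the components with LOESS smoothing; the weekly period of 7 and the synthetic series are assumptions made for the example.

```python
def decompose(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Trend and residual are None at the edges where the centred window
    does not fit.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: the window has period + 1 points, ends weighted 1/2.
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period

    # Average the detrended values for each position in the cycle.
    by_phase = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            by_phase[t % period].append(series[t] - trend[t])
    means = [sum(v) / len(v) for v in by_phase]
    offset = sum(means) / period          # force the seasonal part to sum to zero
    seasonal = [means[t % period] - offset for t in range(n)]

    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic daily counts: a linear trend plus a zero-mean weekly pattern.
pattern = [3, -1, -2, 0, 1, -3, 2]
series = [0.5 * t + pattern[t % 7] for t in range(28)]
trend, seasonal, residual = decompose(series, period=7)
```

Because the synthetic series is exactly a line plus a repeating pattern, the decomposition recovers both and leaves a residual of zero; real disaster counts would leave a non-trivial residual.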

5.3. Fourier Transform

A Fourier transform analysis was conducted to detect underlying cyclical patterns in the disaster occurrences. The power spectrum revealed dominant frequencies corresponding to periodic behaviors in the time series. Peaks in the power spectrum suggested recurring cycles in disaster events, potentially linked to seasonal climatic factors, as shown in Figure 9. This analysis underscored the significance of cyclical influences on disaster occurrences, informing strategic planning for disaster preparedness and mitigation.
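A power spectrum of this kind can be computed from the discrete Fourier transform of the mean-centred series. The sketch below uses a direct O(n²) transform for self-containment (a fast Fourier transform routine would be used in practice) and a synthetic series with a known seven-step cycle; none of the values come from the study's data.

```python
import cmath
import math


def power_spectrum(series):
    """Power |X_k|^2 of the discrete Fourier transform for k = 0 .. n // 2."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]   # remove the constant (DC) component
    spectrum = []
    for k in range(n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centred)
        )
        spectrum.append(abs(coefficient) ** 2)
    return spectrum


def dominant_period(series):
    """Period (in samples) of the strongest non-constant frequency."""
    spectrum = power_spectrum(series)
    k = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return len(series) / k


# Synthetic series: a constant level plus a cycle that repeats every 7 samples.
series = [10 + 3 * math.sin(2 * math.pi * t / 7) for t in range(70)]
print(dominant_period(series))
```

The peak of the spectrum falls at the frequency index whose period matches the planted cycle, which is how recurring cycles are read off the spectrum in the analysis above.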

5.4. ARIMA Forecasting

An ARIMA model was implemented to capture temporal dependencies and forecast disaster occurrences (as shown in Figure 10). The model order was optimized as $(2, 1, 2)$, representing the auto-regressive, differencing, and moving average terms, respectively. The stationarity of the time series was first assessed using the augmented Dickey–Fuller (ADF) test, which indicated that differencing was required to achieve stationarity. The order was then selected through an iterative optimization process involving the AIC and BIC, combined with autocorrelation function (ACF) and partial autocorrelation function (PACF) plots, which suggested the presence of two significant lagged dependencies in both the autoregressive and moving average components. The forecast closely followed the observed trend, demonstrating the model’s robustness in capturing temporal dynamics. This configuration demonstrated strong predictive performance, with an MSE of 823,761, capturing both short-term fluctuations and long-term temporal dependencies in disaster occurrences. The MSE is the average squared difference between the observed and forecasted values, so lower values reflect higher predictive accuracy.
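Two of the building blocks mentioned above, the first-order differencing implied by $d = 1$ and the MSE used to score the forecast, are simple enough to sketch directly. The helper names and the toy numbers are illustrative assumptions; fitting the autoregressive and moving average terms themselves is left to a statistics library and is not shown.

```python
import math


def difference(series, d=1):
    """Apply d rounds of first-order differencing: y'_t = y_t - y_(t-1)."""
    for _ in range(d):
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series


def mse(observed, forecast):
    """Mean squared error between two equally long sequences."""
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    return sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed)


# Toy numbers, unrelated to the study's data.
print(difference([3, 5, 8, 12]))              # one round of differencing
print(mse([10, 20, 30], [12, 18, 33]))

# The MSE is expressed in squared units of the series; its square root (RMSE)
# is in the original units. For the reported MSE of 823,761:
rmse = math.sqrt(823761)
print(round(rmse, 1))
```

Since the MSE depends on the scale of the series, the RMSE of roughly 908 is best read against the typical magnitude of the forecast series, which is shown in Figure 10.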

5.5. Summary of Findings

The integrated application of geospatial clustering, spatial autocorrelation, seasonal decomposition, Fourier analysis, and ARIMA forecasting provided a comprehensive understanding of the spatial-temporal dynamics of disaster occurrences. The results revealed significant geographical hotspots, seasonal patterns, cyclical behaviors, and reliable temporal dependencies, contributing to enhanced disaster risk assessment and mitigation strategies.

6. Discussion

The findings highlight the efficacy of advanced mathematical modeling techniques in disaster analytics. The integration of geospatial and time series methodologies facilitated a holistic analysis of disaster occurrences, uncovering complex spatial-temporal patterns. The implementation of dual clustering, combining K-means and DBSCAN, is essential for capturing both structured and unstructured spatial patterns in disaster intelligence. K-means clustering efficiently identifies well-defined disaster-prone regions by grouping locations with similar disaster frequencies and severity levels, whereas DBSCAN is employed to detect natural clusters of disaster occurrences without requiring a predefined number of clusters, making it particularly effective for identifying anomalous and irregular disaster distributions in sparsely populated or highly localized regions. The combination of these two clustering techniques ensures a comprehensive geospatial analysis, enabling both systematic hotspot identification and adaptive anomaly detection, thereby improving the accuracy and robustness of disaster intelligence. These insights provide a strategic foundation for policymakers and disaster management agencies to develop proactive mitigation and preparedness measures.

6.1. Temporal Analysis of Disaster Severity Using Heatmaps

To better understand the geospatial distribution of disaster severity across different time periods, heatmaps were generated by segmenting the dataset into three equal intervals. To analyze the temporal distribution of disaster severity, we defined the severity function at time t:

$$S_{t} = f(D_{t}, I_{t}, L_{t}) \tag{20}$$

where:

$S_{t}$ represents the severity of disasters at time t.
$D_{t}$ is the number of deaths at time t.
$I_{t}$ is the number of injuries at time t.
$L_{t}$ is the location of the disaster at time t.

The cumulative severity over a given time window T is:

$$S_{T} = \sum_{t = 1}^{T} S_{t} \tag{21}$$

where $S_{T}$ provides the total severity score over the selected period.

Figure 11 illustrates the progression of disaster occurrences from 27 September 2023 to 24 February 2025. The first subplot (a) presents the cumulative severity across the entire duration, highlighting high-impact disaster zones in North America, South Asia, Australia, and parts of Europe and Africa. The subsequent subplots detail the spatial evolution of disasters across three periods: September 2023–April 2024, April 2024–October 2024, and October 2024–February 2025. Notably, North America and South Asia remain consistently high-risk regions, while fluctuations in disaster intensity are observed in Europe and Australia across different timeframes. The segmentation of disaster occurrences facilitates a temporal understanding of hazard evolution, aiding policymakers in designing region-specific response strategies. These insights, derived from AI-driven disaster intelligence, play a crucial role in early warning systems and proactive disaster management.
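The segmentation into equal intervals and the cumulative sum of Equation (21) can be sketched as a small aggregation routine. This is an illustrative sketch: the form of the function $f$ is not given in the text, so the example simply sums the per-event severity scores, and the dates and scores are invented for demonstration.

```python
from datetime import date


def windowed_severity(events, start, end, n_windows=3):
    """Sum severity scores inside n equal-length windows of [start, end].

    `events` is an iterable of (day, severity) pairs; events outside the
    range are ignored.
    """
    span_days = (end - start).days + 1
    totals = [0] * n_windows
    for day, severity in events:
        if start <= day <= end:
            index = (day - start).days * n_windows // span_days
            totals[index] += severity
    return totals


# Invented example: a 30-day range split into three 10-day windows.
events = [
    (date(2024, 1, 1), 3),
    (date(2024, 1, 10), 2),
    (date(2024, 1, 11), 4),
    (date(2024, 1, 30), 5),
]
totals = windowed_severity(events, date(2024, 1, 1), date(2024, 1, 30))
print(totals, sum(totals))
```

Summing the per-window totals gives the cumulative severity over the whole range, matching the relationship between the per-period subplots and the overall subplot described above.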

6.2. Global Distribution of Key Disaster Types

The disaster-specific heatmaps presented in Figure 12 provide an insightful geospatial perspective on the occurrence patterns of major disaster types worldwide. These heatmaps highlight the high-risk zones for each type of disaster, offering valuable insights for researchers, policymakers, and emergency response teams. Each disaster type d occurring at location l and time t can be represented using an indicator function:

$$X_{d}(t, l) = \begin{cases} 1, & \text{if disaster type } d \text{ occurs at } l \text{ at time } t \\ 0, & \text{otherwise} \end{cases} \tag{22}$$

The total number of occurrences of disaster type d over a period T and across all locations L is:

$$N_{d} = \sum_{t = 1}^{T} \sum_{l = 1}^{L} X_{d}(t, l) \tag{23}$$

where $N_{d}$ represents the cumulative count of disaster type d.

For regional analysis, the probability of disaster type d occurring at location l is given by:

$$P_{d}(l) = \frac{\sum_{t = 1}^{T} X_{d}(t, l)}{N_{d}} \tag{24}$$

which provides insights into high-risk regions for specific disaster types.
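Equations (23) and (24) amount to counting occurrences per disaster type and then normalizing the per-location counts by the type total, so that $P_{d}(l)$ is the share of all type-$d$ occurrences that fall at location $l$. A minimal sketch follows, assuming one record per (type, time, location) occurrence; the function name and the sample records are hypothetical.

```python
from collections import Counter


def occurrence_stats(events):
    """Return (N_d, P_d(l)) from (disaster_type, location) pairs.

    ASSUMPTION: each pair is one occurrence, i.e. one (d, t, l) combination
    for which the indicator X_d(t, l) equals 1.
    """
    events = list(events)
    n_d = Counter(d for d, _ in events)      # Eq. (23): total count per type
    per_location = Counter(events)           # sum over t of X_d(t, l)
    p_d = {
        (d, l): count / n_d[d]               # Eq. (24): share at location l
        for (d, l), count in per_location.items()
    }
    return n_d, p_d


# Invented records for illustration only.
events = [
    ("flood", "India"),
    ("flood", "India"),
    ("flood", "USA"),
    ("wildfire", "USA"),
]
n_d, p_d = occurrence_stats(events)
print(n_d["flood"], p_d[("flood", "India")])
```

For each disaster type the shares across locations sum to 1, which makes the values comparable between frequent and rare disaster types.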

Floods, as seen in Figure 12a, are highly concentrated in South Asia, Southeast Asia, and parts of North America, reflecting regions prone to heavy rainfall and seasonal monsoon impacts. Bangladesh, India, and the United States are among the most affected nations, emphasizing the need for improved flood management infrastructure and early warning systems.

Tropical storms, including hurricanes, typhoons, and cyclones, illustrated in Figure 12b, predominantly impact coastal regions of the Atlantic, the Pacific, and the Indian Ocean. Notably, the eastern United States, the Philippines, and Japan exhibit high densities of storm occurrences. These findings align with historical meteorological data and stress the need for hurricane-resistant infrastructure and coastal disaster preparedness measures.

Wildfires and bushfires, represented in Figure 12c, are most prevalent in Australia, California (USA), and parts of South America. The Australian bushfire season and the increasing frequency of wildfires in California reflect the growing influence of climate change on wildfire occurrences. These patterns underscore the necessity for enhanced forest management policies, controlled burns, and improved firefighting strategies.

Earthquakes, depicted in Figure 12d, follow a well-defined global pattern along tectonic plate boundaries. The Pacific Ring of Fire, covering Japan, Indonesia, the western coasts of North and South America, and parts of the Himalayas, is a dominant earthquake hotspot. The visualization supports the seismic hazard maps used in earthquake risk mitigation and highlights the urgent need for earthquake-resistant urban planning in vulnerable regions.

By analyzing these heatmaps, disaster management authorities can optimize resource allocation, enhance early warning systems, and refine mitigation strategies tailored to specific disaster types. Moreover, these findings serve as a foundation for further scientific research on disaster frequency, intensity, and regional vulnerability.

6.3. Mobile Deployment of the Disaster Intelligence System

The integration of the proposed AI-driven disaster intelligence framework into mobile platforms significantly enhances its accessibility, usability, and real-time responsiveness. Figure 13 illustrates the system’s mobile deployment, allowing users to visualize disaster data through an intuitive geospatial interface. By enabling mobile access, emergency responders, policymakers, and disaster management personnel can receive real-time intelligence, make informed decisions, and respond swiftly without reliance on stationary computing infrastructure.

Mobile deployment ensures that field responders and humanitarian organizations can access disaster intelligence instantly, even in remote or unstable environments. Real-time alerts, geospatial analysis, and predictive insights on mobile devices improve situational awareness, optimize resource allocation, and enhance coordinated response efforts. By placing advanced disaster analytics directly in the hands of decision-makers, the system facilitates rapid, evidence-based decision-making, ultimately strengthening global disaster preparedness and response.

6.4. Selection Criteria and Bias Mitigation in News Source Aggregation

The selection of news sources in this study was guided by popularity, prominence, and geographical diversity, ensuring that disaster reports were collected from a wide range of regions, linguistic backgrounds, and editorial perspectives. The dataset comprises news articles from 444 prominent online news portals, including major global media organizations such as CNN, BBC, The New York Times, and LA Times from the United States, Daily Mail, Reuters, and The Guardian from the United Kingdom, and India Times and NDTV from India, among others. This diverse selection ensures broad geopolitical coverage, minimizing regional biases in disaster intelligence reporting.

To further enhance representativeness, the dataset integrates news sources from multiple linguistic backgrounds, spanning English, Spanish, French, Hindi, Arabic, and Chinese, among others. This linguistic diversity reduces Western-centric biases and ensures that disaster intelligence reflects global events, including those reported in regions where English-language media may not have extensive coverage. However, despite these efforts, certain linguistic biases may still exist, particularly in underrepresented languages with limited digital news availability.

To mitigate ideological and editorial biases, the study incorporated sources reflecting a spectrum of political orientations. Left-leaning news portals such as CNN and BBC were complemented by right-leaning sources such as Fox News and The Epoch Times, ensuring a balanced representation of disaster reports across ideological divides. The aggregation of reports from diverse sources minimizes narrative distortions, as disaster events appearing across multiple news outlets from different editorial standpoints are more likely to represent objective realities rather than editorialized perspectives.

Additionally, the study acknowledges potential regional and cultural biases in disaster reporting. Some regions may receive disproportionately higher media attention, especially when disasters occur in densely populated areas or developed nations. Conversely, disasters in remote or politically unstable regions may be underreported due to limited journalistic presence, censorship, or infrastructural constraints. While our methodology attempts to counteract these imbalances through broad source selection, some structural biases inherent in global news media may still persist.

7. Conclusions

This study introduces an AI-driven disaster intelligence framework that leverages large-scale news media analytics, geospatial clustering, and temporal forecasting to enhance situational awareness and disaster preparedness. By systematically analyzing 1.25 million news articles from 444 sources, the GPT-based model identified 17,884 disaster-related reports spanning 185 countries and 6068 unique locations over 514 days. The GPT-based classification model demonstrates its highest performance in categorizing country information, achieving a precision of approximately 94.5%, a recall of 94.8%, and an F1-score of 94.65%, indicating its strong ability to accurately associate disaster reports with their respective nations. In contrast, the model exhibits its lowest classification accuracy in identifying deaths and injuries, with precision and recall values around 87.5–88.5%, reflecting the inherent difficulty of extracting precise casualty numbers due to vague numerical references in news articles. The integration of K-means clustering, DBSCAN, STL, Fourier transform, and ARIMA modeling provided robust geospatial and temporal intelligence, revealing critical disaster hotspots and enabling high-accuracy disaster forecasting. The results emphasize the significance of news media as a credible open-source alternative to social media-driven disaster intelligence, offering a more reliable approach to real-time disaster monitoring and policy-driven mitigation strategies.

The findings of this research have profound implications for disaster management, emergency response, and predictive modeling. The identification of the USA (6548 disasters), India (1393 disasters), and Australia (1260 disasters) as the most disaster-prone regions provides policymakers with actionable insights for resource allocation and proactive risk mitigation. Furthermore, the classification of hurricanes/typhoons/cyclones (5227 occurrences), floods (3360 occurrences), and wildfires (2724 occurrences) as the most frequent disaster types underscores the importance of climate-adaptive strategies. The study also establishes a comprehensive theoretical framework for integrating multimodal disaster intelligence, bridging the gap between real-time event tracking and AI-powered predictive analytics. Moving forward, the proposed framework can be extended with machine learning-based anomaly detection, multi-source data fusion, real-time disaster response automation and usage of more advanced language models [43,44], ensuring greater resilience and adaptability to emerging global disaster challenges.

Author Contributions

Conceptualization, F.S.; methodology, F.S.; software, F.S.; validation, F.S. and M.A.; formal analysis, F.S.; investigation, F.S.; resources, M.A.; data curation, F.S.; writing—original draft preparation, F.S.; writing—review and editing, F.S. and M.A.; visualization, F.S.; supervision, M.A.; project administration, M.A.; funding acquisition, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors have made the dataset publicly available to support research reproducibility: https://github.com/DrSufi/NewsDisaster (accessed 15 March 2025).

Acknowledgments

The mathematical rigor presented within this paper has led to the development of Coeus Institute’s flagship product GERA, which is being used by federal governments and intelligence agencies worldwide (https://coeus.institute/gera/, accessed 15 February 2025). As the CTO of Coeus Institute, the author, Fahim Sufi, would like to extend his gratitude to all members of Coeus Institute, USA.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

OSDI	Open-Source Disaster Intelligence
NLP	Natural Language Processing
GPT	Generative Pre-Trained Transformer
ARIMA	Auto-Regressive Integrated Moving Average
STL	Seasonal-Trend Decomposition using LOESS
FFT	Fast Fourier Transform
KNN	K-Nearest Neighbors
DBSCAN	Density-Based Spatial Clustering of Applications with Noise
MSE	Mean Squared Error
AIC	Akaike Information Criterion
BIC	Bayesian Information Criterion

References

1. Sufi, F.K.; Alsulami, M. Knowledge Discovery of Global Landslides Using Automated Machine Learning Algorithms. IEEE Access 2021, 9, 131400–131419.
2. Alam, F.; Hassan, Z.; Ahmad, K.; Gul, A.; Reiglar, M.; Conci, N.; Al-Fuqaha, A. Flood Detection via Twitter Streams using Textual and Visual Features. arXiv 2020, arXiv:2011.14944.
3. Murayama, T.; Wakamiya, S.; Aramaki, E.; Kobayashi, R. Modeling the spread of fake news on Twitter. PLoS ONE 2021, 16, e0250419.
4. Amatya, P.; Kirschbaum, D.; Stanley, T.; Tanyas, H. Landslide mapping using object-based image analysis and open-source tools. Eng. Geol. 2021, 282, 106000.
5. Sufi, F.K. A systematic review on the dimensions of open-source disaster intelligence using GPT. J. Econ. Technol. 2024, 2, 62–78.
6. Zhou, J.; Wang, X.; Liu, N.; Liu, X.; Lv, J.; Li, X.; Zhang, H.; Cao, R. Visual and Linguistic Double Transformer Fusion Model for Multimodal Tweet Classification. In Proceedings of the International Joint Conference on Neural Networks, Gold Coast, Australia, 18–23 June 2023.
7. Koshy, R.; Elango, S. Multimodal tweet classification in disaster response systems using transformer-based bidirectional attention model. Neural Comput. Appl. 2023, 35, 1607–1627.
8. Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.; Madotto, A.; Fung, P. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 2023, 55, 1–38.
9. Ranaldi, L.; Pucci, G. When Large Language Models contradict humans? Large Language Models’ Sycophantic Behaviour. arXiv 2023, arXiv:2311.09410.
10. Sahoo, S.R.; Gupta, B.B. Real-time detection of fake account in twitter using machine-learning approach. In Advances in Intelligent Systems and Computing; Springer: Berlin/Heidelberg, Germany, 2021; pp. 149–159.
11. Li, C.; Zhao, Y.; Gao, L.; Ni, Y.; Liu, X. Evolution and response of flood disaster network public opinion based on an event logic graph: A case study of rainstorms in Henan, China. Int. J. Disaster Risk Reduct. 2025, 116, 105027.
12. Li, J.; Zheng, A.; Guo, W.; Bandyopadhyay, N.; Zhang, Y.; Wang, Q. Urban flood risk assessment based on DBSCAN and K-means clustering algorithm. Geomat. Nat. Hazards Risk 2023, 14, 2250527.
13. Chen, N.; Su, C.; Wu, S.; Wang, Y. El Niño index prediction based on deep learning with STL decomposition. J. Mar. Sci. Eng. 2023, 11, 1529.
14. Zou, K.; Cheng, L.; Zhang, Q.; Qin, S.; Liu, P.; Wu, M. Detecting multidecadal variation of short-term drought risk by combining frequency analysis and Fourier transformation methods: A case study in the Yangtze River Basin. J. Hydrol. 2024, 631, 130803.
15. Hamidi, M.R.; Mukhaiyar, U.; Rezeki, E.S.; Dianti, N.R.; Rohat, A.M. The Linear Combination of ARIMA Models in Constructing the Areal Rainfall Using Thiessen Polygon Weighted Method. ITM Web Conf. 2025, 75, 04004.
16. Rabby, Y.; Li, Y. Landslide inventory (2001–2017) of Chittagong hilly areas, Bangladesh. Data 2020, 5, 4.
17. Tamizi, A.; Young, I. A dataset of global tropical cyclone wind and surface wave measurements from buoy and satellite platforms. Sci. Data 2024, 11, 106.
18. Gusenbauer, M.; Haddaway, N. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 2020, 11, 181–217.
19. Halevi, G.; Moed, H.; Bar-Ilan, J. Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation—Review of the Literature. J. Inf. 2017, 11, 823–834.
20. Balakrishnan, V.; Shi, Z.; Law, C.; Lim, R.; Teh, L.; Fan, Y.; Periasamy, J. A Comprehensive Analysis of Transformer-Deep Neural Network Models in Twitter Disaster Detection. Mathematics 2022, 10, 4664.
21. Sufi, F.K. A decision support system for extracting artificial intelligence-driven insights from live twitter feeds on natural disasters. Decis. Anal. J. 2023, 5, 100130.
22. Ge, Q.; Li, J.; Lacasse, S.; Sun, H.; Liu, Z. Recurrent GAN (RGAN-LS) model to augment landslide displacement data. Int. J. Geogr. Inf. Sci. 2024, 16, 4017–4033.
23. Sufi, F.K. AI-SocialDisaster: An AI-based software for identifying and analyzing natural disasters from social media. Softw. Impacts 2022, 13, 100319.
24. Arulmozhivarman, M.; Deepak, G. TPredDis: Most Informative Tweet Prediction for Disasters Using Semantic Intelligence and Learning Hybridizations. In Lecture Notes in Electrical Engineering; Springer: Singapore, 2023.
25. Boros, E.; Nguyen, N.; Lejeune, G.; Coustaty, M.; Doucet, A. Transformer-based Methods with #Entities for Detecting Emergency Events on Social Media, TREC. In Proceedings of the 30th Text REtrieval Conference, Online, 15–24 November 2021.
26. Wahid, J.; Shi, L.; Gao, Y.; Yang, B.; Wei, L.; Tao, Y.; Hussain, S.; Ayoub, M.; Yagoub, I. Topic2Labels: A framework to annotate and classify the social media data through LDA topics and deep learning models for crisis response. Expert Syst. Appl. 2022, 195, 116562.
27. Varshney, A.; Kapoor, Y.; Chawla, V.; Gaur, V. A Novel Framework for Assessing the Criticality of Retrieved Information. Int. J. Comput. Digit. Syst. 2022, 11, 1229–1244.
28. Sufi, F.K.; Khalil, I. Automated Disaster Monitoring From Social Media Posts Using AI-Based Location Intelligence and Sentiment Analysis. IEEE Trans. Comput. Soc. Syst. 2024, 11, 4614–4624.
29. Nimmi, K.; Janet, B.; Selvan, A.; Sivakumaran, N. Pre-trained ensemble model for identification of emotion during COVID-19 based on emergency response support system dataset. Appl. Soft Comput. 2022, 122, 108842.
30. Kirschbaum, D.B.; Stanley, T.; Zhou, Y. Spatial and temporal analysis of a global landslide catalog. Geophys. Res. Lett. 2015, 42, 10782–10789.
31. Sufi, F.K. AI-Landslide: Software for acquiring hidden insights from global landslide data using Artificial Intelligence. Softw. Impacts 2021, 10, 100177.
32. Bhaveeasheshwar, E.; Deepak, G. SMDKGG: A Socially Aware Metadata Driven Knowledge Graph Generation for Disaster Tweets. In Communications in Computer and Information Science; Springer: Cham, Switzerland, 2023.
33. Muhammed, T.S.; Mathew, S.K. The disaster of misinformation: A review of research in social media. Int. J. Data Sci. Anal. 2022, 13, 271–285.
34. Gustafson, A.; Woodworth, A. Ethical concerns and privacy issues in social media data collection. Soc. Media Soc. 2014, 7, 343–348.
35. Vitiugin, F.; Castillo, C. Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers. In Proceedings of the 33rd ACM Conference on Hypertext and Social Media, Barcelona, Spain, 28 June–1 July 2022; pp. 21–31.
36. Chen, Y.; Ji, W. Sparse sensing to dense assessment: Incorporating spatial autocorrelation for assessing flood impacts. Risk Anal. 2024, 44, 2463–2478.
37. Li, M.; Su, M.; Zhang, B.; Yue, Y.; Wang, J.; Deng, Y. Research on a DBSCAN-IForest Optimisation-Based Anomaly Detection Algorithm for Underwater Terrain Data. Water 2025, 17, 626.
38. Ren, S.; Ghazali, K.H. Integrating Time Series Decomposition and Deep Learning: An STL-TCN-Transformer Framework for Landslide Displacement Prediction. Eng. Proc. 2025, 84, 60.
39. Liu, X.; Zhu, J.; Gao, M. A Flexible Time Power Grey Fourier Model for Nonlinear Seasonal Time Series and Its Applications. J. Grey Syst. 2025, 37, 43.
40. Dani, A.T.R. Navigating Samarinda’s climate: A comparative analysis of rainfall forecasting models. MethodsX 2025, 14, 103080.
41. Siu, K.M.; Chan, K.H.; Im, S.K. A Study of Assessment of Casinos’ Risk of Ruin in Casino Games with Poisson Distribution. Mathematics 2023, 11, 1736.
42. Im, S.K.; Chan, K.H. Exploring IoT and Deep Learning for Smart City Technology. In Proceedings of the 2023 IEEE 6th International Conference on Computer and Communication Engineering Technology (CCET), Beijing, China, 4–6 August 2023; pp. 16–21.
43. Xu, C.; Wang, M.; Zhu, S. Tourism sentiment quadruple extraction via new neural ordinary differential equation networks with multitask learning and implicit sentiment enhancement. Expert Syst. Appl. 2025, 270, 126417.
44. Xu, C.; Wang, M.; Ren, Y.; Zhu, S. Enhancing Aspect-based Sentiment Analysis in Tourism Using Large Language Models and Positional Information. arXiv 2024, arXiv:2409.14997.

Figure 1. AI-driven analysis of a vast news corpus for generating geospatial and temporal intelligence for strategic decision-makers.

Figure 2. Conceptual diagram of the overall disaster analytics solution.

Figure 3. Detailed performance evaluation for GPT-based classification and categorization process.

Figure 4. Specially engineered prompt showcasing the GPT-based disaster news classification and categorization process.

Figure 5. End-to-end workflow of data acquisition, analysis, and presentation.

Figure 6. The elbow method determined the optimal number of clusters to be 3.

Figure 7. Spatial distribution of disaster events identified through DBSCAN clustering, showing dense clusters in North America, South Asia, and Australia. This figure highlights the method’s ability to pinpoint areas with frequent disaster occurrences without predetermined cluster sizes.

Figure 8. Seasonal decomposition of disaster occurrence data over time, segmented into trend, seasonal, and residual components. This graphical representation aids in discerning the underlying patterns and cyclical nature of disaster events, facilitating more accurate forecasting.

Figure 9. Fourier transform analysis results showing dominant frequencies in disaster occurrences. Peaks in the power spectrum illustrate recurring cycles, which are critical for understanding seasonal influences on disaster patterns.

Figure 10. Forecasting of disaster occurrences using an optimized ARIMA model (2, 1, 2). The graph demonstrates the model’s effectiveness in predicting future disaster trends based on historical data, underscored by its mean squared error (MSE) performance.

Figure 11. Heatmaps illustrating the severity of disasters across the world in different time segments. The first image represents the entire period, while the remaining three capture distinct sub-periods for comparative analysis.

Figure 12. Heatmaps illustrating the spatial distribution of major disaster types. The first image (a) represents global flood occurrences, while (b) visualizes hurricanes, typhoons, and cyclones. The third (c) and fourth (d) heatmaps depict wildfire (bushfire) and earthquake distributions, respectively.

Figure 13. Implementation of the AI-driven geospatial and temporal intelligence solution on a Samsung Galaxy S23 Ultra, showing the heatmap of landslide incidence during the monitored period.

Table 1. Notation and definitions.

Symbol	Definition
$x_{i}$	Location point $i$ with latitude and longitude
$C_{j}$	Centroid of cluster $j$ in K-means
$S_{j}$	Set of points assigned to cluster $j$ in K-means
$w_{ij}$	Spatial weight between locations $i$ and $j$
$Y_{t}$	Time series value at time $t$ (e.g., occurrences or severity)
$T_{t}$	Trend component representing long-term progression
$S_{t}$	Seasonal component capturing short-term cyclical patterns
$R_{t}$	Residual component representing noise or random fluctuations
$k$	Window length for seasonal period (e.g., 12 for monthly data)
$F(k)$	Frequency component from Fourier Transform
$P(k)$	Power spectrum value indicating dominant frequencies
$p$	Order of Auto-Regressive (AR) term in ARIMA model
$d$	Order of differencing to achieve stationarity
$q$	Order of Moving Average (MA) term in ARIMA model
$c$	Constant term in ARIMA model
$\phi_{i}$	AR parameters capturing temporal dependencies
$\theta_{i}$	MA parameters accounting for noise or shocks
$\epsilon_{t}$	White noise error term, assumed to be normally distributed
$\sigma^{2}$	Variance of the residuals
$L$	Likelihood function for parameter estimation
$AIC$	Akaike Information Criterion for model selection
$BIC$	Bayesian Information Criterion for model selection
$N$	Total number of time points or observations
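
As a reading aid, the symbols in Table 1 can be combined into the standard textbook forms of the models named in this article. The block below is a sketch under stated assumptions, not a reproduction of the article's own equations: the additive form of the decomposition, the unnormalized power spectrum, the primed series $Y'_{t}$ (meaning $Y_{t}$ after $d$ rounds of differencing), and the symbol $m$ (number of estimated parameters) are all introduced here for illustration.

```latex
\begin{align}
  J &= \sum_{j} \sum_{x_{i} \in S_{j}} \lVert x_{i} - C_{j} \rVert^{2}
      && \text{K-means objective (within-cluster sum of squares)} \\
  Y_{t} &= T_{t} + S_{t} + R_{t}
      && \text{additive trend/seasonal/residual decomposition} \\
  P(k) &= \lvert F(k) \rvert^{2}
      && \text{power spectrum, up to a normalization constant} \\
  Y'_{t} &= c + \sum_{i=1}^{p} \phi_{i} Y'_{t-i}
            + \sum_{i=1}^{q} \theta_{i} \epsilon_{t-i} + \epsilon_{t},
      && \epsilon_{t} \sim \mathcal{N}(0, \sigma^{2}) \\
  AIC &= 2m - 2 \ln L, \qquad BIC = m \ln N - 2 \ln L
      && \text{lower values indicate a preferred model}
\end{align}
```

Under these conventions, the ARIMA (2, 1, 2) model referred to in Figure 10 corresponds to $p = 2$, $d = 1$, and $q = 2$.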

Table 2. Summary statistics for disaster data.

Column	Mean	Std. Dev.	Max
deaths	727.85	14,692.89	830,000
injured	18.10	1431.99	190,000
severity	2.69	0.92	5

Table 3. Top 7 countries by number of disasters.

Country	USA	India	Australia	UK	Japan	Iceland	Spain
Number of disasters	6548	1393	1260	797	547	527	496

Table 4. Top 7 locations by number of disasters.

Los Angeles	Florida	Global	Local	Mayotte	California	Grindavik
591	511	317	302	284	275	244

Table 5. Top 7 disaster types by number of disasters.

Hurricane/ Typhoon/ Cyclone	Flood	Wildfire (Bushfire)	Earthquake	Others	Landslide	Volcanic Eruption
5227	3360	2724	2256	1027	892	858

Table 6. Top 10 news portals by number of disaster-related articles.

News Portal	Number of Articles
www.dailymail.co.uk	1540
www.bbc.com	1533
www.theguardian.com	1525
www.cbsnews.com	1231
www.msn.com	1139
www.bbc.co.uk	1057
www.ndtv.com	985
https://abcnews.go.com/	896
https://timesofindia.indiatimes.com/	834
www.livemint.com	803

Table 7. K-means clustering results.

Year	Month	KMeans Cluster	Cluster Count
2023	9	0	2
2023	10	0	518
2023	11	0	632
2023	12	0	966
2023	12	1	1
2024	1	0	724
2024	2	0	1075
2024	3	0	436
2024	4	0	1089
2024	5	0	1218
2024	6	0	482
2024	7	0	1451
2024	8	0	1048
2024	9	0	933
2024	10	0	1047
2024	11	0	677
2024	12	0	1068
2025	1	0	724
2025	2	0	518
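
The rows of Table 7 can be tallied per cluster to see how the events are distributed. The following is a minimal sketch that uses only the values printed in the table; the names `TABLE_7`, `cluster_totals`, and `busiest_month` are hypothetical helpers introduced for illustration and are not part of the authors' software.

```python
from collections import defaultdict

# Rows transcribed from Table 7: (year, month, kmeans_cluster, cluster_count).
TABLE_7 = [
    (2023, 9, 0, 2), (2023, 10, 0, 518), (2023, 11, 0, 632),
    (2023, 12, 0, 966), (2023, 12, 1, 1), (2024, 1, 0, 724),
    (2024, 2, 0, 1075), (2024, 3, 0, 436), (2024, 4, 0, 1089),
    (2024, 5, 0, 1218), (2024, 6, 0, 482), (2024, 7, 0, 1451),
    (2024, 8, 0, 1048), (2024, 9, 0, 933), (2024, 10, 0, 1047),
    (2024, 11, 0, 677), (2024, 12, 0, 1068), (2025, 1, 0, 724),
    (2025, 2, 0, 518),
]


def cluster_totals(rows):
    """Sum the event counts of every K-means cluster over all months."""
    totals = defaultdict(int)
    for _year, _month, cluster, count in rows:
        totals[cluster] += count
    return dict(totals)


def busiest_month(rows):
    """Return ((year, month), total) for the month with the most events."""
    monthly = defaultdict(int)
    for year, month, _cluster, count in rows:
        monthly[(year, month)] += count
    return max(monthly.items(), key=lambda item: item[1])


if __name__ == "__main__":
    totals = cluster_totals(TABLE_7)
    grand_total = sum(totals.values())
    for cluster, total in sorted(totals.items()):
        share = total / grand_total
        print(f"cluster {cluster}: {total} events ({share:.4%})")
    print("busiest month:", busiest_month(TABLE_7))
```

With the values as printed, cluster 0 accounts for 14,608 of the 14,609 tabulated events, cluster 1 holds a single event from December 2023, and July 2024 is the busiest month with 1451 events.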

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sufi, F.; Alsulami, M. AI-Driven Global Disaster Intelligence from News Media. Mathematics 2025, 13, 1083. https://doi.org/10.3390/math13071083
