Toward Knowledge-Enhanced Geohazard Intelligence: A Review of Knowledge Graphs and Large Language Models

Li, Wenjia; Zhou, Yongzhang

doi:10.3390/geohazards7020040

Open Access Review

by Wenjia Li ¹,* and Yongzhang Zhou ¹,²,³

¹ School of Earth Sciences and Engineering, Sun Yat-sen University, Zhuhai 519000, China
² Center for Earth Environment & Resources, Sun Yat-sen University, Zhuhai 519000, China
³ Guangdong Provincial Key Laboratory of Mineral Resources and Geological Processes, Sun Yat-sen University, Zhuhai 519000, China
* Author to whom correspondence should be addressed.

GeoHazards 2026, 7(2), 40; https://doi.org/10.3390/geohazards7020040

Submission received: 12 February 2026 / Revised: 24 March 2026 / Accepted: 27 March 2026 / Published: 7 April 2026

(This article belongs to the Topic Big Data and AI for Geoscience)

Abstract

Geohazards such as landslides, earthquakes, debris flows, and floods are governed by complex interactions among geological, hydrological, and human processes. Traditional data-driven models have improved hazard prediction but often lack interpretability and adaptability. This review examines the evolution of knowledge-guided approaches in geohazard research, highlighting how knowledge representation and artificial intelligence have progressively converged to enhance understanding, reasoning, and model transparency. A bibliometric analysis of 1410 publications indexed in the Web of Science reveals an evolution from early ontology-based knowledge engineering for expert reasoning, to knowledge graph (KG) frameworks enabling multi-source data integration and relational inference, and more recently to large language model (LLM)-augmented systems for automated knowledge extraction and cognitive geoscience. This review synthesizes advances in knowledge representation, knowledge graphs, and LLM-based reasoning, demonstrating how hybrid models that embed physical laws and expert knowledge can improve the interpretability and generalization of machine learning. These developments enable new forms of knowledge-driven geohazard intelligence and support applications in hazard monitoring, early warning, and risk communication. Remaining challenges include semantic fragmentation, limited causal reasoning, and sparse data for extreme events. Future directions require unified knowledge–data–mechanism architectures, causality-aware modeling, and interoperable standards to advance trustworthy and explainable geohazard intelligence.

Keywords:

geohazard intelligence; knowledge graph; ontology; large language model; knowledge-enhanced modeling; generative reasoning

1. Introduction

Geohazards such as landslides, earthquakes, debris flows, and floods rank among the most destructive natural phenomena affecting human settlements and critical infrastructure globally. These events arise from complex interactions among geological structures, hydrological processes, and anthropogenic activities, rendering their prediction and mitigation persistent challenges in Earth system science [1,2,3,4]. Conventional empirical and physical models provide valuable process understanding yet often fail to capture the full spectrum of multi-scale and cascading hazard processes, particularly under accelerating climate change and land-use transformation [5,6].

Over the past two decades, the rapid growth in multi-source geospatial data, including satellite imagery, sensor networks, and geological surveys, has enabled the development of data-driven methods for hazard mapping, susceptibility assessment, and early warning [2,3,7]. Machine learning (ML) and data mining approaches have achieved notable predictive performance across diverse hazard contexts [8,9,10]. However, most such models rely on statistical correlations without an explicit representation of underlying causal mechanisms, limiting interpretability and transferability across regions and hazard types [11,12]. This “black box” nature constitutes a recognized obstacle to the operational adoption of artificial intelligence in geosciences [3,13,14].

In geoscientific reasoning, symbolic knowledge is particularly important because hazard interpretation depends not only on statistical associations, but also on explicit process chains, threshold effects, stratigraphic context, and expert-defined causal relations. Ontologies and rule-based representations therefore do more than organize terminology: they externalize domain assumptions in a form that is transparent, shareable, and auditable across datasets and applications. This is especially valuable for geohazard systems, where sparse observations, spatial heterogeneity, and multi-factor interactions often require interpretable reasoning rather than pattern recognition alone [15,16,17].

The knowledge-guided paradigm has emerged as a promising pathway to address these limitations by integrating domain expertise, physical laws, and semantic knowledge into computational models [17,18]. Within this paradigm, ontologies and knowledge graphs (KGs) provide formal structures to represent entities, processes, and relationships in geohazard systems [16]. These semantic representations enable interoperability among heterogeneous datasets and support reasoning about hazard causation and propagation [15,19]. Research on geohazard ontologies, landslide knowledge graphs, and multi-hazard knowledge integration demonstrates the potential of this approach for interpretable geohazard intelligence [20,21].

The advent of large language models (LLMs), including general-purpose models such as GPT (generative pre-trained transformer) and LLaMA (Large Language Model Meta Artificial Intelligence), creates new opportunities for knowledge extraction and reasoning from unstructured text sources such as field reports, scientific publications, and disaster bulletins. General-purpose LLMs are pretrained on broad web-scale corpora and offer flexible language understanding, but they are not optimized for geoscientific terminology, hazard taxonomies, or physically grounded interpretation. By contrast, domain-adapted models are further aligned through geoscience corpora, task-specific instruction tuning, retrieval over validated repositories, or coupling with ontologies and knowledge graphs, enabling more reliable use in geohazard contexts. These models exhibit semantic understanding and contextual reasoning capabilities; when combined with KGs, they can support automatic knowledge extraction, summarization, and multimodal data fusion [22,23,24]. The coupling of structured and unstructured knowledge represents a conceptual shift from descriptive representation toward knowledge-enhanced reasoning for geohazard prediction and decision support [25].

Despite this progress, current studies remain fragmented by hazard type, data source, and computational paradigm. Some efforts focus on ontology construction, others address knowledge-guided ML- or LLM-based reasoning, often without integrative frameworks. A systematic synthesis is needed to clarify the intellectual evolution, methodological advances, and emerging convergence between symbolic and neural approaches in geohazard research. Existing reviews have examined related aspects of this topic from different perspectives, including artificial intelligence in geoscience more broadly, knowledge graph construction and application in geosciences, and the emerging use of large language models in environmental science [24,26,27]. These studies provide important methodological overviews, but they usually address semantic representation, knowledge-guided modeling, and language-based reasoning as separate strands. By contrast, the present review focuses specifically on geohazard intelligence and treats ontologies, knowledge graphs, knowledge-guided machine learning, and LLM-based reasoning as successive and increasingly integrated stages in the evolution of hazard-oriented knowledge systems. The aim is therefore not to duplicate broader reviews, but to provide a geohazard-centered synthesis of their emerging convergence. Accordingly, this review has three primary objectives:

To trace the historical evolution of knowledge-guided geohazard research from early expert systems to contemporary KG- and LLM-based frameworks;
To synthesize how ontologies and knowledge graphs structure, represent, and operationalize geohazard knowledge, supporting semantic interoperability and causal reasoning;
To analyze how machine learning and large language models integrate with domain knowledge to enhance interpretability, adaptability, and decision-making in geohazard modeling.

This paper is organized as follows. Section 2 presents a bibliometric and thematic evolution analysis. Section 3 focuses on knowledge representation approaches, including ontology design and KG construction. Sections 4 and 5 discuss knowledge-guided modeling and LLM-based reasoning, respectively. Section 6 summarizes challenges and future directions toward unified, interpretable, and adaptive geohazard intelligence systems.

2. Bibliometric Analysis of Knowledge-Guided Approaches to Geohazards

2.1. Search Strategy and Literature Selection

To quantitatively characterize the development of knowledge-guided geohazard research, a comprehensive bibliometric analysis was conducted using publications indexed in the Web of Science Core Collection databases between 1 January 1990 and 20 April 2025. The search aimed to capture publications at the intersection of geohazards and knowledge-centered or language-centered computational approaches. The following query was used:

The Web of Science Core Collection was selected as the primary data source because it provides comparatively standardized bibliographic metadata, controlled author keywords, citation links, and subject classifications, which support reproducible bibliometric analysis. Scopus offers broad journal coverage and is also widely used in bibliometric studies, but its source indexing and keyword normalization are relatively more heterogeneous across disciplines [28]. Google Scholar has even broader coverage and often captures more citations than either Web of Science or Scopus, but its indexing rules are less transparent, and its metadata are less standardized for systematic bibliometric workflows [29]. We therefore used Web of Science to prioritize consistency, comparability, and reproducibility in bibliometric analysis, while acknowledging that this choice may underrepresent some regional, non-English, or non-journal literature.

The retrieved records were screened in four steps. First, records were retrieved from the Web of Science Core Collection using the search query above. Second, duplicate records were removed during export and verified manually. Third, titles, abstracts, and keywords were screened for relevance to geohazard processes and knowledge-based or AI-related methods. Fourth, records such as editorials, corrections, news items, and studies unrelated to geohazards were excluded. The remaining records were then cleaned and standardized for bibliometric analysis. All indexed publication types were retained except book chapters, editorial materials, corrections, and non-indexed preprints.

After screening, the bibliographic records were further cleaned and standardized. Publication year and citation counts were converted into numeric variables, and keywords were normalized through lowercasing and synonym merging to reduce semantic inconsistencies across the dataset.
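The cleaning step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the `SYNONYMS` table, and the function names are all hypothetical, since the review does not publish its merge rules.

```python
from collections import Counter

# Hypothetical synonym table (assumption): maps variants to one canonical keyword.
SYNONYMS = {
    "knowledge graphs": "knowledge graph",
    "kg": "knowledge graph",
    "llm": "large language model",
    "large language models": "large language model",
}


def normalize_keyword(raw: str) -> str:
    """Lowercase, collapse whitespace, and map known synonyms to a canonical form."""
    key = " ".join(raw.lower().split())
    return SYNONYMS.get(key, key)


def keyword_counts(records):
    """Count normalized keywords; each record contributes a keyword at most once."""
    counts = Counter()
    for rec in records:
        counts.update({normalize_keyword(k) for k in rec["keywords"]})
    return counts


# Toy records (invented for illustration), with fields exported as strings.
records = [
    {"year": "2023", "citations": "12", "keywords": ["Knowledge Graphs", "LLM"]},
    {"year": "2019", "citations": "40", "keywords": ["KG", "  Landslide "]},
]

# Convert publication year and citation counts into numeric variables.
for rec in records:
    rec["year"] = int(rec["year"])
    rec["citations"] = int(rec["citations"])

counts = keyword_counts(records)
```

Counting each keyword once per record is a design choice made here so that a record listing both a term and its synonym is not counted twice.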

2.2. Bibliometric Analysis Results

2.2.1. Publication Trends and Evolution

Figure 1 illustrates a clear upward trajectory in publications related to knowledge-guided approaches to geohazards between 1990 and 2024. During the 1990s and early 2000s, the number of papers remained very limited, typically fewer than ten publications per year. Research at that time was exploratory and largely based on expert knowledge and symbolic reasoning. Typical efforts involved the development of rule-based expert systems and ontology frameworks to formalize the understanding of landslide mechanisms, triggering factors, and geomorphological classifications. Although these studies were often qualitative and case-specific, they laid the conceptual groundwork for representing geohazard knowledge in a structured, machine-interpretable form.

From the mid-2000s onward, a gradual increase in publication activity reflected the growing influence of data-driven analysis in geohazard research. The rapid expansion of remote sensing (RS) and monitoring technologies, such as optical imagery, LiDAR (Light Detection and Ranging), InSAR (Interferometric Synthetic Aperture Radar), and rainfall sensor networks, provided massive datasets for quantitative modeling. Statistical and machine learning methods, including logistic regression, support vector machines, and random forests, were increasingly applied to regional landslide susceptibility mapping, debris-flow prediction, and seismic hazard assessment. These techniques substantially improved predictive accuracy and efficiency, although their “black-box” nature limited interpretability and knowledge transparency.

After 2015, the publication curve steepened noticeably, indicating a new phase in which knowledge-based frameworks began to be combined with advanced artificial intelligence. The introduction of knowledge graphs, semantic web standards, and ontology-based data integration promoted the fusion of physical reasoning with deep learning (DL) models. In practical applications, this enabled multi-source information fusion, automated hazard database construction, and explainable susceptibility mapping, linking geological semantics to data-driven inference. This integration marked a methodological shift from empirical learning to hybrid and knowledge-guided intelligence.

Since 2021, the field has expanded rapidly, with publication output peaking at 239 papers in 2023. This surge coincides with the emergence of LLMs and generative artificial intelligence, which have shown potential to support automated knowledge extraction, semantic reasoning, and intelligent hazard interpretation from heterogeneous geoscientific data. Recent studies have explored their use in geohazard knowledge graph construction, multi-modal monitoring analysis, and report generation for disaster early warning. Because the 2025 data cover only publications up to April and therefore represent an incomplete year, they were excluded from the figure to avoid misinterpretation of the publication trend.

Overall, knowledge-guided geohazard research has evolved from symbolic representation and expert reasoning toward data-intensive modeling and, more recently, toward knowledge-enhanced and generative intelligence that integrates geological understanding, physical constraints, and cognitive computation within a unified analytical framework.

2.2.2. Thematic Evolution and Research Focus

To further interpret the knowledge development process, a thematic analysis was performed based on the evolution of author keywords from 1990 to 2025. For this purpose, the study period was divided into four stages: Emergence (1990–2004), Early Growth (2005–2014), Acceleration (2015–2020), and Exponential Growth (2021–present), following both the publication trend and the observed inflection points in annual output (Figure 1).

Keyword data were extracted from bibliographic records and standardized through lemmatization and synonym merging. Keyword statistics and thematic visualizations were implemented using Python-based (Python 3.9) data analysis workflows. Phase-specific word clouds (Figure 2) were generated from keyword frequency statistics to visualize the most frequent and co-occurring terms, while a keyword-phase heatmap (Figure 3) was constructed to quantify the relative importance of major research topics over time.
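The phase assignment and the keyword-phase matrix behind such a heatmap could be computed roughly as below. The phase boundaries come from the text; everything else is an assumption, in particular the normalization (the share of each keyword within its phase), which the review does not specify, and the toy records.

```python
from collections import Counter, defaultdict

# Phase boundaries as stated in the text. The open-ended last phase is capped
# at 2025 here because the search window ends in April 2025.
PHASES = [
    ("Emergence", 1990, 2004),
    ("Early Growth", 2005, 2014),
    ("Acceleration", 2015, 2020),
    ("Exponential Growth", 2021, 2025),
]


def phase_of(year):
    """Return the phase name for a publication year, or None if out of range."""
    for name, start, end in PHASES:
        if start <= year <= end:
            return name
    return None


def keyword_phase_matrix(records):
    """Absolute counts: phase -> Counter of keywords (one count per record)."""
    matrix = defaultdict(Counter)
    for rec in records:
        phase = phase_of(rec["year"])
        if phase is not None:
            matrix[phase].update(set(rec["keywords"]))
    return matrix


def normalize_by_phase(matrix):
    """Assumed normalization: each keyword's share of all keyword counts in its phase."""
    normalized = {}
    for phase, counts in matrix.items():
        total = sum(counts.values())
        normalized[phase] = {kw: n / total for kw, n in counts.items()}
    return normalized


# Toy records (invented), with keywords already normalized.
records = [
    {"year": 1998, "keywords": ["expert system", "ontology"]},
    {"year": 2010, "keywords": ["ontology", "gis"]},
    {"year": 2023, "keywords": ["knowledge graph", "deep learning"]},
    {"year": 2024, "keywords": ["knowledge graph"]},
]

absolute = keyword_phase_matrix(records)
relative = normalize_by_phase(absolute)
```

Keeping both matrices separates growth in publication volume (absolute counts) from shifts in thematic emphasis (within-phase shares).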

As shown in Figure 2, the thematic landscape of knowledge-guided geohazard research has undergone significant transformation. During the early period (1990–2004), the most frequent terms, such as expert system, knowledge base, ontology, neural networks and earthquake, indicate the initial emphasis on symbolic representation and conceptual frameworks. Studies mainly focused on the development of rule-based systems and domain ontologies to organize geohazard knowledge, reflecting the formative stage of knowledge engineering in this field.

In the following decade (2005–2014), terms like ontology, model, flood, earthquake, and GIS (Geographic Information System) became dominant. This shift suggests the early integration of spatial information systems and modeling techniques with semantic frameworks. Research began to focus on hazard mapping, vulnerability assessment, and the management of multi-source geospatial data, marking a transition from conceptual modeling toward practical implementation.

Between 2015 and 2020, the vocabulary expanded notably. Keywords such as disaster management, ontology, model, and natural language processing emerged, indicating the increasing application of data-driven methods and text-based information extraction. This period corresponds to the acceleration phase identified in publication trends, when hybrid frameworks combining semantic representation, machine learning, and remote sensing data became prevalent in disaster monitoring and early warning.

In the most recent period (2021–present), the thematic scope broadened further, as evidenced by the dense, diverse clusters in Figure 2 and the distribution patterns in Figure 3. While the absolute heatmap reflects the rapid increase in publication volume, the normalized heatmap highlights the relative prominence of emerging research themes across phases. High-frequency terms such as deep learning, transformer, knowledge graph, social media, and climate change become increasingly prominent, reflecting the growing integration of data-intensive methods, large language models, and multimodal information sources in geohazard research. These developments indicate an increasing emphasis on combining artificial intelligence techniques with heterogeneous data sources to support more comprehensive hazard assessment, early warning, and risk communication.

As illustrated in Figure 3, ontology, model, and disaster remain core themes throughout the entire timeline in both the absolute and normalized representations, highlighting the enduring importance of structured knowledge representation in geoscientific reasoning. Meanwhile, the increased relative prominence of terms such as deep learning, knowledge graph, and natural language processing in the latest phase indicates a shift towards intelligent and knowledge-enhanced geohazard systems. Overall, the thematic evolution suggests a transition from early symbolic knowledge structures and ontology engineering toward data-intensive and generative paradigms. This transition corresponds closely with the publication growth pattern discussed in Section 2.2.1, suggesting that advances in artificial intelligence, knowledge representation, and data technologies have progressively reshaped the conceptual foundations and methodological orientation of geohazard research.

3. Knowledge Representation in Geohazard

Knowledge representation (KR) provides the conceptual and semantic foundation for organizing, reasoning about, and communicating geohazard information. Building upon the evolutionary trends identified in Section 2, this section reviews how geohazard knowledge representation has evolved from symbolic expert systems to ontology-based semantic frameworks and knowledge graphs. It further distinguishes the main classes of ontologies used in geohazard intelligence and examines how ontology-grounded knowledge graphs extend these semantic structures toward dynamic reasoning and multi-source data integration.

3.1. Evolution of Knowledge Representation in Geohazard Research

Early geohazard research attempted to formalize expert reasoning into symbolic structures, translating tacit understanding into explicit rules. Rule-based and expert systems encoded deterministic statements such as “if rainfall intensity exceeds threshold X, slope instability increases by factor Y,” allowing for transparent reasoning about hazard processes [30,31]. These systems proved useful for codifying domain expertise, yet scope remained limited: knowledge was static, site-specific, and difficult to adapt to new geomorphological contexts.
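A deterministic rule of the kind quoted above can be illustrated with a very small rule engine. This is a sketch only: the variable names, thresholds, and factors are hypothetical placeholders for the "X" and "Y" in the quoted rule and are not taken from any cited system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """IF observation[variable] > threshold THEN multiply the instability index by factor."""
    name: str
    variable: str
    threshold: float
    factor: float


# Hypothetical thresholds and factors, for illustration only.
RULES = [
    Rule("heavy_rainfall", "rainfall_intensity_mm_h", 30.0, 1.5),
    Rule("steep_slope", "slope_angle_deg", 35.0, 1.2),
]


def apply_rules(observation, rules=RULES, baseline=1.0):
    """Return the scaled instability index and the names of the rules that fired."""
    index = baseline
    fired = []
    for rule in rules:
        value = observation.get(rule.variable)
        if value is not None and value > rule.threshold:
            index *= rule.factor
            fired.append(rule.name)
    return index, fired


index, fired = apply_rules({"rainfall_intensity_mm_h": 42.0, "slope_angle_deg": 28.0})
```

Returning the list of fired rules alongside the index is what makes the reasoning transparent; it also exposes the limitation noted above, since every threshold is fixed and site-specific.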

As digital elevation model (DEM), RS, and GIS technologies became widespread in the early 2000s, researchers sought more structured approaches to link conceptual knowledge with spatial data. Ontology engineering emerged to define geohazard-related entities (such as landslides, slopes, materials, and triggering conditions) along with their causal and spatial relationships in a machine-interpretable form [15,32,33].

In geohazard intelligence, ontology frameworks can be differentiated according to their semantic role. Upper- or mid-level semantic foundations provide general categories such as object, event, process, time, and space; in Earth science, the SWEET (Semantic Web for Earth and Environmental Terminology) ontology suite is often used as a broader semantic foundation for domain extension [34,35]. Domain ontologies specialize these abstractions into geoscientific concepts, such as lithology, slope unit, fault, hydrological trigger, and monitoring indicator. Hazard- or event-oriented ontologies focus more explicitly on process chains, for example, trigger–movement–impact–response or precursor–failure relations, and are therefore particularly relevant for geohazard reasoning, warning logic, and disaster-chain representation. By contrast, data and metadata models primarily support interoperability and exchange. GeoSciML is representative of this role: it provides a logical model and GML (Geography Markup Language)/XML (Extensible Markup Language) encoding rules for geological map data, boreholes, geological time scales, and related metadata, but it does not by itself constitute a complete event-centric reasoning framework [19].

Representative geohazard examples illustrate these different roles. The Landslip Ontology was developed to support landslide early warning by linking landslide hazards with warning signs, Earth-observation data, urban data sources, and time-series discovery workflows [36]. More recently, a multilevel geohazard domain ontology has been proposed for landslide applications, organizing knowledge hierarchically to support knowledge sharing, reuse, and semantic consistency across geological hazard contexts [37]. Together, these ontology layers form a semantic foundation through which heterogeneous geohazard observations can be described, aligned, and interpreted more consistently.

The introduction of KGs extended these semantic foundations by linking conceptual entities to observations and events through explicit relationships. Unlike ontologies, which emphasize hierarchical organization, KGs incorporate instance-level data, such as rainfall events, monitoring time series, and mapped landslide occurrences, enabling more flexible reasoning across heterogeneous sources [38]. In geohazard applications, this shift facilitated causal and spatiotemporal inference within unified frameworks [39,40], marking a transition from abstract representations toward knowledge contextualized by real-world evidence.
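The distinction between schema-level semantics and instance-level facts can be made concrete with plain subject–predicate–object triples. The sketch below is illustrative only; every class name, identifier, and relation is hypothetical and does not come from any of the cited knowledge graphs.

```python
# Schema-level (ontology) statements: how classes relate to one another.
SCHEMA = [
    ("Landslide", "subclass_of", "HazardEvent"),
    ("RainfallEvent", "subclass_of", "TriggerEvent"),
]

# Instance-level facts: what a knowledge graph adds on top of the schema.
FACTS = [
    ("landslide_001", "instance_of", "Landslide"),
    ("rain_2020_06", "instance_of", "RainfallEvent"),
    ("landslide_001", "has_cause", "rain_2020_06"),
    ("landslide_001", "occurs_on", "slope_unit_17"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def superclasses(cls):
    """Walk subclass_of links upward (assumes single inheritance and no cycles)."""
    chain = []
    current = cls
    while True:
        parents = objects(SCHEMA, current, "subclass_of")
        if not parents:
            return chain
        current = parents[0]
        chain.append(current)


def causes_of_type(event, cls):
    """Causes of an event whose class is cls or one of its subclasses."""
    result = []
    for cause in objects(FACTS, event, "has_cause"):
        for direct in objects(FACTS, cause, "instance_of"):
            if direct == cls or cls in superclasses(direct):
                result.append(cause)
    return result


triggers = causes_of_type("landslide_001", "TriggerEvent")
```

The query combines both layers: the instance-level `has_cause` edge supplies the evidence, while the schema-level `subclass_of` chain lets a rainfall event be recognized as a trigger.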

This developmental trajectory reflects a cumulative logic rather than a sequence of replacements. Similar patterns are evident in related geoscience domains. In ore-deposit exploration, geological ontologies have gradually expanded into mineralization knowledge graphs capable of integrating structural, geochemical, and remote sensing indicators [3,41]. In contaminated-site management, soil and pollution ontologies have been extended into KGs that support reasoning about pollutant sources, migration pathways, and remediation strategies [42,43]. Landslide susceptibility studies likewise build upon expert classifications and geomorphological ontologies as semantic scaffolds for KG-based hazard identification and deep-learning models [44]. As illustrated in Figure 4, successive paradigms (rule-based systems, ontologies, knowledge graphs, and now cognitive AI) have accumulated into layered representational frameworks, each expanding the conceptual depth and analytical reach of geohazard knowledge.

3.2. Comparative Review: Ontologies vs. Knowledge Graphs

Ontologies and knowledge graphs represent two related yet functionally distinct approaches to geohazard knowledge representation. Ontologies emphasize formal semantics and logical consistency, defining concepts such as “landslide,” “slope,” “trigger,” and “material,” together with relations such as has_cause, occurs_on, and located_in, in a controlled manner [33]. In geohazard research, this semantic rigor is essential for interoperability and consistent domain communication.

Knowledge graphs, by contrast, prioritize connectivity, contextualization, scalability, and instance-level reasoning across heterogeneous data sources. They extend ontological schemas by linking real-world instances (hazard events, monitoring records, environmental conditions) into relational networks capable of supporting causal reasoning and retrieval, and they are better suited to representing evolving event relationships, cross-source evidence, and complex geohazard interactions that extend beyond predefined taxonomic structures. Recent studies have shown that integrating landslide inventories, rainfall time series, and terrain attributes into KG-based structures improves spatiotemporal inference and enhances interpretability [45]. At larger scales, disaster-oriented KGs that fuse geological, hydrological, meteorological, and remote-sensing observations have demonstrated their value in contextual variable retrieval and early-warning applications [46,47,48].

Table 1 compares ontology-based and KG-based frameworks in geohazard research in terms of conceptual focus, representational scale, update mechanism, reasoning capacity, and representative projects. Ontologies provide logical rigor and controlled vocabularies but remain static and difficult to update. Knowledge graphs offer richer instance-level representation and flexible graph reasoning, though they can suffer from inconsistent semantics if not grounded in robust ontological principles. As a result, ontology-driven KG construction, where formal semantics guide large-scale graph instantiation, has emerged as a major research trend in geohazard informatics [38,49,50]. This hybridization illustrates an increasing recognition that neither approach alone is sufficient: each complements the other in building coherent and operational geohazard knowledge systems.
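One simple way formal semantics can guide graph instantiation is to check each candidate triple against the domain and range declared for its relation before it enters the graph. The sketch below assumes a toy schema built on the relation names mentioned earlier; the class assignments and entity identifiers are hypothetical.

```python
# Hypothetical schema: relation -> (required subject class, required object class).
SCHEMA = {
    "has_cause": ("Landslide", "Trigger"),
    "occurs_on": ("Landslide", "Slope"),
    "located_in": ("Slope", "Region"),
}

# Hypothetical typing of known entities.
ENTITY_TYPES = {
    "landslide_001": "Landslide",
    "rain_2020_06": "Trigger",
    "slope_unit_17": "Slope",
    "county_a": "Region",
}


def check_triple(triple, entity_types=ENTITY_TYPES, schema=SCHEMA):
    """Return (is_valid, reason) for a candidate (subject, relation, object) triple."""
    subject, relation, obj = triple
    if relation not in schema:
        return False, f"unknown relation: {relation}"
    domain, range_ = schema[relation]
    if entity_types.get(subject) != domain:
        return False, f"subject must be a {domain}"
    if entity_types.get(obj) != range_:
        return False, f"object must be a {range_}"
    return True, "ok"


# Candidate triples, e.g. as produced by an automated extraction step.
candidates = [
    ("landslide_001", "has_cause", "rain_2020_06"),
    ("slope_unit_17", "has_cause", "county_a"),
    ("landslide_001", "buried", "slope_unit_17"),
]

accepted = [t for t in candidates if check_triple(t)[0]]
rejected = [(t, check_triple(t)[1]) for t in candidates if not check_triple(t)[0]]
```

Rejected triples keep their reason string, so in a semi-automated workflow they could be routed to a human reviewer rather than silently dropped.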

In practice, geohazard knowledge graphs are populated through several complementary workflows. Manual ontology-first construction is typically used when experts first define the schema, entity classes, and relation types for relatively stable knowledge resources; the multilevel landslide ontology proposed by Wen et al. [37] is representative of this knowledge-first strategy. Semi-automated extraction workflows combine ontology-guided extraction, natural language processing, and human validation to populate graphs from hazard reports, inventories, and scientific texts; examples include ontology-based BERT extraction from geological hazard reports [51] and geohazard-chain KG construction for emergency response [45]. Literature-driven KG construction has also become increasingly prominent in landslide studies, where models such as XLNet-BiLSTM-CRF are used to extract entities and relations from geoscience literature and then populate domain-specific graphs [54]. Emerging human-in-the-loop LLM-assisted workflows can further support concept expansion and ontology refinement, as illustrated by a semi-automated flood ontology framework designed for risk communication [55]. From a representational perspective, some geohazard KGs are primarily entity-centric, emphasizing slope units, lithology, faults, sensors, and infrastructure, whereas others are event-centric, focusing on hazard episodes, trigger chains, impacts, and emergency-response relations; the geohazard-chain method [20] is especially useful as an event-centric example.

3.3. Advances in Knowledge Reasoning and Fusion

Recent research has moved beyond static representations toward dynamic reasoning and multi-source knowledge fusion. This shift reflects ongoing efforts to connect symbolic geoscientific understanding with data-driven learning and simulation processes, particularly for processes that evolve over time and vary across spatial scales. A key challenge lies in representing how rainfall infiltration, lithology, slope structure, and human disturbance interact to shape hazard development, and in enabling models to reason over these evolving relationships.

Some studies have explored temporal and dynamic knowledge graphs (TKGs) for representing evolving hazard systems. Unlike traditional ontologies, which primarily encode static semantics, TKGs incorporate temporal attributes, event sequences, and process information, allowing models to reason about process evolution and precursor–trigger–failure chains. To better capture the order and recency of hazard-related events, time-parameterized edges have also been proposed so that sequential relations can be learned explicitly in the graph [56]. Studies by Gottschalk and Demidova [57] and Bordogna et al. [58] have demonstrated how temporal triples and time-dependent spatial relations support near-real-time updates and the prediction of event progression.
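A temporal triple can be represented as a quadruple that adds a timestamp to the usual subject–predicate–object form. The following sketch, with invented events and relation names, shows how such quadruples allow a precursor–trigger–failure sequence to be recovered by ordering; it does not reproduce the data model of any cited TKG.

```python
from datetime import datetime

# (subject, predicate, object, timestamp): hypothetical observations for one slope,
# deliberately stored out of chronological order.
QUADS = [
    ("slope_17", "shows", "crack_widening", datetime(2024, 6, 1, 8)),
    ("slope_17", "fails_as", "landslide_001", datetime(2024, 6, 3, 2)),
    ("slope_17", "receives", "heavy_rainfall", datetime(2024, 6, 2, 14)),
]


def timeline(quads, subject, since=None):
    """Quadruples for one subject, sorted by time, optionally from a start time."""
    selected = [
        q for q in quads
        if q[0] == subject and (since is None or q[3] >= since)
    ]
    return sorted(selected, key=lambda q: q[3])


def precedes(quads, subject, earlier, later):
    """True if object `earlier` is recorded before object `later` for the subject."""
    times = {o: t for s, _, o, t in quads if s == subject}
    return earlier in times and later in times and times[earlier] < times[later]


chain = [obj for _, _, obj, _ in timeline(QUADS, "slope_17")]
recent = timeline(QUADS, "slope_17", since=datetime(2024, 6, 2))
```

The `since` filter is a stand-in for near-real-time use, where only events after the last update need to be re-examined.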

Another direction addresses uncertainty-aware and probabilistic representations. Because geohazards involve substantial variability in both measurement and causality, deterministic semantics often fall short. Probabilistic ontologies and uncertainty-aware knowledge graphs therefore introduce confidence weights for relationships, for example, quantifying the likelihood that rainfall triggered a particular failure, integrating evidence from sensors, reports, and simulations [59,60]. These probabilistic structures help characterize epistemic and aleatory uncertainties, improving the reliability of early-warning and decision-support frameworks.
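As an illustration of confidence-weighted relationships, the sketch below combines several pieces of evidence for a single "rainfall triggered this failure" edge using a noisy-OR rule. Noisy-OR is one common choice and is an assumption here, not necessarily the method used in the cited studies; it treats the evidence sources as independent, and the confidence values are invented.

```python
from math import prod


def noisy_or(confidences):
    """Combine independent confidences that a relation holds: 1 - prod(1 - c_i)."""
    values = list(confidences)
    for c in values:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence out of range: {c}")
    return 1.0 - prod(1.0 - c for c in values)


# Hypothetical evidence for the edge (landslide_001, has_cause, rain_2020_06).
evidence = {
    "sensor": 0.6,      # pore-pressure rise recorded before failure
    "report": 0.5,      # field report attributing the failure to rainfall
    "simulation": 0.2,  # weak support from a hydrological simulation
}

edge = {
    "subject": "landslide_001",
    "predicate": "has_cause",
    "object": "rain_2020_06",
    "confidence": noisy_or(evidence.values()),
}
```

With these values the combined confidence is 1 − 0.4 × 0.5 × 0.8 = 0.84. If the sources are correlated, for example a report that merely restates the sensor reading, the independence assumption overstates the result.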

A further direction, and a key research focus, is the fusion of symbolic structures with machine learning models. By aligning ontology-defined entities such as slope_unit, geological_material, and triggering_rainfall with ML feature representations, hybrid models can incorporate domain constraints directly into the predictive process. This integration enhances interpretability and supports more robust susceptibility mapping and cross-hazard inference, particularly where observational data are limited or spatially heterogeneous [13].

These developments are also important for multi-hazard integration. Ontology-grounded and KG-based frameworks make it possible to align landslides, earthquakes, floods, debris flows, and related disaster-chain processes within shared trigger-process-impact schemas, thereby supporting the representation of cascading hazards and cross-hazard dependencies. For example, ontology-based multi-hazard coupling systems have been developed to simulate earthquake-induced disaster chains and related cascading effects in complex infrastructure settings [61]. At the same time, representing event evolution and uncertainty remains challenging, because geohazard systems involve incomplete observations, competing causal hypotheses, and variable temporal granularity. From a data-infrastructure perspective, ontology-grounded KGs also support FAIR (Findable, Accessible, Interoperable, and Reusable) principles by improving semantic interoperability, provenance tracking, controlled vocabularies, and machine-readable metadata across heterogeneous geohazard datasets.

In general, these developments show that successive knowledge-representation paradigms extend rather than replace one another, forming a cumulative architecture for increasingly relational and evidence-linked geohazard reasoning.

4. Knowledge-Guided and Hybrid Modeling Approaches in Geohazards

While Section 3 focused on the formal representation of geohazard knowledge through ontologies and knowledge graphs, Section 4 examines how such structured knowledge can be operationalized within predictive models. More specifically, knowledge-guided modeling integrates three methodological traditions: physical laws encode mechanistic constraints, ontologies and KGs provide structured semantics, and machine learning extracts nonlinear patterns from complex data. Figure 5 illustrates this tri-layered architecture, linking multi-source geospatial data with a knowledge layer and an application layer for hazard detection, risk mapping, and decision support. In this framework, structured knowledge mediates between raw observations and high-level inference, enabling geohazard models that are more interpretable, physically consistent, and adaptable than purely empirical approaches.

4.1. Evolution from Data-Driven to Knowledge-Guided Modeling

Modeling approaches in geohazard research have evolved from empirical correlation toward knowledge-embedded inference. Early studies [62,63] mainly identified relationships between slope, lithology, and rainfall thresholds but lacked generalizability. With the development of ML, nonlinear models improved predictive performance [64,65,66]. However, these models often remained weakly constrained by physical processes, and their predictions could violate mechanistic expectations or overfit local conditions, thereby limiting interpretability and transferability [13].

In recent years, knowledge-guided modeling has emerged as a unifying paradigm that integrates data-driven inference with explicit domain reasoning. On the one hand, spatio-temporal geohazard KGs have been directly exploited for landslide prediction by aligning multi-source remote sensing, geological, and monitoring data into instance-level graphs that enable causal reasoning and temporal pattern discovery [40,46,52,53,67]. On the other hand, physics-guided ML incorporates physical equations or geomechanical rules as modeling priors, ensuring predictions remain scientifically consistent and physically plausible [68,69,70,71]. In both cases, expert knowledge is no longer used merely to interpret outputs after prediction; instead, it is embedded directly into the inference process itself.

This conceptual shift has led to several model classes for knowledge-enhanced geohazard intelligence. These approaches can be differentiated according to two dimensions: the source of prior knowledge and the mechanism through which that knowledge is injected into learning. In this review, four model classes are particularly useful for organizing the literature: physics-guided ML, theory-guided ML, KG-regularized ML, and graph-based spatiotemporal models (Table 2). At a higher level, however, these classes can be grouped into two broader families. The first family injects knowledge mainly as constraints on learning, including physics-guided and theory-guided approaches. The second family injects knowledge mainly as structural, relational, or reasoning priors, including KG-regularized ML, graph-based spatiotemporal models, and more tightly coupled symbolic–subsymbolic hybrids. This grouping clarifies the internal logic of Section 4: the present section focuses on predictive modeling and knowledge injection, whereas LLM-based generative and reasoning systems are discussed separately in Section 5.

Across these model classes, the overall direction of methodological development is consistent: knowledge is no longer treated as a passive background resource but as an active component of model construction. Whether encoded as physical equations, heuristic rules, semantic relations, or graph structure, prior knowledge increasingly constrains what a model is allowed to learn, how predictions are regularized, and how outputs are interpreted in relation to geohazard processes.

4.2. Constraint-Based Knowledge Injection: Physics- and Rule-Constrained Learning

Physical modeling remains the cornerstone of geohazard simulation because it provides mechanistic insight into slope stability, pore-pressure evolution, and deformation processes. Deterministic approaches such as limit-equilibrium analysis and finite-element modeling are therefore essential for understanding geohazard dynamics [77,78]. However, their practical use is often limited by the need for detailed boundary conditions, constitutive parameters, and site-specific calibration. To bridge this gap between physical rigor and computational adaptability, recent studies increasingly embed prior knowledge directly into learning algorithms. In geohazard modeling, the most representative forms of such constraint-based knowledge injection are physics-informed machine learning and ontology-/rule-constrained learning.

PHYSICS-INFORMED MACHINE LEARNING (PIML): PIML introduces governing equations, such as Darcy’s law or Mohr–Coulomb failure criteria, into neural architectures as soft constraints or auxiliary loss terms [68,69]. In practice, this knowledge can be embedded at multiple levels: as residual penalties in the loss function, as physically meaningful feasibility constraints on the output space, or as surrogate physical modules within the model architecture. This promotes physical plausibility even when observational data are sparse. In rainfall-induced landslide modeling, embedding physics-based rainfall-infiltration or slope-stability constraints has improved generalization and reduced overfitting [79]. More explicitly, physics-guided landslide models have incorporated geomechanical formulations to preserve stability-consistent behavior during learning [70]. Recent physics-informed neural network studies have integrated Newmark slope-stability mechanics or hydro-mechanical formulations so that predictions remain constrained by physically meaningful deformation or factor-of-safety behavior [80]. The major strength of PIML lies in its continuity with process-based understanding, allowing the network to approximate partial differential systems under noisy or incomplete forcing. However, it remains limited in representing qualitative or symbolic knowledge (e.g., lithological categories or expert rules), which constrains its interpretability for complex multi-hazard scenarios.
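The idea of an auxiliary physics loss term can be illustrated with a small sketch. It assumes the standard infinite-slope factor of safety with a Mohr–Coulomb strength criterion as the physical model, and a simple hinge-style penalty that discourages a predicted failure probability from contradicting that factor of safety. The penalty form, the function names, and the parameter values are illustrative assumptions and do not reproduce any of the cited models.

```python
import math


def factor_of_safety(c_eff, phi_deg, gamma, depth, slope_deg, pore_pressure):
    """Infinite-slope factor of safety with Mohr-Coulomb strength.

    Units: c_eff and pore_pressure in kPa, gamma in kN/m^3, depth in m.
    slope_deg must be greater than zero (the shear stress is the divisor).
    """
    beta = math.radians(slope_deg)
    phi = math.radians(phi_deg)
    normal_stress = gamma * depth * math.cos(beta) ** 2
    shear_stress = gamma * depth * math.sin(beta) * math.cos(beta)
    resisting = c_eff + (normal_stress - pore_pressure) * math.tan(phi)
    return resisting / shear_stress


def physics_penalty(p_fail, fs):
    """Soft constraint: penalise a high failure probability on a stable
    slope (FS > 1) and a low one on an unstable slope (FS < 1)."""
    return p_fail * max(0.0, fs - 1.0) + (1.0 - p_fail) * max(0.0, 1.0 - fs)


def cross_entropy(p_fail, label, eps=1e-12):
    """Binary cross-entropy for one sample (label is 0 or 1)."""
    p = min(max(p_fail, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def piml_loss(p_fail, label, fs, lam=1.0):
    """Data term plus a weighted physics term for one sample."""
    return cross_entropy(p_fail, label) + lam * physics_penalty(p_fail, fs)


# Hypothetical soil and slope parameters, for illustration only.
fs_dry = factor_of_safety(5.0, 30.0, 18.0, 2.0, 35.0, pore_pressure=0.0)
fs_wet = factor_of_safety(5.0, 30.0, 18.0, 2.0, 35.0, pore_pressure=15.0)
```

With these hypothetical parameters, raising the pore pressure moves the slope from a factor of safety above 1 to one below 1, so the penalty then favours a high predicted failure probability.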

ONTOLOGY- AND RULE-CONSTRAINED LEARNING: Whereas PIML injects knowledge through mechanistic equations, ontology- and rule-constrained learning injects knowledge through semantic and logical structure. Ontologies and knowledge graphs complement PIML by embedding qualitative semantics into learning. Relationships such as “saturated soils have reduced shear strength” or “cumulative rainfall increases instability probability” can be formalized as logic constraints or regularizers during training [47]. These symbolic priors inject causal reasoning into statistical learning, improving interpretability and transferability across regions and hazard types. Methodologically, such prior knowledge may enter the model through rule-based feature construction, semantic consistency constraints, knowledge-guided initialization, or graph-based relation regularization. This makes ontology-guided approaches particularly valuable when the available prior knowledge is relational, heuristic, or categorical rather than fully expressible as governing equations. However, their performance is often unsatisfactory because they rely on the integrity, consistency, and extensibility of the underlying models, especially in scenarios that require the integration of dynamic or multi-source data.
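A rule such as “cumulative rainfall increases instability probability” can be turned into a regularizer in several ways. The sketch below uses one simple option, a hinge penalty on adjacent rainfall-sorted samples, and assumes the samples describe otherwise comparable slope units. The function name and values are hypothetical.

```python
def monotonicity_penalty(rainfall_mm, p_fail):
    """Penalise predictions that decrease as cumulative rainfall increases.

    Samples are sorted by rainfall and each adjacent pair is compared.
    Pairs with equal rainfall are skipped.
    """
    pairs = sorted(zip(rainfall_mm, p_fail))
    penalty = 0.0
    for (r_lo, p_lo), (r_hi, p_hi) in zip(pairs, pairs[1:]):
        if r_hi > r_lo:
            # Non-zero only when more rain received a lower probability.
            penalty += max(0.0, p_lo - p_hi)
    return penalty


# Hypothetical predictions for three rainfall totals (mm).
consistent = monotonicity_penalty([10, 50, 100], [0.1, 0.3, 0.6])
violating = monotonicity_penalty([10, 50, 100], [0.1, 0.5, 0.2])
```

The penalty is zero whenever predictions respect the rule, so it constrains the model only where the rule is violated.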

In summary, these two paradigms are better understood as complementary rather than competing. Physics-informed models primarily constrain learning through physical process knowledge, whereas ontology- and rule-constrained approaches constrain learning through semantic and logical structure. Their strengths are therefore orthogonal: one improves mechanistic consistency and numerical plausibility, while the other improves semantic alignment and causal transparency. Importantly, this complementarity also reveals the limitation of treating them in isolation. Recent studies suggest that integrating physics-based and semantic constraints into hybrid frameworks may help bridge process-aware modeling and knowledge-guided reasoning [43,67]. Similar hybrid schemes have also been used in landslide displacement forecasting, where machine-learning predictors are coupled with computational models to capture both long-term trends and short-term fluctuations [81]. This observation naturally leads to the next level of development: hybrid symbolic–subsymbolic frameworks in which physical, semantic, and relational knowledge are optimized more jointly rather than injected separately.

4.3. Structure- and Reasoning-Based Hybrid Frameworks

Whereas Section 4.2 treats knowledge as a constraint on learning, this subsection focuses on how knowledge shapes the structure and reasoning mechanisms of the model itself. Recent geohazard studies increasingly move in this direction by integrating symbolic reasoning with subsymbolic learning, thereby constructing hybrid frameworks that combine structured domain knowledge with adaptive data-driven inference [13]. Such integration extends the role of knowledge beyond simple regularization: models are expected not only to predict hazard occurrence, but also to preserve the causal and semantic structures that underlie geohazard processes. Figure 6 captures this idea as a cyclic interaction among symbolic knowledge, hybrid models, and geohazard applications.

Graph-Based Hybrid Architectures: A first route toward hybridization is to inject knowledge directly into model architecture through graph structure. In this context, knowledge graphs do not serve as external databases; rather, they serve as structural priors that represent spatial, geological, and causal relations among entities such as slope units, lithology, hydrological triggers, and historical failures. When these structures are coupled with graph neural networks or related spatiotemporal graph models, they help guide the learning process and reveal dependencies that may otherwise remain obscured in noisy or incomplete datasets. In geohazard applications, recent studies have shown that knowledge-informed graph convolutions can improve both the stability and interpretability of multi-hazard mapping, especially under sparse observational conditions [45,76,82,83]. In other words, graph-based models become genuinely hybrid only when their graph structure is itself knowledge-derived rather than purely data-constructed.
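To make the notion of a knowledge-derived structural prior concrete, the sketch below performs a single neighbourhood-averaging step over edges taken from a knowledge graph. It is a deliberately simplified stand-in for a graph convolution, with no learned weights. The node names, the relation implied by the edges, and the mixing weight are hypothetical.

```python
def propagate(scores, edges, mix=0.5):
    """Mix each node's score with the mean score of its KG neighbours.

    scores: dict mapping node name to a scalar feature.
    edges:  iterable of (node_a, node_b) pairs, treated as undirected.
    """
    neighbours = {node: set() for node in scores}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    updated = {}
    for node, value in scores.items():
        if neighbours[node]:
            mean_nb = sum(scores[n] for n in neighbours[node]) / len(neighbours[node])
            updated[node] = (1.0 - mix) * value + mix * mean_nb
        else:
            updated[node] = value  # isolated nodes keep their own score
    return updated


# Hypothetical slope units; the single edge stands for a KG relation
# such as "shares a drainage path with".
scores = {"slope_A": 1.0, "slope_B": 0.0, "slope_C": 0.0}
kg_edges = [("slope_A", "slope_B")]
smoothed = propagate(scores, kg_edges)
```

Because the edges come from the knowledge graph rather than from spatial proximity in the data, information flows only between units that domain knowledge says are related, which is the distinction drawn in the paragraph above.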

Symbolic-Neural Co-Reasoning: A second route emphasizes the joint operation of symbolic rules and neural learning objectives. This strategy focuses on embedding explicit rules or semantic relations within learning objectives. Logical statements, such as the reduction in shear strength under saturation or the increased likelihood of failure with cumulative rainfall, can be translated into differentiable or probabilistic constraints that regularize neural models. Xue et al. [84] demonstrated how such neuro-symbolic formulations help maintain causal consistency in explanatory visual question answering. In geohazard applications, related ideas have been explored through ontology-guided frameworks in which knowledge graphs impose semantic coherence on machine-learning outputs, ensuring that predictions remain compatible with geomechanical understanding [47]. Despite significant variations in implementation, these approaches share the common objective of aligning model behavior with domain logic, rather than relying simply upon statistical correlations.

Joint Optimization of Data, Physics, and Semantics: From a methodological standpoint, many hybrid frameworks can be interpreted as jointly optimizing empirical fit, physical consistency, and semantic validity. A generic formulation can be written as

$$L_{\mathrm{total}} = L_{\mathrm{data}} + \lambda_{1} L_{\mathrm{physics}} + \lambda_{2} L_{\mathrm{semantic}} \tag{1}$$

where $L_{\mathrm{data}}$ minimizes empirical errors (e.g., root mean squared error or cross-entropy), $L_{\mathrm{physics}}$ enforces mechanistic consistency (e.g., residuals from governing equations), and $L_{\mathrm{semantic}}$ penalizes violations of ontological or logical rules. The coefficients $\lambda_{1}$ and $\lambda_{2}$ balance predictive accuracy and knowledge fidelity. This formulation operationalizes domain understanding as a learnable constraint, not merely an external reference, allowing adaptive harmonization among data, physics, and semantics. Hadid et al. [85] demonstrated that such hybrid optimization substantially improves the stability and interpretability of multi-hazard early warning models.
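The role of the two coefficients in Equation (1) can be seen in a small numerical sketch. The loss values for the two candidate models below are invented for illustration and do not come from any study.

```python
def total_loss(l_data, l_physics, l_semantic, lam1, lam2):
    """Weighted sum of data, physics, and semantic terms as in Equation (1)."""
    return l_data + lam1 * l_physics + lam2 * l_semantic


# Hypothetical candidates: model A fits the data slightly better but
# violates physical and semantic constraints more than model B.
model_a = {"l_data": 0.10, "l_physics": 0.50, "l_semantic": 0.20}
model_b = {"l_data": 0.15, "l_physics": 0.05, "l_semantic": 0.00}

# With both coefficients at zero, only empirical fit matters.
a_data_only = total_loss(**model_a, lam1=0.0, lam2=0.0)
b_data_only = total_loss(**model_b, lam1=0.0, lam2=0.0)

# With unit coefficients, constraint violations dominate the comparison.
a_weighted = total_loss(**model_a, lam1=1.0, lam2=1.0)
b_weighted = total_loss(**model_b, lam1=1.0, lam2=1.0)
```

In this toy comparison, the preferred model switches from A to B once the knowledge terms are weighted, which is the trade-off between predictive accuracy and knowledge fidelity that the coefficients control.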

Overall, hybrid symbolic–subsymbolic frameworks mark a methodological shift in geohazard modeling. They no longer treat knowledge as static background information, but as a dynamic component that interacts continuously with model training and inference. Through iterative feedback, neural models may refine the associations encoded within knowledge graphs, expose inconsistencies in ontologies, and even suggest new relations for expert validation. This also means that hybrid frameworks move geohazard analysis from loosely knowledge-aware prediction toward more tightly constrained, interpretable, and adaptive intelligent systems [25,82,86].

5. Cognitive and Generative Knowledge Systems in Geohazard

Although hybrid modeling approaches have improved predictive accuracy and interpretability, they still treat knowledge largely as a static resource. In most existing systems, knowledge provides structure, constraints, or explanatory context, but it does not itself evolve during reasoning. Recent developments in cognitive AI and LLMs suggest a further shift: knowledge systems can increasingly participate in interpretation, hypothesis generation, and iterative refinement. Against this background, Section 5 examines how cognitive and generative technologies extend knowledge-guided geohazard modeling toward adaptive knowledge evolution.

The significance of this shift lies not simply in the introduction of new tools, but in the changing role of knowledge itself. The knowledge-guided paradigm, built on ontologies, knowledge graphs, and hybrid models, has markedly improved interpretability and data integration [26,27,43,67]. However, its current limitation is that knowledge is still used primarily as a fixed support for modeling. The next challenge is therefore not only how to use knowledge more effectively, but how to enable knowledge to evolve through reasoning, language, and generative cognition. Recent studies show that combining symbolic knowledge representations with generative AI and LLMs can transform geohazard systems into adaptive cognitive frameworks capable of learning, explaining, and hypothesizing [82,86,87]. The following subsections develop this transition from cognitive transformation to LLM-enabled reasoning, and finally to integrated systems of adaptive knowledge evolution.

5.1. Cognitive Transformation of Geohazard Knowledge

In traditional geohazard models, knowledge has mainly played a supporting role: it provides definitions, taxonomies, and well-established causal links that guide data selection or model interpretation. Cognitive approaches assign it a more active role [26,38,67]. Rather than remaining in the background, knowledge becomes a participant in reasoning, validation, and hypothesis formation, directly interacting with both data and model logic during inference. This marks a conceptual shift from knowledge as descriptive context to knowledge as an active component of geohazard intelligence. This transformation has introduced several qualitative changes. First, knowledge moves from description to interpretation: models are expected not only to quantify hazard-related relationships but also to explain why those relationships may hold [11,13,17]. Second, reasoning modules can identify missing links, latent dependencies, or alternative causal pathways that are not immediately evident from observational data alone [20,57,59,67]. Third, cognitive implementations facilitate cross-scale integration, allowing micro-scale properties such as soil characteristics to be connected with meso-scale geomorphic patterns through semantic abstraction and structured representation [19,34,37,87]. In this context, cognition does not simply enrich existing models; it changes the epistemic role of knowledge within the modeling process itself.

As these components interact, geohazard research is gradually forming a knowledge ecosystem in which data, models, and domain semantics co-evolve. Similar developments have emerged in climate and ecological modeling, where reasoning frameworks contribute to the construction of domain ontologies and inform simulation priors [13,17]. To operationalize this shift, however, heterogeneous observations must first be translated into forms that can be reasoned over. Table 3 shows how remote-sensing imagery, terrain models, monitoring streams, geological maps, and textual reports can be mapped into structured knowledge representations. Across these data types, ontologies and knowledge graphs provide the semantic scaffolding through which observations are interpreted, linked, and incorporated into inference and decision-making.

5.2. Large Language Models and Generative Cognition in Geological Reasoning

LLMs provide a new linguistic and analytical layer for geohazard research, particularly in tasks that require connecting observations with conceptual understanding. Unlike domain-specific tools that strictly adhere to predefined patterns, general-purpose large language models can organize scientific information in a more flexible manner, linking empirical descriptions to theoretical contexts through natural language reasoning [41,94]. Their value becomes increasingly apparent when these models are employed in conjunction with domain knowledge rather than in isolation.

With retrieval-augmented generation (RAG), structured prompting, and other grounding techniques, LLMs can integrate information from the scientific literature, sensor observations, and prior models to support interpretation and hypothesis generation in geoscience applications [85,93,95]. Wang et al. [93] showed that an LLM-based workflow can integrate crowdsourced post-earthquake messages into an enhanced Bayesian updating framework for near-real-time estimation of earthquake-induced fatalities. Similarly, Zhu et al. [82] demonstrated that constraining an LLM with a flood-specific knowledge graph within a GIS environment enables the model to produce spatially coherent flood-risk explanations. Together, these studies indicate that well-grounded LLMs can relate observational patterns to domain knowledge and hazard-relevant contextual information in a way that is more consistent with established process understanding, rather than replacing expert judgment.
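The grounding step in such workflows can be sketched without committing to any particular LLM. The example below retrieves knowledge-graph triples by simple keyword overlap and assembles a prompt that carries their identifiers and sources. The triples, identifiers, scoring rule, and prompt wording are toy assumptions, and the call to a language model is intentionally omitted.

```python
# Toy knowledge-graph triples with hypothetical ids and sources.
TRIPLES = [
    {"id": "kg:001", "s": "heavy rainfall", "p": "raises",
     "o": "pore-water pressure", "source": "report-A"},
    {"id": "kg:002", "s": "pore-water pressure", "p": "reduces",
     "o": "shear strength", "source": "report-B"},
    {"id": "kg:003", "s": "earthquake shaking", "p": "triggers",
     "o": "rockfall", "source": "report-C"},
]


def retrieve(question, triples, k=2):
    """Return up to k triples sharing at least one word with the question."""
    q_tokens = set(question.lower().split())

    def score(triple):
        text = f"{triple['s']} {triple['p']} {triple['o']}".lower()
        return len(q_tokens & set(text.split()))

    ranked = sorted(triples, key=score, reverse=True)
    return [t for t in ranked[:k] if score(t) > 0]


def build_prompt(question, facts):
    """Assemble a grounded prompt that lists facts with ids and sources."""
    lines = [
        f"[{t['id']}] {t['s']} {t['p']} {t['o']} (source: {t['source']})"
        for t in facts
    ]
    header = "Answer using only the facts below and cite their ids."
    return "\n".join([header, *lines, f"Question: {question}"])


question = "Why does heavy rainfall destabilise slopes?"
facts = retrieve(question, TRIPLES)
prompt = build_prompt(question, facts)
```

A production system would typically replace the keyword overlap with embedding-based or graph-based retrieval. The point of the sketch is only that every fact passed to the model keeps an identifier that can be traced back to its source.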

Some published studies further illustrate the range of LLM applications in hazard-sensitive contexts. LLM-based text-mining pipelines have been used to identify tropical-cyclone-related flash floods and extract contributing factors from hazard narratives at scale, showing the potential of language models for structured event extraction from disaster records [92]. Human-in-the-loop LLM-assisted workflows have also been applied to semi-automated flood ontology construction from authoritative sources, demonstrating how language models can support structured knowledge population for risk communication rather than only free-text summarization [87]. In related landslide-focused work, multimodal LLM-based approaches have begun to support expert-level landslide image analysis, suggesting a pathway for integrating remote sensing and text within future geohazard reasoning systems [89].

Domain-adapted LLMs developed for geoscience tasks also show promise for multi-source summarization, knowledge-aligned reasoning, and interactive querying. Recent work demonstrates that such models can reconcile textual reports, sensor measurements, and structured vocabularies to build more coherent interpretations of hazardous events. When adapted to geospatial contexts, LLMs can help surface under-recognized patterns by cross-referencing heterogeneous evidence and providing context-aware reasoning [85,95]. Importantly, these systems complement rather than substitute expert knowledge. Their principal function is to help geoscientists incorporate ontology- and KG-based information into reasoning workflows, assess the internal consistency of emerging explanations, and iteratively refine conceptual models of hazard processes.

In addition, the use of LLMs in geohazard contexts raises specific risks for hazard communication, because factual inconsistencies and hallucinations can undermine the trustworthiness of geoscientific knowledge extraction and disaster decision support [96,97,98]. In hazard-sensitive settings, such errors may take the form of unsupported hazard facts or causal links, or of apparently plausible statements being attached to the wrong location, event, or evidence source, which is especially problematic when fast interpretation is required under evolving disaster conditions [96,97]. For this reason, KG grounding, RAG, provenance-linked outputs, and expert validation remain essential, especially in warning and risk-communication settings; recent flood-oriented FKG-RAG systems explicitly combine verified sources, reasoning traces, and mandatory expert approval layers for high-stakes outputs [99]. Practical deployment is also constrained by model size, memory footprint, inference latency, and energy cost. Recent studies on edge-efficient LLM deployment show that compression and quantization are often necessary for resource-constrained settings, while geoscience-oriented lightweight RAG frameworks are being explored precisely to reduce performance–cost trade-offs in domain applications [98,100]. Even so, edge-deployable execution can remain energy-intensive: an empirical CAIN 2025 study found that fetching LLM-generated content from a remote server used 3.5–8.9 times less client-side energy than edge-deployable generation, suggesting that cloud–local hybrid workflows may be more realistic than fully local deployment for many operational geohazard systems [101].
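The combination of KG grounding and expert validation described above can be sketched as a simple triage step: statements extracted from an LLM output are accepted only if they match a verified triple, and everything else is routed to a human reviewer. The exact-match rule and all triples below are hypothetical simplifications.

```python
def triage_claims(claims, verified_triples):
    """Split LLM-asserted (subject, predicate, object) triples into those
    found in the verified knowledge graph and those needing expert review."""
    verified = set(verified_triples)
    supported = [claim for claim in claims if claim in verified]
    needs_review = [claim for claim in claims if claim not in verified]
    return supported, needs_review


# Hypothetical verified knowledge and hypothetical LLM output.
verified_kg = [("heavy rainfall", "raises", "pore-water pressure")]
llm_claims = [
    ("heavy rainfall", "raises", "pore-water pressure"),
    ("heavy rainfall", "triggered", "landslide at site X"),
]
supported, needs_review = triage_claims(llm_claims, verified_kg)
```

Exact matching is deliberately conservative: a plausible but unverified statement, such as one attached to a specific location, is never released without review.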

5.3. Toward Adaptive Knowledge Evolution and Integrated Cognitive Systems

Generative cognition extends knowledge-guided modeling into a knowledge-evolving paradigm. In this view, knowledge is neither static nor predefined but dynamically updated through continuous interaction between reasoning, data, and simulation. The process can be formalized as an adaptive mapping:

$$K_{t+1} = f(K_{t}, D_{t}, R_{t}, G_{t}) \tag{2}$$

where $K_{t}$ is the current knowledge base, $D_{t}$ the observational data, $R_{t}$ the reasoning process, and $G_{t}$ the generative inference produced by models or experts.

For geohazard applications, such adaptive frameworks can generate knowledge from anomalies or previously unrecognized patterns. When unfamiliar forms of compound slope failure or hydrological–geomechanical interactions are detected, the system may not only help identify potential explanatory hypotheses but also estimate their likelihood. Through iterative validation and symbolic updating, the knowledge base can be progressively refined, especially when LLM-assisted workflows are coupled with continuously updated geo-knowledge graphs [85,90]. This integration redefines the understanding of geohazards: geohazard intelligence shifts from analyzing isolated events to modeling how knowledge itself behaves, and how it accumulates, reorganizes, and acquires explanatory value under uncertainty.
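One possible reading of the update in Equation (2) is sketched below. The knowledge base is a dictionary of triples with confidences, the generative step proposes candidate triples, the reasoning step is a consistency check, the data supply observation counts, and an approval callback stands in for expert validation. The confidence formula (a Laplace-smoothed support rate), the threshold, and all names are illustrative assumptions.

```python
def update_knowledge(K, D, R, G, approve, threshold=0.7):
    """Return K_{t+1} given the current state, following Equation (2).

    K: dict mapping (s, p, o) triples to confidence values.
    D: dict mapping triples to (supporting, total) observation counts.
    R: callable(triple, K) -> bool, a consistency check (reasoning step).
    G: iterable of candidate triples (generative step).
    approve: callable(triple, confidence) -> bool, stands in for an expert.
    """
    K_next = dict(K)  # leave the current knowledge base untouched
    for triple in G:
        if not R(triple, K):
            continue  # rejected by the reasoning step
        supporting, total = D.get(triple, (0, 0))
        confidence = (supporting + 1) / (total + 2)  # Laplace smoothing
        if confidence >= threshold and approve(triple, confidence):
            K_next[triple] = confidence
    return K_next


# Hypothetical inputs, for illustration only.
K_t = {("rainfall", "triggers", "landslide"): 0.9}
G_t = [("snowmelt", "triggers", "landslide"),
       ("landslide", "triggers", "rainfall")]
D_t = {("snowmelt", "triggers", "landslide"): (8, 10)}


def no_reversal(triple, K):
    """Reject a hypothesis that reverses an already known relation."""
    s, p, o = triple
    return (o, p, s) not in K


K_next = update_knowledge(K_t, D_t, no_reversal, G_t,
                          approve=lambda triple, confidence: True)
```

In this toy run, the snowmelt hypothesis is added because it passes the consistency check, has enough observational support, and is approved, whereas the reversed relation is rejected before any data are consulted.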

This perspective also connects naturally with broader developments in hybrid physical-data modeling. Physics-informed learning has already shown that mechanistic constraints and observational evidence can jointly refine evolving model states [69]. By analogy, similar principles may guide the evolution of symbolic, semantic, and linguistic knowledge structures. As cognitive and generative systems continue to mature, the boundaries among physical models, knowledge bases, and natural-language reasoning are likely to become progressively less distinct, forming more integrated cognitive infrastructures for hazard monitoring, prediction, and decision support. Even so, adaptive knowledge evolution must remain controlled: newly generated hypotheses and relations still require provenance tracking, uncertainty assessment, and expert validation before they are incorporated into geohazard knowledge systems.

Overall, cognitive and generative systems extend geohazard research from knowledge description to knowledge evolution. By embedding reasoning and language within structured knowledge frameworks, geoscientific intelligence becomes more adaptive, interpretable, and reflective, supporting not only hazard prediction but also the iterative refinement of geoscientific understanding itself.

6. Conclusions and Future Perspectives

The integration of knowledge-guided paradigms, physics-informed modeling and LLM-based reasoning represents a significant advance in geohazard research. Recent studies show how knowledge representation, machine learning, and multimodal data analysis can be combined to improve hazard understanding and prediction. At the same time, this transition reveals several unresolved gaps and practical constraints. To conclude, this section summarizes the main current gaps in the field, highlights near-term opportunities for methodological advances, outlines a longer-term vision for geohazard intelligence, and clarifies the limitations and ethical considerations of this review.

6.1. Current Gaps

Despite rapid progress in knowledge-driven and AI-based geohazard research, several important gaps remain. These gaps arise from semantic inconsistencies across data sources, the limited causal interpretability of machine learning models, computational constraints in large-scale systems, and the scarcity of reliable observations for extreme events.

SEMANTIC FRAGMENTATION AND ONTOLOGY COMPATIBILITY: Geohazard research integrates data from remote sensing, geological surveys, in situ monitoring, and social media, each using different vocabularies and metadata structures. This heterogeneity creates semantic inconsistencies that complicate data integration [19,34,37]. Large language models, when used without explicit domain grounding, also struggle to align textual descriptions with physical processes such as slope failure or seismic rupture. Existing ontologies provide useful foundations [34,35,39]. However, many remain hazard-agnostic and insufficiently harmonized. Divergent classification schemes, such as contrasting landslide typologies [77,102], further complicate interoperability. Ontology-driven knowledge graphs offer a promising direction, but widely accepted standards are still lacking. A related challenge is that many current ontologies and KGs remain hazard-specific, making multi-hazard integration difficult. Representing event evolution, hazard cascades, and uncertainty propagation across linked events remains methodologically demanding, especially when observations are incomplete or temporally uneven. Similar issues arise for probabilistic LLM outputs, whose confidence signals do not necessarily correspond to geoscientific validity.

MODEL INTERPRETABILITY AND CAUSAL UNDERSTANDING: LLMs and other machine learning methods are effective at identifying statistical patterns, but their reasoning remains largely correlation-based rather than mechanistic. This limitation is critical for applications such as landslide or debris-flow warning systems, where understanding the causes of predictions is as important as the predictions themselves [11,13]. Interpretability tools such as attention maps or saliency analysis can identify influential inputs but do not fully explain the physical processes linking them to hazard events [103]. Recent attempts to incorporate physical constraints into neural architectures, such as physics-informed neural networks, can improve the scientific coherence of model output. Yet these methods involve trade-offs: enforcing physical consistency often competes with empirical accuracy and comes with considerable computational cost. Symbolic approaches, including ontology-based frameworks and knowledge graphs, make domain relationships explicit [37,38,67]. However, when these structures are absorbed into high-dimensional embeddings, much of their semantic transparency is lost. For LLM-based systems, this challenge is not only algorithmic but also infrastructural: model size, deployment environment, and energy cost directly affect whether such systems can be used in operational monitoring. In practice, smaller task-specific models, edge deployment, or cloud–local hybrid workflows may therefore be more feasible than frontier-scale models for many geohazard agencies [100,101].

COMPUTATIONAL SCALABILITY AND REAL-TIME INFERENCE: Integrating multiple knowledge sources with high-resolution spatial data and physics-based simulations brings substantial computational costs. LLMs and physics-informed neural networks further increase this burden, limiting their use in real-time hazard monitoring systems [85,104]. To improve efficiency, researchers often simplify models or reduce spatial resolutions. Neural operators and surrogate models have shown promise in reducing computational cost compared with traditional numerical solvers. For example, Fourier Neural Operators have been used to approximate dynamic landslide and debris-flow processes with high efficiency while preserving physical consistency [53,104]. Even so, it remains a major challenge to strike a viable balance between computational efficiency and physical fidelity, particularly when continuous data assimilation and iterative model updates introduce additional overhead.

DATA SCARCITY IN EXTREME EVENTS: Many geohazards, including extreme landslides, significant earthquakes, and catastrophic floods, occur infrequently and yield limited data, resulting in sparse observational records for model training and validation. Spatial variability further complicates this problem, as models trained in one region may perform poorly in another due to differences in geology, climate, or topography. Historical records vary widely in quality and completeness, with many regions lacking systematic monitoring. Social media offers real-time crowdsourced information during disasters, but extracting reliable insights from noisy, unstructured content requires sophisticated processing [93,97]. Synthetic data, transfer learning, and physics-based approaches partially address data limitations, but validating performance for rare events remains problematic. Recent work demonstrates that large language models can extract knowledge from diverse sources and support rapid impact assessment [85,92,93,97]. However, robust validation frameworks for data-scarce conditions require further development.

6.2. Near-Term Opportunities

In the near term, research will mainly focus on improving the integration of knowledge, physical processes, and data-driven learning. Advances in hybrid modeling frameworks and causal inference methods provide practical pathways to enhance the reliability and interpretability of geohazard intelligence systems.

HYBRID ARCHITECTURES FOR KNOWLEDGE-PHYSICS-DATA INTEGRATION: Future geohazard intelligence systems will pursue a closer integration of symbolic knowledge, physical laws and data-driven learning. Rather than linking separate modules, new frameworks may embed physical constraints directly into neural architectures while preserving explicit symbolic reasoning. Physics-informed ML frameworks applied to geotechnical problems have shown improved robustness and accuracy [43,70]. Multimodal large language models that combine text, imagery and sensor data promise comprehensive disaster awareness [89]. Knowledge-graph-constrained approaches retain semantic coherence while extracting evolving event entities [83,86].

CAUSAL DISCOVERY AND MECHANISTIC UNDERSTANDING: Another key direction is the development of causal reasoning frameworks for geohazard systems. Current AI applications often emphasize pattern recognition, while causal relationships between environmental drivers and hazard events remain less explored. Future research may develop automated causal-discovery methods that respect physical constraints, counterfactual reasoning architectures for testing mitigation scenarios, and explicit uncertainty quantification that distinguishes aleatory from epistemic sources. Such advances would shift geohazard intelligence from reactive prediction toward proactive understanding and intervention.
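To make the distinction between aleatory and epistemic uncertainty concrete, the sketch below applies the law of total variance to an ensemble of models. It assumes, for illustration only, that each ensemble member returns a predictive mean and a predictive variance for the same quantity (for example, landslide displacement in mm). The function name and the toy numbers are hypothetical.

```python
from statistics import mean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split ensemble predictive variance into two parts.

    aleatory  = average of the members' own variances (irreducible noise)
    epistemic = variance of the members' means (model disagreement)
    """
    aleatory = mean(member_variances)
    epistemic = pvariance(member_means)
    return {"aleatory": aleatory, "epistemic": epistemic, "total": aleatory + epistemic}


# Three hypothetical ensemble members predicting displacement (mm).
result = decompose_uncertainty(
    member_means=[10.0, 12.0, 14.0],
    member_variances=[4.0, 4.0, 4.0],
)
print(result)
```

Under this decomposition, a large epistemic share indicates that additional data or knowledge constraints could reduce the uncertainty, whereas a large aleatory share reflects variability that more data alone would not remove.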

6.3. Long-Term Vision

In the long term, geohazard intelligence is expected to evolve from isolated predictive models into integrated, knowledge-rich decision-making systems. These systems will integrate heterogeneous observational data, physical understanding, and machine reasoning to support reliable hazard monitoring and risk management. Achieving this vision requires enhanced interoperability and standardized knowledge frameworks. Unified data formats, extensible ontologies, and interoperable model interfaces are crucial for reproducible research and cross-regional comparisons. In this context, knowledge graphs can serve not only as reasoning structures but also as FAIR-oriented data infrastructures. By linking persistent identifiers, controlled vocabularies, provenance-aware metadata, and machine-readable relations across heterogeneous hazard datasets, KGs can improve findability, interoperability, and reuse while supporting reproducible cross-regional geohazard research.
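The role of a knowledge graph as a FAIR-oriented infrastructure can be illustrated with a very small triple store. The Python sketch below links hazard events to controlled-vocabulary types and to provenance records through identifiers, and then answers a query with its sources attached. All identifiers, DOIs, and the `vocab:` and `ex:` terms are hypothetical placeholders; only `rdf:type` and `prov:wasDerivedFrom` correspond to terms from the W3C RDF and PROV-O vocabularies. A production system would use a proper RDF store rather than in-memory tuples.

```python
from collections import defaultdict

# (subject, predicate, object) triples; all identifiers are hypothetical.
TRIPLES = [
    ("event:ls-0001", "rdf:type", "vocab:Landslide"),
    ("event:ls-0001", "ex:triggeredBy", "event:rain-0042"),
    ("event:ls-0001", "prov:wasDerivedFrom", "doi:10.0000/example-inventory"),
    ("event:rain-0042", "rdf:type", "vocab:HeavyRainfall"),
    ("event:rain-0042", "prov:wasDerivedFrom", "doi:10.0000/example-gauge-data"),
]


def build_index(triples):
    """Index triples as subject -> predicate -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        index[subject][predicate].append(obj)
    return index


def triggers_with_provenance(index, event_id):
    """Return each trigger of an event with its vocabulary type and data source."""
    results = []
    for trigger in index[event_id]["ex:triggeredBy"]:
        types = index[trigger]["rdf:type"]
        sources = index[trigger]["prov:wasDerivedFrom"]
        results.append((trigger, types, sources))
    return results


index = build_index(TRIPLES)
print(triggers_with_provenance(index, "event:ls-0001"))
```

Because every answer carries the identifier of the dataset it was derived from, the same query can be repeated against inventories from other regions and the results compared on a like-for-like basis.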

Furthermore, future systems will emphasize human-centered design. Human–AI collaboration frameworks can integrate expert knowledge with automated analysis, enabling scientists and decision-makers to interact with AI systems more transparently and effectively.

6.4. Limitations and Ethical Considerations

LIMITATIONS OF THIS REVIEW: Although this review aims to synthesize recent advances in knowledge-enhanced geohazard intelligence, several limitations should be acknowledged. First, the scope focuses primarily on artificial-intelligence-driven approaches, including ontologies, knowledge graphs, hybrid modeling frameworks, and large language models. Other perspectives, such as traditional expert systems, GIS-based decision-support tools, and cognitive or behavioral studies of hazard perception, are not discussed in detail. Second, the bibliometric analysis is based mainly on publications indexed in the Web of Science Core Collection. While this database provides consistent citation metadata, it may underrepresent relevant studies published in regional journals or the non-English literature. Finally, the rapid development of generative AI and LLM-based systems means that the field is evolving quickly, and some emerging approaches may not yet be fully reflected in the literature reviewed here.

ETHICAL CONSIDERATIONS IN AI-BASED GEOHAZARD SYSTEMS: The increasing use of LLMs and automated analytics in geohazard research raises important ethical questions. First, global geoscience datasets are unevenly distributed. Monitoring networks are concentrated in economically developed regions, while many hazard-prone areas have limited observational data. This imbalance may lead to models that perform poorly in the regions at highest risk. Second, the complexity of modern AI systems may reduce transparency in automated hazard assessments. When AI models influence early warning decisions or disaster response planning, their reasoning processes must remain understandable to scientists and emergency managers. Third, risk communication remains a critical challenge. AI-generated summaries or predictions must be carefully designed to avoid misinterpretation, particularly when communicating hazard information to vulnerable populations with limited access to scientific resources. Ensuring transparency, fairness, and the responsible use of AI will therefore be essential for the future development of geohazard intelligence systems.

Author Contributions

Conceptualization, W.L. and Y.Z.; methodology, W.L. and Y.Z.; software, W.L. and Y.Z.; validation, W.L. and Y.Z.; formal analysis, W.L. and Y.Z.; data curation, W.L. and Y.Z.; writing—original draft preparation, W.L. and Y.Z.; writing—review and editing, W.L. and Y.Z.; visualization, W.L.; supervision, Y.Z.; project administration, W.L. and Y.Z.; funding acquisition, W.L. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation of China, grant number 4250020697.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT 4o for language polishing. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Froude, M.J.; Petley, D.N. Global fatal landslide occurrence from 2004 to 2016. Nat. Hazards Earth Syst. Sci. 2018, 18, 2161–2181.
2. Zhou, Y.; Zhang, L.; Zhang, A.; Wang, J. Geoscience Big Data Mining and Machine Learning; Sun Yat-Sen University Press: Guangzhou, China, 2018.
3. Zhou, Y.; Zuo, R.; Liu, G.; Yuan, F.; Mao, X.; Guo, Y.; Xiao, F.; Liao, J.; Liu, Y. A decade of advances in mathematical geoscience: Big data and artificial intelligence are reshaping geology. Bull. Mineral. Petrol. Geochem. 2021, 40, 556–573.
4. Kirschbaum, D.; Stanley, T.; Zhou, Y. Spatial and temporal analysis of a global landslide catalog. Geomorphology 2015, 249, 4–15.
5. Gariano, S.L.; Guzzetti, F. Landslides in a changing climate. Earth-Sci. Rev. 2016, 162, 227–252.
6. Emberson, R.; Kirschbaum, D.B.; Amatya, P.; Tanyas, H.; Marc, O. Insights from the topographic characteristics of a large global catalog of rainfall-induced landslide event inventories. Nat. Hazards Earth Syst. Sci. 2022, 22, 1129–1149.
7. Zhou, Y.; Zuo, R. Application of Big Data Mining, Machine Learning and Artificial Intelligence in Ore Deposits; MDPI: Basel, Switzerland, 2025; 222p.
8. Gill, J.C.; Malamud, B.D. Anthropogenic processes, natural hazards, and interactions in a multi-hazard framework. Earth-Sci. Rev. 2017, 166, 246–269.
9. Reichenbach, P.; Rossi, M.; Malamud, B.D.; Mihir, M.; Guzzetti, F. A review of statistically-based landslide susceptibility models. Earth-Sci. Rev. 2018, 180, 60–91.
10. Merghadi, A.; Yunus, A.P.; Dou, J.; Whiteley, J.; ThaiPham, B.; Bui, D.T.; Avtar, R.; Abderrahmane, B. Machine learning methods for landslide susceptibility studies: A comparative overview of algorithm performance. Earth-Sci. Rev. 2020, 207, 103225.
11. Karpatne, A.; Atluri, G.; Faghmous, J.H.; Steinbach, M.; Banerjee, A.; Ganguly, A.; Shekhar, S.; Samatova, N.; Kumar, V. Theory-guided data science: A new paradigm for scientific discovery from data. IEEE Trans. Knowl. Data Eng. 2017, 29, 2318–2331.
12. Bergen, K.J.; Johnson, P.A.; de Hoop, M.V.; Beroza, G.C. Machine learning for data-driven discovery in solid Earth geoscience. Science 2019, 363, eaau0323.
13. Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat, F. Deep learning and process understanding for data-driven Earth system science. Nature 2019, 566, 195–204.
14. Gil, Y.; Pierce, S.A.; Babaie, H.; Banerjee, A.; Borne, K.; Bust, G.; Cheatham, M.; Ebert-Uphoff, I.; Gomes, C.; Hill, M.; et al. Intelligent systems for geosciences: An essential research agenda. Commun. ACM 2019, 62, 76–84.
15. Kuhn, W. Geospatial semantics: Why, of what, and how? In Journal on Data Semantics III; Springer: Berlin/Heidelberg, Germany, 2005; pp. 1–24.
16. Janowicz, K.; Hitzler, P.; Adams, B.; Kolas, D.; Vardeman, C. Five stars of linked data vocabulary use. Semant. Web 2014, 5, 173–176.
17. Willard, J.; Jia, X.; Xu, S.; Steinbach, M.; Kumar, V. Integrating scientific knowledge with machine learning for engineering and environmental systems. ACM Comput. Surv. 2022, 55, 1–37.
18. Jia, X.; Willard, J.; Karpatne, A.; Read, J.S.; Zwart, J.A.; Steinbach, M.; Kumar, V. Physics-guided machine learning for scientific discovery: An application in simulating lake temperature profiles. ACM/IMS Trans. Data Sci. 2021, 2, 1–26.
19. Richard, S.M.; Model, C.D.; Testbed Working Group. GeoSciML-A GML Application for Geoscience Information Interchange. In Digital Mapping Techniques ’06; USGS Open File Report 2007-1285; U.S. Geological Survey: Reston, VA, USA, 2007.
20. Qiu, Q.J.; Wu, L.; Ma, K.; Xie, Z.; Tao, L. A knowledge graph construction method for geohazard chain for disaster emergency response. Earth Sci. 2023, 48, 1875–1891.
21. Zhang, Y.; Chen, Y.; Wang, J.; Pan, Z. Unsupervised deep anomaly detection for multi-sensor time-series signals. IEEE Trans. Knowl. Data Eng. 2022, 35, 2118–2132.
22. Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the opportunities and risks of foundation models. arXiv 2022, arXiv:2108.07258.
23. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A survey of large language models. arXiv 2023, arXiv:2303.18223.
24. Raeissi, M.M.; Knapen, R. Applications of Generative Large Language Models in Environmental Science: A Systematic Review. Adv. Environ. Eng. Res. 2025, 6, 028.
25. Yang, L.; Chen, H.; Li, Z.; Ding, X.; Wu, X. Give us the facts: Enhancing large language models with knowledge graphs for fact-aware language modeling. arXiv 2024.
26. Ma, X. Knowledge Graph Construction and Application in Geosciences: A Review. Comput. Geosci. 2022, 161, 105082.
27. Zhao, T.; Wang, S.; Ouyang, C.; Chen, M.; Liu, C.; Zhang, J.; Yu, L.; Wang, F.; Xie, Y.; Li, J.; et al. Artificial intelligence for geoscience: Progress, challenges, and perspectives. Innovation 2024, 5, 100691.
28. Pranckutė, R. Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World. Publications 2021, 9, 12.
29. Martín-Martín, A.; Orduna-Malea, E.; Thelwall, M.; Delgado López-Cózar, E. Google Scholar, Web of Science, and Scopus: A systematic comparison of citations in 252 subject categories. J. Informetr. 2018, 12, 1160–1177.
30. Carrara, A.; Cardinali, M.; Guzzetti, F.; Reichenbach, P. GIS techniques in mapping landslide hazard. In Geographical Information Systems in Assessing Natural Hazards; Carrara, A., Guzzetti, F., Eds.; Springer: Berlin/Heidelberg, Germany, 1991; pp. 135–175.
31. Zhou, C.H.; Lee, C.F.; Li, J.; Xu, Z.W. On the spatial relationship between landslides and causative factors on Lantau Island, Hong Kong. Geomorphology 2002, 43, 197–207.
32. Gruber, T.R. A translation approach to portable ontology specifications. Knowl. Acquis. 1993, 5, 199–220.
33. Guarino, N. Formal ontology and information systems. In Proceedings of the FOIS’98; IOS Press: Amsterdam, The Netherlands, 1998; pp. 3–15.
34. Raskin, R.; Pan, M. Knowledge representation in the semantic web for Earth and environmental terminology (SWEET). Comput. Geosci. 2005, 31, 1119–1125.
35. SWEET Team/ESIPFed. SWEET: Semantic Web for Earth and Environment Technology. Available online: http://sweetontology.net/sweetAll (accessed on 1 June 2025).
36. Phengsuwan, J.; Shah, T.; James, P.; Thakker, D.; Barr, S.; Ranjan, R. Ontology-based discovery of time-series data sources for landslide early warning system. Computing 2020, 102, 745–763.
37. Wen, M.; Qiu, Q.; Zheng, S.; Ma, K.; Zheng, S.; Xie, Z.; Tao, L. Construction and application of a multilevel geohazard domain ontology: A case study of landslide geohazards. Appl. Comput. Geosci. 2023, 20, 100134.
38. Hogan, A.; Blomqvist, E.; Cochez, M.; D’amato, C.; De Melo, G.; Gutierrez, C.; Kirrane, S.; Gayo, J.E.L.; Navigli, R.; Neumaier, S.; et al. Knowledge graphs. ACM Comput. Surv. 2021, 54, 1–37.
39. Sen, M.; Duffy, T. GeoSciML: Development of a generic geoscience markup language. Comput. Geosci. 2005, 31, 1095–1103.
40. Qiu, Q.; Xie, Z.; Ma, K.; Tao, L.; Zheng, S. NeuroSPE: A neuro-net spatial relation extractor for natural language text fusing gazetteers and pre-trained models. Trans. GIS 2023, 27, 1485–1510.
41. Zhang, Q.; Zhou, Y.; Guo, L.; Yuan, Q.; Yu, P.; Wang, H.; Zhu, B.; Han, F.; Long, S. Intelligent applications of knowledge graphs in mineral exploration: A case study of the Qin–Hang metallogenic belt porphyry copper deposit. Earth Sci. Front. 2024, 31, 7–15.
42. Han, F.; Deng, Y.; Liu, Q.; Zhou, Y.; Wang, J.; Huang, Y.; Zhang, Q.; Bian, J. Construction and application of the knowledge graph method in management of soil pollution in contaminated sites: A case study in South China. J. Environ. Manag. 2022, 319, 115685.
43. Han, Y.; Semnani, S.J. Integration of Physics-Based and Data-Driven Approaches for Landslide Susceptibility Assessment. Int. J. Numer. Anal. Methods Geomech. 2025, 49, 3060–3097.
44. Ji, J.; Zhou, Y.; Cheng, Q.; Jiang, S.; Liu, S. Landslide susceptibility mapping based on deep learning algorithms using. Land 2023, 12, 1125.
45. Qiu, Q.; Xie, Z.; Zhang, D.; Ma, K.; Tao, L.; Tan, Y.; Zhang, Z.; Jiang, B. Knowledge graph for identifying geological disasters by integrating computer vision with ontology. J. Earth Sci. 2023, 34, 1418–1432.
46. Ge, X.; Yang, Y.; Chen, J.; Li, W.; Huang, Z.; Zhang, W.; Peng, L. Disaster prediction knowledge graph based on multi-source spatio-temporal information. Remote Sens. 2022, 14, 1214.
47. Wu, Q.; Xie, Z.; Tian, M.; Qiu, Q.; Chen, J.; Tao, L.; Zhao, Y. Integrating Knowledge Graph and Machine Learning Methods for Landslide Susceptibility Assessment. Remote Sens. 2024, 16, 2399.
48. Sajjadian, M.; Scheider, S. Geodata source retrieval by multilingual/semantic query expansion: The case of Google Translate and WordNet. Agil. GISci. Ser. 2022, 3, 60.
49. Ji, S.; Pan, S.; Cambria, E.; Marttinen, P.; Yu, P.S. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 494–514.
50. Du, W.; Liu, C.; Xia, Q.; Wen, M.; Hu, Y.; Chen, Z.; Xu, L.; Zhang, X.; Terfa, B.K.; Chen, N. OFPO & KGFPO: Ontology and knowledge graph for flood process observation. Environ. Model. Softw. 2025, 185, 106317.
51. Ma, K.; Tian, M.; Tan, Y.; Qiu, Q.; Xie, Z.; Huang, R. Ontology-Based BERT Model for Automated Information Extraction from Geological Hazard Reports. J. Earth Sci. 2023, 34, 1390–1405.
52. Chen, L.; Ge, X.; Yang, L.; Li, W.; Peng, L. An improved multi-source data-driven landslide prediction method based on spatio-temporal knowledge graph. Remote Sens. 2023, 15, 2126.
53. Chen, L.; Peng, L. Improving landslide prediction: Innovative modeling and evaluation of landslide scenario with knowledge graph embedding. Remote Sens. 2024, 16, 145.
54. Sun, Q.; Ding, Y.; Hou, J.; Zhu, Q.; Wu, Y.; Wu, T.; Wang, X.; Zhao, X.; Shao, S. LHAKG: A knowledge graph construction framework for landslide hazard assessment by using XLNet-BiLSTM-CRF from geoscience literature. Int. J. Digit. Earth 2025, 18, 2577292.
55. Li, J.; Qin, J.; Kang, K.; Liang, M.; Liu, K.; Ding, X. Enhanced Spatiotemporal Landslide Displacement Prediction Using Dynamic Graph-Optimized GNSS Monitoring. Sensors 2025, 25, 4754.
56. Chen, X.; Hu, D.; Zhang, L.; Wu, Y.; Dai, K.; Feng, Y.; Xu, Q. TPE: Time-parameterized edge for sequential link prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management; ACM: Singapore, 2020; pp. 1963–1972.
57. Gottschalk, S.; Demidova, E. EventKG: A multilingual event-centric temporal knowledge graph. In European Semantic Web Conference; Springer: Berlin/Heidelberg, Germany, 2018; pp. 272–287.
58. Bordogna, G.; Frigerio, L.; Kliment, T.; Brivio, P.A.; Hossard, L.; Manfron, G.; Sterlacchini, S. “Contextualized VGI” creation and management to cope with uncertainty and imprecision. ISPRS Int. J. Geo-Inf. 2016, 5, 234.
59. Laskey, K.B. MEBN: A language for first-order Bayesian knowledge bases. Artif. Intell. 2008, 172, 140–178.
60. Berg, R.; Kipf, T.N.; Welling, M. GCMC: Graph convolutional matrix completion. arXiv 2017.
61. Gu, Y.; Wang, C.; Liu, Y.; Zhou, R. An ontology-based multi-hazard coupling accidents simulation and deduction system for underground utility tunnel: A case study of earthquake-induced disaster chain. Reliab. Eng. Syst. Saf. 2025, 253, 110559.
62. Guzzetti, F.; Reichenbach, P.; Cardinali, M.; Galli, M.; Ardizzone, F. Probabilistic landslide hazard assessment at the basin scale. Geomorphology 2006, 72, 272–299.
63. Dou, J.; Yunus, A.P.; Bui, D.T.; Merghadi, A.; Sahana, M.; Zhu, Z.; Chen, C.W.; Khosravi, K.; Yang, Y.; Pham, B.T. Assessment of advanced random forest and decision tree algorithms for modeling rainfall-induced landslide susceptibility in the Izu-Oshima Volcanic Island, Japan. Sci. Total Environ. 2019, 662, 332–346.
64. Hong, H.; Liu, J.; Zhu, A.X. Modeling landslide susceptibility using LogitBoost alternating decision trees and forest by penalizing attributes with the bagging ensemble. Sci. Total Environ. 2018, 644, 1108–1119.
65. Li, J.; Zhang, J.; Wang, L.; Zhao, A. A hierarchical spatiotemporal data model based on knowledge graphs for representation and modeling of geohazards. Sustainability 2024, 16, 10271.
66. Wang, Y.; Fang, Z.; Hong, H.; Peng, L. Flood susceptibility mapping using convolutional neural network frameworks. J. Hydrol. 2020, 582, 124482.
67. Wang, Z.; Li, W.; Tang, C. Ontology-based semantic reasoning for multi-hazard knowledge integration and risk assessment. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103285.
68. Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440.
69. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
70. Pei, T.; Qiu, T.; Shen, C. Landslide susceptibility mapping using physics-guided machine learning: A case study of a debris flow event in the Colorado Front Range. Acta Geotech. 2024, 19, 6617–6641.
71. Pei, T.; Maroufi, M.; Shen, C.; Tian, Y. Physics-Informed Machine Learning Framework for Predicting Rainfall-Induced Shallow Landslides in the Colorado Front Range. In Proceedings of the Geo-Extreme 2025, Long Beach, CA, USA, 2–5 November 2025; pp. 97–106.
72. Monaco, S.; Apiletti, D.; Malnati, G. Theory-Guided Deep Learning Algorithms: An Experimental Evaluation. Electronics 2022, 11, 2850.
73. Wu, R.; Huang, M.; Ma, H.; Huang, J.; Li, Z.; Mei, H.; Wang, C. A Multi-Temporal Knowledge Graph Framework for Landslide Monitoring and Hazard Assessment. GeoHazards 2025, 6, 39.
74. Belvederesi, G.; Tanyas, H.; Lipani, A.; Dahal, A.; Lombardo, L. Distribution-agnostic landslide hazard modelling via Graph Transformers. Environ. Model. Softw. 2025, 183, 106231.
75. Yang, C.; Yin, Y.; Zhang, J.; Ding, P.; Liu, J. A graph deep learning method for landslide displacement prediction based on global navigation satellite system positioning. Geosci. Front. 2024, 15, 101690.
76. Ge, Y.; Cao, S.; Tang, H.; Zhang, Y. Graph neural network for spatiotemporal landslide prediction: A case study of the Three Gorges Reservoir area, China. Geomorphology 2023, 441, 108891.
77. Hungr, O.; Leroueil, S.; Picarelli, L. The Varnes classification of landslide types, an update. Landslides 2014, 11, 167–194.
78. Lu, N.; Godt, J.W. Hillslope Hydrology and Stability; Cambridge University Press: Cambridge, UK, 2013.
79. Cui, H.-Z.; Tong, B.; Wang, T.; Dou, J.; Ji, J. A hybrid data-driven approach for rainfall-induced landslide susceptibility mapping: Physically-based probabilistic model with convolutional neural network. J. Rock Mech. Geotech. Eng. 2025, 17, 4933–4951.
80. Dahal, A.; Lombardo, L. Towards physics-informed neural networks for landslide prediction. Eng. Geol. 2025, 344, 107852.
81. Zhu, X.; Xu, Q.; Tang, M.; Li, H.; Liu, F. A hybrid machine learning and computing model for forecasting displacement of multifactor-induced landslides. Neural Comput. Appl. 2018, 30, 3825–3835.
82. Zhu, J.; Dang, P.; Cao, Y.; Lai, J.; Guo, Y.; Wang, P. A flood knowledge-constrained large language model interactable with GIS: Enhancing public risk perception of floods. Int. J. Geogr. Inf. Sci. 2024, 38, 456–481.
83. Li, W.; Wu, L.; Xu, X.; Xie, Z.; Qiu, Q.; Liu, H.; Huang, Z.; Chen, J. Deep learning and network analysis: Classifying and visualizing geologic hazard reports. J. Earth Sci. 2024, 35, 1289–1303.
84. Xue, D.; Qian, S.; Xu, C. Integrating Neural-Symbolic Reasoning with Variational Causal Inference Network for Explanatory Visual Question Answering. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 7893–7908.
85. Hadid, A.; Chakraborty, T.; Busby, D. When geoscience meets generative AI and large language models: Foundations, trends, and future challenges. Expert Syst. 2024, 41, e13654.
86. Hofmeister, M.; Bai, J.; Brownbridge, G.; Mosbach, S.; Lee, K.F.; Farazi, F.; Hillman, M.; Agarwal, M.; Ganguly, S.; Akroyd, J.; et al. Semantic agent framework for automated flood assessment using dynamic knowledge graphs. Data-Centric Eng. 2024, 5, e14.
87. Li, S.; Erickson, C.; Zajac, M.; Guo, X.; Duan, Q.; Gong, J. A Semi-Automated Framework for Flood Ontology Construction with an Application in Risk Communication. Water 2025, 17, 2801.
88. Aivalis, T.; Klampanos, I.A.; Troumpoukis, A. LLM-Driven Knowledge Graph Construction from Earth Observation Data for Extreme Events. In Workshop on AI-Driven Data Engineering and Reusability for Earth and Space Sciences (DARES’25), Co-Located with ECAI 2025, Bologna, Italy; CEUR Workshop Proceedings: Aachen, Germany, 2025; Volume 4128.
89. Areerob, K.; Nguyen, V.-Q.; Li, X.; Inadomi, S.; Shimada, T.; Kanasaki, H.; Wang, Z.; Suganuma, M.; Nagatani, K.; Chun, P.-J.; et al. Multimodal artificial intelligence approaches using large language models for expert-level landslide image analysis. Comput.-Aided Civ. Infrastruct. Eng. 2025, 40, 2900–2921.
90. Shimizu, C.; Stephen, S.; Barua, A.; Cai, L.; Christou, A.; Currier, K.; Dalal, A.; Fisher, C.K.; Hitzler, P.; Janowicz, K.; et al. The KnowWhereGraph ontology: Enabling spatially explicit knowledge graphs for disasters and environment. Data Intell. 2023, 5, 304–328.
91. Zajac, M.; Kulawiak, C.; Li, S.; Erickson, C.; Hubbell, N.; Gong, J. Unifying flood-risk communication: Empowering community leaders through AI-enhanced, contextualized storytelling. Hydrology 2025, 12, 204.
92. Zhou, Y.; Matyas, C.J.; Liu, P.; Li, H. Identification of tropical cyclone–related flash floods from hazard narratives using a large language model–based approach. npj Nat. Hazards 2025, 2, 104.
93. Wang, C.; Engler, D.; Li, X.; Hou, J.; Wald, D.J.; Jaiswal, K.; Xu, S. Near-real-time earthquake-induced fatality estimation using crowdsourced data and large-language models. Int. J. Disaster Risk Reduct. 2024, 108, 104680.
94. Lin, Z.; Deng, C.; Zhou, L.; Zhang, T.; Xu, Y.; Xu, Y.; He, Z.; Shi, Y.; Dai, B.; Song, Y.; et al. Geogalactica: A scientific large language model in geoscience. arXiv 2023, arXiv:2401.00434.
95. Wang, S.; Hu, T.; Xiao, H.; Li, Y.; Zhang, C.; Ning, H.; Zhu, R.; Li, Z.; Ye, X. GPT, large language models (LLMs) and generative artificial intelligence (GAI) models in geospatial science: A systematic review. Int. J. Digit. Earth 2024, 17, 2353122.
96. Zhang, W.; Zhang, J. Hallucination Mitigation for Retrieval-Augmented Large Language Models: A Review. Mathematics 2025, 13, 856.
97. Xu, F.; Ma, J.; Li, N.; Cheng, J.C.P. Large language model applications in disaster management: An interdisciplinary review. Int. J. Disaster Risk Reduct. 2025, 127, 105642.
98. Zhou, B.; Li, K. Fusing Geoscience Large Language Models and Lightweight RAG for Enhanced Geological Question Answering. Geosciences 2025, 15, 382.
99. Karimanzira, D.; Rauschenbach, T.; Hellmund, T.; Ritzau, L. Improved Flood Management and Risk Communication Through Large Language Models. Algorithms 2025, 18, 713.
100. Wang, R.; Gao, Z.; Zhang, L.; Yue, S.; Gao, Z. Empowering large language models to edge intelligence: A survey of edge efficient LLMs and techniques. Comput. Sci. Rev. 2025, 57, 100755.
101. Nguyen, V.; Dhopate, V.; Huynh, H.; Bouhlal, H.; Annengala, A.; Scoccia, G.L.; Martinez, M.; Stoico, V.; Malavolta, I. On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Content. In Proceedings of the 4th International Conference on AI Engineering—Software Engineering for AI (CAIN 2025), Ottawa, ON, Canada, 27–28 April 2025; pp. 72–82.
102. Cruden, D.M.; Varnes, D.J. Landslide types and processes. In Landslides: Investigation and Mitigation; Turner, A.K., Schuster, R.L., Eds.; Special Report 247; Transportation Research Board, National Research Council: Washington, DC, USA, 1996; pp. 36–75.
103. Samek, W.; Montavon, G.; Lapuschkin, S.; Anders, C.J.; Müller, K.R. Explaining deep neural networks and beyond: A review of methods and applications. Proc. IEEE 2021, 109, 247–278.
104. Guo, Z.; Zhang, X.; Karniadakis, G.E. Fourier Neural Operators for surrogate modeling in geophysical simulations. Water Resour. Res. 2024, 60, e2023WR034939.

Figure 1. The number of publications from 1990 to 2024.

Figure 2. Word clouds in different phases.

Figure 3. Keyword-phase heatmap. (a) Absolute keyword frequencies. (b) Relative keyword frequencies normalized within each phase, calculated as the proportion of keyword occurrences relative to the total keyword count in that phase.

Figure 4. Evolution of geohazard knowledge representation.

Figure 5. Conceptual framework of knowledge-enhanced geohazard modeling. The framework integrates three layers: (1) multi-source geohazard data, (2) knowledge representation and injection, and (3) predictive modeling and geohazard applications. Knowledge enters predictive models through two major pathways: constraint-based injection, including physics-informed and ontology-/rule-constrained learning, and structure-/reasoning-based priors, including knowledge graphs and graph-informed model structures. The framework shows how structured knowledge mediates between raw observations and interpretable geohazard prediction.

Figure 6. Hybrid symbolic–subsymbolic framework for knowledge-guided geohazard modeling.

Table 1. Comparison between ontology-based and KG-based knowledge representation in geohazard research.

Aspect	Ontology-Based Approaches	Knowledge Graph-Based Approaches
Conceptual focus	Formal domain semantics and taxonomy	Entity-relation networks with data instances
Strengths	Logical consistency, interpretability, semantic interoperability	Scalability, data integration, graph reasoning
Limitations	Static, difficult to update, limited instance coverage	Weaker formal semantics, inconsistent schemas
Representative works	[15,19,34,36,39,51]	[20,41,46,47,52,53]

Table 2. Taxonomy of knowledge-enhanced modeling approaches in geohazard studies.

Model Class	Main Knowledge Source	Injection Mechanism	Representative Geohazard Tasks	Main Strengths	Main Limitations	Representative References
Physics-guided ML	Physical laws and governing equations, including Darcy’s law, the Mohr–Coulomb criterion, PDEs, and geomechanical constraints	Physics-based loss functions, residual penalties, constrained optimization, hybrid PINN frameworks, knowledge-guided model initialization	Rainfall-induced landslide prediction, slope stability assessment, debris-flow forecasting, geomechanically constrained susceptibility mapping	Mechanistic interpretability; physically consistent predictions under sparse data; stronger numerical consistency; improved generalization under process-informed constraints	Difficult to encode qualitative or symbolic knowledge; computationally expensive for PDE-based settings; performance may depend on simplified process assumptions	[43,68,69,70,71]
Theory-guided ML	Expert rules, geotechnical heuristics, threshold logic, symbolic rules, and process-based prior knowledge	Rule-based feature construction, knowledge-guided loss terms, logic constraints, semantic consistency regularization, knowledge-based initialization	Slope-stability prediction, warning classification, hazard screening, semantically constrained susceptibility assessment	Useful when knowledge is qualitative or heuristic rather than fully equation-based; improves interpretability; preserves causal transparency; easier to implement than full physics-informed models	Domain rules may be incomplete or context-specific; difficult to formalize consistently; may lack numerical grounding	[11,17,67,72]
KG-regularized ML	Ontologies, knowledge graphs, semantic relations, graph embeddings, hazard-event-factor networks	KG embeddings, semantic regularization, relation constraints, ontology-guided feature fusion, rule/KG-supported inference	KG-supported risk assessment, hazard classification, semantic fusion of remote-sensing and geological data, multi-source geohazard monitoring	Maintains semantic alignment across heterogeneous data; supports explainable reasoning, causal transparency, and interoperability; useful for multi-hazard contexts	Depends on KG quality and schema consistency; knowledge coverage may be uneven; many systems remain project-specific and only semi-automated	[45,46,47,52,54]
Graph-based spatiotemporal models	Spatial adjacency, environmental similarity, monitoring-network topology, temporal dependencies, instance-level relational graphs	Graph neural networks, graph convolutions, attention/transformer modules, spatiotemporal graph learning, knowledge-informed graph priors	Landslide displacement prediction, landslide susceptibility mapping, dynamic hazard monitoring, multi-factor-induced landslide forecasting	Captures nonlocal dependencies and structured spatial–temporal interactions; supports trend and fluctuation modeling; effective for dynamic hazard processes	Performance depends on graph construction quality; may remain weakly interpretable without explicit semantic or physical constraints	[55,73,74,75,76]

Table 3. Mapping geohazard data types into structured knowledge representations for cognitive geohazard intelligence.

Data Type	Typical Source	Structured Knowledge Representation	Role in Grounded Cognitive Reasoning	Representative References
Remote sensing imagery	Sentinel-1/2, Landsat, UAV/aerial imagery, LiDAR-derived slope units	Geomorphology/land-cover ontology; image entity/event graph linking scar, runout, water extent, damaged assets, time, and location	Ontology-aligned labeling and KG grounding transform visual observations into interpretable hazard entities; support cross-source alignment with reports and geospatial metadata, and improve spatially coherent explanation	[45,88,89]
Topographic/terrain data	SRTM, ASTER, ALOS PALSAR, national DEMs, slope-unit maps	Geomorphologic ontology and terrain rule schema; geospatial KG linking slope units, adjacency, elevation, drainage, and exposure	Terrain semantics and expert rules act as interpretable priors for susceptibility screening, spatial retrieval, and rule-constrained reasoning; they also contextualize remote-sensing and monitoring evidence within physically meaningful terrain units	[46,78,90]
Monitoring/time-series data	InSAR, GNSS, rainfall gauges, hydrological stations, IoT sensors	Temporal ontology or trigger–event–impact KG; provenance-linked observation graph	Temporal relations support event tracking, threshold reasoning, alert propagation, and evidence-linked updating instead of isolated point prediction	[46,55,73,86]
Geological/soil/lithology maps	Field surveys, OneGeology, USGS databases, soil maps, fault inventories	Lithology/fault/soil ontology; geospatial KG linking material units, properties, hydrology, and hazard history	Semantic harmonization across heterogeneous geological vocabularies and map layers supports explainable zoning, concept alignment, and crosswalks between geological descriptions and model inputs	[19,34,37,90]
Textual and literature data	Scientific articles, hazard bulletins, disaster reports, social media, agency documents	Hazard ontology, event schema, provenance-linked concept graph/document graph	LLMs and NLP extract entities, relations, and event arguments; KG/RAG grounding supports evidence attribution, summarization, causal QA, consistency checking, and expert validation	[51,54,82,87,91,92]
Multi-source/multimodal evidence	Combined imagery, GIS layers, sensors, text reports, prior simulations/models	Multimodal KG/ontology + neural embeddings + provenance graph	Cross-modal grounding aligns visual, textual, and numerical evidence; supports hypothesis generation, scenario comparison, and validated knowledge updating across reasoning cycles	[46,86,88,90,93]
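
The trigger–event–impact KG with provenance links described for monitoring data in Table 3 can be sketched as a small set of triples. This is a minimal illustration, assuming hypothetical entity names, predicates, and source identifiers; it is not a schema drawn from the cited references.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: the record that supports this statement


# Hypothetical example graph; all identifiers are made up for illustration.
TRIPLES = [
    Triple("rainfall_event_A", "triggers", "landslide_B", "gauge_record_1"),
    Triple("landslide_B", "impacts", "road_C", "field_report_2"),
    Triple("landslide_B", "located_in", "slope_unit_7", "dem_layer_3"),
]


def trace_chain(triples: Sequence[Triple],
                start: str,
                predicates: Tuple[str, ...] = ("triggers", "impacts")) -> List[Triple]:
    """Follow trigger/impact edges breadth-first from `start`.

    Returns the traversed triples in visiting order, so each step of the
    chain keeps its provenance attached.
    """
    chain: List[Triple] = []
    frontier = [start]
    seen = {start}
    while frontier:
        node = frontier.pop(0)
        for t in triples:
            if t.subject == node and t.predicate in predicates:
                chain.append(t)
                if t.obj not in seen:
                    seen.add(t.obj)
                    frontier.append(t.obj)
    return chain


chain = trace_chain(TRIPLES, "rainfall_event_A")
for step in chain:
    print(f"{step.subject} --{step.predicate}--> {step.obj} [source: {step.source}]")
```

Because every traversed edge carries its source, an answer such as "rainfall event A led to damage on road C" can be reported together with the records that support each link, which is the evidence-attribution role the table assigns to KG grounding.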

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

