Multi-Task Spatiotemporal Prediction of Gas Extraction-Induced Seismicity Using a Hybrid GAT-LSTM Neural Network

Zhang, Hanfeng; Chen, Shuai; Wen, Fenggang; Xu, Rui; Luo, Yuhao; Liu, Fushen; Wang, Shouguang; Duan, Hongfei

doi:10.3390/app16115568


by Hanfeng Zhang ¹,²,³, Shuai Chen ⁴, Fenggang Wen ⁵, Rui Xu ¹,²,³, Yuhao Luo ¹,²,³, Fushen Liu ¹,²,³,*, Shouguang Wang ⁶ and Hongfei Duan ⁷

¹ Research Center of Coastal and Urban Geotechnical Engineering, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China

² Zhejiang Key Laboratory of the Development and Utilization of Underground Space, Zhejiang University, Hangzhou 310058, China

³ State Key Laboratory of Soil Pollution Control and Safety, Zhejiang University, Hangzhou 310058, China

⁴ Institute of Mathematics, Henan Academy of Sciences, Zhengzhou 450046, China

⁵ Shaanxi Key Laboratory of Lacustrine Shale Gas Accumulation and Exploitation, Xi’an 710065, China

⁶ State Key Laboratory of Intelligent Coal Mining and Strata Control, Beijing 100013, China

⁷ School of Civil Engineering, Sun Yat-sen University, Zhuhai 519000, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(11), 5568; https://doi.org/10.3390/app16115568

Submission received: 1 May 2026 / Revised: 21 May 2026 / Accepted: 26 May 2026 / Published: 2 June 2026


Abstract

Spatiotemporal prediction of gas extraction-induced seismicity is a key challenge in regional seismic risk management, hindered by heterogeneous spatial coupling among reservoir blocks and extreme class imbalance in seismicity records. This study proposes a multi-task spatiotemporal forecasting framework based on a dual-encoder architecture combining a Graph Attention Network (GAT) with a Long Short-Term Memory (LSTM) network. The monitoring network is represented as a graph with node-level features including monthly production, reservoir pressure, compaction, and historical seismicity. A Voronoi tessellation strategy maps continuous epicentral coordinates to discrete graph nodes. The GAT encodes heterogeneous spatial interactions via adaptive attention, while a two-layer LSTM extracts multiscale temporal dependencies. Event detection and magnitude classification are treated as parallel tasks, jointly optimized using focal loss and focal-adjusted weighted cross-entropy to mitigate class imbalance. A Seismic Risk Index (SRI) integrates event occurrence and magnitude class probabilities into a continuous risk estimate. Validated on the KNMI seismic catalog and Groningen production data, the model achieves an event Probability of Detection (POD) of 0.677 and a magnitude classification macro average recall (MAvA) of 0.548 under an event rate of 0.07%. Compared with a pure LSTM baseline, the GAT improves POD by 2.1% and MAvA by 7.9%. The time-averaged risk field exhibits spatial heterogeneity broadly consistent with observed seismicity patterns, indicating the potential of this framework for fine-grained spatiotemporal risk assessment of extraction-induced seismicity.

Keywords: induced seismicity; deep learning; graph neural network; spatiotemporal prediction; imbalanced learning

1. Introduction

Induced seismicity refers to seismic events triggered by stress changes from human activities such as gas extraction, reservoir impoundment, and geothermal development [1]. Compared with natural earthquakes, induced events are typically smaller in magnitude and occur at shallower depths; nevertheless, their high frequency and proximity to the surface can cause substantial cumulative damage to buildings and infrastructure, posing threats to regional public safety [2]. The Groningen gas field in the Netherlands—the largest onshore natural gas field in Europe—has been in production since 1963. Long-term fluid withdrawal has led to progressive pore-pressure depletion, reservoir compaction, and reactivation of pre-existing faults, producing numerous induced seismic events that have damaged tens of thousands of buildings and caused far-reaching societal impact [3,4]. Developing effective forecasting models for induced seismicity is therefore crucial for improving early warning systems and mitigating risks in gas production areas.

Traditional induced-seismicity forecasting methods can be broadly grouped into physics-based geomechanical models and statistical seismology approaches. For instance, Bourne et al. [5] developed a statistical model linking reservoir compaction to seismic activity through the Kostrov–McGarr strain–moment relationship, and showed that the seismic strain ratio increases with cumulative compaction. Dempsey et al. [6] incorporated poroelastic triggering and fault-rupture processes by calculating stress changes on mapped faults from simulated spatiotemporal pressure fields, constructing a physics-based forecasting model for the Groningen field. Candela et al. [7] used a Coulomb rate-and-state framework combined with differential compaction to compute Coulomb stress changes on faults and predict the spatiotemporal evolution of induced seismicity in Groningen. Although these approaches are physically grounded, they require parameters that are difficult to constrain, they are computationally demanding, and they rely on simplifying assumptions that can limit performance in complex settings [8].

More recently, machine learning has attracted increasing attention in induced-seismicity research, as its use in seismology in general has rapidly expanded [9,10]. In one such example, Limbeck et al. [11] applied random forests and support vector machines to build a spatiotemporal forecasting pipeline for induced seismicity rates in Groningen, demonstrating the feasibility of data-driven methods in this setting. Additionally, Qin et al. [12] used random forests to forecast wastewater-injection-induced seismicity in Oklahoma, identifying pore-pressure change rate and poroelastic-stress change rate as the most consequential predictors. As for deep learning methods, Karimpouli et al. [13] applied an attention-enhanced LSTM for induced seismicity forecasting in enhanced geothermal systems, showcasing its ability to capture complex nonlinear dependencies in long injection–production sequences. In addition, Picozzi et al. [14] used recurrent neural networks (RNNs) to identify preparatory-phase precursory patterns that occur before induced seismic events in enhanced geothermal systems. Dascher-Cousineau et al. [15] proposed a flexible and scalable earthquake-forecasting framework, RECAST, based on neural temporal point processes. Furthermore, Convertito et al. [16] developed the PreD-Net deep learning architecture to predict large induced seismic events from precursory signals, and demonstrated cross-regional generalization across multiple induced-seismicity settings.

Despite this progress, current deep learning-based studies still suffer from two central limitations. The first is inadequate modeling of spatial dependence. Many existing methods treat spatial nodes or reservoir blocks as independent entities, and therefore ignore coupling effects transmitted among blocks through fault systems and pore-pressure diffusion. In reality, pore-pressure perturbations can propagate across blocks through fault permeability and poroelastic effects, sometimes extending well beyond the directly exploited area [17]. In the Groningen gas field, the spatial distribution of seismicity is closely tied to fault density and the heterogeneous distribution of reservoir compaction [6]; moreover, the influence between blocks is neither symmetric nor uniform, but instead controlled by local geological structure. Graph Neural Networks (GNNs) are well-suited to solve this problem because they learn from graph-structured data through message passing and aggregation [18], and have already shown promise in varied seismological applications. As an example of this, Zhang et al. [19] proposed a spatiotemporal graph convolutional network (STGNN) that automatically constructs graph structures from inter-station distance and waveform similarity, yielding better source-characterization accuracy than fully connected networks and conventional GNNs. Also, Bloemheuvel et al. [20] developed TISER-GCN for multivariate seismic time-series regression, improving ground-motion predictions by explicitly modeling spatial relations among stations. Additionally, Leema et al. [21] proposed SeismoQuakeGNN, a hybrid GNN–Transformer framework for spatiotemporal earthquake prediction that combines spatial learning and temporal modeling. Even so, the usage of GNNs in extraction-induced seismicity forecasting is still in its infancy, and how to leverage graph attention to learn heterogeneous, geology-related spatial coupling in a fully data-driven manner remains an open question. 
The second limitation is extreme class imbalance. Induced seismicity in Groningen is highly sparse in both space and time, so within a node-discretized spatiotemporal sample framework, the proportion of seismic-event samples is extremely small. Under such conditions, many existing studies have adopted a regression formulation to directly predict seismicity rates or continuous magnitudes. These models are easily dominated by the overwhelming number of zero-valued samples, producing a “no-event bias” that markedly reduces the sensitivity to sparse seismic events. Taken together, the absence of fully data-driven modeling of heterogeneous spatial coupling and the insensitivity of regression-based formulations to sparse positive events motivate the present study.

This study aims to close these gaps by developing a multi-task spatiotemporal forecasting framework for the Groningen gas field that (i) captures heterogeneous inter-block coupling through graph attention and (ii) reformulates sparse-event prediction as joint classification tasks under imbalance-aware losses. The proposed model combines a Graph Attention Network (GAT) with a Long Short-Term Memory (LSTM) network to learn heterogeneous inter-node interactions and lagged reservoir responses from node-level sequences of production, reservoir pressure, compaction, and historical seismicity. To overcome the extreme sparsity of seismic events in Groningen, the forecasting problem is reformulated as two coupled classification tasks—namely event detection and magnitude classification—and is optimized using imbalance-aware loss functions. The model outputs are further integrated into a Seismic Risk Index (SRI), which is a continuous and spatially interpretable metric for seismic risk. Using the KNMI seismic catalog and field monitoring data, the proposed framework outperforms a lone LSTM baseline in event detection and magnitude classification under extremely sparse seismicity. These results demonstrate a practical data-driven methodology for fine-grained forecasting and spatial risk assessment of extraction-induced seismicity in producing gas fields.

The remainder of this paper is organized as follows. Section 2 introduces the physical background and formulates the prediction problem. Section 3 presents the GAT–LSTM model, the multi-task loss, and the Seismic Risk Index. Section 4 reports validation and extended evaluation on the Groningen field. Section 5 concludes and outlines future work.

2. Problem Formulation and Methodology

2.1. Physical Mechanisms of Induced Seismicity and the Temporal Prediction Principle

Seismicity can be induced when engineering operations disturb the pre-existing stress equilibrium of a subsurface reservoir. The typical engineering configuration and geomechanical processes of induced seismicity are schematically illustrated in Figure 1. For extraction-driven seismicity in fields such as Groningen, the governing mechanism can be understood in terms of poroelasticity and the effective stress principle.

During natural gas production, the reservoir pore pressure ($p$) declines continuously as gas is extracted. According to Biot’s generalized effective stress principle [22], the effective stress tensor $\sigma'_{ij}$ acting on the rock skeleton is related to the total stress $\sigma_{ij}$ and pore pressure by

$$\sigma'_{ij} = \sigma_{ij} - \alpha \, p \, \delta_{ij} \qquad (1)$$

where $\alpha$ is the Biot coefficient and $\delta_{ij}$ is the Kronecker delta function. As the pore pressure decreases while the total overburden pressure remains approximately constant, the effective vertical stress acting on the reservoir rock skeleton increases significantly, causing elastic or inelastic compaction of the reservoir. Due to the heterogeneity of subsurface geological structures, the resulting reservoir compaction is spatially non-uniform; this differential compaction generates additional shear stress fields within the reservoir and in the formations above and below. When the accumulated shear stress exceeds the frictional strength limit of a given fault, the fault is reactivated, and seismicity is induced, following the Mohr–Coulomb failure criterion [23]:

$$\tau = c + \mu_f \, \sigma'_n \qquad (2)$$

where $c$ is the cohesion of the fault, $\mu_f$ is the friction coefficient, and $\sigma'_n$ is the effective normal stress acting on the fault plane.
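As a small numerical illustration of Equations (1) and (2), the sketch below evaluates the effective stress tensor before and after depletion and compares a shear stress with the Coulomb strength. The stress values, the Biot coefficient of 1, the compression-positive sign convention, and the function names are assumptions chosen for illustration only; they are not Groningen field data.

```python
import numpy as np

def effective_stress(sigma, p, alpha=1.0):
    """Eq. (1): sigma'_ij = sigma_ij - alpha * p * delta_ij (3x3 tensors)."""
    return np.asarray(sigma, dtype=float) - alpha * p * np.eye(3)

def coulomb_strength(sigma_n_eff, cohesion, mu_f):
    """Right-hand side of Eq. (2): c + mu_f * sigma'_n."""
    return cohesion + mu_f * sigma_n_eff

def is_reactivated(tau, sigma_n_eff, cohesion, mu_f):
    """True when the shear stress exceeds the frictional strength of the fault."""
    return tau > coulomb_strength(sigma_n_eff, cohesion, mu_f)

# Illustrative values in MPa, compression taken as positive (assumed).
sigma_total = np.diag([65.0, 45.0, 45.0])   # index 0 = vertical direction
before = effective_stress(sigma_total, p=35.0)
after = effective_stress(sigma_total, p=10.0)

# With constant total stress, the effective vertical stress rises by alpha * dp.
delta_sv = after[0, 0] - before[0, 0]
print(f"increase in effective vertical stress: {delta_sv:.1f} MPa")
```

Note that a spatially uniform pressure drop at constant total stress leaves the shear stress unchanged; in the mechanism described above, the destabilizing shear stress comes from differential compaction, which this sketch does not model.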

These physical considerations suggest that historical production, change in pore pressure, and reservoir compaction over a given period should contain statistically informative signals for predicting future seismicity. Let the time-series data recorded by a monitoring system be expressed as

$$D = \{(t_1, x_1), (t_2, x_2), \dots, (t_n, x_n)\} \qquad (3)$$

where $x_i \in \mathbb{R}^{d}$ is the feature vector observed at time $t_i$, containing cumulative gas production, pore-pressure change, reservoir compaction, and related variables. Then, for the $k$th spatial unit in the study region, given a dynamic feature sequence $S^{k} \in \mathbb{R}^{L \times d}$ over a historical window of length $L$ and a graph structure $G$ describing the spatial topology, the forecasting task can be formulated as learning a mapping $f$ such that

$$f(S^{k}, G) = (y_1^{k}, y_2^{k}) \qquad (4)$$

where $y_1^{k} \in \{0, 1\}$ denotes event occurrence and $y_2^{k} \in \{\text{small } (M \leq 1.0),\ \text{moderate } (1.0 < M \leq 2.0),\ \text{large } (M > 2.0)\}$ denotes the magnitude class. These two targets are predicted simultaneously for all spatial units in the study region by independent classification heads.

Accordingly, a monthly sliding-window scheme [24] is adopted to construct input–output samples, which are then fed into a neural network to perform spatiotemporal forecasting of induced seismicity. A schematic illustration of this sample construction procedure is presented in Figure 2.
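The sliding-window construction can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the array layout `(months, nodes, features)`, the function name `make_windows`, and the toy sizes are assumptions.

```python
import numpy as np

def make_windows(features, labels, window):
    """Build (input window, next-month label) pairs.

    features: (T, N, d) monthly node features
    labels:   (T, N) node-level targets
    Returns X with shape (T - window, window, N, d) and y with shape
    (T - window, N); y[s] is the label of the month right after window s.
    """
    T = features.shape[0]
    X = np.stack([features[s:s + window] for s in range(T - window)])
    y = labels[window:]
    return X, y

# Toy data: 10 months, 4 nodes, 3 features, 6-month history window.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4, 3))
labels = rng.integers(0, 2, size=(10, 4))
X, y = make_windows(features, labels, window=6)
print(X.shape, y.shape)
```

Each window is shifted by one month, so consecutive samples overlap in all but one time step; this is why the dataset split described later in the paper is chronological rather than random.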

Equations (1)–(4) imply that future seismicity at a given location is jointly controlled by two processes operating at distinct scales: temporal accumulation of effective stress driven by production-induced pressure depletion, and spatial transmission of pressure and stress perturbations along faults and through pore networks. These two characteristics motivate the dual-encoder design adopted below—an LSTM temporal encoder (Section 2.3) for delayed reservoir responses, and a GAT spatial encoder (Section 2.2) for directional, geology-modulated inter-block coupling.

2.2. Graph Neural Networks

Graph Neural Networks (GNNs) [18] are deep architectures designed for representation learning on graph-structured data. The central idea of a GNN is to model complex relationships through inter-node message passing and aggregation. In the context of induced seismicity forecasting, pressure coupling transmitted between reservoir blocks through faults and pore networks gives the study area an intrinsically graph-like structure. To capture these heterogeneous spatial interactions, we adopt a Graph Attention Network (GAT) [25] as the core spatial encoder. In contrast to conventional graph convolutions, GAT introduces an attention mechanism that allows the model to aggregate information from neighboring nodes with adaptive weights. The key step in this spatial-encoding process is the calculation of the attention score $\tilde{e}_{ij}$ between nodes:

$$\tilde{e}_{ij} = a^{\top} \, \mathrm{LeakyReLU}\left(W_l h_i + W_r h_j + W_e e_{ij}\right) \qquad (5)$$

in which $h_i, h_j \in \mathbb{R}^{d}$ are node embedding vectors, $e_{ij} \in \mathbb{R}^{d_e}$ is the edge feature vector, $W_l$, $W_r$, and $W_e$ are learnable weight matrices, and $a$ is the attention vector. Through this mechanism, the model can aggregate neighbor nodal information with adaptive weights, offering greater expressiveness than fixed-weight convolutional networks. The attention coefficient used for neighborhood aggregation is then obtained through the standard neighborhood-wise softmax operation in the GAT layer. This makes the method more suitable for capturing heterogeneous spatial influence relationships between reservoir blocks.
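The scoring step of Equation (5), followed by the neighborhood-wise softmax, can be sketched for a single node as below. The weights here are random placeholders, and the LeakyReLU slope of 0.2, the embedding sizes, and the function names are assumptions, not values reported in the paper.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # slope=0.2 is a common default, assumed here
    return np.where(x > 0, x, slope * x)

def attention_weights(h_i, h_nbrs, e_nbrs, W_l, W_r, W_e, a):
    """Attention over the K neighbors of node i.

    h_i: (d,), h_nbrs: (K, d), e_nbrs: (K, d_e)
    W_l, W_r: (d_out, d), W_e: (d_out, d_e), a: (d_out,)
    """
    pre = W_l @ h_i + h_nbrs @ W_r.T + e_nbrs @ W_e.T   # (K, d_out)
    scores = leaky_relu(pre) @ a                        # Eq. (5), one score per neighbor
    scores = scores - scores.max()                      # numerical stability
    w = np.exp(scores)
    return w / w.sum()                                  # softmax over the neighborhood

rng = np.random.default_rng(1)
d, d_e, d_out, K = 4, 2, 8, 5
alpha = attention_weights(
    rng.normal(size=d), rng.normal(size=(K, d)), rng.normal(size=(K, d_e)),
    rng.normal(size=(d_out, d)), rng.normal(size=(d_out, d)),
    rng.normal(size=(d_out, d_e)), rng.normal(size=d_out),
)
print(alpha.round(3), alpha.sum())
```

Because the edge feature $e_{ij}$ enters the score before the nonlinearity, two neighbors with identical embeddings can still receive different weights when their edge features differ.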

2.3. Long Short-Term Memory Networks

The Long Short-Term Memory (LSTM) network [26] is a type of recurrent neural network (RNN) that is specialized for long-sequence modeling. By introducing memory cells together with forget, input, and output gates, it alleviates the vanishing gradient problem of conventional RNNs, and enables selective retention or suppression of information across many time steps; as such, this structure can capture long-range temporal dependence.

The physical processes that govern induced seismicity in the Groningen gas field exhibit pronounced multiscale temporal lags: pore pressure responses to production changes commonly lag by several months, whereas fault reactivation reflects the cumulative effect of long-term stress buildup. An LSTM is particularly suited to this setting because its internal state-updating mechanism controls how much past information is discarded through the forget gate and how much new information is incorporated through the input gate, thereby maintaining stable memory of long-range physical behavior. A standard LSTM unit is illustrated in Figure 3. This gating mechanism allows the model to extract temporally distributed key states from the historical window and thus capture the complex temporal linkage between reservoir production history and future seismic events.
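For readers unfamiliar with the gating described above, a single LSTM step can be written out explicitly. This is the textbook formulation with stacked gate weights; the gate ordering, the shapes, and the random weights are assumptions for illustration and do not reflect the trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step with hidden size H.

    W: (4H, d), U: (4H, H), b: (4H,)
    Gate order in the stacked weights (assumed): forget, input, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[:H])            # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])       # input gate: how much new information to write
    o = sigmoid(z[2 * H:3 * H])   # output gate
    g = np.tanh(z[3 * H:])        # candidate memory content
    c = f * c_prev + i * g        # cell-state update
    h = o * np.tanh(c)            # hidden state passed to the next step or layer
    return h, c

rng = np.random.default_rng(2)
d, H, steps = 6, 4, 12
W, U, b = rng.normal(size=(4 * H, d)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(steps, d)):   # iterate over a 12-month window
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.round(3))
```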

2.4. Multi-Task Learning and Class Imbalance Mitigation Strategies

Existing studies on induced seismicity often formulate the problem as a regression task, directly predicting seismicity rates or continuous magnitude values [11,27]. In Groningen, however, seismicity is extremely sparse in both space and time, so the proportion of positive event samples among all node–time pairs is minuscule. Under these conditions, regression models are dominated by the large number of zero-valued samples and tend to develop a “no-event bias,” which severely limits their sensitivity to rare events.

To address this issue, we reformulate the forecasting problem as a multi-task classification framework. For every spatial unit in the study region, the model predicts both event occurrence and magnitude class using separate classification heads, and the two tasks are jointly optimized through a combined loss function:

$$\mathcal{L} = \lambda_1 \cdot \mathcal{L}_{\text{event}} + \lambda_2 \cdot \mathcal{L}_{\text{mag}} \qquad (6)$$

where $\lambda_1$ and $\lambda_2$ are the task-specific loss weights. To handle the extreme imbalance of the event detection task, Focal Loss [28] is adopted as the optimization objective:

$$\mathcal{L}_{\text{event}} = -\frac{1}{N} \sum_{i=1}^{N} \alpha_i^{*} \left(1 - p_i^{*}\right)^{\gamma_e} \log\left(p_i^{*}\right) \qquad (7)$$

in which $p_i^{*}$ is the probability that the model assigns to the true class of sample $i$, $\gamma_e$ is the focusing parameter for event detection, and $\alpha_i^{*}$ is the class balance factor. This mechanism automatically down-weights the contribution of the numerous easy negative samples (no-event) and concentrates the optimization on the hard minority positive samples (event), thereby improving event-detection recall under extreme class imbalance. For magnitude classification, focal-adjusted weighted cross-entropy [29] is used to address the imbalance among the small-, moderate-, and large-magnitude classes:

$$\mathcal{L}_{\text{mag}} = -\frac{1}{N} \sum_{i=1}^{N} \left(1 - p_i^{*}\right)^{\gamma_m} \cdot \left( \sum_{c=1}^{C} w_c \, y_{i,c} \log\left(\hat{y}_{i,c}\right) \right) \qquad (8)$$

Here, $N$ is the number of seismic-event nodes, $C = 3$ is the number of magnitude classes, $w_c$ is the class weight, $y_{i,c}$ is the ground truth label (after label smoothing), and $\hat{y}_{i,c}$ is the predicted probability of class $c$ for the $i$th sample. $p_i^{*} = \sum_{c=1}^{C} y_{i,c} \, \hat{y}_{i,c}$ denotes the predicted probability of the true class for the $i$th sample, and $\gamma_m$ is the focusing parameter for magnitude classification.
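A NumPy sketch of the two loss terms in Equations (7) and (8) is given below. The values of the balance factor, focusing parameters, and class weights are placeholders (the section does not state the tuned values), and the function names are hypothetical.

```python
import numpy as np

def focal_loss_binary(p_event, y, alpha=0.75, gamma=2.0, eps=1e-12):
    """Eq. (7). p_event: predicted event probability; y: 0/1 labels.
    alpha and gamma are placeholder values, not the paper's settings."""
    p_event = np.asarray(p_event, dtype=float)
    y = np.asarray(y)
    p_true = np.where(y == 1, p_event, 1.0 - p_event)   # p_i^*
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)      # alpha_i^*
    return float(np.mean(-alpha_t * (1.0 - p_true) ** gamma * np.log(p_true + eps)))

def focal_weighted_ce(probs, y_soft, class_w, gamma=2.0, eps=1e-12):
    """Eq. (8). probs, y_soft: (N, C); class_w: (C,)."""
    probs = np.asarray(probs, dtype=float)
    y_soft = np.asarray(y_soft, dtype=float)
    p_true = np.sum(y_soft * probs, axis=1)              # p_i^*
    ce = -np.sum(np.asarray(class_w) * y_soft * np.log(probs + eps), axis=1)
    return float(np.mean((1.0 - p_true) ** gamma * ce))

# A confidently rejected negative contributes far less than a missed event.
easy_negative = focal_loss_binary([0.01], [0])
missed_event = focal_loss_binary([0.01], [1])
print(easy_negative, missed_event)
```

Setting `gamma=0` recovers an ordinary weighted cross-entropy, which is a convenient sanity check when tuning the focusing parameters.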

3. Model Construction for Temporal Prediction of Induced Seismicity

To provide an integrated view of the proposed methodology, Figure 4 summarizes the overall workflow. Starting from the seismic catalog and production monitoring data, the framework involves spatial label discretization, temporal feature engineering, graph construction, feature normalization, chronological dataset splitting, and sliding-window sample generation. The resulting graph time-series samples are then used to train the GAT-LSTM multi-task model for event detection and magnitude classification; the outputs from this model are further combined to derive the Seismic Risk Index (SRI).

3.1. Construction of Graph-Structured Seismic Time-Series Data

The Groningen production monitoring network comprises production wells and 500 m × 500 m regular grid nodes generated by the NAM dynamic reservoir model. We represent this network as a directed graph:

$$G = (V, \varepsilon, X, E)$$

where $V$ is the node set containing reservoir grid nodes and production wells, with $|V| = N$; $\varepsilon$ is the edge set describing spatial adjacency; $X \in \mathbb{R}^{N \times d_x}$ is the node-feature matrix ($d_x = 3$, containing 2D coordinates and fault density); and $E \in \mathbb{R}^{|\varepsilon| \times d_e}$ is the edge-feature matrix containing inter-node Euclidean distances and fault-density differences, which are used to characterize directional pore-pressure diffusion along structural trends. In the temporal dimension, the feature vector $s_i^{t} \in \mathbb{R}^{d_s}$ for node $v_i$ at time $t$ includes monthly date encoding, monthly production, reservoir pressure, reservoir compaction, historical seismicity frequency, and historical maximum magnitude. Each sample also contains a $d_z$-dimensional exogenous statistical vector $z \in \mathbb{R}^{d_z}$ that summarizes compaction-related and historical-seismicity statistics so as to provide global context. Following the sliding-window scheme described above, sample pairs are constructed using the graph sequence $\{G^{t - \Delta t + 1}, \dots, G^{t}\}$ and statistical features $z$ over the window $[t - \Delta t + 1, t]$ as inputs, and the event label and magnitude-class label at time $t + 1$ as outputs for node $v_i$.

The graph construction is motivated by the geological setting of Groningen rather than by geometry alone. Edges encode pairwise interactions controlled by two physical factors: spatial proximity, since pore-pressure perturbations diffuse over finite radii from depleted volumes; and fault-density contrast, since faults act as preferred conduits for pressure communication and as the principal loci of stress concentration. Embedding inter-node Euclidean distance together with the fault-density difference allows the GAT to learn anisotropic, fault-modulated coupling weights, so the attention mechanism acts as a data-driven surrogate for the heterogeneous pressure-diffusion process.
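One way to assemble the edge set and the edge-feature matrix is sketched below. The adjacency rule (connect nodes within a fixed radius), the sign convention of the fault-density difference, and the function name `build_edges` are assumptions; the section specifies only that edges describe spatial adjacency and carry distance and fault-density difference.

```python
import numpy as np

def build_edges(coords, fault_density, radius):
    """Directed edges i -> j between distinct nodes closer than `radius`.

    coords: (N, 2) node positions in km; fault_density: (N,)
    Returns edge_index (2, n_edges) and edge_attr (n_edges, 2) holding
    [Euclidean distance, fault_density[j] - fault_density[i]].
    Builds a dense N x N distance matrix, so it suits small examples only.
    """
    coords = np.asarray(coords, dtype=float)
    fault_density = np.asarray(fault_density, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    src, dst = np.nonzero((dist <= radius) & (dist > 0))   # drop self-loops
    edge_index = np.stack([src, dst])
    edge_attr = np.stack(
        [dist[src, dst], fault_density[dst] - fault_density[src]], axis=1
    )
    return edge_index, edge_attr

# Three nodes on a line, 0.5 km apart; only direct neighbors are connected.
coords = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
edge_index, edge_attr = build_edges(coords, [0.1, 0.4, 0.2], radius=0.6)
print(edge_index)
print(edge_attr)
```

With a signed difference, the edges i -> j and j -> i carry opposite values, which is one simple way for a directed graph to express asymmetric influence between blocks.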

3.2. Design of the Dual-Encoder Multi-Task Spatiotemporal Prediction Model

Based on the multi-task classification framework, we construct a parallel dual-branch spatiotemporal prediction model. In the spatial module, attention-based aggregation is applied to the neighborhood of each graph node at each time step to generate spatially informed node embeddings. In the temporal module, the GAT processes the graph at each time step within the window $\Delta t$ and outputs a node-embedding sequence $\{h_{t - \Delta t + 1}, \dots, h_{t}\}$, which is then passed in temporal order to a two-layer LSTM. The first LSTM layer encodes the sequence step-by-step, and its hidden states form the input to the second layer, which extracts a higher-level temporal representation. The final hidden state of the second layer serves as a summary of the temporal context. This representation is concatenated with exogenous statistical features (such as cumulative gas production, pore pressure changes, and reservoir compaction) and then fed to the task-specific classification branches. Joint optimization using the combined loss function improves the model’s ability to capture sparse seismic events under extreme class imbalance.

Because the event detection and magnitude classification subtasks differ systematically in terms of training signal strength, feature sensitivity, and loss scale [30], each task is assigned an independent spatiotemporal encoder and classification head. In each branch, a GAT spatial encoder is followed by an LSTM temporal encoder, and both branches are jointly optimized through the loss defined in Equation (6). The overall architecture is depicted in Figure 5.

3.3. Model Evaluation Metrics

Accuracy [31], defined in Equation (9), is the most widely used performance metric in machine learning classification tasks. Classification model performance can also be summarized by a confusion matrix (Table 1), in which rows denote the actual classes and columns denote the predicted classes. The matrix here contains four basic elements: true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs). TP denotes samples that belong to the target class and are correctly classified as such; TN denotes non-target samples that are correctly classified; FP (or “false alarms”) denotes non-target samples that were incorrectly classified as the target class; and FN (or “misses”) denotes target samples incorrectly classified as non-targets.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (9)$$

However, in severely imbalanced datasets, the accuracy metric can easily be dominated by the majority negative class, and thus not adequately reflect the model’s ability to identify rare positive samples. We therefore also compute recall [32] from the confusion matrix as

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (10)$$

Recall measures the fraction of actual positive samples that are successfully captured by the model; for the event detection task, a higher recall corresponds to a lower miss rate. For the multi-class magnitude task, the macro average arithmetic recall (MAvA) [33] is used as the overall evaluation metric:

$$\mathrm{MAvA} = \frac{1}{C} \sum_{j=1}^{C} \mathrm{Recall}_j = \frac{1}{C} \sum_{j=1}^{C} \frac{TP_j}{TP_j + FN_j} \qquad (11)$$

where $\mathrm{Recall}_j$ is the recall for the $j$th magnitude class and $C = 3$ is the number of magnitude classes. MAvA assigns equal weight to each class, is unaffected by the size of the majority class, and faithfully reflects the model’s balanced recognition capabilities across different magnitude intervals.

To further evaluate class-specific behavior, the Probability of Detection (POD) and False Alarm Ratio (FAR) [31] are defined as

$$\mathrm{POD} = \frac{TP}{TP + FN} \qquad (12)$$

$$\mathrm{FAR} = \frac{FP}{TP + FP} \qquad (13)$$

In the context of seismic early warning, the cost of an FN is substantially higher than that of an FP. POD and MAvA are therefore the most practically significant metrics in this study, whereas Accuracy is used as a supplementary indicator of overall performance.
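The metrics in Equations (10)–(13) can be computed directly from label arrays, as in the sketch below. The function names and toy labels are illustrative; skipping classes that are absent from the ground truth when averaging recall is an assumption, since Equation (11) presumes every class is present.

```python
import numpy as np

def pod_far(y_true, y_pred):
    """POD (Eq. 12) and FAR (Eq. 13) for binary event labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    pod = tp / (tp + fn) if tp + fn else float("nan")
    far = fp / (tp + fp) if tp + fp else float("nan")
    return pod, far

def mava(y_true, y_pred, n_classes=3):
    """Macro-averaged recall (Eq. 11); classes absent from y_true are skipped."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [
        np.mean(y_pred[y_true == j] == j)
        for j in range(n_classes)
        if np.any(y_true == j)
    ]
    return float(np.mean(recalls))

# 3 events, 5 non-events: 2 hits, 1 miss, 1 false alarm.
pod, far = pod_far([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
# Per-class recalls are 0.5, 1.0, and 0.5.
score = mava([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 0, 2])
print(pod, far, score)
```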

3.4. Seismic Risk Index Definition

To translate the outputs of the event detection and magnitude classification branches into an engineering-oriented risk measure, we define a Seismic Risk Index (SRI) for each node $i$ as

$$\mathrm{SRI}_i = P_i(\text{event}) \cdot \sum_{k=1}^{C} P_i(\text{class}_k) \cdot w_k \qquad (14)$$

where $P_i(\text{event})$ is the probability of seismic event occurrence output by the event detection branch; $P_i(\text{class}_k)$ is the conditional probability of magnitude class $k$ from the magnitude classification branch, with $C = 3$; and $w_1 = 1$, $w_2 = 5.6$, and $w_3 = 31.6$ are derived from the seismic energy release formula [34] and reflect the different hazard potentials of the different magnitude classes. Here, the event detection branch uses a sigmoid output, so that the occurrence of an event at each node–time step is modeled as a Bernoulli variable whose parameter is the predicted occurrence probability, whereas the magnitude classification branch uses a softmax output, so that the magnitude class follows a categorical distribution over the three classes. The SRI is therefore the expected hazard weight under this joint Bernoulli–categorical distribution, which allows the multi-task forecast to express both the likelihood of an event and its potential severity as a single continuous value.

To characterize the spatial heterogeneity of risk over a prediction interval, the time-averaged Seismic Risk Index (Mean SRI) is obtained by normalizing the node-level SRI at each time step and then averaging across time:

$$\overline{\mathrm{SRI}}_i = \frac{1}{T} \sum_{t=1}^{T} \frac{\mathrm{SRI}_i(t)}{\max_j \mathrm{SRI}_j(t)} \qquad (15)$$

Here, $T$ is the time span of the statistical interval and $\max_j \mathrm{SRI}_j(t)$ is the maximum raw risk value across all nodes at time step $t$. $\overline{\mathrm{SRI}}_i$ mitigates the distorting effect of extreme events at individual time steps on the spatial distribution and robustly reflects the overall risk level at each node during the test period.
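Equations (14) and (15) reduce to a few array operations, sketched here with made-up probabilities. The hazard weights are the ones stated in the text; they are numerically consistent with energy scaling as $10^{1.5 \Delta M}$ for magnitude steps of 0.5 and 1.0 ($10^{0.75} \approx 5.6$, $10^{1.5} \approx 31.6$), although which representative magnitudes the authors used is not stated here. Function names are hypothetical.

```python
import numpy as np

HAZARD_W = np.array([1.0, 5.6, 31.6])   # w_1, w_2, w_3 as given in the text

def sri(p_event, p_class, w=HAZARD_W):
    """Eq. (14). p_event: scalar or (N,); p_class: (C,) or (N, C), rows sum to 1."""
    return np.asarray(p_event) * (np.asarray(p_class) @ w)

def mean_sri(sri_t):
    """Eq. (15). sri_t: (T, N) raw SRI; each month is scaled by its own maximum.
    Assumes the monthly maximum is positive, which holds for sigmoid outputs."""
    sri_t = np.asarray(sri_t, dtype=float)
    peak = sri_t.max(axis=1, keepdims=True)
    return np.mean(sri_t / peak, axis=0)

# One node: 10% event probability, mostly small-magnitude if an event occurs.
single = sri(0.1, [0.5, 0.3, 0.2])
# Two months, two nodes: each node is the monthly maximum once.
averaged = mean_sri([[1.0, 2.0], [4.0, 2.0]])
print(single, averaged)
```

Because each month is divided by its own maximum, the time-averaged value lies in [0, 1] and expresses relative rather than absolute risk.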

4. Model Validation and Analysis

4.1. Study Area and Data Sources

The Groningen gas field is located in the northeastern Netherlands (Figure 6a). Its main producing interval is the Lower Permian Rotliegend sandstone, which is buried at depths of about 2800–3200 m and overlain by a Zechstein evaporite caprock that provides an effective seal [35]. The principal stratigraphic framework is shown in Figure 6b. The reservoir is on a large anticlinal structure and contains multiple faults that originated from the basement. As discussed above, continued gas extraction has caused persistent pore pressure depletion and spatially varying reservoir compaction; when the resulting shear stress near faults exceeds fault frictional strength, seismic slip is triggered [35]. Since production began in 1963, cumulative pressure depletion in the central core of the field has exceeded 160 bar, and maximum reservoir compaction has surpassed 50 cm. Both variables exhibit pronounced spatial heterogeneity, and their temporal accumulation is strongly synchronized with the overall evolution of seismic activity (Figure 7 and Figure 8). This provides direct support for our choice to include reservoir pressure and compaction as node-level temporal input features. The strong physical linkage among production history, reservoir pressure, reservoir compaction, and induced seismicity, therefore, justifies using historical monitoring data to forecast future seismic activity.

To validate the proposed framework, we use the official Groningen induced seismicity catalog published by the Royal Netherlands Meteorological Institute (KNMI) as the primary seismic dataset. The catalog contains all confirmed events associated with gas extraction since 1991, and each record reports the origin time, epicentral coordinates, focal depth, and moment magnitude (Mw). The analysis period considered here spans January 1996 to October 2023. We chose 1996 as the starting year because KNMI completed the setup of a regional borehole geophone network covering the Groningen gas field in 1995. The original network consisted of eight borehole stations with an average spacing of about 20 km [37], and from 1996 onward, the monitoring system entered routine operational use, yielding a catalog with good consistency and reliability in terms of event detection and location accuracy [38]. The selected period captures most of the evolution of the field, from intensifying induced seismicity to the gradual decline that followed production restriction measures, making the dataset representative of the Groningen production cycle. The spatial distribution and temporal evolution of the recorded seismicity are shown in Figure 9.

Prior to model training, the inputs are processed in three steps. First, all time-series records are aligned to a common base date and aggregated to a monthly resolution to unify temporal granularity. Second, reservoir pressure and compaction fields are spatially mapped onto the graph nodes. Third, all continuous features are z-score standardized, with statistics computed exclusively from the training period to avoid information leakage. The spatial discretization of the seismic catalog is detailed separately in Section 4.2.
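The third preprocessing step can be illustrated with a minimal sketch. It assumes a single node-level feature stored as a plain monthly list; the function names `fit_zscore` and `apply_zscore` and the numerical values are hypothetical and serve only to show that the mean and standard deviation are estimated on the training months and then reused unchanged for later months.

```python
from statistics import mean, pstdev

def fit_zscore(train_values):
    """Estimate mean and standard deviation from training-period values only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    if sigma == 0:
        sigma = 1.0  # guard against constant features (avoids division by zero)
    return mu, sigma

def apply_zscore(values, mu, sigma):
    """Standardize any period with statistics fitted on the training period."""
    return [(v - mu) / sigma for v in values]

# Hypothetical monthly series for one node and one feature (illustrative values).
series = [10.0, 12.0, 14.0, 16.0, 30.0, 34.0]
n_train = 4  # the first four months play the role of the training period

mu, sigma = fit_zscore(series[:n_train])
standardized = apply_zscore(series, mu, sigma)
print(mu, [round(v, 3) for v in standardized])
```

Because the later months never enter `fit_zscore`, their standardized values may fall well outside the training range, which is the expected behavior when leakage is avoided.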

4.2. Spatial Discretization of the Seismic Catalog Using Voronoi Tessellation

The proposed GAT-LSTM model operates in an end-to-end node-level prediction setting, which requires a strict one-to-one correspondence between graph nodes and output labels [39]. The seismic catalog, however, reports epicenters as continuous spatial coordinates and therefore does not directly align with the predefined discrete nodes of the graph. Thus, a spatial mapping from continuous epicentral locations to discrete graph nodes is required prior to training, so that the raw catalog can be converted into node-level spatiotemporal labels.

To address this, we adopt a Voronoi tessellation-based nearest-neighbor assignment strategy. In computational geometry, Voronoi tessellation partitions space into non-overlapping convex polygons generated by a prescribed set of points, such that any point within a given polygon is closer to its own generator than to any other generator [40]. This approach has been widely used in seismological spatial analysis, including spatial zoning of seismic activity [41], reconstruction of ground-motion fields [42], and spatial modeling of frequency–magnitude distributions in seismic catalogs [43].

To discretize seismic events to their corresponding graph nodes, the study region is partitioned via Voronoi tessellation using each grid node as a generator. Because the grid fully covers the study region, the epicenter of every seismic event falls within a unique Voronoi cell and is assigned to the corresponding node as its label at the given time step. Following this procedure, the original continuous-space seismic catalog is discretized into a spatiotemporal label matrix in one-to-one correspondence with the graph nodes, satisfying the input–output structural alignment requirement of the end-to-end model. The positional quantization error introduced by this assignment is at most half the grid spacing (0.25 km) along each coordinate axis, which corresponds to a Euclidean offset of at most about 0.35 km on a regular square grid. This is smaller than the epicenter location accuracy of the KNMI seismic catalog (approximately 0.5–1.0 km) [44]; consequently, the assignment does not materially degrade label quality.
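The assignment described above can be sketched in a few lines, since membership in a node's Voronoi cell is equivalent to that node being the nearest generator. The example assumes a small regular square grid with 0.5 km spacing and three made-up events; the grid size, the coordinates, and the names `nearest_node` and `build_labels` are hypothetical and are not taken from the study's implementation.

```python
import math

def nearest_node(x, y, nodes):
    """Index of the node whose Voronoi cell contains (x, y), i.e., the nearest node."""
    return min(range(len(nodes)),
               key=lambda i: math.hypot(x - nodes[i][0], y - nodes[i][1]))

def build_labels(events, nodes, n_steps):
    """Return an n_steps x n_nodes matrix of 0/1 event-occurrence labels."""
    labels = [[0] * len(nodes) for _ in range(n_steps)]
    for t, x, y in events:
        labels[t][nearest_node(x, y, nodes)] = 1
    return labels

# Hypothetical 3 x 3 square grid with 0.5 km spacing (coordinates in km).
spacing = 0.5
nodes = [(i * spacing, j * spacing) for i in range(3) for j in range(3)]

# Hypothetical events given as (time step, x, y).
events = [(0, 0.10, 0.05), (1, 0.60, 0.45), (1, 0.70, 0.80)]
labels = build_labels(events, nodes, n_steps=2)

# Per-axis offset between each epicenter and its assigned node.
assigned = [nodes[nearest_node(x, y, nodes)] for _, x, y in events]
max_axis_offset = max(max(abs(x - nx), abs(y - ny))
                      for (_, x, y), (nx, ny) in zip(events, assigned))
print(labels, max_axis_offset <= spacing / 2)
```

For a field-scale grid, a spatial index such as a k-d tree would replace the brute-force search, but the resulting labels would be the same.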

4.3. Model Training

Next, the spatiotemporal graph data obtained from Voronoi discretization are partitioned into input–output sample pairs, following the sliding-window framework described in Section 3.1. To prevent temporal leakage, the dataset is strictly split in chronological order: January 1996–December 2015 for training, January 2016–December 2018 for validation, and January 2019–December 2020 for testing.

The input window length is determined via an empirical sensitivity analysis (Figure 10), since reservoir pressure transmission and fault-stress accumulation operate over finite time scales. A window that is too short cannot adequately capture the delayed evolution of pore pressure migration along fault pathways, whereas a window that is too long introduces redundant information and weakens the relevance of the input to the current prediction target. We accordingly compare window sizes of 2, 4, 6, 8, and 10 months while holding all other model settings fixed, and use event POD and magnitude MAvA as the selection criteria. As shown in Figure 10, both metrics follow a unimodal trend with increasing window length and reach their best values at Δt = 6 months. All subsequent experiments therefore use a 6-month input window. Under this setting, the training, validation, and test sets contain 234, 30, and 18 valid prediction time steps, respectively. All remaining hyperparameters are tuned on the validation set through a combination of grid search and informed manual adjustment, using event detection POD and magnitude classification MAvA as the joint criteria. The final configuration is summarized in Table 2.
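The reported sample counts can be checked with a short calculation. The sketch below assumes one counting convention that reproduces them: a prediction month is valid only if its full 6-month input window lies inside the same chronological split. The helper names `month_index` and `window_targets` are hypothetical.

```python
def month_index(year, month, base_year=1996):
    """Months elapsed since January of base_year (January 1996 -> 0)."""
    return (year - base_year) * 12 + (month - 1)

def window_targets(start, end, window):
    """Target months in [start, end] whose full input window also lies in [start, end]."""
    return list(range(start + window, end + 1))

WINDOW = 6  # input window length in months

splits = {
    "train": (month_index(1996, 1), month_index(2015, 12)),
    "validation": (month_index(2016, 1), month_index(2018, 12)),
    "test": (month_index(2019, 1), month_index(2020, 12)),
}

counts = {name: len(window_targets(s, e, WINDOW)) for name, (s, e) in splits.items()}
print(counts)  # {'train': 234, 'validation': 30, 'test': 18}
```

Under this convention the first test target is July 2019, which agrees with the quarterly maps of the test period starting in Q3 2019.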

4.4. Prediction Results and Analysis

The trained GAT-LSTM dual-encoder model is now applied to the test set to obtain event detection and magnitude classification outputs for each spatial unit at every forecast step. The following analyses first evaluate event detection and magnitude classification separately, and then quantify the contribution of the GAT spatial encoder through comparison with a pure LSTM baseline.

For event detection, the predictive results on 92,124 test samples are reported in Table 3. The model achieves a POD of 0.677 for the event class, correctly identifying 65 of the 96 true seismic events. This shows that even under an event rate of only 0.07%, the model avoids the degenerate all-no-event solution and retains meaningful sensitivity to sparse positive samples. The event detection FAR is, nevertheless, very high (0.997), corresponding to 26,310 false positives in absolute terms. This behavior is intrinsic to settings with extreme class imbalance: when positive samples account for only 0.07% of the total, even a moderate false positive rate among the vast number of negative samples translates into a false alarm count that greatly exceeds the number of true positives, as illustrated in Figure 11a.
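The arithmetic behind these figures can be reproduced from the counts quoted above. The sketch assumes the usual contingency-table definitions, POD = TP/(TP + FN) and FAR = FP/(FP + TP); the function name `detection_metrics` is hypothetical, and the false positive rate is an additional quantity computed here for illustration.

```python
def detection_metrics(tp, fn, fp, tn):
    """Contingency-table metrics for the event class (assumed standard definitions)."""
    return {
        "POD": tp / (tp + fn),                        # share of true events detected
        "FAR": fp / (fp + tp),                        # share of alarms that are false
        "FPR": fp / (fp + tn),                        # share of no-event samples flagged
        "Accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Counts quoted in the text for the test set.
total_samples, true_events = 92_124, 96
tp, fp = 65, 26_310
fn = true_events - tp
tn = total_samples - true_events - fp

metrics = detection_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, the false positive rate among no-event samples is roughly 0.29, yet the FAR is about 0.9975 (in line with the reported 0.997 up to rounding), because the no-event samples outnumber true events by about three orders of magnitude.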

Magnitude classification is only evaluated on the 96 node–time samples for which seismic events actually occurred, including 49 small-, 41 moderate-, and 6 large-magnitude events. The class-specific results are summarized in Table 4. The macro average recall value of 0.548 indicates a reasonable degree of balance across the three magnitude classes. The class-wise POD values for small-, moderate-, and large-magnitude events are 0.490, 0.488, and 0.667, respectively. The large-event class has the highest POD but also the highest FAR, indicating that the improved recall for the rarest class is achieved at the cost of more frequent confusion with the other two classes. In the confusion matrix, most of the misclassified small-magnitude events are assigned to the moderate class, which is consistent with the fact that these classes are adjacent on the continuous magnitude scale. The two missed large events are misclassified as one moderate and one small event, respectively, as presented in Figure 11b.
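The macro average recall can be recomputed from the class sizes. In the sketch below, the class supports (49, 41, 6) and the four correctly classified large events are stated in the text, whereas the correct counts for the small and moderate classes (24 and 20) are inferred from the reported class-wise POD values and should be read as an assumption. The function name `macro_average_recall` is hypothetical.

```python
def macro_average_recall(correct, support):
    """Unweighted mean of class-wise recall (POD), so each class counts equally."""
    recalls = {c: correct[c] / support[c] for c in support}
    return sum(recalls.values()) / len(recalls), recalls

support = {"small": 49, "moderate": 41, "large": 6}
# Assumption: counts of 24 and 20 are inferred from the reported POD of 0.490 and 0.488.
correct = {"small": 24, "moderate": 20, "large": 4}

mava, recalls = macro_average_recall(correct, support)
accuracy = sum(correct.values()) / sum(support.values())
print(round(mava, 3), round(accuracy, 3))
```

Because each class contributes equally to the average, the six large events carry the same weight as the 49 small ones, which is why MAvA (0.548) exceeds the overall Accuracy (0.500) here.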

To quantify the contribution of spatial encoding, we compare the GAT-LSTM model with a pure LSTM baseline: here, the spatial module is removed, and each node is modeled independently in time, while all other hyperparameters are kept unchanged. The results are given in Table 5 and Table 6. In terms of event detection, the GAT-LSTM improves the POD by 2.1% relative to the baseline. Although the baseline LSTM achieves slightly higher overall Accuracy, the GAT-LSTM provides better sensitivity to actual events, which is the more important objective in this application. For magnitude classification, the GAT-LSTM improves the MAvA by 7.9% and raises the total Accuracy from 0.448 to 0.500. More specifically, it outperforms the baseline for small and large events, whereas the baseline is only marginally better for the moderate class. Overall, these comparisons highlight how the spatial-correlation information learned by the GAT encoder contributes positively to both subtasks.

The above ablation result also clarifies the explanatory role of the learned spatial interactions. If each node is modeled independently, the model loses part of its ability to detect sparse events and to distinguish magnitude levels. The improvement obtained after adding the GAT encoder indicates that information from neighboring nodes contains predictive signals beyond the target node’s own temporal history. This is consistent with the induced-seismicity mechanism in which pore-pressure depletion, compaction, and fault-related stress redistribution evolve spatially rather than strictly locally.

The quarterly Mean SRI maps for the test period are shown in Figure 12. At the spatial level, high-risk areas (defined as Mean SRI > 0.5) are consistently concentrated in the central-western to south-central part of the field, and the fraction of the field classified as high risk varies from 8.5% to 32.3%. This pattern indicates strong spatial heterogeneity and a clear directional concentration in the predicted risk field.

At the temporal level, both the regional mean SRI (0.371) and the high-risk coverage proportion (32.3%) reach their maximum in Q3 2020, closely matching the quarter with the highest observed release of seismic energy. In Q4 2020, when the seismic activity weakened, both indicators dropped simultaneously to their lowest levels in the test period. This temporal co-evolution suggests that the model-derived risk field accurately captures the main fluctuations in regional seismic activity intensity.
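The two quarterly indicators discussed above can be sketched as follows. The example assumes that the Mean SRI of a node is the average of its monthly SRI values within the quarter and that SRI lies in [0, 1]; the 0.5 high-risk threshold follows the text, while the four-node field, its values, and the name `quarterly_indicators` are hypothetical.

```python
def quarterly_indicators(monthly_sri, threshold=0.5):
    """monthly_sri: one list per month, each holding per-node SRI values."""
    n_months = len(monthly_sri)
    n_nodes = len(monthly_sri[0])
    # Time-averaged SRI per node over the quarter.
    mean_sri = [sum(month[i] for month in monthly_sri) / n_months
                for i in range(n_nodes)]
    regional_mean = sum(mean_sri) / n_nodes
    # Fraction of nodes whose time-averaged SRI exceeds the high-risk threshold.
    coverage = sum(1 for v in mean_sri if v > threshold) / n_nodes
    return mean_sri, regional_mean, coverage

# Hypothetical SRI values for 4 nodes over the 3 months of one quarter.
toy_quarter = [
    [0.10, 0.60, 0.40, 0.20],
    [0.20, 0.70, 0.50, 0.10],
    [0.30, 0.80, 0.30, 0.00],
]

mean_sri, regional_mean, coverage = quarterly_indicators(toy_quarter)
print([round(v, 2) for v in mean_sri], round(regional_mean, 3), coverage)
```

In this toy case only one of the four nodes exceeds the threshold, so the high-risk coverage is 25% while the regional mean stays below 0.5.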

The spatial correspondence between predicted risk and observed seismicity is also strong. During the test period, 88 nodes experienced seismic events, and most of these nodes fell within regions with Mean SRI > 0.25, with many clustered inside the 0.50 contour. Figure 13 further compares the quarterly normalized mean SRI with the total observed seismic energy release. In general, the two indicators evolve in the same direction: both rise from Q4 2019 to Q1 2020, both decrease slightly from Q1 to Q2 2020, both increase sharply from Q2 to Q3 2020, when they peak, and both decline markedly from Q3 to Q4 2020. The only directional mismatch occurs in Q4 2019, when the mean SRI increases slightly while total seismic energy decreases. This discrepancy may reflect the more dispersed spatial distribution and lower per-event energy of the seismic events in that quarter, implying that the field-averaged risk indicator is smoothed to some degree by spatial averaging. Overall, the time-averaged risk field provides a useful data-driven representation of the spatiotemporal variations in seismicity and may support extraction planning and regional seismic risk management.
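The co-directional behavior described above can be summarized as a simple directional-agreement score. The sketch encodes only the signs of the quarter-to-quarter changes as stated in the text (no magnitudes are assumed); the function names `directions` and `directional_agreement` are hypothetical.

```python
def directions(series):
    """Sign of each step-to-step change: +1 up, -1 down, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

def directional_agreement(d1, d2):
    """Fraction of transitions in which both indicators move the same way."""
    return sum(1 for a, b in zip(d1, d2) if a == b) / len(d1)

transitions = ["Q3->Q4 2019", "Q4 2019->Q1 2020", "Q1->Q2 2020",
               "Q2->Q3 2020", "Q3->Q4 2020"]
# Signs of change as described in the text for the test period.
sri_dir = [+1, +1, -1, +1, -1]
energy_dir = [-1, +1, -1, +1, -1]

agreement = directional_agreement(sri_dir, energy_dir)
mismatches = [t for t, a, b in zip(transitions, sri_dir, energy_dir) if a != b]
print(agreement, mismatches)  # 0.8 ['Q3->Q4 2019']
```

Given actual quarterly values, `directions` would be applied to each series first; the score only measures the sign of change and says nothing about amplitude.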

4.5. Extended Evaluation and Relation to Previous Groningen Studies

To assess the applicability of the proposed framework under changing operational conditions, the trained GAT-LSTM model is further applied, without retraining, to data from 2021 to 2023. This period corresponds to the late operational stage of Groningen, during which production was progressively restricted and moved toward full cessation; monthly production dropped sharply, and the rates of pressure depletion and incremental compaction slowed substantially [45]. Relative to the inputs from the training and primary testing periods, these inputs exhibit a clear distributional shift. Previous studies have demonstrated that although strong production reductions can systematically suppress seismicity rates, historically accumulated compaction and residual stress may still sustain a measurable level of seismic activity for several years [46,47]. The 2021–2023 interval is therefore treated as a follow-up out-of-sample evaluation window for testing the model’s capability to identify spatiotemporal patterns of seismic risk under altered operating conditions.

Figure 14 shows the quarterly Mean SRI maps for the nine quarters from Q3 2021 to Q3 2023. The main high-risk belt remains concentrated in the central-western to south-central portion of the field, which is consistent with the dominant pattern identified for 2019–2020. The observed seismic events continue to overlap strongly with these high-risk areas, suggesting that the model can still distinguish the principal spatial pattern of regional risk after a staged change in the input distribution. Figure 15 further compares quarterly Mean SRI with the total observed seismic energy release over the same period. These two quantities remain broadly co-directional, indicating that the model continues to capture the main temporal fluctuations in seismic activity. However, in individual quarters, the amplitude of the predicted risk index does not always match the magnitude of the observed energy release, which suggests that the absolute calibration of the model weakens somewhat under rapidly declining production.

These observations also help clarify the specific contribution of the proposed framework within the broader Groningen forecasting literature, although direct numerical comparisons should be interpreted cautiously because forecasting targets and evaluation protocols differ across studies. Previous work has addressed field-scale event counts and exceedance probabilities under production scenarios [27,48,49], physics- and stress-based spatiotemporal seismicity models [5,7,47,49,50], hazard-oriented smoothing and probabilistic assessment [4,38,51], and machine learning benchmarking [11]. In contrast, the present study focuses on node-level spatiotemporal classification and on deriving a continuous relative risk surface (SRI) that fuses event occurrence probability with magnitude class probabilities.

Regarding spatial patterns, previous Groningen studies consistently show that induced seismicity is spatially heterogeneous and concentrated within limited parts of the field [4,7,38,48,49,50,51]. The Mean SRI maps obtained here are consistent with that overall picture, with elevated risk primarily being located in the central-western to south-central regions. More importantly, the predicted high-risk zones show strong spatial overlap with the observed event nodes during the 2019–2020 test period, and the same broad pattern persists in the 2021–2023 follow-up period despite the different operating conditions. This implies that the GAT-LSTM framework can recover the first-order spatial organization identified by physics-based and statistical approaches, while also resolving it at a finer, node-level scale.

In terms of forecast output, many previous Groningen studies were designed for scenario analysis, uncertainty quantification, or probabilistic hazard and risk assessment [4,27,38,48,49,51], and these approaches remain essential for long-term regulatory and engineering decision making. Compared to existing machine learning benchmarking studies, the main contribution of the present work is not just improved forecast scores, but the transformation of node-level predictions into a spatially interpretable risk product. By integrating event occurrence probability with magnitude class probabilities, the SRI converts discrete classification outputs into a continuous spatiotemporal risk surface; this surface is useful for hotspot identification, within-field relative-risk ranking, and rolling updates of the evolving seismicity pattern. Together with the broad temporal consistency between quarterly Mean SRI and observed seismic energy release, these results suggest that the proposed framework is better viewed not as a replacement for physics-based or hazard models, but as a complement to them with improvements in spatial resolution and updatability. The remaining mismatch in absolute amplitude during the follow-up period indicates the need for future work on uncertainty quantification and more robust physics-informed constraints [27,49].

5. Conclusions

To address two central challenges in deep learning-based prediction of gas extraction-induced seismicity—inadequate modeling of spatial dependence and extreme class imbalance—we proposed a multi-task spatiotemporal forecasting framework based on a GAT-LSTM dual-encoder architecture. This framework was validated on the Groningen gas field using the official KNMI seismic catalog together with production monitoring data. A dual-encoder spatiotemporal prediction model integrating a Graph Attention Network with a two-layer Long Short-Term Memory network was constructed. The GAT serves as the spatial encoder and the LSTM as the temporal encoder, forming an end-to-end feature extraction pipeline for jointly modeling heterogeneous spatial coupling among reservoir blocks and multiscale time-lagged behavior. To cope with the extreme class imbalance in the data (an event rate of 0.07%), the forecasting target was reformulated as a multi-task classification problem with event detection and magnitude classification subtasks. The joint loss design effectively mitigates the “no-event bias” that commonly affects conventional regression models under highly imbalanced conditions.

Validation on Groningen observations from 2019 to 2020 shows that the model achieves an event detection POD of 0.677 and a magnitude classification MAvA of 0.548. Relative to a pure LSTM baseline, adding the GAT spatial encoder improves the event detection POD by 2.1% and magnitude classification MAvA by 7.9%, thus confirming the benefit of explicitly modeling the graph-based spatial topology. The proposed Seismic Risk Index (SRI) fuses event occurrence probability with magnitude class probabilities, thereby converting discrete classification outputs into continuous risk estimates. Across both the primary test period and the extended evaluation period, the time-averaged risk field exhibits pronounced spatial heterogeneity, high spatial correspondence with observed event nodes, and a broadly co-directional quarterly evolution with the total seismic energy release. These results show that the proposed framework can meaningfully indicate the spatiotemporal differentiation of induced-seismicity risk, and can thus serve as a useful tool for fine-grained forecasting and regional risk assessment of induced seismicity.

Several limitations of the present study should be noted. The framework was validated solely on the Groningen gas field; whether it generalizes to other induced-seismicity settings, such as wastewater injection or geothermal systems, remains untested. The SRI also provides a relative rather than an absolute risk estimate, and the model currently lacks uncertainty quantification; the high False Alarm Ratio under extreme class imbalance further limits direct operational use without post-processing. In terms of applicability, the framework is most suitable where production monitoring data are available at regular temporal resolution and where a sufficiently long and reliable seismic catalog exists for model training. Mature gas extraction fields most naturally meet these conditions, though the approach could extend to other fluid-induced seismicity contexts with comparable data availability. Future work should target physics-informed calibration under rapidly changing operating conditions, uncertainty quantification of the SRI, and cross-field transferability.

Author Contributions

Conceptualization, F.L. and H.Z.; methodology, H.Z.; software, H.Z.; validation, H.Z.; formal analysis, H.Z., Y.L. and R.X.; investigation, H.Z.; resources, F.L., H.Z. and R.X.; data curation, H.Z., R.X. and Y.L.; writing—original draft preparation, H.Z.; writing—review and editing, F.L., S.C. and H.Z.; visualization, H.Z.; supervision, F.L., S.C., F.W., S.W. and H.D.; project administration, F.L., S.C., F.W., S.W. and H.D.; funding acquisition, F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study is jointly supported by the National Natural Science Foundation of China (NSFC) (Grant No. 42577162), Zhejiang Provincial Science and Technology Plan (Grant No. 2025E10118), and the Open Foundation of Shaanxi Key Laboratory of Lacustrine Shale Gas Accumulation and Exploitation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We sincerely thank the reviewers for their valuable suggestions, which have significantly improved the quality of this paper.

Conflicts of Interest

The authors declare no competing financial interests.

Abbreviations

The following abbreviations are used in this manuscript:

GAT	Graph Attention Network
LSTM	Long Short-Term Memory
GNN	Graph Neural Network
RNN	Recurrent Neural Network
SRI	Seismic Risk Index
POD	Probability of Detection
FAR	False Alarm Ratio
MAvA	Macro Average Arithmetic Recall
KNMI	Koninklijk Nederlands Meteorologisch Instituut
NAM	Nederlandse Aardolie Maatschappij
NSFC	National Natural Science Foundation of China

References

1. Keranen, K.M.; Weingarten, M. Induced Seismicity. Annu. Rev. Earth Planet. Sci. 2018, 46, 149–174.
2. Zang, A.; Oye, V.; Jousset, P.; Deichmann, N.; Gritto, R.; McGarr, A.; Majer, E.; Bruhn, D. Analysis of Induced Seismicity in Geothermal Reservoirs—An Overview. Geothermics 2014, 52, 6–21.
3. van Thienen-Visser, K.; Breunese, J.N. Induced Seismicity of the Groningen Gas Field: History and Recent Developments. Lead. Edge 2015, 34, 664–671.
4. van Elk, J.; Doornhof, D.; Bommer, J.J.; Bourne, S.J.; Oates, S.J.; Pinho, R.; Crowley, H. Hazard and Risk Assessments for Induced Seismicity in Groningen. Neth. J. Geosci. 2017, 96, s259–s269.
5. Bourne, S.J.; Oates, S.J.; van Elk, J.; Doornhof, D. A Seismological Model for Earthquakes Induced by Fluid Extraction from a Subsurface Reservoir. J. Geophys. Res. Solid Earth 2014, 119, 8991–9015.
6. Dempsey, D.; Suckale, J. Physics-Based Forecasting of Induced Seismicity at Groningen Gas Field, the Netherlands. Geophys. Res. Lett. 2017, 44, 7773–7782.
7. Candela, T.; Osinga, S.; Ampuero, J.-P.; Wassing, B.; Pluymaekers, M.; Fokker, P.A.; van Wees, J.-D.; de Waal, H.A.; Muntendam-Bos, A.G. Depletion-Induced Seismicity at the Groningen Gas Field: Coulomb Rate-and-State Models Including Differential Compaction Effect. J. Geophys. Res. Solid Earth 2019, 124, 7081–7104.
8. van Wees, J.-D.; Osinga, S.; Van Thienen-Visser, K.; Fokker, P.A. Reservoir Creep and Induced Seismicity: Inferences from Geomechanical Modeling of Gas Depletion in the Groningen Field. Geophys. J. Int. 2018, 212, 1487–1497.
9. Beroza, G.C.; Segou, M.; Mostafa Mousavi, S. Machine Learning and Earthquake Forecasting—Next Steps. Nat. Commun. 2021, 12, 4761.
10. Kubo, H.; Naoi, M.; Kano, M. Recent Advances in Earthquake Seismology Using Machine Learning. Earth Planets Space 2024, 76, 36.
11. Limbeck, J.; Bisdom, K.; Lanz, F.; Park, T.; Barbaro, E.; Bourne, S.; Kiraly, F.; Bierman, S.; Harris, C.; Nevenzeel, K.; et al. Using Machine Learning for Model Benchmarking and Forecasting of Depletion-Induced Seismicity in the Groningen Gas Field. Comput. Geosci. 2021, 25, 529–551.
12. Qin, Y.; Chen, T.; Ma, X.; Chen, X. Forecasting Induced Seismicity in Oklahoma Using Machine Learning Methods. Sci. Rep. 2022, 12, 9319.
13. Karimpouli, S.; Kwiatek, G.; Martínez-Garzón, P.; Caus, D.; Wang, L.; Dresen, G.; Bohnhoff, M. Forecasting Induced Seismicity in Enhanced Geothermal Systems Using Machine Learning: Challenges and Opportunities. Geophys. J. Int. 2025, 242, ggaf155.
14. Picozzi, M.; Iaccarino, A.G. Forecasting the Preparatory Phase of Induced Earthquakes by Recurrent Neural Network. Forecasting 2021, 3, 17–36.
15. Dascher-Cousineau, K.; Shchur, O.; Brodsky, E.E.; Günnemann, S. Using Deep Learning for Flexible and Scalable Earthquake Forecasting. Geophys. Res. Lett. 2023, 50, e2023GL103909.
16. Convertito, V.; Giampaolo, F.; Amoroso, O.; Piccialli, F. Deep Learning Forecasting of Large Induced Earthquakes via Precursory Signals. Sci. Rep. 2024, 14, 2964.
17. Moein, M.J.A.; Langenbruch, C.; Schultz, R.; Grigoli, F.; Ellsworth, W.L.; Wang, R.; Rinaldi, A.P.; Shapiro, S. The Physical Mechanisms of Induced Earthquakes. Nat. Rev. Earth Environ. 2023, 4, 847–863.
18. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24.
19. Zhang, X.; Reichard-Flynn, W.; Zhang, M.; Hirn, M.; Lin, Y. Spatiotemporal Graph Convolutional Networks for Earthquake Source Characterization. J. Geophys. Res. Solid Earth 2022, 127, e2022JB024401.
20. Bloemheuvel, S.; van den Hoogen, J.; Jozinović, D.; Michelini, A.; Atzmueller, M. Graph Neural Networks for Multivariate Time Series Regression with Application to Seismic Data. Int. J. Data Sci. Anal. 2023, 16, 317–332.
21. Leema, A.; Balakrishnan, P.; Kiruba, G.G.; Rajarajan, G.; Goel, S.; Aggarwal, P. SeismoQuakeGNN: A Hybrid Framework for Spatio-Temporal Earthquake Prediction with Transformer-Enhanced Models. Front. Artif. Intell. 2025, 8, 1690476.
22. Biot, M.A. General Theory of Three-Dimensional Consolidation. J. Appl. Phys. 1941, 12, 155–164.
23. Labuz, J.F.; Zang, A. Mohr–Coulomb Failure Criterion. Rock Mech. Rock Eng. 2012, 45, 975–979.
24. Wang, J.; Jiang, W.; Li, Z.; Lu, Y. A New Multi-Scale Sliding Window LSTM Framework (MSSW-LSTM): A Case Study for GNSS Time-Series Prediction. Remote Sens. 2021, 13, 3328.
25. Brody, S.; Alon, U.; Yahav, E. How Attentive Are Graph Attention Networks? In Proceedings of the Tenth International Conference on Learning Representations (ICLR 2022), Virtual Event, 25–29 April 2022.
26. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
27. Kaveh, H.; Batlle, P.; Acosta, M.; Kulkarni, P.; Bourne, S.J.; Avouac, J.P. Induced Seismicity Forecasting with Uncertainty Quantification: Application to the Groningen Gas Field. Seismol. Res. Lett. 2023, 95, 773–790.
28. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
29. Cui, Y.; Jia, M.; Lin, T.-Y.; Song, Y.; Belongie, S. Class-Balanced Loss Based on Effective Number of Samples. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9268–9277.
30. Sener, O.; Koltun, V. Multi-Task Learning as Multi-Objective Optimization. In Proceedings of the Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montréal, QC, Canada, 3–8 December 2018; pp. 525–536.
31. Galkina, A.; Grafeeva, N. Machine Learning Methods for Earthquake Prediction: A Survey. In Proceedings of the Fourth Conference on Software Engineering and Information Management (SEIM-2019), Saint Petersburg, Russia, 13 April 2019.
32. He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
33. Sokolova, M.; Lapalme, G. A Systematic Analysis of Performance Measures for Classification Tasks. Inf. Process. Manag. 2009, 45, 427–437.
34. Gutenberg, B.; Richter, C.F. Magnitude and Energy of Earthquakes. Nature 1955, 176, 795.
35. de Jager, J.; Visser, C. Geology of the Groningen Field—An Overview. Neth. J. Geosci. 2017, 96, s3–s15.
36. Buijze, L.; van den Bogert, P.; Wassing, B.; Orlic, B.; ten Veen, J. Fault Reactivation Mechanisms and Dynamic Rupture Modelling of Depletion-Induced Seismic Events in a Rotliegend Gas Reservoir. Neth. J. Geosci. 2017, 96, s131–s148.
37. Dost, B.; Goutbeek, F.; Eck, T.; Kraaijpoel, D. Monitoring Induced Seismicity in the North of the Netherlands: Status Report 2010; KNMI: De Bilt, The Netherlands, 2012.
38. Dost, B.; Ruigrok, E.; Spetzler, J. Development of Seismicity and Probabilistic Hazard Assessment for the Groningen Gas Field. Neth. J. Geosci. 2017, 96, s235–s245.
39. Yu, B.; Yin, H.; Zhu, Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 3634–3640.
40. Okabe, A.; Boots, B.; Sugihara, K.; Chiu, S.N.; Kendall, D.G. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, 2nd ed.; Wiley Series in Probability and Statistics; Wiley: Hoboken, NJ, USA, 2000.
41. Kamer, Y.; Hiemer, S. Data-Driven Spatial b Value Estimation with Applications to California Seismicity: To b or Not to b. J. Geophys. Res. Solid Earth 2015, 120, 5191–5214.
42. Fornasari, S.F.; Pazzi, V.; Costa, G. A Machine-Learning Approach for the Reconstruction of Ground-Shaking Fields in Real Time. Bull. Seismol. Soc. Am. 2022, 112, 2642–2652.
43. Muntendam-Bos, A.G.; Grobbe, N. Data-Driven Spatiotemporal Assessment of the Event-Size Distribution of the Groningen Extraction-Induced Seismicity Catalogue. Sci. Rep. 2022, 12, 10119.
44. Spetzler, J.; Dost, B. Hypocentre Estimation of Induced Earthquakes in Groningen. Geophys. J. Int. 2017, 209, 453–465.
45. Bommer, J.J.; van Elk, J.; Zoback, M.D. Estimating the Maximum Magnitude of Induced Earthquakes in the Groningen Gas Field, the Netherlands. Bull. Seismol. Soc. Am. 2024, 114, 2804–2822.
46. Boitz, N.; Langenbruch, C.; Shapiro, S.A. Production-Induced Seismicity Indicates a Low Risk of Strong Earthquakes in the Groningen Gas Field. Nat. Commun. 2024, 15, 329.
47. Hainzl, S.; Dahm, T.; Zöller, G. Modelling Induced Seismicity in Groningen Based on Subcritical Stressed Faults. Geophys. J. Int. 2025, 241, 840–851.
48. Dempsey, D.E.; Suckale, J. Physics-Based Forecasting of Induced Seismicity at Groningen Gas Field, The Netherlands: Post Hoc Evaluation and Forecast Update. Seismol. Res. Lett. 2023, 94, 1429–1446.
49. Candela, T.; Pluymaekers, M.; Ampuero, J.-P.; van Wees, J.-D.; Buijze, L.; Wassing, B.; Osinga, S.; Grobbe, N.; Muntendam-Bos, A.G. Controls on the Spatio-Temporal Patterns of Induced Seismicity in Groningen Constrained by Physics-Based Modelling with Ensemble-Smoother Data Assimilation. Geophys. J. Int. 2022, 229, 1282–1308.
50. Richter, G.; Hainzl, S.; Dahm, T.; Zöller, G. Stress-Based, Statistical Modeling of the Induced Seismicity at the Groningen Gas Field, The Netherlands. Environ. Earth Sci. 2020, 79, 252.
51. van Lieshout, M.N.M.; Baki, Z. Exploring Seismic Hazard in the Groningen Gas Field Using Adaptive Kernel Smoothing. Math. Geosci. 2024, 56, 1185–1206.

Figure 1. Schematic illustration of induced seismicity mechanisms.

Figure 2. Schematic illustration of the sliding-window method.

Figure 3. Architecture of a standard LSTM unit.

Figure 4. Workflow of the proposed GAT-LSTM framework.

Figure 5. Framework of the dual-encoder multi-task spatiotemporal prediction model.

Figure 6. (a) Location of the Groningen gas field; (b) reservoir stratigraphic structure [36].

Figure 7. Evolution of reservoir pressure in the Groningen gas field (1958–2023): (a) spatial distribution of cumulative pressure depletion; (b) temporal evolution of pressure statistics.

Figure 8. Evolution of reservoir compaction in the Groningen gas field (1958–2023): (a) spatial distribution of cumulative compaction; (b) temporal evolution of compaction statistics.

Figure 9. Induced seismicity in the Groningen gas field: (a) spatial distribution of seismic events; (b) temporal evolution of event frequency by magnitude.

Figure 10. Model performance metrics under different time window sizes.

Figure 11. Confusion matrices: (a) event detection; (b) magnitude classification.

Figure 12. Spatial distribution of mean SRI during the test period: (a) Q3 2019; (b) Q4 2019; (c) Q1 2020; (d) Q2 2020; (e) Q3 2020; (f) Q4 2020.

Figure 13. Temporal comparison of quarterly total seismic energy release and mean SRI during the test period.

Figure 14. Spatial distribution of mean SRI during the extended evaluation period: (a) Q3 2021; (b) Q4 2021; (c) Q1 2022; (d) Q2 2022; (e) Q3 2022; (f) Q4 2022; (g) Q1 2023; (h) Q2 2023; (i) Q3 2023.

Figure 15. Temporal comparison of quarterly total seismic energy release with mean SRI during the extended evaluation period.

Table 1. Confusion matrix.

	Predicted Positive	Predicted Negative
Actual Positive	TP	FN
Actual Negative	FP	TN
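
As a minimal sketch of how the entries of Table 1 turn into the reported scores, the following assumes the common definitions POD = TP/(TP + FN) and FAR = FP/(TP + FP) (false alarm ratio). The function names and the counts are hypothetical and are not taken from the paper.

```python
def pod(tp: int, fn: int) -> float:
    """Probability of Detection: share of actual positives that were predicted."""
    return tp / (tp + fn)


def far(tp: int, fp: int) -> float:
    """False Alarm Ratio (assumed definition): share of predicted positives that were wrong."""
    return fp / (tp + fp)


def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical counts for a rare-event problem (illustration only).
tp, fn, fp, tn = 20, 10, 3000, 7000

print(f"POD      = {pod(tp, fn):.3f}")
print(f"FAR      = {far(tp, fp):.3f}")
print(f"Accuracy = {accuracy(tp, fn, fp, tn):.3f}")
```

With so few positives, even a moderate share of misclassified negatives pushes FAR close to 1, which is the regime in which the event-class FAR in Table 3 should be read.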

Table 2. Hyperparameter settings of the GAT-LSTM model.

Category	Hyperparameter	Value
Model Architecture	GAT layers/attention heads/hidden dim	3/2/128
	LSTM layers/hidden dim	2/128
	Dropout (GAT/LSTM/decoder)	0.3/0.3/0.1
Training	Optimizer/weight initialization	AdamW/Xavier normal
	Batch size/gradient clip norm/weight decay	16/1.0/1 × 10⁻⁵
	Learning rate	5 × 10⁻⁴
	LR scheduler	ReduceLROnPlateau (factor = 0.5)
	Early stopping patience	30
Loss Function	Event detection loss/Focal γ/pos_weight	Focal loss/2.0/50
	Magnitude classification loss/Focal γ/label smoothing	Weighted CE + focal/2.0/0.05
	Loss weight $w_{event}$/$w_{magnitude}$	5.0/10.0
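
The loss settings in Table 2 can be illustrated with a small sketch. It assumes the standard binary focal loss with a positive-class weight, and it assumes that the two task losses are combined as a weighted sum. The exact formulation used by the authors may differ, and the function names are hypothetical.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, pos_weight: float = 50.0) -> float:
    """Focal loss for one sample; p is the predicted event probability, y is 0 or 1."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    if y == 1:
        # Positive sample: down-weighted when p is already high, scaled by pos_weight.
        return -pos_weight * (1.0 - p) ** gamma * math.log(p)
    # Negative sample: down-weighted when p is already low.
    return -(p ** gamma) * math.log(1.0 - p)


def total_loss(event_loss: float, magnitude_loss: float,
               w_event: float = 5.0, w_magnitude: float = 10.0) -> float:
    """Weighted sum of the two task losses (assumed combination rule)."""
    return w_event * event_loss + w_magnitude * magnitude_loss


# A confidently detected event contributes far less than a missed one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(f"easy positive: {easy:.4f}, hard positive: {hard:.4f}")
```

Setting `gamma = 0` and `pos_weight = 1` reduces the function to ordinary binary cross-entropy, which is a convenient sanity check.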

Table 3. Event detection performance metrics.

Class	POD	FAR	Accuracy
No event	0.714	0.001	0.714
Event	0.677	0.997	0.714

Table 4. Classification performance for different magnitude classes.

Class	POD	FAR	MAvA	Accuracy
Small	0.490	0.314	0.548	0.50
Moderate	0.488	0.444
Large	0.667	0.840

Table 5. Comparison with the baseline model for event detection performance.

Model	POD	FAR	Accuracy
GAT-LSTM	0.677	0.997	0.714
Baseline LSTM	0.656	0.997	0.783

Table 6. Comparison with the baseline model for magnitude classification performance.

Model	Class	POD	FAR	MAvA	Accuracy
GAT-LSTM	Small	0.490	0.314	0.548	0.50
	Moderate	0.488	0.444
	Large	0.667	0.840
Baseline LSTM	Small	0.347	0.393	0.469	0.448
	Moderate	0.561	0.50
	Large	0.50	0.864
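
The summary figures can be checked directly from the tabulated values. The sketch below assumes that MAvA is the unweighted mean of the per-class POD values; the variable and function names are hypothetical.

```python
# Per-class POD values copied from Table 6.
pod_gat = {"Small": 0.490, "Moderate": 0.488, "Large": 0.667}
pod_lstm = {"Small": 0.347, "Moderate": 0.561, "Large": 0.50}


def macro_average_recall(pods: dict) -> float:
    """Unweighted mean of per-class recall (POD), so every class counts equally."""
    return sum(pods.values()) / len(pods)


mava_gat = macro_average_recall(pod_gat)    # about 0.548
mava_lstm = macro_average_recall(pod_lstm)  # about 0.469

# Absolute gains of the GAT-LSTM over the baseline LSTM.
event_pod_gain = 0.677 - 0.656                          # event POD, Table 5
mava_gain = round(mava_gat, 3) - round(mava_lstm, 3)    # MAvA, Table 6

print(f"MAvA GAT-LSTM = {mava_gat:.3f}, baseline = {mava_lstm:.3f}")
print(f"POD gain = {event_pod_gain:.3f}, MAvA gain = {mava_gain:.3f}")
```

The differences of 0.021 and 0.079 correspond to the 2.1% and 7.9% improvements quoted in the abstract, read as percentage points rather than relative gains.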
